Consider timeseries for building the surrogate model #108

Open

SorooshMani-NOAA opened this issue Aug 10, 2023 · 6 comments
SorooshMani-NOAA commented Aug 10, 2023

Currently, only the maximum water elevation is used to train the surrogate model. We'd like to consider the whole timeseries to see how it affects the surrogate output.
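For concreteness, a rough sketch of the data shapes involved (file, variable, and dimension names here are illustrative, not the actual ensemble output):

import xarray as xr

# hypothetical ensemble output; names are illustrative
ds = xr.open_dataset('ensemble_output.nc')

# current approach: train on one value per node per run
max_elev = ds['elevation'].max(dim='time')   # dims: (run, node)

# proposed: keep the full timeseries as training data
elev_timeseries = ds['elevation']            # dims: (run, time, node)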

Tasks:

@saeed-moghimi-noaa
@WPringle
@SorooshMani-NOAA

SorooshMani-NOAA self-assigned this Aug 14, 2023

@SorooshMani-NOAA (Collaborator, Author) commented:

@FariborzDaneshvar-NOAA since you started exploring this item, can you please either link an existing ticket or use this ticket to document your progress and impediments (like #128)?

@FariborzDaneshvar-NOAA (Collaborator) commented:

With the stacking suggestion in #129 (comment), I was able to execute the subset_dataset() function with stacked time & node! But the conversion of the KL surrogate model to the overall surrogate for each node (the surrogate_from_karhunen_loeve() function) failed with a MemoryError.

One suggestion was to use a chunk of time steps; I will post updates on that here.
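For example, a rough sketch of the chunking idea (chunk size illustrative), iterating over blocks of time steps so each KL decomposition and surrogate fit sees a smaller array:

n_steps = elev_timeseries.sizes['time']
chunk_size = 100  # illustrative

for start in range(0, n_steps, chunk_size):
    time_chunk = elev_timeseries.isel(time=slice(start, start + chunk_size))
    # stack time & node and build the surrogate per block, as in #129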

@FariborzDaneshvar-NOAA (Collaborator) commented:

Building the surrogate model for the first 100 time steps:

# select the first 100 time steps (Aug 30 13:00 through Sep 3 16:00)
time_chunk = elev_timeseries.sel(
    time=slice("2018-08-30T13:00:00.000000000", "2018-09-03T16:00:00.000000000")
)
# flatten (time, node) into one dimension, then expose it as 'node' so
# downstream functions that expect a single node dimension still work
time_chunk_stack = time_chunk.rename(
    nSCHISM_hgrid_node='node'
).stack(
    stacked=('time', 'node'), create_index=False
).swap_dims(
    stacked='node'
)
subset = subset_dataset(ds=time_chunk_stack, ...)

It went through, and here are the plots:

[Plots: KL eigenvalues; KL fit; KL-surrogate fit; validation boxplots; sensitivities; model vs. surrogate validation (validation_vortex_4_variable_korobov_1)]

These results look weird, and to me the KL fit didn't work correctly! One possibility is that the first 100 time steps used here are long before landfall, so there may be minimal variation among them. It also reveals the plotting-function issue I mentioned earlier in #132.

Results aside, I couldn't make the percentile and probability plots due to MemoryError: Unable to allocate 1.15 TiB for an array with shape (15772912, 10000) and data type float64
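For reference, the requested allocation is exactly what a dense float64 array of that shape requires:

# 15,772,912 rows x 10,000 columns x 8 bytes per float64
15_772_912 * 10_000 * 8 / 2**40   # ≈ 1.15 TiB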

@FariborzDaneshvar-NOAA (Collaborator) commented:

I also tried opening subset.nc with dask (chunks='auto'), but it didn't change the outcome: I still get the same MemoryError message for the percentile and probability plots.
But interestingly, the along-track sensitivity plots were different (see below)! @SorooshMani-NOAA how might that be possible?

[Image: along-track sensitivity plots]
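For reference, this is roughly what the dask-backed open looks like; note that any step that converts the lazy array to a plain numpy array still materializes everything and hits the same MemoryError:

import xarray as xr

# lazy, chunked load; nothing is read into memory yet
subset = xr.open_dataset('subset.nc', chunks='auto')

# but a direct numpy conversion, e.g. np.asarray(subset['elevation']),
# still computes the full array and can exhaust memory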

SorooshMani-NOAA commented Jan 19, 2024

@FariborzDaneshvar-NOAA about the memory issue: the problem is that the function you showed me the other day calls the numpy function directly, which (as far as I understand) pulls all values into memory before executing. So you also need to change the function where the numpy method is called.
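One way to keep the computation lazy would be wrapping the numpy call with xarray.apply_ufunc so it runs chunk-by-chunk; a minimal sketch, assuming the offending routine can be applied per chunk (expand_chunk is a hypothetical stand-in, not the actual expansion function):

import numpy as np
import xarray as xr

def expand_chunk(values):
    # hypothetical stand-in for the numpy routine in the surrogate expansion
    return np.square(values)

result = xr.apply_ufunc(
    expand_chunk,
    subset['elevation'],     # dask-backed DataArray
    dask='parallelized',     # apply per chunk instead of loading all values
    output_dtypes=[float],
)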

I'm not sure what is happening in the plots. Are you sure the mapping back to physical space is done correctly? We have a combined time-node dimension where neither times nor nodes are necessarily aligned, so we have to be very careful when reshaping.
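On the reshaping point, a minimal sketch of the round trip, assuming surrogate_values is the flat per-stacked-node output and its order follows the stack(stacked=('time', 'node')) call above (time outermost, node varying fastest):

import numpy as np

n_time = time_chunk.sizes['time']
n_node = time_chunk_stack.sizes['node'] // n_time

# C-order reshape recovers (time, node) only because 'time' was listed
# first in the stack() call; a different stacking order would transpose this
surrogate_2d = np.reshape(surrogate_values, (n_time, n_node))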

I'm not sure if the plots we get are actually meaningful!

@FariborzDaneshvar-NOAA (Collaborator) commented:

@SorooshMani-NOAA thanks for your comment; you brought up a good point about the results! I didn't reshape back to time/node, which might explain these plots, but it's not clear to me at which step the reshaping should happen.

This new memory issue is different from what I mentioned before (the numpy function in the surrogate expansion, when I used all time steps), but you are right, it should be addressed separately.
