Remove copy statements in sea ice metrics #1041
Conversation
@jasonb5 just looping you in on this thread so we can try to sync concurrent changes and get to a final binder config that works seamlessly. Any suggestions for @acordonez on getting these changes merged into your working …?
@acordonez I just took a quick peek at the code, and if you wedge this somewhere, you could call it to report resource usage throughout code execution:

```python
>>> import gc, resource
>>> import numpy as np
>>> def getResources():
...     memGb = "{:05.2f}".format(np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)/1.e6)
...     pyObj = "{:08d}".format(len(gc.get_objects()))
...     print("Resources:", memGb, "Gb;", pyObj, "pyObjs in memory")
...
>>> getResources()
Resources: 00.03 Gb; 00021319 pyObjs in memory
```

I just wedged this into the driver file as a test; I can push it to your branch if you like?
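As a rough, hypothetical illustration of how such a checkpoint could be wedged into a driver (the loop and model list below are made up; the helper is the same one defined above), calling it after each expensive step shows where the peak grows:

```python
import gc, resource
import numpy as np

def getResources():
    # Same helper as above: report peak RSS (in GB) and the live Python object count.
    memGb = "{:05.2f}".format(np.float32(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss) / 1.e6)
    pyObj = "{:08d}".format(len(gc.get_objects()))
    print("Resources:", memGb, "Gb;", pyObj, "pyObjs in memory")

getResources()                        # baseline, before any data is loaded
for model in ["model_a", "model_b"]:  # hypothetical model list
    data = [0.0] * 1_000_000          # stand-in for loading data and computing metrics
    getResources()                    # ru_maxrss only ever grows, so this tracks the running peak
    del data
```

Note that the `resource` module (like the original snippet) is Unix-only.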
@durack1 I'm doing something like that in my code locally, I just don't want to commit it.
@acordonez, just curious, what if you use …?
@lee1043 I deleted my last comment about the `.compute()` statement; it doesn't actually seem to make a difference when looking at multiple runs.
@lee1043 using `open_mfdataset` brings my max memory use back up to 11 GB, so in terms of memory it's better to use `open_dataset` when there's only one file to open.
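For what it's worth, a minimal sketch of that choice (the file name below is hypothetical): xcdat exposes both entry points, and for a single file `open_dataset` was the cheaper option here.

```python
import xcdat

# Hypothetical single input file.
path = "siconc_SImon_MODEL_historical_r1i1p1f1_gn_185001-201412.nc"

ds = xcdat.open_dataset(path)      # single file: the lower-memory option observed above
# ds = xcdat.open_mfdataset(path)  # multi-file interface: peaked around 11 GB in this test
```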
Masking out the eight different regions is where the memory usage jumps from 3 to 7.5 GB, so even though I'm not making new copies directly, the program still uses more space for that step.
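A toy illustration of why that can happen (sizes and names below are made up): each regional `.where()` call returns a new array the same size as the input, so holding several regional masks at once multiplies the footprint even though nothing is copied explicitly.

```python
import numpy as np
import xarray as xr

# Hypothetical sea ice concentration field: 120 time steps on a 1x1 degree grid.
siconc = xr.DataArray(
    np.random.rand(120, 180, 360),
    dims=("time", "lat", "lon"),
    coords={"lat": np.linspace(-89.5, 89.5, 180)},
)

arctic = siconc.where(siconc.lat > 65)      # new full-size array, not a view
antarctic = siconc.where(siconc.lat < -55)  # another full-size array

print(siconc.nbytes / 1e6, "MB per array")  # ~62 MB each; three such arrays are now alive at once
```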
A plus ~3 GB / ~50% delta using a different function, whoa, this is bad! It seems to me that excessive memory usage is an issue a number of folks have encountered. The Pangeo comment (below) seems to suggest that using …
@lee1043 I think this requests that the lazy loading actually be implemented, so in the case that you choose … Edit: turns out I was a little off base; see pydata/xarray#6837 for a better description.
Update xcdat_openxml.py to use `open_dataset` instead of `open_mfdataset` to save memory, considering discussions at #1041 (comment)
@durack1 @lee1043 I've just been testing the Arctic section of the code, but by rearranging some steps and using the `del` statement heavily, I've been able to get the max memory usage down to 5.8 GB. That's roughly half of what it was originally; I don't know if we can get it much lower than that. I need to clean up the code before I can push it, but it could be ready sometime this afternoon.
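A minimal sketch of that `del` pattern (the region definitions and names below are hypothetical): keep only the small reduced result for each region and drop the full-size masked intermediate before moving on, so the peak stays closer to one working copy at a time.

```python
import gc
import numpy as np
import xarray as xr

# Hypothetical sea ice concentration field.
siconc = xr.DataArray(
    np.random.rand(120, 180, 360),
    dims=("time", "lat", "lon"),
    coords={"lat": np.linspace(-89.5, 89.5, 180)},
)

regions = {"arctic": siconc.lat > 65, "antarctic": siconc.lat < -55}  # made-up region masks
extents = {}
for name, condition in regions.items():
    masked = siconc.where(condition)                # full-size intermediate
    extents[name] = masked.sum(dim=("lat", "lon"))  # small time series we keep
    del masked                                      # release the big array before the next region
gc.collect()                                        # optionally nudge collection along
```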
This sounds like great progress. If we can get that merged into …, even better. And just to check, this 5.8 GB is for the whole notebook, even down to the multi-model examples at the bottom?
@sashakames just looping you in on this thread, as this real-world example of resource usage is probably something that would be useful in the ESGF2-US project discussions.
@durack1 No, this is 5.8 GB for just the Arctic section of the driver code, tested on the command line.
Ah ha, ok. It might be useful to run through the whole thing so we can ascertain the MAXIMUM memory usage and, if the nimbus machine can support it, bump each binder environment (likely ~40 concurrently) from 10240 MB to whatever this max usage is, plus a buffer.
@acordonez just in case it's useful, I just found https://github.com/pythonprofilers/memory_profiler/, which appears to be alive on conda-forge (here the …).
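For reference, a minimal sketch of how memory_profiler is typically used, assuming it is installed (e.g. from conda-forge); the decorated function below is a hypothetical stand-in for a memory-heavy driver step.

```python
from memory_profiler import profile

@profile
def heavy_step():
    # Hypothetical stand-in for a memory-hungry section of the driver.
    data = [float(i) for i in range(1_000_000)]
    return sum(data)

if __name__ == "__main__":
    heavy_step()
```

Running the script prints a line-by-line memory report for the decorated function (it can also be launched as `python -m memory_profiler script.py`).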
@acordonez here are a couple of other tidbits: pydata/xarray#3200 (comment) - using …
@durack1 Thanks. I'm working on a general fix for the regions, since that made a big difference for the Arctic; then I can see if any of the profilers show other spots for improvement.
Perfect! I think that while it's a little painful, we're learning a lot here. The comment I made about suppressing warnings could also be handled quickly (e.g. here), although we'll have to be careful about using this, since we could be suppressing really useful debugging info from users who are going to want to play outside of the basic notebook template:

```python
import warnings
warnings.simplefilter("ignore")
```
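If the blanket filter feels too aggressive, one hedged alternative is to scope the suppression with `warnings.catch_warnings()` so only a chosen block is silenced; the `noisy_step` function below is a hypothetical stand-in for a warning-prone driver call.

```python
import warnings

def noisy_step():
    # Hypothetical stand-in for a driver step that triggers library warnings.
    warnings.warn("example warning from a dependency")
    return 42

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    value = noisy_step()  # warning silenced inside this block only

noisy_step()              # outside the block, the previous filter state is restored
```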
@acordonez just a little Python hack: you can use …
@lee1043 @durack1 I've pushed my fix to lower the memory usage of the regional subsetting. This did result in some refactoring of the model loops, and there are some small changes to the resulting metrics values. See my main comment as well. I've pushed a rerun version of the notebook. @durack1 Adding this to the sea ice driver did not suppress warnings for me. I can try some other things later but need to run now.
|
@acordonez this looks great!
@acordonez thank you for the update! The numbers in the JSON are consistent before and after the change; they don't differ until at least the third decimal place.
@lee1043 I did spot a couple of changes (there are likely a couple of extras that I missed):
|
I think @durack1 caught the worst changes. I edited the main comment to add more details about how the calculations changed.
@lee1043 bummer, I just ran the code (pre this PR merge) and maxed out memory even with the 16384 MB limit, so this refactoring is required for the notebook to work with the updated config. Once this is merged, I'll take another try.

Edit: I just tried launching this branch through binder, and it's now out of sync with …

Edit2: bummer, I think I just managed to overwrite …

```
sea_ice_driver.py -p sea_ice_param.py --osyear 1990 --oeyear 2015 --msyear 1990 --meyear 2015
```

and this seemed to work, maxing out around ~12.5 GB through the Antarctic section. I have now tried this ~4 times or so and it seems to work every time.

Edit3: ok, great, I tested this again a couple more times, with kernel restarts each time. It works from start to end seamlessly with the temporal limitation from 1990 through 2015. I'll send a copy of the updated notebook via email, which has only the most trivial edits over the top of the current …
Also, if we add

```python
import logging
logging.getLogger('xcdat').setLevel(logging.ERROR)
```

to all files calling xcdat, then we suppress those annoying warnings, voila, this cleans up the output for users a lot. It seems to work in the sector and line plotting scripts, but dropping it into …
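A small sketch of how narrowly that targets things: only the "xcdat" logger is silenced, while other loggers keep their normal levels (the second logger name below is purely illustrative).

```python
import logging

logging.basicConfig(level=logging.INFO)

# Silence only the xcdat logger; everything else is untouched.
logging.getLogger("xcdat").setLevel(logging.ERROR)

logging.getLogger("xcdat").warning("hidden")                     # suppressed
logging.getLogger("some_other_module").warning("still visible")  # still printed
```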
@durack1 if you got this working on your end, would you be able to open a PR for it? The warning-suppression statement has not worked for me inside the PMP code.
@acordonez yes, please. By merging this PR we can start testing the binderhub, and additional edits can be handled in a separate PR.
Remove copy statements to reduce memory needs in sea ice metrics.
In local testing this reduced max memory usage for the Arctic section from 12 GB to 8 GB, so it seems to help.
Edit: The new changes I'm pushing get the max memory down to about 4 GB for me on gates, on the command line. I've refactored the code a bit, and the order of the metrics calculation has changed in the multiple-realizations case. I'm getting slightly different MSE for the climatology for some realizations, and it's not clear why; the difference is usually in the tenths or hundredths place. The figures look very similar, but you can see some small shifts in the realization spread. I can keep digging into this, but it will take me at least another day to determine whether this is a bug or the natural result of the refactor.
Edit2: Some more explanation of the refactor. Previously I would loop through all realizations and save the total ice extent timeseries for each realization. At the end of the loop, I'd get the model mean, calculate the overall total extent and climatologies, and get the metrics.
The new method gets the total mean extent and climatological extent for each realization, then averages those quantities over all models and calculates all the metrics. For the total extent the difference is quite small, but for the climatology this seems to make more of a difference in the MSE.
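As a toy illustration (made-up numbers, not the PMP code) of one way the two orderings can diverge: with identical time coverage, the climatology of the realization mean equals the mean of per-realization climatologies, but a missing month in one realization breaks that equivalence, and the difference then carries into any MSE computed from it.

```python
import numpy as np

# Two realizations of a monthly "extent" series covering two 3-month "years";
# realization 2 is missing one month.
r1 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
r2 = np.array([2.0, np.nan, 4.0, 2.0, 3.0, 4.0])

def monthly_clim(x):
    # Average the two "years" month by month, ignoring missing values.
    return np.nanmean(x.reshape(2, 3), axis=0)

old_order = monthly_clim(np.nanmean(np.stack([r1, r2]), axis=0))                # mean across realizations, then climatology
new_order = np.nanmean(np.stack([monthly_clim(r1), monthly_clim(r2)]), axis=0)  # climatology per realization, then mean

print(old_order)  # [1.5  2.25 3.5 ]
print(new_order)  # [1.5 2.5 3.5]  -> the second month differs, so an MSE built on it differs too
```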