Memory issue for combining results #111

Open

FariborzDaneshvar-NOAA (Collaborator) opened this issue Aug 24, 2023 · 0 comments
I'm using a c5n.18xlarge instance on the NHC_COLAB_2 cluster on PW to run the combine_results --schism --adcirc-like-output ./analyze command and combine SCHISM outputs, but I'm getting a memory error with large files.
For example, for a test run of Dorian with 20 ensemble members, it failed while writing the output files:

[2023-08-24 13:24:41,431] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/perturbations.nc"
[2023-08-24 13:24:41,517] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.63.nc"
[2023-08-24 13:31:44,604] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/maxele.63.nc"
[2023-08-24 13:37:25,194] parsing.schism  INFO    : writing to "/lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/analyze/fort.64.nc"
Traceback (most recent call last):
  File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/prep/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/scripts/combine_ensemble.py", line 31, in <module>
    main(parser.parse_args())
  File "/scripts/combine_ensemble.py", line 16, in main
    output = combine_results(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/client/combine_results.py", line 92, in combine_results
    parsed_data = combine_func(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/ensembleperturbation/parsing/schism.py", line 1332, in convert_schism_output_files_to_adcirc_like
    file_data.to_netcdf(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/dataset.py", line 2252, in to_netcdf
    return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/api.py", line 1255, in to_netcdf
    writes = writer.sync(compute=compute)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/backends/common.py", line 256, in sync
    delayed_store = chunkmanager.store(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/daskmanager.py", line 211, in store
    return store(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 484, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype)
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 487, in get_duck_array
    return self.array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 664, in get_duck_array
    return self.array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/core/indexing.py", line 557, in get_duck_array
    array = array.get_duck_array()
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 74, in get_duck_array
    return self.func(self.array.get_duck_array())
  File "/opt/conda/envs/prep/lib/python3.9/site-packages/xarray/coding/variables.py", line 215, in _apply_mask
    return np.where(condition, decoded_fill_value, data)
  File "<__array_function__ internals>", line 200, in where
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 3.42 GiB for an array with shape (408, 1126302, 2) and data type float32
ERROR conda.cli.main_run:execute(49): `conda run python -m combine_ensemble --ensemble-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir/ --tracks-dir /lustre/hurricanes/dorian_2019_b08ea105-53cf-4f19-8fe3-bc34fb8aee53/setup/ensemble.dir//track_files` failed. (See above for error)
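The allocation fails inside xarray's CF mask decoding (the np.where call in _apply_mask), which materializes the full (408, 1126302, 2) float32 array before the write. As a minimal sketch of one possible mitigation, assuming the data are opened as an xarray Dataset, the arrays could be kept dask-backed with explicit chunks so the decode-and-write step streams block by block instead of allocating the whole array at once. The file names and chunk sizes below are placeholders for illustration, not part of the ensembleperturbation API:

```python
import xarray as xr

# Illustrative sketch only: open one SCHISM output with explicit dask chunks so
# CF mask/scale decoding stays lazy instead of materializing the full array.
# 'out2d_1.nc' and the chunk size are assumptions, not the actual file names or
# settings used by combine_results.
ds = xr.open_dataset('out2d_1.nc', chunks={'time': 50})

# Re-chunk before writing so to_netcdf streams one time block at a time to disk
# rather than allocating the whole (time, node, 2) float32 array in memory.
ds.chunk({'time': 50}).to_netcdf('combined_subset.nc')
```

With chunking like this, dask only ever holds a few ~50-timestep blocks in memory during the write, which should stay well under the 3.42 GiB that the eager path tried to allocate.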