Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a time-slice capability to Omega IO and IOStream #189

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

philipwjones
Copy link

@philipwjones philipwjones commented Dec 20, 2024

With these changes, users can write multiple time slices of a field to a given file. This is specified through the stream input YAML configuration by adding a file frequency in addition to the data frequency. Currently this only applies to output streams. While input multi-frame reads are supported at the base IO level, I am waiting for forcing designs to determine how best to support multiple slices in an input file. Unit tests for reading/writing multiple slices from a file have been added for the base IO level and all unit tests pass. I have modified the default YAML config to write 10-day high-frequency output for two months in two monthly files and verified that correct times have been written for the time field. The user and developer docs have been updated to describe how to configure and use this feature.

Checklist

  • Documentation:
    • Design document has been generated and added to the docs
    • User's Guide has been updated
    • Documentation has been built locally and changes look as expected
  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • CTest unit tests for new features have been added per the approved design.
    • Polaris tests for new features have been added per the approved design (and included in a test suite)
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.
    • Polaris test suite has passed

   Adds a capability to read/write multiple time slices of
   an unlimited time dimension to a single file. This modifies
   just the base IO layer - IOStream support will follow in
   another commit.
   The base IO unit test and documentation have been modified
   to test and document the new functionality.
  adds the ability to write multiple time slices to a single
  file in IOStreams.
@philipwjones
Copy link
Author

Passes unit tests on Chrys/Intel and Frontier Cray cpu and gpu

Copy link

@hyungyukang hyungyukang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipwjones , this is great! This is what we need. Thanks a lot for your dedicated work on this, and I'm sorry for the delay in my review.

  • I have been testing this PR, and it performed as expected with the unit tests. However, I encountered an error while running the manufactured solution test. I will provide details on how to reproduce the error.

  • Should the time dimension and variable name time be changed to Time to match Omega's naming conventions?

components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
components/omega/src/infra/IOStream.cpp Outdated Show resolved Hide resolved
TimeUnits FileUnits = TimeUnitsFromString(FileUnitStr);

// Various error checks
if (Err1 == 0 or Err2 == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be helpful to include a log statement specifying the frequency (Freq and FreqUnits) and file frequency (FileFreq and FileFreqUnits) utilized by Omega.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we need to echo options to the log file, but I need to make some changes to the logging facility as part of the error handler so will make the changes then.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks.

@hyungyukang
Copy link

hyungyukang commented Jan 13, 2025

Here is the omega.log when running the manufactured solution test with FileFreq = 1 and FileFreqUnits = days:

omega.log (bottom half)

[* info] [IOStream.cpp:2470] Successfully read stream InitialState from file OmegaMesh.nc
[* info] [OceanRun.cpp:56] ocnRun: Time step 1 complete, clock time: 0001-01-01-00:10:00.0000
[* info] [OceanRun.cpp:56] ocnRun: Time step 2 complete, clock time: 0001-01-01-00:20:00.0000
[* info] [OceanRun.cpp:56] ocnRun: Time step 3 complete, clock time: 0001-01-01-00:30:00.0000
[* info] [OceanRun.cpp:56] ocnRun: Time step 4 complete, clock time: 0001-01-01-00:40:00.0000
[* info] [OceanRun.cpp:56] ocnRun: Time step 5 complete, clock time: 0001-01-01-00:50:00.0000
[* error] [IO.cpp:334] PIO error while reading dimension MaxCellsOnEdge
[* error] [IO.cpp:334] PIO error while reading dimension TWO
[* error] [IO.cpp:334] PIO error while reading dimension MaxEdges
[* error] [IO.cpp:334] PIO error while reading dimension maxEdges
[* error] [IO.cpp:334] PIO error while reading dimension NCells
[* error] [IO.cpp:334] PIO error while reading dimension nCells
[* error] [IO.cpp:334] PIO error while reading dimension NEdges
[* error] [IO.cpp:334] PIO error while reading dimension nEdges
[* error] [IO.cpp:334] PIO error while reading dimension NTracers
[* error] [IO.cpp:334] PIO error while reading dimension nTracers
[* error] [IO.cpp:334] PIO error while reading dimension NVertLevels
[* error] [IO.cpp:334] PIO error while reading dimension nVertLevels
[* error] [IO.cpp:334] PIO error while reading dimension NVertices
[* error] [IO.cpp:334] PIO error while reading dimension nVertices
[* error] [IO.cpp:334] PIO error while reading dimension VertexDegree
[* error] [IO.cpp:334] PIO error while reading dimension vertexDegree
[* error] [IOStream.cpp:2624] Error defined dimensions for file ocn.hist.0001-01-01_00:00:00.nc
[* error] [IOStream.cpp:310] writeAll error in stream History
[* error] [IOStream.cpp:310] writeAll error in stream InitialState
[* error] [IOStream.cpp:310] writeAll error in stream RestartRead
[* critical] [OceanRun.cpp:51] Error writing streams at end of step
[* error] [OceanDriver.cpp:47] Error advancing Omega run interval
[* info] [Tracers.cpp:249] Tracers::clear() called
[* error] [OceanDriver.cpp:61] OMEGA terminating due to error

And here is an instruction to reproduce the error on Frontier (cpu):

Error reproduce
git clone [email protected]:philipwjones/E3SM.git
git checkout omega/io-time-slice
git submodule update --init --recursive externals/ekat externals/scorpio cime

cd components/omega

mkdir build

cd build

module load cmake

export PARMETIS_ROOT=/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_4_0_crayclang_mpich/var/spack/environments/dev_polaris_0_4_0_crayclang_mpich/.spack-env/view

cmake \
   -DOMEGA_CIME_COMPILER=crayclang \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   ../

./omega_build.sh

# Copy OmegaMesh.nc (the manufactured solution init fields (200km)) and omega.yml
cp /lustre/orion/cli115/proj-shared/hgkang/shared/OmegaMesh.nc ./
cp /lustre/orion/cli115/proj-shared/hgkang/shared/omega.yml ./

salloc -A cli115 -J e3sm -t 1:00:00 -p batch -N 1

./omega_run.sh

When I commented out FileFreq and FileFreqUnits in omega.yml, the simulation completed successfully but produced several single-time frame output files as expected. I will also debug the issue.

@philipwjones
Copy link
Author

@hyungyukang Thanks, I'll look into the manufactured solution case, but it looks like it thinks the file should already exist and is trying to read dims for consistency. So it probably has to do with assumptions about whether a file was written at the alarm boundary, esp when a simulation starts on a boundary.

Regarding time vs Time, there is no way to add a std name to dimensions so the dimension has to keep time as the name for CF compliance unless we want to translate it later. So I left the time field name consistent with the dimension name, though I could keep Time with time as a std name in that case. I'll let @xylar weigh in on the preference - don't know whether we're going to have a translation step at some point anyway as part of CMORizing.

@xylar
Copy link

xylar commented Jan 13, 2025

@philipwjones, sorry for the delay on this. I'll get to this as soon as I can.

@philipwjones
Copy link
Author

Quick update - there are indeed problems with the way I've computed the frame number in some specific cases (in addition to the bug above, there is one associated with calendar-based intervals like monthly and annual). These are limited to a small chunk of code that I should be able to fix pretty quickly.

@xylar
Copy link

xylar commented Jan 16, 2025

Regarding time vs Time, there is no way to add a std name to dimensions so the dimension has to keep time as the name for CF compliance unless we want to translate it later. So I left the time field name consistent with the dimension name, though I could keep Time with time as a std name in that case. I'll let @xylar weigh in on the preference

I do feel strongly that the coordinate time and the dimension time need to have the same name. That's what lots of tools are expecting. And it should be lowercase.

don't know whether we're going to have a translation step at some point anyway as part of CMORizing.

Yes, I think that's still inevitable but we do want cf-compliant output to the extent possible even so.

Copy link

@xylar xylar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run in this directory with the following omega.yml:

$ pwd
/lcrc/group/e3sm/ac.xylar/polaris_0.5/chrysalis/test_20250116/cosine_bell_io_time_slice/ocean/spherical/icos/cosine_bell/convergence_both/with_viz/forward/480km_1440s

$ cat omega.yml
Omega:
  TimeIntegration:
    CalendarType: No Leap
    TimeStepper: RungeKutta4
    TimeStep: 0000_00:24:00.000
    StartTime: 0001-01-01_00:00:00
    StopTime: none
    RunDuration: 0024_00:00:00.000
  Dimension:
    NVertLevels: 1
  Decomp:
    HaloWidth: 3
    DecompMethod: MetisKWay
  State:
    NTimeLevels: 2
  Advection:
    FluxThicknessType: Center
    FluxTracerType: Center
  Tendencies:
    ThicknessFluxTendencyEnable: false
    PVTendencyEnable: false
    KETendencyEnable: false
    SSHTendencyEnable: false
    VelDiffTendencyEnable: false
    ViscDel2: 1.0e3
    VelHyperDiffTendencyEnable: false
    ViscDel4: 1.2e11
    TracerHorzAdvTendencyEnable: true
    TracerDiffTendencyEnable: false
    EddyDiff2: 10.0
    TracerHyperDiffTendencyEnable: false
    EddyDiff4: 0.0
    UseCustomTendency: false
    ManufacturedSolutionTendency: false
  Tracers:
    Base: [Temperature, Salinity]
    Debug: [Debug1, Debug2, Debug3]
  ManufacturedSolution:
    WavelengthX: 5.0e6
    WavelengthY: 4.33013e6
    Amplitude: 1.0
  IOStreams:
    # InitialState should only be used when starting from scratch.
    # For restart runs, the frequency units should be changed from
    # "OnStartup" to "never" so that the initial state file is not read.
    InitialState:
      UsePointerFile: false
      Filename: init.nc
      Mode: read
      Precision: double
      Freq: 1
      FreqUnits: OnStartup
      UseStartEnd: false
      Contents:
      - State
      - Tracers
    History:
      UsePointerFile: false
      Filename: output.nc
      Mode: write
      IfExists: append
      Precision: double
      Freq: 2073600
      FreqUnits: seconds
      UseStartEnd: false
      Contents:
      - Tracers
      - LayerThickness
      - NormalVelocity

I am seeing these errors:

PIO: WARNING: Opening file (output.nc) with iotype=3 (PIO_IOTYPE_NETCDF4C) failed (ierr=2, No such file or directory). Retrying with iotype=PIO_IOTYPE_NETCDF
...

PIO: FATAL ERROR: Aborting... FATAL ERROR: No such file or directory (file = output.nc) (/home/ac.xylar/e3sm_work/omega/omega/io-time-slice/externals/scorpio/src/clib/pioc_support.c: 5090)

It seems like maybe the file is getting opened in append mode even if it doesn't exist, and that PIO is unhappy with that? Or am I doing something weird?

Comment on lines 97 to +98
- Append if you want to append (eg multiple time slices) to the existing
file (this option is not currently supported).
file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipwjones, could you clarify (here or elsewhere) a bit more about what happens when you choose append? In particular, if the the model goes back in time (e.g. as a result of a restart), will time entries that already exist in the file be skipped or overwritten (or, heaven forbid, appended!) when Omega attempts to rewrite the same time entry?

In MPAS-Ocean, the clobber model refers to specific time entries, not to files. So the equivalent of replace would mean overwriting time entries as they are reached, and append would mean skipping existing time entries and only adding new ones. I am not saying that this is what Omega should do, just that we want to be clear about what Omega does do. At the moment, it seems like Omega likely does only one or the other of the MPAS-Ocean behaviors. If that is overwriting time entries even in append mode, I think that's okay. If someone changes how the model runs, the output will get replaced with the updated version. But if time entries get skipped in append mode, that might be an unexpected behavior for users fiddling around with the yaml file. They could probably get the intended behavior by deleting files, though. So again, just a question of documenting what the behavior is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, append is just an indicator that the file may already exist and that we should write to the existing file if it exists and if not, create a new one. It was really only meant for multi-slice files (for single-slice files, I think we want to replace typically). The intent for multi-slice files was to overwrite any time slices that already existed in the file if it's re-running the same interval. I will clarify in the docs. As I was debugging Hyun's problem, I found that I was only computing the frame/slice correctly for some frequency options (intervals that didn't depend on calendar like seconds, minutes, etc) so trying to fix that now for the calendar intervals.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, sounds good. That's the behavior I would like us to have.

Comment on lines +119 to +128
- **FileFreq:** An optional entry for including multiple time slices in a
single file (only supported for output presently)
- **FreqUnits:** A field that, combined with the integer frequency,
determines the frequency of new files when multiple time slices are
included in a file. The file frequency must be less than or equal to
the data frequency. Acceptable values include all of the frequency values
listed above except the one-time values (eg OnStartup/Shutdown or AtTime).
The filename template should reflect the frequency and the time used for
the file will correspond to the template at the time the file is first
opened.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should these be set to if you just want all time entries to go to a single file? Should they be omitted? Or set to a very large frequency?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Omega write contents to these files on startup (before any time steps have run)? Do we need an option for specifying whether to write on startup or not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar yes, if you want all entries in a single file, you'll need a file frequency that's very large. And no, we currently don't have a write on startup option, though you could kludge it by defining a separate stream with freq OnStartup that writes to the same file. Can you remind me what the use case is for writing on startup? Would you just need contents and metadata? Or would you actually want a full write?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MPAS-Ocean, it has been pretty standard to write out history files that include the initial state. This is particularly important for diagnostic variables that aren't part of an initial condition or restart file.

But we can cross that bridge when we come to it. I think there may be a few places in Polaris where we assume that the first output for test cases is the initial condition. We'll have to read the initial condition instead, but that's easy.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xylar yes, if you want all entries in a single file, you'll need a file frequency that's very large.

Is the fact that I didn't have FileFreq and FileFreqUnits perhaps why I got the errors I reported above? I can try adding those and running again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but there may be other bugs too. Worth a try.

And on the startup thing, it's not a huge deal to add since I already have the OnStartup flag internally. But it'll probably require the explicit flag and then not using it as freq units - requiring a bit of change in interpretation for one-time read/writes. Maybe it's clearer that way anyway.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't feel like you have to include that change here. It might be something we need down the road so it sounds good to keep it in mind.

@rljacob
Copy link
Member

rljacob commented Jan 16, 2025

Yes to following CF conventions in output which uses "time".

@@ -114,6 +116,16 @@ a template can be:
- Hours for a frequency every Freq hours (*not* Freq times per hour)
- Minutes for a frequency every Freq minutes (*not* Freq times per minute)
- Seconds for a frequency every Freq seconds (*not* Freq times per seconds)
- **FileFreq:** An optional entry for including multiple time slices in a
single file (only supported for output presently)
- **FreqUnits:** A field that, combined with the integer frequency,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **FreqUnits:** A field that, combined with the integer frequency,
- **FileFreqUnits:** A field that, combined with the integer frequency,

@xylar
Copy link

xylar commented Jan 17, 2025

@philipwjones, I'm still getting the No such file or directory (file = output.nc) error even when I added:

      FileFreq: 9999
      FileFreqUnits: years

My test is in:

/lcrc/group/e3sm/ac.xylar/polaris_0.5/chrysalis/test_20250117/cosine_bell_io_time_slice/ocean/spherical/icos/cosine_bell/convergence_both/with_viz/forward/480km_1440s

if you need a reproducer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants