-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds a time-slice capability to Omega IO and IOStream #189
base: develop
Are you sure you want to change the base?
Adds a time-slice capability to Omega IO and IOStream #189
Conversation
Adds a capability to read/write multiple time slices of an unlimited time dimension to a single file. This modifies just the base IO layer - IOStream support will follow in another commit. The base IO unit test and documentation have been modified to test and document the new functionality.
adds the ability to write multiple time slices to a single file in IOStreams.
Passes unit tests on Chrys/Intel and Frontier Cray cpu and gpu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philipwjones , this is great! This is what we need. Thanks a lot for your dedicated work on this, and I'm sorry for the delay in my review.
-
I have been testing this PR, and it performed as expected with the unit tests. However, I encountered an error while running the manufactured solution test. I will provide details on how to reproduce the error.
-
Should the time dimension and variable name
time
be changed toTime
to match Omega's naming conventions?
TimeUnits FileUnits = TimeUnitsFromString(FileUnitStr); | ||
|
||
// Various error checks | ||
if (Err1 == 0 or Err2 == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be helpful to include a log statement specifying the frequency (Freq
and FreqUnits
) and file frequency (FileFreq
and FileFreqUnits
) utilized by Omega.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we need to echo options to the log file, but I need to make some changes to the logging facility as part of the error handler so will make the changes then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thanks.
Here is the omega.log (bottom half)
And here is an instruction to reproduce the error on Frontier (cpu): Error reproduce
When I commented out |
@hyungyukang Thanks, I'll look into the manufactured solution case, but it looks like it thinks the file should already exist and is trying to read dims for consistency. So it probably has to do with assumptions about whether a file was written at the alarm boundary, esp when a simulation starts on a boundary. Regarding time vs Time, there is no way to add a std name to dimensions so the dimension has to keep time as the name for CF compliance unless we want to translate it later. So I left the time field name consistent with the dimension name, though I could keep Time with time as a std name in that case. I'll let @xylar weigh in on the preference - don't know whether we're going to have a translation step at some point anyway as part of CMORizing. |
@philipwjones, sorry for the delay on this. I'll get to this as soon as I can. |
Quick update - there are indeed problems with the way I've computed the frame number in some specific cases (in addition to the bug above, there is one associated with calendar-based intervals like monthly and annual). These are limited to a small chunk of code that I should be able to fix pretty quickly. |
I do feel strongly that the coordinate
Yes, I think that's still inevitable but we do want cf-compliant output to the extent possible even so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to run in this directory with the following omega.yml
:
$ pwd
/lcrc/group/e3sm/ac.xylar/polaris_0.5/chrysalis/test_20250116/cosine_bell_io_time_slice/ocean/spherical/icos/cosine_bell/convergence_both/with_viz/forward/480km_1440s
$ cat omega.yml
Omega:
TimeIntegration:
CalendarType: No Leap
TimeStepper: RungeKutta4
TimeStep: 0000_00:24:00.000
StartTime: 0001-01-01_00:00:00
StopTime: none
RunDuration: 0024_00:00:00.000
Dimension:
NVertLevels: 1
Decomp:
HaloWidth: 3
DecompMethod: MetisKWay
State:
NTimeLevels: 2
Advection:
FluxThicknessType: Center
FluxTracerType: Center
Tendencies:
ThicknessFluxTendencyEnable: false
PVTendencyEnable: false
KETendencyEnable: false
SSHTendencyEnable: false
VelDiffTendencyEnable: false
ViscDel2: 1.0e3
VelHyperDiffTendencyEnable: false
ViscDel4: 1.2e11
TracerHorzAdvTendencyEnable: true
TracerDiffTendencyEnable: false
EddyDiff2: 10.0
TracerHyperDiffTendencyEnable: false
EddyDiff4: 0.0
UseCustomTendency: false
ManufacturedSolutionTendency: false
Tracers:
Base: [Temperature, Salinity]
Debug: [Debug1, Debug2, Debug3]
ManufacturedSolution:
WavelengthX: 5.0e6
WavelengthY: 4.33013e6
Amplitude: 1.0
IOStreams:
# InitialState should only be used when starting from scratch.
# For restart runs, the frequency units should be changed from
# "OnStartup" to "never" so that the initial state file is not read.
InitialState:
UsePointerFile: false
Filename: init.nc
Mode: read
Precision: double
Freq: 1
FreqUnits: OnStartup
UseStartEnd: false
Contents:
- State
- Tracers
History:
UsePointerFile: false
Filename: output.nc
Mode: write
IfExists: append
Precision: double
Freq: 2073600
FreqUnits: seconds
UseStartEnd: false
Contents:
- Tracers
- LayerThickness
- NormalVelocity
I am seeing these errors:
PIO: WARNING: Opening file (output.nc) with iotype=3 (PIO_IOTYPE_NETCDF4C) failed (ierr=2, No such file or directory). Retrying with iotype=PIO_IOTYPE_NETCDF
...
PIO: FATAL ERROR: Aborting... FATAL ERROR: No such file or directory (file = output.nc) (/home/ac.xylar/e3sm_work/omega/omega/io-time-slice/externals/scorpio/src/clib/pioc_support.c: 5090)
It seems like maybe the file is getting opened in append mode even if it doesn't exist, and that PIO is unhappy with that? Or am I doing something weird?
- Append if you want to append (eg multiple time slices) to the existing | ||
file (this option is not currently supported). | ||
file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philipwjones, could you clarify (here or elsewhere) a bit more about what happens when you choose append
? In particular, if the the model goes back in time (e.g. as a result of a restart), will time entries that already exist in the file be skipped or overwritten (or, heaven forbid, appended!) when Omega attempts to rewrite the same time entry?
In MPAS-Ocean, the clobber model refers to specific time entries, not to files. So the equivalent of replace
would mean overwriting time entries as they are reached, and append
would mean skipping existing time entries and only adding new ones. I am not saying that this is what Omega should do, just that we want to be clear about what Omega does do. At the moment, it seems like Omega likely does only one or the other of the MPAS-Ocean behaviors. If that is overwriting time entries even in append mode, I think that's okay. If someone changes how the model runs, the output will get replaced with the updated version. But if time entries get skipped in append mode, that might be an unexpected behavior for users fiddling around with the yaml file. They could probably get the intended behavior by deleting files, though. So again, just a question of documenting what the behavior is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, append is just an indicator that the file may already exist and that we should write to the existing file if it exists and if not, create a new one. It was really only meant for multi-slice files (for single-slice files, I think we want to replace typically). The intent for multi-slice files was to overwrite any time slices that already existed in the file if it's re-running the same interval. I will clarify in the docs. As I was debugging Hyun's problem, I found that I was only computing the frame/slice correctly for some frequency options (intervals that didn't depend on calendar like seconds, minutes, etc) so trying to fix that now for the calendar intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sounds good. That's the behavior I would like us to have.
- **FileFreq:** An optional entry for including multiple time slices in a | ||
single file (only supported for output presently) | ||
- **FreqUnits:** A field that, combined with the integer frequency, | ||
determines the frequency of new files when multiple time slices are | ||
included in a file. The file frequency must be less than or equal to | ||
the data frequency. Acceptable values include all of the frequency values | ||
listed above except the one-time values (eg OnStartup/Shutdown or AtTime). | ||
The filename template should reflect the frequency and the time used for | ||
the file will correspond to the template at the time the file is first | ||
opened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should these be set to if you just want all time entries to go to a single file? Should they be omitted? Or set to a very large frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Omega write contents to these files on startup (before any time steps have run)? Do we need an option for specifying whether to write on startup or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar yes, if you want all entries in a single file, you'll need a file frequency that's very large. And no, we currently don't have a write on startup option, though you could kludge it by defining a separate stream with freq OnStartup that writes to the same file. Can you remind me what the use case is for writing on startup? Would you just need contents and metadata? Or would you actually want a full write?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In MPAS-Ocean, it has been pretty standard to write out history files that include the initial state. This is particularly important for diagnostic variables that aren't part of an initial condition or restart file.
But we can cross that bridge when we come to it. I think there may be a few places in Polaris where we assume that the first output for test cases is the initial condition. We'll have to read the initial condition instead, but that's easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xylar yes, if you want all entries in a single file, you'll need a file frequency that's very large.
Is the fact that I didn't have FileFreq
and FileFreqUnits
perhaps why I got the errors I reported above? I can try adding those and running again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but there may be other bugs too. Worth a try.
And on the startup thing, it's not a huge deal to add since I already have the OnStartup flag internally. But it'll probably require the explicit flag and then not using it as freq units - requiring a bit of change in interpretation for one-time read/writes. Maybe it's clearer that way anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't feel like you have to include that change here. It might be something we need down the road so it sounds good to keep it in mind.
Yes to following CF conventions in output which uses "time". |
@@ -114,6 +116,16 @@ a template can be: | |||
- Hours for a frequency every Freq hours (*not* Freq times per hour) | |||
- Minutes for a frequency every Freq minutes (*not* Freq times per minute) | |||
- Seconds for a frequency every Freq seconds (*not* Freq times per seconds) | |||
- **FileFreq:** An optional entry for including multiple time slices in a | |||
single file (only supported for output presently) | |||
- **FreqUnits:** A field that, combined with the integer frequency, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- **FreqUnits:** A field that, combined with the integer frequency, | |
- **FileFreqUnits:** A field that, combined with the integer frequency, |
@philipwjones, I'm still getting the
My test is in:
if you need a reproducer. |
With these changes, users can write multiple time slices of a field to a given file. This is specified through the stream input YAML configuration by adding a file frequency in addition to the data frequency. Currently this only applies to output streams. While input multi-frame reads are supported at the base IO level, I am waiting for forcing designs to determine how best to support multiple slices in an input file. Unit tests for reading/writing multiple slices from a file have been added for the base IO level and all unit tests pass. I have modified the default YAML config to write 10-day high-frequency output for two months in two monthly files and verified that correct times have been written for the time field. The user and developer docs have been updated to describe how to configure and use this feature.
Checklist