Removing units from Quantity #65
Comments
Any xarray user code can do this by defining the units expected in its interfaces, and setting the appropriate units metadata on the output. The idea of including this attribute was to encourage such user code. What you described as a "workaround" isn't a workaround in my opinion, it's an important part of the code which defines the units interface as code. In the code you linked, I would argue the code to supply units that was added should have been added whether it was using Quantity or only using DataArray before writing to netcdf.
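For anyone following along, here is a minimal sketch of what "defining the units interface as code" could look like. It uses a plain stand-in class rather than xarray's DataArray so that it runs on its own; `Field`, `require_units`, and `celsius_to_kelvin` are hypothetical names chosen for illustration, not part of any library discussed here.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Stand-in for an xarray.DataArray: values plus an attrs dict (assumption)."""

    values: list
    attrs: dict = field(default_factory=dict)


def require_units(arr: Field, expected: str) -> None:
    """Raise if the input does not carry the units this interface expects."""
    actual = arr.attrs.get("units")
    if actual != expected:
        raise ValueError(f"expected units {expected!r}, got {actual!r}")


def celsius_to_kelvin(temperature: Field) -> Field:
    """Example interface: declares its input units and sets its output units."""
    require_units(temperature, "degC")
    kelvin = [v + 273.15 for v in temperature.values]
    # The output units are set here, at the interface, not patched in before I/O.
    return Field(kelvin, attrs={"units": "K"})
```

With this pattern the units on the output are always set by the function that knows them, whether or not the result later ends up in a Quantity or a netCDF file.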
Quantity is an array combined with the metadata you need to understand its purpose and use in the model. The one piece of information still missing (which @ofuhrer has advocated to add, and we still might) is something like a In the case of MPI, the communicator is also responsible for passing metadata, e.g. scattering this units attribute when doing scatter operations. The docs outline (and discourage) that if you really don't want to manage units, you can set units to an empty string or other
I agree that units could be enforced before I/O to provide full metadata for downstream users. However, the conversion from numpy array/DataArray to Quantity is
The feature would have more benefit if
I'd be kind of excited to add units-aware arithmetic support, it's always been a long-term goal. The hardest part is coming up with an API set that's small enough to maintain and keep backwards-compatible but large enough to cover most of our use cases. Early on we didn't have enough experience to determine this, but maybe we could now.
I think there is a middle ground between dumping units entirely (which is really annoying when inspecting fields or doing I/O) and ensuring that they are always up to date by overloading arithmetic operations between Quantity objects and doing units checking.
To me the "and" in this sentence indicates that these features maybe should be more decoupled in the code than they currently are, so that the limitations/assumptions of one do not limit the usability of the other.
To me, a layered strategy where there is a quantity w/o metadata, and a quantity w/ metadata would be the best of both worlds. The former could be used for low-level model internals, and the latter for I/O, etc.
This is already supported - any user package (including the prognostic run) can make a pretty small Quantity subclass, or even a factory function, that automatically sets units to "unknown".
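A rough sketch of both options, for illustration only. The `Quantity` class below is a simplified stand-in so the example is self-contained; the real constructor may take different or additional arguments. `UnknownUnitsQuantity` and `quantity_without_units` are hypothetical names.

```python
class Quantity:
    """Simplified stand-in for the library's Quantity (assumed signature)."""

    def __init__(self, data, dims, units):
        self.data = data
        self.dims = tuple(dims)
        self.units = units


class UnknownUnitsQuantity(Quantity):
    """Option 1: a small subclass whose units default to "unknown"."""

    def __init__(self, data, dims, units="unknown"):
        super().__init__(data, dims, units)


def quantity_without_units(data, dims):
    """Option 2: a factory function that fills in "unknown" for the caller."""
    return Quantity(data, dims, units="unknown")
```

Either way, code that does not want to track units never has to pass them, while anything downstream still sees a units attribute and can tell that it was not set deliberately.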
Part of the purpose of Quantity is to help debug high-level model internals. Low-level model internals use
Just wanted to write the same thing. While the single responsibility principle is a good one, here I would actually think that just not setting the units is a fine approach instead of having two separate concepts.
Would this work with set_state in the wrapper? Does that routine make sure that the units match the values in the json files?
Currently yes. If we ever added unit validation in the wrapper set_state, e.g. to automatically convert units, I would expect we at least offer a
The idea of having a special units metadata is well motivated. In a world where all (or even most) computations preserved units metadata, it would make sense to keep this metadata around. However, in practice most python code doesn't do this, so this required piece of "metadata" often raises errors and leads to workarounds like this: ai2cm/fv3net#1145 (review).
Also, my understanding is that Quantity mostly represents arrays of numbers that can be passed around by MPI. This concern is different from the concern of enforcing metadata requirements and "physical correctness". Can we do away with the units attribute, or possibly make it optional, or make a QuantityWithUnits subclass of Quantity?
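To make the last option concrete, here is a minimal sketch of the proposed split, assuming a units-free base class and a subclass that requires units. This is not how the library is currently structured; the constructor arguments and the `units_of` helper are hypothetical.

```python
class Quantity:
    """Hypothetical units-free base: only data and dimension names."""

    def __init__(self, data, dims):
        self.data = data
        self.dims = tuple(dims)


class QuantityWithUnits(Quantity):
    """Hypothetical subclass that makes units a required piece of metadata."""

    def __init__(self, data, dims, units):
        if not isinstance(units, str) or not units:
            raise ValueError("units must be a non-empty string")
        super().__init__(data, dims)
        self.units = units


def units_of(quantity, default=None):
    """Read units if present, so code can accept either class."""
    return getattr(quantity, "units", default)
```

Under this layout, MPI-style code would only depend on the base class, and I/O code could require QuantityWithUnits where metadata matters.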