Ensure log entries use consistent ordering and types for columns #1404
In addition to the changes described above, this PR now also includes some more general changes / clean-up to the logging modules
A remaining question is whether we want the current |
That sounds like a good idea, at least to start. We can monitor the warnings on the nightly scale run.
Now changed the exception to a warning.
All looks good - one minor suggestion (not a big deal).
As mentioned in #1227 (comment), there are currently structured log entries where the `columns` value computed for the header entry (from the first log instance for a key) is not consistent with the `columns` value that would be computed from subsequent log instances for that key. For example, the `columns` dicts may have keys (column names) in different orders, different values (types for a given key), or completely non-overlapping key-value pairs.

This PR makes a series of related changes:

- The `columns` dict values in the header log entry are now stored in the logger object and compared to the corresponding dict computed for subsequent log entries with the same key. If these do not match, a new exception type, `InconsistentLoggedColumnsError`, is raised with details of the differences.
- […] `tlo.logging.helpers` for logging the properties of an individual in the population dataframe as a dictionary in a way that ensures stability of the types of the property values. This is achieved by returning the NumPy / pandas extension scalar types associated with the datatype of the array underlying a particular column / property, except for nullable booleans and categoricals, for which there is no corresponding scalar type and a length-1 array is used instead. The JSON encoding rules for these types are also updated accordingly.
- […] `float` or `str`) and in some cases to ensure keys were aligned across different logger calls.

Hopefully, with the fixes here, we should be able to achieve consistent logging when suspending and resuming a simulation using the changes in #1227, compared to running contiguously.
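For readers unfamiliar with the check, here is a minimal sketch of comparing the stored header `columns` against those of later entries for the same key. It is not the code in this PR: `ColumnChecker`, `columns_of`, `describe_differences` and `InconsistentLoggedColumnsWarning` are hypothetical names, column types are approximated by Python type names, and a mismatch is reported as a warning rather than an exception.

```python
import warnings


class InconsistentLoggedColumnsWarning(UserWarning):
    """Hypothetical warning category used only in this sketch."""


def columns_of(entry):
    """Map each column name to the name of its value's Python type."""
    return {name: type(value).__name__ for name, value in entry.items()}


def describe_differences(header, current):
    """List the ways the `current` columns differ from the `header` columns."""
    differences = []
    missing = [name for name in header if name not in current]
    extra = [name for name in current if name not in header]
    if missing:
        differences.append(f"missing columns: {missing}")
    if extra:
        differences.append(f"extra columns: {extra}")
    for name in header:
        if name in current and header[name] != current[name]:
            differences.append(
                f"column {name!r} has type {current[name]}, expected {header[name]}"
            )
    # Dicts preserve insertion order, so the key order is compared explicitly.
    if not missing and not extra and list(header) != list(current):
        differences.append(f"column order {list(current)} != {list(header)}")
    return differences


class ColumnChecker:
    """Stores the columns seen for the first entry of each key (assumed design)."""

    def __init__(self):
        self._header_columns = {}

    def check(self, key, entry):
        columns = columns_of(entry)
        if key not in self._header_columns:
            # First entry for this key: its columns define the header.
            self._header_columns[key] = columns
            return []
        differences = describe_differences(self._header_columns[key], columns)
        if differences:
            warnings.warn(
                f"Inconsistent columns for key {key!r}: " + "; ".join(differences),
                InconsistentLoggedColumnsWarning,
                stacklevel=2,
            )
        return differences
```

Calling `check` with `{"a": 1, "b": 2.0}` and then `{"b": 2.0, "a": "x"}` for the same key would report both a type change for `a` and a change in column order.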