-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for saving and loading simulation state to / from files #1227
Conversation
Avoids deadlock on trying to acquire lock on loaded file handler
Ensures event not lost in partial simulations
Allows use in fixtures with non-function scope
Has side effect of mutating counter
Better to be explicit
Yes, we want new log files when restoring simulation because you'd potentially run many simulations from the same saved simulation. And they'd go in as separate Azure Batch runs, so different directories etc. |
I'm working on scenarios using this - will give comments in light of that. |
@tamuri what would be the next steps to work on for this? I think you mentioned there was some issue with non-determinancy in logging your testing with this identified. Did you get anyhere with looking at how to use with scenarios, and is there something for me to pick up there? |
I've pushed my changes to |
I've started to look at differences that arise in logs formed from either a 'full' scenario run without any suspending or resuming, or the merged logs from a pair of suspended and resumed runs (but otherwise identical scenario settings), using the scenario files and script @tamuri added on the branch From what I can see so far, most (possibly) of the discrepancies arise from bugs that also effect logging without suspent and resuming, specifically that the
The ordering issues (1) are easy enough to resolve by always sorting the data dictionary by key before logging both the header and value messages. The non-overlapping dict keys (2) will probably require manual fixing in each case as the current structured logging approach fundamentally relies on entries being alignable with each other. For the non-constant column types (3) I am not sure what the implication is - often this is for example a value initially with |
The tests in
In terms of the checks of log consistency, with some updates to the script @tamuri created (now on branch
I haven't yet figured out why this difference is occuring. Full output from script
|
With the fixes in #1445 and #1446 the check script for the logs in the continuous and interrupted simulations now show no differences 🎉 (beyond expected differences in I've now pulled in changes to I've also added a function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work to get to this stage, nice job! I've run it locally a few times on my own [small] scenarios without issues. I think it's good to go in and we can start thinking about the next step of integrating with scenarios being run on Batch 😅
Some minor suggestions
Potentially resolves #86 though currently this doesn't deal with exposing this functionality with scenarios.
Adds new methods
save_to_pickle
andload_from_pickle
that respectively save the current simulation state to a pickle file and load the simulation state from a pickle file (with the latter being a class method to allow using to directly load a simulation asSimulation.load_from_pickle
. Pickling is dealt with bydill
as this supports a much wider range of Python objects than built inpickle
implementation.dill
is added to package dependencies here, but the import is currently wrapped with logic to avoidImportError
s in environments which do not havedill
available (withsave_to_pickle
andload_from_pickle
raising informative exceptions in this case).The contents of the current
Simulation.simulate
method have also been factored out in to three separate methodsSimulation.initialise
,Simulation.run_simulation_to
andSimulation.finalise
that deal allow initialising, running and finalising the simulation separately, withSimulation.simulate
retaining the same behaviour by just calling the three in sequence. This allows simulations to be partially run to an intermediate date before the simulation end date, saved to file and then reloaded and continued.As there is also global state recorded in
tlo.logging
we also need to reconfigure logging on loading a simulation from file. I initially included some logic inload_from_pickle
to inject loaded simulation in totlo
logger and set output file to previous logging path (loggingFileHandler
loaded bydill
is not able to acquire a lock causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.A set of tests that check for consistency of simulations when saving and loading from file, including using to resume a partially run simulation, are also added.
Apologies for the unrelated formatting changes
src/tlo/simulation.py
, I was usingblack
to autoformat my changes and forgot it would also reformat rest of module - I can revert these bits if it makes a pain to review.Questions
save_to_pickle
andload_from_pickle
and should we just use the namessave
andload
?simulation.output_file
toNone
inSimulation.load_from_pickle
to guard against accidental use of previous log file (and potential deadlock issues)?