Add support for saving and loading simulation state to / from files #1227

matt-graham · 2023-12-11T17:59:16Z

Potentially resolves #86 though currently this doesn't deal with exposing this functionality with scenarios.

Adds new methods save_to_pickle and load_from_pickle that respectively save the current simulation state to a pickle file and load the simulation state from a pickle file (with the latter being a class method so that a simulation can be loaded directly as Simulation.load_from_pickle). Pickling is handled by dill, as this supports a much wider range of Python objects than the built-in pickle implementation. dill is added to the package dependencies here, but the import is currently wrapped with logic to avoid ImportErrors in environments which do not have dill available (with save_to_pickle and load_from_pickle raising informative exceptions in this case).
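As a rough illustration of the pattern described above (a guarded dill import plus a save method and a loading class method), here is a minimal sketch. ToySimulation, DILL_AVAILABLE and the exception messages are hypothetical names made up for this example; the actual implementation in src/tlo/simulation.py may differ.

```python
# Guarded import: the module still imports when dill is missing, and the
# save / load methods raise an informative error instead.
try:
    import dill

    DILL_AVAILABLE = True
except ImportError:
    dill = None
    DILL_AVAILABLE = False


class ToySimulation:
    """Hypothetical stand-in for Simulation (not the real tlo class)."""

    def __init__(self, seed):
        self.seed = seed
        self.date = 0
        # A lambda attribute: built-in pickle cannot serialise this, dill can.
        self.step = lambda date: date + 1

    def save_to_pickle(self, pickle_path):
        """Save the current state of the simulation to a pickle file."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot save to pickle: dill is not installed")
        with open(pickle_path, "wb") as pickle_file:
            dill.dump(self, pickle_file)

    @classmethod
    def load_from_pickle(cls, pickle_path):
        """Load a simulation from a pickle file written by save_to_pickle."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot load from pickle: dill is not installed")
        with open(pickle_path, "rb") as pickle_file:
            return dill.load(pickle_file)
```

Because load_from_pickle is a class method, a saved simulation can be restored with ToySimulation.load_from_pickle(path) without constructing an instance first.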

The contents of the current Simulation.simulate method have also been factored out into three separate methods, Simulation.initialise, Simulation.run_simulation_to and Simulation.finalise, that allow initialising, running and finalising the simulation separately, with Simulation.simulate retaining the same behaviour by just calling the three in sequence. This allows simulations to be partially run to an intermediate date before the simulation end date, saved to file and then reloaded and continued.
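A minimal sketch of this three-stage split, using a toy event queue. Everything here is an assumption for illustration: ToySimulation, the integer dates, and the convention that only events strictly before to_date are fired are not taken from the real Simulation class. The guards loosely mirror the commit titles in this PR (errors when running without initialising or initialising twice, an exception when running past the end date, and checking the next event date before popping).

```python
import heapq
import itertools


class ToySimulation:
    """Hypothetical stand-in; dates are integer day counts for brevity."""

    def __init__(self, end_date):
        self.date = 0
        self.end_date = end_date
        self.fired = []  # record of (date, name) for events that have run
        self.finalised = False
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal dates
        self._initialised = False

    def schedule(self, date, name):
        heapq.heappush(self._queue, (date, next(self._counter), name))

    def initialise(self):
        if self._initialised:
            raise RuntimeError("initialise cannot be called more than once")
        self._initialised = True

    def run_simulation_to(self, to_date):
        if not self._initialised:
            raise RuntimeError("initialise must be called before running")
        if to_date > self.end_date:
            raise ValueError("to_date is after the simulation end date")
        # Peek at the next event date *before* popping, so an event due on or
        # after to_date stays in the queue for a later (resumed) run.
        while self._queue and self._queue[0][0] < to_date:
            date, _, name = heapq.heappop(self._queue)
            self.date = date
            self.fired.append((date, name))
        self.date = to_date

    def finalise(self):
        self.finalised = True

    def simulate(self):
        # Same behaviour as before the split: the three stages in sequence.
        self.initialise()
        self.run_simulation_to(self.end_date)
        self.finalise()
```

With this split, calling run_simulation_to with an intermediate date and then again with the end date fires the same events as a single call to simulate.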

As there is also global state recorded in tlo.logging we also need to reconfigure logging on loading a simulation from file. I initially included some logic in load_from_pickle to inject loaded simulation in to tlo logger and set output file to previous logging path (logging FileHandler loaded by dill is not able to acquire a lock causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.
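To make the idea of explicit logging reconfiguration concrete, here is a small sketch using only the standard library. It is not the tlo.logging implementation: ToySimulation, configure_logging, the "toy_simulation" logger name and the use of built-in pickle (instead of dill) are all assumptions chosen to keep the example self-contained. The file handler is dropped from the pickled state, so a loaded simulation has no output file until one is configured explicitly.

```python
import logging
import pickle

# Module-level (global) logger, standing in for the global state in tlo.logging.
logger = logging.getLogger("toy_simulation")
logger.setLevel(logging.INFO)
logger.propagate = False


class ToySimulation:
    """Hypothetical stand-in for a simulation that writes to a log file."""

    def __init__(self):
        self.date = 0
        self.log_handler = None

    def configure_logging(self, log_path):
        """Attach a fresh file handler writing to log_path."""
        self.close_output_file()
        self.log_handler = logging.FileHandler(log_path)
        logger.addHandler(self.log_handler)

    def close_output_file(self):
        """Detach and close the current file handler, if any."""
        if self.log_handler is not None:
            logger.removeHandler(self.log_handler)
            self.log_handler.close()
            self.log_handler = None

    def __getstate__(self):
        # Never serialise the handler: it wraps an open file and a lock.
        state = self.__dict__.copy()
        state["log_handler"] = None
        return state


def load_simulation(pickled_bytes):
    """Restore a simulation; logging must then be configured explicitly."""
    return pickle.loads(pickled_bytes)
```

A resumed run would then call configure_logging with a new path before continuing, which matches the expectation that each resumed simulation writes to its own log file.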

A set of tests that check for consistency of simulations when saving and loading from file, including using to resume a partially run simulation, are also added.

Apologies for the unrelated formatting changes in src/tlo/simulation.py. I was using black to autoformat my changes and forgot it would also reformat the rest of the module. I can revert these bits if they make it a pain to review.

Questions

Instead of save_to_pickle and load_from_pickle, should we just use the names save and load?
Should we explicitly set simulation.output_file to None in Simulation.load_from_pickle to guard against accidental use of previous log file (and potential deadlock issues)?

matt-graham · 2023-12-12T09:53:53Z

Failing test in tests/test_malaria.py is due to #1230 which #1231 should fix

tamuri · 2023-12-12T10:35:54Z

As there is also global state recorded in tlo.logging we also need to reconfigure logging on loading a simulation from file. I initially included some logic in load_from_pickle to inject loaded simulation in to tlo logger and set output file to previous logging path (logging FileHandler loaded by dill is not able to acquire a lock causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.

Yes, we want new log files when restoring simulation because you'd potentially run many simulations from the same saved simulation. And they'd go in as separate Azure Batch runs, so different directories etc.

tamuri · 2023-12-18T08:50:46Z

I'm working on scenarios using this - will give comments in light of that.

matt-graham · 2024-04-08T10:55:52Z

@tamuri what would be the next steps to work on for this? I think you mentioned that your testing with this identified some issue with non-determinism in logging. Did you get anywhere with looking at how to use this with scenarios, and is there something for me to pick up there?

tamuri · 2024-05-18T17:58:03Z

I've pushed my changes to scenario.py here, as well as three scenario files and a script to check log output matches.

matt-graham · 2024-05-23T16:20:15Z

I've started to look at differences that arise in logs formed from either a 'full' scenario run without any suspending or resuming, or the merged logs from a pair of suspended and resumed runs (but otherwise identical scenario settings), using the scenario files and script @tamuri added on the branch tamuri/suspend-restore-scenario.

From what I can see so far, most of the discrepancies appear to arise from bugs that also affect logging without suspending and resuming, specifically that the columns entry logged in the header message for a first log entry is not consistent with later log entries, because of one or more of:

1. Keys in (dict / dataframe) data logged being in different orders between first and subsequent calls to logger
2. Keys in (dict / dataframe) data logged on first and subsequent calls to logger differ (this seems to typically be when logging a multi-index series generated by a group-by operation and converted to a dict, where not all combinations of the index are present on each log iteration, for example if grouping on age_years one or more specific age_years values may be missing for any given set of data)
3. Values (types) associated with data logged on first and subsequent calls differ (this one is very common)

The ordering issues (1) are easy enough to resolve by always sorting the data dictionary by key before logging both the header and value messages.
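A minimal sketch of the sorting fix for the ordering issue, assuming the logged data is a plain dict and the header records a column name to type name mapping. The function name header_and_row and the exact header layout are hypothetical and not the tlo.logging API.

```python
def header_and_row(data):
    """Return (columns, values) for a logged dict, with keys in sorted order.

    Sorting by key before building both the header and the value message
    means the column order no longer depends on dict insertion order.
    """
    ordered = dict(sorted(data.items()))
    columns = {key: type(value).__name__ for key, value in ordered.items()}
    values = list(ordered.values())
    return columns, values


# Same keys, different insertion order on the first and a later call.
first = {"b": 1, "a": 2.0}
later = {"a": 3.0, "b": 4}

first_columns, first_values = header_and_row(first)
later_columns, later_values = header_and_row(later)
```

Here first_columns and later_columns list the keys in the same order, so the values from the later call line up with the header written on the first call. Sorting does nothing for differing key sets or differing value types, which need separate handling.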

The non-overlapping dict keys (2) will probably require manual fixing in each case as the current structured logging approach fundamentally relies on entries being alignable with each other.

For the non-constant column types (3) I am not sure what the implication is - often this is for example a value initially with int type being subsequently float, or bool being subsequently NoneType, along with some cases of types swapping between scalar types and lists (mainly in RTI module for the latter).

matt-graham · 2024-07-25T17:00:49Z

The tests in tests/test_simulation.py and tests/test_healthsystem.py failing here seem to be failing due to the runner being out of disk space

OSError: [Errno 28] No space left on device

In terms of the checks of log consistency, with some updates to the script @tamuri created (now on branch mmg/suspend-restore-scenario), after merging in the changes from #1404 we seem to get almost consistent logs other than a remaining difference in consumables item_codes_not_recognised:

Column Undernutrition_Feeding row 0: [1221] vs [1171]

I haven't yet figured out why this difference is occurring.

Full output from script

Full run path: outputs/full-simple-2024-07-24T130128Z
Suspended run path: outputs/suspend-simple-2024-07-24T161415Z
Resumed run path: outputs/resume-simple-2024-07-24T163556Z
================================================================================
Processing tlo.methods.copd.pickle
        Key: copd_prevalence
                No differences
================================================================================
Processing tlo.methods.alri.pickle
        Key: incidence_count_by_age_and_pathogen
                No differences
        Key: event_counts
                No differences
================================================================================
Processing tlo.methods.rti.pickle
        Key: Inj_category_incidence
                No differences
        Key: Injury_information
                No differences
        Key: Open_fracture_information
                No differences
        Key: Percent_of_shock_in_rti
                No differences
        Key: number_of_injuries_in_hospital
                No differences
        Key: Requested_Pain_Management
                No differences
        Key: Successful_Pain_Management
                No differences
        Key: injury_severity
                No differences
        Key: summary_1m
                No differences
        Key: rti_demography
                No differences
        Key: model_progression
                No differences
        Key: RTI_Death_Injury_Profile
                No differences
================================================================================
Processing tlo.methods.tb.pickle
        Key: tb_incidence
                No differences
        Key: tb_prevalence
                No differences
        Key: tb_mdr
                No differences
        Key: tb_treatment
                No differences
        Key: tb_treatment_delays
                No differences
        Key: tb_false_positive
                No differences
================================================================================
Processing tlo.methods.enhanced_lifestyle.pickle
        Key: li_urban
                No differences
        Key: li_wealth
                No differences
        Key: li_low_ex
                No differences
        Key: li_tob
                No differences
        Key: li_ex_alc
                No differences
        Key: li_mar_stat
                No differences
        Key: li_in_ed
                No differences
        Key: li_ed_lev
                No differences
        Key: li_unimproved_sanitation
                No differences
        Key: li_no_clean_drinking_water
                No differences
        Key: li_wood_burn_stove
                No differences
        Key: li_no_access_handwashing
                No differences
        Key: li_high_salt
                No differences
        Key: li_high_sugar
                No differences
        Key: li_bmi
                No differences
        Key: li_is_circ
                No differences
        Key: li_is_sexworker
                No differences
================================================================================
Processing tlo.methods.measles.pickle
        Key: incidence
                No differences
        Key: measles_symptoms
                No differences
        Key: pop_age_range
                No differences
        Key: measles_incidence_age_range
                No differences
================================================================================
Processing tlo.methods.demography.detail.pickle
        Key: properties_of_deceased_persons
                No differences
================================================================================
Processing tlo.methods.diarrhoea.pickle
        Key: incident_case
                No differences
        Key: end_of_case
                No differences
================================================================================
Processing tlo.simulation.pickle
================================================================================
Processing tlo.methods.healthburden.pickle
        Key: disability_mapper_from_tlo_cause_to_common_label
                No differences
        Key: disability_mapper_from_gbd_cause_to_common_label
                No differences
        Key: daly_mapper_from_tlo_cause_to_common_label
                No differences
        Key: daly_mapper_from_gbd_cause_to_common_label
                No differences
        Key: yld_by_causes_of_disability
                No differences
        Key: yll_by_causes_of_death
                No differences
        Key: yll_by_causes_of_death_stacked
                No differences
        Key: yll_by_causes_of_death_stacked_by_age_and_time
                No differences
        Key: dalys
                No differences
        Key: dalys_stacked
                No differences
        Key: dalys_stacked_by_age_and_time
                No differences
        Key: dalys_by_wealth_stacked_by_age_and_time
                No differences
================================================================================
Processing tlo.methods.care_of_women_during_pregnancy.pickle
        Key: anc_interventions
                No differences
        Key: anc_count_on_birth
                No differences
        Key: anc_visits_which_ran
                No differences
================================================================================
Processing tlo.methods.malaria.pickle
        Key: rdt_log
                No differences
        Key: prev_district
                No differences
        Key: pop_district
                No differences
        Key: incidence
                No differences
        Key: status_counts
                No differences
        Key: prevalence
                No differences
        Key: coinfection_prevalence
                No differences
        Key: tx_coverage
                No differences
================================================================================
Processing tlo.methods.postnatal_supervisor.pickle
        Key: total_neo_pnc_visits
                No differences
        Key: total_mat_pnc_visits
                No differences
        Key: newborn_complication
                No differences
        Key: maternal_complication
                No differences
================================================================================
Processing tlo.methods.contraception.pickle
        Key: contraception_use_summary
                No differences
        Key: contraception_use_summary_by_age
                No differences
        Key: pregnancy
                No differences
        Key: contraception_change
                No differences
================================================================================
Processing tlo.methods.breast_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.demography.pickle
        Key: mapper_from_tlo_cause_to_common_label
                No differences
        Key: mapper_from_gbd_cause_to_common_label
                No differences
        Key: other_deaths
                No differences
        Key: scaling_factor
                No differences
        Key: population
                No differences
        Key: age_range_m
                No differences
        Key: age_range_f
                No differences
        Key: num_children
                No differences
        Key: person_years
                No differences
        Key: on_birth
                No differences
        Key: death
                No differences
================================================================================
Processing tlo.methods.newborn_outcomes.pickle
        Key: newborn_complication
                No differences
        Key: postnatal_check
                No differences
        Key: twin_birth
                No differences
================================================================================
Processing tlo.methods.other_adult_cancers.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.population.pickle
        Key: info
                No differences
================================================================================
Processing tlo.methods.prostate_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.schisto.pickle
        Key: infection_status_mansoni
                No differences
        Key: infection_status_haematobium
                No differences
================================================================================
Processing tlo.methods.depression.pickle
        Key: summary_stats
                No differences
        Key: event_counts
                No differences
================================================================================
Processing tlo.methods.stunting.pickle
        Key: prevalence
                No differences
================================================================================
Processing tlo.methods.bladder_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.healthsystem.pickle
        Key: message
                No differences
        Key: Consumables
                No differences
        Key: Capacity
                No differences
        Key: bed_tracker_maternity_bed
                No differences
        Key: bed_tracker_delivery_bed
                No differences
        Key: bed_tracker_general_bed
                No differences
        Key: bed_tracker_non_bed_space
                No differences
        Key: item_codes_not_recognised
                Column Undernutrition_Feeding row 0: [1221] vs [1171]
================================================================================
Processing tlo.methods.oesophagealcancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.hiv.pickle
        Key: hiv_test
                No differences
        Key: hiv_arv_NA
                No differences
        Key: summary_inc_and_prev_for_adults_and_children_and_fsw
                No differences
        Key: prev_by_age_and_sex
                No differences
        Key: infections_by_2age_groups_and_sex
                No differences
        Key: hiv_program_coverage
                No differences
        Key: hiv_treatment_delays
                No differences
================================================================================
Processing tlo.methods.population.pickle
        Key: scaling_factor
                No differences
================================================================================
Processing tlo.methods.epilepsy.pickle
        Key: epilepsy_logging
                No differences
        Key: inc_epilepsy
                No differences
================================================================================
Processing tlo.methods.epi.pickle
        Key: ep_vaccine_coverage
                No differences
================================================================================
Processing tlo.methods.cardio_metabolic_disorders.pickle
        Key: incidence_count_by_condition
                No differences
        Key: incidence_count_by_incident_event
                No differences
        Key: incidence_count_by_prevalent_event
                No differences
        Key: person_years_diabetes
                No differences
        Key: person_years_hypertension
                No differences
        Key: person_years_chronic_kidney_disease
                No differences
        Key: person_years_chronic_lower_back_pain
                No differences
        Key: person_years_chronic_ischemic_hd
                No differences
        Key: person_years_ever_stroke
                No differences
        Key: person_years_ever_heart_attack
                No differences
        Key: diabetes_prevalence_by_age_and_sex
                No differences
        Key: diabetes_prevalence
                No differences
        Key: diabetes_diagnosis_prevalence
                No differences
        Key: diabetes_medication_prevalence
                No differences
        Key: hypertension_prevalence_by_age_and_sex
                No differences
        Key: hypertension_prevalence
                No differences
        Key: hypertension_diagnosis_prevalence
                No differences
        Key: hypertension_medication_prevalence
                No differences
        Key: chronic_kidney_disease_prevalence_by_age_and_sex
                No differences
        Key: chronic_kidney_disease_prevalence
                No differences
        Key: chronic_kidney_disease_diagnosis_prevalence
                No differences
        Key: chronic_kidney_disease_medication_prevalence
                No differences
        Key: chronic_lower_back_pain_prevalence_by_age_and_sex
                No differences
        Key: chronic_lower_back_pain_prevalence
                No differences
        Key: chronic_lower_back_pain_diagnosis_prevalence
                No differences
        Key: chronic_lower_back_pain_medication_prevalence
                No differences
        Key: chronic_ischemic_hd_prevalence_by_age_and_sex
                No differences
        Key: chronic_ischemic_hd_prevalence
                No differences
        Key: chronic_ischemic_hd_diagnosis_prevalence
                No differences
        Key: chronic_ischemic_hd_medication_prevalence
                No differences
        Key: ever_stroke_prevalence_by_age_and_sex
                No differences
        Key: ever_stroke_prevalence
                No differences
        Key: ever_heart_attack_prevalence_by_age_and_sex
                No differences
        Key: ever_heart_attack_prevalence
                No differences
================================================================================
Processing tlo.methods.pregnancy_supervisor.pickle
        Key: maternal_complication
                No differences
        Key: conditions_on_birth
                No differences
        Key: antenatal_stillbirth
                No differences
        Key: preg_info
                No differences
================================================================================
Processing tlo.methods.healthsystem.summary.pickle
        Key: hsi_event_counts
                No differences
        Key: never_ran_hsi_event_counts
                No differences
        Key: HSI_Event
                No differences
        Key: Never_ran_HSI_Event
                No differences
        Key: Capacity
                No differences
        Key: Capacity_By_OfficerType_And_FacilityLevel
                No differences
        Key: Consumables
                No differences
        Key: BedDays
                No differences
        Key: FractionOfBedDaysUsed
                No differences
        Key: EquipmentEverUsed_ByFacilityID
                No differences
        Key: hsi_event_details
                No differences
        Key: never_ran_hsi_event_details
                No differences
================================================================================
Processing tlo.methods.labour.detail.pickle
        Key: intervention
                No differences
        Key: death_mni
                No differences
================================================================================
Processing tlo.methods.labour.pickle
        Key: women_data_debug
                No differences
        Key: maternal_complication
                No differences
        Key: message
                No differences
        Key: delivery_setting_and_mode
                No differences
        Key: postnatal_check
                No differences
        Key: caesarean_delivery
                No differences
        Key: cs_indications
                No differences
        Key: intrapartum_stillbirth
                No differences

matt-graham · 2024-07-31T11:01:04Z

With the fixes in #1445 and #1446 the check script for the logs in the continuous and interrupted simulations now shows no differences 🎉 (beyond expected differences in tlo.simulation info log).

I've now pulled in changes to Scenario class by @tamuri from tamuri/suspend-restore-scenario branch so that this PR also adds the functionality needed to suspend and resume at a scenario level. I slightly simplified the logic to allow both suspend_date and resume_simulation arguments to be specified for a particular scenario, which might be the case if we wanted to split up the simulations in to more than two parts.

I've also added a function merge_log_files to tlo.analysis.utils that will merge the log files from a pair of simulations, skipping any header lines in the second log file that repeat ones already present in the first. This allows easy use of the existing parse_log_files function to parse the log files from a suspended / resumed simulation pair. An additional check has also been added to the tests that, in a short pair of simulations, the parsed logs are equivalent when running continuously or suspending and then resuming.
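The merging step could look roughly like the sketch below. It assumes a JSON-lines log in which header entries carry "type": "header" and are identified by their "module" and "key" fields; that format, the matching rule and the name merge_logs are assumptions for illustration, not the actual merge_log_files implementation. Files are streamed line by line rather than read in all at once.

```python
import json


def merge_logs(first_path, second_path, merged_path):
    """Concatenate two JSON-lines logs, skipping repeated headers.

    A header in the second file is dropped if a header with the same
    (module, key) pair already appeared in the first file.
    """
    seen_headers = set()
    with open(merged_path, "w") as merged:
        with open(first_path) as first:
            for line in first:
                if not line.strip():
                    continue  # ignore blank lines
                entry = json.loads(line)
                if entry.get("type") == "header":
                    seen_headers.add((entry["module"], entry["key"]))
                merged.write(line.rstrip("\n") + "\n")
        with open(second_path) as second:
            for line in second:
                if not line.strip():
                    continue
                entry = json.loads(line)
                is_header = entry.get("type") == "header"
                if is_header and (entry["module"], entry["key"]) in seen_headers:
                    continue  # header already written from the first file
                merged.write(line.rstrip("\n") + "\n")
```

The merged file can then be passed to whatever parser reads a single continuous log.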

matt-graham · 2024-09-24T10:45:05Z

@tamuri I've merged in changes from master with #1445 and #1446 merged in to this branch now so this should be ready for playing around with.

tamuri

Great work to get to this stage, nice job! I've run it locally a few times on my own [small] scenarios without issues. I think it's good to go in and we can start thinking about the next step of integrating with scenarios being run on Batch 😅

Some minor suggestions

src/tlo/analysis/utils.py

requirements/base.txt

src/tlo/scenario.py

tests/test_simulation.py

matt-graham added 16 commits November 14, 2023 13:21

Factor out parts of simulate method

0269ea1

Further refactoring of Simulation

a0a848f

Add methods for saving and loading simulations

7ea292f

Add initial test for simulation saving and loading

3fd5cd3

Factor out and add additional simulation test checks

7e5f666

Explicitly set logger output file when loading from pickle

c2a15c3

Avoids deadlock on trying to acquire lock on loaded file handler

Check next date in event queue before popping

921ab2e

Ensures event not lost in partial simulations

Make pytest seed parameter session scoped

7596fc6

Allows use in fixtures with non-function scope

Don't use next on counter in test check

369ea88

Has side effect of mutating counter

Refactor global constants to fixtures in simulation tests + additiona…

6c1afd8

…l tests

Move logging configuration out of load_from_pickle

e8bd4d8

Better to be explicit

Add test for exception when simulation past end date

775cac1

Add docstrings for new methods

d3ec718

Add errors when running without initialising or initialising multiple…

cc71c01

… times

Add dill to dependencies

a5d7289

Sort imports

2bb4066

matt-graham requested a review from tamuri December 11, 2023 17:59

matt-graham added 2 commits December 11, 2023 18:00

Merge branch 'master' into mmg/refactor-simulate

fc60e46

Fix fenceposting error in simulation end date

97af3b0

matt-graham mentioned this pull request Dec 12, 2023

test_dx_algorithm_for_malaria_outcomes calls simulate on same Simulation instance twice #1230

Closed

Merge branch 'master' into mmg/refactor-simulate

c81a0f3

tamuri mentioned this pull request Dec 15, 2023

Saving to file simulations in a suspended state and resuming #86

Closed

matt-graham added 2 commits March 20, 2024 11:02

Merge branch 'master' into mmg/refactor-simulate

1d84be6

Merge branch 'master' into mmg/refactor-simulate

cd1a310

Merge branch 'master' into mmg/refactor-simulate

e520cca

matt-graham mentioned this pull request Jun 21, 2024

Ensure log entries use consistent ordering and types for columns #1404

Merged

matt-graham added 6 commits July 24, 2024 09:38

Merge branch 'master' into mmg/refactor-simulate

05c61dd

Fix explicit comparison to type

d53da61

Add option to configure logging when loading from pickle

40d3eaa

Move check for open log file in close_output_file method

5205a27

Tidy up docstrings and type hints

e795670

Remove use of configure_logging in test

1b1c179

matt-graham mentioned this pull request Jul 26, 2024

Log entry for consumables item_codes_not_recognised in health system log is non-deterministic #1434

Closed

matt-graham and others added 4 commits July 31, 2024 10:37

Update scenario to allow suspending and resuming

1604bc2

Co-authored-by: Asif Tamuri <[email protected]>

Add utility function to merge log files

dc20983

Add test to check equality of parsed log files in suspend-resume

cbadce6

Fix import sort order

9c139e9

matt-graham added 2 commits September 9, 2024 18:23

Merge branch 'master' into mmg/refactor-simulate

1d91b95

Merge branch 'master' into mmg/refactor-simulate

19a2603

tamuri approved these changes Sep 25, 2024

matt-graham added 8 commits September 26, 2024 12:55

Update pinned dill version to 0.3.8

ec10b40

Adding log message when loading suspended simulation

8d9000a

Adding log message when saving suspended simulation

39f5ce4

Increase simulation pop size and duration in test

60f011c

Avoid reading in log files to be merged all at once

6f5a76d

Add tests for merge_log_files function

4eb2ebe

Fix import order sorting

87e5fa9

Fix import order sorting (second attempt)

f0bb572

matt-graham merged commit a23e57d into master Sep 26, 2024
60 checks passed

matt-graham deleted the mmg/refactor-simulate branch September 26, 2024 16:32
