Add support for saving and loading simulation state to / from files #1227

Merged: matt-graham merged 42 commits into master from mmg/refactor-simulate on Sep 26, 2024

Collaborator

@matt-graham matt-graham commented Dec 11, 2023

Potentially resolves #86 though currently this doesn't deal with exposing this functionality with scenarios.

Adds new methods save_to_pickle and load_from_pickle that respectively save the current simulation state to a pickle file and load the simulation state from a pickle file (with the latter being a class method so that it can be used to directly load a simulation as Simulation.load_from_pickle). Pickling is handled by dill, as this supports a much wider range of Python objects than the built-in pickle implementation. dill is added to the package dependencies here, but the import is currently wrapped with logic to avoid ImportErrors in environments which do not have dill available (with save_to_pickle and load_from_pickle raising informative exceptions in this case).
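As a rough illustration of the pattern described above (not the code in this PR), a minimal sketch of an optional dill import with informative errors might look like the following. The ToySimulation class, its attributes and the error messages are hypothetical; only dill.dump and dill.load are real library calls.

```python
try:
    import dill

    DILL_AVAILABLE = True
except ImportError:
    dill = None
    DILL_AVAILABLE = False


class ToySimulation:
    """Hypothetical stand-in for a simulation object."""

    def __init__(self, date=0):
        self.date = date
        # A lambda attribute: built-in pickle cannot serialise this, dill can.
        self.rate = lambda t: 0.1 * t

    def save_to_pickle(self, path):
        """Save the whole object to a file, failing clearly if dill is missing."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot save simulation as dill is not installed")
        with open(path, "wb") as pickle_file:
            dill.dump(self, pickle_file)

    @classmethod
    def load_from_pickle(cls, path):
        """Class method so it can be called as ToySimulation.load_from_pickle(path)."""
        if not DILL_AVAILABLE:
            raise RuntimeError("Cannot load simulation as dill is not installed")
        with open(path, "rb") as pickle_file:
            return dill.load(pickle_file)
```

Making the loader a class method means no existing instance is needed in order to restore one.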

The contents of the current Simulation.simulate method have also been factored out into three separate methods, Simulation.initialise, Simulation.run_simulation_to and Simulation.finalise, that allow initialising, running and finalising the simulation separately, with Simulation.simulate retaining the same behaviour by just calling the three in sequence. This allows simulations to be partially run to an intermediate date before the simulation end date, saved to file and then reloaded and continued.
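To make the split concrete, here is a toy sketch of the same three-step structure. The method names follow the description above, but the SteppedSimulation class, its integer dates and its counter state are hypothetical simplifications rather than the real Simulation class.

```python
class SteppedSimulation:
    """Hypothetical simulation whose state is a counter advanced once per day."""

    def __init__(self):
        self.date = None
        self.state = 0
        self.finalised = False

    def initialise(self, start_date=0):
        self.date = start_date

    def run_simulation_to(self, to_date):
        # Advance day by day; calling this again resumes from the current date.
        while self.date < to_date:
            self.state += self.date
            self.date += 1

    def finalise(self):
        self.finalised = True

    def simulate(self, end_date):
        # Unchanged external behaviour: the three steps called in sequence.
        self.initialise()
        self.run_simulation_to(end_date)
        self.finalise()


# A full run and a run interrupted at an intermediate date reach the same state.
full = SteppedSimulation()
full.simulate(end_date=10)

split = SteppedSimulation()
split.initialise()
split.run_simulation_to(4)  # a save to file could happen here
split.run_simulation_to(10)
split.finalise()
```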

As there is also global state recorded in tlo.logging, we also need to reconfigure logging on loading a simulation from file. I initially included some logic in load_from_pickle to inject the loaded simulation into the tlo logger and set the output file to the previous logging path (the logging FileHandler loaded by dill is not able to acquire a lock, causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.
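The explicit reconfiguration step could look roughly like the sketch below. It uses the standard library logging module rather than tlo.logging, and the helper name, logger name and file names are hypothetical. The handlers here are ordinary ones, so this only illustrates pointing a logger at a new file, not the deadlock itself.

```python
import logging
import os
import tempfile


def reconfigure_file_logging(logger, new_path):
    """Detach any existing handlers and attach a fresh FileHandler."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    new_handler = logging.FileHandler(new_path)
    logger.addHandler(new_handler)
    return new_handler


logger = logging.getLogger("toy.simulation")
logger.setLevel(logging.INFO)

log_dir = tempfile.mkdtemp()
first = os.path.join(log_dir, "first.log")
second = os.path.join(log_dir, "resumed.log")

reconfigure_file_logging(logger, first)
logger.info("before suspend")

# After reloading, write to a new file instead of reusing the old handler.
reconfigure_file_logging(logger, second)
logger.info("after resume")
```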

A set of tests that check for consistency of simulations when saving and loading from file, including when used to resume a partially run simulation, are also added.

Apologies for the unrelated formatting changes in src/tlo/simulation.py. I was using black to autoformat my changes and forgot it would also reformat the rest of the module. I can revert these bits if they make it a pain to review.

Questions

  • Instead of save_to_pickle and load_from_pickle, should we just use the names save and load?
  • Should we explicitly set simulation.output_file to None in Simulation.load_from_pickle to guard against accidental use of previous log file (and potential deadlock issues)?

@matt-graham matt-graham requested a review from tamuri December 11, 2023 17:59
@matt-graham
Collaborator Author

Failing test in tests/test_malaria.py is due to #1230 which #1231 should fix

@tamuri
Collaborator

tamuri commented Dec 12, 2023

As there is also global state recorded in tlo.logging, we also need to reconfigure logging on loading a simulation from file. I initially included some logic in load_from_pickle to inject the loaded simulation into the tlo logger and set the output file to the previous logging path (the logging FileHandler loaded by dill is not able to acquire a lock, causing deadlocks if used), but decided it would be better to make this step explicit, particularly as I'd guess we would often want to write to a new log file when resuming a loaded simulation.

Yes, we want new log files when restoring simulation because you'd potentially run many simulations from the same saved simulation. And they'd go in as separate Azure Batch runs, so different directories etc.

@tamuri
Collaborator

tamuri commented Dec 18, 2023

I'm working on scenarios using this - will give comments in light of that.

@matt-graham
Collaborator Author

@tamuri what would be the next steps to work on for this? I think you mentioned that your testing with this identified some issue with non-determinism in logging. Did you get anywhere with looking at how to use this with scenarios, and is there something for me to pick up there?

@tamuri
Collaborator

tamuri commented May 18, 2024

I've pushed my changes to scenario.py here, as well as three scenario files and a script to check log output matches.

@matt-graham
Collaborator Author

I've started to look at differences that arise in logs formed from either a 'full' scenario run without any suspending or resuming, or the merged logs from a pair of suspended and resumed runs (but otherwise identical scenario settings), using the scenario files and script @tamuri added on the branch tamuri/suspend-restore-scenario.

From what I can see so far, most (possibly all) of the discrepancies arise from bugs that also affect logging without suspending and resuming, specifically that the columns entry logged in the header message for a first log entry is not consistent with later log entries, because of one or more of

  1. Keys in (dict / dataframe) data logged being in different orders between first and subsequent calls to logger
  2. Keys in (dict / dataframe) data logged on first and subsequent calls to logger differ (this seems to typically be when logging a multi-index series generated by a group-by operation and converted to a dict, where not all combinations of the index are present on each log iteration, for example if grouping on age_years one or more specific age_years values may be missing for any given set of data)
  3. Values (types) associated with data logged on first and subsequent calls differ (this one is very common)

The ordering issues (1) are easy enough to resolve by always sorting the data dictionary by key before logging both the header and value messages.
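A minimal sketch of that idea follows, assuming the header records a type name per column. The header_and_row helper and the example keys are hypothetical, not the tlo logging implementation.

```python
def header_and_row(data):
    """Return (columns, values) with keys in sorted order.

    `columns` maps each key to the type name of its value, standing in for
    the columns entry of a header message.
    """
    ordered = dict(sorted(data.items()))
    columns = {key: type(value).__name__ for key, value in ordered.items()}
    return columns, list(ordered.values())


# The same keys arriving in different insertion orders now align.
first_columns, first_values = header_and_row({"b": 2, "a": 1})
later_columns, later_values = header_and_row({"a": 3, "b": 4})
```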

The non-overlapping dict keys (2) will probably require manual fixing in each case as the current structured logging approach fundamentally relies on entries being alignable with each other.
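One possible shape for such a manual fix, assuming the full set of categories is known in advance, is to fill in missing categories with zero so every log entry has the same keys. The ALL_AGES range and the counts_by_age helper are hypothetical.

```python
from collections import Counter

# Hypothetical fixed set of categories that every log entry should contain.
ALL_AGES = range(0, 5)


def counts_by_age(ages):
    """Count occurrences per age, including ages with no occurrences."""
    counts = Counter(ages)
    return {age: counts.get(age, 0) for age in ALL_AGES}


first = counts_by_age([0, 0, 3])
later = counts_by_age([1, 4, 4])
```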

For the non-constant column types (3) I am not sure what the implication is - often this is for example a value initially with int type being subsequently float, or bool being subsequently NoneType, along with some cases of types swapping between scalar types and lists (mainly in RTI module for the latter).
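To find where this happens, a small check along the following lines could flag rows whose value types differ from those recorded in the header. This is a sketch only: the type_mismatches helper and the example column names are hypothetical, and it assumes the header stores a type name per column.

```python
def type_mismatches(header_columns, row):
    """Return {key: (expected, actual)} for values whose type differs from the header."""
    return {
        key: (header_columns[key], type(value).__name__)
        for key, value in row.items()
        if key in header_columns and header_columns[key] != type(value).__name__
    }


header = {"n_cases": "int", "on_treatment": "bool"}
mismatches = type_mismatches(header, {"n_cases": 1.5, "on_treatment": None})
```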

@matt-graham
Collaborator Author

The tests in tests/test_simulation.py and tests/test_healthsystem.py failing here seem to be failing due to the runner being out of disk space

OSError: [Errno 28] No space left on device

In terms of the checks of log consistency, with some updates to the script @tamuri created (now on branch mmg/suspend-restore-scenario), after merging in the changes from #1404 we seem to get almost consistent logs other than a remaining difference in consumables item_codes_not_recognised:

Column Undernutrition_Feeding row 0: [1221] vs [1171]

I haven't yet figured out why this difference is occurring.

Full output from script
Full run path: outputs/full-simple-2024-07-24T130128Z
Suspended run path: outputs/suspend-simple-2024-07-24T161415Z
Resumed run path: outputs/resume-simple-2024-07-24T163556Z
================================================================================
Processing tlo.methods.copd.pickle
        Key: copd_prevalence
                No differences
================================================================================
Processing tlo.methods.alri.pickle
        Key: incidence_count_by_age_and_pathogen
                No differences
        Key: event_counts
                No differences
================================================================================
Processing tlo.methods.rti.pickle
        Key: Inj_category_incidence
                No differences
        Key: Injury_information
                No differences
        Key: Open_fracture_information
                No differences
        Key: Percent_of_shock_in_rti
                No differences
        Key: number_of_injuries_in_hospital
                No differences
        Key: Requested_Pain_Management
                No differences
        Key: Successful_Pain_Management
                No differences
        Key: injury_severity
                No differences
        Key: summary_1m
                No differences
        Key: rti_demography
                No differences
        Key: model_progression
                No differences
        Key: RTI_Death_Injury_Profile
                No differences
================================================================================
Processing tlo.methods.tb.pickle
        Key: tb_incidence
                No differences
        Key: tb_prevalence
                No differences
        Key: tb_mdr
                No differences
        Key: tb_treatment
                No differences
        Key: tb_treatment_delays
                No differences
        Key: tb_false_positive
                No differences
================================================================================
Processing tlo.methods.enhanced_lifestyle.pickle
        Key: li_urban
                No differences
        Key: li_wealth
                No differences
        Key: li_low_ex
                No differences
        Key: li_tob
                No differences
        Key: li_ex_alc
                No differences
        Key: li_mar_stat
                No differences
        Key: li_in_ed
                No differences
        Key: li_ed_lev
                No differences
        Key: li_unimproved_sanitation
                No differences
        Key: li_no_clean_drinking_water
                No differences
        Key: li_wood_burn_stove
                No differences
        Key: li_no_access_handwashing
                No differences
        Key: li_high_salt
                No differences
        Key: li_high_sugar
                No differences
        Key: li_bmi
                No differences
        Key: li_is_circ
                No differences
        Key: li_is_sexworker
                No differences
================================================================================
Processing tlo.methods.measles.pickle
        Key: incidence
                No differences
        Key: measles_symptoms
                No differences
        Key: pop_age_range
                No differences
        Key: measles_incidence_age_range
                No differences
================================================================================
Processing tlo.methods.demography.detail.pickle
        Key: properties_of_deceased_persons
                No differences
================================================================================
Processing tlo.methods.diarrhoea.pickle
        Key: incident_case
                No differences
        Key: end_of_case
                No differences
================================================================================
Processing tlo.simulation.pickle
================================================================================
Processing tlo.methods.healthburden.pickle
        Key: disability_mapper_from_tlo_cause_to_common_label
                No differences
        Key: disability_mapper_from_gbd_cause_to_common_label
                No differences
        Key: daly_mapper_from_tlo_cause_to_common_label
                No differences
        Key: daly_mapper_from_gbd_cause_to_common_label
                No differences
        Key: yld_by_causes_of_disability
                No differences
        Key: yll_by_causes_of_death
                No differences
        Key: yll_by_causes_of_death_stacked
                No differences
        Key: yll_by_causes_of_death_stacked_by_age_and_time
                No differences
        Key: dalys
                No differences
        Key: dalys_stacked
                No differences
        Key: dalys_stacked_by_age_and_time
                No differences
        Key: dalys_by_wealth_stacked_by_age_and_time
                No differences
================================================================================
Processing tlo.methods.care_of_women_during_pregnancy.pickle
        Key: anc_interventions
                No differences
        Key: anc_count_on_birth
                No differences
        Key: anc_visits_which_ran
                No differences
================================================================================
Processing tlo.methods.malaria.pickle
        Key: rdt_log
                No differences
        Key: prev_district
                No differences
        Key: pop_district
                No differences
        Key: incidence
                No differences
        Key: status_counts
                No differences
        Key: prevalence
                No differences
        Key: coinfection_prevalence
                No differences
        Key: tx_coverage
                No differences
================================================================================
Processing tlo.methods.postnatal_supervisor.pickle
        Key: total_neo_pnc_visits
                No differences
        Key: total_mat_pnc_visits
                No differences
        Key: newborn_complication
                No differences
        Key: maternal_complication
                No differences
================================================================================
Processing tlo.methods.contraception.pickle
        Key: contraception_use_summary
                No differences
        Key: contraception_use_summary_by_age
                No differences
        Key: pregnancy
                No differences
        Key: contraception_change
                No differences
================================================================================
Processing tlo.methods.breast_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.demography.pickle
        Key: mapper_from_tlo_cause_to_common_label
                No differences
        Key: mapper_from_gbd_cause_to_common_label
                No differences
        Key: other_deaths
                No differences
        Key: scaling_factor
                No differences
        Key: population
                No differences
        Key: age_range_m
                No differences
        Key: age_range_f
                No differences
        Key: num_children
                No differences
        Key: person_years
                No differences
        Key: on_birth
                No differences
        Key: death
                No differences
================================================================================
Processing tlo.methods.newborn_outcomes.pickle
        Key: newborn_complication
                No differences
        Key: postnatal_check
                No differences
        Key: twin_birth
                No differences
================================================================================
Processing tlo.methods.other_adult_cancers.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.population.pickle
        Key: info
                No differences
================================================================================
Processing tlo.methods.prostate_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.schisto.pickle
        Key: infection_status_mansoni
                No differences
        Key: infection_status_haematobium
                No differences
================================================================================
Processing tlo.methods.depression.pickle
        Key: summary_stats
                No differences
        Key: event_counts
                No differences
================================================================================
Processing tlo.methods.stunting.pickle
        Key: prevalence
                No differences
================================================================================
Processing tlo.methods.bladder_cancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.healthsystem.pickle
        Key: message
                No differences
        Key: Consumables
                No differences
        Key: Capacity
                No differences
        Key: bed_tracker_maternity_bed
                No differences
        Key: bed_tracker_delivery_bed
                No differences
        Key: bed_tracker_general_bed
                No differences
        Key: bed_tracker_non_bed_space
                No differences
        Key: item_codes_not_recognised
                Column Undernutrition_Feeding row 0: [1221] vs [1171]
================================================================================
Processing tlo.methods.oesophagealcancer.pickle
        Key: summary_stats
                No differences
================================================================================
Processing tlo.methods.hiv.pickle
        Key: hiv_test
                No differences
        Key: hiv_arv_NA
                No differences
        Key: summary_inc_and_prev_for_adults_and_children_and_fsw
                No differences
        Key: prev_by_age_and_sex
                No differences
        Key: infections_by_2age_groups_and_sex
                No differences
        Key: hiv_program_coverage
                No differences
        Key: hiv_treatment_delays
                No differences
================================================================================
Processing tlo.methods.population.pickle
        Key: scaling_factor
                No differences
================================================================================
Processing tlo.methods.epilepsy.pickle
        Key: epilepsy_logging
                No differences
        Key: inc_epilepsy
                No differences
================================================================================
Processing tlo.methods.epi.pickle
        Key: ep_vaccine_coverage
                No differences
================================================================================
Processing tlo.methods.cardio_metabolic_disorders.pickle
        Key: incidence_count_by_condition
                No differences
        Key: incidence_count_by_incident_event
                No differences
        Key: incidence_count_by_prevalent_event
                No differences
        Key: person_years_diabetes
                No differences
        Key: person_years_hypertension
                No differences
        Key: person_years_chronic_kidney_disease
                No differences
        Key: person_years_chronic_lower_back_pain
                No differences
        Key: person_years_chronic_ischemic_hd
                No differences
        Key: person_years_ever_stroke
                No differences
        Key: person_years_ever_heart_attack
                No differences
        Key: diabetes_prevalence_by_age_and_sex
                No differences
        Key: diabetes_prevalence
                No differences
        Key: diabetes_diagnosis_prevalence
                No differences
        Key: diabetes_medication_prevalence
                No differences
        Key: hypertension_prevalence_by_age_and_sex
                No differences
        Key: hypertension_prevalence
                No differences
        Key: hypertension_diagnosis_prevalence
                No differences
        Key: hypertension_medication_prevalence
                No differences
        Key: chronic_kidney_disease_prevalence_by_age_and_sex
                No differences
        Key: chronic_kidney_disease_prevalence
                No differences
        Key: chronic_kidney_disease_diagnosis_prevalence
                No differences
        Key: chronic_kidney_disease_medication_prevalence
                No differences
        Key: chronic_lower_back_pain_prevalence_by_age_and_sex
                No differences
        Key: chronic_lower_back_pain_prevalence
                No differences
        Key: chronic_lower_back_pain_diagnosis_prevalence
                No differences
        Key: chronic_lower_back_pain_medication_prevalence
                No differences
        Key: chronic_ischemic_hd_prevalence_by_age_and_sex
                No differences
        Key: chronic_ischemic_hd_prevalence
                No differences
        Key: chronic_ischemic_hd_diagnosis_prevalence
                No differences
        Key: chronic_ischemic_hd_medication_prevalence
                No differences
        Key: ever_stroke_prevalence_by_age_and_sex
                No differences
        Key: ever_stroke_prevalence
                No differences
        Key: ever_heart_attack_prevalence_by_age_and_sex
                No differences
        Key: ever_heart_attack_prevalence
                No differences
================================================================================
Processing tlo.methods.pregnancy_supervisor.pickle
        Key: maternal_complication
                No differences
        Key: conditions_on_birth
                No differences
        Key: antenatal_stillbirth
                No differences
        Key: preg_info
                No differences
================================================================================
Processing tlo.methods.healthsystem.summary.pickle
        Key: hsi_event_counts
                No differences
        Key: never_ran_hsi_event_counts
                No differences
        Key: HSI_Event
                No differences
        Key: Never_ran_HSI_Event
                No differences
        Key: Capacity
                No differences
        Key: Capacity_By_OfficerType_And_FacilityLevel
                No differences
        Key: Consumables
                No differences
        Key: BedDays
                No differences
        Key: FractionOfBedDaysUsed
                No differences
        Key: EquipmentEverUsed_ByFacilityID
                No differences
        Key: hsi_event_details
                No differences
        Key: never_ran_hsi_event_details
                No differences
================================================================================
Processing tlo.methods.labour.detail.pickle
        Key: intervention
                No differences
        Key: death_mni
                No differences
================================================================================
Processing tlo.methods.labour.pickle
        Key: women_data_debug
                No differences
        Key: maternal_complication
                No differences
        Key: message
                No differences
        Key: delivery_setting_and_mode
                No differences
        Key: postnatal_check
                No differences
        Key: caesarean_delivery
                No differences
        Key: cs_indications
                No differences
        Key: intrapartum_stillbirth
                No differences
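For readers without access to the script, the per-key comparison that produces messages like those above can be sketched as follows. This is an assumption about its general shape, not the script itself. The compare_columns helper is hypothetical and works on plain dicts of column name to list of values rather than dataframes.

```python
def compare_columns(full, merged):
    """Return difference messages for two {column: [values]} mappings."""
    messages = []
    for column in sorted(set(full) | set(merged)):
        if column not in full or column not in merged:
            messages.append(f"Column {column} missing from one run")
            continue
        if len(full[column]) != len(merged[column]):
            messages.append(f"Column {column} has differing numbers of rows")
        # Compare row by row over the rows both runs have in common.
        for row, (a, b) in enumerate(zip(full[column], merged[column])):
            if a != b:
                messages.append(f"Column {column} row {row}: {a} vs {b}")
    return messages or ["No differences"]


report = compare_columns(
    {"Undernutrition_Feeding": [[1221]]},
    {"Undernutrition_Feeding": [[1171]]},
)
```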

@matt-graham
Collaborator Author

With the fixes in #1445 and #1446 the check script for the logs in the continuous and interrupted simulations now shows no differences 🎉 (beyond expected differences in the tlo.simulation info log).

I've now pulled in the changes to the Scenario class by @tamuri from the tamuri/suspend-restore-scenario branch so that this PR also adds the functionality needed to suspend and resume at a scenario level. I slightly simplified the logic to allow both suspend_date and resume_simulation arguments to be specified for a particular scenario, which might be the case if we wanted to split up the simulations into more than two parts.

I've also added a function merge_log_files to tlo.analysis.utils that will merge the log files from a pair of simulations, skipping any header lines in the second log file that repeat those in the first. This allows easy use of the existing parse_log_files function to parse the log files from a suspended / resumed simulation pair. An additional check has also been added to the tests that, in a short pair of simulations, the parsed logs are equivalent when running continuously or suspending and then resuming.
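The merging idea can be sketched as below. This is not the merge_log_files implementation: the merge_jsonl_logs name is hypothetical, and the sketch assumes each log line is a JSON object in which header entries carry "type": "header" and are identified by their "module" and "key" fields.

```python
import json


def merge_jsonl_logs(first_path, second_path, output_path):
    """Concatenate two JSON-lines logs, dropping headers that were already written."""
    seen_headers = set()
    with open(output_path, "w") as out:
        for path in (first_path, second_path):
            with open(path) as log:
                for line in log:
                    if not line.strip():
                        continue
                    entry = json.loads(line)
                    if entry.get("type") == "header":
                        ident = (entry.get("module"), entry.get("key"))
                        if ident in seen_headers:
                            continue  # repeated header, skip it
                        seen_headers.add(ident)
                    out.write(line if line.endswith("\n") else line + "\n")
```

With the repeated headers removed, the merged file has the same shape as a log from a single uninterrupted run, so a single parser can handle both cases.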

@matt-graham
Collaborator Author

@tamuri I've merged in changes from master with #1445 and #1446 merged in to this branch now so this should be ready for playing around with.

Collaborator

@tamuri tamuri left a comment

Great work to get to this stage, nice job! I've run it locally a few times on my own [small] scenarios without issues. I think it's good to go in and we can start thinking about the next step of integrating with scenarios being run on Batch 😅

Some minor suggestions

src/tlo/analysis/utils.py (outdated, resolved)
requirements/base.txt (outdated, resolved)
src/tlo/scenario.py (outdated, resolved)
src/tlo/scenario.py (resolved)
tests/test_simulation.py (outdated, resolved)
@matt-graham matt-graham merged commit a23e57d into master Sep 26, 2024
60 checks passed
@matt-graham matt-graham deleted the mmg/refactor-simulate branch September 26, 2024 16:32
Successfully merging this pull request may close these issues.

Saving to file simulations in a suspended state and resuming
2 participants