Fix calibration scripts for Pandas v2 #1159

Merged
merged 18 commits into from
Nov 9, 2023

Conversation

giordano
Member

@giordano giordano commented Oct 11, 2023

I'm trying to run the calibration with pandas v2, but I'm facing a string of errors and am trying to fix them one by one. Opening as a draft for the time being, as I'm not done yet with all the fixes (I'm also not entirely sure they're all correct).

Am I right in thinking that these scripts aren't currently tested?

@tbhallett
Collaborator

tbhallett commented Oct 13, 2023

> I'm trying to run the calibration with pandas v2, but I'm facing a string of errors and am trying to fix them one by one. Opening as a draft for the time being, as I'm not done yet with all the fixes (I'm also not entirely sure they're all correct).
>
> Am I right in thinking that these scripts aren't currently tested?

Thanks @giordano

These scripts are used really heavily in analyses and on the dev server. They do still work for me (without error) and also seem to have no problems on the dev server runs. I've just run them all again (with the pandas update) and had no problems. Not sure what I'm missing..!

UPDATE: Worked out that I was looking at a different script. You're quite right about the errors on this one. The dev server doesn't seem to have caught up with the change in the pinned pandas version (so no errors yet, but they will come once it gets to it!)

@giordano
Member Author

What version of Pandas are you using? They work for me with Pandas v1.2.2, but not v2.0.3, which is now in

pandas==2.0.3

@tbhallett
Collaborator

> What version of Pandas are you using? They work for me with Pandas v1.2.2, but not v2.0.3, which is now in
>
> pandas==2.0.3

Hang on, update coming... I've found my error.

@tbhallett
Collaborator

> What version of Pandas are you using? They work for me with Pandas v1.2.2, but not v2.0.3, which is now in
>
> pandas==2.0.3

I'm on that version, and do indeed reproduce the error (and the dev server will too when it gets to the change in version, I presume).

@tbhallett
Collaborator

@giordano and @matt-graham -- is this script running well on your end now? Am I safe to merge it in and use it on my branches?

@giordano
Member Author

No, with this PR I only addressed the first few errors I was facing, but then I got many more at which point I gave up.

I updated this PR to replace the try/except with the if suggested above, and the ax.fill_between call which reportedly started working with a different PR. I think this should be ok to merge as a starting point, but more work needs to be done to actually update all scripts.

@giordano giordano changed the title from "Update demography calibratrions script for Pandas v2" to "Some fixes to demography calibratrions script for Pandas v2" Oct 24, 2023
@giordano giordano marked this pull request as ready for review October 24, 2023 16:21
@tbhallett
Collaborator

> No, with this PR I only addressed the first few errors I was facing, but then I got many more at which point I gave up.
>
> I updated this PR to replace the try/except with the if suggested above, and the ax.fill_between call which reportedly started working with a different PR. I think this should be ok to merge as a starting point, but more work needs to be done to actually update all scripts.

Ok, thanks. I'll get it working.

@tbhallett tbhallett requested a review from matt-graham November 3, 2023 10:08
@tbhallett
Collaborator

Hi @giordano and @matt-graham
I think this is now fixed-up for the new versions of pandas.

@giordano giordano changed the title from "Some fixes to demography calibratrions script for Pandas v2" to "Fix calibratrion scripts for Pandas v2" Nov 3, 2023
Collaborator

@matt-graham matt-graham left a comment

@tbhallett - thanks, this looks good. I've verified that with your changes the src/scripts/calibration_analyses/analysis_scripts/analysis_all_calibration.py script runs to completion using the last full set of long_run_all_disease scenario results from the dev server (2023-09-20_090745_b9e157f95_021) in an environment with pandas v2.

I've suggested some minor changes around removing plt.show() calls, which I think should be unnecessary if the plot is being saved and closed immediately, along with some other small changes. I think there are also some other plt.show instances in the scripts which aren't touched by the changes here, so I can't directly make suggestions on them. I would say that if we're assuming this script is for producing saved figure files rather than displaying them interactively, it would be best to remove all plt.show instances to simplify running on machines without an X server for displaying plots.
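As an illustration of the save-and-close pattern described above, here is a minimal sketch, not code from this PR. It assumes matplotlib's non-interactive Agg backend is acceptable for the scripts; the function name `save_line_plot` and the output file name are hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no X server or display required
import matplotlib.pyplot as plt


def save_line_plot(values, path):
    """Hypothetical helper: draw a simple line plot, save it, then close the figure."""
    fig, ax = plt.subplots()
    ax.plot(values)
    ax.set_xlabel("index")
    ax.set_ylabel("value")
    fig.savefig(path)  # write the figure to disk; no plt.show() call is needed
    plt.close(fig)  # release the figure so open figures do not accumulate
    return path


out_path = save_line_plot([1, 4, 9, 16], os.path.join(tempfile.mkdtemp(), "example.png"))
```

Closing each figure right after saving keeps memory use flat when a script produces many plots in a loop.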

tbhallett and others added 4 commits November 6, 2023 15:36
…se_of_death_and_disability_calibrations.py

Co-authored-by: Matt Graham <[email protected]>
…se_of_death_and_disability_calibrations.py

Co-authored-by: Matt Graham <[email protected]>
…se_of_death_and_disability_calibrations.py

Co-authored-by: Matt Graham <[email protected]>
@tbhallett
Collaborator

> @tbhallett - thanks, this looks good. I've verified that with your changes the src/scripts/calibration_analyses/analysis_scripts/analysis_all_calibration.py script runs to completion using the last full set of long_run_all_disease scenario results from the dev server (2023-09-20_090745_b9e157f95_021) in an environment with pandas v2.
>
> I've suggested some minor changes around removing plt.show() calls, which I think should be unnecessary if the plot is being saved and closed immediately, along with some other small changes. I think there are also some other plt.show instances in the scripts which aren't touched by the changes here, so I can't directly make suggestions on them. I would say that if we're assuming this script is for producing saved figure files rather than displaying them interactively, it would be best to remove all plt.show instances to simplify running on machines without an X server for displaying plots.

Thanks so much for the close review and the good tips, @matt-graham

I see what you mean about the use of plt.show() and fig.show(). I'll remove them now.

@tbhallett
Collaborator

tbhallett commented Nov 6, 2023

@matt-graham - when you run this (plot_legends.py::apply), do you also get copious warnings of:

/Users/tbh03/GitHub/TLOmodel/src/tlo/population.py:67: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  props[prop_name] = prop.create_series(prop_name, size)

@tbhallett tbhallett requested a review from matt-graham November 6, 2023 16:27
@matt-graham
Collaborator

> @matt-graham - when you run this (plot_legends.py::apply), do you also get copious warnings of:
>
> /Users/tbh03/GitHub/TLOmodel/src/tlo/population.py:67: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
>   props[prop_name] = prop.create_series(prop_name, size)

Yeah, I also get those warnings, which I think come from these lines:

props = pd.DataFrame(index=pd.RangeIndex(stop=size, name="person"))
for module in self.sim.modules.values():
    for prop_name, prop in module.PROPERTIES.items():
        props[prop_name] = prop.create_series(prop_name, size)

This started appearing with the shift to pandas v2. I'm not sure, but I think the performance issue mentioned relates to the cost of creating a dataframe with lots of columns by inserting them one by one rather than adding them all together, and not to the subsequent cost of accessing data in that dataframe (which would be much more of an issue in our case). It would still be worth fixing this to avoid the warnings, as it should be quick to do. I'll raise this as a separate issue.
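For reference, the approach the warning suggests (joining all columns at once with pd.concat(axis=1)) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the fix that was eventually applied: `property_defaults` and `make_column` are hypothetical stand-ins for the modules' PROPERTIES and for prop.create_series.

```python
import pandas as pd

size = 4
# Hypothetical property definitions (name -> default value); not TLOmodel's real PROPERTIES.
property_defaults = {f"prop_{i}": float(i) for i in range(200)}

index = pd.RangeIndex(stop=size, name="person")


def make_column(name, default):
    """Hypothetical stand-in for prop.create_series(prop_name, size)."""
    return pd.Series(default, index=index, name=name)


# Create every column first, then join them in a single pd.concat call,
# instead of inserting them into the frame one at a time.
columns = {name: make_column(name, default) for name, default in property_defaults.items()}
props = pd.concat(columns, axis=1)
```

Passing a dict to pd.concat uses its keys as the column labels, so the resulting frame has one column per property.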

> I see what you mean about the use of plt.show() and fig.show(). I'll remove them now.

Thanks for making the edits. With the latest changes the script is still working locally for me on downloaded scenario outputs, and now no figure windows are appearing 🎉

@matt-graham matt-graham changed the title from "Fix calibratrion scripts for Pandas v2" to "Fix calibration scripts for Pandas v2" Nov 7, 2023
@tbhallett tbhallett merged commit 02ed59f into master Nov 9, 2023
55 checks passed
@tbhallett tbhallett deleted the mg/calibration branch November 9, 2023 13:55
@tbhallett tbhallett restored the mg/calibration branch November 9, 2023 13:56
@giordano giordano deleted the mg/calibration branch November 9, 2023 13:56
4 participants