Convert excel ResourceFiles to csv - Add a method to read converted files #1425

mnjowe · 2024-07-17T08:45:56Z

This PR adds a method for reading the newly converted csv files in the same way their Excel equivalents were read. It also handles cases where multiple sheets are required, i.e. pd.read_excel(resourcefile_path, sheet_name=[sheet1, sheet2, ..., sheetN]).
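As a rough illustration of the behaviour described above, the sketch below reads a folder of csv files (one per former sheet) into a dict keyed by file stem, mirroring what pd.read_excel returns when sheet_name is a list. The function name read_csv_folder and the one-csv-per-sheet folder layout are assumptions made for this example, not the API added in this PR.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

import pandas as pd


def read_csv_folder(folder: Path, files: Optional[list[str]] = None) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: read csv files in `folder` into a dict keyed by file stem.

    Assumes each Excel sheet was converted to `<folder>/<sheet name>.csv`.
    If `files` is None, every csv file directly inside `folder` is read.
    """
    folder = Path(folder)
    if files is None:
        # No names given: behave like sheet_name=None and read everything.
        paths = sorted(folder.glob("*.csv"))
    else:
        # Names given: behave like sheet_name=[...] and read only those.
        paths = [folder / f"{name}.csv" for name in files]
    return {path.stem: pd.read_csv(path) for path in paths}
```

Calling read_csv_folder(folder, files=["sheet1", "sheet2"]) would then play the role of the sheet_name=[sheet1, sheet2] call quoted above, under the stated layout assumption.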

tbhallett

Thanks @mnjowe

This looks firmly on the right track to me.

src/tlo/util.py

matt-graham

Thanks @mnjowe - agree with @tbhallett this is looking good, have added some additional comments below.

src/tlo/util.py

mnjowe · 2024-07-25T08:41:44Z

Thanks @tbhallett and @matt-graham for your helpful comments above. I have pushed a couple of commits that seek to address them all. Let's keep the discussion going. Thanks

matt-graham

Hi @mnjowe. Thanks for your work on this. I've added some comments and suggested changes. The most critical concerns the use of shutil.rmtree in one of the tests: I think the current implementation risks unintentionally deleting files on the systems the tests are run on in certain edge cases.
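One common way to avoid that class of risk, sketched here as an assumption rather than as the change made in this PR, is to let the test work inside a temporary directory that is created and removed automatically, so no hand-built path is ever passed to shutil.rmtree. The function name and file names below are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_scratch_dir(source_file: Path) -> bool:
    """Copy `source_file` into a fresh temporary directory and check the copy exists.

    The directory is created by `tempfile` and removed when the `with` block
    exits, so cleanup can only ever touch files created inside it.
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)
        copied = Path(shutil.copy(source_file, scratch_dir))
        copy_exists = copied.exists()
    # After the block the scratch directory is gone; the original is untouched.
    return copy_exists and not scratch_dir.exists() and source_file.exists()
```

Under pytest, the built-in tmp_path fixture gives each test its own temporary pathlib.Path directory and serves the same purpose.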

src/tlo/util.py

matt-graham · 2024-09-05T12:58:04Z

tests/test_utils.py

@@ -317,3 +319,170 @@ def check_hash_is_valid(dfh):
            # check hash differs for different dataframes
            if not dataframes[i].equals(dataframes[j]):
                assert df_hash != tlo.util.hash_dataframe(dataframes[j])
+
+
+def test_read_csv_files_method():


Having multiple different test cases is a good idea, but ideally each should be in a separate test function, as that gives us more information when a test fails (currently, for example, if test case 1 fails we will not get results for any of the remaining test cases, so we will not know whether they would also fail). Keeping test functions small also makes them more readable and maintainable, and makes it easier to diagnose the source of a failure.

mnjowe · 2024-09-12

Great! My next push will address this. Thanks.
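To illustrate the suggestion about separate test functions, here is a minimal sketch of one large test split into small, independent tests that share a setup helper. The helper and test names are hypothetical and use only the standard library; under pytest each function is collected and reported separately, so a failure in one does not hide the result of the others.

```python
import csv
import tempfile
from pathlib import Path


def write_example_csv(folder: Path) -> Path:
    """Shared setup (hypothetical): write a two-row csv file and return its path."""
    path = folder / "example.csv"
    with path.open("w", newline="") as f:
        csv.writer(f).writerows([["id", "value"], ["1", "a"], ["2", "b"]])
    return path


def read_rows(path: Path) -> list:
    """Read a csv file into a list of row dictionaries."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def test_header_is_preserved():
    # One behaviour per test: only the column names are checked here.
    with tempfile.TemporaryDirectory() as tmp:
        rows = read_rows(write_example_csv(Path(tmp)))
        assert list(rows[0].keys()) == ["id", "value"]


def test_row_count_is_preserved():
    # A separate test, so it still runs and reports if the header test fails.
    with tempfile.TemporaryDirectory() as tmp:
        rows = read_rows(write_example_csv(Path(tmp)))
        assert len(rows) == 2
```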

matt-graham · 2024-09-05T13:42:04Z

tests/test_utils.py

+        excel_file_paths = [folder / file for file in files] \
+            if files is not None else [file for file in folder.rglob("*.xlsx")]


Generally we should avoid repeating code between implementations and tests, and aim to keep the logic in tests as simple as possible. In this case I think it would be better to make the files argument non-optional here and to construct the list of Excel files to be checked explicitly, outside the call to check_logic_of_converting_excel_files_to_csv_files, using folder.rglob("*.xlsx") for the case where the files argument to convert_excel_files_to_csv is None. This makes it more explicit that in this case all .xlsx files are expected to be converted.
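A sketch of what this suggestion could look like: the checking helper takes a required list of Excel paths, and the test for the files=None case builds that list itself with folder.rglob("*.xlsx"). The helper name check_csv_exists_for_each_excel_file and the assumed output layout (a folder named after each workbook, sitting next to it) are illustrative assumptions, not the code in this PR.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def check_csv_exists_for_each_excel_file(excel_file_paths: list[Path]) -> None:
    """Hypothetical check: each workbook has a sibling folder holding at least one csv file.

    `excel_file_paths` is required, so the caller states exactly which files
    are expected to have been converted.
    """
    for excel_path in excel_file_paths:
        csv_folder = excel_path.with_suffix("")
        assert csv_folder.is_dir(), f"missing folder for {excel_path.name}"
        assert any(csv_folder.glob("*.csv")), f"no csv files for {excel_path.name}"


def test_all_excel_files_converted_when_files_is_none():
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Stand-in for running the conversion with files=None (assumed layout).
        for name in ("a", "b"):
            (folder / f"{name}.xlsx").touch()
            (folder / name).mkdir()
            (folder / name / "Sheet1.csv").write_text("x\n1\n")
        # The test, not the helper, spells out the expectation: every .xlsx file.
        check_csv_exists_for_each_excel_file(list(folder.rglob("*.xlsx")))
```

Keeping the rglob call in the test body means the helper contains no branching that mirrors the implementation, which is the duplication the comment above warns against.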

tests/test_utils.py

mnjowe · 2024-09-06T08:46:31Z

Thanks @matt-graham for all the helpful comments above. I will address them soon.

matt-graham

Thanks @mnjowe for your updates and addressing my comments. Not sure if you are still making changes to this or not, but from my perspective it looks good to merge.

mnjowe · 2024-09-13T15:04:17Z

Thanks @matt-graham. Just one more commit to address another of your comments, on making tests small and simple.

matt-graham

Thanks @mnjowe for the updates to split up the test functions - this looks great!

mnjowe requested review from matt-graham and tbhallett July 17, 2024 08:45

mnjowe linked an issue Jul 17, 2024 that may be closed by this pull request

Convert '.xlsx' ResourceFiles to csv #1337

Open

mnjowe added framework epi labels Jul 17, 2024

mnjowe self-assigned this Jul 17, 2024

mnjowe mentioned this pull request Jul 17, 2024

Convert '.xlsx' ResourceFiles to csv #1337

Open

tbhallett reviewed Jul 23, 2024

matt-graham reviewed Jul 23, 2024

tamuri mentioned this pull request Jul 23, 2024

use of xlsx #1432

Closed

mnjowe changed the title ~~Convert excel ResourceFiles to csv~~ Convert excel ResourceFiles to csv - Add a method to read converted files Jul 26, 2024

mnjowe removed a link to an issue Jul 26, 2024

Convert '.xlsx' ResourceFiles to csv #1337

Open

mnjowe marked this pull request as ready for review September 4, 2024 09:40

matt-graham requested changes Sep 5, 2024

mnjowe added 3 commits September 11, 2024 09:55

addressing matt comments

b6a094e

Merge branch 'master' into mnjowe/convert_xlsx_to_csv

4abf8a8

get clean copies from master and add methods to convert Excel files to csv file. add tests

dfc66a6

mnjowe force-pushed the mnjowe/convert_xlsx_to_csv branch from 83f8ea3 to dfc66a6 Compare September 11, 2024 13:30

mnjowe added 2 commits September 11, 2024 16:23

copying and reading to temporal directory. added Excel file for testing. removed unused import

9d15fa5

Merge branch 'master' into mnjowe/convert_xlsx_to_csv

dfa657a

matt-graham approved these changes Sep 13, 2024

mnjowe and others added 3 commits September 16, 2024 09:01

splitting into smaller tests

0bf9727

Merge branch 'master' into mnjowe/convert_xlsx_to_csv

d47b61f

Merge branch 'master' into mnjowe/convert_xlsx_to_csv

1f69a48

matt-graham approved these changes Oct 8, 2024

matt-graham merged commit 8d0cfee into master Oct 8, 2024
60 checks passed

matt-graham deleted the mnjowe/convert_xlsx_to_csv branch October 8, 2024 09:27
