Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert excel ResourceFiles to csv - Add a method to read converted files #1425

Merged
merged 8 commits into from
Oct 8, 2024

Conversation

mnjowe
Copy link
Collaborator

@mnjowe mnjowe commented Jul 17, 2024

This PR aims at adding a method that will help reading the newly converted csv files in a similar manner as their excel equivalent were read. It also addresses situations where multiple sheets are required i.e pd.read_excel(resourcefile_path, sheet_name=[sheet1, sheet2, ...., sheetN]).

@mnjowe mnjowe requested review from matt-graham and tbhallett July 17, 2024 08:45
@mnjowe mnjowe linked an issue Jul 17, 2024 that may be closed by this pull request
@mnjowe mnjowe self-assigned this Jul 17, 2024
Copy link
Collaborator

@tbhallett tbhallett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mnjowe

This looks firmly on the right tracks to me.

src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mnjowe - agree with @tbhallett this is looking good, have added some additional comments below.

src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
@tamuri tamuri mentioned this pull request Jul 23, 2024
@mnjowe
Copy link
Collaborator Author

mnjowe commented Jul 25, 2024

Thanks @tbhallett and @matt-graham for your helpful comments above. I have pushed a couple of commits that seeks to address them all. Let's keep the discussion going. Thanks

@mnjowe mnjowe changed the title Convert excel ResourceFiles to csv Convert excel ResourceFiles to csv - Add a method to read converted files Jul 26, 2024
@mnjowe mnjowe marked this pull request as ready for review September 4, 2024 09:40
Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mnjowe. Thanks for your work on this. I've added some comments and suggested changes. The most critical is with regards to the use of shutil.rmtree in one of the tests as I think the current implementation has some risk of unintentionally deleting files on systems the tests are run on in certain edge cases.

src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
src/tlo/util.py Outdated Show resolved Hide resolved
@@ -317,3 +319,170 @@ def check_hash_is_valid(dfh):
# check hash differs for different dataframes
if not dataframes[i].equals(dataframes[j]):
assert df_hash != tlo.util.hash_dataframe(dataframes[j])


def test_read_csv_files_method():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having multiple different test cases is a good idea, but ideally they should each be in a separate test function as that gives us more information when a test fails (currently for example if test case 1 fails we will not get the result for any of the remaining tests so not know if they would also fail or not). Keeping test functions small also makes them more readable and maintainable and makes it easier to diagnose the source of a failure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! My next push will address this. Thanks

Comment on lines 455 to 456
excel_file_paths = [folder / file for file in files] \
if files is not None else [file for file in folder.rglob("*.xlsx")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we should avoid having code repeated between implementations and tests and aim to keep logic in tests as simple as possible. In this case I think it would be better to just have files argument be non optional here and deal with explicitly constructing the list of Excel files to be checked outside of the call to check_logic_of_converting_excel_files_to_csv_files using folder.rglob("*.xlsx") for the case where files argument to convert_excel_files_to_csv is None. This makes it more explicit that the expectation is in this case all .xslx files should be converted in this case.

tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
@mnjowe
Copy link
Collaborator Author

mnjowe commented Sep 6, 2024

Thanks @matt-graham for all the helpful comments above. I will address them soon

@mnjowe mnjowe force-pushed the mnjowe/convert_xlsx_to_csv branch from 83f8ea3 to dfc66a6 Compare September 11, 2024 13:30
Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mnjowe for your updates and addressing my comments. Not sure if you are still making changes to this or not, but from my perspective it looks good to merge.

@mnjowe
Copy link
Collaborator Author

mnjowe commented Sep 13, 2024

Thanks @matt-graham. Just one more commit to address another of your comment on making tests small and simple.

Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mnjowe for the updates to split up the test functions - this looks great!

@matt-graham matt-graham merged commit 8d0cfee into master Oct 8, 2024
60 checks passed
@matt-graham matt-graham deleted the mnjowe/convert_xlsx_to_csv branch October 8, 2024 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants