-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert excel ResourceFiles to csv - Add a method to read converted files #1425
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mnjowe
This looks firmly on the right tracks to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mnjowe - agree with @tbhallett this is looking good, have added some additional comments below.
Thanks @tbhallett and @matt-graham for your helpful comments above. I have pushed a couple of commits that seeks to address them all. Let's keep the discussion going. Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mnjowe. Thanks for your work on this. I've added some comments and suggested changes. The most critical is with regards to the use of shutil.rmtree
in one of the tests as I think the current implementation has some risk of unintentionally deleting files on systems the tests are run on in certain edge cases.
tests/test_utils.py
Outdated
@@ -317,3 +319,170 @@ def check_hash_is_valid(dfh): | |||
# check hash differs for different dataframes | |||
if not dataframes[i].equals(dataframes[j]): | |||
assert df_hash != tlo.util.hash_dataframe(dataframes[j]) | |||
|
|||
|
|||
def test_read_csv_files_method(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having multiple different test cases is a good idea, but ideally they should each be in a separate test function as that gives us more information when a test fails (currently for example if test case 1 fails we will not get the result for any of the remaining tests so not know if they would also fail or not). Keeping test functions small also makes them more readable and maintainable and makes it easier to diagnose the source of a failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! My next push will address this. Thanks
tests/test_utils.py
Outdated
excel_file_paths = [folder / file for file in files] \ | ||
if files is not None else [file for file in folder.rglob("*.xlsx")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we should avoid having code repeated between implementations and tests and aim to keep logic in tests as simple as possible. In this case I think it would be better to just have files
argument be non optional here and deal with explicitly constructing the list of Excel files to be checked outside of the call to check_logic_of_converting_excel_files_to_csv_files
using folder.rglob("*.xlsx")
for the case where files
argument to convert_excel_files_to_csv
is None
. This makes it more explicit that the expectation is in this case all .xslx
files should be converted in this case.
Thanks @matt-graham for all the helpful comments above. I will address them soon |
83f8ea3
to
dfc66a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mnjowe for your updates and addressing my comments. Not sure if you are still making changes to this or not, but from my perspective it looks good to merge.
Thanks @matt-graham. Just one more commit to address another of your comment on making tests small and simple. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mnjowe for the updates to split up the test functions - this looks great!
This PR aims at adding a method that will help reading the newly converted csv files in a similar manner as their excel equivalent were read. It also addresses situations where multiple sheets are required i.e
pd.read_excel(resourcefile_path, sheet_name=[sheet1, sheet2, ...., sheetN])
.