Support auto-setting AWS credentials for storage options #78

Merged

milesgranger
Collaborator

Will close #72

Also fixes #74 along the way.

In [1]: import dask_deltatable as ddt
   ...: import dask.dataframe as dd
   ...: import pandas as pd
   ...:
   ...: df = pd.DataFrame({"x": range(100), "y": pd.Series(range(100)) // 2})
   ...: ddf = dd.from_pandas(df, npartitions=4)
   ...:
   ...: ddt.to_deltalake('s3://dask-deltatable-ci/test-delta', ddf)
   ...: read = ddt.read_deltalake('s3://dask-deltatable-ci/test-delta')

In [2]: read
Out[2]:
Dask DataFrame Structure:
                   x      y
npartitions=4
               int64  int64
                 ...    ...
                 ...    ...
                 ...    ...
                 ...    ...
Dask Name: to_string_dtype, 2 expressions
Expr=ArrowStringConversion(frame=FromMapProjectable(cdd6cff))

@mrocklin, I haven't added tests for this yet. Would you prefer integration-style tests that do actual S3 writes to a protected bucket? In that case we could add some limited credentials here for something like the s3://dask-deltalake-ci bucket. Or do you think mocking it is okay?

@milesgranger milesgranger force-pushed the milesgranger/72-auto-provide-s3-creds branch from cfbc44f to 14e4c93 Compare March 15, 2024 11:16
@codecov-commenter

Codecov Report

Attention: Patch coverage is 52.63158%, with 18 lines in your changes missing coverage. Please review.

Project coverage is 71.62%. Comparing base (cbe085a) to head (14e4c93).
Report is 1 commit behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| dask_deltatable/utils.py | 26.08% | 17 Missing ⚠️ |
| dask_deltatable/core.py | 85.71% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
- Coverage   74.06%   71.62%   -2.44%     
==========================================
  Files           6        6              
  Lines         320      356      +36     
==========================================
+ Hits          237      255      +18     
- Misses         83      101      +18     

@milesgranger milesgranger self-assigned this Mar 15, 2024
@mrocklin
Contributor

mrocklin commented Mar 15, 2024 via email

Collaborator

@jacobtomlinson jacobtomlinson left a comment

It's probably best if tests don't rely on external resources. They'll just end up brittle or skipped for most folks. Given how well defined the S3 API is, I think mocking is a good approach here.
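For illustration, a minimal sketch of what a mocked test along these lines could look like. `fill_aws_storage_options` and `make_fake_session` are hypothetical names made up for this example; this is not the helper added in the PR. The sketch assumes credentials come from a boto3-style session object (with `get_credentials()` and `region_name`) that is passed in as a factory, so the test can swap in a fake and never touch AWS.

```python
from types import SimpleNamespace
from unittest import mock


def fill_aws_storage_options(path, options, session_factory):
    """Hypothetical helper, not the function added in this PR.

    For s3:// paths, copy credentials from a boto3-style session into a copy
    of ``options``, unless the caller already supplied AWS-specific keys.
    """
    if not str(path).startswith("s3://"):
        return options
    if any(key.lower().startswith("aws_") for key in options):
        return options  # respect explicit user settings

    session = session_factory()
    creds = session.get_credentials()
    if creds is None:
        return options  # no credentials found, nothing to fill in

    filled = dict(options)
    filled["AWS_ACCESS_KEY_ID"] = creds.access_key
    filled["AWS_SECRET_ACCESS_KEY"] = creds.secret_key
    if creds.token:
        filled["AWS_SESSION_TOKEN"] = creds.token
    if session.region_name:
        filled["AWS_REGION"] = session.region_name
    return filled


def make_fake_session(token=None):
    """Stand-in for a boto3 session so the tests need no network access."""
    session = mock.Mock()
    session.region_name = "us-east-1"
    session.get_credentials.return_value = SimpleNamespace(
        access_key="fake-key", secret_key="fake-secret", token=token
    )
    return session


def test_fills_credentials_for_s3_path():
    session = make_fake_session(token="fake-token")
    result = fill_aws_storage_options("s3://bucket/table", {}, lambda: session)
    assert result == {
        "AWS_ACCESS_KEY_ID": "fake-key",
        "AWS_SECRET_ACCESS_KEY": "fake-secret",
        "AWS_SESSION_TOKEN": "fake-token",
        "AWS_REGION": "us-east-1",
    }


def test_keeps_user_supplied_options():
    factory = mock.Mock()
    user_options = {"AWS_ACCESS_KEY_ID": "mine"}
    result = fill_aws_storage_options("s3://bucket/table", user_options, factory)
    assert result == user_options
    factory.assert_not_called()  # the session is never created


def test_ignores_non_s3_path():
    factory = mock.Mock()
    assert fill_aws_storage_options("/tmp/table", {}, factory) == {}
    factory.assert_not_called()
```

In real code the factory would presumably be something like `boto3.Session` (assuming boto3 is installed); injecting it keeps the credential lookup out of the test path entirely.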

@milesgranger milesgranger requested a review from jrbourbeau March 20, 2024 11:49
@fjetter fjetter merged commit ec1c90c into dask-contrib:main Mar 21, 2024
10 checks passed
@milesgranger milesgranger deleted the milesgranger/72-auto-provide-s3-creds branch March 21, 2024 08:12
Development

Successfully merging this pull request may close these issues.

ImportError with deltalake=0.16.0
Specify AWS Permissions if reading from S3
5 participants