Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTC-2822 Check for dataset ownership on asset requests #515

Merged
merged 2 commits into from
May 10, 2024

Conversation

danscales
Copy link
Collaborator

GTC-2822 Check for dataset ownership on asset requests

Make use of get_owner in the asset operations.

The checks for the pure asset operations were slightly trickier than dataset/version operations, because the dataset name is available in the URL path. You need to do a DB operation to get the dataset name. So, I put the actual get_owner() call in the main function rather than in a Depends() argument. I didn't try to add a new Depends() function that magically does the DB operation, since some of the functions already do the DB operation for other reasons.

Updated the test for update_metadata and update_field_metadata to test success and failure cases for different users.

Added a missing check for the ADMIN role in get_owner() and updated its comment.

Updated the detail of the various permission exceptions to be more informative (GTC-2795). Let me know what you think. We could put a bit more information in these messages if we did these errors in the main functions, rather than in the Depends functions, but that is probably not worth it.

In test_dataset.py, several tests were being skipped unintentionally because they were missing the '@pytyest.mark.asyncio' annotation, so I added that in where needed.

Make use of get_owner in the asset operations.

The checks for the pure asset operations were slightly trickier than
dataset/version operations, because the dataset name is available in the
URL path. You need to do a DB operation to get the dataset name. So, I
put the actual get_owner() call in the main function rather than in a
Depends() argument. I didn't try to add a new Depends() function that
magically does the DB operation, since some of the functions already do
the DB operation for other reasons.

Updated the test for update_metadata and update_field_metadata to test
success and failure cases for different users.

Added a missing check for the ADMIN role in get_owner() and updated its
comment.

Updated the detail of the various permission exceptions to be more
informative (GTC-2795). Let me know what you think. We could put a bit
more information in these messages if we did these errors in the main
functions, rather than in the Depends functions, but that is probably
not worth it.

In test_dataset.py, several tests were being skipped unintentionally
because they were missing the '@pytyest.mark.asyncio' annotation, so I
added that in where needed.

if user.role == "ADMIN":
return user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, I thought I had it in the code but seem to have lost it. Justin caught it too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make mine match this

# Create a dataset
app.dependency_overrides[get_manager] = get_admin_mocked
# Create a dataset with a manager, then make sure get_owner succeeds with an admin.
app.dependency_overrides[get_manager] = get_manager_mocked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the explanations!

raise HTTPException(status_code=404, detail=str(e))

# This is the actual check that the user is either the dataset owner or an admin
_ = await get_owner(asset_row.dataset, user)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can get the asset row/dataset ID through a chained depends function, or do you think that's not preferable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I discussed, I didn't want to do that, since some of the functions already fetch the asset row for their own reasons, so we would be doing the same DB access twice.


if user.role == "ADMIN":
return user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make mine match this

dataset_row: ORMDataset = await datasets.get_dataset(dataset)
owner: str = dataset_row.owner_id
if owner != user.id:
raise HTTPException(status_code=401, detail="Unauthorized")
raise HTTPException(status_code=401, detail=f"Unauthorized write access to dataset {dataset} (or its versions/assets) by a user who is not an admin or owner of the dataset")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include actually the email of the owner? Otherwise they won't know who to contact. I added a function in another PR to look up a user by ID, I can add that part here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave GTC-2795 open (better access denied error messages) and fix this in a later change, just so I can get the current change in today.

Thanks for the comment/suggestion!

# generic_vector_source_version creates with owner of manager_mocked, so
# update by manager_mocked should succeed
appmain.dependency_overrides[get_user] = get_manager_mocked

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily in this PR, but wondering if there's a way for us to turn this into a reusable helper function since we keep copying this code block

@codecov-commenter
Copy link

codecov-commenter commented May 10, 2024

Codecov Report

Attention: Patch coverage is 37.50000% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 81.71%. Comparing base (302db98) to head (7c81f3f).

Files Patch % Lines
app/routes/assets/asset.py 33.33% 12 Missing ⚠️
app/authentication/token.py 0.00% 3 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                   Coverage Diff                    @@
##           feature/data_manager     #515      +/-   ##
========================================================
- Coverage                 81.86%   81.71%   -0.16%     
========================================================
  Files                       125      125              
  Lines                      5565     5583      +18     
========================================================
+ Hits                       4556     4562       +6     
- Misses                     1009     1021      +12     
Flag Coverage Δ
unittests 81.71% <37.50%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@danscales danscales merged commit 6b0285b into feature/data_manager May 10, 2024
2 checks passed
@danscales danscales deleted the gtc-2822 branch May 10, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants