GTC-2822 Check for dataset ownership on asset requests #515
Make use of get_owner in the asset operations.

The checks for the pure asset operations were slightly trickier than for the dataset/version operations, because the dataset name is not available in the URL path. You need to do a DB operation to get the dataset name. So I put the actual get_owner() call in the main function rather than in a Depends() argument. I didn't try to add a new Depends() function that magically does the DB operation, since some of the functions already do the DB operation for other reasons.

Updated the tests for update_metadata and update_field_metadata to cover success and failure cases for different users.

Added a missing check for the ADMIN role in get_owner() and updated its comment.

Updated the detail of the various permission exceptions to be more informative (GTC-2795). Let me know what you think. We could put a bit more information in these messages if we raised these errors in the main functions rather than in the Depends functions, but that is probably not worth it.

In test_dataset.py, several tests were being skipped unintentionally because they were missing the '@pytest.mark.asyncio' marker, so I added it where needed.
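For anyone wondering why a missing marker matters, here is a minimal stdlib-only sketch (the test name is hypothetical, not one from test_dataset.py). Calling an async test function without awaiting it only builds a coroutine object, so the assertion inside never runs. The body executes only when an event loop drives it, which is what the pytest-asyncio marker arranges.

```python
import asyncio


async def test_always_fails() -> None:
    # Hypothetical test body: this assertion only runs if the coroutine is awaited.
    assert 1 + 1 == 3, "body was executed"


# Calling the function does not run the body; it just builds a coroutine object.
coro = test_always_fails()
is_coroutine = asyncio.iscoroutine(coro)
coro.close()  # avoid a "coroutine was never awaited" RuntimeWarning

# Driving the coroutine on an event loop actually executes the assertion.
try:
    asyncio.run(test_always_fails())
    failed_when_awaited = False
except AssertionError:
    failed_when_awaited = True
```

So a failing async test can look harmless until something actually awaits it.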
app/routes/datasets/dataset.py (outdated)
    if user.role == "ADMIN":
        return user
Thanks for catching this, I thought I had it in the code but seem to have lost it. Justin caught it too.
I'll make mine match this
-    # Create a dataset
-    app.dependency_overrides[get_manager] = get_admin_mocked
+    # Create a dataset with a manager, then make sure get_owner succeeds with an admin.
+    app.dependency_overrides[get_manager] = get_manager_mocked
Thanks for adding the explanations!
        raise HTTPException(status_code=404, detail=str(e))

    # This is the actual check that the user is either the dataset owner or an admin
    _ = await get_owner(asset_row.dataset, user)
Is there a way we can get the asset row/dataset ID through a chained depends function, or do you think that's not preferable?
As I discussed, I didn't want to do that, since some of the functions already fetch the asset row for their own reasons, so we would be doing the same DB access twice.
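To make the trade-off concrete, here is a rough sketch of the inline pattern, assuming a handler that already needs the asset row. It is not the code in this PR: `HTTPError` is a stand-in for FastAPI's `HTTPException` so the sketch runs without FastAPI, and `User`, `AssetRow`, the in-memory tables, `db_reads`, `get_asset`, `update_asset`, and the simplified `get_owner` are all hypothetical.

```python
import asyncio
from dataclasses import dataclass


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class User:
    id: str
    role: str


@dataclass
class AssetRow:
    asset_id: str
    dataset: str


# Hypothetical in-memory tables standing in for the database.
ASSETS = {"asset-1": AssetRow("asset-1", "forest_cover")}
DATASET_OWNERS = {"forest_cover": "alice"}
db_reads = {"assets": 0}  # counts asset-row fetches


async def get_asset(asset_id: str) -> AssetRow:
    db_reads["assets"] += 1
    try:
        return ASSETS[asset_id]
    except KeyError as e:
        raise HTTPError(404, f"asset {asset_id} not found") from e


async def get_owner(dataset: str, user: User) -> User:
    # Simplified ownership check: admins and the dataset owner pass.
    if user.role == "ADMIN" or DATASET_OWNERS[dataset] == user.id:
        return user
    raise HTTPError(401, f"Unauthorized write access to dataset {dataset}")


async def update_asset(asset_id: str, user: User) -> str:
    asset_row = await get_asset(asset_id)  # the handler needs this row anyway
    _ = await get_owner(asset_row.dataset, user)  # the check reuses the same row
    return f"updated {asset_row.asset_id}"


result = asyncio.run(update_asset("asset-1", User("alice", "MANAGER")))
```

The asset row is fetched once per request here. A chained dependency that resolves the dataset name on its own would fetch it a second time in any handler that also needs the row.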
     dataset_row: ORMDataset = await datasets.get_dataset(dataset)
     owner: str = dataset_row.owner_id
     if owner != user.id:
-        raise HTTPException(status_code=401, detail="Unauthorized")
+        raise HTTPException(status_code=401, detail=f"Unauthorized write access to dataset {dataset} (or its versions/assets) by a user who is not an admin or owner of the dataset")
Should we actually include the owner's email? Otherwise they won't know who to contact. I added a function in another PR to look up a user by ID; I can add that part here.
I'll leave GTC-2795 open (better access denied error messages) and fix this in a later change, just so I can get the current change in today.
Thanks for the comment/suggestion!
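For reference, the check discussed in this thread can be sketched roughly as follows. This is an approximation under stated assumptions, not the actual get_owner(): `HTTPError` stands in for FastAPI's `HTTPException`, the `DATASET_OWNERS` dict replaces the `datasets.get_dataset()` lookup, and `User` is a hypothetical minimal type. The admin check and the detail string follow the diffs above.

```python
import asyncio
from dataclasses import dataclass


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class User:
    id: str
    role: str


# Hypothetical owner table standing in for the dataset row lookup.
DATASET_OWNERS = {"forest_cover": "alice"}


async def get_owner(dataset: str, user: User) -> User:
    """Return the user if they may write to the dataset, else raise a 401."""
    # Admins may modify any dataset, so the ownership lookup is skipped.
    if user.role == "ADMIN":
        return user

    owner = DATASET_OWNERS[dataset]
    if owner != user.id:
        raise HTTPError(
            401,
            f"Unauthorized write access to dataset {dataset} (or its "
            "versions/assets) by a user who is not an admin or owner of the dataset",
        )
    return user


admin = asyncio.run(get_owner("forest_cover", User("root", "ADMIN")))
owner = asyncio.run(get_owner("forest_cover", User("alice", "MANAGER")))
```

Adding the owner's email to the message would need an extra user lookup, which is the part deferred to GTC-2795.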
    # generic_vector_source_version creates with owner of manager_mocked, so
    # update by manager_mocked should succeed
    appmain.dependency_overrides[get_user] = get_manager_mocked
Not necessarily in this PR, but wondering if there's a way for us to turn this into a reusable helper function since we keep copying this code block
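One possible shape for such a helper, sketched as a context manager that installs an override and restores the previous state afterwards. The name `override_dependency` and the `FakeApp` class are hypothetical. The only thing assumed about the real app is that `dependency_overrides` is a plain dict, as it is on a FastAPI application.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator


class FakeApp:
    """Stand-in exposing the same dict attribute as a FastAPI app."""

    def __init__(self) -> None:
        self.dependency_overrides: Dict[Callable[..., Any], Callable[..., Any]] = {}


@contextmanager
def override_dependency(
    app: Any, dependency: Callable[..., Any], replacement: Callable[..., Any]
) -> Iterator[None]:
    """Temporarily replace one dependency, then restore the earlier overrides."""
    previous = dict(app.dependency_overrides)
    app.dependency_overrides[dependency] = replacement
    try:
        yield
    finally:
        # Restore exactly what was configured before entering the block.
        app.dependency_overrides.clear()
        app.dependency_overrides.update(previous)


def get_user() -> str:
    return "real-user"


def get_manager_mocked() -> str:
    return "manager"


app = FakeApp()
with override_dependency(app, get_user, get_manager_mocked):
    inside = app.dependency_overrides[get_user]
after = dict(app.dependency_overrides)
```

Each test would then wrap its requests in a `with` block instead of assigning to `dependency_overrides` directly, and overrides would not leak between tests.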
Codecov Report
@@                 Coverage Diff                  @@
##   feature/data_manager     #515      +/-     ##
==================================================
- Coverage       81.86%     81.71%    -0.16%
==================================================
  Files             125        125
  Lines            5565       5583      +18
==================================================
+ Hits             4556       4562       +6
- Misses           1009       1021      +12