-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTC-2618 Allow appending layers to existing tables #525
Conversation
…ersion) We still don't have permissions correct for doing the needed PutBuckedLifecycleConfiguration operation done by expire. And the failed background job is causing the main delete request to fail with a 504. (Same for deleting tile caches directly.)
Disable expire_s3_objects call in delete_all_assets (used by delete_version)
Merge to production
Make name field optional in User model
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #525 +/- ##
===========================================
- Coverage 82.03% 81.96% -0.07%
===========================================
Files 126 126
Lines 5665 5689 +24
===========================================
+ Hits 4647 4663 +16
- Misses 1018 1026 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you missing some other test changes? It seems like you added test.gpkg.zip, but then didn't use it. Also, I was expecting another append test that explicitly tests your new code.
app/routes/datasets/versions.py
Outdated
elif request.layers is not None: | ||
input_data["creation_options"]["layers"] += request.layers | ||
else: | ||
input_data["creation_options"]["layers"] = request.layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this one. The original dataset does have a layers value, so why are you then replacing input_data["creation_options"]["layers"] with the request.layers value which is equal to None? It seems like you want to either have an error if request.layers is None or leave the current layers the same, right? Or can you add a comment here on why you are setting layers to None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input_data["creation_options"]
is the body for the append POST request, so we want to exclude the layers from the version creation creations for that operation. Ln 268-76 are updating the creation options object for the version itself. I added comments to reflect this.
update_data["creation_options"]["source_uri"] += request.source_uri # ERROR: only one source_uri is allowed | ||
if input_data["creation_options"].get("layers") is not None: | ||
if update_data["creation_options"]["layers"] is not None: | ||
update_data["creation_options"]["layers"] += request.layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If request.layers is None (which you check for explicitly at line 256 above), then you will get a Python error here when you try add None (which is not a list) to an existing list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this section to account for this. None
is now not being added to a list.
@@ -5,6 +5,8 @@ rw_api_url = "https://api.resourcewatch.org" | |||
desired_count = 2 | |||
auto_scaling_min_capacity = 2 | |||
auto_scaling_max_capacity = 15 | |||
fargate_cpu = 2048 | |||
fargate_memory = 4096 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is OK, but somehow you are including hotfixes from master into your change?
len(v) == 1 | ||
), "Non-CSV vector sources require one and only one input file" | ||
elif values.get("source_driver") in [VectorDrivers.esrijson, VectorDrivers.shp, VectorDrivers.geojson_seq, VectorDrivers.geojson]: | ||
assert (len(v) == 1), "GeoJSON and ESRI Shapefile vector sources require one and only one input file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're intending for this change to allow multiple input files for GPKG, but is that actually supported in the code? Maybe I'm looking in the wrong place, but in vector_source_assets.py it looks like we currently only make use of the first source_uri (perhaps because of the issues that might arise from specifying layers for multiple files?): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L254
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't for allowing multiple input files, rather its for updating the version creation options after the append operation is successful (there would be more than 1 source_uri in this case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see. But would it have the side effect of now allowing multiple GPKG sources to be specified, though those after the first will be silently ignored?
None, | ||
description="List of layer names to append to version. " | ||
"If not set, all layers in source_uri will be appended.", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me like if layers is not specified it will be assumed that there is only a layer named like the file (though I may be looking in the wrong place in the code): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L209-L214
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I corrected the description. Layer names are required for .GDB and .GPKG, otherwise the file name is used as the layer name.
app/routes/datasets/versions.py
Outdated
# If source_driver is "text", this is a datapump request | ||
if input_data["creation_options"]["source_driver"] != "text": | ||
# Verify that source_driver is not None | ||
if input_data["creation_options"]["source_driver"] is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pydanic model for appending should be updated to take in creation_options
(or update_options), right? That'd be the right place to validate input like so:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@manukala6, i left some comments, mainly I'd encourage updating the append pydantic model to include creation options to handle validation.
app/routes/datasets/versions.py
Outdated
) | ||
|
||
# Append the new layers to the existing ones | ||
if input_data["creation_options"].get("layers") is None: # ERROR: layers is not defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe meant this to be:
if input_data["creation_options"].get("layers") is None and request.layers is not None
?
Another way to simplify the whole logic could be:
if request.layers:
if input_data["creation_options"].get("layers") is None:
input_data["creation_options"]["layers"] += request.layers
else:
input_data["creation_options"]["layers"] += request.layers
Another alternative is to set the default layers value to []
but would be good to check that doesn't introduce breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input, I refactored it similarly.
Pull request checklist
Please check if your PR fulfills the following requirements:
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
Issue Number: GTC-2618
What is the new behavior?
Does this introduce a breaking change?
Other information