GTC-2618 Allow appending layers to existing tables #525
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,8 @@ | ||
from typing import List, Optional, Tuple, Union | ||
|
||
from pydantic import BaseModel, Field | ||
from pydantic import BaseModel, Field, validator | ||
|
||
from ..enum.creation_options import VectorDrivers | ||
from ..enum.versions import VersionStatus | ||
from .base import BaseRecord, StrictBaseModel | ||
from .creation_options import SourceCreationOptions | ||
|
@@ -59,7 +60,14 @@ class VersionUpdateIn(StrictBaseModel): | |
|
||
class VersionAppendIn(StrictBaseModel): | ||
source_uri: List[str] | ||
|
||
source_driver: Optional[VectorDrivers] = Field( | ||
None, description="Driver of source file. Must be an OGR driver" | ||
) | ||
layers: Optional[List[str]] = Field( | ||
None, | ||
description="List of layer names to append to version. " | ||
"If not set, all layers in source_uri will be appended.", | ||
) | ||
It looks to me like if layers is not specified, it will be assumed that there is only one layer, named like the file (though I may be looking in the wrong place in the code): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L209-L214
Good catch, I corrected the description. Layer names are required for .GDB and .GPKG; otherwise the file name is used as the layer name. |
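The rule described in the reply (explicit layer names for .GDB and .GPKG, file name as the layer name otherwise) could be sketched roughly as follows. `resolve_layers` and `MULTI_LAYER_SUFFIXES` are hypothetical names that do not appear in the PR, and the actual logic in vector_source_assets.py may differ.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Assumption: these are the multi-layer formats that need explicit layer names.
MULTI_LAYER_SUFFIXES = {".gdb", ".gpkg"}


def resolve_layers(source_uri: str, layers: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: return the layer names to load from one source file."""
    if layers:
        # Explicit layer names always win.
        return list(layers)
    name = PurePosixPath(source_uri).name
    suffixes = {suffix.lower() for suffix in PurePosixPath(name).suffixes}
    if suffixes & MULTI_LAYER_SUFFIXES:
        raise ValueError(f"Layer names are required for {name}")
    # Fall back to the file name with every extension stripped.
    return [name.split(".")[0]]
```

With this sketch, a GeoJSON source without `layers` resolves to a single layer named after the file, while a GDB or GPKG source without `layers` is rejected up front.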
||
|
||
class VersionResponse(Response): | ||
data: Version |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ | |
from copy import deepcopy | ||
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union, cast | ||
from urllib.parse import urlparse | ||
import fiona | ||
|
||
from fastapi import ( | ||
APIRouter, | ||
|
@@ -211,8 +212,7 @@ async def update_version( | |
"/{dataset}/{version}/append", | ||
response_class=ORJSONResponse, | ||
tags=["Versions"], | ||
response_model=VersionResponse, | ||
deprecated=True, | ||
response_model=VersionResponse | ||
) | ||
async def append_to_version( | ||
*, | ||
|
@@ -240,13 +240,36 @@ async def append_to_version( | |
# For the background task, we only need the new source uri from the request | ||
input_data = {"creation_options": deepcopy(default_asset.creation_options)} | ||
input_data["creation_options"]["source_uri"] = request.source_uri | ||
|
||
# If source_driver is "text", this is a datapump request | ||
if input_data["creation_options"]["source_driver"] != "text": | ||
|
||
# Verify that source_driver is not None | ||
if input_data["creation_options"]["source_driver"] is None: | ||
The pydantic model for appending should be updated to take in |
||
raise HTTPException( | ||
status_code=400, | ||
detail="Source driver must be specified for non-datapump requests." | ||
) | ||
|
||
# Append the new layers to the existing ones | ||
if input_data["creation_options"].get("layers") is None: # ERROR: layers is not defined | ||
Maybe you meant this to be:
Another way to simplify the whole logic could be:
Another alternative is to set the default layers value to
Thanks for the input, I refactored it similarly. |
||
input_data["creation_options"]["layers"] = request.layers | ||
elif request.layers is not None: | ||
input_data["creation_options"]["layers"] += request.layers | ||
else: | ||
input_data["creation_options"]["layers"] = request.layers | ||
I don't understand this one. The original dataset does have a layers value, so why are you replacing input_data["creation_options"]["layers"] with request.layers, which is None in this branch? It seems like you want to either raise an error if request.layers is None or leave the current layers unchanged, right? Or can you add a comment here on why you are setting layers to None?
|
||
|
||
background_tasks.add_task( | ||
append_default_asset, dataset, version, input_data, default_asset.asset_id | ||
) | ||
|
||
# We now want to append the new uris to the existing ones and update the asset | ||
update_data = {"creation_options": deepcopy(default_asset.creation_options)} | ||
update_data["creation_options"]["source_uri"] += request.source_uri | ||
update_data["creation_options"]["source_uri"] += request.source_uri # ERROR: only one source_uri is allowed | ||
|
||
if input_data["creation_options"].get("layers") is not None: | ||
if update_data["creation_options"]["layers"] is not None: | ||
update_data["creation_options"]["layers"] += request.layers | ||
If request.layers is None (which you check for explicitly at line 256 above), then you will get a Python error here when you try to add None (which is not a list) to an existing list.
I refactored this section to account for this. |
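The layer-handling comments ask for simpler logic and note that `+=` fails when `request.layers` is None. One way to cover every combination in a single place is sketched below. `merge_layers` is a hypothetical helper, not the refactor that was actually committed, and it assumes that a missing `layers` value in the request means "leave the stored layers unchanged".

```python
from typing import List, Optional


def merge_layers(
    existing: Optional[List[str]], requested: Optional[List[str]]
) -> Optional[List[str]]:
    """Hypothetical helper: combine stored layer names with newly requested ones."""
    if requested is None:
        # Nothing new was requested, so keep whatever is already stored.
        return existing
    if existing is None:
        return list(requested)
    # Build a new list instead of using +=, which would mutate `existing`.
    return existing + requested
```

Under these assumptions the update step would reduce to something like `update_data["creation_options"]["layers"] = merge_layers(update_data["creation_options"].get("layers"), request.layers)`, with no branch that can add None to a list.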
||
else: | ||
update_data["creation_options"]["layers"] = request.layers | ||
|
||
await assets.update_asset(default_asset.asset_id, **update_data) | ||
|
||
version_orm: ORMVersion = await versions.get_version(dataset, version) | ||
|
@@ -536,6 +559,17 @@ async def _version_response( | |
|
||
return VersionResponse(data=Version(**data)) | ||
|
||
#def _verify_layer_exists(source_uri: List[str], layes: List[str]) -> None: | ||
# with fiona.open(source_uri[0].replace("s3://", "/vsizip//vsis3/"), "r") as src: | ||
# layers = src.layer_names | ||
# for layer in layers: | ||
# if layer in layers: | ||
# return | ||
# else: | ||
# raise HTTPException( | ||
# status_code=400, | ||
# detail=f"Layer {layer} not found in source file." | ||
# ) | ||
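As written, the commented-out `_verify_layer_exists` misspells its parameter as `layes`, rebinds `layers` to the names read from the file, and then tests each of those names against the same list, so the check can never fail. A minimal sketch of the comparison it seems to intend is shown below, kept separate from file access so it can be tested on its own. `find_missing_layers` is a hypothetical name; the available names would presumably be read from the source file (for example with `fiona.listlayers`, which this sketch does not call), and a non-empty result would be turned into a 400 response by the caller.

```python
from typing import Iterable, List


def find_missing_layers(requested: Iterable[str], available: Iterable[str]) -> List[str]:
    """Hypothetical helper: return requested layer names that the source lacks."""
    available_set = set(available)
    # Preserve request order so the error message matches the user's input.
    return [layer for layer in requested if layer not in available_set]
```

Returning the full list of missing names, rather than raising on the first one, lets the error detail report every bad layer in a single response.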
|
||
|
||
def _verify_source_file_access(sources: List[str]) -> None: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,8 @@ rw_api_url = "https://api.resourcewatch.org" | |
desired_count = 2 | ||
auto_scaling_min_capacity = 2 | ||
auto_scaling_max_capacity = 15 | ||
fargate_cpu = 2048 | ||
fargate_memory = 4096 | ||
I guess this is OK, but somehow you are including hotfixes from master into your change? |
||
lambda_analysis_workspace = "default" | ||
key_pair = "dmannarino_gfw" | ||
new_relic_license_key_arn = "arn:aws:secretsmanager:us-east-1:401951483516:secret:newrelic/license_key-CyqUPX" |
I think you're intending for this change to allow multiple input files for GPKG, but is that actually supported in the code? Maybe I'm looking in the wrong place, but in vector_source_assets.py it looks like we currently only make use of the first source_uri (perhaps because of the issues that might arise from specifying layers for multiple files?): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L254

This wasn't for allowing multiple input files; rather, it's for updating the version creation options after the append operation succeeds (there would be more than one source_uri in this case).

Okay, I see. But would it have the side effect of now allowing multiple GPKG sources to be specified, though those after the first will be silently ignored?
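If only the first source_uri is used for vector sources, as the reviewer suggests, one option for avoiding silently ignored inputs is to reject such requests before the background task is queued. The guard below is a sketch under that assumption; `check_single_source` is a hypothetical name, and the thread does not say whether the PR adopted anything like it.

```python
from typing import List


def check_single_source(source_uri: List[str]) -> str:
    """Hypothetical guard: return the only source URI, or raise if there is not exactly one."""
    if len(source_uri) != 1:
        raise ValueError(f"Expected exactly one source_uri, got {len(source_uri)}")
    return source_uri[0]
```

In the endpoint this would apply only to the incoming `request.source_uri`; the stored creation options could still accumulate one URI per successful append.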