Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTC-2618 Allow appending layers to existing tables #525

Merged
merged 25 commits into from
Jul 1, 2024

Conversation

manukala6
Copy link
Member

@manukala6 manukala6 commented May 21, 2024

Pull request checklist

Please check if your PR fulfills the following requirements:

  • Make sure you are requesting to pull a topic/feature/bugfix branch (right side). Don't request your master!
  • Make sure you are making a pull request against the develop branch (left side). Also you should start your branch off our develop.
  • Check the commit's or even all commits' message styles matches our requested structure.
  • Check your code additions will fail neither code linting checks nor unit test.

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

Issue Number: GTC-2618

What is the new behavior?

  • Append layers or static files to existing tables on the API
  • Remove deprecated status for append to version route
  • Handle appends to test files (gfw-datapump updates) differently from other appends

Does this introduce a breaking change?

  • Yes
  • No

Other information

@codecov-commenter
Copy link

codecov-commenter commented Jun 13, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 70.96774% with 9 lines in your changes missing coverage. Please review.

Project coverage is 81.96%. Comparing base (e5140f8) to head (f75a142).

Files Patch % Lines
app/routes/assets/asset.py 41.66% 7 Missing ⚠️
app/routes/datasets/versions.py 83.33% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #525      +/-   ##
===========================================
- Coverage    82.03%   81.96%   -0.07%     
===========================================
  Files          126      126              
  Lines         5665     5689      +24     
===========================================
+ Hits          4647     4663      +16     
- Misses        1018     1026       +8     
Flag Coverage Δ
unittests 81.96% <70.96%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@manukala6 manukala6 marked this pull request as ready for review June 13, 2024 15:59
@manukala6 manukala6 changed the base branch from master to develop June 13, 2024 16:10
@manukala6 manukala6 changed the title GTC-2618 Allow appends of GPKG layers to table GTC-2618 Allow appending layers to existing tables Jun 13, 2024
Copy link
Collaborator

@danscales danscales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you missing some other test changes? It seems like you added test.gpkg.zip, but then didn't use it. Also, I was expecting another append test that explicitly tests your new code.

elif request.layers is not None:
input_data["creation_options"]["layers"] += request.layers
else:
input_data["creation_options"]["layers"] = request.layers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this one. The original dataset does have a layers value, so why are you then replacing input_data["creation_options"]["layers"] with the request.layers value which is equal to None? It seems like you want to either have an error if request.layers is None or leave the current layers the same, right? Or can you add a comment here on why you are setting layers to None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input_data["creation_options"] is the body for the append POST request, so we want to exclude the layers from the version creation creations for that operation. Ln 268-76 are updating the creation options object for the version itself. I added comments to reflect this.

app/routes/datasets/versions.py Outdated Show resolved Hide resolved
app/routes/datasets/versions.py Outdated Show resolved Hide resolved
app/routes/datasets/versions.py Show resolved Hide resolved
update_data["creation_options"]["source_uri"] += request.source_uri # ERROR: only one source_uri is allowed
if input_data["creation_options"].get("layers") is not None:
if update_data["creation_options"]["layers"] is not None:
update_data["creation_options"]["layers"] += request.layers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If request.layers is None (which you check for explicitly at line 256 above), then you will get a Python error here when you try add None (which is not a list) to an existing list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this section to account for this. None is now not being added to a list.

@@ -5,6 +5,8 @@ rw_api_url = "https://api.resourcewatch.org"
desired_count = 2
auto_scaling_min_capacity = 2
auto_scaling_max_capacity = 15
fargate_cpu = 2048
fargate_memory = 4096
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is OK, but somehow you are including hotfixes from master into your change?

len(v) == 1
), "Non-CSV vector sources require one and only one input file"
elif values.get("source_driver") in [VectorDrivers.esrijson, VectorDrivers.shp, VectorDrivers.geojson_seq, VectorDrivers.geojson]:
assert (len(v) == 1), "GeoJSON and ESRI Shapefile vector sources require one and only one input file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're intending for this change to allow multiple input files for GPKG, but is that actually supported in the code? Maybe I'm looking in the wrong place, but in vector_source_assets.py it looks like we currently only make use of the first source_uri (perhaps because of the issues that might arise from specifying layers for multiple files?): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L254

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't for allowing multiple input files, rather its for updating the version creation options after the append operation is successful (there would be more than 1 source_uri in this case)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I see. But would it have the side effect of now allowing multiple GPKG sources to be specified, though those after the first will be silently ignored?

None,
description="List of layer names to append to version. "
"If not set, all layers in source_uri will be appended.",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like if layers is not specified it will be assumed that there is only a layer named like the file (though I may be looking in the wrong place in the code): https://github.com/wri/gfw-data-api/blob/master/app/tasks/vector_source_assets.py#L209-L214

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I corrected the description. Layer names are required for .GDB and .GPKG, otherwise the file name is used as the layer name.

# If source_driver is "text", this is a datapump request
if input_data["creation_options"]["source_driver"] != "text":
# Verify that source_driver is not None
if input_data["creation_options"]["source_driver"] is None:
Copy link
Member

@solomon-negusse solomon-negusse Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pydanic model for appending should be updated to take in creation_options (or update_options), right? That'd be the right place to validate input like so:

Copy link
Member

@solomon-negusse solomon-negusse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manukala6, i left some comments, mainly I'd encourage updating the append pydantic model to include creation options to handle validation.

)

# Append the new layers to the existing ones
if input_data["creation_options"].get("layers") is None: # ERROR: layers is not defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe meant this to be:
if input_data["creation_options"].get("layers") is None and request.layers is not None?

Another way to simplify the whole logic could be:

if request.layers:
    if input_data["creation_options"].get("layers") is None:
        input_data["creation_options"]["layers"] += request.layers
    else:
        input_data["creation_options"]["layers"] += request.layers

Another alternative is to set the default layers value to [] but would be good to check that doesn't introduce breaking changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input, I refactored it similarly.

app/routes/datasets/versions.py Show resolved Hide resolved
@manukala6 manukala6 merged commit b568edc into develop Jul 1, 2024
2 checks passed
@manukala6 manukala6 deleted the feature/vector_layer_appends branch July 1, 2024 17:12
@manukala6 manukala6 mentioned this pull request Jul 1, 2024
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants