Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GTC-2965 Enable analysis on lists with geostore IDs #99

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

danscales
Copy link
Contributor

GTC-2965 Enable analysis on lists with geostore IDs

The list of geostore IDs is specified by the "geostore_ids" field of the step function/lambda input. The raster analysis currently only deals with ResourceWatch geostore IDs, not data API geostore IDs. Used the geostore IDs themselves as the unique fid for the output.

Fixed a small bug where the aggregation lambda would fail if the "id_field" field is not specified, even though it should default to "fid" if not supplied.

Tested using sample inputs to the batch processing step function.

One question I left in the code: should we create and supply a specific API key for the find-by-ids requests to ResourceWatch, even though neither a bearer token nor an API key is required?

The list of geostore IDs is specified by the "geostore_ids" field of the
step function/lambda input. The raster analysis currently only deals with
ResourceWatch geostore IDs, not data API geostore IDs.

Fixed small bug where aggregation lambda would fail if the "id_field"
field is not specified, even though it should default to "fid" if not
supplied.

Tested using sample inputs to the batch processing step function.
@codecov-commenter
Copy link

codecov-commenter commented Jan 15, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.59%. Comparing base (2c49a96) to head (2d3652c).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop      #99   +/-   ##
========================================
  Coverage    94.59%   94.59%           
========================================
  Files           19       19           
  Lines         1258     1259    +1     
========================================
+ Hits          1190     1191    +1     
  Misses          68       68           
Flag Coverage Δ
unittests 94.59% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

LOGGER.info("Get geostore info by IDs")

headers: Dict[str, str] = {"x-api-key": apikey()}
url: str = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a globals.py in the raster_analysis module you could out this URI into, but not sure if it's loaded as a lambda layer for this lambda

rows.append([getattr(record, id_field), encoded_geom])
rows: List[List[str]] = []
if geostore_info is not None:
for info in geostore_info:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be a helper function to make the handler less mesy

id = info["geostoreId"]
# The RW find-by-ids call returns the geometry as a feature collection,
# which I think should always have one feature (?)
fc = info["geostore"]["data"]["attributes"]["geojson"]["features"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also for neatness, it might look better to pull this data out in get_geostore_info into a nicer construct so the logic is clearer here

# Use the geostoreId itself as the id field for the output.
id = info["geostoreId"]
# The RW find-by-ids call returns the geometry as a feature collection,
# which I think should always have one feature (?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should also have one feature

for f in fc:
if f.get("properties") is None:
f["properties"] = {}
minidf = gpd.GeoDataFrame.from_features(fc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're iterating one geostore at a time, I think you can skip turning it into a GDF and just call shape(fc[0]["geometry"]), which can covert GeoJSON to shapely directly

)
geostores: List[Dict[str, Any]] = []

for i in range(0, len(geostore_ids), GEOSTORE_PAGE_SIZE):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we may need to check once people are using it, all the lambdas by default have a 60 second runtime in this repo. It can be increased up to 15 min, and potentially this could be slow for a lot of geostores. Maybe not a huge issue now though since our whole step function has a 5 minute time limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants