-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTC-2965 Enable analysis on lists with geostore IDs #99
base: develop
Are you sure you want to change the base?
Conversation
The list of geostore IDs is specified by the "geostore_ids" field of the step function/lambda input. The raster analysis currently only deals with ResourceWatch geostore IDs, not data API geostore IDs. Fixed small bug where aggregation lambda would fail if the "id_field" field is not specified, even though it should default to "fid" if not supplied. Tested using sample inputs to the batch processing step function.
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## develop #99 +/- ##
========================================
Coverage 94.59% 94.59%
========================================
Files 19 19
Lines 1258 1259 +1
========================================
+ Hits 1190 1191 +1
Misses 68 68
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
LOGGER.info("Get geostore info by IDs") | ||
|
||
headers: Dict[str, str] = {"x-api-key": apikey()} | ||
url: str = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a globals.py in the raster_analysis module you could out this URI into, but not sure if it's loaded as a lambda layer for this lambda
rows.append([getattr(record, id_field), encoded_geom]) | ||
rows: List[List[str]] = [] | ||
if geostore_info is not None: | ||
for info in geostore_info: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be a helper function to make the handler less mesy
id = info["geostoreId"] | ||
# The RW find-by-ids call returns the geometry as a feature collection, | ||
# which I think should always have one feature (?) | ||
fc = info["geostore"]["data"]["attributes"]["geojson"]["features"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also for neatness, it might look better to pull this data out in get_geostore_info
into a nicer construct so the logic is clearer here
# Use the geostoreId itself as the id field for the output. | ||
id = info["geostoreId"] | ||
# The RW find-by-ids call returns the geometry as a feature collection, | ||
# which I think should always have one feature (?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, should also have one feature
for f in fc: | ||
if f.get("properties") is None: | ||
f["properties"] = {} | ||
minidf = gpd.GeoDataFrame.from_features(fc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you're iterating one geostore at a time, I think you can skip turning it into a GDF and just call shape(fc[0]["geometry"])
, which can covert GeoJSON to shapely directly
) | ||
geostores: List[Dict[str, Any]] = [] | ||
|
||
for i in range(0, len(geostore_ids), GEOSTORE_PAGE_SIZE): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we may need to check once people are using it, all the lambdas by default have a 60 second runtime in this repo. It can be increased up to 15 min, and potentially this could be slow for a lot of geostores. Maybe not a huge issue now though since our whole step function has a 5 minute time limit.
GTC-2965 Enable analysis on lists with geostore IDs
The list of geostore IDs is specified by the "geostore_ids" field of the step function/lambda input. The raster analysis currently only deals with ResourceWatch geostore IDs, not data API geostore IDs. Used the geostore IDs themselves as the unique fid for the output.
Fixed a small bug where the aggregation lambda would fail if the "id_field" field is not specified, even though it should default to "fid" if not supplied.
Tested using sample inputs to the batch processing step function.
One question I left in the code: should we create and supply a specific API key for the find-by-ids requests to ResourceWatch, even though neither a bearer token nor an API key is required?