Adding a function to pick important features by grid #2001
This should be a config, but for
I imagine this limiting its work to a subset of features based on property matching; for example, I'm fine limiting it to point geometries. If the filter conditions aren't met, then the features aren't touched (meaning they stay in the tiles and are passed along without modification).
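A minimal sketch of that pass-through behavior, assuming features are `(shape, props, fid)` tuples and that `items_matching` is a flat dict of property equality checks. The helper names `matches` and `split_by_match` are hypothetical, not code from this PR:

```python
def matches(props, items_matching):
    """True if every key in items_matching equals the feature's property."""
    return all(props.get(k) == v for k, v in items_matching.items())


def split_by_match(features, items_matching):
    """Split features into (candidates, untouched).

    Candidates are eligible for thinning; untouched features are meant to
    be passed along exactly as they came in.
    """
    candidates, untouched = [], []
    for feature in features:
        _shape, props, _fid = feature  # assumed tuple layout
        if matches(props, items_matching):
            candidates.append(feature)
        else:
            untouched.append(feature)
    return candidates, untouched


features = [
    (None, {"kind": "locality"}, 1),
    (None, {"kind": "peak"}, 2),
]
candidates, untouched = split_by_match(features, {"kind": "locality"})
```

Only `candidates` would go through the grid; `untouched` would be appended back to the layer as-is.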
Status on this:
Three major pieces remain:
Test data can be found in: Which has 375 populated points that cover the area of the 7/113/50 tile (in 512 px coordinates) gathered from Overpass and processed in QGIS, copied out as GeoJSON, and reformatted with some regex to fit the expected dsl format. This is around 1/2 of the features I see in the actual tile itself, possibly because each feature is being duplicated in the vector tile? For example, I see two Tokyos in the tile MVT, each with a different ID but otherwise the same properties. Not sure where that is coming from, but the upside should be that we remove the dups with this grid function anyhow. But we should probably also sort on ID at the end to make it predictable which one is kept. Pattern for how to write the test in:
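To illustrate the "sort on ID at the end" idea, here is a small sketch, assuming `(shape, props, fid)` feature tuples and using `min_zoom` as a stand-in importance property (both are assumptions for illustration). With the ID as the last element of the sort key, two otherwise identical Tokyos always resolve the same way, whatever order they arrive in:

```python
def sort_key(feature):
    _shape, props, fid = feature  # assumed tuple layout
    # Primary: a hypothetical importance property (lower sorts first).
    # Secondary: the feature ID, purely as a deterministic tie-breaker.
    return (props.get("min_zoom", float("inf")), fid)


dups = [
    (None, {"name": "Tokyo", "min_zoom": 2}, 202),
    (None, {"name": "Tokyo", "min_zoom": 2}, 101),
]

kept = sorted(dups, key=sort_key)[0]
kept_reversed_input = sorted(reversed(dups), key=sort_key)[0]
```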
Probably we'd add new logic to queries.yaml just above where we rank neighbourhoods to exercise the new function: I imagine the syntax would be similar to what we do for
There's also a function to remove duplicate features we might want to run before running the grid function? Though that is indiscriminate within a search radius and the gridding will likely take care of it for us anyhow.
The intercut and overlap functions already assume an order, and take a
Ok, was able to make some progress on this today.
Can you post a new tile here that matches the tile coords in #1999 (comment), please?
Here's an image based on the integration test data you gave me: Red dots: Data from the integration test (sourced here) querying 8/227/100 before adding Green dots: The same data and tile after adding Basically, the red dots in the screenshot here are the ones that are thinned out. Green dots remain in the tile after calling |
Ah, it's correctly outputting 4 |
Based on this and some other fiddling, I think the code is doing the right thing; we just need to tweak the queries.yaml to match what we want the output to look like. Let me know what would make iterating on that easier, @nvkelso.
Good find on the What we do for the other layers is a progressive generalization per zoom... so at zoom x it might be grid size of Y, like with 3 transform configs:
Adjusted to taste. We might also want to play with different size multiples, like 3 instead of 4:
Here are two sets of 4 GeoJSONs in a ZIP zooming into Tokyo from 8/227/100 to 11/1819/806. One is for grid sizes in multiples of 4, the other for multiples of 3. My intuition says that the multiple of 4 lets too many through, and 3 is slightly better. Note that they are still in epsg:3857, but at least they have the properties visible in QGIS now.
Reviewing the samples (there is confusion around 512 px tile sizes versus 256 here, beware!), I propose:
For the examples, we start out with, and end up with:
Now what tile size does this config assume? I think it's targeting 512 px tiles (please confirm)... but it should be halved for 256 px tiles (since the dimensions are 1/2 in width and height)? But Tilezen uses a 256 px logical tile size, so should all those be offset by 1 and halved by default, and if the tile size is 512 then the grid multiplied by 2? When I review 512 sized zooms 8, 9, and 10 I see a mix of I reviewed Tokyo "zoom 10" (512 logical zoom or zoom 11 at 256 px) being the last zoom seems fine. Same in Paris, London, New York, and other regions. But in Beijing there are dense
The 2nd note raises a question if we should also thin the We already add a Either keep_n_features (but really the
Or better yet grid thinning like (modify
@@ -3166,8 +3169,8 @@ def keep_n_features_gridded(ctx):
        return None

    minx, miny, maxx, maxy = ctx.unpadded_bounds
By here do we know if we're in a 256 or 512 px tile and can adjust a grid_width or grid_height multiplier accordingly?
No, there's no concept of pixel size here as far as I know. It's just processing a bounding box of vector data.
wacky (future) idea: seems like you could even compute the features to keep in for multiple tile sizes and tag them as such. We could include this stuff in our layer output but then weed it out in tilequeue where we know what size we're cutting.
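For reference, the tile-size adjustment discussed in this thread could look roughly like the sketch below. It assumes the configured grid targets 512 px tiles and that a `tile_size` value is somehow passed in; neither is something the transform knows today, per the reply above. `scaled_grid` and its parameters are hypothetical names:

```python
import math


def scaled_grid(grid_width, grid_height, tile_size, config_tile_size=512):
    """Scale grid cell counts from the configured tile size to another one.

    A 256 px tile covers half the width and height on screen of a 512 px
    tile, so it gets half as many grid cells along each axis (rounded up,
    and never fewer than one).
    """
    scale = float(tile_size) / config_tile_size
    width = max(1, int(math.ceil(grid_width * scale)))
    height = max(1, int(math.ceil(grid_height * scale)))
    return width, height


grid_for_256 = scaled_grid(12, 12, 256)
grid_for_512 = scaled_grid(12, 12, 512)
```

The "compute for multiple tile sizes and tag" idea would amount to running the thinning once per scaled grid and recording which sizes kept each feature.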
Ok, I have a tileserver running locally to compare like-for-like tiles. I asked tileserver for
All MVTs: Archive.zip
I noticed that these "512/all/z/x/y.mvt" tiles are 4x bigger on the screen in QGIS than the same tiles requested from Tapalcatl with "512/all/z/x/y.mvt" (same geographic area, though), so I'm not really sure what's going on size-wise. At least we can compare before/after within tileserver. Another thing to note is that the "poi" layer is quite busy with transit stations starting at z10 and then at z11 with other OSM-sourced POI. We might want to consider thinning those out later, too.
Let's track that separately with #2033.
If you rename the files so they lead with the Eg: |
I confirmed the width numbers are working for these 512 px tiles in QGIS.
This might make problems for people still using 256 px tile config, so leave a note about that in the source code and call it a day?
@travisgrigsby please also take a look and leave a PR review.
        self.assertEqual("test_shape_2", output_features[2][2])
        self.assertEqual("test_shape_3", output_features[3][2])

    def test_fail_on_non_integer_reverse_sort_key(self):
great tests. Really illustrate what this does
This is, for the level of interesting that this is accomplishing, some of the easiest to read code ever :D Nice work.
v = props.get(k['sort_key'])

if v is None:
    values.append(v)
I don't think I understand why you're adding None to the list, but maybe I'll get it in a second
oh I see - it's the value of the property. If it's None, then that value goes into the list for sorting.
Yep, the idea is to include the None to sort on if it's there.
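One caveat worth sketching: a bare `None` mixed with integers only sorts cleanly on Python 2; on Python 3 the comparison raises `TypeError`. A portable variant, offered as an assumption-laden sketch rather than what this PR does, wraps each value in a `(is_missing, value)` pair so missing values sort last. The `reverse` key name is a guess based on the test name above:

```python
def sorting_values(props, sort_keys):
    """Build a sort key list for one feature's properties."""
    values = []
    for k in sort_keys:
        v = props.get(k["sort_key"])
        # Hypothetical 'reverse' flag: negate so larger values sort first.
        if v is not None and k.get("reverse"):
            v = -v
        # (False, v) sorts before (True, None), so missing values go last
        # and None is never compared against a number.
        values.append((v is None, v))
    return values


keys = [
    {"sort_key": "min_zoom"},
    {"sort_key": "population", "reverse": True},
]
a = {"min_zoom": 3, "population": 100}
b = {"min_zoom": 3, "population": 5000}
c = {"min_zoom": 3}  # no population at all

ranked = sorted([a, b, c], key=lambda p: sorting_values(p, keys))
```

Whether a missing value should sort first or last is a design choice; flipping the boolean reverses it.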
# Sort the features in each bucket and pick the top items to include in the output
for features_in_bucket in buckets.values():
    sorted_features = sorted(features_in_bucket, key=sorting_values_for_feature)
    new_features.extend(sorted_features[:max_items])
nice use of pythonisms to make this super easy to read and understand
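For readers skimming the thread, the overall bucket-then-pick approach can be sketched end to end as below. This is a simplified stand-in, not the merged function: it takes plain `(x, y, props)` points instead of shapely geometries, and `thin_by_grid` with its parameters is a hypothetical name.

```python
from collections import defaultdict


def thin_by_grid(points, bounds, grid_width, grid_height, max_items, sort_key):
    """Keep at most max_items points per grid cell.

    points: list of (x, y, props) tuples (a simplification for this sketch).
    bounds: (minx, miny, maxx, maxy) of the unpadded tile.
    """
    minx, miny, maxx, maxy = bounds
    cell_w = (maxx - minx) / float(grid_width)
    cell_h = (maxy - miny) / float(grid_height)

    buckets = defaultdict(list)
    for point in points:
        x, y, _props = point
        # Clamp so points on or beyond the edges land in the border cells.
        col = max(0, min(int((x - minx) / cell_w), grid_width - 1))
        row = max(0, min(int((y - miny) / cell_h), grid_height - 1))
        buckets[(col, row)].append(point)

    kept = []
    for features_in_bucket in buckets.values():
        kept.extend(sorted(features_in_bucket, key=sort_key)[:max_items])
    return kept


pts = [
    (1, 1, {"min_zoom": 5}),
    (2, 2, {"min_zoom": 3}),  # same cell as the point above, more important
    (9, 9, {"min_zoom": 8}),
]
kept = thin_by_grid(pts, (0, 0, 10, 10), 2, 2, 1, lambda p: p[2]["min_zoom"])
```

Passing different `grid_width` and `grid_height` values gives non-square cells.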
end_zoom: 13
items_matching: { kind: locality }
max_items: 1
grid_width: 12
I take it all these values for grid width and start/end zoom were arrived at experimentally? Did you try anything like reducing the width but increasing max_items to get a more organic look?
Yep, I did a bit of experimentation based on the suggestions Nathaniel made above and these values seemed to look best. At least in the really dense area around Tokyo.
I didn't try more than one item per bucket. That and wider-than-taller buckets (since labels are wider than taller) are two things we could try in the future I think.
I added a comment to the new function in ab0d108.
The failing tests don't seem to be related to this PR, so I'm going to merge.
For #1999
This adds a new transform function that filters point features by placing them into a grid and then only including the most important features in each grid cell.
TODO:
items_matching? Should they be removed entirely or passed along without modification?