Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a function to pick important features by grid #2001

Merged
merged 17 commits into from
Dec 10, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 28 additions & 2 deletions queries.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -976,10 +976,36 @@ post_process:
params:
source_layer: places
start_zoom: 8
end_zoom: 12
end_zoom: 9
items_matching: { kind: locality }
max_items: 1
grid_width: 3
sorting_keys:
- { sort_key: 'min_zoom', reverse: False }
- { sort_key: 'collision_rank', reverse: False }
- { sort_key: 'population', reverse: True }
- { sort_key: 'id', reverse: True }
- fn: vectordatasource.transform.keep_n_features_gridded
params:
source_layer: places
start_zoom: 9
end_zoom: 11
items_matching: { kind: locality }
max_items: 1
grid_width: 6
sorting_keys:
- { sort_key: 'min_zoom', reverse: False }
- { sort_key: 'collision_rank', reverse: False }
- { sort_key: 'population', reverse: True }
- { sort_key: 'id', reverse: True }
- fn: vectordatasource.transform.keep_n_features_gridded
params:
source_layer: places
start_zoom: 11
end_zoom: 13
items_matching: { kind: locality }
max_items: 1
grid_size: 16
grid_width: 12
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it all these values for grid width and start/end zoom were arrived at experimentally? Did you try anything like reducing the width but increasing max_items to get a more organic look?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I did a bit of experimentation based on the suggestions Nathaniel made above and these values seemed to look best. At least in the really dense area around Tokyo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't try more than one item per bucket. That and wider-than-taller buckets (since labels are wider than taller) are two things we could try in the future I think.

sorting_keys:
- { sort_key: 'min_zoom', reverse: False }
- { sort_key: 'collision_rank', reverse: False }
Expand Down
14 changes: 7 additions & 7 deletions test/test_transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -424,7 +424,7 @@ def test_not_points(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=1,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "foo"},
],
Expand Down Expand Up @@ -461,7 +461,7 @@ def test_points_keep_1(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=1,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "foo"},
],
Expand Down Expand Up @@ -499,7 +499,7 @@ def test_points_keep_1_multisort_second(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=1,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "min_zoom"},
{"sort_key": "population", "reverse": True},
Expand Down Expand Up @@ -538,7 +538,7 @@ def test_points_keep_1_multisort_minzoom(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=1,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "min_zoom"},
{"sort_key": "population", "reverse": True},
Expand Down Expand Up @@ -581,7 +581,7 @@ def test_points_keep_1_different_buckets(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=1,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "population", "reverse": True},
],
Expand Down Expand Up @@ -625,7 +625,7 @@ def test_points_keep_more_than_in_one_bucket(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=5,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "min_zoom", "reverse": True},
{"sort_key": "population"},
Expand Down Expand Up @@ -667,7 +667,7 @@ def test_fail_on_non_integer_reverse_sort_key(self):
source_layer="test_layer",
items_matching=dict(foo="bar"),
max_items=5,
grid_size=2,
grid_width=2,
sorting_keys=[
{"sort_key": "population", "reverse": True},
],
Expand Down
17 changes: 10 additions & 7 deletions vectordatasource/transform.py
Original file line number Diff line number Diff line change
Expand Up @@ -3120,9 +3120,10 @@ def keep_n_features_gridded(ctx):
pairs in `items_matching` into a grid, then keep the
first `max_items` features in each grid cell.

The grid is created by dividing the width and height
of the tile into `grid_size` buckets (so you end up
with grid_size*grid_size buckets total).
The grid is created by dividing the tile into buckets.
You can specify the `grid_width` and `grid_height` to
get grid_width*grid_height buckets or just `grid_width`
to get grid_width*grid_width buckets.

NOTE: This only works with point features and will
pass through non-point features untouched.
Expand All @@ -3139,15 +3140,17 @@ def keep_n_features_gridded(ctx):
end_zoom = ctx.params.get('end_zoom')
items_matching = ctx.params.get('items_matching')
max_items = ctx.params.get('max_items')
grid_size = ctx.params.get('grid_size')
grid_width = ctx.params.get('grid_width')
# if grid_height is not specified, use grid_width for grid_height
grid_height = ctx.params.get('grid_height') or grid_width
sorting_keys = ctx.params.get('sorting_keys')

# leaving items_matching, grid_size, or max_items as None (or zero)
# would mean that this filter would do nothing, so assume
# that this is really a configuration error.
assert items_matching, 'keep_n_features_gridded: missing or empty item match dict'
assert max_items, 'keep_n_features_gridded: missing or zero max number of items'
assert grid_size, 'keep_n_features_gridded: missing or zero grid size'
assert grid_width, 'keep_n_features_gridded: missing or zero grid width'
assert sorting_keys, 'keep_n_features_gridded: missing sorting keys'
assert isinstance(sorting_keys, list), 'keep_n_features_gridded: sorting keys should be a list'

Expand All @@ -3166,8 +3169,8 @@ def keep_n_features_gridded(ctx):
return None

minx, miny, maxx, maxy = ctx.unpadded_bounds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By here do we know if we're in a 256 or 512 px tile and can adjust a grid_width or grid_height multiplier accordingly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's no concept of pixel size here as far as I know. It's just processing a bounding box of vector data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wacky (future) idea: seems like you could even compute the features to keep in for multiple tile sizes and tag them as such. We could include this stuff in our layer output but then weed it out in tilequeue where we know what size we're cutting.

bucket_width = (maxx - minx) / grid_size
bucket_height = (maxy - miny) / grid_size
bucket_width = (maxx - minx) / grid_width
bucket_height = (maxy - miny) / grid_height

# Sort the features into buckets
buckets = defaultdict(list)
Expand Down