-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rnet_breakup_vertices is way too slow for real network data #416
Comments
A small update here. # packages
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(sfnetworks)
library(mapview)
# data
download.file("https://github.com/ropensci/stplanr/releases/download/0.6.1/Example.zip", "Example.zip")
unzip("Example.zip")
roads = st_read("Example/Complete Roads/complete_roads.shp", quiet = TRUE, stringsAsFactors = FALSE) %>%
st_cast("LINESTRING")
#> Warning in st_cast.sf(., "LINESTRING"): repeating attributes for all sub-
#> geometries for which they may not be constant
roads_trf = st_transform(roads, 4674)
# POINTS
from = c(-49.95058, -24.77502)
to = c(-49.91084, -24.75200)
points <- st_sfc(st_point(from), st_point(to), crs = 4674)
bbox_points = st_as_sfc(st_bbox(points))
mapview(roads_trf[bbox_points, ], zcol = "osm_id") + points
#> although coordinates are longitude/latitude, st_intersects assumes that they are planar There is clearly a path between the two points but the problem is that if we build the road network starting from the boundary points of each LINESTRING object (as in stplanr or in sfnetworks), then there is no connection between yellow, green and blue roads (since they don't share any point in the boundaries). AFAIK, if we are working with OSM data (or similar structures) and we are creating road network objects considering only the boundary points for each LINESTRING (and this is the sfnetwork approach atm), then we should "modify the input data in two cases: The following code is taken from
Now I would like to apply |
Breaking the linestrings down into matrices with the I just discovered a paper on this topic: https://doi.org/10.1038/sdata.2016.46 This makes me think: it's worth doing a deep dive into this and perhaps a package/paper/other documenting an open, accessible, efficient and reproducible solutions to this important and seemingly common problem. |
It sounds like the
|
Hi! I think that when we discussed I like the idea of |
Here's another reprex, not progress but the source code library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(stplanr)
plot(st_geometry(rnet_roundabout), lwd = 2, col = rainbow(nrow(rnet_roundabout)))
boundary_points <- st_geometry(line2points(rnet_roundabout))
points_cols <- rep(rainbow(nrow(rnet_roundabout)), each = 2)
plot(boundary_points, pch = 16, add = TRUE, col = points_cols) # Clean the roundabout example.
rnet_roundabout_clean <- rnet_breakup_vertices(rnet_roundabout)
#> Splitting rnet object at the shared boundary points.
plot(st_geometry(rnet_roundabout_clean), lwd = 2, col = rainbow(nrow(rnet_roundabout_clean))) rnet_roundabout$variable = 1:nrow(rnet_roundabout)
system.time(rnet_roundabout_clean <- rnet_breakup_vertices(rnet_roundabout))
#> Splitting rnet object at the shared boundary points.
#> user system elapsed
#> 0.012 0.000 0.013
system.time(rnet_overline <- overline(rnet_roundabout, attrib = "variable"))
#> user system elapsed
#> 0.054 0.000 0.054
plot(st_geometry(rnet_overline), lwd = 2, col = rainbow(nrow(rnet_overline))) library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
rnet_agg = rnet_roundabout_clean %>%
group_by()
sl <- routes_fast_sf
sl$All <- flowlines$All
rnet <- overline(sl = sl, attrib = "All")
plot(rnet, lwd = 3) sl_broken = rnet_breakup_vertices(sl)
#> Splitting rnet object at the shared internal points.
sl_agg = sl_broken %>%
group_by(sf::st_as_text(geometry)) %>%
summarise(All = sum(All))
#> `summarise()` ungrouping output (override with `.groups` argument)
plot(sl_agg["All"], lwd = 3) mapview::mapview(sl_agg["All"], lwd = 3) sl_broken = rnet_breakup_vertices(sl, breakup_internal_vertex_matches = FALSE)
sl_agg = sl_broken %>%
group_by(sf::st_as_text(geometry)) %>%
summarise(All = sum(All))
#> `summarise()` ungrouping output (override with `.groups` argument)
plot(sl_agg["All"], lwd = 3) mapview::mapview(sl_agg["All"], lwd = 3) system.time({
sl_broken = rnet_breakup_vertices(sl, breakup_internal_vertex_matches = FALSE)
sl_agg = sl_broken %>%
group_by(sf::st_as_text(geometry)) %>%
summarise(All = sum(All))
})
#> `summarise()` ungrouping output (override with `.groups` argument)
#> user system elapsed
#> 0.276 0.008 0.290
system.time(rnet <- overline(sl = sl, attrib = "All"))
#> user system elapsed
#> 0.093 0.004 0.100 Created on 2020-11-05 by the reprex package (v0.3.0) |
Hi! I worked a little bit on this problem today, and I defined a new implementation of |
The following (super long) reprex presents the new approach. It's not perfect (since the first and last parts of that code could be translated to Rcpp), but that's good enough for every OSM data I will ever analyse. The next steps (after some reviews) are 1) more testings and 2) thinking about Z and M dimensions. # packages
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(stplanr)
library(osmextract)
library(data.table)
library(sfheaders)
library(sfnetworks)
library(tmap) First I need to define new auxiliary functions (that won't be exported). This is just a faster (and simpler) version of get_boundary_points <- function(rnet) {
# 1a. extract coordinates
coordinates <- sf::st_coordinates(rnet)
# 1b. Find index of L1 column
L1_index <- ncol(coordinates)
# 1c. Remove colnames
coordinates <- unname(coordinates)
# 2. Find idxs of first and last coordinate (i.e. the boundary points)
first_pair <- !duplicated(coordinates[, L1_index])
last_pair <- !duplicated(coordinates[, L1_index], fromLast = TRUE)
idxs <- first_pair | last_pair
# 3. Extract idxs and rebuild sfc
pairs <- coordinates[idxs, ]
boundary_points <- sf::st_cast(
sf::st_sfc(
sf::st_multipoint(pairs[, -L1_index]),
crs = sf::st_crs(rnet)
),
"POINT"
)
boundary_points
} The following function is a faster version of my_st_split <- function(rnet, points) {
# Extract coordinates from rnet and points
# rnet must be an sf data.frame with LINESTRING geometry (no ZM) while points must be an
# sfc with MULTIPOINT geometry
rnet_coordinates <- data.table(st_coordinates(rnet))
points_coordinates <- data.table(st_coordinates(points))
# Check Z/M dimensions
if (ncol(rnet_coordinates) > 3L) {
warning("The Z/M dimensions will be lost.", call. = FALSE, immediate. = TRUE)
}
# I cannot split a LINESTRING at a boundary point (since there is nothing "at
# the other side" of the boundary point), so I need to find ID of all shared
# internal points
id_not_boundary <- rnet_coordinates[duplicated(L1) & duplicated(L1, fromLast = TRUE), which = TRUE]
id_shared <- rnet_coordinates[points_coordinates, on = c("X", "Y"), which = TRUE]
id <- intersect(id_not_boundary, id_shared)
# Now I need to duplicate the coordinates of the shared internal points (since
# they will become boundary points for the new LINESTRING(s)). I use sort
# since I want to preserve the order within the points.
rnet_coordinates <- rnet_coordinates[sort(c(1:.N, id))]
# I will build the new sfc LINESTRING using sfheaders::sfc_linestring so I
# need to build an ID for each new LINESTRING. The ID for the new
# LINESTRING(s) is created starting from the L1 column in rnet_coordinates and
# incrementing the id by 1 at each break point.
break_id <- double(nrow(rnet_coordinates))
break_id[id + 1:length(id)] <- 1L # The ID is shifted by 1 at each break point
break_id <- cumsum(break_id)
rnet_coordinates$linestring_id <- rnet_coordinates$L1 + break_id
# Now I can create the new LINESTRING sfc
new_linestring <- sfc_linestring(
obj = rnet_coordinates,
x = "X",
y = "Y",
linestring_id = "linestring_id"
)
# Exclude Z and/or M dimension
attr(new_linestring, "z_range") <- NULL
attr(new_linestring, "m_range") <- NULL
# Determine the ID(s) of the old LINESTRING(s) (i.e. I need to create an sf
# object with the new geometry column and the old fields)
old_rnet_id <- rnet_coordinates[, list(L1 = unique(L1)), by = linestring_id]$L1
# Create the new object:
st_sf(
st_drop_geometry(rnet)[old_rnet_id, , drop = FALSE],
geometry = new_linestring,
crs = st_crs(rnet),
agr = st_agr(rnet),
precision = st_precision(rnet)
)
} Define the new breakup function (the one that will be exported) new_breakup <- function(rnet, verbose = FALSE) {
# A few safety checks
if (!inherits(rnet, "sf")) {
stop("At the moment this function was tested only for sf objects.", call. = FALSE)
}
if (!all(st_is(rnet, "LINESTRING"))) {
stop("The input sf object must have a LINESTRING geometry.")
}
if (!is.null(st_z_range(rnet)) || !is.null(st_m_range(rnet))) {
warning("The Z/M dimensions will be lost.", call. = FALSE, immediate. = TRUE)
}
# Check if we need to rebuild tbl_df at the end
rebuild_tbl <- FALSE
if (inherits(rnet, "tbl_df")) {
rebuild_tbl <- TRUE
}
# 1 - Find points that are both boundary points and internal points
internal_points <- setkey(data.table(st_coordinates(line2vertices(rnet))))
internal_points <- unique(internal_points) # I don't need duplicated pairs of coordinates
if (verbose) {
message("Extracted the internal points")
}
boundary_points <- setkey(data.table(st_coordinates(get_boundary_points(rnet))))
boundary_points <- unique(boundary_points)
if (verbose) {
message("Extracted the boundary points")
}
shared_internal_points_data_table <- boundary_points[
na.omit(boundary_points[internal_points, which = TRUE]),
]
if (nrow(shared_internal_points_data_table) > 0) {
shared_internal_points <- st_sfc(st_multipoint(
x = as.matrix(shared_internal_points_data_table)
), crs = st_crs(rnet))
# 2 - Split at shared internal points
if (length(shared_internal_points) > 0) {
message("Splitting the input object at the shared internal points.")
rnet <- my_st_split(rnet, shared_internal_points)
}
}
# 3 - Find duplicated internal points
internal_points <- st_geometry(line2vertices(rnet))
duplicated_internal_points <- internal_points[duplicated(internal_points)]
if (verbose) {
message("Extracted the duplicated internal points")
}
# 4 - Split at duplicated internal points
if (length(duplicated_internal_points) > 0) {
message("Splitting rnet object at the duplicated internal points.")
rnet <- my_st_split(rnet, duplicated_internal_points)
}
# 5 - Maybe we need to rebuild the tbl_df structure
if (rebuild_tbl) {
rnet <- st_as_sf(dplyr::as_tibble(rnet))
}
rnet
} Test roundabout example: new_rnet_rounabout <- new_breakup(rnet_roundabout)
#> Splitting the input object at the shared internal points.
# plot
set.seed(1)
par(mar = rep(0, 4))
plot(st_geometry(new_rnet_rounabout), col = sample(sf.colors(17, categorical = TRUE)), lwd = 3) new_rnet_overpass <- new_breakup(rnet_overpass)
#> Splitting the input object at the shared internal points.
plot(st_geometry(new_rnet_overpass), col = sf.colors(9, categorical = TRUE), lwd = 3) new_rnet_intersection <- new_breakup(rnet_cycleway_intersection)
#> Splitting rnet object at the duplicated internal points.
plot(st_geometry(new_rnet_intersection), col = sf.colors(4, categorical = TRUE), lwd = 3)
old_roundabout <- rnet_breakup_vertices(rnet_roundabout)
#> Splitting rnet object at the shared boundary points.
all.equal(old_roundabout, new_rnet_rounabout)
#> [1] "Component \"geometry\": Component 1: Attributes: < Component 2: Mean relative difference: 0.3333333 >"
#> [2] "Component \"geometry\": Component 1: Numeric: lengths (6, 4) differ"
#> [3] "Component \"geometry\": Component 2: Attributes: < Component 2: Mean relative difference: 1 >"
#> [4] "Component \"geometry\": Component 2: Numeric: lengths (4, 8) differ"
#> [5] "Component \"geometry\": Component 3: Mean relative difference: 5.901751e-06"
#> [6] "Component \"geometry\": Component 4: Attributes: < Component 2: Mean relative difference: 0.25 >"
#> [7] "Component \"geometry\": Component 4: Numeric: lengths (8, 10) differ"
#> [8] "Component \"geometry\": Component 5: Attributes: < Component 2: Mean relative difference: 0.2 >"
#> [9] "Component \"geometry\": Component 5: Numeric: lengths (10, 8) differ"
#> [10] "Component \"geometry\": Component 6: Attributes: < Component 2: Mean relative difference: 0.5 >"
#> [11] "Component \"geometry\": Component 6: Numeric: lengths (8, 4) differ"
#> [12] "Component \"geometry\": Component 7: Attributes: < Component 2: Mean relative difference: 0.5 >"
#> [13] "Component \"geometry\": Component 7: Numeric: lengths (4, 6) differ"
#> [14] "Component \"geometry\": Component 8: Mean relative difference: 5.782618e-06"
#> [15] "Component \"geometry\": Component 9: Mean relative difference: 4.591716e-06" but unlist(st_equals(old_roundabout, new_rnet_rounabout))
#> [1] 9 1 2 3 4 5 6 7 8 10 11 12 13 14 15 16 17 which means that they are the same objects but with different orders. If I consider a more realistic example: rnet <- oe_get(
place = "Isle of Wight",
quiet = FALSE,
query = "
SELECT osm_id, highway, z_order, other_tags, geometry
FROM 'lines'
WHERE highway IN (
'primary', 'primary_link',
'secondary', 'secondary_link',
'tertiary', 'tertiary_link',
'unclassified', 'residential', 'service'
)
"
)
#> The input place was matched with: Isle of Wight
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `lines' from data source `C:\Users\Utente\Documents\osm_data\geofabrik_isle-of-wight-latest.gpkg' using driver `GPKG'
#> Simple feature collection with 11678 features and 4 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -1.583422 ymin: 50.57568 xmax: -1.070221 ymax: 50.76731
#> geographic CRS: WGS 84 then the results are extremely better using the new approach system.time(rnet_old <- rnet_breakup_vertices(rnet))
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> user system elapsed
#> 24.11 0.25 24.95
system.time(rnet_new <- new_breakup(rnet))
#> Splitting the input object at the shared internal points.
#> Splitting rnet object at the duplicated internal points.
#> user system elapsed
#> 3.40 0.01 3.47 The two results are not exactly the same since the new approach fixes a bug: rnet_small <- rnet[789, ]
rnet_old_temp <- rnet_breakup_vertices(rnet_small)
#> Splitting rnet object at the shared boundary points.
plot(st_geometry(rnet_old_temp), col = sf.colors(nrow(rnet_old_temp)), lwd = 2) rnet_new_temp <- new_breakup(rnet_small)
#> Splitting the input object at the shared internal points.
plot(st_geometry(rnet_new_temp), col = sf.colors(nrow(rnet_new_temp)), lwd = 2) Now I can also fix an old bug in the interaction between download.file("https://github.com/ropensci/stplanr/releases/download/0.6.1/Example.zip", "Example.zip")
unzip("Example.zip")
# sfnetworks approach ---------------------------------------
roads = st_read("Example/Complete Roads/complete_roads.shp", quiet = TRUE) %>%
st_cast("LINESTRING")
#> Warning in st_cast.sf(., "LINESTRING"): repeating attributes for all sub-
#> geometries for which they may not be constant
points = st_read("Example/Points/Points.shp", quiet = TRUE)
roads_trf = st_transform(roads, st_crs(points)) Now I need to "pre-process" nrow(roads_trf)
#> [1] 305608 and that the old approach takes more than 2h system.time(roads_trf <- new_breakup(roads_trf, verbose = TRUE))
#> Extracted the internal points
#> Extracted the boundary points
#> Splitting the input object at the shared internal points.
#> Extracted the duplicated internal points
#> Splitting rnet object at the duplicated internal points.
#> user system elapsed
#> 111.30 3.54 115.26 Moreover, please notice that now nrow(roads_trf)
#> [1] 813502
# calculate the sfnetwork object
system.time({
net <- as_sfnetwork(roads_trf, directed = FALSE) %>%
activate("edges") %>%
dplyr::mutate(weight = edge_length())
})
#> user system elapsed
#> 107.89 3.12 111.47
# routing
from = c(-49.95058, -24.77502)
to = c(-49.91084, -24.75200)
p1 = sf::st_as_sf(data.frame(x = from[1], y = from[2]), coords = c("x", "y"), crs = sf::st_crs(net))
p2 = sf::st_as_sf(data.frame(x = to[1], y = to[2]), coords = c("x", "y"), crs = sf::st_crs(net)) and finally r <- tidygraph::convert(net, to_spatial_shortest_paths, p1, p2)
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
#> although coordinates are longitude/latitude, st_nearest_points assumes that they are planar
# plot
tm_shape(r %>% activate("edges") %>% st_as_sf(), ext = 1.25) +
tm_lines() +
tm_shape(p1) +
tm_dots(col = "red", size = 1) +
tm_shape(p2) +
tm_dots(col = "blue", size = 1) Created on 2020-11-15 by the reprex package (v0.3.0) Session infodevtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.3 (2020-02-29)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Italian_Italy.1252
#> ctype Italian_Italy.1252
#> tz Europe/Berlin
#> date 2020-11-15
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> abind 1.4-5 2016-07-21 [1] CRAN (R 3.6.0)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
#> backports 1.2.0 2020-11-02 [1] CRAN (R 3.6.3)
#> base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 3.6.3)
#> class 7.3-17 2020-04-26 [1] CRAN (R 3.6.3)
#> classInt 0.4-3 2020-04-07 [1] CRAN (R 3.6.3)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 3.6.3)
#> codetools 0.2-16 2018-12-24 [2] CRAN (R 3.6.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
#> crosstalk 1.1.0.1 2020-03-13 [1] CRAN (R 3.6.3)
#> curl 4.3 2019-12-02 [1] CRAN (R 3.6.1)
#> data.table * 1.13.2 2020-10-19 [1] CRAN (R 3.6.3)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 3.6.3)
#> dichromat 2.0-0 2013-01-24 [1] CRAN (R 3.6.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 3.6.3)
#> dplyr 1.0.2 2020-08-18 [1] CRAN (R 3.6.3)
#> e1071 1.7-4 2020-10-14 [1] CRAN (R 3.6.3)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.3)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
#> foreign 0.8-75 2020-01-20 [2] CRAN (R 3.6.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 3.6.3)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 3.6.3)
#> geosphere 1.5-10 2019-05-26 [1] CRAN (R 3.6.1)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 3.6.3)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 3.6.3)
#> htmlwidgets 1.5.2 2020-10-03 [1] CRAN (R 3.6.3)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 3.6.3)
#> igraph 1.2.6 2020-10-06 [1] CRAN (R 3.6.3)
#> KernSmooth 2.23-18 2020-10-29 [1] CRAN (R 3.6.3)
#> knitr 1.30 2020-09-22 [1] CRAN (R 3.6.3)
#> lattice 0.20-41 2020-04-02 [1] CRAN (R 3.6.3)
#> leafem 0.1.3 2020-07-26 [1] CRAN (R 3.6.3)
#> leaflet 2.0.3 2019-11-16 [1] CRAN (R 3.6.3)
#> leafsync 0.1.0 2019-03-05 [1] CRAN (R 3.6.1)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.2)
#> lwgeom 0.2-5 2020-06-12 [1] CRAN (R 3.6.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.3)
#> maptools 1.0-2 2020-08-24 [1] CRAN (R 3.6.3)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
#> mime 0.9 2020-02-04 [1] CRAN (R 3.6.2)
#> osmextract * 0.1.0 2020-11-12 [1] local
#> pillar 1.4.6 2020-07-10 [1] CRAN (R 3.6.3)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 3.6.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.3)
#> png 0.1-7 2013-12-03 [1] CRAN (R 3.6.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 3.6.3)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 3.6.3)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 3.6.3)
#> raster 3.3-13 2020-07-17 [1] CRAN (R 3.6.3)
#> RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.6.0)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 3.6.3)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 3.6.3)
#> rgeos 0.5-5 2020-09-07 [1] CRAN (R 3.6.3)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 3.6.3)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 3.6.3)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
#> sf * 0.9-6 2020-09-13 [1] CRAN (R 3.6.3)
#> sfheaders * 0.3.0 2020-10-31 [1] CRAN (R 3.6.3)
#> sfnetworks * 0.3.1 2020-10-30 [1] local
#> sp 1.4-4 2020-10-07 [1] CRAN (R 3.6.3)
#> stars 0.4-4 2020-08-19 [1] Github (r-spatial/stars@b7b54c8)
#> stplanr * 0.8.0.9000 2020-11-10 [1] local
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 3.6.3)
#> tibble 3.0.4 2020-10-12 [1] CRAN (R 3.6.3)
#> tidygraph 1.2.0 2020-05-12 [1] CRAN (R 3.6.3)
#> tidyr 1.1.2 2020-08-27 [1] CRAN (R 3.6.3)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
#> tmap * 3.2 2020-09-15 [1] CRAN (R 3.6.3)
#> tmaptools 3.1 2020-07-12 [1] Github (mtennekes/tmaptools@947f3bd)
#> units 0.6-7 2020-06-13 [1] CRAN (R 3.6.3)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 3.6.3)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 3.6.3)
#> viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.0)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 3.6.3)
#> xfun 0.19 2020-10-30 [1] CRAN (R 3.6.3)
#> XML 3.99-0.3 2020-01-20 [1] CRAN (R 3.6.2)
#> xml2 1.3.2 2020-04-23 [1] CRAN (R 3.6.3)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
#>
#> [1] C:/Users/Utente/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.3/library |
Hi @agila5 many thanks for the idea and detailed reprex. I think your idea has (fast) legs! Plan to pick up on this when the 'dust has settled' around some other projects/ideas but overall I think this is a good way forward. On a related note... I have found a way to split all linestrings into a maximum length using Catch up soon! |
Hi!
Awesome, thanks! IMO the idea is good, and I've already started using it. I will update my previous comment if I find any bug/problem.
I also checked the |
Heads-up @agila5, I've been doing some experiments and have found a function for getting boundary points that is 10 times faster than the # alternative
get_boundary_points3 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
# names(coordinates)
head(coordinates)
coordinates <- as.matrix(coordinates[c("x", "y", "linestring_id")])
L1_index <- ncol(coordinates)
coordinates <- unname(coordinates)
first_pair <- !duplicated(coordinates[, L1_index])
last_pair <- !duplicated(coordinates[, L1_index], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- coordinates[idxs, ]
boundary_points <- sfheaders::sf_point(
pairs[, -L1_index]
)
sf::st_crs(boundary_points) <- sf::st_crs(rnet)
boundary_points
} Full reprex below: # Aim: test rnet_breakup components
# setup -------------------------------------------------------------------
remotes::install_github("itsleeds/osmextract")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'osmextract' from a github remote, the SHA1 (c8f08f09) has not changed since last install.
#> Use `force = TRUE` to force installation
library(osmextract)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright.
#> Any product made from OpenStreetMap must cite OSM as the data source.
#> Geofabrik data are taken from https://download.geofabrik.de/
#> For usage details of bbbike data see https://download.bbbike.org/osm/
#> OpenStreetMap_fr data are taken from http://download.openstreetmap.fr/
library(stplanr)
rnet <- oe_get(
place = "Isle of Wight",
quiet = FALSE,
query = "
SELECT osm_id, highway, z_order, other_tags, geometry
FROM 'lines'
WHERE highway IN (
'primary', 'primary_link',
'secondary', 'secondary_link',
'tertiary', 'tertiary_link',
'unclassified', 'residential', 'service'
)
"
)
#> The input place was matched with: Isle of Wight
#> The chosen file was already detected in the download directory. Skip downloading.
#> The corresponding gpkg file was already detected. Skip vectortranslate operations.
#> Reading layer `lines' from data source `/mnt/57982e2a-2874-4246-a6fe-115c199bc6bd/data/osm/geofabrik_isle-of-wight-latest.gpkg' using driver `GPKG'
#> Simple feature collection with 11475 features and 4 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -1.583473 ymin: 50.57568 xmax: -1.070221 ymax: 50.76731
#> geographic CRS: WGS 84
system.time(rnet_old <- rnet_breakup_vertices(rnet)) # ~10s
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> user system elapsed
#> 9.431 0.032 9.463
# basic checks
nrow(rnet_old) / nrow(rnet) # 1.7
#> [1] 1.747538
sum(sf::st_length(rnet_old)) / sum(sf::st_length(rnet)) # 1
#> 1 [1]
# andrea's new boundary points function:
get_boundary_points <- function(rnet) {
# 1a. extract coordinates
coordinates <- sf::st_coordinates(rnet)
# 1b. Find index of L1 column
L1_index <- ncol(coordinates)
# 1c. Remove colnames
coordinates <- unname(coordinates)
# 2. Find idxs of first and last coordinate (i.e. the boundary points)
first_pair <- !duplicated(coordinates[, L1_index])
last_pair <- !duplicated(coordinates[, L1_index], fromLast = TRUE)
idxs <- first_pair | last_pair
# 3. Extract idxs and rebuild sfc
pairs <- coordinates[idxs, ]
boundary_points <- sf::st_cast(
sf::st_sfc(
sf::st_multipoint(pairs[, -L1_index]),
crs = sf::st_crs(rnet)
),
"POINT"
)
boundary_points
}
# alternative
get_boundary_points2 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
# names(coordinates)
head(coordinates)
coordinates <- as.matrix(coordinates[c("x", "y", "linestring_id")])
L1_index <- ncol(coordinates)
coordinates <- unname(coordinates)
first_pair <- !duplicated(coordinates[, L1_index])
last_pair <- !duplicated(coordinates[, L1_index], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- coordinates[idxs, ]
boundary_points <- sf::st_cast(
sf::st_sfc(
sf::st_multipoint(pairs[, -L1_index]),
crs = sf::st_crs(rnet)
),
"POINT"
)
boundary_points
}
# alternative
get_boundary_points3 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
# names(coordinates)
head(coordinates)
coordinates <- as.matrix(coordinates[c("x", "y", "linestring_id")])
L1_index <- ncol(coordinates)
coordinates <- unname(coordinates)
first_pair <- !duplicated(coordinates[, L1_index])
last_pair <- !duplicated(coordinates[, L1_index], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- coordinates[idxs, ]
boundary_points <- sfheaders::sf_point(
pairs[, -L1_index]
)
sf::st_crs(boundary_points) <- sf::st_crs(rnet)
boundary_points
}
system.time(bp1 <- get_boundary_points(rnet)) # 0.3s
#> user system elapsed
#> 0.277 0.000 0.277
system.time(bp2 <- get_boundary_points2(rnet)) # 0.3s
#> user system elapsed
#> 0.367 0.000 0.367
system.time(bp3 <- get_boundary_points3(rnet)) # 10 times faster!
#> user system elapsed
#> 0.034 0.000 0.034
identical(bp1, bp2)
#> [1] TRUE
identical(sf::st_coordinates(bp1), sf::st_coordinates(bp3))
#> [1] TRUE Created on 2020-11-21 by the reprex package (v0.3.0) |
Update @agila5, I think we can make lots of savings, it will be substantially faster just by replacing I've not tried your linestring specific implementation of See also ad-hoc tests in #443 |
Hi Robin! I run (part of that) code on my laptop and I think there is a problem with # dev packages
remotes::install_github("ropensci/stplanr")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'stplanr' from a github remote, the SHA1 (59562f89) has not changed since last install.
#> Use `force = TRUE` to force installation
# packages
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(stplanr)
# data
rnet = trafficalmr::tc_data_osm
# output
bpline2points <- line2points(rnet)
boundary_points <- rnet_boundary_points(rnet)
# plot
par(mar = rep(0, 4))
plot(rnet$geometry)
plot(boundary_points, add = TRUE, cex = 0.75) # nothing happens, why? This is clearly not the intended result, right?
boundary_points
#> Simple feature collection with 998 features and 0 fields
#> geometry type: POINT
#> dimension: XYZ
#> bbox: xmin: 1 ymin: -0.116214 xmax: 499 ymax: -0.1021283
#> z_range: zmin: 51.50323 zmax: 51.50875
#> geographic CRS: WGS 84
#> First 10 features:
#> geometry
#> 1 POINT Z (1 -0.113894 51.50631)
#> 2 POINT Z (1 -0.1127652 51.50...
#> 3 POINT Z (2 -0.104463 51.50714)
#> 4 POINT Z (2 -0.1043284 51.50...
#> 5 POINT Z (3 -0.112129 51.50674)
#> 6 POINT Z (3 -0.1113433 51.50...
#> 7 POINT Z (4 -0.105887 51.50862)
#> 8 POINT Z (4 -0.1076651 51.50...
#> 9 POINT Z (5 -0.1141145 51.50...
#> 10 POINT Z (5 -0.1140915 51.50...
# old output
plot(rnet$geometry)
plot(bpline2points$geometry, add = TRUE, cex = 0.75) Created on 2020-11-28 by the reprex package (v0.3.0) If you want I can check the problem in the afternoon.
Great! So you are happy for a PR that add and tests the new approach? |
You are of course right, it was an unfinished function as documented in #450, just fixed. I think the boundary point functions are now finished and fast, as shown in the updated reprex below, suggesting we can already double the speed of
Sorry for the slow reply, yes please! Code for a reprex (key bits in a reprex and will push in a commit below): remotes::install_github("ropensci/stplanr")
library(sf)
library(stplanr)
# rnet = stplanr::rnet_roundabout
# rnet = stplanr::osm_net_example
rnet = trafficalmr::tc_data_osm
rnet_boundary_rnet_breakup <- function(rnet) {
rnet_nodes <- sf::st_geometry(line2points(rnet))
rnet_internal_vertexes <- sf::st_geometry(line2vertices(rnet))
unique_rnet_nodes <- do.call("c", unique(rnet_nodes))
unique_rnet_internal_vertexes <- do.call("c", unique(rnet_internal_vertexes))
rbind_nodes_internal_vertexes <- rbind(unique_rnet_nodes, unique_rnet_internal_vertexes)
index_intersection_points <- duplicated(rbind_nodes_internal_vertexes)
intersection_points <- sf::st_as_sf(
data.frame(rbind_nodes_internal_vertexes[index_intersection_points, , drop = FALSE]),
coords = c("x_coords", "y_coords"),
crs = sf::st_crs(rnet)
)
}
`%dtIN%` <- function(y, x) {
tmp = data.table::rbindlist(list(x,y))
len_ = nrow(x)
tmp[, idx := any(.I <= len_) & .N > 1L, by=names(tmp)]
tail(tmp$idx, nrow(y))
}
rnet_boundary_rnet_breakup2 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
first_pair <- !duplicated(coordinates[["sfg_id"]])
last_pair <- !duplicated(coordinates[["sfg_id"]], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- unique(coordinates[idxs, c("x", "y")])
i <- coordinates[!idxs, c("x", "y")]
# find 'internal intersections'
i_dup <- duplicated(i)
i_u <- unique(i)
# this stage can be made a bit faster with data.table:
# https://stackoverflow.com/questions/23971160
# i_in_pairs = interaction(i_u) %in% interaction(pairs) # very slow!
if(requireNamespace("data.table", quietly = TRUE))
i_in_pairs <- i_u %dtIN% pairs
p_sf <- rbind(i_u[i_in_pairs, ], i[i_dup, ])
p <- sfheaders::sf_point(unique(p_sf))
sf::st_crs(p) <- sf::st_crs(rnet)
p
}
p1 = rnet_boundary_rnet_breakup(rnet)
p2 = rnet_boundary_rnet_breakup2(rnet)
nrow(p1)
nrow(p2)
plot(p1)
plot(p2, cex = 2, add = TRUE)
bpline2points <- line2points(rnet)
boundary_points <- rnet_boundary_points(rnet)
boundary_points2 <- rnet_duplicated_vertices(rnet)
boundary_points_rnb <- rnet_boundary_rnet_breakup(rnet)
boundary_points_rnb2 <- rnet_boundary_rnet_breakup2(rnet)
plot(rnet$geometry)
plot(boundary_points, add = TRUE, cex = 0.5) # too many
plot(boundary_points2, add = TRUE)
plot(boundary_points_rnb, col = "red", add = TRUE, cex = 3)
nrow(bpline2points)
nrow(boundary_points)
nrow(boundary_points2)
nrow(boundary_points_rnb)
identical(boundary_points2$geometry, boundary_points_rnb$geometry)
summary(boundary_points2$geometry %in% boundary_points_rnb$geometry)
bench::mark(check = FALSE,
l2p = {line2points(rnet)},
rbp = {rnet_boundary_points(rnet)},
rdv = {rnet_duplicated_vertices(rnet)},
rnb = {rnet_boundary_rnet_breakup(rnet)},
rnb2 = {rnet_boundary_rnet_breakup2(rnet)},
rnbv = {rnet_breakup_vertices(rnet)}
)
|
bench::mark(check = FALSE,
l2p = {line2points(rnet)},
rbp = {rnet_boundary_points(rnet)},
rdv = {rnet_duplicated_vertices(rnet)},
rnb = {rnet_boundary_rnet_breakup(rnet)},
rnb2 = {rnet_boundary_rnet_breakup2(rnet)},
rnbv = {rnet_breakup_vertices(rnet)}
)
#> # A tibble: 6 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 l2p 32.81ms 35.1ms 28.4 477.23KB 5.68
#> 2 rbp 1.24ms 1.39ms 694. 422.35KB 10.0
#> 3 rdv 13.39ms 14.09ms 70.5 1.49MB 3.92
#> 4 rnb 45.19ms 46.86ms 21.2 1.05MB 5.79
#> 5 rnb2 4.89ms 5.4ms 180. 716.48KB 5.94
#> 6 rnbv 191.64ms 195.81ms 5.12 7.51MB 6.83 |
Hi! I didn't check all the details yet but I think that the message is to use
Ok, great! I will add something here and in sfnetworks repo after lunch! |
Yes. If you want to avoid duplicate points use One other thing: do you think it's worth breaking the function into smaller exported pieces, such as: stplanr/data-raw/ad-hoc-tests/test-rnet-breakup-timing1.R Lines 25 to 51 in 8f055db
I think so! |
Clarification, I think the code chunk shown in the test code above can be used, after the fast version of the function Lines 100 to 121 in 8f055db
With something like this: intersection_points <- rnet_boundary_rnet_breakup(rnet) |
Almost worth creating a new package called |
If we can get 10 times faster boundary points function in |
Heads-up @luukvdmeer this is the faster code, pretty quick. BTW I've had doubts about the name stplanr/R/rnet_boundary_points.R Lines 20 to 37 in 8f055db
|
Great! I will test this in I might change |
Using this makes constructing an |
Thanks for all contributions to this really interesting issue. Just one note via library(stplanr)
library(sf)
rnet = stplanr::osm_net_example
# define:
# rnet_boundary_rnet_breakup
# `%dtIN%`
# rnet_boundary_rnet_breakup2
boundary_points_by_id_r <- function (rnet) {
ids <- lapply (rnet$geometry, function (i) rownames (as.matrix (i)))
ends <- vapply (ids, function (i) i [c (1, length (i))],
character (2), USE.NAMES = FALSE)
ends <- unique (as.vector (ends))
mids <- lapply (ids, function (i) i [-c (1, length (i))])
mids <- unname (do.call (c, mids))
mid_is_end <- mids [mids %in% ends]
mids <- table (mids)
mids_dup <- names (mids) [which (mids > 1)]
ids <- unique (c (mid_is_end, mids_dup))
xy <- do.call (rbind, lapply (rnet$geometry, function (i) as.matrix (i)))
xy <- xy [match (ids, rownames (xy)), ]
p <- sfheaders::sf_point (xy)
sf::st_crs(p) <- sf::st_crs(rnet)
return (p)
}
library(Rcpp)
cppFunction('
Rcpp::CharacterVector getids (Rcpp::List g) {
const int n = g.size ();
std::unordered_set <std::string> endset, midset, middupset;
Rcpp::List mids (n);
for (Rcpp::NumericMatrix i: g) {
Rcpp::List dimnames = i.attr ("dimnames");
Rcpp::CharacterVector rownames = dimnames (0);
endset.emplace (rownames (0));
endset.emplace (rownames (rownames.size () - 1));
rownames.erase (0);
rownames.erase (rownames.size () - 1);
for (auto j: rownames)
{
std::string js = Rcpp::as <std::string> (j);
if (midset.find (js) != midset.end ())
middupset.emplace (j);
midset.emplace (js);
}
}
std::unordered_set <std::string> mid_is_end;
for (auto i: midset) {
if (endset.find (i) != endset.end ())
mid_is_end.emplace (i);
}
std::unordered_set <std::string> res;
for (auto i: mid_is_end)
res.emplace (i);
for (auto i: middupset)
res.emplace (i);
return Rcpp::wrap (res);
}
')
boundary_points_by_id_cpp <- function (rnet) {
ids <- getids (rnet$geometry)
xy <- do.call (rbind, lapply (rnet$geometry, function (i) as.matrix (i)))
xy <- xy [match (ids, rownames (xy)), ]
p <- sfheaders::sf_point (xy)
sf::st_crs(p) <- sf::st_crs(rnet)
return (p)
}
pref <- rnet_boundary_rnet_breakup2 (rnet)
pr <- boundary_points_by_id_r (rnet)
pc <- boundary_points_by_id_cpp (rnet)
nrow (pref); nrow (pr); nrow (pc)
#> [1] 55
#> [1] 55
#> [1] 55
bench::mark(check = FALSE,
l2p = {line2points(rnet)},
rbp = {rnet_boundary_points(rnet)},
rdv = {rnet_duplicated_vertices(rnet)},
rnb = {rnet_boundary_rnet_breakup(rnet)},
rnb2 = {rnet_boundary_rnet_breakup2(rnet)},
rnbv = {rnet_breakup_vertices(rnet)},
rrid = {boundary_points_by_id_r(rnet)},
rcid = {boundary_points_by_id_cpp(rnet)},
time_unit = "ms")
#> # A tibble: 8 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 l2p 5.83 6.10 153. NA 6.36
#> 2 rbp 0.435 0.471 2098. NA 6.20
#> 3 rdv 2.99 3.15 313. NA 2.03
#> 4 rnb 9.84 10.2 95.1 NA 6.48
#> 5 rnb2 2.10 2.34 405. NA 4.12
#> 6 rnbv 42.9 44.2 22.6 NA 5.02
#> 7 rrid 1.66 1.80 539. NA 6.24
#> 8 rcid 0.878 0.936 1021. NA 4.22 Created on 2020-12-02 by the reprex package (v0.3.0) Both versions there are pretty hacky, and the C++ could likely be made even more efficient. Main point I wanted to make is that when a new I've been needing a good home for a function like this too, so would be very happy to have it housed in |
Very happy with that @luukvdmeer, thanks! |
Many thanks for chipping in @mpadge, impressive CPP implementation. Confirmed: decent speeded-up version of #> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 l2p 5.53 6.33 159. 96.33KB 5.77
#> 2 rbp 0.423 0.463 2098. 85.59KB 6.26
#> 3 rdv 3.25 3.51 282. 346.93KB 2.04
#> 4 rnb 10.1 10.9 90.7 276.41KB 6.64
#> 5 rnb2 2.04 2.23 440. 187.54KB 6.34
#> 6 rnbv 38.4 40.0 25.1 1.29MB 4.56
#> 7 rrid 1.70 1.89 524. 88.64KB 6.29
#> 8 rcid 0.740 0.840 1169. 82.91KB 4.11 # Aim: try to speed-up rnet_breakup_vertices
remotes::install_github("ropensci/stplanr")
#> Using github PAT from envvar GITHUB_PAT
#> Downloading GitHub repo ropensci/stplanr@HEAD
#>
#> checking for file ‘/tmp/Rtmpwfdfrp/remotes103c22f776dc/ropensci-stplanr-8f055db/DESCRIPTION’ ... ✔ checking for file ‘/tmp/Rtmpwfdfrp/remotes103c22f776dc/ropensci-stplanr-8f055db/DESCRIPTION’
#> ─ preparing ‘stplanr’:
#> checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information
#> ─ cleaning src
#> ─ checking for LF line-endings in source and make files and shell scripts
#> ─ checking for empty or unneeded directories
#> ─ building ‘stplanr_0.8.0.9000.tar.gz’
#>
#>
#> Installing package into '/home/robin/R/x86_64-pc-linux-gnu-library/4.0'
#> (as 'lib' is unspecified)
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
library(stplanr)
# rnet = stplanr::rnet_roundabout
# rnet = stplanr::osm_net_example
rnet = trafficalmr::tc_data_osm
rnet_boundary_rnet_breakup <- function(rnet) {
rnet_nodes <- sf::st_geometry(line2points(rnet))
rnet_internal_vertexes <- sf::st_geometry(line2vertices(rnet))
unique_rnet_nodes <- do.call("c", unique(rnet_nodes))
unique_rnet_internal_vertexes <- do.call("c", unique(rnet_internal_vertexes))
rbind_nodes_internal_vertexes <- rbind(unique_rnet_nodes, unique_rnet_internal_vertexes)
index_intersection_points <- duplicated(rbind_nodes_internal_vertexes)
intersection_points <- sf::st_as_sf(
data.frame(rbind_nodes_internal_vertexes[index_intersection_points, , drop = FALSE]),
coords = c("x_coords", "y_coords"),
crs = sf::st_crs(rnet)
)
}
`%dtIN%` <- function(y, x) {
tmp = data.table::rbindlist(list(x,y))
len_ = nrow(x)
tmp[, idx := any(.I <= len_) & .N > 1L, by=names(tmp)]
tail(tmp$idx, nrow(y))
}
rnet_boundary_rnet_breakup2 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
first_pair <- !duplicated(coordinates[["sfg_id"]])
last_pair <- !duplicated(coordinates[["sfg_id"]], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- unique(coordinates[idxs, c("x", "y")])
i <- coordinates[!idxs, c("x", "y")]
# find 'internal intersections'
i_dup <- duplicated(i)
i_u <- unique(i)
# this stage can be made a bit faster with data.table:
# https://stackoverflow.com/questions/23971160
# i_in_pairs = interaction(i_u) %in% interaction(pairs) # very slow!
if(requireNamespace("data.table", quietly = TRUE))
i_in_pairs <- i_u %dtIN% pairs
p_sf <- rbind(i_u[i_in_pairs, ], i[i_dup, ])
p <- sfheaders::sf_point(unique(p_sf))
sf::st_crs(p) <- sf::st_crs(rnet)
p
}
p1 = rnet_boundary_rnet_breakup(rnet)
p2 = rnet_boundary_rnet_breakup2(rnet)
nrow(p1)
#> [1] 203
nrow(p2)
#> [1] 253
plot(p1)
plot(p2, cex = 2, add = TRUE) bpline2points <- line2points(rnet)
boundary_points <- rnet_boundary_points(rnet)
boundary_points2 <- rnet_duplicated_vertices(rnet)
boundary_points_rnb <- rnet_boundary_rnet_breakup(rnet)
boundary_points_rnb2 <- rnet_boundary_rnet_breakup2(rnet)
plot(rnet$geometry)
plot(boundary_points, add = TRUE, cex = 0.5) # too many
plot(boundary_points2, add = TRUE)
plot(boundary_points_rnb, col = "red", add = TRUE, cex = 3) nrow(bpline2points)
#> [1] 998
nrow(boundary_points)
#> [1] 998
nrow(boundary_points2)
#> [1] 523
nrow(boundary_points_rnb)
#> [1] 203
identical(boundary_points2$geometry, boundary_points_rnb$geometry)
#> [1] FALSE
summary(boundary_points2$geometry %in% boundary_points_rnb$geometry)
#> Mode FALSE TRUE
#> logical 320 203
bench::mark(check = FALSE,
l2p = {line2points(rnet)},
rbp = {rnet_boundary_points(rnet)},
rdv = {rnet_duplicated_vertices(rnet)},
rnb = {rnet_boundary_rnet_breakup(rnet)},
rnb2 = {rnet_boundary_rnet_breakup2(rnet)},
rnbv = {rnet_breakup_vertices(rnet)}
)
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 6 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 l2p 30.82ms 31.93ms 29.5 477.23KB 7.88
#> 2 rbp 1.31ms 1.42ms 613. 422.35KB 9.99
#> 3 rdv 13.36ms 13.6ms 70.5 1.49MB 3.92
#> 4 rnb 42.3ms 45.39ms 21.9 1.05MB 5.96
#> 5 rnb2 5.02ms 5.2ms 177. 716.48KB 7.94
#> 6 rnbv 180.59ms 180.83ms 5.48 7.47MB 5.48
library(profvis)
profvis({
coordinates <- sfheaders::sf_to_df(trafficalmr::tc_data_osm)
first_pair <- !duplicated(coordinates[["sfg_id"]])
last_pair <- !duplicated(coordinates[["sfg_id"]], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- unique(coordinates[idxs, c("x", "y")])
i <- coordinates[!idxs, c("x", "y")]
# find 'internal intersections'
i_dup <- duplicated(i)
i_u <- unique(i)
# this stage can be made a bit faster with data.table:
# https://stackoverflow.com/questions/23971160
# i_in_pairs = interaction(i_u) %in% interaction(pairs) # very slow!
tmp <- data.table::rbindlist(list(i_u, pairs))
len_ <- nrow(i_u)
tmp[, idx := any(.I <= len_) & .N > 1L, by=names(tmp)]
i_in_pairs <- tail(tmp$idx, nrow(pairs))
p_sf <- rbind(i_u[i_in_pairs, ], i[i_dup, ])
p <- sfheaders::sf_point(unique(p_sf))
sf::st_crs(p) <- sf::st_crs(rnet)
p
})
#> Error in parse_rprof(prof_output, expr_source): No parsing data available. Maybe your function was too fast?
rnet_breakup_vertices2 <- function(rnet, breakup_internal_vertex_matches = TRUE) {
rnet_nodes <- sf::st_geometry(line2points(rnet))
rnet_internal_vertexes <- sf::st_geometry(line2vertices(rnet))
# For the first part of the procedure I don't need duplicated nodes or
# duplicated vertexes so I can extract their unique values
unique_rnet_nodes <- do.call("c", unique(rnet_nodes))
unique_rnet_internal_vertexes <- do.call("c", unique(rnet_internal_vertexes))
# Intersection between nodes and internal vertexes
# The following code is the same as
# intersection_point <- sf::st_intersection(unique_rnet_nodes, unique_rnet_internal_vertexes)
# but faster since we are dealing only with points
rbind_nodes_internal_vertexes <- rbind(unique_rnet_nodes, unique_rnet_internal_vertexes)
index_intersection_points <- duplicated(rbind_nodes_internal_vertexes)
if (any(index_intersection_points)) {
intersection_points <- sf::st_as_sf(
data.frame(rbind_nodes_internal_vertexes[index_intersection_points, , drop = FALSE]),
coords = c("x_coords", "y_coords"),
crs = sf::st_crs(rnet)
)
message("Splitting rnet object at the shared boundary points.")
rnet_breakup_collection <- lwgeom::st_split(rnet, intersection_points$geometry)
rnet_clean <- sf::st_collection_extract(rnet_breakup_collection, "LINESTRING")
} else {
rnet_clean <- rnet
}
# Split again at the duplicated internal vertexes
rnet_internal_vertexes_duplicated <- rnet_internal_vertexes[duplicated(rnet_internal_vertexes)]
if (length(rnet_internal_vertexes_duplicated) > 0 & breakup_internal_vertex_matches) {
message("Splitting rnet object at the shared internal points.")
rnet_breakup_collection <- lwgeom::st_split(rnet_clean, rnet_internal_vertexes_duplicated)
rnet_clean <- sf::st_collection_extract(rnet_breakup_collection, "LINESTRING")
}
rnet_clean
}
plot(st_geometry(rnet_roundabout), lwd = 2, col = rainbow(nrow(rnet_roundabout)))
</details>
bench::mark(f1 = {rnet_roundabout_clean <- rnet_breakup_vertices(rnet_roundabout)})
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared boundary points.
#> # A tibble: 1 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 f1 8.65ms 8.83ms 110. 216KB 4.25
boundary_points <- st_geometry(line2points(rnet_roundabout))
points_cols <- rep(rainbow(nrow(rnet_roundabout)), each = 2)
plot(boundary_points, pch = 16, add = TRUE, col = points_cols) library(stplanr)
library(sf)
rnet = stplanr::osm_net_example
# define:
# rnet_boundary_rnet_breakup
# `%dtIN%`
# rnet_boundary_rnet_breakup2
boundary_points_by_id_r <- function (rnet) {
ids <- lapply (rnet$geometry, function (i) rownames (as.matrix (i)))
ends <- vapply (ids, function (i) i [c (1, length (i))],
character (2), USE.NAMES = FALSE)
ends <- unique (as.vector (ends))
mids <- lapply (ids, function (i) i [-c (1, length (i))])
mids <- unname (do.call (c, mids))
mid_is_end <- mids [mids %in% ends]
mids <- table (mids)
mids_dup <- names (mids) [which (mids > 1)]
ids <- unique (c (mid_is_end, mids_dup))
xy <- do.call (rbind, lapply (rnet$geometry, function (i) as.matrix (i)))
xy <- xy [match (ids, rownames (xy)), ]
p <- sfheaders::sf_point (xy)
sf::st_crs(p) <- sf::st_crs(rnet)
return (p)
}
library(Rcpp)
cppFunction('
Rcpp::CharacterVector getids (Rcpp::List g) {
const int n = g.size ();
std::unordered_set <std::string> endset, midset, middupset;
Rcpp::List mids (n);
for (Rcpp::NumericMatrix i: g) {
Rcpp::List dimnames = i.attr ("dimnames");
Rcpp::CharacterVector rownames = dimnames (0);
endset.emplace (rownames (0));
endset.emplace (rownames (rownames.size () - 1));
rownames.erase (0);
rownames.erase (rownames.size () - 1);
for (auto j: rownames)
{
std::string js = Rcpp::as <std::string> (j);
if (midset.find (js) != midset.end ())
middupset.emplace (j);
midset.emplace (js);
}
}
std::unordered_set <std::string> mid_is_end;
for (auto i: midset) {
if (endset.find (i) != endset.end ())
mid_is_end.emplace (i);
}
std::unordered_set <std::string> res;
for (auto i: mid_is_end)
res.emplace (i);
for (auto i: middupset)
res.emplace (i);
return Rcpp::wrap (res);
}
')
boundary_points_by_id_cpp <- function (rnet) {
ids <- getids (rnet$geometry)
xy <- do.call (rbind, lapply (rnet$geometry, function (i) as.matrix (i)))
xy <- xy [match (ids, rownames (xy)), ]
p <- sfheaders::sf_point (xy)
sf::st_crs(p) <- sf::st_crs(rnet)
return (p)
}
pref <- rnet_boundary_rnet_breakup2 (rnet)
pr <- boundary_points_by_id_r (rnet)
pc <- boundary_points_by_id_cpp (rnet)
nrow (pref); nrow (pr); nrow (pc)
#> [1] 55
#> [1] 55
#> [1] 55
#> [1] 55
#> [1] 55
#> [1] 55
bench::mark(check = FALSE,
l2p = {line2points(rnet)},
rbp = {rnet_boundary_points(rnet)},
rdv = {rnet_duplicated_vertices(rnet)},
rnb = {rnet_boundary_rnet_breakup(rnet)},
rnb2 = {rnet_boundary_rnet_breakup2(rnet)},
rnbv = {rnet_breakup_vertices(rnet)},
rrid = {boundary_points_by_id_r(rnet)},
rcid = {boundary_points_by_id_cpp(rnet)},
time_unit = "ms")
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> # A tibble: 8 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 l2p 5.53 6.33 159. 96.33KB 5.77
#> 2 rbp 0.423 0.463 2098. 85.59KB 6.26
#> 3 rdv 3.25 3.51 282. 346.93KB 2.04
#> 4 rnb 10.1 10.9 90.7 276.41KB 6.64
#> 5 rnb2 2.04 2.23 440. 187.54KB 6.34
#> 6 rnbv 38.4 40.0 25.1 1.29MB 4.56
#> 7 rrid 1.70 1.89 524. 88.64KB 6.29
#> 8 rcid 0.740 0.840 1169. 82.91KB 4.11 Created on 2020-12-02 by the reprex package (v0.3.0) |
I'm up for that but suggest waiting until #451 by @agila5 is merged first. No harm in having parallel PRs but with |
But overall very keen for more super fast CPP code in there using |
Regarding your point about OSM ids and boundary points @mpadge, interesting one. Looking into it with this: mapview::mapview(rnet) +
mapview::mapview(pc) |
The closes OSM element I can find is this: https://www.openstreetmap.org/node/7592587253 |
Hi everyone and thank you very much for your inputs. I plan to check everything as soon as possible (Saturday probably). Just one note: the previous benchmarks should be adjusted since you are comparing different ideas (i.e. extract the boundary points, extract the "internal" points, breaking up a line string). |
If I may also contribute here ;)
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfheaders)
p1 = st_point(c(0, 1))
p2 = st_point(c(1, 1))
p3 = st_point(c(3, 1))
p4 = st_point(c(4, 1))
p5 = st_point(c(3, 2))
p6 = st_point(c(3, 0))
p7 = st_point(c(4, 3))
p8 = st_point(c(4, 2))
p9 = st_point(c(4, 0))
l1 = st_sfc(st_linestring(c(p1, p2, p3, p4)), crs = 4326)
l2 = st_sfc(st_linestring(c(p5, p3, p6)), crs = 4326)
l3 = st_sfc(st_linestring(c(p7, p8, p4, p9)), crs = 4326)
points = st_sfc(st_multipoint(c(p1, p2, p3, p4, p5, p6, p7, p8, p9)))
lines = c(l1, l2, l3)
lines = st_as_sf(lines)
par(mar = c(1, 1, 1, 1))
plot(lines, col = rainbow(nrow(lines)))
plot(points, pch = 20, add = TRUE) get_split_points = function(x, unique = FALSE) {
# Get linestring vertices.
df = sfc_to_df(st_geometry(x))
coords = df[names(df) %in% c("x", "y", "z", "m")]
# Find which of the vertices are boundaries.
is_startpoint = !duplicated(df[["sfg_id"]])
is_endpoint = !duplicated(df[["sfg_id"]], fromLast = TRUE)
is_boundary = is_startpoint | is_endpoint
# Find which of the vertices occur more than one.
is_duplicate_desc = duplicated(coords)
is_duplicate_asc = duplicated(coords, fromLast = TRUE)
is_multiple = is_duplicate_desc | is_duplicate_asc
# Split points are those vertices that:
# --> 1) Have duplicates, and
# --> 2) Are not boundaries.
split_idxs = is_multiple & !is_boundary
# Subset coordinates.
split_coords = coords[split_idxs, ]
if (unique) split_coords = unique(split_coords)
# Rebuild sf structure.
split_sf = sf_point(split_coords)
st_crs(split_sf) = st_crs(x)
split_sf
}
split_points_1 = get_split_points(lines, unique = TRUE)
split_points_1
#> Simple feature collection with 2 features and 0 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 3 ymin: 1 xmax: 4 ymax: 1
#> CRS: EPSG:4326
#> geometry
#> 1 POINT (3 1)
#> 2 POINT (4 1)
plot(lines, col = rainbow(nrow(lines)))
plot(points, pch = 20, add = TRUE)
plot(split_points_1, pch = 8, col = "red", add = TRUE) `%dtIN%` <- function(y, x) {
tmp = data.table::rbindlist(list(x,y))
len_ = nrow(x)
tmp[, idx := any(.I <= len_) & .N > 1L, by=names(tmp)]
tail(tmp$idx, nrow(y))
}
rnet_boundary_rnet_breakup2 <- function(rnet) {
coordinates <- sfheaders::sf_to_df(rnet)
first_pair <- !duplicated(coordinates[["sfg_id"]])
last_pair <- !duplicated(coordinates[["sfg_id"]], fromLast = TRUE)
idxs <- first_pair | last_pair
pairs <- unique(coordinates[idxs, c("x", "y")])
i <- coordinates[!idxs, c("x", "y")]
# find 'internal intersections'
i_dup <- duplicated(i)
i_u <- unique(i)
# this stage can be made a bit faster with data.table:
# https://stackoverflow.com/questions/23971160
# i_in_pairs = interaction(i_u) %in% interaction(pairs) # very slow!
if(requireNamespace("data.table", quietly = TRUE))
i_in_pairs <- i_u %dtIN% pairs
p_sf <- rbind(i_u[i_in_pairs, ], i[i_dup, ])
p <- sfheaders::sf_point(unique(p_sf))
sf::st_crs(p) <- sf::st_crs(rnet)
p
}
split_points_2 = rnet_boundary_rnet_breakup2(lines)
split_points_2
#> Simple feature collection with 2 features and 0 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 3 ymin: 1 xmax: 4 ymax: 1
#> CRS: EPSG:4326
#> geometry
#> 1 POINT (4 1)
#> 2 POINT (3 1)
plot(lines, col = rainbow(nrow(lines)))
plot(points, pch = 20, add = TRUE)
plot(split_points_2, pch = 8, col = "red", add = TRUE) bench::mark(
check = FALSE,
rnb2 = {rnet_boundary_rnet_breakup2(lines)},
gsp = {get_split_points(lines)},
time_unit = "ms"
)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 rnb2 1.13 1.20 774. 100.78KB 24.5
#> 2 gsp 0.270 0.282 3326. 4.98KB 28.0 Created on 2020-12-03 by the reprex package (v0.3.0) |
You made it 4 times faster again @luukvdmeer? 🤯 amazing if so, not double checked but very exciting if so, many thanks for sharing! |
One more thought: All of the code using vertex coordinates relies on applying equality operations to floating point numbers (mostly via |
Here an extension of my previous post: a function to subdivide lines by shared internal vertices. Hence, an alternative to Of course @mpadge makes a very important point about the use of library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfheaders)
p1 = st_point(c(0, 1))
p2 = st_point(c(1, 1))
p3 = st_point(c(3, 1))
p4 = st_point(c(4, 1))
p5 = st_point(c(3, 2))
p6 = st_point(c(3, 0))
p7 = st_point(c(4, 3))
p8 = st_point(c(4, 2))
p9 = st_point(c(4, 0))
l1 = st_sfc(st_linestring(c(p1, p2, p3, p4)), crs = 4326)
l2 = st_sfc(st_linestring(c(p5, p3, p6)), crs = 4326)
l3 = st_sfc(st_linestring(c(p7, p8, p4, p9)), crs = 4326)
points = st_sfc(st_multipoint(c(p1, p2, p3, p4, p5, p6, p7, p8, p9)))
lines = c(l1, l2, l3)
lines = st_as_sf(lines)
lines$foo = letters[1:nrow(lines)]
lines
#> Simple feature collection with 3 features and 1 field
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 4 ymax: 3
#> CRS: EPSG:4326
#> x foo
#> 1 LINESTRING (0 1, 1 1, 3 1, ... a
#> 2 LINESTRING (3 2, 3 1, 3 0) b
#> 3 LINESTRING (4 3, 4 2, 4 1, ... c
par(mar = c(1, 1, 1, 1))
plot(st_geometry(lines), col = rainbow(nrow(lines)))
plot(points, pch = 20, add = TRUE) subdivision_1 = stplanr::rnet_breakup_vertices(lines)
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
subdivision_1
#> Simple feature collection with 6 features and 1 field
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 4 ymax: 3
#> CRS: EPSG:4326
#> foo x
#> 1 a LINESTRING (0 1, 1 1, 3 1)
#> 1.1 a LINESTRING (3 1, 4 1)
#> 2 b LINESTRING (3 2, 3 1)
#> 2.1 b LINESTRING (3 1, 3 0)
#> 3 c LINESTRING (4 3, 4 2, 4 1)
#> 3.1 c LINESTRING (4 1, 4 0)
plot(st_geometry(subdivision_1), col = rainbow(nrow(subdivision_1)))
plot(points, pch = 20, add = TRUE) subdivide_lines_v1 = function(x, drop_attrs = FALSE) {
# Get linestring vertices.
df = sfc_to_df(st_geometry(x))
coords = df[names(df) %in% c("x", "y", "z", "m")]
# Find which of the vertices are boundaries.
is_startpoint = !duplicated(df[["sfg_id"]])
is_endpoint = !duplicated(df[["sfg_id"]], fromLast = TRUE)
is_boundary = is_startpoint | is_endpoint
# Find which of the vertices occur more than one.
is_duplicate_desc = duplicated(coords)
is_duplicate_asc = duplicated(coords, fromLast = TRUE)
is_multiple = is_duplicate_desc | is_duplicate_asc
# Split points are those vertices that:
# --> 1) Have duplicates, and
# --> 2) Are not boundaries.
is_split = is_multiple & !is_boundary
# Duplicate coordinates of split points.
reps = rep(1L, nrow(coords)) # By default store each point once.
reps[is_split] = 2L # Store split points twice.
new_coords = data.frame(lapply(coords, function(i) rep(i, reps)))
# Update linestring indices.
# --> First duplicate original indices at each split point.
# --> Then increment those accordingly at each split point.
dup_idxs = rep(df$linestring_id, reps)
incs = integer(nrow(new_coords)) # By default don't increment.
incs[which(is_split) + 1:sum(is_split)] = 1L # Increment by 1 after each split.
new_idxs = dup_idxs + cumsum(incs)
new_coords$linestring_id = new_idxs
# Rebuild lines.
new_lines = sfc_linestring(new_coords, linestring_id = "linestring_id")
st_crs(new_lines) = st_crs(x)
# Restore original line attributes.
# Duplicate attributes within splitted lines.
if (drop_attrs) {
x_new = st_as_sf(new_lines)
} else {
orig_idxs = dup_idxs[!duplicated(new_idxs)]
x_new = x[orig_idxs, ]
st_geometry(x_new) = new_lines
}
x_new
}
subdivision_2 = subdivide_lines_v1(lines)
subdivision_2
#> Simple feature collection with 6 features and 1 field
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 4 ymax: 3
#> CRS: EPSG:4326
#> x foo
#> 1 LINESTRING (0 1, 1 1, 3 1) a
#> 1.1 LINESTRING (3 1, 4 1) a
#> 2 LINESTRING (3 2, 3 1) b
#> 2.1 LINESTRING (3 1, 3 0) b
#> 3 LINESTRING (4 3, 4 2, 4 1) c
#> 3.1 LINESTRING (4 1, 4 0) c
plot(st_geometry(subdivision_2), col = rainbow(nrow(subdivision_2)))
plot(points, pch = 20, add = TRUE) subdivision_3 = subdivide_lines_v1(lines, drop_attrs = TRUE)
subdivision_3
#> Simple feature collection with 6 features and 0 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 4 ymax: 3
#> CRS: EPSG:4326
#> x
#> 1 LINESTRING (0 1, 1 1, 3 1)
#> 2 LINESTRING (3 1, 4 1)
#> 3 LINESTRING (3 2, 3 1)
#> 4 LINESTRING (3 1, 3 0)
#> 5 LINESTRING (4 3, 4 2, 4 1)
#> 6 LINESTRING (4 1, 4 0)
plot(st_geometry(subdivision_3), col = rainbow(nrow(subdivision_3)))
plot(points, pch = 20, add = TRUE) subdivide_lines_v2 = function(x, drop_attrs = FALSE) {
# Get linestring vertices.
df = sfc_to_df(st_geometry(x))
coords = df[names(df) %in% c("x", "y", "z", "m")]
# Find which of the vertices are boundaries.
is_startpoint = !duplicated(df[["sfg_id"]])
is_endpoint = !duplicated(df[["sfg_id"]], fromLast = TRUE)
is_boundary = is_startpoint | is_endpoint
# Find which of the vertices occur more than one.
# is_duplicate_desc = duplicated(coords)
# is_duplicate_asc = duplicated(coords, fromLast = TRUE)
# is_multiple = is_duplicate_desc | is_duplicate_asc
is_multiple = lengths(st_equals(sf_cast(x, "POINT"))) > 1
# Split points are those vertices that:
# --> 1) Have duplicates, and
# --> 2) Are not boundaries.
is_split = is_multiple & !is_boundary
# Duplicate coordinates of split points.
reps = rep(1L, nrow(coords)) # By default store each point once.
reps[is_split] = 2L # Store split points twice.
new_coords = data.frame(lapply(coords, function(i) rep(i, reps)))
# Update linestring indices.
# --> First duplicate original indices at each split point.
# --> Then increment those accordingly at each split point.
dup_idxs = rep(df$linestring_id, reps)
incs = integer(nrow(new_coords)) # By default don't increment.
incs[which(is_split) + 1:sum(is_split)] = 1L # Increment by 1 after each split.
new_idxs = dup_idxs + cumsum(incs)
new_coords$linestring_id = new_idxs
# Rebuild lines.
new_lines = sfc_linestring(new_coords, linestring_id = "linestring_id")
st_crs(new_lines) = st_crs(x)
# Restore original line attributes.
# Duplicate attributes within splitted lines.
if (drop_attrs) {
x_new = st_as_sf(new_lines)
} else {
orig_idxs = dup_idxs[!duplicated(new_idxs)]
x_new = x[orig_idxs, ]
st_geometry(x_new) = new_lines
}
x_new
}
subdivision_4 = subdivide_lines_v2(lines)
subdivision_4
#> Simple feature collection with 6 features and 1 field
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: 0 ymin: 0 xmax: 4 ymax: 3
#> CRS: EPSG:4326
#> x foo
#> 1 LINESTRING (0 1, 1 1, 3 1) a
#> 1.1 LINESTRING (3 1, 4 1) a
#> 2 LINESTRING (3 2, 3 1) b
#> 2.1 LINESTRING (3 1, 3 0) b
#> 3 LINESTRING (4 3, 4 2, 4 1) c
#> 3.1 LINESTRING (4 1, 4 0) c
plot(st_geometry(subdivision_4), col = rainbow(nrow(subdivision_4)))
plot(points, pch = 20, add = TRUE) bench::mark(
check = FALSE,
rbv = {stplanr::rnet_breakup_vertices(lines)},
sl1 = {subdivide_lines_v1(lines)},
sl2 = {subdivide_lines_v2(lines)},
sl1d = {subdivide_lines_v1(lines, drop_attrs = TRUE)},
sl2d = {subdivide_lines_v2(lines, drop_attrs = TRUE)},
time_unit = "ms"
)
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> Splitting rnet object at the shared boundary points.
#> Splitting rnet object at the shared internal points.
#> # A tibble: 5 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <dbl> <dbl> <bch:byt> <dbl>
#> 1 rbv 8.70 9.60 102. 96.88KB 18.4
#> 2 sl1 1.01 1.05 909. 10.61KB 17.1
#> 3 sl2 2.24 2.38 400. 71.24KB 15.3
#> 4 sl1d 0.552 0.575 1629. 5.25KB 17.0
#> 5 sl2d 1.77 1.97 498. 65.88KB 15.2 Created on 2020-12-03 by the reprex package (v0.3.0) |
Great additional points @luukvdmeer. Just one follow-up in terms of benchmarking, I think @agila5's benchmark on a large real dataset should be used to ensure that it scales and ensure it works for edge cases that can be found on a citywide network. Would be interested to see benchmarks comparing with this starter for 10 from @agila5 in #451: remotes::install_github("ropensci/stplanr", "6ae27d0b81370fe9b5838fe879cc983db929d55c")
# get data
iow <- osmextract::oe_get("Isle of Wight", quiet = TRUE)
nrow(iow)
#> [1] 45422
# current approach
system.time(({
iow_breakup <- stplanr::rnet_breakup_vertices(iow)
}))
#> user system elapsed
#> 7.11 0.66 8.73
#> [1] 80957 |
Of course, totally agree! |
Output is the same and execution times similar. I think also the concept of the implementation is very similar. Just wanted to share this version without # remotes::install_github("ropensci/stplanr", "6ae27d0b81370fe9b5838fe879cc983db929d55c")
library(stplanr)
library(sf)
#> Linking to GEOS 3.7.1, GDAL 2.2.2, PROJ 4.9.2
library(sfheaders)
subdivide_lines_v1 = function(x, drop_attrs = FALSE) {
# Get linestring vertices.
df = sfc_to_df(st_geometry(x))
coords = df[names(df) %in% c("x", "y", "z", "m")]
# Find which of the vertices are boundaries.
is_startpoint = !duplicated(df[["sfg_id"]])
is_endpoint = !duplicated(df[["sfg_id"]], fromLast = TRUE)
is_boundary = is_startpoint | is_endpoint
# Find which of the vertices occur more than one.
is_duplicate_desc = duplicated(coords)
is_duplicate_asc = duplicated(coords, fromLast = TRUE)
is_multiple = is_duplicate_desc | is_duplicate_asc
# Split points are those vertices that:
# --> 1) Have duplicates, and
# --> 2) Are not boundaries.
is_split = is_multiple & !is_boundary
# Duplicate coordinates of split points.
reps = rep(1L, nrow(coords)) # By default store each point once.
reps[is_split] = 2L # Store split points twice.
new_coords = data.frame(lapply(coords, function(i) rep(i, reps)))
# Update linestring indices.
# --> First duplicate original indices at each split point.
# --> Then increment those accordingly at each split point.
dup_idxs = rep(df$linestring_id, reps)
incs = integer(nrow(new_coords)) # By default don't increment.
incs[which(is_split) + 1:sum(is_split)] = 1L # Increment by 1 after each split.
new_idxs = dup_idxs + cumsum(incs)
new_coords$linestring_id = new_idxs
# Rebuild lines.
new_lines = sfc_linestring(new_coords, linestring_id = "linestring_id")
st_crs(new_lines) = st_crs(x)
# Restore original line attributes.
# Duplicate attributes within splitted lines.
if (drop_attrs) {
x_new = st_as_sf(new_lines)
} else {
orig_idxs = dup_idxs[!duplicated(new_idxs)]
x_new = x[orig_idxs, ]
st_geometry(x_new) = new_lines
}
x_new
}
iow = osmextract::oe_get("Isle of Wight", quiet = TRUE)
#> Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
#> that
nrow(iow)
#> [1] 45421
system.time({iow_breakup_1 = rnet_breakup_vertices(iow)})
#> Splitting the input object at the internal points that are also boundaries.
#> Splitting rnet object at the duplicated internal points.
#> user system elapsed
#> 3.481 0.052 2.705
system.time({iow_breakup_2 = subdivide_lines_v1(iow)})
#> user system elapsed
#> 2.066 0.000 2.066
iow_breakup_1
#> Simple feature collection with 80955 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -5.401978 ymin: 43.35489 xmax: -0.175775 ymax: 50.89601
#> CRS: 4326
#> First 10 features:
#> osm_id name highway waterway aerialway barrier man_made z_order
#> 1 413 Lane End residential <NA> <NA> <NA> <NA> 3
#> 2 413 Lane End residential <NA> <NA> <NA> <NA> 3
#> 3 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 4 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 5 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 6 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 7 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 8 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 9 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> 10 415 Foreland Road secondary <NA> <NA> <NA> <NA> 6
#> other_tags
#> 1 "lit"=>"yes","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph"
#> 2 "lit"=>"yes","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph"
#> 3 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 4 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 5 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 6 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 7 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 8 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 9 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 10 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> geometry
#> 1 LINESTRING (-1.083348 50.68...
#> 2 LINESTRING (-1.083194 50.68...
#> 3 LINESTRING (-1.083348 50.68...
#> 4 LINESTRING (-1.084021 50.68...
#> 5 LINESTRING (-1.084318 50.68...
#> 6 LINESTRING (-1.084632 50.68...
#> 7 LINESTRING (-1.084889 50.68...
#> 8 LINESTRING (-1.085483 50.68...
#> 9 LINESTRING (-1.086317 50.68...
#> 10 LINESTRING (-1.086771 50.68...
iow_breakup_2
#> Simple feature collection with 80955 features and 9 fields
#> geometry type: LINESTRING
#> dimension: XY
#> bbox: xmin: -5.401978 ymin: 43.35489 xmax: -0.175775 ymax: 50.89601
#> CRS: 4326
#> First 10 features:
#> osm_id name highway waterway aerialway barrier man_made
#> 1 413 Lane End residential <NA> <NA> <NA> <NA>
#> 1.1 413 Lane End residential <NA> <NA> <NA> <NA>
#> 2 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.1 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.2 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.3 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.4 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.5 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.6 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> 2.7 415 Foreland Road secondary <NA> <NA> <NA> <NA>
#> z_order
#> 1 3
#> 1.1 3
#> 2 6
#> 2.1 6
#> 2.2 6
#> 2.3 6
#> 2.4 6
#> 2.5 6
#> 2.6 6
#> 2.7 6
#> other_tags
#> 1 "lit"=>"yes","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph"
#> 1.1 "lit"=>"yes","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph"
#> 2 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.1 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.2 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.3 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.4 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.5 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.6 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> 2.7 "lit"=>"yes","ref"=>"B3395","lanes"=>"2","surface"=>"asphalt","maxspeed"=>"30 mph","sidewalk"=>"both"
#> geometry
#> 1 LINESTRING (-1.083348 50.68...
#> 1.1 LINESTRING (-1.083194 50.68...
#> 2 LINESTRING (-1.083348 50.68...
#> 2.1 LINESTRING (-1.084021 50.68...
#> 2.2 LINESTRING (-1.084318 50.68...
#> 2.3 LINESTRING (-1.084632 50.68...
#> 2.4 LINESTRING (-1.084889 50.68...
#> 2.5 LINESTRING (-1.085483 50.68...
#> 2.6 LINESTRING (-1.086317 50.68...
#> 2.7 LINESTRING (-1.086771 50.68... Created on 2020-12-03 by the reprex package (v0.3.0) |
Very interesting @luukvdmeer, seems they are pretty close. Both more than fast enough to close this issue. Reduced From what I can tell your implementation not only has fewer dependencies but is 30% faster. 2.705 / 2.066
#> [1] 1.309293 Created on 2020-12-03 by the reprex package (v0.3.0) |
Hi everyone! I finally read all the messages and I think that the conclusion is that we should prefer @luukvdmeer's approach. IMO the code is quite clear (even if I think there should be more comments in the final version of the function). Thanks for helping us here. I tested both implementations using a variety of road networks (namely Isle of Wight with approx 45k edges, West Yorkshire with approximately 180k edges and Greater London with approximately 500k edges) and, in all cases, @luukvdmeer I think that you could use the same function to create a new morpher for luukvdmeer/sfnetworks#73 and refactor
Hi @mpadge! I agree with the suggestion. IMO that's not a priority (see also the next comment) but I still think that it's a good idea and I would also like to read (and maybe understand) the C++ implementation.
Thank you very much for pointing it out! Unfortunately, my experience with these problems is limited to FAQ 7.31. Looking at the code behind btw: I wouldn't bother rewriting the whole function in C++ since it runs in less than 60 seconds considering a network of 500,000 edges. I think that's all. Feel free to add any question. |
* First ideas for #416 * fix problem in the interaction between route() and data.table created after setting .datatable.aware = TRUE * Second round * Improve rnet_breakup * fix problem with CRS in rnet_breakup
Thank you very much to all the people that joined the discussion. Looking forward to the next steps. |
I think that the solution for the complex example showed in #412 and luukvdmeer/sfnetworks#67 is using the function
stplanr::rnet_breakup_vertices
before creating thesfnetwork
object. Thestructural_properties c:4597 :Couldn't reach some vertices
error is the same error as per the "roundabout" example and the underlying reason is that the algorithm behindsfnetworks
andstplanr
could fail with OSM data (@Robinlovelace Section 4 of the RJ paper).Unfortunately, at the moment, the ideas are good but the code is way too slow and I need to rewrite it as soon as possible.
The text was updated successfully, but these errors were encountered: