I’m working with an sf file containing a road network, where each row represents a road segment. My goal is to create a fully connected network, so I tried building the network using the sf_network and igraph packages.
After filtering and transforming the data, I created the network using the following code:
strade_net <- strade_snapped %>%
as_sfnetwork() %>%
activate("edges") %>%
arrange(edge_length()) %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
activate("edges") %>%
mutate(weight = edge_length()) %>%
activate("nodes") %>%
mutate(bc = centrality_betweenness(weights = weight, directed = FALSE)) %>%
mutate(id = as.character(seq_along(.)))
nodes_strade <- sf::st_as_sf(strade_net, "nodes")
edges_strade <- sf::st_as_sf(strade_net, "edges")
nodes <- st_transform(nodes_strade, crs = 32632)
edges <- st_transform(edges_strade, crs = 32632)
# Create the graph
graph <- graph_from_data_frame(d = edges, vertices = nodes$id, directed = FALSE)
However, the resulting igraph
network is divided into multiple disconnected components, whereas I need a single, fully connected component. Hers'an example with the first 50 comp:
print(components(graph)$no)
[1] 3749
print(components(graph)$csize[1:50])
[1] 2 5 5 4 41 7 20 92 2 18 245 11 4 6 308 4 19 2 2 2 4 2 2 2 2
[26] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 101 5 2 2 2
print(components(graph)$membership[1:50])
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 8 8 9 9 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17
36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
17 18 18 19 19 20 20 21 21 22 22 23 23 24 24
I attempted to connect the components by identifying the closest point on component "y" for each node in component "x," applying a 10 m threshold. To improve efficiency and avoid running the computation over the entire dataset, I used a spatial grid.
Specifically, for each grid cell, I identify all components present within it. Then, for each pair of components in the same cell, I calculate the distance between every node in component "x" and the geometry of component "y" to determine the closest connection point.
The following approach iterates over a 2.5 km x 2.5 km grid and attempts to create a list containing the corresponding node-point to connect between different components:
# Create a grid with 2500m x 2500m cells
grid <- st_make_grid(nodes, cellsize = c(2500, 2500))
# Set max connection distance (10m)
max_distance <- set_units(10, "m")
# Identify connected components
components <- components(graph)
# Extract node geometries
node_geometries <- st_geometry(nodes)
# Create LINESTRING geometries for each component
component_geometries <- lapply(unique(components$membership), function(comp) {
nodes_in_component <- which(components$membership == comp)
st_union(node_geometries[nodes_in_component])
})
edges_to_add <- list()
checked_pairs <- list()
# Iterate over each grid cell
for (g in seq_along(grid)) {
inside_nodes <- st_within(node_geometries, grid[g], sparse = FALSE)
components_in_grid <- unique(components$membership[inside_nodes])
if (length(components_in_grid) < 2) next
cut_geometries <- lapply(components_in_grid, function(comp) {
st_intersection(component_geometries[[comp]], grid[g])
})
names(cut_geometries) <- components_in_grid
component_pairs <- combn(components_in_grid, 2, simplify = FALSE)
for (pair in component_pairs) {
comp1 <- pair[1]
comp2 <- pair[2]
pair_key <- paste(sort(c(comp1, comp2)), collapse = "-")
if (pair_key %in% checked_pairs) next
checked_pairs <- append(checked_pairs, pair_key)
geom1_cut <- cut_geometries[[as.character(comp1)]]
geom2_cut <- cut_geometries[[as.character(comp2)]]
nodes_i <- which(components$membership == comp1 & inside_nodes)
nodes_j <- which(components$membership == comp2 & inside_nodes)
dist_i_to_geom2 <- st_distance(node_geometries[nodes_i], geom2_cut)
dist_j_to_geom1 <- st_distance(node_geometries[nodes_j], geom1_cut)
min_i <- which.min(dist_i_to_geom2)
min_j <- which.min(dist_j_to_geom1)
min_dist_i <- dist_i_to_geom2[min_i]
min_dist_j <- dist_j_to_geom1[min_j]
if (min_dist_i <= min_dist_j & min_dist_i <= max_distance) {
closest_point_on_geom2 <- st_nearest_points(node_geometries[nodes_i[min_i]], geom2_cut)
new_coord <- st_coordinates(closest_point_on_geom2)[2, ]
edges_to_add <- append(edges_to_add, list(list(
from = nodes_i[min_i], to = new_coord, distance = as.numeric(min_dist_i)
)))
} else if (min_dist_j <= max_distance) {
closest_point_on_geom1 <- st_nearest_points(node_geometries[nodes_j[min_j]], geom1_cut)
new_coord <- st_coordinates(closest_point_on_geom1)[2, ]
edges_to_add <- append(edges_to_add, list(list(
from = nodes_j[min_j], to = new_coord, distance = as.numeric(min_dist_j)
)))
}
}
}
However, this approach is extremely slow, and it does not even include the creation of the edges—it only generates a list of nodes and the coordinates of the nearest points where new edges should be placed to connect different components.
Question:
Is there a more efficient way to achieve a fully connected network using sf_network
and igraph
? Specifically, is there a faster method for identifying and linking the closest components in a spatially explicit way?