
python - Creating an adjacency matrix fast for use in path finding to calculate grid distance from origin - Stack Overflow


Goal

I'm working on a project that requires me to visualize how far a given budget can reach on a map with different price zones.

Example with 3 price zones, water, countryside, and city, each having different costs per meter traveled:

The red point is the origin; the green point is just a sample point and has no bearing on the result. The gradient goes from yellow (cheap) to purple (max budget).

Issue

Runtime is currently really bad, growing very fast as I increase the resolution (i.e. decrease the size in meters of each cell). This is most likely due to the iterative creation of the scipy.sparse.lil_array when building the adjacency matrix graph.

Details

My current process to achieve this follows 6 major steps (times added to show the bottlenecks):

  1. Create a Python list of shapely.geometry.Point objects representing all points of a 2D grid at which we will sample prices from the geopandas geometry. The list is initialized with empty values, then I iteratively set each index (one at a time) to a point whose coordinates depend on the grid cell size. Lastly I convert the Python list into a pandas DataFrame to later intersect with the geopandas regions. (13.6s) (A rough vectorized sketch of steps 1-3 is shown after this list.)

  2. I convert the DataFrame of points into a GeoDataFrame and then use gpd.sjoin against my different price areas to get, in this case, 3 GeoDataFrames, each containing only the points that intersected the given price area. (0.7s)

  3. After mapping each point to its price, I combine everything back into a 2D array whose values are the prices. To do this I use a Python 2D list and replace values by index, converting point coordinates to index coordinates. (14.9s)

  4. We now have a 2D array that contains the cost/weight for each cell. I now want to convert it into a graph (adjacency matrix). To do this I use an iterative process that inserts each edge into a scipy.sparse.lil_array, and once every element has been connected I convert it into a scipy.sparse.csr_array. (136.7s)

  5. I now call scipy.sparse.csgraph.dijkstra and use the distance array created. (0.4s)

  6. Since each node also has a fixed price, I once again iteratively loop through the array adding the fixed cost. (0.2s)
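
For reference, here is a minimal vectorized sketch of what steps 1-3 do, assuming a GeoDataFrame price_areas with a price column (names and CRS handling are illustrative, not my actual code):

import numpy as np
import geopandas as gpd

def sample_price_grid(price_areas: gpd.GeoDataFrame, x0, y0, width, height, cell_size):
    # Build all grid point coordinates at once instead of one Point at a time
    cols, rows = np.meshgrid(np.arange(width), np.arange(height))
    xs = x0 + cols.ravel() * cell_size
    ys = y0 + rows.ravel() * cell_size
    points = gpd.GeoDataFrame(
        {"row": rows.ravel(), "col": cols.ravel()},
        geometry=gpd.points_from_xy(xs, ys),
        crs=price_areas.crs,
    )

    # One spatial join against all price areas instead of one join per area
    joined = gpd.sjoin(points, price_areas[["price", "geometry"]],
                       how="left", predicate="intersects")

    # Scatter the sampled prices back into a 2D cost array;
    # if a point matches several areas, the last match wins (fine for a sketch)
    cost = np.zeros((height, width), dtype=np.float32)
    cost[joined["row"], joined["col"]] = joined["price"].fillna(0)
    return cost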

In addition, for step 4 I should mention that I use more than 4 edges per node, namely a neighborhood of up to ±3 rows/columns centered on the node (ignoring offsets that are just multiples of shorter cardinal or diagonal steps). This introduces a small error margin, since it is possible to avoid the cost of a cell by jumping diagonally over it, but the error should be small enough to ignore. I do this because the result is otherwise too square:

More info regarding step 4:

To create the edges, I iterate through the following list of offsets for each node:

adjecency_offsets= [
    (-3, -2, 3.60),
    (-3, -1, 3.16),
    (-3,  1, 3.16),
    (-3,  2, 3.60),
    (-2, -3, 3.60),
    (-2, -1, 2.24),
    (-2,  1, 2.24), 
    (-2,  3, 3.60), 
    (-1, -3, 3.16), 
    (-1, -2, 2.24), 
    (-1, -1, 1.42), 
    (-1,  0, 1   ), 
    (-1,  1, 1.42), 
    (-1,  2, 2.24), 
    (-1,  3, 3.16), 
    ( 0, -1, 1   ), 
    ( 0,  1, 1   ), 
    ( 1, -3, 3.16), 
    ( 1, -2, 2.24),
    ( 1, -1, 1.42),
    ( 1,  0, 1   ),
    ( 1,  1, 1.42),
    ( 1,  2, 2.24),
    ( 1,  3, 3.16), 
    ( 2, -3, 3.60),
    ( 2, -1, 2.24),
    ( 2,  1, 2.24),
    ( 2,  3, 3.60), 
    ( 3, -2, 3.60),
    ( 3, -1, 3.16),
    ( 3,  1, 3.16),
    ( 3,  2, 3.60)
]

The first integer is the row offset, the 2nd is the column offset, and the 3rd value is the distance multiplier to reach the cell from the node (using approximations instead of sqrt(2), etc.).
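
As an aside, a table like this could also be generated programmatically. Here is a small sketch based on the "skip offsets that are multiples of shorter ones" rule described above (an assumption about how the table was built, not code from the question; it produces exact Euclidean multipliers rather than the rounded values listed):

import math

def make_offsets(radius: int = 3):
    offsets = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue
            # Skip offsets that are integer multiples of a shorter offset,
            # e.g. (2, 2) is just (1, 1) taken twice.
            if math.gcd(abs(dr), abs(dc)) != 1:
                continue
            offsets.append((dr, dc, round(math.hypot(dr, dc), 2)))
    return offsets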

Then, using the following code, we create the graph and populate the adjacency matrix:

import numpy as np
import scipy.sparse

def Step_4(arr: list[list[int]]):
    # arr is a 2d-array of cost for each point (the index)
    height = len(arr)
    width = len(arr[0])
    num_nodes = width * height
    tmp_array = scipy.sparse.lil_array((num_nodes, num_nodes), dtype=np.float32)  # we don't need more precision than this

    # For every row and column in 2d array
    for row in range(height):
        for col in range(width):
            # Create a src vertex
            src = row * width + col

            # and find all edges
            for (r_off, c_off, w_mul) in adjecency_offsets:
                # ignore invalid destinations
                if (row + r_off < 0 or col + c_off < 0):
                    continue
                if (row + r_off >= height or col + c_off >= width):
                    continue

                # valid destination
                dst = (row + r_off) * width + col + c_off

                # Multiply cost of cell with distance to cell
                float_w = w_mul * arr[row + r_off][col + c_off]

                # Set weight in adjacency array
                tmp_array[src, dst] = float_w

    # return the CSR version due to faster pathfinding
    return tmp_array.tocsr()
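
For completeness, steps 5 and 6 then boil down to something like this (a sketch; origin_row, origin_col and fixed_cost are illustrative names, not my actual code):

import scipy.sparse.csgraph

graph = Step_4(arr)

# Step 5: single-source shortest paths from the origin cell
origin_index = origin_row * width + origin_col
dist = scipy.sparse.csgraph.dijkstra(csgraph=graph, directed=True,
                                     indices=origin_index)

# Step 6: add the fixed per-node cost and reshape back to the grid
total_cost = dist.reshape(height, width) + fixed_cost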

Question

Is this a good approach to the problem (even if it produces blocky, unnatural edges because the graph has a limited number of edges per node)?
If so, how can I create my adjacency graph more efficiently (i.e. reduce the time spent in step 4)?

What I have tried

  1. I have tried drawing lines from the origin in a circle and saving how far each line can get before the budget runs out. This gave me poor runtime (probably a coding issue), but it also cannot handle finding a cheaper route by going further through cheaper areas.

  2. Using an adjacency list graph. While this reduced creation time by a lot, it increased pathfinding time by a larger margin (I needed a self-implemented version of Dijkstra based on Wikipedia, using a heap and an early exit once the budget is exceeded; a minimal sketch of that idea follows after this list). In total, creating plus pathfinding was around a 2x slowdown compared to the adjacency matrix method mentioned above.

  3. I have tried using the DOK format instead, but that resulted in a time of 674s instead of 120s, so I'm not sure what's happening there.
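
The Dijkstra variant mentioned in point 2 was essentially the textbook heap-based version with an early exit on the budget; a minimal sketch of that idea (not my exact code) looks like this:

import heapq

def dijkstra_with_budget(adj, source, budget):
    # adj: adjacency list, adj[u] = list of (v, weight) pairs
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > budget:          # early exit: nothing within the budget is left
            break
        if d > dist.get(u, float("inf")):
            continue            # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= budget and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist                 # reachable nodes and their cost within the budget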

Asked Feb 16 at 17:23 by Hampus Toft; edited Feb 17 at 15:22 by Mercury. Comments (5):
  • I do not think Python is great for this use case. Manipulation of Python lists (and, more generally, dynamic data structures) is inherently slow in Python compared to native languages, and you seem to need pretty dynamic data structures. More specifically, one of the bottlenecks is the LIL matrix building, which is basically Python lists of lists internally. It is slow mainly because of that (meanwhile CSR is basically 3 NumPy arrays, and thus native ones). This can be an order of magnitude faster (if not two for non-vectorized code). – Jérôme Richard, Feb 16 at 18:35
  • @JérômeRichard I completely understand that Python is not the most performant language out there, but I feel like this should still be possible. I might be using the wrong method, and I am definitely building the arrays inefficiently; however, I fail to see what I need to change or how to build the array in step 4 in a performant manner. – Hampus Toft, Feb 16 at 18:40
  • Besides, your graph seems to contain a LOT of nodes. Do you really need such high precision? Generally, when performance matters, we tend to use hierarchical approaches. The result is less precise, but this can be much faster. This is often done in pathfinding for games (which also often splits the operation into refining steps so as not to cause any freeze). Here this seems an interesting approach to consider, since the result is actually very "uniform". – Jérôme Richard, Feb 16 at 18:40
  • OK. I think step 4 is not sufficiently detailed for us to really help you. Do you sort the indices as mentioned in the SciPy LIL doc? You could try DOK matrices for this step (rather cheap to test). I expect the iterative process to be the bottleneck, and it is not easy to parallelise (especially in Python). – Jérôme Richard, Feb 16 at 18:46
  • @JérômeRichard Added more info regarding step 4, hopefully it helps. Not sure if they are sorted, but I'm guessing they are not; however, the first edge is strictly left of the next and so on, so edits should go left -> right within a row, and the source ordering should make sure that we edit from top to bottom. – Hampus Toft, Feb 16 at 21:13

1 Answer


The complexity of this code seems fine. The main issue is that your code is pure Python: it is not vectorized (i.e. it does not spend most of its time in fast native functions). As a result, it is very slow, because Python code is generally executed by the CPython interpreter.

The key to fixing this is to make this function native (e.g. using Cython or Numba, or by rewriting it in a language like C/C++/Rust), or to use modules like NumPy (e.g. using broadcasting operations).

In this case, I chose to use Numba (so as to keep the code similar and show how inefficient interpreted CPython code is). Numba is a Python module able to compile specific Python functions (mainly ones operating on NumPy arrays) to native code at runtime (a just-in-time compiler, a.k.a. JIT).

Here is the optimized Numba code:

import numba as nb
import numpy as np
import scipy.sparse

@nb.njit('(int32[:,::1], int32[::1], int32[::1], float32[::1])')
def compute(arr, row_offsets, col_offsets, weight_offsets):
    # arr is a 2d-array of cost for each point (the index)
    height = len(arr)
    width = len(arr[0])
    num_nodes = width * height
    out_rows = np.zeros(num_nodes * weight_offsets.size, dtype=np.int32)
    out_cols = np.zeros(num_nodes * weight_offsets.size, dtype=np.int32)
    out_vals = np.zeros(num_nodes * weight_offsets.size, dtype=np.float32)
    nnz = 0

    for row in range(height):
        for col in range(width):
            src = row * width + col

            for (r_off, c_off, w_mul) in zip(row_offsets, col_offsets, weight_offsets):
                if (row + r_off < 0 or col + c_off < 0) or (row + r_off >= height or col + c_off >= width):
                    continue

                dst = (row + r_off) * width + col + c_off
                float_w = w_mul * arr[row + r_off][col + c_off]

                if float_w != 0:
                    out_rows[nnz] = src
                    out_cols[nnz] = dst
                    out_vals[nnz] = float_w
                    nnz += 1

    return (out_rows, out_cols, out_vals, nnz)

def Step_4_fast(list_of_list):
    arr = np.array(list_of_list, dtype=np.int32)
    row_offsets = np.array([item[0] for item in adjecency_offsets], dtype=np.int32)
    col_offsets = np.array([item[1] for item in adjecency_offsets], dtype=np.int32)
    weight_offsets = np.array([item[2] for item in adjecency_offsets], dtype=np.float32)
    out_rows, out_cols, out_vals, nnz = compute(arr, row_offsets, col_offsets, weight_offsets)
    return scipy.sparse.csr_array((out_vals[:nnz], (out_rows[:nnz], out_cols[:nnz])), shape=(arr.size, arr.size))

The main idea is to build a COO matrix manually (i.e. "ijv triplets") and then build a SciPy CSR matrix from it. The COO arrays take a significant amount of space, but be aware that a CPython list of lists was already pretty inefficient in terms of memory usage: each float object typically takes 24 bytes on mainstream machines and each reference to it takes another 8 bytes, which means 32 bytes per item instead of just 4 (so 8 times more than needed).
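
To illustrate the "ijv triplet" idea on its own (a toy example, independent of the code above):

import numpy as np
import scipy.sparse

# Three nonzero entries given as (row, col, value) triplets
rows = np.array([0, 0, 2], dtype=np.int32)
cols = np.array([1, 2, 2], dtype=np.int32)
vals = np.array([1.5, 2.0, 3.0], dtype=np.float32)

# csr_array accepts the (values, (rows, cols)) COO form directly
m = scipy.sparse.csr_array((vals, (rows, cols)), shape=(3, 3))
print(m.toarray())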

I used 32-bit integers to reduce memory usage and improve performance. You should check whether this is fine in your real-world use case though (i.e. check that there are no overflows). Otherwise, just use 64-bit ones.
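
A cheap sanity check for this (an illustrative assertion, not part of the original code) would be:

assert arr.size < np.iinfo(np.int32).max, "node indices would overflow int32"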


Benchmark

Here is the input for the benchmark:

np.random.seed(42)
# adjecency_offsets is like in the OP question
arr = np.random.randint(1, 100_000, (500, 500)).tolist()

Here are performance results on my i5-9600KF CPU:

Step_4(arr):          16.43 seconds
Step_4_fast(arr):      0.16 seconds

Thus, the Numba code is about 100 times faster!


Notes

One downside of the Numba code is that the function is compiled at runtime, which takes about 0.4 seconds on my machine. If this is a problem, you can use Cython instead. Alternatively, you can cache the compilation (using the cache=True compilation flag) so only the very first run pays the compilation cost.
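
For example, the decorator above would then become (same signature, just with caching enabled):

@nb.njit('(int32[:,::1], int32[::1], int32[::1], float32[::1])', cache=True)
def compute(arr, row_offsets, col_offsets, weight_offsets):
    ...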

Note that only ~30% of the time is spent in the compute function, showing how fast it is. ~60% of the time is spent in the COO to CSR matrix conversion, and ~10% in the list-of-lists to NumPy array conversion. Thus, for better performance, it would be a good idea to use a better data structure than a COO matrix (a kind of CSR one, but with growing lists) and to avoid list-of-lists to NumPy array conversions (by not using lists at all, but NumPy arrays instead). That being said, doing this efficiently in Numba is certainly not easy (it is simpler in native languages).

Another way to make the code faster is to parallelize the operations once the code has been modified to operate directly on CSR-like matrices (arrays of growing lists). However, this is far from easy in Python (even in Cython/Numba). In a native language, each thread can operate on its own block of arr and fill a thread-local CSR-like matrix; the CSR-like matrices then need to be merged, ideally also in parallel (the pretty difficult part). This last implementation should be an order of magnitude faster than the already much faster Numba code above on a mainstream CPU (e.g. with 6~8 cores).
