最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

algorithm - For given edge, find all nodes the edge lies on any path from - Stack Overflow

programmeradmin0浏览0评论

I have undirected connected graph with one main node (denoted by M in examples below).

I am trying to find efficient algorithm to following specification:

  • input is set of edges (denoted by thick lines)
  • output for each input edge is set of nodes
  • node belongs to input edge if the edge is element of any path from it to main node

Example 1:

Expected output: {2: [B,C,D,E,F,G,H], 4: [D,E,F]}

Explanation: for all nodes from B to the right the path must go through edge 2. However the nodes A, B and C are not under edge 4 because no path between any of them and M contains edge 4.

Example 2:

Expected output: {1: [A,B,C,D,E,F], 2: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: both of nodes A and B must belong to both of edges 1 and 2 because node M is reachable from each of them (besides the shortest path) also through node C. Path length does not matter. Like in example 1, none of nodes in the cycle belongs to edge 5.

Example 3: (UPDATE - counterexample for @user58697's comment)

Expected output: {1: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: for every node A..F there exists path through edge 1 to M (for example BCAM, FECAM). However none of nodes A, B, C can reach M through edge 5 so they do not belong to it in result map (ACDFECBM is a trail, not path).

Straightforward solution

The dummy algorithm works correctly (I show pseudocode because I have implemented it in Neo4j Traversal framework actually):

Map<Edge,Set<Node>> execute(Set<Edge> input) {
    Map<Edge,Set<Node>> result = new HashMap<>();
    Node m = graph.getMainNode();
    for (Node n : graph.getAllNodes()) {
        for (Path p : depthFirstSearchAllUniquePathsFrom(n, m)) {
            for (Edge e : p.getEdges().filter(input::contains).toList()) {
                resultputeIfAbsent(e, new HashSet<>()).add(n);
            }
        }
    }
    return result;
}

The problem

The algorithm is slow. The traversal of graph of 758 nodes and 133 input edges with many cycles takes 20min on my local computer. Debugging proves there is a lot of similar paths (the depthFirstSearchAllUniquePathsFrom is actually traversal with Uniqueness.NODE_PATH which is very time consuming).

I seek for efficient algorithm which would better utilize information gained during traversal (for comparison - something like if Floyd-Warshall algorithm finds all paths with better complexity than would multiple calls of Dijkstra algorithm between each pair do).

I actually don't want to enumerate paths (which has exponential of factorial complexity). I cannot distinguish whether I am falling into trap of some NP-hard problem or there is just some simple solution I am overlooking.

Notes

  • similar problem - OP seeks for all vertices passed by all paths (while I seek for all edges and "en-masse")

I have undirected connected graph with one main node (denoted by M in examples below).

I am trying to find efficient algorithm to following specification:

  • input is set of edges (denoted by thick lines)
  • output for each input edge is set of nodes
  • node belongs to input edge if the edge is element of any path from it to main node

Example 1:

Expected output: {2: [B,C,D,E,F,G,H], 4: [D,E,F]}

Explanation: for all nodes from B to the right the path must go through edge 2. However the nodes A, B and C are not under edge 4 because no path between any of them and M contains edge 4.

Example 2:

Expected output: {1: [A,B,C,D,E,F], 2: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: both of nodes A and B must belong to both of edges 1 and 2 because node M is reachable from each of them (besides the shortest path) also through node C. Path length does not matter. Like in example 1, none of nodes in the cycle belongs to edge 5.

Example 3: (UPDATE - counterexample for @user58697's comment)

Expected output: {1: [A,B,C,D,E,F], 5: [D,E,F]}

Explanation: for every node A..F there exists path through edge 1 to M (for example BCAM, FECAM). However none of nodes A, B, C can reach M through edge 5 so they do not belong to it in result map (ACDFECBM is a trail, not path).

Straightforward solution

The dummy algorithm works correctly (I show pseudocode because I have implemented it in Neo4j Traversal framework actually):

Map<Edge,Set<Node>> execute(Set<Edge> input) {
    Map<Edge,Set<Node>> result = new HashMap<>();
    Node m = graph.getMainNode();
    for (Node n : graph.getAllNodes()) {
        for (Path p : depthFirstSearchAllUniquePathsFrom(n, m)) {
            for (Edge e : p.getEdges().filter(input::contains).toList()) {
                resultputeIfAbsent(e, new HashSet<>()).add(n);
            }
        }
    }
    return result;
}

The problem

The algorithm is slow. The traversal of graph of 758 nodes and 133 input edges with many cycles takes 20min on my local computer. Debugging proves there is a lot of similar paths (the depthFirstSearchAllUniquePathsFrom is actually traversal with Uniqueness.NODE_PATH which is very time consuming).

I seek for efficient algorithm which would better utilize information gained during traversal (for comparison - something like if Floyd-Warshall algorithm finds all paths with better complexity than would multiple calls of Dijkstra algorithm between each pair do).

I actually don't want to enumerate paths (which has exponential of factorial complexity). I cannot distinguish whether I am falling into trap of some NP-hard problem or there is just some simple solution I am overlooking.

Notes

  • similar problem - OP seeks for all vertices passed by all paths (while I seek for all edges and "en-masse")
Share Improve this question edited 2 hours ago Tomáš Záluský asked 20 hours ago Tomáš ZáluskýTomáš Záluský 12.1k3 gold badges43 silver badges71 bronze badges 2
  • If I understand correctly (that's why it is a comment, and not an answer), you want to (1) remove the edge, and (2) find connected components. If the graph is still connected, all vertices are in the set, otherwise the set is only vertices from the component not containing M. – user58697 Commented 17 hours ago
  • @user58697 Thanks for response, it took me a while to analyze. It looks promising but finally I found a counterexample, see newly added Example 3. If I tested graph just to disconnection on given edge, removing of edge 5 wouldn't break graph connectivity and thus lead to "all vertices are in the set" conclusion, which is incorrect. Perhaps your method would be sufficient if trail (rather than path) was enough. Anyway your comment was very helpful since it made me to rethink problem specification (the background is real task from telco domain - outage analysis). – Tomáš Záluský Commented 1 hour ago
Add a comment  | 

1 Answer 1

Reset to default 0

I don't quite understand the criteria for example 2, but it seems you want all reachable nodes up to but excluding M. But here is an efficient query for my best guess:

UNWIND $edgeIds AS edgeId
CALL (edgeId) {
  MATCH p = SHORTEST 1 (m {id: "M"})--*(x)-[r {id: edgeId}]-(s)
  RETURN m, r, s ORDER BY length(p)
  LIMIT 1
}
RETURN r, COLLECT {
         MATCH (s)-[rel WHERE rel <> r]-*(e WHERE e <> m)
         RETURN DISTINCT e.id ORDER BY e.id
       } AS nodes

To show it works, I made the values "M", 2 etc values of property id of the nodes and relationships for my examples.

Here is the data and results for the first example:

CREATE ({id: "M"})-[:R]->({id: "A"})-[:R {id: 2}]->({id: "B"})-[:R {id: 3}]->
       (n15 {id: "C"})-[:R {id: 4}]->(n3 {id: "D"}),
       ({id: "F"})<-[:R]-(n3)-[:R]->({id: "E"}),
       (n15)-[:R]->({id: "G"})-[:R]->({id: "H"})

:param edgeIds => [2, 4]

+----------------------------------------------------+
| r            | nodes                               |
+----------------------------------------------------+
| [:R {id: 2}] | ["B", "C", "D", "E", "F", "G", "H"] |
| [:R {id: 4}] | ["D", "E", "F"]                     |
+----------------------------------------------------+

And the second example:

CREATE (n10 {id: "B"})<-[:R {id: 2}]-({id: "M"})-[:R {id: 1}]->
       ({id: "A"})-[:R]->(n11 {id: "C"})<-[:R]-(n10),
       (n11)-[:R {id: 5}]->(n12 {id: "D"})-[:R]->({id: "E"}),
       (n12)-[:R]->({id: "F"})

:param edgeIds => [1, 2, 5]

+-----------------------------------------------+
| r            | nodes                          |
+-----------------------------------------------+
| [:R {id: 1}] | ["A", "B", "C", "D", "E", "F"] |
| [:R {id: 2}] | ["A", "B", "C", "D", "E", "F"] |
| [:R {id: 5}] | ["D", "E", "F"]                |
+-----------------------------------------------+

The query inside the COLLECT is planned as BFS so should be fast. The only part that may slow things down is the SHORTEST part.

发布评论

评论列表(0)

  1. 暂无评论