We are testing a very deep tree with ArangoDB, running in Docker from the latest arangodb-instance image.
A tree with 1,000,000 nodes and 1,000,000 edges has been uploaded. It is set up with one child per node. Showing this visually, a portion of the graph looks like this.
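To reproduce a data set of this shape, a minimal sketch along these lines might be used. The key scheme (`n0`, `n1`, ...), the edge direction (child pointing to parent), and the helper names are assumptions for illustration, not the scheme used in the original upload. Only the `node` collection name and the `parameter` attribute are taken from the query in this post.

```python
import json


def chain_documents(n, collection="node"):
    """Build vertex and edge documents for one chain of n nodes.

    Keys ("n0", "n1", ...) and the edge direction (child -> parent) are
    assumptions for illustration, not the original data set's scheme.
    """
    nodes = [{"_key": f"n{i}", "parameter": i} for i in range(n)]
    # Node i (i >= 1) is the only child of node i - 1, so a single chain
    # of n nodes has n - 1 edges.
    edges = [
        {"_from": f"{collection}/n{i}", "_to": f"{collection}/n{i - 1}"}
        for i in range(1, n)
    ]
    return nodes, edges


def write_jsonl(path, docs):
    """Write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            fh.write(json.dumps(doc) + "\n")
```

Calling `chain_documents(1000)` and passing each list to `write_jsonl` gives two line-delimited JSON files, which could then be loaded with an import tool such as arangoimport.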
With this setup, the test query is:
FOR v IN 1..100 INBOUND 'node/1_30000' GRAPH 'Line'
  RETURN [v._key, v.parameter]
With a maximum depth of 100, this is quite fast:
Query Statistics:
Writes Exec   Writes Ign   Doc. Lookups   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
          0            0              0           0          200             100 / 0          0          32768         0.00163
When the maximum depth is increased 100-fold (FOR v IN 1..10000), the query execution time grows by a factor of roughly 190,793 (from 0.00163 s to 310.99316 s).
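As a rough check on those numbers, the sketch below recomputes the slowdown and the growth exponent it would imply if the run time followed a power law in the depth. The power-law form is an assumption, and two measurements are far too few to confirm it, so treat the exponent as an indication only.

```python
import math

t_100 = 0.00163      # exec time [s] at depth 1..100 (from the post)
t_10000 = 310.99316  # exec time [s] at depth 1..10000 (from the post)

depth_ratio = 10000 / 100
time_ratio = t_10000 / t_100

# Assumption: time ~ depth**k, so k = log(time_ratio) / log(depth_ratio).
k = math.log(time_ratio) / math.log(depth_ratio)

print(f"time ratio: {time_ratio:,.0f}x")
print(f"implied exponent k: {k:.2f}")
```

Under that assumption the exponent comes out at about 2.6, which would mean growth that is strongly superlinear in the depth, somewhere between quadratic and cubic.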
Query Statistics:
Writes Exec   Writes Ign   Doc. Lookups   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Peak Mem [b]   Exec Time [s]
          0            0              0           0        20000           0 / 10000          0        1572864       310.99316
Looking at the profile, almost all of the time is spent in the TraversalNode:
2 TraversalNode 10 9 10000 0 310.99188
We are running this exercise to see where a graph database can be used effectively, and it appears that this case is not one of them.
Is a very deep tree simply not a use case ArangoDB is tuned for, or are there non-default settings or query options that should be used for a tree like this? Can any graph database handle this use case without the run time growing so much faster than the depth?