We are using MongoDB to manage a user hierarchy where each user can have multiple child users (recursively). Each user also has multiple bank accounts. Our goal is to efficiently fetch all accounts of a given user along with the accounts of all its descendant users.
However, our current aggregation query is slow (~1 minute latency), especially when dealing with millions of users. We are looking for an optimized approach to improve query performance.
Data Model and Example Data
1. User Hierarchy (Closure Table)
We store user relationships in a Closure Table, where each document represents an ancestor-descendant relationship.
| ancestorUserUid | descendantUserUid | depth |
|---|---|---|
| A | A | 0 |
| A | B | 1 |
| A | C | 1 |
| A | D | 2 |
| A | E | 2 |
| A | F | 2 |
| A | G | 2 |
| B | B | 0 |
| B | D | 1 |
| B | E | 1 |
| C | C | 0 |
| C | F | 1 |
| C | G | 1 |
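For context on how such a closure table stays consistent, here is a minimal plain-JavaScript sketch (the `addUser` helper is illustrative, not from our codebase) of the rows that must be inserted when a new user is attached under a parent: one self-row at depth 0, plus one row per ancestor of the parent with depth incremented by one.

```javascript
// In-memory stand-in for the userHierarchy collection.
const hierarchy = [{ ancestorUserUid: "A", descendantUserUid: "A", depth: 0 }];

// Insert the closure rows for a new user placed under `parentUid`.
function addUser(rows, userUid, parentUid) {
  // Self-row: every user is its own ancestor at depth 0.
  rows.push({ ancestorUserUid: userUid, descendantUserUid: userUid, depth: 0 });
  if (parentUid === null) return;
  // One row per ancestor of the parent (including the parent's own self-row).
  for (const r of rows.filter((r) => r.descendantUserUid === parentUid)) {
    rows.push({
      ancestorUserUid: r.ancestorUserUid,
      descendantUserUid: userUid,
      depth: r.depth + 1,
    });
  }
}

addUser(hierarchy, "B", "A");
addUser(hierarchy, "D", "B");

// Descendants of A — exactly what the { ancestorUserUid: "A" } query returns.
const descendantsOfA = hierarchy
  .filter((r) => r.ancestorUserUid === "A")
  .map((r) => r.descendantUserUid);
console.log(descendantsOfA); // ["A", "B", "D"]
```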
Example Hierarchy (Tree Representation)
```
User A
├── User B
│   ├── User D
│   └── User E
└── User C
    ├── User F
    └── User G
```

`A` is the top-level user. `B` and `C` are direct children of `A`. `D`, `E`, `F`, and `G` are descendants of `A` at different depths.
2. Accounts Collection
Each user has multiple bank accounts, stored in the `accounts` collection.
| accountId | userUid | accountNumber | balance | status |
|---|---|---|---|---|
| ACC001 | A | 12345 | 1000 | ACTIVE |
| ACC002 | B | 67890 | 2000 | ACTIVE |
| ACC003 | D | 11111 | 1500 | ACTIVE |
| ACC004 | E | 22222 | 2500 | INACTIVE |
| ACC005 | C | 33333 | 3000 | ACTIVE |
| ACC006 | F | 44444 | 4000 | ACTIVE |
| ACC007 | G | 55555 | 5000 | ACTIVE |
Initial Approach (Direct Query + `$in` Clause)

Step 1: Query the `userHierarchy` collection to get all descendant user IDs of a given user.

```
{ "ancestorUserUid": "A" }
```

Returns: `["A", "B", "C", "D", "E", "F", "G"]`

Step 2: Query the `accounts` collection using `$in` to fetch all accounts linked to those users.

```
{ "userUid": { "$in": ["A", "B", "C", "D", "E", "F", "G"] } }
```
Issues:

- Slow performance when a user has thousands of child users.
- Production MongoDB limits the number of documents that can be fetched in one go.
- `$in` with a large list performs poorly.
Second Approach: Aggregation with `$lookup`

We attempted to use aggregation with `$lookup` to fetch accounts directly.
Aggregation Pipeline Used
```
{
  "aggregate": "userHierarchy",
  "pipeline": [
    { "$match": { "ancestorUserUid": { "$in": ["A"] } } },
    { "$project": { "descendantUserUid": 1 } },
    { "$lookup": {
        "from": "accounts",
        "localField": "descendantUserUid",
        "foreignField": "userUid",
        "as": "accounts"
    }},
    { "$unwind": "$accounts" },
    { "$replaceRoot": { "newRoot": "$accounts" } },
    { "$match": { "status": { "$in": ["ACTIVE"] } } },
    { "$skip": 0 },
    { "$limit": 100 }
  ]
}
```
Performance Issues:

- `$lookup` is extremely slow when the user has thousands of descendants.
- MongoDB does not efficiently optimize the join operation, despite indexes.
- Latency: ~1 minute (unacceptable for production).
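One incremental change worth measuring (an assumption on our side: it requires MongoDB 5.0+, where `$lookup` accepts `localField`/`foreignField` together with a `pipeline`) is pushing the status filter inside the join, so inactive accounts are never materialized or unwound. A sketch of the reshaped pipeline as a JavaScript value:

```javascript
// Same pipeline with the status filter pushed into $lookup (MongoDB 5.0+
// syntax); a compound index on accounts { userUid: 1, status: 1 } is
// assumed to back the inner match.
const pipeline = [
  { $match: { ancestorUserUid: "A" } },
  { $project: { descendantUserUid: 1 } },
  {
    $lookup: {
      from: "accounts",
      localField: "descendantUserUid",
      foreignField: "userUid",
      pipeline: [{ $match: { status: "ACTIVE" } }], // filter inside the join
      as: "accounts",
    },
  },
  { $unwind: "$accounts" },
  { $replaceRoot: { newRoot: "$accounts" } },
  { $skip: 0 },
  { $limit: 100 },
];
console.log(pipeline.length); // 7
```

Whether this helps must be confirmed with `explain()`; it removes the post-unwind `$match` but does not change the fundamental cost of the join.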
Expected Outcome
We need an efficient way to fetch all accounts of a user and their descendants, ensuring:
- Low latency (ideally under a few seconds).
- Scalability for millions of users.
- Efficient MongoDB query execution.
What is the best approach to optimize this query further?
edited Feb 21 at 16:39 by James Z; asked Feb 21 at 8:51 by Sourabh Banka

Comments:

- Just a minor note: better to use the term "parent" rather than "ancestor" — "ancestor" means all parents, grandparents, ... up to the root. – Wernfried Domscheit
- @WernfriedDomscheit thanks for the suggestion. Can you also help me with the mentioned problem? – Sourabh Banka
2 Answers

I would suggest a data model like this:
```
[
  { accountId: "ACC001", userUid: "A", accountNumber: 12345, children: ["B", "C"] },
  { accountId: "ACC002", userUid: "B", accountNumber: 67890, children: ["D", "E"] },
  { accountId: "ACC003", userUid: "D", accountNumber: 11111 },
  { accountId: "ACC004", userUid: "E", accountNumber: 22222 },
  { accountId: "ACC005", userUid: "C", accountNumber: 33333, children: ["F", "G"] },
  { accountId: "ACC006", userUid: "F", accountNumber: 44444 },
  { accountId: "ACC007", userUid: "G", accountNumber: 55555 }
]
```
I.e. just one collection.
Then it should be possible to run a recursive query with `$graphLookup`, for example like this:

```
db.collection.aggregate([
  { $match: { userUid: "A" } },
  {
    $graphLookup: {
      from: "collection",
      startWith: "$userUid",
      connectFromField: "children",
      connectToField: "userUid",
      as: "accountHierarchy",
      depthField: "depth",
      restrictSearchWithMatch: { status: "ACTIVE" }
    }
  }
])
```
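Conceptually, `$graphLookup` performs a breadth-first traversal here: it starts from the value of `startWith`, matches it against `userUid`, then repeatedly follows `children` edges, recording the depth at which each document is found. A plain-JavaScript sketch of that traversal over the single-collection model above (simplified: no cycle tracking, which the real operator does handle, and assuming an acyclic hierarchy):

```javascript
// In-memory stand-in for the proposed single collection.
const collection = [
  { userUid: "A", children: ["B", "C"] },
  { userUid: "B", children: ["D", "E"] },
  { userUid: "C", children: ["F", "G"] },
  { userUid: "D" }, { userUid: "E" }, { userUid: "F" }, { userUid: "G" },
];

// Breadth-first traversal from rootUid, following children -> userUid edges.
function graphLookup(rootUid) {
  const byUid = new Map(collection.map((d) => [d.userUid, d]));
  const found = [];
  let frontier = [rootUid];
  let depth = 0;
  while (frontier.length > 0) {
    const next = [];
    for (const uid of frontier) {
      const doc = byUid.get(uid);
      found.push({ userUid: doc.userUid, depth });
      next.push(...(doc.children ?? []));
    }
    frontier = next;
    depth++;
  }
  return found;
}

console.log(graphLookup("A").map((d) => d.userUid));
// ["A", "B", "C", "D", "E", "F", "G"]
```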
Have you considered running parallel queries by breaking the array passed to the `$in` on `userUid` down into smaller chunks?

Modifying your Initial Approach to something like:
```
const userUids = ["A", "B", ...];
const promises = [];
const chunk = 100;

for (let i = 0; i < userUids.length; i += chunk) {
  const ids = userUids.slice(i, i + chunk);
  const promise = db.accounts.find({
    userUid: { $in: ids }
  }).toArray();
  promises.push(promise);
}

const results = await Promise.all(promises);

const docs = [];
for (const r of results) {
  docs.push(...r);
}
```
This will run parallel queries where the number of `ids` passed to `$in` is limited to 100. This works because the query is not executed inside the `for` loop; instead it is stored in the variable `promise`, which is pushed onto the `promises` array. Once all of them resolve, `docs` holds all of the documents found.
When running these types of parallel queries you should specify the read preference in the driver so that they make use of secondary read replicas. I would adopt `secondaryPreferred` or `nearest` in this case.
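With the Node.js driver, the read preference can be set per operation via the options object. A minimal sketch (the commented-out usage is illustrative only — `client`, the `bank` database name, and `ids` are assumptions, requiring a connected `MongoClient` against a replica set):

```javascript
// Per-query read preference for the chunked $in queries:
// 'secondaryPreferred' routes the read to a secondary when one is
// available, falling back to the primary otherwise.
const findOptions = { readPreference: "secondaryPreferred" };

// Illustrative usage (not executed here):
// const docs = await client.db("bank").collection("accounts")
//   .find({ userUid: { $in: ids } }, findOptions)
//   .toArray();

console.log(findOptions.readPreference); // secondaryPreferred
```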