
performance - MongoDB Aggregation Optimization for Fetching Hierarchical User Accounts with Low Latency - Stack Overflow


We are using MongoDB to manage a user hierarchy where each user can have multiple child users (recursively). Each user also has multiple bank accounts. Our goal is to efficiently fetch all accounts of a given user along with the accounts of all its descendant users.

However, our current aggregation query is slow (~1 minute latency), especially when dealing with millions of users. We are looking for an optimized approach to improve query performance.


Data Model and Example Data

1. User Hierarchy (Closure Table)

We store user relationships in a Closure Table, where each document represents an ancestor-descendant relationship.

ancestorUserUid   descendantUserUid   depth
A                 A                   0
A                 B                   1
A                 C                   1
A                 D                   2
A                 E                   2
A                 F                   2
A                 G                   2
B                 B                   0
B                 D                   1
B                 E                   1
C                 C                   0
C                 F                   1
C                 G                   1
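
All of the queries below filter the closure table on ancestorUserUid. As a sketch (the index shape is an assumption, not stated in the original post), a compound index like the following would let the descendant lookup be served from the index alone:

// compound index on the closure table (userHierarchy)
db.userHierarchy.createIndex({ ancestorUserUid: 1, descendantUserUid: 1 })
// with a projection of { descendantUserUid: 1, _id: 0 }, the Step 1 query below
// can be covered entirely by this index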

Example Hierarchy (Tree Representation)

User A  
 ├── User B  
 │   ├── User D  
 │   ├── User E  
 ├── User C  
     ├── User F  
     ├── User G  
  • A is the top-level user.
  • B and C are direct children of A.
  • D, E, F, G are descendants of A at different depths.

2. Accounts Collection

Each user has multiple bank accounts, stored in the accounts collection.

accountId   userUid   accountNumber   balance   status
ACC001      A         12345           1000      ACTIVE
ACC002      B         67890           2000      ACTIVE
ACC003      D         11111           1500      ACTIVE
ACC004      E         22222           2500      INACTIVE
ACC005      C         33333           3000      ACTIVE
ACC006      F         44444           4000      ACTIVE
ACC007      G         55555           5000      ACTIVE
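
The account queries below filter on userUid (and sometimes status), so an index along these lines is assumed (a sketch; the exact index shape is not stated in the original post):

db.accounts.createIndex({ userUid: 1, status: 1 })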

Initial Approach (Direct Query + $in Clause)

  1. Step 1: Query the userHierarchy table to get all descendant user IDs of a given user.

    { "ancestorUserUid": "A" }
    

    Returns: ["A", "B", "C", "D", "E", "F", "G"]

  2. Step 2: Query the accounts collection using $in to fetch all accounts linked to those users.

    { "userUid": { "$in": ["A", "B", "C", "D", "E", "F", "G"] } }
    

Issues:

  • Slow performance when user has thousands of child users.
  • Production MongoDB limits the number of documents that can be fetched in one go.
  • $in with a large list performs poorly.
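
For reference, the two-step approach described above amounts to something like the following mongosh sketch (collection names as above):

// Step 1: collect every descendant uid of user "A" from the closure table
const descendantUids = db.userHierarchy
    .find({ ancestorUserUid: "A" }, { _id: 0, descendantUserUid: 1 })
    .toArray()
    .map(doc => doc.descendantUserUid);

// Step 2: fetch all accounts owned by those users in a single $in query
const accounts = db.accounts
    .find({ userUid: { $in: descendantUids } })
    .toArray();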

Second Approach: Aggregation with $lookup

We attempted to use aggregation with $lookup to fetch accounts directly.

Aggregation Pipeline Used

{
  "aggregate": "userHierarchy",
  "pipeline": [
    { "$match": { "ancestorUserUid": { "$in": ["A"] } } },
    { "$project": { "descendantUserUid": 1 } },
    { "$lookup": {
        "from": "accounts",
        "localField": "descendantUserUid",
        "foreignField": "userUid",
        "as": "accounts"
    }},
    { "$unwind": "$accounts" },
    { "$replaceRoot": { "newRoot": "$accounts" } },
    { "$match": { "status": { "$in": ["ACTIVE"] } } },
    { "$skip": 0 },
    { "$limit": 100 }
  ]
}

Performance Issues:

  • $lookup is extremely slow when the user has thousands of descendants.
  • MongoDB does not efficiently optimize the join operation, despite indexes.
  • Latency: ~1 minute (unacceptable for production).

Expected Outcome

We need an efficient way to fetch all accounts of a user and their descendants, ensuring:

  • Low latency (ideally under a few seconds).
  • Scalability for millions of users.
  • Efficient MongoDB query execution.

What is the best approach to optimize this query further?

asked Feb 21 at 8:51 by Sourabh Banka; edited Feb 21 at 16:39 by James Z
  • Just a minor note: better to use the term "parent" rather than "ancestor"; "ancestor" means all parents, grandparents, ... up to the root – Wernfried Domscheit Commented Feb 21 at 9:22
  • @WernfriedDomscheit thanks for the suggestion. can you also help me with the mentioned problem? – Sourabh Banka Commented Feb 21 at 10:23

2 Answers


I would suggest a data model like this:

[
    { accountId: "ACC001", userUid: "A", accountNumber: 12345, children: ["B", "C"] },
    { accountId: "ACC002", userUid: "B", accountNumber: 67890, children: ["D", "E"] },
    { accountId: "ACC003", userUid: "D", accountNumber: 11111 },
    { accountId: "ACC004", userUid: "E", accountNumber: 22222 },
    { accountId: "ACC005", userUid: "C", accountNumber: 33333, children: ["F", "G"] },
    { accountId: "ACC006", userUid: "F", accountNumber: 44444 },
    { accountId: "ACC007", userUid: "G", accountNumber: 55555 }
]

I.e. just one collection.

Then it should be possible to run a recursive query with $graphLookup, for example:

db.collection.aggregate([
   { $match: { userUid: 'A' } },
   {
      $graphLookup: {
         from: 'collection',
         startWith: "$userUid",
         connectFromField: "children",
         connectToField: "userUid",
         as: "accountHierarchy",
         depthField: 'depth',
         restrictSearchWithMatch: { status: "ACTIVE" }
      }
   }
])
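
$graphLookup's recursive matches are made against the connectToField, so an index on that field (userUid here) is typically what keeps the traversal fast. A minimal sketch, keeping the collection name from the snippet above:

db.collection.createIndex({ userUid: 1 })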

Have you considered running parallel queries by breaking the array passed to $in on userUid into smaller chunks?

Modifying your Initial Approach to something like:

const userUids = ["A", "B", ...];

const promises = [];

const chunk = 100;
for (let i = 0; i < userUids.length; i += chunk) {

   const ids = userUids.slice(i, i + chunk);
   const promise = db.accounts.find({ 
      userUid: { $in: ids } 
   }).toArray();

   promises.push(promise)
}
const results = await Promise.all(promises)
const docs = [];
for(let r of results){
   docs.push(...r);
}

This will run parallel queries where the number of ids passed to $in is limited to 100. The key is not awaiting each query inside the for loop; instead, each find() is stored in the promise variable and pushed onto an array of promises, which are then awaited together with Promise.all.

Now docs holds all of the documents found.

When running this type of parallel query you should specify the read preference in the driver so that the reads make use of secondary replicas. I would adopt secondaryPreferred or nearest in this case.
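
As a sketch of that last point (assuming the Node.js driver; the connection string and database name are placeholders), the read preference can be set on the client or per query:

const { MongoClient, ReadPreference } = require('mongodb');

// Placeholder connection string; readPreference here applies to the whole client
const client = new MongoClient('mongodb://host1,host2,host3/?replicaSet=rs0', {
    readPreference: ReadPreference.SECONDARY_PREFERRED
});

// ...or per operation, so only these chunked reads go to secondaries
const accounts = client.db('mydb').collection('accounts');
const promise = accounts
    .find({ userUid: { $in: ids } }, { readPreference: 'secondaryPreferred' })
    .toArray();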
