
javascript - Mongodb aggregation vs client side processing - Stack Overflow


I have a blogs collection which has almost the following schema:

{ 
    title: { name: "My First Blog Post",
             postDate: "01-28-11" },
    content: "Here is my super long post ...",
    comments: [ { text: "This post sucks!"
                , name: "seanhess"
                , created: 01-28-14}
              , { text: "I know! I wish it were longer"
                , name: "bob"
                , postDate: 01-28-11} 
              ] 
}

I mainly want to run three queries:

  1. Give me all the comments made by bob only
  2. Find all the comments made on the same day the post was written, i.e. where comments.postDate = title.postDate.
  3. Find all the comments made by bob on the same day the post was written

My questions are as follows:

  • These three are going to be really frequent queries, so is it a good idea to use the aggregation framework?
  • For the third query, I can simply run a query like db.blogs.find({"comments.name":"bob"}, {"comments.name":1, "comments.postDate":1, "title.postDate":1}) and then post-process on the client side by looping through the returned results. Is that a good idea? Note that this might return several thousand documents.
  • I would be happy if you could propose some ways to write the third query.

asked Mar 21, 2014 at 9:54 by anvarik; edited Jun 26, 2017 at 11:33 by Neil Lunn

1 Answer


It is probably best practice here to break your multiple questions up into separate questions, if only because the answer to one might have led you to understand the others.

I am also not very keen on answering anything where no example is shown of what you have tried to do. But with that said, and at the risk of "shooting myself in the foot", the questions are reasonable from a design standpoint, so I will answer.

Point 1: Comments by "bob"

The standard approach: $unwind the array and filter the results. Use $match first so you don't process unneeded documents.

db.collection.aggregate([

    // Match to "narrow down" the documents.
    { "$match": { "comments.name": "bob" }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Match and "filter" just the "bob" comments
    { "$match": { "comments.name": "bob" }},

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}
])
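
As an aside that is not part of the original answer: on MongoDB 3.2 and later, the `$filter` array operator can trim the array in place, which avoids the `$unwind` and `$group` round trip. The following is only a sketch under the assumption that the array field is named `comments`; `bobOnlyPipeline` is a hypothetical variable name.

```javascript
// Assumed: the array field is named "comments" and the server is MongoDB 3.2+.
const bobOnlyPipeline = [
  // Still narrow the documents first so an index on "comments.name" can help.
  { $match: { "comments.name": "bob" } },

  // Keep title and content, and replace the array with only bob's entries.
  { $project: {
      title: 1,
      content: 1,
      comments: {
        $filter: {
          input: "$comments",
          as: "comment",
          cond: { $eq: ["$$comment.name", "bob"] }
        }
      }
  }}
];

// In the mongo shell this would be run as: db.collection.aggregate(bobOnlyPipeline)
console.log(JSON.stringify(bobOnlyPipeline, null, 2));
```

The `$unwind` form shown above remains the one that works on older servers.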

Point 2: All comments on the same day

db.collection.aggregate([

    // Try and match posts within a date or range
    // { "$match": { "title.postDate": new Date( /* something */ ) }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Aha! Project out the same day. Not the time-stamp.
    { "$project": {
        "title": 1,
        "content": 1,
        "comments": 1,
        "same": { "$eq": [
            {
                "year"   : { "$year":  "$title.postDate" },
                "month"  : { "$month": "$title.postDate" },
                "day": { "$dayOfMonth": "$title.postDate" }
            },
            {
                "year"   : { "$year": "$comments.postDate" },
                "month"  : { "$month": "$comments.postDate" },
                "day": { "$dayOfMonth": "$comments.postDate" }
            }
        ]}
    }},

    // Match the things on the "same" field
    { "$match": { "same": true } },

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}

])

Point 3: "bob" on the same date

db.collection.aggregate([

    // Try and match posts within a date or range
    // { "$match": { "title.postDate": new Date( /* something */ ) }},

    // Unwind the array
    { "$unwind": "$comments" },

    // Aha! Project out the same day. Not the time-stamp.
    { "$project": {
        "title": 1,
        "content": 1,
        "comments": 1,
        "same": { "$eq": [
            {
                "year"   : { "$year":  "$title.postDate" },
                "month"  : { "$month": "$title.postDate" },
                "day": { "$dayOfMonth": "$title.postDate" }
            },
            {
                "year"   : { "$year": "$comments.postDate" },
                "month"  : { "$month": "$comments.postDate" },
                "day": { "$dayOfMonth": "$comments.postDate" }
            }
        ]}
    }},

    // Match the things on the "same" field
    { "$match": { "same": true, "comments.name": "bob" } },

    // Possibly wind back the array
    { "$group": {
       "_id": "$_id",
       "title": { "$first": "$title" },
       "content": { "$first": "$content" },
       "comments": { "$push": "$comments" }
    }}

])

Results

Honestly, and especially if you have an index feeding the initial $match stages of these operations, it should be very clear that this will "run rings" around trying to iterate the results in code.

At the very least this reduces the returned records "over the wire", so there is less network traffic. And of course there is less (or nothing) to post process once the query results have been received.
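
For comparison, the client-side post-processing proposed in the question would look roughly like the sketch below, in plain JavaScript. It is an illustration only: it assumes the array field is named `comments`, that every date is stored as a real `Date`, and that `docs` is the array of documents the driver returned. The helper names `sameDay` and `commentsOnPostDay` are hypothetical.

```javascript
// Hypothetical helper: true when two Date values fall on the same UTC calendar day.
function sameDay(a, b) {
  return a.getUTCFullYear() === b.getUTCFullYear() &&
         a.getUTCMonth() === b.getUTCMonth() &&
         a.getUTCDate() === b.getUTCDate();
}

// Hypothetical helper: keep only the comments by `name` made on the post's day,
// and drop documents that end up with no comments left.
function commentsOnPostDay(docs, name) {
  return docs
    .map(doc => ({
      ...doc,
      comments: doc.comments.filter(
        c => c.name === name && sameDay(c.postDate, doc.title.postDate)
      )
    }))
    .filter(doc => doc.comments.length > 0);
}

// Sample data shaped like the question's schema (Date values are an assumption).
const docs = [{
  _id: 1,
  title: { name: "My First Blog Post", postDate: new Date("2011-01-28T09:00:00Z") },
  content: "Here is my super long post ...",
  comments: [
    { text: "This post sucks!", name: "seanhess", postDate: new Date("2014-01-28T10:00:00Z") },
    { text: "I know! I wish it were longer", name: "bob", postDate: new Date("2011-01-28T17:30:00Z") }
  ]
}];

console.log(commentsOnPostDay(docs, "bob"));
```

Every document matching the initial query, including all of its non-matching comments, has to cross the network before this loop can discard anything.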

As a general rule, database server hardware tends to be rated an order of magnitude higher in performance than "application server" hardware. So again, the general expectation is that anything executed on the server will run faster.

  • Is aggregation the right thing: "Yes", and by a long, long way. You even get a cursor very soon.

  • How can you do the queries you want: Shown to be pretty simple. And in real-world code we never "hard code" this; we build it dynamically. So adding conditions and attributes should be as simple as all your normal data manipulation code.
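
To illustrate the point about building the pipeline dynamically, here is a minimal sketch of a builder that assembles the stages shown in Points 1 to 3 from optional filters. The function names `dayParts` and `buildCommentPipeline` and the option names `name` and `sameDay` are hypothetical, and the array field is assumed to be `comments`. Unlike Point 3 above, the sketch also adds the initial narrowing `$match` whenever a name is given.

```javascript
// Hypothetical helper: the year/month/day expression used for the "same day" test.
function dayParts(field) {
  return {
    year:  { $year: field },
    month: { $month: field },
    day:   { $dayOfMonth: field }
  };
}

// Hypothetical builder: assemble the pipeline stages from optional filters.
function buildCommentPipeline({ name, sameDay } = {}) {
  const pipeline = [];

  // Narrow the documents first so unneeded ones are never unwound.
  if (name) pipeline.push({ $match: { "comments.name": name } });

  pipeline.push({ $unwind: "$comments" });

  // Only compute the "same" flag when the caller asks for it.
  if (sameDay) {
    pipeline.push({ $project: {
      title: 1,
      content: 1,
      comments: 1,
      same: { $eq: [dayParts("$title.postDate"), dayParts("$comments.postDate")] }
    }});
  }

  // Filter the unwound comments on whichever conditions were requested.
  const filter = {};
  if (name) filter["comments.name"] = name;
  if (sameDay) filter.same = true;
  if (Object.keys(filter).length > 0) pipeline.push({ $match: filter });

  // Wind the array back up into one document per post.
  pipeline.push({ $group: {
    _id: "$_id",
    title: { $first: "$title" },
    content: { $first: "$content" },
    comments: { $push: "$comments" }
  }});

  return pipeline;
}

// In the mongo shell the result would be passed to db.collection.aggregate(...)
console.log(JSON.stringify(buildCommentPipeline({ name: "bob", sameDay: true }), null, 2));
```

Calling it with only `name` reproduces the shape of Point 1, and with only `sameDay` the shape of Point 2.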

So I would not normally answer this style of question. But say thank you! Please?
