

Mongo aggregation: partitioning values into groups

Question


This is probably a long shot, but:


I'd like to group a set of time-series documents by gaps between dates: sort the documents ascending by date, then start a new partition whenever the interval between the current document and the previous one exceeds some threshold.


I can do this easily after getting the documents, of course; in this example, the original documents get a new partition number field:

// assuming sorted docs
var partition = 0;
var partitioned = docs.map((e, i) => {
  if (i > 0 && e.date - docs[i - 1].date > minInterval) partition++;
  return { date: e.date, partition: partition };
});


But I don't actually need the documents themselves; I just need the first and last dates and the number of docs for each partition. It's just unclear how I would express the partitioning function.


Is this possible with an aggregation? I see a possibly relevant MongoDB ticket that is still open, so I'm guessing not.



Yes, it is possible. To compare multiple documents, you need to put them in one array using $group with null as the _id. Then, to compare values, you need an index, just as in a for loop, which you can generate with the $range operator.


To determine the partitions, you need two $map passes. The first one returns an array of 0 and 1 values, where 1 means that the date starts a new partition.


The second $map merges the dates with their partition indexes. To get the partition index for a date, $sum the subarray ($slice) of zeros and ones up to and including that date's position.
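The two passes can be sketched in plain JavaScript to make the logic easier to follow. This is only an illustration under assumptions: `partitionByGap` and `sampleDates` are hypothetical names, the input is assumed to be already sorted, and the comparison mirrors the `$lt` used in the pipeline (a gap equal to the threshold starts a new partition).

```javascript
// Plain JavaScript mirror of the two $map passes.
// `partitionByGap` is a hypothetical helper, not part of any MongoDB API.
function partitionByGap(sortedDates, minIntervalMs) {
  // Pass 1: flag each date with 1 if it starts a new partition, else 0.
  const flags = sortedDates.map((current, index) => {
    if (index === 0) return 0;
    const prev = sortedDates[index - 1];
    return current - prev < minIntervalMs ? 0 : 1;
  });

  // Pass 2: the partition index is the sum of the flags up to and
  // including the current position (the $sum over $slice step).
  return sortedDates.map((date, index) => ({
    date,
    partition: flags.slice(0, index + 1).reduce((sum, flag) => sum + flag, 0),
  }));
}

// Sample dates taken from the documents in this answer, sorted ascending.
const sampleDates = [
  "2019-04-12T18:30:00.000Z",
  "2019-04-12T20:00:00.000Z",
  "2019-04-12T20:10:00.000Z",
  "2019-04-12T21:00:00.000Z",
  "2019-04-12T21:15:00.000Z",
  "2019-04-12T21:45:00.000Z",
  "2019-04-12T23:00:00.000Z",
].map((s) => new Date(s));

const partitioned = partitionByGap(sampleDates, 20 * 60 * 1000);
console.log(partitioned.map((e) => e.partition)); // [ 0, 1, 1, 2, 2, 3, 4 ]
```

Re-summing a slice for every element is quadratic in the number of dates; it is kept here only because it matches the shape of the aggregation expression.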


db.col.save({ date: ISODate("2019-04-12T21:00:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T21:15:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T21:45:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T23:00:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T20:00:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T18:30:00.000Z") })
db.col.save({ date: ISODate("2019-04-12T20:10:00.000Z") })


For an interval of 20 minutes (1200000 milliseconds), you can run the aggregation below:

db.col.aggregate([
    { $sort: { date: 1 } },
    { $group: { _id: null, dates: { $push: "$date" } } },
    {
        $addFields: {
            partitions: {
                $map: {
                    input: { $range: [ 0, { $size: "$dates" } ] },
                    as: "index",
                    in: {
                        $let: {
                            vars: {
                                current: { $arrayElemAt: [ "$dates", "$$index" ] },
                                prev: { $arrayElemAt: [ "$dates", { $add: [ "$$index", -1 ] } ] }
                            },
                            in: {
                                $cond: [
                                    { $or: [
                                        { $eq: [ "$$index", 0 ] },
                                        { $lt: [ { $subtract: [ "$$current", "$$prev" ] }, 1200000 ] }
                                    ] },
                                    0,
                                    1
                                ]
                            }
                        }
                    }
                }
            }
        }
    },
    {
        $project: {
            datesWithPartitions: {
                $map: {
                    input: { $range: [ 0, { $size: "$dates" } ] },
                    as: "index",
                    in: {
                        date: { $arrayElemAt: [ "$dates", "$$index" ] },
                        partition: { $sum: { $slice: [ "$partitions", { $add: [ "$$index", 1 ] } ] } }
                    }
                }
            }
        }
    }
])


{
    "_id" : null,
    "datesWithPartitions" : [
        { "date" : ISODate("2019-04-12T18:30:00Z"), "partition" : 0 },
        { "date" : ISODate("2019-04-12T20:00:00Z"), "partition" : 1 },
        { "date" : ISODate("2019-04-12T20:10:00Z"), "partition" : 1 },
        { "date" : ISODate("2019-04-12T21:00:00Z"), "partition" : 2 },
        { "date" : ISODate("2019-04-12T21:15:00Z"), "partition" : 2 },
        { "date" : ISODate("2019-04-12T21:45:00Z"), "partition" : 3 },
        { "date" : ISODate("2019-04-12T23:00:00Z"), "partition" : 4 }
    ]
}
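The question asks only for the first date, last date, and document count per partition, which the result above does not yet provide. One possible way to get there is to append stages that unwind the array and group by partition. The sketch below is an assumption, not part of the original answer: `summaryStages` is a hypothetical variable name, and the field paths assume the `datesWithPartitions` array produced by the $project stage.

```javascript
// Hypothetical follow-up stages, to be appended after the $project stage.
// Field paths assume the `datesWithPartitions` array shown in the output.
const summaryStages = [
  // Turn each { date, partition } array element into its own document.
  { $unwind: "$datesWithPartitions" },
  {
    // One output document per partition index.
    $group: {
      _id: "$datesWithPartitions.partition",
      first: { $min: "$datesWithPartitions.date" },
      last: { $max: "$datesWithPartitions.date" },
      count: { $sum: 1 },
    },
  },
  // $group does not guarantee output order, so sort by partition index.
  { $sort: { _id: 1 } },
];
```

With the sample documents this should yield five groups; partition 1, for example, would run from 20:00 to 20:10 with a count of 2. Because every date is first pushed into a single document, the whole approach is also bounded by the 16 MB BSON document size limit.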



