I have a large time series collection called "Events", filled with millions of documents that have the following structure:
{
"_id": {"$oid": "123456"},
"class": "...",
"event_data": {...},
"ts": {"$date": "2025-01-01T01:00:00.000Z"}
}
There are indexes on the ts field, both ascending and descending. I am trying to run a query in Java to pull the documents between two timestamps, like so:
ZonedDateTime lowerBound = ...;
ZonedDateTime upperBound = ...;
var query = Query.query(new Criteria().andOperator(
        Criteria.where("ts").gte(lowerBound.toInstant().toEpochMilli()),
        Criteria.where("ts").lt(upperBound.toInstant().toEpochMilli())
));
var result = mongoTemplate.find(query, Events.class);
This query works... eventually, after about 20 minutes, since it scans the entire collection instead of using the indexes. When I use IntelliJ's debugger I can see the query getting formatted as follows:
Query: { "$and" : [{ "ts": { "$gte" : 1733852133000}}, { "ts" : { "$lt": 1733853933000}} ] }
I translate this into the MongoDB console as follows:
db.events.find({
"$and": [
{ "ts": { "$gte": 1733852133000}},
{ "ts": { "$lt": 1733853933000}},
]
})
Running this mimics exactly what is happening in the Java code: it technically works, but it scans the entire collection, which I can further see if I do a .explain() and only see the COLLSCAN stage.
However, if I write the following snippet, the code executes in under a couple seconds:
db.events.find({
"$and": [
{ "ts": { "$gte": new Date("2025-01-01T01:00:00Z)}},
{ "ts": { "$lt": new Date("2025-01-02T01:00:00Z)}},
]
})
This also matches up when I run .explain(), since I can see the FETCH and IXSCAN stages.
What is the underlying difference between these two queries? How can I get my Java code to translate into the second version and actually make use of the indexing we have in place?
Some things I have tried:
- Using Instant or Date instead of Long as the values; these just have the same issue of doing a COLLSCAN.
- Adding a hint("ts_1") to force the usage of an index (see the sketch after this list). When running this in the MongoDB console it errors out with "hint provided does not correspond to an existing index", which is completely false since I can see the index when running .getIndexes(). Running it in Java code has the same issue as before, where it does a COLLSCAN.
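For reference, a minimal sketch of how that hint can be attached on the Spring Data side, assuming Query.withHint accepts the index name; the class and method names are illustrative, and Events/mongoTemplate are the same as in the snippet above:

import java.time.ZonedDateTime;
import java.util.List;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

class EventQueriesWithHint {
    // Events and mongoTemplate are assumed from the question's context.
    List<Events> findBetween(MongoTemplate mongoTemplate, ZonedDateTime lowerBound, ZonedDateTime upperBound) {
        Query query = Query.query(new Criteria().andOperator(
                        Criteria.where("ts").gte(lowerBound.toInstant().toEpochMilli()),
                        Criteria.where("ts").lt(upperBound.toInstant().toEpochMilli())))
                .withHint("ts_1"); // ask the planner for the ts_1 index (still ends in a COLLSCAN here)
        return mongoTemplate.find(query, Events.class);
    }
}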
Edit 3/21/25
Here are some more details of what is going on. First, just to show that the indexes do exist:
db.events.getIndexes();
// returns
[
{
"key": {
"ts": 1
},
"name": "ts_1",
"v": 2
},
{
"key": {
"ts": -1
},
"name": "ts_-1",
"v": 2
}
]
Then the stages from .explain() on the working query (I have stripped out some details):
db.events.find({
"$and": [
{ "ts": { "$gte": new Date("2024-12-10T17:35:33Z")} },
{ "ts": { "$lt": new Date("2024-12-10T18:05:33Z")} }
],
}, {}, {}).explain();
// returns
{
...
"stages": [
{
"$cursor": {
"queryPlanner": {
"namespace": "database.system.buckets.events",
"indexFilterSet": false,
"parsedQuery": {
"$and": [
{
"_id": {
"$lt": {"$oid": "675882ed0000000000000000"}
}
},
{
"_id": {
"$gte": {"$oid": "67572a650000000000000000"}
}
},
{
"control.max.ts": {
"$_internalExprGte": {"$date": "2024-12-10T17:35:33.000Z"}
}
},
{
"control.min.ts": {
"$_internalExprGte": {"$date": "2024-12-09T17:35:33.000Z"}
}
},
{
"control.max.ts": {
"$_internalExprLt": {"$date": "2024-12-11T18:05:33.000Z"}
}
},
{
"control.min.ts": {
"$_internalExprLt": {"$date": "2024-12-10T18:05:33.000Z"}
}
}
]
},
...
"winningPlan": {
"stage": "FETCH",
"filter": {
"$and": [
{
"_id": {
"$lt": {"$oid": "675882ed0000000000000000"}
}
},
{
"_id": {
"$gte": {"$oid": "67572a650000000000000000"}
}
}
]
},
"inputStage": {
"stage": "IXSCAN",
"keyPattern": {
"control.min.ts": 1,
"control.max.ts": 1
},
"indexName": "ts_1",
"isMultiKey": false,
"multiKeyPaths": {
"control.min.ts": [],
"control.max.ts": []
},
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds": {
"control.min.ts": ["[new Date(1733765733000), new Date(1733853933000))"],
"control.max.ts": ["[new Date(1733852133000), new Date(1733940333000))"]
}
}
}
}
}
}
]
}
And the stages from .explain() on the non-working query (also stripped out some details):
db.events.find({
$and: [
{ ts: { $gte: 1733852133000 } },
{ ts: { $lt: 1733853933000 } }
],
}, {}, {}).explain();
// returns
{
...
"stages": [
{
"$cursor": {
"queryPlanner": {
"namespace": "db.system.buckets.events",
"indexFilterSet": false,
"parsedQuery": {
},
"queryHash": "5F5FC979",
"planCacheKey": "5F5FC979",
"maxIndexedOrSolutionsReached": false,
"maxIndexedAndSolutionsReached": false,
"maxScansToExplodeReached": false,
"winningPlan": {
"stage": "COLLSCAN",
"direction": "forward"
},
"rejectedPlans": []
}
}
},
{
"$_internalUnpackBucket": {
"exclude": [],
"timeField": "ts",
"metaField": "data",
"bucketMaxSpanSeconds": 86400,
"assumeNoMixedSchemaData": true,
"eventFilter": {
"$and": [
{
"ts": {
"$gte": 1733852133000
}
},
{
"ts": {
"$lt": 1733853933000
}
}
]
}
}
}
]
}
Finally, here is the Java Query object's value, which is how I derived the MongoDB query above:
Query: { "$and" : [{ "ts" : { "$gte" : 1733852133000}}, { "ts" : { "$lt" : 1733853933000}}]}, Fields: {}, Sort: {}
Even if I use $date instead of epoch millis for the filter values, the results are the same.
From these explain outputs we can see the big difference between the two: the working one with new Date() adds additional filtering on the _id field, which I assume has something to do with how time series collections work, but I am not sure. In addition, the filters get broken down and there is an IXSCAN input stage before the FETCH.
I have found another question on the MongoDB Forum that is extremely similar, but with no real answer as to what is going on: link
1 Answer

The key here is that "events" is a view, not a collection.
Documents inserted are grouped by the metaField "data", bucketed by the timeField "ts", and the buckets are stored in the "system.buckets.events" collection.
Note that in the explain plan the "ts_1" index is a compound index on the "control.min.ts" and "control.max.ts" fields.
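For context, a collection with those properties would typically be declared along these lines. This is only a sketch using the plain Java driver: the timeField/metaField names come from the $_internalUnpackBucket stage above, the granularity is inferred from bucketMaxSpanSeconds: 86400, and the connection string and database name are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.TimeSeriesGranularity;
import com.mongodb.client.model.TimeSeriesOptions;

public class CreateEventsCollection {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase database = client.getDatabase("database");

            // Creating "events" as a time series collection: the queryable "events"
            // namespace becomes a view, and the buckets live in system.buckets.events.
            TimeSeriesOptions tsOptions = new TimeSeriesOptions("ts")   // timeField
                    .metaField("data")                                  // metaField
                    .granularity(TimeSeriesGranularity.MINUTES);        // corresponds to bucketMaxSpanSeconds 86400

            database.createCollection("events",
                    new CreateCollectionOptions().timeSeriesOptions(tsOptions));
        }
    }
}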
MongoDB query operators are type sensitive, with a few exceptions for convenience. In an ordinary collection, querying for { "ts": { "$gte": 1733852133000 } } would match both the integer 1733852134000 and the date 2024-12-10T17:35:34Z. However, when you query a time series collection like that, it is not able to both convert that value to a date and ensure complete results, so the database opts for correctness, which prevents using the control field index.
To ensure the indexes and buckets can be used, look at the parsed query portion of the explain output and adjust the client-side code so that the value it sends is a BSON date.
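A minimal sketch of that adjustment on the Spring Data side, assuming Spring Data's default converters map java.util.Date (and java.time.Instant) to BSON dates; the class and method names are illustrative, and Events/mongoTemplate come from the question:

import java.time.ZonedDateTime;
import java.util.Date;
import java.util.List;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

class EventQueries {
    // Events and mongoTemplate are assumed from the question's context.
    List<Events> findBetween(MongoTemplate mongoTemplate, ZonedDateTime lowerBound, ZonedDateTime upperBound) {
        // Pass Date values so the bounds are serialized as BSON dates ({"$date": ...})
        // rather than 64-bit integers, matching the type stored in the "ts" field.
        Query query = Query.query(new Criteria().andOperator(
                Criteria.where("ts").gte(Date.from(lowerBound.toInstant())),
                Criteria.where("ts").lt(Date.from(upperBound.toInstant()))
        ));
        return mongoTemplate.find(query, Events.class);
    }
}

With BSON dates in the filter, the planner should be able to rewrite the predicate against control.min.ts / control.max.ts and use the ts_1 index, as in the working explain output above.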