
java - MongoDB Timeseries Query Not Using Index And Taking Excessively Long - Stack Overflow


I have a large timeseries collection called "Events" containing millions of documents with the following structure:

{
  "_id": {"$oid": "123456"},
  "class": "...",
  "event_data": {...},
  "ts": {"$date": "2025-01-01T01:00:00.000Z"}
}

There are indexes on the ts field, both ascending and descending. I am trying to run a query from Java that pulls the documents between two timestamps, like so:

ZonedDateTime lowerBound = ...;
ZonedDateTime upperBound = ...;

var query = Query.query(new Criteria().andOperator(
  Criteria.where("ts").gte(lowerBound.toInstant().toEpochMilli()),
  Criteria.where("ts").lt(upperBound.toInstant().toEpochMilli())
));

var result = mongoTemplate.find(query, Events.class);

This query works... eventually, after about 20 minutes, since it scans the entire collection instead of using the indexes. In IntelliJ's debugger I can see the query being formatted as follows:

Query: { "$and" : [{ "ts": { "$gte" : 1733852133000}}, { "ts" : { "$lt": 1733853933000}} ] }

Which I translate into MongoDB shell syntax as:

db.events.find({
  "$and": [
    { "ts": { "$gte": 1733852133000}},
    { "ts": { "$lt": 1733853933000}},
  ]
})

Running this mimics what is happening in the Java code exactly: it technically works, but it scans the entire collection, which I can confirm by running .explain() and seeing only a COLLSCAN stage.

However, if I write the following snippet, the code executes in under a couple of seconds:

db.events.find({
  "$and": [
    { "ts": { "$gte": new Date("2025-01-01T01:00:00Z")}},
    { "ts": { "$lt": new Date("2025-01-02T01:00:00Z")}}
  ]
})

Which also matches up when I run .explain(), since I can see the FETCH and IXSCAN stages.

What is the underlying difference between these two queries? How can I get my Java code to translate into the second version and actually make use of the indexing we have in place?

Some things I have tried:

  1. Using Instant or Date instead of Long as the values; these have the same issue of doing a COLLSCAN.
  2. Adding a hint("ts_1") to force the usage of an index. When running this in the MongoDB console it errors out with "hint provided does not correspond to an existing index", which is completely false since I can see the index when running .getIndexes(). Running it in the Java code has the same issue as before, where it does a COLLSCAN.

Edit 3/21/25

Here are some more details of what is going on. First, just to show that the indexes do exist:

db.events.getIndexes();

// returns
[
  {
    "key": {
      "ts": 1
    },
    "name": "ts_1",
    "v": 2
  },
  {
    "key": {
      "ts": -1
    },
    "name": "ts_-1",
    "v": 2
  }
]

Then the stages from .explain() on the working query (I have stripped out some details):

db.events.find({
    "$and": [
        { "ts": { "$gte": new Date("2024-12-10T17:35:33Z")} },
        { "ts": { "$lt": new Date("2024-12-10T18:05:33Z")} }
    ],
}, {}, {}).explain();

// returns
{
 ...
 "stages": [
      {
        "$cursor": {
          "queryPlanner": {
            "namespace": "database.system.buckets.events",
            "indexFilterSet": false,
            "parsedQuery": {
              "$and": [
                {
                  "_id": {
                    "$lt": {"$oid": "675882ed0000000000000000"}
                  }
                },
                {
                  "_id": {
                    "$gte": {"$oid": "67572a650000000000000000"}
                  }
                },
                {
                  "control.max.ts": {
                    "$_internalExprGte": {"$date": "2024-12-10T17:35:33.000Z"}
                  }
                },
                {
                  "control.min.ts": {
                    "$_internalExprGte": {"$date": "2024-12-09T17:35:33.000Z"}
                  }
                },
                {
                  "control.max.ts": {
                    "$_internalExprLt": {"$date": "2024-12-11T18:05:33.000Z"}
                  }
                },
                {
                  "control.min.ts": {
                    "$_internalExprLt": {"$date": "2024-12-10T18:05:33.000Z"}
                  }
                }
              ]
            },
            
            ...

            "winningPlan": {
              "stage": "FETCH",
              "filter": {
                "$and": [
                  {
                    "_id": {
                      "$lt": {"$oid": "675882ed0000000000000000"}
                    }
                  },
                  {
                    "_id": {
                      "$gte": {"$oid": "67572a650000000000000000"}
                    }
                  }
                ]
              },
              "inputStage": {
                "stage": "IXSCAN",
                "keyPattern": {
                  "control.min.ts": 1,
                  "control.max.ts": 1
                },
                "indexName": "ts_1",
                "isMultiKey": false,
                "multiKeyPaths": {
                  "control.min.ts": [],
                  "control.max.ts": []
                },
                "isUnique": false,
                "isSparse": false,
                "isPartial": false,
                "indexVersion": 2,
                "direction": "forward",
                "indexBounds": {
                  "control.min.ts": ["[new Date(1733765733000), new Date(1733853933000))"],
                  "control.max.ts": ["[new Date(1733852133000), new Date(1733940333000))"]
                }
              }
            }
}

And the stages from .explain() on the non-working query (also with some details stripped out):

db.events.find({
    $and: [
        { ts: { $gte: 1733852133000 } },
        { ts: { $lt: 1733853933000 } }
    ],
}, {}, {}).explain();

// returns
{
...
"stages": [
      {
        "$cursor": {
          "queryPlanner": {
            "namespace": "db.system.buckets.events",
            "indexFilterSet": false,
            "parsedQuery": {
            },
            "queryHash": "5F5FC979",
            "planCacheKey": "5F5FC979",
            "maxIndexedOrSolutionsReached": false,
            "maxIndexedAndSolutionsReached": false,
            "maxScansToExplodeReached": false,
            "winningPlan": {
              "stage": "COLLSCAN",
              "direction": "forward"
            },
            "rejectedPlans": []
          }
        }
      },
      {
        "$_internalUnpackBucket": {
          "exclude": [],
          "timeField": "ts",
          "metaField": "data",
          "bucketMaxSpanSeconds": 86400,
          "assumeNoMixedSchemaData": true,
          "eventFilter": {
            "$and": [
              {
                "ts": {
                  "$gte": 1733852133000
                }
              },
              {
                "ts": {
                  "$lt": 1733853933000
                }
              }
            ]
          }
        }
      }
    ]
}

Finally, here is the Java Query object's value, which is how I derived the above MongoDB query:

Query: { "$and" : [{ "ts" : { "$gte" : 1733852133000}}, { "ts" : { "$lt" : 1733853933000}}]}, Fields: {}, Sort: {}

Even if I use $date instead of the epoch millis for the filters, the results are the same.

From these explain outputs we can see the large difference between the two: the working one with new Date() adds additional filtering on the _id field, which I assume has something to do with how time series collections work, but I am not sure. In addition, the filters get broken down and there is an IXSCAN input stage before the FETCH.
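As a side note on that _id filtering: the first four bytes of an ObjectId encode a creation timestamp in seconds since the Unix epoch, and for time series collections the bucket _id appears to be derived from the bucket's minimum time, so the planner can rewrite a ts range as an _id range (widened by bucketMaxSpanSeconds, 86400 here, on the lower end). A small sketch decoding the hex prefixes from the explain output above:

```java
public class BucketIdBounds {
    public static void main(String[] args) {
        // The first 4 bytes (8 hex chars) of an ObjectId encode seconds since the Unix epoch,
        // which lets the planner translate the ts range into an _id range on the bucket collection.
        long lowerSecs = Long.parseLong("67572a65", 16); // from the $gte _id bound above
        long upperSecs = Long.parseLong("675882ed", 16); // from the $lt _id bound above
        System.out.println(lowerSecs); // 1733765733 = 1733852133 - 86400 (bucketMaxSpanSeconds)
        System.out.println(upperSecs); // 1733853933 = the query's upper bound in seconds
    }
}
```

These match the indexBounds in the winning plan, which also start one bucketMaxSpanSeconds early on control.min.ts.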

I have found another question on the MongoDB Forum that is extremely similar, but with no real answer as to what is going on: link


asked Mar 20 at 22:35 by ben8622; edited Mar 21 at 12:13
  • Interesting. The current (AI generated?) answer is wrong because it misses the fact that your sample document is provided in extended JSON format. Even if the data type was wrong, the database should still be able to generate index bounds with it. Plus you're presumably getting right results... Can you share the full explain() outputs (perhaps linking to a gist or pastebin if needed)? – user20042973 Commented Mar 21 at 1:40
  • Yeah, it did seem a bit AI generated, I just didn't want to call it out. I will attempt to tomorrow. This is company stuff so I have to obscure a ton of relevant information. I did however attempt to add a hint() to the suggested query, which did at least include one index, but there's a ton more going on that I can show in the explain() output if I can get to it. – ben8622 Commented Mar 21 at 3:10
  • The only part of that answer which is correct is to "use Date instead of Long". The generated { "$date": "2025-01-01T01:00:00Z" } is the Extended-JSON notation for Dates, just like in your example doc structure above. So it should have used the index - but as per your discussion comment, it doesn't. :-/ – aneroid Commented Mar 21 at 3:29
  • 1 @aneroid and @user2004973 I have added more details with .explain() output – ben8622 Commented Mar 21 at 12:14

1 Answer

The key here is that "events" is a view, not a collection.

Documents inserted are grouped by the metaField "data", bucketed by the timeField "ts", and the buckets are stored in the "system.buckets.events" collection.

Note that in the explain plan the "ts_1" index is a compound index on the "control.min.ts" and "control.max.ts" fields.

MongoDB query operators are type-sensitive, with a few exceptions for convenience. In an ordinary collection, querying for

{ "ts": { "$gte": 1733852133000}}

would match both the integer 1733852134000 and the date 2024-12-10T17:35:34Z.

However, when you query a timeseries collection that way, it cannot both convert the value to a date and guarantee complete results, so the database opts for correctness, which prevents it from using the control-field index.

To ensure the indexes and buckets can be used, look at the parsedQuery portion of the explain output and adjust the client-side code so the value sent for "ts" is a BSON date.
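A minimal sketch of that client-side change, assuming Spring Data MongoDB's Criteria API as in the question: convert the ZonedDateTime bounds to java.util.Date, which the driver encodes as a BSON date, instead of calling toEpochMilli(), which is encoded as a 64-bit integer. The conversion itself is plain JDK:

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;

public class TsBounds {
    public static void main(String[] args) {
        // Bounds matching the values in the question
        ZonedDateTime lowerBound = ZonedDateTime.of(2024, 12, 10, 17, 35, 33, 0, ZoneOffset.UTC);
        ZonedDateTime upperBound = ZonedDateTime.of(2024, 12, 10, 18, 5, 33, 0, ZoneOffset.UTC);

        // java.util.Date is serialized as a BSON date by the driver,
        // so the planner can rewrite the filter against the bucket bounds
        Date lower = Date.from(lowerBound.toInstant());
        Date upper = Date.from(upperBound.toInstant());

        // These would then be passed to the criteria, e.g.:
        //   Criteria.where("ts").gte(lower)
        //   Criteria.where("ts").lt(upper)
        System.out.println(lower.getTime()); // 1733852133000
        System.out.println(upper.getTime()); // 1733853933000
    }
}
```

The epoch millis are unchanged by the conversion; only the BSON type sent over the wire differs, which is what allows the IXSCAN plan.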
