I have a large table in ClickHouse (MergeTree engine) with a few hundred million rows and around a hundred columns. When I select a few dozen columns based on a complex condition that returns several dozen million rows, everything works fine - I get the results back and can process them in my code.
However, when I added a single "ORDER BY" clause:
SELECT <... my 20 columns ...>
FROM <... my table ...>
WHERE <... my complex filter>
ORDER BY log_name_with_timestamp;
the query fails. The error message is:
Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: (total) memory limit exceeded:
would use 56.36 GiB (attempt to allocate chunk of 6628255 bytes),
current RSS 48.65 GiB, maximum: 56.36 GiB.
OvercommitTracker decision: Query was selected to stop by OvercommitTracker:
While executing BufferingToFileTransform. (MEMORY_LIMIT_EXCEEDED)
I tried setting
max_bytes_before_external_sort = 10000000000; -- e.g., 10 GB
but it didn't help; I'm still getting the error. I also tried adding an index to this field, but that didn't help either:
┌─database─┬─table──┬─name─────────┬─type───┬─type_full─┬─expr─────┬─granularity─┬─data_compressed_bytes─┬─data_uncompressed_bytes─┬─marks─┐
│ default  │ mylogs │ idx_fname_ts │ minmax │ minmax    │ fname_ts │           1 │                236431 │                 2573020 │ 49574 │
└──────────┴────────┴──────────────┴────────┴───────────┴──────────┴─────────────┴───────────────────────┴─────────────────────────┴───────┘
What's wrong, and what am I missing to make the sorting work? Unfortunately, I can't remove the "ORDER BY" clause, but I'm willing to accept a performance cost.
I'm sure I run out of memory during sorting: I can clearly see in the CLI that all rows in the table get processed, and the same query runs successfully without the ORDER BY clause. I suspect the issue is within BufferingToFileTransform, but I don't fully grasp how to fix it.
- What's the version you are using? And if you reduce the number of columns, do you still get the memory-exceeded error? – jsc0218 Commented Feb 10 at 21:53
- @jsc0218 ClickHouse server version 25.2.1. If I reduce the number of columns I request, it also works just fine, even with "ORDER BY". But I do need all the columns. – CuriousD Commented Feb 10 at 23:33
- How about max_memory_usage? max_bytes_before_external_sort should be less than it. Also, what's your allocated RAM? BTW, you can also try setting a lower max_threads. – jsc0218 Commented Feb 11 at 14:47
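To illustrate the settings the comment above mentions, here is a minimal sketch of applying them at query level; the numeric values and the max_threads choice are placeholders, not taken from the question:
SELECT <... my 20 columns ...>
FROM <... my table ...>
WHERE <... my complex filter>
ORDER BY log_name_with_timestamp
SETTINGS
    max_memory_usage = 50000000000,                -- hypothetical per-query cap (~50 GB)
    max_bytes_before_external_sort = 10000000000,  -- spill the sort to disk well below that cap
    max_threads = 4;                               -- fewer threads, less concurrent sort state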
2 Answers
Try to use a Common Table Expression (CTE) to divide your query into two parts:
- Select resulting data
- Sort result
This should reduce memory requirements for this query. Something like this pseudo-code:
WITH result AS (
    SELECT <... my 20 columns ...>
    FROM <... my table ...>
    WHERE <... my complex filter>
)
SELECT * FROM result
ORDER BY log_name_with_timestamp;
Unfortunately, the only viable solution I found is to create a projection ordered by that field. In this case, the query optimizer avoids sorting because the data is already pre-sorted in the immutable parts on disk.
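A minimal sketch of such a projection, assuming the table and column names from the question (the projection name is hypothetical):
ALTER TABLE mylogs
    ADD PROJECTION proj_by_name_ts
    (
        SELECT * ORDER BY log_name_with_timestamp
    );
-- Build the projection for parts that already exist (runs as a background mutation).
ALTER TABLE mylogs MATERIALIZE PROJECTION proj_by_name_ts;
Keep in mind that a SELECT * projection stores a second, re-sorted copy of the data, so it roughly doubles the storage for the table.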