I'm running Cassandra 5.0.2 on a single node and loading many gigabytes of text data into a table with columns (key bigint, value text) through a Python program.
Anytime I load large amounts of data, eventually I'm unable to query the table being loaded. My cqlsh prints:
cqlsh> select * from article_infos limit 5;
ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures: UNKNOWN from /10.1.1.93:7000" info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1, 'error_code_map': {'10.1.1.93': '0x0000'}}
Whenever I execute such a query, the Cassandra log prints:
ERROR [ReadStage-2] 2025-03-14 10:03:29,385 JVMStabilityInspector.java:70 - Exception in thread Thread[ReadStage-2,5,SharedPool]
java.lang.AssertionError: 7671074 > 7340032
at org.apache.cassandra.io.util.MmappedRegions$State.floor(MmappedRegions.java:363)
at org.apache.cassandra.io.util.MmappedRegions.floor(MmappedRegions.java:242)
at org.apache.cassandra.io.util.MmapRebufferer.rebuffer(MmapRebufferer.java:40)
at org.apache.cassandra.io.tries.Walker.<init>(Walker.java:75)
at org.apache.cassandra.io.tries.ValueIterator.<init>(ValueIterator.java:96)
at org.apache.cassandra.io.tries.ValueIterator.<init>(ValueIterator.java:80)
at org.apache.cassandra.io.sstable.format.bti.PartitionIndex$IndexPosIterator.<init>(PartitionIndex.java:407)
at org.apache.cassandra.io.sstable.format.bti.PartitionIterator.<init>(PartitionIterator.java:113)
at org.apache.cassandra.io.sstable.format.bti.PartitionIterator.create(PartitionIterator.java:75)
at org.apache.cassandra.io.sstable.format.bti.BtiTableReader.coveredKeysIterator(BtiTableReader.java:295)
at org.apache.cassandra.io.sstable.format.bti.BtiTableScanner$BtiScanningIterator.prepareToIterateRow(BtiTableScanner.java:114)
at org.apache.cassandra.io.sstable.format.SSTableScanner$BaseKeyScanningIterator.computeNext(SSTableScanner.java:264)
at org.apache.cassandra.io.sstable.format.SSTableScanner$BaseKeyScanningIterator.computeNext(SSTableScanner.java:244)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.io.sstable.format.SSTableScanner.hasNext(SSTableScanner.java:206)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:90)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:375)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:187)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)
at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$4.hasNext(UnfilteredPartitionIterators.java:264)
at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:90)
at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:334)
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:201)
at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:186)
at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:48)
at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:372)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2210)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2607)
at org.apache.cassandra.concurrent.ExecutionFailure$2.run(ExecutionFailure.java:163)
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:143)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
ERROR [Reference-Reaper] 2025-03-14 10:03:55,267 Ref.java:243 - LEAK DETECTED: a reference (class org.apache.cassandra.io.util.FileHandle$Cleanup@1883489551:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Data.db) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1883489551:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Data.db was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2025-03-14 10:03:55,268 Ref.java:243 - LEAK DETECTED: a reference (class org.apache.cassandra.io.util.FileHandle$Cleanup@34244591:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Partitions.db) to class org.apache.cassandra.io.util.FileHandle$Cleanup@34244591:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Partitions.db was not released before the reference was garbage collected
ERROR [Reference-Reaper] 2025-03-14 10:03:55,268 Ref.java:243 - LEAK DETECTED: a reference (class org.apache.cassandra.io.util.FileHandle$Cleanup@713715721:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Rows.db) to class org.apache.cassandra.io.util.FileHandle$Cleanup@713715721:/scratch/USER/cassandra/data/PROJECT/article_infos-0e9822e0ff5111ef8c7267a1b8a131a2/da-3gok_1b58_3yj682gtxbd0q6ceck-bti-Rows.db was not released before the reference was garbage collected
I'm currently using Kafka with the DataStax connector sink to write the data to Cassandra, but the same problem occurs even when I issue INSERT statements directly, without going through Kafka.
After an undefined period of time, the errors usually go away and I can query the database.
I'm wondering if anyone knows what's going on here. Specifically:
- Whether this error can be avoided
- How to know when the database is in a "good" state after writing tons of data. Even when the above SELECT succeeds, I have seen this problem so many times that it has shaken my faith in the underlying data integrity. I know Cassandra continues to run compaction even after writes have completed, but assertion errors like the above are disconcerting, and I can never be sure the database is actually "ready" to be queried.
1 Answer
The symptoms you described indicate that the node is overloaded and becomes unresponsive. This is why you cannot query it after loading a lot of data.
If your cluster is not sized correctly, loading data at a throughput higher than the disks can sustain is like launching a DDoS attack against your own node. For example, if the commit log disk can only handle 10,000 I/O operations per second (IOPS) but you're inserting data at 15,000 IOPS, the requests will queue up on the disk until it can eventually catch up.
In the meantime, the database is unresponsive while it waits for the disk to catch up. This is why it eventually appears to be operational again, or as you said, the "errors usually go away [after an undefined period of time]".
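The queueing effect can be sketched with back-of-the-envelope arithmetic. The function below is only an illustration of the 10K/15K example above, not a measurement from any real cluster:

```python
def drain_time(ingest_iops: float, disk_iops: float, load_seconds: float) -> float:
    """Seconds the disk needs to clear its backlog after the load stops."""
    # Requests pile up at the rate by which ingest exceeds disk capacity.
    backlog = max(0.0, (ingest_iops - disk_iops) * load_seconds)
    # The disk drains the backlog at its full sustained rate.
    return backlog / disk_iops

# Loading at 15K IOPS for 10 minutes against a 10K IOPS disk builds a
# 3,000,000-request backlog, i.e. roughly 5 more minutes of unresponsiveness.
print(drain_time(15_000, 10_000, 600))  # 300.0
```

This matches the observed behavior: the errors clear on their own once the disk has worked through the queue.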
To use an analogy, imagine a fast food restaurant where a worker can take orders from 5 people per minute. If you're a customer, the wait time is bearable when there are 5 or fewer people in the queue. But if there are 10 people in the queue, the wait starts to get annoying. The solution is for the restaurant manager to add at least one more worker to increase the capacity to take orders.
The same thing applies to your cluster. If it can normally handle a throughput of 10,000 IOPS but you are sending 15K IOPS, you have two options:
- Throttle the load so it is only sending 10K requests per second.
- Increase the capacity of your cluster by adding 50% more nodes (e.g. add 5 nodes to a 10-node cluster).
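The first option can be sketched in the loader itself. The `RateLimiter` class below is a hypothetical helper written for this answer (it is not part of the DataStax driver); the commented-out lines show where it would wrap the driver's `execute_async()` calls:

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most `limit` operations per second.

    Hypothetical helper for illustration; not part of the DataStax driver.
    """

    def __init__(self, limit: int):
        self.limit = limit
        self.sent = deque()  # timestamps of operations in the last second

    def acquire(self):
        while True:
            now = time.monotonic()
            # Forget operations older than one second.
            while self.sent and now - self.sent[0] >= 1.0:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Window is full: wait until the oldest entry ages out.
            time.sleep(1.0 - (now - self.sent[0]))

# In a loader, cap throughput below what the disk can sustain:
# limiter = RateLimiter(10_000)
# for row in rows:
#     limiter.acquire()
#     session.execute_async(insert_stmt, row)  # cassandra-driver session
```

Throttling in the client is usually cheaper than adding nodes, and it keeps the node responsive for reads while the load is running.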
There is no magic to it. You cannot max out the disks. There is no amount of tuning you can do to beat physics. Cheers!