A Cassandra node appears to be stuck in the mutation stage, and CPU utilization is very high. Could you please advise on how to resolve the issue?
$ nodetool tpstats
Pool Name             Active  Pending  Completed     Blocked  All time blocked
RequestResponseStage  0       0        26110939      0        0
MutationStage         124     13165    874648434519  0        0
ReadStage             0       0        4197248       0        0
CompactionExecutor    0       0        6855061       0        0
$ top
top - 22:46:48 up 59 days, 2:34, 1 user, load average: 132.11, 135.86, 135.80
Tasks: 387 total, 6 running, 380 sleeping, 0 stopped, 1 zombie
%Cpu(s): 98.7 us, 0.5 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.3 hi, 0.1 si, 0.0 st
MiB Mem : 63857.6 total, 613.9 free, 26076.0 used, 37167.7 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 35953.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1240 cassand+ 20 0 162.9g 22.4g 207912 S 983.1 35.9 45122:57 java
$ nodetool gcstats
Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
3490349390     45781                1832856                384                    249097766280528    36321        -1
edited Mar 31 at 11:33
Erick Ramirez
asked Mar 31 at 5:49
BaluBalu
1 Answer
To me, it looks like the node is handling a constant stream of writes and is having trouble keeping up: MutationStage shows 124 active and 13165 pending tasks, while every other pool is idle. I'm not sure how big the cluster is or what replication factor (RF) it's running at, but my first thought is to add another node to the cluster to spread out the writes a bit.
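To make that reading of the `tpstats` output repeatable, here is a rough Python sketch that parses the text and lists pools with a large pending backlog. It assumes the six-column layout shown in the question; `parse_tpstats`, `backlogged_pools`, and the threshold of 1000 are illustrative choices of mine, not part of Cassandra or nodetool.

```python
def parse_tpstats(text):
    """Parse thread-pool rows from `nodetool tpstats` text into a dict by pool name.

    Assumes rows of the form:
    <name> <active> <pending> <completed> <blocked> <all time blocked>
    """
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a name plus five integers.
        if len(parts) != 6 or not all(p.isdigit() for p in parts[1:]):
            continue
        name = parts[0]
        active, pending, completed, blocked, all_time_blocked = map(int, parts[1:])
        pools[name] = {
            "active": active,
            "pending": pending,
            "completed": completed,
            "blocked": blocked,
            "all_time_blocked": all_time_blocked,
        }
    return pools


def backlogged_pools(pools, pending_threshold=1000):
    """Return pool names whose pending count exceeds an (arbitrary) threshold."""
    return [name for name, stats in pools.items()
            if stats["pending"] > pending_threshold]


# The output posted in the question.
SAMPLE = """\
Pool Name Active Pending Completed Blocked All time blocked
RequestResponseStage 0 0 26110939 0 0
MutationStage 124 13165 874648434519 0 0
ReadStage 0 0 4197248 0 0
CompactionExecutor 0 0 6855061 0 0
"""

pools = parse_tpstats(SAMPLE)
print(backlogged_pools(pools))
```

Run against the posted output, only the mutation pool should be flagged. Sampling this every few seconds would show whether the pending count is growing or draining.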
Otherwise, it might make sense to throttle the application's write throughput, or to put an event queue such as Pulsar or Kafka between the application and the Cassandra cluster.
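As a rough illustration of client-side throttling, the sketch below uses a token bucket to cap how many writes per second the application issues. Everything here is an assumption for illustration: `TokenBucket`, `throttled_writes`, and `write_fn` are hypothetical names, and `write_fn` stands in for whatever call your application uses to send a write to Cassandra.

```python
import time


class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, up to capacity.
        now = self._clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1.0:
            # Sleep for roughly the time needed to earn the missing fraction.
            self._sleep((1.0 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1.0


def throttled_writes(rows, write_fn, bucket):
    """Call write_fn(row) for each row, never faster than the bucket allows."""
    for row in rows:
        bucket.acquire()
        write_fn(row)


if __name__ == "__main__":
    # Hypothetical usage: cap at 200 writes/s with bursts of up to 50.
    bucket = TokenBucket(rate=200, capacity=50)
    throttled_writes(range(100), lambda row: None, bucket)
```

A sensible rate depends on what the cluster can absorb, so it would need tuning while watching the pending count in `nodetool tpstats`. A queue such as Kafka or Pulsar gets a similar result by letting a consumer drain writes at a controlled pace.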