Flight recording is enabled with disk=false when the service starts, and a timer service runs jcmd <PID> JFR.dump every five minutes to write the recording to a file.
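A minimal sketch of that setup follows. It assumes a recording named `jfr`, which matches the name= argument in the JFR.dump command quoted further down, and a placeholder `app.jar`. The helper names `jfr_start_option` and `dump_recording` and the `JCMD` override are hypothetical. They exist only so the pieces can be exercised without a running JVM.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the setup in the question: an in-memory recording
# started with the JVM, plus the dump the timer triggers every five minutes.

# Must match the name= argument later passed to JFR.dump.
JFR_NAME="jfr"

# JVM option that starts an in-memory (disk=false) recording at startup.
jfr_start_option() {
  printf '%s\n' "-XX:StartFlightRecording=disk=false,name=${JFR_NAME}"
}

# Example launch (app.jar is a placeholder):
#   java "$(jfr_start_option)" -jar app.jar

# Dump the named recording to a file. JCMD defaults to the real jcmd;
# overriding it (e.g. JCMD=echo) lets the command be inspected without a JVM.
dump_recording() {
  local pid="$1" file="$2"
  "${JCMD:-jcmd}" "$pid" JFR.dump name="$JFR_NAME" filename="$file"
}
```

The question doesn't say whether the timer is a systemd timer, cron, or something else. Any of them would call something like `dump_recording` with the service PID.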
However, we notice performance issues that coincide with the execution of that command. In my Vert.x-based service, this shows up as:
2025-02-05 17:00:39.998 [33] WARN io.vertx.core.impl.BlockedThreadChecker - Thread Thread[#34,vert.x-eventloop-thread-0,5,main] has been blocked for 3154 ms, time limit is 2000 ms
The warning doesn't appear on every dump, but every time it does appear, it coincides with a JFR.dump run.
The periodic process doesn't do much. The service runs on an AWS EC2 instance. The job makes a couple of curl calls to get details about the instance, then creates the JFR file and copies it to S3.
The main jcmd call is simply:
if jcmd "$PID" JFR.dump name=jfr filename="$JFR_FILE_LOCATION" > /dev/null; then log_message "JFR dump succeeded"; else log_message "JFR dump failed"; fi
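The job described above might look roughly like the following sketch. This is not the asker's actual script. It assumes token-based (IMDSv2) access to the EC2 instance metadata service, a recording named `jfr`, and a hypothetical `jfr/<instance-id>/<timestamp>.jfr` key layout in the bucket. The function names and the `CURL`/`JCMD`/`AWS` overrides are illustrative. The question's `log_message` helper isn't shown, so the sketch just returns non-zero on failure.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the periodic job: look up the instance ID via the
# EC2 instance metadata service (IMDSv2), dump the recording, copy it to S3.
# CURL, JCMD and AWS default to the real tools and can be overridden in tests.

IMDS="http://169.254.169.254/latest"

# Fetch a short-lived IMDSv2 token, then use it to read the instance ID.
instance_id() {
  local token
  token="$("${CURL:-curl}" -sf -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")" || return 1
  "${CURL:-curl}" -sf -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/instance-id"
}

# Build the S3 object key; this bucket layout is an assumption.
s3_key() {
  local id="$1" stamp="$2"
  printf '%s\n' "jfr/${id}/${stamp}.jfr"
}

# Dump the recording to a local file, then upload it.
run_job() {
  local pid="$1" bucket="$2" file="$3"
  local id stamp
  id="$(instance_id)" || return 1
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  "${JCMD:-jcmd}" "$pid" JFR.dump name=jfr filename="$file" > /dev/null || return 1
  "${AWS:-aws}" s3 cp "$file" "s3://${bucket}/$(s3_key "$id" "$stamp")"
}
```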
I tried skipping the copy to S3, in case the AWS API calls were somehow stalling the main thread, but it didn't help. I also tried ZGC just in case, and that didn't help either. (Heap statistics are not enabled.) I didn't expect flight recordings to stall the JVM this significantly. Is JFR.dump known to have this effect? How can I fix it?
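One way to confirm the correlation is to log exactly when each dump starts and ends, then compare that window with the BlockedThreadChecker timestamps. The sketch below is a hypothetical wrapper. It assumes GNU `date`, which supports `%3N` for milliseconds, and it logs in local time on the assumption that the service's log uses the same timezone. `timed_dump` and the `JCMD` override are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic wrapper: log when JFR.dump starts and ends
# (millisecond precision) so the window can be matched against Vert.x
# BlockedThreadChecker warnings. Requires GNU date for %3N.

now_ms() { date +%s%3N; }                      # epoch milliseconds
stamp()  { date '+%Y-%m-%d %H:%M:%S.%3N'; }    # same shape as the Vert.x log

# Run the dump, log start/end to stderr, print the duration in ms to stdout,
# and return jcmd's exit status.
timed_dump() {
  local pid="$1" file="$2" start end rc=0
  echo "$(stamp) JFR.dump start pid=$pid" >&2
  start="$(now_ms)"
  "${JCMD:-jcmd}" "$pid" JFR.dump name=jfr filename="$file" > /dev/null || rc=$?
  end="$(now_ms)"
  echo "$(stamp) JFR.dump end rc=$rc took=$((end - start))ms" >&2
  printf '%s\n' "$((end - start))"
  return "$rc"
}
```

A dump that takes several seconds and overlaps a "blocked for 3154 ms" warning would support the suspicion. A fast dump that still overlaps would point elsewhere.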
Asked Feb 7 at 14:16 by swpalmer

1 Answer
What stands out in your configuration is disk=false. Oracle's long-running stress testing of JFR has been done with disk=true, which is the default.
When you run with disk=false, stack trace data is not flushed out, so it can accumulate over time. That can expose bugs that are hard to notice in short-lived unit tests or when running manually during development.
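Acting on that suggestion mostly means changing the startup option. The minimal sketch below keeps the recording name and the dump command unchanged. It lets JFR use its disk repository (disk=true) and bounds retention with `maxage`/`maxsize`, which only apply when disk=true. The specific limits and the helper name are hypothetical. The answer doesn't promise this removes the stalls. It only notes that disk=true is the configuration that has been stress-tested.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the answer's suggestion: same recording name and
# dump command, but with disk=true (the default) and bounded retention.

JFR_NAME="jfr"
# Illustrative limits; choose values that comfortably cover the dump interval.
JFR_MAXAGE="10m"
JFR_MAXSIZE="250M"

jfr_start_option() {
  printf '%s\n' "-XX:StartFlightRecording=disk=true,maxage=${JFR_MAXAGE},maxsize=${JFR_MAXSIZE},name=${JFR_NAME}"
}

# Example launch (app.jar is a placeholder):
#   java "$(jfr_start_option)" -jar app.jar
# The periodic dump does not need to change:
#   jcmd "$PID" JFR.dump name=jfr filename="$JFR_FILE_LOCATION"
```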