I am currently using the Java vertx-rabbitmq-client (version 4.5.11) and am facing issues with the client's automatic recovery mechanism. I haven't managed to reproduce the issue consistently, but it typically occurs in environments with a large number of queues (100 or more).
The problem arises during maintenance windows when we deploy a new version of RabbitMQ, leading to a RabbitMQ cluster restart. In such cases, our application often crashes. The symptoms are consistent: immediately after the connection loss, the application logs over 1000 error messages almost simultaneously:
{"instant":{"epochSecond":1742287707,"nanoOfSecond":105110000},"thread":"vert.x-eventloop-thread-1","level":"ERROR","loggerName":"io.vertx.rabbitmq.impl.RabbitMQClientImpl","message":"Retries disabled. Will not attempt to restart","endOfBatch":true,"loggerFqcn":"io.vertx.core.logging.Logger","threadId":24,"threadPriority":5}
Following this, the application's CPU usage spikes and stays abnormally high until we manually restart the service. Automatic recovery apparently never kicks in, leaving the application in a broken state.
This is the configuration I am currently using for the RabbitMQ client:
RabbitMQOptions options = new RabbitMQOptions()
    .setVirtualHost("xxx")
    .setHost("xxx")
    .setPort(5672)
    .setUser("xxx")
    .setPassword("xxx")
    .setConnectionTimeout(120000)
    .setHandshakeTimeout(10000)
    .setRequestedHeartbeat(30)
    .setConnectionName(getConnectionName())
    // Recovery configuration
    .setAutomaticRecoveryEnabled(true)
    .setReconnectAttempts(0)
    .setNetworkRecoveryInterval(1000L);
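
For additional context, this is roughly how the client is created and started. It is a simplified sketch: the class name, the startClient helper, the queue names, and the declare flags are placeholders for our real setup, which re-declares well over 100 queues in a connection-established callback.

import io.vertx.core.Future;
import io.vertx.core.Vertx;
import io.vertx.rabbitmq.RabbitMQClient;
import io.vertx.rabbitmq.RabbitMQOptions;

import java.util.List;

public class RabbitSetup {

  // Placeholder for the 100+ queues we actually consume from.
  private static final List<String> QUEUE_NAMES = List.of("queue-1", "queue-2");

  public static Future<RabbitMQClient> startClient(Vertx vertx, RabbitMQOptions options) {
    RabbitMQClient client = RabbitMQClient.create(vertx, options);

    // Runs on every (re)established connection: re-declare the queues so the
    // topology exists again after a broker restart.
    client.addConnectionEstablishedCallback(promise -> {
      Future<Void> chain = Future.succeededFuture();
      for (String queue : QUEUE_NAMES) {
        chain = chain.compose(v -> client.queueDeclare(queue, true, false, false).mapEmpty());
      }
      chain.onComplete(promise);
    });

    return client.start().map(v -> client);
  }
}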
-> Are there any known issues with recovery in the vertx-rabbitmq-client when a large number of queues is involved?
-> Is there a recommended approach to better handle connection loss and ensure proper recovery?
-> Should I adjust the retry configuration or use a different strategy for environments with numerous queues? (A rough sketch of what I am considering follows below.)
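
To make the last question more concrete, here is one alternative I am considering. This is purely a sketch: the Integer.MAX_VALUE and interval values are guesses, and I am unsure whether the Vert.x-level reconnect attempts are meant to be combined with, or used instead of, the Java client's automatic recovery.

RabbitMQOptions options = new RabbitMQOptions()
    .setVirtualHost("xxx")
    .setHost("xxx")
    .setPort(5672)
    .setUser("xxx")
    .setPassword("xxx")
    .setConnectionTimeout(120000)
    .setHandshakeTimeout(10000)
    .setRequestedHeartbeat(30)
    .setConnectionName(getConnectionName())
    // Sketch: turn off the Java client's automatic recovery ...
    .setAutomaticRecoveryEnabled(false);
// ... and let the Vert.x client restart itself instead (values are placeholders).
options.setReconnectAttempts(Integer.MAX_VALUE);
options.setReconnectInterval(1000L);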
Thank you for any help.