apache kafka - Debezium MySQL Source Connector takes too long to resume streaming on restart (in schema_only mode)

I am using the Debezium MySQL Source Connector (v2.2.1.Final) in a multi-tenant application. As the number of tenant databases has increased, the schema history topic has grown significantly. Whenever I restart the connector (with snapshot.mode=schema_only), it takes approximately 1 hour and 30 minutes to resume streaming new records.

I have observed that:

The schema history topic already contains all the persisted schema information.
The __consumer_offsets topic has the correct offsets.
The connect.offsets file contains the correct MySQL binlog position.
During the schema history recovery process, I don’t see any activity on the database server.

Log:

[2025-02-20 04:34:34,669] INFO [mysql_source_connector|task-0] Closing connection before starting schema recovery (io.debezium.connector.mysql.MySqlConnectorTask:94)
[2025-02-20 04:34:34,743] INFO [mysql_source_connector|task-0] Started database schema history recovery (io.debezium.relational.history.SchemaHistoryMetrics:115)
[2025-02-20 04:34:37,825] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered ... records (io.debezium.relational.history.SchemaHistoryMetrics:130)
[2025-02-20 04:34:38,235] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
[2025-02-20 04:34:38,629] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered .... records (io.debezium.relational.history.SchemaHistoryMetrics:130)
[2025-02-20 04:34:38,630] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
[2025-02-20 04:34:40,629] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
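
To put a number on the recovery phase, the timestamps in log lines like these can be compared. The following is a minimal sketch, not part of Debezium: it assumes the log format shown above, and the helper names (`parse_ts`, `recovery_elapsed_seconds`) are hypothetical. It measures the time from the "Started database schema history recovery" line to the last line logged by `SchemaHistoryMetrics`.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp at the start of a Kafka Connect log line.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")
START_MARKER = "Started database schema history recovery"
METRICS_CLASS = "io.debezium.relational.history.SchemaHistoryMetrics"


def parse_ts(line):
    """Return the line's timestamp, or None if the line has no timestamp prefix."""
    match = TS_RE.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def recovery_elapsed_seconds(lines):
    """Seconds between the recovery start line and the last SchemaHistoryMetrics line.

    Returns None if no start line is found.
    """
    start = None
    last = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if START_MARKER in line:
            # A new recovery run begins; reset the measurement window.
            start = ts
            last = ts
        elif start is not None and METRICS_CLASS in line:
            last = ts
    if start is None:
        return None
    return (last - start).total_seconds()


# The excerpt from the log above, copied verbatim.
SAMPLE = """\
[2025-02-20 04:34:34,669] INFO [mysql_source_connector|task-0] Closing connection before starting schema recovery (io.debezium.connector.mysql.MySqlConnectorTask:94)
[2025-02-20 04:34:34,743] INFO [mysql_source_connector|task-0] Started database schema history recovery (io.debezium.relational.history.SchemaHistoryMetrics:115)
[2025-02-20 04:34:37,825] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered ... records (io.debezium.relational.history.SchemaHistoryMetrics:130)
[2025-02-20 04:34:40,629] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
""".splitlines()

if __name__ == "__main__":
    print(recovery_elapsed_seconds(SAMPLE))
```

The excerpt covers only the first few seconds of recovery; run against the full connector log, the same calculation would span the whole recovery phase.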

My Question:

Why does the connector need to run the schema history recovery process every time it restarts, given that it already has the binlog position?
Wouldn't it be more efficient for the connector to resume from the existing binlog position and only append newly added tables and database schema changes to the schema history topic, instead of reprocessing everything?

Any insights on optimizing the recovery process would be appreciated!
