
Debezium MySQL Source Connector takes too long to resume streaming on restart (in schema_only mode)


I am using the Debezium MySQL Source Connector (v2.2.1.Final) in a multi-tenant application. As the number of tenant databases has increased, the schema history topic has grown significantly. Whenever I restart the connector (with snapshot.mode=schema_only), it takes approximately 1 hour and 30 minutes to resume streaming new records.
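For context, a setup like the one described might look roughly like the sketch below. This is a hypothetical illustration, not the actual configuration: the hostname, credentials, server id, topic prefix, bootstrap servers, and history topic name are all placeholders. Only the connector class, snapshot.mode=schema_only, and the use of a Kafka-backed schema history topic follow from the description above; the connector name is taken from the log output.

```python
import json


def build_connector_config(name, bootstrap_servers, history_topic):
    """Build a Kafka Connect payload for a Debezium MySQL source connector.

    All concrete values below are placeholders (assumptions for illustration).
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            # Placeholder connection details
            "database.hostname": "mysql.example.internal",
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "<secret>",
            "database.server.id": "184054",
            "topic.prefix": "tenants",
            # Snapshot mode named in the question
            "snapshot.mode": "schema_only",
            # Kafka-backed schema history, replayed on every restart
            "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
            "schema.history.internal.kafka.topic": history_topic,
        },
    }


if __name__ == "__main__":
    payload = build_connector_config(
        "mysql_source_connector", "kafka:9092", "schema-history.tenants"
    )
    print(json.dumps(payload, indent=2))
```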

I have observed that:

  • The schema history topic already contains all the persisted schema information.
  • The __consumer_offsets topic has the correct offsets.
  • The connect.offsets file contains the correct MySQL binlog position.
  • During the schema history recovery process, I don’t see any activity on the database server.

Log:

[2025-02-20 04:34:34,669] INFO [mysql_source_connector|task-0] Closing connection before starting schema recovery (io.debezium.connector.mysql.MySqlConnectorTask:94)
[2025-02-20 04:34:34,743] INFO [mysql_source_connector|task-0] Started database schema history recovery (io.debezium.relational.history.SchemaHistoryMetrics:115)
[2025-02-20 04:34:37,825] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered ... records (io.debezium.relational.history.SchemaHistoryMetrics:130)
[2025-02-20 04:34:38,235] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
[2025-02-20 04:34:38,629] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered .... records (io.debezium.relational.history.SchemaHistoryMetrics:130)
[2025-02-20 04:34:38,630] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
[2025-02-20 04:34:40,629] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)
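To put a number on how long recovery takes, the timestamps in log lines like these can be compared directly. The following is a minimal sketch, assuming the line layout shown above (`[timestamp] LEVEL [connector|task] message`); the function name and the regular expression are my own, and the sample reuses a few of the excerpt's lines with the counts elided as in the original.

```python
import re
from datetime import datetime

# Matches: [2025-02-20 04:34:34,669] INFO [connector|task-0] message...
LINE_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] \w+ \[[^\]]*\] (.*)$"
)


def recovery_elapsed_seconds(lines):
    """Seconds between the 'Started ... recovery' line and the last
    SchemaHistoryMetrics line seen after it; None if either is missing."""
    start = None
    last = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        if message.startswith("Started database schema history recovery"):
            start = ts
        elif start is not None and "SchemaHistoryMetrics" in message:
            last = ts
    if start is None or last is None:
        return None
    return (last - start).total_seconds()


SAMPLE = [
    "[2025-02-20 04:34:34,743] INFO [mysql_source_connector|task-0] Started database schema history recovery (io.debezium.relational.history.SchemaHistoryMetrics:115)",
    "[2025-02-20 04:34:37,825] INFO [mysql_source_connector|task-0] Database schema history recovery in progress, recovered ... records (io.debezium.relational.history.SchemaHistoryMetrics:130)",
    "[2025-02-20 04:34:40,629] INFO [mysql_source_connector|task-0] Already applied .... database changes (io.debezium.relational.history.SchemaHistoryMetrics:140)",
]

if __name__ == "__main__":
    print(recovery_elapsed_seconds(SAMPLE))
```

The excerpt only covers the first few seconds of recovery, so running this over the full Connect worker log would be needed to see the whole 1 hour 30 minute window.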

My questions:

  1. What is the need for the schema history recovery process every time the connector restarts, given that the connector already has the binlog position?
  2. Wouldn't it be more efficient for the connector to resume from the existing binlog position and only append newly added tables and database schema changes to the schema history topic, instead of reprocessing everything?

Any insights on optimizing the recovery process would be appreciated!
