
The schemaTrackingLocation + mergeSchema, PySpark streaming jobs, schema changes, and OpenSearch - Stack Overflow


I have a streaming job in PySpark that reads the CDC feed. So far, nothing special. When the source table gets a schema change, the feed changes its schema too, and my streaming job stops working and fails with:

com.databricks.sql.transaction.tahoe.DeltaStreamingColumnMappingSchemaIncompatibleException: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype changes).
Please provide a 'schemaTrackingLocation' to enable non-additive schema evolution for Delta stream processing.

Normally, I would fix this by adding the schemaTrackingLocation parameter so that Spark can also track non-additive schema changes (i.e., column renames or type changes), as in the sketch below.
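
A minimal sketch of where that option normally goes, assuming a Delta CDC source read via readChangeFeed (the table name and paths are placeholders, not the real job):

    cdc_df = (
        spark.readStream
        .format("delta")
        .option("readChangeFeed", "true")            # consume the Change Data Feed
        # directory where Delta tracks the evolving source schema so the stream
        # can survive renames / drops / type changes (non-additive evolution)
        .option("schemaTrackingLocation", "/mnt/checkpoints/my_table/_schema_log")
        .table("source_db.source_table")
    )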

The problem is that the sink is OpenSearch.

Yes, I know: OpenSearch does not need this parameter; however, I have no idea how to accept the incoming DataFrame when a schema change happens.

At the moment, with or without schemaTrackingLocation, the Spark streaming job stops working.

BTW, the data pushed to OpenSearch is filtered, so the document mapping is always honored.
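
The filtering looks roughly like this (cdc_df, filtered_df, and the column names are placeholders for illustration):

    filtered_df = (
        cdc_df
        .filter("_change_type IN ('insert', 'update_postimage')")  # CDF metadata column
        .select("id", "name", "updated_at")   # keep only fields the OpenSearch mapping knows
    )
    query_push_os = filtered_df.writeStream   # the writer started in the snippet below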

Any help is appreciated.

    stream_os = (
        query_push_os
        .options(**w_options)                 # here: mergeSchema and schemaTrackingLocation
        .outputMode("append")
        .format("org.opensearch.spark.sql")
        .options(**os_config)                 # only OpenSearch settings
        .start()
    )
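
For reference, the two option dicts look roughly like this; every path, host, and value here is an illustrative placeholder (the opensearch.* keys follow the opensearch-hadoop connector naming), not my real settings:

    w_options = {
        "checkpointLocation": "/mnt/checkpoints/push_os",   # Structured Streaming checkpoint
        "mergeSchema": "true",
        "schemaTrackingLocation": "/mnt/checkpoints/my_table/_schema_log",
    }

    os_config = {
        "opensearch.nodes": "my-opensearch-host",
        "opensearch.port": "9200",
        "opensearch.resource": "my_index",    # target index
    }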
