pyspark - Databricks Spark streaming - read checkpoint file content - Stack Overflow

I need to read a checkpoint file from one of my streaming jobs in Databricks. Here is the file structure:

    path-to-delta-table
    |-- data-file.parquet
    |-- _delta_log
    |-- _checkpoint
        |-- commits
        |-- offsets
            |-- 12345

I want to read the contents of the checkpoint offset file called 12345 (the file has no extension). When I try to read it, I get the error:

[DELTA_INVALID_FORMAT] A transaction log for delta was found at...

I know that the presence of the _delta_log directory conflicts with that read: Spark assumes this is a Delta table read. The only solution I found was to move the checkpoint to a different location, but that cannot be done in my case.

Is there any other solution for this? Can I find this information somewhere else? I especially need the reservoirVersion field from the checkpoint file.
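
For illustration, here is a minimal sketch of one direction that might avoid the Delta check: reading the offset file with plain Python file I/O instead of a Spark reader, then parsing it by hand. It assumes the file follows the usual Structured Streaming offset log layout (a version line such as v1, then one JSON object per line, where the first is batch metadata and the rest are per-source offsets), and that the Delta source offset carries reservoirVersion as a top-level key. The function names are hypothetical, and whether the path is reachable as a local file (for example through a /dbfs mount) depends on the workspace.

```python
import json
from pathlib import Path


def parse_offset_file(text):
    """Split offset-file text into (version, batch metadata, source offsets).

    Assumed layout: first non-empty line is a version marker (e.g. "v1"),
    every following line is one JSON object; the first JSON line is batch
    metadata and the remaining lines are per-source offsets.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty offset file")
    version, json_lines = lines[0], lines[1:]
    records = []
    for ln in json_lines:
        try:
            records.append(json.loads(ln))
        except json.JSONDecodeError:
            # Keep a placeholder for lines that are not JSON
            # (a source without an offset may be written as "-").
            records.append(None)
    metadata = records[0] if records else None
    return version, metadata, records[1:]


def reservoir_versions(text):
    """Return reservoirVersion from every source offset that has one."""
    _, _, offsets = parse_offset_file(text)
    return [
        o["reservoirVersion"]
        for o in offsets
        if isinstance(o, dict) and "reservoirVersion" in o
    ]


def read_reservoir_versions(path):
    """Read the offset file as plain text (no Spark reader involved)."""
    return reservoir_versions(Path(path).read_text(encoding="utf-8"))
```

Usage would look something like read_reservoir_versions("/dbfs/path-to-delta-table/_checkpoint/offsets/12345"), where the /dbfs prefix is an assumption about how the storage is mounted. Because no Spark DataFrame reader is involved, the _delta_log directory next to the checkpoint should not come into play.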
