
hive - Spark overwrite table, getting data loss when terminated at insertion stage


Objective:

We need to read a table in a Spark application, transform the data, and rewrite the same table.

Scenario:

I am trying to overwrite an external, non-partitioned table with Spark.

Since reading and writing the same data in one pass is not possible, we are using checkpointing to break the dependency.
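As a rough sketch of the checkpoint approach described above (the table name, column, and checkpoint directory are hypothetical; Spark 2.x+ with Hive support is assumed):

```scala
import org.apache.spark.sql.SparkSession

object OverwriteSameTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("overwrite-same-table")
      .enableHiveSupport()
      .getOrCreate()

    // checkpoint() needs a reliable directory, typically on HDFS
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    val source      = spark.table("db.events")          // hypothetical table
    val transformed = source.filter("amount > 0")       // hypothetical transform

    // Eagerly materialize the result and truncate the lineage, so the
    // write below no longer reads from the files it is about to replace.
    val materialized = transformed.checkpoint()

    materialized.write
      .mode("overwrite")
      .insertInto("db.events")
  }
}
```

Note that checkpointing only removes the read/write conflict; it does not make the overwrite itself crash-safe, which is the problem observed below.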

We have observed that if the application is terminated during the Spark insertion job, the data in the original table is already deleted before the revised data is inserted. A failure in the middle of the job therefore loses the entire table.

Our understanding is that Spark first deletes the existing data and only then writes the modified data being inserted.

Is there a workaround to prevent this data loss, or what is the best approach to read, transform, and write the same table with Spark?
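One commonly suggested mitigation, not taken from the question itself, is to stage the transformed data in a separate table so the original data survives a mid-job failure. A minimal sketch, with all table names hypothetical:

```scala
// Assumes the same SparkSession as above.
val transformed = spark.table("db.events").filter("amount > 0")

// Step 1: persist the transformed data to a staging table. If the job
// dies here, db.events is untouched.
transformed.write
  .mode("overwrite")
  .saveAsTable("db.events_staging")

// Step 2: replace the original from the staging copy. If this step dies
// midway, the data is still recoverable from db.events_staging.
spark.sql("INSERT OVERWRITE TABLE db.events SELECT * FROM db.events_staging")
```

The point of the design is that the destructive overwrite only runs after a complete, durable copy of the new data exists, so no single failure can destroy both copies.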
