Objective:
We need to read a table in a Spark application, transform the data, and write the result back to the same table.
Scenario:
I am trying to overwrite an external, non-partitioned table with Spark.
Since reading from and writing to the same table in one pass is not possible, we are using checkpointing to break the lineage before the write.
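Here is a minimal sketch of what we are doing (the table name db.my_table, the checkpoint directory, and the transformation logic are placeholders for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("read-transform-rewrite")
         .enableHiveSupport()
         .getOrCreate())

# Checkpointing needs a reliable directory (e.g. on HDFS).
spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  # placeholder path

# Read the source table.
df = spark.table("db.my_table")  # placeholder table name

# Eagerly materialize the data and truncate the lineage, so the
# write-back does not depend on re-reading the table it replaces.
df_checkpointed = df.checkpoint(eager=True)

# Placeholder transformation.
df_transformed = df_checkpointed.withColumn("processed_ts", F.current_timestamp())

# The write that overwrites db.my_table is shown further below.
```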
We have observed that if the application is terminated in the middle of the insert job, the original table's data has already been deleted before the revised data is written, so the entire table is lost.
Our understanding is that Spark first deletes the existing data and then writes the modified data we are inserting.
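For example, the insert we run looks like this (placeholder names again), and it is this overwrite that removes the old data before the new data is committed:

```python
# With mode("overwrite"), Spark deletes the existing contents of
# db.my_table first and only then writes df_transformed. The two
# steps are not atomic: if the application is killed in between,
# the table is left empty.
df_transformed.write.mode("overwrite").insertInto("db.my_table")
```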
Is there a workaround to prevent this data loss, or what is the best approach to read, transform, and write back to the same table with Spark?