I am using df.cache() to cache a DataFrame, with Databricks autoscaling set to a minimum of 1 instance and a maximum of 8. But caching doesn't work properly here: some executors die mid-execution and the cached data is lost with them. When I set the min and max instance counts equal, the cache works fine. How can I configure the cluster so that cached data is not lost during downscaling?
1 Answer
The only practical things you can do in a plain-vanilla environment are:
- Use checkpointing, which writes the data to reliable storage.
- Or persist with the DISK_ONLY storage level, which is slower.

Both options are sketched below.
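A minimal sketch of both options, assuming a PySpark session is available as `spark`, `df` is the DataFrame in question, and the checkpoint path is a hypothetical placeholder:

```python
from pyspark import StorageLevel

# Option 1: checkpointing. Materializes the DataFrame to the checkpoint
# directory (reliable storage) and truncates the lineage, so the data
# survives executors being removed during downscaling.
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  # hypothetical path
df_checkpointed = df.checkpoint()  # eager by default; triggers the write

# Option 2: persist with DISK_ONLY. Blocks are spilled to executor local
# disk instead of memory; slower to read back than an in-memory cache.
df_on_disk = df.persist(StorageLevel.DISK_ONLY)
df_on_disk.count()  # an action to force materialization
```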
Of course, an executor may still fail before anything has been written. Even then, Spark's fault tolerance allows recomputation from the source (or from the last checkpoint), so in the end this is not a huge issue.