
dataframe - How to gracefully decommission Spark executors - Stack Overflow


I am using df.cache() to cache a DataFrame, together with Databricks autoscaling set to a minimum of 1 instance and a maximum of 8. Caching doesn't work properly here: some executors die in the middle of execution, and the data cached on them is lost with them. When I set the minimum and maximum instance counts equal, caching works fine. How can I configure this so that cached data isn't lost during downscaling?

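For context, the autoscaling range described above corresponds roughly to the following Databricks Clusters API call. This is only a sketch: the host, token, `cluster_name`, `spark_version`, and `node_type_id` are placeholder assumptions; only the 1..8 worker range comes from the question.

```python
import requests

# Illustrative payload for the Databricks Clusters API (POST /api/2.0/clusters/create).
# All values except the autoscale range are placeholders, not from the question.
payload = {
    "cluster_name": "autoscaling-cache-test",
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 8},  # the Min 1 / Max 8 range
}

resp = requests.post(
    "https://<databricks-host>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
print(resp.json())
```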

asked by gaurav narang

1 Answer


The only practical things you can do in a plain-vanilla environment are:

  1. Use checkpointing (see the sketch after this list).
  2. Use the DISK_ONLY storage level, which is slower.
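A minimal PySpark sketch of both options, assuming a reliable (DBFS/HDFS) checkpoint directory; the path, app name, and the `spark.range` stand-in for the real DataFrame are illustrative, not from the question:

```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-under-autoscaling").getOrCreate()

# Checkpoints must go to reliable storage (DBFS/HDFS); this path is illustrative.
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")

df = spark.range(1_000_000)  # stand-in for the real DataFrame

# Option 1: checkpoint() writes the data to the checkpoint directory and
# truncates the lineage, so it survives executors being removed by downscaling.
df_ckpt = df.checkpoint(eager=True)

# Option 2: persist to executor local disk instead of memory. Slower, and
# blocks on a removed executor are still lost, but Spark can recompute them
# from lineage rather than failing the job.
df_disk = df.persist(StorageLevel.DISK_ONLY)
df_disk.count()  # run an action to materialize the persisted blocks
```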

Of course, an executor may still fail before anything is written.

Recomputing from the source, or from the last checkpoint, is still possible thanks to Spark's fault tolerance, so in the end this is not a huge issue.
