How do I set up dynamic allocation for a Spark job with a data rate of about 450k?
I tried the configuration below (a sketch of the full spark-submit command follows the list), but the executor pods always run at the maximum executor count and never scale down, even when the data rate drops to just 20k-30k.
- --conf spark.dynamicAllocation.enabled=true
- --conf spark.dynamicAllocation.shuffleTracking.enabled=true
- --conf spark.dynamicAllocation.shuffleTracking.timeout=30s
- --conf spark.dynamicAllocation.minExecutors=10
- --conf spark.dynamicAllocation.initialExecutors=2
- --conf spark.dynamicAllocation.maxExecutors=85
- --conf spark.dynamicAllocation.executorIdleTimeout=300s
- --conf spark.dynamicAllocation.schedulerBacklogTimeout=210s
- --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30s
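For context, this is roughly how the job is submitted. The master URL, app name, and jar path are placeholders, not the real values:

```bash
# Sketch of the spark-submit invocation on Kubernetes
# (K8s API server address, app name, and application jar are placeholders)
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --name my-streaming-job \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.timeout=30s \
  --conf spark.dynamicAllocation.minExecutors=10 \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=85 \
  --conf spark.dynamicAllocation.executorIdleTimeout=300s \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=210s \
  --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30s \
  local:///opt/spark/jars/my-app.jar
```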
How can I fix this, and why does the job always end up running with the maximum number of executors? I'm trying to optimize the job and expect it to run with the minimum number of executors when the data volume is small.