
apache spark - After running VACUUM on all tables, I don't see it freeing up space - Stack Overflow


In Databricks, I ran VACUUM on all tables in a loop. After the run succeeded, I checked the history of every table for the "Vacuum start" operation and captured "sizeOfDataToDelete", which added up to close to 1 TB. However, the container metrics still show the same size as before the vacuum, so the space was not freed. A sample of the output is attached below. What could be the issue?


asked Mar 30 at 3:06 by user2703679

1 Answer

In Databricks, VACUUM removes only files that are no longer referenced by the Delta log. I am assuming you first used the DELETE command to remove the data you no longer need. If so, by default the deleted data is only marked as deleted and is not physically removed until 7 days have passed. That is the default retention period during which the files still exist on storage even though they are no longer part of the Delta log. https://docs.databricks/aws/en/sql/language-manual/delta-vacuum
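To make the retention rule concrete, here is a minimal sketch in plain Python of the eligibility check as I understand it. The function name, the example timestamps, and the inclusive comparison at the exact boundary are my own assumptions for illustration, not Databricks internals.

```python
from datetime import datetime, timedelta, timezone

# 7 days expressed in hours; this is the default retention mentioned above.
DEFAULT_RETENTION_HOURS = 168


def is_eligible_for_vacuum(removed_at, now, retention_hours=DEFAULT_RETENTION_HOURS):
    """Hypothetical helper: True if a file that was logically removed from the
    Delta log at `removed_at` has passed the retention period at `now`.
    The inclusive comparison at the boundary is an assumption."""
    return removed_at + timedelta(hours=retention_hours) <= now


# Illustrative timestamps only, not taken from the question.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
removed_yesterday = now - timedelta(days=1)
removed_last_week = now - timedelta(days=8)

print(is_eligible_for_vacuum(removed_yesterday, now))  # False: still inside retention
print(is_eligible_for_vacuum(removed_last_week, now))  # True: retention has elapsed
```

So a file removed by a DELETE one day ago would still be sitting on storage after a VACUUM run with the default settings.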

VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM will skip all directories that begin with an underscore (_), which includes the _delta_log. Partitioning your table on a column that begins with an underscore is an exception to this rule; VACUUM scans all valid partitions included in the target Delta table. Delta table data files are deleted according to the time they have been logically removed from Delta’s transaction log plus retention hours, not their modification timestamps on the storage system. The default threshold is 7 days.
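If you want to preview what a run would delete before looping over every table, something along these lines may help. It is only a sketch: `build_vacuum_sql` and the table names are hypothetical, and it just builds `VACUUM ... [RETAIN n HOURS] [DRY RUN]` statements as strings, which you would then pass to `spark.sql(...)` in a notebook.

```python
def build_vacuum_sql(table_name, retain_hours=None, dry_run=False):
    """Hypothetical helper that builds a VACUUM statement as a string.
    With retain_hours=None the table's default retention applies."""
    parts = ["VACUUM", table_name]
    if retain_hours is not None:
        parts += ["RETAIN", str(retain_hours), "HOURS"]
    if dry_run:
        # DRY RUN lists files that would be deleted without deleting them.
        parts.append("DRY RUN")
    return " ".join(parts)


# Hypothetical table names, for illustration only.
tables = ["sales.orders", "sales.customers"]

for table in tables:
    statement = build_vacuum_sql(table, dry_run=True)
    print(statement)
    # In a Databricks notebook you would run: spark.sql(statement)
```

Comparing the dry-run listing with the history entries should show whether the files you expect to disappear are actually past the retention threshold yet.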
