最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

google cloud platform - Why are BQ snapshots using the full storage amount of the original table? - Stack Overflow

programmeradmin6浏览0评论

I was hoping for an explanation of some unexpected BQ snapshot behavior. Here's a minimal example to create a table and snapshot it twice:

-- Create a partitioned table and populate it with data
CREATE OR REPLACE TABLE `your_project.your_dataset.snapshot_example_data` (
  id STRING,
  value STRING,
  created_on DATE
)
PARTITION BY created_on AS
SELECT
  GENERATE_UUID() AS id,
  GENERATE_UUID() AS value,
  CASE
    WHEN n <= 500000 THEN DATE("2025-01-01")
    ELSE DATE("2025-02-01")
  END AS created_on
FROM
  UNNEST(GENERATE_ARRAY(1, 1000000)) AS n;

-- Create the first snapshot
CREATE OR REPLACE SNAPSHOT TABLE `your_project.your_dataset.snapshot_example_data_1`
CLONE `your_project.your_dataset.snapshot_example_data`;

-- Update some rows in the January partition
UPDATE `your_project.your_dataset.snapshot_example_data`
SET value = GENERATE_UUID()
WHERE created_on = DATE('2025-01-01')
  AND STARTS_WITH(value, 'a');

-- Create the second snapshot
CREATE OR REPLACE SNAPSHOT TABLE `your_project.your_dataset.snapshot_example_data_2`
CLONE `your_project.your_dataset.snapshot_example_data`;

My second snapshot is the same size as the first, in both logical bytes and physical bytes, even though I only modified a subset of rows & columns. I'd the second snapshot to be significantly smaller, storing only the differences from the first.

Why is this the case? All I can think of is that there are unexpected consequences of creating snapshots within the time travel window, but doesn't seem likely.

发布评论

评论列表(0)

  1. 暂无评论