I was hoping for an explanation of some unexpected BQ snapshot behavior. Here's a minimal example to create a table and snapshot it twice:
-- Create a partitioned table and populate it with data
CREATE OR REPLACE TABLE `your_project.your_dataset.snapshot_example_data` (
id STRING,
value STRING,
created_on DATE
)
PARTITION BY created_on AS
SELECT
GENERATE_UUID() AS id,
GENERATE_UUID() AS value,
CASE
WHEN n <= 500000 THEN DATE("2025-01-01")
ELSE DATE("2025-02-01")
END AS created_on
FROM
UNNEST(GENERATE_ARRAY(1, 1000000)) AS n;
-- Create the first snapshot
CREATE OR REPLACE SNAPSHOT TABLE `your_project.your_dataset.snapshot_example_data_1`
CLONE `your_project.your_dataset.snapshot_example_data`;
-- Update some rows in the January partition
UPDATE `your_project.your_dataset.snapshot_example_data`
SET value = GENERATE_UUID()
WHERE created_on = DATE('2025-01-01')
AND STARTS_WITH(value, 'a');
-- Create the second snapshot
CREATE OR REPLACE SNAPSHOT TABLE `your_project.your_dataset.snapshot_example_data_2`
CLONE `your_project.your_dataset.snapshot_example_data`;
My second snapshot is the same size as the first, in both logical bytes and physical bytes, even though I only modified a subset of rows & columns. I'd the second snapshot to be significantly smaller, storing only the differences from the first.
Why is this the case? All I can think of is that there are unexpected consequences of creating snapshots within the time travel window, but doesn't seem likely.