I'm building an AWS Glue ETL job (using Glue 4.0, Python 3) that processes JSON files stored in S3 and computes a total customer count for each user. I then need to write these results into a Snowflake table (TOTAL_CUSTOMER_METRICS
) in the KLAVIYO_DASHBOARD.PUBLIC
schema. The table is keyed on USER_ID, and I want to perform an upsert—i.e., if a record with a given USER_ID
exists, update its TOTAL_CUSTOMERS
column; if not, insert a new record.
Currently, My script writes new rows to Snowflake, but I want to update an existing record if the USER_ID already exists (i.e., perform an upsert or merge). I've seen approaches that use the Snowflake Python connector with a MERGE statement after the dynamic frame write, but that requires a separate connection and SQL query, which makes the dynamic frame write feel redundant.
My question is:
Is there a way to perform an upsert (merge) operation directly using the AWS Glue DynamicFrame write operation to Snowflake? In other words, can I configure the write operation (perhaps using preactions/postactions or some other option) so that it updates the existing record if the key (USER_ID
) exists, and inserts a new record if it doesn't, without having to run a separate SQL query externally?