Title: Issue Passing Dynamic Variables from Shell Script into Dataform (v2.6.7)
Post:
Hi Dataform Community,
I'm running into an issue dynamically passing variables (specifically dataset names) from a shell script into my Dataform project (version 2.6.7). Here's a simplified, anonymized example that illustrates my scenario:
Scenario Overview
I have four key files:
1. datasets.csv
datasetA,datasetB
2. dataform.json
{
  "warehouse": "bigquery",
  "defaultSchema": "sample_schema",
  "defaultDatabase": "sample_project",
  "vars": {
    "PROJECT_ID": "sample_project",
    "SCHEMA": "sample_schema",
    "DATASETS": ""
  }
}
3. filter_data.sqlx
config {
  type: "table",
  schema: "sample_schema"
}

SELECT *
FROM ${ref("source_table")}
WHERE dataset_id IN (${dataform.projectConfig.vars.DATASETS.split(",").map(d => `'${d.trim()}'`).join(",")})
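For clarity, with DATASETS set to "datasetA,datasetB" the inline JavaScript should compile to exactly the filter I want; with the default empty string, "".split(",") yields [""], so the clause degenerates into one that matches nothing:

-- DATASETS = "datasetA,datasetB"
WHERE dataset_id IN ('datasetA','datasetB')

-- DATASETS = "" (the default committed in dataform.json)
WHERE dataset_id IN ('')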
4. run_dataform.sh
#!/bin/bash
# Join the non-empty lines of datasets.csv into one comma-separated string
DATASETS=$(awk 'NF' datasets.csv | paste -sd, -)

# Back up dataform.json, then inject the dataset list into its "vars" block
cp dataform.json dataform.json.bak
jq --arg DATASETS "$DATASETS" '.vars.DATASETS=$DATASETS' dataform.json > temp.json && mv temp.json dataform.json

# Run the project, then restore the original dataform.json
dataform run --credentials "./envs/.df-credentials.json"
mv dataform.json.bak dataform.json
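For what it's worth, locally I sanity-check the substitution by compiling before running and searching the output for the generated clause (assuming dataform compile --json prints the compiled graph including each action's SQL, which I believe it does in 2.x):

# After the jq step: confirm the dataset list made it into the compiled SQL
dataform compile --json | grep -o "dataset_id IN ([^)]*)"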
Issue Explanation
When I execute the script locally, the variables update correctly and the Dataform job runs successfully. However, once deployed in a cloud workflow environment, Dataform executes successfully but returns empty tables. It seems the values from datasets.csv aren't being recognized or passed along in the workflow context; my suspicion is that the workflow compiles against the committed dataform.json, where DATASETS is still "", so the IN clause matches nothing, but I haven't been able to confirm this.
Desired Outcome
- I want the compiled SQL in Dataform to reflect the dynamic datasets passed from datasets.csv:

dataset_id IN ('datasetA','datasetB')
My questions to the community
- Is passing dynamic variables by rewriting dataform.json in place considered best practice?
- Is there a better, officially supported method (perhaps environment variables or CLI parameters) to dynamically pass variables into Dataform at runtime? (See the sketch below for the kind of thing I mean.)
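For instance, I've seen references to a --vars flag on the CLI that would avoid rewriting dataform.json entirely, though I'm not certain it's available in 2.6.7, and since the flag itself seems to use commas to separate multiple key=value pairs, my comma-separated value might need a different delimiter. Something like:

# Hypothetical: pass the variable at invocation time instead of editing dataform.json
DATASETS=$(awk 'NF' datasets.csv | paste -sd, -)
dataform run --credentials "./envs/.df-credentials.json" --vars=DATASETS="$DATASETS"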
Any help, advice, or examples would be greatly appreciated. Thanks in advance!