I am trying to create a DLT (Delta Live Tables) pipeline for the first time. What is the best way to satisfy the requirements below? I am aware that the approach I have chosen may not be optimal, so I am open to design recommendations as well. Here is what I am trying to do:
import dlt
from pyspark.sql.functions import col

@dlt.table(
    name="bronze_dlt_table",
    comment="This table reads data from a Delta location",
    table_properties={
        "quality": "bronze"
    }
)
def read_raw_bronze_dlt_table():
    # Batch read of the Delta output written by the upstream job
    return spark.read.format("delta").load("Delta Table Path written from Upstream location")
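One thing I am not sure about: because I eventually want to use apply_changes downstream (which, as far as I understand, needs a streaming source), should this bronze table be a streaming table instead of a batch read? Roughly this, just a sketch of what I mean, with the same placeholder path:

@dlt.table(
    name="bronze_dlt_table",
    comment="Streaming variant of the bronze ingestion (sketch)",
    table_properties={
        "quality": "bronze"
    }
)
def read_raw_bronze_dlt_table():
    # readStream ingests new upstream data incrementally instead of re-reading everything
    return spark.readStream.format("delta").load("Delta Table Path written from Upstream location")

Then the second table, where I am less sure of the structure: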
@dlt.table(
    name="silver_dlt_table",
    partition_cols=["ABC"],
    table_properties={
        "quality": "silver"
    }
)
def refresh_silver_dlt_table():
    bronzeDF = dlt.read("bronze_dlt_table")
    lookupDF = spark.read.format("delta").load("Lookup Delta table path")
    # Perform some basic column manipulation and joins between bronzeDF and lookupDF
    silverDF = ...  # placeholder for the joined / enriched DataFrame
    dlt.apply_changes(
        target="silver_dlt_table",
        source=silverDF,
        sequence_by=col("Newly added column in silverDF based on lookupDF")
    )
    return
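From what I can tell from the apply_changes documentation, the target has to be created first with dlt.create_streaming_table() and the source has to be the name of a streaming table or view, not a DataFrame, so I suspect calling apply_changes inside my @dlt.table function above is not valid. Below is roughly the restructuring I am considering. It is only a sketch: the key and sequence columns are placeholders, and it assumes the bronze table is defined as a streaming table as in the variant above. Is this the right direction, or is there a better design?

@dlt.view(name="silver_source_view")
def silver_source_view():
    # Streaming read of bronze, enriched with the static lookup table
    bronzeDF = dlt.read_stream("bronze_dlt_table")
    lookupDF = spark.read.format("delta").load("Lookup Delta table path")
    # Column manipulation and join between bronzeDF and lookupDF goes here
    return bronzeDF.join(lookupDF, on="ABC", how="left")  # placeholder join

# Create the target streaming table that apply_changes will maintain
dlt.create_streaming_table(
    name="silver_dlt_table",
    partition_cols=["ABC"],
    table_properties={"quality": "silver"}
)

dlt.apply_changes(
    target="silver_dlt_table",
    source="silver_source_view",
    keys=["ABC"],                    # placeholder key column(s)
    sequence_by=col("sequence_col")  # placeholder ordering column derived from the lookup
)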