I am performing a join where I'd like to have a variable name for the column to join on.
For example, DF1 is an income statement that uses raw names for line items. DF2 contains a mapping of the raw names to cleaned-up names, and the mapping depends on which company's income statement we are looking at. I'd like a variable CO that determines which column to join on. The end result should bring the cleaned-up names into DF1.
An example join is:
DF1.join(DF2, DF1.Company_A == DF2.Final)
How do I define a variable CO that specifies the DF1 column in the join, so that the join becomes:
DF1.join(DF2, DF1.CO == DF2.Final)
I'm not sure how to write this so that Snowflake doesn't treat CO itself as the name of a column in DF1, instead of using the column name that CO holds.
Snowpark solution preferred, but Pandas is OK as long as it works in Snowflake.
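The reason DF1.CO cannot work is plain Python rather than anything Snowflake-specific: dot syntax always looks up the literal identifier written after the dot, never the value of a variable with that name. The sketch below uses a hypothetical ToyFrame class (not a Snowpark class) that resolves unknown attributes as column names, roughly the way a Snowpark DataFrame lets you write df.column_name. It shows that bracket access and the built-in getattr use the value stored in CO, while frame.CO looks for a column literally named "CO".

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame: just a mapping of column names to values."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Called for frame.NAME: Python passes the identifier NAME literally,
        # never the value of a variable that happens to be called NAME.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(f"no column named {name!r}") from None

    def __getitem__(self, name):
        # Called for frame[expr]: expr is evaluated first, so a variable works.
        return self._columns[name]


frame = ToyFrame({"company_a": "raw names A", "company_b": "raw names B"})
CO = "company_a"

print(frame[CO])           # raw names A
print(getattr(frame, CO))  # raw names A (getattr is the dynamic form of dot syntax)

try:
    frame.CO               # looks for a column literally named "CO"
    lookup_error = None
except AttributeError as exc:
    lookup_error = exc
print(lookup_error)        # no column named 'CO'
```

So the fix is to pass the column name as a value (bracket access or a by-name lookup method) instead of writing it as an attribute.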
asked Mar 18 at 21:03 by user29988621, edited Mar 19 at 8:39 by Timus

1 Answer
You can do it with the DataFrame's col() method or with simple bracket (array-style) access, df1[col_name]:
# The Snowpark package is required for Python Worksheets.
# You can add more packages by selecting them using the Packages control and then importing them.
import snowflake.snowpark as snowpark

def main(session: snowpark.Session):
    df1 = session.create_dataframe(
        [("Premium-Life", 1.0, 1.1, 1.2), ("ExpAcct", 0.5, -1.0, 0.0)],
        schema=["company_a", "t0", "t1", "t2"],
    )
    df2 = session.create_dataframe(
        [("Premium", "Premium-Life", "Prem (Ann)"), ("Expenses", "ExpAcct", "MaintExp")],
        schema=["final", "company_a", "company_b"],
    )
    # Static join, column name hard-coded:
    # return df1.join(df2, df1.company_a == df2.company_a)

    # The join column name is held in an ordinary Python string.
    col_name = "company_a"

    # Alternative 1: bracket access resolves the column from the string's value.
    return df1.join(df2, df1[col_name] == df2.company_a)

    # Alternative 2 (use instead of alternative 1): DataFrame.col() does the same.
    # return df1.join(df2, df1.col(col_name) == df2.company_a)
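Since the question says pandas is also acceptable, here is a minimal pandas sketch of the same idea. It assumes, as the sample data above does, that both frames name the raw-name column identically (company_a, company_b, ...); the rows are copied from the answer's example and add_clean_names is a hypothetical helper. merge's on= parameter takes an ordinary string, so the join column can come straight from a variable. To use this inside Snowflake you would first pull the tables into pandas, for example with a Snowpark DataFrame's to_pandas().

```python
import pandas as pd

# Sample rows copied from the Snowpark example above (illustrative only).
df1 = pd.DataFrame(
    [("Premium-Life", 1.0, 1.1, 1.2), ("ExpAcct", 0.5, -1.0, 0.0)],
    columns=["company_a", "t0", "t1", "t2"],
)
df2 = pd.DataFrame(
    [("Premium", "Premium-Life", "Prem (Ann)"), ("Expenses", "ExpAcct", "MaintExp")],
    columns=["final", "company_a", "company_b"],
)


def add_clean_names(raw: pd.DataFrame, mapping: pd.DataFrame, co: str) -> pd.DataFrame:
    """Hypothetical helper: attach mapping['final'] to raw by matching raw[co] to mapping[co]."""
    # Keep only the cleaned name and the chosen company's raw-name column,
    # then left-join so every line item in raw survives even without a match.
    return raw.merge(mapping[["final", co]], on=co, how="left")


CO = "company_a"
result = add_clean_names(df1, df2, CO)
print(result)

# Switching companies only means changing the string.
raw_b = pd.DataFrame([("MaintExp", 3.0)], columns=["company_b", "t0"])
print(add_clean_names(raw_b, df2, "company_b"))
```

A left join is used so that raw line items without a mapping entry are kept with a missing final value rather than silently dropped.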