python - Is there a way to parameterize the name of a column using a Snowpark Dataframe? - Stack Overflow

I am performing a join where I'd like to have a variable name for the column to join on.

For example, DF1 is an income statement that uses raw names for line items. DF2 contains a mapping of the raw names to cleaned up names that depend on which company's income statement we are looking at. I'd like to have a variable CO that determines which column to join on. The end result should be bringing the cleaned up names into DF1.

DF1:

DF2:

An example join is:

DF1.join(DF2, DF1.Company_A == DF2.Final)

How do I define a variable CO to specify the DF1 column in the join? So the join would be:

DF1.join(DF2, DF1.CO == DF2.Final)

I am not sure how to write this in a way that Snowflake doesn't think the variable CO is a column name in DF1.

Snowpark solution preferred, but Pandas is OK as long as it works in Snowflake.

Asked Mar 18 at 21:03 by user29988621; edited Mar 19 at 8:39 by Timus.

1 Answer

You can do it with the DataFrame's col() method, or simply by indexing the DataFrame with the column name, df1[col_name]:

# The Snowpark package is required for Python Worksheets. 
# You can add more packages by selecting them using the Packages control and then importing them.

import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col

def main(session: snowpark.Session): 

    df1 = session.create_dataframe([ ("Premium-Life",1.0,1.1,1.2),("ExpAcct",0.5, -1.0, 0.0)],schema = ["company_a","t0","t1","t2"])

    df2 = session.create_dataframe([ ("Premium", "Premium-Life", "Prem (Ann)"),("Expenses", "ExpAcct","MaintExp")],schema = ["final","company_a","company_b"])

    
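    # Static join, with the column name hard-coded:
    # return df1.join(df2, df1.company_a == df2.company_a)
    col_name = "company_a"

    # alternative 1: index the DataFrame with the column name
    return df1.join(df2, df1[col_name] == df2.company_a)

    # alternative 2 (use in place of alternative 1): DataFrame.col()
    # return df1.join(df2, df1.col(col_name) == df2.company_a)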
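
Since the question says pandas is acceptable too, here is a minimal sketch of the same idea with pandas, reusing the sample rows from the Snowpark code above. The helper name add_clean_names is hypothetical, and the sketch assumes the raw-name column has the same name (the value of co) in both DataFrames, as it does in the sample data; the real DF1 and DF2 layouts are not shown in the question.

```python
import pandas as pd

# Sample rows copied from the Snowpark example above.
df1 = pd.DataFrame(
    [("Premium-Life", 1.0, 1.1, 1.2), ("ExpAcct", 0.5, -1.0, 0.0)],
    columns=["company_a", "t0", "t1", "t2"],
)
df2 = pd.DataFrame(
    [("Premium", "Premium-Life", "Prem (Ann)"), ("Expenses", "ExpAcct", "MaintExp")],
    columns=["final", "company_a", "company_b"],
)


def add_clean_names(statement, mapping, co):
    """Hypothetical helper: attach the cleaned-up name ("final") to each line item.

    `co` is an ordinary Python string holding the name of the join column,
    so it can be chosen at runtime.
    """
    # Keep only the company's raw-name column and the cleaned-up name.
    lookup = mapping[[co, "final"]]
    # A left join keeps every line item of the income statement.
    return statement.merge(lookup, on=co, how="left")


co = "company_a"  # the variable that selects which column to join on
result = add_clean_names(df1, df2, co)
print(result)
```

The point is the same as in the Snowpark version: the column is looked up by a string held in a variable rather than by attribute access, so nothing treats CO itself as a column name.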