join - What is the equivalent SQL for this SAS merge?

I am stuck on this SAS code that I have to rewrite for SQL (PySpark specifically).

data data2 data3;
merge input_2(in=in2) 
      input_1(in=in1);
by col_1
   col_2;

    if in1 and in2 then do;
        new_col = 'yes';
        output data3; 
    end;
    else if in1 then output data2; 
run;

For "if in1 and in2", I believe that's like a SQL inner join. But for "else if in1", this would be a left join, yes?

If so, does the order of "merge input_2 input_1" matter? Is input_2 equivalent to the "left" of a SQL left join?

asked Jan 30 at 15:08 by Chuck, edited Jan 30 at 17:14 by samkart

Comment: The "else" in "else if in1" precludes data2 from being the left join. data2 contains the rows of input_1 that are not paired with a row in input_2; that is, data2 is input_1 EXCEPT input_2. – Richard, Jan 30 at 17:20

1 Answer

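Yes, "if in1 and in2" corresponds to an inner join.
"else if in1" on its own selects only the rows of input_1 that have no match in input_2, which is a left anti join. Taken together, the two branches cover a left join with input_1 as the left table.
The order of the data sets in MERGE does not determine the "left" data set. The in= flags do: every output branch requires in1, so input_1 plays the role of the left table in SQL. (The order does matter when both data sets share non-key variables, because values from the data set listed later overwrite those from the earlier one.)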
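To make the routing of rows concrete, here is a minimal plain-Python sketch of what the in= flags do. The key values are made up for illustration, route_rows is a hypothetical helper, and the sketch assumes each BY-key combination is unique within each input (it ignores the non-key variables entirely).

```python
# Toy BY-key values (col_1, col_2); the data are made up for illustration.
input_1_keys = [("a", 1), ("b", 2), ("c", 3)]
input_2_keys = [("b", 2), ("d", 4)]


def route_rows(keys_1, keys_2):
    """Mimic the in=/output logic of the DATA step, assuming unique BY keys."""
    set_1, set_2 = set(keys_1), set(keys_2)
    data2, data3 = [], []
    # A merge visits every BY group found in either input.
    for key in sorted(set_1 | set_2):
        in1, in2 = key in set_1, key in set_2
        if in1 and in2:
            data3.append(key + ("yes",))  # matched: new_col = 'yes'
        elif in1:
            data2.append(key)  # present only in input_1
        # keys present only in input_2 are written to neither output
    return data2, data3


data2, data3 = route_rows(input_1_keys, input_2_keys)
print("data2:", data2)
print("data3:", data3)
```

Under these assumptions the key found only in input_2 never reaches either output, which is why input_2 cannot be the "left" side here.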

You can try:

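from pyspark.sql.functions import col, lit, when

# Tag every input_2 row so that matches can be detected after the join.
# The join keys come from input_1, so they cannot signal a match.
# "_in2" is a helper column name chosen here; it plays the role of SAS's in2.
input_2_flagged = input_2_df.withColumn("_in2", lit(True))

merged_df = input_1_df.join(input_2_flagged, on=["col_1", "col_2"], how="left")

# new_col is 'yes' only where input_2 contributed a row (SAS: in1 and in2)
result_df = merged_df.withColumn(
    "new_col",
    when(col("_in2").isNotNull(), lit("yes"))
)

# Split into the two SAS outputs and drop the helper column
data3_df = result_df.filter(col("_in2").isNotNull()).drop("_in2")
data2_df = result_df.filter(col("_in2").isNull()).drop("_in2")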
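
Another way to express the same split is to build each output with its own join: an inner join for data3 and a left anti join for data2. The sketch below is only an illustration under stated assumptions: split_like_sas_merge is a hypothetical helper name, the key combination is assumed to be unique in each input, and the inputs are assumed to share no non-key column names.

```python
def split_like_sas_merge(input_1_df, input_2_df, keys=("col_1", "col_2")):
    """Return (data2_df, data3_df) for two PySpark DataFrames.

    Hypothetical helper. Assumes the key combination is unique in each
    input and that the inputs share no non-key column names.
    """
    keys = list(keys)

    # in1 and in2: rows whose keys exist in both inputs, tagged with new_col
    data3_df = (
        input_1_df.join(input_2_df, on=keys, how="inner")
        .selectExpr("*", "'yes' AS new_col")
    )

    # in1 only: rows of input_1 whose keys do not appear in input_2
    data2_df = input_1_df.join(input_2_df, on=keys, how="left_anti")

    return data2_df, data3_df
```

It would be called as `data2_df, data3_df = split_like_sas_merge(input_1_df, input_2_df)`. One difference from the SAS step: a left anti join returns only the columns of input_1, whereas the SAS data2 would also carry input_2's variables and new_col as missing values. If those columns are needed downstream, they have to be added back as nulls.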
