apache spark sql - Merging sequence of dataframes

I have a sequence of dataframes, each hypothetically having the following composition:

date | LA Physics Avg 
2024-02-05 | 88

date | Chicago Chemistry Avg 
2024-02-05 | 74

The desired output after merging the dataframes in the sequence would be:

date | LA Physics Avg  | Chicago Chemistry Avg
2024-02-05 | 88 | 74

The attempted code is below:

seqOfDF.tail.foldLeft(seqOfDF.head){ (df1,df2)=>df1.join(df2, "date") }

This works fine when all the dataframes in the sequence have the same 'date'.
However, if the sequence contains a dataframe with a different date, the result is wrong: the default join on "date" is an inner join, so any row whose date is not present in every dataframe is dropped.

So, if the following dataframe were also part of the sequence:

date | Houston Biology Avg 
2024-03-08 | 52

I would get an empty dataframe with only the header:

date | LA Physics Avg  | Chicago Chemistry Avg | Houston Biology Avg

The desired output would be:

date | LA Physics Avg  | Chicago Chemistry Avg | Houston Biology Avg
2024-02-05 | 88 | 74 | 0
2024-03-08 | 0 | 0 | 52

How do I achieve this?
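
One possible approach, sketched here under some assumptions, is to replace the inner join with a full outer join on "date" and then fill the resulting nulls with 0. The helper name mergeOnDate, the local SparkSession, and the sample data built with toDF are all hypothetical and only serve to make the example self-contained. It also assumes the average columns are numeric and the sequence is non-empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object MergeByDate {
  // Hypothetical helper: a full outer join keeps dates that appear in only
  // some of the dataframes; the nulls this produces in numeric columns are
  // then replaced with 0. Note that reduce throws on an empty sequence.
  def mergeOnDate(dfs: Seq[DataFrame]): DataFrame =
    dfs
      .reduce((left, right) => left.join(right, Seq("date"), "outer"))
      .na.fill(0L)

  def main(args: Array[String]): Unit = {
    // Local session used for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-by-date")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the three dataframes in the question.
    val seqOfDF = Seq(
      Seq(("2024-02-05", 88)).toDF("date", "LA Physics Avg"),
      Seq(("2024-02-05", 74)).toDF("date", "Chicago Chemistry Avg"),
      Seq(("2024-03-08", 52)).toDF("date", "Houston Biology Avg")
    )

    mergeOnDate(seqOfDF).orderBy("date").show()

    spark.stop()
  }
}
```

Because the join uses the column-name form (Seq("date")), the result has a single "date" column rather than one per input. na.fill(0L) only touches numeric columns, so a string or date typed "date" column is left alone. Filling with 0 also makes a genuinely missing average indistinguishable from an average of 0, so keeping the nulls may be preferable if that distinction matters.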
