
apache spark sql - Merging sequence of dataframes - Stack Overflow


I have a sequence of dataframes, each hypothetically shaped like the following:

date | LA Physics Avg 
2024-02-05 | 88

date | Chicago Chemistry Avg 
2024-02-05 | 74

The desired output after merging the dataframes in the sequence is:

date | LA Physics Avg | Chicago Chemistry Avg
2024-02-05 | 88 | 74

The code I attempted is below:

seqOfDF.tail.foldLeft(seqOfDF.head){ (df1,df2)=>df1.join(df2, "date") }

This works when all the dataframes in the sequence share the same date.
However, if the sequence contains a dataframe with a different date, the result is wrong: join(df2, "date") is an inner join by default, so only dates present in every dataframe survive.
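One possible fix, sketched under stated assumptions rather than offered as the definitive answer: keep the same fold structure but join with a full outer join on date, then replace the nulls it introduces with 0. This assumes Spark 2.x or later and numeric average columns. The local SparkSession and the sample dataframes (including a third one with a different date) are hypothetical stand-ins for the question's data, written as a scala-cli or spark-shell style script.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("merge-dfs").getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the question.
val seqOfDF: Seq[DataFrame] = Seq(
  Seq(("2024-02-05", 88)).toDF("date", "LA Physics Avg"),
  Seq(("2024-02-05", 74)).toDF("date", "Chicago Chemistry Avg"),
  Seq(("2024-03-08", 52)).toDF("date", "Houston Biology Avg")
)

// A full outer join keeps dates missing from either side. Joining on
// Seq("date") produces a single (coalesced) date column, not two.
val merged = seqOfDF
  .reduce((left, right) => left.join(right, Seq("date"), "full_outer"))
  .na.fill(0)      // nulls in numeric columns become 0
  .orderBy("date") // join output order is not guaranteed

merged.show()
```

The reduce call is equivalent to the original tail.foldLeft(head) pattern; only the join type changes. Note that na.fill(0) touches numeric columns only, so a genuinely missing measurement and a real 0 become indistinguishable.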

So, if the following dataframe were also part of the sequence:

date | Houston Biology Avg
2024-03-08 | 52

I get an empty dataframe with only the header:

date | LA Physics Avg | Chicago Chemistry Avg | Houston Biology Avg

The desired output is:

date | LA Physics Avg | Chicago Chemistry Avg | Houston Biology Avg
2024-02-05 | 88 | 74 | 0
2024-03-08 | 0 | 0 | 52

How do I achieve this?
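An alternative sketch that avoids chaining joins: stack the dataframes with unionByName, letting missing columns become null, then collapse to one row per date. This assumes Spark 3.1 or later (where unionByName accepts allowMissingColumns) and, importantly, that each dataframe holds at most one row per date, so max simply picks that single non-null value; with duplicate dates per source you would need a different aggregate. The session and sample data are again hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

// Hypothetical local session for illustration.
val spark = SparkSession.builder().master("local[*]").appName("union-merge").getOrCreate()
import spark.implicits._

// Hypothetical sample data mirroring the question.
val seqOfDF: Seq[DataFrame] = Seq(
  Seq(("2024-02-05", 88)).toDF("date", "LA Physics Avg"),
  Seq(("2024-02-05", 74)).toDF("date", "Chicago Chemistry Avg"),
  Seq(("2024-03-08", 52)).toDF("date", "Houston Biology Avg")
)

// Stack all frames by column name; columns absent from a frame are filled with null.
val stacked = seqOfDF.reduce(_.unionByName(_, allowMissingColumns = true))

// One aggregate per value column. max ignores nulls, so under the
// one-row-per-date assumption each date keeps its single real value.
val aggs = stacked.columns.toSeq
  .filterNot(_ == "date")
  .map(c => max(col(c)).as(c))

val merged = stacked
  .groupBy("date")
  .agg(aggs.head, aggs.tail: _*)
  .na.fill(0)      // dates with no value for a column get 0
  .orderBy("date")

merged.show()
```

Compared with the full-outer-join fold, this builds a single shuffle-based aggregation instead of a chain of joins, which may scale better for long sequences, but it relies on the one-row-per-date assumption stated above.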
