最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - How to merge list of dataframes where neither columns nor rows are shared by all dataframes in list - Stack Overflow

programmeradmin3浏览0评论

Given a list of data frames where each dataframe in the list shares a common column name 'ID', but otherwise only overlap in their values of 'ID' OR in the names of the other columns, but not both. How can I merge these into a single dataframe? I've tried to use a couple of ways of specifying merge and Reduce but can't get it to work. I would think that the below should do it, but the way merge requires unique column names produces erroneous results. Reduce(function(x,y) merge(x,y,by='ID',all.x=T,all.y=T),dflist)

df1=data.frame(ID=c(1,2,3,4,5),A=c('a','b','c','d','e'),B=c('aa','bb','cc','dd','ee'))
df2=data.frame(ID=c(6,7,8,9,10),A=c('f','g','h','i','j'),B=c('ff','gg','hh','ii','jj'))
df3=data.frame(ID=c(1,2,3,4,5),C=c('a','b','c','d','e'),D=c('aa','bb','cc','dd','ee'))
df4=data.frame(ID=c(6,7,8,9,10),C=c('f','g','h','i','j'),D=c('ff','gg','hh','ii','jj'))
df5=data.frame(ID=c(11,12,13,1114,15),E=c('f','g','h','i','j'),F=c('ff','gg','hh','ii','jj'))
dflist<- list(df1,df2,df3,df4,df5)

Desired Output:

ID A  B  C  D  E  F
1  a  aa a  aa NA NA
2  b  bb b  bb NA NA
3  c  cc c  cc NA NA
4  d  dd d  dd NA NA
5  e  ee e  ee NA NA
6  f  ff f  ff NA NA
7  g  gg g  gg NA NA
8  h  hh h  hh NA NA
9  i  ii i  ii NA NA
10 j  jj j  jj NA NA
11 NA NA NA NA f  ff
12 NA NA NA NA g  gg
13 NA NA NA NA h  hh
14 NA NA NA NA i  ii
15 NA NA NA NA j  jj

Edit: I failed to consider the case wherein a dataframe in the list contains neither the rows nor columns previously encountered.

Given a list of data frames where each dataframe in the list shares a common column name 'ID', but otherwise only overlap in their values of 'ID' OR in the names of the other columns, but not both. How can I merge these into a single dataframe? I've tried to use a couple of ways of specifying merge and Reduce but can't get it to work. I would think that the below should do it, but the way merge requires unique column names produces erroneous results. Reduce(function(x,y) merge(x,y,by='ID',all.x=T,all.y=T),dflist)

df1=data.frame(ID=c(1,2,3,4,5),A=c('a','b','c','d','e'),B=c('aa','bb','cc','dd','ee'))
df2=data.frame(ID=c(6,7,8,9,10),A=c('f','g','h','i','j'),B=c('ff','gg','hh','ii','jj'))
df3=data.frame(ID=c(1,2,3,4,5),C=c('a','b','c','d','e'),D=c('aa','bb','cc','dd','ee'))
df4=data.frame(ID=c(6,7,8,9,10),C=c('f','g','h','i','j'),D=c('ff','gg','hh','ii','jj'))
df5=data.frame(ID=c(11,12,13,1114,15),E=c('f','g','h','i','j'),F=c('ff','gg','hh','ii','jj'))
dflist<- list(df1,df2,df3,df4,df5)

Desired Output:

ID A  B  C  D  E  F
1  a  aa a  aa NA NA
2  b  bb b  bb NA NA
3  c  cc c  cc NA NA
4  d  dd d  dd NA NA
5  e  ee e  ee NA NA
6  f  ff f  ff NA NA
7  g  gg g  gg NA NA
8  h  hh h  hh NA NA
9  i  ii i  ii NA NA
10 j  jj j  jj NA NA
11 NA NA NA NA f  ff
12 NA NA NA NA g  gg
13 NA NA NA NA h  hh
14 NA NA NA NA i  ii
15 NA NA NA NA j  jj

Edit: I failed to consider the case wherein a dataframe in the list contains neither the rows nor columns previously encountered.

Share Improve this question edited Feb 6 at 21:59 W. Kessler asked Feb 6 at 21:21 W. KesslerW. Kessler 4475 silver badges10 bronze badges 0
Add a comment  | 

2 Answers 2

Reset to default 4

You can try Reduce on top of tapply

Reduce(
    \(x, y) merge(x, y, all = TRUE),
    tapply(
        dflist,
        sapply(dflist, \(x) toString(names(x))),
        \(d) do.call(rbind, d)
    )
)

which gives

     ID    A    B    C    D    E    F
1     1    a   aa    a   aa <NA> <NA>
2     2    b   bb    b   bb <NA> <NA>
3     3    c   cc    c   cc <NA> <NA>
4     4    d   dd    d   dd <NA> <NA>
5     5    e   ee    e   ee <NA> <NA>
6     6    f   ff    f   ff <NA> <NA>
7     7    g   gg    g   gg <NA> <NA>
8     8    h   hh    h   hh <NA> <NA>
9     9    i   ii    i   ii <NA> <NA>
10   10    j   jj    j   jj <NA> <NA>
11   11 <NA> <NA> <NA> <NA>    f   ff
12   12 <NA> <NA> <NA> <NA>    g   gg
13   13 <NA> <NA> <NA> <NA>    h   hh
14   15 <NA> <NA> <NA> <NA>    j   jj
15 1114 <NA> <NA> <NA> <NA>    i   ii

Here's a dplyr approach using bind_rows(), which is is similar to do.call(rbind, dfs), but the output will contain all columns that appear in any of the inputs. The summarise() call ensures every column keeps its non-missing values for each ID.

library(dplyr)

bind_rows(dflist) |>
    summarise(
        across(everything(), \(x) if (all(is.na(x))) NA else na.omit(x)),
        .by = ID
    )
#      ID    A    B    C    D    E    F
# 1     1    a   aa    a   aa <NA> <NA>
# 2     2    b   bb    b   bb <NA> <NA>
# 3     3    c   cc    c   cc <NA> <NA>
# 4     4    d   dd    d   dd <NA> <NA>
# 5     5    e   ee    e   ee <NA> <NA>
# 6     6    f   ff    f   ff <NA> <NA>
# 7     7    g   gg    g   gg <NA> <NA>
# 8     8    h   hh    h   hh <NA> <NA>
# 9     9    i   ii    i   ii <NA> <NA>
# 10   10    j   jj    j   jj <NA> <NA>
# 11   11 <NA> <NA> <NA> <NA>    f   ff
# 12   12 <NA> <NA> <NA> <NA>    g   gg
# 13   13 <NA> <NA> <NA> <NA>    h   hh
# 14 1114 <NA> <NA> <NA> <NA>    i   ii
# 15   15 <NA> <NA> <NA> <NA>    j   jj

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论