Given a list of data frames where each dataframe in the list shares a common column name 'ID', but otherwise only overlap in their values of 'ID' OR in the names of the other columns, but not both. How can I merge these into a single dataframe?
I've tried to use a couple of ways of specifying merge
and Reduce
but can't get it to work. I would think that the below should do it, but the way merge requires unique column names produces erroneous results.
Reduce(function(x,y) merge(x,y,by='ID',all.x=T,all.y=T),dflist)
df1=data.frame(ID=c(1,2,3,4,5),A=c('a','b','c','d','e'),B=c('aa','bb','cc','dd','ee'))
df2=data.frame(ID=c(6,7,8,9,10),A=c('f','g','h','i','j'),B=c('ff','gg','hh','ii','jj'))
df3=data.frame(ID=c(1,2,3,4,5),C=c('a','b','c','d','e'),D=c('aa','bb','cc','dd','ee'))
df4=data.frame(ID=c(6,7,8,9,10),C=c('f','g','h','i','j'),D=c('ff','gg','hh','ii','jj'))
df5=data.frame(ID=c(11,12,13,1114,15),E=c('f','g','h','i','j'),F=c('ff','gg','hh','ii','jj'))
dflist<- list(df1,df2,df3,df4,df5)
Desired Output:
ID A B C D E F
1 a aa a aa NA NA
2 b bb b bb NA NA
3 c cc c cc NA NA
4 d dd d dd NA NA
5 e ee e ee NA NA
6 f ff f ff NA NA
7 g gg g gg NA NA
8 h hh h hh NA NA
9 i ii i ii NA NA
10 j jj j jj NA NA
11 NA NA NA NA f ff
12 NA NA NA NA g gg
13 NA NA NA NA h hh
14 NA NA NA NA i ii
15 NA NA NA NA j jj
Edit: I failed to consider the case wherein a dataframe in the list contains neither the rows nor columns previously encountered.
Given a list of data frames where each dataframe in the list shares a common column name 'ID', but otherwise only overlap in their values of 'ID' OR in the names of the other columns, but not both. How can I merge these into a single dataframe?
I've tried to use a couple of ways of specifying merge
and Reduce
but can't get it to work. I would think that the below should do it, but the way merge requires unique column names produces erroneous results.
Reduce(function(x,y) merge(x,y,by='ID',all.x=T,all.y=T),dflist)
df1=data.frame(ID=c(1,2,3,4,5),A=c('a','b','c','d','e'),B=c('aa','bb','cc','dd','ee'))
df2=data.frame(ID=c(6,7,8,9,10),A=c('f','g','h','i','j'),B=c('ff','gg','hh','ii','jj'))
df3=data.frame(ID=c(1,2,3,4,5),C=c('a','b','c','d','e'),D=c('aa','bb','cc','dd','ee'))
df4=data.frame(ID=c(6,7,8,9,10),C=c('f','g','h','i','j'),D=c('ff','gg','hh','ii','jj'))
df5=data.frame(ID=c(11,12,13,1114,15),E=c('f','g','h','i','j'),F=c('ff','gg','hh','ii','jj'))
dflist<- list(df1,df2,df3,df4,df5)
Desired Output:
ID A B C D E F
1 a aa a aa NA NA
2 b bb b bb NA NA
3 c cc c cc NA NA
4 d dd d dd NA NA
5 e ee e ee NA NA
6 f ff f ff NA NA
7 g gg g gg NA NA
8 h hh h hh NA NA
9 i ii i ii NA NA
10 j jj j jj NA NA
11 NA NA NA NA f ff
12 NA NA NA NA g gg
13 NA NA NA NA h hh
14 NA NA NA NA i ii
15 NA NA NA NA j jj
Edit: I failed to consider the case wherein a dataframe in the list contains neither the rows nor columns previously encountered.
Share Improve this question edited Feb 6 at 21:59 W. Kessler asked Feb 6 at 21:21 W. KesslerW. Kessler 4475 silver badges10 bronze badges 02 Answers
Reset to default 4You can try Reduce
on top of tapply
Reduce(
\(x, y) merge(x, y, all = TRUE),
tapply(
dflist,
sapply(dflist, \(x) toString(names(x))),
\(d) do.call(rbind, d)
)
)
which gives
ID A B C D E F
1 1 a aa a aa <NA> <NA>
2 2 b bb b bb <NA> <NA>
3 3 c cc c cc <NA> <NA>
4 4 d dd d dd <NA> <NA>
5 5 e ee e ee <NA> <NA>
6 6 f ff f ff <NA> <NA>
7 7 g gg g gg <NA> <NA>
8 8 h hh h hh <NA> <NA>
9 9 i ii i ii <NA> <NA>
10 10 j jj j jj <NA> <NA>
11 11 <NA> <NA> <NA> <NA> f ff
12 12 <NA> <NA> <NA> <NA> g gg
13 13 <NA> <NA> <NA> <NA> h hh
14 15 <NA> <NA> <NA> <NA> j jj
15 1114 <NA> <NA> <NA> <NA> i ii
Here's a dplyr
approach using bind_rows()
, which is is similar to do.call(rbind, dfs)
, but the output will contain all columns that appear in any of the inputs. The summarise()
call ensures every column keeps its non-missing values for each ID
.
library(dplyr)
bind_rows(dflist) |>
summarise(
across(everything(), \(x) if (all(is.na(x))) NA else na.omit(x)),
.by = ID
)
# ID A B C D E F
# 1 1 a aa a aa <NA> <NA>
# 2 2 b bb b bb <NA> <NA>
# 3 3 c cc c cc <NA> <NA>
# 4 4 d dd d dd <NA> <NA>
# 5 5 e ee e ee <NA> <NA>
# 6 6 f ff f ff <NA> <NA>
# 7 7 g gg g gg <NA> <NA>
# 8 8 h hh h hh <NA> <NA>
# 9 9 i ii i ii <NA> <NA>
# 10 10 j jj j jj <NA> <NA>
# 11 11 <NA> <NA> <NA> <NA> f ff
# 12 12 <NA> <NA> <NA> <NA> g gg
# 13 13 <NA> <NA> <NA> <NA> h hh
# 14 1114 <NA> <NA> <NA> <NA> i ii
# 15 15 <NA> <NA> <NA> <NA> j jj