I present here an input
data frame that contains lists of data frames that contains lists.
Some of the bottom level lists are empty and some lists have length greater than one.
I am looking for some R code that will turn the input
into the output
(also given below)
input = structure(list(pet = c("colin", "fred", "roy"),
fruit = list(structure(list(apple = "red",
banana = "yellow",
mango = "green"),
class = "data.frame",
row.names = 1L),
structure(list(apple = "mouldy",
banana = "bruised",
mango = "cut"),
class = "data.frame",
row.names = 1L),
structure(list(apple = c("windfall", "cooking"),
banana = c("picked", "ripe"),
mango = c("stolen", "round")),
class = "data.frame",
row.names = 1:2)),
flavours = list(structure(list()),
structure(list(sweet = "very",
sour = "ouch",
spicy = "hot"),
class = "data.frame",
row.names = 1L),
structure(list(sweet = c("sugary", "calories"),
sour = c("citrus", "lemon"),
spicy = c("inferno", "burning")),
class = "data.frame",
row.names = 1:2))),
row.names = c(NA, 3L),
class = "data.frame")
output = data.frame(pet = c("colin", "fred", "roy", "roy"),
fruit.apple = c("red", "mouldy", "windfall", "cooking"),
fruit.banana = c("yellow", "bruised", "picked", "ripe"),
fruit.mango = c("green", "cut", "stolen", "round"),
flavours.sweet = c(NA, "very", "sugary", "calories"),
flavours.sour = c(NA, "ouch", "citrus", "lemon"),
flavours.spicy = c(NA, "hot", "inferno", "burning"))
The features must be that the output
data frame has column names that are the concatenation of the names already existing in the input
, separated by a dot.
Where the inner data frame has a list that's empty, this should give rise to NA in the final data frame, and not throw any errors.
Where either of the inner data frames has a list with length greater than one, this length will be matched across all lists in that row (it's a design of the input data), and should give rise to two rows in the output
with the corresponding values.
I have tried a wide combination of approaches already using such things as rrapply::rrapply(), data.table::'s .SDcol within lapply, unlist(), and every flatten() function that I can find, from purrr:: to jsonlite:: ! None have worked for me so far.
I found some Stack Overflow links (here and here) that came close to what I wanted, but none delivered the correct column names, dealing with empty lists, and lists of greater than length one all at once.
Can you help please? Thank you.
I present here an input
data frame that contains lists of data frames that contains lists.
Some of the bottom level lists are empty and some lists have length greater than one.
I am looking for some R code that will turn the input
into the output
(also given below)
input = structure(list(pet = c("colin", "fred", "roy"),
fruit = list(structure(list(apple = "red",
banana = "yellow",
mango = "green"),
class = "data.frame",
row.names = 1L),
structure(list(apple = "mouldy",
banana = "bruised",
mango = "cut"),
class = "data.frame",
row.names = 1L),
structure(list(apple = c("windfall", "cooking"),
banana = c("picked", "ripe"),
mango = c("stolen", "round")),
class = "data.frame",
row.names = 1:2)),
flavours = list(structure(list()),
structure(list(sweet = "very",
sour = "ouch",
spicy = "hot"),
class = "data.frame",
row.names = 1L),
structure(list(sweet = c("sugary", "calories"),
sour = c("citrus", "lemon"),
spicy = c("inferno", "burning")),
class = "data.frame",
row.names = 1:2))),
row.names = c(NA, 3L),
class = "data.frame")
output = data.frame(pet = c("colin", "fred", "roy", "roy"),
fruit.apple = c("red", "mouldy", "windfall", "cooking"),
fruit.banana = c("yellow", "bruised", "picked", "ripe"),
fruit.mango = c("green", "cut", "stolen", "round"),
flavours.sweet = c(NA, "very", "sugary", "calories"),
flavours.sour = c(NA, "ouch", "citrus", "lemon"),
flavours.spicy = c(NA, "hot", "inferno", "burning"))
The features must be that the output
data frame has column names that are the concatenation of the names already existing in the input
, separated by a dot.
Where the inner data frame has a list that's empty, this should give rise to NA in the final data frame, and not throw any errors.
Where either of the inner data frames has a list with length greater than one, this length will be matched across all lists in that row (it's a design of the input data), and should give rise to two rows in the output
with the corresponding values.
I have tried a wide combination of approaches already using such things as rrapply::rrapply(), data.table::'s .SDcol within lapply, unlist(), and every flatten() function that I can find, from purrr:: to jsonlite:: ! None have worked for me so far.
I found some Stack Overflow links (here and here) that came close to what I wanted, but none delivered the correct column names, dealing with empty lists, and lists of greater than length one all at once.
Can you help please? Thank you.
Share Improve this question asked 20 hours ago NevilNevil 1711 silver badge11 bronze badges 1 |2 Answers
Reset to default 1Here's one way to do this with tidyr::unnest
library(tidyr)
input |>
unnest_wider(col = c(fruit, flavours),
names_sep = '.') |>
unnest(cols = -pet)
#> # A tibble: 4 × 7
#> pet fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 colin red yellow green <NA> <NA>
#> 2 fred mouldy bruised cut very ouch
#> 3 roy windfall picked stolen sugary citrus
#> 4 roy cooking ripe round calories lemon
#> # ℹ 1 more variable: flavours.spicy <chr>
You need to replace
list()
with sufficient amount of NA
.
> cbind(
+ pet=rep(input$pet, sapply(input$fruit, nrow)),
+ fruit=do.call(rbind, input$fruit),
+ flavours=do.call(rbind, replace(input$flavours, lengths(input$flavours) == 0, list(rep_len(NA, nrow(input)))))
+ )
pet fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour flavours.spicy
1 colin red yellow green <NA> <NA> <NA>
2 fred mouldy bruised cut very ouch hot
3 roy windfall picked stolen sugary citrus inferno
4 roy cooking ripe round calories lemon burning
You can wrap this in a function:
> fx <- \(x) {do.call(rbind, replace(x, lengths(x) == 0, list(rep_len(NA, length(x)))))}
> cbind(pet=rep(input$pet, sapply(input$fruit, nrow)), do.call(cbind, lapply(input[-1], fx)))
pet fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour flavours.spicy
1 colin red yellow green <NA> <NA> <NA>
2 fred mouldy bruised cut very ouch hot
3 roy windfall picked stolen sugary citrus inferno
4 roy cooking ripe round calories lemon burning
collapse::unlist2d(input)
and it seems to return the sameinput
object, unchanged in any discernible way. – Nevil Commented 19 hours ago