r - Flattening lists within a dataframe within a dataframe, whilst preserving names - Stack Overflow

I present here an input data frame that contains lists of data frames, which in turn contain lists. Some of the bottom-level lists are empty and some have length greater than one. I am looking for some R code that will turn the input into the output (also given below).

input = structure(list(pet = c("colin", "fred", "roy"),
                       fruit = list(structure(list(apple = "red",
                                                   banana = "yellow", 
                                                   mango = "green"),
                                              class = "data.frame",
                                              row.names = 1L), 
                                    structure(list(apple = "mouldy",
                                                   banana = "bruised",
                                                   mango = "cut"),
                                              class = "data.frame",
                                              row.names = 1L), 
                                    structure(list(apple = c("windfall", "cooking"),
                                                   banana = c("picked", "ripe"),
                                                   mango = c("stolen", "round")),
                                              class = "data.frame",
                                              row.names = 1:2)), 
                       flavours = list(structure(list()),
                                       structure(list(sweet = "very",
                                                      sour = "ouch", 
                                                      spicy = "hot"),
                                                 class = "data.frame",
                                                 row.names = 1L), 
                                       structure(list(sweet = c("sugary", "calories"),
                                                      sour = c("citrus", "lemon"),
                                                      spicy = c("inferno", "burning")),
                                                 class = "data.frame",
                                                 row.names = 1:2))),
                  row.names = c(NA, 3L),
                  class = "data.frame")

output = data.frame(pet = c("colin", "fred", "roy", "roy"),
                    fruit.apple = c("red", "mouldy", "windfall", "cooking"),
                    fruit.banana = c("yellow", "bruised", "picked", "ripe"),
                    fruit.mango = c("green", "cut", "stolen", "round"),
                    flavours.sweet = c(NA, "very", "sugary", "calories"),
                    flavours.sour = c(NA, "ouch", "citrus", "lemon"),
                    flavours.spicy = c(NA, "hot", "inferno", "burning"))

The output data frame must have column names formed by concatenating the names already present in the input, separated by a dot. Where an inner entry is an empty list, the corresponding output columns should be NA, and no error should be thrown. Where an inner data frame has lists of length greater than one, that length is matched across all lists in that row (the input data is designed this way), and each element should give rise to its own output row with the corresponding values (two rows in this example).
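The two irregularities described above (an empty entry, and entries with more than one row) can be seen by inspecting the list columns directly. This is a minimal sketch, assuming a compact rebuild of the same input with `data.frame()` instead of `structure()`; the names `empty_flavours` and `rows_needed` are illustrative only.

```r
# Compact rebuild of the question's input (same values, built with data.frame()).
input <- data.frame(pet = c("colin", "fred", "roy"))
input$fruit <- list(
  data.frame(apple = "red", banana = "yellow", mango = "green"),
  data.frame(apple = "mouldy", banana = "bruised", mango = "cut"),
  data.frame(apple = c("windfall", "cooking"),
             banana = c("picked", "ripe"),
             mango = c("stolen", "round"))
)
input$flavours <- list(
  list(),  # the empty entry for "colin"
  data.frame(sweet = "very", sour = "ouch", spicy = "hot"),
  data.frame(sweet = c("sugary", "calories"),
             sour = c("citrus", "lemon"),
             spicy = c("inferno", "burning"))
)

# lengths() gives the number of columns of each inner data frame,
# and 0 for an empty list(), so this flags the entries that need NA.
empty_flavours <- lengths(input$flavours) == 0

# Number of output rows each input row must expand to,
# taken from the fruit column, which is never empty here.
rows_needed <- vapply(input$fruit, nrow, integer(1))

print(empty_flavours)
print(rows_needed)
```

Any solution has to fill the flagged entries with NA and repeat `pet` according to `rows_needed`.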

I have already tried a wide range of approaches, including rrapply::rrapply(), data.table's .SDcols within lapply(), unlist(), and every flatten() function that I can find, from purrr:: to jsonlite::. None have worked for me so far.

I found some Stack Overflow links (here and here) that came close to what I wanted, but none of them delivered the correct column names, handled empty lists, and handled lists of length greater than one all at once.

Can you help please? Thank you.

Asked 20 hours ago by Nevil
  • Thanks Friede. Sadly, I tried collapse::unlist2d(input) and it seems to return the same input object, unchanged in any discernible way. – Nevil, commented 19 hours ago

2 Answers

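Here's one way to do this with tidyr: unnest_wider() spreads the inner columns out under dotted names, then unnest() expands the multi-row entries into separate rows.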

library(tidyr)

input |>
  unnest_wider(col = c(fruit, flavours),
               names_sep = '.') |>
  unnest(cols = -pet)
#> # A tibble: 4 × 7
#>   pet   fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour
#>   <chr> <chr>       <chr>        <chr>       <chr>          <chr>        
#> 1 colin red         yellow       green       <NA>           <NA>         
#> 2 fred  mouldy      bruised      cut         very           ouch         
#> 3 roy   windfall    picked       stolen      sugary         citrus       
#> 4 roy   cooking     ripe         round       calories       lemon        
#> # ℹ 1 more variable: flavours.spicy <chr>

In base R, you need to replace each empty list() with a sufficient number of NA values before binding the rows.

> cbind(
+   pet=rep(input$pet, sapply(input$fruit, nrow)),
+   fruit=do.call(rbind, input$fruit),
+   flavours=do.call(rbind, replace(input$flavours, lengths(input$flavours) == 0, list(rep_len(NA, nrow(input)))))
+ )
    pet fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour flavours.spicy
1 colin         red       yellow       green           <NA>          <NA>           <NA>
2  fred      mouldy      bruised         cut           very          ouch            hot
3   roy    windfall       picked      stolen         sugary        citrus        inferno
4   roy     cooking         ripe       round       calories         lemon        burning

You can wrap this in a function:

> fx <- \(x) {do.call(rbind, replace(x, lengths(x) == 0, list(rep_len(NA, length(x)))))}
> cbind(pet=rep(input$pet, sapply(input$fruit, nrow)), do.call(cbind, lapply(input[-1], fx)))
    pet fruit.apple fruit.banana fruit.mango flavours.sweet flavours.sour flavours.spicy
1 colin         red       yellow       green           <NA>          <NA>           <NA>
2  fred      mouldy      bruised         cut           very          ouch            hot
3   roy    windfall       picked      stolen         sugary        citrus        inferno
4   roy     cooking         ripe       round       calories         lemon        burning
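The NA fill above uses a length that happens to fit this input. This is a hedged variant that sizes the NA rows from the column names of a non-empty sibling entry instead. It is a sketch and not part of either answer: `flatten_nested` and its `id` argument are hypothetical names. It assumes R >= 4.0, that every nested column has at least one non-empty entry, and that non-empty entries in the same input row have the same row count. The input is rebuilt compactly with `data.frame()` so the block runs on its own.

```r
# Compact rebuild of the question's input (same values, built with data.frame()).
input <- data.frame(pet = c("colin", "fred", "roy"))
input$fruit <- list(
  data.frame(apple = "red", banana = "yellow", mango = "green"),
  data.frame(apple = "mouldy", banana = "bruised", mango = "cut"),
  data.frame(apple = c("windfall", "cooking"),
             banana = c("picked", "ripe"),
             mango = c("stolen", "round"))
)
input$flavours <- list(
  list(),
  data.frame(sweet = "very", sour = "ouch", spicy = "hot"),
  data.frame(sweet = c("sugary", "calories"),
             sour = c("citrus", "lemon"),
             spicy = c("inferno", "burning"))
)

# Hypothetical helper: flatten every list column of `df` except `id`.
flatten_nested <- function(df, id = "pet") {
  nested <- setdiff(names(df), id)

  # Rows needed per input row: the largest inner row count, and at least 1
  # so that a row whose entries are all empty still yields one NA row.
  n_rows <- vapply(seq_len(nrow(df)), function(i) {
    max(1L, vapply(nested, function(col) NROW(df[[col]][[i]]), integer(1)))
  }, integer(1))

  parts <- lapply(nested, function(col) {
    entries <- df[[col]]
    # Assumption: at least one entry is non-empty; it supplies the column names.
    template <- entries[[which(lengths(entries) > 0)[1]]]
    filled <- Map(function(e, n) {
      if (length(e) == 0) {
        # Build an all-NA data frame with the sibling's columns and n rows.
        as.data.frame(matrix(NA_character_, nrow = n, ncol = length(template),
                             dimnames = list(NULL, names(template))))
      } else {
        e
      }
    }, entries, n_rows)
    out <- do.call(rbind, filled)
    # Prefix inner names with the outer column name, separated by a dot.
    names(out) <- paste(col, names(out), sep = ".")
    out
  })

  # Repeat the id column to match the expanded rows, then bind everything.
  ids <- df[rep(seq_len(nrow(df)), n_rows), id, drop = FALSE]
  res <- cbind(ids, do.call(cbind, parts))
  rownames(res) <- NULL
  res
}

flat <- flatten_nested(input)
print(flat)
```

Taking the column names from a sibling entry avoids relying on the number of input rows matching the number of inner columns.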