r - Using pivot_longer() to expand rows of data and combine a pattern of columns - Stack Overflow

I have a data frame that looks like this:

  ID             Email  Name   Company TripIdentifier      Date1  Campsite1 NumberOfAnimals1      Date2  Campsite2 NumberOfAnimals2
1  1 [email protected] Alice Company A         Trip 1 2022-01-01 Campsite A                5 2022-01-02 Campsite C                5
2  2 [email protected]   Bob Company B         Trip 2 2022-01-02 Campsite B                5 2022-01-03 Campsite D                5

I am trying to create an output table that combines a set of columns that is duplicated many times in my dataset (Date1, Campsite1, NumberOfAnimals1). They are always in the same order. I would like my resulting table to look like this:

  ID               Email   Name    Company TripIdentifier       Date      Campsite NumberOfAnimals
1  1   [email protected]  Alice  Company A         Trip 1 2022-01-01    Campsite A               5
2  1   [email protected]  Alice  Company A         Trip 1 2022-01-02    Campsite C               5
3  2   [email protected]    Bob  Company B         Trip 2 2022-01-02    Campsite B               5
4  2   [email protected]    Bob  Company B         Trip 2 2022-01-03    Campsite D               5

So far, I have been trying to use pivot_longer() with the names_pattern argument:

# Load tidyr, which provides pivot_longer() and re-exports %>%
library(tidyr)

# Define the test data frame
Test <- data.frame(
  ID = c(1, 2),
  Email = c("[email protected]", "[email protected]"),
  Name = c("Alice", "Bob"),
  Company = c("Company A", "Company B"),
  TripIdentifier = c("Trip 1", "Trip 2"),
  Date1 = as.Date(c("2022-01-01", "2022-01-02")),
  Campsite1 = c("A", "B"),
  NumberOfAnimals1 = c(5, 5),
  Date2 = as.Date(c("2022-01-02", "2022-01-03")),
  Campsite2 = c("C", "D"),
  NumberOfAnimals2 = c(5, 5),
  stringsAsFactors = FALSE
)

# Attempt: reshape with pivot_longer(), selecting only the Date columns
reshaped <- Test %>%
  pivot_longer(
    cols = starts_with("Date"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )

# Print the result
reshaped

However, this produces:

# A tibble: 4 × 11
     ID Email             Name  Company  TripIdentifier Campsite1 NumberOfAnimals1 Campsite2 NumberOfAnimals2 trip  Date      
  <dbl> <chr>             <chr> <chr>    <chr>          <chr>                <dbl> <chr>                <dbl> <chr> <date>    
1     1 [email protected] Alice Company… Trip 1         A                        5 C                        5 1     2022-01-01
2     1 [email protected] Alice Company… Trip 1         A                        5 C                        5 2     2022-01-02
3     2 [email protected] Bob   Company… Trip 2         B                        5 D                        5 1     2022-01-02
4     2 [email protected] Bob   Company… Trip 2         B                        5 D                        5 2     2022-01-03

The resulting table only combines the "Date" columns, but not the others in the pattern. I am new to the tidyverse and am getting a bit confused about all the ways to use pivot_longer(). Any ideas on how to accomplish this would be helpful, and thanks in advance!

Asked Mar 4 at 21:17 by R Beginner; edited Mar 4 at 21:36 by r2evans.

1 Answer

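To achieve your desired result, you also have to include the NumberOfAnimals and Campsite columns when pivoting. With cols = starts_with("Date"), only the Date columns are selected, so the other numbered columns are left in wide form.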

library(tidyr)

Test %>%
  pivot_longer(
    cols = c(
      starts_with("Date"),
      starts_with("NumberOfAnimals"),
      starts_with("Campsite")
    ),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#>      ID Email      Name  Company TripIdentifier trip  Date       NumberOfAnimals
#>   <dbl> <chr>      <chr> <chr>   <chr>          <chr> <date>               <dbl>
#> 1     1 user1@exa… Alice Compan… Trip 1         1     2022-01-01               5
#> 2     1 user1@exa… Alice Compan… Trip 1         2     2022-01-02               5
#> 3     2 user2@exa… Bob   Compan… Trip 2         1     2022-01-02               5
#> 4     2 user2@exa… Bob   Compan… Trip 2         2     2022-01-03               5
#> # ℹ 1 more variable: Campsite <chr>

Or, to simplify, you could use matches() to select every column ending in a digit (thanks to @Onyambu for the reminder):

Test %>%
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#>      ID Email             Name  Company TripIdentifier trip  Date       Campsite
#>   <dbl> <chr>             <chr> <chr>   <chr>          <chr> <date>     <chr>   
#> 1     1 [email protected] Alice Compan… Trip 1         1     2022-01-01 A       
#> 2     1 [email protected] Alice Compan… Trip 1         2     2022-01-02 C       
#> 3     2 [email protected] Bob   Compan… Trip 2         1     2022-01-02 B       
#> 4     2 [email protected] Bob   Compan… Trip 2         2     2022-01-03 D       
#> # ℹ 1 more variable: NumberOfAnimals <dbl>
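
One caveat, since the question mentions that the column set is repeated many times: in "(.*)(\\d+)$" the first group is greedy, so a name such as Date10 would be split into "Date1" and "0" rather than "Date" and "10". Below is a minimal sketch of one way around this, assuming the base names themselves contain no digits. The data frame `wide` and its two-digit columns are hypothetical and only serve to illustrate the pattern; `names_transform` is used here to store `trip` as an integer.

```r
library(tidyr)

# Hypothetical wide data with a two-digit suffix (not from the question)
wide <- data.frame(
  ID = 1,
  Date1 = as.Date("2022-01-01"),
  Campsite1 = "A",
  Date10 = as.Date("2022-01-10"),
  Campsite10 = "J"
)

long <- wide %>%
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c(".value", "trip"),
    # \\D+ matches non-digits only, so the whole numeric suffix
    # lands in the second group (assumes no digits in the base name)
    names_pattern = "(\\D+)(\\d+)$",
    # convert the captured suffix from character to integer
    names_transform = list(trip = as.integer)
  )

long
```

If the trip column is not wanted in the final table, it can be dropped afterwards, for example with dplyr::select(long, -trip).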