I have a data frame that looks like this:
ID Email Name Company TripIdentifier Date1 Campsite1 NumberOfAnimals1 Date2 Campsite2 NumberOfAnimals2
1 1 [email protected] Alice Company A Trip 1 2022-01-01 Campsite A 5 2022-01-02 Campsite C 5
2 2 [email protected] Bob Company B Trip 2 2022-01-02 Campsite B 5 2022-01-03 Campsite D 5
I am trying to create an output table that combines a set of columns that is duplicated many times in my dataset (Date1, Campsite1, NumberOfAnimals1). They are always in the same order. I would like my resulting table to look like this:
ID Email Name Company TripIdentifier Date Campsite NumberOfAnimals
1 1 [email protected] Alice Company A Trip 1 2022-01-01 Campsite A 5
2 1 [email protected] Alice Company A Trip 1 2022-01-02 Campsite C 5
3 2 [email protected] Bob Company B Trip 2 2022-01-02 Campsite B 5
4 2 [email protected] Bob Company B Trip 2 2022-01-03 Campsite D 5
So far, I have been trying to use pivot_longer() with the names_pattern argument:
library(tidyr)

# Define the test data frame
Test <- data.frame(
  ID = c(1, 2),
  Email = c("[email protected]", "[email protected]"),
  Name = c("Alice", "Bob"),
  Company = c("Company A", "Company B"),
  TripIdentifier = c("Trip 1", "Trip 2"),
  Date1 = as.Date(c("2022-01-01", "2022-01-02")),
  Campsite1 = c("A", "B"),
  NumberOfAnimals1 = c(5, 5),
  Date2 = as.Date(c("2022-01-02", "2022-01-03")),
  Campsite2 = c("C", "D"),
  NumberOfAnimals2 = c(5, 5),
  stringsAsFactors = FALSE
)

# Pivot the columns selected in `cols`, splitting each name into
# the output column name (".value") and a trailing number ("trip")
spec <- Test %>%
  pivot_longer(
    cols = starts_with("Date"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )

# Keep the pivoted result and print it
reshaped <- spec
reshaped
However, this produces:
# A tibble: 4 × 11
ID Email Name Company TripIdentifier Campsite1 NumberOfAnimals1 Campsite2 NumberOfAnimals2 trip Date
<dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <date>
1 1 [email protected] Alice Company… Trip 1 A 5 C 5 1 2022-01-01
2 1 [email protected] Alice Company… Trip 1 A 5 C 5 2 2022-01-02
3 2 [email protected] Bob Company… Trip 2 B 5 D 5 1 2022-01-02
4 2 [email protected] Bob Company… Trip 2 B 5 D 5 2 2022-01-03
The resulting table only combines the Date columns, not the others in the pattern. I am new to the tidyverse and am getting a bit confused about all the ways to use pivot_longer(). Any ideas on how to accomplish this would be helpful. Thanks in advance!
Asked Mar 4 at 21:17 by R Beginner; edited Mar 4 at 21:36 by r2evans.
1 Answer
To achieve your desired result you also have to include the NumberOfAnimals and Campsite columns when pivoting.
library(tidyr)

Test %>%
  pivot_longer(
    cols = c(
      starts_with("Date"),
      starts_with("NumberOfAnimals"),
      starts_with("Campsite")
    ),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#> ID Email Name Company TripIdentifier trip Date NumberOfAnimals
#> <dbl> <chr> <chr> <chr> <chr> <chr> <date> <dbl>
#> 1 1 user1@exa… Alice Compan… Trip 1 1 2022-01-01 5
#> 2 1 user1@exa… Alice Compan… Trip 1 2 2022-01-02 5
#> 3 2 user2@exa… Bob Compan… Trip 2 1 2022-01-02 5
#> 4 2 user2@exa… Bob Compan… Trip 2 2 2022-01-03 5
#> # ℹ 1 more variable: Campsite <chr>
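To see why the column selection matters, it can help to look at how names_pattern splits each selected name. The base R sketch below applies the same regular expression with sub(); the names `cols`, `pattern` and `split_names` are only illustrative and not part of the original code. The first capture group is what ".value" turns into an output column name, and the second becomes the value of trip. Only columns selected in cols go through this split, which is why selecting just the Date columns left the others untouched.

```r
# Column names from the example data (illustrative vector)
cols <- c("Date1", "Campsite1", "NumberOfAnimals1",
          "Date2", "Campsite2", "NumberOfAnimals2")

# Same regular expression as in the pivot_longer() call
pattern <- "(.*)(\\d+)$"

# Show which part of each name feeds ".value" and which feeds "trip"
split_names <- data.frame(
  column = cols,
  value_name = sub(pattern, "\\1", cols),  # first capture group
  trip = sub(pattern, "\\2", cols),        # second capture group
  stringsAsFactors = FALSE
)
split_names
```

Each distinct entry in `value_name` ends up as one column in the long table, and each distinct entry in `trip` contributes one row per original row.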
Or, to simplify, you could use matches to select all columns ending in a digit (thanks to @Onyambu for the reminder):
Test %>%
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c(".value", "trip"),
    names_pattern = "(.*)(\\d+)$"
  )
#> # A tibble: 4 × 9
#> ID Email Name Company TripIdentifier trip Date Campsite
#> <dbl> <chr> <chr> <chr> <chr> <chr> <date> <chr>
#> 1 1 [email protected] Alice Compan… Trip 1 1 2022-01-01 A
#> 2 1 [email protected] Alice Compan… Trip 1 2 2022-01-02 C
#> 3 2 [email protected] Bob Compan… Trip 2 1 2022-01-02 B
#> 4 2 [email protected] Bob Compan… Trip 2 2 2022-01-03 D
#> # ℹ 1 more variable: NumberOfAnimals <dbl>
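One caveat if the column set really is repeated many times: once the suffix reaches two digits, the greedy (.*) in the pattern above takes everything except the last digit, so a name like Date10 would be split into "Date1" and "0". The sketch below is one possible way around that, assuming the column name stems contain no digits themselves; it uses (\\D+) for the stem and also converts trip to an integer with names_transform. The data frame `wide` and its values are hypothetical and only serve to illustrate the two-digit case.

```r
library(tidyr)

# Hypothetical data with a two-digit suffix (not from the original question)
wide <- data.frame(
  ID = 1,
  Date1 = as.Date("2022-01-01"),
  Campsite1 = "A",
  Date10 = as.Date("2022-01-10"),
  Campsite10 = "J",
  stringsAsFactors = FALSE
)

long <- wide %>%
  pivot_longer(
    cols = matches("\\d+$"),
    names_to = c(".value", "trip"),
    # \\D+ matches only non-digits, so the whole number goes to "trip"
    names_pattern = "(\\D+)(\\d+)$",
    # Store trip as an integer instead of a character string
    names_transform = list(trip = as.integer)
  )
long
```

If the stems can contain digits, a lazy first group such as "(.*?)(\\d+)$" may be a better fit, since it still hands all trailing digits to the second group.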