How to insert a new row every time there is a break in the row
column numbering, while filling the other columns with NA
?
Edit: Iroha's suggestion based on tidyr::complete()
works fine, as do the others. However, when I apply it to millions of rows, it runs for hours.
It seems like a data.table
approach would be faster, but I never use it and I can't seem to find a solution.
Any help/advice would be greatly appreciated!
Initial data:
x <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b")
y <- c(1,1,2,4,4,4,7,7,11,1,3,3,3,6,6,8)
dat0 <- data.frame(x,y)
>dat0
x y
1 a 1
2 a 1
3 a 2
4 a 4
5 a 4
6 a 4
7 a 7
8 a 7
9 a 11
10 b 1
11 b 3
12 b 3
13 b 3
14 b 6
15 b 6
16 b 8
Desired output:
> dat1
x y
1 a 1
2 a 1
3 a 2
4 <NA> 3
5 a 4
6 a 4
7 a 4
8 <NA> 5
9 <NA> 6
10 a 7
11 a 7
12 <NA> 8
13 <NA> 9
14 <NA> 10
15 a 11
16 b 1
17 <NA> 2
18 b 3
19 b 3
20 b 3
21 <NA> 4
22 <NA> 5
23 b 6
24 b 6
25 <NA> 7
26 b 8
Thanks for help
How to insert a new row every time there is a break in the row
column numbering, while filling the other columns with NA
?
Edit: Iroha's suggestion based on tidyr::complete()
works fine, as do the others. However, when I apply it to millions of rows, it runs for hours.
It seems like a data.table
approach would be faster, but I never use it and I can't seem to find a solution.
Any help/advice would be greatly appreciated!
Initial data:
x <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b")
y <- c(1,1,2,4,4,4,7,7,11,1,3,3,3,6,6,8)
dat0 <- data.frame(x,y)
>dat0
x y
1 a 1
2 a 1
3 a 2
4 a 4
5 a 4
6 a 4
7 a 7
8 a 7
9 a 11
10 b 1
11 b 3
12 b 3
13 b 3
14 b 6
15 b 6
16 b 8
Desired output:
> dat1
x y
1 a 1
2 a 1
3 a 2
4 <NA> 3
5 a 4
6 a 4
7 a 4
8 <NA> 5
9 <NA> 6
10 a 7
11 a 7
12 <NA> 8
13 <NA> 9
14 <NA> 10
15 a 11
16 b 1
17 <NA> 2
18 b 3
19 b 3
20 b 3
21 <NA> 4
22 <NA> 5
23 b 6
24 b 6
25 <NA> 7
26 b 8
Thanks for help
Share Improve this question edited Feb 16 at 21:46 denis asked Feb 14 at 6:12 denisdenis 7944 silver badges14 bronze badges3 Answers
Reset to default 2This is pretty straight forward with tidyr::complete()
except that grouping variables can't be modified so need to be copied first.
library(tidyr)
library(dplyr)
dat0 |>
group_by(g2 = x) |>
complete(y = full_seq(y, 1)) |>
ungroup(g2) |>
select(-g2)
# A tibble: 26 × 2
y x
<dbl> <chr>
1 1 a
2 1 a
3 2 a
4 3 NA
5 4 a
6 4 a
7 4 a
8 5 NA
9 6 NA
10 7 a
# ℹ 16 more rows
# ℹ Use `print(n = ...)` to see more rows
A bit long-winded:
library(dplyr)
reframe(dat0, y=seq(min(y), max(y)), .by=x) %>%
anti_join(dat0) %>%
mutate(x2=x, x=NA) %>%
full_join(dat0) %>%
mutate(x2=coalesce(x, x2)) %>%
arrange(x2, y) %>%
select(-x2)
x y
1 a 1
2 a 1
3 a 2
4 <NA> 3
5 a 4
6 a 4
7 a 4
8 <NA> 5
9 <NA> 6
10 a 7
11 a 7
12 <NA> 8
13 <NA> 9
14 <NA> 10
15 a 11
16 b 1
17 <NA> 2
18 b 3
19 b 3
20 b 3
21 <NA> 4
22 <NA> 5
23 b 6
24 b 6
25 <NA> 7
26 b 8
Hmm odd requirement, but in base R you could do
res <- dat0
for(let in unique(dat0$x)){
missing_y <- setdiff(1:max(dat0[dat0$x==let,]$y), dat0[dat0$x==let,]$y)
missing_x <- rep(NA, length(missing_y))
res <- rbind(data.frame(x=missing_x, y = missing_y),res)
}
But I would recommend
library(tidyverse)
res_filled <- dat0 %>% group_by(x) %>% complete(y = seq(min(y),max(y),1))