R - dplyr case_when with which.max issue

I have a very large dataset that I need help processing conditionally

Here is a snippet:

structure(list(month = c(202407L, 202408L, 202409L, 202410L, 
202411L, 202412L, 202501L), item = c("A02", "A02", "A02", "A02", 
"A02", "A02", "A02"), estimate = c("35.79", "", "35.79", 
"35.79", "", "", ""), cycle = c("M", "O", "O", "M", "O", "O", 
"M"), rsp = c(1, 1, 1, 1, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
item = "A02", .rows = structure(list(1:7), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

What I would like to do is create a variable out that behaves like so:

If cycle is M and estimate is "", then out is 1
If cycle is M and estimate is not "", then out is 0
If cycle is O and estimate is "", then, look back at most recent row where cycle is M

If, in that month, estimate is "", then out = 1

If estimate is not "", then out= 0.

Here is what I tried:

group_by(item) |>
mutate(out = case_when(cycle == "M" & estimate != "" ~ 0,
                     cycle == "M"& estimate == "" ~ 1,
                     cycle == "O" & which.max(cycle == "M" & estimate == "") ~ 1,
                     .default = 0))

Desired Result--The column out should look like so: c(0, 0, 0, 0, 0, 0, 1)

Note: the dataframe is grouped and that number of rows between each cycle==M entry is not necessarily fixed.

Any help is much appreciated. I tried using case_when because I figured dplyr would be the easiest way to do this but other suggestions are more than welcome.

Thanks.

I have a very large dataset that I need help processing conditionally

Here is a snippet:

structure(list(month = c(202407L, 202408L, 202409L, 202410L, 
202411L, 202412L, 202501L), item = c("A02", "A02", "A02", "A02", 
"A02", "A02", "A02"), estimate = c("35.79", "", "35.79", 
"35.79", "", "", ""), cycle = c("M", "O", "O", "M", "O", "O", 
"M"), rsp = c(1, 1, 1, 1, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
item = "A02", .rows = structure(list(1:7), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

What I would like to do is create a variable out that behaves like so:

If cycle is M and estimate is "", then out is 1
If cycle is M and estimate is not "", then out is 0
If cycle is O and estimate is "", then, look back at most recent row where cycle is M

If, in that month, estimate is "", then out = 1

If estimate is not "", then out= 0.

Here is what I tried:

group_by(item) |>
mutate(out = case_when(cycle == "M" & estimate != "" ~ 0,
                     cycle == "M"& estimate == "" ~ 1,
                     cycle == "O" & which.max(cycle == "M" & estimate == "") ~ 1,
                     .default = 0))

Desired Result--The column out should look like so: c(0, 0, 0, 0, 0, 0, 1)

Note: the dataframe is grouped and that number of rows between each cycle==M entry is not necessarily fixed.

Any help is much appreciated. I tried using case_when because I figured dplyr would be the easiest way to do this but other suggestions are more than welcome.

Thanks.

Share Improve this question asked yesterday jvalenti 6401 gold badge10 silver badges32 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

Fill up the cycle == "M" values first and then use tidyr::fill to get the most recent values copied at cycle == "O".

library(dplyr)

df %>%
  mutate(out = case_when(cycle == "M" & estimate == "" ~ 1, 
                         cycle == "M" & estimate != "" ~ 0)) %>%
  tidyr::fill(out)

# A tibble: 7 × 6
# Groups:   item [1]
#   month item  estimate cycle   rsp   out
#   <int> <chr> <chr>    <chr> <dbl> <dbl>
#1 202407 A02   "35.79"  M         1     0
#2 202408 A02   ""       O         1     0
#3 202409 A02   "35.79"  O         1     0
#4 202410 A02   "35.79"  M         1     0
#5 202411 A02   ""       O         0     0
#6 202412 A02   ""       O         0     0
#7 202501 A02   ""       M         0     1

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

R - dplyr case_when with which.max issue - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)