R - dplyr case_when with which.max issue - Stack Overflow

I have a very large dataset that I need help processing conditionally.

Here is a snippet:

structure(list(month = c(202407L, 202408L, 202409L, 202410L, 
202411L, 202412L, 202501L), item = c("A02", "A02", "A02", "A02", 
"A02", "A02", "A02"), estimate = c("35.79", "", "35.79", 
"35.79", "", "", ""), cycle = c("M", "O", "O", "M", "O", "O", 
"M"), rsp = c(1, 1, 1, 1, 0, 0, 0)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -7L), groups = structure(list(
item = "A02", .rows = structure(list(1:7), ptype = integer(0), class = 
c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .drop = TRUE))

What I would like to do is create a variable out that behaves like so:

  • If cycle is M and estimate is "", then out is 1

  • If cycle is M and estimate is not "", then out is 0

  • If cycle is O and estimate is "", then look back at the most recent row where cycle is M:

    If, in that month, estimate is "", then out = 1.

    If, in that month, estimate is not "", then out = 0.

Here is what I tried:

df |>
  group_by(item) |>
  mutate(out = case_when(cycle == "M" & estimate != "" ~ 0,
                         cycle == "M" & estimate == "" ~ 1,
                         cycle == "O" & which.max(cycle == "M" & estimate == "") ~ 1,
                         .default = 0))

Desired result: the column out should be c(0, 0, 0, 0, 0, 0, 1).

Note: the data frame is grouped, and the number of rows between consecutive cycle == "M" entries is not necessarily fixed.

Any help is much appreciated. I tried using case_when because I figured dplyr would be the easiest way to do this, but other suggestions are more than welcome.

Thanks.


1 Answer

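Compute out for the cycle == "M" rows first. case_when without a default returns NA for every other row, so tidyr::fill can then carry the most recent "M" value down to the cycle == "O" rows. Because df is grouped by item, the fill happens within each item.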

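library(dplyr)

df %>%
  # Set out only on the "M" rows; unmatched ("O") rows become NA.
  mutate(out = case_when(cycle == "M" & estimate == "" ~ 1, 
                         cycle == "M" & estimate != "" ~ 0)) %>%
  # Carry the last non-NA value downward, within each group.
  tidyr::fill(out)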

# A tibble: 7 × 6
# Groups:   item [1]
#   month item  estimate cycle   rsp   out
#   <int> <chr> <chr>    <chr> <dbl> <dbl>
#1 202407 A02   "35.79"  M         1     0
#2 202408 A02   ""       O         1     0
#3 202409 A02   "35.79"  O         1     0
#4 202410 A02   "35.79"  M         1     0
#5 202411 A02   ""       O         0     0
#6 202412 A02   ""       O         0     0
#7 202501 A02   ""       M         0     1
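
For anyone who wants to check this end to end, here is a minimal self-contained sketch. It rebuilds the sample data with data.frame() instead of the dput output (my own shortcut, not the asker's code) and also shows why the which.max attempt likely misbehaves: which.max returns a single index for the whole vector, and a non-zero number inside a logical & is treated as TRUE. The names idx and res are just illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the sample data from the question, grouped by item.
df <- data.frame(
  month    = c(202407L, 202408L, 202409L, 202410L, 202411L, 202412L, 202501L),
  item     = "A02",
  estimate = c("35.79", "", "35.79", "35.79", "", "", ""),
  cycle    = c("M", "O", "O", "M", "O", "O", "M"),
  rsp      = c(1, 1, 1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
) %>%
  group_by(item)

# which.max() gives ONE position (the first TRUE), not one value per row.
idx <- which.max(df$cycle == "M" & df$estimate == "")

# A non-zero number is coerced to TRUE in a logical `&`, so this condition
# is TRUE on every cycle == "O" row, regardless of the preceding "M" row.
all_o_rows_match <- all((df$cycle == "O" & idx)[df$cycle == "O"])

# The approach from the answer: set out on "M" rows, then fill downward.
res <- df %>%
  mutate(out = case_when(
    cycle == "M" & estimate == "" ~ 1,
    cycle == "M" & estimate != "" ~ 0
  )) %>%
  fill(out, .direction = "down") %>%
  ungroup()

res$out
```

One caveat: fill only carries values downward, so if an item starts with cycle == "O" rows before its first "M" row, those rows stay NA. Whether they should become 0 is a decision the question does not cover.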