
date - R Iteratively modify values based on previous row if a condition is met - Stack Overflow


Here is a sample of my df:

library(dplyr)

# Create the dataframe
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
  supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
  supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)

It has a supply start date and the quantity supplied for four IDs (A to D). For each ID, I want to do the following: 1) create a supply end date; 2) if the supply from the previous row lasts beyond the current supply start date, generate a new supply start date and a new supply end date. For example, for A the first supply ends on 2024-01-30 and the second supply begins on 2024-01-20, i.e., there is an overlap of 10 days. In this scenario, new_supply_start_date for row 2 is 2024-01-31. If a supply_start_date does not overlap the previous supply_end_date, no modification is necessary. Evaluation is done by ID. This is what I have tried:

# Add supply_end column
df <- df %>%
  mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))

df2<-df %>% 
  arrange(id, supply_start_date) %>% 
  group_by(id) %>% 
  mutate(new_supply_start_date=as.Date(ifelse(row_number()>1 & (supply_start_date<=lag(supply_end_date,default = first(supply_end_date))+supply_qty-1),
         lag(supply_end_date,default=first(supply_end_date)+1),
         supply_start_date)),
         new_supply_end_date=as.Date(new_supply_start_date+supply_qty-1)) %>% 
  ungroup()
df2

If you look at the last row, for example, the new_supply_start_date should be 2017-06-26 and not 2017-06-24. I think I need to iteratively modify new_supply_start_date and then new_supply_end_date for each row, but I am not sure how to achieve that. Any help or tips are much appreciated. Thanks.
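To make the rule concrete, the id A example from the description can be checked by hand in base R. This is only a sketch of the stated rule for a single pair of rows; the variable names (start, qty, overlaps, new_start_2, new_end_2) are made up for illustration and are not part of the attempted solution.

```r
# First two supplies for id A (values taken from the sample data)
start <- as.Date(c("2024-01-01", "2024-01-20"))
qty   <- c(30, 60)

# Original end dates: a supply of n days covers start .. start + n - 1
end <- start + (qty - 1)

# Does row 2 begin on or before the day row 1 runs out?
overlaps <- start[2] <= end[1]

# If so, row 2 is pushed to the day after row 1 ends
new_start_2 <- if (overlaps) end[1] + 1 else start[2]
new_end_2   <- new_start_2 + (qty[2] - 1)

format(c(end[1], new_start_2, new_end_2))
# "2024-01-30" "2024-01-31" "2024-03-30"
```

The point of the example is that row 3 would then have to be compared against the adjusted end date of row 2 (2024-03-30), not its original one, which is why the adjustment has to be carried forward row by row.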


asked Feb 4 at 4:11 by user3641630, edited Feb 4 at 4:25

2 Answers

Here is a solution that iterates over the rows of each subset of the data frame. I used split() to make one subset per id and purrr::map_dfr() to modify each subset and bind the results back together. Note that the function overwrites supply_start_date and supply_end_date in place rather than adding new columns. I added a Row column to the output data frame to make it easier to track the subsets; the mutate(Row = row_number()) line can be deleted.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Create the dataframe
df <- data.frame(
  id = c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C', 'D', 'D', 'D'),
  supply_start_date = as.Date(c('2024-01-01', '2024-01-20', '2024-04-20', '2024-01-02', '2018-03-01', '2018-07-03', '2018-10-07', '2019-01-23', '2017-04-28', '2017-05-26', '2017-06-06')),
  supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)
# Add supply_end column
df <- df %>%
  mutate(supply_end_date=  supply_start_date + ( supply_qty - 1))

AdjDates <- function(DF) {
  DF <- DF |> mutate(Row = row_number())
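  # Row 1 is never shifted; seq_len(n)[-1] is empty for single-row groups
  for (i in seq_len(nrow(DF))[-1]) {
    # Uses the already-adjusted end date of the previous row
    PrevEnd <- DF[i-1, 'supply_end_date']
    if (PrevEnd >= DF[i, 'supply_start_date']) {
      DF[i, 'supply_start_date'] <- PrevEnd + 1
      DF[i, 'supply_end_date'] <- DF[i, 'supply_start_date'] + DF[i, 'supply_qty'] - 1
    }
  }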
  return(DF)
}
df |> split(df$id) |> purrr::map_dfr(AdjDates)
#>    id supply_start_date supply_qty supply_end_date Row
#> 1   A        2024-01-01         30      2024-01-30   1
#> 2   A        2024-01-31         60      2024-03-30   2
#> 3   A        2024-04-20        100      2024-07-28   3
#> 4   B        2024-01-02        100      2024-04-10   1
#> 5   C        2018-03-01        100      2018-06-08   1
#> 6   C        2018-07-03        100      2018-10-10   2
#> 7   C        2018-10-11        100      2019-01-18   3
#> 8   C        2019-01-23        100      2019-05-02   4
#> 9   D        2017-04-28         30      2017-05-27   1
#> 10  D        2017-05-28         30      2017-06-26   2
#> 11  D        2017-06-27         30      2017-07-26   3

Created on 2025-02-03 with reprex v2.1.1
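The same row-by-row dependency can also be written as a fold, which may be easier to reuse inside other pipelines. The following is a minimal base R sketch, not part of the answer above: fold_starts is a hypothetical helper name, it assumes the rows of each id are already sorted by supply_start_date, and it writes to new columns instead of overwriting the originals.

```r
# Sample data from the question
df <- data.frame(
  id = c("A", "A", "A", "B", "C", "C", "C", "C", "D", "D", "D"),
  supply_start_date = as.Date(c(
    "2024-01-01", "2024-01-20", "2024-04-20", "2024-01-02",
    "2018-03-01", "2018-07-03", "2018-10-07", "2019-01-23",
    "2017-04-28", "2017-05-26", "2017-06-06"
  )),
  supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)

# Hypothetical helper: adjusted start dates for ONE id, rows sorted by date.
fold_starts <- function(start, qty) {
  s <- as.numeric(start)  # dates as day counts since 1970-01-01
  # Earliest allowed start for row i is the previous adjusted start plus the
  # previous quantity (that is, the previous adjusted end date + 1).
  step <- function(prev_start, i) max(s[i], prev_start + qty[i - 1])
  out <- Reduce(step, seq_along(s)[-1], init = s[1], accumulate = TRUE)
  as.Date(out, origin = "1970-01-01")
}

# Apply the helper separately to the rows of each id
df <- df[order(df$id, df$supply_start_date), ]
df$new_start_date <- df$supply_start_date
for (rows in split(seq_len(nrow(df)), df$id)) {
  df$new_start_date[rows] <- fold_starts(df$supply_start_date[rows],
                                         df$supply_qty[rows])
}
df$new_end_date <- df$new_start_date + df$supply_qty - 1
df
```

On the sample data this should give the same start and end dates as the loop above. Reduce() with accumulate = TRUE keeps every intermediate value, so each row sees the adjusted start of the row before it.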

I don't think you need to explicitly iterate over the data. Instead, you can calculate the cumulative sum of contiguous overlaps and use these values to adjust the dates.

library(dplyr)

df |> 
  mutate(supply_end_date = supply_start_date + (supply_qty - 1),
         overlap = pmax(0, lag(supply_end_date) - supply_start_date, na.rm = TRUE), .by = id) |> 
  mutate(cso = cumsum(overlap),
         overlap = cso - cummax((overlap == 0) * cso),
         new_start_date = supply_start_date + overlap + (overlap != 0),
         new_end_date = new_start_date + (overlap + supply_qty - 1)) |> 
  select(-c(overlap, cso))

   id supply_start_date supply_qty supply_end_date new_start_date new_end_date
1   A        2024-01-01         30      2024-01-30     2024-01-01   2024-01-30
2   A        2024-01-20         60      2024-03-19     2024-01-31   2024-04-09
3   A        2024-04-20        100      2024-07-28     2024-04-20   2024-07-28
4   B        2024-01-02        100      2024-04-10     2024-01-02   2024-04-10
5   C        2018-03-01        100      2018-06-08     2018-03-01   2018-06-08
6   C        2018-07-03        100      2018-10-10     2018-07-03   2018-10-10
7   C        2018-10-07        100      2019-01-14     2018-10-11   2019-01-21
8   C        2019-01-23        100      2019-05-02     2019-01-23   2019-05-02
9   D        2017-04-28         30      2017-05-27     2017-04-28   2017-05-27
10  D        2017-05-26         30      2017-06-24     2017-05-28   2017-06-27
11  D        2017-06-06         30      2017-07-05     2017-06-26   2017-08-13
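In the output above, new_end_date includes the overlap on top of supply_qty (row 2 ends on 2024-04-09, whereas the loop-based answer ends that supply on 2024-03-30), and row 11 starts on 2017-06-26 rather than 2017-06-27. If each adjusted supply is meant to last exactly supply_qty days and start the day after the previous adjusted supply ends, a different vectorised form is possible. The sketch below is an assumption-laden alternative, not this answer's method: shift_starts is a hypothetical helper, and it assumes rows are sorted by supply_start_date within each id.

```r
# Sample data from the question
df <- data.frame(
  id = c("A", "A", "A", "B", "C", "C", "C", "C", "D", "D", "D"),
  supply_start_date = as.Date(c(
    "2024-01-01", "2024-01-20", "2024-04-20", "2024-01-02",
    "2018-03-01", "2018-07-03", "2018-10-07", "2019-01-23",
    "2017-04-28", "2017-05-26", "2017-06-06"
  )),
  supply_qty = c(30, 60, 100, 100, 100, 100, 100, 100, 30, 30, 30)
)

# Hypothetical helper: adjusted start dates for ONE id, without a loop.
shift_starts <- function(start, qty) {
  s <- as.numeric(start)        # dates as day counts since 1970-01-01
  before <- cumsum(qty) - qty   # total days supplied by earlier rows
  # Unrolling new_start[i] = max(start[i], new_start[i - 1] + qty[i - 1])
  # gives new_start[i] = before[i] + max over j <= i of (start[j] - before[j]).
  as.Date(before + cummax(s - before), origin = "1970-01-01")
}

# Apply per id and bind the pieces back together
df <- df[order(df$id, df$supply_start_date), ]
pieces <- lapply(split(df, df$id), function(g) {
  g$new_start_date <- shift_starts(g$supply_start_date, g$supply_qty)
  g$new_end_date <- g$new_start_date + g$supply_qty - 1
  g
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL
res
```

Because shift_starts() takes two vectors and returns a Date vector of the same length, it could also be called inside mutate(..., .by = id). Under the stated rule, the results should match the loop-based answer rather than the table printed above.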