r - Imputing date in time series dataframe - Stack Overflow

I have a dataframe in R with several IDs, each with a DAY, a TIME and an amount of a compound (AMT). Typically, every ID should have two records per day, indicating two doses a day, usually in the morning (around 8 am) and in the evening (around 8 pm). Sometimes the DAY column contains "impute", which means the dosing continued as before until the next actual DAY value. If this is the case, and the column comment_yh contains "blue", then I want to impute the missing days. In the end, the dataframe should contain the original TIME points (e.g. 8:05 or 19:53) plus the imputed ones, which are always 8:00 and 20:00.

A minimal example could be:

df <- data.frame(
  ID = c(4, 4, 4, 4, 4, 4,
          5, 5, 5, 5, 
          6, 6, 6, 6),
  DAY = c("14/02/2020", "14/02/2020", "15/02/2020", "impute", "18/02/2020", "18/02/2020", 
          "13/02/2020", "impute", "15/02/2020", "15/02/2020", 
          "13/02/2020", "impute", "15/02/2020", "15/02/2020"),
  TIME = c("8:05", "19:53", "7:45", "NA", "8:10", "20:01", 
           "8:01", "NA", "8:00", "19:50", 
           "8:02", "NA", "8:02", "20:06"),
  AMT = c(3, 3, 2, NA, 4, 5,
          3.5, NA, 3, 4,
          2, NA, 1, 2),
  comment_yh = c(NA, NA, NA, "blue", NA, NA, 
          NA, "blue", NA, NA, 
          NA, "red", NA, NA)
)

The resulting, imputed dataframe should look like this:

df_final <- data.frame(
  ID = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
            5, 5, 5, 5, 5, 5, 
            6, 6, 6, 6),
  DAY = c("14/02/2020", "14/02/2020", "15/02/2020", "15/02/2020", "16/02/2020", "16/02/2020", "17/02/2020", "17/02/2020", "18/02/2020", "18/02/2020", 
          "13/02/2020",  "13/02/2020",  "14/02/2020", "14/02/2020", "15/02/2020", "15/02/2020", 
          "13/02/2020", "impute", "15/02/2020", "15/02/2020"),
  TIME = c("8:05", "19:53", "7:45", "20:00", "8:00", "20:00", "8:00", "20:00", "8:10", "20:01",
           "8:01", "20:00", "8:00", "20:00", "8:00", "19:50", 
           "8:02", "NA", "8:02", "20:06"),
  AMT = c(3, 3, 2, 2, 2, 2, 2, 2, 4, 5,
          3.5, 3.5, 3.5, 3.5, 3, 4,
          2, NA, 1, 2)
)

Any suggestion is very welcome!

I already tried to do this with a loop, but I am not very proficient with R and ran into problems.

asked Jan 13 at 14:51 by YHO, edited Jan 13 at 15:09 by dthorbur

Comment: What about showing your problems so we can help you with them? (jay.sf, Jan 13 at 16:23)

1 Answer

To get your required output, you can do this:

library(dplyr)
library(tidyr)
df$DAY <- as.Date(df$DAY, "%d/%m/%Y")  # "impute" cannot be parsed and becomes NA

result_df <- df  # Create a copy to store results

for (i in seq_len(nrow(df))) {
  if (!is.na(df$comment_yh[i]) && df$comment_yh[i] == "blue") {

    # Dates strictly between the previous and the next recorded day
    date_seq <- seq(df$DAY[i-1] + 1, df$DAY[i+1] - 1, by = "days")
    n <- length(date_seq)
    if (n > 0) {
      result_df <- rbind(result_df,
                         data.frame( # Insert the new rows
                            ID = rep(df$ID[i], n*2+1),
                            DAY = c(df$DAY[i-1], rep(date_seq, each = 2)),
                            TIME = c("20:00", rep(c("8:00", "20:00"), n)),
                            AMT = rep(df$AMT[i-1], n*2+1),  # carry forward the last recorded dose
                            comment_yh = NA
                          )
                   )
    }
  }
}
result_df <- result_df %>%
  filter(is.na(comment_yh) | comment_yh == "red") %>%
  # TIME is character, so sort on the parsed time; otherwise "19:53" sorts before "8:05"
  arrange(ID, DAY, as.POSIXct(TIME, format = "%H:%M", tz = "UTC")) %>%
  select(-comment_yh) %>% # deselect comment_yh column
  drop_na()  # drop NAs in red row

Output

Note: I dropped the row with "red" as comment_yh, because its DAY and AMT are NA and drop_na() removes it.

ID  DAY         TIME   AMT
4   2020-02-14  8:05   3.0
4   2020-02-14  19:53  3.0
4   2020-02-15  7:45   2.0
4   2020-02-15  20:00  2.0
4   2020-02-16  8:00   2.0
4   2020-02-16  20:00  2.0
4   2020-02-17  8:00   2.0
4   2020-02-17  20:00  2.0
4   2020-02-18  8:10   4.0
4   2020-02-18  20:01  5.0
5   2020-02-13  8:01   3.5
5   2020-02-13  20:00  3.5
5   2020-02-14  8:00   3.5
5   2020-02-14  20:00  3.5
5   2020-02-15  8:00   3.0
5   2020-02-15  19:50  4.0
6   2020-02-13  8:02   2.0
6   2020-02-15  8:02   1.0
6   2020-02-15  20:06  2.0
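
If the "red" row should stay in the result, as it does in df_final from the question, a base R variant that expands each "blue" placeholder in place may be easier to reason about. The following is only a sketch under some assumptions that the question does not state explicitly: every "blue" row sits directly between a morning record and the next recorded day of the same ID, and it is never the first or last row. The helper name expand_blue is hypothetical, not part of any package.

```r
# Example data from the question (TIME uses the string "NA", as in the original).
df <- data.frame(
  ID = c(4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6),
  DAY = c("14/02/2020", "14/02/2020", "15/02/2020", "impute", "18/02/2020", "18/02/2020",
          "13/02/2020", "impute", "15/02/2020", "15/02/2020",
          "13/02/2020", "impute", "15/02/2020", "15/02/2020"),
  TIME = c("8:05", "19:53", "7:45", "NA", "8:10", "20:01",
           "8:01", "NA", "8:00", "19:50",
           "8:02", "NA", "8:02", "20:06"),
  AMT = c(3, 3, 2, NA, 4, 5, 3.5, NA, 3, 4, 2, NA, 1, 2),
  comment_yh = c(NA, NA, NA, "blue", NA, NA, NA, "blue", NA, NA, NA, "red", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand the "blue" placeholder at row i into dose rows.
# Assumes rows i - 1 and i + 1 exist and carry real dates for the same ID.
expand_blue <- function(df, i, fmt = "%d/%m/%Y") {
  prev_day <- as.Date(df$DAY[i - 1], fmt)
  next_day <- as.Date(df$DAY[i + 1], fmt)
  # Offsets (in days) of the fully missing days; empty if the days are adjacent.
  gap <- seq_len(max(as.integer(next_day - prev_day) - 1L, 0L))
  # Evening dose on the previous day, then two doses for each missing day.
  days <- c(prev_day, rep(prev_day + gap, each = 2))
  data.frame(
    ID = df$ID[i],
    DAY = format(days, fmt),
    TIME = c("20:00", rep(c("8:00", "20:00"), length(gap))),
    AMT = df$AMT[i - 1],  # carry forward the last recorded dose
    comment_yh = NA_character_,
    stringsAsFactors = FALSE
  )
}

# Rows to expand: DAY is "impute" and the comment is "blue" (NA comments give FALSE).
is_blue <- df$DAY == "impute" & df$comment_yh %in% "blue"

# Replace each "blue" row by its expansion; keep every other row unchanged.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  if (is_blue[i]) expand_blue(df, i) else df[i, ]
})
result <- do.call(rbind, pieces)
result$comment_yh <- NULL
rownames(result) <- NULL
```

Because the new rows are inserted where the placeholder was, no sorting is needed and DAY stays in the original dd/mm/yyyy text format, so the "red" row with DAY "impute" is kept untouched. The data would need extra checks before calling the helper if a placeholder could follow an evening record or span two IDs.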