I have a dataframe like this:
A tibble: 30 x 4
id index day time
<dbl> <int> <int> <chr>
1 238686 1 1 11:53:33
2 238686 2 1 17:45:27
3 238686 3 1 21:12:36
4 238686 4 2 00:32:36
5 238686 5 2 11:07:08
6 238686 6 2 14:43:41
7 238686 7 2 20:50:29
8 238686 8 2 23:22:33
9 238686 9 3 12:05:53
10 238686 10 3 14:48:50
id refers to a participant (each participant provides several reports per day for 10 days), index refers to the consequtive number of report per participant throughout the whole study, day refers to day of study for each participant, and time refers to the time of day of each report.
I'd need a new variable indicating the last report for each day for each participant. I got a very helpful reply from here to do this using
ex_data <- ex_data |>
mutate(last=as.integer(max(index) == index), .by = c(id, day))
This worked great; however, I hadn't realized that some participants had provided the last report of the day after midnight on some days. I managed to mark those occasions using
ex_data$time2<-as.hms(ex_data$time)
ex_data <- ex_data |>
mutate(nighttime = if_else(time2 >= parse_hms("00:00:00") & time2 < parse_hms("03:00:01"), 1, 0))
But I can't come up with a way to create a variable that would indicate whether a report is "not last report of the day" or "last report of the day" for each participant that would include last reports of the day given before and after midnight.
Here's the data (it's a snippet of the actual data):
ex_data<-
structure(list(id = c(238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
239297, 239297, 239297), index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 1L, 2L, 3L), day = c(1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 1L, 1L, 1L), time = c("11:53:33", "17:45:27", "21:12:36",
"00:32:36", "11:07:08", "14:43:41", "20:50:29", "23:22:33", "12:05:53",
"14:48:50", "21:15:33", "12:09:46", "14:27:06", "18:01:24", "20:56:40",
"23:17:18", "11:19:02", "17:32:18", "00:05:52", "11:45:11", "18:10:10",
"20:08:09", "00:30:00", "11:17:36", "14:29:43", "18:12:06", "20:54:48",
"23:20:32", "11:16:00", "17:26:45", "11:45:13", "14:30:26", "18:31:49",
"20:31:42", "23:47:41", "14:16:07", "23:55:13", "11:16:34", "12:01:56",
"14:33:38")), row.names = c(NA, -40L), class = c("tbl_df", "tbl",
"data.frame"))
And what I'd like is
id index day time last
238686 1 1 11:53:33 0
238686 2 1 17:45:27 0
238686 3 1 21:12:36 0
238686 4 2 00:32:36 1
238686 5 2 11:07:08 0
238686 6 2 14:43:41 0
238686 7 2 20:50:29 0
238686 8 2 23:22:33 1
238686 9 3 12:05:53 0
I have a dataframe like this:
A tibble: 30 x 4
id index day time
<dbl> <int> <int> <chr>
1 238686 1 1 11:53:33
2 238686 2 1 17:45:27
3 238686 3 1 21:12:36
4 238686 4 2 00:32:36
5 238686 5 2 11:07:08
6 238686 6 2 14:43:41
7 238686 7 2 20:50:29
8 238686 8 2 23:22:33
9 238686 9 3 12:05:53
10 238686 10 3 14:48:50
id refers to a participant (each participant provides several reports per day for 10 days), index refers to the consequtive number of report per participant throughout the whole study, day refers to day of study for each participant, and time refers to the time of day of each report.
I'd need a new variable indicating the last report for each day for each participant. I got a very helpful reply from here to do this using
ex_data <- ex_data |>
mutate(last=as.integer(max(index) == index), .by = c(id, day))
This worked great; however, I hadn't realized that some participants had provided the last report of the day after midnight on some days. I managed to mark those occasions using
ex_data$time2<-as.hms(ex_data$time)
ex_data <- ex_data |>
mutate(nighttime = if_else(time2 >= parse_hms("00:00:00") & time2 < parse_hms("03:00:01"), 1, 0))
But I can't come up with a way to create a variable that would indicate whether a report is "not last report of the day" or "last report of the day" for each participant that would include last reports of the day given before and after midnight.
Here's the data (it's a snippet of the actual data):
ex_data<-
structure(list(id = c(238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
238686, 238686, 238686, 238686, 238686, 238686, 238686, 238686,
239297, 239297, 239297), index = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 1L, 2L, 3L), day = c(1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 6L, 6L, 6L,
6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 1L, 1L, 1L), time = c("11:53:33", "17:45:27", "21:12:36",
"00:32:36", "11:07:08", "14:43:41", "20:50:29", "23:22:33", "12:05:53",
"14:48:50", "21:15:33", "12:09:46", "14:27:06", "18:01:24", "20:56:40",
"23:17:18", "11:19:02", "17:32:18", "00:05:52", "11:45:11", "18:10:10",
"20:08:09", "00:30:00", "11:17:36", "14:29:43", "18:12:06", "20:54:48",
"23:20:32", "11:16:00", "17:26:45", "11:45:13", "14:30:26", "18:31:49",
"20:31:42", "23:47:41", "14:16:07", "23:55:13", "11:16:34", "12:01:56",
"14:33:38")), row.names = c(NA, -40L), class = c("tbl_df", "tbl",
"data.frame"))
And what I'd like is
id index day time last
238686 1 1 11:53:33 0
238686 2 1 17:45:27 0
238686 3 1 21:12:36 0
238686 4 2 00:32:36 1
238686 5 2 11:07:08 0
238686 6 2 14:43:41 0
238686 7 2 20:50:29 0
238686 8 2 23:22:33 1
238686 9 3 12:05:53 0
Share
Improve this question
asked Mar 26 at 13:58
SointuSointu
2091 silver badge7 bronze badges
5
|
1 Answer
Reset to default 4So, we need to treat times before some cut-off time as if they were on the previous day. Create a new modified_day
column that does this.
day_cutoff = "03:00:01"
ex_data |>
mutate(day_modified = ifelse(time < day_cutoff, day - 1, day)) |>
mutate(last = as.integer(row_number() == n()), .by = c(id, day_modified)) |>
print(n = 20)
# # A tibble: 40 × 6
# id index day time day_modified last
# <dbl> <int> <int> <chr> <dbl> <int>
# 1 238686 1 1 11:53:33 1 0
# 2 238686 2 1 17:45:27 1 0
# 3 238686 3 1 21:12:36 1 0
# 4 238686 4 2 00:32:36 1 1
# 5 238686 5 2 11:07:08 2 0
# 6 238686 6 2 14:43:41 2 0
# 7 238686 7 2 20:50:29 2 0
# 8 238686 8 2 23:22:33 2 1
# 9 238686 9 3 12:05:53 3 0
# 10 238686 10 3 14:48:50 3 0
# 11 238686 11 3 21:15:33 3 1
# 12 238686 12 4 12:09:46 4 0
# 13 238686 13 4 14:27:06 4 0
# 14 238686 14 4 18:01:24 4 0
# 15 238686 15 4 20:56:40 4 0
# 16 238686 16 4 23:17:18 4 1
# 17 238686 17 5 11:19:02 5 0
# 18 238686 18 5 17:32:18 5 0
# 19 238686 19 6 00:05:52 5 1
# 20 238686 20 6 11:45:11 6 0
# # ℹ 20 more rows
# # ℹ Use `print(n = ...)` to see more rows
ex_data
, I think that is not what OP wants. – Gregor Thomas Commented Mar 26 at 16:4901:00:00
? Maybe04:00:00
? Is that what the03:00:01
is in your attempt? – Gregor Thomas Commented Mar 26 at 16:50