I am trying to calculate the mean time of day a particular activity occurs per participant in a long dataframe. Each participant has 7 timepoints of data, and I need to calculate the mean time of day across these timepoints. The 'time' variable is in 24hr time and the time of day is important for the output.
Create some example data:
data <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  time = c("23:49:47", "23:49:37", "23:39:02", "23:46:37", "00:27:40", "00:10:22",
           "00:41:22"))
Try to calculate mean:
format(mean(strptime(data$time, "%H:%M:%S")), "%H:%M:%S")
This keeps returning "13:46:21" (1:46 pm), which cannot be correct: all the time values are around midnight, so the average should also be somewhere around midnight.
I have also tried solutions from ChatGPT, converting HH:MM:SS to total minutes and back to HH:MM:SS but that also is giving the same answer. I am really stuck. Any help is greatly appreciated.
Asked Feb 6 at 21:05 by rookie11
4 Answers
If you assume that all the times fall around midnight, you should specify that 00:MM:SS denotes a time on the "next day" rather than "today" (the reference date).
For example, you can try the following workaround:
with(
  data,
  format(
    # Times whose hour starts with "0" are placed on the next day
    mean(as.POSIXct(paste(Sys.Date() + startsWith(time, "0"), time))),
    "%H:%M:%S"
  )
)
which shows
"00:03:29"
You don't talk about dates. It makes a huge difference whether the date is included in the calculation or whether you are only looking at time of day. Others have suggested ways to handle the date; I'll talk about the case where the date is irrelevant.
In that case, time of day data is circular or directional data. 00:00:00 is one minute after 23:59:00, no matter which day they occurred on, and the mean should be 23:59:30.
To handle this you could use the directional mean. That needs conversion of times to directions and back. Here's code to do it:
timeToDirection <- function(times) {
  # Fraction of a day since midnight, scaled to radians
  result <- difftime(times,
                     as.POSIXct("00:00:00", format = "%H:%M:%S"),
                     units = "days") * 2 * pi
  attr(result, "units") <- NULL
  class(result) <- NULL
  result
}

directionToTime <- function(direction) {
  # Convert direction to days and add to midnight
  days <- direction / 2 / pi
  as.POSIXct("00:00:00", format = "%H:%M:%S") +
    as.difftime(days, units = "days")
}

directionalMean <- function(dirs) {
  atan2(mean(sin(dirs)), mean(cos(dirs)))
}

timeMean <- function(times) {
  times |>
    timeToDirection() |>
    directionalMean() |>
    directionToTime()
}

times <- as.POSIXct(c("23:49:47", "23:49:37", "23:39:02",
                      "23:46:37", "00:27:40", "00:10:22", "00:41:22"),
                    format = "%H:%M:%S")
timeMean(times)
#> [1] "2025-02-07 00:03:28 EST"
Created on 2025-02-07 with reprex v2.1.1
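As a quick check of the 23:59:00 and 00:00:00 example above, here is a minimal base R sketch that works on seconds since midnight rather than POSIXct values, so it does not depend on the current date or time zone. The helper names to_seconds, circular_mean_seconds and to_hms are made up for this illustration and are not part of any package.

```r
# Seconds since midnight for "HH:MM:SS" strings (hypothetical helper)
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Circular mean of times of day, returned as seconds in [0, 86400)
circular_mean_seconds <- function(secs) {
  theta <- secs / 86400 * 2 * pi                      # seconds -> radians
  mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))
  (mean_angle / (2 * pi) * 86400) %% 86400            # radians -> seconds
}

# Format seconds since midnight as "HH:MM:SS", rounded to whole seconds
to_hms <- function(secs) {
  secs <- round(secs) %% 86400
  sprintf("%02d:%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
}

to_hms(circular_mean_seconds(to_seconds(c("23:59:00", "00:00:00"))))
#> [1] "23:59:30"
```

The plain arithmetic mean of the same two values is 11:59:30, which is the midday artefact the circular mean avoids.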
You could also do it with lubridate
library(lubridate)
time_seconds <- period_to_seconds(hms(data$time)) # Convert time column to seconds
time_seconds <- ifelse(time_seconds < 12 * 3600, time_seconds + 24 * 3600, time_seconds) # Shift times before noon onto the next day
mean_time <- mean(time_seconds) %% (24 * 3600) # Compute mean time and wrap within a day
format(as.POSIXct(mean_time, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S") # Format as HH:MM:SS
[1] "00:03:29"
# or more precise
seconds_to_period(mean_time)
[1] "3M 29.5714285714348S"
Let's visualize this
If we plot your times on a clock, we can clearly see that the average time of 3 min 29 sec (green) after midnight seems realistic:
Add the day to your data.
data <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  date = c(rep(as.Date("2025-02-06"), 4), rep(as.Date("2025-02-07"), 3)),
  time = c("23:49:47", "23:49:37", "23:39:02", "23:46:37", "00:27:40", "00:10:22", "00:41:22"))
paste(data$date, data$time) |>
as.POSIXct() |>
mean() |>
format("%H:%M:%S")
# [1] "00:03:29"
If you don't have or care about the day, you can use circular arithmetic (see this answer).
library(lubridate)
seconds <- period_to_seconds(hms(data$time)) ## "HH:MM:SS" strings -> seconds since midnight
conv <- 2*pi/86400 ## seconds -> radians
s <- (86400 + Arg(mean(exp(conv*(seconds)*1i)))/conv) %% 86400
round(seconds_to_period(s))
#[1] "3M 28S"
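The question asks for one mean per participant. Below is a minimal sketch of how the circular mean might be applied per ID with base R's tapply, without lubridate. The two-participant data frame df and the helper circ_mean_hms are made up for illustration; only the complex-exponential formula is taken from the code above.

```r
# Hypothetical long-format data: two participants, three times each
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2),
  time = c("23:50:00", "00:00:00", "00:10:00",
           "11:00:00", "12:00:00", "13:00:00")
)

# Circular mean of "HH:MM:SS" strings, returned as an "HH:MM:SS" string
circ_mean_hms <- function(x) {
  # Seconds since midnight for each time
  secs <- vapply(strsplit(x, ":", fixed = TRUE),
                 function(p) sum(as.numeric(p) * c(3600, 60, 1)),
                 numeric(1))
  conv <- 2 * pi / 86400                      # seconds -> radians
  # Mean direction of the unit vectors, converted back to seconds in [0, 86400)
  s <- (Arg(mean(exp(1i * conv * secs))) / conv) %% 86400
  s <- round(s) %% 86400                      # whole seconds; 86400 wraps to 0
  sprintf("%02d:%02d:%02d", s %/% 3600, (s %% 3600) %/% 60, s %% 60)
}

# One circular mean per participant
means <- tapply(df$time, df$ID, circ_mean_hms)
means
```

For this made-up data the result is "00:00:00" for participant 1 and "12:00:00" for participant 2. With a real long dataframe, the same call would presumably be tapply(data$time, data$ID, circ_mean_hms).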
strptime(data$time, "%H:%M:%S"). The average of 3 times at nearly 00:00:00 and 4 times at 23:59 on the same day is about 2pm on that day. You need the day as well as the time to make this work as intended. – thelatemail, commented Feb 6 at 21:31
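The arithmetic behind this comment can be checked directly. This is a small base R sketch that treats every time as seconds since midnight on one shared day, which is effectively what happens when no date is supplied; the variable names are chosen for illustration only.

```r
times <- c("23:49:47", "23:49:37", "23:39:02", "23:46:37",
           "00:27:40", "00:10:22", "00:41:22")

# Seconds since midnight, all on the same (unspecified) day
secs <- vapply(strsplit(times, ":", fixed = TRUE),
               function(p) sum(as.numeric(p) * c(3600, 60, 1)),
               numeric(1))

# The four late-evening values are close to 86400 and the three values after
# midnight are close to 0, so the plain mean lands in the afternoon
naive <- mean(secs)
naive
#> [1] 49581

sprintf("%02d:%02d:%02d", naive %/% 3600, (naive %% 3600) %/% 60, naive %% 60)
#> [1] "13:46:21"
```

This reproduces the "13:46:21" from the question, which is why either a date or a circular mean is needed.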