datetime - Calculate mean time of day in %H:%M:%S in R - Stack Overflow

I am trying to calculate the mean time of day a particular activity occurs per participant in a long dataframe. Each participant has 7 timepoints of data, and I need to calculate the mean time of day across these timepoints. The 'time' variable is in 24hr time and the time of day is important for the output.

Create some example data:

data <- data.frame(
  ID = rep(1, 7),
  time = c("23:49:47", "23:49:37", "23:39:02", "23:46:37",
           "00:27:40", "00:10:22", "00:41:22"))

Try to calculate mean:

format(mean(strptime(data$time, "%H:%M:%S")), "%H:%M:%S")

This keeps returning "13:46:21" (1:46 pm), which cannot be right: all the time values are close to midnight, so the average should be close to midnight too.

I have also tried solutions from ChatGPT that convert HH:MM:SS to total minutes and back to HH:MM:SS, but they give the same answer. I am really stuck. Any help is greatly appreciated.

Asked Feb 6 at 21:05 by rookie11
  • 1 The average is correct as shown given the results of strptime(data$time, "%H:%M:%S"). The average of 3 times at nearly 00:00:00 and 4 times at 23:59 on the same day is about 2pm on that day. You need the day as well as the time to make this work as intended. – thelatemail Commented Feb 6 at 21:31
  • Interesting question. My advice is to think about mapping the time of day onto a circle, and then finding the mean value as the point on the circle whose angle equals the mean of the angles of the points on the circle. For values near midnight, the angles will be small negative values (just before midnight) or small positive values (just after midnight). Working with a circle doesn't entirely make the problem go away, but it could help you think about where exactly the problem is, and what to do about it. – Robert Dodier Commented Feb 6 at 21:57
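To make the arithmetic in the first comment concrete, here is a minimal base-R sketch. The helper names `to_seconds` and `format_hms` are made up for illustration and do not appear in the thread. It converts each time to seconds after midnight on a single day and averages those, which reproduces the 13:46:21 reported in the question.

```r
# The seven example times from the question
times <- c("23:49:47", "23:49:37", "23:39:02", "23:46:37",
           "00:27:40", "00:10:22", "00:41:22")

# Hypothetical helper: "HH:MM:SS" strings -> seconds after midnight
to_seconds <- function(x) {
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  as.vector(parts %*% c(3600, 60, 1))
}

# Hypothetical helper: seconds after midnight -> "HH:MM:SS"
format_hms <- function(s) {
  s <- as.integer(round(s)) %% 86400L
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

secs <- to_seconds(times)

# Plain arithmetic mean: four values near 86400 and three near 0
# average out to the early afternoon, not to midnight
naive <- format_hms(mean(secs))
naive
#> [1] "13:46:21"
```

The mean is computed correctly; the problem is that the numbers being averaged sit at opposite ends of the same day.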

4 Answers

If you can assume that all the times fall around midnight, you should treat 00:MM:SS as belonging to the next day rather than to today (the reference date).

You can try the following workaround, for example:

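with(
    data,
    format(
        # startsWith(time, "0") is TRUE for the after-midnight times,
        # so those get tomorrow's date and the rest get today's
        mean(as.POSIXct(paste(Sys.Date() + startsWith(time, "0"), time))),
        "%H:%M:%S"
    )
)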

which shows

"00:03:29"

You don't talk about dates. It makes a huge difference whether the date is included in the calculation or whether you are only looking at time of day. Others have suggested ways to handle the date; I'll talk about the case where the date is irrelevant.

In that case, time of day data is circular or directional data. 00:00:00 is one minute after 23:59:00, no matter which day they occurred on, and the mean should be 23:59:30.

To handle this you could use the directional mean. That needs conversion of times to directions and back. Here's code to do it:

timeToDirection <- function(times) {
  result <- difftime(times, 
           as.POSIXct("00:00:00", format = "%H:%M:%S"), 
           units = "days")*2*pi
  attr(result, "units") <- NULL
  class(result) <- NULL
  result
}

directionToTime <- function(direction) {
  # Convert direction to days and add to midnight
  days <- direction/2/pi
  as.POSIXct("00:00:00", format = "%H:%M:%S") + 
    as.difftime(days, units = "days")
}

directionalMean <- function(dirs) {
  atan2(mean(sin(dirs)), mean(cos(dirs)))
}

timeMean <- function(times) {
  times |> 
    timeToDirection() |> 
    directionalMean() |> 
    directionToTime()
}

times <- as.POSIXct(c("23:49:47", "23:49:37", "23:39:02", 
"23:46:37", "00:27:40", "00:10:22", "00:41:22"),
                  format = "%H:%M:%S")

timeMean(times)
#> [1] "2025-02-07 00:03:28 EST"

Created on 2025-02-07 with reprex v2.1.1
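As a quick sanity check of the 23:59:00 / 00:00:00 example above, here is a date-free sketch of the same directional mean in base R. It works on seconds after midnight rather than POSIXct values, so the result does not depend on today's date or time zone. `clock_mean` is a hypothetical name, not part of the code above.

```r
# Hypothetical helper: directional mean of "HH:MM:SS" strings,
# returned as an "HH:MM:SS" string
clock_mean <- function(x) {
  # Parse the strings into seconds after midnight
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  secs <- as.vector(parts %*% c(3600, 60, 1))

  # Map seconds onto angles on a 24-hour circle
  theta <- secs * 2 * pi / 86400

  # Mean direction: angle of the averaged unit vectors
  mean_angle <- atan2(mean(sin(theta)), mean(cos(theta)))

  # Back to whole seconds, wrapped into [0, 86400)
  s <- as.integer(round(mean_angle * 86400 / (2 * pi))) %% 86400L
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

check <- clock_mean(c("23:59:00", "00:00:00"))
check
#> [1] "23:59:30"
```

One caveat: if the times are spread so evenly around the clock that the averaged vector is close to zero (for example 06:00:00 and 18:00:00), the mean direction is not meaningful.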

You could also do it with lubridate

library(lubridate)

time_seconds <- period_to_seconds(hms(data$time)) # Convert time column to seconds
time_seconds <- ifelse(time_seconds < 12 * 3600, time_seconds + 24 * 3600, time_seconds) # Shift times before noon into the next day
mean_time <- mean(time_seconds) %% (24 * 3600) # Compute mean time and wrap within a day
format(as.POSIXct(mean_time, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S") # Format as HH:MM:SS
#> [1] "00:03:29"

# or more precise
seconds_to_period(mean_time)
#> [1] "3M 29.5714285714348S"

Let's visualize this

If we plot your times on a clock, we can clearly see that the average time of 3 min 29 sec after midnight (green) seems realistic.

[Figure: the times plotted on a clock face, with the mean shown in green]

Add the day to your data.

data <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 1),
  date = as.Date(rep(c("2025-02-06", "2025-02-07"), times = c(4, 3))),
  time = c("23:49:47", "23:49:37", "23:39:02", "23:46:37", "00:27:40", "00:10:22", "00:41:22"))

paste(data$date, data$time) |>
  as.POSIXct() |>
  mean() |>
  format("%H:%M:%S")
# [1] "00:03:29"

If you don't have or care about the day, you can use circular arithmetic (see this answer).

library(lubridate)
seconds <- period_to_seconds(hms(data$time))
conv <- 2*pi/86400 ## seconds -> radians
s <- (86400 + Arg(mean(exp(conv*(seconds)*1i)))/conv) %% 86400
round(seconds_to_period(s))
#[1] "3M 28S"
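The question asks for the mean per participant, which none of the snippets above show directly. A minimal base-R sketch of one way to do it follows, using the same complex-exponential form of the circular mean. It assumes the date is irrelevant. `circular_mean_time` is a hypothetical helper, and the rows for participant 2 are made-up data added only to show the grouping.

```r
# Hypothetical helper: circular mean of "HH:MM:SS" strings, as "HH:MM:SS"
circular_mean_time <- function(x) {
  # Parse the strings into seconds after midnight
  parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
  storage.mode(parts) <- "numeric"
  secs <- as.vector(parts %*% c(3600, 60, 1))

  conv <- 2 * pi / 86400                        # seconds -> radians
  s <- Arg(mean(exp(1i * conv * secs))) / conv  # mean angle, back in seconds

  # Round to whole seconds and wrap into [0, 86400)
  s <- as.integer(round(s)) %% 86400L
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

data <- data.frame(
  ID = c(rep(1, 7), rep(2, 2)),
  time = c("23:49:47", "23:49:37", "23:39:02", "23:46:37",
           "00:27:40", "00:10:22", "00:41:22",
           "11:50:00", "12:10:00"),  # participant 2: made-up illustration data
  stringsAsFactors = FALSE
)

# One circular mean per participant, returned as a named character vector
mean_by_id <- sapply(split(data$time, data$ID), circular_mean_time)
mean_by_id
```

Using `split()` with `sapply()` keeps this dependency-free; the same helper should also work inside `aggregate()` or a dplyr `summarise()` call.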