I have a data frame as below
df1 <- read.table(text = "entity_id state last_changed DT.diff
sensor.kincony02_temperature03 20.4 '2025-02-04 23:00:15' 15.188
sensor.kincony02_temperature03 20.3 '2025-02-04 23:08:15' 479.849
sensor.kincony02_temperature03 20.2 '2025-02-04 23:10:15' 120.115
sensor.kincony02_temperature03 20.3 '2025-02-04 23:15:15' 300.136
sensor.kincony02_temperature03 20.4 '2025-02-04 23:18:15' 180.020
sensor.kincony02_temperature03 20.5 '2025-02-04 23:21:15' 180.020
sensor.kincony02_temperature03 20.6 '2025-02-04 23:22:15' 59.904
sensor.kincony02_temperature03 20.7 '2025-02-04 23:23:15' 59.904
sensor.kincony02_temperature03 20.8 '2025-02-04 23:25:15' 120.115
sensor.kincony02_temperature03 20.9 '2025-02-04 23:27:15' 119.809
sensor.kincony02_temperature03 21.0 '2025-02-04 23:30:15' 179.979
sensor.kincony02_temperature03 21.1 '2025-02-04 23:31:15' 60.252
sensor.kincony02_temperature03 21.2 '2025-02-04 23:35:15' 239.921
sensor.kincony02_temperature03 21.3 '2025-02-04 23:46:15' 659.865
sensor.kincony02_temperature03 21.2 '2025-02-04 23:47:15' 60.008
sensor.kincony02_temperature03 21.1 '2025-02-04 23:51:15' 240.025
sensor.kincony02_temperature03 21.2 '2025-02-04 23:53:15' 120.218
sensor.kincony02_temperature03 21.1 '2025-02-04 23:54:15' 59.903
sensor.kincony02_temperature03 21.0 '2025-02-05 00:02:15' 479.803
sensor.kincony02_temperature03 20.9 '2025-02-05 00:06:15' 239.999
sensor.kincony02_temperature03 20.8 '2025-02-05 00:11:15' 300.007
sensor.kincony02_temperature03 20.7 '2025-02-05 00:13:15' 119.997
sensor.kincony02_temperature03 20.6 '2025-02-05 00:14:15' 60.008
sensor.kincony02_temperature03 20.5 '2025-02-05 00:15:15' 60.002
sensor.kincony02_temperature03 20.4 '2025-02-05 00:17:15' 119.999
sensor.kincony02_temperature03 20.3 '2025-02-05 00:19:15' 119.996
sensor.kincony02_temperature03 20.2 '2025-02-05 00:20:15' 59.998
sensor.kincony02_temperature03 20.1 '2025-02-05 00:24:15' 240.009
sensor.kincony02_temperature03 20.0 '2025-02-05 00:27:15' 179.997", header = TRUE)
I need to add rows so that the data frame contains values for each minute. Currently, the device only records the moment of change of the measured value, which means that between the recorded times the measured value did not change.
The expected data frame for the first few rows should look like this:
df1.1 <- read.table(text = "entity_id state last_changed
sensor.kincony02_temperature03 20.4 '2025-02-04 23:00:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:01:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:02:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:03:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:04:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:05:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:06:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:07:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:08:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:09:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:10:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:11:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:12:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:13:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:14:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:15:15'", header = TRUE)
I have a data frame as below
df1 <- read.table(text = "entity_id state last_changed DT.diff
sensor.kincony02_temperature03 20.4 '2025-02-04 23:00:15' 15.188
sensor.kincony02_temperature03 20.3 '2025-02-04 23:08:15' 479.849
sensor.kincony02_temperature03 20.2 '2025-02-04 23:10:15' 120.115
sensor.kincony02_temperature03 20.3 '2025-02-04 23:15:15' 300.136
sensor.kincony02_temperature03 20.4 '2025-02-04 23:18:15' 180.020
sensor.kincony02_temperature03 20.5 '2025-02-04 23:21:15' 180.020
sensor.kincony02_temperature03 20.6 '2025-02-04 23:22:15' 59.904
sensor.kincony02_temperature03 20.7 '2025-02-04 23:23:15' 59.904
sensor.kincony02_temperature03 20.8 '2025-02-04 23:25:15' 120.115
sensor.kincony02_temperature03 20.9 '2025-02-04 23:27:15' 119.809
sensor.kincony02_temperature03 21.0 '2025-02-04 23:30:15' 179.979
sensor.kincony02_temperature03 21.1 '2025-02-04 23:31:15' 60.252
sensor.kincony02_temperature03 21.2 '2025-02-04 23:35:15' 239.921
sensor.kincony02_temperature03 21.3 '2025-02-04 23:46:15' 659.865
sensor.kincony02_temperature03 21.2 '2025-02-04 23:47:15' 60.008
sensor.kincony02_temperature03 21.1 '2025-02-04 23:51:15' 240.025
sensor.kincony02_temperature03 21.2 '2025-02-04 23:53:15' 120.218
sensor.kincony02_temperature03 21.1 '2025-02-04 23:54:15' 59.903
sensor.kincony02_temperature03 21.0 '2025-02-05 00:02:15' 479.803
sensor.kincony02_temperature03 20.9 '2025-02-05 00:06:15' 239.999
sensor.kincony02_temperature03 20.8 '2025-02-05 00:11:15' 300.007
sensor.kincony02_temperature03 20.7 '2025-02-05 00:13:15' 119.997
sensor.kincony02_temperature03 20.6 '2025-02-05 00:14:15' 60.008
sensor.kincony02_temperature03 20.5 '2025-02-05 00:15:15' 60.002
sensor.kincony02_temperature03 20.4 '2025-02-05 00:17:15' 119.999
sensor.kincony02_temperature03 20.3 '2025-02-05 00:19:15' 119.996
sensor.kincony02_temperature03 20.2 '2025-02-05 00:20:15' 59.998
sensor.kincony02_temperature03 20.1 '2025-02-05 00:24:15' 240.009
sensor.kincony02_temperature03 20.0 '2025-02-05 00:27:15' 179.997", header = TRUE)
I need to add rows so that the data frame contains values for each minute. Currently, the device only records the moment of change of the measured value, which means that between the recorded times the measured value did not change.
The expected data frame for the first few rows should look like this:
df1.1 <- read.table(text = "entity_id state last_changed
sensor.kincony02_temperature03 20.4 '2025-02-04 23:00:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:01:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:02:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:03:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:04:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:05:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:06:15'
sensor.kincony02_temperature03 20.4 '2025-02-04 23:07:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:08:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:09:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:10:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:11:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:12:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:13:15'
sensor.kincony02_temperature03 20.2 '2025-02-04 23:14:15'
sensor.kincony02_temperature03 20.3 '2025-02-04 23:15:15'", header = TRUE)
Share
Improve this question
asked Feb 7 at 13:06
GrBaGrBa
6494 silver badges12 bronze badges
1
- 1 While this is a reproductive example, it is better to provide a minimal example. In your case, you don't need to present the first 29 columns to exemplify your problem. – Maël Commented Feb 7 at 13:22
2 Answers
Reset to default 2You can first format date-times as.POSIXct
, then merge
with a data.frame
of a date-time sequence over the range
of last_changed variable, and finally—using zoo::na.locf
—replacing each NA with the most recent non-NA prior to it.
> df1 |>
+ transform(last_changed=last_changed |> as.POSIXct()) |>
+ merge(
+ data.frame(last_changed=do.call('seq', c(as.list(range(df1$last_changed)),
+ by='min'))), all=TRUE) |>
+ transform(state=zoo::na.locf(state),
+ entity_id=zoo::na.locf(entity_id),
+ DT.diff=NULL)
last_changed entity_id state
1 2025-02-04 23:00:15 sensor.kincony02_temperature03 20.4
2 2025-02-04 23:01:15 sensor.kincony02_temperature03 20.4
3 2025-02-04 23:02:15 sensor.kincony02_temperature03 20.4
4 2025-02-04 23:03:15 sensor.kincony02_temperature03 20.4
5 2025-02-04 23:04:15 sensor.kincony02_temperature03 20.4
6 2025-02-04 23:05:15 sensor.kincony02_temperature03 20.4
7 2025-02-04 23:06:15 sensor.kincony02_temperature03 20.4
8 2025-02-04 23:07:15 sensor.kincony02_temperature03 20.4
9 2025-02-04 23:08:15 sensor.kincony02_temperature03 20.3
10 2025-02-04 23:09:15 sensor.kincony02_temperature03 20.3
11 2025-02-04 23:10:15 sensor.kincony02_temperature03 20.2
12 2025-02-04 23:11:15 sensor.kincony02_temperature03 20.2
13 2025-02-04 23:12:15 sensor.kincony02_temperature03 20.2
[...]
You need to complete
your data and fill
values with the last observation carried forward:
library(dplyr)
library(tidyr)
df1 |>
mutate(last_changed = as.POSIXct(last_changed)) |>
complete(last_changed = full_seq(last_changed, 60)) |>
fill(c(entity_id, state, DT.diff))
# head()
# last_changed entity_id state DT.diff
# 1 2025-02-04 23:00:15 sensor.kincony02_temperature03 20.4 15.188
# 2 2025-02-04 23:01:15 sensor.kincony02_temperature03 20.4 15.188
# 3 2025-02-04 23:02:15 sensor.kincony02_temperature03 20.4 15.188
# 4 2025-02-04 23:03:15 sensor.kincony02_temperature03 20.4 15.188
# 5 2025-02-04 23:04:15 sensor.kincony02_temperature03 20.4 15.188
# 6 2025-02-04 23:05:15 sensor.kincony02_temperature03 20.4 15.188