最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Insert appropriate number of rows into data frame based on date differences - Stack Overflow

programmeradmin5浏览0评论

I have a data frame as below

df1 <- read.table(text = "entity_id state        last_changed DT.diff
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'  15.188
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15' 479.849
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15' 120.115
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15' 300.136
sensor.kincony02_temperature03  20.4 '2025-02-04 23:18:15' 180.020
sensor.kincony02_temperature03  20.5 '2025-02-04 23:21:15' 180.020
sensor.kincony02_temperature03  20.6 '2025-02-04 23:22:15'  59.904
sensor.kincony02_temperature03  20.7 '2025-02-04 23:23:15'  59.904
sensor.kincony02_temperature03  20.8 '2025-02-04 23:25:15' 120.115
sensor.kincony02_temperature03  20.9 '2025-02-04 23:27:15' 119.809
sensor.kincony02_temperature03  21.0 '2025-02-04 23:30:15' 179.979
sensor.kincony02_temperature03  21.1 '2025-02-04 23:31:15'  60.252
sensor.kincony02_temperature03  21.2 '2025-02-04 23:35:15' 239.921
sensor.kincony02_temperature03  21.3 '2025-02-04 23:46:15' 659.865
sensor.kincony02_temperature03  21.2 '2025-02-04 23:47:15'  60.008
sensor.kincony02_temperature03  21.1 '2025-02-04 23:51:15' 240.025
sensor.kincony02_temperature03  21.2 '2025-02-04 23:53:15' 120.218
sensor.kincony02_temperature03  21.1 '2025-02-04 23:54:15'  59.903
sensor.kincony02_temperature03  21.0 '2025-02-05 00:02:15' 479.803
sensor.kincony02_temperature03  20.9 '2025-02-05 00:06:15' 239.999
sensor.kincony02_temperature03  20.8 '2025-02-05 00:11:15' 300.007
sensor.kincony02_temperature03  20.7 '2025-02-05 00:13:15' 119.997
sensor.kincony02_temperature03  20.6 '2025-02-05 00:14:15'  60.008
sensor.kincony02_temperature03  20.5 '2025-02-05 00:15:15'  60.002
sensor.kincony02_temperature03  20.4 '2025-02-05 00:17:15' 119.999
sensor.kincony02_temperature03  20.3 '2025-02-05 00:19:15' 119.996
sensor.kincony02_temperature03  20.2 '2025-02-05 00:20:15'  59.998
sensor.kincony02_temperature03  20.1 '2025-02-05 00:24:15' 240.009
sensor.kincony02_temperature03  20.0 '2025-02-05 00:27:15' 179.997", header = TRUE) 

I need to add rows so that the data frame contains values ​​for each minute. Currently, the device only records the moment of change of the measured value, which means that between the recorded times the measured value did not change.

The expected data frame for the first few rows should look like this:

df1.1 <- read.table(text = "entity_id state        last_changed 
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:01:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:02:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:03:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:04:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:05:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:06:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:07:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:09:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:11:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:12:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:13:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:14:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15'", header = TRUE) 

I have a data frame as below

df1 <- read.table(text = "entity_id state        last_changed DT.diff
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'  15.188
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15' 479.849
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15' 120.115
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15' 300.136
sensor.kincony02_temperature03  20.4 '2025-02-04 23:18:15' 180.020
sensor.kincony02_temperature03  20.5 '2025-02-04 23:21:15' 180.020
sensor.kincony02_temperature03  20.6 '2025-02-04 23:22:15'  59.904
sensor.kincony02_temperature03  20.7 '2025-02-04 23:23:15'  59.904
sensor.kincony02_temperature03  20.8 '2025-02-04 23:25:15' 120.115
sensor.kincony02_temperature03  20.9 '2025-02-04 23:27:15' 119.809
sensor.kincony02_temperature03  21.0 '2025-02-04 23:30:15' 179.979
sensor.kincony02_temperature03  21.1 '2025-02-04 23:31:15'  60.252
sensor.kincony02_temperature03  21.2 '2025-02-04 23:35:15' 239.921
sensor.kincony02_temperature03  21.3 '2025-02-04 23:46:15' 659.865
sensor.kincony02_temperature03  21.2 '2025-02-04 23:47:15'  60.008
sensor.kincony02_temperature03  21.1 '2025-02-04 23:51:15' 240.025
sensor.kincony02_temperature03  21.2 '2025-02-04 23:53:15' 120.218
sensor.kincony02_temperature03  21.1 '2025-02-04 23:54:15'  59.903
sensor.kincony02_temperature03  21.0 '2025-02-05 00:02:15' 479.803
sensor.kincony02_temperature03  20.9 '2025-02-05 00:06:15' 239.999
sensor.kincony02_temperature03  20.8 '2025-02-05 00:11:15' 300.007
sensor.kincony02_temperature03  20.7 '2025-02-05 00:13:15' 119.997
sensor.kincony02_temperature03  20.6 '2025-02-05 00:14:15'  60.008
sensor.kincony02_temperature03  20.5 '2025-02-05 00:15:15'  60.002
sensor.kincony02_temperature03  20.4 '2025-02-05 00:17:15' 119.999
sensor.kincony02_temperature03  20.3 '2025-02-05 00:19:15' 119.996
sensor.kincony02_temperature03  20.2 '2025-02-05 00:20:15'  59.998
sensor.kincony02_temperature03  20.1 '2025-02-05 00:24:15' 240.009
sensor.kincony02_temperature03  20.0 '2025-02-05 00:27:15' 179.997", header = TRUE) 

I need to add rows so that the data frame contains values ​​for each minute. Currently, the device only records the moment of change of the measured value, which means that between the recorded times the measured value did not change.

The expected data frame for the first few rows should look like this:

df1.1 <- read.table(text = "entity_id state        last_changed 
sensor.kincony02_temperature03  20.4 '2025-02-04 23:00:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:01:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:02:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:03:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:04:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:05:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:06:15'
sensor.kincony02_temperature03  20.4 '2025-02-04 23:07:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:08:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:09:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:10:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:11:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:12:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:13:15'
sensor.kincony02_temperature03  20.2 '2025-02-04 23:14:15'
sensor.kincony02_temperature03  20.3 '2025-02-04 23:15:15'", header = TRUE) 
Share Improve this question asked Feb 7 at 13:06 GrBaGrBa 6494 silver badges12 bronze badges 1
  • 1 While this is a reproductive example, it is better to provide a minimal example. In your case, you don't need to present the first 29 columns to exemplify your problem. – Maël Commented Feb 7 at 13:22
Add a comment  | 

2 Answers 2

Reset to default 2

You can first format date-times as.POSIXct, then merge with a data.frame of a date-time sequence over the range of last_changed variable, and finally—using zoo::na.locf—replacing each NA with the most recent non-NA prior to it.

> df1 |> 
+   transform(last_changed=last_changed |> as.POSIXct()) |> 
+   merge(
+     data.frame(last_changed=do.call('seq', c(as.list(range(df1$last_changed)), 
+                                              by='min'))), all=TRUE) |> 
+   transform(state=zoo::na.locf(state), 
+             entity_id=zoo::na.locf(entity_id), 
+             DT.diff=NULL)
          last_changed                      entity_id state
1  2025-02-04 23:00:15 sensor.kincony02_temperature03  20.4
2  2025-02-04 23:01:15 sensor.kincony02_temperature03  20.4
3  2025-02-04 23:02:15 sensor.kincony02_temperature03  20.4
4  2025-02-04 23:03:15 sensor.kincony02_temperature03  20.4
5  2025-02-04 23:04:15 sensor.kincony02_temperature03  20.4
6  2025-02-04 23:05:15 sensor.kincony02_temperature03  20.4
7  2025-02-04 23:06:15 sensor.kincony02_temperature03  20.4
8  2025-02-04 23:07:15 sensor.kincony02_temperature03  20.4
9  2025-02-04 23:08:15 sensor.kincony02_temperature03  20.3
10 2025-02-04 23:09:15 sensor.kincony02_temperature03  20.3
11 2025-02-04 23:10:15 sensor.kincony02_temperature03  20.2
12 2025-02-04 23:11:15 sensor.kincony02_temperature03  20.2
13 2025-02-04 23:12:15 sensor.kincony02_temperature03  20.2
[...]

You need to complete your data and fill values with the last observation carried forward:

library(dplyr)
library(tidyr)

df1 |>
  mutate(last_changed = as.POSIXct(last_changed)) |>
  complete(last_changed = full_seq(last_changed, 60)) |>
  fill(c(entity_id, state, DT.diff))


# head()
#          last_changed                      entity_id state DT.diff
# 1 2025-02-04 23:00:15 sensor.kincony02_temperature03  20.4  15.188
# 2 2025-02-04 23:01:15 sensor.kincony02_temperature03  20.4  15.188
# 3 2025-02-04 23:02:15 sensor.kincony02_temperature03  20.4  15.188
# 4 2025-02-04 23:03:15 sensor.kincony02_temperature03  20.4  15.188
# 5 2025-02-04 23:04:15 sensor.kincony02_temperature03  20.4  15.188
# 6 2025-02-04 23:05:15 sensor.kincony02_temperature03  20.4  15.188
发布评论

评论列表(0)

  1. 暂无评论