r - Using survSplit() on recurrent events data - Stack Overflow

I want to split a survival dataset at event times. In the simple case of one row of data per person, this means converting the data to counting-process form (multiple rows per person), where each person's observation time is split at the event times of ALL subjects (up to the end of that person's own observation time).

I can do this easily when there is one row per person, i.e. each person has a recorded observation time and a status of either 1 (had the event) or 0 (censored).
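For reference, the single-row case can be sketched as follows. This is an illustration, not the asker's code: one_row is a hypothetical reduced data set holding only each subject's total follow-up and final status, taken from the last row of subjects 1 to 3 in the example data further down. Only subject 1 has an event (at time 88), so subjects 2 and 3 are the ones split at 88.

```r
library(survival)

# Hypothetical one-row-per-person version of the example data:
# total follow-up (time1) and final status for subjects 1-3
one_row <- data.frame(ID     = 1:3,
                      time1  = c(88, 91, 112),
                      status = c(1, 0, 0))

# Event times pooled over ALL subjects
event_times <- sort(unique(one_row$time1[one_row$status == 1]))

# Right-censored input: each row is treated as (0, time1] and cut at every
# event time that falls strictly inside it
one_split <- survSplit(Surv(time1, status) ~ ., data = one_row,
                       cut = event_times,
                       start = "tstart", end = "time1", event = "status")
one_split
```

Subject 1's own event time (88) coincides with the end of its interval, so its row is left whole.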

But I would also like to do this with recurrent-events data. In that case, each person potentially has multiple rows recording multiple events (the last time may be an event or a censoring time).

survSplit() seems to split the data row by row, not by ID (as I had naively assumed). Is there a way to make survSplit() split time only within each individual, rather than separately within every event experienced by that individual?

Some example code below:

library(survival)
library(dplyr)

dat <-  structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), 
                       age = c(43L, 43L, 43L, 43L, 43L, 43L, 43L, 41L, 41L, 41L, 41L), 
                       treat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
                                         levels = c("old", "new"), class = "factor"), 
                       time0 = c(0L, 6L, 9L, 56L, 0L, 42L, 87L, 0L, 15L, 17L, 36L), 
                       time1 = c(6L, 9L, 56L, 88L, 42L, 87L, 91L, 15L, 17L, 36L, 112L), 
                       status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0), 
                       event = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L)),
                  datalabel = "Chapter 9 Exercises", time.stamp = " 7 Dec 1999 08:53", formats = c("%9.0g", "%9.0g", "%9.0g", "%9.0g", "%9.0g", "%19.0g", "%9.0g"), 
                  types = c(105L, 98L, 98L, 105L, 105L, 98L, 98L), 
                  val.labels = c("", "", "oldnew", "", "", "censor", ""), 
                  var.labels = c("Subject Identification", "Age", "Treatment Assignment", "Time of Last Episode", "Time of Current Episode or censoring", "Indicator for Soreness Episode or censoring", "Soreness Episode Number"), 
                  row.names = c("3", "4", "1", "2", "5", "7", "6", "8", "9", "10", "11"), 
                  version = 6L, label.table = list(oldnew = structure(0:1, names = c("new", "old")), 
                                                   censor = structure(0:1, names = c("censored", "experienced"))), class = "data.frame")


# Split at event times
event_times <- sort(unique(with(dat, time1[status == 1])))
# Create a new data frame in counting-process (CP) form, split at every event time
dat2 <- survSplit(Surv(time1, status) ~., dat, cut = event_times)
# This is NOT what I want: it splits each row (event), not each ID.

Instead, below is a screenshot of the expanded dataset I would like to recreate for the first 3 subjects; I built it manually in Excel.

There does not seem to be a way to do this in survSplit(), unless I have missed something?
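A quick check of what the call above does, sketched on the example data reduced to the columns it needs (wrong and cp are illustrative names): with the right-censored response Surv(time1, status), survSplit() treats every row as an interval starting at 0, so all 11 rows restart at time 0 and a subject's later episodes overlap the earlier ones. Supplying the start time as well, Surv(time0, time1, status), keeps each row as its own (time0, time1] interval, so only each subject's first interval starts at 0.

```r
library(survival)

# Example data reduced to the columns used here
dat <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  time0  = c(0, 6, 9, 56, 0, 42, 87, 0, 15, 17, 36),
  time1  = c(6, 9, 56, 88, 42, 87, 91, 15, 17, 36, 112),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
)
event_times <- sort(unique(dat$time1[dat$status == 1]))

# Right-censored response: time0 is ignored and each row becomes (0, time1]
wrong <- survSplit(Surv(time1, status) ~ ID, data = dat, cut = event_times,
                   start = "tstart", end = "time1", event = "status")

# Counting-process (start, stop] response: each row stays (time0, time1]
cp <- survSplit(Surv(time0, time1, status) ~ ID, data = dat, cut = event_times,
                start = "time0", end = "time1", event = "status")

sum(wrong$tstart == 0)  # 11: one interval starting at 0 per original row
sum(cp$time0 == 0)      # 3: one interval starting at 0 per subject
```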

Asked Feb 16 at 22:32 by LucaS; edited Feb 17 at 5:19.
  • Can you please manually construct the output that you DO want (i.e. not just the output that you DON'T want)? – langtang, Feb 16 at 23:25
  • I can't show you what it should look like in this case (as I don't know how to code it), but I have added some code in the question that shows how survSplit() works on single-row-per-person data. There is an event for subject 1 at time1 = 88, so subjects 2 and 3 both have their time split at that point as well. Obviously, with many more subjects there are potentially many more events and consequently more times to split at. – LucaS, Feb 17 at 0:26
  • Again, I feel like I can't move forward until I see an example of what you actually want. If you know dat2 is not what you want, then you must have some idea of what the desired output would look like. You should be able to construct an example manually, using a toy dataset of two or three individuals, with at least one without any events, at least one with two events, and at least one with only one event. – langtang, Feb 17 at 1:37
  • @langtang I've added a screenshot of what I manually created in Excel to give you some idea of the type of result I'm after. – LucaS, Feb 17 at 4:57

2 Answers

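You can split the data by ID, apply survSplit() to each subject's rows, and combine the results with map_dfr() from purrr. Note that the Surv() call uses the start-stop form Surv(time0, time1, status), so each row keeps its own (time0, time1] interval.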

library(purrr)
library(survival)

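# Cut each subject's (time0, time1] rows at every observed time1
# (this includes censoring times such as 91), then row-bind the pieces
map_dfr(split(dat, dat$ID), \(x) {
  survSplit(Surv(time0, time1, status) ~ .,
            data = x,
            cut  = unique(dat$time1))
})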

   ID age treat event time0 time1 status
1   1  43   new     1     0     6      1
2   1  43   new     2     6     9      1
3   1  43   new     3     9    15      0
4   1  43   new     3    15    17      0
5   1  43   new     3    17    36      0
6   1  43   new     3    36    42      0
7   1  43   new     3    42    56      1
8   1  43   new     4    56    87      0
9   1  43   new     4    87    88      1
10  2  43   new     1     0     6      0
11  2  43   new     1     6     9      0
12  2  43   new     1     9    15      0
13  2  43   new     1    15    17      0
14  2  43   new     1    17    36      0
15  2  43   new     1    36    42      1
16  2  43   new     2    42    56      0
17  2  43   new     2    56    87      1
18  2  43   new     3    87    88      0
19  2  43   new     3    88    91      0
20  3  41   new     1     0     6      0
21  3  41   new     1     6     9      0
22  3  41   new     1     9    15      1
23  3  41   new     2    15    17      1
24  3  41   new     3    17    36      1
25  3  41   new     4    36    42      0
26  3  41   new     4    42    56      0
27  3  41   new     4    56    87      0
28  3  41   new     4    87    88      0
29  3  41   new     4    88    91      0
30  3  41   new     4    91   112      0
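If purrr is not available, the same split-apply-combine idea can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is an illustrative variant rather than the answer's own code; it uses the same cut points, unique(dat$time1), so it should give the 30 rows above. The two stopifnot() checks at the end are assumptions about what a correct result should satisfy: the number of events is unchanged, and each subject's intervals follow on from one another without gaps or overlaps.

```r
library(survival)

dat <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  age    = c(43, 43, 43, 43, 43, 43, 43, 41, 41, 41, 41),
  treat  = factor(rep("new", 11), levels = c("old", "new")),
  event  = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4),
  time0  = c(0, 6, 9, 56, 0, 42, 87, 0, 15, 17, 36),
  time1  = c(6, 9, 56, 88, 42, 87, 91, 15, 17, 36, 112),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
)
cuts <- sort(unique(dat$time1))

# Split by subject, cut each subject's (time0, time1] rows, stack the pieces
pieces <- lapply(split(dat, dat$ID), function(x) {
  survSplit(Surv(time0, time1, status) ~ ., data = x, cut = cuts,
            start = "time0", end = "time1", event = "status")
})
res <- do.call(rbind, pieces)
rownames(res) <- NULL

# Sanity checks: events preserved; within a subject, each interval starts
# where the previous one ended
stopifnot(sum(res$status) == sum(dat$status))
stopifnot(all(tapply(seq_len(nrow(res)), res$ID, function(i)
  all(res$time0[i][-1] == res$time1[i][-length(i)]))))
```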

Update: the second-to-last row (row 29) was missing, so I refined the code:

We write a small function split_single_row() that

  1. takes a single-row data frame data_row,
  2. passes the global cut points cutpoints on unchanged (survSplit() itself only cuts at the points strictly inside the row's interval (time0, time1]),
  3. calls survSplit() on just that row.

Finally, we apply the row-wise splitting to each subject with group_modify(), where each .x is the subset of rows for one subject (possibly several intervals). Within that group, map_dfr() splits each row and combines the results.

library(dplyr)
library(survival)
library(purrr)

split_single_row <- function(data_row, cutpoints) {
  survival::survSplit(
    formula = Surv(time0, time1, status) ~ .,
    data    = data_row,
    cut     = cutpoints,
    start   = "time0",
    end     = "time1",
    event   = "status"
  )
}

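cuts <- unique(dat$time1)  # all observed times, as in the first version
dat %>%
  group_by(ID) %>%  # group_modify() has no .by argument, so group first
  group_modify(~ {
    # .x holds all rows (intervals) of one subject; split each row separately
    map_dfr(seq_len(nrow(.x)), function(i) {
      split_single_row(.x[i, ], cutpoints = cuts)
    })
  }) %>%
  ungroup() %>%
  as.data.frame()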


   ID age treat event time0 time1 status
1   1  43   new     1     0     6      1
2   1  43   new     2     6     9      1
3   1  43   new     3     9    15      0
4   1  43   new     3    15    17      0
5   1  43   new     3    17    36      0
6   1  43   new     3    36    42      0
7   1  43   new     3    42    56      1
8   1  43   new     4    56    87      0
9   1  43   new     4    87    88      1
10  2  43   new     1     0     6      0
11  2  43   new     1     6     9      0
12  2  43   new     1     9    15      0
13  2  43   new     1    15    17      0
14  2  43   new     1    17    36      0
15  2  43   new     1    36    42      1
16  2  43   new     2    42    56      0
17  2  43   new     2    56    87      1
18  2  43   new     3    87    88      0
19  2  43   new     3    88    91      0
20  3  41   new     1     0     6      0
21  3  41   new     1     6     9      0
22  3  41   new     1     9    15      1
23  3  41   new     2    15    17      1
24  3  41   new     3    17    36      1
25  3  41   new     4    36    42      0
26  3  41   new     4    42    56      0
27  3  41   new     4    56    87      0
28  3  41   new     4    87    88      0
29  3  41   new     4    88    91      0
30  3  41   new     4    91   112      0
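Since every row already describes one subject's own (time0, time1] interval, grouping by ID may not be needed at all. The sketch below assumes survSplit() cuts each counting-process row independently, only at cut points strictly inside that row, and calls it once on the whole data frame. It also cuts only at event times (status == 1), as the question asked, instead of at every time1. Compared with the output above, the only expected change is that subject 3's (88, 112] interval is not split at 91 (a censoring time of subject 2), giving 29 rows instead of 30.

```r
library(survival)

dat <- data.frame(
  ID     = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  age    = c(43, 43, 43, 43, 43, 43, 43, 41, 41, 41, 41),
  treat  = factor(rep("new", 11), levels = c("old", "new")),
  event  = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4),
  time0  = c(0, 6, 9, 56, 0, 42, 87, 0, 15, 17, 36),
  time1  = c(6, 9, 56, 88, 42, 87, 91, 15, 17, 36, 112),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
)

# Cut only at observed event times, pooled over all subjects
event_times <- sort(unique(dat$time1[dat$status == 1]))

# One call on the whole data frame: each row is cut on its own, so
# intervals from different subjects or episodes never interact
cp <- survSplit(Surv(time0, time1, status) ~ ., data = dat, cut = event_times,
                start = "time0", end = "time1", event = "status")
cp
```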