r - Using survsplit() on recurrent events data

I want to split a survival dataset at event times. In the simple case of one row of data per person, that means converting it to counting-process form (multiple rows per person), where each person's observation time is split at ALL subjects' event times (up to the end of that person's own observation time, of course).

I can do this easily with one row per person, where each person has a recorded observation time and a status of either having had the event (1) or not, i.e. censored (0).
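For reference, the one-row-per-person case can be sketched as below. The data set is hypothetical: it keeps only each subject's final time and status from the recurrent-events example, so subject 1 has an event at 88 and subjects 2 and 3 are censored at 91 and 112. The column names `time` and `tstart` are illustrative choices.

```r
library(survival)

# Hypothetical one-row-per-person data: each subject's last observed time and
# status, taken from the final row of each ID in the recurrent-events example
one_row <- data.frame(
  ID     = 1:3,
  age    = c(43L, 43L, 41L),
  time   = c(88, 91, 112),
  status = c(1, 0, 0)
)

# Cut at every distinct event time (here only subject 1's event at 88)
event_times <- sort(unique(one_row$time[one_row$status == 1]))

# Counting-process form: each subject's (0, time] is split at the event times
cp <- survSplit(Surv(time, status) ~ ., data = one_row, cut = event_times,
                start = "tstart", end = "time", event = "status")
cp
```

Subjects 2 and 3 each gain a split at 88, giving 5 rows in total; only subject 1's row keeps status 1.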

I would also like to be able to do this with recurrent-events data. In this case each person potentially has multiple rows of data recording multiple events (the last time may be either an event or a censoring time).

survSplit() seems to expand the data by row, not by ID as I had naively expected. Is there a way to make survSplit() split time only within an individual, not within every event experienced by that individual?

Some example code below:

library(survival)
library(dplyr)

dat <-  structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), 
                       age = c(43L, 43L, 43L, 43L, 43L, 43L, 43L, 41L, 41L, 41L, 41L), 
                       treat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
                                         levels = c("old", "new"), class = "factor"), 
                       time0 = c(0L, 6L, 9L, 56L, 0L, 42L, 87L, 0L, 15L, 17L, 36L), 
                       time1 = c(6L, 9L, 56L, 88L, 42L, 87L, 91L, 15L, 17L, 36L, 112L), 
                       status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0), 
                       event = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L)),
                  datalabel = "Chapter 9 Exercises", time.stamp = " 7 Dec 1999 08:53", formats = c("%9.0g", "%9.0g", "%9.0g", "%9.0g", "%9.0g", "%19.0g", "%9.0g"), 
                  types = c(105L, 98L, 98L, 105L, 105L, 98L, 98L), 
                  val.labels = c("", "", "oldnew", "", "", "censor", ""), 
                  var.labels = c("Subject Identification", "Age", "Treatment Assignment", "Time of Last Episode", "Time of Current Episode or censoring", "Indicator for Soreness Episode or censoring", "Soreness Episode Number"), 
                  row.names = c("3", "4", "1", "2", "5", "7", "6", "8", "9", "10", "11"), 
                  version = 6L, label.table = list(oldnew = structure(0:1, names = c("new", "old")), 
                                                   censor = structure(0:1, names = c("censored", "experienced"))), class = "data.frame")


# Split at event times
event_times <- sort(unique(with(dat, time1[status == 1])))
# Create new df in CP form with splits at every event time
dat2 <- survSplit(Surv(time1, status) ~., dat, cut = event_times)
# This is NOT what I want as it expands by row (event) not ID.

Below is a screenshot of the expanded dataset I would like to recreate for the first 3 subjects; I built it manually in Excel.

There does not seem to be a way to do this in survSplit(), unless I have missed something?
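One likely cause of the row-wise expansion can be checked with a minimal sketch, assuming the data are the question's `dat` (rebuilt here without the Stata attributes). With `Surv(time1, status)`, survSplit() has no start time, so every row is treated as starting at zero and a subject's rows overlap. Passing the interval as `Surv(time0, time1, status)` keeps each row's own start.

```r
library(survival)

# The question's data, rebuilt without the Stata-specific attributes
dat <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  age    = c(43L, 43L, 43L, 43L, 43L, 43L, 43L, 41L, 41L, 41L, 41L),
  treat  = factor(rep("new", 11), levels = c("old", "new")),
  time0  = c(0L, 6L, 9L, 56L, 0L, 42L, 87L, 0L, 15L, 17L, 36L),
  time1  = c(6L, 9L, 56L, 88L, 42L, 87L, 91L, 15L, 17L, 36L, 112L),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0),
  event  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L)
)

event_times <- sort(unique(dat$time1[dat$status == 1]))

# Right-censored form: each row becomes (0, time1], so e.g. ID 1's second
# row covers (0, 9] instead of (6, 9] and overlaps the first row
by_row <- survSplit(Surv(time1, status) ~ ., data = dat, cut = event_times,
                    start = "tstart", end = "time1", event = "status")

# Counting-process form: each row keeps its own (time0, time1] interval
by_interval <- survSplit(Surv(time0, time1, status) ~ ., data = dat,
                         cut = event_times, start = "time0",
                         end = "time1", event = "status")

nrow(by_row)       # 65 (overlapping pieces)
nrow(by_interval)  # 29
```

Because the rows of each subject do not overlap, the counting-process call only ever cuts time inside that subject's own follow-up.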

Asked Feb 16 at 22:32 by LucaS; edited Feb 17 at 5:19.

Can you please manually construct the output that you DO want (i.e. not just the output that you DON'T want)? – langtang, Feb 16 at 23:25
I can't show you what it should look like in this case (as I don't know how to code it), but I have added some code to the question that shows how survSplit() works on single-row-per-person data. There is an event for subject 1 at time1 = 88, so subjects 2 and 3 both have their time split at that point as well. Obviously, with many more subjects there are potentially many more events, and consequently more times to split at. – LucaS, Feb 17 at 0:26
Again, I feel like I can't move forward until I see an example of what you actually want. If you know dat2 is not what you want, then you must have some idea of what the desired output would look like. You should be able to construct an example manually, using a toy dataset of two or three individuals: at least one without any events, at least one with two events, and at least one with only one event. – langtang, Feb 17 at 1:37
@langtang I've added a screenshot of what I manually created in Excel to give you some idea of the type of result I'm after. – LucaS, Feb 17 at 4:57

2 Answers

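You can split the data by ID, apply survSplit() to each subject's rows, and then combine the results with map_dfr() from purrr. Note that the formula uses Surv(time0, time1, status), so each row keeps its own start time instead of starting at zero.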

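library(purrr)
library(survival)

# Split by ID, run survSplit() on each subject's rows, and row-bind the pieces.
# \(x) is R's lambda shorthand (R >= 4.1.0).
map_dfr(split(dat, dat$ID), \(x) {
  survSplit(Surv(time0, time1, status) ~ .,
            data = x,
            cut  = unique(dat$time1))
})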

   ID age treat event time0 time1 status
1   1  43   new     1     0     6      1
2   1  43   new     2     6     9      1
3   1  43   new     3     9    15      0
4   1  43   new     3    15    17      0
5   1  43   new     3    17    36      0
6   1  43   new     3    36    42      0
7   1  43   new     3    42    56      1
8   1  43   new     4    56    87      0
9   1  43   new     4    87    88      1
10  2  43   new     1     0     6      0
11  2  43   new     1     6     9      0
12  2  43   new     1     9    15      0
13  2  43   new     1    15    17      0
14  2  43   new     1    17    36      0
15  2  43   new     1    36    42      1
16  2  43   new     2    42    56      0
17  2  43   new     2    56    87      1
18  2  43   new     3    87    88      0
19  2  43   new     3    88    91      0
20  3  41   new     1     0     6      0
21  3  41   new     1     6     9      0
22  3  41   new     1     9    15      1
23  3  41   new     2    15    17      1
24  3  41   new     3    17    36      1
25  3  41   new     4    36    42      0
26  3  41   new     4    42    56      0
27  3  41   new     4    56    87      0
28  3  41   new     4    87    88      0
29  3  41   new     4    88    91      0
30  3  41   new     4    91   112      0
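The purrr dependency is not essential. Below is a base-R sketch of the same per-ID split (split(), lapply(), do.call(rbind, ...)) that also compares it with a single survSplit() call on the whole data set. Since survSplit() cuts each row only inside its own (time0, time1] interval, grouping by ID should make no difference here. The data are rebuilt without the Stata attributes, and split_cp() is a hypothetical helper name.

```r
library(survival)

# The question's data, rebuilt without the Stata-specific attributes
dat <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  age    = c(43L, 43L, 43L, 43L, 43L, 43L, 43L, 41L, 41L, 41L, 41L),
  treat  = factor(rep("new", 11), levels = c("old", "new")),
  time0  = c(0L, 6L, 9L, 56L, 0L, 42L, 87L, 0L, 15L, 17L, 36L),
  time1  = c(6L, 9L, 56L, 88L, 42L, 87L, 91L, 15L, 17L, 36L, 112L),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0),
  event  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L)
)

cuts <- sort(unique(dat$time1))  # same cut points as the answer, sorted

# Hypothetical helper: counting-process split of a set of (time0, time1] rows
split_cp <- function(d) {
  survSplit(Surv(time0, time1, status) ~ ., data = d, cut = cuts,
            start = "time0", end = "time1", event = "status")
}

# Per-ID split as in the answer, with base R instead of purrr::map_dfr()
per_id <- do.call(rbind, lapply(split(dat, dat$ID), split_cp))

# One call on the whole data set
whole <- split_cp(dat)

# rbind() builds row names such as "1.1"; drop them before comparing
rownames(per_id) <- NULL
rownames(whole)  <- NULL

nrow(whole)  # 30
all.equal(per_id, whole)
```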

Update: the second-to-last row (row 29) was missing, so I refined the code:

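We write a small function, split_single_row(), that:

- takes a single-row data frame, data_row;
- passes it the global cut points, cutpoints (survSplit() itself only uses the cut points that lie strictly inside that row's (time0, time1] interval, so they do not need to be subset first);
- calls survSplit() on just that row.

Finally, we group by subject with group_by(ID) and apply the row-wise splitting with group_modify(), where each .x is the subset of rows for one subject (possibly several intervals). Within each group, map_dfr() splits each row and combines the results. (group_modify() has no .by argument, so the grouping has to come from group_by().)

library(dplyr)
library(survival)
library(purrr)

# Split one (time0, time1] row at the cut points that fall inside it
split_single_row <- function(data_row, cutpoints) {
  survival::survSplit(
    formula = Surv(time0, time1, status) ~ .,
    data    = data_row,
    cut     = cutpoints,
    start   = "time0",
    end     = "time1",
    event   = "status"
  )
}

dat %>%
  group_by(ID) %>%                      # one group per subject
  group_modify(~ {
    # .x holds this subject's rows (without ID); split each row separately
    map_dfr(seq_len(nrow(.x)), function(i) {
      split_single_row(.x[i, ], cutpoints = unique(dat$time1))
    })
  }) %>%
  ungroup() %>%
  as.data.frame()                       # print as a plain data frame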


   ID age treat event time0 time1 status
1   1  43   new     1     0     6      1
2   1  43   new     2     6     9      1
3   1  43   new     3     9    15      0
4   1  43   new     3    15    17      0
5   1  43   new     3    17    36      0
6   1  43   new     3    36    42      0
7   1  43   new     3    42    56      1
8   1  43   new     4    56    87      0
9   1  43   new     4    87    88      1
10  2  43   new     1     0     6      0
11  2  43   new     1     6     9      0
12  2  43   new     1     9    15      0
13  2  43   new     1    15    17      0
14  2  43   new     1    17    36      0
15  2  43   new     1    36    42      1
16  2  43   new     2    42    56      0
17  2  43   new     2    56    87      1
18  2  43   new     3    87    88      0
19  2  43   new     3    88    91      0
20  3  41   new     1     0     6      0
21  3  41   new     1     6     9      0
22  3  41   new     1     9    15      1
23  3  41   new     2    15    17      1
24  3  41   new     3    17    36      1
25  3  41   new     4    36    42      0
26  3  41   new     4    42    56      0
27  3  41   new     4    56    87      0
28  3  41   new     4    87    88      0
29  3  41   new     4    88    91      0
30  3  41   new     4    91   112      0
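The question asks for splits at event times only, while both versions of this answer cut at unique(dat$time1), which also contains the censoring times 91 and 112. Here is a sketch of the event-times-only variant, using the question's own event_times definition and assuming `dat` is rebuilt without the Stata attributes:

```r
library(survival)

# The question's data, rebuilt without the Stata-specific attributes
dat <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  age    = c(43L, 43L, 43L, 43L, 43L, 43L, 43L, 41L, 41L, 41L, 41L),
  treat  = factor(rep("new", 11), levels = c("old", "new")),
  time0  = c(0L, 6L, 9L, 56L, 0L, 42L, 87L, 0L, 15L, 17L, 36L),
  time1  = c(6L, 9L, 56L, 88L, 42L, 87L, 91L, 15L, 17L, 36L, 112L),
  status = c(1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0),
  event  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 4L)
)

cut_all    <- sort(unique(dat$time1))                   # the answer's cuts
cut_events <- sort(unique(dat$time1[dat$status == 1]))  # event times only
setdiff(cut_all, cut_events)                            # 91 112 (censoring times)

cp_events <- survSplit(Surv(time0, time1, status) ~ ., data = dat,
                       cut = cut_events, start = "time0",
                       end = "time1", event = "status")

nrow(cp_events)  # 29: subject 3's (88, 112] is no longer cut at 91
cp_events[cp_events$ID == 3, c("ID", "time0", "time1", "status")]
```

All other rows match the table above; only subject 3's last interval differs.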
