How to calculate confidence intervals for estimated rates in RPart survival trees in R

I have made a survival tree using the rpart package in R. When I print the tree each node contains an estimated rate. The rpart documents say this value is an estimated relative death rate of that node to the overall death rate. I'm assuming this is similar to a risk or hazard ratio measurement, comparing that node to the root node. A reviewer has suggested we add confidence intervals to these reported values. I have not been able to figure out how to extract CIs from rpart directly and when I use coxph() to make hazard ratios they compare to a reference group, not the overall hazard so they are not equivalent to what is returned by rpart. Is there a way to extract or calculate CIs based on the estimated rates from rpart? I have added the Stage C Prostate Cancer model that is used in the R documents for reference.

library(rpart)
library(survival)
library(rpart.plot)

pfit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy, data = stagec)
pfit2 <- prune(pfit, cp = 0.016)
rpart.plot(pfit2)

In this example, the estimated rates are 0.12 for the bottom left node, 0.73 for the next one, etc.

library(rpart)
library(survival)
library(rpart.plot)

pfit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy, data = stagec)
pfit2 <- prune(pfit, cp = 0.016)
rpart.plot(pfit2)

In this example, the estimated rates are 0.12 for the bottom left node, 0.73 for the next one, etc.

Share Improve this question edited Feb 14 at 18:43 jpsmith 17.5k6 gold badges20 silver badges45 bronze badges asked Feb 14 at 18:30 Heather Treleaven 333 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

rpart does not provide confidence intervals (CIs) for these estimates directly. To compute CIs, you need to estimate the uncertainty in the node-specific hazard rates.

# Extract node information
frame <- pfit2$frame
nodes <- rownames(frame)
rates <- frame$yval  # Relative death rates

# Get number of events (deaths) in each node (ensuring integer values)
death_counts <- round(frame$dev)  # Poisson test requires integer event counts
node_sizes <- frame$n  # Number of subjects per node

# Function to calculate Poisson confidence intervals
library(epitools)
ci_poisson <- function(events) {
  if (events == 0) {
    return(c(0, NA))  # Avoid Poisson test errors when no events occur
  } else {
    return(poisson.test(events)$conf.int)
  }
}

# Compute confidence intervals
ci_lower <- numeric(length(rates))
ci_upper <- numeric(length(rates))

for (i in seq_along(rates)) {
  ci <- ci_poisson(death_counts[i])
  ci_lower[i] <- ci[1]
  ci_upper[i] <- ci[2]
}

# Combine results
ci_df <- data.frame(
  Node = nodes,
  Rate = rates,
  CI_Lower = ci_lower,
  CI_Upper = ci_upper
)

# Print results
print(ci_df)

output:

> print(ci_df)
  Node      Rate   CI_Lower  CI_Upper
1    1 1.0000000 165.801137 221.16312
2    2 0.3634439  32.823309  60.21354
3    4 0.1229835   4.115373  17.08480
4    5 0.7345610  18.605797  40.46780
5    3 1.6148603 101.313626 145.66817
6    6 1.4255042  84.071938 124.91746
7   12 1.1407323  51.044395  83.96818
8   13 2.0307294  22.715682  46.34427
9    7 3.1822324   7.653930  23.48962

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

How to calculate confidence intervals for estimated rates in RPart survival trees in R - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)