I have made a survival tree using the rpart package in R. When I print the tree each node contains an estimated rate. The rpart documents say this value is an estimated relative death rate of that node to the overall death rate. I'm assuming this is similar to a risk or hazard ratio measurement, comparing that node to the root node. A reviewer has suggested we add confidence intervals to these reported values. I have not been able to figure out how to extract CIs from rpart directly and when I use coxph() to make hazard ratios they compare to a reference group, not the overall hazard so they are not equivalent to what is returned by rpart. Is there a way to extract or calculate CIs based on the estimated rates from rpart? I have added the Stage C Prostate Cancer model that is used in the R documents for reference.
library(rpart)
library(survival)
library(rpart.plot)
pfit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy, data = stagec)
pfit2 <- prune(pfit, cp = 0.016)
rpart.plot(pfit2)
In this example, the estimated rates are 0.12 for the bottom left node, 0.73 for the next one, etc.
I have made a survival tree using the rpart package in R. When I print the tree each node contains an estimated rate. The rpart documents say this value is an estimated relative death rate of that node to the overall death rate. I'm assuming this is similar to a risk or hazard ratio measurement, comparing that node to the root node. A reviewer has suggested we add confidence intervals to these reported values. I have not been able to figure out how to extract CIs from rpart directly and when I use coxph() to make hazard ratios they compare to a reference group, not the overall hazard so they are not equivalent to what is returned by rpart. Is there a way to extract or calculate CIs based on the estimated rates from rpart? I have added the Stage C Prostate Cancer model that is used in the R documents for reference.
library(rpart)
library(survival)
library(rpart.plot)
pfit <- rpart(Surv(pgtime, pgstat) ~ age + eet + g2 + grade + gleason + ploidy, data = stagec)
pfit2 <- prune(pfit, cp = 0.016)
rpart.plot(pfit2)
In this example, the estimated rates are 0.12 for the bottom left node, 0.73 for the next one, etc.
Share Improve this question edited Feb 14 at 18:43 jpsmith 17.5k6 gold badges20 silver badges45 bronze badges asked Feb 14 at 18:30 Heather TreleavenHeather Treleaven 333 bronze badges1 Answer
Reset to default 0rpart
does not provide confidence intervals (CIs) for these estimates directly. To compute CIs, you need to estimate the uncertainty in the node-specific hazard rates.
# Extract node information
frame <- pfit2$frame
nodes <- rownames(frame)
rates <- frame$yval # Relative death rates
# Get number of events (deaths) in each node (ensuring integer values)
death_counts <- round(frame$dev) # Poisson test requires integer event counts
node_sizes <- frame$n # Number of subjects per node
# Function to calculate Poisson confidence intervals
library(epitools)
ci_poisson <- function(events) {
if (events == 0) {
return(c(0, NA)) # Avoid Poisson test errors when no events occur
} else {
return(poisson.test(events)$conf.int)
}
}
# Compute confidence intervals
ci_lower <- numeric(length(rates))
ci_upper <- numeric(length(rates))
for (i in seq_along(rates)) {
ci <- ci_poisson(death_counts[i])
ci_lower[i] <- ci[1]
ci_upper[i] <- ci[2]
}
# Combine results
ci_df <- data.frame(
Node = nodes,
Rate = rates,
CI_Lower = ci_lower,
CI_Upper = ci_upper
)
# Print results
print(ci_df)
output:
> print(ci_df)
Node Rate CI_Lower CI_Upper
1 1 1.0000000 165.801137 221.16312
2 2 0.3634439 32.823309 60.21354
3 4 0.1229835 4.115373 17.08480
4 5 0.7345610 18.605797 40.46780
5 3 1.6148603 101.313626 145.66817
6 6 1.4255042 84.071938 124.91746
7 12 1.1407323 51.044395 83.96818
8 13 2.0307294 22.715682 46.34427
9 7 3.1822324 7.653930 23.48962