I am working with a binary outcome (chd, 0 or 1) and a continuous predictor (sbp, systolic blood pressure). I want to visualize the relationship between sbp and the probability of chd using LOESS smoothing. However, I noticed a difference between using geom_smooth() in ggplot2 and manually fitting a LOESS model.
set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15), # Systolic BP
  chd = rbinom(n, size = 1, prob = 0.3) # CHD occurrence (0/1)
)
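# Marginal (overall) log-odds of CHD in the sample
chd_odd_log <- log(sum(data_mi$chd) / (nrow(data_mi) - sum(data_mi$chd)))
# Recode each outcome: chd_odd_log when chd == 1, 1 / chd_odd_log when chd == 0
data_mi$chd_odd <- ifelse(data_mi$chd == 1, chd_odd_log, 1 / chd_odd_log)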
library(ggplot2)
loess_fit <- loess(chd ~ sbp, data = data_mi, degree = 1)
loess_pred <- predict(loess_fit)
ggplot(data_mi, aes(x = sbp, y = chd_odd)) +
  geom_smooth(method = "loess") +
  geom_point(aes(y = log(loess_pred / (1 - loess_pred))))
# Logit transformation
plot(data_mi$sbp, log(loess_pred / (1 - loess_pred)), main = "Log-Odds Transformation")
I think the difference comes from the order in which the logit transformation and the smoothing are applied. If so, how does LOESS handle binary data?
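One way to see why the order matters (a sketch, assuming the default `family = "gaussian"`, under which `loess` is a linear smoother whose weights depend only on `sbp`): because `chd` takes only the values 0 and 1, any recoding of it, including the `chd_odd` column, is an affine function of `chd`. Smoothing the recoded column should therefore return a shifted and rescaled copy of the smoothed proportion, not its logit. The names `fit_p`, `fit_odd`, `a`, `b` and `same` are illustrative only.

```r
set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),
  chd = rbinom(n, size = 1, prob = 0.3)
)

# Same recoding as in the question: one value for chd == 1, another for chd == 0
chd_odd_log <- log(sum(data_mi$chd) / (n - sum(data_mi$chd)))
data_mi$chd_odd <- ifelse(data_mi$chd == 1, chd_odd_log, 1 / chd_odd_log)

# Identical smoother settings for both fits (loess defaults: span = 0.75, degree = 2)
fit_p   <- loess(chd ~ sbp, data = data_mi)      # smooths the raw 0/1 outcome
fit_odd <- loess(chd_odd ~ sbp, data = data_mi)  # smooths the recoded outcome

# For a 0/1 variable the recoding is affine: chd_odd = a + b * chd
a <- 1 / chd_odd_log                 # value assigned when chd == 0
b <- chd_odd_log - 1 / chd_odd_log   # jump from the chd == 0 value to the chd == 1 value

# A linear smoother commutes with affine maps, so the two fits differ only by a and b
same <- isTRUE(all.equal(unname(predict(fit_odd)),
                         unname(a + b * predict(fit_p))))
print(same)
```

Conversely, taking the logit after smoothing requires every fitted value to lie strictly inside (0, 1). A local linear or quadratic fit to 0/1 data is not constrained to that interval, so `log(p / (1 - p))` can produce `NaN` (with a warning) at some `sbp` values.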
Asked Feb 5 at 2:04 by doraemon (edited Feb 5 at 2:51).

1 Answer
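Your two plots aren't doing the same thing. In the question, geom_smooth() smooths the recoded chd_odd column with the loess defaults (span = 0.75, degree = 2), while the points are the logit of a separate loess fit to chd with degree = 1. Try the following, which smooths chd itself in both cases: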
set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15), # Systolic BP
  chd = rbinom(n, size = 1, prob = 0.3) # CHD occurrence (0/1)
)
library(ggplot2)
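p1 <- ggplot(data_mi, aes(x = sbp, y = chd)) +
  geom_point() +
  geom_smooth(method = "loess")   # loess(y ~ x) with span = 0.75, degree = 2
# Same smoother fitted by hand, using the loess() defaults
loess_fit <- loess(chd ~ sbp, data = data_mi)
loess_pred <- predict(loess_fit)
# The red points should fall on the blue geom_smooth() curve
p1 +
  geom_point(aes(y = loess_pred), col = "red")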
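To check numerically, rather than by eye, that the manual fit matches the `geom_smooth()` curve, one can pull the smoothed layer out of the built plot and predict from the manual fit at the same x values. This sketch assumes `geom_smooth()` is the second layer of `p1` (as above) and relies on `ggplot_build()`, whose computed data for a smooth layer contains `x` and `y` columns; `smooth_layer`, `manual_y` and `matches` are illustrative names.

```r
library(ggplot2)

set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),
  chd = rbinom(n, size = 1, prob = 0.3)
)

p1 <- ggplot(data_mi, aes(x = sbp, y = chd)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# Layer 2 is geom_smooth(); its computed data holds the curve ggplot2 draws
smooth_layer <- ggplot_build(p1)$data[[2]]

# Manual fit with the loess() defaults (span = 0.75, degree = 2)
loess_fit <- loess(chd ~ sbp, data = data_mi)
manual_y <- as.numeric(predict(loess_fit, newdata = data.frame(sbp = smooth_layer$x)))

# Compare at the grid of x values used to draw the smooth
matches <- isTRUE(all.equal(smooth_layer$y, manual_y))
print(matches)
```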
… loess call rather than the default degree = 2? – Ben Bolker, commented Feb 5 at 2:28
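The degree mismatch can be checked directly. `geom_smooth()` forwards extra arguments to the fitting function through its `method.args` parameter, so passing `degree = 1` there should reproduce the question's `loess(chd ~ sbp, degree = 1)` fit, while the default quadratic fit gives a different curve. A sketch reusing the question's simulated data; `fit_deg1`, `curve_deg1`, `max_gap` and the other names are illustrative.

```r
library(ggplot2)

set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),
  chd = rbinom(n, size = 1, prob = 0.3)
)

# The question's manual fit: local linear (degree = 1)
fit_deg1 <- loess(chd ~ sbp, data = data_mi, degree = 1)

# Ask geom_smooth() for the same local-linear smoother
p_deg1 <- ggplot(data_mi, aes(x = sbp, y = chd)) +
  geom_smooth(method = "loess", formula = y ~ x,
              method.args = list(degree = 1))
curve_deg1 <- ggplot_build(p_deg1)$data[[1]]   # the only layer in this plot

pred_deg1 <- as.numeric(predict(fit_deg1, newdata = data.frame(sbp = curve_deg1$x)))
matches_deg1 <- isTRUE(all.equal(curve_deg1$y, pred_deg1))

# The default degree = 2 fit, evaluated at the same x values
fit_deg2 <- loess(chd ~ sbp, data = data_mi)
pred_deg2 <- as.numeric(predict(fit_deg2, newdata = data.frame(sbp = curve_deg1$x)))
max_gap <- max(abs(pred_deg1 - pred_deg2))

print(matches_deg1)
print(max_gap)
```

Which degree is more appropriate is a modeling choice; the point is only that the two plots need the same smoother settings to agree.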