r - Difference Between geom_smooth() and Manual LOESS Fit in Logistic Regression

I am working with a binary outcome (chd, 0 or 1) and a continuous predictor (sbp, systolic blood pressure). I want to visualize the relationship between sbp and the probability of chd using LOESS smoothing. However, I noticed a difference between using geom_smooth() in ggplot2 and manually fitting a LOESS model.

set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),  # Systolic BP
  chd = rbinom(n, size = 1, prob = 0.3)  # CHD occurrence (0/1)
)
chd_odd_log = log(sum(data_mi$chd) / (nrow(data_mi) - sum(data_mi$chd)))

data_mi$chd_odd = ifelse(data_mi$chd==1,chd_odd_log,1/chd_odd_log)

library(ggplot2)
loess_fit <- loess(chd ~ sbp, data = data_mi, degree = 1)
loess_pred <- predict(loess_fit)
ggplot(data_mi, aes(x = sbp, y = chd_odd)) +
  geom_smooth(method = "loess")+
  geom_point(aes(y = log(loess_pred / (1 - loess_pred))))

# Logit transformation
plot(data_mi$sbp, log(loess_pred / (1 - loess_pred)), main = "Log-Odds Transformation")

I think the difference comes from the order in which the smoothing and the logit transformation are applied. If so, how does loess handle binary data?

asked Feb 5 at 2:04, edited Feb 5 at 2:51, by doraemon

Any particular reason you used degree = 1 in your manual loess call rather than the default degree = 2? – Ben Bolker, Feb 5 at 2:28
There is no specific reason I used degree = 1. I just copied the code from the note. – doraemon, Feb 5 at 2:47

1 Answer

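Your two plots aren't doing the same thing: geom_smooth() smooths chd_odd (the y aesthetic of your plot), while your manual loess() call fits chd with degree = 1 and you take the logit of the fitted values afterwards. If you fit the same variable with the same settings, the two curves agree. Try the following: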

set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),  # Systolic BP
  chd = rbinom(n, size = 1, prob = 0.3)  # CHD occurrence (0/1)
)

library(ggplot2)
p1 <- ggplot(data_mi, aes(x = sbp, y = chd)) +
  geom_point() +
  geom_smooth(method = "loess")

loess_fit <- loess(chd ~ sbp, data = data_mi)
loess_pred <- predict(loess_fit)

p1 +
  geom_point(aes(y=loess_pred), col="red")
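
On the "how can loess handle binary data" part: as far as I can tell, loess() simply treats the 0/1 outcome as a numeric response and fits local least-squares polynomials, so the fitted values act as a rough estimate of P(chd = 1 | sbp) but are not guaranteed to stay inside (0, 1). The sketch below smooths on the probability scale first and applies the logit afterwards. The clamping step and the eps value are my own assumptions, not part of the original code; they are only there so the logit stays finite.

```r
set.seed(123)
n <- 100
data_mi <- data.frame(
  sbp = rnorm(n, mean = 130, sd = 15),   # Systolic BP
  chd = rbinom(n, size = 1, prob = 0.3)  # CHD occurrence (0/1)
)

# loess() treats chd as a plain numeric response (local least squares);
# the defaults here are span = 0.75 and degree = 2
loess_fit <- loess(chd ~ sbp, data = data_mi)
p_hat <- predict(loess_fit)  # fitted values at the observed sbp

# Assumption: clamp the fitted values into (0, 1) before taking the logit,
# because local least squares does not constrain them to that interval.
eps <- 1e-3  # hypothetical tolerance, chosen only for illustration
p_clamped <- pmin(pmax(p_hat, eps), 1 - eps)

# qlogis(p) is log(p / (1 - p)), i.e. the logit
log_odds <- qlogis(p_clamped)

# Plot the smoothed log-odds against sbp, sorted so the line is drawn left to right
ord <- order(data_mi$sbp)
plot(data_mi$sbp[ord], log_odds[ord], type = "l",
     xlab = "sbp", ylab = "log-odds of chd",
     main = "LOESS fit, then logit")
```

This is only one way to get a smoothed curve on the log-odds scale. A fitted logistic regression, glm(chd ~ sbp, family = binomial), models the log-odds directly and never produces probabilities outside (0, 1).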
