r - My call to glm in a function does not find the formula in the environment

In the following example, I create a function to fit a glm, but the function cannot find the formula defined immediately before. I believe this has to do with the function looking in the wrong environment, but I can't understand why. Here is an example:

n <- 20
ncov <- 3
df <- as.data.frame(replicate(ncov+1, runif(n)))
names(df) <- c(paste0("x", seq(ncov)), "y")
df

fun1 <- function(mod, pTrain = 0.5){
  print(environment())
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  res <- data.frame(y = y, yhat = yhat)
  return(res)
}

fun2 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  # environment(fmla) <- environment() # does not help
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}

fmla <- NULL # just to be sure there is no
fun2(useCovs = c(1,0,1))
# Error in eval(mf, parent.frame()) : object 'fmla' not found

If I use a <<- assignment for the formula, the function works, but I worry about the potential issues with this:

fun3 <- function(useCovs = c(1,0,0), data = df){
  print(environment())
  fmla <<- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
  mod <- glm(formula = fmla, data = data)
  res <- fun1(mod, pTrain = 0.5)
  score <- sqrt(mean((res$y - res$yhat)^2))
  return(c(aic = AIC(mod), rmse = score))
}

fmla <- NULL # just to be sure there is no
fun3(useCovs = c(1,0,1)) # works
fmla # now a global variable holding the formula created inside fun3

Asked Nov 16, 2024 at 5:51 by Marc in the box; edited Nov 16, 2024 at 7:06 by Jan.

I don't think formula takes a text argument. If you want to build a formula then as.formula is the standard approach. Voting to close as a typo. – IRTFM, Nov 16, 2024 at 6:46
Hmmm. It also works if you pass fmla as an argument to fun1... – Limey, Nov 16, 2024 at 6:58

2 Answers

Inspired by this post - in particular, the answer that has not been accepted - this seems to solve the problem.

fun1 <- function(mod, pTrain = 0.5){
  data <- mod$data
  y <- mod$y
  train <- sample(nrow(data), size = nrow(data)*pTrain)
  valid <- -train
  # New code
  ev <- environment()
  parent.env(ev) <- environment(mod$formula)
  environment(mod$formula) <- ev
  # End of new code
  modTrain <- update(object = mod, data = data[train,])
  yhat <- predict(modTrain, newdata = data[valid,])
  res <- data.frame(y = y, yhat = yhat)
  return(res)
}

I cannot explain why, though the discussion in the accepted answer above is probably worth some study.
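
A possible explanation, offered as a minimal sketch rather than a definitive account: the fitted model stores the call as it was written, so its formula entry is the bare symbol, and update() re-evaluates that call in the frame it is called from. Re-parenting the function's environment, as in the code above, would then make the original frame reachable during lookup. The names fit_model, refit and local_fmla below are hypothetical and chosen only for illustration.

```r
# Hypothetical minimal reproduction; fit_model, refit and local_fmla are illustrative names.
set.seed(1)
d <- data.frame(x1 = runif(20), y = runif(20))

fit_model <- function(data) {
  local_fmla <- y ~ x1                      # formula held in a local variable
  glm(formula = local_fmla, data = data)    # glm records this call as written
}

mod <- fit_model(d)

# The stored call keeps the symbol local_fmla, not the formula it referred to.
stored <- getCall(mod)$formula
print(stored)
print(is.name(stored))

# update() re-evaluates the stored call from inside refit(), where no
# variable called local_fmla is visible, so the refit fails.
refit <- function(mod, rows) update(mod, data = mod$data[rows, ])
res <- tryCatch(refit(mod, 1:10), error = function(e) conditionMessage(e))
print(res)   # an "object not found" message naming local_fmla
```

If this reading is right, any fix has to either make the symbol resolvable where update() runs or store the formula value in the call instead of the symbol.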

As I mentioned in my comment, amending the signature of fun1 to

fun1 <- function(mod, pTrain = 0.5, fmla)

and the call to it in fun2 to

  res <- fun1(mod, pTrain = 0.5, fmla)

also succeeds.
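
The following sketch illustrates why the extra argument plausibly helps: the argument only has to exist under the same name that the stored call refers to, so that the re-evaluated call can resolve it. The function names fit_and_refit, refit_named and refit_other are hypothetical, and the data are simulated.

```r
# Hypothetical sketch; function names are illustrative, not from the question.
set.seed(1)
d <- data.frame(x1 = runif(20), y = runif(20))

# The parameter is called fmla, matching the symbol stored in the model's call.
refit_named <- function(mod, rows, fmla) update(mod, data = mod$data[rows, ])

# Same body, but the parameter has a different name, so fmla is still unresolved.
refit_other <- function(mod, rows, f) update(mod, data = mod$data[rows, ])

fit_and_refit <- function(data, refit_fun) {
  fmla <- y ~ x1
  mod <- glm(formula = fmla, data = data)
  # The formula is passed positionally as the third argument.
  tryCatch(refit_fun(mod, 1:10, fmla), error = function(e) conditionMessage(e))
}

ok  <- fit_and_refit(d, refit_named)   # a refitted glm object
bad <- fit_and_refit(d, refit_other)   # an error message string
print(class(ok))
print(bad)
```

This makes the workaround fragile: it depends on the caller and the callee agreeing on a variable name, which is one reason to prefer storing the formula value in the call.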

In fun2, replace

mod <- glm(formula = fmla, data = data)

with

mod <- do.call("glm", list(formula = fmla, data = data))

or, alternatively (less preferred), with

mod <- eval(substitute(glm(fmla, data = data), list(fmla = fmla)))

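so that the value of fmla is passed to glm rather than the fmla variable. Normally this would not matter, but glm uses non-standard evaluation (NSE): it records the call as written, so a later update() re-evaluates the symbol fmla in a frame where it does not exist.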
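
As a minimal sketch of the do.call approach, assuming simulated data and the hypothetical helper names fit_model and refit, the stored call then contains the formula itself, so update() can re-evaluate it from another function:

```r
# Hypothetical sketch; fit_model and refit are illustrative names.
set.seed(1)
d <- data.frame(x1 = runif(20), x3 = runif(20), y = runif(20))

fit_model <- function(data) {
  local_fmla <- y ~ x1 + x3
  # do.call evaluates the list elements first, so glm receives the
  # formula object itself rather than the symbol local_fmla.
  do.call("glm", list(formula = local_fmla, data = data))
}

refit <- function(mod, rows) update(mod, data = mod$data[rows, ])

mod <- fit_model(d)

# The formula entry of the stored call is now a `~` call, not a bare symbol.
print(is.call(getCall(mod)$formula))

# Refitting on a subset works from inside another function.
mod_half <- refit(mod, 1:10)
print(nobs(mod_half))
```

One trade-off to be aware of: with do.call the data frame is also embedded by value in the stored call, so printing the model's call can be verbose.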
