In the following example, I create a function to fit a glm, but the function cannot find the formula defined immediately before. I believe this has to do with the function looking in the wrong environment, but I can't understand why. Here is an example:
n <- 20
ncov <- 3
df <- as.data.frame(replicate(ncov+1, runif(n)))
names(df) <- c(paste0("x", seq(ncov)), "y")
df
fun1 <- function(mod, pTrain = 0.5){
print(environment())
data <- mod$data
y <- mod$y
train <- sample(nrow(data), size = nrow(data)*pTrain)
valid <- -train
modTrain <- update(object = mod, data = data[train,])
yhat <- predict(modTrain, newdata = data[valid,])
res <- data.frame(y = y, yhat = yhat)
return(res)
}
fun2 <- function(useCovs = c(1,0,0), data = df){
print(environment())
fmla <- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
# environment(fmla) <- environment() # does not help
mod <- glm(formula = fmla, data = data)
res <- fun1(mod, pTrain = 0.5)
score <- sqrt(mean((res$y - res$yhat)^2))
return(c(aic = AIC(mod), rmse = score))
}
fmla <- NULL # just to be sure there is no global fmla beforehand
fun2(useCovs = c(1,0,1))
# Error in eval(mf, parent.frame()) : object 'fmla' not found
If I use a <<- assignment for the formula, the function works, but I worry about the potential issues with this:
fun3 <- function(useCovs = c(1,0,0), data = df){
print(environment())
fmla <<- formula(paste("y ~", paste(paste0("x", seq(useCovs))[as.logical(useCovs)], collapse = " + ")))
mod <- glm(formula = fmla, data = data)
res <- fun1(mod, pTrain = 0.5)
score <- sqrt(mean((res$y - res$yhat)^2))
return(c(aic = AIC(mod), rmse = score))
}
fmla <- NULL # just to be sure there is no global fmla beforehand
fun3(useCovs = c(1,0,1)) # works
fmla # the formula now exists globally; its printed environment is that of fun3
2 Answers
Inspired by this post (in particular, the answer that has not been accepted), this seems to solve the problem.
fun1 <- function(mod, pTrain = 0.5){
data <- mod$data
y <- mod$y
train <- sample(nrow(data), size = nrow(data)*pTrain)
valid <- -train
# New code
ev <- environment()
parent.env(ev) <- environment(mod$formula)
environment(mod$formula) <- ev
# End of new code
modTrain <- update(object = mod, data = data[train,])
yhat <- predict(modTrain, newdata = data[valid,])
res <- data.frame(y = y, yhat = yhat)
return(res)
}
I cannot explain why, though the discussion in the accepted answer above is probably worth some study.
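A possible explanation, offered as a sketch rather than a definitive account: update() re-evaluates the call stored in the model from the frame of the function that called update(), and that stored call still refers to the symbol fmla. Re-parenting the frame of fun1 onto the formula's environment lets that lookup reach the frame where fmla was defined. The functions outer_fn and inner_fn and the variable secret below are hypothetical, made up only to isolate the parent.env trick:

```r
# Hypothetical toy functions, not part of the question's code.
outer_fn <- function() {
  secret <- 42          # exists only in outer_fn's frame
  f <- y ~ x            # the formula captures outer_fn's frame as its environment
  inner_fn(f)
}

inner_fn <- function(f) {
  ev <- environment()
  # Make inner_fn's frame enclose into the formula's environment,
  # so symbol lookups starting here continue into outer_fn's frame.
  parent.env(ev) <- environment(f)
  eval(quote(secret), ev)
}

result <- outer_fn()
print(result)
```

R's own documentation describes parent.env<- as dangerous, so the alternatives in this thread may be the safer choice.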
As I mentioned in my comment, amending the signature of fun1 to
fun1 <- function(mod, pTrain = 0.5, fmla)
and the call to it in fun2 to
res <- fun1(mod, pTrain = 0.5, fmla)
also succeeds.
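A minimal sketch of why the extra argument seems to help, assuming a made-up data frame d and hypothetical helpers make_model and refit (none of these are from the question): the stored call refers to the symbol fmla, and update() looks that symbol up starting from the calling function's frame, so a parameter with the same name satisfies the lookup.

```r
set.seed(1)
# Made-up data, only for illustration
d <- data.frame(x1 = runif(20), y = runif(20))

# Hypothetical helper: fits the model and also returns the formula
make_model <- function(data) {
  fmla <- as.formula("y ~ x1")
  mod <- glm(formula = fmla, data = data)
  list(mod = mod, fmla = fmla)
}

# Hypothetical helper: the parameter must be named fmla,
# matching the symbol stored in mod$call
refit <- function(mod, fmla) {
  update(mod, data = mod$data[1:10, ])
}

out <- make_model(d)
refitted <- refit(out$mod, out$fmla)
print(refitted$call)
```

Note that this relies on the argument name matching the variable name used when the model was first fitted, which makes it somewhat fragile.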
Replace
mod <- glm(formula = fmla, data = data)
with
mod <- do.call("glm", list(formula = fmla, data = data))
or alternately with (less preferred)
mod <- eval(substitute(glm(fmla, data = data), list(fmla = fmla)))
so that the value of fmla is passed to glm rather than the fmla variable. Normally this would not matter, but glm uses non-standard evaluation (NSE), so it does.
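To make the difference visible, here is a small sketch under stated assumptions: the data frame d and the helpers fit_symbol, fit_value and refit_half are hypothetical names, not from the question. It compares what ends up in the stored call in each case and whether update() can refit the model from another function.

```r
set.seed(1)
# Made-up data, only for illustration
d <- data.frame(x1 = runif(20), x2 = runif(20), y = runif(20))

# Direct call: the stored call keeps the symbol fmla_local
fit_symbol <- function(data) {
  fmla_local <- as.formula("y ~ x1 + x2")
  glm(formula = fmla_local, data = data)
}

# do.call: the formula value itself is embedded in the stored call
fit_value <- function(data) {
  fmla_local <- as.formula("y ~ x1 + x2")
  do.call("glm", list(formula = fmla_local, data = data))
}

# Refit on the first half of the rows, from a different function
refit_half <- function(mod) {
  rows <- seq_len(nrow(mod$data) %/% 2)
  update(mod, data = mod$data[rows, ])
}

m_symbol <- fit_symbol(d)
m_value  <- fit_value(d)

print(class(m_symbol$call$formula))  # a symbol ("name")
print(class(m_value$call$formula))   # a formula object

# The symbol version fails because fmla_local is not visible from refit_half
res_symbol <- try(refit_half(m_symbol), silent = TRUE)
res_value  <- refit_half(m_value)
```

One side effect worth knowing: with do.call the data frame is embedded in the stored call as well, so printing m_value$call can be verbose.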
formula takes a text argument. If you want to build a formula then as.formula is the standard approach. Voting to close as a typo. – IRTFM Commented Nov 16, 2024 at 6:46
fmla as an argument to fun1 ... – Limey Commented Nov 16, 2024 at 6:58