
r - Between in dplyr with lm function - Stack Overflow


I am testing for outliers in a linear model fitted to the iris dataset:

mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)

I use rstudent() to calculate the studentized residuals, and add an indicator of whether each value falls outside the range [-2, 2]:

iris2 <-
  iris |> 
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))

but I get this error:

Error in `mutate()`:
ℹ In argument: `res_stud_large = as.numeric(!between(res_stud, -2, 2))`.
Caused by error:
! length(g) must match nrow(X)
Backtrace:
  1. dplyr::mutate(...)
 13. base::stop(`<Rcpp::xc>`)

I checked that

str(rstudent(mod))

 Named num [1:150] -0.0113 -1.2776 0.0609 -0.0142 0.6545 ...
 - attr(*, "names")= chr [1:150] "1" "2" "3" "4" ...

Is the names attribute on this vector the reason I get the error?

I tried using the subset function, but without success.


asked Mar 16 at 18:54 by mariann; edited Mar 17 at 6:23 by marc_s
  • as.numeric(!data.table::between(res_stud, -2, 2)) works for me. – jay.sf Commented Mar 16 at 19:04
  • Also, if I start with a blank workspace and just load dplyr, it works as well. I wonder if you had a version of between() from some other place? – DaveArmstrong Commented Mar 16 at 19:06
  • Thank you, as.numeric(!data.table::between(res_stud, -2, 2)) worked just fine! – mariann Commented Mar 16 at 20:28
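
The comments above suggest name masking: another attached package may define its own between(). Here is a minimal sketch for checking that, assuming dplyr is installed. It uses only base R helpers (find(), environment()) and an explicit dplyr:: prefix; it does not identify which package caused the original error.

```r
library(dplyr)

# List every attached package or environment that defines `between`.
# More than one entry means the first one masks the others.
find("between")

# Show which namespace the unqualified name currently resolves to.
environmentName(environment(between))

# Calling with an explicit namespace sidesteps masking altogether.
mod <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris)
res_stud <- rstudent(mod)
res_stud_large <- as.numeric(!dplyr::between(res_stud, -2, 2))
length(res_stud_large)
```

If find("between") lists a package ahead of dplyr, qualifying the call as dplyr::between() (or data.table::between(), as in the first comment) avoids depending on the order in which packages were attached.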

1 Answer


I think there may be something else going on here. Using just dplyr and the iris data, it works.

library(dplyr)
mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)
iris2 <-
  iris |> 
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))

This works because the iris data are complete (no NA values). If we impose a missing value, you'll see that it fails in a similar way to your example, with a size mismatch, although the error message is not identical:

iris$Species[1] <- NA

mod <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris)
iris2 <-
  iris |> 
  mutate(res_stud = rstudent(mod),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
#> Error in `mutate()`:
#> ℹ In argument: `res_stud = rstudent(mod)`.
#> Caused by error:
#> ! `res_stud` must be size 150 or 1, not 149.
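
The size mismatch can be reproduced without dplyr. The sketch below uses base R only and works on a copy named iris_na (a name introduced here for illustration), assuming the default na.action option (na.omit) has not been changed:

```r
# Work on a copy so the built-in iris data stay untouched.
iris_na <- iris
iris_na$Species[1] <- NA

# With the default na.action (na.omit), row 1 is dropped from the fit.
mod_omit <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris_na)

nrow(iris_na)                    # rows in the data
length(rstudent(mod_omit))       # residuals returned: one fewer than the rows
head(names(rstudent(mod_omit)))  # names show which rows were kept
```

Because the residual vector is shorter than the data frame, mutate() cannot recycle it to the number of rows.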

If you estimate the model with na.action = na.exclude, then extractor functions such as fitted(), residuals() and rstudent() pad their output with NA for the cases that were not used in the fit, so the result has the same length as the original input.

mod2 <- lm(Sepal.Width ~ Sepal.Length*Species, data = iris, 
           na.action = na.exclude)
iris2 <- iris |> 
  mutate(res_stud = rstudent(mod2),
         res_stud_large = as.numeric(!between(res_stud, -2, 2)))
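
As a quick check of the padding behaviour, here is a base R sketch. The names iris_na, mod_excl, res and large are introduced for illustration, and abs(res) > 2 is used as a base R stand-in for !between(res, -2, 2):

```r
# Work on a copy so the built-in iris data stay untouched.
iris_na <- iris
iris_na$Species[1] <- NA

mod_excl <- lm(Sepal.Width ~ Sepal.Length * Species, data = iris_na,
               na.action = na.exclude)

res <- rstudent(mod_excl)
length(res) == nrow(iris_na)  # padded back to the input length
is.na(res[1])                 # the excluded row comes back as NA

# The NA propagates into the indicator instead of being silently dropped.
large <- as.numeric(abs(res) > 2)
table(large, useNA = "ifany")
```

Note that the indicator is NA, not 0, for the excluded row, so later summaries may need na.rm = TRUE or an explicit decision about how to treat that case.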

I wonder if something like this happened along the way that wasn't documented in your example?

Created on 2025-03-16 with reprex v2.1.1.9000
