
dplyr - Using across() and where() with replace_na() with integer column in R - Stack Overflow


This is my code

library(tidyverse)
library(palmerpenguins)
penguins %>%
  mutate(across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])))

Why doesn't it work when I do this?

penguins %>%
  mutate(across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
         across(where(is.integer), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

As soon as I add the part that handles the integer columns, it stops working.

Asked Feb 6 at 22:15 by Laura
  • The mean of an integer column is a double, and the error occurs because that double can't be stored in the integer column. You might want to use the median() value as the replacement, or convert the column with as.numeric(). – VinceGreg, commented Feb 6 at 22:24
  • @VinceGreg The median of a vector of integers can still be non-integral. Try: median(1:4). – Edward, commented Feb 7 at 8:57
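
To make the two comments above concrete, here is a small base R check. The vector `x` is made up for illustration and is not taken from the penguins data; it only shows that both mean() and median() can return a non-integer double for integer input.

```r
# Hypothetical integer vector with one missing value (illustration only)
x <- c(1L, 2L, NA, 4L)

typeof(x)                    # "integer"

# mean() of an integer vector returns a double
m <- mean(x, na.rm = TRUE)
typeof(m)                    # "double"
m == round(m)                # FALSE: 7/3 is not a whole number

# median() can also land between two integers
med <- median(1:4)
med                          # 2.5
```

Because neither value is a whole number, neither can be written into an integer column without losing precision, which is what the replace_na() call in the question runs into.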

2 Answers

As per the error information:

Error in mutate():
ℹ In argument: across(where(is.integer), ~replace_na(.x, mean(.x, na.rm = TRUE))).
Caused by error in across():
! Can't compute column flipper_length_mm.
Caused by error in vec_assign():
! Can't convert from replace to data due to loss of precision.
• Locations: 1
Run rlang::last_trace() to see where the error occurred.

I think you need to force the type conversion from double (the type that mean() returns) back to integer, e.g.:

penguins %>%
    mutate(
        across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
        across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
        across(where(is.integer), ~ replace_na(.x, as.integer(mean(.x, na.rm = TRUE))))
    )
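
One thing to keep in mind with the code above: as.integer() drops the fractional part rather than rounding. The sketch below uses a made-up one-column tibble (`df`, `n`, `truncated`, and `rounded` are illustrative names, not from the question) to compare plain as.integer() with rounding first. Whether rounding is preferable depends on the analysis.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the observed values 1, 2, 5 have mean 8/3 (about 2.67)
df <- tibble(n = c(1L, 2L, NA, 5L))

out <- df %>%
  mutate(
    # as.integer() truncates toward zero, so the NA becomes 2
    truncated = replace_na(n, as.integer(mean(n, na.rm = TRUE))),
    # rounding first gives the nearest whole number, so the NA becomes 3
    rounded   = replace_na(n, as.integer(round(mean(n, na.rm = TRUE))))
  )

out
```

Both new columns stay integer, so replace_na() accepts the replacement without the loss-of-precision error.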

Since the result after imputation using the mean is likely to be non-integer, you could convert all integer columns to numeric before imputation:

penguins %>%
  mutate(across(where(is.integer), as.numeric),
         across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])))

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>             <dbl>       <dbl> <fct>  <dbl>
 1 Adelie  Torgersen           39.1          18.7              181        3750  male    2007
 2 Adelie  Torgersen           39.5          17.4              186        3800  female  2007
 3 Adelie  Torgersen           40.3          18                195        3250  female  2007
 4 Adelie  Torgersen           43.9          17.2              201.       4202. female  2007
 5 Adelie  Torgersen           36.7          19.3              193        3450  female  2007
 6 Adelie  Torgersen           39.3          20.6              190        3650  male    2007
 7 Adelie  Torgersen           38.9          17.8              181        3625  female  2007
 8 Adelie  Torgersen           39.2          19.6              195        4675  male    2007
 9 Adelie  Torgersen           34.1          18.1              193        3475  female  2007
10 Adelie  Torgersen           42            20.2              190        4250  female  2007
# ℹ 334 more rows
# ℹ Use `print(n = ...)` to see more rows

As an aside, imputing missing values with the mean is generally not recommended, as it can bias the analysis.
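
One well-known form of that bias is that mean imputation shrinks the variance: the filled-in values sit exactly at the centre and add no spread. A minimal base R sketch with a made-up vector (`x`, `observed`, and `imputed` are illustrative names, not penguins data):

```r
# Hypothetical numeric vector with one missing value
x <- c(1, 2, NA, 4, 10)

# Only the values that were actually observed
observed <- x[!is.na(x)]

# Same vector with the NA replaced by the observed mean (4.25)
imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

mean(observed)   # 4.25
mean(imputed)    # 4.25, the mean is unchanged
var(observed)    # 16.25
var(imputed)     # 12.1875, the spread is understated
```

The mean is preserved, but the variance drops, so standard errors computed from the imputed column would tend to look smaller than the data justify.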
