dplyr - Using across() and where() with replace_na() with integer column in R

This is my code:

library(tidyverse)
library(palmerpenguins)

penguins %>%
  mutate(across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])))

Why doesn't it work when I add a third across() call for the integer columns?

penguins %>%
  mutate(across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
         across(where(is.integer), ~ replace_na(.x, mean(.x, na.rm = TRUE))))

As soon as I add the part that handles the integer columns, the code stops working.

asked Feb 6 at 22:15 by Laura

The mean of an integer column is a double, and generally not a whole number. The error occurs because that double can't be stored in the integer column without losing precision. You might want to use the median() as the replacement, or convert the column with as.numeric(). – VinceGreg Commented Feb 6 at 22:24
@VinceGreg The median of a vector of integers can still be non-integral. Try: median(1:4). – Edward Commented Feb 7 at 8:57
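
To illustrate the two comments above, here is a small base R sketch. The vector x is made up for illustration and is not taken from the penguins data.

```r
# Hypothetical integer vector with one missing value
x <- c(1L, 2L, 3L, 4L, NA)

typeof(x)                       # "integer"
typeof(mean(x, na.rm = TRUE))   # "double"

mean(x, na.rm = TRUE)           # 2.5
median(x, na.rm = TRUE)         # 2.5, so the median does not avoid the problem here
```

Neither 2.5 can be written into an integer vector without dropping the fractional part, which is the conversion replace_na() refuses to do silently.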


2 Answers

As per the error information:

Error in mutate():
ℹ In argument: across(where(is.integer), ~replace_na(.x, mean(.x, na.rm = TRUE))).
Caused by error in across():
! Can't compute column flipper_length_mm.
Caused by error in vec_assign():
! Can't convert from replace <double> to data <integer> due to loss of precision.
• Locations: 1
Run rlang::last_trace() to see where the error occurred.

I think you need to explicitly convert the double returned by mean() back to integer, e.g.,

penguins %>%
    mutate(
        across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
        across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
        across(where(is.integer), ~ replace_na(.x, as.integer(mean(.x, na.rm = TRUE))))
    )
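
One thing to keep in mind with the approach above: as.integer() truncates toward zero rather than rounding. If the nearest whole number is preferred, rounding before the conversion is one option. The following is a minimal sketch on a made-up vector x, not on the penguins data.

```r
library(tidyr)

# Hypothetical integer column; the mean of the observed values is 8/3 (about 2.67)
x <- c(1L, 2L, 5L, NA)
m <- mean(x, na.rm = TRUE)

as.integer(m)          # 2: the fractional part is dropped
as.integer(round(m))   # 3: rounded to the nearest whole number first

# Fill the NA with the rounded mean; the result stays an integer vector
filled <- replace_na(x, as.integer(round(m)))
filled                 # 1 2 5 3
```

Inside the pipeline, the same idea would mean writing as.integer(round(mean(.x, na.rm = TRUE))) as the replacement value.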

Since the result after imputation using the mean is likely to be non-integer, you could convert all integer columns to numeric before imputation:

penguins %>%
  mutate(across(where(is.integer), as.numeric),
         across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
         across(where(is.factor), ~ replace_na(.x, levels(.x)[1])))

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>             <dbl>       <dbl> <fct>  <dbl>
 1 Adelie  Torgersen           39.1          18.7              181        3750  male    2007
 2 Adelie  Torgersen           39.5          17.4              186        3800  female  2007
 3 Adelie  Torgersen           40.3          18                195        3250  female  2007
 4 Adelie  Torgersen           43.9          17.2              201.       4202. female  2007
 5 Adelie  Torgersen           36.7          19.3              193        3450  female  2007
 6 Adelie  Torgersen           39.3          20.6              190        3650  male    2007
 7 Adelie  Torgersen           38.9          17.8              181        3625  female  2007
 8 Adelie  Torgersen           39.2          19.6              195        4675  male    2007
 9 Adelie  Torgersen           34.1          18.1              193        3475  female  2007
10 Adelie  Torgersen           42            20.2              190        4250  female  2007
# ℹ 334 more rows
# ℹ Use `print(n = ...)` to see more rows

As an aside, most would not recommend missing value imputation using the mean as it can cause bias in the analysis.
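
One well-known form of that bias is that mean imputation shrinks the variance of the imputed variable, because every filled-in value sits exactly at the centre of the distribution. A toy base R sketch, using made-up numbers:

```r
# Hypothetical measurements with three missing values
x <- c(10, 12, 14, 16, 18, NA, NA, NA)

# Replace each NA with the mean of the observed values (14)
imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

var(x, na.rm = TRUE)   # 10: variance of the observed values
var(imputed)           # about 5.71: smaller after mean imputation
```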
