This is my code:
```r
library(tidyverse)
library(palmerpenguins)

penguins %>%
  mutate(
    across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    across(where(is.factor), ~ replace_na(.x, levels(.x)[1]))
  )
```
Why doesn't it work when I do this?
```r
penguins %>%
  mutate(
    across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
    across(where(is.integer), ~ replace_na(.x, mean(.x, na.rm = TRUE)))
  )
```
As soon as I add the `across()` call that handles the integer columns, it stops working.
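A quick way to see what the third `across()` call selects is to list the integer columns of `penguins` and count their missing values. This is a small diagnostic sketch added for illustration, assuming the `penguins` data as shipped in the `palmerpenguins` package.

```r
library(dplyr)
library(palmerpenguins)

# Columns that where(is.integer) picks up in penguins
# (expected: flipper_length_mm, body_mass_g and year)
int_cols <- penguins %>%
  select(where(is.integer)) %>%
  names()
int_cols

# Number of missing values in each of those integer columns
na_counts <- colSums(is.na(penguins[int_cols]))
na_counts
```

Both `flipper_length_mm` and `body_mass_g` have gaps to fill, and their observed means are not whole numbers, which is where the trouble starts.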
Going by the error message:
```
Error in `mutate()`:
ℹ In argument: `across(where(is.integer), ~replace_na(.x, mean(.x, na.rm = TRUE)))`.
Caused by error in `across()`:
! Can't compute column `flipper_length_mm`.
Caused by error in `vec_assign()`:
! Can't convert from `replace` to `data` due to loss of precision.
• Locations: 1
Run `rlang::last_trace()` to see where the error occurred.
```
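The error can be reproduced without `penguins`. The sketch below uses a made-up integer vector (an assumption for illustration only) and relies on `tidyr::replace_na()` assigning through vctrs, which is what the `vec_assign()` frame in the error message indicates. It shows that `mean()` returns a double and that a fractional double cannot be written into an integer vector.

```r
library(tidyr)

# Hypothetical toy vector standing in for an integer column with a gap
x <- c(1L, NA, 4L)

# mean() returns a double even for integer input; here the value is 2.5
m <- mean(x, na.rm = TRUE)
typeof(m)
#> [1] "double"

# replace_na() keeps the type of `x`, so filling with 2.5 fails
# with a lossy-cast error like the one quoted above
err <- tryCatch(replace_na(x, m), error = function(e) e)
conditionMessage(err)

# A whole-valued double is cast to integer without loss, so this succeeds
y <- c(1L, NA, 3L)
filled <- replace_na(y, mean(y, na.rm = TRUE))
filled
#> [1] 1 2 3
typeof(filled)
#> [1] "integer"
```

This also explains why the failure depends on the data: if an integer column's mean happened to be a whole number, the cast would go through silently.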
I think you need to force the type conversion from double (which is what `mean()` returns) to integer, e.g.,
```r
penguins %>%
  mutate(
    across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    across(where(is.factor), ~ replace_na(.x, levels(.x)[1])),
    across(where(is.integer), ~ replace_na(.x, as.integer(mean(.x, na.rm = TRUE))))
  )
```
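Note that `as.integer()` truncates toward zero rather than rounding, so the fill value is the mean with its fractional part dropped. If rounding is preferred, calling `round()` before `as.integer()` is one option; this is a suggested variation, not part of the answer above, shown here on a made-up vector.

```r
library(tidyr)

# Hypothetical integer column; the observed mean is 5/3 (about 1.67)
x <- c(1L, NA, 2L, 2L)
m <- mean(x, na.rm = TRUE)

# as.integer() truncates, so the gap is filled with 1
truncated <- replace_na(x, as.integer(m))
truncated
#> [1] 1 1 2 2

# Rounding first fills the gap with the nearest whole number, 2
rounded <- replace_na(x, as.integer(round(m)))
rounded
#> [1] 1 2 2 2
```

Inside the pipeline, the corresponding line would be `across(where(is.integer), ~ replace_na(.x, as.integer(round(mean(.x, na.rm = TRUE)))))`.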
Since the mean of an integer column is likely to be non-integer, you could instead convert all integer columns to numeric before imputing:
```r
penguins %>%
  mutate(
    across(where(is.integer), as.numeric),
    across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    across(where(is.factor), ~ replace_na(.x, levels(.x)[1]))
  )
```
```
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
   <fct>   <fct>              <dbl>         <dbl>             <dbl>       <dbl> <fct>  <dbl>
 1 Adelie  Torgersen           39.1          18.7              181        3750  male    2007
 2 Adelie  Torgersen           39.5          17.4              186        3800  female  2007
 3 Adelie  Torgersen           40.3          18                195        3250  female  2007
 4 Adelie  Torgersen           43.9          17.2              201.       4202. female  2007
 5 Adelie  Torgersen           36.7          19.3              193        3450  female  2007
 6 Adelie  Torgersen           39.3          20.6              190        3650  male    2007
 7 Adelie  Torgersen           38.9          17.8              181        3625  female  2007
 8 Adelie  Torgersen           39.2          19.6              195        4675  male    2007
 9 Adelie  Torgersen           34.1          18.1              193        3475  female  2007
10 Adelie  Torgersen           42            20.2              190        4250  female  2007
# ℹ 334 more rows
# ℹ Use `print(n = ...)` to see more rows
```
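One side effect visible in the output is that `year` also becomes `<dbl>`, even though it has no missing values. If that matters, one possible variation (not from the answer above) is to promote only the integer columns that actually contain `NA`s. The helper name `needs_double` is hypothetical, introduced just for this sketch.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Hypothetical helper: TRUE only for integer columns with at least one NA
needs_double <- function(x) is.integer(x) && anyNA(x)

imputed <- penguins %>%
  mutate(
    # `year` has no gaps, so it keeps its integer type
    across(where(needs_double), as.numeric),
    across(where(is.double), ~ replace_na(.x, mean(.x, na.rm = TRUE))),
    across(where(is.factor), ~ replace_na(.x, levels(.x)[1]))
  )

# Inspect the resulting column types
sapply(imputed, typeof)
```

The design choice is simply to narrow the selection predicate; the imputation steps are unchanged.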
As an aside, mean imputation is generally not recommended, as it can bias the analysis (for example, it artificially shrinks the variance of the imputed columns).
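One concrete form of that bias: filling gaps with the mean leaves the mean unchanged but adds values with zero deviation from it, so the sample variance goes down. A small base-R illustration on made-up numbers:

```r
# Made-up values with two gaps, purely for illustration
x <- c(2, 4, NA, 6, 8, NA)

# Variance of the observed values: sum of squares 20 over n - 1 = 3
observed_var <- var(x, na.rm = TRUE)

# Fill both gaps with the observed mean (5)
filled <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Same sum of squares, but now divided by n - 1 = 5
imputed_var <- var(filled)

c(observed = observed_var, imputed = imputed_var)
```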
VinceGreg (Feb 6 at 22:24): The `mean()` of the integer columns will be a double, and the error occurs because the double can't be stored in the integer columns. You might want to use the `median()` value as the replacement, or change the columns with `as.numeric()`.

Edward (Feb 7 at 8:57): `median(1:4)`.
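The `median(1:4)` reply points out that the median is not guaranteed to fit an integer column either: for an even number of values, base R's default `median()` method averages the two middle ones. A quick check, assuming that default method:

```r
# Even length: the two middle values (2 and 3) are averaged
even <- median(1:4)
even          # 2.5
typeof(even)  # "double"

# Odd length: the middle value itself is returned, still an integer
odd <- median(c(1L, 2L, 4L))
typeof(odd)   # "integer"
```

So switching to `median()` alone can still yield a fractional fill value, and the explicit `as.integer()` or `as.numeric()` conversions from the answers remain relevant.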