最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

r - Calculate svymean for variable within user-defined function called by mutate(), across(), with variable name as function par

programmeradmin1浏览0评论

My goal is to take a dataframe that contains individual level data of many variables and apply a user defined function to return those variables' survey-adjusted weighted means and create new columns called, wt_mean_VAR, where each entry is the svymean for that VAR.

In my attempt to do so, I have defined a function, calc_wt_mean_var_fn_df_ex, that takes a dataframe (data) and variable name (VAR) as parameters. The function is then supposed to create the svydesign object (svydes) from data, and then calculate the svymean of VAR. The function should return the svymean of this VAR.

I am then try to combine this user-defined function with a vector of variable names, i.e., dplyr::mutate:across:any_of(regularVar_names_ex), to create columns called wt_mean_VAR, where every entry for each particular column is the svymean calculated in my user defined-function for that VAR.

Example data:

input.ds.2018 = tibble(
  Var1 = c(1, 1, NA, NA, 1, 0),
  Var2 = rep(c(1, 0), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72),
  WT = c(2,3,8,4,0.1,5),
  X_PSU = c(1,2,3,4,5,6),
  X_STSTR = c(1,2,1,2,1,2)
)

regularVar_names_ex = c("Var1","Var2") 

calc_wt_mean_var_fn_df_ex = function(data,VAR){
  #only proceed if there are at least 2 unique answers to the variable of interest, VAR
  if(data %>% filter(!is.na(eval(parse(text = VAR)))) %>% group_by(eval(parse(text = VAR))) %>% count %>%  nrow > 1){
    #create survey design object using dataframe, data
    svydes = svydesign(id =~X_PSU,strata =~ X_STSTR, weights =~WT, data =data %>% filter(!is.na(WT) & !is.na(X_PSU)),nest = TRUE )
    #calculate survey-adjusted mean and SE and return results as tibble
    res = svymean((~eval(parse(text = VAR))), svydes, na.rm = TRUE) %>%  as_tibble()
    #extract the answer, the survey-adjusted mean weight stored in res, as ans
    ans = res %>% select(mean) %>% pull 
  } else{
    ans = NA
  }
  return(ans)
}

The survey-adjusted weighted mean of Var1 in input.ds.2018 = 0.505 and weighted mean of Var2 = 0.457.

When I do:

output.ds.2018 = input.ds.2018 %>%
  mutate(across(all_of(regularVar_names_ex), 
                \(x) calc_wt_mean_var_fn_df_ex(pick(x,X_PSU,X_STSTR,WT), VAR = x), 
                .names = "wt_mean_{.col}"))

I would like to get the following:

output.desired = tibble(
  Var1 = c(1, 1, NA, NA, 1, 0),
  Var2 = rep(c(1, 0), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72),
  WT = c(2,3,8,4,0.1,5),
  X_PSU = c(1,2,3,4,5,6),
  X_STSTR = c(1,2,1,2,1,2),
  wt_mean_Var1 = c(0.505,0.505,0.505,0.505,0.505,0.505),
  wt_mean_Var2 = c(0.457,0.457,0.457,0.457,0.457,0.457)
)
> output.desired
# A tibble: 6 x 10
   Var1  Var2    V3 y_4   X_AGE80    WT X_PSU X_STSTR wt_mean_Var1 wt_mean_Var2
  <dbl> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>   <dbl>        <dbl>        <dbl>
1     1     1    NA NA         17   2       1       1        0.505        0.457
2     1     0     2 y          18   3       2       2        0.505        0.457
3    NA     1     2 z          NA   8       3       1        0.505        0.457
4    NA     0     2 l          84   4       4       2        0.505        0.457
5     1     1     2 m          21   0.1     5       1        0.505        0.457
6     0     0     1 n          72   5       6       2        0.505        0.457

but instead I get NAs for the weighted means. What am I doing wrong?

> output.ds.2018
# A tibble: 6 x 10
   Var1  Var2    V3 y_4   X_AGE80    WT X_PSU X_STSTR wt_mean_Var1 wt_mean_Var2
  <dbl> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>   <dbl> <lgl>        <lgl>       
1     1     1    NA NA         17   2       1       1 NA           NA          
2     1     0     2 y          18   3       2       2 NA           NA          
3    NA     1     2 z          NA   8       3       1 NA           NA          
4    NA     0     2 l          84   4       4       2 NA           NA          
5     1     1     2 m          21   0.1     5       1 NA           NA          
6     0     0     1 n          72   5       6       2 NA           NA

My goal is to take a dataframe that contains individual level data of many variables and apply a user defined function to return those variables' survey-adjusted weighted means and create new columns called, wt_mean_VAR, where each entry is the svymean for that VAR.

In my attempt to do so, I have defined a function, calc_wt_mean_var_fn_df_ex, that takes a dataframe (data) and variable name (VAR) as parameters. The function is then supposed to create the svydesign object (svydes) from data, and then calculate the svymean of VAR. The function should return the svymean of this VAR.

I am then try to combine this user-defined function with a vector of variable names, i.e., dplyr::mutate:across:any_of(regularVar_names_ex), to create columns called wt_mean_VAR, where every entry for each particular column is the svymean calculated in my user defined-function for that VAR.

Example data:

input.ds.2018 = tibble(
  Var1 = c(1, 1, NA, NA, 1, 0),
  Var2 = rep(c(1, 0), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72),
  WT = c(2,3,8,4,0.1,5),
  X_PSU = c(1,2,3,4,5,6),
  X_STSTR = c(1,2,1,2,1,2)
)

regularVar_names_ex = c("Var1","Var2") 

calc_wt_mean_var_fn_df_ex = function(data,VAR){
  #only proceed if there are at least 2 unique answers to the variable of interest, VAR
  if(data %>% filter(!is.na(eval(parse(text = VAR)))) %>% group_by(eval(parse(text = VAR))) %>% count %>%  nrow > 1){
    #create survey design object using dataframe, data
    svydes = svydesign(id =~X_PSU,strata =~ X_STSTR, weights =~WT, data =data %>% filter(!is.na(WT) & !is.na(X_PSU)),nest = TRUE )
    #calculate survey-adjusted mean and SE and return results as tibble
    res = svymean((~eval(parse(text = VAR))), svydes, na.rm = TRUE) %>%  as_tibble()
    #extract the answer, the survey-adjusted mean weight stored in res, as ans
    ans = res %>% select(mean) %>% pull 
  } else{
    ans = NA
  }
  return(ans)
}

The survey-adjusted weighted mean of Var1 in input.ds.2018 = 0.505 and weighted mean of Var2 = 0.457.

When I do:

output.ds.2018 = input.ds.2018 %>%
  mutate(across(all_of(regularVar_names_ex), 
                \(x) calc_wt_mean_var_fn_df_ex(pick(x,X_PSU,X_STSTR,WT), VAR = x), 
                .names = "wt_mean_{.col}"))

I would like to get the following:

output.desired = tibble(
  Var1 = c(1, 1, NA, NA, 1, 0),
  Var2 = rep(c(1, 0), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72),
  WT = c(2,3,8,4,0.1,5),
  X_PSU = c(1,2,3,4,5,6),
  X_STSTR = c(1,2,1,2,1,2),
  wt_mean_Var1 = c(0.505,0.505,0.505,0.505,0.505,0.505),
  wt_mean_Var2 = c(0.457,0.457,0.457,0.457,0.457,0.457)
)
> output.desired
# A tibble: 6 x 10
   Var1  Var2    V3 y_4   X_AGE80    WT X_PSU X_STSTR wt_mean_Var1 wt_mean_Var2
  <dbl> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>   <dbl>        <dbl>        <dbl>
1     1     1    NA NA         17   2       1       1        0.505        0.457
2     1     0     2 y          18   3       2       2        0.505        0.457
3    NA     1     2 z          NA   8       3       1        0.505        0.457
4    NA     0     2 l          84   4       4       2        0.505        0.457
5     1     1     2 m          21   0.1     5       1        0.505        0.457
6     0     0     1 n          72   5       6       2        0.505        0.457

but instead I get NAs for the weighted means. What am I doing wrong?

> output.ds.2018
# A tibble: 6 x 10
   Var1  Var2    V3 y_4   X_AGE80    WT X_PSU X_STSTR wt_mean_Var1 wt_mean_Var2
  <dbl> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>   <dbl> <lgl>        <lgl>       
1     1     1    NA NA         17   2       1       1 NA           NA          
2     1     0     2 y          18   3       2       2 NA           NA          
3    NA     1     2 z          NA   8       3       1 NA           NA          
4    NA     0     2 l          84   4       4       2 NA           NA          
5     1     1     2 m          21   0.1     5       1 NA           NA          
6     0     0     1 n          72   5       6       2 NA           NA
Share Improve this question edited Jan 29 at 7:20 abrar asked Jan 29 at 2:00 abrarabrar 1451 silver badge11 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

I think there were a couple of things going on here, but the biggest one is that when your function expected VAR to be a variable name, but when you pass the variable in via mutate(), it is a vector of values. So, the function wasn't getting what it expected for VAR. I rewrote the function to take vectors as arguments - things that will get passed in via mutate(). The function now takes the names of the variables for VAR, PSU, STRATA and WT. Regardless of those names in the input dataset, these variables take on those names (PSU, STRATA, WT and VAR) inside the function. That makes it a bit easier to deal with and removes the need for eval(parse(text = ...)), which is often frowned upon around here.

calc_wt_mean_var_fn_df_ex = function(VAR, PSU, STRATA, WT){
  #only proceed if there are at least 2 unique answers to the variable of interest, VAR
  data <- data.frame(VAR, PSU, STRATA,WT, one=1) 
  data <- subset(data, !is.na(PSU) & !is.na(WT))
  nr <- nrow(na.omit(aggregate(data$one, list(data$VAR), sum)))
   
  if(nr > 1){
    #create survey design object using dataframe, data
    svydes = svydesign(id =~PSU,
                       strata =~STRATA, 
                       weights =~WT, 
                       data =data,nest = TRUE )
    #calculate survey-adjusted mean and SE and return results as tibble
    res = svymean(~VAR, svydes, na.rm = TRUE) 
    #extract the answer, the survey-adjusted mean weight stored in res, as ans
    ans = res[[1]]
  } else{
    ans = NA
  }
  return(ans)
}

Here's the function at work:

library(survey)
library(dplyr)
input.ds.2018 = tibble(
  Var1 = c(1, 1, NA, NA, 1, 0),
  Var2 = rep(c(1, 0), 3),
  V3 = c(NA, rep(2, 4), 1),
  y_4 = c(NA, "y", "z", "l", "m", "n"),
  X_AGE80 = c(17, 18, NA, 84, 21, 72),
  WT = c(2,3,8,4,0.1,5),
  X_PSU = c(1,2,3,4,5,6),
  X_STSTR = c(1,2,1,2,1,2)
)

regularVar_names_ex = c("Var1","Var2") 

input.ds.2018 %>%
  mutate(across(all_of(regularVar_names_ex), 
                \(x) calc_wt_mean_var_fn_df_ex(x, X_PSU, X_STSTR, WT), 
                .names = "wt_mean_{.col}"))
#> # A tibble: 6 × 10
#>    Var1  Var2    V3 y_4   X_AGE80    WT X_PSU X_STSTR wt_mean_Var1 wt_mean_Var2
#>   <dbl> <dbl> <dbl> <chr>   <dbl> <dbl> <dbl>   <dbl>        <dbl>        <dbl>
#> 1     1     1    NA <NA>       17   2       1       1        0.505        0.457
#> 2     1     0     2 y          18   3       2       2        0.505        0.457
#> 3    NA     1     2 z          NA   8       3       1        0.505        0.457
#> 4    NA     0     2 l          84   4       4       2        0.505        0.457
#> 5     1     1     2 m          21   0.1     5       1        0.505        0.457
#> 6     0     0     1 n          72   5       6       2        0.505        0.457

Created on 2025-01-29 with reprex v2.1.1

发布评论

评论列表(0)

  1. 暂无评论