r - Passing data frame columns into simple functions with NSE - Stack Overflow

Every time I think I've figured out the details of passing data frame columns into functions, I find a new situation that complicates the process.

I have a custom function in which I'm passing the data frame columns using curly brackets {{}}. This works great for calling them as part of dplyr sequences, as shown in sampfun1 below. However, if I want to use a very simple function on a single column (for example, sd(mtcars$disp)), I run into difficulties, as it does not seem possible to use the curly brackets directly on the dataframe (df${{col}} or any similar alternative I've tried).

Right now I'm getting around this by using df[[deparse(substitute(col))]], as shown in sampfun2 below. This is fine, but it is a bit clunky, especially in complex functions where multiple columns are passed and then used in different ways. Is there a simpler way to achieve the output of sampfun2? I know I could just pass the column name as a string and go directly to df[[col]], but I'd like to avoid that since I'm using the column in other ways elsewhere in the function.

library(dplyr)

sampfun1 <- function(df, col){
  df %>% 
    mutate(xsd = sd({{col}}))
}

sampfun2 <- function(df, col){
  colStr <- deparse(substitute(col))
  dat_sd <- sd(df[[colStr]])
}

disp_sd1 <- sampfun1(mtcars, disp)
disp_sd2 <- sampfun2(mtcars, disp)

EDIT for clarification: This is a very simplified function just to display the issue of passing a column into a function and then calling just the column (rather than e.g. something through dplyr that calls first the data frame and then the function). My goal isn't to pass a large number of columns to the same function, just to simplify the syntax if I need to repeatedly call that column in different contexts. When calling a subset of the data frame using dplyr, this isn't a problem - it only arises when trying to extract the column. Here is another example to maybe better illustrate what I'm trying to do:

sampfun3 <- function(df, col){
  single_col <- df %>% select({{col}}) %>% pull()
  dat_sd <- sd(single_col)
}

This also works for what I'm trying to do, though it's a little more cumbersome than sampfun2. I was just wondering if there's a simpler way to extract a specific column when it's been passed using {{}}.

Asked Feb 15 at 16:29 by Jaken, edited Feb 15 at 19:34

  • I prefer and encourage people to code custom functions in base R. Here is how to do it in {dplyr} according to its documentation. – Friede, commented Feb 15 at 21:27

3 Answers

More approaches:

sampfun3 <- function(df, col) {
  df |> pull({{col}}) |> sd()
}

> sampfun3(mtcars, disp)
[1] 123.9387



sampfun4 <- function(df, col){
  df |> summarize(across( {{col}}, ~sd(.x)))
}

> sampfun4(mtcars, disp)
      disp
1 123.9387

You could use dots and match.call.

sampfun3 <- function(df, ..., FUN=sd) {
  args <- match.call(expand.dots=FALSE)$...
  df[sapply(args, deparse)] |> sapply(FUN)
}

> sampfun3(mtcars, disp)
    disp 
123.9387 
> sampfun3(mtcars, disp, mpg, am, hp)
       disp         mpg          am          hp 
123.9386938   6.0269481   0.4989909  68.5628685 
> sampfun3(mtcars, disp, mpg, am, hp, FUN=mean)
     disp       mpg        am        hp 
230.72188  20.09062   0.40625 146.68750 
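To see what the match.call step contributes, here is a minimal base R sketch that returns only the captured column names. The helper name capture_names is hypothetical and not part of the answer above; it isolates the first two steps of that function.

```r
# Hypothetical helper: returns the names of the columns passed through ...
capture_names <- function(df, ...) {
  # With expand.dots = FALSE, the unevaluated ... arguments are kept
  # together as one list of symbols under the name "..."
  args <- match.call(expand.dots = FALSE)$...
  # deparse() turns each symbol into its character name
  sapply(args, deparse)
}

nms <- capture_names(mtcars, disp, mpg)
print(nms)
```

The resulting character vector is what the answer uses to index the data frame with df[...], so every name has to be an existing column of df.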

The first two use tidyverse tools (dplyr and rlang), but the last two seem closer to what is mentioned in the comment.

1) dplyr If the problem is to calculate the sd of several columns then we can pass a selection using tidy-select syntax.

library(dplyr)

sampfun3 <- function(df, sel) {
  df %>% summarize(across({{sel}}, sd))
}

sampfun3(mtcars, mpg:disp)  # columns from mpg to disp
##        mpg      cyl     disp
## 1 6.026948 1.785922 123.9387

sampfun3(mtcars, starts_with("c"))  # columns whose name starts with c
##        cyl   carb
## 1 1.785922 1.6152

sampfun3(mtcars, disp)  # just disp
##       disp
## 1 123.9387

2) rlang If the problem is not multiple columns but rather just avoiding character strings then this does not use any character strings anywhere. It requires one extra line of code for each unquoted argument passed.

library(rlang)

sampfun4 <- function(df, col) {
  col <- eval_tidy(enquo(col), df)
  sd(col) 
}

sampfun4(mtcars, disp)
## [1] 123.9387
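For comparison, the same capture-then-evaluate idea can be sketched without rlang, using substitute() and eval() from base R. The name sampfun4_base is hypothetical, and this is only a rough analogue of eval_tidy(enquo(col), df), not a drop-in replacement for it.

```r
# Hypothetical base R analogue of approach (2)
sampfun4_base <- function(df, col) {
  # substitute(col) captures the unevaluated argument (e.g. the symbol disp);
  # eval() looks it up in df first, then in the caller's environment
  col <- eval(substitute(col), df, parent.frame())
  sd(col)
}

res_base  <- sampfun4_base(mtcars, disp)
# Any expression over the columns can be passed, not just a bare name
res_ratio <- sampfun4_base(mtcars, disp / cyl)
print(res_base)
```

As in (2), this needs one extra line per unquoted argument, but after that line col is an ordinary vector that can be reused anywhere in the function.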

3) Base R With this approach we start and end the function body as shown and between those two lines we can have as many lines and references to arguments as desired with no extra per-argument code.

sampfun5 <- function(df, col1, col2) eval.parent(substitute({
  sd(df$col1) / mean(df$col2)
}))

sampfun5(mtcars, disp, cyl)
## [1] 20.0305
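To see why (3) works, it can help to look at the expression that substitute() builds before eval.parent() runs it. The sketch below uses a hypothetical helper, show_expr, that has the same body as sampfun5 but returns the substituted expression instead of evaluating it.

```r
# Hypothetical helper: same body as sampfun5, without the eval.parent() step
show_expr <- function(df, col1, col2) substitute({
  sd(df$col1) / mean(df$col2)
})

e <- show_expr(mtcars, disp, cyl)
print(e)   # the argument names have been replaced by what the caller typed

# Evaluating the expression by hand reproduces the sampfun5 computation
res <- eval(e)
print(res)
```

Because the arguments are pasted into the code as written, df$col1 becomes mtcars$disp. This also means the body is evaluated in the caller's environment, so any variable assigned inside it would be created there.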

4) gtools defmacro in gtools provides a wrapper implementing (3). See the article by Thomas Lumley starting on page 11 of https://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf .

library(gtools)

sampfun6 <- defmacro(df, col1, col2, expr = {
  sd(df$col1) / mean(df$col2)
})

sampfun6(mtcars, disp, cyl)
## [1] 20.0305