Every time I think I've figured out the details of passing data frame columns into functions, I find a new situation that complicates the process.
I have a custom function in which I'm passing the data frame columns using curly brackets {{}}. This works great for calling them as part of dplyr sequences, as shown in sampfun1 below. However, if I want to use a very simple function on a single column (for example, sd(mtcars$disp)), I run into difficulties, as it does not seem possible to use the curly brackets directly on the data frame (df${{col}} or any similar alternative I've tried).
Right now I'm getting around this by using df[[deparse(substitute(col))]], as shown in sampfun2 below. This is fine, but is a bit clunky, especially in complex functions where multiple columns are being passed and then being used in different ways. Is there a simpler way to achieve the output for sampfun2? I know I could just pass the column name as a string and go directly to df[[col]], but I'd like to avoid that since I'm using the column in other ways elsewhere in the function.
library(dplyr)
sampfun1 <- function(df, col){
  df %>%
    mutate(xsd = sd({{col}}))
}
sampfun2 <- function(df, col){
  colStr <- deparse(substitute(col))
  dat_sd <- sd(df[[colStr]])
}
disp_sd1 <- sampfun1(mtcars, disp)
disp_sd2 <- sampfun2(mtcars, disp)
EDIT for clarification: This is a very simplified function just to display the issue of passing a column into a function and then calling just the column (rather than e.g. something through dplyr that calls first the data frame and then the function). My goal isn't to pass a large number of columns to the same function, just to simplify the syntax if I need to repeatedly call that column in different contexts. When calling a subset of the data frame using dplyr, this isn't a problem; it only arises when trying to extract the column. Here is another example to maybe better illustrate what I'm trying to do:
sampfun3 <- function(df, col){
  single_col <- df %>% select({{col}}) %>% pull()
  dat_sd <- sd(single_col)
}
This also works for what I'm trying to do, though it's a little more cumbersome than sampfun2. I was just wondering if there's a simpler way to extract a specific column when it's been passed using {{}}.
Asked by Jaken on Feb 15 at 16:29; edited Feb 15 at 19:34.
3 Answers
More approaches:
sampfun3 <- function(df, col) {
  df |> pull({{col}}) |> sd()
}
> sampfun3(mtcars, disp)
[1] 123.9387
sampfun4 <- function(df, col) {
  df |> summarize(across({{col}}, ~sd(.x)))
}
> sampfun4(mtcars, disp)
      disp
1 123.9387
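If the column is needed in several places inside the function, one option is to pull it out once and reuse the resulting plain vector. This is only a sketch building on the pull({{col}}) idea above; the function name sampfun_reuse and the choice of summaries are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: extract the column once, then use it as an ordinary vector.
sampfun_reuse <- function(df, col) {
  x <- pull(df, {{ col }})       # bare column name -> numeric vector
  c(sd = sd(x), mean = mean(x))  # reuse x in as many places as needed
}

res <- sampfun_reuse(mtcars, disp)
res
```

After the single pull() call, no further tidy evaluation is needed, so base functions such as sd() and mean() can be applied directly.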
You could use dots and match.call.
sampfun3 <- function(df, ..., FUN=sd) {
  args <- match.call(expand.dots=FALSE)$...
  df[sapply(args, deparse)] |> sapply(FUN)
}
> sampfun3(mtcars, disp)
    disp
123.9387
> sampfun3(mtcars, disp, mpg, am, hp)
       disp         mpg          am          hp
123.9386938   6.0269481   0.4989909  68.5628685
> sampfun3(mtcars, disp, mpg, am, hp, FUN=mean)
     disp       mpg        am        hp
230.72188  20.09062   0.40625 146.68750
The first two use rlang but the last two seem closer to what is mentioned in the comment.
1) dplyr If the problem is to calculate the sd of several columns then we can pass a selection using tidy-select syntax.
library(dplyr)
sampfun3 <- function(df, sel) {
  df %>% summarize(across({{sel}}, sd))
}
sampfun3(mtcars, mpg:disp) # columns from mpg to disp
## mpg cyl disp
## 1 6.026948 1.785922 123.9387
sampfun3(mtcars, starts_with("c")) # columns whose name starts with c
## cyl carb
## 1 1.785922 1.6152
sampfun3(mtcars, disp) # just disp
## disp
## 1 123.9387
2) rlang If the problem is not multiple columns but rather just avoiding character strings then this does not use any character strings anywhere. It requires one extra line of code for each unquoted argument passed.
library(rlang)
sampfun4 <- function(df, col) {
  col <- eval_tidy(enquo(col), df)
  sd(col)
}
sampfun4(mtcars, disp)
## [1] 123.9387
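If the aim is only a tidier stand-in for deparse(substitute(col)), rlang's as_name() applied to the quosure gives the column name as a string, which can then be used with df[[...]]. This is a minimal sketch that assumes col is always a bare column name (as_name() raises an error for expressions such as disp * 2); the function name sampfun_name is made up for illustration.

```r
library(rlang)

# Hypothetical helper: turn the bare column name into a string, then index with [[ ]].
sampfun_name <- function(df, col) {
  col_name <- as_name(enquo(col))  # e.g. "disp"
  sd(df[[col_name]])
}

out <- sampfun_name(mtcars, disp)
out
```

Unlike (2), this keeps the name itself available, which can be handy if the same column also has to be referenced by name elsewhere in the function.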
3) Base R With this approach we start and end the function body as shown and between those two lines we can have as many lines and references to arguments as desired with no extra per-argument code.
sampfun5 <- function(df, col1, col2) eval.parent(substitute({
  sd(df$col1) / mean(df$col2)
}))
sampfun5(mtcars, disp, cyl)
## [1] 20.0305
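To see why (3) works, it can help to look at what substitute() produces before it is evaluated: every argument name in the body, including those to the right of $, is replaced by the expression the caller supplied. The helper show_expr below is a made-up name used only to inspect that intermediate expression.

```r
# Hypothetical helper: return the substituted expression instead of evaluating it.
show_expr <- function(df, col1, col2) {
  substitute(sd(df$col1) / mean(df$col2))
}

e <- show_expr(mtcars, disp, cyl)
print(e)          # the call with df, col1 and col2 replaced by mtcars, disp and cyl
val <- eval(e)    # evaluating it here mirrors what eval.parent() does in sampfun5
val
```

Because the substituted call is evaluated in the caller's frame, the data frame must be visible there under the name that was passed in.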
4) gtools defmacro in gtools provides a wrapper implementing (3). See the article by Thomas Lumley starting on page 11 of https://cran.r-project.org/doc/Rnews/Rnews_2001-3.pdf .
library(gtools)
sampfun6 <- defmacro(df, col1, col2, expr = {
  sd(df$col1) / mean(df$col2)
})
sampfun6(mtcars, disp, cyl)
## [1] 20.0305
{dplyr} according to its documentation. – Friede, commented Feb 15 at 21:27