r - Is there a way to calculate IQR for dates in gtsummary - Stack Overflow

I want to have IQR for dates but when I run the code there are columns with NA...

library(tibble)
library(gtsummary)

set.seed(123)  # Set seed for reproducibility

date_tbl <- tibble(
  start_date = sample(seq(as.Date("2023-01-01"), as.Date("2023-12-31"),
                          by = "day"), 100, replace = TRUE),
  end_date = sample(seq(as.Date("2024-01-01"), as.Date("2024-12-31"),
                        by = "day"), 100, replace = TRUE),
  country = sample(c("Kenya", "uganda", "Rwanda", "Burundi"), 100,
                   replace = TRUE)
)

date_tbl |>
  tbl_summary(by = "country")
#> The following errors were returned during `tbl_summary()`:
#> ✖ For variable `end_date` (`country = "Burundi"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `start_date` (`country = "Burundi"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `end_date` (`country = "uganda"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `start_date` (`country = "uganda"`) and "p75" statistic: * not
#>   defined for "Date" objects

Created on 2025-03-24 with reprex v2.1.1


2 Answers

I want to have IQR for dates

1) A single numeric value (its very definition)

Set up your IQR function with type=1 (see "Types" under Details in the help file of quantile).

library(gtsummary)
iqr_date = \(x) IQR(x, type=1)

date_tbl |>
  tbl_summary(statistic=list(start_date ~ '{iqr_date}', 
                             end_date ~ '{iqr_date}'), 
              by='country')
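
To see what option 1 returns, here is a minimal sketch on a small made-up vector (the four dates are hypothetical, not taken from the question). `IQR()` converts its input with `as.numeric()`, so for Date input the result is a plain number of days rather than a date:

```r
# Hypothetical data: four dates spaced 10 days apart
x <- as.Date("2023-01-01") + c(0, 10, 20, 30)

# Same helper as above, written with function() instead of \(x)
iqr_date <- function(x) IQR(x, type = 1)

# With type = 1 the quartiles are observed values:
# Q1 = 2023-01-01 and Q3 = 2023-01-21, so the IQR is 20 (days)
iqr_days <- iqr_date(x)
print(iqr_days)
```

If the table should say "days" explicitly, the unit can be added in the statistic string, for example `'{iqr_date} days'`.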

2) A date range

Date of q25 to date of q75 as character. This might be what you want.

date_iq_range = \(x) quantile(x, probs=c(.25, .75), type=1) |>
  paste0(collapse='-to-') 

date_tbl |>
  tbl_summary(statistic=list(start_date ~ '{date_iq_range}', 
                             end_date ~ '{date_iq_range}'), 
              by='country')

You might want to use a different separator than -to-. Maybe |> toString() |> paste0('(', ...=_, ')') instead of |> paste0(collapse='-to-').
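
As one way to spell out the parenthesised variant, here is a hedged sketch of a helper that formats both quartiles explicitly. The name `date_iq_range2` and the example dates are made up for illustration; the only calls used are `quantile()`, `format()` and `paste0()`:

```r
# Hypothetical helper: returns "(Q1, Q3)" as a single character string
date_iq_range2 <- function(x) {
  # type = 1 picks observed values, so no arithmetic on dates is needed
  q <- quantile(x, probs = c(0.25, 0.75), type = 1, names = FALSE)
  paste0("(", format(q[1], "%Y-%m-%d"), ", ", format(q[2], "%Y-%m-%d"), ")")
}

# Hypothetical data: four dates spaced 10 days apart
x <- as.Date("2023-01-01") + c(0, 10, 20, 30)
range_label <- date_iq_range2(x)
print(range_label)
```

Like `date_iq_range`, it can be referenced in `tbl_summary()` as `'{date_iq_range2}'`. Changing the format string (for example to `"%d/%m/%Y"`) changes how the dates are displayed.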


A) Data

set.seed(123)  
date_tbl = tibble::tibble(
  start_date = sample(seq(as.Date("2023-01-01"), as.Date("2023-12-31"),
                          by = "day"), 100, replace = TRUE),
  end_date = sample(seq(as.Date("2024-01-01"), as.Date("2024-12-31"),
                        by = "day"), 100, replace = TRUE),
  country = sample(c("Kenya", "uganda", "Rwanda", "Burundi"), 100,
                   replace = TRUE)
)

I don't have a great answer for you. But the issue is not related to gtsummary, and perhaps may be a bug in the quantile() function?

Running the code you provided, the quantiles can be calculated for two of the countries, while the other two result in errors. I did some poking around, but didn't see a clear pattern in the data that distinguished an error from a returned quantile value.

library(gtsummary)
set.seed(123)  # Set seed for reproducibility
date_tbl <- tibble::tibble(
  start_date = sample(seq(as.Date("2023-01-01"), as.Date("2023-12-31"),
                          by = "day"), 100, replace = TRUE),
  end_date = sample(seq(as.Date("2024-01-01"), as.Date("2024-12-31"),
                        by = "day"), 100, replace = TRUE),
  country = sample(c("Kenya", "uganda", "Rwanda", "Burundi"), 100,
                   replace = TRUE)
)
date_tbl |>
  tbl_summary(by = country) |>
  as_kable()
#> The following errors were returned during `as_kable()`:
#> ✖ For variable `end_date` (`country = "Burundi"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `start_date` (`country = "Burundi"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `end_date` (`country = "uganda"`) and "p25" and "p75"
#>   statistics: * not defined for "Date" objects
#> ✖ For variable `start_date` (`country = "uganda"`) and "p75" statistic: * not
#>   defined for "Date" objects

| **Characteristic** | **Burundi** N = 16 | **Kenya** N = 43 | **Rwanda** N = 17 | **uganda** N = 24 |
|:---|:---:|:---:|:---:|:---:|
| start_date | 2023-03-30 (NA, NA) | 2023-06-07 (2023-03-13, 2023-09-13) | 2023-08-24 (2023-04-26, 2023-10-26) | 2023-07-16 (2023-05-17, NA) |
| end_date | 2024-08-26 (NA, NA) | 2024-07-16 (2024-03-24, 2024-10-12) | 2024-07-18 (2024-05-04, 2024-11-05) | 2024-06-17 (NA, NA) |

# Error for Burundi, but no error for Kenya
date_tbl |>
  dplyr::filter(country == "Burundi") |>
  dplyr::pull(start_date) |>
  quantile(probs = 0.25, type = 2)
#> Error in Ops.Date((1 - h), x[j + 2L]): * not defined for "Date" objects

date_tbl |>
  dplyr::filter(country == "Kenya") |>
  dplyr::pull(start_date) |>
  quantile(probs = 0.25, type = 2)
#>          25%
#> "2023-03-13"

# excluding the first obs, no error
date_tbl$start_date[2:100] |>
  quantile(probs = 0.25, type = 2)
#>          25%
#> "2023-03-22"

# including all obs, ERROR
date_tbl$start_date[1:100] |>
  quantile(probs = 0.25, type = 2)
#> Error in Ops.Date((1 - h), x[j + 2L]): * not defined for "Date" objects

# excluding the last 5 obs, no error
date_tbl$start_date[1:95] |>
  quantile(probs = 0.25, type = 2)
#>          25%
#> "2023-03-19"

<sup>Created on 2025-03-24 with [reprex v2.1.1](https://reprex.tidyverse.)</sup>
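
One possible explanation for the pattern, offered as a sketch rather than a confirmed diagnosis: with `type = 2`, `quantile()` averages two neighbouring order statistics whenever `n * p` is a whole number, and that averaging multiplies a number by a Date (the `(1 - h) * x[j + 2L]` seen in the error message). When `n * p` is not a whole number, or the two neighbours happen to be tied, an observed value is returned directly and no arithmetic is needed. This would fit the results above: the group sizes 16, 24 and 100 are multiples of 4, while 43, 17, 99 and 95 are not. The `?quantile` help page also notes that types 1 and 3 can be used for class "Date". The dates below are made up for illustration:

```r
# Hypothetical data: n = 4 distinct dates, so n * 0.25 = 1 is a whole number
x4 <- as.Date("2023-01-01") + c(0, 10, 20, 30)

# type = 2 has to average two different dates here, which fails for Date
# objects; the error message is captured instead of stopping the script
res_type2_n4 <- tryCatch(
  quantile(x4, probs = 0.25, type = 2),
  error = function(e) conditionMessage(e)
)

# Adding a fifth date makes n * 0.25 = 1.25, so type = 2 just picks an
# observed value and no error occurs
x5 <- as.Date("2023-01-01") + c(0, 10, 20, 30, 40)
res_type2_n5 <- quantile(x5, probs = 0.25, type = 2)

# type = 1 never averages, so it works for either sample size
res_type1_n4 <- quantile(x4, probs = 0.25, type = 1)

print(res_type2_n4)
print(res_type2_n5)
print(res_type1_n4)
```

If this reading is right, passing `type = 1` (or `type = 3`) in a custom statistic function avoids the error regardless of group size.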
