Feature request: Effective sample size for `tbl_svysummary()` #2120

malcolmbarrett · 2025-01-09T18:24:57Z

We use tbl_svysummary() for propensity score-weighted analyses. tbl_svysummary() reports the sum of the weights as the sample size, which is certainly true from one perspective.

Another perspective is the effective sample size (ESS): the size of a simple random sample that would give the same precision as the weighted analysis. It's often a good indicator of problems, and it gives readers an intuitive sense of the variance. The calculation is very simple. In halfmoon, it's defined as:

```r
# Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)
ess <- function(wts) {
  sum(wts)^2 / sum(wts^2)
}
```
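To make the formula concrete, here is a quick check of how the ESS behaves. The `ess()` definition is repeated so the snippet runs on its own. Equal weights recover the raw count, while a single heavier weight pulls the ESS below the number of observations.

```r
# Kish's effective sample size, as defined above
ess <- function(wts) {
  sum(wts)^2 / sum(wts^2)
}

# Equal weights: ESS equals the number of observations
ess(rep(1, 200))
#> [1] 200

# One unit carries three times the weight of the others:
# (1 + 1 + 1 + 3)^2 / (1 + 1 + 1 + 9) = 36 / 12 = 3
ess(c(1, 1, 1, 3))
#> [1] 3
```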

It would be great if tbl_svysummary() could have an option to include ESS. I think what I would really love is to be able to show the unweighted sample size, the sum of the weights, and the ESS all together, because each perspective is useful.
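A minimal sketch of the three numbers side by side for a survey design object, assuming only the survey package. `sample_sizes()` is a hypothetical helper (not part of gtsummary or survey), and the data frame is made up for illustration. It relies on `stats::weights()` returning the design weights, which is the same call used in the `cardx` sketch later in this thread.

```r
library(survey)

# Hypothetical helper (not part of gtsummary): three sample-size
# perspectives for a survey design object
sample_sizes <- function(design) {
  w <- stats::weights(design)
  c(
    unweighted = length(w),          # number of rows in the design
    weighted   = sum(w),             # sum of the weights
    ess        = sum(w)^2 / sum(w^2) # Kish effective sample size
  )
}

# Made-up data: four respondents, one with a larger weight
df <- data.frame(y = c(2, 4, 6, 8), wt = c(1, 1, 1, 3))
design <- survey::svydesign(ids = ~1, weights = ~wt, data = df)
sample_sizes(design)
```

Here the three perspectives give 4 rows, a weighted total of 6, and an ESS of 3.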

If you're interested, I'd be happy to make a PR if you could point me to where these calculations happen.


ddsjoberg · 2025-01-09T19:34:15Z

It seems reasonable to me! @larmarange any thoughts?

@malcolmbarrett would this calculation appear only in the header of tbl_svysummary()? Are these calculations needed anywhere else? Would these estimates be needed for every categorical tabulation and used as the denominator?

larmarange · 2025-01-10T07:12:32Z

Hi

They are also mentioned here: https://rdrr.io/github/mainwaringb/rakehelper/man/eff_n.html

This implementation is also adapted for survey objects.

These metrics should be used carefully, as they do not take stratification or clustering into account.
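A sketch of why this caveat matters, using the `api` example data that ships with the survey package. In this one-stage cluster sample every school has the same weight, so the weights-only formula simply returns the row count, while the design effect from `svymean(..., deff = TRUE)` reflects the clustering. Treating `n / deff` as a design-based effective sample size is a common convention brought in here for illustration only; it is not something proposed in this thread.

```r
library(survey)
data(api)  # example datasets shipped with the survey package, incl. apiclus1

# One-stage cluster sample of school districts (dnum): every school in a
# sampled district is included, so all schools share the same weight (pw)
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Weights-only (Kish) ESS: with equal weights it is just the row count
w <- weights(dclus1)
kish_ess <- sum(w)^2 / sum(w^2)
kish_ess

# The design effect for a mean does reflect the clustering; n / deff is
# one common design-based notion of effective sample size (an assumption
# for this illustration)
m <- svymean(~api00, dclus1, deff = TRUE)
design_ess <- nrow(apiclus1) / deff(m)
design_ess
```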

Also, I do not know whether they are valid when the weights represent the total population size (which can also happen), i.e., when the sum of the weights equals the total population of the survey area rather than the number of individuals surveyed.
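For the weights-only (Kish) formula, at least, the scale of the weights does not matter. Multiplying every weight by a constant c multiplies both the numerator, (c Σw)², and the denominator, Σ(cw)², by c², so weights rescaled to a population total give the same ESS as weights that sum to the number of respondents. This does not address the stratification and clustering caveat. A quick check with made-up weights:

```r
ess <- function(wts) sum(wts)^2 / sum(wts^2)

# Hypothetical weights that sum to the number of respondents (n = 4)
w_sample <- c(0.5, 0.8, 1.2, 1.5)

# The same weights rescaled so they sum to a population total of 250,000
w_pop <- w_sample * 250000 / sum(w_sample)

all.equal(ess(w_sample), ess(w_pop))
#> [1] TRUE
```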

We could consider making these indicators available in modify_header(), but we should be careful with the documentation. What do you think?

malcolmbarrett · 2025-01-10T18:06:14Z

> Would these estimates be needed for every categorical tabulation and used as the denominator?

You could definitely use them this way, but I think it's diminishing returns in terms of gaining insight into the analysis. As a first pass, I'd be happy with being able to do it for the overall sample and by the column groups.
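A base-R sketch of the "overall plus by column group" version, using hypothetical weights and a hypothetical `trt` column. One detail worth noting is that the group ESS values do not add up to the overall ESS, so each one has to be computed from its own weights rather than summed.

```r
ess <- function(wts) sum(wts)^2 / sum(wts^2)

# Hypothetical data: weights and a grouping column
df <- data.frame(
  trt = c("A", "A", "A", "B", "B", "B"),
  wt  = c(1, 1, 4, 2, 2, 2)
)

# Overall ESS: 12^2 / 30 = 4.8
overall <- ess(df$wt)

# ESS by column group: A = 6^2 / 18 = 2, B = 6^2 / 12 = 3
by_group <- tapply(df$wt, df$trt, ess)

overall
by_group
```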

As @larmarange noted, they are imperfect and don't represent the variance well in some cases. In inverse probability weighting (IPW), they are often a good diagnostic; I can't speak as much to surveys.
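To illustrate the IPW diagnostic, here is a sketch with hypothetical propensity scores for four treated units, using ATE weights of 1 / ps for the treated. A single unit with a very small propensity score gets a large weight and drags the ESS down sharply.

```r
ess <- function(wts) sum(wts)^2 / sum(wts^2)

# Hypothetical propensity scores for four treated units
ps_ok      <- c(0.5, 0.4, 0.6, 0.5)
ps_extreme <- c(0.5, 0.4, 0.6, 0.02)  # one unit very unlikely to be treated

# ATE weights for treated units are 1 / ps
ess(1 / ps_ok)       # close to the 4 units available
ess(1 / ps_extreme)  # the single weight of 50 dominates
```

With these values the ESS falls from roughly 3.9 to roughly 1.3, even though four units are present in both cases.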

ddsjoberg · 2025-01-14T16:05:34Z

Thank you both for your thoughts! If we do add it, the actual tabulation would go into the cardx package, and it would look something like this:

```r
library(magrittr)

# function for calculating effective sample size
# (this only returns n, not the denominator nor p)
ard_survey_ess <- function(data, by = NULL) {
  cardx:::set_cli_abort_call()

  # check inputs ---------------------------------------------------------------
  cardx:::check_not_missing(data)
  cardx:::check_class(data, "survey.design")

  # calculate ESS --------------------------------------------------------------
  cards::ard_continuous(
    data =
      data$variables |>
      dplyr::mutate(...cards_survey_design_weights_column... = stats::weights(data)),
    variables = "...cards_survey_design_weights_column...",
    by = {{ by }},
    statistic = everything() ~ list(ess = \(x) sum(x) ^ 2 / sum(x ^ 2))
  ) |>
    dplyr::mutate(
      variable = "..ess..",
      context = "survey_ess",
      stat_label = "Effective Sample Size"
    )
}

svy <- survey::svydesign(~1, data = gtsummary::trial, weights = ~1)
ard_survey_ess(svy)
#> {cards} data frame: 1 x 8
#>   variable   context stat_name stat_label stat fmt_fn
#> 1  ..ess.. survey_e…       ess  Effectiv…  200      1
#> ℹ 2 more variables: warning, error
ard_ess <- ard_survey_ess(svy, by = trt)
ard_ess
#> {cards} data frame: 2 x 10
#>   group1 group1_level variable stat_name stat_label stat
#> 1    trt       Drug A  ..ess..       ess  Effectiv…   98
#> 2    trt       Drug B  ..ess..       ess  Effectiv…  102
#> ℹ 4 more variables: context, fmt_fn, warning, error
```

To replace the current header of the table, you can use code like the following.

```r
# prep the results to add to header
lst_header <-
  ard_ess |>
  cards::apply_fmt_fn() |>
  dplyr::mutate(
    column = paste0("stat_", dplyr::row_number()),
    header = purrr::map2(.data$group1_level, stat_fmt, ~glue::glue("**{.x}**  \nN = {.y}"))
  ) |>
  dplyr::select(column, header) %>%
  {stats::setNames(.[["header"]], .[["column"]])}
lst_header
#> $stat_1
#> **Drug A**  
#> N = 98.0
#> 
#> $stat_2
#> **Drug B**  
#> N = 102.0

# add the header to a gtsummary table
gtsummary::tbl_svysummary(
  data = svy,
  by = trt,
  include = c(age, grade)
) |>
  gtsummary::bold_labels() |>
  gtsummary::modify_header(!!!lst_header) |>
  gtsummary::as_kable()
```

| Characteristic | Drug A N = 98.0 | Drug B N = 102.0 |
|:---|:---:|:---:|
| Age | 46 (37, 60) | 48 (39, 56) |
| Unknown | 7 | 4 |
| Grade | | |
| I | 35 (36%) | 33 (32%) |
| II | 32 (33%) | 36 (35%) |
| III | 31 (32%) | 33 (32%) |

Created on 2025-01-14 with reprex v2.1.1

I still need to think more about whether and how this is something we should support natively. My concern is primarily about maintaining the code into the future, and about further requests to integrate ESS Ns more deeply into the package (and then maintaining that code 😆).

If we do add it, I'm thinking it would be an add-on function, like add_ess_header(), that replaces the default header with the ESS numbers.
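A minimal sketch of the header-building step such an add-on might perform, written in base R so it does not depend on cards internals. `ess_header_list()` is a hypothetical name, not an existing gtsummary function. It assumes, as the header code earlier in this thread does, that the `stat_1`, `stat_2`, ... columns follow the sorted order of the `by` levels. It also shows all three perspectives (N, sum of weights, ESS) in one header, per the original request.

```r
# Hypothetical helper, NOT an existing gtsummary function: builds a named
# list (stat_1, stat_2, ...) of markdown headers, one per by-level
ess_header_list <- function(weights, by) {
  ess <- function(wts) sum(wts)^2 / sum(wts^2)
  groups <- split(weights, by)  # groups come back in sorted level order
  headers <- vapply(
    names(groups),
    function(lvl) {
      w <- groups[[lvl]]
      sprintf("**%s**  \nN = %d; Sum of weights = %.1f; ESS = %.1f",
              lvl, length(w), sum(w), ess(w))
    },
    character(1)
  )
  stats::setNames(as.list(unname(headers)), paste0("stat_", seq_along(headers)))
}

hdr <- ess_header_list(
  weights = c(1, 1, 4, 2, 2, 2),
  by      = c("A", "A", "A", "B", "B", "B")
)
cat(hdr$stat_1)
```

The resulting list could then be spliced into `gtsummary::modify_header(!!!hdr)` in the same way as `lst_header` earlier in the thread.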
