Add support for `vcov = "satterthwaite" | "kenward-roger"` w/ aggragation #1302

mattansb · 2024-12-12T22:50:13Z

Currently, when aggregation is done, an error is thrown regarding the Satterthwaite / Kenward-Roger degrees of freedom.

library(lmerTest)
library(marginaleffects)

data("sleepstudy", package="lme4")

m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

slopes(m, vcov = "satterthwaite")
#> 
#>  Estimate Std. Error    t Pr(>|t|)    S 2.5 % 97.5 % Df
#>      19.7       1.55 12.7   <0.001 31.2 16.40   22.9 17
#>      19.7       1.55 12.7   <0.001 31.2 16.40   22.9 17
#>      19.7       1.55 12.7   <0.001 31.2 16.41   22.9 17
#>      19.7       1.55 12.7   <0.001 31.2 16.40   22.9 17
#>      19.7       1.55 12.7   <0.001 31.2 16.40   22.9 17
#> --- 170 rows omitted. See ?avg_slopes and ?print.marginaleffects --- 
#>      11.8       1.55  7.6   <0.001 20.4  8.49   15.0 17
#>      11.8       1.55  7.6   <0.001 20.4  8.49   15.0 17
#>      11.8       1.55  7.6   <0.001 20.4  8.49   15.0 17
#>      11.8       1.55  7.6   <0.001 20.4  8.49   15.0 17
#>      11.8       1.55  7.6   <0.001 20.4  8.49   15.0 17
#> Term: Days
#> Type:  response 
#> Comparison: dY/dX
#> Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted, Reaction, Days, Subject, df 
#> 


avg_slopes(m, vcov = "satterthwaite")
#> Error: Satterthwaite and Kenward-Roger corrections are not supported in this command.

It is my understanding that this is because these d.f. are obtained via insight::get_df(model, data = newdata, type = vcov, df_per_observation = TRUE), returning one d.f. per-row of newdata.

The way {emmeans} deals with cases were the d.f.s aren't directly estimatable via lmerTest::calcSatterth() or pbkrtest::Lb_ddf() (such as when non-linear transformations are applied) is to take the smallest d.f. for the aggregated group:

https://github.com/rvlenth/emmeans/blob/495ed3ac04bb494e41dba053dbdf299e8e093605/R/emmGrid-methods.R#L971-L972

Would this possibly be applicable here as well?

The text was updated successfully, but these errors were encountered:

vincentarelbundock · 2024-12-12T23:05:14Z

That's interesting.

To be honest, I know none of the relevant theory, here. I'm definitely open to doing it that way, but before implementing I'd ideally like to see a reference to an academic text to justify this. Does such a thing exist?

mattansb · 2024-12-15T08:44:44Z

I'm not familiar with such a reference. @rvlenth, can you point us in the right direction?

rvlenth · 2024-12-15T19:54:54Z

Maybe I'm missing something, but isn't your aggregation a linear process? If so, the right thing to do is aggregate the linear functions (of the fixed-effect coefficients) involved, thus obtaining one linear function for each aggregated estimate; then apply pbkrtest::Lb_ddf() or whatever to those.

The only place (at least that I can remember) where emmeans uses the minimum d.f. is in emmeans::joint_tests() where we are doing a joint test of several linear functions, as opposed to a simple average of those functions.

mattansb · 2024-12-15T21:16:25Z

isn't your aggregation a linear process? [...] thus obtaining one linear function for each aggregated estimate; then apply pbkrtest::Lb_ddf() or whatever to those.

As far as I understand, the various df-producing weights require a vector/matrix of weights in terms of the model's fixed effects. But there are never produced by marginaleffects, and so even if the aggregation is linear, this cannot be used to produce dfs.

This also seems to be the case in emmeans - once the emmGrid is no longer defined in terms of linear transformations of the fixed effects, the "min-df" method is used.

library(lmerTest)
library(emmeans)

sleepstudy$Reaction[c(1:3, 11:14)] <- NA

fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

em <- emmeans(fm1, ~ Days, cov.reduce = FALSE)
em@dffun
#> function (k, dfargs) 
#> pbkrtest::Lb_ddf(k, dfargs$unadjV, dfargs$adjV)
#> <environment: base>
#> attr(,"mesg")
#> [1] "kenward-roger"

em2 <- regrid(em)
em2@dffun
#> function (k, dfargs) 
#> {
#>     idx = which(zapsmall(k) != 0)
#>     ifelse(length(idx) == 0, NA, min(dfargs$df[idx], na.rm = TRUE))
#> }
#> <bytecode: 0x000001ec6fb8a278>
#> <environment: 0x000001ec6fb8e278>
#> attr(,"mesg")
#> [1] "inherited from kenward-roger when re-gridding"

^{Created on 2024-12-15 with reprex v2.1.1}

rvlenth · 2024-12-16T21:36:05Z

But I don't see why maerginaleffects needs to re-grid anything. We're just estimating slopes, then averaging them. If you can get the Satterthwaite or K-R d.f. for each slope, then you can get it for the average.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for `vcov = "satterthwaite" | "kenward-roger"` w/ aggragation #1302

Add support for `vcov = "satterthwaite" | "kenward-roger"` w/ aggragation #1302

mattansb commented Dec 12, 2024

vincentarelbundock commented Dec 12, 2024

mattansb commented Dec 15, 2024

rvlenth commented Dec 15, 2024

mattansb commented Dec 15, 2024

rvlenth commented Dec 16, 2024

Add support for vcov = "satterthwaite" | "kenward-roger" w/ aggragation #1302

Add support for vcov = "satterthwaite" | "kenward-roger" w/ aggragation #1302

Comments

mattansb commented Dec 12, 2024

vincentarelbundock commented Dec 12, 2024

mattansb commented Dec 15, 2024

rvlenth commented Dec 15, 2024

mattansb commented Dec 15, 2024

rvlenth commented Dec 16, 2024

Add support for `vcov = "satterthwaite" | "kenward-roger"` w/ aggragation #1302

Add support for `vcov = "satterthwaite" | "kenward-roger"` w/ aggragation #1302