The goal of partR2 is to partition the variance explained in generalized linear mixed models (GLMMs) into variation unique to and shared among predictors. This can be done using semi-partial (or ‘part’) R2 and inclusive R2. The package also calculates several related quantities: marginal and conditional R2 for the full model, part (semi-partial) R2 for predictors and their combinations, inclusive R2, structure coefficients and beta weights (standardised model estimates), all with parametric bootstrap confidence intervals.
To install the development version of partR2:
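A minimal sketch, assuming the development version is hosted on GitHub (the repository path below is an assumption; adjust it to the actual location of the package sources):

# install.packages("remotes")  # if not yet installed
remotes::install_github("mastoffel/partR2")  # repository path assumed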
In order to use the package appropriately, it is important to know about the key quantities and how they are calculated. Here are some brief clarifications: part (semi-partial) R2 is the variance uniquely explained by a predictor (or a set of predictors); inclusive R2 is the variance explained by a predictor irrespective of other predictors, estimated as SC^2 * R2; structure coefficients (SC) are the correlations between a predictor and the model-predicted response; and beta weights are standardised model estimates. Dedicated packages such as effectsize (Makowski & Lüdecke, 2019) offer more flexible ways to estimate beta weights.

rptR and partR2: ratios of variance components

The rptR package (Stoffel et al., 2017) and the partR2 package go hand in hand. The intra-class correlation coefficient or repeatability calculated in rptR is the proportion of variance explained by random effects, while the partR2 package estimates the proportion of variance explained by fixed effects (or by fixed plus random effects when R2_type = "conditional"). Both the repeatability R and the coefficient of determination R2 are therefore ratios of variance components. However, we have changed the design of the partR2 package compared to rptR.
The partR2
user now fits the model first in
lme4
so that any modeling problem can be recognized
straight away. We think this is generally preferable as it separates the
modeling part from the rest of the calculations and allows the user to
focus on specifying an appropriate model first.
The workhorse of the package is a single function,
partR2()
. It takes a fitted model from lme4
(Bates et al. 2015), with either a Gaussian, Poisson or binomial family.
The function returns a partR2
object. The package includes
print()
, summary()
and
forestplot()
functions to display the results, and a helper
function mergeR2
that is useful for models with
interactions (see below).
Before we go through some examples, we load the biomass
dataset. This is a simulated dataset that aims to mimic a study on
biomass production in grasslands. In a nutshell, virtual invertebrates
were sampled once every year over 10 consecutive years (Year)
from 20 different virtual populations (Population).
Temperature and Precipitation were measured and
overall species diversity (SpeciesDiversity) and
Biomass were recorded for each Population in each
Year.
library(partR2)
library(lme4)
data("biomass")
head(biomass)
#> Temperature Precipitation Population SpeciesDiversity Year Biomass
#> 1 24.62971 399.7463 1 24 1991 41.61498
#> 2 21.61168 371.0526 1 24 1992 37.78420
#> 3 23.28266 414.3659 1 24 1993 40.21902
#> 4 27.53635 416.6633 1 24 1994 44.45499
#> 5 26.67594 403.2696 1 24 1995 43.63841
#> 6 29.43292 410.0026 1 24 1996 44.34646
#> ResilienceLat Recovered NotRecovered ExtinctionLat Extinction
#> 1 0.8095387 38 12 5.274812 4
#> 2 0.6994134 40 10 3.068951 5
#> 3 0.9060063 47 3 5.437968 7
#> 4 0.9915898 50 0 5.460337 4
#> 5 0.9178274 46 4 6.174574 5
#> 6 0.9025914 45 5 5.076269 4
First we fit a linear mixed model in lme4. We assume that Biomass depends on effects of Year, Temperature, Precipitation and SpeciesDiversity (fitted as fixed effects) and Population (fitted as a random effect).
modBM <- lmer(Biomass ~ Year + Temperature + Precipitation +
SpeciesDiversity + (1|Population), data = biomass)
Now we would usually do the standard model checks and evaluations to
ensure that the model works well. For the sake of simplicity (and
because we simulated the data from Gaussian distributions), we skip this
step here and go straight into the analysis with partR2
. We
first calculate the overall marginal R2 and use parametric
bootstrapping to estimate confidence intervals. Marginal R2 refers to the variance
explained by fixed effect predictors relative to the total variance in
the response. Alternatively, we can estimate conditional R2 (by setting
R2_type ="conditional"
), which is the variance explained by
fixed effects and random effects together relative to the total variance
in the response.
Note that we supply the fitted merMod object (the lme4 output) and the original dataset used to fit the model. If no data is provided, partR2 will try to fetch it, so supplying it is usually not strictly necessary. There is one important thing to pay attention to: if there are missing observations for some of the predictors or the response, lmer and glmer will remove all rows containing NA, which results in a mismatch between the data in the data object and the data used to fit the model. To avoid complications, it is advisable to remove rows with missing data prior to fitting the model.
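For instance, incomplete rows can be removed with base R before fitting (a minimal sketch using the variables of this model):

# keep only rows that are complete for the response and all predictors
model_vars <- c("Biomass", "Year", "Temperature", "Precipitation",
                "SpeciesDiversity", "Population")
biomass_complete <- biomass[complete.cases(biomass[, model_vars]), ]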
R2_BM <- partR2(modBM, data = biomass, R2_type = "marginal", nboot = 10)
R2_BM
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.6006 0.529 0.6641 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> [1] "No partitions selected."
Marginal R2 is around 60% and the confidence interval is fairly narrow. Temperature and Precipitation are highly correlated in the dataset (as they often are in real-life situations) and we want to know how much each of them uniquely explains and what they explain together.
R2_BMa <- partR2(modBM, partvars = c("Temperature", "Precipitation"),
R2_type = "marginal", nboot = 10)
R2_BMa
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.6006 0.571 0.6807 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.6006 0.5710 0.6807 10 5
#> Temperature 0.0427 0.0000 0.1616 10 4
#> Precipitation 0.0830 0.0042 0.1986 10 4
#> Temperature+Precipitation 0.3913 0.3388 0.4857 10 3
So Temperature and Precipitation each uniquely explain only a small proportion of the variation in Biomass (around 4% and 9%, respectively). Together, however, they explain around 39% of the variation. This situation is typical for correlated predictors, since part R2 is the variance uniquely explained by each predictor, while here a large part of the variance is explained equally by both predictors. If Temperature is removed from the model, the variance explained is only marginally reduced, because Precipitation still explains a large part of the variance in Biomass (and vice versa).
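The collinearity itself can be checked directly; a quick base-R check (not part of the partR2 workflow):

# Pearson correlation between the two climatic predictors
cor(biomass$Temperature, biomass$Precipitation)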
Besides part R2,
partR2
also estimates inclusive R2, standardized model
estimates (beta weights) and structure coefficients. These are shown
when calling the summary()
function.
summary(R2_BMa)
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper ndf
#> 0.6006 0.571 0.6807 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper ndf
#> Model 0.6006 0.5710 0.6807 5
#> Temperature 0.0427 0.0000 0.1616 4
#> Precipitation 0.0830 0.0042 0.1986 4
#> Temperature+Precipitation 0.3913 0.3388 0.4857 3
#>
#> ----------
#>
#> Inclusive R2 (SC^2 * R2):
#> Predictor IR2 CI_lower CI_upper
#> Year 0.0219 0.0102 0.0521
#> Temperature 0.3530 0.2611 0.4026
#> Precipitation 0.3749 0.2766 0.4295
#> SpeciesDiversity 0.1873 0.1254 0.2932
#>
#> ----------
#>
#> Structure coefficients r(Yhat,x):
#> Predictor SC CI_lower CI_upper
#> Year 0.1912 0.1318 0.2887
#> Temperature 0.7667 0.6623 0.8107
#> Precipitation 0.7901 0.6796 0.8473
#> SpeciesDiversity 0.5585 0.4601 0.7012
#>
#> ----------
#>
#> Beta weights (standardised estimates)
#> Predictor BW CI_lower CI_upper
#> Year 0.1140 0.0647 0.1927
#> Temperature 0.2855 0.2079 0.3593
#> Precipitation 0.4004 0.2882 0.4631
#> SpeciesDiversity 0.3986 0.3182 0.5127
#>
#> ----------
#>
#> Parametric bootstrapping resulted in warnings or messages:
#> Check r2obj$boot_warnings and r2obj$boot_messages.
Beta weights (BW) show that all four predictors seem to have an effect on Biomass, because none of the confidence intervals overlaps zero. The effect of Precipitation appears largest. Structure coefficients (SC) tell us that both Temperature and Precipitation are quite strongly correlated with the overall model prediction. Structure coefficients are correlations; squaring them and multiplying by the marginal R2 of the full model gives the inclusive R2 (IR2 = SC^2 * R2). SC and IR2 are large for Temperature and Precipitation, as both are highly correlated with the predicted response, but their part R2 (the variance that they uniquely explain) is small due to their collinearity.
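The relationship can be verified from the printed values, for example for Temperature:

# IR2 = SC^2 * R2, using the reported values for Temperature
0.7667^2 * 0.6006  # approx. 0.353, matching the inclusive R2 above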
By default, partR2 calculates part R2 for all predictors individually and in all possible combinations. The number of combinations increases rapidly, with 2^n − 1 combinations for n predictors, so computation time increases exponentially with the number of predictors. There are two options to control the number of R2 values to be calculated. Sometimes we want only the part R2 for each predictor, but not for all their combinations. We can specify this with the argument max_level = 1.
R2_BMb <- partR2(modBM, partvars = c("Temperature", "Precipitation", "Year",
"SpeciesDiversity"),
R2_type = "marginal", max_level = 1, nboot = 10)
R2_BMb
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.6006 0.543 0.6389 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.6006 0.5430 0.6389 10 5
#> Temperature 0.0427 0.0000 0.1250 10 4
#> Precipitation 0.0830 0.0052 0.1611 10 4
#> Year 0.0125 0.0000 0.0979 10 4
#> SpeciesDiversity 0.1806 0.1074 0.2487 10 4
Another option is offered by the partbatch
argument.
partbatch
takes a list of character vectors that contain
predictor names. The terms in each character vector are then always
fitted or removed together. This is most useful for models with many
variables, when using dummy coding or when predictors are otherwise
linked in any meaningful way. We illustrate the use of
partbatch
here by requesting part R2 for the two climatic
variables (Temperature and Precipitation) in
combination.
R2_BMc <- partR2(modBM, partbatch = list(c("Temperature", "Precipitation")),
R2_type = "marginal", nboot = 10)
R2_BMc
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.6006 0.5693 0.6585 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.6006 0.5693 0.6585 10 5
#> Temperature+Precipitation 0.3913 0.3525 0.4580 10 3
partbatch
can be used instead of or in combination with
partvars
. For convenience it is also possible to name the
elements in the list given to partbatch
. The output will
then show the name of the batch instead of all list elements. We now
request part R2 for
the two climatic variables (now named ClimateVars) along with
SpeciesDiversity.
R2_BMd <- partR2(modBM, partvars = c("SpeciesDiversity"),
partbatch = list(ClimateVars = c("Temperature", "Precipitation")),
R2_type = "marginal", nboot = 10)
R2_BMd
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.6006 0.5349 0.6439 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.6006 0.5349 0.6439 10 5
#> ClimateVars 0.3913 0.3302 0.4561 10 3
#> SpeciesDiversity 0.1806 0.1068 0.2715 10 4
#> SpeciesDiversity+ClimateVars 0.5786 0.5134 0.6235 10 2
We can plot the results as forestplots, powered by
ggplot2
(Wickham, 2016). The output of
forestplot
is a ggplot
object. We can
customize the object using ggplot
syntax or assemble
multiple plots with packages like patchwork
(Pedersen,
2019). In the example here, we added a few arguments for a nicer
appearance (see ?forestplot
for more information). What is
plotted is controlled by the type
argument, which can be
type = "R2"
(the default) for part R2,
type = "IR2"
for inclusive R2,
type = "SC"
for structure coefficients,
type = "BW"
for beta weights and type = "Ests"
for raw model estimates.
library(patchwork)
p1 <- forestplot(R2_BMb, type = "R2", text_size = 10)
p2 <- forestplot(R2_BMb, type = "IR2", text_size = 10)
p3 <- forestplot(R2_BMb, type = "SC", text_size = 10)
p4 <- forestplot(R2_BMb, type = "BW", text_size = 10)
(p1 + p2) / (p3 + p4) + plot_annotation(tag_levels = "A")
However, for a publication we might want to extract the results, either for more customized plotting or to present them in a table. Here is how the results can be retrieved from a partR2 object:
# An overview
str(R2_BMb, max.level = 1)
#> List of 17
#> $ call : language lmer(formula = Biomass ~ Year + Temperature + Precipitation + SpeciesDiversity + (1 | Population), data = biomass)
#> $ R2_type : chr "marginal"
#> $ R2 : tibble [5 × 5] (S3: tbl_df/tbl/data.frame)
#> $ SC : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> $ IR2 : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> $ BW : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> $ Ests : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#> $ R2_boot : tibble [5 × 2] (S3: tbl_df/tbl/data.frame)
#> $ SC_boot : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> $ IR2_boot : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> $ BW_boot : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> $ Ests_boot : tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#> $ partvars : chr [1:4] "Temperature" "Precipitation" "Year" "SpeciesDiversity"
#> $ partbatch : logi NA
#> $ CI : num 0.95
#> $ boot_warnings:List of 10
#> $ boot_messages:List of 10
#> - attr(*, "class")= chr "partR2"
The list shows the key elements of partR2
that contain
(a) the point estimates and confidence intervals and (b) bootstrap
replicates. Note that each bootstrap replicate refers to a fit to a
simulated dataset (data simulated based on model estimates). Notably,
bootstrap estimates are stored in list columns. Every element in a list
column contains a vector with all bootstrap estimates for a given
term.
# (a) point estimates and confidence intervals
R2_BMb$R2 # R2s
R2_BMb$SC # Structure coefficients
R2_BMb$IR2 # inclusive R2s
R2_BMb$BW # Standardised model estimates
R2_BMb$Ests # Model estimates
# (b) bootstrap replicates
R2_BMb$R2_boot
R2_BMb$SC_boot
R2_BMb$IR2_boot
R2_BMb$BW_boot
R2_BMb$Ests_boot
Parametric bootstrapping works through simulation of new response variables based on model estimates, followed by refitting the model to these simulated responses. This regularly causes warnings, usually about convergence problems or singular fits. Such warnings often occur when one of the variance components is close to zero (often one of the random effects), and removing those components often prevents them. However, if the cause of the warnings is that some component is estimated at the boundary, this does not constitute a major problem. It is more important to scrutinize the original model fit for validity than every single bootstrap iteration. Nevertheless, partR2 captures warnings (and messages) and saves them in the return object. Let's have a look at potential warnings and messages for the first two simulations, though it turns out there are none in this analysis.
R2_BMb$boot_warnings[1:2]
#> $sim_1
#> character(0)
#>
#> $sim_2
#> character(0)
R2_BMb$boot_messages[1:2]
#> $sim_1
#> character(0)
#>
#> $sim_2
#> character(0)
We generally advise doing all variable transformations before fitting the model. Otherwise, it is important to specify the variable in the partvars (or partbatch) argument exactly (!) as it was specified in the original model fit. Here is an example where we fit an additional term for Precipitation^2. A model with Precipitation and Precipitation^2 produces warning messages because of the strong correlation between the linear and the squared term. This can be solved by centering before squaring (see Schielzeth 2010).
biomass$PrecipitationC <- biomass$Precipitation - mean(biomass$Precipitation)
modBMC <- lmer(Biomass ~ Temperature + PrecipitationC +
I(PrecipitationC^2) + (1|Population),
data = biomass)
R2_BMe <- partR2(modBMC, partvars = c("Temperature"), partbatch=list(c("PrecipitationC", "I(PrecipitationC^2)")),
nboot = 10)
R2_BMe
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.4365 0.37 0.5155 10 4
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot
#> Model 0.4365 0.3700 0.5155 10
#> PrecipitationC+I(PrecipitationC^2) 0.0997 0.0224 0.1741 10
#> Temperature 0.0497 0.0024 0.1257 10
#> Temperature+PrecipitationC+I(PrecipitationC^2) 0.4365 0.3700 0.5155 10
#> ndf
#> 4
#> 2
#> 3
#> 1
More generally, we have previously had problems with partial matching (e.g. the terms female and male in the same model). Although this problem should now be fixed, it is worth being aware that unusual names may cause complications, and renaming usually offers an easy solution.
partR2 in parallel

Parametric bootstrapping is an inherently slow process, because it involves repeated fitting of mixed effects models. Furthermore, computation time increases exponentially with the number of terms requested, all being tested in isolation and in combination. It is therefore advisable to run preliminary analyses first with low numbers of bootstraps, just to see that things work and make sense in principle. The final analysis should be done with a large number of bootstraps (at least 100, better 1000). This can take time.
A simple way to save runtime is to distribute the bootstrap iterations across multiple cores. partR2 parallelises with the future (Bengtsson, 2019) and furrr (Vaughan & Dancho, 2018) packages. This gives the user flexibility in how exactly to parallelise and should work on whatever OS you are running your analyses on, be it Mac, Windows or Linux.
We illustrate the parallel option of partR2 with the modBM model fitted above. First we specify how we want to parallelise using future's plan() function (check out ?future::plan for more information). Generally, if the analysis is run in RStudio, it is recommended to use plan(multisession). After specifying the plan, you need to set the parallel = TRUE argument when calling partR2() and bootstrapping will run in parallel. If no plan is specified, partR2 will still run sequentially.
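A minimal sketch (the number of workers and the number of bootstraps are arbitrary choices for illustration):

library(future)
plan(multisession, workers = 2)  # set up two background R sessions
R2_BMpar <- partR2(modBM, partvars = c("Temperature", "Precipitation"),
                   parallel = TRUE, nboot = 100)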
Partitioning R2
in Poisson and binomial models works with the same partR2
function, in the same way as for Gaussian data. Here, we simulated a
Poisson distributed response variable Extinction for the
biomass dataset.
First we fit a Poisson model using lme4::glmer
with
Year, Temperature, Precipitation and
SpeciesDiversity as fixed effect predictors and
Population as a random effect. Then we request part R2 for
Temperature and Precipitation.
Since a model with raw predictors produces warning messages (including a recommendation to rescale variables), we scale Temperature and Precipitation and center Year and SpeciesDiversity.
biomass$YearC <- biomass$Year - mean(biomass$Year)
biomass$TemperatureS <- scale(biomass$Temperature)
biomass$PrecipitationS <- scale(biomass$Precipitation)
biomass$SpeciesDiversityC <- biomass$SpeciesDiversity - mean(biomass$SpeciesDiversity)
modExt <- glmer(Extinction ~ YearC + TemperatureS + PrecipitationS +
SpeciesDiversityC + (1|Population),
data=biomass, family="poisson")
R2_Ext <- partR2(modExt, partvars = c("TemperatureS", "PrecipitationS"),
R2_type = "marginal", nboot=10)
#> An observational level random-effect has been fitted
#> to account for overdispersion.
print(R2_Ext, round_to = 3) # rounding decimals in print and summary
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.077 0.023 0.125 10 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.077 0.023 0.125 10 5
#> TemperatureS 0.002 0.000 0.045 10 4
#> PrecipitationS 0.043 0.004 0.089 10 4
#> TemperatureS+PrecipitationS 0.058 0.007 0.105 10 3
We also simulated proportion-scale response variables Recovered and NotRecovered for the biomass dataset. Again, we first fit a binomial model using lme4::glmer with Year, Temperature, Precipitation and SpeciesDiversity as fixed effect predictors and Population as a random effect. Then we calculate part R2 for Temperature and Precipitation. Again, we use centered/scaled predictors to facilitate model convergence.
modRecov <- glmer(cbind(Recovered, NotRecovered) ~ YearC +
TemperatureS + PrecipitationS + SpeciesDiversityC + (1|Population),
data=biomass, family="binomial")
R2_Recov <- partR2(modRecov, partvars = c("TemperatureS", "PrecipitationS"),
R2_type = "marginal", nboot=10)
summary(R2_Recov, round_to=3)
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper ndf
#> 0.104 0.06 0.172 5
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper ndf
#> Model 0.104 0.060 0.172 5
#> TemperatureS 0.008 0.000 0.084 4
#> PrecipitationS 0.006 0.000 0.081 4
#> TemperatureS+PrecipitationS 0.044 0.002 0.116 3
#>
#> ----------
#>
#> Inclusive R2 (SC^2 * R2):
#> Predictor IR2 CI_lower CI_upper
#> YearC 0.015 0.006 0.026
#> TemperatureS 0.048 0.033 0.056
#> PrecipitationS 0.041 0.022 0.046
#> SpeciesDiversityC 0.045 0.005 0.111
#>
#> ----------
#>
#> Structure coefficients r(Yhat,x):
#> Predictor SC CI_lower CI_upper
#> YearC 0.376 0.211 0.613
#> TemperatureS 0.680 0.542 0.757
#> PrecipitationS 0.629 0.455 0.691
#> SpeciesDiversityC 0.656 0.140 0.830
#>
#> ----------
#>
#> Beta weights (standardised estimates)
#> Predictor BW CI_lower CI_upper
#> YearC 0.264 0.159 0.354
#> TemperatureS 0.307 0.238 0.352
#> PrecipitationS 0.247 0.108 0.292
#> SpeciesDiversityC 0.477 0.061 0.804
#>
#> ----------
#>
#> Parametric bootstrapping resulted in warnings or messages:
#> Check r2obj$boot_warnings and r2obj$boot_messages.
Both Poisson and binomial models give low R2 values, even though they have been generated with similar parameter settings as the Gaussian data. This situation is common and originates from the inherent rounding when traits are expressed as discrete or binary numbers. Fixed and random effects explain variation only at the latent scale, while the inherent distribution-specific variance contributes to the residual variance. This leads to a large residual variance and relatively lower explained variance components.
One thing to keep in mind when using GLMMs is observation-level random effects (OLRE, see Harrison 2014). partR2 fits an OLRE internally to quantify overdispersion, but the function will recognise an OLRE if it is already fitted in the model. Fitting of the OLRE can be suppressed with olre = FALSE in case this is required. (Note that we do not fit an OLRE for binomial models with binary responses, because in this case there is no overdispersion.)
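For example, the observation-level random effect could be switched off for the Poisson model fitted above (a sketch; whether this is sensible depends on the degree of overdispersion in the data):

R2_Ext_noOLRE <- partR2(modExt, partvars = c("TemperatureS", "PrecipitationS"),
                        olre = FALSE, nboot = 10)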
It is possible to estimate part R2 for models with interaction terms, but this requires some thought. Recall that the part R2 of a predictor is calculated as the reduction in the variance explained after the predictor is removed from the model. Interactions are the product of two predictors. When raw covariates are multiplied with each other, the product (= the interaction) is usually highly correlated with each of the main effects. Hence, interactions and main effects tend to have a large overlap in the variance that they explain in the response, which can lead to erroneous conclusions. A generally useful strategy when dealing with models with interactions is to center all covariates, because this usually mitigates the correlations and makes main effects independent of interactions (Schielzeth 2010). However, even with centered covariates there are different options for how to partition the explained variance.
We will introduce these options with the guinea pig dataset (Mutwill et al. in prep.). The dataset contains testosterone measurements of 31 male guinea pigs, each measured at 5 time points. We will analyze log-transformed testosterone titers (Testo) and fit male identity (MaleID) as a random effect. As covariates the dataset contains Time (time point of measurement) and Rank (a rank index quantified from behavioral observations).
data(GuineaPigs)
head(GuineaPigs)
#> MaleID Time Rank Testo
#> 1 M64B 1 0.10 4.52
#> 2 M44D 1 NA 12.60
#> 3 M42C 1 0.07 4.88
#> 4 M65B 1 0.03 7.92
#> 5 M74A 1 0.10 7.83
#> 6 M45D 1 0.27 7.00
GuineaPigs <- subset(GuineaPigs, !is.na(Testo) & !is.na(Rank))
GuineaPigs$TestoTrans <- log(GuineaPigs$Testo)
One option is to estimate the variance explained by main effects
irrespective of whether there are interactions. We can do this, as
usual, by specifying main effects (and interactions) in the
partvars
argument. partR2
will then
iteratively remove each of those predictors and will treat interactions
and main effects equally.
modGP <- lmer(TestoTrans ~ Rank * Time + (1|MaleID), data=GuineaPigs)
R2_modGPa <- partR2(modGP, partvars = c("Rank", "Time", "Rank:Time"), nboot=10)
R2_modGPa
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.1634 0.1213 0.239 10 4
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.1634 0.1213 0.2390 10 4
#> Rank 0.0900 0.0417 0.1613 10 3
#> Time 0.0051 0.0000 0.0738 10 3
#> Rank:Time 0.0297 0.0000 0.0991 10 3
#> Rank+Time 0.1241 0.0787 0.1963 10 2
#> Rank+Rank:Time 0.1623 0.1201 0.2378 10 2
#> Time+Rank:Time 0.0496 0.0025 0.1196 10 2
#> Rank+Time+Rank:Time 0.1634 0.1213 0.2390 10 1
A conceptual Venn diagram shows what has been estimated. Y illustrates the total variance of the response (here TestoTrans), YRE the variance explained by random effects (here MaleID) and YR the residual variance. The other fields are the variance explained by the main effects (e.g. X1 = Rank and X2 = Time) and their interaction (Xint). Intersections show parts of the phenotypic variance explained by multiple components. What we have estimated is the variance uniquely explained by the main effects and uniquely explained by the interaction. The variance explained that is shared between main effects and the interaction is attributed neither to the main effects nor to the interaction. Instead, we have estimated the variance Yx1 (horizontal stripes, Rank) and Yx2 (vertical stripes, Time). Finally, Yint (dotted) is attributed to the interaction (Rank:Time).
One way to think about variance explained by main effects and their
interactions is to pool the variance explained by a main effect with the
variance explained by interactions that the term is involved in.
Rank might be considered important either as a main effect or
in interaction with Time and we might want to estimate the
total effect of Rank. The same might be true for Time.
This can be done using the partbatch
argument.
R2_modGPb <- partR2(modGP, partbatch = list(c("Rank", "Rank:Time"),
c("Time", "Rank:Time")),
data = GuineaPigs, nboot = 10)
R2_modGPb
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.1634 0.1215 0.2476 10 4
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.1634 0.1215 0.2476 10 4
#> Rank+Rank:Time 0.1623 0.1204 0.2465 10 2
#> Time+Rank:Time 0.0496 0.0051 0.1404 10 2
#> Rank+Rank:Time+Time 0.1634 0.1215 0.2476 10 1
A conceptual Venn diagram shows what has been estimated. Here, YX1 is the (combined) part R2 for Rank and Rank:Time (horizontal stripes), and YX2 is the (combined) part R2 for Time and Rank:Time (vertical stripes). Note that the interaction variance is attributed to both components (area with horizontal and vertical stripes).
The last option is the one that we think is usually preferable. It involves prioritising main effects by assigning to the main effect the proportion of variance that is explained by the main effect together with the variance jointly explained with its interaction. This means that, first, only the part R2 for the interaction is calculated from a model fit (modGP1 and R2_GPc_part1 below). In a second model fit, the variance explained by the main effects is estimated with their interaction excluded from the model (modGP2 and R2_GPc_part2 below). We have implemented a helper function mergeR2 that allows the two partR2 runs to be merged.
modGP1 <- lmer(TestoTrans ~ Rank * Time + (1|MaleID), data = GuineaPigs)
R2_GPc_part1 <- partR2(modGP1, partvars = c("Rank:Time"), data = GuineaPigs,
nboot = 10)
modGP2 <- lmer(TestoTrans ~ Rank + Time + (1|MaleID), data = GuineaPigs)
R2_GPc_part2 <- partR2(modGP2, partvars = c("Rank", "Time"), data = GuineaPigs,
max_level = 1, nboot = 10)
# the first partR2 object is based on the full model
R2_GPc <- mergeR2(R2_GPc_part1, R2_GPc_part2)
R2_GPc
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.1634 0.1009 0.2232 10 4
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.1634 0.1009 0.2232 10 4
#> Rank:Time 0.0297 0.0000 0.0913 10 3
#> Rank 0.1352 0.0786 0.1787 10 2
#> Time 0.0203 0.0000 0.0390 10 2
A conceptual Venn diagram shows what has been estimated. The model now assigns the variance explained by a main effect, including the variance shared with its interaction, to the main effect. Here, X1 is the part R2 for Rank (horizontal stripes), X2 is the part R2 for Time (vertical stripes), and the variance uniquely explained by the interaction (Rank:Time) is estimated separately (dotted).
Another special case is interactions involving factorial predictors. For illustration, we fit Time in the guinea pig analysis as a five-level factor.
GuineaPigs$TimeF <- factor(GuineaPigs$Time)
modGPF <- lmer(TestoTrans ~ Rank * TimeF + (1|MaleID), data=GuineaPigs)
R2_modGPFa <- partR2(modGPF, partvars = c("Rank", "TimeF", "Rank:TimeF"),
nboot=10)
R2_modGPFa
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.1969 0.1357 0.2757 10 10
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.1969 0.1357 0.2757 10 10
#> Rank 0.0000 0.0000 0.0802 10 10
#> TimeF 0.0230 0.0000 0.1016 10 6
#> Rank:TimeF 0.0516 0.0029 0.1289 10 6
#> Rank+TimeF 0.0230 0.0000 0.1016 10 6
#> Rank+Rank:TimeF 0.1796 0.1197 0.2583 10 5
#> TimeF+Rank:TimeF 0.0861 0.0329 0.1638 10 2
#> Rank+TimeF+Rank:TimeF 0.1969 0.1357 0.2757 10 1
The column ndf shows that the full model and the model without Rank have the same number of fitted parameters. For such a model with one covariate, a five-level factor and their interaction, the model estimates ten parameters in the fixed part: one intercept, one for the main effect of the covariate Rank, four contrasts for the main effect of the factor TimeF and four contrasts for the interaction Rank:TimeF. If the covariate is removed from the model, the contrasts are reparameterized, such that there are still ten parameters estimated in the fixed part: one intercept, four contrasts for the main effect of the factor TimeF, and now five for the interaction Rank:TimeF.
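This can be verified by comparing the number of columns of the fixed-effect design matrices (a quick sketch):

ncol(model.matrix(~ Rank * TimeF, data = GuineaPigs))        # 10 columns
ncol(model.matrix(~ TimeF + Rank:TimeF, data = GuineaPigs))  # still 10 columns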
One way to prevent this reparameterization of the design matrices is the use of dummy-coded variables. We include dummy-coded variables in the data frame and also center them.
GuineaPigs <- cbind(GuineaPigs, model.matrix(~ 0 + TimeF, data=GuineaPigs))
GuineaPigs$TimeF2 <- GuineaPigs$TimeF2 - mean(GuineaPigs$TimeF2)
GuineaPigs$TimeF3 <- GuineaPigs$TimeF3 - mean(GuineaPigs$TimeF3)
GuineaPigs$TimeF4 <- GuineaPigs$TimeF4 - mean(GuineaPigs$TimeF4)
GuineaPigs$TimeF5 <- GuineaPigs$TimeF5 - mean(GuineaPigs$TimeF5)
Now we can keep the four dummy-coded (and centered) TimeF
variables together in a named partbatch
argument.
batch <- c("TimeF2", "TimeF3", "TimeF4", "TimeF5")
modGPFD <- lmer(TestoTrans ~ (TimeF2 + TimeF3 + TimeF4 + TimeF5) * Rank +
(1|MaleID), data=GuineaPigs)
R2_GPFD <- partR2(modGPFD, partvars=c("Rank"),
partbatch=list(TimeF=batch, `TimeF:Rank`=paste0(batch, ":Rank")),
nboot=10)
R2_GPFD
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper nboot ndf
#> 0.1969 0.1282 0.272 10 10
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> Predictor(s) R2 CI_lower CI_upper nboot ndf
#> Model 0.1969 0.1282 0.2720 10 10
#> TimeF 0.0230 0.0000 0.1080 10 6
#> TimeF:Rank 0.0516 0.0020 0.1349 10 6
#> Rank 0.1628 0.0939 0.2399 10 9
#> Rank+TimeF 0.1718 0.1029 0.2483 10 5
#> Rank+TimeF:Rank 0.1796 0.1108 0.2558 10 5
#> Rank+TimeF+TimeF:Rank 0.1969 0.1282 0.2720 10 1
Standardised model estimates or beta weights are difficult to calculate for models with interactions (including polynomials, which are essentially interactions of a variable with itself). While there are specialised packages like effectsize (Makowski & Lüdecke, 2019) to standardize model estimates, we recommend standardising the variables involved in interactions or polynomials before fitting the model. We will again use the guinea pig dataset with Rank and Time fitted as continuous, but now scaled, variables.
GuineaPigs$TestoTransS <- scale(GuineaPigs$TestoTrans)
GuineaPigs$TimeS <- scale(GuineaPigs$Time)
GuineaPigs$RankS <- scale(GuineaPigs$Rank)
modGPS <- lmer(TestoTrans ~ RankS * TimeS + (1|MaleID), data = GuineaPigs)
We then run the partR2 analysis. In the case of (pre-)standardised variables, it is useful to look at the model estimates instead of the beta weights (since the latter are standardized post hoc). The summary() function takes an additional argument ests = TRUE to display them. Estimates from pre-standardized variables are preferable to post-hoc standardized beta weights for interactions and polynomials. However, the difference is small in this case.
R2_GPS <- partR2(modGPS, nboot = 10)
summary(R2_GPS, ests = TRUE)
#>
#>
#> R2 (marginal) and 95% CI for the full model:
#> R2 CI_lower CI_upper ndf
#> 0.1634 0.1347 0.2855 4
#>
#> ----------
#>
#> Part (semi-partial) R2:
#> [1] "No partitions selected."
#>
#> ----------
#>
#> Inclusive R2 (SC^2 * R2):
#> Predictor IR2 CI_lower CI_upper
#> RankS 0.1217 0.0821 0.2367
#> TimeS 0.0012 0.0001 0.0102
#> RankS:TimeS 0.0018 0.0000 0.0134
#>
#> ----------
#>
#> Structure coefficients r(Yhat,x):
#> Predictor SC CI_lower CI_upper
#> RankS 0.8631 0.7387 0.9213
#> TimeS 0.0858 -0.0870 0.2548
#> RankS:TimeS -0.1060 -0.3108 0.0865
#>
#> ----------
#>
#> Model estimates
#> Predictor Estimate CI_lower CI_upper
#> RankS 0.2062 0.1719 0.2597
#> TimeS -0.0762 -0.0995 -0.0414
#> RankS:TimeS -0.0806 -0.1081 -0.0528
#>
#> ----------
#>
#> Beta weights (standardised estimates)
#> Predictor BW CI_lower CI_upper
#> RankS 0.4758 0.4071 0.6102
#> TimeS -0.1760 -0.2341 -0.0943
#> RankS:TimeS -0.1832 -0.2436 -0.1188
#>
#> ----------
#>
#> Parametric bootstrapping resulted in warnings or messages:
#> Check r2obj$boot_warnings and r2obj$boot_messages.
Bates, D. M., Maechler, M., Bolker, B. M. & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67: 1-48.
Bengtsson, H. (2019). future: Unified Parallel and Distributed Processing in R for Everyone. R package version 1.14.0. https://CRAN.R-project.org/package=future
Faraway, J. J. (2014). Linear models with R. CRC press, Boca Raton.
Harrison, X. A. (2014). Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ 2: e616.
Makowski, B.-S. & Lüdecke, D. (2019). Compute and interpret indices of effect size. R package. https://github.com/easystats/effectsize.
Mutwill, A.M., Schielzeth, H., Zimmermann, T.D., Richter, H., Kaiser, S. & Sachser, N. (in prep.). Endocrine mechanisms underlying dominance rank acquisition in males: individuality meets plasticity in a complex social environment.
Nakagawa, S., & Schielzeth, H. (2010). Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85: 935-956.
Nakagawa, S. & Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4: 133-142.
Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods in Ecology and Evolution 1: 103-113.
Stoffel, M. A., Nakagawa, S. & Schielzeth, H. (2017). rptR: Repeatability estimation and variance decomposition by generalized linear mixed-effects models. Methods in Ecology and Evolution 8: 1639-1644.
Vaughan, D. & Dancho, M. (2018). furrr: Apply Mapping Functions in Parallel using Futures. R package version 0.1.0. https://CRAN.R-project.org/package=furrr
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer, London.
Yeatts, P.E., Barton, M., Henson, R.K. & Martin, S.B. (2017). The use of structure coefficients to address multicollinearity in sport and exercise science. Measurement in Physical Education and Exercise Science 21: 83-91.