The concept of ‘repeatability’ relates to the way of quantifying the reliability of measurements. In a wider context, however, the repeatability offers insights into the components contributing to variability in the data. The repeatability describes the relative partitioning of variance into within-group and between-group sources of variance and is more generally referred to as the intra-class correlation (ICC). The hierarchical decomposition of variances has become the focus of several disciplines. In the context of behavioral research, for example, repeatability is often used to quantify stable individual differences and thus became a fundamental quantity of interest in the field of animal personality research.
A very powerful tool for estimating repeatabilities is the mixed effects model framework. Random-effect predictors are used to estimate variances at different hierarchical levels. In the simplest case of a design where repeated measures (e.g. quantifications of behavior) were taken from the same features (e.g. individuals), the repeatability R is calculated as the variance among group means (group-level variance VG) over the sum of group-level and data-level (residual) variance VR: R = VG/(VG + VR)
While it is relatively easy to extract point estimates for VG and VR from standard
software packages, it is considerably more difficult to quantify the
uncertainty in estimated R
values. The rptR
package offers easy-to-use functions for
estimating R and its
uncertainty. The package includes estimates of the repeatability (and
also raw variances and marginal R2; sensu
Nakagawa and Schielzeth 2013) for Gaussian traits, binomial and
Poisson-distributed traits. The theory for for the quantification of the
repeatability for non-Gaussian traits is reviewed in Nakagawa &
Schielzeth (2010).
The functions build heavily on functions for fitting mixed models, in
particular on lmer
from the lme4
package
(Bates et al. 2015). Mixed-effect models are fitted within the function
call and fitted model objects are returned as part of the class
rpt
return object. Uncertainty is quantified via parametric
bootstrapping. In a nutshell, parametric bootstrapping works by (i)
fitting the model, (ii) simulating new data based on the estimators of
the original model while adhering to the original sampling design, (iii)
refitting the model to the simulated data and storage of the
repeatability of the model fit. Repeating steps (ii) to (iii) a large
number of times (e.g. 1000 times) yields the sampling variance give the
data structure under the assumption that the model is valid.
As usual, the CRAN-version of the package can be installed via a call to:
Since the package is developed on Github (www.github.com), there is likely to be a developmental version that implements new features prior to release on CRAN. This current stable version on Github can be installed via a call to:
After one of those two steps, the package can be loaded by
The citation can be found with
The central wrapper function for estimating the repeatability is
rpt
. The argument datatype
decides whether the
call is forwarded to the specialized functions rptGaussian
for Gaussian traits, rptBinary
for binary (0/1) outcomes,
rptProportion
for proportion data represented as joint
vectors (or matrix) of successes and failures and
rptPoisson
for Poisson-distributed count data. The
functions return S3 class rpt
objects that can be viewed
via the str
function. The generic functions
summary
, print
and plot
are
available for a more convenient display of the results. In the following
we walk through the features of rptR
starting with Gaussian
models for introducing the basic plots and outputs, before addressing
the specifics of count and binary trait analyses.
As an example dataset we will use a simulated toy dataset that was
introduced for a different purpose (Nakagawa & Schielzeth 2013). It
offers a balanced dataset with rather simple structure, sizable effects
and decent sample size, just right to demonstrate some features of
rptR
. Sufficient sample size is required in particular for
the non-Gaussian traits, because those tend to be more computationally
demanding and less rich in information per data point than simple
Gaussian traits.
In brief, the imaginary sampling design of the simulated dataset is
as follows. Beetle larvae were sampled from 12 populations
(Population
) with samples taken from two discrete
microhabitats at each location (Habitat
). Samples were
split in equal proportions and raised in two dietary treatments
(Treatment
). Beetles were sexed at the pupal stage
(Sex
) and pupae were kept in single-sex containers
(Container
). Phenotypes (BodyL
,
Egg
and Color
) were measured at the imaginal
stage and are introduced below in the separate sections.
As an example for Gaussian data, we will analyse body length
(BodyL
). To start with, we will analyse the repeatability
at the level of the population that might arise through genetic
differentiation and/or early environmental influences prior to
sampling.
## 'data.frame': 960 obs. of 6 variables:
## $ Population: int 1 1 1 1 1 1 1 1 1 1 ...
## $ Container : int 1 1 1 1 1 1 1 1 2 2 ...
## $ Sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 2 2 ...
## $ Habitat : Factor w/ 2 levels "A","B": 1 1 1 1 2 2 2 2 1 1 ...
## $ Treatment : Factor w/ 2 levels "Cont","Exp": 1 2 1 2 1 2 1 2 1 2 ...
## $ BodyL : num 10.8 11.2 11.6 15.6 12.4 ...
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 80 80 80 80 80 80 80 80 80 80 80 80
We analyse the data using the rpt
function with the
argument datatype = "Gaussian"
. It is possible to call
rptGaussian
directly, while omitting the
datatype
argument. As parametric bootstrapping and
permutation can be slow, we recommend starting with shutting
bootstrapping and permutation off by setting nboot = 0
and
npermut = 0
. Note that the formula argument uses the same
syntax as lmer
from the lme4
packages, because
the formula-call is directly forwarded to lmer
. Also note
that grname
has to be specified with quotation marks and
that the term has to match exactly a random effect as it is used in the
formula
argument. The data
argument is
mandatory and is specified without quotes.
rpt(BodyL ~ (1|Population), grname="Population", data=BeetlesBody, datatype="Gaussian", nboot=0, npermut=0)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Population
## R = 0.299
## SE = NA
## CI = [NA, NA]
## P = 8.08e-62 [LRT]
## NA [Permutation]
Since this works fine and we see sizable repeatability, we increase
the number of parametric bootstraps to 1000, in order to evaluate the
uncertainty in the repeatability estimate. The increase in the number of
parametric bootstraps will not affect the point estimates, but interval
estimates will be affected. At the same time, the permutation test can
be kept switched off. The results are stored in an object named
rep1
.
rep1 <- rpt(BodyL ~ (1|Population), grname="Population", data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0)
It is also possible to increase the number of bootstrap iterations
(and also of permutations, see below) by calling the function again with
the update = TRUE
and rptObj
accepting the rpt
object from the previous function call. Note that all other arguments
(except for nboot
and npermut
) have to be
identical to the original function call. We here add an additional 500
bootstraps to the original 1000 bootstraps.
rep1 <- rpt(BodyL ~ (1|Population), grname="Population", data=BeetlesBody, datatype="Gaussian", nboot=500, npermut=0, update=TRUE, rptObj=rep1)
The rptR
package offers three generic function for
convenient display. The print
function displays the essence
of the repeatability estimation: the point estimate R, standard error
SE, confidence interval CI and P values. The summary
function is more verbose, summarizing parts of the data structure,
distributional summaries of the bootstrap and permutation samples and
the full details on likelihood ratio tests. Finally, the
plot
function shows the distribution of the parametric
bootstrap samples along with the point estimate and the limits of the
confidence interval. The plot can be customized by specifying additional
graphical parameters as with the cex.main
argument below
(see ?par
for additional arguments).
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Population
## R = 0.299
## SE = 0.089
## CI = [0.116, 0.466]
## P = 9.59e-62 [LRT]
## NA [Permutation]
##
## Repeatability estimation using the lmm method
##
## Call = rpt(formula = BodyL ~ (1 | Population), grname = "Population", data = BeetlesBody, datatype = "Gaussian", nboot = 500, npermut = 0, rptObj = rep1, update = TRUE)
##
## Data: 960 observations
## ----------------------------------------
##
## Population (12 groups)
##
## Repeatability estimation overview:
## R SE 2.5% 97.5% P_permut LRT_P
## 0.299 0.0895 0.116 0.466 NA 0
##
## Bootstrapping and Permutation test:
## N Mean Median 2.5% 97.5%
## boot 1500 0.287 0.285 0.116 0.466
## permut 2 NA NA 0.000 0.000
##
## Likelihood ratio test:
## logLik full model = -1946.634
## logLik red. model = -2083.405
## D = 274, df = 1, P = 9.59e-62
##
## ----------------------------------------
Parametric bootstrapping and permutations are inherently repetitive
procedures that can be computed on multiple cores in parallel. The
rptR
package has build-in options for using multiple cores
by setting parallel = TRUE
. The default
ncores = NULL
determines the number of cores on your
machine and uses all but one core. It is also possible to specify the
number of cores to be used via the ncores
argument.
The statistical significance of the repeatability is tested by
likelihood ratio tests (LRT), comparing the model fit of a model
including the grouping factor of interest and one excluding it (i.e.,
constraining it group-level variance to zero). Because the LRT are
conducted at the boundary of possible parameter space (against zero
group-level variance), the difference in log-likelihoods is assumed to
follow a mixture distribution of a χ2-distribution with 1
degree-of-freedom and point mass at zero. P values based on LRT are part of
the standard print
output and the full details can be see
by a call to summary
(see above). In this case the variance
among populations is highly significant and so is the repeatability at
this level.
Permutation can be used as a robust alternative or addition to LRT. In the simplest case this will work by randomizing the vector of group identities against the response vector followed by refitting the model to the randomized data and storage of the repeatability for each of many replications. However, in more complex models involving multiple random effects and/or fixed effects, this will also break the data structure between the grouping factor of interest and other aspects of the experimental design. We therefore construct randomized datasets by fitting a model without the grouping factor of interest and then add the permutated residuals to the fitted values of this model. This maintains the general data structure and any effects of other design aspects on the response while still breaking the link between grouping factor and the response.
rep2 <- rpt(BodyL ~ (1|Population), grname="Population", data=BeetlesBody, datatype="Gaussian", nboot=0, npermut=1000)
It is not unusual to see warnings about convergence problems, because
the data are simulated with zero group-level variance and this might
occasionally produce odd data. The results of the permutation tests are
shown as part of print
and summary
, and can be
visualized using the type
argument in the plot
function. The results are in general agreement with the LRT in this
case, but note that the resolution of the permutation tests is limited
by the number of iterations. Since the empirical estimate is always
included as one permutation sample (following the reasoning that even
under the null hypothesis, the observed data are a possible, albeit
possibly unlikely, outcome, Manly 2007), the resolution is limited to
1/N were N is the number of permutations. It
is typical to see the spike at zero and an extended right tail if
variances are low.
The full return object can be viewed by the str
functions, which shows, for example, that R_boot
holds all
parametric bootstrapping samples, R_permut
all permutation
samples (the first one of which is the point estimate),
all_warnings
hold all warning messages and mod
holds the model fit as an lmerMod
as returned by
lmer
.
## Linear mixed model fit by REML ['lmerMod']
## Formula: BodyL ~ (1 | Population)
## Data: data
##
## REML criterion at convergence: 3893.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2950 -0.7770 0.0186 0.7867 2.6107
##
## Random effects:
## Groups Name Variance Std.Dev.
## Population (Intercept) 1.377 1.173
## Residual 3.235 1.798
## Number of obs: 960, groups: Population, 12
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 14.0827 0.3436 40.98
It is possible to include additional random effect components in the
repeatability estimation. Indeed there does exist a repeatability for
each grouping level (i.e. random effects) and rptR
allows
estimating one, multiple or all of them simultaneously by using the
grname
argument. For example, we might want to estimate the
repeatability at the level of the source population
(Population
) as well as at the level of the housing group
at the pupal stage (Container
). Variances at the level of
the pupal housing group (Container
) might arise from
various environmental influences, such as temperature or humidity, that
might vary on a fine scale among containers.
rep3 <- rpt(BodyL ~ (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.478
## SE = 0.071
## CI = [0.351, 0.624]
## P = 2.29e-138 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## R = 0.256
## SE = 0.094
## CI = [0.066, 0.425]
## P = 2.16e-07 [LRT]
## NA [Permutation]
The print
and summary
functions will show
results for both grouping factors, while the action of the
plot
function needs to be controlled by the
grname
argument. The results show substantial repeatable
variation among containers with still sizable (albeit slightly reduced)
variation among populations.
plot(rep3, grname="Container", type="boot", cex.main=0.8)
plot(rep3, grname="Population", type="boot", cex.main=0.8)
In our example, containers are nested within populations in the sense that samples from the same population are spread over multiple containers, but each container houses individuals from only a single population. In principle it might have been possible to apply a crossed sampling design with samples from the same population spread across multiple containers while each container houses samples from multiple populations. Crossed designs are usually preferred, but their appliation might be constraint like in the case of beetles where it is not possible to follow individuals (and consequently population identities) across the pupal stage, because all external markings are lost.
Nested sampling has consequences for how variances are interpreted. In our example, the population level repeatability captures all variation among populations (arising for genetic and environmental reasons) over all other variance components. But the container-level variance with population included in the model captures only among-container variance after controlling for variation among populations and is hence foremost environmental in origin. The population x container interaction variance, however, will appear as part of the container repeatability. The interaction variance might, in this case, arise from genotype-environment interactions or early environmental x late environment interactions (see Schielzeth & Nakagawa 2013 for a more detailed discussion of nesting).
One might argue that the full container variance should contain the population variance. It is here possible to estimate the repeatability at the container-level without controlling for population and the results turn out to be almost equivalent to adding the population-level variance to the container-level variance.
rep4 <- rpt(BodyL ~ (1|Container), grname="Container", data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.729
## SE = 0.029
## CI = [0.668, 0.78]
## P = 2.74e-192 [LRT]
## NA [Permutation]
All repeatability estimates so far were agreement repeatabilities in
the sense that they represent the group-level variance over the sum of
all other variance components. This situation changes when fixed effects
come into play. Fixed effects explain part of the variances in the data
and the variance assigned to fixed effects in the model will not appear
in the denominator of the repeatability, at least under the
rptR
default settings. It is still possible to fit fixed
effects in rptR
, but the repeatability has then to be
interpreted as the repeatability after controlling for fixed effects.
This can be thought of as the repeatability when all observations are
shifted along the slopes of the fixed effects covariates till they reach
the ordinate. An equivalent estimate would have been derived if all data
had been collected at the point where all covariates are zero. Since the
procedure involves statistically adjustments, we have called this the
adjusted repeatability (Nakagawa & Schielzeth 2010).
Estimating adjusted repeatability is facilitated in rprR
by the formula
argument. Fixed effect covariates can be
included as in any lmer
formula. We will here include
Treatment
and Sex
as covariates. Note that the
treatment was artificially applied, hence it potentially introduces
excess variation to the sample that we might want to control for
statistically. Sex
is a more controversial example, because
it represents part of the natural variation, but often we want to know
the (adjusted) repeatabilities given that the sexual identities of the
individuals are known (and assuming that repeatabilities are identical
across the sexes).
rep5 <- rpt(BodyL ~ Treatment + Sex + (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0)
The results can be viewed and plotted as before and the significance of the fixed effect can be assessed from the fitted model.
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.083
## SE = 0.027
## CI = [0.043, 0.145]
## P = 1.34e-13 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## R = 0.491
## SE = 0.113
## CI = [0.239, 0.676]
## P = 3.19e-31 [LRT]
## NA [Permutation]
plot(rep5, type="boot", grname="Container", cex.main=0.8)
plot(rep5, type="boot", grname="Population", cex.main=0.8)
It turns out that most of the variance among containers was caused by
sexual dimorphism (females are larger, as can be seen from the slope of
SexMale
in the fitted model rep5$mod
), albeit
some significant container repeatability remains even after controlling
for sexual dimorphism.
It is sometimes desiarable to fit models with fixed effects without
loosing the variance explained by fixed effects from the denominator. We
have therefore introduced the argument adjusted
. It
defaults to TRUE
, which means that adjusted repeatabilites
are estimated. But when the argument is set to FALSE
, the
function will estimate what we here call the enhanced agreement
repeatability, because it allows to fit an improved model without
loosing the variance from the repeatability denominator of the
repeatability. With adjusted = FALSE
the variance explained
by fixed effects will be calculated by the variance in the linear
predictor (on the link scale for non-Gaussian data). This variance is
added to the denominator of the repeatability equation just as other
variance components.
rep6 <- rpt(BodyL ~ Treatment + Sex + (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0, adjusted=FALSE)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.051
## SE = 0.013
## CI = [0.027, 0.078]
## P = 1.34e-13 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## R = 0.299
## SE = 0.089
## CI = [0.127, 0.47]
## P = 3.19e-31 [LRT]
## NA [Permutation]
In our example, it turns out that the fixed effects explain a
substantial amount of phenotypic variance and this reduces the relative
contribution of Population
and Container
. We
will see below that the change is entirely explained by the fraction of
the phenotypic variance explained by fixed effects. Note that the
repeatability for Population
is now the same as in our
initial model (rep1
), although a far more complex model was
fitted. The estimation of enhanced agreement repeatabilities works
equally for the non-Gaussian models introduced below.
If we are able to feed the variance explained by fixed effects to the
denominator of the repeatability calculation, it is also possible to
extract the variance explained by fixed effects in the model as a
quantity of interest. We have introduced a reserved term
Fixed
to the grname
argument for this purpose.
In combination with ratio = FALSE
,
grname = "Fixed"
will return the variance in the linear
predictor (on the latent scale for non-Gaussian data). In combination
with ratio = TRUE
, grname = "Fixed"
will
return the fraction of the total variance that is explained by variance
in the linear predictor, i.e. by fixed effects. Incidently, the
combination grname = "Fixed", ratio = TRUE
estimates a form
of the coefficient of determination R2 that we have called
the marginal R2 of mixed effects
models (Nakagawa & Schielzeth 2013). rptR
provides a
quantification of uncertainty for the variance explained by fixed
effects by parametric bootstrapping as for other variance components.
However, we have not implemented significance testing for the fixed
effect variance. This should be quantified by other means.
rep7 <- rpt(BodyL ~ Treatment + Sex + (1|Container) + (1|Population), grname=c("Container", "Population", "Fixed"), data=BeetlesBody, datatype="Gaussian", nboot=1000, npermut=0, adjusted=FALSE)
##
##
## Repeatability estimation using the lmm method
##
## Repeatability for Container
## R = 0.051
## SE = 0.013
## CI = [0.027, 0.08]
## P = 1.34e-13 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## R = 0.299
## SE = 0.091
## CI = [0.12, 0.468]
## P = 3.19e-31 [LRT]
## NA [Permutation]
##
## Repeatability for Fixed
## R = 0.391
## SE = 0.053
## CI = [0.294, 0.5]
## P = NA [LRT]
## NA [Permutation]
The variances explained by container and population are evidently
unchanged, but it is now visible that the fixed effects alone explain
almost 40% of the phenotypic variance in body length in our example. A
big part of this is simply sexual dimorphism. This serves as an example,
because in this particular case, we would probably be more interested in
the adjusted repeatability controlling for Treatment
and
Sex
.
Count data represent a typical situation in which we might want to consider fitting generalized linear mixed-effects models (GLMM). Poisson error distributions with log-link are often appropriate for count data, although there is no guarantee that count data will be best modeled by Poisson error distributions (negative binomial distributions are an obvious alternative). Using the link function, the model parameters are fitted on the latent scale rather than on the original data scale and consequently the model output will represent results on the latent scale.
While in a Gaussian model, there are different parameters for the
location and the spread of the distribution (mean and variance), this is
not the case in non-Gaussian models. The Poisson distribution, for
example, has an inherent variance that is equal to the mean. This
inherent variance needs to be respected when quantifying repeatabilities
for non-Gaussian data. We have reviewed the literature on non-Gaussian
link-scale variances (Nakagawa & Schielzeth 2010) and
rptR
is based on the formula presented therein. It turns
out that the distribution-specific variance for a Poisson model with log
link is approximated by ln(1/λ + 1) where
λ is the average count. This
approximation however becomes increasingly inaccurate at low λ.
In order to account for additive overdisperison in Poisson and other
non-Gaussian models (except for binary data), the current implementation
of rptR
internally adds an observation level random effect
to the model specified in formula
. An observation level
random effect is an additional random effect term (named
Overdispersion
) with as many levels as there are
observations to the specified model. The additive overdispersion
explained by the new Overdispersion
term is added to the
distribution-specific variance in the denominator of the repeatability.
See als section ‘Multiplicative and additive overdispersion models’
towards the end of the vignette.
As an example, we will analyse the number of eggs laid by female
beetles (Eggs
) in our toy dataset. We will estimate the
repeatability at the level of the population as well as at the level of
the container simultaneously while controlling for excess variation
introduced by the experimental nutritional treatment. There is no need
to control for sex-differences in this sex-limited trait.
We will estimate the repeatability through the rpt
function using the datatype = "Poisson"
argument.
Alternatively, it would be possible to fit the model directly using the
rptPoisson
function. The formula
syntax
follows the rules used in glmer
from the lme4
package, which is also the central engine for fitting the mixed effects
model.
rep8 <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesFemale, datatype="Poisson", nboot=1000, npermut=0)
In non-Gaussian models, it frequently happens that some of the
bootstrap or permutation iterations do notF converge on a good model
fit. Warnings are thrown by glmer
and are collected in the
warnings
element of the rpt
return object.
As usual, the output can be viewed and displayed by
print
, summary
and plot
. The
results show substantial repeatability at the level of the population
(due to genetic differentiation or lasting environmental influence prior
to sampling), but very little repeatability at the level of the
container (indicating limited shared environmental effects).
##
##
## Repeatability estimation using the glmm method and log link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.038
## SE = 0.025
## CI = [0, 0.1]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.032
## SE = 0.023
## CI = [0, 0.089]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.52
## SE = 0.122
## CI = [0.234, 0.689]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.504
## SE = 0.121
## CI = [0.223, 0.68]
## P = 1.19e-16 [LRT]
## NA [Permutation]
There are two repeatabilities for Poisson GLMM, derived by a latent scale approximation and a original scale approximation. We have previously called the two estimates the latent scale repeatability and the original scale repeatability. However, they are both estimates of the same quantities, not entirely different repeatabilites, hence we now prefer to call them the latent and orginal scale approximations. In the case of Poisson GLMM, original scale approximations are actually the exact solutions, while link scale approximations are indeed approximate. However, in our experience the two typically give very similar results.
plot(rep8, grname="Container", scale="link", cex.main=0.8)
plot(rep8, grname="Population", scale="link", cex.main=0.8)
plot(rep8, grname="Container", scale="original", cex.main=0.8)
plot(rep8, grname="Population", scale="original", cex.main=0.8)
As with Gaussian models, it is possible to estimate enhanced agreement repeatabilities as well as the proportion of variances explained by fixed effects (marginal R2). In this case the treatment effect explained less than 10% of the variance and including this fraction to the denomionator of the repeatability calculation gives marginally reduced repeatability estimates for population and container:
rep9 <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population", "Fixed"), data=BeetlesFemale, datatype="Poisson", nboot=1000, npermut=0, adjusted=FALSE)
print(rep9)
##
##
## Repeatability estimation using the glmm method and log link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.034
## SE = 0.022
## CI = [0, 0.089]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.022
## SE = 0.015
## CI = [0, 0.059]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.468
## SE = 0.117
## CI = [0.195, 0.643]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.344
## SE = 0.097
## CI = [0.132, 0.497]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Repeatability for Fixed
## --------------------------------
## Link-scale approximation:
## R = 0.1
## SE = 0.028
## CI = [0.059, 0.164]
## P = NA [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.065
## SE = 0.017
## CI = [0.038, 0.104]
## P = NA [LRT]
## NA [Permutation]
The quantification of the distribution-specific variance of Poisson
models with log-link requires a parameter λ that represents the expected
value, i.e. the global mean on the data scale. We have previously given
different advice for estimating λ, including the inverse-link of the
intercept on the latent scale. While this gives a reasonable
approximation in many cases, the estimate will be biased for small
expected values (smaller than about 2). A preferred alternative is to
use the mean of the observations in the sample as we have also suggested
previously. The is now the default option in rptR
that is
determined by the expect
argument with its default
expect = "meanobs"
.
There is also an analytical alternative that takes the intercept and
variances as estimated on the latent scale. The expected value is E[y] = exp(β0 + vartot/2),
where β0 represents
the intercept on the latent scale and vartot
represents the sum of the variances, including variance caused by fixed
effects, on the latent scale. While this is generally a preferable
solution, the estimates are susceptible to the distribution of
covariates and, in most cases, it gives appropriate results only if all
covariates are centered. Since centering is not enforced within
rptR
, we have made this an opt-in option that can be used
by setting expect = "latent"
. The repeatabilities estimated
by the formulae presented in Nakagawa & Schielzeth (2010) can be
requested by expect = "liability"
.
Binary data are sets of single observations with failure/success
status (coded 0/1). They are a special case of proportion data (see
below) with a sample size of 1 per observation. Because of a number of
specificities, repeatabilities for binary and proportion data are
separately coded in rptR
. For example, binary models to not
fit the addional overdispersion term, since there is not definable
overdispersion in binary models. Binary data are handed over to
rpt
as a single vector of 0’s and 1’s or as a two-level
factor.
Males in our toy example occur in two distinct color morphs: dark and
reddish-brown. We will analyze the repeatability of color morph
identities (Colour
) as an example for the repeatability of
binary data. This is a test for spatial heterogeneity in morph
ratios.
##
## dark reddish
## 275 205
We estimate the repeatability through the wrapper function
rpt
at the level of population and at the level of
containers. As with Poisson models it frequently happens that some of
the bootstrap or permutation iterations do not converge on a model fit
and that glmer
throws warnings that are collected in the
all_warnings
element of the rpt
object.
rep10 <- rpt(Colour ~ (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesMale, datatype="Binary", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.073
## CI = [0.042, 0.33]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.071
## CI = [0.042, 0.322]
## P = 5.81e-09 [LRT]
## NA [Permutation]
As with Poisson GLMM, there are two approximations for the repeatability that we call the latent scale approximation and original scale approximation and in the case of Binomial GLMM, the two are both approximations of the same quantity and in our experience the two typically give very similar results.
It turns out that there is significant repeatability at the level of populations, but not on the level of containers (at least not beyond variance among populations). It frequently happens that small variances are estimated as zero, because small variances are difficult to estimate with any level of confidence. For a check of robustness, we still estimate the enhanced agreement repeatabilities controlling for the treatment manipulation, but it turns out that this change to the model does hardly affect the repeatability estimates due to the low explanatory power of the treatment.
rep11 <- rpt(Colour ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population", "Fixed"), data=BeetlesMale, datatype="Binary", nboot=1000, npermut=0, adjusted=FALSE)
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.05]
## P = 0.5 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0
## SE = 0.013
## CI = [0, 0.046]
## P = 0.5 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.196
## SE = 0.074
## CI = [0.054, 0.339]
## P = 5.55e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.174
## SE = 0.059
## CI = [0.052, 0.276]
## P = 5.55e-09 [LRT]
## NA [Permutation]
##
## Repeatability for Fixed
## --------------------------------
## Link-scale approximation:
## R = 0.045
## SE = 0.019
## CI = [0.016, 0.09]
## P = NA [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.04
## SE = 0.016
## CI = [0.015, 0.076]
## P = NA [LRT]
## NA [Permutation]
plot(rep11, grname="Population", scale="link", cex.main=0.8)
plot(rep11, grname="Population", scale="original", cex.main=0.8)
plot(rep11, grname="Container", scale="link", cex.main=0.8)
plot(rep11, grname="Container", scale="original", cex.main=0.8)
plot(rep11, grname="Fixed", scale="link", cex.main=0.8)
plot(rep11, grname="Fixed", scale="original", cex.main=0.8)
We have previously adviced to estimate the distribution specific
variance for the link scale approximation as π2/3 (Nakagawa &
Schielzeth 2010). While this gives a reasonable approximation, it is
preferable to estimate the link scale variance as 1/(p(1 − p)) where p is the expected probability of
success (Nakagawa & Schielzeth 2016). The current version of
rptR
implements the more accurate approximation 1/(p(1 − p)).
As with Poisson GLMM, we have implemented two options for estimating
the expectation for p. The
default version with expect = "meanobs"
takes the average
succes rate across all observations. There is also a latent scale
approximation (details are presented in Nakagawa & Schielzeth 2016)
that can be used upon choice by setting expect = "latent"
.
As with Poisson GLMM, we consider the analytical solution via
expect = "latent"
more accurate and putitatively more
robust to unbalanced sampling, but expect = "meanobs"
seems
safer, because it does not depend on how covariates are coded.
Counts of successes and failures are conveniently modelled as
proportion data. They are handed over to rpt
(and
ultimately to glmer
) as combined vectors of integers. We
will illustrate the repeatability analysis of proportion data by
aggregating color morph identities within containers. The analysis is
equivalent to the analysis of the data as binary data with
Container
treated as a random effect as it was done above
(model rep10
).
BeetlesMale$Dark = BeetlesMale$Colour
BeetlesMale$Reddish = (BeetlesMale$Colour-1)*-1
BeetlesMaleAggr <- aggregate(cbind(Dark, Reddish) ~ Population + Container, data=BeetlesMale, FUN=sum)
rep12 <- rpt(cbind(Dark, Reddish) ~ (1|Population), grname=c("Population"), data=BeetlesMaleAggr, datatype="Proportion", nboot=1000, npermut=0)
print(rep12)
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.072
## CI = [0.045, 0.318]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.07
## CI = [0.045, 0.308]
## P = 5.81e-09 [LRT]
## NA [Permutation]
There are two alternative ways of modelling overdispersion in Poisson
and proportion models. In the current implementation of
rptR
we use the underlying glmer
model fits
for estimating additive overdispersion as excess variation at the latent
level (excess beyond variability arising from the sampling distribution
alone). It is implemented as an additional random effect term (named
Overdispersion
) with as many levels as there are
observations. This variation is added to the distribution-specific
variance in the denominator of the repeatability.
Multiplicative overdispersion, on the contrary, models overdispersion as a parameter (conveniently labelled ω) that multiplies the distribution-specific variance. ω = 1 is thus equivalent to no overdispersion beyond what is expected from the distribution. See Nakagawa & Schielzeth (2010) for details on additive and multiplicative overdispersion.
The choice of additive versus multiplicative overdispersion is a
matter of convenience. The data for the toy example was simulated as
additive overdispersion and it happens that the powerful
glmer
function implements additive overdispersion. We have
previously used glmmPQL
for fitting multiplicative
overdispersion models, but do not maintain this option any longer.
In our examples we have used the default link functions of Poisson
models (log
link) and binary/proportion data
(logit
link). Alternative link functions are possible, as
long as the distribution-specific variances are known for the
distribution-link family (see Nakagawa & Schielzeth 2010 for an
overview). Beyond what has been demonstrated above, rptR
supports the sqrt
link for Poisson and the
probit
link for binary/proportion data. In the above
examples, the differences between alternative link functions are rather
minor (but note that the original scale approximations are not available
for Possion data with sqrt
link and for binary/proportion
models with probit
link).
###Poisson models###
##
##
## Repeatability estimation using the glmm method and log link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.038
## SE = 0.025
## CI = [0, 0.1]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.032
## SE = 0.023
## CI = [0, 0.089]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.52
## SE = 0.122
## CI = [0.234, 0.689]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.504
## SE = 0.121
## CI = [0.223, 0.68]
## P = 1.19e-16 [LRT]
## NA [Permutation]
rep8a <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesFemale, datatype="Poisson", link="sqrt", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the glmm method and sqrt link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0.039
## SE = 0.024
## CI = [0.001, 0.097]
## P = 0.00771 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 0.00771 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.477
## SE = 0.107
## CI = [0.219, 0.627]
## P = 2.07e-16 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 2.07e-16 [LRT]
## NA [Permutation]
###Binary models###
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0
## SE = 0.014
## CI = [0, 0.051]
## P = 1 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.073
## CI = [0.042, 0.33]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.071
## CI = [0.042, 0.322]
## P = 5.81e-09 [LRT]
## NA [Permutation]
rep10a <- rpt(Colour ~ (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesMale, datatype="Binary", link="probit", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the glmm method and probit link
##
## Repeatability for Container
## --------------------------------
## Link-scale approximation:
## R = 0
## SE = 0.013
## CI = [0, 0.047]
## P = 1 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 1 [LRT]
## NA [Permutation]
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.178
## SE = 0.07
## CI = [0.043, 0.316]
## P = 5.08e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 5.08e-09 [LRT]
## NA [Permutation]
###Proportion models###
##
##
## Repeatability estimation using the glmm method and logit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.188
## SE = 0.072
## CI = [0.045, 0.318]
## P = 5.81e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = 0.185
## SE = 0.07
## CI = [0.045, 0.308]
## P = 5.81e-09 [LRT]
## NA [Permutation]
rep12a <- rpt(cbind(Dark, Reddish) ~ (1|Population), grname=c("Population"), data=BeetlesMaleAggr, datatype="Proportion", link="probit", nboot=1000, npermut=0)
##
##
## Repeatability estimation using the glmm method and probit link
##
## Repeatability for Population
## --------------------------------
## Link-scale approximation:
## R = 0.178
## SE = 0.07
## CI = [0.052, 0.312]
## P = 5.08e-09 [LRT]
## NA [Permutation]
##
## Original-scale approximation:
## R = NA
## SE = NA
## CI = [NA, NA]
## P = 5.08e-09 [LRT]
## NA [Permutation]
We wrote rptR
with the focus primarily on estimating
repeatabilities, including a quantification of their uncertainty.
However, we realize that it is often of interest to estimate variances
and their uncertainty directly rather than as ratios of variances.
rptR
can be used for this purpose too, by setting the
ratio
argument from TRUE
(the default) to
FALSE
.
For example, if we are interested in the link scale variances of colour morph variation, we can estimate the variances as:
rep13 <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population"), data=BeetlesFemale, datatype="Poisson", nboot=1000, npermut=0, ratio=FALSE)
##
##
## Variance estimation using the glmm method and log link
##
## Variance of Container
## --------------------------------
## Link-scale approximation:
## Var = 0.022
## SE = 0.012
## CI = [0.001, 0.05]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Variance of Population
## --------------------------------
## Link-scale approximation:
## Var = 0.304
## SE = 0.126
## CI = [0.09, 0.605]
## P = 1.19e-16 [LRT]
## NA [Permutation]
plot(rep13, grname="Container", scale="link", cex.main=0.8)
plot(rep13, grname="Population", scale="link", cex.main=0.8)
Evidently, the significance of the random effect terms won’t change
no matter whether they are presented as variances or as variances
devided by the phenotypic variance (repeatabilities). If we are
interested in inspecting raw variances, we are often also interested in
quantifying the amount of residual variance and its uncertainty, too. We
have therefore introduced two addition reserved terms for the
grname
argument (besides grname = "Fixed"
) to
allow this. Residual
will allow estimating the residual
variance, which in the case of non-Gaussian models is the overdispersion
on the latent scale plus the distribution-specific variance.
Overdispersion
will allow estimating the overdispersion
variance and the latent scale, which in the case of Gaussian models
(with implicit identity link) is equal to the residual estimated by
Residual
.
It is necessary to specify at least one additional grouping level in
the grname
argument along with Residual
. The
results for Container
and Population
are
unaffected and are the same as above (rep13
). In our
example there is significant overdispersion, meaning that besides
treatment effects and variation among populations and among containers
within populations there is more unexplained variation than expected
from the Poisson distribution alone.
rep14 <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population", "Overdispersion", "Residual"), data=BeetlesFemale, datatype="Poisson", nboot=1000, npermut=0, ratio=FALSE)
##
##
## Variance estimation using the glmm method and log link
##
## Variance of Container
## --------------------------------
## Link-scale approximation:
## Var = 0.022
## SE = 0.012
## CI = [0.002, 0.05]
## P = 0.00874 [LRT]
## NA [Permutation]
##
## Variance of Population
## --------------------------------
## Link-scale approximation:
## Var = 0.304
## SE = 0.133
## CI = [0.084, 0.582]
## P = 1.19e-16 [LRT]
## NA [Permutation]
##
## Variance of Overdispersion
## --------------------------------
## Link-scale approximation:
## Var = 0.103
## SE = 0.019
## CI = [0.062, 0.138]
## P = 2.22e-17 [LRT]
## NA [Permutation]
##
## Variance of Residual
## --------------------------------
## Link-scale approximation:
## Var = 0.259
## SE = 0.032
## CI = [0.202, 0.328]
## P = NA [LRT]
## NA [Permutation]
plot(rep14, grname="Overdispersion", scale="link", cex.main=0.8)
plot(rep14, grname="Residual", scale="link", cex.main=0.8)
It is pointless to test for the statistical significance of residual
variance against a null hypothesis of no residual variance and we
therefore omit this test. But for non-Gaussian models it is worthwhile
testing if there is statistically significant overdispersion and we have
therefore implemented the LRT test (but not the permutation test) for
Overdispersion
. In gerneral, however, be aware that each
grouping level specified in grname
will be tested in
separate iterations if a permutation test is required. Hence it is
recommened to be careful with the use of npermut
in
combination with multiple grouping levels.
rep15 <- rpt(Egg ~ Treatment + (1|Container) + (1|Population), grname=c("Container", "Population", "Overdispersion"), data=BeetlesFemale, datatype="Poisson", nboot=0, npermut=1000, ratio=FALSE)
##
##
## Variance estimation using the glmm method and log link
##
## Variance of Container
## --------------------------------
## Link-scale approximation:
## Var = 0.022
## SE = NA
## CI = [NA, NA]
## P = 0.00874 [LRT]
## 0.211 [Permutation]
##
## Variance of Population
## --------------------------------
## Link-scale approximation:
## Var = 0.304
## SE = NA
## CI = [NA, NA]
## P = 1.19e-16 [LRT]
## 0.088 [Permutation]
##
## Variance of Overdispersion
## --------------------------------
## Link-scale approximation:
## Var = 0.103
## SE = NA
## CI = [NA, NA]
## P = 2.22e-17 [LRT]
## NA [Permutation]
It is possible to use both Overdisperion
and
Residual
in combination with ratio = FALSE
or
with ratio = TRUE
. The ratio of the residual variance to
the total variance should then be interpreted as the
‘non-repeatability’. We assume that the combination of these two terms
with ratio = FALSE
will be more commonly used.
It is possible to fit random-slope models following the approach
introduced by Johnson (2014). The function call will return the average
repeatability across the distribution of the covariate in the sample
(see Johnson 2014 for details). Permutation tests and likelihood ratio
test are conducted with random-intercept and random-slope terms being
jointly removed from the model. Note that random-slope models can be
specified in multiple ways and the function will likely handle only
those models well that have been specified in the classical way
(foo | grname)
, implicitly fitting random-intercepts,
random slopes for the covariate foo
and their correlation.
Multiple random-slopes are possible, but models with covariances being
constraint do currently not work.
The following code gives and example for a repeatability estimation
for body size where random slopes are fitted for Sex
and
Treatment
at the level of populations. The resulting model
allows for variance components to vary by sex and by treatment with the
average across the sex/treatment levels being returned as the model
output.
rep16 <- rpt(BodyL ~ Treatment + Sex + (1|Container) + (Treatment + Sex|Population), grname=c("Population"), data=BeetlesBody, datatype="Gaussian", nboot=500, npermut=0, adjusted=FALSE)
##
## Repeatability estimation using the lmm method
##
## Call = rpt(formula = BodyL ~ Treatment + Sex + (1 | Container) + (Treatment + Sex | Population), grname = c("Population"), data = BeetlesBody, datatype = "Gaussian", nboot = 500, npermut = 0, adjusted = FALSE)
##
## Data: 960 observations
## ----------------------------------------
##
## Population (12 groups)
##
## Repeatability estimation overview:
## R SE 2.5% 97.5% P_permut LRT_P
## 0.303 0.0854 0.129 0.473 NA 0
##
## Bootstrapping and Permutation test:
## N Mean Median 2.5% 97.5%
## boot 500 0.293 0.294 0.129 0.473
## permut 1 NA NA NA NA
##
## Likelihood ratio test:
## logLik full model = -1526.295
## logLik red. model = -1595.399
## D = 138, df = 6, P = 2.39e-27
##
## ----------------------------------------
The rptR
package offers convenient extraction of (ratios
of) variance components for multiple grouping levels for estimating
repeatabilities, including the option to control for fixed effects in
order to estimate adjusted repeatabilities. While this is easy-to-use we
want to remind users to make sure their data is appropriately modeled.
We therefore provide the fitted mixed model as part of the
rpt
object and recommend that this is inspected for any
indications of violated model assumptions or missing predictors.
Repeatability estimates from Poisson, binary and proportion data might
often appear disappointingly low. This is likely to be a simple
reflection of the fact that for non-Gaussian data, the numerator
contains the among-group variance in predispositions (e.g. probabilities
of successes, long-range averages of counts) while the observed data is
discretely scattered around these expectations. A lot of the phenotypic
variance is hence inherently random in origin and each datapoint
contains only very limited information about the underlying
predisposition. The estimation of repeatabilities for count, binary and
proportion data will thus require substantial sample sizes and (ideally)
optimized sampling designs.
Bates, D., M. Maechler, B. Bolker & S. Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67: 1-48.
Johnson, P.C.D. (2014). Extension of Nakagawa & Schielzeth’s RGLMM2 to random slopes models. Methods in Ecology and Evolution 5: 944-946
Nakagawa, S. & H. Schielzeth (2010). Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85: 935-956.
Nakagawa, S. & H. Schielzeth (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution 4: 133-142.
Nakagawa, S. & H. Schielzeth (2016). The coefficient of determination R2 and intra-class correlation ICC from generalized linear mixed-effects models revisited and expanded. bioRxive: doi 10.1101/095851.
Manly, B. F. J. (2007). Randomization, bootstrap and Monte Carlo methods in Biology. 3rd edn. Chapman & Hall, Boca Rato.
Schielzeth, H. & S. Nakagawa (2013). Nested by design: Model fitting and interpretation in a mixed model era. Methods in Ecology and Evolution 4: 14-24.