Title: | Exact (Restricted) Likelihood Ratio Tests for Mixed and Additive Models |
---|---|
Description: | Rapid, simulation-based exact (restricted) likelihood ratio tests for testing the presence of variance components/nonparametric terms for models fit with nlme::lme(),lme4::lmer(), lmeTest::lmer(), gamm4::gamm4(), mgcv::gamm() and SemiPar::spm(). |
Authors: | Fabian Scheipl [aut, cre] , Ben Bolker [ctb] |
Maintainer: | Fabian Scheipl <[email protected]> |
License: | GPL |
Version: | 3.1-8 |
Built: | 2024-11-04 05:03:58 UTC |
Source: | https://github.com/fabian-s/rlrsim |
RLRsim
implements fast simulation-based exact tests for variance components in mixed and additive models for
conditionally Gaussian responses – i.e., tests for questions like:
is the variance of my random intercept significantly different from 0?
is this smooth effect significantly nonlinear?
is this smooth effect significantly different from a constant effect?
The convenience functions exactRLRT
and exactLRT
can deal with fitted models from packages lme4, nlme, gamm4, SemiPar and
from mgcv's gamm()
-function.
Workhorse functions LRTSim
and RLRTSim
accept design matrices as inputs directly and can thus be used more generally
to generate exact critical values for the corresponding
(restricted) likelihood ratio tests.
The theory behind these tests was first developed in:
Crainiceanu, C. and Ruppert, D. (2004)
Likelihood ratio tests in
linear mixed models with one variance component, Journal of the Royal
Statistical Society: Series B, 66, 165–185.
Power analyses and sensitivity studies for RLRsim can be found in:
Scheipl, F., Greven, S. and Kuechenhoff, H. (2008)
Size and power of tests
for a zero random effect variance or polynomial regression in additive and
linear mixed models. Computational Statistics and Data Analysis,
52(7), 3283–3299, doi:10.1016/j.csda.2007.10.022.
Fabian Scheipl ([email protected]), Ben Bolker
This function provides an exact likelihood ratio test based on simulated values from the finite sample distribution for simultaneous testing of the presence of the variance component and some restrictions of the fixed effects in a simple linear mixed model with known correlation structure of the random effect and i.i.d. errors.
exactLRT( m, m0, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
exactLRT( m, m0, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
m |
The fitted model under the alternative; of class |
m0 |
The fitted model under the null hypothesis; of class |
seed |
Specify a seed for |
nsim |
Number of values to simulate |
log.grid.hi |
Lower value of the grid on the log scale. See
|
log.grid.lo |
Lower value of the grid on the log scale. See
|
gridlength |
Length of the grid. See |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no parallelization"). |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. Defaults to 1, i.e., no parallelization. |
cl |
An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the call. |
The model under the alternative must be a linear mixed model
with a single
random effect
with known correlation structure and error terms that
are i.i.d. The hypothesis to be tested must be of the form
versus
We use the exact finite sample distribution of the likelihood ratio test statistic as derived by Crainiceanu & Ruppert (2004).
A list of class htest
containing the following components:
statistic
the observed likelihood ratio
p
p-value for the observed test statistic
method
a character string indicating what type of test was
performed and how many values were simulated to determine the critical value
sample
the samples from the null distribution returned by
LRTSim
Fabian Scheipl, updates for lme4.0-compatibility by Ben Bolker
Crainiceanu, C. and Ruppert, D. (2004) Likelihood ratio tests in linear mixed models with one variance component, Journal of the Royal Statistical Society: Series B,66,165–185.
LRTSim
for the underlying simulation algorithm;
RLRTSim
and exactRLRT
for restricted likelihood
based tests
library(nlme); data(Orthodont); ##test for Sex:Age interaction and Subject-Intercept mA<-lme(distance ~ Sex * I(age - 11), random = ~ 1| Subject, data = Orthodont, method = "ML") m0<-lm(distance ~ Sex + I(age - 11), data = Orthodont) summary(mA) summary(m0) exactLRT(m = mA, m0 = m0)
library(nlme); data(Orthodont); ##test for Sex:Age interaction and Subject-Intercept mA<-lme(distance ~ Sex * I(age - 11), random = ~ 1| Subject, data = Orthodont, method = "ML") m0<-lm(distance ~ Sex + I(age - 11), data = Orthodont) summary(mA) summary(m0) exactLRT(m = mA, m0 = m0)
This function provides an (exact) restricted likelihood ratio test based on simulated values from the finite sample distribution for testing whether the variance of a random effect is 0 in a linear mixed model with known correlation structure of the tested random effect and i.i.d. errors.
exactRLRT( m, mA = NULL, m0 = NULL, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
exactRLRT( m, mA = NULL, m0 = NULL, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
m |
The fitted model under the alternative or, for testing in models
with multiple variance components, the reduced model containing only the
random effect to be tested (see Details), an |
mA |
The full model under the alternative for testing in models with multiple variance components |
m0 |
The model under the null for testing in models with multiple variance components |
seed |
input for |
nsim |
Number of values to simulate |
log.grid.hi |
Lower value of the grid on the log scale. See
|
log.grid.lo |
Lower value of the grid on the log scale. See
|
gridlength |
Length of the grid. See |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no parallelization"). |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. Defaults to 1, i.e., no parallelization. |
cl |
An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the call. |
Testing in models with only a single variance component require only the
first argument m
. For testing in models with multiple variance
components, the fitted model m
must contain only the random
effect set to zero under the null hypothesis, while mA
and m0
are the models under the alternative and the null, respectively. For models
with a single variance component, the simulated distribution is exact if the
number of parameters (fixed and random) is smaller than the number of
observations. Extensive simulation studies (see second reference below)
confirm that the application of the test to models with multiple variance
components is safe and the simulated distribution is correct as long as the
number of parameters (fixed and random) is smaller than the number of
observations and the nuisance variance components are not superfluous or
very small. We use the finite sample distribution of the restricted
likelihood ratio test statistic as derived by Crainiceanu & Ruppert (2004).
No simulation is performed if the observed test statistic is 0. (i.e., if the fit of the model fitted under the alternative is indistinguishable from the model fit under H0), since the p-value is always 1 in this case.
A list of class htest
containing the following components:
A list of class htest
containing the following components:
statistic
the observed likelihood ratio
p
p-value for the observed test statistic
method
a character string indicating what type of test was
performed and how many values were simulated to determine the critical value
sample
the samples from the null distribution returned by
RLRTSim
Fabian Scheipl, bug fixes by Andrzej Galecki, updates for lme4-compatibility by Ben Bolker
Crainiceanu, C. and Ruppert, D. (2004) Likelihood ratio tests in linear mixed models with one variance component, Journal of the Royal Statistical Society: Series B,66,165–185.
Greven, S., Crainiceanu, C., Kuechenhoff, H., and Peters, A. (2008) Restricted Likelihood Ratio Testing for Zero Variance Components in Linear Mixed Models, Journal of Computational and Graphical Statistics, 17 (4): 870–891.
Scheipl, F., Greven, S. and Kuechenhoff, H. (2008) Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Computational Statistics & Data Analysis, 52(7):3283–3299.
RLRTSim
for the underlying simulation algorithm;
exactLRT
for likelihood based tests
data(sleepstudy, package = "lme4") mA <- lme4::lmer(Reaction ~ I(Days-4.5) + (1|Subject) + (0 + I(Days-4.5)|Subject), data = sleepstudy) m0 <- update(mA, . ~ . - (0 + I(Days-4.5)|Subject)) m.slope <- update(mA, . ~ . - (1|Subject)) #test for subject specific slopes: exactRLRT(m.slope, mA, m0) library(mgcv) data(trees) #test quadratic trend vs. smooth alternative m.q<-gamm(I(log(Volume)) ~ Height + s(Girth, m = 3), data = trees, method = "REML")$lme exactRLRT(m.q) #test linear trend vs. smooth alternative m.l<-gamm(I(log(Volume)) ~ Height + s(Girth, m = 2), data = trees, method = "REML")$lme exactRLRT(m.l)
data(sleepstudy, package = "lme4") mA <- lme4::lmer(Reaction ~ I(Days-4.5) + (1|Subject) + (0 + I(Days-4.5)|Subject), data = sleepstudy) m0 <- update(mA, . ~ . - (0 + I(Days-4.5)|Subject)) m.slope <- update(mA, . ~ . - (1|Subject)) #test for subject specific slopes: exactRLRT(m.slope, mA, m0) library(mgcv) data(trees) #test quadratic trend vs. smooth alternative m.q<-gamm(I(log(Volume)) ~ Height + s(Girth, m = 3), data = trees, method = "REML")$lme exactRLRT(m.q) #test linear trend vs. smooth alternative m.l<-gamm(I(log(Volume)) ~ Height + s(Girth, m = 2), data = trees, method = "REML")$lme exactRLRT(m.l)
These functions extract various elements of the design of a fitted
lme
-, mer
or lmerMod
-Object. They are called by
exactRLRT
and exactLRT
.
extract.lmeDesign(m)
extract.lmeDesign(m)
m |
a fitted |
a a list with components
Vr
estimated covariance of the random effects divided by the
estimated variance of the residuals
X
design of the fixed effects
Z
design of the random effects
sigmasq
variance of the residuals
lambda
ratios of the variances of the random effects and the
variance of the residuals
y
response variable
Fabian Scheipl, extract.lmerModDesign
by Ben Bolker.
Many thanks to Andrzej Galecki and Tomasz Burzykowski for bug fixes.
library(nlme) design <- extract.lmeDesign(lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)) str(design)
library(nlme) design <- extract.lmeDesign(lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)) str(design)
These functions simulate values from the (exact) finite sample distribution
of the (restricted) likelihood ratio statistic for testing the presence of
the variance component (and restrictions of the fixed effects) in a simple
linear mixed model with known correlation structure of the random effect and
i.i.d. errors. They are usually called by exactLRT
or
exactRLRT
.
LRTSim( X, Z, q, sqrt.Sigma, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
LRTSim( X, Z, q, sqrt.Sigma, seed = NA, nsim = 10000, log.grid.hi = 8, log.grid.lo = -10, gridlength = 200, parallel = c("no", "multicore", "snow"), ncpus = 1L, cl = NULL )
X |
The fixed effects design matrix of the model under the alternative |
Z |
The random effects design matrix of the model under the alternative |
q |
The number of parameters restrictions on the fixed effects (see Details) |
sqrt.Sigma |
The upper triangular Cholesky factor of the correlation matrix of the random effect |
seed |
Specify a seed for |
nsim |
Number of values to simulate |
log.grid.hi |
Lower value of the grid on the log scale. See Details |
log.grid.lo |
Lower value of the grid on the log scale. See Details |
gridlength |
Length of the grid for the grid search over lambda. See Details |
parallel |
The type of parallel operation to be used (if any). If missing, the default is "no parallelization"). |
ncpus |
integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs. Defaults to 1, i.e., no parallelization. |
cl |
An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the call. |
The model under the alternative must be a linear mixed model
with a single random
effect
with known correlation structure
and i.i.d errors.
The simulated distribution of the likelihood ratio statistic was derived by
Crainiceanu & Ruppert (2004). The simulation algorithm uses a grid search over
a log-regular grid of values of
to
maximize the likelihood under the alternative for
nsim
realizations of
drawn under the null hypothesis.
log.grid.hi
and
log.grid.lo
are the lower and upper limits of this grid on the log
scale. gridlength
is the number of points on the grid.\ These are just
wrapper functions for the underlying C code.
A vector containing the the simulated values of the (R)LRT under the
null, with attribute 'lambda' giving (see
Crainiceanu, Ruppert (2004)) for the simulations.
Fabian Scheipl; parallelization code adapted from boot
package
Crainiceanu, C. and Ruppert, D. (2004) Likelihood ratio tests in linear mixed models with one variance component, Journal of the Royal Statistical Society: Series B,66,165–185.
Scheipl, F. (2007) Testing for nonparametric terms and random effects in structured additive regression. Diploma thesis (unpublished).
Scheipl, F., Greven, S. and Kuechenhoff, H (2008) Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models, Computational Statistics & Data Analysis, 52(7):3283-3299
library(lme4) g <- rep(1:10, e = 10) x <- rnorm(100) y <- 0.1 * x + rnorm(100) m <- lmer(y ~ x + (1|g), REML=FALSE) m0 <- lm(y ~ 1) (obs.LRT <- 2*(logLik(m)-logLik(m0))) X <- getME(m,"X") Z <- t(as.matrix(getME(m,"Zt"))) sim.LRT <- LRTSim(X, Z, 1, diag(10)) (pval <- mean(sim.LRT > obs.LRT))
library(lme4) g <- rep(1:10, e = 10) x <- rnorm(100) y <- 0.1 * x + rnorm(100) m <- lmer(y ~ x + (1|g), REML=FALSE) m0 <- lm(y ~ 1) (obs.LRT <- 2*(logLik(m)-logLik(m0))) X <- getME(m,"X") Z <- t(as.matrix(getME(m,"Zt"))) sim.LRT <- LRTSim(X, Z, 1, diag(10)) (pval <- mean(sim.LRT > obs.LRT))