14.3 Linear mixed models
We will start our exploration of GLMMs by looking at the somewhat familiar case of “normal” data, whatever mythical meaning that may have. Just as ANOVA is a special case of the GLM, the linear mixed model (LMM) is a special case of the GLMM (hence the name). Both belong to the family of multi-level, or hierarchical, models, which houses basically every kind of model we have looked at this semester.
So, what is a mixed model? Generally speaking, it is a model that assumes at least one parameter of interest (an intercept or a slope, say) varies among groups, with the group-specific values drawn from a common population of possible values. We usually use mixed models when we have repeated samples from some group or individual, or when we wish to account for some latent variable beyond our control (e.g., lake or year). Random effects allow us to remove extraneous noise (variance) from the study system by explicitly accounting for it. This can improve both the accuracy and the precision of our estimates, which makes hypothesis tests on other explanatory variables more robust. It also allows us to generalize our conclusions to a broader scope (e.g., any lake instead of just lakes X, Y, and Z).
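To make the "removing noise" idea concrete, here is a minimal simulation sketch in plain Python. Everything in it is made up for illustration: the number of lakes, the sample size per lake, and both standard deviations are assumptions, and the within-lake variance is computed by hand rather than by fitting a mixed model. It only shows how much of the apparent noise is really among-lake variation.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical setup: 8 lakes, 25 fish per lake, made-up variances
n_lakes, n_per_lake = 8, 25
sd_lake, sd_resid = 2.0, 1.0   # among-lake and within-lake standard deviations
grand_mean = 10.0

data = {}
for lake in range(n_lakes):
    lake_effect = random.gauss(0.0, sd_lake)  # random intercept for this lake
    data[lake] = [
        grand_mean + lake_effect + random.gauss(0.0, sd_resid)
        for _ in range(n_per_lake)
    ]

# Ignoring lake: all among-lake variation ends up in the "noise"
all_obs = [y for ys in data.values() for y in ys]
var_ignoring_lake = statistics.variance(all_obs)

# Accounting for lake: variance of observations around their own lake mean,
# averaged over lakes (equal sample sizes, so a simple mean is a pooled estimate)
var_within_lake = statistics.mean([statistics.variance(ys) for ys in data.values()])

print(f"variance ignoring lake: {var_ignoring_lake:.2f}")
print(f"variance within lakes:  {var_within_lake:.2f}")
```

The second number should be much smaller than the first, because the lake-to-lake differences have been set aside instead of being lumped in with the residual error.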
Beyond these mundane uses, a “multi-level” approach to modeling allows for a great deal of flexibility in the assumptions we make about the effects and associated errors in our model. We might allow effects to differ among populations by assigning random intercepts, random slopes, or both. We can specify whether we think the influence of a continuous covariate is correlated with the starting point (correlated random slopes and intercepts). There are even rare cases when we might wish to examine random slopes with shared intercepts, or vice versa. In Bayesian inference we can use information at higher levels of organization, like the North American range of a species, to inform parameter estimation at lower levels, such as individual populations.
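One way to write these options down, as a sketch, is shown here for a single continuous covariate. The notation is an assumption made for illustration (observation i in group j, fixed effects as betas, group-level deviations as b's), not notation used elsewhere in these notes.

```latex
% Linear mixed model with random intercepts and random slopes
y_{ij} = (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\, x_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)

% Group-level deviations share a joint distribution
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix}
\sim \mathrm{MVN}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix}
    \sigma_0^2 & \rho\,\sigma_0\sigma_1 \\
    \rho\,\sigma_0\sigma_1 & \sigma_1^2
  \end{pmatrix}
\right)
```

Under this notation, a random-intercepts model sets every b_{1j} to zero, random slopes with a shared intercept sets every b_{0j} to zero, and uncorrelated random slopes and intercepts corresponds to fixing \rho at zero.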
The point here is that random effects on a given parameter need not be a “nuisance” for which we wish to account: it may be the very thing we wish to harness for inference, estimation, or prediction.
As with so many things, these tools are often best investigated through a worked example. Generally speaking, we want the grouping variable we use to specify random effects to contain a relatively large number of levels (usually > 5, but often > 10), as this tends to result in more accurate and more precise parameter estimates. To start, we will look at a case that uses fewer levels for the sake of demonstration.
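Before that, a quick simulation sketch can show why the number of levels matters. This is a deliberate simplification, not a mixed-model fit: it pretends we can observe each group's effect directly, estimates the among-group standard deviation from those effects, and asks how much that estimate bounces around from one hypothetical study to the next. The function name, the true standard deviation of 1, and the number of replicates are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def spread_of_sd_estimates(n_groups, true_sd=1.0, n_reps=2000):
    """Repeat a hypothetical study n_reps times and return the standard
    deviation of the estimated among-group SDs (smaller = more precise)."""
    estimates = []
    for _ in range(n_reps):
        # Draw one effect per group from the population of group effects
        effects = [random.gauss(0.0, true_sd) for _ in range(n_groups)]
        # Estimate the among-group SD from this one set of groups
        estimates.append(statistics.stdev(effects))
    return statistics.stdev(estimates)


# Compare precision of the among-group SD estimate across numbers of levels
for k in (3, 5, 10, 30):
    print(f"{k:>2} groups: spread of SD estimates = {spread_of_sd_estimates(k):.3f}")
```

The spread should shrink steadily as the number of groups grows, which is the intuition behind the rule of thumb above: with only a handful of levels, the variance of the random effect is itself poorly estimated.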