11.1 Introduction

As we have learned over the past couple of weeks, we often encounter situations in which there are multiple, competing hypotheses about which factors, or combinations of factors, best explain the observed patterns in our response of interest. This uncertainty arises for two primary reasons:

1. Complexity of the study system

Biological systems are complex, and often we are interested in which factor, or set of factors, best predicts the patterns we observe in the natural world. In carefully designed experiments, we might be interested in evaluating competing hypotheses about mechanistic drivers of biological phenomena. In complex observational studies, we might simply wish to know which factor, or subset of possible factors, best predicts the patterns we observe, with the understanding that these findings cannot be used to infer causality (or ‘mechanism’), although they can help us design better studies that can.

2. Collinearity

Oh, snap! What did he just say? Collinearity is the idea that certain explanatory variables are related to one another. I know, I know; last week I told you that independence of observations was one of the fundamental assumptions that we make about linear models. That assumption is about the rows of our data: all observations are sampled independently of one another. Collinearity is a separate issue that concerns the columns: ideally, our explanatory variables would be unrelated to one another, too. This is a nice ideal, and in certain experimental designs that are “orthogonal”, we can ensure that variables are not collinear. But, in the real world, this is almost never the case.
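To make the idea concrete, here is a minimal sketch using simulated data in base R. The variable names (`x1`, `x2`, `y`) and all of the numbers are made up for illustration and do not come from any data set used in this chapter. It builds one explanatory variable largely out of another, checks how strongly the two are correlated, and then compares the standard error of the `x1` coefficient with and without its collinear partner in the model.

```r
# Simulated example: all names and values are hypothetical
set.seed(42)
n <- 100

x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # x2 is built mostly from x1, so the two are collinear
y  <- 1 + 2 * x1 + rnorm(n)          # the response depends only on x1

# How strongly are the two explanatory variables related?
cor_x1_x2 <- cor(x1, x2)

# Fit a model with x1 alone, and one with both x1 and x2
fit_x1   <- lm(y ~ x1)
fit_both <- lm(y ~ x1 + x2)

# Pull out the standard error of the x1 coefficient from each model
se_alone <- summary(fit_x1)$coefficients["x1", "Std. Error"]
se_both  <- summary(fit_both)$coefficients["x1", "Std. Error"]

cor_x1_x2
c(alone = se_alone, with_x2 = se_both)
```

When two explanatory variables carry much of the same information, the model has a hard time deciding which one deserves the credit, and that typically shows up as a larger standard error on the coefficients involved.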

Model selection offers a means for us to weigh the effects of collinearity against the information that is gained by including explanatory variables that are related to one another. In real-world cases, our best model will almost always fall somewhere between a model that contains all of the variables we want to include and a model that contains only one of those variables.

However, model selection is just as useful for testing hypotheses in rigorously designed, controlled experiments. And, as we’ll see, it can often provide a more meaningful interpretation of those hypotheses than p-values alone.

We will be working out of the tidyverse as usual for this chapter. We will also be working with functions from the AICcmodavg package. You will need to install AICcmodavg if you do not already have it installed.
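One way to get set up is sketched below. It assumes that you install packages from CRAN with `install.packages()` and that the tidyverse is already installed from earlier chapters; adjust it if your setup differs.

```r
# Install AICcmodavg only if it is not already available
if (!requireNamespace("AICcmodavg", quietly = TRUE)) {
  install.packages("AICcmodavg")
}

# Load the packages used in this chapter
library(tidyverse)
library(AICcmodavg)
```

Installation only needs to happen once per computer, but the `library()` calls need to be run at the start of every new R session.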