16.1 Introduction

In the second half of this decidedly worst stats text, we are going to introduce and apply Bayesian inference. In class, this shift aligns with the introduction of GLM about halfway through the semester for reasons that I hope will become obvious to the astute learner. If not, my rationale for introducing these concepts together is 1) both of these tools rely heavily upon knowledge of sampling distributions that we began to build earlier, and 2) we are beginning to move into the realm where we will become more reliant upon methods of estimation other than ordinary least squares (OLS), which is what we used to estimate models during the first half of this text and class (ANOVA, linear regression, and ANCOVA).

In Chapters 12 and 13 we introduced maximum likelihood estimation, and we extended this to include restricted maximum likelihood estimation in Chapter 15. The fact is that most biologists will never understand the mechanical and mathematical underpinnings of those estimation methods either, but they won’t hesitate to use them, so why not take a look at Bayesian estimation tools as well? Plus, as you will see, this approach will allow us to do a whole bunch of stuff we just can’t do using MLE in most software packages.

During the next several chapters, I am hoping that we can dive into the basic underpinnings of the Bayesian framework for scientific inference, along with maximum likelihood estimation, as we move through more complex extensions of linear models. The Bayesian framework has been around for a long time, but only recently has it become broadly applicable to common analytical problems. There are some fundamental (and philosophical) differences between the use of maximum likelihood (aka “frequentist”) methods and the application of Bayes’ theorem for answering statistical questions. I am hoping we can touch on some of these during our discussions and show the real, practical strengths of maximum likelihood and Bayesian inference that might actually make you want to use one or the other for certain applications.

For better or worse, there is no way we can possibly do a comprehensive treatment of Bayesian statistics within the context of a survey-style, applied statistics course or textbook such as this. We can, however, set you up with some basic tools so you can apply Bayesian inference to commonly encountered situations (t-tests, regression, ANOVA, ANCOVA, GLM, LMM, and GLMM) that should allow you to explore these concepts on your own in the future. To achieve this, I would like for us to cover Bayesian analogs to some of the frequentist tests that we have considered so far this semester. For this reason, we will explore maximum likelihood and Bayesian estimation methods side by side while learning new techniques during class.

Even though we are switching out our estimation method in this Part, we’ll continue to work with the tidyverse. Be sure to load it whenever you are ready to get started.

library(tidyverse)

16.1.1 Installing RStan

For this book, we will introduce Bayesian estimation using a similar approach to what we learned for GLM. You need to pause here and appreciate how ridiculously easy folks have made this for you. I used to teach students how to code each of their models by hand in this class. We had to package the data in special ways, write out separate model files with pre-defined likelihoods, and manage the monstrous results lists by hand. Now, thanks to the functionality provided by packages such as RStan and rstanarm, you can do this without ever leaving the comfort of R. Do note, though, that the real power as a modeler still lies in the ability to formulate and specify these models explicitly. This opens up whole new possibilities for model and data structuring that can’t be achieved in any single R package. But that is for another course and a much better textbook (eventually I will add citations…or not…this is The Worst Stats Text eveR). We will discuss how the actual estimation works as we go along (and in a less diffuse fashion in class), but you’ll need to spend a significant amount of time on your own if you are looking for a deep understanding of the mechanics. You’ll probably want to build up to the Stan User’s Guide rather than starting there if this is your first exposure to Bayesian estimation methods.

The hardest part about getting started with Stan is installing the necessary R packages, but this is getting easier all the time.

We are going to install two R packages here: RStan, the R interface to the Stan software, and rstanarm, a package that provides Bayesian generalized linear models via Stan. The rstanarm package allows us to fit everything from the ANOVAs discussed in Chapter 7 to the generalized linear mixed models discussed in Chapter 15.

But, first we need to do a little work to get ready. Here’s what we are going to do:

Step 1: Configure the C++ toolchain on your operating system following the instructions provided by the Stan Development Team on the RStan wiki

Step 2: Install RStan following the next step on the same wiki here.

Step 3: Install the rstanarm package by running: install.packages("rstanarm").

Once you’ve done that, don’t forget to load it:

library(rstanarm)
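
If you want a quick sanity check that the whole toolchain (compiler, RStan, and rstanarm) is working, a minimal sketch like the one below should compile and sample without errors. This is just an illustrative test fit using the built-in mtcars data; the small chains and iter settings are only there to keep the run short, not recommendations for real analyses.

```r
# Fit a simple Bayesian linear regression with rstanarm to confirm
# that Stan can compile and sample on your machine.
library(rstanarm)

fit <- stan_glm(
  mpg ~ wt,            # miles per gallon as a function of vehicle weight
  data = mtcars,
  family = gaussian(), # ordinary linear regression, Bayesian-style
  chains = 2,          # small settings so the check runs quickly
  iter = 1000,
  refresh = 0          # suppress the sampler's progress output
)

# Posterior summaries; if this prints without errors, you are set up
summary(fit)
```

The first fit will take a little while because Stan compiles the model to C++ before sampling; that pause is normal and not a sign that something is broken.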