16.3 The prior

The prior distribution, in simple terms, captures the information we have at our disposal before collecting any further data. That information might come in the form of hard numbers collected through a pilot study, or it might come from some logical process based on deductive reasoning. As we will see, the latter form of knowledge can be really useful for establishing book-ends on the values we consider plausible.

One of the really attractive aspects of Bayesian inference is that we can incorporate information from prior experiences into our statistical models. The advantage is that we can start off with some information and then collect new information to update our beliefs. Why would we want to do this? Glad you asked:

1. Improved inference: The use of an informed prior allows us to improve the precision of our parameter estimates by narrowing the range of credible values that “the machine” considers. If we don’t have a ton of data, for example, a strong prior can keep our estimates within a realistic range of values.

2. Adaptive research: Incorporating information from previous studies allows us to update our scientific beliefs iteratively. If we have data from a similar study, or a previous year of study, we can use that information moving forward to obtain more accurate and precise estimates of the parameters of interest, either by adjusting our prior or by including the additional data directly.

3. Hypothesis testing: We can use different formulations of the prior distribution to test specific hypotheses about the probability of the event of interest. For example, if we suspect that the probability of a patient surviving an operation is strongly related to the patient’s age (or some other pre-existing condition), then we could test different formulations of the prior and see which one results in a better fit to our data. These days I tend to favor cross-validation criteria for Bayesian model selection for this purpose.

4. Incorporation of uncertainty: If there is a lot of uncertainty about the event of interest, we can set a very “weak” or “diffuse” prior. When the prior is extremely diffuse (e.g., a uniform or “flat” prior), Bayesian inference will yield results that are essentially identical to what we expect from maximum likelihood estimation (see the short sketch after this list). The only noticeable difference may be increased precision or accuracy under Bayesian inference in some cases, depending on the estimator we use: some maximum likelihood estimators do poorly in situations where Bayesian estimation does just fine.
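To see point 4 in action, here is a minimal sketch comparing the two approaches for a single survival probability, using made-up numbers (18 survivals out of 25 patients) rather than real data. With a binomial likelihood and a flat Beta(1, 1) prior, the posterior is available in closed form, so we can compare directly:

# Made-up data: 18 survivals out of 25 patients
y <- 18
n <- 25

# Maximum likelihood estimate of the survival probability
mle <- y / n

# A flat (uniform) prior is a Beta(1, 1), and combined with
# a binomial likelihood it gives a Beta(y + 1, n - y + 1)
# posterior, whose mode is exactly the MLE
post_mean <- (y + 1) / (n + 2)
post_mode <- y / n

c(mle = mle, post_mean = post_mean, post_mode = post_mode)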

So let’s go through a couple of examples of what a prior distribution actually looks like.

16.3.1 The hospital example

For this example, let’s assume we are interested in the survival of a hospital patient. Survival will be denoted as a ‘success’ (1) and mortality as a ‘failure’ (0). In this sense, we are dealing with a binomial outcome. But, remember, we can always represent binomial outcomes on the probability scale…right?
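As a quick illustration, we can simulate this kind of 0/1 outcome in R using a made-up survival probability (0.8 here, chosen arbitrarily) and see that the mean of the outcomes recovers the probability of success:

# Simulate survival (1) or mortality (0) for 100
# hypothetical patients with a true survival
# probability of 0.8
outcomes <- rbinom(100, size = 1, prob = 0.8)

# The proportion of 1s estimates the underlying
# survival probability
mean(outcomes)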

In this case, let’s say that we are assuming a priori that survival might be due to random chance, or that it might be influenced by some factor of interest (we’ll use “hospital” in the example below).

There are multiple approaches that we could take to formulating a prior distribution for this case.

# A uniform distribution that indicates we
# have no knowledge about how survival
# varies between hospitals
flat <- runif(1e4, 0, 1)

# A diffuse prior that indicates we think
# survival is the same between hospitals but
# we don't want to make too strong a statement
diffuse <- rbeta(1e4, 5, 5)

# A peaked (strong) prior that indicates we
# are relatively certain ahead of time that
# survival is the same in both hospitals
strong <- rbeta(1e4, 500, 500)

# A strong prior that indicates we think
# survival is substantially different
# between hospitals
bimodal <- rbeta(1e4, 0.5, 0.5)

We can combine these into a dataframe for visualizing them, and then use the pivot_longer() function to stack the dataframe for plotting:

# Requires the tidyverse (for %>% and pivot_longer())
library(tidyverse)

priors <- data.frame(flat, diffuse, strong, bimodal) %>%
  pivot_longer(cols = c(flat, diffuse, strong, bimodal))

We can look at these to compare them. Note that the x-axis is the same in all of the plots below, so changes in the location, shape, and spread are all controlled by the differences in the parameters used to specify each of these priors.

ggplot(priors, aes(x = value, color = name, fill = name)) +
  geom_histogram(bins = 20) +
  facet_wrap(~name, scales = "free_y")

You can see how different these priors are from one another. Hopefully, you are also starting to think about the different kinds of hypotheses we might test with them. In this case, the question we are always trying to address is whether or not survival (or whatever our outcome is) is due only to random chance. This is like asking whether or not a coin we toss is fair, or whether it has some bias (say, that it is more likely to land heads up because it is heavier on one side).
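For a concrete sense of how these priors encode different hypotheses, we can ask how much prior probability each one places on survival being roughly “fair”, i.e., within 0.10 of 0.5 (the 0.10 window is arbitrary and chosen purely for illustration):

# Prior probability that survival is within 0.10 of 0.5
# under each of the priors simulated above
sapply(
  list(flat = flat, diffuse = diffuse,
       strong = strong, bimodal = bimodal),
  function(p) mean(abs(p - 0.5) < 0.1)
)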

Now that we have a prior distribution for our event of interest, we can go out into the world and collect some data about that event. We then use those data to formulate a ‘posterior’ distribution that reflects some combination of our prior distribution and the data we have collected. This process is commonly referred to as ‘updating’ our prior beliefs about the event of interest, and it is the foundation of Bayesian inference. How we get from the prior to the posterior depends entirely on the tools we use to solve Bayes’ theorem, but most often this happens through Markov chain Monte Carlo (MCMC) simulation. MCMC works through Bayes’ theorem one set of parameter values at a time, giving us a heuristic, simulation-based approximation to the posterior. We will discuss this in some (but not too much!) detail as we move forward.
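In the simple hospital example, we do not actually need MCMC: a beta prior combined with binomial data yields a beta posterior in closed form. Here is a minimal sketch of that update, assuming made-up data (40 survivals out of 50 patients) and the diffuse Beta(5, 5) prior from above:

# Made-up data: 40 survivals out of 50 patients
y <- 40
n <- 50

# A Beta(5, 5) prior updated with binomial data gives a
# Beta(5 + y, 5 + n - y) posterior (beta-binomial conjugacy)
a_post <- 5 + y
b_post <- 5 + n - y

# Posterior mean and 95% credible interval for the
# survival probability
a_post / (a_post + b_post)
qbeta(c(0.025, 0.975), a_post, b_post)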

Before moving on, it is important to note that our prior beliefs can have a strong influence on the posterior distribution. This has been the subject of much controversy in the application of Bayesian inference to modern scientific study. In biological and ecological studies specifically, our goal in using prior information should not be to dominate the posterior with our prior beliefs. It should be to support improved inference through the inclusion of relevant information, which can be extremely helpful when data are somewhat deficient. We want our data to dominate the form of the posterior distributions that result from our analyses. If that is not the case, then we need to be explicit about it, and we should almost always evaluate the “sensitivity” of our posterior distribution(s) to the prior(s) we have chosen. This is an area of ongoing development in several disciplines, and I encourage you to seek out the relevant literature if you intend to use Bayesian inference in your own research.
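As a starting point for that kind of sensitivity check, here is a minimal sketch that applies each of the four priors from this section to the same made-up data (again, 40 survivals out of 50 patients) and compares the posterior means, all of which are available in closed form here:

# Beta parameters for the four priors used in this section
# (the uniform prior is equivalent to a Beta(1, 1))
priors_ab <- list(
  flat = c(1, 1), diffuse = c(5, 5),
  strong = c(500, 500), bimodal = c(0.5, 0.5)
)

# Posterior mean of survival under each prior for the
# same data: 40 survivals out of 50 patients
sapply(priors_ab, function(ab) {
  (ab[1] + 40) / (ab[1] + ab[2] + 50)
})

Note how the very strong Beta(500, 500) prior pulls the posterior mean back toward 0.5 while the other three priors give similar answers; that is exactly the kind of behavior a sensitivity analysis is meant to flag.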