15.3 Count models

15.3.1 Data explanation {glmm-count-data}

We will wrap up our discussions about GLMM with a worked example of count models. For this one, we will attempt to predict counts of walleye, Sander vitreus, in spawning streams of Otsego Lake based on historical counts and climate data.

We begin by reading in the data set:

# Read in the walleye data
eyes <- read.csv("data/walleye.csv", stringsAsFactors = FALSE)

Have a look at the first ten lines of the data set:

head(eyes, 10)

And check out the data structure:

str(eyes)
## 'data.frame':    71 obs. of  15 variables:
##  $ date     : chr  "2009-04-01" "2009-04-01" "2009-04-01" "2009-04-05" ...
##  $ site     : chr  "Cripple Creek" "Hayden Creek" "Shadow Brook" "Cripple Creek" ...
##  $ counts   : int  1 1 1 4 61 17 50 10 1 35 ...
##  $ high_f   : num  48 48 48 52 52 52 NA NA 39 39 ...
##  $ low_f    : num  34 34 34 32 32 32 NA NA 28 28 ...
##  $ high_c   : num  8.89 8.89 8.89 11.11 11.11 ...
##  $ low_c    : num  1.11 1.11 1.11 0 0 ...
##  $ mean_c   : num  5 5 5 5.56 5.56 ...
##  $ ddPrep   : num  5 5 5 5.56 5.56 ...
##  $ day      : int  91 91 91 95 95 95 97 97 99 99 ...
##  $ year     : int  2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
##  $ dd       : num  72.4 72.4 72.4 109.3 109.3 ...
##  $ dd2      : num  5240 5240 5240 11954 11954 ...
##  $ daylight : num  12.7 12.7 12.7 12.9 12.9 ...
##  $ daylight2: num  161 161 161 166 166 ...

These data are counts of walleye that were captured in spawning tributaries of Otsego Lake during the 2009, 2013, 2017,and 2018 spawning season. These measurements are accompanied by various environmental indicators that include high and low flows, precipitation (rain and snow), high, low, mean temperatures (c) and degree days (dd), and photoperiod (daylight) on each day of the year.

We will use the data to predict number of walleye we expect to see each day in the spawning tribs based on historical counts and some explanatory variables of interest.

15.3.2 Data analysis

We start by estimating a model using the glmer() function. Let’s say for the sake of argument that we are simply interested in the lake-wide mean of our counts so that we know when students should, for example, be heading out to tributaries to look for walleyes in streams.

For now, we will model walleye count as a function of photoperiod, with a random effect of site on the intercepts. This model assumes that there is variability in counts of spawning individuals between sites, but that the relationship between photoperiod and count is the same across all sites. In this case, we will specify a quadratic relationship between counts and dates because we expect the number of fish to increase to some point in the run before it decreases. We are not interested in the individual sites in this case, but need to account for repeated observations within spawning locations.

In the lme4 package, the model might look something like this:

# Load the package
library(lme4)

# Make the model
waeMod1 <- glmer(counts ~ dd + I(dd^2) + (1 | site), data = eyes, family = poisson)

# Have a look-see at the results
summary(waeMod1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: counts ~ dd + I(dd^2) + (1 | site)
##    Data: eyes
## 
##      AIC      BIC   logLik deviance df.resid 
##   1440.0   1449.1   -716.0   1432.0       67 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -6.150 -2.289 -1.159  1.948 12.425 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  site   (Intercept) 0.01095  0.1047  
## Number of obs: 71, groups:  site, 3
## 
## Fixed effects:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.087e+00  4.066e-01  -10.05   <2e-16 ***
## dd           9.275e-02  5.248e-03   17.67   <2e-16 ***
## I(dd^2)     -2.795e-04  1.666e-05  -16.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) dd    
## dd      -0.975       
## I(dd^2)  0.945 -0.990
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## unable to evaluate scaled gradient
## Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

As we look through these results, we can see that we have a significant effect of degree days on spawning behavior. What’s more is that our count of spawning fish appears to increase during the year to a point before it starts to decrease.

But we’ve got what appear to be some major issues related to convergence at the bottom of this output! [dun, dun, dun]

Luckily, we can fix all of these issues by simply standardizing the covariate (dd) as discussed previously. This helps keep things on a unit scale for model estimation, and prevents wacky estimates like negative variances (!). You can think of this as calculating z-scores for each observation of a given variable.

# Standardize the covariate
eyes$sdd <- as.vector(scale(eyes$dd))

# Make the model
waeMod2 <- glmer(counts ~ sdd + I(sdd^2) + (1 | site),
  data = eyes, family = poisson
)

# Have a look-see at the results
summary(waeMod2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: counts ~ sdd + I(sdd^2) + (1 | site)
##    Data: eyes
## 
##      AIC      BIC   logLik deviance df.resid 
##   1440.0   1449.1   -716.0   1432.0       67 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -6.150 -2.289 -1.159  1.948 12.425 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  site   (Intercept) 0.01095  0.1047  
## Number of obs: 71, groups:  site, 3
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.52854    0.07123   49.54   <2e-16 ***
## sdd          0.45898    0.03737   12.28   <2e-16 ***
## I(sdd^2)    -0.67374    0.04018  -16.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr) sdd   
## sdd      -0.001       
## I(sdd^2) -0.292 -0.297

15.3.3 Predictions

Now, if we want, we can make a graph to show these predictions. Here, we make predictions for all years, and plot by site.

# Simulate predictions from the relationship
# stored in the model fit using our original data
log_preds <- predictInterval(
  merMod = waeMod2,
  level = 0.95, n.sims = 10000,
  stat = "median", type = "linear.prediction"
)

# Convert them to the real scale for plotting
real_preds <- apply(log_preds, 2, exp)

# Combine predictions with the original data
mer_preds <- data.frame(eyes, real_preds)

# Plot the predictions
ggplot(mer_preds, aes(x = dd, y = counts, color = site, fill = site)) +
  geom_point(alpha = 0.10) +
  geom_ribbon(aes(ymin = lwr, ymax = upr, color = NULL), alpha = .3) +
  geom_line(aes(y = fit), lwd = 1, alpha = 0.50) +
  xlab("Growing degree days") +
  ylab("Number of walleye in stream")

We can see that our mean predictions aren’t terrible, but there is quite a bit of uncertainty here, as above. In this case, it may behoove us to look at this within the context of negative binomial GLMMs, but that is a story for another day (and class!!).