9.4 The turtle problem

Let’s get some data to demonstrate these assumptions.

These data were collected from 2013 to 2015 on Kemp’s ridley sea turtles incidentally caught by anglers in the Gulf of Mexico. After being caught, the turtles were taken to a wildlife rehabilitation center so that fishing hooks could be removed and the turtles could recover.

# Read in the turtles data.
# It's a bit messy, so we will read it in with an extra
# option to strip leading and trailing white space.
turtles = read.csv('data/turtles.txt', header = TRUE, strip.white = TRUE)

Here is a quick explanation of the variables (columns) in the dataframe:

ID: turtle ID
Year: year of capture
Gear: the gear type with which the turtle was hooked
Width: the gape width of the hook
Removed: the location from which the hook was removed
Status: survived (1) or did not (0)
Stay: length of stay in the rehab facility
nHooks: number of hooks in the turtle
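
Before going further, it can help to confirm that the file was read the way the list above describes. The following is only a minimal sketch: `missing_columns()` is a hypothetical helper written for this illustration (it is not part of base R or of the original analysis), and it assumes the column names in the file match the list above exactly, including capitalization.

```r
# Column names as listed in the text above (assumed to match the file exactly)
expected_cols <- c("ID", "Year", "Gear", "Width",
                   "Removed", "Status", "Stay", "nHooks")

# Hypothetical helper for illustration only: returns the names in
# `expected` that are not columns of the data frame `df`
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Run the checks only if the data file is available at the path used above
if (file.exists("data/turtles.txt")) {
  turtles = read.csv("data/turtles.txt", header = TRUE, strip.white = TRUE)

  # An empty character vector means every expected column was found
  print(missing_columns(turtles, expected_cols))

  # Show the type and first few values of each column
  str(turtles)
}
```

If the helper reports any missing names, check the spelling and capitalization of the column headers in the file before moving on.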

We will use Stay as the response variable here. This is a great data set because Stay violates several assumptions of linear models, so analyzing it properly requires a different framework from those we have discussed so far (or will discuss for a few weeks!).
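
A quick way to get a first look at the response is to summarize and plot it. This is a rough sketch rather than a formal check of assumptions: the axis label and color are arbitrary choices made for this example, and the code assumes the file is at the path used above and that Stay was read in as a numeric column.

```r
# Path to the data file, as used earlier in this section
turtles_path <- "data/turtles.txt"

# Only run if the data file is available
if (file.exists(turtles_path)) {
  turtles = read.csv(turtles_path, header = TRUE, strip.white = TRUE)

  # Numeric summary of the response (minimum, quartiles, mean, maximum)
  print(summary(turtles$Stay))

  # Histogram of the response to show the shape of its distribution
  hist(turtles$Stay, main = "", xlab = "Length of stay", col = "gray87")
}
```

When reading the output, look at the range of values and at whether the histogram is roughly symmetric or has a long tail.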