9.4 The turtle problem
Let’s get some data to demonstrate these assumptions.
These data were collected from 2013 to 2015 for Kemp’s Ridley sea turtles incidentally caught by anglers in the Gulf of Mexico. After being caught, the turtles were taken to a wildlife rehabilitation center so they could have fishing hooks removed and recover.
# Read in the turtles data.
# The file is a bit messy, so we read it in
# with an extra option to strip white space.
turtles = read.csv('data/turtles.txt', header = TRUE, strip.white = TRUE)
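To see what `strip.white = TRUE` is doing, here is a minimal sketch using a small made-up CSV string instead of the turtles file. The object names (`toy_text`, `raw`, `trim`) and the column names are hypothetical and chosen only for illustration.

```r
# Toy CSV text with stray spaces around some values.
# These values are made up for illustration; they are NOT the turtles data.
toy_text <- "id,group\n1, a \n2,b\n3,  a\n"

# Default behavior: white space in character fields is kept
raw <- read.csv(text = toy_text, header = TRUE)

# With strip.white = TRUE, leading and trailing white space is removed
trim <- read.csv(text = toy_text, header = TRUE, strip.white = TRUE)

# Compare how many distinct groups R sees in each version
length(unique(raw$group))
length(unique(trim$group))
```

Without stripping, padded values such as `" a "` and `"  a"` are treated as different groups, which would quietly create extra levels in a categorical variable like `Gear`.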
Here is a quick explanation of the variables (columns) in the dataframe:
ID: turtle ID
Year: year of capture
Gear: the gear type with which the turtle was hooked
Width: the gape width of the hook
Removed: the location from which the hook was removed
Status: survived (1) or did not (0)
Stay: length of stay in the rehab facility
nHooks: number of hooks in the turtle
We will use Stay as the response variable here. This is a great data set because Stay has all kinds of problems related to the assumptions of linear models, and those problems require analyzing it in a different framework than the ones we have discussed so far (or will discuss for a few weeks!).
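Before fitting anything, it can help to take a quick look at the response. The sketch below is one possible way to do that, not a prescribed workflow: `check_response()` is a hypothetical helper written for this example, and it assumes the response is a numeric vector.

```r
# Hypothetical helper (not part of any package): basic checks on a response
check_response <- function(y) {
  y <- y[!is.na(y)]                  # drop missing values first
  list(
    n         = length(y),           # number of non-missing observations
    all_whole = all(y == round(y)),  # are all values whole numbers?
    min       = min(y),              # smallest observed value
    mean      = mean(y),
    variance  = var(y)
  )
}

# Apply it to Stay only if the turtles data have already been read in
if (exists("turtles")) {
  print(check_response(turtles$Stay))
  hist(turtles$Stay, main = "", xlab = "Stay")
}
```

A histogram alongside these summaries gives a first impression of the shape of the response, which is a reasonable starting point for thinking about which model assumptions might be in trouble.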