As folks learned in the Introductory session, once we have a basic grasp on how R works, and how and where to find help, the learning process becomes a lot less painful, and we can start to build an appreciation for how convenient it is to have a script we can come back to again and again. To show off this convenience, and the power of R as a statistics program, we will spend the afternoon session walking through some applied analyses and data visualization tools.
The plan for the Intermediate session is to work through a combination of long-term and seasonal examples of common analyses in lake management. We will start with the Otsego Lake data from the Introductory session and do some data manipulation, statistical analyses, and visualization of long-term changes in a single lake. We’ll then look at how this can be applied to many lakes at once using the same tools. Finally, we’ll wrap things up by investigating seasonal changes in the Otsego Lake water column throughout the year by making some isopleths.
We will work with functions from a bunch of packages within the tidyverse for this session. You can go ahead and load that now. We will load a few others as we go along in this module, and during the workshop.
library(tidyverse)
Before we ever get into a statistical analysis, it is always good practice to have a good, long look at the data we are working with to make sure everything makes sense, and to note issues that may rear their heads down the road. Let’s start by reading in the Otsego Lake data and getting reacquainted. This data set is part of a long-term monitoring data set collected by the SUNY Oneonta Biological Field Station at Otsego Lake in Cooperstown, NY. We read the data in with the argument stringsAsFactors = FALSE because there are a lot of string variables in the data, and factors can add extra handling time to our workflow (see the Introductory session tutorial).
otsego <- read.csv("data/physical.csv", stringsAsFactors = FALSE)
So, what are we working with here? Let’s have a look. Remember from the Introductory session that it might be useful to understand how R sees your data first and foremost. The most reliable method for doing this with dataframes is to look at the structure of your data using str() or the glimpse() function from the tidyverse.
# Like this:
glimpse(otsego)
Rows: 11,339
Columns: 9
$ date    <chr> "1/29/1988", "1/29/1988", "1/29/1988", "1/29/1988", "1/29/1988…
$ year <int> 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 1988, 19…
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,…
$ depth <dbl> 48.0, 46.0, 44.0, 40.0, 35.0, 42.0, 30.0, 25.0, 20.0, 18.0, 16…
$ temp <dbl> 2.84, 2.65, 2.52, 2.44, 2.43, 2.41, 2.38, 2.16, 1.94, 1.86, 1.…
$ ph <dbl> 7.56, 7.71, 7.76, 7.77, 7.79, 7.77, 7.79, 7.80, 7.82, 7.82, 7.…
$ do_mgl <dbl> 8.4, 9.8, 10.4, 10.5, 10.6, 10.6, 10.7, 10.9, 11.3, 11.3, 11.5…
$ sp_cond <dbl> 0.320, 0.308, 0.310, 0.308, 0.307, 0.307, 0.302, 0.308, 0.311,…
$ do_sat <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Now that we have an idea of what the data set looks like, let’s take a little closer look. First, there are a couple of things that we can do that will clean up our code down the road a little bit. Let’s have a quick look at our column names again.
names(otsego)
1] "date" "year" "month" "depth" "temp" "ph" "do_mgl"
[8] "sp_cond" "do_sat" [
Most of these are nice and clean, but there are some things that happen when we read data into R from Excel files (like special characters being replaced with periods). Plus, the less we have to change when it comes time to plot, the better off we’ll be.
We are not going to replace all of the names, because we are not going to work with all of the columns, but let’s replace a couple of these that we are definitely going to use (if only to show how this works). For now, let’s just change the name for temp so that it is easier to recognize as temperature.
Remember that the result of names(otsego) is a vector, and we can replace individual elements of that vector. We just need to know the index of the element we wish to replace. In this case, temp is the 5th column of the dataframe otsego.
Here is how it works:
# First, we replace temp with Temperature
names(otsego)[5] <- "Temperature"
Have a quick look to make sure you are happy with the new names.
# Print the names of the df
# to the console so we can
# see them.
names(otsego)
1] "date" "year" "month" "depth" "Temperature"
[6] "ph" "do_mgl" "sp_cond" "do_sat" [
Now that we have had a quick look at our data, and we have made some changes for the sake of convenience, let’s dig a little deeper. This afternoon, we are going to use Temperature as our response, or dependent variable, to demonstrate some basic statistical techniques in R. Before we can do any actual statistics, though, it is good practice to scrutinize the data we intend to use.
First, we’ll need to go ahead and do some work to get variables like month and year out of our date column. To do this, we’ll use functions from the lubridate package.
library(lubridate)
# Get date in a standard, unambiguous format
otsego$date <- as.Date(otsego$date, format = "%m/%d/%Y")

# Make a variable to hold year in the df
otsego$year <- year(otsego$date)

# Convert month to a factor so it can be easily re-leveled and ordered
otsego$month <- month.abb[otsego$month]
otsego$month <- factor(otsego$month, levels = month.abb)

# Let's make an ordinal date variable to get day of year as well
otsego$doy <- yday(otsego$date)
Now that we at least have a handle on where our data come from, we can start to poke around a bit more.
To start with, let’s take a quick look at Temperature in our original otsego dataset using a histogram. If we were going to do some statistical analyses, this would be our response of interest in the data set!
ggplot(otsego, aes(x = Temperature)) +
geom_histogram(bins = 30)
From this plot alone, there are a few things that should be obvious.
First, we can see that the distribution of our response is right-skewed, with many more observations between 4 and 10 degrees Celsius than at the extremes. This should make intuitive sense based on the depth of the lake (about 50 m) and the fact that the lake sits in a temperate climate.
Second, and perhaps more nuanced, is that there are a few zero values and even a couple of negatives. Since we know that fresh water freezes at zero degrees, we can probably afford to get rid of these for our purposes. If we had the ability, however, we would want to confirm that this was appropriate to do.
We will need to think about both of these characteristics as we move into statistical analyses. One of the central assumptions of modern regression tools relates to normality of residuals, which, in the absence of heterogeneous groups, can initially be approximated using a histogram of the response. And, transformations that we might wish to use to achieve normality can be sensitive to the presence of zeroes or negative values.
We can handle both of these problems by log-transforming our response variable like this:
otsego$log_temperature <- log(otsego$Temperature)
We can plot a histogram of the log-transformed response again to see what the transformation did:
ggplot(otsego, aes(x = log_temperature)) +
geom_histogram(bins = 30)
Our response looks more like a normal distribution now, although there is still some skew. Now it is just going in the other direction [eye roll].
As you can see, our response variable is no longer constrained to be greater than zero on the log scale, so we don’t have to worry about getting negative predictions from any of the statistical models that we make now either. We’ll just have to deal with that on the back end.
You should also notice that in both cases now, you’ve gotten warnings to let you know that we have some missing or non-finite values. We’ll be sure to remove these prior to any analyses to make our lives easier down the road. Let’s filter the data set to keep only those observations with non-missing, above-zero temperatures:
otsego <- otsego %>%
  filter(!is.na(Temperature) & Temperature > 0)
Now we can move forward with some statistics!
Let’s start with the simple case of comparing Temperature between the epilimnion and hypolimnion of the lake. If you have a basic understanding of statistics, you might immediately realize that this is a comparison of a continuous variable between two groups. We have a couple of paths forward here. We could either set aside distributional assumptions and use non-parametric methods, or we could assume some distribution for our error structure and proceed using parametric or semi-parametric statistics. Please note that these assumptions should not necessarily be based on the histogram of the data that we created above.
In either case, we are going to have to make sure we assign each observation to either the epilimnion or hypolimnion before we can start.
To do this, we’ll create a new variable in the data set.
$layer <- "epilimnion"
otsego$layer[otsego$depth > 15] <- "hypolimnion" otsego
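Before testing anything, it never hurts to check how many observations landed in each group; a quick optional check:
# Count the observations assigned to each layer
table(otsego$layer)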
Let’s start by fitting a quick Wilcoxon rank-sum test (aka Mann-Whitney U-test). While our options for non-parametric statistics are limited in complexity, they do not require distributional assumptions, and they tend to be more conservative than parametric tests, especially under limited sample sizes (not the case here).
Here is how we do it:
# Wilcox test to assess the null hypothesis
# that there is no difference in temperature between
# epilimnion and hypolimnion.
wilcox.test(Temperature ~ layer, data = otsego)
Wilcoxon rank sum test with continuity correction
data:  Temperature by layer
W = 25639031, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
Okay, if you have never done a Wilcox test by hand, or even in Excel, please appreciate how ridiculously easy this is. But, more importantly, what is going on in the output here?
We can see that we have conducted a Wilcoxon rank sum test. Second, R gives us a friendly reminder of what our data were that went into this. Next, we have the actual statistics for the test, including the test statistic W and the p-value for the comparison. With the handy-dandy alternative hypothesis that R provides us, we can conclude that since p < 0.05 we reject the null hypothesis that there is no difference between groups.
While this is useful for making inference, we don’t get a good sense of how the group means actually differed from this test (although we could graph it). For that, we need to start making some assumptions about our data and the shape of the error associated with those data.
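If you want a quick numerical peek at how the two layers differ before we go down that road, a grouped summary does the trick (a short sketch using dplyr, which is already loaded with the tidyverse):
# Median and mean temperature by layer, just to see the
# direction and rough size of the difference
otsego %>%
  group_by(layer) %>%
  summarize(
    median_temp = median(Temperature),
    mean_temp = mean(Temperature)
  )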
Parametric statistics rely on some basic assumptions about our data and the error structure around our observations. There are a number of fundamental assumptions that guide our forays into what is essentially “the rest of statistics”. How we deal with these assumptions can range from doing nothing, to data transformation, to use of alternative model structures that allow us to shift or relax them.
What are these assumptions? Glad you asked!
Linear models you say? Yes. While we don’t have time to go into the ugly math behind this (ahem, the beautiful, unifying math behind this), suffice it to say that most of the statistics we use in aquatic research are special cases, or generalizations, of the linear model. This includes things like t-tests, linear regression, ANOVA, ANCOVA, GLM, GLMM, GLS, GEE, and even multivariate statistics. The same holds true whether we are working with classical frequentist tools relying on least-squares estimation and maximum likelihood, or Bayesian methods using MCMC estimation. That means we always need to have these assumptions in mind.
Briefly, these assumptions include:
1. Independence of observations.
2. Normality of residuals (with mean=0).
3. Homogeneity of variances (i.e. homoscedasticity).
We will discuss these quickly in the afternoon session, but a more detailed explanation of each can be found here. This is definitely something you should be comfortable with before getting crazy with the stats.
The t-test is our go-to tool for comparing two group means with parametric statistics. Even when we use analysis of variance (ANOVA) to test for differences in means between more than two groups, we still need to use t-tests to follow up and determine pair-wise differences (usually using a Tukey HSD or analogous test to control family-wise error rates).
Using a t-test in R is pretty straightforward. For this example, we will use a t-test to test the null hypothesis that there is no difference in log_temperature between the epilimnion and hypolimnion layers. To do this, we use the function t.test().
# T-test to assess the null hypothesis
# that there is no difference in temperature
# between the epilimnion and hypolimnion in Otsego Lake.
# We use log_temperature to meet assumptions
# of normality.
# We can specify this one using a formula.
# To be conservative here, we will assume
# that we have unequal variances using
# one of the optional arguments. Note that
# the default in R is to assume that variances
# are not equal, and this defaults to
# a Welch's t-test that uses a calculated df
# to adjust the calculated test statistic.
t.test(log_temperature ~ layer, data = otsego, var.equal = FALSE)
	Welch Two Sample t-test

data:  log_temperature by layer
t = 54.947, df = 6136, p-value < 2.2e-16
alternative hypothesis: true difference in means between group epilimnion and group hypolimnion is not equal to 0
95 percent confidence interval:
 0.6318867 0.6786428
sample estimates:
 mean in group epilimnion mean in group hypolimnion 
                 2.289787                  1.634522 
Yay, more output!
Similar to the wilcox.test, the output of our t.test gives us the test that was used, the data provided to the function, the test statistics (now with df), and the alternative hypothesis to be accepted if p < 0.05. But, in contrast to the Wilcox, we have a little more information to go on.
This time, we get an actual confidence interval for our estimate of the difference between groups. Here, we see that we are 95% confident that the true difference in log_temperature between epilimnion and hypolimnion is between 0.63 and 0.68. Note that this difference is positive because of the alphabetical ordering of the layers. If we had reversed this, our 95% confidence interval on the estimated difference would have used negative values. In either case, we can now say that the epilimnion is warmer than the hypolimnion…shocker.
R also tells us that the mean of log_temperature is 2.29 for epilimnion and 1.63 for hypolimnion. We can bring these back to the real scale using the function exp() to say that the mean of Temperature is 9.87 for epilimnion and 5.13 for hypolimnion.
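As a quick check of that back-transformation (strictly, exponentiating the mean of the logs gives a geometric mean, but it is a handy summary):
# Back-transform the group means of log_temperature
# reported by t.test() to degrees C
exp(2.289787) # epilimnion, about 9.87
exp(1.634522) # hypolimnion, about 5.13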
We could go about showing these differences graphically using (for example) a violin plot.
ggplot(data = otsego,
aes(x = layer, y = log_temperature, color = layer, fill = layer)) +
geom_violin(alpha = 0.10, draw_quantiles = 0.50)
This plot shows us that there is clearly a lot more variability in epilimnetic temperatures, as well! This is because we haven’t begun to touch seasonal or long-term differences. We could start to tease apart seasonal differences with another test from this same family.
It is only a small step to move from a t-test to analysis of variance (ANOVA) conceptually, and this also requires only a small change to our code in R. In theory, ANOVA is usually used to compare means of a continuous response when we have three or more groups. In practice, it is mathematically equivalent to a t-test if we have only two groups (the F-statistic is, in fact, just the square of the t-statistic).
To demonstrate this, and to keep ourselves from being swept afield with another data set, let’s demonstrate ANOVA by investigating seasonal changes in lake-wide temperature.
It can be more convenient to store the results of ANOVA in an object than just dumping the output to the console, so let’s start there. First, we will fit our ANOVA with the lm() function (which stands for linear model) because this is the most general tool for us.
# Fit an ANOVA to test for differences in
# means between groups
mod <- lm(log(Temperature) ~ month, data = otsego)
Now, before we get into looking at p-values here, it is important that we diagnose the model first. But, we’ve only got a few hours to get through this so we’ll skip it until the end of this analysis. I’m trying to tell a story for now, bear with me.
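If you do want to peek at those diagnostics now, one minimal option is base R’s built-in plot() method for fitted lm objects; a sketch:
# Residuals vs. fitted, Q-Q, scale-location, and leverage
# plots for the fitted ANOVA model
par(mfrow = c(2, 2))
plot(mod)
par(mfrow = c(1, 1))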
We can make an ANOVA summary table in R using the lowercase anova() function to get overall factor significance.
# Get ANOVA summary for the model
anova(mod)
Analysis of Variance Table
Response: log(Temperature)
             Df Sum Sq Mean Sq F value    Pr(>F)    
month        11 2187.7 198.884  853.15 < 2.2e-16 ***
Residuals 11300 2634.2   0.233                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we can see that our factor month had a significant effect on Temperature, and we can reject the null hypothesis that there is no difference between any of the group means.
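If we wanted the pair-wise comparisons mentioned earlier, one common follow-up is a Tukey HSD test. Here is a minimal sketch; TukeyHSD() expects an aov object, so we refit the same model with aov() (the month_aov name is just for illustration):
# Refit the same one-way model with aov() so we can use
# TukeyHSD() for pair-wise comparisons among months
month_aov <- aov(log(Temperature) ~ month, data = otsego)
TukeyHSD(month_aov)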
From there, we can go on to make group-specific boxplots of the response to make the difference clearer for the reader.
ggplot(otsego, aes(x = month, y = Temperature)) +
geom_boxplot()
Non-parametric statistics, t-tests, and ANOVAs are great for testing hypotheses about differences between groups, but they don’t really allow us to examine changes that occur along continuous gradients. In our fields, those kinds of changes are often the things in which we are actually interested. For example, we may wish to understand how temperature or Secchi depth change with year to quantify long-term trends. To do this, we will rely on linear regression.
Linear regression takes the familiar form of the equation for a line:
\(y = mx + b\)
Where \(y\) is the dependent variable, \(b\) is the y-intercept, \(m\) is the slope of the line, and \(x\) is the independent variable.
In statistics, we write this as
\(y_{i} = \beta_{0} + \beta_{j} \cdot X_{j,i} + \epsilon_{i}\)
Where \(y_{i}\) is the ith value of the dependent variable, \(\beta_{0}\) is the y-intercept, \(\beta_{j}\) is the slope of the line, \(X_{j,i}\) is an independent variable, and \(\epsilon_{i}\) is our error (assumed to have a mean of zero so it can be dropped- that’s why it is important to check).
More generally, we could extend this to have an arbitrarily large number of slopes and independent variables (i.e., multiple regression):
\(y_{i} = \beta_{0} + \sum_{j=1}^{k} \beta_{j} \cdot X_{j,i} + \epsilon_{i}\)
What fun!
How about fitting this kind of model in R? It will take us less time than it did to write out the equations.
Here, we will fit a linear regression to test the effect of year on Temperature. To do this, we can stick with the same lm() function that we used for ANOVA above. The reason we can do this is that ANOVA, t-tests, and linear regressions are all just specific cases of the linear model.
# Fit the model
lmod <- lm(log(Temperature) ~ year, data = otsego)
We can get the regression coefficients for our model directly from a summary of a fitted model object.
summary(lmod)
Call:
lm(formula = log(Temperature) ~ year, data = otsego)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.8032 -0.3608 -0.0943  0.3745  1.3695 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.6433230  1.2440197  -3.733 0.000191 ***
year         0.0032721  0.0006207   5.271 1.38e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6522 on 11310 degrees of freedom
Multiple R-squared:  0.002451,	Adjusted R-squared:  0.002363 
F-statistic: 27.79 on 1 and 11310 DF,  p-value: 1.379e-07
Here, (Intercept) is the y-intercept for our model, and year is our “slope”. In this case, we see that our y-intercept is significantly different from zero (p = 0.00019), and we detected a significant effect of year on log(Temperature) (p = 1.4e-07). The interpretation of the latter is that the slope is not equal to zero.
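If you would like interval estimates to go with those p-values, confint() reports confidence intervals for the fitted coefficients; an optional extra:
# 95% confidence intervals for the intercept and slope
confint(lmod)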
Looking a little closer at the output, we can also see that we have only explained about 0.2% of the variability in Temperature using year as the only explanatory variable (R2 = 0.002). But, let’s plot the result anyway.
In order to plot the result of our model, we need to make some predictions from it first. This is super easy to do in R. You’ll get a warning that you are making predictions from the observed data even though this is supposed to be for future predictions on new data - fine for now, we can talk about that one when we have more time!
log_preds <- predict(lmod, interval = "prediction")
We just need to undo our log transformation once we have our predictions so we can plot them:
# Convert back to real scale
real_preds <- apply(X = log_preds, MARGIN = 2, FUN = exp)
Let’s have a look at what our predictions look like:
# Take a look at the first few
# rows of the preds dataframe
head(real_preds)
     fit     lwr      upr
1 6.4346 1.79168 23.10908
2 6.4346 1.79168 23.10908
3 6.4346 1.79168 23.10908
4 6.4346 1.79168 23.10908
5 6.4346 1.79168 23.10908
6 6.4346 1.79168 23.10908
Now we can smash them back together with the raw data:
otsego_preds <- data.frame(otsego, real_preds)
ggplot(otsego_preds, aes(x = year, y = Temperature)) +
geom_point(alpha = 0.10) +
geom_line(aes(y = fit), size = 1, color = "black") +
geom_ribbon(aes(ymin = lwr, ymax = upr),
color = "black",
fill = "gray87",
alpha = .5,
lty = 2,
lwd = .5
  ) +
  theme(legend.position = "none")
Wow, this graph is almost as underwhelming as the R2 for the model that was used to create it. If we had just reported p-values, this would have gone unnoticed and someone might have misused the result. The graph and the R2 tell us that our model, while statistically significant, doesn’t do a great job capturing the trends in our data. You can even see in this graph that we are uncertain about the actual slope of the line by looking at the lower, mean, and upper prediction intervals. If you have been paying attention, or if you spend much time on glacial lakes above latitude 42 N, then you know that this is because 1) there is a ton of seasonal variability that we haven’t captured in this regression and 2) we are looking at depths throughout the lake, which are not likely to change in temperature the same way. How to fix this?
Analysis of covariance (ANCOVA) is as complex as we’ll get with data analysis today. And, it is a great tool for understanding the effects of multiple explanatory variables on a continuous response. Without getting into the math, it can accommodate both continuous explanatory variables like year and categorical explanatory variables like month. We’ll wrap up this section by combining our other examples to simultaneously model Temperature as a function of year and month. This model tests the hypothesis that temperature has changed across years since the late 1980s, while accounting for variation among months.
To simplify, we will work just with the top 5 meters of the water column so we don’t have to argue about how best to incorporate depth in what time remains.
Filter the data to get just the first 5 meters.
surface <- otsego %>%
  filter(depth <= 5)
Now we can fit the statistical model:
surface_mod <- lm(log(Temperature) ~ year + month, data = surface)
Typically, we would do statistical inference next using p-values, model selection, or cross-validation. And then, finally, we can make predictions from this model the same way that we did for linear regression and ANOVA. We’ll take some time to make this pretty during the workshop, of course.
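As one quick example of that inference step, an ANOVA-style summary reports the overall significance of year and month in the ANCOVA (optional here):
# Overall significance of year and month in the ANCOVA
anova(surface_mod)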
# Make predictions from the model on the log scale
lpreds <- predict(surface_mod, interval = "prediction")

# Back-transform fit, lwr, upr to the real scale by exponentiating
rpreds <- apply(lpreds, 2, exp)

# Smash it together with the original data.
# Yes R, we know these are not future responses, just do it
lake_preds <- data.frame(surface, rpreds)
# Plot the predictions against the observed data
ggplot(lake_preds, aes(x = year, y = Temperature)) +
geom_point(alpha = 0.10) +
geom_line(aes(y = fit)) +
geom_ribbon(aes(xmax = year, ymin = lwr, ymax = upr), alpha = 0.1) +
facet_wrap(~month) +
theme_bw()
When we have more than one continuous variable in a regression model, it can be helpful to think about surfaces instead of lines. While it sounds difficult conceptually, these are actually fairly common tools in our everyday lives. Elevation contours are one such example of a response surface (think lm(elevation ~ latitude * longitude) in the simplest sense now that you are fluent in R).
In lake management, we frequently run across these kinds of applications when we create bathymetric maps and isopleths.
Here, we will investigate some basic tools for visualizing response surfaces, and specifically, we will create some isopleths.
For this example, we will go back to the otsego data and start by making an isopleth of Temperature for a single year. In this crowd, the concept probably requires less explanation than any of the statistics we’ve used so far, but basically we want to come up with a 3-dimensional way to visualize changes in temperature across time (days) and space (depth).
NOTE that if we had just read this data set in, one of the first things that we are going to have to deal with is the formatting of dates in this data set. We have done this already.
Let’s start by creating a column with inverted depths so we can plot the lake from surface to bottom and make the graph easier to read.
# Set this aside as a new object
# We'll drop any observations that lack data
lim <- otsego %>%
  filter(!is.na(depth) & !is.na(doy) & !is.na(Temperature))

# Multiply depth column by -1 so depth will
# plot from top to bottom.
lim$depth <- -1 * lim$depth
Now, we are ready to do the math behind the isopleth and plot it. To do the math, we need to load the akima package:
library(akima)
Next, we interpolate Temperature across date and depth. The interpolation we are using is basically just a bunch of linear regressions to predict Temperature for values of date and depth across a regular grid.
# Create a data frame containing the
# x, y, and z variables of interest
plotter <- data.frame(x = lim$doy, y = lim$depth, z = lim$Temperature)

# Sort it so we have ascending values of x and y
plotter <- plotter[with(plotter, order(x, y)), ]

# Make a regularly spaced x, y, z grid using
# linear interpolation from the akima package
im <- with(
  plotter,
  akima::interp(x, y, z,
    duplicate = "mean",
    nx = 360, # length(unique(lim$doy)),
    ny = 50   # length(unique(lim$depth))
  )
)
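Before plotting, it can be worth a quick look at what interp() returned; it is just a list holding the grid coordinates and a matrix of interpolated values:
# Inspect the interpolated grid object: x and y hold the
# regular grid coordinates, and z is a matrix (here 360 x 50)
# of interpolated temperatures
str(im)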
To wrap it all up, we can make the plot. This looks like a lot of code, but it is mostly comments.
# Plot the isopleth
# filled.contour is the function that actually
# makes the contour plot. This is the same function
# that is used in the wtr.heat.map function in the
# RLakeAnalyzer package, but it is executed from
# within a convenience wrapper there, so it is
# hard to customize.
# I tend to work with the filled.contour
# function from the graphics package (included
# in base R and loaded by default). This is
# just a preference driven by need for
# more flexibility.
# Set up plotting window margins
par(mar = c(4, 4, 2, 8))
# Make the graph
filled.contour(
  im$x, # Variable on x-axis (date)
  im$y, # Variable on y-axis (depth)
  im$z, # Response (water quality parameter)
  # Could also choose 'grey.colors' or 'terrain.colors'.
  # If you want the ramp to go the other way,
  # just wrap the colors in rev(). Note that you will
  # need to change the number in parentheses to match
  # the number of levels that you actually have or
  # want to display.
  col = topo.colors(30),
# I don't like in-figure titles.
# You can add one, though. You will, however,
# need to change the 'mar' argument in the call
# to par above.
main = expression(paste("Temperature (", degree, "C)")),
# Specify y-axis limits.
ylim = c(min(im$y), max(im$y)),
  # Specify x-axis limits. Here we show the full range of day of year.
  # If you were plotting actual dates for a single year, you could
  # use something like this instead:
  # xlim = c(as.numeric(as.Date("2017/05/01")), max(im$x)),
  xlim = c(0, 366),
# X-axis label
xlab = "Day of Year",
# Y-axis label
ylab = "Depth (m)",
# This next block of code controls the lines that are overlaid. The interpolation
# is the same, so the axes and the number of levels need to be the same, too.
plot.axes = {
contour(
      im$x, im$y, im$z, # X, Y, Z values (same as above)
      nlevels = 26, # Should match the number of levels used in filled.contour!
levels = seq(0, 26, 1), # Will need to change for different parameters
drawlabels = FALSE, # Don't want R to draw the labels, they are ugly
col = rgb(1,1,1,0),
add = TRUE); # Add the lines to the existing plot
# X-axis to assign month abbreviation at first ordinal day of each month
    axis(1, at = c(1, 32, 61, 92, 122, 153, 183, 214, 245, 275, 306, 336),
         labels = month.abb, cex.axis = 1.10
    )
    # Y-axis
### Change to match range of data
axis(side=2, at=seq(-50, 0, 5), seq(50, 0, -5), las=2, yaxt='l', cex.axis=1.1);
# Add sampling points to x-axis
points(unique(lim$doy),
rep(-50, length(unique(lim$doy))),
pch=21, bg=rgb(0,0,0,.05), xpd=T)
})
One of the things we have to be careful about with these graphs is that they are models, and if we are missing data at critical time periods, then our models are only as good as our data allow. This can be seen in the plot above. There are some odd artifacts in these estimates that can show up where we are missing data or when lake surface elevation changes due to ice (in the case above) or drawdown. Likewise, this is a composite across nearly 40 years so there are few gaps from January through December, which is not usually the case. If we miss key seasonal events like mixing in spring or fall, these graphs can provide incorrect approximations of the real in-lake phenomena.
Have a go at these isopleths again, substituting DO or pH or looking at just one year at a time. You should find that the only code that needs to be changed is the variable used for z when you create the plotter data above, and the title of the plot.
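As a starting point, here is a minimal sketch of the setup for a single-year dissolved oxygen isopleth (the year and object names are just examples); the interpolation and plotting code above can then be reused, with the title, levels, and color count adjusted to suit dissolved oxygen:
# Subset to a single year (2017 here, as an example) and drop
# missing values for the new z variable
lim_do <- otsego %>%
  filter(year == 2017, !is.na(depth), !is.na(doy), !is.na(do_mgl)) %>%
  mutate(depth = -1 * depth) # invert depth for plotting, as before

# Same structure as before, but z is now dissolved oxygen (mg/L)
plotter <- data.frame(x = lim_do$doy, y = lim_do$depth, z = lim_do$do_mgl)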
This work is licensed under a Creative Commons Attribution 4.0 International License. Data are provided for educational purposes only unless otherwise noted.