3.2 Quick data summaries
There are a number of simple ways to summarize data quickly in base R. We already looked at a few of these in previous chapters. But what about something a little more in-depth?
One quick way to look at your data is using the summary() function:
##  Sex           Age           Length       yearCollected  backCalculated
##  B:9512   Min.   :1.000   Min.   : 3.00   Min.   :2010   Mode :logical
##  R:7434   1st Qu.:2.000   1st Qu.:31.00   1st Qu.:2011   FALSE:3046
##           Median :3.000   Median :38.00   Median :2012   TRUE :13900
##           Mean   :3.155   Mean   :36.39   Mean   :2012
##           3rd Qu.:4.000   3rd Qu.:43.00   3rd Qu.:2013
##           Max.   :7.000   Max.   :55.00   Max.   :2014
##
##       Mass
##  Min.   :   0
##  1st Qu.: 900
##  Median :1120
##  Mean   :1173
##  3rd Qu.:1440
##  Max.   :3280
##  NA's   :14115
This is useful for getting the big picture. For numeric variables (e.g., Age and Length), R reports descriptive statistics: the minimum, maximum, mean, median, and first and third quartiles, plus a count of missing values (NA's) if there are any, as for Mass. For factors (e.g., Sex), we get counts of observations within each level (e.g., the number of observations of B and R in the variable Sex). For logical variables (e.g., backCalculated), we get the storage mode along with counts of FALSE and TRUE.
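To see how summary() treats each column type, here is a minimal sketch on a small made-up data frame. The object name toy and all of its values are hypothetical and chosen only for illustration; they are not the data summarized above.

```r
# Hypothetical toy data, made up for illustration only
toy <- data.frame(
  Sex = factor(c("B", "R", "B", "R", "B")),         # factor: summarized as counts per level
  Length = c(31, 38, 43, NA, 36),                    # numeric with one missing value
  backCalculated = c(TRUE, FALSE, TRUE, TRUE, FALSE) # logical: mode plus FALSE/TRUE counts
)

# One call summarizes every column according to its type
summary(toy)

# The same function also works on a single column
summary(toy$Length)  # min, quartiles, median, mean, max, and the number of NA's
summary(toy$Sex)     # number of observations in each factor level
```

Calling summary() on one column at a time can be handy when a data set has many columns and you only care about a few of them.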
But this approach summarizes each column on its own, across the whole data set, so it doesn’t tell us much beyond the big picture.
We can create more meaningful summaries pretty easily if we install and load some packages, like we talked about in Chapter 1. We will then look at different ways of subsetting the data with base R, along with some methods that might be a little more intuitive for you.