3.2 Quick data summaries

There are a number of simple ways to summarize data quickly in base R. We already looked at a few of these in previous chapters. But what about something a little more in-depth?

One quick way to look at your data is using the summary() function

summary(am_shad)

##  Sex           Age            Length      yearCollected  backCalculated 
##  B:9512   Min.   :1.000   Min.   : 3.00   Min.   :2010   Mode :logical  
##  R:7434   1st Qu.:2.000   1st Qu.:31.00   1st Qu.:2011   FALSE:3046     
##           Median :3.000   Median :38.00   Median :2012   TRUE :13900    
##           Mean   :3.155   Mean   :36.39   Mean   :2012                  
##           3rd Qu.:4.000   3rd Qu.:43.00   3rd Qu.:2013                  
##           Max.   :7.000   Max.   :55.00   Max.   :2014                  
##                                                                         
##       Mass      
##  Min.   :   0   
##  1st Qu.: 900   
##  Median :1120   
##  Mean   :1173   
##  3rd Qu.:1440   
##  Max.   :3280   
##  NA's   :14115

This is useful for getting the big-picture. For continuous variables (e.g., Age and Length) R will report some descriptive statistics like the mean, median, and quantiles. For discrete variables (e.g. Sex and backCalculated) we get the mode (if not factor or chr) and counts of observations within each discrete level (e.g. number of observations of B and R in the variable Sex).

But, this approach doesn’t really give us much info.

We can create more meaningful summaries pretty easily if we install and load some packages like we talked about in Chapter 1, and then look at different ways of sub-setting the data with base R and some methods that might be a little more intuitive for you.