3.2 Quick data summaries

There are a number of simple ways to summarize data quickly in base R. We already looked at a few of these in previous chapters. But what about something a little more in-depth?

One quick way to look at your data is to use the summary() function:

summary(am_shad)
##      Sex                 Age       
##  Length:16946       Min.   :1.000  
##  Class :character   1st Qu.:2.000  
##  Mode  :character   Median :3.000  
##                     Mean   :3.155  
##                     3rd Qu.:4.000  
##                     Max.   :7.000  
##                                    
##      Length      yearCollected 
##  Min.   : 3.00   Min.   :2010  
##  1st Qu.:31.00   1st Qu.:2011  
##  Median :38.00   Median :2012  
##  Mean   :36.39   Mean   :2012  
##  3rd Qu.:43.00   3rd Qu.:2013  
##  Max.   :55.00   Max.   :2014  
##                                
##  backCalculated       Mass      
##  Mode :logical   Min.   :   0   
##  FALSE:3046      1st Qu.: 900   
##  TRUE :13900     Median :1120   
##                  Mean   :1173   
##                  3rd Qu.:1440   
##                  Max.   :3280   
##                  NA's   :14115

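This is useful for getting the big picture. For numeric variables (e.g., Age and Length), R reports descriptive statistics: the minimum, maximum, mean, median, and the first and third quartiles, plus the number of missing values (NA's) when there are any, as for Mass. What we get for discrete variables depends on how they are stored. For a logical variable like backCalculated, we get counts of FALSE and TRUE. For a character variable like Sex, we get only the length, class, and mode (here, "mode" means the storage mode, not the most common value). We would get counts of observations within each level (e.g., the number of observations of B and R in Sex) only if the variable were stored as a factor.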
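If you do want counts for a character variable like Sex, here is a minimal sketch of two base R options. The small data frame shad_demo is a hypothetical stand-in with made-up values, not the real am_shad data, so the numbers mean nothing beyond illustration.

```r
# Hypothetical stand-in for am_shad; these values are made up for illustration
shad_demo <- data.frame(
  Sex = c("B", "R", "R", "B", "R", "B"),
  Age = c(3, 4, 5, 2, 4, 3),
  Length = c(36, 41, 45, 30, 42, 37),
  stringsAsFactors = FALSE
)

# Stored as character, Sex gets only length, class, and mode from summary()
summary(shad_demo$Sex)

# table() counts the observations of each value
sex_counts <- table(shad_demo$Sex)
sex_counts

# Converting to a factor makes summary() report counts per level instead
sex_factor_summary <- summary(factor(shad_demo$Sex))
sex_factor_summary
```

With the real data, the same calls would be table(am_shad$Sex) and summary(factor(am_shad$Sex)).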

But this approach only describes one variable at a time. It can’t tell us, for example, the mean Length for each Sex or Age.

We can create more meaningful summaries pretty easily if we install and load some packages like we talked about in Chapter 1. We'll then look at different ways of subsetting the data, both with base R and with some methods that might be a little more intuitive for you.
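To give a rough idea of what a "more meaningful" summary might look like, here is a small base R sketch that calculates mean length within groups. It uses tapply() and aggregate() on a hypothetical data frame, shad_demo, with made-up values standing in for am_shad. It is one possible approach, not necessarily the one we will settle on.

```r
# Hypothetical stand-in for am_shad; these values are made up for illustration
shad_demo <- data.frame(
  Sex = c("B", "R", "R", "B", "R", "B"),
  Age = c(3, 4, 5, 2, 4, 3),
  Length = c(36, 41, 45, 30, 42, 37),
  stringsAsFactors = FALSE
)

# Mean length within each sex, returned as a named vector
mean_length_by_sex <- tapply(shad_demo$Length, shad_demo$Sex, mean)
mean_length_by_sex

# Mean length for each observed combination of Sex and Age, as a data frame
mean_length_by_group <- aggregate(Length ~ Sex + Age, data = shad_demo, FUN = mean)
mean_length_by_group
```

The formula interface in aggregate() reads as "Length by Sex and Age", and only combinations that actually occur in the data show up in the result.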