3.4 Better data summaries
Now, we’ll look at some slightly more advanced summaries. Start by loading the dplyr
package into your R session with the following code.
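A minimal sketch of that step, assuming dplyr is already installed on your machine (the commented-out install line is only an assumption about what you might need to run once if it is not):

```r
# install.packages("dplyr")  # Assumption: run once if dplyr is not yet installed
library(dplyr)               # Attach dplyr so group_by(), summarize(), and %>% are available
```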
We can use functions from the dplyr package to calculate the mean Length of fish for each combination of Sex and Age group much more easily than we did for a single group above.
First, we group the data in the measured data frame that we created previously using the group_by function. For this, we just need to give R the data frame and the variables by which we would like to group:
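Here is a minimal sketch of the grouping step. Because measured is built earlier in the chapter, the small measured_demo data frame and the name grouped_demo are hypothetical stand-ins with made-up values, not the real data:

```r
library(dplyr)

# Hypothetical stand-in for `measured` (made-up values, for illustration only)
measured_demo <- data.frame(
  Sex    = factor(c("B", "B", "B", "B", "R", "R", "R", "R")),
  Age    = c(3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Length = c(37, 39, 40, 42, 44, 46, 47, 49)
)

# Give group_by() the data frame first, then the grouping variables
grouped_demo <- group_by(measured_demo, Sex, Age)

# The rows are unchanged; R has only recorded how they are grouped
group_vars(grouped_demo)
is_grouped_df(grouped_demo)
```

With the real data, the same call would take measured in place of measured_demo.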
This doesn’t change how we see the data much (it gets converted to a tibble), just how R sees it.
Next, we summarize the variable Length by Sex and Age using the summarize function:
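As a sketch of this step, here is the call on a hypothetical toy data frame (measured_demo, grouped_demo, and avg_demo are made-up names, and the values are invented) rather than on the real measured data:

```r
library(dplyr)

# Hypothetical stand-in for `measured` (made-up values, for illustration only)
measured_demo <- data.frame(
  Sex    = factor(c("B", "B", "B", "B", "R", "R", "R", "R")),
  Age    = c(3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Length = c(37, 39, 40, 42, 44, 46, 47, 49)
)

# Group first, as in the previous step
grouped_demo <- group_by(measured_demo, Sex, Age)

# One row per Sex-by-Age combination, with the mean Length in a column named avg
avg_demo <- summarize(grouped_demo, avg = mean(Length))
avg_demo
```

The summary printed in the text comes from the full measured data, so its values differ from this toy result.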
## # A tibble: 6 x 3
## # Groups:   Sex [2]
##   Sex     Age   avg
##   <fct> <int> <dbl>
## 1 B         3  38.1
## 2 B         4  40.5
## 3 B         5  42.0
## 4 B         6  43.4
## 5 B         7  46.8
## 6 R         4  45.0
Wow! That was super-easy!
Finally, to make things even more streamlined, we can chain all of these operations together using the pipe operator, %>%, from magrittr. This really cleans up the code and gives us small chunks of code that are easier to read than the dozens of lines of code it would take to do this manually.
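If the pipe is new to you, here is a minimal sketch of what it does, using a hypothetical vector of made-up lengths (lengths_demo is not part of the real data). The pipe hands whatever is on its left to the function on its right as the first argument:

```r
library(dplyr)  # dplyr re-exports the %>% operator from magrittr

# Hypothetical vector of made-up lengths, for illustration only
lengths_demo <- c(37, 39, 40, 42)

# These two calls are equivalent: the pipe passes the left-hand side
# to the function on the right as its first argument
nested_demo <- mean(lengths_demo)
piped_demo  <- lengths_demo %>% mean()

# Compare the two results
identical(nested_demo, piped_demo)
```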
# This will do it all at once!
sum_out <-                    # Front-end object assignment
  measured %>%                # Pass measured to the group_by function
  group_by(Sex, Age) %>%      # Group by Sex and Age and pass to summarize
  summarize(avg = mean(Length))
We could also assign the output to a variable at the end, whichever is easier for you to read:
measured %>%                               # Pass measured to the group_by function
  group_by(Sex, Age) %>%                   # Group by Sex and Age and pass to summarize
  summarize(avg = mean(Length)) -> sum_out # Back-end object assignment
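To convince ourselves that the two assignment styles give the same result, here is a small sketch on hypothetical toy data (measured_demo, front_demo, and back_demo are made-up names with invented values):

```r
library(dplyr)

# Hypothetical stand-in for `measured` (made-up values, for illustration only)
measured_demo <- data.frame(
  Sex    = factor(c("B", "B", "B", "B", "R", "R", "R", "R")),
  Age    = c(3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Length = c(37, 39, 40, 42, 44, 46, 47, 49)
)

# Front-end assignment with <-
front_demo <-
  measured_demo %>%
  group_by(Sex, Age) %>%
  summarize(avg = mean(Length))

# Back-end assignment with ->
measured_demo %>%
  group_by(Sex, Age) %>%
  summarize(avg = mean(Length)) -> back_demo

# Compare the two summaries as plain data frames
identical(as.data.frame(front_demo), as.data.frame(back_demo))
```

Which direction you use is purely a matter of readability; the summary itself does not change.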
And, it is really easy to get multiple summaries out like this at once:
sum_out <-
  measured %>%
  group_by(Sex, Age) %>%
  summarize(avg = mean(Length), s.d. = sd(Length))
head(sum_out)
## # A tibble: 6 x 4
## # Groups:   Sex [2]
##   Sex     Age   avg  s.d.
##   <fct> <int> <dbl> <dbl>
## 1 B         3  38.1  2.75
## 2 B         4  40.5  2.70
## 3 B         5  42.0  2.29
## 4 B         6  43.4  2.09
## 5 B         7  46.8  1.61
## 6 R         4  45.0  2.65
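The same pattern extends to as many summaries as we like. As a sketch on hypothetical toy data (measured_demo and multi_demo are made-up names with invented values), we can add a sample size with n() next to the mean and standard deviation. The .groups = "drop" argument is an optional extra that assumes dplyr 1.0.0 or later; it returns an ungrouped result instead of one still grouped by Sex, as in the "Groups: Sex [2]" line of the output above:

```r
library(dplyr)

# Hypothetical stand-in for `measured` (made-up values, for illustration only)
measured_demo <- data.frame(
  Sex    = factor(c("B", "B", "B", "B", "R", "R", "R", "R")),
  Age    = c(3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L),
  Length = c(37, 39, 40, 42, 44, 46, 47, 49)
)

multi_demo <-
  measured_demo %>%
  group_by(Sex, Age) %>%
  summarize(
    avg     = mean(Length),  # Mean length per Sex-by-Age group
    s.d.    = sd(Length),    # Standard deviation per group
    n       = n(),           # Number of fish per group
    .groups = "drop"         # Assumption: dplyr >= 1.0.0; drops all grouping
  )

multi_demo
```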
Isn’t that slick? Just think how long that would have taken most of us in Excel!
This is just one example of how functions in packages can make your life easier and your code more efficient. Now that we have the basics under our belts, let’s move on to how we create new variables.