5.6 Sample statistics
In this section, we will learn how to derive the parameters of the normal distribution using a few different methods in R. We will use this opportunity to re-introduce the parameters as moments of the distribution so we can talk about what we mean by confidence intervals. We also will introduce a couple of different methods for calculating moments of a distribution. Specifically, we will look at how to derive…
5.6.1 Moments about the mean
Sounds fancy, huh? Here they are, like a bandaid:
- Zeroth moment
  - The sum of the total probability of the distribution, which is always 1.00
- First moment
  - The mean
  - We will look at a few ways to calculate this
- Second moment
  - The variance
  - As with the mean, we will examine a couple of options for calculating it
- Third moment
  - Skew
  - We won't calculate this one for class, but we have discussed it: this parameter describes the asymmetry of the distribution (how far to the left or right of center the peak sits)
- Fourth moment
  - Kurtosis
  - Similarly, we won't cover the calculation, but this is another moment we may have discussed with respect to departures from a standard normal (z) distribution
5.6.2 Estimating parameters of the normal distribution from a sample
The tools demonstrated below can be used for most of the probability distributions that have been implemented in R, and we could go on and on forever about them. But, for the sake of our collective sanity we will walk through the tools available using the normal distribution alone. Most of the time this will suffice because our objective in understanding other distributions is really just so that we can use them to assume asymptotic normality in response variables (with transformations) or parameter distributions (with link functions) later on anyway.
5.6.2.1 Method of moments estimator
The moments of the normal distribution are well defined, and you are probably familiar with how to calculate a mean (average) already. See if you can rearrange this in a way that makes sense with how you know to calculate a mean and a variance!
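For reference, the method of moments estimators for the first two moments of a sample $x_1, x_2, \ldots, x_n$ are

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2$$

These are exactly the calculations we will code up by hand below.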
Start by simulating a variable with a known mean and standard deviation. We’ll pretend that we are simulating cold temperatures here:
# Take a random sample from a normal
# with a mean of 20 and a standard
# deviation of 2
test_norm <- rnorm(1e4, 20, 2)
First, we’ll estimate it by making our own function:
# Write the function
# First, define a function by name
norm.mom = function(x){
# Calculate mean
x_bar = (1/length(x)) * sum(x)
# Calculate variance
sigma2 = (1/length(x)) * sum((x-x_bar)^2)
# Return the calculations
return(c(x_bar, sigma2))
}
# Test the function
norm.mom(test_norm)
## [1] 20.004822 4.000842
Because this one is so common, R has built-in estimators that rely on the exact solution provided by the formulas for the first two moments of the normal distribution:
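The chunk that produced this output isn't shown here, but it would look something like the following, using the built-in mean() and var() functions:

# Built-in estimators for the first two moments
mean(test_norm)
var(test_norm)

Note that var() uses the sample (n - 1) denominator, which is why the variance differs very slightly from the value returned by our moment estimator above.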
## [1] 20.00482
## [1] 4.001242
Wow, that was a lot less code. That is the beauty of having these functions available. How do these compare to the answers returned by our function if you scroll back up?
5.6.2.2 Maximum likelihood estimator
R also has built-in maximum likelihood estimators that provide an exact solution to the first two moments of the normal distribution. These are available through the MASS package.
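# Fit the normal distribution to our sample by maximum
# likelihood using fitdistr() from the MASS package
MASS::fitdistr(test_norm, densfun = "normal")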
## mean sd
## 20.00482153 2.00021041
## ( 0.02000210) ( 0.01414362)
Only one problem here: R doesn’t report the second moment! It reports the square root of the second moment: the standard deviation!
Finally, let's write our own function and maximize the likelihood with the optim() function in R.
# Define the function
normal.lik = function(theta, y){
# The starting value for
# mu that we provide
mu = theta[1]
# The starting value for
# sigma2 that we provide
sigma2 = theta[2]
# Number of observations in the data
n = length(y)
# Compute the log likelihood of the
# data (y) using the likelihood
# function for the normal distribution
# given the starting values for our
# parameters (contained in the vector 'theta')
logl = -.5*n*log(2*pi) -.5*n*log(sigma2)-(1/(2*sigma2))*sum((y-mu)**2)
return(-logl)
}
Now, we use the optim() function to maximize the likelihood of the data (technically by minimizing the negative log-likelihood) given different values of our parameters (mu and sigma2).
To get started, we need to take a guess at what those parameters could be. (Yes, we know they are mu = 20 and sd = 2.)
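The exact call isn't shown in the source, but it would look something like this (the starting values in par are just a rough guess, not necessarily the values used to produce the output below):

# Minimize the negative log-likelihood of the data, starting
# from a rough guess at mu and sigma2
optim(par = c(10, 1), fn = normal.lik, y = test_norm)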
## $par
## [1] 20.004768 3.999364
##
## $value
## [1] 21121.91
##
## $counts
## function gradient
## 47 NA
##
## $convergence
## [1] 0
##
## $message
## NULL
The pieces are in par here (right where we told R to put them!). We can also make the output into an object and call the parts by name:
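For example (the object name est here is just for illustration):

# Save the output as an object so we can grab pieces by name
est <- optim(par = c(10, 1), fn = normal.lik, y = test_norm)

The warnings below just mean that the optimizer tried some negative values of sigma2 along the way (the log of a negative number is NaN); they are not a problem here.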
## Warning in log(sigma2): NaNs produced
## Warning in log(sigma2): NaNs produced
## Warning in log(sigma2): NaNs produced
## Warning in log(sigma2): NaNs produced
## Warning in log(sigma2): NaNs produced
## Warning in log(sigma2): NaNs produced
Look at the structure. I'll be damned, it's a list! Hey, we learned about those!
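Still using the illustrative est object from above:

# Look at the structure of the fitted object
str(est)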
## List of 5
## $ par : num [1:2] 20 3.98
## $ value : num 21122
## $ counts : Named int [1:2] 99 NA
## ..- attr(*, "names")= chr [1:2] "function" "gradient"
## $ convergence: int 0
## $ message : NULL
And, here are the estimates:
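Again using the illustrative est object:

# Both parameter estimates
est$par
# The mean
est$par[1]
# The variance
est$par[2]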
## [1] 20.00292 3.98411
## [1] 20.00292
## [1] 3.98411
There you have it: several different ways to estimate the parameters of the normal distribution, using both method of moments and maximum likelihood approaches.
5.6.3 Quantiles and other descriptive statistics
There are a number of other ways we might like to describe this (or any) sampling distribution. Here are a few examples that we will work with this semester.
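The exact chunk isn't shown in the source, but calls along these lines produce output like what follows: the sample median, the middle 95% of observations, the interquartile range, and the full range of the sample.

# Median of the simulated sample
median(test_norm)
# Middle 95% of the sample
quantile(test_norm, probs = c(0.025, 0.975))
# 25th and 75th percentiles (the interquartile range)
quantile(test_norm, probs = c(0.25, 0.75))
# Smallest and largest observations
range(test_norm)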
## [1] 20.02231
## 2.5% 97.5%
## 16.08722 23.94468
## 25% 75%
## 18.64861 21.35609
## [1] 12.95179 27.45577