Independence of observations

This assumption means that each row of your data was collected independently of all of the others; in other words, no two rows of your data are related to one another. This assumption is rarely met perfectly in practice, although, depending on the extent of the problem, it may not be an issue. When it is an issue, the effect generally shows up in the precision of the estimated variances, not in the accuracy of the estimated means.

We often rely on tools like variance inflation factors to determine the extent of the problem. This is also one of the primary reasons we use model selection tools to discriminate between models containing different combinations of potentially collinear explanatory variables.
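As a minimal sketch of what this might look like in R (assuming the car package is installed, and using made-up variable names y, x1, and x2 in a hypothetical data frame dat):

library(car)                         # provides the vif() function

# Hypothetical linear model with two potentially collinear explanatory variables
fit <- lm(y ~ x1 + x2, data = dat)

vif(fit)                             # large values (often > 5 or 10) suggest collinearity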

Note that we can relax this assumption by explicitly including variables and model structures that account for these kinds of relationships.

For example, in one-way ANOVA we include grouping (factor) variables to account for the non-independence of some observations. In fact, this lack of independence is often the very thing we are interested in testing! In ANOVA, for example, we are interested in whether individuals from the same group respond in the same way. Note that this in turn places the assumption of independence on our groups.
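As a minimal sketch in R (with made-up column names response and group in a hypothetical data frame dat), including the grouping variable looks like this:

# Hypothetical one-way ANOVA: the factor `group` explicitly accounts for
# relatedness among observations from the same group (column names are made up)
dat$group <- factor(dat$group)            # make sure the grouping variable is a factor
fit <- lm(response ~ group, data = dat)   # one-way ANOVA expressed as a linear model
anova(fit)                                # test whether group means differ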

This assumption cannot, in practice, be tested within the analysis phase of a study. So, it needs to be addressed within the context of experimental design, and if not met may require alternatives to the simple cases of one-way ANOVA, ANCOVA or simple linear regression.

Normality of residuals

In all linear models we make the assumption that the residual error of our model is normally distributed with a mean of zero. This allows us to drop the error term, 'epsilon', from computation during model fitting and allows us to calculate an exact solution in the case of ANOVA and linear regression (technological advances have really made this unnecessary, because we now solve everything through optimization rather than ordinary least squares).
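For the simple linear regression case, for example, the assumed model can be written as y_i = b0 + b1 * x_i + epsilon_i, with epsilon_i ~ Normal(0, sigma^2). The normality assumption applies to epsilon (the residual error), not to the raw response y itself.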

We have a multitude of tools at our disposal for examining the normality of the residuals of linear models. One option is to examine group-specific error structures as a surrogate for residual error prior to analysis. The other option is to examine diagnostic plots of residuals directly from a fitted model object in R or other software (this is the more appropriate tool).
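As a minimal sketch of the second option in R (reusing the hypothetical response, group, and dat names from above):

# Hypothetical residual diagnostics for a fitted linear model
fit <- lm(response ~ group, data = dat)           # made-up response and grouping variable

hist(residuals(fit))                              # should look roughly normal, centered on zero
qqnorm(residuals(fit)); qqline(residuals(fit))    # points should fall close to the line
plot(fit)                                         # built-in diagnostic plots for lm objects
shapiro.test(residuals(fit))                      # formal test of normality (interpret with care)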

Homogeneity of variances

This assumption states that the variance of the residual error is about equal across groups (the errors are still normal, with a mean of zero).

One option for assessing the validity of this assumption is to use the Kolmogorov-Smirnov test to determine whether the data from different groups come from the same distribution. The null hypothesis for this test is that the two distributions do not differ.

But: this only works for grouping variables, and it only compares two groups at a time, so it could become very cumbersome. And, since we are using R to make our lives easier, we do not want cumbersome if we can help it. This test is also meant to be reserved for continuous distributions.
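As a minimal sketch of a single pairwise comparison in R (again assuming made-up columns response and group, with hypothetical group labels "A" and "B", in a data frame dat):

# Hypothetical two-sample Kolmogorov-Smirnov test comparing two groups
a <- dat$response[dat$group == "A"]
b <- dat$response[dat$group == "B"]

ks.test(a, b)   # null hypothesis: the two samples come from the same distribution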



