Module 2:2 - ANOVA and the General Linear Test
- 1 Fisher-Snedecor Distribution
- 2 Sums of Squares in ANOVA
- 3 Degrees of Freedom
- 4 General Linear Test Approach
The Fisher-Snedecor distribution describes what would happen in nature if you were to take a variance estimate from one sample, then take another sample from the same population and form a second variance estimate. This lets us use sample data to test hypotheses about the population τj's (or taus): we can see all at once whether our treatment offsets are actually 0.
When you take the ratio of two variance estimates, the resulting sampling distribution follows a specific function: the ratio of variances is distributed as the F distribution.
Degrees of Freedom (df): Independent pieces of information that are used in the F distribution
As we have learned before, larger samples produce less skewed sampling distributions of variance estimates. These distributions are all unbiased in that each has a mean equal to the population variance. The skew, however, is very important because it gives the F distribution its characteristic shape.
We can use the F distribution to determine whether all treatment offsets (τj's) are equal to zero; it tells us all at once whether they are likely to be zero or not. Keep in mind that the sample estimates of the τj's will be non-zero due to sampling error alone, even when every population offset is zero.
Based on our H0, the variance of the τj's = 0. This is the same as saying every group mean is identical. We can test whether the population τj's differ from zero based on sample data.
To do this we must form a test statistic as a ratio of variances.
By this we are able to tell whether treatment offsets (τj's) are reasonably different from what we would expect just by chance error.
Graphing the F Distribution
The F distribution is a ratio of chi-square distributions (each scaled by its degrees of freedom). In other words, when we repeatedly take a sample and estimate the population variance, those estimates follow a (scaled) chi-square distribution; the Fisher-Snedecor distribution is the shape of the ratio of two such estimates.
This distribution is not symmetric: you are not equally likely to get values above 1 as below 1. Its skew is related to the skew of the variance estimates, which depends on how many observations we have, so the shape of the F distribution depends on the sampling distributions of the two variance estimates.
If one variance estimate is based on few observations and the other on many, most F ratios will fall close to 0 and 1, with a long right tail. The average of the distribution (the mean) stays about the same, but because the shapes of the sampling distributions of the variance estimates change, the shape of the F distribution changes. This change depends dramatically on how many observations go into the numerator and denominator estimates.
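The shape described above can be simulated directly. This sketch (assuming numpy is available; the sample sizes and variance are made-up values) builds an F distribution by repeatedly taking the ratio of two sample variance estimates from the same population:

```python
# Simulate the F distribution as a ratio of two sample variance estimates.
# Sample sizes and population variance are hypothetical choices.
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0            # common population variance
n_num, n_den = 4, 20    # small numerator sample, larger denominator sample
n_sims = 100_000

# Draw many pairs of samples and form the ratio of their variance estimates
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_sims, n_num))
y = rng.normal(0.0, np.sqrt(sigma2), size=(n_sims, n_den))
ratios = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)

# The ratios follow F(n_num - 1, n_den - 1): right-skewed, so the
# median sits below the mean, with a long right tail.
print(np.median(ratios), ratios.mean())
```

Changing `n_num` and `n_den` changes the skew of the simulated distribution, matching the point that the shape depends on how many observations go into each estimate.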
Interpreting the F Distribution
If the variance of treatments in the population is very large and we take a sample from it, then the ratio of treatment variance to error variance should be larger than 1.
By forming this ratio, we can compare what we expect by chance with what we actually observe.
The numerator is actually capturing two things if H1 is true (not all τj's = 0): systematic variance + random variance.
If there really are differences among the population τj's, then the variance in the numerator will tend to be very large.
If there are no true effects, H0 is true and our equation is
random variance/random variance
From this equation we know what values of F we should expect to get. If we observe a value that is very different, we should be able to reject H0. We can also compute a p-value to find how unlikely the observed value is if H0 is true.
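The F statistic and its p-value can be computed with a standard one-way ANOVA routine. In this sketch the data are hypothetical draws from a single population, so H0 is true by construction and F should land near 1 (scipy is assumed to be available):

```python
# Form the F statistic and p-value for a one-way design.
# The three groups are drawn from the SAME population, so H0 holds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(50, 10, size=12) for _ in range(3)]

# scipy's f_oneway returns the F statistic and the p-value under H0
f_stat, p_value = stats.f_oneway(*groups)
print(f_stat, p_value)
```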
We arrive at our final equation by looking at SSerror and SStreatment, as well as dferror and dftreatment:
SSerror (also called SSwithin): the sum of squared deviations between individuals' actual scores and their own group means; it captures how spread out scores are around the group means, i.e. the amount of error within each group.
SStreat (also called SSbetween): the sum, for each individual, of the squared difference between the predicted score (the group mean) and the grand mean; it captures spread between groups rather than within them.
MStreat: the variability among the τj's, computed as SStreat divided by its degrees of freedom.
Sums of Squares in ANOVA
Partitioning the Sums of Squares
Every individual in a data set has a value for treatment deviation and error deviation; we are partitioning each individual’s score into one part treatment and one part error.
The calculations for SSerror and SStreatment share a common variable: ŷij, the predicted score for an individual. Note that this variable is just the individual's group mean, without taking individual error into account. By adding up the deviations behind SSerror and SStreatment, we get the deviation behind SStotal: the common variable ŷij cancels out, leaving Yij - Ȳ, which represents how much an individual's score deviates from the grand mean.
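The partition described above can be verified numerically. This sketch uses made-up scores for three groups of unequal size and checks that SStotal = SStreatment + SSerror:

```python
# Verify the partition SStotal = SStreatment + SSerror on hypothetical data.
import numpy as np

groups = [np.array([4.0, 6.0, 5.0, 7.0]),
          np.array([8.0, 9.0, 7.0]),
          np.array([2.0, 3.0, 4.0, 3.0, 3.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# Total: squared deviations of every score from the grand mean
ss_total = np.sum((all_scores - grand_mean) ** 2)
# Error: squared deviations of every score from its own group mean
ss_error = sum(np.sum((g - g.mean()) ** 2) for g in groups)
# Treatment: squared deviations of each group mean from the grand mean,
# counted once per individual in the group
ss_treat = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_total, ss_treat + ss_error)  # the two quantities match
```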
Analysis of variance (ANOVA) helps us understand how big the treatment effect is relative to error.
Consider 2 individuals in the cost-of-flight example: Y10,1 and Y10,2.
In the first graph, we see the distance between their individual scores (orange dots) and the grand mean (blue dots), which represents each individual's total deviation. The second graph shows the distance between their individual scores (orange dots) and their own group means (dark blue dots), which is also known as error deviation. The third graph shows the distance between their own group means (dark blue dots) and the grand mean (blue dots), which is also known as treatment deviation. The total deviation is perfectly cut up into one part error deviation and one part treatment deviation.
The treatment deviation is the same for every individual in the same group, because it is simply the offset between the group's mean and the grand mean. Therefore, the equation for calculating SStreatment can also be written as SStreatment = Σj nj·τj², where:
- nj = the number of individuals in group j, and
- τj = group j's treatment offset, the difference between group j's mean and the grand mean
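The two ways of writing SStreatment give the same number, which this sketch checks on the same made-up groups: once by summing the squared predicted-score deviation for every individual, and once as nj times the squared treatment offset per group:

```python
# SStreatment computed two equivalent ways on hypothetical data.
import numpy as np

groups = [np.array([4.0, 6.0, 5.0, 7.0]),
          np.array([8.0, 9.0, 7.0]),
          np.array([2.0, 3.0, 4.0, 3.0, 3.0])]
grand_mean = np.concatenate(groups).mean()

# Per individual: each member of group j contributes (ybar_j - grand_mean)^2
ss_treat_individual = sum(np.sum((np.full(len(g), g.mean()) - grand_mean) ** 2)
                          for g in groups)
# Per group: n_j times the squared treatment offset tau_j
ss_treat_group = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(ss_treat_individual, ss_treat_group)  # identical values
```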
Degrees of Freedom
- Degrees of freedom represents how much independent information we have.
- In ANOVA, we are allocating some of the independent information to SStreatment, and some to SSerror. Similar to sums of squares, the total degrees of freedom we have in a data set (DFtotal) is going to be one part DFerror and one part DFtreatment (see figure).
- Like the sums of squares, the degrees of freedom add up.
- DFerror + DFtreatment = DFtotal
Degrees of Freedom for SStotal (DFtotal)
- Recall that we use SS/(n - 1) to calculate variance in a sample. The n - 1 here represents the degrees of freedom for SStotal.
- We use n-1 because we have to calculate the grand mean in order to calculate the sums of squares and the variance, and calculating the grand mean takes away one degree of freedom, or one independent piece of information.
- For example, if we have 15 pieces of independent information in a data set, we have 15 degrees of freedom to start with. However, after calculating the grand mean, we only have 14 pieces of independent information, because we can calculate the last piece of information by using the grand mean, so that piece of information is no longer independent. Our degrees of freedom becomes 15-1=14.
Degrees of freedom for SStreatment (DFtreatment)
- DFtreatment = j-1, where j=the number of groups.
- This is because the treatment offsets for all groups have to add up to 0, because we found these offsets from the grand mean. Therefore, we only have j-1 degrees of freedom or pieces of independent information.
- For example, if there were three groups, and we knew the treatment offset of two groups, we can calculate the third treatment offset because the sum of these offsets is equal to 0. Therefore, the third treatment offset is not independent information, and we have 3-1=2 degrees of freedom.
Degrees of freedom for SSerror (DFerror)
- DFerror uses up the remaining degrees of freedom we have and can be calculated in two ways:
- The first equation calculates DFerror by subtracting one degree of freedom from the number of individuals in each group (each group's mean must be calculated, which uses up one piece of information per group) and adding across all groups: DFerror = Σ(nj - 1).
- The second equation calculates DFerror by subtracting DFtreatment (j - 1) and the one DF we used for calculating the grand mean from the total number of individuals in the data set: DFerror = n - (j - 1) - 1 = n - j.
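The bookkeeping above can be checked in a few lines. The group sizes here are hypothetical; the point is that both ways of computing DFerror agree and the pieces add up to DFtotal:

```python
# Degrees-of-freedom bookkeeping for a one-way design (hypothetical sizes).
group_sizes = [12, 9, 15]   # n_j for each of the three groups
n_total = sum(group_sizes)  # 36 individuals in all
j = len(group_sizes)

df_total = n_total - 1                          # one df spent on the grand mean
df_treat = j - 1                                # offsets must sum to 0
df_error_way1 = sum(n - 1 for n in group_sizes)  # sum of (n_j - 1) over groups
df_error_way2 = n_total - df_treat - 1           # subtract df_treat and the grand mean

print(df_total, df_treat, df_error_way1)
```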
General Linear Test Approach
General Linear Test Formula
In the general linear test, we add parameters and see how much we reduce error by doing so, per parameter added. We compare the amount of error in the full model with the amount of error in the reduced model. It is a model comparison.
To do so, we...
- Specify a full (unrestricted) model and determine its amount of error
- Specify a reduced (restricted) model and determine its amount of error
- Take the reduction in error and divide it by the number of added parameters
- Divide everything by the baseline error. We have error from both the full model and the restricted model; we use the error from the full model. This is because when the null hypothesis is true, the error from the full model is simply random, so it is the best unbiased guess of the population variance, uncontaminated by treatment offsets (if the treatment effect is even there).
- With this test, we still find an F statistic, and the test ends up being identical to the analysis of variance test.
Full Model v Reduced Model
Full model: the same as the one-factor linear model,
Yij = μ + τj + εij (mean + treatment offset + individual error)
Under the null hypothesis, we hold the τj's constant at zero, which gives the reduced model:
Yij = μ + εij (grand mean + error)
We want to test whether the full model fits our data better than the reduced model.
What we are trying to determine, is for every parameter added, have we reduced error enough to think that it is simply sampling error alone.
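The model comparison above can be sketched directly: fit the reduced model (grand mean only) and the full model (one mean per group) to the same hypothetical scores and compare their error sums of squares:

```python
# Error in the reduced model (grand mean) vs. the full model (group means).
# Scores are hypothetical.
import numpy as np

groups = [np.array([4.0, 6.0, 5.0, 7.0]),
          np.array([8.0, 9.0, 7.0]),
          np.array([2.0, 3.0, 4.0, 3.0, 3.0])]
all_scores = np.concatenate(groups)

# Reduced model: predict every individual with the grand mean
sse_reduced = np.sum((all_scores - all_scores.mean()) ** 2)
# Full model: predict each individual with their own group mean
sse_full = sum(np.sum((g - g.mean()) ** 2) for g in groups)

print(sse_reduced, sse_full)  # the reduced model always has more error
```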
Further Explanation of Each Component of the Formula
Reduction in error: The reduced model will always have more error, so we take the sums of squares error of the reduced model, and subtract the sums of squares error of the full model (to get a positive value).
Added parameters: The reduced model will always have fewer parameters than the full model, so the difference in degrees of freedom of the two models will give us the number of added parameters. Again, we start with the reduced model, and we take the degrees of freedom from the reduced model, and subtract the degrees of freedom from the full model.
Baseline error: This is the amount of error we should expect in the world. We have two different measures of error, the error from the restricted and the full models. Again, we use the error from the full model. This is because when the null hypothesis is true, the error from the full model is simply random. So this is the best, unbiased guess of the population variance, since it is uncontaminated by treatment offsets (if the treatment is actually even there).
We want to see whether the treatment offsets (how far the group means sit from the grand mean) are larger than what random grouping alone would produce.
Assuming we have the same number of observations, the more groupings we have, the more likely we are to see deviations from the grand mean. The whole purpose of the general linear test is to see how much we reduced error, per parameter added.
How the General Linear Test Compares to the Analysis of Variance
When we take a look at the numerator, we have the sum of squares error from the reduced model, minus the sum of squares error from the full model.
All of that is divided by the degrees of freedom from the reduced model, minus the degrees of freedom from the full model.
The sum of squares and degrees of freedom from the reduced model are really just the total sum of squares and degrees of freedom, and the sum of squares and degrees of freedom from the full model are really the error sum of squares and degrees of freedom. We know that total - error = treatment, so the entire numerator is simply the mean square treatment.
In the denominator we have the baseline error, for which we use the full model: the sum of squares error from the full model divided by the degrees of freedom from the full model. This is equivalent to the mean square error we developed in the analysis of variance test.
Therefore, the entire formula is simplified to mean square treatment over mean square error.
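The equivalence can be confirmed numerically: compute F from the general linear test formula on hypothetical data, then compare it with the F from a standard one-way ANOVA routine (scipy is assumed to be available):

```python
# General linear test F vs. ANOVA F on the same hypothetical data.
import numpy as np
from scipy import stats

groups = [np.array([4.0, 6.0, 5.0, 7.0]),
          np.array([8.0, 9.0, 7.0]),
          np.array([2.0, 3.0, 4.0, 3.0, 3.0])]
all_scores = np.concatenate(groups)
n, j = len(all_scores), len(groups)

sse_reduced = np.sum((all_scores - all_scores.mean()) ** 2)  # = SStotal
sse_full = sum(np.sum((g - g.mean()) ** 2) for g in groups)  # = SSerror
df_reduced, df_full = n - 1, n - j

# General linear test: (reduction in error per added parameter) / baseline error
f_glt = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)
f_anova, _ = stats.f_oneway(*groups)

print(f_glt, f_anova)  # the two F statistics are identical
```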