# Module 2:2 - ANOVA and the General Linear Test

## Fisher-Snedecor Distribution

The Fisher-Snedecor (F) distribution describes what you would see if you took a variance estimate from one sample, then took another sample from the **same** population, formed a second variance estimate, and compared the two. It lets us use sample data to test the population treatment offsets (the τ_{j}'s, or taus) all at once: are they actually 0?

When you take the ratio of two such variance estimates, the ratios follow a specific distribution: the F distribution.
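To make this concrete, here is a minimal simulation sketch. It assumes a normal population; the mean of 50, standard deviation of 10, sample sizes, and the function name `simulate_variance_ratios` are illustrative choices, not part of the course materials. It repeatedly draws two samples from the same population and records the ratio of their variance estimates.

```python
import random
import statistics

def simulate_variance_ratios(n1, n2, reps=5000, seed=1):
    """Draw two samples from the SAME normal population `reps` times
    and return the ratio of their sample variances each time."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        sample_a = [rng.gauss(50, 10) for _ in range(n1)]
        sample_b = [rng.gauss(50, 10) for _ in range(n2)]
        # statistics.variance divides by n-1, giving an unbiased estimate
        ratios.append(statistics.variance(sample_a) / statistics.variance(sample_b))
    return ratios

ratios = simulate_variance_ratios(n1=10, n2=30)
print("mean ratio:  ", round(statistics.mean(ratios), 3))
print("median ratio:", round(statistics.median(ratios), 3))
```

The ratios should cluster around 1, with the mean sitting above the median because the distribution is skewed to the right.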

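**Remember**

**Degrees of Freedom** *(df)*: the independent pieces of information behind a variance estimate. The F distribution uses two df values, one for the numerator estimate and one for the denominator estimate.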

As we have learned before, larger samples produce less skewed sampling distributions of variance. These sampling distributions are all unbiased, in that each has a mean equal to the population variance. The skew, however, is very important because it gives the F distribution its characteristic shape.

We can use the F distribution to test whether all treatment offsets (τ_{j}'s) are equal to zero. It tells us all at once whether that is plausible. Keep in mind that the sample τ_{j}'s will almost never be exactly zero, even when the population τ_{j}'s are, because of sampling error alone.

Under H_{0}, the variance of the population τ_{j}'s is 0. This is the same as saying every group has the same population mean. Using sample data, we can test whether the population τ_{j}'s are different from zero.

To do this we must form a test statistic as a ratio of variances.

This ratio tells us whether the sample treatment offsets (τ_{j}'s) differ more than we would expect from chance error alone.

### Graphing the F Distribution

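The F distribution is a ratio of two chi-square variables, each divided by its degrees of freedom. In other words, when we take a sample from a normally distributed population, estimate the population variance, and repeat, the estimates follow a (scaled) chi-square distribution. The Fisher-Snedecor distribution is the shape of the ratio of two such estimates.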
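In symbols, one standard way to write this relationship is shown below. The notation (d_1 and d_2 for the numerator and denominator degrees of freedom, s² for a sample variance, σ² for the population variance) is added here for illustration and is not taken from the course notes; the second statement assumes a normally distributed population.

```latex
F = \frac{\chi^2_{d_1} / d_1}{\chi^2_{d_2} / d_2},
\qquad
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
```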

This distribution is *not* symmetric: you are not equally likely to get values above 1 as below 1. Its skew is related to the skew of the variance estimates, so the shape depends on how many observations go into each estimate.

If the ratio consists of a *small* number and a **large** number, there will be many F ratios close to 0 and 1. The average of the distribution (the mean) will still be the same, but because of the shape of the sampling distributions of the variance estimates, the shape of the F distribution **will** change. This change depends dramatically on how many observations go into the numerator and denominator estimates.
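The dependence of the shape on sample size can be sketched with a small simulation. This is only an illustration under assumed conditions: a standard normal population, arbitrary sample sizes of 5 and 50, and a hypothetical helper named `ratio_percentiles`.

```python
import random
import statistics

def ratio_percentiles(n_numerator, n_denominator, reps=4000, seed=2):
    """Simulate variance ratios from one normal population and return
    the (5th, 50th, 95th) percentiles of the simulated ratios."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        top = [rng.gauss(0, 1) for _ in range(n_numerator)]
        bottom = [rng.gauss(0, 1) for _ in range(n_denominator)]
        ratios.append(statistics.variance(top) / statistics.variance(bottom))
    cuts = statistics.quantiles(ratios, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return cuts[0], cuts[9], cuts[18]

small = ratio_percentiles(5, 5)     # 4 df in numerator and denominator
large = ratio_percentiles(50, 50)   # 49 df in numerator and denominator
print(f"n = 5 each:  5%={small[0]:.2f}  50%={small[1]:.2f}  95%={small[2]:.2f}")
print(f"n = 50 each: 5%={large[0]:.2f}  50%={large[1]:.2f}  95%={large[2]:.2f}")
```

With few observations the ratios spread far from 1 and the right tail is long; with many observations they concentrate near 1.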

### Interpreting the F Distribution

If the variance of the treatments in the population is very large and we take a sample from it, then the ratio of treatment variance to error variance should be larger than 1.

By doing this, we should be able to capture what we expect by chance and what we actually observe.

The numerator is actually capturing two things: **systematic variance + random variance** if H_{1} is true (Not all τ_{j}'s = 0)

If the population τ_{j}'s really do differ from zero, then the variance in the numerator will tend to be very *large*.

If there are no true effects, H_{0} is true and our equation is

**random variance/random variance**

- From this equation we know what values of F we should expect to get.
- If we get a value that is very different, we should be able to reject H_{0}.
- We can also get a p-value to find how unlikely a value is if H_{0} is true.
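As a rough illustration of what the p-value means, the sketch below estimates it by simulation rather than from an F table or statistical software. It assumes normally distributed scores and three groups of five; the helper names `f_ratio` and `simulated_p_value` are hypothetical. The F ratio is computed as mean square treatment over mean square error.

```python
import random
import statistics

def f_ratio(groups):
    """Variance among group means relative to variance within groups
    (mean square treatment / mean square error)."""
    scores = [y for g in groups for y in g]
    grand_mean = statistics.mean(scores)
    ss_treat = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    df_treat = len(groups) - 1
    df_error = len(scores) - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)

def simulated_p_value(observed_f, group_sizes, reps=4000, seed=3):
    """Share of F ratios at least as large as observed_f when H0 is true
    (every group drawn from the same normal population)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(0, 1) for _ in range(n)] for n in group_sizes]
        if f_ratio(groups) >= observed_f:
            hits += 1
    return hits / reps

p_small_f = simulated_p_value(1.0, [5, 5, 5])
p_large_f = simulated_p_value(3.89, [5, 5, 5])
print("P(F >= 1.00 | H0) is about", p_small_f)
print("P(F >= 3.89 | H0) is about", p_large_f)
```

An F near 1 is unremarkable under H_{0}, while 3.89 is the tabled 5% critical value for 2 and 12 degrees of freedom, so the second estimate should land near 0.05.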

We can see our final equation by looking at SS_{error} and SS_{treatment}, as well as df_{error} and df_{treatment}.

- SS_{error} (also called SS_{within}): the sum of squared deviations between each individual's actual score and their group mean. It measures how spread out scores are around their own group means, that is, the amount of error within each group.
- SS_{treat} (also called SS_{between}): the sum, across individuals, of the squared difference between each individual's predicted score (their group mean) and the grand mean.
- MS_{treat}: SS_{treat} divided by its degrees of freedom, that is, the squared deviations among the τ_{j}'s per degree of freedom.
- MS_{error}: SS_{error} divided by its degrees of freedom.
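Putting these pieces together, the test statistic is the ratio of the two mean squares. Written out with the subscripts used above:

```latex
F = \frac{MS_{treat}}{MS_{error}}
  = \frac{SS_{treat} / df_{treat}}{SS_{error} / df_{error}}
```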

## Sums of Squares in ANOVA

### Partitioning the Sums of Squares

Every individual in a data set has a value for **treatment deviation** and **error deviation**; we are partitioning each individual’s score into one part treatment and one part error.

The calculations for SSerror and SStreatment share a common term: ŷij, the predicted score for an individual. Note that this term is just the individual’s group mean, without taking individual error into account. Adding the deviation used for SSerror (Yij - ŷij) to the deviation used for SStreatment (ŷij - Ȳ, where Ȳ is the grand mean) cancels the common term ŷij and leaves Yij - Ȳ. This is how much an individual’s score deviates from the grand mean, and it is the deviation used for SStotal.
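The three sums of squares can be written out as follows. This is a reconstruction from the verbal description, with i indexing individuals and j indexing groups; the sums of squares add up because the cross-product of the error and treatment deviations sums to zero within each group.

```latex
\begin{aligned}
SS_{error} &= \sum_{j}\sum_{i} \left(Y_{ij} - \hat{Y}_{ij}\right)^2 \\
SS_{treatment} &= \sum_{j}\sum_{i} \left(\hat{Y}_{ij} - \bar{Y}\right)^2 \\
SS_{total} &= \sum_{j}\sum_{i} \left(Y_{ij} - \bar{Y}\right)^2
            = SS_{error} + SS_{treatment}
\end{aligned}
```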

Analysis of variance (ANOVA) helps us understand how big the treatment effect is relative to error.

#### Example

Consider 2 individuals in the cost of flight example: Y(10)(1) and Y(10)(2).

In the first graph, we see the distance between their individual scores (orange dots) and the grand mean (blue dots), which represents each individual’s **total deviation**. The second graph shows the distance between their individual scores (orange dots) and their own group means (dark blue dots), which is also known as **error deviation**. The third graph shows the distance between their own group means (dark blue dots) and the grand mean (blue dots), which is also known as **treatment deviation**. The **total deviation** is perfectly cut up into one part **error deviation** and one part **treatment deviation**.
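The same partition can be checked numerically. The sketch below uses made-up scores for three hypothetical groups (these are not the cost of flight data) and confirms that each individual's total deviation splits into an error part and a treatment part, and that the sums of squares add up.

```python
import statistics

# Hypothetical scores for three groups (made up for illustration only)
groups = {
    "A": [200.0, 220.0, 240.0],
    "B": [260.0, 300.0, 310.0],
    "C": [150.0, 170.0, 190.0],
}

all_scores = [y for scores in groups.values() for y in scores]
grand_mean = statistics.mean(all_scores)

ss_total = ss_error = ss_treatment = 0.0
for name, scores in groups.items():
    group_mean = statistics.mean(scores)         # predicted score for this group
    for y in scores:
        total_dev = y - grand_mean               # individual vs. grand mean
        error_dev = y - group_mean               # individual vs. own group mean
        treatment_dev = group_mean - grand_mean  # group mean vs. grand mean
        # Each total deviation splits exactly into error + treatment
        assert abs(total_dev - (error_dev + treatment_dev)) < 1e-9
        ss_total += total_dev ** 2
        ss_error += error_dev ** 2
        ss_treatment += treatment_dev ** 2

print("SS total:", round(ss_total, 2))
print("SS error + SS treatment:", round(ss_error + ss_treatment, 2))
```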

### Understanding SStreatment

The treatment deviation is the same for every individual in the same group, because it is simply the treatment offset between the group’s mean and the grand mean. Therefore, the equation for calculating SStreatment can also be written as the sum, across groups, of nj × τj², where:

- nj = number of individuals in group j, and
- τj = group j's treatment offset (the difference between group j’s mean and the grand mean)
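A quick numeric check of this shortcut, using made-up groups of unequal size (the data and variable names are illustrative assumptions): summing one squared treatment deviation per individual gives the same result as summing the group size times the squared offset once per group.

```python
import statistics

# Hypothetical groups with unequal sizes (made up for illustration only)
groups = [[4.0, 6.0, 8.0, 6.0], [9.0, 11.0], [1.0, 3.0, 2.0]]

scores = [y for g in groups for y in g]
grand_mean = statistics.mean(scores)

# Way 1: one squared treatment deviation per individual
ss_per_individual = sum(
    (statistics.mean(g) - grand_mean) ** 2 for g in groups for _ in g
)

# Way 2: group size times the squared offset, once per group
offsets = [statistics.mean(g) - grand_mean for g in groups]
ss_per_group = sum(len(g) * tau ** 2 for g, tau in zip(groups, offsets))

print(round(ss_per_individual, 4), round(ss_per_group, 4))
```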

## Degrees of Freedom

- Degrees of freedom represent how much independent information we have.
- In ANOVA, we are allocating some of the independent information to SStreatment, and some to SSerror. Similar to sums of squares, the total degrees of freedom we have in a data set (**DFtotal**) is going to be one part **DFerror** and one part **DFtreatment** (see figure).
- Like the sums of squares, the degrees of freedom add up: **DFerror + DFtreatment = DFtotal**

### Degrees of Freedom for SStotal (DFtotal)

- Recall that we use SS/(n-1) to calculate variance in a sample. The n-1 here represents the degrees of freedom for SStotal.
- We use n-1 because we have to calculate the grand mean in order to calculate the sums of squares and the variance, and calculating the grand mean takes away one degree of freedom, or one independent piece of information.
- For example, if we have 15 pieces of independent information in a data set, we have 15 degrees of freedom to start with. However, after calculating the grand mean, we only have 14 pieces of independent information, because we can calculate the last piece from the grand mean and the other 14, so that piece of information is no longer independent. Our degrees of freedom become 15-1=14.

### Degrees of freedom for SStreatment (DFtreatment)

- DFtreatment = j-1, where j=the number of groups.
- This is because the treatment offsets for all groups have to add up to 0, because we found these offsets from the grand mean. Therefore, we only have j-1 degrees of freedom or pieces of independent information.
- For example, if there were three groups, and we knew the treatment offset of two groups, we can calculate the third treatment offset because the sum of these offsets is equal to 0. Therefore, the third treatment offset is not independent information, and we have 3-1=2 degrees of freedom.

### Degrees of freedom for SSerror (DFerror)

- DFerror uses up the remaining degrees of freedom we have and can be calculated in two ways:
- DFerror = (n1 - 1) + (n2 - 1) + ... + (nj - 1). The first equation subtracts one degree of freedom from the number of individuals in each group (for a similar reason as when calculating DFtreatment) and adds across all groups.
- DFerror = N - (j - 1) - 1 = N - j. The second equation subtracts DFtreatment (j-1) and the one DF we used for calculating the grand mean from the total number of individuals in the data set (N).
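The degrees of freedom bookkeeping can be sketched as a small function. The name `degrees_of_freedom` and the example of three groups of five individuals are assumptions chosen to match the 15-observation and three-group examples above.

```python
def degrees_of_freedom(group_sizes):
    """Return (df_total, df_treatment, df_error) for a one-way design."""
    n_total = sum(group_sizes)
    n_groups = len(group_sizes)
    df_total = n_total - 1            # one df spent on the grand mean
    df_treatment = n_groups - 1       # number of groups minus one
    # DFerror, first way: lose one df per group, then add across groups
    df_error = sum(n - 1 for n in group_sizes)
    # DFerror, second way: total N minus DFtreatment minus one for the grand mean
    assert df_error == n_total - df_treatment - 1
    return df_total, df_treatment, df_error

df_total, df_treatment, df_error = degrees_of_freedom([5, 5, 5])
print(df_total, df_treatment, df_error)   # 14 2 12
```

Both ways of calculating DFerror agree, and DFerror + DFtreatment equals DFtotal.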

## General Linear Test Approach

**General Linear Test Formula**

In the general linear test, we add parameters and see how much error we remove by doing so, per parameter added. We compare the amount of error in the full model versus the amount of error in the reduced model. It is a **model comparison**.
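The comparison is commonly written as the ratio below. The symbols SSE and df with "reduced" and "full" subscripts are notation assumed here for the sums of squares error and degrees of freedom of each model.

```latex
F = \frac{\left(SSE_{reduced} - SSE_{full}\right) / \left(df_{reduced} - df_{full}\right)}
         {SSE_{full} / df_{full}}
```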

To do so, we:

- Specify a full (unrestricted) model and determine its amount of error.
- Specify a reduced (restricted) model and determine its amount of error.
- Take the reduction in error and divide it by the number of added parameters.
- Divide everything by the baseline error. We have error from the full model and from the restricted model, and we use the error from the full model. This is because the error from the full model is simply random, whether or not the null hypothesis is true. So it is the best, unbiased guess of the population variance, since it is uncontaminated by treatment offsets (if the treatment is actually even there).

With this test, we still find an F-statistic, and the test ends up being identical to the analysis of variance test.

**Full Model v Reduced Model**

*Full model*

Same as one factor linear model

Yij = grand mean + treatment offset + individual error

*Reduced model*

Under the null hypothesis, we constrain the τj's (tau sub j’s) to be zero

Yij = grand mean + individual error

We want to test whether the full model fits our data better than the reduced model.

What we are trying to determine is whether, for every parameter added, we have reduced error by more than we would expect from sampling error alone.

**Further Explanation of Each Component of the Formula**

*Reduction in error*: The reduced model will always have more error, so we take the sums of squares error of the reduced model, and subtract the sums of squares error of the full model (to get a positive value).

*Added parameters*: The reduced model will always have fewer parameters than the full model, so the difference in degrees of freedom of the two models will give us the number of added parameters. Again, we start with the reduced model, and we take the degrees of freedom from the reduced model, and subtract the degrees of freedom from the full model.

*Baseline error*: This is the amount of error we should expect in the world. We have two different measures of error, the error from the restricted and the full models. Again, we use the error from the full model. This is because the error from the full model is simply random, whether or not the null hypothesis is true. So it is the best, unbiased guess of the population variance, since it is uncontaminated by treatment offsets (if the treatment is actually even there).

**Graphically:**

We want to see if the treatment offsets (how the group means differ from the grand mean) are larger than what random grouping would produce.

Assuming we have the same number of observations, the more groupings we have, the more likely we are to see deviations from the grand mean. The whole purpose of the general linear test is to see how much we reduced error, per parameter added.

**How the General Linear Test Compares to the Analysis of Variance**

When we take a look at the numerator, we have the *sums of squares error from the reduced model* minus the *sums of squares error from the* **full** *model*. All of that is divided by the *degrees of freedom from the reduced model* minus the *degrees of freedom from the* **full** *model*.

The sums of squares and degrees of freedom from the reduced model are really just the sums of squares and degrees of freedom total, and the sums of squares and degrees of freedom from the full model are really the sums of squares and degrees of freedom error. We know that total - error = treatment. This simply turns the entire numerator into the **mean square treatment.**

With the denominator, we have the baseline error, which we used the full model to represent. In the denominator, we have the *sums of squares error from the full model* divided by the *degrees of freedom from the full model*. This is equivalent to the *mean square for error* we developed in the analysis of variance test.

Therefore, the entire formula is simplified to mean square treatment over mean square error.
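This equivalence can be checked on a small data set. The sketch below uses made-up scores for three hypothetical groups of four and computes F both ways: as a general linear test from the reduced and full model errors, and as mean square treatment over mean square error.

```python
import statistics

# Hypothetical data: three groups of four scores (made up for illustration)
groups = [[3.0, 5.0, 4.0, 6.0], [8.0, 7.0, 9.0, 10.0], [5.0, 6.0, 4.0, 7.0]]
scores = [y for g in groups for y in g]
n_total, n_groups = len(scores), len(groups)
grand_mean = statistics.mean(scores)

# Reduced model: every score is predicted by the grand mean
sse_reduced = sum((y - grand_mean) ** 2 for y in scores)
df_reduced = n_total - 1

# Full model: every score is predicted by its own group mean
sse_full = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
df_full = n_total - n_groups

# General linear test: reduction in error per added parameter, over baseline error
f_glt = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

# ANOVA: mean square treatment over mean square error
ss_treatment = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ms_treatment = ss_treatment / (n_groups - 1)
ms_error = sse_full / (n_total - n_groups)
f_anova = ms_treatment / ms_error

print("General linear test F:", round(f_glt, 4))
print("ANOVA F:              ", round(f_anova, 4))
```

The two printed values should match, since the reduced model's error is SStotal and the full model's error is SSerror.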