# Module 2:6 - Assumptions of GLM

## A - Introduction

### 1 - Assumptions of the General Linear Model

a) The errors in the population are normally distributed around 0.

b) The different treatment groups do not differ in their variance.
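These two assumptions can be made concrete with a small simulation. The sketch below (a hypothetical three-group design with made-up means) generates data that satisfies both: every group's errors come from the same Normal(0, σ²) distribution, so only the means differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-factor design: three treatment groups whose means differ,
# but whose errors are all drawn from the same Normal(0, sigma) distribution.
sigma = 2.0                                  # common error SD (assumption b)
group_means = {"A": 10.0, "B": 12.0, "C": 15.0}

data = {g: m + rng.normal(0.0, sigma, size=50)   # errors ~ N(0, sigma^2) (assumption a)
        for g, m in group_means.items()}

# Under homogeneity, each group's sample SD estimates the same sigma.
for g, y in data.items():
    print(g, round(y.std(ddof=1), 2))
```

Under these assumptions, the three printed sample standard deviations should all hover around the common σ of 2.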

### 2 - Types of Unruly Data

a) Heterogeneous Error Variance: When the variance is not the same across all treatment groups.

b) Non-Normal Error Distribution: When the errors in the population are not normally distributed.

c) Outliers: Can represent error introduced into the data that could skew the sample distribution and distort other quantities such as statistical power. They have to be accounted for a priori to avoid experimental bias.
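One common a-priori screening rule (an illustration, not the only option) is to flag points beyond 1.5 × IQR from the quartiles. The data below are made up, with one gross recording error planted in the sample:

```python
import numpy as np

# Hypothetical sample with one gross recording error (the 250.0).
y = np.array([9.8, 10.1, 10.4, 9.6, 10.0, 10.2, 9.9, 250.0])

# Screening rule: flag points beyond 1.5 * IQR outside the quartiles.
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = y[(y < lo) | (y > hi)]

print(outliers)  # only the 250.0 is flagged
```

Whether a flagged point is removed, corrected, or kept should be decided by the screening rule chosen before the analysis, not by how the point affects the results.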

## B - Homogeneity of Error for One Factor Models

### 1 - Homogeneity

a) All distributions for each level will have the same variance (Fig. 1.1).

b) The assumption made in the GLM is in accordance with reality.

• Please note that the assumption of equal variance is independent of whether H0 is true or false.

### 2 - Heterogeneity

a) The actual distribution of each group's variance will be different (Fig. 1.2).

b) The assumption made in the GLM is not in accordance with reality.

### 3 - Significance

The p-values that the model produces are derived from the assumption that each treatment group has equal variance in the population. When this is not the case, the model risks overestimating or underestimating the mean square error of the treatment groups, thus yielding p-values that do not accurately reflect the population.
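This distortion can be demonstrated by simulation. In the sketch below (a made-up worst case, assuming scipy is available), the null hypothesis is true in every replicate, yet a small group with a large variance drives the ANOVA's false-positive rate well above the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate the null (all group means equal) 2000 times with *unequal*
# variances and unequal group sizes -- a classic worst case for ANOVA,
# since the small, noisy group's mean varies more than MSE accounts for.
reps, alpha = 2000, 0.05
false_pos = 0
for _ in range(reps):
    g1 = rng.normal(0, 1, size=40)   # large group, small variance
    g2 = rng.normal(0, 1, size=40)
    g3 = rng.normal(0, 4, size=8)    # small group, large variance
    _, p = stats.f_oneway(g1, g2, g3)
    false_pos += p < alpha

print(false_pos / reps)  # typically well above the nominal 0.05
```

When the noisier group is instead the larger one, the test tends to err in the other direction (too conservative), which is why the text says the model can over- or underestimate the error.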

## C - Homogeneity of Error for Two Factor Models

Homogeneity in a two-factor model is very similar to that in a one-factor model.

### 1 - Homogeneity

a) The variance across all levels of both factors will be equal (Fig. 2.1).

b) Again, the assumption of the GLM is in accordance with reality.

### 2 - Heterogeneity

a) The variance across all levels of both factors is not equal (Fig. 2.2).

b) The assumption of the GLM is not in accordance with reality.

## D - Testing for the Homogeneity of Variance Assumption

### 1 - Testing Within a Single Factor

a) To verify the homogeneity of variance assumption we can use the "Fit Y by X" platform in JMP. This platform compares only across levels of the same factor, not across factors.

b) JMP performs four different hypothesis tests of the null hypothesis that the variance does not differ across the levels of one factor; low p-values indicate that our data violate the assumption. In this example, all four tests yielded low p-values, leading to the conclusion that the variance between levels is in fact different.

c) As mentioned before, this does not compare levels between factors. To fully verify our assumption we need to know whether the variance is equal not only within the levels of one factor but also across the combinations of levels of all factors.

### 2 - Testing Between Multiple Factors

a) To do this in JMP we again go to the "Fit Y by X" platform, but this time with a small modification:

b) What JMP does here is cross every level of one factor with every level of the other, thus producing every possible combination. As in the previous window, JMP will once again perform the appropriate tests, this time applied across factors.

c) In this example, the p-values are extreme enough to conclude that the assumption of homogeneity is in fact violated.
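The crossing step amounts to treating each factor-level combination as its own group and running one variance test over all the cells. A minimal sketch, with made-up 2×2 data in which a single cell is much noisier than the rest:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical 2x2 design: variance depends on the *combination* of factors.
cell_sds = {("A", "x"): 1.0, ("A", "y"): 1.0,
            ("B", "x"): 1.0, ("B", "y"): 5.0}   # one cell much noisier

cells = {combo: rng.normal(0.0, sd, size=25) for combo, sd in cell_sds.items()}

# "Crossing" the factors = one group per cell, then a single variance test
# across all cells (Brown-Forsythe here, as a stand-in for JMP's tests).
_, p = stats.levene(*cells.values(), center="median")
print(p)  # small -> the two-factor homogeneity assumption is violated
```

Note that testing each factor separately could miss this pattern: within factor 1 alone, levels A and B pool the quiet and noisy cells, diluting the difference.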

### 3 - Visualizing the Assumption

a) Hypothesis tests are not always the most useful way to meaningfully represent the homogeneity of variance assumption. To obtain a useful visualization of the spread of variances in our data, we can use the variability/attribute Gauge platform.

b) The first input shows in a visual plot the size of the variability within each combination of factors. This is useful since sometimes the effect size is more meaningful than mere statistical significance.

Another output that JMP will produce is a control chart, which delineates the region where variances do not differ from the average.

c) This example shows the control chart for route variance, in which we see that Gilman falls far below the shaded region denoting the average.

d) It is important to remember that the visualizations that the "variability/attribute Gauge" produces do not necessarily lead to the same inferences as the previous formal hypothesis tests.
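A crude numeric stand-in for the control chart's idea (not JMP's actual limit computation) is to compare each cell's standard deviation to the average SD across cells; the data and the flagging threshold below are made up, with a "Gilman" cell planted far below the others to echo the example:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical cells; "Gilman" is deliberately far less variable.
cells = {
    "route1": rng.normal(0.0, 2.0, size=40),
    "route2": rng.normal(0.0, 2.1, size=40),
    "Gilman": rng.normal(0.0, 0.4, size=40),
}

sds = {name: y.std(ddof=1) for name, y in cells.items()}
avg_sd = sum(sds.values()) / len(sds)

# Arbitrary illustrative threshold: flag cells below half the average SD.
for name, sd in sds.items():
    flag = "LOW" if sd < 0.5 * avg_sd else ""
    print(f"{name}: sd={sd:.2f} (avg {avg_sd:.2f}) {flag}")
```

As the notes caution, a cell flagged by a visual or rule-of-thumb comparison like this need not be flagged by the formal hypothesis tests, and vice versa.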