Module 2:5 - Factorial ANOVA in JMP
- 1 A - Introduction
- 2 B - Using the Fit Model
- 3 C - Factorial ANOVA Larger Than 2x2
A - Introduction
Factorial ANOVA in JMP considers multiple factors and their interactions, moving beyond the single-factor analyses of previous modules. In this module, we will look at various methods to extract and display information from a 2x2 design as well as from designs larger than 2x2, such as a 4x4. The emphasis is on examining each factor's overall effect as well as specific effects that depend on the level of another factor.
Before analyzing any data, it is crucial to check the data set for idiosyncrasies: confirm that there are observations for each combination of factor levels, that the number of observations is consistent across groups, and that the observations are roughly normally distributed.
- View using Analyze > Distribution > drop all columns into Y, Columns
B - Using the Fit Model
1 - Factorial ANOVA
When conducting a factorial ANOVA, Fit Model is the most useful platform in JMP because it lets us fit models built from different model effects. It is, in essence, a general model with different personalities. Before you run the model, make sure the Personality is set to Standard Least Squares (JMP should select this automatically, but double-check to be sure) and change the Emphasis from Effect Leverage to Minimal Report.
First, place your response variable in the Y box. Then designate the model effects (model terms): select the first factor in the column list on the left-hand side and click Add, then repeat the process for the second factor. To add the interaction between the two, select one factor in the model effects list on the right-hand side and the other factor on the left-hand side, then click Cross. A much quicker way is to select both factors at once on the left-hand side and then choose Full Factorial under Macros (on the right-hand side). Then run the analysis.
After running the analysis, we get the following information from our output:
- The intercept of the model is y-bar, the grand mean across all observations.
- A1 is the offset associated with Gilman Drive. We don't have a value for A2 because it is redundant: knowing A1, we can find A2.
- B1 is the offset associated with the first level of time of day. Again, B2 is not given because it is determined by B1.
- AB11 is the first offset for the interaction terms.
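To make these offsets concrete, here is a minimal Python sketch of how they can be computed by hand for a balanced 2x2 design. It assumes sum-to-zero (effect) coding, which is consistent with A2 being recoverable from A1. The times are made-up numbers for illustration, not the course data, and all variable names are hypothetical.

```python
from statistics import mean

# Made-up times to campus in seconds (NOT the course data), 3 trips per cell.
data = {
    ("Gilman", "8am"): [660, 670, 680],
    ("Gilman", "9:30am"): [600, 610, 620],
    ("La Jolla Village", "8am"): [760, 770, 780],
    ("La Jolla Village", "9:30am"): [640, 650, 660],
}
routes = ["Gilman", "La Jolla Village"]
times = ["8am", "9:30am"]

# Cell means, then marginal means (valid shortcuts because the design is balanced).
cell = {key: mean(values) for key, values in data.items()}
grand = mean([cell[(r, t)] for r in routes for t in times])
route_mean = {r: mean([cell[(r, t)] for t in times]) for r in routes}
time_mean = {t: mean([cell[(r, t)] for r in routes]) for t in times}

# Offsets under sum-to-zero coding (assumed).
a1 = route_mean["Gilman"] - grand                 # offset for the first route
b1 = time_mean["8am"] - grand                     # offset for the first time of day
ab11 = cell[("Gilman", "8am")] - grand - a1 - b1  # first interaction offset

# The redundant terms follow from the constraints: A2 = -A1, B2 = -B1.
a2, b2 = -a1, -b1

print("intercept:", grand)
print("A1:", a1, "A2:", a2)
print("B1:", b1, "B2:", b2)
print("AB11:", ab11)
```

With only two levels per factor, every other interaction offset is either AB11 or its negative, which is why a single interaction estimate is enough in a 2x2.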
Another aspect of the output is the Effect Tests section. Unlike in a one-way ANOVA, in a full factorial the effect tests are quite useful. In this output, we have the sum of squares for each factor and for the interaction listed separately, so that we can see the effects of each factor as well as the effect of factor A by factor B. We also have an F ratio and a p-value for each factor and for the interaction.
Now let's take a look at the Analysis of Variance: This is great for testing all the terms at once. First, we have the Sums of Squares total, the total variation in y. Next we have the Sums of Squares Error, the total amount of error remaining after we fit each factor. Then, the Sums of Squares Model, the total sums of squares that are explained by the different sources in the model (the main effects and interaction).
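The partition of the sums of squares can be sketched by hand in Python for a balanced design. This is only an illustration under stated assumptions: the times are made-up numbers (not the course data), the design is balanced with 3 observations per cell, and the variable names are hypothetical. The sketch stops at the F ratios and does not compute p-values.

```python
from statistics import mean

# Made-up times to campus in seconds (NOT the course data), 3 trips per cell.
data = {
    ("Gilman", "8am"): [660, 670, 680],
    ("Gilman", "9:30am"): [600, 610, 620],
    ("La Jolla Village", "8am"): [760, 770, 780],
    ("La Jolla Village", "9:30am"): [640, 650, 660],
}
routes = sorted({r for r, _ in data})
times = sorted({t for _, t in data})
n = 3  # observations per cell (balanced design assumed)

all_obs = [y for ys in data.values() for y in ys]
grand = mean(all_obs)
cell = {key: mean(ys) for key, ys in data.items()}
route_mean = {r: mean([cell[(r, t)] for t in times]) for r in routes}
time_mean = {t: mean([cell[(r, t)] for r in routes]) for t in times}

# Total variation in y, and what is left over within the cells.
ss_total = sum((y - grand) ** 2 for y in all_obs)
ss_error = sum((y - cell[key]) ** 2 for key, ys in data.items() for y in ys)

# Variation explained by each source in the model.
ss_a = n * len(times) * sum((route_mean[r] - grand) ** 2 for r in routes)
ss_b = n * len(routes) * sum((time_mean[t] - grand) ** 2 for t in times)
ss_ab = n * sum(
    (cell[(r, t)] - route_mean[r] - time_mean[t] + grand) ** 2
    for r in routes
    for t in times
)
ss_model = ss_a + ss_b + ss_ab

# Degrees of freedom and F ratios (each effect is tested against the error mean square).
df_a, df_b = len(routes) - 1, len(times) - 1
df_ab = df_a * df_b
df_error = len(all_obs) - len(data)
ms_error = ss_error / df_error
f_a = (ss_a / df_a) / ms_error
f_b = (ss_b / df_b) / ms_error
f_ab = (ss_ab / df_ab) / ms_error

print("SS total:", ss_total, "= SS model", ss_model, "+ SS error", ss_error)
print("F ratios:", f_a, f_b, f_ab)
```

In a balanced design the model sum of squares splits cleanly into the two main effects and the interaction, which is the same breakdown the Effect Tests section reports.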
And lastly, we have the Effect Details. Here we have the Least Squares Means Table, which provides details on the observed means for each level of each factor (i.e. the different routes, the times of day, and their combinations). Though the table is useful, it is most useful to look at plots of these means, which provide a visualization of the effects. Both the table and the plot are accessible under the red triangle of each source, such as RoutexTime of Day, and are labeled LS Means Table and LS Means Plot respectively.
From this, we can see the time to campus least squares means by route. Around each point is the 95% confidence interval, and the plot shows that on average La Jolla Village Drive was slower than Gilman Drive. We can confirm this statistically by looking back at the original tests. The route by time of day factorial plot is useful because we can see the effects of the routes (the midpoints between the lines) and the overall effect of each time of day (the midpoints of the time of day). To switch the factors on the factorial plot in JMP, deselect the plot --> hold down the Shift key --> re-select the plot and then release the Shift key. JMP will automatically switch the axes.
If we want to make predictions based on the analysis, we can use the Prediction Profiler.
To do this, click the topmost red triangle next to Response Time to Campus --> select Factor Profiling --> select Profiler. The profiler shows a predicted mean and confidence interval for the time to campus at whichever levels are currently selected. For example, if I take Gilman Drive to campus at 8 am, the prediction is that it would take 669 seconds to get to campus. What we really want to do with the profiler, though, is click between levels. Pay attention to the time of day line as you click between La Jolla Village Drive and Gilman Drive: the line jumps when the route changes (the effect switches). Likewise, when you click between 8 am and 9:30, there is a large difference in means at 8 am but a smaller difference at 9:30. The difference between times of day depends on which route we are taking. Whichever factor we look at first, the interaction is completely symmetrical.
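The symmetry of the interaction can be checked with a little arithmetic. The Python sketch below uses made-up cell means (not the course results) and hypothetical names. It computes the time of day difference within each route, the route difference within each time of day, and shows that the "difference of differences" is the same either way.

```python
# Made-up cell means in seconds (NOT the course results).
cell_mean = {
    ("Gilman", "8am"): 670,
    ("Gilman", "9:30am"): 610,
    ("La Jolla Village", "8am"): 770,
    ("La Jolla Village", "9:30am"): 650,
}
routes = ["Gilman", "La Jolla Village"]
times = ["8am", "9:30am"]

# Effect of time of day (8am minus 9:30am) at each route.
time_effect = {r: cell_mean[(r, "8am")] - cell_mean[(r, "9:30am")] for r in routes}

# Effect of route (La Jolla Village minus Gilman) at each time of day.
route_effect = {
    t: cell_mean[("La Jolla Village", t)] - cell_mean[("Gilman", t)] for t in times
}

# The interaction is the difference between these differences, taken either way.
interaction_by_route = time_effect["La Jolla Village"] - time_effect["Gilman"]
interaction_by_time = route_effect["8am"] - route_effect["9:30am"]

print("time of day effect by route:", time_effect)
print("route effect by time of day:", route_effect)
print("difference of differences:", interaction_by_route, interaction_by_time)
```

If there were no interaction, both differences of differences would be zero and the lines in the profiler would not move when the other factor is clicked.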
2 - Pairwise Comparisons
As mentioned in previous modules, pairwise comparisons test hypotheses about specific mean differences. For factorial ANOVA in JMP, there are a couple of compatible ways to accomplish this.
Recall that in a previous module the Student's t and Tukey HSD functions were used to compare group means. Under the red triangle of each source, there are LSMeans options for both functions. Turn off the Crosstab Report and turn on the Ordered Differences Report and Detailed Comparisons for a quick analysis of the comparisons.
1) Connecting Letters Report
- Comparisons differentiated by letters (levels that do not share a letter are significantly different).
2) Ordered Differences Report
- Consolidated table of comparisons.
- Has sampling distribution representation.
Tukey HSD is more conservative than Student's t because it corrects for all possible comparisons, which keeps the overall (family-wise) Type I error rate at the chosen alpha level at the cost of some power. In this particular sample of response time to campus data, both tests give the same results when comparing means. In summary, both Student's t and Tukey HSD output all possible pairwise comparisons.
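To see why a correction matters, the Python sketch below counts the pairwise comparisons among k means and approximates the chance of at least one false positive when each comparison is run uncorrected at alpha = 0.05. The formula 1 - (1 - alpha)^m assumes the m tests are independent, which pairwise comparisons are not, so treat the numbers as a rough illustration only. The function names are hypothetical.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k means."""
    return comb(k, 2)


def familywise_error(m, alpha=0.05):
    """Approximate chance of at least one false positive in m uncorrected tests.

    Assumes independent tests (a simplification for pairwise comparisons).
    """
    return 1 - (1 - alpha) ** m


# 2 levels of one factor, 4 cells of a 2x2, and 16 cells of a 4x4.
for k in (2, 4, 16):
    m = n_pairwise(k)
    print(f"{k} means -> {m} comparisons, "
          f"approx. family-wise error {familywise_error(m):.3f}")
```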
The LSMeans Contrast function, otherwise known as a linear contrast, allows for the comparison of specific means. This is located under the same red triangle that pulled up LSMeans Student's t and LSMeans Tukey HSD. The contrast function is especially helpful for larger designs that call for specific comparisons rather than sorting through many outputs.
- Click +1 for the first mean > -1 for the second mean > Done
However, remember that the contrast function is not a corrected test, so running many contrasts inflates the overall alpha level and can produce misleading results (this can be addressed with a Bonferroni correction).
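As a rough sketch of what a +1/-1 contrast computes, the Python below estimates a linear contrast from cell means, forms its t ratio using the error mean square from the full model, and shows the Bonferroni-adjusted alpha for a planned number of contrasts. The means, error mean square, and function names are all made up for illustration. This is not JMP's implementation, and no p-value is computed.

```python
from math import sqrt


def contrast(means, coefs, n_per_mean, mse):
    """Return (estimate, standard error, t ratio) for a linear contrast."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = sqrt(mse * sum(c ** 2 / n for c, n in zip(coefs, n_per_mean)))
    return estimate, se, estimate / se


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the family-wise level at or below alpha."""
    return alpha / n_tests


# Made-up cell means (seconds), 3 observations each, and a made-up error mean square.
means = [670, 610, 770, 650]
coefs = [1, 0, -1, 0]  # +1 for the first mean, -1 for the third mean
estimate, se, t_ratio = contrast(means, coefs, [3, 3, 3, 3], mse=100)

print("contrast estimate:", estimate)
print("standard error:", round(se, 3))
print("t ratio:", round(t_ratio, 3))
print("per-test alpha for 5 contrasts:", bonferroni_alpha(0.05, 5))
```

Each contrast's p-value would then be compared against the adjusted per-test alpha rather than against 0.05.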
C - Factorial ANOVA Larger Than 2x2
So far we've looked at factorial ANOVA examples consisting only of a 2x2 design, a 2-factor experimental design with only 2 levels of each factor.
However, as you might assume, there are many situations in which we will have more than 2 levels for each of our experimental factors. When analyzing a factorial ANOVA larger than a 2x2 we still use the 2-factor linear model; the only difference is that we'll have more aj's, bk's, and (ab)jk's. Our tests for the effects of our two factors, as well as our test for the interaction between factors, use the same equations as in a 2x2 design.
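For reference, the 2-factor linear model and its tests can be written out as follows. The notation is an assumption chosen to match the aj, bk, and (ab)jk terms above: i indexes observations within a cell, factor A has J levels, factor B has K levels, and the sum-to-zero constraints are the usual convention under which the last offset of each term is redundant.

```latex
\begin{aligned}
y_{ijk} &= \mu + a_j + b_k + (ab)_{jk} + \varepsilon_{ijk},
  \qquad j = 1,\dots,J,\quad k = 1,\dots,K \\[4pt]
\sum_{j} a_j &= 0, \qquad \sum_{k} b_k = 0, \qquad
  \sum_{j} (ab)_{jk} = 0 \ \text{for each } k, \qquad
  \sum_{k} (ab)_{jk} = 0 \ \text{for each } j \\[4pt]
F_A &= \frac{SS_A/(J-1)}{MS_E}, \qquad
F_B = \frac{SS_B/(K-1)}{MS_E}, \qquad
F_{AB} = \frac{SS_{AB}/\bigl((J-1)(K-1)\bigr)}{MS_E}
\end{aligned}
```

Setting J = K = 2 gives the 2x2 case, and J = K = 4 gives the 4x4 case with more offsets but the same F ratios.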
1 - Factorial ANOVA 4x4
For our example of a 4x4 factorial design we will use the data set titled Times To Campus 4x4.
Determining Equal Combination of Observations
The first thing you want to do before analyzing any data set, including this one, is to look at the distribution histograms and statistics to make sure that there are no idiosyncrasies or problems, such as extreme values (outliers), things that don't make sense, or typos and coding issues. There are three ways to check whether every combination of levels has an equal number of observations.
1. You can click through the different histograms in the distribution platform to make sure there are observations in each level.
- This is a more visual approach.
2. From the original data set go to Analyze --> Fit Y by X. Place "Time of Day" (nominal variable) in the Y box and "Route" (nominal variable) in the X Factor box, and hit OK. This produces a mosaic plot that allows you to see if you have equal observations across levels.
- This is also a more visual approach.
3. From the original data set go to Tables --> Summary. Select the continuous variable, "Time to Campus", and select N under the Statistics drop-down menu (we choose N here because the statistic that we are interested in is the number of observations in the "Time to Campus" column). Now put "Route" and "Time of Day" (the nominal variables) into the Group box, and hit OK. You will be provided with a table that shows the number of rows in each of the combinations.
- Tables --> Summary is a way of getting statistics for a combination of factors or for individual factors.
- Orthogonal Analysis of Variance: analysis of variance techniques work best when you have an equal number of observations in every cell.
- Having equal observations in every cell is not a requirement, but if there is ever a cell with 0 observations that is a problem.
- The Tables --> Summary method is recommended when doing analyses of variance with larger designs, with more than 2 factors.
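Outside of JMP, the same cell-count check can be sketched in a few lines of Python. The rows and level labels below are hypothetical placeholders, not the Times To Campus 4x4 data. The function counts the observations in every combination of levels, including combinations that never occur.

```python
from collections import Counter
from itertools import product


def cell_counts(rows):
    """Count observations in every (route, time of day) combination.

    Combinations that never occur are reported with a count of 0.
    """
    routes = sorted({route for route, _ in rows})
    times = sorted({time for _, time in rows})
    counts = Counter(rows)
    return {(r, t): counts.get((r, t), 0) for r, t in product(routes, times)}


# Hypothetical rows: one (route, time of day) pair per observation.
rows = (
    [("Route A", "8am")] * 3
    + [("Route A", "9:30am")] * 3
    + [("Route B", "8am")] * 3
    + [("Route B", "9:30am")] * 2  # one observation short
)

counts = cell_counts(rows)
is_balanced = len(set(counts.values())) == 1
empty_cells = [cell for cell, n in counts.items() if n == 0]

for cell, n in counts.items():
    print(cell, n)
print("balanced:", is_balanced)
print("empty cells:", empty_cells)
```

An unbalanced result is a warning rather than a stopper, but any entry in the empty cells list is a problem for the factorial analysis.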
Fitting The ANOVA
Once you are sure that your data is free of any problems you can fit the analysis of variance. To do this go to:
Analyze --> Fit Model --> Place the continuous variable, "Time to Campus", in the Y box --> Select both "Route" and "Time of Day" (the nominal variables) --> Select Full Factorial from the Macros drop-down menu --> Select Minimal Report from the Emphasis drop-down menu --> Hit the Run button
- In setting up the factorial model nothing changes between doing a 2x2 design or a 4x4 design
- The difference is seen after running the analysis in that the model now requires more parameters to represent all the different conditions (cells) in the design.
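The growth in parameters is easy to count. In this short Python sketch (the function name is hypothetical), a full factorial model for two factors needs one intercept, one fewer offset than levels for each main effect, and the product of those for the interaction, which always adds up to the number of cells.

```python
def n_parameters(levels_a, levels_b):
    """Number of estimated parameters in a two-factor full factorial model."""
    intercept = 1
    main_a = levels_a - 1           # last offset of factor A is redundant
    main_b = levels_b - 1           # last offset of factor B is redundant
    interaction = main_a * main_b   # remaining interaction offsets are redundant
    return intercept + main_a + main_b + interaction


print("2x2:", n_parameters(2, 2))
print("4x4:", n_parameters(4, 4))
```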
After running the analysis of variance, the first section you should explore is Effect Tests, which shows us the statistical significance of each of the different sources in our model: the main effects and the interaction.
Next we want to look at the Least Squares Means Plots. To do this we should expand the Effect Details tab.
- To quickly produce the LS Means Plots for all sources hold down the command/ctrl key --> click the red triangle beside Route --> Select LS Means Plot --> Now anywhere a LS Means Plot can be created JMP will create it for you.
- The LS Means Plot allows you to analyze factors while ignoring everything else.
- In the Factorial (Interaction) LS Means Plot, if the lines are not parallel there is something in addition to the overall effects that is happening in the data.
When examining the analysis of variance output in JMP it is wise to also produce the Profiler. To do this we go to the red triangle next to Response Time to Campus --> Scroll over Factor Profiling --> Select Profiler
- The Profiler allows you to click through dynamically to see what effects you have in one factor based on the level you are in of the other factor.
Now that we've done all of our overall analysis, we can run some follow-up tests.
2 - Testing Slices
A test of a slice is a linear contrast testing the overall effect of one factor at a single level of another factor. For example, we would be looking at the effect of time of day on a single route, such as Gilman Dr. Essentially, testing slices makes comparisons within one level of a factor at a time.
- Similarly, a sub-design ANOVA is where a single factor ANOVA is conducted on one factor and another factor is held constant.
Under the red triangle of the relevant source, in this case RoutexTime of Day, select Test Slices. In this particular example, there are four routes and four times of day being evaluated, which produces eight slices, one for each level of each factor. Each slice indicates which level is being held constant, so the slice with value Genessee Ave evaluates the time of day effect for Genessee Ave.
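A slice test can be sketched by hand as follows. This Python example assumes a balanced design and assumes each slice is tested against the pooled error mean square from the full model. The cell means, error mean square, and function names are made up for illustration and use a 2x2 for brevity. It is not a reproduction of JMP's output, and no p-values are computed.

```python
from statistics import mean


def slice_tests(cell_means, n, mse):
    """Test factor B within each level of factor A (balanced design assumed).

    cell_means maps each level of A to a dict of {level of B: cell mean}.
    Returns {level of A: (sum of squares, degrees of freedom, F ratio)}.
    """
    results = {}
    for level_a, row in cell_means.items():
        row_mean = mean(list(row.values()))
        ss = n * sum((m - row_mean) ** 2 for m in row.values())
        df = len(row) - 1
        results[level_a] = (ss, df, (ss / df) / mse)
    return results


def n_slices(levels_a, levels_b):
    """One slice per level of each factor."""
    return levels_a + levels_b


# Made-up cell means in seconds: time of day within each route.
by_route = {
    "Gilman": {"8am": 670, "9:30am": 610},
    "La Jolla Village": {"8am": 770, "9:30am": 650},
}
results = slice_tests(by_route, n=3, mse=100)

for route, (ss, df, f_ratio) in results.items():
    print(f"slice Route={route}: SS={ss}, df={df}, F={f_ratio}")
print("slices in a 4x4 design:", n_slices(4, 4))
```

To slice the other way (route within each time of day), pass the cell means keyed by time of day instead.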