Module 2:3 - ANOVA and Pairwise Comparisons in JMP
- 1 One-Factor ANOVA in JMP
- 2 Main Effect
- 3 Pairwise Comparison
- 4 The Problem of Multiplicity and Alpha Escalation
- 5 Planned vs. Unplanned Comparisons
- 6 Summary
One-Factor ANOVA in JMP
In JMP, we can perform one-factor (one-way) ANOVA using two different platforms: Fit Y by X and Fit Model. Using the Cost of Flight data, we will be seeking an explanation for the variance among groups, trying to answer the question: why do people's costs of flight differ from the grand mean? Our hope is to find a factor that can absorb, or explain, some of that variability, where variability means the differences between individual costs and the grand mean. In this module, we will explore the JMP features that let us perform ANOVA and pairwise comparisons.
Fit Y by X
Using the Fit Y by X platform in JMP (Analyze >> Fit Y by X), we assign
- Cost of flight as the Y (the response: the individual values containing the variability).
- Airline as the X (the factor: the variable we want to make predictions from).
From the red triangle menu for Oneway Analysis, select Means/Anova. In the resulting report, we will notice various displays providing important information for analyzing the data.
Analysis of Variance
Sum of Squares
- SStotal: C. Total, also known as the corrected total, is the sum of squares associated with the Y column (cost of flight). It is the sum of squares we get if we ignore the model altogether and simply look at the variability in cost of flight. Our ANOVA model works to explain this total variability on the basis of some treatment or group.
- SStreatment: the sum of squares associated with a particular explanatory factor, which is the Airline group in this example. This value shows how much of the total sum of squares the treatment/group can explain.
- SSerror: the sum of squares of the remaining variance. This is the within-group sum of squares, the degree to which individuals vary around their group means.
From the Sums of Squares, we also get the Mean Squares, the variance associated with each source.
- MStreatment: the Mean Square of Treatment/Group is the variance explained by the treatment category.
- MSerror: the Mean Square of Error is the variability left over after we've explained everything we could with the different groups.
With the Mean Square of Treatment/Group and the Mean Square of Error, we can obtain the F ratio, or Fobserved:
- If the null hypothesis stating that there are no differences among the group means is true, then we would expect an F ratio close to 1, since the Mean Square of Treatment/Group and the Mean Square of Error would both be estimating random variability.
- If the alternative hypothesis that at least one group mean differs from the others is true, we would expect an F ratio far from 1. A relatively large F ratio is unlikely to occur if the two mean squares are estimating the same thing.
A p-value can be derived from the calculated F ratio. The p-value is the probability, if the null is true, of observing an F ratio this far from 1 (or farther). In essence, using this cost of flight example, the p-value tells us the probability that randomly sampled groups would produce deviations resulting in treatment differences this large. The p-value is obtained from the degrees of freedom and the Fisher–Snedecor (F) distribution.
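The ANOVA table JMP builds can be sketched by hand. Below is a minimal Python sketch using scipy, with made-up cost-of-flight numbers (the real data table is not reproduced here), checked against scipy's built-in one-way ANOVA:

```python
# A hand computation of the one-way ANOVA table, using hypothetical
# cost-of-flight values for the three airlines in the example.
from scipy import stats

groups = {                      # hypothetical costs per airline
    "Delta":          [310, 295, 330, 305],
    "Southwest":      [210, 225, 200, 215],
    "Virgin America": [205, 230, 220, 212],
}

all_costs = [c for costs in groups.values() for c in costs]
grand_mean = sum(all_costs) / len(all_costs)

# SS_total: squared deviations of every cost from the grand mean
ss_total = sum((c - grand_mean) ** 2 for c in all_costs)

# SS_treatment: squared deviations of group means from the grand mean,
# weighted by group size
ss_treat = sum(len(cs) * ((sum(cs) / len(cs)) - grand_mean) ** 2
               for cs in groups.values())

ss_error = ss_total - ss_treat          # within-group variability

df_treat = len(groups) - 1              # k - 1
df_error = len(all_costs) - len(groups) # N - k

ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
f_obs = ms_treat / ms_error
p_value = stats.f.sf(f_obs, df_treat, df_error)  # Fisher–Snedecor F

# scipy's one-liner should agree with the hand computation
f_check, p_check = stats.f_oneway(*groups.values())
```

With these hypothetical numbers, the between-group differences are large relative to the within-group spread, so the F ratio is far from 1 and the p-value is small.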
Fit Model (Analyze >> Fit Model) is a platform that is useful for linear models more generally. In the launch window, the Personality tab indicates the method to be used for the desired analysis. For ANOVA, Standard Least Squares is used. After cost of flight is assigned as the Y variable, JMP automatically sets the proper Personality and Emphasis, which determine what is returned when the analysis runs. In the Model Effects section, we assign the effect or term we want in the model, which is Airline in our example.
Analysis of Variance
Fit Model provides the exact same analysis of variance summary as the one we've seen in Fit Y by X method.
- Note: the Sum of Squares for Treatment/Group is now labeled Model instead of Airline. The Model line can encompass additional factors entered into the model.
The Effect Tests summary identifies the Sum of Squares associated with each factor in a multiple-factor model. Using the Cost of Flight data as an example, if we were to add more factors such as "time of the year", "purchase method", "route of purchase", etc. to the model, we would pay closer attention to this section to see which factors contribute to the explanatory power of the overall model. Along with the Sum of Squares, the F observed and p-value associated with each specific factor are also included in the summary.
The Parameter Estimates section shows the specific estimates and tests associated with the parameterization of the model. These are the quantities of the mathematical form of ANOVA, i.e., the taus and the grand mean.
In this specific example:
- Intercept of the model: equivalent to the grand mean, the baseline from which the treatment offsets are "offset".
- τ1 is the estimated offset for Delta.
- τ2 is the estimated offset for Southwest.
- Notice that Virgin America has been omitted. This is because the degrees of freedom for groups is 2. Since we already have two pieces of information about individual group estimates, and the taus must sum to 0, we can recover the last offset (Virgin America). The model therefore does not refit an offset that does not need to be fitted statistically. In other words, statistical models do not redundantly estimate information that is already determined by the way we parameterize.
- To show all the offsets, select Red Triangle Menu >> Estimates >> Expanded Estimates.
- t ratio and p-value: tests of whether these parameters differ from 0. This information is useful for seeing whether a group (airline) has an estimate that differs from the grand mean.
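The effects parameterization above can be sketched in a few lines. This is a hedged illustration with hypothetical group means (not the real estimates), showing that with k groups only k − 1 offsets are free because the taus sum to zero:

```python
# Effects parameterization: grand mean plus per-group offsets (taus).
# Group means here are hypothetical; a balanced design is assumed,
# so the grand mean is the simple average of the group means.
group_means = {"Delta": 310.0, "Southwest": 212.5, "Virgin America": 216.75}

grand_mean = sum(group_means.values()) / len(group_means)
taus = {g: m - grand_mean for g, m in group_means.items()}

# The taus sum to zero by construction...
assert abs(sum(taus.values())) < 1e-9

# ...so the last offset (Virgin America) is fully determined by the
# other two and need not be estimated separately.
recovered = -sum(t for g, t in taus.items() if g != "Virgin America")
assert abs(recovered - taus["Virgin America"]) < 1e-9
```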
A Main Effect is the overall effect of a factor. A hypothesis test for a main effect is a test of whether there is evidence for an effect of the different treatments (the levels of the factor) in the population. A main effect test is rather unspecific, an omnibus test. When we reject the null, the main effect tells us only that there are differences somewhere among the group means.
A Pairwise Comparison is a hypothesis test of a specific mean difference. In the context of ANOVA, pairwise comparisons are useful when we follow up that omnibus test.
Fit Y by X
Using the same Cost of Flight data, we run the same Fit Y by X analysis:
- Cost of flight as the Y (the response: the individual values containing the variability).
- Airline as the X (the factor: the variable we want to make predictions from).
Each Pair, Student's t
Underneath the Red Triangle Menu, select Compare Means >> Each Pair, Student's t. This produces output comparing each group against every other group.
Ordered Differences Report
The Ordered Differences Report lists, for each comparison, the difference between the two means, the standard error of the difference, the upper and lower bounds of the confidence interval, and the p-value.
- The p-value reports the probability of observing a difference as extreme as the one we observed between the two groups if the null hypothesis (the difference between the means is 0) is true.
- The report is ordered by ascending p-value.
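An ordered-differences style report can be sketched with scipy. Note this is an illustration with hypothetical data, and unlike JMP, which pools the error variance across all groups, `ttest_ind` pools only the two groups in each pair:

```python
# Unadjusted pairwise Student's t-tests for every pair of groups,
# sorted by ascending p-value, using hypothetical cost-of-flight data.
from itertools import combinations
from scipy import stats

groups = {
    "Delta":          [310, 295, 330, 305],
    "Southwest":      [210, 225, 200, 215],
    "Virgin America": [205, 230, 220, 212],
}

rows = []
for (a, xs), (b, ys) in combinations(groups.items(), 2):
    diff = sum(xs) / len(xs) - sum(ys) / len(ys)   # difference of means
    t, p = stats.ttest_ind(xs, ys)                 # two-sample Student's t
    rows.append((a, b, diff, p))

for a, b, diff, p in sorted(rows, key=lambda r: r[3]):  # ascending p
    print(f"{a} vs {b}: diff = {diff:.2f}, p = {p:.4f}")
```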
Connecting Letters Report
The Connecting Letters Report is useful for showing many possible comparisons at once. By assigning each group one or more letters, the report illustrates significant differences among the groups based on the chosen criterion (the alpha level):
- If two groups do not share a letter, the two groups have a statistically significant difference.
- If two groups share a letter, the two groups do not have a statistically significant difference.
In this example, since Virgin America and Southwest share the same letter, it appears that these two groups do not have a statistically significant difference. Since Delta has a different letter from the rest of the groups, it appears that Delta is statistically significantly different from Virgin America and Southwest.
The gray circles that appear to the right of the ANOVA diamonds are the graphical representation of the Each Pair, Student's t test. Each circle identifies the confidence interval for that specific mean. When you select a circle, both the group title and the circle turn red. The circles that remain strongly gray mark the means that are statistically significantly different from the selected group.
- Note: when clicking on Southwest, Virgin America's circle also becomes highlighted in red, showing the same evidence suggested by the previous two reports.
Detailed Comparison Report
By selecting the Detailed Comparison Report (Red Triangle Menu of Comparisons for each pair using Student's t >> Detailed Comparison Report), a more detailed t-test report is produced for each pair of group means, including a representation of the sampling distribution and the associated t ratios.
Using the Fit Model platform, we can produce the Each Pair, Student's t test via the Effect Details report. This shows the details of the effects in our model, which is the single factor Airline in this specific example.
Least Square Means Table
In the Least Squares Means table, we can see the means for the different groups.
- Selecting the Least Squares Means Plot (Red Triangle Menu >> LS Means Plot) produces a plot of the means with confidence intervals; the tests among the means are the same tests as in Fit Y by X.
- Selecting LSMeans Student's t (Red Triangle Menu >> LSMeans Student's t) drops down a table with the difference for each combination of least squares means, computing a Student's t-test for each. Note: Least Squares Means are the means fit by least squares modeling, which are the same means we would find in a distribution breakdown by the group factor.
- The LSMeans Differences Student's t table gives a table with each combination of groups. This is useful when looking at the confidence interval of each difference.
The Problem of Multiplicity and Alpha Escalation
- Multiplicity: refers to running multiple pairwise comparisons, which inflates the familywise error rate (FWER)
- Alpha escalation: refers to the elevation of the effective alpha level due to multiplicity
- Familywise Error Rate (FWER): the probability of making one or more false alarms when performing multiple pairwise comparisons; it leads to "alpha escalation"; it is also called the experimentwise error rate
- αFWER: the alpha level when the FWER is taken into account
JMP Demonstration with Sample Data-IQ Drugs
Data table to work with: Hypothetical IQ Data, 1000 People Taking 20 Different Drugs
- The file is located at: Module 2-3 journal >> Module 2:3 >> Pairwise Comparisons >> Data >> IQ Drugs
- The IQ scores in the data are generated by a random function with a normal distribution
One-Way ANOVA (Omnibus Test)
The p-value is 0.8666, not statistically significant, so we fail to reject the null. This result is expected because the IQ scores are generated at random from a normal distribution, so the group means should not differ systematically from the grand mean.
- To run the one-way ANOVA: Analyze >> Fit Y by X >> Y = IQ, X = Drugs >> Red Triangle under Oneway Analysis >> Means/Anova
Student's t test
However, the result is very different if we run the Each Pair, Student's t test, which runs all possible individual comparisons with no adjustment for multiple testing. That is, we compare every drug with every other drug.
- To run Student's t: Analyze >> Fit Y by X >> Y = IQ, X = Drugs >> Red Triangle under Oneway Analysis >> Compare Means >> Each Pair, Student's t
- The Connecting Letters Report shows that Drugs R and B are statistically significantly different from Drug E.
- The Ordered Differences Report shows, in red, two comparisons with p-values below 0.05, representing statistical significance.
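The IQ Drugs demonstration can be reproduced in spirit with a small simulation: draw 20 groups from the same normal distribution (so the null is true by construction) and count how many unadjusted pairwise t-tests still come out "significant". Group size and seed here are arbitrary choices, not taken from the course data:

```python
# Simulating the demonstration: 20 groups drawn from the SAME normal
# distribution, so any "significant" pairwise t-test is a false alarm.
from itertools import combinations
import random

from scipy import stats

random.seed(1)  # arbitrary seed for reproducibility
groups = [[random.gauss(100, 15) for _ in range(50)] for _ in range(20)]

false_alarms = sum(
    1 for xs, ys in combinations(groups, 2)
    if stats.ttest_ind(xs, ys).pvalue < 0.05
)
n_tests = 20 * 19 // 2          # 190 pairwise comparisons among 20 drugs
print(f"{false_alarms} of {n_tests} tests significant at alpha = 0.05")
```

With 190 comparisons at alpha = 0.05, roughly 5% of the tests (on the order of ten) are expected to cross the threshold by chance alone, even though no drug has any real effect.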
Inference about the Problem of Multiplicity and Alpha Escalation
While our ANOVA did not produce a statistically significant result, the Student's t tests did.
This difference is due to the FWER. In the Student's t tests, we ran many different comparisons, giving many opportunities for a false alarm. Simply put, if we run many random comparisons, there is a higher chance of rejecting the null even when the null is true, because it becomes more likely that we compare a mean that happens to be somewhat high with a mean that happens to be somewhat low, producing statistical significance. Therefore, we may infer that the statistically significant results from the Student's t tests may have been false alarms.
We have as many false alarm opportunities as the number of comparisons we are making. Namely, if we are making 3 different comparisons using Student's t, there are 3 opportunities to make a false alarm. This is what the calculation of αFWER captures:
αFWER = 1 − (1 − αeach comparison)^g
- αFWER is the alpha level when the FWER is considered
- αeach comparison is the alpha level of each comparison set by the experimenter, usually 0.05
- 1 − αeach comparison is the probability of not making a false alarm in one comparison
- g is the number of comparisons made
- (1 − αeach comparison)^g is the probability of not making a false alarm in any of g comparisons
- 1 − (1 − αeach comparison)^g is the probability of making at least one false alarm in g comparisons
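The alpha-escalation formula above is easy to evaluate directly. The values of g below are illustrative (190 corresponds to all pairs among 20 drugs):

```python
# alpha_FWER = 1 - (1 - alpha_each)^g, evaluated for a few values of g.
alpha_each = 0.05

def fwer(g, alpha=alpha_each):
    """Probability of at least one false alarm in g independent comparisons."""
    return 1 - (1 - alpha) ** g

for g in (1, 3, 10, 190):       # 190 = all pairs among 20 drugs
    print(f"g = {g:3d}: alpha_FWER = {fwer(g):.3f}")
```

Even at g = 10, the familywise error rate is already about 0.40; at g = 190 a false alarm is nearly certain.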
Planned vs. Unplanned Comparisons
Because multiplicity and alpha escalation are problems when conducting many Student's t comparisons, we use different tests when comparing many means to each other.
Controlling Alpha for Planned Comparison
A Planned Comparison (A Priori) is a specific comparison of means that a researcher was interested in testing before looking at the data.
Use the Bonferroni correction to compute the corrected alpha for planned comparisons.
αB = αFWER / g
- αB: the alpha level for each planned comparison
- αFWER: the desired maximum familywise error rate
- g: the number of specific planned comparisons
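The Bonferroni correction is just a division, and it is worth checking that the corrected per-comparison alpha keeps the familywise rate at or below the target. A short sketch, using three planned comparisons as an example:

```python
# Bonferroni: alpha_B = alpha_FWER / g for g planned comparisons.
def bonferroni_alpha(alpha_fwer=0.05, g=1):
    """Per-comparison alpha that caps the familywise rate at alpha_fwer."""
    return alpha_fwer / g

alpha_b = bonferroni_alpha(0.05, 3)      # three planned comparisons

# Sanity check: with each test run at alpha_b, the familywise error
# rate 1 - (1 - alpha_b)^g stays at or below the 0.05 target.
fwer = 1 - (1 - alpha_b) ** 3
assert fwer <= 0.05
print(f"alpha_B = {alpha_b:.4f}, resulting FWER = {fwer:.4f}")
```

The Bonferroni bound is slightly conservative: the achieved familywise rate comes out a bit below 0.05, which is why it loses power as g grows.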
To run a planned comparison test using the Bonferroni correction in JMP: [[Analyze > Fit Y by X > red triangle under Oneway Analysis > Set α Level > Other > type the αB value]]
- This doesn't change the JMP graph, and the p-values in the Ordered Differences report do not change. Noticeably, JMP still colors statistical significance based on an alpha of 0.05, because that is JMP's global setting. That is, p-values below 0.05 but above αB are shown in red even though they are not statistically significant according to the new alpha, αB.
- The Bonferroni correction only changes the value of alpha used in the confidence quantile. Thus, we use this alpha in the confidence quantile to determine which differences are statistically significant.
Controlling Alpha for Unplanned Comparison
An Unplanned Comparison (Post Hoc) is any comparison of means that a researcher is interested in testing after looking at the data. Unplanned comparisons are tests that were suggested by the data.
- Unplanned comparisons require a stricter correction because they allow you to capitalize on chance. In other words, the chance of a false alarm is larger for unplanned comparisons.
- If the null is true, you don't know which means are the most different by chance alone. So if you compare the two that happen to be the most different by chance, you raise the probability of rejecting the null for that comparison.
Use the Tukey-Kramer HSD (honestly significant difference) test for unplanned comparisons.
HSD = q × √(MSerror / nj)
- HSD is the mean difference considered "honestly different" given the number of comparisons. That is, it measures how big a difference we should expect among the sample means based on the number of comparisons we could possibly make.
- q is the Studentized Range statistic based on
- a) the number of treatments
- b) the degrees of freedom for MSerror
- c) the alpha level for the family of comparisons
- A useful feature of Tukey-Kramer is that we can look at every difference without specifying in advance which particular mean comparisons we want to make. Also, by the construction of the test, we can be sure that our overall FWER is no higher than the rate we specify.
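The HSD formula above can be evaluated with scipy's studentized range distribution (available in scipy >= 1.7). The MSerror, group size, and group count below are hypothetical stand-ins, not values from the course data:

```python
# HSD = q * sqrt(MS_error / n_j), with q from the studentized range
# distribution. All numeric inputs here are hypothetical.
import math

from scipy.stats import studentized_range

k = 3              # number of treatments (e.g., three airlines)
n_j = 4            # observations per group (balanced design assumed)
df_error = k * n_j - k
ms_error = 150.0   # hypothetical mean square error
alpha_fwer = 0.05  # familywise alpha for the whole set of comparisons

# q depends on the number of treatments, the error df, and alpha
q = studentized_range.ppf(1 - alpha_fwer, k, df_error)
hsd = q * math.sqrt(ms_error / n_j)
print(f"q = {q:.3f}, HSD = {hsd:.2f}")
# Any pair of group means differing by more than HSD is declared
# "honestly significantly different" at the familywise alpha level.
```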
To run an unplanned comparison test using the HSD in JMP: [[Analyze > Fit Y by X > red triangle under Oneway Analysis > Compare Means > All Pairs, Tukey HSD]]
- In interpreting the results, we do not need to worry about whether we obtained them through multiplicity or by capitalizing on chance. Our overall alpha level, αFWER, is 0.05 regardless of the number of comparisons we make.
- In the analysis, the p-values change. They no longer represent the probability of obtaining that particular mean difference alone. These p-values have been corrected for the number of comparisons, so they are considerably higher. This is because it is very likely to get a large difference somewhere when many comparisons are made.
Here is a short checklist for choosing the suitable correction.
Planned Comparison (A Priori)
- With 1 or 2 planned comparisons, no correction to alpha is usually needed (given a statistically significant main effect)
- With 3-5 planned comparisons, the Bonferroni correction is usually most powerful
- With more than 5 planned comparisons, the Tukey-Kramer HSD is usually most powerful
Unplanned Comparison (Post Hoc)
- Use the Tukey-Kramer HSD
- The Bonferroni correction is never appropriate, unless you correct for every possible pairwise comparison