# Module 3:3 - Simple Linear Regression in JMP


## Fit Y by X

Using the Fit Y by X platform in JMP (**Analyze >> Fit Y by X**),

- X (Factor): the variable you are predicting from; in this module it is hours spent studying.
- Y (Response): the variable being predicted; in this module it is grade received.

JMP will automatically detect this as a bivariate fit (a quantitative variable predicted by another quantitative variable). When you click OK, JMP will produce a simple scatterplot. From the red triangle menu at the top, select **Fit Line**. JMP will produce the best-fitting line of Grade (Y) on Hours studied (X); in this case, Grade = 72.37 + 1.77*Hours. The intercept is the Y value expected when no hours are spent studying (X = 0). So if a student did not study at all, they are predicted to receive a 72.37, and 1.77 grade points are added for each hour studied.
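The fitted line is just a formula, so its prediction arithmetic can be sketched directly (the coefficients below are taken from the output described above):

```python
# Fitted line from the JMP output: Grade = 72.37 + 1.77 * Hours
intercept = 72.37
slope = 1.77

def predicted_grade(hours):
    """Predicted grade for a given number of hours studied."""
    return intercept + slope * hours

# A student who does not study at all is predicted to score the intercept (72.37);
# each additional hour studied adds 1.77 points to the predicted grade.
no_study = predicted_grade(0)
ten_hours = predicted_grade(10)
```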

The output includes an **Analysis of Variance** table and a **Parameter Estimates** table.

The **ANOVA** table includes a p-value along with the values of MS_{error} and MS_{regression}. The F ratio is produced by dividing MS_{regression} by MS_{error}. MS_{error} is SS_{error} divided by its degrees of freedom (4 df in the example): from the original 6 independent pieces of information, we lose one df for estimating the slope and another for estimating the intercept. MS_{error} provides the best unbiased estimate of the population variance around the line. The ANOVA is not testing the intercept; it is simply a test of the regression relationship, independent of where the regression line starts.
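The ANOVA arithmetic can be sketched as follows; the sums of squares here are hypothetical placeholders, not JMP's actual output for this dataset:

```python
# Sketch of the ANOVA arithmetic for simple linear regression.
# SS values below are hypothetical, chosen only to illustrate the calculation.
n = 6
ss_regression = 300.0   # hypothetical SS explained by the fitted line
ss_error = 8.0          # hypothetical SS left over around the line

df_regression = 1       # one slope parameter
df_error = n - 2        # lose one df for the slope, one for the intercept -> 4

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error   # unbiased estimate of variance around the line
f_ratio = ms_regression / ms_error
```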

The **Parameter Estimates** table gives these tests in the form of t statistics:

t = (Estimate − β_{0}) / (Standard Error)

where β_{0} is the value under the null hypothesis (typically 0).

We are given the intercept estimate of where the line hits the y axis, along with its t ratio and a p-value from the appropriate t distribution. We are also given the Hours row (the slope), which adds 1.77 grade points for each hour studied. The standard error for the intercept is substantially larger because the formula for the intercept has many more terms, each of which contributes additional uncertainty.

For the slope, we are given the estimate of 1.77, the estimated standard error of 0.15, a t ratio of 11.77 formed by dividing the slope estimate by its standard error, and a p-value of 0.0003 calculated from the t distribution.

- This p-value is exactly the same as the one calculated in the ANOVA. The two tests differ in form: the Parameter Estimates table uses a t test to form a t ratio, while the ANOVA partitions SS_{total} into SS_{regression} and SS_{error}. In simple linear regression, the two p-values always match because F = t² and both tests are testing the same thing. One may choose one test over the other depending on context, but a t ratio is typically used for regression tests.
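The equivalence of the two tests follows from the identity F = t² in simple linear regression; a quick check with the slope numbers above (small differences from JMP's 11.77 are just rounding, since JMP uses unrounded values):

```python
# Slope values from the Parameter Estimates table above
slope_estimate = 1.77
standard_error = 0.15

# t ratio: estimate divided by its standard error (about 11.8;
# JMP reports 11.77 because it carries unrounded intermediate values)
t_ratio = slope_estimate / standard_error

# In simple linear regression, the ANOVA F ratio equals t squared,
# which is why the two tests give identical p-values.
f_ratio = t_ratio ** 2
```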

### More Options

For more information about the linear regression, we can click the red triangle next to “Linear Fit” under the graph.

- Confid Curves Fit: confidence curves around the fitted line, indicating where we should expect the line to be in the actual population.
- Confid Curves Indiv: confidence curves for individuals, indicating where individual observations would lie in the actual population.
- The individual curves are wider because they include the variability of single observations around the line, while the fit curves only reflect uncertainty in the mean of Y given X.

- Confid Shaded Fit/Confid Shaded Indiv: shades the regions bounded by the curves for a clearer visual representation
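Why the individual curves are wider can be seen in the standard interval formulas; here is a minimal sketch using made-up data (not the module's dataset) and the usual textbook formulas for the two half-widths:

```python
import math

# Hypothetical (x, y) data -- NOT the module's actual dataset.
xs = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
ys = [74.0, 76.0, 79.0, 81.0, 85.0, 86.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation with df = n - 2
ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(ss_error / (n - 2))

t_crit = 2.776  # 95% two-sided t critical value for 4 df

x0 = 4.5
# Half-width of the band for the MEAN of Y at x0 (Confid Curves Fit)
half_fit = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
# Half-width of the band for an INDIVIDUAL Y at x0 (Confid Curves Indiv);
# the extra "1 +" term is the variance of a single observation around the line
half_indiv = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
```

The extra "1 +" under the square root is what makes the individual band wider at every value of X.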

- Save Predicted: saves the values predicted by the line for each individual to the data table
- Save Residuals: saves the deviation between each individual's actual score and predicted score
- For example, Tom's residual is his actual score minus his predicted score (91 - 91.86 = -0.86). This means the model over-predicted Tom's score by 0.86
- These values are useful because we can use the Distribution platform to look for patterns and check whether they obey our assumptions (e.g., equal variances)
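Tom's residual arithmetic can be sketched directly (his predicted score of 91.86 is the value from the saved Predicted column):

```python
# Residual = actual score - predicted score
actual_score = 91.0
predicted_score = 91.86   # from the Save Predicted column

residual = actual_score - predicted_score   # about -0.86
# A negative residual means the model over-predicted Tom's score.
```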

## Fit Model

Using Fit Model for Regression in JMP (**Analyze > Fit Model**),

When using Fit Model, put your response variable as **Y** and your factor variable (the one you would typically put as X) under **Construct Model Effects**. Fit Model works well for more complicated models, since it *allows more than one model effect*. If you tried to enter more than one factor variable in Fit Y by X, you would get multiple separate linear regressions, making Fit Model the necessary option for more complicated models.

As usual, you'll want to select **Minimal Report** before you hit **Run** with your Fit Model for regression.

Fit Model gives us an **F Ratio** and p-value in the Analysis of Variance, and a **t Ratio** and p-value in the Parameter Estimates. The p-values and test statistics are equivalent as long as you are doing a simple linear regression (only one predictor). If you have more than one predictor, you can find each predictor's individual **F Ratio** under Effect Tests. In general, you should typically look at the **t Ratio** when talking about regression.

Fit Model lets us predict values of Y for any given value of X. To do this, use the **Prediction Profiler**, which you can find under **Red Triangle > Factor Profiling > Profiler**. In the Prediction Profiler, you can double-click the value of X and enter your own specific value to predict Y.

Just like in Fit Y by X, you can save columns to your data table using Fit Model. You can do this under **Red Triangle > Save Columns > Prediction Formula**, **Predicted Values**, **Residuals**, etc.

In Fit Model, you can also use Row Diagnostics, which check the assumptions of our model.