# Module 3:2 - Simple Linear Regression Tests

## A - Error in Regression Models

Error in a regression model is just like the individual error we have observed in our previous models: Ei is the degree to which an individual differs from what the model predicts. In a regression model, an individual's error is how much their score differs from the score predicted at their specific value of X. If we look at an actual regression line, each Ei is the vertical distance of that individual's point from the regression line.

With our Ei, we can estimate the error variance. Important: we are assuming that the error variance in the population is the same at every level of X. This is similar to our tests of homogeneity of variance, but in the context of estimating error in this model we call it "the assumption of homoscedasticity". In other words, how much individuals differ from the average at one level of X will not change at a different level of X. If this holds, then every Ei in the model can be used equally to estimate the population error variance.
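
As a concrete illustration, here is a minimal Python sketch (the course does not specify any software, and the data are invented) of how the Ei fall out of a fitted least-squares line:

```python
# Minimal sketch: fit a least-squares line and inspect each residual
# E_i = y_i - (b0 + b1 * x_i).  The data below are invented for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares estimates of the slope (b1) and intercept (b0).
ss_x = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
b0 = y_bar - b1 * x_bar

# Each residual is one E_i: the vertical distance of a point from the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Under homoscedasticity, the residuals should be similarly spread at
# every level of x; printing them is a crude way to eyeball that.
for x, e in zip(xs, residuals):
    print(f"x = {x:.1f}  E_i = {e:+.3f}")
```

Least-squares residuals always sum to zero, so what matters for the homoscedasticity assumption is not their average but whether their spread stays constant across levels of X.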

## B - Statistical Inferences with Regression

### Tests of Parameters

Test of b1 - this tests whether the regression slope differs from 0 in the population, in other words, whether X has an effect on Y. Usually, this test is of more interest to us.

Test of b0 - this tests whether, in the population, the Y-intercept differs from 0.

### Confidence Intervals

Confidence interval for the conditional mean of Y given X - this gives a confidence interval for the mean of all scores at a particular value of X.

Confidence interval for Y for an individual at X - this gives a confidence interval for where a single individual's score lies at that value of X.
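
The two intervals can be sketched in Python with invented data (the critical value 2.776 is t(.975) for df = n - 2 = 4):

```python
import math

# Sketch: 95% CI for the conditional mean of Y at x0, versus the wider
# interval for a single individual's Y at x0.  Data are invented.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
b0 = y_bar - b1 * x_bar

mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

x0 = 4.0
y_hat = b0 + b1 * x0
t_crit = 2.776  # t(.975) with df = n - 2 = 4

# Standard error for the MEAN of all scores at x0 ...
se_mean = math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / ss_x))
# ... and for one INDIVIDUAL at x0: the extra "1" adds that person's
# own error variance, so this interval is always wider.
se_ind = math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / ss_x))

print(f"mean of Y at x0:  {y_hat - t_crit * se_mean:.2f} to {y_hat + t_crit * se_mean:.2f}")
print(f"individual at x0: {y_hat - t_crit * se_ind:.2f} to {y_hat + t_crit * se_ind:.2f}")
```

The interval for an individual is always wider than the interval for the conditional mean, because it has to cover one person's score rather than an average.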

## C - Sampling Distributions Known (T-tests)

Remember that a t-statistic is the sample statistic minus the population parameter, all divided by the estimated standard error of the statistic.

The t-test for b1 is the observed value of b1 (the slope observed in the sample) minus the population value of the slope if the null hypothesis is true, divided by the estimated standard error of b1.

In our t-test, the null-hypothesis value of the population slope is 0, so it drops out and we are left with just b1 in the numerator.
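
Putting this together in a quick Python sketch (invented data; the standard error of the slope is SE(b1) = sqrt(MSE / SSx), with df = n - 2):

```python
import math

# Sketch of the t-test for the slope: t = (b1 - 0) / SE(b1).
# Data are invented for illustration.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
b0 = y_bar - b1 * x_bar

mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

se_b1 = math.sqrt(mse / ss_x)  # estimated standard error of b1
t_b1 = (b1 - 0) / se_b1        # null hypothesis: population slope = 0

print(f"b1 = {b1:.3f}, SE = {se_b1:.4f}, t({n - 2}) = {t_b1:.1f}")
```

Compare the resulting t to a critical value from the t distribution with n - 2 degrees of freedom, or obtain a p-value from software.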

Adding more people to our sample adds more squared deviations to the sum, so the sums of squares of X grows as we add people. It is also sensitive to the spacing of the observations.

If the x values are spaced close together, then as the data shift around, the slope becomes unstable and changes a lot.

If the x values are spaced farther apart, then as the data shift around, the slope does not move as much.

```
If possible, space out measurements as much as possible to get a more stable estimate of the population slope.
```
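
A small Python sketch of this point, holding MSE fixed at an assumed value and comparing two designs with the same number of observations:

```python
import math

# Sketch: with the same number of observations and the same error
# variance, spreading the x values out increases SS_x, which shrinks
# SE(b1) = sqrt(MSE / SS_x) and stabilizes the slope estimate.
mse = 1.0  # assumed, identical for both designs

clustered = [4.0, 4.5, 5.0, 5.5, 6.0]  # x values close together
spread = [1.0, 3.0, 5.0, 7.0, 9.0]     # same n, wider spacing

def ss_x(xs):
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

for name, xs in [("clustered", clustered), ("spread", spread)]:
    se_b1 = math.sqrt(mse / ss_x(xs))
    print(f"{name:9s}  SS_x = {ss_x(xs):5.1f}  SE(b1) = {se_b1:.3f}")
```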

## D - T-test for the Simple Regression Y-Intercept

Now we can move on to the t-test for the Y-intercept. The t-test for B0 has the same form as before: t = (b0 - 0) / SE(b0), where the estimated standard error is SE(b0) = sqrt(MSE * (1/n + x-bar^2 / SSx)).

For this test, the null-hypothesis value of B0 is zero, so it drops out of the numerator; this value is not affected by differing slopes.

The standard error formula for B0 has more parts than the one for B1. More components means more things can go wrong with this estimate, so there tends to be more error in it.

The estimated standard error is the error associated with estimating where the line hits the Y axis, that is, where the line is when X = 0.

The sums of squares of X is a function of the number of observations and the spacing of the observations.

The 1/n term reflects that the error variance in estimating the mean of Y decreases as a function of the sample size (n).

Since the line always passes through x bar and y bar, the better we can estimate where y bar is, the better we can estimate where the intercept is. If there is a large amount of error in estimating the mean of Y (picture sliding the line up and down), the intercept moves just as much as the mean of Y does.

As the sums of squares for X (located in the denominator) increases, the error in the estimate of B0 decreases. This is because error in the slope carries over: if we mis-estimate the slope, we will also mis-estimate the intercept, so the more stable the estimate of the slope, the more stable the estimate of the intercept.

X bar squared (located above the sums of squares for X) reflects how far 0 is from the mean of X, and it scales how much our estimate of the slope influences the error in the intercept.

The mean square for error is a scaling factor for the rest of the components. If it is small, all the components have only a small influence in moving the line around and shifting the intercept; if it is large, every factor contributes to a large mis-estimation of the intercept.

Having a large sample size and a mean of X as close to zero as possible will help decrease the amount of error in the estimation of the intercept.
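
A Python sketch with invented data shows all of these pieces working together in SE(b0):

```python
import math

# Sketch: t-test for the intercept, t = (b0 - 0) / SE(b0), where
# SE(b0) = sqrt(MSE * (1/n + x_bar**2 / SS_x)).  Data are invented.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
b0 = y_bar - b1 * x_bar

mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

# 1/n: error in estimating the mean of Y.
# x_bar**2 / ss_x: how much slope error leaks into the intercept,
# scaled by how far 0 is from the mean of X.
# mse: the overall scaling factor.
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / ss_x))
t_b0 = (b0 - 0) / se_b0

print(f"b0 = {b0:.3f}, SE = {se_b0:.4f}, t({n - 2}) = {t_b0:.2f}")
```

In this made-up example x_bar = 3.5 is far from zero, so the x_bar**2 / ss_x term dominates and SE(b0) is large relative to b0; centering X so that its mean is 0 would remove that term entirely.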

## E - Approaches to Statistical Inference with Regression

### Using Analysis of Variance and General Linear Test (F-Test)

In our one-predictor linear regression model, the focus is on the slope parameter: does the sample give evidence that, in the population, the slope between Y and X differs from zero? We are attempting to explain the sums of squares total, in other words the total amount of deviation along the Y axis. To do this, we break the sums of squares total into the sums of squares for error (the deviation between an individual's actual score and what is predicted for them) and the sums of squares for regression (which captures the effect of our X variable).

Graphically, each individual's total deviation from the mean of Y splits into a regression deviation (from the predicted value to the mean) and an error deviation (from the actual score to the predicted value).

The regression deviations, and the sums of squares regression built from them, capture the degree of the slope of the line, and every individual contributes a regression deviation. We can now form an F-test for the simple regression slope (mean square regression divided by mean square error, with 1 and n - 2 degrees of freedom) and obtain a p-value from it.

```
Notice that we are forming this specific F-test for the regression slope in the model, not the intercept.
```
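
To make the decomposition and the F-test concrete, here is a Python sketch with invented data:

```python
# Sketch: break SS_total into SS_regression + SS_error and form
# F = MS_regression / MS_error with df = (1, n - 2).  Data are invented.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_x = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_x
b0 = y_bar - b1 * x_bar
y_hats = [b0 + b1 * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)              # total deviations
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hats)          # effect of x
ss_err = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # individual error

f_stat = (ss_reg / 1) / (ss_err / (n - 2))  # df_regression = 1 for one predictor

print(f"SS_total = {ss_total:.3f} = {ss_reg:.3f} + {ss_err:.3f}")
print(f"F(1, {n - 2}) = {f_stat:.1f}")
```

With a single predictor, this F statistic equals the square of the t statistic for the slope, so the two approaches to testing the slope give the same p-value.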