Residual Analysis
- The residuals of the regression model y_i = a x_i + b + ε_i are the random errors ε_i.
- In addition to an appropriately large R² value, the residuals must be well-behaved, as explained in the following section.
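As a minimal sketch of the quantities named above, the following Python code fits y_i = a x_i + b by least squares and computes the residuals and R². The function name fit_line and the simulated data (slope 2, intercept 1, noise standard deviation 0.5) are illustrative assumptions, not part of these notes. The computed residuals are estimates of the random errors ε_i, since the true a and b are not known in practice.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a*x + b.

    Returns (a, b, residuals, r_squared). The name fit_line is hypothetical.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)              # slope and intercept
    predicted = a * x + b
    residuals = y - predicted               # observed minus predicted
    ss_res = np.sum(residuals ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, residuals, r_squared

# Simulated data (an assumption made only for this illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.size)

a, b, residuals, r_squared = fit_line(x, y)
print("slope:", a, "intercept:", b, "R^2:", r_squared)
```

Because the fitted model includes an intercept, the residuals sum to zero up to rounding error. That alone does not show that the residuals are well-behaved.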
Standard Assumptions about Residuals
- The four standard assumptions about the residuals of a linear regression model:
- Unbiased: E(ε_i) = 0, i = 1, ... , n. Residuals are unbiased if the average value of the residuals is zero in any thin vertical rectangle in the residual plot.
- Homoscedastic: Var(ε_i) = σ², i = 1, ... , n. Residuals are homoscedastic if the standard deviation of the residuals is the same in any thin vertical rectangle in the residual plot.
- Independent: Cov(ε_i, ε_j) = 0, for i ≠ j.
- Normal: ε_i ∼ N(0, σ²), i = 1, ... , n.
- Verify assumptions 1 and 2 from residual plots of the residuals vs. the predicted values, or of the residuals vs. individual independent variables. Here are some residual plots. Which plots satisfy the assumptions? Which plots violate one or more of the assumptions?
- Verify assumption 3 using the Durbin-Watson statistic, which we will look at later. If random variables x and y are jointly normal, then x and y uncorrelated implies x and y are independent; this implication does not hold for distributions in general. (x and y independent implies x and y are uncorrelated for any distribution.)
- Verify assumption 4 from a normal plot of the residuals. The residuals are required to be normal to ensure that the t-distribution can be used to obtain accurate confidence intervals for the estimated regression parameters. Here are some normal plots that show some of the ways that residuals can deviate from normality.