Linear Regression Definitions and Three Regression Models
Some Definitions
- Dependent Variable The variable y being investigated.
- Independent Variable Any of the variables x1, x2, ... , xp
that might affect the values of the dependent variable.
Other names for the independent variable are predictor and
regressor. A continuous independent variable is called a covariate;
a discrete independent variable is also called a factor or a
grouping variable. A dummy variable is
an independent variable that can only take on the values 0 and 1.
- Linear Regression Equation An equation of the following form that predicts the
value of the dependent variable y from the values of the independent variables
x1, x2, ... , xp:
y = β0 + β1 x1 + β2 x2 + ... + βp xp
- The coefficients of the regression model are the unknown constants β0, β1, ... , βp.
- The Linear Regression Model shows the relationship between the observed dependent variable values
y1, y2, ... , yn, the coefficients
β0, β1, ..., βp, the
independent variable values
xij, i = 1, ..., n, j = 1, ... , p, and the random errors
ε1, ε2, ... , εn:
yi = β0 + β1 xi1 + ... + βp xip + εi.
- In practice, the coefficients of the regression model and the errors are unknown (except possibly in
theoretical disciplines like physics or chemistry). The coefficients must be estimated
by the method of least squares, which yields the least squares estimates (LSE).
- To denote anything in a formula as estimated or predicted, we put a hat (^) on it. For example, y^, a^, b^,
βj^ are the predicted y, a, b, and βj. They are read as y hat, a hat, b hat, and beta
j hat, respectively.
- After we have the estimated coefficients β0^, β1^, ... , βp^,
we can compute the predicted value of y (yi^) with the estimated coefficients:
- Estimated y value: yi^ = β0^ + β1^ xi1 + ... + βp^ xip.
- Estimated Residual: εi^ = yi - yi^
- The sum of squares for error (SSE) is defined as SSE = Σi=1n εi^2 = Σi=1n (yi - yi^)2.
- The least squares estimates (LSE) for a regression model
are the values of the coefficients that minimize the SSE for the regression model.
- To minimize the SSE, use the standard minimization procedure from calculus: compute the partial derivatives of the SSE with respect to the coefficients,
set these derivatives to zero, and solve for the coefficients.
- Here are technical details
for deriving the LSE estimates for the
following three regression models.
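This minimization is usually done numerically. As a minimal sketch (assuming NumPy is available), `np.linalg.lstsq` finds the coefficients that minimize the SSE for any design matrix; here it is shown on the Spring data set used in the examples below:

```python
import numpy as np

# Spring data set from the examples in these notes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # displacement
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # force

# Design matrix: a column of ones for the intercept, then the x values.
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients minimizing SSE = sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# SSE at the least squares estimates.
sse = np.sum((y - X @ beta_hat) ** 2)
```

The returned `beta_hat` holds the estimated intercept and slope; any other choice of coefficients gives a larger SSE.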
Model 1: Horizontal Line Regression
- Regression model: yi = μ + εi
- Illustrative Graph
- LSE: μ^ = ȳ, the sample mean of the yi
- The horizontal line regression model is a model with no independent variables. By itself it is of little interest except in an
ideal measurement situation. However, this model is important as the null hypothesis model for testing
whether the influence of an independent variable on the dependent variable is significant.
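As a quick check (a sketch assuming NumPy), the least squares fit of a horizontal line is simply the sample mean of the y values, shown here on the Spring forces used in the examples below:

```python
import numpy as np

# Horizontal line regression: SSE = sum((y_i - mu)^2) is minimized at the mean.
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # Spring forces
mu_hat = y.mean()
```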
Model 2: Regression through the Origin
- Regression model: yi = a xi + εi
- Illustrative Graph
- LSE: a^ = (Σi=1n xi yi) / (Σi=1n xi2)
- The regression through the origin model should not be used unless it is known
that the regression line must pass through the origin.
- Regression through the origin is also called regression without intercept.
- Regression through the origin is difficult to compare with regression models with intercept, so it is rarely used in practice.
- Find the regression through the origin model using the Spring data set:
Displacement (x): | 0 | 1 | 2 | 3 | 4 |
Force (y): | 0 | 49 | 101 | 149 | 201 |
Ans: a^ = (Σi=1n xi yi) / (Σi=1n xi2) = 1502 / 30 = 50.066667, so the regression line is
y^ = a^ x or y^ = 50.066667 x.
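The arithmetic above can be verified with a short NumPy sketch:

```python
import numpy as np

# Spring data set from the example above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])

# Regression through the origin: a_hat = sum(x_i * y_i) / sum(x_i^2).
a_hat = np.sum(x * y) / np.sum(x * x)  # 1502 / 30
```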
Model 3: Simple Straight Line Regression
- Regression model: yi = a xi + b + εi
- Illustrative Graph
- LSE: y^ - ȳ = (rxy sy / sx)(x - x̄); solve for a^ and b^.
- The equation on the previous line is the point-slope form of the regression
equation:
The point is (x̄, ȳ),
The slope is rxy sy / sx.
- The general form of the point-slope form of a line is y - y0 = m (x - x0).
- Recall that the slope is the rise over the run. In the regression equation, whenever x increases by one
x-standard deviation above the x-mean, the predicted y increases by rxy y-standard deviations above the y-mean.
- Find the simple linear regression model using the Spring data set:
Displacement (x): | 0 | 1 | 2 | 3 | 4 |
Force (y): | 0 | 49 | 101 | 149 | 201 |
Ans: x̄ = 2, sx = 1.581139
ȳ = 100, sy = 79.37884
rxy = 0.9999286
y^ - ȳ = (rxy sy / sx)(x - x̄)
y - 100 = (0.9999286 * 79.37884 / 1.581139) (x - 2)
y - 100 = 50.2 (x - 2)
y = 50.2 x - 50.2 * 2 + 100
y = 50.2 x - 0.4,
so a^ = 50.2 and b^ = -0.4.
- The equation y = 50.2 x - 0.4 is written in slope-intercept form.
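The same slope and intercept can be reproduced from the sample means, standard deviations, and correlation, as a check (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # displacement
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # force

s_x = x.std(ddof=1)             # sample standard deviation of x
s_y = y.std(ddof=1)             # sample standard deviation of y
r_xy = np.corrcoef(x, y)[0, 1]  # sample correlation

a_hat = r_xy * s_y / s_x            # slope
b_hat = y.mean() - a_hat * x.mean() # intercept from the point-slope form
```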
- Question: What is the predicted force for a displacement of 2.5 inches?
Ans: y^ = 50.2 * 2.5 - 0.4 = 125.1. With regression through the origin,
y^ = 50.066667 * 2.5 = 125.1667.
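The two predictions compared in code, using the fitted coefficients from the examples above:

```python
# Fitted models from the worked examples above.
a_slr, b_slr = 50.2, -0.4  # simple linear regression fit
a_origin = 1502 / 30       # regression through the origin fit

x_new = 2.5  # displacement at which we predict the force
y_hat_slr = a_slr * x_new + b_slr  # simple linear regression prediction
y_hat_origin = a_origin * x_new    # regression through the origin prediction
```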