Linear Regression Definitions and Three Regression Models
Some Definitions
- Dependent Variable The variable y being investigated.
- Independent Variable Any of the variables x1, x2, ... , xp
that might affect the values of the dependent variable.
Other names for the independent variable are predictor and
regressor. A continuous independent variable is called a covariate;
a discrete independent variable is also called a factor or a
grouping variable. A dummy variable is
an independent variable that can only take on the values 0 and 1.
- Linear Regression Equation An equation of the following form that predicts the
value of the dependent variable y from the values of the independent variables
x1, x2, ... , xp:
y = β0 + β1 x1 + β2 x2 + ... + βp xp
- The coefficients of the regression model are the unknown constants β0, β1, ... , βp.
- The Linear Regression Model shows the relationship between the observed dependent variable values
y1, y2, ... , yn, the coefficients
β0, β1, ..., βp, the
independent variable values
xij, i = 1, ..., n, j = 1, ... , p, and the random errors
ε1, ε2, ... , εn:
yi = β0 + β1 xi1 + ... + βp xip + εi.
- In practice, the coefficients of the regression model and the errors are unknown (except possibly in
theoretical disciplines like physics or chemistry). The coefficients must be estimated
by the method of least squares, which yields the least squares estimates (LSE).
- To denote anything in a formula as estimated or predicted, we put a hat (^) on it. For example, y^, a^, b^,
βj^ are the predicted y, a, b, and βj. They are read as y hat, a hat, b hat, and beta
j hat, respectively.
- After we have the estimated coefficients β0^, β1^, ... , βp^,
we can compute the predicted value of y (yi^) with the estimated coefficients:
- Estimated y value: yi^ = β0^ + β1^ xi1 + ... + βp^ xip.
- Estimated Residual: εi^ = yi - yi^
- The sum of squares for error (SSE) is defined as SSE = Σi=1n εi^2 = Σi=1n (yi - yi^)2.
- The least squares estimates (LSE) for a regression model
are the values of the coefficients that minimize the SSE for the regression model.
- To minimize the SSE, use the standard minimization procedure from calculus: compute the partial derivatives of the SSE with respect to the coefficients,
set these derivatives to zero, and solve for the coefficients.
- Here are technical details
for deriving the LSE estimates for the
following three regression models.
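This minimization is usually done numerically. As a minimal sketch (assuming NumPy is available), `np.linalg.lstsq` finds the coefficients that minimize the SSE for any design matrix; here it is shown on the Spring data set used in the examples below:

```python
import numpy as np

# Spring data set from the examples in these notes.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # displacement
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # force

# Design matrix: a column of ones for the intercept, then the x values.
X = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients minimizing SSE = sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# SSE at the least squares estimates.
sse = np.sum((y - X @ beta_hat) ** 2)
```

The returned `beta_hat` holds the estimated intercept and slope; any other choice of coefficients gives a larger SSE.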
Model 1: Horizontal Line Regression
- Regression model: yi = μ + εi
- Illustrative Graph
- LSE: μ^ = ȳ, the sample mean of the yi
- The horizontal line regression model is a model with no independent variables. By itself it is of little interest except in an
ideal measurement situation. However, this model is important as the null hypothesis model for testing
whether the influence of an independent variable on the dependent variable is significant.
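As a quick check (a sketch assuming NumPy), the least squares fit of a horizontal line is simply the sample mean of the y values, shown here on the Spring forces used in the examples below:

```python
import numpy as np

# Horizontal line regression: SSE = sum((y_i - mu)^2) is minimized at the mean.
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # Spring forces
mu_hat = y.mean()
```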
Model 2: Regression through the Origin
- Regression model: yi = a xi + εi
- Illustrative Graph
- LSE: a^ = (Σi=1n xi yi) / (Σi=1n xi2)
- The regression through the origin model should not be used unless it is known
that the regression line must pass through the origin.
- Regression through the origin is also called regression without intercept.
- Regression through the origin is difficult to compare with regression models with intercept, so it is rarely used in practice.
- Find the regression through the origin model using the Spring data set:
Displacement (x): | 0 | 1 | 2 | 3 | 4 |
Force (y): | 0 | 49 | 101 | 149 | 201 |
Ans: a^ = (Σi=1n xi yi) / (Σi=1n xi2) = 1502 / 30 = 50.066667, so the regression line is
y^ = a^ x or y^ = 50.066667 x.
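The arithmetic above can be verified with a short NumPy sketch:

```python
import numpy as np

# Spring data set from the example above.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])

# Regression through the origin: a_hat = sum(x_i * y_i) / sum(x_i^2).
a_hat = np.sum(x * y) / np.sum(x * x)  # 1502 / 30
```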
Model 3: Simple Straight Line Regression
- Regression model: yi = a xi + b + εi
- Illustrative Graph
- LSE: y^ - ȳ = (rxy sy / sx)(x - x̄); solve for a^ and b^.
- The equation on the previous line is the point-slope form of the regression
equation:
The point is (x̄, ȳ),
The slope is rxy sy / sx.
- The general form of the point-slope form of a line is y - y0 = m (x - x0).
- Recall that the slope is the rise over the run. In the regression equation, whenever x increases by one
x-standard deviation above the x-mean, the predicted y increases by rxy y-standard deviations above the y-mean.
- Find the simple linear regression model using the Spring data set:
Displacement (x): | 0 | 1 | 2 | 3 | 4 |
Force (y): | 0 | 49 | 101 | 149 | 201 |
Ans: x̄ = 2, sx = 1.581139
ȳ = 100, sy = 79.37884
rxy = 0.9999286
y^ - ȳ = (rxy sy / sx)(x - x̄)
y - 100 = (0.9999286 * 79.37884 / 1.581139) (x - 2)
y - 100 = 50.2 (x - 2)
y = 50.2 x - 50.2 * 2 + 100
y = 50.2 x - 0.4,
so a^ = 50.2 and b^ = -0.4.
- The equation y = 50.2 x - 0.4 is written in slope-intercept form.
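The same slope and intercept can be reproduced from the sample means, standard deviations, and correlation, as a check (a sketch assuming NumPy):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])        # displacement
y = np.array([0.0, 49.0, 101.0, 149.0, 201.0])  # force

s_x = x.std(ddof=1)             # sample standard deviation of x
s_y = y.std(ddof=1)             # sample standard deviation of y
r_xy = np.corrcoef(x, y)[0, 1]  # sample correlation

a_hat = r_xy * s_y / s_x            # slope
b_hat = y.mean() - a_hat * x.mean() # intercept from the point-slope form
```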
- Question: What is the predicted force for a displacement of 2.5 inches?
Ans: y^ = 50.2 * 2.5 - 0.4 = 125.1. With regression through the origin,
y^ = 50.066667 * 2.5 = 125.1667.
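The two predictions compared in code, using the fitted coefficients from the examples above:

```python
# Fitted models from the worked examples above.
a_slr, b_slr = 50.2, -0.4  # simple linear regression fit
a_origin = 1502 / 30       # regression through the origin fit

x_new = 2.5  # displacement at which we predict the force
y_hat_slr = a_slr * x_new + b_slr  # simple linear regression prediction
y_hat_origin = a_origin * x_new    # regression through the origin prediction
```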