SAS PROCEDURES FOR REGRESSION AND RESIDUAL ANALYSIS

 

 

 

PROC REG

 

The REG procedure is a general-purpose SAS procedure for regression analysis. It computes the least-squares regression line that fits the data.

 

            PROC REG DATA=dataset-name;

                        MODEL y-variable=x-variable;   /* defines the model to be fitted */

            RUN;

 

Example:

 

proc reg;

      model brate=lgnp;

run;

 

Output

 

                 The Statistics of Poverty and Inequality

                            The REG Procedure
                              Model: MODEL1
          Dependent Variable: brate birth rate (per 1,000 pop)

                           Analysis of Variance

                                  Sum of          Mean
Source             DF            Squares        Square    F Value    Pr > F

Model               1         9152.56716    9152.56716     105.28    <.0001
Error              89         7737.37042      86.93675
Corrected Total    90              16890

Root MSE            9.32399    R-Square    0.5419
Dependent Mean     29.46044    Adj R-Sq    0.5367
Coeff Var          31.64918

                           Parameter Estimates

                                Parameter      Standard
Variable     Label     DF        Estimate         Error    t Value    Pr > |t|

Intercept               1        75.52126       4.59430      16.44      <.0001
lgnp                    1        -6.13194       0.59762     -10.26      <.0001

 

 

The part of the REG output that we are interested in is the Parameter Estimates table.

Look in the Estimate column for the values of the intercept and the slope. In the regression line y=a+bx, a is the Intercept estimate and b is the estimate associated with the x-variable. Here a=75.52126 and b=-6.13194, so the fitted line is brate = 75.52126 - 6.13194*lgnp.
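
As a small sketch of how the fitted line is used, the DATA step below plugs one value of lgnp into y=a+bx, using the two estimates from the Parameter Estimates table in the output above. The value lgnp=8 and the variable name brate_hat are made up for illustration; they do not come from the data set.

      DATA _NULL_;
            a = 75.52126;            /* Intercept estimate from the output above */
            b = -6.13194;            /* estimate for lgnp (the slope) */
            lgnp = 8;                /* hypothetical x value, for illustration only */
            brate_hat = a + b*lgnp;  /* predicted birth rate on the fitted line */
            PUT brate_hat=;          /* writes the predicted value to the SAS log */
      RUN;

For lgnp=8 this gives 75.52126 - 49.05552, a predicted birth rate of about 26.47 per 1,000.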

 

 

Residual analysis

 

  1. Use PROC REG to perform a residual analysis. The PLOT statement in PROC REG produces residual plots.

For example,

PLOT predicted.*residual.;

generates one plot of the predicted values by the residuals for each dependent variable in the MODEL statement. These statistics can also be plotted against any of the variables in the VAR or MODEL statements.

Possible keywords are (note the period after the keyword):

Predicted. (or pred. or p.) = predicted values;

Residual. (or r.) = residuals;

Student. = studentized residuals;

Npp. = normal probability plot;
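
The short forms of the keywords can be used in place of the long ones. The sketch below reuses the brate and lgnp variables from the earlier example and assumes, as that example does, that the data set to analyze is the most recently created one.

      PROC REG;
            MODEL brate=lgnp;
            PLOT r.*p.;           /* same plot as residual.*predicted. */
            PLOT student.*p.;     /* studentized residuals against predicted values */
      RUN;

Note that the period is part of each keyword; without it SAS would look for a variable with that name.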

Specialized plots are requested with options on the PLOT statement: the PRED option plots the 95% prediction intervals for the predicted values of Y (using the root mean square error).

 

PROC REG <DATA=dataset-name>;

MODEL yvar=xvar;

PLOT yvar*xvar/nostat;  /* draw scatter plot and regression line */

PLOT residual.*xvar residual.*predicted.;  /* residual plots to check the linearity assumption and look for outliers */

PLOT npp.*residual.;  /* normal probability plot of the residuals (to check for normality) */

PLOT yvar*xvar/PRED;  /* draw scatter plot with upper and lower prediction bounds */

RUN;

 

Example (cont.)

 

PROC REG;

MODEL brate=lgnp;

PLOT brate*lgnp/nostat;

PLOT residual.*predicted.;

PLOT npp.*residual.;

PLOT brate*lgnp/PRED;

RUN;
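
If the residuals are needed outside the plots, one possible approach (not part of the example above) is to save them with the OUTPUT statement of PROC REG and examine them in another procedure. In the sketch below the data set name resid_out and the variable names pred, resid and stud are hypothetical choices.

      PROC REG;
            MODEL brate=lgnp;
            /* save predicted values, residuals and studentized residuals */
            OUTPUT OUT=resid_out P=pred R=resid STUDENT=stud;
      RUN;

      /* the NORMAL option requests tests of normality for the saved residuals */
      PROC UNIVARIATE DATA=resid_out NORMAL;
            VAR resid;
      RUN;

The saved data set contains the original variables plus the three new ones, so the residuals can also be listed or plotted against other variables.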