
CSC 423 -- 7/31/17

Review Exercises

  1. What is an R package? How do you install an R package?
    Ans: An R package is a collection of R functions, data, and compiled code that performs a task useful to R users. You install an R package with the install.packages function and then load it with the library function. For example, to install the leaps package:
      install.packages("leaps")
  2. If x is a random variable, what does E(x) mean?
    Ans: E(x) is the expected value of x, which is the long-run average of the observed values of x. If x is a discrete random variable defined by a probability distribution like this:
     
    Outcome Probability
    x1 P(x1)
    x2 P(x2)
    ... ...
    xn P(xn)

    The expected value of x is defined by
     
      $E(x) = \sum_{i=1}^{n} x_i P(x_i)$
     
    If p is the probability density of a continuous random variable x, compute the expected value with this integral:
     
      $E(x) = \int_{-\infty}^{\infty} x \, p(x) \, dx$
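Both definitions of the expected value can be checked numerically. The sketch below is only an illustration: the fair die and the uniform density on [0, 1] are assumed examples, not distributions taken from these notes.

```r
# Discrete case: a fair six-sided die (assumed example distribution).
outcomes <- 1:6
probs <- rep(1 / 6, 6)
stopifnot(isTRUE(all.equal(sum(probs), 1)))  # probabilities must sum to 1

# E(x) = sum over i of x_i * P(x_i)
e_discrete <- sum(outcomes * probs)

# Continuous case: uniform density on [0, 1] (assumed example density).
p <- function(x) dunif(x, min = 0, max = 1)

# E(x) = integral of x * p(x) dx over the support of x
e_continuous <- integrate(function(x) x * p(x), lower = 0, upper = 1)$value

print(e_discrete)    # 3.5
print(e_continuous)  # approximately 0.5
```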
     

  3. What does the term unbiased mean?
    Ans: A sample statistic T is unbiased for a population parameter θ if E(T) = θ. Here are four examples of unbiased statistics that we have seen:
    1. Sample mean: $E(\bar{y}) = \mu$ (The sample mean is unbiased for the population mean.)
    2. Sample variance: $E(s_y^2) = \sigma^2$ (The sample variance is unbiased for the population variance.)
    3. Estimated regression parameter: $E(\hat{\beta}_j) = \beta_j$ (Each estimated regression parameter is unbiased for the true population parameter.)
    4. Estimated value: $E(\hat{y}_i) = E(y_i)$ (Each fitted value is unbiased for the mean of the corresponding response.)
    Items 1 through 4 are based on the assumption that the error terms have mean zero: $E(\varepsilon_i) = 0$.
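Unbiasedness of the sample variance can be illustrated with a small simulation. This is a sketch under assumed settings (normal data with σ = 2, samples of size 10, 10,000 replications, an arbitrary seed). None of these values come from the notes.

```r
set.seed(423)   # arbitrary seed so the run is reproducible
sigma <- 2      # assumed population standard deviation, so sigma^2 = 4
n <- 10         # assumed sample size
reps <- 10000   # assumed number of simulated samples

# var() divides by n - 1, which is the unbiased sample variance.
s2_unbiased <- replicate(reps, var(rnorm(n, mean = 0, sd = sigma)))

# Dividing by n instead gives a biased estimator, shown for comparison.
s2_biased <- s2_unbiased * (n - 1) / n

mean_unbiased <- mean(s2_unbiased)  # should be close to sigma^2 = 4
mean_biased <- mean(s2_biased)      # should be close to 4 * (n - 1) / n = 3.6

print(c(unbiased = mean_unbiased, biased = mean_biased))
```

The average of the unbiased estimates settles near σ², while the divide-by-n version is systematically too small.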
     
  4. What is a residual plot? a partial residual plot?
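    Ans: A residual plot is a plot of the residuals vs. one of the independent variables. A partial residual plot for an independent variable x plots the residuals plus the fitted contribution of x vs. x. If x is an independent variable here are the SAS and R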
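A minimal R sketch of both kinds of plot follows. It assumes the built-in mtcars dataset and the model mpg ~ wt + hp, chosen only for illustration. It is not the code from the original notes.

```r
# Illustrative model on a built-in dataset (an assumption, not from the notes).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual plot: ordinary residuals vs. the independent variable wt.
res <- residuals(fit)
plot(mtcars$wt, res, xlab = "wt", ylab = "Residual",
     main = "Residual plot for wt")
abline(h = 0, lty = 2)

# Partial residual plot: residuals plus the fitted term for wt, vs. wt.
# residuals(fit, type = "partial") returns one column per model term;
# R centers each term, so the column is res + b_wt * (wt - mean(wt)).
pres <- residuals(fit, type = "partial")
plot(mtcars$wt, pres[, "wt"], xlab = "wt", ylab = "Partial residual",
     main = "Partial residual plot for wt")
```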
  5. What are the definitions of these statistics?
     SSM, SSE, SST, DFM, DFE, DFT

    Ans:
    SSM = Sum of Squares for Model = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
    SSE = Sum of Squares for Error = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
    SST = Total Sum of Squares = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
    DFM = Degrees of Freedom for Model = p - 1
    DFE = Degrees of Freedom for Error = n - p
    DFT = Total Degrees of Freedom = n - 1
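These definitions can be computed directly from a fitted model. The sketch below assumes the built-in mtcars data and the model mpg ~ wt + hp as a stand-in example; p counts all regression parameters, including the intercept.

```r
# Illustrative model (assumed example, not from the notes).
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
y_hat <- fitted(fit)
y_bar <- mean(y)
n <- length(y)          # number of observations
p <- length(coef(fit))  # number of parameters, including the intercept

ssm <- sum((y_hat - y_bar)^2)  # Sum of Squares for Model
sse <- sum((y - y_hat)^2)      # Sum of Squares for Error
sst <- sum((y - y_bar)^2)      # Total Sum of Squares

dfm <- p - 1  # Degrees of Freedom for Model
dfe <- n - p  # Degrees of Freedom for Error
dft <- n - 1  # Total Degrees of Freedom

# With an intercept in the model, SSM + SSE = SST and DFM + DFE = DFT.
print(c(SSM = ssm, SSE = sse, SST = sst))
print(c(DFM = dfm, DFE = dfe, DFT = dft))
```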
     
  6. What does the overall F-test for regression tell you?
    Ans: The overall F-test tests the null hypothesis that all of the non-intercept regression parameters are equal to zero:
     
      $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$

    The test statistic is
     
      $F = \frac{SSM / DFM}{SSE / DFE}$

    Look up the confidence interval for the F statistic, assuming the null hypothesis, in an F table with (DFM, DFE) degrees of freedom. If the computed F falls outside this interval, reject the null hypothesis.
     
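  7. Given these statistics for a regression problem:
     
      n = 18, p = 4, SSM = 59.1, SSE = 130.8

    Should you accept or reject the null hypothesis for the overall F-test:
     
      $H_0: \beta_1 = \beta_2 = \beta_3 = 0$

    Ans: DFM = p - 1 = 4 - 1 = 3, DFE = n - p = 18 - 4 = 14,
     
      $F = \frac{SSM / DFM}{SSE / DFE} = \frac{59.1 / 3}{130.8 / 14} = \frac{19.7}{9.343} = 2.109$

    The 95% confidence interval for the F statistic with (3, 14) degrees of freedom is (0, 3.344). Because 2.109 lies inside this interval, accept (that is, fail to reject) the null hypothesis.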
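The same decision can be reproduced in R instead of using an F table. This sketch uses n = 18 and p = 4 from the answer, together with SSM = 59.1 and SSE = 130.8 as quoted in Question 8. The variable names are illustrative.

```r
n <- 18       # number of observations
p <- 4        # number of regression parameters, including the intercept
ssm <- 59.1   # Sum of Squares for Model
sse <- 130.8  # Sum of Squares for Error

dfm <- p - 1  # 3
dfe <- n - p  # 14

# Overall F statistic: ratio of the model and error mean squares.
f_stat <- (ssm / dfm) / (sse / dfe)

# Upper end of the 95% interval for F under the null hypothesis.
f_crit <- qf(0.95, df1 = dfm, df2 = dfe)

# Equivalent p-value: probability of an F at least this large under H0.
p_value <- pf(f_stat, df1 = dfm, df2 = dfe, lower.tail = FALSE)

reject <- f_stat > f_crit
print(c(F = f_stat, critical = f_crit, p_value = p_value))
print(reject)  # FALSE: F is below the critical value
```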
     
  8. For the regression model in Question 7, compute $R^2$ and adjusted $R^2$.
    Ans: $R^2$ = SSM / (SSM + SSE) = 59.1 / (59.1 + 130.8) = 0.311. Then
    $R^2_{adj} = 1 - (1 - R^2)((n - 1) / (n - p))$ = 1 - (1 - 0.311)(17 / 14) = 0.163 (0.164 if the unrounded $R^2$ is carried through).
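The arithmetic can be checked in R. This sketch uses the values n = 18, p = 4, SSM = 59.1, and SSE = 130.8 that appear in the answers; the variable names are illustrative.

```r
n <- 18
p <- 4
ssm <- 59.1
sse <- 130.8

# R-squared: fraction of the total sum of squares explained by the model.
r2 <- ssm / (ssm + sse)

# Adjusted R-squared: penalizes R-squared for the number of parameters.
r2_adj <- 1 - (1 - r2) * ((n - 1) / (n - p))

print(round(r2, 3))  # 0.311
print(r2_adj)        # about 0.1636 when no intermediate rounding is done
```

Rounding $R^2$ to 0.311 before the second step gives 0.163, which is why a hand calculation can differ from the software result in the third decimal place.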
     
  9. What are some SAS options for the model statement of proc reg?
    Ans: influence noint tol vif. Today, we will see the options p and r.
     
  10. What is the difference between a standardized residual and an externally studentized residual?
    Ans: A standardized variable is another name for the z-score of a variable:
    $z_i = (x_i - \bar{x}) / s_x$, where $\bar{x}$ and $s_x$ are computed from the entire dataset. For the externally studentized variable, the sample mean and standard deviation are computed from the dataset with the ith observation deleted:
    $z_i^* = (x_i - \bar{x}_{(i)}) / s_{x,(i)}$. In the case of externally studentized residuals:
    $z_i^* = (\varepsilon_i - \bar{\varepsilon}_{(i)}) / s_{\varepsilon,(i)}$.
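The two definitions can be compared on a small vector with one extreme value. The data below are made up for illustration, and the code follows the leave-one-out definition given in this answer.

```r
# Made-up data with one unusually large value.
x <- c(1, 2, 3, 4, 10)

# Standardized values: mean and standard deviation from the entire dataset.
z <- (x - mean(x)) / sd(x)

# Externally studentized values: for each i, the mean and standard
# deviation are computed with the ith observation deleted.
z_star <- sapply(seq_along(x), function(i) {
  (x[i] - mean(x[-i])) / sd(x[-i])
})

print(round(z, 3))
print(round(z_star, 3))
```

The extreme value inflates the full-sample standard deviation, so its z-score looks modest, while its leave-one-out score is much larger. For regression residuals, R's built-in rstandard() and rstudent() functions apply the same idea but also adjust for the leverage of each observation, so their values will not match this simple formula exactly.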
     
  11. What is an influence point? List five measures of influence.
    Ans: It is an observation that has an unusually high influence on the regression model; if the observation is deleted from the dataset, the regression model changes dramatically.
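The answer does not list the five measures. As a hedged illustration, the sketch below collects five influence diagnostics available in R's stats package; these may or may not be the five intended by the question. The mtcars data and the model are assumed examples.

```r
# Illustrative model (assumed example, not from the notes).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Five common influence diagnostics, one value per observation.
measures <- data.frame(
  rstudent = rstudent(fit),        # externally studentized residual
  leverage = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d  = cooks.distance(fit),  # Cook's distance
  dffits   = dffits(fit),          # change in fitted value when i is deleted
  covratio = covratio(fit)         # change in the covariance of the estimates
)

# Show the three observations with the largest Cook's distance.
print(head(measures[order(-measures$cooks_d), ], 3))
```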
     
  12. What is multicollinearity? List two measures of multicollinearity.
    Ans: Multicollinearity exists in a multiple regression model if at least one of the independent variables is highly correlated with another independent variable or with a linear combination of some of the other independent variables. Two measures of multicollinearity are the tolerance and the variance inflation factor (VIF).
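Both measures can be computed by hand from auxiliary regressions. For each independent variable, tolerance is 1 minus the R-squared from regressing it on the other independent variables, and VIF is the reciprocal of the tolerance. The sketch assumes three predictors from the built-in mtcars data, chosen only for illustration.

```r
# Assumed example predictors (not from the notes).
predictors <- mtcars[, c("wt", "hp", "disp")]

# Tolerance for each predictor: 1 - R^2 from regressing that predictor
# on all of the other predictors.
tolerance <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  aux <- lm(reformulate(others, response = v), data = predictors)
  1 - summary(aux)$r.squared
})

# Variance inflation factor: reciprocal of the tolerance.
vif <- 1 / tolerance

print(round(rbind(tolerance = tolerance, vif = vif), 3))
```

A tolerance near 0 (equivalently, a large VIF) signals that a predictor is nearly a linear combination of the others.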

 

Using R2 to Pick the Best Regression Model

 

Matrix Form of the Regression Model

 

Confidence and Prediction Bands

 

More about Influence Points

 

Dummy Variables

 

Project 3

 

Stepwise Regression and Model Validation

 

Final Project

 

A Quadratic Model

 

Interaction Terms

 

Transformations