CSC 423 -- 7/31/17

Review Exercises

  1. What is an R package? How do you install an R package?
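    Ans: An R package is a collection of R functions, data, and compiled code that performs a task useful to R users. You install an R package with the install.packages function, and then load it into the current session with the library function. For example, to install the leaps package:
        install.packages("leaps")
        library(leaps)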
  2. If x is a random variable, what does E(x) mean?
    Ans: E(x) is the expected value of x, which is the long-run average of the outcomes of x. If x is a discrete random variable defined by a probability distribution like this:
    Outcome    Probability
    x_1        P(x_1)
    x_2        P(x_2)
    ...        ...
    x_n        P(x_n)

    the expected value of x is defined by
    $E(x) = \sum_{i=1}^{n} x_i P(x_i)$
    If p is the probability density of a continuous random variable x, compute the expected value with this integral:
    $E(x) = \int_{-\infty}^{\infty} x \, p(x) \, dx$
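    Both definitions can be checked numerically. The sketch below is only an illustration: the fair six-sided die and the Uniform(0, 1) density are assumed examples, not distributions taken from these notes.

```r
# Discrete case: E(x) = sum of x_i * P(x_i), here for a fair six-sided die
# (an assumed example distribution).
outcomes <- 1:6
probs <- rep(1 / 6, 6)
e_discrete <- sum(outcomes * probs)

# Continuous case: E(x) = integral of x * p(x) dx, here for Uniform(0, 1)
# (also an assumed example). The density is zero outside [0, 1], so the
# integral only needs to run over that interval.
p <- function(x) dunif(x, min = 0, max = 1)
e_continuous <- integrate(function(x) x * p(x), lower = 0, upper = 1)$value

print(e_discrete)
print(e_continuous)
```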

  3. What does the term unbiased mean?
    Ans: A sample statistic T is unbiased for a population parameter θ if E(T) = θ. Here are four examples of unbiased statistics that we have seen:
    a. Sample Mean: $E(\bar{y}) = \mu$ (The sample mean is unbiased for the population mean.)
    b. Sample Variance: $E(s_y^2) = \sigma^2$ (The sample variance is unbiased for the population variance.)
    c. Estimated Regression Parameter: $E(\hat{\beta}_j) = \beta_j$ (Each estimated regression parameter is unbiased for the true population parameter.)
    d. Estimated Value: $E(\hat{y}_i) = E(y_i)$ (Each predicted value is unbiased for the mean response at the ith observation.)
    Items a, b, c, and d are based on the assumption that the residuals have mean zero: $E(\varepsilon_i) = 0$.
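    Unbiasedness can be illustrated by simulation: draw many samples, compute the statistic on each one, and compare the average of the statistic with the true parameter. The sketch below does this for the sample mean and the sample variance. The population (normal with mean 5 and standard deviation 2), the sample size, and the seed are all arbitrary choices made for the illustration.

```r
set.seed(423)  # arbitrary seed so the run is reproducible

mu <- 5        # assumed population mean
sigma <- 2     # assumed population standard deviation
n <- 10        # size of each sample
reps <- 20000  # number of simulated samples

# Each row of the matrix is one simulated sample of size n.
samples <- matrix(rnorm(n * reps, mean = mu, sd = sigma), nrow = reps)

sample_means <- rowMeans(samples)
sample_vars <- apply(samples, 1, var)  # var() divides by n - 1

# Averages over many samples should be close to mu and sigma^2.
mean_of_means <- mean(sample_means)
mean_of_vars <- mean(sample_vars)

print(c(mean_of_means = mean_of_means, mu = mu))
print(c(mean_of_vars = mean_of_vars, sigma2 = sigma^2))
```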
  4. What is a residual plot? A partial residual plot?
    Ans: A residual plot is a plot of the residuals vs. one of the independent variables. If x is an independent variable, the plot can be created in either SAS or R by plotting the residuals of the fitted model against x.
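    A residual plot takes only a few lines in R. This is a minimal sketch using base R graphics on simulated data; the variable names, the true line y = 3 + 2x, and the noise level are assumptions made for the example, and the SAS version is not shown.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

# Simulated data (assumed for illustration): a straight line plus noise.
x <- seq(1, 20, length.out = 40)
y <- 3 + 2 * x + rnorm(40, sd = 1.5)

fit <- lm(y ~ x)
res <- resid(fit)  # residuals: observed y minus fitted y

# Residual plot: residuals on the vertical axis, x on the horizontal axis.
plot(x, res, xlab = "x", ylab = "Residual", main = "Residuals vs. x")
abline(h = 0, lty = 2)  # reference line at zero
```

    If the model is adequate, the points should scatter randomly around the zero line with no visible pattern.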
  5. What are the definitions of these statistics?

    SSM = Sum of Squares for Model = $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
    SSE = Sum of Squares for Error = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
    SST = Total Sum of Squares = $\sum_{i=1}^{n} (y_i - \bar{y})^2$
    DFM = Degrees of Freedom for Model = p - 1
    DFE = Degrees of Freedom for Error = n - p
    DFT = Total Degrees of Freedom = n - 1
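    These definitions can be computed directly from a fitted model. The sketch below uses R's built-in mtcars dataset with an arbitrarily chosen model (mpg on wt and hp), which is an assumption for illustration only. Here p counts the intercept, as in the formulas above.

```r
# Arbitrary example model on the built-in mtcars data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
y_hat <- fitted(fit)
n <- length(y)
p <- length(coef(fit))  # number of parameters, including the intercept

SSM <- sum((y_hat - mean(y))^2)  # sum of squares for model
SSE <- sum((y - y_hat)^2)        # sum of squares for error
SST <- sum((y - mean(y))^2)      # total sum of squares

DFM <- p - 1
DFE <- n - p
DFT <- n - 1

print(c(SSM = SSM, SSE = SSE, SST = SST))
print(c(DFM = DFM, DFE = DFE, DFT = DFT))
```

    For a model with an intercept, SSM + SSE = SST and DFM + DFE = DFT.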
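  6. What does the overall F-test for regression tell you?
    Ans: The overall F-test tests the null hypothesis that all of the non-intercept regression parameters are equal to zero:
    $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$

    The test statistic is
    $F = \frac{SSM / DFM}{SSE / DFE}$

    Assuming the null hypothesis is true, look up the confidence interval (for example, 95%) for the F statistic in an F table with (DFM, DFE) degrees of freedom. Reject the null hypothesis if the observed F falls outside this interval.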
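  7. Given these statistics for a regression problem:

    n = 18, p = 4, SSM = 59.1, SSE = 130.8

    should you accept or reject the null hypothesis for the overall F-test?

    Ans: DFM = p - 1 = 4 - 1 = 3, DFE = n - p = 18 - 4 = 14, so
    $F = \frac{SSM / DFM}{SSE / DFE} = \frac{59.1 / 3}{130.8 / 14} = 2.109$

    The 95% confidence interval for the F statistic with (3, 14) degrees of freedom is (0, 3.344). Since F = 2.109 lies inside this interval, accept (that is, fail to reject) the null hypothesis.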
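    Instead of an F table, the upper end of the interval can be computed with R's qf function. The sketch below uses the values from the answer above (n = 18, p = 4, SSM = 59.1, SSE = 130.8). The p-value step is an addition that the exercise does not ask for.

```r
n <- 18
p <- 4
SSM <- 59.1
SSE <- 130.8

DFM <- p - 1
DFE <- n - p

# Overall F statistic: mean square for model over mean square for error.
F_stat <- (SSM / DFM) / (SSE / DFE)

# Upper end of the 95% interval for F with (DFM, DFE) degrees of freedom.
F_crit <- qf(0.95, df1 = DFM, df2 = DFE)

# Equivalent decision via the p-value (an addition for illustration).
p_value <- pf(F_stat, df1 = DFM, df2 = DFE, lower.tail = FALSE)

reject <- F_stat > F_crit
print(c(F_stat = F_stat, F_crit = F_crit, p_value = p_value))
print(reject)
```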
  8. For the regression model in Question 7, compute $R^2$ and adjusted $R^2$.
    Ans: $R^2 = SSM / (SSM + SSE) = 59.1 / (59.1 + 130.8) = 0.311$. Then
    $R^2_{adj} = 1 - (1 - R^2)((n - 1) / (n - p)) = 1 - (1 - 0.311)(17 / 14) = 0.163$.
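    The same arithmetic can be checked in R. This sketch uses the same values as the answer (n = 18, p = 4, SSM = 59.1, SSE = 130.8); the variable names are chosen for the example. Carrying full precision instead of the rounded 0.311 gives an adjusted value of about 0.1636.

```r
n <- 18
p <- 4
SSM <- 59.1
SSE <- 130.8

# R-squared: proportion of total variation explained by the model.
r2 <- SSM / (SSM + SSE)

# Adjusted R-squared: penalizes R-squared for the number of parameters.
r2_adj <- 1 - (1 - r2) * ((n - 1) / (n - p))

print(round(c(r2 = r2, r2_adj = r2_adj), 4))
```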
  9. What are some SAS options for the model statement of proc reg?
    Ans: influence noint tol vif. Today, we will see the options p and r.
  10. What is the difference between a standardized residual and an externally studentized residual?
    Ans: A standardized variable is another name for the z-score of a variable:
    $z_i = (x_i - \bar{x}) / s_x$, where $\bar{x}$ and $s_x$ are computed from the entire dataset. For the externally studentized variable, the sample mean and standard deviation are computed from the dataset with the ith observation deleted:
    $z_i^* = (x_i - \bar{x}_{(i)}) / s_{x,(i)}$. In the case of externally studentized residuals:
    $z_i^* = (\varepsilon_i - \bar{\varepsilon}_{(i)}) / s_{\varepsilon,(i)}$.
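    The two formulas can be compared on a small vector. The sketch below applies them exactly as written above to made-up data with one extreme value. It does not use R's built-in rstandard and rstudent functions, which work on fitted regression models and also adjust for leverage.

```r
# Made-up data with one extreme value in the last position.
x <- c(2, 4, 4, 5, 6, 7, 30)

# Standardized values: mean and sd computed from the entire dataset.
z <- (x - mean(x)) / sd(x)

# Externally studentized values: for observation i, the mean and sd
# are computed with observation i deleted.
z_star <- sapply(seq_along(x), function(i) {
  (x[i] - mean(x[-i])) / sd(x[-i])
})

print(round(cbind(x = x, z = z, z_star = z_star), 3))
```

    The extreme value inflates the mean and standard deviation of the full dataset, so its ordinary z-score looks modest. Deleting it before computing the mean and standard deviation makes it stand out much more clearly.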
  11. What is an influence point? List five measures of influence.
    Ans: It is an observation that has an unusually high influence on the regression model; if the observation is deleted from the dataset, the regression model changes dramatically.
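    The definition suggests a direct check: fit the model with and without the suspect observation and compare the results. The sketch below does this on made-up data, where ten points lie close to the line y = 2x and an eleventh point sits far from it. Cook's distance (cooks.distance in base R) is used here as one example of an influence measure.

```r
# Made-up data: ten points near y = 2x, plus one point far off the line.
x <- c(1:10, 30)
noise <- c(0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1)
y <- c(2 * (1:10) + noise, 0)

fit_all <- lm(y ~ x)                 # all 11 observations
fit_without <- lm(y[-11] ~ x[-11])   # observation 11 deleted

slope_all <- unname(coef(fit_all)[2])
slope_without <- unname(coef(fit_without)[2])

# Cook's distance for each observation in the full fit.
cooks <- cooks.distance(fit_all)
most_influential <- unname(which.max(cooks))

print(c(slope_all = slope_all, slope_without = slope_without))
print(most_influential)
```

    Deleting the one observation changes the fitted slope dramatically, which is what marks it as an influence point.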
  12. What is multicollinearity? List two measures of multicollinearity.
    Ans: Multicollinearity exists in a multiple regression model if at least one of the independent variables is highly correlated with another independent variable or with a linear combination of some of the other independent variables. Two measures of multicollinearity are the tolerance and the variance inflation factor (VIF).
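    Both measures come from regressing each independent variable on the others. If $R_j^2$ is the R-squared of that regression for variable j, then the tolerance is $1 - R_j^2$ and the VIF is $1 / (1 - R_j^2)$. The sketch below computes them in base R; the three predictors from the built-in mtcars dataset are an arbitrary choice for the example.

```r
# Arbitrary example: three predictors from the built-in mtcars data.
predictors <- mtcars[, c("wt", "hp", "disp")]

# For each predictor, regress it on the other predictors and
# compute tolerance = 1 - R^2 of that regression.
tolerance <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  aux_fit <- lm(reformulate(others, response = v), data = predictors)
  1 - summary(aux_fit)$r.squared
})

# The variance inflation factor is the reciprocal of the tolerance.
vif <- 1 / tolerance

print(round(rbind(tolerance = tolerance, vif = vif), 3))
```

    A small tolerance (equivalently, a large VIF) means the variable is nearly a linear combination of the other independent variables.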


