
CSC 423 -- 7/27/16

Review Questions

  1. What are the test statistics for (a) the one-sample t-test, (b) the paired-sample t-test, and (c) the independent two-sample t-test? For (c), see the document The Independent Two-sample t-test for the case where the variances of the two groups are assumed to be equal, and see Question 2 for the case where they are not assumed to be equal. Ans:
    (a) $t = (\bar{x} - \mu) / SE_{\bar{x}}$
    (b) if $d_i = x_{1i} - x_{2i}$, then $t = (\bar{d} - 0) / SE_{\bar{d}}$
    (c) See the Independent Two-sample t-test document
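    Statistics (a) and (b) can be sketched in a few lines of Python (used here in place of the course's SAS and R scripts). The sketch assumes the usual standard error of a mean, s / sqrt(n), with s the sample standard deviation; the function names and the before/after data are hypothetical, chosen only for illustration.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / SE_xbar, assuming SE_xbar = s / sqrt(n)."""
    n = len(xs)
    xbar = statistics.mean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return (xbar - mu0) / se

def paired_t(x1, x2):
    """Paired t statistic: a one-sample t-test on the differences against 0."""
    d = [a - b for a, b in zip(x1, x2)]
    return one_sample_t(d, 0.0)

# Hypothetical measurements, for illustration only.
before = [12.0, 15.0, 11.0, 14.0]
after = [10.0, 14.0, 10.0, 11.0]

print(one_sample_t(before, 12.0))
print(paired_t(before, after))
```

    Writing the paired test as a one-sample test on the differences mirrors formula (b): the only change from (a) is that the data are the d_i and the hypothesized mean is 0.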
     
  2. What is the Welch-Satterthwaite test?
    Ans: The Welch-Satterthwaite test (1947) is the independent two-sample t-test used when the variances of the two groups are not assumed to be equal. The independent two-sample t-test with pooled variance is used when the variances are assumed to be equal. The Welch-Satterthwaite test uses an approximation with fractional degrees of freedom. The Behrens-Fisher Problem is the problem of finding a good independent two-sample t-test when the variances of the two groups are not assumed to be equal. The Welch-Satterthwaite test is generally accepted to be the best solution to the Behrens-Fisher Problem, although there are other tests as well. These include the test by Chapman (1950), the Prokof’yev-Shishkin Test (1974), and the Dudewicz-Ahmed Test (1998).
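    To show where the fractional degrees of freedom come from, here is a minimal Python sketch of the commonly quoted Welch statistic and Satterthwaite approximation. The formulas are the standard textbook ones rather than anything taken from the course documents, and the function name and the two groups of data are hypothetical.

```python
import math
import statistics

def welch_t(x1, x2):
    """Return (t, df) for the unequal-variance two-sample t-test."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # s1^2 / n1
    v2 = statistics.variance(x2) / n2  # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    # Satterthwaite approximation: usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with visibly different spreads.
group1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group2 = [2.0, 4.0, 6.0, 8.0]

t, df = welch_t(group1, group2)
print(t, df)
```

    The approximate degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, the value used by the pooled-variance test.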
     
  3. For a regression model, what is the difference between yi and yi^?   between μ and μ^?  In other words, what does the hat ( ^ ) mean?
    Ans: The hat means "estimated": μ^ is the estimated value of the unknown parameter μ, and yi^ is the estimated (fitted) value of the dependent variable at the ith value of the independent variable, whereas yi is the observed value.
     
  4. What is a least squares estimator (LSE) for a regression model?
    Ans: It is the regression equation that minimizes the sum of squares for error (SSE) over all possible regression equations. To find the LSE, the sum of the squares of the residuals (SSE) is differentiated with respect to each of the parameters being estimated. Each of these derivatives is then set to 0, and the resulting set of simultaneous equations is solved for the parameters. We discussed the two simplest cases, horizontal line regression and regression through the origin, in class. See the document Derivations of the LSE for Four Regression Models. You don't need to know the derivations for the final exam, only the resulting regression equations in Question 5.
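    As an illustration of the procedure just described, here is a sketch of the derivation for regression through the origin, where the model is $y = a x$ with a single parameter $a$. The notation is an assumption and may differ from the Derivations document.

$$SSE(a) = \sum_{i=1}^{n} (y_i - a x_i)^2$$

$$\frac{d\,SSE}{da} = -2 \sum_{i=1}^{n} x_i (y_i - a x_i) = 0$$

$$\hat{a} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$

    Because $SSE(a)$ is a quadratic in $a$ with positive leading coefficient $\sum x_i^2$, this stationary point is the minimum.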
     
  5. What are the estimated regression equations for (a) horizontal line regression, (b) regression through the origin, and (c) simple straight line regression? Use the LSE in each case.
    Ans: See Sections 2, 3, and 4 in the Technical Details document: Derivations of the LSE for Four Regression Models. Here are the three regression equations:
    1. Horizontal Regression Equation: $\hat{y} = \bar{y}$
    2. Regression through the Origin: $\hat{y} = \hat{a} x$, where $\hat{a} = (x_1 y_1 + \dots + x_n y_n) / (x_1^2 + \dots + x_n^2)$.
    3. Simple Linear Regression: $\hat{y} - \bar{y} = (r_{xy} s_y / s_x)(x - \bar{x})$

  6. What does the $R^2$ value tell you about a regression model?
    Ans: The larger $R^2$, the better the regression model fits the data. $R^2$ tells you the fraction of the variation in the dependent variable that is explained by the variation in the independent variables.
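    A minimal Python sketch of this idea, assuming the common definition R^2 = 1 - SSE/SST for a model with an intercept. It uses the four data points from Question 8 and their least squares line y^ = 0.5 + 0.8 x, a value worked out for this sketch; the function name is hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST: the fraction of variation explained by the model."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - ybar) ** 2 for yi in y)                # total variation
    return 1.0 - sse / sst

x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [0.5 + 0.8 * xi for xi in x]  # fitted values from the least squares line

print(r_squared(y, y_hat))
```

    For simple straight line regression this value equals the square of the correlation r between x and y.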
     
  7. What are the requirements for a good regression model?
    Ans: A good regression model must be accurate (have a large $R^2$ value), but it must also be parsimonious (use as few independent variables as possible). In addition, its residuals must be well behaved (unbiased, homoscedastic, and normally distributed).
     
  8. Use SAS or R to compute $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, and $r$ for the following bivariate dataset:
     
    x:   1 2 3 4
    y:   1 3 2 4

    Then use these statistics to compute the simple straight line regression equation and the regression through the origin equation for this dataset.
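    Ans: Here are the R and SAS scripts. The computed statistics are
    $\bar{x} = 2.5$, $\bar{y} = 2.5$, $s_x = s_y \approx 1.290994$, and $r = 0.8$.
    Then substitute these values into the simple linear regression equation:
    $\hat{y} - 2.5 = (0.8 \cdot 1.290994 / 1.290994)(x - 2.5)$, which simplifies to $\hat{y} = 0.5 + 0.8 x$.
    Here is the calculation of the regression through the origin equation:
    $\hat{a} = (1 \cdot 1 + 2 \cdot 3 + 3 \cdot 2 + 4 \cdot 4) / (1^2 + 2^2 + 3^2 + 4^2) = 29 / 30 \approx 0.9666667$,
    so that the regression through the origin equation is $\hat{y} = 0.9666667 x$.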
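    The same calculation can be checked with a short Python sketch (not the R or SAS scripts referenced above). It computes the summary statistics for the dataset in this question and plugs them into the equations from Question 5; all variable names are chosen for illustration.

```python
import statistics

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
n = len(x)

# Summary statistics (sample standard deviations use the n - 1 divisor).
xbar = statistics.mean(x)
ybar = statistics.mean(y)
sx = statistics.stdev(x)
sy = statistics.stdev(y)

# Sample correlation: covariance divided by the product of standard deviations.
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
r = sxy / (sx * sy)

# Simple straight line regression: y^ - ybar = (r * sy / sx) * (x - xbar).
slope = r * sy / sx
intercept = ybar - slope * xbar

# Regression through the origin: a^ = sum(x_i * y_i) / sum(x_i^2).
a_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print(xbar, ybar, sx, sy, r)
print(slope, intercept)
print(a_hat)
```

    The last printed value should agree with the coefficient 0.9666667 given above.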
     
  9. Add an options statement to the following SAS script that will modify the listing (typewriter output) as follows:
    1. Suppress the date,
    2. Suppress "The SAS System",
    3. Start numbering the pages at 1,
    4. Set the page width to 70 characters,
    5. Use only keyboard characters.
    Hint: use this option to force only keyboard characters to be used in the output:
    Ans: Add this options statement to the top of the SAS script:
    Also add the statement
    to the data step or proc for which you want to suppress the default title.

 

More About Linear Regression

 

The Adjusted $R^2$ Value

 

Influence Points

 

Multicollinearity

 

Regression Diagnostic Plots