Final Exam Spring 12 -- Answers for Parts A and B.

Part A. Multiple Choice Questions.
1. a. To understand how a random variable behaves, you must know the process by which its values are chosen. Another way of specifying a random variable is to give its probability distribution function (for a discrete random variable) or its probability density function (for a continuous random variable).
2. b. The distance between x = mu and either inflection point is sigma. Therefore the distance between the two inflection points is 2 * sigma.
3. d. The normal distribution is the standard by which all distributions are measured, so it has neither thick nor thin tails. The t distribution has thicker tails than the normal distribution for all degrees of freedom, and the fewer the degrees of freedom, the thicker the tails. The uniform distribution has no tails at all outside its minimum and maximum limits, so it has thin tails.
4. d.
5. d.
6. d. Have each subject evaluate both websites in random order. Pair each subject's evaluation score for one website with that same subject's score for the other website.
7. b. The diagonal elements of Cov(epsilon) are the variances of the residuals, which should all equal sigma^2 if the residuals are homoscedastic. The off-diagonal covariances Cov(epsilon_i, epsilon_j), i != j, should all be 0 if the residuals are uncorrelated. Therefore Cov(epsilon) = sigma^2 I.
8. a.
9. a. If SSE = SST, then SSE / SST = 1, so R-squared = 1 - (SSE / SST) = 1 - 1 = 0.
10. b. The predicted value vector is y^ = Hy, so the residual vector is y - y^ = y - Hy = Iy - Hy = (I - H)y.
The null hypothesis is H0: mu = 108. Because the sample standard deviation is used, the test statistic is t = (xbar - mu) / (sx / sqrt(n)) = (115 - 108) / (12 / sqrt(15)) = 7 / 3.098 = 2.259. Using the t-table with degrees of freedom = n - 1 = 15 - 1 = 14, the acceptance interval at the 10% significance level is [-1.76, 1.76] and at the 5% level is [-2.15, 2.15]. 2.259 lies outside both intervals, so the test is significant at both the 5% and the 10% levels.

Part B: Short Answer Questions.
1.
Look up the value in the t-table for df = 16 - 1 = 15 and upper-tail probability p = 0.005 (99% confidence). The value is 2.947, which gives the interval [-2.947, 2.947] for t:
-2.947 < t < 2.947
-2.947 < (3.72 - mu) / (0.245 / sqrt(16)) < 2.947
-2.947 < (3.72 - mu) / 0.06125 < 2.947
-2.947 * 0.06125 < 3.72 - mu < 2.947 * 0.06125
-0.1805 < 3.72 - mu < 0.1805
-3.9005 < -mu < -3.5395
3.5395 < mu < 3.9005
2. The Van der Waerden normal scores divide the area under the standard normal curve into n + 1 = 9 + 1 = 10 equal areas of 1 / (n + 1) = 1 / 10 = 0.1 each. Look in the body of the z-table to find the z-scores whose cumulative areas (the area to the left of z) are 1/10, 2/10, ..., 9/10. The corresponding z-scores are -1.28, -0.84, -0.52, -0.25, 0.00, 0.25, 0.52, 0.84, 1.28, so the second normal score is -0.84. Of course, you don't have to compute all of the normal scores, just the second, which corresponds to a cumulative area of 2/10 = 0.2.
3. The 75th percentile, Q3, is the z-score for which the area under the normal curve over the interval (-infinity, z] is 0.75. Look this up in the z-table for positive z values to get z = 0.67. Similarly, look up the 25th percentile, Q1, in the z-table for negative z values to get z = -0.67. The interquartile range is IQR = 0.67 - (-0.67) = 1.34. The inner fences are at Q1 - 1.5 * IQR = -0.67 - 1.5 * 1.34 = -2.68 and Q3 + 1.5 * IQR = 0.67 + 1.5 * 1.34 = 2.68. From the z-table, the probability that a standard normal observation is less than z = -2.68 is 0.00368, and by the symmetry of the normal density P(z > 2.68) is also 0.00368. Therefore the probability that a point is an outlier is P(z < -2.68) + P(z > 2.68) = 0.00368 + 0.00368 = 0.00736.
4. Degrees of freedom = n1 + n2 - 2 = 8 + 10 - 2 = 16. The two-sided 5% acceptance interval for t with 16 df is [-2.1199, 2.1199]. The t statistic is -2.497, which lies outside this interval, so reject the null hypothesis that the mean velocities using the two powders are the same.
5.
y - ybar = (r * sy / sx) * (x - xbar)
y - 52 = (0.8 * 3 / 0.05) * (x - 1.7)
y - 52 = 48 * (x - 1.7)
y - 52 = 48 * x - 48 * 1.7
y = 48 * x - 48 * 1.7 + 52
y = 48 * x - 29.6
6. SSE = 8.4, SSM = 5.8, n = 15, p = 6, so DFM = p - 1 = 6 - 1 = 5 and DFE = n - p = 15 - 6 = 9.
F = (SSM / DFM) / (SSE / DFE) = (5.8 / 5) / (8.4 / 9) = 1.243.
The 95% acceptance interval for F with (5, 9) degrees of freedom is [0, 3.482]. Because 1.243 is in this interval, do not reject the null hypothesis that none of the regressors are significant.
7. Multiply row 3 of (X'X)^-1 by X'y: -1.000 * 26.77 + 0.000 * 33.71 + 0.6667 * 50.63 = 6.985.
8. MSE = SSE / DFE = 0.6667 / (6 - 3) = 0.113. Now, Cov(beta^) = (X'X)^-1 * MSE. The standard error of beta2^ is the square root of the element in row 2, column 2 of (X'X)^-1 * MSE = sqrt(0.6667 * 0.113) = 0.274.
9. y^ = Hy, where H is the hat matrix. We only need y5^, which is row 5 of the hat matrix multiplied by y:
(0 0 0 0.333 0.333 0.333) * (5.93 0.71 -3.73 12.87 8.30 2.69)' = 0 * 5.93 + 0 * 0.71 + 0 * (-3.73) + 0.333 * 12.87 + 0.333 * 8.30 + 0.333 * 2.69 = 0.333 * (12.87 + 8.30 + 2.69) = 7.945.
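For the one-sample test in Part A (H0: mu = 108), the arithmetic can be checked with a short Python sketch. It plugs the values that appear in that answer (xbar = 115, mu = 108, sx = 12, n = 15) into t = (xbar - mu) / (sx / sqrt(n)) and compares |t| with the two-sided table values 1.76 (10% level) and 2.15 (5% level) for 14 df. The helper name `one_sample_t` and the `CRITICAL_DF14` dictionary are illustrative, and the critical values are copied from the t-table rather than computed, since the Python standard library has no t quantile function.

```python
import math


def one_sample_t(xbar: float, mu0: float, s: float, n: int) -> float:
    """t statistic for H0: mu = mu0, using the sample standard deviation s."""
    return (xbar - mu0) / (s / math.sqrt(n))


# Two-sided critical values for df = 14, read from a t-table (not computed here).
CRITICAL_DF14 = {0.10: 1.76, 0.05: 2.15}

t = one_sample_t(xbar=115, mu0=108, s=12, n=15)
for alpha, t_crit in CRITICAL_DF14.items():
    reject = abs(t) > t_crit
    print(f"alpha={alpha:.2f}: t={t:.3f}, critical={t_crit}, reject H0: {reject}")
```

Note that the numerator is a difference, xbar - mu, not a ratio.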
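The inequality chain in Part B, question 1 is the usual interval xbar +/- t* * s / sqrt(n). The following minimal sketch reproduces it with xbar = 3.72, s = 0.245, n = 16 and the table value t* = 2.947 quoted in that answer. The function name `t_interval` is hypothetical, and t* is passed in rather than computed.

```python
import math


def t_interval(xbar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """Two-sided confidence interval xbar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)  # 2.947 * 0.06125 for the exam data
    return xbar - margin, xbar + margin


# t* = 2.947 is the t-table value for df = 15, upper-tail probability 0.005 (99%).
low, high = t_interval(xbar=3.72, s=0.245, n=16, t_star=2.947)
print(f"{low:.4f} < mu < {high:.4f}")
```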
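The Van der Waerden scores in Part B, question 2 are the standard normal quantiles at i / (n + 1). As a sketch, Python's `statistics.NormalDist().inv_cdf` can stand in for reading the body of the z-table. The function name `van_der_waerden_scores` is illustrative. Rounded to two decimals, the results should match the table lookups listed in the answer.

```python
from statistics import NormalDist


def van_der_waerden_scores(n: int) -> list[float]:
    """Normal scores z_i with Phi(z_i) = i / (n + 1), for i = 1..n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]


scores = van_der_waerden_scores(9)
print([round(z, 2) for z in scores])
print("second score:", round(scores[1], 2))
```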
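The outlier probability in Part B, question 3 follows from the inner fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. This sketch computes the normal tail area outside those fences. The function name `outlier_probability` is hypothetical. Passing the table-rounded quartiles +/-0.67 reproduces the 0.00736 in the answer. Using unrounded quartiles (about +/-0.6745), the fences move out to about +/-2.698 and the probability drops to roughly 0.007. So the answer's figure reflects table rounding rather than a different method.

```python
from statistics import NormalDist


def outlier_probability(q1: float, q3: float) -> float:
    """P(a standard normal observation falls outside Q1 - 1.5*IQR or Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    z = NormalDist()  # standard normal
    return z.cdf(lower_fence) + (1 - z.cdf(upper_fence))


# Quartiles rounded to two decimals, as read from a z-table.
p_table = outlier_probability(-0.67, 0.67)

# Unrounded quartiles, for comparison.
q3_exact = NormalDist().inv_cdf(0.75)
p_exact = outlier_probability(-q3_exact, q3_exact)

print(round(p_table, 5), round(p_exact, 5))
```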
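The point-slope algebra in Part B, question 5 amounts to slope b1 = r * sy / sx and intercept b0 = ybar - b1 * xbar, because the least-squares line passes through (xbar, ybar). A minimal sketch with the summary statistics from that answer follows. The function name `regression_line` is illustrative.

```python
def regression_line(xbar: float, ybar: float, sx: float, sy: float, r: float) -> tuple[float, float]:
    """Least-squares line y = b0 + b1 * x from summary statistics."""
    b1 = r * sy / sx       # slope
    b0 = ybar - b1 * xbar  # the line passes through (xbar, ybar)
    return b0, b1


b0, b1 = regression_line(xbar=1.7, ybar=52, sx=0.05, sy=3, r=0.8)
print(f"y = {b1:.1f} * x + ({b0:.1f})")
```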
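The overall F test in Part B, question 6 divides the model mean square by the error mean square. The sketch below repeats that computation with SSM = 5.8, SSE = 8.4, n = 15 and p = 6, where p counts all parameters including the intercept, so DFM = p - 1. The function name `overall_f` is hypothetical. The 5% critical value 3.482 is taken from the F-table quoted in the answer, not computed.

```python
def overall_f(ssm: float, sse: float, n: int, p: int) -> float:
    """Overall regression F statistic; p counts all parameters, including the intercept."""
    dfm = p - 1  # model degrees of freedom
    dfe = n - p  # error degrees of freedom
    return (ssm / dfm) / (sse / dfe)


F = overall_f(ssm=5.8, sse=8.4, n=15, p=6)
F_CRIT = 3.482  # F-table value for (5, 9) df at the 5% level
print(f"F = {F:.3f}; reject H0: {F > F_CRIT}")
```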
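Part B, questions 7 to 9 each reduce to a row-times-vector product or a square root. This standard-library sketch repeats them with the numbers exactly as they appear in those answers. That includes MSE = 0.113 and the diagonal element 0.6667 used for the standard error, and hat-matrix entries rounded to 0.333. The `dot` helper is illustrative.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


# Question 7: row 3 of (X'X)^-1 times X'y.
beta_hat = dot([-1.000, 0.000, 0.6667], [26.77, 33.71, 50.63])

# Question 8: standard error = sqrt(diagonal element of (X'X)^-1 times MSE).
se = math.sqrt(0.6667 * 0.113)

# Question 9: fitted value y5^ = row 5 of the hat matrix times y.
y5_hat = dot([0, 0, 0, 0.333, 0.333, 0.333], [5.93, 0.71, -3.73, 12.87, 8.30, 2.69])

print(f"{beta_hat:.3f} {se:.3f} {y5_hat:.3f}")
```

Because the hat-matrix entries are rounded to 0.333, the fitted value carries that rounding. With a different number of decimals in the row, the last digit of y5^ would change.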