Final Exam Winter 12 -- Answers for Parts A and C.

Part A. Multiple Choice Questions.

Reasons are supplied only for selected questions.

1. a.

2. b; The distance between x = mu and either inflection point is sigma. Therefore the distance between the two inflection points is 2 * sigma.

3. c; The 75th percentile is the z-score for which the area under the standard normal curve over the interval [-infinity, z] is 0.75. Looking this up in the z-table for positive z-values gives z = 0.67. Similarly, looking up the 25th percentile in the z-table for negative z-values gives z = -0.67. The interquartile range is IQR = 0.67 - (-0.67) = 1.34.

4. a.
5. d.
6. d.

7. The closest answer is b. For the 99% confidence interval (z = 2.58):
   -2.58 < z < 2.58
   -2.58 < (4.52 - mu) / (0.345 / sqrt(16)) < 2.58
   -2.58 < (4.52 - mu) / 0.08625 < 2.58
   -2.58 * 0.08625 < 4.52 - mu < 2.58 * 0.08625
   -0.2225 < 4.52 - mu < 0.2225
   -4.743 < -mu < -4.297
   4.297 < mu < 4.743

8. b.

9. a; The null hypothesis is H0: mu = 108. Because the sample standard deviation is used, the test statistic is t = (xbar - mu) / (sx / sqrt(n)) = (115 - 108) / (12 / sqrt(15)) = 2.26. Using the t-table with degrees of freedom = n - 1 = 15 - 1 = 14, the 90% confidence interval (10% significance level) is [-1.76, 1.76] and the 95% confidence interval (5% significance level) is [-2.15, 2.15]. Since 2.26 lies outside both intervals, the test is significant at both the 10% and the 5% levels.

10. d; Have each subject evaluate both websites in random order. Pair each subject's evaluation score for one website with that same subject's score for the other website.

11. a.

12. b; The Van der Waerden normal scores divide the standard normal curve into n + 1 = 9 + 1 = 10 equal areas, each of size 1 / (n + 1) = 1 / 10 = 0.1. Look in the body of the z-table for the z-scores whose areas to the left under the standard normal curve are 1/10, 2/10, ..., 9/10. The corresponding z-scores are -1.28, -0.84, -0.52, -0.25, 0.00, 0.25, 0.52, 0.84, 1.28, so the fifth normal score is 0.00. Of course, you don't have to compute all the normal scores.
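The table lookups and arithmetic in questions 3, 7, 9 and 12 can be checked numerically. This is a minimal sketch that assumes Python 3.8 or later and uses the standard-library statistics.NormalDist in place of the z-table. The standard library has no t quantile function, so for question 9 the statistic is compared with the t-table values quoted in that answer rather than computing them.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mu = 0, sigma = 1

# Question 3: 25th and 75th percentiles and the interquartile range.
z25, z75 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
iqr = z75 - z25
print(f"Q3:  z25 = {z25:.4f}, z75 = {z75:.4f}, IQR = {iqr:.4f}")

# Question 7: 99% interval for mu with xbar = 4.52, standard deviation 0.345, n = 16.
z99 = Z.inv_cdf(0.995)            # exact critical value; the answer uses the table value 2.58
se = 0.345 / sqrt(16)             # standard error = 0.08625
lo, hi = 4.52 - 2.58 * se, 4.52 + 2.58 * se
print(f"Q7:  z99 = {z99:.3f}, {lo:.3f} < mu < {hi:.3f}")

# Question 9: one-sample t statistic for H0: mu = 108 (xbar = 115, s = 12, n = 15).
# 1.76 (10% level) and 2.15 (5% level) are the t-table critical values for df = 14.
t = (115 - 108) / (12 / sqrt(15))
print(f"Q9:  t = {t:.3f}, reject at 10%: {abs(t) > 1.76}, reject at 5%: {abs(t) > 2.15}")

# Question 12: Van der Waerden scores for n = 9 use the cumulative areas i / (n + 1).
n = 9
scores = [Z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
print("Q12:", [round(s, 2) for s in scores])
```

The exact quartiles are about -0.6745 and 0.6745, so the exact IQR is about 1.349; the 1.34 in question 3 comes from the two-decimal table values. Likewise, the exact 99% critical value is about 2.576, while the interval in question 7 uses the table value 2.58.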
Shortcut for question 12: the fifth normal score is the middle one of the nine, so it divides the standard normal curve (split into n + 1 = 10 equal areas) exactly in half. The z-score that does this is z = 0.00.

13. a; The R-squared value is the square of the correlation coefficient rxy = sxy / (sx * sy) = 361 / (25.3 * 31.7) = 0.450. Therefore R-squared = r^2 = 0.450^2 = 0.203.

14, 15, 16. Use ' to denote the matrix transpose and ^-1 to denote the matrix inverse. The solution to the matrix normal equation is beta = (X'X)^-1 (X'Y):

   [beta0]   [ 0.89583  -0.1875  -0.0556 ]   [ 87.2]   [7.3592]
   [beta1] = [-0.18750   0.0625   0.0000 ] * [274.0] = [0.7750]
   [beta2]   [-0.05556   0.0000   0.0185 ]   [348.6]   [1.6043]

14. b; You can calculate beta0 directly as row 1 of (X'X)^-1 times X'Y:
    0.89583 * 87.2 + (-0.1875) * 274.0 + (-0.0556) * 348.6 = 7.3592

15. b; Calculate beta2 directly as row 3 of (X'X)^-1 times X'Y:
    -0.05556 * 87.2 + 0.0000 * 274.0 + 0.0185 * 348.6 = 1.6043

16. b; DFE = n - p = 6 - 3 = 3, and

    Var(beta) = (X'X)^-1 * MSE = (X'X)^-1 * (SSE / DFE)

                [ 0.89583  -0.1875  -0.0556 ]
              = [-0.18750   0.0625   0.0000 ] * (0.7357 / (6 - 3))
                [-0.05556   0.0000   0.0185 ]

                [ 0.2197  -0.0460  -0.0136 ]
              = [-0.0460   0.0153   0.0000 ]
                [-0.0136   0.0000   0.0045 ]

    The standard error of beta1 is the square root of its diagonal entry: sqrt(0.01533) = 0.1238. You can compute this directly with sqrt(0.0625 * (0.7357 / (6 - 3))).

17. b; R-squared = 1 - (SSE / SST) = 1 - (0.7367 / 149.78) = 0.9950815.

18. b; F = 4.17, DFM = p - 1 = 7 - 1 = 6, DFE = n - p = 19 - 7 = 12. Use the F-tables to get the 95% confidence interval (level = 0.05, DF numerator = 6, DF denominator = 12): [0, 2.996], and the 99% confidence interval (level = 0.01, DF numerator = 6, DF denominator = 12): [0, 4.821]. F = 4.17 lies outside the 95% interval but inside the 99% interval, so the F-test is significant at the 5% level but not at the 1% level.

19. b.
20. b.

Part C. Short Answer Questions.

1. Looking at the boxplots, there are no outliers. The normal plots show that the Depression Score values are approximately normal, and again there are no outliers (no points lie far below the line at the left end of a normal plot or far above it at the right end).

2.
Step 1: H0: mu for men = mu for women; H1: mu for men != mu for women.
Step 2: t = 2.36 assuming equal variances; t = 2.24 not assuming equal variances.
Step 3: Look up the confidence interval in the t-table with degrees of freedom = n - 2 (n = total number of subjects), assuming equal variances: [-2.026, 2.026].
Step 4: Neither 2.36 nor 2.24 is in the interval [-2.026, 2.026], so reject H0.
Step 5: The p-value is 0.0227 when assuming equal variances of the groups and 0.0327 when not assuming equal variances. In either case p < 0.05, which confirms that rejecting H0 is correct.

3. The independent two-sample t-test assumes that the values in each group come from a normally distributed population. The normal plots on Page 13 show that this is the case.

4. For Model 1, F = 6.55 and p = 0.0012, so reject the null hypothesis that all of the regression coefficients equal 0 except possibly beta0 (the intercept coefficient).

5. Model 4 (dep = age wp) has the largest adjusted R-squared value, 0.2397. In addition, the residual plot for Model 4 shows that the residuals are homoscedastic and unbiased, and the normal plot shows that the residuals are approximately normal. Model 4 looks like the best model.

6. The variance inflation factor (VIF) indicates whether there is a linear dependency among the independent variables. A commonly used cutoff value for the VIF is 5. For Models 1, 2, 3, and 4 the VIF values are all close to 1, so there is no multicollinearity problem in these models.
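The VIF for predictor j is 1 / (1 - Rj^2), where Rj^2 is the R-squared from regressing predictor j on the other predictors. The sketch below covers only the special case of exactly two predictors, where Rj^2 reduces to the squared correlation between them. The function names and the data are hypothetical illustrations, not the exam's models, and the cutoff of 5 is the one quoted in answer 6.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors.

    Regressing x1 on x2 gives R^2 = r^2, so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical data for illustration only (not from the exam).
x1 = [23, 35, 41, 52, 60, 29, 47, 38]
x2 = [3.1, 2.4, 4.0, 2.9, 3.5, 3.8, 2.2, 3.3]   # only weakly related to x1
x3 = [1, 2, 3, 4, 5, 6, 7, 8]
x4 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.8, 8.1]   # nearly a copy of x3

for name, a, b in [("x1, x2", x1, x2), ("x3, x4", x3, x4)]:
    vif = vif_two_predictors(a, b)
    verdict = "possible multicollinearity" if vif > 5 else "no problem"
    print(f"{name}: VIF = {vif:.2f} ({verdict})")
```

With more than two predictors, Rj^2 has to come from a full regression of predictor j on all the others; the two-predictor shortcut no longer applies.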
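The matrix arithmetic in Part A questions 14 to 17 can be reproduced in plain Python without a linear-algebra library. This is a minimal sketch that uses the rounded (X'X)^-1 and X'Y values quoted in questions 14 to 16, n = 6, p = 3, SSE = 0.7357 as in question 16, and SST = 149.78 as in question 17; the helper name mat_vec is hypothetical.

```python
from math import sqrt

# (X'X)^-1 and X'Y as quoted in questions 14-16 (already rounded).
XtX_inv = [
    [ 0.89583, -0.1875, -0.0556],
    [-0.18750,  0.0625,  0.0000],
    [-0.05556,  0.0000,  0.0185],
]
XtY = [87.2, 274.0, 348.6]

def mat_vec(A, v):
    """Multiply a matrix A (a list of rows) by a vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Questions 14-15: beta = (X'X)^-1 (X'Y).
beta = mat_vec(XtX_inv, XtY)

# Question 16: Var(beta) = (X'X)^-1 * MSE, with MSE = SSE / DFE and DFE = n - p.
n, p, sse = 6, 3, 0.7357
mse = sse / (n - p)
var_beta = [[a * mse for a in row] for row in XtX_inv]
se_beta1 = sqrt(var_beta[1][1])   # diagonal entry for beta1

# Question 17: R-squared = 1 - SSE / SST.
sst = 149.78
r2 = 1 - sse / sst

print("beta      =", [round(b, 4) for b in beta])
print("Var(beta) =", [[round(a, 4) for a in row] for row in var_beta])
print("SE(beta1) =", round(se_beta1, 4))
print("R-squared =", round(r2, 4))
```

Because the quoted (X'X)^-1 entries are already rounded, the results match the answers above to four decimals but may differ slightly from a fit on the unrounded data.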