Final Exam Fall 11 -- Answers for Part A and Part C.

Part A. Multiple Choice Questions:
1. a   2. b   3. b   4. b   5. d
6. d   7. c   8. a   9. a   10. a
11. d   12. c   13. c   14. c   15. d
16. a   17. d   18. b   19. b   20. b

Part C.

Problem 1:
a. The assumptions for the independent two-sample t-test are (1) the values in each group are normally distributed and (2) the groups are independent. Assumption (1) can be checked by looking at the normal plots on page 3. They show that each group is approximately normal.
b. Null hypothesis: the average reaction times for the two groups are equal. Alternative hypothesis: the average reaction times for the two groups are not equal.
c. t = 1.94
d. Use the t-table with n - 2 = 16 - 2 = 14 degrees of freedom. At the 5% significance level, the acceptance region for t is (-2.145, 2.145).
e. Accept H0 because t = 1.94 lies inside the interval (-2.145, 2.145).
f. p = 0.0729 under the equal variance assumption, p = 0.0747 under the unequal variance assumption. To be safe, use the larger of the two p-values.

Problem 2:
a. The estimated regression equation is
   body_fat = 117.08 + 4.334*triceps - 2.857*thigh - 2.186*midarm
b. R-squared = 0.8014. That is, 80.14% of the variation in body_fat is explained by the independent variables.
c. Yes, the F-value for the overall F-test is 21.52, which has a p-value < 0.0001. R shows the p-value for the overall F-test to be 7.343e-06 = 7.343 x 10^-6.
d. The residual plots on pages 8 and 9 are all approximately unbiased and homoscedastic.
e. The normal plot of the residuals on page 10 is slightly skewed to the right, but it is fairly close to normal.
f. R marks observations 5 and 15 as influential points, with high values of h_ii (the ith diagonal value of the hat matrix). Observation 1 has an h_ii value almost as large. Neither the DFFITS nor the DFBETAS statistics have large absolute values.
g. The VIF values for triceps, thigh, and midarm are 708.84291, 564.34339, and 104.60601, respectively. These are all very large, so one of these variables needs to be removed. Remove the one with the largest p-value.
h. The R-squared value is good, the residual plots are homoscedastic, and the normal plot of the residuals is close to normal. However, there is a serious multicollinearity problem, so some variable must be eliminated from the model.
i. The regression parameter associated with thigh is not significant (p = 0.2849), so eliminate this variable from the model and reevaluate.
j. Predicted body_fat = 117.08 + 4.334*triceps - 2.857*thigh - 2.186*midarm
                      = 117.08 + 4.334*25.0 - 2.857*50.0 - 2.186*26.0
                      = 25.744
   Note that the coefficients for thigh and midarm are negative, which is the opposite of what you would expect. This often happens when multicollinearity is present.
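
The arithmetic in Problem 2, parts g and j, can be checked with a short script. The following is a minimal sketch in Python, which is an assumption of this note (the exam output itself comes from R). It evaluates the estimated equation at triceps = 25.0, thigh = 50.0, midarm = 26.0, and converts each reported VIF into the R-squared obtained when that predictor is regressed on the other two, using the standard relation VIF = 1 / (1 - R_j^2). The function and variable names are illustrative only.

```python
# Coefficients copied from the estimated regression equation (Problem 2a).
COEFFICIENTS = {
    "intercept": 117.08,
    "triceps": 4.334,
    "thigh": -2.857,
    "midarm": -2.186,
}

# VIF values copied from Problem 2g.
VIFS = {"triceps": 708.84291, "thigh": 564.34339, "midarm": 104.60601}


def predict_body_fat(triceps, thigh, midarm):
    """Evaluate the fitted equation at the given predictor values."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["triceps"] * triceps
        + c["thigh"] * thigh
        + c["midarm"] * midarm
    )


def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R_j^2) to recover R_j^2."""
    return 1.0 - 1.0 / vif


if __name__ == "__main__":
    # Part j: prediction at triceps = 25.0, thigh = 50.0, midarm = 26.0.
    print(f"predicted body_fat = {predict_body_fat(25.0, 50.0, 26.0):.3f}")
    # Part g: how much of each predictor is explained by the other two.
    for name, vif in VIFS.items():
        print(f"{name}: VIF = {vif:.2f}, R_j^2 = {r_squared_from_vif(vif):.4f}")
```

Each R_j^2 comes out above 0.99, so every predictor is almost an exact linear combination of the other two. That is consistent with the multicollinearity diagnosis in parts g and h.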