Last revised 8/5/31/17, 9:00am.
Final Exam Review Guide
Date
- In class exam: Wednesday, August 10 (Day 8).
- Online Learning: in class on Wednesday (8/10), or on Thursday (8/11), Friday (8/12), or Saturday (8/13). Finish by 9:00pm on Wednesday or Thursday; finish by 5:00pm on Friday or Saturday.
Bring to Final Exam
- One 8 1/2 × 11 inch sheet of notes (both sides); five-function calculator (+, −, ×, ÷, √).
- Final Exam is closed book.
- z, t, F, and chi-squared tables will be provided.
Exam Format
- Multiple choice with optional reason, short answer questions, short essay questions, problems.
- Find information on and interpret either SAS or R output.
Definitions
- Greek letters (α, β, ε, μ, ρ, σ, θ, χ).
- Data and probability: univariate dataset, variable types (categorical, continuous), population, discrete probability distribution, continuous probability density, cumulative probability distribution, quantile of a probability distribution, random number, random variable, expected value, variance.
- Sampling and experiments: population parameter (θ), sample statistic (T), unbiased statistic (E(T) = θ), random sample, controlled experiment, treatment group, control group, response variable.
- Descriptive statistics and exploratory data analysis: descriptive statistics (mean, standard deviation, standard error of the mean, median, IQR, Tukey's hinges), Central Limit Theorem, exploratory data analysis (histogram, boxplot, normal plot, outlier), normal scores using Van der Waerden's method, interpretation of a normal plot (normal, skewed to the left, skewed to the right, fat tails, thin tails), z-score.
- Inference: confidence interval for μ, degrees of freedom, tests of hypothesis (z-test, t-test; one-sample, paired two-sample, independent two-sample; overall F-test for regression; t-test for significance of a regression parameter), Welch-Satterthwaite test, Behrens-Fisher problem, null hypothesis, alternative hypothesis, test statistic, confidence interval for test statistic, level of a statistical test (α), p-value.
- Regression models: sample covariance (s_xy), sample correlation (r_xy), R-squared value, regression model (horizontal line regression, regression through the origin, simple linear regression, multiple regression), least squares estimator, residual plot (residuals vs. predicted values), partial residual plot (residuals vs. each independent variable), normal plot of residuals (residuals vs. normal scores), qualities of a good regression model (accuracy, parsimony, well-behaved residuals), matrix form of a regression model, hat matrix H, observation space, parameter space, residual space, dummy variables.
- Regression ANOVA: SSM, SSE, SST, DFM, DFE, DFT, MSM, MSE, R² = SSM / SST = 1 − SSE / SST, R-squared for multiple regression, adjusted R-squared for multiple regression, overall F-test.
- Transforms, influence, and collinearity: transforms of regression models (log, sqrt), influence statistics (leverage values h_ii, deleted studentized residual z_i*, Cook's D, DFBETA, DFFITS), collinearity, tolerance, variance inflation factor (VIF).
- Model selection and validation: variable selection for multiple regression models (backward selection, maximum R-squared method using adjusted R-squared), training set, test set, jackknife cross-validation, PRESS statistic, k-fold cross-validation.
- Logistic regression: logistic regression, logistic regression coefficients, Bernoulli distribution, binomial distribution, link function, logit function, odds ratio, predicted probabilities.
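As a study aid, the following Python sketch computes two of the quantities defined above: Tukey's hinges with the usual 1.5 × IQR outlier fences, and Van der Waerden normal scores Φ⁻¹(i / (n + 1)) for ranks i = 1, ..., n. It assumes the common hinge convention in which the median is included in both halves when n is odd; textbooks differ, so check it against the lecture notes. The function names are hypothetical and this is not a reproduction of SAS or R output.

```python
from statistics import NormalDist, median


def tukey_hinges(data):
    """Return (lower hinge, upper hinge).

    Assumed convention: sort the data; when n is odd the median is
    included in both halves. Each hinge is the median of its half.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2  # size of each half; includes the median when n is odd
    return median(xs[:half]), median(xs[n - half:])


def tukey_outliers(data, k=1.5):
    """Values below lower hinge - k*IQR or above upper hinge + k*IQR."""
    lower, upper = tukey_hinges(data)
    iqr = upper - lower  # IQR measured between the hinges
    low_fence, high_fence = lower - k * iqr, upper + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


def van_der_waerden_scores(n):
    """Normal scores Phi^{-1}(i / (n + 1)) for ranks i = 1, ..., n."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [z.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]


if __name__ == "__main__":
    data = [2, 4, 5, 7, 8, 9, 30]  # small made-up sample
    print(tukey_hinges(data))      # (4.5, 8.5)
    print(tukey_outliers(data))    # [30]
    print([round(s, 3) for s in van_der_waerden_scores(4)])
```

To build a normal plot by hand, pair the sorted data with these scores (smallest value with the smallest score) and plot one against the other; a roughly straight line suggests normality.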
Sample Short Essay Questions
Short essays should be written in complete sentences and paragraphs. Include
an introduction and conclusion:
- Based on what you know of SAS or R (choose one) so far, describe what you like and dislike about that statistical software.
Be constructive.
- What are some of the popular transforms that can be used on a regression model? When would you use them?
- Describe some of the statistics that measure the influence of individual observations on a regression model. What do they tell you?
- What is the hat matrix and how is it useful for regression analysis?
- What is a least squares regression estimator and how is one obtained? Give a general description without detailed math equations.
- What are some methods of external validation for a regression model?
- Explain what logistic regression is and how it differs from ordinary multiple regression.
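For reviewing the hat matrix and least squares questions, the standard matrix-form results and the closed-form estimators for the three simple models are collected below. They assume the design matrix X has full column rank and a leading column of ones for the intercept; s_xy is the sample covariance and s_x² the sample variance of x. The essay answers themselves should stay in prose; this block is only a reference for checking your understanding.

```latex
\begin{aligned}
&\text{Model:} && \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\
&\text{Least squares estimator:} && \hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}\\
&\text{Hat matrix:} && H = X(X^{\top}X)^{-1}X^{\top}, \qquad \hat{\mathbf{y}} = H\mathbf{y}, \qquad \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (I - H)\mathbf{y}\\
&\text{Properties:} && H^{\top} = H, \qquad H^{2} = H, \qquad \operatorname{tr}(H) = \text{number of columns of } X, \qquad \tfrac{1}{n} \le h_{ii} \le 1\\
&\text{Horizontal line:} && \hat{y} = \bar{y}\\
&\text{Through the origin:} && \hat{y} = b_1 x, \qquad b_1 = \frac{\sum_i x_i y_i}{\sum_i x_i^{2}}\\
&\text{Simple linear regression:} && \hat{y} = b_0 + b_1 x, \qquad b_1 = \frac{s_{xy}}{s_x^{2}}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\end{aligned}
```

The diagonal entries h_ii are the leverage values: because ŷ = Hy, h_ii measures how strongly observation i's own response pulls on its fitted value.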
Be Able To
- Answer problems like the ones in the Review Questions at the beginning of each of the lecture notes.
- Find descriptive statistics on SAS or R output.
- Interpret histograms, boxplots, residual plots, normal plots.
- Find outliers using Tukey's method.
- Compute normal scores given the sample size using Van der Waerden's method.
- Interpret SAS or R output that performs these t-tests: one-sample, paired-sample, independent two-sample.
- Interpret SAS or R output for multiple regression models, including the overall F-test for regression.
- Write out the five steps of a test of hypothesis (one-sample t-test,
paired-sample t-test, overall F-test for regression).
- Given n, p, SSE, and SSM for a regression model, compute SST, DFM, DFE, DFT, MSM, MSE, F, R-squared, and adjusted R-squared.
- Given appropriate sample statistics, obtain the least squares (LSE) model for horizontal line regression, regression through the origin, and simple linear regression.
- Calculate the predicted value of a new observation for a multiple regression model.
- Find statistics for a regression model on SAS or R output.
- Calculate a confidence interval for a true regression parameter of a multiple regression model.
- Discuss residual plots and normal plots of residuals for a regression model.
- Construct dummy variables for a categorical variable.
- Explain what the Bernoulli and binomial distributions are and some examples where they occur in practical applications.
- Given a probability p, calculate the odds p / (1 − p). Given the odds, calculate the probability.
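The arithmetic in the ANOVA and odds items above can be checked with the short Python sketch below. It assumes p counts the independent variables (not the intercept), so DFM = p, DFE = n − p − 1, and DFT = n − 1; if the lecture notes define p as the number of parameters, shift the degrees of freedom accordingly. Odds here means p / (1 − p); a ratio of two such odds is an odds ratio, which is how exp(b_j) is read in logistic regression. All names are hypothetical and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class AnovaSummary:
    sst: float
    dfm: int
    dfe: int
    dft: int
    msm: float
    mse: float
    f: float
    r2: float
    adj_r2: float


def regression_anova(n: int, p: int, sse: float, ssm: float) -> AnovaSummary:
    """ANOVA quantities for a regression model with an intercept.

    Assumption: p = number of independent variables, excluding the
    intercept, so DFM = p and DFE = n - p - 1.
    """
    sst = ssm + sse                      # SST = SSM + SSE
    dfm, dfe, dft = p, n - p - 1, n - 1  # DFT = DFM + DFE
    msm = ssm / dfm
    mse = sse / dfe
    return AnovaSummary(
        sst=sst, dfm=dfm, dfe=dfe, dft=dft, msm=msm, mse=mse,
        f=msm / mse,                     # overall F statistic
        r2=ssm / sst,                    # equals 1 - SSE / SST
        adj_r2=1 - mse / (sst / dft),    # penalizes extra predictors
    )


def odds_from_probability(prob: float) -> float:
    """Odds in favor of an event with probability prob (0 <= prob < 1)."""
    return prob / (1 - prob)


def probability_from_odds(odds: float) -> float:
    """Inverse of odds_from_probability."""
    return odds / (1 + odds)


if __name__ == "__main__":
    s = regression_anova(n=25, p=3, sse=42.0, ssm=126.0)
    print(s.f, s.r2)                    # 21.0 0.75
    print(probability_from_odds(4.0))   # 0.8
    print(odds_from_probability(0.8))
```

Computing adjusted R-squared as 1 − MSE / (SST / DFT) makes its relation to R² = 1 − SSE / SST visible: each sum of squares is divided by its degrees of freedom.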