Final Exam Review Guide
Format of Exam Questions
- Short answer, multiple choice, short essay, problems, SPSS analysis
Symbols
- μ: Population mean
- σ: Population standard deviation
- σ²: Population variance
- x̄: Sample mean
- SD: Sample standard deviation, divide by n
- sx = SD+: Sample standard deviation, divide by n - 1
- Q0: Minimum value in sample
- Q1: 1st Quartile = 25th Percentile
- Q2: 2nd Quartile = 50th Percentile
- Q3: 3rd Quartile = 75th Percentile
- Q4: Maximum value in sample
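The two standard-deviation conventions above differ only in the divisor. A small Python sketch, with a made-up dataset for illustration, that computes both:

```python
import math

# Hypothetical dataset, for illustration only
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)
mean = sum(x) / n                       # sample mean (x-bar)
ss = sum((xi - mean) ** 2 for xi in x)  # sum of squared deviations

sd = math.sqrt(ss / n)              # SD: divide by n
sd_plus = math.sqrt(ss / (n - 1))   # SD+ (sx): divide by n - 1

print(mean, sd, round(sd_plus, 4))  # 5.0 2.0 2.1381
```

SD+ is always slightly larger than SD, and the gap shrinks as n grows.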
Formulas
- Interquartile Range: IQR = Q3 - Q1
- Inner fences for boxplot: Q1 - 1.5 × IQR; Q3 + 1.5 × IQR
- Outer fences for boxplot: Q1 - 3.0 × IQR; Q3 + 3.0 × IQR
- z-score for individual observations: z = (x - x̄) / SD+
- Standard error of the average: SEave = SD+ / √n
- z-score for sample average: z = (x̄ - μ) / SEave
- Ideal measurement model: xi = μ + ei
- Linear regression model: yi = axi + b + ei
- Estimated linear regression model: ŷi - ȳ = (r SDy / SDx)(xi - x̄)
- Root mean squared error for regression: RMSE = SDy √(1 - r²)
- Addition Rule: if A and B are disjoint events, P(A ∪ B) = P(A) + P(B).
- Multiplication Rule: if A and B are independent events, P(A ∩ B) = P(A)P(B)
- Probability of at least one success in n Bernoulli trials: 1 - (1 - p)ⁿ
- Expected Value of a random variable: E(x) = x1P(x1) + ... + xmP(xm)
- Theoretical Variance of a random variable: Var(x) = (x1 - E(x))² P(x1) + ... + (xm - E(x))² P(xm)
- Theoretical SD of a random variable: SD(x) = √Var(x)
- Expected Value of a Sum: E(S) = nE(x1)
- Theoretical Variance of a Sum, where the random variables x1, ... , xn
are independent: Var(S) = nVar(x1)
- Standard Error of a Sum, where the random variables x1, ... , xn
are independent: σS = σx √n
- Standard Error of an Average, where the random variables x1, ... , xn
are independent: σave = σx / √n
- Test Statistic for a z-test: z = (x̄ - μ) / SEave, SEave = SD+ / √n
- Test Statistic for a t-test: t = (x̄ - μ) / SEave, SEave = SD+ / √n
- Test Statistic for a Chi-squared test: χ² = (O1 - E1)² / E1 + ... + (Ok - Ek)² / Ek
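Several of the formulas above can be checked numerically. Here is a short Python sketch; all inputs are invented for illustration:

```python
import math

# Boxplot fences from the quartiles (made-up quartiles)
q1, q3 = 10.0, 18.0
iqr = q3 - q1                                # IQR = Q3 - Q1 = 8.0
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # inner fences: (-2.0, 30.0)
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)     # outer fences: (-14.0, 42.0)

# z-score for a sample average (made-up numbers)
xbar, mu, sd_plus, n = 103.0, 100.0, 15.0, 36
se_ave = sd_plus / math.sqrt(n)              # SE_ave = SD+ / sqrt(n) = 2.5
z = (xbar - mu) / se_ave                     # z = 1.2

# Expected value, theoretical variance and SD of a discrete rv
values, probs = [0, 1, 2], [0.25, 0.50, 0.25]
ev = sum(x * p for x, p in zip(values, probs))               # E(x) = 1.0
var = sum((x - ev) ** 2 * p for x, p in zip(values, probs))  # Var(x) = 0.5
sd = math.sqrt(var)                          # SD(x) = sqrt(0.5) ≈ 0.707

# Probability of at least one success in n Bernoulli trials
p, trials = 1 / 6, 4                         # e.g. four rolls of a die
at_least_one = 1 - (1 - p) ** trials         # ≈ 0.5177
```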
Persons
- Pascal, Graunt, de Moivre, Cotes, Gauss, Fisher, Tukey
Definitions
- Controlled experiment, double blind, randomization, observational
study, lurking variables (also called confounding factors),
univariate dataset, histogram, density histogram, bin, mean,
median, variance, parsimonious, stem-plot, boxplot, normal plot, mild outliers,
extreme outliers, normal histogram, ideal measurement model, bias,
center, spread, plot of xi vs. i (unbiased, biased, homoscedastic,
heteroscedastic, standard normal curve), critical point, inflection point,
standard units, standard error of the mean, normal score (Van der Waerden's
method), normal plot, bivariate dataset, bivariate normal, correlation,
R-squared value, causation, regression line, residual plot (residuals vs.
predicted values, unbiased, biased, homoscedastic, heteroscedastic), root mean
square error, probability, ways to obtain probabilities (theoretical,
frequentist, subjective), fair bet, mutually exclusive events, the addition
rule, independent events, the multiplication rule, n factorial (n!), 0! = 1,
binomial formula, random variable (rv), probability distribution, expected value
of a rv, theoretical SD of a random variable, expected value and theoretical SD
of a sum, sample mean, SD+, expected value, and SE for
average in ideal measurement model, expected value and SD for Bernoulli random
variables, law of averages (law of large numbers = LLN), central limit theorem =
CLT, normal approximation, confidence interval for p, confidence interval for μ,
the 5 steps to a test of hypothesis, null hypothesis, alternative hypothesis,
test statistic, one-sample z-test for average, z-test for a sum, one-sample t-test,
95% confidence
intervals using z and t tables, p-value, paired-sample z- and t-tests, independent two-sample z- and t-tests, importance vs.
significance, one-tailed tests, chi-squared test for
goodness-of-fit, chi-squared test for independence, bootstrapping, Simpson's
Paradox.
- Where appropriate, know both the defining formula and the
intuitive idea behind the concept.
Know How To
- Construct a stem-plot, also called a stem-and-leaf display.
- Determine the number or percentage of observations in an
interval of a histogram (assuming the data in each bin is distributed
uniformly).
- Draw a histogram with possibly unequal class widths.
- Compute the median, Q1, Q3, IQR, and mean of a histogram.
- Estimate the proportion of observations in an interval for a
histogram.
- Find the proportion of observations within a given interval of
a normal histogram using the standard normal table.
- Write down or discuss the ideal measurement model.
- Compute the standard error of the average and a 95% confidence
interval for the true measurement in the ideal measurement model.
- Draw the boxplot and use it to detect outliers.
- Given an x-value, x̄, and SD+, compute the z-score.
- Use the normal table to determine the proportion of observations in
a bin of the form [a, b], (-∞, a], or [a, ∞).
- Given a number p between 1 and 100, find the percentile for that
value, using a normal table: work backwards by looking up the proportion
in the body of the table to find the corresponding z-score, then use
x = μ + z · σ, if necessary.
- Use SPSS to compute proportions under the normal curve with Cdf.Normal.
- Use SPSS to compute percentiles with Idf.Normal.
- Find normal scores using Van der Waerden's method.
- Interpret a normal plot (normal, skewed to the left or right,
thin tails, thick tails).
- Discuss a plot of xi vs. i.
- Compute a regression equation using the formula
ŷ - ȳ = (r SDy / SDx)(x - x̄)
- Given a regression equation and x value find the predicted y value.
- Assess whether a regression model is adequate using residual and
normal plots.
- Interpret a residual plot (unbiased, biased, homoscedastic, heteroscedastic).
- Compute the root mean squared error using this formula:
RMSE = SDy √(1 - r²)
- Calculate probabilities using the addition rule, the multiplication rule,
the binomial formula.
- Use the formula 1 - (1 - p)ⁿ to calculate the
probability of at least one success out of n Bernoulli trials.
- Compute the expected value and SE for the sum of random variables.
- Compute confidence intervals for the sum of Bernoulli random variables.
- Compute the confidence interval for μ in the ideal measurement model.
- Perform the 5 steps of a test of hypothesis, either by hand or using SPSS output:
- Write down the null and alternative hypotheses.
- Compute the test statistic, assuming that the null hypothesis is true.
- Write down a 95% confidence interval: [-1.96, 1.96] for a
z-test. For simplicity, we sometimes use [-2, 2] for a 95% confidence interval. Use the standard normal table to get confidence intervals of other sizes.
- Decide whether to accept or reject the null hypothesis.
- Compute the p-value. Accept the null hypothesis if p ≥ 0.05 and
reject the null hypothesis if p < 0.05.
- Compute confidence intervals using the normal table and using the
t-table. Look at the bottom of the t-table to get confidence intervals
easily.
- Perform these tests of hypothesis by hand: one sample z-test for μ,
one sample z-test for S, paired-sample z-test.
- Perform these tests by hand and/or using SPSS output: one sample t-test,
paired-sample t-test, chi-squared test for goodness of fit, chi-squared test for
independence.
- Use SPSS to obtain the p-value for each test.
- Compute a percentile of a dataset using the SPSS HAVERAGE method. Here is an example:
Problem: compute the 47th percentile for this dataset:
38 92 15 78 79 61
- Sort the dataset:
15 38 61 78 79 92
This dataset has observation numbers 1 through 6.
The number of observations is n = 6.
- Compute the weights: compute wi from the observation numbers as
wi = 100 · i / (n + 1)
Observation Number i: | 1 | 2 | 3 | 4 | 5 | 6 |
Weight wi: | 14.28571 | 28.57143 | 42.85714 | 57.14286 | 71.42857 | 85.71429 |
Observation xi: | 15 | 38 | 61 | 78 | 79 | 92 |
- Find the percentile with linear interpolation: Since
42.85714 < 47 < 57.14286, the 47th percentile x will be between 61 and 78.
We use linear interpolation like we did to find percentiles for a histogram:
(x - 61) / (78 - 61) = (47 - 42.85714) / (57.14286 - 42.85714)
(x - 61) / 17 = 0.29
(x - 61) = 4.93
x = 65.93
Now find the 47th percentile using SPSS to check your
answer:
Create a new dataset with a variable
x. Set the measure of x to scale and enter the six numbers into the
dataset.
Select Analyze >> Descriptive
Statistics >> Frequencies.
Drag the variable x
into the Variables(x) box.
Click the
Statistics button. In the Statistics Dialog, check the Percentiles checkbox,
enter 47 into the box after Percentiles, and click Add.
Click Continue, then click OK.
In the Statistics section of the output, SPSS reports:
Percentiles 47: 65.9300
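The HAVERAGE computation can also be scripted as a second check. Here is a small Python sketch (the function name is ours): it works with the 1-based position (n + 1) · p / 100, which is equivalent to interpolating between the weights wi = 100 · i / (n + 1) used above.

```python
def haverage_percentile(data, p):
    # SPSS HAVERAGE method: position (n + 1) * p / 100 (1-based),
    # then linear interpolation between adjacent order statistics.
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    i = int(pos)              # observation number just below the position
    frac = pos - i
    if i < 1:                 # position falls below the smallest observation
        return xs[0]
    if i >= n:                # position falls above the largest observation
        return xs[-1]
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

print(round(haverage_percentile([38, 92, 15, 78, 79, 61], 47), 4))  # 65.93
```

The result matches both the hand computation and the SPSS output.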
Explain
- Be able to explain in terms that someone not very familiar with
statistics will understand:
- What do the sample mean and SD tell you about a dataset?
- What does a histogram tell you, and what must you watch out for
if the bin widths are not all equal?
- What is the ideal measurement model?
- The original and current definitions of the meter, second, and kilogram.
- What is correlation and how does it relate to causation?
- Why is correlation not always the same as causation?
- What is a regression equation? What is required to have a good
linear regression model?
- How are probabilities determined?
- What is the Law of Large Numbers (Law of Averages) and how is it commonly misstated?
- Why is the Central Limit Theorem important in statistics?
- What does it mean for a result to be statistically significant?
Why is statistical significance not the same as importance?
- Explain what a test of hypothesis is and some things to watch out for
when using a test of hypothesis.
- What are the differences and similarities between the paired-sample
t-test and the independent-sample t-test?
- How are bootstrapped confidence intervals different than traditional confidence intervals?
- What is Simpson's paradox?
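Some of these explanations benefit from a tiny numeric demonstration. For example, Simpson's paradox can be reproduced in Python with hypothetical counts (modeled on the well-known kidney-stone example): treatment A has the higher success rate in each subgroup, yet the lower rate overall.

```python
# Hypothetical (successes, trials) counts per subgroup
a = {"small": (81, 87), "large": (192, 263)}   # treatment A
b = {"small": (234, 270), "large": (55, 80)}   # treatment B

def rate(successes, trials):
    return successes / trials

# A beats B within every subgroup...
for g in ("small", "large"):
    assert rate(*a[g]) > rate(*b[g])

# ...but pooling the subgroups reverses the comparison
a_pooled = (81 + 192, 87 + 263)    # (273, 350): rate 0.78
b_pooled = (234 + 55, 270 + 80)    # (289, 350): rate about 0.826
assert rate(*a_pooled) < rate(*b_pooled)
```

The reversal happens because the group sizes are unbalanced: A is applied mostly to the hard (large) cases, so pooling hides its subgroup advantage. This is the lurking-variable issue listed under Definitions.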