Final Exam Review Guide
Format of Exam Questions
- Short answer, multiple choice, short essay, problems, SPSS analysis
Symbols
- μ: Population mean
- σ: Population standard deviation
- σ²: Population variance
- x̄: Sample mean
- SD: Sample standard deviation, divide by n
- sx = SD+: Sample standard deviation, divide by n - 1
- Q0: Minimum value in sample
- Q1: 1st Quartile = 25th Percentile
- Q2: 2nd Quartile = 50th Percentile
- Q3: 3rd Quartile = 75th Percentile
- Q4: Maximum value in sample
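The two standard-deviation conventions above differ only in the divisor. A small Python sketch, with a made-up dataset for illustration, that computes both:

```python
import math

# Hypothetical dataset, for illustration only
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)
mean = sum(x) / n                       # sample mean (x-bar)
ss = sum((xi - mean) ** 2 for xi in x)  # sum of squared deviations

sd = math.sqrt(ss / n)              # SD: divide by n
sd_plus = math.sqrt(ss / (n - 1))   # SD+ (sx): divide by n - 1

print(mean, sd, round(sd_plus, 4))  # 5.0 2.0 2.1381
```

SD+ is always slightly larger than SD, and the gap shrinks as n grows.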
Formulas
- Interquartile Range: IQR = Q3 - Q1
- Inner fences for boxplot: Q1 - 1.5 × IQR; Q3 + 1.5 × IQR
- Outer fences for boxplot: Q1 - 3.0 × IQR; Q3 + 3.0 × IQR
- z-score for individual observations: z = (x - x̄) / SD+
- Standard error of the average: SEave = SD+ / √n
- z-score for sample average: z = (x̄ - μ) / SEave
- Ideal measurement model: xi = μ + ei
- Linear regression model: yi = axi + b + ei
- Estimated linear regression model: ŷi - ȳ = (r SDy / SDx)(xi - x̄)
- Root mean squared error for regression: RMSE = SDy √(1 - r²)
- Addition Rule: if A and B are disjoint events, P(A ∪ B) = P(A) + P(B).
- Multiplication Rule: if A and B are independent events, P(A ∩ B) = P(A)P(B)
- Probability of at least one success in n Bernoulli trials: 1 - (1 - p)ⁿ
- Expected Value of a random variable: E(x) = x1P(x1) + ... + xmP(xm)
- Theoretical Variance of a random variable: Var(x) = (x1 - E(x))² P(x1) + ... + (xm - E(x))² P(xm)
- Theoretical SD of a random variable: SD(x) = √Var(x)
- Expected Value of a Sum: E(S) = nE(x1)
- Theoretical Variance of a Sum, where the random variables x1, ... , xn
are independent: Var(S) = nVar(x1)
- Standard Error of a Sum, where the random variables x1, ... , xn
are independent: σS = σx √n
- Standard Error of an Average, where the random variables x1, ... , xn
are independent: σave = σx / √n
- Test Statistic for a z-test: z = (x̄ - μ) / SEave, SEave = SD+ / √n
- Test Statistic for a t-test: t = (x̄ - μ) / SEave, SEave = SD+ / √n
- Test Statistic for a Chi-squared test: χ² = (O1 - E1)² / E1 + ... + (Ok - Ek)² / Ek
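Several of the formulas above can be checked numerically. Here is a short Python sketch; all inputs are invented for illustration:

```python
import math

# Boxplot fences from the quartiles (made-up quartiles)
q1, q3 = 10.0, 18.0
iqr = q3 - q1                                # IQR = Q3 - Q1 = 8.0
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)     # inner fences: (-2.0, 30.0)
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)     # outer fences: (-14.0, 42.0)

# z-score for a sample average (made-up numbers)
xbar, mu, sd_plus, n = 103.0, 100.0, 15.0, 36
se_ave = sd_plus / math.sqrt(n)              # SE_ave = SD+ / sqrt(n) = 2.5
z = (xbar - mu) / se_ave                     # z = 1.2

# Expected value, theoretical variance and SD of a discrete rv
values, probs = [0, 1, 2], [0.25, 0.50, 0.25]
ev = sum(x * p for x, p in zip(values, probs))               # E(x) = 1.0
var = sum((x - ev) ** 2 * p for x, p in zip(values, probs))  # Var(x) = 0.5
sd = math.sqrt(var)                          # SD(x) = sqrt(0.5) ≈ 0.707

# Probability of at least one success in n Bernoulli trials
p, trials = 1 / 6, 4                         # e.g. four rolls of a die
at_least_one = 1 - (1 - p) ** trials         # ≈ 0.5177
```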
Persons
- Pascal, Graunt, de Moivre, Cotes, Gauss, Fisher, Tukey
Definitions
- Controlled experiment, double blind, randomization, observational
study, lurking variables (also called confounding factors),
univariate dataset, histogram, density histogram, bin, mean,
median, variance, parsimonious, stem-plot, boxplot, normal plot, mild outliers,
extreme outliers, normal histogram, ideal measurement model, bias,
center, spread, plot of xi vs. i (unbiased, biased, homoscedastic,
heteroscedastic, standard normal curve), critical point, inflection point,
standard units, standard error of the mean, normal score (Van der Waerden's
method), normal plot, bivariate dataset, bivariate normal, correlation,
R-squared value, causation, regression line, residual plot (residuals vs.
predicted values, unbiased, biased, homoscedastic, heteroscedastic), root mean
square error, probability, ways to obtain probabilities (theoretical,
frequentist, subjective), fair bet, mutually exclusive events, the addition
rule, independent events, the multiplication rule, n factorial (n!), 0! = 1,
binomial formula, random variable (rv), probability distribution, expected value
of a rv, theoretical SD of a random variable, expected value and theoretical SD
of a sum, sample mean, SD+, expected value, and SE for
average in ideal measurement model, expected value and SD for Bernoulli random
variables, law of averages (law of large numbers = LLN), central limit theorem =
CLT, normal approximation, confidence interval for p, confidence interval for μ,
the 5 steps to a test of hypothesis, null hypothesis, alternative hypothesis,
test statistic, one-sample z-test for average, z-test for a sum, one-sample t-test,
95% confidence
intervals using z and t tables, p-value, paired-sample z- and t-tests, independent two-sample z- and t-tests, importance vs.
significance, one-tailed tests, chi-squared test for
goodness-of-fit, chi-squared test for independence, bootstrapping, Simpson's
Paradox.
- Where appropriate, know both the defining formula and the
intuitive idea behind the concept.
Know How To
- Construct a stem-plot, also called a stem-and-leaf display.
- Determine the number or percentage of observations in an
interval of a histogram (assuming the data in each bin is distributed
uniformly).
- Draw a histogram with possibly unequal class widths.
- Compute the median, Q1, Q3, IQR, and mean of a histogram.
- Estimate the proportion of observations in an interval for a
histogram.
- Find the proportion of observations within a given interval of
a normal histogram using the standard normal table.
- Write down or discuss the ideal measurement model.
- Compute the standard error of the average and a 95% confidence
interval for the true measurement in the ideal measurement model.
- Draw the boxplot and use it to detect outliers.
- Given an x-value, x̄, and SD+, compute the z-score.
- Use the normal table to determine the proportion of observations in
a bin of the form [a, b], (-∞, a], or [a, ∞).
- Given a number p between 1 and 100, find the percentile for that
value, using a normal table: work backwards by looking up the proportion
in the body of the table to find the corresponding z-score, then use
x = μ + z · σ, if necessary.
- Use SPSS to compute proportions under the normal curve with Cdf.Normal.
- Use SPSS to compute percentiles with Idf.Normal.
- Find normal scores using Van der Waerden's method.
- Interpret a normal plot (normal, skewed to the left or right,
thin tails, thick tails).
- Discuss a plot of xi vs. i.
- Compute a regression equation using the formula
ŷ - ȳ = (r SDy / SDx)(x - x̄)
- Given a regression equation and x value find the predicted y value.
- Assess whether a regression model is adequate using residual and
normal plots.
- Interpret a residual plot (unbiased, biased, homoscedastic, heteroscedastic).
- Compute the root mean squared error using this formula:
RMSE = SDy √(1 - r²)
- Calculate probabilities using the addition rule, the multiplication rule,
the binomial formula.
- Use the formula 1 - (1 - p)ⁿ to calculate the
probability of at least one success out of n Bernoulli trials.
- Compute the expected value and SE for the sum of random variables.
- Compute confidence intervals for the sum of Bernoulli random variables.
- Compute the confidence interval for μ in the ideal measurement model.
- Perform the 5 steps of a test of hypothesis, either by hand or using SPSS output:
- Write down the null and alternative hypotheses.
- Compute the test statistic, assuming that the null hypothesis is true.
- Write down a 95% confidence interval: [-1.96, 1.96] for a
z-test. For simplicity, we sometimes use [-2, 2] for a 95% confidence interval. Use the standard normal table to get confidence intervals of other sizes.
- Decide whether to accept or reject the null hypothesis.
- Compute the p-value. Accept the null hypothesis if p ≥ 0.05 and
reject the null hypothesis if p < 0.05.
- Compute confidence intervals using the normal table and using the
t-table. Look at the bottom of the t-table to get confidence intervals
easily.
- Perform these tests of hypothesis by hand: one sample z-test for μ,
one sample z-test for S, paired-sample z-test.
- Perform these tests by hand and/or using SPSS output: one sample t-test,
paired-sample t-test, chi-squared test for goodness of fit, chi-squared test for
independence.
- Use SPSS to obtain the p-value for each test.
- Compute a percentile of a dataset using the SPSS HAVERAGE method. Here is an example:
Problem: compute the 47th percentile for this dataset:
38 92 15 78 79 61
- Sort the dataset:
15 38 61 78 79 92
This dataset has observation numbers 1 through 6.
The number of observations is n = 6.
- Compute the weights: compute wi from the observation numbers as
wi = 100 · i / (n + 1)
Observation Number i: | 1 | 2 | 3 | 4 | 5 | 6 |
Weight wi: | 14.28571 | 28.57143 | 42.85714 | 57.14286 | 71.42857 | 85.71429 |
Observation xi: | 15 | 38 | 61 | 78 | 79 | 92 |
- Find the percentile with linear interpolation: Since
42.85714 < 47 < 57.14286, the 47th percentile x will be between 61 and 78.
We use linear interpolation like we did to find percentiles for a histogram:
(x - 61) / (78 - 61) = (47 - 42.85714) / (57.14286 - 42.85714)
(x - 61) / 17 = 0.29
(x - 61) = 4.93
x = 65.93
Now find the 47th percentile using SPSS to check your
answer:
Create a new dataset with a variable
x. Set the measure of x to scale and enter the six numbers into the
dataset.
Select Analyze >> Descriptive
Statistics >> Frequencies.
Drag the variable x
into the Variables(x) box.
Click the
Statistics button. In the Statistics Dialog, check the Percentiles checkbox,
enter 47 into the box after Percentiles, and click Add.
Click Continue, then click OK.
In the Statistics section of the output, SPSS reports:
Percentiles 47: 65.9300
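The HAVERAGE computation can also be scripted as a second check. Here is a small Python sketch (the function name is ours): it works with the 1-based position (n + 1) · p / 100, which is equivalent to interpolating between the weights wi = 100 · i / (n + 1) used above.

```python
def haverage_percentile(data, p):
    # SPSS HAVERAGE method: position (n + 1) * p / 100 (1-based),
    # then linear interpolation between adjacent order statistics.
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p / 100
    i = int(pos)              # observation number just below the position
    frac = pos - i
    if i < 1:                 # position falls below the smallest observation
        return xs[0]
    if i >= n:                # position falls above the largest observation
        return xs[-1]
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

print(round(haverage_percentile([38, 92, 15, 78, 79, 61], 47), 4))  # 65.93
```

The result matches both the hand computation and the SPSS output.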
Explain
- Be able to explain in terms that someone not very familiar with
statistics will understand:
- What do the sample mean and SD tell you about a dataset?
- What does a histogram tell you, and what must you watch out for
if the bin widths are not all equal?
- What is the ideal measurement model?
- The original and current definitions of the meter, second, and kilogram.
- What is correlation and how does it relate to causation?
- Why is correlation not always the same as causation?
- What is a regression equation? What is required to have a good
linear regression model?
- How are probabilities determined?
- What is the Law of Large Numbers (Law of Averages) and how is it commonly misstated?
- Why is the Central Limit Theorem important in statistics?
- What does it mean for a result to be statistically significant?
Why is statistical significance not the same as importance?
- Explain what a test of hypothesis is and some things to watch out for
when using a test of hypothesis.
- What are the differences and similarities between the paired-sample
t-test and the independent-sample t-test?
- How are bootstrapped confidence intervals different than traditional confidence intervals?
- What is Simpson's paradox?
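Some of these explanations benefit from a tiny numeric demonstration. For example, Simpson's paradox can be reproduced in Python with hypothetical counts (modeled on the well-known kidney-stone example): treatment A has the higher success rate in each subgroup, yet the lower rate overall.

```python
# Hypothetical (successes, trials) counts per subgroup
a = {"small": (81, 87), "large": (192, 263)}   # treatment A
b = {"small": (234, 270), "large": (55, 80)}   # treatment B

def rate(successes, trials):
    return successes / trials

# A beats B within every subgroup...
for g in ("small", "large"):
    assert rate(*a[g]) > rate(*b[g])

# ...but pooling the subgroups reverses the comparison
a_pooled = (81 + 192, 87 + 263)    # (273, 350): rate 0.78
b_pooled = (234 + 55, 270 + 80)    # (289, 350): rate about 0.826
assert rate(*a_pooled) < rate(*b_pooled)
```

The reversal happens because the group sizes are unbalanced: A is applied mostly to the hard (large) cases, so pooling hides its subgroup advantage. This is the lurking-variable issue listed under Definitions.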