Final Exam Review Guide
Format of Exam Questions
- Short answer, multiple choice, short essay, problems, interpret R output.
Symbols
- xi: Value of the ith observation in a univariate dataset
- μ: Population mean
- σ: Population standard deviation
- σ²: Population variance
- x̄: Sample mean
- SD: Sample standard deviation, divide by n
- sx or SD+: Sample standard deviation, divide by n - 1
- Q0, Q1, Q2, Q3, Q4: Sample quartiles
- zx, zx̄, zS: z-scores
- xi: Value of ith independent variable of bivariate dataset
- yi: Value of ith dependent variable of bivariate dataset
- r: Correlation of x and y in a bivariate dataset
- a: Slope of true regression line
- â: Slope of estimated regression line
- b: Intercept of true regression line
- b̂: Intercept of estimated regression line
- RMSE: Root mean square error for a regression model
- H0: The null hypothesis
- H1: The alternative hypothesis
- α: Level of a z-test or t-test; common levels are α = 0.05 and α = 0.01
Formulas
- Interquartile Range: IQR = Q3 - Q1
- Inner fences for boxplot: Q1 - 1.5 × IQR; Q3 + 1.5 × IQR
- Outer fences for boxplot: Q1 - 3.0 × IQR; Q3 + 3.0 × IQR
- z-score for individual observations: z = (x - x̄) / SD+
- Standard error of the average: SEave = SD+ / √n
- z-score for sample average: z = (x̄ - μ) / SEave
- Ideal measurement model: xi = μ + ei
- Linear regression model: yi = axi + b + ei
- Estimated linear regression model: ŷi - ȳ = (r SDy / SDx)(xi - x̄)
- Root mean squared error for regression: RMSE = SDy √(1 - r²)
- Addition Rule: if A and B are disjoint events, P(A ∪ B) = P(A) + P(B).
- Multiplication Rule: if A and B are independent events, P(A ∩ B) = P(A)P(B)
- Probability of at Least One Success in n independent Bernoulli trials: 1 - (1 - p)^n
- Expected Value of a random variable: E(x) = x1P(x1) + ... + xmP(xm)
- Theoretical Variance of a random variable: Var(x) = (x1 - E(x))² P(x1) + ... + (xm - E(x))² P(xm)
- Theoretical SD of a Random Variable: SD(x) = √Var(x)
- Expected Value of Bernoulli Random Variable: E(X) = p
- Variance of Bernoulli Random Variable: Var(X) = p(1 - p)
- Expected Value of a Sum S = x1 + ... + xn: E(S) = nE(x1)
- Variance of a Sum, where the random variables x1, ... , xn
are independent: Var(S) = nVar(x1)
- Standard Error of a Sum, where the random variables x1, ... , xn
are independent: σS = √n σx
- Standard Error of an Average, where the random variables x1, ... , xn
are independent: σave = σx / √n
- Test Statistic for a z-test: z = (x̄ - μ) / SEave,
where SEave = SD+ / √n
- Number of ways of choosing k items from n items: nCk = n! / (k! (n - k)!)
- Probability of k out of n successes, where P(success) = p: nCk p^k (1 - p)^(n-k) (see the R sketch after this list)
- Expected Value of Binomial Random Variable: E(X) = np
- Variance of Binomial Random Variable: Var(X) = np(1 - p)
- Test Statistic for a t-test: t = (x̄ - μ) / SEave,
where SEave = SD+ / √n
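The probability formulas above can be checked numerically in R. Below is a minimal sketch, assuming made-up values of n, k, and p (all hypothetical):

    # Hypothetical values: k = 3 successes out of n = 10 trials, P(success) = 0.4
    n <- 10; k <- 3; p <- 0.4

    # Number of ways of choosing k items from n items: nCk = n!/(k!(n - k)!)
    choose(n, k)                        # same as factorial(n) / (factorial(k) * factorial(n - k))

    # Probability of k out of n successes: nCk p^k (1 - p)^(n - k)
    choose(n, k) * p^k * (1 - p)^(n - k)
    dbinom(k, size = n, prob = p)       # should agree with the line above

    # Probability of at least one success: 1 - (1 - p)^n
    1 - (1 - p)^n

    # Expected value and variance of a binomial random variable: np and np(1 - p)
    n * p
    n * p * (1 - p)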
Persons
- Blaise Pascal, Abraham de Moivre, Jakob Bernoulli, Karl Gauss, Ronald Fisher,
John Tukey, Alexandr Lyapunov, William Gosset (Student)
Definitions
- Controlled experiment, double blind, randomization, observational
study, lurking variables (also called confounding factors),
univariate dataset, histogram, density histogram, bin, mean,
median, variance, parsimonious, stem-plot, boxplot, normal plot, mild outliers,
extreme outliers, normal histogram, ideal measurement model, bias,
center, spread, plot of xi vs. i (unbiased, biased, homoscedastic,
heteroscedastic, standard normal curve), critical point, inflection point,
standard error of the mean, normal scores, normal plot, bivariate dataset,
bivariate normal, correlation, R-squared value, causation, regression line,
residual plot (residuals vs. predicted values, unbiased, biased, homoscedastic,
heteroscedastic), root mean square error, probability, ways to obtain
probabilities (theoretical, frequentist, subjective), fair bet, mutually
exclusive events, the addition rule, independent events, the multiplication
rule, n factorial (n!), 0! = 1, binomial formula, random variable (RV),
probability distribution, expected value of an RV, theoretical SD of an RV, expected value and theoretical SD of a sum, sample mean, SD+, expected
value, and SE for average in ideal measurement model, expected value and SD for
Bernoulli random variables, law of averages (law of large numbers = LLN),
central limit theorem = CLT, normal approximation of the binomial distribution, confidence interval for p,
confidence interval for μ, the 5 steps to a test of hypothesis, null hypothesis,
alternative hypothesis, test statistic, one-sample z-test for average, z-test
for a sum, one-sample t-test, 95% confidence intervals using z and t tables,
p-value, paired sample z- and t-tests, importance vs. significance.
Know How To
- Find the proportion of observations within a given interval of
a normal histogram using the standard normal table.
- Write down and/or discuss the ideal measurement model.
- Compute the standard error of the average and a 95% confidence
interval for the true measurement in the ideal measurement model.
- Draw the boxplot or interpret an R boxplot. Use it to detect outliers.
- Given an x-value, x̄, and SDx, compute the z-score.
- Use the normal table to determine the proportion of observations in
a bin of the form [a, b], (-∞, a], or [a, ∞).
- Given a number p between 1 and 100, find the pth percentile using
a normal table: work backwards by looking up the proportion p/100
in the body of the table to find the corresponding z-score, then use
x = μ + z σ, if necessary.
- Use R to compute areas under the normal curve with pnorm.
- Use R to compute percentiles of the normal curve with qnorm.
- Use R to generate normally distributed random outcomes with rnorm (see the R sketch after this list).
- Find normal scores by hand and using qqnorm.
- Interpret a normal plot (normal, skewed to the left or right,
thin tails, thick tails).
- Compute a regression equation using the formula
ŷ - ȳ = (r SDy / SDx)(x - x̄)
- Given a regression equation and an x value, find the predicted y value.
- Assess whether a regression model is adequate using residual and
normal plots.
- Interpret a residual plot (unbiased, biased, homoscedastic, heteroscedastic).
- Compute the root mean squared error using this formula:
RMSE = SDy √(1 - r²)
- Calculate probabilities using the addition rule, the multiplication rule,
and the binomial formula.
- Use the formula 1 - (1 - p)^n to calculate the
probability of at least one success out of n Bernoulli trials.
- Compute the expected value and SE for the sum of random variables.
- Compute confidence intervals for the sum of Bernoulli random variables.
- Compute the confidence interval for μ in the ideal measurement model.
- Perform the 5 steps of a test of hypothesis, either by hand or using R output:
- Write down the null and alternative hypotheses for a z-test or t-test.
- Compute the test statistic, assuming that the null hypothesis is true.
- Write down a 95% confidence interval: [-1.96, 1.96] for a
z-test. For simplicity, sometimes we use [-2, 2] for a 95% confidence interval.
Use the standard normal table to get confidence intervals at other levels.
- Decide whether to accept or reject the null hypothesis.
- Compute the p-value. Accept the null hypothesis if p ≥ 0.05 and
reject the null hypothesis if p < 0.05.
- Compute confidence intervals using the normal table and using the
t-table. Look at the bottom of the t-table to get confidence intervals
easily.
- Perform these tests of hypothesis by hand: one sample z-test for μ,
one sample z-test for S, paired-sample z-test.
- Perform these tests by hand and/or using R output: one sample t-test,
paired-sample t-test.
- Use R to obtain the p-value for a z-test or t-test.
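The R-based items above can be practiced together. Here is a minimal sketch, assuming a small made-up sample and a hypothesized value μ = 10 (both hypothetical):

    # Hypothetical sample of n = 8 measurements
    x <- c(9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4)
    n <- length(x)

    # Areas and percentiles under the standard normal curve
    pnorm(1.96)                  # proportion of the area below z = 1.96
    pnorm(1) - pnorm(-1)         # proportion in the bin [-1, 1]
    qnorm(0.975)                 # z-score with 97.5% of the area below it
    rnorm(5)                     # five random draws from the standard normal
    qqnorm(x)                    # normal plot of the sample

    # Standard error of the average and a 95% confidence interval for mu
    se <- sd(x) / sqrt(n)        # sd() divides by n - 1, i.e. SD+
    mean(x) + c(-1.96, 1.96) * se

    # Test statistic and p-value for a one-sample z-test of H0: mu = 10
    z <- (mean(x) - 10) / se
    2 * pnorm(-abs(z))           # two-sided p-value

    # One-sample t-test of H0: mu = 10 (test statistic, p-value, and CI from R)
    t.test(x, mu = 10)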
R Functions
c sum mean sd cor plot boxplot
dnorm pnorm qnorm qqnorm read.csv
data.frame lm summary resid predict
dbinom pbinom qbinom rbinom t.test
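As one way to see several of these functions in context, here is a minimal regression sketch on a small made-up bivariate dataset (in practice the data frame might come from read.csv; the values below are hypothetical):

    # Hypothetical bivariate data
    dat <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                      y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))

    mean(dat$x); sd(dat$x)            # sample mean and SD+ of x
    cor(dat$x, dat$y)                 # correlation r
    plot(dat$x, dat$y)                # scatterplot of y vs. x
    boxplot(dat$y)                    # boxplot of y

    fit <- lm(y ~ x, data = dat)      # estimated regression line
    summary(fit)                      # slope, intercept, R-squared
    plot(predict(fit), resid(fit))    # residual plot: residuals vs. predicted values
    predict(fit, data.frame(x = 3.5)) # predicted y for a new x value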
Explain
- Be able to explain in terms that someone not very familiar with
statistics will understand:
- What do the sample mean and SD tell you about a dataset?
- What does a histogram tell you, and what must you watch out for
if the bin widths are not all equal?
- What is the ideal measurement model?
- What is correlation and how does it relate to causation?
- Why is correlation not always the same as causation?
- What is a regression equation? What is required to have a good
linear regression model?
- How are probabilities determined?
- What is the Law of Large Numbers (Law of Averages) and how is it commonly misstated?
- Why is the Central Limit Theorem important in statistics?
- What does it mean for a result to be statistically significant?
Why is statistical significance not the same as importance?
- Explain what a test of hypothesis is and some things to watch out for
when using one.
- What is a paired-sample t-test?