8/7/17 Notes

CSC 423 -- 8/7/17

Review Questions

What do the terms inner fence and outer fence mean for a boxplot?
Ans: The inner fences are the locations 1.5 IQRs below Q1 and 1.5 IQRs above Q3. The outer fences are the locations 3.0 IQRs below Q1 and 3.0 IQRs above Q3. Points outside of the outer fences are called extreme outliers; points between an inner fence and an outer fence are called mild outliers.
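The fence definitions can be checked with a short R sketch. The data vector below is made up for illustration, and the quartiles come from R's default quantile() method, which may differ slightly from the quartiles computed by other software.

```r
# Hypothetical data, chosen so that one point is a mild outlier
# and one point is an extreme outlier.
x <- c(2, 4, 5, 6, 7, 8, 9, 10, 12, 22, 60)

# Quartiles from R's default quantile method (type 7).
q1 <- quantile(x, 0.25, names = FALSE)
q3 <- quantile(x, 0.75, names = FALSE)
iqr <- q3 - q1

# Inner fences: 1.5 IQRs beyond the quartiles.
inner <- c(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
# Outer fences: 3.0 IQRs beyond the quartiles.
outer <- c(q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Classify each observation relative to the fences.
status <- ifelse(x < outer[1] | x > outer[2], "extreme",
          ifelse(x < inner[1] | x > inner[2], "mild", "none"))

data.frame(x, status)
```

For this vector Q1 = 5.5 and Q3 = 11, so the inner fences are -2.75 and 19.25 and the outer fences are -11 and 27.5.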
What do the expressions E(ε_i) = 0 and Var(ε_i) = σ² mean?
Ans: They mean that the residuals are unbiased and homoscedastic, respectively.
What are some other names for a regression equation?
Ans: Least squares estimator and, for a simple linear regression equation, the line of averages.
Compute the normal scores for a dataset when n = 9.
Ans: Look up the areas inside the normal table to find the corresponding z-scores. You can also accomplish this with R or SAS scripts.
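A minimal R sketch of the lookup follows. It assumes the areas are i / (n + 1) for i = 1, ..., n, which is one common convention; the notes' own areas are not shown here, and other conventions (such as the one used by R's ppoints) give slightly different scores.

```r
n <- 9

# Assumed plotting positions: i / (n + 1), giving 0.1, 0.2, ..., 0.9.
areas <- (1:n) / (n + 1)

# Normal scores: the z-scores whose cumulative areas equal the positions.
z <- qnorm(areas)

round(z, 2)
# -1.28 -0.84 -0.52 -0.25  0.00  0.25  0.52  0.84  1.28
```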
What is the hat matrix? What does it have to do with influence points?
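Ans: The hat matrix is H = X(XᵀX)⁻¹Xᵀ, where X is the matrix of independent variable values including the column of ones. It converts the observed values into the predicted values: ŷ = Hy. Its diagonal entries h_ii are the leverages; an observation with a large leverage is a potential influence point.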
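As an illustration, the hat matrix H = X(XᵀX)⁻¹Xᵀ can be computed directly in R and compared with the built-in hatvalues(). This sketch uses R's built-in cars dataset as a stand-in, not one of the course examples.

```r
# Simple linear regression on a built-in dataset (stand-in example).
fit <- lm(dist ~ speed, data = cars)

# Design matrix, including the column of ones for the intercept.
X <- model.matrix(fit)

# Hat matrix: H = X (X'X)^(-1) X'.
H <- X %*% solve(t(X) %*% X) %*% t(X)

# H "puts the hat on y": predicted values are H times the observed values.
y_hat <- as.vector(H %*% cars$dist)

# The diagonal of H holds the leverages, the same values hatvalues() returns.
leverage <- diag(H)

# Observations with the largest leverage are candidates for influence points.
head(sort(leverage, decreasing = TRUE))
```

The leverages always sum to the number of regression coefficients (here 2), so a leverage well above the average of 2 / n deserves a closer look.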
What does AIC mean?
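Ans: It is Akaike's Information Criterion, defined by AIC = 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood,
or, for a least squares regression with n observations, AIC = n log(SSE / n) + 2k, which agrees with the first form up to a constant that does not affect comparisons between models fit to the same data.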
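The two common forms of AIC can be compared in R. This is a sketch on the built-in cars dataset, not a course example; it assumes k counts the regression coefficients plus the error variance, which is how R's AIC() counts parameters for lm fits.

```r
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

# Number of estimated parameters: regression coefficients plus sigma^2.
k <- length(coef(fit)) + 1

# Likelihood form: AIC = 2k - 2 log(L).
aic_loglik <- 2 * k - 2 * as.numeric(logLik(fit))

# Least squares form: AIC = n log(SSE / n) + 2k.
sse <- sum(residuals(fit)^2)
aic_sse <- n * log(sse / n) + 2 * k

# The likelihood form matches R's built-in AIC().
c(manual = aic_loglik, builtin = AIC(fit))

# The two forms differ only by a constant that depends on n, not on the model.
aic_loglik - aic_sse
n * (log(2 * pi) + 1)
```

Because the difference depends only on n, both forms rank models fit to the same data in the same order.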
What is a quadratic regression model?
Ans: It is a regression model that includes a second-degree term x_i² in addition to the first-degree term x_i.
For the Pendul Example, why is a square root transform better than adding a quadratic term?
Ans: Because the predicted values of the quadratic model would form a parabola, which would eventually descend after increasing. A pendulum does not do this: the period of the pendulum continues to increase as its length increases. This is still true when using the square root transformation.
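The difference between the two models can be sketched in R. The data below are idealized values generated from the small-angle physics formula period = 2π sqrt(length / g); they are an assumption for illustration and are not the Pendul Example data.

```r
# Idealized pendulum data (assumption: small-angle formula, g = 9.81 m/s^2).
g <- 9.81
len <- seq(0.1, 2.0, by = 0.1)          # lengths in meters
period <- 2 * pi * sqrt(len / g)        # periods in seconds

# Square root transform of the independent variable.
fit_sqrt <- lm(period ~ sqrt(len))

# Quadratic model in the untransformed length.
fit_quad <- lm(period ~ len + I(len^2))
b <- coef(fit_quad)

# The quadratic coefficient is negative, so the fitted parabola opens downward
# and its predictions start to decrease for lengths beyond the vertex.
vertex <- -b[["len"]] / (2 * b[["I(len^2)"]])

coef(fit_sqrt)
b
vertex
```

Predictions from the square root model keep increasing with length, while the quadratic model's predictions turn downward once the length passes the vertex.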
For the BodyBrain Example, use the log-log model to find the predicted brain weight if the body weight is 100 kg. Here is the predicted value for log_brain:
Ans: Find the predicted value using the log-log model:
Then take the exponential to find the actual brain weight: brain = exp(log_brain) = exp(4.73) ≈ 113.

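What is the Bernoulli distribution? Use R to generate 100 random Bernoulli outcomes with p = 0.5.
Ans: A Bernoulli random variable takes the value 1 (success) with probability p and the value 0 (failure) with probability 1 - p. It is a binomial random variable with size = 1, so rbinom can generate the outcomes: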

rbinom(n=100, size=1, prob=0.5)
  [1] 1 1 1 0 0 1 0 1 0 1 1 0 0 1 1 1 0 1 1 1 1 1 1 0 1
 [26] 1 0 1 1 0 0 0 1 0 1 1 1 0 1 1 0 0 0 1 1 0 1 1 1 0
 [51] 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 1
 [76] 0 0 1 0 1 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 1 1

A maximum likelihood estimator for a parameter is the value of the parameter that maximizes the probability (or the log of the probability) that the obtained sample occurs. The Bernoulli probability function is
Prob(x) = p^x (1 - p)^(1 - x), for x = 0 or 1.
The joint density for a sample of n independent Bernoulli random variables is
Prob(x₁, ... , x_n) = p^(x₁) (1 - p)^(1 - x₁) ⋯ p^(x_n) (1 - p)^(1 - x_n) = p^S (1 - p)^(n - S), where S = x₁ + ... + x_n.
Find the value of p that maximizes log Prob(x₁, ... , x_n) by differentiating the following expression with respect to p, setting the result equal to zero, and solving for p:
log Prob(x₁, ... , x_n) = S log(p) + (n - S) log(1 - p)
Ans: Recall that the derivative of log(p) with respect to p is 1 / p. Also, S is the number of successes. We set the derivative of the preceding expression with respect to p to zero:
S / p - (n - S) / (1 - p) = 0
Solving for p gives S(1 - p) = (n - S)p, so p = S / n,
which is the usual way that we estimate the true probability of success p.
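The result can also be checked numerically. This sketch simulates a Bernoulli sample, maximizes the log-likelihood S log(p) + (n - S) log(1 - p) with R's optimize(), and compares the maximizer with the sample proportion. The seed and the search interval are arbitrary choices for illustration.

```r
set.seed(423)  # arbitrary seed so the sample is reproducible

# Simulated Bernoulli sample with true p = 0.5.
x <- rbinom(n = 100, size = 1, prob = 0.5)
n <- length(x)
S <- sum(x)    # number of successes

# Log-likelihood of the sample as a function of p.
loglik <- function(p) S * log(p) + (n - S) * log(1 - p)

# Numerical maximization over an interval that avoids log(0).
opt <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)

# The numerical maximizer agrees with S / n up to the optimizer's tolerance.
c(numerical = opt$maximum, closed_form = S / n)
```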
How does generalized regression differ from ordinary regression?
Ans: Ordinary regression assumes that each y_i is a random variable ∼ N(E(y_i), σ²). With generalized regression, each dependent variable value follows a distribution that might not be normal (for example, Bernoulli or Poisson). Generalized regression also uses a link function, which is the identity function for ordinary regression.
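In R, ordinary regression is the special case of glm() with a normal (gaussian) family and the identity link. A minimal sketch on the built-in cars dataset, which is a stand-in and not a course example:

```r
# Ordinary least squares regression.
fit_lm <- lm(dist ~ speed, data = cars)

# The same model as a generalized regression:
# normal distribution for y, identity link function.
fit_glm <- glm(dist ~ speed, data = cars,
               family = gaussian(link = "identity"))

# Both fits produce the same coefficient estimates.
rbind(lm = coef(fit_lm), glm = coef(fit_glm))
```

Changing the family argument (for example, to binomial or poisson) changes the assumed distribution of y and, by default, the link function.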
What is a link function?
Ans: It is a function that ties the linear combination of the independent variables to the expected value of the dependent variable. Logistic regression uses the logit function: logit(s) = log(s / (1 - s)).
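The logit link can be sketched in R as follows. The helper names logit and inv_logit are introduced here for illustration (R's built-in equivalents are qlogis and plogis), and the logistic regression uses the built-in mtcars dataset as a stand-in example.

```r
# Logit link and its inverse (illustrative helper names).
logit <- function(s) log(s / (1 - s))
inv_logit <- function(t) 1 / (1 + exp(-t))

s <- c(0.1, 0.5, 0.9)
logit(s)             # same values as qlogis(s)
inv_logit(logit(s))  # recovers the original probabilities

# Logistic regression: Bernoulli response (am is 0 or 1), logit link.
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Linear predictor b0 + b1 * wt on the logit scale.
eta <- predict(fit, type = "link")

# Predicted probabilities on the original scale.
prob <- predict(fit, type = "response")

# The link function ties the two together: logit(prob) equals eta.
head(cbind(eta, logit_of_prob = logit(prob)))
```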

Projects

Discuss Project 3.
Discuss Project 4.
Look at the Surface Example to help you with Project 5.
To complete Project 5, use the Discrim Example to get you started.

Topics in Categorical Data Analysis

Discuss sections 1 (Bernoulli distribution only), 2, 4 (know what a maximum likelihood estimator is), 5, 8, 9 (Goodness-of-fit test only), 11, 13.

Review for Final Exam

Look at the review materials on the ExamInfo Page. In particular, look at these practice exams:
The answers are on the ExamInfo Page or in the files themselves.