CSC 423 -- 8/7/17

Review Questions

  1. What do the terms inner fence and outer fence mean for a boxplot?
    Ans: The inner fences are the locations 1.5 IQRs below Q1 and 1.5 IQRs above Q3; the outer fences are the locations 3.0 IQRs below Q1 and 3.0 IQRs above Q3. Points outside the outer fences are called extreme outliers; points between an inner fence and an outer fence are called mild outliers.
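
The fence definitions can be sketched in R as follows. The data vector is hypothetical, and the quartiles come from R's default quantile() method, which may differ slightly from a hand-computed textbook quartile.

```r
# Hypothetical sample, for illustration only
x <- c(2, 4, 5, 7, 8, 9, 11, 12, 14, 25, 40)

q1  <- unname(quantile(x, 0.25))   # first quartile (R's default method)
q3  <- unname(quantile(x, 0.75))   # third quartile
iqr <- q3 - q1                     # interquartile range

# Inner fences: 1.5 IQRs beyond the quartiles
inner_lo <- q1 - 1.5 * iqr
inner_hi <- q3 + 1.5 * iqr

# Outer fences: 3.0 IQRs beyond the quartiles
outer_lo <- q1 - 3.0 * iqr
outer_hi <- q3 + 3.0 * iqr

# Extreme outliers lie outside the outer fences
extreme <- x[x < outer_lo | x > outer_hi]

# Mild outliers lie between an inner fence and an outer fence
mild <- x[(x < inner_lo & x >= outer_lo) | (x > inner_hi & x <= outer_hi)]

print(extreme)
print(mild)
```

For this sample Q1 = 6 and Q3 = 13, so the inner fences are -4.5 and 23.5 and the outer fences are -15 and 34; the value 25 is a mild outlier and 40 is an extreme outlier.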
     
  2. What do these expressions mean?
     
    E(εi) = 0 and Var(εi) = σ^2

    Ans: They mean that the residuals are unbiased and homoscedastic, respectively.
     
  3. What are some other names for a regression equation?
    Ans: Least squares estimator and, for a simple linear regression equation, the line of averages.
     
  4. Compute the normal scores for a dataset when n = 9.
    Ans: Look up the areas
     

    inside the normal table to find these corresponding z-scores:
     

    You can also accomplish this with these R and SAS scripts:
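
A minimal R sketch of this computation. It assumes the areas are taken as i / (n + 1) for i = 1, ..., n; that plotting position is an assumption, and other conventions exist (for example, the one used by R's ppoints function).

```r
n <- 9

# Assumed plotting positions: 0.1, 0.2, ..., 0.9
areas <- (1:n) / (n + 1)

# z-scores whose lower-tail normal area equals each entry of areas
z <- qnorm(areas)

print(round(z, 2))
# approximately -1.28 -0.84 -0.52 -0.25 0.00 0.25 0.52 0.84 1.28
```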
  5. What is the hat matrix? What does it have to do with influence points?
    Ans:
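
In the standard definition, the hat matrix is H = X (X'X)^(-1) X', which maps the observed responses to the fitted values (yhat = H y). Its diagonal entries are the leverages, and an observation with high leverage has the potential to be an influence point. The sketch below uses hypothetical data, and the 2p / n cutoff is a common rule of thumb, assumed here rather than taken from these notes.

```r
set.seed(1)

# Hypothetical data: the last x value lies far from the others
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
y <- 2 + 0.5 * x + rnorm(length(x))

fit <- lm(y ~ x)
X <- model.matrix(fit)                   # design matrix with intercept column

# Hat matrix computed directly from its definition
H <- X %*% solve(t(X) %*% X) %*% t(X)
lev <- unname(diag(H))                   # leverages are the diagonal of H

# The built-in hatvalues() function should agree with the direct computation
print(all.equal(lev, unname(hatvalues(fit))))

# Rule of thumb (an assumption): flag leverages above 2 * p / n
p <- ncol(X)
n <- nrow(X)
high_leverage <- which(lev > 2 * p / n)
print(high_leverage)
```

Only the tenth observation is flagged here, because its x value is far from the mean of the other x values.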
     
  6. What does AIC mean?
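    Ans: It is Akaike's Information Criterion, defined by AIC = 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood,
    or, for least squares regression with normal errors, by AIC = n log(SSE / n) + 2k, which differs from the first form only by a constant that is the same for every model fit to the same n observations.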
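
As a rough check of the likelihood-based form AIC = 2k - 2 log(L), the sketch below compares a hand computation with R's AIC() function. It uses R's built-in cars dataset purely as an example; the choice of dataset and model is an assumption for illustration.

```r
# Simple linear regression on a built-in dataset
fit <- lm(dist ~ speed, data = cars)

ll <- logLik(fit)            # maximized log-likelihood
k  <- attr(ll, "df")         # estimated parameters: 2 coefficients + error variance

aic_manual <- 2 * k - 2 * as.numeric(ll)

print(c(manual = aic_manual, builtin = AIC(fit)))
```

Models with smaller AIC are preferred, since the 2k term penalizes extra parameters.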
  7. What is a quadratic regression model?
    Ans: It is a model that includes a second degree term xi^2.
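
A minimal sketch of fitting such a model in R. The data are simulated and hypothetical; I(x^2) tells lm to treat x^2 as an arithmetic term rather than formula syntax.

```r
set.seed(42)

# Hypothetical data generated from a known quadratic plus small noise
x <- 1:10
y <- 3 + 2 * x - 0.5 * x^2 + rnorm(length(x), sd = 0.1)

# Quadratic regression: intercept, linear term, and second degree term
fit <- lm(y ~ x + I(x^2))

print(coef(fit))
```

The estimated coefficients should land close to the generating values 3, 2, and -0.5.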
     
  8. For the Pendul Example, why is a square root transform better than adding a quadratic term?
    Ans: Because the predicted values of the quadratic model would form a parabola, which would eventually descend after increasing. A pendulum does not do this: the period of the pendulum continues to increase as its length increases. This is still true when using the square root transformation.
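
The argument can be illustrated with simulated data. The sketch below does not use the course's Pendul dataset; it assumes ideal pendulum periods from the physics formula T = 2 pi sqrt(L / g), with hypothetical lengths in meters.

```r
g <- 9.81
len <- seq(0.1, 2.0, by = 0.1)         # hypothetical lengths (m)
period <- 2 * pi * sqrt(len / g)       # ideal pendulum periods (s)

fit_sqrt <- lm(period ~ sqrt(len))         # square root transform
fit_quad <- lm(period ~ len + I(len^2))    # quadratic model

# The fitted parabola opens downward, so it peaks at its vertex
b <- coef(fit_quad)
vertex <- -b[["len"]] / (2 * b[["I(len^2)"]])

# Predictions at the vertex and one meter beyond it
new_len <- data.frame(len = c(vertex, vertex + 1))
print(predict(fit_quad, newdata = new_len))   # second value is smaller
print(predict(fit_sqrt, newdata = new_len))   # second value is larger
```

Past the vertex the quadratic model predicts shorter periods for longer pendulums, while the square root model keeps increasing.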
     
  9. For the BodyBrain Example, use the log-log model to find the predicted brain weight if the body weight is 100 kg. Here is the predicted value for log_brain:
    Ans: Find the predicted value using the log-log model:
    Then take the exponential to find the actual brain weight: brain = exp(log(brain)) = exp(4.73) = 113.
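
The back-transformation step can be checked in R. Only the value 4.73 comes from the notes; the variable names are hypothetical. That exp(4.73) is about 113 indicates the model uses natural logarithms.

```r
log_brain_hat <- 4.73              # predicted log brain weight from the notes
brain_hat <- exp(log_brain_hat)    # undo the natural log transform

print(round(brain_hat))
# 113
```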
     
  10. What is the Bernoulli distribution? Use R to generate 100 random Bernoulli outcomes with p = 0.5.
    Ans:
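
A Bernoulli random variable takes the value 1 (success) with probability p and 0 (failure) with probability 1 - p. One way to generate such outcomes in R is rbinom with size = 1, since a binomial distribution with a single trial is a Bernoulli distribution. The seed below is an arbitrary choice, used only to make the run reproducible.

```r
set.seed(423)                                  # arbitrary seed for reproducibility

# 100 Bernoulli outcomes with success probability 0.5
outcomes <- rbinom(100, size = 1, prob = 0.5)

print(table(outcomes))    # counts of 0s and 1s
print(mean(outcomes))     # sample proportion of successes
```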
  11. A maximum likelihood estimator for a parameter is the value of the parameter that maximizes the probability (or the log of the probability) that the obtained sample occurs. Since the Bernoulli probability function is
     
    Prob(x) = p^x (1 - p)^(1 - x), for x = 0 or 1,

    the joint probability function for a sample of n independent Bernoulli random variables is
     
    Prob(x1, ... , xn) = p^(x1 + ... + xn) (1 - p)^(n - (x1 + ... + xn)).

    Find the value of p that maximizes log Prob(x1, ... , xn) by differentiating the following expression with respect to p, setting the result equal to zero, and solving for p:
     
    log Prob(x1, ... , xn) = (x1 + ... + xn) log(p) + (n - (x1 + ... + xn)) log(1 - p)

    Ans: Recall that the derivative of log(p) with respect to p is 1 / p. Also, let S = x1 + ... + xn be the number of successes. We set the derivative of the preceding expression with respect to p to zero:
     
    S / p - (n - S) / (1 - p) = 0, so S (1 - p) = (n - S) p, and therefore p = S / n,

    which is the usual way that we estimate the true probability of success p.
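
The closed-form result p = S / n can be checked numerically by maximizing the Bernoulli log-likelihood S log(p) + (n - S) log(1 - p) directly. The sample below is simulated and hypothetical.

```r
set.seed(7)

# Hypothetical sample of 50 Bernoulli outcomes
x <- rbinom(50, size = 1, prob = 0.3)
S <- sum(x)        # number of successes
n <- length(x)

# Log-likelihood of the sample as a function of p
loglik <- function(p) S * log(p) + (n - S) * log(1 - p)

# Numerical maximization over the open interval (0, 1)
opt <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)

print(c(numeric = opt$maximum, closed_form = S / n))
```

The two values should agree to within the tolerance of the optimizer.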
     
  12. How does generalized regression differ from ordinary regression?
    Ans: Ordinary regression assumes that each yi is a random variable ∼ N(E(yi), σ^2). With generalized regression, each dependent variable value is drawn from a distribution that might not be normal (for example, Bernoulli or Poisson). Generalized regression also uses a link function, which is the identity function for ordinary regression.
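
A minimal sketch of the relationship, using R's built-in mtcars dataset as an assumed example. A generalized model with a normal distribution and identity link reproduces ordinary regression, while a binomial family with a logit link gives logistic regression.

```r
# Ordinary regression
lm_fit <- lm(mpg ~ wt, data = mtcars)

# The same model as a generalized regression: normal errors, identity link
glm_fit <- glm(mpg ~ wt, data = mtcars, family = gaussian(link = "identity"))

print(all.equal(coef(lm_fit), coef(glm_fit)))   # coefficients agree

# A non-normal case: am is 0/1, so use a Bernoulli (binomial) family
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
print(coef(logit_fit))
```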
     
  13. What is a link function?
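    Ans: It is a function that ties the independent variables to the dependent variable: it is applied to the expected value of the dependent variable, and the result is modeled as a linear combination of the independent variables. Logistic regression uses the logit function: logit(s) = log(s / (1 - s)).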
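
The logit function and its inverse can be written directly in R. The helper names logit and inv_logit are hypothetical; R's built-in qlogis and plogis compute the same quantities.

```r
# Logit link: maps a probability in (0, 1) to the whole real line
logit <- function(s) log(s / (1 - s))

# Inverse logit: maps a real number back to a probability
inv_logit <- function(t) 1 / (1 + exp(-t))

s <- c(0.1, 0.25, 0.5, 0.75, 0.9)

print(logit(s))
print(all.equal(logit(s), qlogis(s)))        # matches the built-in logit
print(all.equal(inv_logit(logit(s)), s))     # round trip recovers s
```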

 

Projects

 

Topics in Categorical Data Analysis

 

Review for Final Exam