
IT 224 -- May 6, 2024

Review Exercises

  1. What does the correlation tell you about the relationship between two data vectors x and y?
    Answer: the correlation of x and y measures the strength and direction of the linear association between x and y.
  2. Compute the correlation of x and y by hand:
    Obs: 1 2 3 4 5
    x: 1 2 3 4 5
    y: 1 3 2 5 4

    Verify your answer using R.
    Answer:
    x̄ = (1 + 2 + 3 + 4 + 5) / 5 = 3,
    ȳ = (1 + 3 + 2 + 5 + 4) / 5 = 3,
    SD_x = √(((1-3)² + (2-3)² + (3-3)² + (4-3)² + (5-3)²) / 5) = √2,
    SD_y = √(((1-3)² + (3-3)² + (2-3)² + (5-3)² + (4-3)²) / 5) = √2.
    Now compute the z-scores for both x and y:
    z_xi = (x_i - x̄) / SD_x  and  z_yi = (y_i - ȳ) / SD_y  for i = 1, ... , n.
    Then the correlation is the average of the products of the z-scores:
      x_i    y_i     z_xi      z_yi     z_xi * z_yi
     -----+------+---------+---------+-------------
       1      1    -2/√2     -2/√2     +4/2 = 2
       2      3    -1/√2       0          0
       3      2      0       -1/√2        0
       4      5    +1/√2     +2/√2     +2/2 = 1
       5      4    +2/√2     +1/√2     +2/2 = 1
    The average of the products is (2 + 0 + 0 + 1 + 1) / 5 = 0.8.
    Verify your calculation with R:
    > x <- 1:5
    > x
    [1] 1 2 3 4 5
    > y <- c(1, 3, 2, 5, 4)
    > y
    [1] 1 3 2 5 4
    > cor(x, y)
    [1] 0.8
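    The hand calculation can also be retraced step by step in R. This is only a sketch: the helper name pop_sd is made up for this example, and it is needed because R's built-in sd() divides by n - 1 rather than by n.

    ```r
    # Data from the exercise.
    x <- 1:5
    y <- c(1, 3, 2, 5, 4)

    # Hypothetical helper: SD that divides by n, as in the hand calculation.
    # (R's built-in sd() divides by n - 1 instead.)
    pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

    zx <- (x - mean(x)) / pop_sd(x)   # z-scores of x
    zy <- (y - mean(y)) / pop_sd(y)   # z-scores of y

    r <- mean(zx * zy)                # average of the products of the z-scores
    r
    all.equal(r, cor(x, y))           # TRUE when r agrees with cor()
    ```

    Printing zx * zy shows the same products as the last column of the table (up to rounding).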
    
  3. In the Correlation document, look at the scatterplots of bivariate datasets with various correlations.
  4. Estimate the correlation r in these situations. The correct answer is marked with *:
    1. Height of father, height of son.
      i. -0.30    ii. 0.05    *iii. 0.70    iv. 0.99
    2. IQ of husband, IQ of wife.
      i. -0.70    ii. 0.00    *iii. 0.60    iv. 1.00
    3. Height of husband, height of wife if men always married women that were exactly 6 inches shorter.
      i. -0.60    ii. 0.60    iii. 0.99    *iv. 1.00
    4. Weight of husband, weight of wife if men always married women that weighed 70% of their weight.
      i. 0.00    ii. 0.50    iii. 0.70    *iv. 1.00
  5. Match the correlation to the dataset. Choose from these values:
    -0.50   0.005   0.30   0.70   0.99
    1. GPA in freshman year, GPA in sophomore year.  Answer: 0.70
    2. GPA in freshman year, GPA in senior year.  Answer: 0.30
    3. Length and weight of 2 by 4 boards.  Answer: 0.99
  6. What would happen to the correlation r if
    1. x were replaced by x + 10.
    2. y were replaced by 2 * y + 8.
    3. x and y were interchanged.
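    Answer: in all three cases, the correlation would remain the same. Adding a constant or multiplying by a positive constant leaves the z-scores unchanged, and the product of the z-scores does not depend on the order of x and y.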
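    A quick numerical check of the three cases, reusing the x and y from Exercise 2 (any other data would serve equally well; this is an illustration, not a proof):

    ```r
    x <- 1:5
    y <- c(1, 3, 2, 5, 4)

    r_original <- cor(x, y)
    r_shift_x  <- cor(x + 10, y)       # case 1: x replaced by x + 10
    r_linear_y <- cor(x, 2 * y + 8)    # case 2: y replaced by 2 * y + 8
    r_swapped  <- cor(y, x)            # case 3: x and y interchanged

    # All four values should agree.
    c(r_original, r_shift_x, r_linear_y, r_swapped)

    # Contrast: a negative multiplier flips the sign of r.
    r_negative <- cor(x, -2 * y + 8)
    r_negative
    ```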
  7. How large must r be to be considered meaningful?
    Answer: See the table in the Correlation document.
  8. Why is the computed value of r the same whether the SD or the SD+ is used for the x and y standard deviations?
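    One way to explore this numerically is sketched here. It assumes that SD+ means the standard deviation that divides by n - 1 (the value R's sd() returns), and that the products of the z-scores are then divided by n - 1 as well instead of by n. Under those assumptions the n or n - 1 cancels, and both versions reduce to sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)).

    ```r
    x <- 1:5
    y <- c(1, 3, 2, 5, 4)
    n <- length(x)

    dx <- x - mean(x)   # deviations of x from its mean
    dy <- y - mean(y)   # deviations of y from its mean

    # Version 1: SD divides by n, and the products are averaged over n.
    sd_x <- sqrt(sum(dx^2) / n)
    sd_y <- sqrt(sum(dy^2) / n)
    r_sd <- sum((dx / sd_x) * (dy / sd_y)) / n

    # Version 2 (assumed meaning of SD+): sd() divides by n - 1,
    # and the sum of products is divided by n - 1 as well.
    r_sd_plus <- sum((dx / sd(x)) * (dy / sd(y))) / (n - 1)

    # The common form that both versions reduce to.
    r_common <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

    c(r_sd, r_sd_plus, r_common)
    ```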
  9. Use R to compute the pairwise correlations of the variables in the Nielsen Dataset. The rows of this dataset are the ratings for various television shows. Interpret the correlations.
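    A minimal sketch of the mechanics, using a small made-up data frame as a stand-in. The name ratings, the columns show, v1, v2, v3, and every number here are hypothetical and are NOT the Nielsen data; the real dataset would first be read in from the course's file, for example with read.csv().

    ```r
    # Hypothetical stand-in data: NOT the Nielsen dataset.
    ratings <- data.frame(
      show = c("A", "B", "C", "D", "E"),
      v1   = c(1, 2, 3, 4, 5),
      v2   = c(1, 3, 2, 5, 4),
      v3   = c(5, 3, 4, 1, 2)
    )

    # cor() needs numeric input, so keep only the numeric columns.
    numeric_cols <- sapply(ratings, is.numeric)

    # Matrix of pairwise correlations: entry [i, j] is the correlation of
    # column i with column j, so the diagonal is 1 and the matrix is symmetric.
    r_matrix <- cor(ratings[, numeric_cols])
    round(r_matrix, 2)
    ```

    When interpreting the matrix, read each off-diagonal entry as the correlation between the row variable and the column variable.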

Linear Correlation Cautions

Linear Regression

Project 3

The Regression Fallacy

Additional Regression Problem