
IT 223 -- Feb 4, 2026

Review Exercises

  1. Find the expected normal scores for a dataset with n = 9.
    Answer: Find the 9 z-values that divide the area under the standard normal curve into 10 equal parts. To do this, use the standard normal table to look up the z-scores whose cumulative areas most closely match these quantiles:
    0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9
    
    These z-scores are
    -1.28 -0.84 -0.52 -0.25 0.00 0.25 0.52 0.84 1.28
    
  2. Use R to obtain the same z-scores (more accurately):
    > qnorm(seq(0.1, 0.9, 0.1))
    [1] -1.2815516 -0.8416212 -0.5244005 -0.2533471  0.0000000 
    [6]  0.2533471  0.5244005  0.8416212  1.2815516
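
The same idea can be sketched for any sample size n. The helper name expected_normal_scores is hypothetical (it is not from the notes), and it assumes the quantiles i / (n + 1) used in Exercise 1; other textbooks use slightly different quantile formulas.

```r
# Hypothetical helper: expected normal scores for a sample of size n,
# assuming the quantiles i / (n + 1) for i = 1, ..., n (as in Exercise 1).
expected_normal_scores <- function(n) {
  qnorm((1:n) / (n + 1))
}

# For n = 9 the quantiles are 0.1, 0.2, ..., 0.9.
scores <- expected_normal_scores(9)
round(scores, 2)
```

For n = 9 this should reproduce the values from qnorm(seq(0.1, 0.9, 0.1)) above.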
    
  3. What does the correlation tell you about the relationship between two data vectors x and y?
    Answer: The correlation of x and y measures the strength and direction of the linear association between x and y.
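
A small sketch of why the word "linear" matters. The data below are made up for illustration: y is completely determined by x, but the relationship is a symmetric curve rather than a line, so the correlation comes out as zero.

```r
# Illustrative data (not from the notes): y depends exactly on x,
# but through a parabola that is symmetric about x = 0.
x_curve <- -2:2
y_curve <- x_curve^2

# The correlation only picks up linear association, so it is 0 here.
r_curve <- cor(x_curve, y_curve)
r_curve
```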
  4. Compute the correlation of x and y by hand:
    Obs: 1 2 3 4 5
    x: 1 2 3 4 5
    y: 1 3 2 5 4

    Verify your answer using R.
    Answer:
    x̄ = (1 + 2 + 3 + 4 + 5) / 5 = 3,
    ȳ = (1 + 3 + 2 + 5 + 4) / 5 = 3,
    SDx = √(((1-3)² + (2-3)² + (3-3)² + (4-3)² + (5-3)²) / 5) = √(10/5) = √2
    SDy = √(((1-3)² + (3-3)² + (2-3)² + (5-3)² + (4-3)²) / 5) = √(10/5) = √2
    Now compute the z-scores for both x and y:
    zxi = (xi - x̄) / SDx  and  zyi = (yi - ȳ) / SDy  for i = 1, ..., n.
    Then the correlation is the average of the products of the z-scores:
      xi    yi     zxi      zyi     zxi * zyi
    ----+-----+--------+--------+------------
       1     1   -2/√2    -2/√2    4/2 = 2
       2     3   -1/√2     0       0
       3     2    0       -1/√2    0
       4     5   +1/√2    +2/√2    2/2 = 1
       5     4   +2/√2    +1/√2    2/2 = 1
    The average of the products is (2 + 0 + 0 + 1 + 1) / 5 = 0.8.
    Verify your calculation with R:
    > x <- 1:5
    > x
    [1] 1 2 3 4 5
    > y <- c(1, 3, 2, 5, 4)
    > y
    [1] 1 3 2 5 4
    > cor(x, y)
    [1] 0.8 
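
The hand calculation can also be mirrored step by step in R. This is a minimal sketch: the helper sd_pop is hypothetical and divides by n, as in the hand calculation, whereas R's built-in sd() divides by n - 1.

```r
x <- 1:5
y <- c(1, 3, 2, 5, 4)

# Hypothetical helper: standard deviation with divisor n,
# matching the hand calculation (R's sd() uses n - 1 instead).
sd_pop <- function(v) sqrt(mean((v - mean(v))^2))

# z-scores for x and y
zx <- (x - mean(x)) / sd_pop(x)
zy <- (y - mean(y)) / sd_pop(y)

# The correlation is the average of the products of the z-scores.
r <- mean(zx * zy)
r
```

The result agrees with cor(x, y). The choice of divisor does not affect the correlation, because the same factor appears in the numerator and the denominator and cancels.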
    
  5. Estimate the correlation r in these situations.
    1. Height of father, height of son.
      i. -0.30    ii. 0.05    iii. 0.70    iv. 0.99
          Answer: 0.70
    2. IQ of husband, IQ of wife.
      i. -0.70    ii. 0.00    iii. 0.60    iv. 1.00
          Answer: 0.60
    3. Height of husband, height of wife if men always married women who were exactly 6 inches shorter.
      i. -0.60    ii. 0.60    iii. 0.99    iv. 1.00
          Answer: 1.00
    4. Weight of husband, weight of wife if men always married women who weighed exactly 70% of their husband's weight.
      i. 0.00    ii. 0.50    iii. 0.70    iv. 1.00
           Answer: 1.00
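
Parts 3 and 4 of Exercise 5 describe exact linear relationships, which is why the answer is 1.00 in both. A quick sketch in R, using made-up heights and weights purely for illustration:

```r
# Illustrative values only (not real data).
husband_height <- c(66, 68, 70, 72, 75)       # inches
wife_height    <- husband_height - 6          # always exactly 6 inches shorter
r_height <- cor(husband_height, wife_height)

husband_weight <- c(150, 165, 180, 200, 220)  # pounds
wife_weight    <- 0.70 * husband_weight       # always exactly 70% of his weight
r_weight <- cor(husband_weight, wife_weight)

c(r_height, r_weight)
```

Both correlations equal 1 (up to floating-point rounding), since every point lies exactly on a line with positive slope.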
  6. Match the correlation to the dataset:
    1. GPA in freshman year, GPA in sophomore year.
      i. 0.00    ii. 0.30    iii. 0.70    iv. 1.00
         Answer: 0.70
    2. GPA in freshman year, GPA in senior year.
      i. 0.00    ii. 0.30    iii. 0.70    iv. 1.00
         Answer: 0.30
    3. Length and weight of 2 by 4 boards.
      i. -0.50   ii. 0.005   iii. 0.30   iv. 0.70   v. 0.99
         Answer: 0.99
  7. What would happen to the correlation r if
    1. x were replaced by x + 10.
    2. y were replaced by 2 * y + 8.
    3. x and y were interchanged.
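    Answer: In all three cases, the correlation would remain the same. Adding a constant or multiplying by a positive constant does not change the z-scores, and the formula for r is symmetric in x and y.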
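
These three claims can be checked numerically. The sketch below reuses the x and y vectors from Exercise 4; the last line is an extra check, not part of the exercise, showing that a negative multiplier flips the sign of r.

```r
x <- 1:5
y <- c(1, 3, 2, 5, 4)

r_original <- cor(x, y)
r_shift_x  <- cor(x + 10, y)      # x replaced by x + 10
r_scale_y  <- cor(x, 2 * y + 8)   # y replaced by 2 * y + 8
r_swapped  <- cor(y, x)           # x and y interchanged

# All four values are the same.
c(r_original, r_shift_x, r_scale_y, r_swapped)

# Extra check: multiplying by a negative constant reverses the sign of r.
r_negated <- cor(x, -2 * y + 8)
r_negated
```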
  8. How large must r be to be considered meaningful?
    Answer: See the table in the Correlation document.

Linear Regression

Project 3

The Regression Fallacy

Additional Regression Problem