IT 223 -- May 13, 2024

Review Exercises

  1. What is the regression fallacy?
    Ans: In a pretest/posttest situation, the regression fallacy is the mistaken notion that someone who does well on the pretest should do equally well on the posttest, and that someone who does poorly on the pretest should do equally poorly on the posttest. In fact, unless the pretest and posttest scores are perfectly correlated (r = 1), someone who obtains a pretest score k SDxs above the pretest average x̄ will, on the average, obtain a posttest score only r × k SDys above the posttest average ȳ. In other words, measured in standard units, the posttest score will be lower than the pretest score, on the average. The situation is reversed for a person with a below-average pretest score: if the pretest score is k SDxs below x̄, then, on the average, the posttest score will be r × k SDys below ȳ. In other words, measured in standard units, the posttest score will be higher than the pretest score, on the average.
  2. Use this R script to analyze the regression model for predicting weights of Chicago Bears players from their heights: bears.R. Place this script and the data file bears-2024-roster.txt into the folder c:/it223/bears. Delete the top comment line from the data file. Then enter these lines in R to run the bears.R script:
    > # Set working directory.
    > setwd("c:/it223/bears")
    >
    > # Run the script.
    > source("bears.R")
    
  3. What is the RMSE for a regression model?
    Ans: The root mean squared error (RMSE) is the standard deviation of the residuals. For football-shaped (bivariate normal) data, RMSE is also, approximately, the SD of the y-values in any thin vertical rectangle centered at a specific x value. RMSE is defined as SD+y √(1 - r²).
  4. A law school finds this relationship between LSAT scores (independent variable x) and first-year scores (dependent variable y). The data are bivariate normal. Here are the summary statistics:
           x̄ = 162    SDx = 6
           ȳ = 68     SDy = 10
           r = 0.6
    1. About what percentage of the students have first-year scores over 75? Because the data are bivariate normal, the first-year scores are normally distributed, so we can use the normal table.
      Answer: z = (y - ȳ) / SDy = (75 - 68) / 10 = 0.7. The area of the bin (-∞, 0.7] is 0.7580 ≈ 76%. Therefore the percentage of students with first-year scores over 75 is 100% - 76% = 24%.
    2. Of those students who scored 165 on the LSAT, about what percentage have first-year scores over 75? Visualize these students as lying in a thin vertical rectangle centered at LSAT = 165. Because the data are bivariate normal, the first-year scores in this rectangle are normally distributed.
      Answer: The regression equation is ŷ - ȳ = r (SDy / SDx)(x - x̄), so
            ŷ - 68 = (0.6 × 10 / 6)(x - 162)
            ŷ - 68 = 1 × (x - 162)
            ŷ = x - 94
      and the predicted value for the students in the thin vertical rectangle centered at x = 165 is ŷ = 165 - 94 = 71.

      The RMSE for those students in the thin vertical rectangle centered at x = 165 is
            RMSE = √(1 - r²) × SDy = √(1 - 0.6²) × 10 = √0.64 × 10 = 0.8 × 10 = 8
      Then z = (y - ŷ) / RMSE = (75 - 71) / 8 = 0.5. The area of the bin (-∞, 0.5] is 0.6915 ≈ 69%. Therefore, of the students with an LSAT score of 165, the percentage with first-year scores over 75 is 100% - 69% = 31%.
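The regression effect described in Review Exercise 1 can be seen in a small simulation. The Python sketch below assumes standardized pretest and posttest scores that are bivariate normal with correlation r. It keeps only the people whose pretest scores fall in a thin band around k SDs (a thin vertical rectangle) and averages their posttest scores. The function name `mean_posttest_in_band`, the band half-width 0.05, the sample size, and the seed are illustrative choices, not part of the course materials.

```python
import random
from statistics import fmean

def mean_posttest_in_band(r, k, half_width=0.05, n=200_000, seed=223):
    """Simulate n people whose standardized pretest (x) and posttest (y)
    scores are bivariate normal with correlation r. Return the average
    posttest score, in standard units, of the people whose pretest score
    lies within half_width of k (a thin vertical rectangle at x = k)."""
    rng = random.Random(seed)
    noise_sd = (1 - r ** 2) ** 0.5
    posttest_in_band = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # y = r*x + independent noise has SD 1 and correlation r with x.
        y = r * x + noise_sd * rng.gauss(0, 1)
        if abs(x - k) < half_width:
            posttest_in_band.append(y)
    return fmean(posttest_in_band)

r = 0.6
for k in (1.0, -1.0):
    avg = mean_posttest_in_band(r, k)
    print(f"pretest {k:+.1f} SDs -> average posttest {avg:+.2f} SDs (r x k = {r * k:+.2f})")
```

With r = 0.6 and k = ±1, the band averages should come out close to ±0.6 rather than ±1, which matches the r × k rule.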
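The two parts of Review Exercise 4 can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the printed normal table, so its areas carry more decimal places than the table values. The helper names `percent_above`, `regression_prediction`, and `rmse` are hypothetical, written only for this example.

```python
from math import sqrt
from statistics import NormalDist

def percent_above(cutoff, mean, sd):
    """Percentage of a normal distribution with this mean and SD lying above cutoff."""
    z = (cutoff - mean) / sd
    return 100 * (1 - NormalDist().cdf(z))

def regression_prediction(x, x_bar, sd_x, y_bar, sd_y, r):
    """Predicted y from the regression line y^ - y_bar = r (SDy / SDx)(x - x_bar)."""
    slope = r * sd_y / sd_x
    return y_bar + slope * (x - x_bar)

def rmse(sd_y, r):
    """RMSE = sqrt(1 - r^2) * SDy."""
    return sqrt(1 - r ** 2) * sd_y

# Summary statistics for the law school data.
X_BAR, SD_X = 162, 6
Y_BAR, SD_Y = 68, 10
R = 0.6

# Part 1: all students, using the overall distribution of first-year scores.
overall = percent_above(75, Y_BAR, SD_Y)

# Part 2: students in the thin vertical rectangle at LSAT = 165,
# centered at the regression prediction with spread equal to the RMSE.
y_hat = regression_prediction(165, X_BAR, SD_X, Y_BAR, SD_Y, R)
spread = rmse(SD_Y, R)
conditional = percent_above(75, y_hat, spread)

print(f"Part 1: {overall:.1f}%")
print(f"Part 2: predicted {y_hat:.0f}, RMSE {spread:.0f}, {conditional:.1f}%")
```

The results (about 24.2% and 30.9%) agree with the table-based answers of 24% and 31% once those are rounded to whole percentages.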

Learning Outcomes for this Week

Probability

Practice Problems

  1. What is wrong with this argument? Either the Bears will win the Super Bowl in 2024 or they won't. Therefore the probability that the Bears will win the Super Bowl in 2024 is 50%.
    Ans: Just because there are two outcomes doesn't mean they are equally likely. Some outcomes are very likely; they have probabilities close to 1. Some outcomes are unlikely; they have probabilities close to 0. Other outcomes have probabilities close to 0.5. In this case it does not make sense to assign a priori probabilities of 0.5 to winning the Super Bowl in 2024 and to not winning it.
  2. What is wrong with this strategy? Double down after each loss. Eventually you win and recoup your losses. For example:
    -1 - 2 - 4 - 8 - 16 + 32 = 1.
    Now start over with 1 and repeat the double down strategy.
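    Ans: The problem with this strategy is that the required bet grows exponentially: after n consecutive losses you have lost 1 + 2 + ... + 2^(n-1) = 2^n - 1 and must bet 2^n to continue. Eventually a long enough losing streak will push you past the casino betting limit or exhaust your money, and you lose everything staked in that cycle.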
  3. A bookmaker offers 20 to 1 odds that the Bears will win the Super Bowl in 2024. If this is a fair bet, what is the probability p that the Bears will win the Super Bowl in 2024?
    Ans: On a $1 bet you win $20 with probability p and lose $1 with probability 1 - p, so the expected amount that you win is 20p + (-1)(1 - p). Because we are assuming that this is a fair bet, this expression is 0. Now solve for p:
         20p + (-1)(1 - p) = 0
         20p - 1 + p = 0
         21p = 1
         p = 1 / 21 ≈ 0.0476 ≈ 4.8%.
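The doubling strategy in Practice Problem 2 can be analyzed exactly with a short Python sketch. It assumes you start each cycle with a $1 bet and double after every loss. A cycle stops either at the first win (net +$1) or when the next doubled bet would exceed a fixed bankroll, in which case you lose everything staked in the cycle. The function name `doubling_cycle` and the $31 bankroll are hypothetical. The probability 18/38 is the chance of winning an even-money bet on red in American roulette, used here only as an example of an unfavorable bet.

```python
from fractions import Fraction

def doubling_cycle(bankroll, p_win):
    """Analyze one cycle of the double-down strategy, starting with a $1 bet.

    Returns (n, p_lose_all, expected_profit), where n is the number of bets
    the bankroll can cover, p_lose_all is the chance of losing all n bets,
    and expected_profit is the expected net result of the cycle.
    """
    # Bets of 1, 2, 4, ..., 2**(n-1) cost 2**n - 1 in total.
    n = 0
    while 2 ** (n + 1) - 1 <= bankroll:
        n += 1
    p_lose_all = (1 - p_win) ** n
    # Any win ends the cycle with a net profit of $1;
    # losing all n bets costs 2**n - 1.
    expected_profit = (1 - p_lose_all) * 1 - p_lose_all * (2 ** n - 1)
    return n, p_lose_all, expected_profit

# A bankroll of $31 covers the bets 1 + 2 + 4 + 8 + 16 from the example.
for label, p in [("fair coin", Fraction(1, 2)), ("red in roulette", Fraction(18, 38))]:
    n, p_lose_all, ev = doubling_cycle(31, p)
    print(f"{label}: {n} bets, P(lose all) = {p_lose_all}, expected profit = {ev}")
```

For a fair bet the expected profit of a cycle is exactly 0. The frequent $1 wins are balanced by the rare but large $31 loss. For an unfavorable bet the expected profit is negative. Doubling changes how the outcomes are distributed, not their expected value.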
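Practice Problem 3 generalizes. For a bet paying a to 1, setting the expected winnings a·p - (1 - p) to 0 gives p = 1 / (a + 1). The Python sketch below checks this with exact fractions. The helpers `fair_probability` and `expected_winnings` are hypothetical names used only for illustration.

```python
from fractions import Fraction

def fair_probability(payout, stake=1):
    """Probability p that makes a bet paying `payout` for a `stake` fair,
    i.e. the solution of payout * p - stake * (1 - p) = 0."""
    return Fraction(stake, payout + stake)

def expected_winnings(payout, p, stake=1):
    """Expected amount won: +payout with probability p, -stake with probability 1 - p."""
    return payout * p - stake * (1 - p)

p = fair_probability(20)
print(p, round(float(p), 4), expected_winnings(20, p))
```

Using `Fraction` keeps the answer exact (1/21), so the expected winnings come out as exactly 0 rather than a floating-point approximation.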

Random Variables