IT 223: Data Analysis

Assignment 6 Solutions

 

Proportions

 

These problems match the binomial setting: a fixed number of independent yes/no outcomes, each with the same probability of success. Here, each error is either caught or not caught. There are 50 outcomes (n = 50), and the probability of catching an error is 80% (p = .8).

 

  1. Mean = n p = 50 * .80 = 40

SD = sqrt(n p (1 - p)) = sqrt(50 * .8 * .2) = 2.83

 

  2. 45 on the standard scale is (45 - 40)/2.83 = 1.77

The cumulative proportion for 1.77 is .962, so the probability of catching more than 45 errors is 1 - .962 = .038 (or 3.8%).
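The count-scale calculation above can be checked with a short Python sketch. It assumes the normal approximation used in the solution and rounds z to two decimals to mimic a table lookup; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.80                 # values given in the problem

mean = n * p                    # expected number of errors caught
sd = sqrt(n * p * (1 - p))      # standard deviation of the count

# Standardize 45, rounding to two decimals as a z-table lookup would
z = round((45 - mean) / sd, 2)

# Upper-tail area under the standard normal curve
p_more_than_45 = 1 - NormalDist().cdf(z)

print(f"mean={mean:.0f}, sd={sd:.2f}, z={z}, P(more than 45)={p_more_than_45:.3f}")
```

Using the unrounded z gives a slightly different third decimal, which is the usual gap between table lookups and software.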

 

  3. Mean = p = .80

SD = sqrt(p (1 - p) / n) = sqrt(.8 * .2 / 50) = .0566

 

  4. .7 on the standard scale is (.7 - .8) / .0566 = -1.77

The cumulative proportion for -1.77 is .038. The probability of a proportion greater than .7 is 1 - .038 = .962 (or 96.2%).
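The same check on the proportion scale, again as a sketch under the normal approximation with z rounded to two decimals; `mean_phat` and `sd_phat` are names chosen here for the mean and SD of the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.80                     # values given in the problem

mean_phat = p                       # mean of the sample proportion
sd_phat = sqrt(p * (1 - p) / n)     # SD of the sample proportion

# Standardize .70, rounding to two decimals as a z-table lookup would
z = round((0.70 - mean_phat) / sd_phat, 2)

# Area above z under the standard normal curve
p_above_70 = 1 - NormalDist().cdf(z)

print(f"sd={sd_phat:.4f}, z={z}, P(proportion > .7)={p_above_70:.3f}")
```

The z-score has the same size as on the count scale (with the opposite sign) because 35 of 50 is as far below the mean of 40 as 45 is above it.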

 

Dice Rolling Revisited

 

  1. The expected mean of a sample of 50 rolls is the mean of a single roll treated as a random variable. This is the average of the six faces (since each is equally probable): (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5

 

  2. The variance of one roll can be calculated in one of two ways:

·         1/6 (1 - 3.5)^2 + 1/6 (2 - 3.5)^2 + 1/6 (3 - 3.5)^2 + 1/6 (4 - 3.5)^2 + 1/6 (5 - 3.5)^2 + 1/6 (6 - 3.5)^2

·         Taking the population variance (VARP in Excel) of 1, 2, 3, 4, 5, 6

Either method produces a variance of 2.92. The standard deviation is the square root of the variance, which is 1.71.

The standard error for the sample of 50 is the SD of one roll divided by sqrt(n):

1.71 / sqrt(50) = .242
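Both variance methods and the standard error can be reproduced in Python. This is a sketch only; `statistics.pvariance` stands in for Excel's VARP, since both compute the population variance.

```python
from math import sqrt
from statistics import mean, pvariance

faces = [1, 2, 3, 4, 5, 6]
n = 50                                   # rolls per sample

mu = mean(faces)                         # mean of one roll

# Method 1: probability-weighted squared deviations, each face has weight 1/6
var_by_formula = sum((x - mu) ** 2 / 6 for x in faces)

# Method 2: population variance of the six faces (the VARP equivalent)
var_by_pvariance = pvariance(faces)

sd = sqrt(var_by_formula)                # SD of one roll
se = sd / sqrt(n)                        # standard error of the mean of n rolls

print(f"variance={var_by_formula:.2f}, sd={sd:.2f}, se={se:.3f}")
```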

  3. About 68% of the dice averages from samples of 50 should be within one standard error of the expected mean. This range is 3.5 +/- .242, or the range from 3.258 through 3.742.

 

  4. For my run, I found that 724 of the 1000 averages were within this range, or 72.4%.
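A run like this can be simulated in a few lines of Python. This is a minimal sketch, not the tool used for the run above: the seed is arbitrary, and the count will differ from 724 and vary from run to run.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(223)                         # arbitrary seed, only for repeatability

n_rolls, n_samples = 50, 1000
faces = [1, 2, 3, 4, 5, 6]

# One standard error on either side of the expected mean of 3.5
se = pstdev(faces) / sqrt(n_rolls)
low, high = 3.5 - se, 3.5 + se

# Average of 50 fair rolls, repeated 1000 times
averages = [mean(random.choices(faces, k=n_rolls)) for _ in range(n_samples)]

within = sum(low <= a <= high for a in averages)
share = within / n_samples

print(f"{within} of {n_samples} averages ({share:.1%}) fall in [{low:.3f}, {high:.3f}]")
```

Changing or removing the seed gives a different count each time, which is why an observed share such as 72.4% need not equal 68% exactly.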