IT 223: Data Analysis
Solutions to Assignment 5

General chance problems

  1. Counting the possibilities is one approach, but it's easier to calculate the chance that no 3 is drawn on any of the four draws and then subtract that chance from 1 to get the chance that at least one 3 is drawn. The chance of not drawing a 3 on one draw is 3/5. We can use the multiplication rule to find the chance of not drawing a 3 on four successive draws: 3/5 * 3/5 * 3/5 * 3/5, which equals .1296. The chance of drawing at least one 3 is then 1 - .1296 or .8704 (87.04%).
  2. Random variable calculations:

a)      Expected value of one draw: .2 * 1 + .4 * 2 + .4 * 3 = 2.2 (alternatively, taking the average of 1, 2, 2, 3, 3 gives the same answer).

b)      Variance of one draw: .2 * (1 - 2.2)^2 + .4 * (2 - 2.2)^2 + .4 * (3 - 2.2)^2 = .56 (note: taking the square-root of this answer produces the standard deviation). Alternatively, you can apply VARP to 1, 2, 2, 3, 3 to get the same answer.

c)      Adding a constant to a random variable increases the mean by the same amount. So, the new mean is 2.2 + 2 = 4.2.

d)      Adding a constant to a random variable does not change its variance. So, the variance is still .56.

e)      The mean of the sum of two random variables is the sum of their means. So, the answer is 2.2 + 2.2 = 4.4.

f)       The spread of the sum of two random variables is greater than the spread of just one of the random variables. More precisely, when the random variables are independent, the variance of the sum is equal to the sum of their variances. In this case: .56 + .56 = 1.12.

g)      The standard deviation is the square-root of the variance. In this case, it’s the square-root of 1.12 or approximately 1.06.
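
The arithmetic in problems 1 and 2 can be checked with a short script. This is only a sketch: it assumes the box holds the tickets 1, 2, 2, 3, 3 (the values quoted in part a) and that draws are made with replacement. The variable names are illustrative, and `statistics.pvariance` plays the role of VARP.

```python
import math
import statistics

# Assumed box of tickets, inferred from the values quoted in the solutions.
box = [1, 2, 2, 3, 3]

# Problem 1: complement rule for "at least one 3" in four draws with replacement.
p_no_three_once = sum(1 for ticket in box if ticket != 3) / len(box)  # 3/5
p_at_least_one_three = 1 - p_no_three_once ** 4

# Problem 2 a-b: mean and population variance of a single draw.
mean_one = statistics.mean(box)
var_one = statistics.pvariance(box)

# Problem 2 c-d: adding the constant 2 shifts the mean but not the variance.
shifted = [ticket + 2 for ticket in box]
mean_shifted = statistics.mean(shifted)
var_shifted = statistics.pvariance(shifted)

# Problem 2 e-g: sum of two independent draws.
mean_sum = 2 * mean_one
var_sum = 2 * var_one
sd_sum = math.sqrt(var_sum)

print(f"P(at least one 3) = {p_at_least_one_three:.4f}")
print(f"one draw:   mean = {mean_one:.2f}, variance = {var_one:.2f}")
print(f"draw + 2:   mean = {mean_shifted:.2f}, variance = {var_shifted:.2f}")
print(f"two draws:  mean = {mean_sum:.2f}, variance = {var_sum:.2f}, SD = {sd_sum:.2f}")
```

The printed values should agree with the answers worked out by hand above.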

  3. In absolute terms, more tosses tend to move the count of heads farther from half the number of tosses. Percentage-wise, however, more tosses should produce a percentage of heads closer to 50%.

a)      Unlikely for both 100 and 1000, but less likely for 1000, since the percentage of heads in 1000 tosses should be closer to 50%. So, you should prefer 100.

b)      Likely for both 100 and 1000, but more likely for 1000. So, you should prefer 1000.

c)      Percentage-wise, 1000 tosses should be closer to 50%. So, you should prefer 1000.

d)      It’s unlikely that either 100 or 1000 will produce exactly 50% heads. Moreover, more tosses will increase the distance away from 50% in absolute counts. So, you should prefer 100.
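
The reasoning behind the coin-toss answers can be illustrated numerically. The sketch below assumes a fair coin and uses the standard binomial formulas. The function names are made up for this example. It compares the chance of exactly 50% heads, the standard deviation of the head count, and the standard deviation of the percentage of heads for 100 and 1000 tosses.

```python
import math

def p_exactly_half(n):
    """Chance of exactly n/2 heads in n fair tosses (n must be even)."""
    return math.comb(n, n // 2) / 2 ** n

def sd_count(n):
    """Standard deviation of the number of heads in n fair tosses."""
    return math.sqrt(n * 0.5 * 0.5)

def sd_percent(n):
    """Standard deviation of the percentage of heads in n fair tosses."""
    return 100 * sd_count(n) / n

for n in (100, 1000):
    print(
        f"n = {n}: P(exactly 50% heads) = {p_exactly_half(n):.4f}, "
        f"SD of count = {sd_count(n):.2f}, "
        f"SD of percent = {sd_percent(n):.2f}%"
    )
```

The count spreads out as the number of tosses grows while the percentage tightens around 50%, which is why the preferred number of tosses depends on whether the question is about counts or percentages.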

Dice Rolling Simulations

The expected value of rolling a 6-sided die: (1+2+3+4+5+6)/6 = 3.5

The expected sum of rolling 10 6-sided dice: 10 * (1+2+3+4+5+6)/6 = 35

The expected sum of rolling 100 6-sided dice: 100 * 3.5 = 350

For the simulations, your answers will vary. But you should find that, compared with the samples of 100 dice, the samples of 10 dice produce smaller error for sums but greater error for averages (error here means the absolute value of the difference between the predicted value and the actual value). Note that the law of large numbers says that the error in relative terms (e.g. as an average or a percent) decreases with larger samples.
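
One way to see this pattern is to repeat the simulation many times and average the errors. The following is a rough sketch, not the spreadsheet procedure from the assignment: the number of repetitions (2000), the seed, and the function name are all assumptions chosen for illustration.

```python
import random

EXPECTED_PER_DIE = 3.5

def mean_abs_errors(num_dice, trials, rng):
    """Return (mean abs error of the sum, mean abs error of the average)."""
    sum_err = 0.0
    avg_err = 0.0
    for _ in range(trials):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        sum_err += abs(total - EXPECTED_PER_DIE * num_dice)
        avg_err += abs(total / num_dice - EXPECTED_PER_DIE)
    return sum_err / trials, avg_err / trials

rng = random.Random(223)  # fixed seed so repeated runs give the same output
errors_10 = mean_abs_errors(10, 2000, rng)
errors_100 = mean_abs_errors(100, 2000, rng)

print(f"10 dice:  sum error = {errors_10[0]:.2f}, average error = {errors_10[1]:.3f}")
print(f"100 dice: sum error = {errors_100[0]:.2f}, average error = {errors_100[1]:.3f}")
```

With enough repetitions, the sum error should come out larger for 100 dice and the average error should come out larger for 10 dice.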

The variance of the sum of 50 independent rolls is 50 * the variance of one roll. The variance of one roll can be calculated in one of two ways:

·         1/6 (1 - 3.5)^2 + 1/6 (2 - 3.5)^2 + 1/6 (3 - 3.5)^2 + 1/6 (4 - 3.5)^2 + 1/6 (5 - 3.5)^2 + 1/6 (6 - 3.5)^2

·         Taking the population variance (VARP in Excel) of 1, 2, 3, 4, 5, 6

Either method gives you approximately 2.92. The variance of the sum is then 50 * 2.92 or 146. The standard deviation is then calculated by taking the square-root of the variance to get approximately 12.1.
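
Both ways of computing the variance can be checked in a few lines. This sketch uses `statistics.pvariance` as a stand-in for VARP; the variable names are illustrative. It carries the unrounded variance (35/12) through the calculation, so the variance of the sum comes out slightly below the 146 obtained from the rounded 2.92.

```python
import math
import statistics

faces = [1, 2, 3, 4, 5, 6]
mean_one = statistics.mean(faces)  # 3.5

# Method 1: weight each squared deviation by its probability of 1/6.
var_by_formula = sum((1 / 6) * (face - mean_one) ** 2 for face in faces)

# Method 2: population variance of the six faces (the VARP approach).
var_by_varp = statistics.pvariance(faces)

# Sum of 50 independent rolls: variances add, then take the square root.
num_dice = 50
var_sum = num_dice * var_by_varp
sd_sum = math.sqrt(var_sum)

print(f"variance of one roll: {var_by_formula:.4f} (formula), {var_by_varp:.4f} (VARP)")
print(f"variance of the sum of {num_dice} rolls: {var_sum:.2f}")
print(f"standard deviation of the sum: {sd_sum:.2f}")
```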

Typically more trials will produce a mean and standard deviation closer to what is predicted.

The distribution of 1000 rolls should approximate a normal distribution. The central limit theorem states that the distribution of sums (as well as means) of independent draws from a random variable approaches a normal distribution as N becomes sufficiently large.
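
A quick simulation can show both effects. The sketch below assumes each trial is the sum of 50 dice and that there are 1000 trials; both numbers, the seed, and the variable names are assumptions made for illustration. It compares the observed mean and standard deviation with the predicted ones, and checks what share of sums falls within one predicted standard deviation of the mean (roughly two thirds for a normal distribution).

```python
import random
import statistics

NUM_DICE = 50      # assumed number of dice per trial
NUM_TRIALS = 1000  # assumed number of trials

rng = random.Random(223)  # fixed seed so repeated runs give the same output
sums = [
    sum(rng.randint(1, 6) for _ in range(NUM_DICE))
    for _ in range(NUM_TRIALS)
]

predicted_mean = 3.5 * NUM_DICE
predicted_sd = (NUM_DICE * 35 / 12) ** 0.5  # 35/12 is the variance of one roll

observed_mean = statistics.mean(sums)
observed_sd = statistics.pstdev(sums)

# Share of trials whose sum lies within one predicted SD of the predicted mean.
within_one_sd = sum(
    1 for s in sums if abs(s - predicted_mean) <= predicted_sd
) / NUM_TRIALS

print(f"mean: predicted {predicted_mean:.1f}, observed {observed_mean:.2f}")
print(f"SD:   predicted {predicted_sd:.2f}, observed {observed_sd:.2f}")
print(f"share within one SD of the mean: {within_one_sd:.1%}")
```

Raising `NUM_TRIALS` should generally bring the observed mean and standard deviation closer to the predicted values.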