CSC 239: Personal Computing for Science

Second Assignment (Due 11:30pm Monday 26 September 2005)

Assignment

For this assignment, you will use the same data set that you used in the last assignment (Forbes magazine list of billionaires). You may put all the work on a single worksheet. If so, eliminate any unused worksheets. If you choose to use multiple worksheets, make sure that they are clearly named.
  1. Import the data set into Excel, using the Data wizard.
  2. Label the columns using the labels Age and Worth, respectively.
  3. Compute the descriptive statistics using the “descriptive statistics tool” for the observed variables age. (You can use the output from your first homework).
  4. Compute the five-number summary (min, Q1, median, Q3, max) for the observed variables age. (You can use the output from your first homework).
  5. Use the normal distribution to answer the following questions.
    1. Construct the table for computing normal probabilities and percentiles similar to the one demonstrated in class and performed in the lab.
    2. What is the percentage of billionaires who are older than 75 years?
    3. What is the percentage of billionaires who are between 35 and 50 years of age?
    4. Estimate the first and third quartile for the age values using the normal approximation. Compare these approximate values with the actual values of the first and third quartiles computed in the 5-number summary.
    5. Do you believe that the normal density is a good approximation of the distribution of the billionaires’ ages?
    6. What is the percentage of billionaires who are younger than 60 years?
    7. How old must a billionaire be to belong to the bottom 5% of the youngest billionaires?

Additional challenge

This one is difficult, but will provide you a very nice visualization of how your data compares to the normal curve.

Create a custom chart that shows the histogram of ages and the normal distribution. Follow these steps:
  1. Calculate a histogram using a bin size of 1. Your bins will thus start at 29 (the minimum age) and go to 96 (the maximum age). Don't enter them all by hand - enter 29 in one cell and then 30 below it. Highlight the two cells and move the mouse to the lower right corner of the two cells until it becomes a fine black cross. Then click and drag the mouse down the column. You will see a small number indicating how far your sequence is going. When you get to 96, stop.
  2. Use the histogram tool to create frequency counts for each age.
  3. Calculate percentages for the histogram, as we did for the yellow alert in homework 1.
  4. Calculate the normal distribution for the ages. To do this, use the NORMDIST function and the mean and standard deviation calculated in the descriptive statistics for age.
  5. Use the Line-Column custom chart in the chart wizard to plot both the percentages and the normal distribution on the same graph. (Eliminate the gaps between the bars as we did before.)

Your graph should look something like this:

Graph with histogram and normal curve

Deliverables

Submit your excel workbook using Course On-Line. Make sure that the answers to questions 5.2 through 5.7 are included in your workbook.