CSC 239: Personal Computing for Science

First Assignment (Due 11:30pm Monday 19 September 2005)

Reading

Read all lecture notes and slides from the first full week of class.

Assignment

Forbes magazine lists the world's richest people. The data in the file richest.txt are taken from Forbes magazine's 2004 list of the world's billionaires. The data set contains 6 columns:
  1. Rank
  2. Name
  3. Age
  4. Wealth in billions
  5. Citizenship
  6. Residence
For this assignment, you will focus age and wealth.

Download the file richest.txt into your work directory. (Click on "File" in Internet Explorer and select "Save As..." – Then save the file in your work directory as a text file. Alternately, you may right-click the hyperlink and choose "Save Target As...")

  1. Import the data set into Excel, using the Data wizard. (Click on "Data", select "Import External Data", then select "Import Data" and import the text file richest.txt. A "Text Import Wizard" window will open up. Click on "Next", check the "Space" option, then click "Next" and "Finish". When the "Import Data" window opens up, click on "OK". You should now have the data in two columns. Mileage may vary, depending on your version of Excel. (For a challenge in newer versions of Excel, use Data|Import External Data|New Web Query to import the data directly from the web. If you do this, you will also need to use Data|Text to Columns... to get the data into two columns.)
  2. Prepare the table for analyzing just Age and Worth by doing the following:
    1. Label the Age and Worth columns.
    2. Delete the other columns
    3. Delete the rows where Age is not specified.
  3. Compute the descriptive statistics using the "descriptive statistics tool" for the observed variables age and worth.
  4. Compute the five-number summary (min, Q1, median, Q3, max) for the observed variables age and worth.
  5. Plot the histogram for the observations on age – (with bin intervals of size 7 starting at 26 and going to 96)
  6. Plot the histogram for the observations on worth; (what bins will you use?)
  7. Answer the following questions based on your Excel output:
    1. Looking at the histograms, is the distribution of the age of the billionaires symmetric, clearly skewed or neither? What about the distribution of their worth?
    2. Based on the shape of the distribution of Age, would you report the mean or the median? Why?
    3. Based on the shape of the distribution of Worth, which statistics would you use to describe the center and the spread of the data on Worth? Why?

Additional challenges

  1. Excel always creates a bin called "More" when computing histograms. If you choose your bins intelligently, this bin will always be empty. Unfortunately, it also gets included in your histogram chart. Modify your histogram chart so that the more bin is not included.
  2. Note that if you chose your first bin value as less than the smallest value in your data, the first bin is empty. This is because Excel counts your first bin value as the right endpoint of a bin containing everything less than your first bin value. Modify your chart to remove this empty bin also.
  3. Record a macro of you modifying the histogram for age to eliminate space between the bars. (Make sure you select the chart before you begin running the macro.) Run your macro on the histogram for worth. (Again, make sure you select the worth chart before you run the macro.)

Deliverables

Submit your excel workbook using Course On-Line. Make sure that the answers to questions 7.1 through 7.3 are included as text entries in your workbook.