
IT 403 -- Apr 15, 2016

Review Exercises

  1. Approximately, what percentage of observations are in the interval [140, 450) for this density histogram?
    0.005 +
          |
    0.004 +           +--------+
          |           |        |
    0.003 +           |        |
          |           |        |
    0.002 +     +-----+        |
          |     |     |        |
    0.001 +     |     |        +-----------+
          |     |     |        |           |
    0.000 +-----+-----+-----+--+--+-----+--+--+
          0    100   200   300   400   500   600   
    Answer: Recall that the percent of observations in each interval is the area of the rectangle over that interval, which is computed as
    area = base * height = b * h.
    Let area [l, r] be the area of the histogram bar over the interval [l, r].
         area[140, 450] = area[140, 200] + area[200, 350] + area[350, 450].
    area[200, 350] is the entire rectangle so the percent of observations in this rectangle is
         area = base * height = (350 - 200) * 0.004 = 150 * 0.004 = 0.6 = 60%.
    However, the interval [140, 200] only uses part of the rectangle over [100, 200].
         We use only the part of the base that lies in [140, 200] to compute the area over that interval:
         area = base * height = (200 - 140) * 0.002 = 60 * 0.002 = 0.12 = 12%.
    Likewise, the interval [350, 450] uses only part of the third rectangle, which spans [350, 550]. The area over [350, 450] is
         area = base * height = (450 - 350) * 0.001 = 0.1 = 10%.
    The total area over [140, 450] is 0.12 + 0.6 + 0.1 = 0.82 = 82%.
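
The calculation in Exercise 1 can also be checked with a short Python sketch. The bar boundaries and heights below are read off the histogram sketch, and the names BARS and area_between are made up for this illustration. For each bar, the code multiplies the bar's height by the width of the part of the bar that falls inside the interval.

```python
# Bars of the density histogram in Exercise 1, as (left, right, height).
# These values are read off the sketch, so treat them as approximate.
BARS = [(100, 200, 0.002), (200, 350, 0.004), (350, 550, 0.001)]


def area_between(bars, lo, hi):
    """Return the fraction of observations between lo and hi."""
    total = 0.0
    for left, right, height in bars:
        # Width of the overlap between [lo, hi] and this bar (0 if none).
        overlap = max(0.0, min(hi, right) - max(lo, left))
        total += overlap * height
    return total


fraction = area_between(BARS, 140, 450)
print(f"{fraction:.0%}")  # 82%
```

As a sanity check, area_between(BARS, 100, 550) should come out to 1, because the total area of a density histogram is 100%.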
  2. What is the label for the vertical axis of the histogram in Exercise 1?
    Ans: Fraction of observations per horizontal unit. The label for the horizontal axis is horizontal units.
  3. Compute the mean and median for this density histogram. To compute the mean, use a weighted average: the weights are the areas of the histogram bars and the observations are the midpoints of the bars.
    0.6  +
         |
    0.5  +                 +-------+
         |                 |       |
    0.4  +                 |       |
         |                 |       |
    0.3  +         +-------+       |
         |         |       |       |
    0.2  +         |       |       |
         |         |       |       |
    0.1  + +-------+       |       +-------+
         | |       |       |       |       |
    0.0  + +-------+-------+-------+-------+
           0       1       2       3       4
    
    Answer: the third histogram bar contains the median. (1-0)*0.1 + (2-1)*0.3 = 0.4 of the observations are to the left of 2; (1-0)*0.1 + (2-1)*0.3 + (3-2)*0.5 = 0.9 of the observations are to the left of 3. Since 0.4 < 0.5 < 0.9, the median lies between 2 and 3. Therefore, if m is the median,
    (0.5-0.4)/(0.9-0.4) = (m-2)/(3-2)
    0.1/0.5 = (m-2)/1
    0.2 = m - 2
    m = 2.2

    To compute the mean, we use the weighted average:
    x1 w1 + ... + xn wn    0.5*0.1 + 1.5*0.3 + 2.5*0.5 + 3.5*0.1 
    -------------------- = -------------------------------------
       w1 + ... + wn               0.1 + 0.3 + 0.5 + 0.1
       
    0.05 + 0.45 + 1.25 + 0.35
    ------------------------- = 2.1
               1.0
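
The median and mean from Exercise 3 can be reproduced with a small Python sketch. The function names histogram_mean and histogram_median are made up for this illustration, and the bars are entered as (left, right, height) triples read off the histogram. The median is found by linear interpolation inside the bar that contains the halfway point, which is the same proportion set up in the answer above.

```python
# Bars of the density histogram in Exercise 3, as (left, right, height).
BARS = [(0, 1, 0.1), (1, 2, 0.3), (2, 3, 0.5), (3, 4, 0.1)]


def histogram_mean(bars):
    """Weighted average of the bar midpoints, weighted by bar area."""
    areas = [(right - left) * height for left, right, height in bars]
    midpoints = [(left + right) / 2 for left, right, _ in bars]
    return sum(m * a for m, a in zip(midpoints, areas)) / sum(areas)


def histogram_median(bars):
    """Point with half of the total area to its left (linear interpolation)."""
    total = sum((right - left) * height for left, right, height in bars)
    target = total / 2
    cumulative = 0.0
    for left, right, height in bars:
        area = (right - left) * height
        if area > 0 and cumulative + area >= target:
            # Fraction of this bar's area needed to reach the halfway point.
            return left + (target - cumulative) / area * (right - left)
        cumulative += area
    raise ValueError("histogram has no area")


print(round(histogram_mean(BARS), 2))    # 2.1
print(round(histogram_median(BARS), 2))  # 2.2
```

The mean (2.1) is slightly smaller than the median (2.2) here.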
    
  4. What does a parsimonious description of a histogram mean?
    Answer: it means the histogram can be described succinctly, with very few descriptors. In the case of a normal histogram, it can be described with only two descriptors: the sample mean and the sample standard deviation.
  5. What does it mean to say that the normal distribution is ubiquitous in statistics?
    Ans: It means that the normal distribution shows up everywhere.
  6. What is the ideal measurement model?
    Ans: It says that
    actual measurement = true measurement + random error
    The true measurement could be a prototype weight that does not change. Or, in the case of measuring the weights of potatoes, the true measurement could be the "ideal potato" that nature is trying to produce. The random errors for the potatoes are just random variations due to genetics and the environment in which the potato grows.
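
The ideal measurement model can be illustrated with a small simulation. This is only a sketch under stated assumptions: the model itself says only that the errors are random, so drawing them from a normal distribution with mean 0 is an assumption made here. The function name simulate_measurements and the numbers (a 10-gram prototype weight with errors of about 0.05 grams) are hypothetical.

```python
import random
import statistics


def simulate_measurements(true_value, error_sd, n, seed=0):
    """Simulate n measurements: actual = true + random error.

    Assumption: the errors are normal with mean 0 and standard
    deviation error_sd. The model itself only says "random".
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_value + rng.gauss(0, error_sd) for _ in range(n)]


# Hypothetical values: a 10-gram prototype weight, errors of about 0.05 g.
measurements = simulate_measurements(true_value=10.0, error_sd=0.05, n=1000)
print(statistics.mean(measurements))   # close to the true value, 10.0
print(statistics.stdev(measurements))  # close to the error SD, 0.05
```

With many measurements, the random errors tend to cancel out, so the sample mean lands near the true measurement and the sample standard deviation reflects the size of the random errors.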
  7. What are the official definitions of the meter, second, and kilogram?
    Ans: See the Ideal Measurement Model document.

The Standard Deviation

z-scores

Project 2

The Normal Distribution