
Final Exam Practice Problems 1

Multiple Choice Questions

For each question, show your work or give a reason explaining your answer. 4 points for the reason, 1 point for the correct answer.

  1. Roger Cotes was the first to publish a study on
    a. how well IQ scores fit the normal distribution.
    b. the applications of probability to economics.
    c. the theory of errors in astronomy.
    d. the uses of statistics in genetics.
    Ans: c.
  2. Which of these is another name for categorical variables?
      a. Continuous    b. Nominal    c. Ordinal    d. Scale

    Ans: b. Nominal means that the values are category labels, not numeric measurements. This is the term used in SPSS.
  3. If Q1 = 1,230 and Q3 = 5,238, using the boxplot, the observation at 11,389 is
      a. a mild outlier   b. an extreme outlier   c. below the 75th percentile   d. the median 

    Ans: a. IQR = Q3 - Q1 = 5,238 - 1,230 = 4,008. The inner fence to the right is at Q3 + 1.5 * IQR = 11,250. The outer fence to the right is at Q3 + 3.0 * IQR = 17,262. 11,389 is between the inner and outer fences, so it is a mild outlier.
  4. What percentage of IQ scores are over 160, assuming that IQ scores are normally distributed?
      a. 0.32%    b. 0.032%    c. 0.0032%    d. 0.00032%

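    Ans: c: 0.0032%. Assuming a mean of 100 and an SD of 15, an IQ of 160 has z = (160 - 100) / 15 = 4. The area under the normal curve to the right of z = 4 is about 0.0000317, which is 0.0032%.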
  5. For the curve shown below, the points shown in red are
      a. asymptotes   b. critical points   c. inflection points   d. outliers

    Ans: c: inflection points.
  6. The residual plot that we use in it223 consists of
    a. residuals plotted vs. normal scores
    b. residuals plotted vs. observation number
    c. residuals plotted vs. predicted values
    d. y-values plotted vs. x-values

    Ans: c. Definition of residual plot.
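
The arithmetic behind questions 3 and 4 can be checked with a short Python sketch. The function names are made up for illustration, and the IQ mean of 100 and SD of 15 are assumptions, since the question does not state them.

```python
import math

def tukey_fences(q1, q3):
    """Return (lower inner, upper inner, lower outer, upper outer) fences."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr, q1 - 3.0 * iqr, q3 + 3.0 * iqr)

def classify(x, q1, q3):
    """Label x as a mild outlier, an extreme outlier, or neither."""
    lo_in, hi_in, lo_out, hi_out = tukey_fences(q1, q3)
    if x < lo_out or x > hi_out:
        return "extreme outlier"
    if x < lo_in or x > hi_in:
        return "mild outlier"
    return "not an outlier"

def normal_upper_tail(x, mean, sd):
    """P(X > x) for a normal distribution, using the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Question 3: Q1 = 1,230, Q3 = 5,238, observation at 11,389.
print(tukey_fences(1230, 5238))
print(classify(11389, 1230, 5238))

# Question 4: percentage of IQ scores over 160 (mean 100 and SD 15 are assumed).
print(100 * normal_upper_tail(160, 100, 15))
```

An observation counts as mild when it is beyond an inner fence but not beyond an outer fence, and as extreme when it is beyond an outer fence.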

Short Essay Questions

For full credit, use complete sentences and paragraphs. Your explanation should make sense to someone who does not understand statistics, like your mother.

  1. What is a histogram and in which situations is it useful? What are the tradeoffs of using a histogram with many bins vs. a histogram with few bins?
  2. What is the difference between the mean and the median? In which situations should each be used?

Problems

Show all of your work. You may use a calculator.

  1. Given the following data, draw the box plot.

    Q0 = 0.001     Q1 = 0.035     Q2 = 0.057     Q3 = 0.089     Q4 = 0.311
    Additional outliers are at 0.141, 0.189, 0.217, 0.240. You decide whether they are mild or extreme.
    Ans: See Review Problem 9 of the 7/22 notes.

  2. Given this table of grouped data, do the following:
    Bin       Percentage of Observations
    [1,3]     30%
    (3,4]     40%
    (4,5]     20%
    (5,6]      0%
    (6,10]    10%

    1. Draw the histogram.
    2. Compute Q1, Q2, Q3 and IQR.
      Ans: Interpolating within each bin, Q1 = 1 + (25/30)(2) = 2.667, Q2 = 3 + (20/40)(1) = 3.50, Q3 = 4 + (5/20)(1) = 4.25, IQR = Q3 - Q1 = 4.25 - 2.667 = 1.583
    3. Compute the mean using a weighted average.
      Ans: 3.7, using the bin midpoints: 0.30(2) + 0.40(3.5) + 0.20(4.5) + 0(5.5) + 0.10(8) = 3.7
    4. What percentage of the observations are greater than 2.5?
      Ans: 77.5%.
  3. Compute the correlation of this dataset:
    x: 1    2    3    4    5
    y: 2    4    7    3    4

    Here are the z-scores:
    zx: -1.414   -0.707   0.000   0.707   1.414
    zy: -1.195   0.000   1.793   -0.598   0.000

    Ans: r = 0.2535, the average of the five products zx * zy.
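
For checking Problems 2 and 3 by machine, here is a minimal Python sketch. It assumes that observations are spread evenly within each bin, so quantiles come from linear interpolation and the mean comes from bin midpoints. The function names are made up for illustration.

```python
import math

# Bins from Problem 2 as (low, high, share of observations).
BINS = [(1, 3, 0.30), (3, 4, 0.40), (4, 5, 0.20), (5, 6, 0.00), (6, 10, 0.10)]

def grouped_quantile(bins, p):
    """Quantile by linear interpolation inside the bin where the cumulative share reaches p."""
    cum = 0.0
    for low, high, share in bins:
        if share > 0 and cum + share >= p:
            return low + (p - cum) / share * (high - low)
        cum += share
    return bins[-1][1]

def grouped_mean(bins):
    """Weighted average of the bin midpoints."""
    return sum(share * (low + high) / 2 for low, high, share in bins)

def share_above(bins, x):
    """Share of observations greater than x, taking part of the bin that contains x."""
    total = 0.0
    for low, high, share in bins:
        if x <= low:
            total += share
        elif x < high:
            total += share * (high - x) / (high - low)
    return total

def correlation(xs, ys):
    """Pearson correlation from the sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

q1, q2, q3 = (grouped_quantile(BINS, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3, q3 - q1)
print(grouped_mean(BINS))
print(100 * share_above(BINS, 2.5))
print(correlation([1, 2, 3, 4, 5], [2, 4, 7, 3, 4]))
```

The correlation computed this way equals the average of the products of the z-scores when the z-scores are computed with the divide-by-n standard deviation.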

SPSS Analysis

Perform the following analyses with SPSS. Save your output file as a Word .doc file. Type any interpretation of the output into the output file itself.

  1. Input the Excel file tv-gpa.xlsx into SPSS.
  2. Supply labels to the variables as follows:
    Variable Name    Label
    Hours            Hours spent watching TV per week
    HS_GPA           High School GPA
  3. Determine the following for Hours and HS_GPA:
    Q0    Q1    Q2    Q3    Q4    mean    SD+
    Ans: Using Tukey's Hinges for Percentiles
    Hours:    Q0=2  Q1=5  Q2=9  Q3=14  Q4=14  mean=9.71  SD+=5.425
    HS_GPA: Q0=1.9  Q1=2.5  Q2=2.9  Q3=3.3  Q4=3.7  mean=2.871  SD+=0.5130
    1. Create the boxplot for Hours and HS_GPA.  Ans: Use SPSS.
    2. Determine the correlation between Hours and HS_GPA. Ans: -0.0626
    3. Which is the independent variable? Which is the dependent variable? Ans: Hours, HS_GPA
    4. Compute and interpret the r-squared value. Ans: 0.00392.
      R² is the proportion of variation in the dependent variable that can be attributed to the independent variable. Here, less than 0.4% of the variation in HS_GPA can be attributed to Hours.
    5. Find the regression equation for predicting the dependent variable from the independent variable.
      Ans: HS_GPA = -0.006 · Hours + 2.931
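    6. What is the predicted high school GPA for someone who watches TV 40 hours per week? Why is this prediction not likely to be very accurate? Ans: 2.691. Because R² is very small, so Hours explains almost none of the variation in HS_GPA. Also, 40 hours is far outside the range of the observed data, so the prediction is an extrapolation.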
    7. Create and interpret the residual plot. Ans: Use SPSS.
    8. Create and interpret the normal plot of the residuals. Ans: Use SPSS.
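
The prediction in part 6 can be reproduced outside SPSS with a short Python sketch. The slope, intercept and r-squared are the rounded values reported in the answers above. The function name is made up for illustration.

```python
import math

# Rounded values taken from the answers above.
SLOPE = -0.006
INTERCEPT = 2.931
R_SQUARED = 0.00392

def predict_gpa(hours):
    """Predicted high school GPA from weekly TV hours, using the fitted line."""
    return SLOPE * hours + INTERCEPT

# Part 6: prediction for 40 hours of TV per week.
print(round(predict_gpa(40), 3))

# Size of the correlation implied by r-squared; its sign matches the slope.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE)
print(round(r, 4))

# Share of the variation in HS_GPA attributed to Hours, as a percentage.
print(f"{100 * R_SQUARED:.2f}%")
```

Because the inputs are rounded, results computed this way may differ from the SPSS output in the last digit.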