IT 223 -- May 4, 2026

Review Exercises

  1. If X is a random variable with expected value E(X), variance Var(X), and standard deviation σX, use the formulas that you know to obtain the following:
          E(S)   Var(S)   σS    zS
          E(X̄)   Var(X̄)   σX̄    zX̄
    Assume that S = X1 + X2 + ... + Xn and X̄ = S / n, where the Xi are all independent and have the same distribution as X.
    Answer:
          E(S) = n * E(X1)
          Var(S) = n * Var(X1) if X1, ... , Xn are independent.
          σS = √n * σX1 if X1, ... , Xn are independent.
          zS = (S - E(S)) / σS
          E(X̄) = E(X1)
          Var(X̄) = Var(X1) / n if X1, ... , Xn are independent.
          σX̄ = σX1 / √n if X1, ... , Xn are independent.
          zX̄ = (X̄ - E(X̄)) / σX̄
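The formulas in Exercise 1 can be checked by brute force on a small case. The sketch below assumes a fair six-sided die as the distribution of X1 and takes n = 2. Both are illustrative choices, not part of the exercise. It enumerates all 36 equally likely outcomes and compares the enumerated values with the formula values.

```r
# Assumption: X1 and X2 are independent rolls of a fair six-sided die.
faces <- 1:6
n <- 2

# Population variance (divide by N, not N - 1), since every outcome is enumerated.
pop_var <- function(v) mean((v - mean(v))^2)

mu1  <- mean(faces)       # E(X1)
var1 <- pop_var(faces)    # Var(X1)

# All 36 equally likely values of S = X1 + X2 and of the mean Xbar = S / n.
S    <- as.vector(outer(faces, faces, "+"))
Xbar <- S / n

# Each row pairs the enumerated value with the value given by the formula.
check <- rbind(
  E_S      = c(enumerated = mean(S),             formula = n * mu1),
  Var_S    = c(enumerated = pop_var(S),          formula = n * var1),
  SD_S     = c(enumerated = sqrt(pop_var(S)),    formula = sqrt(n) * sqrt(var1)),
  E_Xbar   = c(enumerated = mean(Xbar),          formula = mu1),
  Var_Xbar = c(enumerated = pop_var(Xbar),       formula = var1 / n),
  SD_Xbar  = c(enumerated = sqrt(pop_var(Xbar)), formula = sqrt(var1) / sqrt(n))
)
print(check)
```

The two columns should agree in every row. Enumeration is only practical for small n, but it avoids the sampling noise of a simulation.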
  2. What is a confidence interval?
    Answer: a 95% confidence interval is an interval computed from a sample by a method that is expected to contain the unknown parameter, for example μ, for 95% of all possible samples, provided that the expected value of the sample mean is equal to μ.
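The phrase "95% of the time" can be illustrated with a small simulation. This is a minimal sketch under assumed values: the true mean, standard deviation, sample size, number of repetitions, and seed below are all hypothetical choices made for illustration.

```r
set.seed(223)            # arbitrary seed so the run is reproducible
mu    <- 50              # hypothetical true mean
sigma <- 10              # hypothetical true standard deviation
n     <- 40              # size of each sample
reps  <- 2000            # number of repeated samples

# For each repeated sample, record whether its 95% interval contains mu.
covered <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

coverage <- mean(covered)   # fraction of intervals that contain mu
coverage
```

The printed fraction should be close to 0.95, although it varies slightly with the seed.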
  3. What is a statistical test?
    Answer: it is a procedure for deciding whether the value of a sample statistic (e.g., the sample mean x̄) is consistent with a claimed value of a population parameter (e.g., the population mean μ). We write the claim as H0: μ = μ0, where μ0 is the claimed value. Here are the five steps of a z- or t-test:
    1. Write down the null hypothesis and the alternative hypothesis.
    2. Compute the value of the test statistic z or t.
    3. Find a 95% (or 99%) confidence interval I for z or t using the z- or t-tables.
    4. If z or t ∈ I, accept H0; if z or t ∉ I, reject H0.
    5. Compute the p-value, which is the probability of obtaining a test statistic as extreme as or more extreme than the value actually obtained, given that the null hypothesis is true.
  4. For a standard normal z-score, verify these probabilities using R:
    (a, b)          P(a ≤ z ≤ b)
    (-1, 1)         0.68
    (-2, 2)         0.95
    (-3, 3)         0.997
    Answer:
    > pnorm(1) - pnorm(-1)
    [1] 0.6826895
    > pnorm(2) - pnorm(-2)
    [1] 0.9544997
    > pnorm(3) - pnorm(-3)
    [1] 0.9973002
    
  5. Use the R function qnorm to verify that a 95% confidence interval for a standard normal z-score is [-1.96, 1.96]. Also verify that a 99% confidence interval for such a z-score is [-2.58, 2.58].
    Answer: for a 95% confidence interval, the area in the two tails is 100% - 95% = 5% and the area of one tail is 5% / 2 = 2.5% = 0.025.  Using R:
    > qnorm(0.025)
    [1] -1.959964
    
    For a 99% confidence interval, the area in the two tails is 100% - 99% = 1% and the area in one tail is 1% / 2 = 0.5% = 0.005. Using R:
    > qnorm(0.005)
    [1] -2.575829
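The calls above give only the lower endpoint of each interval. By symmetry the upper endpoint is its negative. As a small sketch, qnorm also accepts a vector of probabilities, so both endpoints can be obtained in one call using the lower-tail areas α/2 and 1 - α/2:

```r
# Lower-tail areas alpha/2 and 1 - alpha/2 give both endpoints at once.
ci95 <- qnorm(c(0.025, 0.975))   # alpha = 0.05
ci99 <- qnorm(c(0.005, 0.995))   # alpha = 0.01

round(ci95, 2)
round(ci99, 2)
```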
    
  6. How is the t-test different from the z-test?
    Answer: for a z-test, n ≥ 30, so even if the random variable x is not normally distributed, the sample mean x̄ is approximately normally distributed by the Central Limit Theorem. Thus we can use the normal table to find confidence intervals for z-tests.

    For a t-test, we do not assume that n ≥ 30, so for a good result, the original data should be approximately normal. Create a normal plot of the data to check this. Since n is small, the sample standard deviation sx may not be a good approximation of the population standard deviation σx. Therefore we need a wider confidence interval to account for this extra uncertainty. For example, a 95% confidence interval for the z-test is (-1.96, 1.96), but a 95% confidence interval for a t-test with n = 5 is wider: (-2.78, 2.78).
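The widening of the interval for small n can be seen by comparing critical values from qt and qnorm. The sample sizes below are illustrative choices. The degrees of freedom passed to qt are n - 1, as discussed in Exercise 7.

```r
df     <- c(4, 9, 29, 99)          # n - 1 for n = 5, 10, 30, 100
t_crit <- qt(0.975, df = df)       # upper 2.5% critical values of the t distribution
z_crit <- qnorm(0.975)             # upper 2.5% critical value of the standard normal

# One row per sample size; t_crit shrinks toward z_crit as n grows.
data.frame(n = df + 1, df = df,
           t_crit = round(t_crit, 3),
           z_crit = round(z_crit, 3))
```

For n = 5 the t critical value rounds to the 2.78 quoted above, and it approaches 1.96 as n increases.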
  7. What are degrees of freedom? Why are they important for the t-test?
    Answer: when computing the sample standard deviation, the sum of the deviations xi - x̄ is always zero. These lines show why:
    (x1 - x̄) + (x2 - x̄) + ... + (xn - x̄) =
    (x1 + x2 + ... + xn) - (x̄ + x̄ + ... + x̄) = nx̄ - nx̄ = 0
    Because the sum of the deviations is zero, once n - 1 deviations have been computed, the nth deviation is already determined. This is why we say that there are only n - 1 degrees of freedom for the deviations. They matter for the t-test because the t distribution used to find the confidence interval depends on the degrees of freedom: for a one-sample t-test, use the t-table row with n - 1 degrees of freedom.
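The argument above can be checked numerically. The data vector in this sketch is hypothetical and chosen only to keep the arithmetic easy to follow.

```r
x   <- c(3, 7, 8, 12, 15)        # hypothetical data, n = 5
n   <- length(x)
dev <- x - mean(x)               # deviations xi - xbar
sum(dev)                         # zero (up to floating-point rounding)

# The last deviation can be recovered from the first n - 1 of them.
last_from_others <- -sum(dev[-n])
c(actual = dev[n], recovered = last_from_others)

# sd() divides the sum of squared deviations by n - 1, the degrees of freedom.
sd_by_hand <- sqrt(sum(dev^2) / (n - 1))
c(by_hand = sd_by_hand, builtin = sd(x))
```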
  8. How is the significance level for a test related to the confidence interval for the test statistic?
    Answer: if the confidence level of the interval for a test is 0.95, the test's significance level is 1 - 0.95 = 0.05. In general, if the significance level of a test is α, the confidence level of the interval is 1 - α.
  9. Who invented the t-test?
    Answer: William Gosset (1876 - 1937) invented the t-test to control the quality when brewing beer at the Guinness Company outside of Dublin, Ireland. Guinness did not allow employees to publish their research because it was proprietary. Gosset published his research about the t-test using the pen name Student. Even today, the t-test is often called Student's t-test.
  10. Use R to perform the t-test for the CO Concentration Example from Mar 2. Answer:
    > t.test(conc, mu=70)
    
            One Sample t-test
    
    data:  conc
    t = 2.16, df = 4, p-value = 0.09689
    alternative hypothesis: true mean is not equal to 70
    95 percent confidence interval:
     67.774 87.826
    sample estimates:
    mean of x 
         77.8 
    
    Conclusion: because p = 0.09689 ≥ 0.05, do not reject the null hypothesis. There is not enough evidence to conclude that the true mean differs from 70.
    Additionally,
    1. Use the t-table to compute the 0.95 confidence interval for the t statistic.
      Answer: in the t-table, look in the Degrees of Freedom row 5 - 1 = 4 and the Upper Tail Probability column 0.025. The entry is 2.776.
    2. Verify the 95% confidence interval for μ in the R t-test output by solving this inequality:
            -t0.025,df ≤ t ≤ t0.025,df
      where
            t = (x̄ - μ) / (sx / √n)
      Answer: n = 5,   x̄ = 77.8,  sx = 8.074652, so
      t = (x̄ - μ) / (sx / √n) = (77.8 - μ) / (8.074652 / √5) = (77.8 - μ) / 3.611094
      Therefore the confidence interval is
      -2.776 ≤ t ≤ 2.776
      -2.776 * 3.611094 ≤ 77.8 - μ ≤ 2.776 * 3.611094
      -2.776 * 3.611094 - 77.8 ≤ -μ ≤ 2.776 * 3.611094 - 77.8
      -87.82 ≤ -μ ≤ -67.78
      67.78 ≤ μ ≤ 87.82
      so (67.78, 87.82) is the 95% confidence interval for μ. Up to rounding of the table value 2.776, this matches the confidence interval (67.774, 87.826) output by the t.test function.
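The same verification can be sketched in R using only the summary statistics quoted in the answer (n, x̄, and sx), since the raw conc data is not repeated in these notes. Here qt supplies the unrounded critical value in place of the t-table entry, and the variable names are illustrative.

```r
# Summary statistics quoted in the answer above.
n    <- 5
xbar <- 77.8
sx   <- 8.074652
mu0  <- 70                                   # hypothesized mean from t.test(conc, mu=70)

se     <- sx / sqrt(n)                       # standard error of the mean
t_crit <- qt(0.975, df = n - 1)              # unrounded version of the table value 2.776
ci     <- xbar + c(-1, 1) * t_crit * se      # 95% confidence interval for mu

t_stat  <- (xbar - mu0) / se                 # test statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value

round(ci, 3)
round(c(t = t_stat, p = p_value), 5)
```

The interval, t statistic, and p-value should agree with the t.test output shown earlier.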
  11. Perform a t-test to determine if student test scores in a class differ from the national average of 75. These are the test scores:
     78  82  71  75  80  72  79  74  77  76
    Perform the calculations "by hand" using R. Then verify your work using the R t.test function. Also use the t.test function at the 0.99 confidence level.
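One possible way to organize the work for Exercise 11 is sketched below. It follows the five steps from Exercise 3 and then checks the result against t.test. The variable names are illustrative, and the two-sided alternative is an assumption based on the word "differ" in the problem statement.

```r
scores <- c(78, 82, 71, 75, 80, 72, 79, 74, 77, 76)
mu0 <- 75                                    # national average under H0

# "By hand": H0: mu = 75 versus Ha: mu != 75
n       <- length(scores)
xbar    <- mean(scores)
sx      <- sd(scores)
t_stat  <- (xbar - mu0) / (sx / sqrt(n))     # test statistic
t_crit  <- qt(0.975, df = n - 1)             # 95% interval for t is (-t_crit, t_crit)
reject  <- abs(t_stat) > t_crit              # TRUE means reject H0 at the 0.05 level
p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value

c(xbar = xbar, sx = sx, t = t_stat, t_crit = t_crit, p = p_value)
reject

# Verification with the built-in function at the 0.95 and 0.99 confidence levels
res95 <- t.test(scores, mu = mu0)
res99 <- t.test(scores, mu = mu0, conf.level = 0.99)
res95
res99$conf.int
```

The t statistic and p-value do not depend on conf.level. Only the reported confidence interval changes, and it is wider at the 0.99 level.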

Practice Problems for t-tests

  1. In 1998, as an advertising campaign, a cookie company claimed that every 18-ounce bag contained an average of 1000 chocolate chips. Students at the Air Force Academy in Colorado Springs bought some randomly selected bags of cookies and counted the chocolate chips. Here is the dataset:
    1219 1214 1087 1200 1419 1121 1235 1345
    1244 1258 1356 1132 1191 1270 1295 1135
    
    1. Form the normal plot of the chocolate chip counts. Are the counts normally distributed?
    2. Perform a 99%-level test of hypothesis that the average chocolate chip count is 1000 per bag.
  2. Psychology experiments involve testing the ability of rats to navigate mazes. The mazes are classified according to difficulty, as measured by the average length of time it takes rats to find the food at the end. One researcher needs a maze that will take rats an average of about one minute to solve. He tests one maze on several rats, collecting these data:
    38.4 57.6 46.2 55.5 62.5 49.5 38.0
    40.9 62.8 44.3 33.9 93.8 50.4 47.9
    35.0 69.2 52.8 46.2 60.1 56.3 55.1
    
    1. Form the normal plot and box plot of the maze times. Are the times normally distributed?
    2. Test the hypothesis that the true completion time is one minute.
    3. Eliminate the outlier and perform the hypothesis test again.
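A possible starting point for the maze problem is sketched below. It assumes the times are recorded in seconds, so that one minute corresponds to mu = 60. The problem does not state the units. The outlier is identified with boxplot.stats, which applies the same 1.5 * IQR rule that boxplot uses to draw individual points.

```r
times <- c(38.4, 57.6, 46.2, 55.5, 62.5, 49.5, 38.0,
           40.9, 62.8, 44.3, 33.9, 93.8, 50.4, 47.9,
           35.0, 69.2, 52.8, 46.2, 60.1, 56.3, 55.1)
mu0 <- 60   # assumption: times are in seconds, so one minute is 60

# qqnorm(times); qqline(times)   # normal plot (uncomment to draw)
# boxplot(times)                 # box plot (uncomment to draw)

# Values lying more than 1.5 * IQR beyond the hinges.
out <- boxplot.stats(times)$out
out

# Test with all of the data, then again with the flagged value(s) removed.
test_all     <- t.test(times, mu = mu0)
trimmed      <- times[!(times %in% out)]
test_trimmed <- t.test(trimmed, mu = mu0)

c(p_all = test_all$p.value, p_trimmed = test_trimmed$p.value)
```

Comparing the two p-values shows how much a single extreme time influences the conclusion.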
  3. A food company marks the net weight of their potato chip bags as 28.3 grams. To test whether this claim is true, students collect and measure the net weights of bags. Here is the dataset:
    29.3  28.2  29.1  28.7  28.9  28.5
    
    1. Form the normal plot of the net weights of the potato chip bags. Are the weights normally distributed?
    2. Test the hypothesis that the true weight of potato chip bags is 28.3 grams.

Normal Approximation for the Binomial Distribution

The Paired Sample t-test

The Independent Two-Sample t-test

We will discuss this section again on May 9.