
IT 223 -- June 3, 2024

Review Exercises

  1. If X is a random variable with expected value E(X), variance Var(X), and standard deviation σX, and X1, ... , Xn are observations with the same distribution as X, let S = X1 + ... + Xn be their sum and X̄ = S / n be their mean. Use the formulas that you know to obtain the following:
          E(S)   Var(S)   σS    zS
          E(X̄)   Var(X̄)   σX̄    zX̄
    Answer:
          E(S) = n * E(X1)
          Var(S) = n * Var(X1), if X1, ... , Xn are independent.
          σS = √n * σX1, if X1, ... , Xn are independent.
          zS = (S - E(S)) / σS
          E(X̄) = E(X1)
          Var(X̄) = Var(X1) / n, if X1, ... , Xn are independent.
          σX̄ = σX1 / √n, if X1, ... , Xn are independent.
          zX̄ = (X̄ - E(X̄)) / σX̄
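
A simulation can serve as a sanity check on the standard results E(S) = n * E(X1), Var(S) = n * Var(X1), E(X̄) = E(X1), and Var(X̄) = Var(X1) / n. The sketch below assumes, purely for illustration, that each Xi is one roll of a fair die; the seed, the sample size n, and the number of repetitions are arbitrary choices, not values from the exercise.

```r
# Simulation sketch: compare simulated moments of S and X-bar with the formulas.
# Assumption (not from the notes): each Xi is one roll of a fair die.
set.seed(223)            # arbitrary seed so the run is repeatable
n    <- 25               # observations per sample (arbitrary)
reps <- 20000            # number of simulated samples (arbitrary)

# Exact mean and variance of a single fair die roll
faces <- 1:6
ex1   <- mean(faces)                  # E(X1) = 3.5
varx1 <- mean((faces - ex1)^2)        # Var(X1) = 35/12

# Each column of `rolls` is one sample of n independent rolls
rolls <- matrix(sample(faces, n * reps, replace = TRUE), nrow = n)
s    <- colSums(rolls)                # simulated values of S
xbar <- colMeans(rolls)               # simulated values of X-bar

# Simulated values next to the values given by the formulas
results <- rbind(
  "E(S)"      = c(simulated = mean(s),    formula = n * ex1),
  "Var(S)"    = c(simulated = var(s),     formula = n * varx1),
  "E(Xbar)"   = c(simulated = mean(xbar), formula = ex1),
  "Var(Xbar)" = c(simulated = var(xbar),  formula = varx1 / n)
)
print(round(results, 3))
```

The two columns should agree closely but not exactly, because the simulated column is subject to chance variation.
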
  2. Look at Problem 3a of Project 4.  Which R statement do you use to simulate 1,200 rolls of a fair die?
    Answer: You can simulate these 1,200 rolls using rbinom in two ways:
    > # Method 1: Repeat 1,200 times the 
    > # experiment of rolling a die once
    > rbinom(1200, 1, 1/6)
    
    The only problem with Method 1 is that it returns a vector of 1,200 zeros and ones, so we still need to count how many ones were rolled with the die. Better: use the R sum function to add up how many ones are obtained:
    > # Better method 1:
    > sum(rbinom(1200, 1, 1/6))
    
    The alternative is to conduct one experiment of rolling a die 1,200 times:
    > # Method 2
    > rbinom(1, 1200, 1/6)
    
    This function call returns the number of successes (k) obtained in n = 1,200 tries with probability p = 1/6 of success, where "success" is obtaining a one with a single die roll.
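
The two rbinom calls can be run side by side to see that both produce a count near the expected value n * p = 200. This is a minimal sketch: the seed and variable names are arbitrary, and the final sample() call is an alternative not used in the notes, shown only because it records the face of every roll instead of just whether a one came up.

```r
# Two ways to count the ones in 1,200 rolls of a fair die
set.seed(223)                  # arbitrary seed so the run is repeatable
n_rolls <- 1200
p_one   <- 1/6

count1 <- sum(rbinom(n_rolls, 1, p_one))   # 1,200 experiments of one roll each
count2 <- rbinom(1, n_rolls, p_one)        # one experiment of 1,200 rolls

# Both counts vary around n * p, with standard deviation sqrt(n * p * (1 - p))
expected <- n_rolls * p_one
sd_count <- sqrt(n_rolls * p_one * (1 - p_one))
print(c(method1 = count1, method2 = count2, expected = expected))

# Alternative (an assumption, not from the notes): simulate the faces themselves
faces  <- sample(1:6, n_rolls, replace = TRUE)
count3 <- sum(faces == 1)                  # number of ones among the faces
```

The counts differ from run to run and from each other, since each call draws its own random numbers.
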
  3. What are the five steps for the z-test? Answer:
    1. State the null (H0) and alternative (H1) hypotheses:
            H0: μ = μ0
            H1: μ ≠ μ0
    2. Compute the test statistic z:
            z = (x̄ - μ0) / (SDx/√n)
       Note: when can we use the z-test? We can use the z-test when n ≥ 30. Even if the observations are not normally distributed, by the Central Limit Theorem x̄ is close to normally distributed, so the test statistic z also has close to a normal distribution.
    3. Write down the confidence interval I for the test statistic. Usually we use the 95% confidence interval [-1.96, 1.96] or the 99% confidence interval [-2.58, 2.58].
    4. If z ∈ I, accept H0; if z ∉ I, reject H0.
    5. Compute the p-value, which is the probability of obtaining a test statistic value z as extreme or more extreme than the value of z actually obtained, given that the null hypothesis is true.
  4. Use the R function qnorm to verify that a 95% confidence interval for normally distributed data is [-1.96, 1.96].  Also verify that a 99% confidence interval is [-2.58, 2.58].
    Answer: for a 95% confidence interval, the area in the two tails is 100% - 95% = 5% and the area of one tail is 5% / 2 = 2.5% = 0.025.  Using R:
    > qnorm(0.025)
    [1] -1.959964
    
    For a 99% confidence interval, the area of the two tails is 100% - 99% = 1% and area in one tail is 1% / 2 = 0.5% = 0.005. Using R:
    > qnorm(0.005)
    [1] -2.575829
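
The same check can be done for both endpoints and both confidence levels at once, since qnorm accepts a vector of probabilities. This is a small sketch; the variable names are arbitrary.

```r
# Both endpoints of the interval for each confidence level
conf_levels <- c(0.95, 0.99)
alpha <- 1 - conf_levels           # total area in the two tails
lower <- qnorm(alpha / 2)          # cuts off the lower tail
upper <- qnorm(1 - alpha / 2)      # cuts off the upper tail
print(round(cbind(conf_levels, lower, upper), 2))

# Check in the other direction: the area between the endpoints
# should equal the confidence level
area_between <- pnorm(upper) - pnorm(lower)
print(area_between)
```

By the symmetry of the normal curve, each upper endpoint is the negative of the corresponding lower endpoint.
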
    
  5. In 1999, it was reported that the mean serum cholesterol level for female undergraduates was 168 mg/dl. A recent study at Baylor University collected the following data for cholesterol levels for 100 females:
          x̄ = 173.7     SD+ = 27
    Is there a significant difference between the cholesterol levels of the women in the Baylor study and the value reported in 1999? Perform the test at the 5% level and at the 1% level.
  6. Claim: if all high school seniors in California took the SAT test, the mean score would be equal to 450. To test this claim, select a random sample of 400 high school seniors and give them the test. Here are the data:
    n = 400     x̄ = 461     SD+ = 100
    Is this result for the sample significantly different from 450 or is it just chance variation? Perform the test at the 1% level (99% confidence).
    Ans: Here are the steps of the z-test:
    1. H0: μ = 450      H1: μ ≠ 450
    2. z = (x̄ - μ) / SEave = (461 - 450) / (100 / √400) = 2.2.
    3. A 99% confidence interval for z is [-2.58,2.58].
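    4. 2.2 ∈ [-2.58, 2.58], so accept the null hypothesis: at the 1% level, the difference can be explained by chance variation.
    5. The p-value is the probability of obtaining a z-value as extreme or more extreme than the one actually obtained. Find the area in the two tails outside the bin [-2.2, 2.2]: 2 × 0.0139 = 0.0278. Since 0.0278 > 0.01, this agrees with the decision in step 4.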
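
The arithmetic in these steps can be checked in R from the summary statistics alone. The helper z_test below is hypothetical (it is not a built-in R function and does not appear in the notes); it is a minimal sketch of the two-tailed z-test, using only qnorm and pnorm.

```r
# Hypothetical helper (not from the notes): two-tailed z-test
# computed from summary statistics.
z_test <- function(xbar, mu0, s, n, level = 0.95) {
  se   <- s / sqrt(n)                      # standard error of the mean
  z    <- (xbar - mu0) / se                # test statistic
  crit <- qnorm(1 - (1 - level) / 2)       # upper endpoint of the interval
  p    <- 2 * pnorm(-abs(z))               # area in both tails beyond |z|
  list(z = z, interval = c(-crit, crit),
       reject = abs(z) > crit, p_value = p)
}

# Data from the SAT exercise: n = 400, sample mean 461, SD+ = 100
res <- z_test(xbar = 461, mu0 = 450, s = 100, n = 400, level = 0.99)
print(res$z)          # test statistic
print(res$interval)   # 99% interval for z
print(res$reject)     # TRUE means z falls outside the interval
print(res$p_value)    # two-tailed p-value
```

Calling the helper with level = 0.95 instead shows how the decision depends on the chosen level.
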
  7. How is the t-test different from the z-test?
  8. Who invented the t-test?
    Answer: William Gosset (1876 - 1937) invented the t-test to control quality when brewing beer at the Guinness brewery in Dublin, Ireland.  Guinness did not allow employees to publish their research because it was proprietary, so Gosset published his research on the t-test under the pen name "Student." Even today, the t-test is often called Student's t-test.

The t-test

Degrees of Freedom

The Paired Sample t-test

Perform the Pendulum Experiment

Inference for Simple Linear Regression

Tests of Hypotheses for Proportions

One-tailed vs. Two-tailed Tests

  1. At the risk of complicating things, our formulation of the z-test is phrased as a two-tailed test, where
          H0: μ = c      H1: μ ≠ c
    For a 5% level test (95% confidence), this means that we reject H0 when the test statistic is not in the confidence interval I = [-1.96, 1.96].
  2. It is also possible to phrase the z-test (and the t-tests we will discuss later) as one-tailed tests, for which the null and alternative hypotheses are
          H0: μ = c      H1: μ > c
    or
          H0: μ = c       H1: μ < c
    In the case of H1: μ > c, for a 5% level test we reject H0 when the test statistic is not in the interval I = (-∞, 1.645], so that the entire 5% rejection area lies in the upper tail.
  3. Some researchers think that the one-tailed test is an improvement over the two-tailed test because the test of hypothesis is more precise.
  4. However, other researchers think that the one-tailed test is cheating, because it makes rejecting the null hypothesis easier.
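
The difference between the two formulations shows up in the cutoffs and in the p-values. The sketch below uses a hypothetical test statistic z = 1.8 (chosen for illustration, not taken from any exercise) with the alternative H1: μ > c, at the 5% level.

```r
# One-tailed vs. two-tailed cutoffs at the 5% level
alpha    <- 0.05
crit_two <- qnorm(1 - alpha / 2)   # two-tailed cutoff, about 1.96
crit_one <- qnorm(1 - alpha)       # one-tailed cutoff, about 1.645

z <- 1.8                           # hypothetical test statistic

# p-values: both tails vs. the upper tail only
p_two <- 2 * pnorm(-abs(z))
p_one <- pnorm(z, lower.tail = FALSE)

# Decisions under each formulation
reject_two <- abs(z) > crit_two    # two-tailed: reject if z is outside [-1.96, 1.96]
reject_one <- z > crit_one         # one-tailed (H1: mu > c): reject if z > 1.645
print(c(p_two = p_two, p_one = p_one))
print(c(reject_two = reject_two, reject_one = reject_one))
```

For a positive z the one-tailed p-value is exactly half the two-tailed p-value, which is why the same data can lead to rejecting H0 in the one-tailed test but not in the two-tailed test.
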