
CSC 423 -- 7/24/17

Review Exercises

  1. What does a 99% confidence interval for the population mean tell you? (Also see Problems 2 and 6.) Find a 99% confidence interval for these IQ scores:
     

    Ans: The population mean μ is fixed; it does not change no matter how many random samples you draw. It is the confidence interval that changes from sample to sample. If the population is normally distributed and you construct a 99% confidence interval from each of many random samples, about 99% of those intervals will contain the true population mean μ.
     
    To construct the 99% confidence interval for the true mean μ you need the sample mean x̄ and the standard error of the mean SEmean = s / √n (computed from the sample standard deviation s and the sample size n). Because n < 30, we use the t-distribution instead of the normal distribution to construct the confidence interval. (My advice is to always use the t-distribution, even when n ≥ 30.)
     

    Look up the value for a 99% confidence interval (alpha = 0.01, so 0.005 in each tail) in the t-table in these Statistical Tables using n - 1 = 3 degrees of freedom: (-5.841, 5.841). Then

    x̄ ± t × SEmean = 105.75 ± 5.841 × (3.096 / √4) = 105.75 ± 9.04

    so the 99% confidence interval for μ is (96.71, 114.79).
     
    You can also obtain the t-table value from R with the qt function, and from SAS in a data step with the tinv function.
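    As a minimal sketch of the t-table lookup in R, the block below asks qt for the 0.995 quantile (0.5% in each tail for a 99% interval) with 3 degrees of freedom, and rebuilds the interval from the summary statistics used in these exercises (mean 105.75, s = 3.096, n = 4). The variable names are illustrative only.

```r
# Degrees of freedom for a sample of size n = 4
df <- 4 - 1

# Two-sided 99% interval: 0.5% in each tail, so use the 0.995 quantile
t99 <- qt(0.995, df)

# Two-sided 95% interval, for comparison: the 0.975 quantile
t95 <- qt(0.975, df)

# Rebuild the 99% interval from the summary statistics in these exercises
xbar <- 105.75
se   <- 3.096 / sqrt(4)
ci99 <- xbar + c(-1, 1) * t99 * se

print(round(c(t99 = t99, t95 = t95), 3))
print(round(ci99, 2))
```

    Note that the 0.975 quantile (about 3.182) belongs to a 95% interval; a 99% interval needs the larger 0.995 quantile.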
  2. Use SAS and R to obtain a 99% confidence interval for the population mean of the dataset in Exercise 1. Ans:
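    One way to do the R half of this exercise is sketched below. The scores in iq are HYPOTHETICAL: they were chosen only so that the mean is 105.75 and the standard deviation is about 3.096, matching the summary statistics in these notes, and should not be read as the original data. The sketch uses t.test with conf.level = 0.99 and checks the result by hand.

```r
# HYPOTHETICAL scores: chosen only so that mean = 105.75 and sd is about 3.096.
# These are not the original data from the exercise.
iq <- c(103, 104, 106, 110)

# t.test reports a confidence interval for the mean; the default level is 0.95
res <- t.test(iq, conf.level = 0.99)
print(res$conf.int)

# The same interval computed by hand: mean +/- t * s / sqrt(n)
n          <- length(iq)
half_width <- qt(0.995, df = n - 1) * sd(iq) / sqrt(n)
by_hand    <- mean(iq) + c(-1, 1) * half_width
print(by_hand)
```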
     
     
  3. Which statistics can SAS proc means compute? Which R functions compute these same statistics?
    Ans: Recall that the SAS proc means statement without any options computes these simple descriptive statistics: sample size, mean, standard deviation, minimum, and maximum. However, proc means can also compute many other statistics if they are requested as options. Here are the proc means options:
    SAS proc means option   Meaning                                         R function
    ---------------------   ---------------------------------------------   --------------------------
    n                       Sample size                                     length
    mean                    Sample mean                                     mean
    std                     Sample standard deviation                       sd
    stderr                  Standard error of the mean
    min                     Minimum value of the sample                     min
    max                     Maximum value of the sample                     max
    skewness                Measures the skewness of the sample;            skewness (moments package)
                            positive value means skewed to the right;
                            negative value means skewed to the left
    kurtosis                Measures the thickness or thinness of the       kurtosis (moments package)
                            tails of the sample; positive value means
                            thick tails relative to a normal distribution;
                            negative value means thin tails relative
                            to a normal distribution
    lclm                    Lower confidence limit for the mean             t.test
    uclm                    Upper confidence limit for the mean             t.test
    clm                     Both lclm and uclm                              t.test
    p1 p5 p10 p25 p50       These percentiles                               quantile
    p75 p90 p95 p99
    q1                      25th percentile                                 quantile
    q3                      75th percentile                                 quantile
    qrange                  Interquartile range (q3 - q1)                   IQR
    t                       Test statistic for one-sample t-test            t.test
                            with H0: μ=0
    prt                     Two-sided p-value for one-sample t-test         t.test
                            with H0: μ=0

    To obtain confidence intervals (clm, lclm, and/or uclm) other than 95%, specify the value of alpha with the alpha= option; for example, alpha=0.01 gives 99% confidence limits.
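    The R column of the table can be tried out with a short sketch like the one below, which uses base R only. The data vector x is HYPOTHETICAL (chosen to match the summary statistics in these notes), the standard error is computed by hand because the table lists no R function for it, and R's default quantile method may not match SAS's default percentile definition, so q1 and q3 can differ slightly between the two systems.

```r
# HYPOTHETICAL data, not the original sample from the exercises
x <- c(103, 104, 106, 110)

n  <- length(x)
se <- sd(x) / sqrt(n)   # stderr: computed by hand from sd and n

stats <- c(
  n      = n,
  mean   = mean(x),
  std    = sd(x),
  stderr = se,
  min    = min(x),
  max    = max(x),
  q1     = quantile(x, 0.25, names = FALSE),
  q3     = quantile(x, 0.75, names = FALSE),
  qrange = IQR(x)
)
print(round(stats, 3))

# clm with alpha = 0.01 corresponds to conf.level = 1 - alpha in t.test
tt <- t.test(x, mu = 0, conf.level = 0.99)
print(tt$conf.int)    # lclm and uclm
print(tt$statistic)   # t  (test statistic for H0: mu = 0)
print(tt$p.value)     # prt (two-sided p-value for H0: mu = 0)
```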
     
  4. What are the assumptions for the one-sample z-test? For the one-sample t-test?
     
  5. What is the definition of the p-value for a statistical test?
     
  6. To test whether eating fish increases intelligence, a researcher selects a random sample of four persons and puts them on a fish-rich diet for one year. At the end of the year, she gives each subject an intelligence test. Here are the results:
     

    1. Test the null hypothesis that eating fish does not make a difference using a t-test. Ans:
        Step 1: State the null and alternative hypotheses: H0: μ=100, H1: μ≠100
        Step 2: Compute the test statistic: t = (x̄ - μ) / SEmean = (105.75 - 100) / (3.096 / √4) = 3.715
        Step 3: Find the interval that contains 95% of the t-distribution using the t-table with n - 1 = 4 - 1 = 3 degrees of freedom: I = [-3.182, 3.182].
        Step 4: 3.715 ∉ [-3.182, 3.182], so reject H0 at the 0.05 level.
        Step 5: Use SAS or R to find the p-value: 0.034. Since p < 0.05, this confirms that H0 should be rejected.

    2. Suggest a way to improve the design of the experiment.

    Ans: (1) choose a larger sample size, (2) use a treatment group that eats fish and a control group that does not, then use an independent-sample t-test.
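    The five steps of the t-test in Exercise 6 can be sketched in R as follows. The scores in iq are HYPOTHETICAL: they were chosen only to reproduce the summary statistics used in the answer (mean 105.75, s about 3.096, n = 4) and are not the original results. The by-hand steps are compared with what t.test reports.

```r
# HYPOTHETICAL scores, chosen to match the summary statistics in the answer
iq  <- c(103, 104, 106, 110)
mu0 <- 100                                        # Step 1: H0: mu = 100

n      <- length(iq)
t_stat <- (mean(iq) - mu0) / (sd(iq) / sqrt(n))   # Step 2: test statistic
t_crit <- qt(0.975, df = n - 1)                   # Step 3: 95% cutoff, 3 df
reject <- abs(t_stat) > t_crit                    # Step 4: outside [-t_crit, t_crit]?
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)        # Step 5: two-sided p-value

print(round(c(t = t_stat, t_crit = t_crit, p = p_val), 3))
print(reject)

# t.test carries out the same test in one call
res <- t.test(iq, mu = mu0)
print(res)
```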

  7. Explain the difference between the paired-sample t-test and the independent two-sample t-test.
    Ans: With the paired-sample t-test there is a natural pairing between the observations in Group A and the observations in Group B. This pairing may reduce the variability and increase the chances of rejecting the null hypothesis or of obtaining a low p-value.
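    To illustrate how pairing can reduce variability, here is a small R sketch with HYPOTHETICAL measurements on the same five subjects (the numbers are invented for illustration). It runs both tests on the same data and also shows that the paired test is the same as a one-sample t-test on the within-pair differences.

```r
# HYPOTHETICAL measurements: the i-th entries of each vector form a pair
group_a <- c(12.1, 14.3, 11.8, 15.2, 13.5)
group_b <- c(12.9, 15.0, 12.5, 16.1, 14.1)

# Paired test: uses the within-pair differences
paired <- t.test(group_b, group_a, paired = TRUE)

# Independent two-sample test (Welch's version is R's default): ignores pairing
indep <- t.test(group_b, group_a)

# The paired test is equivalent to a one-sample test on the differences
on_diff <- t.test(group_b - group_a)

print(c(paired = paired$p.value, independent = indep$p.value))
```

    With these numbers the subject-to-subject variation is large compared with the within-pair differences, so the paired test gives a much smaller p-value than the independent test.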
     
  8. For each of these scenarios, which t-test would you use? Ans:
    1. Independent
    2. Paired, pair each subject in the study with his or her twin
    3. You could assign whole houses to paint brands (independent) or randomly choose exterior walls of each house to paint (paired)
    4. Paired, pair up the men and the women and let both subjects in the pair drive the same car
    5. Paired, let each tester in the study evaluate both websites
    6. Paired, pair up the paper measurements by measurer

  9. To see how R can get confused when trying to read from a UTF-8 data file:
     

    There are two ways to correct this problem:
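    The notes' own example and fixes are not reproduced here. As a sketch of one common source of this kind of confusion, and as an assumption about what is meant, the block below writes a small UTF-8 file that starts with a byte order mark (BOM), reads it naively, and then reads it with fileEncoding = "UTF-8-BOM", which R accepts for reading and which strips the BOM. The file contents and variable names are hypothetical.

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF)
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("x,y\n1,2\n3,4\n")), path)

# Naive read: the first column name may come back garbled
# (the exact result depends on the platform and locale)
naive <- read.csv(path)
print(names(naive))

# Telling R about the BOM removes it while reading
fixed <- read.csv(path, fileEncoding = "UTF-8-BOM")
print(names(fixed))

unlink(path)
```

    Another possible remedy, again an assumption rather than something stated in these notes, is to re-save the data file without a BOM before reading it.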
     

 

Some Arguments for R Plots

 

Examples of Correlation

 

Covariance and Correlation

 

Simple Linear Regression

 

Project 2

 

Least Squares Estimators