Project 1

CSC 423 -- Project 1

Use the NIST and BloodPressure Examples to help you write your SAS or R code.

Look at the Project Submission Guidelines before submitting any project.

This project is worth 75 raw points. The raw score will be multiplied by 100/75 to obtain the scaled score.

Part A. Univariate Data Analysis (30 pts.)

The measurements in the Paper dataset in paper1.txt and paper2.txt consist of the paper thicknesses of two colors of paper (white and yellow) that we measured in class with the micrometer. Use this data to answer the following questions. Include source code comments to explain what you are doing.

Read in the data from paper1.txt and print it to verify that everything was input correctly.
Read in the data from paper2.txt and print it to verify that everything was input correctly.
Obtain these univariate statistics separately by color for the paper thicknesses: sample mean, sample standard deviation, sample median, sample IQR, and the 5th, 10th, 25th, 75th, 90th, and 95th percentiles. In SAS, you can use proc means or proc univariate to compute these statistics. Don't compute them by hand.
Find 95% confidence intervals for the true thickness for each color of paper separately. Show your hand calculations and also show the relevant SAS or R output to verify your calculations.
Create three histograms for thicknesses from the combined colors of paper:
1. Create a histogram using the default setting for the number of bins. Run your SAS or R code first without the code for items 2 and 3 to see what bins are obtained with the default setting.
2. Create a histogram with more bins than the default. In SAS, you can do this with an option on the histogram statement of proc univariate that takes a starting value, an ending value, and a bin width. For example, with the values 11, 25, and 1.0:
  11 is a value that is less than all of the data values, 25 is a value that is greater than all of the data values, and 1.0 is the width of the histogram bins. (You will use different numbers that make sense for your histograms.)
  
  In R, you can control the bins with the breaks argument of the hist function.
3. Create a histogram with fewer bins than the default. (See item 2.)
Create side-by-side boxplots of the thicknesses for the colors white and yellow. Discuss what the boxplots tell you. Are there any outliers? If you are using SAS, use the paper2.txt dataset and sort the dataset by color before plotting the boxplots.
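
As a rough R sketch of the Part A steps above (not a model solution): the layout of paper1.txt and paper2.txt is not shown here, so the data frame below is a made-up stand-in. The column names thickness and color, the values, and the bin widths are all assumptions; replace them with the class data and with numbers that make sense for it.

```r
# Hypothetical stand-in for the class data (made-up values, arbitrary units).
# With the real file you would read it in instead, e.g. with read.table().
paper <- data.frame(
  thickness = c(96.1, 98.4, 97.2, 99.0, 97.8, 98.1,
                104.6, 106.2, 105.1, 109.3, 105.8, 106.9),
  color = rep(c("white", "yellow"), each = 6)
)
print(paper)  # verify that everything was input correctly

# Univariate statistics, computed separately for each color.
# Note: R and SAS may use different percentile definitions by default,
# so small differences between the two are possible.
probs <- c(0.05, 0.10, 0.25, 0.75, 0.90, 0.95)
by_color <- split(paper$thickness, paper$color)
stats_by_color <- lapply(by_color, function(x) {
  c(mean = mean(x), sd = sd(x), median = median(x), IQR = IQR(x),
    quantile(x, probs))
})
print(stats_by_color)

# 95% t-based confidence interval for the mean, per color:
# xbar +/- t(0.975, n - 1) * s / sqrt(n)
ci_by_color <- lapply(by_color, function(x) {
  n <- length(x)
  mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
})
print(ci_by_color)

# Histograms of the combined colors.
# 1. Default bins; inspect the boundaries R chose.
default_hist <- hist(paper$thickness, main = "Default bins")
print(default_hist$breaks)

# 2. More bins: explicit boundaries from below the minimum to above the maximum.
lo <- floor(min(paper$thickness))
hi <- ceiling(max(paper$thickness))
more_hist <- hist(paper$thickness, breaks = seq(lo, hi, by = 1),
                  main = "More bins")

# 3. Fewer bins: only three boundaries, so two wide bins.
fewer_hist <- hist(paper$thickness, breaks = seq(lo, hi, length.out = 3),
                   main = "Fewer bins")

# Side-by-side boxplots by color.
boxplot(thickness ~ color, data = paper, ylab = "thickness")
```

The confidence intervals computed this way can be checked against the conf.int component returned by t.test for each color.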

Part B. One-sample t-test (20 pts.)

To investigate the load on its network, a technician records the number of concurrent users at fifty locations (thousands of people):

17.2  22.1  18.5  17.2  18.6  14.8  21.7  15.8  16.3  22.8
24.1  13.3  16.2  17.5  19.0  23.9  14.8  22.2  21.7  20.7
13.5  15.8  13.1  16.1  21.9  23.9  19.3  12.0  19.9  19.4
15.4  16.7  19.5  16.2  16.9  17.1  20.2  13.4  19.8  17.7
19.7  18.7  17.6  15.9  15.2  17.1  15.0  18.8  21.6  11.9

Create a SAS or R dataset containing the number of concurrent users at each location. For example, call your variable nusers. If you are using SAS and copy the preceding data lines verbatim, don't forget the trailing @@ in the input statement. If you are using R, read the data from within the script rather than from a separate file.
Create and interpret the normal plot for nusers.
Compute a 95% confidence interval for nusers. Show your hand calculations with the relevant SAS or R output. Don't use the standard normal confidence interval [-1.96, 1.96]; use the t-distribution confidence interval obtained from the t-table for 50 - 1 = 49 degrees of freedom. You can check your answer with SAS using proc means or proc ttest. You can check your answer with R using the t.test function.
Show the five steps of the one-sample t-test at the α = 0.05 level to test whether nusers has changed in the past month. Usage data from last month shows an average of 17.2 thousand concurrent users. You don't need to perform hand calculations for steps 1, 2, and 5; just copy the values from the SAS or R output. For Step 3, find the 95% confidence interval using the t-table.
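
The Part B computations can be sketched in R roughly as follows. This is one possible approach, not the required one: reading the data with scan(text = ...) is an assumption about how to get the values into the script, and the variable names other than nusers are made up for illustration. The interpretation and the five written steps still have to come from you.

```r
# Concurrent users (thousands) at the fifty locations, copied from the data lines above.
nusers <- scan(text = "
17.2  22.1  18.5  17.2  18.6  14.8  21.7  15.8  16.3  22.8
24.1  13.3  16.2  17.5  19.0  23.9  14.8  22.2  21.7  20.7
13.5  15.8  13.1  16.1  21.9  23.9  19.3  12.0  19.9  19.4
15.4  16.7  19.5  16.2  16.9  17.1  20.2  13.4  19.8  17.7
19.7  18.7  17.6  15.9  15.2  17.1  15.0  18.8  21.6  11.9
")

n    <- length(nusers)       # should be 50
xbar <- mean(nusers)         # sample mean
se   <- sd(nusers) / sqrt(n) # standard error of the mean

# Normal plot with a reference line.
qqnorm(nusers, main = "Normal plot of nusers")
qqline(nusers)

# Critical value from the t-distribution with n - 1 = 49 degrees of freedom
# (the software equivalent of looking it up in the t-table).
tcrit <- qt(0.975, df = n - 1)

# 95% confidence interval for the mean: xbar +/- tcrit * se
ci_hand <- xbar + c(-1, 1) * tcrit * se

# One-sample t statistic against last month's average of 17.2 thousand users.
mu0   <- 17.2
tstat <- (xbar - mu0) / se

# Check the by-hand values against t.test.
result <- t.test(nusers, mu = mu0, conf.level = 0.95)
print(ci_hand)
print(tstat)
print(result)
```

The test statistic is compared with the interval [-tcrit, tcrit]; the null hypothesis is rejected at the 0.05 level when tstat falls outside it.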

Part C. Two-sample t-tests (25 pts.)

Use the data in paper1.txt and/or paper2.txt to answer the following questions.

If you are using SAS, create labels for each variable thickness and color. If you are using R, add print statements in your source code to explain what your output means.
Create normal plots of the thicknesses separately for the paper colors white and yellow. Interpret these normal plots.
Type out the five steps of a 0.05-level paired-sample t-test to test the null hypothesis that there is no difference between the paper thicknesses in paper1.txt. Show relevant SAS or R output in your report. You will need to obtain the confidence interval for the test statistic from the t-table.
Type out the five steps of a 0.05-level independent two-sample t-test to test the null hypothesis that there is no difference between the paper thicknesses for colors white and yellow. Show your output and discuss what it means. Use paper2.txt sorted by color for SAS but paper1.txt for R. You will need to obtain the confidence interval for the test statistic from the t-table.
Is the paired-sample or the independent two-sample t-test more appropriate to decide if the true thickness of a sheet of paper is different for color white or yellow? Explain your answer. How do the p-values compare for the two t-tests? Is this what you would expect?
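
For the two tests in Part C, the R calls might look something like the sketch below. It assumes that paper1.txt holds one row per measurement pair with columns named white and yellow; that layout, the column names, and the values are hypothetical stand-ins. It also assumes a pooled-variance independent test (var.equal = TRUE); R's default is the Welch test, so use whichever version the course specifies.

```r
# Hypothetical stand-in for paper1.txt (made-up values, arbitrary units):
# one row per pair of measurements, one column per color.
paper1 <- data.frame(
  white  = c(96.1, 98.4, 97.2, 99.0, 97.8, 98.1),
  yellow = c(104.6, 106.2, 105.1, 109.3, 105.8, 106.9)
)
n <- nrow(paper1)

# Normal plots, separately for each color.
qqnorm(paper1$white, main = "Normal plot: white")
qqline(paper1$white)
qqnorm(paper1$yellow, main = "Normal plot: yellow")
qqline(paper1$yellow)

# Paired-sample t-test: works on the within-pair differences.
d            <- paper1$white - paper1$yellow
t_paired     <- mean(d) / (sd(d) / sqrt(n))  # by-hand test statistic
crit_paired  <- qt(0.975, df = n - 1)        # t-table value, n - 1 df
paired       <- t.test(paper1$white, paper1$yellow, paired = TRUE)

# Independent two-sample t-test with pooled variance (an assumption, see above).
crit_indep   <- qt(0.975, df = 2 * n - 2)    # t-table value, n1 + n2 - 2 df
indep        <- t.test(paper1$white, paper1$yellow, var.equal = TRUE)

print("Paired-sample t-test of white vs. yellow thickness:")
print(paired)
print("Independent two-sample t-test of white vs. yellow thickness:")
print(indep)

# Compare the two p-values side by side.
print(c(paired = paired$p.value, independent = indep$p.value))
```

Which test is appropriate depends on how the measurements were collected, not on which p-value is smaller.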