
## CSC 423 -- Project 2

Look at the Project Submission Guidelines before submitting projects.

This project is worth 90 raw points; the raw score will be multiplied by 100/90 to obtain the scaled score.

### Part A: Flour Dataset (40 pts.)

Use the flour dataset for these problems. This dataset is based on a mill that grinds grain into flour. Each time a batch of flour is ground, a certain number of pounds of flour is produced (the independent variable Weight). This flour is put into bags for shipping; the number of bags needed is the dependent variable NBags.

1. Create and print a SAS dataset or R data frame named flour.

2. Use SAS or R to compute the means and standard deviations for weight and nbags. Also compute the correlation between weight and nbags.

3. Compute the regression model by hand using the formula

$$\hat{y} - \bar{y} = \frac{r_{xy}\, s_y}{s_x}\,(x - \bar{x})$$

4. Use SAS or R to find the simple linear regression model for predicting nbags from weight. Compare your hand calculations in Question A3 to the simple linear regression model obtained by SAS or R.

5. For the simple linear regression model, create and interpret the residual plot and normal plot of the residuals.

6. Compute the regression through the origin model by hand using the formula

$$a = \frac{x_1 y_1 + \cdots + x_n y_n}{x_1^2 + \cdots + x_n^2}, \qquad \hat{y} = a x.$$

7. Use SAS or R to find the regression through the origin model for predicting nbags from weight. Compare your hand calculation in Question A6 to the regression through the origin model obtained with SAS or R.

8. For the regression through the origin model, create and interpret the residual plot and normal plot of the residuals.
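The hand calculations in Questions A2, A3, and A6 can be cross-checked with a short script. The sketch below is only an illustration, not a substitute for the SAS or R work the assignment asks for. It uses plain Python with the standard library only, and the function names and the toy numbers in `x` and `y` are hypothetical; they are not the flour data. The script does the following:

- It computes sample means, sample standard deviations (n - 1 denominator), and Pearson r.
- It computes the slope r * s_y / s_x and the intercept y_bar - slope * x_bar from Question A3.
- It computes the least-squares slope Sxy / Sxx for comparison.
- It computes the through-origin coefficient from Question A6.
- It pairs the residuals with normal scores for a normal plot. These use one common plotting-position convention, (i - 3/8) / (n + 1/4), which is an assumption and may differ from what SAS or R use.

```python
import statistics


def summary(x, y):
    """Sample means, sample standard deviations (n - 1), and Pearson r."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / ((n - 1) * sx * sy)
    return mx, my, sx, sy, r


def fit_from_r(x, y):
    """Intercept and slope from y_hat - y_bar = (r * s_y / s_x) * (x - x_bar)."""
    mx, my, sx, sy, r = summary(x, y)
    b1 = r * sy / sx
    return my - b1 * mx, b1


def fit_least_squares(x, y):
    """Least-squares intercept and slope (slope = Sxy / Sxx), for comparison."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def fit_through_origin(x, y):
    """Coefficient a = sum(x_i * y_i) / sum(x_i ** 2) for the model y_hat = a * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def normal_scores(residuals):
    """Pair each sorted residual with a standard normal quantile.

    Assumes Blom-style plotting positions (i - 3/8) / (n + 1/4); SAS and R
    may use other conventions, so their plots can differ slightly.
    """
    n = len(residuals)
    z = statistics.NormalDist()  # standard normal: mean 0, sd 1
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), e)
            for i, e in enumerate(sorted(residuals), start=1)]


# Hypothetical toy numbers for illustration only (NOT the flour dataset).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_from_r(x, y)
c0, c1 = fit_least_squares(x, y)
print("from r:", b0, b1, "  least squares:", c0, c1)

a = fit_through_origin(x, y)
origin_residuals = [yi - a * xi for xi, yi in zip(x, y)]
print("through origin a =", a)
print("sum of residuals:", sum(origin_residuals))
print("sum of x * residual:", sum(xi * e for xi, e in zip(x, origin_residuals)))
for q, e in normal_scores(origin_residuals):
    print(f"{q:7.3f}  {e:7.3f}")
```

With any data, the two slope formulas agree because (n - 1) * s_x^2 = Sxx. The through-origin residuals always satisfy sum(x_i * e_i) = 0, but unlike the residuals of a model with an intercept they generally do not sum to zero (with these toy numbers they sum to about 2). That difference is worth keeping in mind when interpreting the residual plot in Question A8.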

### Part B: Used Car Dataset (50 pts.)

Collect the following data from the internet or elsewhere for at least 20 used cars (or other vehicles, such as motorcycles or motorboats) of the same make and model: price, year, and miles.

1. Create and print the SAS dataset or R data frame called UsedCars.

2. Create the pairwise scatterplots of year, miles, and price.

3. Find the pairwise correlations of year, miles, and price with SAS or R. Interpret them.

4. Find the simple linear regression model price=year with SAS or R.

5. Create the residual plot residuals*predicted and the normal plot of the residuals. Interpret these plots.

6. Find the simple linear regression model price=miles with SAS or R.

7. Create the residual plot residuals*predicted and the normal plot of the residuals. Interpret these plots.

8. Find the multiple linear regression model price=year miles with SAS or R.

9. Create the normal plot of the residuals and these three residual plots: residuals*predicted, residuals*year, and residuals*miles. Interpret these plots.

10. In your opinion, which of the models price=year, price=miles, and price=year miles is best for predicting price? Explain your answer.
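The multiple regression output in Question B8 can also be cross-checked by hand, which may help with the comparison in Question B10. The standard-library Python sketch below is a supplement to SAS or R, not a replacement. It solves the centered normal equations for a model with two predictors, which here stand in for year and miles. It then reports R^2 and adjusted R^2, where the adjusted version penalizes the extra predictor. Adjusted R^2 is only one possible criterion for Question B10; the residual plots and the interpretability of the coefficients also matter. The helper names and the toy numbers are hypothetical, not real used-car data.

```python
import statistics


def centered_cross(u, v):
    """Sum of (u_i - u_bar) * (v_i - v_bar)."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Least squares for y_hat = b0 + b1*x1 + b2*x2 via centered normal equations."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero when x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule on the 2x2 system
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = statistics.mean(y) - b1 * statistics.mean(x1) - b2 * statistics.mean(x2)
    return b0, b1, b2


def r2_and_adjusted(y, fitted, p):
    """R^2 and adjusted R^2 for a model with p predictors plus an intercept."""
    n = len(y)
    my = statistics.mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical toy numbers for illustration only (NOT real used-car data).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
noise = [0.5, -0.5, 0.3, -0.2, 0.1, -0.2]
y = [10 + 2 * a - 3 * b + e for a, b, e in zip(x1, x2, noise)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
print("coefficients:", b0, b1, b2)
print("R^2, adjusted R^2:", r2_and_adjusted(y, fitted, p=2))
```

The `det` term equals s11 * s22 * (1 - r12^2), where r12 is the correlation between the two predictors, so it shrinks toward zero as they become more strongly correlated. This is one reason the year-miles correlation from Question B3 is worth keeping in mind when reading the coefficients in Question B8.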