
CSC 423/324 -- Final Project

Grading Criteria

Due Dates

Groups for Final Project

Submission Items

  1. A non-technical summary, not more than one page. This summary should present your results to a non-technical audience, such as your boss, your friend, or your mother (assuming that they do not understand statistics in detail).
  2. A technical report (suggested length, 5 pages) with the details of your analysis, presented for a statistically literate audience. This report must be clearly written in complete sentences with an introduction and conclusion. You can include input datasets, source code, output, and graphs in an appendix.
  3. PowerPoint slides for your final presentation. Online Learning students will not give an in-class presentation, but they must still create PowerPoint slides.
  4. The Group Member Evaluation Form. Do not submit this form if you are working alone.
  5. The three files from items 1-3 should be submitted in a single zip file, with the names of the group members in the zip file name. Only one group member needs to submit the zip file. However, the other member or members should each submit a comment stating who is in the group and who is submitting the zip file.

Final Project Content

Your final project report should address these points:

  1. The dependent variable for your regression models must be a continuous variable unless you are using logistic regression.
  2. Your dataset should include at least 8 variables, and preferably at least 10 observations per variable. Ideally, you should set aside a random selection of the data for validation. With 10 observations per variable, holding out 50% of the observations for validation still leaves at least 5 observations per variable for fitting the model.
  3. The exploratory data analysis may suggest a model that is adequate for fitting the data. Do the data show a nonlinear relationship so that a data transform is needed?
  4. Check for collinearity among the independent variables and adjust your model accordingly.
  5. Did you use a variable selection procedure? If so, describe it.
  6. If several models appear equally good for predicting the data, discuss them all, including, perhaps, an intuitive choice of the model that makes the most sense.
  7. Analyze the residual plots. These might indicate failures in the assumptions or inadequacies in the model.
  8. Check for leverage points, influential points, and outliers in your dataset. Decide if they should be deleted or if the model needs to be modified to account for them.
  9. Can your model be improved? Are you satisfied with the model you selected?
  10. Use your selected model to examine the relationships among the variables. Identify the strongest predictors among the independent variables.
  11. Apply cross-validation techniques to evaluate how well your model does for prediction.
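
The holdout arithmetic in point 2 and the cross-validation in point 11 can be sketched as follows. This is a minimal illustration in Python using only the standard library, not a required tool or method for the project. The data are synthetic, the model has a single predictor to keep the fitting code short, and the helper names (`fit_line`, `rmse`, `holdout_split`, `k_fold_rmse`) are made up for this example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def rmse(xs, ys, b0, b1):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return (sse / len(xs)) ** 0.5

def holdout_split(n_obs, holdout_fraction, rng):
    """Randomly partition row indices into (training, validation) lists."""
    idx = list(range(n_obs))
    rng.shuffle(idx)
    n_val = int(round(n_obs * holdout_fraction))
    return idx[n_val:], idx[:n_val]

def k_fold_rmse(xs, ys, k, rng):
    """Average validation RMSE over k folds, refitting on the other k-1 folds."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([xs[i] for i in fold], [ys[i] for i in fold], b0, b1))
    return sum(scores) / k

# Synthetic data, made up for illustration: y = 3 + 2x + noise.
# 80 rows corresponds to 8 variables at 10 observations per variable.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(80)]
ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]

# Hold out 50% of the rows for validation.
train_idx, val_idx = holdout_split(len(xs), 0.5, rng)
n_variables = 8  # the minimum number of variables the project asks for
obs_per_variable = len(train_idx) / n_variables  # 40 / 8 = 5.0

# Fit on the training rows only, then score on the held-out rows.
b0, b1 = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
val_rmse = rmse([xs[i] for i in val_idx], [ys[i] for i in val_idx], b0, b1)
cv_rmse = k_fold_rmse(xs, ys, 5, rng)

print(f"training observations per variable: {obs_per_variable}")
print(f"holdout RMSE: {val_rmse:.3f}, 5-fold CV RMSE: {cv_rmse:.3f}")
```

The key design point is that the model is always refit without the rows it is scored on, so the reported error reflects prediction on unseen data rather than goodness of fit on the training data.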

Note: Even if you do not find a satisfactory regression model for your dataset, you can still explain what models you tried and why they were not satisfactory. If you found two models that seemed equally good, compare them, using graphs and goodness-of-fit statistics.
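
As one possible way to compare two candidate models with goodness-of-fit statistics, the sketch below fits the same synthetic response against a raw predictor and against its log transform, then reports R-squared and RMSE for each. It is a hedged illustration in standard-library Python, not a prescribed procedure: the data are invented, and the names `fit_line` and `fit_stats` are hypothetical helpers. Because both models have the same number of parameters, plain R-squared is a fair comparison here; models of different sizes would call for adjusted R-squared or validation error instead.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fit_stats(xs, ys):
    """Fit y on xs and return (R-squared, RMSE) computed on the same data."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst, (sse / len(ys)) ** 0.5

# Synthetic data, made up for illustration: y = 1 + 4*ln(x) + noise,
# so the relationship between y and x is nonlinear by construction.
rng = random.Random(1)
xs = [rng.uniform(1, 50) for _ in range(60)]
ys = [1 + 4 * math.log(x) + rng.gauss(0, 0.3) for x in xs]

# Two candidate models: raw predictor versus log-transformed predictor.
candidates = {
    "y ~ x": xs,
    "y ~ log(x)": [math.log(x) for x in xs],
}

results = {}
for name, predictor in candidates.items():
    results[name] = fit_stats(predictor, ys)
    r2, err = results[name]
    print(f"{name:12s} R^2 = {r2:.3f}  RMSE = {err:.3f}")
```

In a report, a table like the one this prints would normally sit next to residual plots for each model, since a curved residual pattern under the untransformed model is the visual counterpart of its lower R-squared.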