To Projects
CSC 423/324 -- Final Project
Grading Criteria
- Grading Breakdown for Final Project: Technical Report: 40%, Non-technical Summary:
25%, PowerPoint Slides: 25%, Group Member Evaluation: 10%.
- Grading Criteria for Summary (Presentation, PowerPoint Slides, and Non-technical
Summary): Accurate: 25%, Complete: 25%, Concise: 25%, Interesting: 25%
- Grading Criteria for Technical Report: Correct and Complete: 40%, Well-written: 20, Appropriate Dataset: 20%, Interesting: 20%
- Group Member Evaluation Form
Due Dates
- Final Project presentations will be on the last day of class, which is
August 16 (Day 10). Post the members in your group and the project title by
Monday,
August 14 (Day 9).
Groups for Final Project
- Final projects are be completed in groups of size 2, or 3. Online Learning students
may work with in-class students.
- A discussion forum named FinalProject has been created on http://d2l.depaul.edu with two topics:
FormProjectGroup, for inviting persons to join your group of two or three, and
ProjectInfo, for posting the title of your final project and group members after you have formed
your group.
Submission Items
- A non-technical summary, not more than one page. This
summary should present your results to a non-technical audience, such as your
boss, your friend, or your mother (assuming that they do not understand
statistics in detail).
- A technical report (suggested length, 5 pages) with the details of your analysis, presented for a statistically literate audience.
This report must be clearly written in complete sentences with an introduction and conclusion. You can include input datasets, source code, output, and graphs in an appendix.
- Power Point slides for your final presentation. Online Learning students will not
do an in-class
presentation, but they will still create Power Point slides.
- The Group Member Evaluation Form. Don't submit this form if you are working in a group by yourself.
- The three files in steps 1-3 should be submitted in a zip file: finalproj-larsson-smith-suarez.zip,
with the names of the group members in your zip file name. Only one group
member needs to submit the zip file with the submission items listed in this
section. However, the other member or members should submit a comment
stating who is in your group and who is submitting the submission items.
Final Project Content
Your final project report should address these points:
- The dependent variable for your regression models must be a continuous
variable unless you are using logistic regression.
- Your dataset should include at least 8 variables, and preferably at least 10 observations per variable.
Ideally, you should set aside a random selection of the data for validation, so
if you hold out 50% of the observations for validation, you will still be left
with at least 5 observations per variable.
- The exploratory data analysis may suggest a model that is adequate for fitting the data. Do the data show a nonlinear relationship so that a data transform is needed?
- Check for collinearity among the independent variables and adjust your model
accordingly.
- Did you use a variable selection procedure? If so, describe it.
- If several models appear equally good for predicting the data, discuss
them all, including, perhaps, an intuitive choice of the model that makes
the most sense.
- Analyze the residual plots. This might indicate failures in the
assumptions or inadequacies in the model
- Check for leverage points, influential points, and outliers in your
dataset. Decide if they should be deleted or if the model needs to be
modified to account for them.
- Can your model be improved? Are you satisfied with the model you
selected?
- Use your selected model to examine the relationships among the variables.
Identify the strongest predictors among the independent variables?
- Apply cross-validation techniques to evaluate how well your model does
for prediction.
Note: even if you do not find a satisfactory regression model for your
dataset, you can still explain what models you tried and why they were not
satisfactory. If you found two models that seemed equally good, compare
them, using graphs and goodness-of-fit statistics.