CSC 478
Spring 2018


Schedule and Class Material


Week 1 - Mar 26, 2018
Lecture Material Resources Assignments/Readings
Topics: 
 
Introduction to the Course
Brief Python review and an overview of Numpy
Overview of Data Mining and Knowledge Discovery Process
[Slides] [Video (33 mins)]
Data Preparation and Preprocessing
[Slides] [Video (50 mins)]
A short video introduction to Jupyter Notebook (By Corey Schafer)
Class Examples (Notebooks)
- Python/Numpy Basics
- Populations
- Video Store
Related Files for Examples
- populations.txt
- Video_Store.csv
Read Section 1.1 of Python Scientific Lecture Notes.
Go through your favorite Python tutorial (see Online Resources) for a quick refresher.
Install and test a Python distribution (ideally the Anaconda distribution, which automatically installs all of the libraries used in this class).
Familiarize yourself with IPython and, in particular, Jupyter Notebook. There is also a nice 30 min. Jupyter Notebook Tutorial Video by Corey Schafer.
Go through the "Quick Tutorial" in the Numpy User Guide and try to follow the examples on your own (using Jupyter Notebook as the shell).
Week 2 - Apr 2, 2018
Lecture Material Resources Assignments/Readings
Topics: 
 
Understanding Characteristics of Data
 
More on Python for data analysis and visualization





Watch Wes McKinney's 10 Minute Tour of Pandas
Examples (Notebooks)
- Populations with Pandas
- Video Store with Pandas
Related Files for Examples
- populations.txt
- Video_Store.csv
In the Matplotlib User Guide, read the Matplotlib Pyplot Tutorial
Read Section 1.4 of the Python Scientific Lecture Notes on Matplotlib.
Familiarize yourself with Pandas basics. Good places to start are Wes McKinney's video 10 Minute Tour of Pandas and "Lessons for New pandas Users" in the Pandas Tutorials. Also review Intro to Data Structures in the Pandas Documentation.
Continue reading through the Tentative Numpy Tutorial.
Week 3 - Apr 9, 2018
Lecture Material Resources Assignments/Readings
Topics: 
 
More on Python for data analysis and visualization
Distances, Similarities, and K-Nearest-Neighbor Search



Examples (Notebooks)
- Video Store (Missing Values)
- KNN Search Example 1
- KNN Search Example 2
- Video Store (KNN Classifier)
Read Chapter 2 of Machine Learning in Action (MLA).
Assignment 1 Due: TBA.
Week 4 - Apr 16, 2018
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
 
Classification & Prediction - Review of Basic Concepts
Text Categorization

Decision Trees  
[Video (41 mins)] [Slides]
Bayesian Classification 
[Video (32 mins)] [Slides]

Examples (Notebooks)
- Term-Doc Matrix and TF*IDF
- Video Store (Scikit-learn)
Read Chapters 3 and 4 of MLA.
Read the scikit-learn user guide, Sections 1.2, 1.6, 1.9, 1.10, and 4.3.
Week 5 - Apr 23, 2018
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
 
Classification (continued)
 
Notes on Personalization and Recommender Systems



Examples (Notebooks)
- Video Store - Scikit-learn (Cont.)
- Video Store - Scikit-learn (Part 2)
Other Relevant Resources
- Recommender Systems Wiki
- Recommender-Systems.org
Read Chapter 8 of MLA.
Review the scikit-learn user guide, Sections 3.1, 3.3, and 3.5.
Review the scikit-learn user guide, Section 1.1 (Linear Models).
Read the Recommender Systems article in the Encyclopedia of Machine Learning.
Read the Wikipedia article on Collaborative Filtering.
Week 6 - Apr 30, 2018
Lecture Material Resources Assignments/Readings
Topics: Supervised Learning
Personalization & Recommender Systems (cont.)
Basic Regression Analysis
Model Selection & Optimization:
- Gradient Descent Optimization
- Feature Selection
- Parameter Selection


 




Examples (Notebooks)
- Regression Analysis using Scikit-learn
- Feature / Model Selection Strategies
Read Chapter 10 of MLA.
Review the scikit-learn user guide, Sections 1.5, 1.13, and 3.2.
Review the scikit-learn user guide, Section 2.3 (Clustering), and the API documentation for KMeans.
Review the Wikipedia pages on Cluster Analysis, including the articles on K-means Clustering and Hierarchical Clustering.
 
Week 7 - May 7, 2018
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Clustering
Principal Component Analysis



Examples (Notebooks)
- K-means Clustering
- Document Clustering, PCA, and SVD
- Another PCA Example
Read Chapters 13 and 14 of MLA.
Week 8 - May 14, 2018
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
SVD (Singular Value Decomposition)  and Matrix Factorization
Association Rule Mining

Examples (Notebooks)
- Document Clustering, PCA, and SVD
- Joking with Matrix Factorization
- Association Rule Mining
Read Chapters 11 and 14 of MLA.
Read: Matrix Factorization: A Simple Tutorial and Implementation in Python, by Albert Au Yeung.
Review the Wikipedia page on Association Rule Learning.
 
Week 9 - May 21, 2018
Lecture Material Resources Assignments/Readings
Ensemble Methods
 
Brief Course Summary

Examples (Notebooks)
- Ensemble Classification
- Model Selection on Newsgroup Data


Review the Final Project Checklist.
Review the scikit-learn user guide, Section 1.11 (Ensemble Methods).
Week 10 - May 28, 2018
Lecture Material Resources Assignments/Readings

Memorial Day: No Class


Review the Final Project Checklist.
   
Final Projects Due on Monday, June 4, 2018

Copyright © 2016-2019, Bamshad Mobasher, DePaul University.