DSC 478
Fall 2022




Schedule and Class Material


Week 1 - Sep 7 - Sep 13
Lecture Material Resources Assignments/Readings
Topics: 
 
Introduction to the Course
Overview of Machine Learning, Data Mining & the Knowledge Discovery Process
Brief Python review and an overview of Numpy
Brief Pandas Tutorial
Lecture Videos (D2L)
1- Brief Course Overview
2- Overview of Data Mining, ML & KDD Process (41 mins)
3- Basics of Python & Numpy (44 mins)
4- Population Example - Part 1: Numpy (34 mins)
5- Population Example - Part 2: Pandas (26 mins)


Class Examples (Notebooks)
- Python/Numpy Basics
- Populations
- Populations with Pandas
Related Files for Examples
- populations.txt

Install and test a Python distribution (ideally the Anaconda distribution, which automatically installs all of the libraries used in this class).
Familiarize yourself with IPython, and particularly, Jupyter Notebook. There is also a nice 30 min. Jupyter Notebook Tutorial Video by Corey Schafer.
Go through the "Quick Tutorial" on Numpy User Guide and try to follow the examples on your own (using Jupyter Notebook as the shell).
Review Section 1 of Python Scientific Lecture Notes.
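One way to test the installation is to check that the main libraries can be located by the interpreter. The snippet below is a minimal sketch, not part of the course materials: the package list is an assumption based on the libraries named in this schedule, and `check_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed package list, taken from the libraries mentioned in this schedule.
# Note that scikit-learn is imported under the name "sklearn".
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "IPython", "notebook"]


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per package.
    for name, found in check_packages(COURSE_PACKAGES).items():
        print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Run it in a Jupyter Notebook cell or from the command line. Any package reported as MISSING can usually be installed through the distribution's package manager.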
Week 2 - Sep 14 - Sep 20
Lecture Material Resources Assignments/Readings
Topics: 
 
Understanding Characteristics of Data
 
Data Preparation and Preprocessing
   


Lecture Videos (D2L)
1- Understanding Data Characteristics (35 mins)
2- Data Preparation & Preprocessing (50 mins)
3- Preprocessing w/ Pandas Video Store Example (63 mins)
Examples (Notebooks)
- Video Store with Pandas

- Video Store (Missing Values)
Related Files for Examples
- Video_Store.csv
Familiarize yourself with Pandas basics. A good place to start is the Pandas Tutorials page in the Pandas documentation. You might also review Python Pandas Tutorial: A Complete Introduction for Beginners.
In the Matplotlib User Guide, read the Matplotlib Pyplot Tutorial.
Read Section 1.4 of the Python Scientific Lecture Notes on Matplotlib.
   
Week 3 - Sep 21 - Sep 27
Lecture Material Resources Assignments/Readings
Topics: 
 
Distances, Similarities, and K-Nearest-Neighbor Search


Review Material:
Classification & Prediction - Review of Basic Concepts
Lecture Videos (D2L)
1- Review of Classification Basic Concepts (48 mins)
2- Distances, Similarities, & KNN Classification (69 mins)
3- KNN Search Example Notebooks (35 mins)
4- KNN Classification - Video Store Example (47 mins)

Examples (Notebooks)
- KNN Search Example 1
- KNN Search Example 2
- Video Store KNN Classifier

 

Read Chapter 2 of Machine Learning in Action (MLA).

Week 4 - Sep 28 - Oct 4
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
 
Text Categorization

Review Material:

Decision Trees  
[Video (41 mins)] [Slides]
Bayesian Classification 
[Video (32 mins)] [Slides]

Lecture Videos (D2L)
1 - Text Categorization (46 mins)
2 - TF*IDF and Document Categorization (21 mins)
3 - Review: Decision Tree Learning (41 mins)
4 - Review: Bayesian Classification (32 mins)
5 - Classification using Scikit-learn (61 mins)


Examples (Notebooks)

- TF*IDF and Document Categorization
- Video Store (Scikit-learn)
- Video Store - Scikit-learn (Part 2)


Read Chapters 3 and 4 of MLA.
Read scikit-learn user guide: Sections: 1.2, 1.6, 1.9, 1.10, 6.3.
Week 5 - Oct 5 - Oct 11
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
Classification (continued)
 
Overview of Recommender Systems
 
Notes on Assignment 2



Lecture Videos (D2L)

6 - Overview of Recommender Systems (72 mins)
Other Relevant Resources
- Recommender Systems Wiki
- Recommender-Systems.org
Read Chapter 8 of MLA.
Review scikit-learn user guide: Sections: 3.1, 3.3
Review scikit-learn user guide: Sections: 1.1 (Linear Models)
Read Recommender Systems Article in the Encyclopedia of Machine Learning
Read Wikipedia article on Collaborative Filtering
Week 6 - Oct 12 - Oct 18
Lecture Material Resources Assignments/Readings
Topics: Supervised Learning
Basic Regression Analysis
Model Selection & Optimization:
- Gradient Descent Optimization
- Feature Selection
- Parameter Optimization




Lecture Videos (D2L):

1- Basic Regression Analysis (34 mins)
2 - Feature Selection & Model Optimization (46 mins)
3 - Gradient Descent Optimization (35 mins)
Examples (Notebooks):

- Regression Analysis using Scikit-learn

- Feature / Model Selection Strategies
- Gradient Descent Optimization


Read Chapter 10 of MLA.
Review scikit-learn user guide: Sections: 1.1, 1.5, 3.2.
Review scikit-learn user guide: Sections: 2.3 (Clustering), and the API documentation for KMeans.
 
Week 7 - Oct 19 - Oct 25
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Clustering

* New: Additional Notes on Assignment 3
Lecture Videos (D2L)

1 - Clustering: Basic Concepts and Algorithms (40 mins)

2 - Clustering Examples - Jupyter Notebooks (34 mins)


Examples (Notebooks)

- K-means Clustering

- Document Clustering


Review the Wikipedia pages on Cluster Analysis, including the articles on K-means Clustering and Hierarchical Clustering.
 
Read Chapters 13 and 14 of MLA.
Week 8 - Oct 26 - Nov 1
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Principal Component Analysis
SVD (Singular Value Decomposition) and Matrix Factorization

Lecture Videos (D2L)

1- Principal Component Analysis - PCA (35 mins)
2- Singular Value Decomposition (22 mins)
3- Recommender Systems & Matrix Factorization (49 mins)

Examples (Notebooks)

- Basic PCA Example
- Document Clustering, PCA, and SVD
- Item-Based Rec Test
- Joking with Matrix Factorization


Read Chapters 11 and 14 of MLA.
Read: Matrix Factorization: A Simple Tutorial and Implementation in Python, by Albert Au Yeung.
Week 9 - Nov 2 - Nov 8
Lecture Material Resources Assignments/Readings
 
Support Vector Machines

Also:

- Using Machine Learning Pipelines in Scikit-learn
- More examples of Model Optimization
Lecture Videos (D2L):

1. Support Vector Machines (24 mins)
2. SVM Jupyter Notebook Example (22 mins)
3. Model Selection on Newsgroup Data (28 mins)

Examples (Notebooks)

- Support Vector Machines

- Model Selection on Newsgroup Data



Review the Final Project Checklist.
Review scikit-learn user guide: Sections: 1.4 (Support Vector Machines).
Week 10 - Nov 9 - Nov 15
Lecture Material Resources Assignments/Readings
 
Ensemble Methods
 Brief Course Summary
Lecture Videos (D2L)

1. Ensemble Methods (53 mins)

Examples (Notebooks)

- Ensemble Classification



Review scikit-learn user guide: Sections: 1.11 (Ensemble Methods).
Review the Final Project Checklist.
   
Final Projects Due on Wednesday, November 16, 2022

Copyright ©, Bamshad Mobasher, DePaul University.