DSC 478
Fall 2020


Schedule and Class Material


Week 1 - Sep 09, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
Introduction to the Course
Overview of Data Mining and Knowledge Discovery Process
Brief Python review and an overview of Numpy
Brief Pandas Tutorial
Lecture Videos (D2L)
1- Brief Course Overview
2- Overview of Data Mining &
     KDD Process (32 mins)
3- Basics of Python & Numpy
     (44 mins)
4- Population Example - Part 1:
     Numpy (34 Mins)
5- Population Example - Part 2:
     Pandas (26 Mins)


Class Examples (Notebooks)
- Python/Numpy Basics
- Populations
- Populations with Pandas
Related Files for Examples
- populations.txt
Install and test a Python distribution (ideally the Anaconda distribution, which automatically installs all of the libraries needed for this class).
Familiarize yourself with IPython and, in particular, Jupyter Notebook. There is also a nice 30-minute Jupyter Notebook Tutorial Video by Corey Schafer.
Go through the "Quick Tutorial" in the NumPy User Guide and try to follow the examples on your own (using Jupyter Notebook as the shell).
Review Section 1 of Python Scientific Lecture Notes.
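One possible way to test the installation described above is a short script that tries to import the libraries named on this page and reports their versions. This is only a sketch: the package list (numpy, pandas, matplotlib, sklearn) is an assumption drawn from the libraries mentioned in this schedule, and `check_environment` is a hypothetical helper name, not part of the course material.

```python
import importlib
import sys


def check_environment(packages=("numpy", "pandas", "matplotlib", "sklearn")):
    """Map each package name to its version string, or None if it cannot be imported.

    The default package list is an assumption based on the libraries
    mentioned in this schedule; adjust it as needed.
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            versions[name] = None
        else:
            # Some modules do not expose __version__; report "unknown" then.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment().items():
        print(f"{name:12s} {version if version else 'NOT INSTALLED'}")
```

Running the script from a terminal or pasting it into a Jupyter Notebook cell should list each library; any line marked NOT INSTALLED suggests the distribution is incomplete.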
Week 2 - Sep 16, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
Brief Pandas Tutorial (Cont.)
Understanding Characteristics of Data
[Slides] [Video (35 mins)]
 
Data Preparation and Preprocessing
[Slides] [Video (50 mins)]
   


Lecture Videos (D2L)
1- Understanding Data
     Characteristics (35 mins)
2- Data Preparation &
     Preprocessing (50 Mins)
3- Preprocessing w/ Pandas Video Store Example (63 mins)
Examples (Notebooks)
- Video Store with Pandas

- Video Store (Missing Values)
Related Files for Examples
- Video_Store.csv
Familiarize yourself with Pandas basics. A good place to start is the Pandas Tutorials page in the Pandas documentation. You might also review Python Pandas Tutorial: A Complete Introduction for Beginners.
In the Matplotlib User Guide, read the Matplotlib Pyplot Tutorial.
Read Section 1.4 of the Python Scientific Lecture Notes on Matplotlib.
   
Week 3 - Sep 23, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
Distances, Similarities, and K-Nearest-Neighbor Search


Review Material:
Classification & Prediction - Review of Basic Concepts
[Video (43 mins)] [Slides]
Lecture Videos (D2L)
1- Distances, Similarities, &
    KNN Classification (69 mins)
2- KNN Search Example
    Notebooks (35 mins)
3- Review of Classification
    Basic Concepts (43 mins)
4- KNN Classification - Video
    Store Example (33 mins)

Examples (Notebooks)

- KNN Search Example 1
- KNN Search Example 2


- Video Store (KNN Classifier)

 

Read Chapter 2 of Machine Learning in Action (MLA).

Week 4 - Sep 30, 2020
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
 
Text Categorization

Review Material:

Decision Trees  
[Video (41 mins)] [Slides]
Bayesian Classification 
[Video (32 mins)] [Slides]

Lecture Videos (D2L)
1 - Text Categorization
      (46 mins)
2 - TF*IDF and Document
      Categorization (21 mins)

3 - Decision Tree Learning
      (41 mins)

4 - Bayesian Classification 
      (32 mins)

5 - Classification using Scikit-learn (61 mins)
Examples (Notebooks)

- TF*IDF and Document
   Categorization


- Video Store (Scikit-learn)


Read Chapters 3 and 4 of MLA.
Read the scikit-learn user guide, Sections 1.2, 1.6, 1.9, 1.10, and 4.3.
Week 5 - Oct 07, 2020
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
Classification (continued)
 
Notes on Assignment 2
 
Notes on Personalization and Recommender Systems



Lecture Videos (D2L)
1 - Classification using
     Scikit-learn (61 mins)

2 - Notes on Assignment 2
     (55 mins) - See PPT slides
     in the left column
Examples (Notebooks)

- Video Store - Scikit-learn
   (Part 2)
   (Part 1 is posted above in
   Week 4)

Other Relevant Resources
- Recommender Systems Wiki
- Recommender-Systems.org
Read Chapter 8 of MLA.
Review scikit-learn user guide: Sections: 3.1, 3.3, 3.5.
Review scikit-learn user guide: Sections: 1.1 (Linear Models)
Read Recommender Systems Article in the Encyclopedia of Machine Learning
Read Wikipedia article on Collaborative Filtering
Week 6 - Oct 14, 2020
Lecture Material Resources Assignments/Readings
Topics: Supervised Learning
Basic Regression Analysis
Model Selection & Optimization:
- Gradient Descent
  Optimization

- Feature Selection
- Parameter Selection





Lecture Videos (D2L)
1- Basic Regression Analysis
     (34 mins)
2 - Regression Using
     Scikit-learn (44 mins)
3 - Feature / Model Selection
     Strategies (46 mins)
4 - Gradient Descent
     Optimization (35 mins)
Examples (Notebooks)

- Regression Analysis using
  Scikit-learn


- Feature / Model Selection
  Strategies


- Gradient Descent 
  Optimization

Read Chapter 10 of MLA.
Review scikit-learn user guide: Sections: 1.5, 1.13, 3.2.
Review the scikit-learn user guide, Section 2.3 (Clustering), and the API documentation for KMeans.
 
Week 7 - Oct 21, 2020
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Clustering

Video: Additional Notes on Assignment 3 (30 mins)
- PPT slides for the video

Lecture Videos (D2L)

1 - Clustering Concepts and Algorithms (29 mins)
2 - Clustering Example Jupyter Notebooks (34 mins)
3 - Additional Notes on Assignment 3 (30 mins)
[PPT slides for the video]


Examples (Notebooks)

- K-means Clustering

- Document Clustering


Review the Wikipedia pages on Cluster Analysis, including the articles on K-means Clustering and Hierarchical Clustering.
 
Read Chapters 13 and 14 of MLA.
Week 8 - Oct 28, 2020
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Principal Component Analysis
SVD (Singular Value Decomposition), Recommender Systems, and Matrix Factorization

Lecture Videos (D2L)

1- Principal Component
     Analysis - PCA (35 mins)
2- Singular Value
     Decomposition (22 mins)
3- Recommender Systems &
     Matrix Factorization
     (49 mins)

Examples (Notebooks)

- Basic PCA Example

- Document Clustering, PCA,
   and SVD

- Item-Based Rec Test

- Joking with Matrix
   Factorization

Read Chapters 11 and 14 of MLA.
Read: Matrix Factorization: A Simple Tutorial and Implementation in Python, by Albert Au Yeung.
Week 9 - Nov 04, 2020
Lecture Material Resources Assignments/Readings
 
Support Vector Machines

Also:

- Using Machine Learning
  Pipelines in Scikit-learn
- More examples of Model
  Optimization
Lecture Videos (D2L):

1 - Support Vector Machines -
      Basics (24 mins)
2 - SVM - Jupyter Notebook
      Example (22 mins)
3 - Model Selection on
      Newsgroup Data (28 mins)

Examples (Notebooks)

- Support Vector Machines

- Model Selection on
  Newsgroup Data



Review the Final Project Checklist.
Review scikit-learn user guide: Sections: 1.4 (Support Vector Machines).
Week 10 - Nov 11, 2020
Lecture Material Resources Assignments/Readings
 
Ensemble Methods
 Brief Course Summary
Lecture Videos (D2L)

- Ensemble Methods

Examples (Notebooks)

- Ensemble Classification



Review scikit-learn user guide: Sections: 1.11 (Ensemble Methods).
Review the Final Project Checklist.
   
Final Projects Due on Friday, Nov 20, 2020

Copyright ©, Bamshad Mobasher, DePaul University.