DSC 478
Spring 2020

Schedule and Class Material


Week 1 - Mar 31, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
Introduction to the Course
Overview of Data Mining and Knowledge Discovery Process
Brief Python review and an overview of Numpy
Brief Pandas Tutorial
Lecture Videos (D2L)
- Brief Course Overview
- Intro - Part 1 (54 mins)
- Intro - Part 2 (75 mins)

Class Examples (Notebooks)
- Python/Numpy Basics
- Populations
- Populations with Pandas
Related Files for Examples
- populations.txt
Install and test a Python distribution (ideally the Anaconda distribution, which automatically installs all of the libraries used in this class).
Familiarize yourself with IPython and, in particular, Jupyter Notebook. There is also a nice 30-minute Jupyter Notebook Tutorial Video by Corey Schafer.
Go through the "Quick Tutorial" in the NumPy User Guide and try to follow the examples on your own (using Jupyter Notebook as the shell).
Review Section 1 of Python Scientific Lecture Notes.
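One way to test the installation mentioned above is a short script like the following. It is only a sketch: the package list is an assumption based on the libraries named in this schedule (NumPy, Pandas, Matplotlib, scikit-learn), and the helper name check_packages is hypothetical, not part of the course material.

```python
# Minimal installation check (a sketch, not an official course script).
# The package list is an assumption based on libraries named in this schedule.
import importlib
import importlib.util
import sys

PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Map each package name to its version string.

    The value is None if the package is not installed, "unknown" if it has
    no __version__ attribute, and "import failed" if importing it raises
    ImportError.
    """
    report = {}
    for name in names:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            report[name] = None
            continue
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "import failed"
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:12s} {status}")
```

Running the script from a terminal or pasting it into a Jupyter Notebook cell lists each package with its version, or NOT INSTALLED if it is missing.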
Week 2 - Apr 07, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
Brief Pandas Tutorial (Cont.)
Understanding Characteristics of Data
[Slides] [Video (35 mins)]
 
Data Preparation and Preprocessing
[Slides] [Video (50 mins)]
   


Lecture Videos (D2L)
- More Pandas (26 mins)
- Pandas Video Store (53 mins)

Examples (Notebooks)
- Video Store with Pandas

- Video Store (Missing Values)
Related Files for Examples
- Video_Store.csv
Assignment 1 is available. Due date: Sunday, April 19.
Familiarize yourself with Pandas basics. A good place to start is the Pandas Tutorials page in the Pandas Documentation. You might also review Python Pandas Tutorial: A Complete Introduction for Beginners.
In the Matplotlib User Guide, read the Matplotlib Pyplot Tutorial.
Read Section 1.4 of the Python Scientific Lecture Notes on Matplotlib.
   
Week 3 - Apr 14, 2020
Lecture Material Resources Assignments/Readings
Topics: 
 
More on Python for data analysis and visualization
Distances, Similarities, and K-Nearest-Neighbor Search



Lecture Videos (D2L)
- Distances & Similarities (74 mins)
- KNN Notebooks (72 mins)

Examples (Notebooks)

- KNN Search Example 1
- KNN Search Example 2


- Video Store (KNN Classifier)

 

Read Chapter 2 of Machine Learning in Action (MLA).

Assignment 1 is due Sunday, April 19.
Week 4 - Apr 21, 2020
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
 
Classification & Prediction - Review of Basic Concepts
Text Categorization

Review Material:

Decision Trees  
[Video (41 mins)] [Slides]
Bayesian Classification 
[Video (32 mins)] [Slides]

Lecture Videos (D2L)

- Classification/Prediction (47 mins)

- Text Categorization (45 mins)

- Classification with scikit-learn (52 mins)

Examples (Notebooks)

- TF*IDF and Document Categorization


- Video Store (Scikit-learn)


Read Chapters 3 and 4 of MLA.
Read the scikit-learn User Guide, Sections 1.2, 1.6, 1.9, 1.10, and 4.3.
Week 5 - Apr 28, 2020
Lecture Material Resources Assignments/Readings
Topic: Supervised Learning
Classification (continued)
 
Notes on Assignment 2
[Video (32 mins)] [Slides]
 
Notes on Personalization and Recommender Systems



Examples (Notebooks)

- Video Store - Scikit-learn (Part 2) (See: Classification with scikit-learn video posted in Week 4)

Other Relevant Resources
- Recommender Systems Wiki
- Recommender-Systems.org
Read Chapter 8 of MLA.
Review the scikit-learn User Guide, Sections 3.1, 3.3, and 3.5.
Review the scikit-learn User Guide, Section 1.1 (Linear Models).
Read the Recommender Systems article in the Encyclopedia of Machine Learning.
Read the Wikipedia article on Collaborative Filtering.
Week 6 - May 05, 2020
Lecture Material Resources Assignments/Readings
Topics: Supervised Learning
Basic Regression Analysis
Model Selection & Optimization:
- Gradient Descent Optimization

- Feature Selection
- Parameter Selection





Lecture Videos (D2L)

- Regression Analysis (36 mins)

- Regression using scikit-learn (49 mins)

- Feature / Model Selection (49 mins)

- Gradient Descent Optimization (37 mins)
Examples (Notebooks)

- Regression Analysis using Scikit-learn

- Feature / Model Selection Strategies

- Gradient Descent Optimization

Read Chapter 10 of MLA.
Review the scikit-learn User Guide, Sections 1.5, 1.13, and 3.2.
Review the scikit-learn User Guide, Section 2.3 (Clustering), and the API documentation for KMeans.
 
Week 7 - May 12, 2020
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Clustering



Lecture Videos (D2L)

Clustering Concepts & Algorithms
 
Clustering Examples - Jupyter Notebooks (see below for the Notebooks covered in this video).

Examples (Notebooks)

- K-means Clustering

- Document Clustering


Review the Wikipedia pages on Cluster Analysis, including the articles on K-means Clustering and Hierarchical Clustering.
 
Read Chapters 13 and 14 of MLA.
Week 8 - May 19, 2020
Lecture Material Resources Assignments/Readings
Topic: Unsupervised Learning
Principal Component Analysis
SVD (Singular Value Decomposition),  Recommender Systems, and Matrix Factorization

Lecture Videos (D2L)

Principal Component Analysis (39 mins)

Singular Value Decomposition (26 mins)

Recommender Systems & Matrix Factorization (55 mins)

Recommender Systems & The Netflix Challenge (30 mins)

Examples (Notebooks)

- Basic PCA Example

- Document Clustering, PCA, and SVD

- Item-Based Rec Test

- Joking with Matrix Factorization

Read Chapters 11 and 14 of MLA.
Read: Matrix Factorization: A Simple Tutorial and Implementation in Python, by Albert Au Yeung.
Week 9 - May 26, 2020
Lecture Material Resources Assignments/Readings
 
Support Vector Machines

Lecture Videos (D2L)

Support Vector Machines - Concepts (28 mins)

Support Vector Machines - Jupyter Notebook (28 mins) - See Below

Model Selection with News Group Data (32 mins) - See Below

Examples (Notebooks)

- Support Vector Machines

- Model Selection on Newsgroup Data



Review the Final Project Checklist.
Review the scikit-learn User Guide, Section 1.4 (Support Vector Machines).
Week 10 - June 02, 2020
Lecture Material Resources Assignments/Readings
 
Ensemble Methods
Brief Course Summary
Lecture Videos (D2L)

Ensemble Classification (62 mins)

Examples (Notebooks)

- Ensemble Classification



Review the scikit-learn User Guide, Section 1.11 (Ensemble Methods).
Review the Final Project Checklist.
   
Final Projects Due on Tuesday, June 9, 2020

Copyright ©, Bamshad Mobasher, DePaul University.