CSC 478 - Programming Data Mining Applications - Schedule

DSC 478
Fall 2020

Course Material

## Schedule and Class Material

Week 1 - Sep 09, 2020
Topics:

 * Introduction to the Course
 * Overview of Data Mining and Knowledge Discovery Process
 * Brief Python Review and an Overview of NumPy
 * Brief Pandas Tutorial
 * Lecture Videos (D2L):
   1- Brief Course Overview
   2- Overview of Data Mining & KDD Process (32 mins)
   3- Basics of Python & NumPy (44 mins)
   4- Population Example - Part 1: NumPy (34 mins)
   5- Population Example - Part 2: Pandas (26 mins)
 * Class Examples (Notebooks):
   - Python/NumPy Basics
   - Populations
   - Populations with Pandas
 * Related Files for Examples:
   - populations.txt
 * Install and test a Python distribution. Ideally, install the Anaconda distribution, which automatically installs all of the libraries used in this class.
 * Familiarize yourself with IPython and, in particular, Jupyter Notebook. Corey Schafer's 30-minute Jupyter Notebook Tutorial video is a good introduction.
 * Go through the "Quick Tutorial" in the NumPy User Guide and try to follow the examples on your own, using Jupyter Notebook as the shell.
 * Review Section 1 of the Python Scientific Lecture Notes.
Week 2 - Sep 16, 2020
Topics:

 * Brief Pandas Tutorial (Cont.)
 * Understanding Characteristics of Data [Slides] [Video (35 mins)]
 * Data Preparation and Preprocessing [Slides] [Video (50 mins)]

 * Lecture Videos (D2L):
   1- Understanding Data Characteristics (35 mins)
   2- Data Preparation & Preprocessing (50 mins)
   3- Preprocessing w/ Pandas: Video Store Example (63 mins)
 * Examples (Notebooks):
   - Video Store with Pandas
   - Video Store (Missing Values)
 * Related Files for Examples:
   - Video_Store.csv
 * Familiarize yourself with Pandas basics. A good place to start is the Pandas Tutorials page in the Pandas Documentation. You might also review "Python Pandas Tutorial: A Complete Introduction for Beginners".
 * In the Matplotlib User Guide, read the Matplotlib Pyplot Tutorial.
 * Read Section 1.4 of the Python Scientific Lecture Notes on Matplotlib.
Week 3 - Sep 23, 2020
Topics:

Review Material:
 * Classification & Prediction - Review of Basic Concepts [Video (43 mins)] [Slides]
 * Lecture Videos (D2L):
   1- Distances, Similarities, & KNN Classification (69 mins)
   2- KNN Search Example Notebooks (35 mins)
   3- Review of Classification Basic Concepts (43 mins)
   4- KNN Classification - Video Store Example (33 mins)
 * Examples (Notebooks):
   - KNN Search Example 1
   - KNN Search Example 2
   - Video Store (KNN Classifier)

 * Read Chapter 2 of Machine Learning in Action (MLA).
Week 4 - Sep 30, 2020
Topic: Supervised Learning

 * Text Categorization

Review Material:

 * Decision Trees [Video (41 mins)] [Slides]
 * Bayesian Classification [Video (32 mins)] [Slides]
 * Lecture Videos (D2L):
   1- Text Categorization (46 mins)
   2- TF*IDF and Document Categorization (21 mins)
   3- Decision Tree Learning (41 mins)
   4- Bayesian Classification (32 mins)
   5- Classification using Scikit-learn (61 mins)
 * Examples (Notebooks):
   - TF*IDF and Document Categorization
   - Video Store (Scikit-learn)
 * Read Chapters 3 and 4 of MLA.
 * Read the scikit-learn user guide, Sections 1.2, 1.6, 1.9, 1.10, and 4.3.
Week 5 - Oct 07, 2020
Topic: Supervised Learning
 * Classification (continued)

 * Notes on Assignment 2
 * Notes on Personalization and Recommender Systems

 * Lecture Videos (D2L):
   1- Classification using Scikit-learn (61 mins)
   2- Notes on Assignment 2 (55 mins); see the PPT slides in the left column
 * Examples (Notebooks):
   - Video Store - Scikit-learn (Part 2); Part 1 is posted above in Week 4
 * Other Relevant Resources:
   - Recommender Systems Wiki
   - Recommender-Systems.org
 * Read Chapter 8 of MLA.
 * Review the scikit-learn user guide, Sections 3.1, 3.3, and 3.5.
 * Review the scikit-learn user guide, Section 1.1 (Linear Models).
 * Read the Recommender Systems article in the Encyclopedia of Machine Learning.
 * Read the Wikipedia article on Collaborative Filtering.
Week 6 - Oct 14, 2020
Topics: Supervised Learning
 * Model Selection & Optimization:
   - Gradient Descent Optimization
   - Feature Selection
   - Parameter Selection

 * Lecture Videos (D2L):
   1- Basic Regression Analysis (34 mins)
   2- Regression Using Scikit-learn (44 mins)
   3- Feature / Model Selection Strategies (46 mins)
   4- Gradient Descent Optimization (35 mins)
 * Examples (Notebooks):
   - Regression Analysis using Scikit-learn
   - Feature / Model Selection Strategies
   - Gradient Descent Optimization
 * Read Chapter 10 of MLA.
 * Review the scikit-learn user guide, Sections 1.5, 1.13, and 3.2.
 * Review the scikit-learn user guide, Section 2.3 (Clustering), and the API documentation for KMeans.
Week 7 - Oct 21, 2020
Topic: Unsupervised Learning
 * Clustering

Video: Additional Notes on Assignment 3 (30 mins)
- PPT slides for the video

 * Lecture Videos (D2L):
   1- Clustering Concepts and Algorithms (29 mins)
   2- Clustering Example Jupyter Notebooks (34 mins)
   3- Additional Notes on Assignment 3 (30 mins) [PPT slides for the video]
 * Examples (Notebooks):
   - K-means Clustering
   - Document Clustering

 * Review the Wikipedia pages on Cluster Analysis, including the articles on K-means Clustering and Hierarchical Clustering.
 * Read Chapters 13 and 14 of MLA.
Week 8 - Oct 28, 2020
Topic: Unsupervised Learning
 * Lecture Videos (D2L):
   1- Principal Component Analysis - PCA (35 mins)
   2- Singular Value Decomposition (22 mins)
   3- Recommender Systems & Matrix Factorization (49 mins)
 * Examples (Notebooks):
   - Basic PCA Example
   - Document Clustering, PCA, and SVD
   - Item-Based Rec Test
   - Joking with Matrix
 * Read Chapters 11 and 14 of MLA.
 * Read "Matrix Factorization: A Simple Tutorial and Implementation in Python" by Albert Au Yeung.
Week 9 - Nov 04, 2020

Also:

- Using Machine Learning Pipelines in Scikit-learn
- More examples of Model Optimization
 * Lecture Videos (D2L):
   1- Support Vector Machines - Basics (24 mins)
   2- SVM - Jupyter Notebook Example (22 mins)
   3- Model Selection on News Group Data (28 mins)
 * Examples (Notebooks):
   - Support Vector Machines
   - Model Selection on Newsgroup Data

 * Review the Final Project Checklist.
 * Review the scikit-learn user guide, Section 1.4 (Support Vector Machines).
Week 10 - Nov 11, 2020