CSC 478
Spring 2018

Course Syllabus

INSTRUCTOR

Bamshad Mobasher
Email: mobasher@cs.depaul.edu
Office: Loop Campus, CDM Building, Room 833
Phone: (312) 362-5174
Office Hours: Mondays 4:00-5:30 PM (or by appointment)

COURSE DESCRIPTION

The course will focus on implementing various data mining and machine learning techniques and applying them in a range of domains. The primary tools used in the class are the Python programming language and several associated libraries. Additional open-source machine learning and data mining tools may also be used as part of the class material and assignments. Students will gain hands-on experience developing supervised and unsupervised machine learning algorithms and will learn how to employ these techniques in the context of popular applications such as automatic classification, recommender systems, searching and ranking, text mining, group and community discovery, and social media analytics.

PREREQUISITES

CSC 401 and IS 467 (formerly IS 567)

TEXTBOOKS & COURSE MATERIAL

We will use numerous online resources and documents throughout the course. The required and recommended textbooks are listed below. The resources directly relevant to topics covered in the course are listed in the Course Material section. Additional resources can be found in the Online Resources section.

Required Text

Machine Learning in Action, by Peter Harrington, Manning Publications, 2012.

Also available at Amazon.

Recommended Texts

  • Python Data Science Essentials: Learn the Fundamentals of Data Science with Python, by Alberto Boschetti and Luca Massaron, Packt Publishing, 2015. Also available at Amazon.
  • Python for Data Analysis, by Wes McKinney, O'Reilly, 2012. Also available at Amazon.
  • Data Mining: Practical Machine Learning Tools and Techniques, by Ian Witten and Eibe Frank, 3rd Ed., Morgan Kaufmann, 2011. Available at Amazon.

GRADING & COURSE REQUIREMENTS

The structure and grading of the class will be centered on 4-5 assignments and a final project. The assignments will involve Python implementations of selected data mining techniques and their applications in various domains. They will typically include both programming components and problems related to the material covered in class. Some assignments may also involve the use of other open-source data mining tools. Assignments must be done individually unless otherwise specified. Late assignments will be penalized 10% per day (weekends count as one day).

The final project will be a more complex programming/implementation assignment that involves integrating multiple concepts and techniques. Students will be able to choose from several possible project ideas or propose their own. More details on the final project are available in the Project section.

The final grade will be determined (tentatively) based on the following components:

    Assignments = 65%
    Final Project = 35%
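As a small illustration of how the tentative weights combine, here is a minimal sketch that computes a pre-curve score from the two components. It assumes both components are reported on a 0-100 scale; the names `WEIGHTS` and `weighted_score` are hypothetical, and the curve and end-of-quarter adjustments described below are not modeled.

```python
# Tentative component weights from the syllabus. The 0-100 scale and the
# function name are illustrative assumptions, not an official grade calculator.
WEIGHTS = {"assignments": 0.65, "final_project": 0.35}


def weighted_score(assignments_avg: float, project_score: float) -> float:
    """Combine component scores (each assumed 0-100) into a pre-curve score."""
    return (WEIGHTS["assignments"] * assignments_avg
            + WEIGHTS["final_project"] * project_score)


# Example: an 88 assignment average and a 92 final project score.
print(round(weighted_score(88, 92), 2))  # 89.4
```

Because the weights sum to 1.0, the result stays on the same 0-100 scale as its inputs.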
Final grades will be curved. At the end of the quarter, some adjustments may be made based on overall class performance as well as signs of individual effort. Pluses and minuses will be given at the high and low ends of each grade range.

TENTATIVE LIST OF TOPICS

The following issues and topics will be covered throughout the course. Many of these topics will be revisited several times during the course in a variety of contexts.

  • Data Mining and Knowledge Discovery
    • The KDD process and methodology
    • Data preparation for knowledge discovery
    • Overview of data mining and machine learning techniques
    • Review of Python and overview of Python tools for Data Analysis
  • Supervised Techniques
    • Classification and Prediction using K-Nearest-Neighbor
    • Classifying with Probability Theory; Naïve Bayes
    • Building Decision Trees
    • Forecasting and Regression models
    • Evaluating predictive models
  • Unsupervised Learning
    • Clustering using K-Means
    • Association Rule discovery
    • Sequential Pattern Analysis
    • Principal Component Analysis and Dimensionality Reduction
  • Possible Applications (covered throughout the course)
    • Collaborative Recommender Systems
    • Content-based personalization
    • Predictive User Modeling
    • Concept Discovery from Documents, Blogs, Social Annotations
    • Finding groups using social or behavioral data
    • Building predictive models for target marketing
    • Customer or user segmentation
  • Advanced Topics (if time permits)
    • SVD and Matrix Factorization
    • Search and Optimization Techniques
    • Markov Models
    • Dealing with Big Data and MapReduce

Copyright © 2016-2019, Bamshad Mobasher, DePaul University