DSC 478
Fall 2025

 Online Resources & Reference Material

General Python Resources

Important Tools and Libraries

  • Jupyter Notebook Documentation
  • Jupyter Notebook Tutorial - A nice tutorial video by Corey Schafer.
  • matplotlib: A very nice plotting library, capable of generating production-quality visualizations programmatically. Its MATLAB-like syntax makes plotting very easy.
  • NumPy: The fundamental package for scientific computing with Python.
  • SciPy: An open-source library for mathematics, science, and engineering.
  • scikit-learn: A robust machine learning library built on top of NumPy, SciPy, and matplotlib. Includes a wide variety of modeling techniques.
  • Pandas (Python Data Analysis Library): Data structures and tools for common data analysis tasks, including an efficient data frame implementation (similar to R's data frames).
  • NLTK: Natural Language Toolkit for Python, including tools for text preprocessing, tokenization, and vectorization (you may also be interested in an online book that shows how NLTK is used).
  • BeautifulSoup: A general parsing library particularly useful for parsing HTML and XML.
  • NetworkX: A Python library for the creation, manipulation, and analysis of graphs and networks.
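As a rough illustration of how two of the libraries above fit together, the sketch below builds a small pandas DataFrame, uses NumPy for vectorized arithmetic on its columns, and prints summary statistics. The column names and values are made up purely for illustration, and the snippet assumes NumPy and pandas are installed (both ship with Anaconda).

```python
import numpy as np
import pandas as pd

# Small made-up dataset (hypothetical values, for illustration only)
data = {
    "height_cm": [160.0, 172.5, 181.0, 168.0],
    "weight_kg": [55.0, 70.0, 82.5, 63.0],
}
df = pd.DataFrame(data)

# NumPy: vectorized arithmetic on the underlying arrays (no Python loop)
heights_m = df["height_cm"].to_numpy() / 100.0
df["bmi"] = df["weight_kg"].to_numpy() / np.square(heights_m)

# pandas: column-wise summary statistics, roughly like R's summary()
print(df.describe().round(2))
print("mean BMI:", round(df["bmi"].mean(), 2))
```

The same DataFrame could then be passed to scikit-learn or plotted with matplotlib, since both accept NumPy arrays and pandas objects.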

Installation of Python and Scientific Libraries

  • Anaconda - (Mac, Windows, Linux) Python distribution for large-scale data processing and scientific computing (includes scientific and data analysis libraries such as NumPy, Pandas, and scikit-learn, as well as Jupyter). This is the recommended package for this class.
  • Notepad++: An excellent Python-friendly text editor
  • Standalone Python Distributions
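After installing a distribution such as Anaconda, one quick way to confirm that the libraries listed on this page are available is a short standard-library script like the one below. It is a minimal sketch: `check_packages` is a hypothetical helper (not part of any library), and the mapping from import names to package names (for example, `sklearn` is published as `scikit-learn`, `bs4` as `beautifulsoup4`, and `notebook` is assumed to stand in for Jupyter) is an assumption you may need to adjust.

```python
import importlib.util
from importlib import metadata

# Assumed mapping: import name -> installed distribution (package) name
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "nltk": "nltk",
    "bs4": "beautifulsoup4",
    "networkx": "networkx",
    "notebook": "notebook",
}


def check_packages(packages):
    """Return {import_name: version string, or None if not importable}."""
    report = {}
    for import_name, dist_name in packages.items():
        # find_spec locates a top-level module without importing it
        if importlib.util.find_spec(import_name) is None:
            report[import_name] = None
            continue
        try:
            report[import_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under the assumed name
            report[import_name] = "installed (version unknown)"
    return report


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        print(f"{name:12s} {version or 'NOT FOUND'}")
```

Running it inside the Anaconda environment should list a version for each library; any line reading `NOT FOUND` points to a package that still needs to be installed.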

References for Data Analysis in Python

Other Relevant Tools & Resources

Data Sets



Copyright ©, Bamshad Mobasher, DePaul University.