ECT 584
Spring 2015


 Online Resources & Reference Material


WEKA-Related Resources

Data Mining Resources & Reference Material

Data Sets and Sources of Data

  • Preprocessed DePaul CTI Web Usage Data - Cleaned, filtered, and sessionized data of visits to the main CTI site during a two-week period in April 2002. The data also includes basic statistics on users and sessions.
  • Cleaned DePaul CTI Web Usage Data - The full cleaned CTI Web usage data for April 2002. This data set has been cleaned (including spider removal) and converted into tab-delimited format. However, no user identification, sessionization, or other data preparation steps have been performed.
  • Non-Preprocessed DePaul CTI Web Usage Data - The full CTI Web usage data for April 2002. The only cleaning step performed on this data was the removal of references to auxiliary files (e.g., image files). No other cleaning or preprocessing has been performed. The data is in the original log format used by Microsoft IIS.
  • UCI Machine Learning Repository - A repository of more than 200 data sets for machine learning and data mining.
  • Movie Ratings Data - Real movie ratings data from a Web site. Contains ratings on more than 1,600 movies by 1,000 users.
  • Competition Data Sets - Data sets from a variety of competitions. Also a good source for class project ideas.
  • Stanford Large Network Dataset Collection - A variety of network data sets, including data from social networks, product reviews, online communities, etc.
  • Yelp Data Set Challenge - Reviews and check-in data on thousands of businesses.
  • Million Song Dataset - Freely available collection of audio features and metadata for a million contemporary popular music tracks.
  • Public Data sets on Amazon Web Services - Large public data sets (including data sets for the US Census, Wikipedia, Freebase, and the Human Genome Project), ready for big data analytics in the cloud.
  • - Publicly available data sets from federal, state, and local governments, including economic, geological, demographic, and many other types of data sources. This site also includes a list of other Open Data Sites with similar publicly available data sources from various cities, states, and countries.
  • KDnuggets' list of data sets for data mining
  • Infochimps Data Market - Thousands of data sets, including data from various social networks and collaborative tagging sites such as Twitter, Delicious, MusicBrainz, as well as data sets from many other domains.
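
The cleaned CTI usage data listed above is described as tab-delimited with no sessionization applied. As a rough illustration of what that preparation step involves, here is a minimal Python sketch of timeout-based sessionization. Everything specific in it is an assumption made for illustration and is not taken from the actual files: the three-column layout (user key, ISO timestamp, URL), the use of an IP address as a stand-in for user identity, the sample rows, and the 30-minute inactivity threshold (a common heuristic).

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; a common heuristic, not a value from the data set.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one tab-delimited line: user<TAB>ISO timestamp<TAB>url.

    This column layout is hypothetical; check the real file before reusing it.
    """
    user, stamp, url = line.rstrip("\n").split("\t")
    return user, datetime.fromisoformat(stamp), url


def sessionize(rows, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) hits into per-user sessions.

    A new session starts whenever the gap between two consecutive hits
    by the same user exceeds `timeout`.
    """
    sessions = {}  # user -> list of sessions; each session is a list of (timestamp, url)
    for user, ts, url in sorted(rows, key=lambda r: (r[0], r[1])):
        user_sessions = sessions.setdefault(user, [])
        last_hit_time = user_sessions[-1][-1][0] if user_sessions else None
        if last_hit_time is not None and ts - last_hit_time <= timeout:
            user_sessions[-1].append((ts, url))
        else:
            user_sessions.append([(ts, url)])
    return sessions


# Made-up sample rows in the assumed layout.
sample = [
    "10.0.0.1\t2002-04-01T09:00:00\t/index.html",
    "10.0.0.1\t2002-04-01T09:05:00\t/courses/",
    "10.0.0.1\t2002-04-01T11:00:00\t/index.html",
    "10.0.0.2\t2002-04-01T09:01:00\t/news/",
]
result = sessionize(parse_line(line) for line in sample)
```

With these sample rows, the first user's hits split into two sessions because of the gap of nearly two hours before the third request. Real usage logs generally need a user-identification step first, since an IP address alone can mix several visitors.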


Copyright © 2014-2015, Bamshad Mobasher, DePaul University.