ECT 584 - Web Data Mining

ECT 584
Spring 2015

Final Project

Final Project Checklist - Information about what you need to submit for the final project.

In addition to assignments that will be completed throughout the quarter, you will be required to complete a final project in the class by the end of the quarter. There are two important due dates to keep in mind: The project proposals are due no later than Sunday, May 3. The project due date is Monday, June 8.

You may choose one of three types of projects: an implementation project, a research paper, or a data analysis project. A project proposal must be submitted and approved prior to proceeding with the project. The project proposal (1-2 pages) must include partner names (if applicable), project type (one of the 3 categories below), project description, proposed methodology, techniques, approaches, implementation choices, resources used, and the tentative schedule. In the case of a research paper, the proposal must contain an abstract, a short (but detailed) outline, and a list of reference sources to be used in the research. In the case of a data analysis project, the proposal must include a detailed description (and samples) of the data to be analyzed, the data mining problems to be solved, the techniques to be employed to solve the problems, and the tools to be used.

Note: Research papers must be done individually. Implementation or data analysis projects can be done individually or in groups of up to 3 students (depending on the scope and complexity of the project, and with prior approval).

More details on and examples of different project types are provided below.

Research Papers

Research papers involve doing an in-depth study, survey, or evaluation of one or more topics related to Web data mining. A research paper may examine the use of specific data mining or Web mining techniques in one or more application areas. A research paper must relate to one or more of the topics discussed in class, but must not be simply a summary of the material covered in class or the readings. The goal of such a "research project" is to go beyond the class material and examine one of the topics in much greater depth. The evaluation of the papers will be based on thoroughness (including adequate coverage of relevant issues/techniques as well as references to related work), soundness (including justification for any claims made, illustrative examples, and correct and adequate analysis of connections or relationships among concepts or techniques), clarity/organization, and significance (defined by the degree to which the paper covers new material, and the extent of the original work by the author in drawing conclusions and synthesizing scholarly work in this area). Note that a research paper must not simply be a concatenation of material from several other papers, but must include some original analysis of that work in the context of the paper topic. The suggested length for a research paper is 15-20 single-spaced pages.

Some examples of topics for research papers include but are not limited to the following:

Recommender Systems and Personalization: a comparative study of various techniques in data mining and machine learning to learn user profiles and predict future user behavior. The study should examine a variety of techniques and approaches used in the design of recommender systems (including collaborative filtering, content-based filtering, model-based approaches which use data mining, and hybrid recommender systems).
Web content mining/Text mining: a study of various techniques to mine information and patterns from semi-structured data on the web such as text, Web documents, news, reviews, etc. The paper may examine in detail some of the related topics such as the applications of text mining on the Web, information extraction and mining of data records from the Web, concept discovery from document collections, event detection and topic tracking, sentiment analysis and opinion mining, etc.
Web structure mining: a study of various techniques to mine knowledge from the linkage structure of the Web. Among the topics that can be explored in this study are applications of structure mining in information retrieval (such as Google's PageRank algorithm), algorithms based on the notions of Hubs and Authorities, and the automatic discovery of Web communities. The paper may also examine approaches based on graph analysis and mining used in social or information networks.
Data Mining on the Social Web: Social Web (Web 2.0) technologies allow users to connect, share resources, and actively generate content on Web sites. How can the rich data in social networking sites such as Facebook and Twitter, or in resource sharing sites such as Flickr, Last.fm, and Delicious, be mined to help users interact with these sites more effectively? Among the topics that can be explored are the use of Social Network Analysis algorithms to discover interesting relationships, the use of data mining and machine learning algorithms to predict user behavior or to recommend resources and users (friends), and applications for tag suggestion/recommendation in social tagging Web sites.
A study of platforms and approaches to mine and analyze data at large scale. This study must include an analysis of challenges in managing and leveraging large data repositories and various proposed and implemented solutions (such as Bigtable, MapReduce, and other approaches based on "cloud computing"). The study can also focus on implementation platforms that enable scalable data mining (e.g., Hadoop), and ideally provide illustrative examples or experiments with one or more of these platforms.
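
As a starting point for the Web structure mining topic listed above, here is a minimal sketch of PageRank computed by power iteration. The toy link graph, the function name pagerank, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for illustration; this is a textbook-style simplification, not the algorithm as deployed by any search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}.

    Illustrative sketch only: damping and iterations are assumed defaults.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes an equal share of its rank along every outlink.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page site used only to exercise the function.
TOY_GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    for page, score in sorted(pagerank(TOY_GRAPH).items()):
        print(page, round(score, 3))
```

In this toy graph, page C receives links from every other page and page D receives none, so C should end up with the highest score and D with the lowest. A research paper could contrast this with a Hubs and Authorities computation on the same graph.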

Data Analysis Projects

Data Analysis projects involve the application of data mining and Web mining techniques discussed in class or in readings to one or more specific data sets. The goal of DA projects is to go through the full data mining cycle with respect to a particular data set (including the specification of the business problem to be solved, the specification of the data mining tasks to be performed, selection, preprocessing, integration, and transformation of the data, application of several DM tasks and the discovery of patterns, evaluation of patterns, and recommending specific actions with respect to relevant findings). A DA project may involve the application of data mining in a particular domain and data set with which you are familiar, such as your work, the Web, e-commerce, etc. The final report should include a detailed analysis of the complete scenario for the application of the KDD process, including specification of the DM problem (based on application objectives), data collection, data preparation, pattern discovery using a variety of data mining and statistical techniques, interpretation of results, and conclusions.

Some examples of data analysis projects include:

Performing data mining on Web usage (or e-commerce) data from a particular Web site in order to analyze the behavior of users, including various site metrics, user metrics, user segments, associations, and opportunities for personalization. The project plan must include all aspects of Web usage preprocessing discussed in class. Note: in lieu of a real data set from another source, you can use the CTI Web usage data available from the Online Resources section.
Performing the full KDD cycle on a real data set (other than Web usage or e-commerce data). Examples of such data include (but are not limited to) user or customer data from a real business or organization, census or demographic data, sensor data for device diagnostics, network traffic data, music playlist data obtained from music sharing Web sites, and social networking data (such as social tags obtained from sites such as del.icio.us or last.fm).
Examination of one or more specific commercial or freely available data mining packages, other than those used in class. In this option you must be able to install and experiment with the system. The package must be compared in detail with other comparable products. The final report must include a technical evaluation and provide a critical analysis of the results of applying various KDD capabilities provided by the software on at least two realistic data sets, such as test data sets available from the UCI KDD Archive.

Implementation Projects

The implementation projects may involve implementing and performing experimental evaluation on one or more techniques discussed in the course (e.g., clustering, association rules, classification, etc.), or combining various data mining techniques (possibly using available tools) into a Web data mining solution for a specific problem.

Some specific examples of implementation projects include:

Implementing or extending one of the data mining techniques discussed in class (or related techniques) and testing the implementation on various test data sets, such as Web usage or e-commerce data or other test data sets available from the UCI KDD Archive.
Implementing or extending the techniques discussed in class for preprocessing of Web usage data, including user/session identification, path completion, automatic discovery and filtering of robot navigation, and pageview identification. Ideally, this should be implemented as an extension of the WEKA data mining package, so that the results of preprocessing can be directly used as input for DM components.
Designing and implementing a data warehouse for integration and management of Web usage, structure, content, and e-commerce data, and analyzing this data by performing OLAP queries against the data warehouse, and using the results as input to data mining algorithms.
Implementing a system to analyze the effectiveness of a Web site by comparing the site structure to the navigational behavior of users, analyzing site and user e-metrics, and predicting user behavior for individual users or segments of users.
Implementing a recommender system based on usage mining, content filtering, or collaborative filtering techniques discussed in class.
Designing an automated classification tool that uses machine learning to automatically identify and classify robot navigation sessions from Web usage log files.
Designing a query language for querying interesting rules or patterns resulting from Web usage or Web content data.
Developing your own mail, news, or Web information filtering agent (e.g., an agent that extracts information about a particular topic/product from specific sites). The agent design must include one or more machine learning and data mining techniques such as classification, clustering, association rule mining, Markov models, etc.
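
To make the Web usage preprocessing idea above more concrete, the following is a minimal sketch of three of the steps it names: filtering of robot requests, pageview identification, and user/session identification. Everything specific here is an assumption for illustration: the record layout, the sample data, identifying a user by the (IP, user agent) pair, the 30-minute inactivity timeout, and the short lists of robot markers and non-page file suffixes. Path completion is not covered.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log records: (client IP, user agent, timestamp, requested URL).
SAMPLE_LOG = [
    ("10.0.0.1", "Mozilla/5.0", "2015-04-01 10:00:00", "/index.html"),
    ("10.0.0.1", "Mozilla/5.0", "2015-04-01 10:05:00", "/courses.html"),
    ("10.0.0.1", "Mozilla/5.0", "2015-04-01 11:00:00", "/index.html"),
    ("10.0.0.2", "Googlebot/2.1", "2015-04-01 10:01:00", "/robots.txt"),
    ("10.0.0.3", "Mozilla/5.0", "2015-04-01 10:02:00", "/style.css"),
    ("10.0.0.3", "Mozilla/5.0", "2015-04-01 10:02:01", "/people.html"),
]

SESSION_TIMEOUT = timedelta(minutes=30)        # assumed heuristic threshold
ROBOT_MARKERS = ("bot", "crawler", "spider")   # assumed, far from exhaustive
NON_PAGE_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".png", ".ico")


def is_robot(agent):
    """Crude robot check based only on the user agent string."""
    agent = agent.lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Return a list of ((ip, agent), [urls]) sessions."""
    by_user = defaultdict(list)
    for ip, agent, stamp, url in records:
        # Drop robot requests and embedded resources that are not pageviews.
        if is_robot(agent) or url.lower().endswith(NON_PAGE_SUFFIXES):
            continue
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_user[(ip, agent)].append((when, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's pageviews by time
        current = [hits[0][1]]
        for (prev_time, _), (time, url) in zip(hits, hits[1:]):
            # A gap longer than the timeout starts a new session.
            if time - prev_time > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


if __name__ == "__main__":
    for user, urls in sessionize(SAMPLE_LOG):
        print(user[0], urls)
```

With the sample data, the first client's 55-minute gap splits its requests into two sessions, the crawler is dropped, and the stylesheet request is not counted as a pageview. A real project would replace the user agent heuristic with the robot detection methods discussed in class and would read records from actual server logs.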