CSC 575
Winter 2017

Intelligent Information Retrieval

Class Project

Notes:

  • Final Project Checklist - Information about what you need to submit for the final project.
  • Each group or individual must submit a specific project proposal for approval no later than February 13. Written research projects must be done individually. Implementation projects can be done in groups of up to 3, depending on the size and complexity of the project. Note that the size and makeup of each group must also be approved along with the project proposal.
  • Due date for final project: Monday, March 13.

The following is a list of ideas for the class project. You may choose any of these ideas or their variations. You may also choose to combine parts of these projects, or come up with your own idea. In all cases, however, your project idea is subject to approval based on a project proposal that specifies a set of project requirements and deliverables.

Written Projects

Written projects involve doing an in-depth study, survey, or evaluation of one or more topics related to information retrieval and filtering. The project can take the form of a research paper examining the use of a specific technique or model in various IR systems, or it can be a detailed case study involving two or more existing IR systems. In either case, the paper should contain a summary and a technical evaluation of the state of the art related to the particular topic studied. If the paper involves a case study, then a thorough comparative evaluation with other similar systems must be provided. A research paper should present a new idea or provide a detailed survey of methods to solve a specific IR-related problem. The approach presented should be, at least in part, a novel and original contribution, and should ideally be evaluated experimentally. A research paper could be a good start for a Masters or Ph.D. research project. The maximum length for written projects is 20 single-spaced pages (12 point font), including figures and references. The evaluation of the papers will be based on clarity, thoroughness, and soundness of the ideas and concepts presented, as well as the overall organization of the paper.

Note: A written project should not simply be a summary of material covered directly in the lectures, but rather should go beyond this material in one or more specific areas related to it. The following is a non-exhaustive list of ideas for a written project (very broadly stated):

  • Personalized Search: A study of various techniques and approaches used to create personalized search applications on the Web. The study should include a survey of techniques for re-ranking or filtering search results based on user profiles, as well as intelligent agents that take into account user characteristics or profiles to assist users in search.

  • Web IR Based on Hyperlink Analysis and Mining: A study of various techniques for Web IR based on hyperlink analysis and mining. The study should include an examination of techniques that use linkage as a measure of the authority of an information source (e.g., the HITS or PageRank algorithms), as well as other techniques that use ratings or popularity as measures of quality or authority.

  • A comparative study of implementation techniques for scalable information retrieval on large-scale search engines or Web-based information systems (such as Google, Facebook, etc.). This study must include an analysis of the challenges in managing and leveraging large data repositories and of various proposed and implemented solutions (such as Bigtable, MapReduce, and other approaches based on "cloud computing"). The study can also focus on implementation platforms that enable scalable retrieval (e.g., Hadoop).

  • A study of social network analysis (SNA) and its use in information retrieval. This study should include a detailed summary of various SNA techniques and how they are used to provide relevant information to users in online social networks and/or traditional search engines.

  • Web Content Mining: A study of various techniques to mine information and patterns from semi-structured data on the Web. Examples include the use of agents designed to extract specific types of information (e.g., shopping agents), the use of XML to integrate available "meta-data" into current search technologies, Web data warehousing, etc.

  • Web Usage Mining: A study of the feasibility and effectiveness of techniques that incorporate Web usage data (e.g., clickthrough data, search query logs, and user behavior data) into search and retrieval, and of how such data can be used to develop more effective search engines.

  • Collaborative Filtering and Recommender Systems: A comparative study of various collaborative filtering techniques and their applications in several recommender systems. The study should include a technical summary of the techniques and an evaluation of existing methods in use today on the Web.

Implementation Projects

Implementation projects involve the development and evaluation of an original application using information retrieval, text mining, and/or machine learning techniques. The application must be tested and evaluated using appropriate test data sets. The application must also involve the use of one or more of the modeling techniques relevant to the course topics. Your application may also be a significant extension of an existing application or technique discussed in the class materials or other sources (in this case, the application must be extended to include additional or more sophisticated features). The deliverables for the project must include the fully documented code, distribution files (including any third-party sources), installation/deployment documents (including examples, screenshots of test runs, etc.), the data used for the evaluation of the application, and a detailed project report describing the components of the application and the results of the evaluation. Many different types of applications are possible; some examples include (but are not limited to):

  1. Build your own search/retrieval system:
    • Should include implementations of the basic components, including separate crawler, indexer, and query processing components (with a reasonable query interface).
    • Should work on a local document corpus in a directory structure or as a Web search engine (applied to a limited set of Web sites or to a specific domain).
    • The indexing component should parse and index documents using an inverted file format with relevant term frequency information.
    • Should make use of stemming and stop lists (you can use existing tools for this part).
    • The system should use TF-IDF weights (and possibly additional weighting schemes) for index terms
    • The base implementation should use the vector-space model with cosine similarity for matching queries against indexed documents. Optionally, you can implement other retrieval models such as probabilistic models or models based on link analysis.
    • It should be possible to save the index to offline storage and reload it for subsequent retrieval sessions (during a retrieval session, the search engine should run in the background as a server process and handle incoming queries).
    • Optional components or functionality can be added depending on the desired features or complexity of the project, including: additional weighting schemes, part-of-speech tagging, phrase indexing, n-gram indexing, proximity operators, personalized search, and relevance feedback.
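
As a rough illustration of the indexing and matching requirements above, the following minimal Python sketch builds an in-memory inverted index with term frequencies, applies TF-IDF weights, and ranks documents by cosine similarity. Everything in it is an assumption made for illustration: the function names (tokenize, build_index, search), the toy stop list, the raw-tf times log(N/df) weighting, and the sample documents. Stemming, crawling, index persistence, and the server process are omitted.

```python
import math
from collections import Counter, defaultdict

# Toy stop list (an assumption); a real system would use an existing tool.
STOP_WORDS = {"a", "an", "the", "of", "and", "in", "for", "to", "is"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (t.strip(".,;:!?()\"'") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_index(docs):
    """Build {term: {doc_id: tf}}, idf weights, and document vector norms."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    n_docs = len(docs)
    idf = {term: math.log(n_docs / len(postings))
           for term, postings in index.items()}
    squared = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            squared[doc_id] += (tf * idf[term]) ** 2
    norms = {doc_id: math.sqrt(v) for doc_id, v in squared.items()}
    return index, idf, norms

def search(query, index, idf, norms):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    q_weights = {t: tf * idf[t]
                 for t, tf in Counter(tokenize(query)).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    dots = defaultdict(float)
    for term, q_w in q_weights.items():
        for doc_id, tf in index[term].items():
            dots[doc_id] += q_w * tf * idf[term]
    results = []
    for doc_id, dot in dots.items():
        denom = q_norm * norms[doc_id]
        if denom > 0:
            results.append((doc_id, dot / denom))
    return sorted(results, key=lambda pair: -pair[1])

# Hypothetical three-document corpus.
docs = {
    "d1": "Information retrieval with inverted index structures",
    "d2": "Cooking pasta with tomato sauce",
    "d3": "Index compression for large retrieval systems",
}
index, idf, norms = build_index(docs)
for doc_id, score in search("inverted index retrieval", index, idf, norms):
    print(doc_id, round(score, 3))
```

Only documents that share at least one term with the query are scored, which is the main reason for using an inverted index rather than comparing the query against every document vector.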

  2. Implement a personalized information filtering system:
    • Your system should provide the capability for selective dissemination of information based on a user profile.
    • The system should obtain and subsequently update a user's profile represented as a set of topics (e.g., using a vector-space representation).
    • Based on the user profile, the system (in the background) should search for items of interest to the user. Depending on the type of target domain, these items could be interesting Web pages, news stories, blog posts, tweets or posts on other social networking sites, or even objects of interest (movies, books, consumer items, etc.). The application can be a general information filtering agent, or an agent designed to work in a specific target domain (e.g., a personalized shopping agent, a news filtering agent, etc.).
    • The user's profile should be updated when the user provides feedback on one or more of the recommended items (e.g., using relevance feedback).
    • The system should minimally include a component to create and maintain an index of the items/documents selected, a component to maintain and update a user profile, and a component to search selected Web sites in the background for items of interest.
    • Optionally, the system can include additional features such as clustering and categorization of the items selected for the user, the ability to search for items similar to a recommended item selected by the user (e.g., a "more like this" capability), etc.
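
One possible way to realize the profile update step is a Rocchio-style relevance feedback rule applied to sparse term-weight vectors. The sketch below is only an illustration under stated assumptions: the function name update_profile, the dictionary representation of profiles and items, and the parameter values (alpha=1.0, beta=0.75, gamma=0.15) are choices made here, not requirements of the project.

```python
def update_profile(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update of a user profile.

    profile, and each item in relevant / non_relevant, is a dict
    mapping a term to its weight. Terms whose updated weight is not
    positive are dropped from the new profile.
    """
    terms = set(profile)
    for vec in relevant + non_relevant:
        terms.update(vec)

    updated = {}
    for term in terms:
        # Average weight of the term in items the user liked / disliked.
        pos = (sum(v.get(term, 0.0) for v in relevant) / len(relevant)
               if relevant else 0.0)
        neg = (sum(v.get(term, 0.0) for v in non_relevant) / len(non_relevant)
               if non_relevant else 0.0)
        weight = alpha * profile.get(term, 0.0) + beta * pos - gamma * neg
        if weight > 0:
            updated[term] = weight
    return updated

# Hypothetical feedback: one liked item, one disliked item.
profile = {"python": 1.0}
liked = [{"python": 1.0, "search": 1.0}]
disliked = [{"cooking": 1.0}]
print(update_profile(profile, liked, disliked))
```

The updated profile can then be matched against newly crawled items with the same similarity measure used for retrieval, so that feedback gradually shifts which items are surfaced.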

  3. Design an enhanced user interface for a retrieval system:
    • Your interface should help guide the user in formulating a query. You can explore options such as the use of a classification hierarchy (such as Yahoo's category labels), providing the capability for natural language queries (possibly through the use of WordNet and basic natural language processing tools such as part-of-speech tagging), adding context-awareness by maintaining a user profile (based on past searches or other types of preference elicitation) in order to reduce ambiguity in queries, etc.
    • Your system should also provide an enhanced interface for the user to browse the retrieved documents, along with mechanisms such as relevance feedback and query by example.
    • Finally, the system should have the ability to cluster the retrieved documents (preferably using hierarchical clustering) and present the clusters to users for easier navigation and browsing.
    • For this project you don't have to implement your own indexing and matching algorithms. However, you may need to modify an existing system (with source code) to incorporate the additional capabilities. You may also need to post-process the documents retrieved as a result of a search.
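
The clustering requirement above could be prototyped with a small agglomerative (bottom-up hierarchical) procedure run over the retrieved documents. The following sketch uses single-link merging with cosine similarity and stops when no pair of clusters is similar enough. The function names, the similarity threshold of 0.3, and the toy term vectors are assumptions for illustration; an existing clustering library would be a reasonable substitute.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def agglomerative(vectors, threshold=0.3):
    """Single-link agglomerative clustering.

    Repeatedly merges the two most similar clusters until the best
    similarity falls below threshold. Returns the final clusters and
    the merge history (which can be shown to users as a hierarchy).
    """
    clusters = [[doc_id] for doc_id in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vectors[x], vectors[y])
                          for x in clusters[i] for y in clusters[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break
        merges.append((clusters[i], clusters[j], sim))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical results for an ambiguous query such as "jaguar".
vectors = {
    "d1": {"jaguar": 1, "car": 2},
    "d2": {"car": 1, "engine": 1},
    "d3": {"jaguar": 1, "cat": 2},
    "d4": {"cat": 1, "jungle": 1},
}
clusters, merges = agglomerative(vectors)
print(clusters)
```

This pairwise search is quadratic or worse in the number of documents, which is usually acceptable when clustering only the top few dozen results of a query.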

  4. Build a simple recommender system:
    • Allow multiple users to access a server and rate items based on their preferences (e.g., movies, music, Web pages, etc.).
    • Use collaborative filtering technology (or other profiling techniques such as clustering) to find groups of similar users.
    • Based on the ratings of other similar users, create dynamic recommendations for the current user of the system.
    • Alternatively, or in conjunction with collaborative filtering, you may use content-based filtering approaches that compare items in a user's profile with other similar items as a way to generate recommendations.
    • Many different variations of this idea are possible.
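
To make the collaborative filtering idea concrete, here is a minimal user-based sketch: it finds the k users most similar to the current user (cosine similarity over rating vectors, treating unrated items as 0) and predicts ratings for unseen items as a similarity-weighted average of those neighbors' ratings. The function names, the choice of cosine rather than Pearson correlation, the value k=2, and the sample ratings are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def cosine(r1, r2):
    """Cosine similarity between two users' rating dicts."""
    dot = sum(v * r2.get(item, 0.0) for item, v in r1.items())
    n1 = math.sqrt(sum(v * v for v in r1.values()))
    n2 = math.sqrt(sum(v * v for v in r2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def recommend(user, ratings, k=2, top_n=3):
    """Recommend unseen items using the k most similar users."""
    neighbors = sorted(
        ((cosine(ratings[user], r), other)
         for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    totals = defaultdict(float)
    weights = defaultdict(float)
    for sim, other in neighbors:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                totals[item] += sim * rating
                weights[item] += sim
    predictions = {item: totals[item] / weights[item] for item in totals}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predictions.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"Alien": 5, "Blade Runner": 4},
    "bob": {"Alien": 5, "Blade Runner": 5, "Arrival": 4},
    "carol": {"Titanic": 5, "Notting Hill": 4, "Alien": 1},
    "dave": {"Alien": 4, "Blade Runner": 4, "Arrival": 5, "Titanic": 2},
}
for item, predicted in recommend("alice", ratings):
    print(item, round(predicted, 2))
```

A content-based variant would replace the user-to-user similarity with similarity between item descriptions (for example, TF-IDF vectors of item text) and the items already in the user's profile.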


Copyright © 2016-2017, Bamshad Mobasher, DePaul University.