| CSC 575
 Winter 2023
 
 
 
 
 | Intelligent
Information
Retrieval
 Class Project
Notes: 
  
Final Project Checklist - Information about what you need to submit for the final project.
Projects may be done in groups of up to 3, depending on the size and
complexity of the project. Each group or individual must submit a specific project proposal to
be approved by February 11. Note that the size and makeup of each group must also be approved
along with the project proposal. Due date for final project: Wednesday, March 15.
 
The following is a list of ideas for the class project. You may
choose any of these ideas or their variations. You may also
combine parts of these projects, or come up with your own idea. In
all cases, however, your project idea is subject to approval based on a project
proposal that specifies a set of project requirements and deliverables.
Implementation Projects
Implementation projects involve the development and evaluation of an original
application using information retrieval, text mining, and/or machine learning
techniques. The application must be tested and evaluated using appropriate test
data sets. The application must also involve the use of one or more of the
modeling techniques relevant to the course topics. Your application may also
be a significant extension of an existing application or technique
discussed in class materials or other sources (in this case, the application
must be extended to include additional or more sophisticated types of features).
The deliverables for the project must include the fully documented code,
distribution files (including any third-party sources), installation/deployment
documents (including examples, screenshots of test runs, etc.), the data used for
the evaluation of the application, and a detailed project report providing a
description of the components of the application and the results of the evaluation.
Many different types of applications are possible, but some examples of such
applications include (but are not limited to) the following.
Build your own search/retrieval system:
 
   Should include implementations of the basic components, including
     separate crawler, indexer, and query processing components (with a
     reasonable query interface).
   Should work on a local document corpus in a directory structure or as a
     Web search engine (applied to a limited set of Web sites or to a
     specific domain).
   The indexing component should parse and index documents using an inverted
     file format with relevant term frequency information.
   Should make use of stemming and stop lists (you can use
     existing tools for this part).
   The system should use TF-IDF weights (and possibly additional weighting
     schemes) for index terms.
   The base implementation should use the vector-space model with cosine
     similarity for matching queries against indexed documents. Optionally,
     you can implement other retrieval models such as probabilistic models or
     models based on link analysis.
   It should be possible to save the index to offline storage and reload it
     for subsequent retrieval sessions (during a retrieval session, the search
     engine should run in the background as a server process and handle
     incoming queries).
   Optional components or functionality can be added depending on the
     desired features or complexity of the project, including: additional
     weighting schemes, part-of-speech tagging, phrase indexing, n-gram indexing,
     proximity operators, personalized search, and relevance feedback.
Implement a personalized information filtering system:
 
   Your system should provide the capability for selective
     dissemination of information based on a user profile.
   The system should obtain and subsequently update a user's profile
     represented as a set of topics (e.g., using a vector-space representation).
   Based on the user profile, the system (in the background) should search for
     items of interest to the user. Depending on the type of target domain, these
     items could be interesting Web pages, news stories, blog posts, tweets or
     posts on other social networking sites, or even objects of interest (movies,
     books, consumer items, etc.). The application can be a general information
     filtering agent, or an agent designed to work in a specific target domain
     (e.g., a personalized shopping agent, a news filtering agent, etc.).
   The user's profile should be updated when the user provides feedback on one or
     more of the recommended items (e.g., using relevance feedback).
   The system should minimally include a component to create and maintain an
     index of items/documents selected, a component to maintain and update a user
     profile, and a component to search selected Web sites in the background for
     items of interest.
   Optionally, the system can include additional features such as clustering and
     categorization of items selected for the user; the ability to search for
     items similar to a recommended item selected by the user (e.g., "more like
     this" capability), etc.
Design an enhanced user interface for a retrieval system:
 
   Your interface should help guide the user in formulating a
     query. You can explore options such as the use of a
	 classification hierarchy (such as Yahoo's category labels), providing the 
	capability for natural language queries (possibly through the use
	 of WordNet and basic natural language processing tools such as 
	part-of-speech tagging), adding context-awareness by maintaining a user 
	profile (based on past searches or other types of preference elicitation) in 
	order to reduce ambiguity in queries, etc.
   Your system should also provide an enhanced interface for
     the user to browse the retrieved documents and provide mechanisms
     such as relevance feedback and query by example.
   Finally, the system should have the ability to cluster the retrieved
     documents (preferably using hierarchical clustering) and present the
     clusters to users for easier navigation and browsing.
   For this project you don't have to implement your own indexing
     and matching algorithms; however, you may need to modify an
     existing system (with source code) to incorporate the additional
     capabilities.
   You may also need to do post-processing of documents retrieved as a result
     of a search.
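
As an illustration of the clustering step mentioned above, the following is a minimal sketch, not a required design. It assumes the retrieved documents have already been tokenized, weights terms with a plain TF-IDF scheme (raw term frequency times log(N/df)), and merges clusters by average-link cosine similarity until no pair of clusters exceeds a threshold. The function names (`tfidf_vectors`, `cosine`, `agglomerative`), the average-link criterion, and the threshold are assumptions made for this example; other weighting schemes or linkage criteria are equally valid choices.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized doc."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agglomerative(vectors, threshold):
    """Average-link agglomerative clustering over document indices.

    Returns (clusters, merges). `merges` lists each merge as
    (cluster_a, cluster_b, similarity), which describes the hierarchy.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = pair
        merges.append((clusters[x], clusters[y], best))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters, merges


if __name__ == "__main__":
    # Hypothetical tokenized results for the ambiguous query "jaguar".
    results = [
        "jaguar car engine speed".split(),
        "jaguar car dealer price".split(),
        "jaguar cat jungle habitat".split(),
        "jaguar cat prey jungle".split(),
    ]
    clusters, merges = agglomerative(tfidf_vectors(results), threshold=0.05)
    print(clusters)
```

The merge history records which clusters were joined and at what similarity, which is enough to draw a simple tree in a browsing interface. Note that this naive version recomputes all pairwise similarities on every pass, so it is only suitable for small result sets.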
Build and evaluate a recommender system:
 
   Allow multiple users to access a server and rate items based on their
     preferences (e.g., movies, music, Web pages, etc.).
   Use different methods such as collaborative and content-based filtering
     (or other profiling techniques such as clustering).
   Based on the ratings of other similar users, create dynamic recommendations
     for the current user of the system.
   Alternatively, or in conjunction with collaborative filtering, you may use
     content-based filtering approaches that compare items in a user's profile
     with other similar items as a way to generate recommendations.
   Many different variations of this idea are possible.
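
One common way to realize the collaborative filtering idea above is a user-based nearest-neighbor approach. The sketch below is only one possible starting point, under several assumptions: ratings are held in an in-memory dictionary (`ratings[user][item]`), users are compared with Pearson correlation over co-rated items, and predictions are mean-centered weighted averages over the `k` most similar users who rated the item. All names and the data layout are hypothetical; a real project would add persistent storage, a server front end, and a proper evaluation.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items; 0.0 with fewer than two in common."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((ratings_b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def mean_rating(user_ratings):
    """Average of a user's ratings (assumes at least one rating)."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar raters of it."""
    base = mean_rating(ratings[user])
    neighbors = [
        (pearson(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep only positively correlated neighbors, most similar first.
    neighbors = sorted((n for n in neighbors if n[0] > 0), reverse=True)[:k]
    if not neighbors:
        return base  # fall back to the user's own average
    num = sum(sim * (ratings[o][item] - mean_rating(ratings[o])) for sim, o in neighbors)
    den = sum(sim for sim, _ in neighbors)
    return base + num / den


def recommend(ratings, user, n=3):
    """Return up to n unrated items, ordered by predicted rating (highest first)."""
    seen = set(ratings[user])
    candidates = {i for r in ratings.values() for i in r} - seen
    scored = sorted(((predict(ratings, user, i), i) for i in candidates), reverse=True)
    return [item for _, item in scored[:n]]
```

A content-based variant would keep the same `recommend` loop but replace `predict` with a similarity between item descriptions and the items already in the user's profile.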
Research Papers
Research projects involve doing an in-depth study, survey, or evaluation of one or more topics
related to information retrieval and filtering. The project can take the form of a
research paper
examining the use of a specific technique or model in various IR systems, or it can be a detailed
case study involving two or more existing IR systems. In either case, the paper should contain a
summary and a technical evaluation of the state of the art related to the particular topic studied.
If the paper involves a case study, then a thorough comparative evaluation with other similar
systems must be provided. A research
paper should present a new idea or provide a detailed survey of methods to solve a specific IR-related problem. The approach
presented should be, at least in part, a novel and original contribution, and should be
evaluated experimentally. A research paper could be a good start for a Master's or Ph.D. research
project. The maximum length for the written projects is 20 single-spaced pages (12-point font),
including figures and references. The evaluation of the papers will be based on clarity,
thoroughness, soundness, originality, and evaluation of ideas and concepts presented, as well as the overall organization
of the paper.
 
Note: Research projects should not simply be a summary of some of the 
material covered directly in the lectures, but rather should go beyond this 
material in one or more specific focus areas and attempt to survey and 
synthesize some of the recent research ideas and methods in that focus area. A 
typical research paper should also include implementations of one or more such 
techniques and their evaluation against some baselines using at least one data 
set. 
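
For the evaluation against baselines mentioned above, ranked-retrieval measures such as precision at k and mean average precision (MAP) are a common choice. The following is a small sketch, assuming each system's output is a ranked list of document IDs per query and the relevance judgments are a set of relevant IDs per query. The function names and data layout are illustrative assumptions, not a required format.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant (assumes k >= 1)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """Average AP over all judged queries; a missing run counts as an empty list."""
    return sum(average_precision(runs.get(q, []), qrels[q]) for q in qrels) / len(qrels)


if __name__ == "__main__":
    # Hypothetical judgments and two hypothetical system outputs.
    qrels = {"q1": {"d1", "d3"}}
    proposed = {"q1": ["d1", "d2", "d3", "d4"]}
    baseline = {"q1": ["d2", "d1", "d4", "d3"]}
    print("proposed MAP:", mean_average_precision(proposed, qrels))
    print("baseline MAP:", mean_average_precision(baseline, qrels))
```

Reporting the same measure for both the proposed technique and the baseline on the same queries makes the comparison direct; with more queries, a significance test over per-query scores would strengthen the claim.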
A list of potential general areas follows.
Personalization in Search: A study of various techniques and approaches used to 
create personalized search applications on the Web. The study should include a 
survey and a comparative evaluation of techniques for re-ranking or filtering search results based on user 
profiles, as well as intelligent agents that take into account user 
characteristics or profiles to assist users in search.
Topic/event prediction and tracking: using pattern extraction from 
unstructured data (such as news stories, social media posts, tweets, etc.) 
possibly in conjunction with the underlying graph structures inherent in social 
networks to identify and track topics, or to predict events.
A comparative study of implementation techniques for scalable information
retrieval on large-scale search engines or Web-based information systems (such
as Google, Facebook, etc.). This study must include an analysis of challenges in
managing and leveraging large data repositories and various proposed and
implemented solutions (such as Bigtable, MapReduce, and other approaches based
on "cloud computing"). The study can also focus on implementation platforms that
enable scalable retrieval (e.g., Hadoop).
Study of the use of social network analysis (SNA) in information
retrieval. This study should include a detailed summary of various techniques
from SNA and their use in providing relevant information to users in online
social networks and/or traditional search engines. The study should also explore
the use of network and graph structures in social networks to identify or
predict trends, patterns, and relationships.
Integration of semantic knowledge in search: a study of various techniques
to mine information and patterns from semi-structured data on the Web for more
intelligent search. The study may include the use of agents designed to extract
specific types of information, the integration of
semantic knowledge such as ontologies and knowledge graphs into current search
technologies, etc., as well as the use of natural language processing and other
relevant techniques to extract meaningful semantic information from
unstructured data.
Recommender Systems: A comparative study of various recommender system
techniques, including different approaches to collaborative
and content-based filtering and their applications in several recommender systems. The study should include
a technical summary of various techniques and an evaluation of existing methods in use today on the Web.
 
 |