CSC 575
Winter 2015


Intelligent Information Retrieval

IR Tools and Software

  • ALIWEB
    ALIWEB is a framework for automatic collection and processing of resource indices in the World Wide Web. It retrieves index files from many servers in the Web and combines them into a single searchable database in IAFA (Internet Anonymous FTP Archives) format.

  • WAIS Toolkit
    WAIS Toolkit is essentially the Windows NT implementation of the freeWAIS software. Along with the executable files, the ZIP file below contains the complete source code. The source (written in C) includes tools for creating an inverted index (using a dictionary and hashing), as well as C implementations of Porter's stemmer and stop list utilities (see the "ir" directory).
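
The inverted index and stop list mentioned above can be sketched in a few lines. This is only an illustration in Python, not the toolkit's C code: the stop list, the function names, and the sample documents are assumptions, and a Python dict stands in for the hashed dictionary.

```python
import re
from collections import defaultdict

# Hypothetical stop list for illustration; real stop lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each non-stop term to the sorted list of document IDs containing it.

    The dict is a hash table, so looking up a term's postings is a
    constant-time operation on average.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term not in STOP_WORDS:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Made-up sample documents.
docs = {
    1: "The index of the archive",
    2: "An inverted index is a dictionary",
}
index = build_inverted_index(docs)
print(index["index"])
```

A stemmer would normally be applied to each token before it is added to the index, so that variants of a word share one postings list.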

  • BDDBot
    BDDBot is a web robot, search engine, and web server written entirely in Java. The ZIP file below contains the complete source code and the class files, including the Java classes for indexing, crawling, and searching.

  • ClientSearch.java
    This is another simple search engine written in Java. However, ClientSearch.java does not create an index; it extracts links from a specified page, and searches for a query string among these pages.
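
The index-free approach described above (extract the links from one page, then scan each linked page for the query string) might look roughly like the following. This is a sketch in Python rather than the Java original. The names `LinkExtractor`, `extract_links`, and `search_linked_pages` are hypothetical, and an in-memory dict replaces real HTTP fetching so that the example runs offline.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the link targets found in the page, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def search_linked_pages(start_html, fetch, query):
    """Return the links whose page content contains the query string.

    `fetch` is any callable that maps a URL to page content, or to None
    when the page is unavailable. The match is a case-insensitive
    substring test on the raw content, markup included.
    """
    query = query.lower()
    hits = []
    for url in extract_links(start_html):
        page = fetch(url)
        if page is not None and query in page.lower():
            hits.append(url)
    return hits


# Stand-in for the Web: a made-up mapping from URL to page content.
PAGES = {
    "a.html": "<p>Notes on information retrieval</p>",
    "b.html": "<p>Course schedule</p>",
}
start = '<a href="a.html">A</a> <a href="b.html">B</a> <a href="missing.html">C</a>'
hits = search_linked_pages(start, PAGES.get, "Retrieval")
print(hits)
```

Because nothing is indexed, every query re-reads every linked page, which is workable only for a small set of pages.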

  • ICE
    ICE is a Perl package for indexing World Wide Web archives. When it is installed as a CGI gateway, users can search the Web server's document space. It is written entirely in Perl, in two files: one for indexing and one for performing the search.
    • These are the two files containing the Perl source code for ICE: ice2-for.pl and ice2-idx.pl.
    • The documentation and additional information can be found on the ICE Homepage.

  • SWISH-E
    SWISH-Enhanced is a flexible and easy-to-use system, written in C, for indexing collections of Web pages or other text files. Key features include the ability to limit searches to certain HTML tags. The SWISH-E software also includes a package of Perl programs (AutoSwish) that allow you to create and maintain indexes.
    • The following Source Directory contains the complete source code for SWISH-E.
    • The AutoSwish package is contained in the following ZIP file: autoswish.zip.
    • The documentation and additional information can be found on the SWISH-E Web site.
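
To illustrate what limiting a search to certain HTML tags means, here is a minimal Python sketch. It does not reflect how SWISH-E itself is implemented or invoked: the class `TagTextCollector`, the function `search_in_tags`, and the sample page are assumptions made for the example.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text that appears inside each of the chosen tags."""

    def __init__(self, tags):
        super().__init__()
        # Nesting depth per tag of interest; text counts while depth > 0.
        self.depth = {tag: 0 for tag in tags}
        self.text = {tag: [] for tag in tags}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        for tag, depth in self.depth.items():
            if depth > 0:
                self.text[tag].append(data)


def search_in_tags(html, query, tags):
    """Return the tags (sorted) whose enclosed text contains the query."""
    collector = TagTextCollector(tags)
    collector.feed(html)
    collector.close()
    query = query.lower()
    return sorted(
        tag for tag, parts in collector.text.items()
        if query in " ".join(parts).lower()
    )


# Made-up sample page.
page = (
    "<html><head><title>Perl indexing tools</title></head>"
    "<body><h1>Overview</h1><p>Tools for indexing and search.</p></body></html>"
)
print(search_in_tags(page, "indexing", ["title", "h1"]))
```

In the sample page the word "search" occurs only in the paragraph, so a query restricted to the title and heading does not match it.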

  • ISearch
    ISearch is a sophisticated IR system developed at the Center for Networked Information Discovery and Retrieval (CNIDR). It is written in C++, and the source code is available from the compressed gzipped file below. ISearch supports fielded searching, relevance ranking, Boolean queries, free-text search, heterogeneous databases, and powerful document type-specific customization and extensibility via C++ class inheritance.
    • The complete source code and documentation are in the Source Directory.
    • The documentation and additional information are available from CNIDR.

  • WebGlimpse
    WebGlimpse is a flexible search engine, written mainly in Perl, which combines search and browsing, and allows the search to cover the neighborhood of a specified page or the whole site. WebGlimpse uses the Glimpse search engine, developed at the University of Arizona.

  • Xavatoria
    Xavatoria is another Perl search and indexing program which uses meta tags, such as keywords and description, to index files. The list of hits for a particular query is weighted by relevance. Xavatoria allows for Boolean operators, grouping, case control, and basic wildcard searches.
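
Indexing on meta tags and answering Boolean queries against that index can be sketched as follows. This is a Python illustration under stated assumptions, not Xavatoria's Perl code: all names and sample pages are hypothetical, and relevance weighting, grouping, case control, and wildcards are left out.

```python
import re
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Record the content of the keywords and description meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.meta[name] = attributes.get("content") or ""


def meta_terms(html):
    """Return the set of lowercase terms found in a page's meta tags."""
    parser = MetaExtractor()
    parser.feed(html)
    text = " ".join(parser.meta.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_meta_index(pages):
    """Map each URL to the terms taken from its meta tags."""
    return {url: meta_terms(html) for url, html in pages.items()}


def search_all(index, *terms):
    """Boolean AND: URLs whose meta terms include every query term."""
    wanted = {term.lower() for term in terms}
    return sorted(url for url, found in index.items() if wanted <= found)


def search_any(index, *terms):
    """Boolean OR: URLs whose meta terms include at least one query term."""
    wanted = {term.lower() for term in terms}
    return sorted(url for url, found in index.items() if wanted & found)


# Made-up sample pages.
pages = {
    "perl.html": '<head><meta name="keywords" content="perl, search, index">'
                 '<meta name="description" content="A Perl indexing script"></head>',
    "java.html": '<head><meta name="Keywords" content="java, search"></head>',
}
meta_index = build_meta_index(pages)
print(search_all(meta_index, "perl", "search"))
```

Only the meta tags are read, so a page with no keywords or description contributes no terms and can never be returned.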

  • Some Additional Programs and Scripts
    • stem.pl - A Perl function implementing Porter's stemming algorithm. Also see: test_stem.pl.
    • stem.c - A C function implementing Porter's stemming algorithm.
    • HomePageSearch.java - A simple keyword search program in Java.
    • websearch.pl - A simple keyword search program in Perl.
    • meta-idx.pl - A simple Perl program that creates a META tag index file from Web pages.
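
For readers unfamiliar with Porter's stemming algorithm, the sketch below shows only its first rule group (step 1a, which handles plural endings). It is written in Python for illustration and is not a transcription of stem.pl or stem.c. The full algorithm has several further steps that depend on a measure of the word's consonant-vowel structure. The function name is an assumption.

```python
def porter_step_1a(word):
    """Apply step 1a of Porter's algorithm to a lowercase word.

    The rules are tried in order, so the longest matching suffix wins:
        SSES -> SS
        IES  -> I
        SS   -> SS
        S    -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


for example in ["caresses", "ponies", "caress", "cats"]:
    print(example, "->", porter_step_1a(example))
```

The outputs are stems rather than dictionary words ("ponies" becomes "poni"), which is fine for indexing as long as queries are stemmed the same way.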

  • Some useful articles



Copyright © 2014-2015, Bamshad Mobasher, DePaul University.