Online Resources & Reference Material
Introduction to Information Retrieval
- The primary textbook for the course, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Cambridge University Press, 2008.
Porter's Stemmer Online
- Try doing some stemming with this online implementation of Porter's algorithm. You can enter a set of words, a sentence, or a paragraph and get the stemming results.
Information Retrieval Resources
- Resource page for the Manning et al. IR textbook.
IR Resources (University of Glasgow)
Information Retrieval (Online book by van Rijsbergen)
Search Engine Watch
TREC Home Page
Information Retrieval Links
Natural Language Processing & Information Retrieval (NLPIR)
Information Visualization Resources on the Web
Tools and Software
- use Google's indexing & search services to build your own application.
Tools for Preprocessing
- includes stemming and stop word removal, and a program to extract text and text frequency from HTML files.
Apache Lucene
- a full-featured text search engine library in Java.
Apache Nutch
- Apache's open-source web crawler based on Java.
A Comparison of Open Source Search Engines (contains an up-to-date list of available search engine software), by C. Middleton and R. Baeza-Yates
- A general parsing library for Python, particularly useful for parsing HTML and XML.
NLTK: Natural Language Toolkit for Python, including tools for text preprocessing, tokenization, and vectorization (you may also be interested in an example that shows how NLTK is used).
open source search engines
MySQL full text search
Text to Matrix Generator, a MATLAB toolbox for indexing, retrieval, and other text processing tasks
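The preprocessing steps mentioned in this list (extracting text from HTML files, removing stop words, and counting term frequencies) can be sketched with the Python standard library alone. This is a minimal illustration, not the course's preprocessing tool: the names `TextExtractor`, `term_frequencies`, and the tiny `STOP_WORDS` set are hypothetical, and the tokenizer is a simple assumption (lowercase ASCII letters and digits only).

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop list (an assumption; not one of the stop lists linked on this page).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for"}


class TextExtractor(HTMLParser):
    """Collects text content, skipping the bodies of <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def term_frequencies(html):
    """Return a Counter of non-stop-word terms found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    tokens = re.findall(r"[a-z0-9]+", text)  # simplistic tokenizer (assumption)
    return Counter(t for t in tokens if t not in STOP_WORDS)


if __name__ == "__main__":
    page = "<html><body><p>The retrieval of the documents</p><p>Retrieval is fun</p></body></html>"
    for term, count in term_frequencies(page).most_common():
        print(term, count)
```

A stemming step (for example, Porter's algorithm) would normally be applied to each token before counting; it is left out here to keep the sketch short.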
Web IR Resources
Other Related and Useful Links
Common IR Test Collections
A Small Stop List (Recommended for Projects)
Porter's Stemming Algorithm
A Big Stop List
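To give a feel for what Porter's stemming algorithm does, the sketch below implements only its first rule group (step 1a, which handles plural endings). It is not a full stemmer: the later steps, which depend on the measure of the stem, are omitted, and the function name `porter_step_1a` is a hypothetical choice for this illustration.

```python
def porter_step_1a(word):
    """Apply only step 1a of Porter's algorithm to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss
    if word.endswith("ies"):
        return word[:-2]  # ies -> i
    if word.endswith("ss"):
        return word       # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]  # s -> (removed)
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", porter_step_1a(w))
```

The rules are tried in order and only the first match fires, which is why "caress" is left intact rather than losing its final "s".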
Copyright © 2018-2019, DePaul University.