CSC 594 Topics in AI: Applied Natural Language Processing
Fall 2009/2010

Many NLP tools are available online.
As far as I can tell, the Stanford University NLP Annotated List of Resources (by Chris Manning) is the most comprehensive, and relatively up-to-date, list of NLP tools and resources.

Of the tools, my recommendations are as follows:

  • Part-of-speech tagging
    * Stanford POS tagger -- Log-linear tagger in Java (by Kristina Toutanova)
    * Brill's Transformation-Based Learning Tagger -- A symbolic tagger, written in C.
  • Information Extraction
    * MALLET -- A library of Java code for machine learning applied to text. It provides facilities not only for document classification, but also information extraction, part-of-speech tagging, noun phrase segmentation, and much more.
    * GATE -- A comprehensive NLP tool suite. It includes various tools (some of them external) for word/sentence segmentation, part-of-speech tagging, named entity recognition, information extraction, dependency parsing, etc.
  • Named Entity Recognition
    * Stanford Named Entity Recognizer -- A Conditional Random Field sequence model with trained models for Named Entity Recognition. Java. GPL. By Jenny Finkel.
    * LingPipe -- Tools include statistical named-entity recognition, a heuristic sentence boundary detector, and a heuristic within-document coreference resolution engine. Java. GPL. By Bob Carpenter, Breck Baldwin and co.
     
  • Parsing -- Full, probabilistic, partial and dependency parsing
    * Stanford Parser -- Java implementations of probabilistic parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.
    * Collins Parser -- Lexicalized (head-driven) PCFG parser by Michael Collins.
    * Minipar -- A principle-based, broad-coverage full parser by Dekang Lin.
    * MSTParser -- Dependency parser by Ryan McDonald.
    * YamCha -- SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source.
     
  • Generic Sequence Modeling
    * MALLET -- See above under Information Extraction
    * CRF++ -- Generic CRF-based model in C++. Open source. By the author of YamCha.