CSC 594 Topics in AI: Applied Natural Language Processing
Fall 2009/2010
Many NLP tools are available online. Of the lists I have looked at, the
Stanford University NLP Annotated List of Resources (by Chris Manning)
appears to be the most comprehensive (and relatively up-to-date) list of
NLP tools and resources. Of the tools listed there, my recommendations
are as follows:
- Part-of-speech tagging
  - Stanford POS tagger -- Loglinear tagger in Java (by Kristina Toutanova)
  - Brill's Transformation-based learning Tagger -- A symbolic tagger,
    written in C.
- Information Extraction
  - MALLET -- A library of Java code for machine learning applied to
    text. It provides facilities not only for document classification,
    but also information extraction, part-of-speech tagging, noun phrase
    segmentation, and much more.
  - GATE -- A comprehensive NLP tool suite. It includes various tools
    (some external ones) for word/sentence segmentation, part-of-speech
    tagging, named entity recognition, information extraction,
    dependency parsing, etc.
- Named Entity Recognition
  - Stanford Named Entity Recognizer -- A Java Conditional Random Field
    sequence model with trained models for Named Entity Recognition.
    Java. GPL. By Jenny Finkel.
  - LingPipe -- Tools include statistical named-entity recognition, a
    heuristic sentence boundary detector, and a heuristic
    within-document coreference resolution engine. Java. GPL. By Bob
    Carpenter, Breck Baldwin and co.
- Parsing -- Full, probabilistic, partial and dependency parsing
  - Stanford Parser -- Java implementations of probabilistic parsers,
    both highly optimized PCFG and dependency parsers, and a lexicalized
    PCFG parser.
  - Collins Parser -- PCFG parser by Michael Collins.
  - Minipar -- A principle-based broad coverage full parser by Dekang Lin.
  - MSTParser -- Dependency parser by Ryan McDonald.
  - YamCha -- SVM-based NP-chunker, also usable for POS tagging, NER,
    etc. C/C++ open source.
- Generic Sequence Modeling
  - MALLET -- See above under Information Extraction
  - CRF++ -- Generic CRF-based model in C++. Open source. By the author
    of YamCha.
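
Sequence tools such as YamCha and CRF++ read their training data in a simple column format: one token per line, whitespace-separated columns, the label in the last column, and a blank line between sentences. As a rough illustration only, the Python sketch below parses data in that layout. The function name read_column_format and the sample sentences are made up for this example and are not part of either tool.

```python
from typing import Iterable, List, Tuple

Token = Tuple[List[str], str]  # (feature columns, label)


def read_column_format(lines: Iterable[str]) -> List[List[Token]]:
    """Group column-formatted lines into sentences of (features, label) tokens.

    Hypothetical helper for illustration; not an API of YamCha or CRF++.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # By convention, the last column holds the label to be predicted.
        current.append((columns[:-1], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample: word, POS tag, and a chunk label in BIO notation.
SAMPLE = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
deficit NN I-NP

Stocks NNS B-NP
fell VBD B-VP
"""

sentences = read_column_format(SAMPLE.splitlines())
for sentence in sentences:
    # Show each word (first feature column) with its label.
    print([(features[0], label) for features, label in sentence])
```

The same layout works for POS tagging, chunking, or NER: only the meaning of the final column changes.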