Package edu.stanford.nlp.process

Contains classes for processing documents.


Interface Summary
Function<T1,T2> An interface for classes that act as a function transforming one object to another.
Tokenizer<T> Tokenizers break up text into individual Objects.
 

Class Summary
AbstractTokenizer<T> An abstract tokenizer.
Americanize Takes a HasWord or String and returns an Americanized version of it.
Morphology Morphology computes the base form of English words, by removing just inflections (not derivational morphology).
PTBTokenizer Tokenizer implementation that conforms to the Penn Treebank tokenization conventions.
PTBTokenizer.PTBTokenizerFactory  
TokenizerAdapter This class adapts a java.io.StreamTokenizer to the edu.stanford.nlp.process.Tokenizer interface.
WhitespaceTokenizer Simple Tokenizer implementation that tokenizes on whitespace.
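
The adapter idea behind TokenizerAdapter, combined with the whitespace splitting done by WhitespaceTokenizer, can be illustrated with the standard library alone. The following is a minimal sketch, not the package's implementation: it assumes a tokenizer can be modeled as an Iterator over String tokens, and the class name StreamTokenizerAdapterSketch and its tokenize helper are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: exposes a java.io.StreamTokenizer as an Iterator<String>.
// This is NOT the TokenizerAdapter class from the package.
public class StreamTokenizerAdapterSketch implements Iterator<String> {
    private final StreamTokenizer st;
    private String lookahead; // next token to hand out; null once input is exhausted

    public StreamTokenizerAdapterSketch(StreamTokenizer st) {
        this.st = st;
        advance();
    }

    // Reads one token from the underlying StreamTokenizer into the lookahead slot.
    private void advance() {
        try {
            int type = st.nextToken();
            if (type == StreamTokenizer.TT_EOF) {
                lookahead = null;
            } else if (type == StreamTokenizer.TT_WORD) {
                lookahead = st.sval;
            } else if (type == StreamTokenizer.TT_NUMBER) {
                lookahead = String.valueOf(st.nval);
            } else {
                lookahead = String.valueOf((char) type); // a single ordinary character
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public String next() {
        if (lookahead == null) {
            throw new NoSuchElementException();
        }
        String token = lookahead;
        advance();
        return token;
    }

    // Configures a StreamTokenizer to split on whitespace only, then drains it.
    public static List<String> tokenize(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();            // drop the default number, quote, and comment handling
        st.wordChars(33, 255);       // every visible character is part of a token
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        List<String> tokens = new ArrayList<>();
        Iterator<String> it = new StreamTokenizerAdapterSketch(st);
        while (it.hasNext()) {
            tokens.add(it.next());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The quick  brown\tfox")); // prints [The, quick, brown, fox]
    }
}
```

Keeping one token of lookahead lets hasNext() answer without consuming input, which is the main mismatch between the StreamTokenizer and Iterator styles.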
 

Package edu.stanford.nlp.process Description

Contains classes for processing documents. The key interface is Processor, whose sole method, Document process(Document), takes a document and returns a new, processed document (for example, one that has been parsed, stoplisted, or stemmed).
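
Because a processor consumes a document and produces a document, processors compose naturally into pipelines. The following is a minimal sketch of that pattern under stated assumptions: the nested Doc and Processor types are hypothetical stand-ins that only mirror the shape described above, and Lowercaser, Stoplister, and Chain are illustrative names, not classes from this package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "document in, document out" pattern.
// None of these types are the package's own Document or Processor.
public class ProcessorSketch {

    /** Stand-in for a document: an unmodifiable list of words. */
    static final class Doc {
        final List<String> words;

        Doc(List<String> words) {
            this.words = Collections.unmodifiableList(new ArrayList<>(words));
        }
    }

    /** Same shape as the interface described above: a single process method. */
    interface Processor {
        Doc process(Doc in);
    }

    /** Returns a new document with every word lowercased. */
    static final class Lowercaser implements Processor {
        @Override
        public Doc process(Doc in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                out.add(w.toLowerCase(Locale.ROOT));
            }
            return new Doc(out);
        }
    }

    /** Returns a new document with the given stop words removed. */
    static final class Stoplister implements Processor {
        private final Set<String> stopWords;

        Stoplister(Set<String> stopWords) {
            this.stopWords = stopWords;
        }

        @Override
        public Doc process(Doc in) {
            List<String> out = new ArrayList<>();
            for (String w : in.words) {
                if (!stopWords.contains(w)) {
                    out.add(w);
                }
            }
            return new Doc(out);
        }
    }

    /** Applies several processors in order; it is itself a Processor. */
    static final class Chain implements Processor {
        private final List<Processor> steps;

        Chain(Processor... steps) {
            this.steps = Arrays.asList(steps);
        }

        @Override
        public Doc process(Doc in) {
            Doc current = in;
            for (Processor p : steps) {
                current = p.process(current);
            }
            return current;
        }
    }

    /** Lowercases first, then stoplists, so "The" and "the" are both removed. */
    public static List<String> demo() {
        Processor pipeline = new Chain(
                new Lowercaser(),
                new Stoplister(new HashSet<>(Arrays.asList("the", "on"))));
        Doc in = new Doc(Arrays.asList("The", "cat", "sat", "on", "the", "mat"));
        return pipeline.process(in).words;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [cat, sat, mat]
    }
}
```

Each step returns a new document rather than mutating its input, so the order of steps is the only thing a caller has to reason about.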


Sepandar David Kamvar
Last modified: Thu Oct 31 11:14:34 PST 2002