edu.stanford.nlp.parser.lexparser
Interface TreebankLangParserParams

All Superinterfaces:
java.io.Serializable
All Known Implementing Classes:
AbstractTreebankParserParams, ArabicTreebankParserParams, EnglishTreebankParserParams

public interface TreebankLangParserParams
extends java.io.Serializable

Declares the language- and treebank-specific methods that the parser needs in order to parse an arbitrary treebank.


Method Summary
 TreeTransformer collinizer()
          the tree transformer used to produce trees for evaluation.
 TreeTransformer collinizerEvalb()
          the evalb variant of the tree transformer used to produce trees for evaluation.
 java.util.List defaultTestSentence()
          Return a default sentence for the language (for testing)
 DiskTreebank diskTreebank()
          returns a DiskTreebank appropriate to the treebank source
 void display()
          display language-specific settings
 java.lang.String getOutputEncoding()
          Returns the output encoding being used.
 HeadFinder headFinder()
           
 Lexicon lex(edu.stanford.nlp.parser.lexparser.Options.LexOptions op)
           
 MemoryTreebank memoryTreebank()
          returns a MemoryTreebank appropriate to the treebank source
 java.io.PrintWriter pw()
          returns a PrintWriter used to print output.
 java.io.PrintWriter pw(java.io.OutputStream o)
          returns a PrintWriter used to print output to the OutputStream o.
 void setInputEncoding(java.lang.String encoding)
           
 int setOptionFlag(java.lang.String[] args, int i)
          Set a language-specific option according to command-line flags.
 void setOutputEncoding(java.lang.String encoding)
           
 java.lang.String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          returns a MemoryTreebank appropriate to the testing treebank source
 edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t, Tree root, edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
          transformTree does language-specific tree transformations such as splicing.
 TreebankLanguagePack treebankLanguagePack()
          returns a TreebankLanguagePack containing Treebank-specific (but not parser-specific) info such as what is punctuation, and also information about the structure of labels
 TreeReaderFactory treeReaderFactory()
          factory for reading in trees from the source you want.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
 

Method Detail

headFinder

HeadFinder headFinder()

setInputEncoding

void setInputEncoding(java.lang.String encoding)

setOutputEncoding

void setOutputEncoding(java.lang.String encoding)

getOutputEncoding

java.lang.String getOutputEncoding()
Returns the output encoding being used.


treeReaderFactory

TreeReaderFactory treeReaderFactory()
factory for reading in trees from the source you want. It is the responsibility of the tree reader created by this factory to deal properly with the character-set encoding of the input. It is also the tree reader's responsibility to normalize trees properly.


lex

Lexicon lex(edu.stanford.nlp.parser.lexparser.Options.LexOptions op)

collinizer

TreeTransformer collinizer()
the tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalizations.


collinizerEvalb

TreeTransformer collinizerEvalb()
the evalb variant of the tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may perform other normalizations; the evalb version should strip more material than collinizer() does.


memoryTreebank

MemoryTreebank memoryTreebank()
returns a MemoryTreebank appropriate to the treebank source


diskTreebank

DiskTreebank diskTreebank()
returns a DiskTreebank appropriate to the treebank source


testMemoryTreebank

MemoryTreebank testMemoryTreebank()
returns a MemoryTreebank appropriate to the testing treebank source


treebankLanguagePack

TreebankLanguagePack treebankLanguagePack()
returns a TreebankLanguagePack containing Treebank-specific (but not parser-specific) info such as what is punctuation, and also information about the structure of labels


pw

java.io.PrintWriter pw()
returns a PrintWriter used to print output. It's the responsibility of the returned PrintWriter to deal properly with character encodings for the relevant treebank


pw

java.io.PrintWriter pw(java.io.OutputStream o)
returns a PrintWriter used to print output to the OutputStream o. It's the responsibility of the returned PrintWriter to deal properly with character encodings for the relevant treebank
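
One way an implementation might honor the configured output encoding is to wrap the stream in an OutputStreamWriter. The following is a minimal sketch, not the library's code: the class name EncodingWriterSketch and the UTF-8 default are assumptions made for illustration, and only the encoding-related methods of the interface are modelled.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical stand-in for an implementing class; it models only the
// encoding-related methods described above.
class EncodingWriterSketch {
    private String outputEncoding = "UTF-8"; // assumed default, not taken from the library

    void setOutputEncoding(String encoding) { outputEncoding = encoding; }

    String getOutputEncoding() { return outputEncoding; }

    // Wraps the stream so that characters are encoded with the configured charset.
    PrintWriter pw(OutputStream o) {
        try {
            // autoFlush = true flushes on println, which suits console output
            return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + outputEncoding, e);
        }
    }

    // The no-argument variant is assumed here to write to standard output.
    PrintWriter pw() { return pw(System.out); }
}
```

With this sketch, callers never choose a charset themselves: they call setOutputEncoding once and obtain every writer through pw.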


sisterSplitters

java.lang.String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
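
To illustrate what removing a functional tag means at the level of a single label, here is a minimal sketch. It works on plain strings rather than on trees, and the class and method names (FunctionalTagSketch, basicCategory) are hypothetical. It assumes Penn-Treebank-style labels in which functional tags and indices follow the first '-' or '=', and in which labels that begin with '-' (such as "-NONE-") are left untouched. A real TreeTransformer would apply a step like this to every node of a tree.

```java
// Hypothetical helper; not part of the library.
class FunctionalTagSketch {

    // Returns the basic category of a label, e.g. "NP-TMP" becomes "NP".
    static String basicCategory(String label) {
        // Assumption: labels such as "-NONE-" or "-LRB-" start with '-' and are kept whole.
        if (label.isEmpty() || label.charAt(0) == '-') {
            return label;
        }
        int cut = label.length();
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') { // start of a functional tag or an index
                cut = i;
                break;
            }
        }
        return label.substring(0, cut);
    }
}
```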


transformTree

edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t,
                                                             Tree root,
                                                             edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
transformTree does language-specific tree transformations such as splicing. Any parameterizations should be inside the specific TreebankLangParserParams class. The caller applies this method recursively to each node in the tree, so an implementation should normally handle only the node it is given and should not itself recurse into that node's children.
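
The division of labor between a per-node transformation and the caller that walks the tree can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: Node is a minimal hypothetical tree class rather than the library's Tree, the label "SPLICE" is invented, and the bottom-up traversal order in the driver is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class TransformSketch {

    // Minimal hypothetical tree node; not the library's Tree class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }

        @Override public String toString() {
            if (children.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    // Per-node step: looks only at this node and its immediate children.
    // Any child labelled "SPLICE" is removed and its children are promoted.
    static void transformNode(Node t) {
        List<Node> result = new ArrayList<>();
        for (Node c : t.children) {
            if (c.label.equals("SPLICE") && !c.children.isEmpty()) {
                result.addAll(c.children);
            } else {
                result.add(c);
            }
        }
        t.children.clear();
        t.children.addAll(result);
    }

    // Driver: the caller, not transformNode, is responsible for the recursion.
    // Children are processed first (bottom-up), which is an assumption of this sketch.
    static void transformAll(Node t) {
        for (Node c : t.children) transformAll(c);
        transformNode(t);
    }
}
```

Because transformNode never recurses, each node is visited exactly once by the driver, which is the reason an implementation is asked not to recurse on its own.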


display

void display()
display language-specific settings


setOptionFlag

int setOptionFlag(java.lang.String[] args,
                  int i)
Set a language-specific option according to command-line flags. This routine should try to process the option starting at args[i]; an option that takes arguments may span several consecutive elements of args. It should return the index after the last index it consumed in processing. In particular, if it cannot process the current option, the return value should be i.

Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
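
The return-value contract described above can be sketched as follows. The class OptionFlagSketch and the flags "-splitPunct" and "-headRules" are hypothetical and chosen only for illustration; they are not options of any real implementation. The processAll loop shows how a caller might use an unchanged index to detect an option that was not recognized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an implementing class; only the
// setOptionFlag contract is modelled here.
class OptionFlagSketch {
    private boolean splitPunct = false;    // hypothetical boolean option
    private String headRules = "default";  // hypothetical option taking one argument

    // Tries to process the option at args[i]. Returns the index after the
    // last element consumed, or i unchanged if the option is not recognized.
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-splitPunct")) {
            splitPunct = true;
            return i + 1;                  // consumed the flag only
        }
        if (args[i].equalsIgnoreCase("-headRules") && i + 1 < args.length) {
            headRules = args[i + 1];
            return i + 2;                  // consumed the flag and its value
        }
        return i;                          // not recognized: consume nothing
    }

    // Caller-side loop: an unchanged index signals an unknown option.
    List<String> processAll(String[] args) {
        List<String> unknown = new ArrayList<>();
        int i = 0;
        while (i < args.length) {
            int next = setOptionFlag(args, i);
            if (next == i) {
                unknown.add(args[i]);
                next = i + 1;              // skip it so the loop always advances
            }
            i = next;
        }
        return unknown;
    }

    boolean splitPunct() { return splitPunct; }

    String headRules() { return headRules; }
}
```

Returning i for an unrecognized option lets the caller decide what to do next, for example handing the same position to another option handler or reporting an error.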

defaultTestSentence

java.util.List defaultTestSentence()
Return a default sentence for the language (for testing)


treeTokenizerFactory

TokenizerFactory<Tree> treeTokenizerFactory()