edu.stanford.nlp.parser.lexparser
Class AbstractTreebankParserParams

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.AbstractTreebankParserParams
All Implemented Interfaces:
TreebankLangParserParams, java.io.Serializable
Direct Known Subclasses:
ArabicTreebankParserParams, EnglishTreebankParserParams

public abstract class AbstractTreebankParserParams
extends java.lang.Object
implements TreebankLangParserParams

An abstract base class that supplies common method implementations for classes implementing TreebankLangParserParams.

Some subclasses need access to special attributes of their TreebankLanguagePack while still relying on this class's code for making the TreebankLanguagePack accessible. A good way to do this is to pass a new instance of the appropriate TreebankLanguagePack to this class's constructor, then retrieve it later by casting the result of treebankLanguagePack() to the concrete type. See ChineseTreebankParserParams for an example.
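
The pass-in-then-cast pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: LanguagePack, ChineseLikeLanguagePack, AbstractParams, ChineseLikeParams and maxCompoundLength() are hypothetical stand-ins invented so the example compiles on its own.

```java
// Hypothetical stand-in for TreebankLanguagePack.
interface LanguagePack {
    String encoding();
}

// Hypothetical language pack with one attribute the base interface lacks.
class ChineseLikeLanguagePack implements LanguagePack {
    public String encoding() { return "GB18030"; }
    public int maxCompoundLength() { return 4; }
}

// Hypothetical stand-in for AbstractTreebankParserParams: stores the pack
// passed to the constructor and exposes it through an accessor.
abstract class AbstractParams {
    protected LanguagePack tlp;

    protected AbstractParams(LanguagePack tlp) {
        this.tlp = tlp;
    }

    public LanguagePack treebankLanguagePack() {
        return tlp;
    }
}

class ChineseLikeParams extends AbstractParams {
    ChineseLikeParams() {
        // Hand a fresh instance of the concrete pack to the base class.
        super(new ChineseLikeLanguagePack());
    }

    int specialSetting() {
        // Cast the stored pack back to its concrete type to reach the
        // attribute that the general interface does not declare.
        ChineseLikeLanguagePack pack = (ChineseLikeLanguagePack) treebankLanguagePack();
        return pack.maxCompoundLength();
    }
}

class LanguagePackCastDemo {
    public static void main(String[] args) {
        ChineseLikeParams params = new ChineseLikeParams();
        System.out.println(params.treebankLanguagePack().encoding());
        System.out.println(params.specialSetting());
    }
}
```

The cast is safe here only because the subclass itself chose the concrete type it passed to the constructor.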

See Also:
Serialized Form

Nested Class Summary
static interface AbstractTreebankParserParams.DependencyTyper<T>
           
protected  class AbstractTreebankParserParams.SubcategoryStripper
           
 
Field Summary
protected  java.lang.String inputEncoding
           
protected  java.lang.String outputEncoding
           
protected  TreebankLanguagePack tlp
           
 
Constructor Summary
protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
          Stores the passed-in TreebankLanguagePack.
 
Method Summary
abstract  TreeTransformer collinizer()
          The tree transformer used to produce trees for evaluation.
abstract  TreeTransformer collinizerEvalb()
          The tree transformer used to produce trees for evaluation with evalb.
static
<E> java.util.Collection<E>
dependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer, AbstractTreebankParserParams.DependencyTyper<E> typer)
          Returns the set of dependencies in a tree, according to some AbstractTreebankParserParams.DependencyTyper.
abstract  void display()
          Displays language-specific settings.
 java.lang.String getOutputEncoding()
          Returns the output encoding being used.
abstract  HeadFinder headFinder()
          Returns the HeadFinder to use for your treebank.
 Lexicon lex()
           
 Lexicon lex(edu.stanford.nlp.parser.lexparser.Options.LexOptions op)
           
abstract  MemoryTreebank memoryTreebank()
          Returns a MemoryTreebank appropriate to the treebank source.
static java.util.Collection parsevalObjectify(Tree t, TreeTransformer collinizer)
          Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation.
 java.io.PrintWriter pw()
          The PrintWriter used to print output.
 java.io.PrintWriter pw(java.io.OutputStream o)
          The PrintWriter used to print output.
 void setInputEncoding(java.lang.String encoding)
          Sets the input encoding.
abstract  int setOptionFlag(java.lang.String[] args, int i)
          Set language-specific options according to flags.
 void setOutputEncoding(java.lang.String encoding)
          Sets the output encoding.
abstract  java.lang.String[] sisterSplitters()
          Returns the splitting strings used for selective splits.
 TreeTransformer subcategoryStripper()
          Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.
 MemoryTreebank testMemoryTreebank()
          Returns a MemoryTreebank for test data; this can often be the same as what memoryTreebank() returns.
abstract  edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t, Tree root, edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
          Performs language-specific tree transformations such as splicing.
 TreebankLanguagePack treebankLanguagePack()
          Returns an appropriate TreebankLanguagePack.
 TokenizerFactory<Tree> treeTokenizerFactory()
           
static EquivalenceClasser<java.util.List<java.lang.String>> typedDependencyClasser()
          Returns an EquivalenceClasser that classifies typed dependencies by the syntactic categories of mother, head, and daughter, plus direction.
static java.util.Collection<java.util.List<java.lang.String>> typedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.
static java.util.Collection<java.util.List<java.lang.String>> untypedDependencyObjectify(Tree t, HeadFinder hf, TreeTransformer collinizer)
          Returns a collection of untyped word-word dependencies for the tree.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.stanford.nlp.parser.lexparser.TreebankLangParserParams
defaultTestSentence, diskTreebank, treeReaderFactory
 

Field Detail

inputEncoding

protected java.lang.String inputEncoding

outputEncoding

protected java.lang.String outputEncoding

tlp

protected TreebankLanguagePack tlp

Constructor Detail

AbstractTreebankParserParams

protected AbstractTreebankParserParams(TreebankLanguagePack tlp)
Stores the passed-in TreebankLanguagePack.

Method Detail

setInputEncoding

public void setInputEncoding(java.lang.String encoding)
Sets the input encoding.

Specified by:
setInputEncoding in interface TreebankLangParserParams

setOutputEncoding

public void setOutputEncoding(java.lang.String encoding)
Sets the output encoding.

Specified by:
setOutputEncoding in interface TreebankLangParserParams

getOutputEncoding

public java.lang.String getOutputEncoding()
Returns the output encoding being used.

Specified by:
getOutputEncoding in interface TreebankLangParserParams

memoryTreebank

public abstract MemoryTreebank memoryTreebank()
Returns a MemoryTreebank appropriate to the treebank source.

Specified by:
memoryTreebank in interface TreebankLangParserParams

testMemoryTreebank

public MemoryTreebank testMemoryTreebank()
Returns a MemoryTreebank for test data. Implementations can often return the same thing for testMemoryTreebank() as for memoryTreebank().

Specified by:
testMemoryTreebank in interface TreebankLangParserParams

pw

public java.io.PrintWriter pw()
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams

pw

public java.io.PrintWriter pw(java.io.OutputStream o)
The PrintWriter used to print output. It's the responsibility of pw to deal properly with character encodings for the relevant treebank.

Specified by:
pw in interface TreebankLangParserParams
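
One way a pw implementation can take responsibility for character encodings is to wrap the stream in an OutputStreamWriter that uses the configured output encoding. The sketch below assumes that approach; EncodingWriterSketch is a hypothetical stand-in class, and the fallback to the platform default encoding is an assumption rather than documented behavior.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

class EncodingWriterSketch {
    // Hypothetical counterpart of the outputEncoding field.
    private String outputEncoding = "UTF-8";

    void setOutputEncoding(String encoding) {
        outputEncoding = encoding;
    }

    // Wraps the stream so characters are encoded with outputEncoding.
    PrintWriter pw(OutputStream o) {
        try {
            return new PrintWriter(new OutputStreamWriter(o, outputEncoding), true);
        } catch (UnsupportedEncodingException e) {
            // Assumed fallback: use the platform default encoding.
            return new PrintWriter(o, true);
        }
    }

    // The no-argument form writes to standard output.
    PrintWriter pw() {
        return pw(System.out);
    }

    public static void main(String[] args) {
        EncodingWriterSketch params = new EncodingWriterSketch();
        params.setOutputEncoding("ISO-8859-1");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = params.pw(bytes);
        out.print("caf\u00e9");
        out.flush();
        // In ISO-8859-1 each of the four characters occupies one byte.
        System.out.println(bytes.size() + " bytes written");
    }
}
```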

treebankLanguagePack

public TreebankLanguagePack treebankLanguagePack()
Returns an appropriate TreebankLanguagePack.

Specified by:
treebankLanguagePack in interface TreebankLangParserParams

headFinder

public abstract HeadFinder headFinder()
Returns the HeadFinder to use for your treebank.

Specified by:
headFinder in interface TreebankLangParserParams

lex

public Lexicon lex()

lex

public Lexicon lex(edu.stanford.nlp.parser.lexparser.Options.LexOptions op)
Specified by:
lex in interface TreebankLangParserParams

parsevalObjectify

public static java.util.Collection parsevalObjectify(Tree t,
                                                     TreeTransformer collinizer)
Takes a Tree and a collinizer and returns a Collection of Constituents for PARSEVAL evaluation. Note that this has not yet been checked rigorously against the PARSEVAL definition (Roger).
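
To illustrate how such collections of constituents are typically consumed, the sketch below scores a guessed bracketing against a gold one using labeled precision and recall. It is a simplified illustration under stated assumptions: ParsevalSketch is hypothetical, constituents are encoded as plain strings rather than the library's Constituent objects, and sets are used even though the real method returns a general Collection.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ParsevalSketch {
    // Encodes a labeled span as a String so ordinary equality can compare it.
    static String constituent(String label, int start, int end) {
        return label + "[" + start + "," + end + ")";
    }

    // Fraction of guessed constituents that also occur in the gold set.
    static double precision(Set<String> guess, Set<String> gold) {
        if (guess.isEmpty()) {
            return 0.0;
        }
        Set<String> matched = new HashSet<>(guess);
        matched.retainAll(gold);
        return (double) matched.size() / guess.size();
    }

    // Fraction of gold constituents that the guess recovered.
    static double recall(Set<String> guess, Set<String> gold) {
        return precision(gold, guess);
    }

    public static void main(String[] args) {
        Set<String> gold = new HashSet<>(Arrays.asList(
            constituent("S", 0, 3), constituent("NP", 0, 1),
            constituent("VP", 1, 3), constituent("NP", 2, 3)));
        Set<String> guess = new HashSet<>(Arrays.asList(
            constituent("S", 0, 3), constituent("NP", 0, 1),
            constituent("VP", 1, 3)));
        System.out.println("precision = " + precision(guess, gold)); // 1.0
        System.out.println("recall    = " + recall(guess, gold));    // 0.75
    }
}
```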


untypedDependencyObjectify

public static java.util.Collection<java.util.List<java.lang.String>> untypedDependencyObjectify(Tree t,
                                                                                                HeadFinder hf,
                                                                                                TreeTransformer collinizer)
Returns a collection of untyped word-word dependencies for the tree.


typedDependencyObjectify

public static java.util.Collection<java.util.List<java.lang.String>> typedDependencyObjectify(Tree t,
                                                                                              HeadFinder hf,
                                                                                              TreeTransformer collinizer)
Returns a collection of word-word dependencies typed by mother, head, daughter node syntactic categories.


dependencyObjectify

public static <E> java.util.Collection<E> dependencyObjectify(Tree t,
                                                              HeadFinder hf,
                                                              TreeTransformer collinizer,
                                                              AbstractTreebankParserParams.DependencyTyper<E> typer)
Returns the set of dependencies in a tree, according to some AbstractTreebankParserParams.DependencyTyper.
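
The idea of extracting dependencies "according to some typer" can be sketched with a small self-contained model. Everything here is a hypothetical stand-in: Node replaces Tree, HeadRule replaces HeadFinder, Typer replaces DependencyTyper, and the collinizer step is omitted. The typed variant in main labels each dependency with the mother category only, which is simpler than the mother, head, and daughter typing described for typedDependencyObjectify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DependencySketch {
    // Minimal tree node; a stand-in for the library's Tree.
    static class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Stand-in for HeadFinder: picks the head child of a local tree.
    interface HeadRule {
        Node headChild(Node parent);
    }

    // Stand-in for DependencyTyper: builds one dependency object.
    interface Typer<T> {
        T make(Node headWord, Node dependentWord, Node mother);
    }

    // Follows head children down to the head word (a leaf).
    static Node headWord(Node n, HeadRule rule) {
        while (!n.isLeaf()) {
            n = rule.headChild(n);
        }
        return n;
    }

    static <T> List<T> dependencies(Node tree, HeadRule rule, Typer<T> typer) {
        List<T> out = new ArrayList<>();
        collect(tree, rule, typer, out);
        return out;
    }

    // Each non-head child contributes one dependency on the local head word.
    private static <T> void collect(Node n, HeadRule rule, Typer<T> typer, List<T> out) {
        if (n.isLeaf()) {
            return;
        }
        Node headChild = rule.headChild(n);
        Node head = headWord(headChild, rule);
        for (Node child : n.children) {
            if (child != headChild) {
                out.add(typer.make(head, headWord(child, rule), n));
            }
            collect(child, rule, typer, out);
        }
    }

    public static void main(String[] args) {
        // (S (NP (DT the) (NN dog)) (VP (VBZ barks)))
        Node tree = new Node("S",
            new Node("NP", new Node("DT", new Node("the")), new Node("NN", new Node("dog"))),
            new Node("VP", new Node("VBZ", new Node("barks"))));
        // Toy head rule: the rightmost child is always the head.
        HeadRule rightmost = p -> p.children.get(p.children.size() - 1);
        List<String> untyped = dependencies(tree, rightmost,
            (h, d, m) -> h.label + " -> " + d.label);
        List<String> typed = dependencies(tree, rightmost,
            (h, d, m) -> m.label + ": " + h.label + " -> " + d.label);
        System.out.println(untyped); // [barks -> dog, dog -> the]
        System.out.println(typed);   // [S: barks -> dog, NP: dog -> the]
    }
}
```

Keeping the traversal generic in T lets the same walk produce untyped pairs, typed tuples, or any other representation the caller's typer builds.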


typedDependencyClasser

public static EquivalenceClasser<java.util.List<java.lang.String>> typedDependencyClasser()
Returns an EquivalenceClasser that classifies typed dependencies by the syntactic categories of mother, head, and daughter, plus direction.


collinizer

public abstract TreeTransformer collinizer()
The tree transformer used to produce trees for evaluation. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may apply other normalizations.

Specified by:
collinizer in interface TreebankLangParserParams

collinizerEvalb

public abstract TreeTransformer collinizerEvalb()
The tree transformer used to produce trees for evaluation with evalb. It is applied to both the parse output tree and the gold tree. It should strip punctuation and may apply other normalizations. The evalb version should strip somewhat more than collinizer() does. (finish this doc!)

Specified by:
collinizerEvalb in interface TreebankLangParserParams

sisterSplitters

public abstract java.lang.String[] sisterSplitters()
Returns the splitting strings used for selective splits.

Specified by:
sisterSplitters in interface TreebankLangParserParams
Returns:
An array containing ancestor-annotated Strings: categories should be split according to these ancestor annotations.

subcategoryStripper

public TreeTransformer subcategoryStripper()
Returns a TreeTransformer appropriate to the Treebank which can be used to remove functional tags (such as "-TMP") from categories.

Specified by:
subcategoryStripper in interface TreebankLangParserParams
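
The label-level effect of removing functional tags can be sketched as below. This is an assumption-laden illustration, not the library's SubcategoryStripper: FunctionalTagSketch and basicCategory are hypothetical names, and the rule (cut at the first '-' that is not the first character) follows a Penn-Treebank-style convention that may not match every treebank.

```java
class FunctionalTagSketch {
    // Returns the category with functional tags such as "-TMP" removed.
    static String basicCategory(String label) {
        // Labels that start with the delimiter (for example "-NONE-" or
        // "-LRB-") are whole categories, so leave them untouched.
        if (label.isEmpty() || label.charAt(0) == '-') {
            return label;
        }
        int cut = label.indexOf('-');
        return cut < 0 ? label : label.substring(0, cut);
    }

    public static void main(String[] args) {
        System.out.println(basicCategory("NP-TMP"));   // NP
        System.out.println(basicCategory("NP-SBJ-1")); // NP
        System.out.println(basicCategory("-NONE-"));   // -NONE-
        System.out.println(basicCategory("VP"));       // VP
    }
}
```

A full TreeTransformer would apply a function like this to every node label while rebuilding the tree.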

transformTree

public abstract edu.stanford.nlp.parser.lexparser.TreeHeadPair transformTree(Tree t,
                                                                             Tree root,
                                                                             edu.stanford.nlp.parser.lexparser.TreeHeadPair thp)
Performs language-specific tree transformations such as splicing. Any parameterization should live inside the specific TreebankLangParserParams class.

Specified by:
transformTree in interface TreebankLangParserParams

display

public abstract void display()
Displays language-specific settings.

Specified by:
display in interface TreebankLangParserParams

setOptionFlag

public abstract int setOptionFlag(java.lang.String[] args,
                                  int i)
Sets language-specific options according to flags. This routine should process the option starting at args[i], which may span several array elements if the option takes arguments. It should return the index immediately after the last element it consumed. In particular, if it cannot process the current option, it should return i unchanged.

Specified by:
setOptionFlag in interface TreebankLangParserParams
Parameters:
args - Array of command line arguments
i - Index in command line arguments to try to process as an option
Returns:
The index of the item after arguments processed as part of this command line option.
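
The return-value contract above can be sketched as follows. The flags -retainTmp and -headRules, the fields they set, and the class OptionFlagSketch are hypothetical and chosen only for illustration; the driver loop in main is one plausible way a caller might use the returned index.

```java
class OptionFlagSketch {
    // Hypothetical language-specific settings.
    boolean retainTmp = false;
    String headRules = "default";

    // Processes the option starting at args[i] and returns the index after
    // the last element consumed, or i if the option is not recognized.
    int setOptionFlag(String[] args, int i) {
        if (args[i].equalsIgnoreCase("-retainTmp")) {
            retainTmp = true;
            return i + 1; // flag only, no argument
        }
        if (args[i].equalsIgnoreCase("-headRules") && i + 1 < args.length) {
            headRules = args[i + 1];
            return i + 2; // flag plus one argument
        }
        return i; // nothing consumed
    }

    public static void main(String[] args) {
        OptionFlagSketch params = new OptionFlagSketch();
        String[] argv = {"-headRules", "left", "-unknown", "-retainTmp"};
        int i = 0;
        while (i < argv.length) {
            int next = params.setOptionFlag(argv, i);
            if (next == i) {
                // An unchanged index signals an unrecognized option;
                // skip it so the loop keeps making progress.
                System.err.println("Unrecognized option: " + argv[i]);
                next = i + 1;
            }
            i = next;
        }
        System.out.println(params.headRules + " " + params.retainTmp); // left true
    }
}
```

Returning i for an unknown option lets the caller try other option handlers on the same element before reporting an error.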

treeTokenizerFactory

public TokenizerFactory<Tree> treeTokenizerFactory()
Specified by:
treeTokenizerFactory in interface TreebankLangParserParams