(VP [< VV | [< NP & % NP] ] )
Relations can be negated with the '!' operator, in which case the
expression will match only if there is no node satisfying the relation.
For example (NP !< NNP)
matches only NPs not dominating
an NNP. Label descriptions can also be negated with '!': (NP < !NNP|NNS) matches
NPs dominating some node that is not an NNP or an NNS.
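The two kinds of negation behave quite differently. The sketch below hand-codes both patterns over a tiny hypothetical Node class (not the Tregex API). It treats '<' as immediate dominance ("A < B": A is the parent of B), which is what '<' means in Tregex.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not the Tregex API) of negated relations vs. negated labels.
class NegationSketch {
    // Bare-bones tree node: a label plus its children.
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // (NP !< NNP): negated RELATION. Succeeds only if NO child is an NNP.
    static boolean npNotDominating(Node n, String label) {
        if (!n.label.equals("NP")) return false;
        for (Node kid : n.kids) {
            if (kid.label.equals(label)) return false;
        }
        return true;
    }

    // (NP < !NNP|NNS): negated LABEL. Succeeds if SOME child is neither label.
    static boolean npDominatingOther(Node n, Set<String> excluded) {
        if (!n.label.equals("NP")) return false;
        for (Node kid : n.kids) {
            if (!excluded.contains(kid.label)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Node np = new Node("NP", new Node("DT"), new Node("NNP"));
        Set<String> nouns = new HashSet<>(Arrays.asList("NNP", "NNS"));
        System.out.println(npNotDominating(np, "NNP"));   // false: it has an NNP child
        System.out.println(npDominatingOther(np, nouns)); // true: the DT child qualifies
    }
}
```

The same NP fails the negated relation but satisfies the negated label, because the label negation only needs one non-matching child.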
To match a node on only the "basic category" of its label, prefix that node's description with the @ symbol. For example
(@NP < @NN)
This can only be used for individual nodes; if you want all nodes to use the basic category, it is more efficient to use a TreeNormalizer to remove functional tags before passing the tree to the TregexPattern.
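The sketch below contrasts per-node '@' matching with normalizing the whole tree once. basicCategory() is a hypothetical, simplified stand-in for the real basic-category Function (which depends on the language pack or TregexPatternCompiler in use): it assumes functional tags follow the first '-' or '=' after the first character.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of '@' matching; not the Tregex implementation.
class BasicCategorySketch {
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, List<Node> kids) { this.label = label; this.kids = kids; }
        Node(String label, Node... kids) { this(label, Arrays.asList(kids)); }
    }

    // Simplified assumption: "NP-SBJ" -> "NP", "NP=2" -> "NP"; labels that start
    // with '-' (such as -NONE- or -LRB-) are left unchanged.
    static String basicCategory(String label) {
        if (label.startsWith("-")) return label;
        for (int i = 1; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c == '-' || c == '=') return label.substring(0, i);
        }
        return label;
    }

    // "@NP" in a pattern: compare the description against the basic category.
    static boolean matchesAt(String description, Node n) {
        return basicCategory(n.label).equals(description);
    }

    // TreeNormalizer-style alternative: strip every label once, up front, so that
    // plain (non-@) descriptions can afterwards be compared directly.
    static Node normalize(Node n) {
        List<Node> kids = new ArrayList<>();
        for (Node kid : n.kids) kids.add(normalize(kid));
        return new Node(basicCategory(n.label), kids);
    }

    public static void main(String[] args) {
        Node np = new Node("NP-SBJ", new Node("NN"));
        System.out.println(matchesAt("NP", np));   // true
        System.out.println(np.label.equals("NP")); // false: plain NP needs an exact label
    }
}
```

Normalizing pays the stripping cost once per node, while '@' recomputes the basic category on every comparison, which is why normalizing is preferable when every node should use it.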
Nodes can be given names using '='. A named node is stored in a map from names to nodes, so that once a match is found, the node corresponding to the name can be extracted from the map. For example (NP < NNP=name)
will match an NP dominating an NNP, and after a match is found, the map can be queried with the name to retrieve the matched node using TregexMatcher.getNode(Object o) with the (String) argument "name" (not "=name").
Note that a ParseException
will be thrown if a named node is used in the
scope of a negated relation.
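What the name buys you can be sketched with a plain map. This is a hypothetical hand-coded matcher for (NP < NNP=name) that only imitates the lookup TregexMatcher.getNode("name") performs; unlike Tregex, which can enumerate every match, it stops at the first NNP child.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of named-node binding; not the Tregex API.
class NamedNodeSketch {
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // Tries (NP < NNP=name) at n; on success records the first NNP child under "name".
    static boolean match(Node n, Map<String, Node> namesToNodes) {
        if (!n.label.equals("NP")) return false;
        for (Node kid : n.kids) {
            if (kid.label.equals("NNP")) {
                namesToNodes.put("name", kid); // the key is "name", not "=name"
                return true;
            }
        }
        return false; // nothing is bound when the match fails
    }

    public static void main(String[] args) {
        Node nnp = new Node("NNP");
        Node np = new Node("NP", new Node("DT"), nnp);
        Map<String, Node> names = new HashMap<>();
        if (match(np, names)) {
            System.out.println(names.get("name") == nnp); // true
        }
    }
}
```

This also suggests why names are rejected under negation: a negated relation succeeds exactly when no node satisfies it, so there would be no node to put in the map.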
Named nodes that refer back to previous named nodes need not have a node
description -- this is known as "backreferencing". In this case, the expression
will match only if the subsequently named node is equal to the previously named
node (in the == sense).
For example: (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
matches an NP dominating exactly the sequence NP comma NP comma. Multiple
backreferences are allowed. If a node with no node description does not refer
to a previously named node, there is no error; the expression simply does
not match anything.
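The example pattern above can be hand-coded to show the == semantics of a backreference. This is a hypothetical sketch, not the Tregex matcher. It assumes the regex /,/ is tested unanchored (label contains a comma) and uses a simplified '@' that cuts the label at the first '-' or '='.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
class BackreferenceSketch {
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    private static final Pattern COMMA = Pattern.compile(",");

    static boolean isNp(Node n) { return n.label.replaceFirst("[-=].*", "").equals("NP"); }
    static boolean isComma(Node n) { return COMMA.matcher(n.label).find(); }

    // Backreference: the candidate must BE the named node (== identity).
    // An unbound name is not an error; it simply never matches.
    static boolean backref(Map<String, Node> names, String name, Node candidate) {
        Node bound = names.get(name);
        return bound != null && bound == candidate;
    }

    static boolean match(Node np, Map<String, Node> names) {
        List<Node> k = np.kids;
        if (!isNp(np) || k.size() < 4) return false;
        // "<," : first child is @NP; each "$+" means "immediately followed by" a sister.
        if (!isNp(k.get(0)) || !isComma(k.get(1)) || !isNp(k.get(2)) || !isComma(k.get(3))) {
            return false;
        }
        names.put("comma", k.get(3));                      // /,/=comma
        if (backref(names, "comma", k.get(k.size() - 1))) { // "<- =comma": last child
            return true;
        }
        names.remove("comma"); // undo the binding when the overall match fails
        return false;
    }

    public static void main(String[] args) {
        Node comma = new Node(",");
        Node np = new Node("NP", new Node("NP"), new Node(","), new Node("NP"), comma);
        Map<String, Node> names = new HashMap<>();
        System.out.println(match(np, names));            // true
        System.out.println(names.get("comma") == comma); // true
    }
}
```

An NP with a fifth child that is also a comma does not match: that last comma has the same label but is a different node from the one bound to "comma".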
Another way to refer to previously named nodes is with the "link" symbol: '~'.
A link is like a backreference, except that instead of having to be *equal to* the
referred-to node, the current node only has to match the label of the referred-to node.
A link cannot have a node description, i.e. the '~' symbol must immediately follow a
relation symbol.
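The difference between a backreference (=name) and a link (~name) fits in two predicates. A hypothetical sketch following the description above: the backreference tests node identity, the link tests only the label. Treating an unbound name as a non-match for links is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting =name (backreference) with ~name (link).
class LinkSketch {
    static final class Node {
        final String label;
        Node(String label) { this.label = label; }
    }

    // =name: the candidate must be the very node that was named.
    static boolean backref(Map<String, Node> names, String name, Node candidate) {
        Node referred = names.get(name);
        return referred != null && referred == candidate;
    }

    // ~name: the candidate only has to match the referred-to node's label.
    static boolean link(Map<String, Node> names, String name, Node candidate) {
        Node referred = names.get(name);
        return referred != null && referred.label.equals(candidate.label);
    }

    public static void main(String[] args) {
        Node first = new Node("NN");
        Node second = new Node("NN");
        Map<String, Node> names = new HashMap<>();
        names.put("a", first);                           // as if matched by NN=a
        System.out.println(backref(names, "a", second)); // false: a different node
        System.out.println(link(names, "a", second));    // true: same label
    }
}
```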
Relations can be made optional with the '?' operator. The expression then
matches even if the optional relation is not satisfied, but if it is
satisfied, named nodes under it are still put into the map.
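An optional relation never causes a match to fail; it only decides whether a name gets bound. Below is a hypothetical hand-coded sketch of (NP ?< NNP=name), assuming '?' is written in front of the relation the same way '!' is.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an optional relation; not the Tregex implementation.
class OptionalSketch {
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // (NP ?< NNP=name): succeeds for any NP; binds "name" only if an NNP child exists.
    static boolean match(Node n, Map<String, Node> names) {
        if (!n.label.equals("NP")) return false;
        for (Node kid : n.kids) {
            if (kid.label.equals("NNP")) {
                names.put("name", kid);
                break;
            }
        }
        return true; // the optional relation cannot make the match fail
    }

    public static void main(String[] args) {
        Map<String, Node> names = new HashMap<>();
        System.out.println(match(new Node("NP", new Node("DT"), new Node("NN")), names)); // true
        System.out.println(names.containsKey("name"));                                   // false
    }
}
```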
The HeadFinder used to determine heads for the head relations, and also
the Function mapping from labels to Basic Category tags, can be
chosen by using a TregexPatternCompiler.
Current known bugs/shortcomings:
- Node search currently takes no advantage of limitations
imposed by the queried relations. This reduces the efficiency of
the search, quite a bit in some cases.
- Due to the lack of parent pointers in Trees, parents are found via
exhaustive depth-first search from the root. This is a serious efficiency bottleneck.
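Why the missing parent pointers hurt can be seen from what a parent lookup has to do without them. A hypothetical sketch (not the Tregex or Tree code): each lookup is a depth-first search from the root, so it may visit every node in the tree, whereas a stored parent pointer would be a constant-time field read.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of parent lookup in a tree without parent pointers.
class ParentSearchSketch {
    static final class Node {
        final String label;
        final List<Node> kids;
        Node(String label, Node... kids) {
            this.label = label;
            this.kids = Arrays.asList(kids);
        }
    }

    // Depth-first search from root for the node whose child is target (compared by
    // identity). Returns null if target is the root or is not in the tree.
    // Worst case: every node is visited, on every call.
    static Node parent(Node root, Node target) {
        for (Node kid : root.kids) {
            if (kid == target) return root;
            Node found = parent(kid, target);
            if (found != null) return found;
        }
        return null;
    }

    public static void main(String[] args) {
        Node vb = new Node("VB");
        Node vp = new Node("VP", vb);
        Node root = new Node("S", new Node("NP", new Node("NN")), vp);
        System.out.println(parent(root, vb) == vp); // true
        System.out.println(parent(root, root));     // null: the root has no parent
    }
}
```

Relations that walk upward (ancestors, sisters) may need such a lookup repeatedly, which multiplies the cost.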
Method Summary
- static TregexPattern compile(java.lang.String tregex): Creates a pattern from the given string using the default HeadFinder and BasicCategoryFunction.
- static void main(java.lang.String[] args): Matches a tree pattern against the trees in files.
- TregexMatcher matcher(Tree t): Get a TregexMatcher for this pattern on this tree.
- java.lang.String pattern()
- void prettyPrint(): Print a multi-line representation of the pattern illustrating its syntax to System.out.
- void prettyPrint(java.io.PrintStream ps): Print a multi-line representation of the pattern illustrating its syntax.
- void prettyPrint(java.io.PrintWriter pw): Print a multi-line representation of the pattern illustrating its syntax.
- void setPatternString(java.lang.String patternString)
- abstract java.lang.String toString(): A single-line string representation of the pattern.
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
currentBasicCatFunction
protected static Function currentBasicCatFunction
matcher
public TregexMatcher matcher(Tree t)
- Get a TregexMatcher for this pattern on this tree.
- Parameters:
t
- a tree to match on
- Returns:
- a TregexMatcher
compile
public static TregexPattern compile(java.lang.String tregex)
throws ParseException
- Creates a pattern from the given string using the default HeadFinder and
BasicCategoryFunction. If you want to use a different HeadFinder or
BasicCategoryFunction, use a TregexPatternCompiler object.
- Parameters:
tregex
- the pattern string
- Returns:
- a TregexPattern for the string.
- Throws:
ParseException
- if the string does not parse
pattern
public java.lang.String pattern()
setPatternString
public void setPatternString(java.lang.String patternString)
toString
public abstract java.lang.String toString()
- A single-line string representation of the pattern
- Overrides:
toString
in class java.lang.Object
- Returns:
prettyPrint
public void prettyPrint(java.io.PrintWriter pw)
- Print a multi-line representation of the pattern illustrating its syntax.
prettyPrint
public void prettyPrint(java.io.PrintStream ps)
- Print a multi-line representation of the pattern illustrating its syntax.
prettyPrint
public void prettyPrint()
- Print a multi-line representation of the pattern illustrating
its syntax to System.out.
main
public static void main(java.lang.String[] args)
- Matches a tree pattern against the trees in files.
Usage:
java edu.stanford.nlp.trees.tregex.TregexPattern [-T] [-C] [-w] [-f] pattern [handle] filepath
It prints all matches of the tree pattern in every tree.
- Parameters:
args
- Command line arguments: argument 1 is the tree pattern, which
should name a node with =name (for some arbitrary string "name");
argument 2 is an optional name =name; and argument 3 is a filepath
to files with trees. A -T flag causes all trees to be printed as
they are processed; otherwise only matches are printed. The -C flag
suppresses printing of matches, so only the number of matches is
printed. The -w flag causes the whole of a matching tree to
be printed. The -f flag causes the filename to be printed.
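The one subtle point in the usage line is telling the optional handle apart from the file path. The sketch below is a hypothetical argument splitter, NOT the actual main() implementation. It assumes flags may appear anywhere and that the pattern is passed as a single (quoted) argument; it then decides by the count of remaining positional arguments: two means pattern and filepath, three means pattern, handle and filepath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting: [-T] [-C] [-w] [-f] pattern [handle] filepath
class TregexArgsSketch {
    boolean printAllTrees;  // -T
    boolean countOnly;      // -C
    boolean wholeTree;      // -w
    boolean printFilename;  // -f
    String pattern;
    String handle;          // null when omitted
    String filepath;

    static TregexArgsSketch parse(String[] args) {
        TregexArgsSketch o = new TregexArgsSketch();
        List<String> positional = new ArrayList<>();
        for (String a : args) {
            switch (a) {
                case "-T": o.printAllTrees = true; break;
                case "-C": o.countOnly = true; break;
                case "-w": o.wholeTree = true; break;
                case "-f": o.printFilename = true; break;
                default: positional.add(a);
            }
        }
        if (positional.size() == 2) {          // pattern filepath
            o.pattern = positional.get(0);
            o.filepath = positional.get(1);
        } else if (positional.size() == 3) {   // pattern handle filepath
            o.pattern = positional.get(0);
            o.handle = positional.get(1);
            o.filepath = positional.get(2);
        } else {
            throw new IllegalArgumentException("expected: pattern [handle] filepath");
        }
        return o;
    }

    public static void main(String[] args) {
        TregexArgsSketch o = parse(new String[] {"-C", "NP < NNP=name", "=name", "trees.mrg"});
        System.out.println(o.countOnly + " " + o.handle + " " + o.filepath); // true =name trees.mrg
    }
}
```

The file name trees.mrg is only an illustrative placeholder.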