edu.stanford.nlp.util
Class Distribution

java.lang.Object
  extended by edu.stanford.nlp.util.Distribution
All Implemented Interfaces:
java.io.Serializable

public class Distribution
extends java.lang.Object
implements java.io.Serializable

Immutable class for representing normalized, smoothed discrete distributions from Counters. Smoothed counters reserve probability mass for unseen items, so queries for the probability of unseen items will return a small positive amount. totalCount() should always return 1.

The Counter passed to a constructor or factory method is copied.
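The relationship between normalization and reserved mass can be illustrated with a minimal sketch. It does not use the Stanford classes: a plain java.util.Map stands in for the Counter, the class ReservedMassSketch and its methods are hypothetical, and the even split of the reserved mass among unseen keys is an assumption, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class ReservedMassSketch {

    /** Scales the counts so that the seen keys sum to (1 - reservedMass). */
    static Map<String, Double> normalize(Map<String, Double> counts, double reservedMass) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            probs.put(e.getKey(), (1.0 - reservedMass) * e.getValue() / total);
        }
        return probs;
    }

    /**
     * Looks up a probability. Assumption: unseen keys share the reserved
     * mass evenly, so each gets a small positive amount.
     */
    static double probabilityOf(Map<String, Double> probs, String key,
                                double reservedMass, int numberOfKeys) {
        Double p = probs.get(key);
        if (p != null) {
            return p;
        }
        int unseen = numberOfKeys - probs.size();
        return unseen > 0 ? reservedMass / unseen : 0.0;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        Map<String, Double> probs = normalize(counts, 0.2);
        System.out.println(probs);
        System.out.println(probabilityOf(probs, "unseen", 0.2, 4));
    }
}
```

With two seen keys, four keys overall, and a reserved mass of 0.2, the seen and unseen probabilities together sum to 1, which is the invariant that totalCount() is described as maintaining.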

See Also:
Serialized Form

Field Summary
protected  Counter counter
           
 
Method Summary
static Distribution absolutelyDiscountedDistribution(GenericCounter counter, int numberOfKeys, double discount)
           
 void addToKeySet(java.lang.Object o)
          Ensures that the object is in the key set (possibly with a zero value).
 java.lang.Object argmax()
           
 boolean containsKey(java.lang.Object key)
           
static Distribution distributionFromLogisticCounter(GenericCounter cntr)
          Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
static Distribution distributionWithDirichletPrior(GenericCounter c, Distribution prior, double weight)
          Returns a Distribution that uses prior as a Dirichlet prior weighted by weight.
static Distribution dynamicCounterWithDirichletPrior(GenericCounter c, Distribution prior, double weight)
          Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front.
 boolean equals(java.lang.Object o)
           
 double getCount(java.lang.Object key)
          Returns the current count for the given key, which is 0 if it hasn't been seen before.
 Counter getCounter()
           
static Distribution getDistribution(GenericCounter counter)
          Creates a Distribution from the given counter, i.e., makes an internal copy of the counter and divides all counts by the total count.
static Distribution getDistributionFromLogValues(GenericCounter counter)
          Creates a Distribution from the given counter, whose values are taken to be log values, i.e., makes an internal copy of the counter and normalizes it so that the probabilities sum to 1.
static Distribution getDistributionFromPartiallySpecifiedCounter(Counter c, int numKeys)
          Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities.
static Distribution getDistributionWithReservedMass(GenericCounter counter, double reservedMass)
           
 int getNumberOfKeys()
           
static Distribution getPerturbedDistribution(GenericCounter wordCounter, java.util.Random r)
           
static Distribution getPerturbedUniformDistribution(java.util.Set s, java.util.Random r)
           
 double getReservedMass()
           
static Distribution getUniformDistribution(java.util.Set s)
           
static Distribution goodTuringSmoothedCounter(GenericCounter counter, int numberOfKeys)
          Creates a Good-Turing smoothed Distribution from the given counter.
static Distribution goodTuringWithExplicitUnknown(GenericCounter counter, java.lang.Object UNK)
          Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the count of the special object UNK in the counter is taken to be the count of "UNSEEN" items.
 int hashCode()
           
 java.util.Set keySet()
           
static Distribution laplaceSmoothedDistribution(GenericCounter counter, int numberOfKeys)
          Creates a Laplace-smoothed Distribution from the given counter, i.e., adds one count to every item, including unseen ones, and divides by the total count.
static Distribution laplaceSmoothedDistribution(GenericCounter counter, int numberOfKeys, double lambda)
          Creates a smoothed Distribution using Lidstone's law, i.e., adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.
static Distribution laplaceWithExplicitUnknown(GenericCounter counter, double lambda, java.lang.Object UNK)
          Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items.
static void main(java.lang.String[] args)
          For internal testing purposes only.
 double probabilityOf(java.lang.Object key)
          Returns the normalized count of the given object.
 java.lang.Object sampleFrom()
          Returns an object sampled from the distribution.
 java.lang.String toString()
           
 java.lang.String toString(java.text.NumberFormat nf)
           
 double totalCount()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

counter

protected Counter counter

Method Detail

getCounter

public Counter getCounter()

toString

public java.lang.String toString(java.text.NumberFormat nf)

getReservedMass

public double getReservedMass()

getNumberOfKeys

public int getNumberOfKeys()

keySet

public java.util.Set keySet()

containsKey

public boolean containsKey(java.lang.Object key)

getCount

public double getCount(java.lang.Object key)
Returns the current count for the given key, which is 0 if the key has not been seen before. This is a convenience version of get that casts and extracts the primitive value.


getDistributionFromPartiallySpecifiedCounter

public static Distribution getDistributionFromPartiallySpecifiedCounter(Counter c,
                                                                        int numKeys)
Assuming that c has a total count < 1, returns a new Distribution using the counts in c as probabilities. If c has a total count > 1, returns a normalized distribution with no remaining mass.
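The two cases described above can be sketched as follows. This is an illustration under assumptions, not the library's code: a java.util.Map stands in for the Counter, the class PartialCounterSketch and its methods are hypothetical, and treating the leftover 1 - total as the mass available to unspecified keys is inferred from the phrase "no remaining mass".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class PartialCounterSketch {

    static double total(Map<String, Double> c) {
        double sum = 0.0;
        for (double v : c.values()) {
            sum += v;
        }
        return sum;
    }

    /** Uses the counts as probabilities, normalizing only if they exceed 1 in total. */
    static Map<String, Double> toProbabilities(Map<String, Double> c) {
        double total = total(c);
        double divisor = total > 1.0 ? total : 1.0;
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : c.entrySet()) {
            probs.put(e.getKey(), e.getValue() / divisor);
        }
        return probs;
    }

    /** Mass left over for keys that c does not specify (assumed interpretation). */
    static double remainingMass(Map<String, Double> c) {
        double total = total(c);
        return total < 1.0 ? 1.0 - total : 0.0;
    }

    public static void main(String[] args) {
        Map<String, Double> partial = new LinkedHashMap<>();
        partial.put("a", 0.5);
        partial.put("b", 0.3);
        System.out.println(toProbabilities(partial) + " remaining=" + remainingMass(partial));
    }
}
```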


getUniformDistribution

public static Distribution getUniformDistribution(java.util.Set s)
Parameters:
s - a Set of keys.
Returns:
a uniform Distribution over the keys in s

getPerturbedUniformDistribution

public static Distribution getPerturbedUniformDistribution(java.util.Set s,
                                                           java.util.Random r)
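Parameters:
s - a Set of keys.
r - the source of randomness for the perturbation.
Returns:
a perturbed uniform Distribution over the keys in s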

getPerturbedDistribution

public static Distribution getPerturbedDistribution(GenericCounter wordCounter,
                                                    java.util.Random r)

getDistribution

public static Distribution getDistribution(GenericCounter counter)
Creates a Distribution from the given counter, i.e., makes an internal copy of the counter and divides all counts by the total count.

Parameters:
counter - the counter to normalize
Returns:
a new Distribution

getDistributionWithReservedMass

public static Distribution getDistributionWithReservedMass(GenericCounter counter,
                                                           double reservedMass)

getDistributionFromLogValues

public static Distribution getDistributionFromLogValues(GenericCounter counter)
Creates a Distribution from the given counter, whose values are taken to be log values, i.e., makes an internal copy of the counter and normalizes it so that the probabilities sum to 1.

Parameters:
counter - the counter of log values to normalize
Returns:
a new Distribution

absolutelyDiscountedDistribution

public static Distribution absolutelyDiscountedDistribution(GenericCounter counter,
                                                            int numberOfKeys,
                                                            double discount)

laplaceSmoothedDistribution

public static Distribution laplaceSmoothedDistribution(GenericCounter counter,
                                                       int numberOfKeys)
Creates a Laplace-smoothed Distribution from the given counter, i.e., adds one count to every item, including unseen ones, and divides by the total count.

Parameters:
counter - the counter to smooth and normalize
numberOfKeys - the total number of keys, including unseen ones
Returns:
a new add-1 smoothed Distribution

laplaceSmoothedDistribution

public static Distribution laplaceSmoothedDistribution(GenericCounter counter,
                                                       int numberOfKeys,
                                                       double lambda)
Creates a smoothed Distribution using Lidstone's law, i.e., adds lambda (typically between 0 and 1) to every item, including unseen ones, and divides by the total count.

Parameters:
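counter - the counter to smooth and normalize
numberOfKeys - the total number of keys, including unseen ones
lambda - the value to add to each count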
Returns:
a new Lidstone smoothed Distribution
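As a worked illustration of the add-lambda rule described above, the following sketch computes (count + lambda) / (total + lambda * numberOfKeys) for any key, seen or unseen. It is not the library's implementation: a java.util.Map stands in for the counter, and the class LidstoneSketch is hypothetical. Setting lambda to 1 gives the add-one (Laplace) case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class LidstoneSketch {

    /**
     * Add-lambda estimate for a single key. Unseen keys have a count of 0,
     * so they receive lambda / (total + lambda * numberOfKeys).
     */
    static double probabilityOf(Map<String, Double> counts, String key,
                                int numberOfKeys, double lambda) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        double count = counts.getOrDefault(key, 0.0);
        return (count + lambda) / (total + lambda * numberOfKeys);
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 2.0);
        counts.put("b", 1.0);
        // Four possible keys in total, two of them unseen.
        System.out.println(probabilityOf(counts, "a", 4, 0.5));
        System.out.println(probabilityOf(counts, "unseen", 4, 0.5));
    }
}
```

Because lambda is added once per key, the denominator grows by lambda * numberOfKeys, which keeps the probabilities over all numberOfKeys keys summing to 1.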

laplaceWithExplicitUnknown

public static Distribution laplaceWithExplicitUnknown(GenericCounter counter,
                                                      double lambda,
                                                      java.lang.Object UNK)
Creates a smoothed Distribution with Laplace smoothing, but assumes an explicit count of "UNKNOWN" items. Thus anything not in the original counter will have probability zero.

Parameters:
counter - the counter to normalize
lambda - the value to add to each count
UNK - the UNKNOWN symbol
Returns:
a new Laplace-smoothed distribution

goodTuringSmoothedCounter

public static Distribution goodTuringSmoothedCounter(GenericCounter counter,
                                                     int numberOfKeys)
Creates a Good-Turing smoothed Distribution from the given counter.

Parameters:
counter - the counter to smooth and normalize
numberOfKeys - the total number of keys, including unseen ones
Returns:
a new Good-Turing smoothed Distribution.

goodTuringWithExplicitUnknown

public static Distribution goodTuringWithExplicitUnknown(GenericCounter counter,
                                                         java.lang.Object UNK)
Creates a Good-Turing smoothed Distribution from the given counter without creating any reserved mass; instead, the count of the special object UNK in the counter is taken to be the count of "UNSEEN" items. The probability of objects not in the original counter will be zero.

Parameters:
counter - the counter
UNK - the unknown symbol
Returns:
a Good-Turing smoothed Distribution

distributionWithDirichletPrior

public static Distribution distributionWithDirichletPrior(GenericCounter c,
                                                          Distribution prior,
                                                          double weight)
Returns a Distribution that uses prior as a Dirichlet prior weighted by weight. Essentially adds "pseudo-counts" for each Object in prior equal to that Object's mass in prior times weight, then normalizes.

WARNING: If an unseen item is encountered in c, the total may not be 1.

NOTE: This will not work if prior is a DynamicDistribution. To fix this, you could add a CounterView to Distribution and use that in the linearCombination call in this method's implementation.

Parameters:
c - the counter of observed counts
prior - the Distribution to use as the prior
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution
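The pseudo-count idea can be sketched as follows: each key receives weight * prior(key) extra counts, and the result is divided by total + weight. This is an illustration under assumptions, not the library's code: java.util.Map stands in for both the counter and the prior, the class DirichletPriorSketch is hypothetical, and the prior is assumed to sum to 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class DirichletPriorSketch {

    /**
     * Adds weight * prior(key) pseudo-counts to each key in the prior, then
     * normalizes. Keys in counts that are absent from the prior are ignored
     * here, so the result sums to 1 only when the prior covers every
     * counted key.
     */
    static Map<String, Double> smooth(Map<String, Double> counts,
                                      Map<String, Double> prior, double weight) {
        double total = 0.0;
        for (double v : counts.values()) {
            total += v;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : prior.entrySet()) {
            double pseudoCount = weight * e.getValue();
            double observed = counts.getOrDefault(e.getKey(), 0.0);
            result.put(e.getKey(), (observed + pseudoCount) / (total + weight));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        counts.put("b", 1.0);
        Map<String, Double> prior = new LinkedHashMap<>();
        prior.put("a", 0.5);
        prior.put("b", 0.25);
        prior.put("c", 0.25);
        System.out.println(smooth(counts, prior, 4.0));
    }
}
```

A larger weight pulls the result toward the prior; a weight of 0 reduces it to the plain normalized counts.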

dynamicCounterWithDirichletPrior

public static Distribution dynamicCounterWithDirichletPrior(GenericCounter c,
                                                            Distribution prior,
                                                            double weight)
Like distributionWithDirichletPrior, except that probabilities are computed dynamically from the counter and prior instead of all at once up front. The main advantage is that if you are making many distributions from relatively sparse counters using the same relatively dense prior, the prior is only represented once, for major memory savings.

Parameters:
c - the counter of observed counts
prior - the Distribution to use as the prior
weight - multiplier of prior to get "pseudo-count"
Returns:
new Distribution

distributionFromLogisticCounter

public static Distribution distributionFromLogisticCounter(GenericCounter cntr)
Maps a counter representing the linear weights of a multiclass logistic regression model to the probabilities of each class.
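One common way to turn per-class linear scores into class probabilities is the softmax function, sketched below. Whether the method uses exactly this formula is an assumption. A java.util.Map stands in for the counter, the class SoftmaxSketch is hypothetical, and subtracting the maximum score is a numerical-stability choice made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class SoftmaxSketch {

    /** Maps scores to probabilities: p(k) = exp(s_k) / sum_j exp(s_j). */
    static Map<String, Double> softmax(Map<String, Double> scores) {
        // Shift by the maximum so that Math.exp never overflows.
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            max = Math.max(max, s);
        }
        double normalizer = 0.0;
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double exp = Math.exp(e.getValue() - max);
            result.put(e.getKey(), exp);
            normalizer += exp;
        }
        for (Map.Entry<String, Double> e : result.entrySet()) {
            e.setValue(e.getValue() / normalizer);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("positive", 1.2);
        scores.put("negative", -0.4);
        scores.put("neutral", 0.0);
        System.out.println(softmax(scores));
    }
}
```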


sampleFrom

public java.lang.Object sampleFrom()
Returns an object sampled from the distribution. There may be a faster way to do this if you need to...

Returns:
a sampled object
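Sampling from a discrete distribution is often done by drawing a uniform number in [0, 1) and walking the cumulative probabilities until the draw is exceeded. The sketch below shows that technique. It is not the library's implementation: a java.util.Map stands in for the distribution, and the class SamplingSketch and both of its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration only; not part of edu.stanford.nlp.util. */
final class SamplingSketch {

    /**
     * Returns the first key whose cumulative probability exceeds u, where u
     * is expected to lie in [0, 1). Falls back to the last key if rounding
     * leaves the cumulative sum slightly below u. Returns null for an
     * empty map.
     */
    static String sample(Map<String, Double> probs, double u) {
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            cumulative += e.getValue();
            last = e.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last;
    }

    /** Convenience overload that draws u from the given Random. */
    static String sample(Map<String, Double> probs, Random random) {
        return sample(probs, random.nextDouble());
    }

    public static void main(String[] args) {
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("a", 0.25);
        probs.put("b", 0.75);
        System.out.println(sample(probs, new Random()));
    }
}
```

This linear scan takes time proportional to the number of keys per sample. Precomputing the cumulative sums and using binary search is one faster alternative when many samples are drawn from the same distribution.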

probabilityOf

public double probabilityOf(java.lang.Object key)
Returns the normalized count of the given object.

Parameters:
key - the object whose probability is requested
Returns:
the normalized count of the object

argmax

public java.lang.Object argmax()

totalCount

public double totalCount()

addToKeySet

public void addToKeySet(java.lang.Object o)
Ensures that the object is in the key set (possibly with a zero value).

Parameters:
o - object to put in keyset

equals

public boolean equals(java.lang.Object o)
Overrides:
equals in class java.lang.Object

hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

main

public static void main(java.lang.String[] args)
For internal testing purposes only.