/*
 * LingPipe v. 3.5
 * Copyright (C) 2003-2008 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.hmm;

import com.aliasi.symbol.SymbolTable;

import com.aliasi.util.BoundedPriorityQueue;
import com.aliasi.util.Iterators;
import com.aliasi.util.Scored;
import com.aliasi.util.ScoredObject;

import java.util.Map;
import java.util.Iterator;

/**
 * An <code>HmmDecoder</code> provides a range of decoding services
 * for hidden Markov models (HMMs).  A decoder is constructed from a
 * hidden Markov model.
 *
 * <P>The method {@link #firstBest(String[])} provides the most
 * likely (first-best) path of HMM states (tags) given a sequence of
 * string emissions (outputs).  First-best decoding is implemented
 * using Viterbi's dynamic programming algorithm.
 *
 * <h3>N-best Output</h3>
 *
 * <P>The method {@link #nBest(String[])} returns an iterator over
 * the n-best analyses.  N-best decoding is implemented using the
 * Viterbi algorithm in a forward pass and the A<sup>*</sup>
 * algorithm in the backward pass, using the Viterbi estimates as
 * exact completion estimates.  The variant {@link
 * #nBestConditional(String[])} further normalizes outputs to
 * posterior conditional probabilities.
 *
 * <h3>Confidence and Lattice Output</h3>
 *
 * <P>The method {@link #lattice(String[])} computes the entire
 * forward-backward lattice for the specified input.  Lattice
 * decoding is implemented using the standard forward-backward
 * algorithm.  The lattice is an instance of {@link TagWordLattice};
 * see that class's documentation for information on how to retrieve
 * cumulative (total) probabilities for input token sequences and
 * posterior conditional probabilities (confidence scores) per token.
 *
 * <h3>Caching</h3>
 *
 * <P>The decoder is able to cache emission probabilities, the
 * computation of which is often the bottleneck in decoding.  Caches
 * may be provided in the constructor for both ordinary (linear)
 * probabilities and log probabilities of emissions.  A cache is
 * simply an instance of {@link Map}.
 *
 * <p>The first-best and n-best decoders use only log probabilities,
 * whereas the n-best normalized and lattice decoders use linear
 * probabilities.  Only the probabilities computed are cached, so if
 * a program only does first-best processing, only log estimates are
 * cached.
 *
 * <P>Any implementation of <code>Map</code> may be used as a cache,
 * but particular attention must be paid to thread safety and
 * scalability.  A fully synchronized cache can be created with:
 *
 * <pre>
 * Map cache = java.util.Collections.synchronizedMap(new HashMap());</pre>
 *
 * The map implementation {@link com.aliasi.util.FastCache} is
 * designed particularly to be used as a cache in settings such as
 * these.
 *
 * <P>It is often easy (e.g. on English newswire) to get high token
 * coverage (e.g. 97%) with a rather modestly sized cache (e.g. 100K
 * tokens).  Other corpora and languages may vary, and we encourage
 * experimentation with the tradeoff between efficiency and memory
 * for caching.  Note that run times will speed up as more and more
 * estimates are returned from the cache rather than being computed
 * directly.
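 *
 * <P>As an illustrative sketch (the cache size and the use of only
 * the log (base 2) cache are assumptions, not requirements of this
 * class), a decoder that caches log emission estimates for
 * first-best decoding over a model <code>hmm</code> may be
 * constructed as:
 *
 * <pre>
 * Map&lt;String,double[]&gt; log2Cache
 *     = new com.aliasi.util.FastCache&lt;String,double[]&gt;(100000);
 * HmmDecoder decoder = new HmmDecoder(hmm,null,log2Cache);</pre>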
 *
 * <h3>Synchronization and Thread Safety</h3>
 *
 * <p>This class does not perform any underlying synchronization.
 * If the hidden Markov model is not thread safe, then it must be
 * synchronized.  The same holds for the caches.  Note that {@link
 * com.aliasi.util.FastCache}, while not synchronized, is thread
 * safe.  Similarly, the compilation of an HMM trained with {@link
 * HmmCharLmEstimator} is thread safe, in fact allowing safe
 * concurrent access because it is immutable.
 *
 * <h3>Beam and Pruning</h3>
 *
 * <p>For first-best and n-best decoding, through the methods {@link
 * #firstBest(String[])}, {@link #nBest(String[])}, and {@link
 * #nBestConditional(String[])}, a beam may be used to prune unlikely
 * hypotheses.  This beam is set during construction or through the
 * method {@link #setLog2EmissionBeam(double)} (setting and access
 * must be concurrent-read/exclusive-write synchronized by the
 * caller).  The beam works token by token.  As each token is
 * considered, any tag whose emission log (base 2) likelihood is more
 * than the beam width below the best emission estimate is eliminated
 * from further consideration.
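 *
 * <P>As an illustrative sketch (the particular beam widths are
 * assumptions; reasonable values depend on the model and the
 * application), a decoder over a model <code>hmm</code> with a
 * Viterbi beam of 12.0 and an emission beam of 8.0, and no caching,
 * may be constructed as:
 *
 * <pre>
 * HmmDecoder decoder = new HmmDecoder(hmm,null,null,12.0,8.0);
 *
 * decoder.setLog2EmissionBeam(6.0); // later tighten the emission beam</pre>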
 *
 * @author  Bob Carpenter
 * @version 3.6
 * @since   LingPipe2.1
 */
public class HmmDecoder {

    private final HiddenMarkovModel mHmm;

    private Map<String,double[]> mEmissionCache;
    private Map<String,double[]> mEmissionLog2Cache;

    private double mLog2EmissionBeam;
    private double mLog2Beam;

    /**
     * Construct an HMM decoder from the specified HMM.  No caching
     * is applied to estimates, and the beams are set to positive
     * infinity, turning off pruning.  This constructor is
     * appropriate for dynamic models with changing probability
     * estimates.
     *
     * @param hmm Model to use as the basis of decoding.
     */
    public HmmDecoder(HiddenMarkovModel hmm) {
        this(hmm,null,null);
    }

    /**
     * Construct an HMM decoder from the specified HMM using the
     * specified caches for linear and log probabilities.  The beams
     * are set to positive infinity, turning off pruning.  Either or
     * both of the caches may be <code>null</code>, in which case the
     * corresponding values will not be cached.
     *
     * @param hmm Model to use for decoding.
     * @param emissionCache Map to use for emission caching.
     * @param emissionLog2Cache Map to use for log emission caching.
     */
    public HmmDecoder(HiddenMarkovModel hmm,
                      Map<String,double[]> emissionCache,
                      Map<String,double[]> emissionLog2Cache) {
        this(hmm,emissionCache,emissionLog2Cache,
             Double.POSITIVE_INFINITY,
             Double.POSITIVE_INFINITY);
    }

    /**
     * Construct an HMM decoder from the specified HMM using the
     * specified caches for linear and log probabilities, with the
     * specified beam widths for Viterbi and emission estimates.
     * Either or both of the caches may be <code>null</code>, in
     * which case the corresponding values will not be cached.
     *
     * @param hmm Model to use for decoding.
     * @param emissionCache Map to use for emission caching.
     * @param emissionLog2Cache Map to use for log emission caching.
     * @param log2Beam The log (base 2) beam for pruning full hypotheses.
     * @param log2EmissionBeam The log (base 2) beam for pruning
     * emission hypotheses.
     * @throws IllegalArgumentException If either beam is not a
     * non-negative number.
     */
    public HmmDecoder(HiddenMarkovModel hmm,
                      Map<String,double[]> emissionCache,
                      Map<String,double[]> emissionLog2Cache,
                      double log2Beam,
                      double log2EmissionBeam) {
        mHmm = hmm;
        mEmissionCache = emissionCache;
        mEmissionLog2Cache = emissionLog2Cache;
        setLog2Beam(log2Beam);
        setLog2EmissionBeam(log2EmissionBeam);
    }

    /**
     * Returns the hidden Markov model underlying this decoder.  The
     * returned value is the actual HMM used by this decoder, so
     * changes to it will affect this decoder.
     *
     * @return The HMM for this decoder.
     */
    public HiddenMarkovModel getHmm() {
        return mHmm;
    }

    /**
     * Returns the mapping used to cache emission probabilities, or
     * <code>null</code> if not caching.  This is the actual mapping,
     * so changes to it will affect this decoder.
     *
     * @return The emission probability cache.
     */
    public Map<String,double[]> emissionCache() {
        return mEmissionCache;
    }

    /**
     * Returns the mapping used to cache log (base 2) emission
     * probabilities, or <code>null</code> if not caching.  This is
     * the actual mapping, so changes to it will affect this decoder.
     *
     * @return The log (base 2) emission probability cache.
     */
    public Map<String,double[]> emissionLog2Cache() {
        return mEmissionLog2Cache;
    }

    /**
     * Sets the emission cache to the specified value.
     *
     * <p><i>Warning:</i> This method should not be executed
     * concurrently with any calls to decoding, as it may produce an
     * inconsistent result.  The typical application is to set a
     * cache before using a decoder.
     *
     * @param cache Cache for linear emission estimates.
     */
    public void setEmissionCache(Map<String,double[]> cache) {
        mEmissionCache = cache;
    }

    /**
     * Sets the log (base 2) emission beam width.  Any tag with a log
     * (base 2) emission probability more than the beam width less
     * than the best hypothesis is discarded.  Setting the beam width
     * to zero results in pruning to the category with the best
     * emission.  Setting the beam width to positive infinity
     * effectively turns off the beam.
     *
     * @param log2EmissionBeam Width of the beam.
     * @throws IllegalArgumentException If the beam width is negative
     * or not a number.
     */
    public void setLog2EmissionBeam(double log2EmissionBeam) {
        // a beam of zero is legal (prune to the single best emission);
        // only negative and NaN widths are rejected
        if (log2EmissionBeam < 0 || Double.isNaN(log2EmissionBeam)) {
            String msg = "Beam width must be a non-negative number."
                + " Found log2EmissionBeam=" + log2EmissionBeam;
            throw new IllegalArgumentException(msg);
        }
        mLog2EmissionBeam = log2EmissionBeam;
    }

    /**
     * Sets the log (base 2) Viterbi beam to the specified value.
     * This beam controls pruning based on full Viterbi values for a
     * given lattice slice.  See the class documentation above for
     * more details.
     *
     * @param log2Beam The log (base 2) Viterbi beam.
     * @throws IllegalArgumentException If the beam width is negative
     * or not a number.
     */
    public void setLog2Beam(double log2Beam) {
        if (log2Beam < 0 || Double.isNaN(log2Beam)) {
            String msg = "Beam width must be a non-negative number."
                + " Found log2Beam=" + log2Beam;
            throw new IllegalArgumentException(msg);
        }
        mLog2Beam = log2Beam;
    }

    /**
     * Sets the log (base 2) emission cache to the specified value.
     *
     * <p><i>Warning:</i> This method should not be executed
     * concurrently with any calls to decoding, as it may produce an
     * inconsistent result.  The typical application is to set a
     * cache before using a decoder.
     *
     * @param cache Cache for log (base 2) emission estimates.
     */
    public void setEmissionLog2Cache(Map<String,double[]> cache) {
        mEmissionLog2Cache = cache;
    }

    double[] cachedEmitProbs(String emission) {
        double[] emitProbs = mEmissionCache.get(emission);
        if (emitProbs != null) {