/*
 * LingPipe v. 3.5
 * Copyright (C) 2003-2008 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.hmm;

import com.aliasi.symbol.SymbolTable;

import com.aliasi.util.BoundedPriorityQueue;
import com.aliasi.util.Iterators;
import com.aliasi.util.Scored;
import com.aliasi.util.ScoredObject;

import java.util.Map;
import java.util.Iterator;

/**
 * An <code>HmmDecoder</code> provides a range of decoding services
 * for hidden Markov models (HMMs).  A decoder is constructed from a
 * hidden Markov model.
 *
 * <P>The method {@link #firstBest(String[])} provides the most
 * likely (first-best) path of HMM states (tags) given a sequence of
 * string emissions (outputs).  First-best decoding is implemented
 * using Viterbi's dynamic programming algorithm.
 *
 * <h3>N-best Output</h3>
 *
 * <P>The method {@link #nBest(String[])} returns an iterator over
 * the n-best analyses.  N-best decoding is implemented using the
 * Viterbi algorithm in a forward pass and the A<sup>*</sup>
 * algorithm in the backward pass, using the Viterbi estimates as
 * exact completion estimates.  The variant {@link
 * #nBestConditional(String[])} further normalizes outputs to
 * posterior conditional probabilities.
 *
 * <h3>Confidence and Lattice Output</h3>
 *
 * <P>The method {@link #lattice(String[])} computes the entire
 * forward-backward lattice for the specified input.  Lattice
 * decoding is implemented using the standard forward-backward
 * algorithm.  The lattice is an instance of {@link TagWordLattice};
 * see that class's documentation for information on how to retrieve
 * cumulative (total) probabilities for input token sequences and
 * posterior conditional probabilities (confidence scores) per token.
 *
 * <h3>Caching</h3>
 *
 * <P>The decoder is able to cache emission probabilities, the
 * computation of which is often the bottleneck in decoding.  Caches
 * may be provided in the constructor for both ordinary (linear)
 * probabilities and log probabilities of emissions.  A cache is
 * simply an instance of {@link Map}.
 *
 * <p>The first-best and n-best decoders use only log probabilities,
 * whereas the n-best normalized and lattice decoders use linear
 * probabilities.  Only the probabilities computed are cached, so if
 * a program only does first-best processing, only log estimates are
 * cached.
 *
 * <P>Any implementation of <code>Map</code> may be used as a cache,
 * but particular attention must be paid to thread safety and
 * scalability.  A fully synchronized cache can be created with:
 *
 * <pre>
 * Map cache = java.util.Collections.synchronizedMap(new HashMap());</pre>
 *
 * The map implementation {@link com.aliasi.util.FastCache} is
 * designed particularly to be used as a cache in settings such as
 * these.
 *
 * <P>It is often easy (e.g. on English newswire) to get high token
 * coverage (e.g. 97%) with a rather modestly sized cache (e.g. 100K
 * tokens).  Other corpora and languages may vary, and we encourage
 * experimentation with the tradeoff between efficiency and memory
 * for caching.  Note that run times will speed up as more and more
 * estimates are returned from the cache rather than being computed
 * directly.
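 *
 * <P>As an illustrative sketch (the cache size and the use of only
 * the log (base 2) cache are assumptions, not requirements of this
 * class), a decoder that caches log emission estimates for
 * first-best decoding over a model <code>hmm</code> may be
 * constructed as:
 *
 * <pre>
 * Map&lt;String,double[]&gt; log2Cache
 *     = new com.aliasi.util.FastCache&lt;String,double[]&gt;(100000);
 * HmmDecoder decoder = new HmmDecoder(hmm,null,log2Cache);</pre>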
 *
 * <h3>Synchronization and Thread Safety</h3>
 *
 * <p>This class does not perform any underlying synchronization.
 * If the hidden Markov model is not thread safe, then it must be
 * synchronized.  The same holds for the caches.  Note that {@link
 * com.aliasi.util.FastCache}, while not synchronized, is thread
 * safe.  Similarly, the compilation of an HMM trained with {@link
 * HmmCharLmEstimator} is thread safe, in fact allowing safe
 * concurrent access because it is immutable.
 *
 * <h3>Beam and Pruning</h3>
 *
 * <p>For first-best and n-best decoding, through the methods {@link
 * #firstBest(String[])}, {@link #nBest(String[])}, and {@link
 * #nBestConditional(String[])}, a beam may be used to prune unlikely
 * hypotheses.  This beam is set during construction or through the
 * method {@link #setLog2EmissionBeam(double)} (setting and access
 * must be concurrent-read/exclusive-write synchronized by the
 * caller).  The beam works token by token.  As each token is
 * considered, any tag whose emission log (base 2) likelihood is more
 * than the beam width below the best emission estimate is eliminated
 * from further consideration.
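 *
 * <P>As an illustrative sketch (the particular beam widths are
 * assumptions; reasonable values depend on the model and the
 * application), a decoder over a model <code>hmm</code> with a
 * Viterbi beam of 12.0 and an emission beam of 8.0, and no caching,
 * may be constructed as:
 *
 * <pre>
 * HmmDecoder decoder = new HmmDecoder(hmm,null,null,12.0,8.0);
 *
 * decoder.setLog2EmissionBeam(6.0); // later tighten the emission beam</pre>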
 *
 * @author  Bob Carpenter
 * @version 3.6
 * @since   LingPipe2.1
 */
public class HmmDecoder {

    private final HiddenMarkovModel mHmm;

    private Map<String,double[]> mEmissionCache;
    private Map<String,double[]> mEmissionLog2Cache;

    private double mLog2EmissionBeam;
    private double mLog2Beam;

    /**
     * Construct an HMM decoder from the specified HMM.  No caching
     * is applied to estimates, and the beams are set to positive
     * infinity, turning off pruning.  This constructor is
     * appropriate for dynamic models with changing probability
     * estimates.
     *
     * @param hmm Model to use as the basis of decoding.
     */
    public HmmDecoder(HiddenMarkovModel hmm) {
        this(hmm,null,null);
    }

    /**
     * Construct an HMM decoder from the specified HMM using the
     * specified caches for linear and log probabilities.  The beams
     * are set to positive infinity, turning off pruning.  Either or
     * both of the caches may be <code>null</code>, in which case the
     * corresponding values will not be cached.
     *
     * @param hmm Model to use for decoding.
     * @param emissionCache Map to use for emission caching.
     * @param emissionLog2Cache Map to use for log emission caching.
     */
    public HmmDecoder(HiddenMarkovModel hmm,
                      Map<String,double[]> emissionCache,
                      Map<String,double[]> emissionLog2Cache) {
        this(hmm,emissionCache,emissionLog2Cache,
             Double.POSITIVE_INFINITY,
             Double.POSITIVE_INFINITY);
    }

    /**
     * Construct an HMM decoder from the specified HMM using the
     * specified caches for linear and log probabilities, with the
     * specified beam widths for Viterbi and emission estimates.
     * Either or both of the caches may be <code>null</code>, in
     * which case the corresponding values will not be cached.
     *
     * @param hmm Model to use for decoding.
     * @param emissionCache Map to use for emission caching.
     * @param emissionLog2Cache Map to use for log emission caching.
     * @param log2Beam The log (base 2) beam for pruning full hypotheses.
     * @param log2EmissionBeam The log (base 2) beam for pruning
     * emission hypotheses.
     * @throws IllegalArgumentException If either beam is not a
     * non-negative number.
     */
    public HmmDecoder(HiddenMarkovModel hmm,
                      Map<String,double[]> emissionCache,
                      Map<String,double[]> emissionLog2Cache,
                      double log2Beam,
                      double log2EmissionBeam) {
        mHmm = hmm;
        mEmissionCache = emissionCache;
        mEmissionLog2Cache = emissionLog2Cache;
        setLog2Beam(log2Beam);
        setLog2EmissionBeam(log2EmissionBeam);
    }

    /**
     * Returns the hidden Markov model underlying this decoder.  The
     * returned value is the actual HMM used by this decoder, so
     * changes to it will affect this decoder.
     *
     * @return The HMM for this decoder.
     */
    public HiddenMarkovModel getHmm() {
        return mHmm;
    }

    /**
     * Returns the mapping used to cache emission probabilities, or
     * <code>null</code> if not caching.  This is the actual mapping,
     * so changes to it will affect this decoder.
     *
     * @return The emission probability cache.
     */
    public Map<String,double[]> emissionCache() {
        return mEmissionCache;
    }

    /**
     * Returns the mapping used to cache log (base 2) emission
     * probabilities, or <code>null</code> if not caching.  This is
     * the actual mapping, so changes to it will affect this decoder.
     *
     * @return The log (base 2) emission probability cache.
     */
    public Map<String,double[]> emissionLog2Cache() {
        return mEmissionLog2Cache;
    }

    /**
     * Sets the emission cache to the specified value.
     *
     * <p><i>Warning:</i> This method should not be executed
     * concurrently with any calls to decoding, as it may produce an
     * inconsistent result.  The typical application is to set a
     * cache before using a decoder.
     *
     * @param cache Cache for linear emission estimates.
     */
    public void setEmissionCache(Map<String,double[]> cache) {
        mEmissionCache = cache;
    }

    /**
     * Sets the log (base 2) emission beam width.  Any tag with a log
     * (base 2) emission probability more than the beam width less
     * than the best hypothesis is discarded.  Setting the beam width
     * to zero results in pruning to the category with the best
     * emission.  Setting the beam width to positive infinity
     * effectively turns off the beam.
     *
     * @param log2EmissionBeam Width of the beam.
     * @throws IllegalArgumentException If the beam width is negative
     * or not a number.
     */
    public void setLog2EmissionBeam(double log2EmissionBeam) {
        // a beam of zero is legal (prune to the single best emission);
        // only negative and NaN widths are rejected
        if (log2EmissionBeam < 0 || Double.isNaN(log2EmissionBeam)) {
            String msg = "Beam width must be a non-negative number."
                + " Found log2EmissionBeam=" + log2EmissionBeam;
            throw new IllegalArgumentException(msg);
        }
        mLog2EmissionBeam = log2EmissionBeam;
    }

    /**
     * Sets the log (base 2) Viterbi beam to the specified value.
     * This beam controls pruning based on full Viterbi values for a
     * given lattice slice.  See the class documentation above for
     * more details.
     *
     * @param log2Beam The log (base 2) Viterbi beam.
     * @throws IllegalArgumentException If the beam width is negative
     * or not a number.
     */
    public void setLog2Beam(double log2Beam) {
        if (log2Beam < 0 || Double.isNaN(log2Beam)) {
            String msg = "Beam width must be a non-negative number."
                + " Found log2Beam=" + log2Beam;
            throw new IllegalArgumentException(msg);
        }
        mLog2Beam = log2Beam;
    }

    /**
     * Sets the log (base 2) emission cache to the specified value.
     *
     * <p><i>Warning:</i> This method should not be executed
     * concurrently with any calls to decoding, as it may produce an
     * inconsistent result.  The typical application is to set a
     * cache before using a decoder.
     *
     * @param cache Cache for log (base 2) emission estimates.
     */
    public void setEmissionLog2Cache(Map<String,double[]> cache) {
        mEmissionLog2Cache = cache;
    }

    double[] cachedEmitProbs(String emission) {
        double[] emitProbs = mEmissionCache.get(emission);
        if (emitProbs != null) {