📄 node.java

📁 一个自然语言处理的Java开源工具包。LingPipe目前已有很丰富的功能
💻 JAVA
📖 第 1 页 / 共 2 页
字号:
12 下一页
/* * LingPipe v. 3.5 * Copyright (C) 2003-2008 Alias-i * * This program is licensed under the Alias-i Royalty Free License * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i * Royalty Free License Version 1 for more details. * * You should have received a copy of the Alias-i Royalty Free License * Version 1 along with this program; if not, visit * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211, * +1 (718) 290-9170. */package com.aliasi.chunk;import com.aliasi.symbol.SymbolTableCompiler;import java.util.Iterator;import java.util.TreeMap;import java.util.Set;/** * A <code>Node</code> represents a context event for evaluation of a * conditional probability.  It is the main data structure employed by * {@link EstimatorTrie}.  A node represents a context as a string * symbol.  A node maps symbols representing event outcomes to a * {@link OutcomeCounter}.  A node supports incrementing outcome * counters, and tracks counts the total number of outcomes and total * number of times all of its outcomes have been incremented.  Nodes * are used for smoothed estimation, in support of which they map * event symbols to finer-grained context and provide a back-pointer * to the next most general context (in general, for a node with * symbol <code>"a"</code>, if <code>node.backoffNode()</code> exists, * then <code>node == node.backoffNode().getChild("a")</code>. * * <p> Each node computes estimates based on Witten-Bell-style * smoothing, which is a form of linear interpolation smoothing where the * interpolation between a finer and coarser contexts is determined by * counts on the finer context.  The exact formula used to define * the estimate for a given node in a given context, assuming that * there is a backoff context, is: * * <pre> *   context.estimate(outcome) *       = context.lambda()          * context.directEstimate(outcome) *         + ( 1 -context.lambda() ) * context.backoffContext().estimate(outcome) * *   context.lambda() *       = context.totalCount() *         / ( context.totalCount()               + ( LAMBDA_FACTOR * context.numOutcomes() ) ) * </pre> * * where <code>context.estimate(outcome)</code> is the estimate * provided by the node <code>context</code> to the outcome * <code>outcome</code>, <code>context.lambda()</code> is the linear * interpolation factor, <code>context.directEstimate(outcome)</code> * is a simple frequency-based maximum likelihood estimate of the * outcome (number of times the outcome event has been incremented * divided by the total number of outcomes that have been * incremented), and <code>context.backoffContext()</code> is the next * most general context after <code>context</code>.  In cases where * the context being the most general context and having no backoff * context, the direct estimate is returned.  The definition of * <code>lambda()</code> involves <code>context.totalCount())</code>, * which is the total number of training events incremented on the * node, and <code>context.numOutcomes()</code>, which is the number * of distinct outcomes incremented on this node.  The maximum * likelihood estimate for an outcome is given by * <code>context.directEstimate()</code> which is just the count of a * particular outcome (read from its counter) divided by the total * count.  The definition of <code>lambda()</code> ensures <code>0.0 * <= lambda() <= 1.0</code> for a proper linear interpolation factor. * <code>lambda()</code> increases as the total number of outcomes * increases, meaning that the more data seen by the finer grained * estimate, the more weight it is given in interpolation, whereas * <code>lambda()</code> decreases as more different events have been * seen; the balance between these two factors is determined by the * <code>LAMBDA_FACTOR</code>, which is an estimator tuning parameter. * * <br/><br/> * <table cellpadding="5" border="1"><tr><td width="100%"> * * For example, estimating <code>P(a|b,c)</code>, assume the context * <code>c1</code> represents <code>b,c</code> and context * <code>c2</code> represents context <code>b</code>, so that * <code>c1.backoffContext() == c2</code>, and further assume that * <code>c2</code> is the most general context available.  Then the * estimate of <code>P(a|b,c)</code> is given by: * * <blockquote><code> *   c1.lambda() * *   c1.directEstimate(a) + (1-c1.lambda()) * c2.directEstimate(a) * </code></blockquote> * * </td></tr></table> * </p> * * <p>In typical usage, a root node will be used to navigate down to * more specific contexts.  The root node may or may not act as a * general backoff node; estimators can go even further than the null * observed context to a uniform prior estimate.  These nodes will * form a trie-structure of event contexts which will be incremented * during training.  Then, they may be pruned from a root node to * remove all daughter nodes and counters with fewer than a specified * number of outcomes.  Finally, estimates may be compiled, which * caches the important values, and assigns array indexes to each node * where the nodes are implicit in the indexes of a collection of * parallel arrays.  The structure of an entire estimator is captured, * for example, by {@link com.aliasi.ne.TrainableEstimator}.  </p> * * @author  Bob Carpenter * @version 1.0 * @since   LingPipe1.0 * @see OutcomeCounter * @see EstimatorTrie */class Node {    /**     * Cached version of 1-lambda()     */    float mOneMinusLambda;    /**     * Index in an array used to store nodes.     */    private int mIndex = -1;    /**     * Total number of outcomes incremented for this node.     */    private int mTotalCount = 0;    /**     * Number of distinct outcomes for this node.     */    private short mNumOutcomes = 0;    /**     * A mapping from event symbols to more specific context nodes.     */    private final TreeMap mChildren = new TreeMap();    /**     * A mapping from event symbols to outcome counters.     */    private final TreeMap mOutcomes = new TreeMap();    /**     * The next most general node in an estimator; may be     * <code>null</code> for maximally specific contexts.     */    private final Node mBackoffNode;    /**     * Symbol table compiler for the context symbols of this     * estimator.     */    private final SymbolTableCompiler mSymbolTable;    /**     * The symbol representing the event difference between     * this outcome and the next most general outcome.     */    private final String mSymbol;    /**     * Construct a node representing an estimation context event with     * a specified symbol representing an event, a symbol table     * compiler for the symbol, and a backoff node representing a more     * general context.     *     * @param symbol String representing symbol for this context.     * @param symbolTable Table compiler for the symbol.     * @param backoffNode Next more general context.     */    public Node(String symbol, SymbolTableCompiler symbolTable,                Node backoffNode) {        mSymbol = symbol;        if (symbolTable == null)            throw new IllegalArgumentException("Null table.");        mSymbolTable = symbolTable;        if (symbol != null) symbolTable.addSymbol(symbol);        mBackoffNode = backoffNode;    }    public void printSymbols() {        if (mSymbolTable == null) System.out.println("NULL Symbol TABLE");        System.out.println(mSymbolTable.toString());    }    /**     * Return the identifier for the symbol representing the context     * event for this node.  The full context is the joint context of     * this node's context and all of the more general nodes'     * contexts.     *     * @return Identifier for the symbol representing the context     * event for this node.     */    public int getSymbolID() {        if (mSymbol == null) return -1;        return mSymbolTable.symbolToID(mSymbol);    }    /**     * Adds the symbol for this node to the specified symbol table,     * and recursively for each more specific context node and each     * outcome counter.     */    public void generateSymbols() {        if (mSymbol != null) mSymbolTable.addSymbol(mSymbol);        Iterator outcomeIterator = mOutcomes.values().iterator();        while (outcomeIterator.hasNext())            ((OutcomeCounter)outcomeIterator.next()).addSymbolToTable();        Iterator childIterator = mChildren.values().iterator();        while (childIterator.hasNext())            ((Node)childIterator.next()).generateSymbols();    }    /**     * Index of this node after compiling all nodes into     * array indices.  Will be <code>-1</code> if called     * before {@link #setIndex(int)} is called to set     * the index.     *     * @return Array index for this context.     */    public int index() {        return mIndex;    }    /**     * Set the array index for this node, which will be     * available through {@link #index()}.     *     * @param index Index for this context.     */    public void setIndex(int index) {        mIndex = index;    }    /**     * Prune all daughter nodes and outcomes with fewer than the     * specified number of outcomes.  Adjusts total outcome counts     * accordingly for estimation.     *     * @param threshold Minimal count for outcomes and child nodes to     * be preserved.     */    public void prune(int threshold) {        Iterator outcomes = outcomes().iterator();        while (outcomes.hasNext()) {            OutcomeCounter counter = getOutcome(outcomes.next().toString());            if (counter.count() < threshold) {                mTotalCount -= counter.count();                --mNumOutcomes;                outcomes.remove();            }        }        Iterator childrenIt = children().iterator();
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -