📄 dfastate.java
字号:
/* [The "BSD licence"] Copyright (c) 2005-2006 Terence Parr All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.*/package org.antlr.analysis;import org.antlr.misc.IntSet;import org.antlr.misc.OrderedHashSet;import org.antlr.misc.Utils;import org.antlr.tool.Grammar;import java.util.*;/** A DFA state represents a set of possible NFA configurations. * As Aho, Sethi, Ullman p. 117 says "The DFA uses its state * to keep track of all possible states the NFA can be in after * reading each input symbol. That is to say, after reading * input a1a2..an, the DFA is in a state that represents the * subset T of the states of the NFA that are reachable from the * NFA's start state along some path labeled a1a2..an." * In conventional NFA->DFA conversion, therefore, the subset T * would be a bitset representing the set of states the * NFA could be in. We need to track the alt predicted by each * state as well, however. More importantly, we need to maintain * a stack of states, tracking the closure operations as they * jump from rule to rule, emulating rule invocations (method calls). * Recall that NFAs do not normally have a stack like a pushdown-machine * so I have to add one to simulate the proper lookahead sequences for * the underlying LL grammar from which the NFA was derived. * * I use a list of NFAConfiguration objects. An NFAConfiguration * is both a state (ala normal conversion) and an NFAContext describing * the chain of rules (if any) followed to arrive at that state. There * is also the semantic context, which is the "set" of predicates found * on the path to this configuration. * * A DFA state may have multiple references to a particular state, * but with different NFAContexts (with same or different alts) * meaning that state was reached via a different set of rule invocations. */public class DFAState extends State { public static final int INITIAL_NUM_TRANSITIONS = 4; public static final int PREDICTED_ALT_UNSET = NFA.INVALID_ALT_NUMBER-1; /** We are part of what DFA? Use this ref to get access to the * context trees for an alt. */ public DFA dfa; /** Track the transitions emanating from this DFA state. The List * elements are Transition objects. */ protected List transitions = new ArrayList(INITIAL_NUM_TRANSITIONS); /** When doing an acyclic DFA, this is the number of lookahead symbols * consumed to reach this state. This value may be nonzero for most * dfa states, but it is only a valid value if the user has specified * a max fixed lookahead. */ protected int k; /** The NFA->DFA algorithm may terminate leaving some states * without a path to an accept state, implying that upon certain * input, the decision is not deterministic--no decision about * predicting a unique alternative can be made. Recall that an * accept state is one in which a unique alternative is predicted. */ protected int acceptStateReachable = DFA.REACHABLE_UNKNOWN; /** Rather than recheck every NFA configuration in a DFA state (after * resolving) in findNewDFAStatesAndAddDFATransitions just check * this boolean. Saves a linear walk perhaps DFA state creation. * Every little bit helps. */ protected boolean resolvedWithPredicates = false; /** If a closure operation finds that we tried to invoke the same * rule too many times (stack would grow beyond a threshold), it * marks the state has aborted and notifies the DecisionProbe. */ protected boolean abortedDueToRecursionOverflow = false; /** If we detect recursion on more than one alt, decision is non-LL(*), * but try to isolate it to only those states whose closure operations * detect recursion. There may be other alts that are cool: * * a : recur '.' * | recur ';' * | X Y // LL(2) decision; don't abort and use k=1 plus backtracking * | X Z * ; */ protected boolean abortedDueToMultipleRecursiveAlts = false; /** Build up the hash code for this state as NFA configurations * are added as it's monotonically increasing list of configurations. */ protected int cachedHashCode; protected int cachedUniquelyPredicatedAlt = PREDICTED_ALT_UNSET; /** The set of NFA configurations (state,alt,context) for this DFA state */ protected Set nfaConfigurations = new HashSet(); /** Used to prevent the closure operation from looping to itself and * hence looping forever. Sensitive to the NFA state, the alt, and * the context. This just the nfa config set because we want to * prevent closures only on states contributed by closure not reach * operations. */ protected Set closureBusy = new HashSet(); /** As this state is constructed (i.e., as NFA states are added), we * can easily check for non-epsilon transitions because the only * transition that could be a valid label is transition(0). When we * process this node eventually, we'll have to walk all states looking * for all possible transitions. That is of the order: size(label space) * times size(nfa states), which can be pretty damn big. It's better * to simply track possible labels. * This is type List<Label>. */ protected OrderedHashSet reachableLabels = new OrderedHashSet(); public DFAState(DFA dfa) { this.dfa = dfa; } public Transition transition(int i) { return (Transition)transitions.get(i); } public int getNumberOfTransitions() { return transitions.size(); } public void addTransition(Transition t) { transitions.add(t); } /** Add a transition from this state to target with label. Return * the transition number from 0..n-1. */ public int addTransition(DFAState target, Label label) { transitions.add( new Transition(label, target) ); return transitions.size()-1; } public Transition getTransition(int trans) { return (Transition)transitions.get(trans); } public void removeTransition(int trans) { transitions.remove(trans); } /** Add an NFA configuration to this DFA node. Add uniquely * an NFA state/alt/syntactic&semantic context (chain of invoking state(s) * and semantic predicate contexts). * * I don't see how there could be two configurations with same * state|alt|synCtx and different semantic contexts because the * semantic contexts are computed along the path to a particular state * so those two configurations would have to have the same predicate. * Nonetheless, the addition of configurations is unique on all * configuration info. I guess I'm saying that syntactic context * implies semantic context as the latter is computed according to the * former. * * As we add configurations to this DFA state, track the set of all possible * transition labels so we can simply walk it later rather than doing a * loop over all possible labels in the NFA. */ public void addNFAConfiguration(NFAState state, NFAConfiguration c) { if ( nfaConfigurations.contains(c) ) { return; } nfaConfigurations.add(c); // update hashCode; for some reason using context.hashCode() also // makes the GC take like 70% of the CPU and is slow! cachedHashCode += c.state + c.alt; // update reachableLabels if ( state.transition(0)!=null ) { Label label = state.transition(0).label; if ( !(label.isEpsilon()||label.isSemanticPredicate()) ) { if ( state.transition(1)==null ) { c.singleAtomTransitionEmanating = true; } addReachableLabel(label); } } } public void addNFAConfiguration(NFAState state, int alt, NFAContext context, SemanticContext semanticContext) { NFAConfiguration c = new NFAConfiguration(state.stateNumber, alt, context, semanticContext); addNFAConfiguration(state, c); } /** Add label uniquely and disjointly; intersection with * another set or int/char forces breaking up the set(s). * * Example, if reachable list of labels is [a..z, {k,9}, 0..9], * the disjoint list will be [{a..j,l..z}, k, 9, 0..8]. * * As we add NFA configurations to a DFA state, we might as well track * the set of all possible transition labels to make the DFA conversion * more efficient. W/o the reachable labels, we'd need to check the * whole vocabulary space (could be 0..\uFFFF)! The problem is that * labels can be sets, which may overlap with int labels or other sets. * As we need a deterministic set of transitions from any * state in the DFA, we must make the reachable labels set disjoint. * This operation amounts to finding the character classes for this * DFA state whereas with tools like flex, that need to generate a * homogeneous DFA, must compute char classes across all states. * We are going to generate DFAs with heterogeneous states so we * only care that the set of transitions out of a single state are * unique. :) * * The idea for adding a new set, t, is to look for overlap with the * elements of existing list s. Upon overlap, replace * existing set s[i] with two new disjoint sets, s[i]-t and s[i]&t. * (if s[i]-t is nil, don't add). The remainder is t-s[i], which is * what you want to add to the set minus what was already there. The * remainder must then be compared against the i+1..n elements in s * looking for another collision. Each collision results in a smaller * and smaller remainder. Stop when you run out of s elements or * remainder goes to nil. If remainder is non nil when you run out of * s elements, then add remainder to the end. * * Single element labels are treated as sets to make the code uniform. */ protected void addReachableLabel(Label label) { /* System.out.println("addReachableLabel to state "+dfa.decisionNumber+"."+stateNumber+": "+label.getSet().toString(dfa.nfa.grammar)); System.out.println("start of add to state "+dfa.decisionNumber+"."+stateNumber+": " + "reachableLabels="+reachableLabels.toString()); */ if ( reachableLabels.contains(label) ) { // exact label present return; } IntSet t = label.getSet(); IntSet remainder = t; // remainder starts out as whole set to add int n = reachableLabels.size(); // only look at initial elements
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -