📄 bison.texinfo
字号:
@end example@noindentThe action says how to produce the semantic value of the sum expressionfrom the values of the two subexpressions.@node GLR Parsers@section Writing @acronym{GLR} Parsers@cindex @acronym{GLR} parsing@cindex generalized @acronym{LR} (@acronym{GLR}) parsing@findex %glr-parser@cindex conflicts@cindex shift/reduce conflicts@cindex reduce/reduce conflictsIn some grammars, Bison's standard@acronym{LALR}(1) parsing algorithm cannot decide whether to apply acertain grammar rule at a given point. That is, it may not be able todecide (on the basis of the input read so far) which of two possiblereductions (applications of a grammar rule) applies, or whether to applya reduction or read more of the input and apply a reduction later in theinput. These are known respectively as @dfn{reduce/reduce} conflicts(@pxref{Reduce/Reduce}), and @dfn{shift/reduce} conflicts(@pxref{Shift/Reduce}).To use a grammar that is not easily modified to be @acronym{LALR}(1), amore general parsing algorithm is sometimes necessary. If you include@code{%glr-parser} among the Bison declarations in your file(@pxref{Grammar Outline}), the result is a Generalized @acronym{LR}(@acronym{GLR}) parser. These parsers handle Bison grammars thatcontain no unresolved conflicts (i.e., after applying precedencedeclarations) identically to @acronym{LALR}(1) parsers. However, whenfaced with unresolved shift/reduce and reduce/reduce conflicts,@acronym{GLR} parsers use the simple expedient of doing both,effectively cloning the parser to follow both possibilities. Each ofthe resulting parsers can again split, so that at any given time, therecan be any number of possible parses being explored. The parsersproceed in lockstep; that is, all of them consume (shift) a given inputsymbol before any of them proceed to the next. Each of the clonedparsers eventually meets one of two possible fates: either it runs intoa parsing error, in which case it simply vanishes, or it merges withanother parser, because the two of them have reduced the input to anidentical set of symbols.During the time that there are multiple parsers, semantic actions arerecorded, but not performed. When a parser disappears, its recordedsemantic actions disappear as well, and are never performed. When areduction makes two parsers identical, causing them to merge, Bisonrecords both sets of semantic actions. Whenever the last two parsersmerge, reverting to the single-parser case, Bison resolves all theoutstanding actions either by precedences given to the grammar rulesinvolved, or by performing both actions, and then calling a designateduser-defined function on the resulting values to produce an arbitrarymerged result.@menu* Simple GLR Parsers:: Using @acronym{GLR} parsers on unambiguous grammars* Merging GLR Parses:: Using @acronym{GLR} parsers to resolve ambiguities* Compiler Requirements:: @acronym{GLR} parsers require a modern C compiler@end menu@node Simple GLR Parsers@subsection Using @acronym{GLR} on Unambiguous Grammars@cindex @acronym{GLR} parsing, unambiguous grammars@cindex generalized @acronym{LR} (@acronym{GLR}) parsing, unambiguous grammars@findex %glr-parser@findex %expect-rr@cindex conflicts@cindex reduce/reduce conflicts@cindex shift/reduce conflictsIn the simplest cases, you can use the @acronym{GLR} algorithmto parse grammars that are unambiguous, but fail to be @acronym{LALR}(1).Such grammars typically require more than one symbol of look-ahead,or (in rare cases) fall into the category of grammars in which the@acronym{LALR}(1) algorithm throws away too much information (they are in@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}).Consider a problem thatarises in the declaration of enumerated and subrange types in theprogramming language Pascal. Here are some examples:@exampletype subrange = lo .. hi;type enum = (a, b, c);@end example@noindentThe original language standard allows only numericliterals and constant identifiers for the subrange bounds (@samp{lo}and @samp{hi}), but Extended Pascal (@acronym{ISO}/@acronym{IEC}10206) and many otherPascal implementations allow arbitrary expressions there. This givesrise to the following situation, containing a superfluous pair ofparentheses:@exampletype subrange = (a) .. b;@end example@noindentCompare this to the following declaration of an enumeratedtype with only one value:@exampletype enum = (a);@end example@noindent(These declarations are contrived, but they are syntacticallyvalid, and more-complicated cases can come up in practical programs.)These two declarations look identical until the @samp{..} token.With normal @acronym{LALR}(1) one-token look-ahead it is notpossible to decide between the two forms when the identifier@samp{a} is parsed. It is, however, desirablefor a parser to decide this, since in the latter case@samp{a} must become a new identifier to represent the enumerationvalue, while in the former case @samp{a} must be evaluated with itscurrent meaning, which may be a constant or even a function call.You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',to be resolved later, but this typically requires substantialcontortions in both semantic actions and large parts of thegrammar, where the parentheses are nested in the recursive rules forexpressions.You might think of using the lexer to distinguish between the twoforms by returning different tokens for currently defined andundefined identifiers. But if these declarations occur in a localscope, and @samp{a} is defined in an outer scope, then both formsare possible---either locally redefining @samp{a}, or using thevalue of @samp{a} from the outer scope. So this approach cannotwork.A simple solution to this problem is to declare the parser touse the @acronym{GLR} algorithm.When the @acronym{GLR} parser reaches the critical state, itmerely splits into two branches and pursues both syntax rulessimultaneously. Sooner or later, one of them runs into a parsingerror. If there is a @samp{..} token before the next@samp{;}, the rule for enumerated types fails since it cannotaccept @samp{..} anywhere; otherwise, the subrange type rulefails since it requires a @samp{..} token. So one of the branchesfails silently, and the other one continues normally, performingall the intermediate actions that were postponed during the split.If the input is syntactically incorrect, both branches fail and the parserreports a syntax error as usual.The effect of all this is that the parser seems to ``guess'' thecorrect branch to take, or in other words, it seems to use morelook-ahead than the underlying @acronym{LALR}(1) algorithm actually allowsfor. In this example, @acronym{LALR}(2) would suffice, but also some casesthat are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way.In general, a @acronym{GLR} parser can take quadratic or cubic worst-case time,and the current Bison parser even takes exponential time and spacefor some grammars. In practice, this rarely happens, and for manygrammars it is possible to prove that it cannot happen.The present example contains only one conflict between tworules, and the type-declaration context containing the conflictcannot be nested. So the number ofbranches that can exist at any time is limited by the constant 2,and the parsing time is still linear.Here is a Bison grammar corresponding to the example above. Itparses a vastly simplified form of Pascal type declarations.@example%token TYPE DOTDOT ID@group%left '+' '-'%left '*' '/'@end group%%@grouptype_decl : TYPE ID '=' type ';' ;@end group@grouptype : '(' id_list ')' | expr DOTDOT expr ;@end group@groupid_list : ID | id_list ',' ID ;@end group@groupexpr : '(' expr ')' | expr '+' expr | expr '-' expr | expr '*' expr | expr '/' expr | ID ;@end group@end exampleWhen used as a normal @acronym{LALR}(1) grammar, Bison correctly complainsabout one reduce/reduce conflict. In the conflicting situation theparser chooses one of the alternatives, arbitrarily the onedeclared first. Therefore the following correct input is notrecognized:@exampletype t = (a) .. b;@end exampleThe parser can be turned into a @acronym{GLR} parser, while also telling Bisonto be silent about the one known reduce/reduce conflict, byadding these two declarations to the Bison input file (before the first@samp{%%}):@example%glr-parser%expect-rr 1@end example@noindentNo change in the grammar itself is required. Now theparser recognizes all valid declarations, according to thelimited syntax above, transparently. In fact, the user does not evennotice when the parser splits.So here we have a case where we can use the benefits of @acronym{GLR}, almostwithout disadvantages. Even in simple cases like this, however, thereare at least two potential problems to beware.First, always analyze the conflicts reported byBison to make sure that @acronym{GLR} splitting is only done where it isintended. A @acronym{GLR} parser splitting inadvertently may causeproblems less obvious than an @acronym{LALR} parser statically choosing thewrong alternative in a conflict.Second, consider interactions with the lexer (@pxref{Semantic Tokens})with great care. Since a split parser consumes tokenswithout performing any actions during the split, the lexer cannotobtain information via parser actions. Some cases oflexer interactions can be eliminated by using @acronym{GLR} toshift the complications from the lexer to the parser. You must checkthe remaining cases for correctness.In our example, it would be safe for the lexer to return tokensbased on their current meanings in some symbol table, because no newsymbols are defined in the middle of a type declaration. Though itis possible for a parser to define the enumerationconstants as they are parsed, before the type declaration iscompleted, it actually makes no difference since they cannot be usedwithin the same enumerated type declaration.@node Merging GLR Parses@subsection Using @acronym{GLR} to Resolve Ambiguities@cindex @acronym{GLR} parsing, ambiguous grammars@cindex generalized @acronym{LR} (@acronym{GLR}) parsing, ambiguous grammars@findex %dprec@findex %merge@cindex conflicts@cindex reduce/reduce conflictsLet's consider an example, vastly simplified from a C++ grammar.@example%@{ #include <stdio.h> #define YYSTYPE char const * int yylex (void); void yyerror (char const *);%@}%token TYPENAME ID%right '='%left '+'%glr-parser%%prog : | prog stmt @{ printf ("\n"); @} ;stmt : expr ';' %dprec 1 | decl %dprec 2 ;expr : ID @{ printf ("%s ", $$); @} | TYPENAME '(' expr ')' @{ printf ("%s <cast> ", $1); @} | expr '+' expr @{ printf ("+ "); @} | expr '=' expr @{ printf ("= "); @} ;decl : TYPENAME declarator ';' @{ printf ("%s <declare> ", $1); @} | TYPENAME declarator '=' expr ';' @{ printf ("%s <init-declare> ", $1); @} ;declarator : ID @{ printf ("\"%s\" ", $1); @} | '(' declarator ')' ;@end example@noindentThis models a problematic part of the C++ grammar---the ambiguity betweencertain declarations and statements. For example,@exampleT (x) = y+z;@end example@noindentparses as either an @code{expr} or a @code{stmt}(assuming that @samp{T} is recognized as a @code{TYPENAME} and@samp{x} as an @code{ID}).Bison detects this as a reduce/reduce conflict between the rules@code{expr : ID} and @code{declarator : ID}, which it cannot resolve at thetime it encounters @code{x} in the example above. Since this is a@acronym{GLR} parser, it therefore splits the problem into two parses, one foreach choice of resolving the reduce/reduce conflict.Unlike the example from the previous section (@pxref{Simple GLR Parsers}),however, neither of these parses ``dies,'' because the grammar as it stands isambiguous. One of the parsers eventually reduces @code{stmt : expr ';'} andthe other reduces @code{stmt : decl}, after which both parsers are in anidentical state: they've seen @samp{prog stmt} and have the same unprocessedinput remaining. We say that these parses have @dfn{merged.}At this point, the @acronym{GLR} parser requires a specification in thegrammar of how to choose between the competing parses.In the example above, the two @code{%dprec}declarations specify that Bison is to give precedence
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -