📄 bison_8.htm
字号:
<HTML><HEAD><!-- This HTML file has been created by texi2html 1.44 from /opt/src/gnu/bison-1.25/bison.texinfo on 30 June 1997 --><TITLE>Bison 1.25 - The Bison Parser Algorithm</TITLE></HEAD><BODY>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_7.html">previous</A>, <A HREF="bison_9.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.<HR><H1><A NAME="SEC68" HREF="index.html#SEC68">The Bison Parser Algorithm</A></H1><P><A NAME="IDX144"></A><A NAME="IDX145"></A><A NAME="IDX146"></A><A NAME="IDX147"></A><A NAME="IDX148"></A><A NAME="IDX149"></A></P><P>As Bison reads tokens, it pushes them onto a stack along with theirsemantic values. The stack is called the <STRONG>parser stack</STRONG>. Pushing atoken is traditionally called <STRONG>shifting</STRONG>.</P><P>For example, suppose the infix calculator has read <SAMP>`1 + 5 *'</SAMP>, with a<SAMP>`3'</SAMP> to come. The stack will have four elements, one for each tokenthat was shifted.</P><P>But the stack does not always have an element for each token read. Whenthe last <VAR>n</VAR> tokens and groupings shifted match the components of agrammar rule, they can be combined according to that rule. This is called<STRONG>reduction</STRONG>. Those tokens and groupings are replaced on the stack by asingle grouping whose symbol is the result (left hand side) of that rule.Running the rule's action is part of the process of reduction, because thisis what computes the semantic value of the resulting grouping.</P><P>For example, if the infix calculator's parser stack contains this:</P><PRE>1 + 5 * 3</PRE><P>and the next input token is a newline character, then the last threeelements can be reduced to 15 via the rule:</P><PRE>expr: expr '*' expr;</PRE><P>Then the stack contains just these three elements:</P><PRE>1 + 15</PRE><P>At this point, another reduction can be made, resulting in the single value16. Then the newline token can be shifted.</P><P>The parser tries, by shifts and reductions, to reduce the entire input downto a single grouping whose symbol is the grammar's start-symbol(see section <A HREF="bison_4.html#SEC8">Languages and Context-Free Grammars</A>).</P><P>This kind of parser is known in the literature as a bottom-up parser.</P><H2><A NAME="SEC69" HREF="index.html#SEC69">Look-Ahead Tokens</A></H2><P><A NAME="IDX150"></A></P><P>The Bison parser does <EM>not</EM> always reduce immediately as soon as thelast <VAR>n</VAR> tokens and groupings match a rule. This is because such asimple strategy is inadequate to handle most languages. Instead, when areduction is possible, the parser sometimes "looks ahead" at the nexttoken in order to decide what to do.</P><P>When a token is read, it is not immediately shifted; first it becomes the<STRONG>look-ahead token</STRONG>, which is not on the stack. Now the parser canperform one or more reductions of tokens and groupings on the stack, whilethe look-ahead token remains off to the side. When no more reductionsshould take place, the look-ahead token is shifted onto the stack. Thisdoes not mean that all possible reductions have been done; depending on thetoken type of the look-ahead token, some rules may choose to delay theirapplication.</P><P>Here is a simple case where look-ahead is needed. These three rules defineexpressions which contain binary addition operators and postfix unaryfactorial operators (<SAMP>`!'</SAMP>), and allow parentheses for grouping.</P><PRE>expr: term '+' expr | term ;term: '(' expr ')' | term '!' | NUMBER ;</PRE><P>Suppose that the tokens <SAMP>`1 + 2'</SAMP> have been read and shifted; whatshould be done? If the following token is <SAMP>`)'</SAMP>, then the first threetokens must be reduced to form an <CODE>expr</CODE>. This is the only validcourse, because shifting the <SAMP>`)'</SAMP> would produce a sequence of symbols<CODE>term ')'</CODE>, and no rule allows this.</P><P>If the following token is <SAMP>`!'</SAMP>, then it must be shifted immediately sothat <SAMP>`2 !'</SAMP> can be reduced to make a <CODE>term</CODE>. If instead theparser were to reduce before shifting, <SAMP>`1 + 2'</SAMP> would become an<CODE>expr</CODE>. It would then be impossible to shift the <SAMP>`!'</SAMP> becausedoing so would produce on the stack the sequence of symbols <CODE>expr'!'</CODE>. No rule allows that sequence.</P><P><A NAME="IDX151"></A>The current look-ahead token is stored in the variable <CODE>yychar</CODE>.See section <A HREF="bison_7.html#SEC67">Special Features for Use in Actions</A>.</P><H2><A NAME="SEC70" HREF="index.html#SEC70">Shift/Reduce Conflicts</A></H2><P><A NAME="IDX152"></A><A NAME="IDX153"></A><A NAME="IDX154"></A><A NAME="IDX155"></A></P><P>Suppose we are parsing a language which has if-then and if-then-elsestatements, with a pair of rules like this:</P><PRE>if_stmt: IF expr THEN stmt | IF expr THEN stmt ELSE stmt ;</PRE><P>Here we assume that <CODE>IF</CODE>, <CODE>THEN</CODE> and <CODE>ELSE</CODE> areterminal symbols for specific keyword tokens.</P><P>When the <CODE>ELSE</CODE> token is read and becomes the look-ahead token, thecontents of the stack (assuming the input is valid) are just right forreduction by the first rule. But it is also legitimate to shift the<CODE>ELSE</CODE>, because that would lead to eventual reduction by the secondrule.</P><P>This situation, where either a shift or a reduction would be valid, iscalled a <STRONG>shift/reduce conflict</STRONG>. Bison is designed to resolvethese conflicts by choosing to shift, unless otherwise directed byoperator precedence declarations. To see the reason for this, let'scontrast it with the other alternative.</P><P>Since the parser prefers to shift the <CODE>ELSE</CODE>, the result is to attachthe else-clause to the innermost if-statement, making these two inputsequivalent:</P><PRE>if x then if y then win (); else lose;if x then do; if y then win (); else lose; end;</PRE><P>But if the parser chose to reduce when possible rather than shift, theresult would be to attach the else-clause to the outermost if-statement,making these two inputs equivalent:</P><PRE>if x then if y then win (); else lose;if x then do; if y then win (); end; else lose;</PRE><P>The conflict exists because the grammar as written is ambiguous: eitherparsing of the simple nested if-statement is legitimate. The establishedconvention is that these ambiguities are resolved by attaching theelse-clause to the innermost if-statement; this is what Bison accomplishesby choosing to shift rather than reduce. (It would ideally be cleaner towrite an unambiguous grammar, but that is very hard to do in this case.)This particular ambiguity was first encountered in the specifications ofAlgol 60 and is called the "dangling <CODE>else</CODE>" ambiguity.</P><P>To avoid warnings from Bison about predictable, legitimate shift/reduceconflicts, use the <CODE>%expect <VAR>n</VAR></CODE> declaration. There will be nowarning as long as the number of shift/reduce conflicts is exactly <VAR>n</VAR>.See section <A HREF="bison_6.html#SEC54">Suppressing Conflict Warnings</A>.</P><P>The definition of <CODE>if_stmt</CODE> above is solely to blame for theconflict, but the conflict does not actually appear without additionalrules. Here is a complete Bison input file that actually manifests theconflict:</P><PRE>%token IF THEN ELSE variable%%stmt: expr | if_stmt ;if_stmt: IF expr THEN stmt | IF expr THEN stmt ELSE stmt ;expr: variable ;</PRE><H2><A NAME="SEC71" HREF="index.html#SEC71">Operator Precedence</A></H2><P><A NAME="IDX156"></A><A NAME="IDX157"></A></P><P>Another situation where shift/reduce conflicts appear is in arithmeticexpressions. Here shifting is not always the preferred resolution; theBison declarations for operator precedence allow you to specify when toshift and when to reduce.</P><H3><A NAME="SEC72" HREF="index.html#SEC72">When Precedence is Needed</A></H3><P>Consider the following ambiguous grammar fragment (ambiguous because theinput <SAMP>`1 - 2 * 3'</SAMP> can be parsed in two different ways):</P><PRE>expr: expr '-' expr | expr '*' expr | expr '<' expr | '(' expr ')' ... ;</PRE><P>Suppose the parser has seen the tokens <SAMP>`1'</SAMP>, <SAMP>`-'</SAMP> and <SAMP>`2'</SAMP>;should it reduce them via the rule for the addition operator? It dependson the next token. Of course, if the next token is <SAMP>`)'</SAMP>, we mustreduce; shifting is invalid because no single rule can reduce the tokensequence <SAMP>`- 2 )'</SAMP> or anything starting with that. But if the nexttoken is <SAMP>`*'</SAMP> or <SAMP>`<'</SAMP>, we have a choice: either shifting orreduction would allow the parse to complete, but with differentresults.</P><P>To decide which one Bison should do, we must consider theresults. If the next operator token <VAR>op</VAR> is shifted, then itmust be reduced first in order to permit another opportunity toreduce the sum. The result is (in effect) <SAMP>`1 - (2<VAR>op</VAR> 3)'</SAMP>. On the other hand, if the subtraction is reducedbefore shifting <VAR>op</VAR>, the result is <SAMP>`(1 - 2) <VAR>op</VAR>3'</SAMP>. Clearly, then, the choice of shift or reduce should dependon the relative precedence of the operators <SAMP>`-'</SAMP> and<VAR>op</VAR>: <SAMP>`*'</SAMP> should be shifted first, but not <SAMP>`<'</SAMP>.</P><P><A NAME="IDX158"></A>What about input such as <SAMP>`1 - 2 - 5'</SAMP>; should this be<SAMP>`(1 - 2) - 5'</SAMP> or should it be <SAMP>`1 - (2 - 5)'</SAMP>? Formost operators we prefer the former, which is called <STRONG>leftassociation</STRONG>. The latter alternative, <STRONG>right association</STRONG>, isdesirable for assignment operators. The choice of left or rightassociation is a matter of whether the parser chooses to shift orreduce when the stack contains <SAMP>`1 - 2'</SAMP> and the look-aheadtoken is <SAMP>`-'</SAMP>: shifting makes right-associativity.</P><H3><A NAME="SEC73" HREF="index.html#SEC73">Specifying Operator Precedence</A></H3><P><A NAME="IDX159"></A><A NAME="IDX160"></A><A NAME="IDX161"></A></P><P>Bison allows you to specify these choices with the operator precedencedeclarations <CODE>%left</CODE> and <CODE>%right</CODE>. Each such declarationcontains a list of tokens, which are operators whose precedence andassociativity is being declared. The <CODE>%left</CODE> declaration makes allthose operators left-associative and the <CODE>%right</CODE> declaration makesthem right-associative. A third alternative is <CODE>%nonassoc</CODE>, whichdeclares that it is a syntax error to find the same operator twice "in arow".</P><P>The relative precedence of different operators is controlled by theorder in which they are declared. The first <CODE>%left</CODE> or<CODE>%right</CODE> declaration in the file declares the operators whoseprecedence is lowest, the next such declaration declares the operatorswhose precedence is a little higher, and so on.</P><H3><A NAME="SEC74" HREF="index.html#SEC74">Precedence Examples</A></H3><P>In our example, we would want the following declarations:</P><PRE>%left '<'%left '-'%left '*'</PRE><P>In a more complete example, which supports other operators as well, wewould declare them in groups of equal precedence. For example, <CODE>'+'</CODE> isdeclared with <CODE>'-'</CODE>:</P><PRE>%left '<' '>' '=' NE LE GE%left '+' '-'%left '*' '/'</PRE><P>(Here <CODE>NE</CODE> and so on stand for the operators for "not equal"and so on. We assume that these tokens are more than one character longand therefore are represented by names, not character literals.)</P><H3><A NAME="SEC75" HREF="index.html#SEC75">How Precedence Works</A></H3><P>The first effect of the precedence declarations is to assign precedencelevels to the terminal symbols declared. The second effect is to assignprecedence levels to certain rules: each rule gets its precedence from thelast terminal symbol mentioned in the components. (You can also specifyexplicitly the precedence of a rule. See section <A HREF="bison_8.html#SEC76">Context-Dependent Precedence</A>.)</P><P>Finally, the resolution of conflicts works by comparing theprecedence of the rule being considered with that of thelook-ahead token. If the token's precedence is higher, thechoice is to shift. If the rule's precedence is higher, thechoice is to reduce. If they have equal precedence, the choiceis made based on the associativity of that precedence level. Theverbose output file made by <SAMP>`-v'</SAMP> (see section <A HREF="bison_12.html#SEC87">Invoking Bison</A>) sayshow each conflict was resolved.</P><P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -