bison_8.htm

来自「Lex和Yacc的Manual」· HTM 代码 · 共 845 行 · 第 1/2 页
HTM
845 行
Not all rules and not all tokens have precedence.  If either the rule orthe look-ahead token has no precedence, then the default is to shift.</P><H2><A NAME="SEC76" HREF="index.html#SEC76">Context-Dependent Precedence</A></H2><P><A NAME="IDX162"></A><A NAME="IDX163"></A><A NAME="IDX164"></A><A NAME="IDX165"></A><A NAME="IDX166"></A></P><P>Often the precedence of an operator depends on the context.  This soundsoutlandish at first, but it is really very common.  For example, a minussign typically has a very high precedence as a unary operator, and asomewhat lower precedence (lower than multiplication) as a binary operator.</P><P>The Bison precedence declarations, <CODE>%left</CODE>, <CODE>%right</CODE> and<CODE>%nonassoc</CODE>, can only be used once for a given token; so a token hasonly one precedence declared in this way.  For context-dependentprecedence, you need to use an additional mechanism: the <CODE>%prec</CODE>modifier for rules.</P><P>The <CODE>%prec</CODE> modifier declares the precedence of a particular rule byspecifying a terminal symbol whose precedence should be used for that rule.It's not necessary for that symbol to appear otherwise in the rule.  Themodifier's syntax is:</P><PRE>%prec <VAR>terminal-symbol</VAR></PRE><P>and it is written after the components of the rule.  Its effect is toassign the rule the precedence of <VAR>terminal-symbol</VAR>, overridingthe precedence that would be deduced for it in the ordinary way.  Thealtered rule precedence then affects how conflicts involving that ruleare resolved (see section <A HREF="bison_8.html#SEC71">Operator Precedence</A>).</P><P>Here is how <CODE>%prec</CODE> solves the problem of unary minus.  First, declarea precedence for a fictitious terminal symbol named <CODE>UMINUS</CODE>.  Thereare no tokens of this type, but the symbol serves to stand for itsprecedence:</P><PRE>...%left '+' '-'%left '*'%left UMINUS</PRE><P>Now the precedence of <CODE>UMINUS</CODE> can be used in specific rules:</P><PRE>exp:    ...        | exp '-' exp        ...        | '-' exp %prec UMINUS</PRE><H2><A NAME="SEC77" HREF="index.html#SEC77">Parser States</A></H2><P><A NAME="IDX167"></A><A NAME="IDX168"></A><A NAME="IDX169"></A></P><P>The function <CODE>yyparse</CODE> is implemented using a finite-state machine.The values pushed on the parser stack are not simply token type codes; theyrepresent the entire sequence of terminal and nonterminal symbols at ornear the top of the stack.  The current state collects all the informationabout previous input which is relevant to deciding what to do next.</P><P>Each time a look-ahead token is read, the current parser state togetherwith the type of look-ahead token are looked up in a table.  This tableentry can say, "Shift the look-ahead token."  In this case, it alsospecifies the new parser state, which is pushed onto the top of theparser stack.  Or it can say, "Reduce using rule number <VAR>n</VAR>."This means that a certain number of tokens or groupings are taken offthe top of the stack, and replaced by one grouping.  In other words,that number of states are popped from the stack, and one new state ispushed.</P><P>There is one other alternative: the table can say that the look-ahead tokenis erroneous in the current state.  This causes error processing to begin(see section <A HREF="bison_9.html#SEC81">Error Recovery</A>).</P><H2><A NAME="SEC78" HREF="index.html#SEC78">Reduce/Reduce Conflicts</A></H2><P><A NAME="IDX170"></A><A NAME="IDX171"></A></P><P>A reduce/reduce conflict occurs if there are two or more rules that applyto the same sequence of input.  This usually indicates a serious errorin the grammar.</P><P>For example, here is an erroneous attempt to define a sequenceof zero or more <CODE>word</CODE> groupings.</P><PRE>sequence: /* empty */                { printf ("empty sequence\n"); }        | maybeword        | sequence word                { printf ("added word %s\n", $2); }        ;maybeword: /* empty */                { printf ("empty maybeword\n"); }        | word                { printf ("single word %s\n", $1); }        ;</PRE><P>The error is an ambiguity: there is more than one way to parse a single<CODE>word</CODE> into a <CODE>sequence</CODE>.  It could be reduced to a<CODE>maybeword</CODE> and then into a <CODE>sequence</CODE> via the second rule.Alternatively, nothing-at-all could be reduced into a <CODE>sequence</CODE>via the first rule, and this could be combined with the <CODE>word</CODE>using the third rule for <CODE>sequence</CODE>.</P><P>There is also more than one way to reduce nothing-at-all into a<CODE>sequence</CODE>.  This can be done directly via the first rule,or indirectly via <CODE>maybeword</CODE> and then the second rule.</P><P>You might think that this is a distinction without a difference, because itdoes not change whether any particular input is valid or not.  But it doesaffect which actions are run.  One parsing order runs the second rule'saction; the other runs the first rule's action and the third rule's action.In this example, the output of the program changes.</P><P>Bison resolves a reduce/reduce conflict by choosing to use the rule thatappears first in the grammar, but it is very risky to rely on this.  Everyreduce/reduce conflict must be studied and usually eliminated.  Here is theproper way to define <CODE>sequence</CODE>:</P><PRE>sequence: /* empty */                { printf ("empty sequence\n"); }        | sequence word                { printf ("added word %s\n", $2); }        ;</PRE><P>Here is another common error that yields a reduce/reduce conflict:</P><PRE>sequence: /* empty */        | sequence words        | sequence redirects        ;words:    /* empty */        | words word        ;redirects:/* empty */        | redirects redirect        ;</PRE><P>The intention here is to define a sequence which can contain either<CODE>word</CODE> or <CODE>redirect</CODE> groupings.  The individual definitions of<CODE>sequence</CODE>, <CODE>words</CODE> and <CODE>redirects</CODE> are error-free, but thethree together make a subtle ambiguity: even an empty input can be parsedin infinitely many ways!</P><P>Consider: nothing-at-all could be a <CODE>words</CODE>.  Or it could be two<CODE>words</CODE> in a row, or three, or any number.  It could equally well be a<CODE>redirects</CODE>, or two, or any number.  Or it could be a <CODE>words</CODE>followed by three <CODE>redirects</CODE> and another <CODE>words</CODE>.  And so on.</P><P>Here are two ways to correct these rules.  First, to make it a single levelof sequence:</P><PRE>sequence: /* empty */        | sequence word        | sequence redirect        ;</PRE><P>Second, to prevent either a <CODE>words</CODE> or a <CODE>redirects</CODE>from being empty:</P><PRE>sequence: /* empty */        | sequence words        | sequence redirects        ;words:    word        | words word        ;redirects:redirect        | redirects redirect        ;</PRE><H2><A NAME="SEC79" HREF="index.html#SEC79">Mysterious Reduce/Reduce Conflicts</A></H2><P>Sometimes reduce/reduce conflicts can occur that don't look warranted.Here is an example:</P><PRE>%token ID%%def:    param_spec return_spec ','        ;param_spec:             type        |    name_list ':' type        ;return_spec:             type        |    name ':' type        ;type:        ID        ;name:        ID        ;name_list:             name        |    name ',' name_list        ;</PRE><P>It would seem that this grammar can be parsed with only a single tokenof look-ahead: when a <CODE>param_spec</CODE> is being read, an <CODE>ID</CODE> is a <CODE>name</CODE> if a comma or colon follows, or a <CODE>type</CODE> if another<CODE>ID</CODE> follows.  In other words, this grammar is LR(1).</P><P><A NAME="IDX172"></A><A NAME="IDX173"></A>However, Bison, like most parser generators, cannot actually handle allLR(1) grammars.  In this grammar, two contexts, that after an <CODE>ID</CODE>at the beginning of a <CODE>param_spec</CODE> and likewise at the beginning ofa <CODE>return_spec</CODE>, are similar enough that Bison assumes they are thesame.  They appear similar because the same set of rules would beactive--the rule for reducing to a <CODE>name</CODE> and that for reducing toa <CODE>type</CODE>.  Bison is unable to determine at that stage of processingthat the rules would require different look-ahead tokens in the twocontexts, so it makes a single parser state for them both.  Combiningthe two contexts causes a conflict later.  In parser terminology, thisoccurrence means that the grammar is not LALR(1).</P><P>In general, it is better to fix deficiencies than to document them.  Butthis particular deficiency is intrinsically hard to fix; parsergenerators that can handle LR(1) grammars are hard to write and tend toproduce parsers that are very large.  In practice, Bison is more usefulas it is now.</P><P>When the problem arises, you can often fix it by identifying the twoparser states that are being confused, and adding something to make themlook distinct.  In the above example, adding one rule to<CODE>return_spec</CODE> as follows makes the problem go away:</P><PRE>%token BOGUS...%%...return_spec:             type        |    name ':' type        /* This rule is never used.  */        |    ID BOGUS        ;</PRE><P>This corrects the problem because it introduces the possibility of anadditional active rule in the context after the <CODE>ID</CODE> at the beginning of<CODE>return_spec</CODE>.  This rule is not active in the corresponding contextin a <CODE>param_spec</CODE>, so the two contexts receive distinct parser states.As long as the token <CODE>BOGUS</CODE> is never generated by <CODE>yylex</CODE>,the added rule cannot alter the way actual input is parsed.</P><P>In this particular example, there is another way to solve the problem:rewrite the rule for <CODE>return_spec</CODE> to use <CODE>ID</CODE> directlyinstead of via <CODE>name</CODE>.  This also causes the two confusingcontexts to have different sets of active rules, because the one for<CODE>return_spec</CODE> activates the altered rule for <CODE>return_spec</CODE>rather than the one for <CODE>name</CODE>.</P><PRE>param_spec:             type        |    name_list ':' type        ;return_spec:             type        |    ID ':' type        ;</PRE><H2><A NAME="SEC80" HREF="index.html#SEC80">Stack Overflow, and How to Avoid It</A></H2><P><A NAME="IDX174"></A><A NAME="IDX175"></A><A NAME="IDX176"></A></P><P>The Bison parser stack can overflow if too many tokens are shifted andnot reduced.  When this happens, the parser function <CODE>yyparse</CODE>returns a nonzero value, pausing only to call <CODE>yyerror</CODE> to reportthe overflow.</P><P><A NAME="IDX177"></A>By defining the macro <CODE>YYMAXDEPTH</CODE>, you can control how deep theparser stack can become before a stack overflow occurs.  Define themacro with a value that is an integer.  This value is the maximum numberof tokens that can be shifted (and not reduced) before overflow.It must be a constant expression whose value is known at compile time.</P><P>The stack space allowed is not necessarily allocated.  If you specify alarge value for <CODE>YYMAXDEPTH</CODE>, the parser actually allocates a smallstack at first, and then makes it bigger by stages as needed.  Thisincreasing allocation happens automatically and silently.  Therefore,you do not need to make <CODE>YYMAXDEPTH</CODE> painfully small merely to savespace for ordinary inputs that do not need much stack.</P><P><A NAME="IDX178"></A>The default value of <CODE>YYMAXDEPTH</CODE>, if you do not define it, is10000.</P><P><A NAME="IDX179"></A>You can control how much stack is allocated initially by defining themacro <CODE>YYINITDEPTH</CODE>.  This value too must be a compile-timeconstant integer.  The default is 200.</P><HR>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_7.html">previous</A>, <A HREF="bison_9.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.</BODY></HTML>
bison_8.htm - 源码说明

本页面展示了「Lex和Yacc的Manual」中的 bison_8.htm 源码文件，采用 HTM 编程语言编写，共 845 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与Manual相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?