bison_6.htm

From 「Lex和Yacc的Manual」 · HTM code · 1,405 lines in total · page 1 of 3
<A NAME="IDX78"></A><A NAME="IDX79"></A><A NAME="IDX80"></A><A NAME="IDX81"></A></P>
<P>In a simple program it may be sufficient to use the same data type for the semantic values of all language constructs.  This was true in the RPN and infix calculator examples (see section <A HREF="bison_5.html#SEC16">Reverse Polish Notation Calculator</A>).</P>
<P>Bison's default is to use type <CODE>int</CODE> for all semantic values.  To specify some other type, define <CODE>YYSTYPE</CODE> as a macro, like this:</P>
<PRE>#define YYSTYPE double</PRE>
<P>This macro definition must go in the C declarations section of the grammar file (see section <A HREF="bison_6.html#SEC35">Outline of a Bison Grammar</A>).</P>
<H3><A NAME="SEC45" HREF="index.html#SEC45">More Than One Value Type</A></H3>
<P>In most programs, you will need different data types for different kinds of tokens and groupings.  For example, a numeric constant may need type <CODE>int</CODE> or <CODE>long</CODE>, while a string constant needs type <CODE>char *</CODE>, and an identifier might need a pointer to an entry in the symbol table.</P>
<P>To use more than one data type for semantic values in one parser, Bison requires you to do two things:</P>
<UL>
<LI>Specify the entire collection of possible data types, with the <CODE>%union</CODE> Bison declaration (see section <A HREF="bison_6.html#SEC52">The Collection of Value Types</A>).
<LI>Choose one of those types for each symbol (terminal or nonterminal) for which semantic values are used.
This is done for tokens with the <CODE>%token</CODE> Bison declaration (see section <A HREF="bison_6.html#SEC50">Token Type Names</A>) and for groupings with the <CODE>%type</CODE> Bison declaration (see section <A HREF="bison_6.html#SEC53">Nonterminal Symbols</A>).
</UL>
<H3><A NAME="SEC46" HREF="index.html#SEC46">Actions</A></H3>
<P><A NAME="IDX82"></A><A NAME="IDX83"></A><A NAME="IDX84"></A></P>
<P>An action accompanies a syntactic rule and contains C code to be executed each time an instance of that rule is recognized.  The task of most actions is to compute a semantic value for the grouping built by the rule from the semantic values associated with tokens or smaller groupings.</P>
<P>An action consists of C statements surrounded by braces, much like a compound statement in C.  It can be placed at any position in the rule; it is executed at that position.  Most rules have just one action at the end of the rule, following all the components.  Actions in the middle of a rule are tricky and used only for special purposes (see section <A HREF="bison_6.html#SEC48">Actions in Mid-Rule</A>).</P>
<P>The C code in an action can refer to the semantic values of the components matched by the rule with the construct <CODE>$<VAR>n</VAR></CODE>, which stands for the value of the <VAR>n</VAR>th component.  The semantic value for the grouping being constructed is <CODE>$$</CODE>.  (Bison translates both of these constructs into array element references when it copies the actions into the parser file.)</P>
<P>Here is a typical example:</P>
<PRE>
exp:    ...
        | exp '+' exp
            { $$ = $1 + $3; }
</PRE>
<P>This rule constructs an <CODE>exp</CODE> from two smaller <CODE>exp</CODE> groupings connected by a plus-sign token.
In the action, <CODE>$1</CODE> and <CODE>$3</CODE> refer to the semantic values of the two component <CODE>exp</CODE> groupings, which are the first and third symbols on the right hand side of the rule.  The sum is stored into <CODE>$$</CODE> so that it becomes the semantic value of the addition-expression just recognized by the rule.  If there were a useful semantic value associated with the <SAMP>`+'</SAMP> token, it could be referred to as <CODE>$2</CODE>.</P>
<P><A NAME="IDX85"></A>If you don't specify an action for a rule, Bison supplies a default: <CODE>$$ = $1</CODE>.  Thus, the value of the first symbol in the rule becomes the value of the whole rule.  Of course, the default rule is valid only if the two data types match.  There is no meaningful default action for an empty rule; every empty rule must have an explicit action unless the rule's value does not matter.</P>
<P><CODE>$<VAR>n</VAR></CODE> with <VAR>n</VAR> zero or negative is allowed for reference to tokens and groupings on the stack <EM>before</EM> those that match the current rule.  This is a very risky practice, and to use it reliably you must be certain of the context in which the rule is applied.  Here is a case in which you can use this reliably:</P>
<PRE>
foo:      expr bar '+' expr  { ... }
        | expr bar '-' expr  { ... }
        ;

bar:      /* empty */
        { previous_expr = $0; }
        ;
</PRE>
<P>As long as <CODE>bar</CODE> is used only in the fashion shown here, <CODE>$0</CODE> always refers to the <CODE>expr</CODE> which precedes <CODE>bar</CODE> in the definition of <CODE>foo</CODE>.</P>
<H3><A NAME="SEC47" HREF="index.html#SEC47">Data Types of Values in Actions</A></H3>
<P><A NAME="IDX86"></A><A NAME="IDX87"></A></P>
<P>If you have chosen a single data type for semantic values, the <CODE>$$</CODE> and <CODE>$<VAR>n</VAR></CODE> constructs always have that data type.</P>
<P>If you have used <CODE>%union</CODE> to specify a variety of data types, then you must declare a choice among these types for each terminal or nonterminal symbol that can have a semantic value.  Then each time you use <CODE>$$</CODE> or <CODE>$<VAR>n</VAR></CODE>, its data type is determined by which symbol it refers to in the rule.  In this example,</P>
<PRE>
exp:    ...
        | exp '+' exp
            { $$ = $1 + $3; }
</PRE>
<P><CODE>$1</CODE> and <CODE>$3</CODE> refer to instances of <CODE>exp</CODE>, so they all have the data type declared for the nonterminal symbol <CODE>exp</CODE>.  If <CODE>$2</CODE> were used, it would have the data type declared for the terminal symbol <CODE>'+'</CODE>, whatever that might be.</P>
<P>Alternatively, you can specify the data type when you refer to the value, by inserting <SAMP>`&#60;<VAR>type</VAR>&#62;'</SAMP> after the <SAMP>`$'</SAMP> at the beginning of the reference.
For example, if you have defined types as shown here:</P>
<PRE>
%union {
  int itype;
  double dtype;
}
</PRE>
<P>then you can write <CODE>$&#60;itype&#62;1</CODE> to refer to the first subunit of the rule as an integer, or <CODE>$&#60;dtype&#62;1</CODE> to refer to it as a double.</P>
<H3><A NAME="SEC48" HREF="index.html#SEC48">Actions in Mid-Rule</A></H3>
<P><A NAME="IDX88"></A><A NAME="IDX89"></A></P>
<P>Occasionally it is useful to put an action in the middle of a rule.  These actions are written just like usual end-of-rule actions, but they are executed before the parser even recognizes the following components.</P>
<P>A mid-rule action may refer to the components preceding it using <CODE>$<VAR>n</VAR></CODE>, but it may not refer to subsequent components because it is run before they are parsed.</P>
<P>The mid-rule action itself counts as one of the components of the rule.  This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number <VAR>n</VAR> to use in <CODE>$<VAR>n</VAR></CODE>.</P>
<P>The mid-rule action can also have a semantic value.  The action can set its value with an assignment to <CODE>$$</CODE>, and actions later in the rule can refer to the value using <CODE>$<VAR>n</VAR></CODE>.  Since there is no symbol to name the action, there is no way to declare a data type for the value in advance, so you must use the <SAMP>`$&#60;...&#62;'</SAMP> construct to specify a data type each time you refer to this value.</P>
<P>There is no way to set the value of the entire rule with a mid-rule action, because assignments to <CODE>$$</CODE> do not have that effect.
The only way to set the value for the entire rule is with an ordinary action at the end of the rule.</P>
<P>Here is an example from a hypothetical compiler, handling a <CODE>let</CODE> statement that looks like <SAMP>`let (<VAR>variable</VAR>) <VAR>statement</VAR>'</SAMP> and serves to create a variable named <VAR>variable</VAR> temporarily for the duration of <VAR>statement</VAR>.  To parse this construct, we must put <VAR>variable</VAR> into the symbol table while <VAR>statement</VAR> is parsed, then remove it afterward.  Here is how it is done:</P>
<PRE>
stmt:   LET '(' var ')'
                { $&#60;context&#62;$ = push_context ();
                  declare_variable ($3); }
        stmt    { $$ = $6;
                  pop_context ($&#60;context&#62;5); }
</PRE>
<P>As soon as <SAMP>`let (<VAR>variable</VAR>)'</SAMP> has been recognized, the first action is run.  It saves a copy of the current semantic context (the list of accessible variables) as its semantic value, using alternative <CODE>context</CODE> in the data-type union.  Then it calls <CODE>declare_variable</CODE> to add the new variable to that list.  Once the first action is finished, the embedded statement <CODE>stmt</CODE> can be parsed.  Note that the mid-rule action is component number 5, so the <SAMP>`stmt'</SAMP> is component number 6.</P>
<P>After the embedded statement is parsed, its semantic value becomes the value of the entire <CODE>let</CODE>-statement.  Then the semantic value from the earlier action is used to restore the prior list of variables.  This removes the temporary <CODE>let</CODE>-variable from the list so that it won't appear to exist while the rest of the program is parsed.</P>
<P>Taking action before a rule is completely recognized often leads to conflicts since the parser must commit to a parse in order to execute the action.
For example, the following two rules, without mid-rule actions, can coexist in a working parser because the parser can shift the open-brace token and look at what follows before deciding whether there is a declaration or not:</P>
<PRE>
compound: '{' declarations statements '}'
        | '{' statements '}'
        ;
</PRE>
<P>But when we add a mid-rule action as follows, the rules become nonfunctional:</P>
<PRE>
compound: { prepare_for_local_variables (); }
          '{' declarations statements '}'
        | '{' statements '}'
        ;
</PRE>
<P>Now the parser is forced to decide whether to run the mid-rule action when it has read no farther than the open-brace.  In other words, it must commit to using one rule or the other, without sufficient information to do it correctly.  (The open-brace token is what is called the <STRONG>look-ahead</STRONG> token at this time, since the parser is still deciding what to do about it.  See section <A HREF="bison_8.html#SEC69">Look-Ahead Tokens</A>.)</P>
<P>You might think that you could correct the problem by putting identical actions into the two rules, like this:</P>
<PRE>
compound: { prepare_for_local_variables (); }
          '{' declarations statements '}'
        | { prepare_for_local_variables (); }
          '{' statements '}'
        ;
</PRE>
<P>But this does not help, because Bison does not realize that the two actions are identical.
(Bison never tries to understand the C code in an action.)</P>
<P>If the grammar is such that a declaration can be distinguished from a statement by the first token (which is true in C), then one solution which does work is to put the action after the open-brace, like this:</P>
<PRE>
compound: '{' { prepare_for_local_variables (); }
          declarations statements '}'
        | '{' statements '}'
        ;
</PRE>
<P>Now the first token of the following declaration or statement, which would in any case tell Bison which rule to use, can still do so.</P>
<P>Another solution is to bury the action inside a nonterminal symbol which serves as a subroutine:</P>
<PRE>
subroutine: /* empty */
          { prepare_for_local_variables (); }
        ;

compound: subroutine
          '{' declarations statements '}'
        | subroutine
          '{' statements '}'
        ;
</PRE>
<P>Now Bison can execute the action in the rule for <CODE>subroutine</CODE> without deciding which rule for <CODE>compound</CODE> it will eventually use.  Note that the action is now at the end of its rule.  Any mid-rule action can be converted to an end-of-rule action in this way, and this is what Bison actually does to implement mid-rule actions.</P>
<H2><A NAME="SEC49" HREF="index.html#SEC49">Bison Declarations</A></H2>
<P><A NAME="IDX90"></A><A NAME="IDX91"></A></P>
<P>The <STRONG>Bison declarations</STRONG> section of a Bison grammar defines the symbols used in formulating the grammar and the data types of semantic values.  See section <A HREF="bison_6.html#SEC40">Symbols, Terminal and Nonterminal</A>.</P>
<P>All token type names (but not single-character literal tokens such as <CODE>'+'</CODE> and <CODE>'*'</CODE>) must be declared.
Nonterminal symbols must be declared if you need to specify which data type to use for the semantic value (see section <A HREF="bison_6.html#SEC45">More Than One Value Type</A>).</P>
<P>The first rule in the file also specifies the start symbol, by default.  If you want some other symbol to be the start symbol, you must declare it explicitly (see section <A HREF="bison_4.html#SEC8">Languages and Context-Free Grammars</A>).</P>
<H3><A NAME="SEC50" HREF="index.html#SEC50">Token Type Names</A></H3>
<P><A NAME="IDX92"></A><A NAME="IDX93"></A><A NAME="IDX94"></A><A NAME="IDX95"></A></P>
<P>The basic way to declare a token type name (terminal symbol) is as follows:</P>
<PRE>%token <VAR>name</VAR></PRE>
<P>Bison will convert this into a <CODE>#define</CODE> directive in the parser, so that the function <CODE>yylex</CODE> (if it is in this file) can use the name <VAR>name</VAR> to stand for this token type's code.</P>
<P>Alternatively, you can use <CODE>%left</CODE>, <CODE>%right</CODE>, or <CODE>%nonassoc</CODE> instead of <CODE>%token</CODE>, if you wish to specify precedence.  See section <A HREF="bison_6.html#SEC51">Operator Precedence</A>.</P>
<P>You can explicitly specify the numeric code for a token type by appending an integer value in the field immediately following the token name:</P>
<PRE>%token NUM 300</PRE>
<P>It is generally best, however, to let Bison choose the numeric codes for all token types.  Bison will automatically select codes that don't conflict with each other or with ASCII characters.</P>
<P>In the event that the stack type is a union, you must augment the <CODE>%token</CODE> or other token declaration to include the data type alternative delimited by angle-brackets (see section <A HREF="bison_6.html#SEC45">More Than One Value Type</A>).</P>
<P>For example:</P>
<PRE>%union {              /* define stack type */