📄 bison_6.htm

<HTML><HEAD><!-- This HTML file has been created by texi2html 1.44 from /opt/src/gnu/bison-1.25/bison.texinfo on 30 June 1997 --><TITLE>Bison 1.25 - Bison Grammar Files</TITLE></HEAD><BODY>
Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_5.html">previous</A>, <A HREF="bison_7.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.
<HR>
<H1><A NAME="SEC34" HREF="index.html#SEC34">Bison Grammar Files</A></H1>
<P>Bison takes as input a context-free grammar specification and produces a C-language function that recognizes correct instances of the grammar.</P>
<P>The Bison grammar input file conventionally has a name ending in <SAMP>`.y'</SAMP>.</P>
<H2><A NAME="SEC35" HREF="index.html#SEC35">Outline of a Bison Grammar</A></H2>
<P>A Bison grammar file has four main sections, shown here with the appropriate delimiters:</P>
<PRE>
%{
<VAR>C declarations</VAR>
%}

<VAR>Bison declarations</VAR>

%%
<VAR>Grammar rules</VAR>
%%

<VAR>Additional C code</VAR>
</PRE>
<P>Comments enclosed in <SAMP>`/* ... */'</SAMP> may appear in any of the sections.</P>
<H3><A NAME="SEC36" HREF="index.html#SEC36">The C Declarations Section</A></H3>
<P><A NAME="IDX50"></A><A NAME="IDX51"></A></P>
<P>The <VAR>C declarations</VAR> section contains macro definitions and declarations of functions and variables that are used in the actions in the grammar rules.  These are copied to the beginning of the parser file so that they precede the definition of <CODE>yyparse</CODE>.  You can use <SAMP>`#include'</SAMP> to get the declarations from a header file.
If you don't need any C declarations, you may omit the <SAMP>`%{'</SAMP> and <SAMP>`%}'</SAMP> delimiters that bracket this section.</P>
<H3><A NAME="SEC37" HREF="index.html#SEC37">The Bison Declarations Section</A></H3>
<P><A NAME="IDX52"></A><A NAME="IDX53"></A></P>
<P>The <VAR>Bison declarations</VAR> section contains declarations that define terminal and nonterminal symbols, specify precedence, and so on. In some simple grammars you may not need any declarations. See section <A HREF="bison_6.html#SEC49">Bison Declarations</A>.</P>
<H3><A NAME="SEC38" HREF="index.html#SEC38">The Grammar Rules Section</A></H3>
<P><A NAME="IDX54"></A><A NAME="IDX55"></A></P>
<P>The <STRONG>grammar rules</STRONG> section contains one or more Bison grammar rules, and nothing else.  See section <A HREF="bison_6.html#SEC41">Syntax of Grammar Rules</A>.</P>
<P>There must always be at least one grammar rule, and the first <SAMP>`%%'</SAMP> (which precedes the grammar rules) may never be omitted even if it is the first thing in the file.</P>
<H3><A NAME="SEC39" HREF="index.html#SEC39">The Additional C Code Section</A></H3>
<P><A NAME="IDX56"></A><A NAME="IDX57"></A></P>
<P>The <VAR>additional C code</VAR> section is copied verbatim to the end of the parser file, just as the <VAR>C declarations</VAR> section is copied to the beginning.  This is the most convenient place to put anything that you want to have in the parser file but which need not come before the definition of <CODE>yyparse</CODE>.  For example, the definitions of <CODE>yylex</CODE> and <CODE>yyerror</CODE> often go here.  See section <A HREF="bison_7.html#SEC59">Parser C-Language Interface</A>.</P>
<P>If the last section is empty, you may omit the <SAMP>`%%'</SAMP> that separates it from the grammar rules.</P>
<P>The Bison parser itself contains many static variables whose names start with <SAMP>`yy'</SAMP> and many macros whose names start with <SAMP>`YY'</SAMP>.
It is a good idea to avoid using any such names (except those documented in this manual) in the additional C code section of the grammar file.</P>
<H2><A NAME="SEC40" HREF="index.html#SEC40">Symbols, Terminal and Nonterminal</A></H2>
<P><A NAME="IDX58"></A><A NAME="IDX59"></A><A NAME="IDX60"></A><A NAME="IDX61"></A></P>
<P><STRONG>Symbols</STRONG> in Bison grammars represent the grammatical classifications of the language.</P>
<P>A <STRONG>terminal symbol</STRONG> (also known as a <STRONG>token type</STRONG>) represents a class of syntactically equivalent tokens.  You use the symbol in grammar rules to mean that a token in that class is allowed.  The symbol is represented in the Bison parser by a numeric code, and the <CODE>yylex</CODE> function returns a token type code to indicate what kind of token has been read.  You don't need to know what the code value is; you can use the symbol to stand for it.</P>
<P>A <STRONG>nonterminal symbol</STRONG> stands for a class of syntactically equivalent groupings.  The symbol name is used in writing grammar rules.  By convention, it should be all lower case.</P>
<P>Symbol names can contain letters, digits (not at the beginning), underscores and periods.  Periods make sense only in nonterminals.</P>
<P>There are three ways of writing terminal symbols in the grammar:</P>
<UL>
<LI>A <STRONG>named token type</STRONG> is written with an identifier, like an identifier in C.  By convention, it should be all upper case.  Each such name must be defined with a Bison declaration such as <CODE>%token</CODE>.  See section <A HREF="bison_6.html#SEC50">Token Type Names</A>.
<LI><A NAME="IDX62"></A><A NAME="IDX63"></A><A NAME="IDX64"></A>A <STRONG>character token type</STRONG> (or <STRONG>literal character token</STRONG>) is written in the grammar using the same syntax used in C for character constants; for example, <CODE>'+'</CODE> is a character token type.
A character token type doesn't need to be declared unless you need to specify its semantic value data type (see section <A HREF="bison_6.html#SEC44">Data Types of Semantic Values</A>), associativity, or precedence (see section <A HREF="bison_8.html#SEC71">Operator Precedence</A>).
By convention, a character token type is used only to represent a token that consists of that particular character.  Thus, the token type <CODE>'+'</CODE> is used to represent the character <SAMP>`+'</SAMP> as a token.  Nothing enforces this convention, but if you depart from it, your program will confuse other readers.
All the usual escape sequences used in character literals in C can be used in Bison as well, but you must not use the null character as a character literal because its ASCII code, zero, is the code <CODE>yylex</CODE> returns for end-of-input (see section <A HREF="bison_7.html#SEC62">Calling Convention for <CODE>yylex</CODE></A>).
<LI><A NAME="IDX65"></A><A NAME="IDX66"></A><A NAME="IDX67"></A>A <STRONG>literal string token</STRONG> is written like a C string constant; for example, <CODE>"&#60;="</CODE> is a literal string token.  A literal string token doesn't need to be declared unless you need to specify its semantic value data type (see section <A HREF="bison_6.html#SEC44">Data Types of Semantic Values</A>), associativity, or precedence (see section <A HREF="bison_8.html#SEC71">Operator Precedence</A>).
You can associate the literal string token with a symbolic name as an alias, using the <CODE>%token</CODE> declaration (see section <A HREF="bison_6.html#SEC50">Token Type Names</A>).  If you don't do that, the lexical analyzer has to retrieve the token number for the literal string token from the <CODE>yytname</CODE> table (see section <A HREF="bison_7.html#SEC62">Calling Convention for <CODE>yylex</CODE></A>).
<STRONG>WARNING</STRONG>: literal string tokens do not work in Yacc.
By convention, a literal string token is used only to represent a token that consists of that particular string.
Thus, you should use the token type <CODE>"&#60;="</CODE> to represent the string <SAMP>`&#60;='</SAMP> as a token.  Bison does not enforce this convention, but if you depart from it, people who read your program will be confused.
All the escape sequences used in string literals in C can be used in Bison as well.  A literal string token must contain two or more characters; for a token containing just one character, use a character token (see above).
</UL>
<P>How you choose to write a terminal symbol has no effect on its grammatical meaning.  That depends only on where it appears in rules and on when the parser function returns that symbol.</P>
<P>The value returned by <CODE>yylex</CODE> is always one of the terminal symbols (or 0 for end-of-input).  Whichever way you write the token type in the grammar rules, you write it the same way in the definition of <CODE>yylex</CODE>. The numeric code for a character token type is simply the ASCII code for the character, so <CODE>yylex</CODE> can use the identical character constant to generate the requisite code.  Each named token type becomes a C macro in the parser file, so <CODE>yylex</CODE> can use the name to stand for the code. (This is why periods don't make sense in terminal symbols.)  See section <A HREF="bison_7.html#SEC62">Calling Convention for <CODE>yylex</CODE></A>.</P>
<P>If <CODE>yylex</CODE> is defined in a separate file, you need to arrange for the token-type macro definitions to be available there.  Use the <SAMP>`-d'</SAMP> option when you run Bison, so that it will write these macro definitions into a separate header file <TT>`<VAR>name</VAR>.tab.h'</TT> which you can include in the other source files that need it.
See section <A HREF="bison_12.html#SEC87">Invoking Bison</A>.</P>
<P>The symbol <CODE>error</CODE> is a terminal symbol reserved for error recovery (see section <A HREF="bison_9.html#SEC81">Error Recovery</A>); you shouldn't use it for any other purpose. In particular, <CODE>yylex</CODE> should never return this value.</P>
<H2><A NAME="SEC41" HREF="index.html#SEC41">Syntax of Grammar Rules</A></H2>
<P><A NAME="IDX68"></A><A NAME="IDX69"></A><A NAME="IDX70"></A></P>
<P>A Bison grammar rule has the following general form:</P>
<PRE>
<VAR>result</VAR>: <VAR>components</VAR>...
        ;
</PRE>
<P>where <VAR>result</VAR> is the nonterminal symbol that this rule describes and <VAR>components</VAR> are various terminal and nonterminal symbols that are put together by this rule (see section <A HREF="bison_6.html#SEC40">Symbols, Terminal and Nonterminal</A>).  </P>
<P>For example,</P>
<PRE>
exp:      exp '+' exp
        ;
</PRE>
<P>says that two groupings of type <CODE>exp</CODE>, with a <SAMP>`+'</SAMP> token in between, can be combined into a larger grouping of type <CODE>exp</CODE>.</P>
<P>Whitespace in rules is significant only to separate symbols.  You can add extra whitespace as you wish.</P>
<P>Scattered among the components can be <VAR>actions</VAR> that determine the semantics of the rule.  An action looks like this:</P>
<PRE>
{<VAR>C statements</VAR>}
</PRE>
<P>Usually there is only one action and it follows the components. See section <A HREF="bison_6.html#SEC46">Actions</A>.</P>
<P><A NAME="IDX71"></A>Multiple rules for the same <VAR>result</VAR> can be written separately or can be joined with the vertical-bar character <SAMP>`|'</SAMP> as follows:</P>
<PRE>
<VAR>result</VAR>:    <VAR>rule1-components</VAR>...
        | <VAR>rule2-components</VAR>...
        ...
        ;
</PRE>
<P>They are still considered distinct rules even when joined in this way.</P>
<P>If <VAR>components</VAR> in a rule is empty, it means that <VAR>result</VAR> can match the empty string.
For example, here is how to define a comma-separated sequence of zero or more <CODE>exp</CODE> groupings:</P>
<PRE>
expseq:   /* empty */
        | expseq1
        ;

expseq1:  exp
        | expseq1 ',' exp
        ;
</PRE>
<P>It is customary to write a comment <SAMP>`/* empty */'</SAMP> in each rule with no components.</P>
<H2><A NAME="SEC42" HREF="index.html#SEC42">Recursive Rules</A></H2>
<P><A NAME="IDX72"></A></P>
<P>A rule is called <STRONG>recursive</STRONG> when its <VAR>result</VAR> nonterminal also appears on its right hand side.  Nearly all Bison grammars need to use recursion, because that is the only way to define a sequence of any number of a particular thing.  Consider this recursive definition of a comma-separated sequence of one or more expressions:</P>
<PRE>
expseq1:  exp
        | expseq1 ',' exp
        ;
</PRE>
<P><A NAME="IDX73"></A><A NAME="IDX74"></A>Since the recursive use of <CODE>expseq1</CODE> is the leftmost symbol in the right hand side, we call this <STRONG>left recursion</STRONG>.  By contrast, here the same construct is defined using <STRONG>right recursion</STRONG>:</P>
<PRE>
expseq1:  exp
        | exp ',' expseq1
        ;
</PRE>
<P>Any kind of sequence can be defined using either left recursion or right recursion, but you should always use left recursion, because it can parse a sequence of any number of elements with bounded stack space.  Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once.  See section <A HREF="bison_8.html#SEC68">The Bison Parser Algorithm</A>, for further explanation of this.</P>
<P><A NAME="IDX75"></A><STRONG>Indirect</STRONG> or <STRONG>mutual</STRONG> recursion occurs when the result of the rule does not appear directly on its right hand side, but does appear in rules for other nonterminals which do appear on its right hand side.
</P>
<P>For example:</P>
<PRE>
expr:     primary
        | primary '+' primary
        ;

primary:  constant
        | '(' expr ')'
        ;
</PRE>
<P>defines two mutually-recursive nonterminals, since each refers to the other.</P>
<H2><A NAME="SEC43" HREF="index.html#SEC43">Defining Language Semantics</A></H2>
<P><A NAME="IDX76"></A><A NAME="IDX77"></A></P>
<P>The grammar rules for a language determine only the syntax.  The semantics are determined by the semantic values associated with various tokens and groupings, and by the actions taken when various groupings are recognized.</P>
<P>For example, the calculator calculates properly because the value associated with each expression is the proper number; it adds properly because the action for the grouping <SAMP>`<VAR>x</VAR> + <VAR>y</VAR>'</SAMP> is to add the numbers associated with <VAR>x</VAR> and <VAR>y</VAR>.</P>
<H3><A NAME="SEC44" HREF="index.html#SEC44">Data Types of Semantic Values</A></H3>
<P>
