📄 bison_5.htm

📁 Lex和Yacc的Manual
💻 HTM
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<HTML><HEAD><!-- This HTML file has been created by texi2html 1.44     from /opt/src/gnu/bison-1.25/bison.texinfo on 30 June 1997 --><TITLE>Bison 1.25 - Examples</TITLE></HEAD><BODY>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_4.html">previous</A>, <A HREF="bison_6.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.<HR><H1><A NAME="SEC15" HREF="index.html#SEC15">Examples</A></H1><P><A NAME="IDX28"></A><A NAME="IDX29"></A></P><P>Now we show and explain three sample programs written using Bison: areverse polish notation calculator, an algebraic (infix) notationcalculator, and a multi-function calculator.  All three have been testedunder BSD Unix 4.3; each produces a usable, though limited, interactivedesk-top calculator.</P><P>These examples are simple, but Bison grammars for real programminglanguages are written the same way.</P><H2><A NAME="SEC16" HREF="index.html#SEC16">Reverse Polish Notation Calculator</A></H2><P><A NAME="IDX30"></A><A NAME="IDX31"></A><A NAME="IDX32"></A><A NAME="IDX33"></A></P><P>The first example is that of a simple double-precision <STRONG>reverse polishnotation</STRONG> calculator (a calculator using postfix operators).  This exampleprovides a good starting point, since operator precedence is not an issue.The second example will illustrate how operator precedence is handled.</P><P>The source code for this calculator is named <TT>`rpcalc.y'</TT>.  The<SAMP>`.y'</SAMP> extension is a convention used for Bison input files.</P><H3><A NAME="SEC17" HREF="index.html#SEC17">Declarations for <CODE>rpcalc</CODE></A></H3><P>Here are the C and Bison declarations for the reverse polish notationcalculator.  As in C, comments are placed between <SAMP>`/*...*/'</SAMP>.</P><PRE>/* Reverse polish notation calculator. */%{#define YYSTYPE double#include &#60;math.h&#62;%}%token NUM%% /* Grammar rules and actions follow */</PRE><P>The C declarations section (see section <A HREF="bison_6.html#SEC36">The C Declarations Section</A>) contains twopreprocessor directives.</P><P>The <CODE>#define</CODE> directive defines the macro <CODE>YYSTYPE</CODE>, thusspecifying the C data type for semantic values of both tokens and groupings(see section <A HREF="bison_6.html#SEC44">Data Types of Semantic Values</A>).  The Bison parser will use whatever type<CODE>YYSTYPE</CODE> is defined as; if you don't define it, <CODE>int</CODE> is thedefault.  Because we specify <CODE>double</CODE>, each token and each expressionhas an associated value, which is a floating point number.</P><P>The <CODE>#include</CODE> directive is used to declare the exponentiationfunction <CODE>pow</CODE>.</P><P>The second section, Bison declarations, provides information to Bison aboutthe token types (see section <A HREF="bison_6.html#SEC37">The Bison Declarations Section</A>).  Each terminal symbol that isnot a single-character literal must be declared here.  (Single-characterliterals normally don't need to be declared.)  In this example, all thearithmetic operators are designated by single-character literals, so theonly terminal symbol that needs to be declared is <CODE>NUM</CODE>, the tokentype for numeric constants.</P><H3><A NAME="SEC18" HREF="index.html#SEC18">Grammar Rules for <CODE>rpcalc</CODE></A></H3><P>Here are the grammar rules for the reverse polish notation calculator.</P><PRE>input:    /* empty */        | input line;line:     '\n'        | exp '\n'  { printf ("\t%.10g\n", $1); };exp:      NUM             { $$ = $1;         }        | exp exp '+'     { $$ = $1 + $2;    }        | exp exp '-'     { $$ = $1 - $2;    }        | exp exp '*'     { $$ = $1 * $2;    }        | exp exp '/'     { $$ = $1 / $2;    }      /* Exponentiation */        | exp exp '^'     { $$ = pow ($1, $2); }      /* Unary minus    */        | exp 'n'         { $$ = -$1;        };%%</PRE><P>The groupings of the rpcalc "language" defined here are the expression(given the name <CODE>exp</CODE>), the line of input (<CODE>line</CODE>), and thecomplete input transcript (<CODE>input</CODE>).  Each of these nonterminalsymbols has several alternate rules, joined by the <SAMP>`|'</SAMP> punctuatorwhich is read as "or".  The following sections explain what these rulesmean.</P><P>The semantics of the language is determined by the actions taken when agrouping is recognized.  The actions are the C code that appears insidebraces.  See section <A HREF="bison_6.html#SEC46">Actions</A>.</P><P>You must specify these actions in C, but Bison provides the means forpassing semantic values between the rules.  In each action, thepseudo-variable <CODE>$$</CODE> stands for the semantic value for the groupingthat the rule is going to construct.  Assigning a value to <CODE>$$</CODE> is themain job of most actions.  The semantic values of the components of therule are referred to as <CODE>$1</CODE>, <CODE>$2</CODE>, and so on.</P><H4><A NAME="SEC19" HREF="index.html#SEC19">Explanation of <CODE>input</CODE></A></H4><P>Consider the definition of <CODE>input</CODE>:</P><PRE>input:    /* empty */        | input line;</PRE><P>This definition reads as follows: "A complete input is either an emptystring, or a complete input followed by an input line".  Notice that"complete input" is defined in terms of itself.  This definition is saidto be <STRONG>left recursive</STRONG> since <CODE>input</CODE> appears always as theleftmost symbol in the sequence.  See section <A HREF="bison_6.html#SEC42">Recursive Rules</A>.</P><P>The first alternative is empty because there are no symbols between thecolon and the first <SAMP>`|'</SAMP>; this means that <CODE>input</CODE> can match anempty string of input (no tokens).  We write the rules this way because itis legitimate to type <KBD>Ctrl-d</KBD> right after you start the calculator.It's conventional to put an empty alternative first and write the comment<SAMP>`/* empty */'</SAMP> in it.</P><P>The second alternate rule (<CODE>input line</CODE>) handles all nontrivial input.It means, "After reading any number of lines, read one more line ifpossible."  The left recursion makes this rule into a loop.  Since thefirst alternative matches empty input, the loop can be executed zero ormore times.</P><P>The parser function <CODE>yyparse</CODE> continues to process input until agrammatical error is seen or the lexical analyzer says there are no moreinput tokens; we will arrange for the latter to happen at end of file.</P><H4><A NAME="SEC20" HREF="index.html#SEC20">Explanation of <CODE>line</CODE></A></H4><P>Now consider the definition of <CODE>line</CODE>:</P><PRE>line:     '\n'        | exp '\n'  { printf ("\t%.10g\n", $1); };</PRE><P>The first alternative is a token which is a newline character; this meansthat rpcalc accepts a blank line (and ignores it, since there is noaction).  The second alternative is an expression followed by a newline.This is the alternative that makes rpcalc useful.  The semantic value ofthe <CODE>exp</CODE> grouping is the value of <CODE>$1</CODE> because the <CODE>exp</CODE> inquestion is the first symbol in the alternative.  The action prints thisvalue, which is the result of the computation the user asked for.</P><P>This action is unusual because it does not assign a value to <CODE>$$</CODE>.  Asa consequence, the semantic value associated with the <CODE>line</CODE> isuninitialized (its value will be unpredictable).  This would be a bug ifthat value were ever used, but we don't use it: once rpcalc has printed thevalue of the user's input line, that value is no longer needed.</P><H4><A NAME="SEC21" HREF="index.html#SEC21">Explanation of <CODE>expr</CODE></A></H4><P>The <CODE>exp</CODE> grouping has several rules, one for each kind of expression.The first rule handles the simplest expressions: those that are just numbers.The second handles an addition-expression, which looks like two expressionsfollowed by a plus-sign.  The third handles subtraction, and so on.</P><PRE>exp:      NUM        | exp exp '+'     { $$ = $1 + $2;    }        | exp exp '-'     { $$ = $1 - $2;    }        ...        ;</PRE><P>We have used <SAMP>`|'</SAMP> to join all the rules for <CODE>exp</CODE>, but we couldequally well have written them separately:</P><PRE>exp:      NUM ;exp:      exp exp '+'     { $$ = $1 + $2;    } ;exp:      exp exp '-'     { $$ = $1 - $2;    } ;        ...</PRE><P>Most of the rules have actions that compute the value of the expression interms of the value of its parts.  For example, in the rule for addition,<CODE>$1</CODE> refers to the first component <CODE>exp</CODE> and <CODE>$2</CODE> refers tothe second one.  The third component, <CODE>'+'</CODE>, has no meaningfulassociated semantic value, but if it had one you could refer to it as<CODE>$3</CODE>.  When <CODE>yyparse</CODE> recognizes a sum expression using thisrule, the sum of the two subexpressions' values is produced as the value ofthe entire expression.  See section <A HREF="bison_6.html#SEC46">Actions</A>.</P><P>You don't have to give an action for every rule.  When a rule has noaction, Bison by default copies the value of <CODE>$1</CODE> into <CODE>$$</CODE>.This is what happens in the first rule (the one that uses <CODE>NUM</CODE>).</P><P>The formatting shown here is the recommended convention, but Bison doesnot require it.  You can add or change whitespace as much as you wish.For example, this:</P><PRE>exp   : NUM | exp exp '+' {$$ = $1 + $2; } | ...</PRE><P>means the same thing as this:</P><PRE>exp:      NUM        | exp exp '+'    { $$ = $1 + $2; }        | ...</PRE><P>The latter, however, is much more readable.</P><H3><A NAME="SEC22" HREF="index.html#SEC22">The <CODE>rpcalc</CODE> Lexical Analyzer</A></H3><P><A NAME="IDX34"></A><A NAME="IDX35"></A></P><P>The lexical analyzer's job is low-level parsing: converting characters orsequences of characters into tokens.  The Bison parser gets its tokens bycalling the lexical analyzer.  See section <A HREF="bison_7.html#SEC61">The Lexical Analyzer Function <CODE>yylex</CODE></A>.</P><P>Only a simple lexical analyzer is needed for the RPN calculator.  Thislexical analyzer skips blanks and tabs, then reads in numbers as<CODE>double</CODE> and returns them as <CODE>NUM</CODE> tokens.  Any other characterthat isn't part of a number is a separate token.  Note that the token-codefor such a single-character token is the character itself.</P><P>The return value of the lexical analyzer function is a numeric code whichrepresents a token type.  The same text used in Bison rules to stand forthis token type is also a C expression for the numeric code for the type.This works in two ways.  If the token type is a character literal, then itsnumeric code is the ASCII code for that character; you can use the samecharacter literal in the lexical analyzer to express the number.  If thetoken type is an identifier, that identifier is defined by Bison as a Cmacro whose definition is the appropriate number.  In this example,therefore, <CODE>NUM</CODE> becomes a macro for <CODE>yylex</CODE> to use.</P><P>The semantic value of the token (if it has one) is stored into the globalvariable <CODE>yylval</CODE>, which is where the Bison parser will look for it.(The C data type of <CODE>yylval</CODE> is <CODE>YYSTYPE</CODE>, which was definedat the beginning of the grammar; see section <A HREF="bison_5.html#SEC17">Declarations for <CODE>rpcalc</CODE></A>.)</P><P>A token type code of zero is returned if the end-of-file is encountered.(Bison recognizes any nonpositive value as indicating the end of theinput.)</P><P>Here is the code for the lexical analyzer:</P><PRE>/* Lexical analyzer returns a double floating point    number on the stack and the token NUM, or the ASCII
12 3 下一页
💿 文件大小 72 K
👤 上传用户 yuyx2003
📂 所属分类文章/文档
🏷️ 相关标签

#Manual #Yacc #Lex
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -