📄 bison_5.htm
字号:
<HTML><HEAD><!-- This HTML file has been created by texi2html 1.44 from /opt/src/gnu/bison-1.25/bison.texinfo on 30 June 1997 --><TITLE>Bison 1.25 - Examples</TITLE></HEAD><BODY>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_4.html">previous</A>, <A HREF="bison_6.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.<HR><H1><A NAME="SEC15" HREF="index.html#SEC15">Examples</A></H1><P><A NAME="IDX28"></A><A NAME="IDX29"></A></P><P>Now we show and explain three sample programs written using Bison: areverse polish notation calculator, an algebraic (infix) notationcalculator, and a multi-function calculator. All three have been testedunder BSD Unix 4.3; each produces a usable, though limited, interactivedesk-top calculator.</P><P>These examples are simple, but Bison grammars for real programminglanguages are written the same way.</P><H2><A NAME="SEC16" HREF="index.html#SEC16">Reverse Polish Notation Calculator</A></H2><P><A NAME="IDX30"></A><A NAME="IDX31"></A><A NAME="IDX32"></A><A NAME="IDX33"></A></P><P>The first example is that of a simple double-precision <STRONG>reverse polishnotation</STRONG> calculator (a calculator using postfix operators). This exampleprovides a good starting point, since operator precedence is not an issue.The second example will illustrate how operator precedence is handled.</P><P>The source code for this calculator is named <TT>`rpcalc.y'</TT>. The<SAMP>`.y'</SAMP> extension is a convention used for Bison input files.</P><H3><A NAME="SEC17" HREF="index.html#SEC17">Declarations for <CODE>rpcalc</CODE></A></H3><P>Here are the C and Bison declarations for the reverse polish notationcalculator. As in C, comments are placed between <SAMP>`/*...*/'</SAMP>.</P><PRE>/* Reverse polish notation calculator. */%{#define YYSTYPE double#include <math.h>%}%token NUM%% /* Grammar rules and actions follow */</PRE><P>The C declarations section (see section <A HREF="bison_6.html#SEC36">The C Declarations Section</A>) contains twopreprocessor directives.</P><P>The <CODE>#define</CODE> directive defines the macro <CODE>YYSTYPE</CODE>, thusspecifying the C data type for semantic values of both tokens and groupings(see section <A HREF="bison_6.html#SEC44">Data Types of Semantic Values</A>). The Bison parser will use whatever type<CODE>YYSTYPE</CODE> is defined as; if you don't define it, <CODE>int</CODE> is thedefault. Because we specify <CODE>double</CODE>, each token and each expressionhas an associated value, which is a floating point number.</P><P>The <CODE>#include</CODE> directive is used to declare the exponentiationfunction <CODE>pow</CODE>.</P><P>The second section, Bison declarations, provides information to Bison aboutthe token types (see section <A HREF="bison_6.html#SEC37">The Bison Declarations Section</A>). Each terminal symbol that isnot a single-character literal must be declared here. (Single-characterliterals normally don't need to be declared.) In this example, all thearithmetic operators are designated by single-character literals, so theonly terminal symbol that needs to be declared is <CODE>NUM</CODE>, the tokentype for numeric constants.</P><H3><A NAME="SEC18" HREF="index.html#SEC18">Grammar Rules for <CODE>rpcalc</CODE></A></H3><P>Here are the grammar rules for the reverse polish notation calculator.</P><PRE>input: /* empty */ | input line;line: '\n' | exp '\n' { printf ("\t%.10g\n", $1); };exp: NUM { $$ = $1; } | exp exp '+' { $$ = $1 + $2; } | exp exp '-' { $$ = $1 - $2; } | exp exp '*' { $$ = $1 * $2; } | exp exp '/' { $$ = $1 / $2; } /* Exponentiation */ | exp exp '^' { $$ = pow ($1, $2); } /* Unary minus */ | exp 'n' { $$ = -$1; };%%</PRE><P>The groupings of the rpcalc "language" defined here are the expression(given the name <CODE>exp</CODE>), the line of input (<CODE>line</CODE>), and thecomplete input transcript (<CODE>input</CODE>). Each of these nonterminalsymbols has several alternate rules, joined by the <SAMP>`|'</SAMP> punctuatorwhich is read as "or". The following sections explain what these rulesmean.</P><P>The semantics of the language is determined by the actions taken when agrouping is recognized. The actions are the C code that appears insidebraces. See section <A HREF="bison_6.html#SEC46">Actions</A>.</P><P>You must specify these actions in C, but Bison provides the means forpassing semantic values between the rules. In each action, thepseudo-variable <CODE>$$</CODE> stands for the semantic value for the groupingthat the rule is going to construct. Assigning a value to <CODE>$$</CODE> is themain job of most actions. The semantic values of the components of therule are referred to as <CODE>$1</CODE>, <CODE>$2</CODE>, and so on.</P><H4><A NAME="SEC19" HREF="index.html#SEC19">Explanation of <CODE>input</CODE></A></H4><P>Consider the definition of <CODE>input</CODE>:</P><PRE>input: /* empty */ | input line;</PRE><P>This definition reads as follows: "A complete input is either an emptystring, or a complete input followed by an input line". Notice that"complete input" is defined in terms of itself. This definition is saidto be <STRONG>left recursive</STRONG> since <CODE>input</CODE> appears always as theleftmost symbol in the sequence. See section <A HREF="bison_6.html#SEC42">Recursive Rules</A>.</P><P>The first alternative is empty because there are no symbols between thecolon and the first <SAMP>`|'</SAMP>; this means that <CODE>input</CODE> can match anempty string of input (no tokens). We write the rules this way because itis legitimate to type <KBD>Ctrl-d</KBD> right after you start the calculator.It's conventional to put an empty alternative first and write the comment<SAMP>`/* empty */'</SAMP> in it.</P><P>The second alternate rule (<CODE>input line</CODE>) handles all nontrivial input.It means, "After reading any number of lines, read one more line ifpossible." The left recursion makes this rule into a loop. Since thefirst alternative matches empty input, the loop can be executed zero ormore times.</P><P>The parser function <CODE>yyparse</CODE> continues to process input until agrammatical error is seen or the lexical analyzer says there are no moreinput tokens; we will arrange for the latter to happen at end of file.</P><H4><A NAME="SEC20" HREF="index.html#SEC20">Explanation of <CODE>line</CODE></A></H4><P>Now consider the definition of <CODE>line</CODE>:</P><PRE>line: '\n' | exp '\n' { printf ("\t%.10g\n", $1); };</PRE><P>The first alternative is a token which is a newline character; this meansthat rpcalc accepts a blank line (and ignores it, since there is noaction). The second alternative is an expression followed by a newline.This is the alternative that makes rpcalc useful. The semantic value ofthe <CODE>exp</CODE> grouping is the value of <CODE>$1</CODE> because the <CODE>exp</CODE> inquestion is the first symbol in the alternative. The action prints thisvalue, which is the result of the computation the user asked for.</P><P>This action is unusual because it does not assign a value to <CODE>$$</CODE>. Asa consequence, the semantic value associated with the <CODE>line</CODE> isuninitialized (its value will be unpredictable). This would be a bug ifthat value were ever used, but we don't use it: once rpcalc has printed thevalue of the user's input line, that value is no longer needed.</P><H4><A NAME="SEC21" HREF="index.html#SEC21">Explanation of <CODE>expr</CODE></A></H4><P>The <CODE>exp</CODE> grouping has several rules, one for each kind of expression.The first rule handles the simplest expressions: those that are just numbers.The second handles an addition-expression, which looks like two expressionsfollowed by a plus-sign. The third handles subtraction, and so on.</P><PRE>exp: NUM | exp exp '+' { $$ = $1 + $2; } | exp exp '-' { $$ = $1 - $2; } ... ;</PRE><P>We have used <SAMP>`|'</SAMP> to join all the rules for <CODE>exp</CODE>, but we couldequally well have written them separately:</P><PRE>exp: NUM ;exp: exp exp '+' { $$ = $1 + $2; } ;exp: exp exp '-' { $$ = $1 - $2; } ; ...</PRE><P>Most of the rules have actions that compute the value of the expression interms of the value of its parts. For example, in the rule for addition,<CODE>$1</CODE> refers to the first component <CODE>exp</CODE> and <CODE>$2</CODE> refers tothe second one. The third component, <CODE>'+'</CODE>, has no meaningfulassociated semantic value, but if it had one you could refer to it as<CODE>$3</CODE>. When <CODE>yyparse</CODE> recognizes a sum expression using thisrule, the sum of the two subexpressions' values is produced as the value ofthe entire expression. See section <A HREF="bison_6.html#SEC46">Actions</A>.</P><P>You don't have to give an action for every rule. When a rule has noaction, Bison by default copies the value of <CODE>$1</CODE> into <CODE>$$</CODE>.This is what happens in the first rule (the one that uses <CODE>NUM</CODE>).</P><P>The formatting shown here is the recommended convention, but Bison doesnot require it. You can add or change whitespace as much as you wish.For example, this:</P><PRE>exp : NUM | exp exp '+' {$$ = $1 + $2; } | ...</PRE><P>means the same thing as this:</P><PRE>exp: NUM | exp exp '+' { $$ = $1 + $2; } | ...</PRE><P>The latter, however, is much more readable.</P><H3><A NAME="SEC22" HREF="index.html#SEC22">The <CODE>rpcalc</CODE> Lexical Analyzer</A></H3><P><A NAME="IDX34"></A><A NAME="IDX35"></A></P><P>The lexical analyzer's job is low-level parsing: converting characters orsequences of characters into tokens. The Bison parser gets its tokens bycalling the lexical analyzer. See section <A HREF="bison_7.html#SEC61">The Lexical Analyzer Function <CODE>yylex</CODE></A>.</P><P>Only a simple lexical analyzer is needed for the RPN calculator. Thislexical analyzer skips blanks and tabs, then reads in numbers as<CODE>double</CODE> and returns them as <CODE>NUM</CODE> tokens. Any other characterthat isn't part of a number is a separate token. Note that the token-codefor such a single-character token is the character itself.</P><P>The return value of the lexical analyzer function is a numeric code whichrepresents a token type. The same text used in Bison rules to stand forthis token type is also a C expression for the numeric code for the type.This works in two ways. If the token type is a character literal, then itsnumeric code is the ASCII code for that character; you can use the samecharacter literal in the lexical analyzer to express the number. If thetoken type is an identifier, that identifier is defined by Bison as a Cmacro whose definition is the appropriate number. In this example,therefore, <CODE>NUM</CODE> becomes a macro for <CODE>yylex</CODE> to use.</P><P>The semantic value of the token (if it has one) is stored into the globalvariable <CODE>yylval</CODE>, which is where the Bison parser will look for it.(The C data type of <CODE>yylval</CODE> is <CODE>YYSTYPE</CODE>, which was definedat the beginning of the grammar; see section <A HREF="bison_5.html#SEC17">Declarations for <CODE>rpcalc</CODE></A>.)</P><P>A token type code of zero is returned if the end-of-file is encountered.(Bison recognizes any nonpositive value as indicating the end of theinput.)</P><P>Here is the code for the lexical analyzer:</P><PRE>/* Lexical analyzer returns a double floating point number on the stack and the token NUM, or the ASCII
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -