📄 bison.texinfo

📁 GNU的词法/语法分析器bison源码
💻 TEXINFO
📖 第 1 页 / 共 5 页
字号:
上一页 1 2 3 45
Here are the C and Bison declarations for the reverse polish notationcalculator.  As in C, comments are placed between @samp{/*@dots{}*/}.@example/* Reverse polish notation calculator.  */%@{  #define YYSTYPE double  #include <math.h>  int yylex (void);  void yyerror (char const *);%@}%token NUM%% /* Grammar rules and actions follow.  */@end exampleThe declarations section (@pxref{Prologue, , The prologue}) contains twopreprocessor directives and two forward declarations.The @code{#define} directive defines the macro @code{YYSTYPE}, thusspecifying the C data type for semantic values of both tokens andgroupings (@pxref{Value Type, ,Data Types of Semantic Values}).  TheBison parser will use whatever type @code{YYSTYPE} is defined as; if youdon't define it, @code{int} is the default.  Because we specify@code{double}, each token and each expression has an associated value,which is a floating point number.The @code{#include} directive is used to declare the exponentiationfunction @code{pow}.The forward declarations for @code{yylex} and @code{yyerror} areneeded because the C language requires that functions be declaredbefore they are used.  These functions will be defined in theepilogue, but the parser calls them so they must be declared in theprologue.The second section, Bison declarations, provides information to Bisonabout the token types (@pxref{Bison Declarations, ,The BisonDeclarations Section}).  Each terminal symbol that is not asingle-character literal must be declared here.  (Single-characterliterals normally don't need to be declared.)  In this example, all thearithmetic operators are designated by single-character literals, so theonly terminal symbol that needs to be declared is @code{NUM}, the tokentype for numeric constants.@node Rpcalc Rules@subsection Grammar Rules for @code{rpcalc}Here are the grammar rules for the reverse polish notation calculator.@exampleinput:    /* empty */        | input line;line:     '\n'        | exp '\n'      @{ printf ("\t%.10g\n", $1); @};exp:      NUM           @{ $$ = $1;           @}        | exp exp '+'   @{ $$ = $1 + $2;      @}        | exp exp '-'   @{ $$ = $1 - $2;      @}        | exp exp '*'   @{ $$ = $1 * $2;      @}        | exp exp '/'   @{ $$ = $1 / $2;      @}         /* Exponentiation */        | exp exp '^'   @{ $$ = pow ($1, $2); @}         /* Unary minus    */        | exp 'n'       @{ $$ = -$1;          @};%%@end exampleThe groupings of the rpcalc ``language'' defined here are the expression(given the name @code{exp}), the line of input (@code{line}), and thecomplete input transcript (@code{input}).  Each of these nonterminalsymbols has several alternate rules, joined by the @samp{|} punctuatorwhich is read as ``or''.  The following sections explain what these rulesmean.The semantics of the language is determined by the actions taken when agrouping is recognized.  The actions are the C code that appears insidebraces.  @xref{Actions}.You must specify these actions in C, but Bison provides the means forpassing semantic values between the rules.  In each action, thepseudo-variable @code{$$} stands for the semantic value for the groupingthat the rule is going to construct.  Assigning a value to @code{$$} is themain job of most actions.  The semantic values of the components of therule are referred to as @code{$1}, @code{$2}, and so on.@menu* Rpcalc Input::* Rpcalc Line::* Rpcalc Expr::@end menu@node Rpcalc Input@subsubsection Explanation of @code{input}Consider the definition of @code{input}:@exampleinput:    /* empty */        | input line;@end exampleThis definition reads as follows: ``A complete input is either an emptystring, or a complete input followed by an input line''.  Notice that``complete input'' is defined in terms of itself.  This definition is saidto be @dfn{left recursive} since @code{input} appears always as theleftmost symbol in the sequence.  @xref{Recursion, ,Recursive Rules}.The first alternative is empty because there are no symbols between thecolon and the first @samp{|}; this means that @code{input} can match anempty string of input (no tokens).  We write the rules this way because itis legitimate to type @kbd{Ctrl-d} right after you start the calculator.It's conventional to put an empty alternative first and write the comment@samp{/* empty */} in it.The second alternate rule (@code{input line}) handles all nontrivial input.It means, ``After reading any number of lines, read one more line ifpossible.''  The left recursion makes this rule into a loop.  Since thefirst alternative matches empty input, the loop can be executed zero ormore times.The parser function @code{yyparse} continues to process input until agrammatical error is seen or the lexical analyzer says there are no moreinput tokens; we will arrange for the latter to happen at end-of-input.@node Rpcalc Line@subsubsection Explanation of @code{line}Now consider the definition of @code{line}:@exampleline:     '\n'        | exp '\n'  @{ printf ("\t%.10g\n", $1); @};@end exampleThe first alternative is a token which is a newline character; this meansthat rpcalc accepts a blank line (and ignores it, since there is noaction).  The second alternative is an expression followed by a newline.This is the alternative that makes rpcalc useful.  The semantic value ofthe @code{exp} grouping is the value of @code{$1} because the @code{exp} inquestion is the first symbol in the alternative.  The action prints thisvalue, which is the result of the computation the user asked for.This action is unusual because it does not assign a value to @code{$$}.  Asa consequence, the semantic value associated with the @code{line} isuninitialized (its value will be unpredictable).  This would be a bug ifthat value were ever used, but we don't use it: once rpcalc has printed thevalue of the user's input line, that value is no longer needed.@node Rpcalc Expr@subsubsection Explanation of @code{expr}The @code{exp} grouping has several rules, one for each kind of expression.The first rule handles the simplest expressions: those that are just numbers.The second handles an addition-expression, which looks like two expressionsfollowed by a plus-sign.  The third handles subtraction, and so on.@exampleexp:      NUM        | exp exp '+'     @{ $$ = $1 + $2;    @}        | exp exp '-'     @{ $$ = $1 - $2;    @}        @dots{}        ;@end exampleWe have used @samp{|} to join all the rules for @code{exp}, but we couldequally well have written them separately:@exampleexp:      NUM ;exp:      exp exp '+'     @{ $$ = $1 + $2;    @} ;exp:      exp exp '-'     @{ $$ = $1 - $2;    @} ;        @dots{}@end exampleMost of the rules have actions that compute the value of the expression interms of the value of its parts.  For example, in the rule for addition,@code{$1} refers to the first component @code{exp} and @code{$2} refers tothe second one.  The third component, @code{'+'}, has no meaningfulassociated semantic value, but if it had one you could refer to it as@code{$3}.  When @code{yyparse} recognizes a sum expression using thisrule, the sum of the two subexpressions' values is produced as the value ofthe entire expression.  @xref{Actions}.You don't have to give an action for every rule.  When a rule has noaction, Bison by default copies the value of @code{$1} into @code{$$}.This is what happens in the first rule (the one that uses @code{NUM}).The formatting shown here is the recommended convention, but Bison doesnot require it.  You can add or change white space as much as you wish.For example, this:@exampleexp   : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;@end example@noindentmeans the same thing as this:@exampleexp:      NUM        | exp exp '+'    @{ $$ = $1 + $2; @}        | @dots{};@end example@noindentThe latter, however, is much more readable.@node Rpcalc Lexer@subsection The @code{rpcalc} Lexical Analyzer@cindex writing a lexical analyzer@cindex lexical analyzer, writingThe lexical analyzer's job is low-level parsing: converting charactersor sequences of characters into tokens.  The Bison parser gets itstokens by calling the lexical analyzer.  @xref{Lexical, ,The LexicalAnalyzer Function @code{yylex}}.Only a simple lexical analyzer is needed for the @acronym{RPN}calculator.  Thislexical analyzer skips blanks and tabs, then reads in numbers as@code{double} and returns them as @code{NUM} tokens.  Any other characterthat isn't part of a number is a separate token.  Note that the token-codefor such a single-character token is the character itself.The return value of the lexical analyzer function is a numeric code whichrepresents a token type.  The same text used in Bison rules to stand forthis token type is also a C expression for the numeric code for the type.This works in two ways.  If the token type is a character literal, then itsnumeric code is that of the character; you can use the samecharacter literal in the lexical analyzer to express the number.  If thetoken type is an identifier, that identifier is defined by Bison as a Cmacro whose definition is the appropriate number.  In this example,therefore, @code{NUM} becomes a macro for @code{yylex} to use.The semantic value of the token (if it has one) is stored into theglobal variable @code{yylval}, which is where the Bison parser will lookfor it.  (The C data type of @code{yylval} is @code{YYSTYPE}, which wasdefined at the beginning of the grammar; @pxref{Rpcalc Decls,,Declarations for @code{rpcalc}}.)A token type code of zero is returned if the end-of-input is encountered.(Bison recognizes any nonpositive value as indicating end-of-input.)Here is the code for the lexical analyzer:@example@group/* The lexical analyzer returns a double floating point   number on the stack and the token NUM, or the numeric code   of the character read if not a number.  It skips all blanks   and tabs, and returns 0 for end-of-input.  */#include <ctype.h>@end group@groupintyylex (void)@{  int c;  /* Skip white space.  */  while ((c = getchar ()) == ' ' || c == '\t')    ;@end group@group  /* Process numbers.  */  if (c == '.' || isdigit (c))    @{      ungetc (c, stdin);      scanf ("%lf", &yylval);      return NUM;    @}@end group@group  /* Return end-of-input.  */  if (c == EOF)    return 0;  /* Return a single char.  */  return c;@}@end group@end example@node Rpcalc Main@subsection The Controlling Function@cindex controlling function@cindex main function in simple exampleIn keeping with the spirit of this example, the controlling function iskept to the bare minimum.  The only requirement is that it call@code{yyparse} to start the process of parsing.@example@groupintmain (void)@{  return yyparse ();@}@end group@end example@node Rpcalc Error@subsection The Error Reporting Routine@cindex error reporting routineWhen @code{yyparse} detects a syntax error, it calls the error reportingfunction @code{yyerror} to print an error message (usually but notalways @code{"syntax error"}).  It is up to the programmer to supply@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), sohere is the definition we will use:@example@group#include <stdio.h>/* Called by yyparse on error.  */voidyyerror (char const *s)@{  fprintf (stderr, "%s\n", s);@}@end group@end exampleAfter @code{yyerror} returns, the Bison parser may recover from the errorand
上一页 1 2 3 45
💿 文件大小 1286 K
👤 上传用户 my
📂 所属分类编译器/解释器
🏷️ 相关标签

#bison #GNU #语法分析器 #源码
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -