📄 bison.texinfo

📁 GNU的词法/语法分析器bison源码
💻 TEXINFO
📖 第 1 页 / 共 5 页
字号:
you may use it to develop a wide range of language parsers, from thoseused in simple desk calculators to complex programming languages.Bison is upward compatible with Yacc: all properly-written Yacc grammarsought to work with Bison with no change.  Anyone familiar with Yaccshould be able to use Bison with little trouble.  You need to be fluent inC programming in order to use Bison or to understand this manual.We begin with tutorial chapters that explain the basic concepts of usingBison and show three explained examples, each building on the last.  If youdon't know Bison or Yacc, start by reading these chapters.  Referencechapters follow which describe specific aspects of Bison in detail.Bison was written primarily by Robert Corbett; Richard Stallman made itYacc-compatible.  Wilfred Hansen of Carnegie Mellon University addedmulti-character string literals and other features.This edition corresponds to version @value{VERSION} of Bison.@node Conditions@unnumbered Conditions for Using BisonAs of Bison version 1.24, we have changed the distribution terms for@code{yyparse} to permit using Bison's output in nonfree programs whenBison is generating C code for @acronym{LALR}(1) parsers.  Formerly, theseparsers could be used only in programs that were free software.The other @acronym{GNU} programming tools, such as the @acronym{GNU} Ccompiler, have neverhad such a requirement.  They could always be used for nonfreesoftware.  The reason Bison was different was not due to a specialpolicy decision; it resulted from applying the usual General PublicLicense to all of the Bison source code.The output of the Bison utility---the Bison parser file---contains averbatim copy of a sizable piece of Bison, which is the code for the@code{yyparse} function.  (The actions from your grammar are insertedinto this function at one point, but the rest of the function is notchanged.)  When we applied the @acronym{GPL} terms to the code for@code{yyparse},the effect was to restrict the use of Bison output to free software.We didn't change the terms because of sympathy for people who want tomake software proprietary.  @strong{Software should be free.}  But weconcluded that limiting Bison's use to free software was doing little toencourage people to make other software free.  So we decided to make thepractical conditions for using Bison match the practical conditions forusing the other @acronym{GNU} tools.This exception applies only when Bison is generating C code for an@acronym{LALR}(1) parser; otherwise, the @acronym{GPL} terms operateas usual.  You cantell whether the exception applies to your @samp{.c} output file byinspecting it to see whether it says ``As a special exception, whenthis file is copied by Bison into a Bison output file, you may usethat output file without restriction.''@include gpl.texi@node Concepts@chapter The Concepts of BisonThis chapter introduces many of the basic concepts without which thedetails of Bison will not make sense.  If you do not already know how touse Bison or Yacc, we suggest you start by reading this chapter carefully.@menu* Language and Grammar::  Languages and context-free grammars,                            as mathematical ideas.* Grammar in Bison::  How we represent grammars for Bison's sake.* Semantic Values::   Each token or syntactic grouping can have                        a semantic value (the value of an integer,                        the name of an identifier, etc.).* Semantic Actions::  Each rule can have an action containing C code.* GLR Parsers::       Writing parsers for general context-free languages.* Locations Overview::    Tracking Locations.* Bison Parser::      What are Bison's input and output,                        how is the output used?* Stages::            Stages in writing and running Bison grammars.* Grammar Layout::    Overall structure of a Bison grammar file.@end menu@node Language and Grammar@section Languages and Context-Free Grammars@cindex context-free grammar@cindex grammar, context-freeIn order for Bison to parse a language, it must be described by a@dfn{context-free grammar}.  This means that you specify one or more@dfn{syntactic groupings} and give rules for constructing them from theirparts.  For example, in the C language, one kind of grouping is called an`expression'.  One rule for making an expression might be, ``An expressioncan be made of a minus sign and another expression''.  Another would be,``An expression can be an integer''.  As you can see, rules are oftenrecursive, but there must be at least one rule which leads out of therecursion.@cindex @acronym{BNF}@cindex Backus-Naur formThe most common formal system for presenting such rules for humans to readis @dfn{Backus-Naur Form} or ``@acronym{BNF}'', which was developed inorder to specify the language Algol 60.  Any grammar expressed in@acronym{BNF} is a context-free grammar.  The input to Bison isessentially machine-readable @acronym{BNF}.@cindex @acronym{LALR}(1) grammars@cindex @acronym{LR}(1) grammarsThere are various important subclasses of context-free grammar.  Although itcan handle almost all context-free grammars, Bison is optimized for whatare called @acronym{LALR}(1) grammars.In brief, in these grammars, it must be possible totell how to parse any portion of an input string with just a singletoken of look-ahead.  Strictly speaking, that is a description of an@acronym{LR}(1) grammar, and @acronym{LALR}(1) involves additionalrestrictions that arehard to explain simply; but it is rare in actual practice to find an@acronym{LR}(1) grammar that fails to be @acronym{LALR}(1).@xref{Mystery Conflicts, ,Mysterious Reduce/Reduce Conflicts}, formore information on this.@cindex @acronym{GLR} parsing@cindex generalized @acronym{LR} (@acronym{GLR}) parsing@cindex ambiguous grammars@cindex non-deterministic parsingParsers for @acronym{LALR}(1) grammars are @dfn{deterministic}, meaningroughly that the next grammar rule to apply at any point in the input isuniquely determined by the preceding input and a fixed, finite portion(called a @dfn{look-ahead}) of the remaining input.  A context-freegrammar can be @dfn{ambiguous}, meaning that there are multiple ways toapply the grammar rules to get the same inputs.  Even unambiguousgrammars can be @dfn{non-deterministic}, meaning that no fixedlook-ahead always suffices to determine the next grammar rule to apply.With the proper declarations, Bison is also able to parse these moregeneral context-free grammars, using a technique known as @acronym{GLR}parsing (for Generalized @acronym{LR}).  Bison's @acronym{GLR} parsersare able to handle any context-free grammar for which the number ofpossible parses of any given string is finite.@cindex symbols (abstract)@cindex token@cindex syntactic grouping@cindex grouping, syntacticIn the formal grammatical rules for a language, each kind of syntacticunit or grouping is named by a @dfn{symbol}.  Those which are built bygrouping smaller constructs according to grammatical rules are called@dfn{nonterminal symbols}; those which can't be subdivided are called@dfn{terminal symbols} or @dfn{token types}.  We call a piece of inputcorresponding to a single terminal symbol a @dfn{token}, and a piececorresponding to a single nonterminal symbol a @dfn{grouping}.We can use the C language as an example of what symbols, terminal andnonterminal, mean.  The tokens of C are identifiers, constants (numericand string), and the various keywords, arithmetic operators andpunctuation marks.  So the terminal symbols of a grammar for C include`identifier', `number', `string', plus one symbol for each keyword,operator or punctuation mark: `if', `return', `const', `static', `int',`char', `plus-sign', `open-brace', `close-brace', `comma' and many more.(These tokens can be subdivided into characters, but that is a matter oflexicography, not grammar.)Here is a simple C function subdivided into tokens:@ifinfo@exampleint             /* @r{keyword `int'} */square (int x)  /* @r{identifier, open-paren, keyword `int',}                   @r{identifier, close-paren} */@{               /* @r{open-brace} */  return x * x; /* @r{keyword `return', identifier, asterisk,                   identifier, semicolon} */@}               /* @r{close-brace} */@end example@end ifinfo@ifnotinfo@exampleint             /* @r{keyword `int'} */square (int x)  /* @r{identifier, open-paren, keyword `int', identifier, close-paren} */@{               /* @r{open-brace} */  return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */@}               /* @r{close-brace} */@end example@end ifnotinfoThe syntactic groupings of C include the expression, the statement, thedeclaration, and the function definition.  These are represented in thegrammar of C by nonterminal symbols `expression', `statement',`declaration' and `function definition'.  The full grammar uses dozens ofadditional language constructs, each with its own nonterminal symbol, inorder to express the meanings of these four.  The example above is afunction definition; it contains one declaration, and one statement.  Inthe statement, each @samp{x} is an expression and so is @samp{x * x}.Each nonterminal symbol must have grammatical rules showing how it is madeout of simpler constructs.  For example, one kind of C statement is the@code{return} statement; this would be described with a grammar rule whichreads informally as follows:@quotationA `statement' can be made of a `return' keyword, an `expression' and a`semicolon'.@end quotation@noindentThere would be many other rules for `statement', one for each kind ofstatement in C.@cindex start symbolOne nonterminal symbol must be distinguished as the special one whichdefines a complete utterance in the language.  It is called the @dfn{startsymbol}.  In a compiler, this means a complete input program.  In the Clanguage, the nonterminal symbol `sequence of definitions and declarations'plays this role.For example, @samp{1 + 2} is a valid C expression---a valid part of a Cprogram---but it is not valid as an @emph{entire} C program.  In thecontext-free grammar of C, this follows from the fact that `expression' isnot the start symbol.The Bison parser reads a sequence of tokens as its input, and groups thetokens using the grammar rules.  If the input is valid, the end result isthat the entire token sequence reduces to a single grouping whose symbol isthe grammar's start symbol.  If we use a grammar for C, the entire inputmust be a `sequence of definitions and declarations'.  If not, the parserreports a syntax error.@node Grammar in Bison@section From Formal Rules to Bison Input@cindex Bison grammar@cindex grammar, Bison@cindex formal grammarA formal grammar is a mathematical construct.  To define the languagefor Bison, you must write a file expressing the grammar in Bison syntax:a @dfn{Bison grammar} file.  @xref{Grammar File, ,Bison Grammar Files}.A nonterminal symbol in the formal grammar is represented in Bison inputas an identifier, like an identifier in C@.  By convention, it should bein lower case, such as @code{expr}, @code{stmt} or @code{declaration}.The Bison representation for a terminal symbol is also called a @dfn{tokentype}.  Token types as well can be represented as C-like identifiers.  Byconvention, these identifiers should be upper case to distinguish them fromnonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or@code{RETURN}.  A terminal symbol that stands for a particular keyword inthe language should be named after that keyword converted to upper case.The terminal symbol @code{error} is reserved for error recovery.@xref{Symbols}.A terminal symbol can also be represented as a character literal, just likea C character constant.  You should do this whenever a token is just asingle character (parenthesis, plus-sign, etc.): use that same character ina literal as the terminal symbol for that token.A third way to represent a terminal symbol is with a C string constantcontaining several characters.  @xref{Symbols}, for more information.The grammar rules also have an expression in Bison syntax.  For example,here is the Bison rule for a C @code{return} statement.  The semicolon inquotes is a literal character token, representing part of the C syntax forthe statement; the naked semicolon, and the colon, are Bison punctuationused in every rule.@examplestmt:   RETURN expr ';'        ;@end example@noindent@xref{Rules, ,Syntax of Grammar Rules}.@node Semantic Values@section Semantic Values@cindex semantic value@cindex value, semanticA formal grammar selects tokens only by their classifications: for example,if a rule mentions the terminal symbol `integer constant', it means that@emph{any} integer constant is grammatically valid in that position.  Theprecise value of the constant is irrelevant to how to parse the input: if@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equallygrammatical.But the precise value is very important for what the input means once it isparsed.  A compiler is useless if it fails to distinguish between 4, 1 and3989 as constants in the program!  Therefore, each token in a Bison grammarhas both a token type and a @dfn{semantic value}.  @xref{Semantics,,Defining Language Semantics},for details.The token type is a terminal symbol defined in the grammar, such as@code{INTEGER}, @code{IDENTIFIER} or @code{','}.  It tells everythingyou need to know to decide where the token may validly appear and how togroup it with other tokens.  The grammar rules know nothing about tokensexcept their types.The semantic value has all the rest of the information about themeaning of the token, such as the value of an integer, or the name of anidentifier.  (A token such as @code{','} which is just punctuation doesn'tneed to have any semantic value.)For example, an input token might be classified as token type@code{INTEGER} and have the semantic value 4.  Another input token mighthave the same token type @code{INTEGER} but value 3989.  When a grammarrule says that @code{INTEGER} is allowed, either of these tokens isacceptable because each is an @code{INTEGER}.  When the parser accepts thetoken, it keeps track of the token's semantic value.Each grouping can also have a semantic value as well as its nonterminalsymbol.  For example, in a calculator, an expression typically has asemantic value that is a number.  In a compiler for a programminglanguage, an expression typically has a semantic value that is a treestructure describing the meaning of the expression.@node Semantic Actions@section Semantic Actions@cindex semantic actions@cindex actions, semanticIn order to be useful, a program must do more than parse input; it mustalso produce some output based on the input.  In a Bison grammar, a grammarrule can have an @dfn{action} made up of C statements.  Each time theparser recognizes a match for that rule, the action is executed.@xref{Actions}.Most of the time, the purpose of an action is to compute the semantic valueof the whole construct from the semantic values of its parts.  For example,suppose we have a rule which says an expression can be the sum of twoexpressions.  When the parser recognizes such a sum, each of thesubexpressions has a semantic value which describes how it was built up.The action for this rule should create a similar sort of value for thenewly recognized larger expression.For example, here is a rule that says an expression can be the sum oftwo subexpressions:@exampleexpr: expr '+' expr   @{ $$ = $1 + $3; @}        ;
💿 文件大小 1286 K
👤 上传用户 my
📂 所属分类编译器/解释器
🏷️ 相关标签

#bison #GNU #语法分析器 #源码
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -