⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 yacc-docs.txt

📁 windowns 环境的lex和yacc编译器工具
💻 TXT
📖 第 1 页 / 共 5 页
字号:
http://www.csc.calpoly.edu/~gfisher/450/doc/yacc/paper.txtYacc: Yet Another Compiler-Compiler                      PS1:15-1               Yacc: Yet Another Compiler-Compiler                       Stephen C. Johnson                     AT&T Bell Laboratories                  Murray Hill, New Jersey 07974                            ABSTRACT          Computer program input generally has  some  struc-     ture;  in  fact, every computer program that does input     can be thought of as  defining  an  ``input  language''     which  it accepts.  An input language may be as complex     as a programming language, or as simple as  a  sequence     of  numbers.  Unfortunately, usual input facilities are     limited, difficult to use,  and  often  are  lax  about     checking their inputs for validity.          Yacc provides a general tool  for  describing  the     input  to  a computer program.  The Yacc user specifies     the structures of his input, together with code  to  be     invoked  as  each  such  structure is recognized.  Yacc     turns such a specification into a subroutine that  han-     dles  the  input  process; frequently, it is convenient     and appropriate to have most of the flow of control  in     the user's application handled by this subroutine.          The input subroutine  produced  by  Yacc  calls  a     user-supplied  routine  to  return the next basic input     item.  Thus, the user can specify his input in terms of     individual  input  characters,  or  in  terms of higher     level constructs such as names and numbers.  The  user-     supplied  routine  may  also  handle idiomatic features     such as comment  and  continuation  conventions,  which     typically defy easy grammatical specification.          Yacc is written  in  portable  C.   The  class  of     specifications  accepted is a very general one: LALR(1)     grammars with disambiguating rules.          In addition to compilers for C, APL, Pascal,  RAT-     FOR,  etc.,  Yacc  has  also been used for less conven-     tional languages, including a phototypesetter language,     several desk calculator languages, a document retrieval     system, and a Fortran debugging system.PS1:15-2                      Yacc: Yet Another Compiler-Compiler0: Introduction     Yacc provides a general tool for imposing structure  on  theinput to a computer program.  The Yacc user prepares a specifica-tion of the input process; this  includes  rules  describing  theinput  structure,  code to be invoked when these rules are recog-nized, and a low-level routine to do the basic input.  Yacc  thengenerates  a  function  to control the input process.  This func-tion, called a parser, calls the  user-supplied  low-level  inputroutine (the lexical analyzer) to pick up the basic items (calledtokens) from  the  input  stream.   These  tokens  are  organizedaccording  to  the  input  structure rules, called grammar rules;when one of these rules has been recognized, then user code  sup-plied  for  this  rule,  an  action, is invoked; actions have theability to return values and make use  of  the  values  of  otheractions.     Yacc is written in  a  portable  dialect  of  C[1]  and  theactions, and output subroutine, are in C as well.  Moreover, manyof the syntactic conventions of Yacc follow C.     The heart of the input  specification  is  a  collection  ofgrammar  rules.   Each  rule describes an allowable structure andgives it a name.  For example, one grammar rule might be        date  :  month_name  day  ','  year   ;Here, date, month_name, day, and  year  represent  structures  ofinterest  in  the input process; presumably, month_name, day, andyear are defined elsewhere.  The comma ``,'' is enclosed in  sin-gle quotes; this implies that the comma is to appear literally inthe input.  The colon and semicolon merely serve  as  punctuationin  the  rule, and have no significance in controlling the input.Thus, with proper definitions, the input        July  4, 1776might be matched by the above rule.     An important part of the input process is carried out by thelexical  analyzer.   This  user  routine  reads the input stream,recognizing the lower level structures,  and  communicates  thesetokens to the parser.  For historical reasons, a structure recog-nized by the lexical analyzer is called a terminal symbol,  whilethe  structure  recognized  by the parser is called a nonterminalsymbol.  To avoid confusion, terminal  symbols  will  usually  bereferred to as tokens.     There is considerable leeway in deciding whether  to  recog-nize structures using the lexical analyzer or grammar rules.  Forexample, the rulesYacc: Yet Another Compiler-Compiler                      PS1:15-3        month_name  :  'J' 'a' 'n'   ;        month_name  :  'F' 'e' 'b'   ;                 . . .        month_name  :  'D' 'e' 'c'   ;might be used in the above example.  The lexical  analyzer  wouldonly  need  to recognize individual letters, and month_name wouldbe a nonterminal symbol.  Such low-level rules tend to waste timeand  space,  and  may  complicate the specification beyond Yacc'sability to deal with it.  Usually,  the  lexical  analyzer  wouldrecognize  the  month  names,  and  return  an  indication that amonth_name was seen; in this case, month_name would be a token.     Literal characters such as ``,'' must also be passed throughthe lexical analyzer, and are also considered tokens.     Specification files are very flexible.  It is realively easyto add to the above example the rule        date  :  month '/' day '/' year   ;allowing        7 / 4 / 1776as a synonym for        July 4, 1776In most cases, this new rule could be ``slipped in'' to a workingsystem  with  minimal  effort,  and  little  danger of disruptingexisting input.     The input being read may not conform to the  specifications.These input errors are detected as early as is theoretically pos-sible with a left-to-right scan; thus, not only is the chance  ofreading  and computing with bad input data substantially reduced,but the bad data can usually be quickly found.   Error  handling,provided as part of the input specifications, permits the reentryof bad data, or the continuation of the input process after skip-ping over the bad data.     In some cases, Yacc fails to produce a parser when  given  aset  of  specifications.   For example, the specifications may beself contradictory, or they may require a more powerful  recogni-tion  mechanism  than  that  available to Yacc.  The former casesrepresent design errors; the latter cases can often be  correctedby  making  the  lexical  analyzer more powerful, or by rewritingsome of the grammar rules.  While Yacc cannot handle all possiblespecifications,  its  power  compares favorably with similar sys-tems; moreover, the constructions which are difficult for Yacc toPS1:15-4                      Yacc: Yet Another Compiler-Compilerhandle  are also frequently difficult for human beings to handle.Some users have reported that the discipline of formulating validYacc specifications for their input revealed errors of conceptionor design early in the program development.     The theory underlying Yacc has been described  elsewhere.[2,3, 4] Yacc has been extensively used in numerous practical appli-cations, including lint,[5] the Portable  C  Compiler,[6]  and  asystem for typesetting mathematics.[7]     The next several sections  describe  the  basic  process  ofpreparing  a Yacc specification; Section 1 describes the prepara-tion of grammar rules, Section 2 the preparation of the user sup-plied  actions  associated  with  these  rules, and Section 3 thepreparation of lexical analyzers.  Section 4 describes the opera-tion of the parser.  Section 5 discusses various reasons why Yaccmay be unable to produce a parser from a specification, and  whatto  do about it.  Section 6 describes a simple mechanism for han-dling operator precedences in arithmetic expressions.  Section  7discusses  error detection and recovery.  Section 8 discusses theoperating environment and special features of  the  parsers  Yaccproduces.   Section 9 gives some suggestions which should improvethe style and  efficiency  of  the  specifications.   Section  10discusses some advanced topics, and Section 11 gives acknowledge-ments.  Appendix A has a brief example, and Appendix  B  gives  asummary  of  the  Yacc input syntax.  Appendix C gives an exampleusing some of the more advanced features of Yacc,  and,  finally,Appendix  D  describes  mechanisms  and syntax no longer activelysupported, but provided for historical continuity with older ver-sions of Yacc.1: Basic Specifications     Names refer to either tokens or nonterminal  symbols.   Yaccrequires  token  names  to be declared as such.  In addition, forreasons discussed in Section 3, it is often desirable to  includethe lexical analyzer as part of the specification file; it may beuseful to include other programs as well.  Thus, every specifica-tion file consists of three sections: the declarations, (grammar)rules, and programs.  The sections are separated by  double  per-cent  ``%%'' marks.  (The percent ``%'' is generally used in Yaccspecifications as an escape character.)     In other words, a full specification file looks like        declarations        %%        rules        %%        programs     The declaration section may be empty.  Moreover, if the pro-grams section is omitted, the second %% mark may be omitted also;Yacc: Yet Another Compiler-Compiler                      PS1:15-5thus, the smallest legal Yacc specification is        %%        rules     Blanks, tabs, and newlines are ignored except that they  maynot  appear  in  names or multi-character reserved symbols.  Com-ments may appear wherever a name is legal; they are  enclosed  in/* . . . */, as in C and PL/I.     The rules section is made up of one or more  grammar  rules.A grammar rule has the form:        A  :  BODY  ;A represents a nonterminal name, and BODY represents  a  sequenceof  zero or more names and literals.  The colon and the semicolonare Yacc punctuation.     Names may be of arbitrary length, and  may  be  made  up  ofletters,  dot  ``.'',  underscore  ``_'', and non-initial digits.Upper and lower case letters are distinct.  The names used in thebody  of  a grammar rule may represent tokens or nonterminal sym-bols.     A literal consists of a character enclosed in single  quotes``'''.   As  in  C,  the  backslash  ``\'' is an escape characterwithin literals, and all the C escapes are recognized.  Thus        '\n'    newline        '\r'    return        '\''    single quote ``'''        '\\'    backslash ``\''        '\t'    tab        '\b'    backspace        '\f'    form feed        '\xxx'  ``xxx'' in octalFor a number of technical reasons, the NUL character ('\0' or  0)should never be used in grammar rules.     If there are several grammar rules with the same  left  handside,  the  vertical bar ``|'' can be used to avoid rewriting theleft hand side.  In addition, the semicolon at the end of a  rulecan be dropped before a vertical bar.  Thus the grammar rules        A       :       B  C  D   ;        A       :       E  F   ;        A       :       G   ;can be given to Yacc asPS1:15-6                      Yacc: Yet Another Compiler-Compiler        A       :       B  C  D                |       E  F                |       G                ;It is not necessary that all grammar rules  with  the  same  leftside  appear  together  in the grammar rules section, although itmakes the input much more readable, and easier to change.     If a nonterminal symbol matches the empty string,  this  canbe indicated in the obvious way:        empty :   ;     Names representing tokens must be  declared;  this  is  mostsimply done by writing        %token   name1  name2 . . .in the declarations section.  (See Sections 3 , 5, and 6 for muchmore  discussion).   Every  name  not defined in the declarationssection is assumed to represent a nonterminal symbol.  Every non-terminal  symbol  must  appear  on  the left side of at least onerule.     Of all the nonterminal symbols, one, called the  start  sym-bol, has particular importance.  The parser is designed to recog-nize the start symbol; thus, this symbol represents the  largest,most  general  structure  described  by  the  grammar  rules.  Bydefault, the start symbol is taken to be the left  hand  side  ofthe first grammar rule in the rules section.  It is possible, andin fact desirable, to declare the start symbol explicitly in  thedeclarations section using the %start keyword:        %start   symbol     The end of the input to the parser is signaled by a  specialtoken,  called  the  endmarker.   If  the  tokens  up to, but notincluding, the endmarker form a structure which matches the startsymbol,  the parser function returns to its caller after the end-marker is seen; it accepts the input.  If the endmarker  is  seenin any other context, it is an error.     It is the job  of  the  user-supplied  lexical  analyzer  toreturn  the  endmarker  when  appropriate;  see section 3, below.Usually the endmarker  represents  some  reasonably  obvious  I/Ostatus, such as ``end-of-file'' or ``end-of-record''.2: Actions

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -