Yacc: Yet Another Compiler-Compiler

in an action resets the parser to its normal mode.  The last
example is better written

        input   :       error  '\n'
                                {       yyerrok;
                                        printf( "Reenter last line: " );   }
                        input
                                {       $$  =  $4;  }
                ;

     As mentioned above, the token seen immediately after the
``error'' symbol is the input token at which the error was
discovered.  Sometimes, this is inappropriate; for example, an
error recovery action might take upon itself the job of finding
the correct place to resume input.  In this case, the previous
lookahead token must be cleared.  The statement

        yyclearin ;

in an action will have this effect.  For example, suppose the
action after error were to call some sophisticated
resynchronization routine, supplied by the user, that attempted
to advance the input to the beginning of the next valid
statement.  After this routine was called, the next token
returned by yylex would presumably be the first token in a legal
statement; the old, illegal token must be discarded, and the
error state reset.  This could be done by a rule like

        stat    :       error
                                {       resynch();
                                        yyerrok ;
                                        yyclearin ;   }
                ;

     These mechanisms are admittedly crude, but do allow for a
simple, fairly effective recovery of the parser from many errors;
moreover, the user can get control to deal with the error actions
required by other portions of the program.

8: The Yacc Environment

     When the user inputs a specification to Yacc, the output is
a file of C programs, called y.tab.c on most systems (due to
local file system conventions, the names may differ from
installation to installation).
The function produced by Yacc is called yyparse; it is an
integer valued function.  When it is called, it in turn
repeatedly calls yylex, the lexical analyzer supplied by the user
(see Section 3) to obtain input tokens.  Eventually, either an
error is detected, in which case (if no error recovery is
possible) yyparse returns the value 1, or the lexical analyzer
returns the endmarker token and the parser accepts.  In this
case, yyparse returns the value 0.

     The user must provide a certain amount of environment for
this parser in order to obtain a working program.  For example,
as with every C program, a program called main must be defined,
that eventually calls yyparse.  In addition, a routine called
yyerror prints a message when a syntax error is detected.

     These two routines must be supplied in one form or another
by the user.  To ease the initial effort of using Yacc, a library
has been provided with default versions of main and yyerror.  The
name of this library is system dependent; on many systems the
library is accessed by a -ly argument to the loader.  To show the
triviality of these default programs, the source is given below:

        main(){
                return( yyparse() );
                }

and

        # include <stdio.h>

        yyerror(s) char *s; {
                fprintf( stderr, "%s\n", s );
                }

The argument to yyerror is a string containing an error message,
usually the string ``syntax error''.  The average application
will want to do better than this.  Ordinarily, the program should
keep track of the input line number, and print it along with the
message when a syntax error is detected.  The external integer
variable yychar contains the lookahead token number at the time
the error was detected; this may be of some interest in giving
better diagnostics.  Since the main program is probably supplied
by the user (to read arguments, etc.)
the Yacc library is useful only in small projects, or in the
earliest stages of larger ones.

     The external integer variable yydebug is normally set to 0.
If it is set to a nonzero value, the parser will output a verbose
description of its actions, including a discussion of which input
symbols have been read, and what the parser actions are.
Depending on the operating environment, it may be possible to set
this variable by using a debugging system.

9: Hints for Preparing Specifications

     This section contains miscellaneous hints on preparing
efficient, easy to change, and clear specifications.  The
individual subsections are more or less independent.

Input Style

     It is difficult to provide rules with substantial actions
and still have a readable specification file.  The following
style hints owe much to Brian Kernighan.

a.   Use all capital letters for token names, all lower case
     letters for nonterminal names.  This rule comes under the
     heading of ``knowing who to blame when things go wrong.''

b.   Put grammar rules and actions on separate lines.  This
     allows either to be changed without an automatic need to
     change the other.

c.   Put all rules with the same left hand side together.  Put
     the left hand side in only once, and let all following rules
     begin with a vertical bar.

d.   Put a semicolon only after the last rule with a given left
     hand side, and put the semicolon on a separate line.  This
     allows new rules to be easily added.

e.   Indent rule bodies by two tab stops, and action bodies by
     three tab stops.

     The example in Appendix A is written following this style,
as are the examples in the text of this paper (where space
permits).
The user must make up his own mind about these stylistic
questions; the central problem, however, is to make the rules
visible through the morass of action code.

Left Recursion

     The algorithm used by the Yacc parser encourages so called
``left recursive'' grammar rules: rules of the form

        name    :       name  rest_of_rule  ;

These rules frequently arise when writing specifications of
sequences and lists:

        list    :       item
                |       list  ','  item
                ;

and

        seq     :       item
                |       seq  item
                ;

In each of these cases, the first rule will be reduced for the
first item only, and the second rule will be reduced for the
second and all succeeding items.

     With right recursive rules, such as

        seq     :       item
                |       item  seq
                ;

the parser would be a bit bigger, and the items would be seen,
and reduced, from right to left.  More seriously, an internal
stack in the parser would be in danger of overflowing if a very
long sequence were read.  Thus, the user should use left
recursion wherever reasonable.

     It is worth considering whether a sequence with zero
elements has any meaning, and if so, consider writing the
sequence specification with an empty rule:

        seq     :       /* empty */
                |       seq  item
                ;

Once again, the first rule would always be reduced exactly once,
before the first item was read, and then the second rule would be
reduced once for each item read.  Permitting empty sequences
often leads to increased generality.  However, conflicts might
arise if Yacc is asked to decide which empty sequence it has
seen, when it hasn't seen enough to know!

Lexical Tie-ins

     Some lexical decisions depend on context.
For example, the lexical analyzer might want to delete blanks
normally, but not within quoted strings.  Or names might be
entered into a symbol table in declarations, but not in
expressions.

     One way of handling this situation is to create a global
flag that is examined by the lexical analyzer, and set by
actions.  For example, suppose a program consists of 0 or more
declarations, followed by 0 or more statements.  Consider:

        %{
                int dflag;
        %}
          ...  other declarations ...

        %%

        prog    :       decls  stats
                ;

        decls   :       /* empty */
                                {       dflag = 1;  }
                |       decls  declaration
                ;

        stats   :       /* empty */
                                {       dflag = 0;  }
                |       stats  statement
                ;

            ...  other rules ...

The flag dflag is now 0 when reading statements, and 1 when
reading declarations, except for the first token in the first
statement.  This token must be seen by the parser before it can
tell that the declaration section has ended and the statements
have begun.  In many cases, this single token exception does not
affect the lexical scan.

     This kind of ``backdoor'' approach can be elaborated to a
noxious degree.  Nevertheless, it represents a way of doing some
things that are difficult, if not impossible, to do otherwise.

Reserved Words

     Some programming languages permit the user to use words like
``if'', which are normally reserved, as label or variable names,
provided that such use does not conflict with the legal use of
these names in the programming language.  This is extremely hard
to do in the framework of Yacc; it is difficult to pass
information to the lexical analyzer telling it ``this instance of
`if' is a keyword, and that instance is a variable''.
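
     By contrast, when keywords are reserved the lexical analyzer
can classify a word with a plain table lookup and needs nothing
from the parser.  The following is only a sketch in C, not part
of Yacc: the token codes NAME, IF, ELSE and WHILE, their numeric
values, and the routine classify_word are hypothetical names
chosen for illustration.  In practice the codes would come from
the %token declarations of the specification.

```c
#include <string.h>

/* Hypothetical token codes; the values are arbitrary and only need
   to lie above the range of single-character tokens. */
enum { NAME = 257, IF, ELSE, WHILE };

/* Reserved-word table: each entry is always a keyword and can
   never be used as a variable name. */
static const struct {
    const char *text;
    int token;
} reserved[] = {
    { "if", IF },
    { "else", ELSE },
    { "while", WHILE },
};

/* Classify a scanned word without consulting any parser state:
   return the keyword's token code, or NAME for anything else. */
int classify_word(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (strcmp(word, reserved[i].text) == 0)
            return reserved[i].token;
    }
    return NAME;
}
```

A user-supplied yylex might call classify_word on each word it
scans and return the result.  Telling a keyword use of ``if''
from a variable use, as described above, has no comparably simple
form.
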
The user can make a stab at it, using the mechanism described in
the last subsection, but it is difficult.

     A number of ways of making this easier are under advisement.
Until then, it is better that the keywords be reserved; that is,
be forbidden for use as variable names.  There are powerful
stylistic reasons for preferring this, anyway.

10: Advanced Topics

     This section discusses a number of advanced features of
Yacc.

Simulating Error and Accept in Actions

     The parsing actions of error and accept can be simulated in
an action by use of macros YYACCEPT and YYERROR.  YYACCEPT causes
yyparse to return the value 0; YYERROR causes the parser to
behave as if the current input symbol had been a syntax error;
yyerror is called, and error recovery takes place.  These
mechanisms can be used to simulate parsers with multiple
endmarkers or context-sensitive syntax checking.

Accessing Values in Enclosing Rules

     An action may refer to values returned by actions to the
left of the current rule.  The mechanism is simply the same as
with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative.  Consider

        sent    :       adj  noun  verb  adj  noun
                                {  look at the sentence . . .  }
                ;

        adj     :       THE             {       $$ = THE;  }
                |       YOUNG   {       $$ = YOUNG;  }
                . . .
                ;

        noun    :       DOG
                                {       $$ = DOG;  }
                |       CRONE
                                {       if( $0 == YOUNG ){
                                                printf( "what?\n" );
                                                }
                                        $$ = CRONE;
                                        }
                ;

                . . .

In the action following the word CRONE, a check is made that the
preceding token shifted was not YOUNG.  Obviously, this is only
possible when a great deal is known about what might precede the
symbol noun in the input.  There is also a distinctly
unstructured flavor about this.  Nevertheless, at times this
mechanism will save a great deal of trouble, especially when a
few combinations are to be excluded from an otherwise regular
structure.

Support for Arbitrary Value Types

     By default, the values returned by actions and the lexical
analyzer are integers.  Yacc can also support values of other
types, including structures.  In addition, Yacc keeps track of
the types, and inserts appropriate union member names so that the
resulting parser will be strictly type checked.  The Yacc value
stack (see Section 4) is declared to be a union of the various
types of values desired.  The user declares the union, and
associates union member names to each token and nonterminal
symbol having a value.  When the value is referenced through a $$
or $n construction, Yacc will automatically insert the
appropriate union name, so that no unwanted conversions will take
place.  In addition, type checking commands such as Lint[5] will
be far more silent.

     There are three mechanisms used to provide for this typing.
First, there is a way of defining the union; this must be done by
the user since other programs, notably the l
