in an action resets the parser to its normal mode. The last
example is better written

          input   :   error  '\n'
                          {   yyerrok;
                              printf( "Reenter last line: " );   }
                      input
                          {   $$  =  $4;  }
                  ;

As mentioned above, the token seen immediately after the
``error'' symbol is the input token at which the error was
discovered. Sometimes, this is inappropriate; for example, an
error recovery action might take upon itself the job of finding
the correct place to resume input. In this case, the previous
lookahead token must be cleared. The statement

          yyclearin ;

in an action will have this effect. For example, suppose the
action after error were to call some sophisticated
resynchronization routine, supplied by the user, that attempted
to advance the input to the beginning of the next valid
statement. After this routine was called, the next token
returned by yylex would presumably be the first token in a legal
statement; the old, illegal token must be discarded, and the
error state reset. This could be done by a rule like

          stat    :   error
                          {   resynch();
                              yyerrok ;
                              yyclearin ;   }
                  ;

Yacc: Yet Another Compiler-Compiler PS1:15-25

These mechanisms are admittedly crude, but do allow for a
simple, fairly effective recovery of the parser from many errors;
moreover, the user can get control to deal with the error actions
required by other portions of the program.

8: The Yacc Environment

When the user inputs a specification to Yacc, the output is
a file of C programs, called y.tab.c on most systems (due to
local file system conventions, the names may differ from
installation to installation). The function produced by Yacc is
called yyparse; it is an integer valued function. When it is
called, it in turn repeatedly calls yylex, the lexical analyzer
supplied by the user (see Section 3) to obtain input tokens.
Eventually, either an error is detected, in which case (if no
error recovery is possible) yyparse returns the value 1, or the
lexical analyzer returns the endmarker token and the parser
accepts. In this case, yyparse returns the value 0.
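
The calling contract just described can be sketched without Yacc
itself. In the minimal sketch below, toy_parse is a hand-written
stand-in for the generated yyparse (it is not Yacc output), and
toy_lex, the TOK_ codes, and the token array are hypothetical
names invented for illustration. It shows only the shape of the
loop: call the lexical analyzer until the endmarker arrives,
return 0 on acceptance, and return 1 on an error from which no
recovery is attempted.

```c
#include <stddef.h>

/* Hypothetical token codes; 0 plays the role of the endmarker. */
enum { TOK_END = 0, TOK_NUM = 257, TOK_BAD = 258 };

/* Hypothetical input: the lexer hands out these tokens one by one. */
static const int *toy_input;
static size_t toy_pos;

/* Stand-in for the user-supplied yylex. */
int toy_lex(void)
{
    int tok = toy_input[toy_pos];
    if (tok != TOK_END)
        toy_pos++;              /* stay on the endmarker once reached */
    return tok;
}

/* Stand-in for yyparse: accepts any sequence of TOK_NUM tokens. */
int toy_parse(const int *tokens)
{
    toy_input = tokens;
    toy_pos = 0;
    for (;;) {
        int tok = toy_lex();
        if (tok == TOK_END)
            return 0;           /* endmarker seen: accept */
        if (tok != TOK_NUM)
            return 1;           /* syntax error, no recovery attempted */
    }
}
```

A caller would treat the result exactly as it would treat the
result of yyparse: zero means the whole input was accepted, and a
nonzero value means parsing was abandoned.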

The user must provide a certain amount of environment for
this parser in order to obtain a working program. For example,
as with every C program, a program called main must be defined,
that eventually calls yyparse. In addition, a routine called
yyerror prints a message when a syntax error is detected.

These two routines must be supplied in one form or another
by the user. To ease the initial effort of using Yacc, a library
has been provided with default versions of main and yyerror. The
name of this library is system dependent; on many systems the
library is accessed by a -ly argument to the loader. To show the
triviality of these default programs, the source is given below:

          main(){
                  return( yyparse() );
                  }

and

          # include <stdio.h>

          yyerror(s) char *s; {
                  fprintf( stderr, "%s\n", s );
                  }

The argument to yyerror is a string containing an error message,
usually the string ``syntax error''. The average application
will want to do better than this. Ordinarily, the program should
keep track of the input line number, and print it along with the
message when a syntax error is detected. The external integer
variable yychar contains the lookahead token number at the time
the error was detected; this may be of some interest in giving
better diagnostics. Since the main program is probably supplied
by the user (to read arguments, etc.) the Yacc library is useful
only in small projects, or in the earliest stages of larger ones.

The external integer variable yydebug is normally set to 0.
If it is set to a nonzero value, the parser will output a verbose
description of its actions, including a discussion of which input
symbols have been read, and what the parser actions are.
Depending on the operating environment, it may be possible to set
this variable by using a debugging system.

9: Hints for Preparing Specifications

This section contains miscellaneous hints on preparing
efficient, easy to change, and clear specifications.
The individual subsections are more or less independent.

Input Style

It is difficult to provide rules with substantial actions
and still have a readable specification file. The following
style hints owe much to Brian Kernighan.

a.   Use all capital letters for token names, all lower case
     letters for nonterminal names. This rule comes under the
     heading of ``knowing who to blame when things go wrong.''

b.   Put grammar rules and actions on separate lines. This
     allows either to be changed without an automatic need to
     change the other.

c.   Put all rules with the same left hand side together. Put
     the left hand side in only once, and let all following
     rules begin with a vertical bar.

d.   Put a semicolon only after the last rule with a given left
     hand side, and put the semicolon on a separate line. This
     allows new rules to be easily added.

e.   Indent rule bodies by two tab stops, and action bodies by
     three tab stops.

The example in Appendix A is written following this style,
as are the examples in the text of this paper (where space
permits). The user must make up his own mind about these
stylistic questions; the central problem, however, is to make
the rules visible through the morass of action code.

Left Recursion

The algorithm used by the Yacc parser encourages so called
``left recursive'' grammar rules: rules of the form

          name    :   name  rest_of_rule  ;

These rules frequently arise when writing specifications of
sequences and lists:

          list    :   item
                  |   list  ','  item
                  ;

and

          seq     :   item
                  |   seq  item
                  ;

In each of these cases, the first rule will be reduced for the
first item only, and the second rule will be reduced for the
second and all succeeding items.

With right recursive rules, such as

          seq     :   item
                  |   item  seq
                  ;

the parser would be a bit bigger, and the items would be seen,
and reduced, from right to left. More seriously, an internal
stack in the parser would be in danger of overflowing if a very
long sequence were read.
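
The stack behavior can be made concrete with a small simulation.
The sketch below is a simplified model, not Yacc's actual tables:
it counts only the grammar symbols a shift-reduce parser would
hold for the two forms of seq, ignoring parser states and the
lookahead token. The function names max_depth_left and
max_depth_right are hypothetical.

```c
/* Deepest symbol stack for  seq : item | seq item ;  (left recursive).
   Each item is shifted and then reduced at once, so the stack never
   holds more than "seq item". */
int max_depth_left(int n_items)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n_items; i++) {
        depth++;                    /* shift the next item */
        if (depth > max)
            max = depth;
        depth = 1;                  /* reduce "item" or "seq item" to seq */
    }
    return max;
}

/* Deepest symbol stack for  seq : item | item seq ;  (right recursive).
   No reduction is possible until the last item has been read, so
   every item is on the stack at the same time. */
int max_depth_right(int n_items)
{
    int depth = 0, max = 0;
    for (int i = 0; i < n_items; i++) {
        depth++;                    /* shift; nothing can be reduced yet */
        if (depth > max)
            max = depth;
    }
    while (depth > 1)
        depth--;                    /* reduce "item seq" pairs, right to left */
    return max;
}
```

Under this model the left recursive form needs at most two stack
entries however long the sequence is, while the right recursive
form needs one entry per item.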
Thus, the user should use left recursion wherever reasonable.

It is worth considering whether a sequence with zero
elements has any meaning, and if so, consider writing the
sequence specification with an empty rule:

          seq     :   /* empty */
                  |   seq  item
                  ;

Once again, the first rule would always be reduced exactly once,
before the first item was read, and then the second rule would be
reduced once for each item read. Permitting empty sequences
often leads to increased generality. However, conflicts might
arise if Yacc is asked to decide which empty sequence it has
seen, when it hasn't seen enough to know!

Lexical Tie-ins

Some lexical decisions depend on context. For example, the
lexical analyzer might want to delete blanks normally, but not
within quoted strings. Or names might be entered into a symbol
table in declarations, but not in expressions.

One way of handling this situation is to create a global
flag that is examined by the lexical analyzer, and set by
actions. For example, suppose a program consists of 0 or more
declarations, followed by 0 or more statements. Consider:

          %{
                  int dflag;
          %}
            ...  other declarations ...

          %%

          prog    :   decls  stats
                  ;

          decls   :   /* empty */
                          {   dflag = 1;  }
                  |   decls  declaration
                  ;

          stats   :   /* empty */
                          {   dflag = 0;  }
                  |   stats  statement
                  ;

              ...  other rules ...

The flag dflag is now 0 when reading statements, and 1 when
reading declarations, except for the first token in the first
statement. This token must be seen by the parser before it can
tell that the declaration section has ended and the statements
have begun. In many cases, this single token exception does not
affect the lexical scan.

This kind of ``backdoor'' approach can be elaborated to a
noxious degree.
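
The lexical analyzer's side of such a tie-in might look like the
sketch below. Only dflag comes from the text; the symbol table,
MAX_SYMS, sym_lookup, and lex_name are hypothetical names
invented for illustration, and the table is kept deliberately
tiny. Names are entered while dflag is 1 and merely looked up
while it is 0.

```c
#include <string.h>

#define MAX_SYMS 16                 /* hypothetical table size */

int dflag = 1;                      /* set by parser actions: 1 in
                                       declarations, 0 in statements */

static const char *symtab[MAX_SYMS];
static int nsyms;

/* Return the table index of name, or -1 if it is not present. */
int sym_lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], name) == 0)
            return i;
    return -1;
}

/* Called by the lexer for each name it scans.  New names are
   entered only while declarations are being read. */
int lex_name(const char *name)
{
    int i = sym_lookup(name);
    if (i >= 0)
        return i;                   /* already known */
    if (dflag && nsyms < MAX_SYMS) {
        symtab[nsyms] = name;       /* caller keeps the string alive */
        return nsyms++;
    }
    return -1;                      /* unknown name outside declarations */
}
```

Even at this size, the result of lex_name depends on a variable
that is set somewhere else entirely.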
Nevertheless, it represents a way of doing some things that are
difficult, if not impossible, to do otherwise.

Reserved Words

Some programming languages permit the user to use words like
``if'', which are normally reserved, as label or variable names,
provided that such use does not conflict with the legal use of
these names in the programming language. This is extremely hard
to do in the framework of Yacc; it is difficult to pass
information to the lexical analyzer telling it ``this instance
of `if' is a keyword, and that instance is a variable''. The
user can make a stab at it, using the mechanism described in the
last subsection, but it is difficult.

A number of ways of making this easier are under advisement.
Until then, it is better that the keywords be reserved; that is,
be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.

10: Advanced Topics

This section discusses a number of advanced features of
Yacc.

Simulating Error and Accept in Actions

The parsing actions of error and accept can be simulated in
an action by use of macros YYACCEPT and YYERROR. YYACCEPT causes
yyparse to return the value 0; YYERROR causes the parser to
behave as if the current input symbol had been a syntax error;
yyerror is called, and error recovery takes place. These
mechanisms can be used to simulate parsers with multiple
endmarkers or context-sensitive syntax checking.

Accessing Values in Enclosing Rules.

An action may refer to values returned by actions to the
left of the current rule. The mechanism is simply the same as
with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider

          sent    :   adj  noun  verb  adj  noun
                          {  look at the sentence . . .  }
                  ;

          adj     :   THE         {    $$ = THE;  }
                  |   YOUNG       {    $$ = YOUNG;  }
                  . . .
                  ;

          noun    :   DOG
                          {    $$ = DOG;  }
                  |   CRONE
                          {    if( $0 == YOUNG ){
                                       printf( "what?\n" );
                                       }
                               $$ = CRONE;
                               }
                  ;
                  . . .

In the action following the word CRONE, a check is made that the
preceding token shifted was not YOUNG. Obviously, this is only
possible when a great deal is known about what might precede the
symbol noun in the input. There is also a distinctly
unstructured flavor about this. Nevertheless, at times this
mechanism will save a great deal of trouble, especially when a
few combinations are to be excluded from an otherwise regular
structure.

Support for Arbitrary Value Types

By default, the values returned by actions and the lexical
analyzer are integers. Yacc can also support values of other
types, including structures. In addition, Yacc keeps track of
the types, and inserts appropriate union member names so that the
resulting parser will be strictly type checked. The Yacc value
stack (see Section 4) is declared to be a union of the various
types of values desired. The user declares the union, and
associates union member names to each token and nonterminal
symbol having a value. When the value is referenced through a $$
or $n construction, Yacc will automatically insert the
appropriate union name, so that no unwanted conversions will take
place. In addition, type checking commands such as Lint[5] will
be far more silent.

There are three mechanisms used to provide for this typing.
First, there is a way of defining the union; this must be done by
the user since other programs, notably the l