📄 yacc-docs.txt
字号:
PS1:15-24 Yacc: Yet Another Compiler-Compiler
Error rules such as the above are very general, but diffi-
cult to control. Somewhat easier are rules such as
stat : error ';'
Here, when there is an error, the parser attempts to skip over
the statement, but will do so by skipping to the next ';'. All
tokens after the error and before the next ';' cannot be shifted,
and are discarded. When the ';' is seen, this rule will be
reduced, and any ``cleanup'' action associated with it performed.
Another form of error rule arises in interactive applica-
tions, where it may be desirable to permit a line to be reentered
after an error. A possible error rule might be
input : error '\n' { printf( "Reenter last line: " ); } input
{ $$ = $4; }
There is one potential difficulty with this approach; the parser
must correctly process three input tokens before it admits that
it has correctly resynchronized after the error. If the reen-
tered line contains an error in the first two tokens, the parser
deletes the offending tokens, and gives no message; this is
clearly unacceptable. For this reason, there is a mechanism that
can be used to force the parser to believe that an error has been
fully recovered from. The statement
yyerrok ;
in an action resets the parser to its normal mode. The last
example is better written
input : error '\n'
{ yyerrok;
printf( "Reenter last line: " ); }
input
{ $$ = $4; }
;
As mentioned above, the token seen immediately after the
``error'' symbol is the input token at which the error was
discovered. Sometimes, this is inappropriate; for example, an
error recovery action might take upon itself the job of finding
the correct place to resume input. In this case, the previous
lookahead token must be cleared. The statement
yyclearin ;
in an action will have this effect. For example, suppose the
action after error were to call some sophisticated resynchroniza-
tion routine, supplied by the user, that attempted to advance the
input to the beginning of the next valid statement. After this
routine was called, the next token returned by yylex would
Yacc: Yet Another Compiler-Compiler PS1:15-25
presumably be the first token in a legal statement; the old,
illegal token must be discarded, and the error state reset. This
could be done by a rule like
stat : error
{ resynch();
yyerrok ;
yyclearin ; }
;
These mechanisms are admittedly crude, but do allow for a
simple, fairly effective recovery of the parser from many errors;
moreover, the user can get control to deal with the error actions
required by other portions of the program.
8: The Yacc Environment
When the user inputs a specification to Yacc, the output is
a file of C programs, called y.tab.c on most systems (due to
local file system conventions, the names may differ from instal-
lation to installation). The function produced by Yacc is called
yyparse; it is an integer valued function. When it is called, it
in turn repeatedly calls yylex, the lexical analyzer supplied by
the user (see Section 3) to obtain input tokens. Eventually,
either an error is detected, in which case (if no error recovery
is possible) yyparse returns the value 1, or the lexical analyzer
returns the endmarker token and the parser accepts. In this
case, yyparse returns the value 0.
The user must provide a certain amount of environment for
this parser in order to obtain a working program. For example,
as with every C program, a program called main must be defined,
that eventually calls yyparse. In addition, a routine called
yyerror prints a message when a syntax error is detected.
These two routines must be supplied in one form or another
by the user. To ease the initial effort of using Yacc, a library
has been provided with default versions of main and yyerror. The
name of this library is system dependent; on many systems the
library is accessed by a -ly argument to the loader. To show the
triviality of these default programs, the source is given below:
main(){
return( yyparse() );
}
and
# include <stdio.h>
yyerror(s) char *s; {
fprintf( stderr, "%s\n", s );
}
PS1:15-26 Yacc: Yet Another Compiler-Compiler
The argument to yyerror is a string containing an error message,
usually the string ``syntax error''. The average application
will want to do better than this. Ordinarily, the program should
keep track of the input line number, and print it along with the
message when a syntax error is detected. The external integer
variable yychar contains the lookahead token number at the time
the error was detected; this may be of some interest in giving
better diagnostics. Since the main program is probably supplied
by the user (to read arguments, etc.) the Yacc library is useful
only in small projects, or in the earliest stages of larger ones.
The external integer variable yydebug is normally set to 0.
If it is set to a nonzero value, the parser will output a verbose
description of its actions, including a discussion of which input
symbols have been read, and what the parser actions are. Depend-
ing on the operating environment, it may be possible to set this
variable by using a debugging system.
9: Hints for Preparing Specifications
This section contains miscellaneous hints on preparing effi-
cient, easy to change, and clear specifications. The individual
subsections are more or less independent.
Input Style
It is difficult to provide rules with substantial actions
and still have a readable specification file. The following
style hints owe much to Brian Kernighan.
a. Use all capital letters for token names, all lower case
letters for nonterminal names. This rule comes under the
heading of ``knowing who to blame when things go wrong.''
b. Put grammar rules and actions on separate lines. This
allows either to be changed without an automatic need to
change the other.
c. Put all rules with the same left hand side together. Put
the left hand side in only once, and let all following rules
begin with a vertical bar.
d. Put a semicolon only after the last rule with a given left
hand side, and put the semicolon on a separate line. This
allows new rules to be easily added.
e. Indent rule bodies by two tab stops, and action bodies by
three tab stops.
The example in Appendix A is written following this style,
as are the examples in the text of this paper (where space per-
mits). The user must make up his own mind about these stylistic
questions; the central problem, however, is to make the rules
visible through the morass of action code.
Yacc: Yet Another Compiler-Compiler PS1:15-27
Left Recursion
The algorithm used by the Yacc parser encourages so called
``left recursive'' grammar rules: rules of the form
name : name rest_of_rule ;
These rules frequently arise when writing specifications of
sequences and lists:
list : item
| list ',' item
;
and
seq : item
| seq item
;
In each of these cases, the first rule will be reduced for the
first item only, and the second rule will be reduced for the
second and all succeeding items.
With right recursive rules, such as
seq : item
| item seq
;
the parser would be a bit bigger, and the items would be seen,
and reduced, from right to left. More seriously, an internal
stack in the parser would be in danger of overflowing if a very
long sequence were read. Thus, the user should use left recur-
sion wherever reasonable.
It is worth considering whether a sequence with zero ele-
ments has any meaning, and if so, consider writing the sequence
specification with an empty rule:
seq : /* empty */
| seq item
;
Once again, the first rule would always be reduced exactly once,
before the first item was read, and then the second rule would be
reduced once for each item read. Permitting empty sequences
often leads to increased generality. However, conflicts might
arise if Yacc is asked to decide which empty sequence it has
seen, when it hasn't seen enough to know!
Lexical Tie-ins
Some lexical decisions depend on context. For example, the
PS1:15-28 Yacc: Yet Another Compiler-Compiler
lexical analyzer might want to delete blanks normally, but not
within quoted strings. Or names might be entered into a symbol
table in declarations, but not in expressions.
One way of handling this situation is to create a global
flag that is examined by the lexical analyzer, and set by
actions. For example, suppose a program consists of 0 or more
declarations, followed by 0 or more statements. Consider:
%{
int dflag;
%}
... other declarations ...
%%
prog : decls stats
;
decls : /* empty */
{ dflag = 1; }
| decls declaration
;
stats : /* empty */
{ dflag = 0; }
| stats statement
;
... other rules ...
The flag dflag is now 0 when reading statements, and 1 when read-
ing declarations, except for the first token in the first state-
ment. This token must be seen by the parser before it can tell
that the declaration section has ended and the statements have
begun. In many cases, this single token exception does not
affect the lexical scan.
This kind of ``backdoor'' approach can be elaborated to a
noxious degree. Nevertheless, it represents a way of doing some
things that are difficult, if not impossible, to do otherwise.
Reserved Words
Some programming languages permit the user to use words like
``if'', which are normally reserved, as label or variable names,
provided that such use does not conflict with the legal use of
these names in the programming language. This is extremely hard
to do in the framework of Yacc; it is difficult to pass informa-
tion to the lexical analyzer telling it ``this instance of `if'
is a keyword, and that instance is a variable''. The user can
make a stab at it, using the mechanism described in the last sub-
section, but it is difficult.
Yacc: Yet Another Compiler-Compiler PS1:15-29
A number of ways of making this easier are under advisement.
Until then, it is better that the keywords be reserved; that is,
be forbidden for use as variable names. There are powerful
stylistic reasons for preferring this, anyway.
10: Advanced Topics
This section discusses a number of advanced features of
Yacc.
Simulating Error and Accept in Actions
The parsing actions of error and accept can be simulated in
an action by use of macros YYACCEPT and YYERROR. YYACCEPT causes
yyparse to return the value 0; YYERROR causes the parser to
behave as if the current input symbol had been a syntax error;
yyerror is called, and error recovery takes place. These mechan-
isms can be used to simulate parsers with multiple endmarkers or
context-sensitive syntax checking.
Accessing Values in Enclosing Rules.
An action may refer to values returned by actions to the
left of the current rule. The mechanism is simply the same as
with ordinary actions, a dollar sign followed by a digit, but in
this case the digit may be 0 or negative. Consider
sent : adj noun verb adj noun
{ look at the sentence . . . }
;
adj : THE { $$ = THE; }
| YOUNG { $$ = YOUNG; }
. . .
;
noun : DOG
{ $$ = DOG; }
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -