The reduce/reduce disambiguating rule is used to resolve conflicts that
arise when there is more than one grammar rule matching a given construct.
Such ambiguities are often caused by "special case constructs" which may be
given priority by simply listing the more specific rules ahead of the more
general ones.

For instance, the following is an excerpt from the grammar describing the
input language of the UNIX equation formatter EQN:

%right SUB SUP
%%
expr : expr SUB expr SUP expr
     | expr SUB expr
     | expr SUP expr
     ;

Here, the SUB and SUP operator symbols denote sub- and superscript,
respectively. The rationale behind this example is that an expression
involving both sub- and superscript is often set differently from a
superscripted subscripted expression. This special case is therefore
caught by the first rule in the above example, which causes a reduce/reduce
conflict with rule 3 in expressions like expr-1 SUB expr-2 SUP expr-3.
The conflict is resolved in favour of the first rule.
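In effect, the reduce/reduce rule says: of all the rules that could be reduced, take the one listed first. The following Python sketch is only an illustration, not TP Yacc code. It numbers the EQN rules in grammar order and picks the lowest number; the helper `resolve_reduce_reduce` is hypothetical:

```python
def resolve_reduce_reduce(candidate_rules):
    """Return the rule to reduce by when several rules apply.

    Rules are numbered in the order they appear in the grammar, so the
    rule listed first (the lowest number) wins.
    """
    if not candidate_rules:
        raise ValueError("no candidate rules")
    return min(candidate_rules)


# EQN excerpt: after reading  expr SUB expr SUP expr  both rule 1 and
# rule 3 (applied to the trailing  expr SUP expr) could be reduced.
EQN_RULES = {
    1: "expr : expr SUB expr SUP expr",
    2: "expr : expr SUB expr",
    3: "expr : expr SUP expr",
}

chosen = resolve_reduce_reduce([3, 1])
print(EQN_RULES[chosen])  # expr : expr SUB expr SUP expr
```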

In both cases discussed above, the ambiguities could also be eliminated
by rewriting the grammar accordingly (although this yields more complicated
and less readable grammars). This may not always be the case. Often
ambiguities are also caused by design errors in the grammar. Hence, if
TP Yacc reports any parsing conflicts when constructing the parser, you
should use the -v option to generate the parser description (.lst file)
and check whether TP Yacc resolved the conflicts correctly.

There is one type of syntactic construct for which one often deliberately
uses an ambiguous grammar as a more concise representation for a language
that could also be specified unambiguously: the syntax of expressions.
For instance, the following is an unambiguous grammar for simple arithmetic
expressions:

%token NUM

%%

expr	: term
	| expr '+' term
        ;

term	: factor
	| term '*' factor
        ;

factor	: '(' expr ')'
	| NUM
        ;

You can check for yourself that this grammar gives * a higher precedence than
+ and makes both operators left-associative. The same effect can be achieved
with the following ambiguous grammar using precedence definitions:

%token NUM
%left '+'
%left '*'
%%
expr : expr '+' expr
     | expr '*' expr
     | '(' expr ')'
     | NUM
     ;

Without the precedence definitions, this is an ambiguous grammar causing
a number of shift/reduce conflicts. The precedence definitions are used
to correctly resolve these conflicts (conflicts resolved using precedence
will not be reported by TP Yacc).

Each precedence definition introduces a new precedence level (lowest
precedence first) and specifies whether the corresponding operators
should be left-, right- or nonassociative (nonassociative operators
cannot be combined at all; example: relational operators in Pascal).

TP Yacc uses precedence information to resolve shift/reduce conflicts as
follows. Precedences are associated with each terminal occurring in a
precedence definition. Furthermore, each grammar rule is given the
precedence of its rightmost terminal (this default choice can be
overridden using a %prec tag; see below). To resolve a shift/reduce
conflict using precedence, both the symbol and the rule involved must
have been assigned precedences. TP Yacc then chooses the parse action
as follows:

- If the symbol has higher precedence than the rule: shift.

- If the rule has higher precedence than the symbol: reduce.

- If symbol and rule have the same precedence, the associativity of the
  symbol determines the parse action: if the symbol is left-associative:
  reduce; if the symbol is right-associative: shift; if the symbol is
  non-associative: error.
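The three cases above can be written as a small decision function. This Python sketch makes two assumptions. First, a precedence is modeled as a hypothetical (level, associativity) pair, with higher levels for later declarations. Second, a missing precedence falls back to the default shift/reduce rule (shift):

```python
from typing import Optional, Tuple

# (level, associativity); a higher level means higher precedence, following
# the order of the precedence definitions (lowest precedence first).
Prec = Tuple[int, str]  # associativity: "left", "right" or "nonassoc"


def resolve_shift_reduce(symbol: Optional[Prec], rule: Optional[Prec]) -> str:
    """Choose the parse action for a shift/reduce conflict."""
    if symbol is None or rule is None:
        # Precedence cannot decide: use the default rule (shift).
        return "shift"
    sym_level, sym_assoc = symbol
    rule_level, _ = rule
    if sym_level > rule_level:
        return "shift"
    if rule_level > sym_level:
        return "reduce"
    # Same precedence: the symbol's associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[sym_assoc]


PLUS = (1, "left")   # %left '+'
TIMES = (2, "left")  # %left '*'
print(resolve_shift_reduce(TIMES, PLUS))  # shift
print(resolve_shift_reduce(PLUS, PLUS))   # reduce
```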

To give you an idea of how this works, let us consider our ambiguous
arithmetic expression grammar (without precedences):

%token NUM
%%
expr : expr '+' expr
     | expr '*' expr
     | '(' expr ')'
     | NUM
     ;

This grammar generates four shift/reduce conflicts. The description
of state 8 reads as follows:

state 8:

	*** conflicts:

	shift 4, reduce 1 on '*'
	shift 5, reduce 1 on '+'

	expr : expr '+' expr _	(1)
	expr : expr _ '+' expr
	expr : expr _ '*' expr

	'*'	shift 4
	'+'	shift 5
	$end	reduce 1
	')'	reduce 1
	.	error

In this state, we have successfully parsed a + expression (rule 1). When
the next symbol is + or *, we have the choice between the reduction and
shifting the symbol. Using the default shift/reduce disambiguating rule,
TP Yacc has resolved these conflicts in favour of shift.

Now let us assume the above precedence definition:

   %left '+'
   %left '*'

which gives * higher precedence than + and makes both operators left-
associative. The rightmost terminal in rule 1 is +. Hence, given these
precedence definitions, the first conflict will be resolved in favour
of shift (* has higher precedence than +), while the second one is resolved
in favour of reduce (+ is left-associative).

Similar conflicts arise in state 7:

state 7:

	*** conflicts:

	shift 4, reduce 2 on '*'
	shift 5, reduce 2 on '+'

	expr : expr '*' expr _	(2)
	expr : expr _ '+' expr
	expr : expr _ '*' expr

	'*'	shift 4
	'+'	shift 5
	$end	reduce 2
	')'	reduce 2
	.	error

Here, we have successfully parsed a * expression which may be followed
by another + or * operator. Since * is left-associative and has higher
precedence than +, both conflicts will be resolved in favour of reduce.
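The following Python sketch puts the pieces together. It derives precedence levels from the two %left declarations, gives each rule the precedence of its rightmost terminal, and resolves the conflicts of states 7 and 8. The data layout (`DECLARATIONS`, `RULES`, `TERMINALS`) is hypothetical and only illustrates the procedure described above:

```python
# Precedence definitions, lowest precedence first:
#   %left '+'
#   %left '*'
DECLARATIONS = [("left", ["+"]), ("left", ["*"])]

# Each terminal gets (level, associativity); level 1 is the lowest.
PREC = {}
for level, (assoc, symbols) in enumerate(DECLARATIONS, start=1):
    for sym in symbols:
        PREC[sym] = (level, assoc)

TERMINALS = {"+", "*", "(", ")", "NUM"}
RULES = {
    1: ["expr", "+", "expr"],
    2: ["expr", "*", "expr"],
}


def rule_prec(rule_no):
    # A rule takes the precedence of its rightmost terminal.
    terminals = [s for s in RULES[rule_no] if s in TERMINALS]
    return PREC.get(terminals[-1]) if terminals else None


def resolve(symbol, rule_no):
    sym, rule = PREC.get(symbol), rule_prec(rule_no)
    if sym is None or rule is None:
        return "shift"  # default shift/reduce rule
    if sym[0] != rule[0]:
        return "shift" if sym[0] > rule[0] else "reduce"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[sym[1]]


# State 8 follows rule 1 (expr '+' expr), state 7 follows rule 2 (expr '*' expr).
for state, rule_no in ((8, 1), (7, 2)):
    for symbol in ("*", "+"):
        print(f"state {state}, on {symbol!r}: {resolve(symbol, rule_no)}")
```

The printed actions are shift and reduce for state 8, and reduce twice for state 7. These match the resolutions derived above.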

Of course, you can also have different operators on the same precedence
level. For instance, consider the following extended version of the
arithmetic expression grammar:

%token NUM
%left '+' '-'
%left '*' '/'
%%
expr	: expr '+' expr
	| expr '-' expr
        | expr '*' expr
        | expr '/' expr
        | '(' expr ')'
        | NUM
        ;

This puts all "addition" operators on the first and all "multiplication"
operators on the second precedence level. All operators are left-associative;
for instance, 5+3-2 will be parsed as (5+3)-2.
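The same grouping can be reproduced without LR tables by precedence climbing. This is a different parsing technique, driven directly by an operator table, and it is not how TP Yacc works. In the Python sketch below, the table `BINARY` mirrors the two %left definitions, and each expression is rendered fully parenthesized:

```python
import re

# Binary operators: (precedence level, associativity), mirroring
#   %left '+' '-'
#   %left '*' '/'
BINARY = {"+": (1, "left"), "-": (1, "left"), "*": (2, "left"), "/": (2, "left")}


def parse(text):
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = climb(1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        return tok

    def climb(min_level):
        nonlocal pos
        left = primary()
        while pos < len(tokens) and tokens[pos] in BINARY:
            op = tokens[pos]
            level, assoc = BINARY[op]
            if level < min_level:
                break
            pos += 1
            # Left-associative: the right operand may only contain
            # operators of strictly higher precedence.
            right = climb(level + 1 if assoc == "left" else level)
            left = f"({left}{op}{right})"
        return left

    return climb(1)


print(parse("5+3-2"))    # ((5+3)-2)
print(parse("5-3*2/4"))  # (5-((3*2)/4))
```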

By default, TP Yacc assigns each rule the precedence of its rightmost
terminal. This is a sensible decision in most cases. Occasionally, it
may be necessary to override this default choice and explicitly assign
a precedence to a rule. This can be done by putting a precedence tag
of the form

   %prec symbol

at the end of the corresponding rule, which gives the rule the precedence
of the specified symbol. For instance, to extend the expression grammar
with a unary minus operator, giving it highest precedence, you may write:

%token NUM
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
expr	: expr '+' expr
	| expr '-' expr
        | expr '*' expr
        | expr '/' expr
        | '-' expr      %prec UMINUS
        | '(' expr ')'
        | NUM
        ;

Note the use of the UMINUS token, which is not an actual input symbol but
whose sole purpose is to give unary minus its proper precedence. If
we omitted the precedence tag, both unary and binary minus would have the
same precedence because they are represented by the same input symbol.
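The effect of the %prec tag on a rule's precedence can be sketched in Python as follows. The precedence table mirrors the definitions above. The (level, associativity) encoding and the helper `rule_precedence` are assumptions made for illustration:

```python
# Precedence levels from
#   %left '+' '-'
#   %left '*' '/'
#   %right UMINUS
PREC = {"+": (1, "left"), "-": (1, "left"),
        "*": (2, "left"), "/": (2, "left"),
        "UMINUS": (3, "right")}
TERMINALS = {"+", "-", "*", "/", "(", ")", "NUM", "UMINUS"}


def rule_precedence(rhs, prec_tag=None):
    """Precedence of a rule whose right-hand side is the symbol list `rhs`."""
    if prec_tag is not None:
        # %prec symbol: the rule takes the precedence of `symbol`.
        return PREC[prec_tag]
    # Default: the precedence of the rightmost terminal (None if it has none).
    terminals = [s for s in rhs if s in TERMINALS]
    return PREC.get(terminals[-1]) if terminals else None


binary_minus = rule_precedence(["expr", "-", "expr"])
unary_untagged = rule_precedence(["-", "expr"])
unary_tagged = rule_precedence(["-", "expr"], prec_tag="UMINUS")
print(binary_minus, unary_untagged, unary_tagged)
# (1, 'left') (1, 'left') (3, 'right')
```

Without the tag, unary minus inherits the level of binary minus. With `%prec UMINUS`, it moves to the highest level.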


Error Handling
--------------

Syntactic error handling is a difficult area in the design of user-friendly
parsers. Usually, you will not want the parser to give up upon the
first occurrence of an erroneous input symbol. Instead, the parser should
recover from a syntax error, that is, it should try to find a place in the
input where it can resume the parse.

TP Yacc provides a general mechanism to implement parsers with error
recovery. A special predefined "error" token may be used in grammar rules
to indicate positions where syntax errors might occur. When the parser runs
into an error action (i.e., reads an erroneous input symbol), it prints out
an error message and starts error recovery by popping its stack until it
uncovers a state in which there is a shift action on the error token. If
there is no such state, the parser terminates with return value 1, indicating
an unrecoverable syntax error. If there is such a state, the parser takes the
shift on the error token (pretending it has seen an imaginary error token in
the input), and resumes parsing in a special "error mode."

While in error mode, the parser quietly skips symbols until it can again
perform a legal shift action. To prevent a cascade of error messages, the
parser returns to its normal mode of operation only after it has seen
and shifted three legal input symbols. Any additional error found after
the first shifted symbol restarts error recovery, but no error message
is printed. The TP Yacc library routine yyerrok may be used to reset the
parser to its normal mode of operation explicitly.
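The error-mode bookkeeping can be modeled as a counter of legal shifts still required before normal operation resumes. The Python class below is a hypothetical sketch of the behavior described above. The names `ErrorMode` and `shifts_left` are invented and need not match the variables TP Yacc actually uses:

```python
class ErrorMode:
    """Hypothetical model of the parser's error mode."""

    def __init__(self):
        self.shifts_left = 0  # > 0 while the parser is in error mode
        self.messages = []

    def on_error(self, symbol):
        if self.shifts_left == 0:
            # Error in normal mode: print (here: record) a message.
            self.messages.append(f"syntax error at {symbol!r}")
        # Starting or restarting recovery requires three legal shifts again.
        self.shifts_left = 3

    def on_shift(self):
        # Called for each legal input symbol shifted.
        if self.shifts_left > 0:
            self.shifts_left -= 1

    def yyerrok(self):
        # Explicitly return to normal mode.
        self.shifts_left = 0


mode = ErrorMode()
mode.on_error("x")   # reported
mode.on_shift()      # one legal symbol shifted
mode.on_error("y")   # restarts recovery, no message
for _ in range(3):
    mode.on_shift()  # three legal symbols: back to normal mode
mode.on_error("z")   # reported again
print(mode.messages)  # ["syntax error at 'x'", "syntax error at 'z'"]
```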

For a simple example, consider the rule

stmt	: error ';' { yyerrok; }

and assume a syntax error occurs while a statement (nonterminal stmt) is
parsed. The parser prints an error message, then pops its stack until it
can shift the token error of the error rule. Proceeding in error mode, it
skips symbols until it finds a semicolon, shifts it, and then reduces by
the error rule. The call to yyerrok tells the parser that we have recovered from
the error and that it should proceed with the normal parse. This kind of
"panic mode" error recovery scheme works well when statements are always
terminated with a semicolon. The parser simply skips the "bad" statement
and then resumes the parse.
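The effect of the `stmt : error ';'` rule can be imitated by a hand-written Python parser for a toy statement language of the form `ID = NUM ;`. This is only a sketch of panic-mode recovery under assumed conventions: tokens are whitespace-separated, and the helper names are hypothetical. It does not model the parser stack or the three-symbol error mode:

```python
EXPECTED = ("ID", "=", "NUM", ";")


def kind(tok):
    if tok.isidentifier():
        return "ID"
    if tok.isdigit():
        return "NUM"
    return tok


def parse_program(tokens):
    """Return (parsed assignments, error messages)."""
    stmts, errors = [], []
    i = 0
    while i < len(tokens):
        start = i
        for expected in EXPECTED:
            if i >= len(tokens) or kind(tokens[i]) != expected:
                break
            i += 1
        else:
            # All four symbols matched: reduce by  stmt : ID '=' NUM ';'
            stmts.append((tokens[start], int(tokens[start + 2])))
            continue
        bad = tokens[i] if i < len(tokens) else "end of input"
        errors.append(f"syntax error at {bad!r}")
        # Error mode: quietly skip symbols up to the next ';' ...
        while i < len(tokens) and tokens[i] != ";":
            i += 1
        # ... shift it and reduce by  stmt : error ';'  (then yyerrok).
        i += 1
    return stmts, errors


tokens = "a = 1 ; b = = 2 ; c = 3 ;".split()
print(parse_program(tokens))
# ([('a', 1), ('c', 3)], ["syntax error at '='"])
```

The bad statement `b = = 2 ;` is skipped as a whole, and parsing resumes with `c = 3 ;`.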

Implementing a good error recovery scheme can be a difficult task; see
Aho/Sethi/Ullman (1986) for a more comprehensive treatment of this topic.
Schreiner and Friedman have developed a systematic technique to implement
error recovery with Yacc which I found quite useful (I used it myself
to implement error recovery in the TP Yacc parser); see Schreiner/Friedman
(1985).


Yacc Library
------------

The TP Yacc library (YaccLib) unit provides some global declarations used
by the parser routine yyparse, and some variables and utility routines
which may be used to control the actions of the parser and to implement
error recovery. See the file yacclib.pas for a description of these
variables and routines.

You can also modify the Yacc library unit (and/or the code template in the
yyparse.cod file) to customize TP Yacc to your target applications.


Other Features
--------------

TP Yacc supports all additional language elements listed as "Old Features
Supported But not Encouraged" in the UNIX manual, which are provided for
backward compatibility with older versions of (UNIX) Yacc:

- literals delimited by double quotes.

- multiple-character literals. Note that these are not treated as character
  sequences but represent single tokens which are given a symbolic integer
  code just like any other token identifier. However, they will not be
  declared in the output file, so you have to make sure yourself that
  the lexical analyzer returns the correct codes for these symbols. E.g.,
  you might explicitly assign token numbers by using a definition like

     %token ':=' 257

  at the beginning of the Yacc grammar.

- \ may be used instead of %, i.e. \\ means %%, \left is the same as %left,
  etc.

- other synonyms:
  %<             for %left
  %>             for %right
  %binary or %2  for %nonassoc
  %term or %0    for %token
  %=             for %prec

- actions may also be written as = { ... } or = single-statement;

- Turbo Pascal declarations (%{ ... %}) may be put at the beginning of the
  rules section. They will be treated as local declarations of the actions
  routine.
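For the multiple-character literals mentioned above, the lexical analyzer must return exactly the code given in the %token definition. A Python stand-in for such a lexer might look like the sketch below. The codes 258 and 259 for identifiers and numbers are hypothetical. Single-character literals are assumed to be returned as their character codes, following the UNIX Yacc convention:

```python
import re

# Must agree with the grammar's definition   %token ':=' 257
ASSIGN = 257
ID, NUM = 258, 259  # hypothetical codes for the other tokens

TOKEN_RE = re.compile(r"\s*(?:(:=)|([A-Za-z_]\w*)|(\d+)|(.))")


def lex(text):
    codes = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        pos = m.end()
        assign, ident, num, other = m.groups()
        if assign:
            codes.append(ASSIGN)      # the multiple-character literal ':='
        elif ident:
            codes.append(ID)
        elif num:
            codes.append(NUM)
        else:
            codes.append(ord(other))  # e.g. ';' -> 59
    return codes


print(lex("x := 42;"))  # [258, 257, 259, 59]
```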


Implementation Restrictions
---------------------------

As with TP Lex, internal table sizes and the main memory available limit the
complexity of source grammars that TP Yacc can handle. However, the maximum
table sizes provided by TP Yacc are large enough to handle quite complex
grammars (such as the Pascal grammar in the TP Yacc distribution). The actual
table sizes are shown in the statistics printed by TP Yacc when a compilation
is finished. The given figures are "s" (states), "i" (LR0 kernel items), "t"
(shift and goto transitions) and "r" (reductions).

The default stack size of the generated parsers is yymaxdepth = 1024, as
declared in the TP Yacc library unit. This should be sufficient for any
average application, but you can change the stack size by including a
corresponding declaration in the definitions part of the Yacc grammar
(or change the value in the YaccLib unit). Note that right-recursive
grammar rules may increase stack space requirements, so it is a good
idea to use left-recursive rules wherever possible.
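You can see how recursion direction affects stack depth by tracing the symbols a shift/reduce parser holds while reading a comma-separated list. The Python sketch below hand-simulates both variants of a hypothetical `list` rule and reports the deepest stack reached. It counts grammar symbols, not parser states, and assumes at least one item:

```python
def trace_left(n):
    """list : list ',' item | item   -- n >= 1 items."""
    stack, deepest = [], 0
    for k in range(n):
        if k:
            stack.append(",")      # shift ','
        stack.append("item")       # shift item
        deepest = max(deepest, len(stack))
        if len(stack) == 1:
            stack[-1:] = ["list"]  # reduce: list : item
        else:
            stack[-3:] = ["list"]  # reduce: list : list ',' item
    return deepest


def trace_right(n):
    """list : item ',' list | item   -- n >= 1 items."""
    stack, deepest = [], 0
    for k in range(n):
        if k:
            stack.append(",")      # shift ','
        stack.append("item")       # shift item
        deepest = max(deepest, len(stack))
    # No reduction is possible before the last item has been read.
    stack[-1:] = ["list"]          # reduce: list : item
    while len(stack) > 1:
        stack[-3:] = ["list"]      # reduce: list : item ',' list
    return deepest


for n in (1, 10, 100):
    print(n, trace_left(n), trace_right(n))
# 1 1 1
# 10 3 19
# 100 3 199
```

With left recursion, the stack never holds more than three symbols. With right recursion, it grows with the length of the list.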


Differences from UNIX Yacc
--------------------------

Major differences between TP Yacc and UNIX Yacc are listed below.

- TP Yacc produces output code for Turbo Pascal, rather than for C.

- TP Yacc does not support %union definitions. Instead, a value type is
  declared by specifying the type identifier itself as the tag of a %token
  or %type definition. TP Yacc will automatically generate an appropriate
  variant record type (YYSType) which is capable of holding values of any
  of the types used in %token and %type.

  Type checking is very strict. If you use type definitions, then
  any symbol referred to in an action must have a type introduced
  in a type definition. Either the symbol must have been assigned a
  type in the definitions section, or the $<type-identifier> notation
  must be used. The syntax of the %type definition has been changed
  slightly to allow definitions of the form
     %type <type-identifier>
  (omitting the nonterminals) which may be used to declare types which
  are not assigned to any grammar symbol, but are used with the
  $<...> construct.

- The parse tables constructed by this Yacc version are slightly larger
  than those constructed by UNIX Yacc, since a reduce action will only be
  chosen as the default action if it is the only action in the state.
  In contrast, UNIX Yacc chooses a reduce action as the default action
  whenever it is the only reduce action of the state (even if there are
  other shift actions).

  This solves a bug in UNIX Yacc that makes the generated parser start
  error recovery too late with certain types of error productions (see
  also Schreiner/Friedman, "Introduction to compiler construction with
  UNIX," 1985). Also, errors will be caught sooner in most cases where
  UNIX Yacc would carry out an additional (default) reduction before
  detecting the error.

- Library routines are named differently from the UNIX version (e.g.,
  the `yyerrlab' routine takes the place of the `YYERROR' macro of UNIX
  Yacc), and, of course, all macros of UNIX Yacc (YYERROR, YYACCEPT, etc.)
  had to be implemented as procedures.