📄 tply.tex
字号:
ambiguities are also caused by design errors in the grammar. Hence, if
TP Yacc reports any parsing conflicts when constructing the parser, you
should use the \verb"-v" option to generate the parser description
(\verb".lst" file) and check whether TP Yacc resolved the conflicts correctly.
There is one type of syntactic constructs for which one often deliberately
uses an ambigious grammar as a more concise representation for a language
that could also be specified unambigiously: the syntax of expressions.
For instance, the following is an unambigious grammar for simple arithmetic
expressions:
\begin{quote}\begin{verbatim}
%token NUM
%%
expr : term
| expr '+' term
;
term : factor
| term '*' factor
;
factor : '(' expr ')'
| NUM
;
\end{verbatim}\end{quote}
You may check yourself that this grammar gives \verb"*" a higher precedence
than \verb"+" and makes both operators left-associative. The same effect can
be achieved with the following ambigious grammar using precedence definitions:
\begin{quote}\begin{verbatim}
%token NUM
%left '+'
%left '*'
%%
expr : expr '+' expr
| expr '*' expr
| '(' expr ')'
| NUM
;
\end{verbatim}\end{quote}
Without the precedence definitions, this is an ambigious grammar causing
a number of shift/reduce conflicts. The precedence definitions are used
to correctly resolve these conflicts (conflicts resolved using precedence
will not be reported by TP Yacc).
Each precedence definition introduces a new precedence level (lowest
precedence first) and specifies whether the corresponding operators
should be left-, right- or nonassociative (nonassociative operators
cannot be combined at all; example: relational operators in Pascal).
TP Yacc uses precedence information to resolve shift/reduce conflicts as
follows. Precedences are associated with each terminal occuring in a
precedence definition. Furthermore, each grammar rule is given the
precedence of its rightmost terminal (this default choice can be
overwritten using a \verb"%prec" tag; see below). To resolve a shift/reduce
conflict using precedence, both the symbol and the rule involved must
have been assigned precedences. TP Yacc then chooses the parse action
as follows:
\begin{itemize}
\item
If the symbol has higher precedence than the rule: shift.
\item
If the rule has higher precedence than the symbol: reduce.
\item
If symbol and rule have the same precedence, the associativity of the
symbol determines the parse action: if the symbol is left-associative:
reduce; if the symbol is right-associative: shift; if the symbol is
non-associative: error.
\end{itemize}
To give you an idea of how this works, let us consider our ambigious
arithmetic expression grammar (without precedences):
\begin{quote}\begin{verbatim}
%token NUM
%%
expr : expr '+' expr
| expr '*' expr
| '(' expr ')'
| NUM
;
\end{verbatim}\end{quote}
This grammar generates four shift/reduce conflicts. The description
of state 8 reads as follows:
\begin{quote}\begin{verbatim}
state 8:
*** conflicts:
shift 4, reduce 1 on '*'
shift 5, reduce 1 on '+'
expr : expr '+' expr _ (1)
expr : expr _ '+' expr
expr : expr _ '*' expr
'*' shift 4
'+' shift 5
$end reduce 1
')' reduce 1
. error
\end{verbatim}\end{quote}
In this state, we have successfully parsed a \verb"+" expression (rule 1).
When the next symbol is \verb"+" or \verb"*", we have the choice between the
reduction and shifting the symbol. Using the default shift/reduce
disambiguating rule, TP Yacc has resolved these conflicts in favour of shift.
Now let us assume the above precedence definition:
\begin{quote}\begin{verbatim}
%left '+'
%left '*'
\end{verbatim}\end{quote}
which gives \verb"*" higher precedence than \verb"+" and makes both operators
left-associative. The rightmost terminal in rule 1 is \verb"+". Hence, given
these precedence definitions, the first conflict will be resolved in favour
of shift (\verb"*" has higher precedence than \verb"+"), while the second one
is resolved in favour of reduce (\verb"+" is left-associative).
Similar conflicts arise in state 7:
\begin{quote}\begin{verbatim}
state 7:
*** conflicts:
shift 4, reduce 2 on '*'
shift 5, reduce 2 on '+'
expr : expr '*' expr _ (2)
expr : expr _ '+' expr
expr : expr _ '*' expr
'*' shift 4
'+' shift 5
$end reduce 2
')' reduce 2
. error
\end{verbatim}\end{quote}
Here, we have successfully parsed a \verb"*" expression which may be followed
by another \verb"+" or \verb"*" operator. Since \verb"*" is left-associative
and has higher precedence than \verb"+", both conflicts will be resolved in
favour of reduce.
Of course, you can also have different operators on the same precedence
level. For instance, consider the following extended version of the
arithmetic expression grammar:
\begin{quote}\begin{verbatim}
%token NUM
%left '+' '-'
%left '*' '/'
%%
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| NUM
;
\end{verbatim}\end{quote}
This puts all ``addition'' operators on the first and all ``multiplication''
operators on the second precedence level. All operators are left-associative;
for instance, \verb"5+3-2" will be parsed as \verb"(5+3)-2".
By default, TP Yacc assigns each rule the precedence of its rightmost
terminal. This is a sensible decision in most cases. Occasionally, it
may be necessary to overwrite this default choice and explicitly assign
a precedence to a rule. This can be done by putting a precedence tag
of the form
\begin{quote}\begin{verbatim}
%prec symbol
\end{verbatim}\end{quote}
at the end of the corresponding rule which gives the rule the precedence
of the specified symbol. For instance, to extend the expression grammar
with a unary minus operator, giving it highest precedence, you may write:
\begin{quote}\begin{verbatim}
%token NUM
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr %prec UMINUS
| '(' expr ')'
| NUM
;
\end{verbatim}\end{quote}
Note the use of the \verb"UMINUS" token which is not an actual input symbol
but whose sole purpose it is to give unary minus its proper precedence. If
we omitted the precedence tag, both unary and binary minus would have the
same precedence because they are represented by the same input symbol.
\subsection{Error Handling}
Syntactic error handling is a difficult area in the design of user-friendly
parsers. Usually, you will not like to have the parser give up upon the
first occurrence of an errorneous input symbol. Instead, the parser should
recover from a syntax error, that is, it should try to find a place in the
input where it can resume the parse.
TP Yacc provides a general mechanism to implement parsers with error
recovery. A special predefined \verb"error" token may be used in grammar rules
to indicate positions where syntax errors might occur. When the parser runs
into an error action (i.e., reads an errorneous input symbol) it prints out
an error message and starts error recovery by popping its stack until it
uncovers a state in which there is a shift action on the \verb"error" token.
If there is no such state, the parser terminates with return value 1,
indicating an unrecoverable syntax error. If there is such a state, the
parser takes the shift on the \verb"error" token (pretending it has seen
an imaginary \verb"error" token in the input), and resumes parsing in a
special ``error mode.''
While in error mode, the parser quietly skips symbols until it can again
perform a legal shift action. To prevent a cascade of error messages, the
parser returns to its normal mode of operation only after it has seen
and shifted three legal input symbols. Any additional error found after
the first shifted symbol restarts error recovery, but no error message
is printed. The TP Yacc library routine \verb"yyerrok" may be used to reset
the parser to its normal mode of operation explicitly.
For a simple example, consider the rule
\begin{quote}\begin{verbatim}
stmt : error ';' { yyerrok; }
\end{verbatim}\end{quote}
and assume a syntax error occurs while a statement (nonterminal \verb"stmt")
is parsed. The parser prints an error message, then pops its stack until it
can shift the token \verb"error" of the error rule. Proceeding in error mode,
it will skip symbols until it finds a semicolon, then reduces by the error
rule. The call to \verb"yyerrok" tells the parser that we have recovered from
the error and that it should proceed with the normal parse. This kind of
``panic mode'' error recovery scheme works well when statements are always
terminated with a semicolon. The parser simply skips the ``bad'' statement
and then resumes the parse.
Implementing a good error recovery scheme can be a difficult task; see
Aho/Sethi/Ullman (1986) for a more comprehensive treatment of this topic.
Schreiner and Friedman have developed a systematic technique to implement
error recovery with Yacc which I found quite useful (I used it myself
to implement error recovery in the TP Yacc parser); see Schreiner/Friedman
(1985).
\subsection{Yacc Library}
The TP Yacc library (\verb"YaccLib") unit provides some global declarations
used by the parser routine \verb"yyparse", and some variables and utility
routines which may be used to control the actions of the parser and to
implement error recovery. See the file \verb"yacclib.pas" for a description
of these variables and routines.
You can also modify the Yacc library unit (and/or the code template in the
\verb"yyparse.cod" file) to customize TP Yacc to your target applications.
\subsection{Other Features}
TP Yacc supports all additional language elements entitled as ``Old Features
Supported But not Encouraged'' in the UNIX manual, which are provided for
backward compatibility with older versions of (UNIX) Yacc:
\begin{itemize}
\item
literals delimited by double quotes.
\item
multiple-character literals. Note that these are not treated as
character sequences but represent single tokens which are given a
symbolic integer code just like any other token identifier. However,
they will not be declared in the output file, so you have to make sure
yourself that the lexical analyzer returns the correct codes for these
symbols. E.g., you might explicitly assign token numbers by using a
definition like
\begin{quote}\begin{verbatim}
%token ':=' 257
\end{verbatim}\end{quote}
at the beginning of the Yacc grammar.
\item
\verb"\" may be used instead of \verb"%", i.e. \verb"\\" means
\verb"%%", \verb"\left" is the same as \verb"%left", etc.
\item
other synonyms:
\begin{itemize}
\item \verb"%<" for \verb"%left"
\item \verb"%>" for \verb"%right"
\item \verb"%binary" or \verb"%2" for \verb"%nonassoc"
\item \verb"%term" or \verb"%0" for \verb"%token"
\item \verb"%=" for \verb"%prec"
\end{itemize}
\item
actions may also be written as \verb"= { ... }" or
\verb"= single-statement;"
\item
Turbo Pascal declarations (\verb"%{ ... %}") may be put at the
beginning of the rules section. They will be treated as local
declarations of the actions routine.
\end{itemize}
\subsection{Implementation Restrictions}
As with TP Lex, internal table sizes and the main memory available limit the
complexity of source grammars that TP Yacc can handle. However, the maximum
table sizes provided by TP
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -