📄 tply.tex

📁 Yacc例子代码
💻 TEX
📖 第 1 页 / 共 5 页
字号:
\begin{quote}\begin{verbatim}
%token NUM
%left '+'
%left '*'
%%
expr : expr '+' expr
     | expr '*' expr
     | '(' expr ')'
     | NUM
     ;
\end{verbatim}\end{quote}

When run with the \verb"-v" option on the above grammar, TP Yacc generates
the parser description listed below.

\begin{quote}\begin{verbatim}
state 0:

        $accept : _ expr $end

        '('     shift 2
        NUM     shift 3
        .       error

        expr    goto 1

state 1:

        $accept : expr _ $end
        expr : expr _ '+' expr
        expr : expr _ '*' expr

        $end    accept
        '*'     shift 4
        '+'     shift 5
        .       error

state 2:

        expr : '(' _ expr ')'

        '('     shift 2
        NUM     shift 3
        .       error

        expr    goto 6

state 3:

        expr : NUM _    (4)

        .       reduce 4

state 4:

        expr : expr '*' _ expr

        '('     shift 2
        NUM     shift 3
        .       error

        expr    goto 7

state 5:

        expr : expr '+' _ expr

        '('     shift 2
        NUM     shift 3
        .       error

        expr    goto 8

state 6:

        expr : '(' expr _ ')'
        expr : expr _ '+' expr
        expr : expr _ '*' expr

        ')'     shift 9
        '*'     shift 4
        '+'     shift 5
        .       error

state 7:

        expr : expr '*' expr _  (2)
        expr : expr _ '+' expr
        expr : expr _ '*' expr

        .       reduce 2

state 8:

        expr : expr '+' expr _  (1)
        expr : expr _ '+' expr
        expr : expr _ '*' expr

        '*'     shift 4
        $end    reduce 1
        ')'     reduce 1
        '+'     reduce 1
        .       error

state 9:

        expr : '(' expr ')' _   (3)

        .       reduce 3
\end{verbatim}\end{quote}

Each state of the parser corresponds to a certain prefix of the input
which has already been seen. The parser description lists the grammar
rules wich are parsed in each state, and indicates the portion of each
rule which has already been parsed by an underscore. In state 0, the
start state of the parser, the parsed rule is
\begin{quote}\begin{verbatim}
        $accept : expr $end
\end{verbatim}\end{quote}

This is not an actual grammar rule, but a starting rule automatically
added by TP Yacc. In general, it has the format
\begin{quote}\begin{verbatim}
        $accept : X $end
\end{verbatim}\end{quote}
where \verb"X" is the start nonterminal of the grammar, and \verb"$end" is
a pseudo token denoting end-of-input (the \verb"$end" symbol is used by the
parser to determine when it has successfully parsed the input).

The description of the start rule in state 0,
\begin{quote}\begin{verbatim}
        $accept : _ expr $end
\end{verbatim}\end{quote}
with the underscore positioned before the \verb"expr" symbol, indicates that
we are at the beginning of the parse and are ready to parse an expression
(nonterminal \verb"expr").

The parser maintains a stack to keep track of states visited during the
parse. There are two basic kinds of actions in each state: {\em shift\/},
which reads an input symbol and pushes the corresponding next state on top of
the stack, and {\em reduce\/} which pops a number of states from the stack
(corresponding to the number of right-hand side symbols of the rule used
in the reduction) and consults the {\em goto\/} entries of the uncovered
state to find the transition corresponding to the left-hand side symbol of the
reduced rule.

In each step of the parse, the parser is in a given state (the state on
top of its stack) and may consult the current {\em lookahead symbol\/}, the
next symbol in the input, to determine the parse action -- shift or reduce --
to perform. The parser terminates as soon as it reaches state 1 and reads
in the endmarker, indicated by the {\em accept\/} action on \verb"$end" in
state 1.

Sometimes the parser may also carry out an action without inspecting the
current lookahead token. This is the case, e.g., in state 3 where the
only action is reduction by rule 4:
\begin{quote}\begin{verbatim}
        .       reduce 4
\end{verbatim}\end{quote}

The default action in a state can also be {\em error\/} indicating that any
other input represents a syntax error. (In case of such an error the
parser will start syntactic error recovery, as described in Section
{\em Error Handling\/}.)

Now let us see how the parser responds to a given input. We consider the
input string \verb"2+5*3" which is presented to the parser as the token
sequence:
\begin{quote}\begin{verbatim}
   NUM + NUM * NUM
\end{verbatim}\end{quote}

Table \ref{tab2} traces the corresponding actions of the parser. We also
show the current state in each move, and the remaining states on the stack.

\begin{table*}\centering
   \begin{tabular}{l|l|l|p{8cm}}
      \hline\hline
      {\sc State}& {\sc Stack}& {\sc Lookahead}& {\sc Action}\\
      \hline
0 &               &    \verb"NUM"    &    shift state 3\\
3 &     0         &                  &    reduce rule 4 (pop 1 state, uncovering state
                                          0, then goto state 1 on symbol \verb"expr")\\
1 &     0         &    \verb"+"      &    shift state 5\\
5 &     1 0       &    \verb"NUM"    &    shift state 3\\
3 &     5 1 0     &                  &    reduce rule 4 (pop 1 state, uncovering state
                                          5, then goto state 8 on symbol \verb"expr")\\
8 &     5 1 0     &    \verb"*"      &    shift 4\\
4 &     8 5 1 0   &    \verb"NUM"    &    shift 3\\
3 &     4 8 5 1 0 &                  &    reduce rule 4 (pop 1 state, uncovering state
                                          4, then goto state 7 on symbol \verb"expr")\\
7 &     4 8 5 1 0 &                  &    reduce rule 2 (pop 3 states, uncovering state
                                          5, then goto state 8 on symbol \verb"expr")\\
8 &     5 1 0     &    \verb"$end"   &    reduce rule 1 (pop 3 states, uncovering state
                                          0, then goto state 1 on symbol \verb"expr")\\
1 &     0         &    \verb"$end"   &    accept\\
      \hline
   \end{tabular}
   \caption{Parse of \protect\verb"NUM + NUM * NUM".}
   \label{tab2}
\end{table*}

It is also instructive to see how the parser responds to illegal inputs.
E.g., you may try to figure out what the parser does when confronted with:
\begin{quote}\begin{verbatim}
   NUM + )
\end{verbatim}\end{quote}
or:
\begin{quote}\begin{verbatim}
   ( NUM * NUM
\end{verbatim}\end{quote}

You will find that the parser, sooner or later, will always run into an
error action when confronted with errorneous inputs. An LALR parser will
never shift an invalid symbol and thus will always find syntax errors as
soon as it is possible during a left-to-right scan of the input.

TP Yacc provides a debugging option (\verb"-d") that may be used to trace
the actions performed by the parser. When a grammar is compiled with the
\verb"-d" option, the generated parser will print out the actions as it
parses its input.

\subsection{Ambigious Grammars}

There are situations in which TP Yacc will not produce a valid parser for
a given input language. LALR(1) parsers are restricted to one-symbol
lookahead on which they have to base their parsing decisions. If a
grammar is ambigious, or cannot be parsed unambigiously using one-symbol
lookahead, TP Yacc will generate parsing conflicts when constructing the
parse table. There are two types of such conflicts: {\em shift/reduce
conflicts\/} (when there is both a shift and a reduce action for a given
input symbol in a given state), and {\em reduce/reduce\/} conflicts (if
there is more than one reduce action for a given input symbol in a given
state). Note that there never will be a shift/shift conflict.

When a grammar generates parsing conflicts, TP Yacc prints out the number
of shift/reduce and reduce/reduce conflicts it encountered when constructing
the parse table. However, TP Yacc will still generate the output code for the
parser. To resolve parsing conflicts, TP Yacc uses the following built-in
disambiguating rules:

\begin{itemize}
   \item
      in a shift/reduce conflict, TP Yacc chooses the shift action.
   \item
      in a reduce/reduce conflict, TP Yacc chooses reduction of the first
      grammar rule.
\end{itemize}

The shift/reduce disambiguating rule correctly resolves a type of
ambiguity known as the ``dangling-else ambiguity'' which arises in the
syntax of conditional statements of many programming languages (as in
Pascal):

\begin{quote}\begin{verbatim}
%token IF THEN ELSE
%%
stmt : IF expr THEN stmt
     | IF expr THEN stmt ELSE stmt
     ;
\end{verbatim}\end{quote}

This grammar is ambigious, because a nested construct like
\begin{quote}\begin{verbatim}
   IF expr-1 THEN IF expr-2 THEN stmt-1
     ELSE stmt-2
\end{verbatim}\end{quote}
can be parsed two ways, either as:
\begin{quote}\begin{verbatim}
   IF expr-1 THEN ( IF expr-2 THEN stmt-1
     ELSE stmt-2 )
\end{verbatim}\end{quote}
or as:
\begin{quote}\begin{verbatim}
   IF expr-1 THEN ( IF expr-2 THEN stmt-1 )
     ELSE stmt-2
\end{verbatim}\end{quote}

The first interpretation makes an \verb"ELSE" belong to the last unmatched
\verb"IF" which also is the interpretation chosen in most programming
languages. This is also the way that a TP Yacc-generated parser will parse
the construct since the shift/reduce disambiguating rule has the effect of
neglecting the reduction of \verb"IF expr-2 THEN stmt-1"; instead, the parser
will shift the \verb"ELSE" symbol which eventually leads to the reduction of
\verb"IF expr-2 THEN stmt-1 ELSE stmt-2".

The reduce/reduce disambiguating rule is used to resolve conflicts that
arise when there is more than one grammar rule matching a given construct.
Such ambiguities are often caused by ``special case constructs'' which may be
given priority by simply listing the more specific rules ahead of the more
general ones.

For instance, the following is an excerpt from the grammar describing the
input language of the UNIX equation formatter EQN:

\begin{quote}\begin{verbatim}
%right SUB SUP
%%
expr : expr SUB expr SUP expr
     | expr SUB expr
     | expr SUP expr
     ;
\end{verbatim}\end{quote}

Here, the \verb"SUB" and \verb"SUP" operator symbols denote sub- and
superscript, respectively. The rationale behind this example is that
an expression involving both sub- and superscript is often set differently
from a superscripted subscripted expression (compare $x_i^n$ to ${x_i}^n$).
This special case is therefore caught by the first rule in the above example
which causes a reduce/reduce conflict with rule 3 in expressions like
\verb"expr-1 SUB expr-2 SUP expr-3". The conflict is resolved in favour of
the first rule.

In both cases discussed above, the ambiguities could also be eliminated
by rewriting the grammar accordingly (although this yields more complicated
and less readable grammars). This may not always be the case. Often
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -