📄 tply.tex

📁 Yacc例子代码
💻 TEX
📖 第 1 页 / 共 5 页
字号:
      terminal definitions: Definitions of the form
      \begin{quote}\begin{verbatim}
   %token symbol ...
      \end{verbatim}\end{quote}
      are used to declare the terminal symbols (``tokens'') of the target
      language. Any identifier not introduced in a \verb"%token" definition
      will be treated as a nonterminal symbol.
    
      As far as TP Yacc is concerned, tokens are atomic symbols which do not
      have an innert structure. A lexical analyzer must be provided which
      takes on the task of tokenizing the input stream and return the
      individual tokens and literals to the parser (see Section {\em Lexical
      Analysis\/}).
   \item
      precedence definitions: Operator symbols (terminals) may be associated
      with a precedence by means of a precedence definition which may have
      one of the following forms
      \begin{quote}\begin{verbatim}
   %left symbol ...
   %right symbol ...
   %nonassoc symbol ...
      \end{verbatim}\end{quote}
      which are used to declare left-, right- and nonassociative operators,
      respectively. Each precedence definition introduces a new precedence
      level, lowest precedence first. E.g., you may write:
      \begin{quote}\begin{verbatim}
   %nonassoc '<' '>' '=' GEQ LEQ NEQ
      /* relational operators */
   %left     '+' '-'  OR
      /* addition operators */
   %left     '*' '/' AND
     /* multiplication operators */
   %right    NOT UMINUS
     /* unary operators */
      \end{verbatim}\end{quote}

      A terminal identifier introduced in a precedence definition may, but
      need not, appear in a \verb"%token" definition as well.
   \item
      type definitions: Any (terminal or nonterminal) grammar symbol may be
      associated with a type identifier which is used in the processing of
      semantic values. Type tags of the form \verb"<name>" may be used in
      token and precedence definitions to declare the type of a terminal
      symbol, e.g.:
      \begin{quote}\begin{verbatim}
   %token <Real>  NUM
   %left  <AddOp> '+' '-'
      \end{verbatim}\end{quote}

      To declare the type of a nonterminal symbol, use a type definition of
      the form:
      \begin{quote}\begin{verbatim}
   %type <name> symbol ...
      \end{verbatim}\end{quote}
      e.g.:
      \begin{quote}\begin{verbatim}
   %type <Real> expr
      \end{verbatim}\end{quote}

      In a \verb"%type" definition, you may also omit the nonterminals, i.e.
      you may write:
      \begin{quote}\begin{verbatim}
   %type <name>
      \end{verbatim}\end{quote}

      This is useful when a given type is only used with type casts (see
      Section {\em Grammar Rules and Actions\/}), and is not associated with
      a specific nonterminal.
   \item
      Turbo Pascal declarations: You may also include arbitrary Turbo Pascal
      code in the definitions section, enclosed in \verb"%{ %}". This code
      will be inserted as global declarations into the output file, unchanged.
\end{itemize}

\subsection{Grammar Rules and Actions}

The second part of a TP Yacc grammar contains the grammar rules for the
target language. Grammar rules have the format

\begin{quote}\begin{verbatim}
   name : symbol ... ;
\end{verbatim}\end{quote}

The left-hand side of a rule must be an identifier (which denotes a
nonterminal symbol). The right-hand side may be an arbitrary (possibly
empty) sequence of nonterminal and terminal symbols (including literals
enclosed in single quotes). The terminating semicolon may also be omitted.
Different rules for the same left-hand side symbols may be written using
the \verb"|" character to separate the different alternatives:

\begin{quote}\begin{verbatim}
   name : symbol ...
        | symbol ...
        ...
        ;
\end{verbatim}\end{quote}

For instance, to specify a simple grammar for arithmetic expressions, you
may write:

\begin{quote}\begin{verbatim}
%left '+' '-'
%left '*' '/'
%token NUM
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | NUM
     ;
\end{verbatim}\end{quote}

(The \verb"%left" definitions at the beginning of the grammar are needed to
specify the precedence and associativity of the operator symbols. This will be
discussed in more detail in Section {\em Ambigious Grammars\/}.)

Grammar rules may contain actions -- Turbo Pascal statements enclosed in
\verb"{ }" -- to be executed as the corresponding rules are recognized.
Furthermore, rules may return values, and access values returned by other
rules. These ``semantic'' values are written as \verb"$$" (value of the
left-hand side nonterminal) and \verb"$i" (value of the $i$th right-hand
side symbol). They are kept on a special value stack which is maintained
automatically by the parser.

Values associated with terminal symbols must be set by the lexical analyzer
(more about this in Section {\em Lexical Analysis\/}). Actions of the form
\verb"$$ := $1" can frequently be omitted, since it is the default action
assumed by TP Yacc for any rule that does not have an explicit action.

By default, the semantic value type provided by Yacc is \verb"Integer". You
can also put a declaration like
\begin{quote}\begin{verbatim}
   %{
   type YYSType = Real;
   %}
\end{verbatim}\end{quote}
into the definitions section of your Yacc grammar to change the default value
type. However, if you have different value types, the preferred method is to
use type definitions as discussed in Section {\em Definitions\/}. When such
type definitions are given, TP Yacc handles all the necessary details of the
\verb"YYSType" definition and also provides a fair amount of type checking
which makes it easier to find type errors in the grammar.

For instance, we may declare the symbols \verb"NUM" and \verb"expr" in the
example above to be of type \verb"Real", and then use these values to
evaluate an expression as it is parsed.

\begin{quote}\begin{verbatim}
%left '+' '-'
%left '*' '/'
%token <Real> NUM
%type  <Real> expr
%%
expr : expr '+' expr   { $$ := $1+$3; }
     | expr '-' expr   { $$ := $1-$3; }
     | expr '*' expr   { $$ := $1*$3; }
     | expr '/' expr   { $$ := $1/$3; }
     | '(' expr ')'    { $$ := $2;    }
     | NUM
     ;
\end{verbatim}\end{quote}

(Note that we omitted the action of the last rule. The ``copy action''
\verb"$$ := $1" required by this rule is automatically added by TP Yacc.)

Actions may not only appear at the end, but also in the middle of a rule
which is useful to perform some processing before a rule is fully parsed.
Such actions inside a rule are treated as special nonterminals which are
associated with an empty right-hand side. Thus, a rule like
\begin{quote}\begin{verbatim}
   x : y { action; } z
\end{verbatim}\end{quote}
will be treated as:
\begin{quote}\begin{verbatim}
  x : y $act z
  $act : { action; }
\end{verbatim}\end{quote}

Actions inside a rule may also access values to the left of the action,
and may return values by assigning to the \verb"$$" value. The value returned
by such an action can then be accessed by other actions using the usual
\verb"$i" notation. E.g., we may write:
\begin{quote}\begin{verbatim}
   x : y { $$ := 2*$1; } z { $$ := $2+$3; }
\end{verbatim}\end{quote}
which has the effect of setting the value of \verb"x" to
\begin{quote}\begin{verbatim}
   2*(the value of y)+(the value of z).
\end{verbatim}\end{quote}

Sometimes it is desirable to access values in enclosing rules. This can be
done using the notation \verb"$i" with $i\leq 0$. \verb"$0" refers to the
first value ``to the left'' of the current rule, \verb"$-1" to the second,
and so on. Note that in this case the referenced value depends on the actual
contents of the parse stack, so you have to make sure that the requested
values are always where you expect them.

There are some situations in which TP Yacc cannot easily determine the
type of values (when a typed parser is used). This is true, in particular,
for values in enclosing rules and for the \verb"$$" value in an action inside
a rule. In such cases you may use a type cast to explicitly specify the type
of a value. The format for such type casts is \verb"$<name>$" (for left-hand
side values) and \verb"$<name>i" (for right-hand side values) where
\verb"name" is a type identifier (which must occur in a \verb"%token",
precedence or \verb"%type" definition).

\subsection{Auxiliary Procedures}

The third section of a TP Yacc program is optional. If it is present, it
may contain any Turbo Pascal code (such as supporting routines or a main
program) which is tacked on to the end of the output file.

\subsection{Lexical Analysis}

For any TP Yacc-generated parser, the programmer must supply a lexical
analyzer routine named \verb"yylex" which performs the lexical analysis for
the parser. This routine must be declared as

\begin{quote}\begin{verbatim}
   function yylex : Integer;
\end{verbatim}\end{quote}

The \verb"yylex" routine may either be prepared by hand, or by using the
lexical analyzer generator TP Lex (see Section {\em TP Lex\/}).

The lexical analyzer must be included in your main program behind the
parser subroutine (the \verb"yyparse" code template includes a forward
definition of the \verb"yylex" routine such that the parser can access the
lexical analyzer). For instance, you may put the lexical analyzer
routine into the auxiliary procedures section of your TP Yacc grammar,
either directly, or by using the the Turbo Pascal include directive
(\verb"$I").

The parser repeatedly calls the \verb"yylex" routine to tokenize the input
stream and obtain the individual lexical items in the input. For any
literal character, the \verb"yylex" routine has to return the corresponding
character code. For the other, symbolic, terminals of the input language,
the lexical analyzer must return corresponding integer codes. These are
assigned automatically by TP Yacc in the order in which token definitions
appear in the definitions section of the source grammar. The lexical
analyzer can access these values through corresponding integer constants
which are declared by TP Yacc in the output file.

For instance, if
\begin{quote}\begin{verbatim}
   %token NUM
\end{verbatim}\end{quote}
is the first definition in the Yacc grammar, then TP Yacc will create
a corresponding constant declaration
\begin{quote}\begin{verbatim}
   const NUM = 257;
\end{verbatim}\end{quote}
in the output file (TP Yacc automatically assigns symbolic token numbers
starting at 257; 1 thru 255 are reserved for character literals, 0 denotes
end-of-file, and 256 is reserved for the special error token which will be
discussed in Section {\em Error Handling\/}). This definition may then be
used, e.g., in a corresponding TP Lex program as follows:
\begin{quote}\begin{verbatim}
   [0-9]+   return(NUM);
\end{verbatim}\end{quote}

You can also explicitly assign token numbers in the grammar. For this
purpose, the first occurrence of a token identifier in the definitions
section may be followed by an unsigned integer. E.g. you may write:
\begin{quote}\begin{verbatim}
   %token NUM 299
\end{verbatim}\end{quote}

Besides the return value of \verb"yylex", the lexical analyzer routine may
also return an additional semantic value for the recognized token. This value
is assigned to a variable named \verb"yylval" and may then be accessed in
actions through the \verb"$i" notation (see above, Section {\em Grammar
Rules and Actions\/}). The \verb"yylval" variable is of type \verb"YYSType"
(the semantic value type, \verb"Integer" by default); its declaration may be
found in the \verb"yyparse.cod" file.

For instance, to assign an \verb"Integer" value to a \verb"NUM" token in the
above example, we may write:

\begin{quote}\begin{verbatim}
   [0-9]+   begin
              val(yytext, yylval, code);
              return(NUM);
            end;
\end{verbatim}\end{quote}

This assigns \verb"yylval" the value of the \verb"NUM" token (using the Turbo
Pascal standard procedure \verb"val").

If a parser uses tokens of different types (via a \verb"%token <name>"
definition), then the \verb"yylval" variable will not be of type
\verb"Integer", but instead of a corresponding variant record type which is
capable of holding all the different value types declared in the TP Yacc
grammar. In this case, the lexical analyzer must assign a semantic value to
the corresponding record component which is named \verb"yy"{\em name\/}
(where {\em name\/} stands for the corresponding type identifier).

E.g., if token \verb"NUM" is declared \verb"Real":
\begin{quote}\begin{verbatim}
   %token <Real> NUM
\end{verbatim}\end{quote}
then the value for token \verb"NUM" must be assigned to \verb"yylval.yyReal".

\subsection{How The Parser Works}

TP Yacc uses the LALR(1) technique developed by Donald E.\ Knuth and F.\
DeRemer to construct a simple, efficient, non-backtracking bottom-up
parser for the source grammar. The LALR parsing technique is described
in detail in Aho/Sethi/Ullman (1986). It is quite instructive to take a
look at the parser description TP Yacc generates from a small sample
grammar, to get an idea of how the LALR parsing algorithm works. We
consider the following simplified version of the arithmetic expression
grammar:
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -