📄 tply.doc

📁 Yacc例子代码
💻 DOC
📖 第 1 页 / 共 4 页
字号:

%start x y
%%
<x>a    start(y);
<y>b    start(x);
%%
begin
  start(x); if yylex=0 then ;
end.

Upon initialization, the lexical analyzer is put into state x. It then
proceeds in state x until it matches an a which puts it into state y.
In state y it may match a b which puts it into state x again, etc.

Start conditions are useful when certain constructs have to be analyzed
differently depending on some left context (such as a special character
at the beginning of the line), and if multiple lexical analyzers have to
work in concert. If a rule is not prefixed with a start condition, it is
valid in all user-defined start states, as well as in the lexical analyzer's
default start state.


Lex Library
-----------

The TP Lex library (LexLib) unit provides various variables and routines
which are used by Lex-generated lexical analyzers and application programs.
It provides the input and output streams and other internal data structures
used by the lexical analyzer routine, and supplies some variables and utility
routines which may be used by actions and application programs. Refer to
the file lexlib.pas for a closer description.

You can also modify the Lex library unit (and/or the code template in the
yylex.cod file) to customize TP Lex to your target applications. E.g.,
you might wish to optimize the code of the lexical analyzer for some
special application, make the analyzer read from/write to memory instead
of files, etc.


Implementation Restrictions
---------------------------

Internal table sizes and the main memory available limit the complexity of
source grammars that TP Lex can handle. There is currently no possibility to
change internal table sizes (apart from modifying the sources of TP Lex
itself), but the maximum table sizes provided by TP Lex seem to be large
enough to handle most realistic applications. The actual table sizes depend on
the particular implementation (they are much larger than the defaults if TP
Lex has been compiled with one of the 32 bit compilers such as Delphi 2 or
Free Pascal), and are shown in the statistics printed by TP Lex when a
compilation is finished. The units given there are "p" (positions, i.e. items
in the position table used to construct the DFA), "s" (DFA states) and "t"
(transitions of the generated DFA).

As implemented, the generated DFA table is stored as a typed array constant
which is inserted into the yylex.cod code template. The transitions in each
state are stored in order. Of course it would have been more efficient to
generate a big CASE statement instead, but I found that this may cause
problems with the encoding of large DFA tables because Turbo Pascal has
a quite rigid limit on the code size of individual procedures. I decided to
use a scheme in which transitions on different symbols to the same state are
merged into one single transition (specifying a character set and the
corresponding next state). This keeps the number of transitions in each state
quite small and still allows a fairly efficient access to the transition
table.

The TP Lex program has an option (-o) to optimize DFA tables. This causes a
minimal DFA to be generated, using the algorithm described in Aho, Sethi,
Ullman (1986). Although the absolute limit on the number of DFA states that TP
Lex can handle is at least 300, TP Lex poses an additional restriction (100)
on the number of states in the initial partition of the DFA optimization
algorithm. Thus, you may get a fatal `integer set overflow' message when using
the -o option even when TP Lex is able to generate an unoptimized DFA. In such
cases you will just have to be content with the unoptimized DFA. (Hopefully,
this will be fixed in a future version. Anyhow, using the merged transitions
scheme described above, TP Lex usually constructs unoptimized DFA's which are
not far from being optimal, and thus in most cases DFA optimization won't have
a great impact on DFA table sizes.)


Differences from UNIX Lex
-------------------------

Major differences between TP Lex and UNIX Lex are listed below.

- TP Lex produces output code for Turbo Pascal, rather than for C.

- Character tables (%T) are not supported; neither are any directives
  to determine internal table sizes (%p, %n, etc.).

- Library routines are named differently from the UNIX version (e.g.,
  the `start' routine takes the place of the `BEGIN' macro of UNIX
  Lex), and, of course, all macros of UNIX Lex (ECHO, REJECT, etc.) had
  to be implemented as procedures.

- The TP Lex library unit starts counting line numbers at 0, incrementing
  the count BEFORE a line is read (in contrast, UNIX Lex initializes
  yylineno to 1 and increments it AFTER the line end has been read). This
  is motivated by the way in which TP Lex maintains the current line,
  and will not affect your programs unless you explicitly reset the
  yylineno value (e.g., when opening a new input file). In such a case
  you should set yylineno to 0 rather than 1.




TP Yacc
=======

This section describes the TP Yacc compiler compiler.


Usage
-----

yacc [options] yacc-file[.y] [output-file[.pas]]


Options
-------

-v  "Verbose:" TP Yacc generates a readable description of the generated
    parser, written to yacc-file with new extension .lst.

-d  "Debug:" TP Yacc generates parser with debugging output.


Description
-----------

TP Yacc is a program that lets you prepare parsers from the description
of input languages by BNF-like grammars. You simply specify the grammar
for your target language, augmented with the Turbo Pascal code necessary
to process the syntactic constructs, and TP Yacc translates your grammar
into the Turbo Pascal code for a corresponding parser subroutine named
yyparse.

TP Yacc parses the source grammar contained in yacc-file (with default
suffix .y) and writes the constructed parser subroutine to the specified
output-file (with default suffix .pas); if no output file is specified,
output goes to yacc-file with new suffix .pas. If any errors are found
during compilation, error messages are written to the list file (yacc-file
with new suffix .lst).

The generated parser routine, yyparse, is declared as:

   function yyparse : Integer;

This routine may be called by your main program to execute the parser.
The return value of the yyparse routine denotes success or failure of
the parser (possible return values: 0 = success, 1 = unrecoverable syntax
error or parse stack overflow).

Similar to TP Lex, the code template for the yyparse routine may be found in
the yyparse.cod file. The rules for locating this file are analogous to those
of TP Lex (see Section `TP Lex').

The TP Yacc library (YaccLib) unit is required by programs using Yacc-
generated parsers; you will therefore have to put an appropriate uses clause
into your program or unit that contains the parser routine. The YaccLib unit
also provides some routines which may be used to control the actions of the
parser. See the file yacclib.pas for further information.


Yacc Source
-----------

A TP Yacc program consists of three sections separated with the %% delimiter:

definitions
%%
rules
%%
auxiliary procedures


The TP Yacc language is free-format: whitespace (blanks, tabs and newlines)
is ignored, except if it serves as a delimiter. Comments have the C-like
format /* ... */. They are treated as whitespace. Grammar symbols are denoted
by identifiers which have the usual form (letter, including underscore,
followed by a sequence of letters and digits; upper- and lowercase is
distinct). The TP Yacc language also has some keywords which always start
with the % character. Literals are denoted by characters enclosed in single
quotes. The usual C-like escapes are recognized:

\n     denotes newline
\r     denotes carriage return
\t     denotes tab
\b     denotes backspace
\f     denotes form feed
\NNN   denotes character no. NNN in octal base


Definitions
-----------

The first section of a TP Yacc grammar serves to define the symbols used in
the grammar. It may contain the following types of definitions:

- start symbol definition: A definition of the form

     %start symbol

  declares the start nonterminal of the grammar (if this definition is
  omitted, TP Yacc assumes the left-hand side nonterminal of the first
  grammar rule as the start symbol of the grammar).

- terminal definitions: Definitions of the form

     %token symbol ...

  are used to declare the terminal symbols ("tokens") of the target
  language. Any identifier not introduced in a %token definition will
  be treated as a nonterminal symbol.

  As far as TP Yacc is concerned, tokens are atomic symbols which do not
  have an innert structure. A lexical analyzer must be provided which
  takes on the task of tokenizing the input stream and return the
  individual tokens and literals to the parser (see Section `Lexical
  Analysis').

- precedence definitions: Operator symbols (terminals) may be associated
  with a precedence by means of a precedence definition which may have
  one of the following forms

     %left symbol ...
     %right symbol ...
     %nonassoc symbol ...

  which are used to declare left-, right- and nonassociative operators,
  respectively. Each precedence definition introduces a new precedence
  level, lowest precedence first. E.g., you may write:

     %nonassoc '<' '>' '=' GEQ LEQ NEQ  /* relational operators */
     %left     '+' '-'  OR              /* addition operators */
     %left     '*' '/' AND              /* multiplication operators */
     %right    NOT UMINUS               /* unary operators */

  A terminal identifier introduced in a precedence definition may, but
  need not, appear in a %token definition as well.

- type definitions: Any (terminal or nonterminal) grammar symbol may be
  associated with a type identifier which is used in the processing of
  semantic values. Type tags of the form <name> may be used in token and
  precedence definitions to declare the type of a terminal symbol, e.g.:

     %token <Real>  NUM
     %left  <AddOp> '+' '-'

  To declare the type of a nonterminal symbol, use a type definition of
  the form:

     %type <name> symbol ...

  e.g.:

     %type <Real> expr

  In a %type definition, you may also omit the nonterminals, i.e. you
  may write:

     %type <name>

  This is useful when a given type is only used with type casts (see
  Section `Grammar Rules and Actions'), and is not associated with a
  specific nonterminal.

- Turbo Pascal declarations: You may also include arbitrary Turbo Pascal
  code in the definitions section, enclosed in %{ %}. This code will be
  inserted as global declarations into the output file, unchanged.


Grammar Rules and Actions
-------------------------

The second part of a TP Yacc grammar contains the grammar rules for the
target language. Grammar rules have the format

   name : symbol ... ;

The left-hand side of a rule must be an identifier (which denotes a
nonterminal symbol). The right-hand side may be an arbitrary (possibly
empty) sequence of nonterminal and terminal symbols (including literals
enclosed in single quotes). The terminating semicolon may also be omitted.
Different rules for the same left-hand side symbols may be written using
the | character to separate the different alternatives:

   name : symbol ...
        | symbol ...
        ...
        ;

For instance, to specify a simple grammar for arithmetic expressions, you
may write:

%left '+' '-'
%left '*' '/'
%token NUM
%%
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '(' expr ')'
     | NUM
     ;

(The %left definitions at the beginning of the grammar are needed to specify
the precedence and associativity of the operator symbols. This will be
discussed in more detail in Section `Ambigious Grammars'.)

Grammar rules may contain actions - Turbo Pascal statements enclosed in
{ } - to be executed as the corresponding rules are recognized. Furthermore,
rules may return values, and access values returned by other rules. These
"semantic" values are written as $$ (value of the left-hand side nonterminal)
and $i (value of the ith right-hand side symbol). They are kept on a special
value stack which is maintained automatically by the parser.

Values associated with terminal symbols must be set by the lexical analyzer
(more about this in Section `Lexical Analysis'). Actions of the form $$ := $1
can frequently be omitted, since it is the default action assumed by TP Yacc
for any rule that does not have an explicit action.

By default, the semantic value type provided by Yacc is Integer. You can
also put a declaration like

   %{
   type YYSType = Real;
   %}

into the definitions section of your Yacc grammar to change the default value
type. However, if you have different value types, the preferred method is to
use type definitions as discussed in Section `Definitions'. When such type
definitions are given, TP Yacc handles all the necessary details of the
YYSType definition and also provides a fair amount of type checking which
makes it easier to find type errors in the grammar.

For instance, we may declare the symbols NUM and expr in the example above
to be of type Real, and then use these values to evaluate an expression as
it is parsed.

%left '+' '-'
%left '*' '/'
%token <Real> NUM
%type  <Real> expr
%%
expr : expr '+' expr   { $$ := $1+$3; }
     | expr '-' expr   { $$ := $1-$3; }
     | expr '*' expr   { $$ := $1*$3; }
     | expr '/' expr   { $$ := $1/$3; }
     | '(' expr ')'    { $$ := $2;    }
     | NUM
     ;

(Note that we omitted the action of the last rule. The "copy action"
$$ := $1 required by this rule is automatically added by TP Yacc.)

Actions may not only appear at the end, but also in the middle of a rule
which is useful to perform some processing before a rule is fully parsed.
Such actions inside a rule are treated as special nonterminals which are
associated with an empty right-hand side. Thus, a rule like

   x : y { action; } z

will be treated as:

  x : y $act z
  $act : { action; }

Actions inside a rule may also access values to the left of the action,
and may return values by assigning to the $$ value. The value returned
by such an action can then be accessed by other actions using the usual $i
notation. E.g., we may write:

   x : y { $$ := 2*$1; } z { $$ := $2+$3; }

which has the effect of setting the value of x to

   2*(the value of y)+(the value of z).

Sometimes it is desirable to access values in enclosing rules. This can be
done using the notation $i with i<=0. $0 refers to the first value "to the
left" of the current rule, $-1 to the second, and so on. Note that in this
case the referenced value depends on the actual contents of the parse stack,
so you have to make sure that the requested values are always where you
expect them.
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -