📄 tply.doc

📁 Yacc例子代码
💻 DOC
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页


      TP Lex and Yacc - The Compiler Writer's Tools for Turbo Pascal
      == === === ==== = === ======== ======== ===== === ===== ======

                     Version 4.1 User Manual
                     ======= === ==== ======

                         Albert Graef
                 Department of Musicinformatics
               Johannes Gutenberg-University Mainz

               ag@muwiinfa.geschichte.uni-mainz.de

                          April 1998


Introduction
============

This document describes the TP Lex and Yacc compiler generator toolset. These
tools are designed especially to help you prepare compilers and similar
programs like text processing utilities and command language interpreters with
the Turbo Pascal (TM) programming language.

TP Lex and Yacc are Turbo Pascal adaptions of the well-known UNIX (TM)
utilities Lex and Yacc, which were written by M.E. Lesk and S.C. Johnson at
Bell Laboratories, and are used with the C programming language. TP Lex and
Yacc are intended to be approximately "compatible" with these programs.
However, they are an independent development of the author, based on the
techniques described in the famous "dragon book" of Aho, Sethi and Ullman
(Aho, Sethi, Ullman: "Compilers : principles, techniques and tools," Reading
(Mass.), Addison-Wesley, 1986).

Version 4.1 of TP Lex and Yacc works with all recent flavours of Turbo/Borland
Pascal, including Delphi, and with the Free Pascal Compiler, a free Turbo
Pascal-compatible compiler which currently runs on DOS and Linux (other ports
are under development). Recent information about TP Lex/Yacc, and the sources
are available from the TPLY homepage:

   http://www.musikwissenschaft.uni-mainz.de/~ag/tply

For information about the Free Pascal Compiler, please refer to:

   http://www.freepascal.org

TP Lex and Yacc, like any other tools of this kind, are not intended for
novices or casual programmers; they require extensive programming experience
as well as a thorough understanding of the principles of parser design and
implementation to be put to work successfully. But if you are a seasoned Turbo
Pascal programmer with some background in compiler design and formal language
theory, you will almost certainly find TP Lex and Yacc to be a powerful
extension of your Turbo Pascal toolset.

This manual tells you how to get started with the TP Lex and Yacc programs and
provides a short description of these programs. Some knowledge about the C
versions of Lex and Yacc will be useful, although not strictly necessary. For
further reading, you may also refer to:

- Aho, Sethi and Ullman: "Compilers : principles, techniques and tools."
  Reading (Mass.), Addison-Wesley, 1986.

- Johnson, S.C.: "Yacc - yet another compiler-compiler." CSTR-32, Bell
  Telephone Laboratories, 1974.

- Lesk, M.E.: "Lex - a lexical analyser generator." CSTR-39, Bell Telephone
  Laboratories, 1975.

- Schreiner, Friedman: "Introduction to compiler construction with UNIX."
  Prentice-Hall, 1985.

- The Unix Programmer's Manual, Sections `Lex' and `Yacc'.


Credits
-------

I would like to thank Berend de Boer (berend@pobox.com), who adapted TP Lex
and Yacc to take advantage of the large memory models in Borland Pascal 7.0
and Delphi, and Michael Van Canneyt (Michael.VanCanneyt@fys.kuleuven.ac.be),
the maintainer of the Linux version of the Free Pascal compiler, who is
responsible for the Free Pascal port. And of course thanks are due to the many
TP Lex/Yacc users all over the world for their support and comments which
helped to improve these programs.


Getting Started
---------------

Instructions on how to compile and install TP Lex and Yacc on all supported
platforms can be found in the README file contained in the distribution.

Once you have installed TP Lex and Yacc on your system, you can compile your
first TP Lex and Yacc program expr. Expr is a simple desktop calculator
program contained in the distribution, which consists of a lexical analyzer in
the TP Lex source file exprlex.l and the parser and main program in the TP
Yacc source file expr.y. To compile these programs, issue the commands

   lex exprlex
   yacc expr

That's it! You now have the Turbo Pascal sources (exprlex.pas and expr.pas)
for the expr program. Use the Turbo Pascal compiler to compile these programs
as usual:

   tpc expr

(Of course, the precise compilation command depends on the type of compiler
you are using. Thus you may have to replace tpc with bpc, dcc or dcc32,
depending on the version of the Turbo/Borland/Delphi compiler you have, and
with ppc386 for the Free Pascal compiler. If you are using TP Lex and Yacc
with Free Pascal under Linux, the corresponding commands are:

   plex exprlex
   pyacc expr
   ppc386 expr

Note that in the Linux version, the programs are named plex and pyacc to
avoid name clashes with the corresponding UNIX utilities.)

Having compiled expr.pas, you can execute the expr program and type some
expressions to see it work (terminate the program with an empty line). There
is a number of other sample TP Lex and Yacc programs (.l and .y files) in the
distribution, including a TP Yacc cross reference utility and a complete
parser for Standard Pascal.

The TP Lex and Yacc programs recognize some options which may be specified
anywhere on the command line. E.g.,

   lex -o exprlex

runs TP Lex with "DFA optimization" and

   yacc -v expr

runs TP Yacc in "verbose" mode (TP Yacc generates a readable description of
the generated parser).

The TP Lex and Yacc programs use the following default filename extensions:
- .l:   TP Lex input files
- .y:   TP Yacc input files
- .pas: TP Lex and Yacc output files

As usual, you may overwrite default filename extensions by explicitly
specifying suffixes.

If you ever forget how to run TP Lex and Yacc, you can issue the command lex
or yacc (resp. plex or pyacc) without arguments to get a short summary of the
command line syntax.



TP Lex
======

This section describes the TP Lex lexical analyzer generator.


Usage
-----

lex [options] lex-file[.l] [output-file[.pas]]


Options
-------

-v  "Verbose:" Lex generates a readable description of the generated
    lexical analyzer, written to lex-file with new extension `.lst'.

-o  "Optimize:" Lex optimizes DFA tables to produce a minimal DFA.


Description
-----------

TP Lex is a program generator that is used to generate the Turbo Pascal source
code for a lexical analyzer subroutine from the specification of an input
language by a regular expression grammar.

TP Lex parses the source grammar contained in lex-file (with default suffix
.l) and writes the constructed lexical analyzer subroutine to the specified
output-file (with default suffix .pas); if no output file is specified, output
goes to lex-file with new suffix .pas. If any errors are found during
compilation, error messages are written to the list file (lex-file with new
suffix .lst).

The generated output file contains a lexical analyzer routine, yylex,
implemented as:

  function yylex : Integer;

This routine has to be called by your main program to execute the lexical
analyzer. The return value of the yylex routine usually denotes the number
of a token recognized by the lexical analyzer (see the return routine in the
LexLib unit). At end-of-file the yylex routine normally returns 0.

The code template for the yylex routine may be found in the yylex.cod
file. This file is needed by TP Lex when it constructs the output file. It
must be present either in the current directory or in the directory from which
TP Lex was executed (TP Lex searches these directories in the indicated
order). (NB: For the Linux/Free Pascal version, the code template is searched
in some directory defined at compile-time instead of the execution path,
usually /usr/lib/fpc/lexyacc.)

The TP Lex library (LexLib) unit is required by programs using Lex-generated
lexical analyzers; you will therefore have to put an appropriate uses clause
into your program or unit that contains the lexical analyzer routine. The
LexLib unit also provides various useful utility routines; see the file
lexlib.pas for further information.


Lex Source
----------

A TP Lex program consists of three sections separated with the %% delimiter:

definitions
%%
rules
%%
auxiliary procedures

All sections may be empty. The TP Lex language is line-oriented; definitions
and rules are separated by line breaks. There is no special notation for
comments, but (Turbo Pascal style) comments may be included as Turbo Pascal
fragments (see below).

The definitions section may contain the following elements:

- regular definitions in the format:

     name   substitution

  which serve to abbreviate common subexpressions. The {name} notation
  causes the corresponding substitution from the definitions section to
  be inserted into a regular expression. The name must be a legal
  identifier (letter followed by a sequence of letters and digits;
  the underscore counts as a letter; upper- and lowercase are distinct).
  Regular definitions must be non-recursive.

- start state definitions in the format:

     %start name ...

  which are used in specifying start conditions on rules (described
  below). The %start keyword may also be abbreviated as %s or %S.

- Turbo Pascal declarations enclosed between %{ and %}. These will be
  inserted into the output file (at global scope). Also, any line that
  does not look like a Lex definition (e.g., starts with blank or tab)
  will be treated as Turbo Pascal code. (In particular, this also allows
  you to include Turbo Pascal comments in your Lex program.)

The rules section of a TP Lex program contains the actual specification of
the lexical analyzer routine. It may be thought of as a big CASE statement
discriminating over the different patterns to be matched and listing the
corresponding statements (actions) to be executed. Each rule consists of a
regular expression describing the strings to be matched in the input, and a
corresponding action, a Turbo Pascal statement to be executed when the
expression matches. Expression and statement are delimited with whitespace
(blanks and/or tabs). Thus the format of a Lex grammar rule is:

   expression      statement;

Note that the action must be a single Turbo Pascal statement terminated
with a semicolon (use begin ... end for compound statements). The statement
may span multiple lines if the successor lines are indented with at least
one blank or tab. The action may also be replaced by the | character,
indicating that the action for this rule is the same as that for the next
one.

The TP Lex library unit provides various variables and routines which are
useful in the programming of actions. In particular, the yytext string
variable holds the text of the matched string, and the yyleng Byte variable
its length.

Regular expressions are used to describe the strings to be matched in a
grammar rule. They are built from the usual constructs describing character
classes and sequences, and operators specifying repetitions and alternatives.
The precise format of regular expressions is described in the next section.

The rules section may also start with some Turbo Pascal declarations
(enclosed in %{ %}) which are treated as local declarations of the
actions routine.

Finally, the auxiliary procedures section may contain arbitrary Turbo
Pascal code (such as supporting routines or a main program) which is
simply tacked on to the end of the output file. The auxiliary procedures
section is optional.


Regular Expressions
-------------------

The following table summarizes the format of the regular expressions
recognized by TP Lex (also compare Aho, Sethi, Ullman 1986, fig. 3.48).
c stands for a single character, s for a string, r for a regular expression,
and n,m for nonnegative integers.

expression   matches                        example
----------   ----------------------------   -------
c            any non-operator character c   a
\c           character c literally          \*
"s"          string s literally             "**"
.            any character but newline      a.*b
^            beginning of line              ^abc
$            end of line                    abc$
[s]          any character in s             [abc]
[^s]         any character not in s         [^abc]
r*           zero or more r's               a*
r+           one or more r's                a+
r?           zero or one r                  a?
r{m,n}       m to n occurrences of r        a{1,5}
r{m}         m occurrences of r             a{5}
r1r2         r1 then r2                     ab
r1|r2        r1 or r2                       a|b
(r)          r                              (a|b)
r1/r2        r1 when followed by r2         a/b
<x>r         r when in start condition x    <x>abc
---------------------------------------------------

The operators *, +, ? and {} have highest precedence, followed by
concatenation. The | operator has lowest precedence. Parentheses ()
may be used to group expressions and overwrite default precedences.
The <> and / operators may only occur once in an expression.

The usual C-like escapes are recognized:

\n     denotes newline
\r     denotes carriage return
\t     denotes tab
\b     denotes backspace
\f     denotes form feed
\NNN   denotes character no. NNN in octal base

You can also use the \ character to quote characters which would otherwise
be interpreted as operator symbols. In character classes, you may use
the - character to denote ranges of characters. For instance, [a-z]
denotes the class of all lowercase letters.

The expressions in a TP Lex program may be ambigious, i.e. there may be inputs
which match more than one rule. In such a case, the lexical analyzer prefers
the longest match and, if it still has the choice between different rules,
it picks the first of these. If no rule matches, the lexical analyzer
executes a default action which consists of copying the input character
to the output unchanged. Thus, if the purpose of a lexical analyzer is
to translate some parts of the input, and leave the rest unchanged, you
only have to specify the patterns which have to be treated specially. If,
however, the lexical analyzer has to absorb its whole input, you will have
to provide rules that match everything. E.g., you might use the rules

   .   |
   \n  ;

which match "any other character" (and ignore it).

Sometimes certain patterns have to be analyzed differently depending on some
amount of context in which the pattern appears. In such a case the / operator
is useful. For instance, the expression a/b matches a, but only if followed
by b. Note that the b does not belong to the match; rather, the lexical
analyzer, when matching an a, will look ahead in the input to see whether
it is followed by a b, before it declares that it has matched an a. Such
lookahead may be arbitrarily complex (up to the size of the LexLib input
buffer). E.g., the pattern a/.*b matches an a which is followed by a b
somewhere on the same input line. TP Lex also has a means to specify left
context which is described in the next section.


Start Conditions
----------------

TP Lex provides some features which make it possible to handle left context.
The ^ character at the beginning of a regular expression may be used to
denote the beginning of the line. More distant left context can be described
conveniently by using start conditions on rules.

Any rule which is prefixed with the <> construct is only valid if the lexical
analyzer is in the denoted start state. For instance, the expression <x>a
can only be matched if the lexical analyzer is in start state x. You can have
multiple start states in a rule; e.g., <x,y>a can be matched in start states
x or y.

Start states have to be declared in the definitions section by means of
one or more start state definitions (see above). The lexical analyzer enters
a start state through a call to the LexLib routine start. E.g., you may
write:
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -