📄 用yacclex设计实现小型basic语言.txt

📁 本文包含简单中文说明和详细源代码
💻 TXT
📖 第 1 页 / 共 5 页
字号:
上一页 1 2 3 45
The return value of the yyparse routine denotes success or failure of

the parser (possible return values: 0 = success, 1 = unrecoverable syntax

error or parse stack overflow).

 

Similar to TP Lex, the code template for the yyparse routine may be found in

the yyparse.cod file. The rules for locating this file are analogous to those

of TP Lex (see Section `TP Lex').

 

The TP Yacc library (YaccLib) unit is required by programs using Yacc-

generated parsers; you will therefore have to put an appropriate uses clause

into your program or unit that contains the parser routine. The YaccLib unit

also provides some routines which may be used to control the actions of the

parser. See the file yacclib.pas for further information.

 

 

Yacc Source

-----------

 

A TP Yacc program consists of three sections separated with the %% delimiter:

 

definitions

%%

rules

%%

auxiliary procedures

 

 

The TP Yacc language is free-format: whitespace (blanks, tabs and newlines)

is ignored, except if it serves as a delimiter. Comments have the C-like

format /* ... */. They are treated as whitespace. Grammar symbols are denoted

by identifiers which have the usual form (letter, including underscore,

followed by a sequence of letters and digits; upper- and lowercase is

distinct). The TP Yacc language also has some keywords which always start

with the % character. Literals are denoted by characters enclosed in single

quotes. The usual C-like escapes are recognized:

 

\n     denotes newline

\r     denotes carriage return

\t     denotes tab

\b     denotes backspace

\f     denotes form feed

\NNN   denotes character no. NNN in octal base

 

 

Definitions

-----------

 

The first section of a TP Yacc grammar serves to define the symbols used in

the grammar. It may contain the following types of definitions:

 

- start symbol definition: A definition of the form

 

     %start symbol

 

  declares the start nonterminal of the grammar (if this definition is

  omitted, TP Yacc assumes the left-hand side nonterminal of the first

  grammar rule as the start symbol of the grammar).

 

- terminal definitions: Definitions of the form

 

     %token symbol ...

 

  are used to declare the terminal symbols ("tokens") of the target

  language. Any identifier not introduced in a %token definition will

  be treated as a nonterminal symbol.

 

  As far as TP Yacc is concerned, tokens are atomic symbols which do not

  have an innert structure. A lexical analyzer must be provided which

  takes on the task of tokenizing the input stream and return the

  individual tokens and literals to the parser (see Section `Lexical

  Analysis').

 

- precedence definitions: Operator symbols (terminals) may be associated

  with a precedence by means of a precedence definition which may have

  one of the following forms

 

     %left symbol ...

     %right symbol ...

     %nonassoc symbol ...

 

  which are used to declare left-, right- and nonassociative operators,

  respectively. Each precedence definition introduces a new precedence

  level, lowest precedence first. E.g., you may write:

 

     %nonassoc '<' '>' '=' GEQ LEQ NEQ  /* relational operators */

     %left     '+' '-'  OR              /* addition operators */

     %left     '*' '/' AND              /* multiplication operators */

     %right    NOT UMINUS               /* unary operators */

 

  A terminal identifier introduced in a precedence definition may, but

  need not, appear in a %token definition as well.

 

- type definitions: Any (terminal or nonterminal) grammar symbol may be

  associated with a type identifier which is used in the processing of

  semantic values. Type tags of the form <name> may be used in token and

  precedence definitions to declare the type of a terminal symbol, e.g.:

 

     %token <Real>  NUM

     %left  <AddOp> '+' '-'

 

  To declare the type of a nonterminal symbol, use a type definition of

  the form:

 

     %type <name> symbol ...

 

  e.g.:

 

     %type <Real> expr

 

  In a %type definition, you may also omit the nonterminals, i.e. you

  may write:

 

     %type <name>

 

  This is useful when a given type is only used with type casts (see

  Section `Grammar Rules and Actions'), and is not associated with a

  specific nonterminal.

 

- Turbo Pascal declarations: You may also include arbitrary Turbo Pascal

  code in the definitions section, enclosed in %{ %}. This code will be

  inserted as global declarations into the output file, unchanged.

 

 

Grammar Rules and Actions

-------------------------

 

The second part of a TP Yacc grammar contains the grammar rules for the

target language. Grammar rules have the format

 

   name : symbol ... ;

 

The left-hand side of a rule must be an identifier (which denotes a

nonterminal symbol). The right-hand side may be an arbitrary (possibly

empty) sequence of nonterminal and terminal symbols (including literals

enclosed in single quotes). The terminating semicolon may also be omitted.

Different rules for the same left-hand side symbols may be written using

the | character to separate the different alternatives:

 

   name : symbol ...

        | symbol ...

        ...

        ;

 

For instance, to specify a simple grammar for arithmetic expressions, you

may write:

 

%left '+' '-'

%left '*' '/'

%token NUM

%%

expr : expr '+' expr

     | expr '-' expr

     | expr '*' expr

     | expr '/' expr

     | '(' expr ')'

     | NUM

     ;

 

(The %left definitions at the beginning of the grammar are needed to specify

the precedence and associativity of the operator symbols. This will be

discussed in more detail in Section `Ambigious Grammars'.)

 

Grammar rules may contain actions - Turbo Pascal statements enclosed in

{ } - to be executed as the corresponding rules are recognized. Furthermore,

rules may return values, and access values returned by other rules. These

"semantic" values are written as $$ (value of the left-hand side nonterminal)

and $i (value of the ith right-hand side symbol). They are kept on a special

value stack which is maintained automatically by the parser.

 

Values associated with terminal symbols must be set by the lexical analyzer

(more about this in Section `Lexical Analysis'). Actions of the form $$ := $1

can frequently be omitted, since it is the default action assumed by TP Yacc

for any rule that does not have an explicit action.

 

By default, the semantic value type provided by Yacc is Integer. You can

also put a declaration like

 

   %{

   type YYSType = Real;

   %}

 

into the definitions section of your Yacc grammar to change the default value

type. However, if you have different value types, the preferred method is to

use type definitions as discussed in Section `Definitions'. When such type

definitions are given, TP Yacc handles all the necessary details of the

YYSType definition and also provides a fair amount of type checking which

makes it easier to find type errors in the grammar.

 

For instance, we may declare the symbols NUM and expr in the example above

to be of type Real, and then use these values to evaluate an expression as

it is parsed.

 

%left '+' '-'

%left '*' '/'

%token <Real> NUM

%type  <Real> expr

%%

expr : expr '+' expr   { $$ := $1+$3; }

     | expr '-' expr   { $$ := $1-$3; }

     | expr '*' expr   { $$ := $1*$3; }

     | expr '/' expr   { $$ := $1/$3; }

     | '(' expr ')'    { $$ := $2;    }

     | NUM

     ;

 

(Note that we omitted the action of the last rule. The "copy action"

$$ := $1 required by this rule is automatically added by TP Yacc.)

 

Actions may not only appear at the end, but also in the middle of a rule

which is useful to perform some processing before a rule is fully parsed.

Such actions inside a rule are treated as special nonterminals which are

associated with an empty right-hand side. Thus, a rule like

 

   x : y { action; } z

 

will be treated as:

 

  x : y $act z

  $act : { action; }

 

Actions inside a rule may also access values to the left of the action,

and may return values by assigning to the $$ value. The value returned

by such an action can then be accessed by other actions using the usual $i

notation. E.g., we may write:

 

   x : y { $$ := 2*$1; } z { $$ := $2+$3; }

 

which has the effect of setting the value of x to

 

   2*(the value of y)+(the value of z).

 

Sometimes it is desirable to access values in enclosing rules. This can be

done using the notation $i with i<=0. $0 refers to the first value "to the

left" of the current rule, $-1 to the second, and so on. Note that in this

case the referenced value depends on the actual contents of the parse stack,

so you have to make sure that the requested values are always where you

expect them.

 

There are some situations in which TP Yacc cannot easily determine the

type of values (when a typed parser is used). This is true, in particular,

for values in enclosing rules and for the $$ value in an action inside a

rule. In such cases you may use a type cast to explicitly specify the type

of a value. The format for such type casts is $<name>$ (for left-hand side

values) and $<name>i (for right-hand side values) where name is a type

identifier (which must occur in a %token, precedence or %type definition).

 

 

Auxiliary Procedures

--------------------

 

The third section of a TP Yacc program is optional. If it is present, it

may contain any Turbo Pascal code (such as supporting routines or a main

program) which is tacked on to the end of the output file.

 

 

Lexical Analysis

----------------

 

For any TP Yacc-generated parser, the programmer must supply a lexical

analyzer routine named yylex which performs the lexical analysis for

the parser. This routine must be declared as

 

   function yylex : Integer;

 

The yylex routine may either be prepared by hand, or by using the lexical

analyzer generator TP Lex (see Section `TP Lex').

 

The lexical analyzer must be included in your main program behind the

parser subroutine (the yyparse code template includes a forward

definition of the yylex routine such that the parser can access the

lexical analyzer). For instance, you may put the lexical analyzer

routine into the auxiliary procedures section of your TP Yacc grammar,

either directly, or by using the the Turbo Pascal include directive

($I).

 

The parser repeatedly calls the yylex routine to tokenize the input

stream and obtain the individual lexical items in the input. For any

literal character, the yylex routine has to return the corresponding

character code. For the other, symbolic, terminals of the input language,

the lexical analyzer must return corresponding Integer codes. These are

assigned automatically by TP Yacc in the order in which token definitions

appear in the definitions section of the source grammar. The lexical

analyzer can access these values through corresponding Integer constants

which are declared by TP Yacc in the output file.

 

For instance, if

 

   %token NUM

 

is the first definition in the Yacc grammar, then TP Yacc will create

a corresponding constant declaration

 

   const NUM = 257;

 

in the output file (TP Yacc automatically assigns symbolic token numbers

starting at 257; 1 thru 255 are reserved for character literals, 0 denotes

end-of-file, and 256 is reserved for the special error token which will be

discussed in Section `Error Handling'). This definition may then be used,

e.g., in a corresponding TP Lex program as follows:

 

   [0-9]+   return(NUM);

 

You can also explicitly assign token numbers in the grammar. For this

purpose, the first occurrence of a token identifier in the definitions

section may be followed by an unsigned integer. E.g. you may write:

 

   %token NUM 299

 

Besides the return value of yylex, the lexical analyzer routine may also

return an additional semantic value for the recognized token. This value

is assigned to a variable named "yylval" and may then be accessed in actions

through the $i notation (see above, Section `Grammar Rules and Actions').

The yylval variable is of type YYSType (the semantic value type, Integer

by default); its declaration may be found in the yyparse.cod file.

 

For instance, to assign an Integer value to a NUM token in the above

example, we may write:

 

   [0-9]+   begin

              val(yytext, yylval, code);

              return(NUM);

            end;

 

This assigns yylval the value of the NUM token (using the Turbo Pascal

standard procedure val).

 

If a parser uses tokens of different types (via a %token <name> definition),

then the yylval variable will not be of type Integer, but instead of a

corresponding variant record type which is capable of holding all the

different value types declared in the TP Yacc grammar. In this case, the

lexical analyzer must assign a semantic value to the corresponding record

component which is named yy<name> (where <name> stands for the corresponding

type identifier).

 

E.g., if token NUM is declared Real:

 

   %token <Real> NUM

 

then the value for token NUM must be assigned to yylval.yyReal.

 

 

How The Parser Works

--------------------

 

TP Yacc uses the LALR(1) technique developed by Donald E. Knuth and F.
上一页 1 2 3 45
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -