📄 用yacclex设计实现小型basic语言.txt
字号:
The return value of the yyparse routine denotes success or failure of
the parser (possible return values: 0 = success, 1 = unrecoverable syntax
error or parse stack overflow).
Similar to TP Lex, the code template for the yyparse routine may be found in
the yyparse.cod file. The rules for locating this file are analogous to those
of TP Lex (see Section `TP Lex').
The TP Yacc library (YaccLib) unit is required by programs using Yacc-
generated parsers; you will therefore have to put an appropriate uses clause
into your program or unit that contains the parser routine. The YaccLib unit
also provides some routines which may be used to control the actions of the
parser. See the file yacclib.pas for further information.
Yacc Source
-----------
A TP Yacc program consists of three sections separated with the %% delimiter:
definitions
%%
rules
%%
auxiliary procedures
The TP Yacc language is free-format: whitespace (blanks, tabs and newlines)
is ignored, except if it serves as a delimiter. Comments have the C-like
format /* ... */. They are treated as whitespace. Grammar symbols are denoted
by identifiers which have the usual form (letter, including underscore,
followed by a sequence of letters and digits; upper- and lowercase is
distinct). The TP Yacc language also has some keywords which always start
with the % character. Literals are denoted by characters enclosed in single
quotes. The usual C-like escapes are recognized:
\n denotes newline
\r denotes carriage return
\t denotes tab
\b denotes backspace
\f denotes form feed
\NNN denotes character no. NNN in octal base
Definitions
-----------
The first section of a TP Yacc grammar serves to define the symbols used in
the grammar. It may contain the following types of definitions:
- start symbol definition: A definition of the form
%start symbol
declares the start nonterminal of the grammar (if this definition is
omitted, TP Yacc assumes the left-hand side nonterminal of the first
grammar rule as the start symbol of the grammar).
- terminal definitions: Definitions of the form
%token symbol ...
are used to declare the terminal symbols ("tokens") of the target
language. Any identifier not introduced in a %token definition will
be treated as a nonterminal symbol.
As far as TP Yacc is concerned, tokens are atomic symbols which do not
have an innert structure. A lexical analyzer must be provided which
takes on the task of tokenizing the input stream and return the
individual tokens and literals to the parser (see Section `Lexical
Analysis').
- precedence definitions: Operator symbols (terminals) may be associated
with a precedence by means of a precedence definition which may have
one of the following forms
%left symbol ...
%right symbol ...
%nonassoc symbol ...
which are used to declare left-, right- and nonassociative operators,
respectively. Each precedence definition introduces a new precedence
level, lowest precedence first. E.g., you may write:
%nonassoc '<' '>' '=' GEQ LEQ NEQ /* relational operators */
%left '+' '-' OR /* addition operators */
%left '*' '/' AND /* multiplication operators */
%right NOT UMINUS /* unary operators */
A terminal identifier introduced in a precedence definition may, but
need not, appear in a %token definition as well.
- type definitions: Any (terminal or nonterminal) grammar symbol may be
associated with a type identifier which is used in the processing of
semantic values. Type tags of the form <name> may be used in token and
precedence definitions to declare the type of a terminal symbol, e.g.:
%token <Real> NUM
%left <AddOp> '+' '-'
To declare the type of a nonterminal symbol, use a type definition of
the form:
%type <name> symbol ...
e.g.:
%type <Real> expr
In a %type definition, you may also omit the nonterminals, i.e. you
may write:
%type <name>
This is useful when a given type is only used with type casts (see
Section `Grammar Rules and Actions'), and is not associated with a
specific nonterminal.
- Turbo Pascal declarations: You may also include arbitrary Turbo Pascal
code in the definitions section, enclosed in %{ %}. This code will be
inserted as global declarations into the output file, unchanged.
Grammar Rules and Actions
-------------------------
The second part of a TP Yacc grammar contains the grammar rules for the
target language. Grammar rules have the format
name : symbol ... ;
The left-hand side of a rule must be an identifier (which denotes a
nonterminal symbol). The right-hand side may be an arbitrary (possibly
empty) sequence of nonterminal and terminal symbols (including literals
enclosed in single quotes). The terminating semicolon may also be omitted.
Different rules for the same left-hand side symbols may be written using
the | character to separate the different alternatives:
name : symbol ...
| symbol ...
...
;
For instance, to specify a simple grammar for arithmetic expressions, you
may write:
%left '+' '-'
%left '*' '/'
%token NUM
%%
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| NUM
;
(The %left definitions at the beginning of the grammar are needed to specify
the precedence and associativity of the operator symbols. This will be
discussed in more detail in Section `Ambigious Grammars'.)
Grammar rules may contain actions - Turbo Pascal statements enclosed in
{ } - to be executed as the corresponding rules are recognized. Furthermore,
rules may return values, and access values returned by other rules. These
"semantic" values are written as $$ (value of the left-hand side nonterminal)
and $i (value of the ith right-hand side symbol). They are kept on a special
value stack which is maintained automatically by the parser.
Values associated with terminal symbols must be set by the lexical analyzer
(more about this in Section `Lexical Analysis'). Actions of the form $$ := $1
can frequently be omitted, since it is the default action assumed by TP Yacc
for any rule that does not have an explicit action.
By default, the semantic value type provided by Yacc is Integer. You can
also put a declaration like
%{
type YYSType = Real;
%}
into the definitions section of your Yacc grammar to change the default value
type. However, if you have different value types, the preferred method is to
use type definitions as discussed in Section `Definitions'. When such type
definitions are given, TP Yacc handles all the necessary details of the
YYSType definition and also provides a fair amount of type checking which
makes it easier to find type errors in the grammar.
For instance, we may declare the symbols NUM and expr in the example above
to be of type Real, and then use these values to evaluate an expression as
it is parsed.
%left '+' '-'
%left '*' '/'
%token <Real> NUM
%type <Real> expr
%%
expr : expr '+' expr { $$ := $1+$3; }
| expr '-' expr { $$ := $1-$3; }
| expr '*' expr { $$ := $1*$3; }
| expr '/' expr { $$ := $1/$3; }
| '(' expr ')' { $$ := $2; }
| NUM
;
(Note that we omitted the action of the last rule. The "copy action"
$$ := $1 required by this rule is automatically added by TP Yacc.)
Actions may not only appear at the end, but also in the middle of a rule
which is useful to perform some processing before a rule is fully parsed.
Such actions inside a rule are treated as special nonterminals which are
associated with an empty right-hand side. Thus, a rule like
x : y { action; } z
will be treated as:
x : y $act z
$act : { action; }
Actions inside a rule may also access values to the left of the action,
and may return values by assigning to the $$ value. The value returned
by such an action can then be accessed by other actions using the usual $i
notation. E.g., we may write:
x : y { $$ := 2*$1; } z { $$ := $2+$3; }
which has the effect of setting the value of x to
2*(the value of y)+(the value of z).
Sometimes it is desirable to access values in enclosing rules. This can be
done using the notation $i with i<=0. $0 refers to the first value "to the
left" of the current rule, $-1 to the second, and so on. Note that in this
case the referenced value depends on the actual contents of the parse stack,
so you have to make sure that the requested values are always where you
expect them.
There are some situations in which TP Yacc cannot easily determine the
type of values (when a typed parser is used). This is true, in particular,
for values in enclosing rules and for the $$ value in an action inside a
rule. In such cases you may use a type cast to explicitly specify the type
of a value. The format for such type casts is $<name>$ (for left-hand side
values) and $<name>i (for right-hand side values) where name is a type
identifier (which must occur in a %token, precedence or %type definition).
Auxiliary Procedures
--------------------
The third section of a TP Yacc program is optional. If it is present, it
may contain any Turbo Pascal code (such as supporting routines or a main
program) which is tacked on to the end of the output file.
Lexical Analysis
----------------
For any TP Yacc-generated parser, the programmer must supply a lexical
analyzer routine named yylex which performs the lexical analysis for
the parser. This routine must be declared as
function yylex : Integer;
The yylex routine may either be prepared by hand, or by using the lexical
analyzer generator TP Lex (see Section `TP Lex').
The lexical analyzer must be included in your main program behind the
parser subroutine (the yyparse code template includes a forward
definition of the yylex routine such that the parser can access the
lexical analyzer). For instance, you may put the lexical analyzer
routine into the auxiliary procedures section of your TP Yacc grammar,
either directly, or by using the the Turbo Pascal include directive
($I).
The parser repeatedly calls the yylex routine to tokenize the input
stream and obtain the individual lexical items in the input. For any
literal character, the yylex routine has to return the corresponding
character code. For the other, symbolic, terminals of the input language,
the lexical analyzer must return corresponding Integer codes. These are
assigned automatically by TP Yacc in the order in which token definitions
appear in the definitions section of the source grammar. The lexical
analyzer can access these values through corresponding Integer constants
which are declared by TP Yacc in the output file.
For instance, if
%token NUM
is the first definition in the Yacc grammar, then TP Yacc will create
a corresponding constant declaration
const NUM = 257;
in the output file (TP Yacc automatically assigns symbolic token numbers
starting at 257; 1 thru 255 are reserved for character literals, 0 denotes
end-of-file, and 256 is reserved for the special error token which will be
discussed in Section `Error Handling'). This definition may then be used,
e.g., in a corresponding TP Lex program as follows:
[0-9]+ return(NUM);
You can also explicitly assign token numbers in the grammar. For this
purpose, the first occurrence of a token identifier in the definitions
section may be followed by an unsigned integer. E.g. you may write:
%token NUM 299
Besides the return value of yylex, the lexical analyzer routine may also
return an additional semantic value for the recognized token. This value
is assigned to a variable named "yylval" and may then be accessed in actions
through the $i notation (see above, Section `Grammar Rules and Actions').
The yylval variable is of type YYSType (the semantic value type, Integer
by default); its declaration may be found in the yyparse.cod file.
For instance, to assign an Integer value to a NUM token in the above
example, we may write:
[0-9]+ begin
val(yytext, yylval, code);
return(NUM);
end;
This assigns yylval the value of the NUM token (using the Turbo Pascal
standard procedure val).
If a parser uses tokens of different types (via a %token <name> definition),
then the yylval variable will not be of type Integer, but instead of a
corresponding variant record type which is capable of holding all the
different value types declared in the TP Yacc grammar. In this case, the
lexical analyzer must assign a semantic value to the corresponding record
component which is named yy<name> (where <name> stands for the corresponding
type identifier).
E.g., if token NUM is declared Real:
%token <Real> NUM
then the value for token NUM must be assigned to yylval.yyReal.
How The Parser Works
--------------------
TP Yacc uses the LALR(1) technique developed by Donald E. Knuth and F.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -