⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 yacc-docs.txt

📁 句法分析器。一般用文法(grammar)来刻画.常见的是短语结构文法(chomsky hierarchy),其中最常用的是上下文无关文法(CFG)。
💻 TXT
📖 第 1 页 / 共 5 页
字号:





PS1:15-24                     Yacc: Yet Another Compiler-Compiler


     Error rules such as the above are very general,  but  diffi-
cult to control.  Somewhat easier are rules such as

        stat    :       error  ';'

Here, when there is an error, the parser attempts  to  skip  over
the  statement,  but will do so by skipping to the next ';'.  All
tokens after the error and before the next ';' cannot be shifted,
and  are  discarded.   When  the  ';'  is seen, this rule will be
reduced, and any ``cleanup'' action associated with it performed.

     Another form of error rule arises  in  interactive  applica-
tions, where it may be desirable to permit a line to be reentered
after an error.  A possible error rule might be

        input   :       error  '\n'  {  printf( "Reenter last line: " );  }  input
                                {       $$  =  $4;  }

There is one potential difficulty with this approach; the  parser
must  correctly  process three input tokens before it admits that
it has correctly resynchronized after the error.   If  the  reen-
tered  line contains an error in the first two tokens, the parser
deletes the offending tokens,  and  gives  no  message;  this  is
clearly unacceptable.  For this reason, there is a mechanism that
can be used to force the parser to believe that an error has been
fully recovered from.  The statement

        yyerrok ;

in an action resets the parser to  its  normal  mode.   The  last
example is better written

        input   :       error  '\n'
                                {       yyerrok;
                                        printf( "Reenter last line: " );   }
                        input
                                {       $$  =  $4;  }
                ;


     As mentioned above, the token  seen  immediately  after  the
``error''  symbol  is  the  input  token  at  which the error was
discovered.  Sometimes, this is inappropriate;  for  example,  an
error  recovery  action might take upon itself the job of finding
the correct place to resume input.  In this  case,  the  previous
lookahead token must be cleared.  The statement

        yyclearin ;

in an action will have this effect.   For  example,  suppose  the
action after error were to call some sophisticated resynchroniza-
tion routine, supplied by the user, that attempted to advance the
input  to  the beginning of the next valid statement.  After this
routine was called,  the  next  token  returned  by  yylex  would









Yacc: Yet Another Compiler-Compiler                     PS1:15-25


presumably  be  the  first  token  in a legal statement; the old,
illegal token must be discarded, and the error state reset.  This
could be done by a rule like

        stat    :       error
                                {       resynch();
                                        yyerrok ;
                                        yyclearin ;   }
                ;


     These mechanisms are admittedly crude, but do  allow  for  a
simple, fairly effective recovery of the parser from many errors;
moreover, the user can get control to deal with the error actions
required by other portions of the program.

8: The Yacc Environment

     When the user inputs a specification to Yacc, the output  is
a  file  of  C  programs,  called y.tab.c on most systems (due to
local file system conventions, the names may differ from  instal-
lation to installation).  The function produced by Yacc is called
yyparse; it is an integer valued function.  When it is called, it
in  turn repeatedly calls yylex, the lexical analyzer supplied by
the user (see Section 3) to  obtain  input  tokens.   Eventually,
either  an error is detected, in which case (if no error recovery
is possible) yyparse returns the value 1, or the lexical analyzer
returns  the  endmarker  token  and  the parser accepts.  In this
case, yyparse returns the value 0.

     The user must provide a certain amount  of  environment  for
this  parser  in order to obtain a working program.  For example,
as with every C program, a program called main must  be  defined,
that  eventually  calls  yyparse.   In addition, a routine called
yyerror prints a message when a syntax error is detected.

     These two routines must be supplied in one form  or  another
by the user.  To ease the initial effort of using Yacc, a library
has been provided with default versions of main and yyerror.  The
name  of  this  library  is system dependent; on many systems the
library is accessed by a -ly argument to the loader.  To show the
triviality of these default programs, the source is given below:

        main(){
                return( yyparse() );
                }

and

        # include <stdio.h>

        yyerror(s) char *s; {
                fprintf( stderr, "%s\n", s );
                }









PS1:15-26                     Yacc: Yet Another Compiler-Compiler


The argument to yyerror is a string containing an error  message,
usually  the  string  ``syntax  error''.  The average application
will want to do better than this.  Ordinarily, the program should
keep  track of the input line number, and print it along with the
message when a syntax error is detected.   The  external  integer
variable  yychar  contains the lookahead token number at the time
the error was detected; this may be of some  interest  in  giving
better  diagnostics.  Since the main program is probably supplied
by the user (to read arguments, etc.) the Yacc library is  useful
only in small projects, or in the earliest stages of larger ones.

     The external integer variable yydebug is normally set to  0.
If it is set to a nonzero value, the parser will output a verbose
description of its actions, including a discussion of which input
symbols have been read, and what the parser actions are.  Depend-
ing on the operating environment, it may be possible to set  this
variable by using a debugging system.

9: Hints for Preparing Specifications

     This section contains miscellaneous hints on preparing effi-
cient,  easy to change, and clear specifications.  The individual
subsections are more or less independent.

Input Style

     It is difficult to provide rules  with  substantial  actions
and  still  have  a  readable  specification file.  The following
style hints owe much to Brian Kernighan.

a.   Use all capital letters for  token  names,  all  lower  case
     letters  for  nonterminal  names.  This rule comes under the
     heading of ``knowing who to blame when things go wrong.''

b.   Put grammar rules  and  actions  on  separate  lines.   This
     allows  either  to  be  changed without an automatic need to
     change the other.

c.   Put all rules with the same left hand  side  together.   Put
     the left hand side in only once, and let all following rules
     begin with a vertical bar.

d.   Put a semicolon only after the last rule with a  given  left
     hand  side,  and put the semicolon on a separate line.  This
     allows new rules to be easily added.

e.   Indent rule bodies by two tab stops, and  action  bodies  by
     three tab stops.

     The example in Appendix A is written following  this  style,
as  are  the examples in the text of this paper (where space per-
mits).  The user must make up his own mind about these  stylistic
questions;  the  central  problem,  however, is to make the rules
visible through the morass of action code.









Yacc: Yet Another Compiler-Compiler                     PS1:15-27


Left Recursion

     The algorithm used by the Yacc parser encourages  so  called
``left recursive'' grammar rules: rules of the form

        name    :       name  rest_of_rule  ;

These rules  frequently  arise  when  writing  specifications  of
sequences and lists:

        list    :       item
                |       list  ','  item
                ;

and

        seq     :       item
                |       seq  item
                ;

In each of these cases, the first rule will be  reduced  for  the
first  item  only,  and  the  second rule will be reduced for the
second and all succeeding items.

     With right recursive rules, such as

        seq     :       item
                |       item  seq
                ;

the parser would be a bit bigger, and the items  would  be  seen,
and  reduced,  from  right  to left.  More seriously, an internal
stack in the parser would be in danger of overflowing if  a  very
long  sequence  were read.  Thus, the user should use left recur-
sion wherever reasonable.

     It is worth considering whether a sequence  with  zero  ele-
ments  has  any meaning, and if so, consider writing the sequence
specification with an empty rule:

        seq     :       /* empty */
                |       seq  item
                ;

Once again, the first rule would always be reduced exactly  once,
before the first item was read, and then the second rule would be
reduced once for each  item  read.   Permitting  empty  sequences
often  leads  to  increased generality.  However, conflicts might
arise if Yacc is asked to decide  which  empty  sequence  it  has
seen, when it hasn't seen enough to know!

Lexical Tie-ins

     Some lexical decisions depend on context.  For example,  the









PS1:15-28                     Yacc: Yet Another Compiler-Compiler


lexical  analyzer  might  want to delete blanks normally, but not
within quoted strings.  Or names might be entered into  a  symbol
table in declarations, but not in expressions.

     One way of handling this situation is  to  create  a  global
flag  that  is  examined  by  the  lexical  analyzer,  and set by
actions.  For example, suppose a program consists of  0  or  more
declarations, followed by 0 or more statements.  Consider:

        %{
                int dflag;
        %}
          ...  other declarations ...

        %%

        prog    :       decls  stats
                ;

        decls   :       /* empty */
                                {       dflag = 1;  }
                |       decls  declaration
                ;

        stats   :       /* empty */
                                {       dflag = 0;  }
                |       stats  statement
                ;

            ...  other rules ...

The flag dflag is now 0 when reading statements, and 1 when read-
ing  declarations, except for the first token in the first state-
ment.  This token must be seen by the parser before it  can  tell
that  the  declaration  section has ended and the statements have
begun.  In many cases,  this  single  token  exception  does  not
affect the lexical scan.

     This kind of ``backdoor'' approach can be  elaborated  to  a
noxious  degree.  Nevertheless, it represents a way of doing some
things that are difficult, if not impossible, to do otherwise.

Reserved Words

     Some programming languages permit the user to use words like
``if'',  which are normally reserved, as label or variable names,
provided that such use does not conflict with the  legal  use  of
these  names in the programming language.  This is extremely hard
to do in the framework of Yacc; it is difficult to pass  informa-
tion  to  the lexical analyzer telling it ``this instance of `if'
is a keyword, and that instance is a variable''.   The  user  can
make a stab at it, using the mechanism described in the last sub-
section, but it is difficult.










Yacc: Yet Another Compiler-Compiler                     PS1:15-29


     A number of ways of making this easier are under advisement.
Until  then, it is better that the keywords be reserved; that is,
be forbidden for use  as  variable  names.   There  are  powerful
stylistic reasons for preferring this, anyway.

10: Advanced Topics

     This section discusses a  number  of  advanced  features  of
Yacc.

Simulating Error and Accept in Actions

     The parsing actions of error and accept can be simulated  in
an action by use of macros YYACCEPT and YYERROR.  YYACCEPT causes
yyparse to return the value  0;  YYERROR  causes  the  parser  to
behave  as  if  the current input symbol had been a syntax error;
yyerror is called, and error recovery takes place.  These mechan-
isms  can be used to simulate parsers with multiple endmarkers or
context-sensitive syntax checking.

Accessing Values in Enclosing Rules.

     An action may refer to values returned  by  actions  to  the
left  of  the  current rule.  The mechanism is simply the same as
with ordinary actions, a dollar sign followed by a digit, but  in
this case the digit may be 0 or negative.  Consider

        sent    :       adj  noun  verb  adj  noun
                                {  look at the sentence . . .  }
                ;

        adj     :       THE             {       $$ = THE;  }
                |       YOUNG   {       $$ = YOUNG;  }
                . . .
                ;

        noun    :       DOG
                                {       $$ = DOG;  }

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -