📄 yacc-docs.txt

📁 句法分析器。一般用文法(grammar)来刻画.常见的是短语结构文法(chomsky hierarchy),其中最常用的是上下文无关文法(CFG)。
💻 TXT
📖 第 1 页 / 共 5 页
字号:

can be structured according to these rules in two ways:

        IF  (  C1  )  {
                IF  (  C2  )  S1
                }
        ELSE  S2

or

        IF  (  C1  )  {
                IF  (  C2  )  S1
                ELSE  S2
                }

The second interpretation is the one given  in  most  programming
languages  having  this  construct.  Each ELSE is associated with
the last preceding ``un-ELSE'd'' IF.  In this  example,  consider
the situation where the parser has seen

        IF  (  C1  )  IF  (  C2  )  S1

and is looking at the ELSE.  It can  immediately  reduce  by  the
simple-if rule to get

        IF  (  C1  )  stat

and then read the remaining input,

        ELSE  S2

and reduce

        IF  (  C1  )  stat  ELSE  S2

by the if-else rule.  This leads to the first of the above group-
ings of the input.

     On the other hand, the ELSE may be  shifted,  S2  read,  and
then the right hand portion of

        IF  (  C1  )  IF  (  C2  )  S1  ELSE  S2

can be reduced by the if-else rule to get










Yacc: Yet Another Compiler-Compiler                     PS1:15-19



        IF  (  C1  )  stat

which can be reduced by the simple-if rule.  This  leads  to  the
second  of  the  above  groupings  of the input, which is usually
desired.

     Once again the parser can do two valid things - there  is  a
shift/reduce  conflict.  The application of disambiguating rule 1
tells the parser to shift  in  this  case,  which  leads  to  the
desired grouping.

     This shift/reduce conflict arises only when there is a  par-
ticular current input symbol, ELSE, and particular inputs already
seen, such as

        IF  (  C1  )  IF  (  C2  )  S1

In general, there may be many conflicts, and  each  one  will  be
associated  with  an  input  symbol  and a set of previously read
inputs.  The previously read  inputs  are  characterized  by  the
state of the parser.

     The conflict messages of Yacc are best understood by examin-
ing the verbose (-v) option output file.  For example, the output
corresponding to the above conflict state might be:

23: shift/reduce conflict (shift 45, reduce 18) on ELSE

state 23

          stat  :  IF  (  cond  )  stat_         (18)
          stat  :  IF  (  cond  )  stat_ELSE  stat

         ELSE     shift 45
         .        reduce 18


The first line describes the conflict, giving the state  and  the
input symbol.  The ordinary state description follows, giving the
grammar rules active  in  the  state,  and  the  parser  actions.
Recall  that the underline marks the portion of the grammar rules
which has been seen.  Thus in the example, in state 23 the parser
has seen input corresponding to

        IF  (  cond  )  stat

and the two grammar rules shown are active  at  this  time.   The
parser  can do two possible things.  If the input symbol is ELSE,
it is possible to shift into state 45.  State 45  will  have,  as
part of its description, the line

        stat  :  IF  (  cond  )  stat  ELSE_stat










PS1:15-20                     Yacc: Yet Another Compiler-Compiler


since the ELSE will have been shifted in  this  state.   Back  in
state  23,  the  alternative action, described by ``.'', is to be
done if the input symbol is not mentioned explicitly in the above
actions; thus, in this case, if the input symbol is not ELSE, the
parser reduces by grammar rule 18:

        stat  :  IF  '('  cond  ')'  stat

Once again, notice that the numbers following ``shift''  commands
refer  to  other  states,  while the numbers following ``reduce''
commands refer to grammar rule numbers.  In  the  y.output  file,
the  rule  numbers  are  printed  after  those rules which can be
reduced.  In most one states, there will be at most reduce action
possible in the state, and this will be the default command.  The
user who encounters unexpected shift/reduce conflicts will  prob-
ably  want  to  look  at the verbose output to decide whether the
default actions are appropriate.  In really tough cases, the user
might  need  to  know more about the behavior and construction of
the parser than can be covered here.  In this case,  one  of  the
theoretical  references[2, 3, 4] might be consulted; the services
of a local guru might also be appropriate.

6: Precedence

     There is one common situation where the  rules  given  above
for  resolving conflicts are not sufficient; this is in the pars-
ing of arithmetic expressions.  Most of the  commonly  used  con-
structions  for arithmetic expressions can be naturally described
by the notion of precedence levels for operators,  together  with
information about left or right associativity.  It turns out that
ambiguous grammars with appropriate disambiguating rules  can  be
used  to  create parsers that are faster and easier to write than
parsers constructed from unambiguous grammars.  The basic  notion
is to write grammar rules of the form

        expr  :  expr  OP  expr

and

        expr  :  UNARY  expr

for all binary and unary operators desired.  This creates a  very
ambiguous  grammar, with many parsing conflicts.  As disambiguat-
ing  rules,  the  user  specifies  the  precedence,  or   binding
strength,  of  all  the  operators,  and the associativity of the
binary operators.  This information is sufficient to  allow  Yacc
to  resolve the parsing conflicts in accordance with these rules,
and construct a parser that realizes the desired precedences  and
associativities.

     The precedences and associativities are attached  to  tokens
in  the  declarations section.  This is done by a series of lines
beginning with a Yacc keyword: %left, %right, or %nonassoc,  fol-
lowed  by  a  list of tokens.  All of the tokens on the same line









Yacc: Yet Another Compiler-Compiler                     PS1:15-21


are assumed to have the same precedence level and  associativity;
the lines are listed in order of increasing precedence or binding
strength.  Thus,

        %left  '+'  '-'
        %left  '*'  '/'

describes the precedence and associativity of the four arithmetic
operators.   Plus  and minus are left associative, and have lower
precedence than star and slash, which are also left  associative.
The  keyword  %right is used to describe right associative opera-
tors, and the keyword %nonassoc is used  to  describe  operators,
like  the  operator  .LT. in Fortran, that may not associate with
themselves; thus,

        A  .LT.  B  .LT.  C

is illegal in Fortran, and such an operator  would  be  described
with  the  keyword  %nonassoc  in  Yacc.   As  an  example of the
behavior of these declarations, the description

        %right  '='
        %left  '+'  '-'
        %left  '*'  '/'

        %%

        expr    :       expr  '='  expr
                |       expr  '+'  expr
                |       expr  '-'  expr
                |       expr  '*'  expr
                |       expr  '/'  expr
                |       NAME
                ;

might be used to structure the input

        a  =  b  =  c*d  -  e  -  f*g

as follows:

        a = ( b = ( ((c*d)-e) - (f*g) ) )

When this mechanism is used, unary operators must, in general, be
given  a  precedence.   Sometimes  a  unary operator and a binary
operator have the same  symbolic  representation,  but  different
precedences.  An example is unary and binary '-'; unary minus may
be given the same strength as  multiplication,  or  even  higher,
while binary minus has a lower strength than multiplication.  The
keyword, %prec, changes the precedence level  associated  with  a
particular  grammar  rule.   %prec  appears immediately after the
body of the grammar rule, before the action or closing semicolon,
and  is  followed by a token name or literal.  It causes the pre-
cedence of the grammar rule to become that of the following token









PS1:15-22                     Yacc: Yet Another Compiler-Compiler


name  or literal.  For example, to make unary minus have the same
precedence as multiplication the rules might resemble:

        %left  '+'  '-'
        %left  '*'  '/'

        %%

        expr    :       expr  '+'  expr
                |       expr  '-'  expr
                |       expr  '*'  expr
                |       expr  '/'  expr
                |       '-'  expr      %prec  '*'
                |       NAME
                ;


     A token declared by %left, %right, and  %nonassoc  need  not
be, but may be, declared by %token as well.

     The precedences and associativities  are  used  by  Yacc  to
resolve  parsing  conflicts;  they  give  rise  to disambiguating
rules.  Formally, the rules work as follows:

1.   The precedences and associativities are recorded  for  those
     tokens and literals that have them.

2.   A precedence and associativity is associated with each gram-
     mar rule; it is the precedence and associativity of the last
     token or literal in the body of the rule.  If the %prec con-
     struction  is used, it overrides this default.  Some grammar
     rules may have no precedence  and  associativity  associated
     with them.

3.   When there is  a  reduce/reduce  conflict,  or  there  is  a
     shift/reduce  conflict  and  either  the input symbol or the
     grammar rule has no precedence and associativity,  then  the
     two  disambiguating rules given at the beginning of the sec-
     tion are used, and the conflicts are reported.

4.   If there is a shift/reduce conflict, and  both  the  grammar
     rule  and  the  input character have precedence and associa-
     tivity associated with them, then the conflict  is  resolved
     in favor of the action (shift or reduce) associated with the
     higher precedence.  If the precedences are  the  same,  then
     the  associativity is used; left associative implies reduce,
     right associative implies shift, and nonassociating  implies
     error.

     Conflicts resolved by precedence  are  not  counted  in  the
number  of  shift/reduce  and reduce/reduce conflicts reported by
Yacc.  This means that mistakes  in  the  specification  of  pre-
cedences  may  disguise errors in the input grammar; it is a good
idea  to  be  sparing  with  precedences,  and  use  them  in  an









Yacc: Yet Another Compiler-Compiler                     PS1:15-23


essentially  ``cookbook'' fashion, until some experience has been
gained.  The y.output file is very useful in deciding whether the
parser is actually doing what was intended.

7: Error Handling

     Error handling is an extremely difficult area, and  many  of
the  problems  are  semantic  ones.   When an error is found, for
example, it may be  necessary  to  reclaim  parse  tree  storage,
delete  or  alter  symbol  table  entries,  and,  typically,  set
switches to avoid generating any further output.

     It is seldom acceptable to stop all processing when an error
is  found;  it  is  more useful to continue scanning the input to
find further syntax errors.  This leads to the problem of getting
the  parser  ``restarted''  after  an  error.  A general class of
algorithms to do this involves discarding a number of tokens from
the  input  string,  and  attempting to adjust the parser so that
input can continue.

     To allow the user some control over this process, Yacc  pro-
vides  a simple, but reasonably general, feature.  The token name
``error'' is reserved for error handling.  This name can be  used
in  grammar rules; in effect, it suggests places where errors are
expected, and recovery might take place.   The  parser  pops  its
stack until it enters a state where the token ``error'' is legal.
It then behaves as if the token ``error'' were the current looka-
head  token,  and performs the action encountered.  The lookahead
token is then reset to the token that caused the  error.   If  no
special  error  rules  have  been specified, the processing halts
when an error is detected.

     In order to prevent a cascade of error messages, the parser,
after  detecting  an  error,  remains  in error state until three
tokens have been successfully read and shifted.  If an  error  is
detected when the parser is already in error state, no message is
given, and the input token is quietly deleted.

     As an example, a rule of the form

        stat    :       error

would, in effect, mean that on a syntax error  the  parser  would
attempt  to  skip over the statement in which the error was seen.
More precisely, the parser will scan  ahead,  looking  for  three
tokens  that might legally follow a statement, and start process-
ing at the first of these; if the beginnings  of  statements  are
not  sufficiently  distinctive,  it may make a false start in the
middle of a statement, and end up reporting a second error  where
there is in fact no error.

     Actions may be used with these special error  rules.   These
actions  might  attempt  to  reinitialize  tables, reclaim symbol
table space, etc.
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -