⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 yacc-docs.txt

📁 windowns 环境的lex和yacc编译器工具
💻 TXT
📖 第 1 页 / 共 5 页
字号:
     With each grammar rule, the user may associate actions to beYacc: Yet Another Compiler-Compiler                      PS1:15-7performed  each time the rule is recognized in the input process.These actions may  return  values,  and  may  obtain  the  valuesreturned by previous actions.  Moreover, the lexical analyzer canreturn values for tokens, if desired.     An action is an arbitrary C statement, and as  such  can  doinput  and  output,  call subprograms, and alter external vectorsand variables.  An action is specified by one or more statements,enclosed in curly braces ``{'' and ``}''.  For example,        A       :       '('  B  ')'                                {       hello( 1, "abc" );  }and        XXX     :       YYY  ZZZ                                {       printf("a message\n");                                        flag = 25;   }are grammar rules with actions.     To facilitate easy communication between the actions and theparser,  the  action statements are altered slightly.  The symbol``dollar sign'' ``$'' is used as a signal to Yacc  in  this  con-text.     To return a value, the  action  normally  sets  the  pseudo-variable  ``$$'' to some value.  For example, an action that doesnothing but return the value 1 is                {  $$ = 1;  }     To obtain the values returned by previous  actions  and  thelexical analyzer, the action may use the pseudo-variables $1, $2,. . ., which refer to the values returned by  the  components  ofthe  right  side of a rule, reading from left to right.  Thus, ifthe rule is        A       :       B  C  D   ;for example, then $2 has the value returned  by  C,  and  $3  thevalue returned by D.     As a more concrete example, consider the rule        expr    :       '('  expr  ')'   ;The value returned by this rule is usually the value of the  exprin parentheses.  This can be indicated by        expr    :        '('  expr  ')'         {  $$ = $2 ;  }PS1:15-8                      Yacc: Yet Another Compiler-Compiler     By default, the value of a rule is the value  of  the  firstelement in it ($1).  Thus, grammar rules of the form        A       :       B    ;frequently need not have an explicit action.     In the examples above, all the actions came at  the  end  oftheir  rules.  Sometimes, it is desirable to get control before arule is fully parsed.  Yacc permits an action to  be  written  inthe middle of a rule as well as at the end.  This rule is assumedto return a value, accessible through the usual mechanism by  theactions  to  the  right of it.  In turn, it may access the valuesreturned by the symbols to its left.  Thus, in the rule        A       :       B                                {  $$ = 1;  }                        C                                {   x = $2;   y = $3;  }                ;the effect is to set x to 1, and y to the value returned by C.     Actions that do not terminate a rule are actually handled byYacc  by  manufacturing  a new nonterminal symbol name, and a newrule matching this name to the empty string.  The interior actionis the action triggered off by recognizing this added rule.  Yaccactually treats the above example as if it had been written:        $ACT    :       /* empty */                                {  $$ = 1;  }                ;        A       :       B  $ACT  C                                {   x = $2;   y = $3;  }                ;     In many applications, output is not  done  directly  by  theactions;  rather, a data structure, such as a parse tree, is con-structed in memory, and transformations are applied to it  beforeoutput  is  generated.  Parse trees are particularly easy to con-struct, given routines to build and maintain the  tree  structuredesired.   For example, suppose there is a C function node, writ-ten so that the call        node( L, n1, n2 )creates a node with label L,  and  descendants  n1  and  n2,  andreturns the index of the newly created node.  Then parse tree canbe built by supplying actions such as:        expr    :       expr  '+'  expr                                {  $$ = node( '+', $1, $3 );  }Yacc: Yet Another Compiler-Compiler                      PS1:15-9in the specification.     The user may define  other  variables  to  be  used  by  theactions.  Declarations and definitions can appear in the declara-tions section, enclosed in the marks ``%{''  and  ``%}''.   Thesedeclarations and definitions have global scope, so they are knownto the action statements and the lexical analyzer.  For example,        %{   int variable = 0;   %}could be placed in  the  declarations  section,  making  variableaccessible  to  all  of  the  actions.  The Yacc parser uses onlynames beginning in ``yy''; the user should avoid such names.     In these examples, all the values are integers: a discussionof values of other types will be found in Section 10.3: Lexical Analysis     The user must supply a lexical analyzer to  read  the  inputstream  and  communicate  tokens (with values, if desired) to theparser.  The  lexical  analyzer  is  an  integer-valued  functioncalled yylex.  The function returns an integer, the token number,representing the kind of token read.  If there is a value associ-ated with that token, it should be assigned to the external vari-able yylval.     The parser and the lexical  analyzer  must  agree  on  thesetoken  numbers  in  order  for communication between them to takeplace.  The numbers may be chosen by Yacc, or chosen by the user.In  either case, the ``# define'' mechanism of C is used to allowthe lexical analyzer to return these numbers  symbolically.   Forexample,  suppose  that  the token name DIGIT has been defined inthe declarations section of the  Yacc  specification  file.   Therelevant portion of the lexical analyzer might look like:        yylex(){                extern int yylval;                int c;                . . .                c = getchar();                . . .                switch( c ) {                        . . .                case '0':                case '1':                  . . .                case '9':                        yylval = c-'0';                        return( DIGIT );                        . . .                        }                . . .PS1:15-10                     Yacc: Yet Another Compiler-Compiler     The intent is to return a token number of DIGIT, and a valueequal  to  the  numerical  value of the digit.  Provided that thelexical analyzer code is placed in the programs  section  of  thespecification  file,  the identifier DIGIT will be defined as thetoken number associated with the token DIGIT.     This mechanism  leads  to  clear,  easily  modified  lexicalanalyzers;  the only pitfall is the need to avoid using any tokennames in the grammar that are reserved or significant in C or theparser;  for  example,  the  use  of token names if or while willalmost certainly  cause  severe  difficulties  when  the  lexicalanalyzer is compiled.  The token name error is reserved for errorhandling, and should not be used naively (see Section 7).     As mentioned above, the token numbers may be chosen by  Yaccor by the user.  In the default situation, the numbers are chosenby Yacc.  The default token number for a literal character is thenumerical  value  of  the  character  in the local character set.Other names are assigned token numbers starting at 257.     To assign a token number to a  token  (including  literals),the first appearance of the token name or literal in the declara-tions section  can  be  immediately  followed  by  a  nonnegativeinteger.   This  integer  is  taken to be the token number of thename or literal.  Names and literals not defined by this  mechan-ism  retain  their  default definition.  It is important that alltoken numbers be distinct.     For historical reasons, the endmarker must have token number0  or  negative.   This  token  number cannot be redefined by theuser; thus, all lexical analyzers should be prepared to return  0or  negative  as  a  token  number upon reaching the end of theirinput.     A very useful tool for constructing lexical analyzers is theLex  program  developed  by Mike Lesk.[8] These lexical analyzersare designed to work in close harmony  with  Yacc  parsers.   Thespecifications  for  these  lexical analyzers use regular expres-sions instead of grammar rules.  Lex can be easily used  to  pro-duce  quite  complicated lexical analyzers, but there remain somelanguages (such as FORTRAN) which  do  not  fit  any  theoreticalframework, and whose lexical analyzers must be crafted by hand.4: How the Parser Works     Yacc turns the specification file into a  C  program,  whichparses the input according to the specification given.  The algo-rithm used to go from the specification to the parser is complex,and  will  not  be  discussed  here  (see the references for moreinformation).  The parser itself, however, is relatively  simple,and  understanding  how  it  works, while not strictly necessary,will nevertheless make treatment of error recovery  and  ambigui-ties much more comprehensible.Yacc: Yet Another Compiler-Compiler                     PS1:15-11     The parser produced by  Yacc  consists  of  a  finite  statemachine  with a stack.  The parser is also capable of reading andremembering the next input token (called  the  lookahead  token).The current state is always the one on the top of the stack.  Thestates of the  finite  state  machine  are  given  small  integerlabels;  initially, the machine is in state 0, the stack containsonly state 0, and no lookahead token has been read.     The machine has only four actions available  to  it,  calledshift,  reduce,  accept, and error.  A move of the parser is doneas follows:1.   Based on its current state, the parser  decides  whether  it     needs  a  lookahead  token  to  decide what action should be     done; if it needs one, and does not have one, it calls yylex     to obtain the next token.2.   Using the current state, and the lookahead token if  needed,     the  parser  decides on its next action, and carries it out.     This may result in states being pushed onto  the  stack,  or     popped  off  of  the stack, and in the lookahead token being     processed or left alone.     The shift action is the most common action the parser takes.Whenever  a  shift  action  is taken, there is always a lookaheadtoken.  For example, in state 56 there may be an action:                IF      shift 34which says, in state 56,  if  the  lookahead  token  is  IF,  thecurrent  state  (56)  is  pushed  down on the stack, and state 34becomes the current state (on the top of the stack).  The  looka-head token is cleared.     The reduce action  keeps  the  stack  from  growing  withoutbounds.   Reduce actions are appropriate when the parser has seenthe right hand side  of  a  grammar  rule,  and  is  prepared  toannounce  that it has seen an instance of the rule, replacing theright hand side by the left hand side.  It may  be  necessary  toconsult the lookahead token to decide whether to reduce, but usu-ally it is not; in fact, the default  action  (represented  by  a``.'') is often a reduce action.     Reduce actions are associated with individual grammar rules.Grammar  rules  are  also given small integer numbers, leading tosome confusion.  The action                .       reduce 18refers to grammar rule 18, while the action                IF      shift 34refers to state 34.PS1:15-12                     Yacc: Yet Another Compiler-Compiler     Suppose the rule being reduced is        A       :       x  y  z    ;The reduce action depends on the left  hand  symbol  (A  in  thiscase), and the number of symbols on the right hand side (three inthis case).  To reduce, first pop off the top three  states  fromthe  stack  (In  general,  the number of states popped equals thenumber of symbols on the right side of  the  rule).   In  effect,these  states were the ones put on the stack while recognizing x,y, and z, and no longer serve any useful purpose.  After  poppingthese states, a state is uncovered which was the state the parserwas  in  before  beginning  to  process  the  rule.   Using  thisuncovered  state,  and  the  symbol on the left side of the rule,perform what is in effect a shift of A.  A new state is obtained,pushed onto the stack, and parsing continues.  There are signifi-cant differences between the processing of the left  hand  symboland  an  ordinary  shift  of  a token, however, so this action iscalled a goto action.  In  particular,  the  lookahead  token  iscleared  by a shift, and is not affected by a goto.  In any case,the uncovered state contains an entry such as:                A       goto 20causing state 20 to be pushed onto  the  stack,  and  become  thecurrent state.     In effect, the reduce action ``turns back the clock'' in theparse,  popping  the states off the stack to go back to the statewhere the right hand side of the rule was first seen.  The parserthen  behaves  as  if it had seen the left side at that time.  Ifthe right hand side of the rule is empty, no  states  are  poppedoff  of  the  stack:  the  uncovered state is in fact the currentstate.     The reduce action is also  important  in  the  treatment  ofuser-supplied  actions  and  values.  When a rule is reduced, thecode supplied with the rule  is  executed  before  the  stack  isadjusted.   In  addition to the stack holding the states, anotherstack, running in parallel with it,  holds  the  values  returnedfrom  the  lexical  analyzer and the actions.  When a shift takesplace, the external variable yylval  is  copied  onto  the  valuestack.   After  the  return  from the user code, the reduction iscarried out.  When the goto action is done, the external variable

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -