📄 the lemon parser generator.htm

destructor whenever the non-terminal is removed from the stack, unless the 
non-terminal is used in a C-code action. If the non-terminal is used by C-code, 
then it is assumed that the C-code will take care of destroying it if it should 
really be destroyed. More commonly, the value is used to build some larger 
structure and we don't want to destroy it, which is why the destructor is not 
called in this circumstance.</P>
<P>By appropriate use of destructors, it is possible to build a parser using 
Lemon that can be used within a long-running program, such as a GUI, that will 
not leak memory or other resources. To do the same using yacc or bison is much 
more difficult.</P>
<H4>The <TT>%extra_argument</TT> directive</H4>The %extra_argument directive 
instructs Lemon to add a 4th parameter to the parameter list of the Parse() 
function it generates. Lemon doesn't do anything itself with this extra 
argument, but it does make the argument available to C-code action routines, 
destructors, and so forth. For example, if the grammar file contains:
<P></P>
<P><PRE>    %extra_argument { MyStruct *pAbc }
</PRE>
<P></P>
<P>Then the generated Parse() function will have a 4th parameter of type 
``MyStruct*'' and all action routines will have access to a variable named 
``pAbc'' that holds the value of the 4th parameter in the most recent call to 
Parse().</P>
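<P>As a rough illustration of the calling side, the sketch below passes a context 
struct through the 4th parameter. The Parse() shown here is a hand-written 
stand-in, not Lemon's output. The field nTokens and the helper feed_tokens() are 
hypothetical names, and the sketch assumes the default ``int'' token type and 
that token code 0 marks end of input.</P>

```c
#include <assert.h>
#include <stddef.h>

/* Context type named in the %extra_argument example; the field is illustrative. */
typedef struct MyStruct {
    int nTokens;   /* hypothetical: number of tokens seen so far */
} MyStruct;

/* Stand-in for the Parse() function that Lemon would generate.
   It only shows that pAbc is visible to code running inside the parser. */
void Parse(void *yyp, int yymajor, int yyminor, MyStruct *pAbc) {
    (void)yyp;
    (void)yyminor;
    assert(pAbc != NULL);
    if (yymajor != 0) {      /* assumption: token code 0 marks end of input */
        pAbc->nTokens++;
    }
}

/* Hypothetical driver: feed every token code, then the end-of-input marker. */
int feed_tokens(void *parser, const int *codes, size_t n, MyStruct *ctx) {
    for (size_t i = 0; i < n; i++) {
        Parse(parser, codes[i], 0, ctx);
    }
    Parse(parser, 0, 0, ctx);
    return ctx->nTokens;
}
```

<P>Because the context travels with every call, the driver needs no global 
variables to collect results from the actions.</P>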
<H4>The <TT>%include</TT> directive</H4>
<P>The %include directive specifies C code that is included at the top of the 
generated parser. You can include any text you want -- the Lemon parser 
generator copies it blindly. If you have multiple %include directives in your 
grammar file, their values are concatenated before being put at the beginning of 
the generated parser.</P>
<P>The %include directive is very handy for getting some extra #include 
preprocessor statements at the beginning of the generated parser. For 
example:</P>
<P><PRE>   %include {#include &lt;unistd.h&gt;}
</PRE>
<P></P>
<P>This might be needed, for example, if some of the C actions in the grammar 
call functions that are prototyped in unistd.h.</P>
<H4>The <TT>%left</TT> directive</H4>The %left directive is used (along with the 
%right and %nonassoc directives) to declare precedences of terminal symbols. 
Every terminal symbol whose name appears after a %left directive but before the 
next period (``.'') is given the same left-associative precedence value. 
Subsequent %left directives have higher precedence. For example:
<P></P>
<P><PRE>   %left AND.
   %left OR.
   %nonassoc EQ NE GT GE LT LE.
   %left PLUS MINUS.
   %left TIMES DIVIDE MOD.
   %right EXP NOT.
</PRE>
<P></P>
<P>Note the period that terminates each %left, %right or %nonassoc 
directive.</P>
<P>LALR(1) grammars can get into a situation where they require a large amount 
of stack space if you make heavy use of right-associative operators. For this 
reason, it is recommended that you use %left rather than %right whenever 
possible.</P>
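<P>The effect of these declarations can be pictured as a small table lookup. The 
sketch below encodes a subset of the example tokens and applies the usual 
yacc-style rule for a shift-reduce conflict: compare the precedence of the 
operator in the rule with that of the lookahead token, and fall back on 
associativity when they are equal. All type and function names are hypothetical, 
and this is a simplified model rather than Lemon's generated code.</P>

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR } Action;
typedef struct { int prec; Assoc assoc; } Prec;

/* A subset of the tokens from the example; later directives get higher numbers. */
enum { T_AND, T_OR, T_EQ, T_PLUS, T_TIMES, T_EXP, T_COUNT };

static const Prec prec_table[T_COUNT] = {
    { 1, ASSOC_LEFT  },   /* %left AND.            */
    { 2, ASSOC_LEFT  },   /* %left OR.             */
    { 3, ASSOC_NONE  },   /* %nonassoc EQ ...      */
    { 4, ASSOC_LEFT  },   /* %left PLUS MINUS.     */
    { 5, ASSOC_LEFT  },   /* %left TIMES DIVIDE... */
    { 6, ASSOC_RIGHT }    /* %right EXP NOT.       */
};

/* Decide between shifting the lookahead and reducing the rule whose
   precedence comes from rule_op. */
Action resolve(int rule_op, int lookahead) {
    assert(rule_op >= 0 && rule_op < T_COUNT);
    assert(lookahead >= 0 && lookahead < T_COUNT);
    Prec r = prec_table[rule_op];
    Prec l = prec_table[lookahead];
    if (l.prec > r.prec) return ACT_SHIFT;    /* lookahead binds tighter */
    if (l.prec < r.prec) return ACT_REDUCE;   /* rule binds tighter */
    if (r.assoc == ASSOC_LEFT)  return ACT_REDUCE;
    if (r.assoc == ASSOC_RIGHT) return ACT_SHIFT;
    return ACT_ERROR;                          /* non-associative */
}
```

<P>Under this model a right-associative operator shifts on a tie, which is why 
a long chain of such operators accumulates on the stack before anything is 
reduced.</P>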
<H4>The <TT>%name</TT> directive</H4>
<P>By default, the functions generated by Lemon all begin with the 
five-character string ``Parse''. You can change this string to something 
different using the %name directive. For instance:</P>
<P><PRE>   %name Abcde
</PRE>
<P></P>
<P>Putting this directive in the grammar file will cause Lemon to generate 
functions named 
<UL>
  <LI>AbcdeAlloc(), 
  <LI>AbcdeFree(), 
  <LI>AbcdeTrace(), and 
  <LI>Abcde(). </LI></UL>The %name directive allows you to generate two or more 
different parsers and link them all into the same executable. 
<P></P>
<H4>The <TT>%nonassoc</TT> directive</H4>
<P>This directive is used to assign non-associative precedence to one or more 
terminal symbols. See the section on precedence rules or on the %left directive 
for additional information.</P>
<H4>The <TT>%parse_accept</TT> directive</H4>
<P>The %parse_accept directive specifies a block of C code that is executed 
whenever the parser accepts its input string. To ``accept'' an input string 
means that the parser was able to process all tokens without error.</P>
<P>For example:</P>
<P><PRE>   %parse_accept {
      printf("parsing complete!\n");
   }
</PRE>
<P></P>
<H4>The <TT>%parse_failure</TT> directive</H4>
<P>The %parse_failure directive specifies a block of C code that is executed 
whenever the parser fails completely. This code is not executed until the parser 
has tried and failed to resolve an input error using its usual error recovery 
strategy. The routine is only invoked when parsing is unable to continue.</P>
<P><PRE>   %parse_failure {
     fprintf(stderr,"Giving up.  Parser is hopelessly lost...\n");
   }
</PRE>
<P></P>
<H4>The <TT>%right</TT> directive</H4>
<P>This directive is used to assign right-associative precedence to one or more 
terminal symbols. See the section on precedence rules or on the %left directive 
for additional information.</P>
<H4>The <TT>%stack_overflow</TT> directive</H4>
<P>The %stack_overflow directive specifies a block of C code that is executed if 
the parser's internal stack ever overflows. Typically this just prints an error 
message. After a stack overflow, the parser will be unable to continue and must 
be reset.</P>
<P><PRE>   %stack_overflow {
     fprintf(stderr,"Giving up.  Parser stack overflow\n");
   }
</PRE>
<P></P>
<P>You can help prevent parser stack overflows by avoiding the use of right 
recursion and right-precedence operators in your grammar. Use left recursion and 
left-precedence operators instead, to encourage rules to reduce sooner and 
keep the stack size down. For example, write rules like this: <PRE>   list ::= list element.      // left-recursion.  Good!
   list ::= .
</PRE>Not like this: <PRE>   list ::= element list.      // right-recursion.  Bad!
   list ::= .
</PRE>
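<P>To see why the two forms differ, the sketch below counts how many grammar 
symbols sit on the stack at the worst moment while parsing a list of n elements 
with each pair of rules. It is a simplified model: it counts only the list and 
element symbols and ignores any bookkeeping entries a real parser keeps. The 
function names are hypothetical.</P>

```c
#include <assert.h>
#include <stddef.h>

/* Peak stack depth for:  list ::= list element.   list ::= .  */
size_t peak_depth_left(size_t n) {
    size_t depth = 0, peak = 0;
    depth++;                        /* reduce the empty rule: push list */
    if (depth > peak) peak = depth;
    for (size_t i = 0; i < n; i++) {
        depth++;                    /* shift element */
        if (depth > peak) peak = depth;
        assert(depth >= 2);
        depth -= 2;                 /* reduce: pop list and element */
        depth++;                    /* push the new list */
    }
    return peak;
}

/* Peak stack depth for:  list ::= element list.   list ::= .  */
size_t peak_depth_right(size_t n) {
    size_t depth = 0, peak = 0;
    for (size_t i = 0; i < n; i++) {
        depth++;                    /* shift element; nothing can reduce yet */
        if (depth > peak) peak = depth;
    }
    depth++;                        /* end of input: reduce the empty rule */
    if (depth > peak) peak = depth;
    for (size_t i = 0; i < n; i++) {
        assert(depth >= 2);
        depth -= 2;                 /* reduce: pop element and list */
        depth++;                    /* push the new list */
    }
    return peak;
}
```

<P>In this model the left-recursive form never holds more than two symbols, 
while the right-recursive form holds n+1, so its stack use grows with the 
length of the input.</P>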
<H4>The <TT>%stack_size</TT> directive</H4>
<P>If stack overflow is a problem and you can't resolve the trouble by using 
left-recursion, then you might want to increase the size of the parser's stack 
using this directive. Put a positive integer after the %stack_size directive 
and Lemon will generate a parser with a stack of the requested size. The default 
value is 100.</P>
<P><PRE>   %stack_size 2000
</PRE>
<P></P>
<H4>The <TT>%start_symbol</TT> directive</H4>
<P>By default, the start-symbol for the grammar that Lemon generates is the 
first non-terminal that appears in the grammar file. But you can choose a 
different start-symbol using the %start_symbol directive.</P>
<P><PRE>   %start_symbol  prog
</PRE>
<P></P>
<H4>The <TT>%token_destructor</TT> directive</H4>
<P>The %destructor directive assigns a destructor to a non-terminal symbol. (See 
the description of the %destructor directive above.) This directive does the 
same thing for all terminal symbols.</P>
<P>Unlike non-terminal symbols which may each have a different data type for 
their values, terminals all use the same data type (defined by the %token_type 
directive) and so they use a common destructor. Other than that, the token 
destructor works just like the non-terminal destructors.</P>
<H4>The <TT>%token_prefix</TT> directive</H4>
<P>Lemon generates #defines that assign small integer constants to each terminal 
symbol in the grammar. If desired, Lemon will add a prefix specified by this 
directive to each of the #defines it generates. So if the default output of 
Lemon looked like this: <PRE>    #define AND              1
    #define MINUS            2
    #define OR               3
    #define PLUS             4
</PRE>You can insert a statement into the grammar like this: <PRE>    %token_prefix    TOKEN_
</PRE>to cause Lemon to produce these symbols instead: <PRE>    #define TOKEN_AND        1
    #define TOKEN_MINUS      2
    #define TOKEN_OR         3
    #define TOKEN_PLUS       4
</PRE>
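<P>A tokenizer would then refer to the prefixed names. The sketch below repeats 
the four #defines from the example so that it stands alone. The helpers 
token_for_char() and token_name(), and the choice of operator characters, are 
hypothetical.</P>

```c
#include <assert.h>

/* In a real build these would come from the header that Lemon writes. */
#define TOKEN_AND        1
#define TOKEN_MINUS      2
#define TOKEN_OR         3
#define TOKEN_PLUS       4

/* Hypothetical helper: map an operator character to its token code,
   or return 0 when the character is not one of the four operators. */
int token_for_char(char c) {
    switch (c) {
        case '&': return TOKEN_AND;
        case '-': return TOKEN_MINUS;
        case '|': return TOKEN_OR;
        case '+': return TOKEN_PLUS;
        default:  return 0;
    }
}

/* Hypothetical helper: printable name of a token code, for diagnostics. */
const char *token_name(int code) {
    static const char *const names[] = { "?", "AND", "MINUS", "OR", "PLUS" };
    assert(code >= TOKEN_AND && code <= TOKEN_PLUS);
    return names[code];
}
```

<P>The prefix keeps short names such as AND or PLUS from colliding with macros 
or identifiers defined elsewhere in the program.</P>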
<H4>The <TT>%token_type</TT> and <TT>%type</TT> directives</H4>
<P>These directives are used to specify the data types for values on the 
parser's stack associated with terminal and non-terminal symbols. The values of 
all terminal symbols must be of the same type. This turns out to be the same 
data type as the 3rd parameter to the Parse() function generated by Lemon. 
Typically, you will make the value of a terminal symbol be a pointer to some 
kind of token structure. Like this:</P>
<P><PRE>   %token_type    {Token*}
</PRE>
<P></P>
<P>If the data type of terminals is not specified, the default value is 
``int''.</P>
<P>Non-terminal symbols can each have their own data types. Typically the data 
type of a non-terminal is a pointer to the root of a parse-tree structure that 
contains all information about that non-terminal. For example:</P>
<P><PRE>   %type   expr  {Expr*}
</PRE>
<P></P>
<P>Each entry on the parser's stack is actually a union containing instances of 
all data types for every non-terminal and terminal symbol. Lemon will 
automatically use the correct element of this union depending on what the 
corresponding non-terminal or terminal symbol is. But the grammar designer 
should keep in mind that the size of the union will be the size of its largest 
element. So if you have a single non-terminal whose data type requires 1K of 
storage, then your 100 entry parser stack will require 100K of heap space. If 
you are willing and able to pay that price, fine. You just need to know.</P>
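<P>The cost can be checked with sizeof. In the sketch below the union stands in 
for a stack entry. The names StackValue, BigNode and stack_bytes() are 
hypothetical and are not the names Lemon generates. BigNode plays the part of 
the single non-terminal whose value needs 1K of storage.</P>

```c
#include <assert.h>
#include <stddef.h>

typedef struct Token Token;   /* opaque, as in %token_type {Token*} */
typedef struct Expr Expr;     /* opaque, as in %type expr {Expr*}   */

/* Hypothetical non-terminal value that is stored by value and needs 1K. */
typedef struct BigNode {
    char payload[1024];
} BigNode;

/* Stand-in for one stack entry: one member per distinct data type. */
typedef union StackValue {
    Token  *tok;
    Expr   *expr;
    BigNode big;
} StackValue;

/* Storage needed for the values of a stack with the given number of entries. */
size_t stack_bytes(size_t entries) {
    assert(entries > 0);
    return entries * sizeof(StackValue);
}
```

<P>Storing a pointer to the large structure instead of the structure itself 
would shrink every entry to the size of a pointer.</P>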
<H3>Error Processing</H3>
<P>After extensive experimentation over several years, it has been discovered 
that the error recovery strategy used by yacc is about as good as it gets. And 
so that is what Lemon uses.</P>
<P>When a Lemon-generated parser encounters a syntax error, it first invokes the 
code specified by the %syntax_error directive, if any. It then enters its error 
recovery strategy. The error recovery strategy is to begin popping the parser's 
stack until it enters a state where it is permitted to shift a special 
non-terminal symbol named ``error''. It then shifts this non-terminal and 
continues parsing. But the %syntax_error routine will not be called again until 
at least three new tokens have been successfully shifted.</P>
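<P>The three-token rule can be modelled with a small counter. The sketch below is 
a simplified model of that behaviour, not Lemon's actual code. ErrState, 
on_shift() and on_error() are hypothetical names, and the model assumes that a 
suppressed error also restarts the quiet period, as in yacc.</P>

```c
#include <assert.h>
#include <stddef.h>

typedef struct ErrState {
    int quiet;     /* shifts still required before errors are reported again */
    int reports;   /* how many times the %syntax_error code would have run   */
} ErrState;

/* Call after each token is shifted successfully. */
void on_shift(ErrState *s) {
    assert(s != NULL);
    if (s->quiet > 0) s->quiet--;
}

/* Call when a syntax error is detected. */
void on_error(ErrState *s) {
    assert(s != NULL);
    if (s->quiet == 0) s->reports++;   /* report only outside the quiet period */
    s->quiet = 3;                      /* require three clean shifts from here */
}
```

<P>The effect is that a burst of errors caused by one mistake produces a single 
message instead of a cascade.</P>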
<P>If the parser pops its stack until the stack is empty, and it still is unable 
to shift the error symbol, then the %parse_failure routine is invoked and the 
parser resets itself to its start state, ready to begin parsing a new file. This 
is what will happen at the very first syntax error, of course, if there are no 
instances of the ``error'' non-terminal in your grammar.</P></BODY></HTML>