destructor whenever the non-terminal is removed from the stack, unless the
non-terminal is used in a C-code action. If the non-terminal is used by C-code,
then it is assumed that the C-code will take care of destroying it if it should
really be destroyed. More commonly, the value is used to build some larger
structure and we don't want to destroy it, which is why the destructor is not
called in this circumstance.</P>
<P>By appropriate use of destructors, it is possible to build a parser using
Lemon that can be used within a long-running program, such as a GUI, that will
not leak memory or other resources. To do the same using yacc or bison is much
more difficult.</P>
<H4>The <TT>%extra_argument</TT> directive</H4>The %extra_argument directive
instructs Lemon to add a 4th parameter to the parameter list of the Parse()
function it generates. Lemon doesn't do anything itself with this extra
argument, but it does make the argument available to C-code action routines,
destructors, and so forth. For example, if the grammar file contains:
<P></P>
<P><PRE> %extra_argument { MyStruct *pAbc }
</PRE>
<P></P>
<P>Then the Parse() function generated will have a 4th parameter of type
``MyStruct*'' and all action routines will have access to a variable named
``pAbc'' that is the value of the 4th parameter in the most recent call to
Parse().</P>
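<P>As a rough illustration of how the extra argument is threaded through successive calls, here is a minimal sketch. The Parse() body below is a hypothetical hand-written stub standing in for the Lemon-generated function; only its four-parameter shape follows the text above, and only the convention that token code 0 signals end of input is taken from Lemon. The MyStruct fields (nTokens, sum) and the helper parse_all() are assumptions made for this example.</P>

```c
#include <stddef.h>

/* Hypothetical application state passed as the %extra_argument. */
typedef struct MyStruct {
  int nTokens;   /* number of real tokens seen so far */
  int sum;       /* running total of token values */
} MyStruct;

/* Hypothetical stand-in for the Lemon-generated Parse(). The real one
** drives the LALR(1) tables; this stub only shows that the 4th
** parameter (pAbc) is available to code that would run in actions. */
void Parse(void *pParser, int tokenCode, int tokenValue, MyStruct *pAbc){
  (void)pParser;                 /* unused by the stub */
  if( tokenCode==0 ) return;     /* code 0 marks end of input */
  pAbc->nTokens++;
  pAbc->sum += tokenValue;       /* what an action might do with pAbc */
}

/* Feed (code, value) pairs through Parse(), passing the same MyStruct
** pointer on every call, then signal end of input with code 0. */
void parse_all(void *pParser, const int *codes, const int *values,
               size_t n, MyStruct *pAbc){
  for(size_t i=0; i<n; i++){
    Parse(pParser, codes[i], values[i], pAbc);
  }
  Parse(pParser, 0, 0, pAbc);
}
```

<P>Because the same pointer is passed on every call, the actions always see the caller's current state without any global variables.</P>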
<H4>The <TT>%include</TT> directive</H4>
<P>The %include directive specifies C code that is included at the top of the
generated parser. You can include any text you want; the Lemon parser
generator copies it blindly. If you have multiple %include directives in your
grammar file, their values are concatenated before being put at the beginning of
the generated parser.</P>
<P>The %include directive is very handy for getting some extra #include
preprocessor statements at the beginning of the generated parser. For
example:</P>
<P><PRE> %include {#include <unistd.h>}
</PRE>
<P></P>
<P>This might be needed, for example, if some of the C actions in the grammar
call functions that are prototyped in unistd.h.</P>
<H4>The <TT>%left</TT> directive</H4>The %left directive is used (along with the
%right and %nonassoc directives) to declare precedences of terminal symbols.
Every terminal symbol whose name appears after a %left directive but before the
next period (``.'') is given the same left-associative precedence value.
Each subsequent %left, %right, or %nonassoc directive assigns a higher precedence
than the ones before it. For example:
<P></P>
<P><PRE> %left AND.
%left OR.
%nonassoc EQ NE GT GE LT LE.
%left PLUS MINUS.
%left TIMES DIVIDE MOD.
%right EXP NOT.
</PRE>
<P></P>
<P>Note the period that terminates each %left, %right or %nonassoc
directive.</P>
<P>LALR(1) grammars can get into a situation where they require a large amount
of stack space if you make heavy use of right-associative operators. For this
reason, it is recommended that you use %left rather than %right whenever
possible.</P>
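<P>To see what these declarations ask for, here is a small expression evaluator that uses the same ordering for PLUS/MINUS, TIMES/DIVIDE and EXP. Note that it uses precedence climbing, a different technique from the one Lemon uses (Lemon applies precedences to resolve conflicts in its LALR(1) tables); it is only meant to show the grouping that left and right associativity produce. The token codes, the Tok struct and the function names are hypothetical, and MOD, NOT, the comparison operators, AND and OR are omitted to keep the sketch short.</P>

```c
#include <stddef.h>

/* Hypothetical token codes, chosen for this sketch only. */
enum { NUM, PLUS, MINUS, TIMES, DIVIDE, EXP, END };

typedef struct { int code; long value; } Tok;

/* Precedence levels mirroring the directive order above: a later
** directive gets a larger number, i.e. binds tighter. */
static int prec(int code){
  switch( code ){
    case PLUS:  case MINUS:  return 1;
    case TIMES: case DIVIDE: return 2;
    case EXP:                return 3;
    default:                 return 0;   /* not a binary operator */
  }
}
static int right_assoc(int code){ return code==EXP; }

static long ipow(long b, long e){ long r = 1; while( e-- > 0 ) r *= b; return r; }

static long apply(int op, long a, long b){
  switch( op ){
    case PLUS:   return a + b;
    case MINUS:  return a - b;
    case TIMES:  return a * b;
    case DIVIDE: return a / b;
    default:     return ipow(a, b);       /* EXP */
  }
}

/* Consume operators whose precedence is >= minPrec; *pos is the index
** of the next unread token. Each operand is assumed to be a NUM. */
static long climb(const Tok *t, size_t *pos, int minPrec){
  long lhs = t[(*pos)++].value;
  for(;;){
    int op = t[*pos].code;
    int p = prec(op);
    if( p==0 || p<minPrec ) return lhs;
    (*pos)++;
    /* Left-associative: the right operand may only contain tighter
    ** operators (p+1). Right-associative: it may repeat this one (p). */
    long rhs = climb(t, pos, right_assoc(op) ? p : p+1);
    lhs = apply(op, lhs, rhs);
  }
}

/* Evaluate an END-terminated token array. */
long eval(const Tok *t){
  size_t pos = 0;
  return climb(t, &pos, 1);
}
```

<P>With this table, 10 - 3 - 2 groups as (10 - 3) - 2 and evaluates to 5, while 2 EXP 3 EXP 2 groups as 2 EXP (3 EXP 2) and evaluates to 512.</P>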
<H4>The <TT>%name</TT> directive</H4>
<P>By default, the functions generated by Lemon all begin with the
five-character string ``Parse''. You can change this string to something
different using the %name directive. For instance:</P>
<P><PRE> %name Abcde
</PRE>
<P></P>
<P>Putting this directive in the grammar file will cause Lemon to generate
functions named
<UL>
<LI>AbcdeAlloc(),
<LI>AbcdeFree(),
<LI>AbcdeTrace(), and
<LI>Abcde(). </LI></UL>The %name directive allows you to generate two or more
different parsers and link them all into the same executable.
<P></P>
<H4>The <TT>%nonassoc</TT> directive</H4>
<P>This directive is used to assign non-associative precedence to one or more
terminal symbols. See the section on precedence rules or on the %left directive
for additional information.</P>
<H4>The <TT>%parse_accept</TT> directive</H4>
<P>The %parse_accept directive specifies a block of C code that is executed
whenever the parser accepts its input string. To ``accept'' an input string
means that the parser was able to process all tokens without error.</P>
<P>For example:</P>
<P><PRE> %parse_accept {
printf("parsing complete!\n");
}
</PRE>
<P></P>
<H4>The <TT>%parse_failure</TT> directive</H4>
<P>The %parse_failure directive specifies a block of C code that is executed
whenever the parser fails completely. This code is not executed until the parser
has tried and failed to resolve an input error using its usual error recovery
strategy. The routine is only invoked when parsing is unable to continue.</P>
<P><PRE> %parse_failure {
fprintf(stderr,"Giving up. Parser is hopelessly lost...\n");
}
</PRE>
<P></P>
<H4>The <TT>%right</TT> directive</H4>
<P>This directive is used to assign right-associative precedence to one or more
terminal symbols. See the section on precedence rules or on the %left directive
for additional information.</P>
<H4>The <TT>%stack_overflow</TT> directive</H4>
<P>The %stack_overflow directive specifies a block of C code that is executed if
the parser's internal stack ever overflows. Typically this just prints an error
message. After a stack overflow, the parser will be unable to continue and must
be reset.</P>
<P><PRE> %stack_overflow {
fprintf(stderr,"Giving up. Parser stack overflow\n");
}
</PRE>
<P></P>
<P>You can help prevent parser stack overflows by avoiding the use of right
recursion and right-associative operators in your grammar. Use left recursion
and left-associative operators instead, to encourage rules to reduce sooner and
keep the stack size down. For example, write rules like this: <PRE> list ::= list element. // left-recursion. Good!
list ::= .
</PRE>Not like this: <PRE> list ::= element list. // right-recursion. Bad!
list ::= .
</PRE>
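<P>The difference between the two list grammars can be made concrete with a simplified model that counts only how many grammar symbols sit on the parser stack while a list of n elements is read. This is not Lemon's generated code: it ignores parser states and the bottom-of-stack entry, and the function names are hypothetical. It encodes the usual LALR(1) behavior for these rules: with left recursion the empty list is reduced first and each element is reduced into it right away, while with right recursion every element must be shifted before the first reduction can happen.</P>

```c
#include <stddef.h>

/* Simplified model: track only the number of symbols on the stack. */
static size_t depth, maxDepth;
static void push(void){ if( ++depth > maxDepth ) maxDepth = depth; }
static void pop(size_t k){ depth -= k; }

/* list ::= list element.   list ::= .   (left recursion) */
size_t max_depth_left(size_t n){
  depth = maxDepth = 0;
  push();                        /* reduce "list ::= ." : push list */
  for(size_t i=0; i<n; i++){
    push();                      /* shift element */
    pop(2); push();              /* reduce "list ::= list element." */
  }
  return maxDepth;
}

/* list ::= element list.   list ::= .   (right recursion) */
size_t max_depth_right(size_t n){
  depth = maxDepth = 0;
  for(size_t i=0; i<n; i++) push();   /* shift every element first */
  push();                             /* reduce "list ::= ." at end of input */
  for(size_t i=0; i<n; i++){
    pop(2); push();                   /* reduce "list ::= element list." */
  }
  return maxDepth;
}
```

<P>In this model, max_depth_left(1000) returns 2 while max_depth_right(1000) returns 1001, so the right-recursive form needs stack space proportional to the length of the list.</P>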
<H4>The <TT>%stack_size</TT> directive</H4>
<P>If stack overflow is a problem and you can't resolve the trouble by using
left-recursion, then you might want to increase the size of the parser's stack
using this directive. Put a positive integer after the %stack_size directive
and Lemon will generate a parser with a stack of the requested size. The default
value is 100.</P>
<P><PRE> %stack_size 2000
</PRE>
<P></P>
<H4>The <TT>%start_symbol</TT> directive</H4>
<P>By default, the start-symbol for the grammar that Lemon generates is the
first non-terminal that appears in the grammar file. But you can choose a
different start-symbol using the %start_symbol directive.</P>
<P><PRE> %start_symbol prog
</PRE>
<P></P>
<H4>The <TT>%token_destructor</TT> directive</H4>
<P>The %destructor directive assigns a destructor to a non-terminal symbol. (See
the description of the %destructor directive above.) This directive does the
same thing for all terminal symbols.</P>
<P>Unlike non-terminal symbols, which may each have a different data type for
their values, terminals all use the same data type (defined by the %token_type
directive) and so they use a common destructor. Other than that, the token
destructor works just like the non-terminal destructors.</P>
<H4>The <TT>%token_prefix</TT> directive</H4>
<P>Lemon generates #defines that assign small integer constants to each terminal
symbol in the grammar. If desired, Lemon will add a prefix specified by this
directive to each of the #defines it generates. So if the default output of
Lemon looked like this: <PRE> #define AND 1
#define MINUS 2
#define OR 3
#define PLUS 4
</PRE>You can insert a statement into the grammar like this: <PRE> %token_prefix TOKEN_
</PRE>to cause Lemon to produce these symbols instead: <PRE> #define TOKEN_AND 1
#define TOKEN_MINUS 2
#define TOKEN_OR 3
#define TOKEN_PLUS 4
</PRE>
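<P>Hand-written code such as the tokenizer then refers to the prefixed names. In a real project these #defines would come from the header file Lemon writes alongside the parser; they are repeated here, with the values from the example above, only so that the sketch compiles on its own. The function char_to_token() and its character mapping are hypothetical.</P>

```c
/* Prefixed token codes, copied from the example output above. */
#define TOKEN_AND   1
#define TOKEN_MINUS 2
#define TOKEN_OR    3
#define TOKEN_PLUS  4

/* Hypothetical single-character lexer: map a character to the token
** code to pass to the parser, or -1 if it is not a known token. */
int char_to_token(char c){
  switch( c ){
    case '&': return TOKEN_AND;
    case '-': return TOKEN_MINUS;
    case '|': return TOKEN_OR;
    case '+': return TOKEN_PLUS;
    default:  return -1;
  }
}
```

<P>A prefix like TOKEN_ also makes it less likely that short names such as PLUS or OR collide with macros or identifiers defined elsewhere in the program.</P>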
<H4>The <TT>%token_type</TT> and <TT>%type</TT> directives</H4>
<P>These directives are used to specify the data types for values on the
parser's stack associated with terminal and non-terminal symbols. The values of
all terminal symbols must be of the same type. This turns out to be the same
data type as the 3rd parameter to the Parse() function generated by Lemon.
Typically, you will make the value of a terminal symbol be a pointer to some
kind of token structure. Like this:</P>
<P><PRE> %token_type {Token*}
</PRE>
<P></P>
<P>If the data type of terminals is not specified, the default value is
``int''.</P>
<P>Non-terminal symbols can each have their own data types. Typically the data
type of a non-terminal is a pointer to the root of a parse-tree structure that
contains all information about that non-terminal. For example:</P>
<P><PRE> %type expr {Expr*}
</PRE>
<P></P>
<P>Each entry on the parser's stack is actually a union containing instances of
all data types for every non-terminal and terminal symbol. Lemon will
automatically use the correct element of this union depending on what the
corresponding non-terminal or terminal symbol is. But the grammar designer
should keep in mind that the size of the union will be the size of its largest
element. So if you have a single non-terminal whose data type requires 1K of
storage, then your 100-entry parser stack will require 100K of heap space. If
you are willing and able to pay that price, fine. You just need to know.</P>
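<P>The arithmetic can be checked with sizeof. The union below is an illustrative model of the value part of one stack entry, not the declaration Lemon actually emits; the Token, Expr and BigValue types, the member names, and the STACK_SIZE macro are assumptions for this sketch. Real stack entries most likely also store bookkeeping such as the parser state, so the true total is somewhat larger than what this computes.</P>

```c
#include <stddef.h>

/* Hypothetical types standing in for %token_type and %type choices. */
typedef struct Token { const char *z; int n; } Token;
typedef struct Expr Expr;                             /* opaque tree node */
typedef struct BigValue { char buf[1024]; } BigValue; /* a 1K value */

/* Illustrative model of one stack entry's value: a union with one
** member per distinct data type used by the grammar. */
typedef union StackValue {
  Token   *yy0;    /* %token_type {Token*} */
  Expr    *yy1;    /* %type expr {Expr*}   */
  BigValue yy2;    /* one non-terminal whose value needs 1K */
} StackValue;

#define STACK_SIZE 100   /* the default %stack_size */

/* Bytes needed for the values alone on a STACK_SIZE-entry stack:
** every entry is as large as the biggest union member. */
size_t stack_value_bytes(void){
  return STACK_SIZE * sizeof(StackValue);
}
```

<P>Because sizeof(StackValue) is at least 1024, stack_value_bytes() comes to at least 102400 bytes, even though most entries hold only an 8-byte pointer.</P>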
<H3>Error Processing</H3>
<P>After extensive experimentation over several years, it has been discovered
that the error recovery strategy used by yacc is about as good as it gets. And
so that is what Lemon uses.</P>
<P>When a Lemon-generated parser encounters a syntax error, it first invokes the
code specified by the %syntax_error directive, if any. It then enters its error
recovery strategy. The error recovery strategy is to begin popping the parser's
stack until it enters a state where it is permitted to shift a special
non-terminal symbol named ``error''. It then shifts this non-terminal and
continues parsing. But the %syntax_error routine will not be called again until
at least three new tokens have been successfully shifted.</P>
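<P>The two rules in the paragraph above can be modeled in a few lines. This is a simplified model of the described behavior, not Lemon's generated code. Two points are assumptions: the three-token count is taken to restart at every syntax error, including one whose report was suppressed, and each parser state is given a hypothetical flag saying whether it can shift the error symbol. The names Event, count_reported_errors() and pop_to_error_state() are made up for this sketch.</P>

```c
#include <stdbool.h>
#include <stddef.h>

/* Events fed to the model: a token was shifted, or a syntax error hit. */
typedef enum { EV_SHIFT, EV_SYNTAX_ERROR } Event;

/* How many times the %syntax_error code would run: the first error is
** reported, later ones only after >= 3 shifts since the last error. */
int count_reported_errors(const Event *ev, size_t n){
  int reported = 0;
  int shiftsSinceError = 3;      /* no recent error at the start */
  for(size_t i=0; i<n; i++){
    if( ev[i]==EV_SHIFT ){
      shiftsSinceError++;
    }else{
      if( shiftsSinceError>=3 ) reported++;
      shiftsSinceError = 0;      /* assumed: every error restarts the count */
    }
  }
  return reported;
}

/* Pop states until the top one can shift "error". stack[0..height-1]
** holds state numbers; can_shift_error[s] is the hypothetical flag for
** state s. Returns the new height; 0 means recovery failed. */
size_t pop_to_error_state(const int *stack, size_t height,
                          const bool *can_shift_error){
  while( height>0 && !can_shift_error[stack[height-1]] ) height--;
  return height;
}
```

<P>For the event sequence error, shift, error, shift, shift, shift, error, count_reported_errors() returns 2: the second error falls inside the quiet period and is not reported.</P>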
<P>If the parser pops its stack until the stack is empty, and it still is unable
to shift the error symbol, then the %parse_failure routine is invoked and the
parser resets itself to its start state, ready to begin parsing a new file. This
is what will happen at the very first syntax error, of course, if there are no
instances of the ``error'' non-terminal in your grammar.</P></BODY></HTML>