📄 the lemon parser generator.htm
字号:
separate tokens) and it honors the same commenting conventions as C and C++.</P>
<H3>Terminals and Nonterminals</H3>
<P>A terminal symbol (token) is any string of alphanumeric and underscore
characters that begins with an upper case letter. A terminal can contain lower
class letters after the first character, but the usual convention is to make
terminals all upper case. A nonterminal, on the other hand, is any string of
alphanumeric and underscore characters than begins with a lower case letter.
Again, the usual convention is to make nonterminals use all lower case
letters.</P>
<P>In Lemon, terminal and nonterminal symbols do not need to be declared or
identified in a separate section of the grammar file. Lemon is able to generate
a list of all terminals and nonterminals by examining the grammar rules, and it
can always distinguish a terminal from a nonterminal by checking the case of the
first character of the name.</P>
<P>Yacc and bison allow terminal symbols to have either alphanumeric names or to
be individual characters included in single quotes, like this: ')' or '$'. Lemon
does not allow this alternative form for terminal symbols. With Lemon, all
symbols, terminals and nonterminals, must have alphanumeric names.</P>
<H3>Grammar Rules</H3>
<P>The main component of a Lemon grammar file is a sequence of grammar rules.
Each grammar rule consists of a nonterminal symbol followed by the special
symbol ``::='' and then a list of terminals and/or nonterminals. The rule is
terminated by a period. The list of terminals and nonterminals on the right-hand
side of the rule can be empty. Rules can occur in any order, except that the
left-hand side of the first rule is assumed to be the start symbol for the
grammar (unless specified otherwise using the <TT>%start</TT> directive
described below.) A typical sequence of grammar rules might look something like
this: <PRE> expr ::= expr PLUS expr.
expr ::= expr TIMES expr.
expr ::= LPAREN expr RPAREN.
expr ::= VALUE.
</PRE>
<P></P>
<P>There is one non-terminal in this example, ``expr'', and five terminal
symbols or tokens: ``PLUS'', ``TIMES'', ``LPAREN'', ``RPAREN'' and
``VALUE''.</P>
<P>Like yacc and bison, Lemon allows the grammar to specify a block of C code
that will be executed whenever a grammar rule is reduced by the parser. In
Lemon, this action is specified by putting the C code (contained within curly
braces <TT>{...}</TT>) immediately after the period that closes the rule. For
example: <PRE> expr ::= expr PLUS expr. { printf("Doing an addition...\n"); }
</PRE>
<P></P>
<P>In order to be useful, grammar actions must normally be linked to their
associated grammar rules. In yacc and bison, this is accomplished by embedding a
``$$'' in the action to stand for the value of the left-hand side of the rule
and symbols ``$1'', ``$2'', and so forth to stand for the value of the terminal
or nonterminal at position 1, 2 and so forth on the right-hand side of the rule.
This idea is very powerful, but it is also very error-prone. The single most
common source of errors in a yacc or bison grammar is to miscount the number of
symbols on the right-hand side of a grammar rule and say ``$7'' when you really
mean ``$8''.</P>
<P>Lemon avoids the need to count grammar symbols by assigning symbolic names to
each symbol in a grammar rule and then using those symbolic names in the action.
In yacc or bison, one would write this: <PRE> expr -> expr PLUS expr { $$ = $1 + $3; };
</PRE>But in Lemon, the same rule becomes the following: <PRE> expr(A) ::= expr(B) PLUS expr(C). { A = B+C; }
</PRE>In the Lemon rule, any symbol in parentheses after a grammar rule symbol
becomes a place holder for that symbol in the grammar rule. This place holder
can then be used in the associated C action to stand for the value of that
symbol.
<P>
<P>The Lemon notation for linking a grammar rule with its reduce action is
superior to yacc/bison on several counts. First, as mentioned above, the Lemon
method avoids the need to count grammar symbols. Secondly, if a terminal or
nonterminal in a Lemon grammar rule includes a linking symbol in parentheses but
that linking symbol is not actually used in the reduce action, then an error
message is generated. For example, the rule <PRE> expr(A) ::= expr(B) PLUS expr(C). { A = B; }
</PRE>will generate an error because the linking symbol ``C'' is used in the
grammar rule but not in the reduce action.
<P></P>
<P>The Lemon notation for linking grammar rules to reduce actions also
facilitates the use of destructors for reclaiming memory allocated by the values
of terminals and nonterminals on the right-hand side of a rule.</P>
<H3>Precedence Rules</H3>
<P>Lemon resolves parsing ambiguities in exactly the same way as yacc and bison.
A shift-reduce conflict is resolved in favor of the shift, and a reduce-reduce
conflict is resolved by reducing whichever rule comes first in the grammar
file.</P>
<P>Just like in yacc and bison, Lemon allows a measure of control over the
resolution of paring conflicts using precedence rules. A precedence value can be
assigned to any terminal symbol using the %left, %right or %nonassoc directives.
Terminal symbols mentioned in earlier directives have a lower precedence that
terminal symbols mentioned in later directives. For example:</P>
<P><PRE> %left AND.
%left OR.
%nonassoc EQ NE GT GE LT LE.
%left PLUS MINUS.
%left TIMES DIVIDE MOD.
%right EXP NOT.
</PRE>
<P></P>
<P>In the preceding sequence of directives, the AND operator is defined to have
the lowest precedence. The OR operator is one precedence level higher. And so
forth. Hence, the grammar would attempt to group the ambiguous expression <PRE> a AND b OR c
</PRE>like this <PRE> a AND (b OR c).
</PRE>The associativity (left, right or nonassoc) is used to determine the
grouping when the precedence is the same. AND is left-associative in our
example, so <PRE> a AND b AND c
</PRE>is parsed like this <PRE> (a AND b) AND c.
</PRE>The EXP operator is right-associative, though, so <PRE> a EXP b EXP c
</PRE>is parsed like this <PRE> a EXP (b EXP c).
</PRE>The nonassoc precedence is used for non-associative operators. So <PRE> a EQ b EQ c
</PRE>is an error.
<P></P>
<P>The precedence of non-terminals is transferred to rules as follows: The
precedence of a grammar rule is equal to the precedence of the left-most
terminal symbol in the rule for which a precedence is defined. This is normally
what you want, but in those cases where you want to precedence of a grammar rule
to be something different, you can specify an alternative precedence symbol by
putting the symbol in square braces after the period at the end of the rule and
before any C-code. For example:</P>
<P><PRE> expr = MINUS expr. [NOT]
</PRE>
<P></P>
<P>This rule has a precedence equal to that of the NOT symbol, not the MINUS
symbol as would have been the case by default.</P>
<P>With the knowledge of how precedence is assigned to terminal symbols and
individual grammar rules, we can now explain precisely how parsing conflicts are
resolved in Lemon. Shift-reduce conflicts are resolved as follows:
<UL>
<LI>If either the token to be shifted or the rule to be reduced lacks
precedence information, then resolve in favor of the shift, but report a
parsing conflict.
<LI>If the precedence of the token to be shifted is greater than the
precedence of the rule to reduce, then resolve in favor of the shift. No
parsing conflict is reported.
<LI>If the precedence of the token it be shifted is less than the precedence
of the rule to reduce, then resolve in favor of the reduce action. No parsing
conflict is reported.
<LI>If the precedences are the same and the shift token is right-associative,
then resolve in favor of the shift. No parsing conflict is reported.
<LI>If the precedences are the same the the shift token is left-associative,
then resolve in favor of the reduce. No parsing conflict is reported.
<LI>Otherwise, resolve the conflict by doing the shift and report the parsing
conflict. </LI></UL>Reduce-reduce conflicts are resolved this way:
<UL>
<LI>If either reduce rule lacks precedence information, then resolve in favor
of the rule that appears first in the grammar and report a parsing conflict.
<LI>If both rules have precedence and the precedence is different then resolve
the dispute in favor of the rule with the highest precedence and do not report
a conflict.
<LI>Otherwise, resolve the conflict by reducing by the rule that appears first
in the grammar and report a parsing conflict. </LI></UL>
<H3>Special Directives</H3>
<P>The input grammar to Lemon consists of grammar rules and special directives.
We've described all the grammar rules, so now we'll talk about the special
directives.</P>
<P>Directives in lemon can occur in any order. You can put them before the
grammar rules, or after the grammar rules, or in the mist of the grammar rules.
It doesn't matter. The relative order of directives used to assign precedence to
terminals is important, but other than that, the order of directives in Lemon is
arbitrary.</P>
<P>Lemon supports the following special directives:
<UL>
<LI><TT>%destructor</TT>
<LI><TT>%extra_argument</TT>
<LI><TT>%include</TT>
<LI><TT>%left</TT>
<LI><TT>%name</TT>
<LI><TT>%nonassoc</TT>
<LI><TT>%parse_accept</TT>
<LI><TT>%parse_failure </TT>
<LI><TT>%right</TT>
<LI><TT>%stack_overflow</TT>
<LI><TT>%stack_size</TT>
<LI><TT>%start_symbol</TT>
<LI><TT>%syntax_error</TT>
<LI><TT>%token_destructor</TT>
<LI><TT>%token_prefix</TT>
<LI><TT>%token_type</TT>
<LI><TT>%type</TT> </LI></UL>Each of these directives will be described
separately in the following sections:
<P></P>
<H4>The <TT>%destructor</TT> directive</H4>
<P>The %destructor directive is used to specify a destructor for a non-terminal
symbol. (See also the %token_destructor directive which is used to specify a
destructor for terminal symbols.)</P>
<P>A non-terminal's destructor is called to dispose of the non-terminal's value
whenever the non-terminal is popped from the stack. This includes all of the
following circumstances:
<UL>
<LI>When a rule reduces and the value of a non-terminal on the right-hand side
is not linked to C code.
<LI>When the stack is popped during error processing.
<LI>When the ParseFree() function runs. </LI></UL>The destructor can do whatever
it wants with the value of the non-terminal, but its design is to deallocate
memory or other resources held by that non-terminal.
<P></P>
<P>Consider an example: <PRE> %type nt {void*}
%destructor nt { free($$); }
nt(A) ::= ID NUM. { A = malloc( 100 ); }
</PRE>This example is a bit contrived but it serves to illustrate how
destructors work. The example shows a non-terminal named ``nt'' that holds
values of type ``void*''. When the rule for an ``nt'' reduces, it sets the value
of the non-terminal to space obtained from malloc(). Later, when the nt
non-terminal is popped from the stack, the destructor will fire and call free()
on this malloced space, thus avoiding a memory leak. (Note that the symbol
``$$'' in the destructor code is replaced by the value of the non-terminal.)
<P></P>
<P>It is important to note that the value of a non-terminal is passed to the
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -