⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 the lemon parser generator.htm

📁 編譯器的lemon的辭法分析器
💻 HTM
📖 第 1 页 / 共 3 页
字号:
separate tokens) and it honors the same commenting conventions as C and C++.</P>
<H3>Terminals and Nonterminals</H3>
<P>A terminal symbol (token) is any string of alphanumeric and underscore 
characters that begins with an upper case letter. A terminal can contain lower 
class letters after the first character, but the usual convention is to make 
terminals all upper case. A nonterminal, on the other hand, is any string of 
alphanumeric and underscore characters than begins with a lower case letter. 
Again, the usual convention is to make nonterminals use all lower case 
letters.</P>
<P>In Lemon, terminal and nonterminal symbols do not need to be declared or 
identified in a separate section of the grammar file. Lemon is able to generate 
a list of all terminals and nonterminals by examining the grammar rules, and it 
can always distinguish a terminal from a nonterminal by checking the case of the 
first character of the name.</P>
<P>Yacc and bison allow terminal symbols to have either alphanumeric names or to 
be individual characters included in single quotes, like this: ')' or '$'. Lemon 
does not allow this alternative form for terminal symbols. With Lemon, all 
symbols, terminals and nonterminals, must have alphanumeric names.</P>
<H3>Grammar Rules</H3>
<P>The main component of a Lemon grammar file is a sequence of grammar rules. 
Each grammar rule consists of a nonterminal symbol followed by the special 
symbol ``::='' and then a list of terminals and/or nonterminals. The rule is 
terminated by a period. The list of terminals and nonterminals on the right-hand 
side of the rule can be empty. Rules can occur in any order, except that the 
left-hand side of the first rule is assumed to be the start symbol for the 
grammar (unless specified otherwise using the <TT>%start</TT> directive 
described below.) A typical sequence of grammar rules might look something like 
this: <PRE>  expr ::= expr PLUS expr.
  expr ::= expr TIMES expr.
  expr ::= LPAREN expr RPAREN.
  expr ::= VALUE.
</PRE>
<P></P>
<P>There is one non-terminal in this example, ``expr'', and five terminal 
symbols or tokens: ``PLUS'', ``TIMES'', ``LPAREN'', ``RPAREN'' and 
``VALUE''.</P>
<P>Like yacc and bison, Lemon allows the grammar to specify a block of C code 
that will be executed whenever a grammar rule is reduced by the parser. In 
Lemon, this action is specified by putting the C code (contained within curly 
braces <TT>{...}</TT>) immediately after the period that closes the rule. For 
example: <PRE>  expr ::= expr PLUS expr.   { printf("Doing an addition...\n"); }
</PRE>
<P></P>
<P>In order to be useful, grammar actions must normally be linked to their 
associated grammar rules. In yacc and bison, this is accomplished by embedding a 
``$$'' in the action to stand for the value of the left-hand side of the rule 
and symbols ``$1'', ``$2'', and so forth to stand for the value of the terminal 
or nonterminal at position 1, 2 and so forth on the right-hand side of the rule. 
This idea is very powerful, but it is also very error-prone. The single most 
common source of errors in a yacc or bison grammar is to miscount the number of 
symbols on the right-hand side of a grammar rule and say ``$7'' when you really 
mean ``$8''.</P>
<P>Lemon avoids the need to count grammar symbols by assigning symbolic names to 
each symbol in a grammar rule and then using those symbolic names in the action. 
In yacc or bison, one would write this: <PRE>  expr -&gt; expr PLUS expr  { $$ = $1 + $3; };
</PRE>But in Lemon, the same rule becomes the following: <PRE>  expr(A) ::= expr(B) PLUS expr(C).  { A = B+C; }
</PRE>In the Lemon rule, any symbol in parentheses after a grammar rule symbol 
becomes a place holder for that symbol in the grammar rule. This place holder 
can then be used in the associated C action to stand for the value of that 
symbol.
<P>
<P>The Lemon notation for linking a grammar rule with its reduce action is 
superior to yacc/bison on several counts. First, as mentioned above, the Lemon 
method avoids the need to count grammar symbols. Secondly, if a terminal or 
nonterminal in a Lemon grammar rule includes a linking symbol in parentheses but 
that linking symbol is not actually used in the reduce action, then an error 
message is generated. For example, the rule <PRE>  expr(A) ::= expr(B) PLUS expr(C).  { A = B; }
</PRE>will generate an error because the linking symbol ``C'' is used in the 
grammar rule but not in the reduce action.
<P></P>
<P>The Lemon notation for linking grammar rules to reduce actions also 
facilitates the use of destructors for reclaiming memory allocated by the values 
of terminals and nonterminals on the right-hand side of a rule.</P>
<H3>Precedence Rules</H3>
<P>Lemon resolves parsing ambiguities in exactly the same way as yacc and bison. 
A shift-reduce conflict is resolved in favor of the shift, and a reduce-reduce 
conflict is resolved by reducing whichever rule comes first in the grammar 
file.</P>
<P>Just like in yacc and bison, Lemon allows a measure of control over the 
resolution of paring conflicts using precedence rules. A precedence value can be 
assigned to any terminal symbol using the %left, %right or %nonassoc directives. 
Terminal symbols mentioned in earlier directives have a lower precedence that 
terminal symbols mentioned in later directives. For example:</P>
<P><PRE>   %left AND.
   %left OR.
   %nonassoc EQ NE GT GE LT LE.
   %left PLUS MINUS.
   %left TIMES DIVIDE MOD.
   %right EXP NOT.
</PRE>
<P></P>
<P>In the preceding sequence of directives, the AND operator is defined to have 
the lowest precedence. The OR operator is one precedence level higher. And so 
forth. Hence, the grammar would attempt to group the ambiguous expression <PRE>     a AND b OR c
</PRE>like this <PRE>     a AND (b OR c).
</PRE>The associativity (left, right or nonassoc) is used to determine the 
grouping when the precedence is the same. AND is left-associative in our 
example, so <PRE>     a AND b AND c
</PRE>is parsed like this <PRE>     (a AND b) AND c.
</PRE>The EXP operator is right-associative, though, so <PRE>     a EXP b EXP c
</PRE>is parsed like this <PRE>     a EXP (b EXP c).
</PRE>The nonassoc precedence is used for non-associative operators. So <PRE>     a EQ b EQ c
</PRE>is an error.
<P></P>
<P>The precedence of non-terminals is transferred to rules as follows: The 
precedence of a grammar rule is equal to the precedence of the left-most 
terminal symbol in the rule for which a precedence is defined. This is normally 
what you want, but in those cases where you want to precedence of a grammar rule 
to be something different, you can specify an alternative precedence symbol by 
putting the symbol in square braces after the period at the end of the rule and 
before any C-code. For example:</P>
<P><PRE>   expr = MINUS expr.  [NOT]
</PRE>
<P></P>
<P>This rule has a precedence equal to that of the NOT symbol, not the MINUS 
symbol as would have been the case by default.</P>
<P>With the knowledge of how precedence is assigned to terminal symbols and 
individual grammar rules, we can now explain precisely how parsing conflicts are 
resolved in Lemon. Shift-reduce conflicts are resolved as follows: 
<UL>
  <LI>If either the token to be shifted or the rule to be reduced lacks 
  precedence information, then resolve in favor of the shift, but report a 
  parsing conflict. 
  <LI>If the precedence of the token to be shifted is greater than the 
  precedence of the rule to reduce, then resolve in favor of the shift. No 
  parsing conflict is reported. 
  <LI>If the precedence of the token it be shifted is less than the precedence 
  of the rule to reduce, then resolve in favor of the reduce action. No parsing 
  conflict is reported. 
  <LI>If the precedences are the same and the shift token is right-associative, 
  then resolve in favor of the shift. No parsing conflict is reported. 
  <LI>If the precedences are the same the the shift token is left-associative, 
  then resolve in favor of the reduce. No parsing conflict is reported. 
  <LI>Otherwise, resolve the conflict by doing the shift and report the parsing 
  conflict. </LI></UL>Reduce-reduce conflicts are resolved this way: 
<UL>
  <LI>If either reduce rule lacks precedence information, then resolve in favor 
  of the rule that appears first in the grammar and report a parsing conflict. 
  <LI>If both rules have precedence and the precedence is different then resolve 
  the dispute in favor of the rule with the highest precedence and do not report 
  a conflict. 
  <LI>Otherwise, resolve the conflict by reducing by the rule that appears first 
  in the grammar and report a parsing conflict. </LI></UL>
<H3>Special Directives</H3>
<P>The input grammar to Lemon consists of grammar rules and special directives. 
We've described all the grammar rules, so now we'll talk about the special 
directives.</P>
<P>Directives in lemon can occur in any order. You can put them before the 
grammar rules, or after the grammar rules, or in the mist of the grammar rules. 
It doesn't matter. The relative order of directives used to assign precedence to 
terminals is important, but other than that, the order of directives in Lemon is 
arbitrary.</P>
<P>Lemon supports the following special directives: 
<UL>
  <LI><TT>%destructor</TT> 
  <LI><TT>%extra_argument</TT> 
  <LI><TT>%include</TT> 
  <LI><TT>%left</TT> 
  <LI><TT>%name</TT> 
  <LI><TT>%nonassoc</TT> 
  <LI><TT>%parse_accept</TT> 
  <LI><TT>%parse_failure </TT>
  <LI><TT>%right</TT> 
  <LI><TT>%stack_overflow</TT> 
  <LI><TT>%stack_size</TT> 
  <LI><TT>%start_symbol</TT> 
  <LI><TT>%syntax_error</TT> 
  <LI><TT>%token_destructor</TT> 
  <LI><TT>%token_prefix</TT> 
  <LI><TT>%token_type</TT> 
  <LI><TT>%type</TT> </LI></UL>Each of these directives will be described 
separately in the following sections:
<P></P>
<H4>The <TT>%destructor</TT> directive</H4>
<P>The %destructor directive is used to specify a destructor for a non-terminal 
symbol. (See also the %token_destructor directive which is used to specify a 
destructor for terminal symbols.)</P>
<P>A non-terminal's destructor is called to dispose of the non-terminal's value 
whenever the non-terminal is popped from the stack. This includes all of the 
following circumstances: 
<UL>
  <LI>When a rule reduces and the value of a non-terminal on the right-hand side 
  is not linked to C code. 
  <LI>When the stack is popped during error processing. 
  <LI>When the ParseFree() function runs. </LI></UL>The destructor can do whatever 
it wants with the value of the non-terminal, but its design is to deallocate 
memory or other resources held by that non-terminal.
<P></P>
<P>Consider an example: <PRE>   %type nt {void*}
   %destructor nt { free($$); }
   nt(A) ::= ID NUM.   { A = malloc( 100 ); }
</PRE>This example is a bit contrived but it serves to illustrate how 
destructors work. The example shows a non-terminal named ``nt'' that holds 
values of type ``void*''. When the rule for an ``nt'' reduces, it sets the value 
of the non-terminal to space obtained from malloc(). Later, when the nt 
non-terminal is popped from the stack, the destructor will fire and call free() 
on this malloced space, thus avoiding a memory leak. (Note that the symbol 
``$$'' in the destructor code is replaced by the value of the non-terminal.)
<P></P>
<P>It is important to note that the value of a non-terminal is passed to the 

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -