📄 readme_lemon_tutorial.htm

📁 学习lemon语法分析的windows程序
💻 HTM
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0060)http://souptonuts.sourceforge.net/readme_lemon_tutorial.html -->
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<STYLE type=text/css>PRE {
	
}
PRE {
	FONT-SIZE: 12px; BACKGROUND-COLOR: #ddd
}
</STYLE>

<META content="MSHTML 6.00.2800.1106" name=GENERATOR></HEAD>
<BODY>
<TABLE cellSpacing=5 cellPadding=5 width="80%" border=0>
  <TBODY>
  <TR>
    <TD>
      <H1>Lemon Parser Generator Tutorial</H1>
      <P>Copyright (c) 2004 by Mike Chirico mchirico@users.sourceforge.net. This 
      material may be distributed only subject to the terms and conditions set 
      forth in the Open Publication <A 
      href="http://www.opencontent.org/openpub/">License</A> v1.0, 8 June 1999 
      or later. 
      <P>Download: <A 
      href="http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download">http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download</A> 

      <P>Lemon is a compact, thread safe, well-tested parser generator written 
      by Dr. Richard Hipp. Using a parser generator, along with a scanner like 
      flex, can be advantageous because there is less code to write. You just 
      write the grammar for the parser. 
      <P>
      <HR>

      <P>
      <H2>Example 1: Getting started</H2>
      <P>Compare this to writing all of the raw code from scratch. For instance, 
      compare the basic C++ <A 
      href="http://souptonuts.sourceforge.net/code/desktop_calc.cc.html">desktop 
      calculator</A> to the file below. Below is a syntax file which contains 
      the grammar that the parser uses. "example1.y" is an example syntax file 
      for a very basic calculator. 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/example1.y.html">example1.y</A> 

      <P><PRE><CODE>
    1  %token_type {int}
    2
    3  %left PLUS MINUS.
    4  %left DIVIDE TIMES.
    5
    6  %include {
    7  #include &lt;iostream&gt;
    8  #include "example1.h"
    9  }
    10
    11  %syntax_error {
    12    std::cout &lt;&lt; "Syntax error!" &lt;&lt; std::endl;
    13  }
    14
    15  program ::= expr(A).   { std::cout &lt;&lt; "Result=" &lt;&lt; A &lt;&lt; std::endl; }
    16
    17  expr(A) ::= expr(B) MINUS  expr(C).   { A = B - C; }
    18  expr(A) ::= expr(B) PLUS  expr(C).   { A = B + C; }
    19  expr(A) ::= expr(B) TIMES  expr(C).   { A = B * C; }
    20  expr(A) ::= expr(B) DIVIDE expr(C).  {
    
    
    21           if(C != 0){
    22             A = B / C;
    23            }else{
    24             std::cout &lt;&lt; "divide by zero" &lt;&lt; std::endl;
    25             }
    26  }  /* end of DIVIDE */
    
    
    27  expr(A) ::= INTEGER(B). { A = B; }
</CODE></PRE>
      <P>As you can see, this file is only 27 lines of code, not counting 
      spaces. It is much easer to modify the grammar than it is to rewrite 
      larger sections of raw code. 
      <P>The parser generator (lemon.c and lempar.c) takes the input from the 
      syntax file "example1.y" and creates the parser file "example1.c", along 
      with two other files "example1.h", which contains definitions for the 
      tokens, and "example1.out", which gives a detailed listing of the shift 
      reduce states for the grammar listed in "example1.y". 
      <P>Let's run through the steps, starting first with compiling the source 
      code of lemon (available <A 
      href="http://freshmeat.net/projects/lemon/">here</A>). The first step is 
      to compile lemon.c: 
      <P><PRE><CODE>
    $ gcc -o lemon lemon.c
</CODE></PRE>
      <P>Now we have our parser generator, <CODE>lemon</CODE>, so run the syntax 
      file "example1.y" through it. 
      <P><PRE><CODE>
    $ ./lemon example1.y
</CODE></PRE>
      <P>This will create example1.c, example1.h, and example1.out. What about 
      lempar.c? Compare "example1.c" with "lempar.c", and you will see that it 
      contains a lot of the same code. "lempar.c" is a template file. You can 
      modify the code if you want, and all modifications will be passed to 
      "example1.c" (including any comments). 
      <P>But "example1.c" is not complete. We'll append the contents of the file 
      "main_part", which contains a main function and tokens. "main_part" is 
      called a driver. 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/main_part.html">main_part</A> 

      <P><PRE><CODE>
    1  int main()
    2  {
    3    void* pParser = ParseAlloc (malloc);
    
    4    /* First input:
    5        15 / 5
    6                                  */
    7    Parse (pParser, INTEGER, 15);
    8    Parse (pParser, DIVIDE, 0);
    9    Parse (pParser, INTEGER, 5);
    10    Parse (pParser, 0, 0);
    
    11    /*  Second input:
    12          50 + 125
    13                                 */
    
    14    Parse (pParser, INTEGER, 50);
    15    Parse (pParser, PLUS, 0);
    16    Parse (pParser, INTEGER, 125);
    17    Parse (pParser, 0, 0);
    
    18    /*  Third input:
    19          50 * 125 + 125
    20                                 */
    
    
    
    21    Parse (pParser, INTEGER, 50);
    22    Parse (pParser, TIMES, 0);
    23    Parse (pParser, INTEGER, 125);
    24    Parse (pParser, PLUS, 0);
    25    Parse (pParser, INTEGER, 125);
    26    Parse (pParser, 0, 0);
    
    27    ParseFree(pParser, free );
    
    28  }
</CODE></PRE>
      <P>So, what is main_part doing? Well, line 3 initializes the parser. 
      You'll note that pParser is passed to each call of the "Parse" function 
      starting at line 7. The expression 15/5, or 15 DIVIDE 5, is performed in 
      lines 7 through 10, sending first the INTEGER 15, then the identifier 
      DIVIDE, which doesn't need a value, so 0 is chosen as the third parameter 
      on line 8. Finally, line 10, with 0 as the second parameter in 
      "Parse(pParser, 0, 0);" signals the end of the input for this expression. 
      (Please note that in "example4.y", the grammar will handle this with a 
      NEWLINE, and "Parse(pParser,0,...);" will only be called at the very end 
      of the syntax file.) 
      <P>"main_part" is appended to "example1.c". You may want to reference the 
      Makefile with the downloadable example, which has this step: 
      <P><PRE><CODE>
    $ cat main_part &gt;&gt; example1.c
</CODE></PRE>
      <P>Next, just compile example1.c, and it's good to go. 
      <P><PRE><CODE>
    $ g++ -o ex1 example1.c
</CODE></PRE>
      <P>Now execute "ex1", and we'll see that the result of 15/5 is, of course, 
      3. And 50+125 is equal to 175, and 50*125+125 is indeed equal to 
      (50*125)+125= 6375. This last result verifies that TIMES has higher 
      precedence than PLUS. 
      <P><PRE><CODE>
    $ ./ex1
    Result=3
    Result=175
    Result=6375
</CODE></PRE>
      <P>Now for a closer look at the syntax file (example1.y). Why does TIMES 
      have higher precedence than PLUS? Line 3 and line 4 determine the 
      precedence of the operators PLUS, MINUS, DIVIDE, and TIMES. 
      <P><PRE><CODE>
    3  %left PLUS MINUS.
    4  %left DIVIDE TIMES.
</CODE></PRE>
      <P>Lines at the top have a lower operator precedence. This is very 
      important to note. PLUS and MINUS have less operator precedence than 
      DIVIDE and TIMES because PLUS and MINUS are on line 3, whereas DIVIDE and 
      TIMES are on line 4. If, for example, exponentiation (EXP) were added, 
      since EXP has even higher operator precedence than TIMES and DIVIDE, it 
      would be added below line 4. 
      <P>What if you wanted real numbers in the input, 15.5,5.2 instead of 
      integers? How would you do that? It's easy. These tokens are currently 
      integers because of the following line in "example1.y": 
      <P><PRE><CODE>
    1  %token_type {int}
</CODE></PRE>
      <P>So to accommodate a double, line 1 would be changed to: 
      <P><PRE><CODE>
    %token_type {double}
</CODE></PRE>
      <P>Moving further down the lines of "example1.y", there is an "include" 
      directive on line 6. The include statement in "example1.y" passes along 
      any C statements, which are inserted at the beginning of the parse file 
      "example1.c". Again, the contents are inserted into the beginning of 
      "example1.c", which is necessary for declarations and headers. 
      <P><PRE><CODE>
    ...
    6  %include {
    7  #include &lt;iostream&gt;
    8  #include "example1.h"
    9  }
    ...
</CODE></PRE>
      <P>Note that "example1.h" is generated from "<CODE>$ ./lemon 
      example1.y</CODE>". It defines the tokens, or, put another way, assigns 
      integer values to the token names starting at 1. Why start at 1, and not 
      0? Because 0 is reserved for the Parse function. Remember, "Parse 
      (pParser, 0, 0);", with the second parameter set to zero, signals an end 
      to the input. 
      <P>example1.h (note, this is a generated file; do not add code to it): 
      <P><PRE><CODE>
    #define PLUS                            1
    #define MINUS                           2
    #define DIVIDE                          3
    #define TIMES                           4
    #define INTEGER                         5
</CODE></PRE>
      <H2>Example 2: Creating a custom token type, or structure</H2>
      <P>"example2.y" differs from "example1.y" by defining the token type as a 
      structure. Specifically, this token type is defined in the file 
      "ex2def.h". Defining our own structure can give us flexibility in the 
      semantic action, or the piece of code on the right of the production rule. 
      Here is an example: 
      <P><PRE><CODE>
    expr(A) ::= expr(B) TIMES  expr(C).   { A.value = B.value * C.value;
    A.n = B.n+1  + C.n+1;
    }
</CODE></PRE>
      <P>The token_type in <A 
      href="http://souptonuts.sourceforge.net/code/example2.y.html">"example2.y"</A> 
      is defined as Token in line 6. 
      <P><PRE><CODE>
    6  %token_type {Token}
</CODE></PRE>
      <P>This structure Token is defined in "ex2def.h" as follows: 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/ex2def.h.html">ex2def.h</A> 
      <P><PRE><CODE>
    struct Token {
    const char *z;
    int value;
    unsigned n;
    };
    
</CODE></PRE>
      <P>Special note: "const char *z" is not used in these examples, but I've 
      kept it in this structure, since it's the next logical step in a 
      calculator, assigning an expression to a variable. For instance, 
      variable1=2+5, where variable1 would be some value in a symbol table. See 
      <A 
      href="http://prdownloads.sourceforge.net/souptonuts/flex_bison.tar.gz?download">this 
      reference</A>. 
      <P>Again, note the change in the include directive, the addition of 
      "ex2def.h", which defines struct Token. 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/example2.y.html">example2.y</A> 

      <P><PRE><CODE>
    1  #include {
    2  #include &lt;iostream&gt;
    3  #include "ex2def.h"
    4  #include "example2.h"
    5  }
    
    6  %token_type {Token}
    7  %default_type {Token}
    
    8  %type expr {Token}
    9  %type NUM {Token}
    10
    11  %left PLUS MINUS.
    12  %left DIVIDE TIMES.
    13
    14
    15  %syntax_error {
    16    std::cout &lt;&lt; "Syntax error!" &lt;&lt; std::endl;
    17  }
    18
    19  program ::= expr(A).   {
    20                          std::cout &lt;&lt; "Result.value=" &lt;&lt; A.value &lt;&lt; std::endl;
    21                          std::cout &lt;&lt; "Result.n=" &lt;&lt; A.n &lt;&lt; std::endl;
    
    22                           }
    
    23  expr(A) ::= expr(B) MINUS  expr(C).   { A.value = B.value - C.value;
    24                                         A.n = B.n+1  + C.n+1;
    25                                        }
    
    
    26  expr(A) ::= expr(B) PLUS  expr(C).   { A.value = B.value + C.value;
    27                                         A.n = B.n+1  + C.n+1;
    28                                        }
    
    29  expr(A) ::= expr(B) TIMES  expr(C).   { A.value = B.value * C.value;
    30                                          A.n = B.n+1  + C.n+1;
12 3 下一页
💿 文件大小 338 K
👤 上传用户 bonylee_java
📂 所属分类编译器/解释器
🏷️ 相关标签

#windows #lemon #分 #程序
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -