📄 readme_lemon_tutorial.htm
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0060)http://souptonuts.sourceforge.net/readme_lemon_tutorial.html -->
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<STYLE type=text/css>PRE {
}
PRE {
FONT-SIZE: 12px; BACKGROUND-COLOR: #ddd
}
</STYLE>
<META content="MSHTML 6.00.2800.1106" name=GENERATOR></HEAD>
<BODY>
<TABLE cellSpacing=5 cellPadding=5 width="80%" border=0>
<TBODY>
<TR>
<TD>
<H1>Lemon Parser Generator Tutorial</H1>
<P>Copyright (c) 2004 by Mike Chirico mchirico@users.sourceforge.net. This
material may be distributed only subject to the terms and conditions set
forth in the Open Publication <A
href="http://www.opencontent.org/openpub/">License</A> v1.0, 8 June 1999
or later.
<P>Download: <A
href="http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download">http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download</A>
<P>Lemon is a compact, thread safe, well-tested parser generator written
by Dr. Richard Hipp. Using a parser generator, along with a scanner like
flex, can be advantageous because there is less code to write. You just
write the grammar for the parser.
<P>
<HR>
<P>
<H2>Example 1: Getting started</H2>
<P>Compare this to writing all of the raw code from scratch. For instance,
compare the basic C++ <A
href="http://souptonuts.sourceforge.net/code/desktop_calc.cc.html">desktop
calculator</A> to the file below. Below is a syntax file which contains
the grammar that the parser uses. "example1.y" is an example syntax file
for a very basic calculator.
<P><A
href="http://souptonuts.sourceforge.net/code/example1.y.html">example1.y</A>
<P><PRE><CODE>
1 %token_type {int}
2
3 %left PLUS MINUS.
4 %left DIVIDE TIMES.
5
6 %include {
7 #include <iostream>
8 #include "example1.h"
9 }
10
11 %syntax_error {
12 std::cout << "Syntax error!" << std::endl;
13 }
14
15 program ::= expr(A). { std::cout << "Result=" << A << std::endl; }
16
17 expr(A) ::= expr(B) MINUS expr(C). { A = B - C; }
18 expr(A) ::= expr(B) PLUS expr(C). { A = B + C; }
19 expr(A) ::= expr(B) TIMES expr(C). { A = B * C; }
20 expr(A) ::= expr(B) DIVIDE expr(C). {
21 if(C != 0){
22 A = B / C;
23 }else{
24 std::cout << "divide by zero" << std::endl;
25 }
26 } /* end of DIVIDE */
27 expr(A) ::= INTEGER(B). { A = B; }
</CODE></PRE>
<P>As you can see, this file is only 27 lines of code, not counting
spaces. It is much easer to modify the grammar than it is to rewrite
larger sections of raw code.
<P>The parser generator (lemon.c and lempar.c) takes the input from the
syntax file "example1.y" and creates the parser file "example1.c", along
with two other files "example1.h", which contains definitions for the
tokens, and "example1.out", which gives a detailed listing of the shift
reduce states for the grammar listed in "example1.y".
<P>Let's run through the steps, starting first with compiling the source
code of lemon (available <A
href="http://freshmeat.net/projects/lemon/">here</A>). The first step is
to compile lemon.c:
<P><PRE><CODE>
$ gcc -o lemon lemon.c
</CODE></PRE>
<P>Now we have our parser generator, <CODE>lemon</CODE>, so run the syntax
file "example1.y" through it.
<P><PRE><CODE>
$ ./lemon example1.y
</CODE></PRE>
<P>This will create example1.c, example1.h, and example1.out. What about
lempar.c? Compare "example1.c" with "lempar.c", and you will see that it
contains a lot of the same code. "lempar.c" is a template file. You can
modify the code if you want, and all modifications will be passed to
"example1.c" (including any comments).
<P>But "example1.c" is not complete. We'll append the contents of the file
"main_part", which contains a main function and tokens. "main_part" is
called a driver.
<P><A
href="http://souptonuts.sourceforge.net/code/main_part.html">main_part</A>
<P><PRE><CODE>
1 int main()
2 {
3 void* pParser = ParseAlloc (malloc);
4 /* First input:
5 15 / 5
6 */
7 Parse (pParser, INTEGER, 15);
8 Parse (pParser, DIVIDE, 0);
9 Parse (pParser, INTEGER, 5);
10 Parse (pParser, 0, 0);
11 /* Second input:
12 50 + 125
13 */
14 Parse (pParser, INTEGER, 50);
15 Parse (pParser, PLUS, 0);
16 Parse (pParser, INTEGER, 125);
17 Parse (pParser, 0, 0);
18 /* Third input:
19 50 * 125 + 125
20 */
21 Parse (pParser, INTEGER, 50);
22 Parse (pParser, TIMES, 0);
23 Parse (pParser, INTEGER, 125);
24 Parse (pParser, PLUS, 0);
25 Parse (pParser, INTEGER, 125);
26 Parse (pParser, 0, 0);
27 ParseFree(pParser, free );
28 }
</CODE></PRE>
<P>So, what is main_part doing? Well, line 3 initializes the parser.
You'll note that pParser is passed to each call of the "Parse" function
starting at line 7. The expression 15/5, or 15 DIVIDE 5, is performed in
lines 7 through 10, sending first the INTEGER 15, then the identifier
DIVIDE, which doesn't need a value, so 0 is chosen as the third parameter
on line 8. Finally, line 10, with 0 as the second parameter in
"Parse(pParser, 0, 0);" signals the end of the input for this expression.
(Please note that in "example4.y", the grammar will handle this with a
NEWLINE, and "Parse(pParser,0,...);" will only be called at the very end
of the syntax file.)
<P>"main_part" is appended to "example1.c". You may want to reference the
Makefile with the downloadable example, which has this step:
<P><PRE><CODE>
$ cat main_part >> example1.c
</CODE></PRE>
<P>Next, just compile example1.c, and it's good to go.
<P><PRE><CODE>
$ g++ -o ex1 example1.c
</CODE></PRE>
<P>Now execute "ex1", and we'll see that the result of 15/5 is, of course,
3. And 50+125 is equal to 175, and 50*125+125 is indeed equal to
(50*125)+125= 6375. This last result verifies that TIMES has higher
precedence than PLUS.
<P><PRE><CODE>
$ ./ex1
Result=3
Result=175
Result=6375
</CODE></PRE>
<P>Now for a closer look at the syntax file (example1.y). Why does TIMES
have higher precedence than PLUS? Line 3 and line 4 determine the
precedence of the operators PLUS, MINUS, DIVIDE, and TIMES.
<P><PRE><CODE>
3 %left PLUS MINUS.
4 %left DIVIDE TIMES.
</CODE></PRE>
<P>Lines at the top have a lower operator precedence. This is very
important to note. PLUS and MINUS have less operator precedence than
DIVIDE and TIMES because PLUS and MINUS are on line 3, whereas DIVIDE and
TIMES are on line 4. If, for example, exponentiation (EXP) were added,
since EXP has even higher operator precedence than TIMES and DIVIDE, it
would be added below line 4.
<P>What if you wanted real numbers in the input, 15.5,5.2 instead of
integers? How would you do that? It's easy. These tokens are currently
integers because of the following line in "example1.y":
<P><PRE><CODE>
1 %token_type {int}
</CODE></PRE>
<P>So to accommodate a double, line 1 would be changed to:
<P><PRE><CODE>
%token_type {double}
</CODE></PRE>
<P>Moving further down the lines of "example1.y", there is an "include"
directive on line 6. The include statement in "example1.y" passes along
any C statements, which are inserted at the beginning of the parse file
"example1.c". Again, the contents are inserted into the beginning of
"example1.c", which is necessary for declarations and headers.
<P><PRE><CODE>
...
6 %include {
7 #include <iostream>
8 #include "example1.h"
9 }
...
</CODE></PRE>
<P>Note that "example1.h" is generated from "<CODE>$ ./lemon
example1.y</CODE>". It defines the tokens, or, put another way, assigns
integer values to the token names starting at 1. Why start at 1, and not
0? Because 0 is reserved for the Parse function. Remember, "Parse
(pParser, 0, 0);", with the second parameter set to zero, signals an end
to the input.
<P>example1.h (note, this is a generated file; do not add code to it):
<P><PRE><CODE>
#define PLUS 1
#define MINUS 2
#define DIVIDE 3
#define TIMES 4
#define INTEGER 5
</CODE></PRE>
<H2>Example 2: Creating a custom token type, or structure</H2>
<P>"example2.y" differs from "example1.y" by defining the token type as a
structure. Specifically, this token type is defined in the file
"ex2def.h". Defining our own structure can give us flexibility in the
semantic action, or the piece of code on the right of the production rule.
Here is an example:
<P><PRE><CODE>
expr(A) ::= expr(B) TIMES expr(C). { A.value = B.value * C.value;
A.n = B.n+1 + C.n+1;
}
</CODE></PRE>
<P>The token_type in <A
href="http://souptonuts.sourceforge.net/code/example2.y.html">"example2.y"</A>
is defined as Token in line 6.
<P><PRE><CODE>
6 %token_type {Token}
</CODE></PRE>
<P>This structure Token is defined in "ex2def.h" as follows:
<P><A
href="http://souptonuts.sourceforge.net/code/ex2def.h.html">ex2def.h</A>
<P><PRE><CODE>
struct Token {
const char *z;
int value;
unsigned n;
};
</CODE></PRE>
<P>Special note: "const char *z" is not used in these examples, but I've
kept it in this structure, since it's the next logical step in a
calculator, assigning an expression to a variable. For instance,
variable1=2+5, where variable1 would be some value in a symbol table. See
<A
href="http://prdownloads.sourceforge.net/souptonuts/flex_bison.tar.gz?download">this
reference</A>.
<P>Again, note the change in the include directive, the addition of
"ex2def.h", which defines struct Token.
<P><A
href="http://souptonuts.sourceforge.net/code/example2.y.html">example2.y</A>
<P><PRE><CODE>
1 #include {
2 #include <iostream>
3 #include "ex2def.h"
4 #include "example2.h"
5 }
6 %token_type {Token}
7 %default_type {Token}
8 %type expr {Token}
9 %type NUM {Token}
10
11 %left PLUS MINUS.
12 %left DIVIDE TIMES.
13
14
15 %syntax_error {
16 std::cout << "Syntax error!" << std::endl;
17 }
18
19 program ::= expr(A). {
20 std::cout << "Result.value=" << A.value << std::endl;
21 std::cout << "Result.n=" << A.n << std::endl;
22 }
23 expr(A) ::= expr(B) MINUS expr(C). { A.value = B.value - C.value;
24 A.n = B.n+1 + C.n+1;
25 }
26 expr(A) ::= expr(B) PLUS expr(C). { A.value = B.value + C.value;
27 A.n = B.n+1 + C.n+1;
28 }
29 expr(A) ::= expr(B) TIMES expr(C). { A.value = B.value * C.value;
30 A.n = B.n+1 + C.n+1;
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -