31 }
32 expr(A) ::= expr(B) DIVIDE expr(C). {
33 if(C.value != 0){
34 A.value = B.value / C.value;
35 A.n = B.n+1 + C.n+1;
36 }else{
37 std::cout << "divide by zero" << std::endl;
38 }
39 } /* end of DIVIDE */
40 expr(A) ::= NUM(B). { A.value = B.value; A.n = B.n+1; }
</CODE></PRE>
<P>Take a close look at lines 23 through 25 below. The Token A now
carries two members, "A.value" and "A.n": ".value" holds the value of the
expression, and ".n" counts the number of times an assignment is made:
<P><PRE><CODE>
23 expr(A) ::= expr(B) MINUS expr(C). { A.value = B.value - C.value;
24 A.n = B.n+1 + C.n+1;
25 }
</CODE></PRE>
<P>This is a quick way to watch "shift" and "reduce" happen dynamically. A
"shift" pushes a token onto the parser's stack. A "reduce" happens when the
symbols on top of the stack match the right-hand side of a rule, so they
can be replaced by the rule's left-hand side. As you will recall, when
<CODE>lemon</CODE> is run, three files are normally created: *.c, *.h, and
*.out. The ".out" file lists each parser state of the grammar, along with
its shift and reduce actions. If you want a simple summary, run lemon with
the "-s" option:
<P><PRE><CODE>
$ ./lemon -s example2.y
Parser statistics: 6 terminals, 3 nonterminals, 6 rules
11 states, 0 parser table entries, 0 conflicts
</CODE></PRE>
<P>Again, as in the previous example, "main_part2", the driver, is
appended to "example2.c":
<P><PRE><CODE>
$ cat main_part2 >> example2.c
</CODE></PRE>
<P>Now "example2.c" can be compiled and executed:
<P><PRE><CODE>
$ g++ -o ex2 example2.c
$ ./ex2
Result.value=17
Result.n=4
Result.value=-9
Result.n=4
Result.value=78
Result.n=10
</CODE></PRE>
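<P>To see where figures such as "Result.value=17" and "Result.n=4" come
from, the grammar actions can be replayed by hand. The following is only a
sketch: the layout of <CODE>Token</CODE> is an assumption (its header is not
shown here), the helper names are made up for illustration, and the PLUS
rule is assumed to follow the same pattern as the MINUS rule on lines 23
through 25.

```cpp
#include <iostream>

// Assumed layout: the tutorial's real Token definition is not shown here.
struct Token {
    int value;   // value of the (sub)expression
    unsigned n;  // running count maintained by the grammar actions
};

// Mirrors the action of: expr(A) ::= NUM(B).
Token reduce_num(const Token& b) {
    return Token{b.value, b.n + 1};
}

// Mirrors the action of: expr(A) ::= expr(B) MINUS expr(C).
Token reduce_minus(const Token& b, const Token& c) {
    return Token{b.value - c.value, b.n + 1 + c.n + 1};
}

// Assumed to match the PLUS rule, which follows the same pattern.
Token reduce_plus(const Token& b, const Token& c) {
    return Token{b.value + c.value, b.n + 1 + c.n + 1};
}

// Replays the reductions for the input "4 PLUS 13".
Token replay_four_plus_thirteen() {
    Token t0{4, 0};
    Token t1{13, 0};
    Token left = reduce_num(t0);      // value 4,  n 1
    Token right = reduce_num(t1);     // value 13, n 1
    return reduce_plus(left, right);  // value 17, n 1+1 + 1+1 = 4
}

// Prints a result in the same format as the tutorial's driver.
void print_result(const Token& t) {
    std::cout << "Result.value=" << t.value << std::endl;
    std::cout << "Result.n=" << t.n << std::endl;
}
```

<P>Calling <CODE>print_result(replay_four_plus_thirteen());</CODE> from a
small <CODE>main</CODE> reproduces the first two lines of the output above.
Each NUM reduction adds 1 to ".n", and the binary rule adds 1 for each side.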
<H2>Example 3: Working with the token destructor</H2>
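<P>One advantage of lemon over bison is the ability to free the memory
held by a symbol once the parser is done with it. You can call the
function of your choice. The <CODE>%token_destructor</CODE> directive
used here supplies that code for terminals (tokens such as
"<CODE>NUM</CODE>"); non-terminals such as "<CODE>expr</CODE>" get theirs
from <CODE>%destructor</CODE> or <CODE>%default_destructor</CODE>. When
the parser discards a token's value, the code given by
<CODE>%token_destructor</CODE> runs.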
<P><A
href="http://souptonuts.sourceforge.net/code/example3.y.html">example3.y</A>
<P><PRE><CODE>
1 %include {
2 #include <iostream>
3 #include "ex3def.h"
4 #include "example3.h"
5 void token_destructor(Token t)
6 {
7 std::cout << "In token_destructor t.value= " << t.value << std::endl;
8 std::cout << "In token_destructor t.n= " << t.n << std::endl;
9 }
10 }
11 %token_type {Token}
12 %default_type {Token}
13 %token_destructor { token_destructor($$); }
...
</CODE></PRE>
<P>In line 13, the <CODE>%token_destructor</CODE> directive supplies the
code "<CODE>token_destructor($$);</CODE>", which calls the function
"<CODE>token_destructor</CODE>" defined in lines 5 through 9. In this
simple example no memory is allocated, so there is no need to call
<CODE>free</CODE>. Instead, to show what is happening, the function writes
to std::cout.
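<P>If the token did own memory, the destructor is where it would be
released. The following is a minimal sketch under stated assumptions:
<CODE>TextToken</CODE>, <CODE>make_text_token</CODE>,
<CODE>text_token_destructor</CODE>, and the counter
<CODE>g_tokens_freed</CODE> are hypothetical names that do not appear in
"example3.y".

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <iostream>

// Hypothetical token type that owns heap memory. The tutorial's own Token
// only has the members value and n.
struct TextToken {
    int value;
    char* text;  // allocated with std::malloc, released by the destructor
};

int g_tokens_freed = 0;  // demo-only counter, so the effect can be observed

// Builds a token that holds its own copy of the matched text.
TextToken make_text_token(int value, const char* text) {
    std::size_t len = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(len));
    if (copy != nullptr) {
        std::memcpy(copy, text, len);
    }
    return TextToken{value, copy};
}

// Same shape as the tutorial's token_destructor(Token t), but it also
// releases the memory owned by the token.
void text_token_destructor(TextToken t) {
    std::cout << "In text_token_destructor t.value= " << t.value << std::endl;
    std::free(t.text);  // std::free(nullptr) is a harmless no-op
    ++g_tokens_freed;
}
```

<P>In a grammar this would presumably be wired up with
<CODE>%token_type {TextToken}</CODE> and <CODE>%token_destructor {
text_token_destructor($$); }</CODE>. Because the token is passed by value,
the caller must not use its <CODE>text</CODE> pointer after the destructor
has run.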
<P>After the program is compiled, it can be executed as follows. Note that
I have added line numbers to the output of "ex3" for easy reference.
<P><PRE><CODE>
$ ./ex3
1 t0.value=4 PLUS t1.value=13
2 In token_destructor t.value= 4
3 In token_destructor t.n= 0
4 Result.value=17
5 Result.n=4
6 parsing complete!
...
</CODE></PRE>
<P>After the expression has been reduced, the destructor is called, but
only for the token with value 4. Why? For an answer we will have to
take a look at "main_part3".
<P><A
href="http://souptonuts.sourceforge.net/code/main_part3.html">main_part3</A>
<P><PRE><CODE>
1 int main()
2 {
3 void* pParser = ParseAlloc (malloc);
4 struct Token t0,t1;
5 struct Token mToken;
6 t0.value=4;
7 t0.n=0;
8 t1.value=13;
9 t1.n=0;
10 std::cout << " t0.value=4 PLUS t1.value=13 " << std::endl;
11 Parse (pParser, NUM, t0);
12 Parse (pParser, PLUS, t0);
13 Parse (pParser, NUM, t1);
14 Parse (pParser, 0, t0);
15 std::cout << " t0.value=4 DIVIDE t1.value=13 " << std::endl;
16 Parse (pParser, NUM, t0);
17 Parse (pParser, DIVIDE, t0);
18 Parse (pParser, NUM, t1);
19 Parse (pParser, 0, t1);
...
</CODE></PRE>
<P>Line 14 terminates the input with <CODE>t0</CODE> as the third
parameter. That third parameter is passed as "<CODE>$$</CODE>" to the
defined destructor function, "<CODE>token_destructor(...</CODE>". The
result of immediately calling "<CODE>Parse</CODE>" with <CODE>0</CODE> a
second time is undefined, so signal the end of input only once, after you
have passed all the tokens that complete an expression. In other words,
you would never call "<CODE>Parse (pParser, 0, t0);</CODE>" immediately
followed by another "<CODE>Parse (pParser, 0, t0);</CODE>".
<P>In line 19, <CODE>token_destructor</CODE> is called for
<CODE>t1.value=13</CODE>. If you look at "main_part3", line 19, you'll see
that <CODE>Parse</CODE> is called with <CODE>0</CODE> as the second
parameter and <CODE>t1</CODE> as the third.
<P>Continuation of the output from the program:
<P><PRE><CODE>
7
8
9 t1.value=13 PLUS t0.value=4
10 In token_destructor t.value= 13
11 In token_destructor t.n= 0
12 Result.value=17
13 Result.n=4
14 parsing complete!
</CODE></PRE>
<P>So <CODE>t0</CODE> is passed as the third parameter in line 14
and <CODE>t1</CODE> is passed in line 19. This shouldn't be a problem. One
variable could hold the values of both tokens. For instance, main_part3
could have used a single <CODE>Token t0</CODE> for both the values 4 and
13, as follows:
<P><PRE><CODE>
...
struct Token t0;
t0.value=4;
t0.n=0;
Parse (pParser, NUM, t0);
Parse (pParser, PLUS, t0);
t0.value=13;
t0.n=0;
Parse (pParser, NUM, t0);
Parse (pParser, 0, t0);
...
</CODE></PRE>
<H2>Example 4: Ending the grammar with a NEWLINE</H2>
<P>Notice that in the last three examples, <CODE>Parse(pParser,0..</CODE>
had to be called to signal the end of the input for an expression. This is
awkward. Instead, the grammar should dictate when the expression can no
longer be reduced.
<P>"example4.y" contains the following lines:
<P><A
href="http://souptonuts.sourceforge.net/code/example4.y.html">example4.y</A>
<P><PRE><CODE>
1 %include {
2 #include <iostream>
3 #include "ex4def.h"
4 #include "example4.h"
...
23
24 %syntax_error {
25 std::cout << "Syntax error!" << std::endl;
26 }
27
28 /* This is to terminate with a new line */
29 main ::= in.
30 in ::= .
31 in ::= in state NEWLINE.
32 state ::= expr(A). {
33 std::cout << "Result.value=" << A.value << std::endl;
34 std::cout << "Result.n=" << A.n << std::endl;
35 }
36 expr(A) ::= expr(B) MINUS expr(C). { A.value = B.value - C.value;
37 A.n = B.n+1 + C.n+1;
38 }
...
</CODE></PRE>
<P>Note lines 29 through 35. "<CODE>main</CODE>" and "<CODE>in</CODE>"
must be defined (lines 29-31). If you're a Bison user, you could get away
without having to define the non-terminal main, but lemon currently
requires it.
<P>With this change made to the grammar in "example4.y", "main_part4" can
now terminate each expression by passing the token NEWLINE.
<P>Here is a section of main_part4:
<P><A
href="http://souptonuts.sourceforge.net/code/main_part4.html">main_part4</A>
<P><PRE><CODE>
1 int main()
2 {
3 void* pParser = ParseAlloc (malloc);
4 struct Token t0,t1;
5 struct Token mToken;
6 t0.value=4;
7 t0.n=0;
8 t1.value=13;
9 t1.n=0;
10 std::cout << std::endl <<" t0.value=4 PLUS t1.value=13 " << std::endl << std::endl;
11 Parse (pParser, NUM, t0);
12 Parse (pParser, PLUS, t0);
13 Parse (pParser, NUM, t1);
14 Parse (pParser, NEWLINE, t1);
15 std::cout << std::endl <<" t0.value=4 TIMES t1.value=13 " << std::endl << std::endl;
</CODE></PRE>
<P>Note that line 14 passes the token NEWLINE. Checking "example4.h"
shows that NEWLINE is defined as an integer, 6, in this case.
<P>So, looking at the output of "ex4", with line numbers added for
clarification, we get the following:
<P><PRE><CODE>
$ ./ex4
1 t0.value=4 PLUS t1.value=13
2
3 In token_destructor t.value= 4
4 In token_destructor t.n= 0
5 Result.value=17
6 Result.n=4
7
8 t0.value=4 TIMES t1.value=13
9
10 In token_destructor t.value= 4
11 In token_destructor t.n= 0
12 Result.value=52
13 Result.n=4
14 parsing complete!
</CODE></PRE>
<P>We get the result on line 5, and there was no need to call <CODE>Parse
(pParser, 0, t0);</CODE>. Instead, <CODE>Parse (pParser, NEWLINE,
t1)</CODE> worked.
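<P>With NEWLINE as the terminator, a driver can feed the parser one line at
a time. The following is only a sketch of that idea, not the tutorial's
"main_part4": <CODE>feed_line</CODE> and <CODE>Sink</CODE> are hypothetical
names, the layout of <CODE>Token</CODE> is assumed, and the token codes are
placeholders apart from NEWLINE, which the text gives as 6. A real driver
would take the codes from the generated "example4.h".

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

struct Token { int value; unsigned n; };  // assumed layout

// Placeholder token codes. Only NEWLINE = 6 is stated in the text; in a
// real driver, include the generated "example4.h" instead of this enum.
enum TokenCode { PLUS = 1, MINUS = 2, DIVIDE = 3, TIMES = 4, NUM = 5, NEWLINE = 6 };

// Stands in for a call such as Parse(pParser, code, token).
using Sink = std::function<void(int, Token)>;

// Scans one line made of unsigned integers and the operators + - * /,
// hands each token to sink, and finishes with NEWLINE so the grammar can
// reduce "in ::= in state NEWLINE".
// Returns false as soon as an unexpected character is found.
bool feed_line(const std::string& line, const Sink& sink) {
    std::size_t i = 0;
    Token last{0, 0};  // operators reuse the last token value, as main_part4 does
    while (i < line.size()) {
        unsigned char ch = static_cast<unsigned char>(line[i]);
        if (std::isspace(ch)) { ++i; continue; }
        if (std::isdigit(ch)) {
            int v = 0;
            while (i < line.size() &&
                   std::isdigit(static_cast<unsigned char>(line[i]))) {
                v = v * 10 + (line[i] - '0');
                ++i;
            }
            last = Token{v, 0};
            sink(NUM, last);
            continue;
        }
        int code = 0;
        switch (line[i]) {
            case '+': code = PLUS; break;
            case '-': code = MINUS; break;
            case '/': code = DIVIDE; break;
            case '*': code = TIMES; break;
            default: return false;
        }
        sink(code, last);
        ++i;
    }
    sink(NEWLINE, last);  // ends the expression without passing 0
    return true;
}
```

<P>In a driver linked against the generated parser, the sink could be a
lambda such as <CODE>[&](int code, Token t) { Parse (pParser, code, t);
}</CODE>, so that every input line ends with the NEWLINE token.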
<H2>Example 5: Using flex for the tokenizer</H2>
<P>The next example takes input directly from the terminal, and flex will
create a scanner for finding the appropriate tokens.
<P>First, a quick look at the flex program "lexer.l", again with line
numbers added for clarification:
<P><A
href="http://souptonuts.sourceforge.net/code/lexer.l.html">lexer.l</A>
<P><PRE><CODE>
1 %{
2 #include "lexglobal.h"
3 #include "example5.h"
4 #include <string.h>
5 #include <math.h>
6 int line = 1, col = 1;
7 %}
8 %%
9 [0-9]+|[0-9]*\.[0-9]+ { col += (int) strlen(yytext);