10 yylval.dval = atof(yytext);
11 return NUM; }
12 [ \t] { col += (int) strlen(yytext); } /* ignore but count white space */
13 [A-Za-z][A-Za-z0-9]* { /* ignore but needed for variables */
14 return 0;
15 }
16 "+" { return PLUS; }
17 "-" { return MINUS; }
18 "*" { return TIMES; }
19 "/" { return DIVIDE; }
20 \n { col = 0; ++line; return NEWLINE; }
21 . { col += (int) strlen(yytext); return yytext[0]; }
22 %%
23 /**
24 * reset the line and column count
25 *
26 *
27 */
28 void reset_lexer(void)
29 {
30 line = 1;
31 col = 1;
32 }
33 /**
34 * yyerror() is invoked when the lexer or the parser encounter
35 * an error. The error message is passed via *s
36 *
37 *
38 */
39 void yyerror(char *s)
40 {
41 printf("error: %s at line: %d col: %d\n",s,line,col);
42 }
43 int yywrap(void)
44 {
45 return 1;
46 }
</CODE></PRE>
<P>A flex file is essentially a list of rules: a pattern on the left,
followed by the C code to execute on the right when the pattern matches.
Take line 9, "<CODE>[0-9]+|[0-9]*\.[0-9]+</CODE>", which matches any of 3,
.3, 0.3, and 23.4 and returns NUM. What's the value of NUM? It comes from
"example5.h", the file included on line 3, which is generated by the lemon
parser. On line 10, <CODE>yylval.dval</CODE> is assigned the value of
<CODE>yytext</CODE> after <CODE>atof</CODE> converts it to a double. The
structure of yylval is defined in "lexglobal.h", starting on line 2.
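<P>To see what the pattern on line 9 accepts, here is a minimal sketch in
plain C. It is not part of the article's sources: the function name
<CODE>match_num</CODE> is hypothetical, and it assumes a POSIX system that
provides <CODE>regex.h</CODE>. The pattern is anchored so that the whole
string must match, which is a simplification of how flex picks the longest
match at the current position.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Returns 1 if the whole string s matches the NUM pattern from line 9
 * and stores its numeric value in *out; returns 0 otherwise.
 * match_num is a hypothetical helper, not part of lexer.l. */
int match_num(const char *s, double *out)
{
    regex_t re;
    int ok;

    assert(s != NULL && out != NULL);
    /* Anchored with ^...$ because regexec() searches for a substring,
     * whereas this sketch wants the whole string to match. */
    if (regcomp(&re, "^([0-9]+|[0-9]*\\.[0-9]+)$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    if (ok)
        *out = atof(s);   /* same conversion the flex action performs */
    return ok;
}
```

<P>Here <CODE>match_num("23.4", &amp;v)</CODE> succeeds, while
<CODE>match_num("3.", &amp;v)</CODE> does not, because the second
alternative needs at least one digit after the decimal point. The real
scanner would not reject "3." outright: flex would match 3 as NUM and then
hand the trailing "." to the catch-all rule on line 21.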
<P>"lexglobal.h" with line numbers added:
<P><A
href="http://souptonuts.sourceforge.net/code/lexglobal.h.html">lexglobal.h</A>
<P><PRE><CODE>
1 #ifndef YYSTYPE
2 typedef union {
3 double dval;
4 struct symtab *symp;
5 } yystype;
6 # define YYSTYPE yystype
7 # define YYSTYPE_IS_TRIVIAL 1
8 #endif
9 /* extern YYSTYPE yylval; */
10 YYSTYPE yylval;
</CODE></PRE>
<P><CODE>yystype</CODE> is the union of <CODE>dval</CODE> and
<CODE>symp</CODE>, a pointer to a <CODE>struct symtab</CODE>. Again,
<CODE>symtab</CODE> is not used in these examples, but should you move to
a calculator with variables that can be assigned, you'd use this. See
Reference 2 for a full calculator example with flex and bison.
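<P>As an illustration of how the two members might be used, the sketch
below fills a union of the same shape once with a number and once with a
pointer to a symbol-table entry. The layout of <CODE>struct symtab</CODE>
and the names <CODE>yystype_demo</CODE>, <CODE>make_number</CODE>, and
<CODE>make_variable</CODE> are assumptions made for this example; the
article never defines them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol-table entry; the article's examples never define it. */
struct symtab {
    const char *name;
    double value;
};

/* Same shape as the union in lexglobal.h, under a different name. */
typedef union {
    double dval;
    struct symtab *symp;
} yystype_demo;

/* A number token carries its value directly in dval. */
yystype_demo make_number(double d)
{
    yystype_demo v;
    v.dval = d;
    return v;
}

/* A variable token would carry a pointer to its symbol-table entry. */
yystype_demo make_variable(struct symtab *entry)
{
    yystype_demo v;
    assert(entry != NULL);
    v.symp = entry;
    return v;
}
```

<P>Only one member of a union holds a meaningful value at a time, so the
token type returned by the lexer is what tells the parser which member to
read.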
<P>Look again at lines 9 through 11 in <A
href="http://souptonuts.sourceforge.net/code/lexer.l.html">lexer.l</A>:
<P><PRE><CODE>
...
9 [0-9]+|[0-9]*\.[0-9]+ { col += (int) strlen(yytext);
10 yylval.dval = atof(yytext);
11 return NUM; }
...
</CODE></PRE>
<P>Both the type of token, NUM, and its value must be passed along. We
need to know it's a number, but we also need to know the value of the
number.
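<P>The idea of returning a token type and, for numbers only, a value can be
sketched with a small hand-written scanner. This is not what flex
generates. The token codes and the function <CODE>next_token</CODE> are
hypothetical (the real codes come from "example5.h"), and
<CODE>strtod</CODE> accepts more number formats than the pattern on line 9
does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; the real ones come from the generated example5.h. */
enum { TOK_END = 0, TOK_PLUS = 1, TOK_NUM = 2 };

/* Scans one token starting at *pp, advances *pp past it, and returns the
 * token code. For TOK_NUM the value is stored in *val; for every other
 * token *val is left untouched. */
int next_token(const char **pp, double *val)
{
    const char *p;
    char *end;

    assert(pp != NULL && *pp != NULL && val != NULL);
    p = *pp;
    while (*p == ' ' || *p == '\t')      /* skip white space */
        p++;
    if (*p == '\0') { *pp = p; return TOK_END; }
    if (*p == '+')  { *pp = p + 1; return TOK_PLUS; }
    if (isdigit((unsigned char)*p) || *p == '.') {
        *val = strtod(p, &end);
        if (end != p) { *pp = end; return TOK_NUM; }
    }
    *pp = p + 1;
    return (unsigned char)*p;  /* any other character is returned as itself */
}
```

<P>Scanning "3 + 5.5" with this function yields TOK_NUM with the value 3,
then TOK_PLUS with the value left unchanged, then TOK_NUM with the value
5.5, and finally TOK_END.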
<P>Unlike NUM, the tokens PLUS, MINUS, TIMES, and DIVIDE carry no value; we
only need to know that the particular token has been found. Therefore, in
lexer.l, lines 16 through 19 only return the token type.
<P><PRE><CODE>
16 "+" { return PLUS; }
17 "-" { return MINUS; }
18 "*" { return TIMES; }
19 "/" { return DIVIDE; }
20 \n { col = 0; ++line; return NEWLINE; }
21 . { col += (int) strlen(yytext); return yytext[0]; }
22 %%
</CODE></PRE>
<P>Line 20 matches a newline and returns NEWLINE. Although the calculator
itself does not use them, the variable "<CODE>line</CODE>" keeps track of
the line number and <CODE>col</CODE> tracks the column. This is a good
idea; it is helpful when debugging.
<P>The driver, main_part5, contains a lot more code. It reads stdin with
the low-level <CODE>read</CODE> call. This could easily be changed to
accept input arriving on a socket descriptor, so if you had a Web scraping
program that scans input from a TCP socket, the socket descriptor would
replace "<CODE>fileno(stdin)</CODE>" on line 33.
<P><A
href="http://souptonuts.sourceforge.net/code/main_part5.html">main_part5</A>
<P><PRE><CODE>
1 #include <stdio.h>
2 #include <unistd.h>
3 #include <sys/types.h>
4 #include <sys/stat.h>
5 #include <fcntl.h>
6 #include <stdlib.h>
7 #define BUFS 1024
8 /**
9 * We have to declare these here - they're not in any header files
10 * we can include. yyparse() is declared with an empty argument list
11 * so that it is compatible with the generated C code from bison.
12 *
13 */
14 extern FILE *yyin;
15 typedef struct yy_buffer_state *YY_BUFFER_STATE;
16 extern "C" {
17 int yylex( void );
18 YY_BUFFER_STATE yy_scan_string( const char * );
19 void yy_delete_buffer( YY_BUFFER_STATE );
20 }
21 int main(int argc,char** argv)
22 {
23 int n;
24 int yv;
25 char buf[BUFS+1];
26 void* pParser = ParseAlloc (malloc);
27 struct Token t0,t1;
28 struct Token mToken;
29 t0.n=0;
30 t0.value=0;
31 std::cout << "Enter an expression like 3+5 <return>" << std::endl;
32 std::cout << " Terminate with ^D" << std::endl;
33 while ( ( n=read(fileno(stdin), buf, BUFS )) > 0)
34 {
35 buf[n]='\0';
36 yy_scan_string(buf);
37 // on EOF yylex will return 0
38 while( (yv=yylex()) != 0)
39 {
40 std::cout << " yylex() " << yv << " yylval.dval " << yylval.dval << std::endl;
41 t0.value=yylval.dval;
42 Parse (pParser, yv, t0);
43 }
44 }
45 Parse (pParser, 0, t0);
46 ParseFree(pParser, free );
47 }
</CODE></PRE>
<P>Line 16, '<CODE>extern "C"</CODE>', is necessary because "lexer.l" was
run through flex to create C code, as opposed to C++ code:
<P><PRE><CODE>
$ flex lexer.l
</CODE></PRE>
<P>See the flex manual, Reference 8. Yes, "<CODE>flex++</CODE>" will
output C++ code. However, for complex scanning, C code may be faster.
"main_part5", which is compiled as a C++ program, makes the transition
smoothly.
<P>Input to the parser should always be terminated by passing 0 as the
second parameter, as in "<CODE>Parse(pParser,0,..</CODE>". When there is no
more input coming into flex, <CODE>yylex</CODE> <I>will</I> return a zero,
which ends the <CODE>while</CODE> loop on line 38. Then the
<CODE>read</CODE> statement on line 33 looks for more input. This is
something you would want to do when reading from a socket, since data may
have been delayed.
<P>But that zero only ends the inner loop; it is never handed to
<CODE>Parse</CODE>. And if the initial read (line 33 for the first time)
isn't successful, flex never runs at all. Therefore, line 45 passes a zero
as the second parameter once all input has been read.
<P><PRE><CODE>
...
33 while ( ( n=read(fileno(stdin), buf, BUFS )) > 0)
...
38 while( (yv=yylex()) != 0)
39 {
40 std::cout << " yylex() " << yv << " yylval.dval " << yylval.dval << std::endl;
41 t0.value=yylval.dval;
42 Parse (pParser, yv, t0);
43 }
...
45 Parse (pParser, 0, t0);
46 ParseFree(pParser, free );
</CODE></PRE>
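<P>The shape of that loop, with the end-of-input call placed outside it,
can be sketched without flex or lemon. In this example the struct
<CODE>sink</CODE> and the function <CODE>feed_all</CODE> are hypothetical
stand-ins: counting bytes replaces the inner
<CODE>yylex</CODE>/<CODE>Parse</CODE> loop, and a counter replaces the
final <CODE>Parse(pParser, 0, t0)</CODE>.

```c
#include <assert.h>
#include <unistd.h>

#define CHUNK 1024

/* Hypothetical stand-in for the parser: counts bytes and end-of-input calls. */
struct sink {
    long bytes;
    int end_calls;
};

/* Reads fd until read() returns 0 (end of file) or -1 (error), handing each
 * chunk to the sink, then signals end of input exactly once. The final
 * call sits outside the loop so it runs even when the very first read()
 * returns nothing. */
void feed_all(int fd, struct sink *s)
{
    char buf[CHUNK + 1];
    ssize_t n;

    assert(s != NULL);
    while ((n = read(fd, buf, CHUNK)) > 0) {
        buf[n] = '\0';        /* the driver terminates the buffer the same way */
        s->bytes += n;        /* stands in for the yylex()/Parse() loop */
    }
    s->end_calls++;           /* stands in for Parse(pParser, 0, t0) */
}
```

<P>Because <CODE>feed_all</CODE> takes a plain file descriptor, the same
function works for stdin, a pipe, or a connected socket.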
<H2>Summary</H2>
<P>lemon is fast, completely in the public domain, well tested in SQLite,
and thread safe. Parser generators can help developers write reusable code
for complex tasks in a fraction of the time they would need for writing
the complete program from scratch. The syntax file, the file that holds
the grammar, can be modified to suit multiple needs.
<P>Although I have had no problems with lemon.c, it does produce a few
compiler warnings, several of them about comparisons between signed and
unsigned integers, when compiled with the -Wall -W flags:
<P><PRE><CODE>
[chirico@third-fl-71 lemon_examples]$ gcc -Wall -W -O2 -s -pipe lemon.c
lemon.c: In function `resolve_conflict':
lemon.c:973: warning: unused parameter `errsym'
lemon.c: In function `main':
lemon.c:1342: warning: unused parameter `argc'
lemon.c: At top level:
lemon.c:2308: warning: return type defaults to `int'
lemon.c: In function `preprocess_input':
lemon.c:2334: warning: comparison between signed and unsigned
lemon.c:2352: warning: control reaches end of non-void function
lemon.c:2311: warning: `start' might be used uninitialized in this function
lemon.c:2313: warning: `start_lineno' might be used uninitialized in this function
lemon.c: In function `Parse':
lemon.c:2393: warning: comparison between signed and unsigned
lemon.c: In function `tplt_open':
lemon.c:2904: warning: implicit declaration of function `access'
lemon.c: In function `append_str':
lemon.c:3019: warning: comparison between signed and unsigned
lemon.c:3011: warning: unused variable `i'
lemon.c: In function `translate_code':
lemon.c:3109: warning: control reaches end of non-void function
</CODE></PRE>
<P>This can be an inconvenience when adding the parse.c file to existing
code. A fix is on the way. Since I expect the changes to be cleaned up
soon, this version of lemon.c is the same version that you'd get from the
author's site, which will make it easier to apply the patch.
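<P>For readers unfamiliar with the "comparison between signed and
unsigned" warning, here is a small illustration of the usual cause and
cure. It is not code from lemon.c; the function <CODE>count_char</CODE> is
made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Counts occurrences of c in s. The index is a size_t, the same unsigned
 * type strlen() returns, so the loop condition compares like with like.
 * Declaring the index as int instead is what typically draws
 * "comparison between signed and unsigned" under gcc -Wall -W. */
size_t count_char(const char *s, char c)
{
    size_t i, n = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    for (i = 0; i < len; i++)
        if (s[i] == c)
            n++;
    return n;
}
```

<P>Warnings of this kind are usually harmless for small values, which is
consistent with lemon working well in practice despite them.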
<P>There are times when a parser generator like lemon or bison may be a
little too much. These are powerful tools. An interesting alternative, if
you're a C++ programmer and you only need to do inline parsing, is the
spirit library. See Reference 10.
<H2>Examples for this article</H2>
<P>The complete source for these examples, including the parser itself,
can be downloaded <A
href="http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download">here</A>.
<H2>References</H2>
<OL>
<LI><A
href="http://souptonuts.sourceforge.net/code/desktop_calc.cc.html">An
example desktop calculator from scratch</A>
<LI><A
href="http://prdownloads.sourceforge.net/souptonuts/flex_bison.tar.gz?download">An
example of a flex and bison parser</A>
<LI><A href="http://www.hwaci.com/sw/lemon/">The home of the lemon
parser generator</A>
<LI><A href="http://www.sqlite.org/">The home of SQLite</A>
<LI><A
href="http://prdownloads.sourceforge.net/souptonuts/README_sqlite_tutorial.html?download">Extensive
SQLite Tutorial</A>
<LI><A href="http://www.parsifalsoft.com/gloss.html">A glossary of
parser terms</A>
<LI><A href="http://www.parsifalsoft.com/isdp.html">A good introduction
to parsers</A>
<LI><A href="http://www.gnu.org/software/flex/manual/">The GNU flex
manual</A>
<LI><A href="http://www.gnu.org/software/bison/manual/">The GNU bison
manual</A>
<LI><A href="http://spirit.sourceforge.net/">The spirit parser</A>
<LI><A href="http://www.iunknown.com/000123.html">Getting a C++ Bison
parser to use a C Flex lexer</A>
<LI><A href="http://www.linux.com/howtos/Lex-YACC-HOWTO.shtml">The
Lex-YACC-HOWTO</A> </LI></OL>
<P></P></TD></TR></TBODY></TABLE><!-- *** BEGIN bio *** -->
<HR>
<P><EM>Mike Chirico, a father of triplets (all girls) lives outside of
Philadelphia, PA, USA. He has worked with Linux since 1996, has a Masters in
Computer Science and Mathematics from Villanova University, and has worked in
computer-related jobs from Wall Street to the University of Pennsylvania. His
hero is Paul Erdos, a brilliant number theorist who was known for his open
collaboration with others.
<P><BR>Mike's notes page is <A
href="http://souptonuts.sourceforge.net/chirico/index.php">souptonuts</A>. For
open source consulting needs, please send an email to <A
href="mailto:mchirico@comcast.net?subject=Open Source Consulting Needs">mailto:mchirico@comcast.net?subject=Open
Source Consulting Needs</A>. All consulting work must include a donation to
SourceForge.net. </EM><BR><!-- *** END bio *** -->
<P><BR><A href="http://sourceforge.net/"><IMG height=62
alt="SourceForge.net Logo" src="readme_lemon_tutorial.files/sflogo.png"
width=210 border=0></A> </P></BODY></HTML>