📄 readme_lemon_tutorial.htm

📁 A Windows program for learning lemon parsing
    10                                                  yylval.dval = atof(yytext);
    11                                                  return NUM; }
    12  [ \t]   { col += (int) strlen(yytext); }               /* ignore but count white space */
    13  [A-Za-z][A-Za-z0-9]*                           { /* ignore but needed for variables */
    
    14                                                  return 0;
    15                                                 }
    
    16  "+"           {  return PLUS; }
    17  "-"           {  return MINUS; }
    18  "*"           {  return TIMES; }
    19  "/"           {  return DIVIDE; }
    
    20  \n      { col = 0; ++line; return NEWLINE; }
    
    21  .       { col += (int) strlen(yytext); return yytext[0]; }
    22  %%
    23  /**
    24   * reset the line and column count
    25   *
    26   *
    27   */
    28  void reset_lexer(void)
    29  {
    
    30    line = 1;
    31    col  = 1;
    
    32  }
    
    33  /**
    34   * yyerror() is invoked when the lexer or the parser encounter
    35   * an error. The error message is passed via *s
    36   *
    37   *
    38   */
    39  void yyerror(char *s)
    40  {
    41    printf("error: %s at line: %d col: %d\n",s,line,col);
    
    42  }
    
    43  int yywrap(void)
    44  {
    45    return 1;
    46  }
    
</CODE></PRE>
      <P>The format for flex is basically a rule on the left side followed by C 
      code to execute on the right side. Take line 9, 
      "<CODE>[0-9]+|[0-9]*\.[0-9]+</CODE>", which will match any of 3, .3, 0.3, 
      and 23.4 and will return NUM. What's the value of NUM? It's taken from 
      line 3, which includes the file "example5.h", generated from the lemon 
      parser. On line 10, <CODE>yylval.dval</CODE> is assigned the value of 
      "<CODE>yytext</CODE>" after <CODE>atof</CODE> converts it to a double. 
      The type of yylval is defined in "lexglobal.h" on line 2. 
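      <P>To make the matching rule concrete, here is a minimal hand-written 
      sketch of what the pattern on line 9 accepts. It is not what flex 
      generates; <CODE>match_num</CODE> and <CODE>num_value</CODE> are 
      hypothetical helper names used only for illustration. 

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of s that matches
 * [0-9]+|[0-9]*\.[0-9]+, or 0 if neither alternative matches. */
size_t match_num(const char *s)
{
    size_t i = 0;
    size_t int_len;

    assert(s != NULL);
    while (isdigit((unsigned char)s[i]))
        i++;
    int_len = i;                      /* first alternative: [0-9]+ */

    /* second alternative: optional digits, a dot, at least one digit */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;                          /* consume the '.' */
        while (isdigit((unsigned char)s[i]))
            i++;
        return i;                     /* always longer than int_len */
    }
    return int_len;
}

/* Hypothetical helper: convert the matched prefix with atof, as line 10
 * does with yytext. Returns 0.0 when nothing matches. */
double num_value(const char *s)
{
    char buf[64];
    size_t n = match_num(s);

    if (n == 0 || n >= sizeof buf)
        return 0.0;
    memcpy(buf, s, n);
    buf[n] = '\0';
    return atof(buf);
}
```

      <P>Like flex, the sketch prefers the longest match, so "23.4" is consumed 
      as one number, while in "3." only the "3" is matched because the second 
      alternative requires a digit after the dot. 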
      <P>"lexglobal.h" with line numbers added: 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/lexglobal.h.html">lexglobal.h</A> 

      <P><PRE><CODE>
    1  #ifndef YYSTYPE
    2  typedef union {
    3    double    dval;
    4    struct symtab *symp;
    5  } yystype;
    6  # define YYSTYPE yystype
    7  # define YYSTYPE_IS_TRIVIAL 1
    8  #endif
    
    9  /* extern YYSTYPE yylval; */
    10  YYSTYPE yylval;
    
</CODE></PRE>
      <P><CODE>yystype</CODE> is a union of <CODE>dval</CODE> and 
      <CODE>symp</CODE>, a pointer to a <CODE>struct symtab</CODE>. Again, 
      <CODE>symp</CODE> is not used in these examples, but should you move to 
      a calculator with variables that can be assigned, you'd use it. See 
      Reference 2 for a full calculator example with flex and bison. 
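      <P>As a rough idea of what the symbol-table side might look like, here is 
      a minimal sketch. The tutorial never defines <CODE>struct symtab</CODE>, 
      so the layout, the table size <CODE>NSYMS</CODE>, and the lookup function 
      <CODE>symlook</CODE> are all assumptions made for illustration. 

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NSYMS 20                      /* assumed table size */

/* Hypothetical layout: one named variable and its current value. */
struct symtab {
    char   name[32];
    double value;
};

static struct symtab symbols[NSYMS];

/* Return the entry for name, creating it on first use. Returns NULL
 * when the name is empty or too long, or when the table is full. */
struct symtab *symlook(const char *name)
{
    size_t i;

    assert(name != NULL);
    if (name[0] == '\0' || strlen(name) >= sizeof symbols[0].name)
        return NULL;

    for (i = 0; i < NSYMS; i++) {
        if (symbols[i].name[0] == '\0') {       /* free slot: claim it */
            strcpy(symbols[i].name, name);
            symbols[i].value = 0.0;
            return &symbols[i];
        }
        if (strcmp(symbols[i].name, name) == 0) /* already known */
            return &symbols[i];
    }
    return NULL;
}
```

      <P>In a lexer with variables, the identifier rule could then store the 
      result of <CODE>symlook(yytext)</CODE> in <CODE>yylval.symp</CODE> instead 
      of returning 0. 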
      <P>Again looking at lines 9 through 11 in <A 
      href="http://souptonuts.sourceforge.net/code/lexer.l.html">lexer.l</A>: 
      <P><PRE><CODE>
    ...
    9  [0-9]+|[0-9]*\.[0-9]+    {                      col += (int) strlen(yytext);
    10                                                  yylval.dval = atof(yytext);
    11                                                  return NUM; }
    ...
</CODE></PRE>
      <P>Both the type of token, NUM, and its value must be passed along. We 
      need to know it's a number, but we also need to know the value of the 
      number. 
      <P>For PLUS, MINUS, TIMES, and DIVIDE, by contrast, we only need to know 
      that the particular token has been found. Therefore, in lexer.l, 
      lines 16 through 19 return only the token type. 
      <P><PRE><CODE>
    16  "+"           {  return PLUS; }
    17  "-"           {  return MINUS; }
    18  "*"           {  return TIMES; }
    19  "/"           {  return DIVIDE; }
    
    20  \n      { col = 0; ++line; return NEWLINE; }
    
    21  .       { col += (int) strlen(yytext); return yytext[0]; }
    22  %%
</CODE></PRE>
      <P>Line 20 matches a newline and returns NEWLINE. Although the parser 
      does not use them, the variable "<CODE>line</CODE>" keeps track of the 
      current line number and <CODE>col</CODE> tracks the current column. This 
      is a good idea; it is helpful when debugging. 
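      <P>The position bookkeeping can be sketched outside of flex as well. 
      The following is only an illustration: <CODE>struct position</CODE> and 
      <CODE>advance</CODE> are hypothetical names, and the sketch walks the text 
      one character at a time rather than one token at a time. 

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the lexer's two counters. */
struct position {
    int line;
    int col;
};

/* Apply the same bookkeeping as the rules on lines 12, 20, and 21 of
 * lexer.l: a newline bumps the line and zeroes the column, anything
 * else advances the column by one. */
void advance(struct position *pos, const char *text)
{
    size_t i;

    assert(pos != NULL && text != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            pos->line++;
            pos->col = 0;
        } else {
            pos->col++;
        }
    }
}
```

      <P>Starting from line 1, column 1 (the values set by 
      <CODE>reset_lexer</CODE>), feeding "3+5" leaves the column at 4, and a 
      following newline moves to line 2, column 0. 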
      <P>The driver, main_part5, contains a lot more code. It reads stdin with 
      the low-level <CODE>read</CODE> call. This could easily be changed to 
      accept input coming in on a socket descriptor, so if you had a Web 
      scraping program that scans input from a TCP socket, the socket 
      descriptor would replace "<CODE>fileno(stdin)</CODE>" on line 33. 
      <P><A 
      href="http://souptonuts.sourceforge.net/code/main_part5.html">main_part5</A> 

      <P><PRE><CODE>
    
    1  #include &lt;stdio.h&gt;
    2  #include &lt;unistd.h&gt;
    3  #include &lt;sys/types.h&gt;
    4  #include &lt;sys/stat.h&gt;
    5  #include &lt;fcntl.h&gt;
    6  #include &lt;stdlib.h&gt;
    
    7  #define BUFS 1024
    
    8  /**
    9   * We have to declare these here - they're not  in any header files
    10   * we can include.  yyparse() is declared with an empty argument list
    11   * so that it is compatible with the generated C code from bison.
    12   *
    13   */
    
    14  extern FILE *yyin;
    15  typedef struct yy_buffer_state *YY_BUFFER_STATE;
    
    16  extern "C" {
    17    int             yylex( void );
    18    YY_BUFFER_STATE yy_scan_string( const char * );
    19    void            yy_delete_buffer( YY_BUFFER_STATE );
    20  }
    
    21  int main(int argc,char** argv)
    22  {
    23    int n;
    24    int yv;
    25    char buf[BUFS+1];
    26    void* pParser = ParseAlloc (malloc);
    
    27    struct Token t0,t1;
    28    struct Token mToken;
    
    29    t0.n=0;
    30    t0.value=0;
    
    31    std::cout &lt;&lt; "Enter an expression like 3+5 &lt;return&gt;" &lt;&lt; std::endl;
    32    std::cout &lt;&lt; "  Terminate with ^D" &lt;&lt; std::endl;
    
    33    while ( ( n=read(fileno(stdin), buf, BUFS )) &gt;  0)
    34      {
    35        buf[n]='\0';
    36        yy_scan_string(buf);
    37        // on EOF yylex will return 0
    38        while( (yv=yylex()) != 0)
    39          {
    40            std::cout &lt;&lt; " yylex() " &lt;&lt; yv &lt;&lt; " yylval.dval " &lt;&lt; yylval.dval &lt;&lt; std::endl;
    41            t0.value=yylval.dval;
    42            Parse (pParser, yv, t0);
    43          }
    
    44      }
    
    45    Parse (pParser, 0, t0);
    46    ParseFree(pParser, free );
    
    47  }
</CODE></PRE>
      <P>Line 16, '<CODE>extern "C"</CODE>', is necessary because "lexer.l" was 
      run through flex to create C code, as opposed to C++ code: 
      <P><PRE><CODE>
    $ flex lexer.l
</CODE></PRE>
      <P>See the flex manual, Reference 8. Yes, "<CODE>flex++</CODE>" will 
      output C++ code. However, for complex scanning, C code may be faster. 
      "main_part5", which is compiled as a C++ program, makes the transition 
      smoothly. 
      <P>The driver should always signal the end of input by passing 0 as the 
      second parameter, as in "<CODE>Parse(pParser,0,..</CODE>". When there is 
      no more input in the current buffer, flex <I>will</I> return a zero, 
      which ends the <CODE>while</CODE> loop on line 38 without being passed 
      to the parser. Then the <CODE>read</CODE> statement, line 33, looks for 
      more input. This is something you would want to do when reading from a 
      socket, since the data may have been delayed. 
      <P>The parser still has to be told that input has ended, even if the 
      initial read (line 33 for the first time) isn't successful and flex has 
      no chance of returning a zero. Therefore, line 45 has a zero as the 
      second parameter. 
      <P><PRE><CODE>
    ...
    33    while ( ( n=read(fileno(stdin), buf, BUFS )) &gt;  0)
    
    ...
    38        while( (yv=yylex()) != 0)
    39          {
    40            std::cout &lt;&lt; " yylex() " &lt;&lt; yv &lt;&lt; " yylval.dval " &lt;&lt; yylval.dval &lt;&lt; std::endl;
    41            t0.value=yylval.dval;
    42            Parse (pParser, yv, t0);
    43          }
    ...
    45    Parse (pParser, 0, t0);
    46    ParseFree(pParser, free );
</CODE></PRE>
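      <P>The control flow above can be tried without lemon or flex by using a 
      stand-in parser. Everything in this sketch is hypothetical: 
      <CODE>stub_parse</CODE> plays the role of <CODE>Parse</CODE>, and a 
      zero-terminated array of token codes plays the role of repeated calls to 
      <CODE>yylex()</CODE>. 

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the lemon parser state. */
struct stub_parser {
    int tokens;     /* number of real tokens received */
    int finished;   /* set once the end-of-input token 0 arrives */
};

/* Plays the role of Parse(): token 0 means end of input. */
void stub_parse(struct stub_parser *p, int token)
{
    assert(p != NULL);
    if (token == 0)
        p->finished = 1;
    else
        p->tokens++;
}

/* Plays the role of the inner loop on lines 38 through 43: the 0 that
 * ends a chunk stops the loop but is not forwarded to the parser. */
void feed_chunk(struct stub_parser *p, const int *codes)
{
    size_t i;

    assert(codes != NULL);
    for (i = 0; codes[i] != 0; i++)
        stub_parse(p, codes[i]);
}

/* Plays the role of line 45: the explicit end-of-input call. */
void finish(struct stub_parser *p)
{
    stub_parse(p, 0);
}
```

      <P>Feeding two chunks and then calling <CODE>finish</CODE> shows that the 
      parser only learns about the end of input from the final call, no matter 
      how many chunks were read. 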
      <H2>Summary</H2>
      <P>lemon is fast, completely in the public domain, well tested in SQLite, 
      and thread safe. Parser generators can help developers write reusable code 
      for complex tasks in a fraction of the time they would need for writing 
      the complete program from scratch. The syntax file, the file that holds 
      the grammar, can be modified to suit multiple needs. 
      <P>Although I have had no problems with lemon.c, there are a few compiler 
      warnings, several of them about comparisons between signed and unsigned 
      integers, when compiling it with the -Wall -W flags: 
      <P><PRE><CODE>
    [chirico@third-fl-71 lemon_examples]$ gcc -Wall -W -O2 -s -pipe lemon.c
    lemon.c: In function `resolve_conflict':
    lemon.c:973: warning: unused parameter `errsym'
    lemon.c: In function `main':
    lemon.c:1342: warning: unused parameter `argc'
    lemon.c: At top level:
    lemon.c:2308: warning: return type defaults to `int'
    lemon.c: In function `preprocess_input':
    lemon.c:2334: warning: comparison between signed and unsigned
    lemon.c:2352: warning: control reaches end of non-void function
    lemon.c:2311: warning: `start' might be used uninitialized in this function
    lemon.c:2313: warning: `start_lineno' might be used uninitialized in this function
    lemon.c: In function `Parse':
    lemon.c:2393: warning: comparison between signed and unsigned
    lemon.c: In function `tplt_open':
    lemon.c:2904: warning: implicit declaration of function `access'
    lemon.c: In function `append_str':
    lemon.c:3019: warning: comparison between signed and unsigned
    lemon.c:3011: warning: unused variable `i'
    lemon.c: In function `translate_code':
    lemon.c:3109: warning: control reaches end of non-void function
</CODE></PRE>
      <P>This can be an inconvenience when adding the parse.c file to existing 
      code. A fix is on the way. Since I expect the changes to be cleaned up 
      soon, this version of lemon.c is the same version that you'd get from the 
      author's site, which will make it easier to apply the patch. 
      <P>There are times when a parser like lemon or bison may be a little too 
      much. These are powerful tools. An interesting alternative, if you're a 
      C++ programmer and you only need to do inline parsing, is the spirit 
      library. See Reference 10. 
      <H2>Examples for this article</H2>
      <P>The complete source for these examples, including the parser itself, 
      can be downloaded <A 
      href="http://prdownloads.sourceforge.net/souptonuts/lemon_examples.tar.gz?download">here</A>. 

      <H2>References</H2>
      <OL>
        <LI><A 
        href="http://souptonuts.sourceforge.net/code/desktop_calc.cc.html">An 
        example desktop calculator from scratch</A> 
        <LI><A 
        href="http://prdownloads.sourceforge.net/souptonuts/flex_bison.tar.gz?download">An 
        example of a flex and bison parser</A> 
        <LI><A href="http://www.hwaci.com/sw/lemon/">The home of the lemon 
        parser generator</A> 
        <LI><A href="http://www.sqlite.org/">The home of SQLite</A> 
        <LI><A 
        href="http://prdownloads.sourceforge.net/souptonuts/README_sqlite_tutorial.html?download">Extensive 
        SQLite Tutorial</A> 
        <LI><A href="http://www.parsifalsoft.com/gloss.html">A glossary of 
        parser terms</A> 
        <LI><A href="http://www.parsifalsoft.com/isdp.html">A good introduction 
        to parsers</A> 
        <LI><A href="http://www.gnu.org/software/flex/manual/">The GNU flex 
        manual</A> 
        <LI><A href="http://www.gnu.org/software/bison/manual/">The GNU bison 
        manual</A> 
        <LI><A href="http://spirit.sourceforge.net/">The spirit parser</A> 
        <LI><A href="http://www.iunknown.com/000123.html">Getting a C++ Bison 
        parser to use a C Flex lexer</A> 
        <LI><A href="http://www.linux.com/howtos/Lex-YACC-HOWTO.shtml">The 
        Lex-YACC-HOWTO</A> </LI></OL>
      <P></P></TD></TR></TBODY></TABLE><!-- *** BEGIN bio *** -->
<HR>

<P><IMG alt="Chirico img" src="readme_lemon_tutorial.files/pgtrack.htm" 
border=0> <EM>Mike Chirico, a father of triplets (all girls) lives outside of 
Philadelphia, PA, USA. He has worked with Linux since 1996, has a Masters in 
Computer Science and Mathematics from Villanova University, and has worked in 
computer-related jobs from Wall Street to the University of Pennsylvania. His 
hero is Paul Erdos, a brilliant number theorist who was known for his open 
collaboration with others. 
<P><BR>Mike's notes page is <A 
href="http://souptonuts.sourceforge.net/chirico/index.php">souptonuts</A>. For 
open source consulting needs, please send an email to <A 
href="mailto:mchirico@comcast.net?subject=Open Source Consulting Needs">mailto:mchirico@comcast.net?subject=Open 
Source Consulting Needs</A>. All consulting work must include a donation to 
SourceForge.net. </EM><BR><!-- *** END bio *** -->
<P><BR><A href="http://sourceforge.net/"><IMG height=62 
alt="SourceForge.net Logo" src="readme_lemon_tutorial.files/sflogo.png" 
width=210 border=0></A> </P></BODY></HTML>