📄 the lemon parser generator.htm

📁 編譯器的lemon的辭法分析器
💻 HTM
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0040)http://www.hwaci.com/sw/lemon/lemon.html -->
<HTML><HEAD><TITLE>The Lemon Parser Generator</TITLE>
<META http-equiv=Content-Type content="text/html; charset=big5">
<META content="MSHTML 6.00.2800.1400" name=GENERATOR></HEAD>
<BODY bgColor=white>
<H1 align=center>The Lemon Parser Generator</H1>
<P>Lemon is an LALR(1) parser generator for C or C++. It does the same job as 
``bison'' and ``yacc''. But lemon is not another bison or yacc clone. It uses a 
different grammar syntax which is designed to reduce the number of coding 
errors. Lemon also uses a more sophisticated parsing engine that is faster than 
yacc and bison and which is both reentrant and thread-safe. Furthermore, Lemon 
implements features that can be used to eliminate resource leaks, making is 
suitable for use in long-running programs such as graphical user interfaces or 
embedded controllers.</P>
<P>This document is an introduction to the Lemon parser generator.</P>
<H2>Theory of Operation</H2>
<P>The main goal of Lemon is to translate a context free grammar (CFG) for a 
particular language into C code that implements a parser for that language. The 
program has two inputs: 
<UL>
  <LI>The grammar specification. 
  <LI>A parser template file. </LI></UL>Typically, only the grammar specification 
is supplied by the programmer. Lemon comes with a default parser template which 
works fine for most applications. But the user is free to substitute a different 
parser template if desired.
<P></P>
<P>Depending on command-line options, Lemon will generate between one and three 
files of outputs. 
<UL>
  <LI>C code to implement the parser. 
  <LI>A header file defining an integer ID for each terminal symbol. 
  <LI>An information file that describes the states of the generated parser 
  automaton. </LI></UL>By default, all three of these output files are generated. 
The header file is suppressed if the ``-m'' command-line option is used and the 
report file is omitted when ``-q'' is selected.
<P></P>
<P>The grammar specification file uses a ``.y'' suffix, by convention. In the 
examples used in this document, we'll assume the name of the grammar file is 
``gram.y''. A typical use of Lemon would be the following command: <PRE>   lemon gram.y
</PRE>This command will generate three output files named ``gram.c'', ``gram.h'' 
and ``gram.out''. The first is C code to implement the parser. The second is the 
header file that defines numerical values for all terminal symbols, and the last 
is the report that explains the states used by the parser automaton.
<P></P>
<H3>Command Line Options</H3>
<P>The behavior of Lemon can be modified using command-line options. You can 
obtain a list of the available command-line options together with a brief 
explanation of what each does by typing <PRE>   lemon -?
</PRE>As of this writing, the following command-line options are supported: 
<UL>
  <LI><TT>-b</TT> 
  <LI><TT>-c</TT> 
  <LI><TT>-g</TT> 
  <LI><TT>-m</TT> 
  <LI><TT>-q</TT> 
  <LI><TT>-s</TT> 
  <LI><TT>-x</TT> </LI></UL>The ``-b'' option reduces the amount of text in the 
report file by printing only the basis of each parser state, rather than the 
full configuration. The ``-c'' option suppresses action table compression. Using 
-c will make the parser a little larger and slower but it will detect syntax 
errors sooner. The ``-g'' option causes no output files to be generated at all. 
Instead, the input grammar file is printed on standard output but with all 
comments, actions and other extraneous text deleted. This is a useful way to get 
a quick summary of a grammar. The ``-m'' option causes the output C source file 
to be compatible with the ``makeheaders'' program. Makeheaders is a program that 
automatically generates header files from C source code. When the ``-m'' option 
is used, the header file is not output since the makeheaders program will take 
care of generated all header files automatically. The ``-q'' option suppresses 
the report file. Using ``-s'' causes a brief summary of parser statistics to be 
printed. Like this: <PRE>   Parser statistics: 74 terminals, 70 nonterminals, 179 rules
                      340 states, 2026 parser table entries, 0 conflicts
</PRE>Finally, the ``-x'' option causes Lemon to print its version number and 
copyright information and then stop without attempting to read the grammar or 
generate a parser.
<P></P>
<H3>The Parser Interface</H3>
<P>Lemon doesn't generate a complete, working program. It only generates a few 
subroutines that implement a parser. This section describes the interface to 
those subroutines. It is up to the programmer to call these subroutines in an 
appropriate way in order to produce a complete system.</P>
<P>Before a program begins using a Lemon-generated parser, the program must 
first create the parser. A new parser is created as follows: <PRE>   void *pParser = ParseAlloc( malloc );
</PRE>The ParseAlloc() routine allocates and initializes a new parser and 
returns a pointer to it. The actual data structure used to represent a parser is 
opaque -- its internal structure is not visible or usable by the calling 
routine. For this reason, the ParseAlloc() routine returns a pointer to void 
rather than a pointer to some particular structure. The sole argument to the 
ParseAlloc() routine is a pointer to the subroutine used to allocate memory. 
Typically this means ``malloc()''.
<P></P>
<P>After a program is finished using a parser, it can reclaim all memory 
allocated by that parser by calling <PRE>   ParseFree(pParser, free);
</PRE>The first argument is the same pointer returned by ParseAlloc(). The 
second argument is a pointer to the function used to release bulk memory back to 
the system.
<P></P>
<P>After a parser has been allocated using ParseAlloc(), the programmer must 
supply the parser with a sequence of tokens (terminal symbols) to be parsed. 
This is accomplished by calling the following function once for each token: <PRE>   Parse(pParser, hTokenID, sTokenData, pArg);
</PRE>The first argument to the Parse() routine is the pointer returned by 
ParseAlloc(). The second argument is a small positive integer that tells the 
parse the type of the next token in the data stream. There is one token type for 
each terminal symbol in the grammar. The gram.h file generated by Lemon contains 
#define statements that map symbolic terminal symbol names into appropriate 
integer values. (A value of 0 for the second argument is a special flag to the 
parser to indicate that the end of input has been reached.) The third argument 
is the value of the given token. By default, the type of the third argument is 
integer, but the grammar will usually redefine this type to be some kind of 
structure. Typically the second argument will be a broad category of tokens such 
as ``identifier'' or ``number'' and the third argument will be the name of the 
identifier or the value of the number.
<P></P>
<P>The Parse() function may have either three or four arguments, depending on 
the grammar. If the grammar specification file request it, the Parse() function 
will have a fourth parameter that can be of any type chosen by the programmer. 
The parser doesn't do anything with this argument except to pass it through to 
action routines. This is a convenient mechanism for passing state information 
down to the action routines without having to use global variables.</P>
<P>A typical use of a Lemon parser might look something like the following: <PRE>   01 ParseTree *ParseFile(const char *zFilename){
   02    Tokenizer *pTokenizer;
   03    void *pParser;
   04    Token sToken;
   05    int hTokenId;
   06    ParserState sState;
   07
   08    pTokenizer = TokenizerCreate(zFilename);
   09    pParser = ParseAlloc( malloc );
   10    InitParserState(&amp;sState);
   11    while( GetNextToken(pTokenizer, &amp;hTokenId, &amp;sToken) ){
   12       Parse(pParser, hTokenId, sToken, &amp;sState);
   13    }
   14    Parse(pParser, 0, sToken, &amp;sState);
   15    ParseFree(pParser, free );
   16    TokenizerFree(pTokenizer);
   17    return sState.treeRoot;
   18 }
</PRE>This example shows a user-written routine that parses a file of text and 
returns a pointer to the parse tree. (We've omitted all error-handling from this 
example to keep it simple.) We assume the existence of some kind of tokenizer 
which is created using TokenizerCreate() on line 8 and deleted by 
TokenizerFree() on line 16. The GetNextToken() function on line 11 retrieves the 
next token from the input file and puts its type in the integer variable 
hTokenId. The sToken variable is assumed to be some kind of structure that 
contains details about each token, such as its complete text, what line it 
occurs on, etc. 
<P></P>
<P>This example also assumes the existence of structure of type ParserState that 
holds state information about a particular parse. An instance of such a 
structure is created on line 6 and initialized on line 10. A pointer to this 
structure is passed into the Parse() routine as the optional 4th argument. The 
action routine specified by the grammar for the parser can use the ParserState 
structure to hold whatever information is useful and appropriate. In the 
example, we note that the treeRoot field of the ParserState structure is left 
pointing to the root of the parse tree.</P>
<P>The core of this example as it relates to Lemon is as follows: <PRE>   ParseFile(){
      pParser = ParseAlloc( malloc );
      while( GetNextToken(pTokenizer,&amp;hTokenId, &amp;sToken) ){
         Parse(pParser, hTokenId, sToken);
      }
      Parse(pParser, 0, sToken);
      ParseFree(pParser, free );
   }
</PRE>Basically, what a program has to do to use a Lemon-generated parser is 
first create the parser, then send it lots of tokens obtained by tokenizing an 
input source. When the end of input is reached, the Parse() routine should be 
called one last time with a token type of 0. This step is necessary to inform 
the parser that the end of input has been reached. Finally, we reclaim memory 
used by the parser by calling ParseFree().
<P></P>
<P>There is one other interface routine that should be mentioned before we move 
on. The ParseTrace() function can be used to generate debugging output from the 
parser. A prototype for this routine is as follows: <PRE>   ParseTrace(FILE *stream, char *zPrefix);
</PRE>After this routine is called, a short (one-line) message is written to the 
designated output stream every time the parser changes states or calls an action 
routine. Each such message is prefaced using the text given by zPrefix. This 
debugging output can be turned off by calling ParseTrace() again with a first 
argument of NULL (0).
<P></P>
<H3>Differences With YACC and BISON</H3>
<P>Programmers who have previously used the yacc or bison parser generator will 
notice several important differences between yacc and/or bison and Lemon. 
<UL>
  <LI>In yacc and bison, the parser calls the tokenizer. In Lemon, the tokenizer 
  calls the parser. 
  <LI>Lemon uses no global variables. Yacc and bison use global variables to 
  pass information between the tokenizer and parser. 
  <LI>Lemon allows multiple parsers to be running simultaneously. Yacc and bison 
  do not. </LI></UL>These differences may cause some initial confusion for 
programmers with prior yacc and bison experience. But after years of experience 
using Lemon, I firmly believe that the Lemon way of doing things is better.
<P></P>
<H2>Input File Syntax</H2>
<P>The main purpose of the grammar specification file for Lemon is to define the 
grammar for the parser. But the input file also specifies additional information 
Lemon requires to do its job. Most of the work in using Lemon is in writing an 
appropriate grammar file.</P>
<P>The grammar file for lemon is, for the most part, free format. It does not 
have sections or divisions like yacc or bison. Any declaration can occur at any 
point in the file. Lemon ignores whitespace (except where it is needed to
12 3 下一页
💿 文件大小 48 K
👤 上传用户 syxie
📂 所属分类编译器/解释器
🏷️ 相关标签

#lemon #分析器
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -