📄 manual.html

📁 java cup constructor to compiler
💻 HTML
📖 第 1 页 / 共 5 页
字号:
12 3 4 5 下一页
<html><head><title>CUP User's Manual</title></head><body><hr><img src="cup_logo.gif" alt="[CUP Logo Image]"><hr><h1>CUP User's Manual</h1><h3><a href="http://www.cc.gatech.edu/gvu/people/Faculty/Scott.E.Hudson.html">Scott E. Hudson</a><br> <a href="http://www.cc.gatech.edu/gvu/gvutop.html">Graphics Visualization and Usability Center</a><br> <a href="http://www.gatech.edu/TechHome.html">Georgia Institute of Technology</a></h3>Modified by <a href="http://www.princeton.edu/~frankf">FrankFlannery</a>, <a href="http://www.pdos.lcs.mit.edu/~cananian/">C. Scott Ananian</a>, <a href="http://www.cs.princeton.edu/~danwang">Dan Wang</a> with advice from <a href="http://www.cs.princeton.edu/~appel">Andrew W. Appel</a><br>Last updated July 1999 (v0.10j)<hr><h3>Table of Contents</h3><dl compact>  <dt> i.  <dd> <a href="#about">About CUP Version 0.10</a>  <dt> 1.  <dd> <a href="#intro">Introduction and Example</a>  <dt> 2.  <dd> <a href="#spec">Specification Syntax</a>  <dt> 3.  <dd> <a href="#running">Running CUP</a>  <dt> 4.  <dd> <a href="#parser">Customizing the Parser</a>  <dt> 5.  <dd> <a href="#scanner">Scanner interface</a>  <dt> 6.  <dd> <a href="#errors">Error Recovery</a>  <dt> 7.  <dd> <a href="#conclusion">Conclusion</a>  <dt>     <dd> <a href="#refs">References</a>  <dt> A.  <dd> <a href="#appendixa">Grammar for CUP Specification Files</a>  <dt> B.  <dd> <a href="#appendixb">A Very Simple Example Scanner</a>  <dt> C.  <dd> <a href="#changes">Incompatibilites between CUP 0.9 and CUP 0.10</a>  <dt> D.  <dd> <a href="#bugs">Bugs</a>  <dt> E.  <dd> <a href="#version">Change log</a></dl><a name=about><h3>i. About CUP Version 0.10</h3></a> Version0.10 of CUP adds many new changes and features over the previous releasesof version 0.9.  These changes attempt to make CUP more like itspredecessor, YACC.  As a result, the old 0.9 parser specifications for CUP arenot compatible and a reading of <a href="#changes">appendix C</a> of the newmanual will be necessary to write new specifications.  The new version,however, gives the user more power and options, making parser specificationseasier to write.<a name=intro><h3>1. Introduction and Example</h3></a>This manual describes the basic operation and use of the Java<a href="#trademark">(tm)</a>Based Constructor of Useful Parsers (CUP for short).CUP is a system for generating LALR parsers from simple specifications.It serves the same role as the widely used program YACC <a href="#YACCref">[1]</a> and in fact offers most of the features of YACC.  However, CUP is written in Java, uses specifications including embedded Java code, and produces parsers which are implemented in Java.<p>Although this manual covers all aspects of the CUP system, it is relativelybrief, and assumes you have at least a little bit of knowledge of LRparsing.  A working knowledge of YACC is also very helpful inunderstanding how CUP specifications work.A number of compiler construction textbooks (such as <a href="#dragonbook">[2</a>,<a href="#crafting">3]</a>) cover this material, and discuss the YACC system (which is quite similar to this one) as a specific example.  In addition, Andrew Appel's <ahref="http://www.cs.princeton.edu/~appel/modern/java/">Modern CompilerImplementation in Java</a> textbook <a href="#modernjava">[4]</a> usesand describes CUP in the context of compiler construction.<p> Using CUP involves creating a simple specification based on thegrammar for which a parser is needed, along with construction of ascanner capable of breaking characters up into meaningful tokens (suchas keywords, numbers, and special symbols).<p> As a simple example, consider a system for evaluating simple arithmetic expressions over integers.  This system would read expressions from standard input (each terminated with a semicolon), evaluate them, and print the result on standard output.  A grammar for the input to such a system might look like: <pre>  expr_list ::= expr_list expr_part | expr_part  expr_part ::= expr ';'  expr      ::= expr '+' expr | expr '-' expr | expr '*' expr 	      | expr '/' expr | expr '%' expr | '(' expr ')'                | '-' expr | number </pre>To specify a parser based on this grammar, our first step is to identify andname the set of terminal symbols that will appear on input, and the set of non-terminal symbols.  In this case, the non-terminals are: <pre><tt>  expr_list, expr_part </tt> and <tt> expr </tt>.</pre>For terminal names we might choose:<pre><tt>  SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD, NUMBER, LPAREN,</tt>and <tt>RPAREN</tt></pre>The experienced user will note a problem with the above grammar.  It isambiguous.  An ambiguous grammar is a grammar which, given a certaininput, can reduce the parts of the input in two different ways such asto give two different answers.  Take the above grammar, forexample. given the following input: <br><tt>3 + 4 * 6</tt><br>The grammar can either evaluate the <tt>3 + 4</tt> and then multiplyseven by six, or it can evaluate <tt>4 * 6</tt> and then add three.Older versions of CUP forced the user to write unambiguous grammars, butnow there is a construct allowing the user to specify precedences andassociativities for terminals.  This means that the above ambiguousgrammar can be used, after specifying precedences and associativities.There is more explanation later.Based on these namings we can construct a small CUP specification as follows:<br><hr><pre><tt>// CUP specification for a simple expression evaluator (no actions)import java_cup.runtime.*;/* Preliminaries to set up and use the scanner.  */init with {: scanner.init();              :};scan with {: return scanner.next_token(); :};/* Terminals (tokens returned by the scanner). */terminal            SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;terminal            UMINUS, LPAREN, RPAREN;terminal Integer    NUMBER;/* Non terminals */non terminal            expr_list, expr_part;non terminal Integer    expr, term, factor;/* Precedences */precedence left PLUS, MINUS;precedence left TIMES, DIVIDE, MOD;precedence left UMINUS;/* The grammar */expr_list ::= expr_list expr_part |               expr_part;expr_part ::= expr SEMI;expr      ::= expr PLUS expr             | expr MINUS expr              | expr TIMES expr              | expr DIVIDE expr              | expr MOD expr 	    | MINUS expr %prec UMINUS            | LPAREN expr RPAREN	    | NUMBER	    ;</tt></pre><hr><br>We will consider each part of the specification syntax in detail later.  However, here we can quickly see that the specification contains four main parts.  The first part provides preliminary and miscellaneous declarationsto specify how the parser is to be generated, and supply parts of the runtime code.  In this case we indicate that the <tt>java_cup.runtime</tt>classes should be imported, then supply a small bit of initialization code,and some code for invoking the scanner to retrieve the next input token.The second part of the specification declares terminals and non-terminals,and associates object classes with each.  In this case, the terminalsare declared as either with no type, or of type<tt>Integer</tt>.  The specified type of theterminal or non-terminal is the type of the value of those terminals ornon-terminals.  If no type is specified, the terminal or non-terminalcarries no value.  Here, no type indicates that theseterminals and non-terminals hold no value.  The third part specifies the precedence andassociativity of terminals.  The last precedence declaration give itsterminals the highest precedence. The final part of the specification contains the grammar.<p> To produce a parser from this specification we use the CUP generator.If this specification were stored in a file <tt>parser.cup</tt>, then (on a Unix system at least) we might invoke CUP using a command like:<pre><tt> java java_cup.Main &lt; parser.cup</tt> </pre>or (starting with CUP 0.10k):<pre><tt> java java_cup.Main parser.cup</tt> </pre>The system will produce two Java source files containing parts of the generated parser: <tt>sym.java</tt> and <tt>parser.java</tt>(these names can be changed with command-line options; see <A HREF="#running">below</a>).  As you might expect, these two files contain declarations for the classes <tt>sym</tt> and <tt>parser</tt>. The <tt>sym</tt> class contains a series of constant declarations, one for each terminal symbol.  This is typically used by the scanner to refer to symbols (e.g. with code such as "<tt>return new Symbol(sym.SEMI);</tt>" ).  The <tt>parser</tt> class implements the parser itself.<p>The specification above, while constructing a full parser, does not perform any semantic actions &emdash; it will only indicate success or failure of a parse.To calculate and print values of each expression, we must embed Javacode within the parser to carry out actions at various points.  In CUP,actions are contained in <i>code strings</i> which are surrounded by delimiters of the form <tt>{:</tt> and <tt>:}</tt> (we can see examples of this in the <tt>init with</tt> and <tt>scan with</tt> clauses above).  In general, the system records all characters within the delimiters, but does not try to check that it contains valid Java code.<p>A more complete CUP specification for our example system (with actions embedded at various points in the grammar) is shown below:<br><hr><pre><tt>// CUP specification for a simple expression evaluator (w/ actions)import java_cup.runtime.*;/* Preliminaries to set up and use the scanner.  */init with {: scanner.init();              :};scan with {: return scanner.next_token(); :};/* Terminals (tokens returned by the scanner). */terminal           SEMI, PLUS, MINUS, TIMES, DIVIDE, MOD;terminal           UMINUS, LPAREN, RPAREN;terminal Integer   NUMBER;/* Non-terminals */non terminal            expr_list, expr_part;non terminal Integer    expr;/* Precedences */precedence left PLUS, MINUS;precedence left TIMES, DIVIDE, MOD;precedence left UMINUS;/* The grammar */expr_list ::= expr_list expr_part 	      |               expr_part;expr_part ::= expr:e 	      {: System.out.println("= " + e); :}               SEMI              	      ;expr      ::= expr:e1 PLUS expr:e2    	      {: RESULT = new Integer(e1.intValue() + e2.intValue()); :} 	      |               expr:e1 MINUS expr:e2                  {: RESULT = new Integer(e1.intValue() - e2.intValue()); :} 	      |               expr:e1 TIMES expr:e2 	      {: RESULT = new Integer(e1.intValue() * e2.intValue()); :} 	      |               expr:e1 DIVIDE expr:e2 	      {: RESULT = new Integer(e1.intValue() / e2.intValue()); :} 	      |               expr:e1 MOD expr:e2 	      {: RESULT = new Integer(e1.intValue() % e2.intValue()); :} 	      |               NUMBER:n                 	      {: RESULT = n; :} 	      |               MINUS expr:e             	      {: RESULT = new Integer(0 - e.intValue()); :} 	      %prec UMINUS	      |               LPAREN expr:e RPAREN     	      {: RESULT = e; :} 	      ;</tt></pre><hr><br>Here we can see several changes.  Most importantly, code to be executed at various points in the parse is included inside code strings delimited by <tt>{:</tt> and <tt>:}</tt>.  In addition, labels have been placed on various symbols in the right hand side of productions.  For example in:<br><pre>  expr:e1 PLUS expr:e2    	{: RESULT = new Integer(e1.intValue() + e2.intValue()); :} </pre><a name="RES_part">the first non-terminal <tt>expr</tt> has been labeled with <tt>e1</tt>, and the second with <tt>e2</tt>.  The left hand side value of each production is always implicitly labeled as <tt>RESULT</tt>.<p>Each symbol appearing in a production is represented at runtime by anobject of type <tt>Symbol</tt> on the parse stack.  The labels refer tothe instance variable <tt>value</tt> in those objects.  In theexpression <tt>expr:e1 PLUS expr:e2</tt>, <tt>e1</tt> and <tt>e2</tt>refer to objects of type Integer.  These objects are in the value fieldsof the objects of type <tt>Symbol</tt> representing those non-terminalson the parse stack.  <tt>RESULT</tt> is of type <tt>Integer</tt> aswell, since the resulting non-terminal <tt>expr</tt> was declared as of type <tt>Integer</tt>.  This object becomes the <tt>value</tt> instance
12 3 4 5 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -