variable of a new <tt>Symbol</tt> object.<p>  For each label, two more variables accessible to the user are declared. The left and right positions of each labeled symbol are passed to the code string, so that the user can find out where the left and right sides of each terminal or non-terminal lie in the input stream. The names of these variables are the label name plus <tt>left</tt> or <tt>right</tt>. For example, given the right hand side of a production <tt>expr:e1 PLUS expr:e2</tt>, the user could access not only the variables <tt>e1</tt> and <tt>e2</tt>, but also <tt>e1left, e1right, e2left</tt>, and <tt>e2right</tt>. These variables are of type <tt>int</tt>.<p>    <a name="lex_part">The final step in creating a working parser is to create a <i>scanner</i> (also known as a <i>lexical analyzer</i> or simply a <i>lexer</i>). This routine is responsible for reading individual characters, removing things like white space and comments, recognizing which terminal symbols from the grammar each group of characters represents, and then returning <tt>Symbol</tt> objects representing these symbols to the parser. The terminals are retrieved with a call to the scanner function. In the example, the parser will call <tt>scanner.next_token()</tt>. The scanner should return objects of type <tt>java_cup.runtime.Symbol</tt>. This type is very different from the <tt>java_cup.runtime.symbol</tt> class of older versions of CUP. These <tt>Symbol</tt> objects contain the instance variable <tt>value</tt> of type <tt>Object</tt>, which should be set by the lexer. This variable refers to the value of that symbol, and the type of object in <tt>value</tt> should be the same as the type declared in the <tt>terminal</tt> and <tt>non terminal</tt> declarations. In the above example, if the lexer wished to pass a NUMBER token, it should create a <tt>Symbol</tt> with the <tt>value</tt> instance variable filled with an object of type <tt>Integer</tt>.  
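<p>As a rough illustration of the scanner's responsibilities, the sketch below tokenizes a tiny arithmetic language: it skips white space, attaches an <tt>Integer</tt> value to each NUMBER, records start and end positions, and returns a freshly allocated object on every call. So that it compiles without the CUP runtime, it uses a hypothetical stand-in class, <tt>MiniSymbol</tt>, holding only the fields this section discusses; a real CUP scanner would return <tt>java_cup.runtime.Symbol</tt> objects instead. The scanner class <tt>MiniScanner</tt> and its token codes (<tt>EOF</tt>, <tt>NUMBER</tt>, <tt>PLUS</tt>, <tt>TIMES</tt>) are likewise assumptions, standing in for the scanner you write and the constants CUP generates in its <tt>sym</tt> class.

```java
// Hypothetical stand-in for java_cup.runtime.Symbol, so this sketch compiles without CUP.
class MiniSymbol {
    final int sym;      // terminal code (CUP would take these from its generated sym class)
    final int left;     // this sketch's convention: offset of the token's first character
    final int right;    // this sketch's convention: offset just past the token's last character
    final Object value; // semantic value; null for terminals declared without a type

    MiniSymbol(int sym, int left, int right, Object value) {
        this.sym = sym;
        this.left = left;
        this.right = right;
        this.value = value;
    }
}

// Hypothetical hand-written scanner for inputs such as "3 + 4 * 8".
class MiniScanner {
    // Assumed token codes, standing in for the constants in CUP's generated sym class.
    static final int EOF = 0, NUMBER = 1, PLUS = 2, TIMES = 3;

    private final String input;
    private int pos = 0;

    MiniScanner(String input) {
        this.input = input;
    }

    // Every call allocates a NEW symbol; the parser must never be handed a reused object.
    MiniSymbol next_token() {
        // Discard white space between tokens.
        while (pos != input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos == input.length()) {
            return new MiniSymbol(EOF, pos, pos, null);
        }

        int start = pos;
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos != input.length() && Character.isDigit(input.charAt(pos))) {
                pos++;
            }
            // NUMBER carries an Integer, matching a declaration like "terminal Integer NUMBER;".
            Integer n = Integer.valueOf(input.substring(start, pos));
            return new MiniSymbol(NUMBER, start, pos, n);
        }

        pos++; // consume the single-character operator
        if (c == '+') return new MiniSymbol(PLUS, start, pos, null);
        if (c == '*') return new MiniSymbol(TIMES, start, pos, null);
        throw new IllegalStateException("unexpected character '" + c + "' at offset " + start);
    }
}
```

On the input <tt>3 + 4 * 8</tt>, successive calls to <tt>next_token()</tt> yield NUMBER (value 3, offsets 0 to 1), PLUS, NUMBER (4), TIMES, NUMBER (8), and then EOF. Character offsets are only this sketch's convention for the left and right positions.<p>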
<code>Symbol</code> objects corresponding to terminals and non-terminals with no value have a null value field.<p>The code contained in the <tt>init with</tt> clause of the specification will be executed before any tokens are requested. Each token will be requested using whatever code is found in the <tt>scan with</tt> clause. Beyond this, the exact form the scanner takes is up to you; however, note that each call to the scanner function should return a new instance of <code>java_cup.runtime.Symbol</code> (or a subclass). These symbol objects are annotated with parser information and pushed onto a stack; reusing objects will result in the parser annotations being scrambled. As of CUP 0.10j, <code>Symbol</code> reuse should be detected if it occurs; the parser will throw an <code>Error</code> telling you to fix your scanner.<p>In the <a href="#spec">next section</a> a more detailed and formal explanation of all parts of a CUP specification will be given. <a href="#running">Section 3</a> describes options for running the CUP system. <a href="#parser">Section 4</a> discusses the details of how to customize a CUP parser, while <a href="#scanner">section 5</a> discusses the scanner interface added in CUP 0.10j. <a href="#errors">Section 6</a> considers error recovery. Finally, <a href="#conclusion">Section 7</a> provides a conclusion.<a name="spec"><h3>2. Specification Syntax</h3></a>Now that we have seen a small example, we present a complete description of all parts of a CUP specification. A specification has four sections with a total of eight specific parts (however, most of these are optional).  
A specification consists of:<ul><li> <a href="#package_spec">package and import specifications</a>,<li> <a href="#code_part">user code components</a>,<li> <a href="#symbol_list">symbol (terminal and non-terminal) lists</a>, <li> <a href="#precedence">precedence declarations</a>, and<li> <a href="#production_list">the grammar</a>.</ul>Each of these parts must appear in the order presented here. (A complete grammar for the specification language is given in <a href="#appendixa">Appendix A</a>.) The particulars of each part of the specification are described in the subsections below.<p><h5><a name="package_spec">Package and Import Specifications</a></h5>A specification begins with optional <tt>package</tt> and <tt>import</tt> declarations. These have the same syntax, and play the same role, as the package and import declarations found in a normal Java program. A package declaration is of the form:<pre><tt>    package <i>name</i>;</tt></pre>where <tt><i>name</i></tt> is a Java package identifier, possibly in several parts separated by ".". In general, CUP employs Java lexical conventions. So, for example, both styles of Java comments are supported, and identifiers are constructed beginning with a letter, dollar sign ($), or underscore (_), which can then be followed by zero or more letters, numbers, dollar signs, and underscores.<p>After an optional <tt>package</tt> declaration, there can be zero or more <tt>import</tt> declarations. As in a Java program, these have the form:<pre><tt>    import <i>package_name.class_name</i>;</tt></pre>or<pre><tt>    import <i>package_name</i>.*;</tt></pre>The package declaration indicates what package the <tt>sym</tt> and <tt>parser</tt> classes that are generated by the system will be in.  
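<p>The identifier rules just described are simple enough to check mechanically. The hypothetical helper below (it is not part of CUP) tests whether a string is an identifier under those rules, whether it is a valid dotted package name, and, anticipating the symbol-list subsection, whether it may name a terminal or non-terminal, meaning it is also not one of CUP's reserved words. It approximates "letter" and "number" with <tt>Character.isLetter</tt> and <tt>Character.isLetterOrDigit</tt>, which is an assumption; Java's own identifier rules are somewhat broader.

```java
import java.util.Arrays;

// Hypothetical helper (not part of CUP) that applies the lexical rules described above.
class CupNames {
    // Reserved words listed in the symbol-list subsection of this manual.
    static final String[] RESERVED = {
        "code", "action", "parser", "terminal", "non", "nonterminal", "init", "scan",
        "with", "start", "precedence", "left", "right", "nonassoc", "import", "package"
    };

    // Starts with a letter, '$' or '_'; continues with letters, digits, '$' or '_'.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '$' || first == '_')) return false;
        for (char c : s.substring(1).toCharArray()) {
            if (!(Character.isLetterOrDigit(c) || c == '$' || c == '_')) return false;
        }
        return true;
    }

    // One or more identifiers separated by '.', as in a package name like java_cup.runtime.
    static boolean isPackageName(String s) {
        // A limit of -1 keeps empty pieces, so "a..b" and "a." are rejected.
        for (String part : s.split("\\.", -1)) {
            if (!isIdentifier(part)) return false;
        }
        return true;
    }

    // A terminal or non-terminal name must be an identifier and not a reserved word.
    static boolean isSymbolName(String s) {
        if (!isIdentifier(s)) return false;
        return !Arrays.asList(RESERVED).contains(s);
    }
}
```

For example, <tt>isPackageName("java_cup.runtime")</tt> is true, while <tt>isIdentifier("1abc")</tt> and <tt>isSymbolName("precedence")</tt> are false.<p>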
Any import declarations that appear in the specification will also appear in the source file for the <tt>parser</tt> class, allowing various names from that package to be used directly in user-supplied action code.<h5><a name="code_part">User Code Components</a></h5>Following the optional <tt>package</tt> and <tt>import</tt> declarations are a series of optional declarations that allow user code to be included as part of the generated parser (see <a href="#parser">Section 4</a> for a full description of how the parser uses this code). As a part of the parser file, a separate non-public class to contain all embedded user actions is produced. The first of these, the <tt>action code</tt> declaration, allows code to be included in this class. Routines and variables for use by the code embedded in the grammar would normally be placed in this section (a typical example might be symbol table manipulation routines). This declaration takes the form:<pre><tt>    action code {: ... :};</tt></pre>where <tt>{: ... :}</tt> is a code string whose contents will be placed directly within the declaration of the action class.<p>After the <tt>action code</tt> declaration is an optional <tt>parser code</tt> declaration. This declaration allows methods and variables to be placed directly within the generated parser class. Although this is less common, it can be helpful when customizing the parser; for example, it is possible to include scanning methods inside the parser and/or override the default error reporting routines. This declaration is very similar to the <tt>action code</tt> declaration and takes the form:<pre><tt>    parser code {: ... :};</tt></pre>Again, code from the code string is placed directly into the generated parser class definition.<p>Next in the specification is the optional <tt>init</tt> declaration, which has the form:<pre><tt>    init with {: ... :};</tt></pre>This declaration provides code that will be executed by the parser before it asks for the first token. 
 Typically, this is used to initialize the scanner as well as various tables and other data structures that might be needed by semantic actions. In this case, the code given in the code string forms the body of a <tt>void</tt> method inside the <tt>parser</tt> class.<p>The final (optional) user code section of the specification indicates how the parser should ask for the next token from the scanner. This has the form:<pre><tt>    scan with {: ... :};</tt></pre>As with the <tt>init</tt> clause, the contents of the code string form the body of a method in the generated parser. However, in this case the method returns an object of type <tt>java_cup.runtime.Symbol</tt>. Consequently, the code found in the <tt>scan with</tt> clause should return such a value. See <a href="#scanner">section 5</a> for information on the default behavior if the <code>scan with</code> section is omitted.<p>As of CUP 0.10j, the action code, parser code, init code, and scan with sections may appear in any order. They must, however, precede the symbol lists.<p><h5><a name="symbol_list">Symbol Lists</a></h5>Following user-supplied code comes the first required part of the specification: the symbol lists. These declarations are responsible for naming and supplying a type for each terminal and non-terminal symbol that appears in the grammar. As indicated above, each terminal and non-terminal symbol is represented at runtime with a <tt>Symbol</tt> object. In the case of terminals, these are returned by the scanner and placed on the parse stack. The lexer should put the value of the terminal in the <tt>value</tt> instance variable. In the case of non-terminals, these replace a series of <tt>Symbol</tt> objects on the parse stack whenever the right hand side of some production is recognized. In order to tell the parser which object types should be used for which symbol, <tt>terminal</tt> and <tt>non terminal</tt> declarations are used.  
These take the forms:<pre><tt>    terminal <i>classname</i> <i>name1, name2,</i> ...;</tt>
<tt>    non terminal <i>classname</i> <i>name1, name2,</i> ...;</tt>
<tt>    terminal <i>name1, name2,</i> ...;</tt></pre>and<pre><tt>    non terminal <i>name1, name2,</i> ...;</tt></pre>where <tt><i>classname</i></tt> can be a multiple-part name separated with "."s. The <tt><i>classname</i></tt> specified represents the type of the value of that terminal or non-terminal. When accessing these values through labels, the user uses the type declared. The <tt><i>classname</i></tt> can be of any type. If no <tt><i>classname</i></tt> is given, then the terminal or non-terminal holds no value, and a label referring to such a symbol will have a null value. As of CUP 0.10j, you may declare non-terminals with "<code>nonterminal</code>" (note, no space) as well as the original "<code>non terminal</code>" spelling.<p>Names of terminals and non-terminals cannot be CUP reserved words; these include "code", "action", "parser", "terminal", "non", "nonterminal", "init", "scan", "with", "start", "precedence", "left", "right", "nonassoc", "import", and "package".<p><h5><a name="precedence">Precedence and Associativity declarations</a></h5>The third section, which is optional, specifies the precedences and associativity of terminals. This is useful for parsing with ambiguous grammars, as done in the example above. There are three types of precedence/associativity declarations:<pre><tt>	precedence left     <i>terminal</i>[, <i>terminal</i>...];
	precedence right    <i>terminal</i>[, <i>terminal</i>...];
	precedence nonassoc <i>terminal</i>[, <i>terminal</i>...];</tt></pre>The comma-separated list indicates that those terminals should have the associativity specified at that precedence level and the precedence of that declaration. The order of precedence, from highest to lowest, is bottom to top.  
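<p>The conflict-resolution rules spelled out in the rest of this subsection can be summarized in a small model. This is a hypothetical sketch, not CUP's implementation, and the names <tt>PrecedenceModel</tt>, <tt>Assoc</tt>, and <tt>Resolution</tt> are invented for illustration. Following the text, it assumes that each <tt>precedence</tt> line gets a level that increases from top to bottom, that undeclared terminals have level 0 (lowest), that a production takes the level of its last terminal, that ties are broken by the associativity of the terminal being shifted, and that a conflict in which neither side has a declared precedence is left unresolved and reported.

```java
// Hypothetical model (not CUP's source) of precedence-based shift/reduce resolution.
enum Assoc { LEFT, RIGHT, NONASSOC }

enum Resolution { SHIFT, REDUCE, ERROR, UNRESOLVED }

class PrecedenceModel {
    private final Assoc[] assocs;       // associativity of each "precedence" line, top to bottom
    private final String[][] terminals; // terminals named on each of those lines

    PrecedenceModel(Assoc[] assocs, String[][] terminals) {
        this.assocs = assocs;
        this.terminals = terminals;
    }

    // Line i (0-based, top to bottom) gets level i + 1, so lower lines bind tighter.
    // Level 0 stands for "not declared", the lowest precedence.
    int level(String terminal) {
        for (int i = 0; i != terminals.length; i++) {
            for (String t : terminals[i]) {
                if (t.equals(terminal)) return i + 1;
            }
        }
        return 0;
    }

    Assoc assocOf(String terminal) {
        int lvl = level(terminal);
        return lvl == 0 ? null : assocs[lvl - 1];
    }

    // lookahead: the terminal that could be shifted.
    // rhsTerminals: the terminals of the production that could be reduced, in order.
    Resolution resolve(String lookahead, String[] rhsTerminals) {
        int termLevel = level(lookahead);
        // A production takes the precedence of its last terminal (lowest if it has none).
        int prodLevel = rhsTerminals.length == 0 ? 0 : level(rhsTerminals[rhsTerminals.length - 1]);

        if (termLevel == 0 && prodLevel == 0) return Resolution.UNRESOLVED; // reported as a conflict
        if (termLevel > prodLevel) return Resolution.SHIFT;
        if (prodLevel > termLevel) return Resolution.REDUCE;

        // Equal precedence: the associativity of the lookahead terminal decides.
        Assoc a = assocOf(lookahead);
        if (a == Assoc.LEFT) return Resolution.REDUCE;
        if (a == Assoc.RIGHT) return Resolution.SHIFT;
        return Resolution.ERROR; // nonassoc: e.g. 6 == 7 == 8 is rejected
    }
}
```

With <tt>precedence nonassoc EQEQ;</tt>, <tt>precedence left PLUS, MINUS;</tt>, and <tt>precedence left TIMES, DIVIDE;</tt> declared in that order (<tt>EQEQ</tt> is a hypothetical terminal for '=='), resolving a <tt>TIMES</tt> lookahead against <tt>expr ::= expr PLUS expr</tt> yields a shift, which is the <tt>3 + 4 * 8</tt> case; a <tt>PLUS</tt> lookahead against the same production yields a reduce because <tt>PLUS</tt> is left-associative.<p>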
For example, the following declares that multiplication and division have higher precedence than addition and subtraction:<pre><tt>	precedence left  ADD, SUBTRACT;
	precedence left  TIMES, DIVIDE;</tt></pre>Precedence resolves shift/reduce conflicts. For example, given the input <tt>3 + 4 * 8</tt> to the above example parser, the parser doesn't know whether to reduce <tt>3 + 4</tt> or shift the '*' onto the stack. However, since '*' has a higher precedence than '+', it will be shifted and the multiplication will be performed before the addition.<p>CUP assigns each one of its terminals a precedence according to these declarations. Any terminals not in this declaration have lowest precedence. CUP also assigns each of its productions a precedence. That precedence is equal to the precedence of the last terminal in that production. If the production has no terminals, then it has lowest precedence. For example, <tt>expr ::= expr TIMES expr</tt> would have the same precedence as <tt>TIMES</tt>. When there is a shift/reduce conflict, the parser determines whether the terminal to be shifted has a higher precedence, or if the production to reduce by does. If the terminal has higher precedence, it is shifted; if the production has higher precedence, a reduce is performed. If they have equal precedence, the associativity of the terminal determines what happens.<p>An associativity is assigned to each terminal used in the precedence/associativity declarations. The three associativities are <tt>left, right</tt>, and <tt>nonassoc</tt>. Associativities are also used to resolve shift/reduce conflicts, but only in the case of equal precedences. If the associativity of the terminal that can be shifted is <tt>left</tt>, then a reduce is performed. This means that if the input is a string of additions, like <tt>3 + 4 + 5 + 6 + 7</tt>, the parser will <i>always</i> reduce them from left to right, in this case starting with <tt>3 + 4</tt>. If the associativity of the terminal is <tt>right</tt>, it is shifted onto the stack.  
Hence, the reductions will take place from right to left. So, if PLUS were declared with associativity of <tt>right</tt>, the <tt>6 + 7</tt> would be reduced first in the above string. If a terminal is declared as <tt>nonassoc</tt>, then two consecutive occurrences of equal-precedence non-associative terminals generate an error. This is useful for comparison operations. For example, if the input string is <tt>6 == 7 == 8 == 9</tt>, the parser should generate an error. If '==' is declared as <tt>nonassoc</tt>, then an error will be generated.<p>All terminals not used in the precedence/associativity declarations are treated as lowest precedence. If a shift/reduce error results involving two such terminals, it cannot be resolved as the above conflicts are, so it will be reported.<p><h5><a name="production_list">The Grammar</a></h5>The final section of a CUP specification provides the grammar. This section optionally starts with a declaration of the form:<pre><tt>    start with <i>non-terminal</i>;</tt></pre>This indicates which non-terminal is the <i>start</i> or <i>goal</i> non-terminal for parsing. If a start non-terminal is not explicitly declared, then the non-terminal on the left hand side of the first