production will be used.  At the end of a successful parse, CUP returns
an object of type <tt>java_cup.runtime.Symbol</tt>.  This
<tt>Symbol</tt>'s value instance variable contains the final reduction
result.<p>

The grammar itself follows the optional <tt>start</tt> declaration.  Each
production in the grammar has a left hand side non-terminal followed by
the symbol "<tt>::=</tt>", which is then followed by a series of zero or more
actions, terminal symbols, or non-terminal symbols, followed by an optional
contextual precedence assignment, and terminated with a semicolon (;).<p>

<a name="label_part"></a>Each symbol on the right hand side can optionally be
labeled with a name.  Label names appear after the symbol name, separated by a
colon (:).  Label names must be unique within the production, and can be used
within action code to refer to the value of the symbol.  Along with the label,
two more variables are created, named by appending <tt>left</tt> and
<tt>right</tt> to the label (for example, the label <tt>e</tt> yields
<tt>eleft</tt> and <tt>eright</tt>).  These are <tt>int</tt> values that
contain the left and right locations of what the terminal or
non-terminal covers in the input file.  These values must be properly
initialized in the terminals by the lexer.  The left and right values
then propagate to the non-terminals to which productions reduce.<p>

If there are several productions for the same non-terminal they may be
declared together.  In this case the productions start with the non-terminal
and "<tt>::=</tt>".  This is followed by multiple right hand sides, each
separated by a bar (|).  The full set of productions is then terminated by a
semicolon.<p>

Actions appear in the right hand side as code strings (e.g., Java code inside
<tt>{:</tt> ... <tt>:}</tt> delimiters).  These are executed by the parser
at the point when the portion of the production to the left of the
action has been recognized.
(Note that the scanner will have returned the token one past the point of the
action, since the parser needs this extra <i>lookahead</i> token for
recognition.)<p>

<a name="cpp"></a>Contextual precedence assignments follow all the symbols and
actions of the right hand side of the production whose precedence they assign.
A contextual precedence assignment allows a production to be given a
precedence that is not based on the last terminal in it.  A good example is
shown in the above sample parser specification:

<pre><tt>
    precedence left PLUS, MINUS;
    precedence left TIMES, DIVIDE, MOD;
    precedence left UMINUS;

    expr ::=  MINUS expr:e
              {: RESULT = new Integer(0 - e.intValue()); :}
              %prec UMINUS;
</tt></pre>

Here, the production is declared as having the precedence of UMINUS.
Hence, the parser can give the MINUS sign two different precedences,
depending on whether it is a unary minus or a subtraction operation.

<a name="running"><h3>3. Running CUP</h3></a>

As mentioned above, CUP is written in Java.  To invoke it, one needs
to use the Java interpreter to invoke the static method
<tt>java_cup.Main.main()</tt>, passing an array of strings containing options.
Assuming a Unix machine, the simplest way to do this is typically to invoke it
directly from the command line with a command such as:

<pre><tt>    java java_cup.Main <i>options</i> &lt; <i>inputfile</i></tt></pre>

Once running, CUP expects to find a specification file on standard input
and produces two Java source files as output.  Starting with CUP 0.10k,
the final command-line argument may be a filename, in which case the
specification will be read from that file instead of from standard input.<p>

In addition to the specification file, CUP's behavior can also be changed
by passing various options to it.  Legal options are documented in
<code>Main.java</code> and include:
<dl>
  <dt><tt>-package</tt> <i>name</i>
  <dd>Specify that the <tt>parser</tt> and <tt>sym</tt> classes are to be
      placed in the named package.
      By default, no package specification
      is put in the generated code (hence the classes default to the special
      "unnamed" package).
  <dt><tt>-parser</tt> <i>name</i>
  <dd>Output parser and action code into a file (and class) with the given
      name instead of the default of "<tt>parser</tt>".
  <dt><tt>-symbols</tt> <i>name</i>
  <dd>Output the symbol constant code into a class with the given
      name instead of the default of "<tt>sym</tt>".
  <dt><tt>-interface</tt>
  <dd>Outputs the symbol constant code as an <code>interface</code>
      rather than as a <code>class</code>.
  <dt><tt>-nonterms</tt>
  <dd>Place constants for non-terminals into the symbol constant class.
      The parser does not need these symbol constants, so they are not normally
      output.  However, it can be very helpful to refer to these constants
      when debugging a generated parser.
  <dt><tt>-expect</tt> <i>number</i>
  <dd>During parser construction the system may detect that an ambiguous
      situation would occur at runtime.  This is called a <i>conflict</i>.
      In general, the parser may be unable to decide whether to <i>shift</i>
      (read another symbol) or <i>reduce</i> (replace the recognized right
      hand side of a production with its left hand side).  This is called a
      <i>shift/reduce conflict</i>.  Similarly, the parser may not be able
      to decide between reduction with two different productions.  This is
      called a <i>reduce/reduce conflict</i>.  Normally, if one or more of
      these conflicts occur, parser generation is aborted.  However, in
      certain carefully considered cases it may be advantageous to
      arbitrarily break such a conflict.  In this case CUP uses YACC
      convention and resolves shift/reduce conflicts by shifting, and
      reduce/reduce conflicts using the "highest priority" production (the
      one declared first in the specification).
      In order to enable automatic
      breaking of conflicts the <tt>-expect</tt> option must be given,
      indicating exactly how many conflicts are expected.  Conflicts
      resolved by precedences and associativities are not reported.
  <dt><tt>-compact_red</tt>
  <dd>Including this option enables a table compaction optimization involving
      reductions.  In particular, it allows the most common reduce entry in
      each row of the parse action table to be used as the default for that
      row.  This typically saves considerable room in the tables, which can
      grow to be very large.  This optimization has the effect of replacing
      all error entries in a row with the default reduce entry.  While this
      may sound dangerous, if not downright incorrect, it turns out that this
      does not affect the correctness of the parser.  In particular, some
      changes of this type are inherent in LALR parsers (when compared to
      canonical LR parsers), and the resulting parsers will still never
      read past the first token at which the error could be detected.
      The parser can, however, make extra erroneous reduces before detecting
      the error, so this can degrade the parser's ability to do
      <a href="#errors">error recovery</a>.
      (Refer to reference [2] pp. 244-247 or reference [3] pp. 190-194 for a
      complete explanation of this compaction technique.) <br><br>
      This option is typically used to work around the Java bytecode
      limitations on table initialization code sizes.  However, CUP
      0.10h introduced a string encoding for the parser tables which
      is not subject to the standard method-size limitations.
      Consequently, use of this option should no longer be required
      for large grammars.
  <dt><tt>-nowarn</tt>
  <dd>This option causes all warning messages (as opposed to error messages)
      produced by the system to be suppressed.
  <dt><tt>-nosummary</tt>
  <dd>Normally, the system prints a summary listing such things as the
      number of terminals, non-terminals, parse states, etc. at the end of
      its run.  This option suppresses that summary.
  <dt><tt>-progress</tt>
  <dd>This option causes the system to print short messages indicating its
      progress through various parts of the parser generation process.
  <dt><tt>-dump_grammar</tt>
  <dt><tt>-dump_states</tt>
  <dt><tt>-dump_tables</tt>
  <dt><tt>-dump</tt>
  <dd>These options cause the system to produce a human readable dump of
      the grammar, the constructed parse states (often needed to resolve
      parse conflicts), and the parse tables (rarely needed), respectively.
      The <tt>-dump</tt> option can be used to produce all of these dumps.
  <dt><tt>-time</tt>
  <dd>This option adds detailed timing statistics to the normal summary of
      results.  This is normally of great interest only to maintainers of
      the system itself.
  <dt><tt>-debug</tt>
  <dd>This option produces voluminous internal debugging information about
      the system as it runs.  This is normally of interest only to maintainers
      of the system itself.
  <dt><tt>-nopositions</tt>
  <dd>This option keeps CUP from generating code to propagate the left
      and right values of terminals to non-terminals, and then from
      non-terminals to other non-terminals.  If the left and right values
      aren't going to be used by the parser, then omitting these position
      propagations saves some runtime computation.  This option
      also keeps the left and right label variables from being generated, so
      any reference to these will cause an error.
  <dt><tt>-noscanner</tt>
  <dd>CUP 0.10j introduced <a href="#scanner">improved scanner
      integration</a> and a new interface,
      <code>java_cup.runtime.Scanner</code>.
      By default, the
      generated parser refers to this interface, which means you cannot
      use these parsers with CUP runtimes older than 0.10j.  If your
      parser does not use the new scanner integration features, then you
      may specify the <code>-noscanner</code> option to suppress the
      <code>java_cup.runtime.Scanner</code> references and allow
      compatibility with old runtimes.  Not many people should have reason
      to do this.
  <dt><tt>-version</tt>
  <dd>Invoking CUP with the <code>-version</code> flag will cause it
      to print out the working version of CUP and halt.  This allows
      automated CUP version checking for Makefiles, install scripts and
      other applications which may require it.
</dl>

<a name="parser"><h3>4. Customizing the Parser</h3></a>

Each generated parser consists of three generated classes.  The
<tt>sym</tt> class (which can be renamed using the <tt>-symbols</tt>
option) simply contains a series of <tt>int</tt> constants,
one for each terminal.  Non-terminals are also included if the
<tt>-nonterms</tt> option is given.  The source file for the <tt>parser</tt>
class (which can be renamed using the <tt>-parser</tt> option) actually
contains two class definitions: the public <tt>parser</tt> class that
implements the actual parser, and another non-public class (called
<tt>CUP$action</tt>) which encapsulates all user actions contained in the
grammar, as well as code from the <tt>action code</tt> declaration.  In
addition to user supplied code, this class contains one method,
<tt>CUP$do_action</tt>, which consists of a large switch statement for
selecting and executing various fragments of user supplied action code.
In general, all names beginning with the prefix <tt>CUP$</tt> are reserved
for internal use by CUP generated code. <p>

The <tt>parser</tt> class contains the actual generated parser.  It is a
subclass of <tt>java_cup.runtime.lr_parser</tt>, which implements a general
table driven framework for an LR parser.
The generated <tt>parser</tt>
class provides a series of tables for use by the general framework.
Three tables are provided:
<dl compact>
<dt>the production table
<dd>provides the symbol number of the left hand side non-terminal, along with
    the length of the right hand side, for each production in the grammar,
<dt>the action table
<dd>indicates what action (shift, reduce, or error) is to be taken on each
    lookahead symbol when encountered in each state, and
<dt>the reduce-goto table
<dd>indicates which state to shift to after reduces (under each non-terminal
    from each state).
</dl>
(Note that the action and reduce-goto tables are not stored as simple arrays,
but use a compacted "list" structure to save a significant amount of space.
See the comments in the runtime system source code for details.)<p>

Beyond the parse tables, generated (or inherited) code provides a series of
methods that can be used to customize the generated parser.  Some of these
methods are supplied by code found in part of the specification and can be
customized directly in that fashion.  The others are provided by the
<tt>lr_parser</tt> base class and can be overridden with new versions (via
the <tt>parser code</tt> declaration) to customize the system.  Methods
available for customization include:
<dl compact>
<dt><tt>public void user_init()</tt>
<dd>This method is called by the parser prior to asking for the first token
    from the scanner.  The body of this method contains the code from the
    <tt>init with</tt> clause of the specification.
<dt><a name="scan_method"><tt>public java_cup.runtime.Symbol scan()</tt></a>
<dd>This method encapsulates the scanner and is called each time a new
    terminal is needed by the parser.  The body of this method is
    supplied by the <tt>scan with</tt> clause of the specification, if
    present; otherwise it returns <code>getScanner().next_token()</code>.
<dt><tt>public java_cup.runtime.Scanner getScanner()</tt>
<dd>Returns the default scanner.
    See <a href="#scanner">section 5</a>.
<dt><tt>public void setScanner(java_cup.runtime.Scanner s)</tt>
<dd>Sets the default scanner.  See <a href="#scanner">section 5</a>.
<dt><tt>public void report_error(String message, Object info)</tt>
<dd>This method should be called whenever an error message is to be issued.  In
    the default implementation of this method, the first parameter provides
    the text of a message which is printed on <tt>System.err</tt>
    and the second parameter is simply ignored.  It is common to
    override this method in order to provide a more sophisticated error
    reporting mechanism.
<dt><tt>public void report_fatal_error(String message, Object info)</tt>
<dd>This method should be called whenever a non-recoverable error occurs.  It
    responds by calling <tt>report_error()</tt>, then aborts parsing
    by calling the parser method <tt>done_parsing()</tt>, and finally
    throws an exception.  (In general <tt>done_parsing()</tt> should be called
    at any point that parsing needs to be terminated early.)
<dt><tt>public void syntax_error(Symbol cur_token)</tt>
<dd>This method is called by the parser as soon as a syntax error is detected
    (but before error recovery is attempted).  In the default implementation it
    calls: <tt>report_error("Syntax error", null);</tt>.
<dt><tt>public void unrecovered_syntax_error(Symbol cur_token)</tt>
<dd>This method is called by the parser if it is unable to recover from a
    syntax error.  In the default implementation it calls:
    <tt>report_fatal_error("Couldn't repair and continue parse", null);</tt>.
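<p>As a rough illustration of how the error methods above chain together, the
following Java sketch re-creates the described default behavior in a stand-in
base class and then overrides two of the methods.  The classes
<tt>ErrorHooksSketch</tt>, <tt>BaseParser</tt>, <tt>MyParser</tt>, and the
nested <tt>Symbol</tt> are hypothetical stand-ins written for this example so
that it compiles without the CUP runtime; they are not the real
<tt>java_cup.runtime</tt> classes, and the exception text and the sample
positions are likewise assumptions.

```java
// Sketch only: BaseParser and Symbol are stand-ins that imitate the default
// behavior described in the text; they are NOT the java_cup.runtime classes.
class ErrorHooksSketch {

    /** Hypothetical stand-in for java_cup.runtime.Symbol (positions only). */
    static class Symbol {
        final int sym, left, right;
        Symbol(int sym, int left, int right) {
            this.sym = sym; this.left = left; this.right = right;
        }
    }

    /** Stand-in base class mirroring the documented default error methods. */
    static class BaseParser {
        boolean done = false;

        public void done_parsing() { done = true; }

        // Default: print the message on System.err and ignore info.
        public void report_error(String message, Object info) {
            System.err.println(message);
        }

        // Default: report, stop parsing, then throw (message text is assumed).
        public void report_fatal_error(String message, Object info) throws Exception {
            report_error(message, info);
            done_parsing();
            throw new Exception("fatal parse error: " + message);
        }

        public void syntax_error(Symbol cur_token) {
            report_error("Syntax error", null);
        }

        public void unrecovered_syntax_error(Symbol cur_token) throws Exception {
            report_fatal_error("Couldn't repair and continue parse", null);
        }
    }

    /** Hypothetical customization: keep messages in a log and add positions. */
    static class MyParser extends BaseParser {
        final StringBuilder log = new StringBuilder();

        @Override
        public void report_error(String message, Object info) {
            log.append(message);
            if (info instanceof Symbol) {
                Symbol s = (Symbol) info;
                log.append(" at ").append(s.left).append('-').append(s.right);
            }
            log.append('\n');
        }

        @Override
        public void syntax_error(Symbol cur_token) {
            // Pass the offending token along instead of null.
            report_error("Syntax error", cur_token);
        }
    }

    public static void main(String[] args) {
        MyParser p = new MyParser();
        Symbol bad = new Symbol(7, 10, 14);   // made-up symbol number and positions
        p.syntax_error(bad);
        try {
            p.unrecovered_syntax_error(bad);
        } catch (Exception e) {
            p.log.append("aborted: ").append(e.getMessage()).append('\n');
        }
        System.out.print(p.log);
    }
}
```

<p>In an actual specification, overrides like those in <tt>MyParser</tt> would
be placed in the <tt>parser code</tt> declaration, and the position fields
would only be meaningful if the lexer initializes them.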