<PRE>   Option =  'push' (. stack[++top] = item; .)
           | 'pop'  (. item = stack[top--]; .)
           |        (. MonitorStack(); .) .
</PRE>

<P><B>Syntax error handling</B>. The programmer has to give some hints
in order to allow Coco/R to generate good and efficient error handling.
Firstly, synchronization points have to be specified. A synchronization point
is a location in the grammar where particularly &quot;safe&quot; symbols are
expected in the sense that they are hardly ever missing or mistyped.  </P>

<P>In most languages, good candidates for synchronization points are the
beginning of a statement (where <I>if</I>, <I>while</I>, etc. are expected),
or the beginning of a declaration sequence (where <I>static</I>,
<I>public</I>, etc. are expected). A synchronization point is specified by the
symbol SYNC. When the generated parser reaches such a point, it skips all
input until a symbol occurs that is expected at that point. The end-of-file
symbol is always included amongst the synchronization symbols. This guarantees
that synchronization terminates at least at the end of the source text. </P>
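<P>The skipping behaviour at a synchronization point can be sketched in a few
lines of Java. This is only an illustration, not the code that Coco/R
generates: the class <I>SyncSketch</I>, its fields and the token codes used in
<I>main</I> are hypothetical, and the input is assumed to be an array of token
codes that ends with the end-of-file code 0.</P>

```java
import java.util.BitSet;

// Illustrative sketch only: SyncSketch and its members are hypothetical names,
// not code generated by Coco/R.
class SyncSketch {
    static final int EOF = 0;   // end-of-file token code

    final int[] tokens;         // token codes of the input; the last one must be EOF
    int pos = 0;                // index of the current lookahead token
    int errors = 0;             // number of reported errors

    SyncSketch(int[] tokens) { this.tokens = tokens; }

    // Skip input until the lookahead is a member of `expected`.
    // EOF is always added to the set, so the loop is guaranteed to terminate.
    void sync(BitSet expected) {
        BitSet stop = (BitSet) expected.clone();
        stop.set(EOF);
        if (!stop.get(tokens[pos])) errors++;    // report once per skipped region
        while (!stop.get(tokens[pos])) pos++;
    }

    public static void main(String[] args) {
        final int IF = 1, WHILE = 2, JUNK = 9;   // hypothetical token codes
        BitSet statementStart = new BitSet();
        statementStart.set(IF);
        statementStart.set(WHILE);

        SyncSketch p = new SyncSketch(new int[] { JUNK, JUNK, WHILE, EOF });
        p.sync(statementStart);
        System.out.println("resumed at index " + p.pos + ", errors = " + p.errors);
    }
}
```

<P>Because the end-of-file code is always a member of the stop set, the sketch
cannot run past the end of the token array as long as the array ends with that
code.</P>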

<P>The union of all synchronization sets we shall denote by <I>AllSyncs</I>.
Error-handling can be further improved by specifying which terminals are
weak in a certain context. A &quot;weak terminal&quot; is a symbol that
is often mistyped or missing, such as the semicolon between statements,
and is denoted by preceding it in the grammar with the keyword WEAK. When
the parser expects a weak symbol, but does not find it in the input stream,
it adjusts the input to the next symbol that is either a legal successor
of the weak symbol or a member of <I>AllSyncs </I>(symbols expected at
synchronization points are considered to be particularly &quot;strong&quot;,
so that it makes sense that they are never skipped). </P>

<P><I>Examples</I> </P>

<PRE>   StatementSeq = Statement {WEAK &quot;;&quot; Statement} .
   Declaration = SYNC (&quot;CONST&quot; | &quot;TYPE&quot; | &quot;VAR&quot; | ...) .
</PRE>

<P>As in the first of the above examples, iterations frequently start with
a weak terminal, in situations that can be generally described by</P>

<PRE>   Sequence = FirstPart { WEAK ExpectedTerminal IteratedPart } LastPart .</PRE>

<P>Such weak separators are handled in a special way: if <I>ExpectedTerminal
</I>is not recognized, source tokens are consumed until a token is found
that is either a member of FIRST(<I>IteratedPart</I>) or of FIRST(<I>LastPart</I>)
or of <I>AllSyncs</I>.</P>
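<P>The treatment of a weak separator can be sketched as follows. The class
<I>WeakSeparatorSketch</I>, the helper <I>setOf</I> and the token codes are
hypothetical, and the input is again assumed to be an array of token codes
ending with the end-of-file code 0. The sketch also assumes that a lookahead
token already in FIRST(<I>LastPart</I>) simply ends the iteration without an
error, a case the text above does not spell out.</P>

```java
import java.util.BitSet;

// Illustrative sketch only; not the code generated by Coco/R.
class WeakSeparatorSketch {
    static final int EOF = 0;   // end-of-file token code

    final int[] tokens;         // token codes of the input; the last one must be EOF
    int pos = 0;                // index of the current lookahead token
    int errors = 0;             // number of reported errors

    WeakSeparatorSketch(int[] tokens) { this.tokens = tokens; }

    static BitSet setOf(int... kinds) {
        BitSet s = new BitSet();
        for (int k : kinds) s.set(k);
        return s;
    }

    // Returns true if the iteration should continue with IteratedPart.
    boolean weakSeparator(int expected, BitSet firstIterated,
                          BitSet firstLast, BitSet allSyncs) {
        if (tokens[pos] == expected) {           // separator present: consume it
            pos++;
            return true;
        }
        if (firstLast.get(tokens[pos])) {        // assumed: normal end of the iteration
            return false;
        }
        errors++;                                // separator missing or mistyped
        BitSet stop = (BitSet) firstIterated.clone();
        stop.or(firstLast);
        stop.or(allSyncs);
        stop.set(EOF);
        while (!stop.get(tokens[pos])) pos++;    // consume tokens up to a safe one
        return firstIterated.get(tokens[pos]);
    }

    public static void main(String[] args) {
        final int IDENT = 1, SEMI = 2, END = 3, JUNK = 9;   // hypothetical token codes
        WeakSeparatorSketch p =
            new WeakSeparatorSketch(new int[] { IDENT, JUNK, IDENT, END, EOF });
        p.pos = 1;                               // pretend the first statement was parsed
        boolean again = p.weakSeparator(SEMI, setOf(IDENT), setOf(END), setOf());
        System.out.println("continue = " + again + ", pos = " + p.pos
                           + ", errors = " + p.errors);
    }
}
```

<P>In the example the mistyped separator is skipped and the iteration resumes
at the next statement, so a single missing semicolon yields a single error
message.</P>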

<P><B>LL(1) requirements</B>. Recursive descent parsing requires that the
grammar of the parsed language satisfies the LL(1) property. This means
that at any point in the grammar the parser must be able to decide on the
basis of a single lookahead symbol which of several possible alternatives
has to be selected. For example, the following production is not LL(1):</P>

<PRE>   Statement =   ident ':=' Expression
               | ident ['(' ExpressionList ')'] .
</PRE>

<P>Both alternatives start with the symbol <I>ident</I>. When the parser comes
to a statement and <I>ident</I> is the next input symbol, it cannot distinguish
between these alternatives. However, the production can easily be transformed
into </P>

<PRE>   Statement = ident ( ':=' Expression | ['(' ExpressionList ')'] ) .
</PRE>

<P>where all alternatives start with distinct symbols. There are LL(1)
conflicts that are not as easy to detect as in the above example. It can be
hard for programmers to detect these without a tool like Coco/R that checks the
grammar automatically; Coco/R gives appropriate error messages that suggest
how to correct any LL(1) conflicts.</P>
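<P>A hand-written recursive descent method for the transformed production
shows why a single lookahead symbol is now sufficient. The class
<I>StatementSketch</I> and its helpers are hypothetical, the input is assumed
to be already split into token strings, and <I>Expression</I> is reduced to a
single identifier to keep the sketch short.</P>

```java
// Illustrative sketch only; not the code generated by Coco/R.
class StatementSketch {
    final String[] tokens;      // the input, already split into tokens
    int pos = 0;                // index of the lookahead token

    StatementSketch(String... tokens) { this.tokens = tokens; }

    // The single lookahead symbol; "$" marks the end of the input.
    String la() { return pos < tokens.length ? tokens[pos] : "$"; }

    void expect(String s) {
        if (!la().equals(s))
            throw new IllegalStateException("expected " + s + " but found " + la());
        pos++;
    }

    void ident() {
        String s = la();
        if (s.isEmpty() || !Character.isLetter(s.charAt(0)))
            throw new IllegalStateException("identifier expected but found " + s);
        pos++;
    }

    // Assumption: an expression is just an identifier in this sketch.
    void expression() { ident(); }

    void expressionList() {
        expression();
        while (la().equals(",")) { pos++; expression(); }
    }

    // Statement = ident ( ':=' Expression | ['(' ExpressionList ')'] ) .
    String statement() {
        ident();                                 // common prefix, parsed once
        if (la().equals(":=")) {                 // first alternative
            pos++;
            expression();
            return "assignment";
        }
        if (la().equals("(")) {                  // optional part of the second alternative
            pos++;
            expressionList();
            expect(")");
        }
        return "call";
    }

    public static void main(String[] args) {
        System.out.println(new StatementSketch("x", ":=", "y").statement());
        System.out.println(new StatementSketch("f", "(", "a", ",", "b", ")").statement());
    }
}
```

<P>After <I>ident</I> has been consumed, the tokens ':=' and '(' are distinct,
so each decision in <I>statement</I> needs to inspect only <I>la()</I>.</P>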

<A NAME="UserGuide"></A>

<H2>3. User Guide</H2>

<P>In order to generate an application (like a compiler) using Coco/R, a user has to </P>

<OL>
<LI>write an attributed grammar in Cocol/R; </LI>

<LI>process the grammar with Coco/R to generate a scanner and a parser;
</LI>

<LI>develop any supporting classes (like code generators or table handlers)
that are used in the grammar; </LI>

<LI>write a driver (main method) that starts the scanner and the parser. </LI>
</OL>

<P>Coco/R processes an attributed grammar to generate the following files: </P>

<UL>
<LI><I>Scanner.java</I>: containing the classes <I>Scanner</I> and <I>Token</I>
</LI>

<LI><I>Parser.java</I>: containing the class <I>Parser</I> (and, optionally,
a class <I>SYM</I> if the NAMES feature is used in the scanner specification)
</LI>

<LI><I>ErrorStream.java</I>: containing the class <I>ErrorStream</I> </LI>

<LI><I>listing</I>: containing trace output (if any). </LI>
</UL>

<P>These files are generated in the directory that holds the attributed
grammar. All generated classes belong to a package, whose name is the name
of the start symbol of the attributed grammar. </P>

<P>These classes are generated by inserting appropriate Java source into
so-called <I>frame files</I>: </P>

<UL>
<LI><I>Scanner.frame</I> </LI>

<LI><I>Parser.frame</I> </LI>
</UL>

<P>that have to be in the same directory as the attributed grammar. </P>

<P>Coco/R may be started in "command-line" mode, with a parameter specifying
the name of the attributed grammar to be processed, for example:</P>

<PRE>
   java Coco.Comp MyGrammar.atg
</PRE>

<P>Alternatively it may be started in "GUI" mode, when it shows a file dialog
from which the user can select the attributed grammar file.</P>

<P>In either mode, error messages and warnings by default go to the standard
output stream (<I>System.out</I>).  It is possible to develop more
sophisticated ways of dealing with error output (see section 3.4).</P>

<H3>3.1 The Parser interface</H3>

<P>A parser generated by Coco/R has the following interface: </P>

<PRE>   class Parser {
     static Token token;              // last recognized token
     static Token t;                  // lookahead token
     static void Parse();             // instigate parse
     static void Error(int n);        // report syntax error n
     static void SemError(int n);     // report semantic error n
     static boolean Successful();     // return true if no errors reported
     static String LexString();       // return exact text of parsed token
     static String LexName();         // return text of parsed token (upper case if IGNORE CASE is set)
     static String LookAheadString(); // return exact text of lookahead token
     static String LookAheadName();   // return text of lookahead token (upper case if IGNORE CASE is set)
   }
</PRE>

<P>The parser is started by calling the method <I>Parse()</I>. This is done in
the main method of the compiler (see Section 3.5). <I>Parse</I> and other internal
private methods of the class repeatedly call the scanner to get tokens, and
execute semantic actions at the appropriate places. </P>

<P>The field <I>token</I> holds the most recently parsed token. The field
<I>t</I> is the lookahead token that has already been obtained from the
scanner but not yet parsed. In a semantic action, <I>token</I> refers to the
token recognized immediately before the action, and <I>t</I> to the lookahead
token that follows it. </P>

<P>The other methods represent extensions to the original implementation, and
have been provided for convenience, particularly for programmers familiar with
the C, Pascal and Modula versions of Coco/R.  These methods provide hooks into
the parser and error reporters which may be useful in developing semantic
actions within the parser, as well as in developing methods in supporting
classes like table handlers.</P>

<P>Calls to <I>Error(n)</I> are automatically inserted into the generated parser
for the purposes of reporting syntax errors.  The error numbers <I>n</I> are
generated by Coco/R along with appropriate error messages that can be
displayed and related to the position of the <I>lookahead</I> token.  There may be
situations where supporting classes also find it convenient to invoke this
method. However, these are more likely to wish to invoke the <I>SemError(n)</I>
method with an appropriately chosen error number, which will report a semantic
error message related to the position of the most recently <I>parsed</I> token.  In
order to obtain meaningful messages, a programmer may conveniently derive a
suitable subclass from the <I>ErrorStream</I> class; see section 3.4.</P>

<P>A call to the method <I>Successful()</I> will return <I>true</I> if parsing
has revealed no syntactic or semantic errors up to the time when the call is
made.</P>

<P>The remaining methods return a string that represents the lexeme of one of
the tokens scanned.  <I>LexString()</I> and <I>LexName()</I> return identical
strings unless the scanner has been built with the IGNORE CASE option
selected; in this case <I>LexName</I> returns the string converted to upper
case.</P>


<H3>3.2 The Token interface</H3>

<P>A token obtained from the scanner has the following structure: </P>

<PRE>   class Token {
     int kind;    // token kind
     int pos;     // token position in the source file (starting at 0)
     int line;    // token line (starting at 1)
     int col;     // token column (starting at 1)
     String str;  // token text (exactly as found in the source)
     String val;  // token text (converted to upper case if IGNORE CASE used)
   }
</PRE>

<P>Knowledge of this structure may be of use in the construction of semantic
actions and support methods.  Values of <I>kind</I> are assigned by Coco/R
dynamically.  Values of <I>pos, line</I> and <I>col</I> may be of use in error
reporting.  However, in many applications the inner details of a token may
not be needed, as the parser interface provides methods for dealing with error
reporting.</P>
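<P>Where the token fields are used directly, for instance to build a position
prefix for a message, the code might look like the following sketch. The
nested <I>Token</I> class is only a stand-in with the fields listed above (the
real class is generated into <I>Scanner.java</I>), and the method
<I>describe</I> and the sample values are hypothetical.</P>

```java
// Illustrative sketch only; TokenSketch and describe are hypothetical names.
class TokenSketch {
    // Stand-in for the generated Token class, with the documented fields.
    static class Token {
        int kind;     // token kind
        int pos;      // position in the source file (starting at 0)
        int line;     // line (starting at 1)
        int col;      // column (starting at 1)
        String str;   // text exactly as found in the source
        String val;   // text, upper case if IGNORE CASE is used
    }

    // Build a human-readable description of where a token was found.
    static String describe(Token t) {
        return "line " + t.line + ", col " + t.col + ": '" + t.str
               + "' (kind " + t.kind + ")";
    }

    public static void main(String[] args) {
        Token t = new Token();
        t.kind = 1; t.pos = 42; t.line = 3; t.col = 7;   // sample values
        t.str = "total"; t.val = "TOTAL";
        System.out.println(describe(t));
    }
}
```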

<H3>3.3 The Scanner interface</H3>

<P>A scanner generated by Coco/R has the following interface: </P>

<PRE>   class Scanner {
     static ErrorStream err;                         // error handler
     static void Init (String file, ErrorStream e);  // constructor
     static void Init (String file);                 // constructor
     static Token Scan();                            // scan for next token
   }
</PRE>

<P>The actual scanner is provided by the method <I>Scan()</I> which returns a
<I>Token</I> object every time it is called by the parser (these calls are
generated by Coco/R; users should not call <I>Scan</I> directly). When the
input is exhausted, it returns a special end-of-file token (with the token
code 0) that is known to the parser.</P>

<P>The scanner has to be initialized by calling the method <I>Init</I>,
passing it the name of the source file to be scanned. Optionally, one can
also specify an error stream object to be used for error handling (see
Section 3.4). If no error stream object is specified, a default error stream
object is installed. This object is installed in the field <I>err</I>.</P>

<H3>3.4 The ErrorStream interface</H3>

<P>Coco/R generates an <I>ErrorStream</I> class that can be used for reporting
errors; by default these are printed to <I>System.out</I>. This class has the
following interface: </P>

<PRE>   class ErrorStream {
     int count;                                           // number of reported errors
     ErrorStream();                                       // constructor
     void StoreError(int n, int line, int col, String s); // report/store an error
     void ParsErr(int n, int line, int col);              // record syntax error
     void SemErr(int n, int line, int col);               // record semantic error
     void Exception (String s);                           // report s and abort
     void Summarize (String s);                           // report on progress
   }
</PRE>

<P>Internally, whenever the parser calls the method <I>Error(n)</I> on
detecting a syntax error, this calls the method <I>ParsErr</I> and increments
<I>count</I>.  Coco/R generates code for this method such that it associates
meaningful error messages with the error; these messages are then passed on to
the <I>StoreError</I> method, which by default simply prints them to
<I>System.out</I>.  Similarly, calls made to the parser method <I>SemError</I>
result in calls to the method <I>SemErr</I>, which passes a default message
on to <I>StoreError</I>.  Although the interface provided by the parser
methods <I>Error</I> and <I>SemError</I> is probably easier to use, the
<I>ErrorStream</I> methods can be called directly by the programmer from the
semantic actions of the attributed grammar when a semantic error is detected.
The programmer has to provide an error code as well as position information
(obtained from the scanner). For example, the call could look like </P>

<PRE>   Scanner.err.SemErr(3, token.line, token.col);
</PRE>

<P>As mentioned, the default implementation of <I>SemErr</I> simply prints the
error code and the position information. To get more meaningful error
messages, the programmer can derive a subclass from <I>ErrorStream</I>, and
override <I>SemErr</I> so that the error code is transformed into an error
message. The way in which this might be achieved is exemplified by: </P>

<PRE>   class MyErrorStream extends ErrorStream {

     void SemErr(int n, int line, int col) {
       String s;
       count++;
       switch (n) {
         case -1: {s = "invalid character"; break;}
         case 1: {s = "My Message 1"; break;}
         // insert other application specific error messages here
         default: {s = "Semantic error " + n; break;}
       }
       StoreError(n, line, col, s);
     }

   }
</PRE>

<P>An object of such a subclass has then to be passed to the <I>Init</I>
method of the scanner. </P>

<P>If subclasses are to be derived, the user might also choose to override the
<I>StoreError</I> method so that the messages are manipulated in other ways.
A file supplied with the distribution shows how such a subclass can be
developed that merges the error messages into a source text listing in a manner
reminiscent of various other compilers.</P>
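<P>A much simpler variation on the same idea is a subclass that collects the
messages instead of printing them. In the sketch below the nested
<I>ErrorStream</I> class is only a minimal stand-in for the generated class
(its output format is an assumption), and <I>CollectingErrorStream</I> and its
field <I>log</I> are hypothetical names.</P>

```java
// Illustrative sketch only; not the file supplied with the distribution.
class ErrorStreamSketch {
    // Minimal stand-in for the generated ErrorStream class.
    static class ErrorStream {
        int count;    // number of reported errors

        void StoreError(int n, int line, int col, String s) {
            // Assumed default behaviour: print the message to System.out.
            System.out.println("line " + line + " col " + col + ": " + s);
        }

        void SemErr(int n, int line, int col) {
            count++;
            StoreError(n, line, col, "Semantic error " + n);
        }
    }

    // Collects messages in memory so that they can be merged into a listing later.
    static class CollectingErrorStream extends ErrorStream {
        final StringBuilder log = new StringBuilder();

        @Override
        void StoreError(int n, int line, int col, String s) {
            log.append(line).append(':').append(col).append(": ")
               .append(s).append('\n');
        }
    }

    public static void main(String[] args) {
        CollectingErrorStream err = new CollectingErrorStream();
        err.SemErr(3, 12, 5);
        err.SemErr(7, 20, 1);
        System.out.print(err.log);
        System.out.println(err.count + " error(s)");
    }
}
```

<P>Because <I>SemErr</I> reaches the output only through <I>StoreError</I>,
overriding that one method is enough to redirect every message.</P>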

<P>The method <I>Exception(s)</I> can be called by the programmer when a serious
error occurs that makes any continuation pointless. After printing the
parameter <I>s</I>, the compilation is aborted. </P>

<P>After parsing, the number of syntax and semantic errors detected can be
obtained from <I>Scanner.err.count</I>. If this field is 0 the compilation was
successful (although in the case of Coco/R itself the parser and scanner are
not generated if the grammar does not satisfy various other checks as well).
</P>


<H3>3.5 The driver program</H3>

<P>The main method in the driver class of an application has to initialize
the scanner (possibly after creating a custom error stream object) and call
the parser. The following example shows a very minimal implementation of a
driver program:</P>

<PRE>   package GoalIdentifier;  // all classes form part of this package