
   public class Comp {

     public static void main (String[] args) {
       Scanner.Init(args[0]);
       Parser.Parse();
       System.out.println(Scanner.err.count + &quot; errors detected&quot;);
       System.exit(0);
     }

   }
</PRE>

<P>If users wish to provide meaningful semantic error messages they may, as
mentioned above, create a subclass of <I>ErrorStream</I>, and pass an object
of this class to the <I>Init</I> method of the scanner: </P>

<PRE>   package GoalIdentifier;  // all classes form part of this package

   class MyErrorStream extends ErrorStream {
    ...
   }

   public class Comp {

     public static void main (String[] args) {
       ErrorStream E = new MyErrorStream();
       Scanner.Init(args[0], E);
       Parser.Parse();
       E.Summarize("");
       System.exit(0);
     }

   }
</PRE>

<P>Various other possibilities for the driver class are given in the
distribution kits.</P>

<H3>3.6 Grammar tests</H3>

<P>Coco/R performs several tests to check that the grammar, besides being
syntactically correct, is also semantically well-formed. If one of the
following error messages is produced, no compiler parts are generated. </P>

<DL>
<DT>No production for X </DT>

<DD>The nonterminal X has been used but there is no production for it.
</DD>

<DT>X cannot be reached </DT>

<DD>There is a production for nonterminal X, but X cannot be derived from
the start symbol. </DD>

<DT>X cannot be derived to terminals </DT>

<DD>X cannot derive a string consisting only of terminal symbols, for
example, if the only production for X is X = &quot;(&quot; X &quot;)&quot; .
</DD>

<DT>X --&gt; Y, Y --&gt; X </DT>

<DD>X and Y are nonterminals with circular derivations. </DD>

<DT>Tokens X and Y cannot be distinguished </DT>

<DD>The terminal symbols X and Y are declared to have the same structure,
e.g.,<BR>
&nbsp;&nbsp;integer = digit {digit} .<BR>
&nbsp;&nbsp;real = digit {digit} [&quot;.&quot; {digit}] .<BR>
In this example, a digit string could be recognized either as an integer
or as a real. </DD>
</DL>
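<P>The reachability and termination tests above can be pictured as simple
fixed-point computations over the grammar. The following is a minimal Java
sketch, not the code Coco/R actually uses: the class <I>GrammarChecks</I>, the
map-based grammar representation, and the convention that every symbol which is
not a key of the map counts as a terminal are all assumptions made for
illustration. </P>

```java
import java.util.*;

// Illustrative only: a grammar maps each nonterminal to its alternatives;
// each alternative is a list of symbols. Symbols that are not keys of the
// map are treated as terminals.
class GrammarChecks {

  // Nonterminals that can be derived from the start symbol. Every other
  // nonterminal would be reported as "X cannot be reached".
  static Set<String> reachable(Map<String, List<List<String>>> g, String start) {
    Set<String> seen = new TreeSet<>();
    Deque<String> work = new ArrayDeque<>();
    seen.add(start);
    work.push(start);
    while (!work.isEmpty()) {
      String nt = work.pop();
      for (List<String> alt : g.get(nt))
        for (String sym : alt)
          if (g.containsKey(sym) && seen.add(sym)) work.push(sym);
    }
    return seen;
  }

  // Nonterminals that can derive a string of terminals. Every other
  // nonterminal would be reported as "X cannot be derived to terminals".
  static Set<String> terminating(Map<String, List<List<String>>> g) {
    Set<String> done = new TreeSet<>();
    boolean changed = true;
    while (changed) {                       // repeat until nothing new is found
      changed = false;
      for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
        if (done.contains(e.getKey())) continue;
        for (List<String> alt : e.getValue()) {
          boolean ok = true;                // alternative uses only finished symbols?
          for (String sym : alt)
            if (g.containsKey(sym) && !done.contains(sym)) { ok = false; break; }
          if (ok) { done.add(e.getKey()); changed = true; break; }
        }
      }
    }
    return done;
  }

  public static void main(String[] args) {
    Map<String, List<List<String>>> g = new LinkedHashMap<>();
    g.put("S", Arrays.asList(Arrays.asList("a", "A")));
    g.put("A", Arrays.asList(Arrays.asList("b")));
    g.put("X", Arrays.asList(Arrays.asList("(", "X", ")")));
    System.out.println("reachable:   " + reachable(g, "S"));
    System.out.println("terminating: " + terminating(g));
  }
}
```

<P>In the sample grammar of <I>main</I>, the nonterminal X is missing from both
result sets: it is never used by S or A, and its only production always
reintroduces X. </P>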

<P>The following messages are warnings. They may indicate an error but
they may also describe desired effects. The generated compiler parts are
valid, although if an LL(1) error is reported for a construct X one must be
aware that the generated parser will choose the first of several possible
alternatives for X. </P>

<DL>
<DT>X deletable </DT>

<DD>X can be derived to the empty string, e.g., X = {Y} . </DD>

<DT>LL(1) error in X: Y is the start of more than one alternative </DT>

<DD>Several alternatives in the production of X start with the terminal
Y, e.g.,<BR>
&nbsp;&nbsp;Statement = ident &quot;:=&quot; Expression | ident [ActualParameters] .
</DD>

<DT>LL(1) error in X: Y is the start and successor of a deletable structure
</DT>

<DD>Deletable structures are [...] and {...}, e.g.,<BR>
&nbsp;&nbsp;qualident = [ident &quot;.&quot;] ident .<BR>
&nbsp;&nbsp;Statement = &quot;IF&quot; Expression &quot;THEN&quot; Statement
[&quot;ELSE&quot; Statement] .<BR>
The ELSE at the start of the else-part may also be a successor of a statement.
This LL(1) conflict is known under the name &quot;dangling else&quot;.
</DD>
</DL>
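<P>Both LL(1) warnings above boil down to set intersections. The sketch below
is only an illustration of that idea and assumes that the sets of start and
successor terminals have already been computed; the class <I>Ll1Check</I> and
its method names are hypothetical and not part of Coco/R. </P>

```java
import java.util.*;

// Illustrative only: terminals are represented by their names.
class Ll1Check {

  // Terminals that start more than one alternative of a production.
  // firstSets.get(i) holds the start terminals of the i-th alternative.
  static Set<String> sharedStarts(List<Set<String>> firstSets) {
    Set<String> seen = new TreeSet<>();
    Set<String> clash = new TreeSet<>();
    for (Set<String> first : firstSets)
      for (String t : first)
        if (!seen.add(t)) clash.add(t);     // already seen in an earlier alternative
    return clash;
  }

  // Terminals that are both a start and a successor of a deletable
  // structure such as [...] or {...}.
  static Set<String> startAndSuccessor(Set<String> first, Set<String> successors) {
    Set<String> both = new TreeSet<>(first);
    both.retainAll(successors);
    return both;
  }

  public static void main(String[] args) {
    // Statement = ident ":=" Expression | ident [ActualParameters] .
    List<Set<String>> statement = new ArrayList<>();
    statement.add(new TreeSet<>(Arrays.asList("ident")));
    statement.add(new TreeSet<>(Arrays.asList("ident")));
    for (String y : sharedStarts(statement))
      System.out.println("LL(1) error in Statement: " + y
          + " is the start of more than one alternative");

    // qualident = [ident "."] ident .
    Set<String> optionStart = new TreeSet<>(Arrays.asList("ident"));
    Set<String> optionSuccessors = new TreeSet<>(Arrays.asList("ident"));
    for (String y : startAndSuccessor(optionStart, optionSuccessors))
      System.out.println("LL(1) error in qualident: " + y
          + " is the start and successor of a deletable structure");
  }
}
```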

<H3>3.7 Trace output</H3>

<P>Coco/R can produce various kinds of output that help in spotting LL(1)
errors or in understanding the generated parts. Trace output can be switched
on with the pragma </P>

<PRE>   '$' {digit | letter}
</PRE>

<P>at the beginning of the attributed grammar. The output goes to the file
<I>listing</I>. The effect of the switches is as follows (only digits have any
meaning in the current implementation): </P>

<TABLE>
<TR><TD>0:</TD> <TD>prints the states of the scanner automaton</TD></TR>
<TR><TD>1:</TD> <TD>prints the First and Follow sets of all nonterminals</TD></TR>
<TR><TD>2:</TD> <TD>prints the syntax graph of the productions</TD></TR>
<TR><TD>3:</TD> <TD>traces the computation of the First sets</TD></TR>
<TR><TD>4:</TD> <TD>generates names for the terminal tokens</TD></TR>
<TR><TD>6:</TD> <TD>prints the symbol table (terminals, nonterminals, pragmas)</TD></TR>
<TR><TD>7:</TD> <TD>prints a cross reference list of all syntax symbols</TD></TR>
<TR><TD>8:</TD> <TD>prints statistics about the Coco/R run</TD></TR>
</TABLE>

<P><A NAME="hints"></A></P>

<H2>4. Hints for Advanced Users of Coco/R</H2>

<H3>Providing a hand-written scanner</H3>

<P>Scanning is a time-consuming task. The scanner generated by Coco/R is
optimized, but it is implemented as a deterministic finite automaton, which
introduces some overhead. A manual implementation of the scanner can be
slightly more efficient. For time-critical applications a programmer may
wish to generate a parser but provide a hand-written scanner. This can
be done by declaring <I>all</I> terminal symbols (including literals) as tokens,
but without defining their structures by EBNF expressions, e.g., </P>

<PRE>   TOKENS
     ident
     number
     &quot;if&quot;
     ...
</PRE>

<P>If a named token is declared without structure, no scanner is generated.
Tokens are assigned numbers in the order of their declaration; i.e., the
first token gets the number 1, the second the number 2, etc. The number
0 is reserved for the end-of-file symbol. The hand-written scanner has
to return token numbers according to this convention. It must have the
interface described in Section 3.3.</P>
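<P>The numbering convention can be illustrated with a small Java sketch for
the three tokens shown in the declaration above. It deliberately does
<I>not</I> implement the scanner interface of Section 3.3; the class
<I>HandScanner</I>, its <I>next</I> method, and the decision to throw an
exception on an unexpected character are assumptions made for this example
only. </P>

```java
// Illustrative only: shows the token numbering, not the real scanner interface.
class HandScanner {
  // Numbers follow the order of declaration in the TOKENS section;
  // 0 is reserved for end-of-file.
  static final int EOF = 0, IDENT = 1, NUMBER = 2, IF = 3;

  private final String src;
  private int pos = 0;

  HandScanner(String src) { this.src = src; }

  // Returns the number of the next token, or EOF at the end of the input.
  int next() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    if (pos >= src.length()) return EOF;
    int start = pos;
    char ch = src.charAt(pos);
    if (Character.isLetter(ch)) {
      while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
      // a literal such as "if" gets its own number; other names are identifiers
      return src.substring(start, pos).equals("if") ? IF : IDENT;
    }
    if (Character.isDigit(ch)) {
      while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
      return NUMBER;
    }
    throw new IllegalStateException("unexpected character '" + ch + "' at " + pos);
  }

  public static void main(String[] args) {
    HandScanner s = new HandScanner("if x1 42");
    for (int t = s.next(); t != EOF; t = s.next()) System.out.println(t);
  }
}
```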

<H3>Tailoring the generated compiler parts to specific needs</H3>

<P>Using a generator usually increases productivity but decreases flexibility.
There are always special cases that can be handled more efficiently in
a hand-written implementation. A good tool handles routine matters in a
standard way, but gives a user the chance to change the standard policy
if this seems appropriate. Coco/R generates the Scanner, Parser and ErrorStream
classes from source texts (so-called frames) stored as the files <I>Scanner.frame</I>
and <I>Parser.frame</I>. It does so by inserting grammar-specific parts
into these frames at the places marked with distinctive --&gt;keys. A user
may edit the frames (within reason!) and thereby change the internally used
algorithms. For example, a user can implement a different buffering scheme for
input characters. </P>

<H3>Accessing the lookahead token</H3>

<P>The generated parser offers the lookahead token <I>t</I>, which can be
used to access the next token in the input stream (i.e., the one that has
already been scanned but not yet parsed); the most recently parsed token is
available as <I>token</I>. The following example shows one
way of computing the start and the end position of a sequence of tokens
enclosed in curly brackets: </P>

<PRE>
   &quot;{&quot;     (. start = t.pos; .)   // start of the first ANY
   {ANY}   (. end = token.pos; .) // start of the last ANY
   &quot;}&quot;
</PRE>

<H3>Controlling the Parser by semantic information</H3>

<P>Ideally, syntax analysis should be &quot;context-free&quot;, that is,
independent of semantic analysis (symbol table handling, type checking, etc.).
However, many languages have constructs that can only be distinguished if one
also considers semantic information (e.g., the type) associated with various
identifiers. In the language Oberon, for example, a <I>Designator</I> is
defined as </P>

<PRE>   Designator = Qualident
                {&quot;.&quot; ident | &quot;^&quot; | &quot;[&quot; ExprList &quot;]&quot; | &quot;(&quot; Qualident &quot;)&quot; } .
</PRE>

<P>where in the context of an <I>Expression</I>, code like <I>x(T)</I> means a
type guard (i.e., <I>x</I> is asserted to be of type <I>T</I>). A
<I>Designator</I> may also be used in a <I>Statement</I> </P>

<PRE>   Statement = ... | Designator [&quot;(&quot; ExprList &quot;)&quot;] | ...  .
</PRE>

<P>but in this context <I>x(T)</I> can be interpreted as a regular procedure
name <I>x</I> followed by a parameter <I>T</I>. The two interpretations of
<I>x(T)</I> can only be distinguished by looking at the type of <I>x</I>. If
<I>x</I> is a regular procedure then the opening bracket is the start of a
parameter list, otherwise the bracket belongs to a type guard. </P>

<P>Coco/R allows control of the parser from within semantic actions to
a certain degree. A <I>Designator</I>, for example, can be processed in the
following way: </P>

<PRE>   Designator &lt;^Item x&gt; =
     Qualident &lt;^x&gt;
     {                        (. if (...x is a procedure...) return; .)
       &quot;(&quot; Qualident &lt;^y&gt; &quot;)&quot; (. ... process type guard ... .)
     | ...
     } .
</PRE>

<P>When an opening bracket is seen after a <I>Qualident</I>, the alternative
starting with an opening bracket is selected. The first semantic action of
this alternative checks for the type of <I>x</I>. If <I>x</I> is a regular
procedure, the parser returns from the production (and presumably continues in
the <I>Statement</I> production whence it was invoked). </P>
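<P>The effect of the early return can be imitated in a few lines of
hand-written Java. The sketch below is not what Coco/R generates; it assumes a
much simplified <I>Designator</I> in which <I>Qualident</I> is a single
identifier, and it replaces the symbol table by a plain set of procedure names.
The class <I>DesignatorSketch</I> and all of its members are hypothetical. </P>

```java
import java.util.*;

// Illustrative only: a hand-written stand-in for the generated parser.
class DesignatorSketch {
  private final List<String> tokens;
  private final Set<String> procedures;    // stand-in for a symbol table
  private int pos = 0;                     // index of the lookahead token

  DesignatorSketch(List<String> tokens, Set<String> procedures) {
    this.tokens = tokens;
    this.procedures = procedures;
  }

  private String lookahead() {
    return pos < tokens.size() ? tokens.get(pos) : "<eof>";
  }

  private String get() {
    if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
    return tokens.get(pos++);
  }

  private void expect(String s) {
    String found = get();
    if (!found.equals(s))
      throw new IllegalStateException("expected " + s + ", found " + found);
  }

  // Designator = ident { "(" ident ")" | "." ident } .   (simplified)
  // Returns the number of tokens consumed by the designator.
  int designator() {
    String x = get();
    for (;;) {
      if (lookahead().equals("(")) {
        // semantic check: for a procedure the bracket opens a parameter
        // list, so the designator ends here and "(" is left unconsumed
        if (procedures.contains(x)) return pos;
        expect("(");
        get();                             // type name of the type guard
        expect(")");
      } else if (lookahead().equals(".")) {
        expect(".");
        get();                             // field name
      } else {
        return pos;
      }
    }
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("x", "(", "T", ")");
    Set<String> procs = new HashSet<>(Arrays.asList("x"));
    System.out.println(new DesignatorSketch(input, procs).designator());
    System.out.println(new DesignatorSketch(input, new HashSet<String>()).designator());
  }
}
```

<P>For the input <I>x(T)</I> the designator consumes only <I>x</I> when
<I>x</I> is known as a procedure, and all four tokens otherwise. </P>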

<A NAME="Taste"></A>

<H2>5. A Sample Compiler</H2>

<P>This section shows how to use Coco/R for building a compiler for a tiny
programming language called &quot;Taste&quot;. Taste bears some resemblance to
Modula-2 or Oberon. It has variables of type INTEGER and BOOLEAN, and regular
procedures without parameters. It allows assignments, procedure calls, and IF-
and WHILE-statements. Integers may be read from an input stream and written to
an output stream, one to a line. Expressions may incorporate arithmetic
operators (+, -, *, /) and relational operators (=, &lt;, &gt;). </P>

<P>Here is an example of a Taste program, which can be found in the file <A
HREF="../Taste/Test.TAS">Test.TAS</A>: </P>

<PRE>  MODULE Example;
    VAR n: INTEGER;

    PROCEDURE SumUp; (*build the sum of all integers from 1 to n*)
      VAR sum: INTEGER;
    BEGIN
      sum:=0;
      WHILE n&gt;0 DO sum:=sum+n; n:=n-1 END;
      WRITE sum
    END SumUp;

  BEGIN
    READ n;
    WHILE n&gt;0 DO SumUp; READ n END
  END Example.
</PRE>

<P>The full grammar of Taste can be found
<A HREF="Taste.htm">here</A>.
Of course Taste is too restrictive to be used as a real programming language.
Its purpose is just to give one a taste of how to write a compiler with
Coco/R!</P>

<H3>Execution</H3>

<P>The Taste compiler is a compile-and-go compiler, which means that it
reads a source program and translates it into a target program which is
executed (i.e., interpreted) immediately after the compilation. Once it has been
built, the Taste system can be run in &quot;command line mode&quot;, for example: </P>

<PRE>   java Taste.Comp Test.TAS
</PRE>

<P>In this mode, when the program is interpreted, input is taken from standard
input and output is directed to standard output. Alternatively, the system may
be run in &quot;GUI&quot; mode. In this case, when Taste is started, a dialog
pops up that asks for the name of the source file to be compiled. When the
compiled program is interpreted, a second dialog box pops up that asks for the
name of a data input file. You could use
<A HREF="../Taste/Test.IN">Test.IN</A>
as a sample input file; once again the output is directed to standard
output.</P>

<H3>The Target Code</H3>

<P>We define an abstract stack machine as the target for the translation of
Taste programs. The compiler translates a source program into instructions for
this machine, which will later be interpreted. The machine uses the following
data structures (the code array is filled by the compiler): </P>

<PRE>   char code[];   // code memory
   int stack[];   // stack with frames for local variables
   int top;       // stack pointer (points to next free stack element)
   int pc;        // program counter
   int base;      // base address of current frame
</PRE>
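<P>Because <I>top</I> points to the next free element, pushing stores a value
and then advances <I>top</I>, while popping steps back before fetching. The
following is a minimal sketch of that convention only; the class
<I>StackSketch</I>, its method names, the array size, and the empty initial
stack are assumptions made for illustration and are not taken from the Taste
sources. </P>

```java
// Illustrative only: models just the stack and its pointer.
class StackSketch {
  final int[] stack = new int[64];         // size chosen arbitrarily
  int top = 0;                             // index of the next free element

  void push(int value) { stack[top++] = value; }   // store, then advance top
  int pop() { return stack[--top]; }               // step back, then fetch

  public static void main(String[] args) {
    StackSketch m = new StackSketch();
    m.push(7);
    m.push(2);
    int j = m.pop();                       // the right operand comes off first
    int i = m.pop();
    m.push(i - j);
    System.out.println(m.stack[m.top - 1]);
  }
}
```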

<P>The instructions have variable length and are described by the following
table (the initial values of the registers are: base=0; top=3;): </P>

<PRE>LOAD l,a  Load value          Push(stack[Frame(l)+a]);
LIT i     Load literal        Push(i);
STO l,a   Store               stack[Frame(l)+a]=Pop();
ADD       Add                 j=Pop(); i=Pop(); Push(i+j);
SUB       Subtract            j=Pop(); i=Pop(); Push(i-j);
DIV       Divide              j=Pop(); i=Pop(); Push(i/j);