<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Coco/R for Java</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.0Gold (Win16; I) [Netscape]">
</HEAD>
<BODY>
<H1 ALIGN=CENTER>Coco/R for Java</H1>
<CENTER><P>H. Mössenböck<BR>
Johannes Kepler University Linz<BR>
Institute of Practical Computer Science </P>
<P>This version amended (7 July 1999) and extended by<BR>
P.D. Terry<BR>
Computer Science Department<BR>
Rhodes University, Grahamstown</P>
</CENTER>
<P><I>You may obtain the latest version of this document from
<A HREF="http://cs.ru.ac.za/homes/cspt/javacoco.htm">
http://cs.ru.ac.za/homes/cspt/javacoco.htm</A></I>
<P>Coco/R is a compiler generator that takes a compiler description in the
form of an LL(1) attributed grammar and generates a scanner and a parser for
the described language. This document describes a version of Coco/R for Java
that slightly extends the original. It is a modified version of the original
documentation by Mössenböck, which you can read
<A HREF="http://www.ssw.uni-linz.ac.at/Research/Projects/Coco/">here</A>.
There are also implementations of Coco/R for
<A HREF="http://cs.ru.ac.za/homes/cspt/cocor.htm">C, Turbo Pascal and
Modula-2</A> and for
<A HREF="ftp://ftp.ssw.uni-linz.ac.at/pub/Oberon/LinzTools/Coco.Cod">
Oberon</A>. </P>
<H2>Contents</H2>
<OL>
<LI><A HREF="#intro">Introduction</A> </LI>
<P>
<LI><A HREF="#InpLang">The specification language Cocol/R<BR>
</A>
2.1 Vocabulary<BR>
2.2 Structure of a compiler description<BR>
2.3 Scanner specification<BR>
2.4 Parser specification </LI>
<P>
<LI><A HREF="#UserGuide">User Guide<BR>
</A>3.1 The Parser interface<BR>
3.2 The Token interface<BR>
3.3 The Scanner interface<BR>
3.4 The ErrorStream interface<BR>
3.5 The driver program<BR>
3.6 Grammar tests<BR>
3.7 Trace output </LI>
<P>
<LI><A HREF="#hints">Hints for Advanced Users of Coco/R</A> </LI>
<P>
<LI><A HREF="#Taste">A Sample Compiler</A> </LI>
<P>
<LI><A HREF="#Apps">Useful applications of Coco/R</A></LI>
<P>
<LI><A HREF="#Sources">Sources of Coco/R</A> </LI>
<P>
<LI><A HREF="#port">Portability issues</A> </LI>
<P>
<LI><A HREF="#Lit">Literature</A> </LI>
</OL>
<P><A NAME="intro"></A></P>
<H2>1. Introduction</H2>
<P>Coco/R takes an LL(1) attributed grammar for a language, expressed in
augmented EBNF, and generates Java code for a recursive descent parser and a
scanner for that language. To build a complete application, such as a
compiler, the user must also supply a simple main method that calls the
parser, as well as the semantic classes that are used from within the grammar
(e.g., a symbol table handler and a code generator). </P>
<P><IMG SRC="Fig1.gif" NATURALSIZEFLAG="3" HEIGHT=100 WIDTH=318 ALIGN=BOTTOM><BR>
</P>
<P>The following example gives you a rough impression of the input language
(Cocol/R). It shows a grammar rule for the processing of declarations. </P>
<PRE> VarDeclaration <^int adr>
                               (. Object obj, obj1;
                                  Struct typ;
                                  int n, a;
                                  String name; .)
 = Ident <^name>               (. obj = SymTab.Find(name);
                                  obj.link = null;
                                  n = 1; .)
   { ',' Ident <^name>         (. obj1 = SymTab.Find(name);
                                  obj1.link = obj;
                                  obj = obj1;
                                  n++; .)
   }
   ':'
   Type <^typ>                 (. adr = adr + n * typ.size;
                                  a = adr;
                                  while (obj != null) {
                                    a -= typ.size;
                                    obj.adr = a;
                                    obj = obj.link;
                                  } .)
   ';' .
</PRE>
<P>The core of this specification is the <B>EBNF rule</B> </P>
<PRE> VarDeclaration = Ident {',' Ident} ':' Type ';' .
</PRE>
<P>It is augmented by attributes and semantic actions. The <B>attributes</B>
(e.g. <^name>) specify parameters that are associated with some of
the nonterminal symbols (e.g. Ident). The <B>semantic actions</B>
are arbitrary Java statements enclosed between "(." and ".)".
Such semantic actions will be executed by the generated parser at the position
in the grammar at which the associated syntactic element is encountered.
Nonterminals may have input attributes (e.g. <x, y>) and output
attributes (e.g. <^z>). Output attributes are preceded by a
"^". A nonterminal may, however, have at most one output attribute,
which must be the first attribute in the attribute list (e.g. sym<^x, y,
z>). </P>
<P>Coco/R generates a parser method for every grammar rule. The parser
method for the above rule would look as follows: </P>
<PRE>
 private static int VarDeclaration () {
   int adr;
   Object obj, obj1;
   Struct typ;
   int n, a;
   String name;
   name = Ident();
   obj = SymTab.Find(name);
   obj.link = null;
   n = 1;
   while (t.kind == commaSym) {
     Get();
     name = Ident();
     obj1 = SymTab.Find(name);
     obj1.link = obj;
     obj = obj1;
     n++;
   }
   Expect(colonSym);
   typ = Type();
   adr = adr + n * typ.size;
   a = adr;
   while (obj != null) {
     a -= typ.size;
     obj.adr = a;
     obj = obj.link;
   }
   Expect(semicolonSym);
   return adr;
 }
</PRE>
<P>This code also shows the statements (such as Get() and Expect(...))
through which the parser interacts with the generated scanner, which reads
the input file and returns a stream of tokens to the parser.</P>
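<P>To make the interplay of lookahead token, Get and Expect more concrete,
here is a small hand-written sketch of a recursive descent parser for the bare
EBNF rule shown earlier. It is not output produced by Coco/R. The class name
VarDeclSketch, the token kind constants and the tiny built-in scanner are all
assumptions made for this illustration, and Type is simplified to a single
identifier.</P>

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of the Get/Expect pattern; NOT code generated by Coco/R. */
final class VarDeclSketch {
    // Hypothetical token kinds used only in this sketch.
    static final int EOF = 0, IDENT = 1, COMMA = 2, COLON = 3, SEMICOLON = 4;

    private final String src;
    private int pos = 0;
    private int kind;      // kind of the lookahead token
    private String text;   // spelling of the lookahead token
    final List<String> errors = new ArrayList<>();

    VarDeclSketch(String src) {
        this.src = src;
        get();             // prime the lookahead token
    }

    /** Tiny scanner: skips white space, then recognizes one token. */
    private void get() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        if (pos >= src.length()) { kind = EOF; text = ""; return; }
        char c = src.charAt(pos);
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            kind = IDENT;
            text = src.substring(start, pos);
            return;
        }
        pos++;
        text = String.valueOf(c);
        switch (c) {
            case ',': kind = COMMA; break;
            case ':': kind = COLON; break;
            case ';': kind = SEMICOLON; break;
            default:  kind = -1;   // unknown character
        }
    }

    /** Consumes the lookahead if it has the expected kind, otherwise records an error. */
    private void expect(int expected) {
        if (kind == expected) get();
        else errors.add("expected token kind " + expected + " but found '" + text + "'");
    }

    private String ident() {
        String name = text;
        expect(IDENT);
        return name;
    }

    /** VarDeclaration = Ident {',' Ident} ':' Type ';' .  Returns the declared names. */
    List<String> varDeclaration() {
        List<String> names = new ArrayList<>();
        names.add(ident());
        while (kind == COMMA) {    // the EBNF iteration becomes a while loop
            get();
            names.add(ident());
        }
        expect(COLON);
        ident();                   // Type is simplified to one identifier here
        expect(SEMICOLON);
        return names;
    }

    public static void main(String[] args) {
        VarDeclSketch p = new VarDeclSketch("a, b, c : INTEGER;");
        System.out.println(p.varDeclaration() + " errors=" + p.errors);
    }
}
```

<P>Each EBNF construct maps onto a control structure: the iteration in braces
becomes a while loop guarded by the lookahead token, and each terminal becomes
a call that checks and consumes one token.</P>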
<A NAME="InpLang"></A>
<H2>2. The specification language Cocol/R</H2>
<P>A compiler description can be viewed as consisting of imports,
declarations and grammar rules that describe the lexical and syntactical
structure of a source language as well as its translation into a target
language.
<H3>2.1 Vocabulary</H3>
<P>The vocabulary of Cocol/R uses identifiers, strings and numbers in the
usual way: </P>
<PRE> ident = letter {letter | digit} .
string = '"' {anyButQuoteOrLineMark} '"'
| "'" {anyButApostropheOrLineMark} "'" .
number = digit {digit} .
</PRE>
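<P>As a rough illustration, the three definitions above can be approximated
with Java regular expressions. This is only a sketch: the class name
CocolVocabulary is hypothetical, and "letter" is assumed to mean the ASCII
letters A-Z and a-z, which the text above does not state explicitly.</P>

```java
import java.util.regex.Pattern;

/** Sketch: the Cocol/R vocabulary expressed as regular expressions. */
final class CocolVocabulary {
    // ident = letter {letter | digit} .   (ASCII letters are an assumption)
    static final Pattern IDENT = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // string = '"' {anyButQuoteOrLineMark} '"' | "'" {anyButApostropheOrLineMark} "'" .
    static final Pattern STRING = Pattern.compile("\"[^\"\\r\\n]*\"|'[^'\\r\\n]*'");

    // number = digit {digit} .
    static final Pattern NUMBER = Pattern.compile("[0-9]+");

    /** Returns "ident", "string", "number" or "unknown" for a complete lexeme. */
    static String classify(String lexeme) {
        if (IDENT.matcher(lexeme).matches()) return "ident";
        if (STRING.matcher(lexeme).matches()) return "string";
        if (NUMBER.matcher(lexeme).matches()) return "number";
        return "unknown";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"letter1", "\"while\"", "'\"'", "127", "1abc"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

<P>Note that a string delimited by apostrophes may contain a double quote,
and vice versa, which is how a quote character itself can be written.</P>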
<P>Upper case letters are distinct from lower case letters. Strings must
not extend across line boundaries. The keywords are </P>
<PRE>
ANY CASE CHARACTERS CHR COMMENTS
COMPILER CONTEXT END EOF FROM
IGNORE NAMES NESTED PRAGMAS PRODUCTIONS
SYNC TO TOKENS WEAK
</PRE>
<P>The following metacharacters are used to form EBNF expressions: </P>
<TABLE>
<TR>
<TD>( )<BR>
{ }<BR>
[ ]<BR>
< ><BR>
(. .)<BR>
= . | + - </TD>
<TD>for grouping<BR>
for iterations<BR>
for options<BR>
for attributes<BR>
for semantic parts<BR>
as explained below </TD>
</TR>
</TABLE>
<P>Comments are enclosed in "/*" and "*/" and may be
nested. </P>
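<P>Nested comments cannot be skipped by simply searching for the first
"*/"; the scanner has to count nesting levels. The following sketch shows one
way to do this. The class and method names are hypothetical and do not
describe how Coco/R itself is implemented.</P>

```java
/** Sketch: skipping a nested comment by counting nesting depth. */
final class NestedCommentSketch {
    /**
     * Returns the index just past the comment that starts at 'start',
     * or -1 if the text ends before the comment is closed.
     */
    static int skipComment(String s, int start) {
        if (!s.startsWith("/*", start)) {
            throw new IllegalArgumentException("no comment starts at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < s.length()) {
            if (s.startsWith("/*", i)) {          // one level deeper
                depth++;
                i += 2;
            } else if (s.startsWith("*/", i)) {   // one level closed
                depth--;
                i += 2;
                if (depth == 0) return i;         // outermost comment is closed
            } else {
                i++;
            }
        }
        return -1;                                // unterminated comment
    }

    public static void main(String[] args) {
        String s = "/* outer /* inner */ still outer */ COMPILER";
        System.out.println(s.substring(skipComment(s, 0)));
    }
}
```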
<H3>2.2 Structure of a compiler description</H3>
<P>A compiler description in Cocol/R is made up of the following parts:</P>
<PRE>
Cocol =
"COMPILER" GoalIdentifier
ArbitraryJavaDeclarations
ScannerSpecification
ParserSpecification
"END" GoalIdentifier "." .
GoalIdentifier = ident .
</PRE>
<P>The name after the keyword COMPILER is the grammar name and must match
the name after the keyword END. The grammar name also denotes the topmost
nonterminal (the start symbol). </P>
<P>Arbitrary Java declarations may follow the grammar name. These are not
checked by Coco/R. They usually contain imports of other classes, as well as
declarations of fields and methods to be used in the semantic actions of
the parser class that is to be generated. </P>
<P>The remaining parts of the compiler description specify the lexical
and syntactical structure of the language to be processed. Effectively two
grammars are specified - one for the lexical analyzer or scanner, and the
other for the syntax analyzer or parser. The nonterminals (token classes)
recognized by the scanner are regarded as terminals by the parser.</P>
<H3>2.3 Scanner specification</H3>
<P>A scanner has to read source text, skip meaningless characters, and
recognize tokens which are then passed to a parser. Tokens may be classified
either as literals or as token classes. Literals (like "while" and
">=") may be incorporated directly into productions as
strings, and do not have to be named. Token classes (such as identifiers or
numbers) must be named, and have structures that are specified by regular
expressions, defined in EBNF. There are usually many different instances of a
token class (e.g., many different identifiers) which are all recognized as the
same kind of token. </P>
<P>A scanner specification consists of six optional parts that may be written
in arbitrary order. </P>
<PRE> ScannerSpecification =
{ CharacterSets
| Tokens
| Ignorable
| Comments
| Pragmas
| UserNames
} .
</PRE>
<P><B>Character sets</B>. The <I>CharacterSets</I> component allows for the
declaration of names for character sets such as letters or digits, and defines
the characters that may occur as members of these sets. These names may be used
in the other sections of the scanner specification (but not in the parser
specification). </P>
<PRE>
CharacterSets = "CHARACTERS" {SetDecl} .
SetDecl = SetIdent "=" CharacterSet .
CharacterSet = SimpleSet { ("+" | "-") SimpleSet} .
SimpleSet = SetIdent | string | "ANY"
| SingleChar [".." SingleChar ] .
SingleChar = "CHR" "(" number ")" .
SetIdent = ident .
</PRE>
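<P>One way to picture these declarations is as operations on sets of
character codes. The sketch below models them with java.util.BitSet, assuming
that "+" denotes set union and "-" denotes set difference. The class and
method names are hypothetical, and the sketch says nothing about how Coco/R
represents character sets internally.</P>

```java
import java.util.BitSet;

/** Sketch: Cocol/R character sets modelled as bit sets of ordinal values. */
final class CharSetSketch {
    /** string: the set of all characters in the string. */
    static BitSet of(String chars) {
        BitSet s = new BitSet();
        for (int i = 0; i < chars.length(); i++) s.set(chars.charAt(i));
        return s;
    }

    /** CHR(lo) .. CHR(hi): all characters with ordinal values lo to hi inclusive. */
    static BitSet range(int lo, int hi) {
        BitSet s = new BitSet();
        s.set(lo, hi + 1);          // BitSet.set(from, to) excludes 'to'
        return s;
    }

    /** a + b (assumed to mean union). */
    static BitSet plus(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    /** a - b (assumed to mean difference). */
    static BitSet minus(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.andNot(b);
        return r;
    }

    public static void main(String[] args) {
        BitSet digit = of("0123456789");                 // digit = "0123456789" .
        BitSet hexDigit = plus(digit, of("ABCDEF"));     // hexDigit = digit + "ABCDEF" .
        BitSet printable = range(32, 126);               // CHR(32) .. CHR(126)
        BitSet noQuote = minus(printable, of("\""));     // printable - '"'
        System.out.println(hexDigit.cardinality());      // prints 16
        System.out.println(noQuote.get('"'));            // prints false
    }
}
```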
<P>Simple character sets are denoted by one of: </P>
<TABLE>
<TR>
<TD>string
<td> a set consisting of all characters in the string
<tr>
<td>SetIdent
<td> the previously declared character set with this name
<tr>
<td valign=top>CHR(i)
<td valign=top>a character set consisting of a single element with the ordinal value i
(i <=127)
<tr>
<td valign=top>CHR(i) .. CHR(j)
<td valign=top>a character set consisting of all characters whose ordinal values are in the
range i ... j
<tr>
<td>ANY