In this example, a digit string is ambiguous: it could be recognized either
as an integer or as a real.
The following messages are warnings. They may indicate an error but they may
also describe desired effects. The generated compiler parts may still be
valid. If an LL(1) error is reported for a construct X, one must be aware
that the generated parser will choose the first of several possible
alternatives for X.
X NULLABLE
X can be derived to the empty string, e.g. X = { Y } .
LL(1) ERROR IN X:Y IS START OF MORE THAN ONE ALTERNATIVE
Several alternatives in the production of X start with the terminal Y
e.g.
Statement = ident ":=" Expression | ident [ ActualParameters ] .
LL(1) ERROR IN X:Y IS START AND SUCCESSOR OF NULLABLE STRUCTURE
Nullable structures are [ ... ] and { ... }
e.g.
qualident = [ ident "." ] ident .
Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] .
The ELSE at the start of the else part may also be a successor of a
statement. This LL(1) conflict is known under the name "dangling else".
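The effect of the dangling else can be illustrated with a small hand-written
recursive descent routine. This is only a sketch, not code produced by
Coco/R: the one-character token encoding ('i', 'e', 's'), the functions
Statement and ElseOwner, and the depth bookkeeping are all invented for
illustration. It assumes that the parser enters an optional part whenever the
next token can start it, which attaches the ELSE to the nearest IF.

```c
#include <assert.h>
#include <stddef.h>

/* Token stream, one character per token (a made-up encoding):
   'i' = IF Expression THEN, 'e' = ELSE, 's' = a simple statement. */
static const char *tok;   /* next token to be consumed */
static int else_owner;    /* number of the IF that took the most recent ELSE
                             (1 = outermost IF, 0 = no ELSE seen) */

/* Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] | simple. */
static void Statement(int depth)
{
    if (*tok == 'i') {
        tok++;                     /* IF Expression THEN */
        Statement(depth + 1);      /* the THEN part */
        if (*tok == 'e') {         /* ELSE is next: enter the optional part */
            tok++;
            else_owner = depth + 1;
            Statement(depth + 1);  /* the ELSE part */
        }
    } else if (*tok == 's') {
        tok++;                     /* simple statement */
    }
}

/* Parses tokens and reports which IF the most recent ELSE was attached to. */
int ElseOwner(const char *tokens)
{
    assert(tokens != NULL);
    tok = tokens;
    else_owner = 0;
    Statement(0);
    return else_owner;
}
```

For the token string "iises" (IF c THEN IF c THEN s ELSE s) the ELSE is
consumed while the inner IF is still being parsed, so ElseOwner returns 2
rather than 1.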
The Parser Interface
====================
A parser generated by Coco/R defines various routines that may be called from
an application. As for the scanner, the form of the interface depends on the
host system. The parser generated by Coco/R for C has the following simple
interface:
#define MinErrDist 2
void Parse();
/* Parses the source */
int Successful();
/* Returns 1 if no errors have been recorded while parsing */
void LexString(char *Lex, int Size);
/* Retrieves at most Size characters from the most recently parsed
token into Lex */
void LexName(char *Lex, int Size);
/* Retrieves at most Size characters from the most recently parsed
token into Lex, converted to upper case if IGNORE CASE was specified */
void LookAheadString(char *Lex, int Size);
/* Retrieves at most Size characters from the lookahead token into Lex */
void LookAheadName(char *Lex, int Size);
/* Retrieves at most Size characters from the lookahead token into Lex,
converted to upper case if IGNORE CASE was specified */
void SynError(int errNo);
/* Reports syntax error denoted by errNo */
void SemError(int errNo);
/* Reports semantic error denoted by errNo */
For the C++ version, it effectively takes the form below. (There is actually
an underlying class hierarchy, and the declarations are really slightly
different from those presented here).
class grammarParser
{ public:
    grammarParser(AbsScanner *S, CRError *E);
    // Constructs parser associated with scanner S and error reporter E
    void Parse();
    // Parses the source
    int Successful();
    // Returns 1 if no errors have been recorded while parsing
  private:
    void LexString(char *Lex, int Size);
    // Retrieves at most Size characters from the most recently parsed
    // token into Lex
    void LexName(char *Lex, int Size);
    // Retrieves at most Size characters from the most recently parsed
    // token into Lex, converted to upper case if IGNORE CASE was specified
    long LexPos();
    // Retrieves the position of the most recently parsed token
    void LookAheadString(char *Lex, int Size);
    // Retrieves at most Size characters from the lookahead token into Lex
    void LookAheadName(char *Lex, int Size);
    // Retrieves at most Size characters from the lookahead token into Lex,
    // converted to upper case if IGNORE CASE was specified
    long LookAheadPos();
    // Retrieves the position of the lookahead token
    void SynError(int errNo);
    // Reports syntax error denoted by errNo
    void SemError(int errNo);
    // Reports semantic error denoted by errNo
    // ... Prototypes of functions for parsing each non-terminal in grammar
};
This interface makes it possible to
- initiate the parse for the goal symbol by calling Parse().
- investigate whether the parse succeeded by calling Successful().
- report on the presence of syntactic and semantic errors by calling SynError
and SemError.
- obtain the lexeme value of a particular token in one of four ways
(LexString, LexName, LookAheadString and LookAheadName). Calls to
LexString are most common; the others are used for special variations.
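As an illustration of the most common case, the sketch below shows how a
semantic action might turn the lexeme of a number token into a value. The
generated parser is not available here, so LexString is replaced by a
stand-in with the same signature; the variable current_lexeme, the function
NumberValue, and the assumption that Size includes room for the terminating
NUL are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for parser state: text of the most recently parsed token.
   Hypothetical; a generated parser obtains this from its scanner. */
static const char *current_lexeme = "";

/* Stand-in with the documented shape of LexString. Assumption of this
   sketch: Size counts the terminating NUL, so at most Size - 1 characters
   of the lexeme are copied. */
void LexString(char *Lex, int Size)
{
    assert(Lex != NULL && Size > 0);
    strncpy(Lex, current_lexeme, (size_t)Size - 1);
    Lex[Size - 1] = '\0';
}

/* What a semantic action attached to a number token might do. */
long NumberValue(void)
{
    char text[32];
    LexString(text, (int)sizeof text);
    return strtol(text, NULL, 10);
}
```

In a grammar, a call of this kind would typically sit in the semantic part
that follows the token it refers to, so that the token is still the most
recently parsed one when LexString is called.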
A tailored frame file can be supplied, from which Coco/R can generate a main
program if the $C pragma/option is used. Examples of this can be found in the
kit as well.
The Scanner Interface
=====================
The scanner generated by Coco/R for C has the following interface (the C++
version is somewhat different):
int S_src; /* source file */
int S_Line, S_Col; /* line and column of current symbol */
int S_Len; /* length of current symbol */
long S_Pos; /* file position of current symbol */
int S_NextLine; /* line of lookahead symbol */
int S_NextCol; /* column of lookahead symbol */
int S_NextLen; /* length of lookahead symbol */
long S_NextPos; /* file position of lookahead symbol */
int S_CurrLine; /* current input line (may be higher than line) */
long S_lineStart; /* start position of current line */
int S_Get();
/* Gets next symbol from source file */
void S_Reset();
/* Reads and stores source file internally */
/* Assert: S_src has been opened */
void S_GetString(long pos, int len, char *s);
/* Retrieves exact string of max length len at position pos in source
file */
void S_GetName(long pos, int len, char *s);
/* Retrieves a string of max length len at position pos in source file.
Each character in the string will be capitalized if IGNORE CASE is
specified */
unsigned char S_CurrentCh(long pos);
/* Returns current character at specified file position */
Notes
-----
It is rarely necessary to make use of any of this interface directly. The
parser interface discussed above exports most of the functionality that is
needed when semantic actions have to retrieve token information.
The variables S_Line, S_Col, S_Pos, S_Len are apposite for the most recently
parsed token.
The variables S_NextLine, S_NextCol, S_NextPos, S_NextLen are apposite for the
most recently scanned token (the look-ahead token retrieved by the most recent
call to S_Get).
Tab characters (ASCII 9) are assumed to correspond to 8 character tab stops.
Although Borland C's editor allows the user to change the tab size to any
number (default 3), Coco/R uses 8 character long tabs for compatibility with
UNIX and DOS. If you wish to change the tab size, set the defined constant
TAB_SIZE in the frame file scan_c.frm to the size you prefer. Using an
incorrect tab size will cause the scanner to report the wrong column of a
token (S_Col, S_NextCol).
The main module is responsible for opening the source file S_src prior to
calling the parser. If you are using MS-DOS add O_BINARY to the open mode
options. Don't let the compiler convert CR/LF to LF, as this will cause an
invalid file position for reporting errors.
S_Reset is called by the parser to initialize the scanner. It reads the
entire source into a large internal buffer, which markedly improves the
efficiency of the scanner.
S_Get is called repeatedly from the parser, to get the next token from the
source text.
S_GetString and S_GetName can be used to obtain the text of a token starting
at position pos and having length len.
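Since the whole source is held in a buffer, retrieving the text of a token
amounts to copying a slice of that buffer. The sketch below shows the idea
with two hypothetical helpers, BufGetString and BufGetName, that take the
buffer as an explicit parameter; the real routines use the scanner's internal
buffer, and their handling of the terminating NUL may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copies len characters starting at position pos of buffer into s and
   terminates s with NUL. The caller must make sure that pos + len lies
   within buffer and that s has room for len + 1 characters. */
void BufGetString(const char *buffer, long pos, int len, char *s)
{
    assert(buffer != NULL && s != NULL && pos >= 0 && len >= 0);
    memcpy(s, buffer + pos, (size_t)len);
    s[len] = '\0';
}

/* As BufGetString, but converts the result to upper case when ignoreCase
   is non-zero (mirroring the IGNORE CASE behaviour described above). */
void BufGetName(const char *buffer, long pos, int len, char *s, int ignoreCase)
{
    int i;
    BufGetString(buffer, pos, len, s);
    if (ignoreCase)
        for (i = 0; i < len; i++)
            s[i] = (char)toupper((unsigned char)s[i]);
}
```

For the source text "total := 42", position 0 with length 5 yields "total"
(or "TOTAL" when case is ignored), and position 9 with length 2 yields "42".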
For the C++ version, the interface is effectively that shown below, although
there is actually an underlying class hierarchy, so that the declarations are
not exactly the same as those shown. Once again, it is rarely necessary to
make use of any of this interface directly.
class grammarScanner
{ public:
    grammarScanner(int SourceFile, int ignoreCase);
    // Constructs scanner for grammar and associates this with a
    // previously opened SourceFile. Specifies whether to IGNORE CASE
    int Get();
    // Retrieves next token from source
    void GetString(Token *Sym, char *Buffer, int Max);
    // Retrieves at most Max characters from Sym into Buffer
    void GetString(long Pos, char *Buffer, int Max);
    // Retrieves at most Max characters from Pos into Buffer
    void GetName(Token *Sym, char *Buffer, int Max);
    // Retrieves at most Max characters from Sym into Buffer
    // Buffer is capitalized if IGNORE CASE was specified
    long GetLine(long Pos, char *Line, int Max);
    // Retrieves at most Max characters (or until next line break)
    // from position Pos in source file into Line
};
Automatically generated error explanations are written to a file
GrammarE.H by Coco/R in the following form:
"EOF expected",
"ident expected",
"string expected",
"number expected",
...
This text can then be merged into a program to produce textual error
messages. This is done automatically if the $C pragma (/C command line
option) is used.
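One way of merging the generated text by hand is to include it inside an
array initializer. The sketch below shows the idea; so that it stands alone,
the messages are written out where a real program would have the line
#include "GrammarE.H". The names ErrorMsg and ErrorText are hypothetical, and
the sketch assumes that an error number is simply the index of its message in
the list.

```c
#include <assert.h>
#include <stddef.h>

/* Table of error texts. In a real build the entries would come from
   #include "GrammarE.H" placed between the braces. */
static const char *ErrorMsg[] = {
    "EOF expected",
    "ident expected",
    "string expected",
    "number expected"
};

/* Returns the text for errNo, or a fallback for numbers beyond the table.
   Assumption: errNo indexes the table directly and is never negative. */
const char *ErrorText(int errNo)
{
    size_t count = sizeof ErrorMsg / sizeof ErrorMsg[0];
    assert(errNo >= 0);
    if ((size_t)errNo >= count)
        return "unknown error";
    return ErrorMsg[errNo];
}
```

With this table, ErrorText(1) yields "ident expected", and a number outside
the table yields the fallback text.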
Bootstrapping Coco
==================
The parser and scanner used by Coco/R were themselves generated by a
bootstrap process; if Coco/R is given the grammar CR.ATG as input, it will
reproduce the files CRS.C, CRS.H, CRP.C, CRP.H, CRE.H and CRC.H. It can
also regenerate its own main program from the file SOURCES\CR.FRM if the $C
pragma is used.
This means that Coco/R can be extended and corrected by changing its
grammar and recompiling itself. If you feel tempted to do this, please
make sure that you have kept copies of the original system in case you
destroy or corrupt the scanner and parser!
The TASTE package
=================
The distribution kit contains, in the "taste" and "taste_cp" directories,
three related applications of Coco/R: a compiler/interpreter, a
cross-reference generator, and a pretty-printer, for a simple Pascal-like
block structured language. New users will find much of interest in these
applications, which exemplify the use of symbol table construction, code
generation, error handling and so on. Versions are given for both straight
C and also for C++, where the various support modules are all defined as a
simple set of hierarchical classes.
Trademarks
==========
All trademarks are acknowledged.
=END=