like the semicolon, each item of sugar is something that can
potentially cause a compile error by its omission.
DEALING WITH SEMICOLONS
There are two distinct ways in which semicolons are used in
popular languages. In Pascal, the semicolon is regarded as a
statement SEPARATOR. No semicolon is required after the last
statement in a block. The syntax is:
<block> ::= <statement> ( ';' <statement>)*
<statement> ::= <assignment> | <if> | <while> ... | null
(The null statement is IMPORTANT!)
Pascal also defines some semicolons in other places, such as
after the PROGRAM statement.
In C and Ada, on the other hand, the semicolon is considered a
statement TERMINATOR, and follows all statements (with some
embarrassing and confusing exceptions). The syntax for this is
simply:
<block> ::= ( <statement> ';')*
Of the two syntaxes, the Pascal one seems on the face of it more
rational, but experience has shown that it leads to some strange
difficulties. People get so used to typing a semicolon after
every statement that they tend to type one after the last
statement in a block, also. That usually doesn't cause any harm
... it just gets treated as a null statement. Many Pascal
programmers, including yours truly, do just that. But there is
one place you absolutely CANNOT type a semicolon, and that's
right before an ELSE. This little gotcha has cost me many an
extra compilation, particularly when the ELSE is added to
existing code. So the C/Ada choice turns out to be better.
Apparently Niklaus Wirth thinks so, too: In his Modula 2, he
abandoned the Pascal approach.
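The ELSE gotcha is easy to reproduce in ordinary Pascal. The short
program below is only an illustration (the names Demo and X are made
up for the example). The commented-out form is the one the compiler
rejects.

```pascal
program Demo;
var
   X: integer;
begin
   X := 1;

   { Legal: no semicolon between the THEN part and ELSE }
   if X > 0 then
      WriteLn('positive')
   else
      WriteLn('not positive');

   { Illegal: the semicolon ends the IF statement, so the compiler
     then meets an ELSE that belongs to nothing:

     if X > 0 then
        WriteLn('positive');
     else
        WriteLn('not positive');
   }

   { Harmless: the semicolon before END is followed by a null statement }
   X := 2;
end.
```

In a terminator-style language the question never comes up, because
the semicolon is simply part of every statement.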
Given either of these two syntaxes, it's an easy matter (now that
we've reorganized the parser!) to add these features to our
parser. Let's take the last case first, since it's simpler.
To begin, I've made things easy by introducing a new recognizer:
{--------------------------------------------------------------}
{ Match a Semicolon }
procedure Semi;
begin
   MatchString(';');
end;
{--------------------------------------------------------------}
This procedure works very much like our old Match. It insists on
finding a semicolon as the next token. Having found it, it skips
to the next token.
Since a semicolon follows a statement, procedure Block is almost
the only one we need to change:
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
   Scan;
   while not(Token in ['e', 'l']) do begin
      case Token of
       'i': DoIf;
       'w': DoWhile;
       'R': DoRead;
       'W': DoWrite;
       'x': Assignment;
      end;
      Semi;
      Scan;
   end;
end;
{--------------------------------------------------------------}
Note carefully the subtle change in the case statement. The call
to Assignment is now guarded by a test on Token. This is to
avoid calling Assignment when the token is a semicolon (which
could happen if the statement is null).
Since declarations are also statements, we also need to add a
call to Semi within procedure TopDecls:
{--------------------------------------------------------------}
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
   Scan;
   while Token = 'v' do begin
      Alloc;
      while Token = ',' do
         Alloc;
      Semi;
   end;
end;
{--------------------------------------------------------------}
Finally, we need one for the PROGRAM statement:
{--------------------------------------------------------------}
{ Main Program }
begin
   Init;
   MatchString('PROGRAM');
   Semi;
   Header;
   TopDecls;
   MatchString('BEGIN');
   Prolog;
   Block;
   MatchString('END');
   Epilog;
end.
{--------------------------------------------------------------}
It's as easy as that. Try it with a copy of TINY and see how you
like it.
The Pascal version is a little trickier, but it still only
requires minor changes, and those only to procedure Block. To
keep things as simple as possible, let's split the procedure into
two parts. The following procedure handles just one statement:
{--------------------------------------------------------------}
{ Parse and Translate a Single Statement }
procedure Statement;
begin
   Scan;
   case Token of
    'i': DoIf;
    'w': DoWhile;
    'R': DoRead;
    'W': DoWrite;
    'x': Assignment;
   end;
end;
{--------------------------------------------------------------}
Using this procedure, we can now rewrite Block like this:
{--------------------------------------------------------------}
{ Parse and Translate a Block of Statements }
procedure Block;
begin
   Statement;
   while Token = ';' do begin
      Next;
      Statement;
   end;
end;
{--------------------------------------------------------------}
That sure didn't hurt, did it? We can now parse semicolons in
Pascal-like fashion.
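If you'd like to watch the separator rule at work without the rest
of the compiler, here is a minimal stand-alone sketch. It is NOT part
of TINY. It assumes a made-up stream of one-character tokens ('s' for
any statement, ';' for a semicolon, 'e' for END), and the names Src,
Idx, Look and Accepts are invented for the demonstration. Only the
shape of Block is taken from the listing above.

```pascal
program SepDemo;
{ Toy recognizer that treats the semicolon as a statement separator }
var
   Src: string;      { token stream under test }
   Idx: integer;     { index of the current token in Src }

{ Return the current token, or 'e' if we have run off the end }
function Look: char;
begin
   if Idx <= Length(Src) then Look := Src[Idx]
   else Look := 'e';
end;

{ Parse one statement; anything else counts as the null statement }
procedure Statement;
begin
   if Look = 's' then Inc(Idx);
end;

{ Parse a block: a statement, then any number of semicolon-statement pairs }
procedure Block;
begin
   Statement;
   while Look = ';' do begin
      Inc(Idx);
      Statement;
   end;
end;

{ True if the whole string is one block followed by END }
function Accepts(S: string): boolean;
begin
   Src := S;
   Idx := 1;
   Block;
   Accepts := (Idx = Length(Src)) and (Look = 'e');
end;

begin
   WriteLn('s;s;se  ', Accepts('s;s;se'));    { separators only }
   WriteLn('s;s;s;e ', Accepts('s;s;s;e'));   { trailing semicolon, null statement }
   WriteLn('sse     ', Accepts('sse'));       { missing separator }
end.
```

The first two inputs are accepted and the third is rejected. The
second case works only because the null statement is allowed, which
is why the grammar makes a point of it.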
A COMPROMISE
Now that we know how to deal with semicolons, does that mean that
I'm going to put them in KISS/TINY? Well, yes and no. I like
the extra sugar and the security that comes with knowing for sure
where the ends of statements are. But I haven't changed my
dislike for the compilation errors associated with semicolons.
So I have what I think is a nice compromise: Make them OPTIONAL!
Consider the following version of Semi:
{--------------------------------------------------------------}
{ Match a Semicolon }
procedure Semi;
begin
   if Token = ';' then Next;
end;
{--------------------------------------------------------------}
This procedure will ACCEPT a semicolon whenever it is called, but
it won't INSIST on one. That means that when you choose to use
semicolons, the compiler will use the extra information to help
keep itself on track. But if you omit one (or omit them all) the
compiler won't complain. The best of both worlds.
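The effect can be tried in isolation with another small sketch. As
before, this is a toy and not TINY itself: the one-character tokens
('s' for a statement, ';' for a semicolon, 'e' for END) and the names
Src, Idx, Look and Accepts are assumptions made for the
demonstration. Semi has the same logic as the procedure above.

```pascal
program OptSemi;
{ Toy recognizer in which a semicolon after a statement is optional }
var
   Src: string;      { token stream under test }
   Idx: integer;     { index of the current token in Src }

{ Return the current token, or 'e' if we have run off the end }
function Look: char;
begin
   if Idx <= Length(Src) then Look := Src[Idx]
   else Look := 'e';
end;

{ Accept a semicolon if one is present, but do not insist on it }
procedure Semi;
begin
   if Look = ';' then Inc(Idx);
end;

{ Parse statements, each optionally followed by a semicolon }
procedure Block;
begin
   while Look in ['s', ';'] do begin
      if Look = 's' then Inc(Idx);   { one statement, possibly null }
      Semi;
   end;
end;

{ True if the whole string is one block followed by END }
function Accepts(S: string): boolean;
begin
   Src := S;
   Idx := 1;
   Block;
   Accepts := (Idx = Length(Src)) and (Look = 'e');
end;

begin
   WriteLn('s;s;e ', Accepts('s;s;e'));   { every semicolon present }
   WriteLn('sse   ', Accepts('sse'));     { no semicolons at all }
   WriteLn('s;se  ', Accepts('s;se'));    { a mixture }
end.
```

All three inputs are accepted, which is the whole point: the
semicolons help when they are there and are not missed when they
aren't.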
Put this procedure in place in the first version of your program
(the one for C/Ada syntax), and you have the makings of TINY
Version 1.2.
COMMENTS
Up until now I have carefully avoided the subject of comments.
You would think that this would be an easy subject ... after all,
the compiler doesn't have to deal with comments at all; it should
just ignore them. Well, sometimes that's true.
Comments can be just about as easy or as difficult as you choose
to make them. At one extreme, we can arrange things so that
comments are intercepted almost the instant they enter the
compiler. At the other, we can treat them as lexical elements.
Things tend to get interesting when you consider things like
comment delimiters contained in quoted strings.
SINGLE-CHARACTER DELIMITERS
Here's an example. Suppose we assume the Turbo Pascal standard
and use curly braces for comments. In this case we have single-
character delimiters, so our parsing is a little easier.
One approach is to strip the comments out the instant we
encounter them in the input stream; that is, right in procedure
GetChar. To do this, first change the name of GetChar to
something else, say GetCharX. (For the record, this is going to
be a TEMPORARY change, so best not do this with your only copy of
TINY. I assume you understand that you should always do these
experiments with a working copy.)
Now, we're going to need a procedure to skip over comments. So
key in the following one:
{--------------------------------------------------------------}
{ Skip A Comment Field }
procedure SkipComment;
begin
   while Look <> '}' do
      GetCharX;
   GetCharX;
end;
{--------------------------------------------------------------}
Clearly, what this procedure is going to do is to simply read and
discard characters from the input stream, until it finds a right
curly brace. Then it reads one more character and returns it in
Look.
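To see that behavior on its own, here is a minimal stand-alone
sketch. It is an illustration, not the compiler's code: the input
comes from a string instead of the real input stream, and Src, Idx,
Clean, EndOfInput and the driver loop in the main program are all
invented for the demonstration. SkipComment is the procedure above
with one added test so that an unterminated comment cannot loop
forever.

```pascal
program StripDemo;
const
   EndOfInput = #26;     { assumed end marker (Ctrl-Z); not from the text }
var
   Src: string;          { stands in for the input stream }
   Idx: integer;         { index of the next unread character }
   Look: char;           { lookahead character, as in the compiler }
   Clean: string;        { the input with comments removed }

{ Read the next raw character into Look }
procedure GetCharX;
begin
   if Idx <= Length(Src) then begin
      Look := Src[Idx];
      Inc(Idx);
   end
   else Look := EndOfInput;
end;

{ Skip a comment field; the end-of-input test is an added safety guard }
procedure SkipComment;
begin
   while (Look <> '}') and (Look <> EndOfInput) do
      GetCharX;
   GetCharX;
end;

begin
   Src := 'x=1{set x}y=2';
   Idx := 1;
   Clean := '';
   GetCharX;
   { Hypothetical driver: copy characters, dropping any comment fields }
   while Look <> EndOfInput do begin
      if Look = '{' then SkipComment
      else begin
         Clean := Clean + Look;
         GetCharX;
      end;
   end;
   WriteLn(Clean);
end.
```

Run as written, this prints x=1y=2. The comment and both of its
braces are gone, and Look is left holding the first character after
the closing brace, ready for whoever asks for it next.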
Now we can write a new version of GetChar that calls SkipComment to
strip out comments:
{--------------------------------------------------------------}
{ Get Character from Input Stream }
{ Skip Any Comments }