LET'S BUILD A COMPILER!
By
Jack W. Crenshaw, Ph.D.
5 June 1989
Part XII: MISCELLANY
*****************************************************************
* *
* COPYRIGHT NOTICE *
* *
* Copyright (C) 1989 Jack W. Crenshaw. All rights reserved. *
* *
*****************************************************************
INTRODUCTION
This installment is another one of those excursions into side
alleys that don't seem to fit into the mainstream of this
tutorial series. As I mentioned last time, it was while I was
writing this installment that I realized some changes had to be
made to the compiler structure. So I had to digress from this
digression long enough to develop the new structure and show it
to you.
Now that that's behind us, I can tell you what I set out to in
the first place. This shouldn't take long, and then we can get
back into the mainstream.
Several people have asked me about things that other languages
provide, but that so far I haven't addressed in this series. The two
biggies are semicolons and comments. Perhaps you've wondered
about them, too, and wondered how things would change if we had
to deal with them. Just so you can proceed with what's to come,
without being bothered by that nagging feeling that something is
missing, we'll address such issues here.
SEMICOLONS
Ever since the introduction of Algol, semicolons have been a part
of almost every modern language. We've all used them to the
point that they are taken for granted. Yet I suspect that more
compilation errors have occurred due to misplaced or missing
semicolons than any other single cause. And if we had a penny
for every extra keystroke programmers have used to type the
little rascals, we could pay off the national debt.
Having been brought up with FORTRAN, it took me a long time to
get used to using semicolons, and to tell the truth I've never
quite understood why they were necessary. Since I program in
Pascal, and since the use of semicolons in Pascal is particularly
tricky, that one little character is still by far my biggest
source of errors.
When I began developing KISS, I resolved to question EVERY
construct in other languages, and to try to avoid the most common
problems that occur with them. That puts the semicolon very high
on my hit list.
To understand the role of the semicolon, you have to look at a
little history.
Early programming languages were line-oriented. In FORTRAN, for
example, various parts of the statement had specific columns or
fields that they had to appear in. Since some statements were
too long for one line, the "continuation card" mechanism was
provided to let the compiler know that a given card was still
part of the previous line. The mechanism survives to this day,
even though punched cards are now things of the distant past.
When other languages came along, they also adopted various
mechanisms for dealing with multiple-line statements. BASIC is a
good example. It's important to recognize, though, that the
FORTRAN mechanism was not so much required by the line
orientation of that language, as by the column-orientation. In
those versions of FORTRAN where free-form input is permitted,
it's no longer needed.
When the fathers of Algol introduced that language, they wanted
to get away from line-oriented programs like FORTRAN and BASIC,
and allow for free-form input. This included the possibility of
stringing multiple statements on a single line, as in
a=b; c=d; e=e+1;
In cases like this, the semicolon is almost REQUIRED. The same
line, without the semicolons, just looks "funny":
a=b c= d e=e+1
I suspect that this is the major ... perhaps ONLY ... reason for
semicolons: to keep programs from looking funny.
But the idea of stringing multiple statements together on a
single line is a dubious one at best. It's not very good
programming style, and harks back to the days when it was
considered important to conserve cards. In these days of CRT's
and indented code, the clarity of programs is far better served
by keeping statements separate. It's still nice to have the
OPTION of multiple statements, but it seems a shame to keep
programmers in slavery to the semicolon, just to keep that one
rare case from "looking funny."
When I started in with KISS, I tried to keep an open mind. I
decided that I would use semicolons when it became necessary for
the parser, but not until then. I figured this would happen just
about the time I added the ability to spread statements over
multiple lines. But, as you can see, that never happened. The
TINY compiler is perfectly happy to parse the most complicated
statement, spread over any number of lines, without semicolons.
Still, there are people who have used semicolons for so long,
they feel naked without them. I'm one of them. Once I had KISS
defined sufficiently well, I began to write a few sample programs
in the language. I discovered, somewhat to my horror, that I
kept putting semicolons in anyway. So now I'm facing the
prospect of a NEW rash of compiler errors, caused by UNWANTED
semicolons. Phooey!
Perhaps more to the point, there are readers out there who are
designing their own languages, which may include semicolons, or
who want to use the techniques of these tutorials to compile
conventional languages like C. In either case, we need to be
able to deal with semicolons.
SYNTACTIC SUGAR
This whole discussion brings up the issue of "syntactic sugar"
... constructs that are added to a language, not because they are
needed, but because they help make the programs look right to the
programmer. After all, it's nice to have a small, simple
compiler, but it would be of little use if the resulting
language were cryptic and hard to program. The language FORTH
comes to mind (a premature OUCH! for the barrage I know that
one's going to fetch me). If we can add features to the language
that make the programs easier to read and understand, and if
those features help keep the programmer from making errors, then
we should do so. Particularly if the constructs don't add much
to the complexity of the language or its compiler.
The semicolon could be considered an example, but there are
plenty of others, such as the 'THEN' in an IF-statement, the 'DO'
in a WHILE-statement, and even the 'PROGRAM' statement, which I
came within a gnat's eyelash of leaving out of TINY. None of
these tokens add much to the syntax of the language ... the
compiler can figure out what's going on without them. But some
folks feel that they DO add to the readability of programs, and
that can be very important.
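To see how little work such a token really does for the parser, here
is a minimal sketch. It is written in Python rather than the Pascal
used in this series, and it assumes a much-simplified IF of the form
IF <condition> [THEN] <statements> ENDIF, with every condition and
statement boiled down to a single token. The name parse_if and that
token layout are inventions for illustration only; this is not the
TINY parser.

```python
def parse_if(tokens):
    """Parse IF <cond> [THEN] <stmt>... ENDIF and return (cond, body).

    Hypothetical helper: each condition and statement is one token.
    """
    pos = 0

    def match(word):
        # Insist on a specific keyword at the current position.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"{word} expected, found {found}")
        pos += 1

    match("IF")
    if pos >= len(tokens):
        raise SyntaxError("condition expected, found end of input")
    cond = tokens[pos]
    pos += 1

    # THEN is pure sugar here: skip it if present, carry on if not.
    if pos < len(tokens) and tokens[pos] == "THEN":
        pos += 1

    body = []
    while pos < len(tokens) and tokens[pos] != "ENDIF":
        body.append(tokens[pos])
        pos += 1
    match("ENDIF")
    return cond, body


with_sugar = parse_if("IF x THEN y ENDIF".split())
without_sugar = parse_if("IF x y ENDIF".split())
print(with_sugar == without_sugar)
```

Both calls yield the same result, because the parser already knows
where the condition ends. Making the THEN mandatory would amount to
replacing the optional skip with match("THEN"), which buys a check
on the programmer's intent at the cost of one more thing to forget.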
There are two schools of thought on this subject, which are well
represented by two of our most popular languages, C and Pascal.
To the minimalists, all such sugar should be left out. They
argue that it clutters up the language and adds to the number of
keystrokes programmers must type. Perhaps more importantly,
every extra token or keyword represents a trap lying in wait for
the inattentive programmer. If you leave out a token, misplace
it, or misspell it, the compiler will get you. So these people
argue that the best approach is to get rid of such things. These
folks tend to like C, which has a minimum of unnecessary keywords
and punctuation.
Those from the other school tend to like Pascal. They argue that
having to type a few extra characters is a small price to pay for
legibility. After all, humans have to read the programs, too.
Their best argument is that each such construct is an opportunity
to tell the compiler that you really do mean what you said. The
sugary tokens serve as useful landmarks to help you find your way.
The differences are well represented by the two languages. The
most oft-heard complaint about C is that it is too forgiving.
When you make a mistake in C, the erroneous code is too often
another legal C construct. So the compiler just happily
continues to compile, and leaves you to find the error during
debug. I guess that's why debuggers are so popular with C
programmers.
On the other hand, if a Pascal program compiles, you can be
pretty sure that the program will do what you told it. If there
is an error at run time, it's probably a design error.
The best example of useful sugar is the semicolon itself.
Consider the code fragment:
a=1+(2*b+c) b...
Since there is no operator connecting the token 'b' with the rest
of the statement, the compiler will conclude that the expression
ends with the ')', and the 'b' is the beginning of a new
statement. But suppose I have simply left out the intended
operator, and I really want to say:
a=1+(2*b+c)*b...
In this case the compiler will report an error, all right, but it
won't be very meaningful, since it will be expecting an '=' sign
after the 'b' that really shouldn't be there.
If, on the other hand, I include a semicolon after the 'b', THEN
there can be no doubt where I intend the statement to end. A
compiler that insists on the semicolon will flag the stray 'b'
immediately after the ')', which is where the mistake actually is.
Syntactic sugar, then, can serve a very useful purpose by
providing some additional insurance that we remain on track.
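The difference in error reporting can be tried out with a small
sketch. It is in Python, not the Pascal of this series, and it
assumes a toy grammar: assignments only, single-character names and
digits, the four arithmetic operators, and parentheses. The names
parse_program, ParseError, and require_semicolon are made up for
this illustration and are not part of TINY or KISS.

```python
class ParseError(Exception):
    """Raised when the input does not match the toy grammar."""


def parse_program(src, require_semicolon):
    """Parse a run of assignments and return the names assigned to."""
    text = "".join(src.split())  # white space, newlines included, is ignored
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            found = peek() or "end of input"
            raise ParseError(f"'{ch}' expected at position {pos}, found {found}")
        pos += 1

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            expression()
            expect(")")
        elif peek().isalnum():
            pos += 1  # a single letter or digit
        else:
            raise ParseError(f"operand expected at position {pos}")

    def term():
        nonlocal pos
        factor()
        while peek() in ("*", "/"):
            pos += 1
            factor()

    def expression():
        nonlocal pos
        term()
        while peek() in ("+", "-"):
            pos += 1
            term()

    def assignment():
        nonlocal pos
        name = peek()
        if not name.isalpha():
            raise ParseError(f"name expected at position {pos}")
        pos += 1
        expect("=")
        expression()
        if require_semicolon:
            expect(";")  # the sugar: statement must end here
        return name

    names = []
    while pos < len(text):
        names.append(assignment())
    return names


# The fragment with the operator left out, parsed under both rules.
for required in (False, True):
    try:
        parse_program("a=1+(2*b+c) b", required)
    except ParseError as err:
        print("semicolons required:", required, "->", err)
```

Without semicolons, the sketch takes the 'b' as the start of a new
statement and only complains about a missing '=' once it runs out of
input. With semicolons required, it stops at the 'b' itself, just
after the ')'. The same routine also accepts semicolon-free input
spread over several lines, since line ends are treated as white
space.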
I find myself somewhere in the middle of all this. I tend to
favor the Pascal-ers' view ... I'd much rather find my bugs at
compile time than at run time. But I also hate to just throw
verbosity in for no apparent reason, as in COBOL. So far I've
consistently left most of the Pascal sugar out of KISS/TINY. But
I certainly have no strong feelings either way, and I also can
see the value of sprinkling a little sugar around just for the
extra insurance that it brings. If you like this latter
approach, things like that are easy to add. Just remember that,