                            LET'S BUILD A COMPILER!

                                       By

                            Jack W. Crenshaw, Ph.D.

                                  5 June 1989


                              Part XII: MISCELLANY


       *****************************************************************
       *                                                               *
       *                        COPYRIGHT NOTICE                       *
       *                                                               *
       *   Copyright (C) 1989 Jack W. Crenshaw. All rights reserved.   *
       *                                                               *
       *****************************************************************


       INTRODUCTION

       This installment is another one  of  those  excursions  into side
       alleys  that  don't  seem to fit  into  the  mainstream  of  this
       tutorial  series.    As I mentioned last time, it was while I was
       writing this installment that I realized some changes  had  to be
       made  to  the  compiler structure.  So I had to digress from this
       digression long enough to develop the new structure  and  show it
       to you.

       Now that that's behind us, I can tell you what I  set  out  to in
       the first place.  This shouldn't  take  long, and then we can get
       back into the mainstream.

       Several people have asked me about things that other languages
       provide but that I haven't yet addressed in this series.  The two
       biggies are semicolons and comments.  Perhaps you've wondered
       about them, too, and how things would change if we had to deal
       with them.  Just so you can proceed with what's to come, without
       being bothered by that nagging feeling that something is missing,
       we'll address such issues here.


       SEMICOLONS

       Ever since the introduction of Algol, semicolons have been a part
       of  almost every modern language.  We've all  used  them  to  the
       point that they are taken for  granted.   Yet I suspect that more
       compilation errors have  occurred  due  to  misplaced  or missing
       semicolons  than  any  other single cause.  And if we had a penny
       for  every  extra  keystroke programmers have used  to  type  the
       little rascals, we could pay off the national debt.

       Because I was brought up on FORTRAN, it took me a long time to
       get used to using semicolons, and to tell the  truth  I've  never
       quite understood why they  were  necessary.    Since I program in
       Pascal, and since the use of semicolons in Pascal is particularly
       tricky,  that one little character is still  by  far  my  biggest
       source of errors.

       When  I  began  developing  KISS,  I resolved to  question  EVERY
       construct in other languages, and to try to avoid the most common
       problems that occur with them.  That puts the semicolon very high
       on my hit list.

       To  understand  the  role of the semicolon, you have to look at a
       little history.

       Early programming languages were line-oriented.  In  FORTRAN, for
       example, various parts  of  the statement had specific columns or
       fields that they had to appear in.  Since  some  statements  were
       too  long for one line, the  "continuation  card"  mechanism  was
       provided to let  the  compiler  know  that a given card was still
       part of the previous  line.   The mechanism survives to this day,
       even though punched cards are now things of the distant past.

       When  other  languages  came  along,  they  also  adopted various
       mechanisms for dealing with multiple-line statements.  BASIC is a
       good  example.  It's important to  recognize,  though,  that  the
       FORTRAN  mechanism  was   not   so  much  required  by  the  line
       orientation of that  language,  as by the column-orientation.  In
       those versions of FORTRAN  where  free-form  input  is permitted,
       it's no longer needed.

       When the fathers  of  Algol introduced that language, they wanted
       to get away  from  line-oriented programs like FORTRAN and BASIC,
       and allow for free-form input.   This included the possibility of
       stringing multiple statements on a single line, as in


            a=b; c=d; e=e+1;


       In cases like this,  the  semicolon is almost REQUIRED.  The same
       line, without the semicolons, just looks "funny":


            a=b c= d e=e+1

       I suspect that this is the major ... perhaps ONLY ...  reason for
       semicolons: to keep programs from looking funny.
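       To see why the semicolons in that first line are not strictly
       needed by the parser, here is a minimal sketch in Python.  It is
       an illustration only, not code from TINY, and it assumes that
       every token is a single character and that every statement has
       the form variable '=' operand, followed by any number of
       operator/operand pairs.  Because an expression ends at the first
       token that is not an operator, the next letter must start a new
       statement.  The function name split_statements is hypothetical;
       it simply skips any semicolons it meets.

```python
# A minimal sketch, NOT code from TINY. Assumptions: every token is a single
# character, and every statement has the form
#     <letter> '=' <operand> { <operator> <operand> }
# Semicolons are optional: any that appear between statements are skipped.

OPERATORS = set("+-*/")


def split_statements(src: str) -> list[str]:
    """Return the statements found in src, with or without semicolons."""
    toks = [c for c in src if not c.isspace()]
    pos = 0
    stmts = []

    def expect(ok, what):
        # Consume one token if ok(token) is true; otherwise report an error.
        nonlocal pos
        if pos >= len(toks) or not ok(toks[pos]):
            found = repr(toks[pos]) if pos < len(toks) else "end of input"
            raise SyntaxError(f"expected {what}, found {found}")
        pos += 1

    while pos < len(toks):
        if toks[pos] == ";":          # optional separator: skip it
            pos += 1
            continue
        start = pos
        expect(str.isalpha, "a variable")
        expect(lambda t: t == "=", "'='")
        expect(str.isalnum, "an operand")
        # The expression ends at the first token that is not an operator,
        # so the next letter must begin a new statement.
        while pos < len(toks) and toks[pos] in OPERATORS:
            pos += 1
            expect(str.isalnum, "an operand")
        stmts.append("".join(toks[start:pos]))
    return stmts


print(split_statements("a=b; c=d; e=e+1;"))  # ['a=b', 'c=d', 'e=e+1']
print(split_statements("a=b c= d e=e+1"))    # ['a=b', 'c=d', 'e=e+1']
```

       Both lines come out as the same three statements.  Skipping
       semicolons rather than rejecting them is a design choice: it
       tolerates programmers who type them out of habit, but it gives up
       the error checking that a required terminator would provide.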

       But  the  idea  of stringing multiple statements  together  on  a
       single  line  is  a  dubious  one  at  best.  It's not very  good
       programming  style,  and  harks back to  the  days  when  it  was
       considered important to conserve cards.  In these days of CRTs
       and indented code, the clarity of programs is  far  better served
       by  keeping statements separate.  It's still  nice  to  have  the
       OPTION  of  multiple  statements,  but  it seems a shame to  keep
       programmers  in  slavery  to the semicolon, just to keep that one
       rare case from "looking funny."

       When I started in with KISS, I tried  to  keep  an  open mind.  I
       decided that I would use  semicolons when it became necessary for
       the parser, but not until then.  I figured this would happen just
       about  the time I added the ability  to  spread  statements  over
       multiple lines.  But, as you  can  see, that never happened.  The
       TINY compiler is perfectly  happy  to  parse the most complicated
       statement, spread over any number of lines, without semicolons.

       Still, there are people  who  have  used  semicolons for so long,
       they feel naked  without them.  I'm one of them.  Once I had KISS
       defined sufficiently well, I began to write a few sample programs
       in the language.    I  discovered,  somewhat to my horror, that I
       kept  putting  semicolons  in anyway.   So  now  I'm  facing  the
       prospect of a NEW  rash  of  compiler  errors, caused by UNWANTED
       semicolons.  Phooey!

       Perhaps more to the point, there are readers out  there  who  are
       designing their own languages, which may  include  semicolons, or
       who  want to use the techniques of  these  tutorials  to  compile
       conventional languages like  C.    In  either case, we need to be
       able to deal with semicolons.


       SYNTACTIC SUGAR

       This whole discussion brings  up  the  issue of "syntactic sugar"
       ... constructs that are added to a language, not because they are
       needed, but because they help make the programs look right to the
       programmer.    After  all, it's nice  to  have  a  small,  simple
       compiler,    but  it  would  be  of  little  use if the resulting
       language  were  cryptic  and hard to program.  The language FORTH
       comes  to mind (a premature OUCH! for the  barrage  I  know  that
       one's going to fetch me).  If we can add features to the language
       that  make the programs easier to read  and  understand,  and  if
       those features  help keep the programmer from making errors, then
       we should do so.    Particularly if the constructs don't add much
       to the complexity of the language or its compiler.

       The  semicolon  could  be considered an example,  but  there  are
       plenty of others, such as the 'THEN' in an IF-statement, the 'DO'
       in a WHILE-statement,  and  even the 'PROGRAM' statement, which I
       came within a gnat's eyelash of leaving out  of  TINY.    None of
       these tokens  add  much  to  the  syntax  of the language ... the
       compiler can figure out  what's  going on without them.  But some
       folks feel that they  DO  add to the readability of programs, and
       that can be very important.

       There are two schools of thought on this subject, which  are well
       represented by two of our most popular languages, C and Pascal.

       To  the minimalists, all such sugar should be  left  out.    They
       argue that it clutters up the language and adds to the  number of
       keystrokes  programmers  must type.   Perhaps  more  importantly,
       every extra token or keyword represents a trap lying in wait for
       the inattentive programmer.  If you leave out  a  token, misplace
       it, or misspell it, the compiler  will  get you.  So these people
       argue that the best approach is to get rid of such things.  These
       folks tend to like C, which has a minimum of unnecessary keywords
       and punctuation.

       Those from the other school tend to like Pascal.  They argue that
       having  to type a few extra characters is a small price to pay for
       legibility.    After  all, humans have to read the programs, too.
       Their best argument is that each such construct is an opportunity
       to tell the compiler that you really mean for it  to  do what you
       said to.  The sugary tokens serve as useful landmarks to help you
       find your way.

       The differences are well represented by the two  languages.   The
       most oft-heard complaint about  C  is  that  it is too forgiving.
       When you make a mistake in C, the  erroneous  code  is  too often
       another  legal  C  construct.    So  the  compiler  just  happily
       continues to compile, and  leaves  you  to  find the error during
       debug.    I guess that's why debuggers  are  so  popular  with  C
       programmers.

       On the  other  hand,  if  a  Pascal  program compiles, you can be
       pretty  sure that the program will do what you told it.  If there
       is an error at run time, it's probably a design error.

       The  best  example  of  useful  sugar  is  the semicolon  itself.
       Consider the code fragment:


            a=1+(2*b+c)   b...


       Since there is no operator connecting the token 'b' with the rest
       of the  statement, the compiler will conclude that the expression
       ends  with  the  ')', and the 'b'  is  the  beginning  of  a  new
       statement.    But  suppose  I  have simply left out the  intended
       operator, and I really want to say:


            a=1+(2*b+c)*b...


       In  this  case  the compiler will get an error, all right, but it
       won't be very meaningful  since  it will be expecting an '=' sign
       after the 'b' that really shouldn't be there.

       If, on the other hand, I include a semicolon after the  'b', THEN
       there  can  be no doubt where I  intend  the  statement  to  end.
       Syntactic  sugar,  then,  can  serve  a  very  useful purpose  by
       providing some additional insurance that we remain on track.
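       The difference between the two error messages can be seen in a
       small recursive-descent sketch in Python.  This is hypothetical
       code, not from TINY, and it assumes single-character tokens, only
       '+' and '*' as operators, and a require_semicolons flag that makes
       ';' mandatory at the end of every statement.  The statement is
       typed with the '*' missing, as in the example above; the trailing
       '+1' is an arbitrary stand-in for the rest of the statement.

```python
# A minimal sketch, NOT code from TINY. Assumptions: single-character tokens,
# assignments built from letters, digits, '+', '*', and parentheses.
# With require_semicolons=True every statement must end in ';'.

class Parser:
    def __init__(self, src: str, require_semicolons: bool = False):
        self.toks = [c for c in src if not c.isspace()]
        self.pos = 0
        self.require_semicolons = require_semicolons

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def fail(self, what):
        raise SyntaxError(f"expected {what} at token {self.pos}, found {self.peek()!r}")

    def match(self, ch):
        if self.peek() != ch:
            self.fail(repr(ch))
        self.pos += 1

    def factor(self):                  # factor ::= '(' expression ')' | letter | digit
        t = self.peek()
        if t == "(":
            self.match("(")
            self.expression()
            self.match(")")
        elif t is not None and t.isalnum():
            self.pos += 1
        else:
            self.fail("a factor")

    def term(self):                    # term ::= factor { '*' factor }
        self.factor()
        while self.peek() == "*":
            self.pos += 1
            self.factor()

    def expression(self):              # expression ::= term { '+' term }
        self.term()
        while self.peek() == "+":
            self.pos += 1
            self.term()

    def assignment(self):              # assignment ::= letter '=' expression [';']
        t = self.peek()
        if t is None or not t.isalpha():
            self.fail("a variable")
        self.pos += 1
        self.match("=")
        self.expression()
        if self.require_semicolons:
            self.match(";")

    def program(self):
        while self.peek() is not None:
            self.assignment()


def diagnose(src: str, require_semicolons: bool) -> str:
    """Return 'ok' or the first syntax error message."""
    try:
        Parser(src, require_semicolons).program()
        return "ok"
    except SyntaxError as e:
        return str(e)


# Intended: a=1+(2*b+c)*b+1, but the '*' before the second 'b' was left out.
typo = "a=1+(2*b+c) b+1"
print(diagnose(typo, require_semicolons=False))       # expected '=' at token 12, found '+'
print(diagnose(typo + ";", require_semicolons=True))  # expected ';' at token 11, found 'b'
```

       Without semicolons, the parser accepts a=1+(2*b+c) as a complete
       statement and complains only one token past the stray 'b'.  With
       the terminator required, the error lands on the 'b' itself, which
       is exactly where the missing operator belongs.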

       I find  myself  somewhere  in  the middle of all this.  I tend to
       favor the Pascal-ers' view ... I'd much rather find my bugs at
       compile time than at run time.  But I also hate to just throw
       verbosity  in  for  no apparent reason, as in COBOL.  So far I've
       consistently left most of the Pascal sugar out of KISS/TINY.  But
       I certainly have no strong feelings either way, and  I  also  can
       see the value of sprinkling a little sugar around  just  for  the
       extra  insurance  that  it  brings.    If  you like  this  latter
       approach, things like that are easy to add.  Just  remember that,