📄 pascals.txt
字号:
Pascal-S: A Subset and its
Implementation
N. Wirth
1. Aims and motivation
Several years ago, the Computer Science Department of ETH Zurich had started to
use the programming language Pascal in its introductory programming courses [1, 2] .
These courses are taught mainly to engineers, physicists, and mathematicians in
their first year. The large number of participants dictates the use of an efficient,
and economical system; economical with regard to the students learning effort,
to computing time, and to storage requirements. The fIrst demand requires a
system with comprehensive syntax and runtime error checking and the provision of
meaningful, well-explained diagnostics. Machine economy was realized through a
combination of a compiler and a sub-batch monitor, the latter alternately invoking
compilation and program execution. This scheme requires an absolutely watertight
protection against errors in compiled programs and, of course, an entirely error-free
compiler and monitor.
The system developed for this purpose by R. Schild proved to be highly suc-
cessful, and it turned out to be so economical with respect to computing time that
the system's extensive use by about 350 students amounted to less than 0.5 per
cent. of the entire computing services provided by the computation centre averaged
over the entire term, although the student job batch was collected, run, and retur-
ned four times per day.
Nevertheless, there were also some disadvantages. First of all, there were only
four fixed times when jobs were collected (and sometimes fewer due to machine
failures). Then, the student jobs had to be handled separately from all other jobs
separate batching and special job cards to be provided by the operators). These
drawbacks and the advent of a separate medium-speed batch terminal located in
the students' program preparation room indicated that a system with different
characteristics might be more appropriate. The students' batch terminal allows for
self-service. It is therefore highly desirable-if not mandatory-that each student's
job be scheduled and run independently of other jobs. Fast turnaround can-under
the prospect of large numbers of jobs- only be guaranteed, if the jobs use relatively
little store. In fact, storage space is at a much higher premium than processor time.
Under these conditions, a compact system is mandatory. Without compromising
on the demand for extensive error checking, our interpretive solution appears as
most promising because it allows for a simple compiler and dense code. Program
size can be further .reduced if the system is restricted to handling that subset of
Pascal which is actually taught in these introductory courses. Hence, the new
system was intentionally designed to process a subset. The resulting reduction of
development labour was an additonal incentive for this decision.
In choosing an interpretive approach, one must be aware of and willing to
accept a substantial loss of efficiency in program execution. A factor of 30 compared
to reasonably good compiled code is not unusual, and a factor of 20 must be
called 'very good'. Such factors can of course, only be accepted if the gains expected
elsewhere are equally substantial. They can only be compensated if the execution
effort of the compiled program is relatively small, say at least 20 times
smaller than job initiation, compiler loading, compilation, and program loading
together.
In view of the everyday performance figures of common operating systems, this
condition is indeed satisfied for very many problems that can successfully be
asssigned as programming exercises. The most important single factor is gained by
eliminating the need for a relocation loader. This is achieved by directly depositing
the compiled code in store. As programs tend to be small, even the demand for
storage economy cannot be used as a strong counter argument agains this strategy.
The system described in this report consists of a compiler and an interpreter for
a subset of Pascal called Pascal-S. Section 2 defmes that subset; it contains a complete
syntax speeification in terms of concise syntax diagrams. These diagrams
directly mirror the structure of the compiler. Section 3 provides an overview of the
entrie system which is described as a single Pascal program. Some figures are
provided concerning system size and performance. Section 4 explains the architecture
of the hypothetical computer that executes compiled Pascal-S programs. Some
typical program constructs are listed together with the code generated for them.
The principles of operation of the computer become understandable through these
sample constructs which at the same time pciture the task of translating Pascal-S
into this code. Section 5 discusses the compiler itself, and it starts out with an
explanation of the tables and their structure. used to represent the information
given in a program's declarations.
The explanations are necessarily terse and brief and incomplete. They are intended
for people who already have some background on compilers; they are in particular
referred to by Wirth [2], where the principles along which this sytem is
constructed are taught and developed. For all details, the reader is referred to section
6 which is a full listing of the entire system. One may wonder about the value
of including a program listing in extenso. But I think it is important and hope it
will be useful. The primary value of the langu~ge and system Pascal is that it allows
one to construct large programs that are useful and highly efficient in a form that
can be read and cornrnunicated. The listing of thePascal-S system is intended to
support this claim. It also proves that compiler and interpreter can be described in a
machine-independent, well-structured form that nevertheless is effectively machine
translatable. The relative brevity of the program (25 pages) also raises a new aspect
of compiler portability; it is entirely possible to transport such a system by hand
coding. The effort is at most one of a few man-months, even for a computer where
nothing but symbolic assembly code or FORTRAN are available.
2. The language Pascal-S
The choice of features to be included in the subset now called Pascal-S was mainly
guided by the contents of traditional introductory prograrnrning courses. Beyond
this it is subject to personal experience, judgement, and prejudice. A firm guideline
was provided by the demand that the system must process a strict subset of Pascal,
i.e. that every Pascal-S program must also be acceptable by the compiler of standard
Pascal without being subjected to the slightest change. This rule makes it possible
for students to switch over to the regular system in later courses 'without noticing'.
The power of a language and its range of applications largely depend on its data
types and associated operators. They also determine the amount of effort required
to master a language. Pascal-S adheres in this respect largely to the tradition of
ALGOL 60. Its primitive data types are the integers, the real numbers, and the
Boolean truth values. They are augmented in a most important and crucial way by
the type char, representing the available set of printable characters. Omitted from
Pascal are the scalar types and sub range types.
Pascal-S includes only two kinds of data strnctures: the array and the record
(without variants). Omitted are the set and the file structure. The exceptions are
the two standard textflles input and output which are declared implicitly (but
must be listed in the program heading). A very essential omission is the absence of
pointer types and thereby of all dynamic structures. Of course, also all packing
options (packed records, packed arrays) are also omitted.
The choice of data types and structures essentially determines the complexity of
a processing system. Statement and control strnctures contribute very little to it.
Hence, Pascal-S includes most of Pascal's statement structures (compound, conditional,
selective, and repetitive statements). The only omissions are the with and
the goto statement. The latter was omitted very deliberately because of the principal
use of Pascal-S in teaching the systematic design of well-structured programs.
Procedures and functions are included in their fUll generality. The only exception
is that procedues and functions cannot be used as parameters.
The detailed syntax of Pascal-S can be seen from the syntax diagrams which are
included in the Appendix. They reveal a simple and consistent language that can be
learned in toto in a very short time, yet encompasses many of the truly fundamental
concepts of programming.
Teaching experience over many years has shown that the concept of the sequence
(sequential file) is of fundamental importance for the understanding of many computing
practices and techniques. In spite of the absence of declarable files, it can be
taught quite well with Pascal-S because of the presence of the two standard textflles.
We have deliberately decided to exclude the primitive operators put and get, and
have restricted file operations to read (on input) and write (on output). Also included
are the simple but flexible 'formatting' facilities of the Pascal write statement.
They proved to be not only useful and desirable from the point of view of utility
but, indeed, also quite simple to teach. (Actually, they do not even need to be
taught; students learn to use them quite naturally, as they are entriely free of pitfalls.)
The standard objects available in Pascal-S are:
Constants: true, false
Types: integer, real, boolean, char
Functions: abs, sqr, odd,
chr, ord, succ, pred,
round, tronc,
sin, cos, exp, In, sqrt, arctan,
eof. eoln,
Procedures: read, readln, write, writeln,
Functions pred and succ are only applicable to arguments of type char. The
argument of ord can be of type char, boolean, or integer. For further details concerning
the language, the reader is.referred to the literature [1,3]. It is noteworthy
that the subset corresponds largely to that part of the language Pascal which is
covered in a textbook for an introductory programming course [3,4].
3. The implementation
The Pascal-S system is described as a compiler that translates Pascal-S programs into
code for a hypothetical stack computer especially designed for this purpose. This
comouter is itself defmed as an algorithm, called the interpreter of the compiled
code. Bothe compiler and interpreter are described in a largely machine-independent
way by using the high-level language Pascal exclusively. In fact, these two parts
form a single Pascal program. It is listed in section 6.
The advantages of a description using a high-level language are particularly
apparent during the development of a system, but are equally significant if it has
to be transported and adapted to a different computer. In fact, Pascal-S can be
implemented immediately on all machines where a full Pascal compiler is available.
Of course, the success of such an automatically generated system crucially depends
on the quality of the tool compiler. The Pascal 6000-3.4 compiler used at ETH on
the CDC 6400 computer generates high-quality code, and recompiles the entire
Pascal-S sytem in 20 sec.
The disadvantage of an interpretive system is its relatively large overhead during
program execution. Experience has shown, however, that for small programs the
expense for central processor utilization for interpretation is anyway quite small.
Some actual figures are given for a few sample programs in section 7. Exercises
requiring less than 4 sec of computer time (costing less than SFr. 2.-) are predominant.
Of course, exercises will have to be carefully chosen, particularly in numerical
mathematics.
Apart from its machine-independent specification, the following characteristics
are noteworthy:
(1) The system resides on the disk store as an absolute binary overlay file; hence,
loading is fast.
(2) The storage space needed is reasonably small. On the CDC 6000 computer it
requires lO,OOO 60-bit words, including data and I/O buffers.
(3) The compiler is designed to recover from syntax errors and to proceed after
emitting a diagnostic keyword. This policy usually allows many errors to be
detected from a single compilation. A significant effort is made to suppress
so-called 'spurious' error messages, i.e. indication of irregular situations that
are due to previously reported errors.
(4) Control is not returned to the operating system between compilation and
execution. No loader is invoked nor is secondary storage accessed to deposit j
the generated code. This resulted in a very significant saving of overhead and
cost.
(5) A copy of the input data is made on the output file immediately after compilation.
This is often an invaluable aid to consultants and tutors.
(6) The interpretation steps needed for program exection are counted, and
provide a precise, reproducible hardware-independent measure for the actual
computational effort expended.
(7) By exchanging a single operating system control card the student may
switch over to the regular, full Pascal compiler. Pascal-S is a true subset of
standard Pascal.
(8) If an error is detected at run time, execution is aborted and a listing of the
names and values of all currently accessible variables is printed, together
with the coordinate of the point of interruption and an indication of the
reason for the interruption (post-mortem dump).
(9) The system requires no access to secondary store except for the standard
input and output files. The amount of these data is usually so small that a
single access (block transfer) is sufficient for each file.
4. The interpreter
The Pascal-S system consists of two main parts: compiler and interpreter. Their
principal interface is the array variable to which the compiler asssigns the generated
code. The interpreter itself is formulated as a procedure which is called after succesful
compilation.
The interpreter describes a straightforward stack computer, consisting of a store
S organized as a stack, two index registers T and B which control the stack, a program
counter PC, an instruction register IR, a program status register PS, and a DISPLA
Y used to speed up the addressing mechanism. Each element of the stack represents
either an integer, a real number, a logical value, or a character. The principal
structure is
procedure interpret;
begin initialize registers and auxiliary counters;
repeat ir;= code (pc] ;pc ;= pc+1;
interpret (ir)
until ps =1= mn;
if ps =1= fin then postmortemdump
end
Each instruction (order) is characterized by an order code f with values between
0 and 63. Orders with values, 0,1,2,3 have two parameters x andy. Instructions 0,
1, 2 generate an address of the data element on the stack with offset y in the currently
active data segment on level x. Orders with codes from 8 to 30 have
a single parameter y whose meaning differs in the individual cases. Orders with
codes 31...63 have no parameters and are operators whose arguments are the elements
on the top of the stack.
We refrain from introducing a complete set of mnemonics of the orders. Instead,
short key words are given as comments in the listing of the interpreter where necessary.
The individual routines are in most cases simple enough to make any commentary
superfluous. We can therefore restrict ourselves to a presentation of the general
layout of the stack and of the patterns of emitted code for specific language constructs.
These will show where individual orders are used, and thereby make their
principles of operation understandable.
4.1 Storage layout and procedure calls
Each stack element may either be an integer, a real number, a logical value, or a
character. Integers are also used as stack indices. Each activated procedure reserves
a stack section on the top. The beginning of each section is designated by a mark
which contains a pointer to the previous section. These pointers form the so-called
dynamic link, as they record the dynamic history of procedure activations. There is
also a static link, which connects those sections that belong to procedures declared
within each other. This chain designates all segments which are to be currently
accessible. The static chain starting with the currently active procedure is also
copied into the short array called display. This is done to speed up data access.
The first five locations of each stack section are occupied by the so-called section
mark. The mark contains the two links, the return address, a pointer (index) to the
symbol table, and the result table (used by function procedures only). All data are
accessed by an offset address relative to the section origin, or via the top stack
pointer. Subsequent locations are used for procedure parameters, followed by local
variables. The top of the stack is used for intermediate results in evaluating expressions.
Stack sections are 'allocated' when procedures are activated by changing the
stack pointers and setting up the two links. First, the stack is 'marked' i.e. a space
for the block mark is reserved. Then the actual parameters are processed. In the
case of value parameters the actual values are loaded onto the stack; in the case ofvariable
parameters, stack addresses are loaded instead of values. Finally, the procedure call
order changes the Band T registers and assigns the links and return address to the
section mark.
Upon procedure exit, the return order reverses the operations performed by the
call order. If the static level of the current procedure is lower than the one of the
procedure to which control returns, then the display has to be updated. This is
done by a separate order which indicates the two levels.
4.2 Control structures
Control structures are translated into sequences "of instructions containing jump
orders to the following patterns:
(1) if b then s
(2) if b then sl else s2
(3) while b do s
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -