📄 chapter 2 the components of a compiler.htm
字号:
the final code in its machine-readable form.
<P>
<H3>2.1.6 Code Generation</H3><!---------------------------------------------------------------------------->This
is the part of the compiler where native machine code is actually generated. If
the last example of the previous section were to be translated into VAX assembly
instructions, it might look like this:
<P><PRE> MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
</PRE>During this step, the compiler has to map address names from the
three-address intermediate code onto the very finite amount of registers had by
the machine. The final result is a piece of code that is (mostly) executable.
<P>None of the remaining phases prior the actual execution of the program are a
part of the compiler's responsibility. What remains to be done is to link the
program and build a binary executable. In this phase, various libraries as well
as the main executable portion of the program are combined to form a
fully-fledged application. There are various types of linking, most of which are
done prior to runtime. The final phase that a program must pass through prior to
execution is for it to be loaded into memory. In this phase, the addresses in
the binary code are translated from logical addresses, and any final runtime
bindings are performed.
<P>
<H3>2.1.7 Peephole Optimization</H3><!---------------------------------------------------------------------------->The
object code can be further optimized by a peephole Optimizer. This process is
present in a more sophisticated compiler. The purpose is to produce a more
efficient object program that can be executed in a shorter time. Optimization is
performed at a local level and it takes advantage of certain operator properties
such as commutativity, associativity, and distributivity. This topic is very
advanced, and if the student wishes, he or she is invited to research further.
<P>To see a summary of the entire process of compilation from start to finish,
refer to {PASSES}.
<P>
<MENU><IMG src="Chapter 2 The Components of a Compiler.files/PASSES.gif">
<P><FONT face=arial size=-1><B>Figure {PASSES}</B> The passes of a compiler.
In this diagram, you can see the path that a source file goes through as it
passes through the various parts of the compiler, and eventually becomes an
executable program.
<P></FONT></P></MENU><!---------------------------------------------------------------------------->
<H2>2.2 The Number of Compiler Passes</H2><!---------------------------------------------------------------------------->Figure
{PASSES} presents a compiler that scan the input source once, produces a
first-modified form, then scans the first-modified form and produce a
second-modified from and so on, until the object form is produced. Such a
compiler is called a multi pass compiler. In the case of the multi-pass compiler
each function of the compiler can be performed by one pass of the compiler. For
instance the first pass can read the input source, scan and extract the tokens,
and store the result in an output file. The second pass can read the file that
was produced in the first pass, do the syntactical analysis by building a
syntactical tree, and associate all the information relating to each node of the
tree. The output of the second pass, then, is a file containing the syntactical
tree. The third pass can read the output file produced by the second pass, and
perform the optimization by restructuring the tree structure.
<P>Another solution is for the compiler to scans the input text only once and
directly produce the object code. Such a compiler is called a one pass compiler.
In the case of a one pass compiler all the functions mentioned in Figure
{PASSES} must be done for each instruction of the program.
<P>In a one-pass compiler, when a line of source is processed it is scanned and
the tokens are extracted. Then the syntax of the line is analyzed and the tree
structure and some tables containing information about each token are built.
Finally, after the semantical part is checked for correctness, the code is
generated. The same process is repeated for each line of code until the entire
program is compiled. Usually the entire compiler is built around the parser,
which will call procedures that will perform different functions. There are
advantages and disadvantages for both single pass and multi pass compilers.
<P>A one pass compiler is fast, since all the compiler code is loaded in the
memory at once. It can process the source text without the overhead of the
operating system having to shut down one process and start another. Also, the
output of each pass of a multi pass compiler is stored on disk, and must be read
in each time the next pass starts.
<P>On the other hand, a one pass tends to impose some restrictions upon the
program: constants, types, variables, and procedures must be defined before they
are used. This is an inconvenience which, for instance, in PASCAL was alleviated
by introducing the FORWARD statement. A multi pass compiler does not impose this
type of restrictions upon the user. Operations that can not be performed because
of the lack of information can be deferred to the next pass of the compiler when
the text has been traversed and the needed information made available.
<P>The components of a one pass compiler are interrelated much closer than the
components of a multi pass compiler. This requires all the programmers working
on the project to have knowledge about the entire project. A multi pass compiler
can be decomposed into passes that are relatively independent, hence a team of
programmers can work on the project with little interaction among them. Each
pass of the compiler can be regarded as a mini compiler, having an input source
written in one intermediary language, and producing an output written in another
intermediary language.
<P>In this book the compiler that is presented is a one pass compiler. The
compiler is composed of different functional elements like the scanner, the
parser with error detection, the symbol tables, the semantical analysis, and the
code generation. At the end, the compiler must perform all these functions; it
must have all these components.
<P>In order to facilitate the understanding of how the compiler is built, the
presentation is done in stages. Each stage is presented and tested separately
and each stage uses the completed parts of the previous stages. The graphical
presentation of the grammar is an abstraction of a flow-chart, hence it is easy
to indicate the operations to be performed for each grammar rule. Each stage
will present the operations that must be performed for the grammar rules of the
language.
<P><!---------------------------------------------------------------------------->
<H2>2.3 The Target Architecture</H2><!---------------------------------------------------------------------------->A
compiler can generate code either for a real machine or for a virtual machine
(VM). A real machine is a piece of hardware equipment, like a microprocessor,
capable of understanding an assembly code and executing it. A virtual machine is
a software program that acts like a real machine. There are advantages and
disadvantages, which result from using each one of these machines.
<P>For the beginning let's consider the real machine, since that is how each
program is ultimately executed. Such a machine has the advantage of working very
fast: as fast as the hardware of the microprocessor works. On the other hand it
has the disadvantage of being just as complicated. Often, generating and
formatting instructions for these machines is very technical, tedious, and
time-consuming. In addition, the programmer has to become familiar with several
very technical features of the operating system.
<P>In short, generating code for a CPU is a very messy operation because there
are hundreds of codes, and for each code there are a lot of details at the bit
and byte level that must be considered. On the other hand, the instructions may
be very tiny, and the resulting code that is generated can be very long, making
the compiler more complicated. Overall, the instructional process can be very
painful for students.
<P>In the case of a virtual machine, instructions are different than those of a
real machine. In software, things like registers do not necessarily speed up the
execution process. Also, instructions must necessarily be simple, since the
overhead of decoding each instruction prior to execution becomes quite
significant. On one hand, the higher the level of instructions, the less code
the compiler has to generate. On the other hand, the simpler the instructions
are, the quicker they can be executed by the VM. Fortunately, the demands of two
are not mutually exclusive. Instructions must be both high-level, and quick for
the software to decode. In addition to this, an extra benefit can be added, that
the virtual machine instructions can be designed in accordance with the needs of
the compiler. This can really simplify the process of generating object code.
<P>This aspect is best illustrated by the Modula-Lilith experiment that was
conducted by Prof. N. Wirth at ETH Zurich Switzerland. The Lilith computer had a
set of assembly instructions that were specially designed for the Modula-2
language. In fact, the virtual machine that is used in this book is based
loosely on Lilith's hardware.
<P>Two versions of the Modula-2 compiler were written: the first generated code
for a PDP-11 computer, and the second generated code for the Lilith computer.
The same Modula-2 program was compiled using both compilers, and it was
discovered that the length of the object code for the VAX PDP-11 was 3.8 times
longer than the object code that was generated for the Lilith computer.
<P>This shows that code generated for a virtual machine that best fits the needs
of the compiler could be measurably shorter than code generated for a
general-purpose computer. This is one of the reasons why we decided to use a
virtual machine in this book.
<P>Another advantage that results from using a virtual machine is the ease with
which the compiler can be ported on other machines. The best example today is
with the Java language, although there have been others, such as the original
Pascal compiler, which generated code for a virtual machine called P-code.
Usually, porting the compiler to another machine requires that the entire code
generation process be rewritten. In the case of a virtual machine, porting the
compiler to another machine is a matter of resolving compatibility issues with
the new compiler for the host architecture, i.e., the compiler that compiles the
language being ported. Once a compiler has been ported to another architecture,
it can be left largely the same as it was on the previous architecture, and it
will still generate code that executes properly on our VM. Where the real work
lies in porting a VM language to another computer is in porting the VM, itself.
However, virtual machines themselves are usually (of a dire necessity) quite
simple in nature. If that were not the case, they would be colossally slow.
<P>The price that is paid for these advantages spoken of so far is the lack of
speed at run time; a virtual machine, sometimes called an interpreter, is always
slower than a real machine. The virtual machine concept has been used in the
past for implementing compilers. The pros and cons discussed here are summed up
in table {PROCON}.
<P>
<MENU><FONT face=arial size=-1>
<TABLE border=1>
<TBODY>
<TR>
<TD> </TD>
<TD><B>Hardware CPU</B></TD>
<TD><B>Virtual Mechine</B></TD></TR>
<TR vAlign=top>
<TD><B>Pros:</B></TD>
<TD>
<UL>
<LI>Very, very fast by comparison.
<LI>Generalized architecture. Versatile.
<LI>Low-level instruction set for doing low-level tasks. </LI></UL></TD>
<TD>
<UL>
<LI>Somewhat simpler.
<LI>Portable, all VM features are standardized.
<LI>Able to have features that are expensive in hardware.
<LI>Higher-level instruction set.
<LI>Can be tailor-made for one language. </LI></UL></TD></TR>
<TR vAlign=top>
<TD><B>Cons:</B></TD>
<TD>
<UL>
<LI>Complex and tedious to work with.
<LI>Non portable. Each CPU is unique. Not all CPUs have the same
features.
<LI>Lower level.
<LI>Complex instruction formatting.
<LI>Requires some knowledge of the OS, as well. </LI></UL></TD>
<TD>
<UL>
<LI>Very, very slow by comparison.
</LI></UL></TD></TR></TBODY></TABLE><B>Table {PROCON}</B> Pros and cons
related to using a hardware CPU versus a virtual mechime.
<P></FONT></P></MENU>The compiler that is presented in this book generates code
for a virtual machine called the SAL Virtual Machine, or SAL VM. We feel that by
choosing this approach, we can concentrate more on the problems of building a
compiler, giving students the general feel of working with hardware, while
sparing them from the details of generating code for a real machine.
<P></P></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -