📄 chapter 2 the components of a compiler.htm

📁 英文版编译器设计：里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 2 页
字号:
上一页 12
the final code in its machine-readable form.
<P>
<H3>2.1.6 Code Generation</H3><!---------------------------------------------------------------------------->This 
is the part of the compiler where native machine code is actually generated. If 
the last example of the previous section were to be translated into VAX assembly 
instructions, it might look like this:
<P><PRE>      MOVF  id3,    R2
      MULF  #60.0,  R2
      MOVF  id2,    R1
      ADDF  R2,     R1
      MOVF  R1,     id1
</PRE>During this step, the compiler has to map address names from the 
three-address intermediate code onto the very finite amount of registers had by 
the machine. The final result is a piece of code that is (mostly) executable.
<P>None of the remaining phases prior the actual execution of the program are a 
part of the compiler's responsibility. What remains to be done is to link the 
program and build a binary executable. In this phase, various libraries as well 
as the main executable portion of the program are combined to form a 
fully-fledged application. There are various types of linking, most of which are 
done prior to runtime. The final phase that a program must pass through prior to 
execution is for it to be loaded into memory. In this phase, the addresses in 
the binary code are translated from logical addresses, and any final runtime 
bindings are performed.
<P>
<H3>2.1.7 Peephole Optimization</H3><!---------------------------------------------------------------------------->The 
object code can be further optimized by a peephole Optimizer. This process is 
present in a more sophisticated compiler. The purpose is to produce a more 
efficient object program that can be executed in a shorter time. Optimization is 
performed at a local level and it takes advantage of certain operator properties 
such as commutativity, associativity, and distributivity. This topic is very 
advanced, and if the student wishes, he or she is invited to research further.
<P>To see a summary of the entire process of compilation from start to finish, 
refer to {PASSES}.
<P>
<MENU><IMG src="Chapter 2 The Components of a Compiler.files/PASSES.gif">
  <P><FONT face=arial size=-1><B>Figure {PASSES}</B> The passes of a compiler. 
  In this diagram, you can see the path that a source file goes through as it 
  passes through the various parts of the compiler, and eventually becomes an 
  executable program.
  <P></FONT></P></MENU><!---------------------------------------------------------------------------->
<H2>2.2 The Number of Compiler Passes</H2><!---------------------------------------------------------------------------->Figure 
{PASSES} presents a compiler that scan the input source once, produces a 
first-modified form, then scans the first-modified form and produce a 
second-modified from and so on, until the object form is produced. Such a 
compiler is called a multi pass compiler. In the case of the multi-pass compiler 
each function of the compiler can be performed by one pass of the compiler. For 
instance the first pass can read the input source, scan and extract the tokens, 
and store the result in an output file. The second pass can read the file that 
was produced in the first pass, do the syntactical analysis by building a 
syntactical tree, and associate all the information relating to each node of the 
tree. The output of the second pass, then, is a file containing the syntactical 
tree. The third pass can read the output file produced by the second pass, and 
perform the optimization by restructuring the tree structure.
<P>Another solution is for the compiler to scans the input text only once and 
directly produce the object code. Such a compiler is called a one pass compiler. 
In the case of a one pass compiler all the functions mentioned in Figure 
{PASSES} must be done for each instruction of the program.
<P>In a one-pass compiler, when a line of source is processed it is scanned and 
the tokens are extracted. Then the syntax of the line is analyzed and the tree 
structure and some tables containing information about each token are built. 
Finally, after the semantical part is checked for correctness, the code is 
generated. The same process is repeated for each line of code until the entire 
program is compiled. Usually the entire compiler is built around the parser, 
which will call procedures that will perform different functions. There are 
advantages and disadvantages for both single pass and multi pass compilers.
<P>A one pass compiler is fast, since all the compiler code is loaded in the 
memory at once. It can process the source text without the overhead of the 
operating system having to shut down one process and start another. Also, the 
output of each pass of a multi pass compiler is stored on disk, and must be read 
in each time the next pass starts.
<P>On the other hand, a one pass tends to impose some restrictions upon the 
program: constants, types, variables, and procedures must be defined before they 
are used. This is an inconvenience which, for instance, in PASCAL was alleviated 
by introducing the FORWARD statement. A multi pass compiler does not impose this 
type of restrictions upon the user. Operations that can not be performed because 
of the lack of information can be deferred to the next pass of the compiler when 
the text has been traversed and the needed information made available.
<P>The components of a one pass compiler are interrelated much closer than the 
components of a multi pass compiler. This requires all the programmers working 
on the project to have knowledge about the entire project. A multi pass compiler 
can be decomposed into passes that are relatively independent, hence a team of 
programmers can work on the project with little interaction among them. Each 
pass of the compiler can be regarded as a mini compiler, having an input source 
written in one intermediary language, and producing an output written in another 
intermediary language.
<P>In this book the compiler that is presented is a one pass compiler. The 
compiler is composed of different functional elements like the scanner, the 
parser with error detection, the symbol tables, the semantical analysis, and the 
code generation. At the end, the compiler must perform all these functions; it 
must have all these components.
<P>In order to facilitate the understanding of how the compiler is built, the 
presentation is done in stages. Each stage is presented and tested separately 
and each stage uses the completed parts of the previous stages. The graphical 
presentation of the grammar is an abstraction of a flow-chart, hence it is easy 
to indicate the operations to be performed for each grammar rule. Each stage 
will present the operations that must be performed for the grammar rules of the 
language.
<P><!---------------------------------------------------------------------------->
<H2>2.3 The Target Architecture</H2><!---------------------------------------------------------------------------->A 
compiler can generate code either for a real machine or for a virtual machine 
(VM). A real machine is a piece of hardware equipment, like a microprocessor, 
capable of understanding an assembly code and executing it. A virtual machine is 
a software program that acts like a real machine. There are advantages and 
disadvantages, which result from using each one of these machines. 
<P>For the beginning let's consider the real machine, since that is how each 
program is ultimately executed. Such a machine has the advantage of working very 
fast: as fast as the hardware of the microprocessor works. On the other hand it 
has the disadvantage of being just as complicated. Often, generating and 
formatting instructions for these machines is very technical, tedious, and 
time-consuming. In addition, the programmer has to become familiar with several 
very technical features of the operating system.
<P>In short, generating code for a CPU is a very messy operation because there 
are hundreds of codes, and for each code there are a lot of details at the bit 
and byte level that must be considered. On the other hand, the instructions may 
be very tiny, and the resulting code that is generated can be very long, making 
the compiler more complicated. Overall, the instructional process can be very 
painful for students.
<P>In the case of a virtual machine, instructions are different than those of a 
real machine. In software, things like registers do not necessarily speed up the 
execution process. Also, instructions must necessarily be simple, since the 
overhead of decoding each instruction prior to execution becomes quite 
significant. On one hand, the higher the level of instructions, the less code 
the compiler has to generate. On the other hand, the simpler the instructions 
are, the quicker they can be executed by the VM. Fortunately, the demands of two 
are not mutually exclusive. Instructions must be both high-level, and quick for 
the software to decode. In addition to this, an extra benefit can be added, that 
the virtual machine instructions can be designed in accordance with the needs of 
the compiler. This can really simplify the process of generating object code.
<P>This aspect is best illustrated by the Modula-Lilith experiment that was 
conducted by Prof. N. Wirth at ETH Zurich Switzerland. The Lilith computer had a 
set of assembly instructions that were specially designed for the Modula-2 
language. In fact, the virtual machine that is used in this book is based 
loosely on Lilith's hardware.
<P>Two versions of the Modula-2 compiler were written: the first generated code 
for a PDP-11 computer, and the second generated code for the Lilith computer. 
The same Modula-2 program was compiled using both compilers, and it was 
discovered that the length of the object code for the VAX PDP-11 was 3.8 times 
longer than the object code that was generated for the Lilith computer.
<P>This shows that code generated for a virtual machine that best fits the needs 
of the compiler could be measurably shorter than code generated for a 
general-purpose computer. This is one of the reasons why we decided to use a 
virtual machine in this book.
<P>Another advantage that results from using a virtual machine is the ease with 
which the compiler can be ported on other machines. The best example today is 
with the Java language, although there have been others, such as the original 
Pascal compiler, which generated code for a virtual machine called P-code. 
Usually, porting the compiler to another machine requires that the entire code 
generation process be rewritten. In the case of a virtual machine, porting the 
compiler to another machine is a matter of resolving compatibility issues with 
the new compiler for the host architecture, i.e., the compiler that compiles the 
language being ported. Once a compiler has been ported to another architecture, 
it can be left largely the same as it was on the previous architecture, and it 
will still generate code that executes properly on our VM. Where the real work 
lies in porting a VM language to another computer is in porting the VM, itself. 
However, virtual machines themselves are usually (of a dire necessity) quite 
simple in nature. If that were not the case, they would be colossally slow. 
<P>The price that is paid for these advantages spoken of so far is the lack of 
speed at run time; a virtual machine, sometimes called an interpreter, is always 
slower than a real machine. The virtual machine concept has been used in the 
past for implementing compilers. The pros and cons discussed here are summed up 
in table {PROCON}.
<P>
<MENU><FONT face=arial size=-1>
  <TABLE border=1>
    <TBODY>
    <TR>
      <TD>&nbsp;</TD>
      <TD><B>Hardware CPU</B></TD>
      <TD><B>Virtual Mechine</B></TD></TR>
    <TR vAlign=top>
      <TD><B>Pros:</B></TD>
      <TD>
        <UL>
          <LI>Very, very fast by comparison. 
          <LI>Generalized architecture. Versatile. 
          <LI>Low-level instruction set for doing low-level tasks. </LI></UL></TD>
      <TD>
        <UL>
          <LI>Somewhat simpler. 
          <LI>Portable, all VM features are standardized. 
          <LI>Able to have features that are expensive in hardware. 
          <LI>Higher-level instruction set. 
          <LI>Can be tailor-made for one language. </LI></UL></TD></TR>
    <TR vAlign=top>
      <TD><B>Cons:</B></TD>
      <TD>
        <UL>
          <LI>Complex and tedious to work with. 
          <LI>Non portable. Each CPU is unique. Not all CPUs have the same 
          features. 
          <LI>Lower level. 
          <LI>Complex instruction formatting. 
          <LI>Requires some knowledge of the OS, as well. </LI></UL></TD>
      <TD>
        <UL>
          <LI>Very, very slow by comparison. 
  </LI></UL></TD></TR></TBODY></TABLE><B>Table {PROCON}</B> Pros and cons 
  related to using a hardware CPU versus a virtual mechime.
  <P></FONT></P></MENU>The compiler that is presented in this book generates code 
for a virtual machine called the SAL Virtual Machine, or SAL VM. We feel that by 
choosing this approach, we can concentrate more on the problems of building a 
compiler, giving students the general feel of working with hardware, while 
sparing them from the details of generating code for a real machine.
<P></P></BODY></HTML>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -