📄 chapter 6 assignment and expressions.htm

📁 英文版编译器设计：里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 5 页
字号:
12 3 4 5 下一页
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0062)http://topaz.cs.byu.edu/text/html/Textbook/Chapter6/index.html -->
<HTML><HEAD><TITLE>Chapter 6: Assignment and Expressions</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1458" name=GENERATOR></HEAD>
<BODY>
<CENTER>
<H2>Chapter 6<BR>Assignment and Expressions</H2></CENTER><!----------------------------------------------------------------------------->
<H2>6.1 Assignment and Expressions</H2><!----------------------------------------------------------------------------->This 
chapter will serve as a buffer to the shock that might otherwise follow. In 
short, code generation in a compiler is difficult, time consuming, and can be 
frustrating. In this chapter we will introduce the most basic aspect to any 
programming language: expression evaluation. We will also introduce the 
code-generation API that is presented in <TT>SAL-CodeGen.CPP</TT>. For an 
explanation of the virtual machine's architecture, the reader should referr to 
appendix A. Appendix A goes over the architecture of the virtual machine and 
presents it in a way that is interactive and understandable. It will also 
introduce the debugger and teaches how to use it. For this chapter, we will 
assume that the reader is familiar with all of these things.
<P>In order to get a good overview of what we are going to do in this chapter, 
let's consider a very simple example. Suppose we wish to generate code for a 
statement that assigns the value of five to a 32-bit integer: <PRE>      x:= 5;
</PRE>To start out, we consider the rule for assignment: 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE35.gif">
  <P><FONT face=arial size=-1><B>Figure {RULE35}.</B> Rule Assignment, number 
  35.</FONT></P></MENU>We can literally follow the path that the compiler takes as 
it generates the code for this statement, from start to finish. Beginning at the 
left in rule Assignment, we call rule IdentExpr. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE35a.gif">
  <P></P></MENU>In this case, we are going to have <TT>RuleIdentExpr()</TT> 
generate an address to <TT>x</TT> (We will use a flag of some sort in order to 
indicate this). We want the address of <TT>x</TT> placed onto the EES. This 
causes a single instruction to be generated: <PRE>      LGA 00000004
</PRE>Next we eat an assignment token: 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE35b.gif">
  <P></P></MENU>After this, we call rule Expression. Rule Expression calls many 
other rules, and eventually calls rule Factor. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE35c.gif">
  <P></P></MENU>Rule Factor handles all immediate constants, so it will be rule 
Factor's job to make sure the value of 5 gets placed onto the EES. It emits 
another instruction: <PRE>      LID 00000005
</PRE>At this point, the runtime EES will have two items on the stack, the value 
5 (as 32 bits) on the top, with the address to <TT>x</TT> underneath. <PRE>       __________
      |   . . .  |
      |----------|
      | x's addr |
      |----------|
      | 00000005 | &lt;--- EES Top
      |----------|
      |          |
      |
</PRE>At this point, we are basically finished parsing all that needs to be 
parsed for this simple statement. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/RULE35d.gif">
  <P></P></MENU>All that remains is for us to store the value that has been 
generated at the address that we have computed. We can do this with another 
single instruction: <PRE>      SSD 00000000
</PRE>The entire piece of code that stores the value 5 in <TT>x</TT> is thus: <PRE>      LGA 00000004   ; Loads an address, G+4 onto the EES
      LID 00000005   ; Loads the value of 5 onto the EES as a 32-bit quantity
      SSD 00000000   ; Store the four bytes on the top of the EES at the 
                     ;   address underneath
</PRE>This is a really simple example. In this chapter we are going to learn how 
to do this and many other things. By the end of this chapter our compiler should 
be able to perform assignments using expressions of arbitrary complexity. We 
will also get a strong grasp of records and arrays, and our compiler should be 
able to handle those, as well.
<P>The SAL compiler generates the program image byte by byte as it parses 
through the source program. A byte-array (or code array) stores the VM op-codes 
and their immediate parameters as they are generated. Functions and procedures 
are compiled and written out to disk along with string data and external fixups. 
At any given time, the code array will be partially filled, with a program 
counter holding the index to the byte where the next instruction will be 
written. <PRE>                                                                                PC 
                                                                                 |
                                                                                 V
      -+---+----+----+----+----+---+----+----+----+----+---+----+----+----+----+-------
       |LGA| 04 | 00 | 00 | 00 |LID| 00 | 00 | 00 | 00 |SSD| 00 | 00 | 00 | 00 | ...    
      -+---+----+----+----+----+---+----+----+----+----+---+----+----+----+----+-------
</PRE>
<MENU><FONT face=arial size=-1><B>Figure {CODARR}.</B> The code array and the 
  compiler's program counter (PC). This particular sequence of instructions 
  shows our previous example once it is complete.</FONT> </MENU>We have several 
functions within our code-generation library that properly pack byte-codes into 
the code array. They all begin with <TT>Emit</TT>. For instance, 
<TT>Emit1()</TT> emits a single byte. Here is an example of the code array 
before and after a call to <TT>Emit1()</TT>: <PRE>      Before a call to Emit1():      PC 
                                     |
                                     V
      -+---+----+----+----+----+---+----+----+----+---+-
       | ... Other instructions ...|  Garbage ...    
      -+---+----+----+----+----+---+----+----+----+---+-


      After a call to Emit1(0FFh):        PC 
                                          |
                                          V
      -+---+----+----+----+----+---+----+----+----+---+-
       | ... Other instructions ...|0FFh|  Garbage ...    
      -+---+----+----+----+----+---+----+----+----+---+-
</PRE>As we can see, a single byte is emitted, and the program counter is 
incremented one place. An assignment can be broken up into a hierarchy of rules. 
We will talk about each of these rules in this chapter. The hierarchy of rules 
looks like so: 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/ASSNTX.gif">
  <P><FONT face=arial size=-1><B>Figure {ASSNTX}.</B> Hierarchy of various parts 
  of an assignment. This graph is crucial to properly understanding the 
  relationship between various parts of an assignment.</FONT></P></MENU>Again, 
performing an assignment is essentially loading the address of some variable 
onto the EES, loading the value of some expression onto the EES, and then 
storing the result of the expression into the address. The prototype for 
<TT>RuleAssignment()</TT> is unremarkable, and appears like the prototypes for 
most of the other rules <PRE>      void RuleAssignment( Set Follow );
</PRE>Let's take a closer look at what we need to do. In figure {GENASSN} we can 
see rule Assignment with several callouts showing various tasks that need to be 
done, and where in the rule they need to be done. 
<MENU><IMG src="Chapter 6 Assignment and Expressions.files/GENASSN.gif">
  <P><FONT face=arial size=-1><B>Figure {GENASSN}.</B> A diagram of how rule 
  Assignment is implemented. Also shown is a section of the code array that 
  shows the various parts of the assignment in machine code.</FONT></P></MENU>
<H3>6.1.1 Setting Up and Calling Rule IdentExpression</H3><!----------------------------------------------------------------------------->We 
begin by declaring two type variables. These are needed to keep track of the 
right-hand and left-hand types. Next, we call <TT>RuleIdentExpr()</TT>. This 
will be explained shortly. The important thing about this rule is that it is 
used to handle identifiers on the left side of the assignment as well as the 
right. Remember that on the left side we want an identifier's address, and on 
the right side we want its value. Notice that we pass in the value 
<TT>false</TT> as one of <TT>RuleIdentExpr()</TT>'s parameters. This tells it to 
make an lvalue (pronounced L-value, it is a value on the left-hand side of an 
assignment. There is also an rvalue for the right-hand side). We also pass in 
<TT>LType</TT>. <TT>RuleIdentExpr()</TT> will make an lvalue and set LType to 
the type of the variable.
<P>
<H3>6.1.2 Handling Procedure Calls</H3><!----------------------------------------------------------------------------->Take 
a look at rule Assignment figure {GENASSN} again. Notice how there is a path 
that bypases the assignment token and the following expression--basically a 
valid assignment can be simply one identifier expression. How can this be?
<P>This is always the case with a procedure call. The statement, <PRE>      foo();
</PRE>is called an empty expression in SAL; it returns nothing. There are two 
ways to check for a procedure call. The easiest is to just see if the current 
token is a semicolon. However, that does not assure us that the user didn't just 
type something like: <PRE>      x;
</PRE>In order to make sure that this does not happen, we have a better way. A 
procedure's return type will be <TT>notyp</TT>, whereas the return type for the 
above expression will be whatever <TT>x</TT> was defined as. The best way is to 
merge the two: <PRE>      //*** Checking for procedure call<BR>
      if LType.type = notyp and Token.sy = ';' then
        return;
      end if;
</PRE>
<H3>6.1.3 Evaluating the Expression and Storing the Result</H3><!----------------------------------------------------------------------------->Before 
calling <TT>RuleExpression()</TT> we want to set RType equal to LType. We then 
call <TT>RuleExpression()</TT> and pass it in. This rule will use the value in 
RType to resolve ambiguities in type (for constants, especially), and will set 
the value of RType to whatever the type of the expression ends up being. After 
this rule returns we check to see that <TT>LType</TT> is the same as 
<TT>RType</TT>. We can only assign integers to integers, cardinals to cardinals, 
reals to reals, etc. If the two types do not match then we generate an error.
<P>Once we know that two types are compatible, we are faced with two situations. 
We are doing either an assignment of complex data types (like a record or an 
array), or we are doing an assignment of scalar data types (variables).
<P><B>6.1.3.1 Assigning Complex data types</B><BR>There are two forms of complex 
assignment. One is a record or an array copy, and the other is a string copy.
<P>
<MENU><B>Record and Array copy.</B> This case is true when 
  <TT>LType.IsComplexType()</TT> returns <TT>true</TT>. Here is an example of a 
  record copy: <PRE>      type
        RecType is record
          a: int;
          b: char;
          c: real;
        end record;

      var
        x, y: RecType;    //*** Declare two records of the same type

      begin
        x.a:= 1;
        x.b:= 'A';
        x.c:= 1.5;

        y:= x;            //*** Do a record copy
</PRE>In this example, we declare a type that is a record. We fill <TT>x</TT> 
  with data, and then assign it to <TT>y</TT>. The assignment performed is a 
  memory-copy and not a pointer-copy. In other words, the data is duplicated, 
  not the pointer to the data. The difference is that with a pointer copy, a 
  change to <TT>x</TT>'s data appears immediately in <TT>y</TT>. This is because 
  they are actually pointing to the same block of memory. The type of assignment 
  that we want is a memory copy.
  <P>Before performing a memory copy, we need to confirm that the two are 
  actually of identical type. Simply veryifing that <TT>LType.type</TT> is the 
  same as <TT>RType.type</TT> will tell us if they are both records or both 
  arrays, but are they the same record or array? This is done by comparing the 
  block pointers of <TT>LType</TT> and <TT>RType</TT>. If they are pointing to 
  the same block in the symbol table, then we know that they are of the same 
  declared type.
  <P>Performing a memory copy in the SAL VM is done by using the <TT>SYS</TT> 
  instruction. <TT>SYS</TT> is a way for SAL programs to get at some low-level 
  routines that are implemented in fast native code. The particular system call 
  that we are going to use in order to copy an array or record is service 04h 
  (memory manipulation), function 00 (memory copy). This function takes a single 
  16-bit size value, and two 32-bit pointers from the EES. The source address is 
  the top one, and the destination address is the one underneath. The EES should 
  look like this: <PRE>      +------+
      |  ... |
      | Dest |
      | Src  |
      | Size | &lt;--- EES top
      |      |
</PRE>The needed size can be retrieved from <TT>RType.block-&gt;size()</TT>. 
  The code to do the mem-copy looks like this:
  <P><PRE>       Emit (LIW,RType.getExtra()-&gt;size());
      Emit (SYS, 04h, 00h );
</PRE>
  <P><B>String Copy.</B> A string copy in SAL is equivalent to a 
  <TT>strncpy()</TT> in C. In SAL we want to make sure that <TT>LType</TT> is an 
  array of characters. This is easily verified by checkig its block information. 
  The <TT>RType</TT> for a string will have its type as arraytyp, its subtype as
12 3 4 5 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -