⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 chapter 7 code generation for control statements.htm

📁 英文版编译器设计:里面详细介绍啦C编译器的设计
💻 HTM
📖 第 1 页 / 共 5 页
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0062)http://topaz.cs.byu.edu/text/html/Textbook/Chapter7/index.html -->
<HTML><HEAD><TITLE>Chapter 7: Code Generation for Control Statements</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1458" name=GENERATOR></HEAD>
<BODY>
<CENTER>
<H2>Chapter 7<BR>Flow-Control Statements</H2></CENTER>Central to to building a 
control statement such as <TT>for</TT>, <TT>if</TT>, or any of the loops is 
understanding how the branch instructions in the SAL VM work. In figure {AJUMP} 
we can see the disassembled portion of a code fragment for a short function. The 
code has been presented as it appears in memory (in little-endian order). We can 
see how each byte is laid out starting from the first address. 
<MENU><IMG 
  src="Chapter 7 Code Generation for Control Statements.files/AJUMP.gif">
  <P><FONT face=arial size=-1><B>Figure {AJUMP}.</B> A fragment of the machine 
  code in memory that is generated for function <TT>foo()</TT>. The data for 
  each instruction is in little-endian order.</FONT> </P></MENU>Instructions in 
the VM are logically processed in a continuous cycle by first reading the entire 
instruction, and then processing it. By "logically", we mean that reguardless of 
how the virtual machine implements a particular instruction, it is assumed to be 
read in one piece from the instruction stream, i.e., the instruction with all of 
its immediate parameters. Once the instruction is read, the program counter is 
incremented to the byte that immediately follows the instruction. After these 
two steps, the instruction is finally processed.
<P>Let us suppose that the program counter is currently at address 0008FB0C as 
shown. On the next instruction cycle, the five bytes (shown in color) from 
address 0008FB0C througn 0008FB10 will be read into the VM, and the PC will be 
moved to address 0008FB11. The particular branch instruction that is used in 
this example is a <TT>JPZ</TT>, or "JuMp if Zero". This instruction will take a 
single byte from the EES. If the byte is zero, the VM will add the offset from 
the immediate parameter to the current value of the PC. The offset can be either 
positive or negative (2's compliment).
<P>There are three branch instructions that we need to be immediately concerned 
about. 
<MENU><TT><B>JMP</B></TT> This is an unconditional branch. It will not read 
  anything from the EES, and will immediately branch to the offset that is given 
  in its immediate parameter.
  <P><TT><B>JPZ</B></TT> This instruction will read a byte from the top of the 
  EES. If the byte is zero, the offset within the immediate parameter will be 
  added to the current value of the PC.
  <P><TT><B>JPNZ</B></TT> This does the opposite of the <TT>JPZ</TT> 
  instruction. It will read a byte from the EES, and if it is anything but zero 
  it will add the offset within the immediate parameter to the current value of 
  the PC.
  <P></P></MENU>Basically, there are two types of jumps in compilers. There are 
forward jumps, and backward jumps. A forward jump will take the PC to some 
higher address, and the backward jump will take the PC to a lower address. The 
easiest for a compiler to do are backward jumps. The reason is that the compiler 
can easily remember the position in the code array where it wants to have the 
computer jump to. Here is what we mean. Suppose for a tiny instant that there 
actually exists a <TT>goto</TT> in SAL. We want to perform a backward jump like 
this: <PRE>      jumphere:
        write "I'm in an infinite loop!\n";
        goto jumphere;
</PRE>The compiler would compile this sequence of statements by remembering the 
PC where the write statement started, like this: <PRE>      DWORD backpc = PC;
</PRE>Once the goto was reached, the compiler would emit a <TT>JMP</TT> 
instruction with the offset of backpc - (PC + 4). The code would look like this: 
<PRE>      DWORD backpc = PC;

      RuleStatementSequence (...);

      Emit1(JMP);
      Emit4( backpc - (PC + 4) );
</PRE>
<P>Forward jumps are harder, since we need to know what offset to emit for the 
jump, and it is not known. <PRE>        goto jumphere;
        write "I will never be executed...\n";
      jumphere:
        ...
</PRE>We compensate by saving the PC of the jump offset (not of the jump 
instruction), and then we emit four bytes of zeros. After processing a block of 
statements, we then go back to where we emit the zeros and poke in the right 
values. To make this a little easier, we have two functions: 
<MENU><TT><B>DWORD ForwardJump()</B></TT> This function emits four bytes of 
  zeros, and returns the address of the first byte. Its use is to reserve space 
  for a forward jump offset.
  <P><TT><B>void JumpFix(DWORD jPC)</B></TT> This function takes a value 
  returned by <TT>ForwardJump()</TT> and sets the offset at that address to jump 
  to the current spot in the code array.
  <P></P></MENU>The process looks generally like this: <PRE>      Emit1(JMP);
      DWORD fwdjmp = ForwardJump();

      RuleStatementSequence(...);

      JumpFix(fwdjmp);
</PRE><!----------------------------------------------------------------------------->
<H2>7.1 If Statement</H2><!----------------------------------------------------------------------------->The 
basic layout for the <TT>if</TT> statement can be seen in figure {RULE36}. As we 
can see, there are only three steps involved. The real key to simplicity here is 
to keep a list (the implementation of which is left to the student) of jump 
fixups to exit the if statement. 
<P>
<MENU><IMG 
  src="Chapter 7 Code Generation for Control Statements.files/RULE36.gif">
  <P><FONT face=arial size=-1><B>Figure {RULE36}.</B> Code generation for rule 
  IfStatement.</FONT> </P></MENU>To begin with, we parse an <TT>if</TT> keyword, 
and then call <TT>RuleExpression()</TT>. We initialize the input type to 
<TT>booltyp</TT>. Once <TT>RuleExpression()</TT> returns, we make sure that the 
type of the expression was Boolean. If not, we give an error.
<P>After parsing the <TT>then</TT>, we call <TT>RuleStatementSequence()</TT>. If 
the following token is an <TT>end</TT>, then we know that we are at the end of 
the <TT>if</TT> statement. Otherwise, we need to insert a jump leading out, 
otherwise the VM will execute the following condition, which we do not want to 
have happen.
<P>Whether or not there is an <TT>end</TT>, or an <TT>elseif</TT>, or simply an 
<TT>else</TT>, we want to call <TT>JumpFix()</TT> right after this, so if the 
runtime condition was false, the VM will jump to the point that we are currently 
at.
<P>Since all the jumps leading out are forward, and since they all land at the 
same address, it makes sense to keep them in a list, where they can be fixed up 
after we are done processing the whole <TT>if</TT> statement. The general 
pattern that we want to achieve is shown in figure <TT>{IFMEM}</TT>. 
<MENU><IMG 
  src="Chapter 7 Code Generation for Control Statements.files/IFMEM.gif">
  <P><FONT face=arial size=-1><B>Figure {IFMEM}.</B> Pattern of code for an 
  if-statement.</FONT> </P></MENU>Here is a sample of code, and the pattern of 
assembly instructions that should be generated. <PRE>

      program TestIf;

        var
          x: int;

      begin
        if x&lt;1 or x&gt;10 then
          write "I said between 1 and 10, dude!\n";
        elseif x&lt;7 then
          write "Too low\n";
        elseif x&gt;8 then
          write "Too high\n";
        else
          write "You got it!!\n";
        end if;
      end program;


      *** Module table ************************************

      Module: TestIf  15:34.30  3-12-99
      Size: 4 Parent: 0x00000000
         var   1:b              int      s32    fl:   0	Offset: 4 pass by val


      *** DISASSEMBLY *************************************

        ENTR  00000000
        LGA   00000004  
        TS    
        SJPZ  begin  
        RTN

      begin:
        LGD   00000004        if x&lt;1 or x&gt;10 then
        LID   00000001
        LESD
        LGD   00000004
        LID   0000000A
        GTRD
        ORB
        JPZ   NxtCnd01
        LSTA  0000001E          write "I said between 1 and 10, dude!\n";
        SYS   00
        JMP   EndIf

      NxtCnd01: 
        LGD   00000004        elseif x&lt;7 then
        LID   00000007
        LESD
        JPZ   NxtCnd02
        LSTA  0000003E          write "Too low\n";
        SYS   00
        JMP   EndIf

      NxtCnd02:
        LGD   00000004        elseif x&gt;8 then
        LID   00000008
        GTRD
        JPZ   NxtCnd03
        LSTA  00000047          write "Too high\n";
        SYS   00
        JMP   EndIf

      NxtCnd03:
                              else
        LSTA  00000051          write "You got it!!\n";
        SYS   00
                              end if;
      EndIf:

        RTN                   end program.
</PRE></TD></TR></TABLE>&nbsp;
<P>&nbsp;
<P>&nbsp;
<P><!----------------------------------------------------------------------------->
<H2>7.2 Pre-Test Loop</H2><!----------------------------------------------------------------------------->The 
pre-test loop is much simpler than the <TT>if</TT> statement. It makes use of a 
forward jump to make the VM skip over the body if the condition fails, and a 
backward jump to take the VM back to the conditional part after each iteration 
of the body.
<P>The only tricky part is to remember to make sure we test for true conditions 
for <TT>while</TT> loops and false conditions for <TT>until</TT> loops. This can 
be done by using a variable to store the appropriate op-code, as shown in figure 
{RULE37}.
<P>
<MENU><IMG 
  src="Chapter 7 Code Generation for Control Statements.files/RULE37.gif">
  <P><FONT face=arial size=-1><B>Figure {RULE37}.</B> Code generation for rule 
  PreTestLoop.</FONT> </P></MENU>In the first step, we want to check for a 
<TT>while</TT> or an <TT>until</TT>. If we are doing an <TT>until</TT> loop then 
we want to test for a false condition, and we emit a <TT>JPNZ</TT>. Otherwise we 
are testing for a true condition, and we emit a <TT>JPZ</TT>We use a variable 
called <TT>JumpOp</TT> to keep track of this. Then we save the PC for a backward 
jump.
<P>Next, we call <TT>RuleExpression()</TT>. Just like with the <TT>if</TT> 
statement, we pre-initialize the input type to a Boolean. If the type returned 
is not Boolean, then we give an error.
<P>Next comes the crucial part, and do a forward jump. After emitting the 
op-code that we previously saved, we call <TT>ForwardJump()</TT> and save the 
address returned.
<P>After calling <TT>RuleStatementSequence()</TT>, we make an unconditional jump 
that takes us back to the start of the condition for the loop. After emitting 
the branch, the code for the loop is finished, and we can go back and fix up the 
forward jump so that when the condition fails, we can jump out of the loop to 
the current spot in the code array. The pattern that we would like to achieve 
looks like figure {PTMEM}. The patterns for both types of pre-test loops are 
shown. 
<MENU><IMG 
  src="Chapter 7 Code Generation for Control Statements.files/PTMEM.gif">
  <P><FONT face=arial size=-1><B>Figure {PTMEM}.</B> Patterns of code for both 
  types of pre-test loop.</FONT> </P></MENU>Here is a sample of code, and the 
pattern of assembly instructions that should be generated. <PRE>
      program TestWhile;

        const A = 10;

        var 
           b: int;

      begin
        b := 0;
        while b &lt; A do
          b := b+1;
        loop;
      end program.


      *** Module table ************************************

      Module: TestWhile  15:34.30  3-12-99
      Size: 4 Parent: 0x00000000
        cnst   1:A              int      s32    fl:   010
         var   1:b              int      s32    fl:   0	Offset: 4 pass by val


      *** DISASSEMBLY *************************************

        ENTR  00000000
        LGA   00000004  
        TS    
        SJPZ  begin  
        RTN

      begin:
        LGA   00000009          b := 0
        LID   00000000  
        SSD   00000000  

      StartLoop:
        LGD   00000009          while b &lt; A do
        LID   0000000A  
        LESD
        JPZ   EndLoop
        LGA   00000009            b := b + 1
        LGD   00000009  
        LID   00000001  
        ADDD
        SSD   00000000  
        JMP   StartLoop:        loop;
      EndLoop:

        RTN                   end program.

</PRE>&nbsp;
<P>&nbsp;
<P>&nbsp;
<P><!----------------------------------------------------------------------------->

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -