<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0063)http://topaz.cs.byu.edu/text/html/Textbook/AppendixA/index.html -->
<HTML><HEAD><TITLE>Appendix A: Debugging and Virtual Machine Architecture</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2800.1458" name=GENERATOR></HEAD>
<BODY>
<CENTER>
<H2>Appendix A<BR>Debugging and Virtual Machine Architecture</H2></CENTER>
<H2>Introduction</H2><!----------------------------------------------------------------------------->This 
chapter will serve as a buffer to the shock that might otherwise follow. In 
short, code generation in a compiler is difficult, time-consuming, and can be 
frustrating. In this chapter we will introduce some of the basic elements of 
making a SAL program run. We will also introduce the code-generation API that is 
presented in <TT>SAL-CodeGen.CPP</TT>. As we dive into this, we will go over the 
architecture of the virtual machine and present this in a way that is 
interactive and understandable. We will also introduce the debugger and teach 
how to use it.
<P>
<H2>A.1 Debugging a SAL Program</H2><!----------------------------------------------------------------------------->Let's 
start by learning a little about the architecture of the SAL Virtual Machine 
(VM). There is a great deal of information to be absorbed in a short time, so we 
will try to make our explanations clear. Start by getting the compiler and VM, 
and putting them into the same directory. Then, type in the following program 
and compile it. <PRE>      program Test;
        var
          s: int;

      begin
        write "Enter a number: ";
        read s;

        s:= s + 5;

        write "\ns + 5 = ", s, nl;
      end program.
</PRE>Let's debug this program. Type the following line: <PRE>      runsal -d -p test
</PRE>You should see something that looks like this: <PRE>      C:\SALTest&gt;<B>runsal -d -p test</B>

      SAL Interpreter
      Version 1.0  11-1997
      Brigham Young University
      Department of Computer Science
      Compiler Research Laboratory

      Build date: (Mon Jan 11 18:57:34 1999)
      00CF0D70: ENTR  00000000
      &lt;Enter&gt;:StpIn  &lt;Space&gt;:StpOvr  &lt;R&gt;:StpOut  &lt;?&gt;:Help  &lt;Esc&gt;:Quit
</PRE>After the startup banner and the build date we can see a line that has the 
instruction <TT>ENTR</TT> followed by a 4-byte dword that is zero. This is the 
first instruction of the program. Below that is a prompt listing the most common 
commands. Each command is given to the debugger as a single 
keystroke. Hit the '?' key. You will see a summary of all the commands that can 
be given. <PRE>      Help commands:
      ----------------------------------------------
       E:  View the EES
       G:  View global memory
       L:  View local stack memory
       S:  View string data
       M:  View memory
       D:  Decode procedure.
      
      Enter  Step into a procedure
      Space  Step over a procedure
       R     Step out of (return from) a procedure
       X     Continue execution

      [esc]  Quit
</PRE>Let's see what our program looks like. Hit 'D'. When it asks for the 
procedure number, type 0 and hit enter. <PRE>      Decode procedure num (in hex): <B>0</B>

      Decoding procedure 0000:
      00CF0D70: ENTR  00000000
      00CF0D75: LGA   00000004
      00CF0D7A: TS
      00CF0D7B: SJPZ  00CF0D7F
      00CF0D7E: RTN
      00CF0D7F: LSTA  00000000
      00CF0D84: SYS   00
      00CF0D86: LGA   0000000D
      00CF0D8B: LIB   14
      00CF0D8D: SYS   06
      00CF0D8F: SSD   00000000
      00CF0D94: LGA   0000000D
      00CF0D99: LGD   0000000D
      00CF0D9E: LID   00000005
      00CF0DA3: ADD   _s32  _s32
      00CF0DA6: SSD   00000000
      00CF0DAB: LSTA  00000011
      00CF0DB0: SYS   00
      00CF0DB2: LGD   0000000D
      00CF0DB7: LID   00000000
      00CF0DBC: LIB   14
      00CF0DBE: SYS   02
      00CF0DC0: LIB   0A
      00CF0DC2: SYS   01
      00CF0DC4: LEXA
      00CF0DC5: LSD   00000000
      00CF0DCA: LIB   00
      00CF0DCC: UCMP  _u32  _u8
      00CF0DCF: EQL
      00CF0DD0: JPNZ  00CF0E01
      00CF0DD5: LEXA
      00CF0DD6: LSD   00000000
      00CF0DDB: COPT  _u32
      00CF0DDD: LIB   00
      00CF0DDF: UCMP  _u32  _u8
      00CF0DE2: EQL
      00CF0DE3: JPNZ  00CF0DFA
      00CF0DE8: COPT  _u32
      00CF0DEA: LSA   00000008
      00CF0DEF: SWAP  0004  0004
      00CF0DF4: MFREE
      00CF0DF5: JMP   00CF0DDB
      00CF0DFA: POP   0004
      00CF0DFD: LIW   000E
      00CF0E00: TRAP
      00CF0E01: RTN
      End of procedure.
</PRE>It will print out about 21 lines and ask you if you want to view more. If 
you hit any key other than 'N' it will keep printing. Let's go over 
what this code does and step through it piece by piece. On the first line we see 
this: <PRE>      00CF0D70: ENTR  00000000
</PRE>This instruction starts every procedure in SAL. In fact, the VM will halt 
if this is not the first instruction that it encounters in a procedure. All it 
does is allocate space for local variables on the procedure stack. Since this is 
the main procedure, all of its variables are global, so no space is needed. Hit 
the space bar once and let's step to the next instruction.
<P>The next section of code looks like this: <PRE>      00CF0D75: LGA   00000004
      00CF0D7A: TS
      00CF0D7B: SJPZ  00CF0D7F
      00CF0D7E: RTN
</PRE>The first instruction calculates an address by taking the value of the G 
register and adding 4 to it. The instruction is called Load Global Address 
(LGA). The G register is a pointer that keeps track of the start of global 
memory for the current module. The value that we want is a single-byte flag at 
offset 4 from the start of global memory.
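<P>The address arithmetic can be sketched in a few lines of Python. This is only an illustration, not the VM's implementation: the function name <TT>lga</TT> and the 32-bit wrap-around are assumptions, and the value of G is the one the debugger reports for this run.

```python
# Hypothetical sketch of the address computed by LGA (Load Global Address).
# The name `lga` and the 32-bit wrap-around are assumptions for illustration.

def lga(g_register: int, offset: int) -> int:
    """Return G plus the operand offset, kept within 32 bits."""
    return (g_register + offset) % 2**32

G = 0x00CF0C90          # start of global memory reported by the debugger
flag_address = lga(G, 4)
print(f"{flag_address:08X}")  # prints 00CF0C94
```

The result matches the address that the debugger later shows on the EES after <TT>LGA</TT> executes.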
<P>SAL programs are linked at runtime. Although this link is still static, it is 
possible that two or more modules of a program rely on a common 
library. If so, the VM needs to make sure that the module only gets 
initialized once. This bit of code checks a flag to see whether it 
has already been set. Let's take a look at that flag byte. Hit the 'G' 
key. <PRE>      Memory dump at G = 00CF0C90
      Length: 13
       Address: +0 +1 +2 +3 +4 +5 +6 +7   +8 +9 +A +B +C +D +E +F | ASCII text:
      ------------------------------------------------------------+------------------
      00CF0C90  D0 0C CF 00 <B>00</B> 00 0D CF - 00 00 00 00 00          | ........ .....
</PRE>We can see that (in this case) G is pointing to the address 00CF0C90h. At 
offset 4 (highlighted in bold) we see that this flag is set to zero, indicating 
that our module has not yet been initialized. Initialization consists of setting 
up pointers to global arrays and records, calling constructors of global 
classes, and executing the main procedure of libraries (or in our case 
programs).
<P><!----------------------------------------------------------------------------->Consider 
the memory dump for a moment. On the left you can see the starting address for 
the current row. It reads 00CF0C90h. Along the top of the dump is a sequence of 
numbers labeling the columns, numbered +0, +1, +2, and so on up to +F. We can 
look up the byte at any hex address by taking the least-significant digit to 
find the column and using the remaining digits to find the row.
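<P>The lookup rule can be written out as a small Python sketch. The function name <TT>locate_in_dump</TT> is hypothetical, and the sketch assumes that every row starts on a 16-byte boundary, as the rows in the dumps shown here do.

```python
# Hypothetical helper: find the row and column of an address in a hex dump.
# Assumes each row starts on a 16-byte boundary (true of the dumps shown).

def locate_in_dump(address: int) -> tuple:
    """Return (row_address, column) for a byte address."""
    column = address % 16           # least-significant hex digit
    row_address = address - column  # remaining digits, with a trailing 0
    return row_address, column

row, col = locate_in_dump(0x00CF0C94)
print(f"row {row:08X}, column +{col:X}")  # prints row 00CF0C90, column +4
```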
<P>To the right we can see the actual ASCII characters for each byte. This is 
very useful in viewing string memory, as we shall see later. Right now none of 
the bytes are printable characters so the debugger merely prints periods.
<P>A word about SAL VM memory. Anything longer than a byte is arranged in 
"little endian" order. In other words, the little end (i.e., the least 
significant byte) is listed first and the most significant byte is listed last. 
With the SAL VM, endian order is determined by the host architecture, in our case 
Intel x86. If we were running on an HP or Sun workstation, the byte order would be 
"big endian".
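<P>To see the difference, the following Python sketch decodes the same four bytes in both orders. The bytes are the first four shown in the global memory dump; treating them as a single 32-bit value is an assumption made purely for illustration.

```python
# Decode the same four bytes as little-endian and as big-endian.
# The bytes come from the start of the global memory dump; reading them
# as one 32-bit value is an assumption for illustration only.
import sys

raw = bytes([0xD0, 0x0C, 0xCF, 0x00])

little = int.from_bytes(raw, "little")  # least significant byte first
big = int.from_bytes(raw, "big")        # most significant byte first

print(f"{little:08X}")  # prints 00CF0CD0
print(f"{big:08X}")     # prints D00CCF00
print(sys.byteorder)    # byte order of the machine running this sketch
```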
<P>The <TT>LGA</TT> instruction has not yet executed, so let's hit the space bar 
one more time and step through it. <TT>LGA</TT> leaves its result on the 
Expression Evaluation Stack (EES). Let's look at it now. Hit 'E'.
<P><PRE>        EES:
        -----------
        00 CF 0C 94
</PRE>The only exception to the endian rule is the EES, which is always the 
reverse of the host architecture. The reason is that the EES is a byte stack; 
items are stored on the EES one byte at a time regardless of the original data 
size. As a LIFO structure, the ordering ends up reversed.
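<P>A minimal Python sketch of this effect follows. It assumes that bytes are pushed in host memory order and that the debugger lists the stack from the top down; both are assumptions, and the names <TT>push_value</TT> and <TT>dump_top_down</TT> are hypothetical.

```python
# Hypothetical model of the EES as a byte stack (a Python list, top at the end).
# Assumes bytes are pushed in host memory order and displayed top first.

def push_value(ees: list, value: int, size: int, byteorder: str = "little") -> None:
    """Push a multi-byte value onto the stack one byte at a time."""
    for byte in value.to_bytes(size, byteorder):
        ees.append(byte)

def dump_top_down(ees: list) -> str:
    """Format the stack contents starting from the top."""
    return " ".join(f"{byte:02X}" for byte in reversed(ees))

ees = []
push_value(ees, 0x00CF0C94, 4)  # in little-endian memory: 94 0C CF 00
print(dump_top_down(ees))       # prints 00 CF 0C 94
```

Under these assumptions the output has the same byte order as the EES dump shown above.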
<P>The next instruction is <TT>TS</TT>. This stands for Test and Set. This 
instruction does three things. 
<OL>
  <LI>It pops an address from the EES. 
  <LI>It reads the byte at that address and pushes it onto the EES. 
  <LI>It writes the value of 1 to that address. </LI></OL>This all happens in one 
instruction cycle. If you have taken a class in operating systems you will 
understand how this instruction is significant for controlling threads and 
processes.
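<P>The three steps can be sketched in Python as follows. This is a simplified model, not the VM's code: the EES holds whole values rather than individual bytes, memory is a dictionary keyed by address, and the name <TT>exec_ts</TT> is hypothetical. The sketch also does not model the fact that the real instruction completes in a single cycle.

```python
# Hypothetical, simplified model of the TS (Test and Set) instruction.
# Assumptions: the EES is a list of whole values, memory is a dict of bytes,
# and atomicity is not modeled.

def exec_ts(memory: dict, ees: list) -> None:
    address = ees.pop()                  # 1. take the address from the EES
    ees.append(memory.get(address, 0))   # 2. push the byte found there
    memory[address] = 1                  # 3. write 1 to that address

memory = {0x00CF0C94: 0}
ees = [0x00CF0C94]
exec_ts(memory, ees)
print(ees, memory[0x00CF0C94])  # prints [0] 1
```

Running it a second time on the same address would leave 1 on the EES, because the flag has already been set.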
<P>The next instruction, <TT>SJPZ</TT>, is a short jump to address 00CF0D7Fh that 
is taken if the byte at the top of the EES is zero. The instruction after that is 
<TT>RTN</TT>. Together, these four instructions check whether this module's main 
procedure has already been executed. If so, the VM returns immediately. In our 
case, it keeps going.
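<P>Taken together, the four instructions behave like a run-once guard. The Python sketch below mirrors that control flow under simplifying assumptions: memory is a dictionary, and the names <TT>run_module_main</TT> and <TT>body</TT> are hypothetical stand-ins for the rest of the procedure.

```python
# Hypothetical sketch of the run-once guard formed by LGA / TS / SJPZ / RTN.
# Memory is modeled as a dict; `body` stands in for the rest of the procedure.

def run_module_main(memory: dict, g: int, body) -> bool:
    """Run body only on the first call for this module; return True if it ran."""
    flag_address = g + 4                       # LGA 00000004
    already_run = memory.get(flag_address, 0)  # TS: read the old flag...
    memory[flag_address] = 1                   # ...and set it to 1
    if already_run != 0:                       # SJPZ not taken, so RTN executes
        return False
    body()                                     # SJPZ taken: continue past RTN
    return True

memory = {}
log = []
print(run_module_main(memory, 0x00CF0C90, lambda: log.append("init")))  # True
print(run_module_main(memory, 0x00CF0C90, lambda: log.append("init")))  # False
print(log)  # prints ['init']
```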
<P>Here is the next group of instructions. <PRE>      00CF0D7F: LSTA  00000000
      00CF0D84: SYS   00
</PRE>Hit the space bar a couple of times until the current instruction is 
<TT>LSTA</TT>. The <TT>LSTA</TT> instruction stands for Load STring Address. The 
address is loaded to the EES. Tap the space bar one more time. If we take 
another look at the EES, we can see four bytes. We can view the string area for 
this module by hitting 'S'. This is what we get. <PRE>      String memory at 00CF0D00
      Length: 27
       Address: +0 +1 +2 +3 +4 +5 +6 +7   +8 +9 +A +B +C +D +E +F | ASCII text:
      ------------------------------------------------------------+------------------
      00CF0D00  <B>45 6E 74 65 72 20 61 20 - 6E 75 6D 62 65 72 3A 20</B> | <B>Enter a  number: </B>
      00CF0D10  <B>00</B> 0A 73 20 2B 20 35 20 - 3D 20 00                | <B>.</B>.s + 5  = .
</PRE>This gives us a dump of the contents of string memory for this module. 
Starting at address 00CF0D00 we can see that the string being pointed to is 
