                   |            |           |          |    |  [Data]  |
                   |            |           |          |    |          |
                   |            |           |__________|    |__________|
                   |            |             External
          [PC]     |            |              Global           EES
           |    ___|___   ______|__________   Data Area
           V   /       \ /                 \
   -----+-----+----+----+----+----+----+----+-----
    ... | LEx | modnum  |    dword offset   | ...
   -----+-----+----+----+----+----+----+----+-----
      (LEx or SEx)

      <B>Figure {LSEXTF}</B>  A representation of the way the LEx and SEx instructions
      operate.
</PRE>These instructions are somewhat hard to grasp; visualizing how they work 
takes a certain amount of abstract thought. Of all the load/store op-codes, they 
are among the more complex.
<P>
<H3>B.1.5 Using a pointer on the EES</H3><!-------------------------------------------------------------------------------->The 
VM may also work with addresses directly. All of the segments in memory reside 
in the same address space. That means that even though some data is referenced 
by G and other data is referenced by L, and still other data is referenced by a 
lookup in the MAT, it is all still contained within the same address space.
<P>In addition to working with data indirectly through G or L, the VM may also 
work with a pure address and access data directly. How addresses can be 
computed is discussed in the next section. Typically, when loading or storing, 
the pointer is consumed, i.e., popped off the EES. The best way to 
keep a pointer while it is being worked with is to make a copy.
<P>The op-codes to load and store via a pointer take on the same familiar 
general format: <PRE>        Num  Instruction 
      =========================================
         1C   LSB <I>&lt;u32 offset&gt;</I>
         1D   LSW <I>&lt;u32 offset&gt;</I>
         1E   LSD <I>&lt;u32 offset&gt;</I>
         1F   LSQ <I>&lt;u32 offset&gt;</I>
         2C   SSB <I>&lt;u32 offset&gt;</I>
         2D   SSW <I>&lt;u32 offset&gt;</I>
         2E   SSD <I>&lt;u32 offset&gt;</I>
         2F   SSQ <I>&lt;u32 offset&gt;</I>

      <B>Table {LSEEST}</B>  The instructions to load and store data using a pointer
      on the EES.
</PRE>Realize that the S in these op-codes refers to the EES, and not to the S 
register. The S register is used only as a placeholder, marking the end of 
the local data area. There are no instructions to load and store relative to the 
S register. All of the LSx and SSx op-codes refer to the EES. Keep this 
in mind, or it can become very confusing.
<P><TT>LSx</TT> works by popping a 32-bit value from the EES, which is used as a 
base pointer. The immediate offset is then added to the pointer, and the data 
at that address in memory is fetched and pushed onto the EES. The <TT>SSx</TT> 
instructions do the opposite. Notice that the data to be stored is <I>at the 
top</I> of the EES, and the pointer to where it gets stored is underneath. This, 
too, must be remembered, or things can get ugly.
<P><PRE>                                              |   ...    |
                                              |----------|
                            [Base Address]--&gt; |          |
                          (popped from EES)|  |          |
                                           |  |          |
                                           |  |          |
                                           |  |          |
                                           V  |----------|
                         ,-------- Offset --&gt; |  [Data] --------------.
                         |                    |----------|            |
          [PC]           |                    |          |            |
           |    _________|________            |          |       |    V     |
           V   /                  \           |          |       |  [Data]  |
   -----+-----+----+----+----+----+-----      |          |       |          |
    ... | LSx |    dword offset   | ...       |   ...    |       |__________|
   -----+-----+----+----+----+----+-----         Memory              EES

      <B>Figure {LSEESF}</B>  A representation of the way the LSx instructions
      operate.
</PRE><!-------------------------------------------------------------------------------->
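<P>The pop order of LSx and SSx can be modelled in a few lines of C. This is a minimal sketch, not SALVM source: it assumes a single flat byte array for memory (all segments share one address space), a byte-oriented EES, and host byte order for multi-byte values. The helper names <TT>op_ls</TT> and <TT>op_ss</TT> are hypothetical, and the size <TT>n</TT> stands for the B, W, D or Q suffix (1, 2, 4 or 8 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one flat address space shared by every segment,
   and an EES that holds raw bytes with no type information. */
static uint8_t mem[1024];
static uint8_t ees[256];
static size_t  ees_top = 0;   /* bytes currently on the EES */

static void ees_push(const void *src, size_t n) {
    assert(ees_top + n <= sizeof ees);
    memcpy(&ees[ees_top], src, n);
    ees_top += n;
}

static void ees_pop(void *dst, size_t n) {
    assert(ees_top >= n);
    ees_top -= n;
    memcpy(dst, &ees[ees_top], n);
}

/* LSx <u32 offset>: pop a 32-bit base pointer, add the offset, and push
   the n bytes found at that address (n = 1, 2, 4 or 8 for B, W, D, Q). */
static void op_ls(uint32_t offset, size_t n) {
    uint32_t base;
    ees_pop(&base, sizeof base);
    assert((size_t)base + offset + n <= sizeof mem);
    ees_push(&mem[base + offset], n);
}

/* SSx <u32 offset>: the data is on top and the pointer is underneath,
   so pop the data first, then the pointer, then store. */
static void op_ss(uint32_t offset, size_t n) {
    uint8_t data[8];               /* widest operand: a qword */
    uint32_t base;
    assert(n <= sizeof data);
    ees_pop(data, n);
    ees_pop(&base, sizeof base);
    assert((size_t)base + offset + n <= sizeof mem);
    memcpy(&mem[base + offset], data, n);
}
```

In this model, <TT>op_ss(8, 4)</TT> plays the role of <TT>SSD 00000008</TT>: the pointer has to be pushed before the value, or the store goes to the wrong place.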
<H3>B.1.6 Computing Pointers</H3>There is a group of instructions specifically 
for computing pointers. Again, the only registers within the SALVM are base 
pointers to various segments in memory. While the values of these registers may 
be read, they may never be set, at least not directly. The instructions are 
listed in table {COMPPTR}. <PRE>       Num  Instruction
      ======================================
        08   LLA <I>&lt;u32 offset&gt;</I>
        09   LGA <I>&lt;u32 offset&gt;</I>
        0A   LSA <I>&lt;u32 offset&gt;</I>
        0B   LEA <I>&lt;u16 modnum&gt;</I> <I>&lt;u32 offset&gt;</I>
      
      <B>Table {COMPPTR}</B>  A description of the op-codes and their machine codes
</PRE>All of these instructions take a 32-bit unsigned offset as an immediate 
parameter. The <TT>LEA</TT> instruction also takes an immediate unsigned 16-bit 
parameter, designating an entry in the MAT from which to extract an external G. 
Each of these instructions computes an address and leaves it on the EES. 
Let's say a programmer wants to get a pointer to a variable in global memory. 
The programmer will know the address of the variable only as an offset from G. 
To get a pure pointer to the variable, the programmer needs to add 
that offset to G, which is exactly what the <TT>LGA</TT> instruction does. 
If the variable were stored at offset 2Ah from G, then 
the programmer would use the instruction <PRE>      LGA  0000002A
</PRE>If the programmer wants only the value of one of these registers, an 
offset of zero does the job. This example 
retrieves the value of L. <PRE>      LLA  00000000
</PRE>As stated before, all of these instructions compute an address and 
leave it on the EES. The <TT>LSA</TT> instruction is particularly useful, in 
that it adds a signed 32-bit quantity to another 32-bit quantity (a 
pointer) on the top of the EES. This makes it well suited to walking down through 
several nested levels of records. Either an <TT>LGA</TT> or an <TT>LLA</TT> will 
get the record's base address onto the EES, and then one or more successive 
<TT>LSA</TT> instructions will take the pointer to the appropriate offset.
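<P>The address arithmetic behind LLA, LGA and LSA can be sketched the same way. The register values and record offsets below are made up for illustration; only the rule "push base + offset" (and, for LSA, "pop the pointer first") comes from the text. Because C's <TT>uint32_t</TT> addition wraps modulo 2^32, an offset such as FFFFFFFCh moves the pointer back by 4, which is one possible way to reconcile the signed offset described for LSA with the u32 field in table {COMPPTR}; that reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register values. Programs may read these base pointers
   through LGA/LLA but can never set them directly. */
static const uint32_t G = 0x00001000;   /* this module's global data */
static const uint32_t L = 0x00008000;   /* current local data area */

static uint8_t ees[64];
static size_t  ees_top = 0;

static void push_u32(uint32_t v) {
    assert(ees_top + sizeof v <= sizeof ees);
    memcpy(&ees[ees_top], &v, sizeof v);
    ees_top += sizeof v;
}

static uint32_t pop_u32(void) {
    uint32_t v;
    assert(ees_top >= sizeof v);
    ees_top -= sizeof v;
    memcpy(&v, &ees[ees_top], sizeof v);
    return v;
}

static void op_lga(uint32_t off) { push_u32(G + off); }   /* LGA: G + offset */
static void op_lla(uint32_t off) { push_u32(L + off); }   /* LLA: L + offset */

/* LSA: pop a pointer, add the offset, push the result. uint32_t addition
   wraps modulo 2^32, so an offset of 0xFFFFFFFC acts like -4. */
static void op_lsa(uint32_t off) { push_u32(pop_u32() + off); }
```

Walking from a hypothetical global record at G+40h to a field 8 bytes into it, and then 4 bytes further into a nested record, takes one LGA and two LSA instructions and leaves G+4Ch on the EES.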
<P>The <TT>LEA</TT> instruction is used to get a pointer to a global variable in 
another module. The first immediate parameter is a 16-bit number designating an 
external module. The instruction will then extract the G for that module (using 
the MAT), and add the 32-bit offset, in order to get the global variable's 
pointer. 
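<P>LEA differs from LGA only in where the base comes from: the G of another module, fetched from the MAT. The sketch below models the MAT as a small array indexed by module number (a hypothetical layout; the real MAT format is not described here) and also shows the LEx load from figure {LSEXTF}, which performs the same address computation but pushes the data found there instead of the address. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t mem[0x4000];   /* one address space for every module */

/* Hypothetical MAT: entry i holds the G register of module i. */
static const uint32_t mat[] = { 0x0000, 0x1000, 0x2000 };
#define MAT_ENTRIES (sizeof mat / sizeof mat[0])

static uint8_t ees[64];
static size_t  ees_top = 0;

static void ees_push(const void *src, size_t n) {
    assert(ees_top + n <= sizeof ees);
    memcpy(&ees[ees_top], src, n);
    ees_top += n;
}

static uint32_t ees_pop_u32(void) {
    uint32_t v;
    assert(ees_top >= sizeof v);
    ees_top -= sizeof v;
    memcpy(&v, &ees[ees_top], sizeof v);
    return v;
}

/* Shared by LEA and LEx: the external module's G plus the offset. */
static uint32_t external_address(uint16_t modnum, uint32_t offset) {
    assert(modnum < MAT_ENTRIES);
    return mat[modnum] + offset;
}

/* LEA <u16 modnum> <u32 offset>: push the computed address. */
static void op_lea(uint16_t modnum, uint32_t offset) {
    uint32_t addr = external_address(modnum, offset);
    ees_push(&addr, sizeof addr);
}

/* LEx <u16 modnum> <u32 offset>: push the n bytes stored at that address. */
static void op_le(uint16_t modnum, uint32_t offset, size_t n) {
    uint32_t addr = external_address(modnum, offset);
    assert((size_t)addr + n <= sizeof mem);
    ees_push(&mem[addr], n);
}
```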
<H3>B.1.7 Accessing Other Registers and Memory Segments</H3><!-------------------------------------------------------------------------------->There 
is a series of additional instructions for retrieving various other addresses. 
These are listed in table {AUXADDR}. <PRE>       Num  Instruction
      ==================================
        20   LPCA <I>&lt;s32 offset&gt;</I>
        21   LSTA <I>&lt;s32 offset&gt;</I>
        22   LVTA <I>&lt;u16 modnum&gt;</I> <I>&lt;u16 procnum&gt;</I>
        23   LEXA (no offset)

      <B>Table {AUXADDR}</B>  The instructions for retrieving other addresses.
</PRE><TT><B>LPCA</B></TT>. The <TT>LPCA</TT> instruction is used to get the 
current value of the PC. It also adds a signed 32-bit offset to that value. By 
definition, <PRE>      LPCA  00000000
</PRE>returns the address of the next instruction in the stream. This 
instruction is useful for computing a jump address. In the SAL compiler, it is 
used for unwinding the stack during exception handling. It has other uses, too.
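<P>The arithmetic behind LPCA is simple enough to check by hand. The sketch assumes, based on the instruction layouts drawn in figures {LSEXTF} and {LSEESF}, a one-byte op-code followed by the 4-byte offset, so that an LPCA instruction occupies five bytes; the function name <TT>lpca_result</TT> is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: a one-byte op-code followed by a 4-byte offset,
   so one LPCA instruction occupies 5 bytes. */
enum { LPCA_SIZE = 1 + 4 };

/* Value pushed by an LPCA that starts at address pc: the address of the
   next instruction plus the signed offset, wrapping like a 32-bit pointer. */
static uint32_t lpca_result(uint32_t pc, int32_t offset) {
    uint32_t next = pc + LPCA_SIZE;
    return next + (uint32_t)offset;
}
```

Under that encoding, an LPCA at address 100h with offset zero yields 105h, and an offset of -5 yields the address of the LPCA instruction itself.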
<P><TT><B>LSTA</B></TT>. This instruction is used for fetching the address of a 
string constant. Only the string segment for the local module may be accessed; 
strings belonging to external modules may not. The string segment was 
designed exclusively for storing string constants. String constants are 
never referred to symbolically, other than to copy their contents into some 
other area of memory that <I>is</I> symbolically accessed. Additionally, notice 
that there are no <TT>LSTx</TT> or <TT>SSTx</TT> instructions. Again, the only 
data stored in the string segment is string <I>constants</I>.
<P>As an example of this instruction's usage, suppose the programmer wants to print 
the string <TT>"Hello, world!"</TT> to the screen. The string will be stored in 
the local string segment at an offset that is known at compile time. For 
example, if the offset were 1A4h, the programmer would use <PRE>      LSTA  000001A4
</PRE>The address of the string would be loaded onto the EES, and the 
programmer would then pass that address to a routine that handles strings.
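<P>A small C model of the Hello-world example follows. It assumes a hypothetical string-segment base address of 3000h and NUL-terminated strings (neither is specified here); <TT>load_string</TT> stands in for whatever places constants in the segment at load time, and <TT>string_at</TT> plays the part of the routine that handles strings. Apart from the loader stand-in, nothing in the sketch writes to the string segment, mirroring the absence of LSTx and SSTx instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t mem[0x4000];
/* Hypothetical base address of the local module's string segment. */
static const uint32_t STR_SEG = 0x3000;

/* LSTA <s32 offset>: the address of a constant in the local string segment. */
static uint32_t op_lsta(int32_t offset) {
    return STR_SEG + (uint32_t)offset;
}

/* Stand-in for the loader: place a constant at its compile-time offset.
   NUL termination is an assumption of this sketch. */
static void load_string(uint32_t offset, const char *s) {
    size_t len = strlen(s) + 1;
    assert((size_t)STR_SEG + offset + len <= sizeof mem);
    memcpy(&mem[STR_SEG + offset], s, len);
}

/* Stand-in for a routine that handles strings: it receives only the address. */
static const char *string_at(uint32_t addr) {
    return (const char *)&mem[addr];
}
```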
<P><TT><B>LVTA</B></TT>. This instruction computes the address of a virtual 
table. Its use is related to classes, and is discussed at great length in 
chapter {VIRTUAL FUNCTIONS}.
<P><TT><B>LEXA</B></TT>. This instruction is used to 
load the starting address of stack memory (i.e., the start of the segment into 
which L and S point). By definition, the first 32 bits of this segment store a 
pointer to the exception stack. The exception stack is not truly a part of the 
virtual machine's architecture; however, a pointer to the current thread's 
exception stack needs to be stored at a consistent address. The <TT>LEXA</TT> 
instruction exists so that this pointer may be properly set, or its value 
retrieved.
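<P>Given the LSx and SSx rules from section B.1.5, reading the exception-stack pointer comes down to <TT>LEXA</TT> followed by <TT>LSD 00000000</TT>, and setting it to <TT>LEXA</TT>, the new value, and <TT>SSD 00000000</TT>. The C sketch below elides the EES and uses a made-up stack-segment address of 8000h; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t mem[0x10000];
/* Hypothetical start of the stack segment (the segment L and S point into). */
static const uint32_t STACK_SEG = 0x8000;

/* LEXA: the starting address of stack memory (EES elided: returned directly). */
static uint32_t op_lexa(void) { return STACK_SEG; }

/* LEXA; LSD 00000000 -> read the exception-stack pointer. */
static uint32_t get_exception_stack(void) {
    uint32_t p;
    memcpy(&p, &mem[op_lexa()], sizeof p);
    return p;
}

/* LEXA; <push new value>; SSD 00000000 -> set the exception-stack pointer. */
static void set_exception_stack(uint32_t p) {
    memcpy(&mem[op_lexa()], &p, sizeof p);
}
```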
<P>
<H3>B.1.8 Using These Instructions Together</H3><!-------------------------------------------------------------------------------->Let's 
look at a few examples of using these instructions together. First, we will cover 
assignment to a global variable, since it is the easiest. Suppose we have a 
variable called <TT>X</TT>, and we want to initialize it to zero. There are some 
things that we have to consider. First, since the architecture that we are 
working with is stack-based, we need to consider in which order the items need 
to appear on the EES. The second thing that we need to consider is the address 
at which the variable is stored. This is usually known at compile time. Third, 
we need to know the size of the variable. This is also known at compile time.
<P>If we want to assign a value to a variable, there are two ways to do it. Each 
depends on the architecture and on the way the instructions work. The 
easiest is to simply store the value at the address of the variable. Since 
there is no instruction to store an immediate value, we need to load it onto the 
EES first. We can accomplish this through a load-immediate instruction. Then we 
tell the VM to store the proper quantity of bytes at the proper offset from the 
proper base register. We always know beforehand the offsets of all variables, and 
the segments where they reside. Supposing the variable is a word at offset 14h from 
G (it's in global memory), and we want to set it to zero, we 
would issue an instruction sequence like this: <PRE>      LIW 0000       ; Put the word containing zero on the EES
      SGW 00000014   ; Store the word at the top of the EES at offset 14h from G
</PRE>Very simple. The second way involves getting a pointer to the variable, 
and uses the <TT>SSW</TT> instruction. We know that these instructions take the 
data from the top of the EES and expect the address underneath it, so we need to make 
sure that our data is on the stack in the correct order. Remember, the EES does 
not retain <I>any</I> type information whatsoever. It merely works with 
quantities of bytes. We would issue a sequence of instructions like this:
<P><PRE>      LGA 00000014   ; Put the address of the variable on the EES
      LIW 0000       ; Put a word containing the value zero on the EES
      SSW 00000000   ; Store the word at the top of the EES at the address 
                     ;   underneath (no offset).
</PRE>This method is a little less straightforward. However, it lends itself 
very nicely to code generation in the compiler. In fact, this is the method 
discussed in this text.
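<P>Both sequences can be checked against each other with a tiny C model. It assumes a flat memory array, a made-up value of 1000h for G, and host byte order; LIW, SGW, LGA and SSW are implemented only as far as the text describes them, and the <TT>op_</TT> function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t mem[0x2000];
static const uint32_t G = 0x1000;   /* hypothetical value of G */
static uint8_t ees[64];
static size_t  ees_top = 0;

static void push(const void *src, size_t n) {
    assert(ees_top + n <= sizeof ees);
    memcpy(&ees[ees_top], src, n);
    ees_top += n;
}

static void pop(void *dst, size_t n) {
    assert(ees_top >= n);
    ees_top -= n;
    memcpy(dst, &ees[ees_top], n);
}

static void store_word(uint32_t addr, uint16_t w) {
    assert((size_t)addr + sizeof w <= sizeof mem);
    memcpy(&mem[addr], &w, sizeof w);
}

/* LIW <u16>: push an immediate word. */
static void op_liw(uint16_t w) { push(&w, sizeof w); }

/* LGA <u32 offset>: push G + offset. */
static void op_lga(uint32_t off) { uint32_t a = G + off; push(&a, sizeof a); }

/* SGW <u32 offset>: pop a word and store it at G + offset. */
static void op_sgw(uint32_t off) {
    uint16_t w;
    pop(&w, sizeof w);
    store_word(G + off, w);
}

/* SSW <u32 offset>: pop the word (on top), then the pointer beneath it. */
static void op_ssw(uint32_t off) {
    uint16_t w;
    uint32_t p;
    pop(&w, sizeof w);
    pop(&p, sizeof p);
    store_word(p + off, w);
}
```

Either sequence leaves the word at G+14h equal to zero and the EES empty. The second form is presumably preferred for code generation because the target address can come from any pointer computation, not only from a fixed offset from G.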
<P>
<H2>B.2 Integer and Floating Point Arithmetic Instructions</H2><!-------------------------------------------------------------------------------->The 
arithmetic instructions are all fairly straightforward. They take no immediate 
parameters; they consume one or two equally sized operands from the EES, perform an 
operation, and push the result onto the EES in the same quantity of bytes. The 
VM has instructions for performing arithmetic on both integer and real data. 
They are listed in table {ARITHT}. <PRE>        Data Type(s)        bits add    sub    mul    div    mod    trunc   neg    abs
      ====================================================================================
         (un)signed byte    8    ADDB   SUBB   MULB   DIVB   MODB           NEGB   ABSB
         (un)signed word    16   ADDW   SUBW   MULW   DIVW   MODW           NEGW   ABSW
         (un)signed dword   32   ADDD   SUBD   MULD   DIVD   MODD           NEGD   ABSD
         (un)signed qword   64   ADDQ   SUBQ   MULQ   DIVQ   MODQ           NEGQ   ABSQ
         single precision   32   FADDS  FSUBS  FMULS  FDIVS  FMODS  FTRNCS  FNEGS  FABSS
         double precision   64   FADDD  FSUBD  FMULD  FDIVD  FMODD  FTRNCD  FNEGD  FABSD
         tenbyte precision  80   FADDT  FSUBT  FMULT  FDIVT  FMODT  FTRNCT  FNEGT  FABST
         quad precision     128  FADDQ  FSUBQ  FMULQ  FDIVQ  FMODQ  FTRNCQ  FNEGQ  FABSQ

      <B>Table {ARITHT}</B>  This table shows all of the instructions that are used to
      perform arithmetic operations.
</PRE>Integers within the SALVM may be either signed two's complement or 
unsigned, and can range in size from 8 bits to 64 bits. Floating point numbers 
can range from single precision (32 bits) to quad precision (128 bits).
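<P>The pop/operate/push cycle described above, together with the operand-ordering rule spelled out in the next paragraph (the second operand is on top), can be sketched in C for a few dword instructions. The EES model and function names are illustrative assumptions. Unsigned 32-bit arithmetic is used because, on the two's complement hosts the integral instructions assume, signed and unsigned addition, subtraction and multiplication produce the same bits; keeping the low 32 bits of a MULD product is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t ees[64];
static size_t  ees_top = 0;

static void push_d(uint32_t v) {
    assert(ees_top + sizeof v <= sizeof ees);
    memcpy(&ees[ees_top], &v, sizeof v);
    ees_top += sizeof v;
}

static uint32_t pop_d(void) {
    uint32_t v;
    assert(ees_top >= sizeof v);
    ees_top -= sizeof v;
    memcpy(&v, &ees[ees_top], sizeof v);
    return v;
}

/* SUBD: step 1 pops the operands (the second operand is on top, so it
   comes off first), step 2 computes, step 3 pushes the dword result. */
static void op_subd(void) {
    uint32_t b = pop_d();   /* second operand */
    uint32_t a = pop_d();   /* first operand */
    push_d(a - b);
}

/* MULD: same three steps; the low 32 bits of the product are kept. */
static void op_muld(void) {
    uint32_t b = pop_d();
    uint32_t a = pop_d();
    push_d(a * b);
}

/* NEGD: a one-operand instruction (two's complement negation). */
static void op_negd(void) {
    push_d(0u - pop_d());
}
```

Pushing 10 and then 3 before SUBD leaves 7; pushing them the other way round leaves FFFFFFF9h, the two's complement encoding of -7, which is why the push order must follow the order of the expression.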
<P>The integral instructions assume a two's complement host architecture. Most of 
the instructions take two operands; <TT>FTRNCx</TT>, <TT>NEGx</TT> and 
<TT>FNEGx</TT>, and <TT>ABSx</TT> and <TT>FABSx</TT> take one operand. All of the 
floating point instructions start with an F, and none of the integral 
instructions do. The letter at the end of the instruction tells what size of 
data it works with. For instance, <B>MULD</B> is an integral instruction that 
works on 32-bit quantities of data. It requires that two dwords be on the EES. It 
will remove them, multiply them together, and then push the result as a dword 
back onto the EES. All of these instructions work in a three-step fashion. In 
step one, the appropriate amount of data is removed from the EES (either one or 
two operands). In step two, the operation is performed, and in step three the 
result is pushed back onto the EES. All two-operand instructions require that 
the second operand be at the top of the EES, and not the first. Thus, if we want 
to add two single precision floats together, say 3.5 + 2.6, the numbers need to 
be on the EES in reverse order. This is easy to remember, as long as we push the 
operands onto the EES in the order that they appear in the expression. Thus, we 
would first push 3.5, and then 2.6. See figure {EESORD}. <PRE>         3.5 + 2.6      3.5 + 2.6
          ^                    ^

         |  3.5  |      |  2.6  |
         |       |      |  3.5  |
         |  ...  |      |  ...  |
         |_______|      |_______|
            EES            EES