<!-- Last-modified: Wed, 30 Aug 1995 21:21:33 GMT -->
<html><head><title> Lecture notes - Chapter 8 - MAL and Registers</title></head>
<BODY>
<h1> Chapter 8 -- MAL and registers</h1>
<pre>
REGISTERS and MAL
-----------------

An introduction to the subject of registers -- from a motivational
point of view.

This lecture is an attempt to explain a bit about why computers
are designed (currently) the way they are.  Try to remember that
speed of program execution is an important goal.  Desire for increased
speed drives the design of computer hardware.

The impediment to speed (currently):  transferring data to and from
memory.

look at a SAL instruction:
    add  x, y, z

    -x, y, and z must all be addresses of data in memory.
    -each address is 32 bits.
    -so, this instruction requires more than 96 bits.

    if each read from memory delivers 32 bits of data,
    then it takes a lot of reads before this instruction can
    be completed.
       at least 3 for instruction fetch
       1 to load y
       1 to load z
       1 to store x

       that's 6 transactions with memory for 1 instruction!

How bad is the problem?
  Assume that a 32-bit 2's complement addition takes 1 time unit.
  A read/write from/to memory takes about 10 time units.

  So we get
     fetch instruction:  30 time units
     decode               1 time unit
     load y              10 time units
     load z              10 time units
     add                  1 time unit
     store x             10 time units
     ---------------------------------
       total time:       62 time units

     60/62 = 96.8 % of the time is spent doing memory operations.

what do we do to reduce this number?
  1. transfer more data at one time
     if we transfer 2 words at one time, then it only takes 2 reads
     to get the instruction.  There is no savings in loading/storing
     the operands.  And, an extra word worth of data is transferred
     for each load, a waste of resources.
     So, this idea would give a saving of 1 memory transaction.

  2. modify instructions such that they are smaller.
     This was common on machines from more than a decade ago.
     Here's how it works:

     SAL implies what is called a 3-address machine.  Each
     arithmetic type instruction contains 3 operands, 2 for sources
     and 1 for the destination of the result.

     To reduce the number of operands (and thereby reduce the number
     of reads for the instruction fetch), develop an instruction
     set that uses 2 operands for arithmetic type instructions.
     (Called a 2-address machine.)

     Now, instead of     add  x, y, z

     we will have        load x, z     (puts the value of z into x)
                         add  x, y     ( x <- x + y )

     so, arithmetic type instructions always use one of the operands
     as both a source and a destination.

     There are a couple of problems with this approach:
     - where 1 instruction was executed before, 2 are now executed.
       It actually takes more memory transactions to execute this
       sequence!
          at least 2 to fetch each instruction
          1 for each of the load/storing of the operands themselves.

          that is 8 reads/writes for the same sequence.

     So, allow only 1 operand -- called a 1-address format.

     now, the instruction     add  x, y, z    will be accomplished
     by something like

          load  z
          add   y
          store x

     to facilitate this, there is an implied word of storage
     associated with the ALU.  All results of instructions
     are placed into this word -- called an ACCUMULATOR.

     the operation of the sequence:
        load z  -- place the contents of address z into the accumulator
                   (sort of like if you did   move accumulator, z   in SAL)
        add y   -- implied operation is to add the contents of the
                   accumulator with the operand, and place the result
                   back into the accumulator.
        store x -- place the contents of the accumulator into the
                   location specified by the operand.

     Notice that this 1-address instruction format implies the use
     of a variable (the accumulator).

     How many memory transactions does it take?
        2 -- (load)  at least 1 for instruction fetch, 1 for read of z
        2 -- (add)   at least 1 for instruction fetch, 1 for read of y
        2 -- (store) at least 1 for instruction fetch, 1 for write of x
       ---
        6    the same as for the 3 address machine -- no savings.
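
     As a rough check on the counts so far, the tallies can be
     reproduced with a short Python sketch.  The function and constant
     names are illustrative assumptions, not part of SAL or MAL; the
     costs themselves (10 time units per memory access, 1 per add,
     1 per decode) are the ones assumed earlier in these notes.

```python
# Hypothetical cost model for the figures quoted in these notes.
# All names here are illustrative; none of them belong to SAL or MAL.
MEMORY_ACCESS = 10   # time units per read/write to memory (from the notes)
ALU_OP = 1           # time units for a 32-bit add (from the notes)
DECODE = 1           # time units to decode an instruction (from the notes)


def three_address_add():
    """Tally 'add x, y, z' on the 3-address machine."""
    fetches = 3            # at least 3 reads to fetch the instruction
    operand_accesses = 3   # two operand reads plus one result write
    transactions = fetches + operand_accesses
    memory_time = transactions * MEMORY_ACCESS
    total_time = memory_time + DECODE + ALU_OP
    return transactions, memory_time, total_time


def one_address_sequence(instructions):
    """Tally a 1-address sequence: 1 fetch + 1 operand access each."""
    return 2 * len(instructions)


print(three_address_add())                                   # (6, 60, 62)
print(one_address_sequence(["load z", "add y", "store x"]))  # 6
```

     Both machines come out at 6 memory transactions for this single
     add, which matches the "no savings" conclusion.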

     BUT, what if the operation following the add was something like
          div x, x, 3
     then, the value for x is already in the accumulator, and the
     code on the 1 address machine could be
          load  z
          add   y
          div   3
          store x
     there is only 1 extra instruction (2 memory transactions) for this
     whole sequence!
         On the 3-address machine:  12 transactions (6 for each instr.)
         On the 1-address machine:   8 transactions (2 for each instr.)

     REMEMBER this:  the 1 address machine uses an extra word of storage
                     that is located in the CPU.

                     the example shows a savings in memory transactions
                     when a value is re-used.

  3. shorten addresses.  This restricts where variables can be placed.
     First, make each address be 16 bits (instead of 32).  Then
          add x, y, z
     requires 2 words for instruction fetch.

     Shorten addresses even more . . . make them each 5 bits long.
     Problem:  that leaves only 32 words of data for operand storage.
     So, use extra move instructions that allow moving data from
     a 32 bit address to one of these special 32 words.

     Then, the add can fit into 1 instruction.

NOW, put a couple of these ideas together.

Use of storage in CPU (accumulator) allowed re-use of data.
It's easy to design -- put a bunch of storage in the CPU --
call them REGISTERS.  How about 32 of them?  Then, restrict
arithmetic instructions to only use registers as operands.

   add  x, y, z

   becomes something more like

   load  reg10, y
   load  reg11, z
   add   reg12, reg11, reg10
   store x, reg12

presuming that the values for x, y, and z can/will be used again,
the load operations take relatively less time.

The MIPS R2000 architecture does this.  It has
  1. 32 32-bit registers.
  2. Arithmetic/logical instructions use register values as operands.

A set up like this where arith/logical instr. use only registers
for operands is called a LOAD/STORE architecture.

A computer that allows operands to come from main memory is often
called a MEMORY TO MEMORY architecture, although that term is not
universal.

Load/store architectures are common today.  They have the advantages
  1. instructions can be fixed length (and short)
  2. their design allows (easily permits) pipelining, making load/store
     architectures faster
     (More about pipelining at the end of the semester)

MAL
---
discussing some of the details of the MIPS architecture, and how
to write assembly language.

MIPS assembly language (or at least MAL) looks a lot like SAL,
except that operands are now in registers.

To reference a register as an operand, use the syntax
     $x,   where x is the number of the register you want.

Some limitations on the use of registers.
Due to conventions set by the simulator, certain registers are used
for special purposes.  It is wise to avoid the use of those registers.

   $0      is 0 (use as needed)
   $1      is used by the assembler (the simulator in our case)
           -- don't use it.
   $2-7    are used by the simulator -- don't use them until you know
           what they are for and how they are used.
   $26-27  Used to implement the mechanism for calling special
           procedures that do I/O and take care of other error
           conditions (like overflow)
   $29     is a stack pointer -- you are automatically allocated a
           stack (of words), and the $sp is initialized to contain
           the address of the empty word at the top of the stack at
           the start of any program.

On to some MAL instructions.  Here are descriptions and samples
of only some instructions.  There are far too many to be able to
go over each one in detail.

Some sample info for all the examples:

   hex address   hex contents   (opt) assembly lang.
   00002000      0011aaee       c1:  .word  0x0011aaee
   00002004      ????????       c2:  .space 12
   00002008      ????????
   0000200c      ????????
   00002010      00000016       c4:  .word  22
   00002014      000000f3       c5:  .word  0x000000f3

Load/Store
----------

   la  rt, label     # load address
       place the address assigned to label into the register rt.

       example:   la $9, c1
                  $9 gets the value 0x00002000

   lw  rt, label     # load word
       place the word at address label into the register rt.

   lw  rt, (rb)      # load word
       place the word at address (rb) into the register rt.

   lw  rt, x(rb)     # load word
       place the word at address x + (rb) into the register rt.

       example:   lw $10, c1
                  $10 gets the value 0x0011aaee

   lb  rt, label     # load byte
       place the byte at address label into the least significant
       byte of register rt, and sign extend the value to the rest
       of the register.

   lb  rt, (rb)      # load byte
       place the byte at address (rb) into the least significant
       byte of register rt, and sign extend the value to the rest
       of the register.

   lb  rt, x(rb)     # load byte
       place the byte at address x + (rb) into the least significant
       byte of register rt, and sign extend the value to the rest
       of the register.

       example:   lb $10, c1
                  on a little endian machine:
                  presuming $9 contains the value 0x00002000,
                  $10 gets the value 0xffffffee

   sw  rt, label     # store word
       write the contents of register rt to address label

   sw  rt, (rb)      # store word
       write the contents of register rt to address (rb)

   sw  rt, x(rb)     # store word
       write the contents of register rt to address x + (rb)

       example:   sw $10, c2
                  the value 0xffffffee is placed into the word of
                  memory at address 0x00002004

Branch
------
all the branch instructions for MAL look just like the ones from
SAL!  (on purpose).  Just be sure that you use one that exists!
The only difference worth mentioning is that the operands are
required to be in registers.

       example:   beq $20, $23, branchtarget

                  Compare the values in registers 20 and 23.  If the
                  values are the same, then load the PC with the
                  address branchtarget.  If not, then do nothing
                  and fetch the next instruction.

   j  target         # jump target
       identical in effect to   b target,   but the implementation
       and execution are different (with respect to the machine code).
       A branch specifies an offset to be added to the current value
       of the PC.  A jump gives as many bits of address as possible,
       and the remaining ones come from the PC (no addition).

Arithmetic/Logical
------------------