http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/arch.features.html
 instruction   time units
               1   2   3   4   5   6   7   8  . . .
     1         F   E
     2             F   E
     3                 F   E
     4                     F   E

 time for 1 instruction = 2 time units        (INSTRUCTION LATENCY)

 rate of instruction execution                (INSTRUCTION THROUGHPUT)
     = pipeline depth * (1 / time for 1 instruction)
     = 2 * (1 / 2)
     = 1 instruction per time unit

5-stage pipeline
----------------
a currently popular pipelined implementation
(R2000/3000 has 5 stages, R6000 has 5 stages (but different),
 R4000 has 8 stages)

steps:
    IF -- instruction fetch
    ID -- instruction decode (and get operands from registers)
    EX -- ALU operation (can be effective address calculation)
    MA -- memory access
    WB -- write back (results written to register(s))

 instruction   time units
               1    2    3    4    5    6    7    8  . . .
     1         IF   ID   EX   MA   WB
     2              IF   ID   EX   MA   WB
     3                   IF   ID   EX   MA   WB

 INSTRUCTION LATENCY    = 5 time units
 INSTRUCTION THROUGHPUT = 5 * (1 / 5) = 1 instruction per time unit

unfortunately, pipelining introduces other difficulties. . .

data dependencies
-----------------
suppose we have the following code:

    lw   $8, data1
    addi $9, $8, 1

the data loaded doesn't get written to $8 until WB, but the addi
instruction wants to get the data out of $8 in its ID stage. . .

 instruction   time units
               1    2    3    4    5    6    7    8  . . .
     lw        IF   ID   EX   MA   WB
                                   ^^
     addi           IF   ID   EX   MA   WB
                         ^^

the simplest solution is to STALL the pipeline.
(Also called HOLES, HICCOUGHS or BUBBLES in the pipe.)

 instruction   time units
               1    2    3    4    5    6    7    8  . . .
     lw        IF   ID   EX   MA   WB
                                   ^^
     addi           IF   ID   ID   ID   EX   MA   WB
                         ^^   ^^   ^^  (pipeline stalling)

A DATA DEPENDENCY (also called a HAZARD) causes performance to decrease.

more on data dependencies
-------------------------
Read After Write (RAW) -- (the example given), a read of data is
    needed before it has been written

Write After Read (WAR) --
Write After Write (WAW) --
    Given for completeness; not a difficulty to current pipelines in
    practice, since the only writing occurs as the last stage.

NOTE: there is no difficulty implementing a 2-stage pipeline
      due to DATA dependencies!
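The cycle counts in the diagrams above can be sketched as a small Python helper. This is not part of the course materials; the function name and its ideal-pipeline assumption (one instruction completes per time unit once the pipe is full, plus any stall bubbles) are illustrative only.

```python
# A minimal sketch of pipeline timing, assuming an ideal in-order pipeline:
# the first instruction takes `depth` time units, each later instruction
# normally adds 1 (throughput = 1 per time unit), plus any stall cycles
# (holes/bubbles) inserted for hazards.

def cycles(n_instructions, depth=5, stalls=0):
    """Total time units to finish n_instructions on a `depth`-stage pipe."""
    return depth + (n_instructions - 1) + stalls

# No hazards: 3 instructions finish in 5 + 2 = 7 time units,
# matching the 5-stage diagram above.
print(cycles(3))            # 7

# lw followed by a dependent addi: the stall diagram holds addi in ID
# for 2 extra cycles, so the pair takes 5 + 1 + 2 = 8 time units.
print(cycles(2, stalls=2))  # 8
```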
control dependencies
--------------------
what happens to a pipeline in the case of branch instructions?

MAL CODE SEQUENCE:

            b    label1
            addi $9, $8, 1
    label1: mult $8, $9

 instruction   time units
               1    2    3    4    5    6    7    8  . . .
     b         IF   ID   EX   MA   WB
                              ^^  (PC changed here)
     addi           IF   ID   EX   MA   WB
                    ^^  (WRONG instruction fetched here!)

whenever the PC changes (except for PC <- PC + 4), we have a
CONTROL DEPENDENCY.

CONTROL DEPENDENCIES break pipelines.  They cause performance to
plummet.  So, lots of (partial) solutions have been implemented to
try to help the situation.

Worst case, the pipeline must be stalled such that instructions go
through sequentially.  Note that just stalling doesn't really help,
since the (potentially) wrong instruction is fetched before it is
determined that the previous instruction is a branch.

BRANCHES and PIPELINING
-----------------------
(or, how to minimize the effect of control dependencies on pipelines.)

easiest solution (poor performance)

    Cancel anything (later) in the pipe when a branch (jump) is
    decoded.  This works as long as nothing changes the program's
    state before the cancellation.  Then let the branch instruction
    finish (flush the pipe), and start up again.

     instruction   time units
                   1    2    3    4    5    6    7    8  . . .
         b         IF   ID   EX   MA   WB
                                  ^^  (PC changed here)
         addi           IF  (cancelled)
                                       IF   ID   EX   MA   WB
                                       (correct instruction fetched
                                        after the branch finishes)

branch prediction (static or dynamic)

    add lots of extra hardware to try to help.

    a) (static) assume that the branch will not be taken

       When the decision is made, the hw "knows" if the correct
       instruction has been partially executed.

       If the correct instruction is currently in the pipe, let it
       (and all those after it) continue.  Then, there will be NO
       holes in the pipe.

       If the incorrect instruction is currently in the pipe
       (meaning that the branch was taken), then all instructions
       currently in the pipe subsequent to the branch must be
       BACKED OUT.

    b) (dynamic) A variation of (a).  Have some extra hw that keeps
       track of which branches have been taken in the recent past.
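Scheme (b) can be sketched in a few lines of Python. This is an illustrative 1-bit predictor, not hardware from the notes; the class and method names (`Predictor`, `predict`, `update`) are made up for the example.

```python
# A sketch of dynamic branch prediction, scheme (b): remember, per branch
# address, which way the branch went last time, and predict the same way.
# The first time a branch is seen we fall back to scheme (a): assume
# not taken.

class Predictor:
    def __init__(self):
        self.history = {}          # branch PC -> last outcome (True = taken)

    def predict(self, pc):
        return self.history.get(pc, False)   # default: not taken, as in (a)

    def update(self, pc, taken):
        self.history[pc] = taken   # remember the outcome for next time

# A loop branch that is taken 4 times, then falls through:
p = Predictor()
outcomes = [True, True, True, True, False]
wrong = 0
for taken in outcomes:
    if p.predict(pc=0x400) != taken:
        wrong += 1                 # mispredict: back out as in (a)
    p.update(0x400, taken)
print(wrong)                       # 2 (the first and last iterations)
```

For loop-heavy code this is why (b) tends to beat (a): after the first trip around the loop, the predictor guesses "taken" correctly on every iteration until the exit.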
       Design the hw to presume that a branch will be taken the same
       way it was previously.  If the guess is wrong, back out as
       in (a).

    Question for the advanced student:  Which is better, (a) or (b)?
    Why?

    NOTE: solution (a) works quite well with currently popular
    pipeline solutions, because no state information is changed until
    the very last stage of an instruction.  As long as the last stage
    hasn't started, backing out is a matter of stopping the last
    stage from occurring and getting the PC right.

separate test from branch

    make the conditional test and address calculation separate
    instructions from the one that changes the PC.  This reduces the
    number of holes in the pipe.

delayed branch

    MIPS solution.

    The concept:  prediction is always wrong sometime.  There will be
    holes in the pipe when the prediction is wrong.  So the goal is
    to reduce (eliminate?) the number of holes in the case of a
    branch.

    The mechanism:  Have the effect of a branch (the change of the
    PC) be delayed until a subsequent instruction.  This means that
    the instruction following a branch is executed independent of
    whether the branch is to be taken or not.

    (NOTE: the simulator completely ignores this delayed branch
    mechanism!)

    code example:

               add  $8, $9, $10
               beq  $3, $4, label
               move $18, $5
               . . .
        label: sub  $20, $21, $22

    is turned into the following by a MIPS assembler:

               add  $8, $9, $10
               beq  $3, $4, label
               nop               # really a pipeline hole, the DELAY SLOT
               move $18, $5
               . . .
        label: sub  $20, $21, $22

    If the assembler has any smarts at all, it would REARRANGE the
    code to be

               beq  $3, $4, label
               add  $8, $9, $10
               move $18, $5
               . . .
        label: sub  $20, $21, $22

    This code can be rearranged only if there are no data
    dependencies between the branch and the add instructions.  In
    fact, any instruction from before the branch (and after any
    previous branch) can be moved into the DELAY SLOT, as long as
    there are no dependencies on it.

    Delayed branching depends on a smart assembler (sw) to make the
    hardware perform at peak efficiency.
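The rearrangement a "smart assembler" performs above can be sketched as a toy Python pass. Everything here is illustrative: the function names are made up, and the safety check (no registers shared between the branch and the candidate instruction) is a crude stand-in for real dependence analysis.

```python
# A toy sketch of delay-slot filling: move the instruction just before a
# branch into the delay slot when the branch does not depend on it;
# otherwise leave a nop (a real pipeline hole).

def regs(instr):
    """Registers mentioned by a toy MIPS instruction like 'add $8, $9, $10'."""
    return {tok.strip(",") for tok in instr.split() if tok.startswith("$")}

def fill_delay_slot(code):
    out = list(code)
    for i, instr in enumerate(out):
        if instr.split()[0] in ("b", "beq", "bne") and i > 0:
            prev = out[i - 1]
            if not (regs(prev) & regs(instr)):    # no shared registers
                out[i - 1], out[i] = instr, prev  # branch first, prev in slot
            else:
                out.insert(i + 1, "nop")          # can't fill: leave a hole
            break
    return out

code = ["add $8, $9, $10", "beq $3, $4, label", "move $18, $5"]
print(fill_delay_slot(code))
# ['beq $3, $4, label', 'add $8, $9, $10', 'move $18, $5']
```

Note the simplification: a real assembler must distinguish registers the branch *reads* from registers the candidate *writes*, and must also respect the "after any previous branch" constraint from the notes.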
    This is a general trend in the field of computer science.  Let
    the sw do more and more to improve performance of the hw.

squashing

    A fancy name for branch prediction that always presumes the
    branch will be taken, and keeps a copy of the PC that will be
    needed in the case of backing out.

condition codes

    a historically significant way of branching.  Condition codes
    were used on MANY machines before pipelining became popular.

    4 1-bit registers (condition code register):
        N -- negative
        V -- overflow
        P -- positive
        Z -- zero

    The result of an instruction sets these 4 bits.  Conditional
    branches were then based on these flags.

    Example:
        bn label    # branch to label if the N bit is set

    Earlier computers had virtually every instruction set the
    condition codes.  This had the effect that the test (for the
    branch) needed to come directly before the branch.

    Example:
        sub r3, r4, r5    # together, these two implement
        bn  label         #     blt $4, $5, label

    A performance improvement (sometimes) to this allowed the
    programmer to explicitly specify which instructions should set
    the condition codes.  In this way, (on a pipelined machine) the
    test could be separated from the branch, resulting in fewer
    pipeline holes due to data dependencies.

Amdahl's Law
------------
(Or why the common case matters most.)

speedup = new rate / old rate
        = old execution time / new execution time

We program in some enhancement to part of our program.  The fraction
of time spent in that part of the code is f.  The speedup of that
part of the code is S.  (Let an enhancement speed up fraction f of
the time by speedup S.)

speedup = [(1-f) + f] * old time / ((1-f) * old time + (f/S) * old time)

        =       1
          -------------
          1 - f + f/S

Examples:

      f      S     speedup
     ---    ---    -------
     95%    1.10    1.094
      5%    10      1.047
      5%    inf     1.052

  lim          1
 S->inf  ------------- = 1 / (1-f)
         1 - f + f/S

      f     speedup
     ---    -------
      1%     1.01
      2%     1.02
      5%     1.05
     10%     1.11
     20%     1.25
     50%     2.00

This says that we should concentrate on the common case!
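The speedup formula above is easy to check numerically. A minimal sketch in Python; the function name is illustrative, and the printed values match the tables above up to rounding (the tables appear to truncate rather than round).

```python
# Amdahl's Law: f is the fraction of time the enhancement applies to,
# S is the speedup of that fraction.  speedup = 1 / (1 - f + f/S)

def amdahl(f, S):
    return 1.0 / ((1.0 - f) + f / S)

print(round(amdahl(0.95, 1.10), 3))          # 1.095 (table: 1.094, truncated)
print(round(amdahl(0.05, 10), 3))            # 1.047
print(round(amdahl(0.05, float("inf")), 3))  # 1.053 (table: 1.052, truncated)

# The limit S -> inf gives 1 / (1-f): even an infinite speedup of
# half the program only doubles overall performance.
print(round(amdahl(0.50, float("inf")), 2))  # 2.0
```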