
http://www.cs.wisc.edu/~cs354-2/cs354/lec.notes/arch.features.html

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
     1         F   E
     2             F   E
     3                 F   E
     4                     F   E

   time for 1 instruction = 2 time units      (INSTRUCTION LATENCY)

   rate of instruction execution              (INSTRUCTION THROUGHPUT)
       = pipeline depth * (1 / time for 1 instruction)
       = 2 * (1 / 2)
       = 1 per time unit


5-stage pipeline
----------------

 a currently popular pipelined implementation
   (the R2000/3000 has 5 stages, the R6000 has 5 stages (but
    different ones), the R4000 has 8 stages)

 steps:
   IF -- instruction fetch
   ID -- instruction decode (and get operands from registers)
   EX -- ALU operation (can be effective address calculation)
   MA -- memory access
   WB -- write back (results written to register(s))

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
     1        IF  ID  EX  MA  WB
     2            IF  ID  EX  MA  WB
     3                IF  ID  EX  MA  WB

   INSTRUCTION LATENCY = 5 time units
   INSTRUCTION THROUGHPUT = 5 * (1 / 5) = 1 instruction per time unit

 unfortunately, pipelining introduces other difficulties . . .


data dependencies
-----------------

 suppose we have the following code:

   lw   $8, data1
   addi $9, $8, 1

 the data loaded doesn't get written to $8 until WB, but the addi
 instruction wants to get the data out of $8 in its ID stage . . .

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
    lw        IF  ID  EX  MA  WB
                              ^^ (data written here)
    addi          IF  ID  EX  MA  WB
                      ^^ (data needed here)

 the simplest solution is to STALL the pipeline.
 (Stalls are also called HOLES, HICCOUGHS or BUBBLES in the pipe.)

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
    lw        IF  ID  EX  MA  WB
                              ^^
    addi          IF  ID  ID  ID  EX  MA  WB
                      ^^  ^^  ^^ (pipeline stalling)

 A DATA DEPENDENCY (also called a HAZARD) causes performance to
 decrease.
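The load-use stall in the diagram above can be reproduced with a small simulation. This is a minimal sketch (the `schedule` helper is invented for illustration), assuming the model in the notes: operands are read in ID, results are written in WB, and a dependent instruction repeats ID until its source is ready (its final ID may overlap the producer's WB cycle, as the diagram shows):

```python
# Sketch of the 5-stage pipeline from the notes (IF ID EX MA WB).
# An instruction stalls in ID until every source register it reads
# has been written back by an earlier instruction.

def schedule(program):
    """program: list of (dest_reg, src_regs) in issue order.
    Returns one {stage: cycle} dict per instruction."""
    wb_cycle = {}                     # register -> cycle of its write back
    timing = []
    for i, (dest, srcs) in enumerate(program):
        fetch = i + 1                 # one instruction fetched per cycle
        decode = fetch + 1
        for r in srcs:                # stall in ID until sources are ready
            if r in wb_cycle:
                decode = max(decode, wb_cycle[r])
        t = {"IF": fetch, "ID": decode, "EX": decode + 1,
             "MA": decode + 2, "WB": decode + 3}
        if dest:
            wb_cycle[dest] = t["WB"]
        timing.append(t)
    return timing

# lw $8, data1  followed by  addi $9, $8, 1
for t in schedule([("$8", []), ("$9", ["$8"])]):
    print(t)
```

With these two instructions, lw writes back in cycle 5 and addi cannot leave ID until that cycle, so its WB slips from cycle 6 to cycle 8: two bubbles, exactly as in the stall diagram. (The sketch only models ID stalls for this two-instruction case; it does not hold back the IF of later instructions.)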
more on data dependencies
-------------------------

 Read After Write (RAW) --
   (the example just given) a read of data is needed before it has
   been written

 Write After Read (WAR) --

 Write After Write (WAW) --

 WAR and WAW are given for completeness; they are not a difficulty
 for current pipelines in practice, since the only writing occurs in
 the last stage.

 NOTE:  there is no difficulty implementing a 2-stage pipeline
 due to DATA dependencies!


control dependencies
--------------------

 what happens to a pipeline in the case of branch instructions?

 MAL CODE SEQUENCE:

           b     label1
           addi  $9, $8, 1
   label1: mult  $8, $9

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
     b        IF  ID  EX  MA  WB
                              ^^ (PC changed here)
    addi          IF  ID  EX  MA  WB
                  ^^ (WRONG instruction fetched here!)

 whenever the PC changes (except for PC <- PC + 4),
 we have a CONTROL DEPENDENCY.

 CONTROL DEPENDENCIES break pipelines.  They cause performance to
 plummet.  So, lots of (partial) solutions have been implemented to
 try to help the situation.

   Worst case, the pipeline must be stalled such that instructions
   go through sequentially.

 Note that just stalling doesn't really help, since the
 (potentially) wrong instruction is fetched before it is determined
 that the previous instruction is a branch.


BRANCHES and PIPELINING
-----------------------

 (or, how to minimize the effect of control dependencies on
  pipelines)

 easiest solution (poor performance)

   Cancel anything (later) in the pipe when a branch (jump) is
   decoded.  This works as long as nothing changes the program's
   state before the cancellation.  Then let the branch instruction
   finish (flush the pipe), and start up again.

    which       time units
 instruction   1   2   3   4   5   6   7   8 . . .
     b        IF  ID  EX  MA  WB
                              ^^ (PC changed here)
    addi          IF              IF  ID  EX  MA  WB
                  ^^ (cancelled)

 branch prediction (static or dynamic)

   add lots of extra hardware to try to help.

   a) (static)  Assume that the branch will not be taken.
      When the decision is made, the hw "knows" whether the correct
      instruction has been partially executed.

      If the correct instruction is currently in the pipe, let it
      (and all those after it) continue.  Then there will be NO
      holes in the pipe.

      If the incorrect instruction is currently in the pipe (meaning
      that the branch was taken), then all instructions currently in
      the pipe subsequent to the branch must be BACKED OUT.

   b) (dynamic)  A variation of (a).
      Have some extra hw that keeps track of which branches have
      been taken in the recent past.  Design the hw to presume that
      a branch will be taken the same way it was previously.
      If the guess is wrong, back out as in (a).

   Question for the advanced student:  Which is better, (a) or (b)?
   Why?

   NOTE:  solution (a) works quite well with currently popular
   pipeline designs, because no state information is changed until
   the very last stage of an instruction.  As long as the last stage
   hasn't started, backing out is a matter of stopping the last
   stage from occurring and getting the PC right.

 separate test from branch

   make the conditional test and address calculation separate
   instructions from the one that changes the PC.
   This reduces the number of holes in the pipe.

 delayed branch

   the MIPS solution.

   The concept:  prediction is always wrong sometimes.  There will
   be holes in the pipe when the prediction is wrong.  So the goal
   is to reduce (eliminate?) the number of holes in the case of a
   branch.

   The mechanism:  have the effect of a branch (the change of the
   PC) be delayed until a subsequent instruction.
   This means that the instruction following a branch is executed
   independent of whether the branch is to be taken or not.

   (NOTE: the simulator completely ignores this delayed branch
    mechanism!)

   code example:

	  add  $8, $9, $10
	  beq  $3, $4, label
	  move $18, $5
	  .
	  .
	  .
  label:  sub $20, $21, $22

   is turned into the following by a MIPS assembler:

	  add  $8, $9, $10
	  beq  $3, $4, label
	  nop                  # really a pipeline hole, the DELAY SLOT
	  move $18, $5
	  .
	  .
	  .
  label:  sub $20, $21, $22

   If the assembler has any smarts at all, it would REARRANGE the
   code to be

	  beq  $3, $4, label
	  add  $8, $9, $10
	  move $18, $5
	  .
	  .
	  .
  label:  sub $20, $21, $22

   This code can be rearranged only if there are no data
   dependencies between the branch and the add instructions.
   In fact, any instruction from before the branch (and after any
   previous branch) can be moved into the DELAY SLOT, as long as
   there are no dependencies on it.

   Delayed branching depends on a smart assembler (sw) to make the
   hardware perform at peak efficiency.  This is a general trend in
   the field of computer science:  let the sw do more and more to
   improve the performance of the hw.

 squashing

   a fancy name for branch prediction that always presumes the
   branch will be taken, and keeps a copy of the PC that will be
   needed in the case of backing out.

 condition codes

   a historically significant way of branching.  Condition codes
   were used on MANY machines before pipelining became popular.

   4 1-bit registers (the condition code register):

     N -- negative
     V -- overflow
     P -- positive
     Z -- zero

   The result of an instruction sets these 4 bits.  Conditional
   branches are then based on these flags.

   Example:  bn label    # branch to label if the N bit is set

   Earlier computers had virtually every instruction set the
   condition codes.  This had the effect that the test (for the
   branch) needed to come directly before the branch.
   Example:

	sub r3, r4, r5    # these two together do the job of
	bn  label         #   blt $4, $5, label

   A performance improvement (sometimes) to this allowed the
   programmer to explicitly specify which instructions should set
   the condition codes.  In this way (on a pipelined machine), the
   test could be separated from the branch, resulting in fewer
   pipeline holes due to data dependencies.


Amdahl's Law
------------

 (or why the common case matters most)

 speedup = new rate / old rate
         = old execution time / new execution time

 We program in some enhancement to part of our program.  The
 fraction of time spent in that part of the code is f.  The speedup
 of that part of the code is S.  (The enhancement speeds up the
 fraction f of the time by a factor of S.)

 speedup = old time / ((1-f) * old time + (f/S) * old time)

	 =      1
	   ---------
	   1-f + f/S

 Examples

	    f	    S		speedup
	   ---	   ---		-------
	   95%	   1.10		1.094
	    5%	   10		1.047
	    5%	   inf		1.052

 In the limit,

	   lim         1          1
	          ----------  =  ---
	 S -> inf  1-f + f/S     1-f

	  f	speedup
	 ---	-------
	  1%     1.01
	  2%     1.02
	  5%     1.05
	 10%     1.11
	 20%     1.25
	 50%     2.00

 This says that we should concentrate on the common case!
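The tables above follow directly from the formula; here is a short check (a sketch, with hypothetical helper names `speedup` and `limit_speedup`):

```python
# Amdahl's Law: a fraction f of execution time is sped up by a factor S.

def speedup(f, S):
    """Overall speedup = 1 / ((1 - f) + f / S)."""
    return 1.0 / ((1.0 - f) + f / S)

def limit_speedup(f):
    """Best possible speedup as S -> infinity: 1 / (1 - f)."""
    return 1.0 / (1.0 - f)

# Reproduce the first table from the notes.
for f, S in [(0.95, 1.10), (0.05, 10.0)]:
    print(f"f = {f:4.0%}  S = {S:5.2f}  speedup = {speedup(f, S):.3f}")

# Reproduce the limit table.
for f in [0.01, 0.02, 0.05, 0.10, 0.20, 0.50]:
    print(f"f = {f:4.0%}  best possible speedup = {limit_speedup(f):.2f}")
```

Even an infinite speedup applied to 5% of the program buys barely 5% overall, which is the point of the closing line: concentrate on the common case.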
