📄 sh.md

📁 gcc3.2.1 source code
;; Moreover, we need a way to specify the latency of insns that don't
;; use an actual function unit.
;; We use an 'issue' function unit to do that, and a cost factor of 10.

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "!nil,arith3"))
  10 10)

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "arith3"))
  30 30)

;; There is no point in providing exact scheduling information about branches,
;; because they are at the starts / ends of basic blocks anyway.

;; Some insns cannot be issued before/after another insn in the same cycle,
;; irrespective of the type of the other insn.

;; The default is dual-issue, but such an insn can't be paired with an insn
;; that uses multiple function units.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "!smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul,call,sfunc,arith3,arith3b"))
  1 10
  [(eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul")])

(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul"))
  10 10
  [(const_int 1)])

;; arith3 insns are always pairable at the start, but not necessarily at
;; the end; however, there doesn't seem to be a way to express that.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3"))
  30 20
  [(const_int 1)])

;; arith3b insns are pairable at the end and have a latency that prevents
;; pairing with the following branch, but we don't want this latency to be
;; respected; when the following branch is immediately adjacent, we can
;; redirect the internal branch, which is likely to be a larger win.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3b"))
  20 20
  [(const_int 1)])

;; Calls introduce a longish delay that is likely to flush the pipelines.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "call,sfunc"))
  160 160
  [(eq_attr "type" "!call") (eq_attr "type" "call")])

;; Load and store instructions have no alignment peculiarities for the SH4,
;; but they use the load-store unit, which they share with the fmove type
;; insns (fldi[01]; fmov frn,frm; flds; fsts; fabs; fneg).
;; Loads have a latency of two.
;; However, call insns can only be paired with a preceding insn, and have
;; a delay slot, so that we want two more insns to be scheduled between the
;; load of the function address and the call.  This is equivalent to a
;; latency of three.
;; We cannot use a conflict list for this, because we need to distinguish
;; between the actual call address and the function arguments.
;; ADJUST_COST can only properly handle reductions of the cost, so we
;; use a latency of three here, which gets multiplied by 10 to yield 30.
;; We only do this for SImode loads of general registers, to make the work
;; for ADJUST_COST easier.

;; When specifying different latencies for different insns using the
;; same function unit, genattrtab.c assumes a 'FIFO constraint'
;; so that the blockage is at least READY-COST (E) + 1 - READY-COST (C)
;; for an executing insn E and a candidate insn C.
;; Therefore, we define three different function units for load_store:
;; load_store, load and load_si.

(define_function_unit "load_si" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "load_si,pcload_si")) 30 10)

(define_function_unit "load" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "load,pcload,pload")) 20 10)

(define_function_unit "load_store" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "load_si,pcload_si,load,pcload,pload,store,pstore,fmove"))
  10 10)

(define_function_unit "int"    1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "arith,dyn_shift")) 10 10)

;; Again, we have to pretend a lower latency for the "int" unit to avoid a
;; spurious FIFO constraint; the multiply instructions actually use the "int"
;; unit only for two cycles.
(define_function_unit "int"    1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "smpy,dmpy")) 20 20)

;; We use a fictitious "mpy" unit to express the actual latency.
(define_function_unit "mpy"    1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "smpy,dmpy")) 40 20)

;; Again, we have to pretend a lower latency for the "int" unit to avoid a
;; spurious FIFO constraint.
(define_function_unit "int"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "gp_fpul")) 10 10)

;; We use a fictitious "gp_fpul" unit to express the actual latency.
(define_function_unit "gp_fpul"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "gp_fpul")) 20 10)

;; ??? multiply uses the floating point unit, but with a two cycle delay.
;; Thus, a simple single-precision fp operation could finish if issued in
;; the very next cycle, but stalls when issued two or three cycles later.
;; Similarly, a divide / sqrt can work without stalls if issued in
;; the very next cycle, while it would have to block if issued two or
;; three cycles later.
;; There is no way to model this with gcc's function units.  This problem is
;; actually mentioned in md.texi.  Tackling this problem requires first that
;; it is possible to speak about the target in an open discussion.
;;
;; However, simple double-precision operations always conflict.
(define_function_unit "fp"    1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "smpy,dmpy")) 40 40
  [(eq_attr "type" "dfp_cmp,dfp_conv,dfp_arith")])

;; The "fp" unit is for pipeline stages F1 and F2.

(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "fp")) 30 10)

;; Again, we have to pretend a lower latency for the "fp" unit to avoid a
;; spurious FIFO constraint; the bulk of the fdiv type insns executes in
;; the F3 stage.
(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "fdiv")) 30 10)

;; The "fdiv" function unit models the aggregate effect of the F1, F2 and F3
;; pipeline stages on the pipelining of fdiv/fsqrt insns.
;; We also use it to give the actual latency here.
;; fsqrt is actually one cycle faster than fdiv (and the value used here),
;; but that will hardly matter in practice for scheduling.
(define_function_unit "fdiv"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "fdiv")) 120 100)

;; There is again a late use of the "fp" unit by [d]fdiv type insns
;; that we can't express.

(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "dfp_cmp,dfp_conv")) 40 20)

(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "dfp_arith")) 80 60)

(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "dfdiv")) 230 10)

(define_function_unit "fdiv"     1 0
  (and (eq_attr "issues" "2") (eq_attr "type" "dfdiv")) 230 210)

;; This should be enough for pt insns to be moved 5 insns ahead of
;; corresponding branches.

(define_function_unit "pt" 1 0
  (eq_attr "type" "pt,ptabs") 10 2)

; Definitions for filling branch delay slots.

(define_attr "needs_delay_slot" "yes,no" (const_string "no"))

;; ??? This should be (nil) instead of (const_int 0)
(define_attr "hit_stack" "yes,no"
  (cond [(eq (symbol_ref "find_regno_note (insn, REG_INC, SP_REG)")
             (const_int 0))
         (const_string "no")]
        (const_string "yes")))

(define_attr "interrupt_function" "no,yes"
  (const (symbol_ref "current_function_interrupt")))

(define_attr "in_delay_slot" "yes,no"
  (cond [(eq_attr "type" "cbranch") (const_string "no")
         (eq_attr "type" "pcload,pcload_si") (const_string "no")
         (eq_attr "needs_delay_slot" "yes") (const_string "no")
         (eq_attr "length" "2") (const_string "yes")
         ] (const_string "no")))

(define_attr "is_sfunc" ""
  (if_then_else (eq_attr "type" "sfunc") (const_int 1) (const_int 0)))

(define_delay
  (eq_attr "needs_delay_slot" "yes")
  [(eq_attr "in_delay_slot" "yes") (nil) (nil)])

;; On the SH and SH2, the rte instruction reads the return pc from the stack,
;; and thus we can't put a pop instruction in its delay slot.
;; ??? On the SH3, the rte instruction does not use the stack, so a pop
;; instruction can go in the delay slot.

;; Since a normal return (rts) implicitly uses the PR register,
;; we can't allow PR register loads in an rts delay slot.

(define_delay
  (eq_attr "type" "return")
  [(and (eq_attr "in_delay_slot" "yes")
        (ior (and (eq_attr "interrupt_function" "no")
                  (eq_attr "type" "!pload,prset"))
             (and (eq_attr "interrupt_function" "yes")
                  (ior
                   (ne (symbol_ref "TARGET_SH3") (const_int 0))
                   (eq_attr "hit_stack" "no"))))) (nil) (nil)])

;; Since a call implicitly uses the PR register, we can't allow
;; a PR register store in a jsr delay slot.

(define_delay
  (ior (eq_attr "type" "call") (eq_attr "type" "sfunc"))
  [(and (eq_attr "in_delay_slot" "yes")
        (eq_attr "type" "!pstore,prget")) (nil) (nil)])

;; Say that we have annulled true branches, since this gives smaller and
;; faster code when branches are predicted as not taken.

(define_delay
  (and (eq_attr "type" "cbranch")
       (ne (symbol_ref "TARGET_SH2") (const_int 0)))
  [(eq_attr "in_delay_slot" "yes") (eq_attr "in_delay_slot" "yes") (nil)])

;; -------------------------------------------------------------------------
;; SImode signed integer comparisons
;; -------------------------------------------------------------------------

(define_insn ""
  [(set (reg:SI T_REG)
        (eq:SI (and:SI (match_operand:SI 0 "arith_reg_operand" "z,r")
                       (match_operand:SI 1 "arith_operand" "L,r"))
               (const_int 0)))]
  "TARGET_SH1"
  "tst	%1,%0")

;; ??? Perhaps should only accept reg/constant if the register is reg 0.
;; That would still allow reload to create cmpi instructions, but would
;; perhaps allow forcing the constant into a register when that is better.
;; Probably should use r0 for mem/imm compares, but force constant into a
;; register for pseudo/imm compares.

(define_insn "cmpeqsi_t"
  [(set (reg:SI T_REG)
        (eq:SI (match_operand:SI 0 "arith_reg_operand" "r,z,r")
               (match_operand:SI 1 "arith_operand" "N,rI,r")))]
  "TARGET_SH1"
  "@
	tst	%0,%0
	cmp/eq	%1,%0
	cmp/eq	%1,%0")

(define_insn "cmpgtsi_t"
  [(set (reg:SI T_REG)
        (gt:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
               (match_operand:SI 1 "arith_reg_or_0_operand" "r,N")))]
  "TARGET_SH1"
  "@
	cmp/gt	%1,%0
	cmp/pl	%0")

(define_insn "cmpgesi_t"
  [(set (reg:SI T_REG)
        (ge:SI (match_operand:SI 0 "arith_reg_operand" "r,r")
               (match_operand:SI 1 "arith_reg_or_0_operand" "r,N")))]
  "TARGET_SH1"
  "@
	cmp/ge	%1,%0
	cmp/pz	%0")

;; -------------------------------------------------------------------------
;; SImode unsigned integer comparisons
;; -------------------------------------------------------------------------

(define_insn "cmpgeusi_t"
  [(set (reg:SI T_REG)
        (geu:SI (match_operand:SI 0 "arith_reg_operand" "r")
                (match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "cmp/hs	%1,%0")

(define_insn "cmpgtusi_t"
  [(set (reg:SI T_REG)
        (gtu:SI (match_operand:SI 0 "arith_reg_operand" "r")
                (match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "cmp/hi	%1,%0")

;; We save the compare operands in the cmpxx patterns and use them when
;; we generate the branch.

(define_expand "cmpsi"
  [(set (reg:SI T_REG)
        (compare (match_operand:SI 0 "arith_operand" "")
                 (match_operand:SI 1 "arith_operand" "")))]
  "TARGET_SH1"
  "
{
  sh_compare_op0 = operands[0];
  sh_compare_op1 = operands[1];
  DONE;
}")

;; -------------------------------------------------------------------------
;; DImode signed integer comparisons
;; -------------------------------------------------------------------------

;; ??? Could get better scheduling by splitting the initial test from the
;; rest of the insn after reload.  However, the gain would hardly justify
;; the sh.md size increase necessary to do that.

(define_insn ""
  [(set (reg:SI T_REG)
        (eq:SI (and:DI (match_operand:DI 0 "arith_reg_operand" "r")
                       (match_operand:DI 1 "arith_operand" "r"))
               (const_int 0)))]
  "TARGET_SH1"
  "* return output_branchy_insn (EQ, \"tst\\t%S1,%S0\;bf\\t%l9\;tst\\t%R1,%R0\",
				 insn, operands);"
  [(set_attr "length" "6")
   (set_attr "type" "arith3b")])

(define_insn "cmpeqdi_t"
  [(set (reg:SI T_REG)
        (eq:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
               (match_operand:DI 1 "arith_reg_or_0_operand" "N,r")))]
  "TARGET_SH1"
  "@
	tst	%S0,%S0\;bf	%,Ldi%=\;tst	%R0,%R0\\n%,Ldi%=:
	cmp/eq	%S1,%S0\;bf	%,Ldi%=\;cmp/eq	%R1,%R0\\n%,Ldi%=:"
  [(set_attr "length" "6")
   (set_attr "type" "arith3b")])

(define_split
  [(set (reg:SI T_REG)
        (eq:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
               (match_operand:DI 1 "arith_reg_or_0_operand" "N,r")))]
;; If we applied this split when not optimizing, it would only be
;; applied during the machine-dependent reorg, when no new basic blocks
;; may be created.
  "TARGET_SH1 && reload_completed && optimize"
  [(set (reg:SI T_REG) (eq:SI (match_dup 2) (match_dup 3)))
   (set (pc) (if_then_else (eq (reg:SI T_REG) (const_int 0))
                           (label_ref (match_dup 6))
                           (pc)))
   (set (reg:SI T_REG) (eq:SI (match_dup 4) (match_dup 5)))
   (match_dup 6)]
  "
{
  operands[2]
    = gen_rtx_REG (SImode,
		   true_regnum (operands[0]) + (TARGET_LITTLE_ENDIAN ? 1 : 0));
  operands[3]
    = (operands[1] == const0_rtx
       ? const0_rtx
       : gen_rtx_REG (SImode,
		      true_regnum (operands[1])
		      + (TARGET_LITTLE_ENDIAN ? 1 : 0)));
  operands[4] = gen_lowpart (SImode, operands[0]);
  operands[5] = gen_lowpart (SImode, operands[1]);
  operands[6] = gen_label_rtx ();
}")

(define_insn "cmpgtdi_t"
  [(set (reg:SI T_REG)
        (gt:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
               (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
  "TARGET_SH2"
  "@
	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/gt\\t%S1,%S0\;cmp/hi\\t%R1,%R0\\n%,Ldi%=:
	tst\\t%S0,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/pl\\t%S0\;cmp/hi\\t%S0,%R0\\n%,Ldi%=:"
  [(set_attr "length" "8")
   (set_attr "type" "arith3")])

(define_insn "cmpgedi_t"
  [(set (reg:SI T_REG)
        (ge:SI (match_operand:DI 0 "arith_reg_operand" "r,r")
               (match_operand:DI 1 "arith_reg_or_0_operand" "r,N")))]
  "TARGET_SH2"
  "@
	cmp/eq\\t%S1,%S0\;bf{.|/}s\\t%,Ldi%=\;cmp/ge\\t%S1,%S0\;cmp/hs\\t%R1,%R0\\n%,Ldi%=:
	cmp/pz\\t%S0"
  [(set_attr "length" "8,2")
   (set_attr "type" "arith3,arith")])

;; -------------------------------------------------------------------------
;; DImode unsigned integer comparisons
;; -------------------------------------------------------------------------