
📄 sh.md

;;- Machine description for the Hitachi SH.
;;  Copyright (C) 1993 - 1999 Free Software Foundation, Inc.
;;  Contributed by Steve Chamberlain (sac@cygnus.com).
;;  Improved by Jim Wilson (wilson@cygnus.com).

;; This file is part of GNU CC.

;; GNU CC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; GNU CC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU CC; see the file COPYING.  If not, write to
;; the Free Software Foundation, 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.


;; ??? Should prepend a * to all pattern names which are not used.
;; This will make the compiler smaller, and rebuilds after changes faster.

;; ??? Should be enhanced to include support for many more GNU superoptimizer
;; sequences.  Especially the sequences for arithmetic right shifts.

;; ??? Should check all DImode patterns for consistency and usefulness.

;; ??? The MAC.W and MAC.L instructions are not supported.  There is no
;; way to generate them.

;; ??? The cmp/str instruction is not supported.  Perhaps it can be used
;; for a str* inline function.

;; BSR is not generated by the compiler proper, but when relaxing, it
;; generates .uses pseudo-ops that allow linker relaxation to create
;; BSR.  This is actually implemented in bfd/{coff,elf32}-sh.c

;; Special constraints for SH machine description:
;;
;;    t -- T
;;    x -- mac
;;    l -- pr
;;    z -- r0
;;
;; Special formats used for outputting SH instructions:
;;
;;   %.  --  print a .s if insn needs delay slot
;;   %@  --  print rte/rts if is/isn't an interrupt function
;;   %#  --  output a nop if there is nothing to put in the delay slot
;;   %O  --  print a constant without the #
;;   %R  --  print the lsw reg of a double
;;   %S  --  print the msw reg of a double
;;   %T  --  print next word of a double REG or MEM
;;
;; Special predicates:
;;
;;  arith_operand          -- operand is valid source for arithmetic op
;;  arith_reg_operand      -- operand is valid register for arithmetic op
;;  general_movdst_operand -- operand is valid move destination
;;  general_movsrc_operand -- operand is valid move source
;;  logical_operand        -- operand is valid source for logical op

;; -------------------------------------------------------------------------
;; Attributes
;; -------------------------------------------------------------------------

;; Target CPU.

(define_attr "cpu" "sh1,sh2,sh3,sh3e,sh4"
  (const (symbol_ref "sh_cpu_attr")))

(define_attr "endian" "big,little"
 (const (if_then_else (symbol_ref "TARGET_LITTLE_ENDIAN")
                      (const_string "little") (const_string "big"))))

(define_attr "fmovd" "yes,no"
  (const (if_then_else (symbol_ref "TARGET_FMOVD")
                       (const_string "yes") (const_string "no"))))

;; issues/clock
(define_attr "issues" "1,2"
  (const (if_then_else (symbol_ref "TARGET_SUPERSCALAR") (const_string "2") (const_string "1"))))

;; cbranch      conditional branch instructions
;; jump         unconditional jumps
;; arith        ordinary arithmetic
;; arith3       a compound insn that behaves similarly to a sequence of
;;              three insns of type arith
;; arith3b      like above, but might end with a redirected branch
;; load         from memory
;; load_si      Likewise, SImode variant for general register.
;; store        to memory
;; move         register to register
;; fmove        register to register, floating point
;; smpy         word precision integer multiply
;; dmpy         longword or doublelongword precision integer multiply
;; return       rts
;; pload        load of pr reg, which can't be put into delay slot of rts
;; pstore       store of pr reg, which can't be put into delay slot of jsr
;; pcload       pc relative load of constant value
;; pcload_si    Likewise, SImode variant for general register.
;; rte          return from exception
;; sfunc        special function call with known used registers
;; call         function call
;; fp           floating point
;; fdiv         floating point divide (or square root)
;; gp_fpul      move between general purpose register and fpul
;; dfp_arith, dfp_cmp,dfp_conv
;; dfdiv        double precision floating point divide (or square root)
;; nil          no-op move, will be deleted.

(define_attr "type"
 "cbranch,jump,jump_ind,arith,arith3,arith3b,dyn_shift,other,load,load_si,store,move,fmove,smpy,dmpy,return,pload,pstore,pcload,pcload_si,rte,sfunc,call,fp,fdiv,dfp_arith,dfp_cmp,dfp_conv,dfdiv,gp_fpul,nil"
  (const_string "other"))

; If a conditional branch destination is within -252..258 bytes away
; from the instruction it can be 2 bytes long.  Something in the
; range -4090..4100 bytes can be 6 bytes long.  All other conditional
; branches are initially assumed to be 16 bytes long.
; In machine_dependent_reorg, we split all branches that are longer than
; 2 bytes.

;; The maximum range used for SImode constant pool entries is 1018.  A final
;; instruction can add 8 bytes while only being 4 bytes in size, thus we
;; can have a total of 1022 bytes in the pool.  Add 4 bytes for a branch
;; instruction around the pool table, 2 bytes of alignment before the table,
;; and 30 bytes of alignment after the table.  That gives a maximum total
;; pool size of 1058 bytes.
;; Worst case code/pool content size ratio is 1:2 (using asms).
;; Thus, in the worst case, there is one instruction in front of a maximum
;; sized pool, and then there are 1052 bytes of pool for every 508 bytes of
;; code.  For the last n bytes of code, there are 2n + 36 bytes of pool.
;; If we have a forward branch, the initial table will be put after the
;; unconditional branch.
;;
;; ??? We could do much better by keeping track of the actual pcloads within
;; the branch range and in the pcload range in front of the branch range.

;; ??? This looks ugly because genattrtab won't allow if_then_else or cond
;; inside an le.
(define_attr "short_cbranch_p" "no,yes"
  (cond [(ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 252)) (const_int 506))
         (const_string "yes")
         (ne (symbol_ref "NEXT_INSN (PREV_INSN (insn)) != insn") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 252)) (const_int 508))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "med_branch_p" "no,yes"
  (cond [(leu (plus (minus (match_dup 0) (pc)) (const_int 990))
              (const_int 1988))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 4092))
              (const_int 8186))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "med_cbranch_p" "no,yes"
  (cond [(leu (plus (minus (match_dup 0) (pc)) (const_int 988))
              (const_int 1986))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 4090))
               (const_int 8184))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "braf_branch_p" "no,yes"
  (cond [(ne (symbol_ref "! TARGET_SH2") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 10330))
              (const_int 20660))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 32764))
              (const_int 65530))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "braf_cbranch_p" "no,yes"
  (cond [(ne (symbol_ref "! TARGET_SH2") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 10328))
              (const_int 20658))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 32762))
              (const_int 65528))
         (const_string "yes")
         ] (const_string "no")))

; An unconditional jump in the range -4092..4098 can be 2 bytes long.
; For wider ranges, we need a combination of a code and a data part.
; If we can get a scratch register for a long range jump, the code
; part can be 4 bytes long; otherwise, it must be 8 bytes long.
; If the jump is in the range -32764..32770, the data part can be 2 bytes
; long; otherwise, it must be 6 bytes long.

; All other instructions are two bytes long by default.

;; ??? This should use something like *branch_p (minus (match_dup 0) (pc)),
;; but genattrtab doesn't understand this.
(define_attr "length" ""
  (cond [(eq_attr "type" "cbranch")
         (cond [(eq_attr "short_cbranch_p" "yes")
                (const_int 2)
                (eq_attr "med_cbranch_p" "yes")
                (const_int 6)
                (eq_attr "braf_cbranch_p" "yes")
                (const_int 12)
;; ??? using pc is not computed transitively.
                (ne (match_dup 0) (match_dup 0))
                (const_int 14)
                ] (const_int 16))
         (eq_attr "type" "jump")
         (cond [(eq_attr "med_branch_p" "yes")
                (const_int 2)
                (and (eq (symbol_ref "GET_CODE (PREV_INSN (insn))")
                         (symbol_ref "INSN"))
                     (eq (symbol_ref "INSN_CODE (PREV_INSN (insn))")
                         (symbol_ref "code_for_indirect_jump_scratch")))
                (if_then_else (eq_attr "braf_branch_p" "yes")
                              (const_int 6)
                              (const_int 10))
                (eq_attr "braf_branch_p" "yes")
                (const_int 10)
;; ??? using pc is not computed transitively.
                (ne (match_dup 0) (match_dup 0))
                (const_int 12)
                ] (const_int 14))
         ] (const_int 2)))

;; (define_function_unit {name} {num-units} {n-users} {test}
;;                       {ready-delay} {issue-delay} [{conflict-list}])

;; Load and store instructions save a cycle if they are aligned on a
;; four byte boundary.  Using a function unit for stores encourages
;; gcc to separate load and store instructions by one instruction,
;; which makes it more likely that the linker will be able to word
;; align them when relaxing.

;; Loads have a latency of two.
;; However, call insns can have a delay slot, so that we want one more
;; insn to be scheduled between the load of the function address and the call.
;; This is equivalent to a latency of three.
;; We cannot use a conflict list for this, because we need to distinguish
;; between the actual call address and the function arguments.
;; ADJUST_COST can only properly handle reductions of the cost, so we
;; use a latency of three here.
;; We only do this for SImode loads of general registers, to make the work
;; for ADJUST_COST easier.
(define_function_unit "memory" 1 0
  (and (eq_attr "issues" "1")
       (eq_attr "type" "load_si,pcload_si"))
  3 2)
(define_function_unit "memory" 1 0
  (and (eq_attr "issues" "1")
       (eq_attr "type" "load,pcload,pload,store,pstore"))
  2 2)

(define_function_unit "int"    1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "arith3,arith3b")) 3 3)

(define_function_unit "int"    1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "dyn_shift")) 2 2)

(define_function_unit "int"    1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "!arith3,arith3b,dyn_shift")) 1 1)

;; ??? These are approximations.
(define_function_unit "mpy"    1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "smpy")) 2 2)
(define_function_unit "mpy"    1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "dmpy")) 3 3)

(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "fp,fmove")) 2 1)
(define_function_unit "fp"     1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "fdiv")) 13 12)


;; SH4 scheduling
;; The SH4 is a dual-issue implementation, thus we have to multiply all
;; costs by at least two.
;; There will be single increments of the modeled that don't correspond
;; to the actual target ;; whenever two insns to be issued depend on a
;; single resource, and the scheduler picks to be the first one.
;; If we multiplied the costs just by two, just two of these single
;; increments would amount to an actual cycle.  By picking a larger
;; factor, we can ameliorate the effect; However, we then have to make sure
;; that only two insns are modeled as issued per actual cycle.
;; Moreover, we need a way to specify the latency of insns that don't
;; use an actual function unit.
;; We use an 'issue' function unit to do that, and a cost factor of 10.

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "!nil,arith3"))
  10 10)

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "arith3"))
  30 30)

;; There is no point in providing exact scheduling information about branches,
;; because they are at the starts / ends of basic blocks anyways.

;; Some insns cannot be issued before/after another insn in the same cycle,
;; irrespective of the type of the other insn.

;; default is dual-issue, but can't be paired with an insn that
;; uses multiple function units.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "!smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul,call,sfunc,arith3,arith3b"))
  1 10
  [(eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul")])

(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul"))
  10 10
  [(const_int 1)])

;; arith3 insns are always pairable at the start, but not necessarily at
;; the end; however, there doesn't seem to be a way to express that.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3"))
  30 20
  [(const_int 1)])

;; arith3b insns are pairable at the end and have latency that prevents pairing
;; with the following branch, but we don't want this latency to be respected;
;; When the following branch is immediately adjacent, we can redirect the
;; internal branch, which is likely to be a larger win.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3b"))
  20 20
  [(const_int 1)])

;; calls introduce a longish delay that is likely to flush the pipelines.
(define_function_unit "single_issue"     1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "call,sfunc"))
  160 160
  [(eq_attr "type" "!call") (eq_attr "type" "call")])

;; Load and store instructions have no alignment peculiarities for the SH4,
;; but they use the load-store unit, which they share with the fmove type
;; insns (fldi[01]; fmov frn,frm; flds; fsts; fabs; fneg) .
;; Loads have a latency of two.
;; However, call insns can only be paired with a preceding insn, and have
;; a delay slot, so that we want two more insns to be scheduled between the
;; load of the function address and the call.  This is equivalent to a
;; latency of three.
;; We cannot use a conflict list for this, because we need to distinguish
;; between the actual call address and the function arguments.
;; ADJUST_COST can only properly handle reductions of the cost, so we
;; use a latency of three here, which gets multiplied by 10 to yield 30.
;; We only do this for SImode loads of general registers, to make the work
;; for ADJUST_COST easier.

;; When specifying different latencies for different insns using the
;; same function unit, genattrtab.c assumes a 'FIFO constraint'
;; so that the blockage is at least READY-COST (E) + 1 - READY-COST (C)
;; for an executing insn E and a candidate insn C.
;; Therefore, we define three different function units for load_store:
;; load_store, load and load_si.

(define_function_unit "load_si" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "load_si,pcload_si")) 30 10)
(define_function_unit "load" 1 0
