📄 sh.md
;;- Machine description for the Hitachi SH.
;;  Copyright (C) 1993 - 1999 Free Software Foundation, Inc.
;;  Contributed by Steve Chamberlain (sac@cygnus.com).
;;  Improved by Jim Wilson (wilson@cygnus.com).

;; This file is part of GNU CC.

;; GNU CC is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.

;; GNU CC is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU CC; see the file COPYING.  If not, write to
;; the Free Software Foundation, 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.


;; ??? Should prepend a * to all pattern names which are not used.
;; This will make the compiler smaller, and rebuilds after changes faster.

;; ??? Should be enhanced to include support for many more GNU superoptimizer
;; sequences.  Especially the sequences for arithmetic right shifts.

;; ??? Should check all DImode patterns for consistency and usefulness.

;; ??? The MAC.W and MAC.L instructions are not supported.  There is no
;; way to generate them.

;; ??? The cmp/str instruction is not supported.  Perhaps it can be used
;; for a str* inline function.

;; BSR is not generated by the compiler proper, but when relaxing, it
;; generates .uses pseudo-ops that allow linker relaxation to create
;; BSR.  This is actually implemented in bfd/{coff,elf32}-sh.c

;; Special constraints for SH machine description:
;;
;;    t -- T
;;    x -- mac
;;    l -- pr
;;    z -- r0
;;
;; Special formats used for outputting SH instructions:
;;
;;   %.  --  print a .s if insn needs delay slot
;;   %@  --  print rte/rts if is/isn't an interrupt function
;;   %#  --  output a nop if there is nothing to put in the delay slot
;;   %O  --  print a constant without the #
;;   %R  --  print the lsw reg of a double
;;   %S  --  print the msw reg of a double
;;   %T  --  print next word of a double REG or MEM
;;
;; Special predicates:
;;
;;  arith_operand          -- operand is valid source for arithmetic op
;;  arith_reg_operand      -- operand is valid register for arithmetic op
;;  general_movdst_operand -- operand is valid move destination
;;  general_movsrc_operand -- operand is valid move source
;;  logical_operand        -- operand is valid source for logical op

;; -------------------------------------------------------------------------
;; Attributes
;; -------------------------------------------------------------------------

;; Target CPU.

(define_attr "cpu"
  "sh1,sh2,sh3,sh3e,sh4"
  (const (symbol_ref "sh_cpu_attr")))

(define_attr "endian" "big,little"
  (const (if_then_else (symbol_ref "TARGET_LITTLE_ENDIAN")
                       (const_string "little") (const_string "big"))))

(define_attr "fmovd" "yes,no"
  (const (if_then_else (symbol_ref "TARGET_FMOVD")
                       (const_string "yes") (const_string "no"))))

;; issues/clock
(define_attr "issues" "1,2"
  (const (if_then_else (symbol_ref "TARGET_SUPERSCALAR")
                       (const_string "2") (const_string "1"))))

;; cbranch     conditional branch instructions
;; jump        unconditional jumps
;; arith       ordinary arithmetic
;; arith3      a compound insn that behaves similarly to a sequence of
;;             three insns of type arith
;; arith3b     like above, but might end with a redirected branch
;; load        from memory
;; load_si     Likewise, SImode variant for general register.
;; store       to memory
;; move        register to register
;; fmove       register to register, floating point
;; smpy        word precision integer multiply
;; dmpy        longword or doublelongword precision integer multiply
;; return      rts
;; pload       load of pr reg, which can't be put into delay slot of rts
;; pstore      store of pr reg, which can't be put into delay slot of jsr
;; pcload      pc relative load of constant value
;; pcload_si   Likewise, SImode variant for general register.
;; rte         return from exception
;; sfunc       special function call with known used registers
;; call        function call
;; fp          floating point
;; fdiv        floating point divide (or square root)
;; gp_fpul     move between general purpose register and fpul
;; dfp_arith, dfp_cmp, dfp_conv
;; dfdiv       double precision floating point divide (or square root)
;; nil         no-op move, will be deleted.

(define_attr "type"
 "cbranch,jump,jump_ind,arith,arith3,arith3b,dyn_shift,other,load,load_si,store,move,fmove,smpy,dmpy,return,pload,pstore,pcload,pcload_si,rte,sfunc,call,fp,fdiv,dfp_arith,dfp_cmp,dfp_conv,dfdiv,gp_fpul,nil"
  (const_string "other"))

; If a conditional branch destination is within -252..258 bytes away
; from the instruction it can be 2 bytes long.  Something in the
; range -4090..4100 bytes can be 6 bytes long.  All other conditional
; branches are initially assumed to be 16 bytes long.
; In machine_dependent_reorg, we split all branches that are longer than
; 2 bytes.

;; The maximum range used for SImode constant pool entries is 1018.  A final
;; instruction can add 8 bytes while only being 4 bytes in size, thus we
;; can have a total of 1022 bytes in the pool.  Add 4 bytes for a branch
;; instruction around the pool table, 2 bytes of alignment before the table,
;; and 30 bytes of alignment after the table.  That gives a maximum total
;; pool size of 1058 bytes.
;; Worst case code/pool content size ratio is 1:2 (using asms).
;; Thus, in the worst case, there is one instruction in front of a maximum
;; sized pool, and then there are 1052 bytes of pool for every 508 bytes of
;; code.  For the last n bytes of code, there are 2n + 36 bytes of pool.
;; If we have a forward branch, the initial table will be put after the
;; unconditional branch.
;;
;; ??? We could do much better by keeping track of the actual pcloads within
;; the branch range and in the pcload range in front of the branch range.

;; ??? This looks ugly because genattrtab won't allow if_then_else or cond
;; inside an le.
(define_attr "short_cbranch_p" "no,yes"
  (cond [(ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 252)) (const_int 506))
         (const_string "yes")
         (ne (symbol_ref "NEXT_INSN (PREV_INSN (insn)) != insn") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 252)) (const_int 508))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "med_branch_p" "no,yes"
  (cond [(leu (plus (minus (match_dup 0) (pc)) (const_int 990))
              (const_int 1988))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 4092))
              (const_int 8186))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "med_cbranch_p" "no,yes"
  (cond [(leu (plus (minus (match_dup 0) (pc)) (const_int 988))
              (const_int 1986))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 4090))
              (const_int 8184))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "braf_branch_p" "no,yes"
  (cond [(ne (symbol_ref "! TARGET_SH2") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 10330))
              (const_int 20660))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 32764))
              (const_int 65530))
         (const_string "yes")
         ] (const_string "no")))

(define_attr "braf_cbranch_p" "no,yes"
  (cond [(ne (symbol_ref "! TARGET_SH2") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 10328))
              (const_int 20658))
         (const_string "yes")
         (ne (symbol_ref "mdep_reorg_phase <= SH_FIXUP_PCLOAD") (const_int 0))
         (const_string "no")
         (leu (plus (minus (match_dup 0) (pc)) (const_int 32762))
              (const_int 65528))
         (const_string "yes")
         ] (const_string "no")))

; An unconditional jump in the range -4092..4098 can be 2 bytes long.
; For wider ranges, we need a combination of a code and a data part.
; If we can get a scratch register for a long range jump, the code
; part can be 4 bytes long; otherwise, it must be 8 bytes long.

; If the jump is in the range -32764..32770, the data part can be 2 bytes
; long; otherwise, it must be 6 bytes long.

; All other instructions are two bytes long by default.

;; ??? This should use something like *branch_p (minus (match_dup 0) (pc)),
;; but genattrtab doesn't understand this.
(define_attr "length" ""
  (cond [(eq_attr "type" "cbranch")
         (cond [(eq_attr "short_cbranch_p" "yes")
                (const_int 2)
                (eq_attr "med_cbranch_p" "yes")
                (const_int 6)
                (eq_attr "braf_cbranch_p" "yes")
                (const_int 12)
;; ??? using pc is not computed transitively.
                (ne (match_dup 0) (match_dup 0))
                (const_int 14)
                ] (const_int 16))
         (eq_attr "type" "jump")
         (cond [(eq_attr "med_branch_p" "yes")
                (const_int 2)
                (and (eq (symbol_ref "GET_CODE (PREV_INSN (insn))")
                         (symbol_ref "INSN"))
                     (eq (symbol_ref "INSN_CODE (PREV_INSN (insn))")
                         (symbol_ref "code_for_indirect_jump_scratch")))
                (if_then_else (eq_attr "braf_branch_p" "yes")
                              (const_int 6)
                              (const_int 10))
                (eq_attr "braf_branch_p" "yes")
                (const_int 10)
;; ??? using pc is not computed transitively.
                (ne (match_dup 0) (match_dup 0))
                (const_int 12)
                ] (const_int 14))
         ] (const_int 2)))

;; (define_function_unit {name} {num-units} {n-users} {test}
;;                       {ready-delay} {issue-delay} [{conflict-list}])

;; Load and store instructions save a cycle if they are aligned on a
;; four byte boundary.  Using a function unit for stores encourages
;; gcc to separate load and store instructions by one instruction,
;; which makes it more likely that the linker will be able to word
;; align them when relaxing.

;; Loads have a latency of two.
;; However, call insns can have a delay slot, so that we want one more
;; insn to be scheduled between the load of the function address and the call.
;; This is equivalent to a latency of three.
;; We cannot use a conflict list for this, because we need to distinguish
;; between the actual call address and the function arguments.
;; ADJUST_COST can only properly handle reductions of the cost, so we
;; use a latency of three here.
;; We only do this for SImode loads of general registers, to make the work
;; for ADJUST_COST easier.
(define_function_unit "memory" 1 0
  (and (eq_attr "issues" "1")
       (eq_attr "type" "load_si,pcload_si"))
  3 2)
(define_function_unit "memory" 1 0
  (and (eq_attr "issues" "1")
       (eq_attr "type" "load,pcload,pload,store,pstore"))
  2 2)

(define_function_unit "int" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "arith3,arith3b")) 3 3)

(define_function_unit "int" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "dyn_shift")) 2 2)

(define_function_unit "int" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "!arith3,arith3b,dyn_shift")) 1 1)

;; ??? These are approximations.
(define_function_unit "mpy" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "smpy")) 2 2)
(define_function_unit "mpy" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "dmpy")) 3 3)

(define_function_unit "fp" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "fp,fmove")) 2 1)
(define_function_unit "fp" 1 0
  (and (eq_attr "issues" "1") (eq_attr "type" "fdiv")) 13 12)


;; SH4 scheduling
;; The SH4 is a dual-issue implementation, thus we have to multiply all
;; costs by at least two.
;; There will be single increments of the modeled time that don't correspond
;; to the actual target whenever two insns to be issued depend on a
;; single resource, and the scheduler picks one to be the first.
;; If we multiplied the costs just by two, just two of these single
;; increments would amount to an actual cycle.  By picking a larger
;; factor, we can ameliorate the effect; however, we then have to make sure
;; that only two insns are modeled as issued per actual cycle.
;; Moreover, we need a way to specify the latency of insns that don't
;; use an actual function unit.
;; We use an 'issue' function unit to do that, and a cost factor of 10.

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "!nil,arith3"))
  10 10)

(define_function_unit "issue" 2 0
  (and (eq_attr "issues" "2") (eq_attr "type" "arith3"))
  30 30)

;; There is no point in providing exact scheduling information about branches,
;; because they are at the starts / ends of basic blocks anyways.

;; Some insns cannot be issued before/after another insn in the same cycle,
;; irrespective of the type of the other insn.

;; default is dual-issue, but can't be paired with an insn that
;; uses multiple function units.
(define_function_unit "single_issue" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "!smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul,call,sfunc,arith3,arith3b"))
  1 10
  [(eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul")])

(define_function_unit "single_issue" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "smpy,dmpy,pload,pstore,dfp_cmp,gp_fpul"))
  10 10
  [(const_int 1)])

;; arith3 insns are always pairable at the start, but not necessarily at
;; the end; however, there doesn't seem to be a way to express that.
(define_function_unit "single_issue" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3"))
  30 20
  [(const_int 1)])

;; arith3b insns are pairable at the end and have latency that prevents pairing
;; with the following branch, but we don't want this latency to be respected;
;; when the following branch is immediately adjacent, we can redirect the
;; internal branch, which is likely to be a larger win.
(define_function_unit "single_issue" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "arith3b"))
  20 20
  [(const_int 1)])

;; calls introduce a longish delay that is likely to flush the pipelines.
(define_function_unit "single_issue" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "call,sfunc"))
  160 160
  [(eq_attr "type" "!call") (eq_attr "type" "call")])

;; Load and store instructions have no alignment peculiarities for the SH4,
;; but they use the load-store unit, which they share with the fmove type
;; insns (fldi[01]; fmov frn,frm; flds; fsts; fabs; fneg) .
;; Loads have a latency of two.
;; However, call insns can only be paired with a preceding insn, and have
;; a delay slot, so that we want two more insns to be scheduled between the
;; load of the function address and the call.  This is equivalent to a
;; latency of three.
;; We cannot use a conflict list for this, because we need to distinguish
;; between the actual call address and the function arguments.
;; ADJUST_COST can only properly handle reductions of the cost, so we
;; use a latency of three here, which gets multiplied by 10 to yield 30.
;; We only do this for SImode loads of general registers, to make the work
;; for ADJUST_COST easier.

;; When specifying different latencies for different insns using the
;; same function unit, genattrtab.c assumes a 'FIFO constraint'
;; so that the blockage is at least READY-COST (E) + 1 - READY-COST (C)
;; for an executing insn E and a candidate insn C.
;; Therefore, we define three different function units for load_store:
;; load_store, load and load_si.

(define_function_unit "load_si" 1 0
  (and (eq_attr "issues" "2")
       (eq_attr "type" "load_si,pcload_si")) 30 10)
(define_function_unit "load" 1 0