📄 frv.md

📁 Mac OS X 10.4.9 for x86 Source Code: gcc implementation source code
;; Attribute is "yes" for branches and jumps that span too great a distance
;; to be implemented in the most natural way.  Such instructions will use
;; a call instruction in some way.
(define_attr "far_jump" "yes,no" (const_string "no"))

;; Instruction type
;; "unknown" must come last.
(define_attr "type"
  "int,sethi,setlo,mul,div,gload,gstore,fload,fstore,movfg,movgf,macc,scan,cut,branch,jump,jumpl,call,spr,trap,fnop,fsconv,fsadd,fscmp,fsmul,fsmadd,fsdiv,sqrt_single,fdconv,fdadd,fdcmp,fdmul,fdmadd,fddiv,sqrt_double,mnop,mlogic,maveh,msath,maddh,mqaddh,mpackh,munpackh,mdpackh,mbhconv,mrot,mshift,mexpdhw,mexpdhd,mwcut,mmulh,mmulxh,mmach,mmrdh,mqmulh,mqmulxh,mqmach,mcpx,mqcpx,mcut,mclracc,mclracca,mdunpackh,mbhconve,mrdacc,mwtacc,maddacc,mdaddacc,mabsh,mdrot,mcpl,mdcut,mqsath,mqlimh,mqshift,mset,ccr,multi,load_or_call,unknown"
  (const_string "unknown"))

(define_attr "acc_group" "none,even,odd"
  (symbol_ref "frv_acc_group (insn)"))

;; Scheduling and Packing Overview
;; -------------------------------
;;
;; FR-V instructions are divided into five groups: integer, floating-point,
;; media, branch and control.  Each group is associated with a separate set
;; of processing units, the number and behavior of which depend on the
;; target processor.  Integer units have names like I0 and I1, floating-point
;; units have names like F0 and F1, and so on.
;;
;; Each member of the FR-V family has its own restrictions on which
;; instructions can issue to which units.  For example, some processors
;; allow loads to issue to I0 or I1 while others only allow them to issue
;; to I0.  As well as these processor-specific restrictions, there is a
;; general rule that an instruction can only issue to unit X + 1 if an
;; instruction in the same packet issued to unit X.
;;
;; Sometimes the only way to honor these restrictions is by adding nops
;; to a packet.  For example, on the fr550, media instructions that access
;; ACC4-7 can only issue to M1 or M3.
;; It is therefore only possible to execute these instructions by packing
;; them with something that issues to M0.  When no useful M0 instruction
;; exists, an "mnop" can be used instead.
;;
;; Having decided which instructions should issue to which units, the packet
;; should be ordered according to the following template:
;;
;;     I0 F0/M0 I1 F1/M1 .... B0 B1 ...
;;
;; Note that VLIW packets execute strictly in parallel.  Every instruction
;; in the packet will stall until all input operands are ready.  These
;; operands are then read simultaneously before any registers are modified.
;; This means that it's OK to have write-after-read hazards between
;; instructions in the same packet, even if the write is listed earlier
;; than the read.
;;
;; Three gcc passes are involved in generating VLIW packets:
;;
;;    (1) The scheduler.  This pass uses the standard scheduling code and
;;	  behaves in much the same way as it would for a superscalar RISC
;;	  architecture.
;;
;;    (2) frv_reorg.  This pass inserts nops into packets in order to meet
;;	  the processor's issue requirements.  It also has code to optimize
;;	  the type of padding used to align labels.
;;
;;    (3) frv_pack_insns.  The final packing phase, which puts the
;;	  instructions into assembly language order according to the
;;	  "I0 F0/M0 ..." template above.
;;
;; In the ideal case, these three passes will agree on which instructions
;; should be packed together, but this won't always happen.  In particular:
;;
;;    (a) (2) might not pack predicated instructions in the same way as (1).
;;	  The scheduler tries to schedule predicated instructions for the
;;	  worst case, assuming the predicate is true.  However, if we have
;;	  something like a predicated load, it isn't always possible to
;;	  fill the load delay with useful instructions.
;;	  (2) should then pack the user of the loaded value as aggressively
;;	  as possible, in order to optimize the case when the predicate is
;;	  false.  See frv_pack_insn_p for more details.
;;
;;    (b) The final shorten_branches pass runs between (2) and (3).
;;	  Since (2) inserts nops, it is possible that some branches
;;	  that were thought to be in range during (2) turned out to
;;	  be out-of-range in (3).
;;
;; All three passes use DFAs to model issue restrictions.  The main
;; question that the DFAs are supposed to answer is simply: can these
;; instructions be packed together?  The DFAs are not responsible for
;; assigning instructions to execution units; that's the job of
;; frv_sort_insn_group, see below for details.
;;
;; To get the best results, the DFAs should try to allow packets to
;; be built in every possible order.  This gives the scheduler more
;; flexibility, removing the need for things like multipass lookahead.
;; It also means we can take more advantage of inter-packet dependencies.
;;
;; For example, suppose we're compiling for the fr400 and we have:
;;
;;	addi	gr4,#1,gr5
;;	ldi	@(gr6,gr0),gr4
;;
;; We can pack these instructions together by assigning the load to I0 and
;; the addition to I1.  However, because of the anti dependence between the
;; two instructions, the scheduler must schedule the addition first.
;; We should generally get better schedules if the DFA allows both
;; (ldi, addi) and (addi, ldi), leaving the final packing pass to
;; reorder the packet where appropriate.
;;
;; Almost all integer instructions can issue to any unit in the range I0
;; to Ix, where the value of "x" depends on the type of instruction and
;; on the target processor.  The rules for other instruction groups are
;; usually similar.
;;
;; When the restrictions are as regular as this, we can get the desired
;; behavior by claiming the DFA unit associated with the highest unused
;; execution unit.
;; For example, if an instruction can issue to I0 or I1, the DFA first
;; tries to take the DFA unit associated with I1, and will only take I0's
;; unit if I1 isn't free.  (Note that, as mentioned above, the DFA does not
;; assign instructions to units.  An instruction that claims DFA unit I1
;; will not necessarily issue to I1 in the final packet.)
;;
;; There are some cases, such as the fr550 media restriction mentioned
;; above, where the rule is not as simple as "any unit between 0 and X".
;; Even so, allocating higher units first brings us close to the ideal.
;;
;; Having divided instructions into packets, passes (2) and (3) must
;; assign instructions to specific execution units.  They do this using
;; the following algorithm:
;;
;;    1. Partition the instructions into groups (integer, float/media, etc.)
;;
;;    2. For each group of instructions:
;;
;;	 (a) Issue each instruction in the reset DFA state and use the
;;	     DFA cpu_unit_query interface to find out which unit it picks
;;	     first.
;;
;;	 (b) Sort the instructions into ascending order of picked units.
;;	     Instructions that pick I1 first come after those that pick
;;	     I0 first, and so on.  Let S be the sorted sequence and S[i]
;;	     be the ith element of it (counting from zero).
;;
;;	 (c) If this is the control or branch group, goto (i)
;;
;;	 (d) Find the largest L such that S[0]...S[L-1] can be issued
;;	     consecutively from the reset state and such that the DFA
;;	     claims unit X when S[X] is added.  Let D be the DFA state
;;	     after instructions S[0]...S[L-1] have been issued.
;;
;;	 (e) If L is the length of S, goto (i)
;;
;;	 (f) Let U be the number of units belonging to this group and #S be
;;	     the length of S.  Create a new sequence S' by concatenating
;;	     S[L]...S[#S-1] and (U - #S) nops.
;;
;;	 (g) For each permutation S'' of S', try issuing S'' from last to
;;	     first, starting with state D.  See if the DFA claims unit
;;	     X + L when each S''[X] is added.
;;	     If so, set S to the concatenation of S[0]...S[L-1] and S'',
;;	     then goto (i).
;;
;;	 (h) If (g) found no permutation, abort.
;;
;;	 (i) S is now the sorted sequence for this group, meaning that S[X]
;;	     issues to unit X.  Trim any unwanted nops from the end of S.
;;
;; The sequence calculated by (b) is trivially correct for control
;; instructions since they can't be packed.  It is also correct for branch
;; instructions due to their simple issue requirements.  For integer and
;; floating-point/media instructions, the sequence calculated by (b) is
;; often the correct answer; the rest of the algorithm is optimized for
;; the case in which it is correct.
;;
;; If there were no irregularities in the issue restrictions then step
;; (d) would not be needed.  It is mainly there to cope with the fr550
;; integer restrictions, where a store can issue to I1, but only if a store
;; also issues to I0.  (Note that if a packet has two stores, they will be
;; at the beginning of the sequence calculated by (b).)  It also copes
;; with fr400 M-2 instructions, which must issue to M0, and which cannot
;; be issued together with an mnop in M1.
;;
;; Step (g) is the main one for integer and float/media instructions.
;; The first permutation it tries is S' itself (because, as noted above,
;; the sequence calculated by (b) is often correct).  If S' doesn't work,
;; the implementation tries varying the beginning of the sequence first.
;; Thus the nops towards the end of the sequence will only move to lower
;; positions if absolutely necessary.
;;
;; The algorithm is theoretically exponential in the number of instructions
;; in a group, although it's only O(n log(n)) if the sequence calculated by
;; (b) is acceptable.  In practice, the algorithm completes quickly even
;; in the rare cases where (g) needs to try other permutations.
(define_automaton "integer, float_media, branch, control, idiv, div")

;; The main issue units.
;; Note that not all units are available on all processors.
(define_query_cpu_unit "i0,i1,i2,i3" "integer")
(define_query_cpu_unit "f0,f1,f2,f3" "float_media")
(define_query_cpu_unit "b0,b1" "branch")
(define_query_cpu_unit "c" "control")

;; Division units.
(define_cpu_unit "idiv1,idiv2" "idiv")
(define_cpu_unit "div1,div2,root" "div")

;; Control instructions cannot be packed with others.
(define_reservation "control" "i0+i1+i2+i3+f0+f1+f2+f3+b0+b1")

;; Generic reservation for control insns
(define_insn_reservation "control" 1
  (eq_attr "type" "trap,spr,unknown,multi")
  "c + control")

;; Reservation for relaxable calls to gettlsoff.
(define_insn_reservation "load_or_call" 3
  (eq_attr "type" "load_or_call")
  "c + control")

;; ::::::::::::::::::::
;; ::
;; :: Generic/FR500 scheduler description
;; ::
;; ::::::::::::::::::::

;; Integer insns
;; Synthetic units used to describe issue restrictions.
(define_automaton "fr500_integer")
(define_cpu_unit "fr500_load0,fr500_load1,fr500_store0" "fr500_integer")
(exclusion_set "fr500_load0,fr500_load1" "fr500_store0")

(define_bypass 0 "fr500_i1_sethi" "fr500_i1_setlo")
(define_insn_reservation "fr500_i1_sethi" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "sethi"))
  "i1|i0")

(define_insn_reservation "fr500_i1_setlo" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "setlo"))
  "i1|i0")

(define_insn_reservation "fr500_i1_int" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "int"))
  "i1|i0")

(define_insn_reservation "fr500_i1_mul" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "mul"))
  "i1|i0")

(define_insn_reservation "fr500_i1_div" 19
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "div"))
  "(i1|i0),(idiv1*18|idiv2*18)")

(define_insn_reservation "fr500_i2" 4
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "gload,fload"))
  "(i1|i0) + (fr500_load0|fr500_load1)")

(define_insn_reservation "fr500_i3" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "gstore,fstore"))
  "i0 + fr500_store0")

(define_insn_reservation "fr500_i4" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "movgf,movfg"))
  "i0")

(define_insn_reservation "fr500_i5" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "jumpl"))
  "i0")

;;
;; Branch-instructions
;;
(define_insn_reservation "fr500_branch" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "jump,branch,ccr"))
  "b1|b0")

(define_insn_reservation "fr500_call" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "call"))
  "b0")

;; Floating point insns.  The default latencies are for non-media
;; instructions; media instructions incur an extra cycle.

(define_bypass 4 "fr500_farith" "fr500_m1,fr500_m2,fr500_m3,
				 fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_farith" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fnop,fsconv,fsadd,fsmul,fsmadd,fdconv,fdadd,fdmul,fdmadd"))
  "(f1|f0)")

(define_insn_reservation "fr500_fcmp" 4
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fscmp,fdcmp"))
  "(f1|f0)")

(define_bypass 11 "fr500_fdiv" "fr500_m1,fr500_m2,fr500_m3,
				fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_fdiv" 10
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fsdiv,fddiv"))
  "(f1|f0),(div1*9 | div2*9)")

(define_bypass 16 "fr500_froot" "fr500_m1,fr500_m2,fr500_m3,
				 fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_froot" 15
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "sqrt_single,sqrt_double"))
  "(f1|f0) + root*15")

;; Media insns.
;; Conflict table is as follows:
;;
;;           M1  M2  M3  M4  M5  M6
;;        M1  -   -   -   -   -   -
;;        M2  -   -   -   -   X   X
;;        M3  -   -   -   -   X   X
;;        M4  -   -   -   -   -   X
;;        M5  -   X   X   -   X   X
;;        M6  -   X   X   X   X   X
;;
;; where X indicates an invalid combination.
;;
;; Target registers are as follows:
;;
;;	  M1 : FPRs
;;	  M2 : FPRs
;;	  M3 : ACCs
;;	  M4 : ACCs
;;	  M5 : FPRs
;;	  M6 : ACCs
;;
;; The default FPR latencies are for integer instructions.
;; Floating-point instructions need one cycle more and media
;; instructions need one cycle less.
(define_automaton "fr500_media")
(define_cpu_unit "fr500_m2_0,fr500_m2_1" "fr500_media")
(define_cpu_unit "fr500_m3_0,fr500_m3_1" "fr500_media")
(define_cpu_unit "fr500_m4_0,fr500_m4_1" "fr500_media")
(define_cpu_unit "fr500_m5" "fr500_media")
(define_cpu_unit "fr500_m6" "fr500_media")

(exclusion_set "fr500_m5,fr500_m6" "fr500_m2_0,fr500_m2_1,
				    fr500_m3_0,fr500_m3_1")
(exclusion_set "fr500_m6" "fr500_m4_0,fr500_m4_1,fr500_m5")
