📄 frv.md
;; Attribute is "yes" for branches and jumps that span too great a distance
;; to be implemented in the most natural way. Such instructions will use
;; a call instruction in some way.

(define_attr "far_jump" "yes,no" (const_string "no"))

;; Instruction type
;; "unknown" must come last.
(define_attr "type"
  "int,sethi,setlo,mul,div,gload,gstore,fload,fstore,movfg,movgf,macc,scan,cut,branch,jump,jumpl,call,spr,trap,fnop,fsconv,fsadd,fscmp,fsmul,fsmadd,fsdiv,sqrt_single,fdconv,fdadd,fdcmp,fdmul,fdmadd,fddiv,sqrt_double,mnop,mlogic,maveh,msath,maddh,mqaddh,mpackh,munpackh,mdpackh,mbhconv,mrot,mshift,mexpdhw,mexpdhd,mwcut,mmulh,mmulxh,mmach,mmrdh,mqmulh,mqmulxh,mqmach,mcpx,mqcpx,mcut,mclracc,mclracca,mdunpackh,mbhconve,mrdacc,mwtacc,maddacc,mdaddacc,mabsh,mdrot,mcpl,mdcut,mqsath,mqlimh,mqshift,mset,ccr,multi,load_or_call,unknown"
  (const_string "unknown"))

(define_attr "acc_group" "none,even,odd"
  (symbol_ref "frv_acc_group (insn)"))

;; Scheduling and Packing Overview
;; -------------------------------
;;
;; FR-V instructions are divided into five groups: integer, floating-point,
;; media, branch and control. Each group is associated with a separate set
;; of processing units, the number and behavior of which depend on the
;; target processor. Integer units have names like I0 and I1, floating-point
;; units have names like F0 and F1, and so on.
;;
;; Each member of the FR-V family has its own restrictions on which
;; instructions can issue to which units. For example, some processors
;; allow loads to issue to I0 or I1 while others only allow them to issue
;; to I0. As well as these processor-specific restrictions, there is a
;; general rule that an instruction can only issue to unit X + 1 if an
;; instruction in the same packet issued to unit X.
;;
;; Sometimes the only way to honor these restrictions is by adding nops
;; to a packet. For example, on the fr550, media instructions that access
;; ACC4-7 can only issue to M1 or M3.
;; It is therefore only possible to
;; execute these instructions by packing them with something that issues
;; to M0. When no useful M0 instruction exists, an "mnop" can be used
;; instead.
;;
;; Having decided which instructions should issue to which units, the packet
;; should be ordered according to the following template:
;;
;;     I0 F0/M0 I1 F1/M1 .... B0 B1 ...
;;
;; Note that VLIW packets execute strictly in parallel. Every instruction
;; in the packet will stall until all input operands are ready. These
;; operands are then read simultaneously before any registers are modified.
;; This means that it's OK to have write-after-read hazards between
;; instructions in the same packet, even if the write is listed earlier
;; than the read.
;;
;; Three gcc passes are involved in generating VLIW packets:
;;
;;    (1) The scheduler. This pass uses the standard scheduling code and
;;        behaves in much the same way as it would for a superscalar RISC
;;        architecture.
;;
;;    (2) frv_reorg. This pass inserts nops into packets in order to meet
;;        the processor's issue requirements. It also has code to optimize
;;        the type of padding used to align labels.
;;
;;    (3) frv_pack_insns. The final packing phase, which puts the
;;        instructions into assembly language order according to the
;;        "I0 F0/M0 ..." template above.
;;
;; In the ideal case, these three passes will agree on which instructions
;; should be packed together, but this won't always happen. In particular:
;;
;;    (a) (2) might not pack predicated instructions in the same way as (1).
;;        The scheduler tries to schedule predicated instructions for the
;;        worst case, assuming the predicate is true. However, if we have
;;        something like a predicated load, it isn't always possible to
;;        fill the load delay with useful instructions.
;;        (2) should then
;;        pack the user of the loaded value as aggressively as possible,
;;        in order to optimize the case when the predicate is false.
;;        See frv_pack_insn_p for more details.
;;
;;    (b) The final shorten_branches pass runs between (2) and (3).
;;        Since (2) inserts nops, it is possible that some branches
;;        that were thought to be in range during (2) turn out to
;;        be out-of-range in (3).
;;
;; All three passes use DFAs to model issue restrictions. The main
;; question that the DFAs are supposed to answer is simply: can these
;; instructions be packed together? The DFAs are not responsible for
;; assigning instructions to execution units; that's the job of
;; frv_sort_insn_group, see below for details.
;;
;; To get the best results, the DFAs should try to allow packets to
;; be built in every possible order. This gives the scheduler more
;; flexibility, removing the need for things like multipass lookahead.
;; It also means we can take more advantage of inter-packet dependencies.
;;
;; For example, suppose we're compiling for the fr400 and we have:
;;
;;     addi gr4,#1,gr5
;;     ldi @(gr6,gr0),gr4
;;
;; We can pack these instructions together by assigning the load to I0 and
;; the addition to I1. However, because of the anti dependence between the
;; two instructions, the scheduler must schedule the addition first.
;; We should generally get better schedules if the DFA allows both
;; (ldi, addi) and (addi, ldi), leaving the final packing pass to
;; reorder the packet where appropriate.
;;
;; Almost all integer instructions can issue to any unit in the range I0
;; to Ix, where the value of "x" depends on the type of instruction and
;; on the target processor. The rules for other instruction groups are
;; usually similar.
;;
;; When the restrictions are as regular as this, we can get the desired
;; behavior by claiming the DFA unit associated with the highest unused
;; execution unit.
;; For example, if an instruction can issue to I0 or I1,
;; the DFA first tries to take the DFA unit associated with I1, and will
;; only take I0's unit if I1 isn't free. (Note that, as mentioned above,
;; the DFA does not assign instructions to units. An instruction that
;; claims DFA unit I1 will not necessarily issue to I1 in the final packet.)
;;
;; There are some cases, such as the fr550 media restriction mentioned
;; above, where the rule is not as simple as "any unit between 0 and X".
;; Even so, allocating higher units first brings us close to the ideal.
;;
;; Having divided instructions into packets, passes (2) and (3) must
;; assign instructions to specific execution units. They do this using
;; the following algorithm:
;;
;;    1. Partition the instructions into groups (integer, float/media, etc.)
;;
;;    2. For each group of instructions:
;;
;;       (a) Issue each instruction in the reset DFA state and use the
;;           DFA cpu_unit_query interface to find out which unit it picks
;;           first.
;;
;;       (b) Sort the instructions into ascending order of picked units.
;;           Instructions that pick I1 first come after those that pick
;;           I0 first, and so on. Let S be the sorted sequence and S[i]
;;           be the ith element of it (counting from zero).
;;
;;       (c) If this is the control or branch group, goto (i)
;;
;;       (d) Find the largest L such that S[0]...S[L-1] can be issued
;;           consecutively from the reset state and such that the DFA
;;           claims unit X when S[X] is added. Let D be the DFA state
;;           after instructions S[0]...S[L-1] have been issued.
;;
;;       (e) If L is the length of S, goto (i)
;;
;;       (f) Let U be the number of units belonging to this group and #S be
;;           the length of S. Create a new sequence S' by concatenating
;;           S[L]...S[#S-1] and (U - #S) nops.
;;
;;       (g) For each permutation S'' of S', try issuing S'' from last to
;;           first, starting with state D. See if the DFA claims unit
;;           X + L when each S''[X] is added.
;;           If so, set S to the
;;           concatenation of S[0]...S[L-1] and S'', then goto (i).
;;
;;       (h) If (g) found no permutation, abort.
;;
;;       (i) S is now the sorted sequence for this group, meaning that S[X]
;;           issues to unit X. Trim any unwanted nops from the end of S.
;;
;; The sequence calculated by (b) is trivially correct for control
;; instructions since they can't be packed. It is also correct for branch
;; instructions due to their simple issue requirements. For integer and
;; floating-point/media instructions, the sequence calculated by (b) is
;; often the correct answer; the rest of the algorithm is optimized for
;; the case in which it is correct.
;;
;; If there were no irregularities in the issue restrictions then step
;; (d) would not be needed. It is mainly there to cope with the fr550
;; integer restrictions, where a store can issue to I1, but only if a store
;; also issues to I0. (Note that if a packet has two stores, they will be
;; at the beginning of the sequence calculated by (b).) It also copes
;; with fr400 M-2 instructions, which must issue to M0, and which cannot
;; be issued together with an mnop in M1.
;;
;; Step (g) is the main one for integer and float/media instructions.
;; The first permutation it tries is S' itself (because, as noted above,
;; the sequence calculated by (b) is often correct). If S' doesn't work,
;; the implementation tries varying the beginning of the sequence first.
;; Thus the nops towards the end of the sequence will only move to lower
;; positions if absolutely necessary.
;;
;; The algorithm is theoretically exponential in the number of instructions
;; in a group, although it's only O(n log(n)) if the sequence calculated by
;; (b) is acceptable. In practice, the algorithm completes quickly even
;; in the rare cases where (g) needs to try other permutations.
(define_automaton "integer, float_media, branch, control, idiv, div")

;; The main issue units.
;; Note that not all units are available on
;; all processors.
(define_query_cpu_unit "i0,i1,i2,i3" "integer")
(define_query_cpu_unit "f0,f1,f2,f3" "float_media")
(define_query_cpu_unit "b0,b1" "branch")
(define_query_cpu_unit "c" "control")

;; Division units.
(define_cpu_unit "idiv1,idiv2" "idiv")
(define_cpu_unit "div1,div2,root" "div")

;; Control instructions cannot be packed with others.
(define_reservation "control" "i0+i1+i2+i3+f0+f1+f2+f3+b0+b1")

;; Generic reservation for control insns
(define_insn_reservation "control" 1
  (eq_attr "type" "trap,spr,unknown,multi")
  "c + control")

;; Reservation for relaxable calls to gettlsoff.
(define_insn_reservation "load_or_call" 3
  (eq_attr "type" "load_or_call")
  "c + control")

;; ::::::::::::::::::::
;; ::
;; :: Generic/FR500 scheduler description
;; ::
;; ::::::::::::::::::::

;; Integer insns
;; Synthetic units used to describe issue restrictions.
(define_automaton "fr500_integer")
(define_cpu_unit "fr500_load0,fr500_load1,fr500_store0" "fr500_integer")
(exclusion_set "fr500_load0,fr500_load1" "fr500_store0")

(define_bypass 0 "fr500_i1_sethi" "fr500_i1_setlo")
(define_insn_reservation "fr500_i1_sethi" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "sethi"))
  "i1|i0")

(define_insn_reservation "fr500_i1_setlo" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "setlo"))
  "i1|i0")

(define_insn_reservation "fr500_i1_int" 1
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "int"))
  "i1|i0")

(define_insn_reservation "fr500_i1_mul" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "mul"))
  "i1|i0")

(define_insn_reservation "fr500_i1_div" 19
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "div"))
  "(i1|i0),(idiv1*18|idiv2*18)")

(define_insn_reservation "fr500_i2" 4
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "gload,fload"))
  "(i1|i0) + (fr500_load0|fr500_load1)")

(define_insn_reservation "fr500_i3" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "gstore,fstore"))
  "i0 + fr500_store0")

(define_insn_reservation "fr500_i4" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "movgf,movfg"))
  "i0")

(define_insn_reservation "fr500_i5" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "jumpl"))
  "i0")

;;
;; Branch-instructions
;;
(define_insn_reservation "fr500_branch" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "jump,branch,ccr"))
  "b1|b0")

(define_insn_reservation "fr500_call" 0
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "call"))
  "b0")

;; Floating point insns. The default latencies are for non-media
;; instructions; media instructions incur an extra cycle.

(define_bypass 4 "fr500_farith" "fr500_m1,fr500_m2,fr500_m3, fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_farith" 3
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fnop,fsconv,fsadd,fsmul,fsmadd,fdconv,fdadd,fdmul,fdmadd"))
  "(f1|f0)")

(define_insn_reservation "fr500_fcmp" 4
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fscmp,fdcmp"))
  "(f1|f0)")

(define_bypass 11 "fr500_fdiv" "fr500_m1,fr500_m2,fr500_m3, fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_fdiv" 10
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "fsdiv,fddiv"))
  "(f1|f0),(div1*9 | div2*9)")

(define_bypass 16 "fr500_froot" "fr500_m1,fr500_m2,fr500_m3, fr500_m4,fr500_m5,fr500_m6")
(define_insn_reservation "fr500_froot" 15
  (and (eq_attr "cpu" "generic,fr500,tomcat")
       (eq_attr "type" "sqrt_single,sqrt_double"))
  "(f1|f0) + root*15")

;; Media insns.
;; Conflict table is as follows:
;;
;;       M1  M2  M3  M4  M5  M6
;;   M1  -   -   -   -   -   -
;;   M2  -   -   -   -   X   X
;;   M3  -   -   -   -   X   X
;;   M4  -   -   -   -   -   X
;;   M5  -   X   X   -   X   X
;;   M6  -   X   X   X   X   X
;;
;; where X indicates an invalid combination.
;;
;; Target registers are as follows:
;;
;;   M1 : FPRs
;;   M2 : FPRs
;;   M3 : ACCs
;;   M4 : ACCs
;;   M5 : FPRs
;;   M6 : ACCs
;;
;; The default FPR latencies are for integer instructions.
;; Floating-point instructions need one cycle more and media
;; instructions need one cycle less.
(define_automaton "fr500_media")
(define_cpu_unit "fr500_m2_0,fr500_m2_1" "fr500_media")
(define_cpu_unit "fr500_m3_0,fr500_m3_1" "fr500_media")
(define_cpu_unit "fr500_m4_0,fr500_m4_1" "fr500_media")
(define_cpu_unit "fr500_m5" "fr500_media")
(define_cpu_unit "fr500_m6" "fr500_media")

(exclusion_set "fr500_m5,fr500_m6" "fr500_m2_0,fr500_m2_1, fr500_m3_0,fr500_m3_1")
(exclusion_set "fr500_m6" "fr500_m4_0,fr500_m4_1,fr500_m5")
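
;; Illustrative note (an editorial sketch, not part of the original
;; description). The unit definitions and exclusion_sets above appear to
;; encode the conflict table as follows, assuming that each M2-M6
;; reservation claims one synthetic unit named after its class:
;;
;;   - M5 and M6 have a single synthetic unit each, so two M5 insns (or
;;     two M6 insns) cannot share a packet.
;;   - M2, M3 and M4 have two synthetic units each (_0 and _1), so two
;;     insns of the same class can be packed together.
;;   - M1 has no synthetic unit because it conflicts with nothing.
;;   - The first exclusion_set rules out M5 or M6 alongside M2 or M3;
;;     the second rules out M6 alongside M4 or M5.
;;
;; Under that assumption, a reservation for an M5-class insn might look
;; like the commented-out sketch below. The reservation name and the type
;; name "m5_example" are hypothetical placeholders, and the latency of 3
;; is arbitrary.
;;
;;   (define_insn_reservation "fr500_m5_example" 3
;;     (and (eq_attr "cpu" "generic,fr500,tomcat")
;;          (eq_attr "type" "m5_example"))
;;     "(f1|f0) + fr500_m5")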