📄 reorg.c
字号:
/* Perform instruction reorganizations for delay slot filling. Copyright (C) 1992, 93, 94, 95, 96, 1997 Free Software Foundation, Inc. Contributed by Richard Kenner (kenner@vlsi1.ultra.nyu.edu). Hacked by Michael Tiemann (tiemann@cygnus.com).This file is part of GNU CC.GNU CC is free software; you can redistribute it and/or modifyit under the terms of the GNU General Public License as published bythe Free Software Foundation; either version 2, or (at your option)any later version.GNU CC is distributed in the hope that it will be useful,but WITHOUT ANY WARRANTY; without even the implied warranty ofMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See theGNU General Public License for more details.You should have received a copy of the GNU General Public Licensealong with GNU CC; see the file COPYING. If not, write tothe Free Software Foundation, 59 Temple Place - Suite 330,Boston, MA 02111-1307, USA. *//* Instruction reorganization pass. This pass runs after register allocation and final jump optimization. It should be the last pass to run before peephole. It serves primarily to fill delay slots of insns, typically branch and call insns. Other insns typically involve more complicated interactions of data dependencies and resource constraints, and are better handled by scheduling before register allocation (by the function `schedule_insns'). The Branch Penalty is the number of extra cycles that are needed to execute a branch insn. On an ideal machine, branches take a single cycle, and the Branch Penalty is 0. Several RISC machines approach branch delays differently: The MIPS and AMD 29000 have a single branch delay slot. Most insns (except other branches) can be used to fill this slot. When the slot is filled, two insns execute in two cycles, reducing the branch penalty to zero. The Motorola 88000 conditionally exposes its branch delay slot, so code is shorter when it is turned off, but will run faster when useful insns are scheduled there. The IBM ROMP has two forms of branch and call insns, both with and without a delay slot. Much like the 88k, insns not using the delay slot can be shorted (2 bytes vs. 4 bytes), but will run slowed. The SPARC always has a branch delay slot, but its effects can be annulled when the branch is not taken. This means that failing to find other sources of insns, we can hoist an insn from the branch target that would only be safe to execute knowing that the branch is taken. The HP-PA always has a branch delay slot. For unconditional branches its effects can be annulled when the branch is taken. The effects of the delay slot in a conditional branch can be nullified for forward taken branches, or for untaken backward branches. This means we can hoist insns from the fall-through path for forward branches or steal insns from the target of backward branches. Three techniques for filling delay slots have been implemented so far: (1) `fill_simple_delay_slots' is the simplest, most efficient way to fill delay slots. This pass first looks for insns which come from before the branch and which are safe to execute after the branch. Then it searches after the insn requiring delay slots or, in the case of a branch, for insns that are after the point at which the branch merges into the fallthrough code, if such a point exists. When such insns are found, the branch penalty decreases and no code expansion takes place. (2) `fill_eager_delay_slots' is more complicated: it is used for scheduling conditional jumps, or for scheduling jumps which cannot be filled using (1). A machine need not have annulled jumps to use this strategy, but it helps (by keeping more options open). `fill_eager_delay_slots' tries to guess the direction the branch will go; if it guesses right 100% of the time, it can reduce the branch penalty as much as `fill_simple_delay_slots' does. If it guesses wrong 100% of the time, it might as well schedule nops (or on the m88k, unexpose the branch slot). When `fill_eager_delay_slots' takes insns from the fall-through path of the jump, usually there is no code expansion; when it takes insns from the branch target, there is code expansion if it is not the only way to reach that target. (3) `relax_delay_slots' uses a set of rules to simplify code that has been reorganized by (1) and (2). It finds cases where conditional test can be eliminated, jumps can be threaded, extra insns can be eliminated, etc. It is the job of (1) and (2) to do a good job of scheduling locally; `relax_delay_slots' takes care of making the various individual schedules work well together. It is especially tuned to handle the control flow interactions of branch insns. It does nothing for insns with delay slots that do not branch. On machines that use CC0, we are very conservative. We will not make a copy of an insn involving CC0 since we want to maintain a 1-1 correspondence between the insn that sets and uses CC0. The insns are allowed to be separated by placing an insn that sets CC0 (but not an insn that uses CC0; we could do this, but it doesn't seem worthwhile) in a delay slot. In that case, we point each insn at the other with REG_CC_USER and REG_CC_SETTER notes. Note that these restrictions affect very few machines because most RISC machines with delay slots will not use CC0 (the RT is the only known exception at this point). Not yet implemented: The Acorn Risc Machine can conditionally execute most insns, so it is profitable to move single insns into a position to execute based on the condition code of the previous insn. The HP-PA can conditionally nullify insns, providing a similar effect to the ARM, differing mostly in which insn is "in charge". */#include "config.h"#include <stdio.h>#include "rtl.h"#include "insn-config.h"#include "conditions.h"#include "hard-reg-set.h"#include "basic-block.h"#include "regs.h"#include "insn-flags.h"#include "recog.h"#include "flags.h"#include "output.h"#include "obstack.h"#include "insn-attr.h"/* Import list of registers used as spill regs from reload. */extern HARD_REG_SET used_spill_regs;/* Import highest label used in function at end of reload. */extern int max_label_num_after_reload;#ifdef DELAY_SLOTS#define obstack_chunk_alloc xmalloc#define obstack_chunk_free free#ifndef ANNUL_IFTRUE_SLOTS#define eligible_for_annul_true(INSN, SLOTS, TRIAL, FLAGS) 0#endif#ifndef ANNUL_IFFALSE_SLOTS#define eligible_for_annul_false(INSN, SLOTS, TRIAL, FLAGS) 0#endif/* Insns which have delay slots that have not yet been filled. */static struct obstack unfilled_slots_obstack;static rtx *unfilled_firstobj;/* Define macros to refer to the first and last slot containing unfilled insns. These are used because the list may move and its address should be recomputed at each use. */#define unfilled_slots_base \ ((rtx *) obstack_base (&unfilled_slots_obstack))#define unfilled_slots_next \ ((rtx *) obstack_next_free (&unfilled_slots_obstack))/* This structure is used to indicate which hardware resources are set or needed by insns so far. */struct resources{ char memory; /* Insn sets or needs a memory location. */ char unch_memory; /* Insn sets of needs a "unchanging" MEM. */ char volatil; /* Insn sets or needs a volatile memory loc. */ char cc; /* Insn sets or needs the condition codes. */ HARD_REG_SET regs; /* Which registers are set or needed. */};/* Macro to clear all resources. */#define CLEAR_RESOURCE(RES) \ do { (RES)->memory = (RES)->unch_memory = (RES)->volatil = (RES)->cc = 0; \ CLEAR_HARD_REG_SET ((RES)->regs); } while (0)/* Indicates what resources are required at the beginning of the epilogue. */static struct resources start_of_epilogue_needs;/* Indicates what resources are required at function end. */static struct resources end_of_function_needs;/* Points to the label before the end of the function. */static rtx end_of_function_label;/* This structure is used to record liveness information at the targets or fallthrough insns of branches. We will most likely need the information at targets again, so save them in a hash table rather than recomputing them each time. */struct target_info{ int uid; /* INSN_UID of target. */ struct target_info *next; /* Next info for same hash bucket. */ HARD_REG_SET live_regs; /* Registers live at target. */ int block; /* Basic block number containing target. */ int bb_tick; /* Generation count of basic block info. */};#define TARGET_HASH_PRIME 257/* Define the hash table itself. */static struct target_info **target_hash_table;/* For each basic block, we maintain a generation number of its basic block info, which is updated each time we move an insn from the target of a jump. This is the generation number indexed by block number. */static int *bb_ticks;/* Mapping between INSN_UID's and position in the code since INSN_UID's do not always monotonically increase. */static int *uid_to_ruid;/* Highest valid index in `uid_to_ruid'. */static int max_uid;static void mark_referenced_resources PROTO((rtx, struct resources *, int));static void mark_set_resources PROTO((rtx, struct resources *, int, int));static int stop_search_p PROTO((rtx, int));static int resource_conflicts_p PROTO((struct resources *, struct resources *));static int insn_references_resource_p PROTO((rtx, struct resources *, int));static int insn_sets_resources_p PROTO((rtx, struct resources *, int));static rtx find_end_label PROTO((void));static rtx emit_delay_sequence PROTO((rtx, rtx, int, int));static rtx add_to_delay_list PROTO((rtx, rtx));static void delete_from_delay_slot PROTO((rtx));static void delete_scheduled_jump PROTO((rtx));static void note_delay_statistics PROTO((int, int));static rtx optimize_skip PROTO((rtx));static int get_jump_flags PROTO((rtx, rtx));static int rare_destination PROTO((rtx));static int mostly_true_jump PROTO((rtx, rtx));static rtx get_branch_condition PROTO((rtx, rtx));static int condition_dominates_p PROTO((rtx, rtx));static rtx steal_delay_list_from_target PROTO((rtx, rtx, rtx, rtx, struct resources *, struct resources *, struct resources *, int, int *, int *, rtx *));static rtx steal_delay_list_from_fallthrough PROTO((rtx, rtx, rtx, rtx, struct resources *, struct resources *, struct resources *, int, int *, int *));static void try_merge_delay_insns PROTO((rtx, rtx));static rtx redundant_insn PROTO((rtx, rtx, rtx));static int own_thread_p PROTO((rtx, rtx, int));static int find_basic_block PROTO((rtx));static void update_block PROTO((rtx, rtx));static int reorg_redirect_jump PROTO((rtx, rtx));static void update_reg_dead_notes PROTO((rtx, rtx));static void fix_reg_dead_note PROTO((rtx, rtx));static void update_reg_unused_notes PROTO((rtx, rtx));static void update_live_status PROTO((rtx, rtx));static rtx next_insn_no_annul PROTO((rtx));static void mark_target_live_regs PROTO((rtx, struct resources *));static void fill_simple_delay_slots PROTO((rtx, int));static rtx fill_slots_from_thread PROTO((rtx, rtx, rtx, rtx, int, int, int, int, int, int *));static void fill_eager_delay_slots PROTO((rtx));static void relax_delay_slots PROTO((rtx));static void make_return_insns PROTO((rtx));static int redirect_with_delay_slots_safe_p PROTO ((rtx, rtx, rtx));static int redirect_with_delay_list_safe_p PROTO ((rtx, rtx, rtx));/* Given X, some rtl, and RES, a pointer to a `struct resource', mark which resources are references by the insn. If INCLUDE_CALLED_ROUTINE is TRUE, resources used by the called routine will be included for CALL_INSNs. */static voidmark_referenced_resources (x, res, include_delayed_effects) register rtx x; register struct resources *res; register int include_delayed_effects;{ register enum rtx_code code = GET_CODE (x); register int i, j; register char *format_ptr; /* Handle leaf items for which we set resource flags. Also, special-case CALL, SET and CLOBBER operators. */ switch (code) { case CONST: case CONST_INT: case CONST_DOUBLE: case PC: case SYMBOL_REF: case LABEL_REF: return; case SUBREG: if (GET_CODE (SUBREG_REG (x)) != REG) mark_referenced_resources (SUBREG_REG (x), res, 0); else { int regno = REGNO (SUBREG_REG (x)) + SUBREG_WORD (x); int last_regno = regno + HARD_REGNO_NREGS (regno, GET_MODE (x)); for (i = regno; i < last_regno; i++) SET_HARD_REG_BIT (res->regs, i); } return; case REG: for (i = 0; i < HARD_REGNO_NREGS (REGNO (x), GET_MODE (x)); i++) SET_HARD_REG_BIT (res->regs, REGNO (x) + i); return; case MEM: /* If this memory shouldn't change, it really isn't referencing memory. */ if (RTX_UNCHANGING_P (x)) res->unch_memory = 1; else res->memory = 1; res->volatil = MEM_VOLATILE_P (x); /* Mark registers used to access memory. */ mark_referenced_resources (XEXP (x, 0), res, 0); return; case CC0: res->cc = 1; return; case UNSPEC_VOLATILE: case ASM_INPUT: case TRAP_IF: /* Traditional asm's are always volatile. */ res->volatil = 1; return; case ASM_OPERANDS: res->volatil = MEM_VOLATILE_P (x); /* For all ASM_OPERANDS, we must traverse the vector of input operands. We can not just fall through here since then we would be confused by the ASM_INPUT rtx inside ASM_OPERANDS, which do not indicate traditional asms unlike their normal usage. */ for (i = 0; i < ASM_OPERANDS_INPUT_LENGTH (x); i++) mark_referenced_resources (ASM_OPERANDS_INPUT (x, i), res, 0); return; case CALL: /* The first operand will be a (MEM (xxx)) but doesn't really reference memory. The second operand may be referenced, though. */ mark_referenced_resources (XEXP (XEXP (x, 0), 0), res, 0); mark_referenced_resources (XEXP (x, 1), res, 0); return; case SET: /* Usually, the first operand of SET is set, not referenced. But registers used to access memory are referenced. SET_DEST is also referenced if it is a ZERO_EXTRACT or SIGN_EXTRACT. */ mark_referenced_resources (SET_SRC (x), res, 0); x = SET_DEST (x); if (GET_CODE (x) == SIGN_EXTRACT || GET_CODE (x) == ZERO_EXTRACT) mark_referenced_resources (x, res, 0); else if (GET_CODE (x) == SUBREG) x = SUBREG_REG (x); if (GET_CODE (x) == MEM) mark_referenced_resources (XEXP (x, 0), res, 0); return; case CLOBBER: return; case CALL_INSN: if (include_delayed_effects) { /* A CALL references memory, the frame pointer if it exists, the stack pointer, any global registers and any registers given in USE insns immediately in front of the CALL. However, we may have moved some of the parameter loading insns into the delay slot of this CALL. If so, the USE's for them don't count and should be skipped. */ rtx insn = PREV_INSN (x); rtx sequence = 0; int seq_size = 0; rtx next = NEXT_INSN (x); int i; /* If we are part of a delay slot sequence, point at the SEQUENCE. */ if (NEXT_INSN (insn) != x) { next = NEXT_INSN (NEXT_INSN (insn)); sequence = PATTERN (NEXT_INSN (insn)); seq_size = XVECLEN (sequence, 0); if (GET_CODE (sequence) != SEQUENCE) abort ();
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -