<dl>
<dt><code>-fbranch-probabilities</code>
<dd>After running a program compiled with <code>-fprofile-arcs</code>
(see <a href="Debugging-Options.html#Debugging%20Options">Options for Debugging Your Program or <code>gcc</code></a>), you can compile it a second time using
<code>-fbranch-probabilities</code>, to improve optimizations based on
the number of times each branch was taken. When the program
compiled with <code>-fprofile-arcs</code> exits it saves arc execution
counts to a file called <var>sourcename</var><code>.da</code> for each source
file.  The information in this data file is very dependent on the
structure of the generated code, so you must use the same source code
and the same optimization options for both compilations.
<p>With <code>-fbranch-probabilities</code>, GCC puts a
<code>REG_BR_PROB</code> note on each <code>JUMP_INSN</code> and <code>CALL_INSN</code>.
These can be used to improve optimization. Currently, they are only
used in one place: in <code>reorg.c</code>, instead of guessing which path a
branch is most likely to take, the <code>REG_BR_PROB</code> values are used to
determine exactly which path is taken more often.
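<p>For example, a profile-directed build is typically done in two steps;
the file name <code>foo.c</code> below is only a placeholder:
<pre>
     # Step 1: build an instrumented binary and run it on representative
     # input; the run writes arc execution counts to foo.da.
     gcc -O2 -fprofile-arcs foo.c -o foo
     ./foo

     # Step 2: rebuild with the same source and the same optimization
     # options, using the recorded counts.
     gcc -O2 -fbranch-probabilities foo.c -o foo
</pre>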
<br><dt><code>-fnew-ra</code>
<dd>Use a graph coloring register allocator. Currently this option is meant
for testing, so we are interested to hear about miscompilations with
<code>-fnew-ra</code>.
<br><dt><code>-ftracer</code>
<dd>Perform tail duplication to enlarge superblock size. This transformation
simplifies the control flow of the function, allowing other optimizations to do
a better job.
<br><dt><code>-funroll-loops</code>
<dd>Unroll loops whose number of iterations can be determined at compile
time or upon entry to the loop. <code>-funroll-loops</code> implies both
<code>-fstrength-reduce</code> and <code>-frerun-cse-after-loop</code>. This
option makes code larger, and may or may not make it run faster.
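<p>As an illustration, a loop such as the following has an iteration count
known at compile time and is therefore the kind of loop this option targets
(the function and array names are placeholders):
<pre>
     /* The trip count (16) is known at compile time, so with
        -funroll-loops the loop body may be replicated to reduce
        branch overhead, at the cost of larger code.  */
     void scale (double *a)
     {
       int i;
       for (i = 0; i &lt; 16; i++)
         a[i] *= 2.0;
     }
</pre>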
<br><dt><code>-funroll-all-loops</code>
<dd>Unroll all loops, even if their number of iterations is uncertain when
the loop is entered. This usually makes programs run more slowly.
<code>-funroll-all-loops</code> implies the same options as
<code>-funroll-loops</code>.
<br><dt><code>-fprefetch-loop-arrays</code>
<dd>If supported by the target machine, generate instructions to prefetch
memory to improve the performance of loops that access large arrays.
<p>Disabled at level <code>-Os</code>.
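<p>As a sketch, a loop that strides through a large array, such as the one
below, is the kind of code this option is aimed at; whether prefetch
instructions are actually emitted depends on the target machine:
<pre>
     /* Summing a large array touches memory sequentially; compiling
        with -fprefetch-loop-arrays may insert prefetch instructions
        ahead of the loads, if the target supports them.  */
     double sum (const double *a, long n)
     {
       double s = 0.0;
       long i;
       for (i = 0; i &lt; n; i++)
         s += a[i];
       return s;
     }
</pre>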
<br><dt><code>-ffunction-sections</code>
<dd><dt><code>-fdata-sections</code>
<dd>Place each function or data item into its own section in the output
file if the target supports arbitrary sections. The name of the
function or the name of the data item determines the section's name
in the output file.
<p>Use these options on systems where the linker can perform optimizations
to improve locality of reference in the instruction space. Most systems
using the ELF object format and SPARC processors running Solaris 2 have
linkers with such optimizations. AIX may have these optimizations in
the future.
<p>Only use these options when there are significant benefits from doing
so. When you specify these options, the assembler and linker will
create larger object and executable files and will also be slower.
You will not be able to use <code>gprof</code> on all systems if you
specify this option, and you may have problems with debugging if
you specify both this option and <code>-g</code>.
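<p>For example (the exact section names are target-dependent; the names shown
in the comment are merely typical of ELF targets):
<pre>
     gcc -O2 -ffunction-sections -fdata-sections -c foo.c
     objdump -h foo.o     # expect per-symbol sections such as
                          # .text.my_function and .data.my_variable
</pre>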
<br><dt><code>-fssa</code>
<dd>Perform optimizations in static single assignment form. Each function's
flow graph is translated into SSA form, optimizations are performed, and
the flow graph is translated back from SSA form. Users should not
specify this option, since it is not yet ready for production use.
<br><dt><code>-fssa-ccp</code>
<dd>Perform Sparse Conditional Constant Propagation in SSA form. Requires
<code>-fssa</code>. Like <code>-fssa</code>, this is an experimental feature.
<br><dt><code>-fssa-dce</code>
<dd>Perform aggressive dead-code elimination in SSA form. Requires <code>-fssa</code>.
Like <code>-fssa</code>, this is an experimental feature.
<br><dt><code>--param </code><var>name</var><code>=</code><var>value</var>
<dd>In some places, GCC uses various constants to control the amount of
optimization that is done.  For example, GCC will not inline functions
that contain more than a certain number of instructions.  You can
control some of these constants on the command-line using the
<code>--param</code> option.
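<p>For example, to raise the limit on the size of functions declared
<code>inline</code> that the tree inliner will consider (see
<code>max-inline-insns-single</code> below; the value 400 is merely
illustrative):
<pre>
     gcc -O2 --param max-inline-insns-single=400 -c foo.c
</pre>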
<p>In each case, the <var>value</var> is an integer. The allowable choices for
<var>name</var> are given in the following table:
<dl>
<dt><code>max-crossjump-edges</code>
<dd>The maximum number of incoming edges to consider for crossjumping.
The algorithm used by <code>-fcrossjumping</code> is O(N^2) in
the number of edges incoming to each block. Increasing values mean
more aggressive optimization, making the compile time increase with
probably only a small improvement in executable size.
<br><dt><code>max-delay-slot-insn-search</code>
<dd>The maximum number of instructions to consider when looking for an
instruction to fill a delay slot. If more than this arbitrary number of
instructions is searched, the time savings from filling the delay slot
will be minimal, so stop searching.  Increasing values mean more
aggressive optimization, making the compile time increase with probably
only a small improvement in executable run time.
<br><dt><code>max-delay-slot-live-search</code>
<dd>When trying to fill delay slots, the maximum number of instructions to
consider when searching for a block with valid live register
information. Increasing this arbitrarily chosen value means more
aggressive optimization, increasing the compile time. This parameter
should be removed when the delay slot code is rewritten to maintain the
control-flow graph.
<br><dt><code>max-gcse-memory</code>
<dd>The approximate maximum amount of memory that will be allocated in
order to perform the global common subexpression elimination
optimization. If more memory than specified is required, the
optimization will not be done.
<br><dt><code>max-gcse-passes</code>
<dd>The maximum number of passes of GCSE to run.
<br><dt><code>max-pending-list-length</code>
<dd>The maximum number of pending dependencies scheduling will allow
before flushing the current state and starting over. Large functions
with few branches or calls can create excessively large lists which
needlessly consume memory and resources.
<br><dt><code>max-inline-insns-single</code>
<dd>Several parameters control the tree inliner used in gcc.
This number sets the maximum number of instructions (counted in gcc's
internal representation) in a single function that the tree inliner
will consider for inlining. This only affects functions declared
inline and methods implemented in a class declaration (C++).
The default value is 300.
<br><dt><code>max-inline-insns-auto</code>
<dd>When you use <code>-finline-functions</code> (included in <code>-O3</code>),
many functions that would otherwise not be considered for inlining
by the compiler will be investigated.  For those functions, a different
(more restrictive) limit than for functions declared inline can
be applied.
The default value is 300.
<br><dt><code>max-inline-insns</code>
<dd>Once repeated inlining has already inlined the number of instructions
given here, the tree inliner decreases the allowable size for single
functions to be inlined.  This number should be a factor of
two or more larger than the single function limit.
Higher numbers result in better runtime performance, but incur higher
compile-time resource (CPU time, memory) requirements and result in
larger binaries. Very high values are not advisable, as too large
binaries may adversely affect runtime performance.
The default value is 600.
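<p>For instance, to allow more inlining than the defaults permit, the
per-function limits and this overall limit can be raised together (the
particular values below are illustrative only, keeping the factor-of-two
relationship described above):
<pre>
     gcc -O3 --param max-inline-insns-single=450 \
             --param max-inline-insns-auto=450 \
             --param max-inline-insns=900 -c foo.c
</pre>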
<br><dt><code>max-inline-slope</code>
<dd>After exceeding the maximum number of inlined instructions by repeated
inlining, a linear function is used to decrease the allowable size
for single functions. The slope of that function is the negative
reciprocal of the number specified here.
The default value is 32.
<br><dt><code>min-inline-insns</code>
<dd>The repeated inlining is throttled more and more by the linear function
after exceeding the limit. To avoid too much throttling, a minimum for
this function is specified here to allow repeated inlining for very small
functions even when a lot of repeated inlining has already been done.
The default value is 130.
<br><dt><code>max-inline-insns-rtl</code>
<dd>For languages that use the RTL inliner (this happens at a later stage
than tree inlining), you can set the maximum allowable size (counted
in RTL instructions) for the RTL inliner with this parameter.
The default value is 600.
<br><dt><code>max-unrolled-insns</code>
<dd>The maximum number of instructions that a loop may have if that loop
is to be unrolled; if the loop is unrolled, this parameter also determines
how many times the loop code is unrolled.
<br><dt><code>hot-bb-count-fraction</code>
<dd>The fraction of the maximal count of repetitions of a basic block in the
program that a given basic block needs to have to be considered hot.
<br><dt><code>hot-bb-frequency-fraction</code>
<dd>The fraction of the maximal frequency of executions of a basic block in a
function that a given basic block needs to have to be considered hot.
<br><dt><code>tracer-dynamic-coverage</code>
<dd><dt><code>tracer-dynamic-coverage-feedback</code>
<dd>
This value is used to limit superblock formation once the given percentage of
executed instructions is covered. This limits unnecessary code size
expansion.
<p>The <code>tracer-dynamic-coverage-feedback</code> value is used only when profile
feedback is available.  Real profiles (as opposed to statically estimated
ones) are much less balanced, allowing the threshold to be a larger value.
<br><dt><code>tracer-max-code-growth</code>
<dd>Stop tail duplication once code growth has reached the given percentage.  This
is a rather rough limit, as most of the duplicates will be eliminated later by
cross jumping, so it may be set to a much higher value than the desired code
growth.
<br><dt><code>tracer-min-branch-ratio</code>
<dd>
Stop reverse growth when the reverse probability of the best edge is less than
this threshold (in percent).
<br><dt><code>tracer-min-branch-probability</code>
<dd><dt><code>tracer-min-branch-probability-feedback</code>
<dd>
Stop forward growth if the best edge has a probability lower than this
threshold.
<p>As with <code>tracer-dynamic-coverage</code>, two values are present, one for
compilation for profile feedback and one for compilation without. The value
for compilation with profile feedback needs to be more conservative (higher) in
order to make tracer effective.
<br><dt><code>ggc-min-expand</code>
<dd>
GCC uses a garbage collector to manage its own memory allocation. This
parameter specifies the minimum percentage by which the garbage
collector's heap should be allowed to expand between collections.
Tuning this may improve compilation speed; it has no effect on code
generation.
<p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when
RAM >= 1GB. If <code>getrlimit</code> is available, the notion of "RAM" is
the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If
GCC is not able to calculate RAM on a particular platform, the lower
bound of 30% is used. Setting this parameter and
<code>ggc-min-heapsize</code> to zero causes a full collection to occur at
every opportunity. This is extremely slow, but can be useful for
debugging.
<br><dt><code>ggc-min-heapsize</code>
<dd>
Minimum size of the garbage collector's heap before it begins bothering
to collect garbage. The first collection occurs after the heap expands
by <code>ggc-min-expand</code>% beyond <code>ggc-min-heapsize</code>. Again,
tuning this may improve compilation speed, and has no effect on code
generation.
<p>The default is RAM/8, with a lower bound of 4096 (four megabytes) and an
upper bound of 131072 (128 megabytes). If <code>getrlimit</code> is
available, the notion of "RAM" is the smallest of actual RAM,
RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate
RAM on a particular platform, the lower bound is used. Setting this
parameter very large effectively disables garbage collection. Setting
this parameter and <code>ggc-min-expand</code> to zero causes a full
collection to occur at every opportunity.
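<p>For example, setting both parameters to zero forces a full collection at
every opportunity; as noted above, this is extremely slow but can be useful
for debugging:
<pre>
     gcc -O2 --param ggc-min-expand=0 --param ggc-min-heapsize=0 -c foo.c
</pre>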
</dl>
</dl>
</body></html>