<dl>
<dt><code>-fbranch-probabilities</code>
<dd>After running a program compiled with <code>-fprofile-arcs</code>
(see <a href="Debugging-Options.html#Debugging%20Options">Options for Debugging Your Program or <code>gcc</code></a>), you can compile it a second time using
<code>-fbranch-probabilities</code>, to improve optimizations based on
the number of times each branch was taken. When the program
compiled with <code>-fprofile-arcs</code> exits it saves arc execution
counts to a file called <var>sourcename</var><code>.da</code> for each source
file.  The information in this data file is very dependent on the
structure of the generated code, so you must use the same source code
and the same optimization options for both compilations.
<p>With <code>-fbranch-probabilities</code>, GCC puts a
<code>REG_BR_PROB</code> note on each <code>JUMP_INSN</code> and <code>CALL_INSN</code>.
These can be used to improve optimization. Currently, they are only
used in one place: in <code>reorg.c</code>, instead of guessing which path a
branch is most likely to take, the <code>REG_BR_PROB</code> values are used to
determine exactly which path is taken more often.
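<p>For example, a profile-directed build is typically done in two steps;
the file name <code>foo.c</code> below is only a placeholder:
<pre>
     # Step 1: build an instrumented binary and run it on representative
     # input; the run writes arc execution counts to foo.da.
     gcc -O2 -fprofile-arcs foo.c -o foo
     ./foo

     # Step 2: rebuild with the same source and the same optimization
     # options, using the recorded counts.
     gcc -O2 -fbranch-probabilities foo.c -o foo
</pre>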
<br><dt><code>-fnew-ra</code>
<dd>Use a graph coloring register allocator. Currently this option is meant
for testing, so we are interested to hear about miscompilations with
<code>-fnew-ra</code>.
<br><dt><code>-ftracer</code>
<dd>Perform tail duplication to enlarge superblock size. This transformation
simplifies the control flow of the function, allowing other optimizations to do
a better job.
<br><dt><code>-funroll-loops</code>
<dd>Unroll loops whose number of iterations can be determined at compile
time or upon entry to the loop. <code>-funroll-loops</code> implies both
<code>-fstrength-reduce</code> and <code>-frerun-cse-after-loop</code>. This
option makes code larger, and may or may not make it run faster.
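<p>As an illustration, a loop such as the following has an iteration count
known at compile time and is therefore the kind of loop this option targets
(the function and array names are placeholders):
<pre>
     /* The trip count (16) is known at compile time, so with
        -funroll-loops the loop body may be replicated to reduce
        branch overhead, at the cost of larger code.  */
     void scale (double *a)
     {
       int i;
       for (i = 0; i &lt; 16; i++)
         a[i] *= 2.0;
     }
</pre>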
<br><dt><code>-funroll-all-loops</code>
<dd>Unroll all loops, even if their number of iterations is uncertain when
the loop is entered. This usually makes programs run more slowly.
<code>-funroll-all-loops</code> implies the same options as
<code>-funroll-loops</code>.
<br><dt><code>-fprefetch-loop-arrays</code>
<dd>If supported by the target machine, generate instructions to prefetch
memory to improve the performance of loops that access large arrays.
<p>Disabled at level <code>-Os</code>.
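<p>As a sketch, a loop that strides through a large array, such as the one
below, is the kind of code this option is aimed at; whether prefetch
instructions are actually emitted depends on the target machine:
<pre>
     /* Summing a large array touches memory sequentially; compiling
        with -fprefetch-loop-arrays may insert prefetch instructions
        ahead of the loads, if the target supports them.  */
     double sum (const double *a, long n)
     {
       double s = 0.0;
       long i;
       for (i = 0; i &lt; n; i++)
         s += a[i];
       return s;
     }
</pre>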
<br><dt><code>-ffunction-sections</code>
<dd><dt><code>-fdata-sections</code>
<dd>Place each function or data item into its own section in the output
file if the target supports arbitrary sections. The name of the
function or the name of the data item determines the section's name
in the output file.
<p>Use these options on systems where the linker can perform optimizations
to improve locality of reference in the instruction space. Most systems
using the ELF object format and SPARC processors running Solaris 2 have
linkers with such optimizations. AIX may have these optimizations in
the future.
<p>Only use these options when there are significant benefits from doing
so. When you specify these options, the assembler and linker will
create larger object and executable files and will also be slower.
You will not be able to use <code>gprof</code> on all systems if you
specify this option, and you may have problems with debugging if
you specify both this option and <code>-g</code>.
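<p>For example (the exact section names are target-dependent; the names shown
in the comment are merely typical of ELF targets):
<pre>
     gcc -O2 -ffunction-sections -fdata-sections -c foo.c
     objdump -h foo.o     # expect per-symbol sections such as
                          # .text.my_function and .data.my_variable
</pre>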
<br><dt><code>-fssa</code>
<dd>Perform optimizations in static single assignment form. Each function's
flow graph is translated into SSA form, optimizations are performed, and
the flow graph is translated back from SSA form. Users should not
specify this option, since it is not yet ready for production use.
<br><dt><code>-fssa-ccp</code>
<dd>Perform Sparse Conditional Constant Propagation in SSA form. Requires
<code>-fssa</code>. Like <code>-fssa</code>, this is an experimental feature.
<br><dt><code>-fssa-dce</code>
<dd>Perform aggressive dead-code elimination in SSA form. Requires <code>-fssa</code>.
Like <code>-fssa</code>, this is an experimental feature.
<br><dt><code>--param </code><var>name</var><code>=</code><var>value</var>
<dd>In some places, GCC uses various constants to control the amount of
optimization that is done.  For example, GCC will not inline functions
that contain more than a certain number of instructions.  You can
control some of these constants on the command-line using the
<code>--param</code> option.
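<p>For example, to raise the limit on the size of functions declared
<code>inline</code> that the tree inliner will consider (see
<code>max-inline-insns-single</code> below; the value 400 is merely
illustrative):
<pre>
     gcc -O2 --param max-inline-insns-single=400 -c foo.c
</pre>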
<p>In each case, the <var>value</var> is an integer. The allowable choices for
<var>name</var> are given in the following table:
<dl>
<dt><code>max-crossjump-edges</code>
<dd>The maximum number of incoming edges to consider for crossjumping.
The algorithm used by <code>-fcrossjumping</code> is O(N^2) in
the number of edges incoming to each block. Increasing values mean
more aggressive optimization, making the compile time increase with
probably only a small improvement in executable size.
<br><dt><code>max-delay-slot-insn-search</code>
<dd>The maximum number of instructions to consider when looking for an
instruction to fill a delay slot. If more than this arbitrary number of
instructions is searched, the time savings from filling the delay slot
will be minimal, so stop searching.  Increasing values mean more
aggressive optimization, making the compile time increase with probably
only a small improvement in executable run time.
<br><dt><code>max-delay-slot-live-search</code>
<dd>When trying to fill delay slots, the maximum number of instructions to
consider when searching for a block with valid live register
information. Increasing this arbitrarily chosen value means more
aggressive optimization, increasing the compile time. This parameter
should be removed when the delay slot code is rewritten to maintain the
control-flow graph.
<br><dt><code>max-gcse-memory</code>
<dd>The approximate maximum amount of memory that will be allocated in
order to perform the global common subexpression elimination
optimization. If more memory than specified is required, the
optimization will not be done.
<br><dt><code>max-gcse-passes</code>
<dd>The maximum number of passes of GCSE to run.
<br><dt><code>max-pending-list-length</code>
<dd>The maximum number of pending dependencies scheduling will allow
before flushing the current state and starting over. Large functions
with few branches or calls can create excessively large lists which
needlessly consume memory and resources.
<br><dt><code>max-inline-insns-single</code>
<dd>Several parameters control the tree inliner used in gcc.
This number sets the maximum number of instructions (counted in gcc's
internal representation) in a single function that the tree inliner
will consider for inlining. This only affects functions declared
inline and methods implemented in a class declaration (C++).
The default value is 300.
<br><dt><code>max-inline-insns-auto</code>
<dd>When you use <code>-finline-functions</code> (included in <code>-O3</code>),
many functions that would otherwise not be considered for inlining
by the compiler will be investigated.  For those functions, a different
(more restrictive) limit than for functions declared inline can
be applied.
The default value is 300.
<br><dt><code>max-inline-insns</code>
<dd>Once repeated inlining has already inlined the number of instructions
given here, the tree inliner decreases the allowable size for single
functions to be inlined.  This number should be a factor of
two or more larger than the single function limit.
Higher numbers result in better runtime performance, but incur higher
compile-time resource (CPU time, memory) requirements and result in
larger binaries. Very high values are not advisable, as too large
binaries may adversely affect runtime performance.
The default value is 600.
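<p>For instance, to allow more inlining than the defaults permit, the
per-function limits and this overall limit can be raised together (the
particular values below are illustrative only, keeping the factor-of-two
relationship described above):
<pre>
     gcc -O3 --param max-inline-insns-single=450 \
             --param max-inline-insns-auto=450 \
             --param max-inline-insns=900 -c foo.c
</pre>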
<br><dt><code>max-inline-slope</code>
<dd>After exceeding the maximum number of inlined instructions by repeated
inlining, a linear function is used to decrease the allowable size
for single functions. The slope of that function is the negative
reciprocal of the number specified here.
The default value is 32.
<br><dt><code>min-inline-insns</code>
<dd>The repeated inlining is throttled more and more by the linear function
after exceeding the limit. To avoid too much throttling, a minimum for
this function is specified here to allow repeated inlining for very small
functions even when a lot of repeated inlining has already been done.
The default value is 130.
<br><dt><code>max-inline-insns-rtl</code>
<dd>For languages that use the RTL inliner (this happens at a later stage
than tree inlining), you can set the maximum allowable size (counted
in RTL instructions) for the RTL inliner with this parameter.
The default value is 600.
<br><dt><code>max-unrolled-insns</code>
<dd>The maximum number of instructions that a loop may have if that loop
is to be unrolled; if the loop is unrolled, this parameter also determines
how many times the loop code is unrolled.
<br><dt><code>hot-bb-count-fraction</code>
<dd>The fraction of the maximal count of repetitions of a basic block in the
program that a given basic block needs to have to be considered hot.
<br><dt><code>hot-bb-frequency-fraction</code>
<dd>The fraction of the maximal frequency of executions of a basic block in a
function that a given basic block needs to have to be considered hot.
<br><dt><code>tracer-dynamic-coverage</code>
<dd><dt><code>tracer-dynamic-coverage-feedback</code>
<dd>
This value is used to limit superblock formation once the given percentage of
executed instructions is covered. This limits unnecessary code size
expansion.
<p>The <code>tracer-dynamic-coverage-feedback</code> value is used only when profile
feedback is available.  Real profiles (as opposed to statically estimated
ones) are much less balanced, allowing the threshold to be a larger value.
<br><dt><code>tracer-max-code-growth</code>
<dd>Stop tail duplication once code growth has reached the given percentage.  This
is a rather rough limit, as most of the duplicates will be eliminated later by
cross jumping, so it may be set to a much higher value than the desired code
growth.
<br><dt><code>tracer-min-branch-ratio</code>
<dd>
Stop reverse growth when the reverse probability of the best edge is less than
this threshold (in percent).
<br><dt><code>tracer-min-branch-probability</code>
<dd><dt><code>tracer-min-branch-probability-feedback</code>
<dd>
Stop forward growth if the best edge has a probability lower than this
threshold.
<p>As with <code>tracer-dynamic-coverage</code>, two values are present, one for
compilation for profile feedback and one for compilation without. The value
for compilation with profile feedback needs to be more conservative (higher) in
order to make tracer effective.
<br><dt><code>ggc-min-expand</code>
<dd>
GCC uses a garbage collector to manage its own memory allocation. This
parameter specifies the minimum percentage by which the garbage
collector's heap should be allowed to expand between collections.
Tuning this may improve compilation speed; it has no effect on code
generation.
<p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when
RAM >= 1GB. If <code>getrlimit</code> is available, the notion of "RAM" is
the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If
GCC is not able to calculate RAM on a particular platform, the lower
bound of 30% is used. Setting this parameter and
<code>ggc-min-heapsize</code> to zero causes a full collection to occur at
every opportunity. This is extremely slow, but can be useful for
debugging.
<br><dt><code>ggc-min-heapsize</code>
<dd>
Minimum size of the garbage collector's heap before it begins bothering
to collect garbage. The first collection occurs after the heap expands
by <code>ggc-min-expand</code>% beyond <code>ggc-min-heapsize</code>. Again,
tuning this may improve compilation speed, and has no effect on code
generation.
<p>The default is RAM/8, with a lower bound of 4096 (four megabytes) and an
upper bound of 131072 (128 megabytes). If <code>getrlimit</code> is
available, the notion of "RAM" is the smallest of actual RAM,
RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate
RAM on a particular platform, the lower bound is used. Setting this
parameter very large effectively disables garbage collection. Setting
this parameter and <code>ggc-min-expand</code> to zero causes a full
collection to occur at every opportunity.
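<p>For example, setting both parameters to zero forces a full collection at
every opportunity; as noted above, this is extremely slow but can be useful
for debugging:
<pre>
     gcc -O2 --param ggc-min-expand=0 --param ggc-min-heapsize=0 -c foo.c
</pre>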
</dl>
</dl>
</body></html>