    </para>
    <programlisting>
int
main(int argc, char** argv)
{
    &hellip;
    init_all_network_interfaces();
    &hellip;
#ifdef CYGPKG_PROFILE_GPROF
    {
        extern char _stext[], _etext[];
        profile_on(_stext, _etext, 16, 3000);
    }
#endif
    &hellip;
}
    </programlisting>
    <para>
The application can then be linked and run as usual.
    </para>
    <informalfigure PgWide=1>
      <mediaobject>
        <imageobject>
          <imagedata fileref="gprofrun.png" Scalefit=1 Align="Center">
        </imageobject>
      </mediaobject>
    </informalfigure>
    <para>
When gprof is used for native development rather than for embedded
targets the profiling data will automatically be written out to a file
<filename>gmon.out</filename> when the program exits. This is not
possible on an embedded target because the code has no direct access
to the host's file system. Instead the <filename>gmon.out</filename>
file has to be <link linkend="gprof-extract">extracted</link> from
the target as described below. gprof can then be invoked normally:
    </para>
    <screen>
$ gprof dhrystone
Flat profile:
 
Each sample counts as 0.003003 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 14.15      1.45     1.45   120000    12.05    12.05  Proc_7
 11.55      2.63     1.18   120000     9.84     9.84  Func_1
  8.04      3.45     0.82                             main
&hellip;
    </screen>
    <para>
If <filename>gmon.out</filename> does not contain call graph data,
either because <function>mcount</function> is not supported or because
this functionality was explicitly disabled, then the
<option>--no-graph</option> option must be used.
    </para>
    <screen>
$ gprof --no-graph dhrystone
Flat profile:
 
Each sample counts as 0.003003 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 14.15      1.45     1.45                             Proc_7
 11.55      2.63     1.18                             Func_1
  8.04      3.45     0.82                             main
&hellip;
    </screen>
  </refsect1>

  <refsect1 id="gprof-extract"><title>Extracting the Data</title>
    <para>
By default gprof expects to find the profiling data in a file
<filename>gmon.out</filename> in the current directory. This package
provides two ways of extracting data: a gdb macro or tftp transfers.
Using tftp is faster but requires a TCP/IP stack on the target. It
also consumes some additional target-side resources, including an
extra tftp daemon thread and its stack. The gdb macro can be used even
when the eCos configuration does not include a TCP/IP stack. However
it is much slower, typically taking tens of seconds to retrieve all
the data for a non-trivial application.
    </para>
    <para>
The gdb macro is called <command>gprof_dump</command>, and can be
found in the file <filename>gprof.gdb</filename> in the <filename
class="directory">host</filename> subdirectory of this package. A
typical way of using this macro is:
    </para>
    <screen>
(gdb) source &lt;repo&gt;/services/profile/gprof/&lt;version&gt;/host/gprof.gdb
(gdb) gprof_dump
    </screen>
    <para>
This macro can be used any time after the call to
<function>profile_on</function>. It will store the profiling data
accumulated so far to the file <filename>gmon.out</filename> in the
current directory, and then reset all counts. gprof uses only a 16-bit
counter for every bucket of code. These counters can easily saturate
if the profiling run goes on for a long time, or if the application
code spends nearly all its time in just a few tight inner loops. The
counters will not actually wrap around back to zero. Instead they will
stick at 0xFFFF, but this will still affect the accuracy of the gprof
output. Hence it is desirable to reset the counters once the profiling
data has been extracted.
    </para>
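    <para>
The saturating behaviour can be illustrated with a minimal host-side
sketch. The names <function>bucket_hit</function> and
<function>buckets_reset</function> are hypothetical and are not part
of this package. The package's own implementation may differ in
detail.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: one 16-bit histogram counter per bucket of code.
 * Record one timer sample that landed in bucket `index`. The counter
 * sticks at 0xFFFF instead of wrapping back to zero.
 */
void
bucket_hit(uint16_t *buckets, size_t nbuckets, size_t index)
{
    assert(index < nbuckets);
    if (buckets[index] < UINT16_MAX) {
        buckets[index]++;
    }
}

/* Clear every counter, as would happen once the data has been extracted. */
void
buckets_reset(uint16_t *buckets, size_t nbuckets)
{
    for (size_t i = 0; i < nbuckets; i++) {
        buckets[i] = 0;
    }
}
```
]]></programlisting>
    <para>
Once a bucket has reached 0xFFFF any further samples for that bucket
are lost, so the reported time for the corresponding code will be an
underestimate until the counters are reset.
    </para>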
    <para>
The file <filename>gprof.gdb</filename> contains two other macros
which may prove useful. <command>gprof_fetch</command> extracts the
profiling data and generates the file <filename>gmon.out</filename>,
but does not reset the counters. <command>gprof_reset</command> only
resets the counters, without extracting the data or overwriting
<filename>gmon.out</filename>.
    </para>
    <para>
If the configuration includes a TCP/IP stack then the profiling data
can be extracted using tftp instead. There are two relevant
configuration options. <varname>CYGPKG_PROFILE_TFTP</varname>
controls whether or not tftp is supported. It is enabled by default if
the configuration includes a TCP/IP stack, but can be disabled to save
target-side resources.
<varname>CYGNUM_PROFILE_TFTP_PORT</varname> controls the UDP
port which will be used. This port cannot be shared with other tftp
daemons. If neither application code nor any other package (for
example the gcov test coverage package) provides a tftp service then
the default port can be used. Otherwise it will be necessary to assign
unique ports to each daemon.
    </para>
    <para>
If enabled the tftp daemon will be started automatically by
<function>profile_on</function>. This should only happen once the
network is up and running, typically after the call to
<function>init_all_network_interfaces</function>.
    </para>
    <para>
The data can then be retrieved using a standard tftp client. There are
a number of such clients available with very different interfaces, but
a typical session might look something like this:
    </para>
    <screen>
$ tftp
tftp&gt; connect 10.1.1.134
tftp&gt; binary
tftp&gt; get gmon.out
Received 64712 bytes in 0.9 seconds
tftp&gt; quit
    </screen>
    <para>
The address <literal>10.1.1.134</literal> should be replaced with the
target's IP address. Extracting the profiling data by tftp will
automatically reset the counters.
    </para>
  </refsect1>

  <refsect1 id="gprof-configuration"><title>Configuration Options</title>
    <para>
This package contains a number of configuration options. Two of these,
<varname>CYGPKG_PROFILE_TFTP</varname> and
<varname>CYGNUM_PROFILE_TFTP_PORT</varname>, relate to support for
<link linkend="gprof-extract">tftp transfers</link> and have already
been described.
    </para>
    <para>
Support for collecting the call graph data via
<function>mcount</function> is optional and can be controlled via
<varname>CYGPKG_PROFILE_CALLGRAPH</varname>. This option will only be
active if the HAL provides the underlying <function>mcount</function>
support and implements <varname>CYGINT_PROFILE_HAL_MCOUNT</varname>.
The call graph data allows gprof to produce more useful output, but at
the cost of extra run-time and memory overheads. If this option is
disabled then the <option>-pg</option> compiler flag should not be used.
    </para>
    <para>
If <varname>CYGPKG_PROFILE_CALLGRAPH</varname> is enabled then there
are two further options which can be used to control memory
requirements. Collecting the data requires two blocks of memory, a
simple hash table and an array of arc records. The
<function>mcount</function> code uses the program counter address to
index into the hash table, giving the first element of a singly linked
list. The array of arc records contains the various linked lists for
each hash slot. The required number of arc records depends on the
number of function calls in the application. For example if a function
<function>Proc_7</function> is called from three different places in
the application then three arc records will be needed.
    </para>
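    <para>
The arrangement of hash table and arc records can be sketched as
follows. This is an illustration only: the structure layout, the
fixed table sizes and the name <function>callgraph_record</function>
are all hypothetical, and the sketch assumes that the hash index is
derived from the caller's address.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only to keep the sketch small. */
#define HASH_SHIFT 8      /* each slot covers 256 bytes of code */
#define HASH_SLOTS 16
#define MAX_ARCS   32

struct arc {
    uintptr_t caller_pc;
    uintptr_t callee_pc;
    uint32_t  count;
    int       next;       /* index of the next arc in this slot, 0 = end */
};

struct callgraph {
    uintptr_t  low_pc;             /* start of the profiled code */
    int        slots[HASH_SLOTS];  /* head of each list, 0 = empty */
    struct arc arcs[MAX_ARCS];     /* entry 0 is reserved as "none" */
    int        used;               /* number of arc records in use */
};

/* Record one call. Returns 0 on success, -1 if out of range or full. */
int
callgraph_record(struct callgraph *cg, uintptr_t caller_pc, uintptr_t callee_pc)
{
    assert(cg != NULL);
    if (caller_pc < cg->low_pc) {
        return -1;
    }
    size_t slot = (size_t)((caller_pc - cg->low_pc) >> HASH_SHIFT);
    if (slot >= HASH_SLOTS) {
        return -1;
    }
    /* Walk the singly linked list for this slot, looking for the arc. */
    for (int i = cg->slots[slot]; i != 0; i = cg->arcs[i].next) {
        if (cg->arcs[i].caller_pc == caller_pc &&
            cg->arcs[i].callee_pc == callee_pc) {
            cg->arcs[i].count++;
            return 0;
        }
    }
    /* Not seen before: take a new arc record and push it on the list. */
    if (cg->used + 1 >= MAX_ARCS) {
        return -1;
    }
    int n = ++cg->used;
    cg->arcs[n] = (struct arc){ caller_pc, callee_pc, 1, cg->slots[slot] };
    cg->slots[slot] = n;
    return 0;
}
```
]]></programlisting>
    <para>
In this sketch a function called from three different places consumes
three arc records, and repeated calls from the same place only
increment the count of an existing record.
    </para>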
    <para>
<varname>CYGNUM_PROFILE_CALLGRAPH_HASH_SHIFT</varname> controls the
size of the hash table. The default value of 8 means that the program
counter is shifted right by eight places to give a hash table index.
Hence each hash table slot corresponds to 256 bytes of code, and for
an application with say 512K of code <function>profile_on</function>
will dynamically allocate an 8K hash table. Increasing the shift size
reduces the memory requirement, but means that each hash table slot
will correspond to more code and hence <function>mcount</function>
will need to traverse a longer linked list of arc records.
    </para>
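    <para>
The relationship between the shift and the hash table size can be
checked with a small host-side calculation. The function names are
hypothetical, and the slot size of 4 bytes is an assumption inferred
from the figures above (512K of code giving an 8K table at a shift of
8). Rounding up for a partial final region is likewise a choice made
for this sketch.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 bytes per slot, inferred from the 512K -> 8K example. */
#define SLOT_BYTES 4u

/* Number of hash table slots needed to cover code_bytes of code. */
size_t
hash_slots(size_t code_bytes, unsigned shift)
{
    assert(shift < sizeof(size_t) * 8);
    size_t bytes_per_slot = (size_t)1 << shift;
    /* Round up so that a partial final region still gets a slot. */
    return (code_bytes + bytes_per_slot - 1) >> shift;
}

/* Memory needed for the hash table itself. */
size_t
hash_table_bytes(size_t code_bytes, unsigned shift)
{
    return hash_slots(code_bytes, shift) * SLOT_BYTES;
}
```
]]></programlisting>
    <para>
Under these assumptions 512K of code needs 2048 slots (8K) at a shift
of 8 and 512 slots (2K) at a shift of 10, at the cost of each slot
covering 1024 bytes of code rather than 256.
    </para>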
    <para>
<varname>CYGNUM_PROFILE_CALLGRAPH_ARC_PERCENTAGE</varname> controls
how much memory <function>profile_on</function> will allocate for the
arc records. This uses a simple heuristic, a percentage of the overall
code size. By default the amount of arc record space allocated will be
5% of the code size, so for a 512K executable that requires
approximately 26K. This default should suffice for most applications.
In exceptional cases it may be insufficient and a diagnostic will be
generated when the profiling data is extracted.
    </para>
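    <para>
The heuristic amounts to a single percentage calculation, shown here
as a host-side sketch. The function name
<function>arc_space_bytes</function> is hypothetical, and the sketch
does not guard against overflow for very large code sizes.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Bytes of arc record space for a given code size and percentage. */
size_t
arc_space_bytes(size_t code_bytes, unsigned percentage)
{
    assert(percentage <= 100);
    return code_bytes * percentage / 100;
}
```
]]></programlisting>
    <para>
For 512K of code and the default of 5% this gives 26214 bytes, which
is the figure of approximately 26K quoted above.
    </para>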
  </refsect1>

  <refsect1 id="gprof-hal"><title>Implementing the HAL Support</title>
    <para>
The profiling package requires HAL support: a function
<function>hal_enable_profile_timer</function> and an implementation
of <function>mcount</function>. The profile timer is mandatory.
Typically it will be implemented by the variant or platform HAL
using a spare hardware timer, and that HAL package will also
implement the CDL interface
<varname>CYGINT_PROFILE_HAL_TIMER</varname>. Support for
<function>mcount</function> is optional but very desirable. Typically
it will be implemented by the architectural HAL, which will also
implement the CDL interface
<varname>CYGINT_PROFILE_HAL_MCOUNT</varname>. 
    </para>
    <programlisting>
#include &lt;pkgconf/system.h&gt;
#ifdef CYGPKG_PROFILE_GPROF
# include &lt;cyg/profile/profile.h&gt;
#endif

int
hal_enable_profile_timer(int resolution)
{
    &hellip;
    return actual_resolution;
}
    </programlisting>
    <para>
This function takes a single argument, a time interval in
microseconds. It should arrange for a timer interrupt to go off
after every interval. The timer VSR or ISR should then determine the
program counter of the interrupted code and register this with the
profiling package:
    </para>
    <programlisting>
    &hellip;
    __profile_hit(interrupted_pc);
    &hellip;
    </programlisting>
    <para>
The exact details of how this is achieved, especially obtaining the
interrupted PC, are left to the HAL implementor. The HAL is allowed to
modify the requested time interval because of hardware constraints,
and should return the interval that is actually used.
    </para>
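    <para>
One common hardware constraint is that a timer can only count whole
ticks of a fixed clock. The following host-side sketch shows how the
interval actually used might then be computed. The function name
<function>actual_resolution_us</function> and the clock frequencies
are hypothetical. A real HAL would also have to program the timer
hardware.
    </para>
    <programlisting><![CDATA[
```c
#include <assert.h>

/*
 * Hypothetical sketch: round a requested interval in microseconds to a
 * whole number of timer ticks, then report the interval that results.
 */
int
actual_resolution_us(int requested_us, long timer_hz)
{
    assert(requested_us > 0 && timer_hz > 0);
    /* Nearest whole number of ticks for the requested interval. */
    long long ticks = ((long long)requested_us * timer_hz + 500000) / 1000000;
    if (ticks < 1) {
        ticks = 1;    /* the timer cannot fire faster than one tick */
    }
    /* Convert back to microseconds, rounded to the nearest integer. */
    return (int)((ticks * 1000000 + timer_hz / 2) / timer_hz);
}
```
]]></programlisting>
    <para>
With a 1MHz clock a request for 3000 microseconds is honoured exactly.
With a 32768Hz clock the same request becomes 98 ticks, so the value
returned is 2991.
    </para>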
    <para>
<function>mcount</function> can be more difficult. The calls to
<function>mcount</function> are generated internally by the compiler
and the details depend on the target architecture. In fact
<function>mcount</function> may not use the standard calling
conventions at all. Typically implementing <function>mcount</function>
requires looking at the code that is actually generated, and possibly
at the sources of the appropriate compiler back end.
    </para>
    <para>
The HAL <function>mcount</function> function should call into the
profiling package using standard calling conventions:
    </para>
    <programlisting>
    &hellip;
    __profile_mcount((CYG_ADDRWORD) caller_pc, (CYG_ADDRWORD) callee_pc);
    &hellip;
    </programlisting>
    <para>
If <function>mcount</function> was invoked because
<function>main</function> called <function>Proc_1</function> then the
caller pc should be an address inside <function>main</function>,
typically corresponding to the return location, and the callee pc
should be an address inside <function>Proc_1</function>, usually near
the start of the function.
    </para>
    <para>
For some targets the compiler does additional work, for example
automatically allocating a per-function word of memory to eliminate
the need for the hash table. This is too target-specific and hence
cannot easily be used by the generic profiling package.
    </para>
  </refsect1>

</refentry>
</part>