number of basic blocks</strong></span>. For this, use the command line option <code class="option"><a href="cl-manual.html#opt.dump-every-bb">--dump-every-bb</a>=count</code>. </p></li><li><p><span><strong class="command">Dumping on entering/leaving all functions whose names start with</strong></span> <span class="emphasis"><em>funcprefix</em></span>. Use the options <code class="option"><a href="cl-manual.html#opt.dump-before">--dump-before</a>=funcprefix</code> and <code class="option"><a href="cl-manual.html#opt.dump-after">--dump-after</a>=funcprefix</code>. To zero cost counters before entering a function, use <code class="option"><a href="cl-manual.html#opt.zero-before">--zero-before</a>=funcprefix</code>. The prefix method for specifying function names was chosen to make it easier to use with C++: you don't have to specify full signatures.</p><p>You can specify these options multiple times for different function prefixes.</p></li><li><p><span><strong class="command">Program controlled dumping.</strong></span> Put </p><pre class="screen">#include &lt;valgrind/callgrind.h&gt;</pre><p> into your source and add <code class="computeroutput">CALLGRIND_DUMP_STATS;</code> where you want a dump to happen. Use <code class="computeroutput">CALLGRIND_ZERO_STATS;</code> to only zero the cost counters.</p><p>In Valgrind terminology, this method is called "client requests". The given macros generate a special instruction pattern with no effect at all (i.e. a NOP). When run under Valgrind, the CPU simulation engine detects the special instruction pattern and triggers special actions like the ones described above.</p></li></ul></div><p>If you are running a multi-threaded application and specify the command line option <code class="option"><a href="cl-manual.html#opt.separate-threads">--separate-threads</a>=yes</code>, every thread will be profiled on its own and will create its own profile dump. Thus, the last two methods will generate only one dump, that of the currently running thread. 
With the other methods, you will get multiple dumps (one for each thread) on a dump request.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cl-manual.limits"></a>5.3.3. Limiting the range of collected events</h3></div></div></div><p>To aggregate events (function enter/leave, instruction execution, memory access) into event counts, two conditions must hold: the events must be recognizable by Callgrind, and the collection state must be switched on.</p><p>Event collection is only possible if <span class="emphasis"><em>instrumentation</em></span> for program code is switched on. This is the default, but for faster execution (identical to <code class="computeroutput">valgrind --tool=none</code>), it can be switched off until the program reaches a state in which you want to start collecting profiling data. To start Callgrind without instrumentation, specify the option <code class="option"><a href="cl-manual.html#opt.instr-atstart">--instr-atstart</a>=no</code>. Instrumentation can be switched on interactively with </p><pre class="screen">callgrind_control -i on</pre><p> and off by specifying "off" instead of "on". Furthermore, the instrumentation state can be changed programmatically with the macros <code class="computeroutput">CALLGRIND_START_INSTRUMENTATION;</code> and <code class="computeroutput">CALLGRIND_STOP_INSTRUMENTATION;</code>. </p><p>In addition to enabling instrumentation, you must also enable event collection for the parts of your program you are interested in. By default, event collection is enabled everywhere. You can limit collection to specific functions by using <code class="option"><a href="cl-manual.html#opt.toggle-collect">--toggle-collect</a>=funcprefix</code>. This toggles the collection state on entering and leaving the specified functions. When this option is in effect, the default collection state at program start is "off". 
Only events happening while running inside functions whose names start with <span class="emphasis"><em>funcprefix</em></span> will be collected. Recursive calls of functions with <span class="emphasis"><em>funcprefix</em></span> do not trigger any action.</p><p>Note that with instrumentation switched off, the cache simulator cannot see any memory access events, so the simulated cache state is frozen and therefore wrong. To get useful cache events (hits/misses) after switching on instrumentation, the cache must first warm up, probably leading to many <span class="emphasis"><em>cold misses</em></span> which would not have happened in reality. If you do not want to see these, start event collection a few million instructions after you have switched on instrumentation.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cl-manual.cycles"></a>5.3.4. Avoiding cycles</h3></div></div></div><p>A group of functions in which any two are connected by a call chain from one to the other is called a cycle. For example, with A calling B, B calling C, and C calling A, the three functions A, B, C form one cycle.</p><p>If a call chain goes around inside a cycle multiple times, profiling cannot distinguish event counts coming from the first round from those coming from the second. Thus, it makes no sense to attach any inclusive cost to a call among functions inside one cycle. If "A > B" appears multiple times in a call chain, you have no way to partition the one big sum of all appearances of "A > B". Thus, for profile data presentation, all functions of a cycle are seen as one big virtual function.</p><p>Unfortunately, if you have an application using some callback mechanism (like any GUI program), or even with normal polymorphism (as in OO languages like C++), it's quite possible to get large cycles. 
As it is often impossible to say anything about performance behaviour inside of cycles, it is useful to introduce some mechanisms to avoid cycles in call graphs. This is done by treating the same function in different ways depending on the current execution context, either by giving it different names or by ignoring calls to functions.</p><p>You can ignore calls to a function with the option <code class="option"><a href="cl-manual.html#opt.fn-skip">--fn-skip</a>=funcprefix</code>. For example, you usually do not want to see the trampoline functions in the PLT sections for calls to functions in shared libraries. You can see the difference if you profile with <code class="option"><a href="cl-manual.html#opt.skip-plt">--skip-plt</a>=no</code>. If a call is ignored, the cost events happening in it are attached to the enclosing function.</p><p>If you have a recursive function, you can distinguish the first 10 recursion levels by specifying <code class="option"><a href="cl-manual.html#opt.fn-recursion-num">--fn-recursion10</a>=funcprefix</code>. To do this for all functions, use <code class="option"><a href="cl-manual.html#opt.fn-recursion">--fn-recursion</a>=10</code>, but this will give you much bigger profile data files. In the profile data, you will see the recursion levels of "func" as different functions with the names "func", "func'2", "func'3" and so on.</p><p>If you have call chains "A > B > C" and "A > C > B" in your program, you usually get a "false" cycle "B &lt;&gt; C". Use <code class="option"><a href="cl-manual.html#opt.fn-caller-num">--fn-caller2</a>=B</code> <code class="option"><a href="cl-manual.html#opt.fn-caller-num">--fn-caller2</a>=C</code>, and functions "B" and "C" will be treated as different functions depending on the direct caller. Using the apostrophe for appending this "context" to the function name, you get "A > B'A > C'B" and "A > C'A > B'C", and there will be no cycle. 
Use <code class="option"><a href="cl-manual.html#opt.fn-caller">--fn-caller</a>=3</code> to get a 2-caller dependency for all functions. Note that doing this will increase the size of profile data files.</p></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cl-manual.options"></a>5.4. Command line option reference</h2></div></div></div><p>In the following, options are grouped into classes, in the same order as the output of <code class="computeroutput">callgrind --help</code>.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cl-manual.options.misc"></a>5.4.1. Miscellaneous options</h3></div></div></div><div class="variablelist"><a name="cmd-options.misc"></a><dl><dt><span class="term"><code class="option">--help</code></span></dt><dd><p>Show summary of options. This is a short version of this manual section.</p></dd><dt><span class="term"><code class="option">--version</code></span></dt><dd><p>Show version of callgrind.</p></dd></dl></div></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cl-manual.options.creation"></a>5.4.2. Dump creation options</h3></div></div></div><p>These options influence the name and format of the profile data files.</p><div class="variablelist"><a name="cmd-options.creation"></a><dl><dt><a name="opt.base"></a><span class="term"> <code class="option">--base=&lt;prefix&gt; [default: callgrind.out] </code> </span></dt><dd><p>Specify the base name for the dump file names. To distinguish different profile runs of the same application, <code class="computeroutput">.&lt;pid&gt;</code> is appended to the base dump file name, with <code class="computeroutput">&lt;pid&gt;</code> being the process ID of the profile run (with multiple dumps happening, the file name is modified further; see below).</p><p>This option is especially useful if your application changes its working directory. 
Usually, the dump file is generated in the current working directory of the application at program termination. By giving an absolute path with the base specification, you can force a fixed directory for the dump files.</p></dd><dt><a name="opt.dump-instr"></a><span class="term"> <code class="option">--dump-instr=&lt;no|yes&gt; [default: no] </code> </span></dt><dd><p>This specifies that event counting should be performed at per-instruction granularity. This allows for assembler code annotation, but currently the results can only be shown with KCachegrind.</p></dd><dt><a name="opt.dump-line"></a><span class="term"> <code class="option">--dump-line=&lt;no|yes&gt; [default: yes] </code> </span></dt><dd><p>This specifies that event counting should be performed at source line granularity. This allows source annotation for sources which are compiled with debug information ("-g").</p></dd><dt><a name="opt.compress-strings"></a><span class="term"> <code class="option">--compress-strings=&lt;no|yes&gt; [default: yes] </code> </span></dt><dd><p>This option influences the output format of the profile data. It specifies whether strings (file and function names) should be identified by numbers. This shrinks the file size, but makes it more difficult for humans to read (which is not recommended either way).</p><p>However, this currently has to be switched off if the files are to be read by <code class="computeroutput">callgrind_annotate</code>!</p></dd><dt><a name="opt.compress-pos"></a><span class="term"> <code class="option">--compress-pos=&lt;no|yes&gt; [default: yes] </code> </span></dt><dd><p>This option influences the output format of the profile data. It specifies whether numerical positions are always specified as absolute values or are allowed to be relative to previous numbers. 
This shrinks the file size.</p><p>However, this currently has to be switched off if the files are to be read by <code class="computeroutput">callgrind_annotate</code>!</p></dd><dt><a name="opt.combine-dumps"></a><span class="term"> <code class="option">--combine-dumps=&lt;no|yes&gt; [default: no] </code> </span></dt><dd><p>When multiple profile data parts are to be generated, these parts are appended to the same output file if this option is set to