<html xmlns:cf="http://docbook.sourceforge.net/xmlns/chunkfast/1.0"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>4.&nbsp;Cachegrind: a cache profiler</title><link rel="stylesheet" href="vg_basic.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.69.0"><link rel="start" href="index.html" title="Valgrind Documentation"><link rel="up" href="manual.html" title="Valgrind User Manual"><link rel="prev" href="mc-manual.html" title="3.&nbsp;Memcheck: a heavyweight memory checker"><link rel="next" href="cl-manual.html" title="5.&nbsp;Callgrind: a heavyweight profiler"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr><td width="22px" align="center" valign="middle"><a accesskey="p" href="mc-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td><td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td><td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td><th align="center" valign="middle">Valgrind User Manual</th><td width="22px" align="center" valign="middle"><a accesskey="n" href="cl-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td></tr></table></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="cg-manual"></a>4.&nbsp;Cachegrind: a cache profiler</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="cg-manual.html#cg-manual.cache">4.1. Cache profiling</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#cg-manual.overview">4.1.1. 
Overview</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cache-sim">4.1.2. Cache simulation specifics</a></span></dt></dl></dd><dt><span class="sect1"><a href="cg-manual.html#cg-manual.profile">4.2. Profiling programs</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#cg-manual.outputfile">4.2.1. Output file</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.cgopts">4.2.2. Cachegrind options</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.annotate">4.2.3. Annotating C/C++ programs</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.assembler">4.2.4. Annotating assembler programs</a></span></dt></dl></dd><dt><span class="sect1"><a href="cg-manual.html#cg-manual.annopts">4.3. <code class="computeroutput">cg_annotate</code> options</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#id2609196">4.3.1. Warnings</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2578132">4.3.2. Things to watch out for</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2601977">4.3.3. Accuracy</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2582811">4.3.4. Todo</a></span></dt></dl></dd></dl></div><p>Detailed technical documentation on how Cachegrind works is available in <a href="cg-tech-docs.html">How Cachegrind works</a>. If you only want to know how to <span><strong class="command">use</strong></span> it, this is the page you need to read.</p><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-manual.cache"></a>4.1.&nbsp;Cache profiling</h2></div></div></div><p>To use this tool, you must specify <code class="computeroutput">--tool=cachegrind</code> on the Valgrind command line.</p><p>Cachegrind is a tool for doing cache simulations and annotating your source line-by-line with the number of cache misses. 
In particular, it records:</p><div class="itemizedlist"><ul type="disc"><li><p>L1 instruction cache reads and misses;</p></li><li><p>L1 data cache reads and read misses, writes and write misses;</p></li><li><p>L2 unified cache reads and read misses, writes and write misses.</p></li></ul></div><p>On a modern machine, an L1 miss will typically cost around 10 cycles, and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be very useful for improving the performance of your program.</p><p>Also, since one instruction cache read is performed per instruction executed, you can find out how many instructions are executed per line, which can be useful for traditional profiling and test coverage.</p><p>Any feedback, bug-fixes, suggestions, etc., are welcome.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.overview"></a>4.1.1.&nbsp;Overview</h3></div></div></div><p>First off, as for normal Valgrind use, you probably want to compile with debugging info (the <code class="computeroutput">-g</code> flag). But by contrast with normal Valgrind use, you probably <span><strong class="command">do</strong></span> want to turn optimisation on, since you should profile your program as it will be normally run.</p><p>The two steps are:</p><div class="orderedlist"><ol type="1"><li><p>Run your program with <code class="computeroutput">valgrind --tool=cachegrind</code> in front of the normal command line invocation. When the program finishes, Cachegrind will print summary cache statistics. 
It also collects line-by-line information in a file <code class="computeroutput">cachegrind.out.pid</code>, where <code class="computeroutput">pid</code> is the program's process id.</p><p>This step should be done every time you want to collect information about a new program, a changed program, or about the same program with different input.</p></li><li><p>Generate a function-by-function summary, and possibly annotate source files, using the supplied <code class="computeroutput">cg_annotate</code> program. Source files to annotate can be specified manually on the command line, or "interesting" source files can be annotated automatically with the <code class="computeroutput">--auto=yes</code> option. You can annotate C/C++ files or assembly language files equally easily.</p><p>This step can be performed as many times as you like for each Step 1. You may want to do multiple annotations showing different information each time.</p></li></ol></div><p>The steps are described in detail in the following sections.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cache-sim"></a>4.1.2.&nbsp;Cache simulation specifics</h3></div></div></div><p>Cachegrind uses a simulation for a machine with a split L1 cache and a unified L2 cache. This configuration is used for all (modern) x86-based machines we are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are ancient history now.</p><p>The more specific characteristics of the simulation are as follows.</p><div class="itemizedlist"><ul type="disc"><li><p>Write-allocate: when a write miss occurs, the block written to is brought into the D1 cache. 
Most modern caches have this property.</p></li><li><p>Bit-selection hash function: the line(s) in the cache to which a memory block maps is chosen by the middle bits M--(M+N-1) of the byte address, where:</p><div class="itemizedlist"><ul type="circle"><li><p>line size = 2^M bytes</p></li><li><p>(cache size / line size) = 2^N</p></li></ul></div></li><li><p>Inclusive L2 cache: the L2 cache replicates all the entries of the L1 cache. This is standard on Pentium chips, but AMD Athlons use an exclusive L2 cache that only holds blocks evicted from L1. Ditto AMD Durons and most modern VIAs.</p></li></ul></div><p>The cache configuration simulated (cache size, associativity and line size) is determined automagically using the CPUID instruction. If you have an old machine that (a) doesn't support the CPUID instruction, or (b) supports it in an early incarnation that doesn't give any cache information, then Cachegrind will fall back to using a default configuration (that of a model 3/4 Athlon). Cachegrind will tell you if this happens. You can manually specify one, two or all three levels (I1/D1/L2) of the cache from the command line using the <code class="computeroutput">--I1</code>, <code class="computeroutput">--D1</code> and <code class="computeroutput">--L2</code> options.</p><p>Other noteworthy behaviour:</p><div class="itemizedlist"><ul type="disc"><li><p>References that straddle two cache lines are treated as follows:</p><div class="itemizedlist"><ul type="circle"><li><p>If both blocks hit --> counted as one hit</p></li><li><p>If one block hits, the other misses --> counted as one miss.</p></li><li><p>If both blocks miss --> counted as one miss (not two)</p></li></ul></div></li><li><p>Instructions that modify a memory location (eg. <code class="computeroutput">inc</code> and <code class="computeroutput">dec</code>) are counted as doing just a read, ie. a single data reference. 
This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.</p><p>Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.</p></li></ul></div><p>If you are interested in simulating a cache with different properties, it is not particularly hard to write your own cache simulator, or to modify the existing ones in <code class="computeroutput">vg_cachesim_I1.c</code>, <code class="computeroutput">vg_cachesim_D1.c</code>, <code class="computeroutput">vg_cachesim_L2.c</code> and <code class="computeroutput">vg_cachesim_gen.c</code>. We'd be interested to hear from anyone who does.</p></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-manual.profile"></a>4.2.&nbsp;Profiling programs</h2></div></div></div><p>To gather cache profiling information about the program <code class="computeroutput">ls -l</code>, invoke Cachegrind like this:</p><pre class="programlisting">valgrind --tool=cachegrind ls -l</pre><p>The program will execute (slowly). 
Upon completion, summary statistics that look like this will be printed:</p><pre class="programlisting">
==31751== I   refs:      27,742,716
==31751== I1  misses:           276
==31751== L2  misses:           275
==31751== I1  miss rate:        0.0%
==31751== L2i miss rate:        0.0%
==31751== 
==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
==31751== L2  misses:        23,085  (     3,987 rd +    19,098 wr)
==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%)
==31751== L2d miss rate:        0.1% (       0.0%   +       0.4%)
==31751== 
==31751== L2 misses:         23,360  (     4,262 rd +    19,098 wr)
==31751== L2 miss rate:         0.0% (       0.0%   +       0.4%)</pre><p>Cache accesses for instruction fetches are summarised first, giving the number of fetches made (this is the number of instructions executed, which can be useful to know in its own right), the number of I1 misses, and the number of L2 instruction (<code class="computeroutput">L2i</code>) misses.</p><p>Cache accesses for data follow. The information is similar to that of the instruction fetches, except that the values are also shown split between reads and writes (note each row's <code class="computeroutput">rd</code> and <code class="computeroutput">wr</code> values add up to the row's total).</p><p>Combined instruction and data figures for the L2 cache follow that.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.outputfile"></a>4.2.1.&nbsp;Output file</h3></div></div></div><p>As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named <code class="computeroutput">cachegrind.out.pid</code>. 
This file is human-readable, but is best interpreted by the accompanying program <code class="computeroutput">cg_annotate</code>, described in the next section.</p><p>Things to note about the <code class="computeroutput">cachegrind.out.pid</code> file:</p><div class="itemizedlist"><ul type="disc"><li><p>It is written every time Cachegrind is run, and will overwrite any existing <code class="computeroutput">cachegrind.out.pid</code> in the current directory (but that won't happen very often because it takes some time for process ids to be recycled).</p></li><li><p>It can be huge: <code class="computeroutput">ls -l</code> generates a file of about 350 KB. Browsing a few files and web pages with a Konqueror built with full debugging information generates a file of around 15 MB.</p></li></ul></div><p>The <code class="computeroutput">.pid</code> suffix on the output file name serves two purposes. Firstly, it means you don't have to rename old log files that you don't want to overwrite. Secondly, and more importantly, it allows correct profiling with the <code class="computeroutput">--trace-children=yes</code> option of programs that spawn child processes.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.cgopts"></a>4.2.2.&nbsp;Cachegrind options</h3></div></div></div><p><a name="cg.opts.para"></a>Manually specifies the I1/D1/L2 cache configuration, where <code class="varname">size</code> and <code class="varname">line_size</code> are measured in bytes. The three items must be comma-separated, but with no spaces, eg:</p><div class="literallayout"><p>&nbsp;&nbsp;&nbsp;&nbsp;valgrind