cg-manual.html

From "memory checking tool source code valgrind-3.2.1." · HTML code · 272 lines
<html xmlns:cf="http://docbook.sourceforge.net/xmlns/chunkfast/1.0"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>4.&nbsp;Cachegrind: a cache profiler</title><link rel="stylesheet" href="vg_basic.css" type="text/css"><meta name="generator" content="DocBook XSL Stylesheets V1.69.0"><link rel="start" href="index.html" title="Valgrind Documentation"><link rel="up" href="manual.html" title="Valgrind User Manual"><link rel="prev" href="mc-manual.html" title="3.&nbsp;Memcheck: a heavyweight memory checker"><link rel="next" href="cl-manual.html" title="5.&nbsp;Callgrind: a heavyweight profiler"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div><table class="nav" width="100%" cellspacing="3" cellpadding="3" border="0" summary="Navigation header"><tr><td width="22px" align="center" valign="middle"><a accesskey="p" href="mc-manual.html"><img src="images/prev.png" width="18" height="21" border="0" alt="Prev"></a></td><td width="25px" align="center" valign="middle"><a accesskey="u" href="manual.html"><img src="images/up.png" width="21" height="18" border="0" alt="Up"></a></td><td width="31px" align="center" valign="middle"><a accesskey="h" href="index.html"><img src="images/home.png" width="27" height="20" border="0" alt="Up"></a></td><th align="center" valign="middle">Valgrind User Manual</th><td width="22px" align="center" valign="middle"><a accesskey="n" href="cl-manual.html"><img src="images/next.png" width="18" height="21" border="0" alt="Next"></a></td></tr></table></div><div class="chapter" lang="en"><div class="titlepage"><div><div><h2 class="title"><a name="cg-manual"></a>4.&nbsp;Cachegrind: a cache profiler</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="cg-manual.html#cg-manual.cache">4.1. Cache profiling</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#cg-manual.overview">4.1.1. 
Overview</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cache-sim">4.1.2. Cache simulation specifics</a></span></dt></dl></dd><dt><span class="sect1"><a href="cg-manual.html#cg-manual.profile">4.2. Profiling programs</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#cg-manual.outputfile">4.2.1. Output file</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.cgopts">4.2.2. Cachegrind options</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.annotate">4.2.3. Annotating C/C++ programs</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#cg-manual.assembler">4.2.4. Annotating assembler programs</a></span></dt></dl></dd><dt><span class="sect1"><a href="cg-manual.html#cg-manual.annopts">4.3. <code class="computeroutput">cg_annotate</code> options</a></span></dt><dd><dl><dt><span class="sect2"><a href="cg-manual.html#id2609196">4.3.1. Warnings</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2578132">4.3.2. Things to watch out for</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2601977">4.3.3. Accuracy</a></span></dt><dt><span class="sect2"><a href="cg-manual.html#id2582811">4.3.4. Todo</a></span></dt></dl></dd></dl></div><p>Detailed technical documentation on how Cachegrind works is available in <a href="cg-tech-docs.html">How Cachegrind works</a>. If you only want to know how to <span><strong class="command">use</strong></span> it, this is the page you need to read.</p><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-manual.cache"></a>4.1.&nbsp;Cache profiling</h2></div></div></div><p>To use this tool, you must specify <code class="computeroutput">--tool=cachegrind</code> on the Valgrind command line.</p><p>Cachegrind is a tool for doing cache simulations and annotating your source line-by-line with the number of cache misses. 
In particular, it records:</p><div class="itemizedlist"><ul type="disc"><li><p>L1 instruction cache reads and misses;</p></li><li><p>L1 data cache reads and read misses, writes and write misses;</p></li><li><p>L2 unified cache reads and read misses, writes and write misses.</p></li></ul></div><p>On a modern machine, an L1 miss will typically cost around 10 cycles, and an L2 miss can cost as much as 200 cycles. Detailed cache profiling can be very useful for improving the performance of your program.</p><p>Also, since one instruction cache read is performed per instruction executed, you can find out how many instructions are executed per line, which can be useful for traditional profiling and test coverage.</p><p>Any feedback, bug fixes, suggestions, etc. are welcome.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.overview"></a>4.1.1.&nbsp;Overview</h3></div></div></div><p>First off, as for normal Valgrind use, you probably want to compile with debugging info (the <code class="computeroutput">-g</code> flag). But by contrast with normal Valgrind use, you probably <span><strong class="command">do</strong></span> want to turn optimisation on, since you should profile your program as it will normally be run.</p><p>The two steps are:</p><div class="orderedlist"><ol type="1"><li><p>Run your program with <code class="computeroutput">valgrind --tool=cachegrind</code> in front of the normal command line invocation. When the program finishes, Cachegrind will print summary cache statistics. 
It also collects line-by-line information in a file <code class="computeroutput">cachegrind.out.pid</code>, where <code class="computeroutput">pid</code> is the program's process id.</p><p>This step should be done every time you want to collect information about a new program, a changed program, or about the same program with different input.</p></li><li><p>Generate a function-by-function summary, and possibly annotate source files, using the supplied <code class="computeroutput">cg_annotate</code> program. Source files to annotate can be specified manually on the command line, or "interesting" source files can be annotated automatically with the <code class="computeroutput">--auto=yes</code> option. You can annotate C/C++ files or assembly language files equally easily.</p><p>This step can be performed as many times as you like for each Step 1. You may want to do multiple annotations showing different information each time.</p></li></ol></div><p>The steps are described in detail in the following sections.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cache-sim"></a>4.1.2.&nbsp;Cache simulation specifics</h3></div></div></div><p>Cachegrind uses a simulation for a machine with a split L1 cache and a unified L2 cache. This configuration is used for all (modern) x86-based machines we are aware of. Old Cyrix CPUs had a unified I and D L1 cache, but they are ancient history now.</p><p>The more specific characteristics of the simulation are as follows.</p><div class="itemizedlist"><ul type="disc"><li><p>Write-allocate: when a write miss occurs, the block written to is brought into the D1 cache. 
Most modern caches have this property.</p></li><li><p>Bit-selection hash function: the line(s) in the cache to which a memory block maps is chosen by the middle bits M--(M+N-1) of the byte address, where:</p><div class="itemizedlist"><ul type="circle"><li><p>line size = 2^M bytes</p></li><li><p>(cache size / line size) = 2^N</p></li></ul></div></li><li><p>Inclusive L2 cache: the L2 cache replicates all the entries of the L1 cache. This is standard on Pentium chips, but AMD Athlons use an exclusive L2 cache that only holds blocks evicted from L1. Ditto AMD Durons and most modern VIAs.</p></li></ul></div><p>The cache configuration simulated (cache size, associativity and line size) is determined automagically using the CPUID instruction. If you have an old machine that (a) doesn't support the CPUID instruction, or (b) supports it in an early incarnation that doesn't give any cache information, then Cachegrind will fall back to using a default configuration (that of a model 3/4 Athlon). Cachegrind will tell you if this happens. You can manually specify one, two or all three levels (I1/D1/L2) of the cache from the command line using the <code class="computeroutput">--I1</code>, <code class="computeroutput">--D1</code> and <code class="computeroutput">--L2</code> options.</p><p>Other noteworthy behaviour:</p><div class="itemizedlist"><ul type="disc"><li><p>References that straddle two cache lines are treated as follows:</p><div class="itemizedlist"><ul type="circle"><li><p>If both blocks hit --&gt; counted as one hit</p></li><li><p>If one block hits, the other misses --&gt; counted as one miss.</p></li><li><p>If both blocks miss --&gt; counted as one miss (not two)</p></li></ul></div></li><li><p>Instructions that modify a memory location (e.g. <code class="computeroutput">inc</code> and <code class="computeroutput">dec</code>) are counted as doing just a read, i.e. a single data reference. 
This may seem strange, but since the write can never cause a miss (the read guarantees the block is in the cache) it's not very interesting.</p><p>Thus it measures not the number of times the data cache is accessed, but the number of times a data cache miss could occur.</p></li></ul></div><p>If you are interested in simulating a cache with different properties, it is not particularly hard to write your own cache simulator, or to modify the existing ones in <code class="computeroutput">vg_cachesim_I1.c</code>, <code class="computeroutput">vg_cachesim_D1.c</code>, <code class="computeroutput">vg_cachesim_L2.c</code> and <code class="computeroutput">vg_cachesim_gen.c</code>. We'd be interested to hear from anyone who does.</p></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-manual.profile"></a>4.2.&nbsp;Profiling programs</h2></div></div></div><p>To gather cache profiling information about the program <code class="computeroutput">ls -l</code>, invoke Cachegrind like this:</p><pre class="programlisting">valgrind --tool=cachegrind ls -l</pre><p>The program will execute (slowly). 
Upon completion, summary statistics that look like this will be printed:</p><pre class="programlisting">==31751== I   refs:      27,742,716
==31751== I1  misses:           276
==31751== L2  misses:           275
==31751== I1  miss rate:        0.0%
==31751== L2i miss rate:        0.0%
==31751== 
==31751== D   refs:      15,430,290  (10,955,517 rd + 4,474,773 wr)
==31751== D1  misses:        41,185  (    21,905 rd +    19,280 wr)
==31751== L2  misses:        23,085  (     3,987 rd +    19,098 wr)
==31751== D1  miss rate:        0.2% (       0.1%   +       0.4%)
==31751== L2d miss rate:        0.1% (       0.0%   +       0.4%)
==31751== 
==31751== L2 misses:         23,360  (     4,262 rd +    19,098 wr)
==31751== L2 miss rate:         0.0% (       0.0%   +       0.4%)</pre><p>Cache accesses for instruction fetches are summarised first, giving the number of fetches made (this is the number of instructions executed, which can be useful to know in its own right), the number of I1 misses, and the number of L2 instruction (<code class="computeroutput">L2i</code>) misses.</p><p>Cache accesses for data follow. The information is similar to that of the instruction fetches, except that the values are also shown split between reads and writes (note each row's <code class="computeroutput">rd</code> and <code class="computeroutput">wr</code> values add up to the row's total).</p><p>Combined instruction and data figures for the L2 cache follow that.</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.outputfile"></a>4.2.1.&nbsp;Output file</h3></div></div></div><p>As well as printing summary information, Cachegrind also writes line-by-line cache profiling information to a file named <code class="computeroutput">cachegrind.out.pid</code>. 
This file is human-readable, but is best interpreted by the accompanying program <code class="computeroutput">cg_annotate</code>, described in the next section.</p><p>Things to note about the <code class="computeroutput">cachegrind.out.pid</code> file:</p><div class="itemizedlist"><ul type="disc"><li><p>It is written every time Cachegrind is run, and will overwrite any existing <code class="computeroutput">cachegrind.out.pid</code> in the current directory (but that won't happen very often because it takes some time for process ids to be recycled).</p></li><li><p>It can be huge: <code class="computeroutput">ls -l</code> generates a file of about 350 KB. Browsing a few files and web pages with a Konqueror built with full debugging information generates a file of around 15 MB.</p></li></ul></div><p>The <code class="computeroutput">.pid</code> suffix on the output file name serves two purposes. Firstly, it means you don't have to rename old log files that you don't want to overwrite. Secondly, and more importantly, it allows correct profiling with the <code class="computeroutput">--trace-children=yes</code> option of programs that spawn child processes.</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="cg-manual.cgopts"></a>4.2.2.&nbsp;Cachegrind options</h3></div></div></div><p><a name="cg.opts.para"></a>Manually specifies the I1/D1/L2 cache configuration, where <code class="varname">size</code> and <code class="varname">line_size</code> are measured in bytes. The three items must be comma-separated, but with no spaces, e.g.:</p><div class="literallayout"><p>&nbsp;&nbsp;&nbsp;&nbsp;valgrind