cg-tech-docs.html

From "memory checking tool source code valgrind-3.2.1." · HTML code · 481 lines in total · page 1 of 2
|(uninit)|      (padding)   (1 byte)
|i_addr2 |      instr_addr  (4 bytes)
|0       |      I.a         (8 bytes)
|0       |      I.m1        (8 bytes)
|0       |      I.m2        (8 bytes)
|0       |      D.a         (8 bytes)
|0       |      D.m1        (8 bytes)
|0       |      D.m2        (8 bytes)</pre>
<p>(Note that this step is not performed if a basic block is re-translated; see <a href="cg-tech-docs.html#cg-tech-docs.retranslations">Handling basic block retranslations</a> for more information.)</p>
<p>GCC inserts padding before the <code class="computeroutput">instr_addr</code> field so that it is word aligned.</p>
<p>The instrumentation added to call the cache simulation function looks like this (instrumentation is indented to distinguish it from the original UCode):</p>
<pre class="programlisting">MOVL      $0x0, t20
PUTL      t20, %EAX
  PUSHL     %eax
  PUSHL     %ecx
  PUSHL     %edx
  MOVL      $0x4091F8A4, t46  # address of 1st CC
  PUSHL     t46
  CALLMo    $0x12             # second cachesim function
  CLEARo    $0x4
  POPL      %edx
  POPL      %ecx
  POPL      %eax
INCEIPo   $5
LEA1L     -4(t4), t14
MOVL      $0x99, t18
  MOVL      t14, t42
STL       t18, (t14)
  PUSHL     %eax
  PUSHL     %ecx
  PUSHL     %edx
  PUSHL     t42
  MOVL      $0x4091F8C4, t44  # address of 2nd CC
  PUSHL     t44
  CALLMo    $0x13             # second cachesim function
  CLEARo    $0x8
  POPL      %edx
  POPL      %ecx
  POPL      %eax
INCEIPo   $7</pre>
<p>Consider the first instruction's UCode.  Each call is surrounded by three <code class="computeroutput">PUSHL</code> and <code class="computeroutput">POPL</code> instructions to save and restore the caller-save registers.  Then the address of the instruction's cost centre is pushed onto the stack, to be the first argument to the cache simulation function.  The address is known at this point because we are doing a simultaneous pass through the cost centre array.
This means the cost centre lookup for each instruction is almost free (just the cost of pushing an argument for a function call).  Then the call to the cache simulation function for non-memory-reference instructions is made (note that the <code class="computeroutput">CALLMo</code> UInstruction takes an offset into a table of predefined functions; it is not an absolute address), and the single argument is <code class="computeroutput">CLEAR</code>ed from the stack.</p>
<p>The second instruction's UCode is similar.  The only difference is that, as mentioned before, we have to pass the address of the data item referenced to the cache simulation function too.  This explains the <code class="computeroutput">MOVL t14, t42</code> and <code class="computeroutput">PUSHL t42</code> UInstructions.  (Note that the seemingly redundant <code class="computeroutput">MOV</code>ing will probably be optimised away during register allocation.)</p>
<p>Note that instead of storing unchanging information about each instruction (instruction size, data size, etc) in its cost centre, we could have passed in these arguments to the simulation function.  But this would slow the calls down (two or three extra arguments pushed onto the stack).  Also it would bloat the UCode instrumentation by amounts similar to the space required for them in the cost centre; bloated UCode would also fill the translation cache more quickly, requiring more translations for large programs and slowing them down more.</p>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.retranslations"></a>2.5.&#160;Handling basic block retranslations</h2></div></div></div>
<p>The above description ignores one complication.  Valgrind has a limited size cache for basic block translations; if it fills up, old translations are discarded.
If a discarded basic block is executed again, it must be re-translated.</p>
<p>However, we can't use this approach for profiling -- we can't throw away cost centres for instructions in the middle of execution!  So when a basic block is translated, we first look for its cost centre array in the hash table.  If there is no cost centre array, it must be the first translation, so we proceed as described above.  But if there is a cost centre array already, it must be a retranslation.  In this case, we skip the cost centre allocation and initialisation steps, but still do the UCode instrumentation step.</p>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.cachesim"></a>2.6.&#160;The cache simulation</h2></div></div></div>
<p>The cache simulation is fairly straightforward.  It just tracks which memory blocks are in the cache at the moment (it doesn't track the contents, since that is irrelevant).</p>
<p>The interface to the simulation is quite clean.  The functions called from the UCode contain calls to the simulation functions in the files <code class="filename">vg_cachesim_{I1,D1,L2}.c</code>; these calls are inlined so that only one function call is done per simulated x86 instruction.  The file <code class="filename">vg_cachesim.c</code> simply <code class="computeroutput">#include</code>s the three files containing the simulation, which makes plugging in new cache simulations very easy -- you just replace the three files and recompile.</p>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.output"></a>2.7.&#160;Output</h2></div></div></div>
<p>Output is fairly straightforward, basically printing the cost centre for every instruction, grouped by files and functions.  Total counts (e.g.
total cache accesses, total L1 misses) are calculated when traversing this structure rather than during execution, to save time; the cache simulation functions are called so often that even one or two extra adds can make a sizeable difference.</p>
<p>The output file (which is the input to cg_annotate) has the following format:</p>
<pre class="programlisting">file         ::= desc_line* cmd_line events_line data_line+ summary_line
desc_line    ::= "desc:" ws? non_nl_string
cmd_line     ::= "cmd:" ws? cmd
events_line  ::= "events:" ws? (event ws)+
data_line    ::= file_line | fn_line | count_line
file_line    ::= ("fl=" | "fi=" | "fe=") filename
fn_line      ::= "fn=" fn_name
count_line   ::= line_num ws? (count ws)+
summary_line ::= "summary:" ws? (count ws)+
count        ::= num | "."</pre>
<p>Where:</p>
<div class="itemizedlist"><ul type="disc">
<li><p><code class="computeroutput">non_nl_string</code> is any string not containing a newline.</p></li>
<li><p><code class="computeroutput">cmd</code> is a command line invocation.</p></li>
<li><p><code class="computeroutput">filename</code> and <code class="computeroutput">fn_name</code> can be anything.</p></li>
<li><p><code class="computeroutput">num</code> and <code class="computeroutput">line_num</code> are decimal numbers.</p></li>
<li><p><code class="computeroutput">ws</code> is whitespace.</p></li>
<li><p><code class="computeroutput">nl</code> is a newline.</p></li>
</ul></div>
<p>The contents of the "desc:" lines are printed out at the top of the summary.  This is a generic way of providing simulation-specific information, e.g. for giving the cache configuration for cache simulation.</p>
<p>Counts can be "." to represent "N/A", e.g. the number of write misses for an instruction that doesn't write to memory.</p>
<p>The number of counts in each <code class="computeroutput">count_line</code> and the <code class="computeroutput">summary_line</code> should not exceed the number of events in the <code class="computeroutput">events_line</code>.
If a <code class="computeroutput">count_line</code> has fewer counts than that, cg_annotate treats the missing ones as though they were "." entries.</p>
<p>A <code class="computeroutput">file_line</code> changes the current file name.  A <code class="computeroutput">fn_line</code> changes the current function name.  A <code class="computeroutput">count_line</code> contains counts that pertain to the current filename/fn_name.  A "fl=" <code class="computeroutput">file_line</code> and a <code class="computeroutput">fn_line</code> must appear before any <code class="computeroutput">count_line</code>s to give the context of the first <code class="computeroutput">count_line</code>s.</p>
<p>Each <code class="computeroutput">file_line</code> should be immediately followed by a <code class="computeroutput">fn_line</code>.  "fi=" <code class="computeroutput">file_line</code>s are used to switch filenames for inlined functions; "fe=" <code class="computeroutput">file_line</code>s are similar, but are put at the end of a basic block in which the file name hasn't been switched back to the original file name.  (fi and fe lines behave the same; they are only distinguished to help debugging.)</p>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.summary"></a>2.8.&#160;Summary of performance features</h2></div></div></div>
<p>Quite a lot of work has gone into making the profiling as fast as possible.
This is a summary of the important features:</p>
<div class="itemizedlist"><ul type="disc">
<li><p>The basic block-level cost centre storage allows almost free cost centre lookup.</p></li>
<li><p>Only one function call is made per instruction simulated; even this accounts for a sizeable percentage of execution time, but it seems unavoidable if we want flexibility in the cache simulator.</p></li>
<li><p>Unchanging information about an instruction is stored in its cost centre, avoiding unnecessary argument pushing, and minimising UCode instrumentation bloat.</p></li>
<li><p>Summary counts are calculated at the end, rather than during execution.</p></li>
<li><p>The <code class="computeroutput">cachegrind.out</code> output files can contain huge amounts of information; the file format was carefully chosen to minimise file sizes.</p></li>
</ul></div>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.annotate"></a>2.9.&#160;Annotation</h2></div></div></div>
<p>Annotation is done by cg_annotate.  It is a fairly straightforward Perl script that slurps up all the cost centres, and then runs through all the chosen source files, printing out cost centres with them.  It too has been carefully optimised.</p>
</div>
<div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="cg-tech-docs.extensions"></a>2.10.&#160;Similar work, extensions</h2></div></div></div>
<p>It would be relatively straightforward to do other simulations and obtain line-by-line information about interesting events.  A good example would be branch prediction -- all branches could be instrumented to interact with a branch prediction simulator, using very similar techniques to those described above.</p>
<p>In particular, cg_annotate would not need to change -- the file format is such that it is not specific to the cache simulation, but could be used for any kind of line-by-line information.
The only part of cg_annotate that is specific to the cache simulation is the name of the input file (<code class="computeroutput">cachegrind.out</code>), although it would be very simple to add an option to control this.</p>
</div></div>
<div><br><table class="nav" width="100%" cellspacing="3" cellpadding="2" border="0" summary="Navigation footer"><tr><td rowspan="2" width="40%" align="left"><a accesskey="p" href="mc-tech-docs.html">&lt;&lt;