@c gprof.texi

📁 基于4个mips核的noc设计
💻 TEXI
📖 第 1 页 / 共 5 页
字号:
                0.00    0.00    3730/13496      ct_init (trees.c:385)
                0.00    0.00    6525/13496      ct_init (trees.c:387)
[6]      0.0    0.00    0.00   13496         init_block (trees.c:408)
@end smallexample

@node Annotated Source,,Line-by-line,Output
@section The Annotated Source Listing

@code{gprof}'s @samp{-A} option triggers an annotated source listing,
which lists the program's source code, each function labeled with the
number of times it was called.  You may also need to specify the
@samp{-I} option, if @code{gprof} can't find the source code files.

Compiling with @samp{gcc @dots{} -g -pg -a} augments your program
with basic-block counting code, in addition to function counting code.
This enables @code{gprof} to determine how many times each line
of code was executed.

For example, consider the following function, taken from gzip,
with line numbers added:

@smallexample
 1 ulg updcrc(s, n)
 2     uch *s;
 3     unsigned n;
 4 @{
 5     register ulg c;
 6
 7     static ulg crc = (ulg)0xffffffffL;
 8
 9     if (s == NULL) @{
10         c = 0xffffffffL;
11     @} else @{
12         c = crc;
13         if (n) do @{
14             c = crc_32_tab[...];
15         @} while (--n);
16     @}
17     crc = c;
18     return c ^ 0xffffffffL;
19 @}
@end smallexample

@code{updcrc} has at least five basic-blocks.  One is the function
itself.  The @code{if} statement on line 9 generates two more
basic-blocks, one for each branch of the @code{if}.  A fourth
basic-block results from the @code{if} on line 13, and the contents
of the @code{do} loop form the fifth basic-block.  The compiler may
also generate additional basic-blocks to handle various special
cases.

A program augmented for basic-block counting can be analyzed with
@samp{gprof -l -A}.
I also suggest use of the @samp{-x} option, which ensures that each
line of code is labeled at least once.  Here is @code{updcrc}'s
annotated source listing for a sample @code{gzip} run:

@smallexample
                ulg updcrc(s, n)
                    uch *s;
                    unsigned n;
            2 ->@{
                    register ulg c;

                    static ulg crc = (ulg)0xffffffffL;

            2 ->    if (s == NULL) @{
            1 ->        c = 0xffffffffL;
            1 ->    @} else @{
            1 ->        c = crc;
            1 ->        if (n) do @{
        26312 ->            c = crc_32_tab[...];
26312,1,26311 ->        @} while (--n);
                    @}
            2 ->    crc = c;
            2 ->    return c ^ 0xffffffffL;
            2 ->@}
@end smallexample

In this example, the function was called twice, passing once through
each branch of the @code{if} statement.  The body of the @code{do}
loop was executed a total of 26312 times.  Note how the @code{while}
statement is annotated.  It began execution 26312 times, once for
each iteration through the loop.  One of those times (the last time)
it exited, while it branched back to the beginning of the loop 26311
times.

@node Inaccuracy
@chapter Inaccuracy of @code{gprof} Output

@menu
* Sampling Error::      Statistical margins of error
* Assumptions::         Estimating children times
@end menu

@node Sampling Error,Assumptions,,Inaccuracy
@section Statistical Sampling Error

The run-time figures that @code{gprof} gives you are based on a
sampling process, so they are subject to statistical inaccuracy.  If
a function runs only a small amount of time, so that on the average
the sampling process ought to catch that function in the act only
once, there is a pretty good chance it will actually find that
function zero times, or twice.

By contrast, the number-of-calls and basic-block figures are derived
by counting, not sampling.
They are completely accurate and will not vary from run to run if
your program is deterministic.

The @dfn{sampling period} that is printed at the beginning of the
flat profile says how often samples are taken.  The rule of thumb is
that a run-time figure is accurate if it is considerably bigger than
the sampling period.

The actual amount of error can be predicted.  For @var{n} samples,
the @emph{expected} error is the square-root of @var{n}.  For
example, if the sampling period is 0.01 seconds and @code{foo}'s
run-time is 1 second, @var{n} is 100 samples (1 second/0.01 seconds),
sqrt(@var{n}) is 10 samples, so the expected error in @code{foo}'s
run-time is 0.1 seconds (10*0.01 seconds), or ten percent of the
observed value.  Again, if the sampling period is 0.01 seconds and
@code{bar}'s run-time is 100 seconds, @var{n} is 10000 samples,
sqrt(@var{n}) is 100 samples, so the expected error in @code{bar}'s
run-time is 1 second, or one percent of the observed value.  It is
likely to vary this much @emph{on the average} from one profiling run
to the next.  (@emph{Sometimes} it will vary more.)

This does not mean that a small run-time figure is devoid of
information.  If the program's @emph{total} run-time is large, a
small run-time for one function does tell you that that function used
an insignificant fraction of the whole program's time.  Usually this
means it is not worth optimizing.

One way to get more accuracy is to give your program more (but
similar) input data so it will take longer.  Another way is to
combine the data from several runs, using the @samp{-s} option of
@code{gprof}.
Here is how:

@enumerate
@item
Run your program once.

@item
Issue the command @samp{mv gmon.out gmon.sum}.

@item
Run your program again, the same as before.

@item
Merge the new data in @file{gmon.out} into @file{gmon.sum} with this
command:

@example
gprof -s @var{executable-file} gmon.out gmon.sum
@end example

@item
Repeat the last two steps as often as you wish.

@item
Analyze the cumulative data using this command:

@example
gprof @var{executable-file} gmon.sum > @var{output-file}
@end example
@end enumerate

@node Assumptions,,Sampling Error,Inaccuracy
@section Estimating @code{children} Times

Some of the figures in the call graph are estimates---for example,
the @code{children} time values and all the time figures in caller
and subroutine lines.

There is no direct information about these measurements in the
profile data itself.  Instead, @code{gprof} estimates them by making
an assumption about your program that might or might not be true.

The assumption made is that the average time spent in each call to
any function @code{foo} is not correlated with who called @code{foo}.
If @code{foo} used 5 seconds in all, and 2/5 of the calls to
@code{foo} came from @code{a}, then @code{foo} contributes 2 seconds
to @code{a}'s @code{children} time, by assumption.

This assumption is usually true enough, but for some programs it is
far from true.  Suppose that @code{foo} returns very quickly when its
argument is zero; suppose that @code{a} always passes zero as an
argument, while other callers of @code{foo} pass other arguments.  In
this program, all the time spent in @code{foo} is in the calls from
callers other than @code{a}.  But @code{gprof} has no way of knowing
this; it will blindly and incorrectly charge 2 seconds of time in
@code{foo} to the children of @code{a}.

@c FIXME - has this been fixed?
We hope some day to put more complete data into @file{gmon.out}, so
that this assumption is no longer needed, if we can figure out how.
For the nonce, the estimated figures are usually more useful than
misleading.

@node How do I?
@chapter Answers to Common Questions

@table @asis
@item How do I find which lines in my program were executed the most times?

Compile your program with basic-block counting enabled, run it, then
use the following pipeline:

@example
gprof -l -C @var{objfile} | sort -k 3 -n -r
@end example

This listing will show you the lines in your code executed most
often, but not necessarily those that consumed the most time.

@item How do I find which lines in my program called a particular function?

Use @samp{gprof -l} and look up the function in the call graph.  The
callers will be broken down by function and line number.

@item How do I analyze a program that runs for less than a second?

Try using a shell script like this one:

@example
for i in `seq 1 100`; do
  fastprog
  mv gmon.out gmon.out.$i
done

gprof -s fastprog gmon.out.*

gprof fastprog gmon.sum
@end example

If your program is completely deterministic, all the call counts
will be simple multiples of 100 (i.e. a function called once in
each run will appear with a call count of 100).
@end table

@node Incompatibilities
@chapter Incompatibilities with Unix @code{gprof}

@sc{gnu} @code{gprof} and Berkeley Unix @code{gprof} use the same
data file @file{gmon.out}, and provide essentially the same
information.  But there are a few differences.

@itemize @bullet
@item
@sc{gnu} @code{gprof} uses a new, generalized file format with
support for basic-block execution counts and non-realtime histograms.
A magic cookie and version number allow @code{gprof} to easily
identify new-style files.  Old BSD-style files can still be read.
@xref{File Format}.

@item
For a recursive function, Unix @code{gprof} lists the function as a
parent and as a child, with a @code{calls} field that lists the
number of recursive calls.
@sc{gnu} @code{gprof} omits these lines and puts the number of
recursive calls in the primary line.

@item
When a function is suppressed from the call graph with @samp{-e},
@sc{gnu} @code{gprof} still lists it as a subroutine of functions
that call it.

@item
@sc{gnu} @code{gprof} accepts the @samp{-k} option with its argument
in the form @samp{from/to}, instead of @samp{from to}.

@item
In the annotated source listing, if there are multiple basic blocks
on the same line, @sc{gnu} @code{gprof} prints all of their counts,
separated by commas.

@ignore - it does this now
@item
The function names printed in @sc{gnu} @code{gprof} output do not
include the leading underscores that are added internally to the
front of all C identifiers on many operating systems.
@end ignore

@item
The blurbs, field widths, and output formats are different.
@sc{gnu} @code{gprof} prints blurbs after the tables, so that you
can see the tables without skipping the blurbs.
@end itemize

@node Details
@chapter Details of Profiling

@menu
* Implementation::      How a program collects profiling information
* File Format::         Format of @samp{gmon.out} files
* Internals::           @code{gprof}'s internal operation
* Debugging::           Using @code{gprof}'s @samp{-d} option
@end menu

@node Implementation,File Format,,Details
@section Implementation of Profiling

Profiling works by changing how every function in your program is
compiled so that when it is called, it will stash away some
information about where it was called from.  From this, the profiler
can figure out what function called it, and can count how many times
it was called.
This change is made by the compiler when your program is compiled
with the @samp{-pg} option, which causes every function to call
@code{mcount} (or @code{_mcount}, or @code{__mcount}, depending on
the OS and compiler) as one of its first operations.

The @code{mcount} routine, included in the profiling library, is
responsible for recording in an in-memory call graph table both its
parent routine (the child) and its parent's parent.  This is
typically done by examining the stack frame to find both the address
of the child, and the return address in the original parent.  Since
this is a very machine-dependent operation, @code{mcount} itself is
typically a short assembly-language stub routine that extracts the
required information, and then calls @code{__mcount_internal} (a
normal C function) with two arguments---@code{frompc} and
@code{selfpc}.  @code{__mcount_internal} is responsible for
maintaining the in-memory call graph, which records @code{frompc},
@code{selfpc}, and the number of times each of these call arcs was
traversed.

GCC Version 2 provides a magical function
(@code{__builtin_return_address}), which allows a generic
@code{mcount} function to extract the required information from the
stack frame.  However, on some architectures, most notably the SPARC,
using this builtin
