% mprof.tex
\label{recursive call figure}
\end{figure}

We considered several solutions to the problems caused by cycles and
adopted the most conservative solution.  One way to avoid recording
spurious allocation caused by recursion is for the monitor to identify
the cycles before recording the allocation.  For example, in
Figure~\ref{recursive call figure}, the monitor could realize that it
had already credited {\tt F} with the 10 bytes when it encountered
{\tt F} calling {\tt G} the second time.  This solution adds overhead
to the monitor and conflicts with our goal to make the monitor as
unobtrusive as possible.

The solution that we adopted was to merge functions that are in a
cycle into a single node in the reduction phase.  Thus, each strongly
connected component in the dynamic call graph is merged into a single
node.  The result is a call graph with no cycles.  This process is
also used by {\tt gprof}, and is described carefully
elsewhere~\cite{graham83:gprof}.  Such an approach works well in {\tt
gprof} because C programs, for which {\tt gprof} was primarily
intended, tend to have limited amounts of recursion.  Lisp programs,
for which {\tt mprof} is also intended, intuitively contain much more
recursion.  We have experience profiling a number of large Common Lisp
programs.  We observe several recursive cycles in most programs, but
the cycles generally contain a small percentage of the total functions
and {\tt mprof} is quite effective.

\subsection{Lisp Implementation}

So far, we have described the implementation of {\tt mprof} for C.
The Lisp implementation is quite similar, and here we describe the
major differences.  C has a single function, {\tt malloc}, that is
called to allocate memory explicitly.  Lisp has a large number of
primitives that allocate memory implicitly (e.g., {\tt cons}, {\tt *},
{\tt intern}, etc.).  To make {\tt mprof} work, these primitives must
be modified so that every allocation is recorded.  Fortunately, at the
Lisp implementation level, all memory allocations may be channeled
through a single routine.  We worked with KCL (Kyoto Common Lisp),
which is implemented in C.  In KCL, all Lisp memory allocations are
handled by a single function, {\tt alloc\_object}.  Just as we had
modified {\tt malloc} in C, we were able to simply patch {\tt
alloc\_object} to monitor memory allocation in KCL.

The other major difference in monitoring Lisp is that the addresses
recorded by the monitor must be translated into Lisp function names.
Again, KCL makes this quite easy because Lisp functions are defined in
a central place in KCL and the names of the functions are known when
they are defined.  Many other Lisp systems are designed to allow
return addresses to be mapped to symbolic function names so that the
call stack can be printed at a breakpoint.  In this case, the monitor
can use the same mechanism to map return addresses to function names.
Therefore, in Lisp systems in which addresses can be quickly mapped to
function names, memory profiling in the style of {\tt mprof} is not a
difficult problem.  In systems in which symbolic names are not
available in compiled code, profiling is more difficult.  Furthermore,
many systems open-code important allocation functions, like {\tt
cons}.  Because open-coded allocation functions will not necessarily
call a central allocation function (like {\tt alloc\_object}), such
allocations will not be observed by {\tt mprof}.  To avoid such a loss
of information, {\tt mprof} should be used in conjunction with program
declarations that will force allocation functions such as {\tt cons}
to be coded out-of-line.
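To make the patching described above concrete, the following sketch
shows the general shape of such an allocation wrapper.  It is a
minimal illustration rather than the actual {\tt mprof} monitor: the
identifiers {\tt monitored\_malloc}, {\tt mon\_header}, and {\tt
record\_alloc} are invented for this example, the real monitor walks
the entire call chain rather than recording only the immediate return
address, and {\tt \_\_builtin\_return\_address} is a compiler
extension used here purely for brevity.

\begin{verbatim}
/* Illustrative sketch only; not the mprof source.  Each object gets
 * a small header holding its size and a link used later to build the
 * memory leak table, and the allocation is credited to the address
 * of the allocating call. */
#include <stdlib.h>

struct mon_header {
    size_t            size;   /* bytes requested by the caller      */
    struct mon_header *link;  /* chains objects for the leak table  */
};

static unsigned long total_bytes;   /* stand-in for the real monitor */

static void record_alloc(size_t n, void *ret_addr)
{
    /* The real monitor credits every caller/callee pair on the call
     * chain; this stand-in merely tallies the total. */
    (void) ret_addr;
    total_bytes += n;
}

void *monitored_malloc(size_t n)
{
    struct mon_header *h = malloc(sizeof *h + n);

    if (h == NULL)
        return NULL;
    h->size = n;
    h->link = NULL;
    record_alloc(n, __builtin_return_address(0));
    return (void *) (h + 1);  /* caller sees memory past the header */
}
\end{verbatim}

In KCL the corresponding patch is applied to {\tt alloc\_object}
rather than to {\tt malloc}, but the structure is the same.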
\section{Measurements}
\label{measurements}

We have measured the C implementation of {\tt mprof} by instrumenting
four programs using {\tt mprof}.  The first program, {\tt example}, is
our example program with the number of widgets allocated increased to
100,000 to increase program execution time.  The second program, {\tt
fidilrt}, is the runtime library of FIDIL, a programming language for
finite difference computations~\cite{hilfinger88:fidil}.  The third
program, {\tt epoxy}, is an electrical and physical layout optimizer
written by Fred Obermeier~\cite{fwo87:epoxy}.  The fourth program,
{\tt crystal}, is a VLSI timing analysis
program~\cite{ouster85:crystal}.  These tests represent a small
program ({\tt example}, 100 lines); a medium-sized program ({\tt
fidilrt}, 7,100 lines); and two large programs ({\tt epoxy}, 11,000
lines and {\tt crystal}, 10,500 lines).  In the remainder of this
section, we will look at the resource consumption of {\tt mprof} from
two perspectives: execution time overhead and space consumption.

\subsection{Execution Time Overhead}

There are two sources of execution time overhead associated with {\tt
mprof}: additional time spent monitoring an application and the time
to reduce and print the data produced by the monitor.  The largest
source of monitor overhead is the time required to traverse the
complete call chain and associate allocations with caller/callee
pairs.  We implemented a version of {\tt mprof}, called {\tt mprof-},
which does not create the allocation call graph.  With this version,
we can see the relative cost of the allocation call graph.  The ratio
of the time spent with profiling to the time spent without profiling
is called the {\em slowdown factor\/}.  Table~\ref{cpu:tab} summarizes
the execution time overheads for our four applications.  Measurements
were gathered running the test programs on a VAX 8800 with 80
megabytes of physical memory.

\begin{table}[htbp]
\begin{singlespace}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
\multicolumn{1}{|c|}{} & \multicolumn{4}{|c|}{Cost} \\ \cline{2-5}
\multicolumn{1}{|c|}{Resource Description} &
\multicolumn{1}{|c|}{\tt example} &
\multicolumn{1}{|c|}{\tt fidilrt} &
\multicolumn{1}{|c|}{\tt epoxy} &
\multicolumn{1}{|c|}{\tt crystal} \\ \hline
Number of allocations & 100000 & 77376 & 306295 & 31158 \\
Execution time with {\tt mprof} (seconds) & 62.7 & 132.7 & 188.8 & 134.1 \\
Execution time with {\tt mprof-} (seconds) & 44.1 & 116.0 & 149.7 & 25.5 \\
Execution time without {\tt mprof} (seconds) & 17.9 & 107.1 & 52.1 & 13.2 \\
\hline
Slowdown using {\tt mprof} & 3.5 & 1.2 & 3.6 & 10.1 \\
Slowdown using {\tt mprof-} & 2.5 & 1.1 & 2.9 & 1.9 \\
\hline
Reduction and display time (seconds) & 10.3 & 28.8 & 28.3 & 36.8 \\
\hline
\end{tabular}
\caption{Execution Time Overhead of {\tt mprof}}
\label{cpu:tab}
\end{center}
\end{singlespace}
\end{table}

The slowdown associated with {\tt mprof} varies widely, from 1.2 to
10.1.  {\tt crystal} suffered the worst degradation from profiling
because {\tt crystal} uses a depth-first algorithm that results in
long call chains.  Programs without long call chains appear to slow
down by a factor of 2--4.  If the allocation call graph is not
generated and long call chains are not traversed, the slowdown is
significantly less, especially in the extreme cases.  Since {\tt
mprof} is a prototype and has not been carefully optimized, this
overhead seems acceptable.  From the table, we see that the reduction
and display time is typically less than a minute.
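Most of the monitor overhead measured above comes from crediting the
allocated bytes to the caller/callee pairs along the call chain.  The
following sketch is again illustrative rather than taken from {\tt
mprof} (the names {\tt credit\_pair} and {\tt pair\_count} are
invented for this example); it shows one simple way such counts can
be accumulated.  Because relatively few unique caller/callee address
pairs occur on allocating call chains, a table of this form stays
small.

\begin{verbatim}
/* Illustrative sketch only; not the mprof source.  Bytes are
 * credited to each (caller, callee) return-address pair found on the
 * call chain at allocation time. */
#include <stdlib.h>

#define NBUCKETS 4096

struct pair_count {
    void              *caller, *callee;  /* return addresses        */
    unsigned long      bytes;            /* bytes credited to pair  */
    struct pair_count *next;             /* hash-collision chain    */
};

static struct pair_count *buckets[NBUCKETS];

void credit_pair(void *caller, void *callee, unsigned long nbytes)
{
    unsigned long h =
        ((unsigned long) caller ^ (unsigned long) callee) % NBUCKETS;
    struct pair_count *p;

    for (p = buckets[h]; p != NULL; p = p->next)
        if (p->caller == caller && p->callee == callee) {
            p->bytes += nbytes;          /* pair seen before         */
            return;
        }
    p = malloc(sizeof *p);               /* first time for this pair */
    if (p == NULL)
        return;                          /* out of memory: drop it   */
    p->caller = caller;
    p->callee = callee;
    p->bytes  = nbytes;
    p->next   = buckets[h];
    buckets[h] = p;
}
\end{verbatim}

The monitor walks the chain of return addresses once per allocation
and credits each adjacent pair in this fashion; {\tt mprof-} omits
this walk, which is why its slowdowns in Table~\ref{cpu:tab} are
noticeably lower.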
\subsection{Storage Consumption}
\label{mmeas:sec}

The storage consumption of {\tt mprof} is divided into the additional
memory needed by the monitor as an application executes, and the
external storage required by the profile data file.  The most
significant source of memory used by the monitor is the data stored
with each object allocated: an object size and a pointer needed to
construct the memory leak table.  The monitor also uses memory to
record the memory bins and caller/callee byte counts and must write
this information to a file when the application is finished.  We
measured how many bytes of memory and disk space are needed to store
this information.  Table~\ref{mem:tab} summarizes the measurements of
storage consumption associated with {\tt mprof}.

\begin{table}[htbp]
\begin{singlespace}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
\multicolumn{1}{|c|}{} & \multicolumn{4}{|c|}{Cost} \\ \cline{2-5}
\multicolumn{1}{|c|}{Resource Description} &
\multicolumn{1}{|c|}{\tt example} &
\multicolumn{1}{|c|}{\tt fidilrt} &
\multicolumn{1}{|c|}{\tt epoxy} &
\multicolumn{1}{|c|}{\tt crystal} \\ \hline
Number of allocations & 100000 & 61163 & 306295 & 31158 \\
User memory allocated (Kbytes) & 20000 & 2425 & 6418 & 21464 \\
\hline
Per object memory (Kbytes) & 781 & 477 & 2393 & 168 \\
Other monitor memory (Kbytes) & 8.7 & 23.3 & 52.3 & 17.5 \\
Total monitor memory (Kbytes) & 790 & 500 & 2445 & 186 \\
Monitor fraction (\% memory allocated) & 4 & 17 & 28 & 1 \\
\hline
Data file size (Kbytes) & 4.5 & 8.1 & 28.6 & 9.6 \\
\hline
\end{tabular}
\caption{Storage Consumption of {\tt mprof}}
\label{mem:tab}
\end{center}
\end{singlespace}
\end{table}

The memory overhead of {\tt mprof} is small, except that an additional
8 bytes of storage are allocated with every object.  In programs in
which many small objects are allocated, like {\tt epoxy}, {\tt mprof}
can contribute significantly to the total memory allocated.
Nevertheless, in the worst case, {\tt mprof} increases application
size by 1/3, and since {\tt mprof} is a development tool, this
overhead seems acceptable.  From the table we also see that the data
files created by {\tt mprof} are quite small ($<$ 30 Kbytes).
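As a rough consistency check (this arithmetic is our own illustration,
not a separate measurement), the 8 bytes added to each object account
for essentially all of the per-object memory reported for {\tt epoxy}
in Table~\ref{mem:tab}:
\[
306{,}295 \mbox{ allocations} \times 8 \mbox{ bytes}
  \approx 2393 \mbox{ Kbytes}.
\]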
\section{Related Work}
\label{related work}

{\tt Mprof} is similar to the tool {\tt gprof}~\cite{graham83:gprof},
a dynamic execution profiler.  Because some of the problems of
interpreting the dynamic call graph are the same, we have borrowed
these ideas from {\tt gprof}.  Also, we have used ideas from the user
interface of {\tt gprof} for two reasons: because the information
being displayed is quite similar and because users familiar with {\tt
gprof} would probably also be interested in {\tt mprof} and would
benefit from a similar presentation.

Barach, Taenzer, and Wells developed a tool for finding storage
allocation errors in C programs~\cite{barach82:memprof}.  Their
approach concentrated on finding two specific storage allocation
errors: memory leaks and duplicate frees.  They modified {\tt malloc}
and {\tt free} so that every time that these functions were called,
information about the memory block being manipulated was recorded in a
file.  A program that examines this file, {\tt prleak}, prints out
which memory blocks were never freed or were freed twice.  This
approach differs from {\tt mprof} in two ways.  First, {\tt mprof}
provides more information about the memory allocation of programs than
{\tt prleak}, which just reports on storage errors.  Second, {\tt
prleak} generates extremely large intermediate files that are
comparable in size to the total amount of memory allocated by the
program (often megabytes of data).  Although {\tt mprof} records more
useful information, the data files it generates are of modest size
(see Table~\ref{mem:tab}).

\section{Conclusions}
\label{conclusions}

We have implemented a memory allocation profiling program for both C
and Common Lisp.  Our example has shown that {\tt mprof} can be
effective in elucidating the allocation behavior of a program so that
a programmer can detect memory leaks and identify major sources of
allocation.

Unlike {\tt gprof}, {\tt mprof} records every caller in the call chain
every time an object is allocated.  The overhead for this recording is
large but not impractically so, because we take advantage of the fact
that a call chain changes little between allocations.  Moreover,
recording this information does not require large amounts of memory
because there are relatively few unique caller/callee address pairs on
call chains in which allocation takes place, even in very large
programs.  We have measured the overhead of {\tt mprof}, and find that
typically it slows applications by a factor of 2--4 and adds up to
33\% to the memory allocated by the application.  Because {\tt mprof}
is intended as a development tool, these costs are acceptable.

Because {\tt mprof} merges cycles caused by recursive function calls,
{\tt mprof} may be ineffective for programs with large cycles in their
call graph.  Only with more data will we be able to decide if many
programs (especially those written in Lisp) contain so many recursive
calls that cycle merging makes {\tt mprof} ineffective.  Nevertheless,
{\tt mprof} has already been effective in detecting KCL system
functions that allocate memory extraneously.\footnote{Using {\tt
mprof}, we noted that for a large object-oriented program written in
KCL, the system function {\tt every} accounted for 13\% of the memory
allocated.  We rewrote {\tt every} so it would not allocate any
memory, and decreased the memory consumption of the program by 13\%.}

As a final note, we have received feedback from C application
programmers who have used {\tt mprof}.  They report that the memory
leak table and the allocation bin table are both extremely useful,
while the direct allocation table and the allocation call graph are
harder to understand and also less useful.  Considering the execution
overhead associated with the allocation call graph and the complexity
of the table, it is questionable whether the allocation call graph
will ever be as helpful to C programmers as the memory leak table.  On
the other hand, with automatic storage reclamation, the memory leak
table becomes unnecessary.  Yet for memory-intensive languages, such
as Lisp, effective use of memory is more important, and tools such as
the allocation call graph might prove very useful.  Because we have
limited feedback from Lisp programmers using {\tt mprof}, we cannot
report their response to this tool.

\bibliography{/spur2/larus/bib/abbrev,lispm,gc,/spur2/larus/bib/optim,spur,misc,lisp}
\bibliographystyle{plain}

\end{document}