http://www.cs.washington.edu/homes/shuntak/research.html
<HTML>
<HEAD>
<TITLE>Research Summary of Shun-Tak A. Leung</TITLE>
</HEAD>
<BODY>
<H1>Research Summary</H1>
<HR>
<P>My current research centers on compiler-directed program restructuring
techniques to improve the cache locality of array accesses in loops.
Specifically, I am studying the use of <A HREF="#array-restructuring">array
restructuring</A> to optimize for spatial locality and, in parallel
execution, to reduce false sharing.
<P>I have also worked on improving the performance and applicability of
<A HREF="#runtime-parallelization">runtime parallelization</A> in an earlier
project.
<P>My advisor is Prof. <A HREF="http://www.cs.washington.edu/people/faculty/zahorjan.html">John
Zahorjan</A>. For a list of my publications on these and other subjects,
click <A HREF="http://www.cs.washington.edu/homes/shuntak/publications.html">here</A>.
<HR>
<H2><A NAME="array-restructuring">Array Restructuring</A></H2>
<P>My current research focuses on compiler-directed program restructuring
techniques to improve cache locality. Specifically, I am studying the use
of array restructuring to optimize array accesses in loops. My work
combines the development of algorithms within a formal framework,
implementation of a prototype compiler, and extensive experimentation
with benchmark loops. It shows how array restructuring can be applied
automatically and efficiently to a wide class of applications.
<P>Array restructuring is an approach to enhancing the locality of array
accesses in loops. These accesses are targeted because they account for a
major portion of the memory traffic in many array-based scientific
computations.
Moreover, since they are typically executed many times, the effort spent
on optimizing a few of them in the program text can yield huge benefits
in execution performance.
<P>Under the array restructuring approach, the compiler analyzes how each
array is accessed and lays out the array appropriately according to the
access pattern. As a trivial example, if a two-dimensional array is
accessed by rows, the compiler may decide to store it in row-major order,
whereas if it is accessed by columns, the compiler would choose
column-major storage. In contrast, traditionally the storage order is fixed
for all arrays, forcing programmers concerned about program performance to
write programs in such a way that the data access pattern matches the fixed
data layout as far as possible.
<P>My research in array restructuring is motivated in part by the
observation that array restructuring in many ways complements loop
restructuring --- an alternative approach that changes the execution order
of loop iterations rather than the storage order of array elements --- but
has received much less attention. For example, array restructuring can be
easily applied to complicated loops that may hamper many automatic loop
restructuring techniques. Also, array restructuring can improve spatial
locality without jeopardizing temporal locality, whereas loop restructuring
affects both types of locality. However, while loop restructuring has been
widely studied, relatively little is known about how to apply array
restructuring automatically and efficiently.
<P>My research shows how array restructuring can be applied automatically
and efficiently to a wide class of programs. This not only provides a new
set of techniques to complement existing loop restructuring techniques, but
also produces insights and experience that will, I believe, contribute to
an integrated approach combining the strengths of the two.
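<P>The trivial example above can be sketched with explicit address
arithmetic. The following Python snippet is only an illustration (it is
not code from my prototype); it shows why a column-wise traversal gets
unit-stride addresses only under the matching column-major layout:

```python
# Illustrative sketch: linearized addresses of a 2-D array under the
# two classic storage orders.

def row_major_addr(i, j, n_cols):
    """Address of element (i, j) when rows are stored contiguously."""
    return i * n_cols + j

def col_major_addr(i, j, n_rows):
    """Address of element (i, j) when columns are stored contiguously."""
    return j * n_rows + i

N_ROWS, N_COLS = 4, 4

# A column-wise traversal: the column index j is fixed, i varies.
column_walk = [(i, 0) for i in range(N_ROWS)]

row_major_strides = [
    row_major_addr(i2, j2, N_COLS) - row_major_addr(i1, j1, N_COLS)
    for (i1, j1), (i2, j2) in zip(column_walk, column_walk[1:])
]
col_major_strides = [
    col_major_addr(i2, j2, N_ROWS) - col_major_addr(i1, j1, N_ROWS)
    for (i1, j1), (i2, j2) in zip(column_walk, column_walk[1:])
]

# Under the mismatched (row-major) layout every step jumps a whole row...
assert row_major_strides == [N_COLS] * (N_ROWS - 1)
# ...while the matching (column-major) layout walks consecutive addresses.
assert col_major_strides == [1] * (N_ROWS - 1)
```

With, say, 8-element cache lines, the unit-stride walk brings in a new
line only once per 8 accesses, while the mismatched walk touches a new
line on nearly every access.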
Specifically, my work makes four contributions: a
<A HREF="#framework">framework</A> to represent array transformations,
<A HREF="#algorithms">algorithms</A> to automate array restructuring within
this framework, a <A HREF="#prototype">prototype</A> compiler to implement
these algorithms, and <A HREF="#experiments">experiments</A> to evaluate
their effectiveness.
<H3><A NAME="framework">Framework</A></H3>
<P>I have formulated a framework to represent a general class of array
transformations. In this framework, an array to be restructured (the
"original array") is replaced by another array (the "restructured array")
that contains the same elements but in a different order. Correspondence
between elements of these two arrays is defined by an invertible linear
transformation of their index vectors. In other words, instead of using an
index vector to find an element in the original array, we apply a linear
transformation to that vector and use the result to find the corresponding
element in the restructured array. It may appear that the extra
transformation imposes a significant overhead, but in fact this is not the
case, for the following reason. Traditionally, the memory address of an
array element is a linear function of the indices. This condition is the
basis of most compiler optimizations for reducing indexing overhead.
Applying an extra linear transformation to the index vectors does not
invalidate this condition and therefore does not entail any extra
indexing overhead --- a property essential for the efficiency and thus
viability of array restructuring.
<H3><A NAME="algorithms">Algorithms</A></H3>
<P>I have developed algorithms within this framework for the key steps in
array restructuring. My algorithms solve these problems with simple linear
algebraic techniques in the common case where the array indices are linear
functions of loop variables.
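<P>The index-vector transformation can be sketched as follows. This
Python fragment is purely illustrative (the array shape and the
transformation are chosen arbitrarily, not taken from the prototype);
the point is that composing a linear address function with the linear
map leaves the address linear in the original indices:

```python
# Illustrative sketch: an invertible linear map T sends the index
# vector (i, j) of the original array to the index vector of the
# restructured array.

def apply_T(T, idx):
    """Multiply a 2x2 integer matrix T by an index vector (i, j)."""
    (a, b), (c, d) = T
    i, j = idx
    return (a * i + b * j, c * i + d * j)

# Transposition expressed as a linear index transformation.
T = ((0, 1),
     (1, 0))

N_COLS_NEW = 8  # row length of the restructured array (assumed)

def restructured_addr(i, j):
    """Address, in the restructured array, of the element that the
    original array stored at (i, j). A linear address function composed
    with the linear map T is still linear in (i, j), so indexing
    overhead does not grow."""
    ti, tj = apply_T(T, (i, j))
    return ti * N_COLS_NEW + tj

# Here the composed address function works out to address(i, j) = j*8 + i.
assert restructured_addr(2, 3) == 3 * 8 + 2
# Linearity check: stepping i by one moves the address by a constant stride.
assert all(restructured_addr(i + 1, 0) - restructured_addr(i, 0) == 1
           for i in range(5))
```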
These algorithms have also been adapted to deal with more general access
patterns.
<UL>
<LI>The first step of array restructuring is to analyze the access pattern
of each array and choose a transformation to optimize for locality. To do
this, we can represent each array access as a linear mapping, relate the
access's locality properties to the mapping's mathematical properties, and
select a linear transformation to effect desired changes in the mapping ---
and thus in the access itself. <P>
<LI>Second, we need to compute the set of elements accessed by the loop to
determine which elements the restructured array must contain. This is
achieved by representing loop and array bounds as sets of linear
inequalities or, geometrically, convex polyhedra, which are manipulated
with known mathematical techniques. <P>
<LI>Third, elements of the restructured array have to be laid out in memory
in such a way that each element can be efficiently located given its
indices. This is a non-trivial problem; for example, in the case of
two-dimensional arrays, general array transformations may cause rows of the
restructured array to have different lengths and start at different column
positions, violating the basic assumptions of the traditional way of laying
out array elements. My solution is to apply a further transformation that
reduces the problem to a more traditional form without jeopardizing the
locality improvement achieved by prior transformations. <P>
<LI>Finally, the program code is transformed appropriately. Transformed
array accesses are generated from their linear mapping representations
computed earlier. <P>
</UL>
<H3><A NAME="prototype">Prototype</A></H3>
<P>I have implemented a prototype compiler to perform array restructuring
automatically. It is based on the <A HREF="http://suif.stanford.edu/">SUIF
compiler</A> from Stanford University. SUIF itself comprises a number of
compiler passes over an intermediate program representation.
My implementation consists of an array restructuring pass (about 9,000
lines of C++) added after SUIF's own optimization passes, and a runtime
library (about 2,000 lines of C and C++).
<H3><A NAME="experiments">Results</A></H3>
<P>I have performed a series of experiments using the NASA7 kernels from
the SPEC benchmarks and other loops from the literature. The experiments
were designed to study how array restructuring affects performance for a
range of problem sizes as well as how it compares and interacts with
various loop restructuring techniques. They were carried out on four
different platforms representing two types of machines: single-processor
workstations (an Alpha-based DEC 3000 and a PowerPC-based IBM RS/6000) and
shared-memory multiprocessors (an R8000-based SGI Power Challenge and
Kendall Square Research's KSR2, which has a proprietary processor).
<P>Experimental results have been encouraging. On a DEC 3000 workstation,
array restructuring decreased the execution time of most of the loops by
40-50% (over 80% in some cases), and increased it in none. The average
improvement was 53% (or 31% if loops for which the compiler did not apply
array restructuring at all are included). This held for a wide range of
problem sizes. Results for the IBM RS/6000 were similar. On both platforms,
performance improved because array restructuring led to better spatial
locality. For the same reason, performance on the KSR2 and SGI Power
Challenge showed similar improvements for execution on any number of
processors. Moreover, in several cases where false sharing had existed,
array restructuring removed this performance bottleneck, producing
performance benefits that increased with the number of processors.
<P>Experiments also showed that in both applicability and performance,
array restructuring techniques complemented many common loop restructuring
techniques, including those performed by a production-quality optimizing
compiler from SGI.
It was successfully applied to loops that these techniques could not
automatically transform. It achieved comparable, often better, performance
where both were applicable. In the few cases where it did not perform as
well, simple forms of loop restructuring would have sufficed, again
suggesting that loop and array restructuring are complementary.
<P>More can be found in a
<A HREF="ftp://ftp.cs.washington.edu/tr/1992/10/UW-CSE-95-09-01.PS.Z">technical
report</A>. A more concise version is
<A HREF="http://www.cs.washington.edu/homes/shuntak/abstract.ps">here</A>.
<HR>
<H2><A NAME="runtime-parallelization">Runtime Parallelization</A></H2>
<P>Runtime parallelization is a two-step strategy for parallelizing
loops that may contain substantial parallelism but cannot be parallelized
at compile time because of insufficient dependence information. To
parallelize such a loop, the compiler generates two code fragments: the
inspector and the executor. At run time, the inspector computes a parallel
schedule of the iterations based on dependence information not available at
compile time. The executor then performs the iterations in parallel using
this schedule.
<P>My research in runtime parallelization has touched on both the
inspector and the executor. I have proposed two ways to speed up the
inspector. This work appeared in the Fourth ACM SIGPLAN Symposium on
Principles and Practice of Parallel Programming. The paper is also
available as technical report
<A HREF="ftp://ftp.cs.washington.edu/tr/1992/10/UW-CSE-92-10-05.PS.Z">92-12-05</A>.
I have also studied various forms of the executor to improve its
performance and extend its applicability to complex dependence patterns.
(This research is reported in technical report
<A HREF="ftp://ftp.cs.washington.edu/tr/1992/10/UW-CSE-95-01-08.PS.Z">95-01-08</A>.)
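<P>A minimal sketch of the inspector/executor division of labor, in
Python with a hypothetical loop and invented function names (the
inspectors and executors studied in the papers above are more
sophisticated):

```python
# Hypothetical loop:  for i in range(n): a[idx[i]] += f(i)
# Its dependences hinge on idx, which is unknown until run time.

def inspector(idx, n):
    """Assign each iteration to the earliest 'wavefront' such that no
    two iterations in the same wavefront touch the same element of a."""
    last_wave_for_elem = {}   # element of a -> last wavefront writing it
    schedule = []             # wavefront number of each iteration
    for i in range(n):
        wave = last_wave_for_elem.get(idx[i], -1) + 1
        last_wave_for_elem[idx[i]] = wave
        schedule.append(wave)
    return schedule

def executor(a, idx, schedule, f):
    """Run wavefronts in order; iterations within one wavefront are
    independent and could run in parallel (done sequentially here)."""
    for wave in range(max(schedule) + 1):
        for i, w in enumerate(schedule):
            if w == wave:
                a[idx[i]] += f(i)

idx = [0, 2, 0, 1, 2]
schedule = inspector(idx, len(idx))
# Iterations 0 and 2 both write a[0], so they land in different wavefronts.
assert schedule == [0, 0, 1, 0, 1]

a = [0, 0, 0]
executor(a, idx, schedule, lambda i: i + 1)
assert a == [4, 4, 7]   # a[0] gets iterations 0 and 2; a[1] gets 3; a[2] gets 1 and 4
```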
My experiments on the KSR1, a shared-address-space multiprocessor, show
that false sharing and poor spatial locality could seriously degrade
executor performance. I have proposed and experimentally evaluated two
simple techniques to address these problems by restructuring arrays
according to the parallel execution schedule. (This research is reported
in technical report
<A HREF="ftp://ftp.cs.washington.edu/tr/1992/10/UW-CSE-94-02-01.PS.Z">94-02-01</A>.)
<!-- Standard footer -->
<HR>
<ADDRESS>
<A HREF="http://www.cs.washington.edu/homes/shuntak/">Shun-Tak A. Leung</A><BR>
<A HREF="http://www.cs.washington.edu/">Department of Computer Science &amp; Engineering</A><BR>
<A HREF="http://www.cac.washington.edu:1183/home/the_uw.html">University of Washington</A><BR>
Box 352350<BR>
<A HREF="http://www.cs.washington.edu/area/">Seattle</A>, WA 98195-2350<BR>
<A HREF="mailto:shuntak@cs.washington.edu">Email: shuntak@cs.washington.edu</A><BR>
Fax: (206) 543-2969
</ADDRESS>
<HR>
Last modified: January 8, 1996
</BODY>
</HTML>
<!-- Local Variables: -->
<!-- mode: html -->
<!-- eval: (auto-fill-mode 1) -->
<!-- End: -->