fftw_4.html

来自「FFTW, a collection of fast C routines to」· HTML 代码 · 共 1,105 行 · 第 1/3 页
HTML
1,105 行
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"><HTML><HEAD><!-- This HTML file has been created by texi2html 1.52     from fftw.texi on 24 March 2003 --><TITLE>FFTW - Parallel FFTW</TITLE></HEAD><BODY TEXT="#000000" BGCOLOR="#FFFFFF">Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_3.html">previous</A>, <A HREF="fftw_5.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.<P><HR><P><H1><A NAME="SEC47">Parallel FFTW</A></H1><P><A NAME="IDX201"></A>In this chapter we discuss the use of FFTW in a parallel environment,documenting the different parallel libraries that we have provided.(Users calling FFTW from a multi-threaded program should also consultSection <A HREF="fftw_3.html#SEC46">Thread safety</A>.)  The FFTW package currently contains three paralleltransform implementations that leverage the uniprocessor FFTW code:<UL><LI><A NAME="IDX202"></A>The first set of routines utilizes shared-memory threads for parallelone- and multi-dimensional transforms of both real and complex data.Any program using FFTW can be trivially modified to use themulti-threaded routines.  This code can use any common threadsimplementation, including POSIX threads.  (POSIX threads are availableon most Unix variants, including Linux.)  These routines are located inthe <CODE>threads</CODE> directory, and are documented in Section <A HREF="fftw_4.html#SEC48">Multi-threaded FFTW</A>.<LI><A NAME="IDX203"></A><A NAME="IDX204"></A>The <CODE>mpi</CODE> directory contains multi-dimensional transformsof real and complex data for parallel machines supporting MPI.  It alsoincludes parallel one-dimensional transforms for complex data.  The mainfeature of this code is that it supports distributed-memory transforms,so it runs on everything from workstation clusters to massively-parallelsupercomputers.  More information on MPI can be found at the<A HREF="http://www.mcs.anl.gov/mpi">MPI home page</A>.  The FFTW MPI routinesare documented in Section <A HREF="fftw_4.html#SEC55">MPI FFTW</A>.<LI><A NAME="IDX205"></A>We also have an experimental parallel implementation written in Cilk, aC-like parallel language developed at MIT and currently available forseveral SMP platforms.  For more information on Cilk see<A HREF="http://supertech.lcs.mit.edu/cilk">the Cilk home page</A>.  The FFTWCilk code can be found in the <CODE>cilk</CODE> directory, with parallelizedone- and multi-dimensional transforms of complex data.  The Cilk FFTWroutines are documented in <CODE>cilk/README</CODE>.</UL><H2><A NAME="SEC48">Multi-threaded FFTW</A></H2><P><A NAME="IDX206"></A>In this section we document the parallel FFTW routines for shared-memorythreads on SMP hardware.  These routines, which support parallel one-and multi-dimensional transforms of both real and complex data, are theeasiest way to take advantage of multiple processors with FFTW.  Theywork just like the corresponding uniprocessor transform routines, exceptthat they take the number of parallel threads to use as an extraparameter.  Any program that uses the uniprocessor FFTW can be triviallymodified to use the multi-threaded FFTW.<H3><A NAME="SEC49">Installation and Supported Hardware/Software</A></H3><P>All of the FFTW threads code is located in the <CODE>threads</CODE>subdirectory of the FFTW package.  On Unix systems, the FFTW threadslibraries and header files can be automatically configured, compiled,and installed along with the uniprocessor FFTW libraries simply byincluding <CODE>--enable-threads</CODE> in the flags to the <CODE>configure</CODE>script (see Section <A HREF="fftw_6.html#SEC67">Installation on Unix</A>).  (Note also that the threadsroutines, when enabled, are automatically tested by the <SAMP>`<CODE>makecheck'</SAMP></CODE> self-tests.)<A NAME="IDX207"></A><P>The threads routines require your operating system to have some sort ofshared-memory threads support.  Specifically, the FFTW threads packageworks with POSIX threads (available on most Unix variants, includingLinux), Solaris threads, <A HREF="http://www.be.com">BeOS</A> threads (testedon BeOS DR8.2), Mach C threads (reported to work by users), and Win32threads (reported to work by users).  (There is also untested code touse MacOS MP threads.)  We also support using<A HREF="http://www.openmp.org">OpenMP</A> or SGI MP compiler directives tolaunch threads, enabled by using <CODE>--with-openmp</CODE> or<CODE>--with-sgimp</CODE> in addition to <CODE>--enable-threads</CODE>.  This isespecially useful if you are employing that sort of directive in yourown code, in order to minimize conflicts.  If you have a shared-memorymachine that uses a different threads API, it should be a simple matterof programming to include support for it; see the file<CODE>fftw_threads-int.h</CODE> for more detail.<P>SMP hardware is not required, although of course you need multipleprocessors to get any benefit from the multithreaded transforms.<H3><A NAME="SEC50">Usage of Multi-threaded FFTW</A></H3><P>Here, it is assumed that the reader is already familiar with the usageof the uniprocessor FFTW routines, described elsewhere in this manual.We only describe what one has to change in order to use themulti-threaded routines.<P>First, instead of including <CODE>&#60;fftw.h&#62;</CODE> or <CODE>&#60;rfftw.h&#62;</CODE>, youshould include the files <CODE>&#60;fftw_threads.h&#62;</CODE> or<CODE>&#60;rfftw_threads.h&#62;</CODE>, respectively.<P>Second, before calling any FFTW routines, you should call the function:<PRE>int fftw_threads_init(void);</PRE><P><A NAME="IDX208"></A><P>This function, which should only be called once (probably in your<CODE>main()</CODE> function), performs any one-time initialization requiredto use threads on your system.  It returns zero if successful, and anon-zero value if there was an error (in which case, something isseriously wrong and you should probably exit the program).<P>Third, when you want to actually compute the transform, you should useone of the following transform routines instead of the ordinary FFTWfunctions:<PRE>fftw_threads(nthreads, plan, howmany, in, istride,              idist, out, ostride, odist);<A NAME="IDX209"></A>fftw_threads_one(nthreads, plan, in, out);<A NAME="IDX210"></A>fftwnd_threads(nthreads, plan, howmany, in, istride,               idist, out, ostride, odist);<A NAME="IDX211"></A>fftwnd_threads_one(nthreads, plan, in, out);<A NAME="IDX212"></A>rfftw_threads(nthreads, plan, howmany, in, istride,               idist, out, ostride, odist);<A NAME="IDX213"></A>rfftw_threads_one(nthreads, plan, in, out);<A NAME="IDX214"></A>rfftwnd_threads_real_to_complex(nthreads, plan, howmany, in,                                 istride, idist, out, ostride, odist);<A NAME="IDX215"></A>rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);<A NAME="IDX216"></A>rfftwnd_threads_complex_to_real(nthreads, plan, howmany, in,                                istride, idist, out, ostride, odist);<A NAME="IDX217"></A>rfftwnd_threads_one_real_to_complex(nthreads, plan, in, out);<A NAME="IDX218"></A>rfftwnd_threads_one_complex_to_real(nthreads, plan, in, out);<A NAME="IDX219"></A></PRE><P>All of these routines take exactly the same arguments and have exactlythe same effects as their uniprocessor counterparts (i.e. without the<SAMP>`<CODE>_threads</CODE>'</SAMP>) <EM>except</EM> that they take one extraparameter, <CODE>nthreads</CODE> (of type <CODE>int</CODE>), before the normalparameters.<A NAME="DOCF5" HREF="fftw_foot.html#FOOT5">(5)</A>  The <CODE>nthreads</CODE>parameter specifies the number of threads of execution to use whenperforming the transform (actually, the maximum number of threads).<A NAME="IDX220"></A><P>For example, to parallelize a single one-dimensional transform ofcomplex data, instead of calling the uniprocessor <CODE>fftw_one(plan,in, out)</CODE>, you would call <CODE>fftw_threads_one(nthreads, plan, in,out)</CODE>.  Passing an <CODE>nthreads</CODE> of <CODE>1</CODE> means to use only onethread (the main thread), and is equivalent to calling the uniprocessorroutine.  Passing an <CODE>nthreads</CODE> of <CODE>2</CODE> means that thetransform is potentially parallelized over two threads (and twoprocessors, if you have them), and so on.<P>These are the only changes you need to make to your source code.  Callsto all other FFTW routines (plan creation, destruction, wisdom,etcetera) are not parallelized and remain the same.  (The same plans andwisdom are used by both uniprocessor and multi-threaded transforms.)Your arrays are allocated and formatted in the same way, and so on.<P>Programs using the parallel complex transforms should be linked with<CODE>-lfftw_threads -lfftw -lm</CODE> on Unix.  Programs using the parallelreal transforms should be linked with <CODE>-lrfftw_threads-lfftw_threads -lrfftw -lfftw -lm</CODE>.  You will also need to link withwhatever library is responsible for threads on your system(e.g. <CODE>-lpthread</CODE> on Linux).<A NAME="IDX221"></A><H3><A NAME="SEC51">How Many Threads to Use?</A></H3><P><A NAME="IDX222"></A>There is a fair amount of overhead involved in spawning and synchronizingthreads, so the optimal number of threads to use depends upon the sizeof the transform as well as on the number of processors you have.<P>As a general rule, you don't want to use more threads than you haveprocessors.  (Using more threads will work, but there will be extraoverhead with no benefit.)  In fact, if the problem size is too small,you may want to use fewer threads than you have processors.<P>You will have to experiment with your system to see what level ofparallelization is best for your problem size.  Useful tools to help youdo this are the test programs that are automatically compiled along withthe threads libraries, <CODE>fftw_threads_test</CODE> and<CODE>rfftw_threads_test</CODE> (in the <CODE>threads</CODE> subdirectory).  These<A NAME="IDX223"></A><A NAME="IDX224"></A>take the same arguments as the other FFTW test programs (see<CODE>tests/README</CODE>), except that they also take the number of threadsto use as a first argument, and report the parallel speedup in speedtests.  For example,<PRE>fftw_threads_test 2 -s 128x128</PRE><P>will benchmark complex 128x128 transforms using two threads and reportthe speedup relative to the uniprocessor transform.<A NAME="IDX225"></A><P>For instance, on a 4-processor 200MHz Pentium Pro system running Linux2.2.0, we found that the "crossover" point at which 2 threads becamebeneficial for complex transforms was about 4k points, while 4 threadsbecame beneficial at 8k points.<H3><A NAME="SEC52">Using Multi-threaded FFTW in a Multi-threaded Program</A></H3><P><A NAME="IDX226"></A>It is perfectly possible to use the multi-threaded FFTW routines from amulti-threaded program (e.g. have multiple threads computingmulti-threaded transforms simultaneously).  If you have the processors,more power to you!  However, the same restrictions apply as for theuniprocessor FFTW routines (see Section <A HREF="fftw_3.html#SEC46">Thread safety</A>).  In particular, youshould recall that you may not create or destroy plans in parallel.<H3><A NAME="SEC53">Tips for Optimal Threading</A></H3><P>Not all transforms are equally well-parallelized by the multi-threadedFFTW routines.  (This is merely a consequence of laziness on the part ofthe implementors, and is not inherent to the algorithms employed.)Mainly, the limitations are in the parallel one-dimensional transforms.The things to avoid if you want optimal parallelization are as follows:<H3><A NAME="SEC54">Parallelization deficiencies in one-dimensional transforms</A></H3><UL><LI>Large prime factors can sometimes parallelize poorly.  Of course, youshould avoid these anyway if you want high performance.<LI><A NAME="IDX227"></A>Single in-place transforms don't parallelize completely.  (Multiplein-place transforms, i.e. <CODE>howmany &#62; 1</CODE>, are fine.)  Again, youshould avoid these in any case if you want high performance, as theyrequire transforming to a scratch array and copying back.<LI>Single real-complex (<CODE>rfftw</CODE>) transforms don't parallelizecompletely.  This is unfortunate, but parallelizing this correctly wouldhave involved a lot of extra code (and a much larger library).  Youstill get some benefit from additional processors, but if you have avery large number of processors you will probably be better off usingthe parallel complex (<CODE>fftw</CODE>) transforms.  Note thatmulti-dimensional real transforms or multiple one-dimensional realtransforms are fine.</UL><H2><A NAME="SEC55">MPI FFTW</A></H2><P><A NAME="IDX228"></A>This section describes the MPI FFTW routines for distributed-memory (andshared-memory) machines supporting MPI (Message Passing Interface).  TheMPI routines are significantly different from the ordinary FFTW becausethe transform data here are <EM>distributed</EM> over multiple processes,so that each process gets only a portion of the array.<A NAME="IDX229"></A>Currently, multi-dimensional transforms of both real and complex data,as well as one-dimensional transforms of complex data, are supported.<H3><A NAME="SEC56">MPI FFTW Installation</A></H3><P>The FFTW MPI library code is all located in the <CODE>mpi</CODE> subdirectoyof the FFTW package (along with source code for test programs).  On Unixsystems, the FFTW MPI libraries and header files can be automaticallyconfigured, compiled, and installed along with the uniprocessor FFTWlibraries simply by including <CODE>--enable-mpi</CODE> in the flags to the<CODE>configure</CODE> script (see Section <A HREF="fftw_6.html#SEC67">Installation on Unix</A>).
fftw_4.html - 源码说明

本页面展示了「FFTW, a collection of fast C routines to compute the Discrete Fourier Transform in one or more dime」中的 fftw_4.html 源码文件，采用 HTML 编程语言编写，共 1,105 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与collection相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?