<H2><A NAME="SEC69">Installing FFTW in both single and double precision</A></H2>
<P><A NAME="IDX327"></A>It is often useful to install both single- and double-precision versions of the FFTW libraries on the same machine, and we provide a convenient mechanism for achieving this on Unix systems.
<P><A NAME="IDX328"></A>When the <CODE>--enable-type-prefix</CODE> option of configure is used, the FFTW libraries and header files are installed with a prefix of <SAMP>`d'</SAMP> or <SAMP>`s'</SAMP>, depending upon whether you compiled in double or single precision. Then, instead of linking your program with <CODE>-lrfftw -lfftw</CODE>, for example, you would link with <CODE>-ldrfftw -ldfftw</CODE> to use the double-precision version or with <CODE>-lsrfftw -lsfftw</CODE> to use the single-precision version. Also, you would <CODE>#include</CODE> <CODE>&lt;drfftw.h&gt;</CODE> or <CODE>&lt;srfftw.h&gt;</CODE> instead of <CODE>&lt;rfftw.h&gt;</CODE>, and so on.
<P><EM>The names of FFTW functions, data types, and constants remain unchanged!</EM> You still call, for instance, <CODE>fftw_one</CODE> and not <CODE>dfftw_one</CODE>. Only the names of header files and libraries are modified.
One consequence of this is that <EM>you <B>cannot</B> use both the single- and double-precision FFTW libraries in the same program, simultaneously,</EM> as the function names would conflict.
<P>So, to install both the single- and double-precision libraries on the same machine, you would do:
<PRE>
./configure --enable-type-prefix <I>[ other options ]</I>
make
make install
make clean
./configure --enable-float --enable-type-prefix <I>[ other options ]</I>
make
make install
</PRE>
<H2><A NAME="SEC70"><CODE>gcc</CODE> and Pentium hacks</A></H2>
<P><A NAME="IDX329"></A>The <CODE>configure</CODE> option <CODE>--enable-i386-hacks</CODE> enables specific optimizations for the Pentium and later x86 CPUs under gcc, which can significantly improve performance of double-precision transforms. Specifically, we have tested these hacks on Linux with <CODE>gcc</CODE> 2.[789] and versions of <CODE>egcs</CODE> since 1.0.3. These optimizations affect only the performance and not the correctness of FFTW (i.e. it is always safe to try them out).
<P>These hacks provide a workaround to the incorrect alignment of local <CODE>double</CODE> variables in <CODE>gcc</CODE>. The compiler aligns these <A NAME="IDX330"></A>variables to multiples of 4 bytes, but execution is much faster (on Pentium and PentiumPro) if <CODE>double</CODE>s are aligned to a multiple of 8 bytes. By carefully counting the number of variables allocated by the compiler in performance-critical regions of the code, we have been able to introduce dummy allocations (using <CODE>alloca</CODE>) that align the stack properly. The hack depends crucially on the compiler flags that are used.
For example, it won't work without <CODE>-fomit-frame-pointer</CODE>.
<P>In principle, these hacks are no longer required under <CODE>gcc</CODE> versions 2.95 and later, which automatically align the stack correctly (see <CODE>-mpreferred-stack-boundary</CODE> in the <CODE>gcc</CODE> manual). However, we have encountered a <A HREF="http://egcs.cygnus.com/ml/gcc-bugs/1999-11/msg00259.html">bug</A> in the stack alignment of versions 2.95.[012] that causes FFTW's stack to be misaligned under some circumstances. The <CODE>configure</CODE> script automatically detects this bug and disables <CODE>gcc</CODE>'s stack alignment in favor of our own hacks when <CODE>--enable-i386-hacks</CODE> is used.
<P>The <CODE>fftw_test</CODE> program outputs speed measurements that you can use to see if these hacks are beneficial.
<A NAME="IDX331"></A><A NAME="IDX332"></A>
<P>The <CODE>configure</CODE> option <CODE>--enable-pentium-timer</CODE> enables the use of the Pentium and PentiumPro cycle counter for timing purposes. In order to get correct results, you must define <CODE>FFTW_CYCLES_PER_SEC</CODE> in <CODE>fftw/config.h</CODE> to be the clock speed of your processor; the resulting FFTW library will be nonportable. The use of this option is deprecated. On serious operating systems (such as Linux), FFTW uses <CODE>gettimeofday()</CODE>, which has enough resolution and is portable. (Note that Win32 has its own high-resolution timing routines as well. FFTW contains unsupported code to use these routines.)
<H2><A NAME="SEC71">Customizing the timer</A></H2>
<P><A NAME="IDX333"></A>
<P>FFTW needs a reasonably precise clock in order to find the optimal way to compute a transform. On Unix systems, <CODE>configure</CODE> looks for <CODE>gettimeofday</CODE> and other system-specific timers.
If it does not find any high-resolution clock, it defaults to using the <CODE>clock()</CODE> function, which is very portable, but forces FFTW to run for a long time in order to get reliable measurements.
<A NAME="IDX334"></A><A NAME="IDX335"></A>
<P>If your machine supports a high-resolution clock not recognized by FFTW, it is therefore advisable to use it. You must edit <CODE>fftw/fftw-int.h</CODE>. There are a few macros you must redefine. The code is documented and should be self-explanatory. (By the way, <CODE>fftw-int</CODE> stands for <CODE>fftw-internal</CODE>, but for some inexplicable reason people are still using primitive systems with 8.3 filenames.)
<P>Even if you don't install high-resolution timing code, we still recommend that you look at the <CODE>FFTW_TIME_MIN</CODE> constant in <A NAME="IDX336"></A><CODE>fftw/fftw-int.h</CODE>. This constant holds the minimum time interval (in seconds) required to get accurate timing measurements, and should be (at least) several hundred times the resolution of your clock. The default constants are on the conservative side, and may cause FFTW to take longer than necessary when you create a plan. Set <CODE>FFTW_TIME_MIN</CODE> to whatever is appropriate on your system. Be sure to set the <EM>right</EM> <CODE>FFTW_TIME_MIN</CODE>: there are several definitions in <CODE>fftw-int.h</CODE>, corresponding to different platforms and timers.
<P>As an aid in checking the resolution of your clock, you can use the <CODE>tests/fftw_test</CODE> program with the <CODE>-t</CODE> option (cf. <CODE>tests/README</CODE>).
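<P>As a rough cross-check of the "several hundred times the resolution" rule, the following is a minimal sketch (not FFTW's own timing code) that estimates the smallest step a <CODE>gettimeofday</CODE>-based clock actually reports. The function names <CODE>elapsed_seconds</CODE>, <CODE>observed_clock_step</CODE>, and <CODE>candidate_time_min</CODE> are hypothetical, and the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>   /* gettimeofday (POSIX) */

/* Seconds elapsed from *t0 to *t1, computed field by field so that
 * microsecond differences are not lost to rounding. */
double elapsed_seconds(const struct timeval *t0, const struct timeval *t1)
{
    return (double) (t1->tv_sec - t0->tv_sec)
         + 1.0e-6 * (double) (t1->tv_usec - t0->tv_usec);
}

/* Estimate the clock's granularity: spin until the reported time
 * advances, and keep the smallest positive step seen over `samples`
 * attempts.  The result is an upper bound on the true resolution. */
double observed_clock_step(int samples)
{
    double best = 0.0;
    int i;

    assert(samples > 0);
    for (i = 0; i < samples; ++i) {
        struct timeval t0, t1;
        double dt;

        gettimeofday(&t0, NULL);
        do {
            gettimeofday(&t1, NULL);
            dt = elapsed_seconds(&t0, &t1);
        } while (dt <= 0.0);

        if (best == 0.0 || dt < best)
            best = dt;
    }
    return best;
}

/* Hypothetical helper: a candidate minimum timing interval in seconds,
 * `factor` times the observed step ("several hundred" in the text). */
double candidate_time_min(double step, double factor)
{
    return step * factor;
}
```

<P>Calling <CODE>candidate_time_min(observed_clock_step(100), 300.0)</CODE> from a small <CODE>main</CODE> gives a value to compare against the <CODE>FFTW_TIME_MIN</CODE> definition for your platform; the factor of 300 is only an illustrative choice for "several hundred".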
Remember, the mere fact that your clock reports times in, say, picoseconds, does not mean that it is actually <EM>accurate</EM> to that resolution.
<H2><A NAME="SEC72">Generating your own code</A></H2>
<P><A NAME="IDX337"></A><A NAME="IDX338"></A><A NAME="IDX339"></A>
<P>If you know that you will only use transforms of a certain size (say, powers of 2) and want to reduce the size of the library, you can reconfigure FFTW to support only those sizes you are interested in. You may even generate code to enable efficient transforms of a size not supported by the default distribution. The default distribution supports transforms of any size, but not all sizes are equally fast. The default installation of FFTW is best at handling sizes of the form 2<SUP>a</SUP> 3<SUP>b</SUP> 5<SUP>c</SUP> 7<SUP>d</SUP> 11<SUP>e</SUP> 13<SUP>f</SUP>, where e+f is either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose routine. However, if you have an application that requires fast transforms of size, say, <CODE>17</CODE>, there is a way to generate specialized code to handle that.
<P>The directory <CODE>gensrc</CODE> contains all the programs and scripts that were used to generate FFTW. In particular, the program <CODE>gensrc/genfft.ml</CODE> was used to generate the code that FFTW uses to compute the transforms. We do not expect casual users to use it. <CODE>genfft</CODE> is a rather sophisticated program that generates directed acyclic graphs of FFT algorithms and performs algebraic simplifications on them.
<CODE>genfft</CODE> is written in Objective Caml, a dialect of ML. Objective Caml is described at <A HREF="http://pauillac.inria.fr/ocaml/">http://pauillac.inria.fr/ocaml/</A> and can be downloaded from <A HREF="ftp://ftp.inria.fr/lang/caml-light">ftp://ftp.inria.fr/lang/caml-light</A>.
<A NAME="IDX340"></A><A NAME="IDX341"></A>
<P>If you have Objective Caml installed, you can type <CODE>sh bootstrap.sh</CODE> in the top-level directory to re-generate the files. If you change the <CODE>gensrc/config</CODE> file, you can optimize FFTW for sizes that are not currently supported efficiently (say, 17 or 19).
<P>We do not provide more details about the code-generation process, since we do not expect that users will need to generate their own code. However, feel free to contact us at <A HREF="mailto:fftw@fftw.org">fftw@fftw.org</A> if you are interested in the subject.
<P><A NAME="IDX342"></A>You might find it interesting to learn Caml and/or some modern programming techniques that we used in the generator (including monadic programming), especially if you heard the rumor that Java and object-oriented programming are the latest advancement in the field. The internal operation of the codelet generator is described in the paper, "A Fast Fourier Transform Compiler," by M. Frigo, which is available from the <A HREF="http://www.fftw.org">FFTW home page</A> and will appear in the <CITE>Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)</CITE>.
<P><HR><P>Go to the <A HREF="fftw_1.html">first</A>, <A HREF="fftw_5.html">previous</A>, <A HREF="fftw_7.html">next</A>, <A HREF="fftw_10.html">last</A> section, <A HREF="fftw_toc.html">table of contents</A>.</BODY></HTML>