todo

FFTW, a collection of fast C routines to compute the Discrete Fourier Transform in one or more dime
The following are a number of ideas for future work that we have
thought of, or which have been suggested to us.  Let us know
(fftw@fftw.org) if you have other proposals, or if there is something
that you want to work on.

* Implement some sort of Prime Factor algorithm (Temperton's?)  (PFA is
now used in the codelets.)

* Try the Winograd blocks for the base cases.  (We now use Rader's
algorithm for prime size codelets.)

* Try on-the-fly generation of twiddle factors, to save space and
cache.  (Done.  However, not yet enabled in the standard distribution.
The codelet generator is capable of generating code that either loads
or computes the twiddle factors, and the FFTW C code supports both
ways.  We do not have enough experimental numbers to determine which
way is faster, however.)

* Since we now have "strided wisdom," it would be nice to take the
stride into account when planning 1D transforms recursively.  We should
eliminate the planner table altogether, and just use the wisdom table
for planning.

* Implement fast DCT and DST codes (cosine and sine transforms);
equivalently, implement fast algorithms for transforms of real/even
and real/odd data.  There are two parts to this: (i) modify the
codelet generator to output hard-coded transforms of small sizes [this
is done], and (ii) figure out & implement a recursive framework for
combining these codelets to achieve transforms of general lengths.
(Once this is done, implement multi-dimensional transforms, etcetera.)

* Implement a library of convolution routines, windowing, filters,
etcetera based on FFTW.  As DSP isn't our field (we are interested in
FFTs for other reasons), this sort of thing is probably best left to
others.  Let us know if you're interested in writing such a thing,
though, and we'll be happy to link to your site and give you feedback.

* Generate multi-dimensional codelets for use in two/three-dimensional
transforms.  (i.e. implement what is sometimes called a "vector-radix"
algorithm.)  There are potential cache benefits to this.

* Take advantage of the vector instructions on the Pentium-III and
forthcoming PowerPC architectures.  (Coming from the old Cray vector
supercomputers and the horrible coding they encouraged, this seems
suspiciously like a giant step backwards in computer architectures...)
We'd like to see better gcc support before we do anything along these
lines, though.

* In rfftw, implement a fast O(n lg n) algorithm for prime sizes and
large prime factors (currently, only the complex FFTW has fast
algorithms for prime sizes).  The basic problem is that we don't know
of any such algorithm specialized for real data; suggestions and/or
references are welcome.

* In the MPI transforms, implement a parallel 1D transform for real
data (i.e. rfftw_mpi).  (Currently, there are only parallel 1D
transforms for complex data in the MPI code.)

* In the MPI transforms, implement more sophisticated (i.e. faster)
in-place and out-of-place transpose routines for the in-process
transposes (used as subroutines by the distributed transpose).  The
current routines are quite simplistic, although it is not clear how
much they hurt performance.
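
As a rough illustration of the last item, one common way to speed up an
out-of-place in-process transpose is to walk the matrix in small square
tiles, so that both the reads and the writes stay within a few cache
lines at a time.  The sketch below is only an assumption about what
"more sophisticated" might look like; it is not the routine used in the
FFTW MPI code.  The name transpose_blocked and the tile size BLOCK are
hypothetical, and a real routine would have to handle complex data and
tune the tile size per machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge length; a real value would be tuned per machine. */
#define BLOCK 16

/* Hypothetical helper, not part of FFTW: out-of-place transpose of a
   rows x cols row-major matrix `in` into the cols x rows row-major
   matrix `out`, processed one BLOCK x BLOCK tile at a time. */
static void transpose_blocked(const double *in, double *out,
                              size_t rows, size_t cols)
{
    assert(in != out); /* out-of-place only */

    for (size_t ib = 0; ib < rows; ib += BLOCK) {
        for (size_t jb = 0; jb < cols; jb += BLOCK) {
            /* Clip the tile at the matrix edges. */
            size_t imax = (ib + BLOCK < rows) ? ib + BLOCK : rows;
            size_t jmax = (jb + BLOCK < cols) ? jb + BLOCK : cols;

            for (size_t i = ib; i < imax; ++i)
                for (size_t j = jb; j < jmax; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
    }
}
```

Whether tiling like this actually helps depends on the matrix sizes and
the memory hierarchy, which is the same open question the item above
raises about the current routines.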