📄 news
字号:
FFTW 3.2.1
* Performance improvements for some multidimensional r2c/c2r transforms;
thanks to Eugene Miloslavsky for his benchmark reports.
* Compile with icc on MacOS X, use better icc compiler flags.
* Compilation fixes for systems where snprintf is defined as a macro;
thanks to Marcus Mae for the bug report.
* Fortran documentation now recommends not using dfftw_execute,
because of reports of problems with various Fortran compilers;
it is better to use dfftw_execute_dft etcetera.
* Some documentation clarifications, e.g. of fact that --enable-openmp
and --enable-threads are mutually exclusive (thanks to Long To),
and document slightly odd behavior of plan_guru_r2r in Fortran
(thanks to Alexander Pozdneev).
* FAQ was accidentally omitted from 3.2 tarball.
* Remove some extraneous (harmless) files accidentally included in
a subdirectory of the 3.2 tarball.
FFTW 3.2
* Worked around apparent glibc bug that leads to rare hangs when freeing
semaphores.
* Fixed segfault due to unaligned access in certain obscure problems
that use SSE and multiple threads.
* MPI transforms not included, as they are still in alpha; the alpha
versions of the MPI transforms have been moved to FFTW 3.3alpha1.
FFTW 3.2alpha3
* Performance improvements for sizes with factors of 5 and 10.
* Documented FFTW_WISDOM_ONLY flag, at the suggestion of Mario
Emmenlauer and Phil Dumont.
* Port Cell code to SDK2.1 (libspe2), as opposed to the old libspe1 code.
* Performance improvements in Cell code for N < 32k, thanks to Jan Wagner
for the suggestions.
* Cycle counter for Sun x86_64 compiler, and compilation fix in cycle
counter for AIX/xlc (thanks to Jeff Haferman for the bug report).
* Fixed incorrect type prefix in MPI code that prevented wisdom routines
from working in single precision (thanks to Eric A. Borisch for the report).
* Added 'make check' for MPI code (which still fails in a couple corner
cases, but should be much better than in alpha2).
* Many other small fixes.
FFTW 3.2alpha2
* Support for the Cell processor, donated by IBM Research; see README.Cell
and the Cell section of the manual.
* New 64-bit API: for every "plan_guru" function there is a new "plan_guru64"
function with the same semantics, but which takes fftw_iodim64 instead of
fftw_iodim. fftw_iodim64 is the same as fftw_iodim, except that it takes
ptrdiff_t integer types as parameters, which is a 64-bit type on
64-bit machines. This is only useful for specifying very large transforms
on 64-bit machines. (Internally, FFTW uses ptrdiff_t everywhere
regardless of what API you choose.)
* Experimental MPI support. Complex one- and multi-dimensional FFTs,
multi-dimensional r2r, multi-dimensional r2c/c2r transforms, and
distributed transpose operations, with 1d block distributions.
(This is an alpha preview: routines have not been exhaustively
tested, documentation is incomplete, and some functionality is
missing, e.g. Fortran support.) See mpi/README and also the MPI
section of the manual.
* Significantly faster r2c/c2r transforms, especially on machines with SIMD.
* Rewritten multi-threaded support for better performance by
re-using a fixed pool of threads rather than continually
respawning and joining (which nowadays is much slower).
* Support for MIPS paired-single SIMD instructions, donated by
Codesourcery.
* FFTW_WISDOM_ONLY planner flag, to create plan only if wisdom is
available and return NULL otherwise.
* Removed k7 support, which only worked in 32-bit mode and is
becoming obsolete. Use --enable-sse instead.
* Added --with-g77-wrappers configure option to force inclusion
of g77 wrappers, in addition to whatever is needed for the
detected Fortran compilers. This is mainly intended for GNU/Linux
distros switching to gfortran that wish to include both
gfortran and g77 support in FFTW.
* In manual, renamed "guru execute" functions to "new-array execute"
functions, to reduce confusion with the guru planner interface.
(The programming interface is unchanged.)
* Add missing __declspec attribute to threads API functions when compiling
for Windows; thanks to Robert O. Morris for the bug report.
* Fixed missing return value from dfftw_init_threads in Fortran;
thanks to Markus Wetzstein for the bug report.
FFTW 3.1.1
* Performance improvements for Intel EMT64.
* Performance improvements for large-size transforms with SIMD.
* Cycle counter support for Intel icc and Visual C++ on x86-64.
* In fftw-wisdom tool, replaced obsolete --impatient with --measure.
* Fixed compilation failure with AIX/xlc; thanks to Joseph Thomas.
* Windows DLL support for Fortran API (added missing __declspec(dllexport)).
* SSE/SSE2 code works properly (i.e. disables itself) on older 386 and 486
CPUs lacking a CPUID instruction; thanks to Eric Korpela.
FFTW 3.1
* Faster FFTW_ESTIMATE planner.
* New (faster) algorithm for REDFT00/RODFT00 (type-I DCT/DST) of odd size.
* "4-step" algorithm for faster FFTs of very large sizes (> 2^18).
* Faster in-place real-data DFTs (for R2HC and HC2R r2r formats).
* Faster in-place non-square transpositions (FFTW uses these internally
for in-place FFTs, and you can also perform them explicitly using
the guru interface).
* Faster prime-size DFTs: implemented Bluestein's algorithm, as well
as a zero-padded Rader variant to limit recursive use of Rader's algorithm.
* SIMD support for split complex arrays.
* Much faster Altivec/VMX performance.
* New fftw_set_timelimit function to specify a (rough) upper bound to the
planning time (does not affect ESTIMATE mode).
* Removed --enable-3dnow support; use --enable-k7 instead.
* FMA (fused multiply-add) version is now included in "standard" FFTW,
and is enabled with --enable-fma (the default on PowerPC and Itanium).
* Automatic detection of native architecture flag for gcc. New
configure options: --enable-portable-binary and --with-gcc-arch=<arch>,
for people distributing compiled binaries of FFTW (see manual).
* Automatic detection of Altivec under Linux with gcc 3.4 (so that
same binary should work on both Altivec and non-Altivec PowerPCs).
* Compiler-specific tweaks/flags/workarounds for gcc 3.4, xlc, HP/UX,
Solaris/Intel.
* Various documentation clarifications.
* 64-bit clean. (Fixes a bug affecting the split guru planner on
64-bit machines, reported by David Necas.)
* Fixed Debian bug #259612: inadvertent use of SSE instructions on
non-SSE machines (causing a crash) for --enable-sse binaries.
* Fixed bug that caused HC2R transforms to destroy the input in
certain cases, even if the user specified FFTW_PRESERVE_INPUT.
* Fixed bug where wisdom would be lost under rare circumstances,
causing excessive planning time.
* FAQ notes bug in gcc-3.4.[1-3] that causes FFTW to crash with SSE/SSE2.
* Fixed accidentally exported symbol that prohibited simultaneous
linking to double/single multithreaded FFTW (thanks to Alessio Massaro).
* Support Win32 threads under MinGW (thanks to Alessio Massaro).
* Fixed problem with building DLL under Cygwin; thanks to Stephane Fillod.
* Fix build failure if no Fortran compiler is found (thanks to Charles
Radley for the bug report).
* Fixed compilation failure with icc 8.0 and SSE/SSE2. Automatic
detection of icc architecture flag (e.g. -xW).
* Fixed compilation with OpenMP on AIX (thanks to Greg Bauer).
* Fixed compilation failure on x86-64 with gcc (thanks to Orion Poplawski).
* Incorporated patch from FreeBSD ports (FreeBSD does not have memalign,
but its malloc is 16-byte aligned).
* Cycle-counter compilation fixes for Itanium, Alpha, x86-64, Sparc,
MacOS (thanks to Matt Boman, John Bowman, and James A. Treacy for
reports/fixes). Added x86-64 cycle counter for PGI compilers,
courtesy Cristiano Calonaci.
* Fix compilation problem in test program due to C99 conflict.
* Portability fix for import_system_wisdom with djgpp (thanks to Juan
Manuel Guerrero).
* Fixed compilation failure on MacOS 10.3 due to getopt conflict.
* Work around Visual C++ (version 6/7) bug in SSE compilation;
thanks to Eddie Yee for his detailed report.
Changes from FFTW 3.1 beta 2:
* Several minor compilation fixes.
* Eliminate FFTW_TIMELIMIT flag and replace fftw_timelimit global with
fftw_set_timelimit function. Make wisdom work with time-limited plans.
Changes from FFTW 3.1 beta 1:
* Fixes for creating DLLs under Windows; thanks to John Pavel for his feedback.
* Fixed more 64-bit problems, thanks to John Pavel for the bug report.
* Further speed improvements for Altivec/VMX.
* Further speed improvements for non-square transpositions.
* Many minor tweaks.
FFTW 3.0.1
* Some speed improvements in SIMD code.
* --without-cycle-counter option is removed. If no cycle counter is found,
then the estimator is always used. A --with-slow-timer option is provided
to force the use of lower-resolution timers.
* Several fixes for compilation under Visual C++, with help from Stefane Ruel.
* Added x86 cycle counter for Visual C++, with help from Morten Nissov.
* Added S390 cycle counter, courtesy of James Treacy.
* Added missing static keyword that prevented simultaneous linkage
of different-precision versions; thanks to Rasmus Larsen for the bug report.
* Corrected accidental omission of f77_wisdom.f file; thanks to Alan Watson.
* Support -xopenmp flag for SunOS; thanks to John Lou for the bug report.
* Compilation with HP/UX cc requires -Wp,-H128000 flag to increase
preprocessor limits; thanks to Peter Vouras for the bug report.
* Removed non-portable use of 'tempfile' in fftw-wisdom-to-conf script;
thanks to Nicolas Decoster for the patch.
* Added 'make smallcheck' target in tests/ directory, at the request of
James Treacy.
FFTW 3.0
Major goals of this release:
* Speed: often 20% or more faster than FFTW 2.x, even without SIMD (see below).
* Complete rewrite, to make it easier to add new algorithms and transforms.
* New API, to support more general semantics.
Other enhancements:
* SIMD acceleration on supporting CPUs (SSE, SSE2, 3DNow!, and AltiVec).
(With special thanks to Franz Franchetti for many experimental prototypes
and to Stefan Kral for the vectorizing generator from fftwgel.)
* True in-place 1d transforms of large sizes (as well as compressed
twiddle tables for additional memory/cache savings).
* More arbitrary placement of real & imaginary data, e.g. including
interleaved (as in FFTW 2.x) as well as separate real/imag arrays.
* Efficient prime-size transforms of real data.
* Multidimensional transforms can operate on a subset of a larger matrix,
and/or transform selected dimensions of a multidimensional array.
* By popular demand, simultaneous linking to double precision (fftw),
single precision (fftwf), and long-double precision (fftwl) versions
of FFTW is now supported.
* Cycle counters (on all modern CPUs) are exploited to speed planning.
* Efficient transforms of real even/odd arrays, a.k.a. discrete
cosine/sine transforms (types I-IV). (Currently work via pre/post
processing of real transforms, ala FFTPACK, so are not optimal.)
* DHTs (Discrete Hartley Transforms), again via post-processing
of real transforms (and thus suboptimal, for now).
* Support for linking to just those parts of FFTW that you need,
greatly reducing the size of statically linked programs when
only a limited set of transform sizes/types are required.
* Canonical global wisdom file (/etc/fftw/wisdom) on Unix, along
with a command-line tool (fftw-wisdom) to generate/update it.
* Fortran API can be used with both g77 and non-g77 compilers
simultaneously.
* Multi-threaded version has optional OpenMP support.
* Authors' good looks have greatly improved with age.
Changes from 3.0beta3:
* Separate FMA distribution to better exploit fused multiply-add instructions
on PowerPC (and possibly other) architectures.
* Performance improvements via some inlining tweaks.
* fftw_flops now returns double arguments, not int, to avoid overflows
for large sizes.
* Workarounds for automake bugs.
Changes from 3.0beta2:
* The standard REDFT00/RODFT00 (DCT-I/DST-I) algorithm (used in
FFTPACK, NR, etcetera) turns out to have poor numerical accuracy, so
we replaced it with a slower routine that is more accurate.
* The guru planner and execute functions now have two variants, one that
takes complex arguments and one that takes separate real/imag pointers.
* Execute and planner routines now automatically align the stack on x86,
in case the calling program is misaligned.
* README file for test program.
* Fixed bugs in the combination of SIMD with multi-threaded transforms.
* Eliminated internal fftw_threads_init function, which some people were
calling accidentally instead of the fftw_init_threads API function.
* Check for -openmp flag (Intel C compiler) when --enable-openmp is used.
* Support AMD x86-64 SIMD and cycle counter.
* Support SSE2 intrinsics in forthcoming gcc 3.3.
Changes from 3.0beta1:
* Faster in-place 1d transforms of non-power-of-two sizes.
* SIMD improvements for in-place, multi-dimensional, and/or non-FFTW_PATIENT
transforms.
* Added support for hard-coded DCT/DST/DHT codelets of small sizes; the
default distribution only includes hard-coded size-8 DCT-II/III, however.
* Many minor improvements to the manual. Added section on using the
codelet generator to customize and enhance FFTW.
* The default 'make check' should now only take a few minutes; for more
strenuous tests (which may take a day or so), do 'cd tests; make bigcheck'.
* fftw_print_plan is split into fftw_fprint_plan and fftw_print_plan, where
the latter uses stdout.
* Fixed ability to compile with a C++ compiler.
* Fixed support for C99 complex type under glibc.
* Fixed problems with alloca under MinGW, AIX.
* Workaround for gcc/SPARC bug.
* Fixed multi-threaded initialization failure on IRIX due to lack of
user-accessible PTHREAD_SCOPE_SYSTEM there.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -