\documentclass[fleqn,12pt]{article}
%
\setlength{\oddsidemargin}{-0.25in}
\setlength{\textwidth}{7.0in}
\setlength{\topmargin}{-0.25in}
\setlength{\textheight}{9.5in}
%
\usepackage{algorithmic}
\newenvironment{algorithm}{\begin{quote} %\vspace{1em}
\begin{algorithmic}\samepage}{\end{algorithmic} %\vspace{1em} 
\end{quote}}
\newcommand{\Real}{\mathop{\mathrm{Re}}}
\newcommand{\Imag}{\mathop{\mathrm{Im}}}

\begin{document}
\title{FFT Algorithms}
\author{Brian Gough, {\tt bjg@network-theory.co.uk}}
\date{May 1997}
\maketitle

\section{Introduction}

Fast Fourier Transforms (FFTs) are efficient algorithms for
calculating the discrete Fourier transform (DFT),
%
\begin{eqnarray}
h_a &=& \mathrm{DFT}(g_b) \\
    &=& \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b /N) \qquad 0 \leq a \leq N-1\\
    &=& \sum_{b=0}^{N-1} g_b W_N^{ab} \qquad W_N= \exp(-2\pi i/N)
\end{eqnarray}
%
The DFT usually arises as an approximation to the continuous Fourier
transform when functions are sampled at discrete intervals in space or
time. The naive evaluation of the discrete Fourier transform is a
matrix-vector multiplication ${\mathbf W}\vec{g}$, which takes
$O(N^2)$ operations for $N$ data points. The general principle of the
Fast Fourier Transform algorithms is to use a divide-and-conquer
strategy to factorize the matrix $W$ into smaller sub-matrices,
typically reducing the operation count to $O(N \sum f_i)$ if $N$ can
be factorized into smaller integers, $N=f_1 f_2 \dots f_n$.
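
As a concrete point of reference, here is a minimal sketch of the
naive evaluation in C (using C99 complex arithmetic; the function
name is just for this example),
%
\begin{verbatim}
#include <complex.h>
#include <math.h>

/* Naive O(N^2) DFT: direct evaluation of the matrix-vector
   multiplication h_a = \sum_b g_b W_N^{ab}. */
static void
naive_dft (const double complex *g, double complex *h, int N)
{
  for (int a = 0; a < N; a++)
    {
      h[a] = 0;
      for (int b = 0; b < N; b++)
        h[a] += g[b] * cexp (-2 * M_PI * I * a * b / N); /* W_N^{ab} */
    }
}
\end{verbatim}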

This chapter explains the algorithms used in the GSL FFT routines
and provides some information on how to extend them. To learn more about
the FFT you should read the review article {\em Fast Fourier
Transforms: A Tutorial Review and A State of the Art} by Duhamel and
Vetterli~\cite{duhamel90}. There are several introductory books on the
FFT with example programs, such as {\em The Fast Fourier Transform} by
Brigham~\cite{brigham74} and {\em DFT/FFT and Convolution Algorithms}
by Burrus and Parks~\cite{burrus84}. In 1979 the IEEE published a
compendium of carefully-reviewed Fortran FFT programs in {\em Programs
for Digital Signal Processing}~\cite{committee79} which is a useful
reference for implementations of many different FFT algorithms. If you
are interested in using DSPs then the {\em Handbook of Real-Time Fast
Fourier Transforms}~\cite{smith95} provides detailed information on
the algorithms and hardware needed to design, build and test DSP
applications. Many FFT algorithms rely on results from number theory.
These results are covered in the books {\em Fast transforms:
algorithms, analyses, applications}, by Elliott and
Rao~\cite{elliott82}, {\em Fast Algorithms for Digital Signal
Processing} by Blahut~\cite{blahut} and {\em Number Theory in Digital
Signal Processing} by McClellan and Rader~\cite{mcclellan79}. There is
also an annotated bibliography of papers on the FFT and related topics
by Burrus~\cite{burrus-note}.

\section{Families of FFT algorithms}
%
There are two main families of FFT algorithms: the Cooley-Tukey
algorithm and the Prime Factor algorithm. These differ in the way they
map the full FFT into smaller sub-transforms. Of the Cooley-Tukey
algorithms there are two types of routine in common use: mixed-radix
(general-$N$) algorithms and radix-2 (power of 2) algorithms. Each
type of algorithm can be further classified by additional characteristics,
such as whether it operates in-place or uses additional scratch space,
whether its output is in a sorted or scrambled order, and whether it
uses decimation-in-time or -frequency iterations.

Mixed-radix algorithms work by factorizing the data vector into
shorter lengths. These can then be transformed by small-$N$ FFTs.
Typical programs include FFTs for small prime factors, such as 2, 3,
5, \dots, which are highly optimized. The small-$N$ FFT modules act as
building blocks and can be multiplied together to make longer
transforms. By combining a reasonable set of modules it is possible to
compute FFTs of many different lengths. If the small-$N$ modules are
supplemented by an $O(N^2)$ general-$N$ module then, in principle, an
FFT of any length can be computed. Of course, any length containing
large prime factors would only be computed at $O(N^2)$ speed.
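
As an illustration, here is a small sketch in C of the factorization
stage, assuming the implemented small-$N$ modules are 7, 5, 3 and 2
(this choice of modules is only for the example),
%
\begin{verbatim}
#include <stdio.h>

/* Factorize n into the available small-N modules; whatever remains
   would have to be handled by an O(N^2) general-N module. */
int
main (void)
{
  static const int module[] = { 7, 5, 3, 2 };
  int n = 240;                  /* 240 = 5 * 3 * 2 * 2 * 2 * 2 */
  printf ("%d =", n);
  for (int i = 0; i < 4; i++)
    while (n % module[i] == 0)
      {
        n /= module[i];
        printf (" %d", module[i]);
      }
  if (n > 1)
    printf (" [general-N factor %d]", n);  /* O(N^2) fallback */
  printf ("\n");
  return 0;
}
\end{verbatim}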

Radix-2 algorithms, or ``power of two'' algorithms, are simplified
versions of the mixed-radix algorithm. They are restricted to lengths
which are a power of two. The basic radix-2 FFT module only involves
addition and subtraction, so the algorithms are very simple. Radix-2
algorithms have been the subject of much research into optimizing the
FFT. Many of the most efficient radix-2 routines are based on the
``split-radix'' algorithm. This is actually a hybrid which combines
the best parts of both radix-2 and radix-4 (``power of 4'')
algorithms~\cite{sorenson86,duhamel86}.

The prime factor algorithm (PFA) is an alternative form of general-$N$
algorithm based on a different way of recombining small-$N$ FFT
modules~\cite{temperton83pfa,temperton85}. It has a very simple indexing
scheme which makes it attractive. However it only works in the case
where all factors are mutually prime. This requirement makes it more
suitable as a specialized algorithm for given lengths.
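
The simplicity comes from the Chinese Remainder Theorem. For example
(a small sketch, with the length chosen only for illustration), when
$N = 15 = 3 \times 5$ the pairs $(n \bmod 3, n \bmod 5)$ enumerate
every index exactly once, which is the property the PFA index map
relies on,
%
\begin{verbatim}
#include <stdio.h>

int
main (void)
{
  int count[3][5] = {{0}};
  for (int n = 0; n < 15; n++)
    count[n % 3][n % 5]++;      /* map n to (n mod 3, n mod 5) */
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 5; j++)
      printf ("(%d,%d) visited %d time(s)\n", i, j, count[i][j]);
  return 0;   /* prints "visited 1 time(s)" for all 15 pairs */
}
\end{verbatim}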


\begin{subsection}{FFTs of prime lengths}
%
  Large prime lengths cannot be handled efficiently by any of these
algorithms. However it may still be possible to compute a DFT by using
results from number theory. Rader showed that it is possible to
convert a length-$p$ FFT (where $p$ is prime) into a convolution of
length $p-1$. There is a simple identity between the convolution of
length $N$ and the FFT of the same length, so if $p-1$ is easily
factorizable this allows the convolution to be computed efficiently
via the FFT. The idea is presented in the original paper by
Rader~\cite{raderprimes} (also reprinted in~\cite{mcclellan79}), but
for more details see the theoretical books mentioned earlier.
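
As a small illustration of the reindexing involved (a brute-force
sketch; the helper function is hypothetical), the nonzero indices
modulo a prime $p$ can be generated as successive powers of a
primitive root $g$, and it is this permutation $b = g^k \bmod p$
which exposes the convolution structure,
%
\begin{verbatim}
#include <stdio.h>

/* Return 1 if g is a primitive root modulo the prime p, i.e. if no
   power g^k with 0 < k < p-1 is congruent to 1 modulo p. */
static int
is_primitive_root (int g, int p)
{
  int x = 1;
  for (int k = 1; k < p - 1; k++)
    {
      x = (x * g) % p;
      if (x == 1)
        return 0;
    }
  return 1;
}

int
main (void)
{
  const int p = 7;                         /* a small prime length */
  int g = 2, x = 1;
  while (!is_primitive_root (g, p))
    g++;                                   /* finds g = 3 for p = 7 */
  for (int k = 0; k < p - 1; k++)
    {
      printf ("g^%d mod %d = %d\n", k, p, x);
      x = (x * g) % p;
    }
  return 0;  /* the powers 1, 3, 2, 6, 4, 5 cover 1..p-1 exactly once */
}
\end{verbatim}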
\end{subsection}

%\subsection{In-place algorithms vs algorithms using scratch space}
%%
%For simple algorithms, such as the Radix-2 algorithms, the FFT can be
%performed in-place, without additional memory. For this to be possible
%the index calculations must be simple enough that the calculation of
%the result to be stored in index $k$ can be carried out at the same
%time as the data in $k$ is used, so that no temporary storage is
%needed.

%The mixed-radix algorithm is too complicated for this. It requires an
%area of scratch space sufficient to hold intermediate copies of the
%input data.

%\subsection{Self-sorting algorithms vs scrambling algorithms}
%%
%In a self-sorting algorithm internal permutations are included at each
%iteration of the FFT which leave the final result in its natural order.


%The mixed-radix algorithm can be made self-sorting. The additional
%scratch space helps here, although it is in fact possible to design
%self-sorting algorithms which do not require scratch
%space. 


%The in-place radix-2 algorithm is not self-sorting. The data is
%permuted, and transformed into bit-reversed order. Note that
%bit-reversal only refers to the order of the array, not the values for
%each index which are of course not changed. More precisely, the data
%stored in location $[b_n \dots b_2 b_1 b_0]$ is moved to location
%$[b_0 b_1 b_2 \dots b_n]$, where $b_0 \dots b_n$ are the raw bits of a
%given index. Placing the data in bit reversed order is easily done,
%using right shifts to extract the least significant bits of the index
%and left-shifts to feed them into a register for the bit-reversed
%location. Simple radix-2 FFTs usually recompute the bit-reversed
%ordering in the naive way for every call. For repeated calls it might
%be worthwhile to precompute the permutation and cache it. There are
%also more sophisticated algorithms which reduce the number of
%operations needed to perform bit-reversal~\cite{bit-reversal}.


%\begin{subsection}{Decimation-in-time (DIT) vs Decimation-in-Frequency (DIF)}

%\end{subsection}


\subsection{Optimization}
%
There is no such thing as the single fastest FFT {\em algorithm}. FFT
algorithms involve a mixture of floating point calculations, integer
arithmetic and memory access. Each of these operations will have
different relative speeds on different platforms.  The performance of
an algorithm is a function of the hardware it is implemented on.  The
goal of optimization is thus to choose the algorithm best suited to
the characteristics of a given hardware platform.

For example, the Winograd Fourier Transform Algorithm (WFTA) is
designed to reduce the number of floating point multiplications in the
FFT. However, it does this at the expense of using many more additions
and data transfers than other algorithms.
As a consequence the WFTA might be a good candidate algorithm for
machines where data transfers occupy a negligible time relative to
floating point arithmetic. However on most modern machines, where the
speed of data transfers is comparable to or slower than floating point
operations, it would be outperformed by an algorithm which used a
better mix of operations (i.e. more floating point operations but
fewer data transfers).

For a detailed study of this sort of effect, comparing the different
algorithms on different platforms, consult the paper {\it Effects of
Architecture Implementation on DFT Algorithm Performance} by Mehalic,
Rustan and Route~\cite{mehalic85}. The paper was written in the early
1980s and has data for super- and mini-computers which you are
unlikely to see today, except in a museum. However, the methodology is
still valid and it would be interesting to see similar results for
present-day computers.


\section{FFT Concepts}
%
Factorization is the key principle of the mixed-radix FFT divide-and-conquer
strategy. If $N$ can be factorized into a product of $n_f$ integers,
%
\begin{equation}
N = f_1 f_2 \dots f_{n_f} ,
\end{equation}
%
then the FFT itself can be divided into smaller FFTs for each factor.
More precisely, an FFT of length $N$ can be broken up into,
%
\begin{quote}
\begin{tabular}{l}
$(N/f_1)$ FFTs of length $f_1$, \\
$(N/f_2)$ FFTs of length $f_2$, \\
\dots \\
$(N/f_{n_f})$ FFTs of length $f_{n_f}$. 
\end{tabular}
\end{quote}
%
The total number of operations for these sub-operations will be
$O(N(f_1 + f_2 + \dots + f_{n_f}))$. When the factors of $N$ are all
small integers this will be substantially less than $O(N^2)$. For
example, when $N$ is a power of 2 an FFT of length $N=2^m$ can be
reduced to $mN/2$ FFTs of length 2, or $O(N\log_2 N)$ operations; for
$N = 2^{10} = 1024$ this reduces roughly $10^6$ operations to about
$10^4$. Here is a demonstration which shows this:

We start with the full DFT,
%
\begin{eqnarray}
h_a &=& \sum_{b=0}^{N-1} g_b W_N^{ab}       \qquad W_N=\exp(-2\pi i/N)
\end{eqnarray}
%
and split the sum into even and odd terms,
%
\begin{eqnarray}
\phantom{h_a}
%   &=& \left(\sum_{b=0,2,4,\dots} + \sum_{b=1,3,5,\dots}\right) g_b W_N^{ab}\\
   &=& \sum_{b=0}^{N/2-1} g_{2b} W_N^{a(2b)} + 
      \sum_{b=0}^{N/2-1} g_{2b+1} W_N^{a(2b+1)}.
\end{eqnarray}
%
This converts the original DFT of length $N$ into two DFTs of length
$N/2$,
%
\begin{equation}
h_a = \sum_{b=0}^{N/2-1} g_{2b} W_{(N/2)}^{ab} + 
      W_N^a \sum_{b=0}^{N/2-1} g_{2b+1} W_{(N/2)}^{ab} 
\end{equation}
%
The first term is a DFT of the even elements of $g$. The second term
is a DFT of the odd elements of $g$, premultiplied by an exponential
factor $W_N^a$ (known as a {\em twiddle factor}).
%
\begin{equation}
\mathrm{DFT}(h)  =  \mathrm{DFT}(g_{even}) + W_N^a \mathrm{DFT}(g_{odd})
\end{equation}
%
By splitting the DFT into its even and odd parts we have reduced the
operation count from $N^2$ (for a DFT of length $N$) to $2 (N/2)^2$
(for two DFTs of length $N/2$). The cost of the splitting is that we
need an additional $O(N)$ operations to multiply by the twiddle factor
$W_N^a$ and recombine the two sums.

We can repeat the splitting procedure recursively $\log_2 N$ times
until the full DFT is reduced to DFTs of single terms. The DFT of a
single value is just the identity operation, which costs nothing.
However since $O(N)$ operations were needed at each stage to recombine
the even and odd parts the total number of operations to obtain the
full DFT is $O(N \log_2 N)$. If we had used a length which was a
product of factors $f_1$, $f_2$, $\dots$ we could have split the sum
in a similar way. First we would split terms corresponding to the
factor $f_1$, instead of the even and odd terms corresponding to a
factor of two.  Then we would repeat this procedure for the subsequent
factors. This would lead to a final operation count of $O(N \sum
f_i)$.

This procedure gives some motivation for why the number of operations
in a DFT can in principle be reduced from $O(N^2)$ to $O(N \sum f_i)$.
It does not, however, explain how to implement the algorithm in
practice, which is what we shall do in the next section.
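
Before turning to the implementation details, here is a compact
recursive sketch of the even/odd splitting in C (C99 complex
arithmetic, $N$ a power of two; an illustration only, since practical
routines are usually organized as iterative passes over the data),
%
\begin{verbatim}
#include <complex.h>
#include <math.h>

/* Recursive radix-2 decimation-in-time FFT of length n (a power of
   two).  Reading the input with a stride performs the even/odd
   split without copying.  The initial call is fft2 (g, h, N, 1). */
static void
fft2 (const double complex *in, double complex *out, int n, int stride)
{
  if (n == 1)
    {
      out[0] = in[0];  /* the DFT of a single value is the identity */
      return;
    }
  fft2 (in, out, n / 2, 2 * stride);                   /* even terms */
  fft2 (in + stride, out + n / 2, n / 2, 2 * stride);  /* odd terms */
  for (int a = 0; a < n / 2; a++)
    {
      double complex w = cexp (-2 * M_PI * I * a / n); /* twiddle W_N^a */
      double complex even = out[a];
      double complex odd = w * out[a + n / 2];
      out[a] = even + odd;
      out[a + n / 2] = even - odd;   /* uses W_N^{a + N/2} = -W_N^a */
    }
}
\end{verbatim}
%
Each level of the recursion performs the $O(N)$ twiddle-factor
multiplications and recombinations discussed above, giving the
$O(N \log_2 N)$ total.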

\section{Radix-2 Algorithms}
%
For radix-2 FFTs it is natural to write array indices in binary form
because the length of the data is a power of two. This is nicely
explained in the article {\em The FFT: Fourier Transforming One Bit at
a Time} by P.B. Visscher~\cite{visscher96}. A binary representation
for indices is the key to deriving the simplest efficient radix-2
algorithms.

We can write an index $b$ ($0 \leq b \leq 2^n - 1$, where $N = 2^n$) in
binary representation like this,
%
\begin{equation}
b = [b_{n-1} \dots b_1 b_0] = 2^{n-1}b_{n-1} + \dots + 2 b_1 + b_0 .
\end{equation}
%
Each of the $b_0, b_1, \dots, b_{n-1}$ are the bits (either 0 or 1) of
$b$.
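
In code these bits are extracted with shifts and masks. For example,
here is a small sketch (the function name is just for illustration)
of the bit-reversal operation used by in-place radix-2 algorithms,
which exchanges data between the locations $[b_{n-1} \dots b_1 b_0]$
and $[b_0 b_1 \dots b_{n-1}]$,
%
\begin{verbatim}
/* Reverse the lowest n bits of the index b, turning
   [b_{n-1} ... b_1 b_0] into [b_0 b_1 ... b_{n-1}]. */
static unsigned int
bit_reverse (unsigned int b, unsigned int n)
{
  unsigned int result = 0;
  for (unsigned int i = 0; i < n; i++)
    {
      result = (result << 1) | (b & 1); /* shift the low bit of b... */
      b >>= 1;                          /* ...into the result */
    }
  return result;
}
\end{verbatim}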

Using this notation the original definition of the DFT
can be rewritten as a sum over the bits of $b$,
%
\begin{equation} 
h(a) = \sum_{b=0}^{N-1} g_b \exp(-2\pi i a b /N)
\end{equation}
%
to give an equivalent summation like this,
%
\begin{equation} 
h([a_{n-1} \dots a_1 a_0]) = 
\sum_{b_0=0}^{1} 
\sum_{b_1=0}^{1} 
\dots
\sum_{b_{n-1}=0}^{1} 
 g([b_{n-1} \dots b_1 b_0]) W_N^{ab}
\end{equation}
%
where the bits of $a$ are $a=[a_{n-1} \dots a_1 a_0]$. 

To reduce the number of operations in the sum we will use the
periodicity of the exponential term,
%
\begin{equation}
W_N^{x+N} = W_N^{x}.
\end{equation}
%
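For example, with $N = 8$ the exponent in $W_8^{ab}$ for $a=3$, $b=6$
collapses from $18$ to $18 \bmod 8 = 2$,
%
\begin{equation}
W_8^{18} = W_8^{8 + 8 + 2} = W_8^{2} .
\end{equation}
%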
Most of the products $ab$ in $W_N^{ab}$ are greater than $N$. By
making use of this periodicity they can all be collapsed down into the
range $0\dots N-1$. This allows us to reduce the number of operations
by combining common terms, modulo $N$. Using this idea we can derive
decimation-in-time or decimation-in-frequency algorithms, depending on
how we break the DFT summation down into common terms. We'll first
consider the decimation-in-time algorithm.

\subsection{Radix-2 Decimation-in-Time (DIT)}
%
To derive the decimation-in-time algorithm we start by separating
out the most significant bit of the index $b$,
%
\begin{equation}
[b_{n-1} \dots b_1 b_0] = 2^{n-1}b_{n-1} + [b_{n-2} \dots b_1 b_0]
\end{equation}
%
Now we can evaluate the innermost sum of the DFT without any
dependence on the remaining bits of $b$ in the exponential,
%
\begin{eqnarray} 
h([a_{n-1} \dots a_1 a_0]) &=& 
\sum_{b_0=0}^{1} 
\sum_{b_1=0}^{1} 
\dots
\sum_{b_{n-1}=0}^{1} 
 g(b) 
W_N^{a(2^{n-1}b_{n-1}+[b_{n-2} \dots b_1 b_0])} \\
 &=& 
\sum_{b_0=0}^{1} 
\sum_{b_1=0}^{1} 
\dots
\sum_{b_{n-2}=0}^{1} 
W_N^{a[b_{n-2} \dots b_1 b_0]}
\sum_{b_{n-1}=0}^{1} 
 g(b) 
W_N^{a(2^{n-1}b_{n-1})}
\end{eqnarray}
%
Looking at the term $W_N^{a(2^{n-1}b_{n-1})}$ we see that we can also
remove most of the dependence on $a$, by using the periodicity of the
exponential,
%
\begin{eqnarray}
W_N^{a(2^{n-1}b_{n-1})} &=&
\exp(-2\pi i [a_{n-1} \dots a_1 a_0] 2^{n-1} b_{n-1}/ 2^n )\\
&=& \exp(-2\pi i [a_{n-1} \dots a_1 a_0] b_{n-1}/ 2 )\\
&=& \exp(-2\pi i ( 2^{n-2}a_{n-1} + \dots + a_1 + (a_0/2)) b_{n-1} )\\
&=& \exp(-2\pi i a_0 b_{n-1}/2 ) \\
&=& W_2^{a_0 b_{n-1}}
\end{eqnarray}
%
Thus the innermost exponential term simplifies so that it only
involves the highest order bit of $b$ and the lowest order bit of $a$,
and the sum can be reduced to,
%
\begin{equation}
h([a_{n-1} \dots a_1 a_0])
= 
\sum_{b_0=0}^{1} 
\sum_{b_1=0}^{1} 
\dots
\sum_{b_{n-2}=0}^{1} 
W_N^{a[b_{n-2} \dots b_1 b_0]}
\sum_{b_{n-1}=0}^{1} 
 g(b) 
W_2^{a_0 b_{n-1}}.
\end{equation}
%
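Since $W_2 = \exp(-\pi i) = -1$, this innermost sum is just a
two-point DFT: for each fixed combination of the remaining bits,
%
\begin{equation}
\sum_{b_{n-1}=0}^{1} g(b) \, W_2^{a_0 b_{n-1}}
 = g([0 \, b_{n-2} \dots b_0]) + (-1)^{a_0} \, g([1 \, b_{n-2} \dots b_0]) ,
\end{equation}
%
so the innermost stage requires only additions and subtractions.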
