📄 fftalgorithms.tex
字号:
Therefore we can easily compute the inverse transform by conjugating
all the complex elements of the DFT matrices and twiddle factors that
we apply. (An alternative strategy is to conjugate the input data,
take a forward transform, and then conjugate the output data).
\section{Fast Sub-transform Modules}
%
To implement the mixed-radix FFT we still need to compute the
small-$N$ DFTs for each factor. Fortunately many highly-optimized
small-$N$ modules are available, following the work of Winograd who
showed how to derive efficient small-$N$ sub-transforms by number
theoretic techniques.
The algorithms in this section all compute,
%
\begin{equation}
x_a = \sum_{b=0}^{N-1} W_N^{ab} z_b
\end{equation}
%
The sub-transforms given here are the ones recommended by Temperton
and differ slightly from the canonical Winograd modules. According to
Temperton~\cite{temperton83} they are slightly more robust against
rounding errors and trade off some additions for multiplications.
%
For the $N=2$ DFT,
%
\begin{equation}
\begin{array}{ll}
x_0 = z_0 + z_1, &
x_1 = z_0 - z_1.
\end{array}
\end{equation}
%
For the $N=3$ DFT,
%
\begin{equation}
\begin{array}{lll}
t_1 = z_1 + z_2, &
t_2 = z_0 - t_1/2, &
t_3 = \sin(\pi/3) (z_1 - z_2),
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
x_0 = z_0 + t_1, &
x_1 = t_2 + i t_3, &
x_2 = t_2 - i t_3.
\end{array}
\end{equation}
%
The $N=4$ transform involves only additions and subtractions,
%
\begin{equation}
\begin{array}{llll}
t_1 = z_0 + z_2, &
t_2 = z_1 + z_3, &
t_3 = z_0 - z_2, &
t_4 = z_1 - z_3,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
x_0 = t_1 + t_2, &
x_1 = t_3 + i t_4, &
x_2 = t_1 - t_2, &
x_3 = t_3 - i t_4.
\end{array}
\end{equation}
%
For the $N=5$ DFT,
%
\begin{equation}
\begin{array}{llll}
t_1 = z_1 + z_4, &
t_2 = z_2 + z_3, &
t_3 = z_1 - z_4, &
t_4 = z_2 - z_3,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
t_5 = t_1 + t_2, &
t_6 = (\sqrt{5}/4) (t_1 - t_2), &
t_7 = z_0 - t_5/4, \\
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
t_8 = t_7 + t_6, &
t_9 = t_7 - t_6, \\
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
t_{10} = \sin(2\pi/5) t_3 + \sin(2\pi/10) t_4, &
t_{11} = \sin(2\pi/10) t_3 - \sin(2\pi/5) t_4,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
x_0 = z_0 + t_5,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
x_1 = t_8 + i t_{10}, &
x_2 = t_9 + i t_{11},
\end{array}
\end{equation}
\begin{equation}
\begin{array}{llll}
x_3 = t_9 - i t_{11}, &
x_4 = t_8 - i t_{10}.
\end{array}
\end{equation}
%
The DFT matrix for $N=6$ can be written as a combination of $N=3$ and
$N=2$ transforms with index permutations,
%
\begin{equation}
\left(
\begin{array}{c}
x_0 \\
x_4 \\
x_2 \\
\hline x_3 \\
x_1 \\
x_5
\end{array}
\right)
=
\left(
\begin{array}{ccc|ccc}
& & & & & \\
&W_3& & &W_3& \\
& & & & & \\
\hline & & & & & \\
&W_3& & &-W_3& \\
& & & & &
\end{array}
\right)
\left(
\begin{array}{c}
z_0 \\
z_2 \\
z_4 \\
\hline z_3 \\
z_5 \\
z_1
\end{array}
\right)
\end{equation}
%
This simplification is an example of the Prime Factor Algorithm, which
can be used because the factors 2 and 3 are mutually prime. For more
details consult one of the books on number theory for
FFTs~\cite{elliott82,blahut}. We can take advantage of the simple
indexing scheme of the PFA to write the $N=6$ DFT as,
%
\begin{equation}
\begin{array}{lll}
t_1 = z_2 + z_4, &
t_2 = z_0 - t_1/2, &
t_3 = \sin(\pi/3) (z_2 - z_4),
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
t_4 = z_5 + z_1, &
t_5 = z_3 - t_4/2, &
t_6 = \sin(\pi/3) (z_5 - z_1),
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
t_7 = z_0 + t_1, &
t_8 = t_2 + i t_3, &
t_9 = t_2 - i t_3,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
t_{10} = z_3 + t_4, &
t_{11} = t_5 + i t_6, &
t_{12} = t_5 - i t_6,
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
x_0 = t_7 + t_{10}, &
x_4 = t_8 + t_{11}, &
x_2 = t_9 + t_{12},
\end{array}
\end{equation}
\begin{equation}
\begin{array}{lll}
x_3 = t_7 - t_{10}, &
x_1 = t_8 - t_{11}, &
x_5 = t_9 - t_{12}.
\end{array}
\end{equation}
For any remaining general factors we use Singleton's efficient method
for computing a DFT~\cite{singleton}. Although it is an $O(N^2)$
algorithm it does reduce the number of multiplications by a factor of
4 compared with a naive evaluation of the DFT. If we look at the
general stucture of a DFT matrix, shown schematically below,
%
\begin{equation}
\left(
\begin{array}{c}
h_0 \\
h_1 \\
h_2 \\
\vdots \\
h_{N-2} \\
h_{N-1}
\end{array}
\right)
=
\left(
\begin{array}{cccccc}
1 & 1 & 1 & \cdots & 1 & 1 \\
1 & W & W & \cdots & W & W \\
1 & W & W & \cdots & W & W \\
\vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\
1 & W & W & \cdots & W & W \\
1 & W & W & \cdots & W & W
\end{array}
\right)
\left(
\begin{array}{c}
g_0 \\
g_1 \\
g_2 \\
\vdots \\
g_{N-2} \\
g_{N-1}
\end{array}
\right)
\end{equation}
%
we see that the outer elements of the DFT matrix are all unity. We can
remove these trivial multiplications but we will still be left with an
$(N-1) \times (N-1)$ sub-matrix of complex entries, which would appear
to require $(N-1)^2$ complex multiplications. Singleton's method,
uses symmetries of the DFT matrix to convert the complex
multiplications to an equivalent number of real multiplications. We
start with the definition of the DFT in component form,
%
\begin{equation}
a_k + i b_k = \sum_{j=0} (x_j+iy_j)(\cos(2\pi jk/f) - i\sin(2\pi jk/f))
\end{equation}
%
The zeroth component can be computed using only additions,
%
\begin{equation}
a_0 + i b_0 = \sum_{j=0}^{(f-1)} x_j + i y_j
\end{equation}
%
We can rewrite the remaining components as,
%
\begin{eqnarray}
a_k + i b_k & = x_0 + i y_0 & +
\sum_{j=1}^{(f-1)/2} (x_j + x_{f-j}) \cos(2\pi jk/f)
+ (y_j - y_{f-j}) \sin(2\pi jk/f) \\
& & + i\sum_{j=1}^{(f-1)/2} (y_j + y_{f-j}) \cos(2\pi jk/f)
- (x_j - x_{f-j}) \sin(2\pi jk/f)
\end{eqnarray}
%
by using the following trigonometric identities,
%
\begin{eqnarray}
\cos(2\pi(f-j)k/f) &=& \phantom{-}\cos(2\pi jk/f) \\
\sin(2\pi(f-j)k/f) &=& -\sin(2\pi jk/f)
\end{eqnarray}
%
These remaining components can all be computed using four partial
sums,
%
\begin{eqnarray}
a_k + i b_k & = & (a^+_k - a^-_k) + i (b^+_k + b^-_k) \\
a_{f-k} + i b_{f-k} & = & (a^+_k + a^-_k) + i (b^+_k - b^-_k)
\end{eqnarray}
%
for $k = 1, 2, \dots, (f-1)/2$, where,
%
\begin{eqnarray}
a^+_k &=& x_0 + \sum_{j=1}^{(f-1)/2} (x_j + x_{f-j}) \cos(2\pi jk/f) \\
a^-_k &=& \phantom{x_0} - \sum_{j=1}^{(f-1)/2} (y_j - y_{f-j}) \sin(2\pi jk/f) \\
b^+_k &=& y_0 + \sum_{j=1}^{(f-1)/2} (y_j + y_{f-j}) \cos(2\pi jk/f) \\
b^-_k &=& \phantom{y_0} - \sum_{j=1}^{(f-1)/2} (x_j - x_{f-j}) \sin(2\pi jk/f)
\end{eqnarray}
%
Note that the higher components $k'=f-k$ can be obtained directly
without further computation once $a^+$, $a^-$, $b^+$ and $b^-$ are
known. There are $4 \times (f-1)/2$ different sums, each involving
$(f-1)/2$ real multiplications, giving a total of $(f-1)^2$ real
multiplications instead of $(f-1)^2$ complex multiplications.
To implement Singleton's method we make use of the input and output
vectors $v$ and $v'$ as scratch space, copying data back and forth
between them to obtain the final result. First we use $v'$ to store
the terms of the symmetrized and anti-symmetrized vectors of the form
$x_j + x_{f-j}$ and $x_j - y_{f-j}$. Then we multiply these by the
appropriate trigonometric factors to compute the partial sums $a^+$,
$a^-$, $b^+$ and $b^-$, storing the results $a_k + i b_k$ and $a_{f-k}
+ i b_{f-k}$ back in $v$. Finally we multiply the DFT output by any
necessary twiddle factors and place the results in $v'$.
\section{FFTs for real data}
%
This section is based on the articles {\em Fast Mixed-Radix Real
Fourier Transforms} by Clive Temperton~\cite{temperton83real} and
{\em Real-Valued Fast Fourier Transform Algorithms} by Sorensen,
Jones, Heideman and Burrus~\cite{burrus87real}. The DFT of a real
sequence has a special symmetry, called a {\em conjugate-complex} or
{\em half-complex} symmetry,
%
\begin{equation}
h(a) = h(N-a)^*
\end{equation}
%
The element $h(0)$ is real, and when $N$ is even $h(N/2)$ is also
real. It is straightforward to prove the symmetry,
%
\begin{eqnarray}
h(a) &=& \sum W^{ab}_N g(b) \\
h(N-a)^* &=& \sum W^{-(N-a)b}_N g(b)^* \\
&=& \sum W^{-Nb}_N W^{ab}_N g(b) \qquad{(W^N_N=1)} \\
&=& \sum W^{ab}_N g(b)
\end{eqnarray}
%
Real-valued data is very common in practice (perhaps more common that
complex data) so it is worth having efficient FFT routines for real
data. In principle an FFT for real data should need half the
operations of an FFT on the equivalent complex data (where the
imaginary parts are set to zero). There are two different strategies
for computing FFTs of real-valued data:
One strategy is to ``pack'' the real data (of length $N$) into a
complex array (of length $N/2$) by index transformations. A complex
FFT routine can then be used to compute the transform of that array.
By further index transformations the result can actually by
``unpacked'' to the FFT of the original real data. It is also possible
to do two real FFTs simultaneously by packing one in the real part and
the other in the imaginary part of the complex array. These
techniques have some disadvantages. The packing and unpacking
procedures always add $O(N)$ operations, and packing a real array of
length $N$ into a complex array of length $N/2$ is only possible if
$N$ is even. In addition, if two unconnected datasets with very
different magnitudes are packed together in the same FFT there could
be ``cross-talk'' between them due to a loss of precision.
A more straightforward strategy is to start with an FFT algorithm,
such as the complex mixed-radix algorithm, and prune out all the
operations involving the zero imaginary parts of the initial data. The
FFT is linear so the imaginary part of the data can be decoupled from
the real part. This procedure leads to a dedicated FFT for real-valued
data which works for any length and does not perform any unnecessary
operations. It also allows us to derive a corresponding inverse FFT
routine which transforms a half-complex sequence back into real data.
\subsection{Radix-2 FFTs for real data}
%
Before embarking on the full mixed-radix real FFT we'll start with the
radix-2 case. It contains all the essential features of the
general-$N$ algorithm. To make it easier to see the analogy between
the two we will use the mixed-radix notation to describe the
factors. The factors are all 2,
%
\begin{equation}
f_1 = 2, f_2 = 2, \dots, f_{n_f} = 2
\end{equation}
%
and the products $p_i$ are powers of 2,
%
\begin{eqnarray}
p_0 & = & 1 \\
p_1 & = & f_1 = 2 \\
p_2 & = & f_1 f_2 = 4 \\
\dots &=& \dots \\
p_i & = & f_1 f_2 \dots f_i = 2^i
\end{eqnarray}
%
Using this notation we can rewrite the radix-2 decimation-in-time
algorithm as,
%
\begin{algorithm}
\STATE bit-reverse ordering of $g$
\FOR {$i = 1 \dots n$}
\FOR {$a = 0 \dots p_{i-1} - 1$}
\FOR{$b = 0 \dots
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -