📄 transforms.py
字号:
# as follows :def C (k, N): return cos ((k*pi)/(2*N))def unscaled_DCT (N, input, output): for o in range(N): # o is output index output[o] = 0 for i in range(N): # i is input index output[o] = output[o] + input[i] * C ((2*i+1)*o, N)# This trivial algorithm uses N*N multiplications and N*(N-1) additions.# One possible decomposition on this calculus is to use the fact that C (i, N)# and C (2*N-i, N) are opposed. This can lead to this decomposition :#def unscaled_DCT (N, input, output):# even_input = vector (N)# odd_input = vector (N)# even_output = vector (N)# odd_output = vector (N)## for i in range(N/2):# even_input[i] = input[i] + input[N-1-i]# odd_input[i] = input[i] - input[N-1-i]## unscaled_DCT (N, even_input, even_output)# unscaled_DCT (N, odd_input, odd_output)## for i in range(N/2):# output[2*i] = even_output[2*i]# output[2*i+1] = odd_output[2*i+1]# Now the even part can easily be calculated : by looking at the C(k,N)# formula, we see that the even part is actually an unscaled DCT of size N/2.# The odd part looks like a DCT of size N/2, but the coefficients are# actually C ((2*i+1)*(2*o+1), 2*N) instead of C ((2*i+1)*o, N).# We use a trigonometric relation here :# 2 * C ((a+b)/2, N) * C ((a-b)/2, N) = C (a, N) + C (b, N)# Thus with a = (2*i+1)*o and b = (2*i+1)*(o+1) :# 2 * C((2*i+1)*(2*o+1),2N) * C(2*i+1,2N) = C((2*i+1)*o,N) + C((2*i+1)*(o+1),N)# This leads us to the Lee DCT algorithm :def unscaled_DCT_Lee (N, input, output): even_input = vector(N/2) odd_input = vector(N/2) even_output = vector(N/2) odd_output = vector(N/2) for i in range(N/2): even_input[i] = input[i] + input[N-1-i] odd_input[i] = input[i] - input[N-1-i] for i in range(N/2): odd_input[i] = odd_input[i] * (0.5 / C (2*i+1, N)) unscaled_DCT (N/2, even_input, even_output) unscaled_DCT (N/2, odd_input, odd_output) for i in range(N/2-1): odd_output[i] = odd_output[i] + odd_output[i+1] for i in range(N/2): output[2*i] = even_output[i] output[2*i+1] = odd_output[i];# Notes about this algorithm :# The algorithm can be easily inverted to calculate the IDCT instead :# each of the basic stages are separately inversible...# This function does N adds, then N/2 muls, then 2 recursive calls with# size N/2, then N/2-1 adds again. If we apply it recursively, the total# number of operations will be N*log2(N)/2 multiplies and N*(3*log2(N)/2-1) + 1# additions. So this is much faster than the canonical algorithm.# Some of the multiplication coefficients 0.5/cos(...) can get quite large.# This means that a small error in the input will give a large error on the# output... For a DCT of size N the biggest coefficient will be at i=N/2-1# and it will be slighly more than N/pi which can be large for large N's.# In the IDCT however, the multiplication coefficients for the reverse# transformation are of the form 2*cos(...) so they can not get big and there# is no accuracy problem.# You can find another description of this algorithm at# http://www.intel.com/drg/mmx/appnotes/ap533.htm# Another idea is to observe that the DCT calculation can be made to look like# the DFT calculation : C (k, N) is the real part of W (k, 4*N) or W (-k, 4*N).# We can use this idea translate the DCT algorithm into a call to the DFT# algorithm :def unscaled_DCT_DFT (N, input, output): DFT_input = vector (4*N) DFT_output = vector (4*N) for i in range(N): DFT_input[2*i+1] = input[i] #DFT_input[4*N-2*i-1] = input[i] # We could use this instead unscaled_DFT (4*N, DFT_input, DFT_output) for i in range(N): output[i] = DFT_output[i].real# We can then use our knowledge of the DFT calculation to optimize for this# particular case. For example using the radix-2 decimation-in-time method :#def unscaled_DCT_DFT (N, input, output):# DFT_input = vector (2*N)# DFT_output = vector (2*N)## for i in range(N):# DFT_input[i] = input[i]# #DFT_input[2*N-1-i] = input[i] # We could use this instead## unscaled_DFT (2*N, DFT_input, DFT_output)## for i in range(N):# DFT_output[i] = DFT_output[i] * W (i, 4*N)## for i in range(N):# output[i] = DFT_output[i].real# This leads us to the AAN implementation of the DCT algorithm : if we set# both DFT_input[i] and DFT_input[2*N-1-i] to input[i], then the imaginary# parts of W(2*i+1) and W(-2*i-1) will compensate, and output_DFT[i] will# already be a real after the multiplication by W(i,4*N). Which means that# before the multiplication, it is the product of a real number and W(-i,4*N).# This leads to the following code, called the AAN algorithm :def unscaled_DCT_AAN (N, input, output): DFT_input = vector (2*N) DFT_output = vector (2*N) for i in range(N): DFT_input[i] = input[i] DFT_input[2*N-1-i] = input[i] symetrical_unscaled_DFT (2*N, DFT_input, DFT_output) for i in range(N): output[i] = DFT_output[i].real * (0.5 / C (i, N))# Notes about the AAN algorithm :# The cost of this function is N real multiplies and a DFT of size 2*N. The# DFT to calculate has special properties : the inputs are real and symmetric.# Also, we only need to calculate the real parts of the N first DFT outputs.# We can try to take advantage of all that.# We can invert this algorithm to calculate the IDCT. The final multiply# stage is trivially invertible. The DFT stage is invertible too, but we have# to take into account the special properties of this particular DFT for that.# Once again we have to take care of numerical precision for the DFT : the# output coefficients can get large, so that a small error in the input will# give a large error on the output... For a DCT of size N the biggest# coefficient will be at i=N/2-1 and it will be slightly more than N/pi# You can find another description of this algorithm at this url :# www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fastdct.html# (It is the same server where we already found a description of the fast DFT)# To optimize the DFT calculation, we can take a lot of specific things into# account : the input is real and symetric, and we only care about the real# part of the output. Also, we only care about the N first output coefficients,# but that one does not save operations actually, because the other# coefficients are the conjugates of the ones we look anyway.# One useful way to use the symetry of the input is to use the radix-2# decimation-in-frequency algorithm. We can write a version of# unscaled_DFT_radix2_freq for the case where the input is symetrical :# we have removed a few additions in the first stages because even_input# is symetrical and odd_input is antisymetrical. Also, we have modified the# odd_input vector so that the second half of it is set to zero and the real# part of the DFT output is not modified. After that modification, the second# part of the odd_input was null so we used the radix-2 decimation-in-frequency# again on the odd DFT. Also odd_output is symetrical because input is real...def symetrical_unscaled_DFT (N, input, output): even_input = vector(N/2) odd_tmp = vector(N/2) odd_input = vector(N/2) even_output = vector(N/2) odd_output = vector(N/2) for i in range(N/4): even_input[N/2-i-1] = even_input[i] = input[i] + input[N/2-1-i] for i in range(N/4): odd_tmp[i] = input[i] - input[N/2-1-i] odd_input[0] = odd_tmp[0] for i in range(N/4)[1:]: odd_input[i] = (odd_tmp[i] + odd_tmp[i-1]) * W (i, N) unscaled_DFT (N/2, even_input, even_output) # symetrical real inputs, real outputs unscaled_DFT (N/4, odd_input, odd_output) # complex inputs, real outputs for i in range(N/2): output[2*i] = even_output[i] for i in range(N/4): output[N-1-4*i] = output[4*i+1] = odd_output[i]# This procedure takes 3*N/4-1 real additions and N/2-3 real multiplies,# followed by another symetrical real DFT of size N/2 and a "complex to real"# DFT of size N/4.# We thus get the following performance results :# N real additions real multiplies complex multiplies# 1 0 0 0# 2 0 0 0# 4 2 0 0# 8 9 1 0# 16 28 6 0# 32 76 21 0# 64 189 54 2# 128 451 125 10# 256 1042 270 36# 512 2358 565 108# 1024 5251 1158 294# 2048 11557 2349 750# 4096 25200 4734 1832# 8192 54544 9509 4336# 16384 117337 19062 10026# 32768 251127 38173 22770# 65536 535102 76398 50988# We thus get a better performance with the AAN DCT algorithm than with the# Lee DCT algorithm : we can do a DCT of size 32 with 189 additions, 54+32 real# multiplies, and 2 complex multiplies. The Lee algorithm would have used 209# additions and 80 multiplies. With the AAN algorithm, we also have the# advantage that a big number of the multiplies are actually grouped at the# output stage of the algorithm, so if we want to do a DCT followed by a# quantization stage, we will be able to group the multiply of the output with# the multiply of the quantization stage, thus saving 32 more operations. In# the mpeg audio layer 1 or 2 processing, we can also group the multiply of the# output with the multiply of the convolution stage...# Another source code for the AAN algorithm (implemented on 8 points, and# without all of the explanations) can be found at this URL :# http://developer.intel.com/drg/pentiumII/appnotes/aan_org.c . This# implementation uses 28 adds and 6+8 muls instead of 29 adds and 5+8 muls -# the difference is that in the symetrical_unscaled_DFT procedure, they noticed# how odd_input[i] and odd_input[N/4-i] will be combined at the start of the# complex-to-real DFT and they took advantage of this to convert 2 real adds# and 4 real muls into one complex multiply.# TODO : write about multi-dimentional DCT# TEST CODEdef dump (vector): str = "" for i in range(len(vector)): if i: str = str + ", " vector[i] = vector[i] + 0j realstr = "%+.4f" % vector[i].real imagstr = "%+.4fj" % vector[i].imag if (realstr == "-0.0000"): realstr = "+0.0000" if (imagstr == "-0.0000j"): imagstr = "+0.0000j" str = str + realstr #+ imagstr return "[%s]" % strimport whrandomdef test(N): input = vector(N) output = vector(N) verify = vector(N) for i in range(N): input[i] = whrandom.random() + 1j * whrandom.random() unscaled_DFT (N, input, output) unscaled_DFT (N, input, verify) if (dump(output) != dump(verify)): print dump(output) print dump(verify)#test (64)# PERFORMANCE ANALYSIS CODEdef display (table): N = 1 print "#\tN\treal additions\treal multiplies\tcomplex multiplies" while table.has_key(N): print "#%8d%16d%16d%16d" % (N, table[N][0], table[N][1], table[N][2]) N = 2*N printbest_complex_DFT = {}def complex_DFT (max_N): best_complex_DFT[1] = (0,0,0) best_complex_DFT[2] = (4,0,0) best_complex_DFT[4] = (16,0,0) N = 8 while (N<=max_N): # best method = split radix best2 = best_complex_DFT[N/2] best4 = best_complex_DFT[N/4] best_complex_DFT[N] = (best2[0] + 2*best4[0] + 3*N + 4, best2[1] + 2*best4[1] + 4, best2[2] + 2*best4[2] + N/2 - 4) N = 2*Nbest_real_DFT = {}def real_DFT (max_N): best_real_DFT[1] = (0,0,0) best_real_DFT[2] = (2,0,0) best_real_DFT[4] = (6,0,0) N = 8 while (N<=max_N): # best method = split radix decimate-in-frequency best2 = best_real_DFT[N/2] best4 = best_complex_DFT[N/4] best_real_DFT[N] = (best2[0] + best4[0] + N + 2, best2[1] + best4[1] + 2, best2[2] + best4[2] + N/4 - 2) N = 2*Nbest_complex2real_DFT = {}def complex2real_DFT (max_N): best_complex2real_DFT[1] = (0,0,0) best_complex2real_DFT[2] = (2,0,0) best_complex2real_DFT[4] = (8,0,0) N = 8 while (N<=max_N): best2 = best_complex2real_DFT[N/2] best4 = best_complex_DFT[N/4] best_complex2real_DFT[N] = (best2[0] + best4[0] + 3*N/2 + 1, best2[1] + best4[1] + 2, best2[2] + best4[2] + N/4 - 2) N = 2*Nbest_real_symetric_DFT = {}def real_symetric_DFT (max_N): best_real_symetric_DFT[1] = (0,0,0) best_real_symetric_DFT[2] = (0,0,0) best_real_symetric_DFT[4] = (2,0,0) N = 8 while (N<=max_N): best2 = best_real_symetric_DFT[N/2] best4 = best_complex2real_DFT[N/4] best_real_symetric_DFT[N] = (best2[0] + best4[0] + 3*N/4 - 1, best2[1] + best4[1] + N/2 - 3, best2[2] + best4[2]) N = 2*Ncomplex_DFT (65536)real_DFT (65536)complex2real_DFT (65536)real_symetric_DFT (65536)print "complex DFT"display (best_complex_DFT)print "real DFT"display (best_real_DFT)print "complex2real DFT"display (best_complex2real_DFT)print "real symetric DFT"display (best_real_symetric_DFT)
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -