📄 dsp_fft32x32_sa.sa

📁 TI C64x FFT program
*       The fft() code shown here performs the bulk of the computation      *
*       in place. However, because digit-reversal cannot be performed       *
*       in-place, the final result is written to a separate array, y[].     *
*                                                                           *
*       There is one slight break in the flow of packed processing that     *
*       needs to be comprehended. The real part of the complex number is    *
*       in the lower half, and the imaginary part is in the upper half.     *
*       The flow breaks in case of "xl0" and "xl1" because in this case     *
*       the real part needs to be combined with the imaginary part because  *
*       of the multiplication by "j". This requires a packed quantity like  *
*       "xl21xl20" to be rotated as "xl20xl21" so that it can be combined   *
*       using add2's and sub2's. Hence the natural version of C code        *
*       shown below is transformed using packed data processing as shown:   *
*                                                                           *
*                        xl0  = x[2 * i0    ] - x[2 * i2    ];              *
*                        xl1  = x[2 * i0 + 1] - x[2 * i2 + 1];              *
*                        xl20 = x[2 * i1    ] - x[2 * i3    ];              *
*                        xl21 = x[2 * i1 + 1] - x[2 * i3 + 1];              *
*                                                                           *
*                        xt1  = xl0 + xl21;                                 *
*                        yt2  = xl1 + xl20;                                 *
*                        xt2  = xl0 - xl21;                                 *
*                        yt1  = xl1 - xl20;                                 *
*                                                                           *
*                        xl1_xl0   = _sub2(x21_x20, x21_x20)                *
*                        xl21_xl20 = _sub2(x32_x22, x23_x22)                *
*                        xl20_xl21 = _rotl(xl21_xl20, 16)                   *
*                                                                           *
*                        yt2_xt1   = _add2(xl1_xl0, xl20_xl21)              *
*                        yt1_xt2   = _sub2(xl1_xl0, xl20_xl21)              *
*                                                                           *
*       Also notice that xt1, yt1 end up on separate words, these need to   *
*       be packed together to take advantage of the packed twiddle          *
*       factors that have been loaded. In order for this to be achieved     *
*       they are re-aligned as follows:                                     *
*                                                                           *
*       yt1_xt1 = _packhl2(yt1_xt2, yt2_xt1)                                *
*       yt2_xt2 = _packhl2(yt2_xt1, yt1_xt2)                                *
*                                                                           *
*       The packed words "yt1_xt1" allow the loaded "sc" twiddle factor     *
*       to be used for the complex multiplies. The real part of the         *
*       complex multiply is implemented using _dotp2. The imaginary         *
*       part of the complex multiply is implemented using _dotpn2           *
*       after the twiddle factors are swizzled within the half word.        *
*                                                                           *
*       (X + jY) (C - jS) = (XC + YS) + j (YC - XS).                        *
*                                                                           *
*       The actual twiddle factors for the FFT are cosine, - sine. The      *
*       twiddle factors stored in the table are cosine and sine, hence      *
*       the sign of the "sine" term is comprehended during multipli-        *
*       cation as shown above.                                              *
*                                                                           *
*                                                                           *
*   ASSUMPTIONS                                                             *
*                                                                           *
*       The size of the FFT, n, must be a power of 4 and greater than       *
*       or equal to 16 and less than 32768.                                 *
*                                                                           *
*       The arrays 'x[]', 'y[]', and 'w[]' all must be aligned on a         *
*       double-word boundary for the "optimized" implementations.           *
*                                                                           *
*       The input and output data are complex, with the real/imaginary      *
*       components stored in adjacent locations in the array.  The real     *
*       components are stored at even array indices, and the imaginary      *
*       components are stored at odd array indices.                         *
*                                                                           *
*   C CODE                                                                  *
*                                                                           *
*                                                                           *
* ------------------------------------------------------------------------- *
*             Copyright (c) 2007 Texas Instruments, Incorporated.           *
*                            All Rights Reserved.                           *
* ========================================================================= *

                .sect ".text:psa"
                .global _DSP_fft32x32
* ======================================================================== *
*S Place file level definitions here.                                     S*
* ======================================================================== *
_DSP_fft32x32    .cproc A_ptr_w, B_n, A_ptr_x, B_ptr_y
             .no_mdep
; ====================== SYMBOLIC REGISTER ASSIGNMENTS =======================
        .rega           A_fft_jmp
        .rega           A_y
        .regb           B_y
        .regb           B_i
        .rega           A_w
        .regb           B_w
        .regb           B_x_1:B_x_0
        .rega           A_x_3:A_x_2
        .regb           B_xl1_1i:B_xl1_0i
        .rega           A_xl1_3i:A_xl1_2i
        .regb           B_xl2_1i:B_xl2_0i
        .rega           A_xl2_3i:A_xl2_2i
        .regb           B_xh2_1i:B_xh2_0i
        .rega           A_xh2_3i:A_xh2_2i
        .regb           B_2h2
        .rega           A_2h2
        .regb           B_xh0_0:B_xl0_0
        .regb           B_xh1_0:B_xl1_0
        .rega           A_xh0_1:A_xl0_1
        .rega           A_xh1_1:A_xl1_1
        .regb           B_xh20_0:B_xl20_0
        .regb           B_xh21_0:B_xl21_0
        .rega           A_xh20_1:A_xl20_1
        .rega           A_xh21_1:A_xl21_1
        .regb           B_xt1_0:B_xt2_0
        .regb           B_yt2_0:B_yt1_0
        .rega           A_xt1_1:A_xt2_1
        .rega           A_yt2_1:A_yt1_1
        .regb           B_x_1o:B_x_0o
        .rega           A_x_3o:A_x_2o
        .regb           B_xh2_1o:B_xh2_0o
        .rega           A_xh2_3o:A_xh2_2o
        .regb           B_xl1_1o:B_xl1_0o
        .rega           A_xl1_3o:A_xl1_2o
        .regb           B_xl2_1o:B_xl2_0o
        .rega           A_xl2_3o:A_xl2_2o
        .rega           A_x1:A_x0
        .regb           B_x3:B_x2
        .rega           A_x5:A_x4
        .regb           B_x7:B_x6
        .rega           A_xh0_0:A_xl0_0
        .rega           A_xh1_0:A_xl1_0
        .regb           B_xh0_1:B_xl0_1
        .regb           B_xh1_1:B_xl1_1
        .regb           B_y1:B_y0
        .rega           A_y3:A_y2
        .regb           B_y5:B_y4
        .rega           A_y7:A_y6
        .regb           B_w0,  B_x
        .rega           A_x
        .regb           B_xp1:B_xp0
        .rega           A_l1
        .rega           A_xl1p1:A_xl1p0
        .rega           A_h2
        .regb           B_xh2p1:B_xh2p0
        .rega           A_l2
        .rega           A_xl2p1:A_xl2p0
        .rega           A_xh0, A_xh1_0c
        .rega           A_xh1
        .regb           B_xl0
        .regb           B_xl1
        .rega           A_xh20
        .rega           A_xh21
        .regb           B_xl20
        .regb           B_xl21, B_xl1_1c
        .rega           A_y_h1_1:A_y_h1_0
        .rega           A_w0
        .rega           A_j
        .regb           B_j
        .regb           B_co10:B_si10
        .rega           A_co20:A_si20
        .regb           B_co30:B_si30
        .rega           A_co11:A_si11
        .regb           B_co21:B_si21
        .rega           A_co31:A_si31
        .rega           A_xt0
        .rega           A_yt0
        .regb           B_xt1
        .regb           B_yt2
        .regb           B_xt2
        .regb           B_yt1
        .regb           B_p0r
        .rega           A_p1r
        .regb           B_p01r
        .regb           B_p0c
        .rega           A_p1c
        .regb           B_y_h2_1:B_y_h2_0
        .regb           B_p01c
        .rega           A_p2r
        .rega           A_p3r
        .rega           A_p23r
        .rega           A_p2c
        .rega           A_p3c
        .rega           A_y_l1_1:A_y_l1_0
        .rega           A_p23c
        .regb           B_p4r
        .regb           B_p5r
        .regb           B_p45r
        .regb           B_p4c
        .regb           B_p5c
        .regb           B_y_l2_1:B_y_l2_0
        .regb           B_p45c
        .rega           A_x_1
        .rega           B_x__
        .regb           B_fft_jmp
        .rega           A_fft_jmp_1
        .rega           A_ifj
        .regb           B_ifj
        .regb           B_h2
        .regb           B_l1
        .rega           A_i
        .regb           B_xt0_0, B_yt0_0
        .rega           A_xt0_1, A_yt0_1
        .regb           B_p0, B_p1, B_p2, B_p3
        .rega           A_p4, A_p5, A_p6, A_p7
        .rega           A_p8, A_pb, A_pc, A_pe
        .regb           B_p9, B_pa, B_pd, B_pf
        .regb           B_p10, B_p11, B_p12, B_p13
        .rega           A_p14, A_p15, A_p16, A_p17
        .rega           A_tw_offset
        .regb           B_stride, B_while
        .rega           A_p_x0
        .regb           B_p_x0
        .regb           B_p_y0, B_p_y1, B_p_y2, B_p_y3
        .regb           B_h0, B_h1, B_h3, B_h4
        .rega           A_r2, A_radix, A_temp
        .regb           B_j0, B_radix2
; ======================================================================
          ;-------------------------------------------------------------;
          ;  Assume radix is 4, by default. Check the norm of the # of  ;
          ; points to be transformed, and change radix to 2 if reqd.    ;
          ;-------------------------------------------------------------;
        MVK     .1     4,                A_radix
        NORM    .2     B_n,              B_radix2
        AND     .2     B_radix2,         1,                B_radix2
[B_radix2]MVK   .1     2,                A_radix

          ;-------------------------------------------------------------;
          ; "stride" is a variable that denotes the separation between  ;
          ; the legs of the butterfly. "tw_offset" is the offset within ;
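The packed butterfly described in the header comment and the radix check at the top of the routine can be tried on a host machine with a small C sketch. It is not TI's code. The `emu_*` functions are portable stand-ins written for this illustration, assuming the usual semantics of the `_add2`, `_sub2`, `_rotl` and `_packhl2` intrinsics and of `NORM` for a positive operand. The names `pack2`, `packed_out`, `packed_butterfly` and `select_radix` are hypothetical helpers and do not appear in the original file.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit values into one word: hi in bits 31..16, lo in bits 15..0. */
static uint32_t pack2(int16_t hi, int16_t lo) {
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Assumed _add2 behaviour: independent 16-bit adds on both halves. */
static uint32_t emu_add2(uint32_t a, uint32_t b) {
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Assumed _sub2 behaviour: independent 16-bit subtracts on both halves. */
static uint32_t emu_sub2(uint32_t a, uint32_t b) {
    uint32_t hi = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Assumed _rotl behaviour: 32-bit rotate left. */
static uint32_t emu_rotl(uint32_t a, unsigned n) {
    n &= 31u;
    return n ? (a << n) | (a >> (32u - n)) : a;
}

/* Assumed _packhl2 behaviour: upper half of a, lower half of b. */
static uint32_t emu_packhl2(uint32_t a, uint32_t b) {
    return (a & 0xFFFF0000u) | (b & 0x0000FFFFu);
}

typedef struct { uint32_t yt1_xt1, yt2_xt2; } packed_out;

/* Follows the sequence in the header comment: rotate, add2/sub2, re-pack. */
static packed_out packed_butterfly(uint32_t xl1_xl0, uint32_t xl21_xl20) {
    uint32_t xl20_xl21 = emu_rotl(xl21_xl20, 16);
    uint32_t yt2_xt1   = emu_add2(xl1_xl0, xl20_xl21);
    uint32_t yt1_xt2   = emu_sub2(xl1_xl0, xl20_xl21);
    packed_out r;
    r.yt1_xt1 = emu_packhl2(yt1_xt2, yt2_xt1);
    r.yt2_xt2 = emu_packhl2(yt2_xt1, yt1_xt2);
    return r;
}

/* Mirrors the MVK/NORM/AND/[cond] MVK sequence for a positive n: count the
   redundant sign bits, then pick radix 2 when that count is odd. */
static int select_radix(int32_t n) {
    assert(n > 0);
    int norm = 0;
    for (int bit = 30; bit >= 0 && !((n >> bit) & 1); --bit) {
        norm++;
    }
    return (norm & 1) ? 2 : 4;
}
```

With xl1 = 7, xl0 = 3, xl21 = 5 and xl20 = 2, the scalar equations give xt1 = 8, yt1 = 5, xt2 = -2 and yt2 = 9, and `packed_butterfly(pack2(7, 3), pack2(5, 2))` yields the same values packed as yt1:xt1 and yt2:xt2. Under the same assumptions, `select_radix` returns 4 for n = 16 and 2 for n = 32.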