/*******************************************************************************
Copyright(c) 2000 - 2002 Analog Devices. All Rights Reserved.
Developed by Joint Development Software Application Team, IPDC, Bangalore, India
for Blackfin DSPs  ( Micro Signal Architecture 1.0 specification).

By using this module you agree to the terms of the Analog Devices License
Agreement for DSP Software. 
********************************************************************************
Module Name     : CFFT_Rad4_PS_PBRev.asm
Label name      : __CFFT_Rad4_PS_PBRev
Version         : 1.4
Change History  :

                Version   Date            Author        Comments
                1.4       11/18/2002      Swarnalatha   Tested with VDSP++ 3.0
                                                        compiler 6.2.2 on 
                                                        ADSP-21535 Rev.0.2
                1.3       11/13/2002      Swarnalatha   Tested with VDSP++3.0
                                                        on ADSP-21535 Rev.0.2
                1.2       02/18/2002      Nishanth      Modified to match 
                                                        silicon cycle count
                1.1       11/12/2001      Nishanth      Modified to include 
                                                        initialization of M2
                1.0       05/01/2001      Nishanth      Original

Description     : The assembly function implements a radix-4, out-of-place,
                  decimation-in-time (DIT) FFT algorithm for complex inputs.
                 
                  The twiddle factor array to be passed to the function must be 
                  initialized with real(cos) and imaginary(-sine) values 
                  alternately.
                  The length of the twiddle factor array should be 3*N/4 - 2, 
                  where N is the number of FFT points. 
                             w = e^(-2*j*pi*[0 : 3*N/4-3]/N)

                  The input array to be passed to the function also must be 
                  initialized with real and imaginary values alternately.
                  The real and imaginary output values will be stored 
                  alternately in the output array.

                  Static scaling : Input data is scaled down by 4 in the first
                  stage to avoid overflow. The output of every stage except the
                  last is also scaled down by 4 to avoid overflow.

Assumptions     : 1. There should be at least 2 stages, since the first stage is 
                     done separately. Bit reversal is done in the first stage, 
                     so it is separated out. The last stage does not require 
                     output scaling, so it is also separated out. If N = 16, 
                     the computation for the intermediate stages is skipped by 
                     the conditional jump.
                     In brief, the FFT length should be at least 16 and should 
                     be an integer power of 4.

                  2. in[], w[] and out[] should be aligned to a 4-byte boundary.

                  3. All the inputs are assumed to be scaled by 4.

                  4. Input vector is assumed to be stored in bit-reversed order.

                  5. out[] and w[] should be in different minibanks.

                  6. in[] and out[] should be in different minibanks.

Prototype       : void _CFFT_Rad4_PS_PBRev(
                        complex_fract16 in[],
                            // (i) :  Pointer to the input array. 
                        complex_fract16 out[],
                            // (o) :  Pointer to the output array. 
                        int             N,
                            // (i) :  FFT length. 
                        complex_fract16 w[])
                            // (i) :  Twiddle factor array 

Registers used  : A0,A1, R0-R7, I0-I3, B1,B3, M0-M3, L0-L3, P0-P2, LC0,LC1, CC

Performance     :
                Code Size   : 472 Bytes.
                Cycle Count : 3 * N * M   +   20 * M   -   2 * N   +   19
                          where N = FFT length and M = log(N) to the base 4.
                      123 cycles for FFT size of   16.
                      527 cycles for FFT size of   64.
                     2659 cycles for FFT size of  256.
                    13431 cycles for FFT size of 1024.
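                Worked check of the formula for N = 64 (M = 3):
                      3*64*3 + 20*3 - 2*64 + 19 = 576 + 60 - 128 + 19 = 527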

******************************************************************************/
.section L1_code;
.align 8;
.global __CFFT_Rad4_PS_PBRev;
    
__CFFT_Rad4_PS_PBRev:
    [--SP] = (R7:4);        // Save registers R4-R7
    P1 = R2;                // P1 = N, length of FFT
    
    I0 = R0;                // Address of input array
    
    I1 = R1;                // Address of output buffer(read pointer)
    I3 = R1;                // Address of output buffer(write pointer)
    B1 = R1;                // Base address of circular buffer
    B3 = R1;
    R0 = R2 << 2 || R3 = [SP + 28]; 
                            // R0 = 4*N , R3 = Address of twiddle factor array
    L0 = 0;                 // Disable circular buffering
    L2 = 0;
    L1 = R0;                // Circular buffering enabled
    L3 = R0;
    M0 = 12;                // Modifier to decrement output pointer for 3 
                            // dummy writes
    I2 = R3;                // Address of twiddle factor array
    
    R0 = R2 << 1 || I1 -= M0;
                            // R0 = 2*N , Decrement output pointer by 12 for 
                            // skewing 
    M3 = R0;                // M3 = 2*N
    
    P1 = P1 >> 2;           // P1 = N/4;
    M2 = 0;                 // To avoid overflowing of I2 during dummy 
                            // increments at the start
    P0 = 16;                // Modifier for fetching input
    P2 = 0;                 // Loop counter for number of butterflies in each 
                            // group
    
// Start of first stage with input fetching (input is assumed to be in 
// bit-reversed order).
// Input is assumed to be scaled by four. The output is also scaled by four by 
// the ASR option.
    R0 = [I0++];            // x0
    R3 = [I0++];            // x2
    R1 = [I0++];            // x1
    LSETUP(STAGE1_ST,STAGE1_END) LC0 = P1;
                            // There are N/4 butterflies in first stage
STAGE1_ST:
        R4 = R0 +|+ R3 , R5 = R0 -|- R3(ASR) || R3 = [I0++] || [I1++] = R5;
                            // A = x0 +|+ x2, B = x0 -|- x2, fetch x3, store y1 
                            // of prev. bf
        R6 = R1 +|+ R3 , R1 = R1 -|- R3 (ASR,CO) || R0 = [I0++] || [I1++] = R6;
                            // C = x1 +|+ x3, D = x1 -|- x3(CO), fetch x0, store
                            // y2 of prev. bf
        R4 = R4 +|+ R6 , R6 = R4 -|- R6(ASR) || R3 = [I0++] || [I1++] = R7;
                            // y0 = A +|+ C, y2 = A -|- C, fetch x2, store y3 of
                            // prev. bf
STAGE1_END:     
        R7 = R5 +|- R1 , R5 = R5 -|+ R1(ASR) || R1 = [I0++] || [I1++] = R4;
                            // y3 = B +|- D, y1 = B -|+ D, fetch x1, store y0 of
                            // this bf
    [I1++] = R5;            // Store y1 of last butterfly
    [I1++] = R6;            // Store y2 of last butterfly
    [I1++] = R7;            // Store y3 of last butterfly
                            // End of first stage with input fetching by bit 
                            // reversal
    
// Start of intermediate stages. All the stages except first and last done here.
INTER_STG:
                            // This loop (INTER_STG) runs (number of 
                            // stages - 2) times
    P2 += 1;                // Butterfly counter is incremented
    P2 = P2 << 2;           // Number of butterflies is multiplied by 4.
    P2 += -1;               // Loop counter is decremented as first butterfly is
                            // done outside.
