⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 cr2fftnasm_outplace.asm

📁 快速FFT,汇编原程序
💻 ASM
📖 第 1 页 / 共 2 页
字号:
/*******************************************************************************************
Copyright(c) 2000 Analog Devices/Intel
Developed by JD(FRIO) Software Application Team, IPDC, Bangalore, India
********************************************************************************************
  File Name    : cr2fftn_outplace.asm
  Module Name  : one dimensional radix 2 forward FFT for complex data (Out of place).
  Label name   :  __cfftN
  Description  : This file contains the code for the implementation of FFT. The Algorithm 
                 used is Decimation in Time. For the optimization point of view, the whole
				 computaion of butterfly signal flow has beeen divided in three parts. The
				 first part, the middle part and the last part. In the first part, the 
				 Stage 1 and Stage 2 of the butterfly structure are implemented. In the 2nd
				 part the general butterfly computation is done, which corresponds to the 
			     middle stages of butterfly. In the last part the last stage of the buttrfly
				 structure is implemented, where mainly the loop overheads are saved.

				 Input and output both the data are complex and 16 bit and represented in 1.15 
				 format. The loop unroling is also done for optimization point of view. 
				
				 The C callable prototype of the function is as follows:

				 void cfftN(complex_fract16  *in,  complex_fract16 *out, complex_fract16 *w, 
				            int wst, int n);
		  
				*in -> Pointer to Input array.  
				*out -> Pointer to Output array.
				*w -> Pointer to Twiddle factor. 
				wst -> Twiddle factor Stride. It is equal to 512/(size of input).
				n -> length of Input data.

				 The C equivalent code for the main loop of each butterfly is 
				 as follows.

				 for(i=0; i<le; i++)
                 {   
				   xire = out[add].re >> 1;
		           xiim = out[add].im >> 1;
		           xipre = out[add+indx].re >> 1;
		           xipim = out[add+indx].im >> 1;

				   mult.re = ((xipre * wptr[indx_w].re - xipim * wptr[indx_w].im) >>15);
				   mult.im = ((xipre * wptr[indx_w].im + xipim * wptr[indx_w].re) >>15);
				
				   out[add].re = xire + mult.re; 
			       out[add].im = xiim + mult.im;
				   out[add+indx].re = xire - mult.re;
				   out[add+indx].im = xiim - mult.im;
				   add = add + 2*offset;
			    }
				The ouput is  provided in normal order.
 
  Restrictions : The length of input array should be more than 4.i.e, 8, 16, 32
                 and it should be power of 2.

  Registers Used : 
       R0 -> It is mainly used for counter for middle stages. Its value is equal to m-3.
 	         if m are the total number of stages for a perticular n.
	   R1 -> It is used for the storing the value of wst. wst = 512/n.
	   R2 -> Used for storing Input/Output data.
	   R3 -> used for storing the twiddle factor value.
	   R4 -> Used for storing Input/Output data.
	   R5 -> Used for storing Input/Output data.
	   R6 -> Used for storing Input/Output data.
	   R7 -> Used for calculating and storing the address offset for twiddle factor 
	         array.
	   P0 -> It is used for storing the address offset of output buffer in middle part
	         of implementation.
	   P1 -> It is used for storing the address offset of output buffer in middle part
	         of implementation.
	   P2 -> It is used for storing the address offset of output buffer in middle part
	         of implementation.
	   P3 -> It stores the number of lines for the butterflies at a perticular stage.
	   P4 -> It stores the value of input array length.
	   P5 -> It stores the number of butterflies at a perticular stage.
	   A0 -> Used for storing the value of MAC temporarily.
	   A1 -> Used for storing the value of MAC temporarily.
	   B0 -> Start address of Input array.
	   B2 -> start address of output array.
	   B3 -> Start address of twiddle factor buffer.
       I0 -> Address for input array.
	   I1 -> Address for output and temporary array while computing.
       I2 -> Address for output array while computing.
	   I3 -> Address for twiddle factor array.
	   M0 -> Address offset for input/output array.
	   M1 -> Address offset for output array.
	   M2 -> Address offset for output array.
	   M3 -> Address offset for twiddle factor.
	   LC0 -> Counter.
       LC1 -> Counter.

PERFORMANCE: (With new timerfrio1.dll)

	   N = 256.
	   The cycle count for 256 point fft calculation is 3176 cycles.
	   The Cycle count for Stage1 + Stage2 + Bitreverse computation  =  389
 	   The Cycle count for middle stage computation                  = 2291
	   The Cycle count for last stage computation                    =  397
	   The Cycle count for prologue, epilogue... etc                 =   99
	   ----------------------------------------------------------------------
	                                                   TOTAL         = 3176
	
	   Code Size : 484 Bytes.

****************************************************************************************/
		.section					data1;
		.align						4096;
		.global             		_in;
		.var 						_in[512];
		.align						4096;
		.global             		_out;
		.var 						_out[512];
		.align 					 	4096;
		.global                     _w;
		.var                        _w[256];

/***************************************************************************************/

.section 				program;
.global					__cfftN;
.align                  8;

__cfftN:

/**************** Function Prologue ****************************************************/

/*
*  B0 stores the starting address of Input array.
*  B2 stores the starting address of Output array.
*  B3 stores the starting address of Twiddle factor.
*/
		B0 = R0;             //Pointer of Input buffer
		B2 = R1;             //Pointer of output buffer
		B3 = R2;             //Address of Twiddle factor
		M3 = B2;
		P1 = [SP+12];		 //The value of wst.  wst = 512/n.
		R2 = [SP+16];        //length of input array
		[--SP] = (R7:4, P5:3);
	
		CC = R2 <= 4 (IU);      //Come out of the program if the size of input data
 	    If CC Jump Terminate;   //is less than equal to 4.
/***************************************************************************************/

/************************** Implementation of First part *******************************/

/* 
* The stage 1 and 2 of the butterfly structure are implemented. After the comput-
* ation the result is stored to output array. The main reason for separating it out from
* the general computation, is that the multiplications of both the stages can be avoided. 
* 
* In the first stage of signal flow, there are n/2 number of butterflies. Each butterfly
* works on two inputs. These input are multiplied by W0 and added and subtracted. 
* Multiplying a data with W0 which is eqaul to 1 +0j will result the same data.
* 
* In the  second stage the number of butterflies are n/4. The data are added and subtra-
* cted after multiplication with W0 and Wn/4. The multiplication with W0 doesn't have
* any impact. The multiplication of data x+jy with Wn/4 will give the value y-jx. 
*
* Therefore, the multiplications involved in both the stages 1 and 2, can be reduced to 
* additions only. The output is stored after dividing by 2 for scaling purpose. In one loop
* data correspodding to 2 butterflies are processed.
*
* The input is read from Input array and output of computation at this stage is stored in
* output buffer, which is of the size of input array and passed at the time of calling
* the fft function.
*/
		P4 = R2;	
		I0 = B0;               //Address of Input array
		I2 = B2;               //Address of Output array
        R3 = R2 << 1 || nop || nop;      
		M0 = R3;              //M0 stores the offset required for bit reversing
		P5 = P4 >> 2;         //P5 is equal to number of butterflies at Stage 2
		P0 = P5 >> 1;         //P0 is half of th enumber of butterflies at stage 2.
		P0 += -1;             //P0 is one less than 1/2 of butterfly at stage 2.

/*
*  Register I0 is used for reading Input from Input array from Bit reversed locations.
*  I2 is used for writing the output of this stage in output buffer in normal
*  order.
*  The register set R2, R3, R4 and R5 process the four data corresponding to one
*  butterfly at Stage 2.
*  Register set R0, R1, R6 and R7 process the four data corresponding to next
*  butterfly at Stage 2.
*  The loop is set for one less than the half of the number of butterflies at stage2.
*  The last butterfly is processed separately.
*  All the data are divided by 2, before processing to avoid saturation and for

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -