📄 wvec.asm

📁 TMS320bbs（源程序）的c67xfiles文件。用于在CCS2.0集成编译环境下实现TI的c67x系列DSP开发。是用DSP汇编语言
💻 ASM
字号:
*===============================================================================**	TEXAS INSTRUMENTS, INC.		**	WEIGHTED VECTOR SUM (floating point)**	Revision Date:	02/26/98*	*	USAGE	This routine is C Callable and can be called as:*		*               void w_vec(const float *x, const float *y, const float m, *					float *c, const short count);**		x is pointer to array holding the floating point vector being weighted*		y is pointer to array holding the floating point summation vector		*		m is floating point weighting factor*		c is pointer to array which will be floating point output vector*		count is length of the vectors**		If the routine is not to be used as a C callable function,*		then you need to initialize values for all of the parameters*		passed to the function since these are assumed to be in*		registers as defined by the calling convention of the*		compiler, (refer to the TMS320C6x Optimizing C Compiler*		User's Guide).**	C CODE*		This is the C equivalent for the assembly code.  Note that*		the assembly code is hand optimized and restrictions may*		apply.**		void w_vec(const float *x, const float *y, const float m, float *c, const short n)*		{*			short i;**			for (i=0; i<n; i++) {*				c[i] = (m * x[i]) + y[i];*			}*		}**	DESCRIPTION**		This routine is used to obtain the weighted vector sum.  Vector *		x is weighted by a factor of m and then added to vector y. Both *		the inputs and output are floating point vectors.**	TECHNIQUES**		1.  A LDDW instructions are used to load two floating point *		    values simultaneously for the x array.	*		2.  The loop is unrolled once and software pipelined.**	ASSUMPTIONS**		1.  Little Endian is assumed for LDDW instructions.*		2.  The value of count (and the number of entries in the *		    array x) must be greater than or equal to 12 and *		    even (i.e. 12, 14, 16, ...).*		3.  Since single assignment of registers is not used,*		    interrupts should be disabled before this function is *                   called.*		*	MEMORY NOTE**		The arrays x and y should be aligned on opposite*		(even and odd) double word boundaries to avoid memory*		bank hits.  This routine does not perform extraneous loads.**       ARGUMENTS PASSED**		*x	 ->  A4*		*y	 ->  B4*		m	 ->  A6*		*c	 ->  B6	*		n	 ->  A8**	CYCLES*		n + 12*	*===============================================================================	.global _w_vec	.text_w_vec:*** BEGIN Benchmark Timing ***** --------------------------------------------------------------------------** PIPED LOOP PROLOG	LDDW	.D1T1	*A4++(8),A11:A10	;@	load x[i+1], x[i] from memory ||	MV	.L2X	A6,B9			;	move weighting term m into B9||	ADD	.S1X	B6,4,A12		;	make A12 c[i+1]	SUB	.L2X	A8,6,B2			;2	cntr = n - 6	LDDW	.D1T1	*A4++(8),A11:A10	;@@	load x[i+1], x[i] from memory ||	SUB	.L2	B2,B0,B2		;	cntr = cntr - B0	MV	.L1X	B2,A2			;4	cntr2 = cntr (for preventing extranious loads)	LDDW	.D1T1	*A4++(8),A11:A10	;@@@	load x[i+1], x[i] from memory||	LDDW	.D2T2	*B4++(8),B11:B10	;@	load y[i+1], y[i] from memory	MPYSP	.M1	A11,A6,A0		;@	prod2 = x[i+1] * m ||	MPYSP	.M2X	A10,B9,B0		;@	prod1 = x[i] * m	LDDW	.D1T1	*A4++(8),A11:A10	;@@@@	load x[i+1], x[i] from memory||	LDDW	.D2T2	*B4++(8),B11:B10	;@@	load y[i+1], y[i] form memory	MPYSP	.M1	A11,A6,A0		;@@	prod2 = x[i+1] * m||	MPYSP	.M2X	A10,B9,B0		;@@	prod1 = x[i] * m	LDDW	.D1T1	*A4++(8),A11:A10	;@@@@@	load x[i+1], x[i] from memory||	LDDW	.D2T2	*B4++(8),B11:B10	;@@@	load y[i+1], y[i] from memory|| 	B	.S1	LOOP			;@	if (cntr) branch to loop	MPYSP	.M1	A11,A6,A0		;@@@	prod2 = x[i+1] * m||	MPYSP	.M2X	A10,B9,B0		;@@@	prod1 = x[i] * m||	ADDSP	.L1X	A0,B11,A1		;@	sum2 = prod2 + y[i+1]||	ADDSP	.L2	B0,B10,B1		;@	sum1 = prod1 + y[i]	LDDW	.D1T1	*A4++(8),A11:A10	;@@@@@@	load x[i+1], x[i] from memory||	LDDW	.D2T2	*B4++(8),B11:B10	;@@@@	load y[i+1], y[i] from memory||	B	.S1	LOOP			;@@	if(cntr) branch to loop	MPYSP	.M1	A11,A6,A0		;@@@@	prod2 = x[i+1] * m||	MPYSP	.M2X	A10,B9,B0		;@@@@	prod1 = x[i] * m||	ADDSP	.L1X	A0,B11,A1		;@@	sum2 = prod2 + y[i+1]||	ADDSP	.L2	B0,B10,B1		;@@	sum1 = prod1 + y[i]	;** --------------------------------------------------------------------------*LOOP:     ; PIPED LOOP KERNEL   [A2]	LDDW	.D1T1	*A4++(8),A11:A10	;@@@@@@@if (cntr2) load x[i+1], x[i] from memory|| [A2]	LDDW	.D2T2	*B4++(8),B11:B10	;@@@@@	if (cntr2) load y[i+1], y[i] from memory|| [B2]	B	.S1	LOOP			;@@@	if (cntr) branch to loop|| [B2]	SUB	.S2	B2,2,B2			;	if (cntr) cntr = cntr - 2	MPYSP	.M1	A11,A6,A0		;@@@@@	prod2 = x[i+1] * m||	MPYSP	.M2X	A10,B9,B0		;@@@@@	prod1 = x[i] * m||	ADDSP	.L1X	A0,B11,A1		;@@@	sum2 = prod2 + y[i+1]||	ADDSP	.L2	B0,B10,B1		;@@@	sum1 = prod1 + y[i]||	STW	.D1	A1,*A12++(8)		;@	store sum2 into c[i+1]||	STW	.D2	B1,*B6++(8)		;@	store sum1 into c[i]|| [A2] SUB	.S1	A2,2,A2			;	if (cntr2) cntr2 = cntr2 - 2                             *** --------------------------------------------------------------------------*		B	.S2	B3			;21	return from function       B_END:                  	NOP		5             	*** END Benchmark Timing ***
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -