/*******************************************************************************
Copyright(c) 2000 - 2002 Analog Devices. All Rights Reserved.
Developed by Joint Development Software Application Team, IPDC, Bangalore, India
for Blackfin DSPs ( Micro Signal Architecture 1.0 specification).
By using this module you agree to the terms of the Analog Devices License
Agreement for DSP Software.
********************************************************************************
Module name : sadct.asm
Label name : __sadct
Version : 1.3
Change History :
    Version  Date        Author       Comments
    1.3      11/18/2002  Swarnalatha  Tested with VDSP++ 3.0
                                      compiler 6.2.2 on
                                      ADSP-21535 Rev.0.2
    1.2      11/13/2002  Swarnalatha  Tested with VDSP++ 3.0
                                      on ADSP-21535 Rev.0.2
    1.1      03/10/2002  Manoj        Modified to match
                                      silicon cycle count
    1.0      07/20/2001  Manoj        Original
Description : This program performs the shape-adaptive DCT (SADCT) on an 8x8
block as prescribed in the MPEG-4 standard. The input data array x[] to be
transformed is in short format (as the difference can have a 9-bit range),
with the elements ordered as
    x00, x01 ... x07;
    x10, x11 ... x17;
    ........
    x70, x71 ... x77;
and the corresponding shape information (alpha map) is in character format.
Consider a 3x3 shape array (taken for ease of demonstration). The following
sequence of operations is performed.
a) Perform column alignment (shift the shape elements of each column to the
top) as shown
    [255 255   0 ;    column    [255 255 255 ;
       0   0 255 ;   =======>      0 255   0 ;   =====>
       0 255   0 ]    align        0   0   0 ]
    xc = [x00 x01 x12 ;
            0 x21   0 ;
            0   0   0 ]
b) Perform a DCT of the appropriate length on each of the columns of xc,
i.e. perform DCT(1) on column 1 of xc, DCT(2) on column 2 of xc and DCT(1)
on column 3 of xc, where DCT(N) is the NxN matrix with elements
    DCT(N)[i][j] = K * cos(i*(j+0.5)*(pi/N)),   i, j in [0, N),
    K = sqrt(1/N) for i = 0,
    K = sqrt(2/N) otherwise.
N is the number of shape elements in the column on which the SADCT is being
performed. In this program, the DCT coefficients are stored in an array and
a direct matrix multiplication is used to implement the DCT. Note that for
N > 6 a special flowgraph implementation of the DCT would be optimal
(considering the conditional branch and the DCT complexity); however, this
has not been incorporated in this program. For DCT(8), Chen's DCT will
suffice. If required, a flowgraph for DCT(7) has to be integrated. Since the
cycle count is highly dependent on the shape array, the user has to decide
prudently whether to use the flowgraph approach in an application, weighing
code size against speed improvement. If most of the elements in the shape
array are non-zero, using the SADCT instead of the normal DCT is not
advisable, considering the computational load. However, the user has to
choose between SNR loss and cycle count increase based on the particular
application. In the implementation provided, two SADCT outputs are computed
simultaneously, at the cost of slightly more memory for the coefficients.
The column SADCT transformed array XC is
    [X00 X10 X20 ;
       0 X11   0 ;
       0   0   0 ]
c) Perform row alignment as shown
    [255 255 255 ;     row     XR = [X00 X10 X20 ;
       0 255   0 ;    =====>         X11   0   0 ;
       0   0   0 ]    align            0   0   0 ]
d) Perform a DCT of the appropriate length on each row of XR, i.e. perform
DCT(3) on row 1 of XR and DCT(1) on row 2 of XR, and skip row 3 as it has no
non-zero elements.
A new technique that avoids storing the intermediate shape after the
top-column alignment is adopted here to save stack space and to optimize the
cycle count.
Prototype : void sadct(short in[], unsigned char shape[], short out[],
short coeff[]);
in -> Address of the 8x8 data array
shape -> Address of the 8x8 shape array
out -> Address of the 8x8 output array
coeff -> Address of the coefficients
Registers used : A0, A1, R0-R7, I0-I3, B0-B3, M0-M3, L0-L3, P0-P5, LC0, LC1.
Performance :
Code Size : 436 Bytes
Cycle count : 1904 Cycles for a lower triangular matrix
(including the diagonal)
*******************************************************************************/
/*Create temporary storage of 1x8 fract16 values on the stack to hold the packed
column/row elements for the SADCT*/
/*Create temporary storage of 2x8 bytes on the stack to hold the lengths of the
intermediate shape*/
.section L1_code;
.global __sadct;
.align 8;
.extern __Coeff_offset;
__sadct:
//Initializations
P0 = R0; //Address of the input array
P1 = R1; //Address of the shape array
B0 = R1;
I0 = R2; //Address of the output array
B3 = R2;
R1 = [SP+12];
[--SP] = (R7:4,P5:3);
P4 = R0; //Save the address of the input array
I2 = R1; //Base address of the coeff. array
B2 = R1; //Save the base of the coeff. array
L2 = 0;
I3.L = __Coeff_offset; //Base of the offset array
I3.H = __Coeff_offset;
L3 = 0;
L0 = 0;
SP += -16; //Temporary Storage allocated in stack.
I1 = SP; //Pointer to temporary storage
B1 = SP;
L1 = 0;
SP += -20; //To store the altered shape information
P5 = SP; //Pointer to the base of column length
M1 = 16;
M3 = -12;
//Column Alignment and Column DCT done together to reduce the intermediate
// storage space from 64*2 to 8*2 bytes in the stack.
//The Col SADCT in normal manner in I0
P3 = 32;
R7 = 0; //Set the outer loop counter for column operation
LSETUP($0LP_ST,$0LP_ST) LC0 = P3;
//Clear the output array (32 x 32-bit words)
$0LP_ST:
[I0++] = R7;
R5 = 2;
I0 = B3;
COL_ALIGN_ST:
P3 = 8;
I1 = B1;
R6 = B1;
R2 = B1;
LSETUP($1LP_ST,$1LP_END) LC0 = P3;
P3 = 16;
R1 = W[P0++P3] (Z); //Read the pixel value
$1LP_ST:
R4 = R2+R5 (S) ||R0 = B[P1] (Z);
//Read the first byte of shape
CC = R0 == 0;
R1 = W[P0++P3] (Z) || W[I1] = R1.L;
IF !CC R2 = R4;
I1 = R2;
$1LP_END:
P1 += 8;
R0 = R2-R6(S);
R6 = R0>>1 || W[P5] = R0.L;
//Save 2*length
P5 += 2;
P3 = R7; //Point to the next column
P1 = B0; //P1 is restored to start of Shape buffer
P4 += 2; //point to the next column
P0 = P4; //P0 is restored to the data buffer
P3 += 1;
P1 = P1+P3;
CC = R6 == 0;
IF CC JUMP COL_ALIGN_END;
//If zero length, no SADCT for that column.
M0 = R0; //2 * Length (L) of non-zero elements
L1 = R0; //Set I1 as circular buffer
P2 = R6;
R0 = B3; //Base of the output array
R4 = 1;
R2 = R2-R2 (S) || I3 += M0;
//Point to the right offset
R1 = R6+R4 (S) || R2.L = W[I3] || I3 -= M0;
//Length+1, Fetch the offset. Restore I3
M2 = R2;
P3 = R1;
R3 = R7<<1;
I1 = B1; //Point to the start of the temporary storage
R3 = R0+R3(S) || I2 += M2;
I0 = R3;
A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++];
LSETUP($2LP_ST,$2LP_END) LC1 = P3>>1;
//Set Loop for (L+1)>>1
$2LP_ST:
LSETUP($3LP_ST,$3LP_ST) LC0 = P2;
//Set Loop for L
$3LP_ST: R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++]
|| R1 = [I2++];
//Fetch a data and 2 coeff.
W[I0] = R2.L || I0 += M1;
$2LP_END:
A1 = A0 = 0 || W[I0] = R2.H || I0 += M1;
//Output is stored
I2 = B2; //Restore pointer to the coeff. buffer
L1 = 0; //Clear the circular buffering of I1
COL_ALIGN_END:
R7 += 1; //Increment column counter
CC = R7 <= 7 (IU);
IF CC JUMP COL_ALIGN_ST (BP);
//End of column alignment and column DCT
//Row Alignment and Row DCT.
I1 = B1; //Restore the temporary location pointer
R7 = 0; //Set the outer loop counter for row operation
I0 = B3;
R4 = 2;
ROW_ALIGN_ST:
P5 = SP; //Restore the pointer to the intermediate shape
//array
R0 = W[P5++] (X) ; //Read the length
P3 = 8;
I1 = B1;
R2 = B1;
R6 = B1;
LSETUP($5LP_ST,$5LP_END) LC0 = P3;
P3 = 4;
$5LP_ST:
R5 = R0-R4 (S) || R1.L = W[I0++] || R0 = W[P5--](X);
//Read the pixel value
R3 = PACK(R2.H,R2.L) || W[P5++P3] = R5.L;
CC = R5 < 0;
R3 = R3+R4 (S) || W[I1] = R1.L;
IF !CC R2 = R3;
$5LP_END:
I1 = R2;
R6 = R2-R6;
R6 >>= 1; //Count of nonzero elements
R0 = B3;
R3 = R7 << 4;
R3 = R0+R3;
I0 = R3;
R7 += 1; //Increment row counter
R3 += 16;
CC = R6 == 0;
IF CC JUMP ROW_ALIGN_END;
//If zero length, no SADCT for that row.
I1 = B1; //Point to the start of the temporary storage
R0 = R6<<1 || NOP;
M0 = R0; //2 * Length (L) of non-zero elements
L1 = R0; //Set the temporary buffer as circular
R0 = 0;
R5 = R4 >> 1 || [I0++] = R0;
R2 = R2-R2 (S) || [I0++] = R0 || I3 += M0;
//Point to the right offset
R1 = R6+R5 (S) || R2.L = W[I3] ;
//Length+1, Fetch the offset.
M2 = R2;
[I0++] = R0|| I3 -= M0; //Clear I0. Restore I3
P3 = R1;
P2 = R6;
I2 += M2 || [I0++M3] = R0;
//Point to the right coeff. location
A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++];
LSETUP($6LP_ST,$6LP_END) LC1 = P3>>1;
//Set Loop for (L+1)>>1
$6LP_ST:
LSETUP($7LP_ST,$7LP_ST) LC0 = P2;
//Set Loop for L
$7LP_ST:
R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++]
|| R1 = [I2++];
//Fetch a data and 2 coeff.
$6LP_END:
A1 = A0 = 0 || [I0++] = R2 ;
//Output is stored
I2 = B2; //Restore pointer to the coeff. buffer
L1 = 0;
ROW_ALIGN_END:
I0 = R3; //Point to the next row
CC = R7 <= 7 (IU);
IF CC JUMP ROW_ALIGN_ST (BP);
//End of row alignment and row DCT
SP += 36;
(R7:4,P5:3) = [SP++];
RTS;
NOP; //to avoid one stall if LINK or UNLINK happens to be
//the next instruction after RTS in the memory.
__sadct.end: