/*******************************************************************************
Copyright(c) 2000 - 2002 Analog Devices. All Rights Reserved.
Developed by Joint Development Software Application Team, IPDC, Bangalore, India
for Blackfin DSPs  ( Micro Signal Architecture 1.0 specification).

By using this module you agree to the terms of the Analog Devices License
Agreement for DSP Software. 
********************************************************************************
Module name     : sadct.asm
Label name      : __sadct
Version         : 1.3
Change History  :

                Version     Date        Author        Comments
                1.3         11/18/2002  Swarnalatha   Tested with VDSP++ 3.0
                                                      compiler 6.2.2 on 
                                                      ADSP-21535 Rev.0.2
                1.2         11/13/2002  Swarnalatha   Tested with VDSP++ 3.0
                                                      on ADSP-21535 Rev.0.2
                1.1         03/10/2002  Manoj         Modified to match
                                                      silicon cycle count
                1.0         07/20/2001  Manoj         Original 

Description     : This program performs the shape-adaptive DCT (SADCT) on an 8x8
                  block as prescribed in the MPEG-4 standard. It takes the input
                  data array x[] to be transformed in short format (since
                  difference values can have a 9-bit range), where the elements
                  are

                     x00, x01 ...x07;
                     x10,x11 ....x17;
                        ........
                     x70,x71.....x77;

                  and the corresponding shape information (alpha map) in
                  character format. Consider a 3x3 shape array (chosen for ease
                  of demonstration). The following sequence of operations is
                  performed.
        
        
                  a) Perform column alignment as shown
        
                  [255 255  0 ;      column        [255 255  255 ;           
                    0   0  255;      =======>         0  255  0   ;   =====>      
                    0  255  0 ]       align           0   0   0   ]               

                                                        xc=[x00  x01  x12 ;
                                                             0   x21   0  ;
                                                             0    0    0  ]
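
                  The alignment in step a) can be sketched in Python as follows.
                  This is an illustrative reference only, not part of the
                  original module; the name align_columns and the sample values
                  are assumptions made for the example.

```python
def align_columns(x, shape):
    """Shift the active samples of every column to the top.

    x and shape are square lists of lists indexed [row][col]. A sample is
    active when its shape entry is non-zero. Returns the aligned block and
    the number of active samples in each column.
    """
    n = len(x)
    xc = [[0] * n for _ in range(n)]
    lengths = [0] * n
    for col in range(n):
        for row in range(n):
            if shape[row][col] != 0:
                # Place the sample in the next free slot of this column.
                xc[lengths[col]][col] = x[row][col]
                lengths[col] += 1
    return xc, lengths


# The 3x3 shape used in the description, with arbitrary sample values.
shape = [[255, 255, 0],
         [0, 0, 255],
         [0, 255, 0]]
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
xc, lengths = align_columns(x, shape)
```

                  Here xc is [[1, 2, 6], [0, 8, 0], [0, 0, 0]] and lengths is
                  [1, 2, 1], which matches the aligned shape shown above.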


                  b) Perform a DCT of the appropriate length on each column of
                  xc, i.e. perform DCT(1) on column 1 of xc, DCT(2) on column 2
                  of xc and DCT(1) on column 3 of xc,
                  where DCT(N) is the NxN matrix with entries
                     DCT(N)[i][j] = K * cos(i*(j+0.5)*(pi/N)),
                     where i, j in [0, N) and K = sqrt(1/N) : i = 0;
                                                = sqrt(2/N) : otherwise.
                  N is the number of shape elements in the column on which the
                  SADCT is being performed. In this program, the DCT
                  coefficients are stored in an array and a direct matrix
                  multiplication is used to implement the DCT. Note that for
                  N > 6 a dedicated flowgraph implementation of the DCT would be
                  optimal (considering the conditional branch and the DCT
                  complexity). However, this has not been incorporated in this
                  program. For DCT(8), Chen's DCT will suffice. If required, the
                  flowgraph of DCT(7) can be integrated as well. However, since
                  the cycle count depends heavily on the shape array, the user
                  has to decide, weighing code size against speed improvement,
                  whether to use the flowgraph approach in the application. If
                  most of the elements in the shape array are non-zero, using
                  an SADCT rather than the normal DCT is not advisable,
                  considering the computational load. The user has to choose
                  between SNR loss and cycle count increase based on the
                  particular application. In the implementation provided, two
                  SADCT outputs are computed simultaneously, at the cost of
                  slightly more memory for the coefficients.
                  The column-SADCT-transformed array XC is

                     [X00 X10 X20 ;
                       0  X11  0  ;
                       0   0   0  ]
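
                  The DCT(N) definition in step b) can be checked numerically
                  with the Python sketch below. Floating point is used for
                  clarity; the assembly routine instead reads precomputed
                  coefficients from coeff[], whose layout is not described in
                  this header. dct_matrix and dct_n are hypothetical helper
                  names, not part of the original module.

```python
import math


def dct_matrix(n):
    """Return the n x n matrix with entries K * cos(i*(j+0.5)*pi/n)."""
    rows = []
    for i in range(n):
        # K = sqrt(1/n) for the first row, sqrt(2/n) for all other rows.
        k = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([k * math.cos(i * (j + 0.5) * math.pi / n)
                     for j in range(n)])
    return rows


def dct_n(samples):
    """Transform a list of N samples by direct matrix multiplication."""
    matrix = dct_matrix(len(samples))
    return [sum(c * s for c, s in zip(row, samples)) for row in matrix]
```

                  For example, dct_n([5.0]) returns [5.0], since DCT(1) is the
                  1x1 identity. With this scaling the rows of DCT(N) are
                  orthonormal, so the transform preserves signal energy.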

                  c) Perform row alignment as shown
        
                      [255 255  255 ;    row     XR=[X00  X10  X20 ;
                        0  255  0   ;   =====>       X11   0    0  ;
                        0   0   0   ]   align         0    0    0  ] 
   
                  d) Perform a DCT of the appropriate length on each row of XR,
                  i.e. perform DCT(3) on row 1 of XR, DCT(1) on row 2 of XR and
                  skip row 3 as there are no non-zero elements in row 3.
        
                  A new technique that avoids storing the intermediate shape
                  after the column (top) alignment is adopted here to save stack
                  space and to reduce the cycle count.
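
                  Steps a) to d) can be combined into a floating-point
                  reference, sketched below in Python. It is illustrative only
                  and not part of the original module; sadct_reference and
                  dct_n are hypothetical names. Like the assembly routine
                  appears to do, it keeps only the per-column lengths instead
                  of an intermediate shape array: row r holds a coefficient in
                  column c exactly when column c has more than r active
                  samples.

```python
import math


def dct_n(samples):
    """DCT(N) of a list of N samples, N = len(samples); [] maps to []."""
    n = len(samples)
    out = []
    for i in range(n):
        k = math.sqrt((1.0 if i == 0 else 2.0) / n)
        out.append(k * sum(s * math.cos(i * (j + 0.5) * math.pi / n)
                           for j, s in enumerate(samples)))
    return out


def sadct_reference(x, shape):
    """Shape-adaptive DCT of a square block; x and shape are [row][col]."""
    n = len(x)
    out = [[0.0] * n for _ in range(n)]
    col_len = [0] * n
    # Steps a) and b): gather the active samples of each column, transform
    # them, and store the coefficients top-aligned in the same column.
    for c in range(n):
        active = [x[r][c] for r in range(n) if shape[r][c] != 0]
        col_len[c] = len(active)
        for r, coeff in enumerate(dct_n(active)):
            out[r][c] = coeff
    # Steps c) and d): gather the coefficients present in each row,
    # transform them, and store the result left-aligned in that row.
    for r in range(n):
        active = [out[r][c] for c in range(n) if col_len[c] > r]
        row = dct_n(active)
        out[r] = row + [0.0] * (n - len(row))
    return out


shape = [[255, 255, 0],
         [0, 0, 255],
         [0, 255, 0]]
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
result = sadct_reference(x, shape)
```

                  For the 3x3 example, row 1 of result holds a DCT(3) output,
                  row 2 holds a single DCT(1) output and row 3 stays zero.
                  Because every DCT(N) used is orthonormal, the sum of squares
                  of result equals that of the active input samples.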

Prototype       : void sadct(short in[], unsigned char shape[], short out[], 
                             short coeff[]);

                     in     -> Address of the 8x8 data array 
                     shape  -> Address of the 8x8 shape array 
                     out    -> Address of the 8x8 output array
                     coeff  -> Address of the coefficients

Registers used  : A0, A1, R0-R7, I0-I3, B0-B3, M0-M3, L0-L3, P0-P5, LC0, LC1.

Performance     :
                Code Size   : 436 Bytes
                Cycle count : 1904 Cycles for a lower triangular matrix 
                                   (including the diagonal)
*******************************************************************************/
/* A temporary 1x8 buffer of fract16 values is created on the stack to hold the
packed column/row elements for the SADCT. */
/* A second temporary buffer of 2x8 bytes is created on the stack to hold the
lengths of the intermediate shape. */

.section L1_code;
.global __sadct;
.align 8;
.extern __Coeff_offset;

__sadct:
                            //Initializations
    
    P0 = R0;                //Address of the input array
    P1 = R1;                //Address of the shape array
    B0 = R1;      
    I0 = R2;                //Address of the output array
    B3 = R2;
    R1 = [SP+12];           //Address of the coeff. array (4th argument)
    
    [--SP] = (R7:4,P5:3);
    P4 = R0;                //Save the address of the input array
    I2 = R1;                //Base address of the coeff. array
    B2 = R1;                //Save the base of the coeff. array
    L2 = 0;
    I3.L = __Coeff_offset;  //Base of the offset array
    I3.H = __Coeff_offset;
    L3 = 0;
    L0 = 0;
    SP += -16;              //Temporary Storage allocated in stack.
    I1 = SP;                //Pointer to temporary storage
    B1 = SP;      
    L1 = 0;       
    
    SP += -20;              //To store the altered shape information
    P5 = SP;                //Pointer to the base of column length
    
    M1 = 16;
    M3 = -12;
//Column alignment and column DCT are done together to reduce the intermediate
//storage space on the stack from 64*2 to 8*2 bytes.
//The column SADCT output is stored in normal order through I0.
    
    P3 = 32;
    R7 = 0;                 //Set the outer loop counter for column operation
    
    LSETUP($0LP_ST,$0LP_ST) LC0 = P3;
                            //Clear the 8x8 output array (32 words of 4 bytes)
$0LP_ST:
        [I0++] = R7;
    
    R5 = 2;
    I0 = B3;          
COL_ALIGN_ST:
    P3 = 8;
    I1 = B1;
    R6 = B1;
    R2 = B1;
    
    LSETUP($1LP_ST,$1LP_END) LC0 = P3;
    P3 = 16;
    R1 = W[P0++P3] (Z);     //Read the pixel value
$1LP_ST:
        R4 = R2+R5 (S) ||R0 = B[P1] (Z);
                            //Read the first byte of shape 
        CC = R0 == 0;
        R1 = W[P0++P3] (Z) || W[I1] = R1.L;     
        IF !CC R2 = R4;
        I1 = R2;
$1LP_END:
        P1 += 8;
    
    R0 = R2-R6(S);
    R6 = R0>>1 || W[P5] = R0.L;
                            //Save 2*length 
    
    P5 += 2;
    
    P3 = R7;                //Point to the next column
    P1 = B0;                //P1 is restored to start of Shape buffer
    P4 += 2;                //point to the next column
    P0 = P4;                //P0 is restored to the data buffer
    P3 += 1;
    P1 = P1+P3;
    
    CC = R6 == 0;
    IF CC JUMP COL_ALIGN_END;
                            //If zero length, no SADCT for that column. 
    
    
    M0 = R0;                //2 * Length (L) of non-zero elements
    L1 = R0;                //Set I1 as circular buffer
    P2 = R6;
    R0 = B3;                //Base of the output array
    R4 = 1;
    R2 = R2-R2 (S) || I3 += M0;
                            //Point to the right offset 
    R1 = R6+R4 (S) || R2.L = W[I3] || I3 -= M0;
                            //Length+1, Fetch the offset. Restore I3 
    M2 = R2;
    P3 = R1;
    R3 = R7<<1;
    I1 = B1;                //Point to the start of the temporary storage
    R3 = R0+R3(S) || I2 += M2;
    I0 = R3;
    
    
    A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++];
    
    LSETUP($2LP_ST,$2LP_END) LC1 = P3>>1;
                            //Set Loop for (L+1)>>1 
$2LP_ST:
        LSETUP($3LP_ST,$3LP_ST) LC0 = P2;
                            //Set Loop for L 
    
$3LP_ST:    R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++]
            || R1 = [I2++];
                            //Fetch a data and 2 coeff. 
        W[I0] = R2.L || I0 += M1;
$2LP_END:
        A1 = A0 = 0 || W[I0] = R2.H || I0 += M1;
                            //Output is stored 
    I2 = B2;                //Restore pointer to the coeff. buffer
    L1 = 0;                 //Clear the circular buffering of I1
    
COL_ALIGN_END:
    R7 += 1;                //Increment column counter
    CC = R7 <=  7 (IU);
    IF CC JUMP COL_ALIGN_ST (BP);
    
//End of column alignment and column DCT

//Row Alignment and Row DCT.
    
    I1 = B1;                //Restore the temporary location pointer
    R7 = 0;                 //Set the outer loop counter for row operation
    I0 = B3;
    R4 = 2;
    
ROW_ALIGN_ST:
    P5 = SP;                //Restore the pointer to the intermediate shape
                            //array
    
    R0 = W[P5++] (X) ;      //Read the length
    P3 = 8;
    I1 = B1;
    R2 = B1;
    R6 = B1;
    
    LSETUP($5LP_ST,$5LP_END) LC0 = P3;
    P3 = 4;
$5LP_ST:
        R5 = R0-R4 (S) || R1.L = W[I0++] || R0 = W[P5--](X);
                            //Read the pixel value 
        R3 = PACK(R2.H,R2.L) || W[P5++P3] = R5.L;
        CC = R5 < 0;
        R3 = R3+R4 (S) || W[I1] = R1.L;
        IF !CC R2 = R3; 
$5LP_END:
        I1 = R2;
    
    
    R6 = R2-R6;
    R6 >>= 1;               //Count of nonzero elements
    
    R0 = B3;
    R3 = R7 << 4;
    R3 = R0+R3;
    I0 = R3;
    R7 += 1;                //Increment row counter
    R3 += 16;
    
    CC = R6 == 0;
    IF CC JUMP ROW_ALIGN_END;
                            //If zero length, no SADCT for that row.
    
    I1 = B1;                //Point to the start of the temporary storage
    R0 = R6<<1 || NOP;
    M0 = R0;                //2 * Length (L) of non-zero elements
    L1 = R0;                //Set the temporary buffer as circular
    R0 = 0;
    R5 = R4 >> 1 || [I0++] = R0;
    R2 = R2-R2 (S) || [I0++] = R0 || I3 += M0;
                            //Point to the right offset 
    R1 = R6+R5 (S) || R2.L = W[I3] ;
                            //Length+1, Fetch the offset. 
    M2 = R2;
    [I0++] = R0|| I3 -= M0; //Clear I0. Restore I3
    P3 = R1;
    P2 = R6;
    I2 += M2 || [I0++M3] = R0;
                            //Point to the right coeff. location 
    
    A1 = A0 = 0 || R0.L = W[I1++] || R1 = [I2++];
    LSETUP($6LP_ST,$6LP_END) LC1 = P3>>1;
                            //Set Loop for (L+1)>>1 
    
$6LP_ST:
        LSETUP($7LP_ST,$7LP_ST) LC0 = P2;
                            //Set Loop for L 
$7LP_ST:
            R2.H = (A1 += R0.L*R1.H),R2.L = (A0 += R0.L*R1.L) || R0.L = W[I1++]
            || R1 = [I2++];
                            //Fetch a data and 2 coeff. 
$6LP_END:
        A1 = A0 = 0 || [I0++] = R2 ;
                            //Output is stored 
    I2 = B2;                //Restore pointer to the coeff. buffer
    L1 = 0;
ROW_ALIGN_END:
    I0 = R3;                //Point to the next row
    CC = R7 <=  7 (IU);
    IF CC JUMP ROW_ALIGN_ST (BP);
//End of row alignment and row DCT
    
    SP += 36;
    (R7:4,P5:3) = [SP++];
    RTS;
    NOP;                    //to avoid one stall if LINK or UNLINK happens to be
                            //the next instruction after RTS in the memory.
__sadct.end:
