fdct_aan_kc.sc

Package: Motion JPEG code optimized for the SPI DSP platform
Language: SC
    // phase 2
    b0 = a0 + a3;
    b3 = a0 - a3;
    b1 = a1 + a2;
    b2 = a1 - a2;

    // phase 3
    out0 = b0 + b1;
    out4 = b0 - b1;
    tmp1 = b2 + b3;
    z1 = (vec int16x2) spi_vmulra16i(tmp1, (vec int16x2)FIX15_0_707106781, 0);

    // phase 5
    out2 = b3 + z1;
    out6 = b3 - z1;

    // Odd part
    // phase 2
    b0 = a4 + a5;
    b1 = a5 + a6;
    b2 = a6 + a7;

    // The rotator is modified from fig 4-8 to avoid extra negations.
    tmp1 = b0 - b2;
    z5 = (vec int16x2) spi_vmulra16i(tmp1, (vec int16x2)FIX15_0_382683433, 0);
    z2 = (vec int16x2) spi_vmulra16i(b0, (vec int16x2)FIX15_0_541196100, 0) + z5;
    z4 = (vec int16x2) spi_vshift16(spi_vmulra16i(b2, (vec int16x2)FIX14_1_306562965, 0), 1) + z5;
    z3 = (vec int16x2) spi_vmulra16i(b1, (vec int16x2)FIX15_0_707106781, 0);

    // phase 5
    z11 = a7 + z3;
    z13 = a7 - z3;

    // phase 6
    out5 = z13 + z2;
    out3 = z13 - z2;
    out1 = z11 + z4;
    out7 = z11 - z4;
}

///////////////////////////////////////////////////////////////////
inline void kernel dct8_2_sub128(
    // Input pixel values (0--255) as int16x2s in 16.0
    vec int16x2 in0(in),
    vec int16x2 in1(in),
    vec int16x2 in2(in),
    vec int16x2 in3(in),
    vec int16x2 in4(in),
    vec int16x2 in5(in),
    vec int16x2 in6(in),
    vec int16x2 in7(in),
    // Output DCT coefficients in 16.0
    vec int16x2 out0(out),
    vec int16x2 out1(out),
    vec int16x2 out2(out),
    vec int16x2 out3(out),
    vec int16x2 out4(out),
    vec int16x2 out5(out),
    vec int16x2 out6(out),
    vec int16x2 out7(out))
// Description:
//    This function computes two 8-point DCTs on each lane, one on each
//    half of the int16x2s. That is, the upper halves of the 8 input
//    int16x2s represent one 8-element array, while the lower halves
//    represent another 8-element array. This algorithm is from
//    Pennebaker/Mitchell, pp. 50-52. See also Arai, Agui, Nakajima.
//    The algorithm is based on the 16-point DFT.
//    Basically, the 8-point DCT can be calculated by scaling the real
//    parts of the output of the 16-point DFT. Unlike kernel dct8_2,
//    this kernel also removes the pixel offset of 128: the offset is
//    removed only in the phase-1 sums (a0..a3), because it cancels
//    out in the differences (a4..a7).
//
// Returns:    Nothing.
//
///////////////////////////////////////////////////////////////////
{
    vec int16x2 a0, a1, a2, a3, a4, a5, a6, a7;
    vec int16x2 b0, b1, b2, b3;
    vec int16x2 z1, z2, z3, z4, z5, z11, z13;
    vec int16x2 tmp1;

    // phase 1
    a0 = (in0 + in7) - (int16x2)SUBTRACT_128x2;
    a7 = in0 - in7;
    a1 = (in1 + in6) - (int16x2)SUBTRACT_128x2;
    a6 = in1 - in6;
    a2 = (in2 + in5) - (int16x2)SUBTRACT_128x2;
    a5 = in2 - in5;
    a3 = (in3 + in4) - (int16x2)SUBTRACT_128x2;
    a4 = in3 - in4;

    // Even part
    // phase 2
    b0 = a0 + a3;
    b3 = a0 - a3;
    b1 = a1 + a2;
    b2 = a1 - a2;

    // phase 3
    out0 = b0 + b1;
    out4 = b0 - b1;
    tmp1 = b2 + b3;
    z1 = (vec int16x2) spi_vmulra16i(tmp1, (vec int16x2)FIX15_0_707106781, 0);

    // phase 5
    out2 = b3 + z1;
    out6 = b3 - z1;

    // Odd part
    // phase 2
    b0 = a4 + a5;
    b1 = a5 + a6;
    b2 = a6 + a7;

    // The rotator is modified from fig 4-8 to avoid extra negations.
    tmp1 = b0 - b2;
    z5 = (vec int16x2) spi_vmulra16i(tmp1, (vec int16x2)FIX15_0_382683433, 0);
    z2 = (vec int16x2) spi_vmulra16i(b0, (vec int16x2)FIX15_0_541196100, 0) + z5;
    z4 = (vec int16x2) spi_vshift16(spi_vmulra16i(b2, (vec int16x2)FIX14_1_306562965, 0), 1) + z5;
    z3 = (vec int16x2) spi_vmulra16i(b1, (vec int16x2)FIX15_0_707106781, 0);

    // phase 5
    z11 = a7 + z3;
    z13 = a7 - z3;

    // phase 6
    out5 = z13 + z2;
    out3 = z13 - z2;
    out1 = z11 + z4;
    out7 = z11 - z4;
}

///////////////////////////////////////////////////////////////////
inline void kernel quantize (
    vec int16x2 in(in),
    vec int16x2 scale(in),
    vec int16x2 out(out))
// Description: The kernel takes an int16x2 vector of coefficients and
//              an int16x2 vector of quantization scale values.
//              Each coefficient is quantized by multiplying its
//              magnitude by the scale value (in effect, a fixed-point
//              reciprocal of the quantization divisor), adding
//              DIV_CONST_HALF for rounding, shifting right by
//              DIV_CONST_BITS, and finally restoring the sign.
//
// Returns:    Nothing.
//
///////////////////////////////////////////////////////////////////
{
    vec int16x2 tmp0, tmp1;
    vec int32x1 shift, rounding;
    vec int32x1 mul_hi, mul_lo;
    vec int32x1 out_hi, out_lo;

    shift = DIV_CONST_BITS;
    rounding = (vec int32x1) DIV_CONST_HALF;

    // Work on magnitudes; the sign is restored at the end.
    tmp0 = (in < (vec int16x2)0) ? ((vec int16x2)0 - in) : in;

    // Widen to 32-bit products, one per 16-bit half.
    mul_hi = spi_vmuld16i_hi (tmp0, scale);
    mul_lo = spi_vmuld16i_lo (tmp0, scale);

    // Round (add DIV_CONST_HALF) and scale down by DIV_CONST_BITS.
    out_hi = (mul_hi + rounding) >> shift;
    out_lo = (mul_lo + rounding) >> shift;

    // Repack the two 32-bit results into one int16x2, then restore the
    // sign (tmp0 differs from in only when in was negative).
    tmp1 = (vec int16x2) spi_vshufflei (EXTRACT_MSB, out_lo, out_hi);
    out  = (tmp0 != in) ? ((vec int16x2)0 - tmp1) : tmp1;
}

///////////////////////////////////////////////////////////////////
inline kernel void fdct8x8_and_quantize_aan_kc(
    // Quantization scale values
    vec int16x2 scale0(in),  vec int16x2 scale1(in),  vec int16x2 scale2(in),  vec int16x2 scale3(in),
    vec int16x2 scale4(in),  vec int16x2 scale5(in),  vec int16x2 scale6(in),  vec int16x2 scale7(in),
    vec int16x2 scale8(in),  vec int16x2 scale9(in),  vec int16x2 scale10(in), vec int16x2 scale11(in),
    vec int16x2 scale12(in), vec int16x2 scale13(in), vec int16x2 scale14(in), vec int16x2 scale15(in),
    vec int16x2 scale16(in), vec int16x2 scale17(in), vec int16x2 scale18(in), vec int16x2 scale19(in),
    vec int16x2 scale20(in), vec int16x2 scale21(in), vec int16x2 scale22(in), vec int16x2 scale23(in),
    vec int16x2 scale24(in), vec int16x2 scale25(in), vec int16x2 scale26(in), vec int16x2 scale27(in),
    vec int16x2 scale28(in), vec int16x2 scale29(in), vec int16x2 scale30(in), vec int16x2 scale31(in),
    // Input pixel values (0--255) as int16x2s in 16.0
    vec int16x2 a0(in),  vec int16x2 a1(in),  vec int16x2 a2(in),  vec int16x2 a3(in),
    vec int16x2 a4(in),  vec int16x2 a5(in),  vec int16x2 a6(in),  vec int16x2 a7(in),
    vec int16x2 a8(in),  vec int16x2 a9(in),  vec int16x2 a10(in), vec int16x2 a11(in),
    vec int16x2 a12(in), vec int16x2 a13(in), vec int16x2 a14(in), vec int16x2 a15(in),
    vec int16x2 a16(in), vec int16x2 a17(in), vec int16x2 a18(in), vec int16x2 a19(in),
    vec int16x2 a20(in), vec int16x2 a21(in), vec int16x2 a22(in), vec int16x2 a23(in),
    vec int16x2 a24(in), vec int16x2 a25(in), vec int16x2 a26(in), vec int16x2 a27(in),
    vec int16x2 a28(in), vec int16x2 a29(in), vec int16x2 a30(in), vec int16x2 a31(in),
    // Output quantized DCT coefficients in 16.0
    vec int16x2 d0(out),  vec int16x2 d1(out),  vec int16x2 d2(out),  vec int16x2 d3(out),
    vec int16x2 d4(out),  vec int16x2 d5(out),  vec int16x2 d6(out),  vec int16x2 d7(out),
    vec int16x2 d8(out),  vec int16x2 d9(out),  vec int16x2 d10(out), vec int16x2 d11(out),
    vec int16x2 d12(out), vec int16x2 d13(out), vec int16x2 d14(out), vec int16x2 d15(out),
    vec int16x2 d16(out), vec int16x2 d17(out), vec int16x2 d18(out), vec int16x2 d19(out),
    vec int16x2 d20(out), vec int16x2 d21(out), vec int16x2 d22(out), vec int16x2 d23(out),
    vec int16x2 d24(out), vec int16x2 d25(out), vec int16x2 d26(out), vec int16x2 d27(out),
    vec int16x2 d28(out), vec int16x2 d29(out), vec int16x2 d30(out), vec int16x2 d31(out))
// Description:
//    This function performs one 8x8 DCT on each lane. The input is
//    organized as a sequence of 8 rows, where each row is composed of
//    4 int16x2s. The output is similar to the input except that
//    the data IS TRANSPOSED---i.e., a sequence of 8 columns.
//    The resulting 8x8 block of coefficients is quantized using an 8x8
//    transposed quantization divisor matrix.
//
// Returns:    Nothing.
//
///////////////////////////////////////////////////////////////////
{
    ////////////////////////////////////////////////////////////////
    //      Column DCTs
    ////////////////////////////////////////////////////////////////
    dct8_2_sub128(
        a0, a4, a8, a12, a16, a20, a24, a28,
        d0, d4, d8, d12, d16, d20, d24, d28
        );
    dct8_2_sub128(
        a1, a5, a9, a13, a17, a21, a25, a29,
        d1, d5, d9, d13, d17, d21, d25, d29
        );
    dct8_2_sub128(
        a2, a6, a10, a14, a18, a22, a26, a30,
        d2, d6, d10, d14, d18, d22, d26, d30
        );
    dct8_2_sub128(
        a3, a7, a11, a15, a19, a23, a27, a31,
        d3, d7, d11, d15, d19, d23, d27, d31
        );

    ////////////////////////////////////////////////////////////////
    //      8x8 matrix transpose
    ////////////////////////////////////////////////////////////////
    transpose_8x8_intralane(
        d0, d1, d2, d3, d4, d5, d6, d7,
        d8, d9, d10, d11, d12, d13, d14, d15,
        d16, d17, d18, d19, d20, d21, d22, d23,
        d24, d25, d26, d27, d28, d29, d30, d31,
        a0, a1, a2,  a3,  a4,  a5,  a6,  a7,
        a8, a9, a10, a11, a12, a13, a14, a15,
        a16, a17, a18, a19, a20, a21, a22, a23,
        a24, a25, a26, a27, a28, a29, a30, a31
        );

    ////////////////////////////////////////////////////////////////
    //      Row DCTs
    ////////////////////////////////////////////////////////////////
    dct8_2(
        a0, a4, a8, a12, a16, a20, a24, a28,
        d0, d4, d8, d12, d16, d20, d24, d28
        );
    dct8_2(
        a1, a5, a9, a13, a17, a21, a25, a29,
        d1, d5, d9, d13, d17, d21, d25, d29
        );
    dct8_2(
        a2, a6, a10, a14, a18, a22, a26, a30,
        d2, d6, d10, d14, d18, d22, d26, d30
        );
    dct8_2(
        a3, a7, a11, a15, a19, a23, a27, a31,
        d3, d7, d11, d15, d19, d23, d27, d31
        );

    ////////////////////////////////////////////////////////////////
    //      Quantization
    ////////////////////////////////////////////////////////////////
    // Each of the 64 coefficients (stored two per int16x2) is quantized.
    quantize (d0, scale0, d0);
    quantize (d1, scale1, d1);
    quantize (d2, scale2, d2);
    quantize (d3, scale3, d3);
    quantize (d4, scale4, d4);
    quantize (d5, scale5, d5);
    quantize (d6, scale6, d6);
    quantize (d7, scale7, d7);
    quantize (d8, scale8, d8);
    quantize (d9, scale9, d9);
    quantize (d10, scale10, d10);
    quantize (d11, scale11, d11);
    quantize (d12, scale12, d12);
    quantize (d13, scale13, d13);
    quantize (d14, scale14, d14);
    quantize (d15, scale15, d15);
    quantize (d16, scale16, d16);
    quantize (d17, scale17, d17);
    quantize (d18, scale18, d18);
    quantize (d19, scale19, d19);
    quantize (d20, scale20, d20);
    quantize (d21, scale21, d21);
    quantize (d22, scale22, d22);
    quantize (d23, scale23, d23);
    quantize (d24, scale24, d24);
    quantize (d25, scale25, d25);
    quantize (d26, scale26, d26);
    quantize (d27, scale27, d27);
    quantize (d28, scale28, d28);
    quantize (d29, scale29, d29);
    quantize (d30, scale30, d30);
    quantize (d31, scale31, d31);

    // Note: the output is transposed relative to the input!
}
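As the code reads, the quantize kernel never divides. It takes the magnitude of each coefficient, multiplies it by scale, adds DIV_CONST_HALF, shifts right by DIV_CONST_BITS, and then restores the sign. For this to quantize, scale must behave as a fixed-point reciprocal of the divisor. What follows is a scalar C sketch of that arithmetic. The values of DIV_CONST_BITS and DIV_CONST_HALF are not shown in this listing, so the sketch assumes a hypothetical 14-bit precision (QUANT_BITS, QUANT_HALF). It also assumes a hypothetical helper quant_scale() that computes round(2^14 / q). These names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precision: 14 bits keeps round(2^14 / q) within int16
 * for every divisor q >= 1. The real DIV_CONST_BITS may differ. */
#define QUANT_BITS 14
#define QUANT_HALF (1 << (QUANT_BITS - 1))

/* Hypothetical helper: fixed-point reciprocal of a quantization divisor. */
static int16_t quant_scale(int q)
{
    assert(q >= 1);
    return (int16_t)(((1 << QUANT_BITS) + q / 2) / q);
}

/* Scalar model of the quantize kernel: multiply |x| by the reciprocal
 * scale in 32 bits, add half an LSB, shift right, restore the sign.
 * The scaled magnitude is rounded to nearest with ties away from zero,
 * so positive and negative coefficients are treated symmetrically. */
static int16_t quantize_coef(int16_t x, int16_t scale)
{
    int32_t mag = (x < 0) ? -(int32_t)x : (int32_t)x;
    int32_t q = (mag * (int32_t)scale + QUANT_HALF) >> QUANT_BITS;
    return (int16_t)((x < 0) ? -q : q);
}
```

With a divisor of 16 the scale is 1024, so quantizing 100 gives (100 · 1024 + 8192) >> 14 = 6, matching round(100 / 16 = 6.25). The halfway case ±24 / 16 = ±1.5 rounds away from zero to ±2. Working on magnitudes is what makes the rounding symmetric. An arithmetic right shift applied directly to a negative product would round toward minus infinity instead.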
