;/******************************************************************************
; *                                                                            *
; *  This file is part of XviD, a free MPEG-4 video encoder/decoder            *
; *                                                                            *
; *  XviD is an implementation of a part of one or more MPEG-4 Video tools     *
; *  as specified in ISO/IEC 14496-2 standard.  Those intending to use this    *
; *  software module in hardware or software products are advised that its     *
; *  use may infringe existing patents or copyrights, and any such use         *
; *  would be at such party's own risk.  The original developer of this        *
; *  software module and his/her company, and subsequent editors and their     *
; *  companies, will have no liability for use of this software or             *
; *  modifications or derivatives thereof.                                     *
; *                                                                            *
; *  XviD is free software; you can redistribute it and/or modify it           *
; *  under the terms of the GNU General Public License as published by         *
; *  the Free Software Foundation; either version 2 of the License, or         *
; *  (at your option) any later version.                                       *
; *                                                                            *
; *  XviD is distributed in the hope that it will be useful, but               *
; *  WITHOUT ANY WARRANTY; without even the implied warranty of                *
; *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the             *
; *  GNU General Public License for more details.                              *
; *                                                                            *
; *  You should have received a copy of the GNU General Public License         *
; *  along with this program; if not, write to the Free Software               *
; *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA  *
; *                                                                            *
; ******************************************************************************/
;
;/******************************************************************************
; *                                                                            *
; *  fdct_mmx.asm, MMX optimized forward DCT                                   *
; *                                                                            *
; *  Initial, but incomplete version provided by Intel at AppNote AP-922       *
; *  http://developer.intel.com/vtune/cbts/strmsimd/922down.htm                *
; *  Copyright (C) 1999 Intel Corporation,                                     *
; *                                                                            *
; *  completed and corrected in fdctmm32.c/fdctmm32.doc,                       *
; *  http://members.tripod.com/~liaor                                          *
; *  Copyright (C) 2000 - Royce Shih-Wea Liao <liaor@iname.com>,               *
; *                                                                            *
; *  ported to NASM and some minor changes                                     * 
; *  Copyright (C) 2001 - Michael Militzer <isibaar@xvid.org>                  *
; *                                                                            *
; *  For more information visit the XviD homepage: http://www.xvid.org         *
; *                                                                            *
; ******************************************************************************/
;
;/******************************************************************************
; *                                                                            *
; *  Revision history:                                                         *
; *                                                                            *
; *  04.11.2001 loop unrolled (Isibaar)                                        *
; *  02.11.2001 initial version  (Isibaar)                                     *
; *                                                                            *
; ******************************************************************************/


BITS 32

%macro cglobal 1 
	%ifdef PREFIX
		global _%1 
		%define %1 _%1
	%else
		global %1
	%endif
%endmacro
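
; Note: cglobal abstracts over C symbol naming.  When the build defines
; PREFIX (for ABIs that prepend an underscore to C symbols, e.g. Win32
; COFF or a.out), the function is exported as _fdct_mmx while the source
; below keeps using the plain name fdct_mmx.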

%define INP eax
%define TABLE ebx
%define TABLEF ebx
%define OUT ecx
%define round_frw_row edx

%define INP_1 eax + 16
%define INP_2 eax + 32
%define INP_3 eax + 48
%define INP_4 eax + 64
%define INP_5 eax + 80
%define INP_6 eax + 96
%define INP_7 eax + 112

%define OUT_1 ecx + 16
%define OUT_2 ecx + 32
%define OUT_3 ecx + 48
%define OUT_4 ecx + 64
%define OUT_5 ecx + 80
%define OUT_6 ecx + 96
%define OUT_7 ecx + 112
%define OUT_8 ecx + 128

%define TABLE_1 ebx + 64
%define TABLE_2 ebx + 128
%define TABLE_3 ebx + 192
%define TABLE_4 ebx + 256
%define TABLE_5 ebx + 320
%define TABLE_6 ebx + 384
%define TABLE_7 ebx + 448

%define x0 INP + 0*16
%define x1 INP + 1*16
%define x2 INP + 2*16
%define x3 INP + 3*16
%define x4 INP + 4*16
%define x5 INP + 5*16
%define x6 INP + 6*16
%define x7 INP + 7*16
%define y0 OUT + 0*16
%define y1 OUT + 1*16
%define y2 OUT + 2*16
%define y3 OUT + 3*16
%define y4 OUT + 4*16
%define y5 OUT + 5*16
%define y6 OUT + 6*16
%define y7 OUT + 7*16
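
; Note: each xN/yN above addresses row N of the 8x8 block of 16-bit
; coefficients (rows are 8 words = 16 bytes apart).  OUT is set equal to
; INP in fdct_mmx below, so the transform runs in place.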

%define tg_1_16 (TABLEF + 0)
%define tg_2_16 (TABLEF + 8)
%define tg_3_16 (TABLEF + 16)
%define cos_4_16 (TABLEF + 24)
%define ocos_4_16 (TABLEF + 32)


SECTION .data

ALIGN 16

BITS_FRW_ACC equ 3							; 2 or 3 for accuracy
SHIFT_FRW_COL equ BITS_FRW_ACC
SHIFT_FRW_ROW equ (BITS_FRW_ACC + 17)
RND_FRW_ROW equ (1 << (SHIFT_FRW_ROW-1))

SHIFT_FRW_ROW_CLIP2	equ (4)
SHIFT_FRW_ROW_CLIP1	equ (SHIFT_FRW_ROW - SHIFT_FRW_ROW_CLIP2)
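
; Note on the fixed-point scaling: the column pass pre-shifts its inputs
; left by SHIFT_FRW_COL bits for extra fractional precision; the row pass
; (later in this file) undoes that with a single rounded right shift.
; A minimal C sketch of that rounding step, with illustrative names only:
;
;   enum { SHIFT = 3 + 17, RND = 1 << (SHIFT - 1) }; /* = SHIFT_FRW_ROW, RND_FRW_ROW */
;   static short round_shift_row(int acc)            /* acc: 32-bit pmaddwd sum */
;   {
;       return (short)((acc + RND) >> SHIFT);        /* round to nearest */
;   }
;
; Adding 1 << (SHIFT - 1) before the shift is exactly the bias that turns
; the truncating shift into round-to-nearest.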

one_corr		 dw 1, 1, 1, 1


r_frw_row		 dd RND_FRW_ROW, RND_FRW_ROW


tg_all_16		 dw 13036, 13036, 13036, 13036,		; tan(1*pi/16) * 2^16 + 0.5
				 dw 27146, 27146, 27146, 27146,		; tan(2*pi/16) * 2^16 + 0.5
				 dw -21746, -21746, -21746, -21746,	; tan(3*pi/16) * 2^16 + 0.5 (wrapped mod 2^16)
				 dw -19195, -19195, -19195, -19195,	; cos(4*pi/16) * 2^16 + 0.5 (wrapped mod 2^16)
				 dw 23170, 23170, 23170, 23170      ; cos(4*pi/16) * 2^15 + 0.5
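
; Note: these factors are meant for pmulhw, which computes (a * b) >> 16.
; tan(3*pi/16) and cos(pi/4) scaled by 2^16 exceed the signed 16-bit range,
; so they are stored wrapped modulo 2^16 (hence the negative values); after
; the multiply the code adds the multiplicand back in to compensate (see
; the paddsw following each tg_3_16 pmulhw below).  ocos_4_16 uses a 2^15
; scale instead, and its inputs get one extra pre-shift (SHIFT_FRW_COL+1).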


tab_frw_01234567 
				 ; row0
				 dw 16384, 16384, 21407, -8867,     ; w09 w01 w08 w00
				 dw 16384, 16384, 8867, -21407,     ; w13 w05 w12 w04
                 dw 16384, -16384, 8867, 21407,     ; w11 w03 w10 w02
                 dw -16384, 16384, -21407, -8867,   ; w15 w07 w14 w06
                 dw 22725, 12873, 19266, -22725,    ; w22 w20 w18 w16
                 dw 19266, 4520, -4520, -12873,     ; w23 w21 w19 w17
                 dw 12873, 4520, 4520, 19266,       ; w30 w28 w26 w24
                 dw -22725, 19266, -12873, -22725,  ; w31 w29 w27 w25

				 ; row1
                 dw 22725, 22725, 29692, -12299,    ; w09 w01 w08 w00
                 dw 22725, 22725, 12299, -29692,    ; w13 w05 w12 w04
                 dw 22725, -22725, 12299, 29692,    ; w11 w03 w10 w02
                 dw -22725, 22725, -29692, -12299,  ; w15 w07 w14 w06
                 dw 31521, 17855, 26722, -31521,    ; w22 w20 w18 w16
                 dw 26722, 6270, -6270, -17855,     ; w23 w21 w19 w17
                 dw 17855, 6270, 6270, 26722,       ; w30 w28 w26 w24
                 dw -31521, 26722, -17855, -31521,  ; w31 w29 w27 w25

				 ; row2
                 dw 21407, 21407, 27969, -11585,    ; w09 w01 w08 w00
                 dw 21407, 21407, 11585, -27969,    ; w13 w05 w12 w04
                 dw 21407, -21407, 11585, 27969,    ; w11 w03 w10 w02
                 dw -21407, 21407, -27969, -11585,	; w15 w07 w14 w06
                 dw 29692, 16819, 25172, -29692,    ; w22 w20 w18 w16
                 dw 25172, 5906, -5906, -16819,     ; w23 w21 w19 w17
                 dw 16819, 5906, 5906, 25172,       ; w30 w28 w26 w24
                 dw -29692, 25172, -16819, -29692,  ; w31 w29 w27 w25

				 ; row3
                 dw 19266, 19266, 25172, -10426,    ; w09 w01 w08 w00
                 dw 19266, 19266, 10426, -25172,    ; w13 w05 w12 w04
                 dw 19266, -19266, 10426, 25172,    ; w11 w03 w10 w02
                 dw -19266, 19266, -25172, -10426,  ; w15 w07 w14 w06 
                 dw 26722, 15137, 22654, -26722,    ; w22 w20 w18 w16
                 dw 22654, 5315, -5315, -15137,     ; w23 w21 w19 w17
                 dw 15137, 5315, 5315, 22654,       ; w30 w28 w26 w24
                 dw -26722, 22654, -15137, -26722,  ; w31 w29 w27 w25

				 ; row4
                 dw 16384, 16384, 21407, -8867,     ; w09 w01 w08 w00
                 dw 16384, 16384, 8867, -21407,     ; w13 w05 w12 w04
                 dw 16384, -16384, 8867, 21407,     ; w11 w03 w10 w02
                 dw -16384, 16384, -21407, -8867,   ; w15 w07 w14 w06
                 dw 22725, 12873, 19266, -22725,    ; w22 w20 w18 w16
                 dw 19266, 4520, -4520, -12873,     ; w23 w21 w19 w17
                 dw 12873, 4520, 4520, 19266,       ; w30 w28 w26 w24
                 dw -22725, 19266, -12873, -22725,  ; w31 w29 w27 w25 

				 ; row5
                 dw 19266, 19266, 25172, -10426,    ; w09 w01 w08 w00
                 dw 19266, 19266, 10426, -25172,    ; w13 w05 w12 w04
                 dw 19266, -19266, 10426, 25172,    ; w11 w03 w10 w02
                 dw -19266, 19266, -25172, -10426,  ; w15 w07 w14 w06
                 dw 26722, 15137, 22654, -26722,    ; w22 w20 w18 w16
                 dw 22654, 5315, -5315, -15137,     ; w23 w21 w19 w17
                 dw 15137, 5315, 5315, 22654,       ; w30 w28 w26 w24
                 dw -26722, 22654, -15137, -26722,  ; w31 w29 w27 w25

				 ; row6
                 dw 21407, 21407, 27969, -11585,    ; w09 w01 w08 w00
                 dw 21407, 21407, 11585, -27969,    ; w13 w05 w12 w04
                 dw 21407, -21407, 11585, 27969,    ; w11 w03 w10 w02
                 dw -21407, 21407, -27969, -11585,  ; w15 w07 w14 w06
                 dw 29692, 16819, 25172, -29692,    ; w22 w20 w18 w16
                 dw 25172, 5906, -5906, -16819,     ; w23 w21 w19 w17
                 dw 16819, 5906, 5906, 25172,       ; w30 w28 w26 w24
                 dw -29692, 25172, -16819, -29692,  ; w31 w29 w27 w25

				 ; row7
                 dw 22725, 22725, 29692, -12299,    ; w09 w01 w08 w00
                 dw 22725, 22725, 12299, -29692,    ; w13 w05 w12 w04
                 dw 22725, -22725, 12299, 29692,    ; w11 w03 w10 w02
                 dw -22725, 22725, -29692, -12299,  ; w15 w07 w14 w06
                 dw 31521, 17855, 26722, -31521,    ; w22 w20 w18 w16
                 dw 26722, 6270, -6270, -17855,     ; w23 w21 w19 w17
                 dw 17855, 6270, 6270, 26722,       ; w30 w28 w26 w24
                 dw -31521, 26722, -17855, -31521   ; w31 w29 w27 w25
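
; Note: tab_frw_01234567 holds one 64-byte block of 32 coefficients per
; input row, which is why TABLE_1..TABLE_7 step in 64-byte strides.  The
; wNN labels mark the operand pairs consumed by pmaddwd in the row pass
; (not shown in this part of the listing).  Rows 0/4, 1/7, 2/6 and 3/5
; carry identical blocks, reflecting the symmetry of the per-row scale
; factors.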


SECTION .text

ALIGN 16

cglobal fdct_mmx
;;void fdct_mmx(short *block);
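;;
;; Minimal C-side usage sketch (illustrative wrapper, not part of this file):
;;
;;   extern void fdct_mmx(short *block);   /* in-place 8x8 forward DCT  */
;;
;;   short block[64];                      /* row-major 8x8, 16-bit     */
;;   /* ... fill block ... */
;;   fdct_mmx(block);                      /* coefficients overwrite it */
;;
;; One cdecl argument: after the ebx push below, the block pointer is read
;; from [esp + 8].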
fdct_mmx:
	
	push ebx

    mov INP, dword [esp + 8]	; block

    mov TABLEF, tg_all_16
    mov OUT, INP
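
; First column pass over columns 0..3 (one quadword of each row): t0..t3
; are the symmetric sums, t4..t7 the differences; the even half produces
; y0/y2/y4/y6, the odd half y1/y3/y5/y7.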

    movq mm0, [x1]				; 0 ; x1

    movq mm1, [x6]				; 1 ; x6
    movq mm2, mm0				; 2 ; x1

    movq mm3, [x2]				; 3 ; x2
    paddsw mm0, mm1				; t1 = x[1] + x[6]

    movq mm4, [x5]				; 4 ; x5
    psllw mm0, SHIFT_FRW_COL	; t1

    movq mm5, [x0]				; 5 ; x0
    paddsw mm4, mm3				; t2 = x[2] + x[5]

    paddsw mm5, [x7]			; t0 = x[0] + x[7]
    psllw mm4, SHIFT_FRW_COL	; t2

    movq mm6, mm0				; 6 ; t1
    psubsw mm2, mm1				; 1 ; t6 = x[1] - x[6]

    movq mm1, [tg_2_16]			; 1 ; tg_2_16
    psubsw mm0, mm4				; tm12 = t1 - t2

    movq mm7, [x3]				; 7 ; x3
    pmulhw mm1, mm0				; tm12*tg_2_16

    paddsw mm7, [x4]			; t3 = x[3] + x[4]
    psllw mm5, SHIFT_FRW_COL	; t0

    paddsw mm6, mm4				; 4 ; tp12 = t1 + t2
    psllw mm7, SHIFT_FRW_COL	; t3

    movq mm4, mm5				; 4 ; t0
    psubsw mm5, mm7				; tm03 = t0 - t3

    paddsw mm1, mm5				; y2 = tm03 + tm12*tg_2_16
    paddsw mm4, mm7				; 7 ; tp03 = t0 + t3

    por mm1, qword [one_corr]	; correction y2 +0.5
    psllw mm2, SHIFT_FRW_COL+1	; t6

    pmulhw mm5, [tg_2_16]		; tm03*tg_2_16
    movq mm7, mm4				; 7 ; tp03

    psubsw mm3, [x5]			; t5 = x[2] - x[5]
    psubsw mm4, mm6				; y4 = tp03 - tp12

    movq [y2], mm1				; 1 ; save y2
    paddsw mm7, mm6				; 6 ; y0 = tp03 + tp12
     
    movq mm1, [x3]				; 1 ; x3
    psllw mm3, SHIFT_FRW_COL+1	; t5

    psubsw mm1, [x4]			; t4 = x[3] - x[4]
    movq mm6, mm2				; 6 ; t6
    
    movq [y4], mm4				; 4 ; save y4
    paddsw mm2, mm3				; t6 + t5

    pmulhw mm2, [ocos_4_16]		; tp65 = (t6 + t5)*cos_4_16
    psubsw mm6, mm3				; 3 ; t6 - t5

    pmulhw mm6, [ocos_4_16]		; tm65 = (t6 - t5)*cos_4_16
    psubsw mm5, mm0				; 0 ; y6 = tm03*tg_2_16 - tm12

    por mm5, qword [one_corr]	; correction y6 +0.5
    psllw mm1, SHIFT_FRW_COL	; t4

    por mm2, qword [one_corr]	; correction tp65 +0.5
    movq mm4, mm1				; 4 ; t4

    movq mm3, [x0]				; 3 ; x0
    paddsw mm1, mm6				; tp465 = t4 + tm65

    psubsw mm3, [x7]			; t7 = x[0] - x[7]
    psubsw mm4, mm6				; 6 ; tm465 = t4 - tm65

    movq mm0, [tg_1_16]			; 0 ; tg_1_16
    psllw mm3, SHIFT_FRW_COL	; t7

    movq mm6, [tg_3_16]			; 6 ; tg_3_16
    pmulhw mm0, mm1				; tp465*tg_1_16

    movq [y0], mm7				; 7 ; save y0
    pmulhw mm6, mm4				; tm465*tg_3_16

    movq [y6], mm5				; 5 ; save y6
    movq mm7, mm3				; 7 ; t7

    movq mm5, [tg_3_16]			; 5 ; tg_3_16
    psubsw mm7, mm2				; tm765 = t7 - tp65

    paddsw mm3, mm2				; 2 ; tp765 = t7 + tp65
    pmulhw mm5, mm7				; tm765*tg_3_16

    paddsw mm0, mm3				; y1 = tp765 + tp465*tg_1_16
    paddsw mm6, mm4				; tm465*tg_3_16

    pmulhw mm3, [tg_1_16]		; tp765*tg_1_16
 	
    por mm0, qword [one_corr]	; correction y1 +0.5
    paddsw mm5, mm7				; tm765*tg_3_16

    psubsw mm7, mm6				; 6 ; y3 = tm765 - tm465*tg_3_16
    add INP, 0x08				; advance to columns 4..7 (8 bytes = 4 words)

    movq [y1], mm0				; 0 ; save y1
    paddsw mm5, mm4				; 4 ; y5 = tm765*tg_3_16 + tm465

    movq [y3], mm7				; 7 ; save y3
    psubsw mm3, mm1				; 1 ; y7 = tp765*tg_1_16 - tp465

    movq [y5], mm5				; 5 ; save y5

    movq mm0, [x1]				; 0 ; x1

    movq [y7], mm3				; 3 ; save y7 (columns 0-3)

    movq mm1, [x6]				; 1 ; x6
    movq mm2, mm0				; 2 ; x1

    movq mm3, [x2]				; 3 ; x2
    paddsw mm0, mm1				; t1 = x[1] + x[6]

    movq mm4, [x5]				; 4 ; x5
    psllw mm0, SHIFT_FRW_COL	; t1

    movq mm5, [x0]				; 5 ; x0
    paddsw mm4, mm3				; t2 = x[2] + x[5]

    paddsw mm5, [x7]			; t0 = x[0] + x[7]
    psllw mm4, SHIFT_FRW_COL	; t2
	
    movq mm6, mm0				; 6 ; t1
    psubsw mm2, mm1				; 1 ; t6 = x[1] - x[6]

    movq mm1, [tg_2_16]			; 1 ; tg_2_16
    psubsw mm0, mm4				; tm12 = t1 - t2

    movq mm7, [x3]				; 7 ; x3
    pmulhw mm1, mm0				; tm12*tg_2_16

    paddsw mm7, [x4]			; t3 = x[3] + x[4]
    psllw mm5, SHIFT_FRW_COL	; t0

    paddsw mm6, mm4				; 4 ; tp12 = t1 + t2
    psllw mm7, SHIFT_FRW_COL	; t3

    movq mm4, mm5				; 4 ; t0
    psubsw mm5, mm7				; tm03 = t0 - t3

    paddsw mm1, mm5				; y2 = tm03 + tm12*tg_2_16
    paddsw mm4, mm7				; 7 ; tp03 = t0 + t3

    por mm1, qword [one_corr]	; correction y2 +0.5
