* ========================================================================= *
*   TEXAS INSTRUMENTS, INC.						    *
*									    *
*   NAME								    *
*	ycbcr422pl_to_rgb565 -- Planarized YCbCr 4:2:2/4:2:0 to 16-bit      *
*				RGB 5:6:5 color space conversion.	    *
*S									    *
*S  AUTHOR								    *
*S	Joseph Zbiciak  						    *
*S									    *
*S  REVISION HISTORY							    *
*S	25-Aug-2001 Initial complete version . . . . . . .  J. Zbiciak      *
*									    *
*   USAGE								    *
*	This function is C callable, and is called according to this	    *
*	C prototype:							    *
*									    *
*	void ycbcr422pl_to_rgb565					    *
*	(								    *
*	    const short 	coeff[5],  -- Matrix coefficients.	    *
*	    const unsigned char *y_data,   -- Luminance data  (Y')          *
*	    const unsigned char *cb_data,  -- Blue color-diff (B'-Y')	    *
*	    const unsigned char *cr_data,  -- Red color-diff  (R'-Y')	    *
*	    unsigned short	*rgb_data, -- RGB 5:6:5 packed pixel out.   *
*	    unsigned		num_pixels -- # of luma pixels to process.  *
*	)								    *
*									    *
*	The 'coeff[]' array contains the color-space-conversion matrix      *
*	coefficients.  The 'y_data', 'cb_data' and 'cr_data' pointers	    *
*	point to the separate input image planes.  The 'rgb_data' pointer   *
*	points to the output image buffer, and must be double-word aligned. *
*									    *
*	The kernel is designed to process arbitrary amounts of 4:2:2	    *
*	image data, although 4:2:0 image data may be processed as well.     *
*	For 4:2:2 input data, the 'y_data', 'cb_data' and 'cr_data'	    *
*	arrays may hold an arbitrary amount of image data.  For 4:2:0	    *
*	input data, only a single scan-line (or portion thereof) may be     *
*	processed at a time.						    *
*									    *
*	The coefficients in the coeff array must be in signed Q13 form.     *
*	These coefficients correspond to the following matrix equation:     *
*									    *
*	    [ coeff[0] 0.0000   coeff[1] ]   [ Y' -  16 ]     [ R']         *
*	    [ coeff[0] coeff[2] coeff[3] ] * [ Cb - 128 ]  =  [ G']         *
*	    [ coeff[0] coeff[4] 0.0000   ]   [ Cr - 128 ]     [ B']         *
*									    *
*   DESCRIPTION 							    *
*	This function runs for 46 + (num_pixels * 3) cycles, including      *
*	6 cycles of function-call overhead.  Interrupts are masked for      *
*	37 + (num_pixels * 3) cycles.  Code size is 960 bytes (see NOTES).  *
*									    *
*	This kernel performs Y'CbCr to RGB conversion.  From the Color      *
*	FAQ, http://home.inforamp.net/~poynton/ColorFAQ.html :  	    *
*									    *
*	    Various scale factors are applied to (B'-Y') and (R'-Y')	    *
*	    for different applications.  The Y'PbPr scale factors are	    *
*	    optimized for component analog video.  The Y'CbCr scaling	    *
*	    is appropriate for component digital video, JPEG and MPEG.      *
*	    Kodak's PhotoYCC(tm) uses scale factors optimized for the	    *
*	    gamut of film colors.  Y'UV scaling is appropriate as an	    *
*	    intermediate step in the formation of composite NTSC or PAL     *
*	    video signals, but is not appropriate when the components	    *
*	    are kept separate.  Y'UV nomenclature is now used rather        *
*	    loosely, and it sometimes denotes any scaling of (B'-Y')	    *
*	    and (R'-Y').  Y'IQ coding is obsolete.			    *
*									    *
*	This code can perform various flavors of Y'CbCr to RGB conversion   *
*	as long as the offsets on Y, Cb, and Cr are -16, -128, and -128,    *
*	respectively, and the coefficients match the pattern shown.	    *
*									    *
*	The kernel implements the following matrix form, which involves 5   *
*	unique coefficients:						    *
*									    *
*	    [ coeff[0] 0.0000	coeff[1] ]   [ Y' -  16 ]     [ R']	    *
*	    [ coeff[0] coeff[2] coeff[3] ] * [ Cb - 128 ]  =  [ G']	    *
*	    [ coeff[0] coeff[4] 0.0000   ]   [ Cr - 128 ]     [ B']	    *
*									    *
*									    *
*	Below are some common coefficient sets, along with the matrix	    *
*	equation that they correspond to.   Coefficients are in signed      *
*	Q13 notation, which gives a suitable balance between precision      *
*	and range.							    *
*									    *
*	1.  Y'CbCr -> RGB conversion with RGB levels that correspond to     *
*	    the 219-level range of Y'.  Expected ranges are [16..235] for   *
*	    Y' and [16..240] for Cb and Cr.				    *
*									    *
*	    coeff[] = { 0x2000, 0x2BDD, -0x0AC5, -0x1658, 0x3770 };	    *
*									    *
*	    [ 1.0000	0.0000    1.3707 ]   [ Y' -  16 ]     [ R']	    *
*	    [ 1.0000   -0.3365   -0.6982 ] * [ Cb - 128 ]  =  [ G']	    *
*	    [ 1.0000	1.7324    0.0000 ]   [ Cr - 128 ]     [ B']	    *
*									    *
*	2.  Y'CbCr -> RGB conversion with the 219-level range of Y'	    *
*	    expanded to fill the full RGB dynamic range.  (The matrix has   *
*	    been scaled by 255/219.)  Expected ranges are [16..235] for Y'  *
*	    and [16..240] for Cb and Cr.				    *
*									    *
*	    coeff[] = { 0x2543, 0x3313, -0x0C8A, -0x1A04, 0x408D };	    *
*									    *
*	    [ 1.1644	0.0000    1.5960 ]   [ Y' -  16 ]     [ R']	    *
*	    [ 1.1644   -0.3918   -0.8130 ] * [ Cb - 128 ]  =  [ G']	    *
*	    [ 1.1644	2.0172    0.0000 ]   [ Cr - 128 ]     [ B']	    *
*									    *
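*	The Q13 encoding of the coefficient sets above can be checked with
*	a short C sketch.  The helper names q13_to_double and
*	double_to_q13 are illustrative assumptions; they are not part of
*	the TI source.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of the TI kernel. */

/* Q13 fixed point: 13 fractional bits, so 1.0 is 0x2000 (8192). */
#define Q13_ONE 8192

/* Convert a signed Q13 coefficient to its real value. */
static double q13_to_double(int16_t q)
{
    return (double)q / Q13_ONE;
}

/* Convert a real value to the nearest signed Q13 coefficient.
 * Assumes |x| < 4.0 so that the result fits in 16 bits. */
static int16_t double_to_q13(double x)
{
    double scaled = x * Q13_ONE;
    return (int16_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

*	For example, q13_to_double(0x2BDD) is approximately 1.3707, the
*	Cr-to-R' coefficient of set 1.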
*	Other scalings of the color differences (B'-Y') and (R'-Y')	    *
*	(sometimes incorrectly referred to as U and V) are supported, as    *
*	long as the color differences are unsigned values centered around   *
*	128 rather than signed values centered around 0, as noted above.    *
*									    *
*	In addition to performing plain color-space conversion, color	    *
*	saturation can be adjusted by scaling coeff[1] through coeff[4].    *
*	Similarly, brightness can be adjusted by scaling coeff[0].	    *
*	General hue adjustment cannot be performed, however, due to the     *
*	two zeros hard-coded in the matrix.				    *
*									    *
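*	A minimal sketch of the saturation adjustment described above,
*	assuming the gain is supplied as a Q8 value (256 = 1.0) in the
*	range 0..65535.  The function name scale_saturation and the Q8
*	gain format are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the TI kernel.
 * Scales the four chroma coefficients coeff[1..4] by gain_q8 / 256
 * (so 256 leaves them unchanged) and leaves coeff[0] untouched.
 * Results are clamped to the signed 16-bit range.
 * Assumes 0 <= gain_q8 <= 65535 so the 32-bit product cannot overflow. */
static void scale_saturation(int16_t coeff[5], int32_t gain_q8)
{
    for (int i = 1; i < 5; i++) {
        int32_t v = ((int32_t)coeff[i] * gain_q8) / 256;
        if (v >  32767) v =  32767;
        if (v < -32768) v = -32768;
        coeff[i] = (int16_t)v;
    }
}
```

*	A gain of 0 removes all color (grayscale output), while scaling
*	coeff[0] in the same way would adjust brightness instead.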
*   TECHNIQUES  							    *
*	Pixel replication is performed implicitly on chroma data to	    *
*	reduce the total number of multiplies required.  The chroma	    *
*	portion of the matrix is calculated once for each Cb, Cr pair,      *
*	and the result is added to both Y' samples.			    *
*									    *
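*	The implicit chroma replication can be sketched in plain C as
*	follows.  This is a hypothetical reference model, not the TI
*	implementation: it truncates rather than rounds, ignores the
*	alignment and multiple-of-8 requirements, and assumes that red
*	occupies the top five bits of each RGB 5:6:5 pixel.

```c
#include <stdint.h>

/* Clamp a Q13 accumulator to an 8-bit channel value.  Negative sums
 * clamp to 0 before the shift, so no signed right shift is needed. */
static uint8_t q13_to_u8(int32_t acc)
{
    if (acc < 0) return 0;
    acc >>= 13;
    return (uint8_t)(acc > 255 ? 255 : acc);
}

/* Pack 8-bit R, G, B into one RGB 5:6:5 pixel (R in the top bits). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical reference model of the conversion (name is illustrative). */
static void ycbcr422pl_to_rgb565_ref(const int16_t coeff[5],
                                     const uint8_t *y_data,
                                     const uint8_t *cb_data,
                                     const uint8_t *cr_data,
                                     uint16_t *rgb_data,
                                     unsigned num_pixels)
{
    for (unsigned i = 0; i + 1 < num_pixels; i += 2) {
        int32_t cb = cb_data[i / 2] - 128;
        int32_t cr = cr_data[i / 2] - 128;

        /* Chroma part of the matrix: computed once per Cb, Cr pair. */
        int32_t r_c = coeff[1] * cr;
        int32_t g_c = coeff[2] * cb + coeff[3] * cr;
        int32_t b_c = coeff[4] * cb;

        /* The same chroma terms are added to both luma samples. */
        for (unsigned k = 0; k < 2; k++) {
            int32_t luma = coeff[0] * (y_data[i + k] - 16);
            rgb_data[i + k] = pack_rgb565(q13_to_u8(luma + r_c),
                                          q13_to_u8(luma + g_c),
                                          q13_to_u8(luma + b_c));
        }
    }
}
```

*	Each Cb, Cr pair costs four multiplies and each Y' sample one, so
*	a pixel pair needs six multiplies instead of the ten required
*	when the chroma terms are recomputed for every pixel.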
*	Matrix multiplication is performed with a combination of MPY2 and   *
*	DOTP2 instructions.  Saturation to 8 bits is performed by SPACKU4,  *
*	which takes four signed 16-bit values and saturates them to         *
*	unsigned 8-bit values.  The matrix products are naturally in Q13    *
*	format, which cannot be fed directly to SPACKU4: each product       *
*	would first need a left shift by 3 bits (Q13 to Q16), and that      *
*	many shifts would be expensive.  To avoid this bottleneck, the Y,   *
*	Cr and Cb data are shifted left by 3 before multiplication.  This   *
*	is possible because they are 8-bit unsigned data.  The products     *
*	are then in Q16 format, which can be fed directly to SPACKU4.       *
*									    *
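*	The effect of the pre-shift can be modeled in C.  The sketch below
*	assumes that "fed directly" means the integer result is taken
*	from the upper halfword of the 32-bit product, and it only
*	covers non-negative products.  All function names are
*	hypothetical; none of them is a TI intrinsic.

```c
#include <stdint.h>

/* Hypothetical model of one SPACKU4 lane: saturate a signed 16-bit
 * value to the unsigned 8-bit range. */
static uint8_t sat_s16_to_u8(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Direct form: the Q13 product is shifted left by 3 to reach Q16,
 * then the upper halfword is taken.  One extra shift per product. */
static int16_t mul_then_shift(uint8_t sample, int16_t coeff_q13)
{
    int32_t prod = (int32_t)sample * coeff_q13;        /* Q13 */
    return (int16_t)((prod << 3) >> 16);
}

/* Pre-shifted form: the sample is shifted left by 3 first, so the
 * product is already Q16 and no shift of the product is needed. */
static int16_t mul_preshift_q16(uint8_t sample, int16_t coeff_q13)
{
    int32_t prod = (int32_t)(sample << 3) * coeff_q13; /* Q16 */
    return (int16_t)(prod >> 16);
}
```

*	Both forms give the same result; the pre-shifted form moves the
*	shift from each product to each 8-bit input sample, which always
*	has room for the three extra bits within 16 bits.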
*	Because the loop accesses four different arrays at three different  *
*	strides, no memory accesses are allowed to parallelize in the	    *
*	loop.  No bank conflicts occur, as a result.			    *
*									    *
*	The epilog has been completely removed, while the prolog is left    *
*	as is. However, some cycles of the prolog are performed using the   *
*	kernel cycles to help reduce code-size. The setup code is merged    *
*	along with the prolog for speed.				    *
*									    *
*   ASSUMPTIONS 							    *
*	The number of luma samples to be processed must be a multiple of 8. *
*	The input Y array must be double-word aligned.                      *
*	The input Cr and Cb arrays must be word aligned.                    *
*	The output image must be double-word aligned.                       *
*									    *
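*	A caller could verify these assumptions before invoking the
*	kernel.  The helper below is a hypothetical sketch, not part of
*	the TI source; it takes "word" to mean 4 bytes and "double-word"
*	to mean 8 bytes, as on the C6000 family.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the TI source.  Returns 1 when the
 * arguments satisfy the assumptions listed above, 0 otherwise. */
static int ycbcr422pl_args_ok(const uint8_t *y_data,
                              const uint8_t *cb_data,
                              const uint8_t *cr_data,
                              const uint16_t *rgb_data,
                              unsigned num_pixels)
{
    if (num_pixels % 8 != 0)          return 0; /* multiple of 8 lumas */
    if ((uintptr_t)y_data   % 8 != 0) return 0; /* double-word Y       */
    if ((uintptr_t)cb_data  % 4 != 0) return 0; /* word-aligned Cb     */
    if ((uintptr_t)cr_data  % 4 != 0) return 0; /* word-aligned Cr     */
    if ((uintptr_t)rgb_data % 8 != 0) return 0; /* double-word output  */
    return 1;
}
```

*	Such a check is cheap enough to run once per frame or scan-line
*	in debug builds.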
*   NOTES								    *
*	No memory bank conflicts occur, since the three loads and two       *
*	stores happen in different cycles of the loop.                      *
*	                                                                    *
*	Code size is 960 bytes.  The increase in code size comes from       *
*	unrolling the loop so that each iteration processes 8 pixels.       *
*	                                                                    *
*	The kernel requires 3 words of stack space.                         *
*									    *
*   SOURCE								    *
*	Poynton, Charles et al.  "The Color FAQ,"  1999.		    *
*	    http://home.inforamp.net/~poynton/ColorFAQ.html		    *
*									    *
* ------------------------------------------------------------------------- *
*	      Copyright (c) 2001 Texas Instruments, Incorporated.	    *
*			     All Rights Reserved.			    *
* ========================================================================= *

       .global _ycbcr422pl16_to_rgb565_asm


* ======================================================================== *
*  End of file:  ycbcr422pl_to_rgb565_h.h64                                *
* ------------------------------------------------------------------------ *
*	     Copyright (c) 1999 Texas Instruments, Incorporated.	   *
*			    All Rights Reserved.			   *
* ======================================================================== *