📄 img_ycbcr422p_rgb565.h64
字号:
* Below are some common coefficient sets, along with the matrix *
* equation that they correspond to. Coefficients are in signed *
* Q13 notation, which gives a suitable balance between precision *
* and range. *
* *
* 1. Y'CbCr -> RGB conversion with RGB levels that correspond to *
* the 219-level range of Y'. Expected ranges are [16..235] for *
* Y' and [16..240] for Cb and Cr. *
* *
* coeff[] = { 0x2000, 0x2BDD, -0x0AC5, -0x1658, 0x3770 }; *
* *
* [ 1.0000 0.0000 1.3707 ] [ Y' - 16 ] [ R'] *
* [ 1.0000 -0.3365 -0.6982 ] * [ Cb - 128 ] = [ G'] *
* [ 1.0000 1.7324 0.0000 ] [ Cr - 128 ] [ B'] *
* *
* 2. Y'CbCr -> RGB conversion with the 219-level range of Y' *
* expanded to fill the full RGB dynamic range. (The matrix *
* has been scaled by 255/219.) Expected ranges are [16..235] *
* for Y' and [16..240] for Cb and Cr. *
* *
* coeff[] = { 0x2543, 0x3313, -0x0C8A, -0x1A04, 0x408D }; *
* *
* [ 1.1644 0.0000 1.5960 ] [ Y' - 16 ] [ R'] *
* [ 1.1644 -0.3918 -0.8130 ] * [ Cb - 128 ] = [ G'] *
* [ 1.1644 2.0172 0.0000 ] [ Cr - 128 ] [ B'] *
* *
* 3. Y'CbCr -> BGR conversion with RGB levels that correspond to *
* the 219-level range of Y'. This is equivalent to #1 above, *
* except that the R, G, and B output order in the packed *
* pixels is reversed. Note: The 'cr_data' and 'cb_data' *
* input arguments must be exchanged for this example as *
* indicated under USAGE above. *
* *
* coeff[] = { 0x2000, 0x3770, -0x1658, -0x0AC5, 0x2BDD }; *
* *
* [ 1.0000 0.0000 1.7324 ] [ Y' - 16 ] [ B'] *
* [ 1.0000 -0.6982 -0.3365 ] * [ Cr - 128 ] = [ G'] *
* [ 1.0000 1.3707 0.0000 ] [ Cb - 128 ] [ R'] *
* *
* 4. Y'CbCr -> BGR conversion with the 219-level range of Y' *
* expanded to fill the full RGB dynamic range. This is *
* equivalent to #2 above, except that the R, G, and B output *
* order in the packed pixels is reversed. Note: The *
* 'cr_data' and 'cb_data' input arguments must be exchanged *
* for this example as indicated under USAGE above. *
* *
* coeff[] = { 0x2000, 0x408D, -0x1A04, -0x0C8A, 0x3313 }; *
* *
* [ 1.0000 0.0000 2.0172 ] [ Y' - 16 ] [ B'] *
* [ 1.0000 -0.8130 -0.3918 ] * [ Cr - 128 ] = [ G'] *
* [ 1.0000 1.5960 0.0000 ] [ Cb - 128 ] [ R'] *
* *
* Other scalings of the color differences (B'-Y') and (R'-Y') *
* (sometimes incorrectly referred to as U and V) are supported, as *
* long as the color differences are unsigned values centered around *
* 128 rather than signed values centered around 0, as noted above. *
* *
* In addition to performing plain color-space conversion, color *
* saturation can be adjusted by scaling coeff[1] through coeff[4]. *
* Similarly, brightness can be adjusted by scaling coeff[0]. *
* General hue adjustment can not be performed, however, due to the *
* two zeros hard-coded in the matrix. *
* *
* TECHNIQUES *
* Pixel replication is performed implicitly on chroma data to *
* reduce the total number of multiplies required. The chroma *
* portion of the matrix is calculated once for each Cb, Cr pair, *
* and the result is added to both Y' samples. *
* *
* Matrix Multiplication is performed as a combination of MPY2s and *
* DOTP2s. Saturation to 8bit values is performed using SPACKU4 *
* which takes in 4 signed 16-bit values and saturates them to *
* unsigned 8-bit values. The output of Matrix Multiplication would *
* ideally be in a Q13 format. This however, cannot be fed directly *
* to SPACKU4. *
* *
* This implies a shift left by 3 bits, which could be pretty *
* expensive in terms of the number of shifts to be performed. Thus, *
* to avoid being bottlenecked by so many shifts, the Y, Cr & Cb data *
* are shifted left by 3 before multiplication. This is possible *
* because they are 8-bit unsigned data. Due to this, the output of *
* Matrix Multiplication is in a Q16 format, which can be directly *
* fed to SPACKU4. *
* *
* Because the loop accesses four different arrays at three *
* different strides, no memory accesses are allowed to parallelize *
* in the loop. No bank conflicts occur, as a result. *
* *
* The epilog has been completely removed, while the prolog is left *
* as is. However, some cycles of the prolog are performed using the *
* kernel cycles to help reduce code-size. The setup code is merged *
* along with the prolog for speed. *
* *
* ASSUMPTIONS *
* The number of luma samples to be processed needs to be a multiple *
* of 8. *
* The input Y array needs to be double-word aligned. *
* The input Cr and Cb arrays need to be word aligned *
* The output image must be double-word aligned. *
* *
* NOTES *
* No bank conflicts occur. *
* *
* Codesize is 952 bytes. *
* *
* Memory bank conflicts will not occurs since the 3 loads and two *
* stores happen in different cycles of the loop *
* *
* The kernel requires 3 words of stack space. *
* *
* CYCLES *
* 12 * num_pixels/8 + 50 *
* *
* CODESIZE *
* 952 bytes *
* *
* SOURCE *
* Poynton, Charles et al. "The Color FAQ," 1999. *
* http://home.inforamp.net/~poynton/ColorFAQ.html *
* ------------------------------------------------------------------------- *
* Copyright (c) 2003 Texas Instruments, Incorporated. *
* All Rights Reserved. *
* ========================================================================= *
.global _IMG_ycbcr422p_rgb565
* ========================================================================= *
* End of file: img_ycbcr422p_rgb565.h64 *
* ------------------------------------------------------------------------- *
* Copyright (c) 2003 Texas Instruments, Incorporated. *
* All Rights Reserved. *
* ========================================================================= *
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -