VP3 Bitstream Format and Decoding Process
by Mike Melanson (mike at multimedia.cx)
v0.5: December 8, 2004

[December 8, 2004: Note that this document is not complete and likely
will never be completed. However, it helped form the basis of the
Theora I specification available at
  http://www.theora.org/doc/Theora_I_spec.pdf ]

Contents
--------
 * Introduction
 * Underlying Coding Concepts
 * VP3 Coding Overview
 * VP3 Chunk Format
 * Decoding The Frame Header
 * Initializing The Quantization Matrices
 * Hilbert Coding Pattern
 * Unpacking The Block Coding Information
 * Unpacking The Macroblock Coding Mode Information
 * Unpacking The Macroblock Motion Vectors
 * Unpacking The DCT Coefficients
 * Reversing The DC Prediction
 * Reconstructing The Frame
 * Theora Specification
 * Appendix A: Quantization Matrices And Scale Factors
 * Appendix B: Macroblock Coding Mode Alphabets
 * Appendix C: DCT Coefficient VLC Tables
 * Appendix D: The VP3 IDCT
 * Acknowledgements
 * References
 * Changelog

Introduction
------------
A company named On2 (http://www.on2.com) created a video codec named
VP3. Eventually, they decided to open source it. Like any body of code
that was produced on a deadline, the source code was not particularly
clean or well-documented. This makes it difficult to understand the
fundamental operation of the codec.

This document describes the VP3 bitstream format and decoding process at
a higher level than source code.

Underlying Coding Concepts
--------------------------
In order to understand the VP3 coding method it is necessary to
understand the individual steps in the process. Like many multimedia
compression algorithms VP3 does not consist of a single coding method.
Rather, it uses a chain of methods to achieve compression.

If you are acquainted with the MPEG video clique then many of VP3's
coding concepts should look familiar as well.
What follows is a list of the coding methods used in VP3 and a brief
description of each.

* Discrete Cosine Transform (DCT): This is a magical mathematical
function that takes a group of numbers and turns it into another group
of numbers. The transformed group of numbers exhibits some curious
properties. Notably, larger numbers are concentrated in certain areas of
the transformed group.

A video codec like VP3 often operates on 8x8 blocks of numbers. When
these 8x8 blocks are transformed using a DCT the larger numbers occur
mostly in the upper-left area of the block with the largest number
occurring as the first in the block (upper-left corner). This number is
called the DC coefficient. The other 63 numbers are called the AC
coefficients.

The DCT and its opposite operation, the inverse DCT, require a lot of
multiplications. Much research and experimentation is focused on
optimizing this phase of the coding/decoding process.

* Quantization: This coding step tosses out information by essentially
dividing a number to be coded by a factor and throwing away the
remainder. The inverse process (dequantization) involves multiplying by
the same factor to obtain a number that is close enough to the original.

* Run Length Encoding (RLE): The concept behind RLE is to shorten runs
of numbers that are the same. For example, the string "88888" is encoded
as (5, 8), indicating a run of 5 '8' numbers. In VP3 (and MPEG/JPEG),
RLE is used to record the number of zero-value coefficients that occur
before a non-zero coefficient. For example:

  0 0 0 0 5 0 2 0 0 0 9

is encoded as:

  (4, 5), (1, 2), (3, 9)

This indicates that a run of 4 zeroes is followed by a coefficient of 5;
then a run of 1 zero is followed by 2; then a run of 3 zeroes is
followed by 9.

* Zigzag Ordering: After transforming and quantizing a block of samples,
the samples are not in an optimal order for run length encoding.
Zigzag ordering rearranges the samples to put more zeros between
non-zero samples.

* Differential (or Delta) Pulse Code Modulation (DPCM): 1 + 1 = 2. Got
that? Seriously, that is what DPCM means. Rather than encoding absolute
values, encode the differences between successive values. For example:

  82 84 81 80 86 88 85

can be delta-encoded as:

  82 +2 -3 -1 +6 +2 -3

Most of the numbers turn into smaller numbers which require less
information to encode.

* Motion Compensation: Simply, this coding method specifies that a block
from a certain position in the previous frame is to be copied into a new
position in the current frame. This technique is often combined with DCT
and DPCM coding, as well as fractional pixel motion.

* Entropy Coding (a.k.a. Huffman Coding): This is the process of coding
frequently occurring symbols with fewer bits than symbols that are not
likely to occur as frequently.

* Variable Length Run Length Booleans: An initial Boolean bit is
extracted from the bitstream. A variable length code (VLC) is extracted
from the bitstream and converted to a count. This count indicates that
the next (count) elements are to be set to the Boolean value.
Afterwards, the Boolean value is toggled, the next VLC is extracted and
converted to a count, and the process continues until all elements are
set to either 0 or 1.

* YUV Colorspace: Like many modern video codecs, VP3 operates on a YUV
colorspace rather than an RGB colorspace. Specifically, VP3 uses YUV
4:2:0, alias YUV420P, YV12. Note: Throughout the course of this
document, the U and V planes (a.k.a., Cb and Cr planes) will be
collectively referred to as C planes (color or chrominance planes).

* Frame Types: VP3 has intra-coded frames, a.k.a. intraframes, I-frames,
or keyframes. VP3 happens to call these golden frames. VP3 has
interframes, a.k.a. predicted frames or P-frames.
These frames can use information from either the previous interframe or
from the previous golden frame.

VP3 Overview
------------
The first thing to understand about the VP3 coding method is that it
encodes all 3 planes upside down. That is, the data is encoded from
bottom-to-top rather than top-to-bottom as is done with many video
codecs.

VP3 codes a video frame by first breaking each of the 3 planes (Y, U,
and V) into a series of 8x8 blocks called fragments. VP3 also has a
notion of superblocks. Superblocks encapsulate 16 fragments arranged in
a 4x4 matrix. Each plane has its own set of superblocks. Further, VP3
also uses the notion of macroblocks which is the same as that found in
JPEG/MPEG. One macroblock encompasses 4 blocks from the Y plane arranged
in a 2x2 matrix, 1 block from the U plane, and 1 block from the V plane.
While a fragment or a superblock applies to 1 and only 1 plane, a
macroblock extends over all 3 planes.

VP3 compresses golden frames by transforming each fragment with a
discrete cosine transform. Each transformed sample is then quantized and
the DC coefficient is reduced via DPCM using a combination of DC
coefficients from surrounding fragments as predictors. Then, each
fragment's DC coefficient is entropy-coded in the output bitstream,
followed by each fragment's first AC coefficient, then each second AC
coefficient, and so on.

An interframe, naturally, is more complicated. While there is only one
coding mode available for a golden frame (intra coding), there are 8
coding modes that the VP3 coder can choose from for interframe
macroblocks. Intra coding as seen in the keyframe is still available.
The rest of the modes involve encoding a fragment diff, either from the
previous frame or the golden frame, from the same coordinate or from the
same coordinate plus a motion vector.
All of the macroblock coding modes and motion vectors are encoded in an
interframe bitstream.

VP3 Chunk Format
----------------
The high-level format of a compressed VP3 frame is laid out as:

 * chunk header
 * block coding information
 * macroblock coding mode information
 * motion vectors
 * DC coefficients
 * 1st AC coefficients
 * 2nd AC coefficients
 * ...
 * 63rd AC coefficients

Decoding The Frame Header
-------------------------
The chunk header always contains at least 1 byte which has the following
format:

  bit 7: 0 = golden frame, 1 = interframe
  bit 6: unused
  bits 5-0: Quality index (0..63)

Further, if the frame is a golden frame, there are 2 more bytes in the
header:

  byte 0: version byte 0
  byte 1:
    bits 7-3: VP3 version number (stored)
    bit 2:    key frame coding method (0 = DCT key frame, only type
              supported)
    bits 1-0: unused, spare bits

All frame headers are encoded with a quality index. This 6-bit value is
used to index into 2 dequantizer scaling tables, 1 for DC values and 1
for AC values. Each of the 3 dequantization tables is modified per these
scaling values.

Initializing The Quantization Matrices
--------------------------------------
VP3 has three static matrices for quantizing and dequantizing fragments.
One matrix is for quantizing golden frame Y fragments, one matrix is for
quantizing golden frame C fragments, and one matrix is for quantizing
both golden frame and interframe Y or C fragments. While these matrices
are static, they are adjusted according to quality index coded in the
header.

The quality index is an index into 2 64-element tables:
dc_scale_factor[] and ac_scale_factor[].
Each quantization factor from each of the three quantization matrices is
adjusted by the appropriate scale factor according to this formula:

                base quantizer * scale factor
  quantizer  =  -----------------------------
                            100

  where scale factor =
    dc_scale_factor[quality_index] for DC dequantizer
    ac_scale_factor[quality_index] for AC dequantizer

The quantization matrices need to be recalculated at the beginning of a
frame decode if the current frame's quality index is different from the
previous frame's quality index.

See Appendix A for the complete VP3 quantization matrices and scale
factor tables.

As an example, this is the base quantization matrix for golden frame Y
fragments:

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  58  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

Suppose a particular coded frame specifies a quality index of 54.
Element 54 of the dc_scale_factor table is 20, thus (discarding the
fractional part of the division):

                               16 * 20
  DC coefficient quantizer  =  -------  =  3
                                 100

Element 54 of the ac_scale_factor table is 24. The AC coefficient
quantizers are each scaled using this factor, e.g.:

    11 * 24
    -------   =  2
      100

    100 * 24
    --------  =  24
      100

[not complete; still need to explain how these quantizers are saturated
and scaled with respect to the DCT process]

Hilbert Coding Pattern
----------------------
VP3 uses a Hilbert pattern to code fragments within a superblock.
A Hilbert pattern is a recursive pattern that can grow quite
complicated. The coding pattern that VP3 uses is restricted to this
pattern subset, where each fragment in a superblock is represented by
an 'X':

      X -> X    X -> X
           |    ^
           v    |
      X <- X    X <- X
      |              ^
      v              |
      X    X -> X    X
      |    ^    |    ^
      v    |    v    |
      X -> X    X -> X

As an example of this pattern, consider a plane that is 256 samples wide
and 64 samples high. Each fragment row will be 32 fragments wide. The
first superblock in the plane will be comprised of these 16 fragments:

   0   1   2   3  ...  31
  32  33  34  35  ...  63
  64  65  66  67  ...  95
  96  97  98  99  ... 127

The order in which these 16 fragments are coded is:

   0 |  0  1  14 15
  32 |  3  2  13 12
  64 |  4  7   8 11
  96 |  5  6   9 10

All of the image coding information, including the block coding status
and modes, the motion vectors, and the DCT coefficients, is coded and
decoded using this pattern. Thus, it is rather critical to have the
pattern and all of its corner cases handled correctly. In the above
example, if the bottom row and right column were not present due to the
superblock being in a corner, the pattern proceeds as if the missing
fragments were present, but the missing fragments are omitted in the
final coding list. The coding order would be:

  0, 1, 2, 3, 4, 7, 8, 13, 14
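
The traversal above can be sketched in a few lines of Python. This is
only an illustration, not code from the VP3 source: the table
HILBERT_STEPS and the functions superblock_fragment_order() and
surviving_steps() are hypothetical names, and the sketch assumes that
row 0 of the diagram is the first fragment row of the plane and that
fragments are numbered in raster order as in the example.

```python
# Hypothetical illustration of the superblock coding pattern described above.
# Each entry is the (row, column) inside a 4x4 superblock visited at that
# coding step, read off the arrow diagram: step 0 is the top-left fragment.
HILBERT_STEPS = [
    (0, 0), (0, 1), (1, 1), (1, 0),
    (2, 0), (3, 0), (3, 1), (2, 1),
    (2, 2), (3, 2), (3, 3), (2, 3),
    (1, 3), (1, 2), (0, 2), (0, 3),
]


def superblock_fragment_order(sb_row, sb_col, frag_rows, frag_cols):
    """Return the raster fragment indices of one superblock in coding order.

    frag_rows and frag_cols are the plane dimensions measured in 8x8
    fragments. Positions that fall outside the plane are skipped, but the
    walk itself is never altered.
    """
    order = []
    for d_row, d_col in HILBERT_STEPS:
        row = sb_row * 4 + d_row
        col = sb_col * 4 + d_col
        if row < frag_rows and col < frag_cols:
            order.append(row * frag_cols + col)
    return order


def surviving_steps(rows_present, cols_present):
    """Return the coding steps that remain when a superblock is clipped to
    its first rows_present rows and first cols_present columns."""
    return [
        step
        for step, (row, col) in enumerate(HILBERT_STEPS)
        if row < rows_present and col < cols_present
    ]
```

For the 256x64 plane in the example (32 fragments wide, 8 fragments
high), superblock_fragment_order(0, 0, 8, 32) visits fragments 0, 1, 33,
32, 64, 96, ... in the same order as the table above. Keeping only the
first three rows and first three columns of the diagram,
surviving_steps(3, 3) yields 0, 1, 2, 3, 4, 7, 8, 13, 14, which is the
clipped coding list given in the text.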
