\section{Coding Modes and Prediction}

Each block is coded using one of a small, fixed set of \term{coding modes} that define how the block is predicted from previous frames.
A block is predicted using one of two \term{reference frames}, selected according to the coding mode.
A reference frame is the fully decoded version of a previous frame in the stream.
The first available reference frame is the previous intra frame, called the \term{golden frame}.
The second available reference frame is the previous frame, whether it was an intra frame or an inter frame.
If the previous frame was an intra frame, then both reference frames are the same.
See Figure~\ref{fig:reference-frames} for an illustration of the reference frames used for an inter frame that does not follow an intra frame.

\begin{figure}[htbp]
\begin{center}
\includegraphics{reference-frames}
\end{center}
\caption{Example of reference frames for an inter frame}
\label{fig:reference-frames}
\end{figure}

Two coding modes in particular are worth mentioning here.
The INTRA mode is used for blocks that are not predicted from either reference frame.
This is the only coding mode allowed in intra frames.
The INTER\_NOMV coding mode uses the co-located contents of the block in the previous frame as the predictor.
This is the default coding mode.

\section{DCT Coefficients}
\label{sec:dct-coeffs}

A \term{residual} is added to the predicted contents of a block to form the final reconstruction.
The residual is stored as a set of quantized coefficients from an integer approximation of a two-dimensional Type II Discrete Cosine Transform.
The DCT takes an $8\times 8$ array of pixel values as input and returns an $8\times 8$ array of coefficient values.
The \term{natural ordering} of these coefficients is defined to be row-major order, from lowest to highest frequency.
They are also often indexed in \term{zig-zag order}, as shown in
Figure~\ref{tab:zig-zag}.

\begin{figure}[htbp]
\begin{center}
\begin{tabular}[c]{rr|c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c@{}c}
 &\multicolumn{1}{r}{} & && &&&&&$c$&&& && && \\
 &\multicolumn{1}{r}{} &0&&1&&2&&3&&4&&5&&6&&7 \\
\cline{3-17}
 &0 & 0 &$\rightarrow$& 1 && 5 &$\rightarrow$& 6 && 14 &$\rightarrow$& 15 && 27 &$\rightarrow$& 28 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &1 & 2 & & 4 && 7 & & 13 && 16 & & 26 && 29 & & 42 \\[-0.5\defaultaddspace]
 & &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &2 & 3 & & 8 && 12 & & 17 && 25 & & 30 && 41 & & 43 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &3 & 9 & & 11 && 18 & & 24 && 31 & & 40 && 44 & & 53 \\[-0.5\defaultaddspace]
$r$&&$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &4 & 10 & & 19 && 23 & & 32 && 39 & & 45 && 52 & & 54 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &5 & 20 & & 22 && 33 & & 38 && 46 & & 51 && 55 & & 60 \\[-0.5\defaultaddspace]
 & &$\downarrow$&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&&$\swarrow$&&$\nearrow$&$\downarrow$ \\
 &6 & 21 & & 34 && 37 & & 47 && 50 & & 56 && 59 & & 61 \\[-0.5\defaultaddspace]
 & & &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$&&$\nearrow$& &$\swarrow$& \\
 &7 & 35 &$\rightarrow$& 36 && 48 &$\rightarrow$& 49 && 57 &$\rightarrow$& 58 && 62 &$\rightarrow$& 63
\end{tabular}
\end{center}
\caption{Zig-zag order}
\label{tab:zig-zag}
\end{figure}

\begin{verse}
{\bf Note:} the row and column indices refer to {\em frequency number} and not pixel locations.
The frequency numbers are defined independently of the memory organization of the pixels.
They have been written from top to bottom here to follow
conventional notation, despite the right-handed coordinate system Theora uses for pixel locations.
%RG: I'd rather we were internally consistent and put dc at the lower left.
Many implementations of the DCT operate `in-place'.
That is, they return DCT coefficients in the same memory buffer that the initial pixel values were stored in.
Due to the right-handed coordinate system used for pixel locations in Theora, one must note carefully how both pixel values and DCT coefficients are organized in memory in such a system.
\end{verse}

DCT coefficient $(0,0)$ is called the \term{DC coefficient}.
All the other coefficients are called \term{AC coefficients}.

\chapter{Decoding Overview}

This section provides a high-level description of the Theora codec's construction.
A bit-by-bit specification appears beginning in Section~\ref{sec:bitpacking}.
The later sections assume a high-level understanding of the Theora decode process, which is provided below.

\section{Decoder Configuration}

Decoder setup consists of configuration of the quantization matrices and the Huffman codebooks for the DCT coefficients, and a table of limit values for the deblocking filter.
The remainder of the decoding pipeline is not configurable.

\subsection{Global Configuration}

The global codec configuration consists of a few video related fields, such as frame rate, frame size, picture size and offset, aspect ratio, color space, pixel format, and a version number.
The version number is divided into a major version, a minor version, and a minor revision number.
%r: afaik the released vp3 codec called itself 3.1 and is compatible w/ theora
%r: even though we received the in-progress 3.2 codebase
For the format defined in this specification, these are `3', `2', and `0', respectively, in reference to Theora's origin as a successor to the VP3.1 format.

\subsection{Quantization Matrices}

Theora allows up to 384 different quantization matrices to be defined, one for each \term{quantization type}, \term{color plane} ($Y'$, $C_b$, or
$C_r$), and \term{quantization index}, \qi, which ranges from zero to 63, inclusive.
There are currently two quantization types defined, which depend on the coding mode of the block being dequantized, as shown in Table~\ref{tab:quant-types}.

\begin{table}[htbp]
\begin{center}
\begin{tabular}{cl}\toprule
Quantization Type & Usage \\\midrule
$0$ & INTRA-mode blocks \\
$1$ & Blocks in any other mode. \\
\bottomrule\end{tabular}
\end{center}
\caption{Quantization Type Indices}
\label{tab:quant-types}
\end{table}

%r: I think 'nominally' is more specific than 'generally' here
The quantization index, on the other hand, nominally represents a progressive range of quality levels, from low quality near zero to high quality near 63.
However, the interpretation is arbitrary, and it is possible, for example, to partition the scale into two completely separate ranges with 32 levels each that are meant to represent different classes of source material, or any other arrangement that suits the encoder's requirements.
Each quantization matrix is an $8\times 8$ matrix of 16-bit values, which is used to quantize the output of the $8\times 8$ DCT\@.
Quantization matrices are specified using three components: a \term{base matrix} and two \term{scale values}.
The first scale value is the \term{DC scale}, which is applied to the DC component of the base matrix.
The second scale value is the \term{AC scale}, which is applied to all the other components of the base matrix.
There are 64 DC scale values and 64 AC scale values, one for each \qi\ value.
There are 64 elements in each base matrix, one for each DCT coefficient.
They are stored in natural order (cf. Section~\ref{sec:dct-coeffs}).
There is a separate set of base matrices for each quantization type and each color plane, with up to 64 possible base matrices in each set, one for each \qi\ value.
%r: we will mention that the given matrices must bound the \qi range
%r: in the detailed section. it's not important at this level.
Typically the bitstream contains matrices for only a sparse subset of the possible \qi\ values.
The base matrices for the remainder of the \qi\ values are computed using linear interpolation.
This configuration allows the encoder to adjust the quantization matrices to approximate the complex, non-linear response of the human visual system to different quantization errors.
Finally, because the in-loop deblocking filter strength depends on the strength of the quantization matrices defined in this header, a table of 64 \term{loop filter limit values} is defined, one for each \qi\ value.
The precise specification of how all of this information is decoded appears in Section~\ref{sub:loop-filter-limits} and Section~\ref{sub:quant-params}.

\subsection{Huffman Codebooks}

Theora uses 80 configurable binary Huffman codes to represent the 32 tokens used to encode DCT coefficients.
Each of the 32 token values has a different semantic meaning and is used to represent single coefficient values, zero runs, combinations of the two, and \term{End-Of-Block markers}.
The 80 codes are divided up into five groups of 16, with each group corresponding to a set of DCT coefficient indices.
The first group corresponds to the DC coefficient, while the remaining four groups correspond to different subsets of the AC coefficients.
Within each frame, two pairs of 4-bit codebook indices are stored.
The first pair selects which codebooks to use from the DC coefficient group for the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
The second pair selects which codebooks to use from {\em all four} of the AC coefficient groups for the $Y'$ coefficients and the $C_b$ and $C_r$ coefficients.
The precise specification of how the codebooks are decoded appears in Section~\ref{sub:huffman-tables}.

\section{High-Level Decode Process}

\subsection{Decoder Setup}

Before decoding can begin, a decoder MUST be initialized using the bitstream headers corresponding to the stream to be
decoded.
Theora uses three header packets; all are required, in order, by this specification.
Once set up, decode may begin at any intra-frame packet---or even inter-frame packets, provided the appropriate reference frames have already been decoded and cached---belonging to the Theora stream.
In Theora I, all packets after the three initial headers are intra-frame or inter-frame packets.
The header packets are, in order, the identification header, the comment header, and the setup header.

\paragraph{Identification Header}
The identification header identifies the stream as Theora, provides a version number, and defines the characteristics of the video stream such as frame size.
A complete description of the identification header appears in Section~\ref{sec:idheader}.

\paragraph{Comment Header}
The comment header includes user text comments (`tags') and a vendor string for the application/library that produced the stream.
The format of the comment header is the same as that used in the Vorbis I and Speex codecs, with slight modifications due to the use of a different bit packing mechanism.
A complete description of how the comment header is coded appears in Section~\ref{sec:commentheader}, along with a suggested set of tags.

\paragraph{Setup Header}
The setup header includes extensive codec setup information, including the complete set of quantization matrices and Huffman codebooks needed to decode the DCT coefficients.
A complete description of the setup header appears in Section~\ref{sec:setupheader}.

\subsection{Decode Procedure}

The decoding and synthesis procedure for all video packets is fundamentally the same, with some steps omitted for intra frames.
\begin{itemize}
\item Decode packet type flag.
\item Decode frame header.
\item Decode coded block information (inter frames only).
\item Decode macro block mode information (inter frames only).
\item Decode motion vectors (inter frames only).
\item Decode block-level \qi\ information.
\item Decode DC coefficient for each coded
block.
\item Decode 1st AC coefficient for each coded block.
\item Decode 2nd AC coefficient for each coded block.
\item $\ldots$
\item Decode 63rd AC coefficient for each coded block.
\item Perform DC coefficient prediction.
\item Reconstruct coded blocks.
\item Copy uncoded blocks.
\item Perform loop filtering.
\end{itemize}

\begin{verse}
{\bf Note:} clever rearrangement of the steps in this process is possible.
As an example, in a memory-constrained environment, one can make multiple passes through the DCT coefficients to avoid buffering them all in memory.
On the first pass, the starting location of each coefficient is identified, and then 64 separate get pointers are used to read in the 64 DCT coefficients required to reconstruct each coded block in sequence.
This operation produces entirely equivalent output and is naturally perfectly legal.
It may even be a benefit in non-memory-constrained environments due to a reduced cache footprint.
\end{verse}

Theora makes equivalence easy to check by defining all decoding operations in terms of exact integer operations.
No floating-point math is required, and in particular, the implementation of the iDCT transform MUST be followed precisely.
This prevents the decoder mismatch problem commonly associated with codecs that provide a less rigorous transform specification.
Such a mismatch problem would be devastating to Theora, since a single rounding error in one frame could propagate throughout the entire succeeding frame due to DC prediction.

\paragraph{Packet Type Decode}
Theora I uses four packet types.
The first three packet types mark each of the three Theora headers described above.
The fourth packet type marks a video packet.
All other packet types are reserved; packets marked with a reserved type should be ignored.
Additionally, zero-length packets are treated as if they were an inter frame with no blocks coded; that is, as a duplicate frame.

\paragraph{Frame Header Decode}
The frame header contains some global information about the current frame.
The first is the frame type field, which specifies if this is an intra frame or an inter frame.
Inter frames predict their contents from previously decoded reference frames.
Intra frames can be independently decoded with no established reference frames.
The next piece of information in the frame header is the list of \qi\ values allowed in the frame.
Theora allows from one to three different \qi\ values to be used in a single frame, each of which selects a set of six quantization matrices, one for each combination of quantization type (inter or intra) and color plane.
The first \qi\ value is {\em always} used when dequantizing DC coefficients.
The \qi\ value used when dequantizing AC coefficients, however, can vary from block to block.
VP3, in contrast, only allows a single \qi\ value per frame for both the DC and AC coefficients.
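
As a rough illustration of the \qi\ selection described above, the following C sketch shows one way a decoder might hold the per-frame \qi\ list and locate one of the 384 quantization matrices.
The type \texttt{frame\_header}, the helper names, and the flat $2\times 3\times 64$ index layout are assumptions made for this example only; none of them is defined by this specification.

```c
#include <assert.h>
#include <stddef.h>

/* Quantization types and color planes, numbered as in the text above. */
enum { QTI_INTRA = 0, QTI_INTER = 1 };
enum { PLI_Y = 0, PLI_CB = 1, PLI_CR = 2 };

/* Hypothetical container for the frame header fields discussed above. */
typedef struct {
    int is_inter;          /* frame type: 0 = intra frame, 1 = inter frame */
    int nqis;              /* number of qi values in this frame, 1 to 3    */
    unsigned char qis[3];  /* the qi values themselves, each in 0..63      */
} frame_header;

/* Flat index into an assumed table of 2 * 3 * 64 = 384 matrices,
 * ordered by quantization type, then color plane, then qi. */
static size_t quant_matrix_index(int qti, int pli, int qi)
{
    assert(qti >= 0 && qti < 2);
    assert(pli >= 0 && pli < 3);
    assert(qi >= 0 && qi < 64);
    return ((size_t)qti * 3 + (size_t)pli) * 64 + (size_t)qi;
}

/* DC coefficients always use the first qi value of the frame. */
static int dc_qi(const frame_header *hdr)
{
    assert(hdr->nqis >= 1 && hdr->nqis <= 3);
    return hdr->qis[0];
}

/* AC coefficients use a per-block choice among the frame's qi values;
 * qii is the block's index into that list. */
static int ac_qi(const frame_header *hdr, int qii)
{
    assert(qii >= 0 && qii < hdr->nqis);
    return hdr->qis[qii];
}
```

With this layout, an inter-coded $C_r$ block whose AC \qi\ is 63 would use matrix index 383, the last entry of the assumed table, while its DC coefficient would still be dequantized with the matrix selected by the first \qi\ value of the frame.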