The details of how individual blocks are organized and how DCT coefficients
 are stored in the bitstream differ substantially from these codecs, however.
Theora supports only intra frames (I frames in MPEG) and inter frames (P frames
 in MPEG).
There is no equivalent to the bi-predictive frames (B frames) found in MPEG
 codecs.

\section{Assumptions}

The Theora codec design assumes a complex, psychovisually-aware encoder and a
 simple, low-complexity decoder.
%TODO: Talk more about implementation complexity.

Theora provides none of its own framing, synchronization, or protection against
 transmission errors.
An encoder is solely a method of accepting input video frames and compressing
 these frames into raw, unformatted `packets'.
The decoder then accepts these raw packets in sequence, decodes them, and
 synthesizes a facsimile of the original video frames.
Theora is a free-form variable bit rate (VBR) codec, and packets have no
 minimum size, maximum size, or fixed/expected size.

Theora packets are thus intended to be used with a transport mechanism that
 provides free-form framing, synchronization, positioning, and error correction
 in accordance with these design assumptions, such as Ogg (for file transport)
 or RTP (for network multicast).
For the purposes of a few examples in this document, we will assume that Theora
 is embedded in an Ogg stream specifically, although this is by no means a
 requirement or fundamental assumption in the Theora design.

The specification for embedding Theora into an Ogg transport stream is given in
 Appendix~\ref{app:oggencapsulation}.

\section{Codec Setup and Probability Model}

Theora's heritage is the proprietary commercial codec VP3, and it retains a
 fair amount of inflexibility when compared to Vorbis \cite{vorbis}, the first
 Xiph.org codec, which began as a research codec.
However, to provide additional scope for encoder improvement, Theora adopts
 some of the configurable aspects of decoder setup that are present in Vorbis.
This configuration data is not
available in VP3, which uses hardcoded values instead.

Theora makes the same controversial design decision that Vorbis made to include
 the entire probability model for the DCT coefficients and all the quantization
 parameters in the bitstream headers.
This is often several hundred fields.
It is therefore impossible to decode any frame in the stream without having
 previously fetched the codec info and codec setup headers.

\begin{verse}
{\bf Note:} Theora {\em can} initiate decode at an arbitrary intra-frame packet
 within a bitstream so long as the codec has been initialized with the setup
 headers.
\end{verse}

Thus, Theora headers are both required for decode to begin and relatively large
 as bitstream headers go.
The header size is unbounded, although as a rule of thumb less than 16~kB is
 recommended, and Xiph.org's reference encoder follows this suggestion.
%TODO: Is 8kB enough? My setup header is 7.4kB, that doesn't leave much room
% for comments.
%RG: the lesson from vorbis is that as small as possible is really
% important in some applications. Practically, what's acceptable
% depends a great deal on the target bitrate. I'd leave 16 kB in the
% spec for now. fwiw more than 1k of comments is quite unusual.

Our own design work indicates that the primary liability of the required header
 is in mindshare; it is an unusual design and thus causes some amount of
 complaint among engineers, as it runs against current design trends and points
 out limitations in some existing software/interface designs.
However, we find that it does not fundamentally limit Theora's suitable
 application space.

%silvia: renamed
%\subsection{Format Specification}
\section{Format Conformance}

The Theora format is well-defined by its decode specification; any encoder that
 produces packets that are correctly decoded by an implementation following
 this specification may be considered a proper Theora encoder.
A decoder must faithfully and completely implement the specification defined
 herein
%, except where noted,
 to be considered a conformant Theora decoder.
A decoder need not be implemented strictly as described, but the actual decoder
 process MUST be {\em entirely mathematically equivalent} to the described
 process.
Where appropriate, a non-normative description of encoder processes is
 included.
These sections will be marked as such, and a proper Theora encoder is not bound
 to follow them.

%TODO: \subsection{Hardware Profile}

\chapter{Coded Video Structure}

Theora's encoding and decoding process is based on $8\times 8$ blocks of
 pixels.
This section describes how a video frame is laid out, divided into blocks, and
 how those blocks are organized.

\section{Frame Layout}

A video frame in Theora is a two-dimensional array of pixels.
Theora, like VP3, uses a right-handed coordinate system, with the origin in the
 lower-left corner of the frame.
This is contrary to many video formats, which use a left-handed coordinate
 system with the origin in the upper-left corner of the frame.
%INT: This means that for interlaced material, the definition of `even fields'
%INT:  and `odd fields' may be reversed between Theora and other video codecs.
%INT: This document will always refer to them as `top fields' and `bottom
%INT:  fields'.

Theora divides the pixel array up into three separate \term{color planes}, one
 for each of the $Y'$, $C_b$, and $C_r$ components of the pixel.
The $Y'$ plane is also called the \term{luma plane}, and the $C_b$ and $C_r$
 planes are also called the \term{chroma planes}.
Each plane is assigned a numerical value, as shown in
 Table~\ref{tab:color-planes}.

\begin{table}[htbp]
\begin{center}
\begin{tabular}{cl}\toprule
Index & Color Plane \\\midrule
$0$   & $Y'$        \\
$1$   & $C_b$       \\
$2$   & $C_r$       \\
\bottomrule\end{tabular}
\end{center}
\caption{Color Plane Indices}
\label{tab:color-planes}
\end{table}

In some pixel formats, the chroma planes are subsampled by a factor of two in
 one or both directions.
This means that the width or height of the chroma planes may be half that of
 the total frame width and height.
The luma plane is never subsampled.

\section{Picture Region}

An encoded video frame in Theora is required to have a width and height that
 are multiples of sixteen, making an integral number of blocks even when the
 chroma planes are subsampled.
However, inside a frame a smaller \term{picture region} may be defined to
 present material whose dimensions are not a multiple of sixteen pixels, as
 shown in Figure~\ref{fig:pic-frame}.
The picture region can be offset from the lower-left corner of the frame by up
 to 255 pixels in each direction, and may have an arbitrary width and height,
 provided that it is contained entirely within the coded frame.
It is this picture region that contains the actual video data.
The portions of the frame which lie outside the picture region may contain
 arbitrary image data, so the frame must be cropped to the picture region
 before display.
The picture region plays no other role in the decode process, which operates on
 the entire video frame.

\begin{figure}[htbp]
\begin{center}
\includegraphics{pic-frame}
\end{center}
\caption{Location of frame and picture regions}
\label{fig:pic-frame}
\end{figure}

\section{Blocks and Super Blocks}
\label{sec:blocks-and-sbs}

Each color plane is subdivided into \term{blocks} of $8\times 8$ pixels.
Blocks are grouped into $4\times 4$ arrays called \term{super blocks} as shown
 in Figure~\ref{fig:superblock}.
Each color plane has its own set of blocks and super blocks.
If the chroma planes are subsampled, they are still divided into $8\times 8$
 blocks of pixels; there are just fewer blocks than in the luma plane.
The boundaries of blocks and super blocks in the luma plane do not necessarily
 coincide with those of the chroma planes, if the chroma planes have been
 subsampled.

\begin{figure}[htbp]
\begin{center}
\includegraphics{superblock}
\end{center}
\caption{Subdivision of a frame into blocks and super blocks}
\label{fig:superblock}
\end{figure}

Blocks are accessed in two different orders in the various decoder processes.
The first is \term{raster order}, illustrated in Figure~\ref{fig:raster-block}.
This accesses each block in row-major order, starting in the lower left of the
 frame and continuing along the bottom row of the entire frame, followed by the
 next row up, starting on the left edge of the frame, etc.

\begin{figure}[htbp]
\begin{center}
\includegraphics{raster-block}
\end{center}
\caption{Raster ordering of $n\times m$ blocks}
\label{fig:raster-block}
\end{figure}

The second is \term{coded order}.
In coded order, blocks are accessed by super block.
Within each frame, super blocks are traversed in raster order, similar to
 raster order for blocks.
Within each super block, however, blocks are accessed in a Hilbert curve
 pattern, illustrated in Figure~\ref{fig:hilbert-block}.
If a color plane does not contain a complete super block on the top or right
 sides, the same ordering is still used, simply with any blocks outside the
 frame boundary omitted.

\begin{figure}[htbp]
\begin{center}
\includegraphics{hilbert-block}
\end{center}
\caption{Hilbert curve ordering of blocks within a super block}
\label{fig:hilbert-block}
\end{figure}

To illustrate this ordering, consider a frame that is
240 pixels wide and 48 pixels high.
Each row of the luma plane has 30 blocks and 8 super blocks, and there are 6
 rows of blocks and two rows of super blocks.

%When accessed in raster order, each block in the luma plane is assigned the
% following indices:
%\vspace{\baselineskip}
%\begin{center}
%\begin{tabular}{|ccccccc|}\hline
%150 & 151 & 152 & 153 & $\ldots$ & 178 & 179 \\
%120 & 121 & 122 & 123 & $\ldots$ & 148 & 149 \\\hline
% 90 &  91 &  92 &  93 & $\ldots$ & 118 & 119 \\
% 60 &  61 &  62 &  63 & $\ldots$ &  88 &  89 \\
% 30 &  31 &  32 &  33 & $\ldots$ &  58 &  59 \\
%  0 &   1 &   2 &   3 & $\ldots$ &  28 &  29 \\\hline
%\end{tabular}
%\end{center}
%\vspace{\baselineskip}

When accessed in coded order, each block in the luma plane is assigned the
 following indices:

\vspace{\baselineskip}
\begin{center}
\begin{tabular}{|cccc|c|cc|}\hline
123 & 122 & 125 & 124 & $\ldots$ & 179 & 178 \\
120 & 121 & 126 & 127 & $\ldots$ & 176 & 177 \\\hline
  5 &   6 &   9 &  10 & $\ldots$ & 117 & 118 \\
  4 &   7 &   8 &  11 & $\ldots$ & 116 & 119 \\
  3 &   2 &  13 &  12 & $\ldots$ & 115 & 114 \\
  0 &   1 &  14 &  15 & $\ldots$ & 112 & 113 \\\hline
\end{tabular}
\end{center}
\vspace{\baselineskip}

Here the index values specify the order in which the blocks would be accessed.
The indices of the blocks are numbered continuously from one color plane to the
 next.
They do not reset to zero at the start of each plane.
Instead, the numbering increases continuously from the $Y'$ plane to the $C_b$
 plane to the $C_r$ plane.
The implication is that the blocks from all planes are treated as a unit during
 the various processing steps.

Although blocks are sometimes accessed in raster order, in this document the
 index associated with a block is {\em always} its index in coded order.

\section{Macro Blocks}
\label{sec:mbs}

A macro block contains a $2\times 2$ array of blocks in the luma plane
 {\em and} the co-located blocks in the chroma planes, as shown in
 Figure~\ref{fig:macroblock}.
Thus macro blocks can represent anywhere from six to twelve blocks, depending
 on how
the chroma planes are subsampled.
This is in contrast to super blocks, which only contain blocks from a single
 color plane.
% the whole super vs. macro blocks thing is a little confusing, and it can be
% hard to remember which is what initially. A figure would/will help here,
% but I tried to add some text emphasizing the difference in terms of
% functionality.
%TBT: At this point we haven't described any functionality yet.
%TBT: As far as the reader knows, the only purpose of the blocks, macro blocks
%TBT:  and super blocks is for data organization---and for blocks and super
%TBT:  blocks, this is essentially true.
%TBT: So lets restrict the differences we emphasize to those of data
%TBT:  organization, which the sentence I just added above does.
Macro blocks contain information about coding mode and motion vectors for the
 corresponding blocks in all color planes.

\begin{figure}[htbp]
\begin{center}
\includegraphics{macroblock}
\end{center}
\caption{Subdivision of a frame into macro blocks}
\label{fig:macroblock}
\end{figure}

Macro blocks are also accessed in a \term{coded order}.
This coded order proceeds by examining each super block in the luma plane in
 raster order, and traversing the four macro blocks inside using a smaller
 Hilbert curve, as shown in Figure~\ref{fig:hilbert-mb}.
%r: I rearranged the wording to make a more formal idiom here
If the luma plane does not contain a complete super block on the top or right
 sides, the same ordering is still used, with any macro blocks outside the
 frame boundary simply omitted.
Because the frame size is constrained to be a multiple of 16, there are never
 any partial macro blocks.
Unlike blocks, macro blocks need never be accessed in a pure raster order.

\begin{figure}[htbp]
\begin{center}
\includegraphics{hilbert-mb}
\end{center}
\caption{Hilbert curve ordering of macro blocks within a super block}
\label{fig:hilbert-mb}
\end{figure}

Using the same frame size as the example above, there are 15 macro blocks in
 each row and 3 rows of macro blocks.
The macro blocks are assigned the following indices:

\vspace{\baselineskip}
\begin{center}
\begin{tabular}{|cc|cc|c|cc|c|}\hline
30 & 31 & 32 & 33 & $\cdots$ & 42 & 43 & 44 \\\hline
 1 &  2 &  5 &  6 & $\cdots$ & 25 & 26 & 29 \\
 0 &  3 &  4 &  7 & $\cdots$ & 24 & 27 & 28 \\\hline
\end{tabular}
\end{center}
\vspace{\baselineskip}
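
As an informal check on the two examples, the counts and index ranges for the
 luma plane can be derived from the frame dimensions alone.
The symbols $W_B$, $H_B$, $W_{SB}$, $H_{SB}$, $W_{MB}$, and $H_{MB}$ below are
 introduced only for this illustration and are not used elsewhere in this
 document.
For the $240\times 48$ frame, the luma plane contains
\[
W_B = 240/8 = 30, \qquad H_B = 48/8 = 6, \qquad W_B H_B = 180
\]
blocks, with coded-order indices $0\ldots 179$, grouped into
\[
W_{SB} = \lceil 30/4\rceil = 8, \qquad H_{SB} = \lceil 6/4\rceil = 2
\]
super blocks per row and rows of super blocks.
The bottom row of super blocks spans four complete rows of blocks and so holds
 $4\cdot 30 = 120$ blocks (indices $0\ldots 119$), while the top row spans only
 the remaining two rows of blocks and holds $2\cdot 30 = 60$ blocks (indices
 $120\ldots 179$).
The eighth super block in each row is only $30 - 7\cdot 4 = 2$ blocks wide,
 which accounts for the two-column group on the right of the block table.

Likewise, the frame contains
\[
W_{MB} = 240/16 = 15, \qquad H_{MB} = 48/16 = 3, \qquad W_{MB} H_{MB} = 45
\]
macro blocks, with indices $0\ldots 44$.
The bottom row of super blocks spans two rows of macro blocks, giving
 $2\cdot 15 = 30$ macro blocks (indices $0\ldots 29$), and the top row of super
 blocks spans the single remaining row of $15$ macro blocks (indices
 $30\ldots 44$).
In that top row only the lower two macro blocks of each super block lie inside
 the frame, so the Hilbert curve visits them left to right, and the indices
 there increase consecutively.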