\paragraph{Coded Block Information}
This stage determines which blocks in the frame are coded and which are uncoded.
A \term{coded block list} is constructed which lists all the coded blocks in coded order.
For intra frames, every block is coded, and so no data needs to be read from the packet.

\paragraph{Macro Block Mode Information}
For intra frames, every block is coded in INTRA mode, and this stage is skipped.
In inter frames a \term{coded macro block list} is constructed from the coded block list.
Any macro block which has at least one of its luma blocks coded is considered coded; all other macro blocks are uncoded, even if they contain coded chroma blocks.
A coding mode is decoded for each coded macro block, and assigned to all its constituent coded blocks.
All coded chroma blocks in uncoded macro blocks are assigned the INTER\_NOMV coding mode.

\paragraph{Motion Vectors}
Intra frames are coded entirely in INTRA mode, and so this stage is skipped.
Some inter coding modes, however, require one or more motion vectors to be specified for each macro block.
These are decoded in this stage, and an appropriate motion vector is assigned to each coded block in the macro block.

\paragraph{Block-Level \qi\ Information}
If a frame allows multiple \qi\ values, the \qi\ value assigned to each block is decoded here.
Frames that use only a single \qi\ value have nothing to decode.

\paragraph{DCT Coefficients}
Finally, the quantized DCT coefficients are decoded.
A list of DCT coefficients in zig-zag order for a single block is represented by a list of tokens.
A token can take on one of 32 different values, each with a different semantic meaning.
A single token can represent a single DCT coefficient, a run of zero coefficients within a single block, a combination of a run of zero coefficients followed by a single non-zero coefficient, an \term{End-Of-Block marker}, or a run of EOB markers.
EOB markers signify that the remainder of the block is one long zero run.
Unlike JPEG and MPEG, there is no
requirement for each block to end with a special marker.
If non-EOB tokens yield values for all 64 of the coefficients in a block, then no EOB marker occurs.
Each token is associated with a specific \term{token index} in a block.
For single-coefficient tokens, this index is the zig-zag index of the token in the block.
For zero-run tokens, this index is the zig-zag index of the {\em first} coefficient in the run.
For combination tokens, the index is again the zig-zag index of the first coefficient in the zero run.
For EOB markers, which signify that the remainder of the block is one long zero run, the index is the zig-zag index of the first zero coefficient in that run.
For EOB runs, the token index is that of the first EOB marker in the run.
Due to zero runs and EOB markers, a block does not have to have a token for every zig-zag index.
Tokens are grouped in the stream by token index, not by the block they originate from.
This means that for each zig-zag index in turn, the tokens with that index from {\em all} the coded blocks are coded in coded block order.
When decoding, a current token index is maintained for each coded block.
This index is advanced by the number of coefficients that are added to the block as each token is decoded.
After fully decoding all the tokens with token index \ti, the current token index of every coded block will be \ti\ or greater.
If an EOB run of $n$ blocks is decoded at token index \ti, then it ends the next $n$ blocks in coded block order whose current token index is equal to \ti, but not greater.
If there are fewer than $n$ blocks with a current token index of \ti, then the decoder goes through the coded block list again from the start, ending blocks with a current token index of $\ti+1$, and so on, until $n$ blocks have been ended.
Tokens are read by parsing a Huffman code that depends on \ti\ and the color plane of the next coded block whose current token index is equal to \ti, but not greater.
The Huffman codebooks are selected on a per-frame
basis from the 80 codebooks defined in the setup header.
Many tokens have a fixed number of \term{extra bits} associated with them.
These bits are read from the packet immediately after the token is decoded.
These are used to define things such as coefficient magnitude, sign, and the length of runs.

\paragraph{DC Prediction}
After the coefficients for each block are decoded, the quantized DC value of each block is adjusted based on the DC values of its neighbors.
This adjustment is performed by scanning the blocks in raster order, not coded block order.

\paragraph{Reconstruction}
Finally, using the coding mode, motion vector (if applicable), quantized coefficient list, and \qi\ value defined for each block, all the coded blocks are reconstructed.
The DCT coefficients are dequantized, an inverse DCT transform is applied, and the predictor is formed from the coding mode and motion vector and added to the result.

\paragraph{Loop Filtering}
To complete the reconstructed frame, an ``in-loop'' deblocking filter is applied to the edges of all coded blocks.

\chapter{Video Formats}
This section gives a precise description of the video formats that Theora is capable of storing.
The Theora bitstream is capable of handling video at any arbitrary resolution up to $1048560\times 1048560$.
Such video would require almost three terabytes of storage per frame for uncompressed data, so compliant decoders MAY refuse to decode images with sizes beyond their capabilities.
%TODO: What MUST a "compliant" decoder accept?
%TODO: What SHOULD a decoder use for an upper bound?
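The ``almost three terabytes'' figure can be checked with quick arithmetic. The sketch below is non-normative; it assumes an uncompressed 4:4:4 pixel format with one byte per sample (three bytes per pixel), which is an assumption of this example rather than a statement of the specification:

```python
# Non-normative check of the "almost three terabytes per frame" claim for the
# maximum Theora frame size. Assumes 4:4:4 sampling with one byte each for
# Y', Cb, and Cr (three bytes per pixel) -- an illustrative assumption.
MAX_DIM = 1048560          # maximum width/height supported by the bitstream
BYTES_PER_PIXEL = 3        # 8-bit Y' + Cb + Cr, no subsampling (assumption)

frame_bytes = MAX_DIM * MAX_DIM * BYTES_PER_PIXEL
frame_tib = frame_bytes / 2**40

# About 3.3e12 bytes, i.e. just under 3 TiB per uncompressed frame.
print(frame_bytes, round(frame_tib, 3))
```

With these assumptions a single frame is $3{,}298{,}434{,}220{,}800$ bytes, just under 3~TiB, which motivates allowing decoders to refuse sizes beyond their capabilities.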
%TODO:  (derive from total amount of memory and memory bandwidth)
%TODO: Any lower limits?
%TODO: We really need hardware device profiles, but such things should be
%TODO:  developed with input from the hardware community.
%TODO: And even then sometimes they're useless
The remainder of this section talks about two specific aspects of the video format: the color space and the pixel format.
The first describes how color is represented and how to transform that color representation into a device-independent color space such as CIE $XYZ$ (1931).
The second describes the various schemes for sampling the color values in time and space.

\section{Color Space Conventions}
There are a large number of different color standards used in digital video.
Since Theora is a lossy codec, it restricts itself to only a few of them to simplify playback.
Unlike the alternate method of describing all the parameters of the color model, this allows a few dedicated routines for color conversion to be written and heavily optimized in a decoder.
More flexible conversion functions should instead be specified in an encoder, where additional computational complexity is more easily tolerated.
The color spaces were selected to give a fair representation of color standards in use around the world today.
Most of the standards that do not exactly match one of these can be converted to one fairly easily.
All Theora color spaces are $Y'C_bC_r$ color spaces with one luma channel and two chroma channels.
Each channel contains 8-bit discrete values in the range $0\ldots255$, which represent non-linear gamma pre-corrected signals.
The Theora identification header contains an 8-bit value that describes the color space.
This merely selects one of the color spaces available from an enumerated list.
Currently, only two color spaces are defined, with a third possibility that indicates the color space is ``unknown''.

\section{Color Space Conversions and Parameters}
\label{sec:color-xforms}
The parameters which describe the conversions
between each color space are listed below.
These are the parameters needed to map colors from the encoded $Y'C_bC_r$ representation to the device-independent color space CIE $XYZ$ (1931).
These parameters define abstract mathematical conversion functions which are infinitely precise.
The accuracy and precision with which the conversions are performed in a real system is determined by the quality of output desired and the available processing power.
Exact decoder output is defined by this specification only in the original $Y'C_bC_r$ space.
\begin{description}
\item[$Y'C_bC_r$ to $Y'P_bP_r$:]\vspace{\baselineskip}\hfill
This conversion takes 8-bit discrete values in the range $[0\ldots255]$ and maps them to real values in the range $[0\ldots1]$ for $Y'$ and $[-\frac{1}{2}\ldots\frac{1}{2}]$ for $P_b$ and $P_r$.
Because some values may fall outside the offset and excursion defined for each channel in the $Y'C_bC_r$ space, the results may fall outside these ranges in $Y'P_bP_r$ space.
No clamping should be done at this stage.
\begin{align}
Y'_\mathrm{out} & = \frac{Y'_\mathrm{in}-\mathrm{Offset}_Y}{\mathrm{Excursion}_Y} \\
P_b & = \frac{C_b-\mathrm{Offset}_{C_b}}{\mathrm{Excursion}_{C_b}} \\
P_r & = \frac{C_r-\mathrm{Offset}_{C_r}}{\mathrm{Excursion}_{C_r}}
\end{align}
Parameters: $\mathrm{Offset}_{Y,C_b,C_r}$, $\mathrm{Excursion}_{Y,C_b,C_r}$.
\item[$Y'P_bP_r$ to $R'G'B'$:]\vspace{\baselineskip}\hfill
This conversion takes the one luma and two chroma channel representation and maps it to the non-linear $R'G'B'$ space used to drive actual output devices.
Values should be clamped into the range $[0\ldots1]$ after this stage.
\begin{align}
R' & = Y'+2(1-K_r)P_r \\
G' & = Y'-2\frac{(1-K_b)K_b}{1-K_b-K_r}P_b-2\frac{(1-K_r)K_r}{1-K_b-K_r}P_r \\
B' & = Y'+2(1-K_b)P_b
\end{align}
Parameters: $K_b,K_r$.
\item[$R'G'B'$ to $RGB$ (Output device gamma correction):]\vspace{\baselineskip}\hfill
This conversion takes the non-linear $R'G'B'$ voltage levels and maps them to linear light
levels produced by the actual output device.
Note that this conversion is only that of the output device, and its inverse is {\em not} that used by the input device.
Because a dim viewing environment is assumed in most television standards, the overall gamma between the input and output devices is usually around $1.1$ to $1.2$, and not a strict $1.0$.
For calibration with actual output devices, the model
\begin{align}
L & =(E'+\Delta)^\gamma
\end{align}
should be used, with $\Delta$ the free parameter and $\gamma$ held fixed to the value specified in this document.
The conversion function presented here is an idealized version with $\Delta=0$.
\begin{align}
R & = R'^\gamma \\
G & = G'^\gamma \\
B & = B'^\gamma
\end{align}
Parameters: $\gamma$.
\item[$RGB$ to $R'G'B'$ (Input device gamma correction):]\vspace{\baselineskip}\hfill
%TODO: Tag section as non-normative
This conversion takes linear light levels and maps them to the non-linear voltage levels produced in the actual input device.
This information is merely informative.
It is not required for building a decoder or for converting between the various formats and the actual output capabilities of a particular device.
A linear segment is introduced on the low end to reduce noise in dark areas of the image.
The rest of the scale is adjusted so that the power segment of the curve intersects the linear segment with the proper slope, and so that it still maps 0 to 0 and 1 to 1.
\begin{align}
R' & = \left\{\begin{array}{ll}
\alpha R,                     & 0\le R<\delta   \\
(1+\epsilon)R^\beta-\epsilon, & \delta\le R\le1
\end{array}\right. \\
G' & = \left\{\begin{array}{ll}
\alpha G,                     & 0\le G<\delta   \\
(1+\epsilon)G^\beta-\epsilon, & \delta\le G\le1
\end{array}\right. \\
B' & = \left\{\begin{array}{ll}
\alpha B,                     & 0\le B<\delta   \\
(1+\epsilon)B^\beta-\epsilon, & \delta\le B\le1
\end{array}\right.
\end{align}
Parameters: $\beta$, $\alpha$, $\delta$, $\epsilon$.
\item[$RGB$ to CIE $XYZ$ (1931):]\vspace{\baselineskip}\hfill
This conversion maps a device-dependent linear $RGB$ space to the device-independent linear CIE $XYZ$ space.
The parameters are the CIE chromaticity coordinates of the three primaries---red, green, and blue---as well as the chromaticity coordinates of the white point of the device.
This is how hardware manufacturers and standards typically describe a particular $RGB$ space.
The math required to convert these parameters into a useful transformation matrix is reproduced below.
\begin{align}
F & =\left[\begin{array}{ccc}
\frac{x_r}{y_r}       & \frac{x_g}{y_g}       & \frac{x_b}{y_b}       \\
1                     & 1                     & 1                     \\
\frac{1-x_r-y_r}{y_r} & \frac{1-x_g-y_g}{y_g} & \frac{1-x_b-y_b}{y_b}
\end{array}\right] \\
\left[\begin{array}{c}
s_r \\ s_g \\ s_b
\end{array}\right] & =F^{-1}\left[\begin{array}{c}
\frac{x_w}{y_w} \\ 1 \\ \frac{1-x_w-y_w}{y_w}
\end{array}\right] \\
\left[\begin{array}{c}
X \\ Y \\ Z
\end{array}\right] & =F\left[\begin{array}{c}
s_rR \\ s_gG \\ s_bB
\end{array}\right]
\end{align}
Parameters: $x_r,x_g,x_b,x_w, y_r,y_g,y_b,y_w$.
\end{description}

\section{Available Color Spaces}
\label{sec:colorspaces}
These are the color spaces currently defined for use by Theora video.
Each one has a short name, with which it is referred to in this document, and
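The chain of conversions in Section~\ref{sec:color-xforms} can be sketched end-to-end in code. The sketch below is non-normative: the parameter values (offsets, excursions, $K_b$, $K_r$, $\gamma$) are illustrative BT.601-style placeholders chosen for this example, not values taken from this specification.

```python
# Non-normative sketch of the Y'CbCr -> Y'PbPr -> R'G'B' -> RGB chain from
# the "Color Space Conversions and Parameters" section. All parameter values
# here are illustrative (BT.601-style) assumptions, not spec-defined values.

OFFSET_Y, EXCURSION_Y = 16.0, 219.0    # assumed luma offset/excursion
OFFSET_C, EXCURSION_C = 128.0, 224.0   # assumed chroma offset/excursion
KB, KR = 0.114, 0.299                  # assumed chroma weights
GAMMA = 2.2                            # assumed output-device gamma

def ycbcr_to_rgb(y, cb, cr):
    # Y'CbCr -> Y'PbPr: remove offset, divide by excursion (no clamping yet).
    yp = (y - OFFSET_Y) / EXCURSION_Y
    pb = (cb - OFFSET_C) / EXCURSION_C
    pr = (cr - OFFSET_C) / EXCURSION_C
    # Y'PbPr -> R'G'B', clamping to [0, 1] after this stage.
    rp = yp + 2 * (1 - KR) * pr
    gp = yp - 2 * ((1 - KB) * KB * pb + (1 - KR) * KR * pr) / (1 - KB - KR)
    bp = yp + 2 * (1 - KB) * pb
    rp, gp, bp = (min(max(v, 0.0), 1.0) for v in (rp, gp, bp))
    # R'G'B' -> RGB: idealized output-device gamma with Delta = 0.
    return (rp ** GAMMA, gp ** GAMMA, bp ** GAMMA)

# Reference black and reference white map to the extremes of linear RGB.
print(ycbcr_to_rgb(16, 128, 128))    # -> (0.0, 0.0, 0.0)
print(ycbcr_to_rgb(235, 128, 128))   # -> (1.0, 1.0, 1.0)
```

The remaining step to CIE $XYZ$ is the $3\times3$ matrix $F$ built from the primary and white-point chromaticities, applied to the scaled linear $RGB$ values.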
