a more detailed specification of the standards from which its parameters are derived.
Some standards do not specify all the parameters necessary.
For these unspecified parameters, this document serves as the definition of what should be used when encoding or decoding Theora video.

\subsection{Rec.~470M (Rec.~ITU-R~BT.470-6 System M/NTSC with Rec.~ITU-R~BT.601-5)}
\label{sec:470m}

This color space is used by broadcast television and DVDs in much of the Americas, Japan, Korea, and the Union of Myanmar \cite{rec470}.
This color space may also be used for System M/PAL (Brazil), with an appropriate conversion supplied by the encoder to compensate for the different gamma value.
See Section~\ref{sec:470bg} for an appropriate gamma value to assume for M/PAL input.

In the US, studio monitors are adjusted to a D65 white point ($x_w,y_w=0.313,0.329$).
In Japan, studio monitors are adjusted to a D white of 9300K ($x_w,y_w=0.285,0.293$).

Rec.~470 does not specify a digital encoding of the color signals.
For Theora, Rec.~ITU-R~BT.601-5 \cite{rec601} is used, starting from the $R'G'B'$ signals specified by Rec.~470.

Rec.~470 does not specify an input gamma function.
For Theora, the Rec.~709 \cite{rec709} input function is assumed.
This is the same as that specified by SMPTE 170M \cite{smpte170m}, which claims to reflect modern practice in the creation of NTSC signals circa 1994.

The parameters for all the color transformations defined in Section~\ref{sec:color-xforms} are given in Table~\ref{tab:470m}.

\begin{table}[htb]
\begin{align*}
\mathrm{Offset}_{Y,C_b,C_r}    & = (16, 128, 128)  \\
\mathrm{Excursion}_{Y,C_b,C_r} & = (219, 224, 224) \\
K_r                            & = 0.299           \\
K_b                            & = 0.114           \\
\gamma                         & = 2.2             \\
\beta                          & = 0.45            \\
\alpha                         & = 4.5             \\
\delta                         & = 0.018           \\
\epsilon                       & = 0.099           \\
x_r,y_r                        & = 0.67, 0.33      \\
x_g,y_g                        & = 0.21, 0.71      \\
x_b,y_b                        & = 0.14, 0.08      \\
\text{(Illuminant C) } x_w,y_w & = 0.310, 0.316    \\
\end{align*}
\caption{Rec.~470M Parameters}
\label{tab:470m}
\end{table}

\subsection{Rec.~470BG (Rec.~ITU-R~BT.470-6 Systems
B and G with Rec.~ITU-R~BT.601-5)}
\label{sec:470bg}

This color space is used by the PAL and SECAM systems in much of the rest of the world \cite{rec470}.
This can be used directly by systems (B, B1, D, D1, G, H, I, K, N)/PAL and (B, D, G, H, K, K1, L)/SECAM\@.

\begin{verse}
{\bf Note:} the Rec.~470BG chromaticity values are different from those specified in Rec.~470M\@.
When PAL and SECAM systems were first designed, they were based upon the same primaries as NTSC\@.
However, as methods of making color picture tubes have changed, the primaries used have changed as well.
The U.S. recommends using correction circuitry to approximate the existing, standard NTSC primaries.
Current PAL and SECAM systems have standardized on primaries in accord with more recent technology.
\end{verse}

Rec.~470 provisionally permits the use of the NTSC chromaticity values (given in Section~\ref{sec:470m}) with legacy PAL and SECAM equipment.
In Theora, material must be decoded assuming the new PAL and SECAM primaries.
Material intended for display on legacy devices should be converted by the decoder.

The official Rec.~470BG specifies a gamma value of $\gamma=2.8$.
However, in practice this value is unrealistically high \cite{Poyn97}.
Rec.~470BG states that the overall system gamma should be approximately $\gamma\beta=1.2$.
Since most cameras pre-correct with a gamma value of $\beta=0.45$, this suggests an output device gamma of approximately $\gamma=2.67$.
This is the value recommended for use with PAL systems in Theora.

Rec.~470 does not specify a digital encoding of the color signals.
For Theora, Rec.~ITU-R~BT.601-5 \cite{rec601} is used, starting from the $R'G'B'$ signals specified by Rec.~470.

Rec.~470 does not specify an input gamma function.
For Theora, the Rec.~709 \cite{rec709} input function is assumed.

The parameters for all the color transformations defined in Section~\ref{sec:color-xforms} are given in Table~\ref{tab:470bg}.

\begin{table}[htb]
\begin{align*}
\mathrm{Offset}_{Y,C_b,C_r}    & = (16, 128,
128) \\
\mathrm{Excursion}_{Y,C_b,C_r} & = (219, 224, 224) \\
K_r                            & = 0.299           \\
K_b                            & = 0.114           \\
\gamma                         & = 2.67            \\
\beta                          & = 0.45            \\
\alpha                         & = 4.5             \\
\delta                         & = 0.018           \\
\epsilon                       & = 0.099           \\
x_r,y_r                        & = 0.64, 0.33      \\
x_g,y_g                        & = 0.29, 0.60      \\
x_b,y_b                        & = 0.15, 0.06      \\
\text{(D65) } x_w,y_w          & = 0.313, 0.329    \\
\end{align*}
\caption{Rec.~470BG Parameters}
\label{tab:470bg}
\end{table}

\section{Pixel Formats}
\label{sec:pixfmts}

Theora supports several different pixel formats, each of which uses different subsampling for the chroma planes relative to the luma plane.

\subsection{4:4:4 Subsampling}
\label{sec:444}

All three color planes are stored at full resolution - each pixel has a $Y'$, a $C_b$ and a $C_r$ value (see Figure~\ref{fig:pixel444}).
The samples in the different planes are all at co-located sites.

\begin{figure}[htbp]
\begin{center}
\includegraphics{pixel444}
\end{center}
\caption{Pixels encoded 4:4:4}
\label{fig:pixel444}
\end{figure}
% Figure.
%YRB YRB
%
%
%
%YRB YRB
%
%
%

\subsection{4:2:2 Subsampling}
\label{sec:422}

The $C_b$ and $C_r$ planes are stored with half the horizontal resolution of the $Y'$ plane.
Thus, each of these planes has half the number of horizontal blocks as the luma plane (see Figure~\ref{fig:pixel422}).
Similarly, they have half the number of horizontal super blocks, rounded up.
Macro blocks are defined across color planes, and so their number does not change, but each macro block contains half as many chroma blocks.

The chroma samples are vertically aligned with the luma samples, but horizontally centered between two luma samples.
Thus, each luma sample has a unique closest chroma sample.
A horizontal phase shift may be required to produce signals which use different horizontal chroma sampling locations for compatibility with different systems.

\begin{figure}[htbp]
\begin{center}
\includegraphics{pixel422}
\end{center}
\caption{Pixels encoded 4:2:2}
\label{fig:pixel422}
\end{figure}
% Figure.
%Y RB Y Y RB Y
%
%
%
%Y RB Y Y RB Y
%
%
%

\subsection{4:2:0 Subsampling}
\label{sec:420}

The $C_b$
and $C_r$ planes are stored with half the horizontal and half the vertical resolution of the $Y'$ plane.
Thus, each of these planes has half the number of horizontal blocks and half the number of vertical blocks as the luma plane, for a total of one quarter the number of blocks (see Figure~\ref{fig:pixel420}).
Similarly, they have half the number of horizontal super blocks and half the number of vertical super blocks, rounded up.
Macro blocks are defined across color planes, and so their number does not change, but each macro block contains within it one quarter as many chroma blocks.

The chroma samples are vertically and horizontally centered between four luma samples.
Thus, each luma sample has a unique closest chroma sample.
This is the same sub-sampling pattern used with JPEG, MJPEG, and MPEG-1, and was inherited from VP3.
A horizontal or vertical phase shift may be required to produce signals which use different chroma sampling locations for compatibility with different systems.

\begin{figure}[htbp]
\begin{center}
\includegraphics{pixel420}
\end{center}
\caption{Pixels encoded 4:2:0}
\label{fig:pixel420}
\end{figure}
% Figure.
%Y Y Y Y
%
% RB RB
%
%Y Y Y Y
%
%
%
%Y Y Y Y
%
% RB RB
%
%Y Y Y Y
%
%
%

\subsection{Subsampling and the Picture Region}

Although the frame size must be an integral number of macro blocks, and thus both the number of pixels and the number of blocks in each direction must be even, no such requirement is made of the picture region.
Thus, when using subsampled pixel formats, careful attention must be paid to which chroma samples correspond to which luma samples.

As mentioned above, for each pixel format, there is a unique chroma sample that is the closest to each luma sample.
When cropping the chroma planes to the picture region, all the chroma samples corresponding to a luma sample in the cropped picture region must be included.
Thus, when dividing the width or height of the picture region by two to obtain the size of the subsampled chroma planes, they must be rounded
up.

Furthermore, the sampling locations are defined relative to the frame, {\em not} the picture region.
When using the 4:2:2 and 4:2:0 formats, the locations of chroma samples relative to the luma samples depend on whether or not the X offset of the picture region is odd.
If the offset is even, each column of chroma samples corresponds to two columns of luma samples (see Figure~\ref{fig:pic_even} for an example).
The only exception is if the width is odd, in which case the last column corresponds to only one column of luma samples (see Figure~\ref{fig:pic_even_odd}).
If the offset is odd, then the first column of chroma samples corresponds to only one column of luma samples, while the remaining columns each correspond to two (see Figure~\ref{fig:pic_odd}).
In this case, if the width is even, the last column again corresponds to only one column of luma samples (see Figure~\ref{fig:pic_odd_even}).

A similar process is followed with the rows of a picture region of odd height encoded in the 4:2:0 format.
If the Y offset is even, each row of chroma samples corresponds to two rows of luma samples (see Figure~\ref{fig:pic_even}), except with an odd height, where the last row corresponds to only one row of luma samples (see Figure~\ref{fig:pic_even_odd}).
If the offset is odd, then it is the first row of chroma samples which corresponds to only one row of luma samples, while the remaining rows each correspond to two (Figure~\ref{fig:pic_odd}), except with an even height, where the last row also corresponds to one (Figure~\ref{fig:pic_odd_even}).

Encoders should be aware of these differences in the subsampling when using an even or odd offset.
In the typical case, with an even width and height, where one expects two rows or columns of luma samples for every row or column of chroma samples, the encoder must take care to ensure that the offsets used are both even.

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{pic_even}
\end{center}
\caption{Pixel
correspondence between color planes with even picture offset and even picture size}
\label{fig:pic_even}
\end{figure}

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{pic_even_odd}
\end{center}
\caption{Pixel correspondence with even picture offset and odd picture size}
\label{fig:pic_even_odd}
\end{figure}

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{pic_odd}
\end{center}
\caption{Pixel correspondence with odd picture offset and odd picture size}
\label{fig:pic_odd}
\end{figure}

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=\textwidth]{pic_odd_even}
\end{center}
\caption{Pixel correspondence with odd picture offset and even picture size}
\label{fig:pic_odd_even}
\end{figure}

\chapter{Bitpacking Convention}
\label{sec:bitpacking}

\section{Overview}

The Theora codec uses relatively unstructured raw packets containing binary integer fields of arbitrary width.
Logically, each packet is a bitstream in which bits are written one-by-one by