Network Working Group
Request for Comments: 2429                                    C. Bormann
Category: Standards Track                                   Univ. Bremen
                                                                L. Cline
                                                              G. Deisher
                                                               T. Gardos
                                                             C. Maciocco
                                                               D. Newell
                                                                   Intel
                                                                  J. Ott
                                                            Univ. Bremen
                                                             G. Sullivan
                                                              PictureTel
                                                               S. Wenger
                                                               TU Berlin
                                                                  C. Zhu
                                                                   Intel
                                                            October 1998


               RTP Payload Format for the 1998 Version of
                    ITU-T Rec. H.263 Video (H.263+)

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

1. Introduction

   This document specifies an RTP payload header format applicable to
   the transmission of video streams generated based on the 1998 version
   of ITU-T Recommendation H.263 [4].
   Because the 1998 version of H.263 is a superset of the 1996 syntax,
   this format can also be used with the 1996 version of H.263 [3], and
   is recommended for this use by new implementations.  This format
   does not replace RFC 2190, which continues to be used by existing
   implementations, and may be required for backward compatibility in
   new implementations.  Implementations using the new features of the
   1998 version of H.263 shall use the format described in this
   document.

Bormann, et. al.            Standards Track                     [Page 1]

RFC 2429                         H.263+                     October 1998

   The 1998 version of ITU-T Recommendation H.263 added numerous coding
   options to improve codec performance over the 1996 version.  The 1998
   version is referred to as H.263+ in this document.  Among the new
   options, the ones with the biggest impact on the RTP payload
   specification and the error resilience of the video content are the
   slice structured mode, the independent segment decoding mode, the
   reference picture selection mode, and the scalability mode.  This
   section summarizes the impact of these new coding options on
   packetization.  Refer to [4] for more information on coding options.

   The slice structured mode was added to H.263+ for three purposes: to
   provide enhanced error resilience capability, to make the bitstream
   more amenable to use with an underlying packet transport such as RTP,
   and to minimize video delay.  The slice structured mode supports
   fragmentation at macroblock boundaries.

   With the independent segment decoding (ISD) option, a video picture
   frame is broken into segments and encoded in such a way that each
   segment is independently decodable.  Utilizing ISD in a lossy network
   environment helps to prevent the propagation of errors from one
   segment of the picture to others.

   The reference picture selection mode allows the use of an older
   reference picture rather than the one immediately preceding the
   current picture.  Usually, the last transmitted frame is implicitly
   used as the reference picture for inter-frame prediction.  If the
   reference picture selection mode is used, the data stream carries
   information on what reference frame should be used, indicated by the
   temporal reference as an ID for that reference frame.  The reference
   picture selection mode can be used with or without a back channel,
   which provides information to the encoder about the internal status
   of the decoder.  However, no special provision is made herein for
   carrying back channel information.

   H.263+ also includes bitstream scalability as an optional coding
   mode.  Three kinds of scalability are defined: temporal, signal-to-
   noise ratio (SNR), and spatial scalability.  Temporal scalability is
   achieved via the disposable nature of bi-directionally predicted
   frames, or B-frames. (A low-delay form of temporal scalability known
   as P-picture temporal scalability can also be achieved by using the
   reference picture selection mode described in the previous
   paragraph.)  SNR scalability permits refinement of encoded video
   frames, thereby improving the quality (or SNR).  Spatial scalability
   is similar to SNR scalability except the refinement layer is twice
   the size of the base layer in the horizontal dimension, vertical
   dimension, or both.

2. Usage of RTP

   When transmitting H.263+ video streams over the Internet, the output
   of the encoder can be packetized directly.
   All the bits resulting from the bitstream including the fixed length
   codes and variable length codes will be included in the packet, with
   the only exception being that when the payload of a packet begins
   with a Picture, GOB, Slice, EOS, or EOSBS start code, the first two
   (all-zero) bytes of the start code are removed and replaced by
   setting an indicator bit in the payload header.

   For H.263+ bitstreams coded with temporal, spatial, or SNR
   scalability, each layer may be transported to a different network
   address.  More specifically, each layer may use a unique IP address
   and port number combination.  The temporal relations between layers
   shall be expressed using the RTP timestamp so that they can be
   synchronized at the receiving ends in multicast or unicast
   applications.

   The H.263+ video stream will be carried as payload data within RTP
   packets.  A new H.263+ payload header is defined in section 4.  This
   section defines the usage of the RTP fixed header and H.263+ video
   packet structure.

2.1 RTP Header Usage

   Each RTP packet starts with a fixed RTP header.  The following fields
   of the RTP fixed header are used for H.263+ video streams:

   Marker bit (M bit): The Marker bit of the RTP header is set to 1 when
   the current packet carries the end of the current frame, and is 0
   otherwise.

   Payload Type (PT): The Payload Type shall specify the H.263+ video
   payload format.

   Timestamp: The RTP Timestamp encodes the sampling instant of the
   first video frame data contained in the RTP data packet.  The RTP
   timestamp shall be the same on successive packets if a video frame
   occupies more than one packet.  In a multilayer scenario, all
   pictures corresponding to the same temporal reference should use the
   same timestamp.  If temporal scalability is used (if B-frames are
   present), the timestamp may not be monotonically increasing in the
   RTP stream.
   If B-frames are transmitted on a separate layer and address, they
   must be synchronized properly with the reference frames.  Refer to
   the 1998 ITU-T Recommendation H.263 [4] for information on required
   transmission order to a decoder.  For an H.263+ video stream, the
   RTP timestamp is based on a 90 kHz clock, the same as that of the
   RTP payload for H.261 stream [5].  Since both the H.263+ data and
   the RTP header contain time information, the two sources of timing
   information are required to run synchronously.  That is, both the
   RTP timestamp and the temporal reference (TR in the picture header
   of H.263) should carry the same relative timing information.

   Any H.263+ picture clock frequency can be expressed as
   1800000/(cd*cf) source pictures per second, in which cd is an integer
   from 1 to 127 and cf is either 1000 or 1001.  Using the 90 kHz clock
   of the RTP timestamp, the time increment between each coded H.263+
   picture should therefore be an integer multiple of (cd*cf)/20.  This
   will always be an integer for any "reasonable" picture clock
   frequency (for example, it is 3003 for 29.97 Hz NTSC, 3600 for 25 Hz
   PAL, 3750 for 24 Hz film, and 1500, 1250 and 1200 for the computer
   display update rates of 60, 72 and 75 Hz, respectively).  For RTP
   packetization of hypothetical H.263+ bitstreams using "unreasonable"
   custom picture clock frequencies, mathematical rounding could become
   necessary for generating the RTP timestamps.

2.2 Video Packet Structure

   A section of an H.263+ compressed bitstream is carried as a payload
   within each RTP packet.  For each RTP packet, the RTP header is
   followed by an H.263+ payload header, which is followed by a number
   of bytes of a standard H.263+ compressed bitstream.
   The size of the H.263+ payload header is variable depending on the
   payload involved as detailed in section 4.  The layout of the RTP
   H.263+ video packet is shown as:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    RTP Header                                               ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    H.263+ Payload Header                                    ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |    H.263+ Compressed Data Stream                            ...
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Any H.263+ start codes can be byte aligned by an encoder by using the
   stuffing mechanisms of H.263+.  As specified in H.263+, picture,
   slice, and EOSBS start codes shall always be byte aligned, and GOB
   and EOS start codes may be byte aligned.  For packetization purposes,
   GOB start codes should be byte aligned; however, since this is not
   required in H.263+, there may be some cases where GOB start codes are
   not aligned, such as when transmitting existing content, or when
   using H.263 encoders that do not support GOB start code alignment.
   In this case, follow-on packets (see section 5.2) should be used for
   packetization.

   All H.263+ start codes (Picture, GOB, Slice, EOS, and EOSBS) begin
   with 16 zero-valued bits.  If a start code is byte aligned and it
   occurs at the beginning of a packet, these two bytes shall be removed
   from the H.263+ compressed data stream in the packetization process
   and shall instead be represented by setting a bit (the P bit) in the
   payload header.

3. Design Considerations

   The goals of this payload format are to specify an efficient way of
   encapsulating an H.263+ standard compliant bitstream and to enhance
   the resiliency towards packet losses.  Due to the large number of
   different possible coding schemes in H.263+, a copy of the picture
   header with configuration information is inserted into the payload
   header when appropriate.  The use of that copy of the picture header
   along with the payload data can allow decoding of a received packet
   even in such cases in which another packet containing the original
   picture header becomes lost.

   There are a few assumptions and constraints associated with this
   H.263+ payload header design.  The purpose of this section is to
   point out various design issues and also to discuss several coding
   options provided by H.263+ that may impact the performance of
   network-based H.263+ video.

   o The optional slice structured mode described in Annex K of H.263+
     [4] enables more flexibility for packetization.  Similar to a
     picture segment that begins with a GOB header, the motion vector
     predictors in a slice are restricted to reside within its
     boundaries.  However, slices provide much greater freedom in the
     selection of the size and shape of the area which is represented as
     a distinct decodable region. In particular, slices can have a size
     which is dynamically selected to allow the data for each slice to
     fit into a chosen packet size. Slices can also be chosen to have a
     rectangular shape which is conducive for minimizing the impact of
     errors and packet losses on motion compensated prediction.  For
     these reasons, the use of the slice structured mode is strongly
     recommended for any applications used in environments where
     significant packet loss occurs.

   o In non-rectangular slice structured mode, only complete slices
     should be included in a packet.
     In other words, slices should not be fragmented across packet
     boundaries.  The only reasonable need for a slice to be fragmented
     across packet boundaries is when the encoder which generated the
     H.263+ data stream could not be influenced by an awareness of the
     packetization process (such as when sending H.263+ data through a
     network other than the one to which the encoder is attached, as in
     network gateway implementations).  Optimally, each packet will
     contain only one slice.

   o The independent segment decoding (ISD) described in Annex R of [4]
     prevents any data dependency across slice or GOB boundaries in the
     reference picture.  It can be utilized to further improve
     resiliency in high loss conditions.

   o If ISD is used in conjunction with the slice structure, the
     rectangular slice submode shall be enabled and the dimensions and
     quantity of the slices present in a frame shall remain the same
     between each two intra-coded frames (I-frames), as required in
     H.263+. The individual ISD segments may also be entirely intra
     coded from time to time to realize quick error recovery without
     adding the latency time associated with sending complete INTRA-
     pictures.

   o When the slice structure is not applied, the insertion of a
     (preferably byte-aligned) GOB header can be used to provide resync
     boundaries in the bitstream, as the presence of a GOB header
     eliminates the dependency of motion vector prediction across GOB
     boundaries.  These resync boundaries provide natural locations for
     packet payload boundaries.

   o H.263+ allows picture headers to be sent in an abbreviated form in
     order to prevent repetition of overhead information that does not
     change from picture to picture.
     For resiliency, sending a complete picture header for every frame
     is often advisable.  This means that (especially in cases with high
     packet loss probability in which picture header contents are not
     expected to be highly predictable), the sender may find it
     advisable to always set the subfield UFEP in PLUSPTYPE to '001' in
     the H.263+ video bitstream.  (See [4] for the definition of the
     UFEP and PLUSPTYPE fields).
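
   The timestamp arithmetic of section 2.1 and the start code handling
   of section 2.2 can be sketched in Python as follows.  This is a
   minimal illustration under stated assumptions, not part of the
   specification: the names timestamp_increment and strip_start_code
   are hypothetical, and strip_start_code assumes the caller has already
   split the bitstream so that two leading zero bytes can only be the
   first 16 bits of a byte-aligned start code.

```python
from fractions import Fraction
from typing import Tuple


def timestamp_increment(cd: int, cf: int) -> Fraction:
    """RTP ticks (90 kHz clock) per picture clock period.

    The picture clock runs at 1800000 / (cd * cf) Hz, so one period
    lasts 90000 * (cd * cf) / 1800000 = (cd * cf) / 20 ticks.
    """
    if not 1 <= cd <= 127:
        raise ValueError("cd must be an integer from 1 to 127")
    if cf not in (1000, 1001):
        raise ValueError("cf must be 1000 or 1001")
    # A Fraction keeps the result exact, so a non-integer increment
    # (an "unreasonable" clock frequency) is visible to the caller.
    return Fraction(cd * cf, 20)


def strip_start_code(chunk: bytes) -> Tuple[int, bytes]:
    """Return (P, payload) for a chunk of the H.263+ bitstream.

    Assumption: if the chunk begins with two zero bytes, they are the
    leading 16 zero bits of a byte-aligned start code.  They are
    removed and signalled by P = 1; otherwise the chunk is unchanged
    and P = 0.
    """
    if chunk[:2] == b"\x00\x00":
        return 1, chunk[2:]
    return 0, chunk


if __name__ == "__main__":
    # 29.97 Hz NTSC corresponds to cd = 60, cf = 1001.
    print(timestamp_increment(60, 1001))
    # A chunk that begins with the two zero bytes of a start code.
    print(strip_start_code(b"\x00\x00\x80\x02"))
```

   With cd = 60 and cf = 1001 (29.97 Hz NTSC) the increment is 3003
   ticks, matching the value given in section 2.1.  A result whose
   denominator is not 1 corresponds to one of the "unreasonable" picture
   clock frequencies for which rounding would be needed.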