Network Working Group                                             C. Zhu
Request for Comments: 2190                                   Intel Corp.
Category: Standards Track                                 September 1997


               RTP Payload Format for H.263 Video Streams

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   This document specifies the payload format for encapsulating an H.263
   bitstream in the Real-Time Transport Protocol (RTP). Three modes are
   defined for the H.263 payload header. An RTP packet can use one of
   the three modes for H.263 video streams depending on the desired
   network packet size and H.263 encoding options employed. The shortest
   H.263 payload header (mode A) supports fragmentation at Group of
   Blocks (GOB) boundaries. The longer H.263 payload headers (modes B
   and C) support fragmentation at Macroblock (MB) boundaries.

1. Introduction

   This document describes a scheme to packetize an H.263 video stream
   for transport using RTP [1]. H.263 video streams are defined by ITU-T
   Recommendation H.263 (referred to as H.263 in this document) [4] for
   video coding at very low data rates. RTP is defined by the Internet
   Engineering Task Force (IETF) to provide end-to-end network transport
   functions suitable for applications transmitting real-time data over
   multicast or unicast network services.

2. Definitions

   The following definitions apply in this document:

   CIF: Common Intermediate Format. For H.263, a CIF picture has 352 x
   288 pixels for luminance, and 176 x 144 pixels for chrominance.

   QCIF: Quarter CIF source format with 176 x 144 pixels for luminance
   and 88 x 72 pixels for chrominance.

   Sub-QCIF: Picture source format with 128 x 96 pixels for luminance
   and 64 x 48 pixels for chrominance.



Zhu                         Standards Track                     [Page 1]

RFC 2190       RTP Payload Format for H.263 Video Streams September 1997


   4CIF: Picture source format with 704 x 576 pixels for luminance and
   352 x 288 pixels for chrominance.

   16CIF: Picture source format with 1408 x 1152 pixels for luminance
   and 704 x 576 pixels for chrominance.

   GOB: For H.263, a Group of Blocks (GOB) consists of k*16 lines,
   where k depends on the picture format (k=1 for QCIF, CIF and
   sub-QCIF; k=2 for 4CIF and k=4 for 16CIF).

   MB: A macroblock (MB) contains four blocks of luminance and the two
   spatially corresponding blocks of chrominance. Each block consists of
   8x8 pixels. For example, there are eleven MBs in a GOB in QCIF format
   and twenty-two MBs in a GOB in CIF format.
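   The definitions above determine how many GOBs a picture contains and
   how many MBs each GOB holds. The Python sketch below derives those
   counts from the luminance dimensions and k, assuming 16x16 luminance
   macroblocks and chrominance subsampled by two in each direction. The
   class and property names are illustrative only; they do not come
   from this specification or from H.263.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFormat:
    """Illustrative description of an H.263 source format (hypothetical helper)."""
    name: str
    width: int   # luminance width in pixels
    height: int  # luminance height in pixels
    k: int       # a GOB spans k*16 luminance lines, i.e. k rows of MBs

    @property
    def chroma_size(self) -> tuple[int, int]:
        # Each chrominance plane is half the luminance size in both directions.
        return self.width // 2, self.height // 2

    @property
    def mb_cols(self) -> int:
        # A 16x16 luminance MB holds four 8x8 luminance blocks.
        return self.width // 16

    @property
    def mbs_per_gob(self) -> int:
        # A GOB covers k complete rows of MBs.
        return self.mb_cols * self.k

    @property
    def gobs_per_picture(self) -> int:
        return self.height // (16 * self.k)


FORMATS = {f.name: f for f in (
    SourceFormat("sub-QCIF", 128, 96, 1),
    SourceFormat("QCIF", 176, 144, 1),
    SourceFormat("CIF", 352, 288, 1),
    SourceFormat("4CIF", 704, 576, 2),
    SourceFormat("16CIF", 1408, 1152, 4),
)}

if __name__ == "__main__":
    for fmt in FORMATS.values():
        print(f"{fmt.name}: {fmt.gobs_per_picture} GOBs of {fmt.mbs_per_gob} MBs")
```

   For QCIF this yields 9 GOBs of 11 MBs and for CIF 18 GOBs of 22 MBs,
   consistent with the MB definition; 4CIF and 16CIF also have 18 GOBs,
   each two or four MB rows tall.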

3. Design Issues for Packetizing H.263 Bitstreams

   H.263 is based on the ITU-T Recommendation H.261 [2] (referred to as
   H.261 in this document). Compared to H.261, H.263 employs similar
   techniques to reduce both temporal and spatial redundancy, but there
   are several major differences between the two algorithms that affect
   the design of packetization schemes significantly. This section
   summarizes those differences.

3.1 Optional Features of H.263

   In addition to the basic source coding algorithms, H.263 supports
   four negotiable coding options to improve performance: Advanced
   Prediction, PB-frames, Syntax-based Arithmetic Coding, and
   Unrestricted Motion Vectors. They can be used in any combination.

   Advanced Prediction (AP): One or four motion vectors can be used for
   some macroblocks in a frame. This feature makes recovery from packet
   loss difficult, because more redundant information has to be
   preserved at the beginning of a packet when fragmenting at a
   macroblock boundary.

   PB-frames: Two frames (a P frame and a B frame) are coded into one
   bitstream with macroblocks from the two frames interleaved. From a
   packetization point of view, a MB from the P frame and a MB from the
   B frame must be treated together because each MB for the B frame is
   coded based on the corresponding MB for the P frame. A means must be
   provided to ensure proper rendering of two frames in the right order.
   Also, if part of this combined bitstream is lost, it will affect both
   frames, and possibly more.

   Syntax-based Arithmetic Coding (SAC): When the SAC option is used,
   the run-value pairs that result from quantizing the Discrete Cosine
   Transform (DCT) coefficients are arithmetic coded instead of Huffman
   coded, but the macroblock hierarchy is preserved. Because the
   context variables of the arithmetic coder are synchronized only after
   fixed length codes in the bitstream, a packet that starts at a
   variable length code is difficult to decode after a packet loss
   unless each H.263 payload header carries the values of all the
   context variables.

   Unrestricted Motion Vectors: This option allows a larger range of
   motion vectors, which improves motion compensation for inter-coded
   pictures. It also affects packetization, because motion vectors can
   take a larger range of values than normal.

   To allow each received packet to be decoded without depending on
   previous packets, the use of these optional features is signaled in
   the H.263 payload header, as described in Section 5.

3.2 GOB Numbering

   In H.263, each picture is divided into groups of blocks (GOB). GOBs
   are numbered according to a vertical scan of a picture, starting with
   the top GOB and ending with the bottom GOB. In contrast, a GOB in
   H.261 is composed of three rows of 16x16 MBs for QCIF, and three
   half-rows of MBs for CIF. In H.263, a GOB is likewise divided into
   macroblocks, and the definition of a macroblock is the same as in
   H.261.

   Each GOB in H.263 can have a fixed GOB header, but the use of the
   header is optional. If the GOB header is present, it may or may not
   start on a byte boundary. Byte alignment can be achieved by proper
   bit stuffing by the encoder, but it is not required by the H.263
   bitstream specification [4].
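   As an illustration of byte alignment by bit stuffing, the Python
   sketch below inserts fewer than eight zero bits so that the 17-bit
   GOB start code (GBSC, sixteen zeros followed by a one, as defined in
   H.263 [4]) begins on a byte boundary. The BitWriter class and the
   write_aligned_gbsc function are hypothetical helpers, not part of
   any codec API, and the rest of the GOB header is omitted.

```python
class BitWriter:
    """Minimal MSB-first bit writer (illustrative helper, not a real codec API)."""

    def __init__(self) -> None:
        self.bits: list[int] = []

    @property
    def position(self) -> int:
        return len(self.bits)

    def write(self, value: int, nbits: int) -> None:
        # Append the nbits least significant bits of value, most significant first.
        for shift in range(nbits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        # Zero-pad the final partial byte so the result can be inspected.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


GBSC_VALUE, GBSC_BITS = 0b1, 17  # 0000 0000 0000 0000 1


def write_aligned_gbsc(writer: BitWriter) -> int:
    """Stuff zero bits so the GBSC starts on a byte boundary; return its bit offset."""
    stuffing = -writer.position % 8  # between 0 and 7 zero bits
    writer.write(0, stuffing)
    start = writer.position
    writer.write(GBSC_VALUE, GBSC_BITS)
    return start
```

   For example, after 13 bits of macroblock data the writer adds three
   zero bits and the GBSC starts at bit 16. Stuffing costs at most seven
   bits per GOB header, which is why an encoder can afford it even
   though H.263 does not require it.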

   In summary, a GOB in H.263 is defined and coded with finer
   granularity but with the same source format, resulting in more
   flexibility for packetization than with H.261.

3.3 Motion Vector Encoding

   Differential coding is used to code motion vectors as variable length
   codes. Unlike in H.261, where each motion vector is predicted from
   the previous MB in the GOB, H.263 employs a more flexible prediction
   scheme, where one or three candidate predictors could be used
   depending on the presence of GOB headers.

   If the GOB header is present in a GOB, motion vectors are coded with
   reference to MBs in the current GOB only. If a GOB header is not
   present in the current GOB, three motion vectors must be available to
   decode one macroblock, where two of them might come from the previous
   GOB. To correctly decode a whole inter-coded GOB, all the motion
   vectors for MBs in the previous GOB must be available to compute the
   predictors or the predictors themselves must be present. The optional
   use of three motion vector predictors can be a major problem for a
   packetization scheme like the one defined for H.261 when packetizing
   at MB boundaries [5].
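   The dependency described above can be made concrete with the median
   prediction rule of baseline H.263 (one motion vector per MB, Advanced
   Prediction not used). The Python sketch below takes the left (MV1),
   above (MV2) and above-right (MV3) candidates and, when the current
   GOB has a GOB header, replaces candidates lying above the GOB with
   MV1, so the predictor for the first MB row of the GOB reduces to MV1.
   The function name and the dictionary of decoded vectors are
   assumptions made for illustration; H.263 [4] is normative.

```python
MV = tuple[int, int]  # (horizontal, vertical) motion vector components
ZERO: MV = (0, 0)


def median3(a: int, b: int, c: int) -> int:
    return sorted((a, b, c))[1]


def predict_mv(decoded: dict[tuple[int, int], MV], row: int, col: int,
               mb_cols: int, gob_first_row: int, gob_header: bool) -> MV:
    """Median MV predictor for the MB at (row, col), in MB coordinates.

    decoded maps (row, col) -> MV for MBs decoded so far. A missing entry is
    treated as a zero vector (as for intra-coded or uncoded MBs); a decoder
    that lost the packet holding those MBs would silently use a wrong value.
    """
    # MV1: left neighbour, zero at the left picture edge.
    mv1 = decoded.get((row, col - 1), ZERO) if col > 0 else ZERO
    above = row - 1
    # Above the picture, or above the GOB when a GOB header is present:
    # MV2 and MV3 are replaced by MV1.
    if above < 0 or (gob_header and above < gob_first_row):
        mv2 = mv3 = mv1
    else:
        mv2 = decoded.get((above, col), ZERO)
        mv3 = decoded.get((above, col + 1), ZERO)
    # MV3 is zero when the above-right MB lies beyond the right picture edge.
    if col + 1 >= mb_cols:
        mv3 = ZERO
    return (median3(mv1[0], mv2[0], mv3[0]), median3(mv1[1], mv2[1], mv3[1]))
```

   With gob_header=False the result for an MB in the first row of a GOB
   depends on two vectors from the previous GOB, which is exactly what a
   receiver lacks when the packet carrying that GOB is lost.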

   Consider the case where a packet starts with a MB but the GOB header
   is not present. If the previous packet is lost, then the motion
   vectors needed to predict the motion vectors for the MBs in the
   current GOB are not available. In order to decode the received MBs
   correctly, all the motion vectors for the previous GOB or the motion
   vector predictors would have to be duplicated at the beginning of the
   packet. This kind of duplication would be very expensive and
   unacceptable in terms of bandwidth overhead.

   The encoding strategy of each H.263 CODEC (CODer and DECoder)
   implementation is beyond the scope of this document, even though it
   has significant effect on visual quality in the presence of packet
   loss. However, we strongly recommend use of the GOB header for every
   GOB at the beginning of a packet to address this problem.

   Similar problems exist because of cross-GOB data dependency related
   to motion vectors, but they cannot be addressed by using the GOB
   header. For 16CIF and 4CIF pictures, a GOB contains more than one row
   of MBs. If a GOB cannot fit in one RTP packet, and the first packet
   containing the GOB header is lost, then MBs in the second packet
   cannot compute motion vectors correctly, because they are coded
   relative to data in the lost packet. Similarly, when OBMC (Overlapped
   Block Motion Compensation) [4] in Advanced Prediction mode is used,
   motion compensation for some MBs in one GOB could use motion vectors
   of MBs in the previous GOB regardless of the presence of a GOB
   header. When MBs that are used to decode received MBs are lost, those
   received MBs cannot be decoded correctly. Each implementation of the
   method described in this document should take these limitations into
   account.

3.4 Macroblock Address

   As specified by H.261, a macroblock address (MBA) is encoded with a
   variable length code to indicate the position of a macroblock within
   a group of MBs in H.261 bitstreams. H.263 does not code the MBA
   explicitly, but the macroblock address within a GOB is necessary to
   recover from packet loss when fragmenting at MB boundaries.
   Therefore, this information must be included in the H.263 payload
   header for modes (mode B and mode C as described in Section 5) that
   allow packetization at MB boundaries.
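   Because MBs are scanned left to right and top to bottom, and a GOB
   holds k complete MB rows, the GOB number and the address within the
   GOB of any macroblock follow from its raster index. The Python sketch
   below computes them, counting addresses from zero, and uses them to
   label the packets produced by a simple greedy packetizer that splits
   at MB boundaries. The bit budget, the per-MB sizes and both function
   names are hypothetical; a real packetizer must also handle MB
   boundaries that are not byte-aligned and the other payload header
   fields defined in Section 5.

```python
def mb_address(mb_index: int, mb_cols: int, k: int) -> tuple[int, int]:
    """Map a raster-order MB index to (GOB number, MB address within the GOB)."""
    return divmod(mb_index, mb_cols * k)


def fragment_at_mb_boundaries(mb_bits: list[int], mb_cols: int, k: int,
                              budget_bits: int) -> list[tuple[int, int, int]]:
    """Greedily group consecutive MBs into packets of at most budget_bits.

    Returns (GOB number, MBA of the first MB, MB count) for each packet.
    An MB larger than the budget is placed alone in its own packet.
    """
    ranges = []
    start, used = 0, 0
    for i, bits in enumerate(mb_bits):
        if i > start and used + bits > budget_bits:
            ranges.append((start, i))
            start, used = i, 0
        used += bits
    if start < len(mb_bits):
        ranges.append((start, len(mb_bits)))
    return [(*mb_address(s, mb_cols, k), e - s) for s, e in ranges]
```

   For two QCIF GOBs of equally sized MBs (100 bits each) and a 700-bit
   budget, the third packet starts in GOB 1 at MB address 3; without the
   MBA in the payload header a receiver that lost the second packet
   could not place those MBs.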

4. Usage of RTP

   When transmitting H.263 video streams over the Internet, the output
   of the encoder can be packetized directly. For every video frame, the
   H.263 bitstream itself is carried in the RTP payload without
   alteration, including the picture start code, the entire picture
   header, in addition to any fixed length codes and variable length
   codes.  In addition, the output of the encoder is packetized without
   adding the framing information specified by H.223 [6]. Therefore,
   multiplexing audio and video signals in the same packet is not
   accommodated, as UDP and RTP provide a much more efficient way to
   achieve multiplexing.

   RTP does not guarantee a reliable and orderly data delivery service,
   so a packet might get lost in the network. To achieve a best-effort
   recovery from packet loss, the decoder needs assistance to proceed
   with decoding of other packets that are received. Thus it is
   desirable to be able to process each packet independently of other
   packets. Some frame level information, such as the source format and
   flags for optional features, is included in each packet to assist
   the decoder in operating correctly and efficiently in the presence of
   packet loss. The flags for H.263 optional features also provide
   information about coding options used in H.263 video bitstreams that
   can be used by session management tools.

   H.263 video bitstreams will be carried as payload data within RTP
   packets. A new H.263 payload header is defined in Section 5. This
   section defines the usage of the RTP fixed header and the H.263
   video packet structure.

4.1 RTP Header Usage

   Each RTP packet starts with a fixed RTP header [1]. The following
   fields of the RTP fixed header are used for H.263 video streams:

   Marker bit (M bit): The Marker bit of the RTP fixed header is set to
   1 when the current packet carries the end of the current frame; it is
   set to 0 otherwise.

   Payload Type (PT): The Payload Type shall specify H.263 video payload
   format using the value specified by the RTP profile in use, for
   example RFC 1890 [3].

   Timestamp: The RTP timestamp encodes the sampling instant of the
   video frame contained in the RTP data packet. The RTP timestamp may
   be the same on successive packets if a video frame occupies more
   than one packet. For H.263 video streams, the RTP timestamp is based
   on a 90 kHz clock, the same as the RTP timestamp for H.261 video
   streams [5].
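   The rules above can be sketched in Python as follows: every packet of
   one video frame gets a 12-byte RTP fixed header [1] with the same
   90 kHz timestamp, consecutive sequence numbers, and the marker bit set
   only on the last packet. The sketch assumes no CSRCs, padding or
   header extension, and defaults to payload type 34, the static value
   the RFC 1890 [3] profile assigns to H.263; a session using a dynamic
   payload type would pass its own value. The helper names, and the
   30000/1001 Hz picture rate used for timestamps, are assumptions.

```python
import struct

RTP_VERSION = 2
PT_H263 = 34           # static payload type for H.263 in the RFC 1890 A/V profile
RTP_CLOCK_HZ = 90_000  # RTP timestamp clock for H.263, as for H.261


def rtp_fixed_header(seq: int, timestamp: int, ssrc: int, marker: bool,
                     payload_type: int = PT_H263) -> bytes:
    byte0 = RTP_VERSION << 6  # P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def timestamp_for_picture(picture_index: int, base: int = 0) -> int:
    """90 kHz timestamp, assuming pictures are sampled at 30000/1001 Hz."""
    ticks_per_picture = RTP_CLOCK_HZ * 1001 // 30000  # 3003 ticks
    return (base + picture_index * ticks_per_picture) & 0xFFFFFFFF


def packetize_frame(payloads: list[bytes], first_seq: int, timestamp: int,
                    ssrc: int) -> list[bytes]:
    """Prefix each H.263 payload (payload header + bitstream) with an RTP header."""
    last = len(payloads) - 1
    return [rtp_fixed_header(first_seq + i, timestamp, ssrc, marker=(i == last)) + body
            for i, body in enumerate(payloads)]
```

   Sharing one timestamp across a frame's packets lets the receiver
   regroup them, while the marker bit tells it the frame is complete
   without waiting for the next frame's first packet.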

4.2 Video Packet Structure

   For each RTP packet, the RTP fixed header is followed by the H.263
   payload header, which is followed by the standard H.263 compressed
   bitstream [4].

   The size of the H.263 payload header depends on the mode used, as
   detailed in the next section. The layout of an RTP H.263 video
   packet is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 RTP header                                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 H.263 payload header                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 H.263 bitstream                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
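   On the receiving side, the layout above can be taken apart as in the
   Python sketch below. The RTP header length is derived from the CSRC
   count and the optional header extension [1], and trailing padding is
   removed. The H.263 payload header length is chosen by the caller from
   the per-mode sizes given in Section 5 (four bytes for mode A, eight
   for mode B, twelve for mode C); the sketch does not determine the
   mode from the header bits. The function name is hypothetical.

```python
import struct

PAYLOAD_HEADER_LEN = {"A": 4, "B": 8, "C": 12}  # bytes, per mode


def split_h263_packet(packet: bytes, mode: str) -> tuple[bytes, bytes, bytes]:
    """Return (rtp_header, h263_payload_header, h263_bitstream)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    first = packet[0]
    offset = 12 + 4 * (first & 0x0F)  # fixed header plus CSRC list
    if first & 0x10:  # X bit: skip the header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    end = len(packet)
    if first & 0x20:  # P bit: the last octet counts the padding octets
        end -= packet[-1]
    header_len = PAYLOAD_HEADER_LEN[mode]
    if end < offset + header_len:
        raise ValueError("packet too short for the H.263 payload header")
    return (packet[:offset], packet[offset:offset + header_len],
            packet[offset + header_len:end])
```

   Keeping the three parts separate mirrors the layout: the RTP header
   is handled generically, while only the middle part depends on the
   H.263 payload header mode.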


5. H.263 Payload Header

   For H.263 video streams, each RTP packet carries only one H.263 video
   packet. The H.263 payload header is always present for each H.263
   video packet.

   Three formats (mode A, mode B and mode C) are defined for H.263
   payload header. In mode A, an H.263 payload header of four bytes is
   present before the actual compressed H.263 video bitstream in a packet.
   It allows fragmentation at GOB boundaries. In mode B, an eight byte
   H.263 payload header is used and each packet starts at MB boundaries
   without the PB-frames option. Finally, a twelve byte H.263 payload