Network Working Group                                         D. Hoffman
Request for Comments: 2038                                   G. Fernando
Category: Standards Track                         Sun Microsystems, Inc.
                                                                V. Goyal
                                                  Precept Software, Inc.
                                                            October 1996


                RTP Payload Format for MPEG1/MPEG2 Video

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   This memo describes a packetization scheme for MPEG video and audio
   streams.  The scheme proposed can be used to transport such a video
   or audio flow over the transport protocols supported by RTP.  Two
   approaches are described. The first is designed to support maximum
   interoperability with MPEG System environments.  The second is
   designed to provide maximum compatibility with other RTP-encapsulated
   media streams and future conference control work of the IETF.

1. Introduction

   ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has
   defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard
   (ISO/IEC 13818)[2].  This memo describes a packetization scheme to
   transport MPEG video and audio streams using the Real-time Transport
   Protocol (RTP), version 2 [3, 4].

   The MPEG1 specification is defined in three parts: System, Video and
   Audio.  It is designed primarily for CD-ROM-based applications, and
   is optimized for approximately 1.5 Mbits/sec combined data rates. The
   video and audio portions of the specification describe the basic
   format of the video or audio stream.  These formats define the
   Elementary Streams (ES).  The MPEG1 System specification defines an
   encapsulation of the ES that contains Presentation Time Stamps (PTS),
   Decoding Time Stamps and System Clock references, and performs
   multiplexing of MPEG1 compressed video and audio ES's with user data.






Hoffman, et. al.            Standards Track                     [Page 1]

RFC 2038        RTP Payload Format for MPEG1/MPEG2 Video    October 1996


   The MPEG2 specification is structured in a similar way.  However, it
   is not restricted to CD-ROM applications.  The MPEG2 System
   specification defines two system stream formats:  the MPEG2 Transport
   Stream (MTS) and the MPEG2 Program Stream (MPS).  The MTS is tailored
   for communicating or storing one or more programs of MPEG2 compressed
   data and also other data in relatively error-prone environments. The
   MPS is tailored for relatively error-free environments.

   We seek to achieve interoperability among four types of end-systems
   in the following specification.  The four types are:

        1. Transmitting Interworking Unit (TIU)

           Receives MPEG information from a native MTS system for
           distribution over packet networks using a native RTP-based
           system layer (such as an IP-based internetwork). Examples:
           real-time encoder, MTS satellite link to Internet, video
           server with MTS-encoded source material.

        2. Receiving Interworking Unit (RIU)

           Receives MPEG information in real time from an RTP-based
           network for forwarding to a native MTS environment.
           Examples: Internet-based video server to MTS-based cable
           distribution plant.

        3. Transmitting Internet End-System (TAES)

           Transmits MPEG information generated or stored within the
           internet end-system itself, or received from internet-based
           computer networks.  Example: video server.

        4. Receiving Internet End-System (RAES)

           Receives MPEG information over an RTP-based internet for
           consumption at the internet end-system or forwarding to a
           traditional computer network.  Example: a desktop PC or
           workstation viewing a training video.

   Each of the two types of transmitters must work with each of the two
   types of receivers.  Because it is probable that the TAES, and
   certain that the RAES, will be based on existing and planned
   internet-connected computers, it is highly desirable for the
   interoperable protocol to be based on RTP.

   Because of the range of applications that might employ MPEG streams,
   we propose to define two payload formats.




   Much interest in the MPEG community is in the use of one of the MPEG
   System encodings, and hence, in Section 2 we propose encapsulations
   of MPEG1 System streams and MPEG2 Transport and Program Streams with
   RTP.  This profile supports the full semantics of MPEG System and
   offers basic interoperability among all four end-system types.

   When operating only among internet-based end-systems (i.e., TAES and
   RAES) a payload format that provides greater compatibility with the
   Internet architecture is desired, deferring some of the system issues
   to other protocols being defined in the Internet community (such as
   the MMUSIC WG).  In Section 3 we propose an encapsulation of
   compressed video and audio data (referred to in MPEG documentation as
   "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2.
   Here, neither the MPEG1 nor the MPEG2 System standard is used; the
   ES's are directly encapsulated with RTP.

   Throughout this specification, we make extensive use of MPEG
   terminology.  The reader should consult the primary MPEG references
   for definitive descriptions of this terminology.

2. Encapsulation of MPEG System and Transport Streams

   Each RTP packet will contain a timestamp derived from the sender's
   90 kHz clock reference.  This clock is synchronized to the system
   stream Program Clock Reference (PCR) or System Clock Reference (SCR)
   and represents the target transmission time of the first byte of the
   packet payload.  The RTP timestamp will not be passed to the MPEG
   decoder.  This use of the timestamp is somewhat different from the
   usual practice in RTP, in that it is not considered to be the
   media display or presentation timestamp. The primary purposes of the
   RTP timestamp will be to estimate and reduce any network-induced
   jitter and to synchronize relative time drift between the transmitter
   and receiver.
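   As a minimal sketch (the function name and the example times are our
   own illustrations, not part of this memo), a sender might map its
   90 kHz reference clock onto the 32-bit RTP timestamp like this:

```python
def rtp_timestamp(target_send_time: float) -> int:
    """Map a target transmission time (seconds on the sender's 90 kHz-
    locked reference clock) onto 90 kHz ticks, wrapped to the 32 bits
    available in the RTP timestamp field."""
    return round(target_send_time * 90_000) & 0xFFFFFFFF

# Two payloads scheduled 40 ms apart differ by 3600 ticks of the
# 90 kHz clock, regardless of their presentation timestamps.
t0 = rtp_timestamp(100.000)
t1 = rtp_timestamp(100.040)
```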

   For MPEG2 Transport Streams the RTP payload will contain an integral
   number of MPEG transport packets.  To avoid end system
   inefficiencies, data from multiple small MTS packets (normally fixed
   in size at 188 bytes) are aggregated into a single RTP packet.  The
   number of transport packets contained is computed by dividing RTP
   payload length by the length of an MTS packet (188).
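   That aggregation rule can be sketched as follows (the function names
   and the 1400-byte payload budget are illustrative assumptions, not
   part of this memo):

```python
TS_PACKET_SIZE = 188  # fixed MPEG2 Transport Stream packet length

def pack_ts(ts_packets: list[bytes], budget: int = 1400) -> list[bytes]:
    """Aggregate whole 188-byte transport packets into RTP payloads,
    never splitting a transport packet across RTP packets."""
    per_rtp = budget // TS_PACKET_SIZE
    return [b"".join(ts_packets[i:i + per_rtp])
            for i in range(0, len(ts_packets), per_rtp)]

def ts_count(rtp_payload: bytes) -> int:
    """Receiver side: the packet count is implicit in the length."""
    assert len(rtp_payload) % TS_PACKET_SIZE == 0
    return len(rtp_payload) // TS_PACKET_SIZE
```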

   For MPEG2 Program streams and MPEG1 system streams there are no
   packetization restrictions; these streams are treated as a packetized
   stream of bytes.







2.1 RTP header usage

   The RTP header fields are used as follows:

        Payload Type: Distinct payload types should be assigned for
          MPEG1 System Streams, MPEG2 Program Streams and MPEG2
          Transport Streams.  See [4] for payload type assignments.

        M bit:  Set to 1 whenever the timestamp is discontinuous
          (such as might happen when a sender switches from one data
          source to another). This allows the receiver and any
          intervening RTP mixers or translators that are synchronizing
          to the flow to ignore the difference between this timestamp
          and any previous timestamp in their clock phase detectors.

        timestamp: 32-bit, 90 kHz timestamp representing the target
          transmission time for the first byte of the packet.
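   As a sketch of the header usage described above (the packing helper
   is our own; the payload type value used below is a placeholder, and
   the real value comes from the assignments in [4]):

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int,
               ssrc: int, marker: bool = False) -> bytes:
    """Pack the fixed 12-byte RTP header: version 2, no padding, no
    extension, zero CSRCs.  For this payload format the M bit marks
    a timestamp discontinuity."""
    byte0 = 2 << 6                                # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# Example: a packet sent just after a source switch, so M=1.
hdr = rtp_header(payload_type=33, seq=1, timestamp=90_000,
                 ssrc=0x12345678, marker=True)
```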

3. Encapsulation of MPEG Elementary Streams

   The following ES types may be encapsulated directly in RTP:

        (a) MPEG1 Video (ISO/IEC 11172-2)
        (b) MPEG2 Video (ISO/IEC 13818-2)
        (c) MPEG1 Audio (ISO/IEC 11172-3)
        (d) MPEG2 Audio (ISO/IEC 13818-3)

   A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and
   MPEG1/MPEG2 Audio, respectively. Further indication as to whether the
   data is MPEG1 or MPEG2 need not be provided in the RTP or MPEG-
   specific headers of this encapsulation, as this information is
   available in the ES headers.

   Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
   shall be carried in the fixed RTP header. All packets that make up
   an audio or video frame shall have the same time stamp.

3.1 MPEG Video elementary streams

   MPEG1 Video can be distinguished from MPEG2 Video at the video
   sequence header, i.e. for MPEG2 Video a sequence_header() is followed
   by sequence_extension().  The particular profile and level of MPEG2
   Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc) are
   determined by the profile_and_level_indicator field of the
   sequence_extension header of MPEG2 Video.
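   A rough way to make this distinction in code (a heuristic sketch: it
   scans for the start code that follows the first sequence header; the
   start-code values 0x000001B3 and 0x000001B5 come from the MPEG video
   specifications, while the function name is ours):

```python
SEQ_HEADER = b"\x00\x00\x01\xb3"  # sequence_header_code
EXT_START  = b"\x00\x00\x01\xb5"  # extension_start_code

def looks_like_mpeg2(es: bytes) -> bool:
    """Heuristic: MPEG2 Video if the first sequence_header() is
    followed by a sequence_extension(); otherwise treat as MPEG1."""
    i = es.find(SEQ_HEADER)
    if i < 0:
        raise ValueError("no video sequence header found")
    # Scan for the next start code after the header's fixed fields.
    j = es.find(b"\x00\x00\x01", i + 4)
    return j >= 0 and es[j:j + 4] == EXT_START
```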

   The MPEG bit-stream semantics were designed for relatively error-free
   environments, and there is a significant amount of dependency (both
   temporal and spatial) within the stream, such that loss of some data
   makes other uncorrupted data useless.  The format as defined in this
   encapsulation uses application layer framing information plus
   additional information in the RTP stream-specific header to allow for
   certain recovery mechanisms.  Appendix 1 suggests several recovery
   strategies based on the properties of this encapsulation.

   Since MPEG pictures can be large, they will normally be fragmented
   into packets of size less than a typical LAN/WAN MTU.  The following
   fragmentation rules apply:

        1. The MPEG Video_Sequence_Header, when present, will always
           be at the beginning of an RTP payload.
        2. An MPEG GOP_header, when present, will always be at the
           beginning of the RTP payload, or will follow a
           Video_Sequence_Header.
        3. An MPEG Picture_Header, when present, will always be at the
           beginning of an RTP payload, or will follow a GOP_header.

   Each ES header must be completely contained within the packet.
   Consequently, a minimum RTP payload size of 261 bytes must be
   supported to contain the largest single header defined in the ES
   (that is, the extension_data() header containing the
   quant_matrix_extension()).  Otherwise, there are no restrictions on
   where headers may appear within packet payloads.

   In MPEG, each picture is made up of one or more "slices," and a slice
   is intended to be the unit of recovery from data loss or corruption.
   An MPEG-compliant decoder will normally advance to the beginning of
   the next slice whenever an error is encountered in the stream.  MPEG
   slice begin and end bits are provided in the encapsulation header to
   facilitate this.

   The beginning of a slice must either be the first data in a packet
   (after any MPEG ES headers) or must follow after some integral number
   of slices in a packet.  This requirement ensures that the beginning
   of the next slice after one with a missing packet can be found
   without requiring that the receiver scan the packet contents.  Slices
   may be fragmented across packets as long as all the above rules are
   met.
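   The slice-alignment rule can be sketched as a greedy packetizer
   (illustrative only: slice start codes occupy 0x00000101 through
   0x000001AF in MPEG video, and a single slice larger than the budget
   would still need byte-level fragmentation, which is omitted here):

```python
def split_on_slices(picture: bytes, budget: int = 1400) -> list[bytes]:
    """Fragment a coded picture so that every RTP payload begins on a
    slice boundary and carries an integral number of whole slices."""
    starts, i = [], 0
    while True:
        i = picture.find(b"\x00\x00\x01", i)
        if i < 0 or i + 3 >= len(picture):
            break
        if 0x01 <= picture[i + 3] <= 0xAF:   # slice_start_code range
            starts.append(i)
        i += 3
    starts.append(len(picture))
    payloads, cur = [], starts[0]
    for k in range(1, len(starts)):
        # If adding the next slice would overflow, flush what we have.
        if starts[k] - cur > budget and starts[k - 1] != cur:
            payloads.append(picture[cur:starts[k - 1]])
            cur = starts[k - 1]
    payloads.append(picture[cur:])
    return payloads
```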

   An implementation based on this encapsulation assumes that the
   Video_Sequence_Header is repeated periodically in the MPEG bit-
   stream.  In practice (though not required by the MPEG standard) this
   is used to allow channel switching and to receive and start decoding a
   continuously relayed MPEG bit-stream at arbitrary points in the media
   stream.  It is suggested that when playing back an MPEG stream from
   a file format (where the Video_Sequence_Header may only be
   represented at the beginning of the stream) the first
   Video_Sequence_Header (preceded by an end-of-stream indicator) be
   saved by the packetizer for periodic injection into the network
   stream.

3.2 MPEG Audio elementary streams

   MPEG1 Audio can be distinguished from MPEG2 Audio by the MPEG
   ancillary_data() header.  For either MPEG1 or MPEG2 Audio, distinct
   Presentation Time Stamps may be present for frames which correspond
   to either 384 samples for Layer-I, or 1152 samples for Layer-II or
   Layer-III.  The actual number of bytes required to represent this
   number of samples will vary depending on the encoder parameters.

   Multiple audio frames may be encapsulated within one RTP packet.  In
   this case, an integral number of audio frames must be contained
   within the packet and the fragmentation header defined in Section 3.5
   shall be set to 0.
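   A sketch of this whole-frame aggregation (the function name and the
   payload budget are our own assumptions):

```python
def pack_audio_frames(frames: list[bytes],
                      budget: int = 1400) -> list[bytes]:
    """Aggregate an integral number of whole MPEG audio frames into
    each RTP payload (fragmentation offset 0).  A frame larger than
    the budget would instead be fragmented across packets per
    Section 3.5, which is not shown here."""
    payloads, cur = [], b""
    for frame in frames:
        if cur and len(cur) + len(frame) > budget:
            payloads.append(cur)
            cur = b""
        cur += frame
    if cur:
        payloads.append(cur)
    return payloads
```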

   Also, if relatively short packets are used, one frame may be so
   large that it straddles multiple RTP packets.  For example, for
   Layer-II MPEG audio sampled at a rate of 44.1 kHz, each frame would
   represent a time slot of 26.1 msec.  At this sampling rate, if the
   compressed bit-rate is 384 kbits/sec (i.e., 48 kBytes/sec) then the
