draft-ietf-avt-mpeg4-simple-06.txt
Units. An MPEG Access Unit (AU) is the smallest data entity to which timing information is attributed. For audio, an Access Unit may represent an audio frame; for video, a picture. MPEG Access Units are by definition octet-aligned. If, for example, an audio frame is not octet-aligned, up to 7 zero-padding bits MUST be inserted at the end of the frame to achieve octet-aligned Access Units, as required by the MPEG-4 specification. MPEG-4 decoders MUST be able to decode AUs in which such padding is applied.

Consistent with the MPEG-4 specification, this document requires that each MPEG-4 part 2 video Access Unit includes all the coded data of a picture, any video stream headers that may precede the coded picture data, and any video stream stuffing that may follow it, up to, but not including, the startcode indicating the start of a new video stream or the next Access Unit.

2.3 Concatenation of Access Units

Frequently it is possible to carry multiple Access Units in one RTP packet. This is particularly useful for audio; for example, when AAC is used to encode a stereo signal at 64 kbit/s, AAC frames contain on average approximately 200 octets. On a LAN with a 1500 octet MTU this would allow on average 7 complete AAC frames to be carried per RTP packet.

Access Units may have a fixed size in octets, but a variable size is also possible. To facilitate parsing when multiple AUs are concatenated in one RTP packet, the size of each AU is made known to the receiver. When AUs of constant size are concatenated, this size is communicated "out of band" through a MIME format parameter. When AUs of variable size are concatenated, the RTP payload carries "in band" an AU size field for each contained AU.

van der Meer et al.          Expires June 2003                  [Page 6]

RFC xxxx      Transport of MPEG-4 Elementary Streams      December 2002

In combination with the RTP payload length, the size information allows the receiver to split the RTP payload back into the individual AUs.

To simplify the implementation of RTP receivers, it is required that when multiple AUs are carried in an RTP packet, each AU MUST be complete, i.e. the number of AUs in an RTP packet MUST be integral. In addition, an AU MUST NOT be repeated in other RTP packets; hence repetition of an AU is only possible by using a duplicate RTP packet.

2.4 Fragmentation of Access Units

MPEG allows for very large Access Units. Since most IP networks have significantly smaller MTU sizes, this payload format allows for the fragmentation of an Access Unit over multiple RTP packets so as to avoid IP layer fragmentation. To simplify the implementation of RTP receivers, an RTP packet SHALL either carry one or more complete Access Units or a single fragment of one Access Unit (i.e. packets MUST NOT contain fragments of multiple Access Units).

2.5 Interleaving

When an RTP packet carries a contiguous sequence of Access Units, the loss of such a packet can result in a "decoding gap" for the user. One method to alleviate this problem is to allow for the Access Units to be interleaved in the RTP packets. For a modest cost in latency and implementation complexity, significant error resiliency to packet loss can be achieved.

To support optional interleaving of Access Units, this payload format allows for index information to be sent for each Access Unit. After informing receivers about buffer resources to allocate for de-interleaving, the RTP sender is free to choose the interleaving pattern without propagating this information a priori to the receiver(s). Indeed, the sender could dynamically adjust the interleaving pattern based on the Access Unit size, error rates, etc.
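As an informal illustration of the interleaving idea described above, the following Python sketch shows one way a sender might distribute AU indices over RTP packets, and how a receiver might restore the order from the per-AU index alone. It is a sketch under stated assumptions, not part of the payload format: the function names (interleave, deinterleave_indices) and the restriction that the group length be a multiple of the number of AUs per packet are hypothetical choices made for the example. The parameter values in the usage lines (3 AUs per packet, group length 9) are those of the example in this section.

```python
from typing import List


def interleave(num_aus: int, group_len: int, aus_per_packet: int) -> List[List[int]]:
    """Return, for each RTP packet in sending order, the AU indices it carries.

    Hypothetical helper: assumes group_len is a multiple of aus_per_packet.
    """
    if group_len % aus_per_packet != 0:
        raise ValueError("group_len must be a multiple of aus_per_packet")
    # Number of packets per interleaving group; also the index stride
    # between consecutive AUs inside one packet.
    stride = group_len // aus_per_packet
    packets = []
    for group_start in range(0, num_aus, group_len):
        group_end = min(group_start + group_len, num_aus)
        for p in range(stride):
            indices = list(range(group_start + p, group_end, stride))
            if indices:
                packets.append(indices)
    return packets


def deinterleave_indices(received: List[List[int]]) -> List[int]:
    """Receiver side: order the received AUs by index, with no knowledge
    of the pattern the sender used."""
    return sorted(index for packet in received for index in packet)


packets = interleave(num_aus=18, group_len=9, aus_per_packet=3)
for i, indices in enumerate(packets):
    print(f"RTP packet({i}):", ", ".join(f"AU({n})" for n in indices))

# Simulate the loss of RTP packet(1): the missing AUs are spread out
# instead of forming one contiguous decoding gap.
received = [p for i, p in enumerate(packets) if i != 1]
print(deinterleave_indices(received))
```

In this sketch the loss of a single packet removes every third AU within one group rather than three consecutive AUs, which is the error-resilience benefit the text refers to.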
The RTP receiver does not need to know the interleaving pattern used; it only needs to extract the index information of the Access Unit and insert the Access Unit into the appropriate sequence in the decoding or rendering queue.

An example of interleaving is given below. Assume that an RTP packet contains 3 AUs, and that the AUs are numbered 0, 1, 2, 3, 4, etc. If an interleaving group length of 9 is chosen, then RTP packet(i) contains the following AU(n):

   RTP packet(0): AU(0), AU(3), AU(6)
   RTP packet(1): AU(1), AU(4), AU(7)
   RTP packet(2): AU(2), AU(5), AU(8)
   RTP packet(3): AU(9), AU(12), AU(15)
   RTP packet(4): AU(10), AU(13), AU(16)
   Etc.

2.6 Time stamp information

The RTP time stamp MUST carry the sampling instant of the first AU (fragment) in the RTP packet. When multiple AUs are carried within an RTP packet, the time stamps of subsequent AUs can be calculated if the frame period of each AU is known. For audio and video this is possible if the frame rate is constant. However, in some cases such a calculation is not possible, for example for variable frame rate video and for MPEG-4 BIFS streams carrying composition information.

To support such cases, this payload format can be configured to carry a time stamp in the RTP payload for each contained Access Unit. A time stamp MAY be conveyed in the RTP payload only for non-first AUs in the RTP packet, and SHALL NOT be conveyed for the first AU (fragment), as the time stamp for the first AU in the RTP packet is carried by the RTP time stamp.

MPEG-4 defines two types of time stamps, the composition time stamp (CTS) and the decoding time stamp (DTS). The CTS represents the sampling instant of an AU, and hence the CTS is equivalent to the RTP time stamp. The DTS may be used in MPEG-4 video streams that use bi-directional coding, i.e.
when pictures are predicted in both forward and backward direction by using either a reference picture in the past, or a reference picture in the future. The DTS cannot be carried in the RTP header. In some cases the DTS can be derived from the RTP time stamp using frame rate information; this requires deep parsing in the video stream, which may be considered objectionable. Moreover, if the video frame rate is variable, the required information may not even be present in the video stream. For both reasons, the capability has been defined to optionally carry the DTS in the RTP payload for each contained Access Unit. To keep the coding of time stamps efficient, each time stamp contained in the RTP payload is coded differentially: the CTS relative to the RTP time stamp, and the DTS relative to the CTS.

2.7 State indication of MPEG-4 system streams

ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to convey state information when transporting MPEG-4 system streams, this payload format allows for the optional carriage in the RTP payload of the stream state for each contained Access Unit. Stream states are used to signal "crucial" AUs that carry information whose loss cannot be tolerated and are also useful when repeating AUs according to the carousel mechanism defined in ISO/IEC 14496-1.

2.8 Random access indication

Random access to the content of MPEG-4 elementary streams may be possible at some but not all Access Units. To signal Access Units where random access is possible, a random access point flag can optionally be carried in the RTP payload for each contained Access Unit. Carriage of random access points is particularly useful for MPEG-4 system streams in combination with the stream state.

2.9 Carriage of auxiliary information

This payload format defines a specific field to carry auxiliary data.
The auxiliary data field is preceded by a field that specifies the length of the auxiliary data, so as to facilitate skipping of the data without parsing it. The coding of the auxiliary data is not defined in this document; instead the format, meaning and signaling of auxiliary information is expected to be specified in one or more future RFCs. Auxiliary information MUST NOT be transmitted until its format, meaning and signaling have been specified and its use has been signaled. Receivers that have knowledge of the auxiliary data MAY decode the auxiliary data, but receivers without knowledge of such data MUST skip the auxiliary data field.

2.10 MIME format parameters and configuring conditional fields

To support the features described in the previous sections, several fields are defined for carriage in the RTP payload. However, their use strongly depends on the type of MPEG-4 elementary stream that is carried. Sometimes a specific field is needed with a certain length, while in other cases such a field is not needed at all. To be efficient in either case, the fields to support these features are configurable by means of MIME format parameters. In general, a MIME format parameter defines the presence and length of the associated field. A length of zero indicates absence of the field. As a consequence, parsing of the payload requires knowledge of MIME format parameters. The MIME format parameters are conveyed to the receiver via SDP [5] messages, as specified in section 4.4.1, or through other means.

2.11 Global structure of payload format

The RTP payload following the RTP header contains three octet-aligned data sections, of which the first two MAY be empty. See figure 1.
+---------+-----------+-----------+---------------+
| RTP     | AU Header | Auxiliary | Access Unit   |
| Header  | Section   | Section   | Data Section  |
+---------+-----------+-----------+---------------+

          <----------RTP Packet Payload----------->

Figure 1: Data sections within an RTP packet

The first data section is the AU (Access Unit) Header Section, which contains one or more AU-headers; however, each AU-header MAY be empty, in which case the entire AU Header Section is empty. The second section is the Auxiliary Section, containing auxiliary data; this section MAY also be configured empty. The third section is the Access Unit Data Section, containing either a single fragment of one Access Unit or one or more complete Access Units. The Access Unit Data Section MUST NOT be empty.

2.12 Modes to transport MPEG-4 streams

While it is possible to build fully configurable receivers capable of receiving any MPEG-4 stream, this specification also allows for the design of simplified but dedicated receivers that are capable, for example, of receiving only one type of MPEG-4 stream. This is achieved by requiring that specific modes be defined for using this specification. Each mode may define constraints for transport of one or more types of MPEG-4 streams, for instance on the payload configuration.

The applied mode MUST be signaled. Signaling the mode is particularly important for receivers that are only capable of decoding one or more specific modes. Such receivers need to determine whether the applied mode is supported, so as to avoid problems with processing of payloads that are beyond the capabilities of the receiver.

In this document several modes are defined for transport of MPEG-4 CELP and AAC streams, as well as a generic mode that can be used for any MPEG-4 stream. In the future, new RFCs may specify other modes of using this specification.
However, each mode MUST be in full compliance with this specification (see section 3.3.7).

2.13 Alignment with RFC 3016

This payload can be configured to be nearly identical to the payload format defined in RFC 3016 [12] for the MPEG-4 video configurations recommended in RFC 3016. Hence, receivers that comply with RFC 3016 can decode such RTP payload, provided that additional packets containing video decoder configuration (VO, VOL, VOSH) are inserted in the stream, as required by RFC 3016.

Conversely, receivers that comply with the specification in this document should be able to decode payloads, names and parameters defined for MPEG-4 video in RFC 3016. In this respect it is strongly RECOMMENDED to implement the ability to ignore "in band" video decoder configuration packets in the RFC 3016 payload.

Note that the "out of band" availability of the video decoder configuration is optional in RFC 3016. To achieve maximum interoperability with the RTP payload format defined in this document, applications that use RFC 3016 to transport MPEG-4 video (part 2) are recommended to make the video decoder configuration available as a MIME parameter.

3. Payload Format

3.1 Usage of RTP Header Fields and RTCP

Payload Type (PT): The assignment of an RTP payload type for this packet format is outside the scope of this document; it is specified by the RTP profile under which this payload format is used.

Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet payload contains either the final fragment of a fragmented Access Unit or one or more complete Access Units.

Extension (X) bit: Defined by the RTP profile used.

Sequence Number: The RTP sequence number SHOULD be generated by the sender in the usual manner with a constant random offset.

Timestamp: Indicates the sampling instant of the first AU contained in the RTP payload.
This sampling instant is equivalent to the CTS in the MPEG-4 time domain. When using SDP, the clock rate of the RTP time stamp MUST be expressed using the "rtpmap" attribute. If an MPEG-4 audio stream is transported, the rate SHOULD be set to the same value as the sampling rate of the audio stream. If an MPEG-4 video stream is transported, it is RECOMMENDED to set the rate to 90 kHz.

In all cases, the sender SHALL make sure that RTP time stamps are identical only if the RTP time stamp refers to fragments of the same Access Unit.

According to RFC 1889 [2] (section 5.1), RTP time stamps are RECOMMENDED to start at a random value for security reasons. This is not an issue for synchronization of multiple RTP streams. When, however, streams from multiple sources are to be synchronized (for example one stream from local storage, another from an RTP streaming server), synchronization may become impossible if the receiver only knows the original time stamp relationships. Synchronization in such cases may require that the correct relationship between the time stamps be provided by out-of-band means. The format of such information as well as methods to convey such information are beyond the scope of this specification.
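The clock-rate rules for the RTP time stamp given above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not normative text: the function name rtp_timestamp is hypothetical, the audio case assumes an AAC stream with 1024 samples per frame, and the video case assumes a constant 25 frames per second on the recommended 90 kHz clock. The sketch assumes a constant frame period; fragments of one Access Unit would all reuse the value computed for that AU.

```python
import random

RTP_TS_MODULUS = 1 << 32  # the RTP time stamp is a 32-bit field and wraps


def rtp_timestamp(au_index: int, ticks_per_au: int, offset: int) -> int:
    """Time stamp of the AU with the given index, for a constant frame period.

    ticks_per_au is the frame period expressed in RTP clock ticks;
    offset is the random initial value chosen by the sender.
    """
    return (offset + au_index * ticks_per_au) % RTP_TS_MODULUS


# Audio: the clock rate equals the sampling rate, so the frame period in
# ticks equals the number of samples per frame (assumed: 1024 for AAC).
AUDIO_TICKS_PER_AU = 1024

# Video: 90 kHz clock, assumed constant 25 frames per second.
VIDEO_TICKS_PER_AU = 90000 // 25

# Random starting value, as recommended for RTP time stamps.
offset = random.getrandbits(32)

for n in range(3):
    print("audio AU", n, rtp_timestamp(n, AUDIO_TICKS_PER_AU, offset))
    print("video AU", n, rtp_timestamp(n, VIDEO_TICKS_PER_AU, offset))
```

Because each AU index maps to a distinct value within the wrap-around period, two packets carry the same time stamp only when they hold fragments of the same Access Unit, as required above.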