RFC 3984 RTP Payload Format for H.264 Video February 2005
This memo introduces new NAL unit types, which are presented in
section 5.2. The NAL unit types defined in this memo are marked as
unspecified in [1]. Moreover, this specification extends the
semantics of F and NRI as described in section 5.3.
2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [3].
This specification uses the notion of setting and clearing a bit when
bit fields are handled. Setting a bit is the same as assigning that
bit the value of 1 (On). Clearing a bit is the same as assigning
that bit the value of 0 (Off).
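As an illustration of this convention, the following is a minimal Python sketch of setting and clearing a bit in an integer field. The helper names are hypothetical, and bit index 0 here means the least significant bit, which is the opposite of the left-to-right bit numbering used in the figures of this memo.

```python
def set_bit(value: int, bit: int) -> int:
    """Return value with the given bit assigned 1 (bit 0 = least significant)."""
    return value | (1 << bit)


def clear_bit(value: int, bit: int) -> int:
    """Return value with the given bit assigned 0 (bit 0 = least significant)."""
    return value & ~(1 << bit)


def is_set(value: int, bit: int) -> bool:
    """Report whether the given bit currently has the value 1."""
    return bool((value >> bit) & 1)


# Example: the second octet of the RTP header holds M (most significant
# bit) followed by the 7-bit PT. 0x60 is PT = 96 with M cleared.
octet = 0x60
octet_with_m = set_bit(octet, 7)
```

Clearing the same bit again with clear_bit(octet_with_m, 7) restores the original octet.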
3. Scope
This payload specification can only be used to carry the "naked"
H.264 NAL unit stream over RTP, and not the bitstream format
discussed in Annex B of H.264. The first applications of this
specification will likely be in the conversational multimedia field
(video telephony or video conferencing), but the payload format also
covers other applications, such as Internet streaming and TV over IP.
4. Definitions and Abbreviations
4.1. Definitions
This document uses the definitions of [1]. The following terms,
defined in [1], are summed up for convenience:
access unit: A set of NAL units always containing a primary coded
picture. In addition to the primary coded picture, an access unit
may also contain one or more redundant coded pictures or other NAL
units not containing slices or slice data partitions of a coded
picture. The decoding of an access unit always results in a
decoded picture.
coded video sequence: A sequence of access units that consists, in
decoding order, of an instantaneous decoding refresh (IDR) access
unit followed by zero or more non-IDR access units including all
subsequent access units up to but not including any subsequent IDR
access unit.
IDR access unit: An access unit in which the primary coded picture
is an IDR picture.
Wenger, et al. Standards Track [Page 6]
IDR picture: A coded picture containing only slices with I or SI
slice types that causes a "reset" in the decoding process. After
the decoding of an IDR picture, all following coded pictures in
decoding order can be decoded without inter prediction from any
picture decoded prior to the IDR picture.
primary coded picture: The coded representation of a picture to be
used by the decoding process for a bitstream conforming to H.264.
The primary coded picture contains all macroblocks of the picture.
redundant coded picture: A coded representation of a picture or a
part of a picture. The content of a redundant coded picture shall
not be used by the decoding process for a bitstream conforming to
H.264. The content of a redundant coded picture may be used by
the decoding process for a bitstream that contains errors or
losses.
VCL NAL unit: A collective term used to refer to coded slice and
coded data partition NAL units.
In addition, the following definitions apply:
decoding order number (DON): A field in the payload structure, or
a derived variable indicating NAL unit decoding order. Values of
DON are in the range of 0 to 65535, inclusive. After reaching the
maximum value, the value of DON wraps around to 0.
NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in section 7.4.1.2 in [1].
transmission order: The order of packets in ascending RTP sequence
number order (in modulo arithmetic). Within an aggregation
packet, the NAL unit transmission order is the same as the order
of appearance of NAL units in the packet.
media aware network element (MANE): A network element, such as a
middlebox or application layer gateway, that is capable of parsing
certain aspects of the RTP payload headers or the RTP payload and
reacting to the contents.
Informative note: The concept of a MANE goes beyond normal
routers or gateways in that a MANE has to be aware of the
signaling (e.g., to learn about the payload type mappings of
the media streams), and in that it has to be trusted when
working with SRTP. The advantage of using MANEs is that they
allow packets to be dropped according to the needs of the media
coding. For example, if a MANE has to drop packets due to
congestion on a certain link, it can identify those packets
whose dropping has the smallest negative impact on the user
experience and remove them in order to remove the congestion
and/or keep the delay low.
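The wrap-around behavior in the DON and transmission order definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the payload format: the function names are hypothetical, and treating a forward distance of less than half the 16-bit number space as "earlier" is a common convention that this section does not itself mandate.

```python
DON_MODULUS = 65536  # DON values range from 0 to 65535, inclusive
SEQ_MODULUS = 65536  # RTP sequence numbers are also 16 bits wide


def next_don(don: int) -> int:
    """Return the DON that follows don, wrapping from 65535 back to 0."""
    return (don + 1) % DON_MODULUS


def seq_precedes(a: int, b: int) -> bool:
    """Return True if sequence number a comes before b in modulo arithmetic.

    Assumption: the two numbers are less than half the number space apart.
    """
    return a != b and (b - a) % SEQ_MODULUS < SEQ_MODULUS // 2
```

For example, seq_precedes(65535, 0) holds because sequence number 0 directly follows 65535 after the wrap.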
4.2. Abbreviations
DON: Decoding Order Number
DONB: Decoding Order Number Base
DOND: Decoding Order Number Difference
FEC: Forward Error Correction
FU: Fragmentation Unit
IDR: Instantaneous Decoding Refresh
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
ITU-T: International Telecommunication Union,
Telecommunication Standardization Sector
MANE: Media Aware Network Element
MTAP: Multi-Time Aggregation Packet
MTAP16: MTAP with 16-bit timestamp offset
MTAP24: MTAP with 24-bit timestamp offset
NAL: Network Abstraction Layer
NALU: NAL Unit
SEI: Supplemental Enhancement Information
STAP: Single-Time Aggregation Packet
STAP-A: STAP type A
STAP-B: STAP type B
TS: Timestamp
VCL: Video Coding Layer
5. RTP Payload Format
5.1. RTP Header Usage
The format of the RTP header is specified in RFC 3550 [4] and
reprinted in Figure 1 for convenience. This payload format uses the
fields of the header in a manner consistent with that specification.
When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
payload format is specified in section 5.6. The RTP payload (and the
settings for some RTP header bits) for aggregation packets and
fragmentation units are specified in sections 5.7 and 5.8,
respectively.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1. RTP header according to RFC 3550
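The layout in Figure 1 can be read with a few lines of Python. The following is a minimal sketch that assumes the packet is available as a bytes object; the RtpHeader type and the parse_rtp_header name are hypothetical, and header extensions (X bit) and padding (P bit) are reported but not processed.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed 12-octet RTP header and the CSRC list of Figure 1."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two octets, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet shorter than its CSRC list")
    csrcs = struct.unpack_from(f"!{cc}I", packet, 12) if cc else ()
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

The payload defined by this memo starts at octet offset 12 + 4 * csrc_count when no header extension is present.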
The RTP header fields are set according to this RTP payload format
as follows:
Marker bit (M): 1 bit
Set for the very last packet of the access unit indicated by the
RTP timestamp, in line with the normal use of the M bit in video
formats, to allow efficient playout buffer handling. For
aggregation packets (STAP and MTAP), the marker bit in the RTP
header MUST be set to the value that the marker bit of the last
NAL unit of the aggregation packet would have been if it were
transported in its own RTP packet. Decoders MAY use this bit as
an early indication of the last packet of an access unit, but MUST
NOT rely on this property.
Informative note: Only one M bit is associated with an
aggregation packet carrying multiple NAL units. Thus, if a
gateway has re-packetized an aggregation packet into several
packets, it cannot reliably set the M bit of those packets.
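The marker bit rules above can be sketched from the sender's side as follows. This is a simplified illustration, not a packetizer: both function names are hypothetical, and the first function assumes that all packets of one access unit are listed in transmission order.

```python
from typing import List


def marker_bits_for_access_unit(packet_count: int) -> List[bool]:
    """Return the M bit for each packet of one access unit.

    Only the very last packet of the access unit has the bit set.
    """
    if packet_count < 1:
        raise ValueError("an access unit needs at least one packet")
    return [False] * (packet_count - 1) + [True]


def aggregation_marker(nal_markers: List[bool]) -> bool:
    """Return the M bit for an aggregation packet (STAP or MTAP).

    nal_markers holds the M bit each aggregated NAL unit would have had
    if transported in its own RTP packet; the last one decides.
    """
    if not nal_markers:
        raise ValueError("an aggregation packet carries at least one NAL unit")
    return nal_markers[-1]
```

Because only the last value survives aggregation, the per-NAL-unit values cannot be recovered from the aggregation packet, which is the limitation described in the informative note.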
Payload type (PT): 7 bits
The assignment of an RTP payload type for this new packet format
is outside the scope of this document and will not be specified
here. The assignment of a payload type has to be performed either
through the profile used or in a dynamic way.
Sequence number (SN): 16 bits
Set and used in accordance with RFC 3550. For the single NALU and
non-interleaved packetization mode, the sequence number is used to
determine decoding order for the NALU.
Timestamp: 32 bits
The RTP timestamp is set to the sampling timestamp of the content.
A 90 kHz clock rate MUST be used.
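As a hedged illustration of the 90 kHz clock, the following Python sketch converts a sampling instant in seconds into a 32-bit RTP timestamp. The function name and the initial_offset parameter are assumptions for this example; the offset models the random initial timestamp value described in RFC 3550.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000       # clock rate required by this payload format
TIMESTAMP_MODULUS = 2**32  # the RTP timestamp field is 32 bits wide


def rtp_timestamp(sample_time_s, initial_offset: int = 0) -> int:
    """Convert a sampling instant in seconds to a 32-bit RTP timestamp.

    sample_time_s may be an int, a float, or a Fraction; Fraction avoids
    rounding drift for frame rates such as 30000/1001.
    """
    ticks = round(Fraction(sample_time_s) * RTP_CLOCK_HZ)
    return (initial_offset + ticks) % TIMESTAMP_MODULUS
```

For instance, at 30 frames per second consecutive pictures are 3000 ticks apart, since rtp_timestamp(Fraction(1, 30)) evaluates to 3000.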
If the NAL unit has no timing properties of its own (e.g.,
parameter set and SEI NAL units), the RTP timestamp is set to the
RTP timestamp of the primary coded picture of the access unit in
which the NAL unit is included, according to section 7.4.1.2 of
[1].
The setting of the RTP Timestamp for MTAPs is defined in section
5.7.2.
Receivers SHOULD ignore any picture timing SEI messages included
in access units that have only one display timestamp. Instead,
receivers SHOULD use the RTP timestamp for synchronizing the
display process.
RTP senders SHOULD NOT transmit picture timing SEI messages for
pictures that are not supposed to be displayed as multiple fields.
If one access unit has more than one display timestamp carried in
a picture timing SEI message, then the information in the SEI
message SHOULD be treated as relative to the RTP timestamp, with
the earliest event occurring at the time given by the RTP
timestamp, and subsequent events later, as given by the difference
in SEI message picture timing values. Let tSEI1, tSEI2, ...,
tSEIn be the display timestamps carried in the SEI message of an
access unit, where tSEI1 is the earliest of all such timestamps.
Let tmadjst() be a function that adjusts the SEI message time
scale to a 90-kHz time scale. Let TS be the RTP timestamp. Then,
the display time for the event associated with tSEI1 is TS. The
display time for the event associated with tSEIx, where x is in
the range [2..n], is TS + tmadjst(tSEIx - tSEI1).
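The display time rule above can be worked through in a short Python sketch. The names follow the text, but the signatures are assumptions: tmadjst() is given the SEI time scale (in ticks per second) as an explicit second argument, and results are reduced modulo 2^32 to stay within the RTP timestamp range.

```python
from fractions import Fraction
from typing import List

RTP_CLOCK_HZ = 90000


def tmadjst(sei_delta: int, sei_time_scale: int) -> int:
    """Rescale a difference in SEI time-scale ticks to 90 kHz ticks."""
    return round(Fraction(sei_delta * RTP_CLOCK_HZ, sei_time_scale))


def display_times(ts: int, sei_times: List[int], sei_time_scale: int) -> List[int]:
    """Return one display time per SEI timestamp, in 90 kHz units.

    The earliest SEI timestamp (tSEI1) maps to TS; every other tSEIx
    maps to TS + tmadjst(tSEIx - tSEI1).
    """
    t_sei1 = min(sei_times)
    return [
        (ts + tmadjst(t - t_sei1, sei_time_scale)) % 2**32
        for t in sei_times
    ]
```

With TS = 180000 and three SEI timestamps 0, 1000, and 2000 on a hypothetical 60000-tick time scale, the display times are 180000, 181500, and 183000.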
Informative note: Displaying coded frames as fields is commonly
needed in an operation known as 3:2 pulldown, in which film
content that consists of coded frames is displayed on a display
using interlaced scanning. The picture timing SEI message
enables carriage of multiple timestamps for the same coded
picture, and therefore the 3:2 pulldown process is perfectly
controlled. The picture timing SEI message mechanism is
necessary because only one timestamp per coded frame can be
conveyed in the RTP timestamp.
Informative note: Because H.264 allows the decoding order to be
different from the display order, values of RTP timestamps may
not be monotonically non-decreasing as a function of RTP
sequence numbers. Furthermore, the value for interarrival
jitter reported in the RTCP reports may not be a trustworthy
indication of the network performance, as the calculation rules