unless specified by an out-of-band means (e.g., SDP parameter or MIME
parameter as defined in section 5).
Other header fields are used as described in RFC 1889 [8].
3.2 Fragmentation of MPEG-4 Visual bitstream
A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP
payload without any addition of extra header fields or any removal of
Visual syntax elements. The Combined Configuration/Elementary
streams mode is used. The following rules apply for the
fragmentation.
In the following, header means one of the following:
- Configuration information (Visual Object Sequence Header, Visual
Object Header and Video Object Layer Header)
- visual_object_sequence_end_code
- The header of the entry point function for an elementary stream
(Group_of_VideoObjectPlane() or the header of VideoObjectPlane(),
video_plane_with_short_header(), MeshObject() or FaceObject())
- The video packet header (video_packet_header() excluding
next_resync_marker())
- The header of gob_layer()
See 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the
definition of the configuration information and the entry point
functions.
(1) Configuration information and Group_of_VideoObjectPlane() fields
SHALL be placed at the beginning of the RTP payload (just after the
RTP header) or just after the header of the syntactically upper layer
function.
(2) If one or more headers exist in the RTP payload, the RTP payload
SHALL begin with the header of the syntactically highest function.
Note: The visual_object_sequence_end_code is regarded as the lowest
function.
Kikuchi, et al. Standards Track [Page 6]
RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000
(3) A header SHALL NOT be split into a plurality of RTP packets.
(4) Different VOPs SHOULD be fragmented into different RTP packets so
that one RTP packet consists of the data bytes associated with a
unique VOP time instance (that is indicated in the timestamp field in
the RTP packet header), with the exception that multiple consecutive
VOPs MAY be carried within one RTP packet in the decoding order if
the size of the VOPs is small.
Note: When multiple VOPs are carried in one RTP payload, the
timestamp of the VOPs after the first one may be calculated by the
decoder. This operation is necessary only for RTP packets in which
the marker bit is equal to one and the beginning of the RTP payload
corresponds to a start code. (See timestamp and marker bit in section
3.1.)
(5) It is RECOMMENDED that a single video packet is sent as a single
RTP packet. The size of a video packet SHOULD be adjusted in such a
way that the resulting RTP packet is not larger than the path-MTU.
Note: Rule (5) does not apply when the video packet is disabled by
the coder configuration (by setting resync_marker_disable in the VOL
header to 1), or in coding tools where the video packet is not
supported. In this case, a VOP MAY be split at arbitrary byte-
positions.
The video packet starts with the VOP header or the video packet
header, followed by motion_shape_texture(), and ends with
next_resync_marker() or next_start_code().
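Rule (5) can be sketched in Python as follows. This is a minimal illustration, not part of the payload format: the function name packetize_vop, the 28-byte lower-layer overhead (IPv4 plus UDP) and the 12-byte RTP header without CSRC entries are assumptions made for the example, and the marker-bit handling follows the description of the marker bit in section 3.1.

```python
from typing import List, Tuple

RTP_HEADER_LEN = 12  # assumed: fixed RTP header, no CSRC entries or extension


def packetize_vop(video_packets: List[bytes], path_mtu: int,
                  lower_layer_overhead: int = 28) -> List[Tuple[bytes, bool]]:
    """Map each video packet of one VOP onto its own RTP payload.

    Returns (payload, marker) pairs; marker is True only for the last
    payload of the VOP. lower_layer_overhead is a hypothetical default
    (20-byte IPv4 header plus 8-byte UDP header).
    """
    max_payload = path_mtu - lower_layer_overhead - RTP_HEADER_LEN
    if max_payload <= 0:
        raise ValueError("path MTU too small for any payload")
    result = []
    for index, vp in enumerate(video_packets):
        if len(vp) > max_payload:
            # Rule (5) expects the encoder to size video packets to fit.
            raise ValueError(f"video packet {index} exceeds {max_payload} bytes")
        is_last = index == len(video_packets) - 1
        result.append((vp, is_last))
    return result
```

An oversized video packet is reported as an error rather than split, because the sketch assumes the encoder adjusts the video packet size to the path-MTU as rule (5) suggests.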
3.3 Examples of packetized MPEG-4 Visual bitstream
Figure 2 shows examples of RTP packets generated based on the
criteria described in 3.2.
(a) is an example of the first RTP packet or the random access point
of an MPEG-4 Visual bitstream containing the configuration
information. According to criterion (1), the Visual Object Sequence
Header (VS header) is placed at the beginning of the RTP payload,
preceding the Visual Object Header and the Video Object Layer
Header (VO header, VOL header). Since the fragmentation rule defined
in 3.2 guarantees that the configuration information, starting with
visual_object_sequence_start_code, is always placed at the beginning
of the RTP payload, RTP receivers can detect the random access point
by checking if the first 32-bit field of the RTP payload is
visual_object_sequence_start_code.
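The random access point check described in (a) could look like the following Python sketch. The function name is hypothetical, and the start code value 0x000001B0 is taken from the start code table of ISO/IEC 14496-2 rather than from this document.

```python
# Value of visual_object_sequence_start_code per ISO/IEC 14496-2
# (an assumption of this sketch; it is not stated in this document).
VISUAL_OBJECT_SEQUENCE_START_CODE = bytes.fromhex("000001b0")


def is_random_access_point(rtp_payload: bytes) -> bool:
    """True if the first 32 bits of the payload are the VS start code."""
    return rtp_payload[:4] == VISUAL_OBJECT_SEQUENCE_START_CODE
```

A payload shorter than four bytes simply fails the comparison, so no separate length check is needed.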
(b) is another example of the RTP packet containing the configuration
information. It differs from example (a) in that the RTP packet also
contains a video packet in the VOP following the configuration
information. Since the length of the configuration information is
relatively short (typically a few tens of bytes) and an RTP packet
containing only the configuration information may thus increase the
overhead, the configuration information and the immediately following
GOV and/or (a part of) VOP can be packetized into a single RTP packet
as in this example.
(c) is an example of an RTP packet that contains
Group_of_VideoObjectPlane() (GOV). Following criterion (1), the GOV is
placed at the beginning of the RTP payload. It would be a waste of
RTP/IP header overhead to generate an RTP packet containing only a
GOV whose length is 7 bytes. Therefore, (a part of) the following
VOP can be placed in the same RTP packet as shown in (c).
(d) is an example of the case where one video packet is packetized
into one RTP packet. When the packet-loss rate of the underlying
network is high, this kind of packetization is recommended. Even
when the RTP packet containing the VOP header is discarded by a
packet loss, the other RTP packets can be decoded by using the
HEC (Header Extension Code) information in the video packet header.
No extra RTP header field is necessary.
(e) is an example of the case where more than one video packet is
packetized into one RTP packet. This kind of packetization is
effective in reducing the overhead of RTP/IP headers when the bit-rate of
the underlying network is low. However, it will decrease the
packet-loss resiliency because multiple video packets are discarded
by a single RTP packet loss. The optimal number of video packets in
an RTP packet and the length of the RTP packet can be determined
considering the packet-loss rate and the bit-rate of the underlying
network.
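The overhead side of this trade-off can be made concrete with a small Python calculation. It is only a sketch: the 40-byte per-packet header cost (20-byte IPv4, 8-byte UDP and 12-byte RTP headers) and the function name are assumptions, and all video packets are taken to have the same size.

```python
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + fixed RTP header sizes


def header_overhead_ratio(video_packet_bytes: int, packets_per_rtp: int) -> float:
    """Fraction of the transmitted bytes that is IP/UDP/RTP header."""
    payload_bytes = video_packet_bytes * packets_per_rtp
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


if __name__ == "__main__":
    # More video packets per RTP packet means less header overhead, but a
    # single RTP packet loss then discards that many video packets.
    for n in (1, 2, 4):
        print(n, header_overhead_ratio(100, n))
```

Under the same assumptions, an RTP packet that carries only the 7-byte GOV of example (c) would spend roughly 85 percent of its bytes on headers.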
(f) is an example of the case when the video packet is disabled by
setting resync_marker_disable in the VOL header to 1. In this case,
a VOP may be split into a plurality of RTP packets at arbitrary
byte-positions. For example, it is possible to split a VOP into
fixed-length packets. This kind of coder configuration and RTP
packet fragmentation may be used when the underlying network is
guaranteed to be error-free. On the other hand, it is not
recommended in an error-prone environment since it provides
only poor packet-loss resiliency.
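The fixed-length splitting mentioned in (f) might be sketched in Python as below. The function and its vop_header_len parameter are hypothetical; the parameter exists only so that the first fragment is never shorter than the VOP header, in keeping with rule (3) of section 3.2.

```python
from typing import List


def split_vop(vop: bytes, fragment_size: int, vop_header_len: int) -> List[bytes]:
    """Split one VOP into fragments of at most fragment_size bytes.

    The first fragment is enlarged if necessary so that the VOP header
    (the first vop_header_len bytes) is not split across fragments.
    """
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    if not 0 <= vop_header_len <= len(vop):
        raise ValueError("vop_header_len out of range")
    if not vop:
        return []
    first_len = max(fragment_size, vop_header_len)
    fragments = [vop[:first_len]]
    for start in range(first_len, len(vop), fragment_size):
        fragments.append(vop[start:start + fragment_size])
    return fragments
```

Each fragment would then be sent as the payload of one RTP packet, with fragment_size chosen so that the resulting packet fits the path-MTU.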
Figure 3 shows examples of RTP packets prohibited by the criteria of
3.2.
Fragmentation of a header into multiple RTP packets, as in (a), will
not only increase the overhead of RTP/IP headers but also decrease
the error resiliency. Therefore, it is prohibited by criterion
(3).
When concatenating more than one video packet into an RTP packet, a
VOP header or video_packet_header() shall not be placed in the middle
of the RTP payload. The packetization as in (b) is not allowed by
criterion (2) for reasons of error resiliency. Comparing
this example with Figure 2(d), although two video packets are mapped
onto two RTP packets in both cases, the packet-loss resiliency is not
identical. Namely, if the second RTP packet is lost, both video
packets 1 and 2 are lost in the case of Figure 3(b) whereas only
video packet 2 is lost in the case of Figure 2(d).
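The comparison between Figure 2(d) and Figure 3(b) can be expressed as a small Python model. This is an illustrative sketch with hypothetical names: each RTP packet is represented only by the set of video packet numbers whose bytes it carries, and a video packet counts as lost if any RTP packet carrying part of it is lost.

```python
from typing import List, Set


def lost_video_packets(rtp_packets: List[Set[int]], lost_index: int) -> Set[int]:
    """Video packets that lose data when the RTP packet at lost_index is dropped.

    Each entry of rtp_packets is the set of video packet numbers whose
    bytes appear in that RTP packet.
    """
    return set(rtp_packets[lost_index])


figure_2d = [{1}, {2}]      # one video packet per RTP packet
figure_3b = [{1}, {1, 2}]   # video packet 1 straddles both RTP packets

# Losing the second RTP packet (index 1) costs one video packet in the
# Figure 2(d) layout but both video packets in the Figure 3(b) layout.
loss_2d = lost_video_packets(figure_2d, 1)
loss_3b = lost_video_packets(figure_3b, 1)
```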
    +------+------+------+------+
(a) | RTP  |  VS  |  VO  | VOL  |
    |header|header|header|header|
    +------+------+------+------+

    +------+------+------+------+------------+
(b) | RTP  |  VS  |  VO  | VOL  |Video Packet|
    |header|header|header|header|            |
    +------+------+------+------+------------+

    +------+-----+------------------+
(c) | RTP  | GOV |Video Object Plane|
    |header|     |                  |
    +------+-----+------------------+

    +------+------+------------+  +------+------+------------+
(d) | RTP  | VOP  |Video Packet|  | RTP  |  VP  |Video Packet|
    |header|header|    (1)     |  |header|header|    (2)     |
    +------+------+------------+  +------+------+------------+

    +------+------+------------+------+------------+------+------------+
(e) | RTP  |  VP  |Video Packet|  VP  |Video Packet|  VP  |Video Packet|
    |header|header|    (1)     |header|    (2)     |header|    (3)     |
    +------+------+------------+------+------------+------+------------+

    +------+------+------------+  +------+------------+
(f) | RTP  | VOP  |VOP fragment|  | RTP  |VOP fragment|
    |header|header|    (1)     |  |header|    (2)     |
    +------+------+------------+  +------+------------+

Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream
    +------+-------------+  +------+------------+------------+
(a) | RTP  |First half of|  | RTP  |Last half of|Video Packet|
    |header|  VP header  |  |header| VP header  |            |
    +------+-------------+  +------+------------+------------+

    +------+------+----------+  +------+---------+------+------------+
(b) | RTP  | VOP  |First half|  | RTP  |Last half|  VP  |Video Packet|
    |header|header| of VP(1) |  |header| of VP(1)|header|    (2)     |
    +------+------+----------+  +------+---------+------+------------+

Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual
bitstream
4. RTP Packetization of MPEG-4 Audio bitstream
This section specifies RTP packetization rules for MPEG-4 Audio
bitstreams. MPEG-4 Audio streams MUST be formatted by the LATM (Low-
overhead MPEG-4 Audio Transport Multiplex) tool [5], and the LATM-
based streams are then mapped onto RTP packets as described in the three
sections below.
4.1 RTP Packet Format
LATM-based streams consist of a sequence of audioMuxElements that
include one or more audio frames. A complete audioMuxElement or a
part of one SHALL be mapped directly onto an RTP payload without any
removal of audioMuxElement syntax elements (see Figure 4). The first
byte of each audioMuxElement SHALL be located at the first payload
location in an RTP packet.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |RTP
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                                                               |RTP
:                 audioMuxElement (byte aligned)                :Payload
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4 - An RTP packet for MPEG-4 Audio
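On the receiving side, the layout in Figure 4 could be unpacked along the lines of the following Python sketch, which returns the audioMuxElement bytes as the payload. It is an illustration only: the names RtpPacket and parse_rtp are hypothetical, and the handling of the CSRC list, header extension and padding follows the generic RTP header rules of RFC 1889 rather than anything specific to this payload format.

```python
import struct
from typing import NamedTuple, Tuple


class RtpPacket(NamedTuple):
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]
    payload: bytes  # audioMuxElement bytes (or a part of one)


def parse_rtp(packet: bytes) -> RtpPacket:
    """Split an RTP packet into header fields and payload."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("unsupported RTP version")
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrc = struct.unpack(f"!{csrc_count}I", packet[12:offset])
    if b0 & 0x10:  # X bit: skip the header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:  # P bit: last octet holds the padding length
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding length")
        end -= pad
    if offset > end:
        raise ValueError("header longer than packet")
    return RtpPacket(bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc,
                     csrc, packet[offset:end])
```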
In order to decode the audioMuxElement, the following
muxConfigPresent information is required to be indicated by an out-
of-band means. When SDP is utilized for this indication, MIME
parameter "cpresent" corresponds to the muxConfigPresent information