The audio conferencing application used by each conference
participant sends audio data in small chunks of, say, 20 ms duration.
Each chunk of audio data is preceded by an RTP header; RTP header and
data are in turn contained in a UDP packet. The RTP header indicates
what type of audio encoding (such as PCM, ADPCM or LPC) is contained
in each packet so that senders can change the encoding during a
conference, for example, to accommodate a new participant that is
connected through a low-bandwidth link or react to indications of
network congestion.
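
The encapsulation described above can be sketched as follows. This is a
minimal illustration rather than a complete implementation: the function
names are hypothetical, the header layout is the 12-octet fixed header
with no CSRC list, and payload type 0 is assumed to denote 8 kHz mu-law
PCM as assigned in the companion audio/video profile.

```python
import struct

RTP_VERSION = 2  # protocol version carried in the two high bits of octet 0


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prefix one chunk of media with a 12-octet fixed RTP header (no CSRC list)."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",  # "!" selects network byte order without padding
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def payload_type_of(packet):
    """Read the 7-bit payload type so a receiver can pick the right decoder."""
    return packet[1] & 0x7F


# 20 ms of 8 kHz, one-octet-per-sample audio is 160 octets (assumed encoding).
chunk = bytes(160)
packet = build_rtp_packet(payload_type=0, seq=1, timestamp=160,
                          ssrc=0x12345678, payload=chunk)
```

The resulting bytes would then be handed to a UDP socket. Because the
payload type travels in every packet, a sender can switch encodings
between packets without any out-of-band signalling.
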
The Internet, like other packet networks, occasionally loses and
reorders packets and delays them by variable amounts of time. To cope
with these impairments, the RTP header contains timing information
and a sequence number that allow the receivers to reconstruct the
timing produced by the source, so that in this example, chunks of
audio are contiguously played out the speaker every 20 ms. This
timing reconstruction is performed separately for each source of RTP
packets in the conference. The sequence number can also be used by
the receiver to estimate how many packets are being lost.
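
As an illustration of the last point, the sketch below estimates loss
for a single source from the sequence numbers it has received. It is a
simplified, hypothetical helper: it assumes the sequence number is a
16-bit counter that wraps around, counts duplicates as received packets,
and does not guard against a source that restarts.

```python
def estimate_loss(seq_numbers):
    """Return (expected, received, lost) for one source's sequence numbers.

    seq_numbers lists the 16-bit sequence numbers in arrival order.
    """
    if not seq_numbers:
        return (0, 0, 0)
    base = seq_numbers[0]
    max_seq = base      # highest sequence number seen, modulo 65536
    cycles = 0          # 65536 times the number of wraparounds seen
    received = 0
    for seq in seq_numbers:
        received += 1
        delta = (seq - max_seq) & 0xFFFF
        if delta < 0x8000:
            # The packet is at or ahead of the highest seen so far.
            if seq < max_seq:
                cycles += 65536  # the counter wrapped around
            max_seq = seq
        # Otherwise the packet is a late (reordered) arrival; it is still
        # counted as received but does not move the highest sequence number.
    expected = cycles + max_seq - base + 1
    return (expected, received, expected - received)


# Sequence number 2 never arrives; 0 arrives after 1 (reordered).
stats = estimate_loss([65534, 65535, 1, 0, 3])
```

Here `stats` is `(6, 5, 1)`: six packets were expected between 65534
and 3 across the wraparound, five arrived, and one is presumed lost.
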
Since members of the working group join and leave during the
conference, it is useful to know who is participating at any moment
and how well they are receiving the audio data. For that purpose,
Schulzrinne, et al Standards Track [Page 5]
RFC 1889 RTP January 1996
each instance of the audio application in the conference periodically
multicasts a reception report plus the name of its user on the RTCP
(control) port. The reception report indicates how well the current
speaker is being received and may be used to control adaptive
encodings. In addition to the user name, other identifying
information may also be included subject to control bandwidth limits.
A site sends the RTCP BYE packet (Section 6.5) when it leaves the
conference.

2.2 Audio and Video Conference

If both audio and video media are used in a conference, they are
transmitted as separate RTP sessions; that is, separate RTP and RTCP packets are transmitted for
each medium using two different UDP port pairs and/or multicast
addresses. There is no direct coupling at the RTP level between the
audio and video sessions, except that a user participating in both
sessions should use the same distinguished (canonical) name in the
RTCP packets for both so that the sessions can be associated.
One motivation for this separation is to allow some participants in
the conference to receive only one medium if they choose. Further
explanation is given in Section 5.2. Despite the separation,
synchronized playback of a source's audio and video can be achieved
using timing information carried in the RTCP packets for both
sessions.
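
One way to picture the synchronization step: RTCP sender reports pair a
wallclock time with the RTP timestamp that corresponds to it, so a
receiver can place timestamps from both sessions on a common time line.
The sketch below assumes that pairing; the function name, the use of
seconds as a float, and the clock rates (8000 Hz audio, 90000 Hz video)
are illustrative choices.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_wallclock, clock_rate):
    """Map an RTP timestamp to wallclock seconds using one sender report.

    sr_rtp_ts and sr_wallclock are the timestamp pair taken from the most
    recent sender report for that session (assumed already extracted).
    """
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000  # timestamp lies before the report
    return sr_wallclock + delta / clock_rate


# Audio session: report says RTP timestamp 1000 was sampled at t = 100.0 s.
audio_time = rtp_to_wallclock(9000, 1000, 100.0, 8000)
# Video session: report says RTP timestamp 500000 was sampled at t = 100.5 s.
video_time = rtp_to_wallclock(545000, 500000, 100.5, 90000)
# Positive skew would mean the video frame is later than the audio chunk.
skew = video_time - audio_time
```

In this example both media units map to the same wallclock instant, so
they would be presented together.
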

2.3 Mixers and Translators

So far, we have assumed that all sites want to receive media data in
the same format. However, this may not always be appropriate.
Consider the case where participants in one area are connected
through a low-speed link to the majority of the conference
participants who enjoy high-speed network access. Instead of forcing
everyone to use a lower-bandwidth, reduced-quality audio encoding, an
RTP-level relay called a mixer may be placed near the low-bandwidth
area. This mixer resynchronizes incoming audio packets to reconstruct
the constant 20 ms spacing generated by the sender, mixes these
reconstructed audio streams into a single stream, translates the
audio encoding to a lower-bandwidth one and forwards the lower-
bandwidth packet stream across the low-speed link. These packets
might be unicast to a single recipient or multicast on a different
address to multiple recipients. The RTP header includes a means for
mixers to identify the sources that contributed to a mixed packet so
that correct talker indication can be provided at the receivers.
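
The mixing step and the source identification can be sketched as below.
The sketch makes several assumptions that are not part of the text
above: the input chunks have already been resynchronized and decoded to
16-bit linear samples, mixing is a plain sum with clipping, and the
translation to a lower-bandwidth encoding is left out. At most 15
contributing sources are listed because the count field in the RTP
header is four bits wide.

```python
MAX_CSRC = 15  # the 4-bit CSRC count limits the list to 15 entries


def mix_chunks(chunks_by_ssrc):
    """Mix time-aligned chunks of 16-bit linear samples, one per source.

    chunks_by_ssrc maps the SSRC identifier of each contributing source
    to its list of samples. Returns (mixed_samples, csrc_list).
    """
    csrc_list = sorted(chunks_by_ssrc)[:MAX_CSRC]
    length = max((len(c) for c in chunks_by_ssrc.values()), default=0)
    mixed = []
    for i in range(length):
        total = sum(c[i] for c in chunks_by_ssrc.values() if i < len(c))
        # Clip to the 16-bit signed range instead of letting the sum wrap.
        mixed.append(max(-32768, min(32767, total)))
    return mixed, csrc_list


mixed, csrcs = mix_chunks({
    0x0B: [30000, -5],
    0x0A: [10000, 5, 7],
})
```

Here `mixed` is `[32767, 0, 7]` and `csrcs` is `[0x0A, 0x0B]`. The
outgoing packet would carry the mixer's own SSRC identifier together
with this list, which is what lets receivers show who is talking.
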
Some of the intended participants in the audio conference may be
connected with high bandwidth links but might not be directly
reachable via IP multicast. For example, they might be behind an
application-level firewall that will not let any IP packets pass. For
these sites, mixing may not be necessary, in which case another type
of RTP-level relay called a translator may be used. Two translators
are installed, one on either side of the firewall, with the outside
one funneling all multicast packets received through a secure
connection to the translator inside the firewall. The translator
inside the firewall sends them again as multicast packets to a
multicast group restricted to the site's internal network.
Mixers and translators may be designed for a variety of purposes. An
example is a video mixer that scales the images of individual people
in separate video streams and composites them into one video stream
to simulate a group scene. Other examples of translation include the
connection of a group of hosts speaking only IP/UDP to a group of
hosts that understand only ST-II, or the packet-by-packet encoding
translation of video streams from individual sources without
resynchronization or mixing. Details of the operation of mixers and
translators are given in Section 7.

3. Definitions

RTP payload: The data transported by RTP in a packet, for example
audio samples or compressed video data. The payload format and
interpretation are beyond the scope of this document.
RTP packet: A data packet consisting of the fixed RTP header, a
possibly empty list of contributing sources (see below), and the
payload data. Some underlying protocols may require an
encapsulation of the RTP packet to be defined. Typically one
packet of the underlying protocol contains a single RTP packet,
but several RTP packets may be contained if permitted by the
encapsulation method (see Section 10).
RTCP packet: A control packet consisting of a fixed header part
similar to that of RTP data packets, followed by structured
elements that vary depending upon the RTCP packet type. The
formats are defined in Section 6. Typically, multiple RTCP
packets are sent together as a compound RTCP packet in a single
packet of the underlying protocol; this is enabled by the length
field in the fixed header of each RTCP packet.
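
The role of the length field can be illustrated with a short sketch that
splits a compound RTCP packet into its parts. It assumes the common RTCP
header layout (version in the top two bits of the first octet, packet
type in the second octet, and a 16-bit length giving the packet size in
32-bit words minus one). The function name is hypothetical, and the
sample datagram is deliberately minimal rather than a fully valid
compound packet.

```python
import struct


def split_compound_rtcp(datagram):
    """Split a compound RTCP packet into (packet_type, packet_bytes) pairs."""
    packets = []
    offset = 0
    while offset < len(datagram):
        if offset + 4 > len(datagram):
            raise ValueError("truncated RTCP header at offset %d" % offset)
        first, packet_type, length_words = struct.unpack_from(
            "!BBH", datagram, offset)
        size = (length_words + 1) * 4  # length counts 32-bit words minus one
        if first >> 6 != 2 or offset + size > len(datagram):
            raise ValueError("malformed RTCP packet at offset %d" % offset)
        packets.append((packet_type, datagram[offset:offset + size]))
        offset += size
    return packets


# An empty receiver report (type 201) followed by a BYE (type 203),
# each consisting of the 4-octet header plus one SSRC identifier.
rr = struct.pack("!BBHI", 0x80, 201, 1, 0x11111111)
bye = struct.pack("!BBHI", 0x81, 203, 1, 0x11111111)
parts = split_compound_rtcp(rr + bye)
```

Each element of `parts` can then be dispatched on its packet type; no
separator is needed between the packets because every header states
where the next one begins.
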
Port: The "abstraction that transport protocols use to distinguish
among multiple destinations within a given host computer. TCP/IP
protocols identify ports using small positive integers." [3] The
transport selectors (TSEL) used by the OSI transport layer are
equivalent to ports. RTP depends upon the lower-layer protocol
to provide some mechanism such as ports to multiplex the RTP and
RTCP packets of a session.
Transport address: The combination of a network address and port that
identifies a transport-level endpoint, for example an IP address
and a UDP port. Packets are transmitted from a source transport
address to a destination transport address.
RTP session: The association among a set of participants
communicating with RTP. For each participant, the session is
defined by a particular pair of destination transport addresses
(one network address plus a port pair for RTP and RTCP). The
destination transport address pair may be common for all
participants, as in the case of IP multicast, or may be
different for each, as in the case of individual unicast network
addresses plus a common port pair. In a multimedia session,
each medium is carried in a separate RTP session with its own
RTCP packets. The multiple RTP sessions are distinguished by
different port number pairs and/or different multicast
addresses.
Synchronization source (SSRC): The source of a stream of RTP packets,
identified by a 32-bit numeric SSRC identifier carried in the
RTP header so as not to be dependent upon the network address.
All packets from a synchronization source form part of the same
timing and sequence number space, so a receiver groups packets
by synchronization source for playback. Examples of
synchronization sources include the sender of a stream of
packets derived from a signal source such as a microphone or a
camera, or an RTP mixer (see below). A synchronization source
may change its data format, e.g., audio encoding, over time. The
SSRC identifier is a randomly chosen value meant to be globally
unique within a particular RTP session (see Section 8). A
participant need not use the same SSRC identifier for all the
RTP sessions in a multimedia session; the binding of the SSRC
identifiers is provided through RTCP (see Section 6.4.1). If a
participant generates multiple streams in one RTP session, for
example from separate video cameras, each must be identified as
a different SSRC.
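
A sketch of both ideas follows: picking a random 32-bit identifier and
grouping received packets by synchronization source. The helper names
are hypothetical, the random source used here is an assumption and not
the procedure this document specifies, and the fixed header layout
(sequence number at octet 2, timestamp at octet 4, SSRC at octet 8) is
assumed.

```python
import secrets
import struct
from collections import defaultdict


def choose_ssrc(in_use):
    """Pick a random 32-bit identifier not already seen in this session."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


def group_by_ssrc(packets):
    """Group (sequence number, timestamp) pairs by synchronization source."""
    streams = defaultdict(list)
    for packet in packets:
        seq, timestamp, ssrc = struct.unpack_from("!HII", packet, 2)
        streams[ssrc].append((seq, timestamp))
    return dict(streams)


# Two sources interleaved on the same transport address.
received = [
    struct.pack("!BBHII", 0x80, 0, 7, 1120, 0xAAAA0001),
    struct.pack("!BBHII", 0x80, 0, 90, 4000, 0xBBBB0002),
    struct.pack("!BBHII", 0x80, 0, 8, 1280, 0xAAAA0001),
]
streams = group_by_ssrc(received)
```

Each entry of `streams` has its own sequence number and timestamp space,
so loss estimation and playout scheduling would run once per key.
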
Contributing source (CSRC): A source of a stream of RTP packets that
has contributed to the combined stream produced by an RTP mixer
(see below). The mixer inserts a list of the SSRC identifiers of
the sources that contributed to the generation of a particular
packet into the RTP header of that packet. This list is called
the CSRC list. An example application is audio conferencing
where a mixer indicates all the talkers whose speech was
combined to produce the outgoing packet, allowing the receiver
to indicate the current talker, even though all the audio
packets contain the same SSRC identifier (that of the mixer).
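
On the receiving side, reading the CSRC list might look like the sketch
below. It assumes the fixed header layout in which the low four bits of
the first octet give the number of CSRC entries and the entries follow
the 12-octet fixed header as 32-bit identifiers; the function name is
hypothetical.

```python
import struct


def parse_sources(packet):
    """Return (ssrc, csrc_list) from the header of one RTP packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    csrc_count = packet[0] & 0x0F
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet too short for its CSRC list")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    csrc_list = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return ssrc, csrc_list


# A mixed packet: the mixer's SSRC plus two contributing talkers.
header = struct.pack("!BBHII", 0x82, 0, 1, 160, 0xCAFE0000)
header += struct.pack("!2I", 0x00000A01, 0x00000B02)
mixer_ssrc, talkers = parse_sources(header + bytes(160))
```

Here `mixer_ssrc` is `0xCAFE0000` and `talkers` is
`[0x00000A01, 0x00000B02]`, which a user interface could map to names
learned from RTCP.
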
End system: An application that generates the content to be sent in
RTP packets and/or consumes the content of received RTP packets.
An end system can act as one or more synchronization sources in
a particular RTP session, but typically only one.
Mixer: An intermediate system that receives RTP packets from one or
more sources, possibly changes the data format, combines the
packets in some manner and then forwards a new RTP packet. Since
the timing among multiple input sources will not generally be
synchronized, the mixer will make timing adjustments among the
streams and generate its own timing for the combined stream.
Thus, all data packets originating from a mixer will be
identified as having the mixer as their synchronization source.
Translator: An intermediate system that forwards RTP packets with
their synchronization source identifier intact. Examples of
translators include devices that convert encodings without
mixing, replicators from multicast to unicast, and application-
level filters in firewalls.
Monitor: An application that receives RTCP packets sent by
participants in an RTP session, in particular the reception
reports, and estimates the current quality of service for
distribution monitoring, fault diagnosis and long-term
statistics. The monitor function is likely to be built into the
application(s) participating in the session, but may also be a
separate application that does not otherwise participate and
does not send or receive the RTP data packets. These are called
third party monitors.
Non-RTP means: Protocols and mechanisms that may be needed in
addition to RTP to provide a usable service. In particular, for
multimedia conferences, a conference control application may
distribute multicast addresses and keys for encryption,
negotiate the encryption algorithm to be used, and define
dynamic mappings between RTP payload type values and the payload
formats they represent for formats that do not have a predefined
payload type value. For simple applications, electronic mail or
a conference database may also be used. The specification of
such protocols and mechanisms is outside the scope of this
document.

4. Byte Order, Alignment, and Time Format

All integer fields are carried in network byte order, that is, most
significant byte (octet) first. This byte order is commonly known as
big-endian. The transmission order is described in detail in [4].
Unless otherwise noted, numeric constants are in decimal (base 10).
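
As a small illustration of network byte order, the sketch below encodes
a 32-bit and a 16-bit integer most significant octet first. The values
are arbitrary examples.

```python
import struct

value32 = 0x12345678
value16 = 0xABCD

# "!" in a struct format string selects network (big-endian) byte order.
wire32 = struct.pack("!I", value32)
wire16 = struct.pack("!H", value16)

# int.to_bytes with "big" produces the same octets.
same32 = value32.to_bytes(4, "big")

# Decoding reverses the process.
(decoded32,) = struct.unpack("!I", wire32)
```

Here `wire32` is the four octets 0x12 0x34 0x56 0x78 in that order, and
`wire16` is 0xAB followed by 0xCD.
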