Sjoberg, et. al. Standards Track [Page 11]
RFC 3267 RTP Payload Format for AMR and AMR-WB June 2002
3.9. AMR or AMR-WB Speech over IP scenarios
The primary scenario for this payload format is IP end-to-end between
two terminals, as shown in Figure 2. This payload format is expected
to be useful for both conversational and streaming services.
   +----------+                         +----------+
   |          |    IP/UDP/RTP/AMR or    |          |
   | TERMINAL |<----------------------->| TERMINAL |
   |          |    IP/UDP/RTP/AMR-WB    |          |
   +----------+                         +----------+
Figure 2: IP terminal to IP terminal scenario
A conversational service puts requirements on the payload format.
Low delay is one very important factor, i.e., few speech frame-blocks
per payload packet. Low overhead is also required when the payload
format traverses low bandwidth links, especially as the frequency of
packets will be high. For low bandwidth links it is also an advantage
to support UED, which allows a link provider to reduce delay and
packet loss or to reduce the utilization of link resources.
Streaming service has less strict real-time requirements and
therefore can use a larger number of frame-blocks per packet than
conversational service. This reduces the overhead from IP, UDP, and
RTP headers. However, including several frame-blocks per packet
makes the transmission more vulnerable to packet loss, so
interleaving may be used to reduce the effect packet loss will have
on speech quality. A streaming server handling a large number of
clients also needs a payload format that requires as few resources as
possible when doing packetization. The octet-aligned and
interleaving modes require the least amount of resources, while CRC,
robust sorting, and bandwidth efficient modes have higher demands.
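
As a rough illustration of the overhead trade-off described above, the following Python sketch computes the share of each packet taken by the IP, UDP, and RTP headers as the number of frame-blocks per packet grows. It assumes IPv4 without options (20 octets), UDP (8 octets), and an RTP fixed header without a CSRC list (12 octets). The 32-octet frame-block size and the function name are illustrative assumptions, and the payload header and table of contents are ignored.

```python
# Header sizes in octets: IPv4 without options, UDP, RTP fixed header (no CSRC list).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def header_overhead_fraction(frame_blocks_per_packet: int,
                             frame_block_octets: int = 32) -> float:
    """Return the share of a packet occupied by IP/UDP/RTP headers.

    frame_block_octets is an assumed, illustrative size for one encoded
    frame-block; the payload header and table of contents are ignored.
    """
    if frame_blocks_per_packet < 1:
        raise ValueError("a packet carries at least one frame-block")
    headers = IPV4_HEADER + UDP_HEADER + RTP_HEADER
    payload = frame_blocks_per_packet * frame_block_octets
    return headers / (headers + payload)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        # Each additional frame-block adds 20 ms of speech to the packet.
        print(f"{n:2d} frame-block(s): "
              f"{header_overhead_fraction(n):.1%} header overhead, "
              f"{n * 20} ms of speech per packet")
```

More frame-blocks per packet lower the header share but lengthen the speech interval lost with each dropped packet, which is the motivation for interleaving in streaming use.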
Another scenario occurs when AMR or AMR-WB encoded speech will be
transmitted from a non-IP system (e.g., a GSM or 3GPP network) to an
IP/UDP/RTP VoIP terminal, and/or vice versa, as depicted in Figure 3.
   AMR or AMR-WB
   over
   I.366.{2,3} or +------+                        +----------+
   3G Iu or       |      |   IP/UDP/RTP/AMR or    |          |
   <------------->|  GW  |<---------------------->| TERMINAL |
   GSM Abis       |      |   IP/UDP/RTP/AMR-WB    |          |
   etc.           +------+                        +----------+
                      |
     GSM/3GPP network |           IP network
                      |
Figure 3: GW to VoIP terminal scenario
In such a case, it is likely that the AMR or AMR-WB frame is
packetized in a different way in the non-IP network and will need to
be re-packetized into RTP at the gateway. Also, speech frames from
the non-IP network may come with some UEP/UED information (e.g., a
frame quality indicator) that will need to be preserved and forwarded
on to the decoder along with the speech bits. This is specified in
Section 4.3.2.
AMR's capability to do fast mode switching is exploited in some non-
IP networks to optimize speech quality. To preserve this
functionality in scenarios including a gateway to an IP network, a
codec mode request (CMR) field is needed. The gateway will be
responsible for forwarding the CMR between the non-IP and IP parts in
both directions. The IP terminal should follow the CMR forwarded by
the gateway to optimize speech quality going to the non-IP decoder.
The mode control algorithm in the gateway must accommodate the delay
imposed by the IP network on the response to CMR by the IP terminal.
The IP terminal should not set the CMR (see Section 4.3.1), but the
gateway can set the CMR value on frames going toward the encoder in
the non-IP part to optimize speech quality from that encoder to the
gateway. The gateway can alternatively set a lower CMR value, if
desired, as one means to control congestion on the IP network.
A third likely scenario is that IP/UDP/RTP is used as transport
between two non-IP systems, i.e., IP is originated and terminated in
gateways on both sides of the IP transport, as illustrated in Figure
4 below.
   AMR or AMR-WB                                        AMR or AMR-WB
   over                                                 over
   I.366.{2,3} or +------+                     +------+ I.366.{2,3} or
   3G Iu or       |      |  IP/UDP/RTP/AMR or  |      | 3G Iu or
   <------------->|  GW  |<------------------->|  GW  |<------------->
   GSM Abis       |      |  IP/UDP/RTP/AMR-WB  |      | GSM Abis
   etc.           +------+                     +------+ etc.
                      |                            |
     GSM/3GPP network |         IP network         | GSM/3GPP network
                      |                            |
Figure 4: GW to GW scenario
This scenario requires the same mechanisms for preserving UED/UEP and
CMR information as in the single gateway scenario. In addition, the
CMR value may be set in packets received by the gateways on the IP
network side. The gateway should forward to the non-IP side a CMR
value that is the minimum of three values:
- the CMR value it receives on the IP side;
- the CMR value it calculates based on its reception quality on
the non-IP side; and
- a CMR value it may choose for congestion control of transmission
on the IP side.
The details of the control algorithm are left to the implementation.
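
One possible reading of the minimum rule above is sketched below in Python. The function name, the parameter names, and the choice to treat reserved or out-of-range values as "no request" are assumptions of this sketch, not part of the specification. The sketch relies on CMR value 15 (no mode request) being numerically larger than every speech mode index, so it never lowers the result.

```python
NO_REQUEST = 15  # CMR value meaning "no mode request is present"


def forwarded_cmr(ip_side_cmr: int, radio_quality_cmr: int,
                  congestion_cmr: int = NO_REQUEST,
                  highest_mode: int = 7) -> int:
    """Return the CMR a gateway forwards to its non-IP side.

    Takes the minimum of the three candidate values. Values outside
    0..highest_mode are treated as "no request" (an assumption of this
    sketch), so they never lower the result. Use highest_mode=8 for AMR-WB.
    """
    def normalize(cmr: int) -> int:
        return cmr if 0 <= cmr <= highest_mode else NO_REQUEST

    return min(normalize(ip_side_cmr),
               normalize(radio_quality_cmr),
               normalize(congestion_cmr))


if __name__ == "__main__":
    # IP side asks for mode 7, radio reception supports mode 4, no congestion limit.
    print(forwarded_cmr(7, 4))
    # A congestion limit of mode 2 wins over both other inputs.
    print(forwarded_cmr(5, 6, congestion_cmr=2))
```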
4. AMR and AMR-WB RTP Payload Formats
The AMR and AMR-WB payload formats have identical structure, so they
are specified together. The only differences are in the types of
codec frames contained in the payload. The payload format consists
of the RTP header, payload header and payload data.
4.1. RTP Header Usage
The format of the RTP header is specified in [8]. This payload
format uses the fields of the header in a manner consistent with that
specification.
The RTP timestamp corresponds to the sampling instant of the first
sample encoded for the first frame-block in the packet. The
timestamp clock frequency is the same as the sampling frequency, so
the timestamp unit is in samples.
The duration of one speech frame-block is 20 ms for both AMR and
AMR-WB. For AMR, the sampling frequency is 8 kHz, corresponding to
160 encoded speech samples per frame from each channel. For AMR-WB,
the sampling frequency is 16 kHz, corresponding to 320 samples per
frame from each channel. Thus, the timestamp is increased by 160 for
AMR and 320 for AMR-WB for each consecutive frame-block.
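
The timestamp arithmetic above can be sketched in Python as follows. The function and table names are illustrative assumptions. The wrap modulo 2**32 reflects the 32-bit RTP timestamp field defined in [8].

```python
# Samples per 20 ms frame-block: 8 kHz for AMR, 16 kHz for AMR-WB.
SAMPLES_PER_FRAME_BLOCK = {"AMR": 160, "AMR-WB": 320}


def next_timestamp(timestamp: int, codec: str, frame_blocks: int = 1) -> int:
    """Advance an RTP timestamp by a number of consecutive frame-blocks.

    RTP timestamps are 32-bit unsigned values, so the result wraps
    modulo 2**32.
    """
    step = SAMPLES_PER_FRAME_BLOCK[codec]
    return (timestamp + step * frame_blocks) % (1 << 32)


if __name__ == "__main__":
    ts = 1000
    # A packet carrying three AMR-WB frame-blocks covers 3 * 320 samples,
    # so the next packet's timestamp is 960 samples later.
    print(next_timestamp(ts, "AMR-WB", frame_blocks=3))
```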
A packet may contain multiple frame-blocks of encoded speech or
comfort noise parameters. If interleaving is employed, the frame-
blocks encapsulated into a payload are picked according to the
interleaving rules as defined in Section 4.4.1. Otherwise, each
packet covers a period of one or more contiguous 20 ms frame-block
intervals. In case the data from all the channels for a particular
frame-block in the period is missing, for example at a gateway from
some other transport format, it is possible to indicate that no data
is present for that frame-block rather than breaking a multi-frame-
block packet into two, as explained in Section 4.3.2.
To allow for error resiliency through redundant transmission, the
periods covered by multiple packets MAY overlap in time. A receiver
MUST be prepared to receive any speech frame multiple times, either
in exact duplicates, or in different AMR rate modes, or with data
present in one packet and not present in another. If multiple
versions of the same speech frame are received, it is RECOMMENDED
that the mode with the highest rate be used by the speech decoder. A
given frame MUST NOT be encoded as speech in one packet and comfort
noise parameters in another.
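
A receiver's choice among redundant copies of one speech frame might look like the Python sketch below. The data representation and the function name are assumptions made for illustration. The sketch relies on the speech mode frame type indices being ordered by increasing bit rate, and it assumes every copy that carries data is a speech frame, consistent with the rule above.

```python
from typing import Optional, Sequence, Tuple

# (frame type index, frame bits) for one received copy of a speech frame,
# or None when a packet indicated that no data is present for the frame.
FrameVersion = Optional[Tuple[int, bytes]]


def pick_version(versions: Sequence[FrameVersion]) -> FrameVersion:
    """Choose which received copy of a speech frame to decode.

    Copies without data are skipped. Among the rest, the copy with the
    highest frame type index (highest rate) is returned. Exact duplicates
    resolve to the first copy received.
    """
    present = [v for v in versions if v is not None]
    if not present:
        return None
    return max(present, key=lambda v: v[0])


if __name__ == "__main__":
    copies = [(2, b"low-rate copy"), None, (7, b"high-rate copy")]
    print(pick_version(copies))
```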
The payload is always made an integral number of octets long by
padding with zero bits if necessary. If additional padding is
required to bring the payload length to a larger multiple of octets
or for some other purpose, then the P bit in the RTP header
may be set and padding appended as specified in [8].
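
The zero-bit padding to an octet boundary can be sketched in Python as below. Representing the payload as a string of '0' and '1' characters in transmission order (most significant bit first) and the function name are assumptions made for readability; a real packetizer would more likely work on integers or byte buffers.

```python
def pad_to_octets(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes.

    The final octet is filled with zero bits when the bit count is not a
    multiple of eight.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bits must contain only '0' and '1'")
    padding = -len(bits) % 8  # zero bits needed to reach an octet boundary
    padded = bits + "0" * padding
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


if __name__ == "__main__":
    # A 254-bit payload needs two padding bits and occupies 32 octets.
    print(len(pad_to_octets("1" * 254)))
```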
The RTP header marker bit (M) SHALL be set to 1 if the first frame-
block carried in the packet contains a speech frame which is the
first in a talkspurt. For all other packets the marker bit SHALL be
set to zero (M=0).
The assignment of an RTP payload type for this new packet format is
outside the scope of this document, and will not be specified here.
It is expected that the RTP profile under which this payload format
is being used will assign a payload type for this encoding or specify
that the payload type is to be bound dynamically.
4.2. Payload Structure
The complete payload consists of a payload header, a payload table of
contents, and speech data representing one or more speech frame-
blocks. The following diagram shows the general payload format
layout:
+----------------+-------------------+----------------
| payload header | table of contents | speech data ...
+----------------+-------------------+----------------
Payloads containing more than one speech frame-block are called
compound payloads.
The following sections describe the variations taken by the payload
format depending on whether the AMR session is set up to use the
bandwidth-efficient mode or octet-aligned mode and any of the
OPTIONAL functions for robust sorting, interleaving, and frame CRCs.
Implementations SHOULD support both bandwidth-efficient and octet-
aligned operation to increase interoperability.
4.3. Bandwidth-Efficient Mode
4.3.1. The Payload Header
In bandwidth-efficient mode, the payload header simply consists of a
4 bit codec mode request:
    0 1 2 3
   +-+-+-+-+
   |  CMR  |
   +-+-+-+-+
CMR (4 bits): Indicates a codec mode request sent to the speech
encoder at the site of the receiver of this payload. The value of
the CMR field is set to the frame type index of the corresponding
speech mode being requested. The frame type index may be 0-7 for
AMR, as defined in Table 1a in [2], or 0-8 for AMR-WB, as defined
in Table 1a in [4]. CMR value 15 indicates that no mode request
is present, and other values are for future use.
The mode request received in the CMR field is valid until the next
CMR is received, i.e., a newly received CMR value overrides the
previous one. Therefore, if a terminal continuously wishes to
receive frames in the same mode X, it needs to set CMR=X for all its
outbound payloads, and if a terminal has no preference in which mode
to receive, it SHOULD set CMR=15 in all its outbound payloads.
If receiving a payload with a CMR value which is not a speech mode or
NO_DATA, the CMR MUST be ignored by the receiver.
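
The CMR handling described so far can be sketched in Python as follows. The function names and the use of None for "no preference" are assumptions of this sketch. It also assumes the usual convention that bit 0 in the diagram is the most significant bit of the first payload octet, and it interprets CMR=15 as clearing any earlier request.

```python
from typing import Optional

NO_REQUEST = 15  # "no mode request is present"
# Highest speech mode frame type index: 0-7 for AMR, 0-8 for AMR-WB.
HIGHEST_SPEECH_MODE = {"AMR": 7, "AMR-WB": 8}


def read_cmr(payload: bytes) -> int:
    """Extract the 4-bit CMR from the first octet of a bandwidth-efficient payload."""
    if not payload:
        raise ValueError("empty payload")
    return payload[0] >> 4


def update_requested_mode(current: Optional[int], cmr: int,
                          codec: str) -> Optional[int]:
    """Apply a received CMR to the stored mode request.

    A speech-mode CMR replaces the previous request, CMR=15 clears it
    (None means no preference), and any other value is ignored.
    """
    if 0 <= cmr <= HIGHEST_SPEECH_MODE[codec]:
        return cmr
    if cmr == NO_REQUEST:
        return None
    return current  # reserved value: keep the previous request


if __name__ == "__main__":
    mode = None
    for first_octet in (0x70, 0xA0, 0xF0):
        mode = update_requested_mode(mode, read_cmr(bytes([first_octet])), "AMR")
        print(mode)
```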
In a multi-channel session, CMR SHOULD be interpreted by the receiver
of the payload as the desired encoding mode for all the channels in
the session.
An IP end-point SHOULD NOT set the CMR based on packet losses or
other congestion indications, for several reasons:
- The other end of the IP path may be a gateway to a non-IP
network (such as a radio link) that needs to set the CMR field
to optimize performance on that network.
- Congestion on the IP network is managed by the IP sender, in