   packets during silence periods to a minimum.  The operation of
   sending CN parameters at regular intervals during silence periods is
   usually called discontinuous transmission (DTX) or source controlled
   rate (SCR) operation.  The AMR or AMR-WB frames containing CN
   parameters are called Silence Indicator (SID) frames.  See more
   details about VAD and DTX functionality in [9] and [10].

3.5. Support for Multi-Channel Session

   Both the RTP payload format and the storage format defined in this
   document support multi-channel audio content (e.g., a stereophonic
   speech session).

   Although AMR and AMR-WB codecs themselves do not support encoding of
   multi-channel audio content into a single bit stream, they can be
   used to separately encode and decode each of the individual channels.

   To transport (or store) the separately encoded multi-channel content,
   the speech frames for all channels that are framed and encoded for
   the same 20 ms periods are logically collected in a frame-block.

   At the session setup, out-of-band signaling must be used to indicate
   the number of channels in the session and the order of the speech
   frames from different channels in each frame-block.  When using SDP
   for signaling, the number of channels is specified in the rtpmap




Sjoberg, et. al.            Standards Track                     [Page 6]

RFC 3267        RTP Payload Format for AMR and AMR-WB          June 2002


   attribute and the order of channels carried in each frame-block is
   implied by the number of channels as specified in Section 4.1 in
   [24].
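
   The grouping of per-channel speech frames into frame-blocks can be
   sketched as follows.  This is a minimal illustration, not part of
   the payload format: the function name, the tuple layout, and the
   choice to order channels by ascending index are assumptions made
   for the example (the actual channel order is the one implied by
   [24]).

```python
from collections import defaultdict

FRAME_MS = 20  # each AMR / AMR-WB speech frame covers a 20 ms period


def build_frame_blocks(frames, num_channels):
    """Group (timestamp_ms, channel, data) tuples into frame-blocks.

    Returns a list of (timestamp_ms, [data for channel 0, 1, ...]) in
    time order.  Channel ordering by index is an assumption of this sketch.
    """
    by_period = defaultdict(dict)
    for timestamp_ms, channel, data in frames:
        if timestamp_ms % FRAME_MS != 0:
            raise ValueError("frame does not start on a 20 ms boundary")
        if not 0 <= channel < num_channels:
            raise ValueError("channel index out of range")
        by_period[timestamp_ms][channel] = data

    blocks = []
    for timestamp_ms in sorted(by_period):
        per_channel = by_period[timestamp_ms]
        if len(per_channel) != num_channels:
            raise ValueError("frame-block is missing a channel")
        blocks.append((timestamp_ms, [per_channel[c] for c in range(num_channels)]))
    return blocks
```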

3.6. Unequal Bit-error Detection and Protection

   The speech bits encoded in each AMR or AMR-WB frame have different
   perceptual sensitivity to bit errors.  This property has been
   exploited in cellular systems to achieve better voice quality by
   using unequal error protection and detection (UEP and UED)
   mechanisms.

   The UEP/UED mechanisms focus the protection and detection of
   corrupted bits on the perceptually most sensitive bits in an AMR or
   AMR-WB frame.  In particular, speech bits in an AMR or AMR-WB frame
   are divided into class A, B, and C, where bits in class A are most
   sensitive and bits in class C least sensitive (see Table 1 below for
   AMR and [4] for AMR-WB).  A frame is only declared damaged if there
   are bit errors found in the most sensitive bits, i.e., the class A
   bits.  On the other hand, it is acceptable to have some bit errors in
   the other bits, i.e., class B and C bits.

                                    Class A   total speech
                  Index   Mode       bits       bits
                  ----------------------------------------
                    0     AMR 4.75   42         95
                    1     AMR 5.15   49        103
                    2     AMR 5.9    55        118
                    3     AMR 6.7    58        134
                    4     AMR 7.4    61        148
                    5     AMR 7.95   75        159
                    6     AMR 10.2   65        204
                    7     AMR 12.2   81        244
                    8     AMR SID    39         39

          Table 1.  The number of class A bits for the AMR codec.

   Moreover, a damaged frame is still useful for error concealment at
   the decoder since some of the less sensitive bits can still be used.
   This approach can improve the speech quality compared to discarding
   the damaged frame.
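
   As an illustration, Table 1 can be expressed as a lookup from which
   the number of class B/C bits per mode follows by subtraction.  The
   names AMR_MODES and class_split are hypothetical and exist only for
   this sketch; the numbers are those of Table 1.

```python
# (mode name, class A bits, total speech bits), indexed as in Table 1
AMR_MODES = [
    ("AMR 4.75", 42, 95),
    ("AMR 5.15", 49, 103),
    ("AMR 5.9", 55, 118),
    ("AMR 6.7", 58, 134),
    ("AMR 7.4", 61, 148),
    ("AMR 7.95", 75, 159),
    ("AMR 10.2", 65, 204),
    ("AMR 12.2", 81, 244),
    ("AMR SID", 39, 39),
]


def class_split(index):
    """Return the class A and class B/C bit counts for a Table 1 index."""
    name, class_a, total = AMR_MODES[index]
    return {
        "mode": name,
        "class_a": class_a,
        "class_bc": total - class_a,  # bits in which some errors are tolerated
    }
```

   For AMR 12.2, for example, 81 of the 244 speech bits are class A,
   leaving 163 class B/C bits; in a SID frame all 39 bits are class A.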

3.6.1. Applying UEP and UED in an IP Network

   To take full advantage of the bit-error robustness of the AMR and
   AMR-WB codecs, the RTP payload format is designed to facilitate
   UEP/UED in an IP network.  It should be noted however that the
   utilization of UEP and UED discussed below is OPTIONAL.

   UEP/UED in an IP network can be achieved by detecting bit errors in
   class A bits and tolerating bit errors in class B/C bits of the AMR
   or AMR-WB frame(s) in each RTP payload.

   Today there exist some link layers that do not discard packets with
   bit errors, e.g., SLIP and some wireless links.  With the Internet
   traffic pattern shifting towards a more multimedia-centric one, more
   link layers of such nature may emerge in the future.  With transport
   layer support for partial checksums, for example those supported by
   UDP-Lite [15], bit error tolerant AMR and AMR-WB traffic could
   achieve better performance over these types of links.

   There are at least two basic approaches for carrying AMR and AMR-WB
   traffic over bit error tolerant IP networks:

   1) Utilizing a partial checksum to cover headers and the most
      important speech bits of the payload.  It is recommended that at
      least all class A bits are covered by the checksum.

   2) Utilizing a partial checksum to only cover headers, but a frame
      CRC to cover the class A bits of each speech frame in the RTP
      payload.

   In either approach, at least part of the class B/C bits are left
   without error-check and thus bit error tolerance is achieved.
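
   The difference in checksum coverage between the two approaches can
   be sketched for a payload carrying a single speech frame.  The
   sketch rests on several assumptions that are not stated in this
   section: coverage is counted in octets from the start of an 8-octet
   UDP-Lite header, the RTP header is the 12-octet fixed header without
   CSRC entries or extensions, the class A bits are the first bits of
   the speech frame, and the size of the payload-format header is
   passed in by the caller.  The function name is hypothetical.

```python
import math

UDP_LITE_HEADER_OCTETS = 8    # assumed fixed UDP-Lite header size
RTP_FIXED_HEADER_OCTETS = 12  # assumed RTP header without CSRC or extension


def checksum_coverage(payload_header_octets, covered_speech_bits):
    """Octets to cover with the partial checksum for a single-frame payload.

    payload_header_octets: size of the payload-format header fields that
        precede the speech bits (supplied by the caller; an assumption).
    covered_speech_bits: leading speech bits to protect, e.g. the class A
        bit count for Approach 1, or 0 for Approach 2 (headers only).
    """
    speech_octets = math.ceil(covered_speech_bits / 8)
    return (UDP_LITE_HEADER_OCTETS + RTP_FIXED_HEADER_OCTETS
            + payload_header_octets + speech_octets)
```

   With an assumed 2 octets of payload-format header and the 81 class
   A bits of AMR 12.2, Approach 1 would cover 33 octets, while Approach
   2 would cover only the 22 octets of headers and rely on the frame
   CRC for the class A bits.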

      Note, it is still important that the network designer pay
      attention to the class B and C residual bit error rate.  Though
      less sensitive to errors than class A bits, class B and C bits are
      not insignificant and undetected errors in these bits cause
      degradation in speech quality.  An example of residual error rates
      considered acceptable for AMR in UMTS can be found in [20] and for
      AMR-WB in [21].

   The application interface to the UEP/UED transport protocol (e.g.,
   UDP-Lite) may not provide any control over the link error rate,
   especially in a gateway scenario.  Therefore, it is incumbent upon
   the designer of a node with a link interface of this type to choose a
   residual bit error rate that is low enough to support applications
   such as AMR encoding when transmitting packets of a UEP/UED transport
   protocol.

   Approach 1 is bit-efficient, flexible, and simple, but it comes with
   two disadvantages: a) bit errors in protected speech bits will cause
   the payload to be discarded, and b) when transporting multiple frames
   in a payload, a single bit error in protected bits can cause all the
   frames to be discarded.

   These disadvantages can be avoided, if needed, with some overhead in
   the form of a frame-wise CRC (Approach 2).  For a), the CRC
   makes it possible to detect bit errors in class A bits and use the
   frame for error concealment, which gives a small improvement in
   speech quality.  For b), when transporting multiple frames in a
   payload, the CRCs remove the possibility that a single bit error in a
   class A bit will cause all the frames to be discarded.  Avoiding that
   gives an improvement in speech quality when transporting multiple
   frames over links subject to bit errors.
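
   The effect of the two approaches on a multi-frame payload can be
   sketched as below.  This is a simplified model under stated
   assumptions: each frame is represented only by a flag saying
   whether one of its class A bits was hit by a bit error, header
   errors are ignored, and the function names and outcome labels are
   hypothetical.

```python
def approach_1(class_a_errors):
    """One partial checksum covers the class A bits of every frame.

    class_a_errors: list of bools, True if that frame has a class A bit
    error.  Any such error fails the checksum, so the whole payload and
    all frames in it are discarded.
    """
    if any(class_a_errors):
        return ["discarded"] * len(class_a_errors)
    return ["good"] * len(class_a_errors)


def approach_2(class_a_errors):
    """The checksum covers headers only; each frame carries its own CRC.

    Only the frame whose CRC fails is marked damaged.  It is still handed
    to the decoder, where it can be used for error concealment.
    """
    return ["damaged" if err else "good" for err in class_a_errors]
```

   For three frames in which only the second has a class A bit error,
   this model discards all three frames under Approach 1, but marks
   only the second frame as damaged under Approach 2.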

   The choice between the above two approaches must be made based on the
   available bandwidth, and desired tolerance to bit errors.  Neither
   solution is appropriate to all cases.  Section 8 defines parameters
   that may be used at session setup to select between these approaches.

3.7. Robustness against Packet Loss

   The payload format supports several means, including forward error
   correction (FEC) and frame interleaving, to increase robustness
   against packet loss.

3.7.1. Use of Forward Error Correction (FEC)

   The simple scheme of repetition of previously sent data is one way of
   achieving FEC.  Another possible scheme which is more bandwidth
   efficient is to use payload external FEC, e.g., RFC2733 [19], which
   generates extra packets containing repair data.  The whole payload
   can also be sorted in sensitivity order to support external FEC
   schemes using UEP.  There is also a work in progress on a generic
   version of such a scheme [18] that can be applied to AMR or AMR-WB
   payload transport.

   With AMR or AMR-WB, it is possible to use the multi-rate capability
   of the codec to send redundant copies of the same mode or of another
   mode, e.g., one with lower-bandwidth.  We describe such a scheme
   next.

   This involves the simple retransmission of previously transmitted
   frame-blocks together with the current frame-block(s).  This is done
   by using a sliding window to group the speech frame-blocks to send in
   each payload.  Figure 1 below shows an example.

   --+--------+--------+--------+--------+--------+--------+--------+--
     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
   --+--------+--------+--------+--------+--------+--------+--------+--

     <---- p(n-1) ---->
              <----- p(n) ----->
                       <---- p(n+1) ---->
                                <---- p(n+2) ---->
                                         <---- p(n+3) ---->
                                                  <---- p(n+4) ---->

              Figure 1: An example of redundant transmission.

   In this example each frame-block is retransmitted one time in the
   following RTP payload packet.  Here, f(n-2)..f(n+4) denotes a
   sequence of speech frame-blocks and p(n-1)..p(n+4) a sequence of
   payload packets.
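
   The sliding window of Figure 1 can be sketched as follows.  The
   function name and the `copies` parameter are hypothetical; with
   copies=1 each payload carries the previous frame-block followed by
   the current one, as in the figure.

```python
def redundant_payloads(frame_blocks, copies=1):
    """Build one payload per frame-block using a sliding window.

    Each payload holds the current frame-block preceded by up to
    `copies` earlier frame-blocks, oldest first.
    """
    payloads = []
    for n in range(len(frame_blocks)):
        start = max(0, n - copies)
        payloads.append(frame_blocks[start:n + 1])
    return payloads
```

   Because every frame-block appears in two consecutive payloads, the
   loss of any single payload other than the last one leaves all
   frame-blocks recoverable from the payloads that did arrive.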

   The use of this approach does not require signaling at the session
   setup.  In other words, the speech sender can choose to use this
   scheme without consulting the receiver.  This is because a packet
   containing redundant frames will not look different from a packet
   with only new frames.  The receiver may receive multiple copies or
   versions (encoded with different modes) of a frame for a certain
   timestamp if no packet is lost.  If multiple versions of the same
   speech frame are received, it is recommended that the mode with the
   highest rate be used by the speech decoder.
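
   A receiver following this recommendation could select among
   duplicate frames as sketched below.  The tuple layout and the
   function name are assumptions of the example; the mode is
   represented by its bit rate in kbit/s so that versions can be
   compared directly.

```python
def select_frames(received):
    """Keep, for each timestamp, the received version with the highest rate.

    received: iterable of (timestamp, mode_kbps, data) tuples, possibly
    containing several versions of the frame for the same timestamp.
    Returns {timestamp: (mode_kbps, data)}.
    """
    best = {}
    for timestamp, mode_kbps, data in received:
        if timestamp not in best or mode_kbps > best[timestamp][0]:
            best[timestamp] = (mode_kbps, data)
    return best
```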

   This redundancy scheme provides the same functionality as the one
   described in RFC 2198 "RTP Payload for Redundant Audio Data" [24].
   In most cases the mechanism in this payload format is more efficient
   and simpler than requiring both endpoints to support RFC 2198 in
   addition.  There are two situations in which use of RFC 2198 is
   indicated: if the spread in time required between the primary and
   redundant encodings is larger than 5 frame times, the bandwidth
   overhead of RFC 2198 will be lower; or, if a non-AMR codec is desired
   for the redundant encoding, the AMR payload format won't be able to
   carry it.

   The sender is responsible for selecting an appropriate amount of
   redundancy based on feedback about the channel, e.g., in RTCP
   receiver reports.  A sender should not base selection of FEC on the
   CMR, as this parameter most probably was set based on non-IP
   information, e.g., radio link performance measures.  The sender is
   also responsible for avoiding congestion, which may be exacerbated by
   redundancy (see Section 6 for more details).

3.7.2. Use of Frame Interleaving

   To decrease protocol overhead, the payload design allows several
   speech frame-blocks to be encapsulated into a single RTP packet.  One
   drawback of this approach is that a packet loss then means the loss
   of several consecutive speech frame-blocks, which usually causes
   clearly audible distortion in the reconstructed speech.
   Interleaving of frame-blocks can improve the speech quality in such
   cases by distributing the consecutive losses into a series of single
   frame-block losses.  However, interleaving and bundling several
   frame-blocks per payload will also increase end-to-end delay and is
   therefore not appropriate for all types of applications.  Streaming
   applications will most likely be able to exploit interleaving to
   improve speech quality in lossy transmission conditions.
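
   The way interleaving turns one burst loss into isolated losses can
   be sketched with a generic interleaver.  This is an illustration
   under assumptions, not the field-level scheme of this payload
   format: the function names and the `depth` parameter are
   hypothetical.

```python
def interleave(frame_blocks, depth):
    """Spread a group of frame-blocks over `depth` packets.

    Packet p carries frame-blocks p, p + depth, p + 2*depth, ... so that
    consecutive frame-blocks always travel in different packets.
    """
    return [frame_blocks[p::depth] for p in range(depth)]


def longest_gap(num_frame_blocks, lost):
    """Length of the longest run of consecutive lost frame-block indices."""
    longest = run = 0
    for i in range(num_frame_blocks):
        run = run + 1 if i in lost else 0
        longest = max(longest, run)
    return longest
```

   With eight frame-blocks and a depth of 4, the second packet carries
   frame-blocks 1 and 5, so losing it produces two isolated
   single-frame-block gaps.  Without interleaving, a packet bundling
   frame-blocks 2 and 3 would produce one gap of two consecutive
   frame-blocks.  The receiver has to wait for the whole group before
   it can restore the original order, which is where the added delay
   comes from.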

   This payload design supports the use of frame interleaving as an
   option.  For the encoder (speech sender) to use frame interleaving in
   its outbound RTP packets for a given session, the decoder (speech
   receiver) needs to indicate its support via out-of-band means (see
   Section 8).

3.8. Bandwidth Efficient or Octet-aligned Mode

   For a given session, the payload format can be either bandwidth
   efficient or octet aligned, depending on the mode of operation that
   is established for the session via out-of-band means.

   In the octet-aligned format, all the fields in a payload, including
   payload header, table of contents entries, and speech frames
   themselves, are individually aligned to octet boundaries to make
   implementations efficient.  In the bandwidth efficient format only
   the full payload is octet aligned, so fewer padding bits are added.

      Note, octet alignment of a field or payload means that the last
      octet is padded with zeroes in the least significant bits to fill
      the octet.  Also note that this padding is separate from padding
      indicated by the P bit in the RTP header.
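
   The padding rule in the note above can be sketched as follows.  The
   bit-string representation and the function name are assumptions
   chosen to keep the example short.

```python
def pad_to_octets(bits):
    """Octet-align a bit string given most significant bit first.

    bits: string of '0' and '1' characters.
    Returns (octets, padding_bits).  Zero bits are appended so that they
    fill the least significant bits of the last octet.
    """
    padding = (-len(bits)) % 8
    padded = bits + "0" * padding
    octets = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return octets, padding
```

   For example, the 95 speech bits of an AMR 4.75 frame occupy 12
   octets once a single zero padding bit is appended.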

   Between the two operation modes, only the octet-aligned mode has the
   capability to use the robust sorting, interleaving, and frame CRC to
   make the speech transport robust to packet loss and bit errors.