Network Working Group                                         J. Sjoberg
Request for Comments: 3267                                 M. Westerlund
Category: Standards Track                                       Ericsson
                                                            A. Lakaniemi
                                                                   Nokia
                                                                  Q. Xie
                                                                Motorola
                                                               June 2002


   Real-Time Transport Protocol (RTP) Payload Format and File Storage
    Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
                     Wideband (AMR-WB) Audio Codecs

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This document specifies a real-time transport protocol (RTP) payload
   format to be used for Adaptive Multi-Rate (AMR) and Adaptive Multi-
   Rate Wideband (AMR-WB) encoded speech signals.  The payload format is
   designed to be able to interoperate with existing AMR and AMR-WB
   transport formats on non-IP networks.  In addition, a file format is
   specified for transport of AMR and AMR-WB speech data in storage mode
   applications such as email.  Two separate MIME type registrations are
   included, one for AMR and one for AMR-WB, specifying use of both the
   RTP payload format and the storage format.














Sjoberg, et. al.            Standards Track                     [Page 1]

RFC 3267        RTP Payload Format for AMR and AMR-WB          June 2002


Table of Contents

   1. Introduction.................................................... 3
   2. Conventions and Acronyms........................................ 3
   3. Background on AMR/AMR-WB and Design Principles.................. 4
     3.1. The Adaptive Multi-Rate (AMR) Speech Codec.................. 4
     3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec...... 5
     3.3. Multi-rate Encoding and Mode Adaptation..................... 5
     3.4. Voice Activity Detection and Discontinuous Transmission..... 6
     3.5. Support for Multi-Channel Session........................... 6
     3.6. Unequal Bit-error Detection and Protection.................. 7
       3.6.1. Applying UEP and UED in an IP Network................... 7
     3.7. Robustness against Packet Loss.............................. 9
       3.7.1. Use of Forward Error Correction (FEC)................... 9
       3.7.2. Use of Frame Interleaving...............................11
     3.8. Bandwidth Efficient or Octet-aligned Mode...................11
     3.9. AMR or AMR-WB Speech over IP scenarios......................12
   4. AMR and AMR-WB RTP Payload Formats..............................14
     4.1. RTP Header Usage............................................14
     4.2. Payload Structure...........................................16
     4.3. Bandwidth-Efficient Mode....................................16
       4.3.1. The Payload Header......................................16
       4.3.2. The Payload Table of Contents...........................17
       4.3.3. Speech Data.............................................19
       4.3.4. Algorithm for Forming the Payload.......................20
       4.3.5. Payload Examples........................................21
         4.3.5.1. Single Channel Payload Carrying a Single Frame......21
         4.3.5.2. Single Channel Payload Carrying Multiple Frames.....22
         4.3.5.3. Multi-Channel Payload Carrying Multiple Frames......23
     4.4. Octet-aligned Mode..........................................25
       4.4.1. The Payload Header......................................25
       4.4.2. The Payload Table of Contents and Frame CRCs............26
         4.4.2.1. Use of Frame CRC for UED over IP....................28
       4.4.3. Speech Data.............................................30
       4.4.4. Methods for Forming the Payload.........................30
       4.4.5. Payload Examples........................................32
         4.4.5.1. Basic Single Channel Payload Carrying
                  Multiple Frames.....................................32
         4.4.5.2. Two Channel Payload with CRC, Interleaving,
                  and Robust-sorting..................................32
     4.5. Implementation Considerations...............................33
   5. AMR and AMR-WB Storage Format...................................34
     5.1. Single Channel Header.......................................34
     5.2. Multi-channel Header........................................35
     5.3. Speech Frames...............................................36
   6. Congestion Control..............................................37
   7. Security Considerations.........................................37
     7.1. Confidentiality.............................................37
     7.2. Authentication..............................................38
     7.3. Decoding Validation.........................................38
   8. Payload Format Parameters.......................................38
     8.1. AMR MIME Registration.......................................39
     8.2. AMR-WB MIME Registration....................................41
     8.3. Mapping MIME Parameters into SDP............................44
   9. IANA Considerations.............................................45
   10. Acknowledgements...............................................45
   11. References.....................................................45
     11.1 Informative References......................................46
   12. Authors' Addresses.............................................48
   13. Full Copyright Statement.......................................49

1. Introduction

   This document specifies the payload format for packetization of AMR
   and AMR-WB encoded speech signals into the Real-time Transport
   Protocol (RTP) [8].  The payload format supports transmission of
   multiple channels, multiple frames per payload, the use of fast codec
   mode adaptation, robustness against packet loss and bit errors, and
   interoperation with existing AMR and AMR-WB transport formats on
   non-IP networks, as described in Section 3.

   The payload format itself is specified in Section 4.  A related file
   format is specified in Section 5 for transport of AMR and AMR-WB
   speech data in storage mode applications such as email.  In Section
   8, two separate MIME type registrations are provided, one for AMR and
   one for AMR-WB.

   Even though this RTP payload format definition supports the transport
   of both AMR and AMR-WB speech, it is important to remember that AMR
   and AMR-WB are two different codecs and they are always handled as
   different payload types in RTP.

2. Conventions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [5].

   The following acronyms are used in this document:

      3GPP   - the Third Generation Partnership Project
      AMR    - Adaptive Multi-Rate Codec
      AMR-WB - Adaptive Multi-Rate Wideband Codec
      CMR    - Codec Mode Request
      CN     - Comfort Noise
      DTX    - Discontinuous Transmission
      ETSI   - European Telecommunications Standards Institute
      FEC    - Forward Error Correction
      SCR    - Source Controlled Rate Operation
      SID    - Silence Indicator (the frames containing only CN
               parameters)
      VAD    - Voice Activity Detection
      UED    - Unequal Error Detection
      UEP    - Unequal Error Protection

   The term "frame-block" is used in this document to describe the
   time-synchronized set of speech frames in a multi-channel AMR or
   AMR-WB session.  In particular, in an N-channel session, a frame-
   block will contain N speech frames, one from each of the channels,
   and all N speech frames represent exactly the same time period.
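
   As an illustration of the frame-block concept, the following Python
   sketch groups per-channel frame sequences into frame-blocks.  The
   function name and the representation of frames as byte strings are
   assumptions made for this example; they are not defined by this
   document.

```python
from typing import List, Sequence


def make_frame_blocks(channels: Sequence[Sequence[bytes]]) -> List[List[bytes]]:
    """Group per-channel frame lists into frame-blocks.

    channels[c][t] is the encoded frame of channel c for frame period t
    (hypothetical layout).  Each returned frame-block holds N frames,
    one per channel, that all cover the same time period.
    """
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must carry the same number of frames")
    return [[ch[t] for ch in channels] for t in range(length)]


# Two-channel session with two frame periods per channel.
left = [b"L0", b"L1"]
right = [b"R0", b"R1"]
blocks = make_frame_blocks([left, right])
```

   Here each element of "blocks" is one frame-block containing the
   left and right frames for the same 20 ms period.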

3. Background on AMR/AMR-WB and Design Principles

   AMR and AMR-WB were originally designed for circuit-switched mobile
   radio systems.  Due to their flexibility and robustness, they are
   also suitable for other real-time speech communication services over
   packet-switched networks such as the Internet.

   Because of the flexibility of these codecs, the behavior in a
   particular application is controlled by several parameters that
   select options or specify the acceptable values for a variable.
   These options and variables are described in general terms at
   appropriate points in the text of this specification as parameters to
   be established through out-of-band means.  In Section 8, all of the
   parameters are specified in the form of MIME subtype registrations
   for the AMR and AMR-WB encodings.  The method used to signal these
   parameters at session setup or to arrange prior agreement of the
   participants is beyond the scope of this document; however, Section
   8.3 provides a mapping of the parameters into the Session Description
   Protocol (SDP) [11] for those applications that use SDP.

3.1. The Adaptive Multi-Rate (AMR) Speech Codec

   The AMR codec was originally developed and standardized by the
   European Telecommunications Standards Institute (ETSI) for GSM
   cellular systems.  It has since been chosen by the Third Generation
   Partnership Project (3GPP) as the mandatory codec for third
   generation (3G) cellular systems [1].

   The AMR codec is a multi-mode codec that supports 8 narrow band
   speech encoding modes with bit rates between 4.75 and 12.2 kbps.  The
   sampling frequency used in AMR is 8000 Hz and the speech encoding is
   performed on 20 ms speech frames.  Therefore, each encoded AMR speech
   frame represents 160 samples of the original speech.

   Of the 8 AMR encoding modes, three have also been adopted separately
   as standards in their own right.  Specifically, the 6.7 kbps mode is
   adopted as PDC-EFR [14], the 7.4 kbps mode as the IS-641 codec in
   TDMA [13], and the 12.2 kbps mode as GSM-EFR [12].

3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec

   The Adaptive Multi-Rate Wideband (AMR-WB) speech codec [3] was
   originally developed by 3GPP to be used in GSM and 3G cellular
   systems.

   Like AMR, the AMR-WB codec is a multi-mode speech codec.
   AMR-WB supports 9 wide band speech coding modes with respective bit
   rates ranging from 6.6 to 23.85 kbps.  The sampling frequency used in
   AMR-WB is 16000 Hz and the speech processing is performed on 20 ms
   frames.  This means that each AMR-WB encoded frame represents 320
   speech samples.
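
   The frame sizes quoted in Sections 3.1 and 3.2 follow directly from
   the sampling frequency and the 20 ms frame duration.  The short
   Python sketch below reproduces that arithmetic and, as a further
   illustration, derives the number of speech bits that one frame
   carries at a given bit rate.  The function names are chosen for
   this example only.

```python
FRAME_DURATION_MS = 20  # both AMR and AMR-WB operate on 20 ms frames


def samples_per_frame(sampling_rate_hz: int) -> int:
    """Number of speech samples represented by one encoded frame."""
    return sampling_rate_hz * FRAME_DURATION_MS // 1000


def speech_bits_per_frame(bit_rate_bps: int) -> int:
    """Number of speech bits produced per 20 ms frame at a given bit rate."""
    return bit_rate_bps * FRAME_DURATION_MS // 1000


amr_samples = samples_per_frame(8000)       # 160 (AMR, Section 3.1)
amr_wb_samples = samples_per_frame(16000)   # 320 (AMR-WB, Section 3.2)
amr_top_bits = speech_bits_per_frame(12200)     # 244 bits at 12.2 kbps
amr_wb_top_bits = speech_bits_per_frame(23850)  # 477 bits at 23.85 kbps
```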

3.3. Multi-rate Encoding and Mode Adaptation

   The multi-rate encoding (i.e., multi-mode) capability of AMR and
   AMR-WB is designed for preserving high speech quality under a wide
   range of transmission conditions.

   With AMR or AMR-WB, mobile radio systems are able to use available
   bandwidth as effectively as possible.  For example, in GSM it is
   possible to adjust the speech encoding rate dynamically during a
   session so as to adapt continuously to the varying transmission
   conditions.  This is done by dividing the fixed overall bandwidth
   between speech data and error protective coding, which enables the
   best possible trade-off between speech compression rate and error
   tolerance.  To perform mode adaptation, the decoder (speech
   receiver) needs to signal to the encoder (speech sender) the new
   mode it prefers.  This mode change signal is called a Codec Mode
   Request or CMR.

   Since in most sessions speech is sent in both directions between the
   two ends, the mode requests from the decoder at one end to the
   encoder at the other end are piggy-backed over the speech frames in
   the reverse direction.  In other words, there is no out-of-band
   signaling needed for sending CMRs.
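
   The piggy-backing of mode requests can be pictured with the
   following Python sketch.  It is a simplified model, not the payload
   format: the class names, the fields, and the use of None to mean
   "no request" are assumptions made for illustration, and the actual
   encoding of the CMR is specified in Section 4.

```python
class Packet:
    """One speech packet with a piggy-backed mode request (hypothetical)."""

    def __init__(self, cmr, frame_mode, speech):
        self.cmr = cmr                # mode requested of the peer's encoder, or None
        self.frame_mode = frame_mode  # mode used to encode the carried speech
        self.speech = speech


class Endpoint:
    """One end of a two-way session; it both encodes and decodes speech."""

    def __init__(self, encoder_mode, preferred_rx_mode=None):
        self.encoder_mode = encoder_mode            # mode our encoder uses
        self.preferred_rx_mode = preferred_rx_mode  # mode our decoder prefers

    def send(self, speech):
        # The decoder's preference rides along with our outgoing speech.
        return Packet(self.preferred_rx_mode, self.encoder_mode, speech)

    def receive(self, packet):
        # A request from the peer's decoder changes our encoder's mode.
        if packet.cmr is not None:
            self.encoder_mode = packet.cmr


a = Endpoint(encoder_mode=7)
b = Endpoint(encoder_mode=7, preferred_rx_mode=2)

a.receive(b.send(b"speech from b"))  # b's request reaches a in-band
reply = a.send(b"speech from a")     # a now encodes with the requested mode
```

   After the exchange, the speech that a sends to b is encoded in the
   mode b asked for, without any separate signaling channel.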

   Every AMR or AMR-WB codec implementation is required to support all
   the respective speech coding modes defined by the codec and must be
   able to handle mode switching to any of the modes at any time.
   However, some transport systems may impose limitations on the number
   of modes supported and on how often the mode can change, due to
   bandwidth limitations or other constraints.  For this reason, the
   decoder is allowed to indicate its acceptance of a particular mode
   or a subset of the defined modes for the session using out-of-band
   means.

   For example, the GSM radio link can only use a subset of at most four
   different modes in a given session.  This subset can be any
   combination of the 8 AMR modes for an AMR session or any combination
   of the 9 AMR-WB modes for an AMR-WB session.
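
   A check of the GSM restriction described above might look like the
   following Python sketch.  The bit-rate tables come from the AMR and
   AMR-WB codec specifications rather than from the text above, modes
   are identified by their index into those tables, and the function
   name is an assumption made for this example.

```python
# Bit rates in kbps, indexed by mode number.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)
AMR_WB_MODES_KBPS = (6.60, 8.85, 12.65, 14.25, 15.85,
                     18.25, 19.85, 23.05, 23.85)

GSM_MAX_MODES = 4  # a GSM radio link uses at most four modes per session


def is_valid_gsm_mode_set(mode_set, codec="AMR"):
    """Return True if mode_set is usable on a GSM radio link.

    mode_set is an iterable of mode indices; codec is "AMR" or "AMR-WB".
    """
    if codec == "AMR":
        table = AMR_MODES_KBPS
    elif codec == "AMR-WB":
        table = AMR_WB_MODES_KBPS
    else:
        raise ValueError("unknown codec: %r" % (codec,))
    modes = set(mode_set)
    if not 0 < len(modes) <= GSM_MAX_MODES:
        return False
    # Any combination is allowed as long as every index names a real mode.
    return all(isinstance(m, int) and 0 <= m < len(table) for m in modes)


ok = is_valid_gsm_mode_set({0, 2, 5, 7})            # four AMR modes
too_many = is_valid_gsm_mode_set({0, 1, 2, 3, 4})   # five modes
```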

   Moreover, for better interoperability with GSM through a gateway, the
   decoder is allowed to use out-of-band means to set the minimum number
   of frames between two mode changes and to restrict mode changes to
   neighboring modes only.
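
   The two restrictions can be expressed as a check over the sequence
   of modes used by consecutive frames, as in the Python sketch below.
   It assumes that "neighboring" means adjacent within the session's
   active mode set and that the minimum spacing is counted in frames
   between successive changes; the function and parameter names are
   hypothetical, and the normative parameter definitions are those in
   Section 8.

```python
def mode_changes_allowed(modes, mode_set, min_period=1, neighbor_only=False):
    """Check a per-frame sequence of mode indices against the restrictions.

    modes         -- mode index used by each consecutive frame
    mode_set      -- mode indices permitted in this session
    min_period    -- minimum number of frames between two mode changes
    neighbor_only -- if True, a change may only go to an adjacent mode
                     of mode_set
    """
    ordered = sorted(set(mode_set))
    if any(m not in ordered for m in modes):
        return False
    last_change = None  # frame position of the most recent mode change
    for i in range(1, len(modes)):
        if modes[i] == modes[i - 1]:
            continue
        if last_change is not None and i - last_change < min_period:
            return False
        if neighbor_only:
            step = abs(ordered.index(modes[i]) - ordered.index(modes[i - 1]))
            if step != 1:
                return False
        last_change = i
    return True


session_modes = [0, 2, 5, 7]
gradual = mode_changes_allowed([7, 7, 5, 5, 2, 2], session_modes,
                               min_period=2, neighbor_only=True)
too_fast = mode_changes_allowed([7, 5, 2], session_modes, min_period=2)
```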

   Section 8 specifies a set of MIME parameters that may be used to
   signal these mode adaptation controls at session setup.

3.4. Voice Activity Detection and Discontinuous Transmission

   Both codecs support voice activity detection (VAD) and generation of
   comfort noise (CN) parameters during silence periods.  Hence, the
   codecs have the option to reduce the number of transmitted bits and
