Network Working Group                                       K. Kobayashi
Request for Comments: 3190             Communication Research Laboratory
Category: Standards Track                                       A. Ogawa
                                                         Keio University
                                                               S. Casner
                                                           Packet Design
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                            January 2002


                         RTP Payload Format for
        12-bit DAT Audio and 20- and 24-bit Linear Sampled Audio

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This document specifies a packetization scheme for encapsulating
   12-bit nonlinear, 20-bit linear, and 24-bit linear audio data streams
   using the Real-time Transport Protocol (RTP).  This document also
   specifies the format of a Session Description Protocol (SDP)
   parameter to indicate when audio data is preemphasized before
   sampling.  The parameter may be used with other audio payload
   formats, in particular L16 (16-bit linear).

1. Introduction

   This document describes the sampling of audio data in 12-bit
   nonlinear, 20-bit linear, and 24-bit linear encodings, and specifies
   the encapsulation of the audio data into the Real-time Transport
   Protocol (RTP), version 2 [1,2].  DAT (digital audio tape) and DV
   (digital video) devices [3,4] use these audio encodings in addition
   to 16-bit linear encoding.  The packetization scheme for 16-bit
   linear audio (L16) is already specified [2,5].  This document
   specifies the packetization scheme for the other encodings following
   that for L16; in particular, when used with the RTP profile [2],
   these payload formats follow the encoding-independent rules for



Kobayashi, et al.           Standards Track                     [Page 1]

RFC 3190                  RTP Payload Format                January 2002


   sample ordering and channel interleaving specified in [2] plus
   extensions specified here.  This document also specifies out-of-band
   negotiation methods for the extended channel interleaving rules and
   for use when an analog preemphasis technique is applied to the audio
   data.

1.1 Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [6].

2. The need for RTP encapsulation of 12-, 20- and 24-bit audio

   Many high-quality digital audio and visual systems, such as DAT and
   DV, adopt sample-based audio encodings.  Different audio formats are
   used in various situations.  To transport the audio data using RTP,
   an encapsulation needs to be defined for each specific format.  Only
   16-bit linear audio encapsulation (L16) has thus far been defined.
   Other encoding formats have already appeared, such as the 12-bit
   nonlinear, 20-bit linear and 24-bit linear encodings used in the DAT
   and DV video world.  This specification defines the RTP payload
   encapsulation format in order to use the new encodings in the RTP
   environment.

3. 12-bit nonlinear audio encapsulation

   IEC 61119 [3] specifies the 12-bit nonlinear audio format in DAT and
   DV, called LP (Long Play) audio.  It would be easy to convert 12-bit
   nonlinear audio into 16-bit linear form at the RTP sender and
   transmit it using the L16 audio format already defined.  However,
   this would consume 33% more network bandwidth than necessary.  This
   payload format is specified as a more efficient alternative.

   The 12-bit nonlinear encoding is the same as for 16-bit linear audio
   except for the packing of each sampled data element.  Each sample of
   12-bit nonlinear audio is derived from a single sample of 16-bit
   linear audio by a nonlinear compression.  Table 1 shows the details
   of the conversion from 16 to 12 bits.  The result is a 12-bit signed
   value ranging from -2048 to 2047 and it is represented in two's
   complement notation.  The 12-bit samples are packed contiguously into
   payload octets starting with the most significant bit.  When the
   payload contains an odd number of samples, the four LSBs of the last
   octet are unused.  Parameters other than quantization, e.g., sampling
   frequency and audio channel assignment, are the same as in the L16
   payload format.  In particular, samples are packed into the packet in
   time sequence beginning with the oldest sample.
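
   The packing rule above can be sketched in Python as follows.  This
   is a minimal illustration, not part of the specification: the
   function name pack_12bit is hypothetical, and filling the unused
   four LSBs with zeros is an assumption, since the text only says
   those bits are unused.

```python
def pack_12bit(samples):
    """Pack signed 12-bit samples (-2048..2047) MSB-first into octets."""
    out = bytearray()
    acc = 0      # bit accumulator holding not-yet-emitted bits
    nbits = 0    # number of valid bits currently in acc
    for s in samples:
        if not -2048 <= s <= 2047:
            raise ValueError(f"sample out of 12-bit range: {s}")
        acc = (acc << 12) | (s & 0xFFF)   # 12-bit two's complement
        nbits += 12
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1           # keep only the leftover bits
    if nbits:
        # Odd sample count: four bits remain.  They go in the high half
        # of the last octet; the low four bits are unused (zero here,
        # which is an assumption of this sketch).
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)
```

   Two samples occupy exactly three octets, so an odd sample count is
   the only case that leaves unused bits.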

    ------------------------------------------------------------
     16-bit X       Conversion                    12-bit Y
    ------------------------------------------------------------
     32,767 (7FFFh) Y = INT(X/64) + (600h)        2,047 (7FFh)
     16,384 (4000h)                               1,792 (700h)
    ------------------------------------------------------------
     16,383 (3FFFh) Y = INT(X/32) + (500h)        1,791 (6FFh)
      8,192 (2000h)                               1,536 (600h)
    ------------------------------------------------------------
      8,191 (1FFFh) Y = INT(X/16) + (400h)        1,535 (5FFh)
      4,096 (1000h)                               1,280 (500h)
    ------------------------------------------------------------
      4,095 (0FFFh) Y = INT(X/8) + (300h)         1,279 (4FFh)
      2,048 (0800h)                               1,024 (400h)
    ------------------------------------------------------------
      2,047 (07FFh) Y = INT(X/4) + (200h)         1,023 (3FFh)
      1,024 (0400h)                                 768 (300h)
    ------------------------------------------------------------
      1,023 (03FFh) Y = INT(X/2) + (100h)           767 (2FFh)
        512 (0200h)                                 512 (200h)
    ------------------------------------------------------------
        511 (01FFh) Y = X                           511 (1FFh)
          0 (0000h)                                   0 (000h)
    ------------------------------------------------------------
         -1 (FFFFh) Y = X                            -1 (FFFh)
       -512 (FE00h)                                -512 (E00h)
    ------------------------------------------------------------
       -513 (FDFFh) Y = INT((X + 1)/2) - (101h)    -513 (DFFh)
     -1,024 (FC00h)                                -768 (D00h)
    ------------------------------------------------------------
     -1,025 (FBFFh) Y = INT((X + 1)/4) - (201h)    -769 (CFFh)
     -2,048 (F800h)                              -1,024 (C00h)
    ------------------------------------------------------------
     -2,049 (F7FFh) Y = INT((X + 1)/8) - (301h)  -1,025 (BFFh)
     -4,096 (F000h)                              -1,280 (B00h)
    ------------------------------------------------------------
     -4,097 (EFFFh) Y = INT((X + 1)/16) - (401h) -1,281 (AFFh)
     -8,192 (E000h)                              -1,536 (A00h)
    ------------------------------------------------------------
     -8,193 (DFFFh) Y = INT((X + 1)/32) - (501h) -1,537 (9FFh)
    -16,384 (C000h)                              -1,792 (900h)
    ------------------------------------------------------------
    -16,385 (BFFFh) Y = INT((X + 1)/64) - (601h) -1,793 (8FFh)
    -32,768 (8000h)                              -2,048 (800h)
    ------------------------------------------------------------

    Table 1. Conversion from 16-bit linear values (X) to 12-bit
             nonlinear values (Y) [3]
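
   The conversion in Table 1 can be sketched in Python as follows.
   The function names are hypothetical.  The sketch relies on two
   observations that are derived from the table rather than stated in
   it: INT truncates toward zero, and the negative half mirrors the
   positive half under one's complement, so Y(X) = NOT Y(NOT X) for
   negative X.  Both agree with every row boundary listed in Table 1.

```python
def _compress_nonnegative(mag):
    """Apply the positive half of Table 1 to a value in 0..32767."""
    # Segment number k: 0 for 0..511, 1 for 512..1023, ... 6 for 16384..32767.
    k = max(mag.bit_length() - 9, 0)
    # Y = INT(X / 2**k) + k * 100h
    return (mag >> k) + (k << 8)


def linear16_to_dat12(x):
    """Convert a signed 16-bit linear sample to a 12-bit nonlinear value."""
    if not -32768 <= x <= 32767:
        raise ValueError(f"sample out of 16-bit range: {x}")
    if x >= 0:
        return _compress_nonnegative(x)
    # ~x == -x - 1 maps -1..-32768 onto 0..32767; the outer ~ maps the
    # compressed result back onto -1..-2048.
    return ~_compress_nonnegative(~x)
```

   For example, linear16_to_dat12(-1024) follows the row
   Y = INT((X + 1)/2) - (101h): INT(-1023/2) is -511, and
   -511 - 257 gives -768.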

   When conveying encoding information in an SDP [7] session
   description, the 12-bit nonlinear audio payload format specified here
   is given the encoding name "DAT12".  Thus, the media format
   representation might be:

      m=audio 49230 RTP/AVP 97 98
      a=rtpmap:97 DAT12/32000/2
      a=rtpmap:98 L16/48000/2

4. 20- and 24-bit linear audio encapsulation

   The 20- and 24-bit linear audio encodings are simply an extension of
   the L16 linear audio encoding [2].  The 20- or 24-bit uncompressed
   audio data samples are represented as signed values in two's
   complement notation.  The samples are packed contiguously into
   payload octets starting with the most significant bit.  For the
   20-bit encoding, when the payload contains an odd number of samples,
   the four LSBs of the last octet are unused.  Samples are packed into
   the packet in time sequence beginning with the oldest sample.
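
   As an illustration of the receiving side, the following Python
   sketch recovers signed samples from a payload packed as described
   above.  The function name unpack_linear is hypothetical.  The
   sketch assumes the payload holds only whole samples plus, for the
   20-bit encoding, at most four unused trailing bits, whose value it
   ignores.

```python
def unpack_linear(payload, bits):
    """Unpack MSB-first two's-complement samples of width 20 or 24."""
    if bits not in (20, 24):
        raise ValueError("bits must be 20 or 24")
    total = int.from_bytes(payload, "big")   # whole payload as one integer
    total_bits = len(payload) * 8
    count = total_bits // bits               # trailing unused bits are dropped
    samples = []
    for i in range(count):
        shift = total_bits - (i + 1) * bits
        raw = (total >> shift) & ((1 << bits) - 1)
        if raw & (1 << (bits - 1)):          # sign bit set: negative value
            raw -= 1 << bits
        samples.append(raw)
    return samples
```

   The sample count is taken from the payload length; this works
   because the unused tail is shorter than one sample.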

   When conveying encoding information in an SDP session description,
   the 20- and 24-bit linear audio payload formats specified here are
   given the encoding names "L20" and "L24", respectively.  An example
   SDP audio media description would be:

      m=audio 49230 RTP/AVP 99 100
      a=rtpmap:99 L20/48000/2
      a=rtpmap:100 L24/48000
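
   The rtpmap lines in this and the preceding example can be read
   with a small parser such as the Python sketch below.  The names
   parse_rtpmap, payload_bit_rate and BITS_PER_SAMPLE are
   hypothetical.  The sketch assumes the usual SDP [7] convention that
   an omitted channel count means one channel, and it handles only
   the sample-based encodings named in this document plus L16.

```python
# Bits per sample for the encodings discussed in this document.
BITS_PER_SAMPLE = {"DAT12": 12, "L16": 16, "L20": 20, "L24": 24}


def parse_rtpmap(line):
    """Parse 'a=rtpmap:<pt> <name>/<rate>[/<channels>]' into a tuple."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError(f"not an rtpmap attribute: {line!r}")
    pt_str, _, encoding = line[len(prefix):].strip().partition(" ")
    parts = encoding.strip().split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"malformed rtpmap encoding: {encoding!r}")
    name = parts[0]
    rate = int(parts[1])
    # Assumption: a missing channel count means a single channel.
    channels = int(parts[2]) if len(parts) == 3 else 1
    return int(pt_str), name, rate, channels


def payload_bit_rate(name, rate, channels):
    """Audio payload bit rate in bits per second, excluding headers."""
    return BITS_PER_SAMPLE[name] * rate * channels
```

   For the "L20/48000/2" entry this gives 20 * 48000 * 2 =
   1,920,000 bits per second of audio payload, before RTP and lower
   layer headers.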

5. Preemphasized audio data

   In order to improve the higher frequency characteristics of audio
   signals, analog preemphasis is often applied to the signal before
   quantization.  If analog preemphasis was applied before the payload
   data was sampled, the type of the preemphasis SHOULD be conveyed with
   out-of-band signaling.  An "emphasis" parameter is defined for this
   purpose and may be conveyed either as a MIME optional parameter or
   using the SDP format-specific attribute (a=fmtp line) as below:

      a=fmtp:<payload type> emphasis=<emphasis type>

   Only one <emphasis type> value is defined for the parameter at this
   point:

      50-15           <50/15 microsecond CD-type emphasis>

   The emphasis attribute MUST NOT be included in the SDP record if
   preemphasis was not applied.  This rule allows the emphasis attribute
   to be used with other audio formats, in particular L16 [2], while
   retaining backward compatibility with existing implementations so
   long as preemphasis is not applied.  If an existing application that
   does not implement preemphasis accepts a session description with an
   emphasis attribute but ignores that attribute, the only penalty is
   that the sound will be too "bright" when receiving or "dull" when
   sending.

   A sample SDP record showing preemphasis applied only to payload type
   99 might be as follows:

      m=audio 49230 RTP/AVP 99 100
      a=rtpmap:99 L20/48000/2
      a=fmtp:99 emphasis=50-15
      a=rtpmap:100 L24/48000
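
   A receiver could extract the emphasis parameter from such a record
   roughly as in the Python sketch below.  The function name
   emphasis_by_payload_type is hypothetical, and the sketch assumes
   that multiple fmtp parameters, if present, are separated by
   semicolons, which is a common convention rather than something
   this document specifies.

```python
def emphasis_by_payload_type(sdp_lines):
    """Return {payload type: emphasis type} for fmtp lines that carry one."""
    prefix = "a=fmtp:"
    result = {}
    for line in sdp_lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        pt_str, _, params = line[len(prefix):].partition(" ")
        # Assumption: parameters are separated by ';'.
        for param in params.split(";"):
            key, sep, value = param.strip().partition("=")
            if sep and key == "emphasis":
                result[int(pt_str)] = value
    return result
```

   A payload type that is absent from the result has no preemphasis
   applied, since the attribute MUST NOT appear in that case.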

6. Translation of DV audio error code

   The DV video specification IEC 61834-4 [4] defines the negative full-
   scale audio sample value to be an audio error code indicating that no
   valid audio sample is available for that sample period.  Such an
   error might occur due to a failure while reading audio data from
   magnetic tape.  The audio error code values for each of the DV audio
   encodings are (in hexadecimal):

      12-bit nonlinear:  800h
      16-bit linear:     8000h
      20-bit linear:     80000h

   For the payload formats defined in this document, as well as for the
   L16 payload format defined in [2], no such error code is defined.
   That is, all possible sample values are valid.  When an RTP sender
   accepts audio samples from a DV video system and encapsulates those
   samples according to one of these payload formats, the RTP sender
   SHOULD perform some error concealment algorithm which may depend upon
   whether a single sample error or multiple sample errors have
   occurred.  The error concealment algorithm is not specified here and
   is left to the implementation.  The RTP sender MAY treat the error
   code as if it were a valid audio sample, but this is likely to cause
   undesirable audio output.
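
   Because the concealment algorithm is left to the implementation,
   the following Python sketch shows only one simple possibility:
   repeating the most recent valid sample.  It is an assumption of
   this example, not a recommendation of this document.  The names
   are hypothetical, and the sketch expects the samples of a single
   channel, before any interleaving.

```python
# DV audio error codes as signed values: 800h, 8000h, 80000h.
ERROR_CODES = {12: -2048, 16: -32768, 20: -524288}


def conceal_errors(samples, bits):
    """Replace DV error codes with the last valid sample (0 if none yet)."""
    error_code = ERROR_CODES[bits]
    out = []
    last_valid = 0
    for s in samples:
        if s == error_code:
            s = last_valid        # hold the previous valid value
        else:
            last_valid = s
        out.append(s)
    return out
```

   Holding the last value is adequate for isolated errors; longer
   runs of errors would generally call for a smoother method such as
   interpolation or a fade.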

   Conversely, an RTP receiver that accepts audio packets in one of
   these payload formats and delivers the audio samples to a DV video
   system SHOULD translate the audio samples that would be interpreted
   as error codes into the next smaller negative audio value.  Such
   audio samples may be present because the audio packets may have come

   from a source other than a DV video system.  The DV video
   specification [4] gives the following translations for the defined
   audio encodings:

      12-bit nonlinear:  800h              ->  801h
      16-bit linear:     8000h             ->  8001h
      20-bit linear:     80000h - 8000Fh   ->  80010h

   For the 20-bit linear encoding, note that multiple audio sample
   values are translated in order to allow a 16-bit system to play 20-
   bit audio data by ignoring the least significant four bits.  Note
   also that no translation is specified for 24-bit linear audio because
   that encoding is not included in the DV video specification.
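
   The receiver-side translations listed above can be sketched in
   Python as follows, with samples held as signed integers (so 800h is
   -2048, 8000h is -32768 and 80000h is -524288).  The function name
   translate_for_dv is hypothetical.

```python
def translate_for_dv(sample, bits):
    """Map values a DV system would read as error codes to valid audio."""
    if bits == 12:
        return -2047 if sample == -2048 else sample       # 800h -> 801h
    if bits == 16:
        return -32767 if sample == -32768 else sample     # 8000h -> 8001h
    if bits == 20:
        lowest = -(1 << 19)                               # 80000h
        # 80000h..8000Fh -> 80010h, so that dropping the four LSBs
        # never yields the 16-bit error code.
        if lowest <= sample <= lowest + 0xF:
            return lowest + 0x10
        return sample
    # No translation is defined for other widths, including 24-bit.
    raise ValueError(f"no DV translation defined for {bits}-bit audio")
```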

7. Channel interleaving and non-AIFF-C audio channel convention

   When multiple channels of audio, such as in a stereo program, are
   multiplexed into a single RTP stream, the audio samples from each
   channel are interleaved according to the rules specified in [2] to be
   consistent with the L16 payload format.  That is, samples from
   different channels taken at the same sampling instant are packed into
   consecutive octets.  For example, for a two-channel encoding, the
   sample sequence is (left channel, first sample), (right channel,
   first sample), (left channel, second sample), (right channel, second
   sample).  Samples for all channels belonging to a single sampling
   instant MUST be contained in the same packet.
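
   The interleaving rule, and the requirement that a sampling instant
   not be split across packets, can be illustrated with the Python
   sketch below.  The function names and the frames_per_packet
   parameter are hypothetical; the sketch works on lists of sample
   values and leaves bit packing aside.

```python
def interleave(channels):
    """Interleave equal-length per-channel sample lists by sampling instant."""
    if not channels:
        return []
    length = len(channels[0])
    if any(len(c) != length for c in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one tuple per sampling instant.
    return [s for frame in zip(*channels) for s in frame]


def packetize(interleaved, num_channels, frames_per_packet):
    """Split interleaved samples into packets on sampling-instant boundaries."""
    if len(interleaved) % num_channels:
        raise ValueError("sample count is not a multiple of the channel count")
    step = num_channels * frames_per_packet
    return [interleaved[i:i + step] for i in range(0, len(interleaved), step)]
```

   With left = [1, 2, 3] and right = [10, 20, 30], interleave gives
   [1, 10, 2, 20, 3, 30], and packetize with two sampling instants per
   packet gives [[1, 10, 2, 20], [3, 30]].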

   This sample order differs from the packing of samples into blocks in
   a native DV audio stream.  Therefore, applications transmitting DV
   audio using the payload formats defined in this document MUST
   reshuffle the samples into the order specified here.  This
   requirement is intended to enable interworking between DV systems and
   other digital audio systems.  Applications choosing to send bundled
