   the frame.

   For 20 ms frames there are three possible positions for the two-sub-
   block length maximum power segment; the start state position is
   encoded with 2 bits.  The start state position, start, MUST be
   encoded as

      start=1: start state in sub-frame 0 and 1
      start=2: start state in sub-frame 1 and 2
      start=3: start state in sub-frame 2 and 3

   For 30 ms frames there are five possible positions of the two-sub-
   block length maximum power segment; the start state position is
   encoded with 3 bits.  The start state position, start, MUST be
   encoded as

      start=1: start state in sub-frame 0 and 1
      start=2: start state in sub-frame 1 and 2
      start=3: start state in sub-frame 2 and 3
      start=4: start state in sub-frame 3 and 4
      start=5: start state in sub-frame 4 and 5





Andersen, et al.              Experimental                     [Page 16]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   Hence, in both cases, index 0 is not used.  In order to shorten the
   start state for bit rate efficiency, the start state is brought down
   to STATE_SHORT_LEN=57 samples for 20 ms frames and STATE_SHORT_LEN=58
   samples for 30 ms frames.  The power of the first 23/22 and the last
   23/22 samples (80-STATE_SHORT_LEN samples each) of the two sub-frames
   identified above is computed as the sum of the squared signal sample
   values, and the 23/22-sample segment with the lower power is excluded
   from the start state.  One bit is transmitted to indicate which of
   the two possible 57/58-sample segments is used.  The start state
   position within the two sub-frames determined above, state_first,
   MUST be encoded as

      state_first=1: start state is first STATE_SHORT_LEN samples
      state_first=0: start state is last STATE_SHORT_LEN samples

3.5.2.  All-Pass Filtering and Scale Quantization

   The block of residual samples in the start state is first filtered by
   an all-pass filter with the quantized LPC coefficients as denominator
   and reversed quantized LPC coefficients as numerator.  The purpose of
   this phase-dispersion filter is to get a more even distribution of
   the sample values in the residual signal.  The filtering is performed
   by circular convolution, where the initial filter memory is set to
   zero.

      res(0..(STATE_SHORT_LEN-1))   = uncoded start state residual
      res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0

      Pk(z) = A~rk(z)/A~k(z), where
                                   ___
                                   \
      A~rk(z)= z^(-LPC_FILTERORDER)+>a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                   /__
                               i=0...(LPC_FILTERORDER-1)

      and A~k(z) is taken from the block where the start state begins

      res -> Pk(z) -> filtered

      ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
                                        k=0..(STATE_SHORT_LEN-1)

   The all-pass filtered block is searched for its largest-magnitude
   sample.  The base-10 logarithm of this magnitude is quantized with a
   6-bit quantizer, state_frgqTbl, by finding the nearest representation.
   This results in an index, idxForMax, corresponding to a quantized
   value, qmax.  The all-pass filtered residual samples in the block are
   then multiplied by a scaling factor scal=4.5/(10^qmax) to yield
   normalized samples.

   state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
                  1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
                  1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
                  1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
                  2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
                  2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
                  2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
                  2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
                  2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
                  3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
                  3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
                  3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
                  3.666050, 3.740994, 3.830749, 3.938770, 4.101764}

3.5.3.  Scalar Quantization

   The normalized samples are quantized in the perceptually weighted
   speech domain by a sample-by-sample scalar DPCM quantization as
   depicted in Figure 3.3.  Each sample in the block is filtered by a
   weighting filter Wk(z), specified in section 3.4, to form a weighted
   speech sample x[n].  The target sample d[n] is formed by subtracting
   a predicted sample y[n], where the prediction filter is given by

           Pk(z) = 1 - 1 / Wk(z).

               +-------+  x[n] +    d[n] +-----------+ u[n]
   residual -->| Wk(z) |-------->(+)---->| Quantizer |------> quantized
               +-------+       - /|\     +-----------+    |   residual
                                  |                      \|/
                             y[n] +--------------------->(+)
                                  |                       |
                                  |        +------+       |
                                  +--------| Pk(z)|<------+
                                           +------+

   Figure 3.3.  Quantization of start state samples by DPCM in weighted
   speech domain.

   The coded state sample u[n] is obtained by quantizing d[n] with a 3-
   bit quantizer with quantization table state_sq3Tbl.

   state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
                  0.444214, 1.329712, 2.436279, 3.983887}

   The quantized samples are transformed back to the residual domain by
   1) scaling by 1/scal; 2) time-reversing the scaled samples; 3)
   filtering the time-reversed samples with the same all-pass filter as
   in section 3.5.2, using circular convolution; and 4) time-reversing
   the filtered samples.  (More detail is in section 4.2.)

   A reference implementation of the start-state encoding can be found
   in Appendix A.46.

3.6.  Encoding the Remaining Samples

   A dynamic codebook is used to encode 1) the 23/22 remaining samples
   in the two sub-blocks containing the start state; 2) the sub-blocks
   after the start state in time; and 3) the sub-blocks before the start
   state in time.  Thus, the encoding target can be either the 23/22
   samples remaining in the two sub-blocks containing the start state,
   or a 40-sample sub-block.  This target can consist of samples that are
   indexed forward in time or backward in time, depending on the
   location of the start state.  The length of the target is denoted by
   lTarget.

   The coding is based on an adaptive codebook that is built from a
   codebook memory that contains decoded LPC excitation samples from the
   already encoded part of the block.  These samples are indexed in the
   same time direction as the target vector and end at the sample
   instant prior to the first sample instant represented in the target
   vector.  The codebook memory has length lMem, which is equal to
   CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
   23/22-sample sub-block.

   The following figure shows an overview of the encoding procedure.

         +------------+    +---------------+    +-------------+
      -> | 1. Decode  | -> | 2. Mem setup  | -> | 3. Perc. W. | ->
         +------------+    +---------------+    +-------------+

         +------------+    +-----------------+
      -> | 4. Search  | -> | 5. Upd. Target  | ------------------>
       | +------------+    +------------------ |
       ----<-------------<-----------<----------
                     stage=0..2

         +----------------+
      -> | 6. Recalc G[0] | ---------------> gains and CB indices
         +----------------+

   Figure 3.4.  Flow chart of the codebook search in the iLBC encoder.

   1. Decode the part of the residual that has been encoded so far,
      using the codebook without perceptual weighting.

   2. Set up the memory by taking data from the decoded residual.  This
      memory is used to construct codebooks.  For blocks preceding the
      start state, both the decoded residual and the target are time
      reversed (section 3.6.1).

   3. Filter the memory + target with the perceptual weighting filter
      (section 3.6.2).

   4. Search for the best match between the target and the codebook
      vector.  Compute the optimal gain for this match and quantize that
      gain (section 3.6.4).

   5. Update the perceptually weighted target by subtracting the
      contribution of the selected codebook vector (the quantized gain
      times the selected vector, taken from the perceptually weighted
      memory).  Repeat steps 4 and 5 for the two additional stages.

   6. Calculate the energy loss due to encoding of the residual.  If
      needed, compensate for this loss by an upscaling and
      requantization of the gain for the first stage (section 3.7).

   The following sections provide an in-depth description of the
   different blocks of Figure 3.4.

3.6.1.  Codebook Memory

   The codebook memory is based on the already encoded sub-blocks, so
   the available data for encoding increases for each new sub-block that
   has been encoded.  Until enough sub-blocks have been encoded to fill
   the codebook memory with data, it is padded with zeros.  The
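   following figure shows an example of the order in which the
   sub-blocks are encoded for the 30 ms frame size if the start state is
   located in the last 58 samples of sub-blocks 2 and 3 (counting the
   six sub-blocks from 1, i.e., sub-frames 1 and 2 in the 0-based
   numbering used for the start state position, start=2).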

   +-----------------------------------------------------+
   |  5     | 1  |///|////////|    2   |    3   |    4   |
   +-----------------------------------------------------+

   Figure 3.5.  The order from 1 to 5 in which the sub-blocks are
   encoded.  The slashed area is the start state.

   The first target sub-block to be encoded is number 1, and the
   corresponding codebook memory is shown in the following figure.  As
   the target vector comes before the start state in time, the codebook
   memory and target vector are time reversed; thus, after the block has
   been time reversed, the search algorithm can be reused.  As only the
   start state has been encoded so far, the last samples of the codebook
   memory are padded with zeros.

   +-------------------------
   |zeros|\\\\\\\\|\\\\|  1 |
   +-------------------------

   Figure 3.6.  The codebook memory, length lMem=85 samples, and the
   target vector 1, length 22 samples.

   The next step is to encode sub-block 2 by using the memory, which has
   grown now that sub-block 1 has been encoded.  The following figure
   shows the codebook memory for encoding of sub-block 2.

   +-----------------------------------
   | zeros | 1  |///|////////|    2   |
   +-----------------------------------

   Figure 3.7.  The codebook memory, length lMem=147 samples, and the
   target vector 2, length 40 samples.

   The next step is to encode sub-block 3 by using the memory, which has
   grown again now that sub-blocks 1 and 2 have been encoded, but the
   memory still has to be padded with a few zeros.  The following figure
   shows the codebook memory for encoding of sub-block 3.

   +------------------------------------------
   |zeros| 1  |///|////////|    2   |   3    |
   +------------------------------------------