   The two consecutive sub-blocks of the residual exhibiting the maximal
   weighted energy are identified.  Within these two sub-blocks, the
   start state (segment) is selected from two choices: the first 57/58
   samples or the last 57/58 samples of the two consecutive sub-blocks.
   The selected segment is the one of higher energy.  The start state is
   encoded with scalar quantization.
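
   The choice between the two candidate segments can be sketched as
   follows.  This is a minimal illustration, not the reference code: it
   compares plain (unweighted) energy, breaks ties in favor of the first
   segment, and the names segment_energy and select_start_state are
   hypothetical.  The normative procedure is given in section 3.5.

```c
#include <stddef.h>

#define SUBL 40  /* sub-block length in samples (from the text) */

/* Sum of squares of n samples. */
static float segment_energy(const float *x, size_t n)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++) {
        e += x[i] * x[i];
    }
    return e;
}

/*
 * Hypothetical helper: two_subblocks holds the 2*SUBL = 80 residual
 * samples of the two selected sub-blocks; state_len is 57 (20 ms) or
 * 58 (30 ms).  Returns the offset of the chosen segment: 0 for the
 * first state_len samples, 2*SUBL - state_len for the last state_len
 * samples.  Ties go to the first segment (an assumption).
 */
static size_t select_start_state(const float *two_subblocks, size_t state_len)
{
    size_t tail_offset = 2 * SUBL - state_len;
    float e_first = segment_energy(two_subblocks, state_len);
    float e_last = segment_energy(two_subblocks + tail_offset, state_len);
    return (e_first >= e_last) ? 0 : tail_offset;
}
```

   The 23/22 samples not covered by the returned segment are the ones
   left for the codebook encoding described next.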

   A dynamic codebook encoding procedure is used to encode 1) the 23/22
   (20 ms/30 ms) remaining samples in the two sub-blocks containing the
   start state; 2) the sub-blocks after the start state in time; and 3)
   the sub-blocks before the start state in time.  Thus, the encoding
   target can be either the 23/22 samples remaining of the two sub-
   blocks containing the start state or a 40-sample sub-block.  This
   target can consist of samples indexed forward in time or backward in
   time, depending on the location of the start state.

   The codebook coding is based on an adaptive codebook built from a
   codebook memory that contains decoded LPC excitation samples from the
   already encoded part of the block.  These samples are indexed in the
   same time direction as the target vector, ending at the sample
   instant prior to the first sample instant represented in the target
   vector.  The codebook is used in CB_NSTAGES=3 stages in a successive
   refinement approach, and the resulting three code vector gains are
   encoded with 5-, 4-, and 3-bit scalar quantization, respectively.

   The codebook search method employs noise shaping derived from the LPC
   filters, and the main decision criterion is to minimize the squared
   error between the target vector and the code vectors.  Each code
   vector in this codebook comes from one of CB_EXPAND=2 codebook
   sections.  The first section is filled with delayed, already encoded
   residual vectors.  The code vectors of the second codebook section
   are constructed by predefined linear combinations of vectors in the
   first section of the codebook.

   As codebook encoding with squared-error matching is known to produce
   a coded signal of less power than does the scalar quantized start
   state signal, a gain re-scaling method is implemented by a refined
   search for a better set of codebook gains in terms of power matching
   after encoding.  This is done by searching for a higher value of the
   gain factor for the first stage codebook, as the subsequent stage
   codebook gains are scaled by the first stage gain.




Andersen, et al.              Experimental                      [Page 6]

RFC 3951              Internet Low Bit Rate Codec          December 2004


2.2.  Decoder

   Typically for packet communications, a jitter buffer placed at the
   receiving end decides whether the packet containing an encoded signal
   block has been received or lost.  This logic is not part of the codec
   described here.  For each encoded signal block received, the decoder
   performs a decoding operation.  For each lost signal block, the decoder
   performs a PLC operation.

   The decoding for each block starts by decoding and interpolating the
   LPC coefficients.  Subsequently, the start state is decoded.

   For codebook-encoded segments, each segment is decoded by
   constructing the three code vectors given by the received codebook
   indices in the same way that the code vectors were constructed in the
   encoder.  The three gain factors are also decoded, and the resulting
   decoded signal is given by the sum of the three code vectors, each
   scaled by its respective gain.
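
   The final summation can be sketched as below.  The sketch assumes
   the code vectors have already been constructed and the gains already
   dequantized; the function name sum_code_vectors and its argument
   layout are hypothetical and are not taken from the reference code.

```c
#include <stddef.h>

#define CB_NSTAGES 3  /* number of codebook stages (from the text) */

/*
 * Hypothetical helper: reconstruct one decoded segment of len samples
 * as the gain-weighted sum of the CB_NSTAGES code vectors.
 * cbvec[s] points to the code vector of stage s, gain[s] is its gain.
 */
static void sum_code_vectors(float *decoded,
                             const float *const *cbvec,
                             const float *gain,
                             size_t len)
{
    for (size_t i = 0; i < len; i++) {
        float acc = 0.0f;
        for (int s = 0; s < CB_NSTAGES; s++) {
            acc += gain[s] * cbvec[s][i];
        }
        decoded[i] = acc;
    }
}
```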

   An enhancement algorithm is applied to the reconstructed excitation
   signal.  This enhancement augments the periodicity of voiced speech
   regions.  The enhancement is optimized under the constraint that the
   modification signal (defined as the difference between the enhanced
   excitation and the excitation signal prior to enhancement) has a
   short-time energy that does not exceed a preset fraction of the
   short-time energy of the excitation signal prior to enhancement.

   A packet loss concealment (PLC) operation is easily embedded in the
   decoder.  The PLC operation can, e.g., be based on repeating LPC
   filters and obtaining the LPC residual signal by using a long-term
   prediction estimate from previous residual blocks.

3.  Encoder Principles

   The following block diagram is an overview of all the components of
   the iLBC encoding procedure.  The description of the blocks contains
   references to the section where that particular procedure is further
   described.













             +-----------+    +---------+    +---------+
   speech -> | 1. Pre P  | -> | 2. LPC  | -> | 3. Ana  | ->
             +-----------+    +---------+    +---------+

             +---------------+   +--------------+
          -> | 4. Start Sel  | ->| 5. Scalar Qu | ->
             +---------------+   +--------------+

             +--------------+    +---------------+
          -> |6. CB Search  | -> | 7. Packetize  | -> payload
          |  +--------------+ |  +---------------+
          ----<---------<------
       sub-frame 0..2/4 (20 ms/30 ms)

   Figure 3.1. Flow chart of the iLBC encoder

   1. Pre-process speech with a HP filter, if needed (section 3.1).

   2. Compute LPC parameters, quantize, and interpolate (section 3.2).

   3. Use analysis filters on speech to compute residual (section 3.3).

   4. Select position of 57/58-sample start state (section 3.5).

   5. Quantize the 57/58-sample start state with scalar quantization
      (section 3.5).

   6. Search the codebook for each sub-frame.  Start with the 23/22-sample
      block, then encode sub-blocks forward in time, and then encode
      sub-blocks backward in time.  For each block, the steps in Figure
      3.4 are performed (section 3.6).

   7. Packetize the bits into the payload specified in Table 3.2.

   The input to the encoder SHOULD be 16-bit uniform PCM sampled at 8
   kHz.  Also it SHOULD be partitioned into blocks of BLOCKL=160/240
   samples.  Each block input to the encoder is divided into NSUB=4/6
   consecutive sub-blocks of SUBL=40 samples each.
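
   The frame dimensions follow directly from the sampling rate and the
   frame duration.  The sketch below derives them; the helper names
   block_length, num_subblocks, and subblock are hypothetical and only
   illustrate the arithmetic.

```c
#define FS   8000  /* sampling rate in Hz (from the text) */
#define SUBL 40    /* samples per sub-block (from the text) */

/* BLOCKL for a given frame duration: 160 for 20 ms, 240 for 30 ms. */
static int block_length(int frame_ms)
{
    return FS / 1000 * frame_ms;
}

/* NSUB for a given frame duration: 4 for 20 ms, 6 for 30 ms. */
static int num_subblocks(int frame_ms)
{
    return block_length(frame_ms) / SUBL;
}

/* Pointer to the first sample of sub-block k (k counted from 0). */
static const short *subblock(const short *block, int k)
{
    return block + k * SUBL;
}
```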













             0        39        79       119       159
             +---------------------------------------+
             |    1    |    2    |    3    |    4    |
             +---------------------------------------+
                            20 ms frame

   0        39        79       119       159       199       239
   +-----------------------------------------------------------+
   |    1    |    2    |    3    |    4    |    5    |    6    |
   +-----------------------------------------------------------+
                                  30 ms frame
   Figure 3.2. One input block to the encoder for 20 ms (with four sub-
   frames) and 30 ms (with six sub-frames).

3.1.  Pre-processing

   In some applications, the recorded speech signal contains DC level
   and/or 50/60 Hz noise.  If these components have not been removed
   prior to the encoder call, they should be removed by a high-pass
   filter.  A reference implementation of this, using a filter with a
   cutoff frequency of 90 Hz, can be found in Appendix A.28.

3.2.  LPC Analysis and Quantization

   The input to the LPC analysis module is a possibly high-pass filtered
   speech buffer, speech_hp, that contains 240/300 (LPC_LOOKBACK +
   BLOCKL = 80/60 + 160/240 = 240/300) speech samples, where samples 0
   through 79/59 are from the previous block and samples 80/60 through
   239/299 are from the current block.  No look-ahead into the next
   block is used.  For the very first block processed, the look-back
   samples are assumed to be zeros.
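
   One way to maintain this buffer across blocks is sketched below.
   The structure lpc_buffer_state and the two functions are
   hypothetical; the reference code may organize its buffers
   differently.  Only the sample layout (look-back samples first, then
   the current block, zeros before the first block) follows the text.

```c
#include <string.h>

#define LPC_LOOKBACK_MAX 80  /* largest look-back: 80 (20 ms) vs 60 (30 ms) */

/* Hypothetical state carried from one block to the next. */
typedef struct {
    int blockl;                       /* BLOCKL: 160 or 240 */
    int lookback;                     /* LPC_LOOKBACK: 80 or 60 */
    float history[LPC_LOOKBACK_MAX];  /* tail of the previous block */
} lpc_buffer_state;

/* Set the mode and zero the look-back samples for the first block. */
static void lpc_buffer_init(lpc_buffer_state *st, int frame_ms)
{
    st->blockl = (frame_ms == 20) ? 160 : 240;
    st->lookback = (frame_ms == 20) ? 80 : 60;
    memset(st->history, 0, sizeof st->history);
}

/*
 * Build speech_hp (lookback + blockl samples) from the saved history
 * and the current block, then save the tail of the current block as
 * the look-back for the next call.
 */
static void lpc_buffer_fill(lpc_buffer_state *st, const float *block,
                            float *speech_hp)
{
    size_t nback = (size_t)st->lookback;
    size_t nblock = (size_t)st->blockl;

    memcpy(speech_hp, st->history, nback * sizeof(float));
    memcpy(speech_hp + nback, block, nblock * sizeof(float));
    memcpy(st->history, block + nblock - nback, nback * sizeof(float));
}
```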

   For each input block, the LPC analysis calculates one/two set(s) of
   LPC_FILTERORDER=10 LPC filter coefficients using the autocorrelation
   method and the Levinson-Durbin recursion.  These coefficients are
   converted to the Line Spectrum Frequency representation.  In the 20
   ms case, the single lsf set represents the spectral characteristics
   as measured at the center of the third sub-block.  For 30 ms frames,
   the first set, lsf1, represents the spectral properties of the input
   signal at the center of the second sub-block, and the other set,
   lsf2, represents the spectral characteristics as measured at the
   center of the fifth sub-block.  The details of the computation for 30
   ms frames are described in sections 3.2.1 through 3.2.6.  Section
   3.2.7 explains how the LPC Analysis and Quantization differs for 20
   ms frames.






3.2.1.  Computation of Autocorrelation Coefficients

   The first step in the LPC analysis procedure is to calculate
   autocorrelation coefficients by using windowed speech samples.  This
   windowing is the only difference in the LPC analysis procedure for
   the two sets of coefficients.  For the first set, a 240-sample-long
   standard symmetric Hanning window is applied to samples 0 through 239
   of the input data.  The first window, lpc_winTbl, is defined as

      lpc_winTbl[i]= 0.5 * (1.0 - cos((2*PI*(i+1))/(BLOCKL+1)));
               i=0,...,119
      lpc_winTbl[i] = lpc_winTbl[BLOCKL - i - 1]; i=120,...,239

   The windowed speech speech_hp_win1 is then obtained by multiplying
   the first 240 samples of the input speech buffer with the window
   coefficients:

      speech_hp_win1[i] = speech_hp[i] * lpc_winTbl[i];
               i=0,...,BLOCKL-1

   From these 240 windowed speech samples, 11 (LPC_FILTERORDER + 1)
   autocorrelation coefficients, acf1, are calculated:

      acf1[lag] += speech_hp_win1[n] * speech_hp_win1[n + lag];
               lag=0,...,LPC_FILTERORDER; n=0,...,BLOCKL-lag-1
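
   Because the expression above accumulates with "+=", each acf1[lag]
   has to start from zero.  A minimal sketch of the double loop
   follows; the function name autocorr and its generic signature are
   hypothetical.

```c
/*
 * Hypothetical helper: compute acf[0..order] from n windowed samples x.
 * For the case in the text, n = BLOCKL and order = LPC_FILTERORDER.
 */
static void autocorr(const float *x, int n, int order, float *acf)
{
    for (int lag = 0; lag <= order; lag++) {
        float sum = 0.0f;  /* explicit zero start for the accumulation */
        for (int i = 0; i < n - lag; i++) {
            sum += x[i] * x[i + lag];
        }
        acf[lag] = sum;
    }
}
```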

   In order to make the analysis more robust against numerical precision
   problems, a spectral smoothing procedure is applied by windowing the
   autocorrelation coefficients before the LPC coefficients are
   computed.  Also, a white noise floor is added to the autocorrelation
   function by multiplying coefficient zero by 1.0001 (40 dB below the
   energy of the windowed speech signal).  These two steps are
   implemented by multiplying the autocorrelation coefficients with the
   following window:

      lpc_lagwinTbl[0] = 1.0001;
      lpc_lagwinTbl[i] = exp(-0.5 * ((2 * PI * 60.0 * i) / FS)^2);
               i=1,...,LPC_FILTERORDER
               where FS=8000 is the sampling frequency

   Then, the windowed acf function acf1_win is obtained by

      acf1_win[i] = acf1[i] * lpc_lagwinTbl[i];
               i=0,...,LPC_FILTERORDER

   The second set of autocorrelation coefficients, acf2_win, is
   obtained in a similar manner.  The window, lpc_asymwinTbl, is applied
   to samples 60 through 299, i.e., the entire current block.  The
   window consists of two segments, the first (samples 0 to 219) being
   half a Hanning window with length 440 and the second a quarter of a
   cycle of a cosine wave.  By using this asymmetric window, an LPC
   analysis centered in the fifth sub-block is obtained without the need
   for any look-ahead, which would add delay.  The asymmetric window is
   defined as

      lpc_asymwinTbl[i] = (sin(PI * (i + 1) / 441))^2; i=0,...,219

      lpc_asymwinTbl[i] = cos((i - 220) * PI / 40); i=220,...,239

   and the windowed speech is computed by

      speech_hp_win2[i] = speech_hp[i + LPC_LOOKBACK] *
               lpc_asymwinTbl[i];  i=0,...,BLOCKL-1
