📄 ilbc.txt

📁 ilbc 源码
💻 TXT
📖 第 1 页 / 共 5 页
字号:

   The windowed autocorrelation coefficients are then obtained in
   exactly the same way as for the first analysis instance.

   The generation of the windows lpc_winTbl, lpc_asymwinTbl, and
   lpc_lagwinTbl are typically done in advance, and the arrays are
   stored in ROM rather than repeating the calculation for every block.

3.2.2.  Computation of LPC Coefficients

   From the 2 x 11 smoothed autocorrelation coefficients, acf1_win and
   acf2_win, the 2 x 11 LPC coefficients, lp1 and lp2, are calculated
   in the same way for both analysis locations by using the well known
   Levinson-Durbin recursion.  The first LPC coefficient is always 1.0,
   resulting in ten unique coefficients.

   After determining the LPC coefficients, a bandwidth expansion
   procedure is applied to smooth the spectral peaks in the
   short-term spectrum.  The bandwidth addition is obtained by the
   following modification of the LPC coefficients:

      lp1_bw[i] = lp1[i] * chirp^i; i=0,...,LPC_FILTERORDER
      lp2_bw[i] = lp2[i] * chirp^i; i=0,...,LPC_FILTERORDER

   where "chirp" is a real number between 0 and 1.  It is RECOMMENDED to
   use a value of 0.9.

3.2.3.  Computation of LSF Coefficients from LPC Coefficients

   Thus far, two sets of LPC coefficients that represent the short-term
   spectral characteristics of the speech signal for two different time
   locations within the current block have been determined.  These
   coefficients SHOULD be quantized and interpolated.  Before this is



Andersen, et al.              Experimental                     [Page 11]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   done, it is advantageous to convert the LPC parameters into another
   type of representation called Line Spectral Frequencies (LSF).  The
   LSF parameters are used because they are better suited for
   quantization and interpolation than the regular LPC coefficients.
   Many computationally efficient methods for calculating the LSFs from
   the LPC coefficients have been proposed in the literature.  The
   detailed implementation of one applicable method can be found in
   Appendix A.26.  The two arrays of LSF coefficients obtained, lsf1 and
   lsf2, are of dimension 10 (LPC_FILTERORDER).

3.2.4.  Quantization of LSF Coefficients

   Because the LPC filters defined by the two sets of LSFs are also
   needed in the decoder, the LSF parameters need to be quantized and
   transmitted as side information.  The total number of bits required
   to represent the quantization of the two LSF representations for one
   block of speech is 40, with 20 bits used for each of lsf1 and lsf2.

   For computational and storage reasons, the LSF vectors are quantized
   using three-split vector quantization (VQ).  That is, the LSF vectors
   are split into three sub-vectors that are each quantized with a
   regular VQ.  The quantized versions of lsf1 and lsf2, qlsf1 and
   qlsf2, are obtained by using the same memoryless split VQ.  The
   length of each of these two LSF vectors is 10, and they are split
   into three sub-vectors containing 3, 3, and 4 values, respectively.

   For each of the sub-vectors, a separate codebook of quantized values
   has been designed with a standard VQ training method for a large
   database containing speech from a large number of speakers recorded
   under various conditions.  The size of each of the three codebooks
   associated with the split definitions above is

      int size_lsfCbTbl[LSF_NSPLIT] = {64,128,128};

   The actual values of the vector quantization codebook that must be
   used can be found in the reference code of Appendix A.  Both sets of
   LSF coefficients, lsf1 and lsf2, are quantized with a standard
   memoryless split vector quantization (VQ) structure using the squared
   error criterion in the LSF domain.  The split VQ quantization
   consists of the following steps:

   1) Quantize the first three LSF coefficients (1 - 3) with a VQ
      codebook of size 64.
   2) Quantize the next three LSF coefficients 4 - 6 with VQ a codebook
      of size 128.
   3) Quantize the last four LSF coefficients (7 - 10) with a VQ
      codebook of size 128.




Andersen, et al.              Experimental                     [Page 12]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   This procedure, repeated for lsf1 and lsf2, gives six quantization
   indices and the quantized sets of LSF coefficients qlsf1 and qlsf2.
   Each set of three indices is encoded with 6 + 7 + 7 = 20 bits.  The
   total number of bits used for LSF quantization in a block is thus 40
   bits.

3.2.5.  Stability Check of LSF Coefficients

   The LSF representation of the LPC filter has the convenient property
   that the coefficients are ordered by increasing value, i.e., lsf(n-1)
   < lsf(n), 0 < n < 10, if the corresponding synthesis filter is
   stable.  As we are employing a split VQ scheme, it is possible that
   at the split boundaries the LSF coefficients are not ordered
   correctly and hence that the corresponding LP filter is unstable.  To
   ensure that the filter used is stable, a stability check is performed
   for the quantized LSF vectors.  If it turns out that the coefficients
   are not ordered appropriately (with a safety margin of 50 Hz to
   ensure that formant peaks are not too narrow), they will be moved
   apart.  The detailed method for this can be found in Appendix A.40.
   The same procedure is performed in the decoder.  This ensures that
   exactly the same LSF representations are used in both encoder and
   decoder.

3.2.6.  Interpolation of LSF Coefficients

   From the two sets of LSF coefficients that are computed for each
   block of speech, different LSFs are obtained for each sub-block by
   means of interpolation.  This procedure is performed for the original
   LSFs (lsf1 and lsf2), as well as the quantized versions qlsf1 and
   qlsf2, as both versions are used in the encoder.  Here follows a
   brief summary of the interpolation scheme; the details are found in
   the c-code of Appendix A.  In the first sub-block, the average of the
   second LSF vector from the previous block and the first LSF vector in
   the current block is used.  For sub-blocks two through five, the LSFs
   used are obtained by linear interpolation from lsf1 (and qlsf1) to
   lsf2 (and qlsf2), with lsf1 used in sub-block two and lsf2 in sub-
   block five.  In the last sub-block, lsf2 is used.  For the very first
   block it is assumed that the last LSF vector of the previous block is
   equal to a predefined vector, lsfmeanTbl, obtained by calculating the
   mean LSF vector of the LSF design database.

   lsfmeanTbl[LPC_FILTERORDER] = {0.281738, 0.445801, 0.663330,
                  0.962524, 1.251831, 1.533081, 1.850586, 2.137817,
                  2.481445, 2.777344}







Andersen, et al.              Experimental                     [Page 13]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   The interpolation method is standard linear interpolation in the LSF
   domain.  The interpolated LSF values are converted to LPC
   coefficients for each sub-block.  The unquantized and quantized LPC
   coefficients form two sets of filters respectively.  The unquantized
   analysis filter for sub-block k is defined as follows

                ___
                \
      Ak(z)= 1 + > ak(i)*z^(-i)
                /__
             i=1...LPC_FILTERORDER

   The quantized analysis filter for sub-block k is defined as follows
                 ___
                 \
      A~k(z)= 1 + > a~k(i)*z^(-i)
                 /__
             i=1...LPC_FILTERORDER

   A reference implementation of the lsf encoding is given in Appendix
   A.38.  A reference implementation of the corresponding decoding can
   be found in Appendix A.36.

3.2.7.  LPC Analysis and Quantization for 20 ms Frames

   As previously stated, the codec only calculates one set of LPC
   parameters for the 20 ms frame size as opposed to two sets for 30 ms
   frames.  A single set of autocorrelation coefficients is calculated
   on the LPC_LOOKBACK + BLOCKL = 80 + 160 = 240 samples.  These samples
   are windowed with the asymmetric window lpc_asymwinTbl, centered over
   the third sub-frame, to form speech_hp_win.  Autocorrelation
   coefficients, acf, are calculated on the 240 samples in speech_hp_win
   and then windowed exactly as in section 3.2.1 (resulting in
   acf_win).

   This single set of windowed autocorrelation coefficients is used to
   calculate LPC coefficients, LSF coefficients, and quantized LSF
   coefficients in exactly the same manner as in sections 3.2.3 through
   3.2.4.  As for the 30 ms frame size, the ten LSF coefficients are
   divided into three sub-vectors of size 3, 3, and 4 and quantized by
   using the same scheme and codebook as in section 3.2.4 to finally get
   3 quantization indices.  The quantized LSF coefficients are
   stabilized with the algorithm described in section 3.2.5.

   From the set of LSF coefficients computed for this block and those
   from the previous block, different LSFs are obtained for each sub-
   block by means of interpolation.  The interpolation is done linearly
   in the LSF domain over the four sub-blocks, so that the n-th sub-



Andersen, et al.              Experimental                     [Page 14]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   frame uses the weight (4-n)/4 for the LSF from old frame and the
   weight n/4 of the LSF from the current frame.  For the very first
   block the mean LSF, lsfmeanTbl, is used as the LSF from the previous
   block.  Similarly as seen in section 3.2.6, both unquantized, A(z),
   and quantized, A~(z), analysis filters are calculated for each of the
   four sub-blocks.

3.3.  Calculation of the Residual

   The block of speech samples is filtered by the quantized and
   interpolated LPC analysis filters to yield the residual signal.  In
   particular, the corresponding LPC analysis filter for each 40 sample
   sub-block is used to filter the speech samples for the same sub-
   block.  The filter memory at the end of each sub-block is carried
   over to the LPC filter of the next sub-block.  The signal at the
   output of each LP analysis filter constitutes the residual signal for
   the corresponding sub-block.

   A reference implementation of the LPC analysis filters is given in
   Appendix A.10.

3.4.  Perceptual Weighting Filter

   In principle any good design of a perceptual weighting filter can be
   applied in the encoder without compromising this codec definition.
   However, it is RECOMMENDED to use the perceptual weighting filter Wk
   for sub-block k specified below:

      Wk(z)=1/Ak(z/LPC_CHIRP_WEIGHTDENUM), where
                               LPC_CHIRP_WEIGHTDENUM = 0.4222

   This is a simple design with low complexity that is applied in the
   LPC residual domain.  Here Ak(z) is the filter obtained for sub-block
   k from unquantized but interpolated LSF coefficients.

3.5.  Start State Encoder

   The start state is quantized by using a common 6-bit scalar quantizer
   for the block and a 3-bit scalar quantizer operating on scaled
   samples in the weighted speech domain.  In the following we describe
   the state encoding in greater detail.










Andersen, et al.              Experimental                     [Page 15]

RFC 3951              Internet Low Bit Rate Codec          December 2004


3.5.1.  Start State Estimation

   The two sub-blocks containing the start state are determined by
   finding the two consecutive sub-blocks in the block having the
   highest power.  Advantageously, down-weighting is used in the
   beginning and end of the sub-frames, i.e., the following measure is
   computed (NSUB=4/6 for 20/30 ms frame size):

      nsub=1,...,NSUB-1
      ssqn[nsub] = 0.0;
      for (i=(nsub-1)*SUBL; i<(nsub-1)*SUBL+5; i++)
               ssqn[nsub] += sampEn_win[i-(nsub-1)*SUBL]*
                                 residual[i]*residual[i];
      for (i=(nsub-1)*SUBL+5; i<(nsub+1)*SUBL-5; i++)
               ssqn[nsub] += residual[i]*residual[i];
      for (i=(nsub+1)*SUBL-5; i<(nsub+1)*SUBL; i++)
               ssqn[nsub] += sampEn_win[(nsub+1)*SUBL-i-1]*
                                 residual[i]*residual[i];

   where sampEn_win[5]={1/6, 2/6, 3/6, 4/6, 5/6}; MAY be used.  The
   sub-frame number corresponding to the maximum value of
   ssqEn_win[nsub-1]*ssqn[nsub] is selected as the start state
   indicator.  A weighting of ssqEn_win[]={0.8,0.9,1.0,0.9,0.8} for 30
   ms frames and ssqEn_win[]={0.9,1.0,0.9} for 20 ms frames; MAY
   advantageously be used to bias the start state towards the middle of
💿 文件大小 67 K
👤 上传用户 liubixing
📂 所属分类压缩解压
🏷️ 相关标签

#ilbc #源码
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -