📄 draft-ietf-avt-ilbc-codec-05.txt

   computed and the speech signal is filtered through them to produce
   the residual signal. The codec uses scalar quantization of the
   dominant part, in terms of energy, of the residual signal for the
   block. The dominant state is of length 57/58 (20 ms/30 ms) samples
   
   Andersen et. al.  Experimental - Expires November 29th, 2004      5
                     Internet Low Bit Rate Codec               May 04
   
   and forms a start state for dynamic codebooks constructed from the
   already coded parts of the residual signal. These dynamic codebooks
   are used to code the remaining parts of the residual signal. By this
   method, coding independence between blocks is achieved, resulting in
   elimination of propagation of perceptual degradations due to packet
   loss. The method facilitates high-quality packet loss concealment
   (PLC).


2.1 Encoder
   
   The input to the encoder SHOULD be 16 bit uniform PCM sampled at 8
   kHz. It SHOULD be partitioned into blocks of BLOCKL=160/240 samples
   for the 20/30 ms frame size. Each block is divided into NSUB=4/6
   consecutive sub-blocks of SUBL=40 samples each. For the 30 ms frame
   size, the encoder performs two LPC_FILTERORDER=10 linear-predictive
   coding (LPC) analyses. The first analysis applies a smooth window
   centered over the 2nd sub-block and extending to the middle of the
   5th sub-block. The second LPC analysis applies a smooth asymmetric
   window centered over the 5th sub-block and extending to the end of
   the 6th sub-block. For the 20 ms frame size, one LPC_FILTERORDER=10
   linear-predictive coding (LPC) analysis is performed with a smooth
   window centered over the 3rd sub-block.
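
   The frame-size dependent constants above can be collected in one
   place. The following is a minimal sketch, not part of the reference
   implementation; the struct ilbc_mode and the function ilbc_mode_for
   are hypothetical names introduced for illustration, while the
   numeric values are the ones stated in this section.

```c
#include <assert.h>

#define SUBL 40 /* samples per sub-block, as stated in the text */

/* Hypothetical container for the per-mode constants named in the text. */
struct ilbc_mode {
    int frame_ms;     /* 20 or 30 */
    int blockl;       /* BLOCKL: samples per block */
    int nsub;         /* NSUB: sub-blocks per block */
    int state_len;    /* start state length in samples */
    int lpc_analyses; /* number of LPC analyses per block */
};

/* Return the constants for the 20 ms or the 30 ms frame size. */
static struct ilbc_mode ilbc_mode_for(int frame_ms)
{
    struct ilbc_mode m;

    assert(frame_ms == 20 || frame_ms == 30);
    m.frame_ms = frame_ms;
    m.blockl = (frame_ms == 20) ? 160 : 240;
    m.nsub = m.blockl / SUBL;                  /* 4 or 6 */
    m.state_len = (frame_ms == 20) ? 57 : 58;
    m.lpc_analyses = (frame_ms == 20) ? 1 : 2;
    return m;
}
```

   Sub-block k (counting from 0) then covers samples k*SUBL through
   k*SUBL+SUBL-1 of the block.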
   
   For each of the LPC analyses, a set of line-spectral frequencies
   (LSFs) is obtained, quantized, and interpolated to yield LSF
   coefficients for each sub-block. Subsequently, the LPC residual is
   computed using the quantized and interpolated LPC analysis filters.
   
   The two consecutive sub-blocks of the residual exhibiting the
   maximal weighted energy are identified. Within these 2 sub-blocks,
   the start state (segment) is selected from two choices: the first
   57/58 samples or the last 57/58 samples of the 2 consecutive sub-
   blocks. The selected segment is the one of higher energy. The start
   state is encoded with scalar quantization. 
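
   The two-step selection described above can be sketched as follows.
   This is an illustration under stated assumptions, not the reference
   procedure: the energy weights are not given in this section, so the
   sketch takes them as a caller-supplied array (one weight per pair of
   consecutive sub-blocks), and the names select_start_state and
   struct start_state are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SUBL 40 /* samples per sub-block */

/* Sum of squares of n samples. */
static float energy(const float *x, size_t n)
{
    float e = 0.0f;
    for (size_t i = 0; i < n; i++)
        e += x[i] * x[i];
    return e;
}

/* Hypothetical result type: where the start state lies in the block. */
struct start_state {
    int first_subblock; /* index of the first of the 2 sub-blocks */
    int offset;         /* 0 = first state_len samples, else the last */
};

/*
 * residual:  nsub * SUBL residual samples for one block.
 * weight:    nsub - 1 weights, one per pair of consecutive sub-blocks
 *            (assumed to be supplied by the caller).
 * state_len: 57 (20 ms) or 58 (30 ms).
 */
static struct start_state select_start_state(const float *residual,
                                             int nsub,
                                             const float *weight,
                                             int state_len)
{
    struct start_state s;
    int best = 0;
    float best_e = -1.0f;

    assert(nsub >= 2 && state_len > 0 && state_len <= 2 * SUBL);

    /* Step 1: pair of consecutive sub-blocks with maximal weighted energy. */
    for (int k = 0; k + 1 < nsub; k++) {
        float e = weight[k] * energy(residual + k * SUBL, 2 * SUBL);
        if (e > best_e) {
            best_e = e;
            best = k;
        }
    }

    /* Step 2: first or last state_len samples, whichever has more energy. */
    const float *pair = residual + best * SUBL;
    int tail = 2 * SUBL - state_len; /* 23 or 22 remaining samples */
    float e_first = energy(pair, (size_t)state_len);
    float e_last = energy(pair + tail, (size_t)state_len);

    s.first_subblock = best;
    s.offset = (e_first >= e_last) ? 0 : tail;
    return s;
}
```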
   
   A dynamic codebook encoding procedure is used to encode 1) the 23/22
   (20 ms/30 ms) remaining samples in the 2 sub-blocks containing the
   start state; 2) the sub-blocks after the start state in time; and
   3) the sub-blocks before the start state in time. Thus, the encoding
   target can be either the 23/22 samples remaining of the 2 sub-blocks
   containing the start state or a 40 sample sub-block. This target can
   consist of samples that are indexed forwards in time or backwards in
   time, depending on the location of the start state.
   
   The coding is based on an adaptive codebook that is built from a
   codebook memory which contains decoded LPC excitation samples from
   the already encoded part of the block. These samples are indexed in
   the same time direction as the target vector and end at the sample
   instant prior to the first sample instant represented in the target
   vector. The codebook is used in CB_NSTAGES=3 stages in a successive
   refinement approach, and the resulting 3 code vector gains are
   encoded with 5, 4, and 3 bit scalar quantization, respectively.
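
   The successive refinement idea can be sketched as a generic
   multi-stage gain-shape search. This is a simplified illustration,
   not the reference search: it omits the noise shaping and the 5, 4,
   and 3 bit gain quantization, and it assumes the codebook is passed
   in as a flat array. The names cb_search and struct cb_stage are
   hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CB_NSTAGES 3 /* number of refinement stages, from the text */

/* Hypothetical per-stage result: chosen code vector and its gain. */
struct cb_stage {
    int index;
    float gain;
};

/*
 * target: len samples; overwritten with the error left after each stage.
 * cb:     ncb code vectors of len samples each, stored back to back.
 * out:    one (index, unquantized gain) pair per stage.
 */
static void cb_search(float *target, size_t len,
                      const float *cb, int ncb,
                      struct cb_stage out[CB_NSTAGES])
{
    assert(len > 0 && ncb > 0);

    for (int s = 0; s < CB_NSTAGES; s++) {
        int best = 0;
        float best_score = -1.0f;
        float best_gain = 0.0f;

        for (int k = 0; k < ncb; k++) {
            const float *v = cb + (size_t)k * len;
            float corr = 0.0f, en = 0.0f;

            for (size_t i = 0; i < len; i++) {
                corr += target[i] * v[i];
                en += v[i] * v[i];
            }
            if (en <= 0.0f)
                continue; /* skip all-zero code vectors */

            /* With the optimal gain corr/en, the squared error drops
               by corr^2/en, so maximizing it minimizes the error. */
            float score = corr * corr / en;
            if (score > best_score) {
                best_score = score;
                best = k;
                best_gain = corr / en;
            }
        }

        out[s].index = best;
        out[s].gain = best_gain;

        /* The next stage refines what this stage left uncoded. */
        const float *v = cb + (size_t)best * len;
        for (size_t i = 0; i < len; i++)
            target[i] -= best_gain * v[i];
    }
}
```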
   
   
   The codebook search method employs noise shaping derived from the
   LPC filters and the main decision criterion is minimizing the
   squared error between the target vector and the code vectors. Each
   code vector in this codebook comes from one of CB_EXPAND=2 codebook
   sections. The first section is filled with delayed, already encoded
   residual vectors. The code vectors of the second codebook section
   are constructed by predefined linear combinations of vectors in the
   first section of the codebook.


   Since codebook encoding with squared-error matching is known to
   produce a coded signal of less power than the scalar quantized start
   state signal, a gain re-scaling method is implemented by a refined
   search for a better set of codebook gains in terms of power matching
   after encoding. This is done by searching for a higher value of the
   gain factor for the first stage codebook since the subsequent stage
   codebook gains are scaled by the first stage gain.


2.2 Decoder
   
   For packet communications, typically a jitter buffer placed at the
   receiving end decides whether the packet containing an encoded
   signal block has been received or lost. This logic is not part of
   the codec described here. For each received encoded signal block the
   decoder performs a decoding. For each lost signal block the decoder
   performs a PLC operation.
   
   The decoding for each block starts by decoding and interpolating the
   LPC coefficients. Subsequently the start state is decoded.
   
   For codebook encoded segments, each segment is decoded by
   constructing the 3 code vectors given by the received codebook
   indices in the same way as the code vectors were constructed in the
   encoder. The 3 gain factors are also decoded and the resulting
   decoded signal is given by the sum of the 3 codebook vectors scaled
   with respective gain.
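
   The reconstruction of one codebook encoded segment can be sketched
   as a gain-weighted sum. The sketch assumes the 3 code vectors and
   the 3 gains have already been decoded; the function name
   cb_reconstruct is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CB_NSTAGES 3 /* number of code vectors per segment */

/*
 * out:  len decoded samples for the segment.
 * vec:  the CB_NSTAGES code vectors, len samples each.
 * gain: the CB_NSTAGES decoded gain factors.
 */
static void cb_reconstruct(float *out, size_t len,
                           const float *const vec[CB_NSTAGES],
                           const float gain[CB_NSTAGES])
{
    assert(out != NULL && len > 0);

    for (size_t i = 0; i < len; i++) {
        float acc = 0.0f;
        for (int s = 0; s < CB_NSTAGES; s++)
            acc += gain[s] * vec[s][i]; /* scale, then accumulate */
        out[i] = acc;
    }
}
```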
   
   An enhancement algorithm is applied on the reconstructed excitation
   signal. This enhancement augments the periodicity of voiced speech
   regions. The enhancement is optimized under the constraint that the
   modification signal (defined as the difference between the enhanced
   excitation and the excitation signal prior to enhancement) has a
   short-time energy that does not exceed a preset fraction of the
   short-time energy of the excitation signal prior to enhancement.
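
   The energy constraint on the modification signal can be written as a
   simple check. This sketch only tests the constraint over one
   analysis segment; it does not perform the enhancement itself. The
   preset fraction is not given in this section, so it is a parameter
   here, and the name enhancement_within_bound is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the modification signal (enhanced - original) has an
 * energy of at most max_fraction times the energy of the original
 * excitation over the n samples given, and 0 otherwise.
 */
static int enhancement_within_bound(const float *original,
                                    const float *enhanced,
                                    size_t n, float max_fraction)
{
    float e_orig = 0.0f; /* energy of excitation prior to enhancement */
    float e_mod = 0.0f;  /* energy of the modification signal */

    assert(max_fraction >= 0.0f);

    for (size_t i = 0; i < n; i++) {
        float d = enhanced[i] - original[i];
        e_mod += d * d;
        e_orig += original[i] * original[i];
    }
    return e_mod <= max_fraction * e_orig;
}
```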
   
   A packet loss concealment (PLC) operation is easily embedded in the
   decoder. The PLC operation can, e.g., be based on repetition of LPC
   filters and obtaining the LPC residual signal using a long term
   prediction estimate from previous residual blocks. 
   






   
   
3. ENCODER PRINCIPLES
   
   The following block diagram is an overview of all the components of
   the iLBC encoding procedure. The description of the blocks contains
   references to the section where that particular procedure is
   described further.
   
              +-----------+    +---------+    +---------+   
    speech -> | 1. Pre P  | -> | 2. LPC  | -> | 3. Ana  | ->
              +-----------+    +---------+    +---------+   
   
              +---------------+   +--------------+    
           -> | 4. Start Sel  | ->| 5. Scalar Qu | -> 
              +---------------+   +--------------+   
   
              +--------------+    +---------------+
           -> |6. CB Search  | -> | 7. Packetize  | -> payload
           |  +--------------+ |  +---------------+
           ----<---------<------
        sub-frame 0..2/4 (20 ms/30 ms)
   
   Figure 3.1. Flow chart of the iLBC encoder
   
   1. Pre-process speech with a HP filter if needed (section 3.1).
   2. Compute LPC parameters, quantize and interpolate (section 3.2).
   3. Use analysis filters on speech to compute residual (section 3.3).
   4. Select position of 57/58 sample start state (section 3.5).
   5. Quantize the 57/58 sample start state with scalar quantization
      (section 3.5).
   6. Search the codebook for each sub-frame. Start with 23/22 sample
      block, then encode sub-blocks forward in time and then encode
      sub-blocks backward in time. For each block the steps in figure
      3.4 are performed (section 3.6).
   7. Packetize the bits into the payload specified in table 3.2.
   
   The input to the encoder SHOULD be 16 bit uniform PCM sampled at 8
   kHz. It SHOULD also be partitioned into blocks of BLOCKL=160/240
   samples. Each block input to the encoder is divided into NSUB=4/6
   consecutive sub-blocks of SUBL=40 samples each.
   















   
   
               0        39        79       119       159
               +---------------------------------------+
               |    1    |    2    |    3    |    4    |
               +---------------------------------------+
                              20 ms frame
   
     0        39        79       119       159       199       239
     +-----------------------------------------------------------+
     |    1    |    2    |    3    |    4    |    5    |    6    |
     +-----------------------------------------------------------+
                               30 ms frame
   
   Figure 3.2. One input block to the encoder for 20 ms (with 4 sub-
   frames) and 30 ms (with 6 sub-frames).


3.1 Pre-processing
   
   In some applications the recorded speech signal contains a DC level
   and/or 50/60 Hz noise. If these components have not been removed
   prior to the encoder call, they should be removed by a high-pass
   filter. A reference implementation of this, using a filter with a
   cutoff frequency of 90 Hz, can be found in Appendix A.28.
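
   To illustrate the idea only, the following sketch removes a DC level
   with a generic first-order RC high-pass filter. It is NOT the
   reference filter from Appendix A.28, whose structure and
   coefficients are not reproduced in this section; the names hp_filter
   and struct hp_state are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PI 3.14159265358979323846

/* Filter memory carried from one call to the next. */
struct hp_state {
    float prev_in;
    float prev_out;
};

/*
 * First-order high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]),
 * with alpha = RC / (RC + dt), RC = 1 / (2*PI*fc), dt = 1 / fs.
 * This is a textbook filter chosen for brevity, not the draft's filter.
 */
static void hp_filter(struct hp_state *st, const float *in, float *out,
                      size_t n, float fs_hz, float fc_hz)
{
    assert(st != NULL && fs_hz > 0.0f && fc_hz > 0.0f);

    float alpha = 1.0f / (1.0f + (float)(2.0 * PI) * fc_hz / fs_hz);

    for (size_t i = 0; i < n; i++) {
        float y = alpha * (st->prev_out + in[i] - st->prev_in);
        st->prev_in = in[i];
        st->prev_out = y;
        out[i] = y;
    }
}
```

   For 8 kHz input and a 90 Hz cutoff, a call would look like
   hp_filter(&st, in, out, BLOCKL, 8000.0f, 90.0f), with st zeroed
   before the first block.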


3.2 LPC Analysis and Quantization
   
   The input to the LPC analysis module is a possibly high-pass
   filtered speech buffer, speech_hp, that contains 240/300
   (LPC_LOOKBACK + BLOCKL = 80/60 + 160/240 = 240/300) speech samples,
   where samples 0 through 79/59 are from the previous block and
   samples 80/60 through 239/299 are from the current block. No
   look-ahead into the next block is used. For the very first block
   processed, the look back samples are assumed to be zeros.
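
   The buffer layout above implies a simple update per block: keep the
   tail of the previous block as look-back, then append the new block.
   The following sketch shows this for the 30 ms case (LPC_LOOKBACK=60,
   BLOCKL=240); the macro names with the _30MS suffix and the function
   lpc_buffer_push are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BLOCKL_30MS 240
#define LPC_LOOKBACK_30MS 60
#define LPC_BUF_30MS (LPC_LOOKBACK_30MS + BLOCKL_30MS) /* 300 samples */

/*
 * buf:   speech_hp buffer of LPC_BUF_30MS samples. It must be zeroed
 *        before the first block so the look-back starts as zeros.
 * block: the BLOCKL_30MS samples of the current block.
 */
static void lpc_buffer_push(float *buf, const float *block)
{
    assert(buf != NULL && block != NULL);

    /* The last LPC_LOOKBACK samples of the previous block become
       samples 0..59 of the new buffer. */
    memmove(buf, buf + BLOCKL_30MS, LPC_LOOKBACK_30MS * sizeof(float));

    /* The current block fills samples 60..299. */
    memcpy(buf + LPC_LOOKBACK_30MS, block, BLOCKL_30MS * sizeof(float));
}
```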
   
   For each input block, the LPC analysis calculates one/two set(s) of
   LPC_FILTERORDER=10 LPC filter coefficients using the autocorrelation
   method and the Levinson-Durbin recursion. These coefficients are
   converted to the Line Spectrum Frequency representation. In the 20
   ms case the set, lsf, represents the spectral characteristics as
   measured at the center of the third sub-block. For 30 ms frames the
   first set, lsf1, represents the spectral properties of the input
   signal at the center of the second sub-block while the other set,
   lsf2, represents the spectral characteristics as measured at the
   center of the fifth sub-block. The details of the computation for 30
   ms frames are described in 3.2.1 through 3.2.6. Section 3.2.7
   explains how the LPC Analysis and Quantization differs for 20 ms
   frames.
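
   For readers unfamiliar with the Levinson-Durbin recursion mentioned
   above, a textbook form is sketched below. It is a generic version,
   not copied from the reference implementation, and it assumes the
   sign convention A(z) = 1 + a[1]*z^-1 + ... + a[order]*z^-order. The
   function name levinson_durbin is hypothetical.

```c
#include <assert.h>

#define LPC_FILTERORDER 10

/*
 * r:     order + 1 autocorrelation coefficients, with r[0] > 0.
 * a:     order + 1 output coefficients of A(z); a[0] is set to 1.
 * Returns the final prediction error energy.
 */
static float levinson_durbin(const float *r, float *a, int order)
{
    float tmp[LPC_FILTERORDER + 1];
    float err;

    assert(order >= 1 && order <= LPC_FILTERORDER && r[0] > 0.0f);

    err = r[0];
    a[0] = 1.0f;
    for (int i = 1; i <= order; i++)
        a[i] = 0.0f;

    for (int m = 1; m <= order; m++) {
        /* Reflection coefficient for order m. */
        float acc = r[m];
        for (int j = 1; j < m; j++)
            acc += a[j] * r[m - j];
        float k = -acc / err;

        /* Update the lower-order coefficients, then append k. */
        for (int j = 1; j < m; j++)
            tmp[j] = a[j] + k * a[m - j];
        for (int j = 1; j < m; j++)
            a[j] = tmp[j];
        a[m] = k;

        err *= (1.0f - k * k);
    }
    return err;
}
```

   The sketch assumes the error energy stays positive, which the noise
   floor described in section 3.2.1 helps ensure in practice.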


 3.2.1 Computation of Autocorrelation Coefficients
   
   The first step in the LPC analysis procedure is to calculate
   autocorrelation coefficients using windowed speech samples. This
   windowing is the only difference in the LPC analysis procedure for
   the two sets of coefficients. For the first set, a 240 sample long
   standard symmetric Hanning window is applied to samples 0 through
   239 of the input data. The first window, lpc_winTbl, is defined as:
   
         lpc_winTbl[i]= 0.5 * (1.0 - cos((2*PI*(i+1))/(BLOCKL+1)));
                  i=0,...,119
         lpc_winTbl[i] = lpc_winTbl[BLOCKL - i - 1]; i=120,...,239
   
   The windowed speech speech_hp_win1 is then obtained by multiplying
   the first 240 samples of the input speech buffer with the window
   coefficients:
   
         speech_hp_win1[i] = speech_hp[i] * lpc_winTbl[i];
                  i=0,...,BLOCKL-1
   
   From these 240 windowed speech samples, 11 (LPC_FILTERORDER + 1)
   autocorrelation coefficients, acf1, are calculated:
   
         acf1[lag] += speech_hp_win1[n] * speech_hp_win1[n + lag];
                  lag=0,...,LPC_FILTERORDER; n=0,...,BLOCKL-lag-1
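
   The windowing and autocorrelation steps above can be combined in one
   loop. The sketch below assumes the window table has already been
   filled in according to the definition of lpc_winTbl and is passed in
   by the caller; the function name windowed_autocorr is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * speech: n input samples (speech_hp in the text).
 * window: n window coefficients (lpc_winTbl in the text).
 * acf:    order + 1 output autocorrelation coefficients.
 */
static void windowed_autocorr(const float *speech, const float *window,
                              size_t n, float *acf, size_t order)
{
    assert(n > order);

    for (size_t lag = 0; lag <= order; lag++) {
        float sum = 0.0f;
        for (size_t i = 0; i + lag < n; i++) {
            /* Product of two windowed samples that are lag apart. */
            sum += (speech[i] * window[i]) *
                   (speech[i + lag] * window[i + lag]);
        }
        acf[lag] = sum;
    }
}
```

   For the first coefficient set of a 30 ms frame, the call would use
   n=240 and order=LPC_FILTERORDER.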
   
   In order to make the analysis more robust against numerical
   precision problems, a spectral smoothing procedure is applied by
   windowing the autocorrelation coefficients before the LPC
   coefficients are computed. Also, a white noise floor is added to the
   autocorrelation function by multiplying coefficient zero by 1.0001
   (40 dB below the energy of the windowed speech signal). These two
   steps are implemented by multiplying the autocorrelation