draft-ietf-avt-ilbc-codec-05.txt
for the block and a 3-bit scalar quantizer operating on scaled
samples in the weighted speech domain. In the following we describe
the state encoding in greater detail.
3.5.1 Start State Estimation
The two sub-blocks containing the start state are determined by
finding the two consecutive sub-blocks in the block having the
highest power. Advantageously, down-weighting is used at the
beginning and end of the sub-frames. That is, the following measure
is computed (NSUB=4/6 for 20/30 ms frame size):
nsub=1,...,NSUB-1
    ssqn[nsub] = 0.0;
    for (i=(nsub-1)*SUBL; i<(nsub-1)*SUBL+5; i++)
        ssqn[nsub] += sampEn_win[i-(nsub-1)*SUBL]*
                        residual[i]*residual[i];
    for (i=(nsub-1)*SUBL+5; i<(nsub+1)*SUBL-5; i++)
        ssqn[nsub] += residual[i]*residual[i];
    for (i=(nsub+1)*SUBL-5; i<(nsub+1)*SUBL; i++)
        ssqn[nsub] += sampEn_win[(nsub+1)*SUBL-i-1]*
                        residual[i]*residual[i];
where sampEn_win[5] = {1/6, 2/6, 3/6, 4/6, 5/6} MAY be used. The
sub-frame number corresponding to the maximum value of
ssqEn_win[nsub-1]*ssqn[nsub] is selected as the start state
indicator. A weighting of ssqEn_win[] = {0.8, 0.9, 1.0, 0.9, 0.8}
for 30 ms frames and ssqEn_win[] = {0.9, 1.0, 0.9} for 20 ms frames
MAY advantageously be used to bias the start state towards the
middle of the frame.
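As a rough illustration, the measure above and the weighted selection
can be sketched in C for the 20 ms case as follows. The helper names
(ssq_energy, find_start_state) are not taken from the reference
implementation, and floating-point doubles are used instead of the
fixed-point arithmetic of the actual codec:

```c
#define SUBL 40
#define NSUB 4   /* 20 ms frame; 6 for 30 ms */

static const double sampEn_win[5] = {1.0/6, 2.0/6, 3.0/6, 4.0/6, 5.0/6};
static const double ssqEn_win[NSUB-1] = {0.9, 1.0, 0.9}; /* 20 ms weighting */

/* Weighted energy of the two consecutive sub-blocks starting at
 * sub-block nsub-1 (nsub = 1..NSUB-1), with the first and last five
 * samples down-weighted by sampEn_win[]. */
static double ssq_energy(const double *residual, int nsub)
{
    double ssqn = 0.0;
    int i;
    for (i = (nsub-1)*SUBL; i < (nsub-1)*SUBL + 5; i++)
        ssqn += sampEn_win[i - (nsub-1)*SUBL] * residual[i]*residual[i];
    for (i = (nsub-1)*SUBL + 5; i < (nsub+1)*SUBL - 5; i++)
        ssqn += residual[i]*residual[i];
    for (i = (nsub+1)*SUBL - 5; i < (nsub+1)*SUBL; i++)
        ssqn += sampEn_win[(nsub+1)*SUBL - i - 1] * residual[i]*residual[i];
    return ssqn;
}

/* Returns the start state indicator (1..NSUB-1). */
static int find_start_state(const double *residual)
{
    int nsub, best = 1;
    double bestVal = -1.0;
    for (nsub = 1; nsub <= NSUB-1; nsub++) {
        double val = ssqEn_win[nsub-1] * ssq_energy(residual, nsub);
        if (val > bestVal) { bestVal = val; best = nsub; }
    }
    return best;
}
```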
For 20 ms frames there are 3 possible positions of the two-sub-block
length maximum power segment, so the start state position is encoded
using 2 bits. The start state position, start, MUST be encoded as:
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
For 30 ms frames there are 5 possible positions of the two-sub-block
length maximum power segment, so the start state position is encoded
using 3 bits. The start state position, start, MUST be encoded as:
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
start=4: start state in sub-frame 3 and 4
start=5: start state in sub-frame 4 and 5
Hence, in both cases, index 0 is not utilized. For bit rate
efficiency, the start state is shortened to STATE_SHORT_LEN=57
samples for 20 ms frames and
Andersen et. al. Experimental - Expires November 29th, 2004 15
Internet Low Bit Rate Codec May 04
STATE_SHORT_LEN=58 samples for 30 ms frames. The power of the first
23/22 and last 23/22 samples of the 2 sub-frame block identified
above is computed as the sum of the squared signal sample values and
the 23/22 sample segment with the lowest power is excluded from the
start state. One bit is transmitted to indicate which of the 2
possible 57/58 sample segments is used. The start state position
within the 2 sub-frames determined above, state_first, MUST be
encoded as:
state_first=1: start state is first STATE_SHORT_LEN samples
state_first=0: start state is last STATE_SHORT_LEN samples
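The selection between the two candidate segments can be sketched as
below; choose_state_first is a hypothetical helper (the reference
implementation in the appendix is normative), shown here for the
20 ms case:

```c
#define STATE_SHORT_LEN 57   /* 20 ms; 58 for 30 ms */
#define TWO_SUBL 80          /* two 40-sample sub-blocks */

/* Returns state_first: 1 if the start state is the first
 * STATE_SHORT_LEN samples, 0 if it is the last STATE_SHORT_LEN
 * samples of the two-sub-block segment. */
static int choose_state_first(const double *block /* TWO_SUBL samples */)
{
    int nDrop = TWO_SUBL - STATE_SHORT_LEN; /* 23 (20 ms) or 22 (30 ms) */
    double firstPow = 0.0, lastPow = 0.0;
    int i;
    for (i = 0; i < nDrop; i++)
        firstPow += block[i]*block[i];
    for (i = TWO_SUBL - nDrop; i < TWO_SUBL; i++)
        lastPow += block[i]*block[i];
    /* Exclude the lower-power end: if the last nDrop samples are the
     * weakest, the start state is the first STATE_SHORT_LEN samples. */
    return (lastPow < firstPow) ? 1 : 0;
}
```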
3.5.2 All-Pass Filtering and Scale Quantization
The block of residual samples in the start state is first filtered
by an all-pass filter with the quantized LPC coefficients as
denominator and reversed quantized LPC coefficients as numerator.
The purpose of this phase-dispersion filter is to get a more even
distribution of the sample values in the residual signal. The
filtering is performed by circular convolution, where the initial
filter memory is set to zero.
res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0

Pk(z) = A~rk(z)/A~k(z), where

                                 ___
                                 \
A~rk(z) = z^(-LPC_FILTERORDER) + > a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                 /__
                              i=0...(LPC_FILTERORDER-1)

and A~k(z) is taken from the block where the start state begins

res -> Pk(z) -> filtered

ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
                                     k=0..(STATE_SHORT_LEN-1)
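Under the definitions above, the circular convolution can be sketched
as a direct-form filtering of the zero-padded state followed by
folding the second half onto the first. This is an illustrative
floating-point sketch, not the reference code: allpass_circular is a
hypothetical name, and lpc[] is assumed to hold the quantized
coefficients a~k(0..LPC_FILTERORDER) with a~k(0) = 1:

```c
#include <string.h>

#define LPC_FILTERORDER 10
#define STATE_SHORT_LEN 57

static void allpass_circular(const double *res,   /* STATE_SHORT_LEN samples */
                             const double *lpc,   /* LPC_FILTERORDER+1 coeffs */
                             double *ccres)       /* STATE_SHORT_LEN samples */
{
    double in[2*STATE_SHORT_LEN] = {0};
    double filtered[2*STATE_SHORT_LEN];
    int n, i;

    /* Zero-pad: second half of the input stays 0. */
    memcpy(in, res, STATE_SHORT_LEN*sizeof(double));

    for (n = 0; n < 2*STATE_SHORT_LEN; n++) {
        /* Numerator A~rk(z), the reversed coefficients: the tap at
         * delay d is a~k(LPC_FILTERORDER-d), with a~k(0)=1 at delay
         * LPC_FILTERORDER. */
        double acc = 0.0;
        for (i = 0; i <= LPC_FILTERORDER; i++)
            if (n - i >= 0)
                acc += lpc[LPC_FILTERORDER - i] * in[n - i];
        /* Denominator A~k(z): feedback of past outputs. */
        for (i = 1; i <= LPC_FILTERORDER; i++)
            if (n - i >= 0)
                acc -= lpc[i] * filtered[n - i];
        filtered[n] = acc;
    }
    /* Circular convolution: fold the tail back onto the first half. */
    for (n = 0; n < STATE_SHORT_LEN; n++)
        ccres[n] = filtered[n] + filtered[n + STATE_SHORT_LEN];
}
```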
The all-pass filtered block is searched for its largest magnitude
sample. The 10-logarithm of this magnitude is quantized with a 6-bit
quantizer, state_frgqTbl, by finding the nearest representation.
This results in an index, idxForMax, corresponding to a quantized
value, qmax. The all-pass filtered residual samples in the block are
then multiplied by a scaling factor scal=4.5/(10^qmax) to yield
normalized samples.
state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
3.666050, 3.740994, 3.830749, 3.938770, 4.101764}
3.5.3 Scalar Quantization
The normalized samples are quantized in the perceptually weighted
speech domain by a sample-by-sample scalar DPCM quantization as
depicted in Figure 3.3. Each sample in the block is filtered by a
weighting filter Wk(z), specified in section 3.4, to form a weighted
speech sample x[n]. The target sample d[n] is formed by subtracting
a predicted sample y[n], where the prediction filter is given by
Pk(z) = 1 - 1 / Wk(z).
             +-------+  x[n]   + d[n] +-----------+  u[n]
residual --->| Wk(z) |------->(+)---->| Quantizer |---+---> quantized
             +-------+      - /|\     +-----------+   |    residual
                               |                     \|/
                       y[n]    +--------------------->(+)
                               |                      |
                               |       +------+       |
                               +-------| Pk(z)|<------+
                                       +------+
Figure 3.3. Quantization of start state samples by DPCM in weighted
speech domain.
The coded state sample u[n] is obtained by quantizing d[n] with a 3-
bit quantizer with quantization table state_sq3Tbl.
state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
0.444214, 1.329712, 2.436279, 3.983887}
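A much-simplified sketch of this loop is given below: the weighting
filter Wk(z) is omitted (the input is assumed to already be in the
weighted domain), and a hypothetical first-order smoother stands in
for the predictor Pk(z), so only the quantize-and-feed-back
structure of figure 3.3 is shown:

```c
#include <math.h>

static const double state_sq3Tbl[8] = {
    -3.719849, -2.177490, -1.130005, -0.309692,
     0.444214,  1.329712,  2.436279,  3.983887
};

/* 3-bit scalar quantizer: nearest table entry. */
static int quantize_sq3(double d)
{
    int i, best = 0;
    for (i = 1; i < 8; i++)
        if (fabs(state_sq3Tbl[i] - d) < fabs(state_sq3Tbl[best] - d))
            best = i;
    return best;
}

/* x:   weighted-domain samples, len entries
 * idx: emitted 3-bit indices
 * u:   quantized difference samples */
static void dpcm_quantize(const double *x, int len, int *idx, double *u)
{
    double pred = 0.0;                 /* y[n]; zero initial memory */
    int n;
    for (n = 0; n < len; n++) {
        double d = x[n] - pred;        /* target sample d[n]        */
        idx[n] = quantize_sq3(d);
        u[n]   = state_sq3Tbl[idx[n]];
        /* Stand-in predictor (NOT the codec's Pk(z)): a first-order
         * smoother on the reconstructed sample u[n] + y[n]. */
        pred = 0.5*(u[n] + pred);
    }
}
```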
The quantized samples are transformed back to the residual domain
by: 1) scaling with 1/scal; 2) time-reversing the scaled samples;
3) filtering the time-reversed samples with the same all-pass filter
as in section 3.5.2, using circular convolution; and 4) time-
reversing the filtered samples. (This is described in more detail in
section 4.2.)
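The four steps can be sketched as follows, with the all-pass stage
abstracted behind a function pointer so only the scale/reverse
plumbing is shown; state_to_residual and time_reverse are
illustrative names, not taken from the reference code:

```c
/* Abstract the circular all-pass filtering of section 3.5.2. */
typedef void (*allpass_fn)(const double *in, double *out, int len);

static void time_reverse(double *x, int len)
{
    int i;
    for (i = 0; i < len/2; i++) {
        double tmp = x[i];
        x[i] = x[len-1-i];
        x[len-1-i] = tmp;
    }
}

/* u is modified in place; res receives the residual-domain samples. */
static void state_to_residual(double *u, int len, double scal,
                              allpass_fn filt, double *res)
{
    int i;
    for (i = 0; i < len; i++)
        u[i] *= 1.0/scal;        /* 1) scale with 1/scal            */
    time_reverse(u, len);        /* 2) time-reverse the samples     */
    filt(u, res, len);           /* 3) circular all-pass filtering  */
    time_reverse(res, len);      /* 4) time-reverse the result      */
}
```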
A reference implementation of the start state encoding can be found
in Appendix A.46.
3.6 Encoding the remaining samples
A dynamic codebook is used to encode 1) the 23/22 remaining samples
in the 2 sub-blocks containing the start state; 2) the sub-blocks
after the start state in time; and 3) the sub-blocks before the
start state in time. Thus, the encoding target can be either the
23/22 samples remaining of the 2 sub-blocks containing the start
state or a 40-sample sub-block. This target can consist of
samples that are indexed forwards in time or backwards in time
depending on the location of the start state. The length of the
target is denoted by lTarget.
The coding is based on an adaptive codebook that is built from a
codebook memory containing decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as the target vector and end at the sample
instant prior to the first sample instant represented in the target
vector. The codebook memory has length lMem, which is equal to
CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
23/22-sample sub-block.
The following figure shows an overview of the encoding procedure.
   +-----------+    +--------------+    +-------------+
-> | 1. Decode | -> | 2. Mem setup | -> | 3. Perc. W. | ->
   +-----------+    +--------------+    +-------------+

     +-----------+    +----------------+
  -> | 4. Search | -> | 5. Upd. Target | ------------------>
   | +-----------+    +----------------+ |
   ----<------------<---------------------
                stage=0..2

     +----------------+
  -> | 6. Recalc G[0] | ---------------> gains and CB indices
     +----------------+
Figure 3.4. Flow chart of the codebook search in the iLBC encoder
1. Decode the part of the residual that has been encoded so far,
using the codebook without perceptual weighting
2. Set up the memory by taking data from the decoded residual.
Codebooks are constructed from this memory. For blocks preceding the
start state, both the decoded residual and the target are time
reversed (section 3.6.1)
3. Filter the memory + target with the perceptual weighting filter
(section 3.6.2)
4. Search for the best match between the target and the codebook
vector. Compute the optimal gain for this match and quantize that
gain (section 3.6.4)
5. Update the perceptually weighted target by subtracting the
contribution from the selected codebook vector from the perceptually
weighted memory (quantized gain times selected vector). Repeat 4.
and 5. for the 2 additional stages
6. Calculate the energy loss due to encoding of the residual. If
needed, compensate for this loss by an upscaling and requantization
of the gain for the first stage (section 3.7)
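Steps 4 and 5 and the stage loop can be sketched as below. The
matched-filter criterion (maximizing the squared correlation over
the codebook vector energy) and the unquantized gain are
simplifications for illustration; sections 3.6.3 and 3.6.4 define
the actual codebook construction and gain quantization, and the
function names here are hypothetical:

```c
#define LTARGET 40
#define NSTAGES 3

/* Step 4: best codebook index by maximizing (x.c)^2 / (c.c),
 * returning the corresponding (unquantized) gain x.c / c.c. */
static int search_best(const double *target,
                       const double cb[][LTARGET], int cbSize,
                       double *gain)
{
    int j, best = 0;
    double bestMeasure = -1.0, bestGain = 0.0;
    for (j = 0; j < cbSize; j++) {
        double cross = 0.0, energy = 0.0;
        int n;
        for (n = 0; n < LTARGET; n++) {
            cross  += target[n]*cb[j][n];
            energy += cb[j][n]*cb[j][n];
        }
        if (energy > 0.0 && cross*cross/energy > bestMeasure) {
            bestMeasure = cross*cross/energy;
            bestGain = cross/energy;
            best = j;
        }
    }
    *gain = bestGain;
    return best;
}

/* Steps 4+5 repeated for the 3 stages: after each stage the scaled
 * selected vector is subtracted from the target. */
static void multistage_search(double *target,
                              const double cb[][LTARGET], int cbSize,
                              int idx[NSTAGES], double gain[NSTAGES])
{
    int stage, n;
    for (stage = 0; stage < NSTAGES; stage++) {
        idx[stage] = search_best(target, cb, cbSize, &gain[stage]);
        for (n = 0; n < LTARGET; n++)
            target[n] -= gain[stage]*cb[idx[stage]][n];
    }
}
```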
The following sections provide an in-depth description of the
different blocks of figure 3.4.
3.6.1 Codebook Memory
The codebook memory is based on the already encoded sub-blocks, so
the available data for encoding increases for each new sub-block
that has been encoded. Until enough sub-blocks have been encoded to
fill the codebook memory with data, it is padded with zeros. The
following figure shows an example of the order in which the sub-
blocks are encoded for the 30 ms frame size if the start state is
located in the last 58 samples of sub-blocks 2 and 3.
+----------------------------------------------------------+
|    5    | 1 |//////////////|    2    |    3    |    4    |
+----------------------------------------------------------+
Figure 3.5. The order from 1 to 5 in which the sub-blocks are
encoded. The slashed area is the start state.
The first target sub-block to be encoded is number 1 and the
corresponding codebook memory is shown in the following figure.
Since the target vector is before the start state in time, the
codebook memory and target vector are time reversed. By reversing
them in time, the search algorithm can be reused. Since only the
start state has been encoded so far, the last samples of the
codebook memory are padded with zeros.
+-----------------------+
|zeros|\\\\\\\\|\\\\| 1 |
+-----------------------+
Figure 3.6. The codebook memory, length lMem=85 samples, and the
target vector 1, length 22 samples.
The next step is to encode sub-block 2 using the memory, which has
now grown because sub-block 1 has been encoded. The following figure
shows the codebook memory for the encoding of sub-block 2.
+----------------------------+
| zeros | 1 |///|////////| 2 |
+----------------------------+
Ctrl + -