Andersen, et al.              Experimental                     [Page 16]

RFC 3951              Internet Low Bit Rate Codec          December 2004


   Hence, in both cases, index 0 is not used.  In order to shorten the
   start state for bit rate efficiency, the start state is brought
   down to STATE_SHORT_LEN=57 samples for 20 ms frames and
   STATE_SHORT_LEN=58 samples for 30 ms frames.  The power of the
   first 23/22 and last 23/22 samples of the two sub-frame blocks
   identified above is computed as the sum of the squared signal
   sample values, and the 23/22-sample segment with the lowest power
   is excluded from the start state.  One bit is transmitted to
   indicate which of the two possible 57/58 sample segments is used.
   The start state position within the two sub-frames determined
   above, state_first, MUST be encoded as

      state_first=1: start state is first STATE_SHORT_LEN samples
      state_first=0: start state is last STATE_SHORT_LEN samples

3.5.2.  All-Pass Filtering and Scale Quantization

   The block of residual samples in the start state is first filtered
   by an all-pass filter with the quantized LPC coefficients as
   denominator and reversed quantized LPC coefficients as numerator.
   The purpose of this phase-dispersion filter is to get a more even
   distribution of the sample values in the residual signal.  The
   filtering is performed by circular convolution, where the initial
   filter memory is set to zero.

      res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
      res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0

      Pk(z) = A~rk(z)/A~k(z), where

                                       ___
                                       \
      A~rk(z) = z^(-LPC_FILTERORDER) +  > a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                       /__
                                     i=0...(LPC_FILTERORDER-1)

      and A~k(z) is taken from the block where the start state begins

      res -> Pk(z) -> filtered

      ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
                                             k=0..(STATE_SHORT_LEN-1)

   The all-pass filtered block is searched for its largest magnitude
   sample.  The 10-logarithm of this magnitude is quantized with a
   6-bit quantizer, state_frgqTbl, by finding the nearest
   representation.
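The circular-convolution filtering and the 6-bit scale quantization described above can be sketched in code as follows.  This is an illustrative Python sketch, not the reference C implementation the RFC provides; the function names allpass_fold and quantize_max, and the use of plain floating point, are our own assumptions.

```python
import math

def allpass_fold(res, a):
    """Sketch of the all-pass filtering by circular convolution.
    res: the STATE_SHORT_LEN uncoded start-state residual samples.
    a:   quantized LPC coefficients [1, a~(1), ..., a~(order)].
    The residual is zero-padded to twice its length, filtered with
    zero initial memory by A~rk(z)/A~k(z) (coefficients reversed in
    the numerator), and the two halves are added:
    ccres(k) = filtered(k) + filtered(k + STATE_SHORT_LEN)."""
    n_len = len(res)
    order = len(a) - 1
    x = list(res) + [0.0] * n_len        # res(N..2N-1) = 0
    y = [0.0] * (2 * n_len)
    for n in range(2 * n_len):
        acc = 0.0
        for i in range(order + 1):       # numerator A~rk: reversed coeffs
            if n - i >= 0:
                acc += a[order - i] * x[n - i]
        for i in range(1, order + 1):    # denominator A~k(z)
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y[n] = acc
    return [y[k] + y[k + n_len] for k in range(n_len)]

def quantize_max(ccres, frgq_tbl):
    """Quantize the 10-logarithm of the largest magnitude by nearest
    match in a 6-bit table (state_frgqTbl) and derive the scaling
    factor scal = 4.5/(10^qmax)."""
    target = math.log10(max(abs(v) for v in ccres))
    idx_for_max = min(range(len(frgq_tbl)),
                      key=lambda i: abs(frgq_tbl[i] - target))
    qmax = frgq_tbl[idx_for_max]
    return idx_for_max, 4.5 / (10.0 ** qmax)
```

For readability the sketch uses a first-order example filter in tests; the codec itself uses LPC_FILTERORDER=10.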
   This results in an index, idxForMax, corresponding to a quantized
   value, qmax.  The all-pass filtered residual samples in the block
   are then multiplied with a scaling factor scal = 4.5/(10^qmax) to
   yield normalized samples.

   state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
      1.277188, 1.351503, 1.429380, 1.500727, 1.569049, 1.639599,
      1.707071, 1.781531, 1.840799, 1.901550, 1.956695, 2.006750,
      2.055474, 2.102787, 2.142819, 2.183592, 2.217962, 2.257177,
      2.295739, 2.332967, 2.369248, 2.402792, 2.435080, 2.468598,
      2.503394, 2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
      2.698780, 2.729101, 2.759786, 2.789834, 2.818679, 2.848074,
      2.877470, 2.906899, 2.936655, 2.967804, 3.000115, 3.033367,
      3.066355, 3.104231, 3.141499, 3.183012, 3.222952, 3.265433,
      3.308441, 3.350823, 3.395275, 3.442793, 3.490801, 3.542514,
      3.604064, 3.666050, 3.740994, 3.830749, 3.938770, 4.101764}

3.5.3.  Scalar Quantization

   The normalized samples are quantized in the perceptually weighted
   speech domain by a sample-by-sample scalar DPCM quantization as
   depicted in Figure 3.3.  Each sample in the block is filtered by a
   weighting filter Wk(z), specified in section 3.4, to form a
   weighted speech sample x[n].  The target sample d[n] is formed by
   subtracting a predicted sample y[n], where the prediction filter is
   given by Pk(z) = 1 - 1/Wk(z).

               +-------+ x[n]   + d[n] +-----------+  u[n]
   residual -->| Wk(z) |------>(+)---->| Quantizer |------+---> quantized
               +-------+     - /|\     +-----------+      |     residual
                                |                         |
                           y[n] |                        \|/
                +---------------+----------------------->(+)
                |                                         |
                |             +-------+                   |
                +-------------| Pk(z) |<------------------+
                              +-------+

   Figure 3.3.  Quantization of start state samples by DPCM in
   weighted speech domain.

   The coded state sample u[n] is obtained by quantizing d[n] with a
   3-bit quantizer with quantization table state_sq3Tbl.

   state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
                       0.444214,  1.329712,  2.436279,  3.983887}
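The DPCM loop of Figure 3.3 can be sketched as below.  This is an illustrative Python sketch under simplifying assumptions: the prediction filter Pk(z) is stood in for by a caller-supplied callback (hypothetical; the real predictor is derived from the weighting filter Wk(z)), and only the encoder-side index stream is produced.

```python
# state_sq3Tbl from the RFC (3-bit scalar quantizer levels)
STATE_SQ3_TBL = [-3.719849, -2.177490, -1.130005, -0.309692,
                  0.444214,  1.329712,  2.436279,  3.983887]

def sq3_quantize(d):
    """Nearest-neighbour lookup in the 3-bit quantization table;
    returns (index, quantized value)."""
    idx = min(range(8), key=lambda i: abs(STATE_SQ3_TBL[i] - d))
    return idx, STATE_SQ3_TBL[idx]

def dpcm_encode(x, predict):
    """DPCM loop of Figure 3.3 (sketch).
    x:       weighted-speech samples x[n] (already Wk(z)-filtered).
    predict: hypothetical callback mapping the past reconstructed
             samples to the predicted sample y[n], standing in for
             the Pk(z) feedback filter.
    Returns the list of quantizer indices."""
    recon = []                       # u[n] + y[n], what the predictor sees
    indices = []
    for xn in x:
        yn = predict(recon)          # predicted sample y[n]
        dn = xn - yn                 # target sample d[n]
        idx, un = sq3_quantize(dn)   # coded state sample u[n]
        indices.append(idx)
        recon.append(un + yn)        # closed-loop reconstruction
    return indices
```

A one-tap predictor such as `lambda r: r[-1] if r else 0.0` is enough to exercise the loop.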
   The quantized samples are transformed back to the residual domain
   by 1) scaling with 1/scal; 2) time-reversing the scaled samples;
   3) filtering the time-reversed samples by the same all-pass filter,
   as in section 3.5.2, by using circular convolution; and 4) time-
   reversing the filtered samples.  (More detail is in section 4.2.)

   A reference implementation of the start-state encoding can be found
   in Appendix A.46.

3.6.  Encoding the Remaining Samples

   A dynamic codebook is used to encode 1) the 23/22 remaining samples
   in the two sub-blocks containing the start state; 2) the sub-blocks
   after the start state in time; and 3) the sub-blocks before the
   start state in time.  Thus, the encoding target can be either the
   23/22 samples remaining of the 2 sub-blocks containing the start
   state, or a 40-sample sub-block.  This target can consist of
   samples that are indexed forward in time or backward in time,
   depending on the location of the start state.  The length of the
   target is denoted by lTarget.

   The coding is based on an adaptive codebook that is built from a
   codebook memory that contains decoded LPC excitation samples from
   the already encoded part of the block.  These samples are indexed
   in the same time direction as is the target vector and end at the
   sample instant prior to the first sample instant represented in the
   target vector.  The codebook memory has length lMem, which is equal
   to CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
   23/22-sample sub-block.

   The following figure shows an overview of the encoding procedure.

      +------------+    +---------------+    +-------------+
   -> | 1. Decode  | -> | 2. Mem setup  | -> | 3. Perc. W. | ->
      +------------+    +---------------+    +-------------+

      +------------+    +----------------+
   -> | 4. Search  | -> | 5. Upd. Target | ------------------>
   |  +------------+    +----------------+                   |
   +----<------------<------------<--------------------------+
                     stage=0..2

      +----------------+
   -> | 6. Recalc G[0] | ---------------> gains and CB indices
      +----------------+

   Figure 3.4.  Flow chart of the codebook search in the iLBC encoder.

   1. Decode the part of the residual that has been encoded so far,
      using the codebook without perceptual weighting.

   2. Set up the memory by taking data from the decoded residual.
      This memory is used to construct codebooks.  For blocks
      preceding the start state, both the decoded residual and the
      target are time reversed (section 3.6.1).

   3. Filter the memory + target with the perceptual weighting filter
      (section 3.6.2).

   4. Search for the best match between the target and the codebook
      vector.  Compute the optimal gain for this match and quantize
      that gain (section 3.6.4).

   5. Update the perceptually weighted target by subtracting the
      contribution from the selected codebook vector from the
      perceptually weighted memory (quantized gain times selected
      vector).  Repeat 4 and 5 for the two additional stages.

   6. Calculate the energy loss due to encoding of the residual.  If
      needed, compensate for this loss by an upscaling and
      requantization of the gain for the first stage (section 3.7).

   The following sections provide an in-depth description of the
   different blocks of Figure 3.4.

3.6.1.  Codebook Memory

   The codebook memory is based on the already encoded sub-blocks, so
   the available data for encoding increases for each new sub-block
   that has been encoded.  Until enough sub-blocks have been encoded
   to fill the codebook memory with data, it is padded with zeros.

   The following figure shows an example of the order in which the
   sub-blocks are encoded for the 30 ms frame size if the start state
   is located in the last 58 samples of sub-blocks 2 and 3.

   +-----------------------------------------------------+
   |  5  |  1  |///|////////|    2    |    3    |    4    |
   +-----------------------------------------------------+

   Figure 3.5.  The order from 1 to 5 in which the sub-blocks are
   encoded.
   The slashed area is the start state.

   The first target sub-block to be encoded is number 1, and the
   corresponding codebook memory is shown in the following figure.  As
   the target vector comes before the start state in time, the
   codebook memory and target vector are time reversed; thus, after
   the block has been time reversed, the search algorithm can be
   reused.  As only the start state has been encoded so far, the last
   samples of the codebook memory are padded with zeros.

   +------------------------+
   |zeros|\\\\\\\\|\\\\| 1 |
   +------------------------+

   Figure 3.6.  The codebook memory, length lMem=85 samples, and the
   target vector 1, length 22 samples.

   The next step is to encode sub-block 2 by using the memory, which
   has now increased since sub-block 1 has been encoded.  The
   following figure shows the codebook memory for encoding of
   sub-block 2.

   +----------------------------------+
   | zeros |  1  |///|////////|   2   |
   +----------------------------------+

   Figure 3.7.  The codebook memory, length lMem=147 samples, and the
   target vector 2, length 40 samples.

   The next step is to encode sub-block 3 by using the memory, which
   has been increased yet again since sub-blocks 1 and 2 have been
   encoded, but the memory still has to be padded with a few zeros.
   The following figure shows the codebook memory for encoding of
   sub-block 3.

   +----------------------------------------+
   |zeros|  1  |///|////////|   2   |   3   |
   +----------------------------------------+

   Figure 3.8.  The codebook memory, length lMem=147 samples, and the
   target vector 3, length 40 samples.

   The next step is to encode sub-block 4 by using the memory, which
   has increased yet again since sub-blocks 1, 2, and 3 have been
   encoded.  This time, the memory does not have to be padded with
   zeros.  The following figure shows the codebook memory for encoding
   of sub-block 4.
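The memory construction of Figures 3.6 through 3.9 can be sketched as below: take the most recent lMem decoded residual samples ending just before the target, pad the oldest end with zeros while data is scarce, and reverse time for targets preceding the start state.  A minimal Python sketch; the helper name setup_cb_memory is our own.

```python
def setup_cb_memory(decoded, l_mem, reverse=False):
    """Build the adaptive-codebook memory (sketch).
    decoded: decoded residual samples, oldest first, ending at the
             sample instant just before the target vector.
    l_mem:   codebook memory length (147, or 85 for the 22/23-sample
             targets).
    reverse: True for targets that precede the start state in time,
             so that memory and target are searched in reversed time.
    Zero-pads the oldest end until enough sub-blocks are encoded."""
    if reverse:
        decoded = decoded[::-1]
    if len(decoded) >= l_mem:
        return decoded[-l_mem:]          # memory is full: newest l_mem
    pad = [0.0] * (l_mem - len(decoded)) # not yet full: pad with zeros
    return pad + decoded
```

With only the 58-sample start state decoded and l_mem=85, the first 27 memory samples come out as zeros, matching Figure 3.6.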
   +----------------------------------------+
   | 1 |///|////////|   2   |   3   |   4   |
   +----------------------------------------+

   Figure 3.9.  The codebook memory, length lMem=147 samples, and the
   target vector 4, length 40 samples.

   The final target sub-block to be encoded is number 5, and the
   following figure shows the corresponding codebook memory.  As the
   target vector comes before the start state in time, the codebook
   memory and target vector are time reversed.

   +-------------------------------------------
   |   3   |   2   |\\\\\\\\|\\\\| 1 |   5   |