the frame.
For 20 ms frames there are three possible positions for the two-sub-
block length maximum power segment; the start state position is
encoded with 2 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
For 30 ms frames there are five possible positions for the two-sub-
block length maximum power segment; the start state position is
encoded with 3 bits. The start state position, start, MUST be
encoded as
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
start=4: start state in sub-frame 3 and 4
start=5: start state in sub-frame 4 and 5
Andersen, et al. Experimental [Page 16]
RFC 3951 Internet Low Bit Rate Codec December 2004
Hence, in both cases, index 0 is not used. In order to shorten the
start state for bit rate efficiency, the start state is brought down
to STATE_SHORT_LEN=57 samples for 20 ms frames and STATE_SHORT_LEN=58
samples for 30 ms frames. The power of the first 23/22 and last
23/22 samples of the two sub-frame blocks identified above is
computed as the sum of the squared signal sample values, and the
23/22-sample segment with the lowest power is excluded from the start
state. One bit is transmitted to indicate which of the two possible
57/58 sample segments is used. The start state position within the
two sub-frames determined above, state_first, MUST be encoded as
state_first=1: start state is first STATE_SHORT_LEN samples
state_first=0: start state is last STATE_SHORT_LEN samples
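The choice of state_first described above can be sketched in C as
follows. This is a minimal illustration, not the reference code: the
function names are hypothetical, the two sub-frame block is assumed to
be 80 samples long, and the handling of equal powers (the last segment
is kept) is an assumption.

```c
#include <stddef.h>

/* Sum of squared sample values over x[0..n-1]. */
static float segment_power(const float *x, size_t n)
{
    float p = 0.0f;
    for (size_t i = 0; i < n; i++) {
        p += x[i] * x[i];
    }
    return p;
}

/*
 * block:           80 residual samples (the two sub-frames selected by
 *                  the start index).
 * state_short_len: 57 for 20 ms frames, 58 for 30 ms frames.
 * Returns state_first: 1 = keep the first state_short_len samples,
 *                      0 = keep the last state_short_len samples.
 */
static int choose_state_first(const float *block, size_t state_short_len)
{
    const size_t block_len = 80;                  /* assumed: two 40-sample sub-frames */
    size_t diff = block_len - state_short_len;    /* 23 or 22 samples */
    float head = segment_power(block, diff);
    float tail = segment_power(block + block_len - diff, diff);

    /* The edge with the lowest power is excluded. If the tail is the
     * weaker edge, the start state is the first segment. Ties keep the
     * last segment (an assumption of this sketch). */
    return (tail < head) ? 1 : 0;
}
```

Calling choose_state_first(block, 57) on a block whose power sits in
its first samples returns 1; a block whose power sits in its last
samples returns 0.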
3.5.2. All-Pass Filtering and Scale Quantization
The block of residual samples in the start state is first filtered by
an all-pass filter with the quantized LPC coefficients as denominator
and reversed quantized LPC coefficients as numerator. The purpose of
this phase-dispersion filter is to get a more even distribution of
the sample values in the residual signal. The filtering is performed
by circular convolution, where the initial filter memory is set to
zero.
res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0
Pk(z) = A~rk(z)/A~k(z), where
                                ___
                                \
A~rk(z)= z^(-LPC_FILTERORDER) +  > a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                /__
                          i=0...(LPC_FILTERORDER-1)
and A~k(z) is taken from the block where the start state begins
res -> Pk(z) -> filtered
ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
k=0..(STATE_SHORT_LEN-1)
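The filtering and folding steps above can be sketched in C as follows.
This is an illustration under stated assumptions rather than the
reference code: the function name and buffer bound are hypothetical,
LPC_FILTERORDER is taken to be 10, and the coefficient array a[] is
assumed to hold the quantized LPC coefficients with a[0] = 1, so that
the numerator coefficient for delay j is a[LPC_FILTERORDER - j].

```c
#include <stddef.h>

#define LPC_FILTERORDER 10
#define STATE_SHORT_LEN_MAX 58   /* largest start state length (30 ms mode) */

/*
 * a[0..LPC_FILTERORDER]: quantized LPC coefficients, a[0] = 1 (assumed layout).
 * res[0..len-1]:         uncoded start state residual.
 * ccres[0..len-1]:       circularly filtered output.
 * Returns 0 on success, -1 if len exceeds the local buffer size.
 */
static int allpass_circular(const float *a, const float *res,
                            size_t len, float *ccres)
{
    float in[2 * STATE_SHORT_LEN_MAX] = {0};
    float out[2 * STATE_SHORT_LEN_MAX] = {0};

    if (len > STATE_SHORT_LEN_MAX) {
        return -1;
    }

    /* First half is the residual, second half stays zero. */
    for (size_t n = 0; n < len; n++) {
        in[n] = res[n];
    }

    /* Pole-zero filter with zero initial memory over 2*len samples. */
    for (size_t n = 0; n < 2 * len; n++) {
        float acc = 0.0f;
        /* Numerator: reversed LPC coefficients. */
        for (size_t j = 0; j <= LPC_FILTERORDER && j <= n; j++) {
            acc += a[LPC_FILTERORDER - j] * in[n - j];
        }
        /* Denominator: LPC coefficients applied to past outputs. */
        for (size_t i = 1; i <= LPC_FILTERORDER && i <= n; i++) {
            acc -= a[i] * out[n - i];
        }
        out[n] = acc;
    }

    /* Fold the second half onto the first: circular convolution. */
    for (size_t k = 0; k < len; k++) {
        ccres[k] = out[k] + out[k + len];
    }
    return 0;
}
```

With a[] = {1, 0, ..., 0} the filter reduces to a pure delay of
LPC_FILTERORDER samples, and the folding turns that delay into a
circular shift of the residual, which is a convenient sanity check.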
The all-pass filtered block is searched for its largest magnitude
sample. The base-10 logarithm of this magnitude is quantized with a
6-bit quantizer, state_frgqTbl, by finding the nearest representation.
This results in an index, idxForMax, corresponding to a quantized
value, qmax. The all-pass filtered residual samples in the block are
then multiplied with a scaling factor scal=4.5/(10^qmax) to yield
normalized samples.
state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
3.666050, 3.740994, 3.830749, 3.938770, 4.101764}
3.5.3. Scalar Quantization
The normalized samples are quantized in the perceptually weighted
speech domain by a sample-by-sample scalar DPCM quantization as
depicted in Figure 3.3. Each sample in the block is filtered by a
weighting filter Wk(z), specified in section 3.4, to form a weighted
speech sample x[n]. The target sample d[n] is formed by subtracting
a predicted sample y[n], where the prediction filter is given by
Pk(z) = 1 - 1 / Wk(z).
            +-------+  x[n]  +   d[n] +-----------+ u[n]
residual -->| Wk(z) |-------->(+)---->| Quantizer |------> quantized
            +-------+        -/|\     +-----------+   |    residual
                               |                     \|/
                          y[n] +-------------------->(+)
                               |                      |
                               |        +------+      |
                               +--------| Pk(z)|<-----+
                                        +------+
Figure 3.3. Quantization of start state samples by DPCM in weighted
speech domain.
The coded state sample u[n] is obtained by quantizing d[n] with a 3-
bit quantizer with quantization table state_sq3Tbl.
state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
0.444214, 1.329712, 2.436279, 3.983887}
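The DPCM loop of Figure 3.3 can be sketched in C as follows. The
function names are hypothetical, and the predictor is modeled as an
FIR filter over past reconstructed weighted samples whose taps are
supplied by the caller; deriving those taps from Pk(z) = 1 - 1/Wk(z)
is outside this sketch. Only the nearest-neighbor lookup in
state_sq3Tbl is taken directly from the text above.

```c
#include <stddef.h>

static const float state_sq3Tbl[8] = {
    -3.719849f, -2.177490f, -1.130005f, -0.309692f,
     0.444214f,  1.329712f,  2.436279f,  3.983887f
};

/* Index of the table entry nearest to d (3-bit result, 0..7). */
static int sq3_nearest(float d)
{
    int best = 0;
    float best_err = (d - state_sq3Tbl[0]) * (d - state_sq3Tbl[0]);
    for (int i = 1; i < 8; i++) {
        float e = (d - state_sq3Tbl[i]) * (d - state_sq3Tbl[i]);
        if (e < best_err) {
            best_err = e;
            best = i;
        }
    }
    return best;
}

/*
 * x[0..len-1]:        weighted samples x[n].
 * pred[0..order-1]:   caller-supplied predictor taps; pred[0] multiplies
 *                     the most recent reconstructed sample (assumed form).
 * idx[0..len-1]:      receives the 3-bit indices.
 * recon[0..len-1]:    receives y[n] + u[n], the input to the predictor.
 */
static void dpcm_quantize(const float *x, size_t len,
                          const float *pred, size_t order,
                          int *idx, float *recon)
{
    for (size_t n = 0; n < len; n++) {
        /* Predicted sample y[n] from past reconstructed samples. */
        float y = 0.0f;
        for (size_t i = 0; i < order && i < n; i++) {
            y += pred[i] * recon[n - 1 - i];
        }
        float d = x[n] - y;                     /* target sample d[n] */
        idx[n] = sq3_nearest(d);                /* coded sample u[n]  */
        recon[n] = y + state_sq3Tbl[idx[n]];    /* feeds the predictor */
    }
}
```

Because the predictor runs on reconstructed samples rather than on the
input, a decoder that applies the same predictor to the same indices
reproduces recon[] exactly.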
The quantized samples are transformed back to the residual domain by
1) scaling with 1/scal; 2) time-reversing the scaled samples; 3)
filtering the time-reversed samples by the same all-pass filter, as
in section 3.5.2, by using circular convolution; and 4) time-
reversing the filtered samples. (More detail is in section 4.2.)
A reference implementation of the start-state encoding can be found
in Appendix A.46.
3.6. Encoding the Remaining Samples
A dynamic codebook is used to encode 1) the 23/22 remaining samples
in the two sub-blocks containing the start state; 2) the sub-blocks
after the start state in time; and 3) the sub-blocks before the start
state in time. Thus, the encoding target can be either the 23/22
samples remaining of the 2 sub-blocks containing the start state, or
a 40-sample sub-block. This target can consist of samples that are
indexed forward in time or backward in time, depending on the
location of the start state. The length of the target is denoted by
lTarget.
The coding is based on an adaptive codebook that is built from a
codebook memory that contains decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as is the target vector and end at the sample
instant prior to the first sample instant represented in the target
vector. The codebook memory has length lMem, which is equal to
CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
23/22-sample sub-block.
The following figure shows an overview of the encoding procedure.
      +------------+    +---------------+    +-------------+
   -> | 1. Decode  | -> | 2. Mem setup  | -> | 3. Perc. W. | ->
      +------------+    +---------------+    +-------------+

      +------------+    +-----------------+
   -> | 4. Search  | -> | 5. Upd. Target  | ------------------>
    | +------------+    +-----------------+ |
    ----<-------------<-----------<----------
                      stage=0..2

      +----------------+
   -> | 6. Recalc G[0] | ---------------> gains and CB indices
      +----------------+
Figure 3.4. Flow chart of the codebook search in the iLBC encoder.
1. Decode the part of the residual that has been encoded so far,
using the codebook without perceptual weighting.
2. Set up the memory by taking data from the decoded residual. This
memory is used to construct codebooks. For blocks preceding the
start state, both the decoded residual and the target are time
reversed (section 3.6.1).
3. Filter the memory + target with the perceptual weighting filter
(section 3.6.2).
4. Search for the best match between the target and the codebook
vector. Compute the optimal gain for this match and quantize that
gain (section 3.6.4).
5. Update the perceptually weighted target by subtracting the
contribution of the selected codebook vector (the quantized gain
times the selected vector, taken from the perceptually weighted
memory). Repeat steps 4 and 5 for the two additional stages.
6. Calculate the energy loss due to encoding of the residual. If
needed, compensate for this loss by an upscaling and
requantization of the gain for the first stage (section 3.7).
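Steps 4 and 5 form a three-stage gain-shape search, which can be
sketched in C as follows. This is a simplified illustration, not the
procedure of section 3.6.4: the function name is hypothetical, the
codebook is an explicit array of candidate vectors rather than the
adaptive codebook built from the memory, the match criterion
(squared correlation divided by energy) is an assumption, and the
gain is left unquantized.

```c
#include <stddef.h>

#define CB_NSTAGES 3   /* stage = 0..2 */

/*
 * target[0..len-1]: perceptually weighted target; overwritten with the
 *                   remaining error after each stage.
 * cb:               ncb candidate vectors of len samples, stored row by row.
 * index, gain:      receive the selected vector and its gain per stage.
 */
static void multistage_search(float *target, size_t len,
                              const float *cb, size_t ncb,
                              size_t index[CB_NSTAGES],
                              float gain[CB_NSTAGES])
{
    for (size_t stage = 0; stage < CB_NSTAGES; stage++) {
        size_t best = 0;
        float best_measure = -1.0f;
        float best_gain = 0.0f;

        /* Step 4: best match between target and codebook vectors. */
        for (size_t c = 0; c < ncb; c++) {
            const float *v = cb + c * len;
            float corr = 0.0f, energy = 0.0f;
            for (size_t i = 0; i < len; i++) {
                corr += target[i] * v[i];
                energy += v[i] * v[i];
            }
            if (energy <= 0.0f) {
                continue;               /* skip all-zero vectors */
            }
            float measure = corr * corr / energy;
            if (measure > best_measure) {
                best_measure = measure;
                best = c;
                best_gain = corr / energy;   /* optimal (unquantized) gain */
            }
        }
        index[stage] = best;
        gain[stage] = best_gain;

        /* Step 5: remove this stage's contribution from the target. */
        for (size_t i = 0; i < len; i++) {
            target[i] -= best_gain * cb[best * len + i];
        }
    }
}
```

In an encoder the gain would be quantized before the subtraction in
step 5, so that each later stage codes the error the decoder will
actually see.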
The following sections provide an in-depth description of the
different blocks of Figure 3.4.
3.6.1. Codebook Memory
The codebook memory is based on the already encoded sub-blocks, so
the available data for encoding increases for each new sub-block that
has been encoded. Until enough sub-blocks have been encoded to fill
the codebook memory with data, it is padded with zeros. The
following figure shows an example of the order in which the sub-
blocks are encoded for the 30 ms frame size if the start state is
located in the last 58 samples of sub-block 2 and 3.
+-----------------------------------------------------+
| 5 | 1 |///|////////| 2 | 3 | 4 |
+-----------------------------------------------------+
Figure 3.5. The order from 1 to 5 in which the sub-blocks are
encoded. The slashed area is the start state.
The first target sub-block to be encoded is number 1, and the
corresponding codebook memory is shown in the following figure. As
the target vector comes before the start state in time, the codebook
memory and target vector are time reversed; thus, after the block has
been time reversed the search algorithm can be reused. As only the
start state has been encoded so far, the last samples of the codebook
memory are padded with zeros.
+-------------------------
|zeros|\\\\\\\\|\\\\| 1 |
+-------------------------
Figure 3.6. The codebook memory, length lMem=85 samples, and the
target vector 1, length 22 samples.
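The time-reversed setup of Figure 3.6 can be sketched in C as follows.
The function names and argument layout are hypothetical. The sketch
assumes that the decoded samples are given in forward time order,
beginning with the first sample of the start state, and that the
zero padding occupies the low-index end of the memory buffer, as drawn
in the figure.

```c
#include <stddef.h>

/*
 * decoded[0..n_decoded-1]: already decoded residual, forward time order,
 *                          starting at the first start state sample.
 * mem[0..l_mem-1]:         codebook memory, time reversed, zero padded.
 */
static void setup_reversed_memory(const float *decoded, size_t n_decoded,
                                  float *mem, size_t l_mem)
{
    /* Keep only the samples closest in time to the target. */
    size_t n = (n_decoded < l_mem) ? n_decoded : l_mem;

    for (size_t i = 0; i < l_mem - n; i++) {
        mem[i] = 0.0f;                      /* zero padding */
    }
    for (size_t k = 0; k < n; k++) {
        mem[l_mem - 1 - k] = decoded[k];    /* reversed copy */
    }
}

/*
 * residual:  the residual signal of the block.
 * start_pos: index of the first start state sample in residual.
 * target:    receives the l_target samples preceding start_pos,
 *            time reversed (l_target <= start_pos is required).
 */
static void setup_reversed_target(const float *residual, size_t start_pos,
                                  size_t l_target, float *target)
{
    for (size_t k = 0; k < l_target; k++) {
        target[k] = residual[start_pos - 1 - k];
    }
}
```

For the example above, the memory has lMem=85 samples, of which 58
come from the start state, leaving 85 - 58 = 27 zeros.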
The next step is to encode sub-block 2 by using the memory that now
has increased since sub-block 1 has been encoded. The following
figure shows the codebook memory for encoding of sub-block 2.
+-----------------------------------
| zeros | 1 |///|////////| 2 |
+-----------------------------------
Figure 3.7. The codebook memory, length lMem=147 samples, and the
target vector 2, length 40 samples.
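For targets that follow the start state, the memory is filled in
forward time order, as in this C sketch. The function name is
hypothetical, and the decoded samples are assumed to end at the sample
instant just before the target.

```c
#include <stddef.h>

/*
 * decoded[0..n_decoded-1]: already decoded residual in forward time
 *                          order, ending just before the target.
 * mem[0..l_mem-1]:         codebook memory; zeros first, then the most
 *                          recent decoded samples.
 */
static void setup_forward_memory(const float *decoded, size_t n_decoded,
                                 float *mem, size_t l_mem)
{
    /* Use at most l_mem of the most recent decoded samples. */
    size_t n = (n_decoded < l_mem) ? n_decoded : l_mem;
    size_t pad = l_mem - n;

    for (size_t i = 0; i < pad; i++) {
        mem[i] = 0.0f;                              /* zero padding */
    }
    for (size_t i = 0; i < n; i++) {
        mem[pad + i] = decoded[n_decoded - n + i];  /* newest samples last */
    }
}
```

In Figure 3.7 the decoded data consists of sub-block 1 (22 samples)
and the start state (58 samples), so the lMem=147 memory begins with
147 - 80 = 67 zeros.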
The next step is to encode sub-block 3 by using the memory, which has
increased yet again now that sub-blocks 1 and 2 have been encoded.
The memory still has to be padded with a few zeros. The following
figure shows the codebook memory for encoding of sub-block 3.
+------------------------------------------
|zeros| 1 |///|////////| 2 | 3 |
+------------------------------------------