draft-ietf-avt-ilbc-codec-05.txt
for the block and a 3-bit scalar quantizer operating on scaled
samples in the weighted speech domain. In the following we describe
the state encoding in greater detail.
3.5.1 Start State Estimation
The two sub-blocks containing the start state are determined by
finding the two consecutive sub-blocks in the block having the
highest power. Advantageously, down-weighting is used at the
beginning and end of the sub-frames. That is, the following measure
is computed (NSUB=4/6 for 20/30 ms frame size):
nsub=1,...,NSUB-1
    ssqn[nsub] = 0.0;
    for (i=(nsub-1)*SUBL; i<(nsub-1)*SUBL+5; i++)
        ssqn[nsub] += sampEn_win[i-(nsub-1)*SUBL]*
                        residual[i]*residual[i];
    for (i=(nsub-1)*SUBL+5; i<(nsub+1)*SUBL-5; i++)
        ssqn[nsub] += residual[i]*residual[i];
    for (i=(nsub+1)*SUBL-5; i<(nsub+1)*SUBL; i++)
        ssqn[nsub] += sampEn_win[(nsub+1)*SUBL-i-1]*
                        residual[i]*residual[i];
where sampEn_win[5] = {1/6, 2/6, 3/6, 4/6, 5/6} MAY be used. The
sub-frame number corresponding to the maximum value of
ssqEn_win[nsub-1]*ssqn[nsub] is selected as the start state
indicator. A weighting of ssqEn_win[] = {0.8, 0.9, 1.0, 0.9, 0.8}
for 30 ms frames and ssqEn_win[] = {0.9, 1.0, 0.9} for 20 ms frames
MAY advantageously be used to bias the start state towards the
middle of the frame.
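As a rough illustration, the measure above and the weighted selection
can be sketched in C for the 20 ms case as follows. The helper names
(ssq_energy, find_start_state) are not taken from the reference
implementation, and floating-point doubles are used instead of the
fixed-point arithmetic of the actual codec:

```c
#define SUBL 40
#define NSUB 4   /* 20 ms frame; 6 for 30 ms */

static const double sampEn_win[5] = {1.0/6, 2.0/6, 3.0/6, 4.0/6, 5.0/6};
static const double ssqEn_win[NSUB-1] = {0.9, 1.0, 0.9}; /* 20 ms weighting */

/* Weighted energy of the two consecutive sub-blocks starting at
 * sub-block nsub-1 (nsub = 1..NSUB-1), with the first and last five
 * samples down-weighted by sampEn_win[]. */
static double ssq_energy(const double *residual, int nsub)
{
    double ssqn = 0.0;
    int i;
    for (i = (nsub-1)*SUBL; i < (nsub-1)*SUBL + 5; i++)
        ssqn += sampEn_win[i - (nsub-1)*SUBL] * residual[i]*residual[i];
    for (i = (nsub-1)*SUBL + 5; i < (nsub+1)*SUBL - 5; i++)
        ssqn += residual[i]*residual[i];
    for (i = (nsub+1)*SUBL - 5; i < (nsub+1)*SUBL; i++)
        ssqn += sampEn_win[(nsub+1)*SUBL - i - 1] * residual[i]*residual[i];
    return ssqn;
}

/* Returns the start state indicator (1..NSUB-1). */
static int find_start_state(const double *residual)
{
    int nsub, best = 1;
    double bestVal = -1.0;
    for (nsub = 1; nsub <= NSUB-1; nsub++) {
        double val = ssqEn_win[nsub-1] * ssq_energy(residual, nsub);
        if (val > bestVal) { bestVal = val; best = nsub; }
    }
    return best;
}
```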
For 20 ms frames there are 3 possible positions of the two-sub-block
length maximum power segment, so the start state position is encoded
using 2 bits. The start state position, start, MUST be encoded as:
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
For 30 ms frames there are 5 possible positions of the two-sub-block
length maximum power segment, so the start state position is encoded
using 3 bits. The start state position, start, MUST be encoded as:
start=1: start state in sub-frame 0 and 1
start=2: start state in sub-frame 1 and 2
start=3: start state in sub-frame 2 and 3
start=4: start state in sub-frame 3 and 4
start=5: start state in sub-frame 4 and 5
Hence, in both cases, index 0 is not utilized. For bit rate
efficiency, the start state is shortened to STATE_SHORT_LEN=57
samples for 20 ms frames and
Andersen et. al. Experimental - Expires November 29th, 2004 15
Internet Low Bit Rate Codec May 04
STATE_SHORT_LEN=58 samples for 30 ms frames. The power of the first
23/22 and last 23/22 samples of the 2 sub-frame block identified
above is computed as the sum of the squared signal sample values and
the 23/22 sample segment with the lowest power is excluded from the
start state. One bit is transmitted to indicate which of the 2
possible 57/58 sample segments is used. The start state position
within the 2 sub-frames determined above, state_first, MUST be
encoded as:
state_first=1: start state is first STATE_SHORT_LEN samples
state_first=0: start state is last STATE_SHORT_LEN samples
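The selection between the two candidate segments can be sketched as
below; choose_state_first is a hypothetical helper (the reference
implementation in the appendix is normative), shown here for the
20 ms case:

```c
#define STATE_SHORT_LEN 57   /* 20 ms; 58 for 30 ms */
#define TWO_SUBL 80          /* two 40-sample sub-blocks */

/* Returns state_first: 1 if the start state is the first
 * STATE_SHORT_LEN samples, 0 if it is the last STATE_SHORT_LEN
 * samples of the two-sub-block segment. */
static int choose_state_first(const double *block /* TWO_SUBL samples */)
{
    int nDrop = TWO_SUBL - STATE_SHORT_LEN; /* 23 (20 ms) or 22 (30 ms) */
    double firstPow = 0.0, lastPow = 0.0;
    int i;
    for (i = 0; i < nDrop; i++)
        firstPow += block[i]*block[i];
    for (i = TWO_SUBL - nDrop; i < TWO_SUBL; i++)
        lastPow += block[i]*block[i];
    /* Exclude the lower-power end: if the last nDrop samples are the
     * weakest, the start state is the first STATE_SHORT_LEN samples. */
    return (lastPow < firstPow) ? 1 : 0;
}
```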
3.5.2 All-Pass Filtering and Scale Quantization
The block of residual samples in the start state is first filtered
by an all-pass filter with the quantized LPC coefficients as
denominator and reversed quantized LPC coefficients as numerator.
The purpose of this phase-dispersion filter is to get a more even
distribution of the sample values in the residual signal. The
filtering is performed by circular convolution, where the initial
filter memory is set to zero.
res(0..(STATE_SHORT_LEN-1)) = uncoded start state residual
res((STATE_SHORT_LEN)..(2*STATE_SHORT_LEN-1)) = 0

Pk(z) = A~rk(z)/A~k(z), where

                                 ___
                                 \
A~rk(z) = z^(-LPC_FILTERORDER) + > a~k(i+1)*z^(i-(LPC_FILTERORDER-1))
                                 /__
                              i=0...(LPC_FILTERORDER-1)

and A~k(z) is taken from the block where the start state begins

res -> Pk(z) -> filtered

ccres(k) = filtered(k) + filtered(k+STATE_SHORT_LEN),
                                     k=0..(STATE_SHORT_LEN-1)
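Under the definitions above, the circular convolution can be sketched
as a direct-form filtering of the zero-padded state followed by
folding the second half onto the first. This is an illustrative
floating-point sketch, not the reference code: allpass_circular is a
hypothetical name, and lpc[] is assumed to hold the quantized
coefficients a~k(0..LPC_FILTERORDER) with a~k(0) = 1:

```c
#include <string.h>

#define LPC_FILTERORDER 10
#define STATE_SHORT_LEN 57

static void allpass_circular(const double *res,   /* STATE_SHORT_LEN samples */
                             const double *lpc,   /* LPC_FILTERORDER+1 coeffs */
                             double *ccres)       /* STATE_SHORT_LEN samples */
{
    double in[2*STATE_SHORT_LEN] = {0};
    double filtered[2*STATE_SHORT_LEN];
    int n, i;

    /* Zero-pad: second half of the input stays 0. */
    memcpy(in, res, STATE_SHORT_LEN*sizeof(double));

    for (n = 0; n < 2*STATE_SHORT_LEN; n++) {
        /* Numerator A~rk(z), the reversed coefficients: the tap at
         * delay d is a~k(LPC_FILTERORDER-d), with a~k(0)=1 at delay
         * LPC_FILTERORDER. */
        double acc = 0.0;
        for (i = 0; i <= LPC_FILTERORDER; i++)
            if (n - i >= 0)
                acc += lpc[LPC_FILTERORDER - i] * in[n - i];
        /* Denominator A~k(z): feedback of past outputs. */
        for (i = 1; i <= LPC_FILTERORDER; i++)
            if (n - i >= 0)
                acc -= lpc[i] * filtered[n - i];
        filtered[n] = acc;
    }
    /* Circular convolution: fold the tail back onto the first half. */
    for (n = 0; n < STATE_SHORT_LEN; n++)
        ccres[n] = filtered[n] + filtered[n + STATE_SHORT_LEN];
}
```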
The all-pass filtered block is searched for its largest magnitude
sample. The 10-logarithm of this magnitude is quantized with a 6-bit
quantizer, state_frgqTbl, by finding the nearest representation.
This results in an index, idxForMax, corresponding to a quantized
value, qmax. The all-pass filtered residual samples in the block are
then multiplied by a scaling factor scal=4.5/(10^qmax) to yield
normalized samples.
state_frgqTbl[64] = {1.000085, 1.071695, 1.140395, 1.206868,
1.277188, 1.351503, 1.429380, 1.500727, 1.569049,
1.639599, 1.707071, 1.781531, 1.840799, 1.901550,
1.956695, 2.006750, 2.055474, 2.102787, 2.142819,
2.183592, 2.217962, 2.257177, 2.295739, 2.332967,
2.369248, 2.402792, 2.435080, 2.468598, 2.503394,
2.539284, 2.572944, 2.605036, 2.636331, 2.668939,
2.698780, 2.729101, 2.759786, 2.789834, 2.818679,
2.848074, 2.877470, 2.906899, 2.936655, 2.967804,
3.000115, 3.033367, 3.066355, 3.104231, 3.141499,
3.183012, 3.222952, 3.265433, 3.308441, 3.350823,
3.395275, 3.442793, 3.490801, 3.542514, 3.604064,
3.666050, 3.740994, 3.830749, 3.938770, 4.101764}
3.5.3 Scalar Quantization
The normalized samples are quantized in the perceptually weighted
speech domain by a sample-by-sample scalar DPCM quantization as
depicted in Figure 3.3. Each sample in the block is filtered by a
weighting filter Wk(z), specified in section 3.4, to form a weighted
speech sample x[n]. The target sample d[n] is formed by subtracting
a predicted sample y[n], where the prediction filter is given by
Pk(z) = 1 - 1 / Wk(z).
             +-------+  x[n]   + d[n] +-----------+  u[n]
residual --->| Wk(z) |------->(+)---->| Quantizer |---+---> quantized
             +-------+      - /|\     +-----------+   |    residual
                               |                     \|/
                       y[n]    +--------------------->(+)
                               |                      |
                               |       +------+       |
                               +-------| Pk(z)|<------+
                                       +------+
Figure 3.3. Quantization of start state samples by DPCM in weighted
speech domain.
The coded state sample u[n] is obtained by quantizing d[n] with a 3-
bit quantizer with quantization table state_sq3Tbl.
state_sq3Tbl[8] = {-3.719849, -2.177490, -1.130005, -0.309692,
0.444214, 1.329712, 2.436279, 3.983887}
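A much-simplified sketch of this loop is given below: the weighting
filter Wk(z) is omitted (the input is assumed to already be in the
weighted domain), and a hypothetical first-order smoother stands in
for the predictor Pk(z), so only the quantize-and-feed-back
structure of figure 3.3 is shown:

```c
#include <math.h>

static const double state_sq3Tbl[8] = {
    -3.719849, -2.177490, -1.130005, -0.309692,
     0.444214,  1.329712,  2.436279,  3.983887
};

/* 3-bit scalar quantizer: nearest table entry. */
static int quantize_sq3(double d)
{
    int i, best = 0;
    for (i = 1; i < 8; i++)
        if (fabs(state_sq3Tbl[i] - d) < fabs(state_sq3Tbl[best] - d))
            best = i;
    return best;
}

/* x:   weighted-domain samples, len entries
 * idx: emitted 3-bit indices
 * u:   quantized difference samples */
static void dpcm_quantize(const double *x, int len, int *idx, double *u)
{
    double pred = 0.0;                 /* y[n]; zero initial memory */
    int n;
    for (n = 0; n < len; n++) {
        double d = x[n] - pred;        /* target sample d[n]        */
        idx[n] = quantize_sq3(d);
        u[n]   = state_sq3Tbl[idx[n]];
        /* Stand-in predictor (NOT the codec's Pk(z)): a first-order
         * smoother on the reconstructed sample u[n] + y[n]. */
        pred = 0.5*(u[n] + pred);
    }
}
```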
The quantized samples are transformed back to the residual domain
by: 1) scaling with 1/scal; 2) time-reversing the scaled samples;
3) filtering the time-reversed samples with the same all-pass filter
as in section 3.5.2, using circular convolution; and 4) time-
reversing the filtered samples. (This is described in more detail in
section 4.2.)
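The four steps can be sketched as follows, with the all-pass stage
abstracted behind a function pointer so only the scale/reverse
plumbing is shown; state_to_residual and time_reverse are
illustrative names, not taken from the reference code:

```c
/* Abstract the circular all-pass filtering of section 3.5.2. */
typedef void (*allpass_fn)(const double *in, double *out, int len);

static void time_reverse(double *x, int len)
{
    int i;
    for (i = 0; i < len/2; i++) {
        double tmp = x[i];
        x[i] = x[len-1-i];
        x[len-1-i] = tmp;
    }
}

/* u is modified in place; res receives the residual-domain samples. */
static void state_to_residual(double *u, int len, double scal,
                              allpass_fn filt, double *res)
{
    int i;
    for (i = 0; i < len; i++)
        u[i] *= 1.0/scal;        /* 1) scale with 1/scal            */
    time_reverse(u, len);        /* 2) time-reverse the samples     */
    filt(u, res, len);           /* 3) circular all-pass filtering  */
    time_reverse(res, len);      /* 4) time-reverse the result      */
}
```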
A reference implementation of the start state encoding can be found
in Appendix A.46.
3.6 Encoding the remaining samples
A dynamic codebook is used to encode 1) the 23/22 remaining samples
in the 2 sub-blocks containing the start state; 2) the sub-blocks
after the start state in time; and 3) the sub-blocks before the
start state in time. Thus, the encoding target can be either the
23/22 samples remaining of the 2 sub-blocks containing the start
state or a 40-sample sub-block. This target can consist of
samples that are indexed forwards in time or backwards in time
depending on the location of the start state. The length of the
target is denoted by lTarget.
The coding is based on an adaptive codebook that is built from a
codebook memory containing decoded LPC excitation samples from the
already encoded part of the block. These samples are indexed in the
same time direction as the target vector and end at the sample
instant prior to the first sample instant represented in the target
vector. The codebook memory has length lMem, which is equal to
CB_MEML=147 for the two/four 40-sample sub-blocks and 85 for the
23/22-sample sub-block.
The following figure shows an overview of the encoding procedure.
   +-----------+    +--------------+    +-------------+
-> | 1. Decode | -> | 2. Mem setup | -> | 3. Perc. W. | ->
   +-----------+    +--------------+    +-------------+

     +-----------+    +----------------+
  -> | 4. Search | -> | 5. Upd. Target | ------------------>
   | +-----------+    +----------------+ |
   ----<------------<---------------------
                stage=0..2

     +----------------+
  -> | 6. Recalc G[0] | ---------------> gains and CB indices
     +----------------+
Figure 3.4. Flow chart of the codebook search in the iLBC encoder
1. Decode the part of the residual that has been encoded so far,
using the codebook without perceptual weighting
2. Set up the memory by taking data from the decoded residual.
Codebooks are constructed from this memory. For blocks preceding the
start state, both the decoded residual and the target are time
reversed (section 3.6.1)
3. Filter the memory + target with the perceptual weighting filter
(section 3.6.2)
4. Search for the best match between the target and the codebook
vector. Compute the optimal gain for this match and quantize that
gain (section 3.6.4)
5. Update the perceptually weighted target by subtracting the
contribution from the selected codebook vector from the perceptually
weighted memory (quantized gain times selected vector). Repeat 4.
and 5. for the 2 additional stages
6. Calculate the energy loss due to encoding of the residual. If
needed, compensate for this loss by an upscaling and requantization
of the gain for the first stage (section 3.7)
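Steps 4 and 5 and the stage loop can be sketched as below. The
matched-filter criterion (maximizing the squared correlation over
the codebook vector energy) and the unquantized gain are
simplifications for illustration; sections 3.6.3 and 3.6.4 define
the actual codebook construction and gain quantization, and the
function names here are hypothetical:

```c
#define LTARGET 40
#define NSTAGES 3

/* Step 4: best codebook index by maximizing (x.c)^2 / (c.c),
 * returning the corresponding (unquantized) gain x.c / c.c. */
static int search_best(const double *target,
                       const double cb[][LTARGET], int cbSize,
                       double *gain)
{
    int j, best = 0;
    double bestMeasure = -1.0, bestGain = 0.0;
    for (j = 0; j < cbSize; j++) {
        double cross = 0.0, energy = 0.0;
        int n;
        for (n = 0; n < LTARGET; n++) {
            cross  += target[n]*cb[j][n];
            energy += cb[j][n]*cb[j][n];
        }
        if (energy > 0.0 && cross*cross/energy > bestMeasure) {
            bestMeasure = cross*cross/energy;
            bestGain = cross/energy;
            best = j;
        }
    }
    *gain = bestGain;
    return best;
}

/* Steps 4+5 repeated for the 3 stages: after each stage the scaled
 * selected vector is subtracted from the target. */
static void multistage_search(double *target,
                              const double cb[][LTARGET], int cbSize,
                              int idx[NSTAGES], double gain[NSTAGES])
{
    int stage, n;
    for (stage = 0; stage < NSTAGES; stage++) {
        idx[stage] = search_best(target, cb, cbSize, &gain[stage]);
        for (n = 0; n < LTARGET; n++)
            target[n] -= gain[stage]*cb[idx[stage]][n];
    }
}
```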
The following sections provide an in-depth description of the
different blocks of figure 3.4.
3.6.1 Codebook Memory
The codebook memory is based on the already encoded sub-blocks, so
the available data for encoding increases for each new sub-block
that has been encoded. Until enough sub-blocks have been encoded to
fill the codebook memory with data, it is padded with zeros. The
following figure shows an example of the order in which the sub-
blocks are encoded for the 30 ms frame size if the start state is
located in the last 58 samples of sub-blocks 2 and 3.
+----------------------------------------------------------+
|    5    | 1 |//////////////|    2    |    3    |    4    |
+----------------------------------------------------------+
Figure 3.5. The order from 1 to 5 in which the sub-blocks are
encoded. The slashed area is the start state.
The first target sub-block to be encoded is number 1 and the
corresponding codebook memory is shown in the following figure.
Since the target vector is before the start state in time, the
codebook memory and target vector are time reversed. By reversing
them in time, the search algorithm can be reused. Since only the
start state has been encoded so far, the last samples of the
codebook memory are padded with zeros.
+-----------------------+
|zeros|\\\\\\\\|\\\\| 1 |
+-----------------------+
Figure 3.6. The codebook memory, length lMem=85 samples, and the
target vector 1, length 22 samples.
The next step is to encode sub-block 2 using the memory, which has
now grown because sub-block 1 has been encoded. The following figure
shows the codebook memory for the encoding of sub-block 2.
+----------------------------+
| zeros | 1 |///|////////| 2 |
+----------------------------+
Ctrl + -