📄 draft-ietf-avt-ilbc-codec-05.txt
coefficients with the following window:
lpc_lagwinTbl[0] = 1.0001;
lpc_lagwinTbl[i] = exp(-0.5 * ((2 * PI * 60.0 * i) /FS)^2);
i=1,...,LPC_FILTERORDER
where FS=8000 is the sampling frequency
Then, the windowed acf function acf1_win is obtained by:
acf1_win[i] = acf1[i] * lpc_lagwinTbl[i];
i=0,...,LPC_FILTERORDER
The second set of autocorrelation coefficients, acf2_win, is
obtained in a similar manner. The window, lpc_asymwinTbl, is applied
to samples 60 through 299, i.e., the entire current block. The
window consists of two segments; the first (samples 0 to 219) being
half a Hanning window with length 440 and the second being a quarter
of a cycle of a cosine wave. By using this asymmetric window, an LPC
analysis centered in the fifth sub-block is obtained without the
need for any look-ahead, which would have added delay. The
asymmetric window is defined as:
lpc_asymwinTbl[i] = (sin(PI * (i + 1) / 441))^2; i=0,...,219
lpc_asymwinTbl[i] = cos((i - 220) * PI / 40); i=220,...,239
Andersen et. al. Experimental - Expires November 29th, 2004 10
Internet Low Bit Rate Codec May 04
and the windowed speech is computed by:
speech_hp_win2[i] = speech_hp[i + LPC_LOOKBACK] *
lpc_asymwinTbl[i]; i=0,...,BLOCKL-1
The windowed autocorrelation coefficients are then obtained in
exactly the same way as for the first analysis instance.
The windows lpc_winTbl, lpc_asymwinTbl, and lpc_lagwinTbl are
typically generated in advance and stored in ROM rather than being
recomputed for every block.
3.2.2 Computation of LPC Coefficients
From the 2 x 11 smoothed autocorrelation coefficients, acf1_win and
acf2_win, the 2 x 11 LPC coefficients, lp1 and lp2, are calculated
in the same way for both analysis locations using the well-known
Levinson-Durbin recursion. The first LPC coefficient is always 1.0,
resulting in 10 unique coefficients.
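For readers unfamiliar with the recursion, a textbook form of Levinson-Durbin is sketched below. It is not the routine from Appendix A; the function name, the return convention, and the use of double precision are assumptions. The sign convention matches A(z) = 1 + sum a(i)*z^(-i), so a[0] is always 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook Levinson-Durbin recursion (illustrative sketch).
   r[0..order] : windowed autocorrelation coefficients
   a[0..order] : resulting LPC coefficients, a[0] = 1.0
   Returns 0 on success, -1 if the prediction error becomes
   non-positive (for example, an all-zero input). */
static int levinson_durbin(const double *r, double *a, int order)
{
    double err;
    int m, i;

    assert(r != NULL && a != NULL && order >= 1);

    a[0] = 1.0;
    for (i = 1; i <= order; i++)
        a[i] = 0.0;

    err = r[0];
    if (err <= 0.0)
        return -1;

    for (m = 1; m <= order; m++) {
        double acc = r[m];
        double k;

        for (i = 1; i < m; i++)
            acc += a[i] * r[m - i];
        k = -acc / err;              /* reflection coefficient */

        a[m] = k;
        for (i = 1; i <= m / 2; i++) {
            double tmp = a[i] + k * a[m - i];
            a[m - i] += k * a[i];
            a[i] = tmp;
        }

        err *= 1.0 - k * k;          /* updated prediction error */
        if (err <= 0.0)
            return -1;
    }
    return 0;
}
```

With order = LPC_FILTERORDER the routine takes 11 autocorrelation values and yields 11 coefficients, of which 10 are unique.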
After determining the LPC coefficients, a bandwidth expansion
procedure is applied in order to smooth the spectral peaks in the
short-term spectrum. The bandwidth addition is obtained by the
following modification of the LPC coefficients:
lp1_bw[i] = lp1[i] * chirp^i; i=0,...,LPC_FILTERORDER
lp2_bw[i] = lp2[i] * chirp^i; i=0,...,LPC_FILTERORDER
where "chirp" is a real number between 0 and 1. It is RECOMMENDED to
use a value of 0.9.
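The bandwidth expansion amounts to one multiplication per coefficient. A minimal sketch, with a hypothetical function name, is:

```c
#include <assert.h>
#include <stddef.h>

/* lp_bw[i] = lp[i] * chirp^i for i = 0..order (illustrative sketch). */
static void bw_expand(const double *lp, double *lp_bw, double chirp,
                      int order)
{
    double g = 1.0;   /* running value of chirp^i */
    int i;

    assert(lp != NULL && lp_bw != NULL);
    assert(chirp > 0.0 && chirp < 1.0);

    for (i = 0; i <= order; i++) {
        lp_bw[i] = lp[i] * g;
        g *= chirp;
    }
}
```

Because chirp^0 = 1, the leading coefficient remains 1.0 after expansion.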
3.2.3 Computation of LSF Coefficients from LPC Coefficients
Thus far, two sets of LPC coefficients that represent the short-term
spectral characteristics of the speech signal for two different time
locations within the current block have been determined. These
coefficients SHOULD be quantized and interpolated. Before doing so,
it is advantageous to convert the LPC parameters into another type
of representation called Line Spectral Frequencies (LSF). The LSF
parameters are used because they are better suited for quantization
and interpolation than the regular LPC coefficients. Many
computationally efficient methods for calculating the LSFs from the
LPC coefficients have been proposed in the literature. The detailed
implementation of one applicable method can be found in Appendix
A.26. The two arrays of LSF coefficients obtained, lsf1 and lsf2,
are of dimension 10 (LPC_FILTERORDER).
3.2.4 Quantization of LSF Coefficients
Since the LPC filters defined by the two sets of LSFs are needed
also in the decoder, the LSF parameters need to be quantized and
transmitted as side information. The total number of bits required
to represent the quantization of the two LSF representations for one
block of speech is 40 with 20 bits used for each of lsf1 and lsf2.
For computational and storage reasons, the LSF vectors are quantized
using 3-split vector quantization (VQ). That is, the LSF vectors are
split into three sub-vectors which are each quantized with a regular
VQ. The quantized versions of lsf1 and lsf2, qlsf1 and qlsf2, are
obtained by using the same memoryless split VQ. The length of each
of these two LSF vectors is 10 and they are split into 3 sub-vectors
containing 3, 3 and 4 values respectively.
For each of the sub-vectors, a separate codebook of quantized values
has been designed using a standard VQ training method for a large
database containing speech from a large number of speakers recorded
under various conditions. The size of each of the three codebooks
associated with the split definitions above is:
int size_lsfCbTbl[LSF_NSPLIT] = {64,128,128};
The actual values of the vector quantization codebook that must be
used can be found in the reference code of Appendix A. Both sets of
LSF coefficients, lsf1 and lsf2, are quantized with a standard
memoryless split vector quantization (VQ) structure using the
squared error criterion in the LSF domain. The split VQ quantization
consists of the following steps:
1) Quantize the first 3 LSF coefficients (1 - 3) with a VQ codebook
of size 64.
2) Quantize the LSF coefficients 4, 5, and 6 with a VQ codebook of
size 128.
3) Quantize the last 4 LSF coefficients (7 - 10) with a VQ codebook
of size 128.
This procedure, repeated for lsf1 and lsf2, gives 6 quantization
indices and the quantized sets of LSF coefficients qlsf1 and qlsf2.
Each set of three indices is encoded with 6 + 7 + 7 = 20 bits. The
total number of bits used for LSF quantization in a block is thus 40
bits.
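The three steps above can be sketched as a nearest-neighbor search per sub-vector. This is only an illustration: the normative codebooks are those of Appendix A and are not reproduced here, so the codebooks and their sizes are passed in by the caller. The names vq_search, split_vq, and split_dim are hypothetical, and each codebook is assumed to be stored as a flat row-major array.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

#define LSF_NSPLIT 3

/* Sub-vector lengths 3, 3 and 4 as described in the text. */
static const int split_dim[LSF_NSPLIT] = {3, 3, 4};

/* Return the index of the codebook entry closest to x under the
   squared error criterion and copy that entry to qx.
   cb holds cb_size entries of dim values each. */
static int vq_search(const double *x, int dim, const double *cb,
                     int cb_size, double *qx)
{
    double best_dist = DBL_MAX;
    int best = 0;
    int j, k;

    for (j = 0; j < cb_size; j++) {
        double dist = 0.0;
        for (k = 0; k < dim; k++) {
            double e = x[k] - cb[j * dim + k];
            dist += e * e;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = j;
        }
    }
    for (k = 0; k < dim; k++)
        qx[k] = cb[best * dim + k];
    return best;
}

/* Quantize one 10-dimensional LSF vector with a 3-split VQ.
   cb[s] and cb_size[s] describe the codebook of split s. */
static void split_vq(const double *lsf, const double *const *cb,
                     const int *cb_size, double *qlsf, int *index)
{
    int pos = 0;
    int s;

    assert(lsf != NULL && cb != NULL && cb_size != NULL);
    assert(qlsf != NULL && index != NULL);

    for (s = 0; s < LSF_NSPLIT; s++) {
        index[s] = vq_search(lsf + pos, split_dim[s], cb[s],
                             cb_size[s], qlsf + pos);
        pos += split_dim[s];
    }
}
```

With codebook sizes 64, 128 and 128, the three indices fit in 6, 7 and 7 bits respectively, which gives the 20 bits per LSF vector stated above.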
3.2.5 Stability Check of LSF Coefficients
The LSF representation of the LPC filter has the nice property that
the coefficients are ordered by increasing value, i.e., lsf(n-1) <
lsf(n), 0 < n < 10, if the corresponding synthesis filter is stable.
Since we are employing a split VQ scheme, it is possible that at the
split boundaries the LSF coefficients are not ordered correctly and
hence the corresponding LP filter is unstable. To ensure that the
filter used is stable, a stability check is performed for the
quantized LSF vectors. If it turns out that the coefficients are not
ordered appropriately (with a safety margin of 50 Hz to ensure that
formant peaks are not too narrow) they will be moved apart. The
detailed method for this can be found in Appendix A.40. The same
procedure is performed in the decoder. This ensures that exactly the
same LSF representations are used in both encoder and decoder.
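The normative procedure is the one in Appendix A.40. Purely to illustrate the idea of moving coefficients apart, the sketch below re-centers any adjacent pair that is closer than a given margin. It assumes that the LSFs are expressed in radians on the interval 0 to PI, so that 50 Hz corresponds to 2*PI*50/FS with FS = 8000; the function name, the fixed number of passes, and the re-centering rule are all assumptions and may differ from the reference code.

```c
#include <assert.h>
#include <stddef.h>

/* 50 Hz expressed in radians at FS = 8000 Hz (assumed LSF unit). */
#define LSF_MARGIN (2.0 * 3.14159265358979323846 * 50.0 / 8000.0)
#define N_PASSES   2   /* hypothetical number of passes */

/* Illustrative only: if lsf[k] - lsf[k-1] < margin, place the two
   values symmetrically around their midpoint, margin apart. */
static void lsf_separate(double *lsf, int n, double margin)
{
    int pass, k;

    assert(lsf != NULL && n >= 2 && margin > 0.0);

    for (pass = 0; pass < N_PASSES; pass++) {
        for (k = 1; k < n; k++) {
            if (lsf[k] - lsf[k - 1] < margin) {
                double mid = 0.5 * (lsf[k] + lsf[k - 1]);
                lsf[k - 1] = mid - 0.5 * margin;
                lsf[k]     = mid + 0.5 * margin;
            }
        }
    }
}
```

Whatever rule is used, the encoder and decoder must apply the identical procedure so that both derive the same filters.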
3.2.6 Interpolation of LSF Coefficients
From the two sets of LSF coefficients that are computed for each
block of speech, different LSFs are obtained for each sub-block by
means of interpolation. This procedure is performed for the original
LSFs (lsf1 and lsf2), as well as the quantized versions qlsf1 and
qlsf2 since both versions are used in the encoder. A brief summary
of the interpolation scheme follows; the details are found in the C
code of Appendix A. In the first sub-block, the
average of the second LSF vector from the previous block and the
first LSF vector in the current block is used. For sub-blocks two
through five the LSFs used are obtained by linear interpolation from
lsf1 (and qlsf1) to lsf2 (and qlsf2) with lsf1 used in sub-block two
and lsf2 in sub-block five. In the last sub-block, lsf2 is used. For
the very first block it is assumed that the last LSF vector of the
previous block is equal to a predefined vector, lsfmeanTbl, that was
obtained by calculating the mean LSF vector of the LSF design
database.
lsfmeanTbl[LPC_FILTERORDER] = {0.281738, 0.445801, 0.663330,
0.962524, 1.251831, 1.533081, 1.850586, 2.137817,
2.481445, 2.777344}
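Read literally, the summary above gives lsf1 the weights 1, 2/3, 1/3 and 0 in sub-blocks two through five. A sketch under that reading follows; the function name and the array layout are hypothetical, and the C code of Appendix A remains the authoritative description.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_FILTERORDER 10
#define NSUB_30MS 6   /* sub-blocks per 30 ms block */

/* Interpolate LSFs for the six sub-blocks of a 30 ms block.
   lsf2_prev is the second LSF vector of the previous block
   (lsfmeanTbl for the very first block). */
static void interp_lsf_30ms(const double *lsf2_prev, const double *lsf1,
                            const double *lsf2,
                            double out[NSUB_30MS][LPC_FILTERORDER])
{
    int k, i;

    assert(lsf2_prev != NULL && lsf1 != NULL && lsf2 != NULL);

    /* Sub-block 1: average of previous lsf2 and current lsf1. */
    for (i = 0; i < LPC_FILTERORDER; i++)
        out[0][i] = 0.5 * (lsf2_prev[i] + lsf1[i]);

    /* Sub-blocks 2..5: linear interpolation from lsf1 to lsf2. */
    for (k = 1; k <= 4; k++) {
        double w1 = (4 - k) / 3.0;   /* 1, 2/3, 1/3, 0 */
        for (i = 0; i < LPC_FILTERORDER; i++)
            out[k][i] = w1 * lsf1[i] + (1.0 - w1) * lsf2[i];
    }

    /* Sub-block 6: lsf2 unchanged. */
    for (i = 0; i < LPC_FILTERORDER; i++)
        out[5][i] = lsf2[i];
}
```

The same routine would be called once with (lsf1, lsf2) and once with (qlsf1, qlsf2), since the encoder needs both versions.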
The interpolation method is standard linear interpolation in the LSF
domain. The interpolated LSF values are converted to LPC
coefficients for each sub-block. The unquantized and quantized LPC
coefficients form two sets of filters respectively. The unquantized
analysis filter for sub-block k:
                ___
                \
   Ak(z)= 1 +   >  ak(i)*z^(-i)
                /__
           i=1...LPC_FILTERORDER
And the quantized analysis filter for sub-block k:
                ___
                \
  A~k(z)= 1 +   >  a~k(i)*z^(-i)
                /__
           i=1...LPC_FILTERORDER
A reference implementation of the LSF encoding is given in Appendix
A.38. A reference implementation of the corresponding decoding can
be found in Appendix A.36.
3.2.7 LPC Analysis and Quantization for 20 ms frames
As stated before, the codec only calculates one set of LPC
parameters for the 20 ms frame size as opposed to two sets for 30 ms
frames. A single set of autocorrelation coefficients is calculated
on the LPC_LOOKBACK + BLOCKL = 80 + 160 = 240 samples. These samples
are windowed with the asymmetric window lpc_asymwinTbl, centered
over the third sub-frame, to form speech_hp_win. Autocorrelation
coefficients, acf, are calculated on the 240 samples in
speech_hp_win and then windowed exactly as in 3.2.1 (resulting in
acf_win).
This single set of windowed autocorrelation coefficients is used to
calculate LPC Coefficients, LSF Coefficients and quantized LSF
coefficients in exactly the same manner as in 3.2.3 to 3.2.4. As for
the 30 ms frame size, the 10 LSF coefficients are divided into three
sub-vectors of size 3, 3, 4 and quantized using the same scheme and
codebook as in 3.2.4 to finally get 3 quantization indices. The
quantized LSF coefficients are stabilized with the algorithm
described in 3.2.5.
From the set of LSF coefficients that was computed for this block
together with the LSF coefficients from the previous block,
different LSFs are obtained for each sub-block by means of
interpolation. The interpolation is done linearly in the LSF domain
over the 4 sub-blocks, so that the n-th sub-block uses the weight
(4-n)/4 for the LSF from the old frame and the weight n/4 for the
LSF from the current frame. For the very first block the mean LSF,
lsfmeanTbl, is used as the LSF from the previous block. Similar to
3.2.6, both unquantized, A(z), and quantized, A~(z), analysis
filters are calculated for each of the four sub-blocks.
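The 20 ms weighting rule can be written out directly. The sketch below assumes n runs from 1 to 4, which makes the fourth sub-block use the current LSF vector unchanged; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_FILTERORDER 10
#define NSUB_20MS 4   /* sub-blocks per 20 ms block */

/* Sub-block n (n = 1..4) uses (4-n)/4 * lsf_old + n/4 * lsf_cur.
   lsf_old is lsfmeanTbl for the very first block. */
static void interp_lsf_20ms(const double *lsf_old, const double *lsf_cur,
                            double out[NSUB_20MS][LPC_FILTERORDER])
{
    int n, i;

    assert(lsf_old != NULL && lsf_cur != NULL);

    for (n = 1; n <= NSUB_20MS; n++) {
        double w_cur = n / 4.0;
        for (i = 0; i < LPC_FILTERORDER; i++)
            out[n - 1][i] = (1.0 - w_cur) * lsf_old[i]
                          + w_cur * lsf_cur[i];
    }
}
```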
3.3 Calculation of the Residual
The block of speech samples is filtered by the quantized and
interpolated LPC analysis filters to yield the residual signal. In
particular, the corresponding LPC analysis filter for each 40-sample
sub-block is used to filter the speech samples for the same sub-
block. The filter memory at the end of each sub-block is carried
over to the LPC filter of the next sub-block. The signal at the
output of each LP analysis filter constitutes the residual signal
for the corresponding sub-block.
A reference implementation of the LPC analysis filters is given in
Appendix A.10.
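As a rough illustration of the memory carry-over (not the Appendix A.10 code), an FIR analysis filter can keep the last LPC_FILTERORDER input samples in a state array that the caller passes from one sub-block to the next. The function name and the state layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_FILTERORDER 10
#define SUBL 40   /* samples per sub-block */

/* out[n] = in[n] + sum_{i=1..LPC_FILTERORDER} a[i] * in[n-i]
   mem[0..LPC_FILTERORDER-1] holds in[n-1], in[n-2], ... and is
   updated so the next sub-block continues where this one ended. */
static void analysis_filter(const double *in, const double *a, int len,
                            double *mem, double *out)
{
    int n, i;

    assert(in != NULL && a != NULL && mem != NULL && out != NULL);

    for (n = 0; n < len; n++) {
        double x = in[n];
        double acc = x;

        for (i = 1; i <= LPC_FILTERORDER; i++)
            acc += a[i] * mem[i - 1];

        /* Shift the delay line and store the newest input sample. */
        for (i = LPC_FILTERORDER - 1; i > 0; i--)
            mem[i] = mem[i - 1];
        mem[0] = x;

        out[n] = acc;
    }
}
```

A caller would invoke this once per sub-block with that sub-block's quantized, interpolated coefficients, reusing the same mem array throughout.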
3.4 Perceptual Weighting Filter
In principle, any good design of a perceptual weighting filter can
be applied in the encoder without compromising this codec
definition. However, it is RECOMMENDED to use the perceptual
weighting filter specified below:
Weighting filter for sub-block k:
Wk(z)=1/Ak(z/LPC_CHIRP_WEIGHTDENUM), where
LPC_CHIRP_WEIGHTDENUM = 0.4222
This is a simple design with low complexity that is applied in the
LPC residual domain. Here Ak(z) is the filter obtained from
unquantized but interpolated LSF coefficients.
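Since Ak(z/g) = 1 + sum ak(i) * g^i * z^(-i), the weighting filter is an all-pole filter whose denominator coefficients are the unquantized ak(i) scaled by g^i, with g = LPC_CHIRP_WEIGHTDENUM. A minimal sketch follows; the function names and the state layout are hypothetical and do not come from the reference code.

```c
#include <assert.h>
#include <stddef.h>

#define LPC_FILTERORDER 10
#define LPC_CHIRP_WEIGHTDENUM 0.4222

/* aw[i] = a[i] * LPC_CHIRP_WEIGHTDENUM^i, i = 0..LPC_FILTERORDER */
static void weight_coeffs(const double *a, double *aw)
{
    double g = 1.0;
    int i;

    assert(a != NULL && aw != NULL);
    for (i = 0; i <= LPC_FILTERORDER; i++) {
        aw[i] = a[i] * g;
        g *= LPC_CHIRP_WEIGHTDENUM;
    }
}

/* All-pole filter 1/Aw(z):
   out[n] = in[n] - sum_{i=1..LPC_FILTERORDER} aw[i] * out[n-i]
   mem[0..LPC_FILTERORDER-1] holds out[n-1], out[n-2], ... */
static void weighting_filter(const double *in, const double *aw, int len,
                             double *mem, double *out)
{
    int n, i;

    assert(in != NULL && aw != NULL && mem != NULL && out != NULL);

    for (n = 0; n < len; n++) {
        double acc = in[n];

        for (i = 1; i <= LPC_FILTERORDER; i++)
            acc -= aw[i] * mem[i - 1];

        for (i = LPC_FILTERORDER - 1; i > 0; i--)
            mem[i] = mem[i - 1];
        mem[0] = acc;

        out[n] = acc;
    }
}
```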
3.5 Start State Encoder
The start state is quantized using a common 6-bit scalar quantizer