finally, the 16 bit pcm value of the first sample (isamp2) is written out for each channel.
then the rest of the block may be encoded. (note that the first encoded value will be for the 3rd sample in the block since the first two are contained in the header.)
while there are more samples in the block to encode:
predict the next sample from the previous two samples.
lpredsamp = ((isamp1 * icoef1) + (isamp2 * icoef2)) / fixed_point_coef_base
the 4 bit signed error delta is produced and overflow/underflow is prevented.
ierrordelta = (sample(n) - lpredsamp) / idelta
if ierrordelta is too large, make it the maximum allowable size.
if ierrordelta is too small, make it the minimum allowable size.
then the nibble ierrordelta is written out.
putnibble( ierrordelta )
add the 'error in prediction' to the predicted next sample and prevent over/underflow errors.
lnewsample = lpredsamp + (idelta * ierrordelta)
if lnewsample is too large, make it the maximum allowable size.
if lnewsample is too small, make it the minimum allowable size.
adjust the quantization step size used to calculate the 'error in prediction'.
idelta = idelta * adaptiontable[ ierrordelta] / fixed_point_adaption_base
if idelta is too small, make it the minimum allowable size.
update the record of previous samples.
isamp2 = isamp1;
isamp1 = lnewsample;
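the encoding loop above can be sketched in c as a per-sample step. this is a minimal sketch, not the msadpcm.c sample code: the adaptation table values, the fixed-point bases of 256, the minimum step size of 16 and the -8..7 and 16 bit clamping limits are the commonly used ms adpcm parameters and are assumptions here, as are the names msadpcm_state and msadpcm_encode_sample. the adaptation table is indexed with the unsigned nibble (0-15), so negative error deltas map to entries 8-15.

```c
/* assumed fixed-point bases and minimum step size */
#define FIXED_POINT_COEF_BASE     256
#define FIXED_POINT_ADAPTION_BASE 256
#define DELTA_MIN                 16

/* assumed adaptation table, indexed by the unsigned 4-bit nibble */
static const int adaption_table[16] = {
    230, 230, 230, 230, 307, 409, 512, 614,
    768, 614, 512, 409, 307, 230, 230, 230
};

/* hypothetical per-channel encoder state */
typedef struct {
    int samp1;  /* isamp1: most recent reconstructed sample */
    int samp2;  /* isamp2: the sample before isamp1 */
    int coef1;  /* icoef1 */
    int coef2;  /* icoef2 */
    int delta;  /* idelta: quantization step size */
} msadpcm_state;

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* encode one 16-bit sample; returns the nibble to write (0..15) */
static unsigned msadpcm_encode_sample(msadpcm_state *s, int sample)
{
    /* predict the next sample from the previous two */
    int pred = (s->samp1 * s->coef1 + s->samp2 * s->coef2) / FIXED_POINT_COEF_BASE;

    /* 4-bit signed error delta, limited to -8..7 */
    int err = clamp_int((sample - pred) / s->delta, -8, 7);
    unsigned nibble = (unsigned)err & 0x0Fu;

    /* reconstruct the sample the way the decoder will, limited to 16 bits */
    int newsamp = clamp_int(pred + s->delta * err, -32768, 32767);

    /* adapt the step size, never letting it fall below the minimum */
    s->delta = s->delta * adaption_table[nibble] / FIXED_POINT_ADAPTION_BASE;
    if (s->delta < DELTA_MIN)
        s->delta = DELTA_MIN;

    /* update the record of previous samples */
    s->samp2 = s->samp1;
    s->samp1 = newsamp;
    return nibble;
}
```

each call returns one nibble; the caller still has to pack the nibbles into bytes.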
sample c code
sample c code is contained in the file msadpcm.c, which is available with this document in electronic form and separately. see the overview section for how to obtain this sample code.
cvsd wave type
added: 07/21/92
author: dsp solutions, formerly digispeech
fact chunk
this chunk is required for all wave formats other than wave_format_pcm. it stores file dependent information about the contents of the wave data. it currently specifies the time length of the data in samples.
wave format header
#define wave_format_ibm_cvsd (0x0005)
wformattag this must be set to wave_format_ibm_cvsd
nchannels number of channels in the wave, 1 for mono, 2 for stereo...
nsamplespersec frequency the source was sampled at. see chart below.
navgbytespersec average data rate. see chart below. (one of 1800, 2400, 3000, 3600, 4200, or 4800)
playback software can estimate the buffer size using the <navgbytespersec> value.
nblockalign set to 2048 to provide efficient caching of the file from cd-rom.
playback software needs to process a multiple of <nblockalign> bytes of data at a time, so that the value of <nblockalign> can be used for buffer alignment.
wbitspersample this is the number of bits per sample of data. this is always 1 for cvsd.
cbsize the size in bytes of the rest of the wave format header. this is zero for cvsd.
the digispeech cvsd compression format is compatible with the ibm ps/2 speech adapter, which uses a motorola mc3418 for cvsd modulation. the motorola chip uses only one algorithm which can work at variable sampling clock rates. the cvsd algorithm compresses each input audio sample to 1 bit. an acceptable quality of sound is achieved using high sampling rates. the digispeech ds201 adapter supports six cvsd sampling frequencies, which are being used by most software using the ibm ps/2 speech adapter:
sample rate   bytes/second
14,400 hz     1800
19,200 hz     2400
24,000 hz     3000
28,800 hz     3600
33,600 hz     4200
38,400 hz     4800
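because wbitspersample is always 1 for cvsd, every data rate in the chart is simply the sample rate divided by eight. a minimal sketch of that arithmetic; the function names are hypothetical, and multiplying by <nchannels> is an assumption, since the chart only lists mono rates.

```c
#include <stdint.h>

/* the six ds201 sampling frequencies from the chart above */
static const uint32_t cvsd_rates[6] = {14400, 19200, 24000, 28800, 33600, 38400};

/* hypothetical helper: 1 bit per sample, so 8 samples fit in one byte */
static uint32_t cvsd_avg_bytes_per_sec(uint32_t samples_per_sec, uint16_t channels)
{
    return samples_per_sec * channels / 8u;
}

/* hypothetical helper: is the rate one of the six listed in the chart? */
static int cvsd_rate_supported(uint32_t samples_per_sec)
{
    for (int i = 0; i < 6; i++)
        if (cvsd_rates[i] == samples_per_sec)
            return 1;
    return 0;
}
```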
the cvsd format is a compression scheme which has been used by ibm and is supported by the ibm ps/2 speech adapter card. digispeech also has a card that uses this compression scheme. it is not digispeech's policy to disclose any of these algorithms to any third party vendor.
ccitt standard companded wave types
added: 05/22/92
author: microsoft, dsp solutions formerly digispeech, vocaltec, artisoft
fact chunk
this chunk is required for all wave formats other than wave_format_pcm. it stores file dependent information about the contents of the wave data. it currently specifies the time length of the data in samples.
wave format header
#define wave_format_alaw (0x0006)
#define wave_format_mulaw (0x0007)
wformattag this must be set to one of wave_format_alaw, wave_format_mulaw
nchannels number of channels in the wave, 1 for mono, 2 for stereo...
nsamplespersec frequency of the wave file. (8000, 11025, 22050, 44100).
navgbytespersec average data rate.
playback software can estimate the buffer size using the <navgbytespersec> value.
nblockalign size of the blocks in bytes.
playback software needs to process a multiple of <nblockalign> bytes of data at a time, so that the value of <nblockalign> can be used for buffer alignment.
wbitspersample this is the number of bits per sample of data. (this is 8 for all the companded formats.)
cbsize the size in bytes of the extra information in the extended wave 'fmt' header. this should be zero.
see the ccitt g.711 specification for details of the data format.
this is a ccitt (international telegraph and telephone consultative committee) specification. their address is:
palais des nations
ch-1211 geneva 10, switzerland
phone: 22 7305111
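for orientation, a minimal sketch of the mu-law (wave_format_mulaw) conversion in c. it follows the widely used bias-and-segment formulation (bias 0x84, clip level 32635, output byte inverted) and is an illustration only, not a substitute for the g.711 text; the function names are hypothetical, and a-law (wave_format_alaw), which uses a different segment layout, is not shown. since each companded sample occupies one byte, <navgbytespersec> equals <nsamplespersec> * <nchannels>.

```c
/* encode one 16-bit linear sample as an 8-bit mu-law code */
static unsigned char linear_to_ulaw(int pcm)
{
    const int bias = 0x84;
    const int clip = 32635;
    int sign = 0;

    if (pcm < 0) {            /* work on the magnitude, remember the sign */
        pcm = -pcm;
        sign = 0x80;
    }
    if (pcm > clip)
        pcm = clip;
    pcm += bias;

    /* exponent = segment number, from the highest set bit (bits 14..8) */
    int exponent = 7;
    for (int mask = 0x4000; exponent > 0 && (pcm & mask) == 0; mask >>= 1)
        exponent--;

    int mantissa = (pcm >> (exponent + 3)) & 0x0F;
    return (unsigned char)~(sign | (exponent << 4) | mantissa);
}

/* decode one 8-bit mu-law code back to a 16-bit linear sample */
static int ulaw_to_linear(unsigned char code)
{
    code = (unsigned char)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int sample = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return sign ? -sample : sample;
}
```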
oki adpcm wave types
added: 05/22/92
author: digispeech, vocaltec, wang
fact chunk
this chunk is required for all wave formats other than wave_format_pcm. it stores file dependent information about the contents of the wave data. it currently specifies the time length of the data in samples.
wave format header
#define wave_format_oki_adpcm (0x0010)
typedef struct oki_adpcmwaveformat_tag {
    WAVEFORMATEX wfx;
    WORD         wpole;
} okiadpcmwaveformat;
wformattag this must be set to wave_format_oki_adpcm
nchannels number of channels in the wave, 1 for mono, 2 for stereo.
nsamplespersec sample rate of the wave file. (8000, 11025, 22050, 44100).
navgbytespersec average data rate.
playback software can estimate the buffer size using the <navgbytespersec> value.
nblockalign this is dependent upon the number of bits per sample and the number of channels.
wbitspersample   nchannels   nblockalign
3                1           3
3                2           6
4                1           1
4                2           1
playback software needs to process a multiple of <nblockalign> bytes of data at a time, so that the value of <nblockalign> can be used for buffer alignment.
wbitspersample this is the number of bits per sample of data. (oki can be 3 or 4)
cbsize the size in bytes of the extra information in the extended wave 'fmt' header. this should be 2.
wpole high frequency emphasis value
this format is created and read by the oki adpcm chip set, which is used by a number of card manufacturers.
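the nblockalign chart follows from how the codes pack into bytes: eight 3 bit codes fill exactly three bytes per channel, while one byte presumably holds either two 4 bit codes of a mono stream or one 4 bit code per channel of a stereo stream. a minimal sketch that reproduces the chart; the function names are hypothetical, and computing <navgbytespersec> as <nsamplespersec> * <nchannels> * <wbitspersample> / 8 is an assumption, since the text only says "average data rate".

```c
#include <stdint.h>

/* hypothetical helper reproducing the nblockalign chart; 0 means unsupported */
static uint16_t oki_block_align(uint16_t bits_per_sample, uint16_t channels)
{
    if (channels < 1 || channels > 2)
        return 0;
    if (bits_per_sample == 3)
        return (uint16_t)(3 * channels);  /* 8 codes per channel in 3 bytes */
    if (bits_per_sample == 4)
        return 1;                         /* 2 codes per byte */
    return 0;
}

/* assumed data-rate relation: total bits per second divided by 8 */
static uint32_t oki_avg_bytes_per_sec(uint32_t samples_per_sec,
                                      uint16_t bits_per_sample, uint16_t channels)
{
    return samples_per_sec * channels * bits_per_sample / 8u;
}
```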
ima adpcm wave type
the ima adpcm and the dvi adpcm are identical. please see the following section on the dvi adpcm wave type for a full description.
#define wave_format_ima_adpcm (0x0011)
dvi adpcm wave type
added: 12/16/92
author: intel
please note that the dvi adpcm wave type is identical to the ima adpcm wave type.
fact chunk
this chunk is required for all wave formats other than wave_format_pcm. it stores file dependent information about the contents of the wave data. it currently specifies the time length of the data in samples.
wave format header
#define wave_format_dvi_adpcm (0x0011)
typedef struct dvi_adpcmwaveformat_tag {
    WAVEFORMATEX wfx;
    WORD         wsamplesperblock;
} dviadpcmwaveformat;
wformattag this must be set to wave_format_dvi_adpcm.
nchannels number of channels in the wave, 1 for mono, 2 for stereo...
nsamplespersec sample rate of the wave file. this should be 8000, 11025, 22050 or 44100. other sample rates are allowed.
navgbytespersec total average data rate.
playback software can estimate the buffer size for a selected amount of time by using the <navgbytespersec> value.
nblockalign this is dependent upon the number of bits per sample.
wbitspersample   nblockalign
3                ((n * 3) + 1) * 4 * <nchannels>
4                (n + 1) * 4 * <nchannels>
where n = 0, 1, 2, 3, ...
the recommended block size for coding is
256 * <nchannels> * max(1, <nsamplespersec> / 11 khz) bytes
smaller values make the block header a more significant storage overhead. however, it is up to the encoder implementation to choose the optimal value for <nblockalign> within the constraints given above, and the decoder must be able to handle any valid block size. playback software needs to process a multiple of <nblockalign> bytes of data at a time, so the value of <nblockalign> can be used for allocating buffers.
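a minimal sketch of the recommended block size and of the validity rule from the <nblockalign> chart. the scaling factor here is taken as max(1, <nsamplespersec> / 11025) with integer division, which is an assumption; with it the rule gives 256, 512 and 1024 bytes per channel for 11025, 22050 and 44100 hz (and 256 for 8000 hz), all of which satisfy the 4 bit constraint. the function names are hypothetical.

```c
#include <stdint.h>

/* hypothetical: recommended nblockalign, assuming max(1, rate / 11025) */
static uint32_t dvi_recommended_block_align(uint32_t samples_per_sec, uint16_t channels)
{
    uint32_t scale = samples_per_sec / 11025u;
    if (scale < 1)
        scale = 1;
    return 256u * channels * scale;
}

/* hypothetical: check a block size against the nblockalign chart */
static int dvi_block_align_valid(uint32_t block_align, uint16_t bits_per_sample,
                                 uint16_t channels)
{
    uint32_t unit = 4u * channels;        /* one 32-bit word per channel */
    if (channels == 0 || block_align == 0 || block_align % unit != 0)
        return 0;
    uint32_t words = block_align / unit;  /* words per channel, header included */
    if (bits_per_sample == 4)
        return words >= 1;                /* (n + 1) * 4 * nchannels */
    if (bits_per_sample == 3)
        return (words - 1) % 3 == 0;      /* ((n * 3) + 1) * 4 * nchannels */
    return 0;
}
```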
wbitspersample this is the number of bits per sample of data. dvi adpcm supports 3 or 4 bits per sample.
cbsize the size in bytes of the extra information in the extended wave 'fmt' header. this should be 2.
wsamplesperblock count of the number of samples per channel per block.
block
the block is defined to be <nblockalign> bytes in length. for dvi adpcm this must be a multiple of 4 bytes, since all information in the block is aligned on 32 bit word boundaries.
the block has two parts, the header and the data. the two together are <nblockalign> bytes in length. the header is an array of <nchannels> 32 bit header words, one per channel, and it is followed by the data words, which are interleaved one 32 bit word per channel in turn.
header
this is a c structure that defines the dvi adpcm block header.
typedef struct dvi_adpcmblockheader_tag {
    short isamp0;           /* 16 bits, so the header fills one 32-bit word */
    BYTE  bsteptableindex;
    BYTE  breserved;
} dvi_adpcmblockheader;
field description
isamp0 the first sample value of the block. when decoding, this will be used as the previous sample to start decoding with.
bsteptableindex the current index into the step table array. (0 - 88)
breserved this byte is reserved for future use.
a block contains an array of <nchannels> header structures as defined above, one per channel. each header structure occupies one 32 bit word: bytes 0 and 1 hold isamp0 (low byte first), byte 2 holds bsteptableindex, and byte 3 holds breserved.
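a minimal sketch of reading one channel's header word from the raw block bytes. it assumes the little-endian byte order used throughout riff (isamp0 low byte first); reading the bytes individually avoids depending on the compiler's struct layout or on the width of int. the struct and function names are hypothetical.

```c
#include <stdint.h>

/* hypothetical decoded form of one dvi adpcm block header word */
typedef struct {
    int16_t samp0;       /* isamp0: first sample of the block */
    uint8_t step_index;  /* bsteptableindex: 0..88 */
    uint8_t reserved;    /* breserved */
} dvi_header;

/* parse the header word for channel ch (0-based) at the start of a block;
   returns 0 if the step table index is out of range */
static int dvi_read_header(const uint8_t *block, unsigned ch, dvi_header *out)
{
    const uint8_t *p = block + 4u * ch;  /* headers are consecutive 32-bit words */
    uint16_t raw = (uint16_t)(p[0] | (p[1] << 8));

    /* portable two's-complement conversion of the 16-bit value */
    out->samp0 = raw >= 0x8000u ? (int16_t)((int32_t)raw - 65536) : (int16_t)raw;
    out->step_index = p[2];
    out->reserved = p[3];
    return out->step_index <= 88;
}
```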
data
the data words are interpreted differently depending on the number of bits per sample selected.
for 4 bit dvi adpcm (where <wbitspersample> is equal to four) each data word contains eight sample codes: data word n of a channel holds samples p through p + 7 of that channel.
where:
n = a data word for a given channel, in the range of 0 to
<nblockalign> / ( 4 * <nchannels> ) - <nchannels> - 1
p = ( n * 8 ) + 1
sample 0 is always included in the block header for the channel.
each sample is 4 bits in length. each block contains a total of <wsamplesperblock> samples for each channel.
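a minimal sketch of unpacking one 4 bit data word into its eight sample codes. taking the bytes in file order and the low nibble of each byte first is an assumption drawn from common ima adpcm decoders; the function name is hypothetical.

```c
#include <stdint.h>

/* split one 32-bit data word (4 bytes, in file order) into 8 sample codes,
   assuming the low nibble of each byte comes first */
static void dvi_unpack4(const uint8_t word[4], uint8_t codes[8])
{
    for (int i = 0; i < 4; i++) {
        codes[2 * i]     = word[i] & 0x0F;         /* earlier sample */
        codes[2 * i + 1] = (uint8_t)(word[i] >> 4); /* later sample */
    }
}
```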
for 3 bit dvi adpcm (where <wbitspersample> is equal to three) each data word holds 32 / 3 (about 10.667) sample codes, so it takes three words to hold an integral number of codes at 3 bits per code. the number of data words for each channel must therefore be a multiple of three (12 bytes), and each group of three words contains 32 sample codes: samples p through p + 31 of channel m.
where:
m = one of the channels, in the range of 1 to <nchannels>
n = the first data word in a group of three data words for channel m, in the
range of 0 to <nblockalign> / ( 4 * <nchannels> ) - <nchannels> - 1
p = ( ( n / 3 ) * 32 ) + 1
sample 0 is always included in the block header for the channel.
each sample is 3 bits in length. each block contains a total of <wsamplesperblock> samples for each channel.
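since sample 0 of each channel is stored in the header, <wsamplesperblock> follows directly from <nblockalign>: one header sample plus eight codes per data word (4 bit), or 32 codes per group of three data words (3 bit). a minimal sketch with a hypothetical function name; for example, a 256 byte mono 4 bit block gives 505 samples and a 2048 byte stereo 4 bit block gives 2041.

```c
#include <stdint.h>

/* hypothetical: samples per channel per block; returns 0 for an invalid block size */
static uint32_t dvi_samples_per_block(uint32_t block_align, uint16_t bits_per_sample,
                                      uint16_t channels)
{
    if (channels == 0 || block_align % (4u * channels) != 0)
        return 0;
    uint32_t words = block_align / (4u * channels);  /* per channel, header included */
    if (words < 1)
        return 0;
    uint32_t data_words = words - 1;                 /* minus the header word */
    if (bits_per_sample == 4)
        return data_words * 8u + 1u;                 /* 8 codes per word + sample 0 */
    if (bits_per_sample == 3 && data_words % 3 == 0)
        return (data_words / 3u) * 32u + 1u;         /* 32 codes per 3 words + sample 0 */
    return 0;
}
```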
dvi adpcm algorithm
each channel of a dvi adpcm file can be encoded and decoded independently, so this document is written as if only one channel is being processed. because the channels are interleaved, multiple channels may be encoded or decoded in parallel using independent local storage and temporaries.
note that the process for encoding/decoding one block is independent from the process for the next block. therefore the process is described for one block only, and may be repeated for other blocks.
the processes for encoding and decoding are discussed below.
tables
the dvi adpcm algorithm relies on two tables to encode and decode audio samples. these are the step table and the index table. the contents of these tables are fixed for this algorithm. the 3 and 4 bit versions of the dvi adpcm algorithm use the same step table, which is:
const int steptab[ 89 ] = {
7, 8, 9, 10, 11, 12, 13, 14,
16, 17, 19, 21, 23, 25, 28, 31,
34, 37, 41, 45, 50, 55, 60, 66,
73, 80, 88, 97, 107, 118, 130, 143,
157, 173, 190, 209, 230, 253, 279, 307,
337, 371, 408, 449, 494, 544, 598, 658,
724, 796, 876, 963, 1060, 1166, 1282, 1411,
1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024,
3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794,
32767 };
however, the index table differs between the two bit rates. for 4 bit dvi adpcm the contents of the index table are:
const int indextab[ 16 ] = { -1, -1, -1, -1, 2, 4, 6, 8,
-1, -1, -1, -1, 2, 4, 6, 8 };
for 3 bit dvi adpcm the contents of the index table are:
const int indextab[ 8 ] = { -1, -1, 1, 2,
-1, -1, 1, 2 };
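the two tables work together in every decode step: the step table sets the quantizer size, and the index table moves the position in the step table after each code. a minimal sketch of one 4 bit decode step using the 4 bit index table; adding the steptab[ index ] >> 3 rounding term, treating bit 8 as the sign, clamping the result to 16 bits and clamping index to 0..88 follow the widely used ima adpcm reference decoder and are assumptions here. the names are hypothetical, and the 3 bit variant is not shown.

```c
static const int steptab[89] = {
    7, 8, 9, 10, 11, 12, 13, 14,
    16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66,
    73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411,
    1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024,
    3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484,
    7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794,
    32767 };

static const int indextab4[16] = { -1, -1, -1, -1, 2, 4, 6, 8,
                                   -1, -1, -1, -1, 2, 4, 6, 8 };

/* hypothetical per-channel decoder state */
typedef struct {
    int prev;   /* previous sample, initialised from isamp0 */
    int index;  /* step table index, initialised from bsteptableindex */
} dvi_state;

/* decode one 4-bit sample code and update the state */
static int dvi_decode4(dvi_state *st, unsigned code)
{
    int step = steptab[st->index];
    int diff = step >> 3;              /* assumed rounding term */
    if (code & 4) diff += step;
    if (code & 2) diff += step >> 1;
    if (code & 1) diff += step >> 2;
    if (code & 8) diff = -diff;        /* assumed sign bit */

    int samp = st->prev + diff;        /* clamp to the 16-bit range */
    if (samp > 32767) samp = 32767;
    if (samp < -32768) samp = -32768;
    st->prev = samp;

    st->index += indextab4[code & 0x0F];
    if (st->index < 0) st->index = 0;
    if (st->index > 88) st->index = 88;
    return samp;
}
```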
decoding
this section describes the algorithm used for decoding 4 bit dvi adpcm. this procedure must be followed for each block of each channel.
get the first sample, samp0, from the block header
set the initial step table index, index, from the block header
output the first sample, samp0
set the previous sample value:
sampx-1 = samp0
while there are still samples to decode
get the next sample code, sampx code
calculate the new sample:
calculate the difference:
diff = 0
if ( sampx code & 4 )
diff = diff + steptab[ index ]
if ( sampx code & 2 )
diff = diff + ( steptab[ index ] >> 1 )
if ( sampx code & 1 )
diff = diff + ( steptab[ index ] >> 2 )