📄 chap09.htm
字号:
Hadamard Transform</span><span
style='font-family:宋体;"Times New Roman"'>,简称</span><span lang=EN-US>DHT)</span><span
style='font-family:宋体;"Times New Roman"'>。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>其它的编码方法也有很多,如混合编码</span><span
lang=EN-US>(Hybird Coding)</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>、矢量量化</span><span
lang=EN-US>(Vector Quantize</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>,</span><span
lang=EN-US>VQ) </span><span style='font-family:宋体;
"Times New Roman"'>、</span><span lang=EN-US>LZW</span><span
style='font-family:宋体;"Times New Roman"'>算法。在这里,我们只介绍</span><span lang=EN-US>LZW</span><span
style='font-family:宋体;"Times New Roman"'>算法的大体思想。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>值得注意的是,近些年来出现了很多新的压缩编码方法,如使用人工神经元网络</span><span
lang=EN-US>(Artificial Neural Network</span><span style='font-family:宋体;
"Times New Roman"'>,简称</span><span
lang=EN-US>ANN)</span><span style='font-family:宋体;
"Times New Roman"'>的压缩编码算法、分形</span><span lang=EN-US>(Fractl)</span><span
style='font-family:宋体;"Times New Roman"'>、小波</span><span lang=EN-US>(Wavelet)
</span><span
style='font-family:宋体;"Times New Roman"'>、基于对象</span><span lang=EN-US>(Object
Based)</span><span
style='font-family:宋体;"Times New Roman"'>的压缩编码算法、基于模型</span><span lang=EN-US>(Model
–Based)</span><span
style='font-family:宋体;"Times New Roman"'>的压缩编码算法</span><span lang=EN-US>(</span><span
style='font-family:宋体;"Times New Roman"'>应用在</span><span lang=EN-US>MPEG4</span><span
style='font-family:宋体;"Times New Roman"'>及未来的视频压缩编码标准中</span><span lang=EN-US>)</span><span
style='font-family:宋体;"Times New Roman"'>。这些都超出了本书的范围。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>本章的最后,我们将以</span><span
lang=EN-US>JPEG</span><span style='font-family:宋体;
"Times New Roman"'>压缩编码标准为例,看看上面的几种编码方法在实际的压缩编码中是怎样应用的。</span></p>
<h2> <span
lang=EN-US>9.1</span> <span lang=EN-US> </span><a name="_Toc486331908"></a><a
name="_Toc486332908"></a><a name="_Toc486339017"></a><a name="_Toc454810882"></a><a
name="_Toc454856656"><span><span>哈夫曼编码</span></span></a></h2>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>哈夫曼</span><span
lang=EN-US>(Huffman)</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>编码是一种常用的压缩编码方法,是</span><span
lang=EN-US>Huffman</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>于</span><span
lang=EN-US>1952</span><span style='font-family:宋体;
"Times New Roman"'>年为压缩文本文件建立的。它的基本原理是频繁使用的数据用较短的代码代替,较少使用的数据用较长的代码代替,每个数据的代码各不相同。这些代码都是二进制码,且码的长度是可变的。举个例子:假设一个文件中出现了</span><span
lang=EN-US>8</span><span style='font-family:宋体;
"Times New Roman"'>种符号</span><span lang=EN-US>S0,S1,S2,S3,S4,S5,S6,S7</span><span
style='font-family:宋体;"Times New Roman"'>,那么每种符号要编码,至少需要</span><span lang=EN-US>3</span><span
style='font-family:宋体;"Times New Roman"'>比特。假设编码成</span><span lang=EN-US>000,001,010,011,100,101,110,111(</span><span
style='font-family:宋体;"Times New Roman"'>称做码字</span><span lang=EN-US>)</span><span style='font-family:
宋体;"Times New Roman"'>。那么符号序列</span><span
lang=EN-US>S0S1S7S0S1S6S2S2S3S4S5S0S0S1</span><span style='font-family:宋体;
"Times New Roman"'>编码后变成</span><span
lang=EN-US>000001111000001110010010011100101000000001</span><span
style='font-family:宋体;"Times New Roman"'>,共用了</span><span lang=EN-US>42</span><span style='font-family:
宋体;"Times New Roman"'>比特。我们发现</span><span
lang=EN-US>S0</span><span style='font-family:宋体;
"Times New Roman"'>,</span><span lang=EN-US>S1</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>S2</span><span style='font-family:
宋体;"Times New Roman"'>这三个符号出现的频率比较大,其它符号出现的频率比较小,如果我们采用一种编码方案使得</span><span
lang=EN-US>S0</span><span style='font-family:宋体;
"Times New Roman"'>,</span><span lang=EN-US>S1</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>S2</span><span style='font-family:
宋体;"Times New Roman"'>的码字短,其它符号的码字长,这样就能够减少占用的比特数。例如,我们采用这样的编码方案:</span><span
lang=EN-US>S0</span><span style='font-family:宋体;
"Times New Roman"'>到</span><span lang=EN-US>S7</span><span
style='font-family:宋体;"Times New Roman"'>的码字分别</span><span lang=EN-US>01,11,101,0000,0001,0010,0011,100</span><span
style='font-family:宋体;"Times New Roman"'>,那么上述符号序列变成</span><span lang=EN-US>011110001110011101101000000010010010111</span><span
style='font-family:宋体;"Times New Roman"'>,共用了</span><span lang=EN-US>39</span><span style='font-family:
宋体;"Times New Roman"'>比特,尽管有些码字如</span><span
lang=EN-US>S3</span><span style='font-family:宋体;
"Times New Roman"'>,</span><span lang=EN-US>S4</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>S5</span><span style='font-family:
宋体;"Times New Roman"'>,</span><span
lang=EN-US>S6</span><span style='font-family:宋体;
"Times New Roman"'>变长了</span><span lang=EN-US>(</span><span
style='font-family:宋体;"Times New Roman"'>由</span><span lang=EN-US>3</span><span style='font-family:
宋体;"Times New Roman"'>位变成</span><span
lang=EN-US>4</span><span style='font-family:宋体;
"Times New Roman"'>位</span><span lang=EN-US>)</span><span
style='font-family:宋体;"Times New Roman"'>,但使用频繁的几个码字如</span><span lang=EN-US>S0</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>S1</span><span style='font-family:
宋体;"Times New Roman"'>变短了,所以实现了压缩。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>上述的编码是如何得到的呢?随意乱写是不行的。编码必须保证不能出现一个码字和另一个的前几位相同的情况,比如说,如果</span><span
lang=EN-US>S0</span><span style='font-family:宋体;
"Times New Roman"'>的码字为</span><span lang=EN-US>01</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>S2</span><span style='font-family:
宋体;"Times New Roman"'>的码字为</span><span
lang=EN-US>011</span><span style='font-family:宋体;
"Times New Roman"'>,那么当序列中出现</span><span lang=EN-US>011</span><span
style='font-family:宋体;"Times New Roman"'>时,你不知道是</span><span lang=EN-US>S0</span><span
style='font-family:宋体;"Times New Roman"'>的码字后面跟了个</span><span lang=EN-US>1</span><span
style='font-family:宋体;"Times New Roman"'>,还是完整的一个</span><span lang=EN-US>S2</span><span
style='font-family:宋体;"Times New Roman"'>的码字。我们给出的编码能够保证这一点。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>下面给出具体的</span><span
lang=EN-US>Huffman</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>编码算法。</span></p>
<p style='line-height:
18.0pt;'> <span
lang=EN-US>(1)<span style='font:7.0pt "Times New Roman"'>
</span></span> <span style='font-family:宋体;
"Times New Roman"'>首先统计出每个符号出现的频率,上例</span><span
lang=EN-US>S0</span><span style='font-family:宋体;
"Times New Roman"'>到</span><span lang=EN-US>S7</span><span
style='font-family:宋体;"Times New Roman"'>的出现频率分别为</span><span lang=EN-US>4/14</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>3/14</span><span style='font-family:
宋体;"Times New Roman"'>,</span><span
lang=EN-US>2/14</span><span style='font-family:宋体;
"Times New Roman"'>,</span><span lang=EN-US>1/14</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>1/14</span><span style='font-family:
宋体;"Times New Roman"'>,</span><span
lang=EN-US>1/14</span><span style='font-family:宋体;
"Times New Roman"'>,</span><span lang=EN-US>1/14</span><span
style='font-family:宋体;"Times New Roman"'>,</span><span lang=EN-US>1/14</span><span style='font-family:
宋体;"Times New Roman"'>。</span></p>
<p style='line-height:
18.0pt;'> <span
lang=EN-US>(2)<span style='font:7.0pt "Times New Roman"'>
</span></span> <span style='font-family:宋体;
"Times New Roman"'>从左到右把上述频率按从小到大的顺序排列。</span></p>
<p style='line-height:
18.0pt;'> <span
lang=EN-US>(3)<span style='font:7.0pt "Times New Roman"'>
</span></span> <span style='font-family:宋体;
"Times New Roman"'>每一次选出最小的两个值,作为二叉树的两个叶子节点,将和作为它们的根节点,这两个叶子节点不再参与比较,新的根节点参与比较。</span></p>
<p style='line-height:
18.0pt;'> <span
lang=EN-US>(4)<span style='font:7.0pt "Times New Roman"'>
</span></span> <span style='font-family:宋体;
"Times New Roman"'>重复</span><span lang=EN-US>(3)</span><span
style='font-family:宋体;"Times New Roman"'>,直到最后得到和为</span><span lang=EN-US>1</span><span
style='font-family:宋体;"Times New Roman"'>的根节点。</span></p>
<p style='line-height:
18.0pt;'> <span
lang=EN-US>(5)<span style='font:7.0pt "Times New Roman"'>
</span></span> <span style='font-family:宋体;
"Times New Roman"'>将形成的二叉树的左节点标</span><span lang=EN-US>0</span><span
style='font-family:宋体;"Times New Roman"'>,右节点标</span><span lang=EN-US>1</span><span style='font-family:
宋体;"Times New Roman"'>。把从最上面的根节点到最下面的叶子节点途中遇到的</span><span
lang=EN-US>0,1</span><span style='font-family:宋体;
"Times New Roman"'>序列串起来,就得到了各个符号的编码。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>上面的例子用</span><span
lang=EN-US>Huffman</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>编码的过程如图</span><span
lang=EN-US>9.1</span><span style='font-family:宋体;
"Times New Roman"'>所示,其中圆圈中的数字是新节点产生的顺序。可见,我们上面给出的编码就是这么得到的。</span></p>
<p class=a style='line-height:18.0pt'><span lang=EN-US> <img width=519 height=330
src="chap09.files/image001.gif" v:shapes="_x0000_i1025"> </span></p>
<p align=center style='text-align:center;line-height:18.0pt'><b><span
style='font-family:宋体;"Times New Roman"'>图</span>9.1 Huffman</b><b><span
style='font-family:宋体;"Times New Roman"'>编码的示意图</span><span lang=EN-US></span></b></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>产生</span><span
lang=EN-US>Huffman</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>编码需要对原始数据扫描两遍。第一遍扫描要精确地统计出原始数据中,每个值出现的频率,第二遍是建立</span><span
lang=EN-US>Huffman</span><span style='font-family:宋体;"Times New Roman";"Times New Roman"'>树并进行编码。由于需要建立二叉树并遍历二叉树生成编码,因此数据压缩和还原速度都较慢,但简单有效,因而得到广泛的应用。</span></p>
<p style='line-height:18.0pt'><span style='font-family:宋体;
"Times New Roman"'>源程序就不给出了,有兴趣的读者可以自己实现。</span></p>
<h2> <span
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -