115-117.html

来自「The primary purpose of this book is to e」· HTML 代码 · 共 258 行

HTML

258 行

<!DOCTYPE HTML PUBLIC "html.dtd"><HTML><HEAD><TITLE>The Data Compression Book-:Huffman One Better: Arithmetic Coding</TITLE><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><SCRIPT><!--function displayWindow(url, width, height) {        var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');}//--></SCRIPT></HEAD><BODY  BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000"><TD WIDTH="540" VALIGN="TOP"><!--  <CENTER><TABLE><TR><TD><FORM METHOD="GET" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-foldocsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE="Glossary Search"></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD><TD><IMG SRC="http://www.itknowledge.com/images/dotclear.gif" WIDTH="15"   HEIGHT="1"></TD><TD><FORM METHOD="POST" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-subscriptionsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE="  Book Search  "></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="backlink" TYPE="hidden" VALUE="http://search.itknowledge.com:80/excite/AT-subscriptionquery.html"><INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD></TR></TABLE></CENTER> --><!-- ISBN=1558514341//--><!-- TITLE=The Data Compression Book-//--><!-- AUTHOR=Mark Nelson//--><!-- PUBLISHER=IDG Books Worldwide, Inc.//--><!-- IMPRINT=M & T Books//--><!-- CHAPTER=5//--><!-- PAGES=115-117//--><!-- UNASSIGNED1//--><!-- UNASSIGNED2//--><CENTER><TABLE BORDER><TR><TD><A HREF="113-115.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="117-120.html">Next</A></TD></TR></TABLE></CENTER><P><BR></P><P>Each character is assigned the portion of the 0 to 1 range that corresponds to its probability of appearance. Note that the character &#147;owns&#148; everything up to, but not including, the higher number. So the letter T in fact has the range .90 to .9999&#133;</P><P>The most significant portion of an arithmetic-coded message belongs to the first symbols&#151;or B, in the message &#147;BILL GATES.&#148; To decode the first character properly, the final coded message has to be a number greater than or equal to .20 and less than .30. To encode this number, track the range it could fall in. After the first character is encoded, the low end for this range is .20 and the high end is .30.</P><P>During the rest of the encoding process, each new symbol will further restrict the possible range of the output number. The next character to be encoded, the letter I, owns the range .50 to .60 in the new subrange of .2 to .3. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Applying this logic will further restrict our number to .25 to .26. The algorithm to accomplish this for a message of any length is shown here:</P><!--  CODE SNIP //--><PRE>low = 0.0;high = 1.0;while ( ( c = getc( input ) ) != EOF ) &#123;    range = high - low;    high = low &#43; range * high_range( c );    low = low &#43; range * low_range( c );&#125;output ( low );</PRE><!--  END CODE SNIP //--><P>Following this process to its natural conclusion with our message results in the following table:</P><TABLE WIDTH="100%"><TR><TH COLSPAN="3"><HR><TR><TH WIDTH="30%" ALIGN="LEFT">New Character<TH WIDTH="30%" ALIGN="LEFT">Low value<TH WIDTH="40%" ALIGN="LEFT">High Value<TR><TH COLSPAN="3"><HR><TR><TD><TD>0.0<TD>1.0<TR><TD>B<TD>0.2<TD>0.3<TR><TD>I<TD>0.25<TD>0.26<TR><TD>L<TD>0.256<TD>0.258<TR><TD>L<TD>0.2572<TD>0.2576<TR><TD>SPACE<TD>0.25720<TD>0.25724<TR><TD>G<TD>0.257216<TD>0.257220<TR><TD>A<TD>0.2572164<TD>0.2572168<TR><TD>T<TD>0.25721676<TD>0.2572168<TR><TD>E<TD>0.257216772<TD>0.257216776<TR><TD>S<TD>0.2572167752<TD>0.2572167756<TR><TD COLSPAN="3"><HR></TABLE><P>So the final low value, .2572167752, will uniquely encode the message &#147;BILL GATES&#148; using our present coding scheme.</P><P>Given this encoding scheme, it is relatively easy to see how the decoding process operates. Find the first symbol in the message by seeing which symbol owns the space our encoded message falls in. Since .2572167752 falls between .2 and .3, the first character must be B. Then remove B from the encoded number. Since we know the low and high ranges of B, remove their effects by reversing the process that put them in. First, subtract the low value of B, giving .0572167752. Then divide by the width of the range of B, or .1. This gives a value of .572167752. Then calculate where that lands, which is in the range of the next letter, I. The algorithm for decoding the incoming number is shown next:</P><!--  CODE SNIP //--><PRE>number = input_code();for ( ; ; ) &#123;    symbol = find_symbol_straddling_this_range( number );    putc( symbol );    range = high_range( symbol ) - low_range( symbol );    number = number - low_range( symbol );    number = number / range;&#125;</PRE><!--  END CODE SNIP //--><P>I have conveniently ignored the problem of how to decide when there are no more symbols left to decode. This can be handled either by encoding a special end-of-file symbol or by carrying the stream length with the encoded message. In the example in this chapter, as elsewhere in the book, I carry along a special end-of-stream symbol that tells the decoder when it is out of symbols. The decoding algorithm for the &#147;BILL GATES&#148; message will proceed as shown:</P><TABLE WIDTH="100%"><TR><TH COLSPAN="5"><HR><TR><TH WIDTH="30%" ALIGN="LEFT">Encoded Number<TH WIDTH="25%" ALIGN="LEFT">Output Symbol<TH WIDTH="15%" ALIGN="LEFT">Low<TH WIDTH="15%" ALIGN="LEFT">High<TH WIDTH="15%" ALIGN="LEFT">Range<TR><TH COLSPAN="5"><HR><TR><TD>0.2572167752<TD>B<TD>0.2<TD>0.3<TD>0.1<TR><TD>0.572167752<TD>I<TD>0.5<TD>0.6<TD>0.1<TR><TD>0.72167752<TD>L<TD>0.6<TD>0.8<TD>0.2<TR><TD>0.6083876<TD>L<TD>0.6<TD>0.8<TD>0.2<TR><TD>0.041938<TD>SPACE<TD>0.0<TD>.1<TD>0.1<TR><TD>0.41938<TD>G<TD>0.4<TD>0.5<TD>0.1<TR><TD>0.1938<TD>A<TD>0.2<TD>0.3<TD>0.1<TR><TD>0.938<TD>T<TD>0.9<TD>1.0<TD>0.1<TR><TD>0.38<TD>E<TD>0.3<TD>0.4<TD>0.1<TR><TD>0.8<TD>S<TD>0.8<TD>0.9<TD>0.1<TR><TD>0.0<TD><TD><TD><TD><TR><TD COLSPAN="5"><HR></TABLE><P>In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol. The new range is proportional to the predefined probability attached to that symbol. Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.</P><P><BR></P><CENTER><TABLE BORDER><TR><TD><A HREF="113-115.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="117-120.html">Next</A></TD></TR></TABLE></CENTER></TD></TR></TABLE></BODY></HTML>

115-117.html - 源码说明

本页面展示了「The primary purpose of this book is to explain various data-compression techniques using the C progr」中的 115-117.html 源码文件，采用 HTML 编程语言编写，共 258 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。

虫虫下载站收录了大量与data-compression相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。

⌨️ 快捷键说明

复制代码Ctrl + C

搜索代码Ctrl + F

全屏模式F11

增大字号Ctrl + =

减小字号Ctrl + -

显示快捷键?