📄 115-117.html
字号:
<!DOCTYPE HTML PUBLIC "html.dtd"><HTML><HEAD><TITLE>The Data Compression Book-:Huffman One Better: Arithmetic Coding</TITLE><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><SCRIPT><!--function displayWindow(url, width, height) { var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');}//--></SCRIPT></HEAD><BODY BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000"><TD WIDTH="540" VALIGN="TOP"><!-- <CENTER><TABLE><TR><TD><FORM METHOD="GET" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-foldocsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE="Glossary Search"></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD><TD><IMG SRC="http://www.itknowledge.com/images/dotclear.gif" WIDTH="15" HEIGHT="1"></TD><TD><FORM METHOD="POST" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-subscriptionsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE=" Book Search "></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="backlink" TYPE="hidden" VALUE="http://search.itknowledge.com:80/excite/AT-subscriptionquery.html"><INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD></TR></TABLE></CENTER> --><!-- ISBN=1558514341//--><!-- TITLE=The Data Compression Book-//--><!-- AUTHOR=Mark Nelson//--><!-- PUBLISHER=IDG Books Worldwide, Inc.//--><!-- IMPRINT=M & T Books//--><!-- CHAPTER=5//--><!-- PAGES=115-117//--><!-- UNASSIGNED1//--><!-- UNASSIGNED2//--><CENTER><TABLE BORDER><TR><TD><A HREF="113-115.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="117-120.html">Next</A></TD></TR></TABLE></CENTER><P><BR></P><P>Each character is assigned the portion of the 0 to 1 range that corresponds to its probability of appearance. Note that the character “owns” everything up to, but not including, the higher number. So the letter T in fact has the range .90 to .9999…</P><P>The most significant portion of an arithmetic-coded message belongs to the first symbols—or B, in the message “BILL GATES.” To decode the first character properly, the final coded message has to be a number greater than or equal to .20 and less than .30. To encode this number, track the range it could fall in. After the first character is encoded, the low end for this range is .20 and the high end is .30.</P><P>During the rest of the encoding process, each new symbol will further restrict the possible range of the output number. The next character to be encoded, the letter I, owns the range .50 to .60 in the new subrange of .2 to .3. So the new encoded number will fall somewhere in the 50th to 60th percentile of the currently established range. Applying this logic will further restrict our number to .25 to .26. The algorithm to accomplish this for a message of any length is shown here:</P><!-- CODE SNIP //--><PRE>low = 0.0;high = 1.0;while ( ( c = getc( input ) ) != EOF ) { range = high - low; high = low + range * high_range( c ); low = low + range * low_range( c );}output ( low );</PRE><!-- END CODE SNIP //--><P>Following this process to its natural conclusion with our message results in the following table:</P><TABLE WIDTH="100%"><TR><TH COLSPAN="3"><HR><TR><TH WIDTH="30%" ALIGN="LEFT">New Character<TH WIDTH="30%" ALIGN="LEFT">Low value<TH WIDTH="40%" ALIGN="LEFT">High Value<TR><TH COLSPAN="3"><HR><TR><TD><TD>0.0<TD>1.0<TR><TD>B<TD>0.2<TD>0.3<TR><TD>I<TD>0.25<TD>0.26<TR><TD>L<TD>0.256<TD>0.258<TR><TD>L<TD>0.2572<TD>0.2576<TR><TD>SPACE<TD>0.25720<TD>0.25724<TR><TD>G<TD>0.257216<TD>0.257220<TR><TD>A<TD>0.2572164<TD>0.2572168<TR><TD>T<TD>0.25721676<TD>0.2572168<TR><TD>E<TD>0.257216772<TD>0.257216776<TR><TD>S<TD>0.2572167752<TD>0.2572167756<TR><TD COLSPAN="3"><HR></TABLE><P>So the final low value, .2572167752, will uniquely encode the message “BILL GATES” using our present coding scheme.</P><P>Given this encoding scheme, it is relatively easy to see how the decoding process operates. Find the first symbol in the message by seeing which symbol owns the space our encoded message falls in. Since .2572167752 falls between .2 and .3, the first character must be B. Then remove B from the encoded number. Since we know the low and high ranges of B, remove their effects by reversing the process that put them in. First, subtract the low value of B, giving .0572167752. Then divide by the width of the range of B, or .1. This gives a value of .572167752. Then calculate where that lands, which is in the range of the next letter, I. The algorithm for decoding the incoming number is shown next:</P><!-- CODE SNIP //--><PRE>number = input_code();for ( ; ; ) { symbol = find_symbol_straddling_this_range( number ); putc( symbol ); range = high_range( symbol ) - low_range( symbol ); number = number - low_range( symbol ); number = number / range;}</PRE><!-- END CODE SNIP //--><P>I have conveniently ignored the problem of how to decide when there are no more symbols left to decode. This can be handled either by encoding a special end-of-file symbol or by carrying the stream length with the encoded message. In the example in this chapter, as elsewhere in the book, I carry along a special end-of-stream symbol that tells the decoder when it is out of symbols. The decoding algorithm for the “BILL GATES” message will proceed as shown:</P><TABLE WIDTH="100%"><TR><TH COLSPAN="5"><HR><TR><TH WIDTH="30%" ALIGN="LEFT">Encoded Number<TH WIDTH="25%" ALIGN="LEFT">Output Symbol<TH WIDTH="15%" ALIGN="LEFT">Low<TH WIDTH="15%" ALIGN="LEFT">High<TH WIDTH="15%" ALIGN="LEFT">Range<TR><TH COLSPAN="5"><HR><TR><TD>0.2572167752<TD>B<TD>0.2<TD>0.3<TD>0.1<TR><TD>0.572167752<TD>I<TD>0.5<TD>0.6<TD>0.1<TR><TD>0.72167752<TD>L<TD>0.6<TD>0.8<TD>0.2<TR><TD>0.6083876<TD>L<TD>0.6<TD>0.8<TD>0.2<TR><TD>0.041938<TD>SPACE<TD>0.0<TD>.1<TD>0.1<TR><TD>0.41938<TD>G<TD>0.4<TD>0.5<TD>0.1<TR><TD>0.1938<TD>A<TD>0.2<TD>0.3<TD>0.1<TR><TD>0.938<TD>T<TD>0.9<TD>1.0<TD>0.1<TR><TD>0.38<TD>E<TD>0.3<TD>0.4<TD>0.1<TR><TD>0.8<TD>S<TD>0.8<TD>0.9<TD>0.1<TR><TD>0.0<TD><TD><TD><TD><TR><TD COLSPAN="5"><HR></TABLE><P>In summary, the encoding process is simply one of narrowing the range of possible numbers with every new symbol. The new range is proportional to the predefined probability attached to that symbol. Decoding is the inverse procedure, in which the range is expanded in proportion to the probability of each symbol as it is extracted.</P><P><BR></P><CENTER><TABLE BORDER><TR><TD><A HREF="113-115.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="117-120.html">Next</A></TD></TR></TABLE></CENTER></TD></TR></TABLE></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -