📄 226-228.html
字号:
<!DOCTYPE HTML PUBLIC "html.dtd"><HTML><HEAD><TITLE>The Data Compression Book-:Sliding Window Compression</TITLE><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><SCRIPT><!--function displayWindow(url, width, height) { var Win = window.open(url,"displayWindow",'width=' + width +',height=' + height + ',resizable=1,scrollbars=yes');}//--></SCRIPT></HEAD><BODY BGCOLOR="#FFFFFF" VLINK="#DD0000" TEXT="#000000" LINK="#DD0000" ALINK="#FF0000"><TD WIDTH="540" VALIGN="TOP"><!-- <CENTER><TABLE><TR><TD><FORM METHOD="GET" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-foldocsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE="Glossary Search"></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD><TD><IMG SRC="http://www.itknowledge.com/images/dotclear.gif" WIDTH="15" HEIGHT="1"></TD><TD><FORM METHOD="POST" ACTION="http://search.itknowledge.com/excite/cgi-bin/AT-subscriptionsearch.cgi"><INPUT NAME="search" SIZE="20" VALUE=""><BR><CENTER><INPUT NAME="searchButton" TYPE="submit" VALUE=" Book Search "></CENTER><INPUT NAME="source" TYPE="hidden" VALUE="local" CHECKED> <INPUT NAME="backlink" TYPE="hidden" VALUE="http://search.itknowledge.com:80/excite/AT-subscriptionquery.html"><INPUT NAME="bltext" TYPE="hidden" VALUE="Back to Search"><INPUT NAME="sp" TYPE="hidden" VALUE="sp"></FORM></TD></TR></TABLE></CENTER> --><!-- ISBN=1558514341//--><!-- TITLE=The Data Compression Book-//--><!-- AUTHOR=Mark Nelson//--><!-- PUBLISHER=IDG Books Worldwide, Inc.//--><!-- IMPRINT=M & T Books//--><!-- CHAPTER=8//--><!-- PAGES=226-228//--><!-- UNASSIGNED1//--><!-- UNASSIGNED2//--><CENTER><TABLE BORDER><TR><TD><A HREF="223-226.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="228-232.html">Next</A></TD></TR></TABLE></CENTER><P><BR></P><H4 ALIGN="LEFT"><A NAME="Heading8"></A><FONT COLOR="#000077">Greedy vs. Best Possible</FONT></H4><P>Both LZ77 and LZSS are called “greedy” algorithms: They don’t look ahead into the input stream to analyze it for the best combination of indices and characters. Consider a dictionary-based encoding scheme that used nine bits to encode a single character and twenty-five bits to encode a combined index/offset pair. This scheme would have a break-even point somewhere between two and three characters, which means it would encode a match of two characters as two individual symbols and a match of three symbols as in index/offset token.</P><P>Consider now how we would go about encoding the phrase “Go To Statement Considered Harmful” if the contents of the phrase dictionary contained the following fragments: “Go T” “o S” “tat” “Stat.” A greedy encoder would naturally encode the “Go T” phrase of four characters length first, followed by the “o S” phrase of three characters length, then the “tat” phrase of three characters length. The output of the encoder up to this point would look like this:</P><TABLE WIDTH="70%"><TR><TD WIDTH="45%" ALIGN="LEFT">Offset/Length of “Go T”<TD WIDTH="10%" ALIGN="LEFT">:<TD WIDTH="45%" ALIGN="LEFT">25 bits<TR><TD>Offset/Length of “o S”<TD>:<TD>25 bits<TR><TD>Offset/Length of “tat”<TD>:<TD>25 bits<TR><TD><TD><TD>————<TR><TD><TD><TD>75 bits</TABLE><P>The encoder looks like it was doing what makes sense, trying to build phrases up instead of characters. But an optimal encoder would encode the fragment as shown:</P><TABLE WIDTH="70%"><TR><TD WIDTH="45%" ALIGN="LEFT">Offset/Length of “Go “<TD WIDTH="10%" ALIGN="LEFT">:<TD WIDTH="45%" ALIGN="LEFT">25 bits<TR><TD>Character ‘T’<TD>:<TD>9 bits<TR><TD>Character ‘o’<TD>:<TD>9 bits<TR><TD>Offset/Length of “Stat”<TD>:<TD>25 bits<TR><TD><TD><TD>———-<TR><TD><TD><TD>68 bits</TABLE><P>These figures clearly show that the greedy encoder did not do as well as the optimal encoder. But it should also be noted that even in this contrived example, the difference between the two is only about 10 percent. When using dictionary coding, it is difficult to find examples of optimal encoders outperforming greedy encoders by more than a few percent. The largest differences occur when only short phrases are in the dictionary, and there is a real possibility that encoding single symbols will take less space than a phrase.</P><P>The problem with optimal coding is simply one of payback. Implementing an optimal encoder generally means that encoding speed will be drastically reduced. While optimizing algorithms are available, they tend to be CPU intensive, and the profit derived is generally small. In the world of data compression, a few good heuristics are often more respected than a provably superior algorithm. The greedy heuristic in this case is definitely the choice of most compression programmers.</P><H3><A NAME="Heading9"></A><FONT COLOR="#000077">The Code</FONT></H3><P>The C implementation of LZSS shown here is relatively simple. A production program would probably want to take advantage of numerous potential improvements, which will be discussed at the end of the chapter.</P><P>By the very nature of LZSS compression, the compression program will be considerably more complicated than the decoder. The decoder does not have to worry about maintaining the tree or searching for matches. Those two activities are what the encoder spends most of its time doing.</P><H4 ALIGN="LEFT"><A NAME="Heading10"></A><FONT COLOR="#000077">Constants and Macros</FONT></H4><P>All of the constants and global data used in this program are shown following. The parameters of the text window are initially defined by deciding how many bits to allocate to the two fields used to define a pointer or index into the text window. In this example. INDEX_BIT_COUNT is set to twelve: It will use twelve bits to define an index into the text window. The LENGTH_BIT_COUNT macro is set to four bits, which means it will use a four-bit field to encode the length of a matching phrase.</P><P>After determining the size of the two bit fields, other macros can be given values derived from them. First, the WINDOW_SIZE is directly determined by the size of the INDEX_BIT_COUNT. In this case, our text window will consist of 4,096 bytes, or 1 << 12. Since we have allocated four bits for the length parameter used to encode a phrase, we will be able to encode a length of up to sixteen bytes, or 1 << 4. This is defined as the RAW_LOOK_AHEAD_SIZE.</P><P><BR></P><CENTER><TABLE BORDER><TR><TD><A HREF="223-226.html">Previous</A></TD><TD><A HREF="../ewtoc.html">Table of Contents</A></TD><TD><A HREF="228-232.html">Next</A></TD></TR></TABLE></CENTER></TD></TR></TABLE></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -