--<Start of LZRW4 paper>--

LZRW4: ZIV AND LEMPEL MEET MARKOV
=================================

Author : Ross Williams
Date   : 23-Jul-1991.

This is a public domain document and may be copied freely so long as
it is not modified (though text formatting transformations are
permitted).

Note: This paper has not been polished to publication-standard. It has
been prepared under considerable time pressure and is released mainly
to make the ideas it contains public so as to avoid others
re-inventing and patenting them.

Abstract
--------
In 1977 and 1978, Ziv and Lempel [Ziv77][Ziv78] introduced the LZ class
of adaptive dictionary data compression techniques that have become so
very popular today. In competition with these techniques are the
Markov algorithms that give better compression but run much more
slowly. This paper compares the two classes of algorithm and shows how
algorithms that combine techniques from both classes have the
potential to provide higher performance (i.e. speed/compression
tradeoff) than we have seen so far. An algorithm called LZRW4 that
combines Markov and LZ77 ideas is sketched.

Comparison of Ziv/Lempel and Markov Algorithms
----------------------------------------------
To the casual observer, the Ziv/Lempel (LZ) data compression
algorithms may appear to be worlds away from the lesser known Markov
algorithms. LZ algorithms construct trees of strings and transmit
strings as codewords. Markov algorithms construct large numbers of
contexts and fiddle with arithmetic ranges and probabilities. The two
kinds of algorithms seem to operate with totally different objects.
However, deep down, both classes of algorithm are using the same
probabilistic engine. Deep down, both classes must be representing
more common events with short codes and less common events with longer
codes. This is the law of data compression. But, without some careful
thought, it's hard to see how the two classes relate.

In the early eighties, Glen Langdon put in some careful thought and a
lot of insight and came up with two papers [Langdon83] and [Langdon84]
which for me are pivotal in my understanding of the field of data
compression. I cannot recommend these papers highly enough to anyone
who wishes to gain an understanding of what REALLY makes LZW tick or
to anyone who wants a broader perspective of the field. The papers
seem highly focused and obscure, but in fact they cast a broad soft
light over the whole field of text compression, illuminating areas
that for me were very dark (but might not be so dark for people who
read IEEE Transactions on Information Theory like Dickens). The two
papers have to be read carefully though. An earlier paper [Rissanen81]
by Rissanen and Langdon gives an even more general view, but does not
deal directly with LZ algorithms as the later papers do, and so I have
focussed on the later papers.

The idea behind the two papers is that all dictionary algorithms,
including the LZ classes (LZ77 and LZ78), are subsets of (i.e. can be
emulated by) Markov algorithms, but the converse is not true. Langdon
shows us a clear view of LZ algorithms through Markov eyes. This view
is highly illuminating.

In [Langdon83], Langdon examines the LZ78 class of algorithm closely
and explains the underlying engine in probabilistic terms. In
[Langdon84], he has a more thorough byte at the problem, formalizing
the whole case and proving that dictionary algorithms can always be
emulated by Markov algorithms, but not the converse.

I will attempt to explain the idea of [Langdon83] briefly. I will
review LZW but not give a detailed explanation of how it works.

LZ78/LZW builds its dictionary by parsing the input as phrases
consisting of the longest dictionary match to date. At the end of each
phrase parse, the algorithm adds a new phrase to the dictionary
consisting of the current phrase plus the next input character. The
dictionary grows forever, adding one new phrase per phrase parsed.
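As a rough illustration of this parse-and-extend cycle, here is a
minimal C sketch (my own, not from the paper; the trie layout and the
pre-seeding of single-byte strings are assumptions made purely for
illustration) showing the greedy longest-match parse and the addition
of one new node per phrase:

    /* Minimal sketch of the LZW parse-and-extend loop (illustrative only).
     * The dictionary is a trie; each node owns 256 child pointers and a
     * codeword. */
    #include <stdio.h>
    #include <stdlib.h>

    #define ALPHABET 256

    typedef struct Node {
        int code;                     /* codeword allocated to this node's string */
        struct Node *child[ALPHABET]; /* one arc per possible next byte */
    } Node;

    static int next_code = 0;

    static Node *new_node(void)
    {
        Node *n = calloc(1, sizeof *n);
        n->code = next_code++;        /* one codeword per node */
        return n;
    }

    /* Emit one codeword per phrase (longest match), then grow the trie by
     * one node: the matched phrase plus the next input character. */
    static void lzw_parse(const unsigned char *in, size_t len)
    {
        size_t i;
        Node *root = new_node();      /* root represents the empty string */

        /* Seed the 256 single-byte strings so every byte is matchable. */
        for (i = 0; i < ALPHABET; i++)
            root->child[i] = new_node();

        i = 0;
        while (i < len) {
            Node *n = root;
            /* Greedy longest match: walk the trie as far as the input allows. */
            while (i < len && n->child[in[i]] != NULL)
                n = n->child[in[i++]];

            printf("emit codeword %d\n", n->code);

            if (i < len)              /* extend: matched phrase + one character */
                n->child[in[i]] = new_node();
        }
    }

    int main(void)
    {
        const unsigned char text[] = "abababab";
        lzw_parse(text, sizeof text - 1);
        return 0;
    }

Seeding the root with all 256 single-byte strings guarantees that
every phrase matches at least one character; real LZW implementations
also manage codeword widths and dictionary limits, which the sketch
ignores.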
The dictionary may be viewed in many ways. We choose to view it as a
(forwards) trie structure whose leaves are to the right and whose root
is at the left. The following diagram illustrates such a tree. Each
arc carries a single character label. Each node represents a single
string. In the diagram, arcs are labelled in lower case and nodes in
upper case.

         AA
        /
      a/
      A
    a/ \b
    /   \
Root     AB
   b\
     \
      B

All dictionaries can be viewed in this way.

In LZW, each node is allocated a code value. Suppose we have two-bit
code words and the codes are allocated as follows:

   0: A
   1: B
   2: AA
   3: AB

This code allocation implies that each event has an equal probability
of one quarter. We can thus label our tree with node occurrence
probabilities.

         AA 1/4
        /
      a/
      A  1/4
    a/ \b
    /   \
Root     AB 1/4
   b\
     \
      B 1/4

These node probabilities can then be converted into probabilities for
the ARCS:

         AA
        /
       /a 1/3
      A 3/4
    a/ \b 1/3
    /   \
Root     AB
   b\  1/4
     \
      B

(The 1/3 and 1/3 don't add up to one because the A node (which
represents an imaginary arc labelled by the empty string) has 1/3
too.)

These probabilities are exactly the sort of fodder that we need to
feed a Markov algorithm. From the above, it can be seen that the act
of LZW parsing the longest match and assigning a new codeword can be
viewed as the execution of a series of single-byte prediction
decisions, each with its own separate probabilities and each capable
of being coded independently and with identical efficiency using
arithmetic coding. You can't see this normally because LZW does the
parse all in one go and outputs (for each phrase parsed) a codeword
which looks, for all the world, to be indivisible. But deep down, we
all know that it isn't. Instead, it's made up of little pieces of
entropy accumulated during the traversal and cleverly bundled into
16-bit codewords at the last minute.

So far, we have converted our LZW tree to a Markov-like prediction
tree. Now let's analyse it.

Consider a single node in the LZW tree. Associated with the node is a
string (being the characters on the arcs leading from the root). Each
time that the parse reaches the node, it means that the node's string
has occurred at the start of a phrase. LZW then goes on further to the
node's child nodes and eventually reaches a leaf where it adds a new
node AND ALLOCATES A NEW CODEWORD. The result is that each node
accumulates a subtree whose cardinality (= number of subnodes,
including itself, in this paper = number of codewords allocated to the
subtree) is proportional to the probability of the node's string
arising as a prefix of a parse. As the number of subnodes is also the
node's share of the probability space (because there is one codeword
allocated to each node), we see that the amount of code space
allocated to a node is proportional to the number of times it has
occurred. Very neat.

All this stuff is in [Langdon83], which quantifies it.
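The counting argument can be made concrete with a small sketch (my
own illustration, not part of the paper) that recovers the arc
probabilities of the example tree purely from subtree cardinalities:
the probability of taking an arc is the child's subtree size divided
by the parent's subtree size.

    /* Sketch: derive arc probabilities from subtree cardinalities.
     * Cardinality = number of codeword-bearing nodes in a subtree,
     * including the node itself. */
    #include <stdio.h>

    #define MAX_KIDS 2

    typedef struct Node {
        const char *name;
        int has_code;               /* 1 if a codeword is allocated to this node */
        int nkids;
        struct Node *kid[MAX_KIDS];
    } Node;

    static int cardinality(const Node *n)
    {
        int c = n->has_code, i;
        for (i = 0; i < n->nkids; i++)
            c += cardinality(n->kid[i]);
        return c;
    }

    static void report(const Node *parent)
    {
        int total = cardinality(parent), i;
        for (i = 0; i < parent->nkids; i++)
            printf("P(%s -> %s) = %d/%d\n", parent->name,
                   parent->kid[i]->name, cardinality(parent->kid[i]), total);
        if (parent->has_code)       /* the imaginary empty-string arc: stop here */
            printf("P(stop at %s) = 1/%d\n", parent->name, total);
        for (i = 0; i < parent->nkids; i++)
            report(parent->kid[i]);
    }

    int main(void)
    {
        Node AA = { "AA", 1, 0, { 0 } };
        Node AB = { "AB", 1, 0, { 0 } };
        Node A  = { "A",  1, 2, { &AA, &AB } };
        Node B  = { "B",  1, 0, { 0 } };
        Node root = { "Root", 0, 2, { &A, &B } };

        report(&root);
        return 0;
    }

Run on the four-codeword example, this prints 3/4 and 1/4 for the arcs
leaving the root, and 1/3 for each of the three choices available at A
(go to AA, go to AB, or stop), matching the labelled diagram; the
leaves additionally report a trivial stop probability of 1/1.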
The above explains why LZW compresses so well. Why then does it
compress so badly (compared to Markov models)? Without going into
details, here are some reasons:

   * Approximately half of the code space is wasted at any point in
     time in unallocated codewords.
   * Probability estimates are inaccurate near the leaves.

Other such reasons eat away at the compression. In particular, there
is one inefficiency of LZW-type compression that stands out.

In our analysis, each LZW parse of length d consists of d
single-character predictions. Each prediction is made in the context
of previous predictions. Thus, after having parsed part of a phrase
"ab", the LZW algorithm has two characters of Markov context with
which to "predict" the next character. After having parsed
"compressi", it would have nine characters of context. Thus we see
that, to the extent that we can view LZ78/LZW algorithms as making
predictions, they make predictions using a sawtooth context-length.

Order
    ^
    4|              *.
    3|            *  .            *.          *.
    2|    *.    *    .    *.    *  .        *  .
    1|  *  .  *      .  *  .  *    .  *.  *    .
    0|*    .*        .*    .*      .*  .*      .
     +-----|---------|-----|-------|---|-------|-----> Characters
    phrase1    phr2   phr3   phr4  phr5   phr6

In contrast, Markov algorithms make predictions using a fairly flat
(or sometimes totally flat) context line. Thus, while a Markov
algorithm may make all of its predictions based on a 3-character
context, an LZW algorithm will make only some of its predictions from
that depth.

The thing that really kills LZW's compression is the first one or two
characters of each phrase, the first of which is predicted with no
context at all and the second of which is predicted with a context of
only a single character. After that the context is sufficient not to
introduce too much inefficiency. See Figure 64 of [Williams91] to get
an idea of how much impact this has on compression.

To compensate for its poor performance in the first one or two
characters, LZW grows an enormous tree. This has the effect of
increasing the average length of a phrase so that the beginning of
phrases occurs less often. This is like Pollyanna adding extra days to
the week to avoid Sundays. As the length of the input increases to
infinity, so does the average phrase length, with the startling result
that by the time you get to infinity, LZW has converged on the entropy
of the source (as Ziv and Lempel proved in their founding paper
[Ziv78]). The rate of convergence depends on the average phrase
length, which depends on the entropy of the source. In practice, for
ordinary sources (e.g. text files), the average phrase length does not
rise fast enough to properly offset the effect of the
low-context-length predictions at the start of each phrase.

An additional reason for the relatively poor compression performance
of LZW can be found in its phrasing. Markov algorithms predict each
character of the input one at a time and record contextual information
at each step. In contrast, LZW divides the input up into phrases, each
of which causes a single parse through its tree. Thus, the
"zero-order" model that LZW builds up for the first character of each
phrase is based only on the first character of all the previous
phrases it has seen, whereas a zero-order Markov model builds its
model based on ALL the characters it has seen. This applies for all
orders.
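The sawtooth can also be rendered numerically. The following
throwaway sketch (my own, not the paper's) lists, for a run of
hypothetical phrase lengths, the context order available when each
character is predicted; the order climbs within a phrase and collapses
to zero at every phrase boundary.

    /* Print the context order used for each predicted character when LZW
     * parses phrases of the given (hypothetical) lengths. */
    #include <stdio.h>

    int main(void)
    {
        int phrase_len[] = { 3, 5, 3, 4, 2, 4 };
        int nphrases = sizeof phrase_len / sizeof phrase_len[0];
        int p, k;

        for (p = 0; p < nphrases; p++) {
            for (k = 0; k < phrase_len[p]; k++)
                printf("%d ", k);   /* k-th character of a phrase: order k */
            printf("| ");           /* phrase boundary: back to order 0 */
        }
        printf("\n");
        return 0;
    }

For the lengths above it prints "0 1 2 | 0 1 2 3 4 | 0 1 2 | 0 1 2 3 |
0 1 | 0 1 2 3 |", which is the numeric form of the sawtooth diagram.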
To summarize:

   * All dictionary algorithms can be cast into Markov form.
   * The compression grunt (order) that LZW applies is in the shape of
     an irregular sawtooth.
   * LZW collects statistics for each context only on phrase
     boundaries.
   * A little thought reveals that all this applies to the LZ77 class
     as well.

A brief discussion of all this can be found in Chapter one of my book
[Williams91], and in particular section 1.13: "Dictionary Methods vs
Markov Methods". However, for a full treatment, the reader is referred
to [Langdon83] and [Langdon84].

Ziv and Lempel meet Markov
--------------------------
Currently, the field of text data compression has two separate classes
of method: statistical (Markov) and dictionary (Ziv and Lempel).
Markov algorithms inch through the input character by character,
recording and using context information at each step. This yields the
best compression, but makes Markov algorithms slow. Ziv and Lempel
algorithms gallop through the input one phrase at a time, losing lots
of non-phrase-aligned context information and using zero-order models
at the start of each phrase. This yields a higher speed but poorer
compression.

Obviously it would be highly desirable to combine the speed of the Ziv
and Lempel algorithms with the compression power of the Markov
algorithms. As time goes by this seems less and less likely (without
special (e.g. parallel) hardware). However, there does seem to be one
way of combining the two classes of algorithm and, to my surprise, I
have not yet seen it used.

The idea is to create a group of contexts, but to parse phrases from
each context. For example, we might create 256 1-character contexts
and grow an LZW tree in each of them. Compression would proceed
exactly as with LZW except that the most recent character transmitted
(the last character of the previous phrase) would be used to select
one of the 256 contexts, each of which would contain an LZW tree which
would then be used to parse the phrase.

The result would be to take the low-order edge off LZW without losing
any of its speed. Each character of the phrase would be "predicted"
using a model one order higher than would have been used in LZW. At
high orders this would have negligible effect, but at low orders it
would have a significant effect, for the first character of each
phrase, formerly predicted by a zero-order model, would now be
predicted by a first-order model. Each phrase would be coded using
only the range required by the size of the tree in its context.

By using contexts at the start of the LZW tree, we raise the bottom of
the sawtooth a few notches.
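To make the proposal concrete, here is a minimal C sketch of the idea
as I read it (my own illustration, not Ross Williams' implementation;
the data layout, lazy context creation and per-context codeword
counters are assumptions): 256 contexts, each holding its own LZW
trie, with the last character of the previous phrase selecting the
trie used to parse, and to grow by one node for, the next phrase.

    /* Sketch of the "one LZW tree per 1-character context" idea. */
    #include <stdio.h>
    #include <stdlib.h>

    #define ALPHABET 256

    typedef struct Node {
        int code;
        struct Node *child[ALPHABET];
    } Node;

    typedef struct {
        Node *root;
        int next_code;              /* codewords are allocated per context */
    } Context;

    static Node *new_node(Context *ctx)
    {
        Node *n = calloc(1, sizeof *n);
        n->code = ctx->next_code++;
        return n;
    }

    static Context *new_context(void)
    {
        Context *ctx = calloc(1, sizeof *ctx);
        int c;
        ctx->root = new_node(ctx);
        for (c = 0; c < ALPHABET; c++)      /* seed single-byte strings */
            ctx->root->child[c] = new_node(ctx);
        return ctx;
    }

    int main(void)
    {
        const unsigned char in[] = "abcabcabc";
        size_t len = sizeof in - 1, i = 0;
        Context *ctx[ALPHABET] = { 0 };     /* one LZW tree per previous byte */
        unsigned char prev = 0;             /* context before the first phrase */

        while (i < len) {
            Context *c;
            Node *n;

            if (ctx[prev] == NULL)          /* contexts are created lazily */
                ctx[prev] = new_context();
            c = ctx[prev];

            /* Ordinary LZW longest-match parse, inside the selected context. */
            n = c->root;
            while (i < len && n->child[in[i]] != NULL)
                n = n->child[in[i++]];

            printf("context %3d: emit codeword %d (tree size %d)\n",
                   prev, n->code, c->next_code);

            if (i < len)                    /* grow only this context's tree */
                n->child[in[i]] = new_node(c);

            prev = in[i - 1];               /* last character of the phrase */
        }
        return 0;
    }

Because codewords are counted per context, each phrase need only be
coded in the range required by the size of the tree in its own
context, and the first character of every phrase is effectively
predicted by a first-order rather than a zero-order model.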
