📄 lzrw5.txt
字号:
IMPORTANT NOTE: The LZRW5 document was posted on 23-Jul-1991.That document appears at the end of this file.At the time I thought LZRW5 was a new idea. However, I was wrong as isshown by the following email which I have included here before theoriginal document.======================================================================Path: spam!rossFrom: ross@spam.ua.oz.au (Ross Williams)Newsgroups: comp.compressionSubject: LZRW5 is NOT original.Summary: LZRW5 is NOT original.Keywords: lzrw5 data compression algorithm orginality yabba yabbawhapMessage-ID: <967@spam.ua.oz>Date: 28 Jul 91 16:21:46 GMTSender: ross@spam.ua.ozFollowup-To: comp.compressionDistribution: worldOrganization: Statistics, Pure & Applied Mathematics, University of AdelaideLines: 26LZRW5 is NOT Original=====================Earlier I posted my LZRW5 as an original algorithm. However, I haverecently discovered that it has appeared before. The same idea ofkeeping an array of pointers to phrases in an LZW algorithm was usedby Dan Bernstein in his yabbawhap compression package which can befound in comp.sources.unix vol 24. At least one site where you can getthis is UUNET (in Virginia) in uunet.uu.net:comp.sources.unix/volume24/yabbawhap.Dan developed the yabba program about one year ago and posted it toalt.comp.compression in March 1991.Boo hoo,Ross Williamsross@spam.ua.oz.auPS: How embarrassing - I actually created alt.comp.compression! Danposted yabba after I thought I had destroyed alt.comp.compression andwhile I was busy with creating comp.compression! I nearly looked atyabba recently too, but decided to spend the time getting LZRW4 andLZRW5 out! I have not yet sighted yabba source code, being busy atpresent with non-compression stuff.======================================================================Path: spam!sirius.ucs.adelaide.edu.au!yoyo.aarnet.edu.au!munnari.oz.au!samsung!uakari.primate.wisc.edu!sdd.hp.com!wupost!emory!ox.com!yale.edu!cmcl2!kramden.acf.nyu.edu!brnstndFrom: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)Newsgroups: comp.compressionSubject: Re: LZRW5: A Turbocharged LZW-like Decompressor.Message-ID: <28577.Jul2519.12.1791@kramden.acf.nyu.edu>Date: 25 Jul 91 19:12:17 GMTReferences: <961@spam.ua.oz>Organization: IRLines: 22While I'm sure we all appreciate Ross's extensive contributions to thefield of compression, I have to point out that his ``turbocharging'' isexactly the method stated in my introduction to Y coding (draft 4b,3/19/91), in section 2 (``Implementing LZW''):: --- Decoding:: LZW decoding is even easier than encoding: the dictionary does not have: to support searching. The easiest (and generally fastest) method is to: keep I in memory as it is reconstructed, and to keep track of which: substring of I a given dictionary number corresponds to. To add pc to: the dictionary means to add the pair (pointer to start of pc within I,: length of pc) at the right spot in an array indexed by dictionary: numbers. There are methods which take slightly less memory than this,: but they are slower.I applied this to LZW last year, and indeed it makes a very fast, simpledecompressor. My ``unwhap'' decompressor for AP, as included in myyabbawhap package, comp.sources.unix volume 24, also applies the sametechnique.---Dan======================================================================Path: spam!sirius.ucs.adelaide.edu.au!yoyo.aarnet.edu.au!munnari.oz.au!samsung!mips!swrinde!zaphod.mps.ohio-state.edu!think.com!hsdndev!cmcl2!kramden.acf.nyu.edu!brnstndFrom: brnstnd@kramden.acf.nyu.edu (Dan Bernstein)Newsgroups: comp.compressionSubject: Re: LZRW5 is NOT original.Message-ID: <18214.Jul2902.59.2991@kramden.acf.nyu.edu>Date: 29 Jul 91 02:59:29 GMTReferences: <967@spam.ua.oz>Organization: IRLines: 42In article <967@spam.ua.oz> ross@spam.ua.oz.au (Ross Williams) writes:> Earlier I posted my LZRW5 as an original algorithm. However, I have> recently discovered that it has appeared before. The same idea of> keeping an array of pointers to phrases in an LZW algorithm was used> by Dan Bernstein in his yabbawhap compression package which can be> found in comp.sources.unix vol 24.I used it in unwhap, and it's the main reason unwhap runs as fast asuncompress---AP does three to five times more dictionary manipulationsthan LZW. However, I haven't figured out how to apply it to Y coding.(I have a very promising lead, though, so don't be surprised if and whenundabba is faster than unyabba... more news soon...)> Dan developed the yabba program about one year ago and posted it to> alt.comp.compression in March 1991.Actually, I posted my introduction to Y coding to alt.comp.compressionon March 6, 1991, a few days before the group was officially removed. Iincluded a slightly revised draft with yabbawhap, which I posted tocomp.sources.unix, not alt.comp.compression. I discovered Y coding onDecember 26, 1990. A year ago (July 28, 1990, according to my mailrecords) I discovered AP coding, which I called ``B coding'' in privatediscussions until a month later, when Brad Templeton pointed outStorer's prior discovery of the method. To get back to the topic athand, my implementations of AP during August 1990 all used the sametechnique as in Ross's recent LZRW5. I don't know if I was the first tocome upon this idea.While I'm on a historical accuracy kick: Don't trust anything you readin the May 1991 Computer Language article on data compression. I quote:``One of the most popular data-compression algorithms is thesliding-window dictionary (SWD) variation on the Lempel-Ziv-Welch (LZW)algorithm, developed around 1984-85.'' First of all, LZW was developedin 1983 (both by Welch and by Miller-Wegman), and published in 1984, so``developed around 1984-85'' is silly. Second, I haven't gone throughthe code in that article in detail, but there is no such thing as ``thesliding-window dictionary variation on LZW.'' The article appears todescribe one of the LZ77 variants. The article goes on to use ``SWD'' asthe name of the method; this is nonstandard at best (though one can saythe same of almost all of Storer's terminology).---Dan======================================================================--<Start of LZRW5 paper>--LZRW5: A TURBOCHARGED LZW-LIKE DECOMPRESSOR===========================================Author : Ross WilliamsDate : 23-Jul-1991.This is a public domain document and may be copied freely so long asit is not modified (but text formatting transformations are permittedthough).Abstract--------A technique is described for constructing an LZW-like algorithm thatyields the same compression as LZW but whose decompressor runs muchfaster (93K/s for a 68000 assembler implementation on an 8MHzMac-SE20). The implementation of the compression algorithm is openended (although a modified version of LZW could be used).Unfortunately it is possible that the technique may be covered bythird party patents.Distribution Algorithms-----------------------Over the last month or so, one or two people have posted requests forwhat I call "distribution algorithms", which are data compressionalgorithms where the decompressor's speed and memory consumption ismuch more important than the compressor's. Such algorithms are usefulwhen distributing data, where the data will be compressed once by thedistributor, but decompressed thousands or perhaps millions of timesby users. The distributor can afford to use a hugely powerful computerto perform the compression, but the decompression must be performed oneach user computer using minimum resources.While thinking about this, I happened to apply the phrase table ideaof my LZRW2 algorithm to the LZW algorithm to produce an LZW-likealgorithm that provides the same compression performance as LZW, butyields very much faster decompression, possibly at a cost in memoryconsumption and speed of the compressor.Description of the LZRW5 Idea------------------------------The trick is for the compressor and decompressor to maintain a "phrasetable" of roughly the most recent 2^n phrases parsed (where n is thecodeword width). Apart from a large memory output buffer, this is theONLY data structure of the decompressor. The compressor is likely toneed other data structures. Each phrase table entry consists of apointer to the phrase and the length of the phrase. An index rotatesthrough this table indicating the next position to overwrite. Pointerspoint to the input buffer in the compressor and the output buffer inthe decompressor.To process each phrase, the decompressor receives a phrase number fromthe compressor. It then looks up the pointer and the length of the phraseand copies the phrase to the output. It then overwrites the next positionin the phrase table with a pointer to the phrase just coded, but putsin the length+1 in the table.The compressor does all this too but keeps an additional table thatenables it to select the longest match from the strings in the phrasetable.Here is a rough outline for the decompressor code. A complete 68000C and assembly language version appears in Appendix A. unsigned cycle; -- Index of next slot to write. unsigned word *p_src; -- Pointer to input block. unsigned byte *p_dst; -- Pointer to output block. char *pointer[65536]; -- Stores pointers to phrases. unsigned *length[65536]; -- Stores length of phrases. p_src=pointer to start of input block. p_dst=pointer to start of output block. cycle=256; forall ch in 0..255, pointer[ch]=&<constant byte being value of ch>; forall ch in 0..255, length[ch]=1; while (p_src!=end of input block) loop unsigned phrasenum,len; phrasenum=*p_src++; -- Pick up next code word. p=pointer[phrasenum]; -- Get pointer and length of phrase to copy len=length[phrasenum]; pointer[cycle]=p_dst; -- Add a new phrase one byte longer. length[cycle]=len+1; cycle++; -- Update cycle counter. if (cycle==2^n) cycle=256;-- Note:0..255 reserved for alphabet. while (len--) *p_dst++=*p++; -- Copy phrase to output. end loopThis decompressor code has the full power of LZW while giving roughlythe speed of the fast LZ77 decompressors (e.g. the A1 or LZRW1algorithms). For 65536 phrases, it will use 320K, plus space for theoutput buffer. The algorithm is adaptive so reducing the table sizewill probably not impact on compression grossly (you may or may notwant to pack the resultant narrower codewords).This decompressor avoids the nasty problem LZW has with adding aphrase and then using it immediately (and having to look up the firstcharacter etc). The problem is solved in a LZ77 kind of a way by usingpointers to the output to do automatic overlapping! This decompressoris also adaptive because it overwrites old phrases with new ones. Unixcompress has to do its adaption explicitly.There are any number of compressor implementations that could operatein conjunction with the decompressor above. The compressor can useexpensive data structures to find the best match. It would also be avery easy and efficient thing to modify the decompressor to (say) addTWO strings to the dictionary instead of one (e.g. strings of length 1and 3 characters longer, or this and the next phrase) (many othersimilar variations are possible).There is no reason why a conventional LZW hash table compressor couldnot be modified to drive the output through a phrase table. This wouldyield roughly the same compression speed.The code below is an early draft of the LZRW5 algorithm. Thecompressor uses binary trees and is rather pathetic (it runs at about1k/s) as I have been concentrating on the deecompressor.The 68000 assembly language version of the decompressor runs at93K/sec on my Mac-SE20 (43K/sec in C). This (the 93K/sec) is only 2/3the speed of LZRW1 (which yields worse compression), but a lot fasterthan Unix compress. The compressor below runs at about 1K/second.So far I have been unable to ascertain whether this idea is covered byany patents. If anyone can help here, I would be most grateful.Enjoy,Ross Williamsross@spam.ua.oz.auAppendix A: Half-Baked Code for LZRW5-------------------------------------The following code is public domain and is provided "as is". As mostexperimental code, it is poorly designed, structured, coded, andcommented.WARNING: It is possible that this code may be subject to patentprotection by third parties. As such, it should not be included inproduction software or executed for any but experimental purposeswithout prior legal advice.WARNING: This code will not work properly for more than 65536 phrases.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -