📄 lzrw2.txt

📁 一个基于lZW压缩算法的高效实现
💻 TXT
字号:
NOTES ON THE LZRW2 ALGORITHM============================Author : Ross Williams.Date   : 29-Jun-1991.Abstract--------This file announces the release of my LZRW2 algorithm which wasinvented on 25-Nov-1990. The LZRW2 is a descendant of the LZRW1-Aalgorithm. LZRW2 runs a little more slowly and uses a little morememory than LZRW1-A but yields better compression and isdeterministic. LZRW2 is not patented and is of the LZ77 class and sois unlikely to be subject to a patent challenge. The exact figures forthe Calgary corpus on C implementations on my Macintosh are(percentage remaining, compression speed, decompression speed, memoryrequired during compression and decompression):           PerRem    ComK/S    DecK/S     ComMem    DecMemLZRW1-A     55.1 %   17 K/s     69 K/s      16 K       0 KLZRW2       51.5 %   18 K/s     54 K/s      24 K      16 KLZRW2 has been written and released mainly to demonstrate the phrasetable idea but may also be useful to those who are unhappy withLZRW1-A's compression performance.Availability------------The only implementation available is in C. It can be found in thefollowing archive within a couple of days of 29-Jun-1991:   FTP Archive Access:   Machine   : sirius.itd.adelaide.edu.au   [IP=129.127.40.3]   Directory : ~pub/compression   Files     : lzrw2.tex   - This file.               lzrw2.c     - An implementation in C.Motivation for LZRW2--------------------LZ77 algorithms derive their high decompression speed from thesimplicity of the semantics of the code they use. To process eachitem, the decompressor has to either copy a single literal byte orconvert an offset into an address and perform a multi-byte copy.Because there is no dictionary to look up, the offset/length scheme isextremely simple and fast to decode.However, length/offsets schemes waste a lot of code space. Forexample:   1) Many offsets refer to the same string values.   2) The length range may be too long for some strings.More sophisticated algorithms (e.g. LZ78 class algorithms) are moremiserly with their code space and construct elaborate dictionaries tomap between individual strings and the code space values.Although it may seem that one has to choose between a simple mappingand a complex dictionary, I believe that there is a grey region ofextremely simple hardly-dictionary structures that bridge the gap.LZRW2 uses such a structure which I call a phrase table.To increase speed LZRW1-A (and LZRW1) only updates its hash table atthe end of each phrase (where here a phrase is defined to be either asingle-byte literal item or a copy item). This technique works wellbecause it means that the work that the algorithm puts into updatingits table is inversely proportional to how well it is compressing.When it is compressing badly, it will be moving one literal byte at atime and updating its table frequently. When compressing well, it willbe moving at a few (copy) bytes at a time, only updating its tableonce every few bytes. The technique loses just a few percentcompression, but increases speed significantly.A consequence of the technique is that MOST of the offset space isrendered unusable because the compression algorithm will never attemptto match at a particular offset unless that offset (or a correspondingpointer) appears in the hash table, and if the hash able is updatedonly every few bytes then most offsets won't make it into the table.To exploit the redundant offset space, a phrase table mechanism can beemployed. A phrase table is a table of pointers to the first byte ofthe most recent 4096 phrases processed by the algorithm. The phrasetable is organized as a circular array with a pointer to the nextposition. Both the compressor and decompressor maintain the phrasetable which is very cheap to update, requiring only that once perphrase a pointer to the start of the phrase just encoded/decoded beoverwritten at the next position in the phrase table and the positionpointer incremented circularly.As in LZRW1-A, the compressor maintains a hash table (the decompressordoesn't have to), but in LZRW2, the hash table entries are indexes tothe phrase table.LZRW2 is identical to LZRW1-A except that instead of transmitting alength and an offset, LZRW2 transmits a length and a phrase tableindex.The effect of the phrase table is not to fill in the gaps in thehistory positions to which the compressor can point, but rather toaccept the gaps as good for speed and use the phrase table to increasethe reach of the compressor in pointing far back. LZRW1 and LZRW1-Acan only reach back 4095 BYTES whereas LZRW2 can reach back 4096PHRASES which is a lot more, particularly as the average length ofphrases can get up in the 4 to 6 byte range.Whether the circular phrase table idea is new or not (and as far as Iknow, it is), the LZRW2 algorithm acts as a good example of itsapplication.For more details, see the code itself.Benchmark---------Here are the results of applying LZRW2.C compiled under THINK C 4.0and running on a Mac-SE (8MHz 68000) to the standard Calgary corpus.    +----------------------------------------------------------------+    | DATA COMPRESSION TEST                                          |    | =====================                                          |    | Time of run     : Sat 29-Jun-1991 01:24PM                      |    | Timing accuracy : One part in 100                              |    | Context length  : 262144 bytes (= 256.0000K)                   |    | Test suite      : Calgary Corpus Suite                         |    | Files in suite  : 14                                           |    | Algorithm       : LZRW2                                        |    | Note: All averages are calculated from the un-rounded values.  |    +----------------------------------------------------------------+    | File Name   Length  CxB  ComLen  %Remn  Bits  Com K/s  Dec K/s |    | ----------  ------  ---  ------  -----  ----  -------  ------- |    | rpus:Bib.D  111261    1   58726   52.8  4.22    16.98    52.36 |    | us:Book1.D  768771    3  491413   63.9  5.11    14.82    47.04 |    | us:Book2.D  610856    3  331932   54.3  4.35    17.10    51.28 |    | rpus:Geo.D  102400    1   84118   82.1  6.57    10.81    41.67 |    | pus:News.D  377109    2  215652   57.2  4.57    15.20    50.68 |    | pus:Obj1.D   21504    1   13032   60.6  4.85    13.13    50.15 |    | pus:Obj2.D  246814    1  119078   48.2  3.86    17.81    55.01 |    | s:Paper1.D   53161    1   28269   53.2  4.25    17.16    51.92 |    | s:Paper2.D   82199    1   46789   56.9  4.55    16.58    49.96 |    | rpus:Pic.D  513216    2  123311   24.0  1.92    33.17    71.63 |    | us:Progc.D   39611    1   19959   50.4  4.03    17.65    53.36 |    | us:Progl.D   71646    1   28571   39.9  3.19    22.63    59.13 |    | us:Progp.D   49379    1   19544   39.6  3.17    22.52    59.45 |    | us:Trans.D   93695    1   35364   37.7  3.02    22.87    60.89 |    +----------------------------------------------------------------+    | Average     224401    1  115411   51.5  4.12    18.46    53.89 |    +----------------------------------------------------------------+--<End of Release Notes for LZRW2>--
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -