/******************************************************************************/
/*                                                                            */
/*                                 LZRW3-A.C                                  */
/*                                                                            */
/******************************************************************************/
/*                                                                            */
/* Author  : Ross Williams.                                                   */
/* Date    : 15-Jul-1991.                                                     */
/* Release : 1.                                                               */
/*                                                                            */
/******************************************************************************/
/*                                                                            */
/* This file contains an implementation of the LZRW3-A data compression      */
/* algorithm in the C programming language.                                   */
/*                                                                            */
/* The LZRW3-A algorithm has the following features:                          */
/*                                                                            */
/*    1 Requires only 16K of memory (for both compression and decompression). */
/*    2 The compressor runs about two times faster than Unix compress's.      */
/*    3 The decompressor runs about three times faster than Unix compress's.  */
/*    4 Yields a few percent better compression than Unix compress for        */
/*      most files.                                                           */
/*    5 Allows you to dial up extra compression at a speed cost in the        */
/*      compressor. The speed of the decompressor is not affected.            */
/*    6 Algorithm is deterministic.                                           */
/*    7 Algorithm is free of patent problems. The algorithm has not been      */
/*      patented (nor will it be) and is of the LZ77 class which is fairly    */
/*      clear of patents.                                                     */
/*    8 This implementation in C is in the public domain.                     */
/*                                                                            */
/* (Timing tests for the speed comparison were performed on a Pyramid 9820.)  */
/*                                                                            */
/* LZRW3-A is LZRW3 with a deepened hash table. This simple change yields     */
/* about a 6% (absolute) improvement in compression.                          */
/*                                                                            */
/* Here are the results of applying this code, compiled under THINK C 4.0     */
/* and running on a Mac-SE (8MHz 68000), to the standard Calgary corpus.      */
/*                                                                            */
/* +----------------------------------------------------------------+        */
/* |                     DATA COMPRESSION TEST                      |        */
/* |                     =====================                      |        */
/* | Time of run     : Mon 15-Jul-1991 05:29PM                      |        */
/* | Timing accuracy : One part in 100                              |        */
/* | Context length  : 262144 bytes (= 256.0000K)                   |        */
/* | Test suite      : Calgary Corpus Suite                         |        */
/* | Files in suite  : 14                                           |        */
/* | Algorithm       : LZRW3-A                                      |        */
/* | Note: All averages are calculated from the un-rounded values.  |        */
/* +----------------------------------------------------------------+        */
/* | File Name   Length  CxB  ComLen  %Remn  Bits  Com K/s  Dec K/s |        */
/* | ----------  ------  ---  ------  -----  ----  -------  ------- |        */
/* | rpus:Bib.D  111261    1   49044   44.1  3.53     8.47    31.19 |        */
/* | us:Book1.D  768771    3  420464   54.7  4.38     7.27    30.07 |        */
/* | us:Book2.D  610856    3  277955   45.5  3.64     8.51    33.40 |        */
/* | rpus:Geo.D  102400    1   84218   82.2  6.58     4.23    15.04 |        */
/* | pus:News.D  377109    2  192880   51.1  4.09     7.08    25.89 |        */
/* | pus:Obj1.D   21504    1   12651   58.8  4.71     5.23    17.44 |        */
/* | pus:Obj2.D  246814    1  108044   43.8  3.50     8.01    28.11 |        */
/* | s:Paper1.D   53161    1   24526   46.1  3.69     8.11    30.24 |        */
/* | s:Paper2.D   82199    1   39483   48.0  3.84     8.11    32.04 |        */
/* | rpus:Pic.D  513216    2  111622   21.7  1.74    10.64    49.31 |        */
/* | us:Progc.D   39611    1   17923   45.2  3.62     8.06    29.01 |        */
/* | us:Progl.D   71646    1   24362   34.0  2.72    10.74    39.51 |        */
/* | us:Progp.D   49379    1   16805   34.0  2.72    10.64    37.58 |        */
/* | us:Trans.D   93695    1   30296   32.3  2.59    11.02    38.06 |        */
/* +----------------------------------------------------------------+        */
/* | Average     224401    1  100733   45.8  3.67     8.29    31.21 |        */
/* +----------------------------------------------------------------+        */
/*                                                                            */
/******************************************************************************/

                            /* INCLUDE FILES */
                            /* ============= */

//#include "port.h"      /* Defines symbols for the non portable stuff.       */
#include "compress.h"    /* Defines single exported function "compress".      */
#include "fast_copy.h"   /* Fast memory copy routine.                         */

/******************************************************************************/
/* The following structure is returned by the "compress" function below when  */
/* the user asks the function to return identifying information.              */
/* The most important field in the record is the working memory field which   */
/* tells the calling program how much working memory should be passed to      */
/* "compress" when it is called to perform a compression or decompression.    */
/* LZRW3-A uses the same amount of memory during compression and              */
/* decompression. For more information on this structure see "compress.h".    */
/* The alignment fudge below really only needs to be 4 (but I play it safe!). */
/* The id looks non-random, but it really was generated by coin tossing!      */

#define U(X)            ((ULONG) X)
#define SIZE_P_BYTE     (U(sizeof(UBYTE *)))
#define ALIGNMENT_FUDGE (U(16))
#define MEM_REQ         ( U(4096)*(SIZE_P_BYTE) + ALIGNMENT_FUDGE )

static struct compress_identity identity =
{
 U(0x01B90B91),            /* Algorithm identification number. */
 MEM_REQ,                  /* Working memory (bytes) required. */
 "LZRW3-A",                /* Name of algorithm.               */
 "1.0",                    /* Version number of algorithm.     */
 "15-Jul-1991",            /* Date of algorithm.               */
 "Public Domain",          /* Copyright notice.                */
 "Ross N. Williams",       /* Author of algorithm.             */
 "Renaissance Software",   /* Affiliation of author.           */
 "Public Domain"           /* Vendor of algorithm.             */
};

void compress_compress  (UBYTE *,UBYTE *,ULONG,UBYTE *,ULONG *);
void compress_decompress(UBYTE *,UBYTE *,ULONG,UBYTE *,ULONG *);

/******************************************************************************/
/* This function is the only function exported by this module.                */
/* Depending on its first parameter, the function can be requested to         */
/* compress a block of memory, decompress a block of memory, or to identify   */
/* itself. For more information, see the specification file "compress.h".     */

EXPORT void compress(action,wrk_mem,src_adr,src_len,dst_adr,p_dst_len)
UWORD      action;     /* Action to be performed.                             */
UBYTE    *wrk_mem;     /* Address of working memory we can use.               */
UBYTE    *src_adr;     /* Address of input data.                              */
ULONG     src_len;     /* Length of input data.                               */
UBYTE    *dst_adr;     /* Address to put output data.                         */
ULONG  *p_dst_len;     /* Address of longword for length of output data.      */
{
 switch (action)
   {
    case COMPRESS_ACTION_IDENTITY:
       *p_dst_len=(ULONG) &identity;
       break;
    case COMPRESS_ACTION_COMPRESS:
       compress_compress(wrk_mem,src_adr,src_len,dst_adr,p_dst_len);
       break;
    case COMPRESS_ACTION_DECOMPRESS:
       compress_decompress(wrk_mem,src_adr,src_len,dst_adr,p_dst_len);
       break;
   }
}

/******************************************************************************/
/*                                                                            */
/*                 BRIEF DESCRIPTION OF THE LZRW3-A ALGORITHM                 */
/*                 ==========================================                 */
/* Note: Before attempting to understand this algorithm, you should first     */
/* understand the LZRW3 algorithm from which this algorithm is derived.       */
/*                                                                            */
/* The LZRW3-A algorithm is identical to the LZRW3 algorithm except that the  */
/* hash table has been "deepened". The LZRW3 algorithm has a hash table of    */
/* 4096 pointers which point to strings in the buffer. LZRW3-A generalizes    */
/* this to 4096/(2^n) partitions each of which contains (2^n) pointers.       */
/* In LZRW3-A, the hash function hashes to a partition number.                */
/*                                                                            */
/* During the processing of each phrase, LZRW3 overwrites the pointer in the  */
/* position selected by the hash function. LZRW3-A overwrites one of the      */
/* pointers in the partition that was selected by the hash function.          */
/*                                                                            */
/* When searching for a match, LZRW3-A matches against all (2^n) strings      */
/* pointed to by the pointers in the target partition.                        */
/*                                                                            */
/* Deep hash tables were used in early versions of LZRW1 in late 1989, but    */
/* were discarded in an effort to increase speed (which was the primary       */
/* requirement for LZRW1). They were revived for use in LZRW3-A in order to   */
/* produce an algorithm with compression performance competitive with Unix    */
/* compress.                                                                  */
/*                                                                            */
/* Until 14-Jul-1991, deep hash tables used in prototype LZRW* algorithms     */
/* used a queue discipline within each partition. Upon the arrival of a new   */
/* pointer, the pointers in the partition would be block copied back one      */
/* position (with the oldest pointer being overwritten) and the new pointer   */
/* would be inserted in the space at the front (the youngest position).       */
/* This meant that pointers to the (2^n) most recent phrases corresponding to */
/* each hash were kept. The only flaw in this system was the time-consuming   */
/* block copy operation which was cheap for shallow tables but expensive for  */
/* deep tables.                                                               */
/*                                                                            */
/* The traditional solution to ring buffer block copy problems is to maintain */
/* a cyclic counter which points to the "head" of the queue. However, this    */
/* would have required one counter to be stored for each partition and would  */
/* have been slightly messy. After some thought (on 14-Jul-1991) a better     */
/* solution was found. Instead of maintaining a counter for each partition,   */
/* LZRW3-A maintains a single counter for all partitions! This counter is     */
/* maintained in both the compressor and decompressor and means that the      */
/* algorithm (effectively) overwrites a RANDOM element of the partition to be */
/* updated. The result was to increase the speed of the compressor and        */
/* decompressor, to make the decompressor's speed independent of whatever     */
/* depth was selected, and to impair compression by less than 1% absolute.    */
/*                                                                            */
/* Setting the depth is a speed/compression tradeoff. The table below gives   */
/* the tradeoff observed for a typical 50K text file on a Mac-SE.             */
/* Note: %Rem=Percentage Remaining (after compression).                       */
/*                                                                            */
/*        Depth    %Rem    CmpK/s    DecK/s                                   */
/*          1      45.2     14.77     32.24                                   */
/*          2      42.6     12.12     31.26                                   */
/*          4      40.9     10.28     31.91                                   */
/*          8      40.0      7.81     32.36                                   */
/*         16      39.5      5.30     32.47                                   */
/*         32      39.0      3.23     32.59                                   */
/*                                                                            */
/* I have chosen a depth of 8 as the "default" depth for LZRW3-A. If you use  */
/* a depth different to this (e.g. 4), you should use the name LZRW3-A(4) to  */
/* indicate that a different depth is being used. LZRW3-A(8) is an acceptable */
/* longhand for LZRW3-A.                                                      */
/*                                                                            */
/* To change the depth, search for "HERE IT IS" in the rest of this file.     */
/*                                                                            */
/*                                +---+                                       */
/*                                |___|4095                                   */
/*                                |===|                                       */
/*          +---------------------*_|<---+             /----+---\             */
/*          |                     |___|  +-------------|Hash    |             */
/*          |   512 partitions    |___|                |Function|             */
/*          |   of 8 pointers     |===|                \--------/             */
/*          |   each (or any      |___|0                    ^                 */
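The partition discipline described above can be sketched in a few lines. This is an illustrative model only, not code from this file: the real implementation stores `UBYTE *` pointers into the history buffer and compares bounded-length phrases, whereas this sketch uses NUL-terminated strings and the hypothetical names `table`, `cycle`, `insert`, and `search`. It shows the two key ideas: updates overwrite the slot selected by one counter shared by ALL partitions (so no per-partition head pointer and no block copy), and searches probe every pointer in the hashed partition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PARTITIONS 512   /* 4096/DEPTH partitions                        */
#define DEPTH      8     /* (2^n) pointers per partition; n=3 here       */

/* The 4096-pointer table, viewed as 512 rows of 8 slots. Static storage
   is zero-initialized, so every slot starts out empty (NULL).           */
static const char *table[PARTITIONS * DEPTH];

/* The single cyclic counter shared by all partitions. Compressor and
   decompressor step it identically, so both agree on which slot of a
   partition gets overwritten, even though the choice looks random.     */
static unsigned cycle = 0;

/* Overwrite one slot of the hashed partition and bump the counter.     */
static void insert(unsigned hash, const char *p)
{
    table[(hash % PARTITIONS) * DEPTH + cycle] = p;
    cycle = (cycle + 1) % DEPTH;  /* no per-partition bookkeeping       */
}

/* Probe all DEPTH slots of the hashed partition and return the stored
   string sharing the longest common prefix with s (NULL if none).      */
static const char *search(unsigned hash, const char *s)
{
    const char **part = &table[(hash % PARTITIONS) * DEPTH];
    const char *best = NULL;
    size_t best_len = 0;
    unsigned i;
    for (i = 0; i < DEPTH; i++) {
        const char *cand = part[i];
        size_t len = 0;
        if (cand == NULL) continue;
        while (cand[len] != '\0' && cand[len] == s[len]) len++;
        if (len > best_len) { best_len = len; best = cand; }
    }
    return best;
}
```

With this discipline a depth-8 update costs one store and one counter increment, versus the seven-element block copy of the old queue discipline; the comment block above reports that this trade costs under 1% (absolute) compression.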