📄 lzrw3-a.c
字号:
/* | a*b=4096) +---+ | *//* | Hash +-----+ *//* | Table | *//* | --- *//* v ^^^ *//* +-------------------------------------|----------------+ *//* |||||||||||||||||||||||||||||||||||||||||||||||||||||||| *//* +-------------------------------------|----------------+ *//* | |1......18| | *//* |<------- Lempel=History ------------>|<--Ziv-->| | *//* | (=bytes already processed) |<-Still to go-->| *//* |<-------------------- INPUT BLOCK ------------------->| *//* *//* *//******************************************************************************//* *//* DEFINITION OF COMPRESSED FILE FORMAT *//* ==================================== *//* * A compressed file consists of a COPY FLAG followed by a REMAINDER. *//* * The copy flag CF uses up four bytes with the first byte being the *//* least significant. *//* * If CF=1, then the compressed file represents the remainder of the file *//* exactly. Otherwise CF=0 and the remainder of the file consists of zero *//* or more GROUPS, each of which represents one or more bytes. *//* * Each group consists of two bytes of CONTROL information followed by *//* sixteen ITEMs except for the last group which can contain from one *//* to sixteen items. *//* * An item can be either a LITERAL item or a COPY item. *//* * Each item corresponds to a bit in the control bytes. *//* * The first control byte corresponds to the first 8 items in the group *//* with bit 0 corresponding to the first item in the group and bit 7 to *//* the eighth item in the group. *//* * The second control byte corresponds to the second 8 items in the group *//* with bit 0 corresponding to the ninth item in the group and bit 7 to *//* the sixteenth item in the group. *//* * A zero bit in a control word means that the corresponding item is a *//* literal item. A one bit corresponds to a copy item. *//* * A literal item consists of a single byte which represents itself. *//* * A copy item consists of two bytes that represent from 3 to 18 bytes. *//* * The first byte in a copy item will be denoted C1. *//* * The second byte in a copy item will be denoted C2. *//* * Bits will be selected using square brackets. *//* For example: C1[0..3] is the low nibble of the first control byte. *//* of copy item C1. *//* * The LENGTH of a copy item is defined to be C1[0..3]+3 which is a number *//* in the range [3,18]. *//* * The INDEX of a copy item is defined to be C1[4..7]*256+C2[0..8] which *//* is a number in the range [0,4095]. *//* * A copy item represents the sequence of bytes *//* text[POS-OFFSET..POS-OFFSET+LENGTH-1] where *//* text is the entire text of the uncompressed string. *//* POS is the index in the text of the character following the *//* string represented by all the items preceeding the item *//* being defined. *//* OFFSET is obtained from INDEX by looking up the hash table. *//* *//******************************************************************************//* When I first started to get concerned about the portability of my C code, *//* I switched over to using only macro defined types UBYTE, UWORD, ULONG and *//* one or two others. While, these are useful for most purposes, they impair *//* efficiency as, if I have a variable whose range will be [0,1000], I will *//* declare it as a UWORD. This will translate into (say) "short int" and *//* hence may be less efficient than just an "int" which represents the *//* natural size of the machine. Before releasing LZRW3-A, I realized this *//* mistake. Unfortunately, I can't access the ftp archive with my portability *//* header in it in time for this algorithm's release and so I am including an *//* extra definition. The definition UCARD stands for an unsigned (cardinal) *//* type that can hold values in the range [0,32767]. This is within the ANSI *//* range of a standard int or unsigned. No assumption about overflow of this *//* type is made in the code (i.e. all usages are within range and I do not *//* use the value -1 to detect the end of loops.). *//* You can use either "unsigned" or just "int" here depending on which is *//* more efficient in your environment (both the same probably). */#define UCARD unsigned/* The following #define defines the length of the copy flag that appears at *//* the start of the compressed file. The value of four bytes was chosen *//* because the fast_copy routine on my Macintosh runs faster if the source *//* and destination blocks are relatively longword aligned. *//* The actual flag data appears in the first byte. The rest are zeroed so as *//* to normalize the compressed representation (i.e. not non-deterministic). */#define FLAG_BYTES 4/* The following #defines define the meaning of the values of the copy *//* flag at the start of the compressed file. */#define FLAG_COMPRESS 0 /* Signals that output was result of compression. */#define FLAG_COPY 1 /* Signals that output was simply copied over. *//* The 68000 microprocessor (on which this algorithm was originally developed *//* is fussy about non-aligned arrays of words. To avoid these problems the *//* following macro can be used to "waste" from 0 to 3 bytes so as to align *//* the argument pointer. */#define ULONG_ALIGN_UP(X) ((((ULONG)X)+3)&~3)/* The following constant defines the maximum length of an uncompressed item. *//* This definition must not be changed; its value is hardwired into the code. *//* The longest number of bytes that can be spanned by a single item is 18 *//* for the longest copy item. */#define MAX_RAW_ITEM (18)/* The following constant defines the maximum length of an uncompressed group.*//* This definition must not be changed; its value is hardwired into the code. *//* A group contains at most 16 items which explains this definition. */#define MAX_RAW_GROUP (16*MAX_RAW_ITEM)/* The following constant defines the maximum length of a compressed group. *//* This definition must not be changed; its value is hardwired into the code. *//* A compressed group consists of two control bytes followed by up to 16 *//* compressed items each of which can have a maximum length of two bytes. */#define MAX_CMP_GROUP (2+16*2)/* This constant defines the number of pointers in the hash table. The number *//* of partitions multiplied by the number of pointers in each partition must *//* multiply out to this value of 4096. In LZRW1, LZRW1-A, and LZRW2, this *//* table length value can be changed. However, in LZRW3-A (and LZRW3), the *//* table length cannot be changed because it is connected directly to the *//* coding scheme which is hardwired (the table index of a single pointer is *//* transmitted in the 12-bit index field). So don't change this constant! */#define HASH_TABLE_LENGTH (4096)/* HERE IT IS: THE PLACE TO CHANGE THE HASH TABLE DEPTH! *//* The following definition is the log_2 of the depth of the hash table. This *//* constant can be in the range [0,1,2,3,...,12]. Increasing the depth *//* increases compression at the expense of speed. However, you are not likely *//* to see much of a compression improvement (e.g. not more than 0.5%) above a *//* value of 6 and the algorithm will start to get very slow. See the table in *//* the earlier comments block for an idea of the trade-off involved. *//* Note: The parentheses are to avoid macro substitution funnies. *//* Note: The LZRW3-A default is a value of (3). *//* Note: If you end up choosing a value of 0, you should use LZRW3 instead. *//* Note: Changing the value of HASH_TABLE_DEPTH_BITS is the ONLY thing you *//* have to do to change the depth, so go ahead and recompile now! *//* Note: I have tested LZRW3-A for DEPTH_BITS=0,1,2,3,4 and a few other *//* values. However, I have not tested it for 12 as I can't wait that long! */#define HASH_TABLE_DEPTH_BITS (3) /* Must be in range [0,12]. *//* The following definitions are all self-explanatory and follow from the *//* definition of HASH_TABLE_DEPTH_BITS and the hardwired requirement that the *//* hash table contain exactly 4096 pointers. */#define PARTITION_LENGTH_BITS (12-HASH_TABLE_DEPTH_BITS)#define PARTITION_LENGTH (1<<PARTITION_LENGTH_BITS)#define HASH_TABLE_DEPTH (1<<HASH_TABLE_DEPTH_BITS )#define HASH_MASK (PARTITION_LENGTH-1)#define DEPTH_MASK (HASH_TABLE_DEPTH-1)/* LZRW3-A, unlike LZRW1(-A), must initialize its hash table so as to enable *//* the compressor and decompressor to stay in step maintaining identical hash *//* tables. In an early version of LZRW3, the tables were simply *//* initialized to zero and a check for zero was included just before the *//* matching code. However, this test costs time. A better solution is to *//* initialize all the entries in the hash table to point to a constant *//* string. The decompressor does the same. This solution requires no extra *//* test. The contents of the string do not matter so long as the string is *//* the same for the compressor and decompressor and contains at least *//* MAX_RAW_ITEM bytes. I chose consecutive decimal digits because they do not *//* have white space problems (e.g. there is no chance that the compiler will *//* replace more than one space by a TAB) and because they make the length of *//* the string obvious by inspection. */#define START_STRING_18 ((UBYTE *) "123456789012345678")/* The following macro accepts a pointer PTR to three consecutive bytes in *//* memory and hashes them into an integer that is a hash table index that *//* points to the zeroth (first) element of a partition. Thus, the hash *//* function really hashes to a partition number but, for convenience, *//* multiplies it up to yield a hash table index. From all this, we see that *//* the resultant number is in the range [0,HASH_TABLE_LENGTH-1] and is a *//* multiple of HASH_TABLE_DEPTH. *//* A macro is used, because in LZRW3-A we have to hash more than once. */#define HASH(PTR) \ ( \ (((40543*(((*(PTR))<<8)^((*((PTR)+1))<<4)^(*((PTR)+2))))>>4) & HASH_MASK) \ << HASH_TABLE_DEPTH_BITS \ )/* Another operation that is performed more than once is the updating of the *//* hash table. Here two macros are defined to simplify update operations. *//* Updating consists of identifying and overwriting a pointer in a partition *//* with a newer pointer and then updating the global cycle value. *//* These macros accept the new pointer (NEWPTR) and either a pointer to *//* (P_BASE) or the index of (I_BASE) the zeroth (first, or base) pointer in *//* the partition that is to be updated. The macros use the 'cycle' variable *//* to locate and overwrite a pointer and then update the cycle value. *//* Note: Hardcoding 'cycle' in this macro is naughty (it should really be a *//* macro parameter), but I have done so because it neatens up the code. */#define UPDATE_P(P_BASE,NEWPTR) \{(P_BASE)[cycle++]=(NEWPTR); cycle&=DEPTH_MASK;}#define UPDATE_I(I_BASE,NEWPTR) \{hash[(I_BASE)+cycle++]=(NEWPTR); cycle&=DEPTH_MASK;}/* This constant supplies a legal (in-range) hash table index for use when *//* a legal-but-don't-care index is required. */#define ANY_HASH_INDEX (0)/******************************************************************************/LOCAL void compress_compress (p_wrk_mem,p_src_first,src_len,p_dst_first,p_dst_len)/* Input : Hand over the required amount of working memory in p_wrk_mem. *//* Input : Specify input block using p_src_first and src_len. *//* Input : Point p_dst_first to the start of the output zone (OZ). *//* Input : Point p_dst_len to a ULONG to receive the output length. *//* Input : Input block and output zone must not overlap. *//* Output : Length of output block written to *p_dst_len. *//* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. May *//* Output : write in OZ=Mem[p_dst_first..p_dst_first+src_len+MAX_CMP_GROUP-1].*//* Output : Upon completion guaranteed *p_dst_len<=src_len+FLAG_BYTES. */UBYTE *p_wrk_mem;UBYTE *p_src_first;ULONG src_len;UBYTE *p_dst_first;ULONG *p_dst_len;{
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -