📄 lzrw3-a.c

📁 一个基于lZW压缩算法的高效实现
💻 C
📖 第 1 页 / 共 4 页
字号:
/*              |    a*b=4096)      +---+             |                       *//*              |                   Hash        +-----+                       *//*              |                   Table       |                             *//*              |                              ---                            *//*              v                              ^^^                            *//*      +-------------------------------------|----------------+              *//*      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||              *//*      +-------------------------------------|----------------+              *//*      |                                     |1......18|      |              *//*      |<------- Lempel=History ------------>|<--Ziv-->|      |              *//*      |     (=bytes already processed)      |<-Still to go-->|              *//*      |<-------------------- INPUT BLOCK ------------------->|              *//*                                                                            *//*                                                                            *//******************************************************************************//*                                                                            *//*                     DEFINITION OF COMPRESSED FILE FORMAT                   *//*                     ====================================                   *//*  * A compressed file consists of a COPY FLAG followed by a REMAINDER.      *//*  * The copy flag CF uses up four bytes with the first byte being the       *//*    least significant.                                                      *//*  * If CF=1, then the compressed file represents the remainder of the file  *//*    exactly. Otherwise CF=0 and the remainder of the file consists of zero  *//*    or more GROUPS, each of which represents one or more bytes.             *//*  * Each group consists of two bytes of CONTROL information followed by     *//*    sixteen ITEMs except for the last group which can contain from one      *//*    to sixteen items.                                                       *//*  * An item can be either a LITERAL item or a COPY item.                    *//*  * Each item corresponds to a bit in the control bytes.                    *//*  * The first control byte corresponds to the first 8 items in the group    *//*    with bit 0 corresponding to the first item in the group and bit 7 to    *//*    the eighth item in the group.                                           *//*  * The second control byte corresponds to the second 8 items in the group  *//*    with bit 0 corresponding to the ninth item in the group and bit 7 to    *//*    the sixteenth item in the group.                                        *//*  * A zero bit in a control word means that the corresponding item is a     *//*    literal item. A one bit corresponds to a copy item.                     *//*  * A literal item consists of a single byte which represents itself.       *//*  * A copy item consists of two bytes that represent from 3 to 18 bytes.    *//*  * The first  byte in a copy item will be denoted C1.                      *//*  * The second byte in a copy item will be denoted C2.                      *//*  * Bits will be selected using square brackets.                            *//*    For example: C1[0..3] is the low nibble of the first control byte.      *//*    of copy item C1.                                                        *//*  * The LENGTH of a copy item is defined to be C1[0..3]+3 which is a number *//*    in the range [3,18].                                                    *//*  * The INDEX of a copy item is defined to be C1[4..7]*256+C2[0..8] which   *//*    is a number in the range [0,4095].                                      *//*  * A copy item represents the sequence of bytes                            *//*       text[POS-OFFSET..POS-OFFSET+LENGTH-1] where                          *//*          text   is the entire text of the uncompressed string.             *//*          POS    is the index in the text of the character following the    *//*                   string represented by all the items preceeding the item  *//*                   being defined.                                           *//*          OFFSET is obtained from INDEX by looking up the hash table.       *//*                                                                            *//******************************************************************************//* When I first started to get concerned about the portability of my C code,  *//* I switched over to using only macro defined types UBYTE, UWORD, ULONG and  *//* one or two others. While, these are useful for most purposes, they impair  *//* efficiency as, if I have a variable whose range will be [0,1000], I will   *//* declare it as a UWORD. This will translate into (say) "short int" and      *//* hence may be less efficient than just an "int" which represents the        *//* natural size of the machine. Before releasing LZRW3-A, I realized this     *//* mistake. Unfortunately, I can't access the ftp archive with my portability *//* header in it in time for this algorithm's release and so I am including an *//* extra definition. The definition UCARD stands for an unsigned (cardinal)   *//* type that can hold values in the range [0,32767]. This is within the ANSI  *//* range of a standard int or unsigned. No assumption about overflow of this  *//* type is made in the code (i.e. all usages are within range and I do not    *//* use the value -1 to detect the end of loops.).                             *//* You can use either "unsigned" or just "int" here depending on which is     *//* more efficient in your environment (both the same probably).               */#define UCARD unsigned/* The following #define defines the length of the copy flag that appears at  *//* the start of the compressed file. The value of four bytes was chosen       *//* because the fast_copy routine on my Macintosh runs faster if the source    *//* and destination blocks are relatively longword aligned.                    *//* The actual flag data appears in the first byte. The rest are zeroed so as  *//* to normalize the compressed representation (i.e. not non-deterministic).   */#define FLAG_BYTES 4/* The following #defines define the meaning of the values of the copy        *//* flag at the start of the compressed file.                                  */#define FLAG_COMPRESS 0     /* Signals that output was result of compression. */#define FLAG_COPY     1     /* Signals that output was simply copied over.    *//* The 68000 microprocessor (on which this algorithm was originally developed *//* is fussy about non-aligned arrays of words. To avoid these problems the    *//* following macro can be used to "waste" from 0 to 3 bytes so as to align    *//* the argument pointer.                                                      */#define ULONG_ALIGN_UP(X) ((((ULONG)X)+3)&~3)/* The following constant defines the maximum length of an uncompressed item. *//* This definition must not be changed; its value is hardwired into the code. *//* The longest number of bytes that can be spanned by a single item is 18     *//* for the longest copy item.                                                 */#define MAX_RAW_ITEM (18)/* The following constant defines the maximum length of an uncompressed group.*//* This definition must not be changed; its value is hardwired into the code. *//* A group contains at most 16 items which explains this definition.          */#define MAX_RAW_GROUP (16*MAX_RAW_ITEM)/* The following constant defines the maximum length of a compressed group.   *//* This definition must not be changed; its value is hardwired into the code. *//* A compressed group consists of two control bytes followed by up to 16      *//* compressed items each of which can have a maximum length of two bytes.     */#define MAX_CMP_GROUP (2+16*2)/* This constant defines the number of pointers in the hash table. The number *//* of partitions multiplied by the number of pointers in each partition must  *//* multiply out to this value of 4096. In LZRW1, LZRW1-A, and LZRW2, this     *//* table length value can be changed. However, in LZRW3-A (and LZRW3), the    *//* table length cannot be changed because it is connected directly to the     *//* coding scheme which is hardwired (the table index of a single pointer is   *//* transmitted in the 12-bit index field). So don't change this constant!     */#define HASH_TABLE_LENGTH (4096)/* HERE IT IS: THE PLACE TO CHANGE THE HASH TABLE DEPTH!                      *//* The following definition is the log_2 of the depth of the hash table. This *//* constant can be in the range [0,1,2,3,...,12]. Increasing the depth        *//* increases compression at the expense of speed. However, you are not likely *//* to see much of a compression improvement (e.g. not more than 0.5%) above a *//* value of 6 and the algorithm will start to get very slow. See the table in *//* the earlier comments block for an idea of the trade-off involved.          *//* Note: The parentheses are to avoid macro substitution funnies.             *//* Note: The LZRW3-A default is a value of (3).                               *//* Note: If you end up choosing a value of 0, you should use LZRW3 instead.   *//* Note: Changing the value of HASH_TABLE_DEPTH_BITS is the ONLY thing you    *//* have to do to change the depth, so go ahead and recompile now!             *//* Note: I have tested LZRW3-A for DEPTH_BITS=0,1,2,3,4 and a few other       *//* values. However, I have not tested it for 12 as I can't wait that long!    */#define HASH_TABLE_DEPTH_BITS (3)      /* Must be in range [0,12].            *//* The following definitions are all self-explanatory and follow from the     *//* definition of HASH_TABLE_DEPTH_BITS and the hardwired requirement that the *//* hash table contain exactly 4096 pointers.                                  */#define PARTITION_LENGTH_BITS (12-HASH_TABLE_DEPTH_BITS)#define PARTITION_LENGTH      (1<<PARTITION_LENGTH_BITS)#define HASH_TABLE_DEPTH      (1<<HASH_TABLE_DEPTH_BITS )#define HASH_MASK             (PARTITION_LENGTH-1)#define DEPTH_MASK            (HASH_TABLE_DEPTH-1)/* LZRW3-A, unlike LZRW1(-A), must initialize its hash table so as to enable  *//* the compressor and decompressor to stay in step maintaining identical hash *//* tables. In an early version of LZRW3, the tables were simply               *//* initialized to zero and a check for zero was included just before the      *//* matching code. However, this test costs time. A better solution is to      *//* initialize all the entries in the hash table to point to a constant        *//* string. The decompressor does the same. This solution requires no extra    *//* test. The contents of the string do not matter so long as the string is    *//* the same for the compressor and decompressor and contains at least         *//* MAX_RAW_ITEM bytes. I chose consecutive decimal digits because they do not *//* have white space problems (e.g. there is no chance that the compiler will  *//* replace more than one space by a TAB) and because they make the length of  *//* the string obvious by inspection.                                          */#define START_STRING_18 ((UBYTE *) "123456789012345678")/* The following macro accepts a pointer PTR to three consecutive bytes in    *//* memory and hashes them into an integer that is a hash table index that     *//* points to the zeroth (first) element of a partition. Thus, the hash        *//* function really hashes to a partition number but, for convenience,         *//* multiplies it up to yield a hash table index. From all this, we see that   *//* the resultant number is in the range [0,HASH_TABLE_LENGTH-1] and is a      *//* multiple of HASH_TABLE_DEPTH.                                              *//* A macro is used, because in LZRW3-A we have to hash more than once.        */#define HASH(PTR) \ ( \     (((40543*(((*(PTR))<<8)^((*((PTR)+1))<<4)^(*((PTR)+2))))>>4) & HASH_MASK) \  << HASH_TABLE_DEPTH_BITS \ )/* Another operation that is performed more than once is the updating of the  *//* hash table. Here two macros are defined to simplify update operations.     *//* Updating consists of identifying and overwriting a pointer in a partition  *//* with a newer pointer and then updating the global cycle value.             *//* These macros accept the new pointer (NEWPTR) and either a pointer to       *//* (P_BASE) or the index of (I_BASE) the zeroth (first, or base) pointer in   *//* the partition that is to be updated. The macros use the 'cycle' variable   *//* to locate and overwrite a pointer and then update the cycle value.         *//* Note: Hardcoding 'cycle' in this macro is naughty (it should really be a   *//* macro parameter), but I have done so because it neatens up the code.       */#define UPDATE_P(P_BASE,NEWPTR) \{(P_BASE)[cycle++]=(NEWPTR); cycle&=DEPTH_MASK;}#define UPDATE_I(I_BASE,NEWPTR) \{hash[(I_BASE)+cycle++]=(NEWPTR); cycle&=DEPTH_MASK;}/* This constant supplies a legal (in-range) hash table index for use when    *//* a legal-but-don't-care index is required.                                  */#define ANY_HASH_INDEX (0)/******************************************************************************/LOCAL void compress_compress           (p_wrk_mem,p_src_first,src_len,p_dst_first,p_dst_len)/* Input  : Hand over the required amount of working memory in p_wrk_mem.     *//* Input  : Specify input block using p_src_first and src_len.                *//* Input  : Point p_dst_first to the start of the output zone (OZ).           *//* Input  : Point p_dst_len to a ULONG to receive the output length.          *//* Input  : Input block and output zone must not overlap.                     *//* Output : Length of output block written to *p_dst_len.                     *//* Output : Output block in Mem[p_dst_first..p_dst_first+*p_dst_len-1]. May   *//* Output : write in OZ=Mem[p_dst_first..p_dst_first+src_len+MAX_CMP_GROUP-1].*//* Output : Upon completion guaranteed *p_dst_len<=src_len+FLAG_BYTES.        */UBYTE *p_wrk_mem;UBYTE *p_src_first;ULONG  src_len;UBYTE *p_dst_first;ULONG *p_dst_len;{
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -