📄 pgpzinflate.c
字号:
/*
* inflate.c -- Not copyrighted 1992,1993 by Mark Adler
* Latest hacking on this version by Colin Plumb
* version p3, 19 Oct 1995
*
* $Id: pgpZInflate.c,v 1.23 1999/03/22 16:47:03 hal Exp $
*/
/*
* You can do whatever you like with this source file, though I would
* prefer that if you modify it and redistribute it that you include
* comments to that effect with your name and the date. Thank you.
*
* History:
* vers date who what
* ---- --------- -------------- ------------------------------------
* a ~~ Feb 92 M. Adler used full (large, one-step) lookup table
* b1 21 Mar 92 M. Adler first version with partial lookup tables
* b2 21 Mar 92 M. Adler fixed bug in fixed-code blocks
* b3 22 Mar 92 M. Adler sped up match copies, cleaned up some
* b4 25 Mar 92 M. Adler added prototypes; removed window[] (now
* is the responsibility of unzip.h--also
* changed name to slide[]), so needs diffs
* for unzip.c and unzip.h (this allows
* compiling in the small model on MSDOS);
* fixed cast of q in huft_build();
* b5 26 Mar 92 M. Adler got rid of unintended macro recursion.
* b6 27 Mar 92 M. Adler got rid of nextbyte() routine. fixed
* bug in inflate_fixed().
* c1 30 Mar 92 M. Adler removed lbits, dbits environment variables.
* changed BMAX to 16 for explode. Removed
* OUTB usage, and replaced it with flush()--
* this was a 20% speed improvement! Added
* an explode.c (to replace unimplode.c) that
* uses the huft routines here. Removed
* register union.
* c2 4 Apr 92 M. Adler fixed bug for file sizes a multiple of 32k.
* c3 10 Apr 92 M. Adler reduced memory of code tables made by
* huft_build significantly (factor of two to
* three).
* c4 15 Apr 92 M. Adler added NOMEMCPY do kill use of memcpy().
* worked around a Turbo C optimization bug.
* c5 21 Apr 92 M. Adler added the WSIZE #define to allow reducing
* the 32K window size for specialized
* applications.
* c6 31 May 92 M. Adler added some typecasts to eliminate warnings
* c7 27 Jun 92 G. Roelofs added some more typecasts (444: MSC bug).
* c8 5 Oct 92 J-l. Gailly added ifdef'd code to deal with PKZIP bug.
* c9 9 Oct 92 M. Adler removed a memory error message (~line 416).
* c10 17 Oct 92 G. Roelofs changed ULONG/UWORD/byte to ulg/ush/uch,
* removed old inflate, renamed inflate_entry
* to inflate, added Mark's fix to a comment.
* c11 2 Jan 93 M. Adler fixed bug in detection of incomplete
* tables, and removed assumption that EOB is
* the longest code (bad assumption).
* c12 3 Jan 93 M. Adler make tables for fixed blocks only once.
* c13 5 Jan 93 M. Adler allow all zero length codes (pkzip 2.04c
* outputs one zero length code for an empty
* distance tree).
* c14 12 Mar 93 M. Adler made inflate.c standalone with the
* introduction of inflate.h.
* d0 25 Apr 93 M. Adler several speedups in inflate_codes of
* almost 20% (suggested by Urban Mueller)--
* now requires the use of backfill.[ch]
* d1 4 May 93 M. Adler deleted extraneous statement in cheap loop,
* optimized common copy a little more.
* d2 5 May 93 M. Adler calculate number of cheap loops, a few
* other small optimizations.
* p1 Nov 93 C. Plumb Adpated to be able to suspend itself.
* Pretty massive reorganization.
* Many comments added.
* p2 18 Oct 95 C. Plumb Improved interface some more. Now
* nicely re-entrant. Still more comments.
* p3 19 Oct 95 C. Plumb Changed infInflate core function so that
* it sucks the input completely dry before
* returning. This gets rid of the old ad-hoc
* technique for flushing out the last little
* bit of a compressed file by padding it
* with some dummy data, which was ugly.
*/
/*
* Inflate deflated (PKZIP's method 8 compressed) data. The compression
* method searches for as much of the current string of bytes (up to a
* length of 258) in the previous 32K bytes. If it doesn't find any
* matches (of at least length 3), it codes the next byte. Otherwise, it
* codes the length of the matched string and its distance backwards from
* the current position. There is a single Huffman code that codes both
* single bytes (called "literals") and match lengths. A second Huffman
* code codes the distance information, which follows a length code. Each
* length or distance code actually represents a base value and a number
* of "extra" (sometimes zero) bits to get to add to the base value. At
* the end of each deflated block is a special end-of-block (EOB) literal/
* length code. The decoding process is basically: get a literal/length
* code; if EOB then done; if a literal, emit the decoded byte; if a
* length then get the distance and emit the referred-to bytes from the
* sliding window of previously emitted data.
*
* There are (currently) three kinds of inflate blocks: stored, fixed, and
* dynamic. The compressor outputs a chunk of data at a time and decides
* which method to use on a chunk-by-chunk basis. A chunk might typically
* be 32K to 64K, uncompressed. If the chunk is uncompressible, then the
* "stored" method is used. In this case, the bytes are simply stored as
* is, eight bits per byte, with none of the above coding. The bytes are
* preceded by a (16-bit) count, since there is no longer an EOB code.
*
* If the data is compressible, then either the fixed or dynamic methods
* are used. In the dynamic method, the compressed data is preceded by
* an encoding of the literal/length and distance Huffman codes that are
* to be used to decode this block. The representation is itself Huffman
* coded, and so is preceded by a description of that code. These code
* descriptions take up a little space, and so for small blocks, there is
* a predefined set of codes, called the fixed codes. The fixed method is
* used if the block ends up smaller that way (usually for quite small
* chunks); otherwise the dynamic method is used. In the latter case, the
* codes are customized to the probabilities in the current block and so
* can code it much better than the pre-determined fixed codes can.
*
* The Huffman codes themselves are decoded using a mutli-level table
* lookup, in order to maximize the speed of decoding plus the speed of
* building the decoding tables. See the comments below that precede the
* LBITS and DBITS tuning parameters.
*/
/*
* Notes beyond the 1.93a appnote.txt:
*
* 1. Distance pointers never point before the beginning of the output
* stream.
* 2. Distance pointers can point back across blocks, up to 32k away.
* 3. There is an implied maximum of 7 bits for the bit length table and
* 15 bits for the actual data.
* 4. If only one code exists, then it is encoded using one bit. (Zero
* would be more efficient, but perhaps a little confusing.) If two
* codes exist, they are coded using one bit each (0 and 1).
* 5. There is no way of sending zero distance codes--a dummy must be
* sent if there are none. (History: a pre 2.0 version of PKZIP would
* store blocks with no distance codes, but this was discovered to be
* too harsh a criterion.) Valid only for 1.93a. 2.04c does allow
* zero distance codes, which is sent as one code of zero bits in
* length.
* 6. There are up to 286 literal/length codes. Code 256 represents the
* end-of-block. Note however that the static length tree defines
* 288 codes just to fill out the Huffman codes. Codes 286 and 287
* cannot be used though, since there is no length base or extra bits
* defined for them. Similarily, there are up to 30 distance codes.
* However, static trees define 32 codes (all 5 bits) to fill out the
* Huffman codes, but the last two had better not show up in the data.
* 7. Unzip can check dynamic Huffman blocks for complete code sets.
* The exception is that a single code would not be complete (see #4).
* 8. The five bits following the block type is really the number of
* literal codes sent minus 257.
* 9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits
* (1+6+6). Therefore, to output three times the length, you output
* three codes (1+1+1), whereas to output four times the same length,
* you only need two codes (1+3). Hmm.
* 10. In the tree reconstruction algorithm, Code = Code + Increment
* only if BitLength(i) is not zero. (Pretty obvious.)
* 11. Correction: 4 Bits: # of Bit Length codes - 4 (4 - 19)
* 12. Note: length code 284 can represent 227-258, but length code 285
* really is 258. The last length deserves its own, short code
* since it gets used a lot in very redundant files. The length
* 258 is special since 258 - 3 (the min match length) is 255.
* 13. The literal/length and distance code bit lengths are read as a
* single stream of lengths. It is possible (and advantageous) for
* a repeat code (16, 17, or 18) to go across the boundary between
* the two sets of lengths.
*/
#include "pgpConfig.h"
#include <string.h>
#include "pgpDebug.h"
#include "pgpMem.h"
#include "pgpUsuals.h"
#include "pgpZInflate.h"
/* Increasing these only wastes space, so they can be changed if necessary */
typedef PGPUInt16 ush;
typedef PGPUInt32 ulg;
typedef struct huft huft;
/* Private context structure for use by all functions. */
struct InflateContext {
PGPContextRef context;
/* Major state information */
int state;
int substate;
int lastblock;
/* static trees*/
huft * fixed_tl;
huft * fixed_td;
PGPUInt32 fixed_bl;
PGPUInt32 fixed_bd;
/* Output-related information */
unsigned char *slide; /* Circular output buffer */
unsigned char *slideend;
unsigned slidelen; /* slideend-slide - MUST BE POWER OF 2 */
unsigned slidemask; /* slidelen-1 */
unsigned char *outptr; /* Pointer into slide */
unsigned char const *readptr; /* Pointer into slide for reading */
/* Input-related information */
unsigned char const *inptr; /* Input pointer */
int inlen; /* Input length */
ulg bitbuffer;
unsigned bufbits;
/* The literal/length tree - also used for the bit-length tree */
huft *tree1;
unsigned bits1;
/* The distance tree */
huft *tree2;
unsigned bits2;
/* Encoded huffman tree */
unsigned char bitlen[286+30];
/* For dynamic trees */
unsigned bitlengths;
unsigned litcodes;
unsigned distcodes;
/* Used in various places */
huft const *t;
unsigned copylen;
int distbase;
int numbits; /* # of bits in dyn. tree repeat and after dist. code*/
unsigned index; /* Position in dynamic tree bit lengths we've read to*/
DEBUG_STRUCT_CONSTRUCTOR( InflateContext )
};
/* Options for behaviour */
#ifndef SECURE
#define SECURE 1 /* Wipe memory before freeing it? */
#endif
#ifndef WSIZE
#define WSIZE 32768 /* Window size - 32768 for PKZIP compatibility */
#endif
#ifndef STRICT
#define STRICT 1 /* Require unused bits to be zero? */
#endif
/*
* Huffman code lookup table entry--this entry is four bytes for machines
* that have 16-bit pointers (e.g. PC's in the small or medium model).
* Valid extra bits are 0..13. e == 32 is EOB (end of block), e == 16
* means that v is a literal, e < 0 means that v is a pointer to the next
* table, which codes -e bits, and lastly e == -128 indicates an unused
* code. If a code with e == -128 is looked up, this implies an error in
* the data.
*/
struct huft {
union {
struct {
signed char exop; /* # of extra bits or operation */
char bits; /* # of bits in this code/subcode */
} what;
char *pad; /* pad structure to a power of 2 (4 bytes for*/
} word; /* 16-bit, 8 bytes for 32-bit machines) */
union {
ush base; /* literal, length base, or distance base */
huft *next; /* pointer to next level of table */
} more;
};
#define base more.base
#define next more.next
#define exop word.what.exop
#define bits word.what.bits
/* Function prototypes */
#ifndef OF
# if defined(__STDC__) || defined(__cplusplus)
# define OF(a) a
# else
# define OF(a) ()
# endif
#endif
static int huft_build OF(( PGPContextRef context,
unsigned char const *, unsigned, unsigned,
ush const *, ush const *, huft **,
unsigned *));
static int huft_free OF(( PGPContextRef context, huft *));
/*
* The inflate algorithm uses a sliding 32K byte window on the uncompressed
* stream to find repeated byte strings. This is implemented here as a
* circular buffer. The index is updated simply by incrementing and then
* and'ing with 0x7fff (32K-1).
*/
/* Tables for deflate from PKZIP's appnote.txt. */
static unsigned const border[] = { /* Order of the bit length code lengths */
16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};
static ush const cplens[] = { /* Copy lengths for literal codes 257..285 */
1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 17, 21, 25, 29,
33, 41, 49, 57, 65, 81, 97, 113, 129, 161, 193, 225, 256, 0, 0};
/* actually lengths - 2; also see note #13 above about 258 */
static ush const cplext[] = { /* Extra bits for literal codes 257..285 */
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 128, 128}; /* 128==invalid */
static ush const cpdist[] = { /* Copy offsets for distance codes 0..29 */
1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
8193, 12289, 16385, 24577};
static ush const cpdext[] = { /* Extra bits for distance codes */
0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
7, 7, 8, 8, 9, 9, 10, 10, 11, 11,
12, 12, 13, 13};
/* And'ing with mask[n] masks the lower n bits */
static unsigned const mask[17] = {
0x0000,
0x0001, 0x0003, 0x0007, 0x000f, 0x001f, 0x003f, 0x007f, 0x00ff,
0x01ff, 0x03ff, 0x07ff, 0x0fff, 0x1fff, 0x3fff, 0x7fff, 0xffff
};
/* Macros for inflate() bit peeking and grabbing.
* The usage is:
*
* NEEDBITS(j)
* x = b & mask[j];
* DUMPBITS(j)
*
* where NEEDBITS makes sure that b has at least j bits in it, and
* DUMPBITS removes the bits from b. The macros use the variable k
* for the number of bits in b. Normally, b and k are register
* variables for speed, and are initialized at the beginning of a
* routine that uses these macros from a global bit buffer and count.
*
* It is allowed for more bits to be requested than actually used.
* The remainder stay in the bit buffer. NOTE that a few more
* bytes than necessary may be grabbed at the end of input.
* This should not be fatal as long as the grabbed bits stay
* in the bit buffer. The wrapup functions should check for this.
* There is also the macro GRABBITS which is like NEEDBITS, but uses
* *g++ to get the byte without checking availability. This requires
* using AVAILBYTES first to assure that the needed bytes are there
* already. g is set to inptr and then inptr is restored to g around
* this usage.
*/
/*
* Variable usage:
* b = bit buffer, k least-significant bits valid
* k = # of bits in bit buffer
* g = grab bits pointer (pointer to next byte to add: b |= *g++ << k)
* s = # of bytes available
* ctx = InflateContext
*/
/* Functions to save the state of the bit buffer before returning */
#define LOADBITBUF \
(b=ctx->bitbuffer, k=ctx->bufbits, g=ctx->inptr, s=ctx->inlen)
#define SAVEBITBUF \
(ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=s)
#define SAVEBITBUFEMPTY \
(ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=0)
/*
* A simple macro to ensure that at least "x" bits are in the bit buffer.
* Does NOT check for input underflow (s < 0).
*/
#define GRABBITS(x) \
while (k < x) { \
--s; \
b |= (ulg)*g++ << k; \
k += 8; \
}
/*
* Get "x" bits in the input buffer or execute the "whatif" code.
* NOTE NOTE NOTE that if the "whatif" code is executed, s is set to -1!
* (This is faster on a lot of machines.) You will usually have to
* reset it to 0 manually.
*/
#define GETBITSOR(x, whatif) \
while (k < (unsigned)(x)) { \
if (--s < 0) { \
whatif; \
} \
b |= (ulg)*g++ << k; \
k += 8; \
}
/*
* A macro to ensure that there are at least "x" bits available in the
* bit buffer. If there are not, the bit buffer is saved back into the
* context, and any code in "savestate" is run to save any additional
* context that might be needed. Then, "return 1" means that more input is
* needed.
*/
#define NEEDBITS(x, savestate) \
GETBITSOR(x, savestate; SAVEBITBUFEMPTY; return 1)
/*
* NEEDBITS2 uses two figures for the number of bits required. The first,
* x, is cheap to compute and is used unless we run out of input data.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -