📄 pgpzinflate.c

📁 著名的加密软件的应用于电子邮件中
💻 C
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页
/*
* inflate.c -- Not copyrighted 1992,1993 by Mark Adler
* Latest hacking on this version by Colin Plumb
* version p3, 19 Oct 1995
*
* $Id: pgpZInflate.c,v 1.2 1997/01/24 23:35:54 mhw Exp $
*/


/*
* You can do whatever you like with this source file, though I would
* prefer that if you modify it and redistribute it that you include
* comments to that effect with your name and the date. Thank you.
	*
	* History:
	* vers	date	who	what
	* ----  ---------  --------------  ------------------------------------
	* a ~~ Feb 92 M. Adler	used full (large, one-step) lookup table
	* b1 21 Mar 92 M. Adler	first version with partial lookup tables
	* b2 21 Mar 92 M. Adler	fixed bug in fixed-code blocks
	* b3 22 Mar 92 M. Adler	sped up match copies, cleaned up some
	* b4 25 Mar 92 M. Adler	added prototypes; removed window[] (now
	*	is the responsibility of unzip.h--also
	*	changed name to slide[]), so needs diffs
	*	for unzip.c and unzip.h (this allows
	*	compiling in the small model on MSDOS);
	*	fixed cast of q in huft_build();
	* b5 26 Mar 92 M. Adler	got rid of unintended macro recursion.
	* b6 27 Mar 92 M. Adler	got rid of nextbyte() routine. fixed
	*	bug in inflate_fixed().
	* c1 30 Mar 92 M. Adler	removed lbits, dbits environment variables.
	*	changed BMAX to 16 for explode. Removed
	*	OUTB usage, and replaced it with flush()--
	*	this was a 20% speed improvement! Added
	*	an explode.c (to replace unimplode.c) that
	*	uses the huft routines here. Removed
	*	register union.
	* c2	4 Apr 92 M. Adler fixed bug for file sizes a multiple of 32k.
	* c3 10 Apr 92 M. Adler	reduced memory of code tables made by
	*	huft_build significantly (factor of two to
	*	three).
	* c4 15 Apr 92 M. Adler	added NOMEMCPY do kill use of memcpy().
	*	worked around a Turbo C optimization bug.
	* c5 21 Apr 92 M. Adler	added the WSIZE #define to allow reducing
	*	the 32K window size for specialized
	*	applications.
	* c6 31 May 92 M. Adler	added some typecasts to eliminate warnings
	* c7 27 Jun 92 G. Roelofs	added some more typecasts (444: MSC bug).
	* c8 5 Oct 92 J-l. Gailly	added ifdef'd code to deal with PKZIP bug.
	* c9 9 Oct 92 M. Adler	removed a memory error message (~line 416).
	* c10 17 Oct 92 G. Roelofs	changed ULONG/UWORD/byte to ulg/ush/uch,
	*	removed old inflate, renamed inflate_entry
	*	to inflate, added Mark's fix to a comment.
	* c11 2 Jan 93 M. Adler	fixed bug in detection of incomplete
	*	tables, and removed assumption that EOB is
	*	the longest code (bad assumption).
	* c12 3 Jan 93 M. Adler	make tables for fixed blocks only once.
	* c13 5 Jan 93 M. Adler	allow all zero length codes (pkzip 2.04c
	*	outputs one zero length code for an empty
	*	distance tree).
	* c14 12 Mar 93 M. Adler	made inflate.c standalone with the
	*	introduction of inflate.h.
	* d0 25 Apr 93 M. Adler	several speedups in inflate_codes of
	*	almost 20% (suggested by Urban Mueller)--
	*	now requires the use of backfill.[ch]
	* d1 4 May 93 M. Adler deleted extraneous statement in cheap loop,
	* optimized common copy a little more.
	* d2 5 May 93 M. Adler calculate number of cheap loops, a few
	* other small optimizations.
	* p1 Nov 93 C. Plumb Adpated to be able to suspend itself.
	* Pretty massive reorganization.
	* Many comments added.
	* p2 18 Oct 95 C. Plumb	Improved interface some more. Now
	*	nicely re-entrant. Still more comments.
	* p3 19 Oct 95 C. Plumb	Changed infInflate core function so that
	*	it sucks the input completely dry before
	*	returning. This gets rid of the old ad-hoc
	*	technique for flushing out the last little
	*	bit of a compressed file by padding it
	*	with some dummy data, which was ugly.
*/


/*
* Inflate deflated (PKZIP's method 8 compressed) data. The compression
* method searches for as much of the current string of bytes (up to a
* length of 258) in the previous 32K bytes. If it doesn't find any
* matches (of at least length 3), it codes the next byte. Otherwise, it
* codes the length of the matched string and its distance backwards from
* the current position. There is a single Huffman code that codes both
* single bytes (called "literals") and match lengths. A second Huffman
* code codes the distance information, which follows a length code. Each
* length or distance code actually represents a base value and a number
* of "extra" (sometimes zero) bits to get to add to the base value. At
* the end of each deflated block is a special end-of-block (EOB) literal/
* length code. The decoding process is basically: get a literal/length
* code; if EOB then done; if a literal, emit the decoded byte; if a
* length then get the distance and emit the referred-to bytes from the
* sliding window of previously emitted data.
*
* There are (currently) three kinds of inflate blocks: stored, fixed, and
* dynamic. The compressor outputs a chunk of data at a time and decides
* which method to use on a chunk-by-chunk basis. A chunk might typically
* be 32K to 64K, uncompressed. If the chunk is uncompressible, then the
* "stored" method is used. In this case, the bytes are simply stored as
* is, eight bits per byte, with none of the above coding. The bytes are
* preceded by a (16-bit) count, since there is no longer an EOB code.
*
* If the data is compressible, then either the fixed or dynamic methods
* are used. In the dynamic method, the compressed data is preceded by
* an encoding of the literal/length and distance Huffman codes that are
* to be used to decode this block. The representation is itself Huffman
* coded, and so is preceded by a description of that code. These code
* descriptions take up a little space, and so for small blocks, there is
* a predefined set of codes, called the fixed codes. The fixed method is
* used if the block ends up smaller that way (usually for quite small
* chunks); otherwise the dynamic method is used. In the latter case, the
* codes are customized to the probabilities in the current block and so
* can code it much better than the pre-determined fixed codes can.
*
* The Huffman codes themselves are decoded using a mutli-level table
* lookup, in order to maximize the speed of decoding plus the speed of
* building the decoding tables. See the comments below that precede the
* LBITS and DBITS tuning parameters.
*/


/*
*  Notes beyond the 1.93a  appnote.txt:
*
* 1. Distance pointers never point before the beginning of the output
* stream.
* 2. Distance pointers can point back across blocks, up to 32k away.
* 3. There is an implied maximum of 7 bits for the bit length table and
* 15 bits for the actual data.
* 4. If only one code exists, then it is encoded using one bit. (Zero
* would be more efficient, but perhaps a little confusing.) If two
* codes exist, they are coded using one bit each (0 and 1).
* 5. There is no way of sending zero distance codes--a dummy must be
* sent if there are none. (History: a pre 2.0 version of PKZIP would
* store blocks with no distance codes, but this was discovered to be
* too harsh a criterion.) Valid only for 1.93a. 2.04c does allow
* zero distance codes, which is sent as one code of zero bits in
* length.
* 6. There are up to 286 literal/length codes. Code 256 represents the
* end-of-block. Note however that the static length tree defines
* 288 codes just to fill out the Huffman codes. Codes 286 and 287
* cannot be used though, since there is no length base or extra bits
*	defined for them. Similarily, there are up to 30 distance codes.
*	However, static trees define 32 codes (all 5 bits) to fill out the
*	Huffman codes, but the last two had better not show up in the data.
* 7. Unzip can check dynamic Huffman blocks for complete code sets.
* The exception is that a single code would not be complete (see #4).
* 8. The five bits following the block type is really the number of
* literal codes sent minus 257.
* 9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits
* (1+6+6). Therefore, to output three times the length, you output
* three codes (1+1+1), whereas to output four times the same length,
* you only need two codes (1+3). Hmm.
* 10. In the tree reconstruction algorithm, Code = Code + Increment
* only if BitLength(i) is not zero. (Pretty obvious.)
* 11. Correction: 4 Bits: # of Bit Length codes - 4 (4 - 19)
* 12. Note: length code 284 can represent 227-258, but length code 285
* really is 258. The last length deserves its own, short code
* since it gets used a lot in very redundant files. The length
* 258 is special since 258 - 3 (the min match length) is 255.
* 13. The literal/length and distance code bit lengths are read as a
* single stream of lengths. It is possible (and advantageous) for
* a repeat code (16, 17, or 18) to go across the boundary between
* the two sets of lengths.
*/

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <string.h> /* for memset() */

#include "pgpDebug.h"
#include "pgpMem.h"
#include "pgpUsuals.h"
#include "pgpZInflate.h"

/* Increasing these only wastes space, so they can be changed if necessary */
typedef word16 ush;
typedef word32 ulg;

/* Private context structure for use by all functions. */
struct InflateContext {
		/* Major state information */
		int state;
		int substate;
		int lastblock;

		/* Output-related information */
		unsigned char *slide; /* Circular output buffer */
		unsigned char *slideend;
		unsigned slidelen; /* slideend-slide - MUST BE POWER OF 2 */
		unsigned slidemask; /* slidelen-1 */
		unsigned char *outptr; /* Pointer into slide */
		unsigned char const *readptr; /* Pointer into slide for reading */

		/* Input-related information */
		unsigned char const *inptr; /* Input pointer */
		int inlen;  /* Input length */
		ulg bitbuffer;
		unsigned bufbits;

		/* The literal/length tree - also used for the bit-length tree */
		struct huft *tree1;
		unsigned bits1;

		/* The distance tree */
		struct huft *tree2;
		unsigned bits2;

		/* Encoded huffman tree */
		unsigned char bitlen[286+30];

		/* For dynamic trees */
		unsigned bitlengths;
		unsigned litcodes;
		unsigned distcodes;

		/* Used in various places */
		struct huft const *t;
		unsigned copylen;
		int distbase;
		int numbits; /* # of bits in dyn. tree repeat and after dist. code*/
		unsigned index; /* Position in dynamic tree bit lengths we've read to*/
	};

/* Options for behaviour */
	#ifndef SECURE
	#define SECURE 1	/* Wipe memory before freeing it? */
	#endif

	#ifndef WSIZE
	#define WSIZE 32768	/* Window size - 32768 for PKZIP compatibility */
	#endif

	#ifndef STRICT
	#define STRICT 1	/* Require unused bits to be zero? */
	#endif

	/*
* Huffman code lookup table entry--this entry is four bytes for machines
* that have 16-bit pointers (e.g. PC's in the small or medium model).
* Valid extra bits are 0..13. e == 32 is EOB (end of block), e == 16
* means that v is a literal, e < 0 means that v is a pointer to the next
* table, which codes -e bits, and lastly e == -128 indicates an unused
* code. If a code with e == -128 is looked up, this implies an error in
* the data.
*/
struct huft {
		union {
			struct {
			 signed char exop; /* # of extra bits or operation */
			 char bits; /* # of bits in this code/subcode */
			} what;
			char *pad;	/* pad structure to a power of 2 (4 bytes for*/
		} word; 		/* 16-bit, 8 bytes for 32-bit machines) */
		union {
			ush base;	/* literal, length base, or distance base */
			struct huft *next; /* pointer to next level of table */
		} more;
	};

#define base more.base
#define next more.next
#define exop word.what.exop
#define bits word.what.bits


/* Function prototypes */
#ifndef OF
# if defined(__STDC__) || defined(__cplusplus)
# define OF(a) a
# else
# define OF(a) ()
# endif
#endif

static int huft_build OF((unsigned char const *, unsigned, unsigned,
ush const *, ush const *, struct huft **,
unsigned *));
static int huft_free OF((struct huft *));

/*
* The inflate algorithm uses a sliding 32K byte window on the uncompressed
* stream to find repeated byte strings. This is implemented here as a
* circular buffer. The index is updated simply by incrementing and then
* and'ing with 0x7fff (32K-1).
*/

/* Tables for deflate from PKZIP's appnote.txt. */
static unsigned const border[] = { /* Order of the bit length code lengths */
 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};

static ush const cplens[] = { /* Copy lengths for literal codes 257..285 */
 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 15, 17, 21, 25, 29,
 33, 41, 49, 57, 65, 81, 97, 113, 129, 161, 193, 225, 256, 0, 0};
 /* actually lengths - 2; also see note #13 above about 258 */

static ush const cplext[] = { /* Extra bits for literal codes 257..285 */
 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 128, 128}; /* 128==invalid */

static ush const cpdist[] = { /* Copy offsets for distance codes 0..29 */
 1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
 8193, 12289, 16385, 24577};

static ush const cpdext[] = { /* Extra bits for distance codes */
 0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
 7, 7, 8, 8, 9, 9, 10, 10, 11, 11,
 12, 12, 13, 13};

/* And'ing with mask[n] masks the lower n bits */
static unsigned const mask[17] = {
0x0000,
0x0001, 0x0003, 0x0007, 0x000f, 0x001f, 0x003f, 0x007f, 0x00ff,
0x01ff, 0x03ff, 0x07ff, 0x0fff, 0x1fff, 0x3fff, 0x7fff, 0xffff
};


/* Macros for inflate() bit peeking and grabbing.
* The usage is:
*
*	NEEDBITS(j)
*	x = b & mask[j];
*	DUMPBITS(j)
*
* where NEEDBITS makes sure that b has at least j bits in it, and
* DUMPBITS removes the bits from b. The macros use the variable k
* for the number of bits in b. Normally, b and k are register
* variables for speed, and are initialized at the beginning of a
* routine that uses these macros from a global bit buffer and count.
*
* It is allowed for more bits to be requested than actually used.
* The remainder stay in the bit buffer. NOTE that a few more
* bytes than necessary may be grabbed at the end of input.
* This should not be fatal as long as the grabbed bits stay
* in the bit buffer. The wrapup functions should check for this.

* There is also the macro GRABBITS which is like NEEDBITS, but uses
* *g++ to get the byte without checking availability. This requires
* using AVAILBYTES first to assure that the needed bytes are there
* already. g is set to inptr and then inptr is restored to g around
* this usage.
*/

/*
* Variable usage:
* b = bit buffer, k least-significant bits valid
* k = # of bits in bit buffer
* g = grab bits pointer (pointer to next byte to add: b |= *g++ << k)
* s = # of bytes available
* ctx = InflateContext
*/

/* Functions to save the state of the bit buffer before returning */
#define LOADBITBUF \
 (b=ctx->bitbuffer, k=ctx->bufbits, g=ctx->inptr, s=ctx->inlen)
#define SAVEBITBUF \
 (ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=s)
#define SAVEBITBUFEMPTY \
 (ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=0)

/*
* A simple macro to ensure that at least "x" bits are in the bit buffer.
* Does NOT check for input underflow (s < 0).
*/
#define GRABBITS(x) \
		while (k < x) { \
			--s; \
			b |= (ulg)*g++ << k; \
			k += 8; \
		}

/*
* Get "x" bits in the input buffer or execute the "whatif" code.
* NOTE NOTE NOTE that if the "whatif" code is executed, s is set to -1!
* (This is faster on a lot of machines.) You will usually have to
* reset it to 0 manually.
*/
#define GETBITSOR(x, whatif) \
		while (k < (unsigned)(x)) { \
			if (--s < 0) { \
			 whatif; \
			} \
			b |= (ulg)*g++ << k; \
			k += 8; \
		}

/*
* A macro to ensure that there are at least "x" bits available in the
* bit buffer. If there are not, the bit buffer is saved back into the
* context, and any code in "savestate" is run to save any additional
* context that might be needed. Then, "return 1" means that more input is
* needed.
*/
#define NEEDBITS(x, savestate) \
 GETBITSOR(x, savestate; SAVEBITBUFEMPTY; return 1)

/*
* NEEDBITS2 uses two figures for the number of bits required. The first,
* x, is cheap to compute and is used unless we run out of input data.
* If that happens, the more expensive value, x2, is used.
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -