⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 pgpzinflate.c

📁 vc环境下的pgp源码
💻 C
📖 第 1 页 / 共 4 页
字号:
/*
 * inflate.c -- Not copyrighted 1992,1993 by Mark Adler
 * Latest hacking on this version by Colin Plumb
 * version p3, 19 Oct 1995
 *
 * $Id: pgpZInflate.c,v 1.23 1999/03/22 16:47:03 hal Exp $
 */


/*
 * You can do whatever you like with this source file, though I would
 * prefer that if you modify it and redistribute it that you include
 * comments to that effect with your name and the date.  Thank you.
 *
 * History:
 * vers    date          who           what
 * ----  ---------  --------------  ------------------------------------
 *  a    ~~ Feb 92  M. Adler        used full (large, one-step) lookup table
 *  b1   21 Mar 92  M. Adler        first version with partial lookup tables
 *  b2   21 Mar 92  M. Adler        fixed bug in fixed-code blocks
 *  b3   22 Mar 92  M. Adler        sped up match copies, cleaned up some
 *  b4   25 Mar 92  M. Adler        added prototypes; removed window[] (now
 *                                  is the responsibility of unzip.h--also
 *                                  changed name to slide[]), so needs diffs
 *                                  for unzip.c and unzip.h (this allows
 *                                  compiling in the small model on MSDOS);
 *                                  fixed cast of q in huft_build();
 *  b5   26 Mar 92  M. Adler        got rid of unintended macro recursion.
 *  b6   27 Mar 92  M. Adler        got rid of nextbyte() routine.  fixed
 *                                  bug in inflate_fixed().
 *  c1   30 Mar 92  M. Adler        removed lbits, dbits environment variables.
 *                                  changed BMAX to 16 for explode.  Removed
 *                                  OUTB usage, and replaced it with flush()--
 *                                  this was a 20% speed improvement!  Added
 *                                  an explode.c (to replace unimplode.c) that
 *                                  uses the huft routines here.  Removed
 *                                  register union.
 *  c2    4 Apr 92  M. Adler        fixed bug for file sizes a multiple of 32k.
 *  c3   10 Apr 92  M. Adler        reduced memory of code tables made by
 *                                  huft_build significantly (factor of two to
 *                                  three).
 *  c4   15 Apr 92  M. Adler        added NOMEMCPY do kill use of memcpy().
 *                                  worked around a Turbo C optimization bug.
 *  c5   21 Apr 92  M. Adler        added the WSIZE #define to allow reducing
 *                                  the 32K window size for specialized
 *                                  applications.
 *  c6   31 May 92  M. Adler        added some typecasts to eliminate warnings
 *  c7   27 Jun 92  G. Roelofs      added some more typecasts (444:  MSC bug).
 *  c8    5 Oct 92  J-l. Gailly     added ifdef'd code to deal with PKZIP bug.
 *  c9    9 Oct 92  M. Adler        removed a memory error message (~line 416).
 *  c10  17 Oct 92  G. Roelofs      changed ULONG/UWORD/byte to ulg/ush/uch,
 *                                  removed old inflate, renamed inflate_entry
 *                                  to inflate, added Mark's fix to a comment.
 *  c11   2 Jan 93  M. Adler        fixed bug in detection of incomplete
 *                                  tables, and removed assumption that EOB is
 *                                  the longest code (bad assumption).
 *  c12   3 Jan 93  M. Adler        make tables for fixed blocks only once.
 *  c13   5 Jan 93  M. Adler        allow all zero length codes (pkzip 2.04c
 *                                  outputs one zero length code for an empty
 *                                  distance tree).
 *  c14  12 Mar 93  M. Adler        made inflate.c standalone with the
 *                                  introduction of inflate.h.
 *  d0   25 Apr 93  M. Adler        several speedups in inflate_codes of
 *                                  almost 20% (suggested by Urban Mueller)--
 *                                  now requires the use of backfill.[ch]
 *  d1    4 May 93  M. Adler        deleted extraneous statement in cheap loop,
 *                                  optimized common copy a little more.
 *  d2    5 May 93  M. Adler        calculate number of cheap loops, a few
 *                                  other small optimizations.
 *  p1      Nov 93  C. Plumb        Adpated to be able to suspend itself.
 *                                  Pretty massive reorganization.
 *                                  Many comments added.
 *  p2   18 Oct 95  C. Plumb        Improved interface some more.  Now
 *                                  nicely re-entrant.  Still more comments.
 *  p3   19 Oct 95  C. Plumb        Changed infInflate core function so that
 *                                  it sucks the input completely dry before
 *                                  returning.  This gets rid of the old ad-hoc
 *                                  technique for flushing out the last little
 *                                  bit of a compressed file by padding it
 *                                  with some dummy data, which was ugly.
 */


/*
 * Inflate deflated (PKZIP's method 8 compressed) data.  The compression
 * method searches for as much of the current string of bytes (up to a
 * length of 258) in the previous 32K bytes.  If it doesn't find any
 * matches (of at least length 3), it codes the next byte.  Otherwise, it
 * codes the length of the matched string and its distance backwards from
 * the current position.  There is a single Huffman code that codes both
 * single bytes (called "literals") and match lengths.  A second Huffman
 * code codes the distance information, which follows a length code.  Each
 * length or distance code actually represents a base value and a number
 * of "extra" (sometimes zero) bits to get to add to the base value.  At
 * the end of each deflated block is a special end-of-block (EOB) literal/
 * length code.  The decoding process is basically: get a literal/length
 * code; if EOB then done; if a literal, emit the decoded byte; if a
 * length then get the distance and emit the referred-to bytes from the
 * sliding window of previously emitted data.
 *
 * There are (currently) three kinds of inflate blocks: stored, fixed, and
 * dynamic.  The compressor outputs a chunk of data at a time and decides
 * which method to use on a chunk-by-chunk basis.  A chunk might typically
 * be 32K to 64K, uncompressed.  If the chunk is uncompressible, then the
 * "stored" method is used.  In this case, the bytes are simply stored as
 * is, eight bits per byte, with none of the above coding.  The bytes are
 * preceded by a (16-bit) count, since there is no longer an EOB code.
 *
 * If the data is compressible, then either the fixed or dynamic methods
 * are used.  In the dynamic method, the compressed data is preceded by
 * an encoding of the literal/length and distance Huffman codes that are
 * to be used to decode this block.  The representation is itself Huffman
 * coded, and so is preceded by a description of that code.  These code
 * descriptions take up a little space, and so for small blocks, there is
 * a predefined set of codes, called the fixed codes.  The fixed method is
 * used if the block ends up smaller that way (usually for quite small
 * chunks); otherwise the dynamic method is used.  In the latter case, the
 * codes are customized to the probabilities in the current block and so
 * can code it much better than the pre-determined fixed codes can.
 *
 * The Huffman codes themselves are decoded using a mutli-level table
 * lookup, in order to maximize the speed of decoding plus the speed of
 * building the decoding tables.  See the comments below that precede the
 * LBITS and DBITS tuning parameters.
 */


/*
 *  Notes beyond the 1.93a appnote.txt:
 *
 *  1. Distance pointers never point before the beginning of the output
 *     stream.
 *  2. Distance pointers can point back across blocks, up to 32k away.
 *  3. There is an implied maximum of 7 bits for the bit length table and
 *     15 bits for the actual data.
 *  4. If only one code exists, then it is encoded using one bit.  (Zero
 *     would be more efficient, but perhaps a little confusing.)  If two
 *     codes exist, they are coded using one bit each (0 and 1).
 *  5. There is no way of sending zero distance codes--a dummy must be
 *     sent if there are none.  (History: a pre 2.0 version of PKZIP would
 *     store blocks with no distance codes, but this was discovered to be
 *     too harsh a criterion.)  Valid only for 1.93a.  2.04c does allow
 *     zero distance codes, which is sent as one code of zero bits in
 *     length.
 *  6. There are up to 286 literal/length codes.  Code 256 represents the
 *     end-of-block.  Note however that the static length tree defines
 *     288 codes just to fill out the Huffman codes.  Codes 286 and 287
 *     cannot be used though, since there is no length base or extra bits
 *     defined for them.  Similarily, there are up to 30 distance codes.
 *     However, static trees define 32 codes (all 5 bits) to fill out the
 *     Huffman codes, but the last two had better not show up in the data.
 *  7. Unzip can check dynamic Huffman blocks for complete code sets.
 *     The exception is that a single code would not be complete (see #4).
 *  8. The five bits following the block type is really the number of
 *     literal codes sent minus 257.
 *  9. Length codes 8,16,16 are interpreted as 13 length codes of 8 bits
 *     (1+6+6).  Therefore, to output three times the length, you output
 *     three codes (1+1+1), whereas to output four times the same length,
 *     you only need two codes (1+3).  Hmm.
 * 10. In the tree reconstruction algorithm, Code = Code + Increment
 *     only if BitLength(i) is not zero.  (Pretty obvious.)
 * 11. Correction: 4 Bits: # of Bit Length codes - 4     (4 - 19)
 * 12. Note: length code 284 can represent 227-258, but length code 285
 *     really is 258.  The last length deserves its own, short code
 *     since it gets used a lot in very redundant files.  The length
 *     258 is special since 258 - 3 (the min match length) is 255.
 * 13. The literal/length and distance code bit lengths are read as a
 *     single stream of lengths.  It is possible (and advantageous) for
 *     a repeat code (16, 17, or 18) to go across the boundary between
 *     the two sets of lengths.
 */

#include "pgpConfig.h"

#include <string.h>

#include "pgpDebug.h"
#include "pgpMem.h"
#include "pgpUsuals.h"
#include "pgpZInflate.h"

/* Increasing these only wastes space, so they can be changed if necessary */
typedef PGPUInt16 ush;
typedef PGPUInt32 ulg;


typedef struct huft	huft;
/* Private context structure for use by all functions. */
struct InflateContext {
	PGPContextRef	context;
	/* Major state information */
	int state;
	int substate;
	int lastblock;

	/* static trees*/
	huft *		fixed_tl;
	huft *		fixed_td;
	PGPUInt32	fixed_bl;
	PGPUInt32	fixed_bd;

	/* Output-related information */
	unsigned char *slide;	/* Circular output buffer */
	unsigned char *slideend;
	unsigned slidelen;      /* slideend-slide - MUST BE POWER OF 2 */
	unsigned slidemask;	/* slidelen-1 */
	unsigned char *outptr;	/* Pointer into slide */
	unsigned char const *readptr;	/* Pointer into slide for reading */

	/* Input-related information */
	unsigned char const *inptr;	/* Input pointer */
	int inlen;		/* Input length */
	ulg bitbuffer;
	unsigned bufbits;

	/* The literal/length tree - also used for the bit-length tree */
	huft *tree1;
	unsigned bits1;

	/* The distance tree */
	huft *tree2;
	unsigned bits2;

	/* Encoded huffman tree */
	unsigned char bitlen[286+30];

	/* For dynamic trees */
	unsigned bitlengths;
	unsigned litcodes;
	unsigned distcodes;

	/* Used in various places */
	huft const *t;
	unsigned copylen;
	int distbase;
	int numbits;	/* # of bits in dyn. tree repeat and after dist. code*/
	unsigned index;	/* Position in dynamic tree bit lengths we've read to*/
	DEBUG_STRUCT_CONSTRUCTOR( InflateContext )
};

/* Options for behaviour */
#ifndef SECURE
#define SECURE 1	/* Wipe memory before freeing it? */
#endif

#ifndef WSIZE
#define WSIZE 32768	/* Window size - 32768 for PKZIP compatibility */
#endif

#ifndef STRICT
#define STRICT 1	/* Require unused bits to be zero? */
#endif

/*
 * Huffman code lookup table entry--this entry is four bytes for machines
 * that have 16-bit pointers (e.g. PC's in the small or medium model).
 * Valid extra bits are 0..13.  e == 32 is EOB (end of block), e == 16
 * means that v is a literal, e < 0 means that v is a pointer to the next
 * table, which codes -e bits, and lastly e == -128 indicates an unused
 * code.  If a code with e == -128 is looked up, this implies an error in
 * the data.
 */
struct huft {
	union {
		struct {
			signed char exop; /* # of extra bits or operation */
			char bits;	/* # of bits in this code/subcode */
		} what;
		char *pad;	/* pad structure to a power of 2 (4 bytes for*/
	} word;			/*  16-bit, 8 bytes for 32-bit machines) */
	union {
		ush base;	/* literal, length base, or distance base */
		huft *next; /* pointer to next level of table */
	} more;
};

#define base more.base
#define next more.next
#define exop word.what.exop
#define bits word.what.bits


/* Function prototypes */
#ifndef OF
#  if defined(__STDC__) || defined(__cplusplus)
#    define OF(a) a
#  else
#    define OF(a) ()
#  endif
#endif

static int huft_build OF(( PGPContextRef context,
				unsigned char const *, unsigned, unsigned,
                 ush const *, ush const *, huft **,
                  unsigned *));
static int huft_free OF(( PGPContextRef context, huft *));

/*
 * The inflate algorithm uses a sliding 32K byte window on the uncompressed
 * stream to find repeated byte strings.  This is implemented here as a
 * circular buffer.  The index is updated simply by incrementing and then
 * and'ing with 0x7fff (32K-1).
 */

/* Tables for deflate from PKZIP's appnote.txt. */
static unsigned const border[] = { /* Order of the bit length code lengths */
	16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};

static ush const cplens[] = { /* Copy lengths for literal codes 257..285 */
	1, 2, 3, 4, 5, 6, 7,  8,  9, 11, 13, 15, 17, 21, 25, 29,
	33, 41, 49, 57, 65, 81, 97, 113, 129, 161, 193, 225, 256, 0, 0};
	/* actually lengths - 2; also see note #13 above about 258 */

static ush const cplext[] = { /* Extra bits for literal codes 257..285 */
	0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
	3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0, 128, 128}; /* 128==invalid */

static ush const cpdist[] = { /* Copy offsets for distance codes 0..29 */
	1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
	257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
	8193, 12289, 16385, 24577};

static ush const cpdext[] = { /* Extra bits for distance codes */
	0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
	7, 7, 8, 8, 9, 9, 10, 10, 11, 11,
	12, 12, 13, 13};

/* And'ing with mask[n] masks the lower n bits */
static unsigned const mask[17] = {
    0x0000,
    0x0001, 0x0003, 0x0007, 0x000f, 0x001f, 0x003f, 0x007f, 0x00ff,
    0x01ff, 0x03ff, 0x07ff, 0x0fff, 0x1fff, 0x3fff, 0x7fff, 0xffff
};


/* Macros for inflate() bit peeking and grabbing.
 * The usage is:
 *
 *	NEEDBITS(j)
 *	x = b & mask[j];
 *	DUMPBITS(j)
 *
 * where NEEDBITS makes sure that b has at least j bits in it, and
 * DUMPBITS removes the bits from b.  The macros use the variable k
 * for the number of bits in b.  Normally, b and k are register
 * variables for speed, and are initialized at the beginning of a
 * routine that uses these macros from a global bit buffer and count.
 *
 * It is allowed for more bits to be requested than actually used.
 * The remainder stay in the bit buffer.  NOTE that a few more
 * bytes than necessary may be grabbed at the end of input.
 * This should not be fatal as long as the grabbed bits stay
 * in the bit buffer.  The wrapup functions should check for this.

 * There is also the macro GRABBITS which is like NEEDBITS, but uses
 * *g++ to get the byte without checking availability.  This requires
 * using AVAILBYTES first to assure that the needed bytes are there
 * already.  g is set to inptr and then inptr is restored to g around
 * this usage.
 */

/*
 * Variable usage:
 * b = bit buffer, k least-significant bits valid
 * k = # of bits in bit buffer
 * g = grab bits pointer (pointer to next byte to add: b |= *g++ << k)
 * s = # of bytes available
 * ctx = InflateContext
 */

/* Functions to save the state of the bit buffer before returning */
#define LOADBITBUF \
	(b=ctx->bitbuffer, k=ctx->bufbits, g=ctx->inptr, s=ctx->inlen)
#define SAVEBITBUF \
	(ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=s)
#define SAVEBITBUFEMPTY \
	(ctx->bitbuffer=b, ctx->bufbits=k, ctx->inptr=g, ctx->inlen=0)

/*
 * A simple macro to ensure that at least "x" bits are in the bit buffer.
 * Does NOT check for input underflow (s < 0).
 */
#define GRABBITS(x) \
	while (k < x) { \
		--s; \
		b |= (ulg)*g++ << k; \
		k += 8; \
	}

/*
 * Get "x" bits in the input buffer or execute the "whatif" code.
 * NOTE NOTE NOTE that if the "whatif" code is executed, s is set to -1!
 * (This is faster on a lot of machines.)  You will usually have to
 * reset it to 0 manually.
 */
#define GETBITSOR(x, whatif) \
	while (k < (unsigned)(x)) { \
		if (--s < 0) { \
			whatif; \
		} \
		b |= (ulg)*g++ << k; \
		k += 8; \
	}

/*
 * A macro to ensure that there are at least "x" bits available in the
 * bit buffer.  If there are not, the bit buffer is saved back into the
 * context, and any code in "savestate" is run to save any additional
 * context that might be needed.  Then, "return 1" means that more input is
 * needed.
 */
#define NEEDBITS(x, savestate) \
	GETBITSOR(x, savestate; SAVEBITBUFEMPTY; return 1)

/*
 * NEEDBITS2 uses two figures for the number of bits required.  The first,
 * x, is cheap to compute and is used unless we run out of input data.

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -