bits 1-2: block type
 00 (0) - block is stored - all stored data is byte aligned.
          skip bits until the next byte boundary; the next word
          (16 bits, low byte first) is the block length, followed
          by the one's complement of the block length word.
          the remaining data in the block is the stored data.</pre>
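as a minimal python sketch of the stored-block layout: it assumes the 3 header bits have already been read and the input skipped to a byte boundary, and it treats the two length words as 16-bit little-endian values, as rfc 1951 specifies. the function name read_stored_block is hypothetical.

```python
import struct


def read_stored_block(buf: bytes, pos: int) -> tuple[bytes, int]:
    """Return (data, next_pos) for a stored block whose length word starts at buf[pos].

    Assumes the 3 block-header bits were already consumed and the reader
    has skipped to the next byte boundary.
    """
    # Two little-endian 16-bit words: the length, then its one's complement.
    length, complement = struct.unpack_from("<HH", buf, pos)
    if length != (~complement & 0xFFFF):
        raise ValueError("stored block length does not match its complement")
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("stored block runs past the end of the input")
    return buf[start:end], end
```

the complement check is a cheap sanity test: a corrupted or misaligned stream almost never produces a matching pair.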
<pre> 01 (1) - use fixed huffman codes for literal and distance codes.

          lit code    bits       dist code    bits
          ---------   ----       ---------    ----
            0 - 143     8          0 - 31       5
          144 - 255     9
          256 - 279     7
          280 - 287     8</pre>
<pre> literal codes 286-287 and distance codes 30-31 are never
used but participate in the huffman construction.</pre>
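the fixed code bit lengths can be listed directly. this python sketch (all function names are hypothetical) also shows why literal codes 286-287 and distance codes 30-31 take part in the construction: with them included, the kraft sum of 2^-length over all codes is exactly 1, so the lengths describe a complete prefix code.

```python
from fractions import Fraction


def fixed_literal_lengths() -> list[int]:
    """Bit lengths of the 288 fixed literal/length codes 0-287."""
    return [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8  # 0-143, 144-255, 256-279, 280-287


def fixed_distance_lengths() -> list[int]:
    """Bit lengths of the 32 fixed distance codes 0-31 (all 5 bits)."""
    return [5] * 32


def kraft_sum(lengths: list[int]) -> Fraction:
    """Sum of 2**-length over every code with a nonzero length."""
    return sum((Fraction(1, 2 ** n) for n in lengths if n), Fraction(0))
```

kraft_sum(fixed_literal_lengths()) evaluates to 1; dropping the last two entries leaves it at 127/128.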
<pre> 10 (2) - dynamic huffman codes. (see expanding huffman codes)</pre>
<pre> 11 (3) - reserved - flag an "error in compressed data" if seen.</pre>
<pre>expanding huffman codes
-----------------------
if the data block is compressed with dynamic huffman codes, the huffman
codes are sent in the following compressed format:</pre>
<pre> 5 bits: # of literal/length codes sent - 257 (257 - 286)
         all other codes are never sent.
 5 bits: # of dist codes - 1 (1 - 32)
 4 bits: # of bit length codes - 4 (4 - 19)</pre>
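the three counts are packed as little-endian bit fields. the python sketch below uses the offsets given in rfc 1951 (257 for literal/length codes, 1 for distance codes, 4 for bit length codes) and a minimal least-significant-bit-first bit reader; the class and function names are hypothetical.

```python
class BitReader:
    """Reads bits least-significant-bit first, the order deflate packs fields in."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # absolute bit position in data

    def bits(self, n: int) -> int:
        """Read an n-bit field; the first bit read becomes bit 0 of the result."""
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


def read_dynamic_counts(reader: BitReader) -> tuple[int, int, int]:
    """Read the three counts at the start of a dynamic block header."""
    nlit = reader.bits(5) + 257   # literal/length codes: 257-286
    ndist = reader.bits(5) + 1    # distance codes: 1-32
    nclen = reader.bits(4) + 4    # bit length codes: 4-19
    return nlit, ndist, nclen
```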
<pre>the huffman codes are sent as bit lengths and the codes are built as
described in the implode algorithm. the bit lengths themselves are
compressed with huffman codes. there are 19 bit length codes:</pre>
<pre> 0 - 15: represent bit lengths of 0 - 15
     16: copy the previous bit length 3 - 6 times.
         the next 2 bits indicate repeat length (0 = 3, ... , 3 = 6)
            example: codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
                     expand to 12 bit lengths of 8 (1 + 6 + 5)
     17: repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
     18: repeat a bit length of 0 for 11 - 138 times (7 bits of length)</pre>
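the run-length rules for codes 16-18 can be sketched as follows, assuming the symbols and the values of their extra bits have already been decoded into (symbol, extra) pairs; expand_bit_lengths is a hypothetical name. the first test input mirrors the example above.

```python
def expand_bit_lengths(symbols: list[tuple[int, int]], count: int) -> list[int]:
    """Expand (symbol, extra_value) pairs of the bit length alphabet.

    0-15: a literal bit length (extra_value ignored)
    16:   repeat the previous length 3 + extra_value times (2 extra bits)
    17:   repeat length 0 for 3 + extra_value times (3 extra bits)
    18:   repeat length 0 for 11 + extra_value times (7 extra bits)
    """
    lengths: list[int] = []
    for sym, extra in symbols:
        if sym < 16:
            lengths.append(sym)
        elif sym == 16:
            if not lengths:
                raise ValueError("code 16 with no previous bit length")
            lengths.extend([lengths[-1]] * (3 + extra))
        elif sym == 17:
            lengths.extend([0] * (3 + extra))
        elif sym == 18:
            lengths.extend([0] * (11 + extra))
        else:
            raise ValueError(f"invalid bit length symbol {sym}")
    # The header announced how many lengths to expect; anything else is corrupt.
    if len(lengths) != count:
        raise ValueError("expanded bit lengths do not match the announced count")
    return lengths
```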
<pre>the lengths of the bit length codes are sent packed 3 bits per value
(0 - 7) in the following order:</pre>
<pre> 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15</pre>
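a sketch of undoing this ordering, assuming the 3-bit values have already been read in transmission order (place_clen_lengths is a hypothetical name). when fewer than 19 values are sent, the codes at the end of the order simply get length 0.

```python
# Transmission order of the 3-bit lengths for the 19 bit length codes.
CLEN_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


def place_clen_lengths(values: list[int]) -> list[int]:
    """Return the code lengths of bit length codes 0-18, indexed by code.

    values holds the 3-bit numbers in the order they were read; codes not
    covered (when fewer than 19 values are sent) get length 0, i.e. unused.
    """
    if len(values) > 19:
        raise ValueError("at most 19 bit length code lengths can be sent")
    lengths = [0] * 19
    for value, code in zip(values, CLEN_ORDER):
        lengths[code] = value
    return lengths
```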
<pre>the huffman codes should be built as described in the implode algorithm
except codes are assigned starting at the shortest bit length, i.e. the
shortest code should be all 0's rather than all 1's. also, codes with
a bit length of zero do not participate in the tree construction. the
codes are then used to decode the bit lengths for the literal and distance
tables.</pre>
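one common way to assign these codes is the counting procedure from rfc 1951 section 3.2.2, sketched below in python; canonical_codes is a hypothetical name, and codes are returned as bit strings only to make them easy to read.

```python
def canonical_codes(lengths: list[int]) -> dict[int, str]:
    """Assign huffman codes from bit lengths, shortest lengths first.

    Codes of the same length are consecutive in symbol order, the first code
    of the shortest length is all zeros, and zero-length symbols get no code.
    Returns {symbol: code as a '0'/'1' string}.
    """
    max_len = max(lengths, default=0)
    # How many symbols use each bit length (length 0 is never counted).
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Smallest code value for each bit length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes: dict[int, str] = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = format(next_code[n], f"0{n}b")
            next_code[n] += 1
    return codes
```

with lengths (3, 3, 3, 3, 3, 2, 4, 4) the sketch gives 010, 011, 100, 101, 110, 00, 1110, 1111, the same example rfc 1951 uses. note that rfc 1951 packs huffman codes into the stream starting with their most significant bit, while the other fields are packed starting with the least significant bit.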
<pre>the bit lengths for the literal/length table are sent first, with the
number of entries given by the 5 bits sent earlier. there are up to
286 literal/length codes; the first 256 represent the respective 8-bit
characters, code 256 represents the end-of-block code, and the remaining
29 codes represent copy lengths of 3 thru 258. there are up to 30
distance codes representing distances from 1 thru 32k as described
below.</pre>
<pre>                                        length codes
                                        ------------

      extra                    extra                    extra                    extra
code   bits  length      code   bits  lengths     code   bits  lengths     code   bits  length(s)
----   ----  ------      ----   ----  -------     ----   ----  -------     ----   ----  ---------
 257      0  3            265      1  11,12        273      3  35-42        281      5  131-162
 258      0  4            266      1  13,14        274      3  43-50        282      5  163-194
 259      0  5            267      1  15,16        275      3  51-58        283      5  195-226
 260      0  6            268      1  17,18        276      3  59-66        284      5  227-257
 261      0  7            269      2  19-22        277      4  67-82        285      0  258
 262      0  8            270      2  23-26        278      4  83-98
 263      0  9            271      2  27-30        279      4  99-114
 264      0  10           272      2  31-34        280      4  115-130</pre>
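a python sketch of turning a length code and the value of its extra bits into a copy length, with the base lengths and extra-bit counts copied from the table above (decode_length is a hypothetical name).

```python
# Base length and extra-bit count for length codes 257-285, taken from the table above.
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
                3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]


def decode_length(code: int, extra_value: int) -> int:
    """Map a length code (257-285) plus the value of its extra bits to a copy length."""
    if not 257 <= code <= 285:
        raise ValueError(f"not a length code: {code}")
    i = code - 257
    if not 0 <= extra_value < (1 << LENGTH_EXTRA[i]):
        raise ValueError("extra bits value out of range")
    length = LENGTH_BASE[i] + extra_value
    # The table stops code 284 at 257; a length of 258 has its own code, 285.
    if code == 284 and length > 257:
        raise ValueError("code 284 covers lengths 227-257 only")
    return length
```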
<pre>                                      distance codes
                                      --------------

      extra                 extra                   extra                     extra
code   bits  dist     code   bits  dist       code   bits  distance     code   bits  distance
----   ----  ----     ----   ----  ----       ----   ----  --------     ----   ----  --------
   0      0  1           8      3  17-24        16      7  257-384        24     11  4097-6144
   1      0  2           9      3  25-32        17      7  385-512        25     11  6145-8192
   2      0  3          10      4  33-48        18      8  513-768        26     12  8193-12288
   3      0  4          11      4  49-64        19      8  769-1024       27     12  12289-16384
   4      1  5,6        12      5  65-96        20      9  1025-1536      28     13  16385-24576
   5      1  7,8        13      5  97-128       21      9  1537-2048      29     13  24577-32768
   6      2  9-12       14      6  129-192      22     10  2049-3072
   7      2  13-16      15      6  193-256      23     10  3073-4096</pre>
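the distance table follows a regular pattern, so a sketch can compute it rather than copy it: codes 0-3 have no extra bits, each later pair of codes has one more, and each base is the previous base plus the number of distances the previous code covers. decode_distance and the table names are hypothetical.

```python
# Extra bits for distance codes 0-29: 0, 0, 0, 0, 1, 1, 2, 2, ..., 13, 13.
DIST_EXTRA = [max(0, code // 2 - 1) for code in range(30)]
# Each base is the previous base plus the 2**extra distances the previous code covers.
DIST_BASE = [1]
for extra in DIST_EXTRA[:-1]:
    DIST_BASE.append(DIST_BASE[-1] + (1 << extra))


def decode_distance(code: int, extra_value: int) -> int:
    """Map a distance code (0-29) plus the value of its extra bits to a distance."""
    if not 0 <= code <= 29:
        raise ValueError(f"not a distance code: {code}")
    if not 0 <= extra_value < (1 << DIST_EXTRA[code]):
        raise ValueError("extra bits value out of range")
    return DIST_BASE[code] + extra_value
```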
<pre>the compressed data stream begins immediately after the
compressed header data. the compressed data stream can be
interpreted as follows:</pre>
<pre>do
   read header from input stream.

   if stored block
      skip bits until byte aligned
      read count and 1's complement of count
      copy count bytes of data to the output stream
   otherwise
      loop until end of block code sent
         decode literal character from input stream
         if literal < 256
            copy character to the output stream
         otherwise
            if literal = end of block
               break from loop
            otherwise
               decode length from the literal code and its extra bits
               decode distance code and its extra bits from input stream

               move backwards distance bytes in the output stream, and
               copy length characters from this position to the output
               stream.
      end loop
while not last block</pre>
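the copy step deserves care because the length may exceed the distance: the source then overlaps the bytes being written, which is how runs are encoded. a minimal python sketch (copy_match is a hypothetical name) copies one byte at a time so that each newly written byte can itself be copied again.

```python
def copy_match(out: bytearray, length: int, distance: int) -> None:
    """Append `length` bytes taken from `distance` bytes back in `out`.

    Copying byte by byte lets the source overlap the destination when
    distance < length, e.g. distance 1 repeats the last byte.
    """
    if not 1 <= distance <= len(out):
        raise ValueError("distance reaches before the start of the output")
    start = len(out) - distance
    for i in range(length):
        out.append(out[start + i])
```

for example, with out = bytearray(b"ab"), copy_match(out, 5, 2) leaves out equal to b"abababa". a real decoder only needs to keep the last 32k of output, since no distance reaches further back.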
<pre>if data descriptor exists
skip bits until byte aligned
read crc and sizes
endif</pre>
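a sketch of the descriptor read, assuming the decoder tracks its position in bits, and that the descriptor is three little-endian 32-bit fields (crc-32, compressed size, uncompressed size) with no leading signature; both function names are hypothetical.

```python
import struct


def align_to_byte(bit_pos: int) -> int:
    """Round a bit position up to the next multiple of 8 ("skip bits until byte aligned")."""
    return (bit_pos + 7) & ~7


def read_data_descriptor(buf: bytes, bit_pos: int) -> tuple[int, int, int]:
    """Read crc-32, compressed size and uncompressed size after the deflate data.

    Assumes three little-endian 32-bit fields with no leading signature.
    """
    byte_pos = align_to_byte(bit_pos) >> 3
    crc, compressed_size, uncompressed_size = struct.unpack_from("<III", buf, byte_pos)
    return crc, compressed_size, uncompressed_size
```

later revisions of the zip format also allow an optional 4-byte signature (0x08074b50) in front of these fields, which this sketch does not handle.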
<pre>decryption
----------</pre>
<pre>the encryption used in pkzip was generously supplied by roger
schlafly. pkware is grateful to mr. schlafly for his expert
help and advice in the field of data encryption.</pre>
<pre>pkzip encrypts the compressed data stream. encrypted files must
be decrypted before they can be extracted.</pre>
<pre>each encrypted file has an extra 12 bytes stored at the start of
the data area defining the encryption header for that file. the
encryption header is originally set to random values, and then
itself encrypted using three 32-bit keys. the key values are
initialized using the supplied encryption password. after each byte
is encrypted, the keys are then updated using pseudo-random number
generation techniques in combination with the same crc-32 algorithm
used in pkzip and described elsewhere in this document.</pre>
<pre>the following are the basic steps required to decrypt a file:</pre>
<pre>1) initialize the three 32-bit keys with the password.
2) read and decrypt the 12-byte encryption header, further
initializing the encryption keys.
3) read and decrypt the compressed data stream using the
encryption keys.
</pre>
<pre>step 1 - initializing the encryption keys
-----------------------------------------</pre>
<pre>key(0) <- 305419896
key(1) <- 591751049
key(2) <- 878082192</pre>
<pre>loop for i <- 0 to length(password)-1
update_keys(password(i))
end loop
</pre>
<pre>where update_keys() is defined as:
</pre>
<pre>update_keys(char):
key(0) <- crc32(key(0),char)
key(1) <- key(1) + (key(0) & 000000ffh)
key(1) <- key(1) * 134775813 + 1
key(2) <- crc32(key(2),key(1) >> 24)
end update_keys
</pre>
<pre>where crc32(old_crc,char) is a routine that given a crc value and a
character, returns an updated crc value after applying the crc-32
algorithm described elsewhere in this document.
</pre>
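a python sketch of step 1, assuming the password is given as bytes (how non-ascii passwords are encoded is not covered here). crc32_byte is a table-driven crc-32 update using the reflected polynomial 0xedb88320 with no initial or final inversion, which is how common implementations realize crc32(old_crc,char) inside update_keys(); all names are hypothetical.

```python
# Table-driven CRC-32 (reflected polynomial 0xEDB88320), one entry per byte value.
CRC_TABLE = []
for n in range(256):
    c = n
    for _ in range(8):
        c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
    CRC_TABLE.append(c)


def crc32_byte(old_crc: int, char: int) -> int:
    """One CRC-32 step over a single byte, with no pre- or post-inversion."""
    return CRC_TABLE[(old_crc ^ char) & 0xFF] ^ (old_crc >> 8)


def update_keys(keys: list[int], char: int) -> None:
    """update_keys() from the text; all arithmetic is kept to 32 bits."""
    keys[0] = crc32_byte(keys[0], char)
    keys[1] = (keys[1] + (keys[0] & 0xFF)) & 0xFFFFFFFF
    keys[1] = (keys[1] * 134775813 + 1) & 0xFFFFFFFF
    keys[2] = crc32_byte(keys[2], keys[1] >> 24)


def init_keys(password: bytes) -> list[int]:
    """Step 1: start from the fixed key values and mix in each password byte."""
    keys = [305419896, 591751049, 878082192]  # 0x12345678, 0x23456789, 0x34567890
    for ch in password:
        update_keys(keys, ch)
    return keys
```

as a check on crc32_byte, running it over b"123456789" starting from 0xffffffff and inverting the result gives 0xcbf43926, the standard crc-32 check value.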
<pre>step 2 - decrypting the encryption header
-----------------------------------------</pre>
<pre>the purpose of this step is to further initialize the encryption
keys, based on random data, to render a plaintext attack on the
data ineffective.
</pre>
<pre>read the 12-byte encryption header into buffer, in locations
buffer(0) thru buffer(11).</pre>
<pre>loop for i <- 0 to 11
c <- buffer(i) ^ decrypt_byte()
update_keys(c)
buffer(i) <- c
end loop
</pre>
<pre>where decrypt_byte() is defined as:
</pre>
<pre>unsigned char decrypt_byte()
local unsigned short temp
temp <- key(2) | 2
decrypt_byte <- (temp * (temp ^ 1)) >> 8
end decrypt_byte
</pre>
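decrypt_byte() translates almost directly. in this python sketch key(2) is passed in as an argument (an assumption of the sketch, not of the format), temp is masked to 16 bits and the result to 8 bits, reproducing the unsigned short and unsigned char types of the pseudocode.

```python
def decrypt_byte(key2: int) -> int:
    """Keystream byte derived from key(2), as in decrypt_byte() above."""
    temp = (key2 | 2) & 0xFFFF                   # local unsigned short temp
    return ((temp * (temp ^ 1)) >> 8) & 0xFF     # unsigned char result
```

in a c port, temp * (temp ^ 1) is evaluated as int after integer promotion and can exceed int_max for large temp, so declaring temp as unsigned int (and truncating the result to 8 bits) avoids signed overflow.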
<pre>after the header is decrypted, the last 1 or 2 bytes in buffer
should be the high-order word/byte of the crc for the file being
decrypted, stored in intel low-byte/high-byte order. versions of
pkzip prior to 2.0 used a 2-byte crc check; versions 2.0 and later
use a 1-byte crc check. this check can be used to test whether the
password supplied is correct.
</pre>
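a python sketch of the password check, assuming the 12-byte header has already been decrypted with the loop above and that the caller knows the file's crc-32 (header_check_ok is a hypothetical name).

```python
def header_check_ok(plain_header: bytes, file_crc: int, two_byte_check: bool = False) -> bool:
    """Compare a decrypted 12-byte encryption header with the file's crc-32.

    1-byte check: buffer(11) is the high-order byte of the crc.
    2-byte check (pre-2.0): buffer(10), buffer(11) hold the high-order word,
    low byte first.
    """
    if len(plain_header) != 12:
        raise ValueError("encryption header must be 12 bytes")
    if plain_header[11] != (file_crc >> 24) & 0xFF:
        return False
    if two_byte_check and plain_header[10] != (file_crc >> 16) & 0xFF:
        return False
    return True
```

because the check covers only 1 or 2 bytes, a wrong password still passes it roughly once in 256 (or 65536) tries, so a match is a strong hint rather than proof.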
<pre>step 3 - decrypting the compressed data stream
----------------------------------------------</pre>
<pre>the compressed data stream can be decrypted as follows:
</pre>
<pre>loop until done
read a character into c
temp <- c ^ decrypt_byte()
update_keys(temp)
output temp
end loop
</pre>
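steps 2 and 3 use the same per-byte loop: xor with decrypt_byte(), then feed the plaintext byte to update_keys(). the python class below bundles steps 1-3 into one hypothetical ZipCryptoKeys type so the header and the data stream are decrypted with one continuous key state. encrypt() is an assumption added only to make a round trip possible; it mirrors the description above of updating the keys after each byte is encrypted.

```python
def _make_crc_table() -> list[int]:
    """CRC-32 lookup table for the reflected polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


class ZipCryptoKeys:
    """The three 32-bit keys plus update_keys() and decrypt_byte() from this section."""

    def __init__(self, password: bytes) -> None:
        self.k = [305419896, 591751049, 878082192]
        for ch in password:
            self.update(ch)

    @staticmethod
    def _crc(crc: int, b: int) -> int:
        return CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)

    def update(self, char: int) -> None:
        k = self.k
        k[0] = self._crc(k[0], char)
        k[1] = ((k[1] + (k[0] & 0xFF)) * 134775813 + 1) & 0xFFFFFFFF
        k[2] = self._crc(k[2], k[1] >> 24)

    def stream_byte(self) -> int:
        temp = (self.k[2] | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            temp = c ^ self.stream_byte()
            self.update(temp)  # keys advance on the plaintext byte
            out.append(temp)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for p in data:
            out.append(p ^ self.stream_byte())
            self.update(p)
        return bytes(out)
```

usage: create one ZipCryptoKeys from the password, call decrypt() on the first 12 bytes of the data area (the header, checked as in step 2), then keep calling decrypt() on the same object for the compressed data stream.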
<pre>in addition to the above mentioned contributors to pkzip and pkunzip,
i would like to extend special thanks to robert mahoney for suggesting
the extension .zip for this software.
</pre>
</td>
</tr>
</table>
</center></div>
</body>
</html>