📄 rfc1951.txt

📁 SharpZipLib之前叫做NZipLib
💻 TXT
📖 第 1 页 / 共 3 页
字号:
         We define a prefix code in terms of a binary tree in which the         two edges descending from each non-leaf node are labeled 0 and         1 and in which the leaf nodes correspond one-for-one with (are         labeled with) the symbols of the alphabet; then the code for a         symbol is the sequence of 0's and 1's on the edges leading from         the root to the leaf labeled with that symbol.  For example:Deutsch                      Informational                      [Page 6]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996                          /\              Symbol    Code                         0  1             ------    ----                        /    \                A      00                       /\     B               B       1                      0  1                    C     011                     /    \                   D     010                    A     /\                         0  1                        /    \                       D      C         A parser can decode the next symbol from an encoded input         stream by walking down the tree from the root, at each step         choosing the edge corresponding to the next input bit.         Given an alphabet with known symbol frequencies, the Huffman         algorithm allows the construction of an optimal prefix code         (one which represents strings with those symbol frequencies         using the fewest bits of any possible prefix codes for that         alphabet).  Such a code is called a Huffman code.  (See         reference [1] in Chapter 5, references for additional         information on Huffman codes.)         Note that in the "deflate" format, the Huffman codes for the         various alphabets must not exceed certain maximum code lengths.         This constraint complicates the algorithm for computing code         lengths from symbol frequencies.  Again, see Chapter 5,         references for details.      3.2.2. Use of Huffman coding in the "deflate" format         The Huffman codes used for each alphabet in the "deflate"         format have two additional rules:             * All codes of a given bit length have lexicographically               consecutive values, in the same order as the symbols               they represent;             * Shorter codes lexicographically precede longer codes.Deutsch                      Informational                      [Page 7]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996         We could recode the example above to follow this rule as         follows, assuming that the order of the alphabet is ABCD:            Symbol  Code            ------  ----            A       10            B       0            C       110            D       111         I.e., 0 precedes 10 which precedes 11x, and 110 and 111 are         lexicographically consecutive.         Given this rule, we can define the Huffman code for an alphabet         just by giving the bit lengths of the codes for each symbol of         the alphabet in order; this is sufficient to determine the         actual codes.  In our example, the code is completely defined         by the sequence of bit lengths (2, 1, 3, 3).  The following         algorithm generates the codes as integers, intended to be read         from most- to least-significant bit.  The code lengths are         initially in tree[I].Len; the codes are produced in         tree[I].Code.         1)  Count the number of codes for each code length.  Let             bl_count[N] be the number of codes of length N, N >= 1.         2)  Find the numerical value of the smallest code for each             code length:                code = 0;                bl_count[0] = 0;                for (bits = 1; bits <= MAX_BITS; bits++) {                    code = (code + bl_count[bits-1]) << 1;                    next_code[bits] = code;                }         3)  Assign numerical values to all codes, using consecutive             values for all codes of the same length with the base             values determined at step 2. Codes that are never used             (which have a bit length of zero) must not be assigned a             value.                for (n = 0;  n <= max_code; n++) {                    len = tree[n].Len;                    if (len != 0) {                        tree[n].Code = next_code[len];                        next_code[len]++;                    }Deutsch                      Informational                      [Page 8]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996                }         Example:         Consider the alphabet ABCDEFGH, with bit lengths (3, 3, 3, 3,         3, 2, 4, 4).  After step 1, we have:            N      bl_count[N]            -      -----------            2      1            3      5            4      2         Step 2 computes the following next_code values:            N      next_code[N]            -      ------------            1      0            2      0            3      2            4      14         Step 3 produces the following code values:            Symbol Length   Code            ------ ------   ----            A       3        010            B       3        011            C       3        100            D       3        101            E       3        110            F       2         00            G       4       1110            H       4       1111      3.2.3. Details of block format         Each block of compressed data begins with 3 header bits         containing the following data:            first bit       BFINAL            next 2 bits     BTYPE         Note that the header bits do not necessarily begin on a byte         boundary, since a block does not necessarily occupy an integral         number of bytes.Deutsch                      Informational                      [Page 9]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996         BFINAL is set if and only if this is the last block of the data         set.         BTYPE specifies how the data are compressed, as follows:            00 - no compression            01 - compressed with fixed Huffman codes            10 - compressed with dynamic Huffman codes            11 - reserved (error)         The only difference between the two compressed cases is how the         Huffman codes for the literal/length and distance alphabets are         defined.         In all cases, the decoding algorithm for the actual data is as         follows:            do               read block header from input stream.               if stored with no compression                  skip any remaining bits in current partially                     processed byte                  read LEN and NLEN (see next section)                  copy LEN bytes of data to output               otherwise                  if compressed with dynamic Huffman codes                     read representation of code trees (see                        subsection below)                  loop (until end of block code recognized)                     decode literal/length value from input stream                     if value < 256                        copy value (literal byte) to output stream                     otherwise                        if value = end of block (256)                           break from loop                        otherwise (value = 257..285)                           decode distance from input stream                           move backwards distance bytes in the output                           stream, and copy length bytes from this                           position to the output stream.                  end loop            while not last block         Note that a duplicated string reference may refer to a string         in a previous block; i.e., the backward distance may cross one         or more block boundaries.  However a distance cannot refer past         the beginning of the output stream.  (An application using aDeutsch                      Informational                     [Page 10]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996         preset dictionary might discard part of the output stream; a         distance can refer to that part of the output stream anyway)         Note also that the referenced string may overlap the current         position; for example, if the last 2 bytes decoded have values         X and Y, a string reference with <length = 5, distance = 2>         adds X,Y,X,Y,X to the output stream.         We now specify each compression method in turn.      3.2.4. Non-compressed blocks (BTYPE=00)         Any bits of input up to the next byte boundary are ignored.         The rest of the block consists of the following information:              0   1   2   3   4...            +---+---+---+---+================================+            |  LEN  | NLEN  |... LEN bytes of literal data...|            +---+---+---+---+================================+         LEN is the number of data bytes in the block.  NLEN is the         one's complement of LEN.      3.2.5. Compressed blocks (length and distance codes)         As noted above, encoded data blocks in the "deflate" format         consist of sequences of symbols drawn from three conceptually         distinct alphabets: either literal bytes, from the alphabet of         byte values (0..255), or <length, backward distance> pairs,         where the length is drawn from (3..258) and the distance is         drawn from (1..32,768).  In fact, the literal and length         alphabets are merged into a single alphabet (0..285), where         values 0..255 represent literal bytes, the value 256 indicates         end-of-block, and values 257..285 represent length codes         (possibly in conjunction with extra bits following the symbol         code) as follows:Deutsch                      Informational                     [Page 11]RFC 1951      DEFLATE Compressed Data Format Specification      May 1996                 Extra               Extra               Extra            Code Bits Length(s) Code Bits Lengths   Code Bits Length(s)            ---- ---- ------     ---- ---- -------   ---- ---- -------             257   0     3       267   1   15,16     277   4   67-82             258   0     4       268   1   17,18     278   4   83-98             259   0     5       269   2   19-22     279   4   99-114             260   0     6       270   2   23-26     280   4  115-130             261   0     7       271   2   27-30     281   5  131-162             262   0     8       272   2   31-34     282   5  163-194             263   0     9       273   3   35-42     283   5  195-226             264   0    10       274   3   43-50     284   5  227-257             265   1  11,12      275   3   51-58     285   0    258             266   1  13,14      276   3   59-66         The extra bits should be interpreted as a machine integer         stored with the most-significant bit first, e.g., bits 1110
💿 文件大小 398 K
👤 上传用户 ylchsj521
📂 所属分类压缩解压
🏷️ 相关标签

#SharpZipLib #NZipLib
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -