   to identify the file type, and information concerning data processing
   beyond the basic encoding format.  The Window sections encode the
   target windows.

   Below is the overall organization of a delta file.  The indented
   items refine the ones immediately above them.  An item in square
   brackets may or may not be present in the file depending on the
   information encoded in the Indicator byte above it.







Korn, et. al.               Standards Track                     [Page 6]

RFC 3284                         VCDIFF                        June 2002


      Header
          Header1                                  - byte
          Header2                                  - byte
          Header3                                  - byte
          Header4                                  - byte
          Hdr_Indicator                            - byte
          [Secondary compressor ID]                - byte
          [Length of code table data]              - integer
          [Code table data]
              Size of near cache                   - byte
              Size of same cache                   - byte
              Compressed code table data
      Window1
          Win_Indicator                            - byte
          [Source segment length]                  - integer
          [Source segment position]                - integer
          The delta encoding of the target window
              Length of the delta encoding         - integer
              The delta encoding
                  Size of the target window        - integer
                  Delta_Indicator                  - byte
                  Length of data for ADDs and RUNs - integer
                  Length of instructions and sizes - integer
                  Length of addresses for COPYs    - integer
                  Data section for ADDs and RUNs   - array of bytes
                  Instructions and sizes section   - array of bytes
                  Addresses section for COPYs      - array of bytes
      Window2
      ...

4.1 The Header Section

   Each delta file starts with a header section organized as below.
   Note the convention that square brackets enclose optional items.

         Header1                                  - byte = 0xD6
         Header2                                  - byte = 0xC3
         Header3                                  - byte = 0xC4
         Header4                                  - byte
         Hdr_Indicator                            - byte
         [Secondary compressor ID]                - byte
         [Length of code table data]              - integer
         [Code table data]










   The first three Header bytes are the ASCII characters 'V', 'C' and
   'D' with their most significant bits turned on (in hexadecimal, the
   values are 0xD6, 0xC3, and 0xC4).  The fourth Header byte is
   currently set to zero.  In the future, it might be used to indicate
   the version of Vcdiff.
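
   As a quick check of the arithmetic above, the following Python
   sketch sets the most significant bit of each ASCII character and
   uses the result to test the start of a file.  The names VCDIFF_MAGIC
   and has_vcdiff_magic are illustrative and are not defined by the
   format.

```python
# Illustrative sketch; the names below are not defined by the specification.

# Set the most significant bit (0x80) of 'V', 'C' and 'D'.
VCDIFF_MAGIC = bytes(ch | 0x80 for ch in b"VCD")


def has_vcdiff_magic(data: bytes) -> bool:
    """Return True if data starts with the three magic Header bytes."""
    return data[:3] == VCDIFF_MAGIC
```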

   The Hdr_Indicator byte shows if there is any initialization data
   required to aid in the reconstruction of data in the Window sections.
   This byte MAY have non-zero values for either, both, or neither of
   the two bits VCD_DECOMPRESS and VCD_CODETABLE below:

       7 6 5 4 3 2 1 0
      +-+-+-+-+-+-+-+-+
      | | | | | | | | |
      +-+-+-+-+-+-+-+-+
                   ^ ^
                   | |
                   | +-- VCD_DECOMPRESS
                   +---- VCD_CODETABLE

   If bit 0 (VCD_DECOMPRESS) is non-zero, this indicates that a
   secondary compressor may have been used to further compress certain
   parts of the delta encoding data as described in Sections 4.3 and 6.
   In that case, the ID of the secondary compressor is given next.  If
   this bit is zero, the compressor ID byte is not included.

   If bit 1 (VCD_CODETABLE) is non-zero, this indicates that an
   application-defined code table is to be used for decoding the delta
   instructions.  This table itself is compressed.  The length of the
   data comprising this compressed code table and the data follow next.
   Section 7 discusses application-defined code tables.  If this bit is
   zero, the code table data length and the code table data are not
   included.

   If both bits are set, then the compressor ID byte is included before
   the code table data length and the code table data.
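
   The header layout above can be sketched in Python as follows.  This
   is a minimal illustration, not a reference decoder.  The names
   read_integer, Header and parse_header are hypothetical, and the
   integer decoding assumes the variable-length base-128 format that
   Section 2 defines.  The code table data is returned as raw bytes and
   is not decompressed.

```python
# Illustrative sketch; function and field names are assumptions.
from typing import NamedTuple, Optional, Tuple

VCD_DECOMPRESS = 0x01  # bit 0 of Hdr_Indicator
VCD_CODETABLE = 0x02   # bit 1 of Hdr_Indicator


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a variable-length integer, assuming the base-128 format
    of Section 2: 7 data bits per byte, most significant group first,
    MSB set on every byte except the last.  Returns (value, new_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


class Header(NamedTuple):
    version: int                      # Header4, currently zero
    indicator: int                    # Hdr_Indicator
    compressor_id: Optional[int]      # present only if VCD_DECOMPRESS
    code_table_data: Optional[bytes]  # present only if VCD_CODETABLE


def parse_header(buf: bytes) -> Tuple[Header, int]:
    """Parse the Header section; returns (header, offset of Window1)."""
    if buf[:3] != b"\xd6\xc3\xc4":
        raise ValueError("not a Vcdiff delta file")
    version, indicator = buf[3], buf[4]
    pos = 5
    compressor_id = None
    code_table_data = None
    # The compressor ID, if present, precedes the code table fields.
    if indicator & VCD_DECOMPRESS:
        compressor_id = buf[pos]
        pos += 1
    if indicator & VCD_CODETABLE:
        length, pos = read_integer(buf, pos)
        code_table_data = buf[pos:pos + length]
        pos += length
    return Header(version, indicator, compressor_id, code_table_data), pos
```

   A truncated input raises IndexError in this sketch; a production
   decoder would report such errors explicitly.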

4.2 The Format of a Window Section

   Each Window section is organized as follows:

      Win_Indicator                            - byte
      [Source segment length]                  - integer
      [Source segment position]                - integer
      The delta encoding of the target window








   Below are the details of the various items:

      Win_Indicator:
          This byte is a set of bits, as shown:

          7 6 5 4 3 2 1 0
         +-+-+-+-+-+-+-+-+
         | | | | | | | | |
         +-+-+-+-+-+-+-+-+
                      ^ ^
                      | |
                      | +-- VCD_SOURCE
                      +---- VCD_TARGET

         If bit 0 (VCD_SOURCE) is non-zero, this indicates that a
         segment of data from the "source" file was used as the
         corresponding source window of data to encode the target
         window.  The decoder will use this same source data segment to
         decode the target window.

         If bit 1 (VCD_TARGET) is non-zero, this indicates that a
         segment of data from the "target" file was used as the
         corresponding source window of data to encode the target
         window.  As above, this same source data segment is used to
         decode the target window.

         The Win_Indicator byte MUST NOT have more than one of the bits
         set (non-zero).  It MAY have none of these bits set.

         If one of these bits is set, the byte is followed by two
         integers to indicate respectively, the length and position of
         the source data segment in the relevant file.  If the indicator
         byte is zero, the target window was compressed by itself
         without comparing against another data segment, and these two
         integers are not included.

      The delta encoding of the target window:

         This contains the delta encoding of the target window, either
         in terms of the source data segment (i.e., VCD_SOURCE or
         VCD_TARGET was set) or by itself if no source window is
         specified.  This data format is discussed next.
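
   The rules for the start of a Window section can be sketched in
   Python as follows.  The names read_integer and parse_window_prefix
   are hypothetical, and the integer decoding assumes the
   variable-length base-128 format that Section 2 defines.  The sketch
   stops at the first byte of the delta encoding of the target window.

```python
# Illustrative sketch; function names and return shape are assumptions.
from typing import Optional, Tuple

VCD_SOURCE = 0x01  # bit 0 of Win_Indicator
VCD_TARGET = 0x02  # bit 1 of Win_Indicator


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 variable-length integer (assumed Section 2
    format).  Returns (value, new_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def parse_window_prefix(
    buf: bytes, pos: int
) -> Tuple[int, Optional[int], Optional[int], int]:
    """Read Win_Indicator and, if present, the source segment length
    and position.  Returns (indicator, length, position, new_pos)."""
    indicator = buf[pos]
    pos += 1
    if (indicator & VCD_SOURCE) and (indicator & VCD_TARGET):
        raise ValueError("VCD_SOURCE and VCD_TARGET MUST NOT both be set")
    seg_length = seg_position = None
    if indicator & (VCD_SOURCE | VCD_TARGET):
        # Length comes first, then position.
        seg_length, pos = read_integer(buf, pos)
        seg_position, pos = read_integer(buf, pos)
    return indicator, seg_length, seg_position, pos
```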











4.3 The Delta Encoding of a Target Window

   The delta encoding of a target window is organized as follows:

      Length of the delta encoding            - integer
      The delta encoding
          Length of the target window         - integer
          Delta_Indicator                     - byte
          Length of data for ADDs and RUNs    - integer
          Length of instructions section      - integer
          Length of addresses for COPYs       - integer
          Data section for ADDs and RUNs      - array of bytes
          Instructions and sizes section      - array of bytes
          Addresses section for COPYs         - array of bytes

         Length of the delta encoding:
            This integer gives the total number of remaining bytes that
            comprise the data of the delta encoding for this target
            window.

         The delta encoding:
            This contains the data representing the delta encoding which
            is described next.

         Length of the target window:
            This integer indicates the actual size of the target window
            after decompression.  A decoder can use this value to
            allocate memory to store the uncompressed data.

         Delta_Indicator:
            This byte is a set of bits, as shown:

          7 6 5 4 3 2 1 0
         +-+-+-+-+-+-+-+-+
         | | | | | | | | |
         +-+-+-+-+-+-+-+-+
                    ^ ^ ^
                    | | |
                    | | +-- VCD_DATACOMP
                    | +---- VCD_INSTCOMP
                    +------ VCD_ADDRCOMP

              VCD_DATACOMP:   bit value 1.
              VCD_INSTCOMP:   bit value 2.
              VCD_ADDRCOMP:   bit value 4.








         As discussed, the delta encoding consists of COPY, ADD and RUN
         instructions.  The ADD and RUN instructions have accompanying
         unmatched data (that is, data that does not specifically match
         any data in the source window or in some earlier part of the
         target window) and the COPY instructions have addresses of
         where the matches occur.  OPTIONALLY, these types of data MAY
         be further compressed using a secondary compressor.  Thus,
         Vcdiff separates the encoding of the delta instructions into
         three parts:

            a. The unmatched data in the ADD and RUN instructions,
            b. The delta instructions and accompanying sizes, and
            c. The addresses of the COPY instructions.

         If the bit VCD_DECOMPRESS (Section 4.1) was on, each of these
         sections may have been compressed using the specified secondary
         compressor.  The bit positions 0 (VCD_DATACOMP), 1
         (VCD_INSTCOMP), and 2 (VCD_ADDRCOMP) respectively indicate, if
         non-zero, that the corresponding parts are compressed.  Then,
         these parts MUST be decompressed before decoding the delta
         instructions.

      Length of data for ADDs and RUNs:
         This is the length (in bytes) of the section of data storing
         the unmatched data accompanying the ADD and RUN instructions.

      Length of instructions section:
         This is the length (in bytes) of the delta instructions and
         accompanying sizes.

      Length of addresses for COPYs:
         This is the length (in bytes) of the section storing the
         addresses of the COPY instructions.

      Data section for ADDs and RUNs:
         This sequence of bytes encodes the unmatched data for the ADD
         and RUN instructions.

      Instructions and sizes section:
         This sequence of bytes encodes the instructions and their
         sizes.

      Addresses section for COPYs:
         This sequence of bytes encodes the addresses of the COPY
         instructions.
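
   The layout of the delta encoding can be sketched in Python as
   follows.  The names read_integer, DeltaEncoding and
   parse_delta_encoding are hypothetical, and the integer decoding
   assumes the variable-length base-128 format that Section 2 defines.
   The three sections are returned as raw bytes.  If a Delta_Indicator
   bit is set, the corresponding section would still need to be passed
   through the secondary compressor, which this sketch does not do.

```python
# Illustrative sketch; function and field names are assumptions.
from typing import NamedTuple, Tuple

VCD_DATACOMP = 0x01  # bit 0 of Delta_Indicator
VCD_INSTCOMP = 0x02  # bit 1 of Delta_Indicator
VCD_ADDRCOMP = 0x04  # bit 2 of Delta_Indicator


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 variable-length integer (assumed Section 2
    format).  Returns (value, new_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


class DeltaEncoding(NamedTuple):
    target_window_length: int
    delta_indicator: int
    data: bytes          # data section for ADDs and RUNs
    instructions: bytes  # instructions and sizes section
    addresses: bytes     # addresses section for COPYs


def parse_delta_encoding(buf: bytes, pos: int) -> Tuple[DeltaEncoding, int]:
    """Split one delta encoding into its three sections.
    Returns (encoding, offset of the next Window section)."""
    encoding_length, pos = read_integer(buf, pos)
    end = pos + encoding_length  # the length counts the remaining bytes
    if end > len(buf):
        raise ValueError("delta encoding is truncated")
    target_length, pos = read_integer(buf, pos)
    indicator = buf[pos]
    pos += 1
    data_len, pos = read_integer(buf, pos)
    inst_len, pos = read_integer(buf, pos)
    addr_len, pos = read_integer(buf, pos)
    data = buf[pos:pos + data_len]
    pos += data_len
    instructions = buf[pos:pos + inst_len]
    pos += inst_len
    addresses = buf[pos:pos + addr_len]
    pos += addr_len
    if pos != end:
        raise ValueError("section lengths disagree with encoding length")
    return DeltaEncoding(target_length, indicator, data,
                         instructions, addresses), pos
```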








5. Delta Instruction Encoding

   The delta instructions described in Section 3 represent the results
   of string matching.  For many data differencing applications in which
   the changes between source and target data are small, any
   straightforward representation of these instructions would be
   adequate.  However, for applications including differencing of binary
   files or data compression, it is important to encode these
   instructions well to achieve good compression rates.  The key to
   this achievement is to efficiently encode the addresses of COPY
   instructions and the sizes of all delta instructions.

5.1 Address Encoding Modes of COPY Instructions

   Addresses of COPY instructions are locations of matches and often
   lie close to, or are exactly equal to, one another.  This is because
   data in local regions are often replicated with minor changes.  In
   turn, this means that coding a newly matched address against some
   recently matched addresses can be beneficial.  To take advantage of
   this phenomenon and encode addresses of COPY instructions more
   efficiently, the Vcdiff data format supports the use of two different
   types of address caches.  Both the encoder and decoder maintain these
