to identify the file type, and information concerning data processing
beyond the basic encoding format. The Window sections encode the
target windows.
Below is the overall organization of a delta file. The indented
items refine the ones immediately above them. An item in square
brackets may or may not be present in the file depending on the
information encoded in the Indicator byte above it.
Korn, et. al. Standards Track [Page 6]
RFC 3284 VCDIFF June 2002
Header
    Header1 - byte
    Header2 - byte
    Header3 - byte
    Header4 - byte
    Hdr_Indicator - byte
    [Secondary compressor ID] - byte
    [Length of code table data] - integer
    [Code table data]
        Size of near cache - byte
        Size of same cache - byte
        Compressed code table data
Window1
    Win_Indicator - byte
    [Source segment size] - integer
    [Source segment position] - integer
    The delta encoding of the target window
        Length of the delta encoding - integer
        The delta encoding
            Size of the target window - integer
            Delta_Indicator - byte
            Length of data for ADDs and RUNs - integer
            Length of instructions and sizes - integer
            Length of addresses for COPYs - integer
            Data section for ADDs and RUNs - array of bytes
            Instructions and sizes section - array of bytes
            Addresses section for COPYs - array of bytes
Window2
...
4.1 The Header Section
Each delta file starts with a header section organized as below.
Note the convention that square-brackets enclose optional items.
Header1 - byte = 0xD6
Header2 - byte = 0xC3
Header3 - byte = 0xC4
Header4 - byte
Hdr_Indicator - byte
[Secondary compressor ID] - byte
[Length of code table data] - integer
[Code table data]
The first three Header bytes are the ASCII characters 'V', 'C' and
'D' with their most significant bits turned on (in hexadecimal, the
values are 0xD6, 0xC3, and 0xC4). The fourth Header byte is
currently set to zero. In the future, it might be used to indicate
the version of Vcdiff.
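The relationship between the ASCII characters and the magic byte values
can be checked directly. The following is a minimal sketch, not part of
the format definition; the names VCDIFF_MAGIC and has_vcdiff_magic are
hypothetical and chosen only for illustration.

```python
# The three magic bytes are 'V', 'C' and 'D' with bit 7 (0x80) turned on.
VCDIFF_MAGIC = bytes(ord(ch) | 0x80 for ch in "VCD")


def has_vcdiff_magic(data: bytes) -> bool:
    """Return True if data begins with the three Vcdiff magic bytes."""
    return data[:3] == VCDIFF_MAGIC


def header4_is_zero(data: bytes) -> bool:
    """Return True if the fourth Header byte has its current value, zero."""
    return len(data) >= 4 and data[3] == 0
```

A decoder written this way would typically reject input for which
has_vcdiff_magic returns False before reading any further fields.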
The Hdr_Indicator byte shows if there is any initialization data
required to aid in the reconstruction of data in the Window sections.
This byte MAY have non-zero values for either, both, or neither of
the two bits VCD_DECOMPRESS and VCD_CODETABLE below:
    7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+
   | | | | | | | | |
   +-+-+-+-+-+-+-+-+
                ^ ^
                | |
                | +-- VCD_DECOMPRESS
                +---- VCD_CODETABLE
If bit 0 (VCD_DECOMPRESS) is non-zero, this indicates that a
secondary compressor may have been used to further compress certain
parts of the delta encoding data as described in Sections 4.3 and 6.
In that case, the ID of the secondary compressor is given next. If
this bit is zero, the compressor ID byte is not included.
If bit 1 (VCD_CODETABLE) is non-zero, this indicates that an
application-defined code table is to be used for decoding the delta
instructions. This table itself is compressed. The length of the
data comprising this compressed code table and the data follow next.
Section 7 discusses application-defined code tables. If this bit is
zero, the code table data length and the code table data are not
included.
If both bits are set, then the compressor ID byte is included before
the code table data length and the code table data.
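The header layout described above can be read with a short routine.
This is a sketch under stated assumptions: integers are assumed to use
the variable-length base-128 encoding defined in Section 2 (most
significant digit first, high bit set on every byte but the last), and
the names read_integer, Header, and parse_header are hypothetical. The
code table data is returned as stored, without interpreting it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

VCD_DECOMPRESS = 0x01  # bit 0 of Hdr_Indicator
VCD_CODETABLE = 0x02   # bit 1 of Hdr_Indicator


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one variable-length integer; return (value, next position).

    Assumes the base-128 encoding of Section 2: the high bit of each
    byte marks continuation, and the low 7 bits carry the digit.
    """
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


@dataclass
class Header:
    version: int                  # Header4, currently zero
    compressor_id: Optional[int]  # present only if VCD_DECOMPRESS is set
    code_table: Optional[bytes]   # present only if VCD_CODETABLE is set
    end: int                      # offset of the first Window section


def parse_header(buf: bytes) -> Header:
    """Parse the Header section at the start of a delta file."""
    if buf[:3] != b"\xd6\xc3\xc4":
        raise ValueError("not a Vcdiff delta file")
    version = buf[3]
    indicator = buf[4]
    pos = 5
    compressor_id = None
    code_table = None
    # The compressor ID, when present, precedes the code table fields.
    if indicator & VCD_DECOMPRESS:
        compressor_id = buf[pos]
        pos += 1
    if indicator & VCD_CODETABLE:
        length, pos = read_integer(buf, pos)
        code_table = buf[pos:pos + length]
        pos += length
    return Header(version, compressor_id, code_table, pos)
```

The returned end offset marks where the first Window section begins.
Truncated input surfaces here as an IndexError; a production decoder
would likely report it more precisely.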
4.2 The Format of a Window Section
Each Window section is organized as follows:
Win_Indicator - byte
[Source segment length] - integer
[Source segment position] - integer
The delta encoding of the target window
Below are the details of the various items:
Win_Indicator:
This byte is a set of bits, as shown:
    7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+
   | | | | | | | | |
   +-+-+-+-+-+-+-+-+
                ^ ^
                | |
                | +-- VCD_SOURCE
                +---- VCD_TARGET
If bit 0 (VCD_SOURCE) is non-zero, this indicates that a
segment of data from the "source" file was used as the
corresponding source window of data to encode the target
window. The decoder will use this same source data segment to
decode the target window.
If bit 1 (VCD_TARGET) is non-zero, this indicates that a
segment of data from the "target" file was used as the
corresponding source window of data to encode the target
window. As above, this same source data segment is used to
decode the target window.
The Win_Indicator byte MUST NOT have more than one of the bits
set (non-zero). It MAY have none of these bits set.
If one of these bits is set, the byte is followed by two
integers that indicate, respectively, the length and position of
the source data segment in the relevant file. If the indicator
byte is zero, the target window was compressed by itself
without comparing against another data segment, and these two
integers are not included.
The delta encoding of the target window:
This contains the delta encoding of the target window, either
in terms of the source data segment (i.e., VCD_SOURCE or
VCD_TARGET was set) or by itself if no source window is
specified. This data format is discussed next.
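The rules for the Win_Indicator byte and the two optional integers can
be sketched as follows. The names read_integer and parse_window_prefix
are hypothetical, integers are assumed to use the base-128 encoding of
Section 2, and masking off bits 2 through 7 before testing is a choice
made for this illustration rather than something the text above
prescribes.

```python
from typing import Optional, Tuple

VCD_SOURCE = 0x01  # bit 0: source segment comes from the "source" file
VCD_TARGET = 0x02  # bit 1: source segment comes from the "target" file


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one Section 2 style base-128 integer; return (value, next pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def parse_window_prefix(
    buf: bytes, pos: int
) -> Tuple[Optional[str], Optional[int], Optional[int], int]:
    """Return (origin, segment length, segment position, next position).

    origin is "source", "target", or None when the target window was
    compressed by itself.
    """
    flags = buf[pos] & (VCD_SOURCE | VCD_TARGET)
    pos += 1
    if flags == (VCD_SOURCE | VCD_TARGET):
        raise ValueError("VCD_SOURCE and VCD_TARGET must not both be set")
    if not flags:
        # No source data segment: the two integers are not included.
        return None, None, None, pos
    length, pos = read_integer(buf, pos)
    position, pos = read_integer(buf, pos)
    origin = "source" if flags & VCD_SOURCE else "target"
    return origin, length, position, pos
```

The returned position is where the delta encoding of the target window
starts.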
4.3 The Delta Encoding of a Target Window
The delta encoding of a target window is organized as follows:
Length of the delta encoding - integer
The delta encoding
    Length of the target window - integer
    Delta_Indicator - byte
    Length of data for ADDs and RUNs - integer
    Length of instructions section - integer
    Length of addresses for COPYs - integer
    Data section for ADDs and RUNs - array of bytes
    Instructions and sizes section - array of bytes
    Addresses section for COPYs - array of bytes
Length of the delta encoding:
This integer gives the total number of remaining bytes that
comprise the data of the delta encoding for this target
window.
The delta encoding:
This contains the data representing the delta encoding which
is described next.
Length of the target window:
This integer indicates the actual size of the target window
after decompression. A decoder can use this value to
allocate memory to store the uncompressed data.
Delta_Indicator:
This byte is a set of bits, as shown:
    7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+
   | | | | | | | | |
   +-+-+-+-+-+-+-+-+
              ^ ^ ^
              | | |
              | | +-- VCD_DATACOMP
              | +---- VCD_INSTCOMP
              +------ VCD_ADDRCOMP
VCD_DATACOMP: bit value 1.
VCD_INSTCOMP: bit value 2.
VCD_ADDRCOMP: bit value 4.
As discussed, the delta encoding consists of COPY, ADD and RUN
instructions. The ADD and RUN instructions have accompanying
unmatched data (that is, data that does not specifically match
any data in the source window or in some earlier part of the
target window) and the COPY instructions have addresses of
where the matches occur. OPTIONALLY, these types of data MAY
be further compressed using a secondary compressor. Thus,
Vcdiff separates the encoding of the delta instructions into
three parts:
a. The unmatched data in the ADD and RUN instructions,
b. The delta instructions and accompanying sizes, and
c. The addresses of the COPY instructions.
If the bit VCD_DECOMPRESS (Section 4.1) was on, each of these
sections may have been compressed using the specified secondary
compressor. The bit positions 0 (VCD_DATACOMP), 1
(VCD_INSTCOMP), and 2 (VCD_ADDRCOMP) respectively indicate, if
non-zero, that the corresponding parts are compressed. Then,
these parts MUST be decompressed before decoding the delta
instructions.
Length of data for ADDs and RUNs:
This is the length (in bytes) of the section of data storing
the unmatched data accompanying the ADD and RUN instructions.
Length of instructions section:
This is the length (in bytes) of the delta instructions and
accompanying sizes.
Length of addresses for COPYs:
This is the length (in bytes) of the section storing the
addresses of the COPY instructions.
Data section for ADDs and RUNs:
This sequence of bytes encodes the unmatched data for the ADD
and RUN instructions.
Instructions and sizes section:
This sequence of bytes encodes the instructions and their
sizes.
Addresses section for COPYs:
This sequence of bytes encodes the addresses of the COPY
instructions.
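Taken together, the items above suggest how a decoder might split one
delta encoding into its three sections. The sketch below is
illustrative only: the names read_integer, DeltaEncoding, and
parse_delta_encoding are hypothetical, integers are assumed to use the
base-128 encoding of Section 2, and the three section lengths are
assumed to describe the sections as stored in the file. Secondary
decompression is not performed; the flags are only reported.

```python
from dataclasses import dataclass
from typing import Tuple

VCD_DATACOMP = 0x01  # bit 0: data section is compressed
VCD_INSTCOMP = 0x02  # bit 1: instructions and sizes section is compressed
VCD_ADDRCOMP = 0x04  # bit 2: addresses section is compressed


def read_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read one Section 2 style base-128 integer; return (value, next pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


@dataclass
class DeltaEncoding:
    target_length: int
    data_compressed: bool
    inst_compressed: bool
    addr_compressed: bool
    data: bytes          # unmatched data for ADDs and RUNs
    instructions: bytes  # instructions and sizes
    addresses: bytes     # addresses for COPYs
    end: int             # offset just past this delta encoding


def parse_delta_encoding(buf: bytes, pos: int) -> DeltaEncoding:
    """Split the delta encoding that starts at pos into its sections."""
    delta_length, pos = read_integer(buf, pos)
    end = pos + delta_length  # the length counts the remaining bytes
    target_length, pos = read_integer(buf, pos)
    indicator = buf[pos]
    pos += 1
    data_len, pos = read_integer(buf, pos)
    inst_len, pos = read_integer(buf, pos)
    addr_len, pos = read_integer(buf, pos)
    sections = []
    for length in (data_len, inst_len, addr_len):
        sections.append(buf[pos:pos + length])
        pos += length
    if pos != end or end > len(buf):
        raise ValueError("section lengths disagree with delta length")
    return DeltaEncoding(
        target_length,
        bool(indicator & VCD_DATACOMP),
        bool(indicator & VCD_INSTCOMP),
        bool(indicator & VCD_ADDRCOMP),
        sections[0], sections[1], sections[2], end,
    )
```

Any section whose flag is set would need to be passed through the
secondary compressor named in the header before the delta instructions
are decoded.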
5. Delta Instruction Encoding
The delta instructions described in Section 3 represent the results
of string matching. For many data differencing applications in which
the changes between source and target data are small, any
straightforward representation of these instructions would be
adequate. However, for applications including differencing of binary
files or data compression, it is important to encode these
instructions well to achieve good compression rates. The keys to
this achievement are efficient encodings of the addresses of COPY
instructions and of the sizes of all delta instructions.
5.1 Address Encoding Modes of COPY Instructions
Addresses of COPY instructions are locations of matches and often
occur close by or even exactly equal to one another. This is because
data in local regions are often replicated with minor changes. In
turn, this means that coding a newly matched address against some
recently matched addresses can be beneficial. To take advantage of
this phenomenon and encode addresses of COPY instructions more
efficiently, the Vcdiff data format supports the use of two different
types of address caches. Both the encoder and decoder maintain these