</P>
<P>Following the header structure header is an area called the
index. The index contains one or more index entries.
Each index entry contains information about, and a pointer
to, a specific data item.
</P>
<P>After the index comes the store. It is in the store that the data items are kept. The data in
the store is packed together as closely as possible. The order in which the data is stored is
immaterial—a far cry from the C structure used in the lead.
</P>
<B>A.2.1.1.1.1.3. The Header Structure in Depth</B>
<P>Let's take a more in-depth look at the actual format of a header structure, starting with
the header structure header.
</P>
<B>A.2.1.1.1.1.4. The Header Structure Header</B>
<P>The header structure header always starts with a 3-byte magic number:
8e ad e8. Following this is a 1-byte version number. Next are 4 bytes that are reserved for future expansion.
After the reserved bytes is a 4-byte number that indicates how many index entries exist in this
header structure, followed by another 4-byte number indicating how many bytes of data are part
of the header structure.
</P>
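<P>As a minimal sketch (not RPM's own code), the 16-byte layout just described can be decoded with Python's struct module. The helper name parse_header_structure_header is hypothetical, and reading the two 4-byte numbers as unsigned is an assumption. The sample bytes are the ones shown in the signature dump of rpm-2.2.1-1.i386.rpm later in this appendix.
</P>

```python
import struct

HEADER_MAGIC = bytes.fromhex("8eade8")  # 3-byte magic number from the text


def parse_header_structure_header(raw):
    """Decode a 16-byte header structure header (hypothetical helper)."""
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes")
    # ">" selects big-endian (network) byte order with no padding:
    # 3-byte magic, 1-byte version, 4 reserved bytes, two 4-byte numbers.
    magic, version, _reserved, entry_count, store_size = struct.unpack(
        ">3sB4sII", raw[:16]
    )
    if magic != HEADER_MAGIC:
        raise ValueError("bad magic number")
    return version, entry_count, store_size


# Bytes copied from the signature dump at file offset 0x60.
sample = bytes.fromhex("8eade801 00000000 00000003 000000ac")
print(parse_header_structure_header(sample))  # (1, 3, 172)
```

<P>The tuple holds the version, the number of index entries, and the size of the store in bytes.
</P>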
<B>A.2.1.1.1.1.5. The Index Entry</B>
<P>The header structure's index is made up of zero or more index entries. Each entry is 16
bytes long. The first 4 bytes contain a tag—a numeric value that identifies what type of data is
pointed to by the entry. The tag values change according to the header structure's position in the
RPM file. A list of the actual tag values, and what they represent, is included in section A.2.1.3.2.
</P>
<A NAME="PAGENUM-343"><P>Page 343</P></A>
<P>Following the tag is a 4-byte type, which is a numeric value that describes the format of
the data pointed to by the entry. The types and their values do not change from header
structure to header structure. Here is the current list:
</P>
<UL>
<LI> NULL = 0
<LI> CHAR = 1
<LI> INT8 = 2
<LI> INT16 = 3
<LI> INT32 = 4
<LI> INT64 = 5
<LI> STRING = 6
<LI> BIN = 7
<LI> STRING_ARRAY = 8
</UL>
<P>A few of the data types might need some clarification. The
STRING data type is simply a null-terminated string, while the
STRING_ARRAY is a collection of strings. Finally, the
BIN data type is a collection of binary data. It is normally used to hold data that is longer than an
INT but is not a printable STRING.
</P>
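<P>For illustration only, the type list above maps naturally onto an enumeration. The class name HeaderType is hypothetical; the member names and values are taken directly from the list.
</P>

```python
from enum import IntEnum


class HeaderType(IntEnum):
    """Data type codes used in index entries (values from the list above)."""
    NULL = 0
    CHAR = 1
    INT8 = 2
    INT16 = 3
    INT32 = 4
    INT64 = 5
    STRING = 6
    BIN = 7
    STRING_ARRAY = 8


# Turn a raw type value read from an index entry into a readable name.
print(HeaderType(7).name)  # BIN
```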
<P>Next is a 4-byte offset that contains the position of the data, relative to the beginning of
the store. We'll talk about the store in just a moment.
</P>
<P>Finally, there is a 4-byte count that contains the number of data items pointed to by the
index entry. There are a few wrinkles to the meaning of the count, and they center around the
STRING and STRING_ARRAY data types. STRING data always has a count of 1, while
STRING_ARRAY data has a count equal to the number of strings contained in the store.
</P>
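<P>The four fields of an index entry can be unpacked in one step. This is a sketch under assumptions: the names IndexEntry and parse_index_entry are hypothetical, and each field is read as an unsigned big-endian integer, which the text does not state explicitly. The sample bytes are the first index entry of the signature examined later in this appendix.
</P>

```python
import struct
from collections import namedtuple

# Field order follows the text: tag, type, offset, count.
IndexEntry = namedtuple("IndexEntry", "tag type offset count")


def parse_index_entry(raw):
    """Decode one 16-byte index entry (hypothetical helper)."""
    if len(raw) != 16:
        raise ValueError("an index entry is exactly 16 bytes")
    # Four 4-byte integers in network byte order.
    return IndexEntry(*struct.unpack(">4I", raw))


entry = parse_index_entry(bytes.fromhex("000003e8 00000004 00000000 00000001"))
print(entry)  # IndexEntry(tag=1000, type=4, offset=0, count=1)
```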
<B>A.2.1.1.1.1.6. The Store</B>
<P>The store is where the data contained in the header structure is stored. Depending on the
data type being stored, there are some details that should be kept in mind:
</P>
<UL>
<LI> For STRING data, each string is terminated with a null byte.
<LI> For INT data, each integer is stored at the natural boundary for its type. A 64-bit
INT is stored on an 8-byte boundary, a 16-bit INT is stored on a 2-byte boundary, and so on.
<LI> All data is in network byte order.
</UL>
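<P>The alignment rule for INT data amounts to rounding an offset up to the next multiple of the integer's size. A small sketch, with a hypothetical helper name; whether the boundary is measured from the start of the store or of the file is not stated above, so treat the reference point as an assumption.
</P>

```python
def align_up(offset, boundary):
    """Round offset up to the next multiple of boundary (hypothetical helper)."""
    return (offset + boundary - 1) // boundary * boundary


# A 32-bit INT that would otherwise land at offset 5 moves to offset 8.
print(align_up(5, 4))  # 8
# A 16-bit INT at offset 3 moves to offset 4.
print(align_up(3, 2))  # 4
# An offset already on an 8-byte boundary is left alone.
print(align_up(16, 8))  # 16
```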
<P>With all these details out of the way, let's take a look at the signature.
</P>
<H4>
A.2.1.2. The Signature
</H4>
<P>The signature section follows the lead in the RPM package file. It contains information
that can be used to verify the integrity and, optionally, the authenticity of the majority of the
package file. The signature is implemented as a header structure.
</P>
<A NAME="PAGENUM-344"><P>Page 344</P></A>
<P>You probably noticed our use of the word
majority. The information in the signature header structure is based on the contents of the package file's header and archive only. The data in
the lead and the signature header structure is not included when the signature information is
created, nor is it part of any subsequent checks based on that information.
</P>
<P>While that omission might seem to be a weakness in RPM's design, it really isn't. In the case
of the lead, since it is used only for easy identification of package files, any changes made to
that part of the file would, at worst, leave the file in such a state that RPM wouldn't recognize it
as a valid package file. Likewise, any changes to the signature header structure would make
it impossible to verify the file's integrity, since the signature information would have been
changed from its original value.
</P>
<B>A.2.1.2.1. Analyzing the Signature Area</B>
<P>Using our newfound knowledge of header structures, let's take a look at the signatures in rpm-2.2.1-1.i386.rpm:
</P>
<!-- CODE SNIP //-->
<PRE>
00000060: 8ead e801 0000 0000 0000 0003 0000 00ac ................
</PRE>
<!-- END CODE SNIP //-->
<P>The first 3 bytes (8ead e8) contain the magic number for the start of the header structure.
The next byte (01) is the header structure's version.
</P>
<P>As we discussed earlier, the next 4 bytes (0000
0000) are reserved. The 4 bytes after that (0000
0003) represent the number of index entries in the signature section, namely, three.
Following that are 4 bytes (0000 00ac) that indicate how many bytes of data are stored in the
signature. The hex value 00ac, when converted to decimal, means the store is 172 bytes long.
</P>
<P>Following the first 16 bytes is the index. Each of the three index entries in this header
structure consists of four 32-bit integers, in the following order:
</P>
<UL>
<LI> Tag
<LI> Type
<LI> Offset
<LI> Count
</UL>
<P>Let's take a look at the first index entry:
</P>
<!-- CODE SNIP //-->
<PRE>
00000070: 0000 03e8 0000 0004 0000 0000 0000 0001 ................
</PRE>
<!-- END CODE SNIP //-->
<P>The tag consists of the first 4 bytes (0000
03e8), which is 1,000 when translated from hex.
Looking in the RPM source directory, at the file
lib/signature.h, we find the following tag definitions:
</P>
<!-- CODE SNIP //-->
<PRE>
#define SIGTAG_SIZE 1000
#define SIGTAG_MD5 1001
#define SIGTAG_PGP 1002
</PRE>
<!-- END CODE SNIP //-->
<P>So the tag we are studying is for a size signature. Let's continue.
</P>
<A NAME="PAGENUM-345"><P>Page 345</P></A>
<P>The next 4 bytes (0000 0004) contain the data type. As we saw earlier, data type 4 means
that the data stored for this index entry is a 32-bit integer. Skipping the next 4 bytes for a
moment, the last 4 bytes (0000 0001) are the number of 32-bit integers pointed to by this index entry.
</P>
<P>Now let's go back to the 4 bytes prior to the count
(0000 0000). This number is the offset, in bytes, at which the size signature is located. It has a value of zero, but the question is, 0
bytes from what? The answer, although it doesn't do us much good, is that the offset is
calculated from the start of the store. So first we must find where the store begins, and we can do that
by performing a simple calculation.
</P>
<P>First, go back to the start of the signature section. We've made a copy here so you won't
need to flip from page to page:
</P>
<!-- CODE SNIP //-->
<PRE>
00000060: 8ead e801 0000 0000 0000 0003 0000 00ac ................
</PRE>
<!-- END CODE SNIP //-->
<P>After the magic, the version, and the 4 reserved bytes comes the number of index
entries (0000 0003). Since we know that each index entry is 16 bytes long (4 for the tag, 4 for the
type, 4 for the offset, and 4 for the count), we can multiply the number of entries (3) by the
number of bytes in each entry (16) and obtain the total size of the index, which is 48 in decimal, or
30 in hex. Since the first index entry starts at hex offset 70, we can simply add hex 30 to hex
70, and get, in hex, offset a0. So let's skip down to offset a0 and see what's there:
</P>
<!-- CODE SNIP //-->
<PRE>
000000a0: 0004 4c4f b025 b097 1597 0132 df35 d169 ..LO.%.....2.5.i
</PRE>
<!-- END CODE SNIP //-->
<P>If we've done our math correctly, the first 4 bytes
(0004 4c4f) should represent the size of this file. Converting to decimal, this is 281,679. Let's take a look at the size of the actual file:
</P>
<!-- CODE SNIP //-->
<PRE>
# ls -al rpm-2.2.1-1.i386.rpm
-rw-rw-r-- 1 ed ed 282015 Jul 21 16:05 rpm-2.2.1-1.i386.rpm
#
</PRE>
<!-- END CODE SNIP //-->
<P>Hmmm, something's not right. Or is it? It looks like we're short by 336 bytes, or in hex,
150. Interesting how that's a nice round hex number, isn't it? For now, let's continue through
the remainder of the index entries, and see if hex 150 pops up elsewhere.
</P>
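<P>The arithmetic so far can be checked in a few lines. This is only a worked restatement of the numbers above; the variable names are made up for the example.
</P>

```python
INDEX_ENTRY_SIZE = 16          # tag + type + offset + count, 4 bytes each

first_entry_offset = 0x70      # where the first index entry starts
entry_count = 3                # from the header structure header

# The store begins right after the last index entry.
store_start = first_entry_offset + entry_count * INDEX_ENTRY_SIZE

# The size signature: the 4 bytes at the start of the store, big-endian.
size_value = int.from_bytes(bytes.fromhex("00044c4f"), "big")

file_size = 282015             # as reported by ls
shortfall = file_size - size_value

print(hex(store_start))        # 0xa0
print(size_value)              # 281679
print(shortfall, hex(shortfall))  # 336 0x150
```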
<P>Here's the next index entry. It has a tag of decimal 1001, which is an MD5 checksum. It
is type 7, which is the BIN data type. It is 16 bytes long, and its data starts 4 bytes after the
beginning of the store:
</P>
<!-- CODE SNIP //-->
<PRE>
00000080: 0000 03e9 0000 0007 0000 0004 0000 0010 ................
</PRE>
<!-- END CODE SNIP //-->
<P>And here's the data. It starts with b025 (Remember that offset of four!) and ends on the
second line with 5375. This is a 128-bit MD5 checksum of the package file's header and archive
sections:
</P>
<!-- CODE SNIP //-->
<PRE>
000000a0: 0004 4c4f b025 b097 1597 0132 df35 d169 ..LO.%.....2.5.i
000000b0: 329c 5375 8900 9503 0500 31ed 6390 a520 2.Su......1.c..
</PRE>
<!-- END CODE SNIP //-->
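<P>Pulling a BIN item out of the store is just a slice: start at the entry's offset and take as many bytes as its count. A sketch with a hypothetical helper, using the two dump lines above as input:
</P>

```python
def read_bin(store, offset, count):
    """Return count raw bytes starting offset bytes into the store."""
    if offset + count > len(store):
        raise ValueError("entry points past the end of the store")
    return store[offset:offset + count]


# First 32 bytes of the signature store, copied from the dump at a0 and b0.
store_prefix = bytes.fromhex(
    "00044c4f b025b097 15970132 df35d169"
    "329c5375 89009503 050031ed 6390a520"
)

# MD5 entry: offset 4, count 16 (hex 10).
md5 = read_bin(store_prefix, offset=4, count=16)
print(md5.hex())  # b025b09715970132df35d169329c5375
```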
<P>Okay, let's jump back to the last index entry:
</P>
<!-- CODE SNIP //-->
<PRE>
00000090: 0000 03ea 0000 0007 0000 0014 0000 0098 ................
</PRE>
<!-- END CODE SNIP //-->
<A NAME="PAGENUM-346"><P>Page 346</P></A>
<P>It has a tag value of 03ea (1002 in decimal—a PGP signature block) and is also a
BIN data type. The data starts 20 decimal bytes from the start of the data area, which would put it at file
offset b4 (in hex). It's a biggie—152 bytes long! Here's the data, starting with
8900:
</P>
<!-- CODE //-->
<PRE>
000000b0: 329c 5375 8900 9503 0500 31ed 6390 a520 2.Su......1.c..
000000c0: e8f1 cba2 9bf9 0101 437b 0400 9c8e 0ad4 ........C{......
000000d0: 3790 364e dfb0 9a8a 22b5 b0b3 dc30 4c6f 7.6N...."....0Lo
000000e0: 91b8 c150 704e 2c64 d88a 8fca 18ab 5b6f ...PpN,d......[o
000000f0: f041 ebc8 d18a 01c9 3601 66f0 9ddd e956 .A......6.f....V
00000100: 3142 61b3 b1da 8494 6bef 9c19 4574 c49f 1Ba.....k...Et..
00000110: ee17 35e1 d105 fb68 0ce6 715a 60f1 c660 ..5....h..qZ`..`
00000120: 279f 0306 28ed 0ba0 0855 9e82 2b1c 2ede `...(....U..+...
00000130: e8e3 5090 6260 0b3c ba04 69a9 2573 1bbb ..P.b`.<..i.%s..
00000140: 5b65 4de1 b1d2 c07f 8afa 4a9b 0000 0000 [eM.......J.....
</PRE>
<!-- END CODE //-->
<P>It ends with the bytes 4a9b. This is a 1,216-bit PGP signature block. It is also the end of
the signature section. There are 4 null bytes following the last data item in order to round the
size out so that it ends on an 8-byte boundary. This means that the next section
starts at offset 150, in hex. Say, wasn't the size in the size signature off by 150 hex? Yes, the size in
the signature is the size of the file—minus the size of the lead and the signature sections.
</P>
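<P>The padding and the size check can be verified together. Again, this is only a worked restatement of the figures in this section, with invented variable names:
</P>

```python
store_start = 0xa0             # start of the signature's store
store_size = 0xac              # from the header structure header (172 bytes)

store_end = store_start + store_size      # first byte after the PGP block
padding = (-store_end) % 8                # null bytes needed to reach an 8-byte boundary
next_section = store_end + padding        # where the header section begins

file_size = 282015
size_signature = 0x00044c4f

print(hex(store_end), padding, hex(next_section))  # 0x14c 4 0x150
# The size signature covers everything after the lead and the signature.
print(file_size - next_section == size_signature)  # True
```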
<B>
A.2.1.3. The Header
</B>
<P>The header section contains all available information about the package. Entries such as
the package's name, version, and file list are contained in the header. Like the signature
section, the header is in header structure format. Unlike the signature, which has only three
possible tag types, the header has more than 60 different tags. (The list of currently defined tags
appears in section A.2.1.3.2.) Be aware that the list of tags changes frequently; the definitive list
appears in the RPM sources in lib/rpmlib.h.
</P>
<B>A.2.1.3.1. Analyzing the Header</B>
<P>The easiest way to find the start of the header is to look for the second header structure
by scanning for its magic number (8ead e8). The 16 bytes, starting with the magic, are the
header structure's header. They follow the same format as the header in the signature's header
structure:
</P>
<!-- CODE SNIP //-->
<PRE>
00000150: 8ead e801 0000 0000 0000 0021 0000 09d3 ...........!....
</PRE>
<!-- END CODE SNIP //-->
<P>As before, the byte following the magic identifies this header structure as being in version
1 format. Following the 4 reserved bytes, we find the count of entries stored in the header
(0000 0021). Converting to decimal, we find that there are 33 entries in the header. The next 4
bytes (0000 09d3), converted to decimal, tell us that there are 2,515 bytes of data in the store.
</P>
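<P>Scanning for the magic number might look like the sketch below. The function name is hypothetical, and the buffer is a stand-in: only the two 16-byte header structure headers are real bytes from the dumps, and everything else is zero filler. Note that the 3-byte magic could also turn up by chance inside real store data, so computing the offset from the signature's size, as in the previous section, is the more dependable approach.
</P>

```python
HEADER_MAGIC = bytes.fromhex("8eade8")


def find_magic_offsets(data):
    """Return every offset at which the 3-byte magic number occurs."""
    offsets = []
    pos = data.find(HEADER_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(HEADER_MAGIC, pos + 1)
    return offsets


signature_header = bytes.fromhex("8eade801 00000000 00000003 000000ac")
package_header = bytes.fromhex("8eade801 00000000 00000021 000009d3")

# Placeholder layout: zeros stand in for the lead (0x00-0x5f) and for the
# rest of the signature section (0x70-0x14f); they are NOT real file data.
fake_file = bytes(0x60) + signature_header + bytes(0x150 - 0x70) + package_header

print([hex(off) for off in find_magic_offsets(fake_file)])  # ['0x60', '0x150']
```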
<P>Since the header is a header structure just like the signature, we know that the next 16 bytes
are the first index entry:
</P>
<!-- CODE SNIP //-->
<PRE>
00000160: 0000 03e8 0000 0006 0000 0000 0000 0001 ................
</PRE>
<!-- END CODE SNIP //-->
<A NAME="PAGENUM-347"><P>Page 347</P></A>
<P>The first 4 bytes (0000 03e8) are the tag, which is the tag for the package name. The next
4 bytes indicate that the data is type 6, or a null-terminated string. There's an offset of 0 in
the next 4 bytes, meaning that the data for this tag is first in the store. Finally, the last 4 bytes
(0000 0001) show that the data count is 1, which is the only legal value for data of type
STRING.
</P>
<P>To find the data, we need to take the offset of the first index entry in the
header (160) and add the count of index entries (21) multiplied by the size of an index entry
(10). Doing the math (all the values shown are in hex, remember!), we arrive at the offset of
the store, hex 370. Since the offset for this particular index entry is 0, the data should start at
offset 370:
</P>