📄 the pe file format.txt

📁 一些比较有价值的关于PE格式的资料,希望能对大家理解PE有帮助
💻 TXT
📖 第 1 页 / 共 5 页
字号:
        ...
        call _symbol                 ; without declspec(dllimport)
        ...
        call ptr cs:0xdeadbeef       ; with declspec(dllimport)
        ...

    the stub provided by the import library (not used with
                __declspec(dllimport)) is located in the code section
                and is the above mentioned jump-instruction:
        _symbol: jmp dword ptr [0xdeadbeef]

So however you do it, there is a call or a jump to a location the
address of which is at 0xdeadbeef (in our example). This 0xdeadbeef is
right in the middle of the import directory; it is in the entry for the
function "symbol" in the DLL "foo.dll".

Now we'll look how an import directory is made up.

The import directory should reside in a section that's "initialized
data" and "readable".
The import directory is an array of IMAGE_IMPORT_DESCRIPTORs, one for
each used DLL. The list is terminated by a IMAGE_IMPORT_DESCRIPTOR
that's entirely filled with 0-bytes.
An IMAGE_IMPORT_DESCRIPTOR is a struct with these members:

    OriginalFirstThunk
        An RVA (32 bit) to a 0-terminated array of RVAs to
        IMAGE_THUNK_DATAs, each describing one imported function. The
        array will never change.

    TimeDateStamp
        A 32-bit-timestamp that has several purposes. Let's pretent that
        the timestamp is 0, and handle the advanced cases later.

    ForwarderChain
        The 32-bit-index of the first forwarder in the list of imported
        functions. Forwarders are also advanced stuff.
        
    Name
        A 32-bit-RVA to the Name (a 0-terminated ASCII string) of the
        DLL.
        
    FirstThunk
        An RVA (32 bit) to a 0-terminated array of RVAs to
        IMAGE_THUNK_DATAs, each describing one imported function. The
        array will change.

Each IMAGE_IMPORT_DESCRIPTOR in the array gives you the name of the
exporting DLL and, apart from the forwarder and timestamp, it gives you
2 RVAs to arrays of IMAGE_THUNK_DATAs, using 32 bits. (The last member
of each array is entirely filled with 0-bytes.)
Each IMAGE_THUNK_DATA is, for now, an RVA to a IMAGE_IMPORT_BY_NAME
which describes the imported function.
The interesting point is now, the arrays run parallel, i.e.: they point
to the same IMAGE_IMPORT_BY_NAMEs.

No need to be desparate, I will draw another picture. This is the
essential contents of one IMAGE_IMPORT_DESCRIPTOR:

     OriginalFirstThunk      FirstThunk
            |                    |
            |                    |
            |                    |
            V                    V

            0-->    func1     <--0
            1-->    func2     <--1
            2-->    func3     <--2
            3-->    foo       <--3
            4-->    mumpitz   <--4
            5-->    knuff     <--5
            6-->0            0<--6      /* the last RVA is 0! */

where the names in the center are the yet to discuss
IMAGE_IMPORT_BY_NAMEs. Each of them is a 16-bit-number (the ordinal of
the imported symbol) followed by an unspecified amount of bytes, being
the 0-terminated ASCII name of the imported symbol.
Mind you, the ordinal may be wrong. It is only a hint where to start
looking for the symbol name in the export table of the exporting DLL;
you may not compare the ordinals and take the matching entry, but you
must find the matching symbol name.

To summarize, if you want to look up information about the imported
function "foo" from DLL "knurr", you first find the entry
IMAGE_DIRECTORY_ENTRY_IMPORT in the data directories, get an RVA, find
that address in the raw section data and now have an array of
IMAGE_IMPORT_DESCRIPTORs. Get the member of this array that relates to
the DLL "knurr" by inspecting the strings pointed to by the 'Name's.
When you have found the right IMAGE_IMPORT_DESCRIPTOR, follow its
'OriginalFirstThunk' and get hold of the pointed-to array of
IMAGE_THUNK_DATAs; inspect the RVAs and find the function "foo".

This was the basic structure, for simple cases. Now we'll learn about
tweaks in the import directories.

First, the bit IMAGE_ORDINAL_FLAG (that is: the MSB) of the
IMAGE_THUNK_DATA in the arrays can be set, in which case there is no
symbol-name-information in the list and the symbol is imported purely by
ordinal. You get the ordinal by inspecting the lower word of the
IMAGE_THUNK_DATA.
Obviously, in this case the warning about unreliable ordinals doesn't
apply because all you have *is* the ordinal.

Ok, now, why do we have *two* lists of pointers to the
IMAGE_IMPORT_BY_NAMEs? Because at runtime the application doesn't need
the imported functions' names but the addresses. The loader will look up
each imported symbol in the export-directory of the DLL in question and
replace the IMAGE_THUNK_DATA in the 'FirstThunk'-list with the linear
address of the DLL's entry point. Remember the stubs "jmp dword ptr
[0xdeadbeef]"; this address 0xdeadbeef is exactly the address of the
IMAGE_THUNK_DATA in the 'FirstThunk'-list.
The 'OriginalFirstThunk' remains untouched, so you can always look up
the original list of imported names via the 'OriginalFirstThunk'-list.

Now we get the so-called "bound imports".
Think about the loader's task: when a binary that it wants to execute
needs a function from a DLL, the loader loads the DLL, finds its export
directory, looks up the function's RVA and calculates the function's
entry point. Then it patches the so-found address into the 'FirstThunk'-
list.
Given that the programmer was clever and supplied unique preferred load
addresses for the DLLs that don't clash, we can assume that the
functions' entry points will always be the same. They can be computed
and patched into the 'FirstThunk'-list at link-time, and that's what
happens with the "bound imports". (The utility "bind" does this.)

Of course, one must be cautious: The user's DLL may have a different
version, or it may be necessary to relocate the DLL, thus invalidating
the pre-patched 'FirstThunk'-list; in this case, the loader will still
be able to walk the 'OriginalFirstThunk'-list, find the imported symbols
and re-patch the 'FirstThunk'-list. The loader knows that this is
necessary if a) the versions of the exporting DLL don't match or b) the
exporting DLL had to be relocated.

To decide whether there were relocations is no problem for the loader,
but how to find out if the versions differ? This is where the
'TimeDateStamp' of the IMAGE_IMPORT_DESCRIPTOR comes in. If it is 0, the
import-list has not been bound, and the loader must fix the entry points
always. Otherwise, the imports are bound, and 'TimeDateStamp' must match
the 'TimeDateStamp' of the exporting DLL's 'FileHeader'; if it doesn't
match, the loader assumes that the binary is bound to a "wrong" DLL and
will re-patch the import list.

This was the so-called "old-style" binding.

There is an additional quirk about "forwarders" in the import-list. A DLL
can export a symbol that's not defined in the DLL but imported from
another DLL; such a symbol is said to be forwarded. Now, obviously you
can't tell if the symbol's entry point is valid by looking into the
timestamp of a DLL that doesn't actually contain the entry point. So the
forwarded symbols' entry points must always be fixed up, for safety
reasons. In the import list of a binary, imports of forwarded symbols
need to be found so the loader can patch them.

This is done via the 'ForwarderChain'. It is an index in the thunk-
lists; the import at the indexed position is a forwarded export, and the
contents of the 'FirstThunk'-list at this position is the index of the
*next* forwarded import, and so on, until the index is "-1" which
indicates there are no more forwards.

At this point, we should sum up what we have had so far :-)

Ok, I will assume you have found the IMAGE_DIRECTORY_ENTRY_IMPORT and you have
followed it to find the import-directory, which will be in one of the
sections. Now you're at the beginning of an array of
IMAGE_IMPORT_DESCRIPTORs the last of which will be entirely 0-bytes-
filled.
To decipher one of the IMAGE_IMPORT_DESCRIPTORs, you first look into the
'Name'-field, follow the RVA and thusly find the name of the exporting
DLL. Next you decide whether the imports are bound or not;
'TimeDateStamp' will be non-zero if the imports are bound. If they are
bound, now is a good time to check if the DLL version matches yours by
comparing the 'TimeDateStamp's.
Now you follow the 'OriginalFirstThunk'-RVA to go to the
IMAGE_THUNK_DATA-array; walk down this array (it is be 0-terminated),
and each member will be the RVA of a IMAGE_IMPORT_BY_NAME (unless the
hi-bit is set in which case you don't have a name but are left with a
mere ordinal). Follow the RVA, and skip 2 bytes (the ordinal), and now
you have got a pointer to a 0-terminated ASCII-string that's the name of
the imported function.
To find the supplied entry point addresses in case it is a bound import,
follow the 'FirstThunk' and walk it parallel to the
'OriginalFirstThunk'-array; the array-members are the linear addresses
of the entry points.

There is one thing I didn't mention until now: Apparently there are
linkers that exhibit a bug when they build the import directory (I've
found this bug being in use by a Borland C linker). These linkers set
the 'OriginalFirstThunk' in the IMAGE_IMPORT_DESCRIPTOR to 0 and create
only the 'FirstThunk'-array. Obviously, such import directories cannot
be bound (else the necessary information to re-fix the imports were
lost). In this case, you will have to follow the 'FirstThunk'-array to get
the imported symbol names, and you will never have pre-patched entry point
addresses.

The last tweak about the import directories is the so-called "new style"
binding (it is described in [3]), which can also be done with the
"bind"-utility. When this is used, the 'TimeDateStamp' is set to
all-bits-1 and there is no forwarderchain; all imported symbols get their
address patched, whether they are forwarded or not. Still, you need to
know the DLLs' version, and you need to distinguish forwarded symbols
from ordinary ones. For this purpose, the
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT directory is created. This will, as
far as I could find out, *not* be in a section but in the header, after
the section headers and before the first section. (Hey, I didn't invent
this, I'm only describing it!)
This directory tells you, for each used DLL, from which other DLLs there
are forwarded exports.
The structure is an IMAGE_BOUND_IMPORT_DESCRIPTOR, comprising (in this
order):
A 32-bit number, giving you the 'TimeDateStamp' of the DLL;
a 16-bit-number 'OffsetModuleName', being the offset from the beginning
of the directory to the 0-terminated name of the forwarded-to DLL;
a 16-bit-number 'NumberOfModuleForwarderRefs' giving you the number of
DLLs that this DLL uses for its forwarders.

Immediatly following this struct you find 'NumberOfModuleForwarderRefs'
structs that tell you the names and versions of the DLLs that this DLL
forwards from. These structs are 'IMAGE_BOUND_FORWARDER_REF's:
A 32-bit-number 'TimeDateStamp';
a 16-bit-number 'OffsetModuleName', being the offset from the beginning
of the directory to the 0-terminated name of the forwarded-from DLL;
16 unused bits.

Following the 'IMAGE_BOUND_FORWARDER_REF's is the next
'IMAGE_BOUND_IMPORT_DESCRIPTOR' and so on; the list is terminated by an
all-0-bits-IMAGE_BOUND_IMPORT_DESCRIPTOR.


Sorry for the inconvenience, but that's what it looks like :-)


Now, if you have a new-bound import directory, you load all the DLLs,
use the directory pointer IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT to find the
IMAGE_BOUND_IMPORT_DESCRIPTOR, scan through it and check if the
'TimeDateStamp's of the loaded DLLs match the ones given in this
directory. If not, fix them in the 'FirstThunk'-array of the import
directory.

It is easy if you know how it works :-)



resources
---------
The resources, like dialog boxes, menus, icons and so on, are stored in
the data directory pointed to by IMAGE_DIRECTORY_ENTRY_RESOURCE. It is in
a section that has, at least, the bits 'IMAGE_SCN_CNT_INITIALIZED_DATA'
and 'IMAGE_SCN_MEM_READ' set.

A resource base is a 'IMAGE_RESOURCE_DIRECTORY's; it contains several
'IMAGE_RESOURCE_DIRECTORY_ENTRY's which in turn may point to a
'IMAGE_RESOURCE_DIRECTORY'. This way, you get a tree of
'IMAGE_RESOURCE_DIRECTORY's with 'IMAGE_RESOURCE_DIRECTORY_ENTRY's as
leafs; these leafs point to the actual resource data.

In real life, the situation is somewhat relaxed. Normally you won't find
convoluted trees you can't possibly sort out.
The hierarchy is, normally, like this: one directory is the root. It
points to directories, one for each resource type. These directories
point to subdirectories, each of which will have a name or an ID and
point to a directory of the languages provided for this resource; for
each language you will find one resource entry, which will finally point
to the data.

The tree, without the pointer to the data, may look like this:

                           (root)
                              |
             +----------------+------------------+
             |                |                  |
            menu            dialog             icon
             |                |                  |
       +-----+-----+        +-+----+           +-+----+----+
       |           |        |      |           |      |    |
    "main"      "popup"   0x10   "maindlg"    0x100 0x110 0x120
       |            |       |      |           |      |    | 
   +---+-+          |       |      |           |      |    |
   |     |     default   english   default    def.   def.  def.
german english


A IMAGE_RESOURCE_DIRECTORY comprises:
32 bits of unused flags called 'Characteristics';
32 bits 'TimeDateStamp' (again in the common time_t representation),
giving you the time the resource was created (if the entry is set);
16 bits 'MajorVersion' and 16 bits 'MinorVersion', thusly allowing you
to maintain several versions of the resource;
16 bits 'NumberOfNamedEntries' and another 16 bits 'NumberOfIdEntries'.

Immediatly following such a structure are
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -