First, we have 32 bits of 'Characteristics' that are unused and normally
0. Then there is a 32-bit-'TimeDateStamp', which presumably should give
the time the table was created in the time_t-format; alas, it is not
always valid (some linkers set it to 0). Then we have 2 16-bit-words of
version-info ('MajorVersion' and 'MinorVersion'), and these, too, are
often enough set to 0.

The next thing is 32 bits of 'Name'; this is an RVA to the DLL name as a
0-terminated ASCII string. (The name is necessary in case the DLL file is
renamed - see "binding" at the import directory.)
Then, we have got a 32-bit-'Base'. We'll come to that in a moment.

The next 32-bit-value is the total number of exported items
('NumberOfFunctions'). In addition to their ordinal number, items may be
exported by one or several names, and the next 32-bit-number is the
total number of exported names ('NumberOfNames').
In most cases, each exported item will have exactly one corresponding
name and it will be used by that name, but an item may have several
associated names (it is then accessible by each of them), or it may have
no name, in which case it is only accessible by its ordinal number. The
use of unnamed exports (purely by ordinal) is discouraged, because all
versions of the exporting DLL would have to use the same ordinal
numbering, which is a maintenance problem.

The next 32-bit-value 'AddressOfFunctions' is an RVA to the list of
exported items. It points to an array of 'NumberOfFunctions'
32-bit-values, each being an RVA to the exported function or variable.

There are 2 quirks about this list: First, such an exported RVA may be 0,
in which case it is unused. Second, if the RVA points into the section
containing the export directory, this is a forwarded export. A forwarded
export is a pointer to an export in another binary; if it is used, the
pointed-to export in the other binary is used instead. The RVA in this
case points, as mentioned, into the export directory's section, to a
zero-terminated string comprising the name of the pointed-to DLL and
the export name separated by a dot, like "otherdll.exportname", or the
DLL's name and the export ordinal, like "otherdll.#19".
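
A forwarder string can be taken apart with a few lines of C. This is
only a sketch: the names 'Forwarder' and 'parse_forwarder' and the
buffer sizes are made up for the example, and splitting at the last dot
is my assumption about how to treat names that contain more than one
dot.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Result of splitting "otherdll.exportname" or "otherdll.#19".
   Type name and buffer sizes are arbitrary choices for this sketch. */
typedef struct {
    char dll[64];          /* DLL name as written in the string */
    char name[128];        /* export name; empty if forwarded by ordinal */
    unsigned long ordinal; /* meaningful only if by_ordinal is non-zero */
    int by_ordinal;
} Forwarder;

/* Returns 0 on success, -1 if the string is malformed or too long. */
static int parse_forwarder(const char *s, Forwarder *out)
{
    const char *dot = strrchr(s, '.');   /* assumption: split at last dot */
    size_t dll_len;

    if (dot == NULL || dot == s || dot[1] == '\0')
        return -1;
    dll_len = (size_t)(dot - s);
    if (dll_len >= sizeof out->dll)
        return -1;
    memcpy(out->dll, s, dll_len);
    out->dll[dll_len] = '\0';

    if (dot[1] == '#') {
        /* "#19" form: decimal ordinal after the hash sign */
        char *end;
        if (!isdigit((unsigned char)dot[2]))
            return -1;
        out->ordinal = strtoul(dot + 2, &end, 10);
        if (*end != '\0')
            return -1;
        out->by_ordinal = 1;
        out->name[0] = '\0';
    } else {
        if (strlen(dot + 1) >= sizeof out->name)
            return -1;
        strcpy(out->name, dot + 1);
        out->by_ordinal = 0;
        out->ordinal = 0;
    }
    return 0;
}
```

The caller would then look up 'name' or 'ordinal' in the export
directory of the DLL named in 'dll'.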

Now is the time to explain the export ordinal. An export's ordinal is
the index into the AddressOfFunctions-Array (the 0-based position in
this array) plus the 'Base' mentioned above.
In most cases, the 'Base' is 1, which means the first export has an
ordinal of 1, the second has an ordinal of 2 and so on.

After the 'AddressOfFunctions'-RVA we find an RVA to the array of
32-bit-RVAs to symbol names 'AddressOfNames', and an RVA to the array of
16-bit-ordinals 'AddressOfNameOrdinals'. Both arrays have
'NumberOfNames' elements.
The symbol names may be missing entirely, in which case the
'AddressOfNames' is 0. Otherwise, the pointed-to arrays are running
parallel, which means their elements at each index belong together. The
'AddressOfNames'-array consists of RVAs to 0-terminated export names;
the names are held in a sorted list (i.e. the first array member is the
RVA to the alphabetically smallest name; this allows efficient searching
when looking up an exported symbol by name).
According to the PE specification, the 'AddressOfNameOrdinals'-array has
the ordinal corresponding to each name; however, I've found this array
to contain the actual index into the 'AddressOfFunctions'-array instead.
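
Put together, the members described so far can be written down as a C
struct. This is a sketch using fixed-width types; the type name
'ExportDirectory' and the helper 'ordinal_of_index' are stand-ins for
this example (the Windows SDK headers declare the real structure as
IMAGE_EXPORT_DIRECTORY).

```c
#include <stddef.h>
#include <stdint.h>

/* Layout sketch of the export directory header. Field names follow
   the text; the type name is made up for this example. */
typedef struct {
    uint32_t Characteristics;       /* unused, normally 0 */
    uint32_t TimeDateStamp;         /* time_t format, may be 0 */
    uint16_t MajorVersion;          /* often 0 */
    uint16_t MinorVersion;          /* often 0 */
    uint32_t Name;                  /* RVA to the 0-terminated DLL name */
    uint32_t Base;                  /* ordinal of the first array slot */
    uint32_t NumberOfFunctions;     /* elements in AddressOfFunctions */
    uint32_t NumberOfNames;         /* elements in both name arrays */
    uint32_t AddressOfFunctions;    /* RVA to an array of 32-bit RVAs */
    uint32_t AddressOfNames;        /* RVA to an array of 32-bit RVAs, or 0 */
    uint32_t AddressOfNameOrdinals; /* RVA to an array of 16-bit values */
} ExportDirectory;

/* Ordinal of the export stored at a 0-based array index. */
static uint32_t ordinal_of_index(const ExportDirectory *dir, uint32_t index)
{
    return dir->Base + index;
}
```

With these types the header occupies 40 bytes, and with a 'Base' of 1
the export at index 0 has ordinal 1.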

I'll draw a picture about the three tables:


    AddressOfFunctions
           |
           |
           |
           v
    exported RVA with ordinal 'Base'
    exported RVA with ordinal 'Base'+1
    ...
    exported RVA with ordinal 'Base'+'NumberOfFunctions'-1



    AddressOfNames                  AddressOfNameOrdinals
           |                              |
           |                              |
           |                              |
           v                              v
    RVA to first name           <-> Index of export for first name
    RVA to second name          <-> Index of export for second name
    ...                             ...
    RVA to name 'NumberOfNames' <-> Index of export for name 'NumberOfNames'


Some examples are in order.

To find an exported symbol by ordinal, subtract the 'Base' to get the
index, follow the 'AddressOfFunctions'-RVA to find the exports-array and
use the index to find the exported RVA in the array. If it does not
point into the export section, you are done. Otherwise, it points to a
string describing the exporting DLL and the name or ordinal therein, and
you have to look up the forwarded export there.
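
The lookup by ordinal might be sketched like this. I assume the image
is in memory the way the loader maps it, so that an RVA is simply an
offset from the start of 'image'; the function and type names are made
up for the example, the byte offsets 16, 20 and 28 follow from the
member order described above, and the bounds of the export section are
passed in by the caller.

```c
#include <stdint.h>

/* Read a little-endian 32-bit value at an RVA. Assumes 'image' is the
   mapped image, so an RVA is an offset from its start. */
static uint32_t rd32(const uint8_t *image, uint32_t rva)
{
    const uint8_t *p = image + rva;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Byte offsets of the members used here within the export directory. */
enum { EXP_BASE = 16, EXP_NFUNCS = 20, EXP_ADDR_FUNCS = 28 };

typedef enum { EXPORT_NOT_FOUND, EXPORT_ADDRESS, EXPORT_FORWARDED } ExportKind;

/* dir_rva: RVA of the export directory.
   sec_start/sec_end: RVA range [start, end) of the section holding it.
   On success *out_rva receives the exported RVA, or the RVA of the
   forwarder string if the result is EXPORT_FORWARDED. */
static ExportKind find_export_by_ordinal(const uint8_t *image,
                                         uint32_t dir_rva,
                                         uint32_t sec_start,
                                         uint32_t sec_end,
                                         uint32_t ordinal,
                                         uint32_t *out_rva)
{
    uint32_t base  = rd32(image, dir_rva + EXP_BASE);
    uint32_t count = rd32(image, dir_rva + EXP_NFUNCS);
    uint32_t funcs = rd32(image, dir_rva + EXP_ADDR_FUNCS);
    uint32_t index, rva;

    if (ordinal < base)
        return EXPORT_NOT_FOUND;
    index = ordinal - base;              /* subtract 'Base' to get the index */
    if (index >= count)
        return EXPORT_NOT_FOUND;

    rva = rd32(image, funcs + 4 * index);
    if (rva == 0)
        return EXPORT_NOT_FOUND;         /* unused slot */

    *out_rva = rva;
    if (rva >= sec_start && rva < sec_end)
        return EXPORT_FORWARDED;         /* points to "otherdll.name" */
    return EXPORT_ADDRESS;
}
```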

To find an exported symbol by name, follow the 'AddressOfNames'-RVA (if
it is 0 there are no names) to find the array of RVAs to the export
names. Search for your name in the list. Use the name's index in the
'AddressOfNameOrdinals'-Array and get the 16-bit-number corresponding to
the found name. According to the PE spec, it is an ordinal and you need
to subtract the 'Base' to get the export index; in my experience it is
the export index and you don't subtract. Using the export index, you
find the export RVA in the 'AddressOfFunctions'-Array, being either the
exported RVA itself or an RVA to a string describing a forwarded
export.
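
A sketch of the lookup by name follows. It treats the 16-bit value as
the export index (not an ordinal), as I have found it to be. The
function names are made up, the image is again assumed to be mapped so
that an RVA is an offset into 'image', and the byte offsets follow from
the member order of the export directory.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian readers; 'image' is the mapped image (RVA == offset). */
static uint16_t rd16(const uint8_t *image, uint32_t rva)
{
    const uint8_t *p = image + rva;
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *image, uint32_t rva)
{
    const uint8_t *p = image + rva;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Byte offsets of the members used here within the export directory. */
enum {
    EXP_NFUNCS = 20, EXP_NNAMES = 24,
    EXP_ADDR_FUNCS = 28, EXP_ADDR_NAMES = 32, EXP_ADDR_ORDS = 36
};

/* Returns the export RVA (which may be the RVA of a forwarder string),
   or 0 if the name is not exported. */
static uint32_t find_export_by_name(const uint8_t *image, uint32_t dir_rva,
                                    const char *wanted)
{
    uint32_t nfuncs = rd32(image, dir_rva + EXP_NFUNCS);
    uint32_t nnames = rd32(image, dir_rva + EXP_NNAMES);
    uint32_t funcs  = rd32(image, dir_rva + EXP_ADDR_FUNCS);
    uint32_t names  = rd32(image, dir_rva + EXP_ADDR_NAMES);
    uint32_t ords   = rd32(image, dir_rva + EXP_ADDR_ORDS);
    uint32_t lo = 0, hi = nnames;

    if (names == 0)
        return 0;                        /* no names at all */

    /* The name array is sorted, so a binary search works. */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        const char *name = (const char *)(image + rd32(image, names + 4 * mid));
        int cmp = strcmp(wanted, name);
        if (cmp == 0) {
            /* Parallel array: same position, 16-bit export index. */
            uint16_t index = rd16(image, ords + 2 * mid);
            if (index >= nfuncs)
                return 0;
            return rd32(image, funcs + 4 * (uint32_t)index);
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```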


imported symbols
----------------
When the compiler finds a call to a function that is in a different
executable (mostly in a DLL), it will, in the most simplistic case, not
know anything about the circumstances and simply output a normal
call-instruction to that symbol, the address of which the linker will
have to fix, like it does for any external symbol.
The linker uses an import library to look up which symbol is imported
from which DLL, and produces stubs for all the imported symbols, each of
which consists of a jump-instruction; the stubs are the actual
call-targets. These jump-instructions will actually jump to an address
that's fetched from the so-called import address table. In more
sophisticated applications (when "__declspec(dllimport)" is used), the
compiler knows the function is imported, and outputs a call to the
address that's in the import address table, bypassing the jump.

Anyway, the address of the function in the DLL is always necessary and
will be supplied by the loader from the exporting DLL's export directory
when the application is loaded. The loader knows which symbols in what
libraries have to be looked up and their addresses fixed by searching
the import directory.

I had better give you an example. The calls with or without
__declspec(dllimport) look like this:

    source:
        int symbol(char *);
        __declspec(dllimport) int symbol2(char*);
        void foo(void)
        {
            int i=symbol("bar");
            int j=symbol2("baz");
        }
    
    assembly:
        ...
        call _symbol                 ; without __declspec(dllimport)
        ...
        call [__imp__symbol2]        ; with __declspec(dllimport)
        ...

In the first case (without __declspec(dllimport)), the compiler didn't
know that '_symbol' was in a DLL, so the linker has to provide the
function '_symbol'. Since the function isn't there, it will supply a
stub function for the imported symbol, being an indirect jump. The
collection of all import-stubs is called the "transfer area" (also
sometimes called a "trampoline", because you jump there in order to jump
to somewhere else).
Typically this transfer area is located in the code section (it is not
part of the import directory). Each of the function stubs is a jump to
the actual function in the target DLLs. The transfer area looks like
this:

    _symbol:        jmp  [__imp__symbol]
    _other_symbol:  jmp  [__imp__other_symbol]
    ...


This means: if you use imported symbols without specifying
"__declspec(dllimport)" then the linker will generate a transfer area
for them, consisting of indirect jumps. If you do specify
"__declspec(dllimport)", the compiler will do the indirection itself and
a transfer area is not necessary. (It also means: if you import
variables or other stuff you must specify "__declspec(dllimport)",
because a stub with a jmp instruction is appropriate for functions
only.)

In any case the address of symbol 'x' is stored at a location '__imp_x'.
All these locations together comprise the so-called "import address
table", which is provided to the linker by the import libraries of the
various DLLs that are used. The import address table is a list of
addresses like this:

   __imp__symbol:   0xdeadbeef
   __imp__symbol2:  0x40100
   __imp__symbol3:  0x300100
   ...

This import address table is a part of the import directory, and it is
pointed to by the IMAGE_DIRECTORY_ENTRY_IAT directory pointer (although
some linkers don't set this directory entry and it works nevertheless;
apparently, the loader can resolve imports without using the directory
IMAGE_DIRECTORY_ENTRY_IAT).
The addresses in this table are unknown to the linker; the linker
inserts dummies (RVAs to the function names; see below for more
information) that are patched by the loader at load time using the
export directory of the exporting DLL. The import address table, and how
it is found by the loader, will be described in more detail later in
this chapter.
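
The mechanism can be imitated in portable C with a function pointer.
This is only an analogy, not what a linker emits: 'imp_symbol' plays
the role of one slot in the import address table, 'symbol' the role of
the stub in the transfer area, and 'fake_loader_patch' and 'dll_symbol'
are hypothetical stand-ins for the loader and the exporting DLL.

```c
#include <stddef.h>
#include <string.h>

/* One slot of the "import address table": the address of the imported
   function, unknown until the loader fills it in. */
static int (*imp_symbol)(const char *) = NULL;

/* What the exporting DLL would provide (hypothetical stand-in). */
static int dll_symbol(const char *s)
{
    return (int)strlen(s);
}

/* Transfer-area stub, the C counterpart of "jmp [__imp__symbol]". */
static int symbol(const char *s)
{
    return imp_symbol(s);
}

/* Stand-in for the loader patching the slot at load time. */
static void fake_loader_patch(void)
{
    imp_symbol = dll_symbol;
}

/* Caller compiled without __declspec(dllimport): goes through the stub. */
static int call_via_stub(void)
{
    return symbol("bar");
}

/* Caller compiled with __declspec(dllimport): uses the slot directly. */
static int call_via_iat(void)
{
    return imp_symbol("baz");
}
```

Both callers end up in the same function; the second one merely saves
the extra jump.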

Note that this description is C-specific; there are other application
building environments that don't use import libraries. They all need to
generate an import address table, though, which they use to let their
programs access the imported objects and functions. C compilers tend to
use import libraries because it is convenient for them - their linkers
use libraries anyway. Other environments use e.g. a description file
that lists the necessary DLL names and function names (like the "module
definition file"), or a declaration-style list in the source.


This is how imports are used by the program's code; now we'll look at
how an import directory is made up so the loader can use it.


The import directory should reside in a section that's "initialized
data" and "readable".
The import directory is an array of IMAGE_IMPORT_DESCRIPTORs, one for
each used DLL. The list is terminated by an IMAGE_IMPORT_DESCRIPTOR
that's entirely filled with 0-bytes.
An IMAGE_IMPORT_DESCRIPTOR is a struct with these members:

    OriginalFirstThunk
        An RVA (32 bit) pointing to a 0-terminated array of
        IMAGE_THUNK_DATAs, each describing one imported function. The
        array will never change.

    TimeDateStamp
        A 32-bit-timestamp that has several purposes. Let's pretend that
        the timestamp is 0, and handle the advanced cases later.

    ForwarderChain
        The 32-bit-index of the first forwarder in the list of imported
        functions. Forwarders are also advanced stuff; set to all-bits-1
        for beginners.
        
    Name
        A 32-bit-RVA to the name (a 0-terminated ASCII string) of the
        DLL.
        
    FirstThunk
        An RVA (32 bit) to a 0-terminated array of IMAGE_THUNK_DATAs,
        each describing one imported function. The array is part of the
        import address table and will change.

So each IMAGE_IMPORT_DESCRIPTOR in the array gives you the name of the
exporting DLL and, apart from the forwarder and timestamp, it gives you
2 RVAs to arrays of IMAGE_THUNK_DATAs, using 32 bits. (The last member
of each array is entirely filled with 0-bytes to mark the end.)
Each IMAGE_THUNK_DATA is, for now, an RVA to an IMAGE_IMPORT_BY_NAME
which describes the imported function.
The interesting point is that the two arrays run parallel, i.e. their
elements at the same index point to the same IMAGE_IMPORT_BY_NAME.
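
Walking the descriptor array could look like the sketch below. The
type name 'ImportDescriptor' and the function names are made up for the
example, the image is assumed to be mapped so that an RVA is an offset
into 'image', and comparing DLL names without regard to case is my
choice here.

```c
#include <ctype.h>
#include <stdint.h>

/* The five 32-bit members of one descriptor (20 bytes in the file). */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA to the array that never changes */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA to the 0-terminated DLL name */
    uint32_t FirstThunk;         /* RVA to the array the loader patches */
} ImportDescriptor;

/* Little-endian read; 'image' is the mapped image (RVA == offset). */
static uint32_t rd32(const uint8_t *image, uint32_t rva)
{
    const uint8_t *p = image + rva;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* ASCII comparison ignoring case; returns non-zero if equal. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the index of the descriptor for 'dll' and copies it to *out,
   or returns -1 when the all-zero terminator is reached first. */
static int find_import_descriptor(const uint8_t *image, uint32_t dir_rva,
                                  const char *dll, ImportDescriptor *out)
{
    int i;
    for (i = 0; ; i++) {
        uint32_t at = dir_rva + 20u * (uint32_t)i;
        ImportDescriptor d;
        d.OriginalFirstThunk = rd32(image, at);
        d.TimeDateStamp      = rd32(image, at + 4);
        d.ForwarderChain     = rd32(image, at + 8);
        d.Name               = rd32(image, at + 12);
        d.FirstThunk         = rd32(image, at + 16);
        if ((d.OriginalFirstThunk | d.TimeDateStamp | d.ForwarderChain |
             d.Name | d.FirstThunk) == 0)
            return -1;           /* terminator: entirely 0-bytes */
        if (same_name((const char *)(image + d.Name), dll)) {
            *out = d;
            return i;
        }
    }
}
```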

No need to be desperate, I will draw another picture. This is the
essential contents of one IMAGE_IMPORT_DESCRIPTOR:

     OriginalFirstThunk      FirstThunk
            |                    |
            |                    |
            |                    |
            V                    V

            0-->    func1     <--0
            1-->    func2     <--1
            2-->    func3     <--2
            3-->    foo       <--3
            4-->    mumpitz   <--4
            5-->    knuff     <--5
            6-->0            0<--6      /* the last RVA is 0! */

where the names in the center are the yet-to-be-discussed
IMAGE_IMPORT_BY_NAMEs. Each of them is a 16-bit-number (a hint) followed
by an unspecified number of bytes, being the 0-terminated ASCII name of
the imported symbol.
The hint is an index into the exporting DLL's name table (see export
directory above). The name at that index is tried, and if it doesn't
match then a binary search is done to find the name.
(Some linkers don't bother to look up correct hints and simply specify
1 all the time, or some other arbitrary number. This does no harm, it
just makes the first attempt to resolve the name always fail, forcing
a binary search for each name.)
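
The use of the hint can be sketched as follows. To keep the example
short, the exporting DLL's sorted name table is given as a plain array
of C strings instead of an array of RVAs; that simplification and all
function names are mine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An IMAGE_IMPORT_BY_NAME as raw bytes: a little-endian 16-bit hint
   followed by the 0-terminated name. */
static uint16_t import_hint(const uint8_t *entry)
{
    return (uint16_t)(entry[0] | (entry[1] << 8));
}

static const char *import_name(const uint8_t *entry)
{
    return (const char *)(entry + 2);
}

/* Returns the position of the imported name in the exporter's sorted
   name table, or -1 if the name is not exported. */
static long resolve_name(const char *const *names, size_t count,
                         const uint8_t *entry)
{
    uint16_t hint = import_hint(entry);
    const char *wanted = import_name(entry);
    size_t lo = 0, hi = count;

    /* First try the name at the hinted index. */
    if (hint < count && strcmp(names[hint], wanted) == 0)
        return (long)hint;

    /* Wrong or out-of-range hint: fall back to a binary search. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(wanted, names[mid]);
        if (cmp == 0)
            return (long)mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

A wrong hint changes nothing about the result; it only costs the
failed first comparison.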

To summarize, if you want to look up information about the imported
function "foo" from DLL "knurr", you first find the entry
IMAGE_DIRECTORY_ENTRY_IMPORT in the data directories, get an RVA, find