📄 the pe file format.txt

📁 一些比较有价值的关于PE格式的资料,希望能对大家理解PE有帮助
💻 TXT
📖 第 1 页 / 共 5 页
字号:
    If bit 7 (IMAGE_SCN_CNT_UNINITIALIZED_DATA) is set, this section's
    data is uninitialized data and will be initialized to all-0-bytes
    before execution starts. This is normally the BSS.

    If bit 9 (IMAGE_SCN_LNK_INFO) is set, the section doesn't contain
    image data but comments, description or other documentation.

    If bit 11 (IMAGE_SCN_LNK_REMOVE) is set, the data is part of an
    object file's section that is supposed to be left out when the
    executable file is linked.

    If bit 12 (IMAGE_SCN_LNK_COMDAT) is set, the section contains
    "common block data", which are packaged functions of some sort.

    If bit 15 (IMAGE_SCN_MEM_FARDATA) is set, we have far data -
    whatever that means. This bit's meaning is unsure.

    If bit 17 (IMAGE_SCN_MEM_PURGEABLE) is set, the section's data
    is purgeable - but I don't think that this is the same as
    "discardable", which has a bit of its own, see below.
    The same bit is apparently used to indicate 16-bit-information as
    there is also a define IMAGE_SCN_MEM_16BIT for it.
    This bit's meaning is unsure.

    If bit 18 (IMAGE_SCN_MEM_LOCKED) is set, the section should not be
    moved in memory? This bit's meaning is unsure.

    If bit 19 (IMAGE_SCN_MEM_PRELOAD) is set, the section should be
    paged in before execution starts? This bit's meaning is unsure.

    Bits 20 to 23 specify an alignment that I have no information
    about. There are #defines IMAGE_SCN_ALIGN_16BYTES and the like. The
    only value I've ever seen used is 0, for the default 16-byte-
    alignment. I suspect that this is the alignment of objects in a
    library file or the like.

    If bit 24 (IMAGE_SCN_LNK_NRELOC_OVFL) is set, the section contains
    some extended relocations that I don't know about.

    If bit 25 (IMAGE_SCN_MEM_DISCARDABLE) is set, the section's data is
    not needed after the process has started. This is the case,
    for example, with the relocation information. I've seen it also for
    startup routines of drivers and services that are only executed
    once.

    If bit 26 (IMAGE_SCN_MEM_NOT_CACHED) is set, the section's data
    should not be cached. Don't ask my why not. Does this mean to switch
    off the 2nd-level-cache?

    If bit 27 (IMAGE_SCN_MEM_NOT_PAGED) is set, the section's data
    should not be paged out. This may be interesting for drivers.

    If bit 28 (IMAGE_SCN_MEM_SHARED) is set, the section's data is
    shared among all running instances of the image. If it is e.g. the
    initialized data of a DLL, all running instances of the DLL will at
    any time have the same variable contents.
    Note that only the first instance's section is initialized.
    Sections containing code are always shared.

    If bit 29 (IMAGE_SCN_MEM_EXECUTE) is set, the process gets
    'execute'-access to the section's memory.
    
    If bit 30 (IMAGE_SCN_MEM_READ) is set, the process gets
    'read'-access to the section's memory.
    
    If bit 31 (IMAGE_SCN_MEM_WRITE) is set, the process gets
    'write'-access to the section's memory.



After the section headers we find the sections themselves. They are, in
the file, aligned to 'FileAlignment' bytes (that is, after the optional
header and after each section's data there will be padding bytes). When
loaded (in RAM), the sections are aligned to 'SectionAlignment' bytes.

As an example, if the optional header ends at file offset 981 and
'FileAlignment' is 512, the first section will start at byte 1024. Note
that you can find the sections via the 'PointerToRawData' or the
'VirtualAddress', so there is hardly any need to actually fuss around
with the alignments.


I will try to make an image of it all:


    +-------------------+
    | DOS-stub          |
    +-------------------+
    | file-header       |
    +-------------------+
    | optional header   |
    |- - - - - - - - - -|
    |                   |----------------+
    | data directories  |                |
    |                   |                |
    |(RVAs to direc-    |-------------+  |
    |tories in sections)|             |  |
    |                   |---------+   |  |
    |                   |         |   |  |
    +-------------------+         |   |  |
    |                   |-----+   |   |  |
    | section headers   |     |   |   |  |
    | (RVAs to section  |--+  |   |   |  |
    |  borders)         |  |  |   |   |  |
    +-------------------+<-+  |   |   |  |
    |                   |     | <-+   |  |
    | section data 1    |     |       |  |
    |                   |     | <-----+  |
    +-------------------+<----+          |
    |                   |                |
    | section data 2    |                |
    |                   | <--------------+
    +-------------------+

There is one section header for each section, and each data directory
will point to one of the sections (several data directories may point to
the same section, and there may be sections without data directory
pointing to them).



Sections' raw data
------------------

general
-------
All sections are aligned to 'SectionAlignment' when loaded in RAM, and
'FileAlignment' in the file. The sections are described by entries in
the section headers: You find the sections in the file via
'PointerToRawData' and in memory via 'VirtualAddress'; the length is in
'SizeOfRawData'.

There are several kinds of sections, depending on what's contained in
them. In most cases (but not in all) there will be at least one
data directory in a section, with a pointer to it in the optional
header's data directory array.

code section
------------
First, I will mention the code section. The section will have, at least,
the bits 'IMAGE_SCN_CNT_CODE', 'IMAGE_SCN_MEM_EXECUTE' and
'IMAGE_SCN_MEM_READ' set, and 'AddressOfEntryPoint' will point somewhere
into the section, to the start of the function that the developer wants
to execute first.
'BaseOfCode' will normally point to the start of this section, but may
point to somewhere later in the section if some non-code-bytes are
placed before the code in the section.
Normally, there will be nothing but executable code in this section, and
there will be only one code section, but don't rely on this.
Typical section names are ".text", ".code", "AUTO" and the like.

data section
-------------
The next thing we'll discuss is the initialized variables; this section
contains initialized static variables (like "static int i = 5;"). It will
have, at least, the bits 'IMAGE_SCN_CNT_INITIALIZED_DATA',
'IMAGE_SCN_MEM_READ' and 'IMAGE_SCN_MEM_WRITE' set. Some linkers may
place constant data into a section of their own that doesn't have the
writeable-bit. If part of the data is shareable, or there are other
peculiarities, there may be more sections with the apropriate section-
bits set.
The section, or sections, will be in the range 'BaseOfData' up to
'BaseOfData'+'SizeOfInitializedData'.
Typical section names are '.data', '.idata', 'DATA' and so on.

bss section
-----------
Then there is the uninitialized data (for static variables like "static
int k;"); this section is quite like the initialized data, but will have
a file offset ('PointerToRawData') of 0 to indicate its contents is not
stored in the file, and 'IMAGE_SCN_CNT_UNINITIALIZED_DATA' is set
instead of 'IMAGE_SCN_CNT_INITIALIZED_DATA' to indicate that the
contents should be set to 0-bytes at load-time.
The length will be 'SizeOfUninitializedData'.
Typical names are '.bss', 'BSS' and the like.

These were the section data that are *not* pointed to by data
directories. Their contents and structure is supplied by the compiler,
not by the linker.
(The stack-segment and heap-segment are not sections in the binary but
created by the loader from the stacksize- and heapsize-entries in the
optional header.)

copyright
---------
To begin with a simple directory-section, let's look at the data
directory 'IMAGE_DIRECTORY_ENTRY_COPYRIGHT'. The contents is a
copyright- or description string in ASCII (not 0-terminated), like
"Gonkulator control application, copyright (c) 1848 Hugendubel & Cie".
This string is, normally, supplied to the linker with the command line.
This string is not needed at runtime and may be discarded. It is not
writeable; in fact, the application doesn't need access at all.
So the linker will find out if there is a discardable non-writeable
section already and if not, create one (named '.descr' or the like). It
will then stuff the string into the section and let the
copyright-directory-pointer point to the string. The
'IMAGE_SCN_CNT_INITIALIZED_DATA' bit should be set.

exported symbols
----------------
The next-simplest thing is the export directory,
'IMAGE_DIRECTORY_ENTRY_EXPORT'. This is a directory typically found
in DLLs; it contains the entry points of exported functions (and the
addresses of exported objects etc.). Executables may of course also have
exported symbols but usually they don't.
The containing section should be "initialized data" and "readable". It
can not be "discardable" because the process might call "GetProcAddress()"
to find a function's entry point at runtime.
The section is normally called '.edata' if it is a separate thing; often
enough, it is merged into some other section like "initialized data".

The structure of the export table ('IMAGE_EXPORT_DIRECTORY') comprises a
header and the export data, that is: the symbol names, their ordinals
and the offsets to their entry points.

First, we have 32 bits of 'Characteristics' that are unused and normally
0. Then there is a 32-bit-'TimeDateStamp', which presumably should give
the time the table was created in the time_t-format; alas, it is not
always valid (some linkers set it to 0). Then we have 2 16-bit-words of
version-info ('MajorVersion' and 'MinorVersion'), and these, too, are
often enough set to 0.

The next thing is 32 bits of 'Name'; this is an RVA to the DLL name as a
0-terminated ASCII string. (The name is necessary in case the DLL file is
renamed - see "binding" at the import directory.)
Then, we have got a 32-bit-'Base'. We'll come to that in a moment.

Now we have got sort of a problem, because the next two 32-bit-values are
the number of exported functions ('NumberOfFunctions') and the number of
exported names ('NumberOfNames'). These values are *not* always equal,
and weird things happen if you try to decipher the export directory past
the minimum of both. In *most* export-directories the values are equal
and we're fine. It seems to be always safe to decipher up to the minimum
of both values.
I've got a suspicion that it is somehow possible do export symbols
without a name (just by ordinal), but I didn't find anything about this
topic.

The next 3 32-bit-values are RVAs to 3 arrays. The arrays run parallel:
the array of entry points 'AddressOfFunctions' (given as 32-bit-RVAs),
the array of 32-bit-RVAs to symbol names 'AddressOfNames', and the array
of 16-bit-ordinals 'AddressOfNameOrdinals'.

To get information about the i-th exported symbol, find the export
directory, follow the 3 RVAs, skip to index i in the 3 arrays, and
you have got a) the RVA of the function's entry point; b) the RVA to the
function's name (a 0-terminated ASCII-string) and c) the function's
ordinal (you have to add the 'Base' to the value to get the real
ordinal).

For functions, the entry-point-RVA will normally point into the code
section; for objects, it will almost always point into the data- or
bss-section.

imported symbols
----------------
This is one for the tough.

When the compiler finds a call to a function that is in a different
executable (mostly in a DLL), it will, in the most simplistic case, not
know anything about the circumstances and simply output a normal
call-instruction to that symbol, the address of which the linker will
have to fix. The linker uses an import library to supply the addresses,
and that library has stubs for all the exported symbols, each of which
consists of a jump-instruction; the stubs are the actual call-targets.
These jump-instructions will actually jump to an address that's fetched
from the import address table we're discussing here.
In more sophisticated applications (when "declspec(dllimport)" is used),
the compiler knows the function is imported, and outputs a call to the
address that's in the import address table, bypassing the jump.

Hm. So far for that matter. You are kindly referred to [1] and [2].

However, the address of the function in the DLL is always necessary and
will be supplied by the loader when the application is loaded. The
loader knows which symbols in what libraries have to be looked up and
their addresses fixed by searching the import directory.

I will better give you an example. The calls with or without
__declspec(dllimport) look like this:

    source:
        int symbol(char *);
        __declspec(dllimport) int symbol2(char*);
        void foo(void)
        {
            int i=symbol("bar");
            int j=symbol2("baz");
        }
    
    assembly:
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -