file, you will have to find out that sections in RAM are aligned to 4096
bytes and the ".code"-section starts at RVA 0x1000 in RAM and is 16384
bytes long; then you know that RVA 0x1560 is at offset 0x560 in that
section. Find out that the sections are aligned to 512-byte-borders in
the file and that ".code" begins at offset 0x800 in the file, and you
know that the code execution start is at byte 0x800+0x560=0xd60 in the
file.

Then you disassemble and find an access to a variable at the linear
address 0x1051d0. The linear address will be relocated upon loading the
binary and is given on the assumption that the preferred load address is
used. You find out that the preferred load address is 0x100000, so we
are dealing with RVA 0x51d0. This is in the data section which starts at
RVA 0x5000 and is 2048 bytes long. It begins at file offset 0x4800.
Hence, the variable can be found at file offset
0x4800+0x51d0-0x5000=0x49d0.
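
The two calculations above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the section table
is typed in by hand from the example numbers, the name ".data" for the
data section is my own placeholder (the text does not name it), and the
helper names 'rva_to_file_offset' and 'linear_to_rva' are hypothetical.

```python
from collections import namedtuple

# One entry per section: where it lives in RAM (RVA) and in the file.
Section = namedtuple("Section", "name rva size file_offset")

# Values taken from the worked example; ".data" is a placeholder name.
SECTIONS = [
    Section(".code", 0x1000, 16384, 0x800),
    Section(".data", 0x5000, 2048, 0x4800),
]


def linear_to_rva(linear_address, preferred_load_address):
    """Turn a linear address (valid for the preferred base) into an RVA."""
    return linear_address - preferred_load_address


def rva_to_file_offset(rva, sections):
    """Find the section containing the RVA and map it into the file."""
    for section in sections:
        if section.rva <= rva < section.rva + section.size:
            return section.file_offset + (rva - section.rva)
    raise ValueError("RVA 0x%x is not inside any section" % rva)


if __name__ == "__main__":
    entry = rva_to_file_offset(0x1560, SECTIONS)
    variable = rva_to_file_offset(linear_to_rva(0x1051D0, 0x100000), SECTIONS)
    print(hex(entry), hex(variable))
```

The lookup simply walks the section list; for a real file you would fill
the list from the section headers instead of typing it in.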


Optional Header
---------------

Immediately following the file header is the IMAGE_OPTIONAL_HEADER
(which, in spite of the name, is always there). It contains
information about how exactly to treat the PE-file. Again, we'll go
through the members from top to bottom.

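The first 16-bit-word is 'Magic' and has, as far as I looked into
PE-files, always the value 0x010b. (This value marks a 32-bit "PE32"
image, which is what is described here; 64-bit "PE32+" images use
0x020b.)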

The next 2 bytes are the version of the linker ('MajorLinkerVersion' and
'MinorLinkerVersion') that produced the file. These values, again, are
unreliable and do not always reflect the linker version properly.
(Several linkers simply don't set this field.)
And, come to think of it, what good is the version if you have got
no idea *which* linker was used?

The next 3 longwords (32 bit each) are intended to be the size of the
executable code ('SizeOfCode'), the size of the initialized data
('SizeOfInitializedData', the so-called "data segment"), and the size of
the uninitialized data ('SizeOfUninitializedData', the so-called "bss
segment"). These values are, again, unreliable (e.g. the data segment
may actually be split into several segments by the compiler or linker),
and you get better sizes by inspecting the 'sections' that follow the
optional header.

Next is a 32-bit-value that is a RVA. This RVA is the offset to the
code's entry point ('AddressOfEntryPoint').
Execution starts here; it is e.g. the address of a DLL's LibMain() or a
program's startup code (which will in turn call main()) or a driver's
DriverEntry(). If you dare to load the image "by hand", you call this
address to start the process after you have done all the fixups and the
relocations.

The next 2 32-bit-values are the offsets to the executable code
('BaseOfCode') and the initialized data ('BaseOfData'), both of them
RVAs again, and both of them being of little interest because you get
more reliable information by inspecting the 'sections' that follow the
headers.
There is no offset to the uninitialized data because, being
uninitialized, there is little point in providing this data in the
image.
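
The members described so far occupy the first 28 bytes of the optional
header. A minimal sketch of reading them with Python's 'struct' module
follows; it assumes a little-endian 32-bit image and that you already
hold the raw optional header bytes. The names 'StandardFields' and
'parse_standard_fields' are made up for this illustration.

```python
import struct
from collections import namedtuple

StandardFields = namedtuple(
    "StandardFields",
    "Magic MajorLinkerVersion MinorLinkerVersion SizeOfCode "
    "SizeOfInitializedData SizeOfUninitializedData "
    "AddressOfEntryPoint BaseOfCode BaseOfData",
)

# One 16-bit word, two single bytes, six 32-bit longwords; little-endian.
STANDARD_FORMAT = "<HBBIIIIII"


def parse_standard_fields(optional_header):
    """Unpack the first 28 bytes of a raw optional header."""
    values = struct.unpack_from(STANDARD_FORMAT, optional_header, 0)
    return StandardFields._make(values)


if __name__ == "__main__":
    # Synthetic header bytes, built only to exercise the parser.
    raw = struct.pack(STANDARD_FORMAT, 0x010B, 5, 10,
                      0x4000, 0x800, 0, 0x1560, 0x1000, 0x5000)
    fields = parse_standard_fields(raw)
    print(hex(fields.Magic), hex(fields.AddressOfEntryPoint))
```

Remember that most of these values are hints only; the sizes and bases
are better taken from the section headers.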

The next entry is a 32-bit-value giving the preferred (linear) load
address ('ImageBase') of the entire binary, including all headers. This
is the address (always a multiple of 64 KB) the file has been relocated
to by the linker; if the binary can in fact be loaded to that address,
the loader doesn't need to relocate the file again, which is a win in
loading time.
The preferred load address can not be used if another image has already
been loaded to that address (an "address clash", which happens quite
often if you load several DLLs that are all relocated to the linker's
default), or the memory in question has been used for other purposes
(stack, malloc(), uninitialized data, whatever). In these cases, the
image must be loaded to some other address and it needs to be relocated
(see 'relocation directory' below). This has further consequences if the
image is a DLL, because then the "bound imports" are no longer valid,
and fixups have to be made to the binary that uses the DLL - see 'import
directory' below.
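
To make the effect of loading at a different address concrete, here is a
small sketch. It assumes only what is said above: load addresses are
multiples of 64 KB, and every linear address stored in the image was
computed for the preferred base. The function names are hypothetical,
and the actual load address 0x400000 is an arbitrary value chosen for
the example.

```python
def relocation_delta(preferred_base, actual_base):
    """Difference that must be added to every stored linear address."""
    for base in (preferred_base, actual_base):
        if base % 0x10000 != 0:
            raise ValueError("load address 0x%x is not a multiple of 64 KB" % base)
    return actual_base - preferred_base


def rebase(linear_address, preferred_base, actual_base):
    """Adjust one linear address for the address the image really got."""
    return linear_address + relocation_delta(preferred_base, actual_base)


if __name__ == "__main__":
    # The variable from the earlier example, image loaded at 0x400000.
    print(hex(rebase(0x1051D0, 0x100000, 0x400000)))
```

If the delta is 0 the loader has nothing to do, which is exactly the
loading-time win mentioned above.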

The next 2 32-bit-values are the alignments of the PE-file's sections in
RAM ('SectionAlignment', when the image has been loaded) and in the file
('FileAlignment'). Usually both values are 32, or FileAlignment is 512
and SectionAlignment is 4096. Sections will be discussed later.

The next 2 16-bit-words are the expected operating system version
('MajorOperatingSystemVersion' and 'MinorOperatingSystemVersion' [they
_do_ like self-documenting names at MS]). This version information is
intended to be the operating system's (e.g. NT or Win95) version, as
opposed to the subsystem's version (e.g. Win32); it is often not
supplied, or wrongly supplied. The loader doesn't use it, apparently.

The next 2 16-bit-words are the binary's version ('MajorImageVersion' and
'MinorImageVersion'). Many linkers don't set this information correctly
and many programmers don't bother to supply it, so it is better to rely
on the version-resource if one exists.

The next 2 16-bit-words are the expected subsystem version
('MajorSubsystemVersion' and 'MinorSubsystemVersion'). This should be
the Win32 version or the POSIX version, because 16-bit-programs or
OS/2-programs won't be in PE-format, obviously.
This subsystem version should be supplied correctly, because it *is*
checked and used:
If the application is a Win32-GUI-application and runs on NT4, and the
subsystem version is *not* 4.0, the dialogs won't be 3D-style and
certain other features will also work "old-style" because the
application expects to run on NT 3.51, which had the program manager
instead of explorer and so on, and NT 4.0 will mimic that behaviour as
faithfully as possible.

Then we have a 'Win32VersionValue' of 32 bits. I don't know what it is
good for. It has been 0 in all the PE files that I inspected.

Next is a 32-bit-value giving the amount of memory the image will need,
in bytes ('SizeOfImage'). It is the sum of all headers' and sections'
lengths if aligned to 'SectionAlignment'. It is a hint to the loader how
many pages it will need in order to load the image.
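
As a sketch of that sum, assuming the sections follow each other without
gaps and using the numbers of the earlier example (headers of 0x800
bytes, a 16384-byte code section, a 2048-byte data section, and a
'SectionAlignment' of 4096); 'align_up' and 'size_of_image' are names
invented for this illustration:

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def size_of_image(size_of_headers, section_sizes, section_alignment):
    """Sum of the headers and all sections, each aligned in RAM."""
    total = align_up(size_of_headers, section_alignment)
    for size in section_sizes:
        total += align_up(size, section_alignment)
    return total


if __name__ == "__main__":
    print(hex(size_of_image(0x800, [16384, 2048], 4096)))
```

With these numbers the headers take one page, the code section four and
the data section one, so the image ends at RVA 0x6000.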

The next thing is a 32-bit-value giving the total length of all headers
including the data directories and the section headers
('SizeOfHeaders'). It is at the same time the offset from the beginning
of the file to the first section's raw data.

Then we have got a 32-bit-checksum ('CheckSum'). This checksum is, for
current versions of NT, only checked if the image is an NT-driver (the
driver will fail to load if the checksum isn't correct). For other
binary types, the checksum need not be supplied and may be 0.
The algorithm to compute the checksum is the property of Microsoft, and
they won't tell you. However, several tools of the Win32 SDK will compute
and/or patch a valid checksum, and the function CheckSumMappedFile() in
imagehlp.dll will do so too.
The checksum is supposed to prevent loading of damaged binaries that
would crash anyway - and a crashing driver would result in a BSOD, so
it is better not to load it at all.

Then there is a 16-bit-word 'Subsystem' that tells in which of the
NT-subsystems the image runs:

    IMAGE_SUBSYSTEM_NATIVE (1)
        The binary doesn't need a subsystem. This is used for drivers.
    
    IMAGE_SUBSYSTEM_WINDOWS_GUI (2)
        The image is a Win32 graphical binary. (It can still open a
        console with AllocConsole() but won't get one automatically at
        startup.)
    
    IMAGE_SUBSYSTEM_WINDOWS_CUI (3)
        The binary is a Win32 console binary. (It will get a console
        per default at startup, or inherit the parent's console.)
    
    IMAGE_SUBSYSTEM_OS2_CUI (5)
        The binary is an OS/2 console binary. (OS/2 binaries will be in
        OS/2 format, so this value will seldom be used in a PE file.)
    
    IMAGE_SUBSYSTEM_POSIX_CUI (7)
        The binary uses the POSIX console subsystem.

Windows 95 binaries will always use the Win32 subsystem, so the only
legal values for these binaries are 2 and 3; I don't know if "native"
binaries on Windows 95 are possible.
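
When dumping a header it is handy to print the name instead of the
number. A minimal lookup, using only the values listed above (the
function name 'subsystem_name' is hypothetical, and values not in the
list are simply reported as unknown):

```python
# Subsystem values as listed in the text.
SUBSYSTEMS = {
    1: "IMAGE_SUBSYSTEM_NATIVE",
    2: "IMAGE_SUBSYSTEM_WINDOWS_GUI",
    3: "IMAGE_SUBSYSTEM_WINDOWS_CUI",
    5: "IMAGE_SUBSYSTEM_OS2_CUI",
    7: "IMAGE_SUBSYSTEM_POSIX_CUI",
}


def subsystem_name(value):
    """Return the symbolic name for a 'Subsystem' word."""
    return SUBSYSTEMS.get(value, "unknown (%d)" % value)


def gets_console_at_startup(value):
    """True for the Win32 console subsystem, per the description above."""
    return value == 3


if __name__ == "__main__":
    for value in (1, 2, 3, 4):
        print(value, subsystem_name(value))
```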

The next thing is a 16-bit-value that tells, if the image is a DLL, when
to call the DLL's entry point ('DllCharacteristics'). This seems not to
be used; apparently, the DLL is always notified about everything.
    If bit 0 is set, the DLL is notified about process attachment (i.e.
        DLL load).
    If bit 1 is set, the DLL is notified about thread detachments (i.e.
        thread terminations).
    If bit 2 is set, the DLL is notified about thread attachments (i.e.
        thread creations).
    If bit 3 is set, the DLL is notified about process detachment (i.e.
        DLL unload).

The next 4 32-bit-values are the size of reserved stack
('SizeOfStackReserve'), the size of initially committed stack
('SizeOfStackCommit'), the size of the reserved heap
('SizeOfHeapReserve') and the size of the committed heap
('SizeOfHeapCommit').
The 'reserved' amounts are address space (not real RAM) that is reserved
for the specific purpose; at program startup, the 'committed' amount is
actually allocated in RAM. The 'committed' value is also the amount by
which the committed stack or heap grows if necessary. (Other sources
claim that the stack will grow in pages, regardless of the
'SizeOfStackCommit' value. I didn't check this.)
So, as an example, if a program has a reserved heap of 1 MB and a
committed heap of 64 KB, the heap will start out at 64 KB and is
guaranteed to be enlargeable up to 1 MB. The heap will grow in
64-KB-chunks.
The 'heap' in this context is the primary (default) heap. A process can
create more heaps if it so wishes.
The stack is the first thread's stack (the one that starts main()). The
process can create more threads which will have their own stacks.
DLLs don't have a stack or heap of their own, so the values are ignored
for their images. I don't know if drivers have a heap or a stack of
their own, but I don't think so.
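
The heap example can be modelled in a few lines. This is a toy model of
the description above, not of what the system really does: it assumes
the heap always grows in whole 'SizeOfHeapCommit' chunks and, for
simplicity, treats the reserved size as a hard ceiling. The function
name 'committed_heap_size' is made up.

```python
KB = 1024
MB = 1024 * KB


def committed_heap_size(bytes_needed, reserve, commit):
    """Committed size after growing in commit-sized chunks (toy model)."""
    if bytes_needed > reserve:
        raise MemoryError("request exceeds the reserved heap size")
    # At least one chunk is committed at startup; round up otherwise.
    chunks = max(1, (bytes_needed + commit - 1) // commit)
    return min(chunks * commit, reserve)


if __name__ == "__main__":
    for needed in (1, 64 * KB, 64 * KB + 1, 1 * MB):
        size = committed_heap_size(needed, 1 * MB, 64 * KB)
        print(needed, size // KB, "KB committed")
```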

After these stack- and heap-descriptions, we find 32 bits of
'LoaderFlags', which I didn't find a useful description of. I only found
a vague note about setting bits that automatically invoke a breakpoint
or a debugger after loading the image; however, this doesn't seem to
work.

Then we find 32 bits of 'NumberOfRvaAndSizes', which is the number of
valid entries in the directories that follow immediately. I've found this
value to be unreliable; you might wish to use the constant
IMAGE_NUMBEROF_DIRECTORY_ENTRIES instead, or the lesser of both.

After the 'NumberOfRvaAndSizes' there is an array of
IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16) IMAGE_DATA_DIRECTORYs.
Each of these directories describes the location (32 bits RVA called
'VirtualAddress') and size (also 32 bit, called 'Size') of a particular
piece of information, which is located in one of the sections that
follow the directory entries.
For example, the security directory is found at the RVA and has the size
that are given at index 4.
The directories that I know the structure of will be discussed later.
Defined directory indexes are:

    IMAGE_DIRECTORY_ENTRY_EXPORT (0)
        The directory of exported symbols; mostly used for DLLs.
        Described below.
    
    IMAGE_DIRECTORY_ENTRY_IMPORT (1)
        The directory of imported symbols; see below.
    
    IMAGE_DIRECTORY_ENTRY_RESOURCE (2)
        Directory of resources. Described below.
    
    IMAGE_DIRECTORY_ENTRY_EXCEPTION (3)
        Exception directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_SECURITY (4)
        Security directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_BASERELOC (5)
        Base relocation table - see below.
    
    IMAGE_DIRECTORY_ENTRY_DEBUG (6)
        Debug directory - contents are compiler-dependent. Moreover, many
        compilers stuff the debug information into the code section and
        don't create a separate section for it.
    
    IMAGE_DIRECTORY_ENTRY_COPYRIGHT (7)
        Description string - some arbitrary copyright note or the like.
    
    IMAGE_DIRECTORY_ENTRY_GLOBALPTR (8)
        Machine Value (MIPS GP) - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_TLS (9)
        Thread local storage directory - structure unknown; contains
        variables that are declared "__declspec(thread)", i.e.
        per-thread global variables.
    
    IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG (10)
        Load configuration directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (11)
        Bound import directory - see description of import directory.
    
    IMAGE_DIRECTORY_ENTRY_IAT (12)
        Import Address Table - see description of import directory.
    
As an example, if we find at index 7 the 2 longwords 0x12000 and 33, and
the load address is 0x10000, we know that the copyright data is at
address 0x10000+0x12000 (in whatever section there may be), and the
copyright note is 33 bytes long.
If a directory of a particular type is not used in a binary, the Size
and VirtualAddress are both 0.
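
A minimal sketch of reading the directory array, assuming little-endian
data and taking the lesser of 'NumberOfRvaAndSizes' and the constant, as
suggested earlier. The bytes in the example are synthetic, and the
function names 'parse_data_directories' and 'is_used' are invented for
this illustration.

```python
import struct

IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16
IMAGE_DIRECTORY_ENTRY_COPYRIGHT = 7


def parse_data_directories(data, number_of_rva_and_sizes):
    """Return a list of (VirtualAddress, Size) pairs from the raw array."""
    count = min(number_of_rva_and_sizes,
                IMAGE_NUMBEROF_DIRECTORY_ENTRIES,
                len(data) // 8)
    return [struct.unpack_from("<II", data, index * 8)
            for index in range(count)]


def is_used(directory):
    """An unused directory has both VirtualAddress and Size set to 0."""
    return directory != (0, 0)


if __name__ == "__main__":
    # Build 16 empty entries, then fill in index 7 as in the example.
    raw = bytearray(IMAGE_NUMBEROF_DIRECTORY_ENTRIES * 8)
    struct.pack_into("<II", raw, IMAGE_DIRECTORY_ENTRY_COPYRIGHT * 8,
                     0x12000, 33)
    directories = parse_data_directories(bytes(raw), 16)
    rva, size = directories[IMAGE_DIRECTORY_ENTRY_COPYRIGHT]
    load_address = 0x10000
    print(hex(load_address + rva), size)
```

Clamping the count keeps the loop inside the array even if the header
claims more entries than the structure can hold.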



Section directories
-------------------

The sections consist of two major parts: first, a section description
(of type IMAGE_SECTION_HEADER) and then the raw section data. So after
the data directories we find an array of 'NumberOfSections' section
headers, ordered by the sections' RVAs.

A section header contains:

An array of IMAGE_SIZEOF_SHORT_NAME (8) bytes that make up the name
(ASCII) of the section. If all of the 8 bytes are used there is no 0-