The first 16-bit-word is 'Magic' and has, as far as I looked into
PE-files, always the value 0x010b.

The next 2 bytes are the version of the linker ('MajorLinkerVersion' and
'MinorLinkerVersion') that produced the file. These values, again, are
unreliable and do not always reflect the linker version properly.
(Several linkers simply don't set these fields.)
And, come to think of it, what good is the version if you have got
no idea *which* linker was used?

The next 3 longwords (32 bit each) are intended to be the size of the
executable code ('SizeOfCode'), the size of the initialized data
('SizeOfInitializedData', the so-called "data segment"), and the size of
the uninitialized data ('SizeOfUninitializedData', the so-called "bss
segment"). These values are, again, unreliable (e.g. the data segment
may actually be split into several segments by the compiler or linker),
and you get better sizes by inspecting the 'sections' that follow the
optional header.
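A minimal Python sketch of decoding the fields described so far. It assumes that `data` already starts at the first byte of the optional header (locating the header is not shown here) and relies on PE headers being stored little-endian. The struct layout name and the sample values are made up for illustration:

```python
import struct

# Leading fields of the optional header, as described above:
# Magic (16 bit), MajorLinkerVersion (8), MinorLinkerVersion (8),
# SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData (32 bit each).
# '<' = little-endian with no padding between fields.
OPT_HEADER_START = struct.Struct("<HBBIII")

def parse_optional_header_start(data: bytes) -> dict:
    """Decode the leading optional-header fields; 'data' starts at the header."""
    (magic, major_linker, minor_linker,
     size_code, size_init, size_uninit) = OPT_HEADER_START.unpack_from(data, 0)
    return {
        "Magic": magic,
        "MajorLinkerVersion": major_linker,
        "MinorLinkerVersion": minor_linker,
        "SizeOfCode": size_code,
        "SizeOfInitializedData": size_init,
        "SizeOfUninitializedData": size_uninit,
    }

# Usage with a hand-built (hypothetical) header fragment:
sample = OPT_HEADER_START.pack(0x010B, 2, 50, 0x1000, 0x800, 0x200)
hdr = parse_optional_header_start(sample)
print(hex(hdr["Magic"]), hdr["SizeOfCode"])  # 0x10b 4096
```

As the text warns, the size fields decoded here are only hints; the section headers give more reliable numbers.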

Next is a 32-bit-value that is an RVA. This RVA is the offset to the
code's entry point ('AddressOfEntryPoint'). Execution starts here; it
is e.g. the address of a DLL's LibMain() or a program's startup code
(which will in turn call main()) or a driver's DriverEntry().
If you dare to load the image "by hand", you call this address to start
the process after you have done all the fixups and relocations.
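Since 'AddressOfEntryPoint' is an RVA, a hand-loader has to add the address the image actually ended up at. A minimal sketch of that arithmetic, where `load_base` is a hypothetical name for wherever the image was mapped:

```python
def entry_point_address(load_base: int, address_of_entry_point: int) -> int:
    """Turn the AddressOfEntryPoint RVA into an absolute address.

    RVAs are relative to the address the image was actually loaded at,
    so the entry point is simply load_base + RVA.
    """
    return load_base + address_of_entry_point

# An image loaded at 0x400000 with AddressOfEntryPoint = 0x1234:
print(hex(entry_point_address(0x400000, 0x1234)))  # 0x401234
```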

The next 2 32-bit-values are the offsets to the executable code
('BaseOfCode') and the initialized data ('BaseOfData'), both of them
RVAs again, and both of them being of little interest because you get
more reliable information by inspecting the 'sections' that follow the
headers.
There is no offset to the uninitialized data because, being
uninitialized, there is little point in providing this data in the
image.

The next entry is a 32-bit-value giving the preferred load address
('ImageBase'). Unlike most addresses in the headers, it is an absolute
address, not an RVA. This is the address the file has been relocated
to by the linker; if the binary can in fact be loaded to that address,
the loader doesn't need to relocate the file again, which is a big win
in loading time.
The preferred load address cannot be used if another image has already
been loaded to that address (an "address clash", which happens quite
often if you load several DLLs that are all relocated to the linker's
default), or the memory in question has been used for other purposes
(stack, malloc(), uninitialized data, whatever). In these cases, the
image must be loaded to some other address and it needs to be relocated
(see 'relocation directory' below). This has further consequences if the
image is a DLL, because then the "bound imports" are no longer valid,
and fixups have to be made to the binary that uses the DLL - see 'import
directory' below.
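When the image cannot be loaded at 'ImageBase', every absolute address the linker baked into it is off by the same amount. The sketch below only shows that arithmetic for a single 32-bit value; which addresses actually need patching comes from the relocation directory, which is not modelled here. The function name is hypothetical:

```python
def rebase_address(value: int, preferred_base: int, actual_base: int) -> int:
    """Adjust one absolute 32-bit address for a load at a different base.

    The linker computed absolute addresses assuming the image sits at
    preferred_base (the 'ImageBase' field). If the loader has to put it at
    actual_base instead, every such address moves by the same delta.
    """
    delta = actual_base - preferred_base
    return (value + delta) & 0xFFFFFFFF  # wrap like 32-bit arithmetic

# The linker assumed ImageBase = 0x400000, but the image ended up at 0x10000000:
print(hex(rebase_address(0x401234, 0x400000, 0x10000000)))  # 0x10001234
```

If the image does land at its preferred base, the delta is 0 and nothing changes, which is exactly why an unrelocated load is faster.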

The next 2 32-bit-values are the alignments of the PE-file's sections in
RAM ('SectionAlignment', when the image has been loaded) and in the file
('FileAlignment'). Usually both values are 32, or FileAlignment is 512
and SectionAlignment is 4096. Sections will be discussed later.
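Both alignment values are used by rounding sizes and offsets up to the next multiple. A small helper, assuming (as in the usual values 32, 512 and 4096) that the alignment is a power of two:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)

# A section with 1000 bytes of data, FileAlignment 512, SectionAlignment 4096:
print(align_up(1000, 512))   # 1024 bytes in the file
print(align_up(1000, 4096))  # 4096 bytes in memory
```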

The next 2 16-bit-words are the expected operating system version
('MajorOperatingSystemVersion' and 'MinorOperatingSystemVersion' [they
_do_ like self-documenting names at MS]). This version information is
intended to be the operating system's (e.g. NT or Win95) version, as
opposed to the subsystem's version (e.g. Win32); it is often not
supplied, or supplied incorrectly. The loader doesn't use it, apparently.

The next 2 16-bit-words are the binary's version ('MajorImageVersion' and
'MinorImageVersion'). Many linkers don't set this information correctly
and many programmers don't bother to supply it, so it is better to rely
on the version-resource if one exists.

The next 2 16-bit-words are the expected subsystem version
('MajorSubsystemVersion' and 'MinorSubsystemVersion'). This should be
the Win32 version or the POSIX version, because 16-bit-programs or
OS/2-programs won't be in PE-format, obviously.
This subsystem version should be supplied correctly, because it *is*
checked and used:
If the application is a Win32-GUI-application and runs on NT, and the
subsystem version is *not* 4.0, the dialogs won't be 3D-style and
certain other features will also work "old-style" because the
application expects to run on NT 3.51, which had the program manager
instead of explorer and so on, and NT 4.0 will mimic that behaviour as
faithfully as possible.

Then we have a 'Win32VersionValue' of 32 bits. I don't know what it is
good for. It has been 0 in all the PE files that I inspected.

Next is a 32-bit-value giving the amount of memory the image will need,
in bytes ('SizeOfImage'). It covers the whole image as mapped in memory,
from the start of the headers to the end of the last section, and is a
multiple of 'SectionAlignment'. It is a hint to the loader how many pages
it will need in order to load the image.
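Because the sections are laid out one after another in memory, starting after the headers, the value can be reconstructed from the section table. A sketch, assuming each section is given as a hypothetical (VirtualAddress, VirtualSize) pair and that VirtualSize is filled in sensibly (the section-header discussion below notes that it is not always):

```python
def compute_size_of_image(size_of_headers, sections, section_alignment):
    """Estimate SizeOfImage from the section table.

    'sections' is a list of (VirtualAddress, VirtualSize) pairs. The image
    spans from offset 0 (the headers) to the end of the highest section,
    rounded up to SectionAlignment.
    """
    def align_up(value, alignment):
        return (value + alignment - 1) // alignment * alignment

    end = align_up(size_of_headers, section_alignment)
    for virtual_address, virtual_size in sections:
        end = max(end, align_up(virtual_address + virtual_size, section_alignment))
    return end

# Headers of 0x400 bytes, two sections, SectionAlignment 0x1000:
print(hex(compute_size_of_image(0x400, [(0x1000, 0x2345), (0x4000, 0x200)], 0x1000)))  # 0x5000
```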

The next thing is a 32-bit-value giving the total length of all headers
including the data directories and the section headers
('SizeOfHeaders'). It is at the same time the offset from the beginning
of the file to the first section's raw data.
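The headers counted here are the MS-DOS part, the "PE\0\0" signature, the 20-byte file header, the optional header and the 40-byte section headers, rounded up to 'FileAlignment'. A sketch of that sum; `pe_header_offset` is a hypothetical parameter standing for the file offset of the signature (as recorded in the MS-DOS header), and the sample numbers are illustrative:

```python
def compute_size_of_headers(pe_header_offset, size_of_optional_header,
                            number_of_sections, file_alignment):
    """Sum the header sizes and round up to FileAlignment."""
    PE_SIGNATURE_SIZE = 4      # "PE\0\0"
    FILE_HEADER_SIZE = 20      # IMAGE_FILE_HEADER
    SECTION_HEADER_SIZE = 40   # IMAGE_SECTION_HEADER
    raw = (pe_header_offset + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE
           + size_of_optional_header + number_of_sections * SECTION_HEADER_SIZE)
    return (raw + file_alignment - 1) // file_alignment * file_alignment

# Signature at 0x80, 224-byte optional header, 4 sections, FileAlignment 512:
print(hex(compute_size_of_headers(0x80, 0xE0, 4, 0x200)))  # 0x400
```

The result should match the 'PointerToRawData' of the first section, which is a cheap consistency check when inspecting a file.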

Then we have got a 32-bit-checksum ('CheckSum'). This checksum is, for
current versions of NT, only checked if the image is an NT driver (the
driver will fail to load if the checksum isn't correct).
The algorithm to compute the checksum is property of Microsoft, and they
won't tell you. However, several tools of the Win32 SDK will compute a
valid checksum, and the function CheckSumMappedFile() in
imagehlp.dll will do so too.
The checksum is supposed to prevent loading of damaged binaries that
would crash anyway - and a crashing driver would result in a BSOD, so
it is better not to load it at all.

Then there is a 16-bit-word 'Subsystem' that tells in which of the
NT-subsystems the image runs:

    IMAGE_SUBSYSTEM_NATIVE (1)
        The binary doesn't need a subsystem. This is used for drivers.
    
    IMAGE_SUBSYSTEM_WINDOWS_GUI (2)
        The image is a Win32 graphical binary. (It can still open a
        console with AllocConsole() but won't get one automatically at
        startup.)
    
    IMAGE_SUBSYSTEM_WINDOWS_CUI (3)
        The binary is a Win32 console binary. (It will get a console
        per default at startup.)
    
    IMAGE_SUBSYSTEM_OS2_CUI (5)
        The binary is an OS/2 console binary. (OS/2 binaries will be in
        OS/2 format, so this value will seldom be used in a PE
        file.)
    
    IMAGE_SUBSYSTEM_POSIX_CUI (7)
        The binary uses the POSIX console subsystem.

Windows 95 binaries will always use the Win32 subsystem, so the only
legal values for these binaries are 2 and 3; I don't know if "native"
binaries on Windows 95 are possible.
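A sketch of turning the 'Subsystem' value into a name, using only the values listed above, together with the Windows 95 restriction the text describes. The dictionary and function names are hypothetical:

```python
# Subsystem values as listed above (IMAGE_SUBSYSTEM_* constants).
SUBSYSTEMS = {
    1: "IMAGE_SUBSYSTEM_NATIVE",
    2: "IMAGE_SUBSYSTEM_WINDOWS_GUI",
    3: "IMAGE_SUBSYSTEM_WINDOWS_CUI",
    5: "IMAGE_SUBSYSTEM_OS2_CUI",
    7: "IMAGE_SUBSYSTEM_POSIX_CUI",
}

def describe_subsystem(value: int) -> str:
    """Map a Subsystem value to its constant name."""
    return SUBSYSTEMS.get(value, f"unknown subsystem ({value})")

def is_valid_for_win95(value: int) -> bool:
    """Per the text, Windows 95 binaries can only be GUI (2) or console (3)."""
    return value in (2, 3)

print(describe_subsystem(3))  # IMAGE_SUBSYSTEM_WINDOWS_CUI
```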

The next thing is a 16-bit-value that tells, if the image is a DLL, when
to call the DLL's entry point ('DllCharacteristics'). This seems not to
be used; apparently, the DLL is always notified about everything.
    If bit 0 is set, the DLL is notified about process attachment (i.e.
    DLL load).
    If bit 1 is set, the DLL is notified about thread detachments (i.e.
    thread terminations).
    If bit 2 is set, the DLL is notified about thread attachments (i.e.
    thread creations).
    If bit 3 is set, the DLL is notified about process detachment (i.e.
    DLL unload).

The next 4 32-bit-values are the size of reserved stack
('SizeOfStackReserve'), the size of initially committed stack
('SizeOfStackCommit'), the size of the reserved heap
('SizeOfHeapReserve') and the size of the committed heap
('SizeOfHeapCommit').
The 'reserved' amounts are address space (not real RAM) that is reserved
for the specific purpose; at program startup, the 'committed' amount is
actually allocated in RAM. The 'committed' value is also the amount by
which the committed stack or heap grows if necessary.
So, as an example, if a program has a reserved heap of 1 MB and a
committed heap of 64 KB, the heap will start out at 64 KB and is
guaranteed to be enlargeable up to 1 MB. The heap will grow in
64-KB-chunks.
The 'heap' in this context is the primary (default) heap.
DLLs don't have a stack or heap of their own, so the values are ignored
for their images.
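The heap example works out as follows if growth happens exactly as described, in commit-sized steps up to the reserve. This is a simplified model with a hypothetical function name; the real heap manager may behave differently:

```python
def commit_growth_steps(reserve: int, commit: int) -> int:
    """Number of commit-sized growth steps between the initial commit
    and the reserved maximum (a partial final step counts as one)."""
    if commit <= 0 or commit > reserve:
        raise ValueError("need 0 < commit <= reserve")
    remaining = reserve - commit
    return -(-remaining // commit)  # ceiling division

KB = 1024
MB = 1024 * KB
# The example from the text: 1 MB reserved, 64 KB committed.
print(commit_growth_steps(1 * MB, 64 * KB))  # 15
```

So the heap starts at 64 KB and can grow 15 more times before it hits the 1 MB reserve.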

After these stack- and heap-descriptions, we find 32 bits of
'LoaderFlags', which I didn't find a useful description of. I only found
that you may set bits that automatically invoke a breakpoint or a
debugger after loading the image; however, this doesn't seem to work.

Then we find 32 bits of 'NumberOfRvaAndSizes', which is the number of
valid entries in the directories that follow immediately. I've found this
value to be unreliable; you might wish to use the constant
IMAGE_NUMBEROF_DIRECTORY_ENTRIES instead, or the lesser of the two.

After the 'NumberOfRvaAndSizes' there is an array of
IMAGE_NUMBEROF_DIRECTORY_ENTRIES (16) IMAGE_DATA_DIRECTORYs.
Each of these directories describes the location (32 bits RVA called
'VirtualAddress') and size (also 32 bit, called 'Size') of a particular
piece of information, which is located in one of the sections that
follow the directory entries.
For example, the location and size of the security directory are the
ones given at index 4.
The directories that I know the structure of will be discussed later.
Defined directory indexes are:

    IMAGE_DIRECTORY_ENTRY_EXPORT (0)
        The directory of exported symbols; mostly used for DLLs.
        Described below.
    
    IMAGE_DIRECTORY_ENTRY_IMPORT (1)
        The directory of imported symbols; see below.
    
    IMAGE_DIRECTORY_ENTRY_RESOURCE (2)
        Directory of resources. Described below.
    
    IMAGE_DIRECTORY_ENTRY_EXCEPTION (3)
        Exception directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_SECURITY (4)
        Security directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_BASERELOC (5)
        Base relocation table - see below.
    
    IMAGE_DIRECTORY_ENTRY_DEBUG (6)
        Debug directory - contents are compiler-dependent. Moreover, many
        compilers stuff the debug information into the code section and
        don't create a separate section for it.
    
    IMAGE_DIRECTORY_ENTRY_COPYRIGHT (7)
        Description string - some arbitrary copyright note or the like.
    
    IMAGE_DIRECTORY_ENTRY_GLOBALPTR (8)
        Machine Value (MIPS GP) - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_TLS (9)
        Thread local storage directory - structure unknown; contains
        variables that are declared "__declspec(thread)".
    
    IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG (10)
        Load configuration directory - structure and purpose unknown.
    
    IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (11)
        Bound import directory - see description of import directory.
    
    IMAGE_DIRECTORY_ENTRY_IAT (12)
        Import Address Table - structure and purpose unknown.
    
As an example, if we find at index 7 the 2 longwords 0x12000 and 33, and
the load address is 0x10000, we know that the copyright data is at
address 0x10000+0x12000 (in whatever section there may be), and the
copyright note is 33 bytes long.
If a directory of a particular type is not used in a binary, the Size
and VirtualAddress are both 0.
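A sketch of reading the directory array, capping the count at IMAGE_NUMBEROF_DIRECTORY_ENTRIES as suggested above and skipping unused (all-zero) entries. It assumes `offset` points at the first IMAGE_DATA_DIRECTORY; the usage rebuilds the copyright example from the text:

```python
import struct

IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16
IMAGE_DIRECTORY_ENTRY_COPYRIGHT = 7

def read_data_directories(data: bytes, offset: int, number_of_rva_and_sizes: int):
    """Return {index: (VirtualAddress, Size)} for the directories in use."""
    # Use the lesser of NumberOfRvaAndSizes and the constant.
    count = min(number_of_rva_and_sizes, IMAGE_NUMBEROF_DIRECTORY_ENTRIES)
    directories = {}
    for index in range(count):
        rva, size = struct.unpack_from("<II", data, offset + 8 * index)
        if rva or size:  # both 0 means "not present"
            directories[index] = (rva, size)
    return directories

# The example from the text: index 7 holds (0x12000, 33), load address 0x10000.
table = bytearray(8 * IMAGE_NUMBEROF_DIRECTORY_ENTRIES)
struct.pack_into("<II", table, 8 * IMAGE_DIRECTORY_ENTRY_COPYRIGHT, 0x12000, 33)
dirs = read_data_directories(bytes(table), 0, 16)
rva, size = dirs[IMAGE_DIRECTORY_ENTRY_COPYRIGHT]
print(hex(0x10000 + rva), size)  # 0x22000 33
```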



Section directories
-------------------

The sections consist of two major parts: first, a section description
(of type IMAGE_SECTION_HEADER) and then the raw section data. So after
the data directories we find an array of 'NumberOfSections' section
headers.

A section header is actually a very *small* struct. It contains:

An array of IMAGE_SIZEOF_SHORT_NAME (8) bytes that make up the name
(ASCII) of the section. If all of the 8 bytes are used there is no 0-
terminator for the string! The name is typically something like ".data"
or ".text" or ".bss". There need not be a leading '.', the names may
also be "CODE" or "IAT" or the like.
Please note that the names are not at all related to the contents of the
section. A section named ".code" may or may not contain the executable
code; it may just as well contain the import address table; it may also
contain the code *and* the address table *and* the initialized data.
To find information in the sections, you will have to look it up via the
data directories of the optional header. Do not rely on the names, and
do not assume that the section's raw data starts at the beginning of a
section.
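Because an 8-character name has no terminator, a reader must never scan past the 8 bytes. A minimal sketch of decoding the Name field (the function name is hypothetical):

```python
IMAGE_SIZEOF_SHORT_NAME = 8

def section_name(raw: bytes) -> str:
    """Decode the 8-byte Name field of a section header.

    The name is NUL-padded, but when all 8 bytes are used there is no
    terminator, so only the first 8 bytes are ever looked at.
    """
    raw = raw[:IMAGE_SIZEOF_SHORT_NAME]
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")

print(section_name(b".text\0\0\0"))  # .text
print(section_name(b"ABCDEFGH"))     # ABCDEFGH (no terminator)
```

Remember that, as stated above, the name is only a label: find data via the data directories, not by matching names.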

The next member of the IMAGE_SECTION_HEADER is a 32-bit-union of
'PhysicalAddress' and 'VirtualSize'. In an object file, this is the
address the contents are relocated to; in an executable, it is the size of
the contents. In fact, the field seems to be unused; there are linkers
that enter the size, and there are linkers that enter the address, and
I've also found a linker that enters a 0, and all the executables run
like the gentle wind.

The next member is 'VirtualAddress', a 32-bit-value holding the RVA to
the section's data when it is loaded in RAM.

Then we have got 32 bits of 'SizeOfRawData', which is the size of the
section's data rounded up to the next multiple of 'FileAlignment'.

Next is 'PointerToRawData' (32 bits), which is incredibly useful
because it is the offset from the file's beginning to the section's data.
If it is 0, the section's data are not contained in the file and will be
arbitrary.
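'VirtualAddress', 'SizeOfRawData' and 'PointerToRawData' together let you turn an RVA from a data directory into a file offset. A sketch under two assumptions: the `Section` tuple is a hypothetical stand-in for a parsed section header, and, since 'VirtualSize' is unreliable as described above, the section's extent in memory is taken as the larger of 'VirtualSize' and 'SizeOfRawData':

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    virtual_address: int      # 'VirtualAddress' (RVA of the section in RAM)
    virtual_size: int         # 'VirtualSize' (may be 0 or wrong)
    size_of_raw_data: int     # 'SizeOfRawData'
    pointer_to_raw_data: int  # 'PointerToRawData' (file offset, 0 = no data)

def rva_to_file_offset(rva: int, sections: list) -> Optional[int]:
    """Map an RVA to an offset in the file, or None if no file data backs it."""
    for s in sections:
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if s.pointer_to_raw_data == 0 or delta >= s.size_of_raw_data:
                return None  # inside the section, but not present in the file
            return s.pointer_to_raw_data + delta
    return None

sections = [
    Section(0x1000, 0x2345, 0x2400, 0x400),   # e.g. code
    Section(0x4000, 0x200, 0x200, 0x2800),    # e.g. initialized data
    Section(0x5000, 0x1000, 0, 0),            # e.g. uninitialized data
]
print(hex(rva_to_file_offset(0x1010, sections)))  # 0x410
print(rva_to_file_offset(0x5010, sections))       # None
```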

Then we have got 'PointerToRelocations' (32 bits) and
'PointerToLinenumbers' (also 32 bits), 'NumberOfRelocations' (16 bits)
and 'NumberOfLinenumbers' (also 16 bits). All of these are information
that's only used for object files. Executables have a special base
relocation directory, and the line number information, if present at
all, is usually contained in a special purpose debugging segment or
elsewhere.
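Put together, the members described in this section make up a 40-byte record, and 'NumberOfSections' of them follow the data directories back to back. A sketch of decoding them with `struct`, assuming `offset` points at the first section header; the union is decoded under the name 'VirtualSize', and the sample header is made up:

```python
import struct

# IMAGE_SECTION_HEADER: 8-byte name, six 32-bit fields, two 16-bit
# counts and the 32-bit Characteristics: 40 bytes in total.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

FIELDS = ("Name", "VirtualSize", "VirtualAddress", "SizeOfRawData",
          "PointerToRawData", "PointerToRelocations", "PointerToLinenumbers",
          "NumberOfRelocations", "NumberOfLinenumbers", "Characteristics")

def parse_section_headers(data: bytes, offset: int, number_of_sections: int):
    """Decode 'number_of_sections' consecutive section headers at 'offset'."""
    headers = []
    for i in range(number_of_sections):
        values = SECTION_HEADER.unpack_from(data, offset + i * SECTION_HEADER.size)
        header = dict(zip(FIELDS, values))
        # The name has no terminator if all 8 bytes are used.
        header["Name"] = header["Name"].split(b"\0", 1)[0].decode("ascii", "replace")
        headers.append(header)
    return headers

sample = SECTION_HEADER.pack(b".text", 0x2345, 0x1000, 0x2400, 0x400,
                             0, 0, 0, 0, 0x60000020)
headers = parse_section_headers(sample, 0, 1)
print(headers[0]["Name"], hex(headers[0]["PointerToRawData"]))  # .text 0x400
print(bool(headers[0]["Characteristics"] & 0x20))  # True (bit 5: code)
```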

The last member of a section header is the 32 bits 'Characteristics',
which is a bunch of flags describing how the section's memory should be
treated:

    If bit 5 (IMAGE_SCN_CNT_CODE) is set, the section contains
    executable code.