📄 intro.txt
字号:
[5] Speed. Preccx is fast, typically taking two to five
seconds to compile scripts of several hundred lines. And it
builds fast parsers too.
and some more follows...
----------------------------------------------------------------------
4. To understand windows executable files what we need to know?
----------------------------------------------------------------------
General Layout
MS-DOS header (64 bytes)
real-mode program stub
PE file header
signature
number of sections
time date stamp
pointer to symbol table
number of symbols
size of optional header
characteristics
Optional header (224 bytes)
---standard fields
signature
linker version
code size
data size
entry point RVA
base of code
base of data
---NT specific fields
image base
initial stack size
program entry point location
preferred base address
operation system version
section alignment information
file alignment
os information
user information
system informatiion
reserved
image size
header size
file checksum
subsystem
dll flags
stack size
heap size
loader flags
number of data entries
export table
import table
resource table
exception table
security table
base relocation table
debug table
copyright
global ptr
tls table
load config table
reserved
Section table(Section Headers)
--- each section header has following
section name
virtual size
RVA/Offset
size of raw data
pointer to raw data
pointer to relocs
pointer to linenumbers
number of relocs
number of linenumbers
section flags
Section data
the data for each section is located at the file offset given
by the pointer to raw data field, size is also given.
Data directories exist within the body of
their corresponding data section. Typically,
data directories are the first structure within
the section body, but not always. For that
reason, you need to retrieve information from
both the section header and optional header to
locate a specific data directory.
There are a lot more things to know about PE file format, and
how to dig out useful informations, but lets stop here for now.
----------------------------------------------------------------------
5. Combine two things and we get the basic disassembler.
----------------------------------------------------------------------
Now I can figure out what is starting position of the code and
what is the RVA(relative virtual address), etc,...
So with this pedump.c and disassembler we built using preccx we can
report very basic things. Alas there are more things to do.
Well for reporting or printing there is print1.c fuction to look at.
The program is straight forward once you understand what is the parse
structure is ... But since I changed this one to solve the problem
which i willl tell you later, it looks quite complicated.
If there is anyone who want to print out disassembler as AT&T assembler
format you can just modify this print1.c and voila, you will get AT&T
format diassembler.
------------------------------------------------------------------------
6. What is the problem anyway?
------------------------------------------------------------------------
=======
case 1:
=======
:00401574 55 push ebp
:00401575 8BEC mov ebp, esp
:00401577 8B4508 mov eax, dword[ebp+08]
:0040157A 8B550C mov edx, dword[ebp+0C]
:0040157D C70038154000 mov dword[eax], 00401538
:00401583 895004 mov dword[eax+04], edx
:00401586 5D pop ebp
:00401587 C3 ret
:00401588 83 00 00 00 03 00 30 00 04 00 00 00 7F 00 00 00 ......0.........
:00401598 3C 00 4C 00 00 00 00 00 00 00 00 00 00 00 00 00 <.L.............
:004015A8 05 00 00 00 04 00 00 00 F0 15 40 00 01 00 5C 00 ..........@...\.
:004015B8 63 6F 6E 73 74 72 65 61 6D 00 00 00 73 17 40 00 constream...s.@.
:004015C8 00 00 00 00 03 00 00 00 00 00 00 00 C3 17 40 00 ..............@.
:004015D8 00 00 00 00 0D 00 00 00 00 00 00 00 8D 16 40 00 ..............@.
:004015E8 08 00 00 00 00 00 00 00 ........
=======
case 2:
=======
:00402BCC 55 push ebp
:00402BCD 8BEC mov ebp, esp
:00402BCF 8B4D0C mov ecx, dword[ebp+0C]
:00402BD2 8B5508 mov edx, dword[ebp+08]
:00402BD5 8BC1 mov eax, ecx
:00402BD7 83E003 and eax, 00000003
:00402BDA FF2485E12B4000 jmp dword[4*eax+00402BE1]
:00402BE1 1B2C4000 DWORD 00402C1B
:00402BE5 F12B4000 DWORD 00402BF1
:00402BE9 FF2B4000 DWORD 00402BFF
:00402BED 0D2C4000 DWORD 00402C0D
:00402BF1 8A01 mov al, byte[ecx]
:00402BF3 0AC0 or al, al
:00402BF5 7461 je 00402C58
:00402BF7 8802 mov byte[edx], al
:00402BF9 83C101 add ecx, 00000001
:00402BFC 83C201 add edx, 00000001
Well look at the case 1 and case 2 carefully.
There is a big problem in two cases. Suddenly there appears
some kind of data block which have no relations what so ever,
and some addresses which looks like case jump addresses.
There are some more kind of these data blocks appear inside code block.
Some disassembler is not prepared to handle all of these cases.
W32dsm87.exe solves some case of data and code separation, but
it is not good at this point at all.
IDA(Interactive Disassembler) only disassembles code block
which is referenced somewhere.
If IDA does not know whether it is referenced or not simply it
treats code block as data.
So either way it is unsatisfactory.
-------------------------------------------------------------------
7. Can we solve the problem?
-------------------------------------------------------------------
Well what can i say? Above two cases are taken from actual
disasembly listing of some program, so you can expect most part
of data and code seperation is done by disassem.exe.
What do i use to solve this problem?
Do you remember one of structures i showed you before?
It is case jump block, and it looks like this.
<case_jump_block> = "0xFF" "0x24" <sib> <label_start_position> <label>*
<label_start_position>
= <label>
<label> = <double_word>
It takes care of case jump block(at least partially you know).
But how about case 1?
I used several heuristic methods to solve code and data seperation.
I proceed until there breaks some problem and erase the part
which caused the problem. And restart from some point which is
somehow known to be an entry point or a label point.
Is it that simple? Yes, and no.
Basic idea is simple, but many other things have been put together
to make the program work right. If you have any questions about how
it works after studying source itself,
I am happy to help you out.
-----------------------------------------------------------------------
8. What's next step?
-----------------------------------------------------------------------
Well I am working on next version of disassembler.
I don't know exactly what should be done and what should'nt.
I like to make diassembler iterative at least although it is not interactive.
This means if you found some part of data should be code block, then you can
tell disassembler about this and disassembler does dirty job for you.
Or you can tell disassembler some part of code is not code at all and it
should be data instead, then this guy does what you have told him.
There are still something I haven't done because of time constraint.
More of string business, (this means I like to find more strings
and possibly crosss reference string data)
Data cross referencing.
Find some hidden(or indirect) call references.
I have to confess that actually I like to build a decompiler.
What is decompiler? Is it possible to make a decompiler?
Decompiler does what disassembler does but one step further.
It produces source programs(C or C++ or whatever) for given exe files.
There was some efforts to build a real life decompiler but
no one succeeded in building usable one.
I have some vague idea how this can be done, but there are some reasons
to believe this is possible.
--------------------------------------------------------------------------
9. Some explanation about each files and directory.
--------------------------------------------------------------------------
First, preccx.zip contains most part of preccx package.
I deleted libarary part and something else which seems
not that important, if you wish to obtain whole package
you may contact "http://www.comlab.ox.ac.uk:80/archive/redo/"
You can find the preccx package there.
Second, disasm.zip contains whole source for disassem.exe program.
it consists of main.c, pedump.c, print1.c, g1.c, ccx.c, ccx.h, cc.bat
which is a batch file to compile sources using DJGPP compiler.
g1.y is the preccx script for disassembler. g1.c is the transformed
C source file for g1.y.
intro.txt is what you read right now.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -