     [5] Speed. Preccx is fast,  typically  taking  two  to  five
     seconds  to compile scripts of several hundred lines. And it
     builds fast parsers too.

and some more follows...


----------------------------------------------------------------------
4. What do we need to know to understand Windows executable files?
----------------------------------------------------------------------

General Layout

   MS-DOS header (64 bytes)
   
   real-mode program stub
   
   PE file header
      signature
	  number of sections
	  time date stamp
	  pointer to symbol table
	  number of symbols
	  size of optional header
	  characteristics
   
   Optional header	(224 bytes)
   ---standard fields  
	  signature
	  linker version
	  code size
	  data size
      entry point RVA
	  base of code
	  base of data
   ---NT specific fields	 
	  image base
	  initial stack size
	  program entry point location
	  preferred base address
	  operating system version
	  section alignment information
	  file alignment
	  os information
	  user information
	  system information
	  reserved
	  image size
	  header size
	  file checksum
	  subsystem
	  dll flags
	  stack size
	  heap size
	  loader flags
	  number of data entries
	     export table
		 import table
		 resource table
		 exception table
		 security table
		 base relocation table
		 debug table
		 copyright 
		 global ptr
		 tls table
		 load config table
		 reserved
   
   Section table (Section Headers)
      --- each section header has the following fields
	  section name
	  virtual size
	  RVA/Offset
	  size of raw data
	  pointer to raw data
	  pointer to relocs
	  pointer to linenumbers
	  number of relocs
	  number of linenumbers
	  section flags
   
   Section data
      The data for each section is located at the file offset given
	  by the pointer to raw data field; its length is given by the
	  size of raw data field.

      Data directories exist within the body of 
      their corresponding data section. Typically, 
      data directories are the first structure within 
      the section body, but not always. For that 
      reason, you need to retrieve information from 
      both the section header and optional header to 
      locate a specific data directory.
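
As a minimal sketch of that lookup (not the code in pedump.c): a data
directory entry in the optional header gives an RVA, and the section
headers tell you which section contains that RVA and where the section
sits in the file. The struct and function names below are assumptions
made up for illustration, and the struct holds only the four fields
the lookup needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down section header: only the fields used here. */
typedef struct {
    uint32_t virtual_size;  /* size of the section in memory        */
    uint32_t rva;           /* RVA at which the section is mapped   */
    uint32_t raw_size;      /* size of raw data in the file         */
    uint32_t raw_ptr;       /* pointer to raw data (file offset)    */
} SectionInfo;

/* Translate an RVA into a file offset.
   Returns 1 and stores the offset on success, 0 if no section holds it. */
int rva_to_file_offset(const SectionInfo *sec, size_t count,
                       uint32_t rva, uint32_t *offset)
{
    size_t i;

    assert(sec != NULL && offset != NULL);
    for (i = 0; i < count; i++) {
        /* Assumption of this sketch: if the virtual size is 0,
           fall back to the raw size. */
        uint32_t span = sec[i].virtual_size ? sec[i].virtual_size
                                            : sec[i].raw_size;
        if (rva >= sec[i].rva && rva - sec[i].rva < span) {
            uint32_t delta = rva - sec[i].rva;
            if (delta >= sec[i].raw_size)
                return 0;   /* in memory only, no file bytes behind it */
            *offset = sec[i].raw_ptr + delta;
            return 1;
        }
    }
    return 0;
}
```

To locate, say, the import table you would pass the RVA stored in its
data directory entry and then read the table at the returned offset.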
    
There is a lot more to know about the PE file format and how to
dig out useful information, but let's stop here for now.


----------------------------------------------------------------------
5. Combine the two and we get a basic disassembler.
----------------------------------------------------------------------


Now I can find the starting position of the code, the RVA
(relative virtual address) of each part, and so on.
So with pedump.c and the disassembler we built using preccx we can
already report very basic things. Alas, there is more to do.
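
For illustration only (this is not taken from pedump.c), the first
step can be sketched as follows: read the offset of the PE header
from byte 0x3C of the MS-DOS header, check the signature, and pick up
the number of sections, the entry point RVA and the image base. The
names PeSummary and pe_summary are hypothetical, and the sketch
assumes a 32-bit PE file whose contents are already in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical summary of the few header fields a disassembler needs first. */
typedef struct {
    uint16_t num_sections;
    uint32_t entry_rva;
    uint32_t image_base;
} PeSummary;

/* Returns 1 on success, 0 if the buffer does not look like a 32-bit PE file. */
int pe_summary(const unsigned char *buf, size_t len, PeSummary *out)
{
    uint32_t pe;

    assert(buf != NULL && out != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                       /* no MS-DOS header              */
    pe = rd32(buf + 0x3C);              /* file offset of the PE header  */
    if (pe > len || len - pe < 24 + 32)
        return 0;                       /* header would run off the file */
    if (memcmp(buf + pe, "PE\0\0", 4) != 0)
        return 0;                       /* wrong signature               */
    out->num_sections = rd16(buf + pe + 6);
    if (rd16(buf + pe + 24) != 0x010B)
        return 0;                       /* optional header is not PE32   */
    out->entry_rva  = rd32(buf + pe + 24 + 16);
    out->image_base = rd32(buf + pe + 24 + 28);
    return 1;
}
```

The address printed in a listing is the image base plus the RVA, so
with an image base of 00400000 an RVA of 00001574 shows up as
00401574.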

For reporting or printing, look at the functions in print1.c.
The program is straightforward once you understand the parse
structure. But since I changed it to solve a problem
which I will describe later, it looks quite complicated.

If you want the disassembler to print in AT&T assembler format,
you can just modify print1.c and voila, you get an AT&T
format disassembler.



------------------------------------------------------------------------
6. What is the problem anyway?
------------------------------------------------------------------------
  
  =======
  case 1:
  =======

:00401574 55                      push ebp
:00401575 8BEC                    mov ebp, esp
:00401577 8B4508                  mov eax, dword[ebp+08]
:0040157A 8B550C                  mov edx, dword[ebp+0C]
:0040157D C70038154000            mov dword[eax], 00401538
:00401583 895004                  mov dword[eax+04], edx
:00401586 5D                      pop ebp
:00401587 C3                      ret

:00401588 83 00 00 00 03 00 30 00 04 00 00 00 7F 00 00 00   ......0.........
:00401598 3C 00 4C 00 00 00 00 00 00 00 00 00 00 00 00 00   <.L.............
:004015A8 05 00 00 00 04 00 00 00 F0 15 40 00 01 00 5C 00   ..........@...\.
:004015B8 63 6F 6E 73 74 72 65 61 6D 00 00 00 73 17 40 00   constream...s.@.
:004015C8 00 00 00 00 03 00 00 00 00 00 00 00 C3 17 40 00   ..............@.
:004015D8 00 00 00 00 0D 00 00 00 00 00 00 00 8D 16 40 00   ..............@.
:004015E8 08 00 00 00 00 00 00 00                           ........

  =======
  case 2:
  =======

:00402BCC 55                      push ebp
:00402BCD 8BEC                    mov ebp, esp
:00402BCF 8B4D0C                  mov ecx, dword[ebp+0C]
:00402BD2 8B5508                  mov edx, dword[ebp+08]
:00402BD5 8BC1                    mov eax, ecx
:00402BD7 83E003                  and eax, 00000003
:00402BDA FF2485E12B4000          jmp dword[4*eax+00402BE1]

:00402BE1 1B2C4000                DWORD 00402C1B
:00402BE5 F12B4000                DWORD 00402BF1
:00402BE9 FF2B4000                DWORD 00402BFF
:00402BED 0D2C4000                DWORD 00402C0D

:00402BF1 8A01                    mov al, byte[ecx]
:00402BF3 0AC0                    or al, al
:00402BF5 7461                    je 00402C58
:00402BF7 8802                    mov byte[edx], al
:00402BF9 83C101                  add ecx, 00000001
:00402BFC 83C201                  add edx, 00000001

Look at case 1 and case 2 carefully.
Both show a big problem. In case 1 a block of data suddenly
appears right after the code, with no relation to it whatsoever;
in case 2 there are addresses which look like case jump addresses.
Several more kinds of data blocks like these appear inside code blocks,
and some disassemblers are not prepared to handle all of these cases.
W32dsm87.exe handles some cases of data and code separation, but
it is not good at it.
IDA (Interactive Disassembler) only disassembles code blocks
which are referenced somewhere.
If IDA does not know whether a block is referenced, it simply
treats the code block as data.
So either way the result is unsatisfactory.


-------------------------------------------------------------------
7. Can we solve the problem?
-------------------------------------------------------------------

Well, what can I say? The two cases above are taken from an actual
disassembly listing of a program, so you can expect disassem.exe
to do most of the data and code separation.

What do I use to solve this problem?
Do you remember one of the structures I showed you before?
It is the case jump block, and it looks like this:

   <case_jump_block>   =    "0xFF" "0x24" <sib> <label_start_position> <label>*

   <label_start_position>
                       =    <label>

   <label>             =    <double_word>

This rule takes care of the case jump block (at least partially).
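
To make the rule concrete, here is a minimal hand-written sketch of
the same pattern (the real program does this through the preccx
grammar, not through code like this). It recognizes FF 24 followed by
a SIB byte and a 32-bit table address, and reads the labels when the
table starts right after the jmp. The function name and the stopping
rule (stop when the table reaches the lowest label found so far) are
assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* code[0..len) is mapped at virtual address base_va; pos is the offset of
   the candidate jmp. Stores up to max_labels targets in labels[] and
   returns how many were found (0 if this is not a case jump block). */
size_t read_case_jump(const unsigned char *code, size_t len, uint32_t base_va,
                      size_t pos, uint32_t *labels, size_t max_labels)
{
    uint32_t table_va, end_va, limit;
    size_t cur, n = 0;

    assert(code != NULL && labels != NULL);
    if (pos > len || len - pos < 7)
        return 0;
    if (code[pos] != 0xFF || code[pos + 1] != 0x24)
        return 0;                       /* not jmp dword[sib]             */
    if ((code[pos + 2] & 0x07) != 0x05)
        return 0;                       /* SIB has a base register        */
    table_va = read_le32(code + pos + 3);
    if (table_va != (uint32_t)(base_va + pos + 7))
        return 0;                       /* table is not right after jmp   */

    end_va = (uint32_t)(base_va + len);
    limit = end_va;                     /* table cannot run past here     */
    cur = pos + 7;
    while (n < max_labels && len - cur >= 4 &&
           (uint32_t)(base_va + cur) < limit) {
        uint32_t target = read_le32(code + cur);
        if (target < base_va || target >= end_va)
            break;                      /* does not point into the code   */
        labels[n++] = target;
        if (target > table_va && target < limit)
            limit = target;             /* code starts here at the latest */
        cur += 4;
    }
    return n;
}
```

Run on the bytes of case 2 starting at 00402BDA, this finds the four
labels 00402C1B, 00402BF1, 00402BFF and 00402C0D, and stops at
00402BF1, where the code resumes.
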
But how about case 1?
I use several heuristic methods to separate code from data.
I proceed until some problem comes up, erase the part
which caused the problem, and restart from a point which is
known to be an entry point or a label.
Is it that simple? Yes and no.
The basic idea is simple, but many other things have been put together
to make the program work right. If you have any questions about how
it works after studying the source itself,
I am happy to help you out.
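
The basic idea alone can be sketched in a few lines. This is a
simplified illustration, not the code in disassem.exe: the decoder is
a toy which knows only a handful of opcodes (push ebp, pop ebp, nop,
ret, and two forms of mov r32, r/m32) and treats everything else as a
problem, and all names are hypothetical. Each run starts at a known
entry point or label, marks bytes as code while decoding succeeds,
and erases its own marks if it hits a problem. Whatever is still
unmarked at the end is reported as data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { UNKNOWN = 0, CODE = 1, DATA = 2 };

/* Toy decoder (assumption): returns the instruction length or 0 on failure. */
static size_t toy_length(const unsigned char *p, size_t avail, int *ends_flow)
{
    *ends_flow = 0;
    if (avail == 0)
        return 0;
    switch (p[0]) {
    case 0x55: case 0x5D: case 0x90:    /* push ebp, pop ebp, nop */
        return 1;
    case 0xC3:                          /* ret ends the run       */
        *ends_flow = 1;
        return 1;
    case 0x8B:                          /* mov r32, r/m32         */
        if (avail < 2)
            return 0;
        if ((p[1] >> 6) == 3)
            return 2;                   /* register operand       */
        if ((p[1] >> 6) == 1 && (p[1] & 7) != 4)
            return avail >= 3 ? 3 : 0;  /* [reg+disp8] operand    */
        return 0;
    default:
        return 0;
    }
}

/* Decode from start; on a problem, erase what this run marked. */
static int trace_from(const unsigned char *code, size_t len, size_t start,
                      unsigned char *map)
{
    size_t pos = start, n, i;
    int ends;

    while (pos < len) {
        if (map[pos] == CODE)
            return 1;                   /* joined code accepted earlier */
        n = toy_length(code + pos, len - pos, &ends);
        for (i = 0; i < n && map[pos + i] == UNKNOWN; i++)
            ;
        if (n == 0 || i < n)
            break;                      /* decode failure or overlap    */
        memset(map + pos, CODE, n);
        pos += n;
        if (ends)
            return 1;
    }
    memset(map + start, UNKNOWN, pos - start);
    return 0;
}

/* Try every start point, then call the rest data.
   Returns the number of start points that traced cleanly. */
size_t classify(const unsigned char *code, size_t len,
                const size_t *starts, size_t nstarts, unsigned char *map)
{
    size_t i, ok = 0;

    assert(code != NULL && starts != NULL && map != NULL);
    memset(map, UNKNOWN, len);
    for (i = 0; i < nstarts; i++)
        if (starts[i] < len && trace_from(code, len, starts[i], map))
            ok++;
    for (i = 0; i < len; i++)
        if (map[i] == UNKNOWN)
            map[i] = DATA;
    return ok;
}
```

A real version needs a full decoder and must also follow jumps and
calls to discover new start points; the sketch only shows the
mark, erase and restart cycle.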


-----------------------------------------------------------------------
8. What's the next step?
-----------------------------------------------------------------------

I am working on the next version of the disassembler.
I don't know exactly what should be done and what shouldn't.
I would like to make the disassembler at least iterative, even if it is
not interactive. This means that if you find that some part of the data
should be a code block, you can tell the disassembler and it does the
dirty job for you. Or you can tell the disassembler that some part of
the code is not code at all and should be data instead, and it does
what you told it.

There are still some things I haven't done because of time constraints:
   More string handling (I would like to find more strings
      and possibly cross reference string data).
   Data cross referencing.
   Finding hidden (or indirect) call references.

I have to confess that what I would actually like to build is a decompiler.
What is a decompiler? Is it possible to make one?
A decompiler does what a disassembler does, but goes one step further:
it produces source programs (C, C++ or whatever) for given exe files.

There have been some efforts to build a real-life decompiler, but
no one has succeeded in building a usable one.
I have only a vague idea of how this can be done, but there are some
reasons to believe it is possible.



--------------------------------------------------------------------------
9. Some explanation of each file and directory.
--------------------------------------------------------------------------

First, preccx.zip contains most of the preccx package.
I deleted the library part and some other things which seemed
not that important. If you wish to obtain the whole package,
you can find it at "http://www.comlab.ox.ac.uk:80/archive/redo/".

Second, disasm.zip contains the whole source for the disassem.exe program.
It consists of main.c, pedump.c, print1.c, g1.c, ccx.c, ccx.h, and cc.bat,
which is a batch file to compile the sources using the DJGPP compiler.

g1.y is the preccx script for the disassembler. g1.c is the
C source file generated from g1.y.

intro.txt is what you are reading right now.