Question: Is there any hope of a decompiler that would convert an executable program into C/C++ code?

Answer: Don't hold your breath. Think about it... For a decompiler to work properly, either 1) every compiler would have to generate substantially identical code, even with full optimization turned on, or 2) it would have to recognize the individual output of every compiler's code generator.

If the first case were to be correct, there would be no more need for compiler benchmarks since every one would work the same. For the second case to be true would require an immensely complex program that had to change with every new compiler release.

OK, so what about specific decompilers for specific compilers - say a decompiler designed to only work on code generated by, say, BC++ 4.52? This gets us right back to the optimization issue. Code written for clarity and understandability is often inefficient. Code written for maximum performance (speed or size) is often cryptic (at best!). Add to this the fact that all modern compilers have a multitude of optimization switches to control which optimization techniques to enable and which to avoid. The bottom line is that, for a reasonably large, complex source module, you can get the compiler to produce a number of different object modules simply by changing your optimization switches, so your decompiler will also have to be a deoptimizer which can automagically recognize which optimization strategies were enabled at compile time.

OK, let's simplify further and specify that you only want to support one specific compiler and you want to decompile to the most logical source code without trying to interpret the optimization. What then? A good optimizer can and will substantially rewrite the internals of your code, so what you get out of your decompiler will be not only cryptic but, in many cases, riddled with goto statements and other no-no's of good coding practice. At this point, you have decompiled source, but what good is it?
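To make the "riddled with goto statements" point concrete, here's a small sketch in C. The first function is ordinary readable source. The second is a hand-written imitation of the kind of thing a decompiler might hand back for the same routine; it is not the output of any real tool, and the names sub_0001, a1, a2, v1 and v2 are made up for illustration. Both compute the same result, but only one of them tells you what it's for.

```c
#include <assert.h>
#include <stddef.h>

/* Readable version, as a programmer might write it. */
int sum_positive(const int *values, size_t count)
{
    int total = 0;

    for (size_t i = 0; i < count; ++i) {
        if (values[i] > 0)
            total += values[i];
    }
    return total;
}

/* Imitation decompiler output (an assumption, not real tool output):
   the names are gone and the loop survives only as labels and jumps. */
int sub_0001(const int *a1, size_t a2)
{
    int v1 = 0;
    size_t v2 = 0;

    goto L2;
L1:
    if (a1[v2] <= 0)
        goto L3;
    v1 += a1[v2];
L3:
    ++v2;
L2:
    if (v2 < a2)
        goto L1;
    return v1;
}

/* Quick check that the two versions agree; call it from main(). */
void check_equivalence(void)
{
    static const int sample[] = { 3, -1, 4, 0, -5, 2 };

    assert(sum_positive(sample, 6) == 9);
    assert(sub_0001(sample, 6) == 9);
}
```

Note that this is about as friendly as it gets: one tiny function, no inlining, no library code mixed in.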
Also note carefully my reference to source modules. One characteristic of C is that it becomes largely unreadable unless broken into easily maintainable source modules (.C files). How will the decompiler deal with that? It could either try to decompile the whole program into some mammoth main() function, losing all modularity, or it could try to place each called function into its own file. The first way would generate unusable chaos and the second would run into problems where the original source had files with multiple functions using static data and/or one or more functions calling one or more static functions. A decompiler could make static data and/or functions global but only at the expense of readability (which would already be unacceptable).

Also, remember that commercial applications often code the most difficult or time-critical functions in assembler, which could prove almost impossible to decompile into a C equivalent.

Closely related to the issue of modularity is that of library code. Consider the ubiquitous "Hello world" program. After compilation it contains about 10 bytes of compiled source, about a dozen bytes of data, and anywhere from 5-10K (depending on compiler, target, memory model, etc.) of startup and library code. This is a great example since printf() also calls *lots* of other library functions of its own! Once the decompiler has assigned names to the dozen or so functions in its output, the fun starts when you have to figure out which arbitrarily-named function is really printf() and which other functions are library helper functions that it calls. The bottom line here is that in order to do so, you'd have to know enough about writing C libraries to be able to recognize the code for printf() when you see it.

Again, the situation with C++ would be orders of magnitude more complex trying to make sense of the compiled code once the O-O structures and relationships had been compiled into oblivion.
Even if you take the simple approach and decompile C++ into C, would anyone like to try and trace through the source to figure out a cout call which adds another 7-10K of overhead vis-a-vis a printf() call? I sure wouldn't!!!

So what do you have? For a small program, you'd wind up trying to decipher what is mostly library source. For a large program, you'd wind up with either 1) one humongous main(), or 2) lots of arbitrary single-function modules from which all notions of static data and functions would have been lost (contributing to a vast pool of global data), which would still include decompiled source for all the library objects as well. In any scenario, is any of this useful? Probably not.

While we've touched on the topic of library code, here's yet another reason that C and C++ are particularly difficult to de-compile: macros. For instance, if I have something like:

    while (EOF != (ch = getchar())) {
        if (isupper(ch))
            putchar(ch);
    }

getchar, EOF, putchar and isupper are all typically macros, something like:

    #define EOF (-1)
    #define isupper(x) (__types[(unsigned char)(x)+1] & __UPPER)
    #define getchar() (getc(stdin))
    #define putchar(c) (putc((c),stdout))
    #define getc(s) ((s)->__pos<(s)->__len? \
                     (s)->__buf[(s)->__pos++]: \
                     filbuf(s))
    #define putc(c,s) ((s)->__pos<(s)->__len? \
                       (s)->__buf[(s)->__pos++]=(c): \
                       putbuf((s),(c)))

Finally, stdin and stdout are generally just items in an array of FILE pointers something like:

    FILE __iobuf[20];
    FILE *stdin = __iobuf;       // This part is done silently by the
    FILE *stdout = __iobuf + 1;  // compiler, without actual source code
    FILE *stderr = __iobuf + 2;

Even if you just expand the macros and never actually compile the code at all, you end up with something that's basically unreadable.
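Here's a rough sketch of what that loop looks like once the macros have been expanded by hand. So that it compiles on its own, struct stream, the types table, UPPER_BIT, and the fill() and flush() helpers are all invented stand-ins for a library's private internals (the __pos, __len, filbuf and putbuf above). Treat it as an illustration, not as any real library's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffered stream standing in for a library's FILE. */
struct stream {
    size_t pos;             /* next index to read or write  */
    size_t len;             /* bytes available, or capacity */
    unsigned char buf[64];
};

#define UPPER_BIT 0x01      /* assumed flag value */

/* Assumed classification table: entry c + 1 describes character c,
   leaving index 0 for EOF (-1). Only 'A'..'Z' are filled in, and the
   fill loop assumes a character set where they are contiguous. */
static unsigned char types[257];

static void init_types(void)
{
    for (int c = 'A'; c <= 'Z'; ++c)
        types[c + 1] = UPPER_BIT;
}

/* Stand-ins for the slow paths: input simply ends, overflow is dropped. */
static int fill(struct stream *s)         { (void)s; return -1; }
static int flush(struct stream *s, int c) { (void)s; (void)c; return -1; }

/* The getchar/isupper/putchar loop with every macro expanded by hand. */
void copy_upper(struct stream *in, struct stream *out)
{
    int ch;

    while (-1 != (ch = (in->pos < in->len ? in->buf[in->pos++] : fill(in)))) {
        if (types[(unsigned char)ch + 1] & UPPER_BIT)
            (void)(out->pos < out->len
                       ? (out->buf[out->pos++] = (unsigned char)ch)
                       : flush(out, ch));
    }
}

/* Small usage example; call it from main(). */
void demo(void)
{
    struct stream in  = { 0, 4, "aBcD" };
    struct stream out = { 0, 64, { 0 } };

    init_types();
    copy_upper(&in, &out);
    assert(out.pos == 2 && out.buf[0] == 'B' && out.buf[1] == 'D');
}
```

Three short lines of source have turned into the body of copy_upper(), and that's before an optimizer has touched anything.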
However, this is what actually gets fed to the compiler, so it's also the absolute best you could ever hope for from a perfect de-compiler.

C++ of course adds in-line functions, and after an optimizer runs across things, the code from the in-line function may well be mixed in with surrounding code, making it nearly impossible to extract the function from the code that calls it. There are only a few formats in use for vtables, which would help in preserving virtual functions, but inline functions would be lost, so you'd typically end up with hundreds of places where code would be directly accessing variables in other classes.

Like I said, don't hold your breath. As technology improves to where decompilers may become more feasible, optimizers and languages (C++, for example, would be a significantly tougher language to decompile than C) also conspire to make them less likely.

For years Unix applications have been distributed in shrouded source form (machine but not human readable -- all comments and whitespace removed, variable names all in the form OOIIOIOI, etc.), which has been a quite adequate means of protecting the author's rights. It's very unlikely that decompiler output would even be as readable as shrouded source.

A general purpose decompiler is the Holy Grail of tyro programmers.

[by Bob Stout & Jerry Coffin]