<html><head><title>NASM Manual</title></head>
<body><h1 align=center>The Netwide Assembler: NASM</h1>
<p align=center><a href="nasmdoc9.html">Next Chapter</a> |
<a href="nasmdoc7.html">Previous Chapter</a> |
<a href="nasmdoc0.html">Contents</a> |
<a href="nasmdoci.html">Index</a>
<h2><a name="chapter-8">Chapter 8: Writing 32-bit Code (Unix, Win32, DJGPP)</a></h2>
<p>This chapter attempts to cover some of the common issues involved when writing 32-bit code, to run under Win32 or Unix, or to be linked with C code generated by a Unix-style C compiler such as DJGPP. It covers how to write assembly code to interface with 32-bit C routines, and how to write position-independent code for shared libraries.
<p>Almost all 32-bit code, and in particular all code running under <code><nobr>Win32</nobr></code>, <code><nobr>DJGPP</nobr></code> or any of the PC Unix variants, runs in a <em>flat</em> memory model. This means that the segment registers and paging have already been set up to give you the same 32-bit 4GB address space no matter what segment you work relative to, and that you should ignore all segment registers completely. When writing flat-model application code, you never need to use a segment override or modify any segment register, and the code-section addresses you pass to <code><nobr>CALL</nobr></code> and <code><nobr>JMP</nobr></code> live in the same address space as the data-section addresses you access your variables by and the stack-section addresses you access local variables and procedure parameters by. Every address is 32 bits long and contains only an offset part.
<h3><a name="section-8.1">8.1 Interfacing to 32-bit C Programs</a></h3>
<p>A lot of the discussion in <a href="nasmdoc7.html#section-7.4">section 7.4</a>, about interfacing to 16-bit C programs, still applies when working in 32 bits.
The absence of memory models or segmentation worries simplifies things a lot.
<h4><a name="section-8.1.1">8.1.1 External Symbol Names</a></h4>
<p>Most 32-bit C compilers share the convention used by 16-bit compilers, that the names of all global symbols (functions or data) they define are formed by prefixing an underscore to the name as it appears in the C program. However, not all of them do: the <code><nobr>ELF</nobr></code> specification states that C symbols do <em>not</em> have a leading underscore on their assembly-language names.
<p>The older Linux <code><nobr>a.out</nobr></code> C compiler, all <code><nobr>Win32</nobr></code> compilers, <code><nobr>DJGPP</nobr></code>, and <code><nobr>NetBSD</nobr></code> and <code><nobr>FreeBSD</nobr></code>, all use the leading underscore; for these compilers, the macros <code><nobr>cextern</nobr></code> and <code><nobr>cglobal</nobr></code>, as given in <a href="nasmdoc7.html#section-7.4.1">section 7.4.1</a>, will still work. For <code><nobr>ELF</nobr></code>, though, the leading underscore should not be used.
<p>See also <a href="nasmdoc2.html#section-2.1.21">section 2.1.21</a>.
<h4><a name="section-8.1.2">8.1.2 Function Definitions and Function Calls</a></h4>
<p>The C calling convention in 32-bit programs is as follows.
In the following description, the words <em>caller</em> and <em>callee</em> are used to denote the function doing the calling and the function which gets called.
<ul>
<li>The caller pushes the function's parameters on the stack, one after another, in reverse order (right to left, so that the first argument specified to the function is pushed last).
<li>The caller then executes a near <code><nobr>CALL</nobr></code> instruction to pass control to the callee.
<li>The callee receives control, and typically (although this is not actually necessary, in functions which do not need to access their parameters) starts by saving the value of <code><nobr>ESP</nobr></code> in <code><nobr>EBP</nobr></code> so as to be able to use <code><nobr>EBP</nobr></code> as a base pointer to find its parameters on the stack. However, the caller was probably doing this too, so part of the calling convention states that <code><nobr>EBP</nobr></code> must be preserved by any C function. Hence the callee, if it is going to set up <code><nobr>EBP</nobr></code> as a frame pointer, must push the previous value first.
<li>The callee may then access its parameters relative to <code><nobr>EBP</nobr></code>. The doubleword at <code><nobr>[EBP]</nobr></code> holds the previous value of <code><nobr>EBP</nobr></code> as it was pushed; the next doubleword, at <code><nobr>[EBP+4]</nobr></code>, holds the return address, pushed implicitly by <code><nobr>CALL</nobr></code>. The parameters start after that, at <code><nobr>[EBP+8]</nobr></code>. The leftmost parameter of the function, since it was pushed last, is accessible at this offset from <code><nobr>EBP</nobr></code>; the others follow, at successively greater offsets.
Thus, in a function such as <code><nobr>printf</nobr></code> which takes a variable number of parameters, the pushing of the parameters in reverse order means that the function knows where to find its first parameter, which tells it the number and type of the remaining ones.
<li>The callee may also wish to decrease <code><nobr>ESP</nobr></code> further, so as to allocate space on the stack for local variables, which will then be accessible at negative offsets from <code><nobr>EBP</nobr></code>.
<li>The callee, if it wishes to return a value to the caller, should leave the value in <code><nobr>AL</nobr></code>, <code><nobr>AX</nobr></code> or <code><nobr>EAX</nobr></code> depending on the size of the value. Floating-point results are typically returned in <code><nobr>ST0</nobr></code>.
<li>Once the callee has finished processing, it restores <code><nobr>ESP</nobr></code> from <code><nobr>EBP</nobr></code> if it had allocated local stack space, then pops the previous value of <code><nobr>EBP</nobr></code>, and returns via <code><nobr>RET</nobr></code> (equivalently, <code><nobr>RETN</nobr></code>).
<li>When the caller regains control from the callee, the function parameters are still on the stack, so it typically adds an immediate constant to <code><nobr>ESP</nobr></code> to remove them (instead of executing a number of slow <code><nobr>POP</nobr></code> instructions). Thus, if a function is accidentally called with the wrong number of parameters due to a prototype mismatch, the stack will still be returned to a sensible state since the caller, which <em>knows</em> how many parameters it pushed, does the removing.
</ul>
<p>There is an alternative calling convention used by Win32 programs for Windows API calls, and also for functions called <em>by</em> the Windows API such as window procedures: they follow what Microsoft calls the <code><nobr>__stdcall</nobr></code> convention.
This is slightly closer to the Pascal convention, in that the callee clears the stack by passing a parameter to the <code><nobr>RET</nobr></code> instruction. However, the parameters are still pushed in right-to-left order.
<p>Thus, you would define a function in C style in the following way:
<p><pre>
global  _myfunc

_myfunc:
        push    ebp
        mov     ebp,esp
        sub     esp,0x40        ; 64 bytes of local stack space
        mov     ebx,[ebp+8]     ; first parameter to function

        ; some more code

        leave                   ; mov esp,ebp / pop ebp
        ret
</pre>
<p>At the other end of the process, to call a C function from your assembly code, you would do something like this:
<p><pre>
extern  _printf

        ; and then, further down...

        push    dword [myint]   ; one of my integer variables
        push    dword mystring  ; pointer into my data segment
        call    _printf
        add     esp,byte 8      ; `byte' saves space

        ; then those data items...

segment _DATA

myint       dd   1234
mystring    db   'This number -&gt; %d &lt;- should be 1234',10,0
</pre>
<p>This piece of code is the assembly equivalent of the C code
<p><pre>
    int myint = 1234;
    printf("This number -&gt; %d &lt;- should be 1234\n", myint);
</pre>
<h4><a name="section-8.1.3">8.1.3 Accessing Data Items</a></h4>
<p>To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as <code><nobr>GLOBAL</nobr></code> or <code><nobr>EXTERN</nobr></code>. (Again, the names require leading underscores, as stated in <a href="#section-8.1.1">section 8.1.1</a>.) Thus, a C variable declared as <code><nobr>int i</nobr></code> can be accessed from assembler as
<p><pre>
          extern _i
          mov eax,[_i]
</pre>
<p>And to declare your own integer variable which C programs can access as <code><nobr>extern int j</nobr></code>, you do this (making sure you are assembling in the <code><nobr>_DATA</nobr></code> segment, if necessary):
<p><pre>
          global _j
_j        dd 0
</pre>
<p>To access a C array, you need to know the size of the components of the array.
For example, <code><nobr>int</nobr></code> variables are four bytes long, so if a C program declares an array as <code><nobr>int a[10]</nobr></code>, you can access <code><nobr>a[3]</nobr></code> by coding <code><nobr>mov eax,[_a+12]</nobr></code>. (The byte offset 12 is obtained by multiplying the desired array index, 3, by the size of the array element, 4.) The sizes of the C base types in 32-bit compilers are: 1 for <code><nobr>char</nobr></code>, 2 for <code><nobr>short</nobr></code>, 4 for <code><nobr>int</nobr></code>, <code><nobr>long</nobr></code> and <code><nobr>float</nobr></code>, and 8 for <code><nobr>double</nobr></code>. Pointers, being 32-bit addresses, are also 4 bytes long.
<p>To access a C data structure, you need to know the offset from the base of the structure to the field you are interested in. You can either do this by converting the C structure definition into a NASM structure definition (using <code><nobr>STRUC</nobr></code>), or by calculating the one offset and using just that.
<p>To do either of these, you should read your C compiler's manual to find out how it organises data structures. NASM gives no special alignment to structure members in its own <code><nobr>STRUC</nobr></code> macro, so you have to specify alignment yourself if the C compiler generates it. Typically, you might find that a structure like
<p><pre>
struct {
    char c;
    int i;
} foo;
</pre>
<p>might be eight bytes long rather than five, since the <code><nobr>int</nobr></code> field would be aligned to a four-byte boundary. However, this sort of feature is sometimes a configurable option in the C compiler, either using command-line options or <code><nobr>#pragma</nobr></code> lines, so you have to find out how your own compiler does it.
<h4><a name="section-8.1.4">8.1.4 <code><nobr>c32.mac</nobr></code>: Helper Macros for the 32-bit C Interface</a></h4>
<p>Included in the NASM archives, in the <code><nobr>misc</nobr></code> directory, is a file <code><nobr>c32.mac</nobr></code> of macros.
It defines three macros: <code><nobr>proc</nobr></code>, <code><nobr>arg</nobr></code> and <code><nobr>endproc</nobr></code>. These are intended to be used for C-style procedure definitions, and they automate a lot of the work involved in keeping track of the calling convention.
<p>An example of an assembly function using the macro set is given here:
<p><pre>
proc    _proc32

%$i     arg
%$j     arg
        mov     eax,[ebp + %$i]
        mov     ebx,[ebp + %$j]
        add     eax,[ebx]

endproc
</pre>
<p>This defines <code><nobr>_proc32</nobr></code> to be a procedure taking two arguments, the first (<code><nobr>i</nobr></code>) an integer and the second (<code><nobr>j</nobr></code>) a pointer to an integer. It returns <code><nobr>i + *j</nobr></code>.
<p>Note that the <code><nobr>arg</nobr></code> macro has an <code><nobr>EQU</nobr></code> as the first line of its expansion, and since the label before the macro call gets prepended to the first line of the expanded macro, the <code><nobr>EQU</nobr></code> works, defining <code><nobr>%$i</nobr></code> to be an offset from <code><nobr>EBP</nobr></code>. A context-local variable is used, local to the context pushed by the <code><nobr>proc</nobr></code> macro and popped by the <code><nobr>endproc</nobr></code> macro, so that the same argument name can be used in later procedures. Of course, you don't <em>have</em> to do that.
<p><code><nobr>arg</nobr></code> can take an optional parameter, giving the size of the argument. If no size is given, 4 is assumed, since it is likely that many function parameters will be of type <code><nobr>int</nobr></code> or pointers.
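<p>As a rough sketch of the optional size parameter, the following hypothetical function takes a <code><nobr>double</nobr></code> followed by an <code><nobr>int</nobr></code> and returns their product in <code><nobr>ST0</nobr></code>. The name <code><nobr>_scale</nobr></code> and its parameters are invented for illustration, and the sketch assumes that <code><nobr>c32.mac</nobr></code> is on the include path and that your C compiler passes a <code><nobr>double</nobr></code> as eight bytes on the stack:
<p><pre>
%include "c32.mac"              ; assumed to be reachable on the include path

; hypothetical C prototype: double scale(double x, int n);
proc    _scale

%$x     arg     8               ; double: explicit size of 8 bytes
%$n     arg                     ; int: no size given, so 4 is assumed
        fld     qword [ebp + %$x]   ; ST0 = x
        fimul   dword [ebp + %$n]   ; ST0 = x * n, left in ST0 as the result

endproc
</pre>
<p>If the macros start counting at offset 8, just past the saved <code><nobr>EBP</nobr></code> and the return address as described in <a href="#section-8.1.2">section 8.1.2</a>, then <code><nobr>%$x</nobr></code> would be 8 and <code><nobr>%$n</nobr></code> would be 16. Giving the size explicitly is what keeps the offset of every later argument correct when an earlier one is wider than four bytes.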