As part of NASM's drive for simplicity, it also does not support the
\c{ASSUME} directive. NASM will not keep track of what values you
choose to put in your segment registers, and will never
\e{automatically} generate a \i{segment override} prefix.

\S{qsmodel} NASM Doesn't Support \i{Memory Models}

NASM also does not have any directives to support different 16-bit
memory models. The programmer has to keep track of which functions
are supposed to be called with a \i{far call} and which with a
\i{near call}, and is responsible for putting the correct form of
\c{RET} instruction (\c{RETN} or \c{RETF}; NASM accepts \c{RET}
itself as an alternate form for \c{RETN}); in addition, the
programmer is responsible for coding CALL FAR instructions where
necessary when calling \e{external} functions, and must also keep
track of which external variable definitions are far and which are
near.

\S{qsfpu} \i{Floating-Point} Differences

NASM uses different names to refer to floating-point registers from
MASM: where MASM would call them \c{ST(0)}, \c{ST(1)} and so on, and
\i\c{a86} would call them simply \c{0}, \c{1} and so on, NASM
chooses to call them \c{st0}, \c{st1} etc.

As of version 0.96, NASM now treats the instructions with
\i{`nowait'} forms in the same way as MASM-compatible assemblers.
The idiosyncratic treatment employed by 0.95 and earlier was based
on a misunderstanding by the authors.

\S{qsother} Other Differences

For historical reasons, NASM uses the keyword \i\c{TWORD} where MASM
and compatible assemblers use \i\c{TBYTE}.

NASM does not declare \i{uninitialised storage} in the same way as
MASM: where a MASM programmer might use \c{stack db 64 dup (?)},
NASM requires \c{stack resb 64}, intended to be read as `reserve 64
bytes'. For a limited amount of compatibility, since NASM treats
\c{?} as a valid character in symbol names, you can code \c{? equ 0}
and then writing \c{dw ?} will at least do something vaguely useful.
\I\c{RESB}\i\c{DUP} is still not a supported syntax, however.

In addition to all of this, macros and directives work completely
differently to MASM. See \k{preproc} and \k{directive} for further
details.

\C{lang} The NASM Language

\H{syntax} Layout of a NASM Source Line

Like most assemblers, each NASM source line contains (unless it
is a macro, a preprocessor directive or an assembler directive: see
\k{preproc} and \k{directive}) some combination of the four fields

\c label:    instruction operands        ; comment

As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.

NASM uses backslash (\\) as the line continuation character; if a line
ends with backslash, the next line is considered to be a part of the
backslash-ended line.

NASM places no restrictions on white space within a line: labels may
have white space before them, or instructions may have no space
before them, or anything. The \i{colon} after a label is also
optional. (Note that this means that if you intend to code \c{lodsb}
alone on a line, and type \c{lodab} by accident, then that's still a
valid source line which does nothing but define a label. Running
NASM with the command-line option
\I{orphan-labels}\c{-w+orphan-labels} will cause it to warn you if
you define a label alone on a line without a \i{trailing colon}.)

\i{Valid characters} in labels are letters, numbers, \c{_}, \c{$},
\c{#}, \c{@}, \c{~}, \c{.}, and \c{?}.
The only characters which may be used as the \e{first} character of an
identifier are letters,
\c{.} (with special meaning: see \k{locallab}), \c{_} and \c{?}.
An identifier may also be prefixed with a \I{$, prefix}\c{$} to
indicate that it is intended to be read as an identifier and not a
reserved word; thus, if some other module you are linking with
defines a symbol called \c{eax}, you can refer to \c{$eax} in NASM
code to distinguish the symbol from the register.

The instruction field may contain any machine instruction: Pentium
and P6 instructions, FPU instructions, MMX instructions and even
undocumented instructions are all supported. The instruction may be
prefixed by \c{LOCK}, \c{REP}, \c{REPE}/\c{REPZ} or
\c{REPNE}/\c{REPNZ}, in the usual way. Explicit \I{address-size
prefixes}address-size and \i{operand-size prefixes} \c{A16},
\c{A32}, \c{O16} and \c{O32} are provided - one example of their use
is given in \k{mixsize}. You can also use the name of a \I{segment
override}segment register as an instruction prefix: coding
\c{es mov [bx],ax} is equivalent to coding \c{mov [es:bx],ax}. We
recommend the latter syntax, since it is consistent with other
syntactic features of the language, but for instructions such as
\c{LODSB}, which has no operands and yet can require a segment
override, there is no clean syntactic way to proceed apart from
\c{es lodsb}.

An instruction is not required to use a prefix: prefixes such as
\c{CS}, \c{A32}, \c{LOCK} or \c{REPE} can appear on a line by
themselves, and NASM will just generate the prefix bytes.

In addition to actual machine instructions, NASM also supports a
number of pseudo-instructions, described in \k{pseudop}.

Instruction \i{operands} may take a number of forms: they can be
registers, described simply by the register name (e.g.
\c{ax}, \c{bp}, \c{ebx}, \c{cr0}: NASM does not use the \c{gas}-style
syntax in which register names must be prefixed by a \c{%} sign), or
they can be \i{effective addresses} (see \k{effaddr}), constants
(\k{const}) or expressions (\k{expr}).

For \i{floating-point} instructions, NASM accepts a wide range of
syntaxes: you can use two-operand forms like MASM supports, or you
can use NASM's native single-operand forms in most cases. Details of
all forms of each supported instruction are given in
\k{iref}. For example, you can code:

\c         fadd    st1             ; this sets st0 := st0 + st1
\c         fadd    st0,st1         ; so does this
\c
\c         fadd    st1,st0         ; this sets st1 := st1 + st0
\c         fadd    to st1          ; so does this

Almost any floating-point instruction that references memory must
use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to
indicate what size of \i{memory operand} it refers to.

\H{pseudop} \i{Pseudo-Instructions}

Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because
that's the most convenient place to put them. The current
pseudo-instructions are \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ} and
\i\c{DT}, their \i{uninitialised} counterparts \i\c{RESB},
\i\c{RESW}, \i\c{RESD}, \i\c{RESQ} and \i\c{REST}, the \i\c{INCBIN}
command, the \i\c{EQU} command, and the \i\c{TIMES} prefix.

\S{db} \c{DB} and friends: Declaring Initialised Data

\i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ} and \i\c{DT} are used, much
as in MASM, to declare initialised data in the output file.
They can be invoked in a wide range of ways:
\I{floating-point}\I{character constant}\I{string constant}

\c       db    0x55                ; just the byte 0x55
\c       db    0x55,0x56,0x57      ; three bytes in succession
\c       db    'a',0x55            ; character constants are OK
\c       db    'hello',13,10,'$'   ; so are string constants
\c       dw    0x1234              ; 0x34 0x12
\c       dw    'a'                 ; 0x61 0x00 (it's just a number)
\c       dw    'ab'                ; 0x61 0x62 (character constant)
\c       dw    'abc'               ; 0x61 0x62 0x63 0x00 (string)
\c       dd    0x12345678          ; 0x78 0x56 0x34 0x12
\c       dd    1.234567e20         ; floating-point constant
\c       dq    1.234567e20         ; double-precision float
\c       dt    1.234567e20         ; extended-precision float

\c{DQ} and \c{DT} do not accept \i{numeric constants} or string
constants as operands.

\S{resb} \c{RESB} and friends: Declaring \i{Uninitialised} Data

\i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ} and \i\c{REST} are
designed to be used in the BSS section of a module: they declare
\e{uninitialised} storage space. Each takes a single operand, which
is the number of bytes, words, doublewords or whatever to reserve.
As stated in \k{qsother}, NASM does not support the MASM/TASM syntax
of reserving uninitialised space by writing \I\c{?}\c{DW ?} or
similar things: this is what it does instead. The operand to a
\c{RESB}-type pseudo-instruction is a \i\e{critical expression}: see
\k{crit}.

For example:

\c buffer:         resb    64              ; reserve 64 bytes
\c wordvar:        resw    1               ; reserve a word
\c realarray       resq    10              ; array of ten reals

\S{incbin} \i\c{INCBIN}: Including External \i{Binary Files}

\c{INCBIN} is borrowed from the old Amiga assembler \i{DevPac}: it
includes a binary file verbatim into the output file. This can be
handy for (for example) including \i{graphics} and \i{sound} data
directly into a game executable file.
It can be called in one of
these three ways:

\c     incbin  "file.dat"             ; include the whole file
\c     incbin  "file.dat",1024        ; skip the first 1024 bytes
\c     incbin  "file.dat",1024,512    ; skip the first 1024, and
\c                                    ; actually include at most 512

\S{equ} \i\c{EQU}: Defining Constants

\c{EQU} defines a symbol to a given constant value: when \c{EQU} is
used, the source line must contain a label. The action of \c{EQU} is
to define the given label name to the value of its (only) operand.
This definition is absolute, and cannot change later. So, for
example,

\c message         db      'hello, world'
\c msglen          equ     $-message

defines \c{msglen} to be the constant 12. \c{msglen} may not then be
redefined later. This is not a \i{preprocessor} definition either:
the value of \c{msglen} is evaluated \e{once}, using the value of
\c{$} (see \k{expr} for an explanation of \c{$}) at the point of
definition, rather than being evaluated wherever it is referenced
and using the value of \c{$} at the point of reference. Note that
the operand to an \c{EQU} is also a \i{critical expression}
(\k{crit}).

\S{times} \i\c{TIMES}: \i{Repeating} Instructions or Data

The \c{TIMES} prefix causes the instruction to be assembled multiple
times. This is partly present as NASM's equivalent of the \i\c{DUP}
syntax supported by \i{MASM}-compatible assemblers, in that you can
code

\c zerobuf:        times 64 db 0

or similar things; but \c{TIMES} is more versatile than that. The
argument to \c{TIMES} is not just a numeric constant, but a numeric
\e{expression}, so you can do things like

\c buffer: db      'hello, world'
\c         times 64-$+buffer db ' '

which will store exactly enough spaces to make the total length of
\c{buffer} up to 64.
Finally, \c{TIMES} can be applied to ordinary
instructions, so you can code trivial \i{unrolled loops} in it:

\c         times 100 movsb

Note that there is no effective difference between
\c{times 100 resb 1} and \c{resb 100}, except that the latter will be
assembled about 100 times faster due to the internal structure of the
assembler.

The operand to \c{TIMES}, like that of \c{EQU} and those of \c{RESB}
and friends, is a critical expression (\k{crit}).

Note also that \c{TIMES} can't be applied to \i{macros}: the reason
for this is that \c{TIMES} is processed after the macro phase, which
allows the argument to \c{TIMES} to contain expressions such as
\c{64-$+buffer} as above. To repeat more than one line of code, or a
complex macro, use the preprocessor \i\c{%rep} directive.

\H{effaddr} Effective Addresses

An \i{effective address} is any operand to an instruction which
\I{memory reference}references memory. Effective addresses, in NASM,
have a very simple syntax: they consist of an expression evaluating
to the desired address, enclosed in \i{square brackets}. For
example:

\c wordvar dw      123
\c         mov     ax,[wordvar]
\c         mov     ax,[wordvar+1]
\c         mov     ax,[es:wordvar+bx]

Anything not conforming to this simple system is not a valid memory
reference in NASM, for example \c{es:wordvar[bx]}.

More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:

\c         mov     eax,[ebx*2+ecx+offset]
\c         mov     ax,[bp+di+8]

NASM is capable of doing \i{algebra} on these effective addresses,
so that things which don't necessarily \e{look} legal are perfectly
all right:

\c     mov     eax,[ebx*5]             ; assembles as [ebx*4+ebx]
\c     mov     eax,[label1*2-label2]   ; ie [label1+(label1-label2)]

Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can.
For example, there are distinct assembled forms for the 32-bit effective
addresses \c{[eax*2+0]} and \c{[eax+eax]}, and NASM will generally
generate the latter on the grounds that the former requires four
bytes to store a zero offset.

NASM has a hinting mechanism which will cause \c{[eax+ebx]} and
\c{[ebx+eax]} to generate different opcodes; this is occasionally
useful because \c{[esi+ebp]} and \c{[ebp+esi]} have different
default segment registers.

However, you can force NASM to generate an effective address in a
particular form by the use of the keywords \c{BYTE}, \c{WORD},
\c{DWORD} and \c{NOSPLIT}. If you need \c{[eax+3]} to be assembled
using a double-word offset field instead of the one byte NASM will
normally generate, you can code \c{[dword eax+3]}. Similarly, you
can force NASM to use a byte offset for a small value which it
hasn't seen on the first pass (see \k{crit} for an example of such a
code fragment) by using \c{[byte eax+offset]}. As special cases,
\c{[byte eax]} will code \c{[eax+0]} with a byte offset of zero, and
\c{[dword eax]} will code it with a double-word offset of zero. The
normal form, \c{[eax]}, will be coded with no offset field.

The form described in the previous paragraph is also useful if you
are trying to access data in a 32-bit segment from within 16-bit code.
For more information on this see the section on mixed-size addressing
(\k{mixaddr}). In particular, if you need to access data with a known
offset that is larger than will fit in a 16-bit value, if you don't
specify that it is a dword offset, NASM will cause the high word of
the offset to be lost.

Similarly, NASM will split \c{[eax*2]} into \c{[eax+eax]} because