nasmdoc.src

来自「一个汇编语言编译器源码」· SRC 代码 · 共 1,545 行 · 第 1/5 页
SRC
1,545 行

\b \i\c{orphan-labels} covers warnings about source lines which
contain no instruction but define a label without a trailing colon.
NASM does not warn about this somewhat obscure condition by default;
see \k{syntax} for an example of why you might want it to.

\b \i\c{number-overflow} covers warnings about numeric constants which
don't fit in 32 bits (for example, it's easy to type one too many Fs
and produce \c{0x7ffffffff} by mistake). This warning class is
enabled by default.

\S{nasmenv} The \c{NASM} \i{Environment} Variable

If you define an environment variable called \c{NASM}, the program
will interpret it as a list of extra command-line options, which are
processed before the real command line. You can use this to define
standard search directories for include files, by putting \c{-i}
options in the \c{NASM} variable.

The value of the variable is split up at white space, so that the
value \c{-s -ic:\\nasmlib} will be treated as two separate options.
However, that means that the value \c{-dNAME="my name"} won't do
what you might want, because it will be split at the space and the
NASM command-line processing will get confused by the two
nonsensical words \c{-dNAME="my} and \c{name"}.

To get round this, NASM provides a feature whereby, if you begin the
\c{NASM} environment variable with some character that isn't a minus
sign, then NASM will treat this character as the \i{separator
character} for options. So setting the \c{NASM} variable to the
value \c{!-s!-ic:\\nasmlib} is equivalent to setting it to \c{-s
-ic:\\nasmlib}, but \c{!-dNAME="my name"} will work.

\H{qstart} \i{Quick Start} for \i{MASM} Users

If you're used to writing programs with MASM, or with \i{TASM} in
MASM-compatible (non-Ideal) mode, or with \i\c{a86}, this section
attempts to outline the major differences between MASM's syntax and
NASM's. If you're not already used to MASM, it's probably worth
skipping this section.

\S{qscs} NASM Is \I{case sensitivity}Case-Sensitive

One simple difference is that NASM is case-sensitive. It makes a
difference whether you call your label \c{foo}, \c{Foo} or \c{FOO}.
If you're assembling to DOS or OS/2 \c{.OBJ} files, you can invoke
the \i\c{UPPERCASE} directive (documented in \k{objfmt}) to ensure
that all symbols exported to other code modules are forced to be
upper case; but even then, \e{within} a single module, NASM will
distinguish between labels differing only in case.

\S{qsbrackets} NASM Requires \i{Square Brackets} For \i{Memory References}

NASM was designed with simplicity of syntax in mind. One of the
\i{design goals} of NASM is that it should be possible, as far as is
practical, for the user to look at a single line of NASM code
and tell what opcode is generated by it. You can't do this in MASM:
if you declare, for example,

\c foo       equ 1
\c bar       dw 2

then the two lines of code

\c           mov ax,foo
\c           mov ax,bar

generate completely different opcodes, despite having
identical-looking syntaxes.

NASM avoids this undesirable situation by having a much simpler
syntax for memory references. The rule is simply that any access to
the \e{contents} of a memory location requires square brackets
around the address, and any access to the \e{address} of a variable
doesn't. So an instruction of the form \c{mov ax,foo} will
\e{always} refer to a compile-time constant, whether it's an \c{EQU}
or the address of a variable; and to access the \e{contents} of the
variable \c{bar}, you must code \c{mov ax,[bar]}.

This also means that NASM has no need for MASM's \i\c{OFFSET}
keyword, since the MASM code \c{mov ax,offset bar} means exactly the
same thing as NASM's \c{mov ax,bar}. If you're trying to get
large amounts of MASM code to assemble sensibly under NASM, you
can always code \c{%idefine offset} to make the preprocessor treat
the \c{OFFSET} keyword as a no-op.

This issue is even more confusing in \i\c{a86}, where declaring a
label with a trailing colon defines it to be a `label' as opposed to
a `variable' and causes \c{a86} to adopt NASM-style semantics; so in
\c{a86}, \c{mov ax,var} has different behaviour depending on whether
\c{var} was declared as \c{var: dw 0} (a label) or \c{var dw 0} (a
word-size variable). NASM is very simple by comparison:
\e{everything} is a label.

NASM, in the interests of simplicity, also does not support the
\i{hybrid syntaxes} supported by MASM and its clones, such as
\c{mov ax,table[bx]}, where a memory reference is denoted by one
portion outside square brackets and another portion inside. The
correct syntax for the above is \c{mov ax,[table+bx]}. Likewise,
\c{mov ax,es:[di]} is wrong and \c{mov ax,[es:di]} is right.

\S{qstypes} NASM Doesn't Store \i{Variable Types}

NASM, by design, chooses not to remember the types of variables you
declare. Whereas MASM will remember, on seeing \c{var dw 0}, that
you declared \c{var} as a word-size variable, and will then be able
to fill in the \i{ambiguity} in the size of the instruction \c{mov
var,2}, NASM will deliberately remember nothing about the symbol
\c{var} except where it begins, and so you must explicitly code
\c{mov word [var],2}.

For this reason, NASM doesn't support the \c{LODS}, \c{MOVS},
\c{STOS}, \c{SCAS}, \c{CMPS}, \c{INS}, or \c{OUTS} instructions,
but only supports the forms such as \c{LODSB}, \c{MOVSW}, and
\c{SCASD}, which explicitly specify the size of the components of
the strings being manipulated.

\S{qsassume} NASM Doesn't \i\c{ASSUME}

As part of NASM's drive for simplicity, it also does not support the
\c{ASSUME} directive. NASM will not keep track of what values you
choose to put in your segment registers, and will never
\e{automatically} generate a \i{segment override} prefix.

\S{qsmodel} NASM Doesn't Support \i{Memory Models}

NASM also does not have any directives to support different 16-bit
memory models. The programmer has to keep track of which functions
are supposed to be called with a \i{far call} and which with a
\i{near call}, and is responsible for putting the correct form of
\c{RET} instruction (\c{RETN} or \c{RETF}; NASM accepts \c{RET}
itself as an alternate form for \c{RETN}); in addition, the
programmer is responsible for coding CALL FAR instructions where
necessary when calling \e{external} functions, and must also keep
track of which external variable definitions are far and which are
near.

\S{qsfpu} \i{Floating-Point} Differences

NASM uses different names to refer to floating-point registers from
MASM: where MASM would call them \c{ST(0)}, \c{ST(1)} and so on, and
\i\c{a86} would call them simply \c{0}, \c{1} and so on, NASM
chooses to call them \c{st0}, \c{st1} etc.

As of version 0.96, NASM now treats the instructions with
\i{`nowait'} forms in the same way as MASM-compatible assemblers.
The idiosyncratic treatment employed by 0.95 and earlier was based
on a misunderstanding by the authors.

\S{qsother} Other Differences

For historical reasons, NASM uses the keyword \i\c{TWORD} where MASM
and compatible assemblers use \i\c{TBYTE}.

NASM does not declare \i{uninitialised storage} in the same way as
MASM: where a MASM programmer might use \c{stack db 64 dup (?)},
NASM requires \c{stack resb 64}, intended to be read as `reserve 64
bytes'. For a limited amount of compatibility, since NASM treats
\c{?} as a valid character in symbol names, you can code \c{? equ 0}
and then writing \c{dw ?} will at least do something vaguely useful.
\I\c{RESB}\i\c{DUP} is still not a supported syntax, however.

In addition to all of this, macros and directives work completely
differently to MASM. See \k{preproc} and \k{directive} for further
details.

\C{lang} The NASM Language

\H{syntax} Layout of a NASM Source Line

Like most assemblers, each NASM source line contains (unless it
is a macro, a preprocessor directive or an assembler directive: see
\k{preproc} and \k{directive}) some combination of the four fields

\c label:    instruction operands        ; comment

As usual, most of these fields are optional; the presence or absence
of any combination of a label, an instruction and a comment is allowed.
Of course, the operand field is either required or forbidden by the
presence and nature of the instruction field.

NASM places no restrictions on white space within a line: labels may
have white space before them, or instructions may have no space
before them, or anything. The \i{colon} after a label is also
optional. (Note that this means that if you intend to code \c{lodsb}
alone on a line, and type \c{lodab} by accident, then that's still a
valid source line which does nothing but define a label. Running
NASM with the command-line option
\I{orphan-labels}\c{-w+orphan-labels} will cause it to warn you if
you define a label alone on a line without a \i{trailing colon}.)

\i{Valid characters} in labels are letters, numbers, \c{_}, \c{$},
\c{#}, \c{@}, \c{~}, \c{.}, and \c{?}. The only characters which may
be used as the \e{first} character of an identifier are letters,
\c{.} (with special meaning: see \k{locallab}), \c{_} and \c{?}.
An identifier may also be prefixed with a \I{$prefix}\c{$} to
indicate that it is intended to be read as an identifier and not a
reserved word; thus, if some other module you are linking with
defines a symbol called \c{eax}, you can refer to \c{$eax} in NASM
code to distinguish the symbol from the register.

The instruction field may contain any machine instruction: Pentium
and P6 instructions, FPU instructions, MMX instructions and even
undocumented instructions are all supported. The instruction may be
prefixed by \c{LOCK}, \c{REP}, \c{REPE}/\c{REPZ} or
\c{REPNE}/\c{REPNZ}, in the usual way. Explicit \I{address-size
prefixes}address-size and \i{operand-size prefixes} \c{A16},
\c{A32}, \c{O16} and \c{O32} are provided - one example of their use
is given in \k{mixsize}. You can also use the name of a \I{segment
override}segment register as an instruction prefix: coding
\c{es mov [bx],ax} is equivalent to coding \c{mov [es:bx],ax}. We
recommend the latter syntax, since it is consistent with other
syntactic features of the language, but for instructions such as
\c{LODSB}, which has no operands and yet can require a segment
override, there is no clean syntactic way to proceed apart from
\c{es lodsb}.

An instruction is not required to use a prefix: prefixes such as
\c{CS}, \c{A32}, \c{LOCK} or \c{REPE} can appear on a line by
themselves, and NASM will just generate the prefix bytes.

In addition to actual machine instructions, NASM also supports a
number of pseudo-instructions, described in \k{pseudop}.

Instruction \i{operands} may take a number of forms: they can be
registers, described simply by the register name (e.g. \c{ax},
\c{bp}, \c{ebx}, \c{cr0}: NASM does not use the \c{gas}-style
syntax in which register names must be prefixed by a \c{%} sign), or
they can be \i{effective addresses} (see \k{effaddr}), constants
(\k{const}) or expressions (\k{expr}).

For \i{floating-point} instructions, NASM accepts a wide range of
syntaxes: you can use two-operand forms like MASM supports, or you
can use NASM's native single-operand forms in most cases. Details of
all forms of each supported instruction are given in
\k{iref}. For example, you can code:

\c           fadd st1               ; this sets st0 := st0 + st1
\c           fadd st0,st1           ; so does this
\c
\c           fadd st1,st0           ; this sets st1 := st1 + st0
\c           fadd to st1            ; so does this

Almost any floating-point instruction that references memory must
use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to
indicate what size of \i{memory operand} it refers to.

\H{pseudop} \i{Pseudo-Instructions}

Pseudo-instructions are things which, though not real x86 machine
instructions, are used in the instruction field anyway because
that's the most convenient place to put them. The current
pseudo-instructions are \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ} and
\i\c{DT}, their \i{uninitialised} counterparts \i\c{RESB},
\i\c{RESW}, \i\c{RESD}, \i\c{RESQ} and \i\c{REST}, the \i\c{INCBIN}
command, the \i\c{EQU} command, and the \i\c{TIMES} prefix.

\S{db} \c{DB} and friends: Declaring Initialised Data

\i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ} and \i\c{DT} are used, much
as in MASM, to declare initialised data in the output file. They can
be invoked in a wide range of ways:
\I{floating-point}\I{character constant}\I{string constant}

\c           db 0x55                ; just the byte 0x55
\c           db 0x55,0x56,0x57      ; three bytes in succession
\c           db 'a',0x55            ; character constants are OK
\c           db 'hello',13,10,'$'   ; so are string constants
\c           dw 0x1234              ; 0x34 0x12
\c           dw 'a'                 ; 0x41 0x00 (it's just a number)
\c           dw 'ab'                ; 0x41 0x42 (character constant)
\c           dw 'abc'               ; 0x41 0x42 0x43 0x00 (string)
\c           dd 0x12345678          ; 0x78 0x56 0x34 0x12
\c           dd 1.234567e20         ; floating-point constant
\c           dq 1.234567e20         ; double-precision float
\c           dt 1.234567e20         ; extended-precision float

\c{DQ} and \c{DT} do not accept \i{numeric constants} or string
constants as operands.

\S{resb} \c{RESB} and friends: Declaring \i{Uninitialised} Data

\i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ} and \i\c{REST} are
designed to be used in the BSS section of a module: they declare
\e{uninitialised} storage space. Each takes a single operand, which
is the number of bytes, words, doublewords or whatever to reserve.
As stated in \k{qsother}, NASM does not support the MASM/TASM syntax
of reserving uninitialised space by writing \I\c{?}\c{DW ?} or
similar things: this is what it does instead. The operand to a
\c{RESB}-type pseudo-instruction is a \i\e{critical expression}: see
\k{crit}.

For example:

\c buffer:   resb 64                ; reserve 64 bytes
\c wordvar:  resw 1                 ; reserve a word
\c realarray resq 10                ; array of ten reals

\S{incbin} \i\c{INCBIN}: Including External \i{Binary Files}

\c{INCBIN} is borrowed from the old Amiga assembler \i{DevPac}: it
includes a binary file verbatim into the output file. This can be
handy for (for example) including \i{graphics} and \i{sound} data
directly into a game executable file. It can be called in one of
these three ways:

\c           incbin "file.dat"      ; include the whole file
\c           incbin "file.dat",1024 ; skip the first 1024 bytes
\c           incbin "file.dat",1024,512 ; skip the first 1024, and
nasmdoc.src - 源码说明

本页面展示了「一个汇编语言编译器源码」中的 nasmdoc.src 源码文件，采用 SRC 编程语言编写，共 1,545 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与汇编语言相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?