nasmdoc.src

\c                                  ; actually include at most 512

\S{equ} \i\c{EQU}: Defining Constants

\c{EQU} defines a symbol to a given constant value: when \c{EQU} is
used, the source line must contain a label. The action of \c{EQU} is
to define the given label name to the value of its (only) operand.
This definition is absolute, and cannot change later. So, for
example,

\c message   db 'hello, world'
\c msglen    equ $-message

defines \c{msglen} to be the constant 12. \c{msglen} may not then be
redefined later. This is not a \i{preprocessor} definition either:
the value of \c{msglen} is evaluated \e{once}, using the value of
\c{$} (see \k{expr} for an explanation of \c{$}) at the point of
definition, rather than being evaluated wherever it is referenced
and using the value of \c{$} at the point of reference. Note that
the operand to an \c{EQU} is also a \i{critical expression}
(\k{crit}).
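C has no \c{$} token, so the following is only a rough analogy. Like \c{EQU}, it computes a length once, at compile time, from the data it follows, and the result cannot be reassigned afterwards. The names \c{message} and \c{msglen} mirror the NASM example; the rest is an assumption of the sketch.

```c
/* Rough C analogy for:   message db 'hello, world'
                          msglen  equ $-message          */
static const char message[] = "hello, world";

/* Evaluated once at compile time, like EQU. Subtract 1 because C appends
   a terminating NUL byte that the NASM DB line does not emit. */
enum { msglen = sizeof message - 1 };
```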

\S{times} \i\c{TIMES}: \i{Repeating} Instructions or Data

The \c{TIMES} prefix causes the instruction to be assembled multiple
times. It exists partly as NASM's equivalent of the \i\c{DUP}
syntax supported by \i{MASM}-compatible assemblers, in that you can
code

\c zerobuf:  times 64 db 0

or similar things; but \c{TIMES} is more versatile than that. The
argument to \c{TIMES} is not just a numeric constant, but a numeric
\e{expression}, so you can do things like

\c buffer:   db 'hello, world'
\c           times 64-$+buffer db ' '

which will store exactly enough spaces to bring the total length of
\c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary
instructions, so you can code trivial \i{unrolled loops} in it:

\c           times 100 movsb
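The repeat count in the \c{buffer} example is ordinary arithmetic on \c{$}. The C sketch below models it under the assumption that the same 12-byte string is used. \c{pad_buffer} and \c{PAD_TOTAL} are hypothetical names, and a C array stands in for the assembled bytes.

```c
#include <string.h>

/* Hypothetical model of:   buffer: db 'hello, world'
                                    times 64-$+buffer db ' '
   On the TIMES line, $ - buffer is the number of bytes emitted so far,
   so the repeat count is 64 - ($ - buffer). */
enum { PAD_TOTAL = 64 };

/* Fills out[0..PAD_TOTAL) and returns the TIMES repeat count. */
int pad_buffer(char out[PAD_TOTAL]) {
    static const char text[] = "hello, world";
    size_t emitted = sizeof text - 1;            /* $ - buffer (12) */
    int count = PAD_TOTAL - (int)emitted;        /* 64 - $ + buffer */
    memcpy(out, text, emitted);
    memset(out + emitted, ' ', (size_t)count);   /* db ' ', count times */
    return count;
}
```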

Note that there is no effective difference between \c{times 100 resb
1} and \c{resb 100}, except that the latter will be assembled about
100 times faster due to the internal structure of the assembler.

The operand to \c{TIMES}, like that of \c{EQU} and those of \c{RESB}
and friends, is a critical expression (\k{crit}).

Note also that \c{TIMES} can't be applied to \i{macros}: the reason
for this is that \c{TIMES} is processed after the macro phase, which
allows the argument to \c{TIMES} to contain expressions such as
\c{64-$+buffer} as above. To repeat more than one line of code, or a
complex macro, use the preprocessor \i\c{%rep} directive.

\H{effaddr} Effective Addresses

An \i{effective address} is any operand to an instruction which
\I{memory reference}references memory. Effective addresses, in NASM,
have a very simple syntax: they consist of an expression evaluating
to the desired address, enclosed in \i{square brackets}. For
example:

\c wordvar   dw 123
\c           mov ax,[wordvar]
\c           mov ax,[wordvar+1]
\c           mov ax,[es:wordvar+bx]

Anything not conforming to this simple system is not a valid memory
reference in NASM, for example \c{es:wordvar[bx]}.

More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:

\c           mov eax,[ebx*2+ecx+offset]
\c           mov ax,[bp+di+8]

NASM is capable of doing \i{algebra} on these effective addresses,
so that things which don't necessarily \e{look} legal are perfectly
all right:

\c           mov eax,[ebx*5]        ; assembles as [ebx*4+ebx]
\c           mov eax,[label1*2-label2] ; ie [label1+(label1-label2)]
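One reason the algebra matters: the x86 SIB byte only allows an index scale of 1, 2, 4 or 8. A factor of 3, 5 or 9 can still be encoded by reusing the same register as the base, which is how \c{[ebx*5]} becomes \c{[ebx*4+ebx]}. The following is a hypothetical C helper (not NASM's actual code) that decides which factors are encodable with a single register. It ignores NASM's separate preference, described further on, for splitting \c{eax*2} into \c{eax+eax}.

```c
#include <stdbool.h>

/* Hypothetical: can "reg*n" be encoded with one register?
   On success, *scale is the SIB scale and *reg_as_base says whether
   the same register must also be used as the base (reg*(n-1)+reg). */
bool split_scale(int n, int *scale, bool *reg_as_base) {
    switch (n) {
    case 1: case 2: case 4: case 8:
        *scale = n; *reg_as_base = false;   /* direct SIB scale */
        return true;
    case 3: case 5: case 9:
        *scale = n - 1; *reg_as_base = true; /* e.g. ebx*5 = ebx*4+ebx */
        return true;
    default:
        return false; /* e.g. ebx*6 would need two index terms */
    }
}
```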

Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses \c{[eax*2+0]} and \c{[eax+eax]}, and NASM will generally
generate the latter on the grounds that the former requires four
bytes to store a zero offset.
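The size difference comes from the 32-bit encoding rules. An operand with an index register needs a SIB byte after the ModRM byte, and when there is no base register the encoding always carries a four-byte displacement. The C sketch below counts those trailing bytes; \c{sib_operand_bytes} is a hypothetical name, and the special case of EBP as a base register is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical byte count for what follows the ModRM byte in a 32-bit
   memory operand that uses an index register (so a SIB byte is present).
   Simplification: the EBP-as-base special case is not modeled. */
int sib_operand_bytes(bool has_base, long disp) {
    int bytes = 1;                       /* the SIB byte itself */
    if (!has_base)
        bytes += 4;                      /* no base: disp32 is mandatory */
    else if (disp != 0)
        bytes += (disp >= -128 && disp <= 127) ? 1 : 4; /* disp8 or disp32 */
    return bytes;
}
```

With these rules \c{[eax*2+0]} costs five bytes after the ModRM byte, while \c{[eax+eax]} costs one.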

NASM has a hinting mechanism which will cause \c{[eax+ebx]} and
\c{[ebx+eax]} to generate different opcodes; this is occasionally
useful because \c{[esi+ebp]} and \c{[ebp+esi]} have different
default segment registers.
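In 32-bit addressing, the default segment depends only on which register ends up as the base. EBP or ESP as the base implies \c{SS}, and any other base implies \c{DS}. The C sketch below assumes that in \c{[ebp+esi]} EBP is the base and in \c{[esi+ebp]} ESI is the base. \c{default_segment} is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical: default segment register for a 32-bit effective address,
   given the name of the base register (NULL if there is none). */
const char *default_segment(const char *base) {
    if (base != NULL && (strcmp(base, "ebp") == 0 || strcmp(base, "esp") == 0))
        return "ss";   /* stack-relative bases default to SS */
    return "ds";       /* everything else defaults to DS */
}
```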

However, you can force NASM to generate an effective address in a
particular form by the use of the keywords \c{BYTE}, \c{WORD},
\c{DWORD} and \c{NOSPLIT}. If you need \c{[eax+3]} to be assembled
using a double-word offset field instead of the one byte NASM will
normally generate, you can code \c{[dword eax+3]}. Similarly, you
can force NASM to use a byte offset for a small value which it
hasn't seen on the first pass (see \k{crit} for an example of such a
code fragment) by using \c{[byte eax+offset]}. As special cases,
\c{[byte eax]} will code \c{[eax+0]} with a byte offset of zero, and
\c{[dword eax]} will code it with a double-word offset of zero. The
normal form, \c{[eax]}, will be coded with no offset field.
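The effect of the \c{BYTE} and \c{DWORD} keywords on the displacement field can be summarized as a small decision function. The following hypothetical C sketch models it for \c{[reg+disp]}, again ignoring the EBP special case. Returning -1 when \c{BYTE} is forced on a value outside the signed byte range is the sketch's own choice; what NASM does in that case is not described here.

```c
/* Hypothetical model of displacement sizing for [reg+disp].
   Returns the displacement size in bytes, or -1 if BYTE was forced
   on a value that does not fit in a signed byte. */
enum disp_hint { HINT_NONE, HINT_BYTE, HINT_DWORD };

int disp_bytes(long disp, enum disp_hint hint) {
    int fits8 = disp >= -128 && disp <= 127;
    switch (hint) {
    case HINT_DWORD:
        return 4;                       /* [dword eax+3], [dword eax] */
    case HINT_BYTE:
        return fits8 ? 1 : -1;          /* [byte eax+offset], [byte eax] */
    default:
        return disp == 0 ? 0 : (fits8 ? 1 : 4); /* [eax], [eax+3] */
    }
}
```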

Similarly, NASM will split \c{[eax*2]} into \c{[eax+eax]} because
that allows the offset field to be absent and space to be saved; in
fact, it will also split \c{[eax*2+offset]} into
\c{[eax+eax+offset]}. You can combat this behaviour by the use of
the \c{NOSPLIT} keyword: \c{[nosplit eax*2]} will force
\c{[eax*2+0]} to be generated literally.

\H{const} \i{Constants}

NASM understands four different types of constant: numeric,
character, string and floating-point.

\S{numconst} \i{Numeric Constants}

A numeric constant is simply a number. NASM allows you to specify
numbers in a variety of number bases, in a variety of ways: you can
suffix \c{H}, \c{Q} and \c{B} for \i{hex}, \i{octal} and \i{binary},
or you can prefix \c{0x} for hex in the style of C, or you can
prefix \c{$} for hex in the style of Borland Pascal. Note, though,
that the \I{$prefix}\c{$} prefix does double duty as a prefix on
identifiers (see \k{syntax}), so a hex number prefixed with a \c{$}
sign must have a digit after the \c{$} rather than a letter.

Some examples:

\c           mov ax,100             ; decimal
\c           mov ax,0a2h            ; hex
\c           mov ax,$0a2            ; hex again: the 0 is required
\c           mov ax,0xa2            ; hex yet again
\c           mov ax,777q            ; octal
\c           mov ax,10010011b       ; binary
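The rules for recognising these forms can be sketched as a small parser. \c{parse_nasm_number} below is a hypothetical C function, not NASM's own scanner. It handles only the prefixes and suffixes described above, and it rejects \c{$} followed by a letter, since that would be an identifier.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for the numeric forms listed above.
   Returns the value, or -1 if the text is not a valid number. */
long parse_nasm_number(const char *s) {
    char buf[64];
    size_t n = strlen(s);
    int base = 10;

    if (s[0] == '$') {                       /* Borland Pascal style: $0a2 */
        s++; n--; base = 16;
        if (!isdigit((unsigned char)s[0]))   /* $a2 would be an identifier */
            return -1;
    } else if (n > 2 && s[0] == '0' && tolower((unsigned char)s[1]) == 'x') {
        s += 2; n -= 2; base = 16;           /* C style: 0xa2 */
    } else if (n > 1) {
        switch (tolower((unsigned char)s[n - 1])) {   /* suffix forms */
        case 'h': base = 16; n--; break;
        case 'q': base = 8;  n--; break;
        case 'b': base = 2;  n--; break;
        }
    }
    if (n == 0 || n >= sizeof buf)
        return -1;
    memcpy(buf, s, n);
    buf[n] = '\0';

    char *end;
    long v = strtol(buf, &end, base);
    return *end == '\0' ? v : -1;            /* reject trailing junk */
}
```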

\S{chrconst} \i{Character Constants}

A character constant consists of up to four characters enclosed in
either single or double quotes. The type of quote makes no
difference to NASM, except of course that surrounding the constant
with single quotes allows double quotes to appear within it and vice
versa.

A character constant with more than one character will be arranged
with \i{little-endian} order in mind: if you code

\c           mov eax,'abcd'

then the constant generated is not \c{0x61626364}, but
\c{0x64636261}, so that if you were then to store the value into
memory, it would read \c{abcd} rather than \c{dcba}. This is also
the sense of character constants understood by the Pentium's
\i\c{CPUID} instruction (see \k{insCPUID}).
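The packing rule is that the first character goes into the lowest byte. The C sketch below states it independently of the host's byte order; \c{char_const} is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a character constant of up to four characters:
   character i is placed in byte i (bits 8*i .. 8*i+7) of the value. */
uint32_t char_const(const char *s) {
    uint32_t v = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n && i < 4; i++)
        v |= (uint32_t)(unsigned char)s[i] << (8 * i);
    return v;
}
```

Storing the result of \c{char_const("abcd")} to memory on a little-endian machine writes the bytes \c{a}, \c{b}, \c{c}, \c{d} in that order.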

\S{strconst} String Constants

String constants are only acceptable to some pseudo-instructions,
namely the \I\c{DW}\I\c{DD}\I\c{DQ}\I\c{DT}\i\c{DB} family and
\i\c{INCBIN}.

A string constant looks like a character constant, only longer. It
is treated as a concatenation of character constants, each of the
maximum size the data directive in use allows. So the following are
equivalent:

\c           db 'hello'             ; string constant
\c           db 'h','e','l','l','o' ; equivalent character constants

And the following are also equivalent:

\c           dd 'ninechars'         ; doubleword string constant
\c           dd 'nine','char','s'   ; becomes three doublewords
\c           db 'ninechars',0,0,0   ; and really looks like this

Note that when used as an operand to \c{db}, a constant like
\c{'ab'} is treated as a string constant despite being short enough
to be a character constant, because otherwise \c{db 'ab'} would have
the same effect as \c{db 'a'}, which would be silly. Similarly,
three-character or four-character constants are treated as strings
when they are operands to \c{dw}.
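The \c{'ninechars'} example follows from splitting the string into units of the directive's size and zero-padding the last unit. The following hypothetical C sketch models that layout. \c{emit_string} and its parameters are illustrative names, and \c{unit} is assumed to be nonzero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: lay out a string constant for a data directive with
   unit size `unit` (1 for DB, 2 for DW, 4 for DD), zero-padding the final
   unit. Returns the number of bytes written, or 0 if `out` is too small. */
size_t emit_string(const char *s, size_t unit, unsigned char *out, size_t cap) {
    size_t n = strlen(s);
    size_t total = (n + unit - 1) / unit * unit;  /* round up to whole units */
    if (total > cap)
        return 0;
    memcpy(out, s, n);
    memset(out + n, 0, total - n);                /* pad with zero bytes */
    return total;
}
```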

\S{fltconst} \I{floating-point, constants}Floating-Point Constants

\i{Floating-point} constants are acceptable only as arguments to
\i\c{DD}, \i\c{DQ} and \i\c{DT}. They are expressed in the
traditional form: digits, then a period, then optionally more
digits, then optionally an \c{E} followed by an exponent. The period
is mandatory, so that NASM can distinguish between \c{dd 1}, which
declares an integer constant, and \c{dd 1.0}, which declares a
floating-point constant.

Some examples:

\c           dd 1.2                 ; an easy one
\c           dq 1.e10               ; 10,000,000,000
\c           dq 1.e+10              ; synonymous with 1.e10
\c           dq 1.e-10              ; 0.000 000 000 1
\c           dt 3.141592653589793238462 ; pi
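To see the bytes that \c{dd 1.2} or \c{dq 1.e10} emit, you can inspect the same values from C. This assumes the host's \c{float} and \c{double} are IEEE 754 single and double precision, which are the formats x86 uses for 32-bit and 64-bit floating-point values. The 80-bit \c{DT} format has no portable C equivalent and is not modeled. \c{dd_bits} and \c{dq_bits} are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float/double as its raw bit pattern, i.e. the value that
   DD/DQ would store (assuming IEEE 754 binary32/binary64 on the host). */
uint32_t dd_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

uint64_t dq_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```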

NASM cannot do compile-time arithmetic on floating-point constants.
This is because NASM is designed to be portable: although it always
generates code to run on x86 processors, the assembler itself can
run on any system with an ANSI C compiler. Therefore, the assembler
cannot guarantee the presence of a floating-point unit capable of
handling the \i{Intel number formats}, and so for NASM to be able to
do floating arithmetic it would have to include its own complete set
of floating-point routines, which would significantly increase the
size of the assembler for very little benefit.

\H{expr} \i{Expressions}

Expressions in NASM are similar in syntax to those in C.

NASM does not guarantee the size of the integers used to evaluate
expressions at compile time: since NASM can compile and run on
64-bit systems quite happily, don't assume that expressions are
evaluated in 32-bit registers, and so don't try to make deliberate
use of \i{integer overflow}: it might not always work. The only thing NASM
will guarantee is what's guaranteed by ANSI C: you always have \e{at
least} 32 bits to work in.
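A short C illustration of why deliberate overflow is unportable: the same sum gives different results depending on the width the evaluator happens to use. The two widths shown are examples only, and the function names are hypothetical.

```c
#include <stdint.h>

/* The same addition evaluated at two different widths. */
uint32_t sum_in_32_bits(uint32_t a, uint32_t b) { return a + b; } /* wraps at 2^32 */
uint64_t sum_in_64_bits(uint64_t a, uint64_t b) { return a + b; } /* does not wrap here */
```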

NASM supports two special tokens in expressions, allowing
calculations to involve the current assembly position: the
\I{$ here}\c{$} and \i\c{$$} tokens. \c{$} evaluates to the assembly
position at the beginning of the line containing the expression; so
you can code an \i{infinite loop} using \c{JMP $}. \c{$$} evaluates
to the beginning of the current section; so you can tell how far
into the section you are by using \c{($-$$)}.

The arithmetic \i{operators} provided by NASM are listed here, in
increasing order of \i{precedence}.

\S{expor} \i\c{|}: \i{Bitwise OR} Operator

The \c{|} operator gives a bitwise OR, exactly as performed by the
\c{OR} machine instruction. Bitwise OR is the lowest-priority
arithmetic operator supported by NASM.

\S{expxor} \i\c{^}: \i{Bitwise XOR} Operator

\c{^} provides the bitwise XOR operation.

\S{expand} \i\c{&}: \i{Bitwise AND} Operator

\c{&} provides the bitwise AND operation.
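C happens to rank \c{|}, \c{^} and \c{&} in the same relative order as NASM, with \c{|} lowest, then \c{^}, then \c{&}. A C constant expression therefore groups the same way as the NASM expression \c{0xF0 | 0x0F ^ 0xFF & 0x3C}, which makes it a convenient check on the precedence rule. The enumerator names are hypothetical.

```c
/* Implicit grouping follows precedence: & first, then ^, then |. */
enum {
    implicit_grouping = 0xF0 | 0x0F ^ 0xFF & 0x3C,
    explicit_grouping = 0xF0 | (0x0F ^ (0xFF & 0x3C))  /* 0xF0 | 0x33 */
};
```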

\S{expshift} \i\c{<<} and \i\c{>>}: \i{Bit Shift} Operators

\c{<<} gives a bit-shift to the left, just as it does in C. So \c{5<<3}
evaluates to 5 times 8, or 40. \c{>>} gives a bit-shift to the
right; in NASM, such a shift is \e{always} unsigned, so that
the bits shifted in from the left-hand end are filled with zero
rather than a sign-extension of the previous highest bit.
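In C, right-shifting a negative signed value is implementation-defined. NASM's always-unsigned \c{>>} corresponds to shifting an unsigned value. The C sketch below models both shift operators at an assumed 32-bit width (NASM only promises at least 32 bits). The function names are hypothetical, and \c{count} must be less than 32.

```c
#include <stdint.h>

/* Model NASM's >> as a logical shift: zeros come in from the left. */
uint32_t nasm_shr(int32_t value, unsigned count) {
    return (uint32_t)value >> count;
}

/* << behaves as in C; done on unsigned to avoid signed-overflow issues. */
uint32_t nasm_shl(int32_t value, unsigned count) {
    return (uint32_t)value << count;
}
```

Under this model \c{-16>>2} gives \c{0x3FFFFFFC} rather than -4.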

\S{expplmi} \I{+ opaddition}\c{+} and \I{- opsubtraction}\c{-}:
\i{Addition} and \i{Subtraction} Operators

The \c{+} and \c{-} operators do perfectly ordinary addition and
subtraction.

\S{expmul} \i\c{*}, \i\c{/}, \i\c{//}, \i\c{%} and \i\c{%%}:
\i{Multiplication} and \i{Division}

\c{*} is the multiplication operator. \c{/} and \c{//} are both
division operators: \c{/} is \i{unsigned division} and \c{//} is
\i{signed division}. Similarly, \c{%} and \c{%%} provide \I{unsigned
modulo}\I{modulo operators}unsigned and
\i{signed modulo} operators respectively.

NASM, like ANSI C, provides no guarantees about the sensible
operation of the signed modulo operator.

Since the \c{%} character is used extensively by the macro
\i{preprocessor}, you should ensure that both the signed and unsigned
modulo operators are followed by white space wherever they appear.
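The following C sketch shows how the four operators differ when the left operand is negative, at an assumed 32-bit width. C99 truncates signed division toward zero, but NASM makes no such promise for its signed modulo, so the signed results below describe C, not NASM. The divisor must be nonzero, and the function names are hypothetical.

```c
#include <stdint.h>

/* Models of NASM's /, //, % and %% at an assumed 32-bit width. */
uint32_t op_div(uint32_t a, uint32_t b) { return a / b; }  /* /  unsigned */
int32_t  op_sdiv(int32_t a, int32_t b)  { return a / b; }  /* // signed   */
uint32_t op_mod(uint32_t a, uint32_t b) { return a % b; }  /* %  unsigned */
int32_t  op_smod(int32_t a, int32_t b)  { return a % b; }  /* %% signed   */
```

In NASM source the corresponding expressions would be written with a space after the modulo operators, e.g. \c{a % b} and \c{a %% b}.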

\S{expunary} \i{Unary Operators}: \I{+ opunary}\c{+}, \I{- opunary}\c{-},
\i\c{~} and \i\c{SEG}

The highest-priority operators in NASM's expression grammar are
those which only apply to one argument. \c{-} negates its operand,
\c{+} does nothing (it's provided for symmetry with \c{-}), \c{~}
computes the \i{one's complement} of its operand, and \c{SEG}
provides the \i{segment address} of its operand (explained in more
detail in \k{segwrt}).
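The \c{-} and \c{~} operators are closely related. On two's-complement integers, negating a value is the same as taking its one's complement and adding 1. The following C sketch assumes a 32-bit width and uses hypothetical names.

```c
#include <stdint.h>

uint32_t op_not(uint32_t x) { return ~x; }       /* ~x: one's complement    */
uint32_t op_neg(uint32_t x) { return ~x + 1u; }  /* -x: two's-complement negation */
```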

\H{segwrt} \i\c{SEG} and \i\c{WRT}

When writing large 16-bit programs, which must be split into
multiple \i{segments}, it is often necessary to be able to refer to
the \I{segment address}segment part of the address of a symbol. NASM
supports the \c{SEG} operator to perform this function.

The \c{SEG} operator returns the \i\e{preferred} segment base of a
symbol, defined as the segment base relative to which the offset of
the symbol makes sense. So the code

\c           mov ax,seg symbol
\c           mov es,ax
\c           mov bx,symbol

will load \c{ES:BX} with a valid pointer to the symbol \c{symbol}.
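In a real-mode 16-bit program, the pair \c{ES:BX} refers to the physical address \c{segment*16+offset}, so \c{SEG} supplies the paragraph base and the symbol supplies the offset within it. The C sketch below shows that translation. It assumes real mode (protected-mode selectors work differently) and uses a hypothetical function name.

```c
#include <stdint.h>

/* Real-mode address translation: segment*16 + offset, kept to 20 bits
   (an 8086 wraps addresses above 1 MB back to 0). */
uint32_t real_mode_linear(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```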