📄 nasmdoc.src

\S{equ} \i\c{EQU}: Defining Constants

\c{EQU} defines a symbol to a given constant value: when \c{EQU} is
used, the source line must contain a label. The action of \c{EQU} is
to define the given label name to the value of its (only) operand.
This definition is absolute, and cannot change later. So, for
example,

\c message         db      'hello, world'
\c msglen          equ     $-message

defines \c{msglen} to be the constant 12. \c{msglen} may not then be
redefined later. This is not a \i{preprocessor} definition either:
the value of \c{msglen} is evaluated \e{once}, using the value of
\c{$} (see \k{expr} for an explanation of \c{$}) at the point of
definition, rather than being evaluated wherever it is referenced
and using the value of \c{$} at the point of reference.
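
As a sketch of the difference (the \c{curlen} macro is hypothetical,
introduced here only for contrast), compare \c{EQU} with a
preprocessor definition of the same expression:

\c message         db      'hello, world'
\c msglen          equ     $-message       ; evaluated once, here: 12
\c %define curlen  ($-message)             ; expanded afresh at each use
\c
\c                 dw      msglen          ; stores 12
\c                 dw      curlen          ; stores 14: $ is now message+14

The second \c{DW} sits 14 bytes past \c{message} (12 bytes of text
plus the first word), so \c{curlen} yields 14 there, while
\c{msglen} keeps the value 12 wherever it is used.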


\S{times} \i\c{TIMES}: \i{Repeating} Instructions or Data

The \c{TIMES} prefix causes the instruction to be assembled multiple
times. This is partly present as NASM's equivalent of the \i\c{DUP}
syntax supported by \i{MASM}-compatible assemblers, in that you can
code

\c zerobuf:        times 64 db 0

or similar things; but \c{TIMES} is more versatile than that. The
argument to \c{TIMES} is not just a numeric constant, but a numeric
\e{expression}, so you can do things like

\c buffer: db      'hello, world'
\c         times 64-$+buffer db ' '

which will store exactly enough spaces to make the total length of
\c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary
instructions, so you can code trivial \i{unrolled loops} in it:

\c         times 100 movsb

Note that there is no effective difference between \c{times 100 resb
1} and \c{resb 100}, except that the latter will be assembled about
100 times faster due to the internal structure of the assembler.

The operand to \c{TIMES} is a critical expression (\k{crit}).

Note also that \c{TIMES} can't be applied to \i{macros}: the reason
for this is that \c{TIMES} is processed after the macro phase, which
allows the argument to \c{TIMES} to contain expressions such as
\c{64-$+buffer} as above. To repeat more than one line of code, or a
complex macro, use the preprocessor \i\c{%rep} directive.
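
For instance, a two-instruction body could be repeated along these
lines (a minimal sketch; the instructions chosen are arbitrary):

\c %rep 4
\c         lodsb
\c         stosb
\c %endrep

The preprocessor expands this into four copies of the pair, eight
instructions in all.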


\H{effaddr} Effective Addresses

An \i{effective address} is any operand to an instruction which
\I{memory reference}references memory. Effective addresses, in NASM,
have a very simple syntax: they consist of an expression evaluating
to the desired address, enclosed in \i{square brackets}. For
example:

\c wordvar dw      123
\c         mov     ax,[wordvar]
\c         mov     ax,[wordvar+1]
\c         mov     ax,[es:wordvar+bx]

Anything not conforming to this simple system is not a valid memory
reference in NASM, for example \c{es:wordvar[bx]}.

More complicated effective addresses, such as those involving more
than one register, work in exactly the same way:

\c         mov     eax,[ebx*2+ecx+offset]
\c         mov     ax,[bp+di+8]

NASM is capable of doing \i{algebra} on these effective addresses,
so that things which don't necessarily \e{look} legal are perfectly
all right:

\c     mov     eax,[ebx*5]             ; assembles as [ebx*4+ebx]
\c     mov     eax,[label1*2-label2]   ; ie [label1+(label1-label2)]

Some forms of effective address have more than one assembled form;
in most such cases NASM will generate the smallest form it can. For
example, there are distinct assembled forms for the 32-bit effective
addresses \c{[eax*2+0]} and \c{[eax+eax]}, and NASM will generally
generate the latter on the grounds that the former requires four
bytes to store a zero offset.

NASM has a hinting mechanism which will cause \c{[eax+ebx]} and
\c{[ebx+eax]} to generate different opcodes; this is occasionally
useful because \c{[esi+ebp]} and \c{[ebp+esi]} have different
default segment registers.
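
A minimal sketch of the hint in action, assuming 32-bit code and
that the register written first is taken as the base:

\c         mov     eax,[esi+ebp]           ; base ESI: default segment DS
\c         mov     eax,[ebp+esi]           ; base EBP: default segment SS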

However, you can force NASM to generate an effective address in a
particular form by the use of the keywords \c{BYTE}, \c{WORD},
\c{DWORD} and \c{NOSPLIT}. If you need \c{[eax+3]} to be assembled
using a double-word offset field instead of the one byte NASM will
normally generate, you can code \c{[dword eax+3]}. Similarly, you
can force NASM to use a byte offset for a small value which it
hasn't seen on the first pass (see \k{crit} for an example of such a
code fragment) by using \c{[byte eax+offset]}. As special cases,
\c{[byte eax]} will code \c{[eax+0]} with a byte offset of zero, and
\c{[dword eax]} will code it with a double-word offset of zero. The
normal form, \c{[eax]}, will be coded with no offset field.
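
The forms discussed above, side by side (a sketch assuming 32-bit
code; the destination register is arbitrary):

\c         mov     ebx,[eax+3]             ; one-byte offset, the default
\c         mov     ebx,[dword eax+3]       ; forced double-word offset
\c         mov     ebx,[byte eax]          ; [eax+0] with a byte offset of zero
\c         mov     ebx,[dword eax]         ; [eax+0] with a double-word offset
\c         mov     ebx,[eax]               ; no offset field at all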

The form described in the previous paragraph is also useful if you
are trying to access data in a 32-bit segment from within 16-bit code.
For more information on this see the section on mixed-size addressing
(\k{mixaddr}). In particular, if you need to access data at a known
offset that is too large to fit in a 16-bit value, you must specify
that it is a dword offset; otherwise NASM will discard the high word
of the offset.

Similarly, NASM will split \c{[eax*2]} into \c{[eax+eax]} because
that allows the offset field to be absent and space to be saved; in
fact, it will also split \c{[eax*2+offset]} into
\c{[eax+eax+offset]}. You can combat this behaviour by the use of
the \c{NOSPLIT} keyword: \c{[nosplit eax*2]} will force
\c{[eax*2+0]} to be generated literally.
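
In code, the two behaviours look like this (a sketch assuming
32-bit code):

\c         mov     ebx,[eax*2]             ; split: assembled as [eax+eax]
\c         mov     ebx,[nosplit eax*2]     ; kept: assembled as [eax*2+0]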

In 64-bit mode, NASM will by default generate absolute addresses.  The
\i\c{REL} keyword makes it produce \c{RIP}-relative addresses. Since
this is frequently the desired behaviour, it can be made the default
with the \c{DEFAULT} directive (\k{default}). The keyword \i\c{ABS}
overrides \i\c{REL}.
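
A short sketch of the keywords in use (the label \c{foo} is
hypothetical):

\c         bits 64
\c         mov     eax,[foo]               ; absolute address
\c         mov     eax,[rel foo]           ; RIP-relative address
\c
\c         default rel
\c         mov     eax,[foo]               ; now RIP-relative
\c         mov     eax,[abs foo]           ; ABS overrides the default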


\H{const} \i{Constants}

NASM understands four different types of constant: numeric,
character, string and floating-point.


\S{numconst} \i{Numeric Constants}

A numeric constant is simply a number. NASM allows you to specify
numbers in a variety of number bases, in a variety of ways: you can
suffix \c{H} for \i{hex}, \c{Q} or \c{O} for \i{octal}, and \c{B} for
\i{binary}, or you can prefix \c{0x} for hex in the style of C, or you can
prefix \c{$} for hex in the style of Borland Pascal. Note, though,
that the \I{$, prefix}\c{$} prefix does double duty as a prefix on
identifiers (see \k{syntax}), so a hex number prefixed with a \c{$}
sign must have a digit after the \c{$} rather than a letter.

Numeric constants can have underscores (\c{_}) interspersed to break
up long strings.

Some examples:

\c         mov     ax,100          ; decimal
\c         mov     ax,0a2h         ; hex
\c         mov     ax,$0a2         ; hex again: the 0 is required
\c         mov     ax,0xa2         ; hex yet again
\c         mov     ax,777q         ; octal
\c         mov     ax,777o         ; octal again
\c         mov     ax,10010011b    ; binary
\c         mov     ax,1001_0011b   ; same binary constant


\S{strings} \I{Strings}\i{Character Strings}

A character string consists of up to eight characters enclosed in
either single quotes (\c{'...'}), double quotes (\c{"..."}) or
backquotes (\c{`...`}).  Single or double quotes are equivalent to
NASM (except of course that surrounding the constant with single
quotes allows double quotes to appear within it and vice versa); the
contents of those are represented verbatim.  Strings enclosed in
backquotes support C-style \c{\\}-escapes for special characters.


The following \i{escape sequences} are recognized by backquoted strings:

\c       \'          single quote (')
\c       \"          double quote (")
\c       \`          backquote (`)
\c       \\\          backslash (\)
\c       \?          question mark (?)
\c       \a          BEL (ASCII 7)
\c       \b          BS  (ASCII 8)
\c       \t          TAB (ASCII 9)
\c       \n          LF  (ASCII 10)
\c       \v          VT  (ASCII 11)
\c       \f          FF  (ASCII 12)
\c       \r          CR  (ASCII 13)
\c       \e          ESC (ASCII 27)
\c       \377        Up to 3 octal digits - literal byte
\c       \xFF        Up to 2 hexadecimal digits - literal byte
\c       \u1234      4 hexadecimal digits - Unicode character
\c       \U12345678  8 hexadecimal digits - Unicode character

All other escape sequences are reserved.  Note that \c{\\0}, meaning a
\c{NUL} character (ASCII 0), is a special case of the octal escape
sequence.

\i{Unicode} characters specified with \c{\\u} or \c{\\U} are converted to
\i{UTF-8}.  For example, the following lines are all equivalent:

\c       db `\u263a`            ; UTF-8 smiley face
\c       db `\xe2\x98\xba`      ; UTF-8 smiley face
\c       db 0E2h, 098h, 0BAh    ; UTF-8 smiley face


\S{chrconst} \i{Character Constants}

A character constant consists of a string up to eight bytes long, used
in an expression context.  It is treated as if it were an integer.

A character constant with more than one byte will be arranged
with \i{little-endian} order in mind: if you code

\c           mov eax,'abcd'

then the constant generated is not \c{0x61626364}, but
\c{0x64636261}, so that if you were then to store the value into
memory, it would read \c{abcd} rather than \c{dcba}. This is also
the sense of character constants understood by the Pentium's
\i\c{CPUID} instruction.
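
As an illustration of the \c{CPUID} case, the vendor string
\c{GenuineIntel} is returned four characters at a time in \c{EBX},
\c{EDX} and \c{ECX}, so it can be tested directly against character
constants. This is a minimal sketch; the label \c{not_intel} is
hypothetical:

\c         xor     eax,eax                 ; CPUID function 0
\c         cpuid                           ; vendor string in ebx, edx, ecx
\c         cmp     ebx,'Genu'
\c         jne     not_intel
\c         cmp     edx,'ineI'
\c         jne     not_intel
\c         cmp     ecx,'ntel'
\c         jne     not_intel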


\S{strconst} \i{String Constants}

String constants are character strings used in the context of some
pseudo-instructions, namely the
\I\c{DW}\I\c{DD}\I\c{DQ}\I\c{DT}\I\c{DO}\I\c{DY}\i\c{DB} family and
\i\c{INCBIN} (where it represents a filename.)  They are also used in
certain preprocessor directives.

A string constant looks like a character constant, only longer. It
is treated as a concatenation of character constants of the maximum
size for the context. So the following are equivalent:

\c       db    'hello'               ; string constant
\c       db    'h','e','l','l','o'   ; equivalent character constants

And the following are also equivalent:

\c       dd    'ninechars'           ; doubleword string constant
\c       dd    'nine','char','s'     ; becomes three doublewords
\c       db    'ninechars',0,0,0     ; and really looks like this

Note that when used in a string-supporting context, quoted strings are
treated as string constants even if they are short enough to be a
character constant, because otherwise \c{db 'ab'} would have the same
effect as \c{db 'a'}, which would be silly. Similarly, three-character
or four-character constants are treated as strings when they are
operands to \c{DW}, and so forth.
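
Following the same pattern as the \c{DD} example, a three-character
operand to \c{DW} would be handled like this:

\c       dw    'abc'                 ; string constant in a DW context
\c       dw    'ab','c'              ; becomes two words
\c       db    'abc',0               ; and really looks like this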

\S{unicode} \I{UTF-16}\I{UTF-32}\i{Unicode} Strings

The special operators \i\c{__utf16__} and \i\c{__utf32__} allow
definition of Unicode strings.  They take a string in UTF-8 format and
convert it to (little-endian) UTF-16 or UTF-32, respectively.

For example:

\c %define u(x) __utf16__(x)
\c %define w(x) __utf32__(x)
\c
\c       dw u('C:\WINDOWS'), 0       ; Pathname in UTF-16
\c       dd w(`A + B = \u206a`), 0   ; String in UTF-32

\c{__utf16__} and \c{__utf32__} can be applied either to strings
passed to the \c{DB} family instructions, or to character constants in
an expression context.  

\S{fltconst} \I{floating-point, constants}Floating-Point Constants

\i{Floating-point} constants are acceptable only as arguments to
\i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, and \i\c{DO}, or as
arguments to the special operators \i\c{__float8__},
\i\c{__float16__}, \i\c{__float32__}, \i\c{__float64__},
\i\c{__float80m__}, \i\c{__float80e__}, \i\c{__float128l__}, and
\i\c{__float128h__}.

Floating-point constants are expressed in the traditional form:
digits, then a period, then optionally more digits, then optionally an
\c{E} followed by an exponent. The period is mandatory, so that NASM
can distinguish between \c{dd 1}, which declares an integer constant,
and \c{dd 1.0} which declares a floating-point constant.  NASM also
supports C99-style hexadecimal floating-point: \c{0x}, hexadecimal
digits, period, optionally more hexadecimal digits, then optionally a
\c{P} followed by a \e{binary} (not hexadecimal) exponent in decimal
notation.

Underscores to break up groups of digits are permitted in
floating-point constants as well.

Some examples:

\c       db    -0.2                    ; "Quarter precision"
\c       dw    -0.5                    ; IEEE 754r/SSE5 half precision
\c       dd    1.2                     ; an easy one
\c       dd    1.222_222_222           ; underscores are permitted
\c       dd    0x1p+2                  ; 1.0x2^2 = 4.0
\c       dq    0x1p+32                 ; 1.0x2^32 = 4 294 967 296.0
\c       dq    1.e10                   ; 10 000 000 000.0
\c       dq    1.e+10                  ; synonymous with 1.e10
\c       dq    1.e-10                  ; 0.000 000 000 1
\c       dt    3.141592653589793238462 ; pi
\c       do    1.e+4000                ; IEEE 754r quad precision

The 8-bit "quarter-precision" floating-point format is
sign:exponent:mantissa = 1:4:3 with an exponent bias of 7.  This
appears to be the most frequently used 8-bit floating-point format,
although it is not covered by any formal standard.  This is sometimes
called a "\i{minifloat}."
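
Working through the layout by hand for a few exactly representable
values gives the following encodings. This is a sketch, assuming an
IEEE-style implicit leading one for normal numbers, which the
description above does not state explicitly:

\c       db    1.0                     ; 0 0111 000 = 0x38
\c       db    2.0                     ; 0 1000 000 = 0x40
\c       db    -1.5                    ; 1 0111 100 = 0xBC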

The special operators are used to produce floating-point numbers in
other contexts.  They produce the binary representation of a specific
floa