📄 x86-defs.m4
divert(-1)

dnl  m4 macros for x86 assembler.

dnl  Copyright 1999, 2000, 2001, 2002 Free Software Foundation, Inc.
dnl
dnl  This file is part of the GNU MP Library.
dnl
dnl  The GNU MP Library is free software; you can redistribute it and/or
dnl  modify it under the terms of the GNU Lesser General Public License as
dnl  published by the Free Software Foundation; either version 2.1 of the
dnl  License, or (at your option) any later version.
dnl
dnl  The GNU MP Library is distributed in the hope that it will be useful,
dnl  but WITHOUT ANY WARRANTY; without even the implied warranty of
dnl  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
dnl  Lesser General Public License for more details.
dnl
dnl  You should have received a copy of the GNU Lesser General Public
dnl  License along with the GNU MP Library; see the file COPYING.LIB.  If
dnl  not, write to the Free Software Foundation, Inc., 59 Temple Place -
dnl  Suite 330, Boston, MA 02111-1307, USA.


dnl  Notes:
dnl
dnl  m4 isn't perfect for processing BSD style x86 assembler code, the main
dnl  problems are,
dnl
dnl  1. Doing define(foo,123) and then using foo in an addressing mode like
dnl     foo(%ebx) expands as a macro rather than a constant.  This is worked
dnl     around by using deflit() from asm-defs.m4, instead of define().
dnl
dnl  2. Immediates in macro definitions need a space or `' to stop the $
dnl     looking like a macro parameter.  For example,
dnl
dnl             define(foo, `mov $ 123, %eax')
dnl
dnl     This is only a problem in macro definitions, not in ordinary text,
dnl     and not in macro parameters like text passed to forloop() or ifdef().


deflit(BYTES_PER_MP_LIMB, 4)


dnl  Libtool gives -DPIC -DDLL_EXPORT to indicate a cygwin or mingw DLL.
dnl  We undefine PIC since we don't need to be position independent in this
dnl  case and definitely don't want the ELF style _GLOBAL_OFFSET_TABLE_ etc.

ifdef(`DLL_EXPORT',`undefine(`PIC')')


dnl  Called: PROLOGUE_cpu(GSYM_PREFIX`'foo)
dnl
dnl  In the x86 code we use explicit TEXT and ALIGN() calls in the code,
dnl  since different alignments are wanted in various circumstances.  So for
dnl  instance,
dnl
dnl             TEXT
dnl             ALIGN(16)
dnl     PROLOGUE(mpn_add_n)
dnl     ...
dnl     EPILOGUE()

define(`PROLOGUE_cpu',
m4_assert_numargs(1)
`	GLOBL	$1
	TYPE($1,`function')
$1:
ifelse(WANT_PROFILING,`no',,`call_mcount
')')


dnl  Usage: call_mcount
dnl
dnl  For `gprof' style profiling, %ebp is setup as a frame pointer.  None of
dnl  the assembler routines use %ebp this way, so it's done only for the
dnl  benefit of mcount.  glibc sysdeps/i386/i386-mcount.S shows how mcount
dnl  gets the current function from (%esp) and the parent from 4(%ebp).
dnl
dnl  For `prof' style profiling gcc generates mcount calls without setting
dnl  up %ebp, and the same is done here.

define(`call_mcount',
m4_assert_numargs(-1)
m4_assert_defined(`WANT_PROFILING')
m4_assert_defined(`MCOUNT_PIC_REG')
m4_assert_defined(`MCOUNT_NONPIC_REG')
m4_assert_defined(`MCOUNT_PIC_CALL')
m4_assert_defined(`MCOUNT_NONPIC_CALL')
`ifelse(ifdef(`PIC',`MCOUNT_PIC_REG',`MCOUNT_NONPIC_REG'),,,
`	DATA
	ALIGN(4)
L(mcount_data_`'mcount_data_counter):
	W32	0
	TEXT
')dnl
ifelse(WANT_PROFILING,`gprof',
`	pushl	%ebp
	movl	%esp, %ebp
')dnl
ifdef(`PIC',
`	pushl	%ebx
	mcount_movl_GOT_ebx
ifelse(MCOUNT_PIC_REG,,,
`	leal	L(mcount_data_`'mcount_data_counter)@GOTOFF(%ebx), MCOUNT_PIC_REG')
MCOUNT_PIC_CALL
	popl	%ebx
',`dnl non-PIC
ifelse(MCOUNT_NONPIC_REG,,,
`	movl	`$'L(mcount_data_`'mcount_data_counter), MCOUNT_NONPIC_REG
')dnl
MCOUNT_NONPIC_CALL
')dnl
ifelse(WANT_PROFILING,`gprof',
`	popl	%ebp
')
define(`mcount_data_counter',eval(mcount_data_counter+1))')

define(mcount_data_counter,1)


dnl  Called: mcount_movl_GOT_ebx
dnl  Label H is "here", the %eip obtained from the call.  C is the called
dnl  subroutine.
dnl  J is the jump across that subroutine.  A fetch and "ret"
dnl  is always done so calls and returns are balanced for the benefit of the
dnl  various x86s that have return stack branch prediction.

define(mcount_movl_GOT_ebx,
m4_assert_numargs(-1)
`	call	L(mcount_movl_GOT_ebx_C`'mcount_movl_GOT_ebx_counter)
L(mcount_movl_GOT_ebx_H`'mcount_movl_GOT_ebx_counter):
	jmp	L(mcount_movl_GOT_ebx_J`'mcount_movl_GOT_ebx_counter)
L(mcount_movl_GOT_ebx_C`'mcount_movl_GOT_ebx_counter):
	movl	(%esp), %ebx
	ret
L(mcount_movl_GOT_ebx_J`'mcount_movl_GOT_ebx_counter):
	addl	$_GLOBAL_OFFSET_TABLE_+[.-L(mcount_movl_GOT_ebx_H`'mcount_movl_GOT_ebx_counter)], %ebx
define(`mcount_movl_GOT_ebx_counter',incr(mcount_movl_GOT_ebx_counter))')

define(mcount_movl_GOT_ebx_counter,1)


dnl  --------------------------------------------------------------------------
dnl  Various x86 macros.
dnl


dnl  Usage: ALIGN_OFFSET(bytes,offset)
dnl
dnl  Align to `offset' away from a multiple of `bytes'.
dnl
dnl  This is useful for testing, for example align to something very strict
dnl  and see what effect offsets from it have, "ALIGN_OFFSET(256,32)".
dnl
dnl  Generally you wouldn't execute across the padding, but it's done with
dnl  nop's so it'll work.

define(ALIGN_OFFSET,
m4_assert_numargs(2)
`ALIGN($1)
forloop(`i',1,$2,`	nop
')')


dnl  Usage: defframe(name,offset)
dnl
dnl  Make a definition like the following with which to access a parameter
dnl  or variable on the stack.
dnl
dnl         define(name,`FRAME+offset(%esp)')
dnl
dnl  Actually m4_empty_if_zero(FRAME+offset) is used, which will save one
dnl  byte if FRAME+offset is zero, by putting (%esp) rather than 0(%esp).
dnl  Use define(`defframe_empty_if_zero_disabled',1) if for some reason the
dnl  zero offset is wanted.
dnl
dnl  The new macro also gets a check that when it's used FRAME is actually
dnl  defined, and that the final %esp offset isn't negative, which would
dnl  mean an attempt to access something below the current %esp.
dnl
dnl  deflit() is used rather than a plain define(), so the new macro won't
dnl  delete any following
dnl  parenthesized expression.  name(%edi) will come
dnl  out say as 16(%esp)(%edi).  This isn't valid assembler and should
dnl  provoke an error, which is better than silently giving just 16(%esp).
dnl
dnl  See README for more on the suggested way to access the stack frame.

define(defframe,
m4_assert_numargs(2)
`deflit(`$1',
m4_assert_defined(`FRAME')
`defframe_check_notbelow(`$1',$2,FRAME)dnl
defframe_empty_if_zero(FRAME+($2))(%esp)')')

dnl  Called: defframe_empty_if_zero(expression)
define(defframe_empty_if_zero,
m4_assert_numargs(1)
`ifelse(defframe_empty_if_zero_disabled,1,
`eval($1)',
`m4_empty_if_zero($1)')')

dnl  Called: defframe_check_notbelow(`name',offset,FRAME)
define(defframe_check_notbelow,
m4_assert_numargs(3)
`ifelse(eval(($3)+($2)<0),1,
`m4_error(`$1 at frame offset $2 used when FRAME is only $3 bytes
')')')


dnl  Usage: FRAME_pushl()
dnl         FRAME_popl()
dnl         FRAME_addl_esp(n)
dnl         FRAME_subl_esp(n)
dnl
dnl  Adjust FRAME appropriately for a pushl or popl, or for an addl or subl
dnl  %esp of n bytes.
dnl
dnl  Using these macros is completely optional.  Sometimes it makes more
dnl  sense to put explicit deflit(`FRAME',N) forms, especially when there's
dnl  jumps and different sequences of FRAME values need to be used in
dnl  different places.

define(FRAME_pushl,
m4_assert_numargs(0)
m4_assert_defined(`FRAME')
`deflit(`FRAME',eval(FRAME+4))')

define(FRAME_popl,
m4_assert_numargs(0)
m4_assert_defined(`FRAME')
`deflit(`FRAME',eval(FRAME-4))')

define(FRAME_addl_esp,
m4_assert_numargs(1)
m4_assert_defined(`FRAME')
`deflit(`FRAME',eval(FRAME-($1)))')

define(FRAME_subl_esp,
m4_assert_numargs(1)
m4_assert_defined(`FRAME')
`deflit(`FRAME',eval(FRAME+($1)))')


dnl  Usage: defframe_pushl(name)
dnl
dnl  Do a combination FRAME_pushl() and a defframe() to name the stack
dnl  location just pushed.  This should come after a pushl instruction.
dnl  Putting it on the same line works and avoids lengthening the code.
dnl  For example,
dnl
dnl         pushl   %eax     defframe_pushl(VAR_COUNTER)
dnl
dnl  Notice the defframe() is done with an unquoted -FRAME thus giving its
dnl  current value without tracking future changes.

define(defframe_pushl,
m4_assert_numargs(1)
`FRAME_pushl()defframe(`$1',-FRAME)')


dnl  --------------------------------------------------------------------------
dnl  Assembler instruction macros.
dnl


dnl  Usage: emms_or_femms
dnl         femms_available_p
dnl
dnl  femms_available_p expands to 1 or 0 according to whether the AMD 3DNow
dnl  femms instruction is available.  emms_or_femms expands to femms if
dnl  available, or emms if not.
dnl
dnl  emms_or_femms is meant for use in the K6 directory where plain K6
dnl  (without femms) and K6-2 and K6-3 (with a slightly faster femms) are
dnl  supported together.
dnl
dnl  On K7 femms is no longer faster and is just an alias for emms, so plain
dnl  emms may as well be used.

define(femms_available_p,
m4_assert_numargs(-1)
`m4_ifdef_anyof_p(
	`HAVE_HOST_CPU_k62',
	`HAVE_HOST_CPU_k63',
	`HAVE_HOST_CPU_athlon')')

define(emms_or_femms,
m4_assert_numargs(-1)
`ifelse(femms_available_p,1,`femms',`emms')')


dnl  Usage: femms
dnl
dnl  Gas 2.9.1 which comes with FreeBSD 3.4 doesn't support femms, so the
dnl  following is a replacement using .byte.
dnl
dnl  If femms isn't available, an emms is generated instead, for convenience
dnl  when testing on a machine without femms.

define(femms,
m4_assert_numargs(-1)
`ifelse(femms_available_p,1,
`.byte	15,14	C AMD 3DNow femms',
`emms`'dnl
m4_warning(`warning, using emms in place of femms, use for testing only
')')')


dnl  Usage: jadcl0(op)
dnl
dnl  Generate a jnc/incl as a substitute for adcl $0,op.
dnl  Note this isn't an exact replacement, since it doesn't set the flags
dnl  like adcl does.
dnl
dnl  This finds a use in K6 mpn_addmul_1, mpn_submul_1, mpn_mul_basecase and
dnl  mpn_sqr_basecase because on K6 an adcl is slow, the branch
dnl  misprediction penalty is small, and the multiply algorithm used leads
dnl  to a carry bit on average only 1/4 of the time.
dnl
dnl  jadcl0_disabled can be set to 1 to instead generate an ordinary adcl
dnl  for comparison.  For example,
dnl
dnl         define(`jadcl0_disabled',1)
dnl
dnl  When using a register operand, eg. "jadcl0(%edx)", the jnc/incl code is
dnl  the same size as an adcl.  This makes it possible to use the exact same
dnl  computed jump code when testing the relative speed of the two.

define(jadcl0,
m4_assert_numargs(1)
`ifelse(jadcl0_disabled,1,
	`adcl	$`'0, $1',
	`jnc	L(jadcl0_`'jadcl0_counter)
	incl	$1
L(jadcl0_`'jadcl0_counter):
define(`jadcl0_counter',incr(jadcl0_counter))')')

define(jadcl0_counter,1)


dnl  Usage: cmov_available_p
dnl
dnl  Expand to 1 if cmov is available, 0 if not.

define(cmov_available_p,
m4_assert_numargs(-1)
`m4_ifdef_anyof_p(
	`HAVE_HOST_CPU_pentiumpro',
	`HAVE_HOST_CPU_pentium2',
	`HAVE_HOST_CPU_pentium3',
	`HAVE_HOST_CPU_pentium4',
	`HAVE_HOST_CPU_athlon')')


dnl  Usage: x86_lookup(target, key,value, key,value, ...)
dnl         x86_lookup_p(target, key,value, key,value, ...)
dnl
dnl  Look for `target' among the `key' parameters.
dnl
dnl  x86_lookup expands to the corresponding `value', or generates an error
dnl  if `target' isn't found.
dnl
dnl  x86_lookup_p expands to 1 if `target' is found, or 0 if not.

define(x86_lookup,
m4_assert_numargs_range(1,999)
`ifelse(eval($#<3),1,
`m4_error(`unrecognised part of x86 instruction: $1
')',
`ifelse(`$1',`$2', `$3',
`x86_lookup(`$1',shift(shift(shift($@))))')')')

define(x86_lookup_p,
m4_assert_numargs_range(1,999)
`ifelse(eval($#<3),1, `0',
`ifelse(`$1',`$2',    `1',
`x86_lookup_p(`$1',shift(shift(shift($@))))')')')


dnl  Usage: x86_opcode_reg32(reg)
dnl         x86_opcode_reg32_p(reg)
dnl
dnl  x86_opcode_reg32 expands to the standard 3 bit encoding for
dnl  the given 32-bit register, eg. `%ebp' turns into 5.
dnl
dnl  x86_opcode_reg32_p expands to 1 if reg is a valid 32-bit register, or 0
dnl  if not.

define(x86_opcode_reg32,
m4_assert_numargs(1)
`x86_lookup(`$1',x86_opcode_reg32_list)')
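

dnl  Example: a sketch of how x86_lookup and x86_lookup_p expand, worked
dnl  out by hand from their definitions earlier in this file.  This is
dnl  illustrative only and not part of the original file; the keys and
dnl  values shown are hypothetical.
dnl
dnl         x86_lookup(`b', `a',1, `b',2, `c',3)       expands to 2
dnl         x86_lookup_p(`b', `a',1, `b',2, `c',3)     expands to 1
dnl         x86_lookup_p(`z', `a',1, `b',2, `c',3)     expands to 0
dnl
dnl  Each step compares `target' against the first key and, on a mismatch,
dnl  drops that key,value pair with shift(shift(shift($@))) and recurses
dnl  with `target' put back at the front.  When fewer than 3 arguments
dnl  remain, x86_lookup_p gives 0, whereas x86_lookup with an unmatched
dnl  target such as `z' ends in m4_error.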