$Id$

HiPE AMD64 ABI
==============
This document describes aspects of HiPE's runtime system
that are specific to the AMD64 (x86-64) architecture.

Register Usage
--------------
%rsp and %rbp are fixed and must be preserved by calls (callee-save).
%rax, %rbx, %rcx, %rdx, %rsi, %rdi, %r8, %r9, %r10, %r11, %r12, %r13, %r14
are clobbered by calls (caller-save).
%r15 is a fixed global register (unallocatable).

%rsp is the native code stack pointer, growing towards lower addresses.
%rbp (aka P) is the current process' "Process*".
%r15 (aka HP) is the current process' heap pointer. (If HP_IN_R15 is true.)

Notes:
- C/AMD64 16-byte aligns %rsp, presumably for SSE and signal handling.
  HiPE/AMD64 does not need that, so our %rsp is only 8-byte aligned.
- HiPE/x86 uses %esi for HP, but C/AMD64 uses %rsi for parameter passing,
  so HiPE/AMD64 should not use %rsi for HP.
- Using %r15 for HP requires a REX instruction prefix, but performing
  64-bit stores needs one anyway, so the only REX-prefix overhead
  occurs when incrementing or copying HP [not true (we need REX for
  64-bit add and mov too); the only overhead is when accessing floats
  on the heap /Luna].
- XXX: HiPE/x86 could just as easily use %ebx for HP. HiPE/AMD64 could use
  %rbx, but the performance impact is probably minor. Try&measure?
- XXX: Cache SP_LIMIT, HP_LIMIT, and FCALLS in registers? Try&measure.

Calling Convention
------------------
Same as in the HiPE/x86 ABI, with the following adjustments:

The first NR_ARG_REGS (a tunable parameter between 0 and 6, inclusive)
parameters are passed in %rsi, %rdx, %rcx, %r8, %r9, and %rdi.

The first return value from a function is placed in %rax, the second
(if any) is placed in %rdx.

Notes:
- Currently, NR_ARG_REGS==0.
- C BIFs expect P in C parameter register 1: %rdi. By making Erlang
  parameter registers 1-5 coincide with C parameter registers 2-6,
  our BIF wrappers can simply move P to %rdi without having to shift
  the remaining parameter registers.
- A few primop calls target C functions that do not take a P parameter.
  For these, the code generator should have a "ccall" instruction which
  passes parameters starting with %rdi instead of %rsi.
- %rdi can still be used for Erlang parameter passing. The BIF wrappers
  will push it to the C stack, but parameter #6 would have been
  pushed anyway, so there is no additional overhead.
- We could pass more parameters in %rax, %rbx, %r10, %r11, %r12, %r13,
  and %r14. However:
  * we may need a scratch register for distant call trampolines
  * using >6 argument registers complicates the mode-switch interface
    (needs hacks and special-case optimisations)
  * it is questionable whether using more than 6 improves performance;
    it may be better to just cache more P state in registers

Instruction Encoding / Code Model
---------------------------------
AMD64 maintains x86's limit of <= 32 bits for PC-relative offsets
in call and jmp instructions. HiPE/AMD64 handles this as follows:
- The compiler emits ordinary call/jmp instructions for
  recursive calls and tailcalls.
- The runtime system code is loaded into the low 32 bits of the
  address space. (C/AMD64 small or medium code model.) By using mmap()
  with the MAP_32BIT flag when allocating memory for code, all
  code will be in the low 32 bits of the address space, and hence
  no trampolines will be necessary.

When generating code for non-immediate literals (boxed objects in
the constants pool), the code generator should use AMD64's new
instruction for loading a 64-bit immediate into a register:
mov reg,imm with a REX prefix.

Notes:
- The loader/linker could redirect a distant call (where the offset
  does not fit in a 32-bit signed immediate) to a linker-generated
  trampoline. However, managing trampolines requires changes in the
  loaders and possibly also the object code format, since the trampoline
  must be close to the call site, which implies that code and its
  trampolines must be created as a unit. This is the better long-term
  solution, not just for AMD64 but also for SPARC32 and PowerPC,
  both of which have similar problems.
- The constants pool could also be restricted to the low 32 bits of
  the address space. However:
  * We want to move away from a single constants pool. With multiple
    areas, the address space restriction may be unrealistic.
  * Creating the address of a literal is an infrequent operation, so
    the performance impact of using 64-bit immediates should be minor.

Stack Frame Layout
Garbage Collection Interface
BIFs
Stacks and Unix Signal Handlers
-------------------------------
Same as in the HiPE/x86 ABI.


Standard C/AMD64 Calling Conventions
====================================
See <http://www.x86-64.org/abi.pdf>.

%rax, %rdx, %rcx, %rsi, %rdi, %r8, %r9, %r10, %r11 are clobbered by calls (caller-save)
%rsp, %rbp, %rbx, %r12, %r13, %r14, %r15 are preserved by calls (callee-save)
[note: %rsi and %rdi are calleR-save, not calleE-save as in the x86 ABI]

%rsp is the stack pointer (fixed). It is required that ((%rsp+8) & 15) == 0
when a function is entered. (Section 3.2.2 in the ABI document.)
%rbp is an optional frame pointer; when not used as one, it can hold
a local variable.

The first six integer parameters are passed in %rdi, %rsi, %rdx, %rcx, %r8, and %r9.
Remaining integer parameters are pushed right-to-left on the stack.
When calling a variadic function, %rax (%al actually) must contain an upper
bound on the number of SSE parameter registers, 0-8 inclusive.

%r10 is used for passing a function's static chain pointer.
%r11 is available for PLT code when computing the target address.

The first integer return value is put in %rax, the second (for __int128) in %rdx.
A memory return value (exact definition is complicated, but basically "large struct")
is implemented as follows: the caller passes a pointer in %rdi as a hidden first
parameter, the callee stores the result there and returns this pointer in %rax.

The caller deallocates stacked parameters after return (addq $N, %rsp).
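
Illustration (not part of the ABI): the Calling Convention notes argue that
a BIF wrapper only has to move P into %rdi, because Erlang parameter
registers 1-5 already sit in C parameter registers 2-6. The Python sketch
below models that argument using the two register orders listed in this
document. The function name bif_wrapper_steps and the (action, source,
destination) tuples are hypothetical, and the model assumes NR_ARG_REGS == 6
so that every Erlang argument arrives in a register; it is not HiPE's actual
wrapper code.

```python
# Register orders as listed in this document.
C_ARG_REGS = ["%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"]
ERLANG_ARG_REGS = ["%rsi", "%rdx", "%rcx", "%r8", "%r9", "%rdi"]
P_REG = "%rbp"  # HiPE keeps the current Process* here


def bif_wrapper_steps(arity):
    """Hypothetical model of the steps needed to call a C BIF as
    f(P, arg1, ..., argN) from Erlang code.

    Assumes NR_ARG_REGS == 6, so only arities 0..6 are modelled.
    Returns a list of (action, source, destination) tuples.
    """
    if not 0 <= arity <= len(ERLANG_ARG_REGS):
        raise ValueError("this sketch only models register-passed arguments")
    steps = []
    for i in range(arity):
        src = ERLANG_ARG_REGS[i]
        c_slot = i + 1  # zero-based C slot; slot 0 is reserved for P
        if c_slot < len(C_ARG_REGS):
            dst = C_ARG_REGS[c_slot]
            if src != dst:  # never true for the orders above
                steps.append(("mov", src, dst))
        else:
            # The seventh C parameter goes on the stack in any case.
            steps.append(("push", src, "C stack"))
    # Done last, so that an Erlang argument in %rdi is saved first.
    steps.append(("mov", P_REG, C_ARG_REGS[0]))
    return steps
```

Under these assumptions, bif_wrapper_steps(5) yields only the move of P into
%rdi, and bif_wrapper_steps(6) adds a single push of %rdi before it, which
matches the "no additional overhead" remark in the notes.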
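
Illustration (not part of the ABI): the Instruction Encoding / Code Model
section turns on whether a call or jmp offset fits in a signed 32-bit
immediate. The Python sketch below spells out that check for the 5-byte
call/jmp rel32 encodings, where the displacement is measured from the address
of the following instruction. The names rel32_displacement and
needs_trampoline are hypothetical. As background, Linux's mmap documentation
describes MAP_32BIT as placing the mapping in the first 2 GB of the address
space, which is why code allocated that way should always be mutually
reachable.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1
REL32_INSN_LEN = 5  # opcode byte (E8 call / E9 jmp) + 4-byte displacement


def rel32_displacement(insn_addr, target_addr):
    """Displacement a call/jmp rel32 at insn_addr needs to reach target_addr.

    The CPU adds the displacement to the address of the next instruction.
    """
    return target_addr - (insn_addr + REL32_INSN_LEN)


def needs_trampoline(insn_addr, target_addr):
    """True if the target cannot be encoded as a signed 32-bit displacement."""
    disp = rel32_displacement(insn_addr, target_addr)
    return not (INT32_MIN <= disp <= INT32_MAX)
```

For example, needs_trampoline(0x400000, 0x500000) is False, while a target at
0x7F0000000000 (a typical high mmap address) makes it True for the same call
site.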
