        EXPORT  one
        EXPORT  main
main
        MOV     ip, sp
        STMFD   sp!, {fp,ip,lr,pc}
        SUB     fp, ip, #4
        CMPS    sp, sl
        BLLT    |x$stack_overflow|
        BL      one
        MOV     a1, #0
        LDMEA   fp, {fp,sp,pc}^
        DCB     &amp;6f,&amp;6e,&amp;65,&amp;00
        DCD     &amp;ff000004
        EXPORT  zero
        EXPORT  two
one
        MOV     ip, sp
        STMFD   sp!, {fp,ip,lr,pc}
        SUB     fp, ip, #4
        CMPS    sp, sl
        BLLT    |x$stack_overflow|
        BL      zero
        LDMEA   fp, {fp,sp,lr}
        B       two
        IMPORT  |_printf|
two
        ADD     a1, pc, #L000060-.-8
        B       |_printf|
L000060
        DCB     &amp;6d,&amp;61,&amp;69,&amp;6e
        DCB     &amp;2e,&amp;2e,&amp;2e,&amp;6f
        DCB     &amp;6e,&amp;65,&amp;2e,&amp;2e
        DCB     &amp;2e,&amp;74,&amp;77,&amp;6f
        DCB     &amp;0a,&amp;00,&amp;00,&amp;00
zero
        MOVS    pc, lr
        AREA |C$$data|
|x$dataseg|
        END</font></pre>
The example code is not 32-bit compliant. However, the APCS-32 specification simply states that flags need not be preserved. Thus, remove the '^' on the LDMs, and remove the 'S' from the MOVS in zero. Then the code is pretty much the same as that generated by a 32-bit aware compiler.
<p>
The save code pointer points to a location twelve bytes beyond the start of the code which set up that backtrace structure. You can see this in the example. Remember, you will need to strip off the PSR for 26-bit code.
<p>
So now we turn to our function, 'two'. As soon as execution enters 'two':
<ul>
  <li> <i>pc</i> contains the location of the next instruction(s) to be executed, as always
       <br>&nbsp;<br>
  <li> <i>lr</i> contains the value to load into <i>pc</i> to exit (as always). This will also contain the PSR in 26-bit code.
       <br>&nbsp;<br>
  <li> <i>sp</i> points to the current stack chunk limit, or above it. This is the place you can dump temporary data into, registers and the like. 
Under RISC OS, you have at least 256 bytes with the option to extend it.
       <br>&nbsp;<br>
  <li> <i>fp</i> is either zero, or it points to the most recent part of the backtrace structure.
       <br>&nbsp;<br>
  <li> Function arguments are arranged as described below.
</ul>
<p>
<h2>Arguments</h2>
The layout of records, arrays, and the like is not defined by APCS. Thus, a language is free to define how it performs these activities. However, making your own implementation is not really in the spirit of APCS, as it would not permit code from your compiler to be linked with code from another compiler. Typically, the C language conventions are utilised.
<p>
<ul>
  <li> The first four integer arguments (or fewer, if there are fewer) are loaded into a1 - a4.
       <br>&nbsp;<br>
  <li> The first four floating point arguments (or fewer, if there are fewer) are loaded into f0 - f3.
       <br>&nbsp;<br>
  <li> Anything else (if anything) is stored in memory, pointed to by the words immediately above the value of sp on entry. In other words, the remaining arguments have been pushed onto the stack. It seems, therefore, that optimisation may be made simply by defining functions to receive four or fewer parameters.
</ul>
<p>
<h2>Leaving the function</h2>
The return link value is moved into the program counter to exit the function, and:
<ul>
  <li> If the function returns a value no larger than a word, that value is to be present in a1.
       <br>&nbsp;<br>
  <li> If the function returns a floating point value, then it is to be present in f0.
       <br>&nbsp;<br>
  <li> sp, fp, sl, v1-v6, and f4-f7 shall be restored (if altered) to contain the values that were present on entry.<br>
       I have tested corrupting the registers, intentionally, and can report that the results can be the most unexpected and bizarre glitches (often in totally different parts of the program), as well as the expected 'uh-oh!'.
       
<br>&nbsp;<br>
  <li> ip, lr, a2-a4, f1-f3, and those arguments that were stacked may be corrupted.
</ul>
In 32bit modes, the PSR flags do not need to be preserved across a function call. In 26bit modes they should be, and would be implicitly restored by moving lr into pc (MOVS, or LDMFD xxx^).<br>
The N, Z, C, and V flags <i>must</i> be reloaded from lr; it is not enough to preserve the flags across the function.
<p>
<h2>APCSs</h2>
Globally, there are several versions of APCS (16, in fact). We are, however, only going to concern ourselves with those you may encounter on RISC OS.
<p>
<b>APCS-A</b><br>
This is APCS-Arthur, which was defined in the dark days of Arthur. You may come across it (unlikely, though), or references to it, so it is worth knowing it exists. It has been deprecated and, due to differing register definitions (that seem somehow alien to a seasoned RISC OS coder), it should not be used.<br>
It was for Arthur applications running in USR mode.<br>
<code>sl = R13, fp = R10, ip = R11, sp = R12, lr = R14, pc = R15.</code><br>
The PRM (p4-411) says &quot;<i>Use of <code>r12</code> as <code>sp</code>, rather than the architecturally more natural <code>r13</code>, is historical and predates both Arthur and RISC OS.</i>&quot;<br>
The stack is segmented and is extended on demand.<br>
26-bit program counter.<br>
No passing of floating point arguments in FP registers.<br>
Non-reentrant.
Flags must be restored.
<p>
<b>APCS-R</b><br>
This is APCS-RISC OS. It is for RISC OS applications operating in USR mode, or modules/handlers in SVC mode.<br>
<code>sl = R10, fp = R11, ip = R12, sp = R13, lr = R14, pc = R15.</code><br>
This is the single most common APCS version, as all compiled C programs will have used APCS-R.<br>
Explicit stack limit checking<br>
26-bit program counter.<br>
No passing of floating point arguments in FP registers.<br>
Non-reentrant.
Flags must be restored.
<p>
<b>APCS-U</b><br>
This is APCS-Unix, used in Acorn's RISCiX. 
It is for RISCiX applications (USR mode) or the kernel (SVC mode).<br>
<code>sl = R10, fp = R11, ip = R12, sp = R13, lr = R14, pc = R15.</code><br>
Implicit stack limit checking (with sl)<br>
26-bit program counter.<br>
No passing of floating point arguments in FP registers.<br>
Non-reentrant.
Flags must be restored.
<p>
<b>APCS-32</b><br>
This is an extension of APCS-2 (-R and -U) which allows for a 32bit program counter, and for flags to not be restored on exit from a function executing in USR mode.<br>
Other things as for APCS-R.<br>
Acorn C version 5 supports the generation of 32bit code; the most complete being the development release of the 32bit tools for wide-area debugging. A simple test is to ask your compiler to export assembler source (instead of making object code). You should <i>not</i> find:<br>
<code>MOV<b><i>S</i></b> PC, R14</code><br>
or:<br>
<code>LDMFD R13!, {Rx-x, PC}<b><i>^</i></b></code>
<p>
<h2>Creating a stack backtrace structure</h2>
For simple functions (fixed number of parameters, non-reentrant), you can create a stack backtrace structure in a few instructions:
<pre>function_name_label
        MOV     ip, sp
        STMFD   sp!, {fp,ip,lr,pc}
        SUB     fp, ip, #4</pre>
That snippet (from the aforementioned compiled program) is the most basic form. If you intend to corrupt some of the non-corruptible registers, then you should include those registers in the STMFD instruction.
<p>
Your next task is to check the stack space. If you don't need much space (less than 256 bytes) then you can use:
<pre>        CMPS    sp, sl
        BLLT    |x$stack_overflow|</pre>
That is the C version 4.00 way of handling overflows. In later versions, you will want to call <code>|__rt_stkovf_split_small|</code>.
<p>
Then you do your stuff...
<p>
Exiting is performed by:
<pre>        LDMEA   fp, {fp,sp,pc}^</pre>
Again, if you stacked other registers, then reload them here.<br>
The exit mechanism was chosen because it is easier and saner to simply LDM... 
to exit a function than to branch to a special function exit handler.
<p>
An extension to the protocol, used in backtracing, is to embed the function name into the code.<br>
Immediately before the function (and the <code>MOV ip, sp</code>), you should have the following:
<pre>        DCD     &amp;ff0000xx</pre>
Where 'xx' is the length of the function name string (including padding and terminator). This string is word-aligned, tail-padded, and should be placed directly before the DCD &amp;ff....
<p>
So, your complete stack backtrace code would look like:
<pre>        DCB     "my_function_name", 0, 0, 0, 0
        DCD     &amp;ff000014                 ; 16 characters + 4 bytes of terminator and padding
my_function_name
        MOV     ip, sp
        STMFD   sp!, {fp, ip, lr, pc}
        SUB     fp, ip, #4
        CMPS    sp, sl                    ; may be omitted if you
        BLLT    |x$stack_overflow|        ; won't be using stack
        ...process...
        LDMEA   fp, {fp, sp, pc}^</pre>
To make this 32-bit compliant, simply omit the '^' in the final instruction. Note that you then cannot use that code within 26-bit compiled code. Truth be told, you <i>may</i> get away with it, but it's not something I'd like to bet on.
<p>&nbsp;<p>
If you use no stack, and you don't need to save any registers, and you don't call anything, then setting up an APCS block is unnecessary (but might be useful to track down problems during the debug stage).<br>
In this case, you could:
<pre>my_simple_function
        ...process...
        MOVS    pc, lr</pre>
(again, use MOV instead of MOVS for 32bit APCS, but don't take your chances linking with 26bit code).
<p>
<h2>Useful codey things</h2>
The first thing to consider is that dratted 26/32 bit issue. Put simply, there is absolutely no way in hell that the same general-purpose code can be assembled for both versions of APCS, without some hairy and devious tricks.<br>
But, frankly, this isn't an issue. We know that your APCS standard isn't going to suddenly change. 
We also know that a 32bit version of RISC OS is not going to transmogrify itself when you pop out to brew a cuppa.<br>
So utilising this, we can devise a scheme to support both versions. This goes far beyond the APCS; for a 32bit version of RISC OS you will need to think of poking around with MSR to deal with status and mode bits, instead of the old TEQP-is-your-friend.<br>
Many existing APIs don't actually require flags to be preserved. So in our 32bit version we can get away with changing <code>MOVS PC,...</code> to <code>MOV PC,...</code>, and <code>LDM{...}^</code> to <code>LDM {...}</code>, and rebuilding.<br>
The <i>objasm</i> assembler (v3.00 or later) has a <code>{CONFIG}</code> variable which will be either <code>26</code> or <code>32</code>. Using this, it is possible to build macros...
<pre>my_function_name
        MOV     ip, sp
        STMFD   sp!, {fp, ip, lr, pc}
        SUB     fp, ip, #4
        ...process...
        [ {CONFIG} = 26
          LDMEA   fp, {fp, sp, pc}^
        |
          LDMEA   fp, {fp, sp, pc}
        ]</pre>
I've not tested this code. It (or something like it) is likely to be the best way to stay compatible with both versions of APCS, and also with both versions of RISC OS, the 26bit version and the future 32bit version.
<p>&nbsp;<p>
Testing for 32bit?<br>
If you require your code to be adaptive, there is a simple test to determine the processor PC state. From this, you can determine:
<ul>
  <li> 26bit PC, may be APCS-R or APCS-32.
  <li> 32bit PC, will <i>never</i> be APCS-R. 
All 26-bit code (TEQP etc) is doomed to failure!
</ul>
<p>
<pre>   TEQ     PC, PC     ; EQ for 32bit; NE for 26bit</pre>
<p>&nbsp;<p>
First case optimisation<br>
Let's say we have a function like:
<pre>  int getbytefromcache(ptr)
  {
     /* ptr is pointer to cache value 0...xxxx */
     int __ptr = ptr;
     if (__ptr &gt; __cachebase)
     {
        __ptr -= __cachebase;
        if (__ptr &lt; __cachelimit)
           return (int)__cache[__ptr];
     }
     /* flush the cache, reload wanted block, return value */
     ...</pre>
It's a crappy example I devised off the top of my head. You have a pointer which can point into a total area of memory, and a cache of a small part.<br>
If you look at it, the lead tests are pretty simple and quick. You could perform these without the APCS overheads. Something like:
<pre>getbytefromcache
        LDR     a4, __cachebase
        CMP     a1, a4
        BLT     getbytefromcache_entry
        SUB     a2, a1, a4
        LDR     a4, __cachesize
        CMP     a2, a4
        LDRLTB  a1, [a2]
        MOVLT   pc, lr
        ; fall through if not LT
getbytefromcache_entry
        MOV     ip, sp
        STMFD   sp!, {fp, ip, lr, pc}
        SUB     fp, ip, #4
        ... stuff ...
        LDRB    a1, [a#]
        [ {CONFIG} = 26
          LDMEA   fp, {fp, sp, pc}^
        |
          LDMEA   fp, {fp, sp, pc}
        ]</pre>
That example is, again, off the top of my head so don't blindly copy the code.<tt>:-)</tt><br>
The point, though, is that if there is something quick and simple that can be done in a few instructions which can skip that which is done in many instructions, then it may be a worthwhile optimisation to make. A suggested rule of thumb: if it would take less time to execute than twice the time taken for the APCS stack structure creation, then try to optimise it.<br>
(I work visually, counting the numbers of lines and taking STM as being '2'. 
This is a lot quicker than actually working out how many nanoseconds it would take by counting cycles!)
<p>&nbsp;<p>
<hr size = "3">
<a href="index.html#08">Return to assembler index</a>
<hr size = "3">
<address>Copyright &copy; 2001 Richard Murray</address>
</body>
</html>