ARM Instruction Formats and Timings

Last revised: 15th November 1995

The information included here is provided in good faith, but no
responsibility can be accepted for any damage or loss caused from the use
of information contained within this document, even if the author has been
advised of the possibility of such loss.

This is not an official document from ARM Ltd; in fact, other than a couple
of nice people from ARM Limited pointing out some of the corrections, they
have no connection with this document at all. They do not guarantee to have
found all the mistakes in this, so don't blame them when you find some
more.

Corrections/amendments for this document would be most welcome. They should
be reported to Robin Watts at the address below.

Throughout this document, a `word' refers to 32 bits (that's 4 bytes) of
memory. If you don't like this, tough.

This document is available in several forms. An index to them can be found
at http://www.comlab.ox.ac.uk/oucl/users/robin.watts/ARMinstrs/ on the
World Wide Web, or via anonymous FTP to ftp.comlab.ox.ac.uk in
/tmp/Robin.Watts/ARMinstrs/README.

1. Processor Modes

ARM processors have a user mode and a number of privileged supervisor
modes. These are used as follows:

IRQ    Entered when an Interrupt Request (IRQ) is triggered.

FIQ    Entered when a Fast Interrupt Request (FIQ) is triggered.

SVC    Entered when a Software Interrupt (SWI) is executed.

Undef  Entered when an Undefined instruction is executed (Not ARM 2 and
       3, where SVC mode is entered).

Abt    Entered when a memory access attempt is aborted by the memory
       manager (e.g. MEMC or MMU), usually because an attempt is made to
       access non-existent memory or to access memory from an
       insufficiently privileged mode (Not ARM 2 and 3, where SVC mode is
       entered).

In each case the appropriate hardware vector is also called.

2. Registers

The ARM 2 and 3 have 27 32 bit processor registers, 16 of which are visible
at any given time (which sixteen varies according to the processor mode).
These are referred to as R0-R15.

The ARM 6 and later have 31 32 bit processor registers, again 16 of which
are visible at any given time.

R15 has special significance. On the ARM 2 and 3, 24 bits are used as the
program counter, and the remaining 8 bits are used to hold processor mode,
status flags and interrupt modes. R15 is therefore often referred to as PC.

    R15 = PC = NZCVIFpp pppppppp pppppppp ppppppMM

Bits 0-1 and 26-31 are known as the PSR (processor status register). Bits
2-25 give the address (in words) of the instruction currently being fetched
into the execution pipeline (see below). Thus instructions are only ever
executed from word aligned addresses.

    M   Current processor mode
    0   User Mode
    1   Fast interrupt processing mode (FIQ mode)
    2   Interrupt processing mode (IRQ mode)
    3   Supervisor mode (SVC mode)

    Name  Meaning
    N     Negative flag
    Z     Zero flag
    C     Carry flag
    V     oVerflow flag
    I     Interrupt request disable
    F     Fast interrupt request disable

R14, R14_FIQ, R14_IRQ, and R14_SVC are sometimes known as `link' registers
due to their behaviour during the branch with link instructions.

The ARM 6 and later processor cores support a 32 bit address space. Such
processors can operate in both 26 bit and 32 bit PC modes. In 26 bit PC
mode, R15 acts as on previous processors, and hence code can only be run in
the lowest 64MBytes of the address space. In 32 bit PC mode, all 32 bits of
R15 are used as the program counter. Separate status registers are used to
store the processor mode and status flags. These are defined as follows:

    NZCVxxxx xxxxxxxx xxxxxxxx IFxMMMMM

Note that the bottom two bits of R15 are always zero in 32-bit modes --
i.e. you can still only get word-aligned instructions. Any attempts to
write non-zeros to these bits will be ignored.
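
As an illustration of the two layouts given in this section, the following
Python sketch unpacks a 26 bit PC mode R15 value and a 32 bit PC mode status
register into their fields. The function names, dictionary keys and example
values are made up for this illustration; only the bit positions come from
the layouts shown earlier in the section.

```python
# Sketch only: function and key names are illustrative assumptions.

MODE_NAMES_26 = {0: "USR", 1: "FIQ", 2: "IRQ", 3: "SVC"}


def decode_r15_26bit(r15):
    """Unpack R15 laid out as NZCVIFpp pppppppp pppppppp ppppppMM."""
    r15 &= 0xFFFFFFFF
    return {
        "N": (r15 >> 31) & 1,
        "Z": (r15 >> 30) & 1,
        "C": (r15 >> 29) & 1,
        "V": (r15 >> 28) & 1,
        "I": (r15 >> 27) & 1,
        "F": (r15 >> 26) & 1,
        # Bits 2-25 hold a word address; masking them in place yields the
        # equivalent byte address, always a multiple of 4 below 64MBytes.
        "address": r15 & 0x03FFFFFC,
        "mode": MODE_NAMES_26[r15 & 0x3],
    }


def decode_psr_32bit(psr):
    """Unpack a status register laid out as NZCVxxxx ... IFxMMMMM."""
    psr &= 0xFFFFFFFF
    return {
        "N": (psr >> 31) & 1,
        "Z": (psr >> 30) & 1,
        "C": (psr >> 29) & 1,
        "V": (psr >> 28) & 1,
        "I": (psr >> 7) & 1,
        "F": (psr >> 6) & 1,
        "mode": psr & 0x1F,  # five mode bits, returned as a number
    }


r15_fields = decode_r15_26bit(0x6C008003)
psr_fields = decode_psr_32bit(0x800000D3)
```

Keeping the address as a byte address rather than a word count is a choice
made here for readability; shifting the result right by two bits gives the
word address that bits 2-25 actually hold.
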

The following modes are currently defined:

    M      Name    Meaning
    00000  usr_26  26 bit PC User Mode
    00001  fiq_26  26 bit PC FIQ Mode
    00010  irq_26  26 bit PC IRQ Mode
    00011  svc_26  26 bit PC SVC Mode
    10000  usr_32  32 bit PC User Mode
    10001  fiq_32  32 bit PC FIQ Mode
    10010  irq_32  32 bit PC IRQ Mode
    10011  svc_32  32 bit PC SVC Mode
    10111  abt_32  32 bit PC Abt Mode
    11011  und_32  32 bit PC Und Mode

Extrapolating from the above table, it might be expected that the following
two modes are also defined:

    M      Name    Meaning
    00111  abt_26  26 bit PC Abt Mode
    01011  und_26  26 bit PC Und Mode

These are in fact undefined (and if you do write 00111 or 01011 to the mode
bits, the resulting chip state won't be what you might expect -- i.e. it
won't be a 26-bit privileged mode with the appropriate R13 and R14 swapped
in).

The following table shows which registers are available in which processor
modes:

    Mode  Registers available
    USR   R0 -- R14                       R15
    FIQ   R0 -- R7   R8_FIQ  -- R14_FIQ   R15
    IRQ   R0 -- R12  R13_IRQ -- R14_IRQ   R15
    SVC   R0 -- R12  R13_SVC -- R14_SVC   R15
    ABT   R0 -- R12  R13_ABT -- R14_ABT   R15  (ARM 6 and later only)
    UND   R0 -- R12  R13_UND -- R14_UND   R15  (ARM 6 and later only)

There are six status registers on the ARM6 and later processors. One is the
current processor status register (CPSR) and holds information about the
current state of the processor. The other five are the saved processor
status registers (SPSRs): there is one of these for each privileged mode,
to hold information about the state the processor must be returned to when
exception handling in that mode is complete.

These registers are set and read using the MSR and MRS instructions
respectively.

3. Pipeline

Rather than being a microcoded processor, the ARM is (in keeping with its
RISCness) entirely hardwired.

To speed execution the ARM 2 and 3 have 3 stage pipelines. The first stage
holds the instruction being fetched from memory. The second starts the
decoding, and the third is where it is actually executed.
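
The three stages just described can be pictured with a small Python sketch.
It assumes straight-line code with no branches, and the function name and
dictionary keys are invented for the illustration; it is a model of the
description above, not of any particular ARM implementation.

```python
# Sketch only: assumes sequential execution, names are illustrative.

INSTRUCTION_BYTES = 4  # every ARM instruction occupies one 32 bit word


def pipeline_snapshot(executing):
    """Return the address held in each stage while the instruction at
    byte address `executing` is in the execute stage."""
    if executing % INSTRUCTION_BYTES != 0:
        raise ValueError("instructions are word aligned")
    return {
        "execute": executing,
        "decode": executing + INSTRUCTION_BYTES,
        "fetch": executing + 2 * INSTRUCTION_BYTES,
    }


stages = pipeline_snapshot(0x8000)
```

In this model the fetch stage is always two instructions (8 bytes) ahead of
the execute stage.
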
Due to this, the program counter is always 2 instructions beyond the
currently executing instruction. (This must be taken into account when
calculating offsets for branch instructions).

Because of this pipeline, 2 instruction cycles are lost on a branch (as the
pipeline must refill). It is therefore often preferable to make use of
conditional instructions to avoid wasting cycles. For example:

            ...
            CMP   R0,#0
            BEQ   over
            MOV   R1,#1
            MOV   R2,#2
    over    ...

can be more efficiently written as:

            ...
            CMP   R0,#0
            MOVNE R1,#1
            MOVNE R2,#2
            ...

4. Timings

ARM instructions are timed in a mixture of S, N, I and C cycles.

An S-cycle is a cycle in which the ARM accesses a sequential memory
location.

An N-cycle is a cycle in which the ARM accesses a non-sequential memory
location.

An I-cycle is a cycle in which the ARM doesn't try to access a memory
location or to transfer a word to or from a coprocessor.

A C-cycle is a cycle in which a word is transferred between the ARM and a
coprocessor on either the data bus (for uncached ARMs) or the coprocessor
bus (for cached ARMs).

The different types of cycle must all be at least as long as the ARM's
clock rating. The memory system can stretch them: with a typical DRAM
system, this results in:

o  N-cycles being twice the minimum length (essentially because DRAMs
   require a longer access protocol when the memory access is
   non-sequential).

o  S-cycles usually being the minimum length, but occasionally being
   stretched to N-cycle length (when you've just moved sequentially from
   the last word of one memory "row" to the first of the next one [1]).

o  I- and C-cycles always being the minimum length.

With a typical SRAM system, all four types of cycle are typically the
minimum length.

On the 8MHz ARM2 used in the Acorn Archimedes A440/1, an S (sequential)
cycle is 125ns and an N (non-sequential) cycle is 250ns.
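
As a rough illustration of how these figures combine, the Python sketch
below adds up the time taken by a mix of cycles using the 8MHz ARM2 DRAM
numbers quoted above. The function name and the example cycle mix are
hypothetical, and the sketch deliberately ignores the occasional S-cycle
that is stretched to N-cycle length.

```python
# Sketch only: cycle lengths are the 8MHz ARM2 / DRAM figures quoted in
# the text; the cycle mix used in the example is hypothetical.

S_NS = 125  # sequential cycle (minimum length)
N_NS = 250  # non-sequential cycle (twice the minimum with DRAM)
I_NS = 125  # internal cycle (always the minimum length)
C_NS = 125  # coprocessor transfer cycle (always the minimum length)


def sequence_time_ns(s=0, n=0, i=0, c=0):
    """Total time in nanoseconds for the given numbers of each cycle type,
    assuming no S-cycle is stretched by a DRAM row change."""
    return s * S_NS + n * N_NS + i * I_NS + c * C_NS


total = sequence_time_ns(s=2, n=1)
```

With a static RAM system giving 125ns for every cycle type, the same mix
would be modelled by setting N_NS to 125 as well.
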
It should be noted that these timings are not attributes of the ARM, but of
the memory system. E.g. an 8MHz ARM2 can be connected to a static RAM
system which gives a 125ns N cycle. The fact that the processor is rated at
8MHz simply means that it isn't guaranteed to work if you make any of the
types of cycle shorter than 125ns in length.

Cached processors: All the information given is in terms of the clock
cycles seen by the ARM. These do not occur at a constant rate: the cache
control logic changes the source of the clock cycles presented to the ARM
when cache misses occur.

Generally, a cached ARM has two clock inputs: the "fast clock" FCLK and the
"memory clock" MCLK. When operating normally from cache, the ARM is clocked
at FCLK speed and all types of cycle are the minimum length: cache is
effectively a type of SRAM from this point of view. When a cache miss
occurs, the ARM's clock is synchronised to MCLK, then the cache line fill
takes place at MCLK speed (taking either N+3S or N+7S depending on the
length of cache lines in the processor involved), then the ARM's clock is
resynchronised back to FCLK.

While the memory access is taking place, the ARM is being clocked: however,
an input called NWAIT is used to cause the ARM cycles involved not to do
anything until the correct word arrives from memory, and usually not to do
anything while the remaining words arrive (to avoid getting further memory
requests while the cache is still busy with the cache line refill). The
situation is also complicated by the fact that the cached ARM can be
configured either for FCLK and MCLK to be synchronous to each other (so
FCLK is an exact multiple of MCLK, and every MCLK clock cycle starts at
just about the same time as an FCLK cycle) or asynchronous (in which case
FCLK and MCLK cycles can have any relationship to each other).

All in all, the situation is therefore quite complicated. An approximation
to the behaviour is that when a cache line miss occurs, the cycle involved
takes the cache line refill time (i.e. N+3S or N+7S) in MCLK cycles, with
N-cycles and S-cycles probably being stretched as described above for DRAM,
plus a few more cycles to allow for the resynchronisation periods. For any
more details, you really need to get a datasheet for the processor
involved.

-----------
[1] Memory controllers tend to use this simple strategy: if an N-cycle
    is requested, treat the access as not being in the same row; if an
    S-cycle is requested, treat the access as being in the same row unless
    it is effectively the last word in the row (which can be detected
    quickly). The net result is that some S-cycles will last the same time
    as an N-cycle; if I remember correctly, on an Archimedes these are
    S-cycle accesses to an address which is divisible by 16. The practical
    consequences of this for Archimedes code are: (a) that about 1 in 4
    S-cycles becomes an N-cycle, since for this purpose, all addresses are
    word addresses and so divisible by 4; (b) that it is occasionally
    worth taking care to align code carefully to avoid this effect and get
    some extra performance.

5. Instructions

Each ARM instruction is 32 bits wide; the instructions are explained in
more detail below. For each instruction class we give the instruction
bitmap, and an example of the syntax used by a typical assembler.

It should of course be noted that the mnemonic syntax is not fixed; it is a
property of the assembler, not the ARM machine code.