exception.s

RTEMS (Real-Time Executive for Multiprocessor Systems) is a free open source real-time operating sys
0:	la      r0,48(r2)     # Use explicit loop to avoid using ctr
1:	cmpw    r1,r3         # In theory the loop is somewhat slower
	beq-    2f            # than documentation example
	cmpw    r0,r2         # but we gain from starting cache load
	lwzu    r1,8(r2)      # earlier and using slots between load
	bne+    1b            # and comparison for other purposes.
	cmpw    r1,r3
	bne-    4f            # Secondary hash check
2:	lwz     r1,4(r2)      # Found: load second word of PTE
	mfspr   r0,DMISS      # get miss address during load delay
#ifdef ASSUME_REF_SET
	mtspr	RPA,r1
	mfsrr1	r3
	tlbld	r0
#else
	andi.   r3,r1,0x100   # check R bit ahead to help folding
	mfsrr1  r3            # get saved cr0 bits now to dual issue
	ori     r1,r1,0x100
	mtspr   RPA,r1
	tlbld   r0
/* Do not update PTE if R bit already set, this will save one cache line
writeback at a later time, and avoid even more bus traffic in
multiprocessing systems, when several processors access the same PTEGs.
We also hope that the reference bit will be already set. */
	bne+    3f
#ifdef MULTIPROCESSING
	srwi    r1,r1,8       # get byte 7 of pte
	stb     r1,+6(r2)     # update page table
#else
	sth     r1,+6(r2)     # update page table
#endif
#endif
3:	mtcrf   0x80,r3       # restore CR0
	rfi                   # return to executing program

/* The preceding code is 18 to 23 instructions long, which occupies
3 cache lines. */
4:	andi.   r0,r3,0x0040  # see if we have done second hash
	lis     r1,0x4000     # set up error code in case next branch taken
	bne-    9f            # speculatively issue the following
	mfspr   r2,HASH2      # get the second pointer
	ori     r3,r3,0x0040  # change the compare value
	lwz     r1,0(r2)      # load first entry asap
	b       0b            # and go back to main loop
/* We are now at 25 to 30 instructions, using 3 or 4 cache lines for all
cases in which the TLB is successfully loaded. */

/*
  Data TLB miss on store or not dirty page flow

  Entry at 0x1200 with the following:
       srr0 -> address of instruction that caused the miss
       srr1 -> 0:3=cr0, 13=0 (data), 14=lru way, 15=1, 16:31=saved MSR
       msr<tgpr> -> 1
       dMiss -> ea that missed
       dCmp -> the compare value for the va that missed
       hash1 -> pointer to first hash pteg
       hash2 -> pointer to second hash pteg

  Register usage:
       r0 is limit address during search / scratch after
       r1 is pte data / error code for DSI exception when search fails
       r2 is pointer to pte
       r3 is compare value during search / scratch after
*/
	.org	tlb_handlers+0x200
	mfspr   r2,HASH1
	lwz     r1,0(r2)      # Start memory access as soon as possible
	mfspr   r3,DCMP       # to load the cache.
0:	la      r0,48(r2)     # Use explicit loop to avoid using ctr
1:	cmpw    r1,r3         # In theory the loop is somewhat slower
	beq-    2f            # than documentation example
	cmpw    r0,r2         # but we gain from starting cache load
	lwzu    r1,8(r2)      # earlier and using slots between load
	bne+    1b            # and comparison for other purposes.
	cmpw    r1,r3
	bne-    4f            # Secondary hash check
2:	lwz     r1,4(r2)      # Found: load second word of PTE
	mfspr   r0,DMISS      # get miss address during load delay
/* We could simply set the C bit and then rely on hardware to flag protection
violations. This raises the problem that a page which actually has not been
modified may be marked as dirty and violates the OEA model for guaranteed bit
settings (table 5-8 of 603eUM.pdf). This can have harmful consequences on
operating system memory management routines, and play havoc with copy on
write schemes. So the protection check is ABSOLUTELY necessary. */
	andi.   r3,r1,0x80    # check C bit
	beq-    5f            # if (C==0) go to check protection
3:	mfsrr1  r3            # get the saved cr0 bits
	mtspr   RPA,r1        # set the pte
	tlbld   r0            # load the dtlb
	mtcrf   0x80,r3       # restore CR0
	rfi                   # return to executing program
/* The preceding code is 20 instructions long, which occupy
3 cache lines. */
4:	andi.   r0,r3,0x0040  # see if we have done second hash
	lis     r1,0x4200     # set up error code in case next branch taken
	bne-    9f            # speculatively issue the following
	mfspr   r2,HASH2      # get the second pointer
	ori     r3,r3,0x0040  # change the compare value
	lwz     r1,0(r2)      # load first entry asap
	b       0b            # and go back to main loop
/* We are now at 27 instructions, using 3 or 4 cache lines for all
cases in which the TLB C bit is already set. */

#ifdef DIRTY_MEANS_WRITABLE
5:	lis     r1,0x0A00     # protection violation on store
#else
/*
  Entry found and C==0: check protection before setting C:

  Register usage:
       r0 is dMiss register
       r1 is PTE entry (to be copied to RPA if success)
       r2 is pointer to pte
       r3 is trashed

  For the 603e, the key bit in SRR1 helps to decide whether there is a
  protection violation. However the way the check is done in the manual is
  not very efficient. The code shown here works as well for 603 and 603e and
  is much more efficient for the 603 and comparable to the manual example
  for 603e. This code however has quite a bad structure due to the fact it
  has been reordered to speed up the most common cases.
*/
/* The first of the following two instructions could be replaced by
andi. r3,r1,3 but it would compete with cmplwi for cr0 resource. */
5:	clrlwi  r3,r1,30      # Extract two low order bits
	cmplwi  r3,2          # Test for PP=10
	bne-    7f            # assume fallthrough is more frequent
6:	ori     r1,r1,0x180   # set referenced and changed bit
	sth     r1,6(r2)      # update page table
	b       3b            # and finish loading TLB
/* We are now at 33 instructions, using 5 cache lines. */
7:	bgt-    8f            # if PP=11 then DSI protection exception
/* This code only works if key bit is present (602/603e/603ev) */
#ifdef USE_KEY_BIT
	mfsrr1  r3            # get the KEY bit and test it
	andis.  r3,r3,0x0008
	beq     6b            # default prediction taken, truly better ?
#else
/* This code is for all 602 and 603 family models: */
	mfsrr1  r3            # Here the trick is to use the MSR PR bit as a
	mfsrin  r0,r0         # shift count for an rlwnm. instruction which
	extrwi  r3,r3,1,17    # extracts and tests the correct key bit from
	rlwnm.  r3,r0,r3,1,1  # the segment register. RISC they said...
	mfspr   r0,DMISS      # Restore fault address to r0
	beq     6b            # if 0 load tlb else protection fault
#endif
/* We are now at 40 instructions, (37 if using key bit), using 5 cache
lines in all cases in which the C bit is successfully set */
8:	lis     r1,0x0A00     # protection violation on store
#endif /* DIRTY_IS_WRITABLE */
/* PTE entry not found branch here with DSISR code in r1 */
9:	mfsrr1  r3
	mtdsisr r1
	clrlwi  r2,r3,16      # set up srr1 for DSI exception
	mfmsr   r0
/* I have some doubts about the usefulness of the xori instruction in
mixed or pure little-endian environment. The address is in the same
doubleword, hence in the same protection domain and performing an exclusive
or with 7 is only valid for byte accesses. */
#ifdef CHECK_MIXED_ENDIAN
	andi.   r1,r2,1       # test LE bit ahead to help folding
#endif
	mtsrr1  r2
	rlwinm  r0,r0,0,15,13 # clear the msr<tgpr> bit
	mfspr   r1,DMISS      # get miss address
#ifdef CHECK_MIXED_ENDIAN
	beq     1f            # if little endian then:
	xori    r1,r1,0x07    # de-mung the data address
1:
#endif
	mtdar   r1            # put in dar
	mtcrf   0x80,r3       # restore CR0
	mtmsr   r0            # flip back to the native gprs
	isync                 # required from 602 manual
	b       DSIVec        # branch to DSI exception
/* We are now between 50 and 56 instructions. Close to the limit
but should be sufficient in case bugs are found. */
/* Altogether the three handlers occupy 128 instructions in the worst
case, 64 instructions could still be added (non contiguously). */

	.org	tlb_handlers+0x300
	.globl	_handler_glue
_handler_glue:
/* Entry code for exceptions: DSI (0x300), ISI(0x400), alignment(0x600) and
 * traps(0x700). In theory it is not necessary to save and restore r13 and all
 * higher numbered registers, but it is done because it allowed to call the
 * firmware (PPCBug) for debugging in the very first stages when writing the
 * bootloader. */
	stwu	r1,-160(r1)
	stw	r0,save_r(0)
	mflr	r0
	stmw	r2,save_r(2)
	bl	0f
0:	mfctr	r4
	stw	r0,save_lr
	mflr	r9		/* Interrupt vector + few instructions */
	la	r10,160(r1)
	stw	r4,save_ctr
	mfcr	r5
	lwz	r8,2f-0b(r9)
	mfxer	r6
	stw	r5,save_cr
	mtctr	r8
	stw	r6,save_xer
	mfsrr0	r7
	stw	r10,save_r(1)
	mfsrr1	r8
	stw	r7,save_nip
	la	r4,8(r1)
	lwz	r13,1f-0b(r9)
	rlwinm	r3,r9,24,0x3f	/* Interrupt vector >> 8 */
	stw	r8,save_msr
	bctrl
	lwz	r7,save_msr
	lwz	r6,save_nip
	mtsrr1	r7
	lwz	r5,save_xer
	mtsrr0	r6
	lwz	r4,save_ctr
	mtxer	r5
	lwz	r3,save_lr
	mtctr	r4
	lwz	r0,save_cr
	mtlr	r3
	lmw	r2,save_r(2)
	mtcr	r0
	lwz	r0,save_r(0)
	la	r1,160(r1)
	rfi
1:	.long	(__bd)@fixup
2:	.long	(_handler)@fixup
	.section .fixup,"aw"
	.align	2
	.long 1b, 2b
	.previous
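The protection check that the store-miss handler performs when it finds a PTE with C==0 (local labels 5 to 8, non-DIRTY_MEANS_WRITABLE case) is hard to follow because the instructions were reordered for speed. The C model below is only a sketch of that decision, not part of the RTEMS sources. The function names (`select_key`, `store_miss_check`, `model_self_check`) and macro names are hypothetical. The bit masks are taken from the constants in the listing, except the Ks/Kp masks and the MSR PR mask, which assume the standard 32-bit PowerPC segment register and MSR layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low (second) word of the PTE, as loaded by "lwz r1,4(r2)". */
#define PTE_PP_MASK 0x003u /* protection bits, "clrlwi r3,r1,30" */
#define PTE_C       0x080u /* changed bit, "andi. r3,r1,0x80" */
#define PTE_R       0x100u /* referenced bit */

/* DSISR code the handler puts in r1 at label 8 ("lis r1,0x0A00"). */
#define DSISR_STORE_PROTECTION 0x0A000000u

/* Assumed layouts: MSR[PR] is bit 17, Ks and Kp are segment register
 * bits 1 and 2 (IBM numbering, bit 0 is the most significant bit). */
#define MSR_PR 0x00004000u
#define SR_KS  0x40000000u
#define SR_KP  0x20000000u

/* Models the extrwi/rlwnm. pair: PR is used as a rotate count, so the
 * tested bit is Ks in supervisor mode and Kp in user mode. */
bool select_key(uint32_t segment_reg, uint32_t saved_msr)
{
    bool pr = (saved_msr & MSR_PR) != 0;
    return (segment_reg & (pr ? SR_KP : SR_KS)) != 0;
}

/* Returns 0 if the TLB entry may be loaded (updating *pte_lo the way
 * label 6 does), or the DSISR code for a protection fault. */
uint32_t store_miss_check(uint32_t *pte_lo, bool key)
{
    if (*pte_lo & PTE_C)
        return 0;                    /* C already set: straight to label 3 */

    uint32_t pp = *pte_lo & PTE_PP_MASK;
    if (pp == 2 || (pp < 2 && !key)) {
        *pte_lo |= PTE_R | PTE_C;    /* label 6: "ori r1,r1,0x180" */
        return 0;
    }
    return DSISR_STORE_PROTECTION;   /* PP=11, or PP=00/01 with key set */
}

/* Small usage example: a PP=10 page is writable whatever the key is. */
void model_self_check(void)
{
    uint32_t pte = 0x002u;           /* PP=10, R=0, C=0 */
    assert(store_miss_check(&pte, true) == 0);
    assert(pte == 0x182u);           /* R and C are now set */
}
```

The model leaves the PTE untouched on a fault, which matches the handler: the `sth` to the page table only happens on the path through label 6.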