📄 ssin.s
	cmpil	#0,d0
	jgt	COSTINY

SINTINY:
	movew	#0x0000,a6@(XDCARE)	|...JUST IN CASE
	fmovel	d1,fpcr			| restore users exceptions
	fmovex	a6@(X),fp0		| last inst - possible exception set
	jra	__x_t_frcinx

COSTINY:
	.long	0xf23c4400,0x3f800000	/* fmoves &0x3F800000,fp0 */
	fmovel	d1,fpcr			| restore users exceptions
	.long	0xf23c4428,0x00800000	/* fsubs &0x00800000,fp0 */
	jra	__x_t_frcinx

REDUCEX:
|--WHEN REDUCEX IS USED, THE CODE WILL INEVITABLY BE SLOW.
|--THIS REDUCTION METHOD, HOWEVER, IS MUCH FASTER THAN USING
|--THE REMAINDER INSTRUCTION WHICH IS NOW IN SOFTWARE.

	fmovemx	fp2-fp5,A7@-		|...save fp2 through fp5
	movel	d2,A7@-
/*	fmoves	&0x00000000,fp1 */
	.long	0xf23c4480,0x00000000

|--If compact form of abs(arg) in d0=0x7ffeffff, argument is so large that
|--there is a danger of unwanted overflow in first LOOP iteration. In this
|--case, reduce argument by one remainder step to make subsequent reduction
|--safe.
	cmpil	#0x7ffeffff,d0		| is argument dangerously large?
	jne	LOOP
	movel	#0x7ffe0000,a6@(FP_SCR2)	| yes
|						| create 2**16383*PI/2
	movel	#0xc90fdaa2,a6@(FP_SCR2+4)
	clrl	a6@(FP_SCR2+8)
	ftstx	fp0			| test sign of argument
	movel	#0x7fdc0000,a6@(FP_SCR3)	| create low half of 2**16383*
|						| PI/2 at FP_SCR3
	movel	#0x85a308d3,a6@(FP_SCR3+4)
	clrl	a6@(FP_SCR3+8)
	fblt	red_neg
	orw	#0x8000,a6@(FP_SCR2)	| positive arg
	orw	#0x8000,a6@(FP_SCR3)
red_neg:
	faddx	a6@(FP_SCR2),fp0	| high part of reduction is exact
	fmovex	fp0,fp1			| save high result in fp1
	faddx	a6@(FP_SCR3),fp0	| low part of reduction
	fsubx	fp0,fp1			| determine low component of result
	faddx	a6@(FP_SCR3),fp1	| fp0/fp1 are reduced argument.

|--ON ENTRY, FP0 IS X, ON RETURN, FP0 IS X REM PI/2, |X| <= PI/4.
|--integer quotient will be stored in N
|--Intermediate remainder is 66-bit long
|--(R,r) in (FP0,FP1)
LOOP:
	fmovex	fp0,a6@(INARG)		|...+-2**K * F, 1 <= F < 2
	movew	a6@(INARG),d0
	movel	d0,a1			|...save a copy of d0
	andil	#0x00007FFF,d0
	subil	#0x00003FFF,d0		|...D0 IS K
	cmpil	#28,d0
	jle	LASTLOOP
CONTLOOP:
	subil	#27,d0			|...D0 IS L := K-27
	movel	#0,a6@(ENDFLAG)
	jra	WORK
LASTLOOP:
	clrl	d0			|...D0 IS L := 0
	movel	#1,a6@(ENDFLAG)

WORK:
|--FIND THE REMAINDER OF (R,r) W.R.T. 2**L * (PI/2). L IS SO CHOSEN
|--THAT INT( X * (2/PI) / 2**(L) ) < 2**29.

|--CREATE 2**(-L) * (2/PI), SIGN(INARG)*2**(63),
|--2**L * (PIby2_1), 2**L * (PIby2_2)

	movel	#0x00003FFE,d2		|...BIASED EXPO OF 2/PI
	subl	d0,d2			|...BIASED EXPO OF 2**(-L)*(2/PI)

	movel	#0xA2F9836E,a6@(FP_SCR1+4)
	movel	#0x4E44152A,a6@(FP_SCR1+8)
	movew	d2,a6@(FP_SCR1)		|...FP_SCR1 is 2**(-L)*(2/PI)

	fmovex	fp0,fp2
	fmulx	a6@(FP_SCR1),fp2
|--WE MUST NOW FIND INT(FP2). SINCE WE NEED THIS VALUE IN
/* |--FLOATING POINT FORMAT, THE TWO FMOVE'S FMOVE.l FP <--> N */
|--WILL BE TOO INEFFICIENT. THE WAY AROUND IT IS THAT
|--(SIGN(INARG)*2**63 + FP2) - SIGN(INARG)*2**63 WILL GIVE
|--US THE DESIRED VALUE IN FLOATING POINT.

|--HIDE SIX CYCLES OF INSTRUCTION
	movel	a1,d2
	swap	d2
	andil	#0x80000000,d2
	oril	#0x5F000000,d2		|...D2 IS SIGN(INARG)*2**63 IN SGL
	movel	d2,a6@(TWOTO63)

	movel	d0,d2
	addil	#0x00003FFF,d2		|...BIASED EXPO OF 2**L * (PI/2)

|--FP2 IS READY
	fadds	a6@(TWOTO63),fp2	|...THE FRACTIONAL PART OF fp1 IS ROUNDED

|--HIDE 4 CYCLES OF INSTRUCTION
|--creating 2**(L)*Piby2_1 and 2**(L)*Piby2_2
	movew	d2,a6@(FP_SCR2)
	clrw	a6@(FP_SCR2+2)
	movel	#0xC90FDAA2,a6@(FP_SCR2+4)
	clrl	a6@(FP_SCR2+8)		|...FP_SCR2 is 2**(L) * Piby2_1

|--FP2 IS READY
	fsubs	a6@(TWOTO63),fp2	|...FP2 is N

	addil	#0x00003FDD,d0
	movew	d0,a6@(FP_SCR3)
	clrw	a6@(FP_SCR3+2)
	movel	#0x85A308D3,a6@(FP_SCR3+4)
	clrl	a6@(FP_SCR3+8)		|...FP_SCR3 is 2**(L) * Piby2_2

	movel	a6@(ENDFLAG),d0

|--We are now ready to perform (R+r) - N*P1 - N*P2, P1 = 2**(L) * Piby2_1 and
|--P2 = 2**(L) * Piby2_2
	fmovex	fp2,fp4
	fmulx	a6@(FP_SCR2),fp4	|...W = N*P1
	fmovex	fp2,fp5
	fmulx	a6@(FP_SCR3),fp5	|...w = N*P2
	fmovex	fp4,fp3
|--we want P+p = W+w but |p| <= half ulp of P
|--Then, we need to compute A := R-P and a := r-p
	faddx	fp5,fp3			|...FP3 is P
	fsubx	fp3,fp4			|...W-P

	fsubx	fp3,fp0			|...FP0 is A := R - P
	faddx	fp5,fp4			|...FP4 is p = (W-P)+w

	fmovex	fp0,fp3			|...FP3 A
	fsubx	fp4,fp1			|...FP1 is a := r - p

|--Now we need to normalize (A,a) to "new (R,r)" where R+r = A+a but
|--|r| <= half ulp of R.
	faddx	fp1,fp0			|...FP0 is R := A+a
|--No need to calculate r if this is the last loop
	cmpil	#0,d0
	jgt	RESTORE

|--Need to calculate r
	fsubx	fp0,fp3			|...A-R
	faddx	fp3,fp1			|...FP1 is r := (A-R)+a
	jra	LOOP

RESTORE:
	fmovel	fp2,a6@(N)
	movel	A7@+,d2
	fmovemx	A7@+,fp2-fp5

	movel	a6@(ADJN),d0
	cmpil	#4,d0

	jlt	SINCONT
	jra	SCCONT

	.globl	__x_ssincosd
__x_ssincosd:
|--SIN AND COS OF X FOR DENORMALIZED X

	.long	0xf23c4480,0x3f800000	/* fmoves &0x3F800000,fp1 */
	bsrl	__x_sto_cos		| store cosine result
	jra	__x_t_extdnrm

	.globl	__x_ssincos
__x_ssincos:
|--SET ADJN TO 4
	movel	#4,a6@(ADJN)

	fmovex	a0@,fp0			|...LOAD INPUT

	movel	A0@,d0
	movew	A0@(4),d0
	fmovex	fp0,a6@(X)
	andil	#0x7FFFFFFF,d0		|...COMPACTIFY X

	cmpil	#0x3FD78000,d0		|...|X| >= 2**(-40)?
	jge	SCOK1
	jra	SCSM

SCOK1:
	cmpil	#0x4004BC7E,d0		|...|X| < 15 PI?
	jlt	SCMAIN
	jra	REDUCEX

SCMAIN:
|--THIS IS THE USUAL CASE, |X| <= 15 PI.
|--THE ARGUMENT REDUCTION IS DONE BY TABLE LOOK UP.
	fmovex	fp0,fp1
	fmuld	TWOBYPI,fp1		|...X*2/PI

|--HIDE THE NEXT THREE INSTRUCTIONS
	lea	__x_PITBL+0x200,a1	|...TABLE OF N*PI/2, N = -32,...,32

|--FP1 IS NOW READY
	fmovel	fp1,a6@(N)		|...CONVERT TO INTEGER

	movel	a6@(N),d0
	asll	#4,d0
	addal	d0,a1			|...ADDRESS OF N*PIBY2, IN Y1, Y2

	fsubx	A1@+,fp0		|...X-Y1
	fsubs	A1@,fp0			|...FP0 IS R = (X-Y1)-Y2

SCCONT:
|--continuation point from REDUCEX

|--HIDE THE NEXT TWO
	movel	a6@(N),d0
	rorl	#1,d0

	cmpil	#0,d0			|...D0 < 0 IFF N IS ODD
	jge	NEVEN

NODD:
|--REGISTERS SAVED SO FAR: D0, A0, FP2.

	fmovex	fp0,a6@(RPRIME)
	fmulx	fp0,fp0			|...FP0 IS S = R*R
	fmoved	SINA7,fp1		|...A7
	fmoved	COSB8,fp2		|...B8
	fmulx	fp0,fp1			|...SA7
	movel	d2,A7@-
	movel	d0,d2
	fmulx	fp0,fp2			|...SB8
	rorl	#1,d2
	andil	#0x80000000,d2

	faddd	SINA6,fp1		|...A6+SA7
	eorl	d0,d2
	andil	#0x80000000,d2
	faddd	COSB7,fp2		|...B7+SB8

	fmulx	fp0,fp1			|...S(A6+SA7)
	eorl	d2,a6@(RPRIME)
	movel	A7@+,d2
	fmulx	fp0,fp2			|...S(B7+SB8)
	rorl	#1,d0
	andil	#0x80000000,d0

	faddd	SINA5,fp1		|...A5+S(A6+SA7)
	movel	#0x3F800000,a6@(POSNEG1)
	eorl	d0,a6@(POSNEG1)
	faddd	COSB6,fp2		|...B6+S(B7+SB8)

	fmulx	fp0,fp1			|...S(A5+S(A6+SA7))
	fmulx	fp0,fp2			|...S(B6+S(B7+SB8))
	fmovex	fp0,a6@(SPRIME)

	faddd	SINA4,fp1		|...A4+S(A5+S(A6+SA7))
	eorl	d0,a6@(SPRIME)
	faddd	COSB5,fp2		|...B5+S(B6+S(B7+SB8))

	fmulx	fp0,fp1			|...S(A4+...)
	fmulx	fp0,fp2			|...S(B5+...)

	faddd	SINA3,fp1		|...A3+S(A4+...)
	faddd	COSB4,fp2		|...B4+S(B5+...)

	fmulx	fp0,fp1			|...S(A3+...)
	fmulx	fp0,fp2			|...S(B4+...)

	faddx	SINA2,fp1		|...A2+S(A3+...)
	faddx	COSB3,fp2		|...B3+S(B4+...)

	fmulx	fp0,fp1			|...S(A2+...)
	fmulx	fp0,fp2			|...S(B3+...)

	faddx	SINA1,fp1		|...A1+S(A2+...)
	faddx	COSB2,fp2		|...B2+S(B3+...)

	fmulx	fp0,fp1			|...S(A1+...)
	fmulx	fp2,fp0			|...S(B2+...)

	fmulx	a6@(RPRIME),fp1		/* |...R'S(A1+...) */
	fadds	COSB1,fp0		|...B1+S(B2...)
	fmulx	a6@(SPRIME),fp0		/* |...S'(B1+S(B2+...)) */

	movel	d1,a7@-			| save users mode # precision
	andil	#0xff,d1		| mask off all exceptions
	fmovel	d1,fpcr
	faddx	a6@(RPRIME),fp1		|...COS(X)
	bsrl	__x_sto_cos		| store cosine result
	fmovel	a7@+,fpcr		| restore users exceptions
	fadds	a6@(POSNEG1),fp0	|...SIN(X)

	jra	__x_t_frcinx

NEVEN:
|--REGISTERS SAVED SO FAR: FP2.

	fmovex	fp0,a6@(RPRIME)
	fmulx	fp0,fp0			|...FP0 IS S = R*R
	fmoved	COSB8,fp1		|...B8
	fmoved	SINA7,fp2		|...A7
	fmulx	fp0,fp1			|...SB8
	fmovex	fp0,a6@(SPRIME)
	fmulx	fp0,fp2			|...SA7
	rorl	#1,d0
	andil	#0x80000000,d0
	faddd	COSB7,fp1		|...B7+SB8
	faddd	SINA6,fp2		|...A6+SA7
	eorl	d0,a6@(RPRIME)
	eorl	d0,a6@(SPRIME)
	fmulx	fp0,fp1			|...S(B7+SB8)
	oril	#0x3F800000,d0
	movel	d0,a6@(POSNEG1)
	fmulx	fp0,fp2			|...S(A6+SA7)

	faddd	COSB6,fp1		|...B6+S(B7+SB8)
	faddd	SINA5,fp2		|...A5+S(A6+SA7)

	fmulx	fp0,fp1			|...S(B6+S(B7+SB8))
	fmulx	fp0,fp2			|...S(A5+S(A6+SA7))

	faddd	COSB5,fp1		|...B5+S(B6+S(B7+SB8))
	faddd	SINA4,fp2		|...A4+S(A5+S(A6+SA7))

	fmulx	fp0,fp1			|...S(B5+...)
	fmulx	fp0,fp2			|...S(A4+...)

	faddd	COSB4,fp1		|...B4+S(B5+...)
	faddd	SINA3,fp2		|...A3+S(A4+...)

	fmulx	fp0,fp1			|...S(B4+...)
	fmulx	fp0,fp2			|...S(A3+...)

	faddx	COSB3,fp1		|...B3+S(B4+...)
	faddx	SINA2,fp2		|...A2+S(A3+...)

	fmulx	fp0,fp1			|...S(B3+...)
	fmulx	fp0,fp2			|...S(A2+...)

	faddx	COSB2,fp1		|...B2+S(B3+...)
	faddx	SINA1,fp2		|...A1+S(A2+...)

	fmulx	fp0,fp1			|...S(B2+...)
	fmulx	fp2,fp0			|...S(A1+...)

	fadds	COSB1,fp1		|...B1+S(B2...)
	fmulx	a6@(RPRIME),fp0		/* |...R'S(A1+...) */
	fmulx	a6@(SPRIME),fp1		/* |...S'(B1+S(B2+...)) */

	movel	d1,a7@-			| save users mode # precision
	andil	#0xff,d1		| mask off all exceptions
	fmovel	d1,fpcr
	fadds	a6@(POSNEG1),fp1	|...COS(X)
	bsrl	__x_sto_cos		| store cosine result
	fmovel	a7@+,fpcr		| restore users exceptions
	faddx	a6@(RPRIME),fp0		|...SIN(X)

	jra	__x_t_frcinx

SCBORS:
	cmpil	#0x3FFF8000,d0
	jgt	REDUCEX

SCSM:
	movew	#0x0000,a6@(XDCARE)
	.long	0xf23c4480,0x3f800000	/* fmoves &0x3F800000,fp1 */
	movel	d1,a7@-			| save users mode # precision
	andil	#0xff,d1		| mask off all exceptions
	fmovel	d1,fpcr
	.long	0xf23c44a8,0x00800000	/* fsubs &0x00800000,fp1 */
	bsrl	__x_sto_cos		| store cosine result
	fmovel	a7@+,fpcr		| restore users exceptions
	fmovex	a6@(X),fp0
	jra	__x_t_frcinx

| end