📄 s_erfcl.s
//             ~=~  2^K * ( T + T*[exp(delta + r) - 1] )
//             ~=~  2^K * ( T + T*[(exp(delta)-1)
//                          + exp(delta)*(exp(r)-1)] )
//             ~=~  2^K * ( T + T*( W + (1+W)*poly(r) ) )
//             ~=~  2^K * ( Y_hi + Y_lo )
//
//   where Y_hi = T and Y_lo = T*(W + (1+W)*poly(r))
//
//   For exp(X)-1, we have
//
//   exp(X)-1  ~=~  2^K * ( Y_hi + Y_lo ) - 1
//             ~=~  2^K * ( Y_hi + Y_lo - 2^(-K) )
//
//   and we combine Y_hi + Y_lo - 2^(-K) into the form of two
//   numbers Y_hi + Y_lo carefully.
//
//   **** Algorithm Details ****
//
//   A careful algorithm must be used to realize the mathematical ideas
//   accurately. We describe each of the three cases. We assume SAFE
//   is preset to be TRUE.
//
//   Case exp_tiny:
//
//   The important points are to ensure an accurate result under
//   different rounding directions and a correct setting of the SAFE
//   flag.
//
//   If expm1 is 1, then
//      SAFE  := False   ...possibility of underflow
//      Scale := 1.0
//      Y_hi  := X
//      Y_lo  := 2^(-17000)
//   Else
//      Scale := 1.0
//      Y_hi  := 1.0
//      Y_lo  := X       ...for different rounding modes
//   Endif
//
//   Case exp_small:
//
//   Here we compute a simple polynomial. To exploit parallelism, we split
//   the polynomial into several portions.
//
//   Let r = X
//
//   If exp     ...i.e. exp( argument )
//
//      rsq     := r * r
//      r4      := rsq * rsq
//      poly_lo := P_3 + r*(P_4 + r*(P_5 + r*P_6))
//      poly_hi := r + rsq*(P_1 + r*P_2)
//      Y_lo    := poly_hi + r4 * poly_lo
//      Y_hi    := 1.0
//      Scale   := 1.0
//
//   Else       ...i.e. exp( argument ) - 1
//
//      rsq     := r * r
//      r4      := rsq * rsq
//      r6      := rsq * r4
//      poly_lo := r6*(Q_5 + r*(Q_6 + r*Q_7))
//      poly_hi := Q_1 + r*(Q_2 + r*(Q_3 + r*Q_4))
//      Y_lo    := rsq*poly_hi + poly_lo
//      Y_hi    := X
//      Scale   := 1.0
//
//   Endif
//
//   Case exp_regular:
//
//   The previous description contains enough information except for the
//   computation of poly and of the final Y_hi and Y_lo in the case of
//   exp(X)-1.
//
//   The computation of poly for Step 2:
//
//      rsq  := r*r
//      poly := r + rsq*(A_1 + r*(A_2 + r*A_3))
//
//   For the case exp(X) - 1, we need to incorporate 2^(-K) into
//   Y_hi and Y_lo at the end of Step 4.
//
//      If K > 10 then
//         Y_lo := Y_lo - 2^(-K)
//      Else
//         If K < -10 then
//            Y_lo := Y_hi + Y_lo
//            Y_hi := -2^(-K)
//         Else
//            Y_hi := Y_hi - 2^(-K)
//         End If
//      End If
//

// Overview of operation
//==============================================================

// Registers used
//==============================================================
// Floating Point registers used:
// f8, input
// f9 -> f14, f36 -> f126

// General registers used:
// r32 -> r71

// Predicate registers used:
// p6 -> p15

// Assembly macros
//==============================================================
// GR for exp(X)
GR_ad_Arg = r33
GR_ad_C = r34
GR_ERFC_S_TB = r35
GR_signexp_x = r36
GR_exp_x = r36
GR_exp_mask = r37
GR_ad_W1 = r38
GR_ad_W2 = r39
GR_M2 = r40
GR_M1 = r41
GR_K = r42
GR_exp_2_k = r43
GR_ad_T1 = r44
GR_ad_T2 = r45
GR_N_fix = r46
GR_ad_P = r47
GR_exp_bias = r48
GR_BIAS = r48
GR_exp_half = r49
GR_sig_inv_ln2 = r50
GR_rshf_2to51 = r51
GR_exp_2tom51 = r52
GR_rshf = r53

// GR for erfcl(x)
//==============================================================
GR_ERFC_XC_TB = r54
GR_ERFC_P_TB = r55
GR_IndxPlusBias = r56
GR_P_POINT_1 = r57
GR_P_POINT_2 = r58
GR_AbsArg = r59
GR_ShftXBi = r60
GR_ShftPi = r61
GR_mBIAS = r62
GR_ShftPi_bias = r63
GR_ShftXBi_bias = r64
GR_ShftA14 = r65
GR_ShftA15 = r66
GR_EpsNorm = r67
GR_0x1 = r68
GR_ShftPi_8 = r69
GR_26PlusBias = r70
GR_27PlusBias = r71

// GR for __libm_support call
//==============================================================
GR_SAVE_B0 = r64
GR_SAVE_PFS = r65
GR_SAVE_GP = r66
GR_SAVE_SP = r67

GR_Parameter_X = r68
GR_Parameter_Y = r69
GR_Parameter_RESULT = r70
GR_Parameter_TAG = r71

//==============================================================
// Floating Point Registers
//
FR_RSHF_2TO51 = f10
FR_INV_LN2_2TO63 = f11
FR_W_2TO51_RSH = f12
FR_2TOM51 = f13
FR_RSHF = f14

FR_scale = f36
FR_float_N = f37
FR_N_signif = f38
FR_L_hi = f39
FR_L_lo = f40
FR_r = f41
FR_W1 = f42
FR_T1 = f43
FR_W2 = f44
FR_T2 = f45
FR_rsq = f46
FR_C2 = f47
FR_C3 = f48
FR_poly = f49
FR_P6 = f49
FR_T = f50
FR_P5 = f50
FR_P4 = f51
FR_W = f51
FR_P3 = f52
FR_Wp1 = f52
FR_P2 = f53
FR_P1 = f54
FR_Q7 = f56
FR_Q6 = f57
FR_Q5 = f58
FR_Q4 = f59
FR_Q3 = f60
FR_Q2 = f61
FR_Q1 = f62
FR_C1 = f63
FR_A15 = f64
FR_ch_dx = f65
FR_T_scale = f66
FR_norm_x = f67
FR_AbsArg = f68
FR_POS_ARG_ASYMP = f69
FR_NEG_ARG_ASYMP = f70
FR_Tmp = f71
FR_Xc = f72
FR_A0 = f73
FR_A1 = f74
FR_A2 = f75
FR_A3 = f76
FR_A4 = f77
FR_A5 = f78
FR_A6 = f79
FR_A7 = f80
FR_A8 = f81
FR_A9 = f82
FR_A10 = f83
FR_A11 = f84
FR_A12 = f85
FR_A13 = f86
FR_A14 = f87
FR_P15_0_1 = f88
FR_P15_8_1 = f88
FR_P15_1_1 = f89
FR_P15_8_2 = f89
FR_P15_1_2 = f90
FR_P15_2_1 = f91
FR_P15_2_2 = f92
FR_P15_3_1 = f93
FR_P15_3_2 = f94
FR_P15_4_2 = f95
FR_P15_7_1 = f96
FR_P15_7_2 = f97
FR_P15_9_1 = f98
FR_P15_9_2 = f99
FR_P15_13_1 = f100
FR_P15_14_1 = f101
FR_P15_14_2 = f102
FR_Tmp2 = f103
FR_Xpdx_lo = f104
FR_2 = f105
FR_xsq_lo = f106
FR_LocArg = f107
FR_Tmpf = f108
FR_Tmp1 = f109
FR_EpsNorm = f110
FR_UnfBound = f111
FR_NormX = f112
FR_Xpdx_hi = f113
FR_dU = f114
FR_H = f115
FR_G = f116
FR_V = f117
FR_M = f118
FR_U = f119
FR_Q = f120
FR_S = f121
FR_R = f122
FR_res_pos_x_hi = f123
FR_res_pos_x_lo = f124
FR_dx = f125
FR_dx1 = f126

// for error handler routine
FR_X = f9
FR_Y = f0
FR_RESULT = f8

// Data tables