📄 s_expm1l.s
字号:
.file "expl_m1.s"// Copyright (c) 2000 - 2003, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//// History//==============================================================// 02/02/00 Initial Version// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly// set [the previously overwritten] GR_Parameter_RESULT.// 07/07/01 Improved speed of all paths// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;// used data8 for long double table values// 03/11/03 Improved accuracy and performance, corrected missing inexact flags// 04/17/03 Eliminated misplaced and unused data label// 12/15/03 Eliminated call to error support on expm1l underflow////********************************************************************* //// Function: Combined expl(x) and expm1l(x), where// x // expl(x) = e , for double-extended precision x values// x// expm1l(x) = e - 1 for double-extended precision x values////********************************************************************* //// Resources Used://// Floating-Point Registers: f8 (Input and Return Value) // f9-f15,f32-f77 //// General Purpose Registers: // r14-r38// r35-r38 (Used to pass arguments to error handling routine)// // Predicate Registers: p6-p15////********************************************************************* //// IEEE Special Conditions://// Denormal fault raised on denormal inputs // Overflow exceptions raised when appropriate for exp and expm1// Underflow exceptions raised when appropriate for exp and expm1// (Error Handling Routine called for overflow and Underflow)// Inexact raised when appropriate by algorithm //// exp(inf) = inf// exp(-inf) = +0// exp(SNaN) = QNaN// exp(QNaN) = QNaN// exp(0) = 1// exp(EM_special Values) = QNaN// exp(inf) = inf// expm1(-inf) = -1 // expm1(SNaN) = QNaN// expm1(QNaN) = QNaN// expm1(0) = 0// expm1(EM_special Values) = QNaN// //********************************************************************* //// Implementation and Algorithm Notes://// ker_exp_64( in_FR : X,// out_FR : Y_hi,// out_FR : Y_lo,// out_FR : scale,// out_PR : Safe )//// On input, X is in register format// p6 for exp,// p7 for expm1,//// On output, //// scale*(Y_hi + Y_lo) approximates exp(X) if exp// scale*(Y_hi + Y_lo) approximates exp(X)-1 if expm1//// The accuracy is sufficient for a highly accurate 64 sig.// bit implementation. Safe is set if there is no danger of // overflow/underflow when the result is composed from scale, // Y_hi and Y_lo. Thus, we can have a fast return if Safe is set. // Otherwise, one must prepare to handle the possible exception // appropriately. Note that SAFE not set (false) does not mean // that overflow/underflow will occur; only the setting of SAFE// guarantees the opposite.//// **** High Level Overview **** //// The method consists of three cases.// // If |X| < Tiny use case exp_tiny;// else if |X| < 2^(-m) use case exp_small; m=12 for exp, m=7 for expm1// else use case exp_regular;//// Case exp_tiny://// 1 + X can be used to approximate exp(X) // X + X^2/2 can be used to approximate exp(X) - 1//// Case exp_small://// Here, exp(X) and exp(X) - 1 can all be // appproximated by a relatively simple polynomial.//// This polynomial resembles the truncated Taylor series//// exp(w) = 1 + w + w^2/2! + w^3/3! + ... + w^n/n!//// Case exp_regular://// Here we use a table lookup method. The basic idea is that in// order to compute exp(X), we accurately decompose X into//// X = N * log(2)/(2^12) + r, |r| <= log(2)/2^13.//// Hence//// exp(X) = 2^( N / 2^12 ) * exp(r).//// The value 2^( N / 2^12 ) is obtained by simple combinations// of values calculated beforehand and stored in table; exp(r)// is approximated by a short polynomial because |r| is small.//// We elaborate this method in 4 steps.//// Step 1: Reduction//// The value 2^12/log(2) is stored as a double-extended number// L_Inv.//// N := round_to_nearest_integer( X * L_Inv )//// The value log(2)/2^12 is stored as two numbers L_hi and L_lo so// that r can be computed accurately via//// r := (X - N*L_hi) - N*L_lo//// We pick L_hi such that N*L_hi is representable in 64 sig. bits// and thus the FMA X - N*L_hi is error free. So r is the // 1 rounding error from an exact reduction with respect to // // L_hi + L_lo.//// In particular, L_hi has 30 significant bit and can be stored// as a double-precision number; L_lo has 64 significant bits and// stored as a double-extended number.//// Step 2: Approximation//// exp(r) - 1 is approximated by a short polynomial of the form// // r + A_1 r^2 + A_2 r^3 + A_3 r^4 .//// Step 3: Composition from Table Values //// The value 2^( N / 2^12 ) can be composed from a couple of tables// of precalculated values. First, express N as three integers// K, M_1, and M_2 as//// N = K * 2^12 + M_1 * 2^6 + M_2//// Where 0 <= M_1, M_2 < 2^6; and K can be positive or negative.// When N is represented in 2's complement, M_2 is simply the 6// lsb's, M_1 is the next 6, and K is simply N shifted right// arithmetically (sign extended) by 12 bits.//// Now, 2^( N / 2^12 ) is simply // // 2^K * 2^( M_1 / 2^6 ) * 2^( M_2 / 2^12 )//// Clearly, 2^K needs no tabulation. The other two values are less// trivial because if we store each accurately to more than working// precision, than its product is too expensive to calculate. We// use the following method.//// Define two mathematical values, delta_1 and delta_2, implicitly// such that//// T_1 = exp( [M_1 log(2)/2^6] - delta_1 ) // T_2 = exp( [M_2 log(2)/2^12] - delta_2 )//// are representable as 24 significant bits. To illustrate the idea,// we show how we define delta_1: //// T_1 := round_to_24_bits( exp( M_1 log(2)/2^6 ) )// delta_1 = (M_1 log(2)/2^6) - log( T_1 ) //// The last equality means mathematical equality. We then tabulate//// W_1 := exp(delta_1) - 1// W_2 := exp(delta_2) - 1//// Both in double precision.//// From the tabulated values T_1, T_2, W_1, W_2, we compose the values// T and W via//// T := T_1 * T_2 ...exactly// W := W_1 + (1 + W_1)*W_2 //// W approximates exp( delta ) - 1 where delta = delta_1 + delta_2.// The mathematical product of T and (W+1) is an accurate representation// of 2^(M_1/2^6) * 2^(M_2/2^12).//// Step 4. Reconstruction//// Finally, we can reconstruct exp(X), exp(X) - 1. // Because//// X = K * log(2) + (M_1*log(2)/2^6 - delta_1) // + (M_2*log(2)/2^12 - delta_2)// + delta_1 + delta_2 + r ...accurately// We have//// exp(X) ~=~ 2^K * ( T + T*[exp(delta_1+delta_2+r) - 1] )// ~=~ 2^K * ( T + T*[exp(delta + r) - 1] )// ~=~ 2^K * ( T + T*[(exp(delta)-1) // + exp(delta)*(exp(r)-1)] )// ~=~ 2^K * ( T + T*( W + (1+W)*poly(r) ) )// ~=~ 2^K * ( Y_hi + Y_lo )//// where Y_hi = T and Y_lo = T*(W + (1+W)*poly(r))//// For exp(X)-1, we have//// exp(X)-1 ~=~ 2^K * ( Y_hi + Y_lo ) - 1// ~=~ 2^K * ( Y_hi + Y_lo - 2^(-K) )//// and we combine Y_hi + Y_lo - 2^(-N) into the form of two // numbers Y_hi + Y_lo carefully.//// **** Algorithm Details ****//// A careful algorithm must be used to realize the mathematical ideas// accurately. We describe each of the three cases. We assume SAFE// is preset to be TRUE.//// Case exp_tiny://// The important points are to ensure an accurate result under // different rounding directions and a correct setting of the SAFE // flag.//// If expm1 is 1, then// SAFE := False ...possibility of underflow// Scale := 1.0// Y_hi := X// Y_lo := 2^(-17000)// Else// Scale := 1.0// Y_hi := 1.0// Y_lo := X ...for different rounding modes// Endif//// Case exp_small://// Here we compute a simple polynomial. To exploit parallelism, we split// the polynomial into several portions.//// Let r = X //// If exp ...i.e. exp( argument )//// rsq := r * r; // r4 := rsq*rsq// poly_lo := P_3 + r*(P_4 + r*(P_5 + r*P_6))// poly_hi := r + rsq*(P_1 + r*P_2)// Y_lo := poly_hi + r4 * poly_lo// Y_hi := 1.0// Scale := 1.0//// Else ...i.e. exp( argument ) - 1//// rsq := r * r// r4 := rsq * rsq// poly_lo := Q_7 + r*(Q_8 + r*Q_9))// poly_med:= Q_3 + r*Q_4 + rsq*(Q_5 + r*Q_6)// poly_med:= poly_med + r4*poly_lo// poly_hi := Q_1 + r*Q_2// Y_lo := rsq*(poly_hi + rsq*poly_lo)// Y_hi := X// Scale := 1.0//// Endif//// Case exp_regular://// The previous description contain enough information except the// computation of poly and the final Y_hi and Y_lo in the case for// exp(X)-1.//// The computation of poly for Step 2://// rsq := r*r// poly := r + rsq*(A_1 + r*(A_2 + r*A_3))//// For the case exp(X) - 1, we need to incorporate 2^(-K) into// Y_hi and Y_lo at the end of Step 4.//// If K > 10 then// Y_lo := Y_lo - 2^(-K)// Else// If K < -10 then// Y_lo := Y_hi + Y_lo// Y_hi := -2^(-K)// Else// Y_hi := Y_hi - 2^(-K)// End If// End If////=======================================================// General Purpose Registers//GR_ad_Arg = r14GR_ad_A = r15GR_sig_inv_ln2 = r15GR_rshf_2to51 = r16GR_ad_PQ = r16GR_ad_Q = r16GR_signexp_x = r17GR_exp_x = r17GR_small_exp = r18GR_rshf = r18GR_exp_mask = r19GR_ad_W1 = r20GR_exp_2tom51 = r20GR_ad_W2 = r21GR_exp_underflow = r21GR_M2 = r22GR_huge_exp = r22GR_M1 = r23GR_huge_signif = r23GR_K = r24GR_one = r24GR_minus_one = r24GR_exp_bias = r25GR_ad_Limits = r26GR_N_fix = r26GR_exp_2_mk = r26GR_ad_P = r27GR_exp_2_k = r27GR_big_expo_neg = r28GR_very_small_exp = r29GR_exp_half = r29GR_ad_T1 = r30GR_ad_T2 = r31GR_SAVE_PFS = r32GR_SAVE_B0 = r33GR_SAVE_GP = r34GR_Parameter_X = r35GR_Parameter_Y = r36GR_Parameter_RESULT = r37GR_Parameter_TAG = r38 // Floating Point Registers//FR_norm_x = f9FR_RSHF_2TO51 = f10FR_INV_LN2_2TO63 = f11FR_W_2TO51_RSH = f12FR_2TOM51 = f13FR_RSHF = f14FR_Y_hi = f34FR_Y_lo = f35FR_scale = f36FR_tmp = f37FR_float_N = f38FR_N_signif = f39FR_L_hi = f40FR_L_lo = f41FR_r = f42FR_W1 = f43FR_T1 = f44FR_W2 = f45FR_T2 = f46FR_W1_p1 = f47FR_rsq = f48FR_A2 = f49FR_r4 = f50FR_A3 = f51FR_poly = f52FR_T = f53FR_W = f54FR_Wp1 = f55FR_p21 = f59FR_p210 = f59FR_p65 = f60FR_p654 = f60FR_p6543 = f60FR_2_mk = f61FR_P4Q7 = f61FR_P4 = f61FR_Q7 = f61FR_P3Q6 = f62FR_P3 = f62FR_Q6 = f62FR_q65 = f62FR_q6543 = f62FR_P2Q5 = f63FR_P2 = f63FR_Q5 = f63FR_P1Q4 = f64FR_P1 = f64FR_Q4 = f64FR_q43 = f64FR_Q3 = f65FR_Q2 = f66FR_q21 = f66FR_Q1 = f67FR_A1 = f68FR_P6Q9 = f68FR_P6 = f68FR_Q9 = f68FR_P5Q8 = f69FR_P5 = f69FR_Q8 = f69FR_q987 = f69FR_q98 = f69FR_q9876543 = f69FR_min_oflow_x = f70FR_huge_exp = f70FR_zero_uflow_x = f71FR_huge_signif = f71FR_huge = f72FR_small = f72FR_half = f73FR_T_scale = f74FR_result_lo = f75FR_W_T_scale = f76FR_Wp1_T_scale = f77FR_ftz = f77FR_half_x = f77//FR_X = f9FR_Y = f0FR_RESULT = f15
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -