⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 s_expm1l.s

📁 glibc 库, 不仅可以学习使用库函数,还可以学习函数的具体实现,是提高功力的好资料
💻 S
📖 第 1 页 / 共 3 页
字号:
.file "expl_m1.s"// Copyright (c) 2000 - 2003, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//// History//==============================================================// 02/02/00 Initial Version// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly//          set [the previously overwritten] GR_Parameter_RESULT.// 07/07/01 Improved speed of all paths// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double table values// 03/11/03 Improved accuracy and performance, corrected missing inexact flags// 04/17/03 Eliminated misplaced and unused data label// 12/15/03 Eliminated call to error support on expm1l underflow////********************************************************************* //// Function:   Combined expl(x) and expm1l(x), where//                        x //             expl(x) = e , for double-extended precision x values//                          x//             expm1l(x) = e  - 1  for double-extended precision x values////********************************************************************* //// Resources Used:////    Floating-Point Registers: f8  (Input and Return Value) //                              f9-f15,f32-f77 ////    General Purpose Registers: //      r14-r38//      r35-r38 (Used to pass arguments to error handling routine)//                                     //    Predicate Registers:      p6-p15////********************************************************************* //// IEEE Special Conditions:////    Denormal  fault raised on denormal inputs  //    Overflow exceptions raised when appropriate for exp and expm1//    Underflow exceptions raised when appropriate for exp and expm1//    (Error Handling Routine called for overflow and Underflow)//    Inexact raised when appropriate by algorithm ////    exp(inf) = inf//    exp(-inf) = +0//    exp(SNaN) = QNaN//    exp(QNaN) = QNaN//    exp(0) = 1//    exp(EM_special Values) = QNaN//    exp(inf) = inf//    expm1(-inf) = -1 //    expm1(SNaN) = QNaN//    expm1(QNaN) = QNaN//    expm1(0) = 0//    expm1(EM_special Values) = QNaN//    //********************************************************************* //// Implementation and Algorithm Notes:////  ker_exp_64( in_FR  : X,//            out_FR : Y_hi,//            out_FR : Y_lo,//            out_FR : scale,//            out_PR : Safe )//// On input, X is in register format// p6 for exp,// p7 for expm1,//// On output, ////   scale*(Y_hi + Y_lo)  approximates  exp(X)       if exp//   scale*(Y_hi + Y_lo)  approximates  exp(X)-1     if expm1//// The accuracy is sufficient for a highly accurate 64 sig.// bit implementation.  Safe is set if there is no danger of // overflow/underflow when the result is composed from scale, // Y_hi and Y_lo. Thus, we can have a fast return if Safe is set. // Otherwise, one must prepare to handle the possible exception // appropriately.  Note that SAFE not set (false) does not mean // that overflow/underflow will occur; only the setting of SAFE// guarantees the opposite.//// **** High Level Overview **** //// The method consists of three cases.// // If           |X| < Tiny	use case exp_tiny;// else if	|X| < 2^(-m)	use case exp_small; m=12 for exp, m=7 for expm1// else		use case exp_regular;//// Case exp_tiny:////   1 + X     can be used to approximate exp(X) //   X + X^2/2 can be used to approximate exp(X) - 1//// Case exp_small:////   Here, exp(X) and exp(X) - 1 can all be //   appproximated by a relatively simple polynomial.////   This polynomial resembles the truncated Taylor series////	exp(w) = 1 + w + w^2/2! + w^3/3! + ... + w^n/n!//// Case exp_regular:////   Here we use a table lookup method. The basic idea is that in//   order to compute exp(X), we accurately decompose X into////   X = N * log(2)/(2^12)  + r,	|r| <= log(2)/2^13.////   Hence////   exp(X) = 2^( N / 2^12 ) * exp(r).////   The value 2^( N / 2^12 ) is obtained by simple combinations//   of values calculated beforehand and stored in table; exp(r)//   is approximated by a short polynomial because |r| is small.////   We elaborate this method in 4 steps.////   Step 1: Reduction////   The value 2^12/log(2) is stored as a double-extended number//   L_Inv.////   N := round_to_nearest_integer( X * L_Inv )////   The value log(2)/2^12 is stored as two numbers L_hi and L_lo so//   that r can be computed accurately via////   r := (X - N*L_hi) - N*L_lo////   We pick L_hi such that N*L_hi is representable in 64 sig. bits//   and thus the FMA   X - N*L_hi   is error free. So r is the //   1 rounding error from an exact reduction with respect to //   //   L_hi + L_lo.////   In particular, L_hi has 30 significant bit and can be stored//   as a double-precision number; L_lo has 64 significant bits and//   stored as a double-extended number.////   Step 2: Approximation////   exp(r) - 1 is approximated by a short polynomial of the form//   //   r + A_1 r^2 + A_2 r^3 + A_3 r^4 .////   Step 3: Composition from Table Values ////   The value 2^( N / 2^12 ) can be composed from a couple of tables//   of precalculated values. First, express N as three integers//   K, M_1, and M_2 as////     N  =  K * 2^12  + M_1 * 2^6 + M_2////   Where 0 <= M_1, M_2 < 2^6; and K can be positive or negative.//   When N is represented in 2's complement, M_2 is simply the 6//   lsb's, M_1 is the next 6, and K is simply N shifted right//   arithmetically (sign extended) by 12 bits.////   Now, 2^( N / 2^12 ) is simply  //	//      2^K * 2^( M_1 / 2^6 ) * 2^( M_2 / 2^12 )////   Clearly, 2^K needs no tabulation. The other two values are less//   trivial because if we store each accurately to more than working//   precision, than its product is too expensive to calculate. We//   use the following method.////   Define two mathematical values, delta_1 and delta_2, implicitly//   such that////     T_1 = exp( [M_1 log(2)/2^6]  -  delta_1 ) //     T_2 = exp( [M_2 log(2)/2^12] -  delta_2 )////   are representable as 24 significant bits. To illustrate the idea,//   we show how we define delta_1: ////     T_1     := round_to_24_bits( exp( M_1 log(2)/2^6 ) )//     delta_1  = (M_1 log(2)/2^6) - log( T_1 )  ////   The last equality means mathematical equality. We then tabulate////     W_1 := exp(delta_1) - 1//     W_2 := exp(delta_2) - 1////   Both in double precision.////   From the tabulated values T_1, T_2, W_1, W_2, we compose the values//   T and W via////     T := T_1 * T_2			...exactly//     W := W_1 + (1 + W_1)*W_2	////   W approximates exp( delta ) - 1  where delta = delta_1 + delta_2.//   The mathematical product of T and (W+1) is an accurate representation//   of 2^(M_1/2^6) * 2^(M_2/2^12).////   Step 4. Reconstruction////   Finally, we can reconstruct exp(X), exp(X) - 1. //   Because////	X = K * log(2) + (M_1*log(2)/2^6  - delta_1) //		       + (M_2*log(2)/2^12 - delta_2)//		       + delta_1 + delta_2 + r 		...accurately//   We have////	exp(X) ~=~ 2^K * ( T + T*[exp(delta_1+delta_2+r) - 1] )//	       ~=~ 2^K * ( T + T*[exp(delta + r) - 1]         )//	       ~=~ 2^K * ( T + T*[(exp(delta)-1)  //				 + exp(delta)*(exp(r)-1)]   )//             ~=~ 2^K * ( T + T*( W + (1+W)*poly(r) ) )//             ~=~ 2^K * ( Y_hi  +  Y_lo )////   where Y_hi = T  and Y_lo = T*(W + (1+W)*poly(r))////   For exp(X)-1, we have////	exp(X)-1 ~=~ 2^K * ( Y_hi + Y_lo ) - 1//		 ~=~ 2^K * ( Y_hi + Y_lo - 2^(-K) )////   and we combine Y_hi + Y_lo - 2^(-N)  into the form of two //   numbers  Y_hi + Y_lo carefully.////   **** Algorithm Details ****////   A careful algorithm must be used to realize the mathematical ideas//   accurately. We describe each of the three cases. We assume SAFE//   is preset to be TRUE.////   Case exp_tiny:////   The important points are to ensure an accurate result under //   different rounding directions and a correct setting of the SAFE //   flag.////   If expm1 is 1, then//      SAFE  := False	...possibility of underflow//      Scale := 1.0//      Y_hi  := X//      Y_lo  := 2^(-17000)//   Else//      Scale := 1.0//      Y_hi  := 1.0//      Y_lo  := X	...for different rounding modes//   Endif////   Case exp_small:////   Here we compute a simple polynomial. To exploit parallelism, we split//   the polynomial into several portions.////   Let r = X ////   If exp 	...i.e. exp( argument )////      rsq := r * r; //      r4  := rsq*rsq//      poly_lo := P_3 + r*(P_4 + r*(P_5 + r*P_6))//      poly_hi := r + rsq*(P_1 + r*P_2)//      Y_lo    := poly_hi + r4 * poly_lo//      Y_hi    := 1.0//      Scale   := 1.0////   Else			...i.e. exp( argument ) - 1////      rsq := r * r//      r4  := rsq * rsq//      poly_lo := Q_7 + r*(Q_8 + r*Q_9))//      poly_med:= Q_3 + r*Q_4 + rsq*(Q_5 + r*Q_6)//      poly_med:= poly_med + r4*poly_lo//      poly_hi := Q_1 + r*Q_2//      Y_lo    := rsq*(poly_hi +  rsq*poly_lo)//      Y_hi    := X//      Scale   := 1.0////   Endif////  Case exp_regular:////  The previous description contain enough information except the//  computation of poly and the final Y_hi and Y_lo in the case for//  exp(X)-1.////  The computation of poly for Step 2:////   rsq := r*r//   poly := r + rsq*(A_1 + r*(A_2 + r*A_3))////  For the case exp(X) - 1, we need to incorporate 2^(-K) into//  Y_hi and Y_lo at the end of Step 4.////   If K > 10 then//      Y_lo := Y_lo - 2^(-K)//   Else//      If K < -10 then//	 Y_lo := Y_hi + Y_lo//	 Y_hi := -2^(-K)//      Else//	 Y_hi := Y_hi - 2^(-K)//      End If//   End If////=======================================================// General Purpose Registers//GR_ad_Arg           = r14GR_ad_A             = r15GR_sig_inv_ln2      = r15GR_rshf_2to51       = r16GR_ad_PQ            = r16GR_ad_Q             = r16GR_signexp_x        = r17GR_exp_x            = r17GR_small_exp        = r18GR_rshf             = r18GR_exp_mask         = r19GR_ad_W1            = r20GR_exp_2tom51       = r20GR_ad_W2            = r21GR_exp_underflow    = r21GR_M2               = r22GR_huge_exp         = r22GR_M1               = r23GR_huge_signif      = r23GR_K                = r24GR_one              = r24GR_minus_one        = r24GR_exp_bias         = r25GR_ad_Limits        = r26GR_N_fix            = r26GR_exp_2_mk         = r26GR_ad_P             = r27GR_exp_2_k          = r27GR_big_expo_neg     = r28GR_very_small_exp   = r29GR_exp_half         = r29GR_ad_T1            = r30GR_ad_T2            = r31GR_SAVE_PFS         = r32GR_SAVE_B0          = r33GR_SAVE_GP          = r34GR_Parameter_X      = r35GR_Parameter_Y      = r36GR_Parameter_RESULT = r37GR_Parameter_TAG    = r38 // Floating Point Registers//FR_norm_x           = f9FR_RSHF_2TO51       = f10FR_INV_LN2_2TO63    = f11FR_W_2TO51_RSH      = f12FR_2TOM51           = f13FR_RSHF             = f14FR_Y_hi             = f34FR_Y_lo             = f35FR_scale            = f36FR_tmp              = f37FR_float_N          = f38FR_N_signif         = f39FR_L_hi             = f40FR_L_lo             = f41FR_r                = f42FR_W1               = f43FR_T1               = f44FR_W2               = f45FR_T2               = f46FR_W1_p1            = f47FR_rsq              = f48FR_A2               = f49FR_r4               = f50FR_A3               = f51FR_poly             = f52FR_T                = f53FR_W                = f54FR_Wp1              = f55FR_p21              = f59FR_p210             = f59FR_p65              = f60FR_p654             = f60FR_p6543            = f60FR_2_mk             = f61FR_P4Q7             = f61FR_P4               = f61FR_Q7               = f61FR_P3Q6             = f62FR_P3               = f62FR_Q6               = f62FR_q65              = f62FR_q6543            = f62FR_P2Q5             = f63FR_P2               = f63FR_Q5               = f63FR_P1Q4             = f64FR_P1               = f64FR_Q4               = f64FR_q43              = f64FR_Q3               = f65FR_Q2               = f66FR_q21              = f66FR_Q1               = f67FR_A1               = f68FR_P6Q9             = f68FR_P6               = f68FR_Q9               = f68FR_P5Q8             = f69FR_P5               = f69FR_Q8               = f69FR_q987             = f69FR_q98              = f69FR_q9876543         = f69FR_min_oflow_x      = f70FR_huge_exp         = f70FR_zero_uflow_x     = f71FR_huge_signif      = f71FR_huge             = f72FR_small            = f72FR_half             = f73FR_T_scale          = f74FR_result_lo        = f75FR_W_T_scale        = f76FR_Wp1_T_scale      = f77FR_ftz              = f77FR_half_x           = f77//FR_X                = f9FR_Y                = f0FR_RESULT           = f15

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -