⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 s_log1pl.s

📁 glibc 2.9,最新版的C语言库函数
💻 S
📖 第 1 页 / 共 3 页
字号:
.file "log1pl.s" // Copyright (c) 2000 - 2003, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History: // 02/02/00 Initial version// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly//          set [the previously overwritten] GR_Parameter_RESULT.// 05/21/01 Removed logl and log10l, putting them in a separate file// 06/29/01 Improved speed of all paths// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double table values////*********************************************************************////*********************************************************************//// Function:   log1pl(x) = ln(x+1), for double-extended precision x values////*********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input and Return Value)//                              f34-f82////    General Purpose Registers://      r32-r56//      r53-r56 (Used to pass arguments to error handling routine)////    Predicate Registers:      p6-p13////*********************************************************************//// IEEE Special Conditions:////    Denormal fault raised on denormal inputs//    Overflow exceptions cannot occur  //    Underflow exceptions raised when appropriate for log1p //    Inexact raised when appropriate by algorithm////    log1pl(inf) = inf//    log1pl(-inf) = QNaN //    log1pl(+/-0) = +/-0 //    log1pl(-1) =  -inf //    log1pl(SNaN) = QNaN//    log1pl(QNaN) = QNaN//    log1pl(EM_special Values) = QNaN////*********************************************************************//// Overview//// The method consists of three cases.//// If      |X| < 2^(-80)	use case log1p_small;// else    |X| < 2^(-7)	        use case log_near1;// else      			use case log_regular;//// Case log1p_small:////   log1pl( X ) = logl( X+1 ) can be approximated by X//// Case log_near1:////   log1pl( X ) = log( X+1 ) can be approximated by a simple polynomial//   in W = X. This polynomial resembles the truncated Taylor//   series W - W^/2 + W^3/3 - ...// // Case log_regular:////   Here we use a table lookup method. The basic idea is that in//   order to compute logl(Arg) = log1pl (Arg-1) for an argument Arg in [1,2), //   we construct a value G such that G*Arg is close to 1 and that//   logl(1/G) is obtainable easily from a table of values calculated//   beforehand. Thus////      logl(Arg) = logl(1/G) + logl(G*Arg)//      	 = logl(1/G) + logl(1 + (G*Arg - 1))////   Because |G*Arg - 1| is small, the second term on the right hand//   side can be approximated by a short polynomial. We elaborate//   this method in four steps.////   Step 0: Initialization////   We need to calculate logl( X+1 ). Obtain N, S_hi such that////      X+1 = 2^N * ( S_hi + S_lo )   exactly////   where S_hi in [1,2) and S_lo is a correction to S_hi in the sense//   that |S_lo| <= ulp(S_hi).////   Step 1: Argument Reduction////   Based on S_hi, obtain G_1, G_2, G_3 from a table and calculate////      G := G_1 * G_2 * G_3//      r := (G * S_hi - 1) + G * S_lo////   These G_j's have the property that the product is exactly //   representable and that |r| < 2^(-12) as a result.////   Step 2: Approximation//////   logl(1 + r) is approximated by a short polynomial poly(r).////   Step 3: Reconstruction//////   Finally, log1pl( X ) = logl( X+1 ) is given by////   logl( X+1 )   =   logl( 2^N * (S_hi + S_lo) )//                 ~=~  N*logl(2) + logl(1/G) + logl(1 + r)//                 ~=~  N*logl(2) + logl(1/G) + poly(r).//// **** Algorithm ****//// Case log1p_small://// Although log1pl(X) is basically X, we would like to preserve the inexactness// nature as well as consistent behavior under different rounding modes.// We can do this by computing the result as //    //     log1pl(X) = X - X*X////// Case log_near1://// Here we compute a simple polynomial. To exploit parallelism, we split// the polynomial into two portions.// //       W := X//       Wsq := W * W//       W4  := Wsq*Wsq//       W6  := W4*Wsq//       Y_hi := W + Wsq*(P_1 + W*(P_2 + W*(P_3 + W*P_4))//       Y_lo := W6*(P_5 + W*(P_6 + W*(P_7 + W*P_8)))//// Case log_regular://// We present the algorithm in four steps.////   Step 0. Initialization//   ----------------------////   Z := X + 1//   N := unbaised exponent of Z//   S_hi := 2^(-N) * Z//   S_lo := 2^(-N) * { (max(X,1)-Z) + min(X,1) }////   Step 1. Argument Reduction//   --------------------------////   Let////      Z = 2^N * S_hi = 2^N * 1.d_1 d_2 d_3 ... d_63////   We obtain G_1, G_2, G_3 by the following steps.//////      Define		X_0 := 1.d_1 d_2 ... d_14. This is extracted//      		from S_hi.////      Define		A_1 := 1.d_1 d_2 d_3 d_4. This is X_0 truncated//      		to lsb = 2^(-4).////      Define		index_1 := [ d_1 d_2 d_3 d_4 ].////      Fetch 		Z_1 := (1/A_1) rounded UP in fixed point with//      fixed point	lsb = 2^(-15).//      		Z_1 looks like z_0.z_1 z_2 ... z_15//      	        Note that the fetching is done using index_1.//      		A_1 is actually not needed in the implementation//      		and is used here only to explain how is the value//      		Z_1 defined.////      Fetch		G_1 := (1/A_1) truncated to 21 sig. bits.//      floating pt.	Again, fetching is done using index_1. A_1//      		explains how G_1 is defined.////      Calculate	X_1 := X_0 * Z_1 truncated to lsb = 2^(-14)//      		     = 1.0 0 0 0 d_5 ... d_14//      		This is accomplised by integer multiplication.//      		It is proved that X_1 indeed always begin//      		with 1.0000 in fixed point.//////      Define		A_2 := 1.0 0 0 0 d_5 d_6 d_7 d_8. This is X_1 //      		truncated to lsb = 2^(-8). Similar to A_1,//      		A_2 is not needed in actual implementation. It//      		helps explain how some of the values are defined.////      Define		index_2 := [ d_5 d_6 d_7 d_8 ].////      Fetch 		Z_2 := (1/A_2) rounded UP in fixed point with//      fixed point	lsb = 2^(-15). Fetch done using index_2.//      		Z_2 looks like z_0.z_1 z_2 ... z_15////      Fetch		G_2 := (1/A_2) truncated to 21 sig. bits.//      floating pt.////      Calculate	X_2 := X_1 * Z_2 truncated to lsb = 2^(-14)//      		     = 1.0 0 0 0 0 0 0 0 d_9 d_10 ... d_14//      		This is accomplised by integer multiplication.//      		It is proved that X_2 indeed always begin//      		with 1.00000000 in fixed point.//////      Define		A_3 := 1.0 0 0 0 0 0 0 0 d_9 d_10 d_11 d_12 d_13 1.//      		This is 2^(-14) + X_2 truncated to lsb = 2^(-13).////      Define		index_3 := [ d_9 d_10 d_11 d_12 d_13 ].////      Fetch		G_3 := (1/A_3) truncated to 21 sig. bits.//      floating pt.	Fetch is done using index_3.////      Compute		G := G_1 * G_2 * G_3. ////      This is done exactly since each of G_j only has 21 sig. bits.////      Compute   ////      	r := (G*S_hi - 1) + G*S_lo using 2 FMA operations.////      Thus r approximates G*(S_hi + S_lo) - 1 to within a couple of//      rounding errors.//////  Step 2. Approximation//  ---------------------////   This step computes an approximation to logl( 1 + r ) where r is the//   reduced argument just obtained. It is proved that |r| <= 1.9*2^(-13);//   thus logl(1+r) can be approximated by a short polynomial:////      logl(1+r) ~=~ poly = r + Q1 r^2 + ... + Q4 r^5//////  Step 3. Reconstruction//  ----------------------////   This step computes the desired result of logl(X+1):////      logl(X+1) =   logl( 2^N * (S_hi + S_lo) )//      	  =   N*logl(2) + logl( S_hi + S_lo) )//      	  =   N*logl(2) + logl(1/G) +//      	      logl(1 + G * ( S_hi + S_lo ) - 1 )////   logl(2), logl(1/G_j) are stored as pairs of (single,double) numbers://   log2_hi, log2_lo, log1byGj_hi, log1byGj_lo. The high parts are//   single-precision numbers and the low parts are double precision//   numbers. These have the property that////      N*log2_hi + SUM ( log1byGj_hi )////   is computable exactly in double-extended precision (64 sig. bits).//   Finally////      Y_hi := N*log2_hi + SUM ( log1byGj_hi )//      Y_lo := poly_hi + [ poly_lo + //              ( SUM ( log1byGj_lo ) + N*log2_lo ) ]//RODATA.align 64// ************* DO NOT CHANGE THE ORDER OF THESE TABLES *************// P_8, P_7, P_6, P_5, P_4, P_3, P_2, and P_1 LOCAL_OBJECT_START(Constants_P)//data4  0xEFD62B15,0xE3936754,0x00003FFB,0x00000000//data4  0xA5E56381,0x8003B271,0x0000BFFC,0x00000000//data4  0x73282DB0,0x9249248C,0x00003FFC,0x00000000//data4  0x47305052,0xAAAAAA9F,0x0000BFFC,0x00000000//data4  0xCCD17FC9,0xCCCCCCCC,0x00003FFC,0x00000000//data4  0x00067ED5,0x80000000,0x0000BFFD,0x00000000//data4  0xAAAAAAAA,0xAAAAAAAA,0x00003FFD,0x00000000//data4  0xFFFFFFFE,0xFFFFFFFF,0x0000BFFD,0x00000000data8  0xE3936754EFD62B15,0x00003FFBdata8  0x8003B271A5E56381,0x0000BFFCdata8  0x9249248C73282DB0,0x00003FFCdata8  0xAAAAAA9F47305052,0x0000BFFCdata8  0xCCCCCCCCCCD17FC9,0x00003FFCdata8  0x8000000000067ED5,0x0000BFFDdata8  0xAAAAAAAAAAAAAAAA,0x00003FFDdata8  0xFFFFFFFFFFFFFFFE,0x0000BFFDLOCAL_OBJECT_END(Constants_P)// log2_hi, log2_lo, Q_4, Q_3, Q_2, and Q_1 LOCAL_OBJECT_START(Constants_Q)//data4  0x00000000,0xB1721800,0x00003FFE,0x00000000 //data4  0x4361C4C6,0x82E30865,0x0000BFE2,0x00000000//data4  0x328833CB,0xCCCCCAF2,0x00003FFC,0x00000000//data4  0xA9D4BAFB,0x80000077,0x0000BFFD,0x00000000//data4  0xAAABE3D2,0xAAAAAAAA,0x00003FFD,0x00000000//data4  0xFFFFDAB7,0xFFFFFFFF,0x0000BFFD,0x00000000 data8  0xB172180000000000,0x00003FFEdata8  0x82E308654361C4C6,0x0000BFE2data8  0xCCCCCAF2328833CB,0x00003FFCdata8  0x80000077A9D4BAFB,0x0000BFFDdata8  0xAAAAAAAAAAABE3D2,0x00003FFDdata8  0xFFFFFFFFFFFFDAB7,0x0000BFFDLOCAL_OBJECT_END(Constants_Q)// 1/ln10_hi, 1/ln10_loLOCAL_OBJECT_START(Constants_1_by_LN10)//data4  0x37287195,0xDE5BD8A9,0x00003FFD,0x00000000//data4  0xACCF70C8,0xD56EAABE,0x00003FBB,0x00000000data8  0xDE5BD8A937287195,0x00003FFDdata8  0xD56EAABEACCF70C8,0x00003FBBLOCAL_OBJECT_END(Constants_1_by_LN10)// Z1 - 16 bit fixed LOCAL_OBJECT_START(Constants_Z_1)data4  0x00008000data4  0x00007879data4  0x000071C8data4  0x00006BCBdata4  0x00006667data4  0x00006187data4  0x00005D18data4  0x0000590Cdata4  0x00005556data4  0x000051ECdata4  0x00004EC5data4  0x00004BDBdata4  0x00004925data4  0x0000469Fdata4  0x00004445data4  0x00004211LOCAL_OBJECT_END(Constants_Z_1)// G1 and H1 - IEEE single and h1 - IEEE doubleLOCAL_OBJECT_START(Constants_G_H_h1)data4  0x3F800000,0x00000000data8  0x0000000000000000data4  0x3F70F0F0,0x3D785196data8  0x3DA163A6617D741Cdata4  0x3F638E38,0x3DF13843data8  0x3E2C55E6CBD3D5BBdata4  0x3F579430,0x3E2FF9A0data8  0xBE3EB0BFD86EA5E7data4  0x3F4CCCC8,0x3E647FD6data8  0x3E2E6A8C86B12760data4  0x3F430C30,0x3E8B3AE7data8  0x3E47574C5C0739BAdata4  0x3F3A2E88,0x3EA30C68data8  0x3E20E30F13E8AF2Fdata4  0x3F321640,0x3EB9CEC8data8  0xBE42885BF2C630BDdata4  0x3F2AAAA8,0x3ECF9927data8  0x3E497F3497E577C6data4  0x3F23D708,0x3EE47FC5data8  0x3E3E6A6EA6B0A5AB

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -