📄 s_log1pl.s
字号:
.file "log1pl.s" // Copyright (c) 2000 - 2003, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History: // 02/02/00 Initial version// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly// set [the previously overwritten] GR_Parameter_RESULT.// 05/21/01 Removed logl and log10l, putting them in a separate file// 06/29/01 Improved speed of all paths// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;// used data8 for long double table values////*********************************************************************////*********************************************************************//// Function: log1pl(x) = ln(x+1), for double-extended precision x values////*********************************************************************//// Resources Used://// Floating-Point Registers: f8 (Input and Return Value)// f34-f82//// General Purpose Registers:// r32-r56// r53-r56 (Used to pass arguments to error handling routine)//// Predicate Registers: p6-p13////*********************************************************************//// IEEE Special Conditions://// Denormal fault raised on denormal inputs// Overflow exceptions cannot occur // Underflow exceptions raised when appropriate for log1p // Inexact raised when appropriate by algorithm//// log1pl(inf) = inf// log1pl(-inf) = QNaN // log1pl(+/-0) = +/-0 // log1pl(-1) = -inf // log1pl(SNaN) = QNaN// log1pl(QNaN) = QNaN// log1pl(EM_special Values) = QNaN////*********************************************************************//// Overview//// The method consists of three cases.//// If |X| < 2^(-80) use case log1p_small;// else |X| < 2^(-7) use case log_near1;// else use case log_regular;//// Case log1p_small://// log1pl( X ) = logl( X+1 ) can be approximated by X//// Case log_near1://// log1pl( X ) = log( X+1 ) can be approximated by a simple polynomial// in W = X. This polynomial resembles the truncated Taylor// series W - W^/2 + W^3/3 - ...// // Case log_regular://// Here we use a table lookup method. The basic idea is that in// order to compute logl(Arg) = log1pl (Arg-1) for an argument Arg in [1,2), // we construct a value G such that G*Arg is close to 1 and that// logl(1/G) is obtainable easily from a table of values calculated// beforehand. Thus//// logl(Arg) = logl(1/G) + logl(G*Arg)// = logl(1/G) + logl(1 + (G*Arg - 1))//// Because |G*Arg - 1| is small, the second term on the right hand// side can be approximated by a short polynomial. We elaborate// this method in four steps.//// Step 0: Initialization//// We need to calculate logl( X+1 ). Obtain N, S_hi such that//// X+1 = 2^N * ( S_hi + S_lo ) exactly//// where S_hi in [1,2) and S_lo is a correction to S_hi in the sense// that |S_lo| <= ulp(S_hi).//// Step 1: Argument Reduction//// Based on S_hi, obtain G_1, G_2, G_3 from a table and calculate//// G := G_1 * G_2 * G_3// r := (G * S_hi - 1) + G * S_lo//// These G_j's have the property that the product is exactly // representable and that |r| < 2^(-12) as a result.//// Step 2: Approximation////// logl(1 + r) is approximated by a short polynomial poly(r).//// Step 3: Reconstruction////// Finally, log1pl( X ) = logl( X+1 ) is given by//// logl( X+1 ) = logl( 2^N * (S_hi + S_lo) )// ~=~ N*logl(2) + logl(1/G) + logl(1 + r)// ~=~ N*logl(2) + logl(1/G) + poly(r).//// **** Algorithm ****//// Case log1p_small://// Although log1pl(X) is basically X, we would like to preserve the inexactness// nature as well as consistent behavior under different rounding modes.// We can do this by computing the result as // // log1pl(X) = X - X*X////// Case log_near1://// Here we compute a simple polynomial. To exploit parallelism, we split// the polynomial into two portions.// // W := X// Wsq := W * W// W4 := Wsq*Wsq// W6 := W4*Wsq// Y_hi := W + Wsq*(P_1 + W*(P_2 + W*(P_3 + W*P_4))// Y_lo := W6*(P_5 + W*(P_6 + W*(P_7 + W*P_8)))//// Case log_regular://// We present the algorithm in four steps.//// Step 0. Initialization// ----------------------//// Z := X + 1// N := unbaised exponent of Z// S_hi := 2^(-N) * Z// S_lo := 2^(-N) * { (max(X,1)-Z) + min(X,1) }//// Step 1. Argument Reduction// --------------------------//// Let//// Z = 2^N * S_hi = 2^N * 1.d_1 d_2 d_3 ... d_63//// We obtain G_1, G_2, G_3 by the following steps.////// Define X_0 := 1.d_1 d_2 ... d_14. This is extracted// from S_hi.//// Define A_1 := 1.d_1 d_2 d_3 d_4. This is X_0 truncated// to lsb = 2^(-4).//// Define index_1 := [ d_1 d_2 d_3 d_4 ].//// Fetch Z_1 := (1/A_1) rounded UP in fixed point with// fixed point lsb = 2^(-15).// Z_1 looks like z_0.z_1 z_2 ... z_15// Note that the fetching is done using index_1.// A_1 is actually not needed in the implementation// and is used here only to explain how is the value// Z_1 defined.//// Fetch G_1 := (1/A_1) truncated to 21 sig. bits.// floating pt. Again, fetching is done using index_1. A_1// explains how G_1 is defined.//// Calculate X_1 := X_0 * Z_1 truncated to lsb = 2^(-14)// = 1.0 0 0 0 d_5 ... d_14// This is accomplised by integer multiplication.// It is proved that X_1 indeed always begin// with 1.0000 in fixed point.////// Define A_2 := 1.0 0 0 0 d_5 d_6 d_7 d_8. This is X_1 // truncated to lsb = 2^(-8). Similar to A_1,// A_2 is not needed in actual implementation. It// helps explain how some of the values are defined.//// Define index_2 := [ d_5 d_6 d_7 d_8 ].//// Fetch Z_2 := (1/A_2) rounded UP in fixed point with// fixed point lsb = 2^(-15). Fetch done using index_2.// Z_2 looks like z_0.z_1 z_2 ... z_15//// Fetch G_2 := (1/A_2) truncated to 21 sig. bits.// floating pt.//// Calculate X_2 := X_1 * Z_2 truncated to lsb = 2^(-14)// = 1.0 0 0 0 0 0 0 0 d_9 d_10 ... d_14// This is accomplised by integer multiplication.// It is proved that X_2 indeed always begin// with 1.00000000 in fixed point.////// Define A_3 := 1.0 0 0 0 0 0 0 0 d_9 d_10 d_11 d_12 d_13 1.// This is 2^(-14) + X_2 truncated to lsb = 2^(-13).//// Define index_3 := [ d_9 d_10 d_11 d_12 d_13 ].//// Fetch G_3 := (1/A_3) truncated to 21 sig. bits.// floating pt. Fetch is done using index_3.//// Compute G := G_1 * G_2 * G_3. //// This is done exactly since each of G_j only has 21 sig. bits.//// Compute //// r := (G*S_hi - 1) + G*S_lo using 2 FMA operations.//// Thus r approximates G*(S_hi + S_lo) - 1 to within a couple of// rounding errors.////// Step 2. Approximation// ---------------------//// This step computes an approximation to logl( 1 + r ) where r is the// reduced argument just obtained. It is proved that |r| <= 1.9*2^(-13);// thus logl(1+r) can be approximated by a short polynomial://// logl(1+r) ~=~ poly = r + Q1 r^2 + ... + Q4 r^5////// Step 3. Reconstruction// ----------------------//// This step computes the desired result of logl(X+1)://// logl(X+1) = logl( 2^N * (S_hi + S_lo) )// = N*logl(2) + logl( S_hi + S_lo) )// = N*logl(2) + logl(1/G) +// logl(1 + G * ( S_hi + S_lo ) - 1 )//// logl(2), logl(1/G_j) are stored as pairs of (single,double) numbers:// log2_hi, log2_lo, log1byGj_hi, log1byGj_lo. The high parts are// single-precision numbers and the low parts are double precision// numbers. These have the property that//// N*log2_hi + SUM ( log1byGj_hi )//// is computable exactly in double-extended precision (64 sig. bits).// Finally//// Y_hi := N*log2_hi + SUM ( log1byGj_hi )// Y_lo := poly_hi + [ poly_lo + // ( SUM ( log1byGj_lo ) + N*log2_lo ) ]//RODATA.align 64// ************* DO NOT CHANGE THE ORDER OF THESE TABLES *************// P_8, P_7, P_6, P_5, P_4, P_3, P_2, and P_1 LOCAL_OBJECT_START(Constants_P)//data4 0xEFD62B15,0xE3936754,0x00003FFB,0x00000000//data4 0xA5E56381,0x8003B271,0x0000BFFC,0x00000000//data4 0x73282DB0,0x9249248C,0x00003FFC,0x00000000//data4 0x47305052,0xAAAAAA9F,0x0000BFFC,0x00000000//data4 0xCCD17FC9,0xCCCCCCCC,0x00003FFC,0x00000000//data4 0x00067ED5,0x80000000,0x0000BFFD,0x00000000//data4 0xAAAAAAAA,0xAAAAAAAA,0x00003FFD,0x00000000//data4 0xFFFFFFFE,0xFFFFFFFF,0x0000BFFD,0x00000000data8 0xE3936754EFD62B15,0x00003FFBdata8 0x8003B271A5E56381,0x0000BFFCdata8 0x9249248C73282DB0,0x00003FFCdata8 0xAAAAAA9F47305052,0x0000BFFCdata8 0xCCCCCCCCCCD17FC9,0x00003FFCdata8 0x8000000000067ED5,0x0000BFFDdata8 0xAAAAAAAAAAAAAAAA,0x00003FFDdata8 0xFFFFFFFFFFFFFFFE,0x0000BFFDLOCAL_OBJECT_END(Constants_P)// log2_hi, log2_lo, Q_4, Q_3, Q_2, and Q_1 LOCAL_OBJECT_START(Constants_Q)//data4 0x00000000,0xB1721800,0x00003FFE,0x00000000 //data4 0x4361C4C6,0x82E30865,0x0000BFE2,0x00000000//data4 0x328833CB,0xCCCCCAF2,0x00003FFC,0x00000000//data4 0xA9D4BAFB,0x80000077,0x0000BFFD,0x00000000//data4 0xAAABE3D2,0xAAAAAAAA,0x00003FFD,0x00000000//data4 0xFFFFDAB7,0xFFFFFFFF,0x0000BFFD,0x00000000 data8 0xB172180000000000,0x00003FFEdata8 0x82E308654361C4C6,0x0000BFE2data8 0xCCCCCAF2328833CB,0x00003FFCdata8 0x80000077A9D4BAFB,0x0000BFFDdata8 0xAAAAAAAAAAABE3D2,0x00003FFDdata8 0xFFFFFFFFFFFFDAB7,0x0000BFFDLOCAL_OBJECT_END(Constants_Q)// 1/ln10_hi, 1/ln10_loLOCAL_OBJECT_START(Constants_1_by_LN10)//data4 0x37287195,0xDE5BD8A9,0x00003FFD,0x00000000//data4 0xACCF70C8,0xD56EAABE,0x00003FBB,0x00000000data8 0xDE5BD8A937287195,0x00003FFDdata8 0xD56EAABEACCF70C8,0x00003FBBLOCAL_OBJECT_END(Constants_1_by_LN10)// Z1 - 16 bit fixed LOCAL_OBJECT_START(Constants_Z_1)data4 0x00008000data4 0x00007879data4 0x000071C8data4 0x00006BCBdata4 0x00006667data4 0x00006187data4 0x00005D18data4 0x0000590Cdata4 0x00005556data4 0x000051ECdata4 0x00004EC5data4 0x00004BDBdata4 0x00004925data4 0x0000469Fdata4 0x00004445data4 0x00004211LOCAL_OBJECT_END(Constants_Z_1)// G1 and H1 - IEEE single and h1 - IEEE doubleLOCAL_OBJECT_START(Constants_G_H_h1)data4 0x3F800000,0x00000000data8 0x0000000000000000data4 0x3F70F0F0,0x3D785196data8 0x3DA163A6617D741Cdata4 0x3F638E38,0x3DF13843data8 0x3E2C55E6CBD3D5BBdata4 0x3F579430,0x3E2FF9A0data8 0xBE3EB0BFD86EA5E7data4 0x3F4CCCC8,0x3E647FD6data8 0x3E2E6A8C86B12760data4 0x3F430C30,0x3E8B3AE7data8 0x3E47574C5C0739BAdata4 0x3F3A2E88,0x3EA30C68data8 0x3E20E30F13E8AF2Fdata4 0x3F321640,0x3EB9CEC8data8 0xBE42885BF2C630BDdata4 0x3F2AAAA8,0x3ECF9927data8 0x3E497F3497E577C6data4 0x3F23D708,0x3EE47FC5data8 0x3E3E6A6EA6B0A5AB
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -