⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 libm_lgammal.s

📁 glibc 2.9,最新版的C语言库函数
💻 S
📖 第 1 页 / 共 5 页
字号:
.file "libm_lgammal.s"// Copyright (c) 2002 - 2005, Intel Corporation// All rights reserved.//// Contributed 2002 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,INCLUDING,BUT NOT// LIMITED TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT,INDIRECT,INCIDENTAL,SPECIAL,// EXEMPLARY,OR CONSEQUENTIAL DAMAGES (INCLUDING,BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,DATA,OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY,WHETHER IN CONTRACT,STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE,EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code,and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History:// 03/28/02  Original version// 05/20/02  Cleaned up namespace and sf0 syntax// 08/21/02  Added support of SIGN(GAMMA(x)) calculation// 09/26/02  Algorithm description improved// 10/21/02  Now it returns SIGN(GAMMA(x))=-1 for negative zero// 02/10/03  Reordered header: .section, .global, .proc, .align// 03/31/05  Reformatted delimiters between data tables////*********************************************************************//// Function: __libm_lgammal(long double x, int* signgam, int szsigngam)// computes the principal value of the logarithm of the GAMMA function// of x. Signum of GAMMA(x) is stored to memory starting at the address// specified by the signgam.////*********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input and Return Value)//                              f9-f15//                              f32-f127////    General Purpose Registers://      r2, r3, r8-r11, r14-r31//      r32-r65//      r66-r69 (Used to pass arguments to error handling routine)////    Predicate Registers:      p6-p15////*********************************************************************//// IEEE Special Conditions:////    __libm_lgammal(+inf) = +inf//    __libm_lgammal(-inf) = QNaN//    __libm_lgammal(+/-0) = +inf//    __libm_lgammal(x<0, x - integer) = QNaN//    __libm_lgammal(SNaN) = QNaN//    __libm_lgammal(QNaN) = QNaN////*********************************************************************//// ALGORITHM DESCRIPTION//// Below we suppose that there is log(z) function which takes an long// double argument and returns result as a pair of long double numbers// lnHi and lnLo (such that sum lnHi + lnLo provides ~80 correct bits// of significand). Algorithm description for such log(z) function// see below.// Also, it this algorithm description we use the following notational// conventions:// a) pair A = (Ahi, Alo) means number A represented as sum of Ahi and Alo// b) C = A + B = (Ahi, Alo) + (Bhi, Blo) means multi-precision addition.//    The result would be C = (Chi, Clo). Notice, that Clo shouldn't be//    equal to Alo + Blo// c) D = A*B = (Ahi, Alo)*(Bhi, Blo) = (Dhi, Dlo) multi-precisiion//    multiplication.//// So, lgammal has the following computational paths:// 1) |x| < 0.5//    P = A1*|x| + A2*|x|^2 + ... + A22*|x|^22//    A1, A2, A3 represented as a sum of two double precision//    numbers and multi-precision computations are used for 3 higher//    terms of the polynomial. We get polynomial as a sum of two//    double extended numbers: P = (Phi, Plo)//    1.1) x > 0//         lgammal(x) = P - log(|x|) = (Phi, Plo) - (lnHi(|x|), lnLo(|x|))//    1.2) x < 0//         lgammal(x) = -P - log(|x|) - log(sin(Pi*x)/(Pi*x))//         P and log(|x|) are computed by the same way as in 1.1;//         - log(sin(Pi*x)/(Pi*x)) is approximated by a polynomial Plnsin.//         Plnsin:= fLnSin2*|x|^2 + fLnSin4*|x|^4 + ... + fLnSin36*|x|^36//         The first coefficient of Plnsin is represented as sum of two//         double precision numbers (fLnSin2, fLnSin2L). Multi-precision//         computations for higher two terms of Plnsin are used.//         So, the final result is reconstructed by the following formula//         lgammal(x) = (-(Phi, Plo) - (lnHi(|x|), lnLo(|x|))) -//                      - (PlnsinHi,PlnsinLo)//// 2)    0.5 <= x <   0.75  -> t = x - 0.625//     -0.75 <  x <= -0.5   -> t = x + 0.625//      2.25 <= x <   4.0   -> t = x/2 - 1.5//       4.0 <= x <   8.0   -> t = x/4 - 1.5//       -0.5 < x <= -0.40625 -> t = x + 0.5//       -2.6005859375 < x <= -2.5 -> t = x + 2.5//       1.3125 <= x < 1.5625 -> t = x - LOC_MIN, where LOC_MIN is point in//                                   which lgammal has local minimum. Exact//                                   value can be found in the table below,//                                   approximate value is ~1.46////    lgammal(x) is approximated by the polynomial of 25th degree: P25(t)//    P25(t) = A0 + A1*t + ... + A25*t^25 = (Phi, Plo) + t^4*P21(t),//    where//    (Phi, Plo) is sum of four highest terms of the polynomial P25(t)://    (Phi, Plo) = ((A0, A0L) + (A1, A1L)*t) + t^2 *((A2, A2L) + (A3, A3L)*t),//    (Ai, AiL) - coefficients represented as pairs of DP numbers.////    P21(t) = (PolC(t)*t^8 + PolD(t))*t^8 + PolE(t),//    where//    PolC(t) = C21*t^5 + C20*t^4 + ... + C16,//    C21 = A25, C20 = A24, ..., C16 = A20////    PolD(t) = D7*t^7 + D6*t^6 + ... + D0,//    D7 = A19, D6 = A18, ..., D0 = A12////    PolE(t) = E7*t^7 + E6*t^6 + ... + E0,//    E7 = A11, E6 = A10, ..., E0 = A4////    Cis and Dis are represented as double precision numbers,//    Eis are represented as double extended numbers.//// 3) 0.75 <=  x < 1.3125   -> t = x - 1.0//    1.5625 <= x < 2.25   -> t = x - 2.0//    lgammal(x) is approximated by the polynomial of 25th degree: P25(t)//    P25(t) = A1*t + ... + A25*t^25, and computations are carried out//    by similar way as in the previous case//// 4) 10.0 < x <= Overflow Bound ("positive Sterling" range)//    lgammal(x) is approximated using Sterling's formula://    lgammal(x) ~ ((x*(lnHi(x) - 1, lnLo(x))) - 0.5*(lnHi(x), lnLo(x))) +//                 + ((Chi, Clo) + S(1/x))//    where//    C = (Chi, Clo) - pair of double precision numbers representing constant//    0.5*ln(2*Pi);//    S(1/x) = 1/x * (B2 + B4*(1/x)^2 + ... + B20*(1/x)^18), B2, ..., B20 are//    Bernulli numbers. S is computed in native precision and then added to//    Clo;//    lnHi(x) - 1 is computed in native precision and the multiprecision//    multiplication (x, 0) *(lnHi(x) - 1, lnLo(x)) is used.//// 5) -INF < x <= -2^63, any negative integer < 0//    All numbers in this range are integers -> error handler is called//// 6) -2^63 < x <= -0.75 ("negative Sterling" range), x is "far" from root,//    lgammal(-t) for positive t is approximated using the following formula://    lgammal(-t) = -lgammal(t)-log(t)-log(|dT|)+log(sin(Pi*|dT|)/(Pi*|dT|))//        where dT = -t -round_to_nearest_integer(-t)//    Last item is approximated by the same polynomial as described in 1.2.//    We split the whole range into three subranges due to different ways of//    approximation of the first terms.//    6.1) -2^63 < x < -6.0 ("negative Sterling" range)//       lgammal(t) is approximated exactly as in #4. The only difference that//       for -13.0 < x < -6.0 subrange instead of Bernulli numbers we use their//       minimax approximation on this range.//       log(t), log(|dT|) are approximated by the log routine mentioned above.//    6.2) -6.0 < x <= -0.75, |x + 1|> 2^(-7)//       log(t), log(|dT|) are approximated by the log routine mentioned above,//       lgammal(t) is approximated by polynomials of the 25th degree similar//       to ones from #2. Arguments z of the polynomials are as follows//       a) 0.75 <= t < 1.0 - 2^(-7),  z = 2*t - 1.5//       b) 1.0 - 2^(-7)  < t < 2.0,   z = t - 1.5//       c) 2.0  < t < 3.0,   z = t/2 - 1.5//       d) 3.0  < t < 4.0,   z = t/2 - 1.5. Notice, that range reduction is//          the same as in case c) but the set of coefficients is different//       e) 4.0  < t < 6.0,   z = t/4 - 1.5//    6.3) |x + 1| <= 2^(-7)//       log(1 + (x-1)) is approximated by Taylor series,//       log(sin(Pi*|dT|)/(Pi*|dT|)) is still approximated by polynomial but//       it has just 4th degree.//       log(|dT|) is approximated by the log routine mentioned above.//       lgammal(-x) is approximated by polynomial of 8th degree from (-x + 1).//// 7) -20.0 < x < -2.0, x falls in root "neighbourhood".//    "Neighbourhood" means that |lgammal(x)| < epsilon, where epsilon is//    different for every root (and it is stored in the table), but typically//    it is ~ 0.15. There are 35 roots significant from "double extended"//    point of view. We split all the roots into two subsets: "left" and "right"//    roots. Considering [-(N+1), -N] range we call root as "left" one if it//    lies closer to -(N+1) and "right" otherwise. There is no "left" root in//    the [-20, -19] range (it exists, but is insignificant for double extended//    precision). To determine if x falls in root "neighbourhood" we store//    significands of all the 35 roots as well as epsilon values (expressed//    by the left and right bound).//    In these ranges we approximate lgammal(x) by polynomial series of 19th//    degree://    lgammal(x) = P19(t) = A0 + A1*t + ...+ A19*t^19, where t = x - EDP_Root,//    EDP_Root is the exact value of the corresponding root rounded to double//    extended precision. So, we have 35 different polynomials which make our//    table rather big. We may hope that x falls in root "neighbourhood"//    quite rarely -> ther might be no need in frequent use of different//    polynomials.//    A0, A1, A2, A3 are represented as pairs of double precision numbers,//    A4, A5 are long doubles, and to decrease the size of the table we//    keep the rest of coefficients in just double precision////*********************************************************************// Algorithm for log(X) = (lnHi(X), lnLo(X))////   ALGORITHM////   Here we use a table lookup method. The basic idea is that in//   order to compute logl(Arg) for an argument Arg in [1,2), we//   construct a value G such that G*Arg is close to 1 and that//   logl(1/G) is obtainable easily from a table of values calculated//   beforehand. Thus////      logl(Arg) = logl(1/G) + logl(G*Arg)//                = logl(1/G) + logl(1 + (G*Arg - 1))////   Because |G*Arg - 1| is small, the second term on the right hand//   side can be approximated by a short polynomial. We elaborate//   this method in four steps.////   Step 0: Initialization////   We need to calculate logl( X ). Obtain N, S_hi such that////      X = 2^N * S_hi exactly////   where S_hi in [1,2)////   Step 1: Argument Reduction////   Based on S_hi, obtain G_1, G_2, G_3 from a table and calculate////      G := G_1 * G_2 * G_3//      r := (G * S_hi - 1)////   These G_j's have the property that the product is exactly//   representable and that |r| < 2^(-12) as a result.////   Step 2: Approximation//////   logl(1 + r) is approximated by a short polynomial poly(r).////   Step 3: Reconstruction//////   Finally, logl( X ) is given by////   logl( X )   =   logl( 2^N * S_hi )//                 ~=~  N*logl(2) + logl(1/G) + logl(1 + r)//                 ~=~  N*logl(2) + logl(1/G) + poly(r).////   IMPLEMENTATION////   Step 0. Initialization//   ----------------------////   Z := X//   N := unbaised exponent of Z//   S_hi := 2^(-N) * Z////   Step 1. Argument Reduction//   --------------------------////   Let////      Z = 2^N * S_hi = 2^N * 1.d_1 d_2 d_3 ... d_63////   We obtain G_1, G_2, G_3 by the following steps.//////      Define          X_0 := 1.d_1 d_2 ... d_14. This is extracted//                      from S_hi.////      Define          A_1 := 1.d_1 d_2 d_3 d_4. This is X_0 truncated//                      to lsb = 2^(-4).////      Define          index_1 := [ d_1 d_2 d_3 d_4 ].////      Fetch           Z_1 := (1/A_1) rounded UP in fixed point with//      fixed point     lsb = 2^(-15).//                      Z_1 looks like z_0.z_1 z_2 ... z_15//                      Note that the fetching is done using index_1.//                      A_1 is actually not needed in the implementation//                      and is used here only to explain how is the value//                      Z_1 defined.////      Fetch           G_1 := (1/A_1) truncated to 21 sig. bits.//      floating pt.    Again, fetching is done using index_1. A_1//                      explains how G_1 is defined.////      Calculate       X_1 := X_0 * Z_1 truncated to lsb = 2^(-14)//                           = 1.0 0 0 0 d_5 ... d_14

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -