📄 libm_lgammal.s
字号:
.file "libm_lgammal.s"// Copyright (c) 2002 - 2005, Intel Corporation// All rights reserved.//// Contributed 2002 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,INCLUDING,BUT NOT// LIMITED TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT,INDIRECT,INCIDENTAL,SPECIAL,// EXEMPLARY,OR CONSEQUENTIAL DAMAGES (INCLUDING,BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,DATA,OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY,WHETHER IN CONTRACT,STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE,EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code,and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History:// 03/28/02 Original version// 05/20/02 Cleaned up namespace and sf0 syntax// 08/21/02 Added support of SIGN(GAMMA(x)) calculation// 09/26/02 Algorithm description improved// 10/21/02 Now it returns SIGN(GAMMA(x))=-1 for negative zero// 02/10/03 Reordered header: .section, .global, .proc, .align// 03/31/05 Reformatted delimiters between data tables////*********************************************************************//// Function: __libm_lgammal(long double x, int* signgam, int szsigngam)// computes the principal value of the logarithm of the GAMMA function// of x. Signum of GAMMA(x) is stored to memory starting at the address// specified by the signgam.////*********************************************************************//// Resources Used://// Floating-Point Registers: f8 (Input and Return Value)// f9-f15// f32-f127//// General Purpose Registers:// r2, r3, r8-r11, r14-r31// r32-r65// r66-r69 (Used to pass arguments to error handling routine)//// Predicate Registers: p6-p15////*********************************************************************//// IEEE Special Conditions://// __libm_lgammal(+inf) = +inf// __libm_lgammal(-inf) = QNaN// __libm_lgammal(+/-0) = +inf// __libm_lgammal(x<0, x - integer) = QNaN// __libm_lgammal(SNaN) = QNaN// __libm_lgammal(QNaN) = QNaN////*********************************************************************//// ALGORITHM DESCRIPTION//// Below we suppose that there is log(z) function which takes an long// double argument and returns result as a pair of long double numbers// lnHi and lnLo (such that sum lnHi + lnLo provides ~80 correct bits// of significand). Algorithm description for such log(z) function// see below.// Also, it this algorithm description we use the following notational// conventions:// a) pair A = (Ahi, Alo) means number A represented as sum of Ahi and Alo// b) C = A + B = (Ahi, Alo) + (Bhi, Blo) means multi-precision addition.// The result would be C = (Chi, Clo). Notice, that Clo shouldn't be// equal to Alo + Blo// c) D = A*B = (Ahi, Alo)*(Bhi, Blo) = (Dhi, Dlo) multi-precisiion// multiplication.//// So, lgammal has the following computational paths:// 1) |x| < 0.5// P = A1*|x| + A2*|x|^2 + ... + A22*|x|^22// A1, A2, A3 represented as a sum of two double precision// numbers and multi-precision computations are used for 3 higher// terms of the polynomial. We get polynomial as a sum of two// double extended numbers: P = (Phi, Plo)// 1.1) x > 0// lgammal(x) = P - log(|x|) = (Phi, Plo) - (lnHi(|x|), lnLo(|x|))// 1.2) x < 0// lgammal(x) = -P - log(|x|) - log(sin(Pi*x)/(Pi*x))// P and log(|x|) are computed by the same way as in 1.1;// - log(sin(Pi*x)/(Pi*x)) is approximated by a polynomial Plnsin.// Plnsin:= fLnSin2*|x|^2 + fLnSin4*|x|^4 + ... + fLnSin36*|x|^36// The first coefficient of Plnsin is represented as sum of two// double precision numbers (fLnSin2, fLnSin2L). Multi-precision// computations for higher two terms of Plnsin are used.// So, the final result is reconstructed by the following formula// lgammal(x) = (-(Phi, Plo) - (lnHi(|x|), lnLo(|x|))) -// - (PlnsinHi,PlnsinLo)//// 2) 0.5 <= x < 0.75 -> t = x - 0.625// -0.75 < x <= -0.5 -> t = x + 0.625// 2.25 <= x < 4.0 -> t = x/2 - 1.5// 4.0 <= x < 8.0 -> t = x/4 - 1.5// -0.5 < x <= -0.40625 -> t = x + 0.5// -2.6005859375 < x <= -2.5 -> t = x + 2.5// 1.3125 <= x < 1.5625 -> t = x - LOC_MIN, where LOC_MIN is point in// which lgammal has local minimum. Exact// value can be found in the table below,// approximate value is ~1.46//// lgammal(x) is approximated by the polynomial of 25th degree: P25(t)// P25(t) = A0 + A1*t + ... + A25*t^25 = (Phi, Plo) + t^4*P21(t),// where// (Phi, Plo) is sum of four highest terms of the polynomial P25(t):// (Phi, Plo) = ((A0, A0L) + (A1, A1L)*t) + t^2 *((A2, A2L) + (A3, A3L)*t),// (Ai, AiL) - coefficients represented as pairs of DP numbers.//// P21(t) = (PolC(t)*t^8 + PolD(t))*t^8 + PolE(t),// where// PolC(t) = C21*t^5 + C20*t^4 + ... + C16,// C21 = A25, C20 = A24, ..., C16 = A20//// PolD(t) = D7*t^7 + D6*t^6 + ... + D0,// D7 = A19, D6 = A18, ..., D0 = A12//// PolE(t) = E7*t^7 + E6*t^6 + ... + E0,// E7 = A11, E6 = A10, ..., E0 = A4//// Cis and Dis are represented as double precision numbers,// Eis are represented as double extended numbers.//// 3) 0.75 <= x < 1.3125 -> t = x - 1.0// 1.5625 <= x < 2.25 -> t = x - 2.0// lgammal(x) is approximated by the polynomial of 25th degree: P25(t)// P25(t) = A1*t + ... + A25*t^25, and computations are carried out// by similar way as in the previous case//// 4) 10.0 < x <= Overflow Bound ("positive Sterling" range)// lgammal(x) is approximated using Sterling's formula:// lgammal(x) ~ ((x*(lnHi(x) - 1, lnLo(x))) - 0.5*(lnHi(x), lnLo(x))) +// + ((Chi, Clo) + S(1/x))// where// C = (Chi, Clo) - pair of double precision numbers representing constant// 0.5*ln(2*Pi);// S(1/x) = 1/x * (B2 + B4*(1/x)^2 + ... + B20*(1/x)^18), B2, ..., B20 are// Bernulli numbers. S is computed in native precision and then added to// Clo;// lnHi(x) - 1 is computed in native precision and the multiprecision// multiplication (x, 0) *(lnHi(x) - 1, lnLo(x)) is used.//// 5) -INF < x <= -2^63, any negative integer < 0// All numbers in this range are integers -> error handler is called//// 6) -2^63 < x <= -0.75 ("negative Sterling" range), x is "far" from root,// lgammal(-t) for positive t is approximated using the following formula:// lgammal(-t) = -lgammal(t)-log(t)-log(|dT|)+log(sin(Pi*|dT|)/(Pi*|dT|))// where dT = -t -round_to_nearest_integer(-t)// Last item is approximated by the same polynomial as described in 1.2.// We split the whole range into three subranges due to different ways of// approximation of the first terms.// 6.1) -2^63 < x < -6.0 ("negative Sterling" range)// lgammal(t) is approximated exactly as in #4. The only difference that// for -13.0 < x < -6.0 subrange instead of Bernulli numbers we use their// minimax approximation on this range.// log(t), log(|dT|) are approximated by the log routine mentioned above.// 6.2) -6.0 < x <= -0.75, |x + 1|> 2^(-7)// log(t), log(|dT|) are approximated by the log routine mentioned above,// lgammal(t) is approximated by polynomials of the 25th degree similar// to ones from #2. Arguments z of the polynomials are as follows// a) 0.75 <= t < 1.0 - 2^(-7), z = 2*t - 1.5// b) 1.0 - 2^(-7) < t < 2.0, z = t - 1.5// c) 2.0 < t < 3.0, z = t/2 - 1.5// d) 3.0 < t < 4.0, z = t/2 - 1.5. Notice, that range reduction is// the same as in case c) but the set of coefficients is different// e) 4.0 < t < 6.0, z = t/4 - 1.5// 6.3) |x + 1| <= 2^(-7)// log(1 + (x-1)) is approximated by Taylor series,// log(sin(Pi*|dT|)/(Pi*|dT|)) is still approximated by polynomial but// it has just 4th degree.// log(|dT|) is approximated by the log routine mentioned above.// lgammal(-x) is approximated by polynomial of 8th degree from (-x + 1).//// 7) -20.0 < x < -2.0, x falls in root "neighbourhood".// "Neighbourhood" means that |lgammal(x)| < epsilon, where epsilon is// different for every root (and it is stored in the table), but typically// it is ~ 0.15. There are 35 roots significant from "double extended"// point of view. We split all the roots into two subsets: "left" and "right"// roots. Considering [-(N+1), -N] range we call root as "left" one if it// lies closer to -(N+1) and "right" otherwise. There is no "left" root in// the [-20, -19] range (it exists, but is insignificant for double extended// precision). To determine if x falls in root "neighbourhood" we store// significands of all the 35 roots as well as epsilon values (expressed// by the left and right bound).// In these ranges we approximate lgammal(x) by polynomial series of 19th// degree:// lgammal(x) = P19(t) = A0 + A1*t + ...+ A19*t^19, where t = x - EDP_Root,// EDP_Root is the exact value of the corresponding root rounded to double// extended precision. So, we have 35 different polynomials which make our// table rather big. We may hope that x falls in root "neighbourhood"// quite rarely -> ther might be no need in frequent use of different// polynomials.// A0, A1, A2, A3 are represented as pairs of double precision numbers,// A4, A5 are long doubles, and to decrease the size of the table we// keep the rest of coefficients in just double precision////*********************************************************************// Algorithm for log(X) = (lnHi(X), lnLo(X))//// ALGORITHM//// Here we use a table lookup method. The basic idea is that in// order to compute logl(Arg) for an argument Arg in [1,2), we// construct a value G such that G*Arg is close to 1 and that// logl(1/G) is obtainable easily from a table of values calculated// beforehand. Thus//// logl(Arg) = logl(1/G) + logl(G*Arg)// = logl(1/G) + logl(1 + (G*Arg - 1))//// Because |G*Arg - 1| is small, the second term on the right hand// side can be approximated by a short polynomial. We elaborate// this method in four steps.//// Step 0: Initialization//// We need to calculate logl( X ). Obtain N, S_hi such that//// X = 2^N * S_hi exactly//// where S_hi in [1,2)//// Step 1: Argument Reduction//// Based on S_hi, obtain G_1, G_2, G_3 from a table and calculate//// G := G_1 * G_2 * G_3// r := (G * S_hi - 1)//// These G_j's have the property that the product is exactly// representable and that |r| < 2^(-12) as a result.//// Step 2: Approximation////// logl(1 + r) is approximated by a short polynomial poly(r).//// Step 3: Reconstruction////// Finally, logl( X ) is given by//// logl( X ) = logl( 2^N * S_hi )// ~=~ N*logl(2) + logl(1/G) + logl(1 + r)// ~=~ N*logl(2) + logl(1/G) + poly(r).//// IMPLEMENTATION//// Step 0. Initialization// ----------------------//// Z := X// N := unbaised exponent of Z// S_hi := 2^(-N) * Z//// Step 1. Argument Reduction// --------------------------//// Let//// Z = 2^N * S_hi = 2^N * 1.d_1 d_2 d_3 ... d_63//// We obtain G_1, G_2, G_3 by the following steps.////// Define X_0 := 1.d_1 d_2 ... d_14. This is extracted// from S_hi.//// Define A_1 := 1.d_1 d_2 d_3 d_4. This is X_0 truncated// to lsb = 2^(-4).//// Define index_1 := [ d_1 d_2 d_3 d_4 ].//// Fetch Z_1 := (1/A_1) rounded UP in fixed point with// fixed point lsb = 2^(-15).// Z_1 looks like z_0.z_1 z_2 ... z_15// Note that the fetching is done using index_1.// A_1 is actually not needed in the implementation// and is used here only to explain how is the value// Z_1 defined.//// Fetch G_1 := (1/A_1) truncated to 21 sig. bits.// floating pt. Again, fetching is done using index_1. A_1// explains how G_1 is defined.//// Calculate X_1 := X_0 * Z_1 truncated to lsb = 2^(-14)// = 1.0 0 0 0 d_5 ... d_14
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -