📄 s_erfcl.s

📁 glibc 库, 不仅可以学习使用库函数,还可以学习函数的具体实现,是提高功力的好资料
💻 S
📖 第 1 页 / 共 5 页
字号:
12 3 4 5 下一页
.file "erfcl.s"// Copyright (c) 2001 - 2005, Intel Corporation// All rights reserved.//// Contributed 2001 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//// History//==============================================================// 11/12/01  Initial version// 02/08/02  Added missing }// 05/20/02  Cleaned up namespace and sf0 syntax// 02/10/03  Reordered header: .section, .global, .proc, .align;//           used data8 for long double table values// 03/31/05  Reformatted delimiters between data tables//// API//==============================================================// long double erfcl(long double)//// Implementation and Algorithm Notes://==============================================================// 1. 0 <= x <= 107.0//    //    erfcl(x) ~=~ P15(z) * expl( -x^2 )/(dx + x), z = x - xc(i).////    Comment:////    Let x(i) = -1.0 + 2^(i/4),i=0,...27. So we have 28 unequal//    argument intervals [x(i),x(i+1)] with length ratio q = 2^(1/4).//    Values xc(i) we have in the table erfc_xc_table,xc(i)=x(i)for i = 0//    and xc(i)= 0.5*( x(i)+x(i+1) ) for i>0.// //    Let x(i)<= x < x(i+1).//    We can find i as exponent of number (x + 1)^4.// //    Let P15(z)= a0+ a1*z +..+a15*z^15 - polynomial approximation of degree 15//    for function      erfcl(z+xc(i)) * expl( (z+xc(i))^2)* (dx+z+xc(i)) and //    -0.5*[x(i+1)-x(i)] <= z <= 0.5*[x(i+1)-x(i)].////    Let  Q(z)= (P(z)- S)/S, S = a0, rounded to 16 bits.//    Polynomial coeffitients for Q(z) we have in the table erfc_Q_table as//    long double values////    We use multi precision to calculate input argument -x^2 for expl and //    for u = 1/(dx + x). ////    Algorithm description for expl function see below. In accordance with//    denotation of this algorithm we have for expl:////    expl(X) ~=~ 2^K*T_1*(1+W_1)*T_2*(1+W_2)*(1+ poly(r)), X = -x^2. ////    Final calculations for erfcl:// //    erfcl(x) ~=~////         2^K*T_1*(1+W_1)*T_2*(1+W_2)*(1+ poly(r))*(1-dy)*S*(1+Q(z))*u*(1+du),////    where dy - low bits of x^2 and u, u*du - hi and low bits of 1/(dx + x).////    The order of calculations is the next:////    1)  M = 2^K*T_1*T_2*S          without rounding error,//    2)  W = W_1 + (W_2 + W_1*W_2), where 1+W  ~=~ (1+W_1)(1+W_2),//    3)  H = W - dy,                where 1+H  ~=~ (1+W )(1-dy),//    4)  R = poly(r)*H + poly(r),    //    5)  R = H + R              ,   where 1+R  ~=~ (1+H )(1+poly(r)),//    6)  G = Q(z)*R + Q(z),//    7)  R1 = R + du,               where 1+R1 ~=~ (1+R)(1+du),//    8)  G1 = R1 + G,               where 1+G1 ~=~ (1+R1)(1+Q(z)),//    9)  V  = G1*M*u,//    10) erfcl(x) ~=~ M*u + V     //                     // 2. -6.5 <= x < 0////    erfcl(x)  = 2.0 - erfl(-x)//// 3. x > 107.0//    erfcl(x)  ~=~ 0.0                      //// 4. x < -6.5            //    erfcl(x)  ~=~ 2.0                      // Special values //==============================================================// erfcl(+0)    = 1.0// erfcl(-0)    = 1.0// erfcl(+qnan) = +qnan // erfcl(-qnan) = -qnan // erfcl(+snan) = +qnan // erfcl(-snan) = -qnan // erfcl(-inf)  = 2.0 // erfcl(+inf)  = +0//==============================================================// Algorithm description of used expl function.//// Implementation and Algorithm Notes:////  ker_exp_64( in_FR  : X,//            out_FR : Y_hi,//            out_FR : Y_lo,//            out_FR : scale,//            out_PR : Safe )//// On input, X is in register format//// On output, ////   scale*(Y_hi + Y_lo)  approximates  exp(X)//// The accuracy is sufficient for a highly accurate 64 sig.// bit implementation.  Safe is set if there is no danger of // overflow/underflow when the result is composed from scale, // Y_hi and Y_lo. Thus, we can have a fast return if Safe is set. // Otherwise, one must prepare to handle the possible exception // appropriately.  Note that SAFE not set (false) does not mean // that overflow/underflow will occur; only the setting of SAFE// guarantees the opposite.//// **** High Level Overview **** //// The method consists of three cases.// // If           |X| < Tiny  use case exp_tiny;// else if  |X| < 2^(-6)    use case exp_small;// else     use case exp_regular;//// Case exp_tiny:////   1 + X     can be used to approximate exp(X) //   X + X^2/2 can be used to approximate exp(X) - 1//// Case exp_small:////   Here, exp(X) and exp(X) - 1 can all be //   appproximated by a relatively simple polynomial.////   This polynomial resembles the truncated Taylor series////  exp(w) = 1 + w + w^2/2! + w^3/3! + ... + w^n/n!//// Case exp_regular:////   Here we use a table lookup method. The basic idea is that in//   order to compute exp(X), we accurately decompose X into////   X = N * log(2)/(2^12)  + r,    |r| <= log(2)/2^13.////   Hence////   exp(X) = 2^( N / 2^12 ) * exp(r).////   The value 2^( N / 2^12 ) is obtained by simple combinations//   of values calculated beforehand and stored in table; exp(r)//   is approximated by a short polynomial because |r| is small.////   We elaborate this method in 4 steps.////   Step 1: Reduction////   The value 2^12/log(2) is stored as a double-extended number//   L_Inv.////   N := round_to_nearest_integer( X * L_Inv )////   The value log(2)/2^12 is stored as two numbers L_hi and L_lo so//   that r can be computed accurately via////   r := (X - N*L_hi) - N*L_lo////   We pick L_hi such that N*L_hi is representable in 64 sig. bits//   and thus the FMA   X - N*L_hi   is error free. So r is the //   1 rounding error from an exact reduction with respect to //   //   L_hi + L_lo.////   In particular, L_hi has 30 significant bit and can be stored//   as a double-precision number; L_lo has 64 significant bits and//   stored as a double-extended number.////   Step 2: Approximation////   exp(r) - 1 is approximated by a short polynomial of the form//   //   r + A_1 r^2 + A_2 r^3 + A_3 r^4 .////   Step 3: Composition from Table Values ////   The value 2^( N / 2^12 ) can be composed from a couple of tables//   of precalculated values. First, express N as three integers//   K, M_1, and M_2 as////     N  =  K * 2^12  + M_1 * 2^6 + M_2////   Where 0 <= M_1, M_2 < 2^6; and K can be positive or negative.//   When N is represented in 2's complement, M_2 is simply the 6//   lsb's, M_1 is the next 6, and K is simply N shifted right//   arithmetically (sign extended) by 12 bits.////   Now, 2^( N / 2^12 ) is simply  //  //      2^K * 2^( M_1 / 2^6 ) * 2^( M_2 / 2^12 )////   Clearly, 2^K needs no tabulation. The other two values are less//   trivial because if we store each accurately to more than working//   precision, than its product is too expensive to calculate. We//   use the following method.////   Define two mathematical values, delta_1 and delta_2, implicitly//   such that////     T_1 = exp( [M_1 log(2)/2^6]  -  delta_1 ) //     T_2 = exp( [M_2 log(2)/2^12] -  delta_2 )////   are representable as 24 significant bits. To illustrate the idea,//   we show how we define delta_1: ////     T_1     := round_to_24_bits( exp( M_1 log(2)/2^6 ) )//     delta_1  = (M_1 log(2)/2^6) - log( T_1 )  ////   The last equality means mathematical equality. We then tabulate////     W_1 := exp(delta_1) - 1//     W_2 := exp(delta_2) - 1////   Both in double precision.////   From the tabulated values T_1, T_2, W_1, W_2, we compose the values//   T and W via////     T := T_1 * T_2           ...exactly//     W := W_1 + (1 + W_1)*W_2 ////   W approximates exp( delta ) - 1  where delta = delta_1 + delta_2.//   The mathematical product of T and (W+1) is an accurate representation//   of 2^(M_1/2^6) * 2^(M_2/2^12).////   Step 4. Reconstruction////   Finally, we can reconstruct exp(X), exp(X) - 1. //   Because////  X = K * log(2) + (M_1*log(2)/2^6  - delta_1) //             + (M_2*log(2)/2^12 - delta_2)//             + delta_1 + delta_2 + r      ...accurately//   We have////  exp(X) ~=~ 2^K * ( T + T*[exp(delta_1+delta_2+r) - 1] )
12 3 4 5 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -