📄 s_erfcl.s
字号:
.file "erfcl.s"// Copyright (c) 2001 - 2005, Intel Corporation// All rights reserved.//// Contributed 2001 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//// History//==============================================================// 11/12/01 Initial version// 02/08/02 Added missing }// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;// used data8 for long double table values// 03/31/05 Reformatted delimiters between data tables//// API//==============================================================// long double erfcl(long double)//// Implementation and Algorithm Notes://==============================================================// 1. 0 <= x <= 107.0// // erfcl(x) ~=~ P15(z) * expl( -x^2 )/(dx + x), z = x - xc(i).//// Comment://// Let x(i) = -1.0 + 2^(i/4),i=0,...27. So we have 28 unequal// argument intervals [x(i),x(i+1)] with length ratio q = 2^(1/4).// Values xc(i) we have in the table erfc_xc_table,xc(i)=x(i)for i = 0// and xc(i)= 0.5*( x(i)+x(i+1) ) for i>0.// // Let x(i)<= x < x(i+1).// We can find i as exponent of number (x + 1)^4.// // Let P15(z)= a0+ a1*z +..+a15*z^15 - polynomial approximation of degree 15// for function erfcl(z+xc(i)) * expl( (z+xc(i))^2)* (dx+z+xc(i)) and // -0.5*[x(i+1)-x(i)] <= z <= 0.5*[x(i+1)-x(i)].//// Let Q(z)= (P(z)- S)/S, S = a0, rounded to 16 bits.// Polynomial coeffitients for Q(z) we have in the table erfc_Q_table as// long double values//// We use multi precision to calculate input argument -x^2 for expl and // for u = 1/(dx + x). //// Algorithm description for expl function see below. In accordance with// denotation of this algorithm we have for expl://// expl(X) ~=~ 2^K*T_1*(1+W_1)*T_2*(1+W_2)*(1+ poly(r)), X = -x^2. //// Final calculations for erfcl:// // erfcl(x) ~=~//// 2^K*T_1*(1+W_1)*T_2*(1+W_2)*(1+ poly(r))*(1-dy)*S*(1+Q(z))*u*(1+du),//// where dy - low bits of x^2 and u, u*du - hi and low bits of 1/(dx + x).//// The order of calculations is the next://// 1) M = 2^K*T_1*T_2*S without rounding error,// 2) W = W_1 + (W_2 + W_1*W_2), where 1+W ~=~ (1+W_1)(1+W_2),// 3) H = W - dy, where 1+H ~=~ (1+W )(1-dy),// 4) R = poly(r)*H + poly(r), // 5) R = H + R , where 1+R ~=~ (1+H )(1+poly(r)),// 6) G = Q(z)*R + Q(z),// 7) R1 = R + du, where 1+R1 ~=~ (1+R)(1+du),// 8) G1 = R1 + G, where 1+G1 ~=~ (1+R1)(1+Q(z)),// 9) V = G1*M*u,// 10) erfcl(x) ~=~ M*u + V // // 2. -6.5 <= x < 0//// erfcl(x) = 2.0 - erfl(-x)//// 3. x > 107.0// erfcl(x) ~=~ 0.0 //// 4. x < -6.5 // erfcl(x) ~=~ 2.0 // Special values //==============================================================// erfcl(+0) = 1.0// erfcl(-0) = 1.0// erfcl(+qnan) = +qnan // erfcl(-qnan) = -qnan // erfcl(+snan) = +qnan // erfcl(-snan) = -qnan // erfcl(-inf) = 2.0 // erfcl(+inf) = +0//==============================================================// Algorithm description of used expl function.//// Implementation and Algorithm Notes://// ker_exp_64( in_FR : X,// out_FR : Y_hi,// out_FR : Y_lo,// out_FR : scale,// out_PR : Safe )//// On input, X is in register format//// On output, //// scale*(Y_hi + Y_lo) approximates exp(X)//// The accuracy is sufficient for a highly accurate 64 sig.// bit implementation. Safe is set if there is no danger of // overflow/underflow when the result is composed from scale, // Y_hi and Y_lo. Thus, we can have a fast return if Safe is set. // Otherwise, one must prepare to handle the possible exception // appropriately. Note that SAFE not set (false) does not mean // that overflow/underflow will occur; only the setting of SAFE// guarantees the opposite.//// **** High Level Overview **** //// The method consists of three cases.// // If |X| < Tiny use case exp_tiny;// else if |X| < 2^(-6) use case exp_small;// else use case exp_regular;//// Case exp_tiny://// 1 + X can be used to approximate exp(X) // X + X^2/2 can be used to approximate exp(X) - 1//// Case exp_small://// Here, exp(X) and exp(X) - 1 can all be // appproximated by a relatively simple polynomial.//// This polynomial resembles the truncated Taylor series//// exp(w) = 1 + w + w^2/2! + w^3/3! + ... + w^n/n!//// Case exp_regular://// Here we use a table lookup method. The basic idea is that in// order to compute exp(X), we accurately decompose X into//// X = N * log(2)/(2^12) + r, |r| <= log(2)/2^13.//// Hence//// exp(X) = 2^( N / 2^12 ) * exp(r).//// The value 2^( N / 2^12 ) is obtained by simple combinations// of values calculated beforehand and stored in table; exp(r)// is approximated by a short polynomial because |r| is small.//// We elaborate this method in 4 steps.//// Step 1: Reduction//// The value 2^12/log(2) is stored as a double-extended number// L_Inv.//// N := round_to_nearest_integer( X * L_Inv )//// The value log(2)/2^12 is stored as two numbers L_hi and L_lo so// that r can be computed accurately via//// r := (X - N*L_hi) - N*L_lo//// We pick L_hi such that N*L_hi is representable in 64 sig. bits// and thus the FMA X - N*L_hi is error free. So r is the // 1 rounding error from an exact reduction with respect to // // L_hi + L_lo.//// In particular, L_hi has 30 significant bit and can be stored// as a double-precision number; L_lo has 64 significant bits and// stored as a double-extended number.//// Step 2: Approximation//// exp(r) - 1 is approximated by a short polynomial of the form// // r + A_1 r^2 + A_2 r^3 + A_3 r^4 .//// Step 3: Composition from Table Values //// The value 2^( N / 2^12 ) can be composed from a couple of tables// of precalculated values. First, express N as three integers// K, M_1, and M_2 as//// N = K * 2^12 + M_1 * 2^6 + M_2//// Where 0 <= M_1, M_2 < 2^6; and K can be positive or negative.// When N is represented in 2's complement, M_2 is simply the 6// lsb's, M_1 is the next 6, and K is simply N shifted right// arithmetically (sign extended) by 12 bits.//// Now, 2^( N / 2^12 ) is simply // // 2^K * 2^( M_1 / 2^6 ) * 2^( M_2 / 2^12 )//// Clearly, 2^K needs no tabulation. The other two values are less// trivial because if we store each accurately to more than working// precision, than its product is too expensive to calculate. We// use the following method.//// Define two mathematical values, delta_1 and delta_2, implicitly// such that//// T_1 = exp( [M_1 log(2)/2^6] - delta_1 ) // T_2 = exp( [M_2 log(2)/2^12] - delta_2 )//// are representable as 24 significant bits. To illustrate the idea,// we show how we define delta_1: //// T_1 := round_to_24_bits( exp( M_1 log(2)/2^6 ) )// delta_1 = (M_1 log(2)/2^6) - log( T_1 ) //// The last equality means mathematical equality. We then tabulate//// W_1 := exp(delta_1) - 1// W_2 := exp(delta_2) - 1//// Both in double precision.//// From the tabulated values T_1, T_2, W_1, W_2, we compose the values// T and W via//// T := T_1 * T_2 ...exactly// W := W_1 + (1 + W_1)*W_2 //// W approximates exp( delta ) - 1 where delta = delta_1 + delta_2.// The mathematical product of T and (W+1) is an accurate representation// of 2^(M_1/2^6) * 2^(M_2/2^12).//// Step 4. Reconstruction//// Finally, we can reconstruct exp(X), exp(X) - 1. // Because//// X = K * log(2) + (M_1*log(2)/2^6 - delta_1) // + (M_2*log(2)/2^12 - delta_2)// + delta_1 + delta_2 + r ...accurately// We have//// exp(X) ~=~ 2^K * ( T + T*[exp(delta_1+delta_2+r) - 1] )
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -