s_tanhl.s

glibc 2.9, the latest version of the C language library functions
.file "tanhl.s"

// Copyright (c) 2001 - 2003, Intel Corporation
// All rights reserved.
//
// Contributed 2001 by the Intel Numerics Group, Intel Corporation
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// * The name of Intel Corporation may not be used to endorse or promote
// products derived from this software without specific prior written
// permission.
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Intel Corporation is the author of this code, and requests that all
// problem reports or change requests be submitted to it directly at
// http://www.intel.com/software/products/opensource/libraries/num.htm.
//
// History
//==============================================================
// 11/29/01  Initial version
// 05/20/02  Cleaned up namespace and sf0 syntax
// 08/14/02  Changed mli templates to mlx
// 02/10/03  Reordered header: .section, .global, .proc, .align
//
// API
//==============================================================
// long double tanhl(long double)
//
// Overview of operation
//==============================================================
//
// Algorithm description
// ---------------------
//
// There are 4 paths:
//
// 1. Special path: x = 0, Inf, NaNs, denormal
//    Return tanhl(x) = +/-0.0 for zeros
//    Return tanhl(x) = QNaN for NaNs
//    Return tanhl(x) = sign(x)*1.0 for Inf
//    Return tanhl(x) = x + x^2   for negative denormals
//    Return tanhl(x) = x - x^2   for positive denormals
//
// 2. [0;1/8] path: 0.0 < |x| < 1/8
//    Return tanhl(x) = x + x^3*A3 + ... + x^15*A15
//
// 3. Main path: 1/8 <= |x| < 22.8
//    For several ranges of 1/8 <= |x| < 22.8
//    Return tanhl(x) = sign(x)*((A0H+A0L) + y*(A1H+A1L) + y^2*(A2H+A2L)
//                               + y^3*A3 + y^4*A4 + ... + y^25*A25 )
//    where y = (|x|/a) - b
//
//    For each range there is a particular set of coefficients.
//    Below is the list of ranges:
//    1/8  <= |x| < 1/4     a = 0.125, b = 1.5
//    1/4  <= |x| < 1/2     a = 0.25,  b = 1.5
//    1/2  <= |x| < 1.0     a = 0.5,   b = 1.5
//    1.0  <= |x| < 2.0     a = 1.0,   b = 1.5
//    2.0  <= |x| < 3.25    a = 2.0,   b = 1.5
//    3.25 <= |x| < 4.0     a = 2.0,   b = 2.0
//    4.0  <= |x| < 6.5     a = 4.0,   b = 1.5
//    6.5  <= |x| < 8.0     a = 4.0,   b = 2.0
//    8.0  <= |x| < 13.0    a = 8.0,   b = 1.5
//    13.0 <= |x| < 16.0    a = 8.0,   b = 2.0
//    16.0 <= |x| < 22.8    a = 16.0,  b = 1.5
//    (The [3.25;4.0], [6.5;8.0] and [13.0;16.0] subranges are separated
//     to resolve monotonicity issues.)
//
// 4. Saturation path: 22.8 <= |x| < +INF
//    Return tanhl(x) = sign(x)*(1.0 - tiny_value)
//    (tiny_value ~ 1e-1233)
//
// Implementation notes
// --------------------
//
// 1. Special path: x = 0, INF, NaNs, denormals
//
//    This branch is separated out by a single fclass operation.
//    Zeros and NaNs, infinities, and denormals are then processed separately.
//    For denormals we use a simple fma operation: x+x*x for negative
//    denormals and x-x*x for positive ones.
//
// 2. [0;1/8] path: 0.0 < |x| < 1/8
//
//    Here we use a simple polynomial evaluation whose last step
//    is performed as x + x^3*A3+...
//    The rest of the polynomial is factorized using a binary tree technique.
//
// 3. Main path: 1/8 <= |x| < 22.8
//
//    Multiprecision arithmetic is needed only for the first few
//    polynomial terms (up to degree 3).
//    Here we use the same parallelization approach as above:
//    the whole polynomial is split into a first, "multiprecision" part
//    and a second, so-called "tail" part computed in native precision.
//
//    1) Multiprecision part:
//    [v1=(A0H+A0L)+y*(A1H+A1L)] + [v2=y^2*((A2H+A2L)+y*A3)]
//    The v1 and v2 terms are calculated in parallel.
//
//    2) Tail part:
//    v3 = y^4 * ( A4 + y*A5 + ... + y^21*A25 )
//    v3 is split into 2 equal parts (10 coefficients each).
//    These 2 parts are also factorized using the binary tree technique.
//
//    So the multiprecision and tail parts cost almost the same,
//    and both results are ready before the final summation.
//
//    Some tricks were applied to maintain symmetry in the directed
//    rounding modes (toward +/-inf): the result sign had to be set
//    not at the last operation but much earlier, and in several places.
//
// 4. Saturation path: 22.8 <= |x| < +INF
//
//    We use the formula sign(x)*(1.0 - tiny_value) instead of simply sign(x)*1.0
//    to meet IEEE requirements for the different rounding modes in this case.
//
// Registers used
//==============================================================
// Floating Point registers used:
// f8 - input & output
// f32 -> f92

// General registers used:
// r2, r3, r32 -> r52

// Predicate registers used:
// p0, p6 -> p11, p14, p15

// p6  - arg is zero, denormal or special IEEE
// p7  - arg is in [16;32] binary interval
// p8  - arg is in one of subranges
//         [3.25;4.0], [6.5;8.0], [13.0;16.0]
// p9  - arg < 1/8
// p10 - arg is NOT in one of subranges
//         [3.25;4.0], [6.5;8.0], [13.0;16.0]
// p11 - arg in saturation domain
// p14 - arg is positive
// p15 - arg is negative

// Assembly macros
//==============================================================
rDataPtr           = r2
rTailDataPtr       = r3
rBias              = r33
rSignBit           = r34
rInterval          = r35
rArgExp            = r36
rArgSig            = r37
r3p25Offset        = r38
r2to4              = r39
r1p25              = r40
rOffset            = r41
r1p5               = r42
rSaturation        = r43
r1625Sign          = r44
rTiny              = r45
rAddr1             = r46
rAddr2             = r47
rTailAddr1         = r48
rTailAddr2         = r49
rTailOffset        = r50
rTailAddOffset     = r51
rShiftedDataPtr    = r52
//==============================================================
fA0H               = f32
fA0L               = f33
fA1H               = f34
fA1L               = f35
fA2H               = f36
fA2L               = f37
fA3                = f38
fA4                = f39
fA5                = f40
fA6                = f41
fA7                = f42
fA8                = f43
fA9                = f44
fA10               = f45
fA11               = f46
fA12               = f47
fA13               = f48
fA14               = f49
fA15               = f50
fA16               = f51
fA17               = f52
fA18               = f53
fA19               = f54
fA20               = f55
fA21               = f56
fA22               = f57
fA23               = f58
fA24               = f59
fA25               = f60
fArgSqr            = f61
fArgCube           = f62
fArgFour           = f63
fArgEight          = f64
fArgAbsNorm        = f65
fArgAbsNorm2       = f66
fArgAbsNorm2L      = f67
fArgAbsNorm3       = f68
fArgAbsNorm4       = f69
fArgAbsNorm11      = f70
fRes               = f71
fResH              = f72
fResL              = f73
fRes1H             = f74
fRes1L             = f75
fRes1Hd            = f76
fRes2H             = f77
fRes2L             = f78
fRes3H             = f79
fRes3L             = f80
fRes4              = f81
fTT                = f82
fTH                = f83
fTL                = f84
fTT2               = f85
fTH2               = f86
fTL2               = f87
f1p5               = f88
f2p0               = f89
fTiny              = f90
fSignumX           = f91
fArgAbsNorm4X      = f92

// Data tables
//==============================================================
RODATA
.align 16

LOCAL_OBJECT_START(tanhl_data)
////////// Main tables ///////////
_0p125_to_0p25_data: // exp = 2^-3
// Polynomial coefficients for tanh(x), 1/8 <= |x| < 1/4
data8 0x93D27D6AE7E835F8, 0x0000BFF4 //A3 = -5.6389704216278164626050408239e-04
data8 0xBF66E8668A78A8BC //A2H = -2.7963640930198357253955165902e-03
data8 0xBBD5384EFD0E7A54 //A2L = -1.7974001252014762983581666453e-20
data8 0x3FBEE69E31DB6156 //A1H = 1.2070645062647619716322822114e-01
data8 0x3C43A0B4E24A3DCA //A1L = 2.1280460108882061756490131241e-18
data8 0x3FC7B8FF903BF776 //A0H = 1.8533319990813951205765874874e-01
data8 0x3C593F1A61986FD4 //A0L = 5.4744612262799573374268254539e-18
data8 0xDB9E6735560AAE5A, 0x0000BFA3 //A25 = -3.4649731131719154051239475238e-28
data8 0xF0DDE953E4327704, 0x00003FA4 //A24 = 7.6004173864565644629900702857e-28
data8 0x8532AED11DEC5612, 0x00003FAB //A23 = 5.3798235684551098715428515761e-26
data8 0xAEF72A34D88B0038, 0x0000BFAD //A22 = -2.8267199091484508912273222600e-25
data8 0x9645EF1DCB759DDD, 0x0000BFB2 //A21 = -7.7689413112830095709522203109e-24
data8 0xA5D12364E121F70F, 0x00003FB5 //A20 = 6.8580281614531622113161030550e-23
data8 0x9CF166EA815AC705, 0x00003FB9 //A19 = 1.0385615003184753213024737634e-21
data8 0x852B1D0252498752, 0x0000BFBD //A18 = -1.4099753997949827217635356478e-20
data8 0x9270F5716D25EC9F, 0x0000BFC0 //A17 = -1.2404055949090177751123473821e-19
data8 0xC216A9C4EEBDDDCA, 0x00003FC4 //A16 = 2.6303900460415782677749729120e-18
data8 0xDCE944D89FF592F2, 0x00003FC6 //A15 = 1.1975620514752377092265425941e-17
data8 0x83C8DDF213711381, 0x0000BFCC //A14 = -4.5721980583985311263109531319e-16
LOCAL_OBJECT_END(tanhl_data)

LOCAL_OBJECT_START(_0p25_to_0p5_data)
// Polynomial coefficients for tanh(x), 1/4 <= |x| < 1/2
data8 0xB6E27B747C47C8AD, 0x0000BFF6 //A3 = -2.7905990032063258105302045572e-03
data8 0xBF93FD54E226F8F7 //A2H = -1.9521070769536099515084615064e-02
data8 0xBC491BC884F6F18A //A2L = -2.7222721075104525371410300625e-18
data8 0x3FCBE3FBB015A591 //A1H = 2.1789499376181400980279079249e-01
data8 0x3C76AFC2D1AE35F7 //A1L = 1.9677459707672596091076696742e-17
data8 0x3FD6EF53DE8C8FAF //A0H = 3.5835739835078589399230963863e-01
data8 0x3C8E2A1C14355F9D //A0L = 5.2327050592919416045278607775e-17
data8 0xF56D363AAE3BAD53, 0x00003FBB //A25 = 6.4963882412697389947564301120e-21
data8 0xAD6348526CEEB897, 0x0000BFBD //A24 = -1.8358149767147407353343152624e-20
data8 0x85D96A988565FD65, 0x0000BFC1 //A23 = -2.2674950494950919052759556703e-19
data8 0xD52CAF6B1E4D9717, 0x00003FC3 //A22 = 1.4445269502644677106995571101e-18
data8 0xBD7E1BE5CBEF7A01, 0x00003FC5 //A21 = 5.1362075721080004718090799595e-18
data8 0xAE84A9B12ADD6948, 0x0000BFC9 //A20 = -7.5685210830925426342786733068e-17
data8 0xEAC2D5FCF80E250C, 0x00003FC6 //A19 = 1.2726423522879522181100392135e-17
data8 0xE0D2A8AC8C2EDB95, 0x00003FCE //A18 = 3.1200443098733419749016380203e-15
data8 0xB22F0AB7B417F78E, 0x0000BFD0 //A17 = -9.8911854977385933809488291835e-15
data8 0xE25A627BAEFFA7A4, 0x0000BFD3 //A16 = -1.0052095388666003876301743498e-13
data8 0xC90F32EC4A17F908, 0x00003FD6 //A15 = 7.1430637679768183097897337145e-13
data8 0x905F6F124AF956B1, 0x00003FD8 //A14 = 2.0516607231389483452611375485e-12
LOCAL_OBJECT_END(_0p25_to_0p5_data)

LOCAL_OBJECT_START(_0p5_to_1_data)
// Polynomial coefficients for tanh(x), 1/2 <= |x| < 1
data8 0xAB402BE491EE72A7, 0x00003FF7 //A3 = 5.2261556931080934657023772945e-03
data8 0xBFB8403D3DDA87BE //A2H = -9.4730212784752659826992271519e-02
data8 0xBC6FF7BC2AB71A8B //A2L = -1.3863786398568460929625760740e-17
data8 0x3FD3173B1EFA6EF4 //A1H = 2.9829290414066567116435635398e-01
data8 0x3C881E4DCABDE840 //A1L = 4.1838710466827119847963316219e-17
data8 0x3FE45323E552F228 //A0H = 6.3514895238728730220145735075e-01
data8 0x3C739D5832BF7BCF //A0L = 1.7012977006567066423682445459e-17
data8 0xF153980BECD8AE12, 0x00003FD0 //A25 = 1.3396313991261493342597057700e-14
data8 0xEC9ACCD245368129, 0x0000BFD3 //A24 = -1.0507358886349528807350792383e-13
data8 0x8AE6498CA36D2D1A, 0x00003FD4 //A23 = 1.2336759149738309660361813001e-13
data8 0x8DF02FBF5AC70E64, 0x00003FD7 //A22 = 1.0085317723615282268326194551e-12
data8 0x9E15C7125DA204EE, 0x0000BFD9 //A21 = -4.4930478919612724261941857560e-12
data8 0xA62C6F39BDDCEC1C, 0x00003FD7 //A20 = 1.1807342457875095150035780314e-12
data8 0xDFD8D65D30F80F52, 0x00003FDC //A19 = 5.0896919887121116317817665996e-11
data8 0xB795AFFD458F743E, 0x0000BFDE //A18 = -1.6696932710534097241291327756e-10
data8 0xFEF30234CB01EC89, 0x0000BFDD //A17 = -1.1593749714588103589483091370e-10
data8 0xA2F638356E13761E, 0x00003FE2 //A16 = 2.3714062288761887457674853605e-09
data8 0xC429CC0D031E4FD5, 0x0000BFE3 //A15 = -5.7091025466377379046489586383e-09
data8 0xC78363FF929EFF62, 0x0000BFE4 //A14 = -1.1613199289622686725595739572e-08
LOCAL_OBJECT_END(_0p5_to_1_data)

LOCAL_OBJECT_START(_1_to_2_data)
// Polynomial coefficients for tanh(x), 1 <= |x| < 2.0
data8 0xB3D8FB48A548D99A, 0x00003FFB //A3 = 8.7816203264683800892441646129e-02
data8 0xBFC4EFBD8FB38E3B //A2H = -1.6356629864377389416141284073e-01

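The coefficient tables above mix two encodings. A3 and A14 through A25 are written as two `data8` words holding an 80-bit extended value: a 64-bit significand with an explicit integer bit, then a word whose low 16 bits are the sign and a 15-bit exponent biased by 16383. The AkH/AkL halves are plain IEEE binary64 bit patterns. The C sketch below decodes both forms so the words can be checked against their comments. `decode_ext80` and `decode_f64` are hypothetical helper names, and the extended value is rounded to `double`, which is enough for such a check.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode an extended-precision entry written as
 * "data8 significand, sign_exponent". Bit 63 of the significand is the
 * explicit integer bit; bit 15 of sign_exponent is the sign and bits 0-14
 * hold the exponent biased by 16383. Only normal numbers are handled, and
 * the result is rounded to double. */
double decode_ext80(uint64_t significand, uint64_t sign_exponent)
{
    int negative = (int)((sign_exponent >> 15) & 1u);
    int biased_exp = (int)(sign_exponent & 0x7FFFu);
    /* value = significand * 2^(exponent - 63), exponent = biased_exp - 16383 */
    double magnitude = ldexp((double)significand, biased_exp - 16383 - 63);
    return negative ? -magnitude : magnitude;
}

/* Hypothetical helper: reinterpret a 64-bit pattern (the A0H/A0L, A1H/A1L,
 * A2H/A2L entries) as an IEEE binary64 value. */
double decode_f64(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `decode_ext80(0x93D27D6AE7E835F8, 0xBFF4)` gives about -5.639e-04 (A3 for 1/8 <= |x| < 1/4), and `decode_f64(0x3FC7B8FF903BF776)` gives about 0.185333 (A0H). That constant term agrees with tanh(0.1875) to those digits, and 0.1875 = a*b is exactly where y = 0 for this range.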
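The range table in the main-path description can be read as a rule on binades: a is the power of two with a <= |x| < 2a, and b switches from 1.5 to 2.0 in the upper part [1.625a, 2a) of each binade from 2.0 upward. A minimal C sketch of that reduction follows, assuming `double` in place of `long double`; `tanhl_main_reduce` is a hypothetical name. Register names such as rArgExp and r1625Sign suggest the assembly works from the exponent and significand of |x| in a similar way, but that is an inference, not a description of the actual code.

```c
#include <math.h>

/* Hypothetical main-path argument reduction for 1/8 <= |x| < 22.8,
 * following the range table: y = |x|/a - b.
 * Returns 0 on success and -1 if x is outside the main-path domain
 * (the negated comparison also rejects NaN). */
int tanhl_main_reduce(double x, double *a, double *b, double *y)
{
    double ax = fabs(x);
    if (!(ax >= 0.125 && ax < 22.8))
        return -1;

    /* a is the power of two with a <= |x| < 2a, so t = |x|/a lies in [1, 2).
     * Dividing by a power of two is exact. */
    double scale = 0.125;
    while (ax >= 2.0 * scale)
        scale *= 2.0;
    double t = ax / scale;

    /* From a = 2.0 upward, the top of each binade (t >= 1.625), i.e.
     * [3.25;4.0), [6.5;8.0) and [13.0;16.0), uses b = 2.0; all other
     * ranges use b = 1.5. For a = 16, |x| < 22.8 keeps t below 1.425. */
    double shift = (scale >= 2.0 && t >= 1.625) ? 2.0 : 1.5;

    *a = scale;
    *b = shift;
    *y = t - shift; /* exact by Sterbenz's lemma: shift/2 <= t <= 2*shift */
    return 0;
}
```

For instance, x = -3.5 gives a = 2.0, b = 2.0 and y = -0.25, while x = 0.1875 gives y = 0. In every range y stays within [-0.5, 0.5), so the y^k terms of the degree-25 polynomial shrink geometrically.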
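The multiprecision part keeps A0, A1 and A2 as pairs of doubles (AkH + AkL) and combines them with y before the native-precision tail is added. The C sketch below shows the same idea in double-double arithmetic, under a few stated assumptions: it evaluates the first terms with a sequential Horner chain instead of the two parallel terms v1 and v2, it forms exact products with Dekker's TwoProduct where IA-64 could use a fused multiply-add, and it uses the 1/8 <= |x| < 1/4 coefficients from the table. The type `dd` and all function names are hypothetical.

```c
/* A double-double value hi + lo, with |lo| at most about half an ulp of hi. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + err == a + b exactly. */
static dd two_sum(double a, double b)
{
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return (dd){ s, err };
}

/* Dekker's TwoProduct with Veltkamp splitting: p + err == a * b exactly
 * (barring overflow and underflow), without relying on hardware FMA. */
static dd two_prod(double a, double b)
{
    const double split = 134217729.0; /* 2^27 + 1 */
    double p = a * b;
    double ta = split * a, a_hi = ta - (ta - a), a_lo = a - a_hi;
    double tb = split * b, b_hi = tb - (tb - b), b_lo = b - b_hi;
    double err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo;
    return (dd){ p, err };
}

/* One double-double Horner step: returns c + y * p. */
static dd dd_horner_step(dd c, double y, dd p)
{
    dd prod = two_prod(p.hi, y);
    dd s = two_sum(c.hi, prod.hi);
    double tail = s.lo + c.lo + prod.lo + p.lo * y;
    return two_sum(s.hi, tail); /* renormalize into hi + lo */
}

/* Coefficients for 1/8 <= |x| < 1/4, taken from the table comments. */
static const dd kA0 = { 1.8533319990813951205765874874e-01, 5.4744612262799573374268254539e-18 };
static const dd kA1 = { 1.2070645062647619716322822114e-01, 2.1280460108882061756490131241e-18 };
static const dd kA2 = { -2.7963640930198357253955165902e-03, -1.7974001252014762983581666453e-20 };
static const double kA3 = -5.6389704216278164626050408239e-04;

/* Evaluates (A0H+A0L) + y*(A1H+A1L) + y^2*(A2H+A2L) + y^3*A3 + y^4*tail,
 * where `tail` stands for the native-precision value A4 + y*A5 + ... + y^21*A25
 * (not computed in this sketch). */
dd tanhl_mp_part(double y, double tail)
{
    dd p = { kA3 + y * tail, 0.0 };
    p = dd_horner_step(kA2, y, p);
    p = dd_horner_step(kA1, y, p);
    return dd_horner_step(kA0, y, p);
}
```

At y = 0 the result is exactly the pair (A0H, A0L). For y != 0, `hi` matches a plain double Horner evaluation to within a few ulps, while `lo` carries the bits that double alone would drop.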
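For the [0;1/8] path, the description says the last step is x + x^3*A3 + ... and the rest of the polynomial is factorized with a binary tree. The C sketch below shows that shape (Estrin's scheme) on the odd polynomial written in z = x^2. The actual [0;1/8] coefficients are not part of this excerpt, so the Taylor coefficients of tanh stand in for A3 through A15. Both function names are hypothetical.

```c
#include <math.h>

/* Stand-in coefficients: Taylor-series terms of tanh(x). These are
 * placeholders, not the minimax coefficients of the real [0;1/8] table. */
static const double kT3 = -1.0 / 3.0;
static const double kT5 = 2.0 / 15.0;
static const double kT7 = -17.0 / 315.0;
static const double kT9 = 62.0 / 2835.0;
static const double kT11 = -1382.0 / 155925.0;
static const double kT13 = 21844.0 / 6081075.0;
static const double kT15 = -929569.0 / 638512875.0;

/* Evaluates x + x^3*(A3 + A5 z + A7 z^2 + ... + A15 z^6), z = x^2,
 * combining coefficient pairs in a binary tree (Estrin's scheme). */
double tanh_small_estrin(double x)
{
    double z = x * x;
    double z2 = z * z;
    double z4 = z2 * z2;

    /* Level 1: pairs of neighbouring coefficients, independent of each other. */
    double q0 = kT3 + kT5 * z;
    double q1 = kT7 + kT9 * z;
    double q2 = kT11 + kT13 * z;
    double q3 = kT15;

    /* Levels 2 and 3: combine the pairs with z^2, then with z^4. */
    double r0 = q0 + q1 * z2;
    double r1 = q2 + q3 * z2;
    double p = r0 + r1 * z4;

    /* Last step, as in the description: x + x^3 * p. */
    return x + (x * z) * p;
}

/* Reference: the same polynomial by Horner's rule, for comparison. */
double tanh_small_horner(double x)
{
    double z = x * x;
    double p = kT15;
    p = kT13 + z * p;
    p = kT11 + z * p;
    p = kT9 + z * p;
    p = kT7 + z * p;
    p = kT5 + z * p;
    p = kT3 + z * p;
    return x + (x * z) * p;
}
```

The four level-one pairs q0 to q3 do not depend on each other, so a wide in-order machine such as IA-64 can compute them in parallel, whereas Horner's rule forms a chain of six dependent multiply-adds before the final step. Both forms compute the same polynomial and differ only in rounding.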