
s_erfl.s

glibc 2.9, the latest version of the C library functions
.file "erfl.s"

// Copyright (c) 2001 - 2003, Intel Corporation
// All rights reserved.
//
// Contributed 2001 by the Intel Numerics Group, Intel Corporation
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// * The name of Intel Corporation may not be used to endorse or promote
// products derived from this software without specific prior written
// permission.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Intel Corporation is the author of this code, and requests that all
// problem reports or change requests be submitted to it directly at
// http://www.intel.com/software/products/opensource/libraries/num.htm.
//
// History
//==============================================================
// 11/21/01  Initial version
// 05/20/02  Cleaned up namespace and sf0 syntax
// 08/14/02  Changed mli templates to mlx
// 02/06/03  Reordered header: .section, .global, .proc, .align
//
// API
//==============================================================
// long double erfl(long double)
//
// Overview of operation
//==============================================================
//
// Algorithm description
// ---------------------
//
// There are 4 paths:
//
// 1. Special path: x = 0, Inf, NaNs, denormal
//    Return erfl(x) = +/-0.0 for zeros
//    Return erfl(x) = QNaN for NaNs
//    Return erfl(x) = sign(x)*1.0 for Inf
//    Return erfl(x) = (A0H+A0L)*x + x^2, ((A0H+A0L) = 2.0/sqrt(Pi))
//                                             for denormals
//
// 2. [0;1/8] path: 0.0 < |x| < 1/8
//    Return erfl(x) = x*(A1H+A1L) + x^3*A3 + ... + x^15*A15
//
// 3. Main path: 1/8 <= |x| < 6.53
//    For several ranges of 1/8 <= |x| < 6.53
//    Return erfl(x) = sign(x)*((A0H+A0L) + y*(A1H+A1L) + y^2*(A2H+A2L) +
//                                       + y^3*A3 + y^4*A4 + ... + y^25*A25 )
//    where y = (|x|/a) - b
//
//    For each range there is a particular set of coefficients.
//    Below is the list of ranges:
//    1/8  <= |x| < 1/4     a = 0.125, b = 1.5
//    1/4  <= |x| < 1/2     a = 0.25,  b = 1.5
//    1/2  <= |x| < 1.0     a = 0.5,   b = 1.5
//    1.0  <= |x| < 2.0     a = 1.0,   b = 1.5
//    2.0  <= |x| < 3.25    a = 2.0,   b = 1.5
//    3.25 <= |x| < 4.0     a = 2.0,   b = 2.0
//    4.0  <= |x| < 6.53    a = 4.0,   b = 1.5
//    ( [3.25;4.0] subrange separated to resolve monotonicity issues )
//
// 4. Saturation path: 6.53 <= |x| < +INF
//    Return erfl(x) = sign(x)*(1.0 - tiny_value)
//    (tiny_value ~ 1e-1233)
//
// Implementation notes
// --------------------
//
// 1. Special path: x = 0, INF, NaNs, denormals
//
//    This branch is cut off by one fclass operation.
//    Then zeros+nans, infinities and denormals are processed separately.
//    For denormals we had to use multiprecision A0 coefficient to reach
//    necessary accuracy: (A0H+A0L)*x-x^2
//
// 2. [0;1/8] path: 0.0 < |x| < 1/8
//
//    First coefficient of polynomial we must split to multiprecision too.
//    Also we can parallelise computations:
//    (x*(A1H+A1L)) calculated in parallel with "tail" (x^3*A3 + ... + x^15*A15)
//    Furthermore the second part is factorized using binary tree technique.
//
// 3. Main path: 1/8 <= |x| < 6.53
//
//    Multiprecision has to be performed only for first few
//    polynomial iterations (up to 3-rd x degree)
//    Here we use the same parallelisation way as above:
//    Split whole polynomial to first, "multiprecision" part, and second,
//    so called "tail", native precision part.
//
//    1) Multiprecision part:
//    [v1=(A0H+A0L)+y*(A1H+A1L)] + [v2=y^2*((A2H+A2L)+y*A3)]
//    v1 and v2 terms calculated in parallel
//
//    2) Tail part:
//    v3 = x^4 * ( A4 + x*A5 + ... + x^21*A25 )
//    v3 is split to 2 even parts (10 coefficient in each one).
//    These 2 parts are also factorized using binary tree technique.
//
//    So Multiprecision and Tail parts cost is almost the same
//    and we have both results ready before final summation.
//
// 4. Saturation path: 6.53 <= |x| < +INF
//
//    We use formula sign(x)*(1.0 - tiny_value) instead of simple sign(x)*1.0
//    just to meet IEEE requirements for different rounding modes in this case.
//
// Registers used
//==============================================================
// Floating Point registers used:
// f8 - input & output
// f32 -> f90

// General registers used:
// r2, r3, r32 -> r52

// Predicate registers used:
// p0, p6 -> p11, p14, p15

// p6  - arg is zero, denormal or special IEEE
// p7  - arg is in [4;8] binary interval
// p8  - arg is in [3.25;4] interval
// p9  - arg < 1/8
// p10 - arg is NOT in [3.25;4] interval
// p11 - arg in saturation domain
// p14 - arg is positive
// p15 - arg is negative

// Assembly macros
//==============================================================
rDataPtr           = r2
rTailDataPtr       = r3

rBias              = r33
rSignBit           = r34
rInterval          = r35

rArgExp            = r36
rArgSig            = r37
r3p25Offset        = r38
r2to4              = r39
r1p25              = r40
rOffset            = r41
r1p5               = r42
rSaturation        = r43
r3p25Sign          = r44
rTiny              = r45
rAddr1             = r46
rAddr2             = r47
rTailAddr1         = r48
rTailAddr2         = r49
rTailOffset        = r50
rTailAddOffset     = r51
rShiftedDataPtr    = r52

//==============================================================
fA0H               = f32
fA0L               = f33
fA1H               = f34
fA1L               = f35
fA2H               = f36
fA2L               = f37
fA3                = f38
fA4                = f39
fA5                = f40
fA6                = f41
fA7                = f42
fA8                = f43
fA9                = f44
fA10               = f45
fA11               = f46
fA12               = f47
fA13               = f48
fA14               = f49
fA15               = f50
fA16               = f51
fA17               = f52
fA18               = f53
fA19               = f54
fA20               = f55
fA21               = f56
fA22               = f57
fA23               = f58
fA24               = f59
fA25               = f60

fArgSqr            = f61
fArgCube           = f62
fArgFour           = f63
fArgEight          = f64

fArgAbsNorm        = f65
fArgAbsNorm2       = f66
fArgAbsNorm2L      = f67
fArgAbsNorm3       = f68
fArgAbsNorm4       = f69
fArgAbsNorm11      = f70

fRes               = f71
fResH              = f72
fResL              = f73
fRes1H             = f74
fRes1L             = f75
fRes1Hd            = f76
fRes2H             = f77
fRes2L             = f78
fRes3H             = f79
fRes3L             = f80
fRes4              = f81

fTT                = f82
fTH                = f83
fTL                = f84
fTT2               = f85
fTH2               = f86
fTL2               = f87

f1p5               = f88
f2p0               = f89
fTiny              = f90

// Data tables
//==============================================================
RODATA
.align 64

LOCAL_OBJECT_START(erfl_data)
////////// Main tables ///////////
_0p125_to_0p25_data: // exp = 2^-3
// Polynomial coefficients for the erf(x), 1/8 <= |x| < 1/4
data8 0xACD9ED470F0BB048, 0x0000BFF4 //A3 = -6.5937529303909561891162915809e-04
data8 0xBF6A254428DDB452 //A2H = -3.1915980570631852578089571182e-03
data8 0xBC131B3BE3AC5079 //A2L = -2.5893976889070198978842231134e-19
data8 0x3FC16E2D7093CD8C //A1H = 1.3617485043469590433318217038e-01
data8 0x3C6979A52F906B4C //A1L = 1.1048096806003284897639351952e-17
data8 0x3FCAC45E37FE2526 //A0H = 2.0911767705937583938791135552e-01
data8 0x3C648D48536C61E3 //A0L = 8.9129592834861155344147026365e-18
data8 0xD1FC135B4A30E746, 0x00003F90 //A25 = 6.3189963203954877364460345654e-34
data8 0xB1C79B06DD8C988C, 0x00003F97 //A24 = 6.8478253118093953461840838106e-32
data8 0xCC7AE121D1DEDA30, 0x0000BF9A //A23 = -6.3010264109146390803803408666e-31
data8 0x8927B8841D1E0CA8, 0x0000BFA1 //A22 = -5.4098171537601308358556861717e-29
data8 0xB4E84D6D0C8F3515, 0x00003FA4 //A21 = 5.7084320046554628404861183887e-28
data8 0xC190EAE69A67959A, 0x00003FAA //A20 = 3.9090359419467121266470910523e-26
data8 0x90122425D312F680, 0x0000BFAE //A19 = -4.6551806872355374409398000522e-25
data8 0xF8456C9C747138D6, 0x0000BFB3 //A18 = -2.5670639225386507569611436435e-23
data8 0xCDCAE0B3C6F65A3A, 0x00003FB7 //A17 = 3.4045511783329546779285646369e-22
data8 0x8F41909107C62DCC, 0x00003FBD //A16 = 1.5167830861896169812375771948e-20
data8 0x82F0FCB8A4B8C0A3, 0x0000BFC1 //A15 = -2.2182328575376704666050112195e-19
data8 0x92E992C58B7C3847, 0x0000BFC6 //A14 = -7.9641369349930600223371163611e-18
LOCAL_OBJECT_END(erfl_data)

LOCAL_OBJECT_START(_0p25_to_0p5_data)
// Polynomial coefficients for the erf(x), 1/4 <= |x| < 1/2
data8 0xF083628E8F7CE71D, 0x0000BFF6 //A3 = -3.6699405305266733332335619531e-03
data8 0xBF978749A434FE4E //A2H = -2.2977018973732214746075186440e-02
data8 0xBC30B3FAFBC21107 //A2L = -9.0547407100537663337591537643e-19
data8 0x3FCF5F0CDAF15313 //A1H = 2.4508820238647696654332719390e-01
data8 0x3C1DFF29F5AD8117 //A1L = 4.0653155218104625249413579084e-19
data8 0x3FD9DD0D2B721F38 //A0H = 4.0411690943482225790717166092e-01
data8 0x3C874C71FEF1759E //A0L = 4.0416653425001310671815863946e-17
data8 0xA621D99B8C12595E, 0x0000BFAB //A25 = -6.7100271986703749013021666304e-26
data8 0xBD7BBACB439992E5, 0x00003FAE //A24 = 6.1225362452814749024566661525e-25
data8 0xFF2FEFF03A98E410, 0x00003FB2 //A23 = 1.3192871864994282747963195183e-23
data8 0xAE8180957ABE6FD5, 0x0000BFB6 //A22 = -1.4434787102181180110707433640e-22
data8 0xAF0566617B453AA6, 0x0000BFBA //A21 = -2.3163848847252215762970075142e-21
data8 0x8F33D3616B9B8257, 0x00003FBE //A20 = 3.0324297082969526400202995913e-20
data8 0xD58AB73354438856, 0x00003FC1 //A19 = 3.6175397854863872232142412590e-19
data8 0xD214550E2F3210DF, 0x0000BFC5 //A18 = -5.6942141660091333278722310354e-18
data8 0xE2CA60C328F3BBF5, 0x0000BFC8 //A17 = -4.9177359011428870333915211291e-17
data8 0x88D9BB274F9B3873, 0x00003FCD //A16 = 9.4959118337089189766177270051e-16
data8 0xCA4A00AB538A2DB2, 0x00003FCF //A15 = 5.6146496538690657993449251855e-15
data8 0x9CC8FFFBDDCF9853, 0x0000BFD4 //A14 = -1.3925319209173383944263942226e-13
LOCAL_OBJECT_END(_0p25_to_0p5_data)

LOCAL_OBJECT_START(_0p5_to_1_data)
// Polynomial coefficients for the erf(x), 1/2 <= |x| < 1
data8 0xDB742C8FB372DBE0, 0x00003FF6 //A3 = 3.3485993187250381721535255963e-03
data8 0xBFBEDC5644353C26 //A2H = -1.2054957547410136142751468924e-01
data8 0xBC6D7215B023455F //A2L = -1.2770012232203569059818773287e-17
data8 0x3FD492E42D78D2C4 //A1H = 3.2146553459760363047337250464e-01
data8 0x3C83A163CAC22E05 //A1L = 3.4053365952542489137756724868e-17
data8 0x3FE6C1C9759D0E5F //A0H = 7.1115563365351508462453011816e-01
data8 0x3C8B1432F2CBC455 //A0L = 4.6974407716428899960674098333e-17
data8 0x95A6B92162813FF8, 0x00003FC3 //A25 = 1.0140763985766801318711038400e-18
data8 0xFE5EC3217F457B83, 0x0000BFC6 //A24 = -1.3789434273280972156856405853e-17
data8 0x9B49651031B5310B, 0x0000BFC8 //A23 = -3.3672435142472427475576375889e-17
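
The listing above is cut off partway through the third coefficient table. As a reading aid, the following C sketch illustrates two things the header comments and tables describe: how a `data8` entry maps to the decimal value in its trailing comment, and how the main path picks a range and forms the reduced argument y = (|x|/a) - b. It is not part of the glibc source. The names `decode_ext80`, `decode_binary64`, `erfl_ranges`, and `erfl_reduce` are hypothetical. The sketch assumes that two-word entries are 80-bit extended values (64-bit significand, then sign and 15-bit exponent biased by 16383) and that single-word entries are IEEE 754 binary64 bit patterns. The range bounds, including the approximate 6.53 cutoff, are copied from the comments rather than from the assembly itself.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Decode a "data8 <significand>, <sign/exponent>" pair, assuming the 80-bit
   extended layout: 1 sign bit, 15-bit exponent biased by 16383, and a 64-bit
   significand with an explicit integer bit. The result is only as precise as
   the host's long double. */
static long double decode_ext80(uint64_t significand, uint16_t sign_exp)
{
    int exponent = (int)(sign_exp & 0x7FFF) - 16383 - 63;
    long double v = ldexpl((long double)significand, exponent);
    return (sign_exp & 0x8000) ? -v : v;
}

/* Decode a single "data8" word, assuming it holds a binary64 bit pattern
   (the A0H/A0L, A1H/A1L, A2H/A2L high and low parts). */
static double decode_binary64(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Main-path ranges and their (a, b) pairs, as listed in the header comments. */
struct erfl_range { long double lo, hi, a, b; };

static const struct erfl_range erfl_ranges[] = {
    { 0.125L, 0.25L, 0.125L, 1.5L },
    { 0.25L,  0.5L,  0.25L,  1.5L },
    { 0.5L,   1.0L,  0.5L,   1.5L },
    { 1.0L,   2.0L,  1.0L,   1.5L },
    { 2.0L,   3.25L, 2.0L,   1.5L },
    { 3.25L,  4.0L,  2.0L,   2.0L },
    { 4.0L,   6.53L, 4.0L,   1.5L },  /* 6.53 is the rounded bound from the comments */
};

/* Return the index of the main-path range containing |x| and store
   y = |x|/a - b in *y_out. Return -1 (leaving *y_out untouched) when x
   belongs to one of the other three paths. */
static int erfl_reduce(long double x, long double *y_out)
{
    assert(y_out != NULL);
    long double ax = fabsl(x);
    int n = (int)(sizeof erfl_ranges / sizeof erfl_ranges[0]);
    for (int i = 0; i < n; i++) {
        if (ax >= erfl_ranges[i].lo && ax < erfl_ranges[i].hi) {
            *y_out = ax / erfl_ranges[i].a - erfl_ranges[i].b;
            return i;
        }
    }
    return -1;
}
```

For example, `decode_ext80(0xACD9ED470F0BB048, 0xBFF4)` should reproduce the A3 value of the first table, and `erfl_reduce(0.1875L, &y)` selects the first range with y = 0, which is consistent with that table's A0H being close to erf(0.1875).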
