
📄 e_powf.s

📁 Glibc 2.3.2 source code (over 100 MB after extraction)
.file "powf.s"
// Copyright (C) 2000, 2001, Intel Corporation
// All rights reserved.
//
// Contributed 2/2/2000 by John Harrison, Ted Kubaska, Bob Norin, Shane Story,
// and Ping Tak Peter Tang of the Computational Software Lab, Intel Corporation.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// * The name of Intel Corporation may not be used to endorse or promote
// products derived from this software without specific prior written
// permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Intel Corporation is the author of this code, and requests that all
// problem reports or change requests be submitted to it directly at
// http://developer.intel.com/opensource.
//
// History
//==============================================================
// 2/02/00  Initial version
// 2/03/00  Added p12 to definite over/under path.
//          With odd power we did not
//          maintain the sign of x in this path.
// 4/04/00  Unwind support added
// 4/19/00  pow(+-1,inf) now returns NaN
//          pow(+-val, +-inf) returns 0 or inf, but now does not call error support
//          Added s1 to fcvt.fx because invalid flag was incorrectly set.
// 8/15/00  Bundle added after call to __libm_error_support to properly
//          set [the previously overwritten] GR_Parameter_RESULT.
// 9/07/00  Improved performance by eliminating bank conflicts and other stalls,
//          and tweaking the critical path
// 9/08/00  Per c99, pow(+-1,inf) now returns 1, and pow(+1,nan) returns 1
// 9/28/00  Updated NaN**0 path
// 1/20/01  Fixed denormal flag settings.
// 2/12/01  Improved speed.
//
// API
//==============================================================
// double pow(double x, double y)
// float  powf(float x, float y)
//
// Overview of operation
//==============================================================
//
// Three steps...
// 1. Log(x)
// 2. y Log(x)
// 3. exp(y log(x))
//
// This means we work with the absolute value of x and merge in the sign later.
//      Log(x) = G + delta + r -rsq/2 + p
// G,delta depend on the exponent of x and table entries. The table entries are
// indexed by the exponent of x, called K.
//
// The G and delta come out of the reduction; r is the reduced x.
//
// B = frcpa(x)
// xB-1 being small means that B is an approximate inverse of x.
//
//      Log(x) = Log( (1/B)(Bx) )
//             = Log(1/B) + Log(Bx)
//             = Log(1/B) + Log( 1 + (Bx-1))
//
//      x  = 2^K 1.x_1x_2.....x_52
//      B= frcpa(x) = 2^-K Cm
//      Log(1/B) = Log(1/(2^-K Cm))
//      Log(1/B) = Log((2^K/ Cm))
//      Log(1/B) = K Log(2) + Log(1/Cm)
//
//      Log(x)   = K Log(2) + Log(1/Cm) + Log( 1 + (Bx-1))
//
// If you take the significand of x, set the exponent to true 0, then Cm is
// the frcpa. We tabulate the Log(1/Cm) values.
// There are 256 of them.
// The frcpa table is indexed by 8 bits, the x_1 thru x_8.
// m = x_1x_2...x_8 is an 8-bit index.
//
//      Log(1/Cm) = log(1/frcpa(1+m/256)) where m goes from 0 to 255.
//
// We tabulate as two doubles, T and t, where T +t is the value itself.
//
//      Log(x)   = (K Log(2)_hi + T) + (K Log(2)_lo + t) + Log( 1 + (Bx-1))
//      Log(x)   =  G + delta           + Log( 1 + (Bx-1))
//
// The Log( 1 + (Bx-1)) can be calculated as a series in r = Bx-1.
//
//      Log( 1 + (Bx-1)) = r - rsq/2 + p
//
// Then,
//
//      yLog(x) = yG + y delta + y(r-rsq/2) + yp
//      yLog(x) = Z1 + e3      + Z2         + Z3 + (e1 + e2)
//
//
//     exp(yLog(x)) = exp(Z1 + Z2 + Z3) exp(e1 + e2 + e3)
//
//
//       exp(Z3) is another series.
//       exp(e1 + e2 + e3) is approximated as f3 = 1 + (e1 + e2 + e3)
//
//       Z1 (128/log2) = number of log2/128 in Z1 is N1
//       Z2 (128/log2) = number of log2/128 in Z2 is N2
//
//       s1 = Z1 - N1 log2/128
//       s2 = Z2 - N2 log2/128
//
//       s = s1 + s2
//       N = N1 + N2
//
//       exp(Z1 + Z2) = exp(Z)
//       exp(Z)       = exp(s) exp(N log2/128)
//
//       exp(r)       = exp(Z - N log2/128)
//
//      r = s + d = (Z - N (log2/128)_hi) -N (log2/128)_lo
//                =  Z - N (log2/128)
//
//      Z         = s+d +N (log2/128)
//
//      exp(Z)    = exp(s) (1+d) exp(N log2/128)
//
//      N = M 128 + n
//
//      N log2/128 = M log2 + n log2/128
//
//      n is 7 binary digits = n_7n_6...n_1
//
//      n log2/128 = n_7n_6n_5 16 log2/128 + n_4n_3n_2n_1 log2/128
//      n log2/128 = n_7n_6n_5 log2/8 + n_4n_3n_2n_1 log2/128
//      n log2/128 = I2 log2/8 + I1 log2/128
//
//      N log2/128 = M log2 + I2 log2/8 + I1 log2/128
//
//      exp(Z)    = exp(s) (1+d) exp(log(2^M) + log(2^I2/8) + log(2^I1/128))
//      exp(Z)    = exp(s) (1+d1) (1+d2)(2^M) 2^I2/8 2^I1/128
//      exp(Z)    = exp(s) f1 f2 (2^M) 2^I2/8 2^I1/128
//
// I1, I2 are table indices.
// Use a series for exp(s).
// Then get exp(Z)
//
//     exp(yLog(x)) = exp(Z1 + Z2 + Z3) exp(e1 + e2 + e3)
//     exp(yLog(x)) = exp(Z) exp(Z3) f3
//     exp(yLog(x)) = exp(Z)f3 exp(Z3)
//     exp(yLog(x)) = A exp(Z3)
//
// We actually calculate exp(Z3) -1.
// Then,
//     exp(yLog(x)) = A + A( exp(Z3)   -1)
//
// Table Generation
//==============================================================
// The log values
// ==============
// The operation (K*log2_hi) must be exact. K is the true exponent of x.
// If we allow gradual underflow (denormals), K can be represented in 12 bits
// (as a two's complement number). We assume 13 bits as an engineering precaution.
//
//           +------------+----------------+-+
//           |  13 bits   | 50 bits        | |
//           +------------+----------------+-+
//           0            1                66
//                        2                34
//
// So we want the lsb(log2_hi) to be 2^-50
// We get log2 as a quad-extended (15-bit exponent, 128-bit significand)
//
//      0 fffe b17217f7d1cf79ab c9e3b39803f2f6af (4...)
//
// Consider numbering the bits left to right, starting at 0 thru 127.
// Bit 0 is the 2^-1 bit; bit 49 is the 2^-50 bit.
//
//  ...79ab
//     0111 1001 1010 1011
//     44
//     89
//
// So if we shift off the rightmost 14 bits, then (shift back only
// the top half) we get
//
//      0 fffe b17217f7d1cf4000 e6af278ece600fcb dabc000000000000
//
// Put the right 64-bit significand in an FR register, convert to double;
// it is exact.
// Put the next 128 bits into a quad register and round to double.
// The true exponent of the low part is -51.
//
// hi is 0 fffe b17217f7d1cf4000
// lo is 0 ffcc e6af278ece601000
//
// Convert to double memory format and get
//
// hi is 0x3fe62e42fefa39e8
// lo is 0x3cccd5e4f1d9cc02
//
// log2_hi + log2_lo is an accurate value for log2.
//
//
// The T and t values
// ==================
// A similar method is used to generate the T and t values.
//
// K * log2_hi + T  must be exact.
//
// Smallest T,t
// ----------
// The smallest T,t is
//       T                   t
// data8 0x3f60040155d58800, 0x3c93bce0ce3ddd81  log(1/frcpa(1+0/256))=  +1.95503e-003
//
// The exponent is 0x3f6 (biased)  or -9 (true).
// For the smallest T value, what we want is to clip the significand such that
// when it is shifted right by 9, its lsb is in the bit for 2^-51. The 9 is specific
// to the first entry. In general, it is 0xffff - (biased 15-bit exponent).
// Independently, what we have calculated is the table value as a quad precision number.
// Table entry 1 is
// 0 fff6 80200aaeac44ef38 338f77605fdf8000
//
// We store this quad precision number in a data structure that is
//    sign:            1
//    exponent:       15
//    significand_hi: 64 (includes explicit bit)
//    significand_lo: 49
// Because the explicit bit is included, the significand is 113 bits.
//
// Consider significand_hi for table entry 1.
//
//
// +-+--- ... -------+--------------------+
// | |
// +-+--- ... -------+--------------------+
// 0 1               4444444455555555556666
//                   2345678901234567890123
//
// Labeled as above, bit 0 is 2^0, bit 1 is 2^-1, etc.
// Bit 42 is 2^-42. If we shift to the right by 9, the bit in
// bit 42 goes in 51.
//
// So what we want to do is shift bits 43 thru 63 into significand_lo.
// This is shifting bit 42 into bit 63, taking care to retain the shifted-off bits.
// Then shifting (just with significand_hi) back into bit 42.
//
// The shift_value is 63-42 = 21.
// In general, this is
//      63 - (51 -(0xffff - 0xfff6))
// For this example, it is
//      63 - (51 - 9) = 63 - 42  = 21
//
// This means we are shifting 21 bits into significand_lo.  We must maintain more
// than a 128-bit significand not to lose bits. So before the shift we put the 128-bit
// significand into a 256-bit significand and then shift.
// The 256-bit significand has four parts: hh, hl, lh, and ll.
//
// Start off with
//      hh         hl         lh         ll
//      <64>       <49><15_0> <64_0>     <64_0>
//
// After shift by 21 (then return for significand_hi),
//      <43><21_0> <21><43>   <6><58_0>  <64_0>
//
// Take the hh part and convert to a double. There is no rounding here.
// The conversion is exact. The true exponent of the high part is the same as the
// true exponent of the input quad.
//
// We have some 64 plus significand bits for the low part. In this example, we have
// 70 bits. We want to round this to a double. Put them in a quad and then do a quad fnorm.
// For this example the true exponent of the low part is
//      true_exponent_of_high - 43 = true_exponent_of_high - (64-21)
// In general, this is
//      true_exponent_of_high - (64 - shift_value)
//
//
// Largest T,t
// ----------
// The largest T,t is
// data8 0x3fe62643fecf9742, 0x3c9e3147684bd37d    log(1/frcpa(1+255/256))=  +6.92171e-001
//
// Table entry 256 is
// 0 fffe b1321ff67cba178c 51da12f4df5a0000
//
// The shift value is
//      63 - (51 -(0xffff - 0xfffe)) = 13
//
// The true exponent of the low part is
//      true_exponent_of_high - (64 - shift_value)
//      -1 - (64-13) = -52
// Biased as a double, this is 0x3cb
//
//
//
// So then lsb(T) must be >= 2^-51
// msb(Klog2_hi) <= 2^12
//
//              +--------+---------+
//              |       51 bits    | <== largest T
//              +--------+---------+
//              | 9 bits | 42 bits | <== smallest T
// +------------+----------------+-+
// |  13 bits   | 50 bits        | |
// +------------+----------------+-+

// Special Cases
//==============================================================
//                                   double     float
// overflow                          error 24   30
// underflow                         error 25   31
// X zero  Y zero
//  +0     +0                 +1     error 26   32
//  -0     +0                 +1     error 26   32
//  +0     -0                 +1     error 26   32
//  -0     -0                 +1     error 26   32
// X zero  Y negative
//  +0     -odd integer       +inf   error 27   33  divide-by-zero
//  -0     -odd integer       -inf   error 27   33  divide-by-zero
//  +0     !-odd integer      +inf   error 27   33  divide-by-zero
//  -0     !-odd integer      +inf   error 27   33  divide-by-zero
//  +0     -inf               +inf   error 27   33  divide-by-zero
//  -0     -inf               +inf   error 27   33  divide-by-zero
// X zero  Y positive
//  +0     +odd integer       +0
//  -0     +odd integer       -0
//  +0     !+odd integer      +0
//  -0     !+odd integer      +0
//  +0     +inf               +0
//  -0     +inf               +0
//  +0     Y NaN              quiet Y               invalid if Y SNaN
//  -0     Y NaN              quiet Y               invalid if Y SNaN
// X one
//  -1     Y inf              +1
//  -1     Y NaN              quiet Y               invalid if Y SNaN
//  +1     Y NaN              +1                    invalid if Y SNaN
//  +1     Y any else         +1
// X -     Y not integer      QNAN   error 28   34  invalid
// X NaN   Y 0                +1     error 29   35
// X NaN   Y NaN              quiet X               invalid if X or Y SNaN
// X NaN   Y any else         quiet X               invalid if X SNaN
// X !+1   Y NaN              quiet Y               invalid if Y SNaN
// X +inf  Y >0               +inf
// X -inf  Y >0, !odd integer +inf
// X -inf  Y >0, odd integer  -inf
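
The three-step scheme described in the "Overview of operation" comments (log, multiply by y, exp) can be sketched in Python as below. This is an illustration only, not the IA-64 implementation. `frcpa` is not available here, so `Cm` is a hypothetical stand-in (the reciprocal of the midpoint of each 1/256 interval). Table lookups are replaced by direct calls to `math.log` and `2.0 **`, the hi/lo splits and the error terms e1, e2, e3 are dropped, and zero, infinity, and NaN inputs are not handled. The names `log_via_reduction`, `exp_via_reduction`, and `pow_sketch` are made up for this sketch.

```python
import math

LN2 = math.log(2.0)


def log_via_reduction(x):
    """log(x) for finite x > 0 as K*log(2) + log(1/Cm) + (r - rsq/2 + p)."""
    mant, exp2 = math.frexp(x)            # x = mant * 2**exp2, 0.5 <= mant < 1
    sig, K = mant * 2.0, exp2 - 1         # x = sig * 2**K, 1 <= sig < 2
    m = int((sig - 1.0) * 256.0)          # 8-bit index x_1...x_8
    # Assumption: stand-in for frcpa(1 + m/256); the real table differs.
    Cm = 1.0 / (1.0 + (m + 0.5) / 256.0)
    T = -math.log(Cm)                     # plays the role of the tabulated log(1/Cm)
    r = Cm * sig - 1.0                    # r = Bx - 1, |r| < 1/256
    # p collects the series terms beyond r - rsq/2.
    p = r * r * r * (1.0 / 3.0 - r / 4.0 + r * r / 5.0 - r * r * r / 6.0)
    return K * LN2 + T + (r - r * r / 2.0 + p)


def exp_via_reduction(Z):
    """exp(Z) as exp(s) * 2**M * 2**(I2/8) * 2**(I1/128)."""
    N = round(Z * 128.0 / LN2)            # number of log2/128 steps in Z
    s = Z - N * (LN2 / 128.0)             # reduced argument, |s| <= log2/256
    M, n = divmod(N, 128)                 # N = 128*M + n, n has 7 bits
    I2, I1 = divmod(n, 16)                # I2 = n_7n_6n_5, I1 = n_4n_3n_2n_1
    # Short Taylor series for exp(s); adequate because |s| is tiny.
    exp_s = 1.0 + s * (1.0 + s * (0.5 + s * (1.0 / 6.0 + s * (1.0 / 24.0 + s / 120.0))))
    return math.ldexp(exp_s * 2.0 ** (I2 / 8.0) * 2.0 ** (I1 / 128.0), M)


def pow_sketch(x, y):
    """x**y for finite nonzero x and finite y; the sign of x is merged in last."""
    sign = 1.0
    if x < 0.0:
        if y != math.floor(y):
            return math.nan               # negative base, non-integer exponent
        if math.fmod(y, 2.0) != 0.0:
            sign = -1.0                   # odd integer exponent keeps the sign
    return sign * exp_via_reduction(y * log_via_reduction(abs(x)))
```

Calling `pow_sketch(2.0, 10.0)` or `pow_sketch(-2.0, 3.0)` and comparing against `math.pow` shows agreement to within a few units in the last place for moderate arguments. The real routine carries the extra hi/lo terms precisely so that the error does not grow with `y`.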

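The claims in the "Table Generation" comments about `log2_hi` and `log2_lo` can be checked with exact rational arithmetic. The sketch below decodes the two bit patterns quoted there and verifies that the lowest set bit of `log2_hi` is 2^-50, that the true exponent of `log2_lo` is -51, and that `K * log2_hi` fits in a 64-bit significand for every 13-bit signed `K`. The helper names are hypothetical, and the 64-bit width is taken from the register layout drawn in the comments, not from a Python float, which has only 53 bits.

```python
import math
import struct
from fractions import Fraction


def double_from_bits(hex_bits):
    """Decode a 16-hex-digit IEEE-754 bit pattern into a Python float."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]


# Bit patterns quoted in the source comments.
LOG2_HI = double_from_bits("3fe62e42fefa39e8")
LOG2_LO = double_from_bits("3cccd5e4f1d9cc02")


def true_exponent(value):
    """Unbiased exponent e of a nonzero value written as 1.f * 2**e."""
    return math.frexp(value)[1] - 1


def lowest_set_bit_exponent(value):
    """Exponent e such that 2**e is the weight of the lowest set bit (value != 0)."""
    frac = Fraction(value)                # exact: a finite double is a dyadic rational
    numerator = abs(frac.numerator)
    trailing_zeros = (numerator & -numerator).bit_length() - 1
    return trailing_zeros - (frac.denominator.bit_length() - 1)


def significand_bits_needed(k, value):
    """Significand width needed to hold the product k * value exactly."""
    numerator = abs((Fraction(k) * Fraction(value)).numerator)
    if numerator == 0:
        return 0
    trailing_zeros = (numerator & -numerator).bit_length() - 1
    return numerator.bit_length() - trailing_zeros


# Widest product over every K with magnitude below 2**12.
WIDEST_PRODUCT = max(significand_bits_needed(k, LOG2_HI) for k in range(-4095, 4096))
```

`WIDEST_PRODUCT` stays below 64, which is the property the comments rely on when they require `K*log2_hi` to be exact in the floating-point register format.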