⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 s_atanl.s

📁 glibc 2.9,最新版的C语言库函数
💻 S
📖 第 1 页 / 共 4 页
字号:
.file "atanl.s"// Copyright (c) 2000 - 2005, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.//////*********************************************************************//// History// 02/02/00 (hand-optimized)// 04/04/00 Unwind support added// 08/15/00 Bundle added after call to __libm_error_support to properly//          set [the previously overwritten] GR_Parameter_RESULT.// 03/13/01 Fixed flags when denormal raised on intermediate result// 01/08/02 Improved speed.// 02/06/02 Corrected .section statement// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double table values// 03/31/05 Reformatted delimiters between data tables////*********************************************************************//// Function:   atanl(x) = inverse tangent(x), for double extended x values// Function:   atan2l(y,x) = atan(y/x), for double extended y, x values//// API////  long double atanl  (long double x)//  long double atan2l (long double y, long double x)////*********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input and Return Value)//                              f9 (Input for atan2l)//                              f10-f15, f32-f83////    General Purpose Registers://      r32-r51//      r49-r52 (Arguments to error support for 0,0 case)////    Predicate Registers:      p6-p15////*********************************************************************//// IEEE Special Conditions:////    Denormal fault raised on denormal inputs//    Underflow exceptions may occur //    Special error handling for the y=0 and x=0 case//    Inexact raised when appropriate by algorithm////    atanl(SNaN) = QNaN//    atanl(QNaN) = QNaN//    atanl(+/-0) = +/- 0//    atanl(+/-Inf) = +/-pi/2 ////    atan2l(Any NaN for x or y) = QNaN//    atan2l(+/-0,x) = +/-0 for x > 0 //    atan2l(+/-0,x) = +/-pi for x < 0 //    atan2l(+/-0,+0) = +/-0 //    atan2l(+/-0,-0) = +/-pi //    atan2l(y,+/-0) = pi/2 y > 0//    atan2l(y,+/-0) = -pi/2 y < 0//    atan2l(+/-y, Inf) = +/-0 for finite y > 0//    atan2l(+/-Inf, x) = +/-pi/2 for finite x //    atan2l(+/-y, -Inf) = +/-pi for finite  y > 0 //    atan2l(+/-Inf, Inf) = +/-pi/4//    atan2l(+/-Inf, -Inf) = +/-3pi/4////*********************************************************************//// Mathematical Description// ---------------------------//// The function ATANL( Arg_Y, Arg_X ) returns the "argument"// or the "phase" of the complex number////           Arg_X + i Arg_Y//// or equivalently, the angle in radians from the positive// x-axis to the line joining the origin and the point// (Arg_X,Arg_Y)//////        (Arg_X, Arg_Y) x//                        \//                \//                 \//                  \//                   \ angle between is ATANL(Arg_Y,Arg_X)//                    \//                     ------------------> X-axis//                   Origin//// Moreover, this angle is reported in the range [-pi,pi] thus////      -pi <= ATANL( Arg_Y, Arg_X ) <= pi.//// From the geometry, it is easy to define ATANL when one of// Arg_X or Arg_Y is +-0 or +-inf://////      \ Y |//     X \  |  +0  | -0  |  +inf |  -inf  |  finite non-zero//        \ |      |     |       |        |//    ______________________________________________________//          |            |       |        |//     +-0  |   Invalid/ |  pi/2 | -pi/2  |  sign(Y)*pi/2//          |    qNaN    |       |        |//  --------------------------------------------------------//          |      |     |       |        |//     +inf |  +0  | -0  |  pi/4 | -pi/4  |  sign(Y)*0//  --------------------------------------------------------//          |      |     |       |        |//     -inf |  +pi | -pi | 3pi/4 | -3pi/4 |  sign(Y)*pi//  --------------------------------------------------------//   finite |    X>0?    |  pi/2 | -pi/2  |  normal case//  non-zero| sign(Y)*0: |       |        |//       | sign(Y)*pi |       |        |////// One must take note that ATANL is NOT the arctangent of the// value Arg_Y/Arg_X; but rather ATANL and arctan are related// in a slightly more complicated way as follows://// Let U := max(|Arg_X|, |Arg_Y|);  V := min(|Arg_X|, |Arg_Y|);// sign_X be the sign bit of Arg_X, i.e., sign_X is 0 or 1;// s_X    be the sign     of Arg_X, i.e., s_X = (-1)^sign_X;//// sign_Y be the sign bit of Arg_Y, i.e., sign_Y is 0 or 1;// s_Y    be the sign     of Arg_Y, i.e., s_Y = (-1)^sign_Y;//// swap   be 0  if |Arg_X| >= |Arg_Y|  and 1 otherwise.//// Then, ATANL(Arg_Y, Arg_X) =////       /    arctan(V/U)     \      sign_X = 0 & swap = 0//       | pi/2 - arctan(V/U) |      sign_X = 0 & swap = 1// s_Y * |                    |//       |  pi  - arctan(V/U) |      sign_X = 1 & swap = 0//       \ pi/2 + arctan(V/U) /      sign_X = 1 & swap = 1////// This relationship also suggest that the algorithm's major// task is to calculate arctan(V/U) for 0 < V <= U; and the// final Result is given by////      s_Y * { (P_hi + P_lo) + sigma * arctan(V/U) }//// where////   (P_hi,P_lo) represents M(sign_X,swap)*(pi/2) accurately////   M(sign_X,swap) = 0  for sign_X = 0 and swap = 0//              1  for swap   = 1//              2  for sign_X = 1 and swap = 0//// and////   sigma = { (sign_X  XOR  swap) :  -1.0 : 1.0 }////      =  (-1) ^ ( sign_X XOR swap )//// Both (P_hi,P_lo) and sigma can be stored in a table and fetched// using (sign_X,swap) as an index. (P_hi, P_lo) can be stored as a// double-precision, and single-precision pair; and sigma can// obviously be just a single-precision number.//// In the algorithm we propose, arctan(V/U) is calculated to high accuracy// as A_hi + A_lo. Consequently, the Result ATANL( Arg_Y, Arg_X ) is// given by////    s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)//// We now discuss the calculation of arctan(V/U) for 0 < V <= U.//// For (V/U) < 2^(-3), we use a simple polynomial of the form////      z + z^3*(P_1 + z^2*(P_2 + z^2*(P_3 + ... + P_8)))//// where z = V/U.//// For the sake of accuracy, the first term "z" must approximate V/U to// extra precision. For z^3 and higher power, a working precision// approximation to V/U suffices. Thus, we obtain:////      z_hi + z_lo = V/U  to extra precision and//      z           = V/U  to working precision//// The value arctan(V/U) is delivered as two pieces (A_hi, A_lo)////      (A_hi,A_lo) = (z_hi, z^3*(P_1 + ... + P_8) + z_lo).////// For 2^(-3) <= (V/U) <= 1, we use a table-driven approach.// Consider////      (V/U) = 2^k * 1.b_1 b_2 .... b_63 b_64 b_65 ....//// Define////       z_hi = 2^k * 1.b_1 b_2 b_3 b_4 1//// then//                                            /                \//                                            |  (V/U) - z_hi  |//      arctan(V/U) = arctan(z_hi) + acrtan| -------------- |//                                            | 1 + (V/U)*z_hi |//                                            \                /////                                            /                \//                                            |   V - z_hi*U   |//                  = arctan(z_hi) + acrtan| -------------- |//                                            |   U + V*z_hi   |//                                            \                /////                  = arctan(z_hi) + acrtan( V' / U' )////// where////      V' = V - U*z_hi;   U' = U + V*z_hi.//// Let////      w_hi + w_lo  = V'/U' to extra precision and//           w       = V'/U' to working precision//// then we can approximate arctan(V'/U') by////      arctan(V'/U') = w_hi + w_lo//                     + w^3*(Q_1 + w^2*(Q_2 + w^2*(Q_3 + w^2*Q_4)))////                       = w_hi + w_lo + poly//// Finally, arctan(z_hi) is calculated beforehand and stored in a table// as Tbl_hi, Tbl_lo. Thus,////      (A_hi, A_lo) = (Tbl_hi, w_hi+(poly+(w_lo+Tbl_lo)))//// This completes the mathematical description.////// Algorithm// -------------//// Step 0. Check for unsupported format.////    If//       ( expo(Arg_X) not zero AND msb(Arg_X) = 0 ) OR//       ( expo(Arg_Y) not zero AND msb(Arg_Y) = 0 )////    then one of the arguments is unsupported. Generate an//    invalid and return qNaN.//// Step 1. Initialize////    Normalize Arg_X and Arg_Y and set the following////    sign_X :=  sign_bit(Arg_X)//    s_Y    := (sign_bit(Arg_Y)==0? 1.0 : -1.0)//    swap   := (|Arg_X| >= |Arg_Y|?   0 :  1  )//    U      := max( |Arg_X|, |Arg_Y| )//    V      := min( |Arg_X|, |Arg_Y| )////    execute: frcpa E, pred, V, U//    If pred is 0, go to Step 5 for special cases handling.//// Step 2. Decide on branch.////    Q := E * V//    If Q < 2^(-3) go to Step 4 for simple polynomial case.//// Step 3. Table-driven algorithm.////    Q is represented as////      2^(-k) * 1.b_1 b_2 b_3 ... b_63; k = 0,-1,-2,-3//// and that if k = 0, b_1 = b_2 = b_3 = b_4 = 0.//// Define////      z_hi := 2^(-k) * 1.b_1 b_2 b_3 b_4 1//// (note that there are 49 possible values of z_hi).////      ...We now calculate V' and U'. While V' is representable//      ...as a 64-bit number because of cancellation, U' is//      ...not in general a 64-bit number. Obtaining U' accurately//      ...requires two working precision numbers////      U_prime_hi := U + V * z_hi            ...WP approx. to U'//      U_prime_lo := ( U - U_prime_hi ) + V*z_hi ...observe order//      V_prime    := V - U * z_hi             ...this is exact////         C_hi := frcpa (1.0, U_prime_hi)  ...C_hi approx 1/U'_hi////      loop 3 times//         C_hi := C_hi + C_hi*(1.0 - C_hi*U_prime_hi)////      ...at this point C_hi is (1/U_prime_hi) to roughly 64 bits////      w_hi := V_prime * C_hi     ...w_hi is V_prime/U_prime to//                     ...roughly working precision////         ...note that we want w_hi + w_lo to approximate//      ...V_prime/(U_prime_hi + U_prime_lo) to extra precision//         ...but for now, w_hi is good enough for the polynomial//      ...calculation.////         wsq  := w_hi*w_hi//      poly := w_hi*wsq*(Q_1 + wsq*(Q_2 + wsq*(Q_3 + wsq*Q_4)))////      Fetch//      (Tbl_hi, Tbl_lo) = atan(z_hi) indexed by (k,b_1,b_2,b_3,b_4)//      ...Tbl_hi is a double-precision number//      ...Tbl_lo is a single-precision number////         (P_hi, P_lo) := M(sign_X,swap)*(Pi_by_2_hi, Pi_by_2_lo)//      ...as discussed previous. Again; the implementation can//      ...chose to fetch P_hi and P_lo from a table indexed by//      ...(sign_X, swap).//      ...P_hi is a double-precision number;//      ...P_lo is a single-precision number.////      ...calculate w_lo so that w_hi + w_lo is V'/U' accurately//         w_lo := ((V_prime - w_hi*U_prime_hi) -//              w_hi*U_prime_lo) * C_hi     ...observe order//////      ...Ready to deliver arctan(V'/U') as A_hi, A_lo//      A_hi := Tbl_hi//      A_lo := w_hi + (poly + (Tbl_lo + w_lo)) ...observe order////      ...Deliver final Result//      ...s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)////      sigma := ( (sign_X XOR swap) ? -1.0 : 1.0 )//      ...sigma can be obtained by a table lookup using//      ...(sign_X,swap) as index and stored as single precision//         ...sigma should be calculated earlier////      P_hi := s_Y*P_hi//      A_hi := s_Y*A_hi////      Res_hi := P_hi + sigma*A_hi     ...this is exact because//                          ...both P_hi and Tbl_hi//                          ...are double-precision//                          ...and |Tbl_hi| > 2^(-4)//                          ...P_hi is either 0 or//                          ...between (1,4)////      Res_lo := sigma*A_lo + P_lo////      Return Res_hi + s_Y*Res_lo in user-defined rounding control//// Step 4. Simple polynomial case.////    ...E and Q are inherited from Step 2.////    A_hi := Q     ...Q is inherited from Step 2 Q approx V/U////    loop 3 times//       E := E + E2(1.0 - E*U1//    ...at this point E approximates 1/U to roughly working precision////    z := V * E     ...z approximates V/U to roughly working precision//    zsq := z * z//    z4 := zsq * zsq; z8 := z4 * z4////    poly1 := P_4 + zsq*(P_5 + zsq*(P_6 + zsq*(P_7 + zsq*P_8)))//    poly2 := zsq*(P_1 + zsq*(P_2 + zsq*P_3))////    poly  := poly1 + z8*poly2////    z_lo := (V - A_hi*U)*E////    A_lo := z*poly + z_lo//    ...A_hi, A_lo approximate arctan(V/U) accurately////    (P_hi, P_lo) := M(sign_X,swap)*(Pi_by_2_hi, Pi_by_2_lo)//    ...one can store the M(sign_X,swap) as single precision//    ...values////    ...Deliver final Result//    ...s_Y*P_hi + s_Y*sigma*A_hi + s_Y*(sigma*A_lo + P_lo)////    sigma := ( (sign_X XOR swap) ? -1.0 : 1.0 )//    ...sigma can be obtained by a table lookup using//    ...(sign_X,swap) as index and stored as single precision//    ...sigma should be calculated earlier////    P_hi := s_Y*P_hi//    A_hi := s_Y*A_hi////    Res_hi := P_hi + sigma*A_hi          ...need to compute//                          ...P_hi + sigma*A_hi//                          ...exactly////    tmp    := (P_hi - Res_hi) + sigma*A_hi////    Res_lo := s_Y*(sigma*A_lo + P_lo) + tmp////    Return Res_hi + Res_lo in user-defined rounding control//// Step 5. Special Cases////    These are detected early in the function by fclass instructions.////    We are in one of those special cases when X or Y is 0,+-inf or NaN////    If one of X and Y is NaN, return X+Y (which will generate//    invalid in case one is a signaling NaN). Otherwise,//    return the Result as described in the table////////      \ Y |//     X \  |  +0  | -0  |  +inf |  -inf  |  finite non-zero//        \ |      |     |       |        |//    ______________________________________________________//          |            |       |        |//     +-0  |   Invalid/ |  pi/2 | -pi/2  |  sign(Y)*pi/2//          |    qNaN    |       |        |//  --------------------------------------------------------//          |      |     |       |        |//     +inf |  +0  | -0  |  pi/4 | -pi/4  |  sign(Y)*0//  --------------------------------------------------------//          |      |     |       |        |//     -inf |  +pi | -pi | 3pi/4 | -3pi/4 |  sign(Y)*pi//  --------------------------------------------------------//   finite |    X>0?    |  pi/2 | -pi/2  |//  non-zero| sign(Y)*0: |       |        |      N/A//       | sign(Y)*pi |       |        |////ArgY_orig   =   f8Result      =   f8FR_RESULT   =   f8ArgX_orig   =   f9ArgX        =   f10FR_X        =   f10ArgY        =   f11FR_Y        =   f11s_Y         =   f12U           =   f13V           =   f14E           =   f15Q           =   f32z_hi        =   f33U_prime_hi  =   f34U_prime_lo  =   f35V_prime     =   f36C_hi        =   f37w_hi        =   f38w_lo        =   f39

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -