⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 libm_tan.s

📁 Glibc 2.3.2源代码(解压后有100多M)
💻 S
📖 第 1 页 / 共 5 页
字号:
.file "libm_tan.s"// Copyright (C) 2000, 2001, Intel Corporation// All rights reserved.// // Contributed 2/2/2000 by John Harrison, Ted Kubaska, Bob Norin, Shane Story,// and Ping Tak Peter Tang of the Computational Software Lab, Intel Corporation.//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.//// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://developer.intel.com/opensource.//// *********************************************************************//// History:  // 02/02/00 Initial Version // 4/04/00  Unwind support added// 12/28/00 Fixed false invalid flags//// *********************************************************************//// Function:   tan(x) = tangent(x), for double precision x values//// *********************************************************************//// Accuracy:       Very accurate for double-precision values  //// *********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input and Return Value)//                              f9-f15//                              f32-f112////    General Purpose Registers://      r32-r48//      r49-r50 (Used to pass arguments to pi_by_2 reduce routine)////    Predicate Registers:      p6-p15//// *********************************************************************//// IEEE Special Conditions:////    Denormal  fault raised on denormal inputs//    Overflow exceptions do not occur//    Underflow exceptions raised when appropriate for tan //    (No specialized error handling for this routine)//    Inexact raised when appropriate by algorithm////    tan(SNaN) = QNaN//    tan(QNaN) = QNaN//    tan(inf) = QNaN//    tan(+/-0) = +/-0//// *********************************************************************//// Mathematical Description//// We consider the computation of FPTAN of Arg. Now, given////      Arg = N pi/2  + alpha,          |alpha| <= pi/4,//// basic mathematical relationship shows that////      tan( Arg ) =  tan( alpha )     if N is even;//                 = -cot( alpha )      otherwise.//// The value of alpha is obtained by argument reduction and// represented by two working precision numbers r and c where////      alpha =  r  +  c     accurately.//// The reduction method is described in a previous write up.// The argument reduction scheme identifies 4 cases. For Cases 2// and 4, because |alpha| is small, tan(r+c) and -cot(r+c) can be// computed very easily by 2 or 3 terms of the Taylor series// expansion as follows://// Case 2:// -------////      tan(r + c) = r + c + r^3/3          ...accurately//        -cot(r + c) = -1/(r+c) + r/3          ...accurately//// Case 4:// -------////      tan(r + c) = r + c + r^3/3 + 2r^5/15     ...accurately//        -cot(r + c) = -1/(r+c) + r/3 + r^3/45     ...accurately////// The only cases left are Cases 1 and 3 of the argument reduction// procedure. These two cases will be merged since after the// argument is reduced in either cases, we have the reduced argument// represented as r + c and that the magnitude |r + c| is not small// enough to allow the usage of a very short approximation.//// The greatest challenge of this task is that the second terms of// the Taylor series for tan(r) and -cot(r)////      r + r^3/3 + 2 r^5/15 + ...//// and////      -1/r + r/3 + r^3/45 + ...//// are not very small when |r| is close to pi/4 and the rounding// errors will be a concern if simple polynomial accumulation is// used. When |r| < 2^(-2), however, the second terms will be small// enough (5 bits or so of right shift) that a normal Horner// recurrence suffices. Hence there are two cases that we consider// in the accurate computation of tan(r) and cot(r), |r| <= pi/4.//// Case small_r: |r| < 2^(-2)// --------------------------//// Since Arg = N pi/4 + r + c accurately, we have////      tan(Arg) =  tan(r+c)            for N even,//            = -cot(r+c)          otherwise.//// Here for this case, both tan(r) and -cot(r) can be approximated// by simple polynomials:////      tan(r) =    r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19//        -cot(r) = -1/r + Q1_1 r   + Q1_2 r^3 + ... + Q1_7 r^13//// accurately. Since |r| is relatively small, tan(r+c) and// -cot(r+c) can be accurately approximated by replacing r with// r+c only in the first two terms of the corresponding polynomials.//// Note that P1_1 (and Q1_1 for that matter) approximates 1/3 to// almost 64 sig. bits, thus////      P1_1 (r+c)^3 =  P1_1 r^3 + c * r^2     accurately.//// Hence,////      tan(r+c) =    r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19//                     + c*(1 + r^2)////        -cot(r+c) = -1/(r+c) + Q1_1 r   + Q1_2 r^3 + ... + Q1_7 r^13//               + Q1_1*c////// Case normal_r: 2^(-2) <= |r| <= pi/4// ------------------------------------//// This case is more likely than the previous one if one considers// r to be uniformly distributed in [-pi/4 pi/4].//// The required calculation is either////      tan(r + c)  =  tan(r)  +  correction,  or//        -cot(r + c)  = -cot(r)  +  correction.//// Specifically,////      tan(r + c) =  tan(r) + c tan'(r)  + O(c^2)//              =  tan(r) + c sec^2(r) + O(c^2)//              =  tan(r) + c SEC_sq     ...accurately//                as long as SEC_sq approximates sec^2(r)//                to, say, 5 bits or so.//// Similarly,////        -cot(r + c) = -cot(r) - c cot'(r)  + O(c^2)//              = -cot(r) + c csc^2(r) + O(c^2)//              = -cot(r) + c CSC_sq     ...accurately//                as long as CSC_sq approximates csc^2(r)//                to, say, 5 bits or so.//// We therefore concentrate on accurately calculating tan(r) and// cot(r) for a working-precision number r, |r| <= pi/4 to within// 0.1% or so.//// We will employ a table-driven approach. Let////      r = sgn_r * 2^k * 1.b_1 b_2 ... b_5 ... b_63//        = sgn_r * ( B + x )//// where////      B = 2^k * 1.b_1 b_2 ... b_5 1//         x = |r| - B//// Now,//                   tan(B)  +   tan(x)//      tan( B + x ) =  ------------------------//                   1 -  tan(B)*tan(x)////               /                         \ //               |   tan(B)  +   tan(x)          |//      = tan(B) +  | ------------------------ - tan(B) |//               |     1 -  tan(B)*tan(x)          |//               \                         /////                 sec^2(B) * tan(x)//      = tan(B) + ------------------------//                 1 -  tan(B)*tan(x)////                (1/[sin(B)*cos(B)]) * tan(x)//      = tan(B) + --------------------------------//                      cot(B)  -  tan(x)////// Clearly, the values of tan(B), cot(B) and 1/(sin(B)*cos(B)) are// calculated beforehand and stored in a table. Since////      |x| <= 2^k * 2^(-6)  <= 2^(-7)  (because k = -1, -2)//// a very short polynomial will be sufficient to approximate tan(x)// accurately. The details involved in computing the last expression// will be given in the next section on algorithm description.////// Now, we turn to the case where cot( B + x ) is needed.//////                   1 - tan(B)*tan(x)//      cot( B + x ) =  ------------------------//                   tan(B)  +  tan(x)////               /                           \ //               |   1 - tan(B)*tan(x)              |//      = cot(B) +  | ----------------------- - cot(B) |//               |     tan(B)  +  tan(x)            |//               \                           /////               [tan(B) + cot(B)] * tan(x)//      = cot(B) - ----------------------------//                   tan(B)  +  tan(x)////                (1/[sin(B)*cos(B)]) * tan(x)//      = cot(B) - --------------------------------//                      tan(B)  +  tan(x)////// Note that the values of tan(B), cot(B) and 1/(sin(B)*cos(B)) that// are needed are the same set of values needed in the previous// case.//// Finally, we can put all the ingredients together as follows:////      Arg = N * pi/2 +  r + c          ...accurately////      tan(Arg) =  tan(r) + correction    if N is even;//            = -cot(r) + correction    otherwise.//// For Cases 2 and 4,////     Case 2://     tan(Arg) =  tan(r + c) = r + c + r^3/3           N even//              = -cot(r + c) = -1/(r+c) + r/3           N odd//     Case 4://     tan(Arg) =  tan(r + c) = r + c + r^3/3 + 2r^5/15  N even//              = -cot(r + c) = -1/(r+c) + r/3 + r^3/45  N odd////// For Cases 1 and 3,////     Case small_r: |r| < 2^(-2)////      tan(Arg) =  r + P1_1 r^3 + P1_2 r^5 + ... + P1_9 r^19//                     + c*(1 + r^2)               N even////                  = -1/(r+c) + Q1_1 r   + Q1_2 r^3 + ... + Q1_7 r^13//               + Q1_1*c                    N odd////     Case normal_r: 2^(-2) <= |r| <= pi/4////      tan(Arg) =  tan(r) + c * sec^2(r)     N even//               = -cot(r) + c * csc^2(r)     otherwise////     For N even,////      tan(Arg) = tan(r) + c*sec^2(r)//               = tan( sgn_r * (B+x) ) + c * sec^2(|r|)//                  = sgn_r * ( tan(B+x)  + sgn_r*c*sec^2(|r|) )//                  = sgn_r * ( tan(B+x)  + sgn_r*c*sec^2(B) )//// since B approximates |r| to 2^(-6) in relative accuracy.////                 /            (1/[sin(B)*cos(B)]) * tan(x)//    tan(Arg) = sgn_r * | tan(B) + --------------------------------//                 \                     cot(B)  -  tan(x)//                                        \ //                       + CORR  |//                                     /// where////    CORR = sgn_r*c*tan(B)*SC_inv(B);  SC_inv(B) = 1/(sin(B)*cos(B)).//// For N odd,////      tan(Arg) = -cot(r) + c*csc^2(r)//               = -cot( sgn_r * (B+x) ) + c * csc^2(|r|)//                  = sgn_r * ( -cot(B+x)  + sgn_r*c*csc^2(|r|) )//                  = sgn_r * ( -cot(B+x)  + sgn_r*c*csc^2(B) )//// since B approximates |r| to 2^(-6) in relative accuracy.////                 /            (1/[sin(B)*cos(B)]) * tan(x)//    tan(Arg) = sgn_r * | -cot(B) + --------------------------------//                 \                     tan(B)  +  tan(x)//                                        \ //                       + CORR  |//                                     /// where////    CORR = sgn_r*c*cot(B)*SC_inv(B);  SC_inv(B) = 1/(sin(B)*cos(B)).////// The actual algorithm prescribes how all the mathematical formulas// are calculated.////// 2. Algorithmic Description// ==========================//// 2.1 Computation for Cases 2 and 4.// ----------------------------------//// For Case 2, we use two-term polynomials.////    For N even,////    rsq := r * r//    Result := c + r * rsq * P1_1//    Result := r + Result          ...in user-defined rounding////    For N odd,//    S_hi  := -frcpa(r)               ...8 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...16 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...32 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...64 bits//    S_lo  := S_hi*( (1 + S_hi*r) + S_hi*c )//    ...S_hi + S_lo is -1/(r+c) to extra precision//    S_lo  := S_lo + Q1_1*r////    Result := S_hi + S_lo     ...in user-defined rounding//// For Case 4, we use three-term polynomials////    For N even,////    rsq := r * r//    Result := c + r * rsq * (P1_1 + rsq * P1_2)//    Result := r + Result          ...in user-defined rounding////    For N odd,//    S_hi  := -frcpa(r)               ...8 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...16 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...32 bits//    S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...64 bits//    S_lo  := S_hi*( (1 + S_hi*r) + S_hi*c )//    ...S_hi + S_lo is -1/(r+c) to extra precision//    rsq   := r * r//    P      := Q1_1 + rsq*Q1_2//    S_lo  := S_lo + r*P////    Result := S_hi + S_lo     ...in user-defined rounding////// Note that the coefficients P1_1, P1_2, Q1_1, and Q1_2 are// the same as those used in the small_r case of Cases 1 and 3// below.////// 2.2 Computation for Cases 1 and 3.// ----------------------------------// This is further divided into the case of small_r,// where |r| < 2^(-2), and the case of normal_r, where |r| lies between// 2^(-2) and pi/4.//// Algorithm for the case of small_r// ---------------------------------//// For N even,//      rsq   := r * r//      Poly1 := rsq*(P1_1 + rsq*(P1_2 + rsq*P1_3))//      r_to_the_8    := rsq * rsq//      r_to_the_8    := r_to_the_8 * r_to_the_8//      Poly2 := P1_4 + rsq*(P1_5 + rsq*(P1_6 + ... rsq*P1_9))//      CORR  := c * ( 1 + rsq )//      Poly  := Poly1 + r_to_the_8*Poly2//      Result := r*Poly + CORR//      Result := r + Result     ...in user-defined rounding//      ...note that Poly1 and r_to_the_8 can be computed in parallel//      ...with Poly2 (Poly1 is intentionally set to be much//      ...shorter than Poly2 so that r_to_the_8 and CORR can be hidden)//// For N odd,//      S_hi  := -frcpa(r)               ...8 bits//      S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...16 bits//      S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...32 bits//      S_hi  := S_hi + S_hi*(1 + S_hi*r)     ...64 bits//      S_lo  := S_hi*( (1 + S_hi*r) + S_hi*c )//      ...S_hi + S_lo is -1/(r+c) to extra precision//      S_lo  := S_lo + Q1_1*c////      ...S_hi and S_lo are computed in parallel with//      ...the following//      rsq := r*r//      P   := Q1_1 + rsq*(Q1_2 + rsq*(Q1_3 + ... + rsq*Q1_7))////      Result :=  r*P + S_lo//      Result :=  S_hi  +  Result      ...in user-defined rounding////// Algorithm for the case of normal_r// ----------------------------------//// Here, we first consider the computation of tan( r + c ). As// presented in the previous section,////      tan( r + c )  =  tan(r) + c * sec^2(r)//                 =  sgn_r * [ tan(B+x) + CORR ]//      CORR = sgn_r * c * tan(B) * 1/[sin(B)*cos(B)]//// because sec^2(r) = sec^(|r|), and B approximate |r| to 6.5 bits.////      tan( r + c ) =//           /           (1/[sin(B)*cos(B)]) * tan(x)//      sgn_r * | tan(B) + --------------------------------  +//           \                     cot(B)  -  tan(x)//                                \ //                          CORR  |//                                ///// The values of tan(B), cot(B) and 1/(sin(B)*cos(B)) are// calculated beforehand and stored in a table. Specifically,// the table values are////      tan(B)                as  T_hi  +  T_lo;//      cot(B)             as  C_hi  +  C_lo;//      1/[sin(B)*cos(B)]  as  SC_inv//// T_hi, C_hi are in  double-precision  memory format;// T_lo, C_lo are in  single-precision  memory format;// SC_inv     is  in extended-precision memory format.//// The value of tan(x) will be approximated by a short polynomial of// the form////      tan(x)  as  x  +  x * P, where//           P  =   x^2 * (P2_1 + x^2 * (P2_2 + x^2 * P2_3))//// Because |x| <= 2^(-7), cot(B) - x approximates cot(B) - tan(x)// to a relative accuracy better than 2^(-20). Thus, a good// initial guess of 1/( cot(B) - tan(x) ) to initiate the iterative// division is:////      1/(cot(B) - tan(x))      is approximately//      1/(cot(B) -   x)         is//      tan(B)/(1 - x*tan(B))    is approximately//      T_hi / ( 1 - T_hi * x )  is approximately////      T_hi * [ 1 + (Thi * x) + (T_hi * x)^2 ]//// The calculation of tan(r+c) therefore proceed as follows:////      Tx     := T_hi * x//      xsq     := x * x////      V_hi     := T_hi*(1 + Tx*(1 + Tx))//      P     := xsq * (P1_1 + xsq*(P1_2 + xsq*P1_3))//      ...V_hi serves as an initial guess of 1/(cot(B) - tan(x))//         ...good to about 20 bits of accuracy////      tanx     := x + x*P//      D     := C_hi - tanx//      ...D is a double precision denominator: cot(B) - tan(x)////      V_hi     := V_hi + V_hi*(1 - V_hi*D)//      ....V_hi approximates 1/(cot(B)-tan(x)) to 40 bits////      V_lo     := V_hi * ( [ (1 - V_hi*C_hi) + V_hi*tanx ]//                           - V_hi*C_lo )   ...observe all order//         ...V_hi + V_lo approximates 1/(cot(B) - tan(x))//      ...to extra accuracy////      ...               SC_inv(B) * (x + x*P)//      ...   tan(B) +      ------------------------- + CORR//         ...                cot(B) - (x + x*P)//      ...//      ... = tan(B) + SC_inv(B)*(x + x*P)*(V_hi + V_lo) + CORR//      ...////      Sx     := SC_inv * x//      CORR     := sgn_r * c * SC_inv * T_hi////      ...put the ingredients together to compute//      ...               SC_inv(B) * (x + x*P)//      ...   tan(B) +      ------------------------- + CORR//         ...                cot(B) - (x + x*P)//      ...//      ... = tan(B) + SC_inv(B)*(x + x*P)*(V_hi + V_lo) + CORR//      ...//      ... = T_hi + T_lo + CORR +//      ...    Sx * V_hi + Sx * V_lo + Sx * P *(V_hi + V_lo)////      CORR := CORR + T_lo//      tail := V_lo + P*(V_hi + V_lo)//         tail := Sx * tail  +  CORR//      tail := Sx * V_hi  +  tail//         T_hi := sgn_r * T_hi////         ...T_hi + sgn_r*tail  now approximate//      ...sgn_r*(tan(B+x) + CORR) accurately////      Result :=  T_hi + sgn_r*tail  ...in user-defined//                           ...rounding control//      ...It is crucial that independent paths be fully//      ...exploited for performance's sake.////// Next, we consider the computation of -cot( r + c ). As// presented in the previous section,////        -cot( r + c )  =  -cot(r) + c * csc^2(r)//                 =  sgn_r * [ -cot(B+x) + CORR ]//      CORR = sgn_r * c * cot(B) * 1/[sin(B)*cos(B)]//// because csc^2(r) = csc^(|r|), and B approximate |r| to 6.5 bits.

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -