⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 libm_sincosl.s

📁 glibc 2.9,最新版的C语言库函数
💻 S
📖 第 1 页 / 共 5 页
字号:
.file "libm_sincosl.s"// Copyright (c) 2000 - 2004, Intel Corporation// All rights reserved.//// Contributed 2000 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at// http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History:// 05/13/02 Initial version of sincosl (based on libm's sinl and cosl)// 02/10/03 Reordered header: .section, .global, .proc, .align;//          used data8 for long double table values// 10/13/03 Corrected .file name// 02/11/04 cisl is moved to the separate file.// 10/26/04 Avoided using r14-31 as scratch so not clobbered by dynamic loader////*********************************************************************//// Function:   Combined sincosl routine with 3 different API's//// API's//==============================================================// 1) void sincosl(long double, long double*s, long double*c)// 2) __libm_sincosl - internal LIBM function, that accepts//    argument in f8 and returns cosine through f8, sine through f9//////*********************************************************************//// Resources Used:////    Floating-Point Registers: f8 (Input x and cosl return value),//                              f9 (sinl returned)//                              f32-f121////    General Purpose Registers://      r32-r61////    Predicate Registers:      p6-p15////*********************************************************************////  IEEE Special Conditions:////    Denormal  fault raised on denormal inputs//    Overflow exceptions do not occur//    Underflow exceptions raised when appropriate for sincosl//    (No specialized error handling for this routine)//    Inexact raised when appropriate by algorithm////    sincosl(SNaN) = QNaN, QNaN//    sincosl(QNaN) = QNaN, QNaN//    sincosl(inf)  = QNaN, QNaN//    sincosl(+/-0) = +/-0, 1////*********************************************************************////  Mathematical Description//  ========================////  The computation of FSIN and FCOS performed in parallel.////  Arg = N pi/2 + alpha, |alpha| <= pi/4.////  cosl( Arg ) = sinl( (N+1) pi/2 + alpha ),////  therefore, the code for computing sine will produce cosine as long//  as 1 is added to N immediately after the argument reduction//  process.////  Let M = N if sine//      N+1 if cosine.////  Now, given////  Arg = M pi/2  + alpha, |alpha| <= pi/4,////  let I = M mod 4, or I be the two lsb of M when M is represented//  as 2's complement. I = [i_0 i_1]. Then////  sinl( Arg ) = (-1)^i_0  sinl( alpha ) if i_1 = 0,//             = (-1)^i_0  cosl( alpha )     if i_1 = 1.////  For example://       if M = -1, I = 11//         sin ((-pi/2 + alpha) = (-1) cos (alpha)//       if M = 0, I = 00//         sin (alpha) = sin (alpha)//       if M = 1, I = 01//         sin (pi/2 + alpha) = cos (alpha)//       if M = 2, I = 10//         sin (pi + alpha) = (-1) sin (alpha)//       if M = 3, I = 11//         sin ((3/2)pi + alpha) = (-1) cos (alpha)////  The value of alpha is obtained by argument reduction and//  represented by two working precision numbers r and c where////  alpha =  r  +  c     accurately.////  The reduction method is described in a previous write up.//  The argument reduction scheme identifies 4 cases. For Cases 2//  and 4, because |alpha| is small, sinl(r+c) and cosl(r+c) can be//  computed very easily by 2 or 3 terms of the Taylor series//  expansion as follows:////  Case 2://  -------////  sinl(r + c) = r + c - r^3/6 accurately//  cosl(r + c) = 1 - 2^(-67) accurately////  Case 4://  -------////  sinl(r + c) = r + c - r^3/6 + r^5/120 accurately//  cosl(r + c) = 1 - r^2/2 + r^4/24    accurately////  The only cases left are Cases 1 and 3 of the argument reduction//  procedure. These two cases will be merged since after the//  argument is reduced in either cases, we have the reduced argument//  represented as r + c and that the magnitude |r + c| is not small//  enough to allow the usage of a very short approximation.////  The required calculation is either////  sinl(r + c)  =  sinl(r)  +  correction,  or//  cosl(r + c)  =  cosl(r)  +  correction.////  Specifically,////  sinl(r + c) = sinl(r) + c sin'(r) + O(c^2)//       = sinl(r) + c cos (r) + O(c^2)//       = sinl(r) + c(1 - r^2/2)  accurately.//  Similarly,////  cosl(r + c) = cosl(r) - c sinl(r) + O(c^2)//       = cosl(r) - c(r - r^3/6)  accurately.////  We therefore concentrate on accurately calculating sinl(r) and//  cosl(r) for a working-precision number r, |r| <= pi/4 to within//  0.1% or so.////  The greatest challenge of this task is that the second terms of//  the Taylor series////  r - r^3/3! + r^r/5! - ...////  and////  1 - r^2/2! + r^4/4! - ...////  are not very small when |r| is close to pi/4 and the rounding//  errors will be a concern if simple polynomial accumulation is//  used. When |r| < 2^-3, however, the second terms will be small//  enough (6 bits or so of right shift) that a normal Horner//  recurrence suffices. Hence there are two cases that we consider//  in the accurate computation of sinl(r) and cosl(r), |r| <= pi/4.////  Case small_r: |r| < 2^(-3)//  --------------------------////  Since Arg = M pi/4 + r + c accurately, and M mod 4 is [i_0 i_1],//  we have////  sinl(Arg) = (-1)^i_0 * sinl(r + c)  if i_1 = 0//     = (-1)^i_0 * cosl(r + c)   if i_1 = 1////  can be accurately approximated by////  sinl(Arg) = (-1)^i_0 * [sinl(r) + c]  if i_1 = 0//           = (-1)^i_0 * [cosl(r) - c*r] if i_1 = 1////  because |r| is small and thus the second terms in the correction//  are unneccessary.////  Finally, sinl(r) and cosl(r) are approximated by polynomials of//  moderate lengths.////  sinl(r) =  r + S_1 r^3 + S_2 r^5 + ... + S_5 r^11//  cosl(r) =  1 + C_1 r^2 + C_2 r^4 + ... + C_5 r^10////  We can make use of predicates to selectively calculate//  sinl(r) or cosl(r) based on i_1.////  Case normal_r: 2^(-3) <= |r| <= pi/4//  ------------------------------------////  This case is more likely than the previous one if one considers//  r to be uniformly distributed in [-pi/4 pi/4]. Again,////  sinl(Arg) = (-1)^i_0 * sinl(r + c)  if i_1 = 0//           = (-1)^i_0 * cosl(r + c)   if i_1 = 1.////  Because |r| is now larger, we need one extra term in the//  correction. sinl(Arg) can be accurately approximated by////  sinl(Arg) = (-1)^i_0 * [sinl(r) + c(1-r^2/2)]      if i_1 = 0//           = (-1)^i_0 * [cosl(r) - c*r*(1 - r^2/6)]    i_1 = 1.////  Finally, sinl(r) and cosl(r) are approximated by polynomials of//  moderate lengths.////  sinl(r) =  r + PP_1_hi r^3 + PP_1_lo r^3 +//                PP_2 r^5 + ... + PP_8 r^17////  cosl(r) =  1 + QQ_1 r^2 + QQ_2 r^4 + ... + QQ_8 r^16////  where PP_1_hi is only about 16 bits long and QQ_1 is -1/2.//  The crux in accurate computation is to calculate////  r + PP_1_hi r^3   or  1 + QQ_1 r^2////  accurately as two pieces: U_hi and U_lo. The way to achieve this//  is to obtain r_hi as a 10 sig. bit number that approximates r to//  roughly 8 bits or so of accuracy. (One convenient way is////  r_hi := frcpa( frcpa( r ) ).)////  This way,////  r + PP_1_hi r^3 =  r + PP_1_hi r_hi^3 +//                          PP_1_hi (r^3 - r_hi^3)//            =  [r + PP_1_hi r_hi^3]  +//         [PP_1_hi (r - r_hi)//            (r^2 + r_hi r + r_hi^2) ]//            =  U_hi  +  U_lo////  Since r_hi is only 10 bit long and PP_1_hi is only 16 bit long,//  PP_1_hi * r_hi^3 is only at most 46 bit long and thus computed//  exactly. Furthermore, r and PP_1_hi r_hi^3 are of opposite sign//  and that there is no more than 8 bit shift off between r and//  PP_1_hi * r_hi^3. Hence the sum, U_hi, is representable and thus//  calculated without any error. Finally, the fact that////  |U_lo| <= 2^(-8) |U_hi|////  says that U_hi + U_lo is approximating r + PP_1_hi r^3 to roughly//  8 extra bits of accuracy.////  Similarly,////  1 + QQ_1 r^2  =  [1 + QQ_1 r_hi^2]  +//                      [QQ_1 (r - r_hi)(r + r_hi)]//          =  U_hi  +  U_lo.////  Summarizing, we calculate r_hi = frcpa( frcpa( r ) ).////  If i_1 = 0, then////    U_hi := r + PP_1_hi * r_hi^3//    U_lo := PP_1_hi * (r - r_hi) * (r^2 + r*r_hi + r_hi^2)//    poly := PP_1_lo r^3 + PP_2 r^5 + ... + PP_8 r^17//    correction := c * ( 1 + C_1 r^2 )////  Else ...i_1 = 1////    U_hi := 1 + QQ_1 * r_hi * r_hi//    U_lo := QQ_1 * (r - r_hi) * (r + r_hi)//    poly := QQ_2 * r^4 + QQ_3 * r^6 + ... + QQ_8 r^16//    correction := -c * r * (1 + S_1 * r^2)////  End////  Finally,////  V := poly + ( U_lo + correction )////                 /    U_hi  +  V         if i_0 = 0//  result := |//                 \  (-U_hi) -  V         if i_0 = 1////  It is important that in the last step, negation of U_hi is//  performed prior to the subtraction which is to be performed in//  the user-set rounding mode.//////  Algorithmic Description//  =======================////  The argument reduction algorithm shares the same code between FSIN and FCOS.//  The argument reduction description given//  previously is repeated below.//////  Step 0. Initialization.////  Step 1. Check for exceptional and special cases.////   * If Arg is +-0, +-inf, NaN, NaT, go to Step 10 for special//     handling.//   * If |Arg| < 2^24, go to Step 2 for reduction of moderate//     arguments. This is the most likely case.//   * If |Arg| < 2^63, go to Step 8 for pre-reduction of large//     arguments.//   * If |Arg| >= 2^63, go to Step 10 for special handling.////  Step 2. Reduction of moderate arguments.////  If |Arg| < pi/4   ...quick branch//     N_fix := N_inc (integer)//     r     := Arg//     c     := 0.0//     Branch to Step 4, Case_1_complete//  Else    ...cf. argument reduction//     N     := Arg * two_by_PI (fp)//     N_fix := fcvt.fx( N )  (int)//     N     := fcvt.xf( N_fix )//     N_fix := N_fix + N_inc//     s     := Arg - N * P_1 (first piece of pi/2)//     w     := -N * P_2  (second piece of pi/2)////     If |s| >= 2^(-33)//        go to Step 3, Case_1_reduce//     Else//        go to Step 7, Case_2_reduce//     Endif//  Endif////  Step 3. Case_1_reduce.////  r := s + w//  c := (s - r) + w  ...observe order////  Step 4. Case_1_complete////  ...At this point, the reduced argument alpha is//  ...accurately represented as r + c.//  If |r| < 2^(-3), go to Step 6, small_r.////  Step 5. Normal_r.////  Let [i_0 i_1] by the 2 lsb of N_fix.//  FR_rsq  := r * r//  r_hi := frcpa( frcpa( r ) )//  r_lo := r - r_hi////  If i_1 = 0, then//    poly := r*FR_rsq*(PP_1_lo + FR_rsq*(PP_2 + ... FR_rsq*PP_8))//    U_hi := r + PP_1_hi*r_hi*r_hi*r_hi  ...any order//    U_lo := PP_1_hi*r_lo*(r*r + r*r_hi + r_hi*r_hi)//    correction := c + c*C_1*FR_rsq    ...any order//  Else//    poly := FR_rsq*FR_rsq*(QQ_2 + FR_rsq*(QQ_3 + ... + FR_rsq*QQ_8))//    U_hi := 1 + QQ_1 * r_hi * r_hi    ...any order//    U_lo := QQ_1 * r_lo * (r + r_hi)//    correction := -c*(r + S_1*FR_rsq*r) ...any order//  Endif////  V := poly + (U_lo + correction) ...observe order////  result := (i_0 == 0?   1.0 : -1.0)////  Last instruction in user-set rounding mode////  result := (i_0 == 0?   result*U_hi + V ://                        result*U_hi - V)////  Return////  Step 6. Small_r.////  ...Use flush to zero mode without causing exception//    Let [i_0 i_1] be the two lsb of N_fix.////  FR_rsq := r * r////  If i_1 = 0 then//     z := FR_rsq*FR_rsq; z := FR_rsq*z *r//     poly_lo := S_3 + FR_rsq*(S_4 + FR_rsq*S_5)//     poly_hi := r*FR_rsq*(S_1 + FR_rsq*S_2)//     correction := c//     result := r//  Else//     z := FR_rsq*FR_rsq; z := FR_rsq*z//     poly_lo := C_3 + FR_rsq*(C_4 + FR_rsq*C_5)//     poly_hi := FR_rsq*(C_1 + FR_rsq*C_2)//     correction := -c*r//     result := 1//  Endif////  poly := poly_hi + (z * poly_lo + correction)////  If i_0 = 1, result := -result////  Last operation. Perform in user-set rounding mode////  result := (i_0 == 0?     result + poly ://                          result - poly )//  Return////  Step 7. Case_2_reduce.////  ...Refer to the write up for argument reduction for//  ...rationale. The reduction algorithm below is taken from//  ...argument reduction description and integrated this.////  w := N*P_3//  U_1 := N*P_2 + w    ...FMA//  U_2 := (N*P_2 - U_1) + w  ...2 FMA//  ...U_1 + U_2 is  N*(P_2+P_3) accurately////  r := s - U_1//  c := ( (s - r) - U_1 ) - U_2////  ...The mathematical sum r + c approximates the reduced//  ...argument accurately. Note that although compared to//  ...Case 1, this case requires much more work to reduce//  ...the argument, the subsequent calculation needed for//  ...any of the trigonometric function is very little because//  ...|alpha| < 1.01*2^(-33) and thus two terms of the//  ...Taylor series expansion suffices.////  If i_1 = 0 then//     poly := c + S_1 * r * r * r  ...any order//     result := r//  Else//     poly := -2^(-67)//     result := 1.0//  Endif////  If i_0 = 1, result := -result////  Last operation. Perform in user-set rounding mode////  result := (i_0 == 0?     result + poly ://                           result - poly )////  Return//////  Step 8. Pre-reduction of large arguments.////  ...Again, the following reduction procedure was described//  ...in the separate write up for argument reduction, which//  ...is tightly integrated here.//  N_0 := Arg * Inv_P_0//  N_0_fix := fcvt.fx( N_0 )//  N_0 := fcvt.xf( N_0_fix)//  Arg' := Arg - N_0 * P_0//  w := N_0 * d_1//  N := Arg' * two_by_PI//  N_fix := fcvt.fx( N )//  N := fcvt.xf( N_fix )//  N_fix := N_fix + N_inc////  s := Arg' - N * P_1//  w := w - N * P_2////  If |s| >= 2^(-14)//     go to Step 3//  Else//     go to Step 9//  Endif////  Step 9. Case_4_reduce.////    ...first obtain N_0*d_1 and -N*P_2 accurately//   U_hi := N_0 * d_1    V_hi := -N*P_2//   U_lo := N_0 * d_1 - U_hi V_lo := -N*P_2 - U_hi ...FMAs////   ...compute the contribution from N_0*d_1 and -N*P_3//   w := -N*P_3//   w := w + N_0*d_2//   t := U_lo + V_lo + w   ...any order////   ...at this point, the mathematical value//   ...s + U_hi + V_hi  + t approximates the true reduced argument//   ...accurately. Just need to compute this accurately.////   ...Calculate U_hi + V_hi accurately:

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -