⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 s_cos.s

📁 Glibc 2.3.2源代码(解压后有100多M)
💻 S
📖 第 1 页 / 共 5 页
字号:
.file "sincos.s"// Copyright (C) 2000, 2001, Intel Corporation// All rights reserved.//// Contributed 2/2/2000 by John Harrison, Ted Kubaska, Bob Norin, Shane Story,// and Ping Tak Peter Tang of the Computational Software Lab, Intel Corporation.//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.//// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.//// Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at// http://developer.intel.com/opensource.//// History//==============================================================// 2/02/00  Initial revision// 4/02/00  Unwind support added.// 6/16/00  Updated tables to enforce symmetry// 8/31/00  Saved 2 cycles in main path, and 9 in other paths.// 9/20/00  The updated tables regressed to an old version, so reinstated them// 10/18/00 Changed one table entry to ensure symmetry// 1/03/01  Improved speed, fixed flag settings for small arguments.// API//==============================================================// double sin( double x);// double cos( double x);//// Overview of operation//==============================================================//// Step 1// ======// Reduce x to region -1/2*pi/2^k ===== 0 ===== +1/2*pi/2^k  where k=4//    divide x by pi/2^k.//    Multiply by 2^k/pi.//    nfloat = Round result to integer (round-to-nearest)//// r = x -  nfloat * pi/2^k//    Do this as (x -  nfloat * HIGH(pi/2^k)) - nfloat * LOW(pi/2^k) for increased accuracy.//    pi/2^k is stored as two numbers that when added make pi/2^k.//       pi/2^k = HIGH(pi/2^k) + LOW(pi/2^k)//// x = (nfloat * pi/2^k) + r//    r is small enough that we can use a polynomial approximation//    and is referred to as the reduced argument.//// Step 3// ======// Take the unreduced part and remove the multiples of 2pi.// So nfloat = nfloat (with lower k+1 bits cleared) + lower k+1 bits////    nfloat (with lower k+1 bits cleared) is a multiple of 2^(k+1)//    N * 2^(k+1)//    nfloat * pi/2^k = N * 2^(k+1) * pi/2^k + (lower k+1 bits) * pi/2^k//    nfloat * pi/2^k = N * 2 * pi + (lower k+1 bits) * pi/2^k//    nfloat * pi/2^k = N2pi + M * pi/2^k////// Sin(x) = Sin((nfloat * pi/2^k) + r)//        = Sin(nfloat * pi/2^k) * Cos(r) + Cos(nfloat * pi/2^k) * Sin(r)////          Sin(nfloat * pi/2^k) = Sin(N2pi + Mpi/2^k)//                               = Sin(N2pi)Cos(Mpi/2^k) + Cos(N2pi)Sin(Mpi/2^k)//                               = Sin(Mpi/2^k)////          Cos(nfloat * pi/2^k) = Cos(N2pi + Mpi/2^k)//                               = Cos(N2pi)Cos(Mpi/2^k) + Sin(N2pi)Sin(Mpi/2^k)//                               = Cos(Mpi/2^k)//// Sin(x) = Sin(Mpi/2^k) Cos(r) + Cos(Mpi/2^k) Sin(r)////// Step 4// ======// 0 <= M < 2^(k+1)// There are 2^(k+1) Sin entries in a table.// There are 2^(k+1) Cos entries in a table.//// Get Sin(Mpi/2^k) and Cos(Mpi/2^k) by table lookup.////// Step 5// ======// Calculate Cos(r) and Sin(r) by polynomial approximation.//// Cos(r) = 1 + r^2 q1  + r^4 q2 + r^6 q3 + ... = Series for Cos// Sin(r) = r + r^3 p1  + r^5 p2 + r^7 p3 + ... = Series for Sin//// and the coefficients q1, q2, ... and p1, p2, ... are stored in a table////// Calculate// Sin(x) = Sin(Mpi/2^k) Cos(r) + Cos(Mpi/2^k) Sin(r)//// as follows////    Sm = Sin(Mpi/2^k) and Cm = Cos(Mpi/2^k)//    rsq = r*r//////    P = p1 + r^2p2 + r^4p3 + r^6p4//    Q = q1 + r^2q2 + r^4q3 + r^6q4////       rcub = r * rsq//       Sin(r) = r + rcub * P//              = r + r^3p1  + r^5p2 + r^7p3 + r^9p4 + ... = Sin(r)////            The coefficients are not exactly these values, but almost.////            p1 = -1/6  = -1/3!//            p2 = 1/120 =  1/5!//            p3 = -1/5040 = -1/7!//            p4 = 1/362889 = 1/9!////       P =  r + rcub * P////    Answer = Sm Cos(r) + Cm P////       Cos(r) = 1 + rsq Q//       Cos(r) = 1 + r^2 Q//       Cos(r) = 1 + r^2 (q1 + r^2q2 + r^4q3 + r^6q4)//       Cos(r) = 1 + r^2q1 + r^4q2 + r^6q3 + r^8q4 + ...////       Sm Cos(r) = Sm(1 + rsq Q)//       Sm Cos(r) = Sm + Sm rsq Q//       Sm Cos(r) = Sm + s_rsq Q//       Q         = Sm + s_rsq Q//// Then,////    Answer = Q + Cm P#include "libm_support.h"// Registers used//==============================================================// general input registers:// r14 -> r19// r32 -> r45// predicate registers used:// p6 -> p14// floating-point registers used// f9 -> f15// f32 -> f61// Assembly macros//==============================================================sind_NORM_f8                 = f9sind_W                       = f10sind_int_Nfloat              = f11sind_Nfloat                  = f12sind_r                       = f13sind_rsq                     = f14sind_rcub                    = f15sind_Inv_Pi_by_16            = f32sind_Pi_by_16_hi             = f33sind_Pi_by_16_lo             = f34sind_Inv_Pi_by_64            = f35sind_Pi_by_64_hi             = f36sind_Pi_by_64_lo             = f37sind_Sm                      = f38sind_Cm                      = f39sind_P1                      = f40sind_Q1                      = f41sind_P2                      = f42sind_Q2                      = f43sind_P3                      = f44sind_Q3                      = f45sind_P4                      = f46sind_Q4                      = f47sind_P_temp1                 = f48sind_P_temp2                 = f49sind_Q_temp1                 = f50sind_Q_temp2                 = f51sind_P                       = f52sind_Q                       = f53sind_srsq                    = f54sind_SIG_INV_PI_BY_16_2TO61  = f55sind_RSHF_2TO61              = f56sind_RSHF                    = f57sind_2TOM61                  = f58sind_NFLOAT                  = f59sind_W_2TO61_RSH             = f60fp_tmp                       = f61/////////////////////////////////////////////////////////////sind_AD_1                    = r33sind_AD_2                    = r34sind_exp_limit               = r35sind_r_signexp               = r36sind_AD_beta_table           = r37sind_r_sincos                = r38sind_r_exp                   = r39sind_r_17_ones               = r40sind_GR_sig_inv_pi_by_16     = r14sind_GR_rshf_2to61           = r15sind_GR_rshf                 = r16sind_GR_exp_2tom61           = r17sind_GR_n                    = r18sind_GR_m                    = r19sind_GR_32m                  = r19gr_tmp                       = r41GR_SAVE_PFS                  = r41GR_SAVE_B0                   = r42GR_SAVE_GP                   = r43#ifdef _LIBC.rodata#else.data#endif.align 16double_sind_pi:ASM_TYPE_DIRECTIVE(double_sind_pi,@object)//   data8 0xA2F9836E4E44152A, 0x00004001 // 16/pi (significand loaded w/ setf)//         c90fdaa22168c234   data8 0xC90FDAA22168C234, 0x00003FFC // pi/16 hi//         c4c6628b80dc1cd1  29024e088a   data8 0xC4C6628B80DC1CD1, 0x00003FBC // pi/16 loASM_SIZE_DIRECTIVE(double_sind_pi)double_sind_pq_k4:ASM_TYPE_DIRECTIVE(double_sind_pq_k4,@object)   data8 0x3EC71C963717C63A // P4   data8 0x3EF9FFBA8F191AE6 // Q4   data8 0xBF2A01A00F4E11A8 // P3   data8 0xBF56C16C05AC77BF // Q3   data8 0x3F8111111110F167 // P2   data8 0x3FA555555554DD45 // Q2   data8 0xBFC5555555555555 // P1   data8 0xBFDFFFFFFFFFFFFC // Q1ASM_SIZE_DIRECTIVE(double_sind_pq_k4)double_sin_cos_beta_k4:ASM_TYPE_DIRECTIVE(double_sin_cos_beta_k4,@object)data8 0x0000000000000000 , 0x00000000 // sin( 0 pi/16)  S0data8 0x8000000000000000 , 0x00003fff // cos( 0 pi/16)  C0data8 0xc7c5c1e34d3055b3 , 0x00003ffc // sin( 1 pi/16)  S1data8 0xfb14be7fbae58157 , 0x00003ffe // cos( 1 pi/16)  C1data8 0xc3ef1535754b168e , 0x00003ffd // sin( 2 pi/16)  S2data8 0xec835e79946a3146 , 0x00003ffe // cos( 2 pi/16)  C2data8 0x8e39d9cd73464364 , 0x00003ffe // sin( 3 pi/16)  S3data8 0xd4db3148750d181a , 0x00003ffe // cos( 3 pi/16)  C3data8 0xb504f333f9de6484 , 0x00003ffe // sin( 4 pi/16)  S4data8 0xb504f333f9de6484 , 0x00003ffe // cos( 4 pi/16)  C4data8 0xd4db3148750d181a , 0x00003ffe // sin( 5 pi/16)  C3data8 0x8e39d9cd73464364 , 0x00003ffe // cos( 5 pi/16)  S3data8 0xec835e79946a3146 , 0x00003ffe // sin( 6 pi/16)  C2data8 0xc3ef1535754b168e , 0x00003ffd // cos( 6 pi/16)  S2data8 0xfb14be7fbae58157 , 0x00003ffe // sin( 7 pi/16)  C1data8 0xc7c5c1e34d3055b3 , 0x00003ffc // cos( 7 pi/16)  S1data8 0x8000000000000000 , 0x00003fff // sin( 8 pi/16)  C0data8 0x0000000000000000 , 0x00000000 // cos( 8 pi/16)  S0data8 0xfb14be7fbae58157 , 0x00003ffe // sin( 9 pi/16)  C1data8 0xc7c5c1e34d3055b3 , 0x0000bffc // cos( 9 pi/16)  -S1data8 0xec835e79946a3146 , 0x00003ffe // sin(10 pi/16)  C2data8 0xc3ef1535754b168e , 0x0000bffd // cos(10 pi/16)  -S2data8 0xd4db3148750d181a , 0x00003ffe // sin(11 pi/16)  C3data8 0x8e39d9cd73464364 , 0x0000bffe // cos(11 pi/16)  -S3data8 0xb504f333f9de6484 , 0x00003ffe // sin(12 pi/16)  S4data8 0xb504f333f9de6484 , 0x0000bffe // cos(12 pi/16)  -S4data8 0x8e39d9cd73464364 , 0x00003ffe // sin(13 pi/16) S3data8 0xd4db3148750d181a , 0x0000bffe // cos(13 pi/16) -C3data8 0xc3ef1535754b168e , 0x00003ffd // sin(14 pi/16) S2data8 0xec835e79946a3146 , 0x0000bffe // cos(14 pi/16) -C2data8 0xc7c5c1e34d3055b3 , 0x00003ffc // sin(15 pi/16) S1data8 0xfb14be7fbae58157 , 0x0000bffe // cos(15 pi/16) -C1data8 0x0000000000000000 , 0x00000000 // sin(16 pi/16) S0data8 0x8000000000000000 , 0x0000bfff // cos(16 pi/16) -C0data8 0xc7c5c1e34d3055b3 , 0x0000bffc // sin(17 pi/16) -S1data8 0xfb14be7fbae58157 , 0x0000bffe // cos(17 pi/16) -C1data8 0xc3ef1535754b168e , 0x0000bffd // sin(18 pi/16) -S2data8 0xec835e79946a3146 , 0x0000bffe // cos(18 pi/16) -C2data8 0x8e39d9cd73464364 , 0x0000bffe // sin(19 pi/16) -S3data8 0xd4db3148750d181a , 0x0000bffe // cos(19 pi/16) -C3data8 0xb504f333f9de6484 , 0x0000bffe // sin(20 pi/16) -S4data8 0xb504f333f9de6484 , 0x0000bffe // cos(20 pi/16) -S4data8 0xd4db3148750d181a , 0x0000bffe // sin(21 pi/16) -C3data8 0x8e39d9cd73464364 , 0x0000bffe // cos(21 pi/16) -S3data8 0xec835e79946a3146 , 0x0000bffe // sin(22 pi/16) -C2data8 0xc3ef1535754b168e , 0x0000bffd // cos(22 pi/16) -S2data8 0xfb14be7fbae58157 , 0x0000bffe // sin(23 pi/16) -C1data8 0xc7c5c1e34d3055b3 , 0x0000bffc // cos(23 pi/16) -S1data8 0x8000000000000000 , 0x0000bfff // sin(24 pi/16) -C0data8 0x0000000000000000 , 0x00000000 // cos(24 pi/16) S0data8 0xfb14be7fbae58157 , 0x0000bffe // sin(25 pi/16) -C1data8 0xc7c5c1e34d3055b3 , 0x00003ffc // cos(25 pi/16) S1data8 0xec835e79946a3146 , 0x0000bffe // sin(26 pi/16) -C2data8 0xc3ef1535754b168e , 0x00003ffd // cos(26 pi/16) S2data8 0xd4db3148750d181a , 0x0000bffe // sin(27 pi/16) -C3data8 0x8e39d9cd73464364 , 0x00003ffe // cos(27 pi/16) S3data8 0xb504f333f9de6484 , 0x0000bffe // sin(28 pi/16) -S4data8 0xb504f333f9de6484 , 0x00003ffe // cos(28 pi/16) S4data8 0x8e39d9cd73464364 , 0x0000bffe // sin(29 pi/16) -S3data8 0xd4db3148750d181a , 0x00003ffe // cos(29 pi/16) C3data8 0xc3ef1535754b168e , 0x0000bffd // sin(30 pi/16) -S2data8 0xec835e79946a3146 , 0x00003ffe // cos(30 pi/16) C2data8 0xc7c5c1e34d3055b3 , 0x0000bffc // sin(31 pi/16) -S1data8 0xfb14be7fbae58157 , 0x00003ffe // cos(31 pi/16) C1data8 0x0000000000000000 , 0x00000000 // sin(32 pi/16) S0data8 0x8000000000000000 , 0x00003fff // cos(32 pi/16) C0ASM_SIZE_DIRECTIVE(double_sin_cos_beta_k4).align 32.global sin#.global cos##ifdef _LIBC.global __sin#.global __cos##endif////////////////////////////////////////////////////////// There are two entry points: sin and cos// If from sin, p8 is true// If from cos, p9 is true.section .text.proc  sin##ifdef _LIBC.proc  __sin##endif.align 32sin:#ifdef _LIBC__sin:#endif{ .mlx      alloc          r32=ar.pfs,1,13,0,0      movl sind_GR_sig_inv_pi_by_16 = 0xA2F9836E4E44152A // significand of 16/pi}{ .mlx      addl           sind_AD_1   = @ltoff(double_sind_pi), gp      movl sind_GR_rshf_2to61 = 0x47b8000000000000 // 1.1000 2^(63+63-2)};;{ .mfi      ld8 sind_AD_1 = [sind_AD_1]      fnorm     sind_NORM_f8  = f8      cmp.eq     p8,p9         = r0, r0}{ .mib      mov sind_GR_exp_2tom61 = 0xffff-61 // exponent of scaling factor 2^-61      mov            sind_r_sincos = 0x0      br.cond.sptk   L(SIND_SINCOS)};;.endp sinASM_SIZE_DIRECTIVE(sin).section .text.proc  cos##ifdef _LIBC.proc  __cos##endif.align 32cos:#ifdef _LIBC__cos:#endif{ .mlx      alloc          r32=ar.pfs,1,13,0,0      movl sind_GR_sig_inv_pi_by_16 = 0xA2F9836E4E44152A // significand of 16/pi}{ .mlx      addl           sind_AD_1   = @ltoff(double_sind_pi), gp      movl sind_GR_rshf_2to61 = 0x47b8000000000000 // 1.1000 2^(63+63-2)};;{ .mfi      ld8 sind_AD_1 = [sind_AD_1]      fnorm.s1     sind_NORM_f8  = f8      cmp.eq     p9,p8         = r0, r0}{ .mib      mov sind_GR_exp_2tom61 = 0xffff-61 // exponent of scaling factor 2^-61      mov            sind_r_sincos = 0x8      br.cond.sptk   L(SIND_SINCOS)};;////////////////////////////////////////////////////////// All entry points end up here.// If from sin, sind_r_sincos is 0 and p8 is true// If from cos, sind_r_sincos is 8 = 2^(k-1) and p9 is true// We add sind_r_sincos to NL(SIND_SINCOS):// Form two constants we need//  16/pi * 2^-2 * 2^63, scaled by 2^61 since we just loaded the significand//  1.1000...000 * 2^(63+63-2) to right shift int(W) into the low significand// fcmp used to set denormal, and invalid on snans{ .mfi      setf.sig sind_SIG_INV_PI_BY_16_2TO61 = sind_GR_sig_inv_pi_by_16      fcmp.eq.s0 p12,p0=f8,f0      mov       sind_r_17_ones    = 0x1ffff}{ .mlx      setf.d sind_RSHF_2TO61 = sind_GR_rshf_2to61      movl sind_GR_rshf = 0x43e8000000000000 // 1.1000 2^63 for right shift};;// Form another constant//  2^-61 for scaling Nfloat// 0x10009 is register_bias + 10.// So if f8 > 2^10 = Gamma, go to DBX{ .mfi      setf.exp sind_2TOM61 = sind_GR_exp_2tom61      fclass.m  p13,p0 = f8, 0x23           // Test for x inf      mov       sind_exp_limit = 0x10009};;// Load the two pieces of pi/16// Form another constant//  1.1000...000 * 2^63, the right shift constant{ .mmf      ldfe      sind_Pi_by_16_hi  = [sind_AD_1],16      setf.d sind_RSHF = sind_GR_rshf      fclass.m  p14,p0 = f8, 0xc3           // Test for x nan};;{ .mfi      ldfe      sind_Pi_by_16_lo  = [sind_AD_1],16(p13) frcpa.s0 f8,p12=f0,f0               // force qnan indef for x=inf      addl gr_tmp = -1,r0}{ .mfb      addl           sind_AD_beta_table   = @ltoff(double_sin_cos_beta_k4), gp      nop.f 999(p13) br.ret.spnt    b0 ;;                // Exit for x=inf}// Start loading P, Q coefficients// SIN(0){ .mfi      ldfpd      sind_P4,sind_Q4 = [sind_AD_1],16(p8)  fclass.m.unc  p6,p0 = f8, 0x07      // Test for sin(0)      nop.i 999}{ .mfb      addl           sind_AD_beta_table   = @ltoff(double_sin_cos_beta_k4), gp(p14) fma.d f8=f8,f1,f0                   // qnan for x=nan(p14) br.ret.spnt    b0 ;;                // Exit for x=nan}// COS(0){ .mfi      getf.exp  sind_r_signexp    = f8(p9)  fclass.m.unc  p7,p0 = f8, 0x07      // Test for sin(0)      nop.i 999}{ .mfi      ld8 sind_AD_beta_table = [sind_AD_beta_table]      nop.f 999      nop.i 999 ;;}{ .mmb      ldfpd      sind_P3,sind_Q3 = [sind_AD_1],16      setf.sig fp_tmp = gr_tmp // Create constant such that fmpy sets inexact(p6)  br.ret.spnt    b0 ;;

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -