📄 e_acosl.s

📁 Glibc 2.3.2 source code (over 100 MB after extraction)
.file "acosl.s"

// Copyright (C) 2000, 2001, Intel Corporation
// All rights reserved.
//
// Contributed 2/2/2000 by John Harrison, Ted Kubaska, Bob Norin, Shane Story,
// and Ping Tak Peter Tang of the Computational Software Lab, Intel Corporation.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// * The name of Intel Corporation may not be used to endorse or promote
// products derived from this software without specific prior written
// permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
// OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Intel Corporation is the author of this code, and requests that all
// problem reports or change requests be submitted to it directly at
// http://developer.intel.com/opensource.
//
// History
//==============================================================
// 2/02/00  Initial version
// 2/07/00  Modified calculation of acos_corr to correct acosl
// 4/04/00  Unwind support added
// 8/15/00  Bundle added after call to __libm_error_support to properly
//          set [the previously overwritten] GR_Parameter_RESULT.
// 12/20/00 Set denormal flag properly.
//
// API
//==============================================================
// double-extended = acosl (double-extended)
// input  floating point f8
// output floating point f8
//
// Registers used
//==============================================================
//
// predicate registers used:
// p6 -> p12
//
// floating-point registers used:
// f8 has input, then output
// f8 -> f15, f32 ->f99
//
// general registers used:
// r32 -> r48
//
// Overview of operation
//==============================================================
// There are three paths
// 1. |x| < 2^-25                 ACOS_TINY
// 2. 2^-25 <= |x| < 1/4          ACOS_POLY
// 3. 1/4 <= |x| < 1              ACOS_ATAN

#include "libm_support.h"

// Assembly macros
//==============================================================

// f8 is input, but acos_V must be put in f8
//    when __libm_atan2_reg is called, f8 must get V
// f9 gets U when __libm_atan2_reg is called

// __libm_atan2_reg returns
// f8  = Z_hi
// f10 = Z_lo
// f11 = s_lo

acos_Z_hi = f8
acos_Z_lo = f10
acos_S_lo = f11

// When we call __libm_atan2_reg, we must save
// the following:

acos_corr  = f12
acos_X     = f13
acos_pi_hi = f14
acos_pi_lo = f15

// The rest of the assembly macros

acos_P79                   = f32
acos_P59                   = f33
acos_P39                   = f34
acos_P19                   = f35

acos_P810                  = f36
acos_P610                  = f37
acos_P410                  = f38
acos_P210                  = f39

acos_A1                    = f41
acos_A2                    = f42
acos_A3                    = f43
acos_A4                    = f44
acos_A5                    = f45
acos_A6                    = f46
acos_A7                    = f47
acos_A8                    = f48
acos_A9                    = f49
acos_A10                   = f50

acos_X2                    = f51
acos_X4                    = f52

acos_B                     = f53
acos_Bb                    = f54
acos_A                     = f55
acos_Aa                    = f56

acos_1mA                   = f57

acos_W                     = f58
acos_Ww                    = f59

acos_y0                    = f60
acos_y1                    = f61
acos_y2                    = f62

acos_H                     = f63
acos_Hh                    = f64

acos_t1                    = f65
acos_t2                    = f66
acos_t3                    = f67
acos_t4                    = f68
acos_t5                    = f69

acos_Pseries               = f70
acos_NORM_f8               = f71
acos_ABS_NORM_f8           = f72

acos_2                     = f73
acos_P1P2                  = f74
acos_HALF                  = f75
acos_U                     = f76

acos_1mB                   = f77
acos_V                     = f78
acos_S                     = f79

acos_BmUU                  = f80
acos_BmUUpb                = f81
acos_2U                    = f82
acos_1d2U                  = f83

acos_Dd                    = f84

acos_pi_by_2_hi            = f85
acos_pi_by_2_lo            = f86
acos_xmpi_by_2_lo          = f87
acos_xPmw                  = f88

acos_Uu                    = f89
acos_AmVV                  = f90
acos_AmVVpa                = f91

acos_2V                    = f92
acos_1d2V                  = f93
acos_Vv                    = f94

acos_Vu                    = f95
acos_Uv                    = f96

acos_2_Z_hi                = f97
acos_s_lo_Z_lo             = f98
acos_result_lo             = f99

acos_Z_hi                  = f8
acos_Z_lo                  = f10
acos_s_lo                  = f11

acos_GR_17_ones            = r33
acos_GR_16_ones            = r34
acos_GR_signexp_f8         = r35
acos_GR_exp                = r36
acos_GR_true_exp           = r37
acos_GR_fffe               = r38

GR_SAVE_PFS                = r43
GR_SAVE_B0                 = r39
GR_SAVE_GP                 = r41

// r40 is address of table of coefficients
// r42

GR_Parameter_X             = r44
GR_Parameter_Y             = r45
GR_Parameter_RESULT        = r46
GR_Parameter_TAG                = r47

// 2^-40:
// A true exponent of -40 is
//                    : -40 + register_bias
//                    : -28 + ffff = ffd7

// A true exponent of 1 is
//                    : 1 + register_bias
//                    : 1 + ffff = 10000

// Data tables
//==============================================================

#ifdef _LIBC
.rodata
#else
.data
#endif

.align 16

acos_coefficients:
ASM_TYPE_DIRECTIVE(acos_coefficients,@object)
data8  0xc90fdaa22168c234, 0x00003FFF            // pi_by_2_hi
data8  0xc4c6628b80dc1cd1, 0x00003FBF            // pi_by_2_lo
data8  0xc90fdaa22168c234, 0x00004000            // pi_hi
data8  0xc4c6628b80dc1cd1, 0x00003FC0            // pi_lo

data8  0xBB08911F2013961E, 0x00003FF8            // A10
data8  0x981F1095A23A87D3, 0x00003FF8            // A9
data8  0xBDF09C6C4177BCC6, 0x00003FF8            // A8
data8  0xE4C3A60B049ACCEA, 0x00003FF8            // A7
data8  0x8E2789F4E8A8F1AD, 0x00003FF9            // A6
data8  0xB745D09B2B0E850B, 0x00003FF9            // A5
data8  0xF8E38E3BC4C50920, 0x00003FF9            // A4
data8  0xB6DB6DB6D89FCD81, 0x00003FFA            // A3
data8  0x99999999999AF376, 0x00003FFB            // A2
data8  0xAAAAAAAAAAAAAA71, 0x00003FFC            // A1
ASM_SIZE_DIRECTIVE(acos_coefficients)

.align 32
.global acosl#
ASM_TYPE_DIRECTIVE(acosl#,@function)

.section .text
.proc  acosl#
.align 32

acosl:

// After normalizing f8, get its true exponent
{ .mfi
      alloc r32 = ar.pfs,1,11,4,0
(p0)  fnorm.s1    acos_NORM_f8 = f8
(p0)  mov         acos_GR_17_ones = 0x1ffff
}

{ .mmi
(p0)  mov         acos_GR_16_ones = 0xffff
(p0)  addl        r40   = @ltoff(acos_coefficients), gp
      nop.i 999
}
;;

// Set denormal flag on denormal input with fcmp
{ .mfi
      ld8 r40 = [r40]
      fcmp.eq  p6,p0 = f8,f0
      nop.i 999
}
;;

// Load the constants pi_by_2 and pi.
// Each is stored as hi and lo values
// Also load the coefficients for ACOS_POLY

{ .mmi
(p0) ldfe       acos_pi_by_2_hi = [r40],16 ;;
(p0) ldfe       acos_pi_by_2_lo = [r40],16
     nop.i 999 ;;
}

{ .mmi
(p0) ldfe       acos_pi_hi      = [r40],16 ;;
(p0) ldfe       acos_pi_lo      = [r40],16
     nop.i 999 ;;
}

{ .mmi
(p0) ldfe       acos_A10        = [r40],16 ;;
(p0) ldfe       acos_A9         = [r40],16
     nop.i 999 ;;
}

// Take the absolute value of f8
{ .mmf
      nop.m 999
(p0)  getf.exp   acos_GR_signexp_f8  = acos_NORM_f8
(p0)  fmerge.s  acos_ABS_NORM_f8 = f0, acos_NORM_f8
}

{ .mii
(p0) ldfe       acos_A8         = [r40],16
     nop.i 999 ;;
(p0) and        acos_GR_exp         = acos_GR_signexp_f8, acos_GR_17_ones ;;
}

// case 1: |x| < 2^-25         ==> p6   ACOS_TINY
// case 2: 2^-25 <= |x| < 2^-2 ==> p8   ACOS_POLY
// case 3: 2^-2  <= |x| < 1    ==> p9   ACOS_ATAN
// case 4: 1     <= |x|        ==> p11  ACOS_ERROR_RETURN
//  Admittedly |x| = 1 is not an error but this is where that case is
//  handled.

{ .mii
(p0) ldfe       acos_A7         = [r40],16
(p0) sub        acos_GR_true_exp    = acos_GR_exp, acos_GR_16_ones ;;
(p0) cmp.ge.unc p6, p7    = -26, acos_GR_true_exp ;;
}

{ .mii
(p0) ldfe       acos_A6         = [r40],16
(p7) cmp.ge.unc p8, p9    = -3,  acos_GR_true_exp ;;
(p9) cmp.ge.unc p10, p11  =  -1, acos_GR_true_exp
}

{ .mmi
(p0) ldfe       acos_A5         = [r40],16 ;;
(p0) ldfe       acos_A4         = [r40],16
      nop.i 999 ;;
}

{ .mmi
(p0) ldfe       acos_A3         = [r40],16 ;;
(p0) ldfe       acos_A2         = [r40],16
      nop.i 999 ;;
}

// ACOS_ERROR_RETURN ==> p11 is true
// case 4: |x| >= 1
{ .mib
(p0)  ldfe       acos_A1         = [r40],16
      nop.i 999
(p11) br.spnt         L(ACOS_ERROR_RETURN) ;;
}

// ACOS_TINY ==> p6 is true
// case 1: |x| < 2^-25
{ .mfi
      nop.m 999
(p6)  fms.s1        acos_xmpi_by_2_lo = acos_NORM_f8,f1, acos_pi_by_2_lo
      nop.i 999 ;;
}

{ .mfb
      nop.m 999
(p6)  fms.s0         f8 = acos_pi_by_2_hi,f1, acos_xmpi_by_2_lo
(p6)  br.ret.spnt   b0 ;;
}

// ACOS_POLY ==> p8 is true
// case 2: 2^-25 <= |x| < 2^-2
{ .mfi
      nop.m 999
(p8)  fms.s1        acos_W       = acos_pi_by_2_hi, f1, acos_NORM_f8
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_X2   = f8,f8, f0
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fms.s1        acos_Ww      = acos_pi_by_2_hi, f1, acos_W
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_X4   = acos_X2,acos_X2, f0
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fms.s1        acos_Ww      = acos_Ww, f1, acos_NORM_f8
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P810 = acos_X4, acos_A10, acos_A8
      nop.i 999
}

// acos_P79  = X4*A9   + A7
// acos_P810 = X4*A10  + A8
{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P79  = acos_X4, acos_A9, acos_A7
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_Ww      = acos_Ww, f1, acos_pi_by_2_lo
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P610 = acos_X4, acos_P810, acos_A6
      nop.i 999
}

// acos_P59   = X4*(X4*A9   + A7)  + A5
// acos_P610  = X4*(X4*A10  + A8)  + A6
{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P59  = acos_X4, acos_P79, acos_A5
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P410 = acos_X4, acos_P610, acos_A4
      nop.i 999
}

// acos_P39   = X4*(X4*(X4*A9   + A7)  + A5) + A3
// acos_P410  = X4*(X4*(X4*A10  + A8)  + A6) + A4
{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P39  = acos_X4, acos_P59, acos_A3
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P210 = acos_X4, acos_P410, acos_A2
      nop.i 999
}

// acos_P19   = X4*(X4*(X4*(X4*A9   + A7)  + A5) + A3) + A1 = P1
// acos_P210  = X4*(X4*(X4*(X4*A10  + A8)  + A6) + A4) + A2 = P2
{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P19  = acos_X4, acos_P39, acos_A1
      nop.i 999 ;;
}

// acos_P1P2 = Xsq*P2 + P1
// acos_P1P2 = Xsq*(Xsq*P2 + P1)
{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P1P2    = acos_X2, acos_P210, acos_P19
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fma.s1        acos_P1P2    = acos_X2, acos_P1P2, f0
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p8)  fms.s1        acos_xPmw    = acos_NORM_f8, acos_P1P2, acos_Ww
      nop.i 999 ;;
}

{ .mfb
      nop.m 999
(p8)  fms.s0         f8           = acos_W, f1, acos_xPmw
(p8)  br.ret.spnt   b0 ;;
}

// ACOS_ATAN
// case 3: 2^-2  <= |x| < 1
// case 3: 2^-2  <= |x| < 1    ==> p9   ACOS_ATAN

// Step 1.1:     Get A,B and a,b
// A + a = 1- |X|
// B + b = 1+ |X|
// Note also that we will use  acos_corr (f13)
// and                         acos_W

// Step 2
// Call __libm_atan2_reg

{ .mfi
(p0)  mov    acos_GR_fffe = 0xfffe
(p0)  fma.s1 acos_B          = f1,f1,  acos_ABS_NORM_f8
(p0)  mov   GR_SAVE_B0 = b0 ;;
}

{ .mmf
(p0)  mov   GR_SAVE_GP = gp
      nop.m 999
(p0)  fms.s1 acos_A   = f1,f1,  acos_ABS_NORM_f8
}

{ .mfi
(p0)  setf.exp       acos_HALF = acos_GR_fffe
      nop.f 999
      nop.i 999 ;;
}

{ .mfi
      nop.m 999
(p0)  fms.s1 acos_1mB = f1,f1, acos_B
      nop.i 999 ;;
}

// We want atan2(V,U)
//   so put V in f8 and U in f9
//   but save X in acos_X

{ .mfi
      nop.m 999
(p0)  fmerge.se acos_X = f8, f8
      nop.i 999 ;;
}

// Step 1.2:
/////////////////////////
// Get U = sqrt(B)
/////////////////////////

{ .mfi
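
Note on the data table: each `data8 significand, sign_exp` row in `acos_coefficients` appears to hold one double-extended value, with a 64-bit significand followed by a word carrying the sign and a 15-bit exponent biased by 0x3FFF. The Python sketch below is not part of the glibc source. It decodes a few rows under that assumed layout, and the helper name `decode_data8_pair` is hypothetical. Results are rounded to double precision, so the low-order bits of the 64-bit significands are lost.

```python
import math


def decode_data8_pair(significand: int, sign_exp: int) -> float:
    """Decode one table row, assuming the double-extended layout.

    Assumed layout (not stated in the listing itself): bit 15 of sign_exp is
    the sign, bits 0..14 are the exponent biased by 0x3FFF, and the 64-bit
    significand has an explicit integer bit at bit 63.
    """
    sign = -1.0 if (sign_exp >> 15) & 1 else 1.0
    biased_exp = sign_exp & 0x7FFF
    # The significand is an integer scaled by 2^63, hence the extra -63.
    return sign * math.ldexp(float(significand), biased_exp - 0x3FFF - 63)


# Rows copied from the acos_coefficients table in the listing.
pi_by_2_hi = decode_data8_pair(0xC90FDAA22168C234, 0x00003FFF)
pi_hi = decode_data8_pair(0xC90FDAA22168C234, 0x00004000)
a1 = decode_data8_pair(0xAAAAAAAAAAAAAA71, 0x00003FFC)
a2 = decode_data8_pair(0x99999999999AF376, 0x00003FFB)

print(pi_by_2_hi, pi_hi, a1, a2)
```

Under this reading, `A1` and `A2` land very close to 1/6 and 3/40, the first two coefficients of the arcsine series after the linear term, which matches how the ACOS_POLY path uses them.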

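Note on the ACOS_POLY path: the comments in the listing describe two interleaved Horner chains in X4 (odd-indexed coefficients A9..A1 and even-indexed A10..A2), combined as Xsq*(Xsq*P2 + P1), with pi/2 - x carried as a hi/lo pair (W, Ww). The Python sketch below follows that sequence of operations in ordinary double precision. It is an illustration only: the real code runs in 82-bit register format with fused multiply-adds, so this version is less accurate, and the names `decode` and `acos_poly` are hypothetical.

```python
import math


def decode(significand: int, sign_exp: int) -> float:
    """Decode a `data8 significand, sign_exp` row (assumed double-extended)."""
    sign = -1.0 if (sign_exp >> 15) & 1 else 1.0
    return sign * math.ldexp(float(significand), (sign_exp & 0x7FFF) - 0x3FFF - 63)


PI_BY_2_HI = decode(0xC90FDAA22168C234, 0x3FFF)
PI_BY_2_LO = decode(0xC4C6628B80DC1CD1, 0x3FBF)

# Coefficients A1..A10, copied from the acos_coefficients table.
A = {
    10: decode(0xBB08911F2013961E, 0x3FF8),
    9: decode(0x981F1095A23A87D3, 0x3FF8),
    8: decode(0xBDF09C6C4177BCC6, 0x3FF8),
    7: decode(0xE4C3A60B049ACCEA, 0x3FF8),
    6: decode(0x8E2789F4E8A8F1AD, 0x3FF9),
    5: decode(0xB745D09B2B0E850B, 0x3FF9),
    4: decode(0xF8E38E3BC4C50920, 0x3FF9),
    3: decode(0xB6DB6DB6D89FCD81, 0x3FFA),
    2: decode(0x99999999999AF376, 0x3FFB),
    1: decode(0xAAAAAAAAAAAAAA71, 0x3FFC),
}


def acos_poly(x: float) -> float:
    """Double-precision model of the ACOS_POLY path (2^-25 <= |x| < 2^-2)."""
    x2 = x * x
    x4 = x2 * x2

    # Even chain: P810 -> P610 -> P410 -> P210 (ends as P2).
    p_even = A[10]
    for k in (8, 6, 4, 2):
        p_even = x4 * p_even + A[k]

    # Odd chain: P79 -> P59 -> P39 -> P19 (ends as P1).
    p_odd = A[9]
    for k in (7, 5, 3, 1):
        p_odd = x4 * p_odd + A[k]

    # acos_P1P2 = Xsq*(Xsq*P2 + P1)
    p1p2 = x2 * (x2 * p_even + p_odd)

    # W + Ww approximates pi/2 - x as a hi/lo pair.
    w = PI_BY_2_HI - x
    ww = ((PI_BY_2_HI - w) - x) + PI_BY_2_LO

    # acos_xPmw = x*P1P2 - Ww, and the result is W - acos_xPmw.
    return w - (x * p1p2 - ww)


for sample in (2.0 ** -25, 0.01, -0.1, 0.2):
    print(sample, acos_poly(sample), math.acos(sample))
```

The split into two chains in X4 mirrors the bundle ordering in the listing, where the P810/P79 style pairs are issued back to back. That ordering presumably lets the two dependent chains overlap in the floating-point pipeline, although the source does not say so explicitly.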