📄 w_tgamma.s
字号:
.file "tgamma.s"// Copyright (c) 2001 - 2005, Intel Corporation// All rights reserved.//// Contributed 2001 by the Intel Numerics Group, Intel Corporation//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,INCLUDING,BUT NOT // LIMITED TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT,INDIRECT,INCIDENTAL,SPECIAL,// EXEMPLARY,OR CONSEQUENTIAL DAMAGES (INCLUDING,BUT NOT LIMITED TO,// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,DATA,OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY,WHETHER IN CONTRACT,STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE,EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code,and requests that all// problem reports or change requests be submitted to it directly at // http://www.intel.com/software/products/opensource/libraries/num.htm.////*********************************************************************//// History: // 10/12/01 Initial version// 05/20/02 Cleaned up namespace and sf0 syntax// 02/10/03 Reordered header: .section, .global, .proc, .align// 04/04/03 Changed error codes for overflow and negative integers// 04/10/03 Changed code for overflow near zero handling// 03/31/05 Reformatted delimiters between data tables////*********************************************************************////*********************************************************************//// Function: tgamma(x) computes the principle value of the GAMMA// function of x.////*********************************************************************//// Resources Used://// Floating-Point Registers: f8-f15// f33-f87//// General Purpose Registers:// r8-r11// r14-r28// r32-r36// r37-r40 (Used to pass arguments to error handling routine)//// Predicate Registers: p6-p15////*********************************************************************//// IEEE Special Conditions://// tgamma(+inf) = +inf// tgamma(-inf) = QNaN // tgamma(+/-0) = +/-inf // tgamma(x<0, x - integer) = QNaN// tgamma(SNaN) = QNaN// tgamma(QNaN) = QNaN////*********************************************************************//// Overview//// The method consists of three cases.// // If 2 <= x < OVERFLOW_BOUNDARY use case tgamma_regular;// else if 0 < x < 2 use case tgamma_from_0_to_2;// else if -(i+1) < x < -i, i = 0...184 use case tgamma_negatives;//// Case 2 <= x < OVERFLOW_BOUNDARY// -------------------------------// Here we use algorithm based on the recursive formula// GAMMA(x+1) = x*GAMMA(x). For that we subdivide interval// [2; OVERFLOW_BOUNDARY] into intervals [16*n; 16*(n+1)] and// approximate GAMMA(x) by polynomial of 22th degree on each// [16*n; 16*n+1], recursive formula is used to expand GAMMA(x)// to [16*n; 16*n+1]. In other words we need to find n, i and r// such that x = 16 * n + i + r where n and i are integer numbers// and r is fractional part of x. So GAMMA(x) = GAMMA(16*n+i+r) =// = (x-1)*(x-2)*...*(x-i)*GAMMA(x-i) =// = (x-1)*(x-2)*...*(x-i)*GAMMA(16*n+r) ~// ~ (x-1)*(x-2)*...*(x-i)*P22n(r).//// Step 1: Reduction// -----------------// N = [x] with truncate// r = x - N, note 0 <= r < 1//// n = N & ~0xF - index of table that contains coefficient of// polynomial approximation // i = N & 0xF - is used in recursive formula// //// Step 2: Approximation// ---------------------// We use factorized minimax approximation polynomials// P22n(r) = A22*(r^2+C01(n)*R+C00(n))*// *(r^2+C11(n)*R+C10(n))*...*(r^2+CA1(n)*R+CA0(n))//// Step 3: Recursion// -----------------// In case when i > 0 we need to multiply P22n(r) by product// R(i)=(x-1)*(x-2)*...*(x-i). To reduce number of fp-instructions// we can calculate R as follow: // R(i) = ((x-1)*(x-2))*((x-3)*(x-4))*...*((x-(i-1))*(x-i)) if i is// even or R = ((x-1)*(x-2))*((x-3)*(x-4))*...*((x-(i-2))*(x-(i-1)))*// *(i-1) if i is odd. In both cases we need to calculate// R2(i) = (x^2-3*x+2)*(x^2-7*x+12)*...*(x^2+x+2*j*(2*j-1)) =// = (x^2-3*x+2)*(x^2-7*x+12)*...*((x^2+x)+2*j*(2*(j-1)+(1-2*x))) =// = (RA+2*(2-RB))*(RA+4*(4-RB))*...*(RA+2*j*(2*(j-1)+RB))// where j = 1..[i/2], RA = x^2+x, RB = 1-2*x.//// Step 4: Reconstruction// ----------------------// Reconstruction is just simple multiplication i.e.// GAMMA(x) = P22n(r)*R(i)//// Case 0 < x < 2// --------------// To calculate GAMMA(x) on this interval we do following// if 1 <= x < 1.25 than GAMMA(x) = P15(x-1)// if 1.25 <= x < 1.5 than GAMMA(x) = P15(x-x_min) where// x_min is point of local minimum on [1; 2] interval.// if 1.5 <= x < 2.0 than GAMMA(x) = P15(x-1.5)// and // if 0 < x < 1 than GAMMA(x) = GAMMA(x+1)/x//// Case -(i+1) < x < -i, i = 0...184// ----------------------------------// Here we use the fact that GAMMA(-x) = PI/(x*GAMMA(x)*sin(PI*x)) and// so we need to calculate GAMMA(x), sin(PI*x)/PI. Calculation of// GAMMA(x) is described above.//// Step 1: Reduction// -----------------// Note that period of sin(PI*x) is 2 and range reduction for // sin(PI*x) is like to range reduction for GAMMA(x) // i.e r = x - [x] with exception of cases// when r > 0.5 (in such cases r = 1 - (x - [x])).//// Step 2: Approximation// ---------------------// To approximate sin(PI*x)/PI = sin(PI*(2*n+r))/PI = // = (-1)^n*sin(PI*r)/PI Taylor series is used.// sin(PI*r)/PI ~ S21(r).//// Step 3: Division// ----------------// To calculate 1/(x*GAMMA(x)*S21(r)) we use frcpa instruction// with following Newton-Raphson interations.// ////*********************************************************************GR_Sig = r8GR_TAG = r8GR_ad_Data = r9GR_SigRqLin = r10GR_iSig = r11GR_ExpOf1 = r11GR_ExpOf8 = r11GR_Sig2 = r14GR_Addr_Mask1 = r15GR_Sign_Exp = r16GR_Tbl_Offs = r17GR_Addr_Mask2 = r18GR_ad_Co = r19GR_Bit2 = r19GR_ad_Ce = r20GR_ad_Co7 = r21GR_NzOvfBound = r21GR_ad_Ce7 = r22GR_Tbl_Ind = r23GR_Tbl_16xInd = r24GR_ExpOf025 = r24GR_ExpOf05 = r25GR_0x30033 = r26GR_10 = r26GR_12 = r27GR_185 = r27GR_14 = r28GR_2 = r28GR_fpsr = r28GR_SAVE_B0 = r33GR_SAVE_PFS = r34GR_SAVE_GP = r35GR_SAVE_SP = r36GR_Parameter_X = r37GR_Parameter_Y = r38GR_Parameter_RESULT = r39GR_Parameter_TAG = r40FR_X = f10FR_Y = f1 // tgamma is single argument functionFR_RESULT = f8FR_AbsX = f9FR_NormX = f9FR_r02 = f11FR_AbsXp1 = f12FR_X2pX = f13FR_1m2X = f14FR_Rq1 = f14FR_Xt = f15FR_r = f33FR_OvfBound = f34FR_Xmin = f35FR_2 = f36FR_Rcp1 = f36FR_Rcp3 = f36FR_4 = f37FR_5 = f38FR_6 = f39FR_8 = f40FR_10 = f41FR_12 = f42FR_14 = f43FR_GAMMA = f43FR_05 = f44FR_Rq2 = f45FR_Rq3 = f46FR_Rq4 = f47FR_Rq5 = f48FR_Rq6 = f49FR_Rq7 = f50FR_RqLin = f51FR_InvAn = f52FR_C01 = f53FR_A15 = f53FR_C11 = f54FR_A14 = f54FR_C21 = f55FR_A13 = f55FR_C31 = f56FR_A12 = f56FR_C41 = f57FR_A11 = f57FR_C51 = f58FR_A10 = f58FR_C61 = f59FR_A9 = f59FR_C71 = f60FR_A8 = f60FR_C81 = f61FR_A7 = f61FR_C91 = f62FR_A6 = f62FR_CA1 = f63FR_A5 = f63FR_C00 = f64FR_A4 = f64FR_rs2 = f64FR_C10 = f65FR_A3 = f65FR_rs3 = f65FR_C20 = f66FR_A2 = f66FR_rs4 = f66FR_C30 = f67FR_A1 = f67FR_rs7 = f67FR_C40 = f68FR_A0 = f68FR_rs8 = f68FR_C50 = f69FR_r2 = f69FR_C60 = f70FR_r3 = f70FR_C70 = f71FR_r4 = f71FR_C80 = f72FR_r7 = f72FR_C90 = f73FR_r8 = f73FR_CA0 = f74FR_An = f75FR_S21 = f76FR_S19 = f77FR_Rcp0 = f77FR_Rcp2 = f77FR_S17 = f78FR_S15 = f79FR_S13 = f80FR_S11 = f81FR_S9 = f82FR_S7 = f83FR_S5 = f84FR_S3 = f85FR_iXt = f86FR_rs = f87// Data tables//==============================================================RODATA.align 16LOCAL_OBJECT_START(tgamma_data)data8 0x406573FAE561F648 // overflow boundary (171.624376956302739927196)data8 0x3FDD8B618D5AF8FE // point of local minium (0.461632144968362356785)////[2; 3]data8 0xEF0E85C9AE40ABE2,0x00004000 // C01data8 0xCA2049DDB4096DD8,0x00004000 // C11data8 0x99A203B4DC2D1A8C,0x00004000 // C21data8 0xBF5D9D9C0C295570,0x00003FFF // C31data8 0xE8DD037DEB833BAB,0x00003FFD // C41data8 0xB6AE39A2A36AA03A,0x0000BFFE // C51data8 0x804960DC2850277B,0x0000C000 // C61data8 0xD9F3973841C09F80,0x0000C000 // C71data8 0x9C198A676F8A2239,0x0000C001 // C81data8 0xC98B7DAE02BE3226,0x0000C001 // C91data8 0xE9CAF31AC69301BA,0x0000C001 // CA1data8 0xFBBDD58608A0D172,0x00004000 // C00data8 0xFDD0316D1E078301,0x00004000 // C10data8 0x8630B760468C15E4,0x00004001 // C20data8 0x93EDE20E47D9152E,0x00004001 // C30data8 0xA86F3A38C77D6B19,0x00004001 // C40//[16; 17]data8 0xF87F757F365EE813,0x00004000 // C01data8 0xECA84FBA92759DA4,0x00004000 // C11data8 0xD4E0A55E07A8E913,0x00004000 // C21data8 0xB0EB45E94C8A5F7B,0x00004000 // C31data8 0x8050D6B4F7C8617D,0x00004000 // C41data8 0x8471B111AA691E5A,0x00003FFF // C51data8 0xADAF462AF96585C9,0x0000BFFC // C61data8 0xD327C7A587A8C32B,0x0000BFFF // C71data8 0xDEF5192B4CF5E0F1,0x0000C000 // C81data8 0xBADD64BB205AEF02,0x0000C001 // C91data8 0x9330A24AA67D6860,0x0000C002 // CA1data8 0xF57EEAF36D8C47BE,0x00004000 // C00data8 0x807092E12A251B38,0x00004001 // C10data8 0x8C458F80DEE7ED1C,0x00004001 // C20data8 0x9F30C731DC77F1A6,0x00004001 // C30data8 0xBAC4E7E099C3A373,0x00004001 // C40//[32; 33]data8 0xC3059A415F142DEF,0x00004000 // C01data8 0xB9C1DAC24664587A,0x00004000 // C11data8 0xA7101D910992FFB2,0x00004000 // C21data8 0x8A9522B8E4AA0AB4,0x00004000 // C31data8 0xC76A271E4BA95DCC,0x00003FFF // C41data8 0xC5D6DE2A38DB7FF2,0x00003FFE // C51data8 0xDBA42086997818B2,0x0000BFFC // C61data8 0xB8EDDB1424C1C996,0x0000BFFF // C71data8 0xBF7372FB45524B5D,0x0000C000 // C81data8 0xA03DDE759131580A,0x0000C001 // C91data8 0xFDA6FC4022C1FFE3,0x0000C001 // CA1data8 0x9759ABF797B2533D,0x00004000 // C00data8 0x9FA160C6CF18CEC5,0x00004000 // C10data8 0xB0EFF1E3530E0FCD,0x00004000 // C20data8 0xCCD60D5C470165D1,0x00004000 // C30data8 0xF5E53F6307B0B1C1,0x00004000 // C40//[48; 49]data8 0xAABE577FBCE37F5E,0x00004000 // C01data8 0xA274CAEEB5DF7172,0x00004000 // C11data8 0x91B90B6646C1B924,0x00004000 // C21data8 0xF06718519CA256D9,0x00003FFF // C31data8 0xAA9EE181C0E30263,0x00003FFF // C41data8 0xA07BDB5325CB28D2,0x00003FFE // C51data8 0x86C8B873204F9219,0x0000BFFD // C61data8 0xB0192C5D3E4787D6,0x0000BFFF // C71data8 0xB1E0A6263D4C19EF,0x0000C000 // C81data8 0x93BA32A118EAC9AE,0x0000C001 // C91data8 0xE942A39CD9BEE887,0x0000C001 // CA1data8 0xE838B0957B0D3D0D,0x00003FFF // C00data8 0xF60E0F00074FCF34,0x00003FFF // C10data8 0x89869936AE00C2A5,0x00004000 // C20data8 0xA0FE4E8AA611207F,0x00004000 // C30data8 0xC3B1229CFF1DDAFE,0x00004000 // C40//[64; 65]data8 0x9C00DDF75CDC6183,0x00004000 // C01data8 0x9446AE9C0F6A833E,0x00004000 // C11data8 0x84ABC5083310B774,0x00004000 // C21data8 0xD9BA3A0977B1ED83,0x00003FFF // C31data8 0x989B18C99411D300,0x00003FFF // C41data8 0x886E66402318CE6F,0x00003FFE // C51data8 0x99028C2468F18F38,0x0000BFFD // C61data8 0xAB72D17DCD40CCE1,0x0000BFFF // C71data8 0xA9D9AC9BE42C2EF9,0x0000C000 // C81data8 0x8C11D983AA177AD2,0x0000C001 // C91data8 0xDC779E981C1F0F06,0x0000C001 // CA1data8 0xC1FD4AC85965E8D6,0x00003FFF // C00data8 0xCE3D2D909D389EC2,0x00003FFF // C10data8 0xE7F79980AD06F5D8,0x00003FFF // C20data8 0x88DD9F73C8680B5D,0x00004000 // C30data8 0xA7D6CB2CB2D46F9D,0x00004000 // C40//[80; 81]data8 0x91C7FF4E993430D0,0x00004000 // C01data8 0x8A6E7AB83E45A7E9,0x00004000 // C11data8 0xF72D6382E427BEA9,0x00003FFF // C21data8 0xC9E2E4F9B3B23ED6,0x00003FFF // C31data8 0x8BEFEF56AE05D775,0x00003FFF // C41data8 0xEE9666AB6A185560,0x00003FFD // C51data8 0xA6AFAF5CEFAEE04D,0x0000BFFD // C61data8 0xA877EAFEF1F9C880,0x0000BFFF // C71data8 0xA45BD433048ECA15,0x0000C000 // C81data8 0x86BD1636B774CC2E,0x0000C001 // C91data8 0xD3721BE006E10823,0x0000C001 // CA1data8 0xA97EFABA91854208,0x00003FFF // C00data8 0xB4AF0AEBB3F97737,0x00003FFF // C10data8 0xCC38241936851B0B,0x00003FFF // C20data8 0xF282A6261006EA84,0x00003FFF // C30data8 0x95B8E9DB1BD45BAF,0x00004000 // C40//[96; 97]data8 0x8A1FA3171B35A106,0x00004000 // C01data8 0x830D5B8843890F21,0x00004000 // C11data8 0xE98B0F1616677A23,0x00003FFF // C21data8 0xBDF8347F5F67D4EC,0x00003FFF // C31data8 0x825F15DE34EC055D,0x00003FFF // C41data8 0xD4846186B8AAC7BE,0x00003FFD // C51data8 0xB161093AB14919B1,0x0000BFFD // C61data8 0xA65758EEA4800EF4,0x0000BFFF // C71data8 0xA046B67536FA329C,0x0000C000 // C81data8 0x82BBEC1BCB9E9068,0x0000C001 // C91data8 0xCC9DE2B23BA91B0B,0x0000C001 // CA1data8 0x983B16148AF77F94,0x00003FFF // C00data8 0xA2A4D8EE90FEE5DD,0x00003FFF // C10data8 0xB89446FA37FF481C,0x00003FFF // C20data8 0xDC5572648485FB01,0x00003FFF // C30data8 0x88CD5D7DB976129A,0x00004000 // C40//[112; 113]data8 0x8417098FD62AC5E3,0x00004000 // C01data8 0xFA7896486B779CBB,0x00003FFF // C11data8 0xDEC98B14AF5EEBD1,0x00003FFF // C21data8 0xB48E153C6BF0B5A3,0x00003FFF // C31
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -