📄 libm_reduce.s
字号:
.file "libm_reduce.s"// Copyright (C) 2000, 2001, Intel Corporation// All rights reserved.// // Contributed 2/2/2000 by John Harrison, Ted Kubaska, Bob Norin, Shane Story,// and Ping Tak Peter Tang of the Computational Software Lab, Intel Corporation.//// Redistribution and use in source and binary forms, with or without// modification, are permitted provided that the following conditions are// met://// * Redistributions of source code must retain the above copyright// notice, this list of conditions and the following disclaimer.//// * Redistributions in binary form must reproduce the above copyright// notice, this list of conditions and the following disclaimer in the// documentation and/or other materials provided with the distribution.//// * The name of Intel Corporation may not be used to endorse or promote// products derived from this software without specific prior written// permission.//// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS // CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR // PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY // OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. // // Intel Corporation is the author of this code, and requests that all// problem reports or change requests be submitted to it directly at // http://developer.intel.com/opensource.//// History: 02/02/00 Initial Version//// *********************************************************************// *********************************************************************//// Function: __libm_pi_by_two_reduce(x) return r, c, and N where// x = N * pi/4 + (r+c) , where |r+c| <= pi/4.// This function is not designed to be used by the// general user.//// *********************************************************************//// Accuracy: Returns double-precision values//// *********************************************************************//// Resources Used://// Floating-Point Registers: f32-f70//// General Purpose Registers:// r8 = return value N// r32 = Address of x// r33 = Address of where to place r and then c // r34-r64//// Predicate Registers: p6-p14//// *********************************************************************//// IEEE Special Conditions://// No condions should be raised. //// *********************************************************************//// I. Introduction// ===============//// For the forward trigonometric functions sin, cos, sincos, and// tan, the original algorithms for IA 64 handle arguments up to // 1 ulp less than 2^63 in magnitude. For double-extended arguments x,// |x| >= 2^63, this routine returns CASE, N and r_hi, r_lo where// // x is accurately approximated by// 2*K*pi + N * pi/2 + r_hi + r_lo, |r_hi+r_lo| <= pi/4.// CASE = 1 or 2.// CASE is 1 unless |r_hi + r_lo| < 2^(-33).// // The exact value of K is not determined, but that information is// not required in trigonometric function computations.// // We first assume the argument x in question satisfies x >= 2^(63). // In particular, it is positive. Negative x can be handled by symmetry:// // -x is accurately approximated by// -2*K*pi + (-N) * pi/2 - (r_hi + r_lo), |r_hi+r_lo| <= pi/4.// // The idea of the reduction is that// // x * 2/pi = N_big + N + f, |f| <= 1/2// // Moreover, for double extended x, |f| >= 2^(-75). (This is an// non-obvious fact found by enumeration using a special algorithm// involving continued fraction.) The algorithm described below // calculates N and an accurate approximation of f.// // Roughly speaking, an appropriate 256-bit (4 X 64) portion of // 2/pi is multiplied with x to give the desired information.// // II. Representation of 2/PI// ==========================// // The value of 2/pi in binary fixed-point is// // .101000101111100110......// // We store 2/pi in a table, starting at the position corresponding// to bit position 63 // // bit position 63 62 ... 0 -1 -2 -3 -4 -5 -6 -7 .... -16576// // 0 0 ... 0 . 1 0 1 0 1 0 1 .... X// // ^// |__ implied binary pt // // III. Algorithm// ==============// // This describes the algorithm in the most natural way using// unsigned interger multiplication. The implementation section // describes how the integer arithmetic is simulated.// // STEP 0. Initialization// ----------------------// // Let the input argument x be // // x = 2^m * ( 1. b_1 b_2 b_3 ... b_63 ), 63 <= m <= 16383.// // The first crucial step is to fetch four 64-bit portions of 2/pi. // To fulfill this goal, we calculate the bit position L of the// beginning of these 256-bit quantity by// // L := 62 - m.// // Note that -16321 <= L <= -1 because 63 <= m <= 16383; and that // the storage of 2/pi is adequate.// // Fetch P_1, P_2, P_3, P_4 beginning at bit position L thus:// // bit position L L-1 L-2 ... L-63// // P_1 = b b b ... b// // each b can be 0 or 1. Also, let P_0 be the two bits correspoding to// bit positions L+2 and L+1. So, when each of the P_j is interpreted// with appropriate scaling, we have//// 2/pi = P_big + P_0 + (P_1 + P_2 + P_3 + P_4) + P_small// // Note that P_big and P_small can be ignored. The reasons are as follow.// First, consider P_big. If P_big = 0, we can certainly ignore it.// Otherwise, P_big >= 2^(L+3). Now, // // P_big * ulp(x) >= 2^(L+3) * 2^(m-63)// >= 2^(65-m + m-63 )// >= 2^2// // Thus, P_big * x is an integer of the form 4*K. So// // x = 4*K * (pi/2) + x*(P_0 + P_1 + P_2 + P_3 + P_4)*(pi/2)// + x*P_small*(pi/2).// // Hence, P_big*x corresponds to information that can be ignored for// trigonometic function evaluation.// // Next, we must estimate the effect of ignoring P_small. The absolute// error made by ignoring P_small is bounded by// // |P_small * x| <= ulp(P_4) * x// <= 2^(L-255) * 2^(m+1)// <= 2^(62-m-255 + m + 1)// <= 2^(-192)// // Since for double-extended precision, x * 2/pi = integer + f, // 0.5 >= |f| >= 2^(-75), the relative error introduced by ignoring// P_small is bounded by 2^(-192+75) <= 2^(-117), which is acceptable.// // Further note that if x is split into x_hi + x_lo where x_lo is the// two bits corresponding to bit positions 2^(m-62) and 2^(m-63); then// // P_0 * x_hi // // is also an integer of the form 4*K; and thus can also be ignored.// Let M := P_0 * x_lo which is a small integer. The main part of the// calculation is really the multiplication of x with the four pieces// P_1, P_2, P_3, and P_4.// // Unless the reduced argument is extremely small in magnitude, it// suffices to carry out the multiplication of x with P_1, P_2, and// P_3. x*P_4 will be carried out and added on as a correction only // when it is found to be needed. Note also that x*P_4 need not be// computed exactly. A straightforward multiplication suffices since// the rounding error thus produced would be bounded by 2^(-3*64),// that is 2^(-192) which is small enough as the reduced argument// is bounded from below by 2^(-75).// // Now that we have four 64-bit data representing 2/pi and a// 64-bit x. We first need to calculate a highly accurate product// of x and P_1, P_2, P_3. This is best understood as integer// multiplication.// // // STEP 1. Multiplication// ----------------------// // // --------- --------- ---------// | P_1 | | P_2 | | P_3 |// --------- --------- ---------// // ---------// X | X |// ---------// ----------------------------------------------------//// --------- ---------// | A_hi | | A_lo |// --------- ---------////// --------- ---------// | B_hi | | B_lo |// --------- ---------////// --------- --------- // | C_hi | | C_lo | // --------- --------- //// ====================================================// --------- --------- --------- ---------// | S_0 | | S_1 | | S_2 | | S_3 |// --------- --------- --------- ---------//////// STEP 2. Get N and f// -------------------// // Conceptually, after the individual pieces S_0, S_1, ..., are obtained,// we have to sum them and obtain an integer part, N, and a fraction, f.// Here, |f| <= 1/2, and N is an integer. Note also that N need only to// be known to module 2^k, k >= 2. In the case when |f| is small enough,// we would need to add in the value x*P_4.// // // STEP 3. Get reduced argument// ----------------------------// // The value f is not yet the reduced argument that we seek. The// equation// // x * 2/pi = 4K + N + f// // says that// // x = 2*K*pi + N * pi/2 + f * (pi/2).// // Thus, the reduced argument is given by// // reduced argument = f * pi/2.// // This multiplication must be performed to extra precision.// // IV. Implementation// ==================// // Step 0. Initialization// ----------------------// // Set sgn_x := sign(x); x := |x|; x_lo := 2 lsb of x.// // In memory, 2/pi is stored contigously as// // 0x00000000 0x00000000 0xA2F....// ^// |__ implied binary bit// // Given x = 2^m * 1.xxxx...xxx; we calculate L := 62 - m. Thus// -1 <= L <= -16321. We fetch from memory 5 integer pieces of data.// // P_0 is the two bits corresponding to bit positions L+2 and L+1// P_1 is the 64-bit starting at bit position L// P_2 is the 64-bit starting at bit position L-64// P_3 is the 64-bit starting at bit position L-128// P_4 is the 64-bit starting at bit position L-192// // For example, if m = 63, P_0 would be 0 and P_1 would look like// 0xA2F...// // If m = 65, P_0 would be the two msb of 0xA, thus, P_0 is 10 in binary.// P_1 in binary would be 1 0 0 0 1 0 1 1 1 1 .... // // Step 1. Multiplication// ----------------------// // At this point, P_1, P_2, P_3, P_4 are integers. They are// supposed to be interpreted as// // 2^(L-63) * P_1;// 2^(L-63-64) * P_2;// 2^(L-63-128) * P_3;// 2^(L-63-192) * P_4;// // Since each of them need to be multiplied to x, we would scale// both x and the P_j's by some convenient factors: scale each// of P_j's up by 2^(63-L), and scale x down by 2^(L-63).// // p_1 := fcvt.xf ( P_1 )// p_2 := fcvt.xf ( P_2 ) * 2^(-64)// p_3 := fcvt.xf ( P_3 ) * 2^(-128)// p_4 := fcvt.xf ( P_4 ) * 2^(-192)// x := replace exponent of x by -1// because 2^m * 1.xxxx...xxx * 2^(L-63)// is 2^(-1) * 1.xxxx...xxx// // We are now faced with the task of computing the following// // --------- --------- ---------// | P_1 | | P_2 | | P_3 |// --------- --------- ---------// // ---------// X | X |// ---------// ----------------------------------------------------// // --------- ---------// | A_hi | | A_lo |// --------- ---------// // --------- ---------// | B_hi | | B_lo |// --------- ---------// // --------- --------- // | C_hi | | C_lo | // --------- --------- // // ====================================================// ----------- --------- --------- ---------// | S_0 | | S_1 | | S_2 | | S_3 |// ----------- --------- --------- ---------// ^ ^// | |___ binary point// |// |___ possibly one more bit// // Let FPSR3 be set to round towards zero with widest precision// and exponent range. Unless an explicit FPSR is given, // round-to-nearest with widest precision and exponent range is// used.// // Define sigma_C := 2^63; sigma_B := 2^(-1); sigma_C := 2^(-65).// // Tmp_C := fmpy.fpsr3( x, p_1 );// If Tmp_C >= sigma_C then// C_hi := Tmp_C;// C_lo := x*p_1 - C_hi ...fma, exact// Else// C_hi := fadd.fpsr3(sigma_C, Tmp_C) - sigma_C// ...subtraction is exact, regardless// ...of rounding direction// C_lo := x*p_1 - C_hi ...fma, exact// End If// // Tmp_B := fmpy.fpsr3( x, p_2 );// If Tmp_B >= sigma_B then// B_hi := Tmp_B;// B_lo := x*p_2 - B_hi ...fma, exact// Else// B_hi := fadd.fpsr3(sigma_B, Tmp_B) - sigma_B// ...subtraction is exact, regardless// ...of rounding direction// B_lo := x*p_2 - B_hi ...fma, exact// End If// // Tmp_A := fmpy.fpsr3( x, p_3 );// If Tmp_A >= sigma_A then// A_hi := Tmp_A;// A_lo := x*p_3 - A_hi ...fma, exact// Else// A_hi := fadd.fpsr3(sigma_A, Tmp_A) - sigma_A// ...subtraction is exact, regardless// ...of rounding direction// A_lo := x*p_3 - A_hi ...fma, exact// End If// // ...Note that C_hi is of integer value. We need only the// ...last few bits. Thus we can ensure C_hi is never a big // ...integer, freeing us from overflow worry.// // Tmp_C := fadd.fpsr3( C_hi, 2^(70) ) - 2^(70);// ...Tmp_C is the upper portion of C_hi// C_hi := C_hi - Tmp_C// ...0 <= C_hi < 2^7// // Step 2. Get N and f// -------------------// // At this point, we have all the components to obtain // S_0, S_1, S_2, S_3 and thus N and f. We start by adding// C_lo and B_hi. This sum together with C_hi gives a good// estimation of N and f. // // A := fadd.fpsr3( B_hi, C_lo )// B := max( B_hi, C_lo )// b := min( B_hi, C_lo )// // a := (B - A) + b ...exact. Note that a is either 0// ...or 2^(-64).// // N := round_to_nearest_integer_value( A );// f := A - N; ...exact because lsb(A) >= 2^(-64)// ...and |f| <= 1/2.// // f := f + a ...exact because a is 0 or 2^(-64);// ...the msb of the sum is <= 1/2// ...lsb >= 2^(-64).// // N := convert to integer format( C_hi + N );// M := P_0 * x_lo;// N := N + M;// // If sgn_x == 1 (that is original x was negative)// N := 2^10 - N// ...this maintains N to be non-negative, but still// ...equivalent to the (negated N) mod 4.// End If// // If |f| >= 2^(-33)// // ...Case 1// CASE := 1// g := A_hi + B_lo;// s_hi := f + g;// s_lo := (f - s_hi) + g;// // Else// // ...Case 2// CASE := 2// A := fadd.fpsr3( A_hi, B_lo )// B := max( A_hi, B_lo )// b := min( A_hi, B_lo )// // a := (B - A) + b ...exact. Note that a is either 0// ...or 2^(-128).// // f_hi := A + f;// f_lo := (f - f_hi) + A;// ...this is exact.// ...f-f_hi is exact because either |f| >= |A|, in which// ...case f-f_hi is clearly exact; or otherwise, 0<|f|<|A|// ...means msb(f) <= msb(A) = 2^(-64) => |f| = 2^(-64).// ...If f = 2^(-64), f-f_hi involves cancellation and is// ...exact. If f = -2^(-64), then A + f is exact. Hence// ...f-f_hi is -A exactly, giving f_lo = 0.// // f_lo := f_lo + a;// // If |f| >= 2^(-50) then// s_hi := f_hi;// s_lo := f_lo;// Else// f_lo := (f_lo + A_lo) + x*p_4// s_hi := f_hi + f_lo// s_lo := (f_hi - s_hi) + f_lo// End If// // End If// // Step 3. Get reduced argument// ----------------------------// // If sgn_x == 0 (that is original x is positive)// // D_hi := Pi_by_2_hi// D_lo := Pi_by_2_lo// ...load from table// // Else// // D_hi := neg_Pi_by_2_hi// D_lo := neg_Pi_by_2_lo// ...load from table// End If// // r_hi := s_hi*D_hi// r_lo := s_hi*D_hi - r_hi ...fma// r_lo := (s_hi*D_lo + r_lo) + s_lo*D_hi// // Return CASE, N, r_hi, r_lo//#include "libm_support.h"FR_X = f32 FR_N = f33 FR_p_1 = f34 FR_TWOM33 = f35 FR_TWOM50 = f36
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -