CONTENTS
--------
* Math Support
* Floating Point
* Floating Point Exception Flags
* IEEE754 interoperability
* Fixed Point, Introduction
* Fixed Point Constants
* Fixed Point Interoperability
* Integer Libraries
* Fixed Point Libraries
* Floating Point Library
* Fast and Compact Inline Operations
* Using Prototypes and Multiple Code Pages
* Fixed Point Example
* Floating Point Example
* Code Size Comparison
* How to save code space


MATH SUPPORT
------------

  The math support includes integer, fixed and floating point math
  including library functions:

    Integer:        8, 16, 24 and 32 bit, with and without sign
    Fixed point:    20 different formats, with and without sign
    Floating point: 16, 24 and 32 bit

  Math support for each compiler edition:
             STANDARD   EXTENDED
    int      8+16+24    8+16+24+32
    fixed    8+16+24    8+16+24+32
    float    24+32      16+24+32

  The compiler will automatically locate the required function for
  an operation like 'a*b'.

  The following command line options are available:
    -we: no warning when fixed point constants are rounded
    -wO: warning on operator library calls
    -wI: warning on long inline code for multiplication and division

  Fixed point requires manual worst case analysis to get correct
  results. This must include calculation of accumulated error and
  avoiding truncation and loss of significant bits. It is often
  straightforward to get correct results when using floating point.
  However, floating point functions require significantly more
  code.

  In general, floating point and fixed point are both slow to
  execute. Note that floating point is FASTER than fixed point on
  multiplication and division, but slower on most other operations.

  Operations not found in the libraries are handled by the built in
  code generator. Also, the compiler will use inline code for
  operations that are most efficiently handled inline.

  SAVE CODE AND SAVE RAM: All libraries are optimized to get compact
  code. The floating point library is more compact than the
  Microchip floating point libraries written in assembly. All
  variables (except for the floating point flags) are allocated on
  the generated stack to enable efficient RAM reuse with other local
  variables. A new concept of transparent sharing of parameters in a
  library is introduced to save code.

  CC8E will automatically delete unused library functions. This
  feature can also be used to delete unused user application
  functions.

    #pragma library 1
     .. library functions that are deleted if unused
    #pragma library 0
     .. remaining user application

  The normal use of '#pragma library' is around source library
  files that are included in the user application.


FLOATING POINT
--------------

  The compiler supports 16, 24 and 32 bit floating point. The
  32 bit floating point can be converted to and from IEEE754 by
  3 instructions (macro in math32f.h).

    Format    Resolution    Range
    16 bit    2.4 digits    +/- 3.4e38, +/- 1.1e-38
    24 bit    4.8 digits    +/- 3.4e38, +/- 1.1e-38
    32 bit    7.2 digits    +/- 3.4e38, +/- 1.1e-38

  Note that 16 bit floating point is intended for special
  use where accuracy is less important.

  Supported floating point types:
    float16         : 16 bit floating point
    float, float24  : 24 bit floating point
    double, float32 : 32 bit floating point

  32 bit floating point format:
    address ID
    X       a.low8  : LSB, bit 0-7 of mantissa
    X+1     a.midL8 : bit 8-15 of mantissa
    X+2     a.midH8 : bit 16-22 of mantissa, bit 23: sign bit
    X+3     a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

    bit 23 of mantissa is a hidden bit, always equal to 1
    zero (0.0) :  a.high8 = 0 (mantissa & sign ignored)

    MSB     LSB
    7F 00 00 00 : 1.0   =  1.0  * 2**(0x7F-0x7F) = 1.0 * 1
    7F 80 00 00 : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0 * 1
    80 00 00 00 : 2.0   =  1.0  * 2**(0x80-0x7F) = 1.0 * 2
    80 40 00 00 : 3.0   =  1.5  * 2**(0x80-0x7F) = 1.5 * 2
    7E 60 00 00 : 0.875 =  1.75 * 2**(0x7E-0x7F) = 1.75 * 0.5
    7F 60 00 00 : 1.75  =  1.75 * 2**(0x7F-0x7F) = 1.75 * 1
    7F 7F FF FF : 1.9999998808

    00 7C E3 5A : 0.0 (mantissa & sign ignored)
    00 00 00 00 : 0.0

    01 00 00 00 : 1.1754943508e-38 : smallest number above zero
    FE 7F FF FF : 3.4028234664e+38 : largest number

    FF 00 00 00 : +INF : positive infinity
    FF 80 00 00 : -INF : negative infinity
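
  The byte layout above can be checked with a small decoder. The
  following is a minimal sketch in portable C, not CC8E library code.
  The function name decode_float32 is hypothetical, and the four
  bytes are assumed to be packed MSB first into one 32 bit word, as
  in the table. Infinity (exponent 0xFF) is not treated specially.

```c
#include <stdint.h>

/* Decode a 32 bit value in the format described above into a double.
   'raw' holds a.high8 in bits 24-31 and a.low8 in bits 0-7.
   Hypothetical helper for illustration only. */
double decode_float32(uint32_t raw)
{
    unsigned exponent = (raw >> 24) & 0xFFu;            /* a.high8, bias 0x7F */
    int negative = (int)((raw >> 23) & 1u);             /* bit 23: sign bit */
    uint32_t mantissa = (raw & 0x7FFFFFu) | 0x800000u;  /* restore hidden bit */
    double value;
    int shift;

    if (exponent == 0)
        return 0.0;                     /* zero: mantissa and sign ignored */

    value = (double)mantissa / 8388608.0;   /* divide by 2**23: 1.0 <= value < 2.0 */
    shift = (int)exponent - 0x7F;           /* remove the bias */
    while (shift > 0) { value *= 2.0; shift--; }
    while (shift < 0) { value *= 0.5; shift++; }

    return negative ? -value : value;
}
```

  For example, decode_float32(0x80400000) gives 3.0 and
  decode_float32(0x7E600000) gives 0.875, matching the table.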


  24 bit floating point format:

    address  ID
    X        a.low8  : LSB, bit 0-7 of mantissa
    X+1      a.mid8  : bit 8-14 of mantissa, bit 15: sign bit
    X+2      a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

    bit 15 of mantissa is a hidden bit, always equal to 1
    zero (0.0) :  a.high8 = 0 (mantissa & sign ignored)

    MSB  LSB
    7F 00 00  : 1.0   =  1.0  * 2**(0x7F-0x7F) = 1.0 * 1
    7F 80 00  : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0 * 1
    80 00 00  : 2.0   =  1.0  * 2**(0x80-0x7F) = 1.0 * 2
    80 40 00  : 3.0   =  1.5  * 2**(0x80-0x7F) = 1.5 * 2
    7E 60 00  : 0.875 =  1.75 * 2**(0x7E-0x7F) = 1.75 * 0.5
    7F 60 00  : 1.75  =  1.75 * 2**(0x7F-0x7F) = 1.75 * 1
    7F 7F FF  : 1.999969482

    00 7C 5A  : 0.0 (mantissa & sign ignored)

    01 00 00  : 1.17549435e-38 : smallest number above zero
    FE 7F FF  : 3.40277175e+38 : largest number

    FF 00 00  : +INF : positive infinity
    FF 80 00  : -INF : negative infinity


  16 bit floating point format:

    address  ID
     X       a.low8  : LSB, bit 0-6 of mantissa, bit 7: sign bit
     X+1     a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

     bit 7 of mantissa is a hidden bit, always equal to 1
     zero (0.0) :  a.high8 = 0 (mantissa & sign ignored)

    MSB LSB
     7F 00 : 1.0   =  1.0  * 2**(0x7F-0x7F) = 1.0 * 1
     7F 80 : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0 * 1
     80 00 : 2.0   =  1.0  * 2**(0x80-0x7F) = 1.0 * 2
     80 40 : 3.0   =  1.5  * 2**(0x80-0x7F) = 1.5 * 2
     7E 60 : 0.875 =  1.75 * 2**(0x7E-0x7F) = 1.75 * 0.5
     7F 60 : 1.75  =  1.75 * 2**(0x7F-0x7F) = 1.75 * 1
     7F 7F : 1.9921875

     00 7C : 0.0 (mantissa & sign ignored)

     01 00 : 1.175494e-38 : smallest number above zero
     FE 7F : 3.389531e+38 : largest number

     FF 00 : +INF : positive infinity
     FF 80 : -INF : negative infinity



FLOATING POINT EXCEPTION FLAGS
------------------------------

  The floating point flags are accessible in the application program.
  At program startup the flags should be initialized:

    FpFlags = 0;    // reset all flags, disable rounding
    FpRounding = 1; // enable rounding

  Also, after an exception is detected and handled in the
  application, the exception bit should be cleared so that new
  exceptions can be detected. Exceptions can be ignored if this is
  most convenient. New operations are not affected by old
  exceptions. This also enables delayed handling of exceptions. Only
  the application program can clear exception flags.

    char FpFlags;  // contains the floating point flags

    bit FpOverflow    @ FpFlags.1; // floating point overflow
    bit FpUnderFlow   @ FpFlags.2; // floating point underflow
    bit FpDiv0        @ FpFlags.3; // floating point divide by zero

    bit FpDomainError @ FpFlags.5; // domain error exception

    bit FpRounding    @ FpFlags.6; // floating point rounding
     // FpRounding=0: truncation
     // FpRounding=1: unbiased rounding to nearest LSB
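
  The check-and-clear pattern described above can be sketched in
  portable C with bit masks instead of CC8E 'bit' variables. The
  mask names and the function fp_take_exception are hypothetical.
  Only the bit positions are taken from the declarations above.

```c
/* Bit masks matching the bit positions listed above. */
#define FP_OVERFLOW     (1u << 1)
#define FP_UNDERFLOW    (1u << 2)
#define FP_DIV0         (1u << 3)
#define FP_DOMAIN_ERROR (1u << 5)
#define FP_ROUNDING     (1u << 6)

/* Returns 1 and clears the flag if the exception selected by 'mask'
   is pending, otherwise returns 0. All other bits, including the
   rounding mode bit, are left unchanged. Hypothetical helper. */
int fp_take_exception(unsigned char *flags, unsigned mask)
{
    if ((*flags & mask) == 0)
        return 0;
    *flags = (unsigned char)(*flags & ~mask);
    return 1;
}
```

  A caller would test fp_take_exception(&flags, FP_DIV0) after a
  division and handle the error when it returns 1. Because the flag
  is cleared in the same step, a later exception of the same kind
  can be detected again.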



IEEE754 INTEROPERABILITY
------------------------

  The floating point format used is not equivalent to the IEEE754
  standard, but the difference is very small. The reason for using a
  different format is code efficiency.

  IEEE compatibility is needed when floating point values are
  exchanged with the outside world. It may also happen that
  inspecting variables during debugging requires the IEEE754 format
  on some emulators/debuggers.

  Macros for converting to and from IEEE754 are available:

  math32f.h:
     // before sending a floating point value out of the controller:
     float32ToIEEE754(floatVar);  // change to IEEE754 (3 instr.)

     // before using a floating point value received from outside:
     IEEE754ToFloat32(floatVar);  // change from IEEE754 (3 instr.)

  math24f.h:
     float24ToIEEE754(floatVar);  // change to IEEE754 (3 instr.)
     IEEE754ToFloat24(floatVar);  // change from IEEE754 (3 instr.)
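
  The following portable C sketch shows what such a conversion
  amounts to for the 32 bit format. It assumes that the only
  difference from IEEE754 single precision is the position of the
  sign bit (bit 23 here, bit 31 in IEEE754), with the same bias 0x7F
  and hidden bit. The function names are hypothetical, and special
  values (denormals, NaN, zero with a nonzero mantissa) are not
  treated. This is not the macro code from math32f.h.

```c
#include <stdint.h>

/* Native layout (see FLOATING POINT): exponent in bits 24-31,
   sign in bit 23, mantissa in bits 0-22.
   IEEE754 single: sign in bit 31, exponent in bits 23-30,
   mantissa in bits 0-22. Hypothetical helpers. */
uint32_t native_to_ieee754(uint32_t raw)
{
    uint32_t sign     = (raw >> 23) & 1u;
    uint32_t exponent = (raw >> 24) & 0xFFu;
    uint32_t fraction = raw & 0x7FFFFFu;
    return (sign << 31) | (exponent << 23) | fraction;
}

uint32_t ieee754_to_native(uint32_t ieee)
{
    uint32_t sign     = (ieee >> 31) & 1u;
    uint32_t exponent = (ieee >> 23) & 0xFFu;
    uint32_t fraction = ieee & 0x7FFFFFu;
    return (exponent << 24) | (sign << 23) | fraction;
}
```

  For example, the native pattern 7F 00 00 00 (1.0) maps to the
  IEEE754 pattern 3F 80 00 00, and 7F 80 00 00 (-1.0) maps to
  BF 80 00 00.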



FIXED POINT, INTRODUCTION
-------------------------

  Fixed point can be used instead of floating point, mainly to save
  program space. Fixed point math uses formats where the decimal
  point is permanently set at byte boundaries. For example, fixed8_8
  uses one byte for the integer part and one byte for the decimal
  part. Fixed point operations map nicely to integer operations
  except for multiplication and division, which are supported by
  library functions.

  Example:  fixed8_8 fx;

    fx.low8  : Least significant byte, decimal part
    fx.high8 : Most significant byte, integer part

    MSB LSB   1/256 = 0.00390625
     07 01 : 7 + 0x01*0.00390625 = 7.00390625
     07 80 : 7 + 0x80*0.00390625 = 7.5
     07 FF : 7 + 0xFF*0.00390625 = 7.99609375
     00 00 : 0
     FF 00 : -1
     FF FF : -1 + 0xFF*0.00390625 = -0.00390625
     7F 00 : +127
     7F FF : +127 + 0xFF*0.00390625 = 127.99609375
     80 00 : -128
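
  The table values follow from reading the two bytes as one signed
  16 bit integer and dividing by 256. A minimal sketch in portable
  C, where the function name fixed8_8_to_double is hypothetical and
  the raw value is assumed to be packed MSB first:

```c
#include <stdint.h>

/* Convert the raw 16 bit pattern of a fixed8_8 value to a double.
   'raw' holds fx.high8 in bits 8-15 and fx.low8 in bits 0-7.
   Hypothetical helper for illustration only. */
double fixed8_8_to_double(uint16_t raw)
{
    /* interpret the pattern as a two's complement 16 bit number */
    int32_t value = (raw & 0x8000u) ? (int32_t)raw - 65536 : (int32_t)raw;

    return value / 256.0;   /* 8 decimal bits: one step is 1/256 */
}
```

  For example, fixed8_8_to_double(0x0780) gives 7.5 and
  fixed8_8_to_double(0xFFFF) gives -0.00390625.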


  Convention:  fixed<S><I>_<D> :
           <S> : 'U' : unsigned
                 <none>:  signed
           <I> : number of integer bits
           <D> : number of decimal bits

  Thus, fixed16_8 uses 16 bits for the integer part plus 8 bits
  for the decimals, a total of 24 bits. The resolution for fixed16_8
  is 1/256=0.0039 which is the lowest possible increment. This is
  equivalent to 2 decimal digits (actually 2.4 decimal digits).

  Built in fixed point types:

    Type:        #bytes        Range                 Resolution

    fixed8_8     2 (1+1)      -128, +127.996         0.00390625
    fixed8_16    3 (1+2)      -128, +127.99998       0.000015259
    fixed8_24    4 (1+3)      -128, +127.99999994    0.000000059605
    fixed16_8    3 (2+1)    -32768, +32767.996       0.00390625
    fixed16_16   4 (2+2)    -32768, +32767.99998     0.000015259
    fixed24_8    4 (3+1)  -8388608, +8388607.996     0.00390625

    fixedU8_8    2 (1+1)         0, +255.996         0.00390625
    fixedU8_16   3 (1+2)         0, +255.99998       0.000015259
    fixedU8_24   4 (1+3)         0, +255.99999994    0.000000059605
    fixedU16_8   3 (2+1)         0, +65535.996       0.00390625
    fixedU16_16  4 (2+2)         0, +65535.99998     0.000015259
    fixedU24_8   4 (3+1)         0, +16777215.996    0.00390625

    (additional types with decimals only; no integer part)
    fixed_8      1 (0+1)      -0.5, +0.496           0.00390625
    fixed_16     2 (0+2)      -0.5, +0.49998         0.000015259
    fixed_24     3 (0+3)      -0.5, +0.49999994      0.000000059605
    fixed_32     4 (0+4)      -0.5, +0.4999999998    0.0000000002328

    fixedU_8     1 (0+1)         0, +0.996           0.00390625
    fixedU_16    2 (0+2)         0, +0.99998         0.000015259
    fixedU_24    3 (0+3)         0, +0.99999994      0.000000059605
    fixedU_32    4 (0+4)         0, +0.9999999998    0.0000000002328

  To sum up:

  1. All types ending on _8 have 2 correct digits after decimal
     point and a maximum error of 2 on the 3rd decimal digit.

  2. All types ending on _16 have 4 correct digits after decimal
     point and a maximum error of 1 on the 5th decimal digit.

  3. All types ending on _24 have 7 correct digits after decimal
     point and a maximum error of 3 on the 8th decimal digit.

  4. All types ending on _32 have 9 correct digits after decimal
     point and a maximum error of 2 on the 11th decimal digit.


FIXED POINT CONSTANTS
---------------------

  The 32 bit floating point format is used when constant
  expressions are calculated during compilation.

    fixed8_8 a = 10.24;
    fixed16_8 a = 8 * 1.23;
    fixed8_16 x = 2.3e-3;
    fixed8_16 x = 23.45e1;
    fixed8_16 x = 23.45e-2;
    fixed8_16 x = 0.;
    fixed8_16 x = -1.23;

  Constant rounding error example:
    Constant: 0.036
    Variable type: fixed16_8 (1 byte for decimals)
    Error calculation: 0.036*256=9.216
    The byte values assigned to the variable are simply: 0, 0, 9
    The error is: (9/256-0.036)/0.036 = -0.023
    The compiler prints this normalized error as a warning.
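
  The error calculation above can be reproduced with a short
  portable C sketch. The function names are hypothetical, and the
  sketch assumes that the scaled constant is truncated toward zero.
  Rounding to nearest gives the same stored value (9) in this
  example.

```c
#include <stdint.h>

/* Raw integer stored for a constant when the type has
   'decimal_bits' bits after the decimal point.
   Assumption: the scaled value is truncated toward zero. */
int64_t fixed_constant_raw(double constant, int decimal_bits)
{
    double scale = (double)((int64_t)1 << decimal_bits);
    return (int64_t)(constant * scale);
}

/* Normalized error: (stored value - constant) / constant.
   The constant must be nonzero. */
double fixed_constant_error(double constant, int decimal_bits)
{
    double scale  = (double)((int64_t)1 << decimal_bits);
    double stored = (double)fixed_constant_raw(constant, decimal_bits) / scale;
    return (stored - constant) / constant;
}
```

  With the constant 0.036 and 8 decimal bits, fixed_constant_raw
  returns 9 and fixed_constant_error returns about -0.023.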



TYPE CONVERSION
---------------

  The fixed point types are handled as subtypes of float. Type casts
  are therefore infrequently required.



FIXED POINT INTEROPERABILITY
----------------------------

  It is recommended to stick to one fixed point format in a program.
  The main problem when using mixed types is the enormous number of
  combinations which makes library support a challenge. However,
  many mixed operations are allowed when CC8E can map the types to
  the built in integer code generator:

    fixed8_16 a, b;
    fixed_16 c;
    a = b + c;      // OK, code is generated directly
    a = b * 10.22;  // OK: library function is supplied

    a = b * c;      // a new user library function is required!
    // A type cast can select an existing library function:
    a = b * (fixed8_16)c;
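
  The difference between the two cases can be sketched in portable
  C on raw integer values. Addition works directly because both
  types have 16 decimal bits. Multiplication yields 32 decimal bits
  and must be scaled back. The function names are hypothetical, the
  fixed8_16 value is held in an int32_t (24 bits used), and overflow
  of the 24 bit result is not checked. This is not CC8E library
  code.

```c
#include <stdint.h>

/* b: raw fixed8_16 value, c: raw fixed_16 value.
   Both have 16 decimal bits, so a plain integer add is enough. */
int32_t add_fixed8_16_fixed_16(int32_t b, int16_t c)
{
    return b + (int32_t)c;
}

/* The raw product has 16 + 16 = 32 decimal bits. Dividing by 2**16
   returns to 16 decimal bits (truncating toward zero). */
int32_t mul_fixed8_16_fixed_16(int32_t b, int16_t c)
{
    int64_t product = (int64_t)b * (int64_t)c;
    return (int32_t)(product / 65536);
}
```

  With b = 1.5 (raw 0x018000) and c = 0.25 (raw 0x4000), the sum is
  raw 0x01C000 (1.75) and the product is raw 0x006000 (0.375).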




INTEGER LIBRARIES
-----------------

  The integer math libraries allow selection between different
  optimizations, speed or size.

  The libraries contain operations for multiplication, division
  and division remainder.

    math16.h  : basic library, up to 16 bit, signed and unsigned
    math24.h  : basic library, up to 24 bit, signed and unsigned
    math32.h  : basic library, up to 32 bit, signed and unsigned

    math16m.h : speed & size, 8*8, 16*16
    math24m.h : speed & size, 8*8, 16*16, and 24*8 multiply.
    math32m.h : speed & size, 8*8, 16*16, and 32*8 multiply.
                These libraries can be used when execution speed
                is critical.
                NOTE 1: they must be included first (before math??.h)
                NOTE 2: math??.h contains similar functions (which
                        are deleted)

  The min and max timing cycles have been found by simulating many
  thousands of calculations. However, the min and max limits are not
  guaranteed to be correct.

    Sign: -: unsigned, S: signed