Software floating point library
Softgun includes a software floating point library for calculations with 32-bit
and 64-bit floating point numbers. The library has been verified against an AMD Opteron
using a large set of more or less random input values.
Softgun does not implement the floating point exceptions correctly in
the current version (0.19).
Getting the same result on an embedded system and a PC
Some companies build mobile data acquisition boards around a small microcontroller. It is
often necessary to get exactly the same result on the microcontroller and on the PC.
This sounds simple because the floating point operations are defined exactly in the
IEEE-754 standard. Surprisingly, it is normally the smaller microcontroller that
gives a reproducible result, while the Intel PC does not round correctly. There are two reasons for this:
Expansion to 80 bit on Intel compatible PCs
An Intel compatible CPU with an x87 FPU uses a floating point register stack and by
default does calculations with 80 bits, even for float and double. Rounding to the
declared type is done only when the floating point number is written back to memory.
Because all operations are done with 80 bits, intermediate results are more precise. The
problem is that your floating point result then depends on the compiler and the
optimization level, because these decide which intermediate values stay in registers.
When you use no optimization at all, or use only volatile variables, your result is closer to
the IEEE-754 result because rounding to 64/32 bit float is done after
every step of the calculation.
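The effect of rounding after every step can be sketched in C. This is a minimal illustration, assuming a C99 compiler; the function names are hypothetical and not part of Softgun. The first function forces every intermediate sum through a volatile double, the second carries the sum in long double and rounds once at the end, and the third reports the standard FLT_EVAL_METHOD macro from <float.h>, which indicates whether the compiler evaluates expressions in a wider format.

```c
#include <float.h>
#include <stddef.h>

/* Sum with every intermediate result rounded to a 64-bit double.
 * The volatile accumulator forces a store to memory after each addition. */
double sum_rounded_each_step(const double *v, size_t n)
{
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc = acc + v[i];
    }
    return acc;
}

/* Same sum carried in long double and rounded to double only once.
 * On an x87 FPU long double is the 80-bit format; on other targets
 * it may be wider or identical to double. */
double sum_rounded_once(const double *v, size_t n)
{
    long double acc = 0.0L;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return (double)acc;
}

/* FLT_EVAL_METHOD: 0 = evaluate in the declared type,
 * 2 = evaluate everything as long double, -1 = indeterminable. */
int eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}
```

For the input {1e16, 1.0, 1.0} the step-by-step version returns 1e16, because each 1.0 is lost when the sum is rounded to double. Where long double has a 64-bit significand, as on the x87, the second version keeps both additions and returns 1e16 + 2.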
Incorrect rounding of subnormal numbers
Many processors do not round subnormal numbers correctly (AMD Athlon, Pentium 4, Pentium 3,
Intel Atom). On these processors you should rearrange your formulas so that no subnormal intermediate results occur.
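To find out whether a formula produces subnormal intermediate results, each step can be checked during testing. The following is a minimal sketch, assuming IEEE-754 doubles; is_subnormal and mul_checked are hypothetical helper names chosen for illustration. A double is subnormal when it is nonzero and its magnitude is below DBL_MIN, the smallest normal value.

```c
#include <float.h>

/* Returns 1 if x is a nonzero value below the smallest normal double.
 * NaN compares false everywhere, so it is reported as not subnormal. */
int is_subnormal(double x)
{
    return x != 0.0 && x < DBL_MIN && x > -DBL_MIN;
}

/* Multiply two doubles and set *saw_subnormal to 1 if the rounded
 * result is subnormal. The volatile store forces rounding to double
 * before the check, even if the FPU calculates with more bits. */
double mul_checked(double a, double b, int *saw_subnormal)
{
    volatile double r = a * b;
    if (is_subnormal(r)) {
        *saw_subnormal = 1;
    }
    return r;
}
```

The standard fpclassify macro from <math.h> reports the same condition as FP_SUBNORMAL. If the flag is set for realistic inputs, the formula can often be rescaled or reordered so that the small factors are combined with large ones first.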
How do I get a reproducible result on a PC?
- Use only one size of floating point variables (float, double or long double).
- Restrict your FPU to this type with the FPU control word.
- Make sure that there are no subnormal intermediate results.
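Here is an example of how to switch the FPU of your Intel CPU to 64 bit (double) with gcc under Linux. The macros come from <fpu_control.h>, and the new control word takes effect only after it is written with _FPU_SETCW:
fpu_control_t cw = (_FPU_DEFAULT & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);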
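A complete version of the control word switch might look like the sketch below. It assumes gcc with glibc on an x86 target, where <fpu_control.h> provides fpu_control_t, _FPU_DEFAULT, _FPU_EXTENDED, _FPU_DOUBLE and _FPU_SETCW. The function name set_x87_double_precision and the HAVE_X87_CONTROL_WORD guard are hypothetical; the guard only lets the file compile on other platforms, where the function does nothing.

```c
#include <stdio.h>   /* a glibc header; it also defines __GLIBC__ */

#if defined(__GLIBC__) && (defined(__i386__) || defined(__x86_64__))
#include <fpu_control.h>
#define HAVE_X87_CONTROL_WORD 1
#else
#define HAVE_X87_CONTROL_WORD 0
#endif

/* Restrict the x87 FPU to a 53-bit significand (double).
 * Returns 1 if the control word was written, 0 if this platform
 * has no x87 control word accessible through <fpu_control.h>. */
int set_x87_double_precision(void)
{
#if HAVE_X87_CONTROL_WORD
    /* Clear the precision control bits, then select double precision. */
    fpu_control_t cw = (_FPU_DEFAULT & ~_FPU_EXTENDED) | _FPU_DOUBLE;
    _FPU_SETCW(cw);
    return 1;
#else
    return 0;
#endif
}
```

The function would typically be called once at program start, before any floating point calculation. Note that on x86-64 gcc normally calculates float and double with SSE instructions, which the x87 control word does not affect; there the setting changes long double arithmetic only.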