A Detailed Study of the Numerical Accuracy of GPU-Implemented Math Functions

Dan Fay, Ali Sazegari, Daniel A. Connors.
Supercomputing '06 Workshop on General-Purpose GPU Computing: Practice And Experience. November, 2006.
Modern programmable GPUs have demonstrated their ability to significantly accelerate certain important classes of non-graphics applications; however, GPUs' slipshod support for floating-point arithmetic severely limits their usefulness for general-purpose computing. Current GPUs do not support double-precision computation, and their single-precision support glosses over important aspects of the IEEE-754 floating-point standard. Producing correctly rounded results and providing proper closure of the number system are critical for the adoption of GPUs for general-purpose computing. Previous studies of GPUs' numerical accuracy quantified only the "overall" accuracy of different arithmetic and math functions on the GPU, reporting an average error and/or an error bound for each operation. Since many algorithms' correctness depends on the precise, consistent results provided by the IEEE-754 floating-point standard [5], it is also essential to quantify exactly how correct GPUs are for important edge cases. These edge cases deliberately expose numeric errors likely to occur in IEEE-754 implementations, such as inputs involving denormalized numbers, +/-0, infinities, and Not a Number (NaN). GPUs must also provide the programmer with robust results: given the same input, the same program should produce the same output regardless of the GPU platform. Such robustness needs to exist not only between GPUs from different vendors, but also across GPU software platforms (shader language compiler, driver, and operating system) and across each vendor's GPU families. To investigate edge-case correctness and robustness, we tested the accuracy of the basic arithmetic operators (add, subtract, multiply, and divide) as well as other important math functions (sine, cosine, tangent, exponential, etc.). These tests were run on a variety of GPU platforms from both ATi and nVIDIA.
For the math functions, we tested results produced by the functions built into the OpenGL Shading Language (GLSL) as well as results produced by a GPU port of the high-performance Cephes Math Library. We then compared these results against reference values produced by libm and by vForce, Apple Computer's high-performance vectorized math library. Our results show serious errors in the GPUs' results at certain edge cases, in addition to incorrect handling of denormalized numbers. One example is the incorrect handling of -0, which causes problems with division: +1/-0 should equal -infinity, not +infinity. Another example is square root: the sqrt() function in GLSL ignores the sign of its operand entirely, returning a positive normal number instead of a NaN for negative inputs. Finally, we observed inconsistencies between GPUs from different vendors. For example, for 0/0 the nVIDIA GeForce FX 7300 correctly produces a NaN result, while the ATi x1600 hardware produces an incorrect result of +0.
