floating-point-equality

One often sees code like the following, embodying the feeling programmers have that “you should never compare a float to zero”.

double func(double a, double b) { double diff = a - b; if (fabs(diff) < EPSILON) throw ...; else return 1/diff; }

But what are you trying to avoid here? Is it the divide by zero exception? Then avoid that:

double func(double a, double b) { double diff = a - b; if (diff == 0) throw ...; else return 1/diff; }

“But”, you say, “what if diff is really small”? You mean like 1e-128? Then this function will return 1e+128. Floating point can deal with it. If you didn't need that dynamic range, you'ld be using `float`

instead of `double`

. And it would still be fine. If you're worried about floating underflow, then compare to a constant you named MAX_OF_FLT_MIN_AND_RECIPROCAL_OF_FLT_MAX, rather than some random EPSILON that is not related to either of these quantities.

Or maybe it's a precondition of the function call that a not be equal to b, and it's a coding bug if not satisfied.

double func(double a, double b) { assert(a != b); return 1/(a-b); }

Don't forget to leave your asserts on in all builds unless the profiler tells you otherwise. But hang on, you're on a platform which will give you line number here when you get the divby0 right? So the assert is superfluous.

double sinc(double x) { if (fabs(x) < EPSILON) return 1; else return sin(x)/x; }

Even here, you can think about what a better value to compare to might be. First note that sin(x)/x works fine for all normalized floats except zero, so you can avoid a dependency on fabs and a definition of epsilon by just comparing to zero.

But perhaps the profiler has told you that a big chunk of program time is spent in computing sin() and dividing, and furthermore you know that most times it's called with values near zero. Well then, let's do what we do with any special function that goes slow: chop up the domain and special case. It's true that sin(x)/x is exactly floating point 1.0 when x is small. In fact, compute the taylor series, see that

sin(x)/x = 1 - x^2/6 + …

and realize the threshold is sqrt(eps*6). In fact it can be a bit larger, which may help if those values “near” zero were in the range 3e-8 to 1e-6. Experiment a bit and find this code:

double sinc(double x) { if (fabs(x) < 1e-6) return 1; else return sin(x)/x; }

Yes, the bare constant above should tell you that for templating over floating types you need to care about values of epsilons. It's not just a matter of numeric_limits<T>, but needs to be worked out for each case.

Another reason to compare to zero when you can.

Also, compares to zero may be faster on your platform. And you don't have to define epsilon somewhere.

floating-point-equality.txt · Last modified: 2017/04/19 12:15 by awf

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution 4.0 International

## Discussion

Nice post, tricky subject. Every divide is a crash waiting to happen, or worse, a nan. Maybe people are concerned about denormalised numbers, whose reciprocals would overflow iirc?

I like to tell people that half the representable floating point numbers are between -1 and 1. So I think that units are pretty important. Should you really care about e-10 if your units are metres? Or radians? Where the units stop making sense is where you're going to find numerical problems.

The way I like to compute sinc: quadratic approximation close to zero. The threshold for when to take this approximation can be pre-determined by minimizing error against a more precisely computed value.

double sinc(double x) {

}

Toby, but for what range is 1-x^2/6 different from what sin(x)/x will produce? As far as I can see, it's only a speed question?

As an example, let's consider using floats with x = 0.00043f. (std::sin(x)/x) evaluates to 1, whereas the code below evaluates to 0.99999994. So using the direct method has an error of 6e-8. For doubles, the direct method has an error of 2.2e-16 compared to the Taylor series at x=0.000279. OK, these are small errors, but since we need some kind of test for x=0 anyway, I don't see why we shouldn't take the opportunity to improve accuracy in this vicinity. And yes, it only costs 2 multiplies which is less than using the sin in those cases. So we are improving both performance and accuracy in the range (0, threshold). threshold is chosen to be the largest value such that the quadratic approximation is as good as the direct evaluation.

float sinc(float x) {

}

Thanks Toby, great comments. Quite interesting how large the threshold is for these cases: 0.04 is certainly not something I would have guessed, or even obtained from considerations around sqrt(eps).