## Is my floating point number equal to zero?

One often sees code like the following, embodying the feeling programmers have that “you should never compare a float to zero”.

double func(double a, double b) { double diff = a - b; if (fabs(diff) < EPSILON) throw ...; else return 1/diff; }

But what are you trying to avoid here? Is it the divide by zero? Then avoid that…

double func(double a, double b) { double diff = a - b; if (diff == 0) throw ...; else return 1/diff; }

“But”, you say, “what if diff is really small”? You mean like 1e-128? Then this function will return 1e+128.
Floating point can deal with it. If you didn't need that dynamic range, you'ld be using `float`

instead of `double`

.

OK, what about 1e-309? Aha, different issue (denormal floats). Now you won't get a divide by zero, most likely you'll return Inf, and… your code might run very slowly. Aha! No-one wants slow code, right? OK, but then why the “if” in the first place? That's slowing down every call to the function. If your workloads mostly don't call the function with nearly equal very small numbers, then you can go a lot faster in the common case by not checking.

“Ah, but I want the additional error checking”. OK, fine. I'm with you. But is this the right place? Why did the caller call the routine with nearly equal very small numbers? Is that physically likely in whatever real-world problem you're trying to solve? If not, one should probably protect at the call site with an assert.

Or maybe it's a precondition of the function call that a not be equal to b, and it's a coding bug if not satisfied.

double func(double a, double b) { assert(a - b != 0); return 1/(a-b); }

Don't forget to leave your asserts on in all builds unless the profiler tells you otherwise. But hang on, you're on a platform which will give you line number here when you get the divby0 right? So the assert is superfluous – just enable exceptions on divide by zero, underflow, and overflow.

And there's also a very simple case. If I stored 0.3f somewhere, I expect to get it back. It's irrelevant that 0.3f doesn't have an exact binary representation – whatever it translates to, I should get exactly that back

void my_unit_test() { my_matrix<2,2,float> a = {3.0f, 1.1f, 2.2f, 4.0f}; TEST_ASSERT_EQ_APPROX(a(0,0), 3.0f, 1e-8f); // No, no, no. There is no reason for the stored value to be different. TEST_ASSERT_EQ(a(0,0), 3.0f); // I gave you whatever 3.0f is. I want it back. }

You may worry about excess precision on x86, but there's no computation going on here, so we can and should ask for exact equality.

### But my case is more subtle than that

double sinc(double x) { if (fabs(x) < EPSILON) return 1; else return sin(x)/x; }

Even here, you can think about what a better value to compare to might be. First note that sin(x)/x works fine for all floats except zero, so you can avoid a dependency on fabs and a definition of epsilon by just comparing to zero.

But perhaps the profiler has told you that a big chunk of program time is spent in computing sin() and dividing, and furthermore you know that most times it's called with values near zero. Well then, let's do what we do with any special function that goes slow: chop up the domain and special case. It's true that sin(x)/x is exactly floating point 1.0 when x is small. In fact, compute the taylor series, see that

sin(x)/x = 1 - x^2/6 + …

and realize the threshold is sqrt(eps*6). In fact it can be a bit larger, which may help if those values “near” zero were in the range 3e-8 to 1e-6. Experiment a bit and find this code:

double sinc(double x) { if (fabs(x) < 1e-6) return 1; else return sin(x)/x; }

It's a good idea then to write a blog post about the issue, because Toby Sharp might comment (see below) and point out that there is a much better choice:

“… since we need some kind of test for x=0 anyway, I don't see why we shouldn't take the opportunity to improve accuracy in this vicinity. And yes, it only costs 2 multiplies which is less than using the sin in those cases. So we are improving both performance and accuracy in the range (0, threshold). threshold is chosen to be the largest value such that the quadratic approximation is as good as the direct evaluation.”

float sinc(float x) { // Toby's threshold, determined through binary search as the best value for minimizing absolute error if (std::abs(x) < 0.0406015441f) return 1.0f - (x * x) * (1.0f / 6.0f); else return std::sin(x) / x; }

#### And templates?

Yes, the bare constant above should tell you that for templating over floating types you need to care about values of epsilons. It's not just a matter of numeric_limits<T>, but needs to be worked out for each case.

Another reason to compare to zero when you can.

Also, compares to zero may be faster on your platform. And you don't have to define epsilon somewhere.

### And more...

double sincminusone(double x) { if (fabs(x) < 1e-6) return 0; // But x^2/6 gives the right answer, not just truncating to the nearest zero. else return sin(x)/x-1; // Or try (sin(x)-x)/x and compare to zero instead... }

See also * https://stackoverflow.com/questions/51134021/floating-point-equality

## Discussion

Nice post, tricky subject. Every divide is a crash waiting to happen, or worse, a nan. Maybe people are concerned about denormalised numbers, whose reciprocals would overflow iirc?

I like to tell people that half the representable floating point numbers are between -1 and 1. So I think that units are pretty important. Should you really care about e-10 if your units are metres? Or radians? Where the units stop making sense is where you're going to find numerical problems.

The way I like to compute sinc: quadratic approximation close to zero. The threshold for when to take this approximation can be pre-determined by minimizing error against a more precisely computed value.

double sinc(double x) {

}

Toby, but for what range is 1-x^2/6 different from what sin(x)/x will produce? As far as I can see, it's only a speed question?

As an example, let's consider using floats with x = 0.00043f. (std::sin(x)/x) evaluates to 1, whereas the code below evaluates to 0.99999994. So using the direct method has an error of 6e-8. For doubles, the direct method has an error of 2.2e-16 compared to the Taylor series at x=0.000279. OK, these are small errors, but since we need some kind of test for x=0 anyway, I don't see why we shouldn't take the opportunity to improve accuracy in this vicinity. And yes, it only costs 2 multiplies which is less than using the sin in those cases. So we are improving both performance and accuracy in the range (0, threshold). threshold is chosen to be the largest value such that the quadratic approximation is as good as the direct evaluation.

float sinc(float x) {

}

Thanks Toby, great comments. Quite interesting how large the threshold is for these cases: 0.04 is certainly not something I would have guessed, or even obtained from considerations around sqrt(eps).