Floating Point Precision with Log Likelihoods
Computing the log likelihood is a common task in probabilistic machine learning, but it can easily under- or overflow. I discuss one such issue and its resolution.
The problem: floating point precision
In my research, I encountered an odd issue: my log likelihood was always nan or -inf. The problem was that this term was always -inf:

$$
\log \det(\mathbf{A})
$$

for some diagonal matrix $\mathbf{A}$. To understand what was happening, consider the scenario where the elements of $\mathbf{A}$ are very small. Since the determinant of a diagonal matrix is just the product of the elements along its diagonal,

$$
\det(\mathbf{A}) = \prod_{i=1}^{N} a_{ii},
$$

if these values are small, their product might be rounded to 0 due to our computer's floating point precision. Then when we take the log of 0, we get -inf. To see this, consider this Python snippet:
>>> import numpy as np
>>> A = np.ones(100) * 1e-5
>>> np.linalg.det(np.diag(A))
0.0
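Taking the log of that underflowed determinant is exactly where the -inf comes from (NumPy also emits a divide-by-zero RuntimeWarning here):

>>> np.log(np.linalg.det(np.diag(A)))
-inf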
To convince ourselves that floating point precision is the culprit, let us see if the above calculation results in a number that is too small for our machine. On my machine, the smallest positive floating point number Python can represent is
>>> import sys
>>> sys.float_info.min * sys.float_info.epsilon
5e-324
I can use the Python REPL to quickly see that 5e-324 is handled fine but 5e-325 is rounded to 0.0:
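>>> 5e-324
5e-324
>>> 5e-325
0.0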
Now consider our previous snippet of code. The determinant is going to be 1e-5 ** 100, or 1e-500. This is clearly too small to be represented in Python, and so it will be rounded to 0. Since 324 / 5 = 64.8, we can predict that (1e-5)**64 will cause no issues, while (1e-5)**65 will be rounded to 0:
>>> (1e-5)**64
1e-320
>>> (1e-5)**65
0.0
So we have found our issue. And note that for a large enough $N$, the numbers along the diagonal do not have to be that small to result in a rounding error.
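For example, with a hypothetical 200-dimensional diagonal matrix whose entries are all 0.01, the determinant (1e-400) still underflows:

>>> A = np.ones(200) * 1e-2
>>> np.linalg.det(np.diag(A))
0.0

How can we fix this?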
The solution: a little math
As mentioned above, the determinant of a diagonal matrix is the product of the elements along its diagonal,

$$
\det(\mathbf{A}) = \prod_{i=1}^{N} a_{ii}.
$$

So the log of this is

$$
\log \det(\mathbf{A}) = \log \prod_{i=1}^{N} a_{ii} = \sum_{i=1}^{N} \log a_{ii}.
$$
If we compute the right-hand side instead of the left-hand side, we might avoid an issue with floating point precision, because we are taking the logs of much bigger numbers (the individual diagonal elements rather than their tiny product) and then adding them together. Let's try it:
>>> import numpy as np
>>> A = np.ones(100) * 1e-5
>>> np.linalg.det(np.diag(A))
0.0
>>> np.log(A).sum()
-1151.2925464970228
Okay! Now we have a better estimate using finite-precision computers. As a sanity check, if our numbers are not small, these two calculations should give us roughly the same number:
>>> A = np.ones(100) * 2
>>> np.log(np.linalg.det(np.diag(A)))
69.31471805599459
>>> np.log(A).sum()
69.31471805599453
And they do.
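As an aside, NumPy also provides np.linalg.slogdet, which returns the sign and the log of the absolute value of the determinant and is designed to be robust against exactly this kind of under- or overflow, even for matrices that are not diagonal. A minimal sanity check against our sum-of-logs trick:

>>> A = np.ones(100) * 1e-5
>>> sign, logabsdet = np.linalg.slogdet(np.diag(A))
>>> np.allclose(logabsdet, np.log(A).sum())
True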
Acknowledgements
I want to thank Bianca Dumitrascu for catching this bug and explaining the issue.