Ok, let's do a quick post on an annoying bug I encountered today.
Recently, I have been working on some algorithms for Bayesian optimization. While extensively benchmarking my new implementations against existing methods, I noticed some strange behavior: in a certain setup, the output of the function was completely flat (a horizontal line), which is by no means the expected behavior.
I was absolutely clueless at first, as I had been working on this code for several weeks and the results were generally good. After hours of debugging, I realized the problem was a mismatch in the shapes of two tensors, introduced by a wrong call to squeeze().
Say I have two tensors x and y, both two-dimensional and of the same shape. At the end of the numerical expression, we reduce the last dimension by taking the mean along it.
import torch
x = torch.empty((10, 1))
y = torch.empty((10, 1))
z = x + y
w = z.mean(dim=-1)
print(f"x: {x.shape}")
print(f"y: {y.shape}")
print(f"z: {z.shape}")
print(f"w: {w.shape}")
x: torch.Size([10, 1])
y: torch.Size([10, 1])
z: torch.Size([10, 1])
w: torch.Size([10])
We simply do an element-wise addition of x and y, and then take the mean along the last dimension. The final result w should be a one-dimensional tensor.
Now, let's assume we screw up the dimensionality a little bit.
x2 = torch.empty(10)
y2 = torch.empty((10, 1))
z2 = x2 + y2
w2 = z2.mean(dim=-1)
print(f"x2: {x2.shape}")
print(f"y2: {y2.shape}")
print(f"z2: {z2.shape}")
print(f"w2: {w2.shape}")
x2: torch.Size([10])
y2: torch.Size([10, 1])
z2: torch.Size([10, 10])
w2: torch.Size([10])
Because x2 is only one-dimensional, z2 = x2 + y2 becomes a broadcasted addition instead of an element-wise one. While the resulting tensor z2 has a wrong shape of (10, 10), the very next step is a dimension reduction, which produces a tensor w2 of the correct shape. The math behind it, however, is completely wrong. Because the broadcasted version is programmatically valid, no error is raised, and on the numerical side it is hard to notice as well.
The first version computes
$$w_i = x_i + y_i,$$
while the second version computes
$$w2_i = y2_i + \overline{x2},$$
where the bar denotes the mean of x2 over its elements.
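To make this concrete, here is a quick standalone check (a sketch using random data rather than the torch.empty tensors above) that the broadcasted result really is y2 plus the mean of x2:
import torch

torch.manual_seed(0)
x2 = torch.randn(10)       # shape (10,)
y2 = torch.randn(10, 1)    # shape (10, 1)

z2 = x2 + y2               # broadcasts: z2[i, j] = x2[j] + y2[i, 0], shape (10, 10)
w2 = z2.mean(dim=-1)       # shape (10,)

# every entry is y2_i plus the mean of the whole of x2, not y2_i + x2_i
print(torch.allclose(w2, y2.squeeze(-1) + x2.mean()))   # True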
The difference can be small when one tensor dominates the other, or when the elements of x are nearly identical (so each element is close to the mean). Especially in the context of Bayesian optimization, lots of tensor operations are involved, and the "correct" values are not immediately clear to the user. To make it worse, many tensors are not explicitly exposed unless the user digs into the code. All in all, there is no easy way to tell that the numbers are wrong.
I was lucky enough to be dealing with an edge case where y is 0, so instead of getting w = x, I got w2 = mean(x), whose elements are all the same constant value. That made it obvious enough that something was wrong.
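For illustration, here is roughly what that edge case looks like (a sketch with made-up data; the real tensors come from the Gaussian process code):
import torch

torch.manual_seed(0)
x = torch.randn(10)       # accidentally squeezed down to shape (10,)
y = torch.zeros(10, 1)    # the edge case: y happens to be all zeros

w = (x + y).mean(dim=-1)  # intended result: w == x
print(w)                  # instead, every entry equals x.mean() -- a flat, horizontal line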
As for the root cause? A wrong call to squeeze() in another function.
import torch

def some_func(x):
    # do something here
    return x.squeeze(-1)

# some dozens of lines of code here
x = torch.empty((10, 1))
y = torch.empty((10, 1))
x_new = some_func(x)   # x_new has shape (10,), not (10, 1)
z = x_new + y          # silently broadcasts to (10, 10)
w = z.mean(dim=-1)
I had happily assumed x_new was still of shape (10, 1) for weeks...
As you can see, this can go unnoticed fairly easily. Due to the random nature of the Gaussian processes (I use deterministic random numbers, but the whole outcome is still "unknown" to me), it is really hard to tell that there is a hidden mathematical bug.
I will probably add some assert statements here and there to make sure the dimensions are always as intended. Another way is to use runtime shape annotations such as jaxtyping instead of relying on static linters alone.
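A minimal sketch of both ideas (assuming a recent jaxtyping together with beartype as the runtime type checker, neither of which the original code necessarily uses; the function names are just illustrative):
import torch
from torch import Tensor
from jaxtyping import Float, jaxtyped
from beartype import beartype as typechecker

# Option 1: plain asserts around shape-changing code
def some_func_asserted(x: Tensor) -> Tensor:
    assert x.ndim == 2 and x.shape[-1] == 1, f"expected shape (n, 1), got {tuple(x.shape)}"
    return x.squeeze(-1)

# Option 2: declare the expected shapes in the signature and check them at call time
@jaxtyped(typechecker=typechecker)
def some_func_typed(x: Float[Tensor, "n 1"]) -> Float[Tensor, "n"]:
    return x.squeeze(-1)
With either variant, the accidental (10,) input fails loudly at the call site instead of silently broadcasting later.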