I get very weird behavior from a 'smaller than' (<) comparison on a float torch tensor. Consider the following snippet:
import torch

t = torch.load(r"value.pt")
print(t.shape, t.dtype)
# t = t.double()  # converting to float64 makes the problem disappear
for i in range(t.shape[0]):
    print(i, "%.20f" % (t[i].sum(-1) - 1))
print((t.sum(-1) - 1).abs() < 1e-6)
print("%.8e" % (t[35].sum() - 1),
      (t[35].sum(-1) - 1).abs() < 1e-6, (t[34:50].sum(-1) - 1).abs() < 1e-6,
      (t[34:40].sum(-1) - 1).abs() < 1e-6)
which produces the output
torch.Size([100, 1600]) torch.float32
...
33 -0.00000008132246875903
34 0.00000014945180737413
35 0.00000053211988415569
36 -0.00000006957179721212
37 -0.00000010645544534782
38 -0.00000000481304596178
...
tensor([ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, False, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True],
device='cuda:0')
7.15255737e-07
tensor(True, device='cuda:0')
tensor([ True, False, True, True, True, True, True, True, True, True,
True, True, True, True, True, True], device='cuda:0')
tensor([True, True, True, True, True, True], device='cuda:0')
Firstly, it seems wrong that one of the row sums of t deviates from 1 by more than 1e-6, but that result also changes when I slice/index t differently. How does this make any sense? When I convert the tensor to a double tensor, the problem is gone.
1 Answer
This is because of the limited precision of float32.

Float32 (single precision) has roughly 7 significant decimal digits, while float64 (double precision) has roughly 15-16. Your threshold of 1e-6 is very close to the precision limit of float32 for values around 1, so the deviations you see are most likely just accumulated rounding error from the sum. When you use t.double(), PyTorch converts the tensor to float64, which makes the floating-point error of the sum far smaller than your threshold.
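As a rough illustration (the data below is random and only stands in for one row of your value.pt, so treat it as a sketch rather than a reproduction of your exact numbers), accumulating the same float32 values with a float32 accumulator versus a float64 accumulator already gives errors of quite different sizes:

import torch

torch.manual_seed(0)
# Made-up stand-in for one row of value.pt: 1600 values normalized in float64
# so that, up to float64 rounding, they sum to exactly 1, then cast to float32.
row64 = torch.rand(1600, dtype=torch.float64)
row64 = row64 / row64.sum()
row32 = row64.float()

# Summing the float32 data with a float32 vs. a float64 accumulator:
# the float32 accumulation error is typically on the order of 1e-7, i.e. not
# far below the 1e-6 threshold, while the float64 accumulation of the same
# stored values is usually one to two orders of magnitude closer to 1.
print("%.8e" % (row32.sum() - 1).abs())
print("%.8e" % (row32.double().sum() - 1).abs())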
I'm not sure about the indexing part, though. My guess is that the GPU reduction kernel partitions the work differently depending on the shape you pass it (a single row, a 16-row slice, or the full 100-row tensor), so the additions happen in a different order. Floating-point addition is not associative, so a different summation order can give a slightly different result, which is enough to push one value across the 1e-6 threshold in one case but not in another.
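You can see this order dependence even on the CPU with made-up data (again only a sketch, not your tensor): summing the same float32 values in one pass versus in chunks gives slightly different results:

import torch

torch.manual_seed(0)
x = torch.rand(1600, dtype=torch.float32)
x = x / x.sum()

# One reduction over all 1600 values vs. 16 partial sums added afterwards.
# The two computations round in different places, so the results usually
# differ by a small amount even though the input data is identical.
full = x.sum()
chunked = sum(c.sum() for c in x.chunk(16))

print("%.8e" % (full - chunked).abs())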