I get very weird behavior from a 'smaller than' (<) comparison on a float torch tensor. Consider the following snippet:
import torch

t = torch.load(r"value.pt")
print(t.shape, t.dtype)
# t = t.double()  # converting to float64 makes the problem disappear
for i in range(t.shape[0]):
    print(i, "%.20f" % (t[i].sum(-1) - 1))
print((t.sum(-1) - 1).abs() < 1e-6)
print("%.8e" % (t[35].sum() - 1),
      (t[35].sum(-1) - 1).abs() < 1e-6, (t[34:50].sum(-1) - 1).abs() < 1e-6,
      (t[34:40].sum(-1) - 1).abs() < 1e-6)
which produces the output
torch.Size([100, 1600]) torch.float32
...
33 -0.00000008132246875903
34 0.00000014945180737413
35 0.00000053211988415569
36 -0.00000006957179721212
37 -0.00000010645544534782
38 -0.00000000481304596178
...
tensor([ True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, False, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True],
device='cuda:0')
7.15255737e-07
tensor(True, device='cuda:0')
tensor([ True, False, True, True, True, True, True, True, True, True,
True, True, True, True, True, True], device='cuda:0')
tensor([True, True, True, True, True, True], device='cuda:0')
Firstly, it seems wrong that one of the row sums of t deviates from 1 by more than 1e-6, but that result also changes when I slice/index t differently. How does this make any sense? When I convert the tensor to a double tensor, the problem is gone.
1 Answer
This is because of the limited precision of float32.

Float32 (single precision) has roughly 7 significant decimal digits, while float64 (double precision) has roughly 15-16. Your threshold of 1e-6 is very close to the precision limit of float32 for values around 1, so the deviations you see are most likely just accumulated rounding error from the sum. When you use t.double(), PyTorch converts the tensor to float64, which makes the floating-point error of the sum far smaller than your threshold.
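As a rough illustration (the data below is random and only stands in for one row of your value.pt, so treat it as a sketch rather than a reproduction of your exact numbers), accumulating the same float32 values with a float32 accumulator versus a float64 accumulator already gives errors of quite different sizes:

import torch

torch.manual_seed(0)
# Made-up stand-in for one row of value.pt: 1600 values normalized in float64
# so that, up to float64 rounding, they sum to exactly 1, then cast to float32.
row64 = torch.rand(1600, dtype=torch.float64)
row64 = row64 / row64.sum()
row32 = row64.float()

# Summing the float32 data with a float32 vs. a float64 accumulator:
# the float32 accumulation error is typically on the order of 1e-7, i.e. not
# far below the 1e-6 threshold, while the float64 accumulation of the same
# stored values is usually one to two orders of magnitude closer to 1.
print("%.8e" % (row32.sum() - 1).abs())
print("%.8e" % (row32.double().sum() - 1).abs())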
I'm not sure about the indexing part, though. My guess is that the GPU reduction kernel partitions the work differently depending on the shape you pass it (a single row, a 16-row slice, or the full 100-row tensor), so the additions happen in a different order. Floating-point addition is not associative, so a different summation order can give a slightly different result, which is enough to push one value across the 1e-6 threshold in one case but not in another.
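You can see this order dependence even on the CPU with made-up data (again only a sketch, not your tensor): summing the same float32 values in one pass versus in chunks gives slightly different results:

import torch

torch.manual_seed(0)
x = torch.rand(1600, dtype=torch.float32)
x = x / x.sum()

# One reduction over all 1600 values vs. 16 partial sums added afterwards.
# The two computations round in different places, so the results usually
# differ by a small amount even though the input data is identical.
full = x.sum()
chunked = sum(c.sum() for c in x.chunk(16))

print("%.8e" % (full - chunked).abs())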