
Pytorch 'smaller than' operation on tensor gives wrong result - Stack Overflow


I get very weird behavior from a 'smaller than' (<) operation on a float torch tensor. Consider the following snippet:

import torch

t = torch.load(r"value.pt")
print(t.shape, t.dtype)

#t = t.double()
for i in range(t.shape[0]):
    print(i, "%.20f" % (t[i].sum(-1) - 1))

print((t.sum(-1) - 1).abs() < 1e-6)

print("%.8e" % (t[35].sum() - 1),
      (t[35].sum(-1) - 1).abs() < 1e-6,
      (t[34:50].sum(-1) - 1).abs() < 1e-6,
      (t[34:40].sum(-1) - 1).abs() < 1e-6)

which produces the output

torch.Size([100, 1600]) torch.float32

...
33 -0.00000008132246875903
34 0.00000014945180737413
35 0.00000053211988415569
36 -0.00000006957179721212
37 -0.00000010645544534782
38 -0.00000000481304596178
...

tensor([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True, False,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True,  True],
       device='cuda:0')
7.15255737e-07 
tensor(True, device='cuda:0') 
tensor([ True, False,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True,  True,  True,  True,  True], device='cuda:0') 
tensor([True, True, True, True, True, True], device='cuda:0')

First of all, it is wrong that one of the entries of t deviates from 1 by more than 1e-6 (the printed value for row 35 is only about 5.3e-7), but on top of that the result changes when I slice/index t differently. How does this make any sense? When I convert the tensor to a double tensor, the problem is gone.


1 Answer


This is a floating-point precision issue.

Float32 (single precision) carries roughly 7 decimal digits of precision, while float64 (double precision) carries roughly 15-16. Your threshold of 1e-6 is very close to the precision limit of float32, so deviations of that size are plausibly just accumulated rounding error from summing 1600 float32 values. When you use t.double(), PyTorch converts the tensor to float64 and the accumulated floating-point error becomes much smaller.
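As a rough illustration (with random data standing in for value.pt, so the exact numbers will differ), summing the same 1600 float32 values once in float32 and once in float64 shows the size of this accumulation error:

import torch

# Hypothetical stand-in for one row of the OP's tensor: 1600 float32
# values normalized so their true sum is very close to 1.
torch.manual_seed(0)
row = torch.rand(1600, dtype=torch.float32)
row = row / row.sum()

s32 = row.sum()            # accumulated in float32
s64 = row.double().sum()   # same values, accumulated in float64

print("float32 sum - 1: %.3e" % (s32 - 1))
print("float64 sum - 1: %.3e" % (s64 - 1))
print("error from float32 accumulation alone: %.3e" % (s64 - s32.double()))

The float64 sum of 1600 values of this size is essentially exact, so the last line isolates the error that comes purely from accumulating in float32.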

I'm not sure about the indexing part, though. My guess is that it comes from CUDA's non-deterministic behavior: depending on how you slice, certain operations on the GPU may use a different execution (accumulation) order, and in float32 a different summation order can produce slightly different results, which is enough to flip a comparison that sits right at the threshold.
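Here is a small sketch of that order dependence (on the CPU and with made-up data, not the OP's tensor): the comparison itself is exact, but the float32 sum that feeds it can change with the accumulation order, for example when the reduction is split into chunks the way a GPU kernel splits work across blocks:

import torch

# Sketch only: in float32, summing the same values in a different order can
# give a slightly different result, so a kernel that reduces the full
# 100x1600 tensor need not agree bit-for-bit with one that reduces a single
# row or a small slice of rows.
torch.manual_seed(0)
row = torch.rand(1600, dtype=torch.float32)
row = row / row.sum()

sequential = row.sum()
# A different accumulation order: sum 32 chunks of 50 elements first, then
# combine the partial sums (loosely mimicking a blocked GPU reduction).
chunked = torch.stack([c.sum() for c in row.chunk(32)]).sum()

print("%.9f  %.9f  diff = %.3e" % (sequential, chunked, sequential - chunked))

If two such sums land on opposite sides of the 1e-6 threshold, the < comparison flips, which would match the slice-dependent True/False seen above.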
