
pytorch - Why does setting requires_grad=True upon tensor creation specifically cause loss of leaf status after transfer to GPU?


I have not been able to wrap my head around this, and ChatGPT seems to think this shouldn't be the case. Why does setting requires_grad=True when creating a tensor cause it to lose its leaf status when transferring it to a GPU? For example (tested in a Google Colab notebook):

b = torch.rand(10).cuda()
b.is_leaf  # True

and

b = torch.rand(10, requires_grad=True)
b.is_leaf  # True

but

b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf  # False

I realize that b = torch.rand(10, requires_grad=True, device='cuda') causes b to retain its leaf status on the GPU, which is a perfectly fine workaround. However, I am very confused by the above-mentioned behavior.


asked Mar 13 at 22:10 by rkp

1 Answer

From the PyTorch documentation:

All Tensors that have requires_grad which is False will be leaf Tensors by convention.

For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None.

For your first example:

b = torch.rand(10).cuda()
b.is_leaf  # True

b has requires_grad=False, so it is considered a leaf tensor.

Your second example:

b = torch.rand(10, requires_grad=True)
b.is_leaf  # True

b is created by the user and has requires_grad=True, so it is a leaf tensor.

Your third example:

b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf  # False

Here b.is_leaf returns False because creating the tensor and moving it to the GPU are two separate operations. The leaf tensor is the original tensor torch.rand(10, requires_grad=True). Calling .cuda() is a separate operation that returns a new tensor.

From the PyTorch documentation:

>>> a = torch.rand(10, requires_grad=True)
>>> a.is_leaf
True
>>> b = torch.rand(10, requires_grad=True).cuda()
>>> b.is_leaf
False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor

This can be seen more explicitly by inspecting other tensor attributes:

b = torch.rand(10, requires_grad=True)
b.data_ptr()
> 1934898880
b.grad_fn
> None
b.is_leaf
> True

c = b.cuda()
c.data_ptr()
> 140481214283776 # different data pointer
c.grad_fn
> <ToCopyBackward0 at 0x7fc5f44839a0> # has grad function
c.is_leaf # is no longer a leaf tensor
> False

In the above, b is created by the user and has requires_grad=True, so it is a leaf tensor. Consistent with this, b's grad_fn is None.

We then create c = b.cuda(), which returns a new tensor object; the different data_ptr confirms this. c has grad_fn=ToCopyBackward0, which is how PyTorch backpropagates through the device-transfer operation.
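As a consequence, gradients flow back through ToCopyBackward0 and accumulate on the CPU leaf b, not on the non-leaf GPU copy c. A minimal sketch, continuing the example above (the gradient of sum() with respect to each element is 1, so b.grad is a tensor of ones):

c.sum().backward()  # backpropagates through ToCopyBackward0 to the CPU leaf
b.grad
> tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])  # gradient lands on b, on the CPU
c.grad  # non-leaf tensors do not retain .grad by default
> None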

Finally, when you use b = torch.rand(10, requires_grad=True, device='cuda'), you are creating b directly on the GPU in a single operation, so it is a leaf tensor.
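You can verify this directly (a minimal sketch): the GPU-created tensor has no grad_fn, and after a backward pass its gradient accumulates in place on the GPU.

b = torch.rand(10, requires_grad=True, device='cuda')
b.grad_fn
> None  # no creating operation was recorded
b.is_leaf
> True
b.sum().backward()
b.grad.device
> device(type='cuda', index=0)  # the gradient lives on the GPU leaf itself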
