I have not been able to wrap my head around this, and ChatGPT seems to think this shouldn't be the case. Why does setting requires_grad=True
when creating a tensor cause it to lose its leaf status when transferring it to a GPU? For example (tested in a Google Colab notebook):
b = torch.rand(10).cuda()
b.is_leaf # True
and
b = torch.rand(10, requires_grad=True)
b.is_leaf # True
but
b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf # False
I realize that b = torch.rand(10, requires_grad=True, device='cuda') causes b to retain its leaf status on the GPU, which is a perfectly fine workaround. However, I am very confused by the behavior described above.
1 Answer
From the pytorch documentation:
All Tensors that have requires_grad which is False will be leaf Tensors by convention.
For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None.
For your first example:
b = torch.rand(10).cuda()
b.is_leaf # True
b has requires_grad=False, so it is considered a leaf tensor.
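This follows directly from the convention quoted above: every tensor with requires_grad=False is a leaf, even when it is produced by an operation. A minimal CPU-only sketch (no GPU required):

```python
import torch

# With requires_grad=False (the default), no autograd graph is built,
# so even the result of an operation counts as a leaf by convention.
a = torch.rand(10) * 2   # produced by a multiplication
print(a.is_leaf)         # True
print(a.grad_fn)         # None: nothing was recorded
```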
Your second example:
b = torch.rand(10, requires_grad=True)
b.is_leaf # True
b is created by the user and has requires_grad=True, so it is a leaf tensor.
Your third example:
b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf # False
Here it returns False because creating the tensor and moving it to the GPU are two separate operations. The leaf tensor is the original torch.rand(10, requires_grad=True); calling .cuda() is a separate operation that returns a new tensor.
From the pytorch documentation:
>>> a = torch.rand(10, requires_grad=True)
>>> a.is_leaf
True
>>> b = torch.rand(10, requires_grad=True).cuda()
>>> b.is_leaf
False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor
This can be seen more explicitly by inspecting other tensor attributes:
b = torch.rand(10, requires_grad=True)
b.data_ptr()
> 1934898880
b.grad_fn
> None
b.is_leaf
> True
c = b.cuda()
c.data_ptr()
> 140481214283776 # different data pointer
c.grad_fn
> <ToCopyBackward0 at 0x7fc5f44839a0> # has grad function
c.is_leaf # is no longer a leaf tensor
> False
In the above, b is created by the user and has requires_grad=True, so it is a leaf tensor. Consistent with this, b's grad_fn is None.
We create c = b.cuda(), which creates a new tensor object; we can verify this by seeing it has a different data_ptr. c has grad_fn=ToCopyBackward0, which is how pytorch backpropagates through the device-transfer operation.
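Losing leaf status does not break training: gradients still flow through the copy back to the original leaf. The sketch below uses a dtype cast instead of .cuda() so it runs without a GPU; the autograd mechanics of the copy are the same.

```python
import torch

b = torch.rand(10, requires_grad=True)   # user-created leaf on the CPU

# A dtype cast, like a device transfer, is a differentiable copy that
# returns a new, non-leaf tensor.
c = b.double()
print(c.is_leaf)    # False

# Backward through the copy: the gradient lands on the leaf b,
# while the non-leaf c does not retain a .grad of its own by default.
c.sum().backward()
print(b.grad)       # tensor of ones, since d(sum(c))/db = 1
```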
Finally, when you use b = torch.rand(10, requires_grad=True, device='cuda'), you are creating b directly on the GPU in a single operation, so it is a leaf tensor.
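This distinction mainly matters when you want to update the tensor itself, since by default only leaf tensors accumulate .grad. A guarded sketch (it simply skips when no GPU is available):

```python
import torch

if torch.cuda.is_available():
    # Created on the GPU in one step: the user-created tensor *is*
    # the CUDA tensor, so it stays a leaf and can accumulate .grad.
    b = torch.rand(10, requires_grad=True, device='cuda')
    print(b.is_leaf, b.grad_fn)   # True None

    b.sum().backward()
    print(b.grad.device)          # the gradient lives on the GPU too
```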