I have not been able to wrap my head around this, and ChatGPT seems to think this shouldn't be the case. Why does setting requires_grad=True
when creating a tensor cause it to lose its leaf status when transferring it to a GPU? For example (tested in a Google Colab notebook):
b = torch.rand(10).cuda()
b.is_leaf # True
and
b = torch.rand(10, requires_grad=True)
b.is_leaf # True
but
b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf # False
I realize that b = torch.rand(10, requires_grad=True, device='cuda') causes b to retain its leaf status on the GPU, which is a perfectly fine workaround. However, I am very confused by the behavior described above.
1 Answer
From the pytorch documentation:
All Tensors that have requires_grad which is False will be leaf Tensors by convention.
For Tensors that have requires_grad which is True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so grad_fn is None.
For your first example:
b = torch.rand(10).cuda()
b.is_leaf # True
b has requires_grad=False, so it is considered a leaf tensor.
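This follows directly from the convention quoted above: every tensor with requires_grad=False is a leaf, even when it is produced by an operation. A minimal CPU-only sketch (no GPU required):

```python
import torch

# With requires_grad=False (the default), no autograd graph is built,
# so even the result of an operation counts as a leaf by convention.
a = torch.rand(10) * 2   # produced by a multiplication
print(a.is_leaf)         # True
print(a.grad_fn)         # None: nothing was recorded
```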
Your second example:
b = torch.rand(10, requires_grad=True)
b.is_leaf # True
b is created by the user and has requires_grad=True, so it is a leaf tensor.
Your third example:
b = torch.rand(10, requires_grad=True).cuda()
b.is_leaf # False
Here it returns False because creating the tensor and moving it to the GPU are two separate operations. The leaf tensor is the original torch.rand(10, requires_grad=True); calling .cuda() is a separate operation that returns a new tensor.
From the pytorch documentation:
>>> a = torch.rand(10, requires_grad=True)
>>> a.is_leaf
True
>>> b = torch.rand(10, requires_grad=True).cuda()
>>> b.is_leaf
False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor
This can be seen more explicitly by inspecting other tensor attributes:
b = torch.rand(10, requires_grad=True)
b.data_ptr()
> 1934898880
b.grad_fn
> None
b.is_leaf
> True
c = b.cuda()
c.data_ptr()
> 140481214283776 # different data pointer
c.grad_fn
> <ToCopyBackward0 at 0x7fc5f44839a0> # has grad function
c.is_leaf # is no longer a leaf tensor
> False
In the above, b is created by the user and has requires_grad=True, so it is a leaf tensor. Consistent with this, b's grad_fn is None.
We create c = b.cuda(), which creates a new tensor object; we can verify this by seeing it has a different data_ptr. c has grad_fn=ToCopyBackward0, which is how pytorch backpropagates through the device-transfer operation.
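Losing leaf status does not break training: gradients still flow through the copy back to the original leaf. The sketch below uses a dtype cast instead of .cuda() so it runs without a GPU; the autograd mechanics of the copy are the same.

```python
import torch

b = torch.rand(10, requires_grad=True)   # user-created leaf on the CPU

# A dtype cast, like a device transfer, is a differentiable copy that
# returns a new, non-leaf tensor.
c = b.double()
print(c.is_leaf)    # False

# Backward through the copy: the gradient lands on the leaf b,
# while the non-leaf c does not retain a .grad of its own by default.
c.sum().backward()
print(b.grad)       # tensor of ones, since d(sum(c))/db = 1
```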
Finally, when you use b = torch.rand(10, requires_grad=True, device='cuda'), you are creating b directly on the GPU in a single operation, so it is a leaf tensor.
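This distinction mainly matters when you want to update the tensor itself, since by default only leaf tensors accumulate .grad. A guarded sketch (it simply skips when no GPU is available):

```python
import torch

if torch.cuda.is_available():
    # Created on the GPU in one step: the user-created tensor *is*
    # the CUDA tensor, so it stays a leaf and can accumulate .grad.
    b = torch.rand(10, requires_grad=True, device='cuda')
    print(b.is_leaf, b.grad_fn)   # True None

    b.sum().backward()
    print(b.grad.device)          # the gradient lives on the GPU too
```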