Say I have obtained some alphas and betas as parameters from a neural network, which will serve as the parameters of a Beta distribution. I then sample from the Beta distribution, calculate some loss, and back-propagate through the samples obtained. Is it possible to do that, given that after sampling I call .requires_grad_(True) on the sample and then compute the loss? This runs without error, but the loss does not seem to converge. Is there another way to do this in PyTorch?
Say I get the following variables from a neural network:
mu, sigma, pred = model.forward(input)
where mu is a (batch_size x 30) shaped tensor, and sigma is likewise a (batch_size x 30) shaped tensor. I compute the alphas and betas from the mu and sigma produced by the network (both also of shape (batch_size x 30); a hypothetical sketch of this conversion follows the sampling code below), and then sample from a Beta distribution as follows:
def sample_from_beta_distribution(alpha, beta, eps=1e-6):
    # Clamp alpha and beta so they are strictly positive
    alpha_positive = torch.clamp(alpha, min=eps)
    beta_positive = torch.clamp(beta, min=eps)
    # Create a Beta distribution; this broadcasts over the batch dimension
    beta_dist = torch.distributions.beta.Beta(alpha_positive, beta_positive)
    # Sample from the distribution; the samples have the same
    # (batch_size x 30) shape as alpha and beta
    samples = beta_dist.sample()
    return samples
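For context, here is a hypothetical sketch of the mu/sigma-to-alpha/beta step. The question does not show this conversion, so moment matching is assumed here purely for illustration (it is only valid when sigma**2 < mu * (1 - mu)):

def mu_sigma_to_alpha_beta(mu, sigma):
    # Hypothetical moment-matching conversion (not shown in the question):
    # solves mean = alpha / (alpha + beta) and
    # variance = alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1))
    var = sigma ** 2
    nu = mu * (1 - mu) / var - 1
    alpha = mu * nu
    beta = (1 - mu) * nu
    return alpha, beta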
Here I take the samples, which have the same (batch_size x 30) shape, perform some operations on them, and then calculate the loss. I expected the gradient to propagate through this, but the loss is not converging.
Any leads would help. Please note, this is not as simple as the reparameterization trick for the standard Normal distribution.
1 Answer
Looks like .rsample() is doing the trick here. Unlike .sample(), which detaches the samples from the computational graph (calling .requires_grad_(True) afterwards only makes them leaves of a new graph, so no gradient ever reaches alpha and beta), .rsample() draws reparameterized samples and keeps the computational graph alive through the sampling step.
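A minimal sketch of the fix, reusing the helper from the question (the usage lines below are illustrative, with arbitrary alpha/beta values):

import torch

def sample_from_beta_distribution(alpha, beta, eps=1e-6):
    # Clamp alpha and beta so they are strictly positive
    alpha_positive = torch.clamp(alpha, min=eps)
    beta_positive = torch.clamp(beta, min=eps)
    beta_dist = torch.distributions.beta.Beta(alpha_positive, beta_positive)
    # rsample() draws reparameterized samples, so gradients flow
    # back through the samples to alpha and beta
    return beta_dist.rsample()

alpha = (torch.rand(38, 30) + 0.5).requires_grad_()
beta = (torch.rand(38, 30) + 0.5).requires_grad_()
samples = sample_from_beta_distribution(alpha, beta)
loss = samples.mean()          # stand-in for the real loss
loss.backward()
print(alpha.grad is not None)  # True: the graph stayed alive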