I have a dataframe with 1M rows in which one column represents the log posterior probability. Many of these values are very close to 0 when exponentiated, so I have to use log_sum_exp()
rather than log(sum(exp()))
to obtain accurate results. However, I would like to obtain the cumulative probability density implied by these posterior probabilities. If simple summing would work, I could use log(cumsum(exp)))
but for reasons specified above, this does not give accurate results. Currently I am running a for
-loop:
for(k in 1:nrow(y)){
y$cumlogsum[k] <- log_sum_exp(y$log_pp[1:k])
}
This is obviously hopelessly inefficient and takes hours to run. Is there an efficient way to do this in R, e.g. by vectorizing?
I have a dataframe with 1M rows in which one column represents the log posterior probability. Many of these values are very close to 0 when exponentiated, so I have to use log_sum_exp()
rather than log(sum(exp()))
to obtain accurate results. However, I would like to obtain the cumulative probability density implied by these posterior probabilities. If simple summing would work, I could use log(cumsum(exp)))
but for reasons specified above, this does not give accurate results. Currently I am running a for
-loop:
for(k in 1:nrow(y)){
y$cumlogsum[k] <- log_sum_exp(y$log_pp[1:k])
}
This is obviously hopelessly inefficient and takes hours to run. Is there an efficient way to do this in R, e.g. by vectorizing?
Share Improve this question asked Feb 4 at 10:48 IshishtIshisht 952 silver badges8 bronze badges 5 |2 Answers
Reset to default 3The log-sum-exp trick is:
log_sum_exp <- function(x) {
max.x <- max(x)
return(max.x + log(sum(exp(x - max.x))))
}
And a cumsum version would simply be:
log_cumsum_exp <- function(x) {
max.x <- max(x)
return(max.x + log(cumsum(exp(x - max.x))))
}
You seem to be running into precision problems. Rmpfr
can help increase R's floating point precision to much higher digit-precision:
#install.packages("Rmpfr")
library(Rmpfr)
set.seed(123)
y <- data.frame(log_pp = runif(1000, 1e-12, 1e-10))
# vectorized cumulative log sum with high precision
cumulative_logsum <- function(log_probs, precision_bits = 256) {
# Convert to mpfr for high precision
x <- mpfr(log_probs, precBits = precision_bits)
# Find maximum for numerical stability
max_val <- max(x)
# Compute shifted values
shifted <- exp(x - max_val)
# Compute cumulative sum
cumsum_shifted <- cumsum(shifted)
# Transform back to log space
result <- max_val + log(cumsum_shifted)
return(result)
}
y$cumlogsum <- cumulative_logsum(y$log_pp)
y$cumlogsum_numeric <- format(asNumeric(y$cumlogsum), scientific = F)
matrixStats::logSumExp()
? – jay.sf Commented Feb 4 at 11:02cumsum
. – jblood94 Commented Feb 4 at 12:36y$cumlogsum <- max(y$log_pp) + log(cumsum(exp(y$log_pp - max(y$log_pp))))
should do the trick. – Tim G Commented Feb 4 at 12:41log(sum(exp()))
? – ThomasIsCoding Commented Feb 4 at 12:46