r - efficient cumulative log_sum_exp

I have a dataframe with 1M rows in which one column represents the log posterior probability. Many of these values are very close to 0 when exponentiated, so I have to use log_sum_exp() rather than log(sum(exp())) to obtain accurate results. However, I would like to obtain the cumulative probability density implied by these posterior probabilities. If simple summing would work, I could use log(cumsum(exp))) but for reasons specified above, this does not give accurate results. Currently I am running a for-loop:

for(k in 1:nrow(y)){
  y$cumlogsum[k] <- log_sum_exp(y$log_pp[1:k])
}

This is obviously hopelessly inefficient and takes hours to run. Is there an efficient way to do this in R, e.g. by vectorizing?

for(k in 1:nrow(y)){
  y$cumlogsum[k] <- log_sum_exp(y$log_pp[1:k])
}

This is obviously hopelessly inefficient and takes hours to run. Is there an efficient way to do this in R, e.g. by vectorizing?

Share Improve this question asked Feb 4 at 10:48 Ishisht 952 silver badges8 bronze badges

What about matrixStats::logSumExp()? – jay.sf Commented Feb 4 at 11:02
Related. It would be easy to adapt with cumsum. – jblood94 Commented Feb 4 at 12:36
1 y$cumlogsum <- max(y$log_pp) + log(cumsum(exp(y$log_pp - max(y$log_pp)))) should do the trick. – Tim G Commented Feb 4 at 12:41
Could you show an example how much the precision loss is if you apply log(sum(exp()))? – ThomasIsCoding Commented Feb 4 at 12:46
@ThomasIsCoding I'm getting inaccuracies on the order of 1e-10, which is enough to matter at least in principle since some of the probabilities are << 1e-20. – Ishisht Commented Feb 7 at 11:03

Add a comment |

2 Answers 2

Sorted by: Reset to default 3

The log-sum-exp trick is:

log_sum_exp <- function(x) {
  max.x <- max(x)
  return(max.x + log(sum(exp(x - max.x))))
}

And a cumsum version would simply be:

log_cumsum_exp <- function(x) {
  max.x <- max(x)
  return(max.x + log(cumsum(exp(x - max.x))))
}

You seem to be running into precision problems. Rmpfr can help increase R's floating point precision to much higher digit-precision:

#install.packages("Rmpfr")
library(Rmpfr)

set.seed(123)
y <- data.frame(log_pp = runif(1000, 1e-12, 1e-10))

# vectorized cumulative log sum with high precision
cumulative_logsum <- function(log_probs, precision_bits = 256) {
  # Convert to mpfr for high precision
  x <- mpfr(log_probs, precBits = precision_bits)
  
  # Find maximum for numerical stability
  max_val <- max(x)
  
  # Compute shifted values
  shifted <- exp(x - max_val)
  
  # Compute cumulative sum
  cumsum_shifted <- cumsum(shifted)
  
  # Transform back to log space
  result <- max_val + log(cumsum_shifted)
  
  return(result)
}


y$cumlogsum <- cumulative_logsum(y$log_pp)

y$cumlogsum_numeric <- format(asNumeric(y$cumlogsum), scientific = F)

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

r - efficient cumulative log_sum_exp - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)