
multithreading - How to make RePair compression algorithm multithreaded - Stack Overflow


The Re-Pair algorithm compresses text by recursively replacing the most frequent pair of adjacent symbols occurring in the text. The algorithm gets very slow on large texts, so I'm wondering how to parallelize it with multiple threads to improve compression speed.

Pseudocode for Re-Pair compression of ASCII text:

rules = {}
next_pair_symbol = 256

while true:
    pair_counts = {}  // recount from scratch each iteration

    for (i = 0 .. text.length() - 2)
        pair_counts[text.substring(i, i + 2)]++

    most_frequent_pair = pair_counts.getPairMaxCount()

    if (pair_counts[most_frequent_pair] < 2)
        break

    rules[next_pair_symbol] = most_frequent_pair
    text.replace(most_frequent_pair, next_pair_symbol++)

The final value of the rules variable is used to recover the original text from the compressed text.
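For reference, the loop above can be sketched as runnable code. This is a minimal single-threaded version, assuming symbols are represented as ints (ASCII codes below 256, generated pair symbols from 256 upward); the function names `repair_compress`/`repair_decompress` are my own, not from any library:

```python
from collections import Counter

def repair_compress(text):
    """Re-Pair loop: repeatedly replace the most frequent adjacent
    pair of symbols with a fresh symbol (int >= 256)."""
    symbols = [ord(c) for c in text]   # work on a list of ints
    rules = {}                         # new_symbol -> (left, right)
    next_symbol = 256
    while True:
        pair_counts = Counter(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break
        rules[next_symbol] = pair
        # replace every non-overlapping occurrence, left to right
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
        next_symbol += 1
    return symbols, rules

def repair_decompress(symbols, rules):
    """Invert the rules by expanding symbols until only ASCII remains."""
    out = []
    stack = list(reversed(symbols))
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)
            stack.append(left)
        else:
            out.append(chr(s))
    return ''.join(out)
```

Note that recounting all pairs on every iteration, as in the pseudocode, is what makes the naive version quadratic; the original Re-Pair paper achieves linear time by updating counts incrementally after each replacement.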

Before spending time coding, I'm wondering about the correct way to implement multithreading for Re-Pair. One possible solution is to split the given text into N slices, where N is the number of threads, then run N threads in parallel, each on its own slice, each producing its own rules[]. When all threads have terminated, merge the rules[] from each thread and apply the merged rules to the given text to get the final compressed string.
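The slicing idea can be sketched as follows. This is a minimal sketch, not a full implementation: each slice is compressed independently with its own local rules (both starting at symbol 256), and the final merge step is deliberately left out, since merging would require renumbering colliding symbols across slices. The names `repair_slice`/`repair_parallel` are hypothetical:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def repair_slice(chars):
    """Run the Re-Pair loop from the pseudocode on one slice,
    returning (compressed symbols, local rules)."""
    symbols = [ord(c) for c in chars]
    rules, next_symbol = {}, 256
    while True:
        counts = Counter(zip(symbols, symbols[1:]))
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:
            break
        rules[next_symbol] = pair
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(next_symbol)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols, next_symbol = out, next_symbol + 1
    return symbols, rules

def repair_parallel(text, n_threads=4):
    """Split text into n_threads slices and compress each independently.
    Returns one (symbols, rules) pair per slice; merging the per-slice
    rules into one global set would require renumbering symbols so
    they do not collide, which is omitted here."""
    n = max(1, -(-len(text) // n_threads))  # ceiling division
    slices = [text[i:i + n] for i in range(0, len(text), n)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(repair_slice, slices))
```

Caveat: in CPython the GIL prevents pure-Python threads from running this CPU-bound loop in parallel, so a process pool or a native implementation would be needed for a real speedup; the sketch only illustrates the slicing structure.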

The compression will not be optimal if the slices have different statistical distributions. Are there better alternatives that are less sensitive to the distribution of characters in the given text?
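For comparison, one direction that avoids per-slice skew entirely is to parallelize only the pair-counting pass of each iteration while keeping replacement global: each worker counts pairs in its own slice (with one symbol of overlap so boundary pairs are not lost), and the per-slice counters are merged before picking the global maximum. A minimal sketch, assuming Python threads (same GIL caveat as above, so a process pool would be used in practice; `parallel_pair_counts` is a hypothetical helper name):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_pairs(chunk):
    # count adjacent pairs within one slice; the slice carries one
    # extra trailing symbol of overlap so boundary pairs are included
    return Counter(zip(chunk, chunk[1:]))

def parallel_pair_counts(symbols, n_workers=4):
    """Split the symbol sequence into slices that each overlap the
    next by one symbol, count pairs per slice in parallel, and merge
    the counters into one global pair-frequency table."""
    n = max(1, len(symbols) // n_workers)
    slices = [symbols[i:i + n + 1] for i in range(0, len(symbols), n)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for counts in pool.map(count_pairs, slices):
            total += counts
    return total
```

Because the merged counts are identical to a sequential count over the whole text, the chosen pair (and hence the output) matches the single-threaded algorithm, at the cost of still needing a sequential or carefully synchronized replacement pass.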
