最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

c++ - What is the fastest way to compute number of 1 bits in an array of bytes? - Stack Overflow

programmeradmin6浏览0评论

I might be silly to ask this, but I search around in google and did not got an definite answer(maybe I should be more careful).

Given a series of bytes, what is the fastest way to know the number of 1 bits in it?

#include <array>

constexpr size_t nbytes = 256;
std::array<uint8_t, nbytes> bytes;

size_t table[256]{0, 1, 1, 2, 1, ...}; // define table from 0x00 to 0xff
size_t count1() {
    size_t res = 0;
    for (size_t i{0}; i < nbytes; ++i) { res += table[static_cast<int>(bytes[i])];}
    return res;
}

Is there any other method that is widely trusted to always be the fastest to cover this task?

I might be silly to ask this, but I search around in google and did not got an definite answer(maybe I should be more careful).

Given a series of bytes, what is the fastest way to know the number of 1 bits in it?

#include <array>

constexpr size_t nbytes = 256;
std::array<uint8_t, nbytes> bytes;

size_t table[256]{0, 1, 1, 2, 1, ...}; // define table from 0x00 to 0xff
size_t count1() {
    size_t res = 0;
    for (size_t i{0}; i < nbytes; ++i) { res += table[static_cast<int>(bytes[i])];}
    return res;
}

Is there any other method that is widely trusted to always be the fastest to cover this task?

Share Improve this question edited Mar 11 at 8:57 Weijun Zhou 5,2762 gold badges21 silver badges41 bronze badges asked Mar 11 at 2:00 Zeyu Zhang CNZeyu Zhang CN 1355 bronze badges 7
  • 9 std::popcount – Weijun Zhou Commented Mar 11 at 2:03
  • 1 There is no such thing as binary 1. All objects are binary. There is just 1. What is it supposed to mean, “the number of binary 1”? Is it the total number of array members having the value of 1? Or something else? Why constexpr 256? This is just a constant. – Sergey A Kryukov Commented Mar 11 at 2:17
  • I believe 0xA6 has 4 bits — you can edit if you're quick. 0xA = 0b1010; 0x6 = 0b0110. – Jonathan Leffler Commented Mar 11 at 2:24
  • Any way will be O(n). Which will be "fastest" depends on target architecture, compiler, optimization flags etc. For example, imagine CPU which has an instruction to calculate number of ones of 256-bit word in single cycle. In general fastest way can be determined by benchmarking. – dimich Commented Mar 11 at 2:24
  • 1 I'm sorry I did not understand you, but you really have to write more clearly. It is not “has”, it is the number of set bits in a byte. Now, please don't post code that does not compile. In your case, you had to add some #include lines. – Sergey A Kryukov Commented Mar 11 at 2:28
 |  Show 2 more comments

1 Answer 1

Reset to default 5

The fastest way is a bit of an aspirational goal, but a fast way would be to use std::popcount:

std::size_t count_one_bits_naive(std::span<const std::uint8_t> bytes) {
    std::size_t sum = 0;
    for (std::uint8_t b : bytes)
        sum += std::popcount(b);
    return sum;
}

Unfortunately, GCC does not optimize this well; it processes the array byte by byte as written, even when the the architecture has a popcnt instruction that can process 64 bits at a time.

A better version would at least process 64 bits at a time:

std::size_t count_one_bits_bulk(std::span<const uint8_t> bytes) {
    std::size_t sum = 0;
    std::size_t i = 0;
    // process as many blocks of 64 bits as possible
    for (; i + sizeof(uint64_t) < bytes.size(); i += sizeof(uint64_t)) {
        uint64_t piece;
        std::memcpy(&piece, bytes.data() + i, sizeof(uint64_t));
        sum += std::popcount(piece);
    }
    // process the trailing bytes (at most 7)
    for (; i < bytes.size(); ++i) {
        sum += std::popcount(bytes[i]);
    }
    return sum;
}

See also Compiler Explorer

Here are some benchmarks https://quick-bench/q/negFVF-WaxxVfS9rejJL3NfSGCo:

Unfortunately, quick-bench doesn't let you specify architecture flags, so this is measuring using the software fallback call __popcountdi2 instead of the popcnt instruction. However, the benchmark demonstrates the principle that processing bytes in bulk should be faster.

发布评论

评论列表(0)

  1. 暂无评论