I want to concat a big bitset with a smaller one in a way that wont kill performance. Currently my application spends 20% of cpu time in just the following code:
boost::dynamic_bitset<> encode(const std::vector<char>& data)
{
boost::dynamic_bitset<> result;
std::for_each(data.begin(), data.end(), [&](unsigned char symbol)
{
for(size_t n = 0; n < codes_[symbol].size(); ++n)
result.push_back(codes_[symbol][n]); // codes_[symbol][n].size() avarage ~5 bits
});
return result;
}
I have read this post which proposes a solution, which unfortunately will not work for me as the size difference between the sizes of destination bitset and the source bitset is very large.
Any ideas?
If this is not possible to do efficiently with boost开发者_JAVA百科::dynamic_bitset then I'm open for other suggestions.
This is because you keep using push_back()
, but in actual fact, you already know the size in advance. This means lots of redundant copying and reallocating. You should resize it first. In addition, you don't have to push_back()
every value- it should be possible for you to use some form of insert()
(I don't actually know it's exact interface, but I think append()
is the name) to insert the whole target vector at once, which should be significantly better.
In addition, you're leaving the dynamic_bitset
as unsigned long, but as far as I can see, you're only actually inserting unsigned char
into it. Changing that could make life easier for you.
I'm also curious as to what type codes_
is- if it's a map
you could replace it with a vector
, or infact since it's statically sized maximally (256 entries is the max of an unsigned char
) , a static array.
I've tried using boost bitset in performance code before and been disappointed. I dug into it a bit, and concluded I'd be better off implementing my own bit-buffer class, although I forget the details of what convinced me boost's class was never going to be fast (I did get as far as inspecting the assembly produced).
I still don't know what the fastest way of building bit-buffers/bitsets/bitstreams or whatever you want to call them is. A colleague is trying to find out with this related question, but at time of writing it's still awaiting a good answer.
I wrote my own bitset class. I appreciate any suggestions for improvements. I will try to look into SSE and see if there is anything useful there.
With my very rough benchmark I got a 11x performance increase while appending 6 bits at a time.
class fast_bitset
{
public:
typedef unsigned long block_type;
static const size_t bits_per_block = sizeof(block_type)*8;
fast_bitset()
: is_open_(true)
, blocks_(1)
, space_(blocks_.size()*bits_per_block){}
void append(const fast_bitset& other)
{
assert(!other.is_open_);
for(size_t n = 0; n < other.blocks_.size()-1; ++n)
append(other.blocks_[n], bits_per_block);
append(other.blocks_.back() >> other.space_, bits_per_block - other.space_);
}
void append(block_type value, size_t n_bits)
{
assert(is_open_);
assert(n_bits < bits_per_block);
if(space_ < n_bits)
{
blocks_.back() = blocks_.back() << space_;
blocks_.back() = blocks_.back() | (value >> (n_bits - space_));
blocks_.push_back(value);
space_ = bits_per_block - (n_bits - space_);
}
else
{
blocks_.back() = blocks_.back() << n_bits;
blocks_.back() = blocks_.back() | value;
space_ -= n_bits;
}
}
void push_back(bool bit)
{
append(bit, 1);
}
bool operator[](size_t index) const
{
assert(!is_open_);
static const size_t high_bit = 1 << (bits_per_block-1);
const size_t block_index = index / bits_per_block;
const size_t bit_index = index % bits_per_block;
const size_t bit_mask = high_bit >> bit_index;
return blocks_[block_index] & bit_mask;
}
void close()
{
blocks_.back() = blocks_.back() << space_;
is_open_ = false;
}
size_t size() const
{
return blocks_.size()*bits_per_block-space_;
}
const std::vector<block_type>& blocks() const {return blocks_;}
class reader
{
public:
reader(const fast_bitset& bitset)
: bitset_(bitset)
, index_(0)
, size_(bitset.size()){}
bool next_bit(){return bitset_[index_++];}
bool eof() const{return index_ >= size_;}
private:
const fast_bitset& bitset_;
size_t index_;
size_t size_;
};
private:
bool is_open_;
std::vector<block_type> blocks_;
size_t space_;
};
精彩评论