bit pattern matching and replacing_问答_开发者

I come across a very tricky problem with bit manipulation.

As far as I know, the smallest variable size to hold a value is one byte of 8 bits. The bit operations available in C/C++ apply to an entire unit of bytes.

Imagine that I have a map to replace a binary pattern 100100 (6 bits) with a signal 10000 (5 bits). If the 1st byte of input data from a file开发者_如何学运维 is 10010001 (8 bits) being stored in a char variable, part of it matches the 6 bit pattern and therefore be replaced by the 5 bit signal to give a result of 1000001 (7 bits).

I can use a mask to manipulate the bits within a byte to get a result of the left most bits to 10000 (5 bit) but the right most 3 bits become very tricky to manipulate. I cannot shift the right most 3 bits of the original data to get the correct result 1000001 (7 bit) followed by 1 padding bit in that char variable that should be filled by the 1st bit of next followed byte of input.

I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes). Can C/C++ do the trick or we have to go for other assembly languages that deal with single bits manipulations?

I heard that Power Basic may be able to do the bit-by-bit manipulation better than C/C++.

If time and space are not important then you can convert the bits to a string representation and perform replaces on the string, then convert back when needed. Not an elegant solution but one that works.

<< shiftleft
^ XOR
>> shift right
~ one's complement

Using these operations, you could easily isolate the pieces that you are interested in and compare them as integers.

say the byte 001000100 and you want to check if it contains 1000:

char k = (char)68;
char c = (char)8;
int i = 0;
while(i<5){
    if((k<<i)>>(8-3-i) == c){
        //do stuff
        break;
    }
}

This is very sketchy code, just meant to be a demonstration.

I wonder if C/C++ can actually do this sort of replacement of bit patterns of length that do not fit into a Char (1 byte) variable or even Int (4 bytes).

What about std::bitset?

Here's a small bit reader class which may suit your needs. Of course, you may want to create a bit writer for your use case.

#include <iostream>
#include <sstream>
#include <cassert>

class BitReader {
    public:
        typedef unsigned char BitBuffer;

        BitReader(std::istream &input) :
            input(input), bufferedBits(8) {
        }

        BitBuffer peekBits(int numBits) {
            assert(numBits <= 8);
            assert(numBits > 0);

            skipBits(0);    // Make sure we have a non-empty buffer

            return (((input.peek() << 8) | buffer) >> bufferedBits) & ((1 << numBits) - 1);
        }

        void skipBits(int numBits) {
            assert(numBits >= 0);

            numBits += bufferedBits;

            while (numBits > 8) {
                buffer = input.get();
                numBits -= 8;
            }

            bufferedBits = numBits;
        }

        BitBuffer readBits(int numBits) {
            assert(numBits <= 8);
            assert(numBits > 0);

            BitBuffer ret = peekBits(numBits);

            skipBits(numBits);

            return ret;
        }

        bool eof() const {
            return input.eof();
        }

    private:
        std::istream &input;
        BitBuffer buffer;
        int bufferedBits;   // How many bits are buffered into 'buffer' (0 = empty)
};

Use a vector<bool> if you can read your data into the vector mostly at once. It may be more difficult to find-and-replace sequences of bits, though.

If I understood your questions correctly, you have an input stream and and output stream and you want to replace the 6bits of the input with 5 in the output - and your output still should be a bit stream?

So, the most important programmer's rule can be applied: Divide et impera! You should split your component in three parts:

Input Stream converter: Convert every pattern in the input stream to a char array (ring) buffer. If I understood you correctly your input "commands" are 8bit long, so there is nothing special about this.
Do the replacement on the ring buffer in a way that you replace every matching 6-bit pattern with the 5bit one, but "pad" the 5 bit with a leading zero, so the total length is still 8bit.
Write an output handler that reads from the ring buffer and let this output handler write only the 7 LSB to the output stream from each input byte. Of course some bit manipulation is necessary again for this. If your ring buffer size can be divided by 8 and 7 (= is a multiple of 56) you will have a clean buffer at the end and can start again with 1.

The most simplest way to implement this is to iterate over this 3 steps as long as input data is available.

If a performance really matters and you are running on a multi-core CPU you even could split the steps and 3 threads, but then you must carefully synchronize the access to the ring buffer.

I think the following does what you want.

PATTERN_LEN = 6
PATTERNMASK = 0x3F //6 bits
PATTERN     = 0x24 //b100100
REPLACE_LEN = 5
REPLACEMENT = 0x10  //b10000


void compress(uint8* inbits, uint8* outbits, int len)
{
  uint16 accumulator=0;
  int nbits=0;
  uint8 candidate; 

  while (len--) //for all input bytes
  {
    //for each bit (msb first)
    for (i=7;i<=0;i--)
    {
      //add 1 bit to accumulator
      accumulator<<=1;
      accumulator|=(*inbits&(1<<i));  
      nbits++;
      //check for pattern
      candidate = accumulator&PATTERNMASK;
      if (candidate==PATTERN)
      {
        //remove pattern
        accumulator>>=PATTERN_LEN; 
        //add replacement
        accumulator<<=REPLACE_LEN; 
        accumulator|=REPLACMENT;
        nbits+= (REPLACE_LEN - PATTERN_LEN);
      }
    }
    inbits++;
    //move accumulator to output to prevent overflow
    while (nbits>8)
    {
      //copy the highest 8 bits
      nbits-=8;      
      *outbits++ = (accumulator>>nbits)&0xFF;
      //clear them from accumulator
      accumulator&= ~(0xFF<<nbits);
    }
  }
  //copy remainder of accumulator  to output
  while (nbits>0)
  {
    nbits-=8;
    *outbits++ = (accumulator>>nbits)&0xFF;
    accumulator&= ~(0xFF<<nbits);
  }

}

You could use a switch or a loop in the middle to check the candidate against multiple patterns. There might have to be some special handling after doing a replacment to ensure the replacement pattern is not re-checked for matches.

#include <iostream>
#include <cstring>

size_t matchCount(const char* str, size_t size, char pat, size_t bsize) noexcept 
{
  if (bsize > 8) {
    return 0;
  }
  size_t bcount = 0; // curr bit number
  size_t pcount = 0; // curr bit in pattern char
  size_t totalm = 0; // total number of patterns matched
  const size_t limit = size*8;
  while (bcount < limit)
  {
    auto offset = bcount%8;
    char c = str[bcount/8];
    c >>= offset;
    char tpat = pat >> pcount;
    if ((c & 1) == (tpat & 1))
    {
      ++pcount;
      if (pcount == bsize)
      {
        ++totalm;
        pcount = 0;
      }
    }
    else // mismatch
    {
      bcount -= pcount; // backtrack
      //reset
      pcount = 0;
    }
    ++bcount;
  }
  return totalm;
}

int main(int argc, char** argv)
{
  const char* str = "abcdefghiibcdiixyz";
  char pat = 'i';
  std::cout << "Num matches = " << matchCount(str, 18, pat, 7) << std::endl;
  return 0;
}