Inserting into a bitstream_问答_开发者_运维开发者技术经验分享

I'm looking for a way to efficiently insert bits into a bitstream and have it 'overflow', padding with 0's.

So for example if you had a byte array with 2 bytes: 231 and 109 (11100111 01101101), and did BitInsert(byteArray,4,00) it would insert two bits at bit offset 4 making 11100001 11011011 01000000 (225,219,24). It would be ok even the method only allowed 1 bit insertions e.g. BitInsert(byteArray,4,true) or BitInsert(byteArray,4,false), but the method must be independent of bitstream length (the stream could span several hundred bytes worth).

I have one method of doing it, but it has to walk the stream with a bitmask bit by bit, so I'm wondering if there's a simpler approach...

Answers in assembly or a C derivative would be appreciated.

Edit: The particular use case is an implementation of an encoding scheme开发者_开发知识库 which reads a byte array 6 bits at a time, and encodes them (with 2 bit padding) into single byte. So every 6 bits, you insert 2 bits. {33,66,99} which as a bit stream is 001000010100001001100011 becomes 00001000000101000000100100100011 notice the inserts as xx: xx001000xx010100xx001001xx100011

I'm hoping for a way to do this without bit-walking... (Also if anyone knows an official name for this encoding scheme it would be helpful, as I've yet to identify it...it came up when porting an older C program into C#)

I had an hour to kill while proctoring a test, and here's the result:

class BitStream
{
    private List<byte> _bytes;

    public BitStream()
    {
        _bytes = new List<byte>();
        _bytes.Add(0);
    }

    public byte[] Data
    {
        get { return _bytes.ToArray(); }
    }

    public void Insert(int index, int value)
    {
        if (value < 0)
            value *= -1;

        int bit = value % 2;

        byte bitmask = GetBitmask(index, bit);

        // perform the right-shift operation
        int active_byte = index / 8;

        bool padded = PadIfNecessary();

        int i;
        if (padded)
            i = _bytes.Count - 2;
        else
            i = _bytes.Count - 1;

        for ( ; i > active_byte; --i)
        {
            _bytes[i] = (byte)(_bytes[i] << 1);

            // carry from earlier byte if necessary
            if ((_bytes[i - 1] & 128) == 128)
                _bytes[i] |= 1;
        }

        // now shift within the target byte
        _bytes[active_byte] = ShiftActiveByte(_bytes[active_byte], index);

        _bytes[active_byte] |= GetBitmask(index, bit);
    }

    protected byte ShiftActiveByte(byte b, int index)
    {
        index = index % 8;
        byte low_mask = 0;
        byte high_mask = 255;

        for (int i=0; i<index; ++i)
        {
            low_mask = (byte)((low_mask << 1) | 1);
            high_mask = (byte)(high_mask << 1);
        }

        byte low_part = (byte)(b & low_mask);
        byte high_part = (byte)(b & high_mask);

        high_part <<= 1;

        return (byte)(low_part | high_part);
    }

    protected byte GetBitmask(int index, int value)
    {
        return (byte)(value << (index % 8));
    }

    protected bool PadIfNecessary()
    {
        if ((_bytes[_bytes.Count - 1] & 128) == 128)
        {
            _bytes.Add(1);
            return true;
        }
        else return false;
    }
}

It won't handle inserting at an index beyond the existing bounds of the internal array, but otherwise handles itself properly in my informal smoke tests.

If you know your output will fit into something like an int32 or int64 you could probably use the bitshift operator >>.

Use a predefined set of masks to split your input stream into 2 parts.
Use the >> to move the tail end a number of bits equal to the length of your desired insertion.
Do the same thing to your insertion piece.
Use the | operator to combine all 3 pieces back together.

The efficient choice will depend greatly on where your bottleneck is. Since you are asking specifically about the insertions, I assume that your application needs to do a lot of random-access insertions, and really only needs to read the complete stream in-order once in a while. If that's the case, here are some possible options:

Option 1: A linked list of bytes

struct BitStreamNode;
{
    Byte size;
    Byte bits;
    BitStreamNode * next;
};

This will work best in cases where you can maintain a pointer to the node at which you want to insert the bits. If you have to specify the insertion point as a numeric index, see option 2. Note that I have included a size member. This would allow you to insert two bits as follows:

BitStreamNode newNode;
newNode.bits = 0x02; // for example
newNode.size = 2;

newNode.next = nodeToInsertAfter.next;
nodeToInsertAfter.next = newNode;

Inserting in the middle of an existing node would of course require splitting the node into two parts. Again, let me stress that this will only be more efficient than shuffling the entire array to the right if you a) do this quite often and b) have a large number of bits in your stream.

Option 2: A balanced tree structure

This method would use a similar node as outlined in option 1, but would include a numeric index at each node, and a link to the higher and lower indexed nodes. The basic idea is a binary search tree which cuts down the time required to find the exact location of a specific node within the stream, but with the addition of in-order links to the next and previous nodes (so as to be able to read the stream in-order).

Update: The exact definition appears to be a "threaded tree".

This one would require a good deal of work to implement properly, so I would only recommend going this route if you are absolutely sure that the speed gains will be worth it. i.e.: profile the basic brute-force solution, and then gauge the pros and cons of putting in the extra effort and complexity.