开发者

delete previous character in a byte array

开发者 https://www.devze.com 2023-02-22 17:45 出处:网络
Given a byte array, in which characters can be 1 byte or 2 bytes long For 1-byte characters, the Most significant bit must be 0. For a 2-

Given a byte array, in which characters can be 1 byte or 2 bytes long For 1-byte characters, the Most significant bit must be 0. For a 2- byte character, the mos开发者_StackOverflowt significant bit of the most significant byte must be one and the most significant bit of the least significant byte is don’t care (X). You are given the index, I of a character in the byte array. Note that I-1 or I+1 can lead you to either a character or the middle of a character. Give a logic (no need of code) to delete the previous character of the one to which I points.


Well then, let's try. I'm assuming the array starts at index = 0 and that we know its size. If not, change the while loop in the code below to count down and jiggle the logic around a bit.

The real question is not how to remove the previous element. It's figuring out if the given index i is the start of a char (be it 1- or 2-byte) or the middle of a 2-byte one. Once we know that, removing the previous element is trivial, since it's trivial to figure out if the previous element is 1- or 2-byte.

So, I believe this should work to figure out what index i is. Pseudo-code:

if MSB_i == 0
{
  if MSB_(i+1) == 1
    -> Start of 2-byte char
  else
    -> Start of 1-byte char
} else
{
  if MSB_(i+1) == 0
    -> Middle of 2-byte char
  else
  {
    j = i + 1
    while (MSB_j == 1) AND (j != size)
      j++

    j = j - i
    if j modulo 2 == 1
      -> Start of 2-byte char
    else
      -> Middle of 2-byte char
}

Obviously, I have not implemented and check this, but it seems like all options are covered. Feel free to correct me or poke me if you want a written explanation as to why this should work.


Looks to me like you all are a bit confused?

First, what is "the middle of a 2 byte character"? Is it perhaps the last 1/3 of the first character and the first 1/3 of the second one? No, its nonsense. A 2 byte character has a first and a second byte, but no "middle".

Second, because it is not specified in what order the most significant and least significant bytes are stored, the exercise is a bit underspecified.

But let us assume, that the LSByte is stored at the lower address then the task is easy. Just check the most significant bit of byte[i-1], if it is set, then the previous character is a double byte character, else it's single byte. (We know that i adresses a character, not just a byte.)


For any element array[i], you can determine if i points to a single byte character, the start of a 2-byte character, or the middle of a 2-byte character using the following test:

Starting with array[i-1] count the number of contiguous MSB==1.

If array[i] is preceded by an ODD number of 1's, then array[i] is the middle of a 2-byte array.

If array[i] is preceded by an EVEN number of 1's, then if MSB(array[i]) is 0, array[i] is a single byte character, otherwise array[i] is the start of a 2-byte character.

Since we're trying to delete the character just before array[i], once you determine if array[i] is the start or the middle of a character, then you have to run the same test for array[i-x], where x is either 1 or 2 depending on if array[i] is pointing to the start or middle of a character, respectively.


Edit (What happens when arr[0] is 1-byte, and arr[1] is 2-byte?):

Firstly, more detail about the search for contiguous 1's: When counting contiguous 1's, the loop stops if we reach array[0], or MSB(array[j])==0.

odd=0
j = i
while( j && MSB(arr[j-1]) )
    j-=1
    odd^=1       <<(binary XOR)

When the loop completes, odd will be 1 if there are an odd number of contiguous 1's and odd will be 0 if there are 0 or an even number of contiguous 1's.


If we have an array with a 1-byte character in arr[0] and a 2-byte character in arr[1], then, supposedly, i can only have the values 0, 1, or 2.

  • i=0: The loop never runs because i==0. We consider there to be an EVEN number of preceding 1's because odd==0. The MSB of arr[i] is 0, so arr[i] is the start of a 1-byte character.
  • i=1: The loop never runs because MSB(arr[i-1]) is 0. We consider there to be an EVEN number of contiguous 1's because odd==0. The MSB of arr[i] is 1, so arr[i] is the start of a 2-byte character.
  • i=2: The loop runs one time. We find an ODD number of consecutive 1's. Because there is an odd number of preceding 1's, arr[i] is the middle of a 2-byte character.
0

精彩评论

暂无评论...
验证码 换一张
取 消