开发者

Search a file for a sequence of bytes (C#)

开发者 https://www.devze.com 2023-01-26 18:34 出处:网络
I\'m writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can\'t use any libraries to do so. So, I need a function that takes a byte array as an

I'm writing a C# application in which I need to search a file (could be very big) for a sequence of bytes, and I can't use any libraries to do so. So, I need a function that takes a byte array as an argument and returns the开发者_高级运维 position of the byte following the given sequence. The function doesn't have to be fast, it simply has to work. Any help would be greatly appreciated :)


If it doesn't have to be fast you could use this:

int GetPositionAfterMatch(byte[] data, byte[]pattern)
{
  for (int i = 0; i < data.Length - pattern.Length; i++)
  {
    bool match = true;
    for (int k = 0; k < pattern.Length; k++)
    {
      if (data[i + k] != pattern[k])
      {
        match = false;
        break;
      }
    }
    if (match)
    {
      return i + pattern.Length;
    }
  }
}

But I really would recommend you to use Knuth-Morris-Pratt algorithm, it's the algorithm mostly used as a base of IndexOf methods for strings. The algorithm above will perform really slow, exept for small arrays and small patterns.


The straight-forward approach as pointed out by Turrau works, and for your purposes is probably good enough, since you say it doesn't have to be fast - especially since for most practical purposes the algorithm is much faster than the worst case O(n*m). (Depending on your pattern I guess).

For an optimal solution you can also check out the Knuth-Morris-Pratt algorithm, which makes use of partial matches which in the end is O(n+m).


Here's an extract of some code I used to do a boyer-moore type search. It's mean to work on pcap files, so it operates record by record, but should be easy enough to modify to suit just searching a long binary file. It's sort of extracted from some test code, so I hope I got everything for you to follow along. Also look up boyer-moore searching on wikipedia, since that is what it's based off of.

int[] badMatch = new int[256];
byte[] pattern;  //the pattern we are searching for

//badMath is an array of every possible byte value (defined as static later).
//we use this as a jump table to know how many characters we can skip comparison on
//so first, we prefill every possibility with the length of our search string
for (int i = 0; i < badMatch.Length; i++)
{
  badMatch[i] = pattern.Length;
}

//Now we need to calculate the individual maximum jump length for each byte that appears in my search string
for (int i = 0; i < pattern.Length - 1; i++)
{
  badMatch[pattern[i] & 0xff] = pattern.Length - i - 1;
}

// Place the bytes you want to run the search against in the payload variable
byte[] payload = <bytes>

// search the packet starting at offset, and try to match the last character
// if we loop, we increment by whatever our jump value is
for (i = offset + pattern.Length - 1; i < end && cont; i += badMatch[payload[i] & 0xff])
{
  // if our payload character equals our search string character, continue matching counting backwards
  for (j = pattern.Length - 1, k = i; (j >= 0) && (payload[k] == pattern[j]) && cont; j--)
  {
    k--;
  }
// if we matched every character, then we have a match, add it to the packet list, and exit the search (cont = false)
  if (j == -1)
  {
    //we MATCHED!!!
    //i = end;
    cont = false;
  }
}
0

精彩评论

暂无评论...
验证码 换一张
取 消