Code review: reverse search for the </body> tag in a char string that is not null-terminated [closed]

Developer https://www.devze.com 2023-03-13 19:01 Source: Internet
Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow.

Closed 11 years ago.

src is a char string that is not null-terminated; its length is data_len. I want to start from the end of this array and find the first occurrence of the HTML </body> tag (that is, the last occurrence in the buffer).

find_pos should hold the position of the </body> tag within src.

Does the code below look correct to you?

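#include <stddef.h>   /* size_t */
#include <strings.h>  /* strncasecmp (POSIX) */

/* Return a pointer to the last case-insensitive occurrence of ndl in hay,
   or NULL if there is none. hay does not need to be null-terminated. */
char *strrcasestr_len(const char *hay, size_t haylen, const char *ndl, size_t ndllen)
{
   size_t i;
   if (ndllen > haylen)
       return NULL;   /* also avoids unsigned wrap-around in haylen - ndllen */
   /* Count down with size_t; i starts one past the last candidate index. */
   for (i = haylen - ndllen + 1; i-- > 0; ) {
       if (!strncasecmp(&hay[i], ndl, ndllen))
           return (char *)&hay[i];
   }
   return NULL;
}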


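This should do it, and very fast: it inspects only every seventh character and uses a lookup table to locate where a candidate tag would start.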

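#include <stddef.h>   // size_t
#include <strings.h>  // strncasecmp (POSIX)

char const* find_body_closing_tag( char const* const src, size_t const data_len )
{
    // table[c] is the 1-based position of c within "</body>", or 0 if c is not in the tag.
    static char table[256];
    static bool inited;
    if (!inited) {
         table['<'] = 1;
         table['/'] = 2;
         table['b'] = table['B'] = 3;
         table['o'] = table['O'] = 4;
         table['d'] = table['D'] = 5;
         table['y'] = table['Y'] = 6;
         table['>'] = 7;
         inited = true;
    }

    if (data_len < 7) return 0;  // too short to contain the tag

    // The tag is 7 characters long, so every occurrence covers exactly one sampled index.
    for( size_t i = data_len - 7; ; i -= 7 ) {
        // Cast to unsigned char so a negative char value cannot index before the table.
        if (char offset = table[(unsigned char)src[i]]) {
            size_t const back = offset - 1;  // distance from src[i] back to the candidate '<'
            if (back <= i && 0 == strncasecmp(src + i - back, "</body>", 7)) return src + i - back;
        }
        if (i < 7) break;  // the next step would move before src
    }
    return 0;
}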

Another very fast approach would be using SIMD to test 16 consecutive characters against '>' at once (and this is what strrchr or memrchr ought to be doing).
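As a rough illustration of the SIMD idea, here is a minimal sketch in C, assuming an x86 target with SSE2 (it falls back to a plain byte loop elsewhere). The names find_body_close_simd and tag_ends_at are made up for this example, and strncasecmp is assumed to be available from POSIX <strings.h>. It compares 16 bytes at a time against '>', walking backward from the end of the buffer, and only runs the full tag comparison where a '>' was actually seen.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: does "</body>" (in any case) end at src[gt]? */
static int tag_ends_at(const char *src, size_t gt)
{
    return gt >= 6 && strncasecmp(src + gt - 6, "</body>", 7) == 0;
}

/* Hypothetical name. Returns the last "</body>" in src[0..len), or NULL.
   src does not need to be null-terminated. */
const char *find_body_close_simd(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    size_t end = len;  /* bytes in [end, len) have already been searched */
#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8('>');
    while (end >= 16) {
        /* Unaligned load of the 16 bytes just below end. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(src + end - 16));
        unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        /* Bit k set means src[end - 16 + k] == '>'; visit set bits high to low. */
        for (int k = 15; mask != 0 && k >= 0; k--) {
            if (mask & (1u << k)) {
                size_t gt = end - 16 + (size_t)k;
                if (tag_ends_at(src, gt))
                    return src + gt - 6;
                mask &= ~(1u << k);
            }
        }
        end -= 16;
    }
#endif
    /* Scalar tail (and the whole search when SSE2 is unavailable). */
    while (end > 0) {
        size_t gt = --end;
        if (src[gt] == '>' && tag_ends_at(src, gt))
            return src + gt - 6;
    }
    return NULL;
}
```

With this sketch, find_pos could be computed as find_body_close_simd(src, data_len) - src whenever the result is not NULL. Whether it actually beats the table-driven search depends on how many '>' characters the input contains, so it is worth measuring on real data.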
