
How to escape html entities in C?

开发者 https://www.devze.com 2023-02-16 09:50 Source: web

I'm trying to decode HTML entities (in the format &#39;) in C.

So far I've got some code to try and decode them but it seems to produce odd output.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* convertHtmlEntities(char* str) {
    size_t length = strlen(str);
    size_t i;
    char *endchar = malloc(sizeof(char));
    long charCode;
    if (!endchar) {
        fprintf(stderr,"not enough memory");
        exit(EXIT_FAILURE);
    }
    for (i=0;i<length;i++) {
        if (*(str+i) == '&' && *(str+i+1) == '#' && *(str+i+2) >= '0' && *(str+i+2) <= '9' && *(str+i+3) >= '0' && *(str+i+3) <= '9' && *(str+i+4) == ';') {
            charCode = strtol(str+i+2,&endchar,0);
            printf("ascii %li\n",charCode);
            *(str+i) = charCode;
            strncpy(str+i+1,str+i+5,length - (i+5));
            *(str + length - 5) = 0; /* null terminate string */
        }
    }
    return str;
}

int main()
{
    char string[] = "Helloworld&#39;s parent company has changed - comF";
    printf("%s",convertHtmlEntities(&string));
}

I'm not sure if the main statement is correct because I just made it for this example, as my program generates it from a web URL; however, the idea is the same.

The function does replace the &#39; with an apostrophe, but the output is garbled at the end and just after the replacement.

Does anyone have a solution?


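strncpy (or strcpy) does not work for overlapping strings: the behavior is undefined when the source and destination overlap.

Your strings str+i+1 and str+i+5 overlap. Don't do that!

Replace strncpy with memmove, which is specified to handle overlapping regions: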

            *(str+i) = charCode;
            memmove(str+i+1,str+i+5,length - (i+5) + 1); /* also copy the '\0' */
            /* strncpy(str+i+1,str+i+5,length - (i+5)); */
            /* *(str + length - 5) = 0; */ /* null terminate string */
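For reference, here is a minimal sketch of what the whole function might look like with memmove applied. The name decode_numeric_entities is hypothetical, not from the question. It assumes only decimal entities in the ASCII range (1 to 127) need decoding, and it uses the end pointer returned by strtol to find the ';' instead of requiring exactly two digits. That is a change from the original code, so treat it as one possible approach rather than the definitive fix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name not from the original post).
   Decodes decimal entities such as &#39; in place and returns str.
   Assumption: only codes 1..127 are replaced; anything else is left alone. */
char *decode_numeric_entities(char *str)
{
    size_t i = 0;

    while (str[i] != '\0') {
        /* The && chain stops at the first mismatch, so it never reads
           past the terminating '\0'. */
        if (str[i] == '&' && str[i + 1] == '#' &&
            isdigit((unsigned char)str[i + 2])) {
            char *end;                          /* set by strtol; no malloc needed */
            long code = strtol(str + i + 2, &end, 10);

            if (*end == ';' && code > 0 && code <= 127) {
                str[i] = (char)code;
                /* Source and destination overlap, so use memmove.
                   The + 1 copies the terminating '\0' as well. */
                memmove(str + i + 1, end + 1, strlen(end + 1) + 1);
            }
        }
        i++;
    }
    return str;
}
```

The argument has to be a writable buffer, for example char string[] = "..."; passed as decode_numeric_entities(string), not a pointer to a string literal.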


I had another problem with the code - it cut the last 'F' character. I replaced this line:

 *(str + length - 5) = 0; /* null terminate string */

with this:

 *(str + length - 4) = 0; /* null terminate string */

I believe it's because you delete five chars and add one, so the new length is not old-5, but old-4.
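To check that arithmetic, here is a small sketch that replaces one five-character entity and returns the new length. The helper replace_entity5 is made up for this illustration, and it assumes the entity at pos is exactly five characters long (like &#39;), as in the question's code.

```c
#include <string.h>

/* Hypothetical helper for illustration only.
   Replaces the 5-character entity starting at str[pos] with ch
   and returns the new string length. */
size_t replace_entity5(char *str, size_t pos, char ch)
{
    size_t length = strlen(str);

    str[pos] = ch;                               /* 1 char written */
    /* Shift the tail left by 4 (5 removed, 1 added); regions overlap. */
    memmove(str + pos + 1, str + pos + 5, length - (pos + 5));
    str[length - 4] = '\0';                      /* new end is old length - 4 */
    return length - 4;
}
```

With the input "ab&#39;cF" (9 characters) and pos 2, the result is "ab'cF" (5 characters), so the trailing 'F' is preserved.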
