C string casting between libraries

I work on a project using libpcap for capturing IP packets. libpcap returns captured data in a buffer, with an unsigned char * pointer and a buffer length. The data in the buffer is not null-terminated.

I process the buffer data with library functions, e.g. string functions from the C standard library. These functions expect plain char * pointers, which requires casting the data between unsigned char * and char *.

I like the idea of assuming an unsigned char * buffer as not-null-terminated (accompanied by a buffer length) with potentially non-printable characters, as opposed to a char * buffer which holds a printable string literal. However, that forces me to cast the libpcap buffer for each string function call which makes the code ugly.

What would be your coding style preference in this case?

Keep the unsigned char * and cast when calling string functions.
Cast the libpcap buffer to char * immediately after receiving it from libpcap and distinguish between raw data and strings via variable naming conventions in the code that consumes it.

If you know that you are at a protocol level where there is supposed to be text, use the second approach, just keep a char* around and use that where needed. There's no reason to cast it to a char* everywhere.

However, be very, very, very careful about which string handling functions you use. You are capturing data off the wire, so you could be getting anything. You have to respect the total length of the pcap-supplied buffer everywhere: functions such as strlen, strcpy, etc. cannot be used unless you first safely NUL-terminate the buffer (or a copy of it). And you really have to make sanity checks: if, e.g., you're parsing the length of a UDP packet and the length field says 130 bytes, that doesn't mean there actually are 130 bytes you can safely access.
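A minimal sketch of those two precautions, assuming the payload pointer is non-NULL and that the captured length comes from the capture header (in libpcap, the caplen field of struct pcap_pkthdr). The helper names payload_to_cstr and clamp_claimed_len are hypothetical, not part of libpcap:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 bytes of a captured payload
 * into dst and always NUL-terminate it. payload must be non-NULL.
 * Returns the number of bytes copied (excluding the terminator). */
size_t payload_to_cstr(const unsigned char *payload, size_t payload_len,
                       char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return 0;

    size_t n = payload_len < dst_size - 1 ? payload_len : dst_size - 1;

    /* Stop at an embedded NUL so that dst holds exactly one C string. */
    const void *nul = memchr(payload, 0, n);
    if (nul != NULL)
        n = (size_t)((const unsigned char *)nul - payload);

    memcpy(dst, payload, n);
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: never trust a length field from a header beyond
 * the number of bytes that were actually captured. */
size_t clamp_claimed_len(size_t claimed_len, size_t captured_len)
{
    return claimed_len < captured_len ? claimed_len : captured_len;
}
```

Once the data has been copied this way, strlen, strstr and friends are safe to use on dst, while the original capture buffer stays untouched.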

You also have to verify that what you're parsing actually is text; for example, you should not just print out a chunk of the payload on the assumption that it is text.
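One possible way to do that check is sketched below. The function name payload_is_text is hypothetical, and the definition of "text" used here (printable characters in the current locale plus CR, LF and tab) is an assumption that may be too strict or too loose for a given protocol:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte is printable (per isprint)
 * or one of CR, LF, TAB; returns 0 otherwise. */
int payload_is_text(const unsigned char *payload, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Keeping the byte as unsigned char matters here: the <ctype.h>
         * functions require a value representable as unsigned char (or EOF),
         * so passing a negative plain char would be undefined behaviour. */
        unsigned char c = payload[i];
        if (!isprint(c) && c != '\r' && c != '\n' && c != '\t')
            return 0;
    }
    return 1;
}
```

This is also a small argument for keeping the buffer as unsigned char * until the data has been validated.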

This feels like a stylistic question to me, and if I were you I'd use the format that is going to be used by the most functions. If you have two or three that want the char * then I'd cast it for those few instances. However if you have many functions that want the char * and only a few that use the unsigned char * then I'd cast it when returned by libpcap.

Keep the unsigned char * and cast when calling string functions.

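A signed value is not equivalent to an unsigned value, and you can get into all kinds of messes by ignoring that fact. For example, if you compare a signed char and an unsigned char that both hold the bit pattern 0xff to a signed integer with value -1, you will get different results: the signed char promotes to -1 and compares equal, while the unsigned char promotes to 255 and does not.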
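The difference can be seen in a small sketch. The function names are hypothetical, and the -1 result assumes a two's complement machine with 8-bit bytes, which covers practically every platform libpcap runs on:

```c
/* Read the same byte through the two character types and widen it to int. */
int byte_as_signed(const unsigned char *p)
{
    /* Accessing an object through a character type is always permitted. */
    return *(const signed char *)p;   /* 0xff becomes -1 */
}

int byte_as_unsigned(const unsigned char *p)
{
    return *p;                        /* 0xff becomes 255 */
}
```

With a byte holding 0xff, byte_as_signed yields a value equal to -1 while byte_as_unsigned yields 255, so a comparison against -1 (or against EOF) behaves differently depending on which type the buffer pointer has.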

ANSI C (and later standards) does not define whether plain char is signed or unsigned; it is left up to the compiler implementer to decide (this is even mentioned at the beginning of the K&R book).

That said, I would keep it as is and cast it where needed only if you know that it is safe to do so. My reasoning is that if someone else needs to work with your code they will be aware of the fact that this data is unsigned for a reason. Because of this they will probably be able to ask the same question you did rather than assume it can be treated as a string. Also casting will communicate the intent to convert the type.

I probably don't have to tell you this, but you should watch out for non-zero-terminated strings, especially when dealing with the outside world.

I would be at least somewhat tempted to use C99 inline functions to 'cover' the libpcap functions. If the libpcap function is unsigned char *libpcap_func(int fd, unsigned char *buffer, size_t buflen), then you might write and use:

static inline char *pc_libpcap_func(int fd, char *buffer, size_t buflen)
{
    return (char *)libpcap_func(fd, (unsigned char *)buffer, buflen);
}

This would go in a header, of course. The pc_ prefix is for 'plain char'. You can write one of these cover functions for each of the libpcap functions that you use (possibly even the ones that don't take any plain char pointers, just for consistency).

You would write your code to call the pc_ versions of the function.

Because they're inlined, they will be as efficient as macros, which would be the classic way to deal with the problem:

#define libpcap_func(fd, buffer, buflen) \
                ((char *)(libpcap_func)((fd), (unsigned char *)(buffer), (buflen)))

This slightly tricky code relies on the fact that when a function-like macro name appears without an open parenthesis as the next token, it is not an invocation of that macro, and on the fact that when a macro is being expanded, its symbol is no longer eligible for expansion (preventing infinite recursion in the preprocessor; ISO/IEC 9899:1999 §6.10.3.4 'Rescanning and further replacement'). Or you could name the macros with a pc_ prefix as with the inline functions.
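To see the same-name trick in isolation, here is a sketch that compiles without libpcap. raw_fill is a hypothetical stand-in for a library function that works on unsigned char buffers; it is not a real libpcap call:

```c
#include <stddef.h>

/* Hypothetical stand-in for a library function taking unsigned char *.
 * It is defined before the macro so the definition itself is not expanded. */
unsigned char *raw_fill(unsigned char *buffer, size_t buflen)
{
    for (size_t i = 0; i < buflen; i++)
        buffer[i] = 'x';
    return buffer;
}

/* Cover macro with the same name. Inside the expansion, (raw_fill) is not
 * followed directly by '(' and the macro is not re-expanded while it is
 * being replaced, so this refers to the real function. */
#define raw_fill(buffer, buflen) \
    ((char *)(raw_fill)((unsigned char *)(buffer), (buflen)))
```

After the #define, a call such as raw_fill(buf, sizeof buf) with a plain char buf[] goes through the macro and yields a char *, with both casts hidden in one place.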