I've been working on a senior project for the last several months now, and a major sticking point in our team's development process has been dealing wtih rifts between Visual-C++ and gcc. (Yes, I know we all should have had the same development environment.) Things are about finished up at this point, but I ran into a moderate bug just today that had me wondering whether Visual-C++ is easier on newbies (like me) by design.
In one of my headers, there is a function that relies on strtok
to chop up a string, do some comparisons and return a string with a similar format. It works a little something like the following:
int main()
{
string a, b, c;
//Do stuff with a and b.
c = get_string(a,b);
}
string get_string(string a, string b)
{
const char * a_ch, b_ch;
a_ch开发者_运维问答 = strtok(a.c_str(),",");
b_ch = strtok(b.c_str(),",");
}
strtok
is infamous for being great at tokenizing, but equally great at destroying the original string to be tokenized. Thus, when I compiled this with gcc and tried to do anything with a
or b
, I got unexpected behavior, since the separator used was completely removed in the string. Here's an example in case I'm unclear; if I set a = "Jim,Bob,Mary"
and b="Grace,Soo,Hyun"
, they would be defined as a="JimBobMary"
and b="GraceSooHyun"
instead of staying the same like I wanted.
However, when I compiled this under Visual C++, I got back the original strings and the program executed fine.
I tried dynamically allocating memory to the strings and copying them the "standard" way, but the only way that worked was using malloc()
and free()
, which I hear is discouraged in C++. While I'm curious about that, the real question I have is this: Why did the program work when compiled in VC++, but not with gcc?
(This is one of many conflicts that I experienced while trying to make the code cross-platform.)
Thanks in advance!
-Carlos Nunez
This is an example of undefined behavior. You're passing the result of string::c_str()
, a const char*
, to strtok, which takes a char*
. By modifying the contents of the std::string data, you're invoking undefined behavior (you should be getting warnings for this unless you're casting).
When are you checking the value of a and b? In get_string, or in main? get_string is passed copies of a and b, so strtok will most likely not alter the originals in main. However, it could, as you are invoking undefined behavior.
The "right way" to do this is to use malloc/free or new[]/delete[]. You're using a C function, so you're already guilty of the same crime as you would be using malloc/free. A relatively elegant yet safe way to approach this is:
char *ap = strdup(a.c_str());
const char *a_ch = strtok(ap, ",");
/* do whatever it is you do */
free(ap);
Also bear in mind that strtok uses global state, so it won't play well with threads.
Tokens will be automatically replaced by a null-character by function strtok
. That is not what you can do with constant data.
To make your code safe and cross-platform consider using boost::tokenizer
.
I think the code is working because of differences in string implementation. VC++ string implementation must be making copies when you pass them to a function that could potentially modify the string.
精彩评论