I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as the delimiter character. I just turned it in and got full credit, but after turning it in, I realized that this program worked only if the delimiter character was a space.
My question is, how could I make this program work with an arbitrary delimiter character?
The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change for it to work with any delimiter character.
Thanks!
Code:
char* StringTokenizer::Next(void) {
    pNextWord = pStart;
    if (*pStart == '\0') { return NULL; }
    while (*pStart != delim) {
        pStart++;
    }
    if (*pStart == '\0') { return NULL; }
    *pStart = '\0';
    pStart++;
    return pNextWord;
}
The printing loop in main():
while ((nextWord = tk.Next()) != NULL) {
    cout << nextWord << endl;
}
The simplest way is to change your
while (*pStart != delim)
to something like
while (*pStart != '\0' && *pStart != ' ' && *pStart != '\n' && *pStart != '\t')
Or, you could make delim a string, and create a function that checks if a char is in the string:
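// Returns true if c is one of the characters in the delim string.
bool isDelim(char c, const char *delim) {
    while (*delim) {
        if (*delim == c)
            return true;
        delim++;
    }
    return false;
}
// In Next(), stop at a delimiter or at the end of the string:
while ( *pStart != '\0' && !isDelim(*pStart, " \n\t") )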
Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.
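For what it's worth, here is a minimal sketch of the strtok route. The helper name splitWithStrtok is made up for illustration, and it assumes the input is a writable buffer, since strtok overwrites the delimiters it finds:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): tokenizes a writable buffer
// with strtok and copies each token into a vector.
// strtok writes '\0' over delimiters and keeps hidden static state, so it
// must not be used on string literals and is not re-entrant.
std::vector<std::string> splitWithStrtok(char* buffer, const char* delims) {
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Note that strtok treats a run of consecutive delimiters as a single separator, so it never returns empty tokens.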
Just change the line
while (*pStart != delim)
as follows:
while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)
The standard strchr function (declared in the string.h header) looks for a character (given in the second argument) in a C-string (given in the first argument) and returns a pointer to the position where that character occurs for the first time, or NULL if it does not occur. Hence, the expression strchr(" \t\n", *pStart) == NULL is true if the current character (*pStart) cannot be found in the string " \t\n" and, therefore, is not a delimiter. (Modify the delimiter string to adapt it to your needs, of course.)
This approach provides a short and simple way to test whether a given character belongs to a (small) set of characters of interest. And it uses a standard function.
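Put together, a tokenizer built on this test might look roughly like the sketch below. The class DelimTokenizer is a hypothetical stand-in, since the question does not show the full StringTokenizer class, and two behaviors are my own assumptions rather than the original's: it skips runs of delimiters, and it returns the last word even when no delimiter follows it.

```cpp
#include <cstring>

// Hypothetical stand-in for the question's StringTokenizer.
// The delimiter set is passed in as a C-string instead of a single char.
class DelimTokenizer {
public:
    DelimTokenizer(char* text, const char* delimiters)
        : pStart(text), delims(delimiters) {}

    // Returns the next token, or nullptr once the input is exhausted.
    char* Next() {
        // Skip any delimiters in front of the token. The '\0' check is
        // required here: strchr also "finds" the terminating null.
        while (*pStart != '\0' && std::strchr(delims, *pStart) != nullptr) {
            ++pStart;
        }
        if (*pStart == '\0') { return nullptr; }

        char* word = pStart;
        // Advance to the next delimiter or to the end of the string.
        while (*pStart != '\0' && std::strchr(delims, *pStart) == nullptr) {
            ++pStart;
        }
        if (*pStart != '\0') {
            *pStart = '\0';  // terminate the token in place
            ++pStart;
        }
        return word;
    }

private:
    char* pStart;        // current read position in the buffer
    const char* delims;  // set of delimiter characters
};
```

It can be driven with the same kind of loop as in the question: keep calling Next() until it returns a null pointer.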
By the way, you can do this not only with a C-string but with a std::string, too. All you need is to declare a const std::string with a value such as " \t\n" and then replace the call to the strchr function with the find method of the declared delimiter string.
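As a rough illustration of the std::string variant (the names kDelims and isDelimiter are placeholders I picked, not anything from the question):

```cpp
#include <string>

// Assumed delimiter set; change it to whatever characters you need.
const std::string kDelims = " \t\n";

// True when c is one of the characters stored in kDelims.
// find returns std::string::npos when the character is absent.
bool isDelimiter(char c) {
    return kDelims.find(c) != std::string::npos;
}
```

One difference from strchr: find only searches the characters stored in the string, so '\0' is not reported as a delimiter, and the end-of-string check in the loop has to stay explicit.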
Hmm...this doesn't look quite right:
if (*pStart = '\0')
The condition can never be true, because the assignment evaluates to '\0', which converts to false. I'm guessing you intended == instead of =? You also have a bit of a problem here:
while (*pStart != delim)
If the last word in the string isn't followed by a delimiter, this loop is going to run off the end of the string, which is undefined behavior.
Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanisms in place and is quite heavily tested. It does add overhead, but that's quite acceptable in a lot of cases.
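A minimal sketch of the stringstream idea, assuming a single delimiter character and assuming empty tokens should be dropped (the function name splitOn is just for illustration):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on one delimiter character using
// std::getline, which accepts the delimiter as its third argument.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string token;
    while (std::getline(stream, token, delim)) {
        if (!token.empty()) {  // assumption: skip empty tokens
            tokens.push_back(token);
        }
    }
    return tokens;
}
```

This leaves the original string untouched and handles a last word with no trailing delimiter, at the cost of copying each token.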
Not compiled, but I'd do something like this.
const char delimList[] = {' ', ',', '.', ';', '|', '!', '$', '\n'}; // all delims here.
const int N = sizeof(delimList) / sizeof(delimList[0]);             // number of delims
char* StringTokenizer::Next(void)
{
    if (*pStart == '\0') { return NULL; }
    pNextWord = pStart;
    while (1) {
        for (int x = 0; x < N; x++) {
            if (*pStart == delimList[x]) { // this is it.
                *pStart = '\0';
                pStart++;
                return pNextWord;
            }
        }
        if ('\0' == *pStart) { // last word.. maybe.
            return pNextWord;
        }
        pStart++;
    }
}
// (not compiled).
I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then strspn to find where the separator ends (i.e. where the next token begins). Loop until you reach the end.
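Here is a rough sketch of that loop. It calls the C functions through the cstring header but collects the tokens in a std::vector for brevity, which is my own choice; the name splitWithSpans is hypothetical. In it, strspn measures the run of delimiter characters to skip and strcspn measures the token that follows.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes text by a set of delimiter characters
// without modifying the input string.
std::vector<std::string> splitWithSpans(const char* text, const char* delims) {
    std::vector<std::string> tokens;
    const char* p = text;
    while (true) {
        p += std::strspn(p, delims);                // skip the run of separators
        if (*p == '\0') { break; }                  // nothing left but separators
        std::size_t len = std::strcspn(p, delims);  // length of the current token
        tokens.emplace_back(p, len);
        p += len;                                   // now at a separator or at '\0'
    }
    return tokens;
}
```

Because nothing is written back into the buffer, this also works on string literals, unlike the in-place approaches.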