How can I make this work with every delimiter in C++?

Developer https://www.devze.com 2022-12-19 06:01 Source: web
I just wrote a program that tokenizes a char array using pointers. The program only needed to work with a space as the delimiter character. I just turned it in and got full credit, but after turning it in, I realized that this program worked only if the delimiter character was a space.

My question is, how could I make this program work with an arbitrary delimiter character?

The function I've shown you below returns a pointer to the next word in the char array. This is what I believe I need to change for it to work with any delimiter character.

Thanks!

Code:

char* StringTokenizer::Next(void) {
    pNextWord = pStart;

    if (*pStart == '\0') { return NULL; }

    while (*pStart != delim) {
        pStart++;
    }

    if (*pStart == '\0') { return NULL; }

    *pStart = '\0';
    pStart++;

    return pNextWord;
}

The printing loop in main():

while ((nextWord = tk.Next()) != NULL) {
    cout << nextWord << endl;
}


The simplest way is to change your

while (*pStart != delim)

to something like

while (*pStart != '\0' && *pStart != ' ' && *pStart != '\n' && *pStart != '\t')

Or, you could make delim a string, and create a function that checks if a char is in the string:

bool isDelim(char c, const char *delim) {
   while (*delim) {
      if (*delim == c)
         return true;
      delim++;
   }
   return false;
}

while (*pStart != '\0' && !isDelim(*pStart, " \n\t"))

Or, perhaps the best solution is to use one of the prebuilt functions for doing all this, such as strtok.
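
For reference, here is a rough sketch of the strtok route. The helper name splitWithStrtok is made up for illustration and is not part of the original program; std::strtok itself is the standard function from <cstring>. Keep in mind that strtok overwrites delimiters in the buffer it scans and keeps hidden state between calls, so the sketch works on a private copy of the input.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits `text` on any
// character found in `delims` and returns the tokens in order.
std::vector<std::string> splitWithStrtok(const std::string& text,
                                         const char* delims) {
    // strtok writes '\0' into the buffer, so copy the input first.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    // First call passes the buffer; later calls pass a null pointer
    // to continue scanning the same buffer.
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling splitWithStrtok("one, two;three", " ,;") should give the three words without the punctuation, since strtok treats a run of consecutive delimiters as a single separator.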


Just change the line

while (*pStart != delim)

as follows:

while (*pStart != '\0' && strchr(" \t\n", *pStart) == NULL)

The standard strchr function (declared in the string.h header) looks for a character (given in the second argument) in a C-string (given in the first argument) and returns a pointer to the position where that character occurs for the first time, or NULL if it does not occur. Hence, the expression strchr(" \t\n", *pStart) == NULL is true if the current character (*pStart) cannot be found in the string " \t\n" and, therefore, is not a delimiter. (Modify the delimiter string to adapt it to your needs, of course.)

This approach provides a short and simple way to test whether a given character belongs to a (small) set of characters of interest. And it uses a standard function.

By the way, you can do this not only with a C-string but with a std::string, too. All you need is to declare a const std::string with a value like " \t\n" and then replace the call to strchr with a call to the find method of that delimiter string.
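
A minimal sketch of the std::string variant, assuming a whitespace delimiter set. The names kDelims, isDelimiter and skipToDelimiter are invented for this example. One difference worth noting: strchr treats the terminating '\0' as part of the searched string, while find on a std::string that has no embedded null will not match '\0', so the explicit end-of-string check in the loop is required here.

```cpp
#include <string>

// Assumed delimiter set for illustration; change it as needed.
const std::string kDelims = " \t\n";

// True if `c` is one of the delimiter characters.
bool isDelimiter(char c) {
    return kDelims.find(c) != std::string::npos;
}

// Advances from `p` to the first delimiter or to the terminating '\0',
// whichever comes first, and returns that position.
const char* skipToDelimiter(const char* p) {
    while (*p != '\0' && !isDelimiter(*p)) {
        ++p;
    }
    return p;
}
```

In Next(), the same condition would replace while (*pStart != delim).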


Hmm...this doesn't look quite right:

if (*pStart = '\0')

The condition can never be true. I'm guessing you intended == instead of =? You also have a bit of a problem here:

while (*pStart != delim)

If the last word in the string isn't followed by a delimiter, this is going to run off the end of the string, which will cause serious problems.

Edit: Unless you really need to do this on your own, consider using a stringstream for the job. It already has all the right mechanisms in place and is quite heavily tested. It does add overhead, but that's quite acceptable in a lot of cases.
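
As a rough illustration of the stringstream idea, the sketch below splits on one delimiter character with std::getline. The helper name splitOn is hypothetical. Note the limitation: std::getline accepts only a single delimiter character, so this covers "any one delimiter", not a whole set of them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on a single delimiter character
// and drops empty fields produced by repeated delimiters.
std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    // getline reads up to (and discards) the next `delim`.
    while (std::getline(in, token, delim)) {
        if (!token.empty()) {
            tokens.push_back(token);
        }
    }
    return tokens;
}
```

If the delimiter is plain whitespace, reading with in >> token instead of std::getline does the same job and skips runs of spaces, tabs and newlines on its own.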


Not compiled, but I'd do something like this.

const int N = 8; // must equal the number of delimiters listed below
const char delimList[N] = {' ', ',', '.', ';', '|', '!', '$', '\n'}; // all delimiters here

char* StringTokenizer::Next(void)
{
    if (*pStart == '\0') { return NULL; }

    pNextWord = pStart;

    while (1){  
        for (int x = 0; x < N; x++){
            if (*pStart == delimList[x]){ //this is it.
                *pStart = '\0';
                pStart++;
                return pNextWord;
            }

        }
        if ('\0' == *pStart){ //last word.. maybe.
                return pNextWord;   
        }
        pStart++;
    }
}

// (!compiled).


I assume that we want to stick to C instead of C++. The functions strspn and strcspn are good for tokenizing by a set of delimiters. You can use strcspn to find where the next separator begins (i.e. where the current token ends) and then use strspn to find where the separator ends (i.e. where the next token begins). Loop until you reach the end.
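
A sketch of that loop, written as C++ to match the rest of the thread (the helper name splitWithSpans is made up for this example). strspn measures the run of delimiter characters at the current position, and strcspn measures the run of non-delimiter characters, which is the token. Unlike the pointer-based Next(), this version does not modify the input.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes `text` on any character in `delims`
// without writing to the input buffer.
std::vector<std::string> splitWithSpans(const char* text,
                                        const char* delims) {
    std::vector<std::string> tokens;
    const char* p = text;
    for (;;) {
        p += std::strspn(p, delims);   // skip the separator run
        if (*p == '\0') {
            break;                     // nothing left but the terminator
        }
        std::size_t len = std::strcspn(p, delims);  // token length
        tokens.push_back(std::string(p, len));
        p += len;                      // now at a delimiter or at '\0'
    }
    return tokens;
}
```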
