C: Parse empty tokens from a string with strtok

Developer https://www.devze.com 2023-01-09 20:55 Source: Internet
My application produces strings like the one below. I need to parse values between the separator into individual values.

2342|2sd45|dswer|2342||5523|||3654|Pswt

I am using strtok to do this in a loop. For the fifth token, I am getting 5523. However, I need to account for the empty value between the two separators || as well. 5523 should be the sixth token, as per my requirement.

token = (char *)strtok(strAccInfo, "|");

for (iLoop=1;iLoop<=106;iLoop++) { 
            token = (char *)strtok(NULL, "|");
}

Any suggestions?


In that case I often prefer a p2 = strchr(p1, '|') loop with a memcpy(s, p1, p2-p1) inside. It's fast, does not destroy the input buffer (so it can be used with const char *) and is really portable (even on embedded).

It's also reentrant; strtok isn't. (BTW: reentrancy is not just a multi-threading issue. strtok already breaks in nested loops within a single thread. One can use strtok_r, but it's not as portable.)
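
A minimal sketch of the strchr/memcpy loop described above. The helper name get_field, its parameters and its return convention are made up for illustration and are not from the original answer. It copies the field with a given 0-based index into a caller-supplied buffer and leaves the input string untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy field number `index` (0-based) of `input`
 * into `out`. Returns the field length, or -1 if there is no such field
 * or it does not fit into `out`. The input is never modified. */
int get_field(const char *input, char sep, int index, char *out, size_t outsize)
{
    const char *p1 = input;

    assert(input != NULL && out != NULL && index >= 0);

    for (;;) {
        /* p2 points at the next separator, or is NULL on the last field */
        const char *p2 = strchr(p1, sep);
        size_t len = p2 ? (size_t)(p2 - p1) : strlen(p1);

        if (index == 0) {
            if (len >= outsize)
                return -1;          /* no room for field plus '\0' */
            memcpy(out, p1, len);
            out[len] = '\0';
            return (int)len;
        }
        if (p2 == NULL)
            return -1;              /* ran out of fields */
        p1 = p2 + 1;                /* step over the separator */
        index--;
    }
}
```

With the string from the question, index 4 yields an empty string and index 5 yields 5523, because every separator ends a field, even when two separators are adjacent.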


That's a limitation of strtok. The designers had whitespace-separated tokens in mind. strtok doesn't do much anyway; just roll your own parser. The C FAQ has an example.


On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.

To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.

What this says is that strtok will skip any '|' characters at the beginning of a token, which makes 5523 the 5th token, as you already knew. I just thought I would explain why (I had to look it up myself). It also means that you will never get any empty tokens.
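
To make the skipping behaviour concrete, here is a small sketch that counts tokens the way strtok sees them and compares that with the number of separator-delimited fields. Both function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the tokens strtok returns. Note: strtok modifies `s`. */
size_t count_strtok_tokens(char *s, const char *delims)
{
    size_t n = 0;
    char *tok;

    assert(s != NULL && delims != NULL);
    for (tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Counts fields when every separator ends a field, so that
 * adjacent separators enclose an empty field. */
size_t count_fields(const char *s, char sep)
{
    size_t n = 1;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (*s == sep)
            n++;
    return n;
}
```

For the string in the question, count_fields reports 10 fields while count_strtok_tokens reports 7, since the three empty fields are silently dropped.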

Since your data is set up this way, you have a couple of possible solutions:
1) find all occurrences of || and replace them with | | (put a space in there)
2) do a strstr 5 times and find the beginning of the 5th element.
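
Option 1 could be sketched roughly as follows. The function name pad_empty_fields and the choice to write into a separate output buffer are assumptions made for this example. It puts a single space into every empty field so that a later strtok pass no longer collapses adjacent separators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, writing one space into
 * every empty field. Returns 0 on success, -1 if `out` is too small. */
int pad_empty_fields(const char *in, char sep, char *out, size_t outsize)
{
    size_t n = 0;
    int at_field_start = 1;

    assert(in != NULL && out != NULL);

    for (;; in++) {
        int field_ends = (*in == sep || *in == '\0');

        /* A field that ends right where it starts is empty: pad it. */
        if (at_field_start && field_ends) {
            if (n + 1 >= outsize)
                return -1;
            out[n++] = ' ';
        }
        if (*in == '\0')
            break;
        if (n + 1 >= outsize)
            return -1;
        out[n++] = *in;
        at_field_start = (*in == sep);
    }
    out[n] = '\0';
    return 0;
}
```

One drawback of this approach is that a padded empty field can no longer be told apart from a field that really contained a single space.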


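char *mystrtok(char **m,char *s,char c)
{
  char *p=s?s:*m;        /* first call: start at s; later calls: resume at *m */
  if( !*p )
    return 0;            /* end of input reached */
  *m=strchr(p,c);
  if( *m )
    *(*m)++=0;           /* terminate the token, resume after the separator */
  else
    *m=p+strlen(p);      /* last token: park *m on the terminating '\0' */
  return p;
}
  • reentrant
  • thread-safe
  • strictly ANSI C conforming
  • needs a helper pointer supplied by the caller to hold the scan position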

e.g.

char *p,*t,s[]="2342|2sd45|dswer|2342||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
  puts(t);

e.g.

char *p,*t,s[]="2,3,4,2|2s,d4,5|dswer|23,42||5523|||3654|Pswt";
for(t=mystrtok(&p,s,'|');t;t=mystrtok(&p,0,'|'))
{
  char *p1,*t1;
  for(t1=mystrtok(&p1,t,',');t1;t1=mystrtok(&p1,0,','))
    puts(t1);
}

Left as an exercise for you :) accept a set of delimiters (char *c) as parameter 3.
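
One possible way to approach that exercise, as a sketch only: the name mystrtok_set is made up, and strcspn is used here in place of strchr so that the third parameter can be a set of delimiter characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variant of mystrtok that takes a set of delimiter characters.
 * `m` holds the scan position between calls; `s` is the string on the
 * first call and NULL afterwards. The input string is modified. */
char *mystrtok_set(char **m, char *s, const char *delims)
{
    char *p;

    assert(m != NULL && delims != NULL);

    p = s ? s : *m;
    if (*p == '\0')
        return NULL;                /* end of input reached */

    *m = p + strcspn(p, delims);    /* first delimiter, or the final '\0' */
    if (**m != '\0')
        *(*m)++ = '\0';             /* cut the token, step past the delimiter */
    return p;
}
```

Like the single-character version, this returns empty tokens between adjacent delimiters but stops at the end of the string, so a trailing empty field is not reported.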


Look into using strsep instead (see the strsep reference).
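
A short sketch of how strsep might be used here. strsep is a BSD/glibc extension rather than standard C, so the _DEFAULT_SOURCE define below is an assumption about a glibc-style toolchain; the wrapper name split_with_strsep is invented for this example.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc needs this to declare strsep */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits `s` in place on '|' and stores up to `max_fields` pointers
 * in `fields`. Returns the number of fields stored. Unlike strtok,
 * strsep returns an empty string for each pair of adjacent separators. */
size_t split_with_strsep(char *s, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *rest = s;
    char *tok;

    assert(s != NULL && fields != NULL);

    while (n < max_fields && (tok = strsep(&rest, "|")) != NULL)
        fields[n++] = tok;
    return n;
}
```

Called on a writable copy of the string from the question, this yields 10 fields, with fields[4] empty and fields[5] pointing at 5523.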


Use something other than strtok. It's simply not intended to do what you're asking for. When I've needed this, I have usually used strcspn or strpbrk and handled the rest of the tokenizing myself. If you don't mind it modifying the input string the way strtok does, it should be pretty simple. Right off, something like this seems as if it should work:

// Warning: untested code. Should really use something with a less-ugly interface.
char *tokenize(char *input, char const *delim) { 
    static char *current;    // just as ugly as strtok!
    char *pos, *ret;
    if (input != NULL)
        current = input;

    if (current == NULL)
        return current;

    ret = current;
    pos = strpbrk(current, delim);
    if (pos == NULL) 
        current = NULL;
    else {
        *pos = '\0';
        current = pos+1;
    }
    return ret;
}


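Inspired by Patrick Schlüter's answer, I wrote this function. It is supposed to be thread-safe, it supports empty tokens, and it doesn't change the original string. Each token is returned in newly allocated memory, and the delimiter is matched as a whole string rather than as a set of characters.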

/* Requires <stdlib.h> and <string.h>. The caller must free() each returned token. */
char* strTok(char** newString, const char* delimiter)
{
    char* string = *newString;
    char* delimiterFound = (char*) 0;
    size_t tokLength = 0;
    char* tok = (char*) 0;

    if(!string) return (char*) 0;

    delimiterFound = strstr(string, delimiter);

    if(delimiterFound){
        tokLength = (size_t)(delimiterFound-string);
    }else{
        tokLength = strlen(string);
    }

    tok = malloc(tokLength + 1);
    if(!tok) return (char*) 0;   /* allocation failed */
    memcpy(tok, string, tokLength);
    tok[tokLength] = '\0';

    /* Resume after the delimiter, or signal the end with a null pointer. */
    *newString = delimiterFound ? delimiterFound + strlen(delimiter) : (char*)0;

    return tok;
}

You can use it like this:

char* input = "1,2,3,,5,";
char** inputP = &input;
char* tok;
while( (tok=strTok(inputP, ",")) ){
    printf("%s\n", tok);
    free(tok);   /* each token is malloc'ed by strTok */
}

This is supposed to output (the empty lines are the empty tokens):

1
2
3

5

I have tested it with simple strings but haven't used it in production yet. I also posted it on Code Review, so you can see what others think about it.


Below is the solution that is working for me now. Thanks to all of you who responded.

I am using LoadRunner. Hence, some unfamiliar commands, but I believe the flow can be understood easily enough.

char strAccInfo[1024], *p2;
int iLoop;

Action() {  //This value would come from the wrsp call in the actual script.
    lr_save_string("323|90||95|95|null|80|50|105|100|45","test_Param");

    //Store the parameter into a string - saves memory. 
    strcpy(strAccInfo,lr_eval_string("{test_Param}"));
    //Get the first instance of the separator "|" in the string
    p2 = (char *) strchr(strAccInfo,'|');

    //Start a loop - Set the max loop value to more than max expected.
    for (iLoop = 1;iLoop<200;iLoop++) { 

        //Save parameter names in sequence.
        lr_param_sprintf("Param_Name","Parameter_%d",iLoop);

        //Get the first instance of the separator "|" in the string (within the loop).
        p2 = (char *) strchr(strAccInfo,'|');           

        //Save the value for the parameters in sequence. 
        lr_save_var(strAccInfo,p2 - strAccInfo,0,lr_eval_string("{Param_Name}"));   

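        //Save string after the first instance of p2, as strAccInfo - for looping.
        //memmove is used because source and destination overlap (strcpy would be undefined behavior).
        memmove(strAccInfo,p2+1,strlen(p2+1)+1);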

        //Start conditional loop for checking for last value in the string.
        if (strchr(strAccInfo,'|')==NULL) {
            lr_param_sprintf("Param_Name","Parameter_%d",iLoop+1);
            lr_save_string(strAccInfo,lr_eval_string("{Param_Name}"));
            iLoop = 200;    
        }
    }
}