开发者

C string split problem

开发者 https://www.devze.com 2023-04-08 02:14 出处:网络
I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word, which should split string. According to gdb and valgrind, problem is that function s

I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word, which should split string. According to gdb and valgrind, problem is that function sometimes returns invalid pointer, and program fails when trying to free that pointer. Here is the code:

char **get_word(char *str) {
   char **res;
   char *token, *copy;
   int size = 1;
   for(int i = 0; str[i] != '\0'; i++) {
      if(str[i] == ' ') {
         while(str[i] == ' ') {
            i++;
  开发者_Python百科       }
         size++;
      }
   }
   res = malloc((size + 1) * sizeof(char *));
   copy = strdup(str);
   token = strtok(copy, " ");
   for(int i = 0; token != NULL; i++) {
      res[i] = strdup(token);
      token = strtok(NULL, " ");
   }
   free(copy);
   res[size] = NULL;
   return res;
}


One problem I see is with your nested loops:

Consider this input: ' \0'

The function execution reaches the for loop, i == 0. Then the while loop is also entered. At the end of while loop i == 1. Now the incrementation statement from the for loop is executed and i == 2. So next you will be reading past the end of the string.

EDIT

I understand that size is the number of words found in the input. So I'd go for something like:

for (int i = 0; str[i] != '\0'; ++i) {
    if (str[i] != ' ' && (str[i + 1] == ' ' || str[i + 1] == '\0')) {
         // Counting endings of words
         size++;
    }
}


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split (const char *str)
{
char **arr = NULL;
size_t cnt=0;
size_t pos, len;

arr = malloc (sizeof *arr);
for (pos = 0; str[pos]; pos += len) {
    char *dup;
    len = strspn(str+pos, " \t\r\n" );
    if (len) continue;

    len = strcspn(str+pos, " \t\r\n" );
    if (!len) break;

    arr = realloc (arr, (2+cnt) * sizeof *arr);
    dup = malloc (1+len);
    memcpy (dup, str+pos, len);
    dup [len] = 0;
    arr [cnt++] = dup;
    }
arr[cnt] = NULL;
return arr;
}

int main(int argc, char **argv)
{
char **zzz;

for( zzz = split( argv[1] ); *zzz; zzz++) {
    printf( "->%s\n", *zzz );
    }
return 0;
}

The reallocation is a bit clumsy (like in the OP) and improving it is left as an exercise to the reader 8-}


As julkiewicz points out, your nested loops where you count the words could miss the terminating null on str. Additionally, if str ends with spaces, your current code would count an extra word.

You could replace this section:

int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
   if(str[i] == ' ') {
      while(str[i] == ' ') {
         i++;
      }
      size++;
   }
}

..with something like this:

while (*str == ' ') str++;  // skip leading spaces on str
/* count words */
int size = 0;
char *s = str;
do {
  if (*s && *s != ' ') {
    size++;                       // non-space group found
    while (*s && *s != ' ') s++;  // skip to next space
  }
  while (*s == ' ') s++;          // skip spaces after words
} while (*s);

..which counts the starts of groups of non-space characters, rather than groups of spaces, and watches for the terminating null in the inner loops.

You might also consider changing:

for(int i = 0; token != NULL; i++) {

..to:

for(int i = 0; token && i < size; i++) {

..just as a slightly paranoid guard in case strtok finds more words than you counted (though it shouldn't).


I think gdb may be complaining about the fact that you never check the return value of malloc (or strdup) to see if it's non-NULL.

0

精彩评论

暂无评论...
验证码 换一张
取 消