Parsing a string with varying number of whitespace characters in C

Developer https://www.devze.com 2023-03-10 05:15 Source: Internet
I'm pretty new to C, and trying to write a function that will parse a string such as:

"This (5 spaces here) is (1 space here) a (2 spaces here) string."

The function header would have a pointer to the string passed in such as:

bool Class::Parse( unsigned char* string )

In the end I'd like to parse each word regardless of the number of spaces between words, and store the words in a dynamic array.

Forgive the silly questions, but what would be the most efficient way to do this if I am iterating over each character? Is that how strings are stored? So if I were to start iterating with:

while ( (*string) != '\0' ) {

--print *string here--

}

Would that be printing out

T
h
i... etc?

Thank you very much for any help you can provide.


From http://www.cplusplus.com/reference/clibrary/cstring/strtok/:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-"); /* split the string on these delimiters into "tokens" */
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-"); /* split the string on these delimiters into "tokens" */
  }
  return 0;
}

Splitting string "- This, a sample string." into tokens:

This 
a 
sample 
string 
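Applied to the string from the question, the same strtok pattern might look like the sketch below. strtok treats any run of delimiter characters as a single separator, so the number of spaces between words does not matter. The helper name split_words_strtok and the fixed-size output array are assumptions made for illustration. Keep in mind that strtok writes '\0' bytes into the buffer it scans and keeps hidden state between calls, so it cannot be used on a string literal or from two parsers at once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` in place on whitespace and stores up to
 * `max_words` pointers into `text` in `words`. Returns the number stored. */
size_t split_words_strtok(char *text, char **words, size_t max_words)
{
    const char *delims = " \t\r\n";   /* assumed set of whitespace delimiters */
    size_t count = 0;
    char *tok = strtok(text, delims);

    while (tok != NULL && count < max_words) {
        words[count++] = tok;          /* points into the caller's buffer */
        tok = strtok(NULL, delims);    /* NULL means "continue the same string" */
    }
    return count;
}
```

Calling it on a writable copy of "This     is a  string." should yield the four words "This", "is", "a" and "string." (the period stays attached because it is not in the delimiter set).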


First of all, C does not have classes, so in a C program you would probably define your function with a prototype more like one of the following:

char ** my_prog_parse(char * string);
/* (returns a malloc'd array of pointers into the original string, which has had
 * \0 added throughout) */

char ** my_prog_parse(const char * string);
/* (returns a malloc'd NULL-terminated array of pointers to malloc'd strings) */

void my_prog_parse(const char * string, char * buf, size_t bufsiz,
                   char ** strings, size_t nstrings);
/* (builds a NULL-terminated array of pointers into buf, all memory
 * provided by caller) */
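As a rough illustration of the last option, where the caller provides all the memory, a minimal sketch could look like this. The name parse_into_buffer, the int return value, and the choice of isspace to define "whitespace" are assumptions for the example, not part of the prototypes above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant of the caller-provided-memory prototype: copies each
 * word of `string` into `buf` and records a pointer to it in `strings`,
 * which is NULL-terminated on success. Returns the number of words, or -1
 * if `buf` or `strings` is too small. */
int parse_into_buffer(const char *string, char *buf, size_t bufsiz,
                      char **strings, size_t nstrings)
{
    size_t used = 0;   /* bytes of buf consumed so far */
    size_t count = 0;  /* words stored so far */

    if (nstrings == 0)
        return -1;

    while (*string != '\0') {
        /* Skip any run of whitespace between words. */
        while (isspace((unsigned char)*string))
            string++;
        if (*string == '\0')
            break;

        /* Need one slot for this word plus one for the final NULL. */
        if (count + 1 >= nstrings)
            return -1;
        strings[count++] = buf + used;

        /* Copy the word, always leaving room for its terminator. */
        while (*string != '\0' && !isspace((unsigned char)*string)) {
            if (used + 1 >= bufsiz)
                return -1;
            buf[used++] = *string++;
        }
        buf[used++] = '\0';
    }
    strings[count] = NULL;
    return (int)count;
}
```

Because nothing is allocated inside the function, there is nothing to free afterwards; the trade-off is that the caller has to guess sizes up front and handle the -1 case.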

However, it is perfectly possible to use C-style strings in C++...

You could write your loop as

while (*string) { ... ; string++; }

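and it will compile to exactly the same assembly as the explicit comparison against '\0' on a modern optimizing compiler. Yes, that is a correct way to iterate through a C-style string, as long as the pointer is advanced inside the loop; without string++ the loop never terminates.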
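To make the iteration concrete, here is a small sketch that prints one character per line, as asked in the question. The function name print_chars and its return value are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prints each character of `string` on its own line and
 * returns how many characters were printed. */
size_t print_chars(const char *string)
{
    size_t n = 0;

    while (*string != '\0') {   /* stop at the terminating NUL byte */
        putchar(*string);
        putchar('\n');
        string++;               /* advance to the next character */
        n++;
    }
    return n;
}
```

For "This" it prints T, h, i and s on separate lines, which matches the output the question expects.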

Take a look at the functions strtok, strchr, strstr, and strspn... one of them may help you build a solution.
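One possible way to combine these, sketched under a few assumptions: strspn skips a run of whitespace, strcspn measures the word that follows, and each word is copied into a dynamically grown, NULL-terminated array. The names split_words and free_words and the whitespace set " \t\r\n" are illustrative choices, not anything prescribed by the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees an array returned by split_words(). */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (size_t i = 0; words[i] != NULL; i++)
        free(words[i]);
    free(words);
}

/* Hypothetical helper: returns a malloc'd, NULL-terminated array of malloc'd
 * copies of the whitespace-separated words in `string`, or NULL if an
 * allocation fails. The input string is not modified. */
char **split_words(const char *string)
{
    const char *ws = " \t\r\n";
    size_t count = 0, cap = 4;
    char **words = malloc(cap * sizeof *words);

    if (words == NULL)
        return NULL;
    words[0] = NULL;

    for (;;) {
        string += strspn(string, ws);        /* skip whitespace before the word */
        size_t len = strcspn(string, ws);    /* length of the next word */
        if (len == 0)
            break;                           /* end of string reached */

        if (count + 2 > cap) {               /* room for word + final NULL */
            char **grown = realloc(words, 2 * cap * sizeof *words);
            if (grown == NULL) {
                free_words(words);
                return NULL;
            }
            words = grown;
            cap *= 2;
        }

        words[count] = malloc(len + 1);
        if (words[count] == NULL) {
            free_words(words);               /* array is still NULL-terminated */
            return NULL;
        }
        memcpy(words[count], string, len);
        words[count][len] = '\0';
        words[++count] = NULL;               /* keep the array terminated */
        string += len;
    }
    return words;
}
```

Unlike strtok, this version leaves the input untouched and keeps no hidden state, at the cost of one allocation per word.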


I wouldn't do any non-trivial parsing in C; it's too laborious, and the language is not well suited to it. But if you mean C++, and it looks like you do, since you wrote Class::Parse, then writing recursive descent parsers is pretty easy, and you don't need to reinvent the wheel. You can use Spirit, for example, or AXE if your compiler supports C++0x. For example, your parser in AXE can be written in a few lines:

// assuming you have 0-terminated string
bool Class::Parse(const char* str)
{
    auto space = r_lit(' ');
    auto string_rule = "This" & r_many(space, 5) & "is" & space & 'a' & r_many(space, 2)
        & "string." & r_end();
    return string_rule(str, str + strlen(str)).matched;
}