How to remove punctuation from a String in C

Developer https://www.devze.com 2022-12-13 16:16 Source: the web
I'm looking to remove all punctuation from a string and make all uppercase letters lower case in C, any suggestions?


Just a sketch of an algorithm using functions provided by ctype.h:

#include <ctype.h>

void remove_punct_and_make_lower_case(char *p)
{
    char *src = p, *dst = p;

    while (*src)
    {
       if (ispunct((unsigned char)*src))
       {
          /* Skip this character */
          src++;
       }
       else if (isupper((unsigned char)*src))
       {
          /* Make it lowercase */
          *dst++ = tolower((unsigned char)*src);
          src++;
       }
       else if (src == dst)
       {
          /* Increment both pointers without copying */
          src++;
          dst++;
       }
       else
       {
          /* Copy character */
          *dst++ = *src++;
       }
    }

    *dst = 0;
}

Standard caveats apply: Completely untested; refinements and optimizations left as exercise to the reader.


Loop over the characters of the string. Whenever you meet a punctuation character (ispunct), don't copy it to the output string. Whenever you meet an "alpha char" (isalpha), use tolower to convert it to lowercase.

All the mentioned functions are declared in <ctype.h>.

You can either do it in-place (by keeping separate write pointers and read pointers to the string), or create a new string from it. But this entirely depends on your application.
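
The other answers here work in place, so here is a minimal sketch of the "create a new string" option. The name strip_punct_copy and the choice to return NULL on allocation failure are assumptions made for illustration, not something from the question.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of s with punctuation
   removed and letters lowercased, or NULL if allocation fails.
   The caller owns the result and must free() it. */
char *strip_punct_copy(const char *s)
{
    /* The result is never longer than the input, so strlen(s) + 1 bytes suffice. */
    char *out = malloc(strlen(s) + 1);
    char *dst = out;

    if (out == NULL)
        return NULL;

    for (; *s; ++s) {
        /* ctype functions expect a value representable as unsigned char */
        unsigned char c = (unsigned char)*s;
        if (!ispunct(c))
            *dst++ = (char)tolower(c);
    }
    *dst = '\0';
    return out;
}
```

Calling strip_punct_copy("Hello, World!") should give back "hello world" in the default "C" locale. The copy costs an allocation, but it leaves the original string untouched and also works on string literals, which must not be modified in place.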


The idiomatic way to do this in C is to have two pointers, a source and a destination, and to process each character individually: e.g.

#include <ctype.h>

void reformat_string(char *src, char *dst) {
    for (; *src; ++src)
        if (!ispunct((unsigned char) *src))
            *dst++ = tolower((unsigned char) *src);
    *dst = 0;
}

src and dst can be the same string since the destination will never be larger than the source.

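Although it's tempting, avoid calling tolower(*src++): tolower may be implemented as a macro. The C standard requires such a macro to evaluate its argument exactly once, but pre-standard or non-conforming libraries may not, so keeping the increment separate is the safer habit.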

Avoid solutions that search for characters to replace (using strchr or similar); they turn a linear algorithm into a quadratic one.
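
For contrast, here is a rough sketch of the search-and-shift pattern that warning is about. The name strip_chars_slow and the unwanted parameter are made up for illustration; it is shown only as the thing to avoid.

```c
#include <string.h>

/* Hypothetical helper, shown only as the pattern to avoid:
   removes every character listed in unwanted from s, in place. */
void strip_chars_slow(char *s, const char *unwanted)
{
    char *hit;

    /* Every pass rescans s from the beginning... */
    while ((hit = strpbrk(s, unwanted)) != NULL) {
        /* ...and every removal shifts the whole tail, terminator included. */
        memmove(hit, hit + 1, strlen(hit + 1) + 1);
    }
}
```

Each removed character costs a scan plus a shift of the remaining tail, so a string of n characters that is mostly punctuation takes on the order of n * n steps. The two-pointer loop touches every character exactly once.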


Here's a rough cut of an answer for you:

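#include <ctype.h>
#include <string.h>

void strip_punct(char *str) {
    size_t p = 0;               /* write index */
    size_t len = strlen(str);
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char: a negative char is undefined behavior for ctype functions */
        if (!ispunct((unsigned char)str[i])) {
            str[p] = (char)tolower((unsigned char)str[i]);
            p++;
        }
    }
    str[p] = '\0';              /* terminate the shortened string */
}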