mbstowcs to count the number of wide characters in an array

Developer https://www.devze.com 2023-02-06 15:27 Source: Internet
I am currently working on UNIX and COBOL and have hit a requirement where I need to provide the number of Chinese and Korean characters in a received message, which I plan to accomplish in a C program using mbstowcs.

I am using the code below, which does not give the correct count for the Chinese double-byte characters; it gives the byte count instead.

#include <wchar.h>
#include <stdio.h>
#include <locale.h>
int main(int argc, char *argv[] )
{
    if ( argc != 2 ) /* argc should be 2 for correct execution */
    {
        /* We print argv[0] assuming it is the program name */
        printf( "usage: %s filename", argv[0] );
    }
    int Size = getCharCount(argv[1]);
    printf ("THE CHAR COUNT  %d", Size);
    return Size;
}
int getCharCount(char *argv)
{
    wchar_t *wcsVal = NULL;     
    char *mbsVal = NULL;
    char* localeInfo;
    setlocale(LC_ALL, "zh_CN.GB18030");

    /* verify locale is set */      
    if (setlocale(LC_ALL, "") == 0)      
    {
        /*                      printf(stderr, "Failed to set locale\n"); */
        return 1;
    }
    mbsVal = argv;
    printf (" MBSVAL %s\n", mbsVal);
    /* validate multibyte string and convert to wide character */

    int size = mbstowcs(NULL, mbsVal, 0);
    if (size == -1)
    {         
        printf("Invalid multibyte\n");         
        return 1;
    }
    return size; 
}

Appreciate your kind response...

Regards

Akm


Setting the locale to a specific value chosen by the programmer in order to process a particular character set is incorrect usage. Not only are locale names implementation-specific; they're also intended to reflect the user's or system's character encoding.

If you need to programmatically process a particular character encoding, the iconv interface exists for this purpose. Use iconv_open("WCHAR_T", "GB18030"); to obtain a conversion descriptor, then convert a couple of kilobytes at a time into a throwaway buffer on the stack, summing the number of output characters obtained from each run.
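
A minimal sketch of that chunked iconv loop might look like the following. The helper name count_chars, its parameters, and the 512-element scratch buffer are assumptions made for illustration, not part of the original answer. It also assumes the platform's iconv accepts "WCHAR_T" as a target and that wchar_t is wide enough to hold any character in one unit, as on typical Linux systems.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of characters in buf (len bytes) when decoded
   from from_enc, or -1 if the encoding is unknown or the input is invalid. */
long count_chars(const char *buf, size_t len, const char *from_enc)
{
    iconv_t cd = iconv_open("WCHAR_T", from_enc);
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)buf;      /* iconv takes char **, but only reads the input */
    size_t in_left = len;
    long total = 0;

    while (in_left > 0) {
        wchar_t scratch[512];    /* throwaway output buffer, reused every pass */
        char *out = (char *)scratch;
        size_t out_left = sizeof scratch;

        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);

        /* Count the wide characters produced in this pass. */
        total += (long)((sizeof scratch - out_left) / sizeof(wchar_t));

        /* E2BIG only means the scratch buffer filled up; keep going.
           EILSEQ or EINVAL means invalid or truncated input. */
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            return -1;
        }
    }

    iconv_close(cd);
    return total;
}
```

For the question's case the call would be count_chars(argv[1], strlen(argv[1]), "GB18030"), provided the installed iconv supports that encoding name. The process locale plays no part in the result.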


Your line:

if (setlocale(LC_ALL, "") == 0)

will reset the locale to the values set in the environment variables, so it may no longer use the Chinese character set. Try removing it, or check the values of the environment variables.
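
One way to avoid the reset is to call setlocale once and check the result of that same call. The sketch below assumes a hypothetical helper count_in_locale that is not in the original code. Locale names are implementation-specific, so a name such as "zh_CN.GB18030" may fail on a system where that locale is not installed.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: number of characters in mbs under the named locale.
   Returns -1 if the locale cannot be set or mbs is invalid in its encoding. */
long count_in_locale(const char *locale_name, const char *mbs)
{
    /* Set the locale once and test this call's return value. A later
       setlocale(LC_ALL, "") would switch to the environment's locale. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* With a NULL destination, POSIX mbstowcs returns the number of wide
       characters the string would convert to, without storing them. */
    size_t n = mbstowcs(NULL, mbs, 0);
    if (n == (size_t)-1)
        return -1;

    return (long)n;
}
```

In the question's program this would be called as count_in_locale("zh_CN.GB18030", argv[1]). Only LC_CTYPE is set here because it is the category that controls multibyte conversion.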
