I am currently working on UNIX and COBOL and have hit a requirement where I need to provide the number of Chinese and Korean characters in a received message, which I plan to do in a C program using mbstowcs.
I am using the code below, but it does not give the correct count for Chinese double-byte characters; it returns the byte count instead.
#include <wchar.h>
#include <stdio.h>
#include <stdlib.h>   /* mbstowcs */
#include <locale.h>

int getCharCount(char *argv);   /* declare before use in main */

int main(int argc, char *argv[])
{
    if (argc != 2) /* argc should be 2 for correct execution */
    {
        /* We print argv[0] assuming it is the program name */
        printf("usage: %s message\n", argv[0]);
        return 1;
    }
    int Size = getCharCount(argv[1]);
    printf("THE CHAR COUNT %d\n", Size);
    return Size;
}

int getCharCount(char *argv)
{
    char *mbsVal = NULL;
    setlocale(LC_ALL, "zh_CN.GB18030");
    /* verify locale is set */
    if (setlocale(LC_ALL, "") == 0)
    {
        fprintf(stderr, "Failed to set locale\n");
        return 1;
    }
    mbsVal = argv;
    printf(" MBSVAL %s\n", mbsVal);
    /* validate multibyte string and count wide characters (NULL dest: count only) */
    size_t size = mbstowcs(NULL, mbsVal, 0);
    if (size == (size_t)-1)
    {
        printf("Invalid multibyte\n");
        return 1;
    }
    return (int)size;
}
Appreciate your kind response...
Regards
Akm
Setting the locale to a specific value chosen by the programmer in order to process a particular character set is incorrect usage. Not only are locale names implementation-specific; they're also intended to reflect the user's or system's character encoding.
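To illustrate the intended pattern, a minimal sketch that adopts whatever locale the user's environment specifies and then asks which character encoding it implies, using the POSIX nl_langinfo(CODESET) query. The helper name user_codeset is hypothetical; a program could compare the result against the encoding it expects before relying on mbstowcs.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the user's LC_CTYPE from the environment
   (LC_ALL, LC_CTYPE, LANG) and return the name of its character encoding.
   Returns NULL if the environment names a locale that is not installed. */
const char *user_codeset(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);   /* e.g. an encoding name such as "GB18030" */
}
```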
If you need to process a particular character encoding programmatically, the iconv interface exists for this purpose. Use iconv_open("WCHAR_T", "GB18030") to obtain a conversion descriptor, then convert a couple of KB at a time into a throwaway buffer on the stack, summing the number of output characters obtained from each run.
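A minimal sketch of that iconv approach, assuming the platform's iconv accepts "WCHAR_T" as a target encoding name (glibc and GNU libiconv do; other implementations may not). The function name count_gb18030_chars is hypothetical. Because the output is wchar_t, the character count for each run is the number of output bytes divided by sizeof(wchar_t).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in a GB18030-encoded buffer.
   Returns -1 if the conversion is unavailable or the input is invalid
   or truncated. Assumes iconv understands the "WCHAR_T" target name. */
long count_gb18030_chars(const char *in, size_t inlen)
{
    iconv_t cd = iconv_open("WCHAR_T", "GB18030");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;   /* iconv takes char ** for historical reasons */
    long total = 0;

    for (;;) {
        wchar_t buf[512];      /* throwaway output buffer on the stack */
        char *outp = (char *)buf;
        size_t outleft = sizeof buf;
        size_t rc = iconv(cd, &inp, &inlen, &outp, &outleft);

        /* Count what this run produced, even if it stopped early. */
        total += (long)((sizeof buf - outleft) / sizeof(wchar_t));
        if (rc != (size_t)-1)
            break;             /* all input consumed */
        if (errno != E2BIG) {  /* EILSEQ or EINVAL: invalid or truncated input */
            iconv_close(cd);
            return -1;
        }
        /* E2BIG: buffer full, loop to convert the next chunk */
    }

    /* Flush any pending shift state (a no-op for stateless encodings). */
    wchar_t tail[8];
    char *outp = (char *)tail;
    size_t outleft = sizeof tail;
    iconv(cd, NULL, NULL, &outp, &outleft);
    total += (long)((sizeof tail - outleft) / sizeof(wchar_t));

    iconv_close(cd);
    return total;
}
```

For example, the GB18030 bytes "\xD6\xD0\xCE\xC4" (the two characters 中文) count as 2 characters even though they occupy 4 bytes, and the result does not depend on the process locale at all.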
Your line:
if (setlocale(LC_ALL, "") == 0)
will reset the locale to the values set in the environment variables (LC_ALL, LC_CTYPE, LANG), so it may no longer be the Chinese character set. Try removing it, or check the values of those environment variables.
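Following that suggestion, a minimal sketch that checks the return value of the specific setlocale call instead of calling setlocale(LC_ALL, "") afterwards. It only sets LC_CTYPE, the category that governs mbstowcs. The function name count_chars_in_locale is hypothetical, and the locale name passed in (for example "zh_CN.GB18030") must actually be installed on the system.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the characters in a multibyte string as
   interpreted by the named locale. Returns -1 if the locale is not
   installed or the string is not valid in that locale's encoding. */
long count_chars_in_locale(const char *mbs, const char *locale_name)
{
    /* setlocale returns NULL when the requested locale is unavailable. */
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        fprintf(stderr, "locale %s not available\n", locale_name);
        return -1;
    }
    size_t n = mbstowcs(NULL, mbs, 0);   /* NULL destination: count only */
    if (n == (size_t)-1)
        return -1;                       /* invalid multibyte sequence */
    return (long)n;
}
```

Called as count_chars_in_locale(argv[1], "zh_CN.GB18030"), this reports a missing locale explicitly instead of silently falling back to the environment's settings.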