I use wchar_t for internal strings and UTF-8 for storage in files. I need to use the STL to input/output text to the screen, and to do it with the full Lithuanian character set.
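#include <io.h>       // _setmode, _fileno (MSVC CRT)
#include <fcntl.h>    // _O_U16TEXT
#include <iostream>
using namespace std;
int main ()
{
    // Switch stdout to UTF-16 mode so wide characters reach the console intact
    _setmode (_fileno (stdout), _O_U16TEXT);
    wcout << L"AaĄąfl" << endl;
}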
But I became curious and attempted to do the same with files, without success. Of course I could use formatted input/output, but that is... discouraged.
FILE* fp;
_wfopen_s (&fp, L"utf-8_out_test.txt", L"w");
_setmode (_fileno (fp), _O_U8TEXT);      // wide output is written to the file as UTF-8
_fwprintf_p (fp, L"AaĄą\nfl");
fclose (fp);
_wfopen_s (&fp, L"utf-8_in_test.txt", L"r");
_setmode (_fileno (fp), _O_U8TEXT);      // UTF-8 input is decoded into wchar_t
wchar_t text[256];
fseek (fp, 0, SEEK_SET);                 // the offset is a long, so 0 rather than NULL
fwscanf (fp, L"%255ls", text);           // %ls = wide string; width 255 leaves room for L'\0'
wcout << text << endl;
fwscanf (fp, L"%255ls", text);
wcout << text << endl;
fclose (fp);
This snippet works perfectly (although I am not sure how it handles malformed characters). So, is there any way to:
- get a FILE* or integer file handle from a std::basic_*fstream?
- simulate _setmode () on it?
- extend std::basic_*fstream so it handles UTF-8 I/O?
Yes, I am studying at a university and this is somewhat related to my assignments, but I am trying to figure this out for myself. It won't influence my grade or anything like that.
Use a std::codecvt facet to perform the conversion.
You may use the standard std::codecvt_byname, or a non-standard codecvt_facet implementation.
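#include <iostream>
#include <locale>
using namespace std;
int main ()
{
    // Copy of the global locale whose wchar_t <-> char conversion comes from the named UTF-8 locale
    locale utf8locale (locale (), new codecvt_byname<wchar_t, char, mbstate_t> ("en_US.UTF-8"));
    wcout.imbue (utf8locale);   // imbue() is the stream member; pubimbue() belongs to the streambuf
    wcout << L"Hello, wide to multibyte world!" << endl;
}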
Beware that on some platforms codecvt_byname can only perform conversions for locales that are installed on the system.
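The same idea also answers the "extend std::basic_*fstream" question: a file stream converts through the codecvt facet of its imbued locale, so imbuing a UTF-8 facet makes a wofstream/wifstream read and write UTF-8. A minimal sketch, assuming a C++11 library: instead of codecvt_byname it swaps in std::codecvt_utf8<wchar_t> (deprecated since C++17 but still shipped by the major libraries), so it does not depend on a named locale being installed. The helper names utf8_round_trip and read_raw_bytes are hypothetical.

```cpp
#include <codecvt>   // std::codecvt_utf8 (C++11; deprecated since C++17)
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: a locale whose codecvt facet converts wchar_t <-> UTF-8.
// Where wchar_t is 16 bits (Windows) this facet covers only the BMP (UCS-2).
std::locale make_utf8_locale()
{
    return std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
}

// Hypothetical helper: writes `text` to `path` as UTF-8, then reads the first line back.
std::wstring utf8_round_trip(const std::wstring& text, const char* path)
{
    const std::locale utf8 = make_utf8_locale();
    {
        std::wofstream out;
        out.imbue(utf8);   // imbue before open() so all output goes through the facet
        out.open(path);
        out << text;
    }                      // destructor flushes the converted bytes and closes the file

    std::wifstream in;
    in.imbue(utf8);
    in.open(path);
    std::wstring line;
    std::getline(in, line);
    return line;
}

// Hypothetical helper: reads the file as raw bytes to check what actually reached the disk.
std::string read_raw_bytes(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

All Lithuanian letters lie in the BMP, so the 16-bit limitation does not matter for this use; for arbitrary text on Windows, std::codecvt_utf8_utf16<wchar_t> would be the closer fit.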
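Well, after some testing I figured out that a FILE* is accepted where the w*fstream constructor expects an _iobuf* (in MSVC, FILE is a typedef for struct _iobuf, and this FILE*-taking constructor is a Microsoft extension). So, the following code does what I need.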
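#include <iostream>
#include <fstream>
#include <io.h>
#include <fcntl.h>
#include <cstdio>
using namespace std;
int main ()
{
    // For writing
    {
        FILE* fp;
        _wfopen_s (&fp, L"utf-8_out_test.txt", L"w");
        _setmode (_fileno (fp), _O_U8TEXT);
        wofstream fs (fp);       // MSVC-specific constructor taking a FILE*
        fs << L"ąfl";
        fs.flush ();             // push buffered text into fp before closing it
        fclose (fp);
    }
    // And reading
    {
        FILE* fp;
        _wfopen_s (&fp, L"utf-8_in_test.txt", L"r");
        _setmode (_fileno (fp), _O_U8TEXT);
        wifstream fs (fp);
        wchar_t array[6];
        fs.getline (array, 5);   // reads at most 4 characters plus the terminating L'\0'
        wcout << array << endl;  // For debug
        fclose (fp);
    }
}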
This sample reads and writes valid UTF-8 files (without a BOM) on Windows when compiled with Visual Studio 2008.
Can anyone comment on portability? Improvements?
The easiest way would be to do the conversion to UTF-8 yourself before trying to output. You might get some inspiration from this question: UTF8 to/from wide char conversion in STL
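A minimal sketch of doing the conversion yourself, assuming wchar_t holds UTF-16 (16-bit, as on Windows) or UTF-32 (32-bit, as on most Unix systems). The names to_utf8 and from_utf8 are made up for this example. Malformed input is replaced with U+FFFD rather than rejected; a stricter program might throw instead.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
static void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Wide string -> UTF-8. Surrogate pairs are joined when wchar_t is 16 bits;
// unpaired surrogates and out-of-range values become U+FFFD.
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}

// UTF-8 -> wide string. Invalid lead bytes, bad continuation bytes, truncated,
// overlong or out-of-range sequences each produce U+FFFD and skip one byte.
std::wstring from_utf8(const std::string& in)
{
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        int len = b < 0x80 ? 1 : (b >> 5) == 0x6 ? 2 : (b >> 4) == 0xE ? 3 : (b >> 3) == 0x1E ? 4 : 0;
        char32_t cp = len == 1 ? b : len == 2 ? (b & 0x1F) : len == 3 ? (b & 0x0F) : (b & 0x07);
        bool ok = len != 0 && i + len <= in.size();
        for (int k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (ok && (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;
        if (!ok) { out += static_cast<wchar_t>(0xFFFD); ++i; continue; }
        if (sizeof(wchar_t) == 2 && cp >= 0x10000) {   // split into a surrogate pair
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
        i += len;
    }
    return out;
}
```

The resulting std::string can be written with an ordinary narrow std::ofstream opened in binary mode, and bytes read back into a std::string can be passed to from_utf8, so no stream needs to know about UTF-8 at all.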
get FILE* or integer file handle from a std::basic_*fstream?
Answered elsewhere.
You can't make the STL work directly with UTF-8. The basic reason is that the STL indirectly forbids multi-unit characters: each character has to be a single char/wchar_t.
Microsoft actually breaks this rule with its UTF-16 encoding (characters outside the Basic Multilingual Plane take two wchar_t units), so maybe you can get some inspiration there.
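To make the "one character, several units" point concrete: in UTF-8 the Lithuanian "ą" takes two bytes, so std::string::size() counts bytes, not characters. A small sketch, assuming well-formed UTF-8 input; count_code_points is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char b : utf8)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}
```

For the UTF-8 string "Aą" (bytes 41 C4 85), size() is 3 while count_code_points returns 2.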