I ran the following code on Windows XP with Python 2.6.4, but it raises an IOError.
How can I open a file whose name is encoded in UTF-8?
>>> open( unicode('한글.txt', 'euc-kr').encode('utf-8') )
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    open( unicode('한글.txt', 'euc-kr').encode('utf-8') )
IOError: [Errno 22] invalid mode ('r') or filename: '\xed\x95\x9c\xea\xb8\x80.txt'
However, the following code works normally:
>>> open( unicode('한글.txt', 'euc-kr') )
<open file u'\ud55c\uae00.txt', mode 'r' at 0x01DD63E0>
The C runtime interface that Windows exposes to Python uses the system code page to encode filenames. Unlike on OS X and modern Linux versions, on Windows the system code page is never UTF-8. So the UTF-8 byte string won't be any good.
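To see the mismatch concretely, here is a minimal sketch comparing the UTF-8 bytes from the traceback with the bytes that cp949 (the code page of a Korean-locale Windows) uses for the same name. It relies only on the standard 'utf-8' and 'cp949' codecs and should run on Python 2.6+ as well as Python 3.3+.

```python
# Compare the bytes two encodings produce for the same filename.
# u'\ud55c\uae00.txt' is the name from the question (two Hangul syllables + '.txt').
name = u'\ud55c\uae00.txt'

utf8_bytes = name.encode('utf-8')    # what the failing call passed to open()
cp949_bytes = name.encode('cp949')   # what a Korean-locale Windows expects

print(repr(utf8_bytes))
print(repr(cp949_bytes))
```

The first value matches the bytes shown in the IOError message; the second is what the Windows byte-string filename API would actually understand on that system.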
You could encode the filename to the current code page using .encode('mbcs'), which in your case is probably equivalent to .encode('cp949'). To make it compatible with other platforms where filenames are UTF-8, you could look up sys.getfilesystemencoding(), which will give you utf-8 there or mbcs on Windows.
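A minimal sketch of that approach, assuming you really do want byte-string filenames. The helper name native_filename is hypothetical, and the file is created in a throwaway temporary directory only so the example is self-contained. Note that the 'mbcs' codec exists only on Windows, and that on Python 3.6+ under Windows sys.getfilesystemencoding() reports 'utf-8' rather than 'mbcs' (PEP 529), so the encoding actually chosen there differs from the Python 2 behaviour described above.

```python
import os
import shutil
import sys
import tempfile


def native_filename(name):
    """Encode a Unicode filename (hypothetical helper) with the encoding the
    OS uses for byte-string filenames, as reported by sys.getfilesystemencoding()."""
    return name.encode(sys.getfilesystemencoding())


workdir = tempfile.mkdtemp()
try:
    # Build the full path as Unicode first, then convert it to bytes in one step.
    path = os.path.join(workdir, u'\ud55c\uae00.txt')
    byte_path = native_filename(path)

    # Open the file through the byte-string path.
    with open(byte_path, 'wb') as f:
        f.write(b'hello')
    with open(byte_path, 'rb') as f:
        content = f.read()
finally:
    shutil.rmtree(workdir)
```

If the filesystem encoding cannot represent the characters in the name, native_filename raises UnicodeEncodeError instead of producing a wrong filename.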
However, whilst cp949 would work for Korean characters, it would break on anything outside the repertoire of that code page (an extended version of EUC-KR).
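To illustrate the repertoire problem, a small sketch: the helper encode_or_none is hypothetical, and the Thai letter U+0E01 is just an arbitrary example of a character that cp949 cannot represent. Hangul encodes fine, while the Thai name raises UnicodeEncodeError, which the helper turns into None; UTF-8 handles both.

```python
def encode_or_none(name, encoding):
    """Return name encoded with encoding, or None if it can't be represented."""
    try:
        return name.encode(encoding)
    except UnicodeEncodeError:
        return None


korean = u'\ud55c\uae00.txt'   # Hangul: inside cp949's repertoire
thai = u'\u0e01.txt'           # THAI CHARACTER KO KAI: outside it

korean_bytes = encode_or_none(korean, 'cp949')
thai_bytes = encode_or_none(thai, 'cp949')
thai_utf8 = encode_or_none(thai, 'utf-8')
```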
So another approach is to keep your filenames as Unicode. On Windows this will use the Unicode-native interfaces to pass filenames to Windows in the UTF-16LE encoding it uses internally. (See PEP 277 for more on this feature.)
This generally works on other platforms too: Linux and OS X should silently encode the Unicode filenames to UTF-8 for you (more precisely, to sys.getfilesystemencoding(), which is normally UTF-8). It may be less reliable in older Python versions, but it's the default way to handle filenames in Python 3, where the default string type has changed to Unicode.
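A minimal sketch of the Unicode-only approach, run in a throwaway temporary directory (an assumption made so the example is self-contained). It never encodes the name by hand, and should work unchanged on Python 2.6+ and Python 3.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
try:
    # Keep the whole path as a Unicode string; Python picks the OS interface
    # (the wide-character API on Windows, the filesystem encoding elsewhere).
    path = os.path.join(workdir, u'\ud55c\uae00.txt')
    with open(path, 'w') as f:
        f.write('hello')
    with open(path) as f:
        content = f.read()

    # Passing a Unicode directory path asks os.listdir for Unicode names back.
    names = os.listdir(os.path.dirname(path))
finally:
    shutil.rmtree(workdir)
```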
The traps to watch out for when using Unicode filenames on Python 2 are:
- If os.path.supports_unicode_filenames is False, as it will be outside Windows, functions that return filenames, such as os.listdir, give you byte strings (os.listdir returns Unicode names only when you pass it a Unicode path). You'd have to detect that and decode them using sys.getfilesystemencoding().
- If you have a file on Linux/OS X with a name that's not a valid UTF-8 string, you won't be able to get a Unicode filename for it (UnicodeDecodeError if you try). It's a bit of a corner case, but it can lead to annoyingly inaccessible files.
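Both traps can be handled in one place. In this minimal sketch, listdir_unicode is a hypothetical helper name (not part of the os module): it decodes any byte-string entries with the filesystem encoding and, for the undecodable corner case, keeps the raw bytes rather than raising, so the file stays reachable.

```python
import os
import shutil
import sys
import tempfile


def listdir_unicode(path):
    """List a directory, returning Unicode names wherever possible.

    Byte-string entries (Python 2 outside Windows, or a bytes path on
    Python 3) are decoded with sys.getfilesystemencoding(). Names that
    are not valid in that encoding are kept as raw bytes.
    """
    fs_encoding = sys.getfilesystemencoding()
    result = []
    for entry in os.listdir(path):
        if isinstance(entry, bytes):
            try:
                entry = entry.decode(fs_encoding)
            except UnicodeDecodeError:
                pass  # the corner case: leave the name undecoded
        result.append(entry)
    return result


workdir = tempfile.mkdtemp()
try:
    open(os.path.join(workdir, u'\ud55c\uae00.txt'), 'w').close()
    # Deliberately use a byte-string directory path so os.listdir returns bytes.
    if isinstance(workdir, bytes):
        byte_dir = workdir
    else:
        byte_dir = workdir.encode(sys.getfilesystemencoding())
    names = listdir_unicode(byte_dir)
finally:
    shutil.rmtree(workdir)
```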
Incidentally, in
open(unicode('한글.txt', 'euc-kr'))
you would probably want to say 'cp949' rather than 'euc-kr' (the Windows Korean code page has minor differences from EUC-KR). Or, more generally, 'mbcs', which gives you the system code page and is presumably the same one your console is typing in. Anyway, I don't know about PyShell, but normally if the above works, you should be able to type a Unicode literal directly:
open(u'한글.txt')
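A small sketch of why 'cp949' is the safer choice. It assumes the console delivers cp949 bytes (true for a Korean-locale Windows console, but an assumption here) and uses U+B620 simply as one example of a Hangul syllable that the base EUC-KR character set lacks but cp949's extension includes.

```python
# The bytes a Korean-locale console produces for u'\ud55c\uae00' ("Hangul").
console_bytes = b'\xc7\xd1\xb1\xdb'

# For these two syllables EUC-KR and cp949 agree, which is why the
# question's unicode(..., 'euc-kr') call happened to work.
as_euc_kr = console_bytes.decode('euc-kr')
as_cp949 = console_bytes.decode('cp949')

# U+B620 lives only in cp949's extension area, so its cp949 bytes
# are not valid EUC-KR.
extra = u'\ub620'.encode('cp949')
try:
    extra.decode('euc-kr')
    euc_kr_ok = True
except UnicodeDecodeError:
    euc_kr_ok = False
```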