How can I make Python 3 (3.1) print("Some text") to stdout in UTF-8, or how can I output raw bytes?
Test.py
import sys

TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8
TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd" # just bytes
print(sys.getdefaultencoding())
print(sys.stdout.encoding)
print(TestText)
print(TestText.encode("utf8"))
print(TestText.encode("cp1252","replace"))
print(TestText2)
Output (in CP1257; I replaced the non-ASCII characters with their byte values, written as [xNN]):
utf-8
cp1257
Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
b'Test - ??????..\x9a\x8a??\x9e\x8e'
b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
print is just too smart... :D There's no point in passing encoded text to print (it always shows only the representation of the bytes, not the real bytes), and it's impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.
For example, print(chr(255)) throws an error:
Traceback (most recent call last):
  File "Test.py", line 1, in <module>
    print(chr(255));
  File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>
By the way, print(TestText == TestText2.decode("utf8")) returns False, although the printed output is the same.
How does Python 3 determine sys.stdout.encoding, and how can I change it?
I made a printRAW() function which works fine (actually it encodes the output to UTF-8, so it's not really raw...):
def printRAW(*Text):
    RAWOut = open(1, 'w', encoding='utf8', closefd=False)
    print(*Text, file=RAWOut)
    RAWOut.flush()
    RAWOut.close()
printRAW("Cool", TestText)
Output (now it prints in UTF-8):
Cool Test - āĀēĒčČ..šŠūŪžŽ
printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)
Now I'm looking for a better solution, if there is one...
Clarification:
TestText = "Test - āĀēĒčČ..šŠūŪžŽ" # this is not UTF-8; it is a Unicode string in Python 3.x.
TestText2 = TestText.encode('utf8') # this is a UTF-8-encoded byte string.
To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:
import sys
sys.stdout.buffer.write(TestText2)
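One thing to watch with the buffer interface: sys.stdout and sys.stdout.buffer are buffered separately, so text printed earlier can come out after the bytes unless you flush first. Here is a minimal sketch of a wrapper that takes care of that. The name write_utf8 and its optional stream parameter are made up for illustration; they are not part of any library.

```python
import sys

def write_utf8(text, stream=None):
    """Write text to a binary stream as UTF-8 bytes (hypothetical helper)."""
    if stream is None:
        sys.stdout.flush()           # push out pending text-layer output first
        stream = sys.stdout.buffer   # the binary stream underneath sys.stdout
    data = text.encode("utf-8")      # str -> UTF-8 bytes
    stream.write(data)
    stream.flush()
    return len(data)                 # number of bytes written
```

Calling write_utf8("Test - āĀēĒčČ..šŠūŪžŽ\n") should then send the UTF-8 bytes to stdout no matter what sys.stdout.encoding is. Whether the console displays them correctly still depends on the console itself.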
This is the best I can dope out from the manual, and it's a bit of a dirty hack:
utf8stdout = open(1, 'w', encoding='utf-8', closefd=False) # fd 1 is stdout
print(whatever, file=utf8stdout)
It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.
If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, the output can come out in the wrong order, because each object keeps its own buffer.
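To keep the two objects from stepping on each other, you can flush on both sides of every call. This is only a sketch of that idea: print_utf8 and its fd parameter are names I made up, and fd defaults to 1 (stdout) as in the snippet above.

```python
import sys

def print_utf8(*values, fd=1, **print_kwargs):
    """print() to a file descriptor as UTF-8 (hypothetical helper)."""
    sys.stdout.flush()  # emit whatever the regular stdout has buffered so far
    # closefd=False keeps the descriptor open when the wrapper is closed.
    with open(fd, "w", encoding="utf-8", closefd=False) as out:
        print(*values, file=out, **print_kwargs)
    # Leaving the with-block flushes and closes `out`, so nothing is left
    # in its buffer by the time sys.stdout is written to again.
```

Opening a new wrapper on every call is wasteful if you print a lot, but it does make the ordering easy to reason about.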
As per this answer, you can manually reconfigure the encoding of stdout as of Python 3.7:
import sys
sys.stdout.reconfigure(encoding='utf-8')
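If the same script also has to run on interpreters older than 3.7, something along these lines might work as a fallback. It is a sketch: force_utf8 is a made-up name, and the fallback branch assumes the stream is an io.TextIOWrapper, which sys.stdout normally is.

```python
import io
import sys

def force_utf8(stream):
    """Return a text stream that encodes as UTF-8 (hypothetical helper)."""
    if hasattr(stream, "reconfigure"):
        # io.TextIOWrapper.reconfigure() exists from Python 3.7 on.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older versions: flush, take over the underlying binary buffer,
    # and wrap it in a new UTF-8 text layer.
    stream.flush()
    return io.TextIOWrapper(stream.detach(), encoding="utf-8",
                            line_buffering=True)
```

You would use it as sys.stdout = force_utf8(sys.stdout) near the top of the script, before anything is printed.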
I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.
But iljau's solution worked: Reopen stdout with a different encoding.
import sys
sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)
You can set the encoding of stdout to UTF-8 (with line buffering) with:
import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)