I have this code:
encoding = guess_encoding()
text = unicode(text, encoding)
When an invalid symbol appears in the text, a UnicodeDecodeError exception is raised. How can I silently skip the exception and replace the invalid symbol with '?'?
Try
text = unicode(text, encoding, "replace")
From the documentation:
'replace' causes the official Unicode replacement character, U+FFFD, to be used to replace input characters which cannot be decoded.
If you want to use "?" instead of the official Unicode replacement character, you can do
text = text.replace(u"\uFFFD", "?")
after converting to unicode.
In Python 3, you can decode a bytes object into a string using the decode method. It accepts two parameters: encoding, which is "utf-8" by default, and errors, which defines what to do on illegal character sequences. The default value is "strict", which raises a UnicodeDecodeError; other alternatives are "ignore" and "replace" -- the latter replaces illegal characters with the Unicode replacement character "\uFFFD".
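To make the difference between the three error handlers concrete, here is a minimal sketch. The sample bytes are made up for illustration: 0xE9 is "é" in Latin-1 but is not a valid UTF-8 sequence on its own.

```python
# Illustrative input (an assumption, not from the question):
# 0xE9 is "é" in Latin-1 but is not valid UTF-8 by itself.
data = b"caf\xe9 ok"

# errors="strict" (the default) raises UnicodeDecodeError.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops the undecodable byte.
ignored = data.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD for the undecodable byte.
replaced = data.decode("utf-8", errors="replace")

print(strict_failed)
print(repr(ignored))
print(repr(replaced))
```

Note that "ignore" loses the information that something was wrong at that position, which is why "replace" is usually the safer choice when you want to keep going.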
Therefore, you'd need to do this to decode-and-replace:
encoding = guess_encoding()
text = text_bytes.decode(encoding, errors='replace').replace('\uFFFD', '?')
As Sven Marnach pointed out in a comment, you can supply the errors argument directly to open; otherwise you'd get the decode errors while reading the file (if it contains bytes that fall outside the character map).
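A rough sketch of passing errors to open in Python 3 might look like the following. The temporary file and its contents are purely illustrative, and "utf-8" stands in for whatever guess_encoding() would return in the question.

```python
import os
import tempfile

# Write some bytes that are not valid UTF-8 to a temporary file
# (illustrative data; in practice this would be your real input file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"caf\xe9 ok\n")
    path = tmp.name

try:
    # errors="replace" makes the text layer substitute U+FFFD for
    # undecodable bytes instead of raising UnicodeDecodeError on read.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read().replace("\ufffd", "?")
finally:
    os.remove(path)

print(repr(text))
```

One caveat with the final replace call: any U+FFFD that was legitimately present in the file would also be turned into "?".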