I'm having problems converting a UTF-8 file (containing Russian characters) into an ISO-8859-5 file. The conversion fails with: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>. Does anyone have an idea what's wrong, given the following code:
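```python
# Python 2
import codecs
import sys

def convert():
    try:
        # Decode the input file as UTF-8 into a unicode object.
        data = codecs.open('in.txt', 'r', 'utf-8').read()
    except Exception, e:
        print e
        sys.exit(1)
    f = open('out.txt', 'w')
    try:
        # Encode the unicode text as ISO-8859-5 bytes and write them out.
        f.write(data.encode('iso-8859-5'))
    except Exception, e:
        print e
    finally:
        f.close()
```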
Contents of "in.txt": ё!—№%«»(эюпоиуыяафйклж;нцхз
U+FEFF is the byte-order mark (BOM) character. Decoding with plain 'utf-8' keeps it as the first character of the text, and ISO-8859-5 has no representation for it. You'll need to strip it from your data variable before encoding the text as ISO-8859-5.
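A minimal sketch of the manual fix, written for Python 3 (where `str` is already Unicode, so no `codecs.open` is needed). The helper name `strip_bom` is hypothetical, and the sample word 'ёжик' is an assumption chosen because every one of its characters exists in ISO-8859-5. It shows that plain 'utf-8' decoding leaves U+FEFF at the front, that encoding that text reproduces the error from the question, and that dropping the leading character avoids it.

```python
def strip_bom(text):
    """Return text without a single leading U+FEFF (byte-order mark)."""
    if text.startswith('\ufeff'):
        return text[1:]
    return text


# Plain 'utf-8' decoding keeps the BOM as the first character.
raw = b'\xef\xbb\xbf' + 'ёжик'.encode('utf-8')
data = raw.decode('utf-8')

# Encoding the text unchanged fails on that first character, as in the question.
try:
    data.encode('iso-8859-5')
except UnicodeEncodeError as e:
    print('encode failed:', e)

# With the BOM removed, every character maps to an ISO-8859-5 byte.
encoded = strip_bom(data).encode('iso-8859-5')
print(encoded)
```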
Python also provides the utf-8-sig codec, which automatically strips the BOM from the start of a UTF-8-encoded string or file when decoding it:
```python
>>> print '\xef\xbb\xbf\xe3\x81\x82'.decode('utf-8-sig')
あ
```
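For a whole-file conversion, a Python 3 sketch could let `open()` handle both steps: `encoding='utf-8-sig'` when reading and `encoding='iso-8859-5'` when writing. The function name `convert` mirrors the question, while the `errors` parameter and the temporary-file demo are assumptions for illustration. Note that the sample in.txt also contains U+2014 (em dash), « and », none of which exist in ISO-8859-5, so with the default `errors='strict'` the write would still fail once the BOM is gone; `errors='replace'` writes `?` for such characters instead.

```python
import os
import tempfile


def convert(src, dst, errors='strict'):
    """Re-encode a UTF-8 file (with or without a BOM) as ISO-8859-5."""
    # 'utf-8-sig' drops a leading BOM if present and otherwise acts like 'utf-8'.
    # newline='' on both sides keeps the original line endings unchanged.
    with open(src, encoding='utf-8-sig', newline='') as f:
        data = f.read()
    # errors='strict' raises UnicodeEncodeError for characters ISO-8859-5
    # cannot represent; errors='replace' writes '?' for them instead.
    with open(dst, 'w', encoding='iso-8859-5', errors=errors, newline='') as f:
        f.write(data)


# Demo: a BOM-prefixed input mixing encodable and unencodable characters.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'in.txt')
    dst = os.path.join(tmp, 'out.txt')
    with open(src, 'wb') as f:
        f.write(b'\xef\xbb\xbf' + 'ё!\u2014№%«»'.encode('utf-8'))
    convert(src, dst, errors='replace')
    with open(dst, 'rb') as f:
        print(f.read())  # unencodable characters appear as b'?'
```

Keeping `errors='strict'` as the default means unexpected characters are reported rather than silently replaced; pass `'replace'` only when losing them is acceptable.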