Python conversion to ISO-8859-5

Developer (开发者) https://www.devze.com 2022-12-19 15:02 Source: web

I'm having trouble converting a UTF-8 file (containing Russian characters) into an ISO-8859-5 file. The error is: 'charmap' codec can't encode character u'\ufeff' in position 0: character maps to <undefined>. Does anyone have an idea what's wrong, given the following code?

import codecs
import sys

def convert():
    try:
        # Decoding with plain 'utf-8' keeps a leading BOM as u'\ufeff'
        data = codecs.open('in.txt', 'r', 'utf-8').read()
    except Exception, e:
        print e
        sys.exit(1)

    f = open('out.txt', 'w')

    try:
        # This is the call that raises the 'charmap' error
        f.write(data.encode('iso-8859-5'))
    except Exception, e:
        print e
    finally:
        f.close()

"in.txt": ё!—№%«»(эюпоиуыяафйклж;нцхз


U+FEFF is the byte order mark (BOM) character. Your input file starts with one, and ISO-8859-5 has no representation for it.

You'll need to strip it off your data variable before encoding it into ISO-8859-5.
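
A minimal sketch of stripping the mark by hand, written in Python 3 syntax (the question's code is Python 2). The helper name `strip_bom` is hypothetical and not part of any library, and the sample text is made up for illustration:

```python
BOM = "\ufeff"  # U+FEFF, which is what a UTF-8 BOM decodes to

def strip_bom(text):
    """Return text without a single leading byte order mark, if present."""
    return text[1:] if text.startswith(BOM) else text

# Bytes as they would come from a UTF-8 file saved with a BOM.
raw = b"\xef\xbb\xbf" + "эюя".encode("utf-8")

data = strip_bom(raw.decode("utf-8"))
encoded = data.encode("iso-8859-5")  # no longer fails on U+FEFF
```

Text that has no BOM passes through `strip_bom` unchanged, so the same path should work for files saved either way.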


Python 2.5 and later include the utf-8-sig codec, which automatically strips the BOM from a UTF-8-encoded string or file when decoding it:

>>> print '\xef\xbb\xbf\xe3\x81\x82'.decode('utf-8-sig')
あ
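
Applied to the whole file conversion, this could look roughly like the sketch below. It uses Python 3 syntax and temporary files; the function signature, the `errors` parameter and the sample text are assumptions for illustration, not part of the original question. The sample line in the question also contains characters such as « and », which ISO-8859-5 does not define, so encoding may still fail on those after the BOM is gone; passing `errors="replace"` is one way to substitute `?` for them.

```python
import os
import tempfile

def convert(src_path, dst_path, errors="strict"):
    """Re-encode a UTF-8 file (with or without a BOM) as ISO-8859-5."""
    # utf-8-sig drops a leading BOM if present, otherwise acts like utf-8.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        data = src.read()
    # errors="replace" writes "?" for characters ISO-8859-5 cannot represent.
    with open(dst_path, "w", encoding="iso-8859-5",
              errors=errors, newline="") as dst:
        dst.write(data)

# Usage example with throwaway files (the paths are illustrative).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.txt")
dst_path = os.path.join(workdir, "out.txt")

with open(src_path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + "ё№эюя".encode("utf-8"))

convert(src_path, dst_path)

with open(dst_path, "rb") as f:
    result = f.read()
```

With the default `errors="strict"`, an unmappable character still raises `UnicodeEncodeError`, which is probably the safer default when silent data loss is not acceptable.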