How to substitute cp1250 specific characters to utf-8 in Vim

Developer (https://www.devze.com), 2023-02-20 21:16. Source: Internet
I have some Central European characters in cp1250 encoding in Vim. When I change the encoding with set encoding=utf-8, they appear as <d0> and the like. How can I substitute those characters over the entire file with what they should be, i.e. Đ in this case?


As sidyll said, you should really use iconv for this. Iconv knows stuff. It knows all the hairy encodings, obscure code points, katakana, denormalized and canonical forms, compositions, nonspacing characters and the rest.

:%!iconv --from-code cp1250 --to-code utf-8

or shorter

:%!iconv -f cp1250 -t utf-8

to filter the whole buffer. If you do

:he xxd

you'll get a sample of how to automatically encode on buffer load/save, if you want that.

iconv -l lists all the encodings it accepts (many: 1168 on my system).
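If you want to check the conversion outside Vim first, here is a minimal sketch of the same filter run from the shell. The file names are made up for illustration, and it assumes your iconv build accepts the name cp1250 (glibc and libiconv normally do). It writes the single byte 0xD0, which is Đ in cp1250, converts it, and dumps the result as hex.

```shell
#!/bin/sh
# Sketch: the same conversion as :%!iconv -f cp1250 -t utf-8, file to file.
# sample-cp1250.txt and sample-utf8.txt are hypothetical names.

workdir=$(mktemp -d)
src="$workdir/sample-cp1250.txt"
dst="$workdir/sample-utf8.txt"

# Byte 0xD0 (octal 320) is "Đ" in cp1250; on its own it is not valid UTF-8,
# which is why Vim shows it as <d0> when the text is read as UTF-8.
printf '\320\n' > "$src"

# Convert from cp1250 to UTF-8.
iconv -f cp1250 -t utf-8 "$src" > "$dst"

# Dump the converted bytes as hex with the spaces and newlines removed.
hex=$(od -An -tx1 "$dst" | tr -d ' \n')
echo "$hex"   # c4 90 is U+0110 (Đ) in UTF-8, 0a is the newline

rm -rf "$workdir"
```

If the dump shows c4900a, the conversion does what you want and you can run the :%!iconv filter on the real buffer with more confidence.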

Happy hacking!


The iconv() function may be useful:

iconv({expr}, {from}, {to})             *iconv()*
        The result is a String, which is the text {expr} converted
        from encoding {from} to encoding {to}.
        When the conversion fails an empty string is returned.
        The encoding names are whatever the iconv() library function
        can accept, see ":!man 3 iconv".
        Most conversions require Vim to be compiled with the |+iconv|
        feature.  Otherwise only UTF-8 to latin1 conversion and back
        can be done.
        This can be used to display messages with special characters,
        no matter what 'encoding' is set to.  Write the message in
        UTF-8 and use:
            echo iconv(utf8_str, "utf-8", &enc)
        Note that Vim uses UTF-8 for all Unicode encodings, conversion
        from/to UCS-2 is automatically changed to use UTF-8.  You
        cannot use UCS-2 in a string anyway, because of the NUL bytes.
        {only available when compiled with the +multi_byte feature}


You can set encoding to the value of your file's encoding and termencoding to UTF-8. See the Vim mbyte documentation (:help mbyte.txt).
