How can I convert Japanese characters to Unicode in Perl?

Developer https://www.devze.com 2022-12-30 11:32 Source: Internet
Can you point me to a tool to convert Japanese characters to Unicode?


CPAN gives me "Unicode::Japanese". I hope this is a helpful starting point. You can also look at the article on Character Encodings in Perl and the perldoc for Unicode for more information.


See http://p3rl.org/UNI.

use Encode qw(decode encode);
my $bytes_in_sjis_encoding = "\x88\xea\x93\xf1\x8e\x4f";
my $unicode_string = decode('Shift_JIS', $bytes_in_sjis_encoding); # returns 一二三
my $bytes_in_utf8_encoding = encode('UTF-8', $unicode_string); # returns "\xe4\xb8\x80\xe4\xba\x8c\xe4\xb8\x89"

For batch conversion from the command line, use piconv:

piconv -f Shift_JIS -t UTF-8 < infile > outfile


First, you need to find out the encoding of the source text if you don't know it already.

The most common encodings for Japanese are:

  1. euc-jp (often used on Unix systems and on some web pages; it covers more kanji than shift-jis)
  2. shift-jis (Microsoft added some extensions to shift-jis, and the extended version is called cp932; it is often used by non-Unicode Windows programs)
  3. iso-2022-jp (a distant third)
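If the encoding is unknown, one possible approach is to let Perl try the candidates for you. The following is a minimal sketch, assuming the core Encode::Guess module is available; the helper name guess_japanese is hypothetical, and the suspect list (euc-jp, shiftjis, 7bit-jis) follows the example in the Encode::Guess documentation rather than anything in the answers here. Guessing is a heuristic: short or ambiguous input can match more than one encoding, in which case the helper dies instead of picking one.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding() by default

# Hypothetical helper: try the common Japanese encodings and decode
# the bytes only if exactly one of the suspects fits.
sub guess_japanese {
    my ($bytes) = @_;

    my $enc = guess_encoding($bytes, qw(euc-jp shiftjis 7bit-jis));

    # On failure guess_encoding() returns an error string, not an object.
    ref $enc or die "Cannot guess encoding: $enc\n";

    # Return the name of the detected encoding and the decoded text.
    return ($enc->name, $enc->decode($bytes));
}
```

Calling guess_japanese($bytes) in list context gives you the detected encoding name and the decoded string; wrap the call in eval if you would rather handle a failed guess than let the program die.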

iconv is a common encoding conversion library with bindings for many languages (see http://en.wikipedia.org/wiki/Iconv and http://search.cpan.org/~mpiotr/Text-Iconv-1.7/Iconv.pm); it supports the Japanese encodings as well as many others.


This question seems a bit vague to me; I'm not sure what you're asking. Usually you would use something like this:

open my $file, "<:encoding(cp932)", "JapaneseFile.txt" or die "Cannot open JapaneseFile.txt: $!";

to open a file containing Japanese characters. As you read from the handle, Perl automatically decodes the text into its internal Unicode format.
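The same layer mechanism works for output, so a whole file can be re-encoded by reading through one encoding layer and writing through another. Here is a rough sketch of that idea; the helper name convert_to_utf8 and its arguments are hypothetical, and it assumes you already know the source encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a text file, decoding it from $from_enc
# on input and encoding it as UTF-8 on output.
sub convert_to_utf8 {
    my ($in_path, $out_path, $from_enc) = @_;

    open my $in, "<:encoding($from_enc)", $in_path
        or die "Cannot open $in_path: $!";
    open my $out, ">:encoding(UTF-8)", $out_path
        or die "Cannot open $out_path: $!";

    # Each line is a Unicode string here; the layers handle the bytes.
    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

For example, convert_to_utf8("JapaneseFile.txt", "JapaneseFile.utf8.txt", "cp932") would do roughly what the piconv command line does, but from inside a Perl program.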
