A Unicode character encoded as UTF-8 octets looks something like 110xxxxx 10xxxxxx. How can I convert these octets to hexadecimal notation like U+XXXX?
You can leverage iconv's UTF-8 decoder to avoid having to write one yourself:
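function utf8_to_codepoints($s) {
    // Convert to UCS-4LE (four bytes per character), then read each
    // 32-bit little-endian integer as one code point.
    return unpack('V*', iconv('UTF-8', 'UCS-4LE', $s));
}
$data = "Caf\xc3\xa9 \xe6\x97\xa5\xe6\x9c\xac \xf0\x9d\x84\x9e"; // Café 日本 𝄞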
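To get from integer code points to the U+XXXX notation the question asks for, each value can be formatted as uppercase hex padded to at least four digits. The following is a minimal sketch along those lines, not necessarily how the answer originally continued; the helper name `utf8_to_uplus` is made up for illustration, and the code assumes the iconv extension is available and that invalid input should raise an exception.

```php
<?php
// Hypothetical helper: returns one "U+XXXX" string per character in $s.
function utf8_to_uplus(string $s): array {
    if ($s === '') {
        return [];
    }
    // iconv() returns false (and raises a notice) on invalid UTF-8.
    $ucs4 = iconv('UTF-8', 'UCS-4LE', $s);
    if ($ucs4 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $out = [];
    // 'V*' reads consecutive unsigned 32-bit little-endian integers.
    foreach (unpack('V*', $ucs4) as $cp) {
        // Pad to at least four hex digits; larger code points use more.
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

$data = "Caf\xc3\xa9 \xe6\x97\xa5\xe6\x9c\xac \xf0\x9d\x84\x9e";
echo implode(' ', utf8_to_uplus($data)), "\n";
```

For this sample string the script prints `U+0043 U+0061 U+0066 U+00E9 U+0020 U+65E5 U+672C U+0020 U+1D11E`. The last character lies outside the Basic Multilingual Plane, so it needs five hex digits.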