Issue parsing HTML file with PHP

https://www.devze.com 2023-03-03 04:05 Source: web
I have some code that parses an HTML file, and I stumbled across a page containing this character, which broke the parsing: “

When I execute the following code, $len is assigned a value of 3.

$test = "“";
$len = strlen($test);

I suspect that this character might be Unicode.

For now I'm getting around this problem by replacing the curly double quote with a standard double quote. However, I'm concerned about other files that might contain similar characters, and I don't want to write a replace call for each separate case.

How do I get PHP to treat this as a single character?


PHP's standard string handling functions are not multibyte-aware; they just count the number of bytes in the string. In UTF-8 the character “ is encoded as three bytes, which is why strlen() returns 3.

If you have the multibyte extension installed, mb_strlen() is what you are looking for.

For example, if your data is UTF-8:

$test = "“";
$len = mb_strlen($test, "UTF-8");
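To see the two counts side by side, here is a minimal sketch, assuming the data is UTF-8 and the mbstring extension is loaded. The quote is written as escaped bytes so the result does not depend on how the script file itself is saved; the variable names are only illustrative.

```php
<?php
// Assumption: the mbstring extension is available.
// "\xE2\x80\x9C" is the UTF-8 byte sequence for U+201C (left curly double quote).
$test = "\xE2\x80\x9C";

$byteLen = strlen($test);             // counts bytes
$charLen = mb_strlen($test, "UTF-8"); // counts characters in the given encoding

echo $byteLen, "\n"; // 3
echo $charLen, "\n"; // 1
```

Passing the encoding explicitly avoids relying on whatever default internal encoding the server happens to be configured with.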


For Unicode, use the PHP functions whose names start with mb_ (multibyte). For example: http://php.net/manual/en/function.mb-strlen.php
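The same idea applies to the other string operations a parser typically needs. The following is a rough sketch, assuming UTF-8 input and the mbstring extension; the sample string and variable names are made up for illustration.

```php
<?php
// Assumption: input is UTF-8. Setting the internal encoding once means the
// mb_* calls below do not each need an explicit encoding argument.
mb_internal_encoding("UTF-8");

// Hypothetical sample: the word hello wrapped in curly quotes (U+201C, U+201D).
$text = "\xE2\x80\x9Chello\xE2\x80\x9D";

// Strip the first and last character by character offset, not byte offset.
$inner = mb_substr($text, 1, mb_strlen($text) - 2);

// Character position of "h" versus its byte position.
$charPos = mb_strpos($text, "h"); // 1
$bytePos = strpos($text, "h");    // 3, because the opening quote takes 3 bytes

echo $inner, "\n"; // hello
```

Mixing byte offsets from strpos() with character offsets from mb_substr() is a common source of off-by-a-few errors, so it is usually safer to stay within one family of functions.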


Use mb_strlen(), it will handle multibyte characters.


You need to use the multibyte versions of the string functions: http://php.net/manual/en/function.mb-strlen.php
