Issue parsing HTML file with PHP

Developer https://www.devze.com 2023-03-03 04:05 Source: web

I have some code that parses an HTML file, and I stumbled across a page containing this character, which broke the parsing: “

When I execute the following code, $len is assigned a value of 3.

$test = "“";
$len = strlen($test);

I suspect that this character might be Unicode.

For now I'm working around the problem by replacing the curly double quote with a standard double quote. However, I'm concerned about other files that might contain similar characters, and I don't want to write a separate replacement for each one.

How do I get PHP to treat this as a single character?

PHP's standard string handling functions are not multibyte-aware; strlen() simply counts the bytes in the string. In UTF-8 the curly quote “ is encoded as three bytes, which is why $len comes out as 3.
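To see where the 3 comes from, here is a minimal sketch that dumps the raw bytes of the character. It assumes PHP 7 or later (for the "\u{...}" escape syntax) and that the string is UTF-8 encoded; the variable names $byteCount and $hex are only illustrative.

```php
<?php
// U+201C (left curly double quote), written with the PHP 7+ Unicode escape
// so the example does not depend on how this file itself is saved.
$test = "\u{201C}";

// strlen() reports the number of bytes, not the number of characters.
$byteCount = strlen($test);

// bin2hex() shows the raw bytes of the UTF-8 encoding.
$hex = bin2hex($test);

echo $byteCount, " bytes: ", $hex, PHP_EOL; // 3 bytes: e2809c
```

The three bytes e2 80 9c are the UTF-8 encoding of a single code point, so the parser is not seeing three characters, only one character that takes three bytes.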

If you have the multibyte extension (mbstring) installed, mb_strlen() is what you are looking for.

For example, if your data is UTF-8:

$test = "“";
$len = mb_strlen($test, "UTF-8");
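As a slightly larger sketch of the same idea, the snippet below compares the byte-based and multibyte-aware functions on a short quoted string. It assumes the mbstring extension is loaded, PHP 7 or later, and UTF-8 input; the sample string and variable names are made up for illustration.

```php
<?php
// A word wrapped in curly quotes: U+201C, "Hello", U+201D.
$text = "\u{201C}Hello\u{201D}";

// Byte count: each curly quote is 3 bytes in UTF-8, plus 5 ASCII bytes.
$bytes = strlen($text);

// Character count: the encoding is passed explicitly instead of relying
// on the configured internal encoding.
$chars = mb_strlen($text, "UTF-8");

// mb_substr() takes character offsets, so skipping one character
// skips the whole opening quote rather than only its first byte.
$inner = mb_substr($text, 1, 5, "UTF-8");

echo $bytes, " ", $chars, " ", $inner, PHP_EOL; // 11 7 Hello
```

Passing "UTF-8" on every call is a little verbose, but it keeps the result independent of the server's configuration.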

For Unicode, use the PHP functions whose names start with mb_ (multibyte). For example: http://php.net/manual/en/function.mb-strlen.php

Use mb_strlen(); it handles multibyte characters.

You need to use the multibyte versions of the string functions: http://php.net/manual/en/function.mb-strlen.php
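The same applies to the other string functions a parser typically relies on, not only to the length. A small sketch comparing strpos() with mb_strpos(), assuming mbstring is available, PHP 7 or later, and UTF-8 input (the sample text is invented for the example):

```php
<?php
// "café" followed by a space and a word in curly quotes.
$html = "caf\u{E9} \u{201C}quoted\u{201D}";
$needle = "\u{201C}";

// strpos() returns a byte offset: "caf" is 3 bytes, the e-acute is 2 bytes,
// and the space is 1 byte, so the quote starts at byte 6.
$bytePos = strpos($html, $needle);

// mb_strpos() returns a character offset: the quote is the 6th character,
// i.e. index 5.
$charPos = mb_strpos($html, $needle, 0, "UTF-8");

echo $bytePos, " ", $charPos, PHP_EOL; // 6 5
```

Mixing the two families is where parsing usually goes wrong: an offset from strpos() should only be fed to byte-based functions such as substr(), and an offset from mb_strpos() only to mb_substr().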
