I have some code that parses an HTML file, and I stumbled across a page containing this character, which broke the parsing: “
When I execute the following code, $len is assigned a value of 3.
$test = "“";
$len = strlen($test);
I suspect that this character might be Unicode.
For now I'm getting around this problem by replacing the curly double quote with a standard double quote. However, I'm concerned about other files that might contain similar characters, and I don't want to write a replace call for each separate case.
How do I get PHP to treat this as a single character?
PHP's standard string handling functions are not multibyte-aware; they simply count the number of bytes in the string. In UTF-8, “ is encoded as three bytes (E2 80 9C), which is why strlen() returns 3.
If you have the multibyte extension (mbstring) installed, mb_strlen() is what you are looking for.
For example, if your data is UTF-8:
$test = "“";
$len = mb_strlen($test, "UTF-8");
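To make the byte count versus character count difference visible side by side, here is a minimal sketch. It assumes PHP 7 or later (for the "\u{...}" escape, which always produces UTF-8) and that the mbstring extension is loaded; the variable names $bytes and $chars are only for illustration.

```php
<?php
// Assumes PHP 7+ and the mbstring extension.
$test = "\u{201C}"; // left curly double quote, emitted as UTF-8

$bytes = strlen($test);             // counts bytes: 3 (E2 80 9C)
$chars = mb_strlen($test, "UTF-8"); // counts characters: 1

echo $bytes, " bytes, ", $chars, " character\n"; // prints "3 bytes, 1 character"
```

If the data is not UTF-8, the second argument to mb_strlen() would need to name the actual encoding instead.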
For Unicode, use the PHP functions whose names start with mb_ (multibyte). For example: http://php.net/manual/en/function.mb-strlen.php
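Since the original problem was parsing rather than just measuring length, the same idea applies to searching and slicing. The following is only a sketch, assuming UTF-8 input and the mbstring extension; the sample string and the names $open, $close and $inner are made up for illustration.

```php
<?php
// Assumes PHP 7+, UTF-8 data, and the mbstring extension.
mb_internal_encoding("UTF-8"); // default encoding for the mb_* calls below

$html = "He said \u{201C}hi\u{201D}"; // hypothetical sample text

// mb_strpos() returns character offsets; strpos() would return byte offsets.
$open  = mb_strpos($html, "\u{201C}"); // 8
$close = mb_strpos($html, "\u{201D}"); // 11 (strpos() gives 13 here)

// Extract the text between the curly quotes by character position.
$inner = mb_substr($html, $open + 1, $close - $open - 1);

echo $inner, "\n"; // prints "hi"
```

Mixing byte offsets from strpos() with mb_substr() (or the reverse) is the usual way this goes wrong, so it is safest to stay within one family of functions for a given string.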
Use mb_strlen(); it will handle multibyte characters.
You need to use the multibyte versions of the functions: http://php.net/manual/en/function.mb-strlen.php