While trying to run a string through PHP's htmlentities function, I have some cases where I get an 'Invalid Multibyte Sequence' error. Is there a way to clean the string prior to calling the function to prevent this error from occurring?
As of PHP 5.4 you should use something along the following to properly escape output:
$escapedString = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5, $stringEncoding);
ENT_SUBSTITUTE replaces invalid code unit sequences with � (instead of returning an empty string).
ENT_DISALLOWED replaces code points that are invalid in the specified doctype with �.
ENT_HTML5 specifies the doctype in use. Depending on what you are using, you may choose ENT_HTML401, ENT_XHTML or ENT_XML1 instead.
With those options, the result is always valid in the given doctype, no matter how malformed the input is.
Also, don't forget to specify the $stringEncoding. Relying on the default is a bad idea as it depends on ini settings and may (and did) change between versions.
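As a rough illustration of the difference ENT_SUBSTITUTE makes, here is a minimal sketch. The input string is a made-up example (a Latin-1 byte inside text that is declared as UTF-8), and PHP 5.4 or later is assumed because of ENT_HTML5.

```php
<?php
// Hypothetical input: "café <b>" saved as ISO-8859-1, so the byte 0xE9 is not valid UTF-8.
$string = "caf\xE9 <b>";
$stringEncoding = 'UTF-8';

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input makes the call return an empty string.
$strict = htmlspecialchars($string, ENT_QUOTES, $stringEncoding);

// With ENT_SUBSTITUTE the invalid byte becomes U+FFFD and the rest is escaped as usual.
$escapedString = htmlspecialchars(
    $string,
    ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5,
    $stringEncoding
);

var_dump($strict === '');                                    // was the strict result empty?
var_dump(strpos($escapedString, "\xEF\xBF\xBD") !== false);  // does the output contain U+FFFD?
var_dump(strpos($escapedString, '&lt;b&gt;') !== false);     // was the markup still escaped?
```

The point of the comparison is that the strict call silently loses the whole string, while the substituting call keeps everything that was valid.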
I've encountered scenarios where it's not enough to specify UTF-8 and found the ENT_IGNORE option useful. I don't think it's documented for htmlentities, only for htmlspecialchars but it does work in stifling the error.
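To make the trade-off concrete, the sketch below compares ENT_IGNORE with ENT_SUBSTITUTE on a made-up input containing one stray byte. Note that the PHP manual discourages ENT_IGNORE because silently dropping bytes may have security implications, so this is only meant to show what each flag does.

```php
<?php
// Hypothetical input: the byte 0xE9 is a stray ISO-8859-1 "é" inside a UTF-8 string.
$input = "caf\xE9s";

// ENT_IGNORE silently discards the invalid byte.
$ignored = htmlspecialchars($input, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE keeps a visible marker (U+FFFD) where the invalid byte was.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($ignored);
var_dump($substituted);
```

With ENT_IGNORE the damage is invisible in the output, which is convenient but also hides the fact that the input was broken.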
For PHP versions before 5.4.0, the default charset for htmlentities() is ISO-8859-1. (Manual)
You are probably applying it to a UTF-8 string. Specify the character set using
htmlentities($string, (whatever), "UTF-8");
Since PHP 5.4.0, the default charset is UTF-8.
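A small sketch of why the charset argument matters, using a made-up UTF-8 string. When the declared charset does not match the actual bytes, the result is either mangled (as here) or, in the opposite direction, rejected as an invalid multibyte sequence.

```php
<?php
// "café" encoded as UTF-8: the "é" is the two bytes 0xC3 0xA9.
$string = "caf\xC3\xA9";

// Declared as ISO-8859-1, each byte is treated as its own character.
$wrong = htmlentities($string, ENT_QUOTES, 'ISO-8859-1');

// Declared as UTF-8, the two bytes are decoded as a single "é".
$right = htmlentities($string, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // caf&Atilde;&copy;
echo $right, "\n"; // caf&eacute;
```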
In general, the PHP ini setting display_errors controls whether errors are output to the browser, and the ini setting log_errors independently controls whether errors are written to the log file. If a custom error handler has been set with set_error_handler(), it is always called for all errors and can read the values of display_errors, log_errors and error_reporting() to take the appropriate course of action, right?
Wrong! In this case, htmlspecialchars() and htmlentities() only trigger the error if the value of display_errors is false. If the value of display_errors is true then no error is triggered at all! This seemingly nonsensical behaviour makes it impossible to detect these errors during debugging with display_errors on.
I got the information from here
Do you use substr somewhere on the string you want to check? If so, I suggest using mb_substr instead. The problem is that substr is not Unicode-aware, so it just chops off bytes and can cut a multibyte character in half.
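A minimal sketch of how a byte-based cut can produce the invalid sequence, assuming the mbstring extension is loaded. The sample string is made up for illustration.

```php
<?php
// "naïve" in UTF-8: the "ï" takes two bytes (0xC3 0xAF).
$text = "na\xC3\xAFve";

// substr counts bytes, so this keeps "na" plus only the first byte of "ï".
$byteCut = substr($text, 0, 3);

// mb_substr counts characters, so this keeps "naï" intact.
$charCut = mb_substr($text, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // is the byte-based cut valid UTF-8?
var_dump(mb_check_encoding($charCut, 'UTF-8')); // is the character-based cut valid UTF-8?

// Without ENT_SUBSTITUTE or ENT_IGNORE, the truncated string is rejected entirely.
var_dump(htmlentities($byteCut, ENT_QUOTES, 'UTF-8'));
```

mb_check_encoding can also serve as a quick check before calling htmlentities if you want to find out which inputs are broken.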
htmlentities($variable, ENT_QUOTES); always works just fine for me.
Note that working with UTF-8 means using the multibyte string functions. This could mean replacing functions like substr with mb_substr, except that PHP provides an ini setting (mbstring.func_overload) to overload those functions with their mb equivalents. That setting was deprecated in PHP 7.2 and removed in PHP 8.0.
See here for more detail: http://www.php.net/manual/en/mbstring.overload.php