
htmlentities 'Invalid Multibyte Sequence' error

Developer https://www.devze.com 2022-12-20 23:09 Source: Internet

While trying to run a string through PHP's htmlentities function, I have some cases where I get an 'Invalid Multibyte Sequence' error. Is there a way to clean the string prior to calling the function to prevent this error from occurring?


As of PHP 5.4 you should use something along the lines of the following to properly escape output:

$escapedString = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5, $stringEncoding);

ENT_SUBSTITUTE replaces invalid code unit sequences by � (instead of returning an empty string).

ENT_DISALLOWED replaces code points that are invalid in the specified doctype with �.

ENT_HTML5 specifies the used doctype. Depending on what you are using you may choose ENT_HTML401, ENT_XHTML or ENT_XML1.

Using those options you make sure that the result is always valid in the given doctype, no matter how malformed the input you get.

Also, don't forget to specify the $stringEncoding. Relying on the default is a bad idea as it depends on ini settings and may (and did) change between versions.
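
To illustrate the flags described in this answer, here is a minimal sketch. The helper name escape_html and the sample string are made up for the example, and the code assumes PHP 7 or later because of the type declarations. It compares a call without ENT_SUBSTITUTE to one with it on a string containing a byte that is invalid in UTF-8.

```php
<?php
// Hypothetical helper name; the flags are the ones listed in the answer above.
function escape_html(string $string, string $encoding = 'UTF-8'): string
{
    return htmlspecialchars(
        $string,
        ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_HTML5,
        $encoding
    );
}

// "\xE9" is "é" in ISO-8859-1, but on its own it is an invalid sequence in UTF-8.
$broken = "caf\xE9 <b>";

// Without ENT_SUBSTITUTE (or ENT_IGNORE) the whole result is an empty string.
$plain = htmlspecialchars($broken, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE only the offending byte is replaced by U+FFFD.
$safe = escape_html($broken);

var_dump($plain); // empty string
var_dump($safe);  // "caf", then U+FFFD, then " &lt;b&gt;"
```

The rest of the string survives and is still escaped, which is usually preferable to silently losing the whole value.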


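I've encountered scenarios where it's not enough to specify UTF-8 and found the ENT_IGNORE option useful. I don't think it's documented for htmlentities, only for htmlspecialchars, but it does work in stifling the error. Be aware that ENT_IGNORE silently drops the invalid sequences, and the PHP manual discourages it because this may have security implications.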


Before PHP 5.4.0, the default charset for htmlentities() is ISO-8859-1. (Manual)

You are probably applying it to a UTF-8 string. Specify the character set using

htmlentities($string, (whatever), "UTF-8");

Since PHP 5.4.0, the default charset is UTF-8.
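
If the input encoding is not known in advance, one way to "clean" the string before calling htmlentities is to validate it and convert it when needed. The sketch below is only an illustration: the helper name to_utf8 is made up, it requires the mbstring extension, and it assumes that anything which is not valid UTF-8 was encoded as ISO-8859-1, which may not hold for your data.

```php
<?php
// Hypothetical helper: returns valid UTF-8, ASSUMING that any input which is
// not already valid UTF-8 was encoded as $fallback (ISO-8859-1 by default).
function to_utf8(string $string, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string; // already valid, leave untouched
    }
    return mb_convert_encoding($string, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";         // "café" encoded as ISO-8859-1
$clean  = to_utf8($latin1);  // the same text as valid UTF-8

// The charset is passed explicitly instead of relying on the default.
$html = htmlentities($clean, ENT_QUOTES, 'UTF-8');

var_dump($html); // "caf&eacute;"
```

Guessing the source encoding is inherently lossy, so where possible it is better to find out what encoding the data really uses and pass that to htmlentities directly.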


In general the php ini setting display_errors can be used to control whether errors are output to the browser, the ini setting log_errors can be independently used to control whether errors are written to logfile, and if a custom error handler has been set with set_error_handler() then this is always called for all errors and can then read the values of display_errors and log_errors along with the value of error_reporting() and take the appropriate course of action, right?

Wrong! In this case, htmlspecialchars() and htmlentities() only trigger the error if the value of display_errors is false. If the value of display_errors is true then no error is triggered at all! This seemingly nonsensical behaviour makes it impossible to detect these errors during debugging with display_errors on.

I got the information from here


Do you use substr somewhere on the string you want to check? If so, I suggest using mb_substr instead. The problem is that substr is not Unicode-aware, so it just chops off bytes and can cut a multibyte character in half.
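
A short sketch of how truncation can produce the invalid sequence. The sample string is made up for the example, and the mbstring extension is assumed to be available.

```php
<?php
// "cafés" in UTF-8: the "é" takes two bytes (0xC3 0xA9).
$text = "caf\xC3\xA9s";

// substr counts bytes, so the cut lands in the middle of "é".
$byteCut = substr($text, 0, 4);            // "caf\xC3", invalid UTF-8

// mb_substr counts characters and keeps "é" intact.
$charCut = mb_substr($text, 0, 4, 'UTF-8'); // "caf\xC3\xA9"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// The broken string makes htmlentities return an empty string.
var_dump(htmlentities($byteCut, ENT_QUOTES, 'UTF-8'));
var_dump(htmlentities($charCut, ENT_QUOTES, 'UTF-8')); // "caf&eacute;"
```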


htmlentities($variable, ENT_QUOTES); always works just fine for me.


Note that working with UTF-8 means using the multibyte string functions, for example replacing substr with mb_substr. Alternatively, PHP provides an ini setting (mbstring.func_overload) that overloads those functions with their mb equivalents, but it was deprecated in PHP 7.2 and removed in PHP 8.0.

See here for more detail: http://www.php.net/manual/en/mbstring.overload.php
