Encoding issue with form and HTML Purifier / MySQL

Developer (devze.com) https://www.devze.com 2022-12-28 06:02 Source: web

Driving me nuts...

Page with form is encoded as Unicode (UTF-8) via:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

The entry column in the database is TEXT with collation utf8_unicode_ci.

Copying text from a Word document with quotes in it, like this: “1922.”, is an insta-fail; it ends up in the database as â��1922.â��. (Typing new data into the form, including ", works fine. It's cutting and pasting from Word that breaks...)
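A plausible reading of that garbage (an assumption, since the question doesn't show where the bytes get misread) is that the three UTF-8 bytes of each smart quote were treated as three separate Latin-1 characters: 0xE2 becomes â and the other two bytes become invisible control characters, which may show up as �. This minimal PHP sketch reproduces the effect; `latin1_bytes_as_utf8()` is a hypothetical helper written only for illustration:

```php
<?php
// Hypothetical helper: treat every byte of $bytes as an ISO-8859-1 character
// and re-encode it as UTF-8. This is the classic "double encoding" mistake.
function latin1_bytes_as_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                      // ASCII maps to itself
        } else {
            $out .= chr(0xC0 | ($b >> 6))         // two-byte UTF-8 sequence
                  . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$original = "\u{201C}1922.\u{201D}";   // “1922.” as pasted from Word
echo bin2hex($original), "\n";         // e2809c313932322ee2809d
$garbled = latin1_bytes_as_utf8($original);
echo $garbled, "\n";                   // â + two control chars, 1922., â + two control chars
```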

The PHP steps behind the scenes are:

  • grab value from POST
  • run through HTML Purifier default settings
  • run through mysql_real_escape_string
  • run the insert query against the database

Help?


“1922.” and "1922." are two different strings.
The quotes from Word are not ASCII double quotes: “ != "
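A quick PHP check of the difference (a sketch; the variable names are only for illustration):

```php
<?php
// ASCII double quotes vs. the "smart" quotes Word inserts.
$ascii = '"1922."';
$smart = "\u{201C}1922.\u{201D}";

var_dump($ascii === $smart);     // bool(false)
echo strlen($ascii), "\n";       // 7: every character is a single ASCII byte
echo strlen($smart), "\n";       // 11: each smart quote takes 3 bytes in UTF-8
echo bin2hex("\u{201C}"), "\n";  // e2809c
echo bin2hex('"'), "\n";         // 22
```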

The column that you describe is TEXT utf8_unicode_ci. utf8_unicode_ci is the collation; make sure the character set on that column is also set to utf8.

Then I would make sure that you set up the correct encoding for each connection using SET NAMES utf8 COLLATE utf8_unicode_ci.
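Both suggestions could look roughly like this in PHP. This is a sketch under assumptions: the table `entries` and column `entry` are hypothetical names, and mysqli is used because the `mysql_*` functions from the question were removed in PHP 7:

```php
<?php
// Hypothetical schema: table `entries`, column `entry` (TEXT).
// One-off fix: set the column's character set, not just its collation, to utf8.
const FIX_COLUMN_SQL =
    'ALTER TABLE entries MODIFY entry TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci';

// Per-connection fix: tell MySQL this connection sends and expects utf8.
function use_utf8_names(mysqli $db): void
{
    if (!$db->query('SET NAMES utf8 COLLATE utf8_unicode_ci')) {
        throw new RuntimeException('SET NAMES failed: ' . $db->error);
    }
}

// Usage (needs a running MySQL server; credentials are placeholders):
// $db = new mysqli('localhost', 'user', 'password', 'mydb');
// $db->query(FIX_COLUMN_SQL);   // run once
// use_utf8_names($db);          // run on every new connection
```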

If you've done that and it's still not saved properly, make sure your PHP has the mbstring extension enabled and try working with the mb_ functions.
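A short sketch of what the mb_ functions change, assuming the mbstring extension is loaded. Validating incoming text with `mb_check_encoding()` is an extra step added here for illustration, not something the answer itself prescribes:

```php
<?php
// Requires the mbstring extension.
$entry = "\u{201C}1922.\u{201D}";

echo strlen($entry), "\n";                     // 11: counts bytes
echo mb_strlen($entry, 'UTF-8'), "\n";         // 7: counts characters
echo mb_substr($entry, 0, 1, 'UTF-8'), "\n";   // “ (byte-based substr() would split it)

// Check that submitted text really is valid UTF-8 before storing it.
var_dump(mb_check_encoding($entry, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding("\xE2\x80", 'UTF-8'));  // bool(false): truncated sequence
```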

There are many possible root causes, but I think setting the charset on the column and using SET NAMES ... should solve it.


Call mysql_set_charset to let the database know you are going to be sending it UTF-8 encoded strings.
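With the old extension that is `mysql_set_charset('utf8', $link)`. A sketch using its mysqli counterpart, assuming a PHP version where `mysql_*` is no longer available; `connect_utf8()` is a hypothetical helper and the credentials are placeholders:

```php
<?php
// Open a connection whose character set is utf8 from the start.
// set_charset() also sets the charset real_escape_string() uses,
// which a plain SET NAMES query does not.
function connect_utf8(string $host, string $user, string $pass, string $dbname): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('set_charset failed: ' . $db->error);
    }
    return $db;
}

// Usage:
// $db = connect_utf8('localhost', 'user', 'password', 'mydb');
// $safe = $db->real_escape_string($_POST['entry']);
```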

typing new data into the form, including " works fine...

Well, " is a normal ASCII quote. “ and ” aren't; they're smart quotes, which are non-ASCII characters. Whether they come from Word is unimportant; all your non-ASCII characters will be treated the same way.

  • grab value from POST
  • run through HTML Purifier default settings

That's a bad idea. HTML Purifier should be run over strings that are HTML and that you intend to output as HTML, for the relatively rare case where you need to let users submit HTML.

It is totally the wrong thing to run over all input text. Normally you should be allowing any old text, and then when you output that text inside HTML you should be calling htmlspecialchars() over it.

Otherwise you're breaking users' ability to enter < and & like I am in this post, and you still risk cross-site scripting when you output processed or non-input-sourced data.
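A minimal sketch of the store-raw, escape-on-output approach. The helper name `e()` is hypothetical, and passing `ENT_QUOTES` plus an explicit 'UTF-8' charset are choices made here rather than something the answer specifies:

```php
<?php
// Store the raw text; escape it only when it is written into HTML.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$comment = "Use < and & freely: \u{201C}1922.\u{201D}";
echo '<p>', e($comment), "</p>\n";
// <p>Use &lt; and &amp; freely: “1922.”</p>
```

The smart quotes pass through untouched, because htmlspecialchars() only rewrites &, <, > and the two ASCII quote characters.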
