I cant believe what im seeing here! I have a normal, basic html form (havent changed the enctype), if someone puts a strange japanese character in the field and posts the form then in my database it is saving an HTML encoded version of the character. I am not processing the string at all except with a Trim(). Using classic ASP (not out开发者_StackOverflow of choice i might add!). I have a feeling this might have something to do with utf-8/encoding but ive tried messing around with the meta tag and content type and been unable to get the character to come through properly. To make things harder i dont seem to be able to get classic ASP debugging in VS express 2010. Any comments appreciated :)
As you can see in this demo and read in the standard (4.10.22.6.4.2), characters that are not supported by the selected encoding (such as Japanese ones in an ISO8859-* or cp1252 encoding) are encoded as HTML entities.
If you are fine with incorrectly handling user input that contains html entities in the clear, you can replace all numeric HTML entities in the user input with the corresponding Unicode character (however, doing so in ASP is hard since there is no inverse function to Server.HTMLEncode
and Unicode support is pretty much nonexistent in the first place.
As an alternative, use UTF-8 (and/or a web development platform from this millennium) and all these problems go away. However, since that may not be an option, you may want the to unescape the HTML entities in different programs, for example with HttpUtility.HtmlDecode
in C#, html_entity_decode
in PHP, or HTMLParser.unescape
in Python.
精彩评论