I have a WebForm search page that gets occasional hits from international visitors. When they enter text, it appears to be plain ASCII a-z, 0-9 but they are printed in bold and my "is this text" logic can't handle the input. Is there an easy way in ASP.NET to convert Unicode characters that equate to A-Z, 0-9 into plain old text?
You are getting the so-called "Fullwidth Forms" of the characters. In Unicode, these are encoded at code points U+FF01 to U+FF5E. To get the corresponding ASCII code point (U+0021 to U+007E), subtract the offset 0xFF01 - 0x0021 = 0xFEE0 from the character's code point. For example, U+FF21 (fullwidth A) minus 0xFEE0 gives U+0041 (A).
ASCII: http://unicode.org/charts/PDF/U0000.pdf
Fullwidth Forms: http://unicode.org/charts/PDF/UFF00.pdf
I don't speak ASP.NET, but in Java the code would look like this:
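// Maps fullwidth forms (U+FF01..U+FF5E) to their ASCII counterparts
// (U+0021..U+007E) and leaves every other character unchanged.
String decodeFullwidth(String s) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (0xFF01 <= c && c <= 0xFF5E) {
            // Shift down by the constant offset 0xFEE0.
            sb.append((char) (c - (0xFF01 - 0x0021)));
        } else {
            sb.append(c);
        }
    }
    return sb.toString();
}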
"it appears to be plain ASCII a-z, 0-9 but they are printed in bold"
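This could be the Unicode "mathematical bold" characters from the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF), for example bold A-Z at U+1D400 to U+1D419 and bold digits at U+1D7CE to U+1D7D7. These lie outside the Basic Multilingual Plane, so a per-char offset like the one used for fullwidth forms will not catch them.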
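If it is not clear which of the two families the visitors are sending, one option that covers both is Unicode compatibility normalization (NFKC), which replaces fullwidth forms and mathematical bold letters and digits with their plain equivalents. Below is a minimal Java sketch using the standard java.text.Normalizer; the class name CompatibilityFolder and the method name fold are made up for illustration. .NET offers the same operation as String.Normalize(NormalizationForm.FormKC), so the idea should carry over to ASP.NET.

```java
import java.text.Normalizer;

final class CompatibilityFolder {
    // NFKC replaces compatibility characters (fullwidth forms, mathematical
    // bold letters and digits, and others) with their plain equivalents.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Fullwidth A, B, C, 1, 2, 3
        String fullwidth = "\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13";
        // Mathematical bold A (U+1D400) and bold digit 1 (U+1D7CF),
        // written as surrogate pairs
        String mathBold = "\uD835\uDC00\uD835\uDFCF";
        System.out.println(fold(fullwidth)); // prints ABC123
        System.out.println(fold(mathBold));  // prints A1
    }
}
```

Note that NFKC is broader than the offset approach: it also folds things like ligatures, superscript digits and the ideographic space, so it is worth checking that this is acceptable for a search box before relying on it.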