How do you convert posted "english" characters from international PC's in ASP.NET? (ex 2205)

Developer https://www.devze.com 2023-01-05 14:31 Source: web
I have a WebForm search page that gets occasional hits from international visitors. When they enter text, it appears to be plain ASCII a-z, 0-9, but the characters are rendered in bold and my "is this text" logic can't handle the input. Is there an easy way in ASP.NET to convert Unicode characters that correspond to A-Z, 0-9 into plain ASCII text?


You are getting the so-called "Fullwidth Forms" of the characters. In Unicode, these are encoded at code points U+FF01 to U+FF5E. To get the corresponding ASCII code point (U+0021 to U+007E), take the character's code point and subtract 0xFF01 - 0x0021 = 0xFEE0 from it.

ASCII: http://unicode.org/charts/PDF/U0000.pdf
Fullwidth Forms: http://unicode.org/charts/PDF/UFF00.pdf

I don't speak ASP.NET, but in Java the code would look like this:

String decodeFullwidth(String s) {
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (0xFF01 <= c && c <= 0xFF5E) {
      sb.append((char) (c - (0xFF01 - 0x0021)));
    } else {
      sb.append(c);
    }
  }
  return sb.toString();
}
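
As a usage sketch of the same subtraction, the variant below wraps the method in a class and also maps the ideographic space U+3000 to an ASCII space. U+3000 is the fullwidth counterpart of the space character but lies outside the U+FF01 to U+FF5E range, so the loop above leaves it untouched. The class name FullwidthDemo and the U+3000 handling are additions for illustration, not part of the original answer.

```java
public class FullwidthDemo {
    // Distance between the Fullwidth Forms range and printable ASCII: 0xFEE0.
    private static final int OFFSET = 0xFF01 - 0x0021;

    public static String decodeFullwidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xFF01 && c <= 0xFF5E) {
                // Fullwidth '!' .. '~' maps onto ASCII '!' .. '~'.
                sb.append((char) (c - OFFSET));
            } else if (c == 0x3000) {
                // Assumption: treat the ideographic space as a plain space.
                sb.append(' ');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Fullwidth digits 2, 2, 0, 5 written as escapes to avoid encoding issues.
        System.out.println(decodeFullwidth("\uFF12\uFF12\uFF10\uFF15"));
    }
}
```

The same arithmetic should carry over to C# nearly unchanged, since char is a 16-bit UTF-16 code unit there as well.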


"it appears to be plain ASCII a-z, 0-9 but they are printed in bold"

This could be the Unicode "mathematical bold" characters, which live in the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF).
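
If the input may contain either kind of character, one option is Unicode compatibility normalization (NFKC), which folds both Fullwidth Forms and mathematical bold letters and digits to their plain equivalents. The sketch below uses Java's java.text.Normalizer; the class name CompatNormalizer and the method name toPlain are made up for illustration. NFKC also rewrites other compatibility characters (ligatures, superscripts and so on), so whether that is acceptable for a search box is an assumption to check.

```java
import java.text.Normalizer;

public class CompatNormalizer {
    // Applies NFKC, which replaces compatibility characters with their
    // canonical equivalents (e.g. fullwidth or mathematical bold 'A' -> 'A').
    public static String toPlain(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+1D400 is MATHEMATICAL BOLD CAPITAL A; it lies outside the BMP,
        // so it is built from a code point rather than a single char.
        String boldA = new String(Character.toChars(0x1D400));
        // U+FF12 is FULLWIDTH DIGIT TWO.
        String fullwidthTwo = "\uFF12";
        System.out.println(toPlain(boldA + fullwidthTwo));
    }
}
```

Mathematical bold characters are supplementary code points, so a per-char subtraction loop would see them as surrogate pairs; normalization avoids that bookkeeping. In .NET the corresponding call is, as far as I know, `string.Normalize(NormalizationForm.FormKC)`.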
