
How to approximate Java's Character.isLetterOrDigit() to identify non-English letters, digits in Javascript?

Source: 开发者 (https://www.devze.com), 2023-01-14 15:16, origin: the web

In JavaScript, is there a way (one that survives internationalization) to determine whether a character is a letter or digit? It should correctly identify Ä and ç as letters, and also recognize non-English digits (which I am not going to look up as examples)!

In Java, the Character class has some static methods .isLetter(), .isDigit(), .isLetterOrDigit(), for determining in an internationally suitable way that a char is actually a letter or digit. This is better than code like

// this is not right, but common and easy
if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) { /* it's a letter */ }

because the Java methods also pick up non-English letters. I think C# has similar capabilities...
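One way this could be approximated in engines that support ES2018 Unicode property escapes is sketched below. The helper names `isLetter`, `isDigit` and `isLetterOrDigit` are made up for illustration, and the mapping is an assumption: "letter" is taken to mean Unicode general category L and "digit" to mean category Nd. Older browsers do not support `\p{...}` at all.

```javascript
// Sketch only: approximates Java's Character.isLetter / isDigit /
// isLetterOrDigit with Unicode property escapes (ES2018+, needs the "u" flag).
// Assumption: "letter" = general category L, "digit" = general category Nd.
function isLetter(ch) {
    return /^\p{L}$/u.test(ch);
}

function isDigit(ch) {
    return /^\p{Nd}$/u.test(ch);
}

function isLetterOrDigit(ch) {
    return /^[\p{L}\p{Nd}]$/u.test(ch);
}

// "\u00C4" is Ä, "\u00E7" is ç, "\u0663" is ARABIC-INDIC DIGIT THREE.
console.log(isLetter("\u00C4"), isLetter("\u00E7"), isLetter("7")); // true true false
console.log(isDigit("\u0663"), isDigit("x"));                       // true false
console.log(isLetterOrDigit("-"));                                  // false
```

Each helper expects exactly one code point, so a decomposed "A" followed by a combining diaeresis would not count as a single letter here.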

Of course, at worst I can send strings back to the server to be checked but that's a pain...

In the end, I am looking to check whether input is a valid name (starts with a letter, the rest letters or digits). An outside-the-box possibility for low-volume use might be:

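// Relies on the DOM rejecting invalid attribute names:
// setAttribute throws if atr is not a valid name.
var validName = function (atr) {
    var ele = document.createElement("div");
    try {
        ele.setAttribute(atr, "xxx");
    } catch (e) {
        return false;
    }
    return true;
};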

This tests out fairly well in IE, FF and Chrome, though thorough testing would be needed to see how consistent the results are across browsers. Again, it is not appropriate for heavy-duty usage because every call creates an element.
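For comparison, the stated rule (starts with a letter, the rest letters or digits) could also be written as a single regex, without touching the DOM. This is a minimal sketch assuming an ES2018+ engine. `isSimpleName` and `NAME_PATTERN` are hypothetical names, and treating "letter" as category L and "digit" as category Nd is an assumption, not something the DOM trick above guarantees to agree with.

```javascript
// Hypothetical helper: a name must start with a letter and continue with
// letters or decimal digits. Requires ES2018+ Unicode property escapes.
var NAME_PATTERN = /^\p{L}[\p{L}\p{Nd}]*$/u;

function isSimpleName(name) {
    return typeof name === "string" && NAME_PATTERN.test(name);
}

console.log(isSimpleName("\u00C4rger2")); // true  ("Ärger2")
console.log(isSimpleName("2fast"));       // false (starts with a digit)
console.log(isSimpleName("foo-bar"));     // false (hyphen not allowed)
console.log(isSimpleName(""));            // false (empty)
```

The pattern has no `g` flag, so the shared regex object carries no state between calls.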


I have created a small JavaScript utility to provide this functionality. I don't claim it is perfect, so let me know how you fare. If people like it, I'll make this the official answer to this question.

CharFunk: https://github.com/joelarson4/CharFunk

  • CharFunk.getDirectionality(ch) - Used to find the directionality of the character
  • CharFunk.isAllLettersOrDigits(string) - Returns true if the string argument is composed of all letters and digits
  • CharFunk.isDigit(ch) - Returns true if provided a length 1 string that is a digit
  • CharFunk.isLetter(ch) - Returns true if provided a length 1 string that is a letter
  • CharFunk.isLetterNumber(ch) - Returns true if provided a length 1 string that is in the Unicode "Nl" category
  • CharFunk.isLetterOrDigit(ch) - Returns true if provided a length 1 string that is a letter or a digit
  • CharFunk.isLowerCase(ch) - Returns true if provided a length 1 string that is lowercase
  • CharFunk.isMirrored(ch) - Returns true if provided a length 1 string that is a mirrored character
  • CharFunk.isUpperCase(ch) - Returns true if provided a length 1 string that is uppercase
  • CharFunk.isValidFirstForName(ch) - Returns true if provided a length 1 string that is a valid leading character for a JavaScript identifier
  • CharFunk.isValidMidForName(ch) - Returns true if provided a length 1 string that is a valid non-leading character for an ECMAScript identifier
  • CharFunk.isValidName(string,checkReserved) - Returns true if the string is a valid ECMAScript identifier
  • CharFunk.isWhitespace(ch) - Returns true if provided a length 1 string that is a whitespace character
  • CharFunk.indexOf(string,callback) - Returns the first index at which the callback returns true
  • CharFunk.lastIndexOf(string,callback) - Returns the last index at which the callback returns true
  • CharFunk.matchesAll(string,callback) - Returns true if all characters in the provided string result in a true return from the callback
  • CharFunk.replaceMatches(string,callback,ch) - Returns a new string with all matched characters replaced
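To illustrate what callback-driven helpers of this kind do, here is a standalone sketch. It is not CharFunk's code and does not use the library. The names `indexOfMatch`, `allMatch` and `isAsciiDigit` are hypothetical, and it is an assumption that the callback receives one length-1 string at a time and that "not found" is reported as -1.

```javascript
// Standalone sketch, NOT CharFunk's implementation.
// Assumption: the callback receives one length-1 string and returns a boolean.
function indexOfMatch(string, callback) {
    for (var i = 0; i < string.length; i++) {
        if (callback(string.charAt(i))) { return i; }
    }
    return -1; // assumed "not found" value, mirroring String.prototype.indexOf
}

function allMatch(string, callback) {
    for (var i = 0; i < string.length; i++) {
        if (!callback(string.charAt(i))) { return false; }
    }
    return true;
}

// Hypothetical predicate for the demo: plain ASCII digit check.
function isAsciiDigit(ch) { return ch >= "0" && ch <= "9"; }

console.log(indexOfMatch("abc123", isAsciiDigit)); // 3
console.log(allMatch("2023", isAsciiDigit));       // true
console.log(allMatch("20a3", isAsciiDigit));       // false
```

Any single-character predicate can be passed in place of `isAsciiDigit`, which is what makes this style of helper convenient.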


As far as I could tell when faced with a similar problem, the only way was really to pick a couple of blocks and assume those are letters. The Unicode standard has the full lists, so you could build a complete regex for this (I think). For instance, if you take all characters that are "alphabetic" according to this list, you probably have all alphabetic characters. Likewise for numeric (decimal, digit, numeric) in the main Unicode data file.

I'm not entirely sure if I'm pointing in the correct direction. There's a bunch of Unicode code charts that might help, and there's of course the Unicode standard itself. It's all a bit much to read and understand though, especially if your only goal is to do some JavaScript string verification.
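The "pick a couple of blocks" idea might look roughly like the sketch below. The ranges are a small hand-picked sample (ASCII, the letters of Latin-1 Supplement, Latin Extended-A, plus Arabic-Indic and Devanagari digits), nowhere near the full Unicode lists, and the names `isLetterApprox` and `isLetterOrDigitApprox` are made up for illustration. It needs no newer regex features, so it should also run in older browsers.

```javascript
// Hand-picked ranges only; far from complete.
var LETTER_RANGES =
    "A-Za-z" +
    "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u00FF" + // Latin-1 letters (skips the multiplication and division signs)
    "\\u0100-\\u017F";                                 // Latin Extended-A
var DIGIT_RANGES =
    "0-9" +
    "\\u0660-\\u0669" + // Arabic-Indic digits
    "\\u0966-\\u096F";  // Devanagari digits

var LETTER_RE = new RegExp("^[" + LETTER_RANGES + "]$");
var LETTER_OR_DIGIT_RE = new RegExp("^[" + LETTER_RANGES + DIGIT_RANGES + "]$");

function isLetterApprox(ch) { return LETTER_RE.test(ch); }
function isLetterOrDigitApprox(ch) { return LETTER_OR_DIGIT_RE.test(ch); }

console.log(isLetterApprox("\u00C4"));        // true  (Ä)
console.log(isLetterApprox("\u00D7"));        // false (multiplication sign)
console.log(isLetterOrDigitApprox("\u0663")); // true  (Arabic-Indic digit three)
console.log(isLetterApprox("\u0416"));        // false (Cyrillic letter, block not listed)
```

The last line shows the cost of this approach: any script whose block was not added to the list is silently rejected.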
