JavaScript regular expression to catch kanji

开发者 https://www.devze.com 2023-04-03 05:56 Source: Internet

I can't get this JavaScript function to work the way I want...

// matches a String that contains kanji and/or kana character(s)

String.prototype.isKanjiKana = function(){
    return !!this.match(/^[\u4E00-\u9FAF|\u3040-\u3096|\u30A1-\u30FA|\uFF66-\uFF9D|\u31F0-\u31FF]+$/);
}

It returns true if the string consists entirely of kanji and/or kana characters, and false if alphabetic or other characters are present.

I would like it to return true if at least one kanji or kana character is present, rather than only when every character is one.

Thank you in advance for any help!

The right answer is not to hardcode ranges. Never ever put magic numbers in your code! That is a maintenance nightmare. It is hard to read, hard to write, hard to debug, hard to maintain. How do you know you got the numbers right? What happens when they add new ones? No, do not use magic numbers. Please.

The right answer is to use named Unicode scripts, which are a fundamental aspect of every Unicode code point:

[\p{Han}\p{Hiragana}\p{Katakana}]

That requires the XRegExp plugin for JavaScript.

The real problem is that JavaScript regexes on their own are too primitive to support Unicode properties, and therefore to support Unicode. Maybe that was an acceptable compromise 15 years ago, but today it is nothing less than intolerably negligent, as you yourself have discovered.

You will also miss a few Common code points that the newer Script Extensions property assigns to kana, but that probably does not matter. You could just add \p{Common} to the set above, though be aware that it also matches digits, spaces, and most punctuation.

Now that Unicode property escapes are part of the ES2018 spec, the following regex can be used natively if the JS engine supports this feature (expanding on @tchrist's answer):

/[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
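A small sketch of why Script_Extensions is used here rather than the plain script property: some characters common in Japanese text, such as the prolonged sound mark ー (U+30FC), have Script=Common but list Hiragana and Katakana in their script extensions. The helper name hasJapaneseScript is made up for illustration, and the code assumes an engine with ES2018 property escapes.

```javascript
// Assumes ES2018 Unicode property escapes; the `u` flag is required for \p{...}.
const byScript = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;
const byScriptExtensions =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

// Hypothetical helper: true if at least one matching character is present.
function hasJapaneseScript(str, pattern = byScriptExtensions) {
  return pattern.test(str);
}

console.log(hasJapaneseScript("abc漢字")); // true
console.log(hasJapaneseScript("abc"));     // false

// U+30FC is Script=Common, so the plain script classes miss it,
// while the Script_Extensions classes pick it up.
console.log(hasJapaneseScript("ー", byScript)); // false
console.log(hasJapaneseScript("ー"));           // true
```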

If you want to exclude punctuation from being matched:

/(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u
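The effect of the lookahead shows up on CJK punctuation such as 、 (U+3001), which carries Han, Hiragana and Katakana in its script extensions and is therefore matched by the first pattern. A minimal sketch, with illustrative variable names:

```javascript
// Both patterns are taken from the answer; only the variable names are new.
const withPunctuation =
  /[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;
const withoutPunctuation =
  /(?!\p{Punctuation})[\p{Script_Extensions=Han}\p{Script_Extensions=Hiragana}\p{Script_Extensions=Katakana}]/u;

console.log(withPunctuation.test("、"));       // true: the ideographic comma matches
console.log(withoutPunctuation.test("、"));    // false: rejected by the negative lookahead
console.log(withoutPunctuation.test("a、か")); // true: か is kana and not punctuation
```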

/[\u3000-\u303f]|[\u3040-\u309f]|[\u30a0-\u30ff]|[\uff00-\uffef]|[\u4e00-\u9faf]|[\u3400-\u4dbf]/

Japanese style punctuation: [\u3000-\u303f]
Hiragana: [\u3040-\u309f]
Katakana: [\u30a0-\u30ff]
Full-width Roman characters + half-width katakana: [\uff00-\uffef]
Kanji: [\u4e00-\u9faf]|[\u3400-\u4dbf]
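If you do go with explicit ranges, one way to keep them readable is to name each range and build the character class from the ones you need. This is only a sketch: the RANGES object, its key names, and buildMatcher are illustrative and not part of any library.

```javascript
// Ranges copied from the list above; the key names are made up for readability.
const RANGES = {
  japanesePunctuation: "\\u3000-\\u303f",
  hiragana: "\\u3040-\\u309f",
  katakana: "\\u30a0-\\u30ff",
  fullWidthRomanAndHalfWidthKatakana: "\\uff00-\\uffef",
  kanji: "\\u4e00-\\u9faf",
  kanjiExtensionA: "\\u3400-\\u4dbf",
};

// Hypothetical helper: builds a single character class from the named ranges.
function buildMatcher(...names) {
  return new RegExp("[" + names.map((name) => RANGES[name]).join("") + "]");
}

const kanjiOrKana = buildMatcher("kanji", "kanjiExtensionA", "hiragana", "katakana");

console.log(kanjiOrKana.test("hello"));      // false
console.log(kanjiOrKana.test("hello かな")); // true
console.log(kanjiOrKana.test("。"));         // false: the punctuation range was not selected
```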

String.prototype.isKanjiKana = function(){
    return !!this.match(/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/);
}

Don't anchor it to the beginning and end of the string with ^ and $; the + is also useless in this case.
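A quick comparison of the anchored and unanchored versions, as a sketch with illustrative variable names. It also shows why the | separators were dropped: inside a character class, | is a literal character rather than alternation.

```javascript
// Anchored: every character must be kanji or kana.
const anchored = /^[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]+$/;
// Unanchored: one kanji or kana character anywhere is enough.
const unanchored = /[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/;

console.log(anchored.test("漢字とかな")); // true
console.log(anchored.test("abc漢字"));    // false
console.log(unanchored.test("abc漢字"));  // true
console.log(unanchored.test("abc"));      // false

// Inside [...] the pipe is matched literally, so this class also accepts "|".
const withPipes = /[\u4E00-\u9FAF|\u3040-\u3096]/;
console.log(withPipes.test("a|b"));  // true
console.log(unanchored.test("a|b")); // false
```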

/[\u4E00-\u9FAF\u3040-\u3096\u30A1-\u30FA\uFF66-\uFF9D\u31F0-\u31FF]/

Why not just this? It will return true when the string contains at least one kanji or kana character.

/[一-龯]/.test(str)
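Note that 一 is U+4E00 and 龯 is U+9FAF, so this literal range is the same as [\u4E00-\u9FAF] and covers kanji only, not kana. A short sketch (the variable name is illustrative):

```javascript
const kanjiOnly = /[一-龯]/;

// The endpoints of the literal range, as hexadecimal code points.
console.log("一".codePointAt(0).toString(16)); // "4e00"
console.log("龯".codePointAt(0).toString(16)); // "9faf"

console.log(kanjiOnly.test("日本語"));   // true
console.log(kanjiOnly.test("ひらがな")); // false: hiragana is outside the range
console.log(kanjiOnly.test("abc"));      // false
```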
