How do I match latin unicode characters in ColdFusion or Java regex?

开发者 https://www.devze.com 2023-03-16 14:06 Source: web
I'm looking for a ColdFusion or Java regex (to use in a replace function) that matches only digits [0-9] and letters [a-z], but also includes non-ASCII Portuguese letters (Unicode Latin, such as ç and ã).

Something like this:

str = reReplaceNoCase(str, "match none number/letter but keep unicode latin chars", "", "ALL");

Input string: "informação 123 ?:#$%"

Desired outcome: "informação 123"

I know I can match letters and numbers with [a-z][0-9], but this doesn't match letters such as ç and ã.


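Try the word character class \w: it matches letters, digits, and underscores. Note that in Java \w is ASCII-only by default, so it matches ç and ã only when the UNICODE_CHARACTER_CLASS flag (embedded form (?U)) is enabled.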
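To illustrate how \w behaves in Java specifically, here is a minimal sketch. The class and method names (WordClassDemo, stripAscii, stripUnicode) are made up for this example. It assumes the input uses precomposed characters (ç and ã as single code points) and that the source file is compiled as UTF-8.

```java
public class WordClassDemo {
    // Default Java behavior: \w is [a-zA-Z_0-9], so accented letters are removed too.
    public static String stripAscii(String s) {
        return s.replaceAll("[^\\w\\s]", "");
    }

    // (?U) turns on UNICODE_CHARACTER_CLASS, so \w also covers letters such as ç and ã.
    public static String stripUnicode(String s) {
        return s.replaceAll("(?U)[^\\w\\s]", "");
    }

    public static void main(String[] args) {
        String input = "informação 123 ?:#$%";
        System.out.println("[" + stripAscii(input) + "]");   // [informao 123 ]
        System.out.println("[" + stripUnicode(input) + "]"); // [informação 123 ]
    }
}
```

Keep in mind that \w also keeps underscores, which may or may not be wanted here.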

You can also use the named Unicode class \p{L}, which matches any letter (I'm not sure whether the Java regex parser supports it). In C#, the task can be done with the following code:

var input = "informação 123 ?:#$%";
var result = Regex.Replace(input, @"[^\p{L}\s0-9]", string.Empty);

The regex [^\p{L}\s0-9] means: any character that is not in this class (all letters, whitespace, digits). In your example it matches ?:#$%, and we replace those characters with an empty string.
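For the Java side of the question: java.util.regex does support \p{L}, so the same pattern carries over. The following is a rough sketch rather than a definitive solution. The names LatinFilter and clean are hypothetical, and the trim() call is an addition, needed because the space before "?:#$%" would otherwise survive the replacement.

```java
import java.util.regex.Pattern;

public class LatinFilter {
    // Anything that is not a Unicode letter, whitespace, or an ASCII digit.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}\\s0-9]");

    // Removes the disallowed characters, then trims leftover leading/trailing spaces.
    public static String clean(String input) {
        return NOT_ALLOWED.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("informação 123 ?:#$%")); // informação 123
    }
}
```

In ColdFusion, strings are usually backed by java.lang.String, so calling str.replaceAll("[^\p{L}\s0-9]", "") on a string may work where reReplace does not accept \p{L}. Treat that as an assumption to verify on your engine.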
