"Lenient" regex matching of similar characters in C#/.Net

Posted on https://www.devze.com, 2023-01-05 18:25. Source: the web.
Is there a way to get .NET to positively match strings, even if some characters are not exactly the same? Examples of characters that should be considered to be similar could be: 'a'/'á' and 'í'/'i'. The Chrome browser find-as-you-type recognizes these characters as being equivalent.


Take a look at this blog post by Michael Kaplan. The code there uses standard .NET class library methods for:

  1. Normalising Unicode strings, in this case using a "decomposed" normalisation form (Form D), which ensures that a character like á is represented by separate code points for a and its diacritic(s);
  2. Identifying the diacritics using classes that expose databases of information about Unicode characters (such as CharUnicodeInfo), and stripping them out.
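
The two steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the blog post: the class name `DiacriticFolding` and the methods `RemoveDiacritics` and `LenientIsMatch` are made up for this example, and it assumes that stripping non-spacing marks (Unicode category Mn) is enough for the characters you care about.

```csharp
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for this example only.
public static class DiacriticFolding
{
    // Step 1: decompose to Form D so that 'á' becomes 'a' + U+0301.
    // Step 2: drop every non-spacing mark, then recompose to Form C.
    public static string RemoveDiacritics(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Strips diacritics from both the input and the pattern, then matches as usual.
    // Assumes the pattern's accented characters are literals.
    public static bool LenientIsMatch(string input, string pattern)
    {
        return Regex.IsMatch(RemoveDiacritics(input), RemoveDiacritics(pattern));
    }
}
```

With this sketch, `DiacriticFolding.LenientIsMatch("Panamá", "Panama")` should succeed, while a plain `Regex.IsMatch("Panamá", "Panama")` does not. Characters that have no canonical decomposition (for example 'ø' or 'ß') are left untouched, so they would need separate handling.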


Sure, it's possible if you write out the algorithm yourself. The closest thing the out-of-the-box Regex.Match() overloads offer is RegexOptions.CultureInvariant. But that option only makes the comparison ignore the current culture, so unless you are switching cultures it's not going to be of any use here.
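
One way to "write out the algorithm yourself" might look like the sketch below: turn a literal search term into a pattern in which each base letter is replaced by a character class of its variants. Everything here is an assumption made for illustration: the class name `LenientPattern`, the method `Build`, and the deliberately tiny equivalence table, which covers only a few lower-case Latin vowels.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for this example only.
public static class LenientPattern
{
    // Illustrative equivalence table; extend it for the characters you need.
    private static readonly Dictionary<char, string> Variants = new Dictionary<char, string>
    {
        { 'a', "aáàâä" },
        { 'e', "eéèêë" },
        { 'i', "iíìîï" },
        { 'o', "oóòôö" },
        { 'u', "uúùûü" },
    };

    // Builds a regex pattern from a literal term: listed letters become
    // character classes, everything else is escaped and matched literally.
    public static string Build(string literal)
    {
        var sb = new StringBuilder();
        foreach (char c in literal)
        {
            if (Variants.TryGetValue(c, out var variants))
            {
                sb.Append('[').Append(variants).Append(']');
            }
            else
            {
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        return sb.ToString();
    }
}
```

For example, `Regex.IsMatch("Panamá", LenientPattern.Build("ama"))` should match, whereas `Regex.IsMatch("á", "a", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)` does not, since neither option folds accents. The table assumes precomposed characters, so input in a decomposed form would need normalising to Form C first.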


Maybe you want to look into Soundex/Metaphone functions, to first normalise strings, and then perform your regex operations on the results of that?
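
As a rough idea of what such a normalisation step could look like, here is a minimal sketch of classic American Soundex. The class name `Phonetic` is made up for this example, the code only understands unaccented English letters, and it is not a Metaphone implementation.

```csharp
using System.Text;

// Hypothetical helper class, named for this example only.
public static class Phonetic
{
    // Maps a letter to its Soundex digit; '0' means "not coded"
    // (vowels, H, W, Y and any non-letter).
    private static char Code(char c)
    {
        switch (char.ToUpperInvariant(c))
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            default: return '0';
        }
    }

    // Returns the four-character Soundex key: first letter plus three digits.
    public static string Soundex(string word)
    {
        if (string.IsNullOrEmpty(word)) return string.Empty;

        var sb = new StringBuilder();
        sb.Append(char.ToUpperInvariant(word[0]));
        char previous = Code(word[0]);

        for (int i = 1; i < word.Length && sb.Length < 4; i++)
        {
            char upper = char.ToUpperInvariant(word[i]);
            char code = Code(upper);
            // Append a digit unless it repeats the previous code.
            if (code != '0' && code != previous) sb.Append(code);
            // H and W do not separate letters with the same code; vowels do.
            if (upper != 'H' && upper != 'W') previous = code;
        }
        return sb.ToString().PadRight(4, '0');
    }
}
```

Both `Phonetic.Soundex("Robert")` and `Phonetic.Soundex("Rupert")` give `R163`, which shows how coarse this is: it groups words that sound alike, which is a much looser notion than "same letters, different accents". Accented input would still need its diacritics stripped before being passed in.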
