accent insensitive regex

Developer https://www.devze.com 2023-01-27 04:00 Source: web
My code:

jQuery.fn.extend({
 highlight: function(search){
  var regex = new RegExp('(<[^>]*>)|('+ search.replace(/[.+]i/,"$0") +')','ig');

  return this.html(this.html().replace(regex, function(a, b, c){
   return (a.charAt(0) == '<') ? a : '<strong class="highlight">' + c + '</strong>';
  }));
 }

});

I want to highlight letters with accents as well, e.g.:

$('body').highlight("cao");

should highlight: [ção] OR [çÃo] OR [cáo] OR expre[cão]tion OR [Cáo]tion

How can I do that?


The sole correct way to do this is to first run it through Unicode Normalization Form D, canonical decomposition.

You then strip out any Marks that result (\pM characters, or perhaps \p{Diacritic}, depending), and run your match against the de/un-marked version.

Do not under any circumstances hardcode a bunch of literals. Eek!
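
A rough sketch of the normalization approach in JavaScript, assuming an engine that supports String.prototype.normalize and Unicode property escapes (\p{M} with the u flag), and assuming the input is plain text rather than HTML. The helper names stripMarks, findAccentInsensitive and highlightText are made up for illustration, and the index map is just one possible way to carry matches found in the stripped text back to the original:

```javascript
// Hypothetical helper: canonical decomposition (NFD), then drop all combining marks.
function stripMarks(s) {
  return s.normalize('NFD').replace(/\p{M}/gu, '');
}

// Hypothetical helper: returns [start, end) index pairs into `text` where the
// mark-stripped, lowercased text equals the mark-stripped, lowercased search.
function findAccentInsensitive(text, search) {
  const needle = stripMarks(search).toLowerCase();
  if (needle === '') return [];

  let folded = '';    // stripped, lowercased copy of text
  const origin = [];  // origin[i] = index in text of the char that produced folded[i]
  let pos = 0;
  for (const ch of text) {  // iterate by code point
    const f = stripMarks(ch).toLowerCase();
    for (let k = 0; k < f.length; k++) origin.push(pos);
    folded += f;
    pos += ch.length;
  }
  origin.push(text.length); // sentinel so a match may end at the end of text

  const ranges = [];
  let from = 0;
  for (;;) {
    const i = folded.indexOf(needle, from);
    if (i === -1) break;
    ranges.push([origin[i], origin[i + needle.length]]);
    from = i + needle.length;
  }
  return ranges;
}

// Hypothetical helper: wraps every match in the same markup the question uses.
function highlightText(text, search) {
  let out = '';
  let last = 0;
  for (const [start, end] of findAccentInsensitive(text, search)) {
    out += text.slice(last, start) +
      '<strong class="highlight">' + text.slice(start, end) + '</strong>';
    last = end;
  }
  return out + text.slice(last);
}
```

Calling highlightText(someText, "cao") would then wrap each accented or unaccented variant in the original text. To use it from the plugin in the question, it would have to be applied to text nodes only, so that tags are left untouched.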

Boa sorte!


You need to come up with a table of alternative characters and dynamically generate a regex based on that. For example:

var alt = {
  'c': '[cCç]',
  'a': '[aAãÃá]',
  /* etc. */
};

highlight: function (search) {
  var pattern = '';
  for (var i = 0; i < search.length; i++) {
    var ch = search[i];
    if (alt.hasOwnProperty(ch))
      pattern += alt[ch];
    else
      pattern += ch;
  }

  ...
}

Then for search = 'cao' this will generate a pattern [cCç][aAãÃá]o.
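
For completeness, the table idea can be fleshed out roughly as follows. This is only a sketch under a few assumptions: the table is deliberately tiny and illustrative, the names buildPattern and highlightWithTable are made up, the input is plain text rather than HTML, and characters without a table entry are escaped so that regex metacharacters in the search string are matched literally.

```javascript
// Illustrative table only; a real one needs an entry for every base letter of interest.
const alt = {
  c: '[cç]',
  a: '[aáàâã]',
  o: '[oóòôõ]',
};

// Hypothetical helper: turns the search string into a regex source string.
function buildPattern(search) {
  let pattern = '';
  for (const ch of search.toLowerCase()) {
    pattern += Object.prototype.hasOwnProperty.call(alt, ch)
      ? alt[ch]
      : ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape regex syntax characters
  }
  return pattern;
}

// Hypothetical helper: wraps every match in the markup used in the question.
function highlightWithTable(text, search) {
  if (search === '') return text; // an empty pattern would match at every position
  // 'i' covers upper/lower case, so the table only lists lowercase forms.
  const regex = new RegExp(buildPattern(search), 'giu');
  return text.replace(regex, '<strong class="highlight">$&</strong>');
}
```

Note that this only matches precomposed characters listed in the table; text stored in decomposed form (base letter followed by a combining mark) would not match unless it is normalized to NFC first.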
