How do I define a libpcre regexp for Arabic characters?


Developer https://www.devze.com 2023-03-05 08:31 Source: Internet

I need to define a PCRE regexp for certain spam-ish words in the Arabic/Persian alphabet, to be used in the Drupal spam module. The problem is that the usual PCRE regexp is apparently unable to find patterns in

For example, /bad word/ flags instances of 'bad word', but

/کلمه بد/i

is unable to flag 'کلمه بد'.

I have no problem with that if I use the u (Unicode) PCRE modifier:

$string = 'کلمه بد';

if (preg_match('~\p{Arabic}~u', $string) > 0)
{
    var_dump('contains Arabic characters');

    if (preg_match('~کلمه بد~ui', $string) > 0)
    {
        var_dump('contains spam-ish Arabic characters');
    }
}

string(26) "contains Arabic characters"
string(35) "contains spam-ish Arabic characters"

It runs just fine on IDEOne.com too. Be sure to save your source files as UTF-8 and to convert any input data to UTF-8 before matching.
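For a list of spam-ish words rather than a single literal, the same idea can be wrapped in a small helper. This is only a sketch: the function name contains_spam_word is made up for illustration and is not part of Drupal or its spam module, and it assumes PHP 7 or later with PCRE built with UTF-8 support. It also uses the empty pattern '//u' as a quick check that the input is valid UTF-8, since matching with the u modifier fails on malformed input.

```php
<?php
// Hypothetical helper (not a Drupal API): true if $text contains any word in $words.
function contains_spam_word(string $text, array $words): bool
{
    // With no words, the alternation below would be empty and match everything.
    if ($words === []) {
        return false;
    }

    // An empty pattern with the u modifier fails on invalid UTF-8,
    // so this doubles as an encoding check on the input.
    if (preg_match('//u', $text) !== 1) {
        return false; // not valid UTF-8; convert the input first
    }

    // Escape regex metacharacters in each word, including the ~ delimiter.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $words);

    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    $pattern = '~(?:' . implode('|', $escaped) . ')~u';

    return preg_match($pattern, $text) === 1;
}

var_dump(contains_spam_word('این یک کلمه بد است', ['کلمه بد'])); // bool(true)
var_dump(contains_spam_word('متن عادی', ['کلمه بد']));           // bool(false)
```

Returning false for invalid UTF-8 is a design choice made for this sketch; a real filter might prefer to convert the input or log the failure instead.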

Literal Unicode text in Perl source will only be recognized properly if the source file has use utf8; in it.

You can do /\x{644}/ and you can do

open my $fh, '<:utf8', 'somefile.txt' or die "blah blah";
my $bad_thing = <$fh>;
chomp $bad_thing;  # drop the trailing newline so it is not part of the pattern
/$bad_thing/;

and either will work without the utf8 pragma if your data is properly decoded, but if you want to do /ل/ then you need use utf8. Make sense?
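The code-point escape carries over to PHP, which is what the question is ultimately about. As a rough sketch, assuming PHP 7 or later (for the "\u{...}" string escape) and the u modifier on the pattern, \x{644} matches the letter ل without any non-ASCII bytes in the source file, so the result does not depend on how the file itself was saved:

```php
<?php
// U+0644 is ARABIC LETTER LAM; the escape keeps the pattern pure ASCII.
// \x{...} with code points above 0xFF requires the u modifier.
$pattern = '~\x{644}~u';

// "\u{...}" (PHP 7+) produces the UTF-8 bytes for each code point: "سلام".
$subject = "\u{0633}\u{0644}\u{0627}\u{0645}";

var_dump(preg_match($pattern, $subject)); // int(1)
var_dump(preg_match($pattern, 'hello'));  // int(0)
```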
