I use these regexes to remove words shorter than 3 characters:
$str = preg_replace("!\\b\\w{1,3}\\b!", "", $str);
and
$rdu = "/\b[^\b]{1,2}\b/";
$str = preg_replace($rdu , " ", " " . $str . " ");
but on Unicode text they return:
� �� �� �������� ��� �� � �� �� �������� ��� ��
....
Is there any way, with or without regex, to remove words shorter than 3 characters from Unicode text?
Thanks.
Use the u modifier for UTF-8 support:
/\b\w{1,2}\b/u
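To show how that might look in a full call, here is a minimal sketch. The helper name removeShortWords and its $maxLen parameter are made up for illustration and are not part of the answer above. It assumes the input is valid UTF-8 and that PCRE was built with Unicode property support. Instead of relying on \b and \w, it spells out the word characters as \p{L}, \p{M}, \p{N} and underscore, so the result does not depend on how \w is interpreted; note that it counts code points, so combining marks count toward the length.

```php
<?php
// Hypothetical helper, for illustration only.
// Removes "words" of at most $maxLen code points from UTF-8 text.
function removeShortWords(string $text, int $maxLen = 2): string
{
    // Word characters: letters, combining marks, digits, underscore.
    $w = '[\p{L}\p{M}\p{N}_]';
    // Lookarounds act as Unicode-aware word boundaries;
    // the u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = '/(?<!' . $w . ')' . $w . '{1,' . $maxLen . '}(?!' . $w . ')/u';
    $result = preg_replace($pattern, '', $text);
    // Collapse the gaps left behind and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $result));
}

echo removeShortWords("any text here - لا اله إلا الله محمد رسول الله");
```

Without the u modifier the pattern is applied byte by byte, which is what cuts multibyte characters in half and produces the � output shown in the question.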
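function RemoveLess($String, $Char = 2)
{
    // Initialize the result so that .= never touches an undefined variable.
    $Str = "";
    // Split on single spaces; each piece is treated as one word.
    $StringArray = explode(" ", $String);
    foreach ($StringArray as $Word)
    {
        // mb_strlen counts characters, not bytes, so multibyte text is measured correctly.
        if (mb_strlen($Word, "UTF-8") > $Char)
        {
            $Str .= $Word . " ";
        }
    }
    return trim($Str);
}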
$text="any text here - لا اله إلا الله محمد رسول الله";
echo RemoveLess($text);