I have a sorted list of strings that I move between PHP and Java. To be able to binary-search this data, I need the same comparison function on both sides.
Any idea what string comparison functions I can use that will always give the same result in both? E.g. PHP's strcmp() vs. Java's String.compareTo().
Yes, I know I could write my own string compare that carefully goes char by char, but I was hoping there's a simple answer.
PS: I don't care whether it's case-sensitive or not, as long as it is consistent.
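To see where strcmp() and String.compareTo() can disagree: PHP's strcmp() compares raw bytes, so on UTF-8 strings it effectively sorts by code point, while Java's compareTo() compares UTF-16 code units. The two orders agree except where a supplementary character (stored in Java as a surrogate pair whose first unit is 0xD800 to 0xDBFF) meets a character in the range U+E000 to U+FFFF. The Java sketch below assumes the PHP side holds UTF-8; `utf8ByteCompare` is a hypothetical helper written to mimic strcmp(), not a library function.

```java
final class CompareOrders {
    // Hypothetical helper that mimics PHP's strcmp() on UTF-8 strings:
    // compare the UTF-8 bytes as unsigned values; a proper prefix sorts first.
    static int utf8ByteCompare(String a, String b) {
        byte[] x = a.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] y = b.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cx = x[i] & 0xFF; // Java bytes are signed; mask to 0..255
            int cy = y[i] & 0xFF;
            if (cx != cy) return cx - cy;
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String fullwidthA = "\uFF21";     // U+FF21: one UTF-16 unit, 0xFF21
        String grinning = "\uD83D\uDE00"; // U+1F600: surrogate pair 0xD83D 0xDE00

        // Java compares 0xFF21 with 0xD83D, so the result is positive.
        System.out.println(fullwidthA.compareTo(grinning) > 0);          // prints true
        // UTF-8 bytes are EF BC A1 vs F0 9F 98 80; 0xEF < 0xF0, so negative.
        System.out.println(utf8ByteCompare(fullwidthA, grinning) < 0);   // prints true
        // For plain ASCII the two orders agree in sign.
        System.out.println(Integer.signum("apple".compareTo("banana"))
                == Integer.signum(utf8ByteCompare("apple", "banana")));  // prints true
    }
}
```

So if your data never pairs a character outside the Basic Multilingual Plane against one in U+E000 to U+FFFF, plain strcmp() on UTF-8 and compareTo() should agree in sign; only the sign matters for sorting and binary search.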
Since the PHP code in this case is allowed to be slow, I ended up rolling my own:
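```php
function unicodeStrCmp($s1, $s2)
{
    // Designed to give the same result as Java's String.compareTo().
    // Not extensively tested, and doesn't deal with surrogate pairs.
    // Assumes $s1 and $s2 are in mb_internal_encoding() (normally UTF-8).
    $l1 = mb_strlen($s1);
    $l2 = mb_strlen($s2);
    $i = 0;
    while ($i < $l1 && $i < $l2)
    {
        // Encode one character as UTF-16LE and read its first code unit
        // (low byte first); for BMP characters this equals Java's charAt() value.
        $c1 = mb_convert_encoding(mb_substr($s1, $i, 1), 'utf-16le');
        $c1 = ord($c1[0]) + (ord($c1[1]) << 8);
        $c2 = mb_convert_encoding(mb_substr($s2, $i, 1), 'utf-16le');
        $c2 = ord($c2[0]) + (ord($c2[1]) << 8);
        $res = $c1 - $c2;
        if ($res != 0)
            return $res;
        $i++;
    }
    // Every shared position is equal, so the shorter string sorts first.
    return $l1 - $l2;
}
```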
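For reference, the Javadoc for String.compareTo() defines the result in terms of charAt(): the difference of the first differing UTF-16 code units, or the difference in length if one string is a prefix of the other. The small Java sketch below restates that contract (`compareUtf16Units` is a hypothetical name) and checks it against compareTo(). It also makes the surrogate-pair caveat concrete: Java counts U+1F600 as two units, while the PHP loop above steps one code point at a time and reads only the first unit of each.

```java
final class CompareToSpec {
    // Restates the documented String.compareTo() contract: return the
    // difference of the first differing UTF-16 code units, else the
    // difference in length. unicodeStrCmp() aims to reproduce this in PHP.
    static int compareUtf16Units(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int k = 0; k < n; k++) {
            char ca = a.charAt(k); // one UTF-16 code unit, not a full code point
            char cb = b.charAt(k);
            if (ca != cb) return ca - cb;
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "ab", "abc", "B", "\u00e9", "\uFF21",
                            "\uD83D\uDE00", "x\uD83D\uDE00y"};
        // Every pair should give exactly the value compareTo() returns.
        for (String s : samples)
            for (String t : samples)
                if (compareUtf16Units(s, t) != s.compareTo(t))
                    throw new AssertionError(s + " vs " + t);
        // A supplementary character is two code units long in Java.
        System.out.println("\uD83D\uDE00".length()); // prints 2
    }
}
```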
The other way to do this would be to implement your own 'byte string' class in Java, complete with a compareTo method. The idea would be to avoid converting the byte representations (in UTF-8 encoding, or whatever) into Unicode characters, and thereby avoid the possibility of using the wrong character encoding.
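A minimal sketch of that idea, assuming the stored data is UTF-8: compareTo compares unsigned bytes, which is what PHP's strcmp() does, so a list sorted on one side can be binary-searched on the other. `ByteString` and `ofUtf8` are hypothetical names, not an existing library class.

```java
final class ByteString implements Comparable<ByteString> {
    private final byte[] bytes;

    ByteString(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the value immutable
    }

    // Hypothetical convenience factory: encode a Java String as UTF-8 bytes.
    static ByteString ofUtf8(String s) {
        return new ByteString(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
    }

    @Override
    public int compareTo(ByteString other) {
        byte[] a = bytes, b = other.bytes;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Java bytes are signed; mask to 0..255 so 0x80..0xFF sort after ASCII
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length; // a proper prefix sorts first
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteString && java.util.Arrays.equals(bytes, ((ByteString) o).bytes);
    }

    @Override
    public int hashCode() {
        return java.util.Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        java.util.List<ByteString> sorted = new java.util.ArrayList<>();
        for (String s : new String[] {"apple", "banana", "\u00e9clair", "zebra"})
            sorted.add(ofUtf8(s));
        java.util.Collections.sort(sorted); // apple, banana, zebra, éclair
        System.out.println(java.util.Collections.binarySearch(sorted, ofUtf8("zebra"))); // prints 2
    }
}
```

In this order "éclair" sorts after "zebra", because the UTF-8 lead byte 0xC3 is larger than any ASCII byte; that is the same order strcmp() gives for the UTF-8 strings.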
But this would be exceedingly awkward, because all of Java's text-handling APIs are based on the String type and are therefore Unicode-based (more or less). Besides, if you weren't making any assumptions about character sets or encodings, you wouldn't be able to interpret the bytes in any way; for example, you couldn't split the text into words.