开发者

Compare first three characters of two strings

开发者 https://www.devze.com 2022-12-20 16:19 出处:网络
Strings s1 and s2 will always be of length 1 or higher. How can I speed this up? int l1 = s1.length(); if (l1 > 3) { l1 = 3; }

Strings s1 and s2 will always be of length 1 or higher.

How can I speed this up?

int l1 = s1.length();

if (l1 > 3) { l1 = 3; }

if (s2开发者_运维知识库.startsWith(s1.substring(0,l1))) 
{
 // do something..
}

Regex maybe?


This seems pretty reasonable. Is this really too slow for you? You sure it's not premature optimization?


Rewrite to avoid object creation

Your instincts were correct. The creation of new objects (substring()) is not very fast and it means that each one created must incur g/c overhead as well.

This might be a lot faster:

static boolean fastCmp(String s1, String s2) {
    return s1.regionMatches(0, s2, 0, 3);
}


if (s2.startsWith(s1.substring(0, Math.min(3, s1.length())) {..};

Btw, there is nothing slow in it. startsWith has complexity O(n)

Another option is to compare the char values, which might be more efficient:

boolean match = true;
for (int i = 0; i < Math.min(Math.min(s1.length(), 3), s2.length()); i++) {
    if (s1.charAt(i) != s2.charAt(i)) {
       match = false;
       break;
    }
}


My java isn't that good, so I'll give you an answer in C#:

int len = Math.Min(s1.Length, Math.Min(s2.Length, 3));
for(int i=0; i< len; ++i)
{
    if (s1[i] != s2[i])
       return false;
}
return true;

Note that unlike yours and Bozho's, this does not create a new string, which would be the slowest part of your algorithm.


Perhaps you could do this

if (s1.length() > 3 && s2.length() > 3 && s1.indexOf (s2.substring (0, 3)) == 0)
{
  // do something..
}


There is context missing here: What are you trying to scan for? What type of application? How often is it expected to run?

These things are important because different scenarios call for different solutions:

  1. If this is a one-time scan then this is probably unneeded optimization. Even for a 20MB text file, it wouldn't take more than a couple of minutes in the worst case.
  2. If you have a set of inputs and for each of them you're scanning all the words in a 20MB file, it might be better to sort/index the 20MB file to make it easy to look up matches and skip the 99% of unnecessary comparisons. Also, if inputs tend to repeat themselves it might make sense to employ caching.

Other solutions might also be relevant, depending on the actual problem.

But if you boil it down only to comparing the first 3 characters of two strings, I believe the code snippets given here are as good as you're going to get - they're all O(1)*, so there's no drastic optimization you can do.

*The only place where this might not hold true is if getting the length of the string is O(n) rather than O(1) (which is the case for the strlen function in C++), which is not the case for Java and C# string objects.

0

精彩评论

暂无评论...
验证码 换一张
取 消