Using binarySearch with Comparator and regex_问答_开发者

I am trying to write a quick search that searches a List<String> Instead of looping through the list and manually checking, I want to do this using binarySearch, but I am not sure how to do it.

Old way:

for(String s : list) {
  if(s.startsWith("contact.")
     return true;
}

Instead I would like something like this:

Collections.sort(list);
Collections.binarySearch(list, FindContactComparator());

Can someone help me write this开发者_如何学Python Comparator?

Is there any better way of doing this instead of using binarySearch?

This should work:

        Comparator<String> startsWithComparator = new Comparator<String>() {
            public int compare(String currentItem, String key) {
                if(currentItem.startsWith(key)) {
                    return 0;
                }
                return currentItem.compareTo(key);
            }
        };

int index = Collections.binarySearch(items, "contact.", startsWithComparator);

However sorting and then binary searching is less efficient than the single pass iteration.

Addendum:

Though the above answer helps you, here is another way (inspired from Scala, Google Collections) :

List<String> items = Arrays.asList("one", "two", "three", "four", "five", "six");
int index = find(items, startsWithPredicate("th"));
System.out.println(index);


public static Predicate<String> startsWithPredicate(final String key) {
    return new Predicate<String>(){
        @Override
        public boolean apply(String item) {
            return item.startsWith(key); 
        }
    };
}

public static <T> int find(Collection<T> items, Predicate<T> predicate) {
    int index = 0;
    for(T item: items) {
        if(predicate.apply(item)) {
            return index;
        }
        index++;
    }
    return -1;
}

interface Predicate<T> {
    boolean apply(T item);
}

Here the thing is the find() method is not tied with your 'matching' logic; it just finds an element that satisfies the predicate. So you could pass on a different implementation of predicate, for ex. which can check 'endsWith' to find() method and it would return the found item which ends with a particular string. Further the find() method works for any type of collection; all it needs is a predicate which transforms an element of collection element type to a boolean. This multiple lines of code around a simple logic also show the Java's lack of support for first class functions.

The problem is that binary search never looks back. I solved this by finding the first matching an element using binary search, then loop backward to find the first occurrence of this substring, followed by a loop which collects all matching elements.

I think that the way you are doing this now is actually the best way from a performance standpoint. Sorting itself is probably more expensive than simply iterating through the unsorted list. But to be sure you would have to run some tests (although that's not as easy as it may sound due to JIT compilation).

Is the criterium you are looking for always 'starts with'? Because in your question you're talking about a regex.

If you do want to implement this, you should at least use the same Comparator for sorting as for searching. The comparator itself can be very simple. Just write one that puts everything that matches your criterium in front of everything that doesn't. My syntax may not be completely correct since I haven't done Java in a while.

public class MyComparator<string> implements Comparator<string> {
    private string prefix;
    public MyComparator(string prefix) {
        this.prefix = prefix;
    }
    public int compare(string s0, string s1) {
        if (s0.startsWith(prefix) && s1.startsWith(prefix)) {
            return 0;
        }
        else if (s0.startsWith(prefix)) {
            return -1;
        }
        else if (s1.startsWith(prefix)) {
            return 1;
        }
        return 0;
    }
    public bool equals(object comp) {
        return true;
    }
}

Sorting the list itself takes more time than a linear scan of the list. (Comparison based sort takes time proportional to n(log n) where n is the length of the list.)

Even if the list is completely sorted most of the times, the sorting algorithm will have to at least iterate through the list to check this.

Basically, no matter how you implement a sorting algorithm, the algorithm (even in the best case) has to at least look at all elements. Thus, a linear search for "concat" is probably your best option here.

A more elaborate solution would be to subclass the list that contains the strings, and maintain the index of the first occurnece of "concat".

Given that strings are immutable, all you have to do is to override add, remove and so on, and update the index accordingly.

Just another comparator (with regex):

Comparator<String> comparator = new Comparator<String>() {

    private final Pattern containsPattern = Pattern.compile(searchTerm,Pattern.CASE_INSENSITIVE);

    public int compare(String o1, String o2) {

        Matcher contains1 = containsPattern.matcher(o1);
        Matcher contains2 = containsPattern.matcher(o2);
        boolean find1 = contains1.find();
        boolean find2 = contains2.find();

        if(find1 && find2){
            int compareContains = contains1.end() - contains2.end();
            if (compareContains == 0) {
                return o1.compareTo(o2);
            } else {
                return compareContains;
            }
        }else if(find1){
            return -1;
        }else if(find2){
            return 1;
        }else{
            return o1.compareTo(o2);
        } 
    } 
};