开发者

comparator with null values

开发者 https://www.devze.com 2022-12-22 10:30 出处:网络
We have some code which sorts a list of addresses based on the distance between their coordinates. this is done through collections.sort with a custom comparator.

We have some code which sorts a list of addresses based on the distance between their coordinates. this is done through collections.sort with a custom comparator.

However from time to time an address without coordinates is in the list causing a NullPointerException. My initial idea to fix this was开发者_如何学编程 to have the comparator return 0 as distance for addresses where at least one of the coordinates is null. I fear this might lead to corruption of the order the 'valid' elements in the list.

so is returning a '0' values for null data in a comparator ok, or is there a cleaner way to resolve this?


Handle it like null means infinitely far away. Thus:

  • comp(1234, null) == -1
  • comp(null, null) == 0
  • comp(null, 1234) == 1

With this, you get a consistent ordering.


Just to expand on Willi Schönborn's answer, I came here to say that google-collections is exactly what you're after here.

In the general case, you can just write your own Comparator to ignore nulls (assume non-null, so it can concentrate on the important logic), and then use Ordering to handle the nulls:

Collections.sort(addresses, Ordering.from(new AddressComparator()).nullsLast());

In your case, though, it's data WITHIN the Address (the Coordinates) that is being used to sort, right? google-collections is even more useful in this case. So you might have something more like:

// Seems verbose at first glance, but you'll probably find yourself reusing 
// this a lot and it will pay off quickly.
private static final Function<Address, Coordinates> ADDRESS_TO_COORDINATES = 
  new Function<Address, Coordinates>() {
      public Coordinates apply(Address in) {
          return in.getCoordinates();
      }
  };

private static final Comparator<Coordinates> COORDINATE_SORTER = .... // existing

then when you want to sort:

Collections.sort(addresses,
    Ordering.from(COORDINATE_SORTER)
            .nullsLast()
            .onResultOf(ADDRESS_TO_COORDINATES));

and that's where the power of google-collections really starts to pay off.


If you are using Java 8, you have 2 new static methods in the Comparator class, which come in handy:

public static <T> Comparator<T> nullsFirst(Comparator<? super T> comparator)
public static <T> Comparator<T> nullsLast(Comparator<? super T> comparator)

The comparison will be null safe and you can choose where to place the null values in the sorted sequence.

The following example:

List<String> monkeyBusiness = Arrays.asList("Chimp", "eat", "sleep", "", null, "banana",
            "throw banana peel", null, "smile", "run");
Comparator<? super String> comparator = (a, b) -> a.compareTo(b);
monkeyBusiness.stream().sorted(Comparator.nullsFirst(comparator))
            .forEach(x -> System.out.print("[" + x + "] "));

will print: [null] [null] [] [Chimp] [banana] [eat] [run] [sleep] [smile] [throw banana peel]


My take on this is that anything you try to do to "make good" the null coordinates is just papering over the cracks. What you really need to do is find and fix the bugs that are injecting the spurious null coordinates.

In my experience, infestations of NPE bugs are frequently caused by the following bad coding habits:

  • inadequate validation of input parameters,
  • using null to avoid creating empty arrays or collections,
  • returning null when an exception should have been thrown, or
  • using null to represent "no value" when there is a better solution.

(Better solutions to the "no value" problem typically involve rewriting code so that you don't need to represent this and/or using a non-null value instead; e.g. an empty String, a special instance, a reserved value. You can't always find a better solution, but you often can.)

If this describes your application, you should be spending your time trying to find and correct the code problems that are injecting the null values rather than thinking of ways to avoid the NPEs they cause.


My solution (might be useful for someone looking here) is to do the comparison normal, with null values replaced not by 0, but the maximum value possible (e.g. Integer.MAX_VALUE). Returning 0 is not consistent, if you have values that are themselves 0. Here a correct example:

        public int compare(YourObject lhs, YourObject rhs) {
            Integer l = Integer.MAX_VALUE;
            Integer r = Integer.MAX_VALUE;
            if (lhs != null) {
                l = lhs.giveMeSomeMeasure();
            }
            if (rhs != null) {
                r = rhs.giveMeSomeMeasure();
            }
            return l.compareTo(r);
        }

I just wanted to add that you don't necessary need the max value for integer. It depends of what your giveMeSomeMeasure() method can return. If for example you compare Celsius degrees for weather, you can set l and r to -300 or +300, depending where you want to set the null objects - to the head or the tail of the list.


You probably dont want to return 0 as that implies the addresses are equidistant and you really dont know. This is quite a classic problem where you are trying to deal with bad input data. I dont think its the responsibility of the comparator to try and determin how far the address is in realtive terms when you dont know the distance. I would remove those addresses from the list before sorting.

The hack would be to move them to the bottom of the list (but thats ugly !)


Instead of looking at this like it's a technical problem with the comparator, it's better to take a look at the requirements again: what you're really trying to do here, what are you going to do with this sorted list?

  • If you are trying to sort them to show the most relevant solutions first to a user, it might be a good idea to put the unknown locations last, so treat it like infinity (returning 0/-1/1 depending on which of them is null).
  • If you are going to use this result to draw some graph or do some other calculations that depend on them really being sorted by their distance, then the nulls probably shouldn't have been in there anyways (so either remove them first, or throw an exception if at that point there actually weren't supposed to be any addresses with a null location).

As you already realize, always returning 0 when one of them is null is not a good idea here; it can corrupt the result indeed. But what you should do instead depends on what you need, not on what others usually do/need. How your program behaves with addresses that don't have a location (so what the user is going to see) should not depend on some technical detail like what the "best practice" for comparators is. (To me, asking what the "best practice" is here, sounds like asking what the "best requirements" are).


No, there is no cleaner way. Perhaps:

  • if the coordinates of both compared objects are null, return 0
  • if the coordinates of one of the objects are null, return -1 / 1 (depending on whether it's the first or the second argument)

But more importantly - try to get rid of / fill in the missing coordinates, or, better: don't put addresses with missing coordinates in the list.

Actually, not putting them in the list is the most logical behaviour. If you put them in the list, the result won't be actually ordered by distance.

You may create another list, containing the addresses with missing coordinates, and make it clear to whoever needs that information (end-user, API user), that the first list only contains addresses with the needed data, while the second list contains addresses that lack required information.


I personally hate dealing with special null cases everywhere in my comparators so i was looking for a cleaner soluation and finally found google collections. Their Ordering is just awesome. They support compound comparators, offer to sort nulls to the top and to the end and allow to run certain functions before comparing. Writing comparators has never been so easy. You should give it a try.


I know... I know... This post is quite old and already answered alot. But Guavas Ordering is obsolete and Java 8's Comparator has built in functionality to solve a lot of custom comparison stuff.

Also I wanted to add my approach in case anybody has a similar need for comparing objects via multiple fields in the object, which can be null.

Setup

Let's work with example data of the question. We have a list of Address which contains Coordinate data, which can be null in some situations.

Custom Comparator

Let's asume we sort the list in a class AddressSorter and we only want to separate the sorting for concrete objects from those that are null. We can achieve this by using a custom Comparator that makes basic null checking.

public class AddressSorter {
    private static final Comparator<Coordinate> COORDINATE_NULL_COMPARATOR = (c1, c2) -> {
        if (c1 != null && c2 == null) {
            return 1;
        }
        if (c1 == null && c2 != null) {
            return -1;
        }
        return 0;
    }

    public List<Address> sortAddressList(List<Address> addresses) {
        return addresses.stream()
            .sorted(Comparator.compare(Address::getCoordinate, COORDINATE_NULL_COMPARATOR))
            .collect(Collectors.toList());
    }
}

In this example we use the built in Comparator.comparing(Function<? super T,? extends U> keyExtractor, Comparator<? super U> keyComparator)

This will construct a list where the Addresses with null as Coordinate are at the beginning of the returned list.

This will skip any comparison between any two concrete Coordinate completely

This may look odd, but there are occasions where it is valid to skip the object comparison. For example any LocalDateTime (or any other timely object) comparison with additional chaining will result in unexpected behaviour if you need to separate the objects by a LocalDateTimefield that can be null.

Compare Coordinate with null safety

So if you need the comparison of the Coordinate objects including null-safety you can use the natural order with a null check like this:

    public List<Address> sortAddressList(List<Address> addresses) {
        return addresses.stream()
            .sorted(Comparator.compare(Address::getCoordinate, Comparator.nullsFirst(Comparator.naturalOrder())
            .collect(Collectors.toList());
    }

Edit: It is also possible to use nullsLast if you want the Addresses with Coordinate == null at the end of your list.

Chaining

With that we can also start to chain sortings base on multiple fields of our Addresses, e.g.:

    public List<Address> sortAddressList(List<Address> addresses) {
        return addresses.stream()
            .sorted(Comparator.compare(Address::getCoordinate, Comparator.nullsFirst(Comparator.naturalOrder())
                .thenCompare(Address::getId))
            .collect(Collectors.toList());
    }

So you will end up with a list where the leading Addresses are the once containing null as Coordinate sorted by id and after that all Addresses with a concrete Coordinate also sorted by id.

Comparable and Apache Commons

If you want this behaviour as natural order for Address you can make Address implement Comparable and then work for example with Apache Commons CompareToBuilder:

    @Override
    public int compareTo(Address address) {
        return new CompareToBuilder()
            .append(this.coordinate, address.coordinate, Comparator.nullsFirst(Comparator.naturalOrder())
            .append(this.id, address.id)
            .toComparison();

This then enables you to use sorted() in the stream, as it makes use of the compareTo of Address:

    public List<Address> sortAddressList(List<Address> addresses) {
        return addresses.stream()
            .sorted()
            .collect(Collectors.toList());
    }
0

精彩评论

暂无评论...
验证码 换一张
取 消