Natural Ordering in Sets (Java)

I came across a note in TreeSet’s javadoc which intrigued me:

“Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.”

What does that mean? The TreeSet doc says to read the doc for Comparable or Comparator so that’s what I did and here’s what I found:

“The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C.”

Oooh! What is natural ordering and how do I know if my class has one?

The answer is it depends on what the class represents. As always, examples educate best. So let’s start with a simple one:

Let’s say we are creating a class to represent players in a video game. We start with a simple class which just has player IDs and we define two players to be equal if they have the same ID. The hashCode method is actually used by the Set to compare two objects for equality so we implement that plus the equals method since equal objects should have equal hashcode.

(I know I should be checking the argument in equals but I removed those checks for the sake of simplicity)

Now we can use Player objects in a HashSet:

As expected, the second add call returns false because although p2 is a separate object in memory by our definition it is equal to p1 since it has the same id as p1. Note that HashSet has no concept of ordering, just set membership.

Now let’s say we introduce the concept of rank of a player. We can see how rank defines a natural ordering of the Player objects. If we were to arrange the players in ascending order, lower ranked players should appear before higher ranked players. To express that in code we implement the Comparable interface which means that Player class gets a compareTo method.

(Again, I know I should be checking the argument in compareTo but I removed those checks for the sake of simplicity).

We can now put Player objects in a slightly different kind of Set called TreeSet which, unlike HashSet, implements both set membership and ordering.

This outputs player id 2 followed by 1because player 2’s rank is 10 which is lower than player 1’s rank 20. Note that the forEach method returns elements in natural order defined by our compareTo method (and it is more compact than getting an iterator by calling iterator() method on the TreeSet and looping over the iterator).

Before we added rank to the Player class there was no natural ordering for players. We could only say whether two players are ‘equal’ as in whether they have the same id. We could not say whether one player should come before or after another in sorted order. As you can imagine, there can be other attributes of a player which can be used for ordering. Suppose we introduce an attribute called gamesPlayed . Now we may have a scenario where we want the players to be sorted according to the number of games the have played. Should we change our compareTo method now? If we do, that would be a behavior change so might break existing applications which depend on players to be sorted according to their rank. There are two solutions to this dilemma:

[1] Create a subclass and implement a compareTo method which provides ordering based on gamesPlayed

This is a somewhat of a clumsy solution, mainly because it requires you to define a whole new class just to implement a different comparison behavior. It could be appealing if the class could store other information and have more functionality than just re-implementing the compareTo method.

[2] Pass in a Comparator when creating the TreeSet. This involves defining a Comparator<Player> and passing it in the TreeSet constructor.

This is a better solution in many situations since the amount of code you are adding is proportional to the additional behavior you want. The only drawback of this method is that it may not be possible since the attribute you want to compare on may not be visible to the Comparator class e.g. if it is private, or if it’s not public and the code where Comparator is defined is in a different package than the class to be compared.

In our example the Player class has no natural ordering. It is what we want it to be. We chose it to be defined by the rank. In some cases the class may have a natural ordering e.g. in case of Integer class there is a natural order and indeed Integer.compareTo method implements it so. Another example could be a Student class where a student’s gradecould define a natural order on Student objects.

So going back to the statement at the start of this article:

“Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.”

If we were to think in terms of Player class if two Player objects have the same id they should have the same rank (unless there is something terribly wrong somewhere in the system). But the opposite is not true. If two players have the same rank they may have different id’s. So the following statement is not true for the Player class:

“The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C.”

But for practical purposes our implementation is sufficient. Two Player objects with the same id cannot be added to a Set (HashSet or TreeSet) which makes sense and if TreeSet is used then the Players will be ordered by their rank which also makes sense.