Comparison of objects: practice

This is the second of the articles devoted to comparing objects. The first of them discussed the theoretical basis of comparison - how it is done, why and where it is used. In this article we will talk directly about comparing numbers, objects, special cases, subtleties and non-obvious points. More precisely, here's what we'll talk about:

String comparison: '==' and equals
The String.intern method
Comparison of real primitives
+0.0 and -0.0
NaN value
Java 5.0. Factory methods and comparison via '=='
Java 5.0. Autoboxing/unboxing: '==', '>=' and '<=' for object wrappers
Java 5.0. Comparison of enum elements (type enum)

So let's get started!

String comparison: `==` and `equals`

Ah, strings... One of the most commonly used types, and one that causes a lot of problems. There is a separate article devoted to them; here I will touch only on comparison. Of course, strings can be compared using equals. Moreover, they MUST be compared via equals. However, there are subtleties worth knowing. First of all, identical string literals are actually a single object. This is easy to verify by running the following code:

String str1 = "string";
String str2 = "string";
System.out.println(str1==str2 ? "the same" : "not the same");

The result will be "the same", which means the two references are equal. This is done at the compiler level, evidently to save memory: the compiler creates ONE instance of the string and makes both str1 and str2 refer to it. However, this applies only to strings written as literals in the code (more precisely, to compile-time constant expressions). If you assemble a string from pieces at run time, the reference to it will be different. This example confirms it:

String str1 = "string";
String str2 = "str";
String str3 = "ing";
System.out.println(str1==(str2+str3) ? "the same" : "not the same");

The result will be "not the same". You can also create a new object using the copy constructor:

String str1 = "string";
String str2 = new String("string");
System.out.println(str1==str2 ? "the same" : "not the same");

The result will also be "not the same". Thus, strings can sometimes be compared by reference, but it is better not to rely on this. I would also like to touch on one very interesting method that lets you obtain the so-called canonical representation of a string: String.intern. Let's talk about it in more detail.
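Before turning to intern, here is a minimal sketch that pulls the three cases above together and adds one they do not cover: when the pieces are compile-time constants (final variables initialized with literals), the compiler folds the concatenation into a single literal, so the reference comparison succeeds. The class name StringComparisonDemo and the helper sameObject are illustrative only and do not come from the original article.

```java
class StringComparisonDemo {

    // Reference comparison: true only if both variables point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "string";
        String part1 = "str";
        String part2 = "ing";
        final String constPart1 = "str";
        final String constPart2 = "ing";

        String runtimeConcat = part1 + part2;             // built at run time: a new object
        String constantConcat = constPart1 + constPart2;  // constant expression: folded into "string"
        String copy = new String("string");               // explicit copy: a new object

        System.out.println("runtime concat, ==:     " + sameObject(literal, runtimeConcat));
        System.out.println("constant concat, ==:    " + sameObject(literal, constantConcat));
        System.out.println("copy, ==:               " + sameObject(literal, copy));
        System.out.println("runtime concat, equals: " + literal.equals(runtimeConcat));
        System.out.println("copy, equals:           " + literal.equals(copy));
    }
}
```

Only equals gives the same answer in every case, which is why it remains the safe way to compare string contents.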

String.intern method

Let's start with the fact that the String class maintains a string pool. All string literals defined in classes, and not only they, are added to this pool. The intern method returns the string from this pool that is equal, in terms of equals, to the string on which intern is called. If there is no such string in the pool, the current one is placed there and a reference to it is returned. Thus, even if the references to two equal strings differ (as in the two examples above), calling intern on these strings returns a reference to the same object:

String str1 = "string";
String str2 = new String("string");
System.out.println(str1.intern()==str2.intern() ? "the same" : "not the same");

The result of executing this piece of code will be "the same". I can't say exactly why it was done this way. The intern method is native, and to be honest, I don't want to wade into the wilds of C code. Most likely this is done to optimize memory consumption and performance. In any case, this implementation feature is worth knowing about. Let's move on to the next part.
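Before moving on, a small sketch of the same idea with a string that is assembled at run time rather than copied from a literal. The class InternDemo and the helper buildAtRuntime are hypothetical names used only for illustration.

```java
class InternDemo {

    // Builds "string" at run time, so the result is not the pooled literal.
    static String buildAtRuntime() {
        return new StringBuilder("str").append("ing").toString();
    }

    public static void main(String[] args) {
        String literal = "string";
        String built = buildAtRuntime();

        System.out.println("built == literal:          " + (built == literal));
        System.out.println("built.intern() == literal: " + (built.intern() == literal));
        System.out.println("built.equals(literal):     " + built.equals(literal));
    }
}
```

The string itself is unchanged by the call: intern only returns the pooled reference, so the caller has to keep and use the returned value.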

Comparison of real primitives

To begin with, I want to ask a question. A very simple one. What is the sum 0.3f + 0.4f? 0.7f, you say? Let's check:

float f1 = 0.7f;
float f2 = 0.3f + 0.4f;
System.out.println("f1==f2: "+(f1==f2));

Well, what is the result? Do you like it? So do I. For those who did not run this fragment, I will say that the result will be...

f1==f2: false

Why does this happen? Let's run another test:

float f1 = 0.3f;
float f2 = 0.4f;
float f3 = f1 + f2;
float f4 = 0.7f;
System.out.println("f1="+(double)f1);
System.out.println("f2="+(double)f2);
System.out.println("f3="+(double)f3);
System.out.println("f4="+(double)f4);

Note the conversion to double. This is done in order to output more decimal places. Result:

f1=0.30000001192092896
f2=0.4000000059604645
f3=0.7000000476837158
f4=0.699999988079071

Strictly speaking, the result is predictable. The fractional part is represented by a finite sum of powers 2^-n, so an arbitrarily chosen number generally cannot be represented exactly. As the example shows, float is accurate to about 7 significant decimal digits. More precisely, the float format allocates 24 bits to the mantissa. Thus, the relative precision of float (leaving the exponent aside, since we are talking about accuracy) is about 2^-24 ≈ 6*10^-8. It is with steps of this order that values in the float representation actually advance. And where there is quantization, there is error. Hence the conclusion: float values can only be compared to within a certain accuracy. I would recommend rounding them to the 6th decimal place (10^-6) or, preferably, checking the absolute value of the difference between them:

float f1 = 0.3f;
float f2 = 0.4f;
float f3 = f1 + f2;
float f4 = 0.7f;
System.out.println("|f3-f4|<1e-6: "+( Math.abs(f3-f4) < 1e-6 ));

In this case, the result is encouraging:

|f3-f4|<1e-6: true
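The check above can be wrapped in a small helper. The sketch below is illustrative only: the class FloatCompareDemo and the method nearlyEqual are hypothetical names, and the tolerance 1e-6f is simply the value suggested above. It also prints Math.ulp(0.7f), the distance between neighboring float values near 0.7, next to the actual difference between the two results.

```java
class FloatCompareDemo {

    // Hypothetical helper: true if a and b differ by less than the given absolute tolerance.
    static boolean nearlyEqual(float a, float b, float epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    public static void main(String[] args) {
        float sum = 0.3f + 0.4f;
        float expected = 0.7f;

        System.out.println("sum == expected:           " + (sum == expected));
        System.out.println("nearlyEqual(..., 1e-6f):   " + nearlyEqual(sum, expected, 1e-6f));
        // Math.ulp returns the distance from its argument to the next larger float.
        System.out.println("Math.ulp(expected):        " + Math.ulp(expected));
        System.out.println("sum - expected:            " + (sum - expected));
    }
}
```

An absolute tolerance like this suits values of moderate magnitude. For very large or very small numbers a tolerance scaled to the operands is usually the better choice.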

Of course, the picture is exactly the same with the type double. The only difference is that 53 bits are allocated for the mantissa, so the representation accuracy is 2^-53 ≈ 10^-16. Yes, the quantization step is much smaller, but it is still there, and it can play a cruel joke on you. By the way, in the JUnit test library, the methods for comparing real numbers take the precision explicitly. That is, the comparison method has three parameters: the number, the value it should be equal to, and the accuracy of the comparison.

I would also like to mention a subtlety in writing numbers in scientific notation, with an exponent. A question: how do you write 10^-6? Practice shows that more than 80% answer 10e-6. Meanwhile, the correct answer is 1e-6! And 10e-6 is 10^-5! We stepped on this rake in one of our projects, quite unexpectedly. We looked for the error for a very long time and went over the constants 20 times. No one had a shadow of a doubt about their correctness until one day, largely by accident, the constant 10e-3 was printed and showed two digits after the decimal point instead of the expected three. So be careful! Let's move on.
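As a quick check of the notation pitfall just described, the following sketch compares the literals directly. The class and constant names are made up for this illustration.

```java
class ExponentLiteralDemo {

    // In a Java literal, "e-6" means "times ten to the power -6".
    static final double ONE_E_MINUS_6 = 1e-6;   // 0.000001
    static final double TEN_E_MINUS_6 = 10e-6;  // 10 * 0.000001 = 0.00001

    public static void main(String[] args) {
        System.out.println("1e-6  == 0.000001: " + (ONE_E_MINUS_6 == 0.000001));
        System.out.println("10e-6 == 0.00001:  " + (TEN_E_MINUS_6 == 0.00001));
        System.out.println("10e-6 == 1e-6:     " + (TEN_E_MINUS_6 == ONE_E_MINUS_6));
    }
}
```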

+0.0 and -0.0

In the representation of real numbers, the most significant bit is the sign bit. What happens if all the other bits are 0? Unlike integers, where such a pattern denotes the negative number at the lower limit of the range, a real number with only the most significant bit set to 1 also means 0, only with a minus sign. Thus, we have two zeros: +0.0 and -0.0. A logical question arises: should these numbers be considered equal? The virtual machine thinks so. However, these are two different numbers, because operations on them produce different values:

float f1 = 0.0f/1.0f;
float f2 = 0.0f/-1.0f;
System.out.println("f1="+f1);
System.out.println("f2="+f2);
System.out.println("f1==f2: "+(f1==f2));
float f3 = 1.0f / f1;
float f4 = 1.0f / f2;
System.out.println("f3="+f3);
System.out.println("f4="+f4);

... and the result:

f1=0.0
f2=-0.0
f1==f2: true
f3=Infinity
f4=-Infinity

So in some cases it makes sense to treat +0.0 and -0.0 as two different numbers. And if we have two objects, one of which has a field equal to +0.0 and the other -0.0, these objects can likewise be regarded as unequal. The question arises: how can you tell that the numbers differ if comparing them directly in the virtual machine gives true? The answer is this. Even though the virtual machine considers these numbers equal, their representations are still different. Therefore, the only thing that can be done is to compare the representations. To obtain them, there are the methods int Float.floatToIntBits(float) and long Double.doubleToLongBits(double), which return the bit representation as an int and a long respectively (continuing the previous example):

int i1 = Float.floatToIntBits(f1);
int i2 = Float.floatToIntBits(f2);
System.out.println("i1 (+0.0):"+ Integer.toBinaryString(i1));
System.out.println("i2 (-0.0):"+ Integer.toBinaryString(i2));
System.out.println("i1==i2: "+(i1 == i2));

The result will be

i1 (+0.0):0
i2 (-0.0):10000000000000000000000000000000
i1==i2: false
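Applied to object comparison, the bit-level check might look like the sketch below. The class Reading is a hypothetical example, not something from the original article; its equals compares the bit patterns of the float field, and hashCode uses the same bits so that the two methods stay consistent.

```java
final class Reading {
    private final float value;

    Reading(float value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Reading)) return false;
        // Compare bit patterns, so +0.0f and -0.0f are treated as different values.
        return Float.floatToIntBits(value) == Float.floatToIntBits(((Reading) other).value);
    }

    @Override
    public int hashCode() {
        // Derived from the same bits as equals, so equal objects get equal hash codes.
        return Float.floatToIntBits(value);
    }

    public static void main(String[] args) {
        Reading plusZero = new Reading(0.0f);
        Reading minusZero = new Reading(-0.0f);
        System.out.println("0.0f == -0.0f:                      " + (0.0f == -0.0f));
        System.out.println("plusZero.equals(minusZero):         " + plusZero.equals(minusZero));
        System.out.println("plusZero.equals(new Reading(0.0f)): " + plusZero.equals(new Reading(0.0f)));
    }
}
```

The standard wrappers follow the same convention: Float.equals and Float.compare also distinguish 0.0f from -0.0f.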

Thus, if +0.0 and -0.0 are different numbers for your purposes, you should compare real variables through their bit representation. We seem to have sorted out +0.0 and -0.0. However, -0.0 is not the only surprise. There is also such a thing as...

NaN value

NaN stands for Not-a-Number. This value appears as the result of invalid mathematical operations, say, dividing 0.0 by 0.0, infinity by infinity, etc. The peculiarity of this value is that it is not equal to itself. That is:

float x = 0.0f/0.0f;
System.out.println("x="+x);
System.out.println("x==x: "+(x==x));

...will produce...

x=NaN
x==x: false

How can this affect object comparison? If a field of an object holds NaN, then comparing it with == gives false, i.e. the objects are guaranteed to be considered unequal, although logically we may want just the opposite. You can achieve the desired result using the method Float.isNaN(float), which returns true if its argument is NaN. I would not rely on comparing raw bit representations here, because NaN does not have a single bit pattern: many different patterns all denote NaN. (Float.floatToIntBits collapses them into one canonical value, whereas Float.floatToRawIntBits does not.)

Perhaps that's enough about primitives. Let's now move on to the subtleties that have appeared in Java since version 5.0, starting with factory methods.
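Before leaving primitives, here is a minimal sketch of a field comparison that treats NaN as equal to NaN by means of Float.isNaN. The class NaNCompareDemo and the helper sameValue are hypothetical names chosen for this example.

```java
class NaNCompareDemo {

    // Hypothetical helper: behaves like ==, except that NaN is considered equal to NaN.
    static boolean sameValue(float a, float b) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return Float.isNaN(a) && Float.isNaN(b);
        }
        return a == b;
    }

    public static void main(String[] args) {
        float nan = 0.0f / 0.0f;
        System.out.println("nan == nan:           " + (nan == nan));
        System.out.println("sameValue(nan, nan):  " + sameValue(nan, nan));
        System.out.println("sameValue(nan, 1.0f): " + sameValue(nan, 1.0f));
        System.out.println("sameValue(2.0f, 2.0f): " + sameValue(2.0f, 2.0f));
    }
}
```

Note that this helper still treats +0.0 and -0.0 as equal, because it falls back to == for ordinary values.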

Java 5.0. Factory methods and comparison via `==`

There is a design pattern called the factory method. Sometimes using it is much more advantageous than using a constructor. Let me give you an example. I think everyone knows the Boolean wrapper class well. This class is immutable and can hold only two values. That is, in fact, two instances are enough for any need. And if you create them in advance and then simply return them, that is much faster than using a constructor. Boolean has such a method: valueOf(boolean). It appeared in version 1.4. Similar factory methods were introduced in version 5.0 in the classes Byte, Character, Short, Integer and Long. When these classes are loaded, arrays of their instances are created, corresponding to certain ranges of primitive values. These ranges are as follows:

Byte, Short, Integer, Long: from -128 to 127
Character: from 0 to 127

This means that when the valueOf(...) method is used and the argument falls within the specified range, the same object is always returned. Perhaps this gives some gain in speed. But at the same time it creates problems of a kind that can be quite difficult to get to the bottom of; more on this below. In theory, the factory method valueOf has also been added to the Float and Double classes. Their documentation says that if a new instance is not required, it is better to use this method, because it can give a gain in speed, and so on. However, in the current (Java 5.0) implementation this method creates a new instance, i.e. its use is not guaranteed to give any gain in speed. Moreover, it is hard for me to imagine how this method could be sped up, because the continuity of the values makes a cache impractical. Except for whole numbers, I mean those without a fractional part.
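The caching behavior can be observed directly through valueOf. The sketch below is only an illustration: the class ValueOfCacheDemo and the helper cached are hypothetical names, and the identity check is relied on only for Integer values from -128 to 127. Outside that range the outcome depends on the JVM and its settings, so the code prints it without assuming a result.

```java
class ValueOfCacheDemo {

    // True if two valueOf calls for the same int return the very same object.
    static boolean cached(int n) {
        return Integer.valueOf(n) == Integer.valueOf(n);
    }

    public static void main(String[] args) {
        System.out.println("Boolean.valueOf(true) == Boolean.TRUE: "
                + (Boolean.valueOf(true) == Boolean.TRUE));
        System.out.println("cached(-128): " + cached(-128));
        System.out.println("cached(127):  " + cached(127));
        // Outside -128..127 the result is implementation-dependent.
        System.out.println("cached(1000): " + cached(1000));
        // equals compares values, so it does not depend on the cache at all.
        System.out.println("equals(1000): " + Integer.valueOf(1000).equals(Integer.valueOf(1000)));
    }
}
```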

Java 5.0. Autoboxing/Unboxing: `==`, `>=` and `<=` for object wrappers

I suspect that the factory methods and the instance cache were added to the wrappers for integer primitives in order to optimize autoboxing/unboxing operations. Let me remind you what these are. If an operation requires an object but a primitive is supplied, the primitive is automatically wrapped in an object wrapper. This is autoboxing. And vice versa: if an operation requires a primitive, you can supply an object wrapper, and the value is automatically extracted from it. This is unboxing. Naturally, you have to pay for such convenience: automatic conversions slow the application down somewhat. However, this is not relevant to the current topic, so let's leave that question aside. Everything is fine as long as we are dealing with operations that clearly apply either to primitives or to wrappers. But what happens with the == operation? Let's say we have two Integer objects with the same value inside. How will they compare?

Integer i1 = new Integer(1);
Integer i2 = new Integer(1);
System.out.println("i1==i2: "+(i1==i2));

Result:

i1==i2: false

Who would doubt it... They are compared as objects. And what about this:

Integer i1 = 1;
Integer i2 = 1;
System.out.println("i1==i2: "+(i1==i2));

Result:

i1==i2: true

Now this is more interesting! With autoboxing, the same objects are returned! This is where the trap lies. Once we discover that the same objects are returned, we will begin experimenting to see whether this is always the case. And how many values will we check? One? Ten? A hundred? Most likely we will limit ourselves to a hundred in each direction around zero, and we will get equality everywhere. It would seem that everything is fine. However, look back a little, at the caching ranges of the factory methods. Have you guessed what the catch is? Yes, during autoboxing the wrapper instances are created using the factory methods. This is well illustrated by the following test:

public class AutoboxingTest {

    private static final int numbers[] = new int[]{-129,-128,127,128};

    public static void main(String[] args) {
        for (int number : numbers) {
            Integer i1 = number;
            Integer i2 = number;
            System.out.println("number=" + number + ": " + (i1 == i2));
        }
    }
}

The result will be like this:

number=-129: false
number=-128: true
number=127: true
number=128: false

For values within the caching range, identical objects are returned; for values outside it, different objects are returned. Therefore, if somewhere in the application wrappers are compared instead of primitives, there is a chance of getting the most terrible kind of error: an intermittent one. The code will most likely be tested on a limited range of values in which the error does not appear. But in real operation it will appear and disappear depending on the results of some calculations. It is easier to go crazy than to find such a mistake. Therefore, I would advise you to avoid autoboxing wherever possible.

And that's not all. Let's recall some mathematics, no further than 5th grade. Suppose the inequalities A>=B and A<=B both hold. What can be said about the relationship between A and B? Only one thing: they are equal. Do you agree? I think so. Let's run the test:

Integer i1 = new Integer(1);
Integer i2 = new Integer(1);
System.out.println("i1>=i2: "+(i1>=i2));
System.out.println("i1<=i2: "+(i1<=i2));
System.out.println("i1==i2: "+(i1==i2));

Result:

i1>=i2: true
i1<=i2: true
i1==i2: false
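The mechanism behind this output is that >= and <= unbox both operands and compare the int values, while == applied to two wrappers compares references. A minimal sketch of comparisons that always look at the values follows; the class WrapperCompareDemo and the helper sameValue are hypothetical names used for illustration.

```java
class WrapperCompareDemo {

    // Compares the wrapped values; returns false if either reference is null.
    static boolean sameValue(Integer a, Integer b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        Integer i1 = Integer.valueOf(1000);
        Integer i2 = Integer.valueOf(1000);

        System.out.println("i1.equals(i2):                  " + i1.equals(i2));
        System.out.println("i1.intValue() == i2.intValue(): " + (i1.intValue() == i2.intValue()));
        System.out.println("i1.compareTo(i2) == 0:          " + (i1.compareTo(i2) == 0));
        System.out.println("sameValue(i1, i2):              " + sameValue(i1, i2));
    }
}
```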

That >= and <= both hold while == fails is the strangest thing of all for me. I don't understand at all why this feature was introduced into the language if it brings such contradictions. In general, I will repeat once again: if you can do without autoboxing/unboxing, make full use of that possibility. The last topic I would like to touch on is...

Java 5.0. Comparison of enum elements (type enum)

As you know, version 5.0 of Java introduced the enum type, an enumeration. By default its instances hold their name and their ordinal number in the order of declaration in the class. Accordingly, when the declaration order changes, the numbers change. However, as I said in the article 'Serialization as it is', this does not cause problems. Every enumeration element exists in a single copy; this is controlled at the virtual machine level. Therefore, they can be compared directly, by reference.

* * *

Perhaps that's all for today about the practical side of implementing object comparison. Perhaps I'm missing something. As always, I'm looking forward to your comments! For now, let me take my leave. Thank you all for your attention!

Link to source: Comparing objects: practice