Object Comparison: Practice

Published in the Random EN group
This is the second article in a series on comparing objects. The first covered the theoretical basis of comparison: how it is done, why, and where it is used. This article deals directly with comparing numbers and objects, with special cases, subtleties, and non-obvious points. More specifically, we'll cover the following:
  • String comparison: '==' and equals
  • The String.intern method
  • Comparison of real primitives
  • +0.0 and -0.0
  • The NaN value
  • Java 5.0. Factory methods and comparison via '=='
  • Java 5.0. Autoboxing/unboxing: '==', '>=' and '<=' for wrappers
  • Java 5.0. Comparison of enumeration elements (the enum type)
So let's get started!

String comparison: '==' and equals

Ah, those strings... One of the most commonly used types, and one that causes a lot of problems. There is a separate article about them; here I will only touch on comparison. Of course, strings can be compared using equals. Moreover, they SHOULD be compared using equals. However, there are subtleties worth knowing. First of all, identical string literals are actually one and the same object. This is easy to verify by running the following code:
String str1 = "string";
String str2 = "string";
System.out.println(str1==str2 ? "the same" : "not the same");
The result will be "the same", which means the string references are equal. This is done at the compiler level, evidently to save memory. The compiler creates ONE instance of the string and assigns a reference to that instance to both str1 and str2. However, this only applies to strings declared as literals in the code. If you assemble a string from pieces at run time, the reference to it will be different. This example confirms it:
String str1 = "string";
String str2 = "str";
String str3 = "ing";
System.out.println(str1==(str2+str3) ? "the same" : "not the same");
The result will be "not the same". You can also create a new object using the copy constructor:
String str1 = "string";
String str2 = new String("string");
System.out.println(str1==str2 ? "the same" : "not the same");
The result will also be "not the same". So strings can sometimes be compared by reference, but it is better not to rely on that. Next, I would like to touch on one very curious method that returns the so-called canonical representation of a string: String.intern. Let's look at it in more detail.
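The three outcomes above can be collected into one runnable sketch. The class name StringEqualityDemo and the helper sameContent are illustrative and not part of the original examples. The last check relies on the language rule that a concatenation of two literals is a compile-time constant and is therefore treated like a literal.

```java
import java.util.Objects;

class StringEqualityDemo {

    // Content comparison: the only check that is safe for arbitrary strings.
    // Objects.equals is a null-safe wrapper around String.equals.
    static boolean sameContent(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String literal = "string";
        String part1 = "str";
        String part2 = "ing";
        String built = part1 + part2;          // concatenated at run time: a new object
        String copied = new String("string");  // explicit copy: a new object

        System.out.println(literal == built);              // false
        System.out.println(literal == copied);             // false
        System.out.println(sameContent(literal, built));   // true
        System.out.println(sameContent(literal, copied));  // true

        // Both operands are literals, so the compiler folds the concatenation.
        System.out.println(literal == ("str" + "ing"));    // true
    }
}
```

Whether a reference comparison happens to succeed depends on how the string was produced, which is exactly why equals remains the safe default.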

String.intern Method

Let's start with the fact that the String class maintains a pool of strings. All string literals defined in classes, and not only they, are added to this pool. The intern method returns the string from this pool that is equal, in terms of equals, to the string on which intern is called. If there is no such string in the pool, the string itself is placed there and a reference to it is returned. Thus, even if the references to two equal strings are different (as in the two examples above), calling intern on these strings returns a reference to the same object:
String str1 = "string";
String str2 = new String("string");
System.out.println(str1.intern()==str2.intern() ? "the same" : "not the same");
The result of executing this piece of code will be "the same". I can't say exactly why it is done this way. The intern method is native, and, to be honest, I don't want to wade into the wilds of the C code. Most likely this was done to optimize memory consumption and performance. In any case, it is worth knowing about this implementation feature. Let's move on to the next part.
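As a further illustration, here is a minimal sketch of intern applied to a string assembled at run time. The class InternDemo and the helper sameCanonical are hypothetical names introduced only for this example.

```java
class InternDemo {

    // True when both strings map to the same pooled (canonical) instance.
    static boolean sameCanonical(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "string";
        String prefix = "str";
        String built = prefix + "ing";  // created at run time, a separate object

        System.out.println(literal == built);           // false: different objects
        System.out.println(literal == built.intern());  // true: intern returns the pooled literal
        System.out.println(sameCanonical(literal, new String("string"))); // true
    }
}
```

Note that sameCanonical gives the same answer as equals would; interning is mainly useful when the canonical reference is stored and reused, not as a replacement for equals.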

Comparison of real primitives

First, I want to ask a question. A very simple one. What is the following sum: 0.3f + 0.4f? What? 0.7f? Let's check:
float f1 = 0.7f;
float f2 = 0.3f + 0.4f;
System.out.println("f1==f2: "+(f1==f2));
So, how do you like the result? Surprised? Me too. For those who have not run this fragment, the result will be...
f1==f2: false
Why does this happen? Let's run another test:
float f1 = 0.3f;
float f2 = 0.4f;
float f3 = f1 + f2;
float f4 = 0.7f;
System.out.println("f1="+(double)f1);
System.out.println("f2="+(double)f2);
System.out.println("f3="+(double)f3);
System.out.println("f4="+(double)f4);
Note the cast to double. This is done in order to display more decimal places. Result:
f1=0.30000001192092896
f2=0.4000000059604645
f3=0.7000000476837158
f4=0.699999988079071
In fact, the result is predictable. The fractional part is represented by a finite sum of powers 2^-n, so an arbitrary number generally cannot be represented exactly. As can be seen from the example, float is accurate to about 7 decimal digits. Strictly speaking, 24 bits are allocated to the float mantissa. Thus, the smallest step that can be represented in float (ignoring the exponent, since we are talking about precision) is 2^-24 ≈ 6*10^-8. This is the step with which float values actually advance. And where there is quantization, there is error. Hence the conclusion: float values can only be compared to within a certain precision. I would recommend rounding them to the 6th decimal place (10^-6) or, preferably, checking the absolute value of the difference between them:
float f1 = 0.3f;
float f2 = 0.4f;
float f3 = f1 + f2;
float f4 = 0.7f;
System.out.println("|f3-f4|<1e-6: "+( Math.abs(f3-f4) < 1e-6 ));
In this case, the result is encouraging:
|f3-f4|<1e-6: true
Of course, the picture is exactly the same for the double type. The only difference is that 53 bits are allocated to the mantissa, so the representation precision is 2^-53 ≈ 10^-16. Yes, the quantization step is much smaller, but it is still there, and it can play a cruel joke. By the way, in the JUnit test library, the methods for comparing real numbers require the precision to be specified explicitly. That is, the comparison method takes three parameters: the expected value, the actual value, and the precision of the comparison.
While we are at it, I want to mention a subtlety in writing numbers in scientific notation with an exponent. Question: how do you write 10^-6? Practice shows that more than 80% answer 10e-6. Meanwhile, the correct answer is 1e-6! And 10e-6 is 10^-5! We stepped on this rake in one of our projects, quite unexpectedly. We searched for the error for a very long time and looked over the constants 20 times. Nobody had a shadow of doubt about their correctness until one day, largely by accident, the constant 10e-3 was printed and turned out to have two decimal places instead of the expected three. So be vigilant! Let's move on.
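The tolerance check and the notation pitfall can be put together in one short sketch. The class FloatCompareDemo and the helper approxEquals are illustrative names, and the tolerances are example values chosen for this demo rather than universal recommendations.

```java
class FloatCompareDemo {

    // True when a and b differ by no more than the given absolute tolerance.
    static boolean approxEquals(float a, float b, float tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    // The same check for double values.
    static boolean approxEquals(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(0.3f + 0.4f == 0.7f);                     // false
        System.out.println(approxEquals(0.3f + 0.4f, 0.7f, 1e-6f));  // true

        System.out.println(0.1 + 0.2 == 0.3);                        // false
        System.out.println(approxEquals(0.1 + 0.2, 0.3, 1e-12));     // true

        // Scientific notation: 10e-6 means 10 * 10^-6, which is 1e-5, not 1e-6.
        System.out.println(10e-6 == 1e-5);  // true
        System.out.println(10e-6 == 1e-6);  // false
    }
}
```

An absolute tolerance works well for values of moderate magnitude, as here; for very large or very small values a tolerance relative to the operands may be a better fit.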

+0.0 and -0.0

In the representation of real numbers, the most significant bit is the sign bit. What happens if all the other bits are 0? For integers, this pattern is the negative number at the lower limit of the representable range. A real number with only the most significant bit set to 1 also denotes 0, only with a minus sign. Thus, we have two zeros: +0.0 and -0.0. A logical question arises: should these numbers be considered equal? The virtual machine thinks so. However, they are two different numbers, because operations on them produce different values:
float f1 = 0.0f/1.0f;
float f2 = 0.0f/-1.0f;
System.out.println("f1="+f1);
System.out.println("f2="+f2);
System.out.println("f1==f2: "+(f1==f2));
float f3 = 1.0f / f1;
float f4 = 1.0f / f2;
System.out.println("f3="+f3);
System.out.println("f4="+f4);
...and the result:
f1=0.0
f2=-0.0
f1==f2: true
f3=Infinity
f4=-Infinity
Thus, in some cases it makes sense to regard +0.0 and -0.0 as two different numbers. And if we have two objects, one with a field equal to +0.0 and the other with -0.0, these objects can likewise be regarded as unequal. The question arises: how can we tell that the numbers are unequal if comparing them directly in the virtual machine gives true? The answer is this. Although the virtual machine considers these numbers equal, their representations are still different. Therefore, the only thing that can be done is to compare the representations. To obtain them, there are the methods int Float.floatToIntBits(float) and long Double.doubleToLongBits(double), which return the bit representation as an int and a long respectively (continuing the previous example):
int i1 = Float.floatToIntBits(f1);
int i2 = Float.floatToIntBits(f2);
System.out.println("i1 (+0.0):"+ Integer.toBinaryString(i1));
System.out.println("i2 (-0.0):"+ Integer.toBinaryString(i2));
System.out.println("i1==i2: "+(i1 == i2));
The result will be:
i1 (+0.0):0
i2 (-0.0):10000000000000000000000000000000
i1==i2: false
Thus, if +0.0 and -0.0 are different numbers for you, then you should compare real variables through their bit representations. That seems to settle +0.0 and -0.0. However, -0.0 is not the only surprise. There is another phenomenon...
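A bitwise comparison of this kind can be wrapped in a small helper. The sketch below uses the hypothetical names SignedZeroDemo and sameBits; it also shows that the standard Float.compare and Float.equals already distinguish the two zeros.

```java
class SignedZeroDemo {

    // Bitwise equality: distinguishes +0.0f from -0.0f.
    static boolean sameBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float plus = 0.0f;
        float minus = -0.0f;

        System.out.println(plus == minus);                   // true: primitive comparison
        System.out.println(sameBits(plus, minus));           // false: the sign bit differs
        System.out.println(Float.compare(plus, minus) > 0);  // true: compare orders -0.0f below 0.0f
        System.out.println(Float.valueOf(plus).equals(minus)); // false: equals compares bit patterns
    }
}
```

In an equals method of a class with a float field, calling Float.compare(a, b) == 0 is one way to get this bitwise behavior without handling the bits directly.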

NaN value

NaN stands for Not-a-Number. This value appears as a result of invalid mathematical operations, such as dividing 0.0 by 0.0 or infinity by infinity. The peculiarity of this value is that it is not equal to itself. That is:
float x = 0.0f/0.0f;
System.out.println("x="+x);
System.out.println("x==x: "+(x==x));
... will result in...
x=NaN
x==x: false
What does this mean when comparing objects? If a field of the object is equal to NaN, the comparison will give false, i.e. the objects are guaranteed to be considered unequal, although logically we may want just the opposite. You can achieve the desired result using the method Float.isNaN(float). It returns true if the argument is NaN. In this case, I would not rely on a comparison of bit representations, because it is not standardized. Enough about primitives. Now let's move on to the subtleties that have appeared in Java since version 5.0.
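A minimal sketch of an equality check that treats NaN as equal to NaN, built on Float.isNaN. The class NaNDemo and the helper sameValue are illustrative names; note that this helper still treats +0.0 and -0.0 as equal, since it falls back to '==' for ordinary values.

```java
class NaNDemo {

    // Equality that treats NaN as equal to NaN; all other values use ==.
    static boolean sameValue(float a, float b) {
        return a == b || (Float.isNaN(a) && Float.isNaN(b));
    }

    public static void main(String[] args) {
        float nan = 0.0f / 0.0f;

        System.out.println(nan == nan);                 // false
        System.out.println(Float.isNaN(nan));           // true
        System.out.println(sameValue(nan, Float.NaN));  // true
        System.out.println(sameValue(nan, 1.0f));       // false

        // The standard Float.compare also considers NaN equal to itself.
        System.out.println(Float.compare(nan, Float.NaN) == 0); // true
    }
}
```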

Java 5.0. Factory methods and comparison via '=='

There is a design pattern called the factory method. Sometimes using it is much more beneficial than using a constructor. Let me give an example. I think everyone knows the Boolean wrapper class well. This class is immutable and can hold only two values. That is, in fact, just two instances are enough for any purpose. And if you create them in advance and then simply return them, it will be much faster than using a constructor. Boolean has such a method: valueOf(boolean). It appeared in version 1.4. Similar factory methods were introduced in version 5.0 in the Byte, Character, Short, Integer and Long classes. When these classes are loaded, arrays of their instances are created that correspond to certain ranges of primitive values. These ranges are as follows:
  • Byte, Short, Integer, Long: from -128 to 127
  • Character: from 0 to 127
This means that when valueOf(...) is called with an argument that falls within the specified range, the same object is always returned. Perhaps this gives some speed gain. But it also gives rise to problems that can be quite difficult to get to the bottom of. More on that below. In theory, a valueOf factory method was also added to the Float and Double classes. Their documentation says that if a new instance is not required, this method is preferable because it may be faster, and so on. However, in the current (Java 5.0) implementation, this method creates a new instance, i.e. using it is not guaranteed to give any speed gain. Moreover, I find it hard to imagine how this method could be sped up: because the values are continuous, a cache cannot be organized there. Except for integers, that is, values without a fractional part.
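The caching behavior of valueOf can be observed directly. In the sketch below, the class ValueOfCacheDemo and the helper isCached are hypothetical names; the checks inside the range -128..127 rely on the documented behavior of Integer.valueOf, while the result for 1000 is deliberately left without an expected value because it may depend on the JVM and its settings.

```java
class ValueOfCacheDemo {

    // True when two valueOf calls for the same int return the same object.
    static boolean isCached(int value) {
        return Integer.valueOf(value) == Integer.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE);  // true
        System.out.println(isCached(-128));                         // true
        System.out.println(isCached(127));                          // true
        System.out.println(Character.valueOf('a') == Character.valueOf('a')); // true

        // Outside the cached range the outcome is not guaranteed either way.
        System.out.println(isCached(1000));

        // Value comparison with equals works regardless of caching.
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```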

Java 5.0. Autoboxing/unboxing: '==', '>=' and '<=' for wrappers

I suspect that the factory methods and the instance cache were added to the wrappers for integer primitives in order to optimize autoboxing/unboxing. Let me remind you what that is. If an operation requires an object but a primitive is supplied, the primitive is automatically wrapped in its wrapper object. This is autoboxing. And vice versa: if an operation requires a primitive, you can supply a wrapper object, and the value will be automatically extracted from it. This is unboxing. Naturally, you have to pay for this convenience. The automatic conversions slow the application down slightly. However, that is beside the current topic, so we will leave this question aside. Everything is fine as long as we are dealing with operations that clearly apply either to primitives or to wrappers. But what happens with the '==' operation? Let's say we have two Integer objects with the same value inside. How will they compare?
Integer i1 = new Integer(1);
Integer i2 = new Integer(1);
System.out.println("i1==i2: "+(i1==i2));
Result:
i1==i2: false

No surprise there... They are compared as objects. And what about this:
Integer i1 = 1;
Integer i2 = 1;
System.out.println("i1==i2: "+(i1==i2));
Result:
i1==i2: true
Now that's more interesting! Autoboxing returns the same objects! This is where the trap lies. Once we find that the same objects are returned, we will start experimenting to see whether this is always the case. And how many values will we check? One? Ten? A hundred? Most likely we will limit ourselves to a hundred in each direction around zero. And everywhere we will get equality. It would seem that everything is fine. However, look back a little, at the caching ranges described above. Have you guessed what the catch is? Yes, during autoboxing, instances of the wrappers are created using the factory methods. This is well illustrated by the following test:
public class AutoboxingTest {

    private static final int numbers[] = new int[]{-129,-128,127,128};

    public static void main(String[] args) {
        for (int number : numbers) {
            Integer i1 = number;
            Integer i2 = number;
            System.out.println("number=" + number + ": " + (i1 == i2));
        }
    }
}
The result will be:
number=-129: false
number=-128: true
number=127: true
number=128: false
For values that fall within the caching range, the same objects are returned; for values outside it, different ones. Therefore, if wrappers are compared instead of primitives somewhere in the application, there is a chance of getting the worst kind of bug: an intermittent one. The code will most likely be tested on a limited range of values in which the bug does not appear. In real operation it will appear and disappear, depending on the results of some calculations. It is easier to go crazy than to find such a bug. Therefore, I would advise avoiding autoboxing wherever possible. And that's not all. Let's recall some mathematics, no further than the 5th grade. Suppose the inequalities A>=B and A<=B both hold. What can be said about the relationship between A and B? Only one thing: they are equal. Do you agree? I think so. Let's run the test:
Integer i1 = new Integer(1);
Integer i2 = new Integer(1);
System.out.println("i1>=i2: "+(i1>=i2));
System.out.println("i1<=i2: "+(i1<=i2));
System.out.println("i1==i2: "+(i1==i2));
Result:
i1>=i2: true
i1<=i2: true
i1==i2: false
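The output above is explained by the fact that '>=' and '<=' unbox their operands and compare primitives, while '==' on two wrappers compares references. A minimal sketch of comparisons that always work on values follows; the class WrapperCompareDemo and the helper sameValue are illustrative names.

```java
import java.util.Objects;

class WrapperCompareDemo {

    // Value equality for wrappers; also handles null on either side.
    static boolean sameValue(Integer a, Integer b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Integer i1 = Integer.valueOf(1000);
        Integer i2 = Integer.valueOf(1000);

        System.out.println(sameValue(i1, i2));               // true
        System.out.println(i1.intValue() == i2.intValue());  // true: primitives are compared
        System.out.println(Integer.compare(i1, i2) == 0);    // true: arguments are unboxed
        System.out.println(i1 >= i2 && i1 <= i2);            // true: both operators unbox
    }
}
```

Any of these forms gives the same answer inside and outside the caching range, which removes the intermittent behavior described earlier.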
The contradiction between '>=', '<=' and '==' for wrappers is the biggest surprise for me. I don't understand at all why this feature had to be introduced into the language if it brings such contradictions. In general, I repeat once again: if it is possible to do without autoboxing/unboxing, it is worth doing so wherever you can. The last topic I would like to touch on is...

Java 5.0. Comparison of enumeration elements (the enum type)

As you know, version 5.0 of Java introduced the enum type, the enumeration. By default, its instances hold the name and the ordinal number of the instance's declaration in the class. Accordingly, when the order of declarations changes, the numbers change. However, as I said in the article 'Serialization as it is', this is not a problem. Every element of an enumeration exists as a single instance, which is controlled at the virtual machine level. Therefore, they can be compared directly, by reference.
* * *
Perhaps that's all for today about the practical side of implementing object comparison. I may have missed something. As always, comments are welcome! For now, let me take my leave. Thank you all for your attention!
Link to the original source: Comparison of objects: practice