Serialization as it is. Part 1

At first glance, serialization seems like a trivial process. Really, what could be simpler? Declare that the class implements the java.io.Serializable interface, and that's it. You can serialize the class without problems.

In theory, this is true. In practice, there are many subtleties. They concern performance, deserialization, class safety, and many other aspects. These subtleties are what this article discusses. It is divided into the following parts:

Subtleties of mechanisms
Why do you need Externalizable?
Performance
The other side of the coin
Data security
Serializing Singleton objects

Let's start with the first part.

Subtleties of mechanisms

First of all, a quick question. How many ways are there to make an object serializable? Practice shows that more than 90% of developers give roughly the same answer (give or take the wording): there is only one way. In fact, there are two. Not everyone remembers the second one, let alone can say anything intelligible about its features.

So what are these ways? Everyone remembers the first: implementing java.io.Serializable, as already mentioned, which requires no effort. The second is also the implementation of an interface, but a different one: java.io.Externalizable. Unlike java.io.Serializable, it declares two methods that must be implemented: writeExternal(ObjectOutput) and readExternal(ObjectInput). These methods contain the serialization and deserialization logic.

A note. In what follows, I will sometimes call serialization based on Serializable standard, and serialization based on Externalizable extended.

Another note. I deliberately leave aside for now the options for customizing standard serialization by defining readObject and writeObject, because I consider these methods somewhat improper. They are not declared in the Serializable interface and are, in effect, props that work around its limitations to make standard serialization flexible. Externalizable has the methods that provide flexibility built in from the very beginning.

Let's ask one more question. How does standard serialization with java.io.Serializable actually work? It works through the Reflection API. That is, the class is examined as a set of fields, each of which is written to the output stream. It should be clear that this is not optimal in terms of performance. We will find out by how much later.

There is another major difference between the two serialization methods: the deserialization mechanism.
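Before going further, here is a minimal sketch of what the extended approach looks like in code. The Point class and the ExternalizableSketch helper are hypothetical names invented for this illustration; only Externalizable, ObjectOutput and ObjectInput come from java.io. The public parameterless constructor is something Externalizable requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class, not part of the original article.
class Point implements Externalizable {
    private int x;
    private int y;

    // Externalizable requires a public parameterless constructor.
    public Point() {
    }

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // The class itself decides what goes into the stream and in which order.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        // Values must be read back in exactly the order they were written.
        x = in.readInt();
        y = in.readInt();
    }
}

class ExternalizableSketch {
    // Serializes the point into a byte array and reads it back.
    static Point roundTrip(Point p) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(p);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(baos.toByteArray()))) {
                return (Point) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = roundTrip(new Point(3, 4));
        System.out.println(copy.getX() + ", " + copy.getY()); // prints "3, 4"
    }
}
```

Nothing is discovered through reflection here: the stream contains exactly the two int values that writeExternal puts there.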
With Serializable, deserialization happens like this: memory is allocated for the object, and then its fields are filled with values from the stream. The object's constructor is not called.

One situation needs to be considered separately. Fine, our class is serializable. What about its parent? It does not have to be! Moreover, if a class inherits directly from Object, its parent is definitely NOT serializable. We know of no fields in Object, but our own parent classes may well have some. What happens to them? They do not get into the serialization stream. What values do they take on deserialization? Let's look at this example:

package ru.skipy.tests.io;

import java.io.*;

/**
 * ParentDeserializationTest
 *
 * @author Eugene Matyushkin aka Skipy
 * @since 05.08.2010
 */
public class ParentDeserializationTest {

    public static void main(String[] args){
        try {
            System.out.println("Creating...");
            Child c = new Child(1);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(baos);
            c.field = 10;
            System.out.println("Serializing...");
            oos.writeObject(c);
            oos.flush();
            baos.flush();
            oos.close();
            baos.close();
            ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
            ObjectInputStream ois = new ObjectInputStream(bais);
            System.out.println("Deserializing...");
            Child c1 = (Child)ois.readObject();
            System.out.println("c1.i="+c1.getI());
            System.out.println("c1.field="+c1.getField());
        } catch (IOException ex){
            ex.printStackTrace();
        } catch (ClassNotFoundException ex){
            ex.printStackTrace();
        }
    }

    public static class Parent {
        protected int field;
        protected Parent(){
            field = 5;
            System.out.println("Parent::Constructor");
        }
        public int getField() {
            return field;
        }
    }

    public static class Child extends Parent implements Serializable{
        protected int i;
        public Child(int i){
            this.i = i;
            System.out.println("Child::Constructor");
        }
        public int getI() {
            return i;
        }
    }
}

The code is straightforward: we have a non-serializable parent class and a serializable child class. This is what happens:

Creating...
Parent::Constructor
Child::Constructor
Serializing...
Deserializing...
Parent::Constructor
c1.i=1
c1.field=5

That is, during deserialization the parameterless constructor of the parent NON-serializable class is called. If there is no such constructor, an error occurs during deserialization. The constructor of the child object, the one we are deserializing, is not called, as stated above.

This is how the standard mechanism behaves with Serializable. With Externalizable the situation is different. First the parameterless constructor is called, and then the readExternal method is called on the created object, which reads all of its data. Therefore, any class implementing the Externalizable interface must have a public parameterless constructor! Moreover, since all descendants of such a class are also considered to implement Externalizable, they must have a parameterless constructor too!

Let's go further. There is a field modifier called transient. It means that the field should not be serialized. However, as you can understand, this instruction only affects the standard serialization mechanism. With Externalizable, nothing stops you from writing this field to the stream and reading it back. With standard serialization, a field declared transient takes its default value when the object is deserialized.

Another rather subtle point. With standard serialization, fields with the static modifier are not serialized. Accordingly, deserialization does not change the value of such a field. Of course, with Externalizable nothing stops you from serializing and deserializing it, but I strongly recommend not doing so, because it can lead to subtle errors.

Fields with the final modifier are serialized like regular fields, with one exception: they cannot be deserialized when using Externalizable. final fields must be initialized in the constructor, and after that their value can no longer be changed in readExternal.
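The behavior of the three modifiers under standard serialization can be checked with a small sketch like the one below. The Settings class and the FieldModifiersSketch helper are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class, not part of the original article.
class Settings implements Serializable {
    static int instances = 0;   // static: never written to the stream
    transient String password;  // transient: skipped by the standard mechanism
    final String name;          // final: written and restored like a regular field

    Settings(String name, String password) {
        this.name = name;
        this.password = password;
        instances++;
    }
}

class FieldModifiersSketch {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(o);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Settings deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Settings) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Settings("db", "secret"));
        Settings.instances = 42; // changed after serialization, before deserialization
        Settings copy = deserialize(bytes);
        System.out.println("name=" + copy.name);               // restored from the stream
        System.out.println("password=" + copy.password);       // default value: null
        System.out.println("instances=" + Settings.instances); // still 42, untouched by the stream
    }
}
```

Note that the Settings constructor does not run during deserialization, so the static counter is neither overwritten from the stream nor incremented.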
Accordingly, if you need to serialize an object that has a final field, you have to use standard serialization.

Another point that many people do not know. Standard serialization takes into account the order in which fields are declared in a class. At least, this was the case in earlier versions; in version 1.6 of the Oracle JVM the order no longer matters, only the type and name of a field do. Changes to the set of methods, however, are very likely to affect the standard mechanism, even if the fields stay the same.

To guard against this, there is the following mechanism. Every class that implements the Serializable interface has one more value associated with it, private static final long serialVersionUID. If the class does not declare this field itself, the serialization runtime computes the value. It is a version identifier of the serialized class, computed from the structure of the class: its name, modifiers, interfaces, fields and methods. Accordingly, almost any change to that structure changes the value. The value is written to the stream when the class is serialized. By the way, this is perhaps the only case known to me in which a static field is serialized. During deserialization, the value from the stream is compared with that of the class in the virtual machine. If the values do not match, an exception like this is thrown:

java.io.InvalidClassException: test.ser2.ChildExt;
    local class incompatible: stream classdesc serialVersionUID = 8218484765288926197,
                                   local class serialVersionUID = 1465687698753363969
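The identifier that this check compares can be read at run time through java.io.ObjectStreamClass. A minimal sketch follows; the class names WithDefaultUid, WithExplicitUid and SerialVersionUidSketch are hypothetical. The first class leaves the computation to the runtime, the second declares the field by hand, an option the text turns to next.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical classes used only for this illustration.
class WithDefaultUid implements Serializable {
    int value;
}

class WithExplicitUid implements Serializable {
    private static final long serialVersionUID = 42L;
    int value;
}

class SerialVersionUidSketch {
    // Returns the identifier the serialization runtime associates with the class.
    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        // Computed by the runtime from the class structure.
        System.out.println("default:  " + uidOf(WithDefaultUid.class));
        // Taken verbatim from the declared field.
        System.out.println("explicit: " + uidOf(WithExplicitUid.class));
    }
}
```

Adding a method to WithDefaultUid and running the sketch again is a quick way to see how sensitive the computed value is.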

There is, however, a way to, if not bypass, then at least fool this check. This can be useful when the set of class fields and their order are already settled, but the class methods may change. In that case serialization is not at risk, yet the standard mechanism will refuse to deserialize data using the bytecode of the modified class. But, as I said, it can be fooled: declare the field private static final long serialVersionUID in the class by hand. In principle, the value of this field can be anything at all. Some people prefer to set it to the date the code was modified. Some simply use 1L. To obtain the standard value (the one computed internally), you can use the serialver utility included in the JDK. Once declared this way, the value is fixed, so this check will always pass.

Moreover, as of version 5.0 the documentation says roughly the following: it is strongly recommended that all serializable classes declare this field explicitly, because the default computation is highly sensitive to details of the class structure that may vary depending on the compiler implementation, and can thus cause unexpected InvalidClassExceptions during deserialization. It is best to declare the field private, because it applies only to the class in which it is declared, although the specification does not mandate the modifier.

Let us now consider the following aspect. Say we have this class structure:

public class A{
    public int iPublic;
    protected int iProtected;
    int iPackage;
    private int iPrivate;
}

public class B extends A implements Serializable{}

In other words, we have a class inherited from a non-serializable parent. Can this class be serialized, and what does that take? What happens to the variables of the parent class? The answer is this. Yes, an instance of class B can be serialized. What does it take? Class A must have a parameterless constructor, public or protected (a package-private one also works if B is in the same package). Then, during deserialization, all variables of class A are initialized by this constructor, and the variables of class B are initialized with values from the serialized data stream.

In theory, you can define in class B the methods I mentioned at the beginning, readObject and writeObject, which first (de)serialize the variables of class B through in.defaultReadObject/out.defaultWriteObject and then (de)serialize the accessible variables of class A (in our case iPublic, iProtected and iPackage, the last one only if B is in the same package as A). In my opinion, however, it is better to use extended serialization for this.

The next point I would like to touch on is the serialization of multiple objects. Say we have the following class structure:

public class A implements Serializable{
    private C c;
    private B b;
    public void setC(C c) {this.c = c;}
    public void setB(B b) {this.b = b;}
    public C getC() {return c;}
    public B getB() {return b;}
}
public class B implements Serializable{
    private C c;
    public void setC(C c) {this.c = c;}
    public C getC() {return c;}
}
public class C implements Serializable{
    private A a;
    private B b;
    public void setA(A a) {this.a = a;}
    public void setB(B b) {this.b = b;}
    public B getB() {return b;}
    public A getA() {return a;}
}

What happens if you serialize an instance of class A? It drags along an instance of class B, which in turn drags along an instance of C, which has a reference to the instance of A, the very one it all started with. A vicious circle and infinite recursion? Fortunately, no. Let's look at the following test code:

// initializing
A a = new A();
B b = new B();
C c = new C();
// setting references
a.setB(b);
a.setC(c);
b.setC(c);
c.setA(a);
c.setB(b);
// serializing
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(a);
oos.writeObject(b);
oos.writeObject(c);
oos.flush();
oos.close();
// deserializing
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()));
A a1 = (A)ois.readObject();
B b1 = (B)ois.readObject();
C c1 = (C)ois.readObject();
// testing
System.out.println("a==a1: "+(a==a1));
System.out.println("b==b1: "+(b==b1));
System.out.println("c==c1: "+(c==c1));
System.out.println("a1.getB()==b1: "+(a1.getB()==b1));
System.out.println("a1.getC()==c1: "+(a1.getC()==c1));
System.out.println("b1.getC()==c1: "+(b1.getC()==c1));
System.out.println("c1.getA()==a1: "+(c1.getA()==a1));
System.out.println("c1.getB()==b1: "+(c1.getB()==b1));

What are we doing? We create an instance of each of the classes A, B and C, give them references to each other, and then serialize each of them. Then we deserialize them back and run a series of checks. The result:

a==a1: false
b==b1: false
c==c1: false
a1.getB()==b1: true
a1.getC()==c1: true
b1.getC()==c1: true
c1.getA()==a1: true
c1.getB()==b1: true

So what can we learn from this test? First, the object references after deserialization differ from the references before it. In other words, serialization and deserialization copied the objects. This technique is sometimes used to clone objects.

The second conclusion is more significant. When several objects with cross-references are serialized and deserialized, those references remain valid after deserialization. In other words, if they pointed to one object before serialization, they point to one object after deserialization too. Another small test to confirm this:

B b = new B();
C c = new C();
b.setC(c);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(b);
oos.writeObject(c);
oos.writeObject(c);
oos.writeObject(c);
oos.flush();
oos.close();
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()));
B b1 = (B)ois.readObject();
C c1 = (C)ois.readObject();
C c2 = (C)ois.readObject();
C c3 = (C)ois.readObject();
System.out.println("b1.getC()==c1: "+(b1.getC()==c1));
System.out.println("c1==c2: "+(c1==c2));
System.out.println("c1==c3: "+(c1==c3));

An object of class B has a reference to an object of class C. When b is serialized, the instance of class C is serialized along with it, after which the same instance c is serialized three more times. What happens after deserialization?

b1.getC()==c1: true
c1==c2: true
c1==c3: true

As you can see, all four deserialized references actually point to one object: they are equal. Exactly as it was before serialization.

Another interesting point: what happens if a class implements both Externalizable and Serializable? As in the old question of elephant versus whale, who beats whom? Externalizable wins. The serialization mechanism first checks for Externalizable, and only then for Serializable. So if class B, which implements Serializable, inherits from class A, which implements Externalizable, the fields of class B will not be serialized.

The last point is inheritance. When inheriting from a class that implements Serializable, nothing extra needs to be done: serialization extends to the child class as well. When inheriting from a class that implements Externalizable, you must override the parent's readExternal and writeExternal methods; otherwise the fields of the child class will not be serialized. In doing so, remember to call the parent's methods, otherwise the parent's fields will not be serialized.

* * *

That probably covers the details. There is, however, one question of a more global nature that we have not touched on yet.
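Returning for a moment to the inheritance rule for Externalizable stated above, a child class might look roughly like this. Shape, Circle and ExternalizableInheritanceSketch are hypothetical names chosen for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical parent class that implements Externalizable.
class Shape implements Externalizable {
    protected String name;

    public Shape() {
    }

    public Shape(String name) {
        this.name = name;
    }

    public String getName() { return name; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeUTF(name);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        name = in.readUTF();
    }
}

// Hypothetical child class: it extends both methods to include its own field.
class Circle extends Shape {
    private int radius;

    // The public parameterless constructor is required in the child too.
    public Circle() {
    }

    public Circle(String name, int radius) {
        super(name);
        this.radius = radius;
    }

    public int getRadius() { return radius; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        super.writeExternal(out); // parent fields first
        out.writeInt(radius);     // then the child's own field
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        super.readExternal(in);   // same order as in writeExternal
        radius = in.readInt();
    }
}

class ExternalizableInheritanceSketch {
    static Circle roundTrip(Circle c) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(c);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(baos.toByteArray()))) {
                return (Circle) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle copy = roundTrip(new Circle("unit", 1));
        System.out.println(copy.getName() + " r=" + copy.getRadius());
    }
}
```

Dropping either super call in Circle leaves name unset after deserialization; dropping the overrides entirely leaves radius at its default value.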

Why do you need Externalizable?

Why do we need extended serialization at all? The answer is simple. First, it gives much more flexibility. Second, it can often yield a significant reduction in the volume of serialized data. Third, there is the matter of performance, which we will talk about below.

Flexibility seems clear enough. Indeed, we can control the serialization and deserialization processes however we like, which makes us independent of changes in the class (as I said just above, changes in the class can greatly affect deserialization). So I want to say a few words about the gain in volume. Say we have the following class:

public class DateAndTime{

  private short year;
  private byte month;
  private byte day;
  private byte hours;
  private byte minutes;
  private byte seconds;

}

The rest is unimportant. The fields could be declared as int, but that would only strengthen the effect of the example (although in reality the fields may well be of type int for performance reasons). In any case, the point is clear: the class represents a date and time. We are interested in it primarily from the point of view of serialization.

Perhaps the easiest thing would be to store a simple timestamp. It is of type long, so it takes 8 bytes when serialized. In addition, this approach requires methods for converting the components to a single value and back, that is, a loss in performance. Its advantage is the utterly enormous range of dates that fits in 64 bits, a huge safety margin that is rarely needed in reality.

The class above takes 2 + 5*1 = 7 bytes, plus the overhead for the class and its 6 fields. Is there a way to compress this data? Certainly. Seconds and minutes lie in the range 0-59, so 6 bits instead of 8 are enough for each. Hours are 0-23 (5 bits), days 0-30 (5 bits), months 0-11 (4 bits). In total, not counting the year, that is 6 + 6 + 5 + 5 + 4 = 26 bits, which leaves 6 bits of an int. In theory, in some cases that may be enough for the year. If not, adding one more byte raises the size of the year field to 14 bits, which gives a range of 0-16383. That is more than enough for real applications. Altogether, we have reduced the data needed to store the information to 5 bytes, if not to 4.

The drawback is the same as in the previous case: if the date is stored packed, conversion methods are needed. What I want is this: store the date in separate fields and serialize it in packed form. This is where it makes sense to use Externalizable:

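// data is packed into 5 bytes (one int plus one byte):
//  3         2         1
// 10987654321098765432109876543210
// hhhhhmmmmmmssssssdddddMMMMyyyyyy yyyyyyyy
@Override
public void writeExternal(ObjectOutput out) throws IOException {
    int packed = 0;
    packed |= (hours & 0x1F) << 27;
    packed |= (minutes & 0x3F) << 21;
    packed |= (seconds & 0x3F) << 15;
    packed |= (day & 0x1F) << 10;
    packed |= (month & 0x0F) << 6;
    packed |= (year >> 8) & 0x3F;   // upper 6 bits of the 14-bit year
    out.writeInt(packed);
    out.writeByte(year & 0xFF);     // lower 8 bits of the year
}

@Override
public void readExternal(ObjectInput in) throws IOException {
    int packed = in.readInt();
    int yearLow = in.readByte() & 0xFF;
    // the fields are short and byte, so explicit casts are required
    year = (short) (((packed & 0x3F) << 8) | yearLow);
    month = (byte) ((packed >> 6) & 0x0F);
    day = (byte) ((packed >> 10) & 0x1F);
    seconds = (byte) ((packed >> 15) & 0x3F);
    minutes = (byte) ((packed >> 21) & 0x3F);
    // unsigned shift: hours of 16 and above set the sign bit of packed
    hours = (byte) ((packed >>> 27) & 0x1F);
}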
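For readers who want to try the scheme, here is a self-contained sketch of the same packing, with the casts, masks and exception declarations that a compilable class needs. The names PackedDateAndTime and PackedDateSketch, the constructor and toString are assumptions added for this illustration. The value 23 for hours is chosen deliberately: it sets the top bit of the packed int, which is where a signed shift would go wrong.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical complete version of the DateAndTime class from the text.
class PackedDateAndTime implements Externalizable {
    private short year;   // 0-16383 (14 bits)
    private byte month;   // 0-11
    private byte day;     // 0-30
    private byte hours;   // 0-23
    private byte minutes; // 0-59
    private byte seconds; // 0-59

    public PackedDateAndTime() {
    }

    public PackedDateAndTime(int year, int month, int day, int hours, int minutes, int seconds) {
        this.year = (short) year;
        this.month = (byte) month;
        this.day = (byte) day;
        this.hours = (byte) hours;
        this.minutes = (byte) minutes;
        this.seconds = (byte) seconds;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        int packed = ((hours & 0x1F) << 27) | ((minutes & 0x3F) << 21)
                | ((seconds & 0x3F) << 15) | ((day & 0x1F) << 10)
                | ((month & 0x0F) << 6) | ((year >> 8) & 0x3F);
        out.writeInt(packed);
        out.writeByte(year & 0xFF); // lower 8 bits of the year
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int packed = in.readInt();
        int yearLow = in.readByte() & 0xFF;
        year = (short) (((packed & 0x3F) << 8) | yearLow);
        month = (byte) ((packed >> 6) & 0x0F);
        day = (byte) ((packed >> 10) & 0x1F);
        seconds = (byte) ((packed >> 15) & 0x3F);
        minutes = (byte) ((packed >> 21) & 0x3F);
        hours = (byte) ((packed >>> 27) & 0x1F);
    }

    @Override
    public String toString() {
        return year + "/" + month + "/" + day + " " + hours + ":" + minutes + ":" + seconds;
    }
}

class PackedDateSketch {
    static byte[] serialize(Object o) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
                oos.writeObject(o);
            }
            return baos.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new PackedDateAndTime(2010, 7, 4, 23, 59, 58));
        System.out.println("restored: " + deserialize(bytes));
        System.out.println("stream size in bytes: " + bytes.length);
    }
}
```

The printed stream size includes the stream header and the class descriptor, so it is noticeably larger than the 5 bytes of payload.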

That is really all there is to it. After serialization we get the per-class overhead and 5 bytes of data, written as two values instead of six fields. That is already significantly better. Further compression can be left to specialized libraries.

The example given is very simple. Its main purpose is to show how extended serialization can be used. The possible gain in the volume of serialized data is, in my opinion, far from its main advantage, though. The main advantage, apart from flexibility... (and here we move smoothly on to the next section...)

Link to the source: Serialization as it is
