
Serialization Formats in Java

Published in the Random EN group
Hello! Let's talk about serialization in Java. You probably remember that we have already had two lectures on serialization. If you don't remember very well how serialization works, why it's needed, and what tools Java offers for it, you can skim through those lectures. Today's lecture will be theoretical: we will take a closer look at serialization formats.

First, let's recall what serialization is. Serialization is the process of converting the state of an object into a sequence of bytes. Deserialization is the process of reconstructing an object from those bytes. A serialized Java object can be transferred over a network (for example, to another computer). The same information can be represented in different formats. You are familiar with this idea from everyday computer use. For example, an e-book (or a simple text document) can be stored in a number of different formats:
  • docx (Microsoft Word format);
  • pdf (Adobe format);
  • mobi (commonly used in Amazon Kindle devices);
  • and much more (ePub, djvu, fb2...).
It would seem that the task is always the same: to present text in a human-readable form. Yet people have invented a whole range of formats. Even without going into the details, we can assume this was done for a reason: each format probably has its own advantages and disadvantages compared to the others. Maybe serialization formats were created according to the same principle? Good guess, student! :) That's exactly how it is.

Transmitting data over a distance is a delicate matter, and many factors are involved. Who transmits the data? Where to? How much of it? Will the recipient be a person or a machine (that is, should the data be human-readable)? What kind of device will read the data? Obviously, situations differ. It's one thing to transfer a 500 KB image from one smartphone to another. It's quite another to handle 500 terabytes of business data that must be compressed as efficiently as possible and transferred as quickly as possible. Let's look at the main serialization formats and the advantages and disadvantages of each!
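
Before moving on to the formats, here is a reminder of what the round trip looks like with Java's built-in binary serialization. This is only a minimal sketch: the Book class and the SerializationRoundTrip helper are hypothetical names invented for this example, and checked exceptions are wrapped in unchecked ones just to keep the calling code short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical example class; any class that implements Serializable works the same way.
class Book implements Serializable {
    private static final long serialVersionUID = 1L;

    final String title;
    final String author;
    final int year;

    Book(String title, String author, int year) {
        this.title = title;
        this.author = author;
        this.year = year;
    }
}

class SerializationRoundTrip {

    // Serialization: object state -> sequence of bytes.
    static byte[] toBytes(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: sequence of bytes -> reconstructed object.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of the serialized object was not found", e);
        }
    }

    public static void main(String[] args) {
        Book original = new Book("War and Peace", "Leo Tolstoy", 1869);
        byte[] bytes = toBytes(original);
        Book copy = (Book) fromBytes(bytes);
        System.out.println(bytes.length + " bytes -> " + copy.title + ", " + copy.author + ", " + copy.year);
    }
}
```

The bytes produced here are in Java's own binary format, which only another Java program can read comfortably. The formats discussed in this lecture are alternatives to it.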

JSON

JSON stands for JavaScript Object Notation. You are already a little familiar with it: we covered the format itself and serialization to JSON in earlier lectures. It got its name for a reason: Java objects converted to JSON look just like JavaScript object literals. You don't need to know JavaScript to understand what our object means:

{
   "title": "War and Peace",
   "author": "Leo Tolstoy",
   "year": 1869
}
You aren't limited to a single object. JSON can also contain an array of objects:

[
 {
   "title": "War and Peace",
   "author": "Leo Tolstoy",
   "year": 1869
 },

 {
   "title": "Demons",
   "author": "Fyodor Dostoevsky",
   "year": 1872
 },

 {
   "title": "The Seagull",
   "author": "Anton Chekhov",
   "year": 1896
 }
]
Since JSON is based on JavaScript object syntax, it supports the following JavaScript data types:
  • strings;
  • numbers (number);
  • objects (object);
  • arrays (array);
  • boolean values (true and false);
  • null.
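
To see how these six types line up with Java values, here is a small hand-written sketch. The MiniJson class and the mapping it uses (Map to object, List to array, and so on) are assumptions made for this illustration; a real project would rely on a JSON library rather than on code like this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniJson {

    // Converts a Java value into JSON text, one branch per JSON data type.
    static String toJson(Object value) {
        if (value == null) {
            return "null";                                    // null
        }
        if (value instanceof String) {
            return quote((String) value);                     // string
        }
        if (value instanceof Number || value instanceof Boolean) {
            // number, true/false (NaN and infinities are not valid JSON; not checked here)
            return value.toString();
        }
        if (value instanceof Map) {                           // object
            StringBuilder sb = new StringBuilder("{");
            String separator = "";
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                sb.append(separator)
                  .append(quote(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
                separator = ",";
            }
            return sb.append('}').toString();
        }
        if (value instanceof List) {                          // array
            StringBuilder sb = new StringBuilder("[");
            String separator = "";
            for (Object item : (List<?>) value) {
                sb.append(separator).append(toJson(item));
                separator = ",";
            }
            return sb.append(']').toString();
        }
        throw new IllegalArgumentException("No JSON mapping for " + value.getClass());
    }

    // Wraps text in double quotes and escapes quotes, backslashes and control characters.
    private static String quote(String text) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("title", "War and Peace");
        book.put("author", "Leo Tolstoy");
        book.put("year", 1869);
        // prints [{"title":"War and Peace","author":"Leo Tolstoy","year":1869}]
        System.out.println(toJson(Arrays.asList(book)));
    }
}
```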
What advantages does JSON have?
  1. Human-readable format. This is an obvious advantage if your end user is a person. For example, your server stores a database of flight schedules. A customer sitting at home requests data from it through a web application. Since the data has to arrive in a form that a person can understand, JSON is a great solution.

  2. Simplicity. You could say it's elementary :) Above we showed two JSON examples. Even if you have never heard of JavaScript (let alone its objects), you can easily understand what kind of objects are described there.
    The entire JSON documentation is one web page with a couple of pictures.

  3. Ubiquity. JavaScript is the dominant front-end language, and it dictates its terms: in web development, JSON is practically a must. As a result, a huge number of web services use JSON as their data exchange format. Every modern IDE supports JSON (including IntelliJ IDEA), and libraries for working with JSON exist for virtually every programming language.

For example, you already worked with the Jackson library in the lecture where we learned to serialize Java objects into JSON. Besides Jackson there is also Gson, a very convenient library from Google.

YAML

Originally, the name stood for Yet Another Markup Language, and the format was positioned as a competitor to XML. Over time the name was reinterpreted as "YAML Ain't Markup Language". What does it look like? Let's imagine that we need to create three character classes for our computer game: Warrior, Mage and Thief. They will have the following characteristics: strength, agility, endurance, and a set of weapons. This is what our YAML file with class descriptions will look like:

classes:
 class-1:
   title: Warrior
   power: 8
   agility: 4
   stamina: 7
   weapons:
     - sword
     - spear
    
 class-2:
   title: Mage
   power: 5
   agility: 7
   stamina: 5
   weapons:
     - magic staff

 class-3:
   title: Thief
   power: 6
   agility: 6
   stamina: 5
   weapons:
     - dagger
     - poison
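
In Java terms, a file like this is a set of nested maps and lists, and the indentation depth mirrors the nesting depth. The sketch below prints such a structure as YAML. MiniYaml is a hypothetical helper written for this example, not a real library; it uses two spaces per level (the exact number is a choice, as long as it is consistent) and only handles simple unquoted values and lists of scalars.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniYaml {

    // Renders nested maps and lists as block-style YAML text.
    static String toYaml(Map<String, ?> root) {
        StringBuilder out = new StringBuilder();
        writeMap(root, 0, out);
        return out.toString();
    }

    private static void writeMap(Map<?, ?> map, int depth, StringBuilder out) {
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            indent(depth, out);
            out.append(entry.getKey()).append(':');
            Object value = entry.getValue();
            if (value instanceof Map) {
                // A nested map goes on the following lines, one level deeper.
                out.append('\n');
                writeMap((Map<?, ?>) value, depth + 1, out);
            } else if (value instanceof List) {
                // Each list item gets its own line, prefixed with "- ".
                out.append('\n');
                for (Object item : (List<?>) value) {
                    indent(depth + 1, out);
                    out.append("- ").append(item).append('\n');
                }
            } else {
                // A simple value stays on the same line as its key.
                out.append(' ').append(value).append('\n');
            }
        }
    }

    private static void indent(int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> warrior = new LinkedHashMap<>();
        warrior.put("title", "Warrior");
        warrior.put("power", 8);
        warrior.put("weapons", Arrays.asList("sword", "spear"));

        Map<String, Object> classes = new LinkedHashMap<>();
        classes.put("class-1", warrior);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("classes", classes);

        System.out.print(toYaml(root));
    }
}
```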
The YAML file has a tree structure: some elements are nested within others. We control nesting with indentation, using a consistent number of spaces for each level. What advantages does the YAML format have?
  1. Human-readable. Again, even if you see a YAML file without any description, you can easily understand what objects are described there. YAML is so human-readable that the main page of yaml.org is itself an ordinary YAML file :)

  2. Compactness. The file structure is formed by spaces: in most cases there is no need for brackets or quotes.

  3. Support for data structures native to programming languages. A huge advantage of YAML over JSON and many other formats is that it supports a richer set of data structures. Among them:

    • !!map
      An unordered collection of key:value pairs with no duplicate keys;

    • !!omap
      An ordered sequence of key:value pairs with no duplicate keys;

    • !!pairs
      An ordered sequence of key:value pairs in which duplicates are allowed;

    • !!set
      An unordered collection of unique values;

    • !!seq
      A sequence of arbitrary values.

    Some of these structures will be familiar to you from Java! :) Thanks to this feature, you can serialize various data structures from programming languages into the YAML format.

  4. Ability to use anchor and alias

    The names describe the idea quite accurately: an anchor marks an element, and an alias is another name that points back to it.

    They allow you to mark an element in a YAML file and refer to it elsewhere in the file if it occurs repeatedly. An anchor is created with the & symbol, and an alias with *.

    Let's say we have a file with descriptions of Leo Tolstoy's books. To avoid writing the author's name manually each time, we'll simply create an anchor "leo" and refer to it using an alias when we need it:

    
    books:
     book-1:
       title: War and Peace
       author: &leo Leo Tolstoy
       year: 1869
    
     book-2:
       title: Anna Karenina
       author: *leo
       year: 1873
    
     book-3:
       title: Family Happiness
       author: *leo
       year: 1859

    When we read this file with a parser, the value "Leo Tolstoy" will be substituted wherever the alias appears.

  5. You can embed data in other formats in YAML. For example, JSON, whose brackets and quotes YAML accepts as an alternative inline notation:

    
    books: [
            {
              "title": "War and Peace",
              "author": "Leo Tolstoy",
              "year": 1869
            },
    
            {
              "title": "Anna Karenina",
              "author": "Leo Tolstoy",
              "year": 1873
            },
    
            {
              "title": "Family Happiness",
              "author": "Leo Tolstoy",
              "year": 1859
            }
          ]
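
The collection tags from point 3 line up nicely with collections you already know from Java. The pairing below is an assumption made for illustration (a YAML parser is free to choose other classes), and YamlTagsInJava is a hypothetical class name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class YamlTagsInJava {

    // !!pairs: ordered, duplicate keys allowed -> a List of key/value entries
    static List<Map.Entry<String, Integer>> pairs() {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("power", 8));
        pairs.add(new SimpleEntry<>("agility", 4));
        pairs.add(new SimpleEntry<>("power", 9)); // the duplicate key is kept
        return pairs;
    }

    // !!omap: ordered, keys unique -> LinkedHashMap (keeps insertion order)
    static Map<String, Integer> omap(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> omap = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            omap.put(pair.getKey(), pair.getValue()); // a repeated key overwrites the old value
        }
        return omap;
    }

    // !!map: keys unique, no order promised -> HashMap
    static Map<String, Integer> map(List<Map.Entry<String, Integer>> pairs) {
        return new HashMap<>(omap(pairs));
    }

    // !!set: unique values, no order promised -> HashSet
    static Set<String> set(List<Map.Entry<String, Integer>> pairs) {
        return new HashSet<>(omap(pairs).keySet());
    }

    // !!seq: ordered values, duplicates allowed -> ArrayList
    static List<Integer> seq(List<Map.Entry<String, Integer>> pairs) {
        List<Integer> values = new ArrayList<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            values.add(pair.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = pairs();
        System.out.println("pairs: " + pairs);
        System.out.println("omap:  " + omap(pairs));
        System.out.println("map:   " + map(pairs));
        System.out.println("set:   " + set(pairs));
        System.out.println("seq:   " + seq(pairs));
    }
}
```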

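To get a feel for what a parser does with anchors and aliases from point 4, here is a deliberately simplified sketch. AliasResolver is a hypothetical class that only understands single-line "key: &name value" and "key: *name" entries; a real YAML parser resolves aliases on the parsed tree, and aliases there can also point at whole maps and lists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AliasResolver {

    // "key: &name value" defines an anchor called name.
    private static final Pattern ANCHOR = Pattern.compile("^(\\s*[^:]+:\\s*)&(\\S+)\\s+(.*)$");
    // "key: *name" refers back to the anchor called name.
    private static final Pattern ALIAS = Pattern.compile("^(\\s*[^:]+:\\s*)\\*(\\S+)\\s*$");

    // Returns the same text with anchor markers removed and aliases replaced by the anchored value.
    static String resolve(String yaml) {
        Map<String, String> anchors = new HashMap<>();
        StringBuilder out = new StringBuilder();
        for (String line : yaml.split("\n")) {
            Matcher anchor = ANCHOR.matcher(line);
            Matcher alias = ALIAS.matcher(line);
            if (anchor.matches()) {
                // Remember the value under the anchor name and drop the "&name" marker.
                anchors.put(anchor.group(2), anchor.group(3));
                line = anchor.group(1) + anchor.group(3);
            } else if (alias.matches()) {
                String value = anchors.get(alias.group(2));
                if (value == null) {
                    throw new IllegalArgumentException("Unknown alias: " + alias.group(2));
                }
                line = alias.group(1) + value;
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String yaml = "books:\n"
                + " book-1:\n"
                + "   title: War and Peace\n"
                + "   author: &leo Leo Tolstoy\n"
                + " book-2:\n"
                + "   title: Anna Karenina\n"
                + "   author: *leo\n";
        System.out.print(resolve(yaml));
    }
}
```
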
Other serialization formats

XML

This format is built on a tree of tags.

<book>
   <title>Harry Potter and the Philosopher’s Stone</title>
   <author>J. K. Rowling</author>
   <year>1997</year>
</book>
Each element consists of an opening and a closing tag (<tag> and </tag>), and elements can be nested inside one another. XML is a widely used format, on a par with JSON and YAML when it comes to real projects. We have a separate lecture on XML.
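
As a quick illustration, the JDK's built-in DOM parser can read a document like the one above. This is a minimal sketch: XmlBookReader and childText are hypothetical names for this example, and for XML from untrusted sources the parser factory should be hardened first, which is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlBookReader {

    // Returns the text of the first <tagName> element found inside the root element.
    static String childText(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            Element root = document.getDocumentElement();
            return root.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read <" + tagName + "> from the document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book>"
                + "<title>Harry Potter and the Philosopher's Stone</title>"
                + "<author>J. K. Rowling</author>"
                + "<year>1997</year>"
                + "</book>";
        System.out.println(childText(xml, "title"));
        System.out.println(childText(xml, "author"));
        System.out.println(Integer.parseInt(childText(xml, "year")));
    }
}
```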

BSON (binary JSON)

As the name suggests, BSON is very similar to JSON, but it is not human-readable and stores data in binary form. This makes it convenient for storing and transferring images and other attachments. In addition, BSON supports some data types that JSON lacks. For example, a BSON document can hold a date (stored as a number of milliseconds) or even a piece of JavaScript code. The popular NoSQL database MongoDB stores information in BSON format.
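
To make "binary JSON" a bit more tangible, here is a hand-rolled sketch that encodes a small document with a string, a 32-bit integer and a date. The byte layout follows the public BSON specification as I understand it (little-endian sizes, a type byte plus a zero-terminated name for each element). MiniBson and its methods are hypothetical names; real applications would use the BSON classes that ship with a MongoDB driver instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class MiniBson {

    private final ByteArrayOutputStream elements = new ByteArrayOutputStream();

    // Type 0x02: int32 byte length (including the trailing zero), UTF-8 bytes, zero byte.
    MiniBson putString(String name, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeHeader(0x02, name);
        write(littleEndian(4).putInt(utf8.length + 1).array());
        write(utf8);
        elements.write(0);
        return this;
    }

    // Type 0x10: 32-bit integer.
    MiniBson putInt32(String name, int value) {
        writeHeader(0x10, name);
        write(littleEndian(4).putInt(value).array());
        return this;
    }

    // Type 0x09: date as 64-bit milliseconds since the Unix epoch (no JSON counterpart).
    MiniBson putDate(String name, long epochMillis) {
        writeHeader(0x09, name);
        write(littleEndian(8).putLong(epochMillis).array());
        return this;
    }

    // Document = int32 total size, the elements, and a closing zero byte.
    byte[] toBytes() {
        byte[] body = elements.toByteArray();
        int total = 4 + body.length + 1;
        return littleEndian(total).putInt(total).put(body).put((byte) 0).array();
    }

    // Every element starts with its type byte and a zero-terminated field name.
    private void writeHeader(int type, String name) {
        elements.write(type);
        write(name.getBytes(StandardCharsets.UTF_8));
        elements.write(0);
    }

    private void write(byte[] bytes) {
        elements.write(bytes, 0, bytes.length);
    }

    private static ByteBuffer littleEndian(int size) {
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    public static void main(String[] args) {
        byte[] bson = new MiniBson()
                .putString("title", "War and Peace")
                .putInt32("year", 1869)
                .putDate("added", System.currentTimeMillis()) // hypothetical field
                .toBytes();
        System.out.println(bson.length + " bytes");
    }
}
```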

Position-based protocol

In some situations we need to drastically reduce the amount of data transferred (for example, when there is a lot of data and we need to reduce the load). In that case we can use a position-based protocol, that is, pass parameter values without the names of the parameters themselves. The receiver tells the fields apart by their position alone.

"Leo Tolstoy" | "Anna Karenina" | 1873
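
A minimal sketch of how both sides might handle such a record is shown below. PositionalBook is a hypothetical class, and the sketch assumes that sender and receiver have agreed on the field order in advance and that values never contain the "|" character. Unlike the line above, it writes the values without quotes or padding spaces.

```java
import java.nio.charset.StandardCharsets;

class PositionalBook {

    // Agreed field order: author | title | year.
    static String encode(String author, String title, int year) {
        return author + "|" + title + "|" + year;
    }

    // Returns {author, title, year}; the meaning of each slot comes only from its position.
    static String[] decode(String record) {
        String[] fields = record.split("\\|", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        String positional = encode("Leo Tolstoy", "Anna Karenina", 1873);
        String json = "{\"author\":\"Leo Tolstoy\",\"title\":\"Anna Karenina\",\"year\":1873}";

        System.out.println(positional);
        System.out.println("positional: " + positional.getBytes(StandardCharsets.UTF_8).length + " bytes");
        System.out.println("JSON:       " + json.getBytes(StandardCharsets.UTF_8).length + " bytes");

        String[] fields = decode(positional);
        System.out.println(fields[1] + " by " + fields[0] + ", " + Integer.parseInt(fields[2]));
    }
}
```

The price of the saved space is fragility: if one side adds or reorders a field, the other side will silently read the wrong values.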
Data in this format takes up much less space than a full-fledged JSON file.

Of course, there are other serialization formats, but you don't need to know them all right now :) It's enough to be familiar with the formats that are the current industry standard in application development, and to remember their advantages and how they differ from each other. And with that, our lecture has come to an end :) Don't forget to solve a couple of problems today! See you again! :)