[Java collection] Java serialization


Preface


Serialization and deserialization happen almost constantly while a program runs. In any language, compiled or interpreted, anything that communicates or persists data has to serialize and deserialize it. Precisely because these operations are so important and so common, most languages and frameworks wrap them up well, and because they do their work silently, we often don't realize how much serialization is going on beneath our code. Today we will get to know this most familiar of strangers.

What is serialization


Baidu Encyclopedia defines serialization as "the process of converting the state information of an object into a form that can be stored or transmitted". That sounds a little abstract, so let's work through a simple analogy.


In daily life, people communicate all the time. To communicate, we first have to express what is in our head in some form; others then understand us through what we have expressed.





There are many forms of expression. The most common is speech: we say what is on our mind, and the listener understands our thoughts. Expression can also be written: the article you are reading is communicating with you, too. Directors express their understanding of the world through films, painters express their longing for beauty through paintings, and musicians express their longing for freedom through musical notes. The examples are endless.


So what does this have to do with our topic, serialization?


In fact, communication is just as indispensable between programs and between machines as it is between people. We just don't usually call it communication; we call it a request, a response, a transmission... The content is the same, only the name differs.


As mentioned above, people need a form of expression to turn their thoughts into something others can understand. Machines need one too: a form through which the contents of memory can be turned into something other machines can read.





Serialization, then, can be understood simply as the form in which a machine expresses the information held in its memory.

Why serialization


Language serves two purposes. The first is communication: in a chat, for example, we put our thoughts into words, and the other party hears them and understands what we mean.


The second is recording. As the saying goes, "a good memory is no match for a bad pen". That is why recording matters: things we only keep in our heads are easily forgotten, and writing them down preserves them for much longer.


Serialization serves the same two purposes: transmitting information and persisting it. Transmission was covered above, and persistence is easy to understand as well. First, what exactly gets serialized? Usually, content held in memory. As we all know, memory is volatile: its contents are lost when the process or the machine restarts. Some content should survive a restart (for example, the objects stored in a Tomcat session). What can we do? Save the in-memory objects to disk, so that a restart is no longer a problem. That persistence relies on serialization.

How to implement serialization


Just as people have many forms of expression, machines have many ways to serialize.

Java native form


Serialization is such a common need that Java has supported it natively since JDK 1.1, and it is very convenient to use. Let's look at the code.

  1. First, the class we want to serialize must implement the java.io.Serializable interface
/*
 *
 *  * *
 *  *  * blog.coder4j.cn
 *  *  * Copyright (C) 2016-2020 All Rights Reserved.
 *  *
 *
 */
package cn.coder4j.study.example.serialization;

import java.io.Serializable;
import java.util.StringJoiner;

/**
 * @author buhao
 * @version HaveSerialization.java, v 0.1 2020-09-17 16:58 buhao
 */
public class HaveSerialization implements Serializable {
    private static final long serialVersionUID = -4504407589319471384L;
    private String name;
    private Integer age;

    /**
     * Getter method for property <tt>name</tt>.
     *
     * @return property value of name
     */
    public String getName() {
        return name;
    }

    /**
     * Setter method for property <tt>name</tt>.
     *
     * @param name value to be assigned to property name
     */
    public void setName(String name) {
        this.name = name;
    }

    /**
     * Getter method for property <tt>age</tt>.
     *
     * @return property value of age
     */
    public Integer getAge() {
        return age;
    }

    /**
     * Setter method for property <tt>age</tt>.
     *
     * @param age value to be assigned to property age
     */
    public void setAge(Integer age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return new StringJoiner(", ", HaveSerialization.class.getSimpleName() + "[", "]")
                .add("name='" + name + "'")
                .add("age=" + age)
                .toString();
    }
}


Note that although the class implements java.io.Serializable, we did not actually implement any methods. Why? Let's take a look at the source of java.io.Serializable.

public interface Serializable {
}


Yes, it's an empty interface: there is nothing in it beyond the declaration. An interface like this is called a marker interface. Its only job is to mark certain classes so they can be told apart from others and given special treatment later. The Serializable marker simply signals that this class is meant to be serialized, that's all.
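To see the marker at work, here is a minimal sketch using only the standard library. The names MarkerDemo, Marked, Unmarked and canSerialize are made up for this illustration and are not part of the article's example project. The two classes are identical except for the marker, and only the marked one is accepted by ObjectOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class MarkerDemo {
    /** Hypothetical class that carries the marker interface. */
    static class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "kiwi";
    }

    /** Hypothetical class with the same field but without the marker. */
    static class Unmarked {
        String name = "kiwi";
    }

    /** Returns true if ObjectOutputStream accepts the object, false if it rejects it. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the object's class does not implement Serializable
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Marked()));   // true
        System.out.println(canSerialize(new Unmarked())); // false
    }
}
```

The check itself happens inside writeObject, which is why the marker needs no methods of its own.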


Also, although we only implemented an empty interface, you may have noticed the extra serialVersionUID field in our class. What is it for?


Its main purpose is to verify that the class used for serialization and the class used for deserialization are compatible. For example, the HaveSerialization class above has the business fields name and age. Suppose a new requirement makes us add an address field. Serialization itself works fine, but when the serialized data is sent to another machine, that machine may run into trouble during deserialization, because its version of HaveSerialization has no address field.


To deal with this, the JDK uses serialVersionUID as the version number of the class. During deserialization it compares the value carried in the stream with the value of the local class, and if the two differ it throws an InvalidClassException.


The intention is good, but throwing an exception outright brings the business flow to a halt. So in general, once the serialVersionUID has been generated, we do not change it. If it stays the same, a field that is missing from the incoming data is simply left at its default value (null for objects) after deserialization, and the business code just needs to handle that case.
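The version check can be observed in a single process with a small sketch. Everything named here (SerialVersionDemo, Person, withDifferentVersion, isRejected) is hypothetical. Patching the bytes by hand is only a stand-in for what really causes a mismatch, namely two machines holding different versions of the class. The byte offset relies on the layout of the standard serialization stream: a 4-byte header, two 1-byte type codes, the length-prefixed class name, then the 8-byte serialVersionUID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerialVersionDemo {
    /** Hypothetical serializable class with an explicit version number. */
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "kiwi";
    }

    /** Serializes a Person into a byte array. */
    static byte[] serialize() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Person());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns a copy whose stored serialVersionUID differs, mimicking a sender with another class version. */
    static byte[] withDifferentVersion(byte[] original) {
        byte[] bytes = original.clone();
        // header (4) + TC_OBJECT (1) + TC_CLASSDESC (1) + name length (2) + class name
        int uidOffset = 4 + 1 + 1 + 2 + Person.class.getName().length();
        bytes[uidOffset + 7] ^= 0x01; // flip the lowest bit of the 8-byte version number
        return bytes;
    }

    /** Returns true if deserialization fails the version check. */
    static boolean isRejected(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true; // stream version and local class version differ
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize();
        System.out.println(isRejected(bytes));                       // false
        System.out.println(isRejected(withDifferentVersion(bytes))); // true
    }
}
```

The unmodified bytes deserialize normally, while the patched copy fails with java.io.InvalidClassException, which is the behaviour that makes teams keep the serialVersionUID stable.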


  2. Serialize the object


We have finished the first step and defined a serializable class. Now let's serialize it.

    /**
     * Serialize object (save serialized file)
     * @throws IOException
     */
    @Test
    public void testSaveSerializationObject() throws IOException {
        // create object
        final HaveSerialization haveSerialization = new HaveSerialization();
        haveSerialization.setName("kiwi");
        haveSerialization.setAge(18);
        // Create file saved by serialized object
        final File file = new File("haveSerialization.ser");
        // Create object output stream
        try (final ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream(file))) {
            // Output object to serialization file
            objectOutputStream.writeObject(haveSerialization);
        }
    }


As you can see, the code is very simple. It breaks down into four steps:

  1. Create the object to serialize
    This must be an instance of a class that implements java.io.Serializable. If the class does not implement it, a NotSerializableException is thrown when the object is written.
  2. Create a File object to hold the serialized binary data.
    Note that the file is named *.ser. The .ser suffix is not mandatory, just a helpful convention; you can use any other suffix.
  3. Create the object output stream
    Create an ObjectOutputStream and pass it, through its constructor, a FileOutputStream that wraps the file defined above.
  4. Write the object to the serialization file through the output stream
    Note that this uses the try-with-resources syntax (available since Java 7), so there is no need to close the stream manually.


And with that, serialization is complete.

  3. Deserialize the object


Where there is serialization, there must be deserialization. Deserialization is the reverse operation: serialization turned the in-memory object into a persistent file, so deserialization loads that file back into an in-memory object. Without further ado, here is the code.

    /**
     * Deserialize object (read object from serialization file)
     * @throws IOException
     * @throws ClassNotFoundException
     */
    @Test
    public void testLoadSerializationObject() throws IOException, ClassNotFoundException {
        // Create object input stream
        try (ObjectInputStream objectInputStream = new ObjectInputStream(
                new FileInputStream(new File("haveSerialization.ser")))) {
            // Read the object from the input stream
            final Object obj = objectInputStream.readObject();
            System.out.println(obj);
        }
    }


The deserialization code is even shorter than the serialization code. It breaks down into two steps:

  1. Create the object input stream
    Create an ObjectInputStream and pass it, through its constructor, a FileInputStream that reads the serialized file
  2. Read the object from the object input stream
    Simply call readObject. Note that it returns Object, so you need to cast the result to the actual type before using it further
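The cast in the last step can be sketched as follows. This is an illustrative round trip through a byte array instead of a file, and RoundTripDemo, Person, serialize and deserialize are hypothetical names, with Person standing in for HaveSerialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class RoundTripDemo {
    /** Hypothetical stand-in for HaveSerialization. */
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Writes the object into an in-memory buffer and returns the bytes. */
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the object back and converts it from Object to Person. */
    static Person deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Object obj = in.readObject(); // static type is Object
            if (!(obj instanceof Person)) {
                throw new IllegalStateException("Unexpected content: " + obj);
            }
            return (Person) obj;          // explicit cast to the real type
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = deserialize(serialize(new Person("kiwi", 18)));
        System.out.println(copy.name + ", " + copy.age); // kiwi, 18
    }
}
```

Checking with instanceof before the cast turns a surprising ClassCastException into a clearer error when the data holds something other than the expected type.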




We have now completed serialization and deserialization with the JDK's native mechanism. Simple, isn't it? Even so, using the native mechanism directly is not recommended in day-to-day work: its serialized output tends to be larger, and it tends to be slower, than what some third-party frameworks produce. The underlying principle is much the same, though. Let's take a brief look at other ways to serialize.
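One rough way to get a feel for the size overhead is to compare the native output with a hand-written encoding of the same two fields. This is only a sketch: SizeDemo, Person, nativeSize and manualSize are hypothetical names, and the DataOutputStream encoding is not what any particular framework does. It merely shows how few bytes the values themselves need, whereas the native stream also carries the class name, field names and type information.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SizeDemo {
    /** Hypothetical class with the same two fields as the article's example. */
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Number of bytes produced by JDK native serialization. */
    static int nativeSize(Person p) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Number of bytes needed when only the field values are written. */
    static int manualSize(Person p) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(p.name); // 2-byte length prefix + encoded characters
            out.writeInt(p.age);  // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        Person p = new Person("kiwi", 18);
        System.out.println("native: " + nativeSize(p) + " bytes");
        System.out.println("manual: " + manualSize(p) + " bytes"); // manual: 10 bytes
    }
}
```

The price of the compact form is that the reader must already know the field order and types, which is the kind of trade-off the frameworks below make in their own ways.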

Generic object serialization


Serialization formats are usually tied to a language. For example, a file produced by JDK serialization above cannot be handed to a PHP application and deserialized into PHP objects. Some general-purpose data formats, however, work across languages. The most common are JSON and XML. Let's take JSON as an example.

    /**
     * Test serialization via JSON
     */
    @Test
    public void testSerializationByJSON(){
        //-------------Serialization operation---------------
        // create object
        final HaveSerialization haveSerialization = new HaveSerialization();
        haveSerialization.setName("kiwi");
        haveSerialization.setAge(18);

        // Serialize to JSON string
        final String jsonString = JSON.toJSONString(haveSerialization);
        System.out.println("JSON:" + jsonString);

        //-------------Deserialization operation---------------
        final HaveSerialization haveSerializationByJSON = JSON.parseObject(jsonString, HaveSerialization.class);
        System.out.println(haveSerializationByJSON);
    }


Output:

JSON:{"age":18,"name":"kiwi"}
HaveSerialization[name='kiwi', age=18]


The JSON library used above is alibaba/fastjson. Most JSON libraries are used in much the same way, so you can swap in whichever you prefer.
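JSON crosses language boundaries because the result is plain text with a simple, published grammar. To make that concrete without any dependency, here is a hand-written sketch that produces the same text as the output above. TinyJson, quote and toJson are hypothetical helpers written for this illustration; they cover only this one object shape, and a real project should use a JSON library instead.

```java
final class TinyJson {
    /** Wraps a string in quotes and escapes the characters JSON requires to be escaped. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);                      // quote and backslash
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));   // control characters
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Encodes the two fields of the example object as a JSON object. */
    static String toJson(String name, int age) {
        String nameJson = (name == null) ? "null" : quote(name);
        return "{\"age\":" + age + ",\"name\":" + nameJson + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson("kiwi", 18)); // {"age":18,"name":"kiwi"}
    }
}
```

Any language with a JSON parser can read this string back, which is exactly what the JDK's binary format cannot offer.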

Serialization framework


There are many serialization frameworks, such as Kryo, Hessian and Protostuff, each with its own strengths and weaknesses. For a detailed comparison, see the article "Serialization framework kryo vs Hessian vs Protostuff vs Java" listed in the reference links. Choose one according to your own use case. The following takes Kryo as an example.

  1. Dependency
<dependency>
    <groupId>com.esotericsoftware</groupId>
    <artifactId>kryo</artifactId>
    <version>5.0.0-RC9</version>
</dependency>
  2. Code
    /**
     * Test serialization via Kryo
     */
    @Test
    public void testSerializationByKryo() throws FileNotFoundException {
        //-------------Serialization operation---------------
        // create object
        final HaveSerialization haveSerialization = new HaveSerialization();
        haveSerialization.setName("kiwi");
        haveSerialization.setAge(18);

        final Kryo kryo = new Kryo();
        // Register serialization class
        kryo.register(HaveSerialization.class);

        // Serialization operation
        try (final Output output = new Output(new FileOutputStream("haveSerialization.kryo"))) {
            kryo.writeObject(output, haveSerialization);
        }

        // Deserialization
        try (final Input input = new Input(new FileInputStream("haveSerialization.kryo"))) {
            final HaveSerialization haveSerializationByKryo = kryo.readObject(input, HaveSerialization.class);
            System.out.println(haveSerializationByKryo);
        }
    }


Looking at the code, the flow is almost the same as with the JDK. A few points are worth noting. Before serializing, Kryo needs the classes to be registered manually through register, which is somewhat like implementing java.io.Serializable in the JDK approach. Also, the Input and Output objects are not JDK classes; they are provided by Kryo. There is a lot more to know about Kryo, and the reference links below are a good place to learn it.

Source address


Space is limited, so not all of the code can be shown in the article. The complete code has been uploaded to GitHub at the following link:


https://github.com/kiwiflydream/study-example/tree/master/study-serialization-example

Reference links

  1. Serialization framework kryo vs Hessian vs Protostuff vs Java - know what it is, know why - ITeye blog
  2. Kryo User Guide - hntyzgn - blog Park
  3. In depth understanding of RPC serialization -- Kryo | Xu Jingfeng | personal blog
  4. EsotericSoftware/kryo: Java binary serialization and cloning: fast, efficient, automatic

Summary


This article introduced Java serialization. What is serialization? By analogy with how people express themselves in order to communicate, it is how a machine expresses the information in its memory. Why is serialization needed? We used examples to illustrate its two roles: transmitting information and persisting it. Finally, to see how serialization is implemented, we looked at the JDK's native java.io.Serializable approach, general-purpose formats such as JSON and XML, and the third-party framework Kryo.

Tags: Java

Posted by Syn on Sat, 14 May 2022 13:38:35 +0300