How I exported 200,000 rows of data from a system report page to a local file in only 4 seconds

Background

Recently, a junior colleague came to me and described the following scenario:

His company's internal management system has many reports. The report data is paginated, and browsing speed is fine. However, whenever a report is exported with a slightly larger time window, it becomes extremely slow, and sometimes a heap overflow occurs when several people export at the same time.

He knew the heap overflow happened because all the data was loaded into JVM memory at once, so the only mitigation was to cap the time window and keep the export size down. The final restriction: an export could cover at most one month of data.

That worked around the problem, but the operations lady was unhappy. She came over and said: I want to export a full year of data. Am I supposed to export twelve times and merge the files by hand? My colleague had to admit she had a point. A system serves its users; you shouldn't compromise its purpose just to dodge a problem.

So the question he brought to me was: is there a way to solve this problem at the root?

To solve it at the root, he said, two conditions had to be met:

  • Faster export speed
  • Support for multiple people downloading large data sets in parallel

After listening to his question, I suspect many other readers have hit the same problem when exporting data from web pages. To keep the system stable, people usually cap the number of exported rows or the time window, while the business side naturally wants to export an arbitrarily filtered data set in one go.

Can we have our cake and eat it too?

The answer is yes.

I told him firmly that about 7 years ago I built a download-center scheme: exporting 200,000 rows took about 4 seconds, and multiple people could export online at the same time.

He got a little excited hearing this, but then frowned and asked: can it really be that fast, 200,000 rows in 4 seconds?

To show him a working example, I dug out the code from seven years ago. It took one evening to extract the core code, strip it down, and turn it into a stand-alone download-center demo.

Demonstration of the fast download scheme

Let's set the technology aside and look at the effect first. (The complete example code is provided at the end of the article.)

The database is MySQL (in theory the scheme supports any relational database), and a test table t_person is prepared. The structure of the table is as follows:

CREATE TABLE `t_person` (
  `id` bigint(20) NOT NULL auto_increment,
  `name` varchar(20) default NULL,
  `age` int(11) default NULL,
  `address` varchar(50) default NULL,
  `mobile` varchar(20) default NULL,
  `email` varchar(50) default NULL,
  `company` varchar(50) default NULL,
  `title` varchar(50) default NULL,
  `create_time` datetime default NULL,
  PRIMARY KEY  (`id`)
);

There are 9 fields in total. Let's create some test data first.

The example code provides a simple page. Clicking the button below creates 50,000 test rows at a time:

I clicked it 4 times in a row and quickly generated 200,000 rows. To show what the data roughly looks like, I jumped straight to the last page.
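The generator behind that button isn't shown in the article, but a minimal sketch of building 50,000 rows matching the t_person columns might look like this (the class and field values are my own illustration, not the original case code; in the real demo the rows would be batch-inserted via JDBC rather than held in a list):

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical test-data generator for the t_person table.
public class PersonRowGenerator {

    // Produce n rows matching t_person's columns (minus the auto-increment id).
    public static List<Object[]> generate(int n) {
        List<Object[]> rows = new ArrayList<>(n);
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < n; i++) {
            String name = "person_" + i;
            rows.add(new Object[] {
                name,                                       // name
                rnd.nextInt(18, 60),                        // age
                "address_" + i,                             // address
                "138" + (10000000 + rnd.nextInt(90000000)), // mobile
                name + "@example.com",                      // email
                "company_" + rnd.nextInt(100),              // company
                "engineer",                                 // title
                new Date()                                  // create_time
            });
        }
        return rows;
    }

    public static void main(String[] args) {
        // One click on the demo page corresponds to one 50,000-row batch
        System.out.println(generate(50000).size()); // 50000
    }
}
```

Each batch would then go to the database through a single batch insert, which is what makes generating 200,000 rows in four clicks quick.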

Then click "download large file" and the execute button to start downloading all the data in the t_person table.

After clicking execute, click the refresh button below and you will see an asynchronous download record with status P, meaning pending. Keep refreshing; after a few seconds the record changes to S, meaning success.
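The status field on that record is a tiny state machine. Modeled in Java, it might look like the following; the enum values reflect the P/S behavior described above, while F (failed) is an extra state I am assuming any real implementation would also need:

```java
// Hypothetical model of the asynchronous download record's status:
// P (pending) -> S (success), with an assumed F (failed) terminal state.
public class DownloadRecord {

    public enum Status { P, S, F }

    private Status status = Status.P; // new records start as pending

    public Status getStatus() { return status; }

    // The background export job flips the record to S once the file is written.
    public void markSuccess() {
        if (status != Status.P) throw new IllegalStateException("already finished");
        status = Status.S;
    }

    public void markFailed() {
        if (status != Status.P) throw new IllegalStateException("already finished");
        status = Status.F;
    }

    public static void main(String[] args) {
        DownloadRecord r = new DownloadRecord();
        System.out.println(r.getStatus()); // P
        r.markSuccess();
        System.out.println(r.getStatus()); // S
    }
}
```

The page's refresh button simply re-reads this status from the database until it leaves P.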

Then you can download the file locally. It is about 31 MB.

At this point many readers will object: is this download just a CSV? CSV is a plain text file; opening it directly in Excel loses formatting and precision. That doesn't solve the problem, we need Excel format!

In fact, anyone with a little Excel experience can use its data-import feature: Data -> Import Data, follow the wizard, choose comma as the separator, and set the format for the key columns. The import is done in about 10 seconds.
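One practical detail worth adding (my own tip, not part of the original case code): if the CSV contains non-ASCII text, writing a UTF-8 byte-order mark at the start of the file lets Excel detect the encoding correctly even when the file is double-clicked. A sketch:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Prepend a UTF-8 BOM so Excel recognises the CSV's encoding.
// This is a common Excel-compatibility workaround, shown here as a sketch.
public class BomCsvWriter {

    public static byte[] writeCsv(String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // UTF-8 BOM: EF BB BF
        out.write(0xEF);
        out.write(0xBB);
        out.write(0xBF);
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write(content);
        w.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeCsv("id,name\n1,alice\n");
        // The payload is the BOM (3 bytes) plus the CSV text
        System.out.println(bytes.length);
    }
}
```

In the download-center code this would simply mean writing those three bytes to the output stream before the first header line.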

Just teach the operations lady these steps to complete the Excel import. And the generated file can be downloaded again as many times as needed.

Doesn't that fundamentally solve the problem of downloading large data sets?

Principle and core code

Hearing this, my colleague said excitedly that this scheme would solve his pain points, and asked me to explain the principle.

The core of the scheme is very simple. It rests on a single knowledge point: making full use of this JdbcTemplate interface:

@Override
public void query(String sql, @Nullable Object[] args, RowCallbackHandler rch) throws DataAccessException {
  query(sql, newArgPreparedStatementSetter(args), rch);
}

The SQL is select * from t_person. The RowCallbackHandler is a callback interface invoked once for each row as the result set is traversed. Here is my implementation of RowCallbackHandler:

private class CsvRowCallbackHandler implements RowCallbackHandler {

    private final PrintWriter pw;

    public CsvRowCallbackHandler(PrintWriter pw) {
        this.pw = pw;
    }

    public void processRow(ResultSet rs) throws SQLException {
        int columnCount = rs.getMetaData().getColumnCount();
        if (rs.isFirst()) {
            // Hint the driver to fetch rows in batches of 500
            rs.setFetchSize(500);
            // Emit the header line from the column names
            for (int i = 1; i <= columnCount; i++) {
                this.writeToFile(pw, rs.getMetaData().getColumnName(i), i == columnCount);
            }
            pw.println();
        }
        // Emit the current row's values (including the first row,
        // right after its header line)
        for (int i = 1; i <= columnCount; i++) {
            this.writeToFile(pw, rs.getObject(i), i == columnCount);
        }
        pw.println();
    }

    private void writeToFile(PrintWriter pw, Object valueObj, boolean isLineEnd) {
        ...
    }
}

What this CsvRowCallbackHandler does is fetch rows from the database 500 at a time and stream them into a file on the server. No matter whether the query returns 200,000 or 1,000,000 rows, memory theoretically holds only about 500 rows' worth of data at any moment. (Note that with MySQL's Connector/J, the fetch-size hint only takes effect when useCursorFetch=true is set on the connection, or with a fetch size of Integer.MIN_VALUE for row-by-row streaming.) Once the file is written, all that remains is to download the generated file from the server.
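The body of writeToFile is elided above. A plausible implementation (my sketch, not the author's original) has to quote any field containing commas, quotes, or newlines, following the usual RFC 4180 CSV rules:

```java
// A plausible CSV field writer for the elided writeToFile helper:
// quote fields containing separators or quotes, doubling embedded quotes.
public class CsvFieldWriter {

    public static String escape(Object valueObj) {
        String s = valueObj == null ? "" : valueObj.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Mirrors the writeToFile signature used above: append the field and,
    // unless it ends the line, a comma separator.
    public static void writeField(StringBuilder sb, Object valueObj, boolean isLineEnd) {
        sb.append(escape(valueObj));
        if (!isLineEnd) {
            sb.append(',');
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        writeField(sb, "Acme, Inc.", false);
        writeField(sb, "say \"hi\"", true);
        System.out.println(sb); // "Acme, Inc.","say ""hi"""
    }
}
```

Without this quoting, any address or company name containing a comma would silently shift every later column in that row.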

Because only about 500 rows cycle through memory at a time, memory will not overflow even with multiple downloads running concurrently. This neatly handles the multi-user download scenario.

Of course, while many parallel downloads won't overflow memory, they do consume a lot of IO resources. For this reason we still need to cap the degree of parallelism, which we can do by submitting jobs to a thread pool:

ExecutorService threadPool = Executors.newFixedThreadPool(5);

threadPool.submit(() -> {
    // code that exports the large data set to a file goes here
});
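One caveat worth knowing: Executors.newFixedThreadPool uses an unbounded queue, so under heavy load export jobs would pile up silently. A variant with a bounded queue and an explicit rejection policy (my addition, not from the case code; the pool and queue sizes are arbitrary) gives back-pressure so the caller can tell the user to retry later:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Bounded export pool: at most 5 concurrent exports and 20 queued jobs;
// further submissions throw RejectedExecutionException instead of queueing
// without limit.
public class ExportPool {

    public static ThreadPoolExecutor create() {
        return new ThreadPoolExecutor(
                5, 5,                                     // fixed pool of 5 workers
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(20),             // bounded job queue
                new ThreadPoolExecutor.AbortPolicy());    // reject when full
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create();
        pool.submit(() -> {
            // export job would stream rows to a file here
        });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("done");
    }
}
```

Catching the rejection at the controller layer is what lets the page respond with "too many exports in progress" instead of quietly degrading.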

Finally, I tested downloading 500,000 person rows, which took about 9 seconds; 1,000,000 rows took about 19 seconds. This download speed should satisfy the report-export needs of most companies.

In closing

After getting my sample code, my colleague spent a week adapting it and launched the new version of page export. All report exports became asynchronous jobs, and everyone goes to the download center to view and download the files. The two earlier pain points were solved.

But at the end he still had one question: why not generate Excel directly, that is, keep writing rows into an Excel file inside the RowCallbackHandler?

My answer was twofold:

1. Writing a plain text stream is comparatively fast.
2. I never managed to stream rows directly into the Excel file format successfully this way.

I cleaned up the stripped-down example and am providing it free of charge, hoping it helps readers who encounter similar scenarios.

Follow the author

Follow the official account "Yuanren tribe" and reply "export case" to get the complete example code. It can be run directly; open http://127.0.0.1:8080 to see the demo page described in this article.

A tech-sharing account that publishes only original technical content.

Tags: Java

Posted by warpdesign on Tue, 24 May 2022 16:39:38 +0300