MySQL Optimization -- IO scheduling algorithm optimization

Previously, we have shared the database optimization method on wechat official account. The link is https://mp.weixin.qq.com/s/6Atzk9UKPJRxxAs0nsKBXg  . The operating system part introduces the optimization of IO scheduling algorithm. This paper will compare the performance of disk IO under different scheduling algorithms through stress test.

1 Preparation

1.1} installing sysbench

This time, sysbench is used for pressure measurement. Install sysbench first. The steps are as follows:

curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
yum -y install sysbench
sysbench --version

1.2 preparation of test documents

Generate test files that need to be used in the follow-up. The block size is 16k (known to MySQL DBA s, ha ha). Create four files, totaling 20G

[root@mha1 ~]# sysbench fileio --file-num=4 --file-block-size=16384 --file-total-size=20G prepare
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

4 files, 5242880Kb each, 20480Mb total
Creating files for the test...
Extra file open flags: (none)
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
21474836480 bytes written in 47.94 seconds (427.24 MiB/sec).

1.3 preparation of test sheet

Because we also need to test the database reading and writing, we need to create relevant tables and data first

[root@mha1 ~]# sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=20 --table_size=1000000 oltp_insert --db-ps-mode=disable prepare
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Initializing worker threads...

Creating table 'sbtest3'...
Creating table 'sbtest2'...
Creating table 'sbtest5'...
Creating table 'sbtest6'...
Creating table 'sbtest8'...
Creating table 'sbtest7'...
Creating table 'sbtest4'...
Creating table 'sbtest1'...
Creating table 'sbtest10'...
Creating table 'sbtest9'...
Inserting 1000000 records into 'sbtest4'
Inserting 1000000 records into 'sbtest7'
Inserting 1000000 records into 'sbtest5'
Inserting 1000000 records into 'sbtest3'
Inserting 1000000 records into 'sbtest1'
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest8'
Inserting 1000000 records into 'sbtest6'
Inserting 1000000 records into 'sbtest10'
Inserting 1000000 records into 'sbtest9'
Creating a secondary index on 'sbtest7'...
Creating a secondary index on 'sbtest10'...
Creating a secondary index on 'sbtest8'...
Creating a secondary index on 'sbtest5'...
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest9'...
Creating a secondary index on 'sbtest1'...
Creating table 'sbtest17'...
Creating a secondary index on 'sbtest3'...
Inserting 1000000 records into 'sbtest17'
Creating a secondary index on 'sbtest4'...
Creating a secondary index on 'sbtest6'...
Creating table 'sbtest20'...
Inserting 1000000 records into 'sbtest20'
Creating table 'sbtest18'...
Inserting 1000000 records into 'sbtest18'
Creating table 'sbtest15'...
Inserting 1000000 records into 'sbtest15'
Creating table 'sbtest19'...
Inserting 1000000 records into 'sbtest19'
Creating table 'sbtest14'...
Inserting 1000000 records into 'sbtest14'
Creating table 'sbtest11'...
Inserting 1000000 records into 'sbtest11'
Creating table 'sbtest13'...
Creating table 'sbtest12'...
Inserting 1000000 records into 'sbtest13'
Inserting 1000000 records into 'sbtest12'
Creating table 'sbtest16'...
Inserting 1000000 records into 'sbtest16'
Creating a secondary index on 'sbtest17'...
Creating a secondary index on 'sbtest20'...
Creating a secondary index on 'sbtest18'...
Creating a secondary index on 'sbtest19'...
Creating a secondary index on 'sbtest15'...
Creating a secondary index on 'sbtest11'...
Creating a secondary index on 'sbtest12'...
Creating a secondary index on 'sbtest13'...
Creating a secondary index on 'sbtest14'...
Creating a secondary index on 'sbtest16'...

2. View the supported scheduling algorithms

This disk is SSD hard disk, and the operating system version is centos7 8 . Next, the scheduling algorithm is modified to three different values for random read and random write stress testing

This system is centos7 8. You need to check the supported IO scheduling algorithm, and then conduct modification test.

[root@mha1 ~]# dmesg | grep -i scheduler
[    4.885816] io scheduler noop registered
[    4.885820] io scheduler deadline registered (default)
[    4.885867] io scheduler cfq registered
[    4.885870] io scheduler mq-deadline registered
[    4.885872] io scheduler kyber registered

It can be seen that in this system, the default scheduling algorithm is {deadline.

You can also view the current scheduling algorithm through the following command, where the brackets represent the currently used scheduling algorithm.

[root@mha1 ~]# cat /sys/block/sda/queue/scheduler 
noop [deadline] cfq 

3. deadline algorithm

Deadline is the best choice for database environment (Oracle, RAC, mysql, etc.) in the case of mechanical disk. The following is a stress test of random write and random read

3.1 random writing

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndwr \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     6935.37
    fsyncs/s:                     0.53

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               108.37

General statistics:
    total time:                          180.0138s
    total number of events:              1248484

Latency (ms):
         min:                                    0.10
         avg:                                    3.46
         max:                                  107.39
         95th percentile:                       14.73
         sum:                              4317610.93

Threads fairness:
    events (avg/stddev):           52020.1667/426.95
    execution time (avg/stddev):   179.9005/0.01

The iops of random write is 6935.37, and the disk write speed is 108.37MiB/s

3.2 random reading

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndrd \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      7956.88
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  124.33
    written, MiB/s:               0.00

General statistics:
    total time:                          180.0075s
    total number of events:              1432313

Latency (ms):
         min:                                    0.10
         avg:                                    3.01
         max:                                  322.24
         95th percentile:                        5.47
         sum:                              4309094.67

Threads fairness:
    events (avg/stddev):           59679.7083/2688.56
    execution time (avg/stddev):   179.5456/0.18

The iops of random reading is 7956.88, and the disk reading speed is 124.33MiB/s

3.3 test database write

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run

SQL statistics:
    queries performed:
        read:                            0
        write:                           589934
        other:                           294968
        total:                           884902
    transactions:                        147483 (491.43 per sec.)
    queries:                             884902 (2948.62 per sec.)
    ignored errors:                      2      (0.01 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.1050s
    total number of events:              147483

Latency (ms):
         min:                                    2.58
         avg:                                   16.27
         max:                                 2608.34
         95th percentile:                       35.59
         sum:                              2399415.58

Threads fairness:
    events (avg/stddev):           18435.3750/90.33
    execution time (avg/stddev):   299.9269/0.04

It can be seen that the TPS written randomly is 491.43 and the number of queries is 2948.62

3.4 test database reading

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run

SQL statistics:
    queries performed:
        read:                            1651692
        write:                           0
        other:                           235956
        total:                           1887648
    transactions:                        117978 (393.13 per sec.)
    queries:                             1887648 (6290.13 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0949s
    total number of events:              117978

Latency (ms):
         min:                                    3.08
         avg:                                   20.34
         max:                                  170.48
         95th percentile:                       29.19
         sum:                              2399636.31

Threads fairness:
    events (avg/stddev):           14747.2500/1513.84
    execution time (avg/stddev):   299.9545/0.04

It can be seen that the TPS during random reading is 393.13 and the number of queries is 6290.13

4. noop algorithm

4.1} modified to noop algorithm

noop, called elevator scheduling algorithm, is based on FIFO queue. All requests are first in first out. Because SSD has fast random read and random write speed, this algorithm is suitable for SSD hard disk.

[root@mha1 ~]# echo 'noop' >/sys/block/sda/queue/scheduler
[root@mha1 ~]# cat /sys/block/sda/queue/scheduler 
[noop] deadline cfq 

4.2 random writing

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndwr \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     7057.60
    fsyncs/s:                     0.53

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               110.27

General statistics:
    total time:                          180.0136s
    total number of events:              1270481

Latency (ms):
         min:                                    0.10
         avg:                                    3.40
         max:                                  240.39
         95th percentile:                       14.46
         sum:                              4317435.99

Threads fairness:
    events (avg/stddev):           52936.7083/487.57
    execution time (avg/stddev):   179.8932/0.02

The iops of random write is 7057.60, and the disk write speed is 110.27MiB/s

3.4 random reading

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndrd \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      8399.89
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  131.25
    written, MiB/s:               0.00

General statistics:
    total time:                          180.0100s
    total number of events:              1512081

Latency (ms):
         min:                                    0.10
         avg:                                    2.85
         max:                                  315.77
         95th percentile:                        5.00
         sum:                              4312384.33

Threads fairness:
    events (avg/stddev):           63003.3750/10086.77
    execution time (avg/stddev):   179.6827/0.12

The iops of random reading is 8399.89, and the disk reading speed is 131.25MiB/s

4.4 database writing

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run

SQL statistics:
    queries performed:
        read:                            0
        write:                           653457
        other:                           326730
        total:                           980187
    transactions:                        163364 (544.38 per sec.)
    queries:                             980187 (3266.28 per sec.)
    ignored errors:                      2      (0.01 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0903s
    total number of events:              163364

Latency (ms):
         min:                                    2.62
         avg:                                   14.69
         max:                                  220.12
         95th percentile:                       32.53
         sum:                              2399040.57

Threads fairness:
    events (avg/stddev):           20420.5000/112.69
    execution time (avg/stddev):   299.8801/0.04

It can be seen that the TPS written randomly is 544.38 and the number of queries is 3266.28

4.5 database read only

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run
SQL statistics:
    queries performed:
        read:                            1596364
        write:                           0
        other:                           228052
        total:                           1824416
    transactions:                        114026 (379.97 per sec.)
    queries:                             1824416 (6079.59 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.0869s
    total number of events:              114026

Latency (ms):
         min:                                    3.08
         avg:                                   21.04
         max:                                  321.03
         95th percentile:                       31.37
         sum:                              2399600.56

Threads fairness:
    events (avg/stddev):           14253.2500/1475.71
    execution time (avg/stddev):   299.9501/0.02

It can be seen that the TPS for read-only is 379.97 and the number of queries is 6079.59

5. cfq algorithm

5.1 modified to cfq algorithm

cfq is called absolute fair scheduling algorithm. It creates a queue for each process and thread to manage IO requests, which has the effect of evenly distributing io for each process and thread. This algorithm is applicable to general servers, and centos6 is the default IO scheduling algorithm.

[root@mha1 ~]# echo 'cfq' >/sys/block/sda/queue/scheduler 
[root@mha1 ~]# cat /sys/block/sda/queue/scheduler 
noop deadline [cfq] 

5.2 random writing

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndwr \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random write test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      0.00
    writes/s:                     6614.37
    fsyncs/s:                     0.53

Throughput:
    read, MiB/s:                  0.00
    written, MiB/s:               103.35

General statistics:
    total time:                          180.0118s
    total number of events:              1190677

Latency (ms):
         min:                                    0.10
         avg:                                    3.63
         max:                                  348.78
         95th percentile:                       15.27
         sum:                              4317092.54

Threads fairness:
    events (avg/stddev):           49611.5417/517.80
    execution time (avg/stddev):   179.8789/0.03

The iops of random write is 6614.37, and the disk write speed is 103.35MiB/s

5.3 random reading

[root@mha1 ~]# sysbench fileio \
>  --time=180 \
>  --threads=24 \
>  --file-total-size=20G \
>  --file-test-mode=rndrd \
>  --file-num=4 \
>  --file-extra-flags=direct \
>  --file-fsync-freq=0 \
>  --file-block-size=16384 \
>  run
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 24
Initializing random number generator from current time


Extra file open flags: directio
4 files, 5GiB each
20GiB total file size
Block size 16KiB
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Initializing worker threads...

Threads started!


File operations:
    reads/s:                      7481.39
    writes/s:                     0.00
    fsyncs/s:                     0.00

Throughput:
    read, MiB/s:                  116.90
    written, MiB/s:               0.00

General statistics:
    total time:                          180.0086s
    total number of events:              1346731

Latency (ms):
         min:                                    0.10
         avg:                                    3.20
         max:                                  374.49
         95th percentile:                        5.77
         sum:                              4312382.07

Threads fairness:
    events (avg/stddev):           56113.7917/3058.00
    execution time (avg/stddev):   179.6826/0.17

The iops of random reading is 7481.39, and the disk reading speed is 116.90MiB/s

5.4 database write

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run
SQL statistics:
    queries performed:
        read:                            0
        write:                           598765
        other:                           299384
        total:                           898149
    transactions:                        149691 (498.54 per sec.)
    queries:                             898149 (2991.25 per sec.)
    ignored errors:                      2      (0.01 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.2552s
    total number of events:              149691

Latency (ms):
         min:                                    2.55
         avg:                                   16.02
         max:                                  779.62
         95th percentile:                       35.59
         sum:                              2397311.08

Threads fairness:
    events (avg/stddev):           18711.3750/132.24
    execution time (avg/stddev):   299.6639/0.38

It can be seen that the TPS written randomly is 498.54 and the number of queries is 2991.25

5.5 database reading

sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run
SQL statistics:
    queries performed:
        read:                            1448342
        write:                           0
        other:                           206906
        total:                           1655248
    transactions:                        103453 (344.66 per sec.)
    queries:                             1655248 (5514.58 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          300.1562s
    total number of events:              103453

Latency (ms):
         min:                                    3.11
         avg:                                   23.19
         max:                                  222.31
         95th percentile:                       38.25
         sum:                              2399486.55

Threads fairness:
    events (avg/stddev):           12931.6250/1278.72
    execution time (avg/stddev):   299.9358/0.01

It can be seen that the TPS in read - only mode is 344.66 and the number of queries is 5514.58

6 Summary

According to the test results, compare the reading and writing speeds of the three scheduling algorithms

algorithm IOPS Disk write speed IOPS Disk read speed oltp_write_only  oltp_read_only 
deadline 6935.37 118.37MiB/s 7956.88 124.33MiB/s The TPS is 491.43 and the number of queries is 2948.62 The TPS is 393.13 and the number of queries is 6290.13
noop 7057.60 110.27MiB/s 8399.89 131.25MiB/s The TPS is 544.38 and the number of queries is 3266.28 The TPS is 379.97 and the number of queries is 6079.59
cfq 6614.37 103.35MiB/s 7481.39 116.90MiB/s The TPS is 498.54 and the number of queries is 2991.25 The TPS is 344.66 and the number of queries is 5514.58

Because the test environment is SSD hard disk, it is recommended to choose noop disk IO scheduling algorithm in this case. This conclusion is also in line with our expectations.

Special note: the scheduling algorithm of disk IO also needs to be distinguished according to various scenarios such as disk conditions, database type, database architecture and business scenarios (OLTP, OLAP, etc.), and the scheduling algorithms of different scenarios also need to be adjusted, which can not be generalized. If it is uncertain, it is recommended to conduct pressure measurement to judge and select the most appropriate algorithm in line with the corresponding scene.

If you want to learn more or participate in technical exchanges, you can follow the wechat official account [Database dry shop] or enter the technical exchange group for communication.

Tags: MySQL

Posted by Lashiec on Sat, 21 May 2022 20:13:33 +0300