SQL optimization case: OR condition optimization

Following is an introduction to the previous article, SQL Optimization Case (1): Implicit Conversion, which focuses on OR optimization.

In MySQL, the same query condition, if the OR s are transformed in the SQL statement, the results of the query will be different. In many complex cases, poor index selection may be a potential performance hazard. To avoid a significant reduction in execution efficiency, we can properly consider using uniform all SQL to separate complex query logic.

Common OR usage scenarios, please read the following cases.

Case 1: Use OR Conditional Query for Different Columns

1. Scene to be optimized

SELECT
..
..
 FROM`t1` a
 WHERE a.token= '16149684'
 AND a.store_id= '242950'
 AND(a.registrationId IS NOT NULL
 AND a.registrationId<> '')
 OR a.uid= 308475
 AND a.registrationId IS NOT NULL
 AND a.registrationId<> ''

Execution Plan

+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| id           | select_type           | table           | type           | key               | key_len           | ref           | rows           | Extra                                       |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| 1            | SIMPLE                | a               | range          |idx_registrationid | 99                |               | 100445         | Using index condition; Using where          |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+

A total of 1 row of records was returned, which took 5 ms.

2. Scene Analysis

Tokens and uid filtering can be ground very well from query conditions, but because of use, or because of the need for index merging, better performance can be achieved. However, the MySQL optimizer chose to use the registrationId index instead during actual execution, resulting in poor SQL performance.

3. Scene optimization

We'll rewrite SQL to union all.

SELECT
...
...
FROM`t1` a
WHERE a.token = '16054473'
AND a.store_id = '138343'
AND b.is_refund = 1
AND (a.registrationId IS NOT NULL
AND a.registrationId <> '')
union all
SELECT
...
...
FROM`t1` a
where a.uid = 181579
AND a.registrationId IS NOT NULL
AND a.registrationId <> ''
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| id           | select_type           | table           | type           | possible_keys                | key           | key_len           | ref                          | rows           | Extra                              |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| 1            | PRIMARY               | a               | ref            | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN     | 63                | const                        | 1              | Using index condition; Using where |
| 1            | PRIMARY               | b               | eq_ref         | PRIMARY                      | PRIMARY       | 4                 | youdian_life_sewsq.a.role_id | 1              | Using where                        |
| 2            | UNION                 | a               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 1              |                                    |
| 2            | UNION                 | b               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 0              | unique row not found               |
|              | UNION RESULT          | <union1,2>      | ALL            |                              |               |                   |                              |                | Using temporary                    |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+

Five rows of records were returned, which took 5 ms.

By comparing the execution plans before and after the optimization, you can prune them significantly, merge the SQL into two subqueries, and then use union to merge the results, which is more stable, safe and efficient.

Case 2: Use OR query criteria for the same column

1. Scene to be optimized

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
 select product_id,
 count(0) count
 from t2 pprod
 inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
 and pprod.is_enable =1
 and ppinfo.is_enable=1
 and pinfo.belong_t0 =1
 and pinfo.end_time >=now()
 and not (
 pinfo.onshelv_time>'2019-06-30 00:00:00'
 or pinfo.end_time>'2018-12-05 00:00:00'
 )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and (pc.count =0 or pc.count isnull ) limit 0,5;

Execution Plan

2. Scene Analysis

In this example, there is a subquery in the SQL query, which is treated as a drive table and produces auto_key, tested by SQL, verifies that (pc.count = 0 or pc.count is null) can affect the performance of the entire SQL and needs to be overridden.

3. Scene optimization

First, let's think about how to optimize (pc.count = 0 or pc.count is null)? Write a similar SQL first

Select col from test where col =100 or col is null;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec)

At this point we see the same column, but corresponding to different values, which can be converted using case when.

Select col From test where case when col is null then 100 else col =100 end;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec)

Return to the original SQL for overwriting.

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
 select product_id,
 count(0) count
 from t2 pprod
 inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
 and pprod.is_enable =1
 and ppinfo.is_enable=1
 and pinfo.belong_t0 =1
 and pinfo.end_time >=now()
 and not (
 pinfo.onshelv_time>'2019-06-30 00:00:00'
 or pinfo.end_time>'2018-12-05 00:00:00'
 )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and case when pc.count is null then 0 else pc.count end=0 limit 0,5;

Optimized SQL can be polished 30 seconds faster than the original SQL, and the execution efficiency is about 50 times higher.

Case 3: Optimizing associated SQL OR conditions

1. Scene to be optimized

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', ...
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL30SECOND)
OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
OR group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

2. Scene Analysis

We carefully analyzed the above query statements and found that although business logic only needs to query data modified in half a minute, the execution process must associate all the data, resulting in a loss of performance.

3. Scene optimization

We decompose the original SQL, the first part sql-01 is as follows:

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', ...
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-01 to user_ The MSG table is driven, using gmt_ The modified index filters the latest data.

The second part sql-02 is as follows:

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', ...
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-02 user-driven table, msg user_id has a good index filter row.

The third part sql-03 is as follows:

SELECT user_msg.msg_id AS 'msg_id', user_msg.content AS 'msg_content', ...
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

sql-03 takes group as the drive table and uses gmt_ The modified index filters the latest data.

summary

Common scenarios for MySQL OR condition optimization include the following:

1, the same column can be replaced with IN

2. union all can be used to separate columns and complex cases

3, Associate SQL OR Conditions

We need to analyze and optimize the scenarios.

More technologies to view on the official website [ https://www.dtstack.com/dtsmart/
](https://www.dtstack.com/dtsma...

Tags: SQL cloud computing Optimize cloud serving

Posted by dewbie on Mon, 09 May 2022 19:28:01 +0300