MySQL advanced - six index optimization

Database optimization scheme

  • Index failure, insufficient use of index: index establishment
  • Too many JOIN (design defects or unavoidable requirements) associated queries: SQL optimization
  • Too much data: sub database and sub table
  • Server tuning / optimization and various parameter settings (buffer, number of threads, etc.): adjust my.cnf

Performance analysis -explain

1. What is explain?

Use the EXPLAIN keyword to simulate the optimizer to execute SQL query statements, so as to know how MySQL handles your SQL statements. Analyze the performance bottleneck of your query statement or table structure.

2. explain usage

explain sql statement

Data preparation

 use dudu;

 create table t1(id int(10) auto_increment, content varchar(100) null, primary key (id));
 create table t2(id int(10) auto_increment, content varchar(100) null, primary key (id));
 create table t3(id int(10) auto_increment, content varchar(100) null, primary key (id));
 create table t4(id int(10) auto_increment, content1 varchar(100) null, content2 varchar(100) null, primary key (id));

 create index idx_content1 on t4(content1);  -- General index

 # The following new sql is executed several times for demonstration
 insert into t1(content) values(concat('t1_',floor(1+rand()*1000)));
 insert into t2(content) values(concat('t2_',floor(1+rand()*1000)));
 insert into t3(content) values(concat('t3_',floor(1+rand()*1000)));
 insert into t4(content1, content2) values(concat('t4_',floor(1+rand()*1000)), concat('t4_',floor(1+rand()*1000)));

Explanation of each field

table

Single table: shows which table the data in this row is related to

explain select *from t1;

Multi table Association: t1 is the driven table and t2 is the driven table.

Note: during internal connection, MySQL performance optimizer will automatically determine which table is the driven table and which represents the driven table, regardless of the writing order

id

Represents the order in which the select clause or operation table is executed in the query
Same id: execution sequence from top to bottom

Different IDS: if it is a sub query, the sequence number of the id will increase. The larger the id value, the higher the priority, and the earlier it will be executed

explain select t1.id from t1 where t1.id =(
        select t2.id from t2 where t2.id =(
        select t3.id from t3 where t3.content = 't3_434'
        )
        );


Note: the query optimizer may optimize the statements involving sub queries and turn them into join queries`

explain select * from t1 where content in (select content from t2 where content = 'a');

id is null: last execution

explain select * from t1 union select * from t2;

Summary:

  • If the id is the same, it can be considered as a group, which is executed from top to bottom
  • In all groups, the higher the id value, the higher the priority, and the earlier the execution
  • Attention: each id number represents an independent query. The fewer the query times of an sql, the better

select_type

The type of query is mainly used to distinguish complex queries such as general queries, joint queries, sub queries, etc.

SIMPLE

Simple query. The query does not contain subqueries or unions.

explain select *from t1;

PRIMARY:

Main query. If a query contains subqueries, the outermost query is marked as PRIMARY.

SUBQUERY:

Subquery. Subqueries are included in the select or where list.

explain select * from t3 where id = ( select id from t2 where content= 'a');

DEPENDENT SUBQUREY:

If a subquery is included, and the query statement cannot be converted into a join query by the optimizer, and the subquery is a related subquery (the subquery is based on external data columns), the subquery is a required subquery.

explain select * from t3 where id = ( select id from t2 where content = t3.content);

UNCACHEABLE SUBQUREY:

Indicates that the query of this subquery is affected by external system variables

explain select * from t3 where id = ( select id from t2 where content = @@character_set_server);

UNION:

For query statements containing UNION or UNION ALL, all queries are UNION except that the leftmost query is PRIMARY.

UNION RESULT:

UNION will query and de duplicate the query results. MYSQL will use a temporary table to complete the de duplication of the UNION query. The query for this temporary table is "UNION RESULT".

explain select * from t3 where id = 1 union select * from t2 where id = 1;

DEPENDENT UNION:

For the UNION or UNION ALL in the subquery, except that the leftmost query is required subquery, all other queries are required UNION.

explain select * from t1 where content in (select content from t2 union select content from t3);

DERIVED:

In queries containing derived tables (subqueries in the from clause), MySQL will recursively execute these subqueries and put the results in temporary tables.

explain select * from (select content, count(*) as c from t1 group by content) as derived_t1 where c > 1;

Here <derived2> is the derived table generated in the query with id 2.

Add: when dealing with statements with derived tables, MySQL gives priority to trying to merge the derived tables and outer queries. If not, materialize the derived tables (execute sub queries and put the results into temporary tables), and then execute the query. The following example is an example of merging derived tables and outer queries:

explain select * from (select * from t1 where content = 't1_832') as derived_t1;

MATERIALIZED:

For the statements containing subqueries, if the optimizer chooses to materialize the subquery and then connect the query with the outer query, the type of the subquery is MATERIALIZED. In the following example, the query optimizer first converts the sub query into a MATERIALIZED table, and then joins t1 and the MATERIALIZED table for query. Use the select content from t2 result as a condition

explain select * from t1 where content in (select content from t2);

partitions

Represents the hit condition in the partitioned table. This item is NULL for non partitioned tables

type ☆

explain:

The result values from the best to the worst are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge
unique_subquery > index_subquery > range > index > ALL

The more important ones include: system, const, eq_ref ,ref,range > index > ALL

The goal of SQL performance optimization: at least reach the range level, which is required to be ref level, preferably consts level. (Alibaba development manual requirements)

ALL

Full table scanning. Full Table Scan, which will traverse the whole table to find matching rows

explain select *from t1;

index

When the overlay index is used, but all index records need to be scanned

Overwrite index: ` if you can get the desired data by reading the index, you don't need to read user records or do back to table operations. An index that contains data that meets the query results is called an overlay index.

-- You only need to read the non leaf nodes of the clustered index part to get id The leaf node does not need to be queried
explain select id from t1;

-- You only need to read the secondary index, and you can get the desired data in the secondary index. There is no need to id Back to table operation
explain select id, deptId from t_emp;

range

Retrieve only rows in a given range, using an index to select rows. The key column shows which index is used. Generally, queries such as between, <, >, in appear in your where statement. This range scan index scan is better than full table scan, because it only needs to start at one point of the index and end at another point, without scanning all the indexes.

explain select * from t1 where id in (1, 2, 3);

ref

When performing equivalent matching between ordinary secondary index columns and constants

explain select * from t_emp where deptid = 1;

eq_ref

When performing equivalent matching through primary key or unique secondary index column that does not allow NULL value during connection query

explain select * from t1, t2 where t1.id = t2.id;

const

When matching a constant with a primary key or a unique secondary index column

explain select * from t1 where id = 1;

system

In MyISAM engine, when there is only one record in the table. (this is the highest performance scenario of all type values)

create table t(i int) engine=myisam;
insert into t values(1);
explain select * from t;

Other less common types (understand

index_subquery: Associate subqueries with common indexes for query statements containing IN subqueries. content1 is a general index field

explain select * from t1 where content in (select content1 from t4 where t1.content = t4.content2) or content = 'a';

unique_subquery: similar to index_subquery, which uses a unique index to associate subqueries. The id of t2 is the primary key, which can also be understood as a unique index field

explain select * from t1 where id in (select id from t2 where t1.content = t2.content) or content = 'a';

index_merge: multiple indexes need to be combined in the query process, which usually appears in sql with or keyword.

explain select * from t_emp where deptid = 1 or id = 1;

ref_or_null: when the ordinary secondary index is matched equally, and the value of the index column can also be NULL.

explain select * from t_emp where deptid = 1 or deptid is null;


**fulltext:* * full text index. It is generally realized through search engines, which we will not expand here.

possible_keys and keys * * ☆

  • possible_keys refers to one or more indexes that may be used when executing queries. If there is an index on the field involved in the query, the index will be listed, but it may not be actually used by the query.
  • keys indicates the index actually used. If NULL, no index is used.
explain select id from t1 where id = 1;

key_len ☆

Indicates the number of bytes used by the index. According to this value, you can judge the use of the index and check whether the index is fully utilized. The larger the joint index value, the better.

How to calculate:

  1. First look at the type + length of the field on the index. For example: int=4; varchar(20) =20 ; char(20) =20
  2. If it is a string field such as varchar or char, it depends on the character set to multiply different values, such as utf8 to multiply by 3, utf8mb4 to multiply by 4, and GBK to multiply by 2
  3. varchar, a dynamic string, needs 2 bytes
  4. Fields that are allowed to be empty should be added with 1 byte
-- Create index 
create index idx_age_name on t_emp(age, `name`);
-- Test 1
explain select * from t_emp where age = 30 and `name` = 'ab%';
-- Test 2
explain select * from t_emp where age = 30;

ref

Displays the columns or constants that are compared with the index in the key.

-- ref=dudu.t1.id   Appears when associated with a query, t2 Table and t1 Which column of the table is associated 
explain select * from t1, t2 where t1.id = t2.id;

-- ref=const  What is the equivalent comparison with the index column, const Represents a constant
explain select * from t_emp where age = 30;

rows ☆

The number of rows MySQL believes it must check when executing a query. The smaller the value, the better.

-- In case of full table scanning, rows The value of is the estimated number of rows of data in the table
explain select * from t_emp where empno = '10001';

-- If you are using index queries, rows The value of is the estimated number of rows of index records scanned
explain select * from t_emp where deptid = 1;

filtered

The percentage of the last queried data in all server-side check rows. The higher the value, the better.

-- First, according to the secondary index deptid Find the primary key of the data, and three records meet the conditions,
-- Then the table is returned according to the primary key. Finally, three records are found, including 100%Records of meet the conditions
explain select * from t_emp where deptid = 1;

-- This example if name If the column is an indexed column filtered = 100 otherwise filtered = 10(Full table scan)
explain select * from t_emp where `name` = 'Breezy';

Extra ☆

Contains additional information that is not suitable for display in other columns but is important. Use this additional information to understand how MySQL will execute the current query statement. There are dozens of additional information provided by mysql. Only the more important ones are selected here.

Impossible WHERE:

The value of the where clause is always false

explain select * from t_emp where 1 != 1;

Using where

Where is used, but there are fields on where that are not indexed

explain select * from t_emp where `name` = 'Breezy';

Using temporary

Use temporary table to save intermediate results

explain select distinct content from t1;

Using filesort

When sorting records in query results, indexes can be used, as shown below:

explain select * from t1 order by id;


If the sorting operation cannot use the index, it can only be sorted (filesort) in memory (when there are fewer records) or on disk (when there are more records), as shown below:

explain select * from t1 order by content;

Using index

The overlay index is used, which means that direct access to the index is enough to obtain the required data, and there is no need to return to the table through the index

explain select id, content1 from t4;

explain select id from t1;

Using index condition

Simply put, check all the conditions first, and then go back to the table

It is called Index Condition Pushdown Optimization

  • If there is no index pushdown (ICP), MySQL finds the first secondary index record that meets the condition of content1 >'z'in the storage engine layer. The primary key value is returned to the table, and the complete record is returned to the server layer, which then determines whether other search conditions are true. If it is true, keep the record. Otherwise, skip the record and ask for the next record from the storage engine layer.
  • If index pushdown (ICP) is used, MySQL finds the first secondary index record that meets the condition of content1 >'z'in the storage engine layer. Don't rush to execute the return table, but judge all the information about IDX on this record first_ Whether the conditions contained in the content1 index are true, that is, whether content1 >'z'and content1 like'%a' are true. If these conditions are not true, skip the secondary index record directly and find the next secondary index record; If these conditions are true, perform a table return operation and return the complete record to the server layer.
-- content1 Index on column idx_content1
explain select * from t4 where content1 > 'z' and content1 like '%a';


Note that if the query condition here is only content1 >'z ', then an index push down operation will also be performed after finding the index that meets the condition to judge whether content1 >'z' is true (this is a redundant judgment made in the source code for programming convenience)

Using join buffer

In connection query, when the driven table cannot effectively use the index, MySQL will request a piece of memory space (join buffer) in advance to speed up the query

explain  select * from t1, t2 where t1.content = t2.content;

The following example shows that the driven table uses indexes:

explain select * from t_emp, t_dept where t_dept.id = t_emp.deptid;

Extracurricular reading: ` in order to optimize the multi table connection and reduce the disk IO reading times and data traversal times without index, MySQL provides us with many different connection cache optimization algorithms, which can be referred to
Link: https://blog.csdn.net/qq_35423190/article/details/120504960

  • Using join buffer (hash join)**8.0 new: * * connection cache (hash connection) is faster
  • Using join buffer (Block Nested Loop)5.7: connect cache (block nested loop)

Prepare data

A lot of data should be prepared before optimization. Next, create two tables and insert 50W data into the employee table and 1W data into the Department table.

How to quickly insert 50w pieces of data? stored procedure

How to ensure that the inserted data is not repeated? function

Department table:

  • id: self growth

  • deptName: random string. Repetition is allowed

  • address: random string. Repetition is allowed

  • CEO: any number between 1-50w

Employee table:

  • id: self growth
  • empno: random numbers or self increasing numbers starting from 1 can be used, and repetition is not allowed
  • name: randomly generated. Duplicate names are allowed
  • age: interval random number
  • deptId: random number between 1-1w

**Summary: * * functions that generate random strings and interval random numbers are required.

Create table

CREATE TABLE `dept` (
	`id` INT(11) NOT NULL AUTO_INCREMENT,
	`deptName` VARCHAR(30) DEFAULT NULL,
	`address` VARCHAR(40) DEFAULT NULL,
	ceo INT NULL ,
	PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1;

CREATE TABLE `emp` (
	`id` INT(11) NOT NULL AUTO_INCREMENT,
	`empno` INT NOT NULL ,
	`name` VARCHAR(20) DEFAULT NULL,
	`age` INT(3) DEFAULT NULL,
	`deptId` INT(11) DEFAULT NULL,
	PRIMARY KEY (`id`)
	#CONSTRAINT `fk_dept_id` FOREIGN KEY (`deptId`) REFERENCES `t_dept` (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1;

Create function

-- see mysql Allow function creation:
SHOW VARIABLES LIKE 'log_bin_trust_function_creators';
-- Command on: allow to create function settings:( global-All session All effective)
SET GLOBAL log_bin_trust_function_creators=1; 
-- Randomly generated string
DELIMITER $$
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN    
	DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFJHIJKLMNOPQRSTUVWXYZ';
	DECLARE return_str VARCHAR(255) DEFAULT '';
	DECLARE i INT DEFAULT 0;
	WHILE i < n DO  
		SET return_str =CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));  
		SET i = i + 1;
	END WHILE;
	RETURN return_str;
END $$

-- If you want to delete
-- drop function rand_string;
-- Used to randomly generate interval numbers
DELIMITER $$
CREATE FUNCTION rand_num (from_num INT ,to_num INT) RETURNS INT(11)
BEGIN   
 DECLARE i INT DEFAULT 0;  
 SET i = FLOOR(from_num +RAND()*(to_num -from_num+1));
RETURN i;  
END$$

-- If you want to delete
-- drop function rand_num;

Create stored procedure

-- Insert employee data
DELIMITER $$
CREATE PROCEDURE  insert_emp(START INT, max_num INT)
BEGIN  
	DECLARE i INT DEFAULT 0;   
	#set autocommit =0 set autocommit to 0  
	SET autocommit = 0;    
	REPEAT  
		SET i = i + 1;  
		INSERT INTO emp (empno, NAME, age, deptid ) VALUES ((START+i) ,rand_string(6), rand_num(30,50), rand_num(1,10000));  
		UNTIL i = max_num  
	END REPEAT;  
	COMMIT;  
END$$
 
-- delete
-- DELIMITER ;
-- drop PROCEDURE insert_emp;
-- Insert Department data
DELIMITER $$
CREATE PROCEDURE insert_dept(max_num INT)
BEGIN  
	DECLARE i INT DEFAULT 0;   
	SET autocommit = 0;    
	REPEAT  
		SET i = i + 1;  
		INSERT INTO dept ( deptname,address,ceo ) VALUES (rand_string(8),rand_string(10),rand_num(1,500000));  
		UNTIL i = max_num  
	END REPEAT;  
	COMMIT;  
END$$
 
-- delete
-- DELIMITER ;
-- drop PROCEDURE insert_dept;

Call stored procedure

-- Execute the stored procedure and go to dept Add 10000 pieces of data to the table
CALL insert_dept(10000); 

-- Execute the stored procedure and go to emp Add 500000 pieces of data to the table, and the number starts from 100000
CALL insert_emp(100000,500000); 

Batch delete table index

DELIMITER $$
CREATE  PROCEDURE `proc_drop_index`(dbname VARCHAR(200),tablename VARCHAR(200))
BEGIN
       DECLARE done INT DEFAULT 0;
       DECLARE ct INT DEFAULT 0;
       DECLARE _index VARCHAR(200) DEFAULT '';
       DECLARE _cur CURSOR FOR  SELECT   index_name   FROM information_schema.STATISTICS   WHERE table_schema=dbname AND table_name=tablename AND seq_in_index=1 AND    index_name <>'PRIMARY'  ;
#Each cursor must use a different declare continue handler for not found set done=1 to control the end of the cursor
       DECLARE  CONTINUE HANDLER FOR NOT FOUND set done=2 ;      
#If no data is returned, the program continues and sets the variable done to 2
        OPEN _cur;
        FETCH _cur INTO _index;
        WHILE  _index<>'' DO 
               SET @str = CONCAT("drop index " , _index , " on " , tablename ); 
               PREPARE sql_str FROM @str ;
               EXECUTE  sql_str;
               DEALLOCATE PREPARE sql_str;
               SET _index=''; 
               FETCH _cur INTO _index; 
        END WHILE;
   CLOSE _cur;
END$$

-- Perform batch deletion(Indexes): dbname Library name, tablename Table name
CALL proc_drop_index("dbname","tablename"); 

Enable the display of SQL execution time

In order to check the running time of SQL at any time in the following tests and test the effect of index optimization, we turn on profiling

-- display sql Statement execution time
SET profiling = 1;
SHOW VARIABLES  LIKE '%profiling%';
SHOW PROFILES;

Single table index failure cases

One of the most effective ways to improve performance in MySQL is to design reasonable indexes for data tables. Index provides an efficient way to access data and speed up the query, so index has a crucial impact on the speed of query.

After we create an index, whether we use it or not is ultimately up to the optimizer the final say. The optimizer will select the index based on the cost, and it will do whatever the cost is small. It is not based on rules or semantics.

In addition, whether the SQL statement uses indexes is related to the version of the database, the amount of data, and the degree of data selection (the number of columns selected in the query).

-- Create index 
create index idx_name on emp(`name`);

Index failure caused by calculation and function

-- Display query analysis
explain select * from emp where emp.name  like 'abc%';
explain select * from emp where left(emp.name,3) = 'abc'; --Index failure

LIKE starts with% index is invalid

explain select * from emp where name like '%ab%'; --Index failure


The index is based on the one on the left. If it's fuzzy, it can't be found

Expansion: Alibaba Java development manual**

[mandatory] page search is strictly prohibited from left blur or full blur. If necessary, please go to the search engine to solve it.

Not equal to (! = or < >) index failure

explain select sql_no_cache * from emp where emp.name = 'abc' ;
explain select sql_no_cache * from emp where emp.name <> 'abc' ; --Index failure


IS NOT NULL and IS NULL

explain select * from emp where emp.name is null;
explain select * from emp where emp.name is not null; --Index failure


Note: when the NULL value of the index column of the data in the database reaches a high proportion, the MySQL query optimizer will choose to use the index even in the case of IS NOT NULL. At this time, the value of type is range (range query)

-- take id>20000 Of data name Value changed to null 
update emp set `name` = null where `id` > 20000;

-- By performing query analysis, you can find is not null Index used
-- The specific number of records is null Can make the index in is not null It is determined by the algorithm of the query optimizer
explain select * from emp where emp.name is not null;


Change the value of name back after the test

update emp set `name` = rand_string(6) where `id` > 20000;

Index invalidation caused by type conversion

explain select * from emp where name='123';
explain select * from emp where name= 123; --Index failure

Full value matching (most commonly used)

get ready:

-- Delete the previously created index first 
call proc_drop_index("dudu","emp");

Question: which index is the most efficient to create for the following query statements

-- Query analysis
explain select * from emp where emp.age = 30 and deptid = 4 and emp.name = 'abcd';
-- implement sql
select * from emp where emp.age = 30 and deptid = 4 and emp.name = 'abcd';
-- View execution time
show profiles;

Create the index and re run the above test:

-- Create index: create one of the following three indexes respectively, and perform the above query analysis respectively 
create index idx_age_deptid_name on emp(age,deptid,`name`);

Conclusion it can be found that the most efficient query applies the joint index idx_age_deptid_name

Best left prefix rule

get ready:

-- Delete the previously created index first
call proc_drop_index("dudu","emp");
-- Create index
create index idx_age_deptid_name on emp(age,deptid,`name`);

Question: can the following SQL statements hit idx_age_deptid_name index, how many index fields can be matched

Test:

  • If multiple columns are indexed, the leftmost prefix rule should be observed. That is, the query starts at the top left of the index and does not skip the columns in the index.
  • If you want to use the index for the filter condition, you must meet it in order according to the order when the index is established. Once you skip a field, the fields behind the index cannot be used.
    `
explain select * from emp where emp.age=30 and emp.name = 'abcd' ;
-- explain result:
-- key_len: 5 Only used age Indexes
-- The order of index lookup is age,deptid,name,Query criteria do not contain deptid,unavailable deptid and name Indexes
explain select * from emp where emp.deptid=1 and emp.name = 'abcd';
-- explain result:
-- type:  all, Full table scan performed
-- key_len:  null, Index failure
-- The order of index lookup is age,deptid,name,Query criteria do not contain age,Cannot use the entire index
explain select * from emp where emp.age = 30 and emp.deptid=1 and emp.name = 'abcd';
-- explain result:
-- The order of index lookup is age,deptid,name,Match all index fields
explain select * from emp where emp.deptid=1 and emp.name = 'abcd' and emp.age = 30;
-- explain result:
-- The order of index lookup is age,deptid,name,Match all index fields

The column on the right of the range condition in the index is invalid

Put the condition on the far right

get ready:

-- Delete the previously created index first
CALL proc_drop_index("dudu","emp");

Question: which index is the most efficient to create for the following query statements

explain select * from emp where emp.age=30 and emp.deptid>1000 and emp.name = 'abc'; 

Test 1:

-- Create an index and perform the above sql Declarative explain
create index idx_age_deptid_name on emp(age,deptid,`name`);
-- key_len: 10, Just used age and deptid Indexes, name invalid

Note that when we modify the range condition of deptId, for example, if deptId>100, the entire index will be invalid. MySQL optimizer thinks it is unnecessary to use the index based on cost calculation, so it performs a full table scan.

Test 2:

-- Create an index and perform the above sql Declarative explain(take deptid Index last)
create index idx_age_name_deptid on emp(age,`name`,deptid);
-- Full index used

When the above two indexes exist, the MySQL optimizer will automatically select the best scheme

Association query optimization

Data preparation
Create two tables and insert 16 and 20 data respectively:

-- classification
CREATE TABLE IF NOT EXISTS `class` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`id`)
);
-- books
CREATE TABLE IF NOT EXISTS `book` (
`bookid` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`card` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`bookid`)
);
 
-- Insert 16 records
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO class(card) VALUES(FLOOR(1 + (RAND() * 20)));
 
-- Insert 20 records
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));
INSERT INTO book(card) VALUES(FLOOR(1 + (RAND() * 20)));

Association query optimization -------- left outer connection

The test before index creation was not conducted with full table scanning, and the number of queries was 16*20

explain select * from class left join book on class.card = book.card;
-- Left table class: Drive table, right table book: Driven table


Test 1: create an index on the drive table: a full index scan was performed, and the number of queries was 16*20

-- Create index
create index idx_class_card on class(card);

Test 2 create an index on the driven table: it can avoid full table scanning, and the number of queries is 16*1

-- Delete the previously created index first
call proc_drop_index("dudu","class");
-- Create index
create index idx_book_card on book(card);


Test 3: add indexes to two tables at the same time: make full use of indexes, and the number of queries is 16*1

-- Already book Indexes
create index idx_class_card on class(card);


Conclusion: the index should be created on the driven table, and the driven table should be as small as possible

Association query optimization -------- inner connection

The test turns the LEFT JOIN in the front external connection into an INNER JOIN

-- change into inner join
EXPLAIN SELECT * FROM class INNER JOIN book ON class.card=book.card;
-- exchange class and book Location of
EXPLAIN SELECT * FROM book INNER JOIN class ON class.card=book.card;

Select a table with a small amount of data as the driving table when there are indexes

When the class table has indexes, the book table is the driving table

When the book table has indexes, the class table is the driving table

Select a table with a small amount of data as the driving table when there is no index

Conclusion: it is found that even if the location of the table is exchanged, the MySQL optimizer will automatically select the driving table. The principle of automatically selecting the driving table is: the index is created on the driven table.

Association query optimization -------- expand the practice of the leader

-- Delete the previously created index first
call proc_drop_index("dudu","emp");
call proc_drop_index("dudu","dept");

Three table left connection mode:

-- Employee table(t_emp),Department table(t_dept),ceo(t_emp)Table Association query
explain select emp.name, ceo.name as ceoname
from emp
left join dept on emp.deptid = dept.id
left join emp ceo on dept.ceo = ceo.id;

The primary key index is used in one query, and the effect is the best

Sub query method:

explain select emp.name,(select emp.name from emp where emp.id = dept.ceo) as ceoname from emp left join dept on emp.deptid = dept.id;

The primary key index is used in two queries. Compared with the first one, the effect is slightly worse.

Connection mode of temporary table

explain select emp_with_ceo_id.name, emp.name as ceoname from(select
 emp.name, dept.ceo from emp left join dept on emp.deptid = dept.id) 
 emp_with_ceo_id left join emp on emp_with_ceo_id.ceo = emp.id;

After a query, the MySQL query optimizer converts the derived table query into the join table query, which is as fast as the first method

MySQL 5.5 query results are queried twice. First, query a and B to generate the derived table ab, which is used as the driven table and c as the driven table, and the primary key of table c is used. The effect is better than the latter one.

Temporary table connection mode 2

explain select emp.name, ceo.ceoname from emp left join(select emp.deptid
as deptid, emp.name as ceoname from emp inner join dept on emp.id = 
dept.ceo) ceo on emp.deptid = ceo.deptid;

After a query, the MySQL query optimizer converts the derived table query into the join table query, but only one table uses the index. The data retrieval times are slightly more, and the performance is the worst.

MySQL 5.5 query results are queried twice. First, query B and a to generate a derived table ab. as a driven table, the derived table cannot be indexed and cannot be optimized; Therefore, this kind of statement has the worst performance.

Association query optimization - Summary

  • Ensure that the index has been created in the JOIN field of the driven table
  • The data types of fields that need to be joined are absolutely consistent.
  • When LEFT JOIN, select the small table as the driven table and the large table as the driven table. Reduce the number of outer cycles.
  • When INNER JOIN, MySQL will automatically select the table of the small result set as the driving table. Choose to trust MySQL optimization strategy.
  • Those that can be directly associated with multiple tables should be directly associated as far as possible without sub queries. (reduce the number of queries)
  • It is not recommended to use sub query. It is recommended to split the sub query SQL and combine the program for multiple queries, or use JOIN to replace the sub query.
  • Derivative tables cannot be indexed

       

Tags: Database MySQL server

Posted by tanita on Sat, 06 Aug 2022 21:43:39 +0300