MySQL 8.0 window function ranking and topN problems
I've been using mysql5 7. Unexpectedly, a very useful function - window function is added to 8.0. Here, the window function, its usage and problems that can be solved are summarized as follows
what
Window function --- also known as OLAP function --- online analytical processing
The window function is moved from Oracle. It seems that Oracle's acquisition of mysql is also beneficial
It mainly solves the problems of ranking and topN in business
how
Basic syntax:
Window function name() over (partition by Group column names order by Sort column name) as ranking
(as ranking is a common naming method. Of course, you can use other names)
Classification:
-
Some of MySQL's original aggregate functions can also be used in the syntax of window functions as window functions, such as sum(),avg(),max(),min(),count(), etc
This paper mainly discusses the serial number function combined with the problem
-
Q: "can you filter the results returned by the window function directly?"
Answer: "because the occurrence position (order) of the window function is after where and group by, so it can't" (please note that what is said here is "direct". Of course, you can take the result of the window function as a temporary table and then select...where...)
Difference from group by (important)
Number of rows
For example, the table math has the following fields: class, student ID and score of each student in a certain subject. Let's not rank first, but count the number of people in each class first
If used
select class, count(stu_id) from math group by class order by class # So the result is class count(stu_id) class1 3 class2 1 class3 2
In other words, using group by may change the number of rows in the table - one row shows one group
If you use the window function - the number of rows in the original table is the number of rows
select class, count(stu_id) over (partition by class order by class) as stu_num from class # The result is class stu_num class1 3 class1 3 class1 3 class2 1 class3 2 class3 2
Note: Parton by grouping will not reduce the number of rows in the original table
Hierarchical aggregate function
sum, avg, count, etc. can also be used in the window function, which aggregates the data of each row and above, such as:
# Use sum as the result of aggregation function --- hierarchical sum of the total scores of students in each class class stu_id Score 1 001 100 1 002 200 1 003 300 2 004 100 2 005 200 # If you use the traditional group by, there are only class and sum(score)
Ordinal function
For the ranking of grades and rankings, there will be different ranking methods in life, such as how to rank those with the same score. So there are different serial number functions: if you have the following results
table marks: stu_id marks 1 5 2 5 3 4 4 2 5 1
dense_rank() -- the same score and rank, not the next rank. This is often used in daily life
select stu_id, dense_rank() over (order by marks) as dense_ranking from marks # The result is stu_id marks ranking 1 5 1 2 5 1 3 4 2 4 2 3 5 1 4
rank() -- the same score and rank, but it will "occupy the rank". Use the above data:
select stu_id, rank() over (order by marks) as ranking from marks # The result is stu_id marks ranking 1 5 1 2 5 1 3 4 3 4 2 4 5 1 5
row_number() -- the same score, different ranking, also ranking, as the name suggests, just like the line number
select stu_id, row_number() over (order by marks) as row_number from marks # The result is stu_id marks ranking 1 5 1 2 5 2 3 4 3 4 2 4 5 1 5
Ranking problem
https://leetcode-cn.com/problems/rank-scores/
(mysql 8)
select a.Score as Score, (select count(distinct b.Score) from Scores as b where b.Score>=a.Score) as `Rank` from Scores as a order by Score desc;
(conventional solution)
select a.Score as Score, (select count(b.score) from Scores as b where b.Score>=a.Score) as `Rank` from Scores as a order by Score desc;
Idea: for each score, the number of scores greater than the score in the table is the ranking of scores
topN problem
Suppose there is a table Scores
Stu_id subject score 1 math 99 2 chinese 100 1 english 66 2 math 80 2 english 80 1 chinese 98
We want to know that each student's best two courses
select *, row_number() over (partition by Stu_id order by score) as ranking from Scores where ranking<2; Stu_id subject score ranking 1 math 99 1 1 chinese 98 2 1 english 66 3 2 chinese 100 1 2 math 80 2 2 english 80 3 # Filtering by ranking < 2