[interview question] there are two tables of "login log" and "login log" in the data background of a game.

"Login log" records the login time of each player and the role level at the time of login.

"Log out" records the log out time of each player and the role level at the time of log out.

The "character id" field uniquely identifies the player.

The log of role login and logout two days before the opening of the game service (2022-08-13 to 2022-08-14) is as follows:

Players can log in and out of the game many times a day. Please use SQL to analyze the following business problems:

The level stagnation rate of players on the first day of service opening (August 13, 2022).

(level stagnation rate = the number of characters staying at this level / the total number of people who have reached this level; if the player does not log out of the log, the level information of the login log will be used.)

[problem solving ideas]

Level stagnation rate = the number of characters staying at this level / the total number of people who have reached this level.

The number of characters at each stay level has been calculated in question 2. Now let's calculate the total number of people at each level.

Because the question indicates that if the player does not log out, the level information of the login log will be used. Therefore, we still connect the "login log" and "login log" vertically.

The linked table records all the character level information of the player, and the information that has reached the stay level must be in the table. Therefore, we first calculate the total number of people who have reached each level in this table, and then screen out the total number of people who have reached the stay level.

Calculating the total number of people reaching each level is also divided into two steps:

The first step is to group all levels;

Step 2: after grouping, calculate the number of roles at each level.

It can be seen that this is still a problem of grouping and summarizing. The group by clause is used to combine the count() function for calculation.

It should be noted that if a player logs in and out, the login and logout information is fully recorded in the "login log" and "logout log".

Then, the level under the login time in the login log will be the same as that under the login time immediately after the login time in the login log, because the level at the time of login will be the level of the next login.

In this way, the same player in the linked table will have repeated levels at different times. When calculating the number of people under this level by using the count() function, the player will be counted once more.

In this case, we need to de duplicate the role id (using the distinct keyword) to avoid double counting the number of people:

copycount(distinct role id)

Writing method of complete SQL:

copyselect Role level,count(distinct role id) as Total roles from (select date,role id,login time as time,Role level from Login log where date = '2022-08-13' union all select date,role id,Logout time as time,Role level from Log out where date = '2022-08-13') as a group by Role level;

The query results are as follows:

It can be seen that the total number of people who reached each stay level on the first day of service (August 13, 2022) is in the above results.

Now let's screen the total number of people under each stay level.

The residence level is derived from the results calculated in question 2, i.e. the following table:

We need to filter according to the stay level in this table. How to filter?

You can use the left join join join to filter.

left join is used to join two tables horizontally. When joining, the left table is the main table and all rows of the left table are returned, even if there is no match in the right table.

Here, we set the table of the above stay level distribution (set as temporary table c) as the left table, and the table containing the total number of roles under all role levels (set as temporary table d) as the right table.

The left and right tables are matched by level, so after using left join to vertically join the left and right tables, only the role level existing in the left table will be retained, and the role level not existing in the left table will be eliminated.

In this way, we can filter out the total number of roles belonging to the stay level in table d.

Writing method of SQL to realize the above connection:

copyselect c.*,d.* from c left join d on c.Stay level = d.Role level;

It can be seen that the number of roles staying at this level and the total number of people who have reached this level (the total number of roles) already exist in the table after left join connection, and the level stagnation rate can be calculated directly (the number of roles / the total number of roles is the level stagnation rate).

Then, after joining with left join, we can directly take the stay level in table c and the number of roles in table c, and divide the total number of roles in table d. There is no need to take out all columns in table c and table D.

Writing method of SQL:

copyselect c.Stay level,c.Number of roles/d.Total roles as Grade stagnation rate from c left join d on c.Stay level = d.Role level;

Now let's bring in the contents of temporary table c and temporary table d. In order to make the SQL statement easier to read, we use the with... as statement to define temporary table c and temporary table d (the with... as statement can encapsulate the sub query of a complete set of SQL statements to make the statement clearer and easier to understand).

Temporary table c is the query result of question 2, and temporary table d is the query result of calculating the total number of people at each level. Then there are:

copywith c as( select Role level as Stay level,count(role id) as Number of roles from (select *,rank() over(partition by role id order by time desc ) as ranking from (select date,role id,login time as time,Role level from Login log where date = '2022-08-13' union all select date,role id,Logout time as time,Role level from Log out where date = '2022-08-13') as a ) as b where ranking = 1 group by Role level order by Role level asc), #Details of provisional table c d as( select Role level,count(distinct role id) as Total roles from (select date,role id,login time as time,Role level from Login log where date = '2022-08-13' union all select date,role id,Logout time as time,Role level from Log out where date = '2022-08-13') as a group by Role level) #Specific contents of provisional table d

If it is brought into the statement of left join connection, the writing method of complete SQL for calculating the stagnation rate of each stay level is as follows:

copywith c as( select Role level as Stay level,count(role id) as Number of roles from (select *,rank() over(partition by role id order by time desc ) as ranking from (select date,role id,login time as time,Role level from Login log where date = '2022-08-13' union all select date,role id,Logout time as time,Role level from Log out where date = '2022-08-13') as a ) as b where ranking = 1 group by Role level order by Role level asc), #Details of provisional table c d as( select Role level,count(distinct role id) as Total roles from (select date,role id,login time as time,Role level from Login log where date = '2022-08-13' union all select date,role id,Logout time as time,Role level from Log out where date = '2022-08-13') as a group by Role level) #Specific contents of provisional table d select c.Stay level,c.Number of roles/d.Total roles as Grade stagnation rate from c left join d on c.Stay level = d.Role level;

The query results are as follows: