Big data development Hive common date function & & sql routine of date continuous questions

In the front is a summary of common date functions, followed by a continuous date sql problem and its solution routine.

1. Current date and time

select current_timestamp
-- 2020-12-05 19:16:29.284 

2. Get the current date, which is 2020-12-05

SELECT current_date; 
## OR 
SELECT current_date(); 
-- 2020-12-05 

3. Obtain the timestamp under unix system

SELECT UNIX_TIMESTAMP();
-- 1524884881 

4. The current is 2020-12-05

select substr(current_timestamp, 0, 10);
-- current_timestamp 

5. The current is 2020-12-05

select date_sub(current_date, 1);
--2020-12-04 

6. Yyyy MM DD HH: mm: SS interception date

select to_date("2017-10-22 10:10:10");
-- 2017-10-22 
select date_format("2017-10-22" "yyyy-MM")
-- 2017-10 

7. Days difference between two dates

select datediff("2017-10-22", "2017-10-12");
-- 10

select datediff("2017-10-22 10:10:10", "2017-10-12 23:10:10");

-- 10

select datediff("2017-10-22 01:10:10", "2017-10-12 23:10:10");

-- 10 

8. Time interception

select from_unixtime(cast(substr("1504684212155", 0,10) as int)) dt;
-- 2017-09-06 15:50:12 

9. Timestamp to date

Syntax: to_date(string timestamp)

select to_date(from_unixtime(UNIX_TIMESTAMP()));
-- 2018-04-28

select FROM_UNIXTIME(UNIX_TIMESTAMP(),'yyyy-MM-dd 10:30:00');

-- 2018-04-28 10:30:00

select concat(date_sub(current_date,1),' 20:30:00');

-- 2018-04-27 20:30:00

-- hive version 1.2.0

select date_format(date_sub(current_date,1),'yyyy-MM-dd 20:30:00'); 

10. Date increase

Note: only two original date formats are supported: yyyy MM DD yyyy MM DD HH: mm: SS

Otherwise, date is required_ Format to turn

date_add
next_day 

11. Additional questions

There is an active member table. The daily partition dimension is the member ID, which can be used with device_ ID instead, ask how to calculate the number of active members for three consecutive days in the last seven days. The structure of the table (dws.dws_member_start_day) is as follows (dt is the partition, the date format is yyyy MM DD, and each partition has a unique device_id):

device_id             string                                                                      
dt                    string                

Solution routine

1. First think about the date functions DateDiff and date that can be used_ sub/date_ add

2. A ranking function is used for continuous dates and continuous problems, but the value of the ranking function is a numerical value, which can be easily grouped only if it is mapped to the continuity of the date. For example, you can map the date to a continuous number or the number to a continuous date. The operations to achieve these two are through the previous datedff and date_ The principle of sub combination is that the continuous integer can be obtained by subtracting the date from the date, and the continuous date can be obtained by subtracting the integer from a date, where date_sub can be a continuous date obtained by reverse sorting.

3. This problem can be solved by subtracting successive sorting dates or sorting IDS, and then grouping

1. Add a column of sorting sequence number on the basis of the original table

SELECT device_id,
       dt,
       row_number() over(PARTITION BY device_id
                         ORDER BY dt) ro
FROM dws.dws_member_start_day

2. Convert the serial number to a continuous date, or convert the date to a continuous number, and then it becomes gid

-- 2.1 Convert serial number to continuous date
SELECT device_id,
    dt,
    datediff(dt, date_add('2020-07-20', row_number() over(PARTITION BY device_id
        ORDER BY dt))) gid
FROM dws.dws_member_start_day 

-- 2.2 Convert date to serial number
SELECT device_id,
    dt,
    (datediff(dt, '2020-07-21') - row_number() over(PARTITION BY device_id
        ORDER BY dt)) gid
FROM dws.dws_member_start_day 

3. Group screening

SELECT device_id,count(1)
FROM
    (SELECT device_id,
        dt,
        datediff(dt, date_add('2020-07-20', row_number() over(PARTITION BY device_id
            ORDER BY dt))) gid
        FROM dws.dws_member_start_day
        WHERE datediff(dt, CURRENT_DATE) BETWEEN -7 AND 7 ) tmp
GROUP BY device_id,
    gid
HAVING count(1) < 3  

Wu Xie, the third master, is a rookie in the field of big data and artificial intelligence.
Please pay more attention

Tags: Big Data

Posted by jk11uk on Tue, 03 May 2022 06:12:50 +0300