# Java statistics new customers

Last week, I did a task of order data statistics, which counted the number of new customers. This paper makes a record and sorting of the problem-solving process.

### Definition of new customer

New customer means that there are orders in the selected time period and no orders before the time period.

For example, the following order data:

time slotBefore February 1February 1 ~ March 1
customerA,B,CA,D,E

Before February 1, three enterprises a, B and C placed orders, while from February 1 to March 1, enterprises a, D and e placed orders. Find customers that exist from February 1 to March 1 but do not exist before February 1, that is, enterprises D and E are new customers.

Order form t_order has the following fields:
id, order No. order_sn, salesman sales, customer company, order time_ time

#### Count the number of new customers in a certain period of time (difficulty: simple)

For example, for the statistics of new customers from February 1 to March 1, the start time and end time of the time period are expressed by begin and end respectively.

First, count the number of customers before February 1, and use group by for reprocessing:

`select company from t_order where order_time < begin group by company`

Then count the number of customers from February 1 to March 1:

`select company from t_order where order_time >= begin and order_time <= end group by company`

New customers exist from February 1 to March 1, but not before February 1. That is, remove customers before February 1 from February 1 to March 1, and integrate the above two sql statements to obtain the following sql:

```select count(*) from
(select company from t_order where order_time >= begin and order_time <= end group by company)  where company  not in
(select company from t_order where order_time < begin group by company)```

### Number of new customers with medium difficulty

Based on the above, add more breakdown statistics of business representatives, use customers as groups, and count the customers before the time period first:

`select  company from t_order where order_time < begin group by  company`

Then, query the order customers within the time period, and use the salesperson and customer to group:

`select company,sales from t_order where order_time >= begin and order_time <= end group by company,sales`

The above figure shows the customers in the time period and before the time period. The same customers use associated connections. Among them, the new customer is not related, that is, C is the new customer. The number of new customers of each business can be obtained by using two queries as connection queries and then using business representatives as grouping queries:

```select  toi1.sales,
sum( if(toi1.company is not null and toi2.company is null,1,0)) as new_customer
from
(select company,sales from t_order where order_time >= begin and order_time <= end group by company,sales)
toi1 left join
(select  company from t_order where order_time < begin group by  company)  toi2 on toi1.company = toi2.company
group by toi1.sales ```

## Count the new customers every day or every month in the time period (difficulty: difficulty)

The above two queries exclude the data before the time period based on the statistics of customers in the time period. The daily or monthly statistics need to be compared with the previous ones every day. Here are two solutions.

### Scheme I:

#### Step 1: count the daily or monthly customers in the time period

Group customers_ Concat splicing:

```select substring(order_time,1,#{subTime}) as order_time,group_concat(distinct(company)) as companys
from  t_order where order_time >= begin and order_time <= end
group by substring(order_time,1,#{subTime})```

#### Step 2: count the customers before each day

Each day needs to be compared with the previous data. First query the daily customer collection, traverse the daily data, and then query the previous data. If the customer on the same day is not the previous customer, it is a new customer. Because the query needs to be queried many times, the query time will be very long.

For example, query new customers from February 1 to February 3:

|Date | company assembly|
| :-----:| :----: |
|February 1 | A,B|
|February 2 | B,D|
|February 3 | C,E|

There are three pieces of data above, which need to be queried three times. If the time period is relatively long, the query takes longer.

Later, I think of using union all combined query. On the basis of the above query, I use foreach to traverse each piece of data, and each data will query the customer set of data forward:

```     <foreach collection="list" item="list" separator=" UNION ALL ">
select #{list.order_time} as order_time,group_concat(distinct (company )) as companys from
t_order_info
where order_type=1 and amount>0 and finish_subtype not in (3,6)
and substring(order_time,1,#{subTime}) < #{list.order_time}
and company in
<foreach collection="list.companys" item="company" open="(" close=")" separator=",">
#{company}
</foreach>
</foreach>```

The above sql should actually be in the following format:

```select order_time,company from t_order
union all
select order_time,company from t_order
union all
select order_time,company from t_order
```

Joint query using union all is much faster.

#### Step 3: remove the set of step 2 from the set of step 1

For the data contained in the time period, remove the previous collection, that is, the data of new customers.

group_concat splices characters, which is incomplete because it exceeds group_concat_max_len value, which is 1024 by default. You can increase this value.

The following is the customer collection before February 1 and every day from February 1 to February 3:

|Date before February 1 February 1 February 2 February 3|
|:----:|:----:|:----:|:----:|:----:|
|Company | A,B |C|A,D|C,D|

#### analysis

First, look at the data on February 1. Customer C does not exist before February 1, so the new customer on February 1 is C.

Then look at February 2, to find the data before February 2.

Before February 2 is before February 1 + February 1

Therefore, the data before February 2 does not need to be queried in the database, and the previous data is accumulated.

#### Solution

Use set to store data. First put the data before February 1 into set

Before February 1, A and B are put into the set, and the new customers that do not exist in the set.

First of all, C on February 1 is not in the set, so the new customer on February 1 is C. Then add C to the set.

On February 2, A is in the set and D is not in the set, so the new customer on February 2 is d. Add d to the collection.

Both C and D in February 3 exist in the collection, so there are no new customers on February 3.

If you think the article is helpful to you, please praise it!

Tags: Java

Posted by t0ta11 on Thu, 19 May 2022 06:58:19 +0300