Helping My Wife series: extracting keywords such as work order numbers and job numbers from customer service data

Regular expressions are a very powerful tool for matching strings. Most programming languages support them, and Python is no exception (its standard library provides the re module). With a regular expression it is easy to extract exactly what we want from a piece of text. This post describes a small script I wrote to reduce my wife's workload.

Roughly, a regular expression is matched like this:

1. The characters of the expression are compared with the characters of the text, one by one.
2. If every character matches, the match succeeds; as soon as one character fails to match, the attempt fails (a search then retries from the next position in the text). If the expression contains quantifiers such as *, + or {m,n}, the process is somewhat different: the engine may try several possible lengths and backtrack when a later part of the expression fails.
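The two steps can be sketched for the simplest case, a purely literal pattern anchored at the start of the text (as re.match is). This is only an illustration, not how Python's re engine is actually implemented, and literal_match is a hypothetical helper name. The last line uses the real re module to show how a quantifier changes things.

```python
import re


def literal_match(pattern: str, text: str) -> bool:
    """Compare a literal pattern with the start of text, character by character."""
    if len(text) < len(pattern):
        return False  # the text is too short to contain the whole pattern
    for p_char, t_char in zip(pattern, text):
        if p_char != t_char:
            return False  # one mismatching character makes this attempt fail
    return True  # every character matched


print(literal_match('HH11', 'HH111111'))  # True
print(literal_match('HH12', 'HH111111'))  # False

# With a quantifier the engine tries several lengths: \d+ first grabs all six
# digits, then gives one back so that the final literal '1' can still match.
print(re.match(r'HH\d+1', 'HH111111').group())  # HH111111
```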

The text itself is fairly simple: the task is to extract the work order serial number, the job number of the employee being complained about, and the reply content.

```python
import re

import xlrd3

# --- Test data: the second assignment overwrites the first ---
text = '''
: (Service Recovery )"Recording serial number:(202201111111111111111111)","Complained job number:(HH111111)","Responsibility attribution:It is the responsibility of the business representative for handling errors".
Recording content:The customer calls to turn off the data Internet function. Does the business representative say there is a password? After the transfer, the business representative said that he had helped close it, which was the responsibility of the business representative for handling errors
'''
text = '''202201111111111111111111 has been processed. See 2022022222222 for the processing results. There is no need to add filing supplementary comments after the processing. Li Si (60000) 2/3
2/3  11:43 Contact 1111 customer to explain that we have received the feedback and are in the process of further confirmation. We will contact them in time for follow-up progress, and the customer agrees. Zhang San (666666)'''

# Reply / responsibility categories that can appear in a record
categories = [
    'Error in reply',
    'The explanation is not up to standard',
    'The reminder is not in place',
    'Wrong reply',
    'Handling error',
    'It is not the responsibility of the customer service representative',
    'It is not the responsibility of the traffic representative',
    'It is not the responsibility of the business representative',
    'Service attitude',
]

# --- Test patterns ---
pattern1 = re.compile(r'[0-9a-zA-Z_]{20,30}')      # 20-30 letters, digits or underscores (serial number)
result1 = pattern1.findall(text)
# print(result1)
pattern2 = re.compile(r'[a-zA-Z]{2,4}[0-9]{4,8}')  # 2-4 letters followed by 4-8 digits (job number)
result2 = pattern2.findall(text)
pattern3 = re.compile('|'.join(categories))                               # bare category text
pattern4 = re.compile('|'.join('(' + c + ')' for c in categories))        # each category in its own group
pattern5 = re.compile('|'.join('[(](' + c + ')[)]' for c in categories))  # "(category)", text captured
pattern6 = re.compile('|'.join('[(]' + c + '[)]' for c in categories))    # "(category)", parentheses included

# --- Patterns used on the real data ---
# [(（] and [)）] accept either an ASCII or a full-width parenthesis
liushui = re.compile(r'[(（][0-9a-zA-Z_]{20,30}[)）]')      # work order serial number
gonghao = re.compile(r'[(（][a-zA-Z]{2,4}[0-9]{4,8}[)）]')  # job number
huifu = re.compile('|'.join('[(（]' + c + '[)）]' for c in categories))  # reply / responsibility category

# Open the Excel workbook and locate the worksheet by name
wb = xlrd3.open_workbook('20220330-4.5 Service quality work order data-(Total).xlsx')
sh = wb.sheet_by_name('Sheet1')
for i in range(1, sh.nrows):  # start at row 1, skipping the header row
    # The text to search is in column index 21 (column V); str() guards against numeric cells
    curvalue = str(sh.cell(i, 21).value)
    print('Row', i, '~', liushui.findall(curvalue), '~', gonghao.findall(curvalue), '~', huifu.findall(curvalue))
```
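The lists printed by liushui and gonghao still contain the surrounding parentheses. One possible variation, sketched here with hypothetical names (SERIAL, JOB_NO, extract_numbers) that are not part of the original script, wraps the number itself in a capturing group: when a pattern has exactly one group, findall returns only the text of that group. The first sample string is taken from the test data in the post; the second is made up to show a full-width parenthesis.

```python
import re

# Same character rules as liushui / gonghao, but the number is a capturing group,
# so findall returns the number without the parentheses.
SERIAL = re.compile(r'[(（]([0-9a-zA-Z_]{20,30})[)）]')
JOB_NO = re.compile(r'[(（]([a-zA-Z]{2,4}[0-9]{4,8})[)）]')


def extract_numbers(record: str) -> dict:
    """Return every serial number and job number found in one record."""
    return {
        'serial': SERIAL.findall(record),
        'job_no': JOB_NO.findall(record),
    }


record = '"Recording serial number:(202201111111111111111111)","Complained job number:(HH111111)"'
print(extract_numbers(record))
# {'serial': ['202201111111111111111111'], 'job_no': ['HH111111']}

print(extract_numbers('Job number（AB1234）'))
# {'serial': [], 'job_no': ['AB1234']}
```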

The first part of the script contains test cases; the second part is the program that actually runs on the real data. Regular expressions are still hard to understand, but they were enough to solve this problem.

The screenshot of the output was removed at my wife's request.

I am only a beginner with regular expressions, so there are surely better and faster expressions.

There is also plenty of room for improvement, such as doing the processing directly in Excel and building a UI.

Later I may try building a UI and giving it to my wife so she can process the data herself.

Baidu Encyclopedia (Baidu Baike) has many regular expression examples worth learning from:

1. Verify a user name or password: "^[a-zA-Z]\w{5,15}$". Correct format: letters, digits and underscores, starting with a letter, 6 to 16 characters in total;

2. Verify a phone number: "^(\d{3,4}-)\d{7,8}$". Correct format: a 3- or 4-digit area code, a hyphen, then a 7- or 8-digit number (XXX-XXXXXXX or XXXX-XXXXXXXX);

3. Verify a mobile phone number (including virtual numbers and new number ranges): "^1([38][0-9]|4[5-9]|5[0-35-9]|66|7[0-8]|9[89])[0-9]{8}$";

4. Verify an ID card number, 15 digits: "^\d{15}$"; 18 digits: "^\d{17}(\d|X|x)$";

5. Verify an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$";

6. Input may only contain digits and the 26 English letters: "^[A-Za-z0-9]+$";

7. An integer or a decimal: "^[0-9]+([.][0-9]+){0,1}$"

8. Input may only contain digits: "^[0-9]*$".

9. Exactly n digits: "^\d{n}$".

10. At least n digits: "^\d{n,}$".

11. Between m and n digits: "^\d{m,n}$".

12. Zero, or a number that does not start with zero: "^(0|[1-9][0-9]*)$".

13. A positive real number, optionally with exactly two decimal places: "^[0-9]+(\.[0-9]{2})?$".

14. A positive real number, optionally with 1 to 3 decimal places: "^[0-9]+(\.[0-9]{1,3})?$".

15. A non-zero positive integer: "^\+?[1-9][0-9]*$".

16. A non-zero negative integer: "^\-[1-9][0-9]*$".

17. Exactly 3 characters: "^.{3}$".

18. Input may only contain the 26 English letters: "^[A-Za-z]+$".

19. Input may only contain the 26 uppercase English letters: "^[A-Z]+$".

20. Input may only contain the 26 lowercase English letters: "^[a-z]+$".

21. Check whether a string contains any of the characters ^%&';=?: "[%&';=?^]".

22. Input may only contain Chinese characters: "^[\u4e00-\u9fa5]{0,}$".

23. Verify a URL: "^http://([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?$".

24. Verify the 12 months of a year: "^(0?[1-9]|1[0-2])$". Correct format: "01" to "09" (or "1" to "9") and "10" to "12".

25. Verify the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". Correct format: "01" to "09" (or "1" to "9"), "10" to "29", "30" and "31".

26. Match a date: \d{4}[年\-.]\d{1,2}[月\-.]\d{1,2}日? (年, 月 and 日 are the Chinese characters for year, month and day, so this matches both 2022年5月19日 and 2022-05-19)

Comment: it can be used to match most year-month-day information.

27. Match double-byte characters (including Chinese characters): [^\x00-\xff]

Comment: can be used to calculate the display length of a string (a double-byte character counts as 2, an ASCII character as 1).

28. Match blank lines: \n\s*\r

Comment: can be used to delete blank lines.

29. Match HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />

Comment: the versions circulating online are poor. This one only handles some cases and still cannot deal with complex nested tags.

30. Match leading and trailing whitespace: ^\s*|\s*$

Comment: can be used to remove whitespace at the beginning and end of a line (spaces, tabs, form feeds, etc.); a very useful expression.

31. Match a URL: [a-zA-Z]+://[^\s]*

Comment: the versions circulating online are very limited; this one basically meets most needs.

32. Check whether an account name is valid (starts with a letter, 5 to 16 characters, letters, digits and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$

Comment: very practical for form validation.

33. Match a Tencent QQ number: [1-9][0-9]{4,}

Comment: Tencent QQ numbers start from 10000.

34. Match a Chinese postal code: [1-9]\d{5}(?!\d)

Comment: Chinese postal codes are 6 digits.

35. Match an IP address: (\d{1,3}\.){3}\d{1,3}

Comment: useful when extracting IP addresses (it does not check that each part is at most 255).

36. Match a MAC address: ([A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2}

Posted by evildarren on Thu, 19 May 2022 10:11:46 +0300