# 5.5 Regular Expression Learning Notes and Homework

## Understanding Regular Expressions

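###### Method 1: plain string checks (no regular expressions)
```python
def is_tel_num(tel_no: str):
    # Rules used here: 11 characters, starts with '1',
    # second digit is 3-9, and every character is a digit
    if len(tel_no) != 11:
        return False

    if tel_no[0] != '1':
        return False

    if tel_no[1] not in '3456789':
        return False

    return tel_no.isdigit()

tel = '13354627381'
print(is_tel_num(tel))   # True
```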
```python
from re import fullmatch, findall

# Method 2: the same rules written as one regular expression
def is_tel_num2(tel_no: str):
    return fullmatch(r'1[3-9]\d{9}', tel_no) is not None

tel = '13354622381'
print(is_tel_num2(tel))   # True
```
```python
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
# Goal: extract 234, 78, 2391, 23, 34

# Method 1: collect runs of consecutive digits by hand
all_num = []
num_str = ''
for ch in message:
    if ch.isdigit():
        num_str += ch
    else:
        all_num.append(num_str)
        num_str = ''
all_num.append(num_str)                    # flush the digits at the very end
all_num = [int(x) for x in all_num if x]   # drop the empty strings

print(all_num)   # [234, 78, 2391, 23, 34]
```
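```python
# Method 2: a single findall call (the results are strings, not ints)
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
all_num2 = findall(r'\d+', message)
print(all_num2)   # ['234', '78', '2391', '23', '34']
```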

## Character-matching symbols

```python
from re import fullmatch
```

### 1. re module

```python
"""
The re module is Python's module for working with regular expressions.
It provides regex functions such as fullmatch, search, findall, match, split, sub, etc.

fullmatch(pattern, string) - checks whether the ENTIRE string follows the rules described
by the regular expression; returns a Match object if it does and None if it does not.

How a regular expression is written in Python: r'pattern' (a raw string)
How a regular expression is written in JavaScript: /pattern/
"""
```
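The two notes above can be checked directly. This minimal sketch shows that a raw string passes the backslash in `\d` to the regex engine unchanged (the normal-string equivalent needs `\\d`), and that `fullmatch` returns either a `Match` object or `None`, so `is not None` turns the result into a plain boolean. The helper name `matches_whole` is made up for illustration.

```python
import re

# The raw string r'\d+' and the normal string '\\d+' are the same 3 characters
assert r'\d+' == '\\d+'
assert len(r'\d+') == 3

def matches_whole(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the WHOLE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches_whole(r'\d+', '2022'))    # True
print(matches_whole(r'\d+', '2022a'))   # False: the trailing 'a' is not a digit
print(re.fullmatch(r'\d+', '2022a'))    # None
```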

### 2. Matching symbols - each symbol matches one character from a certain class

#### 1) Ordinary characters - an ordinary character in a regex matches exactly that same character in the string

###### String requirement: exactly 3 characters, the first is 'a', the second is 'b', and the third is 'c'
```python
result = fullmatch(r'abc', 'abc')
print(result)   # <re.Match object; span=(0, 3), match='abc'>

result = fullmatch(r'abc', 'mnd')
print(result)   # None
```

#### 2) . - Matches any single character (except a newline, by default)

###### String requirement: exactly 3 characters, the first is 'a', the last is 'c', and the middle one can be anything
```python
result = fullmatch(r'a.c', 'a*c')
print(result)   # match='a*c'

result = fullmatch(r'..xy', 'hixy')
print(result)   # match='hixy'
```

#### 3) \d - Matches any digit

```python
result = fullmatch(r'a\dc', 'a2c')
print(result)   # match='a2c'

result = fullmatch(r'\d\d\d..', '823m!')
print(result)   # match='823m!'
```

#### 4) \s - Matches any whitespace character

###### Whitespace characters include spaces, \n, and \t (as well as \r, \f, \v and other Unicode whitespace)
```python
result = fullmatch(r'a\sb', 'a b')
print(result)   # match='a b'
```

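#### 5) \w - Matches any letter, digit, or underscore (for str patterns in Python 3 this also covers Unicode letters such as Chinese characters)

```python
result = fullmatch(r'a\wb', 'a3b')
print(result)   # match='a3b'
```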

#### 6) \D, \S, \W - an uppercase letter does the opposite of its lowercase counterpart

```python
r"""
\D  -  matches any character that is not a digit
\S  -  matches any character that is not whitespace
\W  -  matches any character that is not a letter, digit, or underscore
"""

result = fullmatch(r'a\Db', 'a2b')
print(result)   # None
result = fullmatch(r'a\Db', 'a)b')
print(result)   # match='a)b'
```
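A quick sketch of the whitespace and word-character classes described above, including the Unicode behaviour of `\w` for `str` patterns in Python 3 and the standard `re.ASCII` flag, which restricts `\w` to `[a-zA-Z0-9_]`. The sample strings are arbitrary.

```python
from re import fullmatch, findall, ASCII

# \s accepts a space, a tab, and a newline alike
print([fullmatch(r'a\sb', s) is not None for s in ['a b', 'a\tb', 'a\nb']])  # [True, True, True]

# For str patterns, \w is Unicode-aware: Chinese characters count as word characters
print(fullmatch(r'\w+', 'abc_123中文') is not None)   # True
# With the ASCII flag, \w is limited to [a-zA-Z0-9_]
print(fullmatch(r'\w+', '中文', flags=ASCII))          # None

# \W is the complement of \w
print(findall(r'\W', 'a-b c!'))                        # ['-', ' ', '!']
```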

#### 7) [Character Set] - Matches any character in the character set

###### Note: one [] matches exactly one character
```python
r"""
[ordinary characters]     - e.g. [abc12]: matches any one of 'a', 'b', 'c', '1', '2'
[with \-escaped classes]  - e.g. [mn\d], [m\dn], [\dmn]: all match 'm', 'n', or any digit
[char1-char2]             - e.g. [a-z]: any lowercase letter
                                 [a-zA-Z]: any letter
                                 [2-9a-z]: a digit from 2 to 9 or any lowercase letter
                                 [\u4e00-\u9fa5]: any (common) Chinese character
                                 [\u4e00-\u9fa5\dabc]: a Chinese character, a digit, or 'a', 'b', 'c'
Note: inside [], a '-' that is not between two characters is just an ordinary character.
"""
```
```python
result = fullmatch(r'1[xyzmn]2', '1n2')
print(result)   # match='1n2'

result = fullmatch(r'a[mn\d]b', 'a9b')
print(result)   # match='a9b'

result = fullmatch(r'1[a-z][a-z]2', '1hm2')
print(result)   # match='1hm2'

result = fullmatch(r'a[x\u4e00-\u9fa5\dy]b', 'ayb')
print(result)   # match='ayb'

result = fullmatch(r'1[-az]2', '1-2')
print(result)   # match='1-2': a '-' at the start of [] is literal
```

#### 8) [^Character Set] - Matches any character not in the character set

```python
result = fullmatch(r'1[^xyz]2', '1x2')
print(result)   # None: 'x' is in the excluded set

result = fullmatch(r'1[^\dab]2', '1M2')
print(result)   # match='1M2'

result = fullmatch(r'a[^2-9]b', 'a3b')
print(result)   # None
```
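Two details from the notes above are easy to trip over: one `[]` consumes exactly one character, and `^` only means "not" when it is the first character inside the brackets. This small sketch checks both with `fullmatch` on made-up inputs.

```python
from re import fullmatch

# One [] is one character: '1ab2' has TWO characters between '1' and '2'
print(fullmatch(r'1[ab]2', '1ab2'))       # None
print(fullmatch(r'1[ab][ab]2', '1ab2'))   # match='1ab2'

# '^' negates only at the start of the set; anywhere else it is a literal '^'
print(fullmatch(r'[a^]', '^'))   # match='^'
print(fullmatch(r'[^a]', '^'))   # match='^' ('^' is "not a")
print(fullmatch(r'[^a]', 'a'))   # None
```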

## Number of matches (quantifiers)

```python
from re import fullmatch, search
```

### 1. * - Matches 0 or more times (any number of times)

```python
r"""
a*      -  'a' any number of times
\d*     -  any number of digits
[abc]*  -  any number of characters, each one 'a', 'b', or 'c'
...
"""
result = fullmatch(r'a*b', 'aaaaaaab')
print(result)   # match='aaaaaaab'

result = fullmatch(r'\d*b', '21222b')
print(result)   # match='21222b'

result = fullmatch(r'[A-Z]*b', 'KDBb')
print(result)   # match='KDBb'
```

### 2. + - Match once or more (at least once)

```python
result = fullmatch(r'a+b', 'aaaaab')
print(result)   # match='aaaaab'
```

### 3. ? - Matches 0 or 1 time

```python
result = fullmatch(r'-?123', '-123')
print(result)   # match='-123'

result = fullmatch(r'[+-]?123', '+123')
print(result)   # match='+123'
```

### 4. {} - Repeat the preceding item a given number of times

```python
"""
{N}    -  exactly N times
{M,N}  -  M to N times
{M,}   -  at least M times
{,N}   -  at most N times

* == {0,}
+ == {1,}
? == {0,1}
"""
```
```python
result = fullmatch(r'\d{3}abc', '623abc')
print(result)   # match='623abc'

result = fullmatch(r'\d{2,5}abc', '2234abc')
print(result)   # match='2234abc'
```
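The equivalences listed above (`*` == `{0,}` and so on) can be checked mechanically: on a handful of sample strings, each shorthand and its brace form should accept and reject exactly the same inputs. The sample list is arbitrary, so this is a spot check rather than a proof.

```python
from re import fullmatch

samples = ['', 'a', 'aa', 'aaaaa', 'b', 'ab']
pairs = [(r'a*', r'a{0,}'), (r'a+', r'a{1,}'), (r'a?', r'a{0,1}')]

results = {}
for short, braces in pairs:
    # True if both patterns accept (or both reject) every sample
    results[short] = all(
        (fullmatch(short, s) is None) == (fullmatch(braces, s) is None)
        for s in samples
    )
print(results)   # {'a*': True, 'a+': True, 'a?': True}

# {,N} allows 0 to N repetitions
print(fullmatch(r'a{,2}', ''))     # match=''
print(fullmatch(r'a{,2}', 'aaa'))  # None
```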

### 5. Greedy and non-greedy

###### When the number of repetitions is not fixed, matching can be greedy or non-greedy; the default is greedy.
```python
"""
Quantifiers with a variable count: *, +, ?, {M,N}, {M,}, {,N}

Greedy vs. non-greedy: when the count is variable and the string can be matched with
several different counts, greedy matching takes the result with the MOST repetitions
(among the counts for which the overall match still succeeds); non-greedy matching
takes the result with the FEWEST repetitions.

Greedy:     *, +, ?, {M,N}, {M,}, {,N}
Non-greedy: *?, +?, ??, {M,N}?, {M,}?, {,N}?
"""
# fullmatch must consume the whole string, so even \d+? ends up matching all 5 digits
result = fullmatch(r'\d+?', '26373')
print(result)       # match='26373'

# search(pattern, string) - finds the first substring that matches the pattern
# Possible matches: '2' (1 digit), '26' (2), '263' (3), '2637' (4), '26373' (5)
result = search(r'\d+?', 'Water and electricity fee State 26373 sfdhgahj')
print(result)       # match='2': non-greedy takes the fewest digits

# Candidates starting at 'a': 'amnb' (.* covers 2 chars), 'amnbxnxb' (6), 'amnbxnxb Sheng Shi b' (18)
result = search(r'a.*b', 'Construction crew: amnbxnxb Sheng Shi b-2---==')
print(result)   # match='amnbxnxb Sheng Shi b'

result = search(r'a.*?b', 'Construction crew: amnbxnxb Sheng Shi b-2---==')
print(result)   # match='amnb'

# Non-greedy stops at the first '</p>', so only the first paragraph is captured
html = '<body><span>start!</span><p>Are you OK</p><a>Baidu</a><p>hello world!</p></body>'
result = search(r'<p>(.*?)</p>', html)
print(result, result.group(1))   # match='<p>Are you OK</p>', group(1) is 'Are you OK'

result = search(r'a.+?c', 'Mobile: amnc Hello c Office abc')
print(result)   # match='amnc'
```
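The `<p>` example above uses `search`, which stops at the first match. With `findall` the greedy/non-greedy difference is even easier to see: the greedy version swallows everything up to the LAST `</p>`. This is a toy string; for real HTML a proper parser (for example Python's `html.parser`) is usually more robust than a regular expression.

```python
from re import findall

html = '<p>one</p><p>two</p>'

# Greedy: .* runs to the LAST '</p>', so both paragraphs become one item
greedy = findall(r'<p>(.*)</p>', html)
# Non-greedy: .*? stops at the FIRST '</p>' each time, so each paragraph is its own item
lazy = findall(r'<p>(.*?)</p>', html)

print(greedy)   # ['one</p><p>two']
print(lazy)     # ['one', 'two']
```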

## Grouping and Branching

```python
from re import fullmatch, findall
```

### 1. Grouping - ()

#### 1) Treat part of a pattern as a whole

###### Match: the structure "two letters followed by two digits" repeated three times, e.g. 'mn78jh56lm89'
```python
result = fullmatch(r'[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d', 'mn78jh56lm89')
print(result)   # match='mn78jh56lm89'

# The same pattern, with the repeated part grouped and repeated by {3}
result = fullmatch(r'([a-zA-Z]{2}\d\d){3}', 'mn78jh56lm89')
print(result)   # match='mn78jh56lm89'
```
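#### 2) Repeat - \N matches the same text that group N matched

###### '23abc23' should match but '23abc56' should not: the last two digits must repeat the first two
```python
# Without a group, '23abc56' would also be accepted:
# result = fullmatch(r'\d\dabc\d\d', '23abc56')
# print(result)

result = fullmatch(r'(\d\d)abc\1', '23abc23')
print(result)   # match='23abc23'

# \2 repeats group 2, \1 repeats group 1, and \1{3} repeats group 1's text 3 times
result = fullmatch(r'(\d{3})([a-z]{2})-\2\1=\1{3}', '876nm-nm876=876876876')
print(result)   # match='876nm-nm876=876876876'

# Groups are numbered by their opening parenthesis, left to right:
# \1 = '34MNGbn', \2 = '34MNG', \3 = '34' (group 4 = 'bn')
result = fullmatch(r'(((\d{2})[A-Z]{3})([a-z]{2}))-\2-\1-\3', '34MNGbn-34MNG-34MNGbn-34')
print(result)   # match='34MNGbn-34MNG-34MNGbn-34'
```

#### 3) Capture - extract part of the match

###### With a group, findall returns only the text the group captured
```python
result = findall(r'[a-z]\d\d', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)   # ['d45', 'n89']

result = findall(r'[a-z](\d\d)', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)   # ['45', '89']
```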

### 2. Branch - |

###### Match: 'abc' followed by either two digits or two uppercase letters, e.g. 'abc34', 'abcKJ'
```python
result = fullmatch(r'abc\d\d|abc[A-Z]{2}', 'abc34')
print(result)   # match='abc34'

# The same idea with the shared 'abc' factored out; the group limits what | applies to
result = fullmatch(r'abc(\d\d|[A-Z]{2})', 'abcKL')
print(result)   # match='abcKL'
```
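Why the second pattern above needs parentheses: `|` has the lowest precedence, so without a group it splits the WHOLE pattern into two complete alternatives. This sketch compares both forms on a few sample inputs.

```python
from re import fullmatch

ungrouped = r'abc\d\d|[A-Z]{2}'   # means: 'abc' + 2 digits, OR just 2 uppercase letters
grouped = r'abc(\d\d|[A-Z]{2})'   # means: 'abc' + (2 digits OR 2 uppercase letters)

for text in ['abc34', 'abcKL', 'KL']:
    print(text, fullmatch(ungrouped, text) is not None, fullmatch(grouped, text) is not None)
# abc34 True True
# abcKL False True
# KL True False
```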

## Detection symbols (anchors) and escape symbols

```python
from re import fullmatch, findall, search
```

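### 1. Detection symbols (for awareness)

#### 1) \b - Checks whether the position is a word boundary

```python
r"""
Word boundary - a position between a word character (\w) and a non-word character,
for example next to whitespace or punctuation, or at the start or end of the string.
A detection symbol does not consume any characters; it only checks the position.
"""
result = findall(r'\b\d+\b', '23 Data 2367 skjj,89 Is 2039,Is the key 768 hsj,237 Long time no 79 ssjs 89')
print(result)   # ['23', '2367', '89', '2039', '768', '237', '79', '89']
```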

#### 2) ^ and \$ - Check for the start and the end of the string

```python
result = fullmatch(r'1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}', '13578237392')
print(result)   # match='13578237392'

# With search, ^ anchors the match to the start and $ to the end of the string
result = search(r'^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$', '13578237392')
print(result)   # match='13578237392'

# Phone number regex: ^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$
```
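A small sketch of what the anchors change for `search`, which otherwise looks anywhere in the string. It also shows one subtle difference from `fullmatch`: in Python, `$` also matches just before a final `'\n'`. The phone-number pattern here is the simpler `1[3-9]\d{9}` from the start of these notes.

```python
from re import search, fullmatch

text = 'tel 13578237392'

# search scans the whole string, so the number is found inside the text
print(search(r'1[3-9]\d{9}', text))     # match='13578237392'
# ^ pins the match to the start of the string, so it fails here
print(search(r'^1[3-9]\d{9}', text))    # None

# On a plain one-line string, search with ^...$ behaves like fullmatch
print(search(r'^1[3-9]\d{9}$', '13578237392') is not None)   # True
print(fullmatch(r'1[3-9]\d{9}', '13578237392') is not None)  # True

# One difference: $ also matches just before a final '\n'; fullmatch does not allow it
print(search(r'^\d+$', '123\n') is not None)    # True
print(fullmatch(r'\d+', '123\n') is not None)   # False
```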

### 2. Escape Symbols

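###### Match a decimal such as '2.3' (the '.' must be escaped as '\.')
```python
result = fullmatch(r'\d\.\d', '2.3')
print(result)   # match='2.3'

# Two digits, '+', two digits, e.g. '23+78'
result = fullmatch(r'\d\d\+\d\d', '65+23')
print(result)   # match='65+23'

# Two Chinese characters in parentheses, e.g. '(防护)' ("protection")
result = fullmatch(r'\([\u4e00-\u9fa5]{2}\)', '(防护)')
print(result)   # match='(防护)'

# A literal backslash followed by 'dabc'
result = fullmatch(r'\\dabc', r'\dabc')
print(result)   # matches the 5-character string \dabc

# 'mabc', 'nabc', or ']abc': a ']' inside [] must be escaped
result = fullmatch(r'[mn\]]abc', ']abc')
print(result)   # match=']abc'
```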
###### Supplement: symbols that have a special meaning on their own, such as +, *, ., ?, \$, ( and ), become ordinary characters inside [] (the exceptions are ^ at the start, - between two characters, ] and \)
```python
result = fullmatch(r'[.+*?$]ab[.]c', '+ab.c')
print(result)   # match='+ab.c'
```
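When a whole piece of text should be matched literally, escaping each special character by hand is error-prone. The standard library's `re.escape` adds the backslashes automatically; a small sketch with an arbitrary sample string:

```python
import re

literal = '2.3+(x)'
pattern = re.escape(literal)   # adds a backslash before each regex special character

print(pattern)                                     # 2\.3\+\(x\)
print(re.fullmatch(pattern, literal) is not None)  # True
print(re.fullmatch(pattern, '2x3+(x)'))            # None: the '.' now only matches a dot
```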

## re module

```python
import re
```

### 1. Common functions

```python
"""
1) re.fullmatch(pattern, string)         - matches the ENTIRE string; returns a Match object on success, None on failure
2) re.match(pattern, string)             - matches only at the beginning of the string; returns a Match object on success, None on failure
3) re.search(pattern, string)            - finds the first substring that matches; returns a Match object on success, None on failure
4) re.findall(pattern, string)           - finds all matching substrings; returns a list of the matched strings
5) re.finditer(pattern, string)          - finds all matching substrings; returns an iterator of Match objects
6) re.split(pattern, string)             - splits the string at every match; returns a list of strings
7) re.sub(pattern, string1, string2)     - replaces every match in string2 with string1
"""
```
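The first three functions differ only in WHERE the pattern has to match. A minimal side-by-side comparison on made-up strings:

```python
import re

text = '728 coffee 262'

print(re.fullmatch(r'\d{3}', text))        # None: the whole string is not just 3 digits
print(re.match(r'\d{3}', text))            # match='728': 3 digits at the very start
print(re.match(r'\d{3}', 'coffee 262'))    # None: match only looks at the start
print(re.search(r'\d{3}', 'coffee 262'))   # match='262': the first match anywhere
print(re.findall(r'\d{3}', text))          # ['728', '262']
```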

#### 1) re.fullmatch(pattern, string) - matches the entire string against the pattern; returns a Match object on success and None on failure

```python
result = re.fullmatch(r'\d{3}', '728')
print(result)   # match='728'
```

#### 2) re.match(pattern, string) - matches only at the beginning of the string; returns a Match object on success and None on failure

```python
result = re.match(r'\d{3}', '728 Coffee machine for the Conference of Accountants')
print(result)   # match='728'
```

#### 3) re.search(pattern, string) - finds the first substring that matches; returns a Match object on success and None on failure

```python
result = re.search(r'\d{3}', 'Coffee maker for 262 meeting of 728 CPA University')
print(result)   # match='262'
```

#### 4) re.findall(pattern, string) - finds all matching substrings and returns them as a list of strings

```python
result = re.findall(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(result)   # ['092', '262', '728']

result = re.findall(r'[a-z]\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # ['m262', 'k782']

# With one group, findall returns only what the group captured
result = re.findall(r'[a-z](\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # ['262', '782']

# With several groups, each item is a tuple of the groups
result = re.findall(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # [('m', '262'), ('k', '782')]
```

#### 5) re.finditer(pattern, string) - finds all matching substrings and returns an iterator of Match objects

```python
result = re.finditer(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(list(result))   # 3 Match objects: '092', '262', '728'

result = re.finditer(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(list(result))   # 2 Match objects: 'm262', 'k782'
```

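#### 6) re.split(pattern, string) - splits the string at every match and returns a list of strings

###### re.split(pattern, string, maxsplit=N) - split at most N times
```python
result = re.split(r'\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # ['Accountant University with Mobile Number ', ' m', ' Cafe ', '3 k', '3 machine']

message = 'Hello a World b Python a Regex b Split c Done'
result = re.split(r'[abc]', message)
print(result)   # ['Hello ', ' World ', ' Python ', ' Regex ', ' Split ', ' Done']

# Split at most 3 times (Python 3.13+ deprecates passing maxsplit positionally)
result = re.split(r'[abc]', message, maxsplit=3)
print(result)   # ['Hello ', ' World ', ' Python ', ' Regex b Split c Done']
```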

#### 7) re.sub(pattern, string1, string2) - replaces every match in string2 with string1

###### re.sub(pattern, string1, string2, count=N) - replace only the first N matches
```python
message = 'Hello a World b Python a Regex b Split c Done'
```
###### Replace every 'a', 'b', and 'c' with '++'
```python
# Without regex it takes one str.replace call per character:
# new_message = message.replace('a', '++')
# new_message = new_message.replace('b', '++')
# new_message = new_message.replace('c', '++')
# print(new_message)

result = re.sub(r'[abc]', '++', message)
print(result)   # Hello ++ World ++ Python ++ Regex ++ Split ++ Done

result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # Accountant University with Mobile Number 000 m000 Cafe 0000 k0000 machine

# Replace only the first 5 matches (Python 3.13+ deprecates passing count positionally)
result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine', count=5)
print(result)   # Accountant University with Mobile Number 000 m002 Cafe 0923 k7823 machine
```

### 2. Match objects

```python
result = re.search(r'([a-z]{2})-(\d{3})', 'Cell-phone number ag-728 Coffee Machine of 262 CPA Conference')
print(result)   # <re.Match object; span=(18, 24), match='ag-728'>

# 1) Get the matched strings
# a. The string matched by the whole pattern: match_object.group()
r1 = result.group()
print(r1)   # ag-728

# b. The string matched by group N: match_object.group(N)
r2 = result.group(1)
print(r2)   # ag

r3 = result.group(2)
print(r3)   # 728

# 2) Get the position of the match in the original string as (start, end)
r1 = result.span()
print(r1)   # (18, 24)

r2 = result.span(2)
print(r2)   # (21, 24)
```
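A few more standard `Match` methods that are handy next to `group()` and `span()`: `groups()` returns all groups at once, `group(1, 2)` returns several groups as a tuple, and `start()`/`end()` are the two halves of `span()`. The sample string is made up.

```python
import re

m = re.search(r'([a-z]{2})-(\d{3})', 'code ag-728 here')

print(m.groups())               # ('ag', '728'): every group as a tuple
print(m.group(1, 2))            # ('ag', '728'): several groups at once
print(m.start(), m.end())       # 5 11: the same numbers as m.span()
print(m.group(0) == m.group())  # True: group 0 is the whole match
```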

### 3. Flags

```python
# 1) Letting '.' match a newline (single-line mode)
r"""
By default '.' does NOT match '\n'.
Single-line mode lets '.' match '\n' too: flags=re.S, or (?s) at the start of the pattern.
(re.M / (?m) is a different flag: it makes ^ and $ match at the start and end of every line.)
"""
# Turn on single-line mode
result = re.fullmatch(r'a.c', 'a\nc', flags=re.S)
print(result)   # match='a\nc'

result = re.fullmatch(r'(?s)a.c', 'a\nc')
print(result)   # match='a\nc'

# 2) Ignore case
"""
By default matching is case-sensitive. When case is ignored, an uppercase letter also
matches the corresponding lowercase letter and vice versa.
How: flags=re.I, or (?i)
"""
result = re.fullmatch(r'abc', 'aBc', flags=re.I)
print(result)   # match='aBc'

result = re.fullmatch(r'(?i)12[a-z]', '12N')
print(result)   # match='12N'

# 3) Ignore case AND use single-line mode
# How: flags=re.I | re.S, or (?si)
result = re.fullmatch(r'abc.12', 'aBc\n12', flags=re.I | re.S)
print(result)

result = re.fullmatch(r'(?si)abc.12', 'aBc\n12')
print(result)
```
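`re.M` (multi-line mode) is about `^` and `$`, not about `.`: with it, `^` and `$` also match at the start and end of every line inside the string. A small sketch with made-up text:

```python
import re

text = '12 apples\n34 pears\n56 plums'

print(re.findall(r'^\d+', text))              # ['12']: ^ is only the start of the string
print(re.findall(r'^\d+', text, flags=re.M))  # ['12', '34', '56']: ^ after every '\n' too
print(re.findall(r'(?m)^\d+', text))          # ['12', '34', '56']: inline form of re.M
print(re.findall(r'\w+$', text, flags=re.M))  # ['apples', 'pears', 'plums']
```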

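### Homework: use regular expressions to solve the following

#### 1. Multiple choice (one or more options may be correct)

1. Regular expressions that exactly match both "(010)-62661617" and "01062661617" include (A, B, D)

- A. `r"\(?\d{3}\)?-?\d{8}"`
- B. `r"[0-9()-]+"`
- C. `r"[0-9(-)]*\d*"`
- D. `r"[(]?\d*[)-]*\d*"`

C does not work: inside `[]`, `(-)` is the range from '(' to ')', so '-' itself is not in the set and "(010)-62661617" cannot be fully matched.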

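2. Regular expressions that exactly match both "back" and "back-end" include (A, B, C, D)

- A. `r'\w{4}-\w{3}|\w{4}'`
- B. `r'\w{4}|\w{4}-\w{3}'`
- C. `r'\S+-\S+|\S+'`
- D. `r'\w*\b-\b\w*|\w*'`

All four work for a full match: when one branch of `|` covers only part of the string, the engine backtracks and tries the other branch.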

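3. Regular expressions that exactly match "go go" and "kitty kitty" but not "go kitty" include (A, D)

- A. `r'\b(\w+)\b\s+\1\b'`
- B. `r'\w{2,5}\s*\1'`
- C. `r'(\S+) \s+\1'`
- D. `r'(\S{2,5})\s{1,}\1'`

B refers to `\1` although it has no group, so it does not even compile; C requires at least two whitespace characters between the words (a literal space plus `\s+`).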

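4. Regular expressions that fully match "aab" but not "aaab" or "aaaab" include (B, C)

- A. `r"a*?b"`
- B. `r"a{,2}b"`
- C. `r"aa??b"`
- D. `r"aaa??b"`

A and D also fully match "aaab": `a*?` can still expand to three a's, and in D the optional third `a` is simply used.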
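Because every question is about exact matching, the options can be verified with `re.fullmatch`. The helper `check` below is a hypothetical grading aid, not part of the exercise: it lists the options whose pattern fully matches all the required strings and none of the forbidden ones, and it counts a pattern that fails to compile (such as a `\1` with no group) as wrong. Question 4's wording is read here as "fully matches 'aab' but not 'aaab' or 'aaaab'".

```python
import re

def check(options, must_match, must_not_match=()):
    """Hypothetical helper: letters of the options whose pattern fully matches
    every string in must_match and none of the strings in must_not_match."""
    correct = []
    for letter, pattern in options.items():
        try:
            ok = (all(re.fullmatch(pattern, s) for s in must_match)
                  and not any(re.fullmatch(pattern, s) for s in must_not_match))
        except re.error:   # e.g. \1 when the pattern has no group 1
            ok = False
        if ok:
            correct.append(letter)
    return correct

q1 = {'A': r'\(?\d{3}\)?-?\d{8}', 'B': r'[0-9()-]+',
      'C': r'[0-9(-)]*\d*', 'D': r'[(]?\d*[)-]*\d*'}
q2 = {'A': r'\w{4}-\w{3}|\w{4}', 'B': r'\w{4}|\w{4}-\w{3}',
      'C': r'\S+-\S+|\S+', 'D': r'\w*\b-\b\w*|\w*'}
q3 = {'A': r'\b(\w+)\b\s+\1\b', 'B': r'\w{2,5}\s*\1',
      'C': r'(\S+) \s+\1', 'D': r'(\S{2,5})\s{1,}\1'}
q4 = {'A': r'a*?b', 'B': r'a{,2}b', 'C': r'aa??b', 'D': r'aaa??b'}

print(check(q1, ['(010)-62661617', '01062661617']))       # ['A', 'B', 'D']
print(check(q2, ['back', 'back-end']))                   # ['A', 'B', 'C', 'D']
print(check(q3, ['go go', 'kitty kitty'], ['go kitty']))  # ['A', 'D']
print(check(q4, ['aab'], ['aaab', 'aaaab']))             # ['B', 'C']
```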

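#### 2. Programming questions

1. User name matching

Requirements: (1) a user name may contain only digits, letters, and underscores; (2) it cannot start with a digit; (3) its length is 6 to 16 characters

```python
from re import fullmatch

def is_username(name: str) -> bool:
    # First character: a letter or underscore (not a digit),
    # then 5-15 more letters, digits, or underscores -> 6-16 characters in total
    re_str = r'[a-zA-Z_][a-zA-Z0-9_]{5,15}'
    return fullmatch(re_str, name) is not None
```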

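2. Password matching

Requirements: (1) cannot contain the special characters !@#¥%^&*; (2) must start with a letter; (3) its length is 6 to 12 characters

```python
from re import fullmatch

def is_keyword(keyword: str) -> bool:
    # A letter first, then 5-11 characters that are not !@#¥%^&* -> 6-12 characters in total
    return fullmatch(r'[a-zA-Z][^!@#¥%^&*]{5,11}', keyword) is not None
```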
3. IPv4 address matching

Tip: an IPv4 address ranges from 0.0.0.0 to 255.255.255.255

```python
from re import fullmatch, findall
```
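The notes leave this exercise open. One possible sketch, assuming each of the four parts must be a decimal number from 0 to 255 written without leading zeros (so '01' is rejected): build a pattern for one part, then repeat "dot plus part" three times. The names `OCTET`, `IPV4`, and `is_ipv4` are illustrative.

```python
from re import fullmatch

# One part (octet), 0-255, no leading zeros: 250-255 | 200-249 | 100-199 | 0-99
# [0-9] is used instead of \d because \d also accepts non-ASCII digits in str patterns
OCTET = r'(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
# Four parts separated by dots; {{3}} inside an f-string produces a literal {3}
IPV4 = rf'{OCTET}(\.{OCTET}){{3}}'

def is_ipv4(ip: str) -> bool:
    return fullmatch(IPV4, ip) is not None

for ip in ['0.0.0.0', '192.168.1.1', '255.255.255.255', '256.1.1.1', '1.2.3', '01.2.3.4']:
    print(ip, is_ipv4(ip))
```

The first three addresses print True and the last three print False.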
4. Extract all numbers (positive or negative, integers or decimals) from the user's input and add them up

For example: "-3.14good87nice19bye" => -3.14 + 87 + 19 = 102.86

```python
from re import findall

def sum_(str1: str) -> float:
    # Optional sign, at least one digit, and an optional fractional part;
    # (?:...) is a non-capturing group, so findall returns the whole number
    re_str = r'[+-]?\d+(?:\.\d+)?'
    return sum(float(x) for x in findall(re_str, str1))

print(round(sum_('-3.14good87nice19bye'), 2))   # 102.86
```
5. Verify that the input consists only of Chinese characters

```python
from re import fullmatch

def is_chinese(chinese: str):
    # One or more characters in the common CJK range U+4E00-U+9FA5 (no space inside [])
    return fullmatch(r'[\u4e00-\u9fa5]+', chinese)
```
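6. Match integers or decimals (both positive and negative)

```python
from re import fullmatch

def is_numbers(numbers: str):
    # Optional sign; an integer part without leading zeros (or a single 0); optional decimal part
    return fullmatch(r'[+-]?([1-9]\d*|0)(\.\d+)?', numbers)
```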
7. Verify that the entered user name and QQ number are valid and print a matching message

Requirements:

- user name: letters, digits, or underscores only, 6 to 20 characters long
- QQ number: 5 to 12 digits, and the first digit cannot be 0

```python
from re import fullmatch

username = input('Enter a user name: ')
# Letters, digits, or underscores; 6-20 characters
result_name = fullmatch(r'[a-zA-Z0-9_]{6,20}', username)
if result_name:
    print('User name is valid')
else:
    print('Invalid user name')

qq = input('Enter a QQ number: ')
# 5-12 digits; the first digit is 1-9
keyword = r'[1-9]\d{4,11}'
result_qq = fullmatch(keyword, qq)
if result_qq:
    print('QQ number is valid')
else:
    print('Invalid QQ number')
```
8. Split a long string: print each line of a poem on its own

The poem (in English translation): 'Moonlight in front of the window, frost on the ground suspected. Look up at the moon, look down and think about home.'

```python
from re import split

poem = 'Bright moonlight in front of the window, frost on the ground is suspected. Raising my head, I see the moon so bright; withdrawing my eyes, my nostalgia comes around.'
# Split at a comma, semicolon, or period (ASCII or full-width) plus any following spaces
result = split(r'[,;.，；。]\s*', poem)
for x in result:
    if x:   # skip the empty string left after the final period
        print(x)
```

Tags: Python regex

Posted by kurtsu on Thu, 05 May 2022 19:12:51 +0300