5 Regular expression learning notes and assignments

5.5 Regular Expression Learning Notes and Homework

Understanding Regular Expressions

Regular expression: A tool for solving string problems (a tool that makes complex string problems easy)
Question: check whether an entered mobile number is valid.
abc - invalid
123 - invalid
12345678901 - invalid
13354627381 - valid
1335462738189 - invalid
Method 1
def is_tel_num(tel_no: str):
    if len(tel_no) != 11:
        return False

    if tel_no[0] != '1':
        return False

    if tel_no[1] not in '3456789':
        return False

    return tel_no.isdigit()


tel = '13354627381'
print(is_tel_num(tel))
from re import fullmatch,findall


# Method 2:
def is_tel_num2(tel_no: str):
    return fullmatch(r'1[3-9]\d{9}', tel_no) is not None


tel = '13354622381'
print(is_tel_num2(tel))
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
# Goal: extract every number from message -> 234, 78, 2391, 23, 34

# Method 1:
all_num = []
num_str = ''
for index in range(len(message)):
    if message[index].isdigit():
        num_str += message[index]
    else:
        all_num.append(num_str)
        num_str = ''
all_num.append(num_str)
all_num = [int(x) for x in all_num if x]

print(all_num)
# Method 2:
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
all_num2 = findall(r'\d+', message)
print(all_num2)
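
Note that findall returns the matches as strings, whereas Method 1 builds a list of integers. A minimal sketch of making the two results comparable, using a shortened sample string that is an assumption for illustration and not taken from the notes:

```python
from re import findall

# Shortened sample text (hypothetical, for illustration only)
sample = '234 data jssh 78 and 2391 more 23 sshs34'

as_strings = findall(r'\d+', sample)       # every run of digits, as str
as_ints = [int(x) for x in as_strings]     # convert so it lines up with Method 1

print(as_strings)   # ['234', '78', '2391', '23', '34']
print(as_ints)      # [234, 78, 2391, 23, 34]
```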

Matching class symbols

from re import fullmatch

1. re module

"""
The re module is Python's built-in module for working with regular expressions.
It provides the regex-related functions: fullmatch, search, findall, match, split, sub, and so on.

fullmatch(regular expression, string)  -  Checks whether the entire string conforms to the rule described by the regular expression. Returns None if it does not.

How a regular expression is written in Python: r'regular expression'
How a regular expression is written in JS: /regular expression/
"""

2. Matching symbols - one regex symbol stands for a class of characters

A matching symbol states what the character at a given position in the string must be.

1) Ordinary symbols - an ordinary symbol stands for itself in a regular expression: the character at the corresponding position in the string must be that exact symbol.

String requirement: 3 characters in total; the first is 'a', the second is 'b', and the third is 'c'
result = fullmatch(r'abc', 'abc')
print(result)

result = fullmatch(r'abc', 'mnd')
print(result)

2) . - Matches any single character (except '\n' by default)

String requirement: 3 characters in total; the first is 'a', the last is 'c', and the middle one can be any character
result = fullmatch(r'a.c', 'a*c')
print(result)

result = fullmatch(r'..xy', 'yes sxy')
print(result)       # None: 'yes sxy' has 7 characters, but r'..xy' requires exactly 4

3) \d - Matches any single digit

result = fullmatch(r'a\dc', 'a2c')
print(result)

result = fullmatch(r'\d\d\d..', '823m yes')
print(result)       # None: '823m yes' has 8 characters, but the pattern requires exactly 5

4) \s - Matches any single whitespace character

Whitespace characters include the space, \n, and \t
result = fullmatch(r'a\sb', 'a b')
print(result)

5) \w - Matches any single letter, digit, or underscore (with str patterns this also covers Chinese characters)

result = fullmatch(r'a\wb', 'a3b')
print(result)

6) \D, \S, \W - An uppercase symbol matches the opposite of its lowercase counterpart

"""
\D  -   Matches any non-digit character
\S  -   Matches any non-whitespace character
\W  -   Matches any character that \w does not match
"""

result = fullmatch(r'a\Db', 'a2b')
print(result)       # None
result = fullmatch(r'a\Db', 'a)b')
print(result)

7) [character set] - Matches any one character in the set

Note: one [] matches exactly one character
"""
[several ordinary symbols] -  For example: [abc12] matches any one of the five symbols 'a', 'b', 'c', '1', '2'
[set containing a \ symbol]  - For example: [mn\d], [m\dn], [\dmn] all require m, n, or any digit
[char1-char2]   -   For example: [a-z] requires any lowercase letter
                        [a-zA-Z] requires any letter
                        [2-9a-z] requires a digit from 2 to 9 or any lowercase letter
                        [\u4e00-\u9fa5] requires any Chinese character
                        [\u4e00-\u9fa5\dabc] requires any Chinese character, any digit, or a, b, c
                   Note: a - inside [] that is not between two characters is just an ordinary symbol.
"""
result = fullmatch(r'1[xyzmn]2', '1n2')
print(result)


result = fullmatch(r'a[mn\d]b', 'a9b')
print(result)

result = fullmatch(r'1[a-z][a-z]2', '1hm2')
print(result)

result = fullmatch(r'a[x\u4e00-\u9fa5\dy]b', 'ayb')
print(result)

result = fullmatch(r'1[-az]2', '1-2')
print(result)

8) [^character set] - Matches any one character that is not in the set

result = fullmatch(r'1[^xyz]2', '1x2')
print(result)

result = fullmatch(r'1[^\dab]2', '1M2')
print(result)


result = fullmatch(r'a[^2-9]b', 'a3b')
print(result)

Number of matches

from re import fullmatch, search

1. * - Matches 0 or more times (any number of times)

"""
a*  -   'a' repeated any number of times
\d*  -  \d repeated any number of times  -> any number of digits
[abc]*  -  [abc] repeated any number of times  -> any number of characters, each a, b, or c
...
"""
result = fullmatch(r'a*b', 'aaaaaaab')
print(result)

result = fullmatch(r'\d*b', '21222b')
print(result)

result = fullmatch(r'[A-Z]*b', 'KDBb')
print(result)

2. + - Match once or more (at least once)

result = fullmatch(r'a+b', 'aaaaab')
print(result)

3. ? - Matches 0 or 1 time

result = fullmatch(r'-?123', '-123')
print(result)

result = fullmatch(r'[+-]?123', '+123')
print(result)

4. {} - Repeats the preceding item a given number of times

"""
{N}   -  exactly N times
{M,N} -  M to N times
{M,}  -  at least M times
{,N}  -  at most N times

* == {0,}
+ == {1,}
? == {0,1}
"""
result = fullmatch(r'\d{3}abc', '623abc')
print(result)

result = fullmatch(r'\d{2,5}abc', '2234abc')
print(result)
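
The equivalences listed above can be checked directly. This is only a sketch with made-up sample strings; it compares each shorthand with its {} form and then shows {M,} and {,N}:

```python
from re import fullmatch

# * behaves like {0,}, + like {1,}, ? like {0,1}
for text in ['b', 'ab', 'aaab']:
    assert bool(fullmatch(r'a*b', text)) == bool(fullmatch(r'a{0,}b', text))
    assert bool(fullmatch(r'a+b', text)) == bool(fullmatch(r'a{1,}b', text))
    assert bool(fullmatch(r'a?b', text)) == bool(fullmatch(r'a{0,1}b', text))

# {M,} means at least M times; {,N} means at most N times
at_least_two = [bool(fullmatch(r'\d{2,}', s)) for s in ['7', '78', '7890']]
at_most_two = [bool(fullmatch(r'\d{,2}', s)) for s in ['', '78', '789']]

print(at_least_two)   # [False, True, True]
print(at_most_two)    # [True, True, False]
```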

5. Greedy and non-greedy

When the number of repetitions is not fixed, matching is either greedy or non-greedy. Greedy is the default.
"""
Quantifiers with a variable count: *, +, ?, {M,N}, {M,}, {,N}

Greedy and non-greedy: when several different repetition counts would all let the match succeed, greedy takes the largest count
            and non-greedy takes the smallest count.

Greedy: *, +, ?, {M,N}, {M,}, {,N}
Non-greedy: *?, +?, ??, {M,N}?, {M,}?, {,N}?
"""
result = fullmatch(r'\d+?', '26373')
print(result)       # '26373'


# search(regular expression, string) - finds the first substring that matches the regular expression
# Candidates by repetition count: '2' -> 1, '26' -> 2, '263' -> 3, '2637' -> 4, '26373' -> 5
result = search(r'\d+?', 'Water and electricity fee State 26373 sfdhgahj')
print(result)   # matches '2'


# The match starts at the first 'a' in the string (the one in 'party'); greedy .* then runs to the last 'b'
result = search(r'a.*b', r'Construction party goes home amnbxnxb Sheng Shi b-2---==')
print(result)   # matches 'arty goes home amnbxnxb Sheng Shi b'


result = search(r'a.*?b', r'Construction party goes home amnbxnxb Sheng Shi b-2---==')
print(result)   # matches 'arty goes home amnb'


# Candidates for <p>.*</p>: '<p>Are you OK</p>' (shortest) or '<p>Are you OK</p><a>Baidu</a><p>hello world!</p>' (longest)
html = '<body><span>start!</span><p>Are you OK</p><a>Baidu</a><p>hello world!</p></body>'
result = search(r'<p>(.*?)</p>', html)
print(result, result.group(1))


result = search(r'a.+?c', 'Mobile Play amnc Hello c Accounting Place abc')
print(result)
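
The difference between greedy and non-greedy matching is easiest to see when findall is used to collect every match. A minimal sketch with a shortened, made-up HTML string:

```python
from re import findall

html = '<p>one</p><a>link</a><p>two</p>'

# Greedy: .* runs to the last </p>, so there is a single long match
greedy = findall(r'<p>(.*)</p>', html)

# Non-greedy: .*? stops at the nearest </p>, so each paragraph is matched separately
lazy = findall(r'<p>(.*?)</p>', html)

print(greedy)   # ['one</p><a>link</a><p>two']
print(lazy)     # ['one', 'two']
```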

Grouping and Branching

from re import fullmatch, findall

1. Grouping - ()

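Grouping means wrapping part of a regular expression in () so that it forms a group.

1) Treat as a whole - a group can be operated on as a single unit, for example repeated with a quantifier

2) Repetition - inside a regular expression, \N repeats whatever the Nth group before it matched

3) Capture - get only part of what the regular expression matched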

Match: the structure 'two letters followed by two digits' repeated three times, e.g. 'mn78jh56lm89'
result = fullmatch(r'[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d', 'mn78jh56lm89')
print(result)

result = fullmatch(r'([a-zA-Z]{2}\d\d){3}', 'mn78jh56lm89')
print(result)
Match: '23abc23', '59abc59' - success
'23abc56' - failure
# result = fullmatch(r'\d\dabc\d\d', '23abc56')
# print(result)

result = fullmatch(r'(\d\d)abc\1', '23abc23')
print(result)


result = fullmatch(r'(\d{3})([a-z]{2})-\2\1=\1{3}', '876nm-nm876=876876876')
print(result)


result = fullmatch(r'(((\d{2})[A-Z]{3})([a-z]{2}))-\2-\1-\3', '34MNGbn-34MNG-34MNGbn-34')
print(result)


result = findall(r'[a-z]\d\d', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)   # ['d45', 'n89']

result = findall(r'[a-z](\d\d)', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)       # ['45', '89']
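
Repetition with \N and capture with findall can be combined. As an illustration (the sample sentence is invented), the sketch below finds words that are accidentally written twice in a row:

```python
from re import findall

text = 'this is is a test test of the the parser'

# (\w+) captures a word; \1 requires exactly the same text again after some whitespace.
# Because the pattern contains one group, findall returns what the group captured.
doubled = findall(r'\b(\w+)\s+\1\b', text)

print(doubled)   # ['is', 'test', 'the']
```

The \b on both sides keeps the pattern from pairing the 'is' inside 'this' with the following 'is'.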

2. Branch - |

Regular 1|Regular 2 - Try Regular 1 first. If it matches, the match succeeds. If it fails, try Regular 2: if that matches, the match succeeds; otherwise the match fails.
Match: 'abc' followed by two digits or two uppercase letters, e.g. 'abc34', 'abcKJ'
result = fullmatch(r'abc\d\d|abc[A-Z]{2}', 'abc34')
print(result)

result = fullmatch(r'abc(\d\d|[A-Z]{2})', 'abcKL')
print(result)
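
Why the second form uses a group: | has the lowest priority, so without parentheses it splits the whole pattern. A short sketch of the difference:

```python
from re import fullmatch

# Without a group, | splits the whole pattern:
# either 'abc' + two digits, or just two uppercase letters
no_group_1 = fullmatch(r'abc\d\d|[A-Z]{2}', 'abcKL')
no_group_2 = fullmatch(r'abc\d\d|[A-Z]{2}', 'KL')

# With a group, | only chooses between the alternatives inside the parentheses
with_group = fullmatch(r'abc(\d\d|[A-Z]{2})', 'abcKL')

print(no_group_1)   # None
print(no_group_2)   # matches 'KL'
print(with_group)   # matches 'abcKL'
```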

Detect class symbols and escape symbols

from re import fullmatch, findall, search

1. Detect class symbols (for understanding only)

Detect class symbols do not match characters: they do not say what character a position must hold. They check whether a position in the string satisfies a condition.

1) \b - Checks whether a position is a word boundary

"""
Word boundary  -  a position where two words can be separated, for example next to whitespace or punctuation, or at the start or end of the string
"""
result = findall(r'\b\d+\b', '23 Data 2367 skjj,89 Is 2039,Is the key 768 hsj,237 Long time no 79 ssjs 89')
print(result)

2) \B - Checks whether a position is not a word boundary
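
The notes give no example for \B, so here is a small sketch (the sample string is made up) that puts \b and \B side by side:

```python
from re import findall

text = 'ab123cd 45'

# \b...\b: the digits must stand alone as a word
standalone = findall(r'\b\d+\b', text)

# \B...\B: the digits must have word characters directly on both sides
embedded = findall(r'\B\d+\B', text)

print(standalone)   # ['45']
print(embedded)     # ['123']
```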

3) ^ - Checks whether a position is the start of the string (when used outside [])

4) $ - Checks whether a position is the end of the string

result = fullmatch(r'1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}', '13578237392')
print(result)

result = search(r'^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$', '13578237392')
print(result)

# Phone number regular expression: ^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$

2. Escape Symbols

Escaping in a regular expression means putting \ in front of a symbol that has a special function, so that the special function disappears and the symbol becomes an ordinary one.
Write a regular expression that matches a decimal such as 2.3
result = fullmatch(r'\d\.\d', '2.3')
print(result)

# '23+78'
result = fullmatch(r'\d\d\+\d\d', '65+23')
print(result)

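# Two Chinese characters inside parentheses
result = fullmatch(r'\([\u4e00-\u9fa5]{2}\)', '(Protective clothing)')
print(result)       # None for this English string: the pattern needs exactly two Chinese characters inside the parentheses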

# '\dabc'
result = fullmatch(r'\\dabc', r'\dabc')
print(result)


# ']abc', 'mabc', 'nabc'
result = fullmatch(r'[mn\]]abc', ']abc')
print(result)
Supplement: symbols that have a special meaning on their own, such as +, *, ., ?, ), and (, lose their special function inside [] and become ordinary symbols.
result = fullmatch(r'[.+*?$]ab[.]c', '+ab.c')
print(result)
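
When the text to be matched literally comes from a variable, escaping each symbol by hand is error-prone. The standard library function re.escape does the escaping; the sketch below uses an invented literal string:

```python
import re

literal = '2.3+4*(5)'

# re.escape puts a backslash before each character that has a special meaning
pattern = re.escape(literal)

exact = re.fullmatch(pattern, literal)          # the literal text itself matches
other = re.fullmatch(pattern, '2x3+4*(5)')      # the escaped '.' only matches a real '.'

print(exact)
print(other)   # None
```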

re module

import re

1. Common functions

"""
1) re.fullmatch(regular, string)   -  Matches the entire string against the regular expression; returns a match object on success, None on failure
2) re.match(regular, string)      -   Matches the beginning of the string; returns a match object on success, None on failure
3) re.search(regular, string)   -  Finds the first substring that matches; returns a match object on success, None on failure
4) re.findall(regular, string)   -   Gets all substrings that match; returns a list whose elements are the matched strings
5) re.finditer(regular, string)  -  Gets all substrings that match; returns an iterator whose elements are match objects
6) re.split(regular, string)     -   Splits the string at every substring that matches; returns a list of strings
7) re.sub(regular, string 1, string 2)   - Replaces every substring of string 2 that matches with string 1
"""

1) re.fullmatch(regular, string) - Matches the entire string against the regular expression; returns a match object on success and None on failure

result = re.fullmatch(r'\d{3}', '728')
print(result)

2) re.match(regular, string) - Matches the beginning of the string; returns a match object on success and None on failure

result = re.match(r'\d{3}', '728 Coffee machine for the Conference of Accountants')
print(result)

3) re.search(regular, string) - Finds the first substring that matches; returns a match object on success and None on failure

result = re.search(r'\d{3}', 'Coffee maker for 262 meeting of 728 CPA University')
print(result)

4) re.findall(regular, string) - Gets all substrings that match; returns a list whose elements are the matched strings (or the captured groups, when the regular expression contains groups)

result = re.findall(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(result)       # ['092', '262', '728']

result = re.findall(r'[a-z]\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # ['m262', 'k782']

result = re.findall(r'[a-z](\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # ['262', '782']

result = re.findall(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)   # [('m', '262'), ('k', '782')]

5) re.finditer(regular, string) - Gets all substrings that match; returns an iterator whose elements are match objects

result = re.finditer(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(list(result))


result = re.finditer(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(list(result))
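
Printing list(result) only shows the match objects. In practice the iterator is usually consumed in a loop, reading each match object's parts. A minimal sketch with a shortened sample string (not from the notes):

```python
import re

text = 'm262 cafe k7823'

found = []
for m in re.finditer(r'([a-z])(\d{3})', text):
    # group() is the whole match, group(1)/group(2) the captured parts, span() the position
    print(m.group(), m.group(1), m.group(2), m.span())
    found.append((m.group(), m.span()))
```

Unlike findall, which returns only strings or tuples of groups, each element here still carries its position in the original string.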

6) re.split(regular, string) - Splits the string at every substring that matches and returns a list of strings

re.split(regular, string, N) - Splits at most N times
result = re.split(r'\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)       # ['Accountant University with Mobile Number ', ' m', ' Cafe ', '3 k', '3 machine']

message = 'Anhui Division a Water and electricity b Flame ignition visible a After boiling b Shunfeng Technology c Water and electricity costs are sufficient'
result = re.split(r'[abc]', message)
print(result)

result = re.split(r'[abc]', message, 3)
print(result)
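
The effect of the split limit is easier to read on a short string. A sketch with a made-up input, passing the limit by its keyword name maxsplit:

```python
import re

text = 'x1y2z3w'

unlimited = re.split(r'\d', text)               # split at every digit
limited = re.split(r'\d', text, maxsplit=2)     # split only at the first two digits

print(unlimited)   # ['x', 'y', 'z', 'w']
print(limited)     # ['x', 'y', 'z3w']
```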

7) re.sub(regular, string 1, string 2) - Replaces every substring of string 2 that matches with string 1

re.sub(regular, string 1, string 2, N) - Replaces at most the first N matches
message = 'Anhui Division a Water and electricity b Flame ignition visible a After boiling b Shunfeng Technology c Water and electricity costs are sufficient'
Replace a, b, c with '++'
# new_message = message.replace('a', '++')
# new_message = new_message.replace('b', '++')
# new_message = new_message.replace('c', '++')
# print(new_message)

result = re.sub(r'[abc]', '++', message)
print(result)


result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)

result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine', 5)
print(result)
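
The replacement limit works the same way as the split limit. A short sketch with an invented input, passing the limit by its keyword name count:

```python
import re

text = 'a1b2c3'

all_replaced = re.sub(r'\d', '0', text)             # replace every digit
first_two = re.sub(r'\d', '0', text, count=2)       # replace only the first two digits

print(all_replaced)   # a0b0c0
print(first_two)      # a0b0c3
```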

2. Matching Objects

result = re.search(r'([a-z]{2})-(\d{3})', 'Cell-phone number ag-728 Coffee Machine of 262 CPA Conference')
print(result)       # <re.Match object; span=(18, 24), match='ag-728'>

# 1) Get the string corresponding to the matching result
# a. Get the whole regular matched string: the matched object. group()
r1 = result.group()
print(r1)       # 'ag-728'

# b. Get the result that a group matches: Match object. group(N)
r2 = result.group(1)
print(r2)    # 'ag'

r3 = result.group(2)
print(r3)    # '728'

# 2) Get the position of the match in the original string: (start, end), end not included
r1 = result.span()
print(r1)       # (18, 24)

r2 = result.span(2)
print(r2)       # (21, 24)
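
Besides group() and span(), match objects offer groups(), start(), and end(). A small sketch on a shortened string (an assumption for illustration); since search returns None when nothing matches, real code would check for that first:

```python
import re

m = re.search(r'([a-z]{2})-(\d{3})', 'id ag-728 end')

if m is not None:
    print(m.groups())              # ('ag', '728'): all captured groups as a tuple
    print(m.start(), m.end())      # 3 9: position of the whole match
    print(m.start(2), m.end(2))    # 6 9: position of group 2
```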

3. Parameters

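# 1) Single-line matching (re.S) and multi-line matching (re.M)
"""
By default . cannot match '\n'.
Single-line matching lets . match '\n' as well: flags=re.S, (?s)
Multi-line matching does not change . at all; it lets ^ and $ also match at the start and end of each line: flags=re.M, (?m)
"""
# Enable single-line matching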
result = re.fullmatch(r'a.c', 'a\nc', flags=re.S)
print(result)

result = re.fullmatch(r'(?s)a.c', 'a\nc')
print(result)
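
To see what each flag actually changes, the sketch below applies re.S and re.M to the same two-line string (a made-up sample): re.S affects ., while re.M affects ^ and $.

```python
import re

text = 'one\ntwo'

# re.S changes what . matches
dot_default = re.fullmatch(r'one.two', text)
dot_all = re.fullmatch(r'one.two', text, flags=re.S)

# re.M changes where ^ and $ can match
lines_default = re.findall(r'^\w+$', text)
lines_multi = re.findall(r'^\w+$', text, flags=re.M)

print(dot_default)     # None
print(dot_all)         # matches the whole string
print(lines_default)   # []
print(lines_multi)     # ['one', 'two']
```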

# 2) Ignore case
"""
By default matching is case-sensitive: a letter in the regular expression matches only the same case. With ignore-case enabled, a letter matches both its uppercase and lowercase forms.
 Method: flags=re.I, (?i)
"""
result = re.fullmatch(r'abc', 'aBc', flags=re.I)
print(result)

result = re.fullmatch(r'(?i)12[a-z]', '12N')
print(result)

# 3) Ignore case and use single-line matching at the same time
# Method: flags=re.I|re.S, (?si)
result = re.fullmatch(r'abc.12', 'aBc\n12', flags=re.I|re.S)
print(result)

result = re.fullmatch(r'(?si)abc.12', 'aBc\n12')
print(result)

Homework

Use a regular expression to do the following:

1. Indefinite Choice

  1. Regular expressions that exactly match the strings "(010)-62661617" and "01062661617" include (A)

    A. r"\(?\d{3}\)?-?\d{8}"
    B. r"[0-9()-]+"
    C. r"[0-9(-)]*\d*"
    D. r"[(]?\d*[)-]*\d*"

  2. Regular expressions that exactly match the strings "back" and "back-end" include (A)
    A. r'\w{4}-\w{3}|\w{4}'
    B. r'\w{4}|\w{4}-\w{3}'
    C. r'\S+-\S+|\S+'
    D. r'\w*\b-\b\w*|\w*'

  3. Regular expressions that match strings "go go" and "kitty kitty" exactly, but not "go kitty", include (D)
    A. r'\b(\w+)\b\s+\1\b'
    B. r'\w{2,5}\s*\1'
    C. r'(\S+) \s+\1'
    D. r'(\S{2,5})\s{1,}\1'

  4. Regular expressions that can match "aab" in a string, but not "aaab" and "aaaab" include (C)
    A. r"a*?b"
    B. r"a{,2}b"
    C. r"aa??b"
    D. r"aaa??b"
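
The options can also be tested mechanically. The helper below is a hypothetical checker, not part of the assignment, and it assumes that "match" in the questions means a full match of the whole string:

```python
from re import fullmatch


def check(pattern, should_match, should_not_match=()):
    """Return True if pattern fully matches every string in should_match
    and none of the strings in should_not_match."""
    return (all(fullmatch(pattern, s) for s in should_match)
            and not any(fullmatch(pattern, s) for s in should_not_match))


# Question 4, option C: matches "aab" but not "aaab" or "aaaab"
option_c = check(r"aa??b", ["aab"], ["aaab", "aaaab"])
# Question 4, option A: a*? can still take any number of 'a', so it also matches "aaab"
option_a = check(r"a*?b", ["aab"], ["aaab", "aaaab"])

print(option_c)   # True
print(option_a)   # False
```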

2. Programming Questions

1. User name matching

Requirements: 1. A user name may contain only digits, letters, and underscores

2. It cannot start with a digit

3. Its length is 6 to 16 characters

from re import fullmatch


def username(name):
    # First character: a letter or underscore; then 5 to 15 more letters, digits, or underscores
    re_str = r"[a-zA-Z_][a-zA-Z_0-9]{5,15}"
    return fullmatch(re_str, name) is not None
  2. Password Matching

Requirements: 1. Cannot contain the special symbols !@#¥%^&*

2. Must start with a letter

3. Length is 6 to 12 characters

from re import fullmatch


def is_keyword(keyword: str):
    return fullmatch(r'[a-zA-Z][^!@#¥%^&*]{5,11}', keyword)
  3. IP address matching in IPv4 format
    Tip: the IP address range is 0.0.0.0 - 255.255.255.255
from re import fullmatch, findall


def is_address(address: str):
    # Each part: 0-99 without a leading zero, 100-199, 200-249, or 250-255
    return fullmatch(r'(([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])', address)
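
A pattern of this shape is easier to read and test when the rule for one part is written once and reused. The sketch below is one possible arrangement, not the only correct one; the names OCTET, IPV4, and is_ipv4 are hypothetical:

```python
from re import fullmatch

# One part: 0-99 without a leading zero, 100-199, 200-249, or 250-255
OCTET = r'(?:[1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])'
# Four parts separated by dots ({{3}} is a literal {3} inside the f-string)
IPV4 = rf'{OCTET}(?:\.{OCTET}){{3}}'


def is_ipv4(address: str) -> bool:
    return fullmatch(IPV4, address) is not None


for candidate in ['0.0.0.0', '255.255.255.255', '192.168.1.1',
                  '256.1.1.1', '1.2.3', '01.2.3.4']:
    print(candidate, is_ipv4(candidate))
```

The first three candidates are accepted and the last three are rejected: 256 is out of range, '1.2.3' has only three parts, and '01' has a leading zero.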
  4. Extract the numbers from user input and sum them; the numbers may be positive or negative, integers or decimals
For example: "-3.14good87nice19bye" =====> -3.14 + 87 + 19 = 102.86
from re import findall
from functools import reduce


def sum_(str1):
    # Optional sign, integer part, optional fractional part; (?:...) keeps findall returning whole matches
    re_str = r"[+-]?\d+(?:\.\d+)?"
    result = findall(re_str, str1)
    print(reduce(lambda x, item: x+float(item), result, 0))
  5. Verify that the input consists only of Chinese characters

    from re import fullmatch
    
    
    def is_chinese(chinese: str):
        return fullmatch(r'[\u4e00-\u9fa5]+', chinese)
    
  6. Match integers or decimals (both positive and negative)

    from re import fullmatch
    
    
    def is_numbers(numbers: str):
        return fullmatch(r'[+-]?([1-9]\d*|0)(\.\d+)?', numbers)
    
  7. Verify that the entered user name and QQ number are valid and print a corresponding message

    Requirements:
    The user name must consist of letters, digits, or underscores and be 6 to 20 characters long
    The QQ number must be 5 to 12 digits long and its first digit cannot be zero

    from re import fullmatch
    
    username = r'(?i)[a-z0-9_]{6,20}'
    is_username = input('Enter a user name:')
    result_name = fullmatch(username, is_username)
    if result_name:
        print('User name is valid')
    else:
        print('User name is invalid')
    keyword = r'[1-9]\d{4,11}'
    is_keyword = input('Enter a QQ number:')
    result_qq = fullmatch(keyword, is_keyword)
    if result_qq:
        print('QQ number is valid')
    else:
        print('QQ number is invalid')
    
  8. Split a long string: take each sentence out of a poem

    poem = 'moonlight in front of window, frost on the ground suspected. Look up at the moon, look down and think about your home.'

    from re import split
    
    poem = 'Bright moonlight in front of the window, frost on the ground is suspected. Raising my head, I see the moon so bright; withdrawing my eyes, my nostalgia comes around.'
    # Split at the punctuation marks, together with any spaces that follow them
    result = split(r'[,.;]\s*', poem)
    for x in result:
        if x:       # skip the empty string left after the final '.'
            print(x)
    

Tags: Python regex

Posted by kurtsu on Thu, 05 May 2022 19:12:51 +0300