5.5 Regular Expression Learning Notes and Homework
Understanding Regular Expressions
Regular expression: a tool for solving string problems (it makes complex string-processing tasks simple).
Problem: check whether an entered mobile phone number is valid (11 digits, the first digit is 1 and the second is 3-9).
abc - invalid
123 - invalid
12345678901 - invalid
13354627381 - valid
1335462738189 - invalid
Method 1
def is_tel_num(tel_no: str):
    if len(tel_no) != 11:
        return False
    if tel_no[0] != '1':
        return False
    if tel_no[1] not in '3456789':
        return False
    return tel_no.isdigit()


tel = '13354627381'
print(is_tel_num(tel))
from re import fullmatch, findall


# Method 2:
def is_tel_num2(tel_no: str):
    return fullmatch(r'1[3-9]\d{9}', tel_no) is not None


tel = '13354622381'
print(is_tel_num2(tel))
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
# Goal: extract 234, 78, 2391, 23, 34
# Method 1: collect runs of consecutive digits by hand
all_num = []
num_str = ''
for ch in message:
    if ch.isdigit():
        num_str += ch
    else:
        all_num.append(num_str)
        num_str = ''
all_num.append(num_str)
all_num = [int(x) for x in all_num if x]  # drop empty strings, convert to int
print(all_num)  # [234, 78, 2391, 23, 34]
# Method 2: \d+ matches one or more consecutive digits
message = '234 data upon receipt of a response jssh 78 Shifu Haoxue 2391 Jinhu Fufu 23 card brusher sshs34'
all_num2 = findall(r'\d+', message)
print(all_num2)  # ['234', '78', '2391', '23', '34'] (strings, not ints)
Matching class symbols
from re import fullmatch
1. re module
""" re Module is python A module used to support regular expressions re Various regularly related functions are provided in the module: fullmatch,search,findall,match,split,sub Wait fullmatch(regular expression, Character string) - Rule that determines whether the entire string is fully symbolized as described by a regular expression. If the non-conforming return value is None python How regular expressions are provided in: r'regular expression' js How regular expressions are provided in:/regular expression/ """
2. Matching class symbols - one regex symbol stands for a whole class of characters
Role of matching class symbols in a regular expression: they specify what character a given position in the string must be.
1) Ordinary symbols - an ordinary symbol stands for itself: the character at the corresponding position in the string must be exactly that symbol.
String requirement: exactly 3 characters; the first is 'a', the second is 'b' and the third is 'c'
result = fullmatch(r'abc', 'abc')
print(result)  # <re.Match object; span=(0, 3), match='abc'>
result = fullmatch(r'abc', 'mnd')
print(result)  # None
2). - Match an arbitrary character
String Requirement: There are 3 characters in total, the first is'a', the last is'c', which can be any symbol in between
result = fullmatch(r'a.c', 'a*c') print(result) result = fullmatch(r'..xy', 'yes sxy') print(result)
3) \d - matches any digit
result = fullmatch(r'a\dc', 'a2c')
print(result)  # match 'a2c'
result = fullmatch(r'\d\d\d..', '823m是')  # three digits followed by any two characters
print(result)  # match '823m是'
4) \s - matches any whitespace character
Whitespace characters include the space, \n, \t and so on.
result = fullmatch(r'a\sb', 'a b')
print(result)
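The example above only uses a space. A short extra check (the sample strings are illustrative) shows that \s also accepts '\n' and '\t', but still stands for exactly one character:

```python
from re import fullmatch

# \s accepts any single whitespace character, not just the space
samples = ['a b', 'a\nb', 'a\tb', 'a  b', 'ab']
results = [fullmatch(r'a\sb', s) is not None for s in samples]
print(results)  # [True, True, True, False, False]
```

'a  b' fails because it has two spaces, and 'ab' fails because there is no whitespace at all.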
5) \w - matches any letter, digit or underscore (in Python 3 str patterns it also matches Chinese characters and other Unicode letters)
result = fullmatch(r'a\wb', 'a3b')
print(result)
6) \ + uppercase letter - the opposite of the corresponding lowercase symbol
"""
\D - matches any non-digit character
\S - matches any non-whitespace character
\W - matches any character that \w does not match
"""
result = fullmatch(r'a\Db', 'a2b')
print(result)  # None
result = fullmatch(r'a\Db', 'a)b')
print(result)  # match 'a)b'
7) [character set] - matches any one character in the set
Note: one [] always matches exactly one character.
""" [Multiple common symbols] - For example:[abc12], stay'a','b','c','1','2'Any of the five symbols can match [Contain\Special symbol at the beginning] - For example:[mn\d],[m\dn],[\dmn], Requirements are m perhaps n Or any number [Character 1-Character 2] - For example:[a-z],Require any lowercase letter [a-zA-Z],Require any letter [2-9a-z],Require 2 to 9 or any lower case letter [\u4e00-\u9fa5],Requirement is any Chinese [\u4e00-\u9fa5\dabc] Be careful:[]Medium if-It's not between two characters. It's just a common symbol. """
result = fullmatch(r'1[xyzmn]2', '1n2')
print(result)
result = fullmatch(r'a[mn\d]b', 'a9b')
print(result)
result = fullmatch(r'1[a-z][a-z]2', '1hm2')
print(result)
result = fullmatch(r'a[x\u4e00-\u9fa5\dy]b', 'ayb')
print(result)
result = fullmatch(r'1[-az]2', '1-2')  # '-' at the start of [] is an ordinary symbol
print(result)
8) [^Character Set] - Matches any character not in the character set
result = fullmatch(r'1[^xyz]2', '1x2')
print(result)  # None
result = fullmatch(r'1[^\dab]2', '1M2')
print(result)  # match '1M2'
result = fullmatch(r'a[^2-9]b', 'a3b')
print(result)  # None
Number of matches
from re import fullmatch, search
1. * - Matches 0 or more times (any number of times)
""" a* - a Any number of occurrences \d* - Any number\d -> Any number [abc]* - Any number[abc] -> Any number(a perhaps b perhaps c) ... """ result = fullmatch(r'a*b', 'aaaaaaab') print(result) result = fullmatch(r'\d*b', '21222b') print(result) result = fullmatch(r'[A-Z]*b', 'KDBb') print(result)
2. + - Match once or more (at least once)
result = fullmatch(r'a+b', 'aaaaab')
print(result)
3. ? - 0 or 1 time
result = fullmatch(r'-?123', '-123')
print(result)
result = fullmatch(r'[+-]?123', '+123')
print(result)
4. {} - how many times the preceding item is repeated
"""
{N}   - exactly N times
{M,N} - M to N times
{M,}  - at least M times
{,N}  - at most N times
*  == {0,}
+  == {1,}
?  == {0,1}
"""
result = fullmatch(r'\d{3}abc', '623abc')
print(result)
result = fullmatch(r'\d{2,5}abc', '2234abc')
print(result)
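The {M,} and {,N} forms and the equivalences listed above are not exercised by the examples. A small sketch with a hypothetical helper ok() (introduced here only for brevity) checks them:

```python
from re import fullmatch


def ok(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return fullmatch(pattern, text) is not None


print(ok(r'\d{3,}', '12'), ok(r'\d{3,}', '12345'))  # False True  (at least 3)
print(ok(r'x{,2}y', 'y'), ok(r'x{,2}y', 'xxxy'))    # True False  (at most 2)
# The shorthand quantifiers behave like their {} forms
print(ok(r'a{0,}b', 'aaab') == ok(r'a*b', 'aaab'))  # True
print(ok(r'a{1,}b', 'b') == ok(r'a+b', 'b'))        # True
```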
5. Greedy and non-greedy
When the number of repetitions is not fixed, matching can be greedy or non-greedy; the default is greedy.
"""
Quantifiers with a non-fixed count: *, +, ?, {M,N}, {M,}, {,N}
Greedy vs non-greedy: when the count is not fixed and the match can succeed with several different counts,
greedy mode takes the result with the MOST repetitions and non-greedy mode takes the one with the FEWEST
(only counts that let the overall match succeed are considered).
Greedy:     *, +, ?, {M,N}, {M,}, {,N}
Non-greedy: *?, +?, ??, {M,N}?, {M,}?, {,N}?
"""
result = fullmatch(r'\d+?', '26373')
print(result)  # '26373': fullmatch must consume the whole string, so only 5 repetitions succeed

# search(regular expression, string) - finds the first substring that satisfies the regular expression
# Possible results: '2' -> 1 repetition, '26' -> 2, '263' -> 3, '2637' -> 4, '26373' -> 5
result = search(r'\d+?', 'Water and electricity fee State 26373 sfdhgahj')
print(result)  # '2': non-greedy takes the fewest repetitions

# Both searches start at the first 'a' in the string (the one in 'party')
result = search(r'a.*b', r'Construction party goes home amnbxnxb Sheng Shi b-2---==')
print(result)  # greedy, runs to the LAST 'b': 'arty goes home amnbxnxb Sheng Shi b'
result = search(r'a.*?b', r'Construction party goes home amnbxnxb Sheng Shi b-2---==')
print(result)  # non-greedy, stops at the FIRST 'b': 'arty goes home amnb'

# Greedy would give '<p>Are you OK</p><a>Baidu</a><p>hello world!</p>'; non-greedy stops at the first '</p>'
html = '<body><span>start!</span><p>Are you OK</p><a>Baidu</a><p>hello world!</p></body>'
result = search(r'<p>(.*?)</p>', html)
print(result, result.group(1))  # match '<p>Are you OK</p>', group 1 'Are you OK'
result = search(r'a.+?c', 'Mobile Play amnc Hello c Accounting Place abc')
print(result)  # 'ay amnc': starts at the 'a' in 'Play' and stops at the first 'c' after it
Grouping and Branching
from re import fullmatch, findall
1. Grouping - ()
Grouping means wrapping part of a regular expression in () to form a group. Groups are used for:
1) Treating a part as a whole (e.g. applying a quantifier to the whole group)
2) Repetition (backreferences)
In a regular expression, \N repeats the exact text that the Nth group matched; the Nth group must appear before \N.
3) Capturing - extracting part of the match result
Match: the structure "two letters followed by two digits" repeated three times, e.g. 'mn78jh56lm89'
result = fullmatch(r'[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d[a-zA-Z]{2}\d\d', 'mn78jh56lm89')
print(result)
# The same rule, treating the repeated part as one group
result = fullmatch(r'([a-zA-Z]{2}\d\d){3}', 'mn78jh56lm89')
print(result)
Match: '23abc23', '59abc59' - succeed
'23abc56' - fails
# Without a group, this would also match '23abc56', because the two \d\d are independent:
# result = fullmatch(r'\d\dabc\d\d', '23abc56')
# print(result)
result = fullmatch(r'(\d\d)abc\1', '23abc23')
print(result)
result = fullmatch(r'(\d{3})([a-z]{2})-\2\1=\1{3}', '876nm-nm876=876876876')
print(result)
# Groups are numbered by their opening parenthesis: \1 = '34MNGbn', \2 = '34MNG', \3 = '34'
result = fullmatch(r'(((\d{2})[A-Z]{3})([a-z]{2}))-\2-\1-\3', '34MNGbn-34MNG-34MNGbn-34')
print(result)
result = findall(r'[a-z]\d\d', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)  # ['d45', 'n89']
# With a group, findall returns only the captured part
result = findall(r'[a-z](\d\d)', 'Drinking 2 Spatial Data 78, Next Year and45 Mentlessness 2341==hsn89=river=263')
print(result)  # ['45', '89']
2. Branch - |
regex1|regex2 - try regex1 first; if it matches, the match succeeds. Otherwise try regex2; if regex2 matches, the match succeeds, and if it does not, the match fails.
Match: 'abc' followed by either two digits or two uppercase letters, e.g. 'abc34', 'abcKJ'
result = fullmatch(r'abc\d\d|abc[A-Z]{2}', 'abc34')
print(result)
# () limits the branch to the part after 'abc'
result = fullmatch(r'abc(\d\d|[A-Z]{2})', 'abcKL')
print(result)
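A short extra sketch of why the parentheses matter: | has the lowest precedence, so without a group it splits the whole pattern. The input 'KL' is made up for illustration.

```python
from re import fullmatch

# Without parentheses, | splits the WHOLE pattern: 'abc\d\d' OR '[A-Z]{2}'
no_group = fullmatch(r'abc\d\d|[A-Z]{2}', 'KL')
# With parentheses, the branch only covers the part after 'abc'
with_group = fullmatch(r'abc(\d\d|[A-Z]{2})', 'KL')

print(no_group)    # <re.Match object; span=(0, 2), match='KL'>  ('abc' was not required)
print(with_group)  # None
```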
Detect class symbols and escape symbols
from re import fullmatch, findall, search
1. Detect class symbols (just understand them)
Detect class symbols are not matching symbols: they do not consume a character or require a position to hold a particular kind of character; they only check whether the position they are at satisfies a condition.
1) \b - checks whether the position is a word boundary
"""
Word boundary - a position that separates two words, e.g. next to whitespace or punctuation, or at the start or end of the string
"""
result = findall(r'\b\d+\b', '23 Data 2367 skjj,89 Is 2039,Is the key 768 hsj,237 Long time no 79 ssjs 89')
print(result)  # ['23', '2367', '89', '2039', '768', '237', '79', '89']
2) \B - checks whether the position is NOT a word boundary
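The notes give no example for \B. A minimal sketch with two illustrative words: 'on' sits at the start of 'one' (a word boundary) but inside 'python' (not a boundary).

```python
from re import search

words = ['one', 'python']
# \bon: 'on' must start at a word boundary
start_hits = [search(r'\bon', w) is not None for w in words]
# \Bon: 'on' must start somewhere that is NOT a word boundary (inside a word)
inner_hits = [search(r'\Bon', w) is not None for w in words]

print(start_hits)  # [True, False]
print(inner_hits)  # [False, True]
```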
3) ^ - checks whether the position is the start of the string (when used outside [])
4) $ - checks whether the position is the end of the string
result = fullmatch(r'1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}', '13578237392')
print(result)
# With search, ^ and $ make the pattern cover the whole string, just like fullmatch
result = search(r'^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$', '13578237392')
print(result)
# Phone-number regular expression: ^1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}$
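To see what ^ and $ actually add, a small sketch runs the same phone pattern on a made-up string that has extra text around the number:

```python
from re import search, fullmatch

PHONE = r'1([358][0-9]|4[579]|66|7[0135678]|9[89])[0-9]{8}'
text = 'abc13578237392xyz'

found = search(PHONE, text)               # finds the number inside the text
anchored = search('^' + PHONE + '$', text)  # None: ^ and $ force the number to be the whole string
whole = fullmatch(PHONE, text)            # None: same effect as ^...$ here
print(found)
print(anchored, whole)  # None None
```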
2. Escape Symbols
Escaping in a regular expression means putting \ in front of a symbol that has a special function, so that the symbol loses that function and becomes an ordinary symbol.
Write a regex that matches a decimal such as '2.3':
result = fullmatch(r'\d\.\d', '2.3')
print(result)
# '23+78'
result = fullmatch(r'\d\d\+\d\d', '65+23')
print(result)
# '(防护)' ('protection'): two Chinese characters in parentheses
result = fullmatch(r'\([\u4e00-\u9fa5]{2}\)', '(防护)')
print(result)
# '\dabc'
result = fullmatch(r'\\dabc', r'\dabc')
print(result)
# 'mabc', 'nabc', ']abc'
result = fullmatch(r'[mn\]]abc', ']abc')
print(result)
Supplement: symbols that have a special meaning on their own (such as +, *, ., ?, $, (, )) lose it and become ordinary symbols when placed inside []. Inside [], the symbols that still need care are ^ at the start, - between two characters, \ and ].
result = fullmatch(r'[.+*?$]ab[.]c', '+ab.c')
print(result)
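When the text to match is taken literally (for example from user input), the standard library's re.escape can add the backslashes for you. A minimal sketch with an illustrative string:

```python
import re

# re.escape puts \ in front of every character that has a special meaning in a regex
pattern = re.escape('2.3+1')
hit = re.fullmatch(pattern, '2.3+1')
miss = re.fullmatch(pattern, '2x3+1')  # None: the '.' is now an ordinary dot
print(pattern)  # 2\.3\+1
print(hit, miss)
```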
re module
import re
1. Common functions
""" 1)re.fullmatch(regular, Character string) - Matches with the entire string and regular expression, and returns if the match succeeds in returning the matched object or fails None 2)re.match(regular, Character string) - Beginning of the match string, if the match successfully returns the matched object, the match fails to return None 3)re.search(regular, Character string) - The first regular string in a match string that returns a match if the match succeeds and the match fails None 4)re.findall(regular, Character string) - Gets all regular substrings in the string, the return value is a list, and the elements in the list are matched strings 5)re.finditer(regular, Character string) - Gets all regular substrings in the string and returns an iterator in which the elements are matched 6)re.split(regular, Character string) - Cuts all regular strings in a string as cutting points and returns a list of elements that are strings 7)re.sub(regular, String 1, String 2) - Replace all regular strings in string 2 with string 1 """
1) re.fullmatch(regex, string) - matches the whole string against the regex; returns a match object on success and None on failure
result = re.fullmatch(r'\d{3}', '728')
print(result)
2) re.match(regex, string) - matches at the beginning of the string; returns a match object on success and None on failure
result = re.match(r'\d{3}', '728 Coffee machine for the Conference of Accountants')
print(result)  # match '728'
3) re.search(regex, string) - finds the first substring that matches the regex; returns a match object on success and None on failure
result = re.search(r'\d{3}', 'Coffee maker for 262 meeting of 728 CPA University')
print(result)  # match '262'
4) re.findall(regex, string) - gets every substring that matches the regex; returns a list of the matched strings (with one group, only the group's text; with several groups, tuples)
result = re.findall(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(result)  # ['092', '262', '728']
result = re.findall(r'[a-z]\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)  # ['m262', 'k782']
result = re.findall(r'[a-z](\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)  # ['262', '782']
result = re.findall(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)  # [('m', '262'), ('k', '782')]
5) re.finditer(regex, string) - gets every substring that matches the regex; returns an iterator whose elements are match objects
result = re.finditer(r'\d{3}', 'Cafe 0923 Coffee Machine for 262 CPA Conference No. 728')
print(list(result))
result = re.finditer(r'([a-z])(\d{3})', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(list(result))
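Because finditer yields match objects rather than plain strings, each element supports group() and span(). A short sketch on the same sample text:

```python
import re

text = 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine'
found = []
for m in re.finditer(r'([a-z])(\d{3})', text):
    # each element is a Match object, so group() and span() work on it
    start, end = m.span()
    print(m.group(), m.group(1), m.group(2), (start, end))
    found.append((m.group(), text[start:end]))
```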
6) re.split(regex, string) - splits the string at every substring that matches the regex; returns a list of strings
re.split(regex, string, maxsplit=N) - splits at most N times
result = re.split(r'\d{3}', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)  # ['Accountant University with Mobile Number ', ' m', ' Cafe ', '3 k', '3 machine']
message = 'Anhui Division a Water and electricity b Flame ignition visible a After boiling b Shunfeng Technology c Water and electricity costs are sufficient'
# \b...\b: split only at a, b, c that stand alone as words, not at letters inside words such as 'Water'
result = re.split(r'\b[abc]\b', message)
print(result)
result = re.split(r'\b[abc]\b', message, maxsplit=3)
print(result)
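Two further re.split details, shown on a made-up string: if the pattern contains a capturing group, the separators are kept in the result, and maxsplit is best passed by keyword (passing it positionally is deprecated in recent Python versions).

```python
import re

text = 'one1two22three'
plain = re.split(r'\d+', text)                # separators dropped
kept = re.split(r'(\d+)', text)               # capturing group: separators kept
limited = re.split(r'\d+', text, maxsplit=1)  # at most one cut
print(plain)    # ['one', 'two', 'three']
print(kept)     # ['one', '1', 'two', '22', 'three']
print(limited)  # ['one', 'two22three']
```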
7) re.sub(regex, string1, string2) - replaces every substring of string2 that matches the regex with string1
re.sub(regex, string1, string2, count=N) - replaces only the first N matches
message = 'Anhui Division a Water and electricity b Flame ignition visible a After boiling b Shunfeng Technology c Water and electricity costs are sufficient'
Replace the letters a, b and c that stand alone as words with '++'
# With str.replace you would need one call per letter, and it would also change letters inside words:
# new_message = message.replace('a', '++')
# new_message = new_message.replace('b', '++')
# new_message = new_message.replace('c', '++')
# print(new_message)
result = re.sub(r'\b[abc]\b', '++', message)
print(result)
result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine')
print(result)  # every digit becomes '0'
result = re.sub(r'\d', '0', 'Accountant University with Mobile Number 728 m262 Cafe 0923 k7823 machine', count=5)
print(result)  # only the first 5 digits are replaced
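re.sub can also reuse parts of each match: in the replacement string \1, \2 ... refer to groups, and the replacement may instead be a function that receives each match object. A sketch with illustrative dates and numbers:

```python
import re

# \3/\2/\1 reorders the three captured parts of each date
dates = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', 'from 2023-01-15 to 2023-02-01')
# A function replacement: double every number
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22 c333')
print(dates)    # from 15/01/2023 to 01/02/2023
print(doubled)  # a2 b44 c666
```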
2. Match objects
result = re.search(r'([a-z]{2})-(\d{3})', 'Cell-phone number ag-728 Coffee Machine of 262 CPA Conference')
print(result)  # <re.Match object; span=(18, 24), match='ag-728'>
# 1) Get the text of the match result
# a. The text matched by the whole regex: match_object.group()
r1 = result.group()
print(r1)  # 'ag-728'
# b. The text matched by one group: match_object.group(N)
r2 = result.group(1)
print(r2)  # 'ag'
r3 = result.group(2)
print(r3)  # '728'
# 2) Get the position of the match result in the original string
r1 = result.span()
print(r1)  # (18, 24)
r2 = result.span(2)
print(r2)  # (21, 24)
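A few more match-object methods, sketched on a shortened sample string. The group names code and num are chosen only for illustration; (?P<name>...) is Python's syntax for a named group.

```python
import re

m = re.search(r'(?P<code>[a-z]{2})-(?P<num>\d{3})', 'number ag-728 and 262')
print(m.groups())          # ('ag', '728'): all groups as a tuple
print(m.group('code'))     # ag: a named group can be fetched by name
print(m.groupdict())       # {'code': 'ag', 'num': '728'}
print(m.start(), m.end())  # 7 13: the same numbers as m.span()
```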
3. Flags (parameters)
# 1) Single-line and multi-line matching
"""
Multi-line matching (the default): . does not match '\n'
Single-line matching: . also matches '\n' -> flags=re.S or (?s)
Note: re.M / (?m) (MULTILINE) is a different flag: it makes ^ and $ match at the start and end
of every line, and does not change what . matches.
"""
# Single-line matching
result = re.fullmatch(r'a.c', 'a\nc', flags=re.S)
print(result)
result = re.fullmatch(r'(?s)a.c', 'a\nc')
print(result)
# 2) Ignore case
"""
By default matching is case-sensitive. When case is ignored, a letter also matches the same letter in the other case.
Method: flags=re.I or (?i)
"""
result = re.fullmatch(r'abc', 'aBc', flags=re.I)
print(result)
result = re.fullmatch(r'(?i)12[a-z]', '12N')
print(result)
# 3) Ignore case and use single-line matching at the same time
# Method: flags=re.I|re.S or (?si)
result = re.fullmatch(r'abc.12', 'aBc\n12', flags=re.I|re.S)
print(result)
result = re.fullmatch(r'(?si)abc.12', 'aBc\n12')
print(result)
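A small sketch contrasting re.M and re.S on an illustrative two-line string, since the two flags are easy to mix up:

```python
import re

text = 'first line\nsecond line'

# re.M changes ^ (and $): they now match at the start/end of every line
first_only = re.findall(r'^\w+', text)
every_line = re.findall(r'^\w+', text, flags=re.M)
# re.S changes . : it now also matches '\n'
no_dotall = re.search(r'line.second', text)
dotall = re.search(r'line.second', text, flags=re.S)

print(first_only, every_line)  # ['first'] ['first', 'second']
print(no_dotall)               # None
print(dotall)                  # matches 'line\nsecond'
```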
Homework
Use regular expressions to complete the following:
1. Multiple choice (one or more options may be correct)
- Regular expressions that exactly match both "(010)-62661617" and "01062661617" include (A, B, D)
A. r"\(?\d{3}\)?-?\d{8}"
B. r"[0-9()-]+"
C. r"[0-9(-)]*\d*"
D. r"[(]?\d*[)-]*\d*"
(C fails: inside [], "(-)" is the range from "(" to ")", so "-" is not allowed.)
- Regular expressions that exactly match both "back" and "back-end" include (A, B, C, D)
A. r'\w{4}-\w{3}|\w{4}'
B. r'\w{4}|\w{4}-\w{3}'
C. r'\S+-\S+|\S+'
D. r'\w*\b-\b\w*|\w*'
(With re.fullmatch, if the first branch of B cannot cover the whole string, the second branch is tried.)
- Regular expressions that exactly match "go go" and "kitty kitty", but not "go kitty", include (A, D)
A. r'\b(\w+)\b\s+\1\b'
B. r'\w{2,5}\s*\1'
C. r'(\S+) \s+\1'
D. r'(\S{2,5})\s{1,}\1'
(B has no group, so \1 is an error; C needs at least two whitespace characters between the words.)
- Regular expressions that exactly match "aab" (with re.fullmatch) but not "aaab" or "aaaab" include (B, C)
A. r"a*?b"
B. r"a{,2}b"
C. r"aa??b"
D. r"aaa??b"
(A and D also match "aaab".)
2. Programming Questions
1. User name matching
Requirements: 1. A user name may contain only letters, digits and underscores
2. It cannot start with a digit
3. Its length is 6 to 16 characters
from re import fullmatch


def username(name: str) -> bool:
    # first character: letter or underscore; then 5 to 15 letters, digits or underscores
    re_str = r"[a-zA-Z_][a-zA-Z0-9_]{5,15}"
    return fullmatch(re_str, name) is not None
- Password Matching
Requirements: 1. Cannot contain any of the special symbols !@#¥%^&*
2. Must start with a letter
3. Its length is 6 to 12 characters
from re import fullmatch


def is_keyword(keyword: str) -> bool:
    # a letter, then 5 to 11 characters that are not one of !@#¥%^&*
    return fullmatch(r'[a-zA-Z][^!@#¥%^&*]{5,11}', keyword) is not None
- IP address matching (IPv4 format)
Tip: the IP address range is 0.0.0.0 - 255.255.255.255
from re import fullmatch


def is_address(address: str) -> bool:
    # each part is 0-9, 10-99, 100-199, 200-249 or 250-255 (no leading zeros)
    part = r'([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])'
    return fullmatch(r'(' + part + r'\.){3}' + part, address) is not None
- Extracts and sums values from user input data, including positive and negative numbers as well as integers and decimals
for example:"-3.14good87nice19bye" =====> -3.14 + 87 + 19 = 102.86
from re import fullmatch, findall from functools import * def sum_(str1): re_str = r"[+-]?\d+[.]?\d+" result = findall(re_str, str1) print(reduce(lambda x, item: x+float(item), result, 0))
- Verify that the input consists only of Chinese characters
from re import fullmatch


def is_chinese(chinese: str) -> bool:
    # one or more common Chinese characters, nothing else
    return fullmatch(r'[\u4e00-\u9fa5]+', chinese) is not None
- Match integers or decimals (both positive and negative)
from re import fullmatch


def is_numbers(numbers: str) -> bool:
    # optional sign; integer part without leading zeros; optional decimal part
    return fullmatch(r'[+-]?([1-9]\d*|0)(\.\d+)?', numbers) is not None
- Verify that the entered user name and QQ number are valid and print a corresponding message
Requirements:
The user name must consist of letters, digits or underscores and be 6 to 20 characters long
The QQ number must be 5 to 12 digits long and its first digit cannot be 0
from re import fullmatch

username_re = r'(?i)[a-z0-9_]{6,20}'
name = input('Enter a user name: ')
if fullmatch(username_re, name):
    print('User name is valid')
else:
    print('Invalid user name')

qq_re = r'[1-9]\d{4,11}'
qq = input('Enter a QQ number: ')
if fullmatch(qq_re, qq):
    print('QQ number is valid')
else:
    print('Invalid QQ number')
- Split a long string: extract each line of a poem
poem = 'Bright moonlight in front of the window, frost on the ground is suspected. Raising my head, I see the moon so bright; withdrawing my eyes, my nostalgia comes around.'
from re import split

poem = 'Bright moonlight in front of the window, frost on the ground is suspected. Raising my head, I see the moon so bright; withdrawing my eyes, my nostalgia comes around.'
# cut at ',', '.' or ';' plus any spaces after them
result = split(r'[,.;]\s*', poem)
for x in result:
    if x:  # skip the empty string left after the final '.'
        print(x)