Foreword
A regular expression is a special sequence of characters that can help you easily check whether a string matches a certain pattern.
The re module provides module-level functions that take a pattern string as their first argument; compiled pattern objects offer the same operations as methods.
re.match function
re.match attempts to match a pattern at the beginning of the string. If the pattern does not match there, match() returns None.
Function syntax:
re.match(pattern, string, flags=0)
Function parameter description:
Parameter | Description |
---|---|
pattern | The regular expression to match. |
string | The string to search. |
flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
The re.match method returns a match object if the match is successful; otherwise it returns None.
We can use the match object's group(num) or groups() method to get the matched text.
Match object method | Description |
---|---|
group(num=0) | Returns the text matched by the whole expression. Given a group number, it returns that group's text; given several group numbers at once, it returns a tuple with the corresponding values. |
groups() | Returns a tuple containing the text of every group, from group 1 up to the last group in the pattern. |
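The two methods in the table can be tried out with a minimal sketch like the one below. The sample string and the pattern are made up for illustration and are not part of the original tutorial.

```python
import re

# Hypothetical sample text; the pattern has two capturing groups.
m = re.match(r"(\w+)-(\d+)", "order-42 shipped")

whole = m.group()        # same as m.group(0): the entire matched text
first = m.group(1)       # text captured by the first group
pair = m.group(1, 2)     # several group numbers at once give a tuple
all_groups = m.groups()  # every group, starting from group 1

print(whole)       # order-42
print(first)       # order
print(pair)        # ('order', '42')
print(all_groups)  # ('order', '42')
```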
Example:
    import re

    line = "I really like you yesterday"

    matchObj = re.match(r'(.*) really (.*?) .*', line)

    print("matchObj.group():", matchObj.group())
    print("matchObj.group(1):", matchObj.group(1))
    print("matchObj.group(2):", matchObj.group(2))

The execution result of the above example is as follows:

    matchObj.group(): I really like you yesterday
    matchObj.group(1): I
    matchObj.group(2): like
re.search method
re.search scans the whole string and returns the first match it finds.
Function syntax:
re.search(pattern, string, flags=0)
Function parameter description:
Parameter | Description |
---|---|
pattern | The regular expression to match. |
string | The string to search. |
flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
The re.search method returns a match object if the match is successful; otherwise it returns None.
We can use the match object's group(num) or groups() method to get the matched text.
Match object method | Description |
---|---|
group(num=0) | Returns the text matched by the whole expression. Given a group number, it returns that group's text; given several group numbers at once, it returns a tuple with the corresponding values. |
groups() | Returns a tuple containing the text of every group, from group 1 up to the last group in the pattern. |
Example:
    import re

    line = "I really like you yesterday"

    searchObj = re.search(r'(.*) really (.*?) .*', line)

    print("searchObj.group():", searchObj.group())
    print("searchObj.group(1):", searchObj.group(1))
    print("searchObj.group(2):", searchObj.group(2))

The execution result of the above example is as follows:

    searchObj.group(): I really like you yesterday
    searchObj.group(1): I
    searchObj.group(2): like
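Because re.search returns None when nothing matches, calling group() on the result without checking it raises an AttributeError. The sketch below shows one way to guard against that; the helper name first_number and the sample strings are assumptions made for illustration.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    m = re.search(r"\d+", text)
    if m is None:          # no match anywhere in the string
        return None
    return int(m.group())  # the first (leftmost) match only

print(first_number("room 101, floor 3"))  # 101
print(first_number("no digits here"))     # None
```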
The difference between re.match and re.search
re.match only matches at the beginning of the string: if the beginning does not match the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.
Example:
    import re

    line = "I really like you yesterday"

    matchObj = re.match(r'like', line)
    if matchObj:
        print("match --> matchObj.group():", matchObj.group())
    else:
        print("No match!!")

    matchObj = re.search(r'like', line)
    if matchObj:
        print("search --> matchObj.group():", matchObj.group())
    else:
        print("No match!!")

The result of running the above example is as follows:

    No match!!
    search --> matchObj.group(): like
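The difference also shows up in multi-line mode. As a small sketch (the two-line sample text is an assumption for illustration): re.match stays tied to the very start of the string even with re.M, whereas re.search with a ^ anchor can match at the start of any line.

```python
import re

text = "first line\nsecond line"

# re.match is tied to the very start of the string, even with re.M.
at_start = re.match(r"second", text, re.M)

# re.search with ^ and re.M can match at the start of any line.
at_line_start = re.search(r"^second", text, re.M)

print(at_start)               # None
print(at_line_start.group())  # second
print(at_line_start.start())  # 11
```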
search and replace
Python's re module provides re.sub for replacing matches in strings.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Returns the string obtained by replacing the leftmost non-overlapping matches of pattern in string with repl. If the pattern is not found, string is returned unchanged.
The optional parameter count is the maximum number of matches to replace; it must be a non-negative integer. The default value of 0 replaces all matches.
Example:
    import re

    phone = "2004-959-559 # This is a foreign phone number"

    # Remove the trailing comment (everything from # to the end of the string)
    num = re.sub(r'#.*$', "", phone)
    print("phone number is:", num)

    # Remove every character that is not a digit
    num = re.sub(r'\D', "", phone)
    print("phone number is:", num)

The execution result of the above example is as follows:

    phone number is: 2004-959-559
    phone number is: 2004959559
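The effect of count can be seen in a short sketch like this one, where the sample string is chosen purely for illustration:

```python
import re

text = "a-b-c-d"

replace_all = re.sub(r"-", "+", text)           # count defaults to 0: replace every match
replace_two = re.sub(r"-", "+", text, count=2)  # stop after the first two matches
no_match = re.sub(r"x", "+", text)              # pattern absent: the string comes back unchanged

print(replace_all)  # a+b+c+d
print(replace_two)  # a+b+c-d
print(no_match)     # a-b-c-d
```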
The repl parameter is a function
The following example multiplies the matched numbers in the string by 2:
Example:
    import re

    # Multiply each matched number by 2
    def double(matched):
        value = int(matched.group('value'))
        return str(value * 2)

    s = 'A23G4HFD567'
    print(re.sub(r'(?P<value>\d+)', double, s))
The output of the execution is:
A46G8HFD1134
re.compile function
The compile function compiles a regular expression and produces a regular expression (Pattern) object, whose match() and search() methods can then be used.
The syntax format is:
re.compile(pattern[, flags])
Parameters:
- pattern : a regular expression in string form
- flags : optional; controls how matching is done, such as ignoring case or multi-line mode. The available flags are:
  - re.I ignores case
  - re.L makes \w, \W, \b, \B, \s and \S depend on the current locale (in Python 3 it can only be used with bytes patterns)
  - re.M multi-line mode: ^ and $ also match at the start and end of each line
  - re.S makes . match any character, including a newline (by default . does not match a newline)
  - re.U makes \w, \W, \b, \B, \d, \D, \s and \S follow the Unicode character properties database (already the default for str patterns in Python 3)
  - re.X ignores whitespace and comments after # inside the pattern, for readability
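The sketch below illustrates how some of these flags change matching. The sample text and patterns are assumptions chosen for illustration; only flags listed above are used.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching
ignore_case = re.findall(r"hello", text, re.I)

# re.M: ^ and $ also match at the start and end of each line
per_line = re.findall(r"^\w+$", text, re.M)

# re.S: . also matches the newline character
without_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.S)

# re.X: whitespace and # comments inside the pattern are ignored
number = re.compile(r"""
    \d+      # integer part
    \.       # decimal point
    \d+      # fractional part
""", re.X)
decimal = number.search("pi is about 3.14").group()

# Flags can be combined with |
combined = re.findall(r"^h\w+$", text, re.I | re.M)

print(ignore_case)             # ['Hello']
print(per_line)                # ['Hello', 'world']
print(without_dotall)          # None
print(with_dotall is not None) # True
print(decimal)                 # 3.14
print(combined)                # ['Hello']
```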
Example
    >>> import re
    >>> pattern = re.compile(r'\d+')                      # matches one or more digits
    >>> m = pattern.match('one12twothree34four')          # try at the start of the string: no match
    >>> print(m)
    None
    >>> m = pattern.match('one12twothree34four', 2, 10)   # start at index 2 ('e'): no match
    >>> print(m)
    None
    >>> m = pattern.match('one12twothree34four', 3, 10)   # start at index 3 ('1'): match
    >>> print(m)                                          # a Match object
    <re.Match object; span=(3, 5), match='12'>
    >>> m.group(0)   # the 0 can be omitted
    '12'
    >>> m.start(0)   # the 0 can be omitted
    3
    >>> m.end(0)     # the 0 can be omitted
    5
    >>> m.span(0)    # the 0 can be omitted
    (3, 5)
In the above, a Match object is returned when the match is successful, where:
- The group([group1, …]) method is used to obtain one or more groups of matched strings. When you want to obtain the entire matched substring, you can use group() or group(0) directly;
- The start([group]) method is used to obtain the starting position (the index of the first character of the substring) of the substring matched by the group in the whole string, and the default value of the parameter is 0;
- The end([group]) method is used to obtain the end position of the substring matched by the group in the whole string (the index of the last character of the substring + 1), and the default value of the parameter is 0;
- The span([group]) method returns (start(group), end(group)).
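One way to check how these positions relate to the matched text is to slice the original string with them. This is a minimal sketch; the pattern with a single capturing group is an assumption for illustration.

```python
import re

text = "one12twothree34four"
m = re.search(r"[a-z]+(\d+)", text)

# Slicing with start() and end() gives back exactly what group() returns.
whole_ok = text[m.start():m.end()] == m.group()
group_ok = text[m.start(1):m.end(1)] == m.group(1)

print(whole_ok)   # True
print(group_ok)   # True
print(m.span())   # (0, 5)
print(m.span(1))  # (3, 5)
```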
Let's look at another example:
    >>> import re
    >>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)   # re.I ignores case
    >>> m = pattern.match('Hello World Wide Web')
    >>> print(m)                                           # match succeeded: a Match object
    <re.Match object; span=(0, 11), match='Hello World'>
    >>> m.group(0)   # the entire matched substring
    'Hello World'
    >>> m.span(0)    # index range of the entire match
    (0, 11)
    >>> m.group(1)   # text matched by the first group
    'Hello'
    >>> m.span(1)    # index range of the first group
    (0, 5)
    >>> m.group(2)   # text matched by the second group
    'World'
    >>> m.span(2)    # index range of the second group
    (6, 11)
    >>> m.groups()   # equivalent to (m.group(1), m.group(2), ...)
    ('Hello', 'World')
    >>> m.group(3)   # there is no third group
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: no such group
findall
Finds all substrings in the string matched by the regular expression and returns a list, or an empty list if no matches are found.
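Note: match and search return at most one match, while findall returns all of them.
The syntax format, as a method of a compiled pattern object, is:
findall(string[, pos[, endpos]])
The module-level form is re.findall(pattern, string, flags=0).
Parameters: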
- string : The string to match.
- pos : optional parameter, specifying the starting position of the string, default is 0.
- endpos : an optional parameter that specifies the end position of the string, the default is the length of the string.
Find all numbers in a string:
    import re

    pattern = re.compile(r'\d+')   # find runs of digits
    result1 = pattern.findall('school 123 google 456')
    result2 = pattern.findall('sch88ool123google456', 0, 10)

    print(result1)
    print(result2)

Output result:

    ['123', '456']
    ['88', '12']
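What findall returns depends on the capturing groups in the pattern: with groups present, the list holds the group text rather than the whole match. A small sketch, with a made-up sample string:

```python
import re

text = "width=640, height=480"

# No capturing group: each item is the whole match.
whole = re.findall(r"\w+=\d+", text)

# One capturing group: each item is that group's text.
values = re.findall(r"\w+=(\d+)", text)

# Two capturing groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)   # ['width=640', 'height=480']
print(values)  # ['640', '480']
print(pairs)   # [('width', '640'), ('height', '480')]
```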
re.finditer
Similar to findall, finds all substrings matched by the regular expression in a string and returns them as an iterator.
re.finditer(pattern, string, flags=0)
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match. |
string | The string to search. |
flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
Example:
    import re

    it = re.finditer(r"\d+", "12a32bc43jf3")
    for match in it:
        print(match.group())

Output result:

    12
    32
    43
    3
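Since each item produced by finditer is a match object, the position of every match is available as well as its text. The sketch below reuses the sample string from the example above; collecting the results into a list is just one possible choice.

```python
import re

text = "12a32bc43jf3"

# Pair each matched substring with its (start, end) position in the string.
found = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(found)
# [('12', (0, 2)), ('32', (3, 5)), ('43', (7, 9)), ('3', (11, 12))]
```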
re.split
re.split splits the string at every match of the pattern and returns the resulting pieces as a list. Its usage is as follows:
re.split(pattern, string, maxsplit=0, flags=0)
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match. |
string | The string to split. |
maxsplit | The maximum number of splits. maxsplit=1 splits once; the default of 0 means no limit. |
flags | Flags that control how the regular expression is matched, such as case-insensitive or multi-line matching. |
Example:
    >>> import re
    >>> re.split(r'\W+', 'school, school, school.')
    ['school', 'school', 'school', '']
    >>> re.split(r'(\W+)', ' school, school, school.')
    ['', ' ', 'school', ', ', 'school', ', ', 'school', '.', '']
    >>> re.split(r'\W+', ' school, school, school.', maxsplit=1)
    ['', 'school, school, school.']
    >>> re.split(r'a+', 'hello world')   # no match found, so the string is not split
    ['hello world']
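A common use of re.split is breaking a string on several different delimiters at once, which str.split cannot do in a single call. The following is a minimal sketch; the sample text and the choice of delimiters are assumptions for illustration.

```python
import re

text = "apple, banana;cherry  date"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit limits how many splits happen; the rest of the string stays in one piece.
first_only = re.split(r"[,;\s]+", text, maxsplit=1)

print(parts)       # ['apple', 'banana', 'cherry', 'date']
print(first_only)  # ['apple', 'banana;cherry  date']
```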