An in-depth summary of Python strings

Today, let's learn about the string data type. We will discuss how to declare strings, the relationship between strings and the ASCII table, the properties of strings, and some important string methods and operations. It's packed with practical material, so don't miss it!

What is a Python string

A string is an object that contains a sequence of characters. A character is a string of length 1, so in Python a single character is also a string. Interestingly, Python has no separate character data type, unlike other programming languages such as C, Kotlin and Java.

We can declare Python strings using single quotes, double quotes, triple quotes, or the str() function. The following code shows how to declare a string in Python:

# A single quote string
single_quote = 'a'  # This is an example of a character in other programming languages. It is a string in Python

# Another single quote string
another_single_quote = 'Programming teaches you patience.'

# A double quote string
double_quote = "aa"

# Another double-quote string
another_double_quote = "It is impossible until it is done!"

# A triple quote string
triple_quote = '''aaa'''

# Also a triple quote string
another_triple_quote = """Welcome to the Python programming language. Ready, 1, 2, 3, Go!"""

# Using the str() function
string_function = str(123.45)  # str() converts float data type to string data type

# Another str() function
another_string_function = str(True)  # str() converts a boolean data type to string data type

# An empty string
empty_string = ''

# Also an empty string
second_empty_string = ""

# We are not done yet
third_empty_string = """"""  # This is also an empty string: ''''''

Another way to get strings in Python is to use the input() function, which reads a value typed on the keyboard into the program. The value is always read as a string, but we can convert it to other data types:

# Inputs into a Python program
input_float = input()  # Type in: 3.142
input_boolean = input() # Type in: True

# Convert inputs into other data types
convert_float = float(input_float)  # converts the string data type to a float
convert_boolean = bool(input_boolean) # converts the string to a bool (True for any non-empty string)

We use the type() function to determine the data type of the object in Python, which returns the class of the object. When the object is a string, it returns the str class. Similarly, when the object is a dictionary, integer, floating-point number, tuple or Boolean value, it returns dict, int, float, tuple and bool classes respectively. Now let's use the type() function to determine the data type of the variable declared in the previous code fragment:

# Data types/ classes with type()

print(type(single_quote))
print(type(another_triple_quote))
print(type(empty_string))

print(type(input_float))
print(type(input_boolean))

print(type(convert_float))
print(type(convert_boolean))
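A caveat about calling bool() on text read by input() is worth a small sketch: bool() does not parse the text, it only checks whether the string is empty, so bool("False") is True. The helper name parse_bool and the spellings it accepts are assumptions made for illustration, not part of Python itself.

```python
# bool() only checks whether the string is empty, not what it says.
print(bool("False"))  # True, because the string is non-empty
print(bool(""))       # False, because the string is empty


def parse_bool(text):
    """Hypothetical helper: map the text 'true'/'false' to a bool."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


print(parse_bool("True"))     # True
print(parse_bool(" false "))  # False
```

Raising ValueError for anything else mirrors how float() reacts to text it cannot convert.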

ASCII table and Python string characters

The American Standard Code for Information Interchange (ASCII) maps characters to numbers, because computers store and process text as numbers. ASCII encodes 128 characters, mainly from English, for processing information in computers and programming. The characters it encodes include lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), punctuation and other symbols.

The ord() function converts a Python string of length 1 (one character) to its integer code point, which for ASCII characters is its decimal number in the ASCII table, while the chr() function converts that number back to the string. For example:

import string

# Convert uppercase characters to their ASCII decimal numbers
ascii_upper_case = string.ascii_uppercase  # Output: ABCDEFGHIJKLMNOPQRSTUVWXYZ

for one_letter in ascii_upper_case[:5]:  # Loop through ABCDE
    print(ord(one_letter))

Output:

65
66
67
68
69

The same conversion works for digit characters:

# Convert digit characters to their ASCII decimal numbers
ascii_digits = string.digits  # Output: 0123456789

for one_digit in ascii_digits[:5]:  # Loop through 01234
    print(ord(one_digit))

Output:

48
49
50
51
52

In the code snippets above, we iterate through the strings ABCDE and 01234 and convert each character to its decimal representation in the ASCII table. We can also use the chr() function to perform the reverse operation and convert decimal numbers in the ASCII table to their Python string characters. For example:

decimal_rep_ascii = [37, 44, 63, 82, 100]

for one_decimal in decimal_rep_ascii:
    print(chr(one_decimal))

Output:

%
,
?
R
d

Each character in the output above is the one that the ASCII table maps to the corresponding decimal number in the list.
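As a small sketch of how ord() and chr() relate, the loop below checks that they undo each other. The last lines show that ord() also works outside the 128 ASCII characters, because Python strings hold Unicode characters. The variable names are illustrative only.

```python
# ord() and chr() are inverses of each other for a single character.
codes = []
for character in "Az09?":
    code = ord(character)
    assert chr(code) == character  # the round trip gives the character back
    codes.append(code)

print(codes)  # [65, 122, 48, 57, 63]

# ord() is not limited to ASCII: the euro sign has code point 8364.
euro_code = ord("\u20ac")
print(euro_code)       # 8364
print(chr(euro_code))  # €
```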

String properties

Zero-based indexing: the index of the first element in the string is zero, while the index of the last element is len(string) - 1. For example:

immutable_string = "Accountability"

print(len(immutable_string))
print(immutable_string.index('A'))
print(immutable_string.index('y'))

Output:

14
0
13

Immutability: this means that we cannot update the characters in a string. For example, we cannot delete an element from a string or assign a new element at any of its index positions. If we try to update the string, Python raises a TypeError:

immutable_string = "Accountability"

# Assign a new element at index 0
immutable_string[0] = 'B'

Output:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_11336/2351953155.py in 
      2 
      3 # Assign a new element at index 0
----> 4 immutable_string[0] = 'B'

TypeError: 'str' object does not support item assignment

We can, however, assign a new string to the immutable_string variable. Note that it is not the same string, because the variable no longer points to the same object in memory. Python does not update the old string object; it creates a new one, as we can see from the ids:

immutable_string = "Accountability"
print(id(immutable_string))

immutable_string = "Bccountability"
print(id(immutable_string))

test_immutable = immutable_string
print(id(test_immutable))

Output:

2693751670576
2693751671024
2693751671024

The first two ids differ, even though they come from the same computer, which means the two strings assigned to immutable_string live at different addresses in memory. We then assigned the last immutable_string to the test_immutable variable, and the third id shows that test_immutable and the last immutable_string point to the same address.
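Because a string cannot be changed in place, an "edited" string is usually built as a new object. The following is a minimal sketch of two common ways to do that; the variable names are illustrative only.

```python
original = "Accountability"

# Build a new string from a replacement character plus a slice of the old one.
modified = "B" + original[1:]
print(modified)  # Bccountability
print(original)  # Accountability (the old string is unchanged)

# str.replace() also returns a new string; the count of 1 limits it
# to the first match.
replaced = original.replace("A", "B", 1)
print(replaced)              # Bccountability
print(replaced == modified)  # True (the two results hold the same text)
```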

Concatenation: we can join two or more strings together with the + sign to get a new string. For example:

first_string = "Zhou"
second_string = "luobo"
third_string = "Learn Python"

fourth_string = first_string + second_string
print(fourth_string)

fifth_string = fourth_string + " " + third_string
print(fifth_string)

Output:

Zhouluobo
Zhouluobo Learn Python
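When there are many pieces to combine, str.join() is a common alternative to repeated + signs. The sketch below reuses the example words in a list; the list and variable names are assumptions made for illustration.

```python
words = ["Zhou", "luobo", "Learn", "Python"]

# join() places the separator string between the items of the list.
sentence = " ".join(words)
print(sentence)  # Zhou luobo Learn Python

# An empty separator glues the pieces together directly.
glued = "".join(words[:2])
print(glued)  # Zhouluobo

# join() only accepts strings, so other types are converted first.
numbers = ", ".join(str(n) for n in [1, 2, 3])
print(numbers)  # 1, 2, 3
```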

Repetition: a string can be repeated with the * symbol. For example:

print("Ha" * 3)

Output:

HaHaHa

Indexing and slicing: we have determined that the string is indexed from zero, and we can use its index value to access any element in the string. We can also obtain a subset of strings by slicing between two index values. For example:

main_string = "I learned English and Python with ZHouluobo. You can do it too!"

# Index 0
print(main_string[0])

# Index 1
print(main_string[1])

# Check if Index 1 is whitespace
print(main_string[1].isspace())

# Slicing 1
print(main_string[0:17])

# Slicing 2:
print(main_string[-18:])

# Slicing and concatenation
print(main_string[0:17] + ". " + main_string[-18:])

Output:

I

True
I learned English
You can do it too!
I learned English. You can do it too!
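Slices also accept an optional third value, the step. The short sketch below shows a few variations on a sample word chosen for illustration.

```python
text = "Python"

every_second = text[::2]   # start to end, taking every second character
reversed_text = text[::-1] # a negative step walks the string backwards
last = text[-1]            # negative indexes count from the end
clipped = text[2:100]      # slice bounds past the end are clipped

print(every_second)   # Pto
print(reversed_text)  # nohtyP
print(last)           # n
print(clipped)        # thon
```

Unlike a slice, a single index that is out of range, such as text[100], raises an IndexError.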

String methods

str.split(sep=None, maxsplit=-1): the split method takes two parameters, sep and maxsplit. When called with the default values, it splits the string wherever there is whitespace. The method returns a list of strings:

string = "Apple, Banana, Orange, Blueberry"
print(string.split())

Output:

['Apple,', 'Banana,', 'Orange,', 'Blueberry']

We can see that the string is not split well, because the pieces still contain commas. We can use sep=',' to split wherever there is a comma:

print(string.split(sep=','))

Output:

['Apple', ' Banana', ' Orange', ' Blueberry']

This is better than the previous split, but we can see a space before some of the pieces. We can remove it by using sep=', ':

# Notice the whitespace after the comma
print(string.split(sep=', '))

Output:

['Apple', 'Banana', 'Orange', 'Blueberry']

Now the string is split well. Sometimes we don't want to split as many times as possible. We can use the maxsplit parameter to specify the maximum number of splits:

print(string.split(sep=', ', maxsplit=1))

print(string.split(sep=', ', maxsplit=2))

Output:

['Apple', 'Banana, Orange, Blueberry']
['Apple', 'Banana', 'Orange, Blueberry']
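Real input often has irregular spacing around the separator, in which case a fixed sep such as ', ' no longer fits. One possible approach, sketched here with an invented sample string, is to split on the bare comma and strip each piece afterwards.

```python
raw = "Apple,Banana ,  Orange,Blueberry  "

# Split on the comma alone, then remove the whitespace around each piece.
fruits = [piece.strip() for piece in raw.split(",")]
print(fruits)  # ['Apple', 'Banana', 'Orange', 'Blueberry']

# rsplit() works like split() but counts maxsplit from the right-hand side.
tail = "a,b,c,d".rsplit(",", maxsplit=1)
print(tail)  # ['a,b,c', 'd']
```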

str.splitlines(keepends=False): sometimes we want to process a corpus that uses different line breaks ('\n', '\n\n', '\r', '\r\n') at its line boundaries. We need to break it into lines, not individual words. We can use the splitlines method to do this. When keepends=True, the line breaks are kept in the returned strings; otherwise they are excluded:

import nltk  # You may have to `pip install nltk` and run nltk.download('gutenberg') once to use this corpus.

macbeth = nltk.corpus.gutenberg.raw('shakespeare-macbeth.txt')
print(macbeth.splitlines(keepends=True)[:5])

Output:

['[The Tragedie of Macbeth by William Shakespeare 1603]\n', '\n', '\n', 'Actus Primus. Scoena Prima.\n', '\n']
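For readers without nltk installed, the same behaviour can be sketched on a short literal string that mixes the three line-break styles. The sample text is invented for illustration.

```python
text = "first\nsecond\r\nthird\rfourth"

# All three line-break styles are recognized and removed by default.
lines = text.splitlines()
print(lines)  # ['first', 'second', 'third', 'fourth']

# keepends=True leaves each line break attached to its line.
kept = text.splitlines(keepends=True)
print(kept)  # ['first\n', 'second\r\n', 'third\r', 'fourth']

# Unlike split('\n'), splitlines() adds no empty string after a final newline.
print("a\nb\n".split("\n"))   # ['a', 'b', '']
print("a\nb\n".splitlines())  # ['a', 'b']
```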

str.strip([chars]): we use the strip method to remove leading and trailing whitespace, or the given characters, from both ends of the string. For example:

string = "    Apple Apple Apple no apple in the box apple apple             "

stripped_string = string.strip()
print(stripped_string)

left_stripped_string = (
    stripped_string
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
    .lstrip('Apple')
    .lstrip()
)
print(left_stripped_string)

capitalized_string = left_stripped_string.capitalize()
print(capitalized_string)

right_stripped_string = (
    capitalized_string
    .rstrip('apple')
    .rstrip()
    .rstrip('apple')
    .rstrip()
)
print(right_stripped_string)

Output:

Apple Apple Apple no apple in the box apple apple
no apple in the box apple apple
No apple in the box apple apple
No apple in the box
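One detail of the strip family is easy to miss: the chars argument is treated as a set of characters, not as a prefix or suffix. The sketch below shows the difference on an invented filename, and assumes Python 3.9 or later for removeprefix() and removesuffix().

```python
filename = "test.txt"

# rstrip() removes any trailing characters found in the set {'.', 't', 'x'},
# so it also eats the final 't' of "test".
stripped = filename.rstrip(".txt")
print(stripped)  # tes

# removesuffix() (Python 3.9+) removes the exact suffix and nothing more.
without_suffix = filename.removesuffix(".txt")
print(without_suffix)  # test

# removeprefix() (Python 3.9+) does the same at the start of the string.
without_prefix = "Apple pie".removeprefix("Apple ")
print(without_prefix)  # pie
```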

In the strip() example with the Apple string, we used the lstrip and rstrip methods, which remove leading and trailing whitespace or characters from the left and right ends of the string, respectively. We also used the capitalize method, which converts the string to sentence case.

str.zfill(width): the zfill method pads the string on the left with 0 characters until it reaches the specified width. For example:

example = "0.8"  # len(example) is 3
example_zfill = example.zfill(5) # len(example_zfill) is 5
print(example_zfill)

Output:

000.8
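Two further properties of zfill() can be sketched briefly: it keeps a leading sign in front of the padding, and it never shortens a string that is already wide enough. The sample values are chosen for illustration.

```python
# zfill() inserts the zeros after a leading sign.
signed = "-42".zfill(5)
print(signed)  # -0042

# rjust() pads with the given character but knows nothing about signs.
justified = "-42".rjust(5, "0")
print(justified)  # 00-42

# A string longer than the requested width is returned unchanged.
unchanged = "12345678".zfill(5)
print(unchanged)  # 12345678
```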

str.isalpha(): this method returns True if the string is non-empty and all of its characters are letters; otherwise it returns False:

# Alphabet string
alphabet_one = "Learning"
print(alphabet_one.isalpha())

# Contains whitespace
alphabet_two = "Learning Python"
print(alphabet_two.isalpha())

# Contains comma symbols
alphabet_three = "Learning,"
print(alphabet_three.isalpha())

Output:

True
False
False

str.isalnum() returns True if all characters in the string are alphanumeric. str.isdecimal() returns True if all characters are decimal digits (0-9 and their equivalents in other scripts). str.isdigit() also accepts digit characters such as superscripts, and str.isnumeric() additionally accepts other numeric characters such as fractions.
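The differences between isdecimal(), isdigit() and isnumeric() only show up with characters beyond the plain digits 0-9. The sketch below uses a superscript two and a one-half fraction as sample characters chosen for illustration.

```python
samples = {
    "plain digit": "5",
    "superscript two": "\u00b2",
    "one half": "\u00bd",
    "decimal point": "3.14",
}

results = {}
for label, text in samples.items():
    # Collect the three checks as a tuple: (isdecimal, isdigit, isnumeric).
    results[label] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(label, results[label])

# plain digit     (True, True, True)
# superscript two (False, True, True)
# one half        (False, False, True)
# decimal point   (False, False, False)

# isalnum() accepts letters and digits together, but no punctuation.
print("abc123".isalnum())   # True
print("abc 123".isalnum())  # False
```

Note that "3.14" fails all three checks because of the dot, so none of these methods can test whether a string holds a float.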

str.islower() returns True if all cased characters in the string are lowercase; str.isupper() returns True if all cased characters are uppercase; str.istitle() returns True if the first letter of each word is capitalized and the remaining letters are lowercase:

# islower() example
string_one = "Artificial Neural Network"
print(string_one.islower())

string_two = string_one.lower()  # converts string to lowercase
print(string_two.islower())

# isupper() example
string_three = string_one.upper() # converts string to uppercase
print(string_three.isupper())

# istitle() example
print(string_one.istitle())

Output:

False
True
True
True

str.endswith(suffix) returns True if the string ends with the specified suffix. str.startswith(prefix) returns True if the string starts with the specified prefix. Both accept a tuple of strings and return True if any of them matches:

sentences = ['Time to master data science', 'I love statistical computing', 'Eat, sleep, code']

# endswith() example
for one_sentence in sentences:
    print(one_sentence.endswith(('science', 'computing', 'Code')))

Output:

True
True
False

# startswith() example
for one_sentence in sentences:
    print(one_sentence.startswith(('Time', 'I ', 'Ea')))

Output:

True
True
True

str.find(substring) returns the lowest index at which the substring occurs in the string; otherwise it returns -1. str.rfind(substring) returns the highest index. str.index(substring) and str.rindex(substring) likewise return the lowest and highest index if the substring is found, but if the substring is not in the string they raise a ValueError:

string = "programming"

# find() and rfind() examples
print(string.find('m'))
print(string.find('pro'))
print(string.rfind('m'))
print(string.rfind('game'))

# index() and rindex() examples
print(string.index('m'))
print(string.index('pro'))
print(string.rindex('m'))
print(string.rindex('game'))

Output:

6
0
7
-1
6
0
7

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_11336/3954098241.py in 
     11 print(string.index('pro'))  # Output: 0
     12 print(string.rindex('m'))  # Output: 7
---> 13 print(string.rindex('game'))  # Output: ValueError: substring not found

ValueError: substring not found
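The two ways of signalling "not found" call for different handling in practice. The following is a minimal sketch of both patterns; the messages and variable names are illustrative only.

```python
text = "programming"

# find() signals "not found" with -1, so test the result before using it.
position = text.find("game")
find_message = "not found" if position == -1 else f"found at {position}"
print(find_message)  # not found

# index() signals "not found" with an exception.
try:
    position = text.index("game")
    index_message = f"found at {position}"
except ValueError:
    index_message = "not found"
print(index_message)  # not found

# Pitfall: -1 is itself a valid index, so using an unchecked find() result
# silently picks the last character instead of failing.
picked = text[text.find("z")]
print(picked)  # g
```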

str.maketrans(dict_map) creates a translation table from a dictionary mapping, and str.translate(table) replaces each character found in the table with its new value. For example:

example = "abcde"
mapped = {'a':'1', 'b':'2', 'c':'3', 'd':'4', 'e':'5'}
print(example.translate(example.maketrans(mapped)))

Output:

12345
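str.maketrans() can also be called with two strings of equal length instead of a dictionary, and an optional third string lists characters to delete. A short sketch, with sample words chosen for illustration:

```python
# Two equal-length strings: each character in the first maps to the
# character at the same position in the second.
table = str.maketrans("abcde", "12345")
encoded = "decade".translate(table)
print(encoded)  # 453145

# A third argument lists characters that should be removed entirely.
drop_vowels = str.maketrans("", "", "aeiou")
consonants = "programming".translate(drop_vowels)
print(consonants)  # prgrmmng
```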

String operations

Loop through a string

Strings are iterable, so we can loop over them with a for loop or with enumerate():

# For-loop example
word = "bank"
for letter in word:
    print(letter)

Output:

b
a
n
k

# Enumerate example
for idx, value in enumerate(word):
    print(idx, value)

Output:

0 b
1 a
2 n
3 k

String and relational operators

When comparing two strings using relational operators (>, <, == and so on), the elements of the two strings are compared one by one according to their ASCII decimal numbers (more generally, their Unicode code points). For example:

print('a' > 'b')
print('abc' > 'b')

Output:

False
False

In both cases, the output is False. The relational operator first compares the ASCII decimal numbers of the elements at index 0 of the two strings. Since b is greater than a, False is returned; in this case, the ASCII decimal numbers of the other elements and the lengths of the strings are irrelevant.

When the leading elements are equal, Python compares the ASCII decimal number of each element starting from index 0 until it finds elements that differ. For example:

print('abd' > 'abc')

Output:

True
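Two consequences of comparing by character code are worth a quick sketch: every uppercase ASCII letter sorts before every lowercase one, and a string that is a prefix of another compares as smaller. The word list is invented for illustration.

```python
# ord("Z") is 90 and ord("a") is 97, so uppercase sorts first.
print("Z" < "a")  # True

# When one string is a prefix of the other, the shorter one is smaller.
print("abc" < "abcd")  # True

words = ["banana", "apple", "Cherry"]

# Default sorting follows the character codes, so "Cherry" comes first.
default_order = sorted(words)
print(default_order)  # ['Cherry', 'apple', 'banana']

# Sorting on a lowercased key gives a case-insensitive order instead.
ignore_case = sorted(words, key=str.lower)
print(ignore_case)  # ['apple', 'banana', 'Cherry']
```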

Check the membership of the string

The in operator checks whether a substring is a member of a string:

print('data' in 'dataquest')
print('gram' in 'programming')

Output:

True
True

Another way to check string membership, replace substrings, or match patterns is to use regular expressions:

import re

substring = 'gram'
string = 'programming'
replacement = '1234'

# Check membership
print(re.search(substring, string))

# Replace string
print(re.sub(substring, replacement, string))

Output:

<re.Match object; span=(3, 7), match='gram'>
pro1234ming
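One thing to keep in mind when using re for membership checks is that the first argument is a pattern, not plain text, so characters such as + have a special meaning. The sketch below, with an invented sample string, shows the effect and how re.escape() avoids it.

```python
import re

text = "1+1=2"

# In a pattern, "+" means "one or more of the previous character",
# so "1+1" looks for at least two 1s in a row and finds nothing here.
unescaped = re.search("1+1", text)
print(unescaped)  # None

# re.escape() turns the substring into a pattern that matches it literally.
match = re.search(re.escape("1+1"), text)
print(match.group())  # 1+1
print(match.span())   # (0, 3)

# re.search() returns None when nothing matches, which is falsy.
missing = re.search("xyz", "programming")
print(missing is None)  # True
```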

String formatting

f-strings and the str.format() method are used to format strings. Both use brace {} placeholders. For example:

monday, tuesday, wednesday = "Monday", "Tuesday", "Wednesday"

format_string_one = "{} {} {}".format(monday, tuesday, wednesday)
print(format_string_one)

format_string_two = "{2} {1} {0}".format(monday, tuesday, wednesday)
print(format_string_two)

format_string_three = "{one} {two} {three}".format(one=tuesday, two=wednesday, three=monday)
print(format_string_three)

format_string_four = f"{monday} {tuesday} {wednesday}"
print(format_string_four)

Output:

Monday Tuesday Wednesday
Wednesday Tuesday Monday
Tuesday Wednesday Monday
Monday Tuesday Wednesday

f-strings are more readable, and they generally run faster than the str.format() method. Therefore, f-strings are the preferred way to format strings.
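Placeholders can also carry a format specification after a colon, which controls rounding, padding and alignment. A brief sketch with sample values chosen for illustration:

```python
pi = 3.14159
count = 42
total = 1234567
label = "hi"

rounded = f"{pi:.2f}"       # two digits after the decimal point
padded = f"{count:05d}"     # zero-padded to a width of 5
grouped = f"{total:,}"      # comma as thousands separator
aligned = f"[{label:>6}]"   # right-aligned in a field of width 6

print(rounded)  # 3.14
print(padded)   # 00042
print(grouped)  # 1,234,567
print(aligned)  # [    hi]

# The same specifications work with str.format().
print("{:.2f}".format(pi))  # 3.14
```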

Handle quotes and apostrophes

An apostrophe (') delimits a string in Python. To let Python know that an apostrophe is part of the text rather than the end of the string, we must use the Python escape character (\). Therefore, the apostrophe is written as \' in Python. Unlike apostrophes, quotation marks can be handled in several ways in Python. They include the following:

# 1. Represent string with single quote (`''`) and quoted statement with double quote (`""`)
quotes_one =  '"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun'
print(quotes_one)

# 2. Represent string with double quote `("")` and quoted statement with escape and double quote `(\"statement\")`
quotes_two =  "\"Friends don\'t let friends use minibatches larger than 32\" - Yann LeCun"
print(quotes_two)

# 3. Represent string with triple quote (`""""""`) and quoted statement with double quote (`""`)
quote_three = """"Friends don\'t let friends use minibatches larger than 32" - Yann LeCun"""
print(quote_three)

Output:

"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun
"Friends don't let friends use minibatches larger than 32" - Yann LeCun

Final words

Strings are among the most commonly used data types in programming, so it is important to master their properties and methods and to apply them flexibly. Review this material regularly and watch for chances to use it!

Well, that's all we're sharing today. If you enjoyed it, give it a like~

Posted by shanx24 on Sun, 22 May 2022 10:42:34 +0300