NumPy is a Python library for array operations. It supports a wide range of multidimensional array and matrix operations.
This article is part of the Pre machine learning series.
1, Python Basics
Let's first review the basics of Python. Python has six standard data types: Number, String, List, Tuple, Set, and Dictionary.
Of these:
Immutable data: Number, String, Tuple.
Mutable data: List, Dictionary, Set.
1. List
A list is enclosed in square brackets [], and the value at each position can be changed.
a = [1, 2, 3, 4, 5, 6]
Access a value by its position (index). For example, the value at the second position:
a[1]
This returns 2.
All values from the third position to the end of the list:
a[2:]
This returns [3, 4, 5, 6].
Change the value at a specified position:
a[0] = 9
List a is now [9, 2, 3, 4, 5, 6].
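The following minimal, runnable sketch collects the list operations above. It names the list a, which matches the text's later references, rather than list, because that name would shadow Python's built-in list type:

```python
# Create a list; avoid naming it `list`, which would shadow the built-in type
a = [1, 2, 3, 4, 5, 6]

second = a[1]      # index 1 is the second position
tail = a[2:]       # from the third position to the end
a[0] = 9           # lists are mutable, so items can be reassigned in place

print(second, tail, a)
```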
2. Tuple
Tuples are enclosed in parentheses (), and their values cannot be changed after creation. Duplicate values are allowed.
tup = ('a', 'a', 'c', 1, 2, 3.0)
Output: ('a', 'a', 'c', 1, 2, 3.0).
Take the last element:
tup[-1]
Output: 3.0.
Tuple operations are similar to list operations, but the elements of a tuple cannot be reassigned. Trying to do so raises an error:
tup[2] = 'caiyongji'  # TypeError: 'tuple' object does not support item assignment
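In this sketch the tuple is named tup so that it does not shadow the built-in tuple type. Indexing and counting work on a tuple, but item assignment raises a TypeError. The count method is an addition that the text does not cover:

```python
# Tuples allow duplicates and indexing, but not item assignment
tup = ('a', 'a', 'c', 1, 2, 3.0)

last = tup[-1]            # negative indices count from the end
count_a = tup.count('a')  # duplicates are allowed

try:
    tup[2] = 'caiyongji'  # attempt to modify a tuple element
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(last, count_a, error_message)
```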
3. Set {set}
A set is a collection of unique (non-repeating) elements, enclosed in curly braces {}.
set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}
set1 evaluates to {'a', 'b', 'c'}. Note: a set removes duplicate elements automatically. Sets are unordered, so the display order may vary.
set2 evaluates to {'b', 'c', 'd', 'e'}.
Unlike lists and tuples, sets do not support indexing, so the following raises an error:
set1[0]  # TypeError: 'set' object is not subscriptable
Next, let's look at set operations.
Difference of set1 and set2 (elements of set1 that are not in set2):
set1 - set2  # set1.difference(set2)
Output: {'a'}.
Union of set1 and set2:
set1 | set2  # set1.union(set2)
Output: {'a', 'b', 'c', 'd', 'e'}.
Intersection of set1 and set2:
set1 & set2  # set1.intersection(set2)
Output: {'b', 'c'}.
Symmetric difference of set1 and set2 (elements that are in exactly one of the two sets):
set1 ^ set2  # (set1 - set2) | (set2 - set1), or set1.symmetric_difference(set2)
Output: {'a', 'd', 'e'}.
Each of these operators (difference, union, intersection, and symmetric difference) has an equivalent set method, shown in the comments. Try the methods yourself.
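This self-check compares each set operator above with its method form. sorted() is used only to print the results in a stable order, because sets are unordered:

```python
set1 = {'a', 'b', 'c', 'a'}   # the duplicate 'a' is removed automatically
set2 = {'b', 'c', 'd', 'e'}

# Operator form vs. method form for each operation
pairs = {
    'difference': (set1 - set2, set1.difference(set2)),
    'union': (set1 | set2, set1.union(set2)),
    'intersection': (set1 & set2, set1.intersection(set2)),
    'symmetric_difference': (set1 ^ set2, set1.symmetric_difference(set2)),
}

for name, (by_operator, by_method) in pairs.items():
    print(f"{name}: {sorted(by_operator)} (matches method: {by_operator == by_method})")

# Sets do not support indexing
try:
    set1[0]
except TypeError as exc:
    print("TypeError:", exc)
```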
4. Dictionary {dictionary}
A dictionary is a mapping: a collection of key-value pairs. Since Python 3.7, dictionaries preserve insertion order. Keys must be unique, but values may repeat.
d = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}
d evaluates to {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}. Note that when a dictionary literal contains duplicate keys, the later value overwrites the earlier one.
d['gongzhonghao']
This returns the string 'caiyongji'. The get method gives the same result, but returns None instead of raising a KeyError when the key is missing:
d.get('gongzhonghao')
View all keys:
d.keys()
Output: dict_keys(['gongzhonghao', 'website']).
View all values:
d.values()
Output: dict_values(['caiyongji', 'blog.caiyongji.com']).
Change the value of an item:
d['website'] = 'caiyongji.com'
d
Output: {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}.
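This sketch runs the dictionary operations above. The key 'no_such_key' and the default 'n/a' are hypothetical values. They are included only to show how get behaves when a key is missing:

```python
# Duplicate keys in a literal: the later value wins
d = {'gongzhonghao': 'caiyongji',
     'website': 'caiyongji.com',
     'website': 'blog.caiyongji.com'}

value = d['gongzhonghao']
same_value = d.get('gongzhonghao')
missing = d.get('no_such_key')            # hypothetical key: get returns None
fallback = d.get('no_such_key', 'n/a')    # or returns the default you supply

d['website'] = 'caiyongji.com'            # dictionaries are mutable

print(list(d.keys()), list(d.values()), missing, fallback)
```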
With Python's data types covered, we can move on to NumPy.
2, NumPy common usage
1. Create an array
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
The output of arr is array([1, 2, 3, 4, 5]).
The following code creates a two-dimensional array:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
mtrx = np.array(my_matrix)
The output of mtrx is as follows:
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
2. Index and slice
Indexing a one-dimensional array and a two-dimensional array:
print('arr[0]=',arr[0],'mtrx[1,1]=',mtrx[1,1])
Output: arr[0]= 1 mtrx[1,1]= 5.
Slicing an array:
arr[:3]
The output result is array([1, 2, 3]).
Slicing with negative indices, which count from the end of the array:
arr[-3:-1]
Output array([3, 4]).
Add a step, which sets the interval between the selected elements:
arr[1:4:2]
Output array([2, 4]).
2D array slice:
mtrx[0:2, 0:2]
This takes rows 1 and 2 and columns 1 and 2. Output:
array([[1, 2], [4, 5]])
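The snippet below combines the indexing and slicing examples into one runnable piece. ndim and shape are not covered in this section; they report the number of dimensions and the size along each one:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mtrx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr.ndim, arr.shape)    # 1 (5,)
print(mtrx.ndim, mtrx.shape)  # 2 (3, 3)

first = arr[0]           # first element
center = mtrx[1, 1]      # row index 1, column index 1
head = arr[:3]           # first three elements
from_end = arr[-3:-1]    # the stop index is excluded
stepped = arr[1:4:2]     # every second element from index 1 up to (not including) 4
block = mtrx[0:2, 0:2]   # first two rows and first two columns
```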
3. dtype
NumPy dtypes fall into the following kinds, each identified by a character code:
- i - integer
- b - boolean
- u - unsigned integer
- f - float
- c - complex float
- m - timedelta
- M - datetime
- O - object
- S - string
- U - unicode string
- V - fixed chunk of memory for other types (void)
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)
The output is arr1.dtype= int32 arr2.dtype= <U6. The data type of arr1 is int32; the default integer size is platform-dependent, so you may see int64 instead. The <U6 of arr2 means a Unicode string of at most 6 characters.
We can also specify the dtype explicitly:
arr = np.array(['1', '2', '3'], dtype='f')
The output is array([1., 2., 3.], dtype=float32), where 1. means 1.0. The strings have been converted to the float32 data type.
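This sketch inspects the dtypes discussed above. dtype.kind returns the one-letter codes from the list. The original text does not use astype, which converts an existing array to another dtype. Because the default integer size depends on the platform, the printed integer type may differ on your machine:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])

# dtype.kind gives the one-letter codes listed above
print(arr1.dtype, arr1.dtype.kind)   # e.g. "int64 i"; the integer size is platform-dependent
print(arr2.dtype, arr2.dtype.kind)   # "<U6 U" on little-endian machines

# Convert strings to 32-bit floats when the array is created...
arr3 = np.array(['1', '2', '3'], dtype='f')
# ...or convert an existing array with astype, which returns a new array
arr4 = arr1.astype(np.float64)
```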
4. General methods
4.1 arange
The output of np.arange(0, 101, 2) is shown below. This call generates evenly spaced values in the half-open interval [0, 101) with a step of 2.
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100])
4.2 zeros
The output of np.zeros((2, 5)) is shown below. This call creates a matrix (two-dimensional array) of zeros with 2 rows and 5 columns.
array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])
4.3 ones
The output of np.ones((4, 4)) is shown below. This call creates a matrix of ones with 4 rows and 4 columns.
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])
4.4 eye
The output of np.eye(5) is shown below. This call creates a 5×5 identity matrix: a square matrix (one with the same number of rows and columns) whose main diagonal is 1 and whose other entries are all 0.
array([[1., 0., 0., 0., 0.], [0., 1., 0., 0., 0.], [0., 0., 1., 0., 0.], [0., 0., 0., 1., 0.], [0., 0., 0., 0., 1.]])
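This snippet runs the four creation functions above and prints a few properties to confirm their shapes and contents:

```python
import numpy as np

evens = np.arange(0, 101, 2)   # start=0, stop=101 (excluded), step=2
zeros = np.zeros((2, 5))       # the shape is passed as a tuple
ones = np.ones((4, 4))
identity = np.eye(5)           # 5x5 identity matrix

print(evens[:5], evens[-1], len(evens))   # [0 2 4 6 8] 100 51
print(zeros.shape, zeros.dtype)           # (2, 5) float64
print(identity.trace())                   # 5.0: five ones on the diagonal
```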
4.5 rand
np.random.rand(5, 2) generates random numbers, uniformly distributed over [0, 1), in an array with 5 rows and 2 columns.
array([[0.67227856, 0.4880784 ], [0.82549517, 0.03144639], [0.80804996, 0.56561742], [0.2976225 , 0.04669572], [0.9906274 , 0.00682573]])
To reproduce the same random numbers as this example, use the same random seed. Set it with the np.random.seed method:
np.random.seed(99)
np.random.rand(5,2)
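This sketch shows that resetting the seed reproduces the same numbers. It also demonstrates NumPy's newer Generator API (np.random.default_rng), which the article does not use. Because that API draws from a different generator, it will not reproduce the numbers shown in this article:

```python
import numpy as np

np.random.seed(99)
first = np.random.rand(5, 2)

np.random.seed(99)               # resetting the seed restarts the same sequence
second = np.random.rand(5, 2)

# Alternative not used in the article: the Generator API.
# It uses a different algorithm, so its values differ from np.random.seed's.
rng = np.random.default_rng(99)
uniform = rng.random((5, 2))                   # floats in [0, 1)
integers = rng.integers(0, 101, size=(4, 5))   # ints in [0, 101)
```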
4.6 randint
The output of np.random.randint(0, 101, (4, 5)) is shown below. This call draws random integers from the half-open interval [0, 101) to build an array with 4 rows and 5 columns.
array([[ 1, 35, 57, 40, 73], [82, 68, 69, 52, 1], [23, 35, 55, 65, 48], [93, 59, 87, 2, 64]])
4.7 max min argmax argmin
Let's randomly generate a set of numbers:
np.random.seed(99)
ranarr = np.random.randint(0,101,10)
ranarr
Output:
array([ 1, 35, 57, 40, 73, 82, 68, 69, 52, 1])
The maximum and minimum values are:
print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())
The output is ranarr.max()= 82 ranarr.min()= 1.
The index positions of the maximum and minimum values are:
print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())
Output: ranarr.argmax()= 5 ranarr.argmin()= 0. Note that when the maximum or minimum value occurs more than once, the index of its first occurrence is returned.
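The example below hard-codes the ten values printed above, so it does not depend on the seed. np.flatnonzero and the axis argument are additions. The first finds every occurrence of the minimum; the second reduces a 2-D array along its columns or rows:

```python
import numpy as np

# The ten values printed above, hard-coded so the example does not depend on the seed
ranarr = np.array([1, 35, 57, 40, 73, 82, 68, 69, 52, 1])

print(ranarr.max(), ranarr.argmax())   # 82 5
print(ranarr.min(), ranarr.argmin())   # 1 0  (1 also appears at index 9)

# All positions of the minimum, not just the first one
all_min_positions = np.flatnonzero(ranarr == ranarr.min())

# On a 2-D array, axis selects the direction of the reduction
grid = ranarr.reshape(2, 5)
col_max = grid.max(axis=0)        # largest value in each column
row_argmax = grid.argmax(axis=1)  # column index of each row's maximum
```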
3, NumPy advanced usage
1. reshape
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
Here arr is a one-dimensional array, and newarr is a two-dimensional array with 4 rows and 3 columns.
print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)
Output: arr.shape= (12,) newarr.shape= (4, 3).
The output of newarr is as follows:
array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
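This sketch covers two related reshape behaviors that the text does not show. Passing -1 lets NumPy infer one dimension, and requesting a shape whose total size does not match the number of elements raises a ValueError:

```python
import numpy as np

arr = np.arange(1, 13)        # the same values 1..12 as above
newarr = arr.reshape(4, 3)
auto = arr.reshape(2, -1)     # -1 lets NumPy infer this dimension (here 6)
flat = newarr.reshape(-1)     # back to one dimension

try:
    arr.reshape(5, 3)         # 15 slots for 12 elements: not allowed
except ValueError as exc:
    print("ValueError:", exc)
```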
2. Merging and splitting
2.1 concatenate
Merging one-dimensional arrays:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr
Output: array([1, 2, 3, 4, 5, 6]).
Merging two-dimensional arrays:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr
Output is:
array([[1, 2], [3, 4], [5, 6], [7, 8]])
Now add the parameter axis=1:
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr
Output is:
array([[1, 2, 5, 6], [3, 4, 7, 8]])
In Jupyter Notebook, you can place the cursor on concatenate and press Shift+Tab to view its documentation. It explains that concatenate joins arrays along an existing axis, and that the default is axis=0, which stacks the arrays vertically (one below the other). With axis=1, the arrays are joined along the columns, side by side.
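This sketch compares the two axis settings. It also includes np.vstack and np.hstack, which the text does not mention, as equivalent shortcuts for two-dimensional inputs:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((arr1, arr2))           # axis=0 (default): stack vertically
cols = np.concatenate((arr1, arr2), axis=1)   # axis=1: join side by side

# For 2-D inputs, vstack and hstack produce the same results
stacked_v = np.vstack((arr1, arr2))
stacked_h = np.hstack((arr1, arr2))

print(rows.shape, cols.shape)   # (4, 2) (2, 4)
```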
2.2 array_split
Splitting an array into three parts:
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr
The value of newarr is:
[array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]]), array([[ 9, 10], [11, 12]])]
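array_split also handles lengths that do not divide evenly. By contrast, np.split (not covered above) raises a ValueError in that case. The sketch below also splits the 6×2 array from the text along its columns with axis=1:

```python
import numpy as np

arr = np.arange(7)

# array_split tolerates sections that do not divide evenly
parts = np.array_split(arr, 3)       # sizes 3, 2, 2
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4], [5, 6]]

# np.split requires an equal division and raises ValueError otherwise
try:
    np.split(arr, 3)
except ValueError as exc:
    print("ValueError:", exc)

# The 6x2 array from the text, split along columns instead of rows
arr2d = np.arange(1, 13).reshape(6, 2)
left, right = np.array_split(arr2d, 2, axis=1)
```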
3. Search and filter
3.1 search
NumPy's where function finds the indices of the array elements that satisfy a condition:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x
Output:
(array([1, 3, 5, 7], dtype=int64),)
3.2 filtering
Let's look at the following code:
bool_arr = arr > 4
arr[bool_arr]
Output: array([5, 6, 7, 8]). This time the result contains the values from the array, not their indices.
Let's look at what bool_arr actually contains.
The output of bool_arr is:
array([False, False, False, False, True, True, True, True])
So the filter above can be written more concisely as:
arr[arr > 4]
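This sketch combines the search and filter techniques above. Two parts go beyond the original text: combining conditions with &, and the three-argument form of np.where:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

even_idx = np.where(arr % 2 == 0)[0]   # where returns a tuple with one index array per dimension
big = arr[arr > 4]                     # a boolean mask selects values

# Combine conditions with & (and) or | (or), wrapping each condition in parentheses
mid = arr[(arr > 2) & (arr < 7)]

# Three-argument form: take 'even' where the condition holds, 'odd' elsewhere
labels = np.where(arr % 2 == 0, 'even', 'odd')
```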
4. Sorting
The np.sort function returns a sorted copy of an ndarray.
arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)
The sorted result is: array(['apple', 'banana', 'cherry'], dtype='<U6').
For a two-dimensional array, np.sort sorts each row separately by default, because it sorts along the last axis.
arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)
Output result:
array([[2, 3, 4], [0, 1, 5]])
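np.sort also takes an axis argument. The sketch below sorts the same 2-D array by row, by column, and after flattening. It also contrasts np.sort, which returns a copy, with the in-place ndarray.sort method:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)             # default axis=-1: each row sorted
by_col = np.sort(arr, axis=0)     # each column sorted
flat = np.sort(arr, axis=None)    # flatten first, then sort

# np.sort left arr unchanged; ndarray.sort sorts in place and returns None
returned = arr.sort()
```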
5. Random
5.1 random probability
Suppose we want to meet the following requirements:
Generate a one-dimensional array of 100 values, where each value must be 3, 5, 7, or 9.
Set the probability of a value of 3 to 0.1.
Set the probability of a value of 5 to 0.3.
Set the probability of a value of 7 to 0.6.
Set the probability that the value is 9 to 0.
We solve it with the following command:
np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=100)
Output result:
array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3, 5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7, 7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])
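To check that the probabilities are respected, the sketch below draws a much larger sample and compares the observed frequencies with p. The seed 0 and the sample size are arbitrary choices made for this illustration:

```python
import numpy as np

np.random.seed(0)               # arbitrary seed, chosen only to make the run repeatable
values = [3, 5, 7, 9]
probs = [0.1, 0.3, 0.6, 0.0]    # the probabilities must sum to 1

sample = np.random.choice(values, p=probs, size=100_000)

# The observed frequencies should be close to the requested probabilities
freq = {v: float(np.mean(sample == v)) for v in values}
print({v: round(f, 3) for v, f in freq.items()})
```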
5.2 random permutation
5.2.1 permutation
np.random.permutation generates a new, randomly permuted array from the original array:
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr
The output is: array([3, 1, 5, 4, 2]). The original array arr remains unchanged.
5.2.2 shuffle
np.random.shuffle rearranges the original array randomly, in place, like shuffling a deck of cards:
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr
The output is: array([3, 1, 5, 4, 2]). This time the original array arr itself is changed.
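This sketch contrasts the two functions. permutation leaves its input alone and returns a new array, while shuffle modifies its input and returns None. permutation also accepts an integer n, in which case it permutes np.arange(n):

```python
import numpy as np

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)   # returns a shuffled copy; arr is untouched

np.random.seed(99)
arr2 = np.array([1, 2, 3, 4, 5])
result = np.random.shuffle(arr2)       # shuffles arr2 in place and returns None

# With an integer argument, permutation returns a shuffled np.arange(n)
perm_of_range = np.random.permutation(5)
```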
5.3 random distribution
5.3.1 normal distribution
np.random.normal generates random numbers that follow a normal (Gaussian) distribution, where loc is the mean and scale is the standard deviation:
x = np.random.normal(loc=1, scale=2, size=(2, 3))
x
The output result is:
array([[ 0.14998973, 3.22564777, 1.48094109], [ 2.252752 , -1.64038195, 2.8590667 ]])
To visualize the distribution of x, we can plot it with seaborn. Install it using pip:
pip install -i https://pypi.tuna.tsinghua.ed... seaborn
import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; kdeplot draws the same density curve
sns.kdeplot(x.ravel())
plt.show()
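To check the meaning of loc and scale numerically, the sketch below draws a large sample and compares its mean and standard deviation with loc=1 and scale=2. The seed 0 and the sample size are arbitrary. It also checks the familiar rule that about 68% of normal values lie within one standard deviation of the mean:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for repeatability
# loc is the mean, scale is the standard deviation
samples = np.random.normal(loc=1, scale=2, size=100_000)

print(round(samples.mean(), 2), round(samples.std(), 2))  # close to 1 and 2

# Fraction of values within one standard deviation (2) of the mean (1)
within_one_sd = np.mean(np.abs(samples - 1) < 2)
```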
5.3.2 binomial distribution
np.random.binomial generates random numbers that follow a binomial distribution, where n is the number of trials and p is the probability of success in each trial:
x = np.random.binomial(n=10, p=0.5, size=10)
x
The output result is: array([8, 6, 6, 2, 5, 5, 5, 5, 3, 4]).
Plot it:
import matplotlib.pyplot as plt
import seaborn as sns
# histplot replaces the deprecated distplot(x, hist=True, kde=False)
sns.histplot(x)
plt.show()
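In the same way, a large binomial sample should have a mean close to n*p and a variance close to n*p*(1-p). The seed and sample size below are arbitrary:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for repeatability
n, p = 10, 0.5
x = np.random.binomial(n=n, p=p, size=100_000)

# Each value counts successes in n trials, so it lies in [0, n].
# Theoretical mean: n*p = 5.0; theoretical variance: n*p*(1-p) = 2.5
print(x.mean(), x.var())   # close to 5.0 and 2.5
```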
5.3.3 multinomial distribution
The multinomial distribution generalizes the binomial distribution from two possible outcomes to several. np.random.multinomial generates random numbers that follow a multinomial distribution:
x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x
The code above can be understood as rolling a fair die. n=6 is the number of rolls, and pvals gives the probability of each of the six faces (1/6 each). The result is an array of six counts, one per face, that always sum to 6.
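This sketch runs the dice example and confirms that the result holds one count per face and that the counts always sum to n. The coin example is an addition that is not in the text. It shows that multinomial with two categories gives, in each row, a binomial count for the first outcome plus the remainder for the second:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for repeatability
# Roll a fair six-sided die 6 times and count how often each face appears
counts = np.random.multinomial(n=6, pvals=[1/6] * 6)
print(counts)   # six non-negative counts that sum to 6

# Two outcomes: each row is [heads, tails] from 10 fair coin flips
coin = np.random.multinomial(n=10, pvals=[0.5, 0.5], size=1000)
```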
5.3.4 others
Besides the distributions above, NumPy also provides the Poisson, uniform, exponential, chi-square, and Pareto distributions, among others. Interested readers can look them up.
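This sketch draws a few samples from each distribution listed above using NumPy's legacy np.random functions. The parameter values (lam=3, df=2, a=2, and so on) are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for repeatability
size = 5
samples = {
    'poisson': np.random.poisson(lam=3, size=size),            # non-negative event counts
    'uniform': np.random.uniform(low=0, high=1, size=size),    # floats in [0, 1)
    'exponential': np.random.exponential(scale=1, size=size),  # non-negative waiting times
    'chisquare': np.random.chisquare(df=2, size=size),
    'pareto': np.random.pareto(a=2, size=size),                # heavy-tailed, non-negative
}
for name, values in samples.items():
    print(name, values)
```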