Pre machine learning: master common NumPy usage in 30 minutes

NumPy is a Python library for array computing. It supports a wide range of operations on multidimensional arrays and matrices.

This article is part of the Pre machine learning series.

1, Python Basics

Let's first review the basics of Python. Python has six standard data types: Number, String, List, Tuple, Set, and Dictionary.
Of these:
Immutable data: Number, String, Tuple.
Mutable data: List, Dictionary, Set.

1. List

A list is wrapped in square brackets [], and the value at each position can be changed.

list = [1, 2, 3, 4, 5, 6]

Take values according to the position, such as the value of the second position:

list[1]

Get 2.
All values from the third position to the end of the list:

list[2:]

Get [3, 4, 5, 6].

Change the value of the specified position:

list[0] = 9

The list is now [9, 2, 3, 4, 5, 6].

2. Tuple

Tuples are enclosed by parentheses (), and the value of each position is immutable. Duplicate data is allowed.

tuple = ('a', 'a', 'c', 1, 2, 3.0)

Output ('a', 'a', 'c', 1, 2, 3.0).
Take the last element:

tuple[-1]

Output 3.0.

Tuple operations are similar to list operations, but the elements of a tuple cannot be changed. Trying to do so raises a TypeError:

tuple[2] = 'caiyongji'
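If you would like to see the error without stopping your script, a minimal sketch follows. The names t and message are illustrative and not from the article; t is used so that the built-in name tuple is not shadowed.

```python
# Illustrative names: `t` stands in for the article's tuple.
t = ('a', 'a', 'c', 1, 2, 3.0)

try:
    t[2] = 'caiyongji'        # item assignment is not allowed on a tuple
    message = 'no error'
except TypeError as exc:
    message = type(exc).__name__

print(message)  # TypeError
```

The tuple itself is left untouched after the failed assignment.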

3. Set

A set is a collection of unique elements, wrapped in curly braces {}.

set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}

The output of set1 is: {'a', 'b', 'c'}. Note: the set removes duplicate elements.
The output of set2 is: {'b', 'c', 'd', 'e'}. Sets are unordered, so the elements may be displayed in a different order.

Unlike lists and tuples, sets are not subscriptable, so the following raises a TypeError:

set1[0]

Next, let's look at set operations.

Difference set of set1 and set2:

set1 - set2
#set1.difference(set2) 

Output: {'a'}.

Union of set1 and set2:

set1 | set2
#set1.union(set2) 

Output: {'a', 'b', 'c', 'd', 'e'}.

Intersection of set1 and set2:

set1 & set2
#set1.intersection(set2) 

Output: {'b', 'c'}.

Symmetric difference set of set1 and set2:

set1 ^ set2 
#(set1 - set2) | (set2 - set1)
#set1.symmetric_difference(set2)

Output: {'a', 'd', 'e'}.

The difference, union, intersection, and symmetric difference above all have corresponding set methods. You can try the commented-out methods yourself.

4. Dictionary

A dictionary is a mapping made up of key-value pairs. Since Python 3.7 it keeps its keys in insertion order. A dictionary does not allow duplicate keys, but allows duplicate values.

dict = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}

The dictionary outputs {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}. Note that when a dictionary literal contains duplicate keys, the later value overwrites the earlier one.

dict['gongzhonghao']

Output the string 'caiyongji'. We can also use the get method to get the same result.

dict.get('gongzhonghao')
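The two lookups only behave the same when the key exists. Here is a small sketch of the difference for a missing key; the name d and the key 'email' are illustrative and not from the article.

```python
# `d` is an illustrative stand-in for the article's dictionary.
d = {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}

same = d['gongzhonghao'] == d.get('gongzhonghao')   # both return 'caiyongji'

missing = d.get('email')                # no such key: get returns None
fallback = d.get('email', 'unknown')    # or the default given as 2nd argument

try:
    d['email']                          # square brackets raise instead
    raised = False
except KeyError:
    raised = True

print(same, missing, fallback, raised)  # True None unknown True
```

So get is the safer choice when a key may be absent, while square brackets make a missing key fail loudly.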

View all keys:

dict.keys()

Output dict_keys(['gongzhonghao', 'website']).

View all values:

dict.values()

Output dict_values(['caiyongji', 'blog.caiyongji.com']).
Change the value of an item:

dict['website'] = 'caiyongji.com'
dict

Output {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}.

Now that we know Python's data types, we can learn to use NumPy.

2, NumPy common usage

1. Create an array

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

The output of arr is array([1, 2, 3, 4, 5]).

We enter the following code to create a two-dimensional array:

my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
mtrx= np.array(my_matrix)

The output of mtrx is as follows:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

2. Index and slice

Indexing a one-dimensional array and a two-dimensional array works as follows:

print('arr[0]=',arr[0],'mtrx[1,1]=',mtrx[1,1])

Output arr[0]= 1 mtrx[1,1]= 5.

Slice array:

arr[:3]

The output result is array([1, 2, 3]).

Slice with negative indices, which count from the end of the array:

arr[-3:-1]

Output array([3, 4]).

Add step, which determines the slice interval:

arr[1:4:2]

Output array([2, 4]).

2D array slice:

mtrx[0:2, 0:2]

The code takes rows 1 and 2 and columns 1 and 2. Output:

array([[1, 2],
       [4, 5]])

3. dtype

NumPy's dtype uses the following codes for its data types:

  • i - integer
  • b - boolean
  • u - unsigned integer
  • f - float
  • c - complex float
  • m - timedelta
  • M - datetime
  • O - object
  • S - string
  • U - unicode string
  • V - fixed chunk of memory for other type ( void )

import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)

The output is arr1.dtype= int32 arr2.dtype= <U6. The data type of arr1 is int32 (the default integer type depends on the platform, so you may see int64 instead), and the <U6 of arr2 means a Unicode string of at most 6 characters.

We can specify the dtype type.

arr = np.array(['1', '2', '3'], dtype='f')

The output is array([1., 2., 3.], dtype=float32), where 1. means 1.0. You can see that the dtype is set to the float32 data type.
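The one-letter codes in the list above can be read back from any array through dtype.kind. The sketch below checks a few of them and also shows astype, which converts an existing array instead of setting the type at creation; the variable names are illustrative.

```python
import numpy as np

# The one-letter codes are exposed as dtype.kind.
kinds = {
    'int': np.array([1, 2, 3]).dtype.kind,
    'bool': np.array([True, False]).dtype.kind,
    'float': np.array([1.0, 2.5]).dtype.kind,
    'complex': np.array([1 + 2j]).dtype.kind,
    'str': np.array(['apple', 'banana']).dtype.kind,
}
print(kinds)

# astype returns a converted copy; the source array keeps its dtype.
as_float = np.array([1, 2, 3]).astype('f')
print(as_float.dtype)   # float32
```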

4. General methods

4.1 arange

The output of np.arange(0, 101, 2) is as follows. This command generates evenly spaced values in the interval [0, 101) with a step of 2.

array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])

4.2 zeros

The output of np.zeros((2, 5)) is as follows. This command creates a matrix (two-dimensional array) of 2 rows and 5 columns filled with 0.

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

4.3 ones

The output of np.ones((4, 4)) is as follows. This command creates a matrix of 4 rows and 4 columns filled with 1.

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

4.4 eye

The output of np.eye(5) is as follows. This command creates a square matrix whose diagonal is 1 and whose other elements are all 0. A square matrix is a matrix with the same number of rows and columns.

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

4.5 rand

The np.random.rand(5, 2) command generates an array of 5 rows and 2 columns of random numbers drawn uniformly from [0, 1).

array([[0.67227856, 0.4880784 ],
       [0.82549517, 0.03144639],
       [0.80804996, 0.56561742],
       [0.2976225 , 0.04669572],
       [0.9906274 , 0.00682573]])

If you want to get the same random numbers as this example, use the same random seed. Set it with the np.random.seed method.

np.random.seed(99)
np.random.rand(5,2)
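To see what the seed actually buys us, the sketch below draws the same array twice. The names first, second, and third are illustrative.

```python
import numpy as np

np.random.seed(99)
first = np.random.rand(5, 2)

np.random.seed(99)             # reset to the same seed
second = np.random.rand(5, 2)

third = np.random.rand(5, 2)   # no reseed: the generator has moved on

print(np.array_equal(first, second))  # True
print(np.array_equal(first, third))   # False
```

Reseeding restarts the sequence, which is why the seed has to be set right before the call you want to reproduce.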

4.6 randint

The output of np.random.randint(0, 101, (4, 5)) is as follows. This command randomly picks integers from the interval [0, 101) to generate an array of 4 rows and 5 columns.

array([[ 1, 35, 57, 40, 73],
       [82, 68, 69, 52,  1],
       [23, 35, 55, 65, 48],
       [93, 59, 87,  2, 64]])

4.7 max min argmax argmin

Let's randomly generate a set of numbers:

np.random.seed(99)
ranarr = np.random.randint(0,101,10)
ranarr

Output:

array([ 1, 35, 57, 40, 73, 82, 68, 69, 52,  1])

The maximum and minimum values are:

print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())

The output is ranarr.max()= 82 ranarr.min()= 1.
The index positions of the maximum and minimum values are:

print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())

Output: ranarr.argmax()= 5 ranarr.argmin()= 0. Note that when the maximum or minimum value occurs more than once, the index of the first occurrence is returned.
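If you need every position of the minimum rather than only the first one, one possible approach is to compare against the minimum and collect the matching indices. In this sketch the values are typed in by hand so the result does not depend on the seed, and all_min is an illustrative name.

```python
import numpy as np

# Same values as the example above, entered by hand.
ranarr = np.array([1, 35, 57, 40, 73, 82, 68, 69, 52, 1])

print(ranarr.argmin())   # 0: the value 1 also sits at index 9

# Indices of every element equal to the minimum.
all_min = np.flatnonzero(ranarr == ranarr.min())
print(all_min)           # [0 9]
```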

3, NumPy advanced usage

1. reshape

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)

Here arr is a one-dimensional array, and newarr is a two-dimensional array with 4 rows and 3 columns.

print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)

Output arr.shape= (12,) newarr.shape= (4, 3).

The output of newarr is as follows:

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
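Two details of reshape are worth trying out: one dimension can be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. A short sketch, with inferred and ok as illustrative names:

```python
import numpy as np

arr = np.arange(1, 13)          # 1..12, the same values as above

inferred = arr.reshape(-1, 3)   # -1 lets NumPy work out the row count
print(inferred.shape)           # (4, 3)

try:
    arr.reshape(5, 3)           # 5 * 3 != 12, so this cannot work
    ok = True
except ValueError:
    ok = False
print(ok)                       # False
```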

2. Merging and splitting

2.1 concatenate

One dimensional array merging:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr

Output: array([1, 2, 3, 4, 5, 6]).

2D array merge:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr

Output is:

array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

We add the parameter axis=1:

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr

Output is:

array([[1, 2, 5, 6],
       [3, 4, 7, 8]])

In an editor such as Jupyter Notebook, we can place the cursor on concatenate and press the shortcut key Shift+Tab to view the method description. It shows that concatenate joins arrays along an existing axis and that the default is axis=0, which stacks the rows of the second array below the first. When axis=1 is set, the arrays are joined along the columns, side by side.
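Comparing the shapes of the two results is a quick way to keep the axes straight. In this sketch rows and cols are illustrative names.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((arr1, arr2), axis=0)   # stacks arr2 below arr1
cols = np.concatenate((arr1, arr2), axis=1)   # places arr2 to the right of arr1

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
```

The axis you pass is the one whose length grows; the other axis must already match between the two arrays.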

2.2 array_split

Split array:

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr

The value of newarr is:

[array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]),
 array([[ 9, 10],
        [11, 12]])]
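The example above divides evenly, so it does not show why array_split is used here. The sketch below splits 7 elements into 3 parts and compares with np.split; the names parts, sizes, and split_ok are illustrative.

```python
import numpy as np

arr = np.arange(7)               # 7 elements cannot be split evenly in 3
parts = np.array_split(arr, 3)
sizes = [len(p) for p in parts]
print(sizes)                     # [3, 2, 2]

try:
    np.split(arr, 3)             # np.split insists on equal parts
    split_ok = True
except ValueError:
    split_ok = False
print(split_ok)                  # False
```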

3. Search and filter

3.1 search

NumPy can find the array index that meets the conditions through the where method.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x

Output:

(array([1, 3, 5, 7], dtype=int64),)
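The result is a tuple with one index array per dimension, and it can be used directly to fetch the matching values. where also has a three-argument form that picks between two values element by element. A short sketch, with idx and masked as illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

idx = np.where(arr % 2 == 0)   # tuple holding the matching indices
print(arr[idx])                # [2 4 6 8]

# Three-argument form: take from arr where the condition holds, else 0.
masked = np.where(arr % 2 == 0, arr, 0)
print(masked)                  # [0 2 0 4 0 6 0 8]
```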

3.2 filter

Let's look at the following code:

bool_arr = arr > 4
arr[bool_arr]

Output: array([5, 6, 7, 8]). This time we get the values in the array, not the indices.
Let's see what bool_arr actually contains.
The output of bool_arr is:

array([False, False, False, False,  True,  True,  True,  True])

So we can replace the above filter with the following command.

arr[arr > 4]
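Several conditions can be combined in one filter. With NumPy arrays this is done with & and | rather than the Python keywords and and or, and each condition needs its own parentheses. A sketch, with both and either as illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

both = arr[(arr > 2) & (arr % 2 == 0)]   # greater than 2 AND even
either = arr[(arr < 3) | (arr > 6)]      # less than 3 OR greater than 6

print(both)     # [4 6 8]
print(either)   # [1 2 7 8]
```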

4. Sorting

The np.sort function returns a sorted copy of an ndarray.

arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)

Output the sorted result: array(['apple', 'banana', 'cherry'], dtype='<U6').

For two-dimensional arrays, the sort method sorts each row separately.

arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)

Output result:

array([[2, 3, 4],
       [0, 1, 5]])
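Sorting each row is just the default. The axis argument changes the direction, and np.argsort returns the order instead of the sorted values. A sketch using the same data, with by_column and order as illustrative names:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_column = np.sort(arr, axis=0)   # sort down each column instead of each row
print(by_column)
# [[3 0 1]
#  [5 2 4]]

order = np.argsort(np.array(['banana', 'cherry', 'apple']))
print(order)                       # [2 0 1]: indices that would sort the array
```

Because np.sort returns a copy, arr itself keeps its original order.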

5. Random

5.1 random probability

What should we do if we want to complete the following requirements?

Generate a one-dimensional array of 100 values, where each value must be 3, 5, 7, or 9.
Set the probability of a value of 3 to 0.1.
Set the probability of a value of 5 to 0.3.
Set the probability of a value of 7 to 0.6.
Set the probability that the value is 9 to 0.

We solve it with the following command:

np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=100)

Output result:

array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7,
       7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3,
       5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5,
       7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7,
       7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])

5.2 random arrangement

5.2.1 permutation

Generate a new random arrangement according to the original array.

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr

The output is: array([3, 1, 5, 4, 2]). The original array arr remains unchanged.

5.2.2 shuffle

Shuffle the original array in place, so that arr itself ends up in a random order.

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr

The output is: array([3, 1, 5, 4, 2]). The original array arr changes.
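The difference between the two is easy to miss because the printed arrays look alike. The sketch below puts them side by side; untouched and result are illustrative names.

```python
import numpy as np

np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])

new_arr = np.random.permutation(arr)   # returns a shuffled copy
untouched = arr.tolist()               # arr is still in its original order

result = np.random.shuffle(arr)        # shuffles arr itself, returns None

print(untouched)              # [1, 2, 3, 4, 5]
print(result)                 # None
print(sorted(arr.tolist()))   # [1, 2, 3, 4, 5]: same values, new order
```

A common slip is writing arr = np.random.shuffle(arr), which replaces the array with None.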

5.3 random distribution

5.3.1 normal distribution

The np.random.normal method generates random numbers that follow a normal distribution. Here loc is the mean and scale is the standard deviation.

x = np.random.normal(loc=1, scale=2, size=(2, 3))
x

The output result is:

array([[ 0.14998973,  3.22564777,  1.48094109],
       [ 2.252752  , -1.64038195,  2.8590667 ]])

If we want to see the random distribution of x, we need to install seaborn to draw the image. Install using pip:

pip install -i https://pypi.tuna.tsinghua.ed... seaborn

import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; kdeplot draws the same density curve
sns.kdeplot(x.ravel())
plt.show()

5.3.2 binomial distribution

The np.random.binomial method generates random numbers that follow a binomial distribution.

x = np.random.binomial(n=10, p=0.5, size=10)
x

The output result is: array([8, 6, 6, 2, 5, 5, 5, 5, 3, 4]).

Draw image:

import matplotlib.pyplot as plt
import seaborn as sns
# distplot is deprecated since seaborn 0.11; histplot draws the histogram
sns.histplot(x)
plt.show()

5.3.3 multinomial distribution

The multinomial distribution is a generalization of the binomial distribution. The np.random.multinomial method generates random numbers that follow a multinomial distribution.

x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x

The above code can be understood as rolling a die. n=6 is the number of rolls, and pvals gives each of the six faces a probability of 1/6. The result is an array of six counts, one per face.
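A quick way to check how the result should be read: the array has one entry per face, and the entries add up to n. The seed in this sketch is only there to make the run repeatable.

```python
import numpy as np

np.random.seed(99)   # any seed works; it only fixes the draw
x = np.random.multinomial(n=6, pvals=[1/6] * 6)

print(len(x))    # 6: one count per face
print(x.sum())   # 6: the counts add up to the number of rolls
```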

5.3.4 others

In addition to the distributions above, there are the Poisson, uniform, exponential, chi-square, and Pareto distributions, among others. Interested readers can look them up on their own.


Tags: Machine Learning numpy

Posted by lanbor on Tue, 03 May 2022 23:33:06 +0300