# Machine learning prerequisites: master common NumPy usage in 30 minutes

NumPy is a Python library for array computing. It supports a wide range of operations on multidimensional arrays and matrices.

## 1. Python Basics

Let's first review the basics of Python. Python has six standard data types: Number, String, List, Tuple, Set, and Dictionary.
Of these:
Immutable data: Number, String, Tuple.
Mutable data: List, Dictionary, Set.

### 1. List

A list is wrapped in square brackets [], and the value at each position can be changed.

`a = [1, 2, 3, 4, 5, 6]`

Take a value by position, for example the value at the second position:

`a[1]`

This returns 2.

All values from the third position to the end of the list:

`a[2:]`

This returns [3, 4, 5, 6].

Change the value at a specified position:

`a[0] = 9`

The list a is now [9, 2, 3, 4, 5, 6].

### 2. Tuple

Tuples are enclosed in parentheses (), and the value at each position is immutable. Duplicate values are allowed.

`tuple = ('a', 'a', 'c', 1, 2, 3.0)`

Output: ('a', 'a', 'c', 1, 2, 3.0).

Take the last element:

`tuple[-1]`

Output 3.0.

Tuple operations are similar to list operations, but the elements of a tuple cannot be changed. The following assignment raises a TypeError:

`tuple[2] = 'caiyongji'`
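As a small sketch of the point above, the snippet below catches the error raised by assigning to a tuple element and then builds a new tuple instead. The names `point`, `error_name`, and `updated` are made up for illustration.

```python
# Assumption: `point` is an illustrative tuple with the same contents as the example above.
point = ('a', 'a', 'c', 1, 2, 3.0)

error_name = None
try:
    point[2] = 'caiyongji'          # tuples do not support item assignment
except TypeError as exc:
    error_name = type(exc).__name__
    print('Assignment failed:', exc)

# To "change" a tuple, build a new one from slices of the old one.
updated = point[:2] + ('caiyongji',) + point[3:]
print(updated)
```

The original tuple is left untouched; only the new tuple carries the replaced value.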

### 3. Set

A set is an unordered collection of unique elements, wrapped in curly braces {}.

```
set1 = {'a','b','c','a'}
set2 = {'b','c','d','e'}
```

The value of set1 is {'a', 'b', 'c'}. Note that the set removes duplicate elements.
The value of set2 is {'b', 'c', 'd', 'e'}. Because sets are unordered, the elements may be displayed in a different order.

Unlike lists and tuples, sets are not subscriptable. For example, the following raises a TypeError:

`set1[0]`

Next, let's look at set operations.

Difference of set1 and set2:

```
set1 - set2
# set1.difference(set2)
```

Output: {'a'}.

Union of set1 and set2:

```
set1 | set2
# set1.union(set2)
```

Output: {'a', 'b', 'c', 'd', 'e'}.

Intersection of set1 and set2:

```
set1 & set2
# set1.intersection(set2)
```

Output: {'b', 'c'}.

Symmetric difference of set1 and set2:

```
set1 ^ set2
# (set1 - set2) | (set2 - set1)
# set1.symmetric_difference(set2)
```

Output: {'a', 'd', 'e'}.

Difference, union, intersection, and symmetric difference all have corresponding set methods. You can try the commented-out method calls yourself.
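The following sketch checks that each operator and its method form give the same result for the two sets used above. It also illustrates one practical difference: the methods accept any iterable, while the operators require both operands to be sets. The variable `operator_failed` is introduced only for this illustration.

```python
set1 = {'a', 'b', 'c', 'a'}
set2 = {'b', 'c', 'd', 'e'}

# Each operator has a method that returns the same set.
same_difference = (set1 - set2) == set1.difference(set2)
same_union = (set1 | set2) == set1.union(set2)
same_intersection = (set1 & set2) == set1.intersection(set2)
same_symmetric = (set1 ^ set2) == set1.symmetric_difference(set2)
print(same_difference, same_union, same_intersection, same_symmetric)

# Methods accept any iterable, such as a list.
with_list = set1.union(['x', 'y'])

# Operators do not: set | list raises TypeError.
try:
    set1 | ['x', 'y']
    operator_failed = False
except TypeError:
    operator_failed = True

print(sorted(with_list), operator_failed)  # sorted() gives a stable display order
```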

### 4. Dictionary

A dictionary is a mapping made up of key-value pairs (since Python 3.7 it preserves insertion order). Keys must be unique, but values may repeat.

`dict = {'gongzhonghao':'caiyongji','website':'caiyongji.com', 'website':'blog.caiyongji.com'}`

The dictionary is {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}. Note that when a dictionary literal contains duplicate keys, the later value overwrites the earlier one.

`dict['gongzhonghao']`

This returns the string 'caiyongji'. The get method gives the same result for an existing key:

`dict.get('gongzhonghao')`
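The two access styles only differ when the key is missing. The sketch below uses a hypothetical dictionary named `info` (to avoid shadowing the built-in `dict`) and a key `'email'` that is not in the article's example.

```python
# Assumption: `info` and the 'email' key are illustrative names.
info = {'gongzhonghao': 'caiyongji', 'website': 'blog.caiyongji.com'}

# get returns None, or a supplied default, when the key is absent.
missing = info.get('email')
fallback = info.get('email', 'not set')

# Square-bracket access raises KeyError for an absent key.
try:
    info['email']
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)
```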

View all keys:

`dict.keys()`

Output dict_keys(['gongzhonghao', 'website']).

View all values:

`dict.values()`

Output dict_values(['caiyongji', 'blog.caiyongji.com']).

Change the value of an item:

```
dict['website'] = 'caiyongji.com'
dict
```

Output: {'gongzhonghao': 'caiyongji', 'website': 'caiyongji.com'}.

Now that we know Python's data types, we can learn to use NumPy.

## 2. Common NumPy usage

### 1. Create an array

```
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
```

The output of arr is array([1, 2, 3, 4, 5]).

The following code creates a two-dimensional array:

```
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
mtrx = np.array(my_matrix)
```

The output of mtrx is as follows:

```
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
```

### 2. Index and slice

Index a one-dimensional array and a two-dimensional array as follows:

`print('arr[0]=',arr[0],'mtrx[1,1]=',mtrx[1,1])`

Output arr[0]= 1 mtrx[1,1]= 5.

Slice array:

`arr[:3]`

The output result is array([1, 2, 3]).

Slice with negative indices, which count from the end:

`arr[-3:-1]`

Output array([3, 4]).

Add a step, which determines the interval between the elements taken:

`arr[1:4:2]`

Output array([2, 4]).

2D array slice:

`mtrx[0:2, 0:2]`

The code takes rows 1 and 2 and columns 1 and 2. Output:

```
array([[1, 2],
       [4, 5]])
```
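A few more slicing patterns on the same matrix may help, shown here as a minimal sketch. The names `first_row`, `second_col`, and `corner` are chosen only for this illustration.

```python
import numpy as np

mtrx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = mtrx[0]        # a whole row
second_col = mtrx[:, 1]    # a whole column (all rows, column index 1)
corner = mtrx[1:, 1:]      # bottom-right 2x2 block

print(first_row)
print(second_col)
print(corner)
```

A colon on its own means "everything along this axis", so `mtrx[:, 1]` reads as "every row, second column".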

### 3. dtype

NumPy's dtype uses the following character codes for data types:

• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
• V - fixed chunk of memory for other type ( void )

```
import numpy as np
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array(['apple', 'banana', 'cherry'])
print('arr1.dtype=',arr1.dtype,'arr2.dtype=',arr2.dtype)
```

The output is arr1.dtype= int32 arr2.dtype= <U6. The data type of arr1 is int32 (the default integer type depends on the platform and NumPy version, so you may see int64 instead), and <U6 for arr2 denotes a Unicode string of at most 6 characters.

We can specify the dtype explicitly.

`arr = np.array(['1', '2', '3'], dtype='f')`

The output is array([1., 2., 3.], dtype=float32), where 1. means 1.0. You can see that the dtype is set to the float32 data type.
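An existing array can also be converted to another dtype with the `astype` method, which returns a new array. The sketch below is an illustration with made-up values, not part of the original example.

```python
import numpy as np

arr = np.array([1.9, 2.5, -3.7])

# Converting floats to integers drops the fractional part (truncates toward zero).
as_int = arr.astype('i')

# Converting to bool maps zero to False and any non-zero value to True.
as_bool = np.array([0, 1, 2]).astype(bool)

print(as_int, as_int.dtype)
print(as_bool)
print(arr)   # the source array keeps its float values
```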

### 4. General methods

#### 4.1 arange

The output of np.arange(0, 101, 2) is as follows. This command generates evenly spaced values in the interval [0, 101) with a step of 2.

```
array([  0,   2,   4,   6,   8,  10,  12,  14,  16,  18,  20,  22,  24,
        26,  28,  30,  32,  34,  36,  38,  40,  42,  44,  46,  48,  50,
        52,  54,  56,  58,  60,  62,  64,  66,  68,  70,  72,  74,  76,
        78,  80,  82,  84,  86,  88,  90,  92,  94,  96,  98, 100])
```

#### 4.2 zeros

The output of np.zeros((2, 5)) is as follows. This command creates a matrix (two-dimensional array) of zeros with 2 rows and 5 columns.

```
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
```

#### 4.3 ones

The output of np.ones((4, 4)) is as follows. This command creates a matrix of ones with 4 rows and 4 columns.

```
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
```

#### 4.4 eye

The output of np.eye(5) is as follows. This command creates a square matrix with 1 on the diagonal and 0 everywhere else (an identity matrix). A square matrix is a matrix with the same number of rows and columns.

```
array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
```
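The matrix produced by `np.eye` behaves like the number 1 in matrix multiplication: multiplying by it leaves a matrix unchanged. The sketch below checks this on a small made-up matrix `m`.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
identity = np.eye(2)

# Matrix product with the identity returns the same values (as floats).
product = m @ identity

print(product)
print(np.array_equal(product, m))
```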

#### 4.5 rand

The np.random.rand(5, 2) command generates an array of random numbers with 5 rows and 2 columns, drawn uniformly from the interval [0, 1).

```
array([[0.67227856, 0.4880784 ],
       [0.82549517, 0.03144639],
       [0.80804996, 0.56561742],
       [0.2976225 , 0.04669572],
       [0.9906274 , 0.00682573]])
```

To reproduce the random numbers in this example, use the same random seed, which is set with the np.random.seed method.

```
np.random.seed(99)
np.random.rand(5,2)
```
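The effect of the seed can be checked directly. In this sketch, resetting the seed before each call yields identical arrays, while a call made without resetting continues the sequence and gives different numbers. The names `first`, `second`, and `third` are illustrative.

```python
import numpy as np

np.random.seed(99)
first = np.random.rand(5, 2)

np.random.seed(99)             # reset to the same seed
second = np.random.rand(5, 2)

third = np.random.rand(5, 2)   # no reset: the generator has moved on

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```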

#### 4.6 randint

The output of np.random.randint(0, 101, (4, 5)) is as follows. This command draws random integers from the interval [0, 101) to generate an array with 4 rows and 5 columns.

```
array([[ 1, 35, 57, 40, 73],
       [82, 68, 69, 52,  1],
       [23, 35, 55, 65, 48],
       [93, 59, 87,  2, 64]])
```

#### 4.7 max min argmax argmin

Let's randomly generate a set of numbers:

```
np.random.seed(99)
ranarr = np.random.randint(0,101,10)
ranarr
```

Output:

`array([ 1, 35, 57, 40, 73, 82, 68, 69, 52,  1])`

The maximum and minimum values are:

`print('ranarr.max()=',ranarr.max(),'ranarr.min()=',ranarr.min())`

The output is ranarr.max()= 82 ranarr.min()= 1.

The index positions of the maximum and minimum values are:

`print('ranarr.argmax()=',ranarr.argmax(),'ranarr.argmin()=',ranarr.argmin())`

Output: ranarr.argmax()= 5 ranarr.argmin()= 0. Note that when the maximum or minimum value occurs more than once, the index of the first occurrence is returned.
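The tie-breaking rule is easy to see on a small made-up array in which the maximum and minimum both appear twice. If every matching position is needed, one possible approach (an illustration, not from the original text) is to compare against the maximum and collect the indices with `np.flatnonzero`.

```python
import numpy as np

values = np.array([1, 35, 82, 40, 82, 1])

print(values.argmax())   # → 2, the first of the two 82s
print(values.argmin())   # → 0, the first of the two 1s

# All positions holding the maximum value.
all_max = np.flatnonzero(values == values.max())
print(all_max)           # → [2 4]
```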

### 1. reshape

```
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
```

Here arr is a one-dimensional array and newarr is a two-dimensional array with 4 rows and 3 columns.

`print('arr.shape=',arr.shape,'newarr.shape=',newarr.shape)`

Output arr.shape= (12,) newarr.shape= (4, 3).

The output of newarr is as follows:

```
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
```
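Two details of `reshape` are worth a quick sketch: one dimension may be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. The names `auto_cols`, `flat`, and `failed` are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)            # the values 1 through 12

auto_cols = arr.reshape(4, -1)    # NumPy infers 3 columns from 12 / 4
flat = auto_cols.reshape(-1)      # back to one dimension

# A shape with a different element count is rejected.
try:
    arr.reshape(5, 3)             # 15 slots cannot hold 12 elements
    failed = False
except ValueError:
    failed = True

print(auto_cols.shape, flat.shape, failed)
```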

### 2. Merging and splitting

#### 2.1 concatenate

Merging one-dimensional arrays:

```
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
arr
```

Output: array([1, 2, 3, 4, 5, 6]).

Merging two-dimensional arrays:

```
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2))
arr
```

Output is:

```
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
```

Merging the same arrays with axis=1:

```
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr = np.concatenate((arr1, arr2), axis=1)
arr
```

Output is:

```
array([[1, 2, 5, 6],
       [3, 4, 7, 8]])
```

In an environment such as Jupyter Notebook, place the cursor on concatenate and press Shift+Tab to view the method's documentation. You can see that concatenate joins arrays along an existing axis, and that the default is axis=0. With axis=1, the arrays are joined along the column direction, that is, side by side.
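Looking at the resulting shapes is a quick way to keep the two axes apart. The sketch below also compares the results with `np.vstack` and `np.hstack`, which are used here only as a cross-check and are not part of the original walkthrough.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

rows = np.concatenate((arr1, arr2), axis=0)   # stack on top of each other
cols = np.concatenate((arr1, arr2), axis=1)   # place side by side

print(rows.shape, cols.shape)   # → (4, 2) (2, 4)

# For 2D inputs these match vstack and hstack respectively.
print(np.array_equal(rows, np.vstack((arr1, arr2))))
print(np.array_equal(cols, np.hstack((arr1, arr2))))
```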

#### 2.2 array_split

Split array:

```
arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])
newarr = np.array_split(arr, 3)
newarr
```

The value of newarr is:

```
[array([[1, 2],
        [3, 4]]),
 array([[5, 6],
        [7, 8]]),
 array([[ 9, 10],
        [11, 12]])]
```
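The example above divides evenly, but `array_split` also copes with lengths that do not. In this sketch a made-up array of 7 elements is split into 3 parts, and `np.split` is shown rejecting the same request for comparison.

```python
import numpy as np

arr = np.arange(7)

# array_split allows uneven parts; the earlier parts get the extra elements.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # → [[0, 1, 2], [3, 4], [5, 6]]

# np.split requires an equal division and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError:
    split_failed = True

print(split_failed)
```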

### 3. Search and filter

#### 3.1 search

NumPy can find the array index that meets the conditions through the where method.

```
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
x = np.where(arr%2 == 0)
x
```

Output:

`(array([1, 3, 5, 7], dtype=int64),)`

#### 3.2 filter

Let's look at the following code:

```
bool_arr = arr > 4
arr[bool_arr]
```

Output: array([5, 6, 7, 8]). This time the values in the array are returned, not the indices.
Let's see what bool_arr actually contains.
The output of bool_arr is:

`array([False, False, False, False,  True,  True,  True,  True])`

So we can replace the above filter with the following command.

`arr[arr > 4]`
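Several conditions can be combined in one filter. The sketch below assumes the same `arr` as above and uses the element-wise operators `&` (and), `|` (or), and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

even_and_big = arr[(arr > 4) & (arr % 2 == 0)]   # both conditions hold
small_or_big = arr[(arr < 3) | (arr > 6)]        # either condition holds
not_big = arr[~(arr > 4)]                        # condition does not hold

print(even_and_big)   # → [6 8]
print(small_or_big)   # → [1 2 7 8]
print(not_big)        # → [1 2 3 4]
```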

### 4. Sorting

The np.sort function returns a sorted copy of an ndarray.

```
arr = np.array(['banana', 'cherry', 'apple'])
np.sort(arr)
```

The sorted result is array(['apple', 'banana', 'cherry'], dtype='<U6').

For two-dimensional arrays, np.sort sorts each row separately by default.

```
arr = np.array([[3, 2, 4], [5, 0, 1]])
np.sort(arr)
```

Output result:

```
array([[2, 3, 4],
       [0, 1, 5]])
```
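The row-wise behaviour comes from the default `axis=-1` (the last axis). As a sketch, passing `axis=0` sorts each column instead, and in both cases the input array itself is left unchanged.

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_row = np.sort(arr)            # default axis=-1: each row sorted
by_col = np.sort(arr, axis=0)    # each column sorted

print(by_row)
print(by_col)
print(arr)                       # np.sort returned copies; arr is unchanged
```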

### 5. Random

#### 5.1 random probability

What should we do if we want to complete the following requirements?

Generate a one-dimensional array of 100 values, where each value must be 3, 5, 7, or 9.
Set the probability of a value of 3 to 0.1.
Set the probability of a value of 5 to 0.3.
Set the probability of a value of 7 to 0.6.
Set the probability that the value is 9 to 0.

We solve it with the following command:

`np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))`

Output result:

```
array([7, 5, 7, 7, 7, 7, 5, 7, 5, 7, 7, 5, 5, 7, 7, 5, 3, 5, 7, 7, 7, 7,
       7, 7, 7, 7, 7, 7, 5, 3, 7, 5, 7, 5, 7, 3, 7, 7, 3, 7, 7, 7, 7, 3,
       5, 7, 7, 5, 7, 7, 5, 3, 5, 7, 7, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 5,
       7, 7, 7, 7, 7, 5, 7, 7, 7, 7, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 5, 7,
       7, 7, 7, 7, 7, 3, 5, 5, 7, 5, 7, 5])
```
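With only 100 draws the proportions are hard to judge by eye. A rough way to check them, sketched here with a larger sample and an arbitrarily chosen seed of 0, is to count how often each value appears using `np.unique`.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, chosen only so the sketch is repeatable
samples = np.random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=10000)

# Count occurrences of each distinct value and convert to relative frequencies.
values, counts = np.unique(samples, return_counts=True)
freqs = dict(zip(values.tolist(), (counts / samples.size).tolist()))

print(freqs)   # frequencies should be close to 0.1, 0.3 and 0.6; 9 never appears
```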

#### 5.2 random arrangement

##### 5.2.1 permutation

Generate a new random arrangement according to the original array.

```
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
new_arr = np.random.permutation(arr)
new_arr
```

The output is: array([3, 1, 5, 4, 2]). The original array arr remains unchanged.

##### 5.2.2 shuffle

Rearrange the original array randomly, in place. The name shuffle refers to shuffling a deck of cards.

```
np.random.seed(99)
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
arr
```

The output is: array([3, 1, 5, 4, 2]). The original array arr changes.
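The difference between the two functions can be put side by side in one sketch: `permutation` returns a new array and leaves its input alone, while `shuffle` modifies its input and returns None. The names `original`, `permuted`, `unchanged`, and `result` are illustrative.

```python
import numpy as np

np.random.seed(99)
original = np.array([1, 2, 3, 4, 5])

permuted = np.random.permutation(original)        # returns a shuffled copy
unchanged = original.tolist() == [1, 2, 3, 4, 5]  # input is still in order
print(permuted, unchanged)

result = np.random.shuffle(original)              # shuffles in place
print(result)                                     # None: nothing is returned
print(original)                                   # same elements, new order
```

Because shuffle returns None, writing `arr = np.random.shuffle(arr)` would discard the array.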

#### 5.3 random distribution

##### 5.3.1 normal distribution

The np.random.normal method generates random numbers that follow a normal distribution.

```
x = np.random.normal(loc=1, scale=2, size=(2, 3))
x
```

The output result is:

```
array([[ 0.14998973,  3.22564777,  1.48094109],
       [ 2.252752  , -1.64038195,  2.8590667 ]])
```

To visualize the distribution of x, we need to install seaborn to draw the plot. Install it using pip:

pip install -i https://pypi.tuna.tsinghua.ed... seaborn

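```
import matplotlib.pyplot as plt
import seaborn as sns
# Note: distplot is deprecated in recent seaborn releases;
# kdeplot and histplot are the suggested replacements.
sns.distplot(x, hist=False)
plt.show()
```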

##### 5.3.2 binomial distribution

The np.random.binomial method generates random numbers that follow a binomial distribution.

```
x = np.random.binomial(n=10, p=0.5, size=10)
x
```

The output result is: array([8, 6, 6, 2, 5, 5, 5, 5, 3, 4]).

Draw the plot:

```
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(x, hist=True, kde=False)
plt.show()
```

##### 5.3.3 multinomial distribution

The multinomial distribution is a generalization of the binomial distribution. The np.random.multinomial method generates random numbers that follow a multinomial distribution.

```
x = np.random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
x
```

The above code can be understood as rolling a die. n=6 is the number of rolls, and pvals gives the probability of each of the six faces, 1/6 each. The result is an array of six counts, one per face, showing how many times each face came up.
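A quick sanity check on the dice interpretation, sketched with an arbitrarily chosen seed: the six counts always add up to the number of rolls. The `size` argument, used below, repeats the whole experiment many times.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, chosen only so the sketch is repeatable

# One experiment: 6 rolls of a fair die, reported as a count per face.
counts = np.random.multinomial(n=6, pvals=[1/6] * 6)
print(counts, counts.sum())

# 1000 experiments: one row of six counts per experiment.
many = np.random.multinomial(n=6, pvals=[1/6] * 6, size=1000)
print(many.shape)
print((many.sum(axis=1) == 6).all())   # every row sums to the 6 rolls
```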

##### 5.3.4 others

In addition to the distributions above, there are the Poisson, uniform, exponential, chi-square, and Pareto distributions, among others. Interested readers can explore them on their own.