NumPy (Numerical Python) Basics

NumPy is short for Numerical Python. It is one of the most important foundational packages for numerical computing in Python.

One reason NumPy is so important is that it is designed to work efficiently on large arrays of data. There are further reasons as well:

  • NumPy stores data internally in a contiguous block of memory, unlike Python's built-in data structures. NumPy's library of algorithms is written in C and can operate on this memory without type checking or other per-element overhead. NumPy arrays also use much less memory than Python's built-in sequences.
  • NumPy can perform complex computations on entire arrays without writing Python for loops.

The following example illustrates the performance difference. Consider a NumPy array of one million integers and a Python list with the same contents:

In [3]: import numpy as np

In [4]: myArr=np.arange(1000000)

In [5]: myList=list(range(1000000))

In [6]: %time for _ in range(10):myArr2=myArr*2
Wall time: 38.9 ms

In [7]: %time for _ in range(10):myList2=[x*2 for x in myList]
Wall time: 1.52 s

In this run, the NumPy version is roughly 40 times faster than the pure-Python version.
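The memory claim made earlier can be checked in a similar way. The following is a minimal sketch rather than a rigorous benchmark: it assumes a 64-bit integer dtype for the array and approximates the list's footprint with sys.getsizeof (the list object plus one Python int object per element). Exact numbers vary by platform and Python version.

```python
import sys

import numpy as np

n = 1_000_000

# Array: one contiguous block of fixed-size 8-byte integers.
my_arr = np.arange(n, dtype=np.int64)
array_bytes = my_arr.nbytes

# List: a block of pointers plus a separate Python int object per element.
my_list = list(range(n))
list_bytes = sys.getsizeof(my_list) + sum(sys.getsizeof(x) for x in my_list)

print(f"ndarray: {array_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB (approximate)")
```

The array size is exactly the element count times the item size, while the list pays for a pointer and a full Python object per element.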

1 NumPy ndarray: multidimensional array object

One of the core features of NumPy is its N-dimensional array object, the ndarray, which is a fast and flexible container for large data sets in Python.

Arrays let you perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalars.

First, we generate a small random array:

In [8]: data=np.random.randn(2,3)

In [9]: data
Out[9]: 
array([[-0.87283691, -1.694129  , -0.4310078 ],
       [ 0.822441  ,  1.95883674,  0.66952787]])

Then we apply some mathematical operations to data:

In [10]: data*10
Out[10]: 
array([[ -8.72836909, -16.94128997,  -4.31007798],
       [  8.22440997,  19.58836741,   6.69527866]])

In [11]: data+data
Out[11]: 
array([[-1.74567382, -3.38825799, -0.8620156 ],
       [ 1.64488199,  3.91767348,  1.33905573]])

In the first mathematical operation, all elements are multiplied by 10 at the same time. In the second mathematical operation, the corresponding elements in the array are added.

An ndarray is a generic multidimensional container for homogeneous data; that is, all of its elements must be of the same type. Every array has a shape attribute, a tuple giving the size of each dimension, and a dtype attribute, an object describing the data type of the array:

In [12]: data.shape
Out[12]: (2, 3)

In [13]: data.dtype
Out[13]: dtype('float64')

In the following, when you see "array", "NumPy array" or "ndarray", they all represent the same object: the ndarray object.

Note: because NumPy focuses on numerical computing, the data type defaults to float64 (floating point) in many cases when it is not specified.
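To make the note concrete, here is a small sketch of which dtype you get when none is specified. Creation functions such as np.zeros and np.ones default to float64, whereas np.array infers the dtype from its input (the exact integer width, int32 or int64, depends on the platform).

```python
import numpy as np

# Creation functions default to float64 when dtype is omitted.
zeros_dtype = np.zeros(3).dtype
ones_dtype = np.ones(3).dtype

# np.array instead infers the dtype from the data it is given.
int_dtype = np.array([1, 2, 3]).dtype        # a platform-dependent integer type
float_dtype = np.array([1, 2.5, 3]).dtype    # a single float promotes everything

print(zeros_dtype, ones_dtype, int_dtype, float_dtype)
```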

1.1 Generate ndarray

The easiest way to create an array is to use the array function. It accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data. Converting a list is a good example:

In [14]: data1=[6,7.5,8,3,4]

In [15]: arr1=np.array(data1)

In [16]: arr1
Out[16]: array([ 6. ,  7.5,  8. ,  3. ,  4. ])

Nested sequences, such as a list of equal-length lists, are automatically converted into a multidimensional array:

In [17]: data2=[[1,2,3,4],[5,6,7,8]]

In [18]: arr2=np.array(data2)

In [19]: arr2
Out[19]: 
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

Because data2 is a list of lists, the NumPy array arr2 is two-dimensional. We can confirm this by checking the ndim and shape attributes:

In [20]: arr2.ndim
Out[20]: 2

In [21]: arr2.shape
Out[21]: (2, 4)

Unless explicitly specified, np.array automatically infers a suitable data type for the array it creates. The data type is stored in a special dtype metadata object:

In [22]: arr1.dtype
Out[22]: dtype('float64')

In [23]: arr2.dtype
Out[23]: dtype('int32')

Besides np.array, there are many other functions for creating new arrays. For example, given a length or shape:

  • zeros creates an array filled with 0s,
  • ones creates an array filled with 1s,
  • empty creates an array without initializing its values.

To create a high-dimensional array, you need to pass a tuple for the shape:

In [24]: np.zeros(10)
Out[24]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [25]: np.zeros((3,6))
Out[25]: 
array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In [26]: np.ones(10)
Out[26]: array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

In [27]: np.ones((2,3,2))
Out[27]: 
array([[[ 1.,  1.],
        [ 1.,  1.],
        [ 1.,  1.]],

       [[ 1.,  1.],
        [ 1.,  1.],
        [ 1.,  1.]]])

In [28]: np.empty((2,4))
Out[28]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

Note: it is not safe to assume that np.empty returns an array of all zeros. It allocates memory without initializing it, so it may return arbitrary garbage values.

arange is the array version of Python's built-in range function:

In [29]: np.arange(15)
Out[29]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

np.asarray converts its input to an ndarray:

In [30]: a=[1,2,3]

In [31]: arr3=np.asarray(a)

In [32]: arr3
Out[32]: array([1, 2, 3])

Both array and asarray convert sequence-like input into an ndarray. The main difference appears when the input is already an ndarray: array makes a copy by default and therefore uses new memory, whereas asarray returns the input without copying (provided no dtype conversion is needed).
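The difference between the two functions can be observed directly. This is a minimal sketch with illustrative variable names; it relies on np.shares_memory to check whether two arrays use the same underlying buffer.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)      # copies by default
aliased = np.asarray(source)   # returns the input itself when no conversion is needed

# Modifying the source shows which result shares its memory.
source[0] = 99

print(copied[0])    # unaffected by the change to source
print(aliased[0])   # reflects the change to source
print(aliased is source, np.shares_memory(copied, source))
```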

1.2 data type of ndarray

The data type, or dtype, is a special object containing the information (also known as metadata, that is, data about data) that the ndarray needs in order to interpret a chunk of memory as a particular type of data.

np.array can specify the data type through dtype:

In [35]: arr1=np.array([1,2,3],dtype=np.float64)

In [36]: arr2=np.array([1,2,3],dtype=np.int32)

In [37]: arr1.dtype
Out[37]: dtype('float64')

In [38]: arr2.dtype
Out[38]: dtype('int32')

Numeric dtypes are named in a consistent way: a type name, such as float or int, followed by a number indicating the number of bits per element. A standard double-precision floating-point value (what Python's float uses) takes up 8 bytes, or 64 bits, so this type is called float64 in NumPy. The table below lists all the data types NumPy supports.

You can use the astype method to explicitly convert the data type of the array:

In [39]: arr=np.array([1,2,3,4,5])

In [40]: arr.dtype
Out[40]: dtype('int32')

In [42]: floatArr=arr.astype(np.float64)

In [43]: floatArr.dtype
Out[43]: dtype('float64')

If you have an array of strings that represent numbers, you can also use astype to convert them to numeric form:

In [44]: numericStrings=np.array(['1.25','-9.6','42'],dtype=np.string_)

In [45]: numericStrings.astype(float)
Out[45]: array([  1.25,  -9.6 ,  42.  ])

Here I was lazy and wrote float instead of np.float64; NumPy aliases Python's built-in types to its own equivalent dtypes, so the two are interchangeable here.

You can also use another array's dtype attribute to specify the target type:

In [49]: intArray=np.arange(10)

In [50]: intArray
Out[50]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]: calibers=np.array([.22,.27,.35],dtype=np.float64)

In [52]: intArray.astype(calibers.dtype)
Out[52]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

You can also refer to a dtype by its shorthand type code string from the table:

In [53]: emptyUnit32=np.empty(8,dtype='u4')

In [54]: emptyUnit32
Out[54]: array([0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)

Note: when using astype, a new array is always generated, even if the dtype you pass in is the same as before.
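Two behaviors of astype are worth seeing side by side. The sketch below uses made-up values: it shows that casting floating-point numbers to an integer dtype discards the fractional part, and that astype returns a new array even when the target dtype equals the current one.

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])

# Casting float -> int drops the fractional part (truncates toward zero).
ints = floats.astype(np.int32)

# astype copies even when the target dtype equals the current one.
same_dtype = floats.astype(floats.dtype)

print(ints)
print(same_dtype is floats, np.shares_memory(same_dtype, floats))
```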

1.3 NumPy array arithmetic

Arrays are important because they let you express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between two equal-size arrays is applied element by element:

In [56]: arr=np.array([[1.,2.,3.],[4.,5.,6.]])

In [57]: arr
Out[57]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [58]: arr*arr
Out[58]: 
array([[  1.,   4.,   9.],
       [ 16.,  25.,  36.]])

In [59]: arr-arr
Out[59]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Arithmetic operations with a scalar propagate the scalar to each element of the array:

In [60]: 1/arr
Out[60]: 
array([[ 1.        ,  0.5       ,  0.33333333],
       [ 0.25      ,  0.2       ,  0.16666667]])

In [61]: arr**0.5
Out[61]: 
array([[ 1.        ,  1.41421356,  1.73205081],
       [ 2.        ,  2.23606798,  2.44948974]])

A comparison between arrays of the same size produces a Boolean array:

In [62]: arr2=np.array([[0.,4.,1.],[7.,2.,12.]])

In [63]: arr2
Out[63]: 
array([[  0.,   4.,   1.],
       [  7.,   2.,  12.]])

In [64]: arr2>arr
Out[64]: 
array([[False,  True, False],
       [ True, False,  True]], dtype=bool)

Operations between arrays of different sizes rely on a feature called broadcasting, which is explained in a later, more advanced chapter.
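As a brief preview only, here is a minimal sketch of the simplest broadcasting case: adding a one-dimensional array to each row of a two-dimensional array. The array values are made up for illustration, and the full rules are left to the later chapter.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)

# The one-dimensional row is applied to every row of arr.
result = arr + row

print(result.shape)
print(result)
```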

1.4 basic index and slice

NumPy array indexing offers many ways to select a subset of your data or individual elements. One-dimensional arrays are simple; on the surface they behave much like Python lists. In the example below, arr is assumed to have been created with np.arange(10):

In [66]: arr
Out[66]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [67]: arr[5]
Out[67]: 5

In [68]: arr[5:8]
Out[68]: array([5, 6, 7])

If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated to the entire selection:

In [70]: arr
Out[70]: array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Unlike with Python's built-in lists, a slice of an array is a view on the original array. This means that the data is not copied, and any modification to the view is reflected in the original array. For example:

In [72]: arrSlice=arr[5:8]

In [73]: arrSlice
Out[73]: array([12, 12, 12])

When I change arrSlice, the change will also be reflected in the original array:

In [74]: arrSlice[1]=12345

In [75]: arr
Out[75]: array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,     9])

The bare slice [:] refers to all values in the array, so assigning to it overwrites every element:

In [76]: arrSlice[:]=64

In [77]: arr
Out[77]: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

Note: if you want a copy of an array slice instead of a view, you must copy it explicitly, for example arr[5:8].copy().
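The difference between a view and an explicit copy can be verified with np.shares_memory. The following is a small self-contained sketch using the same slice bounds as the examples above.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]            # a view: shares memory with arr
copy = arr[5:8].copy()     # an independent copy

view[:] = 64               # writes through to arr
copy[:] = -1               # leaves arr untouched

print(arr)
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))
```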

With higher-dimensional arrays, you have more options. In a two-dimensional array, the element at each index is no longer a scalar but a one-dimensional array:

In [80]: arr2d=np.array([[1,2,3],[4,5,6],[7,8,9]])

In [81]: arr2d[2]
Out[81]: array([7, 8, 9])

Individual elements can therefore be accessed recursively. That takes a bit more typing, though, so you can instead pass a comma-separated list of indices to select a single element. The following two forms are equivalent:

In [82]: arr2d[0][2]
Out[82]: 3

In [83]: arr2d[0,2]
Out[83]: 3

The following figure shows the index on a two-dimensional array. We can regard axis 0 as a "row" and axis 1 as a "column":

In a multidimensional array, if you omit the later indices, the returned object is a lower-dimensional array. So in the 2 × 2 × 3 array arr3d:

In [94]: arr3d=np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])

In [95]: arr3d
Out[95]: 
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0] is a 2 × 3 array:

In [96]: arr3d[0]
Out[96]: 
array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0]:

In [97]: oldValue=arr3d[0].copy()

In [98]: arr3d[0]=42

In [99]: arr3d
Out[99]: 
array([[[42, 42, 42],
        [42, 42, 42]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Similarly, arr3d[1, 0] returns a one-dimensional array:

In [104]: arr3d[1,0]
Out[104]: array([7, 8, 9])

The above expression can be broken down into the following two steps:

In [105]: x=arr3d[1]

In [106]: x
Out[106]: 
array([[ 7,  8,  9],
       [10, 11, 12]])

In [107]: x[0]
Out[107]: array([7, 8, 9])

Note: in all of the subset selections above, the returned arrays are views.

Slice index of array:

Like one-dimensional objects such as Python lists, arrays can be sliced with the familiar syntax:

In [114]: arr
Out[114]: array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [115]: arr[1:6]
Out[115]: array([ 1,  2,  3,  4, 64])

Recall the previous two-dimensional array, arr2d. Slicing the array is slightly different:

In [116]: arr2d
Out[116]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [117]: arr2d[:2]
Out[117]: 
array([[1, 2, 3],
       [4, 5, 6]])

As you can see, the array is sliced along axis 0. The expression arr2d[:2] means "select the first two rows of arr2d".

You can pass multiple slices, just as you can pass multiple indices:

In [118]: arr2d[:2,1:]
Out[118]: 
array([[2, 3],
       [5, 6]])

Slicing like this always yields a view with the same number of dimensions as the original array. By mixing integer indices and slices, you get lower-dimensional slices:

In [119]: arr2d[1,:2]
Out[119]: array([4, 5])

Similarly, I can select the third column, but only the first two rows:

In [120]: arr2d[:2,2]
Out[120]: array([3, 6])

As shown in the figure below.

Note that a colon by itself selects the entire axis, so you can slice only along the later axes like this:

In [121]: arr2d[:,:1]
Out[121]: 
array([[1],
       [4],
       [7]])
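To summarize how the number of dimensions changes, the sketch below prints ndim and shape for several of the selections discussed above. It only restates behavior already shown; the variable names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

both_slices = arr2d[:2, 1:]   # slice, slice   -> stays two-dimensional
index_slice = arr2d[1, :2]    # integer, slice -> one dimension dropped
column_slice = arr2d[:, :1]   # a slice of width 1 keeps the axis
column_index = arr2d[:, 0]    # an integer index drops the axis

for label, sub in [
    ("arr2d[:2, 1:]", both_slices),
    ("arr2d[1, :2]", index_slice),
    ("arr2d[:, :1]", column_slice),
    ("arr2d[:, 0]", column_index),
]:
    print(f"{label:15s} ndim={sub.ndim} shape={sub.shape}")
```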

Of course, assigning to a slice expression assigns to the whole selection:

In [122]: arr2d[:2,1:]=0

In [123]: arr2d
Out[123]: 
array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

1.5 Boolean index

Consider an example in which we have some data in one array and a second array of names containing duplicates. Here I use the randn function in numpy.random to generate some normally distributed random data:

In [125]: names=np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [126]: data=np.random.randn(7,4)

In [127]: names
Out[127]: 
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype='<U4')

In [128]: data
Out[128]: 
array([[-0.89412047, -0.42046182,  0.16146748, -0.14959688],
       [-0.40675299,  0.64242108, -0.50005725, -0.82097767],
       [ 0.32188333, -0.52883977,  0.64080695, -0.05199697],
       [ 1.33759774, -0.23715279,  1.25083876,  0.67882217],
       [ 0.78827395,  0.55551406, -0.71361533,  1.05719943],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [-0.37751872, -0.39142441,  0.43449828,  0.02141508]])

Suppose that each person's name corresponds to a row in the data array, and we want to select all the rows corresponding to 'Bob'.

Like arithmetic operations, comparisons with arrays (such as ==) are also vectorized. Comparing the names array with the string 'Bob' therefore produces a Boolean array:

In [129]: names=='Bob'
Out[129]: array([ True, False, False,  True, False, False, False], dtype=bool)

When indexing an array, you can pass in a Boolean array:

In [130]: data[names=='Bob']
Out[130]: 
array([[-0.89412047, -0.42046182,  0.16146748, -0.14959688],
       [ 1.33759774, -0.23715279,  1.25083876,  0.67882217]])

The Boolean array must have the same length as the array axis it is indexing.

You can even mix and match Boolean arrays with slices or integer values (or sequences of integer values, which will be described later).

Note: older versions of NumPy did not raise an error when the Boolean array had the wrong length, while recent versions raise an IndexError. Either way, take care that the mask matches the axis it indexes.
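Because a mask of the wrong length is an easy mistake to make, one option is to check the length explicitly before indexing. The helper select_rows below is hypothetical (it is not part of NumPy) and the sample data is made up; it is only a sketch of a defensive pattern.

```python
import numpy as np


def select_rows(data, mask):
    """Hypothetical helper: select the rows of `data` where `mask` is True,
    after checking that the mask matches the length of axis 0."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != (data.shape[0],):
        raise ValueError(
            f"mask has shape {mask.shape}, expected ({data.shape[0]},)"
        )
    return data[mask]


names = np.array(['Bob', 'Joe', 'Will', 'Bob'])
data = np.arange(8).reshape(4, 2)

# Selects rows 0 and 3, the rows whose name is 'Bob'.
print(select_rows(data, names == 'Bob'))
```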

We can also select the rows where names == 'Bob' and index the columns at the same time:

In [131]: data[names=='Bob',2:]
Out[131]: 
array([[ 0.16146748, -0.14959688],
       [ 1.25083876,  0.67882217]])

In [132]: data[names=='Bob',3]
Out[132]: array([-0.14959688,  0.67882217])

To select everything except 'Bob', you can either use != or negate the condition by placing ~ in front of it:

In [139]: data[names !='Bob']
Out[139]: 
array([[-0.40675299,  0.64242108, -0.50005725, -0.82097767],
       [ 0.32188333, -0.52883977,  0.64080695, -0.05199697],
       [ 0.78827395,  0.55551406, -0.71361533,  1.05719943],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [-0.37751872, -0.39142441,  0.43449828,  0.02141508]])

In [140]: data[~(names=='Bob')]
Out[140]: 
array([[-0.40675299,  0.64242108, -0.50005725, -0.82097767],
       [ 0.32188333, -0.52883977,  0.64080695, -0.05199697],
       [ 0.78827395,  0.55551406, -0.71361533,  1.05719943],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [-0.37751872, -0.39142441,  0.43449828,  0.02141508]])

The ~ symbol can be used when you want to negate a general condition:

In [153]: cond=names=='Bob'

In [154]: data[~cond]
Out[154]: 
array([[-0.40675299,  0.64242108, -0.50005725, -0.82097767],
       [ 0.32188333, -0.52883977,  0.64080695, -0.05199697],
       [ 0.78827395,  0.55551406, -0.71361533,  1.05719943],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [-0.37751872, -0.39142441,  0.43449828,  0.02141508]])

To select two of the three names, you can combine multiple Boolean conditions using Boolean arithmetic operators such as & (and) and | (or):

In [155]: mask=(names=='Bob')|(names=='Will')

In [156]: mask
Out[156]: array([ True, False,  True,  True,  True, False, False], dtype=bool)

In [157]: data[mask]
Out[157]: 
array([[-0.89412047, -0.42046182,  0.16146748, -0.14959688],
       [ 0.32188333, -0.52883977,  0.64080695, -0.05199697],
       [ 1.33759774, -0.23715279,  1.25083876,  0.67882217],
       [ 0.78827395,  0.55551406, -0.71361533,  1.05719943]])

When selecting data using a Boolean index, a copy of the data is always generated, even if there is no change in the returned array.

Note: the Python keywords and and or do not work with Boolean arrays. Use & (and) and | (or) instead.
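The following sketch shows what goes wrong with the keyword form. The operators & and | work element by element, whereas and and or ask for a single truth value of an entire array, which NumPy refuses to provide for arrays with more than one element.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Element-wise "or": parentheses are needed because | binds tighter than ==.
mask = (names == 'Bob') | (names == 'Will')
print(mask)

# The keyword form needs one truth value for a whole array and fails.
try:
    (names == 'Bob') or (names == 'Will')
except ValueError as exc:
    print("ValueError:", exc)
```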

Setting values with Boolean arrays works in a common-sense way. To set all negative values in data to 0, we only need to assign through the mask data < 0, that is, data[data < 0] = 0. Afterwards data looks like this:

In [159]: data
Out[159]: 
array([[ 0.        ,  0.        ,  0.16146748,  0.        ],
       [ 0.        ,  0.64242108,  0.        ,  0.        ],
       [ 0.32188333,  0.        ,  0.64080695,  0.        ],
       [ 1.33759774,  0.        ,  1.25083876,  0.67882217],
       [ 0.78827395,  0.55551406,  0.        ,  1.05719943],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [ 0.        ,  0.        ,  0.43449828,  0.02141508]])

Setting whole rows with a one-dimensional Boolean array is just as easy. After the assignment data[names != 'Joe'] = 7, data looks like this:

In [162]: data
Out[162]: 
array([[ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.        ,  0.64242108,  0.        ,  0.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 7.        ,  7.        ,  7.        ,  7.        ],
       [ 0.59454886,  0.11691395,  0.14344497,  2.21820962],
       [ 0.        ,  0.        ,  0.43449828,  0.02141508]])

pandas is convenient for the above operations on two-dimensional data, which will be introduced later.
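Both kinds of Boolean assignment can be reproduced on a small deterministic example. The names and values below are made up so that the result is easy to follow by hand.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Joe'])
data = np.array([[ 1.0, -2.0],
                 [-3.0,  4.0],
                 [ 5.0, -6.0],
                 [-7.0,  8.0]])

# A two-dimensional mask selects individual elements.
data[data < 0] = 0

# A one-dimensional mask selects whole rows.
data[names != 'Joe'] = 7

print(data)
```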

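1.6 fancy indexing

Fancy indexing (rendered in some translations as "magic indexing") is the term NumPy uses for indexing with integer arrays. Suppose arr is an 8 × 4 array, created for example with np.empty((8, 4)), and we fill each row with its row number: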

In [164]: for i in range(8):
     ...:     arr[i]=i
     ...: 

In [165]: arr
Out[165]: 
array([[ 0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.]])

To select a subset of the rows in a particular order, simply pass a list or array of integers specifying the desired order:

In [166]: arr[[4,3,0,7]]
Out[166]: 
array([[ 4.,  4.,  4.,  4.],
       [ 3.,  3.,  3.,  3.],
       [ 0.,  0.,  0.,  0.],
       [ 7.,  7.,  7.,  7.]])

If a negative index is used, the selection is made from the tail:

In [167]: arr[[-3,-5,-7]]
Out[167]: 
array([[ 5.,  5.,  5.,  5.],
       [ 3.,  3.,  3.,  3.],
       [ 1.,  1.,  1.,  1.]])

Passing multiple index arrays does something slightly different: it selects a one-dimensional array of elements, one for each tuple of indices:

In [168]: arr=np.arange(32).reshape((8,4))

In [169]: arr
Out[169]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [170]: arr[[1,5,7,2],[0,3,1,2]]
Out[170]: array([ 4, 23, 29, 10])

In the above example, the elements (1, 0), (5, 3), (7, 1) and (2, 2) are selected. Regardless of how many dimensions the array has (two in this case), the result of fancy indexing with one integer array per axis is always one-dimensional.

This behavior differs from what some users expect. Usually, the result we have in mind is the rectangular region formed by selecting a subset of the matrix's rows and columns. Here is one way to get that:

In [172]: arr[[1,5,7,2]][:,[0,3,1,2]]
Out[172]: 
array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

Keep in mind that fancy indexing, unlike slicing, always copies the data into a new array.
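An alternative way to select the rectangular region described above is np.ix_, which converts the two index lists into arrays that broadcast against each other. This is offered as a sketch of an equivalent approach, not as the method used in the text; rows and cols are illustrative names.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Two-step selection: pick the rows first, then reorder the columns.
chained = arr[rows][:, cols]

# np.ix_ builds index arrays that broadcast to the full rows x cols grid.
gridded = arr[np.ix_(rows, cols)]

print(gridded)
print(np.array_equal(chained, gridded))
```

Both forms return a copy of the selected data rather than a view.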

1.7 array transposition and axis swapping

Transposing is a special form of reshaping that returns a view on the underlying data without copying anything. Arrays have a transpose method and a special T attribute:

In [173]: arr=np.arange(15).reshape((3,5))

In [174]: arr
Out[174]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [175]: arr.T
Out[175]: 
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

You will use transposition often in matrix computations. For example, to compute the inner matrix product, you use np.dot:

In [176]: arr=np.random.randn(6,3)

In [177]: arr
Out[177]: 
array([[ 0.43490809,  1.86441539, -1.02264956],
       [-0.62337491,  1.67335413,  1.49459031],
       [-0.37326413,  0.23498063, -0.45610503],
       [-1.13486383,  0.82876635,  1.35396902],
       [ 0.12299609,  0.01378183, -0.21528651],
       [-1.67351574,  0.2282682 ,  2.19521356]])

In [178]: np.dot(arr.T,arr)
Out[178]: 
array([[  4.8207663 ,  -1.64083975,  -6.44297531],
       [ -1.64083975,   7.07052466,   2.10741381],
       [ -6.44297531,   2.10741381,  10.18618708]])

For higher-dimensional arrays, the transpose method accepts a tuple of axis numbers that specifies how to permute the axes (this takes some thought to visualize):

In [190]: arr=np.arange(16).reshape((2,2,4))

In [191]: arr
Out[191]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [192]: arr.transpose((0,2,1))
Out[192]: 
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

Here, the axes have been reordered so that the first axis is unchanged, the original third axis becomes the second, and the original second axis becomes the third. In terms of shapes, the identity ordering (0, 1, 2) corresponds to the original shape (2, 2, 4), and the ordering (0, 2, 1) gives shape (2, 4, 2).

Simple transposition with .T is a special case of swapping axes. ndarray has a swapaxes method, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [193]: arr
Out[193]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [194]: arr.swapaxes(1,2)
Out[194]: 
array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])

swapaxes returns a view on the data without copying it.
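The view behavior can be confirmed with a short check. The sketch below also shows that swapaxes(1, 2) and transpose((0, 2, 1)) produce the same arrangement for a three-dimensional array.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

swapped = arr.swapaxes(1, 2)
transposed = arr.transpose((0, 2, 1))

print(swapped.shape)                          # (2, 4, 2)
print(np.array_equal(swapped, transposed))    # the same reordering of axes
print(np.shares_memory(arr, swapped))         # a view, not a copy

# Writing through the view changes the original array:
# swapped[0, 3, 1] refers to the same element as arr[0, 1, 3].
swapped[0, 3, 1] = -1
print(arr[0, 1, 3])
```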

2 universal functions: fast element-wise array functions

A universal function, or ufunc, is a function that performs element-wise operations on the data in ndarrays. You can think of ufuncs as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

Many ufuncs are simple element-wise transformations, such as sqrt or exp:

In [195]: arr=np.arange(10)

In [196]: arr
Out[196]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [197]: np.sqrt(arr)
Out[197]: 
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ,
        2.23606798,  2.44948974,  2.64575131,  2.82842712,  3.        ])

In [198]: np.exp(arr)
Out[198]: 
array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03])

These are called unary ufuncs: they take a single array and return an array as the result.

Others, such as add or maximum, take two arrays and return a single array as the result, so they are called binary ufuncs:

In [206]: x=np.random.randn(8)

In [207]: y=np.random.randn(8)

In [208]: x
Out[208]: 
array([ 1.63166293,  0.18836267,  0.36451741,  1.11050876,  1.1087304 ,
        0.44256716,  0.19144269,  0.27928437])

In [209]: y
Out[209]: 
array([ 1.16806423, -0.05249454,  0.97241914, -0.33117397, -0.06592521,
        0.27072546, -0.14238005,  0.6313865 ])

In [210]: np.maximum(x,y)
Out[210]: 
array([ 1.63166293,  0.18836267,  0.97241914,  1.11050876,  1.1087304 ,
        0.44256716,  0.19144269,  0.6313865 ])

In [211]: np.add(x,y)
Out[211]: 
array([ 2.79972716,  0.13586813,  1.33693655,  0.77933478,  1.04280519,
        0.71329262,  0.04906264,  0.91067088])

Here, numpy.maximum computes the element-wise maximum of the elements in x and y.

Some ufuncs return multiple arrays. For example, modf is loosely a vectorized counterpart of Python's built-in divmod: it returns the fractional and integral parts of a floating-point array:

In [212]: arr=np.random.randn(7)*5

In [213]: arr
Out[213]: 
array([ -1.84578642,  -0.99515375,  -2.02592448,  -2.04486449,
       -12.47831449,   3.75158473,   2.33752581])

In [214]: remainder,wholePart=np.modf(arr)

In [215]: remainder
Out[215]: 
array([-0.84578642, -0.99515375, -0.02592448, -0.04486449, -0.47831449,
        0.75158473,  0.33752581])

In [216]: wholePart
Out[216]: array([ -1.,  -0.,  -2.,  -2., -12.,   3.,   2.])
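Because random input makes the result hard to check by hand, here is the same call on a few fixed values (chosen for illustration). Both returned arrays carry the sign of the input, so adding them reproduces the original values.

```python
import numpy as np

arr = np.array([-1.5, 2.25, -0.75, 3.0])

# modf returns two arrays: the fractional parts and the integral parts.
fractional, integral = np.modf(arr)

print(fractional)
print(integral)

# Both parts keep the sign of the input, so they add back up to it.
print(np.array_equal(fractional + integral, arr))
```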

Here are some commonly used ufuncs:

  • Unary ufuncs

Function               Description
abs, fabs              Compute the absolute value element-wise for integer, floating-point, or complex values (fabs does not accept complex input)
sqrt                   Compute the square root of each element (equivalent to arr**0.5)
square                 Compute the square of each element (equivalent to arr**2)

  • Binary ufuncs

Function               Description
add                    Add corresponding elements of the arrays
subtract               Subtract the elements of the second array from those of the first
multiply               Multiply corresponding elements of the arrays
divide, floor_divide   Divide or floor-divide (discarding the remainder) the first array by the second
power                  Raise the elements of the first array to the powers given in the second array
maximum, fmax          Element-wise maximum; fmax ignores NaN
minimum, fmin          Element-wise minimum; fmin ignores NaN
mod                    Element-wise modulus (the remainder of division)
copysign               Copy the sign of the values in the second array to the values in the first
greater, greater_equal, less, less_equal, equal, not_equal
                       Element-wise comparison, returning a Boolean array (equivalent to the operators >, >=, <, <=, ==, !=)
logical_and, logical_or, logical_xor
                       Element-wise logical operation (equivalent to the operators &, |, ^ on Boolean arrays)
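A couple of the entries in the table are easier to understand with concrete values. The sketch below, using made-up inputs, contrasts maximum with fmax in the presence of NaN and shows what copysign does.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 1.0, np.nan])

propagating = np.maximum(x, y)   # NaN wins wherever either input is NaN
ignoring = np.fmax(x, y)         # NaN is ignored when the other value is a number

# copysign keeps the magnitudes of the first array and the signs of the second.
signed = np.copysign(np.array([1.0, -2.0]), np.array([-1.0, 1.0]))

print(propagating)
print(ignoring)
print(signed)
```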

3 array-oriented programming with arrays

NumPy arrays let you express many kinds of data processing tasks as concise array expressions that would otherwise require writing loops. Replacing explicit loops with array expressions is called vectorization. In general, vectorized array operations are one or two orders of magnitude faster (or more) than their pure-Python equivalents, with the biggest impact in numerical computations.

As a simple example, suppose we want to evaluate the function sqrt(x^2 + y^2) across a regular grid of values. The np.meshgrid function takes two one-dimensional arrays and produces two two-dimensional matrices corresponding to all (x, y) pairs from the two arrays:

In [5]: points=np.arange(-5,5,0.01)# 1000 equally spaced points

In [6]: xs,ys=np.meshgrid(points,points)

In [7]: ys
Out[7]: 
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

In [8]: xs
Out[8]: 
array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

Evaluating the function is now a matter of writing the same expression you would write for two scalar coordinates:

In [9]: z=np.sqrt(xs**2+ys**2)

In [10]: z
Out[10]: 
array([[ 7.07106781,  7.06400028,  7.05693985, ...,  7.04988652,
         7.05693985,  7.06400028],
       [ 7.06400028,  7.05692568,  7.04985815, ...,  7.04279774,
         7.04985815,  7.05692568],
       [ 7.05693985,  7.04985815,  7.04278354, ...,  7.03571603,
         7.04278354,  7.04985815],
       ...,
       [ 7.04988652,  7.04279774,  7.03571603, ...,  7.0286414 ,
         7.03571603,  7.04279774],
       [ 7.05693985,  7.04985815,  7.04278354, ...,  7.03571603,
         7.04278354,  7.04985815],
       [ 7.06400028,  7.05692568,  7.04985815, ...,  7.04279774,
         7.04985815,  7.05692568]])

Use matplotlib (imported here with import matplotlib.pyplot as plt) to visualize this two-dimensional array:

In [18]: plt.imshow(z,cmap=plt.cm.gray);plt.colorbar()
Out[18]: <matplotlib.colorbar.Colorbar at 0x29552e85a20>

In [19]: plt.show()
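To see more concretely what meshgrid produces, here is the same computation on a grid of only three points, small enough to read in full. The point values are chosen for illustration.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# xs repeats the points along each row; ys repeats them down each column.
print(xs)
print(ys)

# Distance of every grid point from the origin, computed in one expression.
z = np.sqrt(xs**2 + ys**2)
print(z)
```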

3.1 expressing conditional logic as array operations

The numpy.where function is a vectorized version of the ternary expression x if condition else y. Suppose we have a Boolean array and two arrays of values:

In [20]: xarr=np.array([1.1,1.2,1.3,1.4,1.5])

In [21]: yarr=np.array([2.1,2.2,2.3,2.4,2.5])

In [22]: cond=np.array([True,False,True,True,False])

Suppose we want to take a value from xarr whenever the corresponding value in cond is True, and otherwise take the value from yarr. A list comprehension doing this might look like:

In [23]: result=[(x if c else y)for x,y,c in zip(xarr,yarr,cond)]

In [24]: result
Out[24]: [1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]

This has several problems. First, it is slow for large arrays, because all the work is done in interpreted Python code. Second, it does not work with multidimensional arrays. With np.where, you can write this very concisely:

In [25]: result=np.where(cond,xarr,yarr)

In [26]: result
Out[26]: array([ 1.1,  2.2,  1.3,  1.4,  2.5])

The second and third arguments to np.where do not need to be arrays; either or both can be scalars. A typical use of where in data analysis is to produce a new array of values based on another array. Suppose you have a matrix of randomly generated data and you want to replace all positive values with 2 and all negative values with -2. This is easy to do with np.where:

In [27]: arr=np.random.randn(4,4)

In [28]: arr
Out[28]: 
array([[ 1.37269448, -0.34302535, -0.1347065 , -0.21584618],
       [-0.01130576, -1.39947124, -0.14525684,  0.25395165],
       [ 1.66207732,  0.7099269 , -0.75340511,  2.07840543],
       [-1.76960793,  0.33001967, -2.05205631, -0.23086318]])

In [29]: arr>0
Out[29]: 
array([[ True, False, False, False],
       [False, False, False,  True],
       [ True,  True, False,  True],
       [False,  True, False, False]], dtype=bool)

In [30]: np.where(arr>0,2,-2)
Out[30]: 
array([[ 2, -2, -2, -2],
       [-2, -2, -2,  2],
       [ 2,  2, -2,  2],
       [-2,  2, -2, -2]])

You can combine scalars and arrays when using np.where. For example, I can replace all positive values in arr with the constant 2, as in the following code:

In [31]: np.where(arr>0,2,arr) # Set only positive values to 2
Out[31]: 
array([[ 2.        , -0.34302535, -0.1347065 , -0.21584618],
       [-0.01130576, -1.39947124, -0.14525684,  2.        ],
       [ 2.        ,  2.        , -0.75340511,  2.        ],
       [-1.76960793,  2.        , -2.05205631, -0.23086318]])
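One possible extension, shown here only as a sketch, is to nest np.where calls to choose between three outcomes instead of two. The values array below is an assumption made up for the example:

```python
import numpy as np

# Illustrative input containing a negative value, a zero, and a positive value.
values = np.array([-2.5, 0.0, 3.0])

# Outer call handles positives; the inner call splits the rest into
# negatives (-1) and zeros (0).
signs = np.where(values > 0, 1, np.where(values < 0, -1, 0))
print(signs)
```

Deeply nested calls become hard to read, so this pattern is probably best kept to two or three branches.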

3.2 mathematical and statistical methods

A set of mathematical functions that compute statistics over an entire array, or over the data along one axis, is available as methods of the array class. You can use aggregations (often called reductions), such as sum, mean and std (standard deviation), either by calling the array instance method or by using the top-level NumPy function.

Here I generated some random numbers with normal distribution and calculated some aggregate statistics:

In [32]: arr=np.random.randn(5,4)

In [33]: arr
Out[33]: 
array([[-0.03304949,  1.93194584, -1.1230818 , -1.03654134],
       [-1.46226783, -0.45064104,  2.70765724, -2.03105991],
       [ 0.59024734,  0.79672987, -0.16060697, -0.48210045],
       [-0.13542699, -0.9491309 ,  0.45616927,  0.3331243 ],
       [-1.56575792,  1.29820459,  1.6194832 , -0.33264842]])

In [34]: arr.mean()
Out[34]: -0.0014375695653257915

In [35]: np.mean(arr)
Out[35]: -0.0014375695653257915

In [36]: arr.sum()
Out[36]: -0.028751391306515828

Functions such as mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension:

In [37]: arr.mean(axis=1)
Out[37]: array([-0.0651817 , -0.30907788,  0.18606745, -0.07381608,  0.25482036])


In [38]: arr.mean(axis=0)
Out[38]: array([-0.52125098,  0.52542167,  0.69992419, -0.70984516])

arr.mean(1) means "compute the mean across the columns" (one value per row), while arr.sum(0) means "compute the sum down the rows" (one value per column).

Other methods, such as cumsum and cumprod, do not aggregate. Instead they produce an array of the intermediate results:
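The axis convention is easier to verify on a tiny array with known values. The following sketch uses a 2 × 3 array chosen purely for illustration:

```python
import numpy as np

# 2 rows, 3 columns: [[0, 1, 2], [3, 4, 5]]
arr = np.arange(6).reshape(2, 3)

# axis=1 collapses the columns, leaving one value per row.
row_means = arr.mean(axis=1)

# axis=0 collapses the rows, leaving one value per column.
col_sums = arr.sum(axis=0)

print(row_means)  # one entry for each of the 2 rows
print(col_sums)   # one entry for each of the 3 columns
```

A handy rule of thumb is that the axis you pass is the one that disappears from the result's shape.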

In [41]: arr
Out[41]: array([0, 1, 2, 3, 4, 5, 6, 7])

In [42]: arr.cumsum()
Out[42]: array([ 0,  1,  3,  6, 10, 15, 21, 28], dtype=int32)

In multidimensional arrays, accumulation functions such as cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower-dimensional slice:

In [44]: arr=np.array([[0,1,2],[3,4,5],[6,7,8]])

In [45]: arr
Out[45]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [46]: arr.cumsum(axis=0)
Out[46]: 
array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [47]: arr.cumsum(axis=1)
Out[47]: 
array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21]], dtype=int32)
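cumprod follows the same axis rules as cumsum. As a small sketch on an array chosen for illustration (not taken from the session above):

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Running sum down each column (axis=0): the last row holds the column totals.
running_sum = arr.cumsum(axis=0)

# Running product along each row (axis=1): the last column holds the row products.
running_prod = arr.cumprod(axis=1)

print(running_sum)
print(running_prod)
```

In both cases the result has the same shape as the input, unlike sum or prod, which drop the reduced axis.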

Here are some basic array statistics methods:

Method          Description
sum             Sum of all elements in the array or along an axis; the sum of a zero-length array is 0
mean            Arithmetic mean; the mean of a zero-length array is NaN
std, var        Standard deviation and variance, with an optional degrees-of-freedom adjustment (the default denominator is n)
min, max        Minimum and maximum
argmin, argmax  Indices of the minimum and maximum elements
cumsum          Cumulative sum of elements, starting from 0
cumprod         Cumulative product of elements, starting from 1
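The degrees-of-freedom adjustment for std and var is controlled by the ddof argument. The sketch below, on a made-up four-element array, compares the default (denominator n) with ddof=1 (denominator n - 1) and also shows argmin and argmax:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# Default: divide the summed squared deviations by n.
population_var = data.var()

# ddof=1: divide by n - 1, the usual sample variance.
sample_var = data.var(ddof=1)

# argmin / argmax return positions, not values.
lowest_index = data.argmin()
highest_index = data.argmax()

print(population_var, sample_var, lowest_index, highest_index)
```

For this array the squared deviations from the mean 2.5 add up to 5, so the two variances are 5/4 and 5/3.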

3.3 methods for Boolean arrays

In the methods described above, Boolean values are coerced to 1 (True) and 0 (False). Therefore, sum is often used to count the True values in a Boolean array:

In [48]: arr=np.random.randn(100)

In [49]: (arr>0).sum() # Number of positive values
Out[49]: 48

Two additional methods, any and all, are especially useful for Boolean arrays. any tests whether at least one value in the array is True, while all checks whether every value is True:

In [50]: bools=np.array([False,False,True,False])

In [51]: bools.any()
Out[51]: True

In [52]: bools.all()
Out[52]: False

These methods also work with non-Boolean arrays, where every nonzero element is treated as True.
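A short sketch of both points, using illustrative arrays: any and all on a numeric array, and the same methods with an axis argument on a two-dimensional Boolean array:

```python
import numpy as np

# Non-Boolean input: nonzero elements count as True.
nums = np.array([0, 3, 0, 5])
has_nonzero = nums.any()   # at least one nonzero element
all_nonzero = nums.all()   # False, because zeros are present

# With an axis, the check is made per column or per row.
grid = np.array([[True, False],
                 [True, True]])
all_per_column = grid.all(axis=0)
any_per_row = grid.any(axis=1)

print(has_nonzero, all_nonzero, all_per_column, any_per_row)
```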

3.4 sorting

Like Python's built-in list type, NumPy arrays can be sorted in place with the sort method:

In [53]: arr=np.random.randn(6)

In [54]: arr
Out[54]: 
array([ 1.44818722,  0.70287678, -0.48637124, -1.70994466, -0.42102148,
        1.69289396])

In [55]: arr.sort()

In [56]: arr
Out[56]: 
array([-1.70994466, -0.48637124, -0.42102148,  0.70287678,  1.44818722,
        1.69289396])

In a multidimensional array, you can sort each one-dimensional section of values in place along an axis by passing the axis number to sort:

In [57]: arr=np.random.randn(5,3)

In [58]: arr
Out[58]: 
array([[ 0.25460328, -0.21277726,  0.59628914],
       [-0.15334087,  0.86854163,  0.98782802],
       [-0.74630985, -0.22902159, -0.08847989],
       [ 1.23971864,  0.82599879,  0.13872593],
       [ 0.38278173, -1.51187611,  0.57595778]])

In [59]: arr.sort(1)

In [60]: arr
Out[60]: 
array([[-0.21277726,  0.25460328,  0.59628914],
       [-0.15334087,  0.86854163,  0.98782802],
       [-0.74630985, -0.22902159, -0.08847989],
       [ 0.13872593,  0.82599879,  1.23971864],
       [-1.51187611,  0.38278173,  0.57595778]])

The top-level np.sort function returns a sorted copy of the array instead of modifying the original array in place:

In [62]: arr
Out[62]: 
array([[-1.29047004,  0.3232859 , -2.10851172],
       [-1.17325772,  1.25104885,  0.50311541],
       [ 0.51118219, -0.78847183,  0.24601182],
       [-0.69279496,  0.1691795 , -0.05495197],
       [ 1.13416797, -0.55640695,  0.22453922]])

In [63]: np.sort(arr,axis=1)
Out[63]: 
array([[-2.10851172, -1.29047004,  0.3232859 ],
       [-1.17325772,  0.50311541,  1.25104885],
       [-0.78847183,  0.24601182,  0.51118219],
       [-0.69279496, -0.05495197,  0.1691795 ],
       [-0.55640695,  0.22453922,  1.13416797]])

In [64]: arr
Out[64]: 
array([[-1.29047004,  0.3232859 , -2.10851172],
       [-1.17325772,  1.25104885,  0.50311541],
       [ 0.51118219, -0.78847183,  0.24601182],
       [-0.69279496,  0.1691795 , -0.05495197],
       [ 1.13416797, -0.55640695,  0.22453922]])
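The difference between the two spellings can be checked on a tiny array. This is a sketch with made-up values; the descending order obtained by reversing the sorted copy is one common idiom, not the only option:

```python
import numpy as np

arr = np.array([3.0, 1.0, 2.0])

# np.sort returns a new array and leaves arr untouched.
sorted_copy = np.sort(arr)
before_inplace = arr.copy()

# Reversing the sorted copy gives descending order.
descending = np.sort(arr)[::-1]

# The method form sorts arr itself and returns None.
arr.sort()

print(before_inplace, sorted_copy, descending, arr)
```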

3.5 unique values and other set logic

NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is np.unique, which returns the sorted unique values in an array:

In [67]: names=np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [68]: np.unique(names)
Out[68]: 
array(['Bob', 'Joe', 'Will'],
      dtype='<U4')

In [69]: ints=np.array([3,3,3,2,2,1,1,4,4])

In [70]: np.unique(ints)
Out[70]: array([1, 2, 3, 4])

Compare np.unique with the pure Python alternative:

In [71]: sorted(set(names))
Out[71]: ['Bob', 'Joe', 'Will']

Another function, np.in1d, tests whether each value in one array is present in another array and returns a Boolean array:

In [72]: values=np.array([6,0,0,3,2,5,6])

In [73]: np.in1d(values,[2,3,6])
Out[73]: array([ True, False, False,  True,  True, False,  True], dtype=bool)

Set operations on arrays:

Method             Description
unique(x)          Compute the sorted, unique elements in x
intersect1d(x, y)  Compute the sorted, common elements in x and y
union1d(x, y)      Compute the sorted union of the elements in x and y
in1d(x, y)         Compute a Boolean array indicating whether each element of x is contained in y
setdiff1d(x, y)    Set difference: elements in x that are not in y
setxor1d(x, y)     Set symmetric difference: elements that are in x or y, but not in both
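The following sketch runs each of these operations on two small arrays made up for the example. It uses np.isin for the membership test, on the assumption that you are on a reasonably recent NumPy, where np.isin is the recommended spelling of np.in1d:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 2])
y = np.array([3, 4, 5])

unique_x = np.unique(x)         # sorted unique values of x
common = np.intersect1d(x, y)   # values present in both
combined = np.union1d(x, y)     # values present in either
only_in_x = np.setdiff1d(x, y)  # values in x but not in y
in_one_only = np.setxor1d(x, y) # values in exactly one of the two

# Element-wise membership test, same shape as x.
membership = np.isin(x, y)

print(unique_x, common, combined, only_in_x, in_one_only, membership)
```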

4 file input and output with arrays

NumPy can save data to disk and load it from disk in either text or binary format. In this section, I only discuss NumPy's built-in binary format, because most users prefer pandas or other tools for loading text or tabular data.

np.save and np.load are the two workhorse functions for efficiently saving and loading array data on disk. By default, arrays are saved in an uncompressed raw binary format with the file extension .npy:

In [3]: arr=np.arange(10)

In [4]: np.save('some_array',arr)

If the file path does not already end in .npy, the extension is appended automatically. The array on disk can then be loaded with np.load:

In [5]: np.load('some_array.npy')
Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can save multiple arrays in an uncompressed archive with np.savez, passing the arrays as keyword arguments:

In [6]: np.savez('arrayArchive.npz',a=arr,b=arr)

When loading an .npz file, you get back a dictionary-like object from which you can load each array by name:

In [7]: arch=np.load('arrayArchive.npz')

In [8]: arch['b']
Out[8]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
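If the data compresses well, np.savez_compressed can be used in the same way as np.savez. The sketch below writes to a temporary directory so it leaves nothing behind; the file name and the key names a and b are arbitrary choices for the example:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any path ending in .npz would do.
    path = os.path.join(tmp_dir, "arrays_compressed.npz")

    # Same keyword-argument interface as np.savez, but compressed on disk.
    np.savez_compressed(path, a=arr, b=arr * 2)

    # The archive object lists its keys in .files and reads arrays on access.
    with np.load(path) as archive:
        names = sorted(archive.files)
        loaded_b = archive["b"]

print(names)
print(loaded_b)
```

Opening the archive in a with block closes the underlying file promptly, which matters when the file is about to be deleted.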

5 linear algebra

Linear algebra, such as matrix multiplication, decompositions, determinants and other square matrix math, is an important part of any array library. Unlike some languages such as MATLAB, NumPy treats * as the element-wise product of two arrays, not as the matrix dot product. For matrix multiplication there is therefore a function dot, available both as an array method and as a function in the numpy namespace:

In [11]: x=np.array([[1.,2.,3.],[4.,5.,6.]])

In [12]: y=np.array([[6.,23.],[-1,7],[8,9]])

In [13]: x
Out[13]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [14]: y
Out[14]: 
array([[  6.,  23.],
       [ -1.,   7.],
       [  8.,   9.]])

In [15]: x.dot(y)
Out[15]: 
array([[  28.,   64.],
       [  67.,  181.]])

x.dot(y) is equivalent to np.dot(x, y):

In [16]: np.dot(x,y)
Out[16]: 
array([[  28.,   64.],
       [  67.,  181.]])

The matrix product of a two-dimensional array and a suitably sized one-dimensional array is a one-dimensional array:

In [17]: np.dot(x,np.ones(3))
Out[17]: array([  6.,  15.])

The special symbol @ also works as an infix operator that performs matrix multiplication:

In [18]: x @ np.ones(3)
Out[18]: array([  6.,  15.])
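To make the distinction between * and matrix multiplication concrete, here is a small sketch on two 2 × 2 arrays chosen for illustration:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# Element-wise product: each entry of a times the matching entry of b.
elementwise = a * b

# Matrix product: rows of a combined with columns of b.
matrix_product = a @ b

# np.dot gives the same result as @ for 2-D arrays.
same_as_dot = np.array_equal(matrix_product, np.dot(a, b))

print(elementwise)
print(matrix_product)
print(same_as_dot)
```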

numpy.linalg has a standard set of matrix decompositions, as well as other commonly used functions such as inverse and determinant. These functions are implemented with the same industry-standard linear algebra libraries used in other languages such as MATLAB and R, for example BLAS, LAPACK, or possibly the proprietary Intel MKL (Math Kernel Library), depending on your NumPy build:

In [23]: from numpy.linalg import inv,qr

In [24]: X=np.random.randn(5,5)

In [25]: mat=X.T.dot(X)

In [26]: inv(mat)
Out[26]: 
array([[ 6.27361681,  0.21413277, -2.10784927,  0.88789016,  1.16687286],
       [ 0.21413277,  0.81349612,  0.23828606, -0.13675758,  0.19600023],
       [-2.10784927,  0.23828606,  1.10000762, -0.40129199, -0.45456528],
       [ 0.88789016, -0.13675758, -0.40129199,  0.2900041 ,  0.14127489],
       [ 1.16687286,  0.19600023, -0.45456528,  0.14127489,  0.47877011]])

In [27]: mat.dot(inv(mat))
Out[27]: 
array([[  1.00000000e+00,  -4.48048335e-18,  -6.88373257e-17,
          1.46296096e-17,   1.80181577e-16],
       [  6.09657689e-16,   1.00000000e+00,  -1.47375802e-16,
          1.81774693e-16,  -7.92478134e-17],
       [ -6.34304347e-16,   3.02540851e-17,   1.00000000e+00,
         -1.23856362e-16,   9.12309641e-17],
       [  7.54243991e-16,  -1.05561747e-16,   1.87814766e-16,
          1.00000000e+00,  -8.45483750e-17],
       [ -8.00317186e-17,  -4.00074463e-16,  -2.06059262e-16,
          4.38296871e-17,   1.00000000e+00]])

In [28]: q,r=qr(mat)

In [29]: r
Out[29]: 
array([[-1.64620018,  2.99450089, -3.66763049,  3.86291548, -1.62371872],
       [ 0.        , -2.93808377,  3.51353498,  1.06839142,  5.8801916 ],
       [ 0.        ,  0.        , -3.76016356, -6.83849909, -1.66644908],
       [ 0.        ,  0.        ,  0.        , -2.49132422,  1.74553809],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.73406139]])

Function  Description
diag      Return the diagonal (or off-diagonal) elements of a square matrix as a one-dimensional array, or convert a one-dimensional array into a square matrix with zeros off the diagonal
dot       Matrix multiplication
trace     Compute the sum of the diagonal elements
det       Compute the matrix determinant
eig       Compute the eigenvalues and eigenvectors of a square matrix
inv       Compute the inverse of a square matrix
pinv      Compute the Moore-Penrose pseudo-inverse of a matrix
qr        Compute the QR decomposition
svd       Compute the singular value decomposition (SVD)
solve     Solve the linear system Ax = b for x, where A is a square matrix
lstsq     Compute the least-squares solution to Ax = b
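As a sketch of how a few of these functions fit together, the following solves a small 2 × 2 system whose answer is easy to verify by hand. The matrix A and vector b are made up for the example:

```python
import numpy as np
from numpy.linalg import det, inv, lstsq, solve

# System: 3x + y = 9 and x + 2y = 8, whose solution is x = 2, y = 3.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = solve(A, b)            # direct solution of Ax = b
determinant = det(A)       # 3*2 - 1*1 = 5
identity_check = inv(A) @ A

# lstsq also handles non-square systems; for this square, full-rank A
# it should agree with solve. rcond=None selects the current default cutoff.
x_lstsq = lstsq(A, b, rcond=None)[0]

print(x, determinant)
print(identity_check)
print(x_lstsq)
```

Calling solve is generally preferred to computing inv(A) and multiplying, since it avoids forming the inverse explicitly.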

6 pseudo random number generation

The numpy.random module supplements Python's built-in random module with functions that efficiently generate whole arrays of sample values from many kinds of probability distributions. For example, you can use normal to get a 4 × 4 array of samples from the standard normal distribution:

In [30]: samples=np.random.normal(size=[4,4])

In [31]: samples
Out[31]: 
array([[ 0.31532483, -0.1755875 ,  0.04120945,  0.30689172],
       [ 0.34147306,  0.53492713, -2.76257611, -0.64546541],
       [ 2.44510901,  0.35182538, -2.24432639, -0.12661826],
       [ 0.64133959,  1.66535095,  0.83798443, -0.52965891]])

By contrast, Python's built-in random module samples only one value at a time. As the benchmark below shows, numpy.random is well over an order of magnitude faster at generating very large samples:

In [32]: from random import normalvariate

In [33]: N=1000000

In [34]: %timeit samples=[normalvariate(0,1) for _ in range(N)]
1.5 s ± 106 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [35]: %timeit np.random.normal(size=N)
51.3 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

We call these pseudo-random numbers because they are generated by an algorithm with deterministic behavior, based on the seed of the random number generator. You can change NumPy's random seed with np.random.seed:

In [36]: np.random.seed(1234)

The data generation functions in numpy.random share a global random seed. To avoid global state, you can use numpy.random.RandomState to create a random number generator that is isolated from the others:

In [37]: rng=np.random.RandomState(1234)

In [38]: rng.randn(10)
Out[38]: 
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])
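A quick way to see this isolation is to create two generators with the same seed and reseed the global state in between. This is only a sketch; the seed values are arbitrary:

```python
import numpy as np

# Two independent generators created with the same seed.
rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

draws_a = rng_a.randn(5)

# Reseeding and using the global generator does not affect rng_b.
np.random.seed(0)
np.random.randn(3)

draws_b = rng_b.randn(5)

# Same seed, same sequence, regardless of the global state.
print(np.array_equal(draws_a, draws_b))
```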

Some of the functions in numpy.random are listed below:

Function     Description
seed         Seed the random number generator
permutation  Return a random permutation of a sequence, or return a permuted range
shuffle      Randomly permute a sequence in place
rand         Draw samples from a uniform distribution
randint      Draw random integers from a given low-to-high range
randn        Draw samples from a normal distribution with mean 0 and variance 1 (MATLAB-like interface)
binomial     Draw samples from a binomial distribution
normal       Draw samples from a normal (Gaussian) distribution
beta         Draw samples from a beta distribution
chisquare    Draw samples from a chi-square distribution
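The difference between permutation and shuffle is that the first returns a new array while the second reorders its argument in place. A minimal sketch, using an arbitrary seed and a private RandomState so the global state is not touched:

```python
import numpy as np

rng = np.random.RandomState(42)  # arbitrary seed for the example

original = np.arange(5)

# permutation returns a reordered copy; the input keeps its order.
permuted = rng.permutation(original)
input_unchanged = np.array_equal(original, np.arange(5))

# shuffle reorders the array it is given and returns None.
shuffled = original.copy()
rng.shuffle(shuffled)

# randint draws integers with low included and high excluded.
ints = rng.randint(0, 10, size=4)

print(permuted, input_unchanged, shuffled, ints)
```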

Reference: Python for Data Analysis (2nd edition), author: Wes McKinney, translator: Xu Jingyi, press: China Machine Press

Posted by gmartin1215 on Mon, 02 May 2022 18:42:10 +0300