Click jump
Pandas series catalog
1. Group by method splits data
 This method provides the splitting function in the grouping aggregation step, which can group the data according to the index or field
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs)

Special description of by parameter
 If a function is passed in, the index is calculated and grouped
 If a dictionary or Series is passed in, the value of the dictionary or Series is used as the grouping basis
 If a Numpy array is passed in, the elements of the data are used as the basis for grouping
 If the string or string list is passed in, the fields represented by these strings are used as the grouping basis

Descriptive statistical methods commonly used by group by objects

The group by method can't be used to directly view the results after being grouped in memory

Descriptive statistical methods commonly used by group by objects

import pandas as pd import numpy as np detail = pd.read_excel('...') detailGroup = detail[['order_id', 'counts', 'amounts']].groupby(by='order_id') print('The order details after grouping are as follows:', detailGroup) # Print address print(detailGroup.mean().head()) print(detailGroup.std().head()) print(detailGroup.size().head())
2. agg method aggregates data

agg and aggregate function parameters and their descriptions

Both agg and aggregate methods support the application of their functions to each group, including Python builtin functions or custom functions. At the same time, these two methods can also directly apply functions to DataFrame

During normal use, the functions of agg function and aggregate function are almost the same when operating on DataFrame object, so you only need to master one of them

DataFrame.agg(func, axis=0, *args, **kwargs)

DataFrame.aggregate(func, axis=0, *args, **kwargs)



Finding statistics by agg method

The agg method can be used to calculate the sum and average of the sales and selling prices of all dishes in the current data at one time

detail[['counts', 'amounts']].agg([np.sum, np.mean])


For one field, you want to do only the mean operation, while for another field, you want to do only the sum operation. You can use the dictionary to take the names of the two fields as key s, and then take the sum and mean functions of numpy library as value s respectively

detail.agg({'counts': np.sum, 'amounts': np.mean})


In some cases, you also want to find multiple statistics for a field, while for some fields, you only need to find one statistics. At this time, you only need to change the value of the key corresponding to the dictionary into a list, and the list element is the statistics of multiple targets

detail.agg({'counts': np.sum, 'amounts': [np.mean, np.sum]})



agg method and custom function
 You can pass in custom functions in the agg method
# Double the sum of custom functions def DoubleSum(data): s = data.sum() * 2 return s print('The sum of twice the sales volume of dishes in the dish order details table is:', '\n', detail.agg({'counts': DoubleSum}, axis=0))
 When using custom functions, you should pay attention to the function NP in the numpy library mean，np.median，np.prod，np.sum，np.std，np.var can be directly used in agg, but using these functions in numpy Library in userdefined functions will not get the desired results if it is a single sequence during calculation. If multiple columns of data are calculated at the same time, this problem will not occur. Therefore, try not to use NP functions in custom functions
# Double the sum of custom functions def DoubleSum1(data): s = np.sum(data) * 2 return s print('The sum of double sales of dishes in the order details table is:\n', detail.agg({'counts': DoubleSum1}, axis=0).head()) # Abnormal results print('Twice the sum of the sales volume and selling price of dishes in the order details table is:\n', detail[['counts', 'amounts']].agg(DoubleSum1)) # Normal results
 Using the agg method, you can use the same function for each field and group
 The agg method can operate on the groupby object
 If you need to apply different functions to different fields, you can use the same agg method as in DataFrame
print(detailGroup.agg(np.mean).head(3)) print(detailGroup.agg(np.std).head(3)) print(detailGroup.agg({'counts': np.sum, 'amounts': np.mean}).head(3))
3. Aggregate data with the apply method
 The apply method is similar to the agg method and can apply functions to each column. The difference is that the function passed in by the apply method can only act on the whole DataFrame or Series, and cannot apply different functions to different fields and obtain different results like agg
 Using the apply method to aggregate groupby objects is the same as the agg method, except that the agg method can apply different functions to different fields, but apply cannot
DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
print(detail[['counts', 'amounts']].apply(np.mean)) print(detailGroup.apply(np.mean).head(3)) print(detailGroup.apply(np.std).head(3))
4. The transform method aggregates data

The transform method can operate on all elements of the entire DataFrame. And the transform method has only one parameter func, which represents the function of DataFrame operation
print(detail[['counts', "amounts"]].transform(lambda x: x * 2).head(4))

At the same time, the transform method can also operate the object groupby after DataFrame grouping, and can realize the standardization of deviation within the group
print(detailGroup.transform(lambda x: (x.mean()  x.min()) / (x.max()  x.min())).head())

If NaN is included in the result when calculating the dispersion standardization, this is because according to the dispersion standardization formula, when the maximum value and the minimum value are the same, the denominator is 0, and the number with denominator 0 is expressed as NaN in Python