Machine learning: grid search parameters with sklearn

grid search

Grid Search (GridSearchCV). There are two ways to select hyperparameters: 1. based on experience; 2. try candidate values of different sizes, fit the model with each, and keep the values that perform best. When selecting hyperparameters by the second route, adjusting every value by hand costs more attention than it is worth, and nested for loops (or similar approaches) are verbose, inflexible, attention-hungry, and error-prone. GridSearchCV, grid search with cross-validation, traverses every combination of the supplied parameter values and returns the cross-validated evaluation score for each combination.

GridSearchCV sounds great, but it is essentially a brute-force search. It is useful on small datasets, but not well suited to large ones.
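To see why it is brute force: the number of candidates is the product of the number of values per parameter, and each candidate is fitted once per cross-validation fold. A small sketch using sklearn's ParameterGrid (with the same illustrative grid as the example below):

```python
from sklearn.model_selection import ParameterGrid

# Candidate counts multiply across parameters
param_grid = {'n_estimators': [20, 50, 100], 'max_depth': [1, 2, 3]}
n_candidates = len(ParameterGrid(param_grid))  # 3 * 3 = 9 combinations
n_fits = n_candidates * 3                      # with cv=3, each candidate is fitted 3 times
print(n_candidates, n_fits)                    # → 9 27
```

This matches the log line "Fitting 3 folds for each of 9 candidates, totalling 27 fits" in the output below; every extra parameter multiplies the cost.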

from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import load_iris  # sklearn's built-in sample dataset


iris = load_iris()

X = iris.data  # 150 samples, 4 attributes
y = iris.target # 150 class labels
# Take random forest as an example to introduce the basic calling method

# Exhaustive grid search
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # Split the data
# Split data: 80% training, 20% validation
train_data, test_data, train_target, test_target = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier()
parameters = {'n_estimators': [20, 50, 100], 'max_depth': [1, 2, 3]}

clf = GridSearchCV(model, parameters, cv=3, verbose=2)
clf.fit(train_data, train_target)

print("optimal parameters:")
print(clf.best_params_)
print("best score:")
print(clf.best_score_)
sorted(clf.cv_results_.keys())

score_test = roc_auc_score(test_target, clf.predict_proba(test_data), multi_class='ovr')

print("RandomForestClassifier GridSearchCV test AUC:   ", score_test)
D:\anaconda\python.exe C:/Users/Administrator/Desktop/data mining project/Code package test set/Grid search tuning.py
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[CV] max_depth=1, n_estimators=20 ....................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[CV] ..................... max_depth=1, n_estimators=20, total=   0.0s
[CV] max_depth=1, n_estimators=20 ....................................
[CV] ..................... max_depth=1, n_estimators=20, total=   0.0s
[CV] max_depth=1, n_estimators=20 ....................................
[CV] ..................... max_depth=1, n_estimators=20, total=   0.0s
[CV] max_depth=1, n_estimators=50 ....................................
[CV] ..................... max_depth=1, n_estimators=50, total=   0.1s
[CV] max_depth=1, n_estimators=50 ....................................
[CV] ..................... max_depth=1, n_estimators=50, total=   0.1s
[CV] max_depth=1, n_estimators=50 ....................................
[CV] ..................... max_depth=1, n_estimators=50, total=   0.1s
[CV] max_depth=1, n_estimators=100 ...................................
[CV] .................... max_depth=1, n_estimators=100, total=   0.1s
[CV] max_depth=1, n_estimators=100 ...................................
[CV] .................... max_depth=1, n_estimators=100, total=   0.1s
[CV] max_depth=1, n_estimators=100 ...................................
[CV] .................... max_depth=1, n_estimators=100, total=   0.1s
[CV] max_depth=2, n_estimators=20 ....................................
[CV] ..................... max_depth=2, n_estimators=20, total=   0.0s
[CV] max_depth=2, n_estimators=20 ....................................
[CV] ..................... max_depth=2, n_estimators=20, total=   0.0s
[CV] max_depth=2, n_estimators=20 ....................................
[CV] ..................... max_depth=2, n_estimators=20, total=   0.0s
[CV] max_depth=2, n_estimators=50 ....................................
[CV] ..................... max_depth=2, n_estimators=50, total=   0.1s
[CV] max_depth=2, n_estimators=50 ....................................
[CV] ..................... max_depth=2, n_estimators=50, total=   0.1s
[CV] max_depth=2, n_estimators=50 ....................................
[CV] ..................... max_depth=2, n_estimators=50, total=   0.1s
[CV] max_depth=2, n_estimators=100 ...................................
[CV] .................... max_depth=2, n_estimators=100, total=   0.1s
[CV] max_depth=2, n_estimators=100 ...................................
[CV] .................... max_depth=2, n_estimators=100, total=   0.1s
[CV] max_depth=2, n_estimators=100 ...................................
[CV] .................... max_depth=2, n_estimators=100, total=   0.1s
[CV] max_depth=3, n_estimators=20 ....................................
[CV] ..................... max_depth=3, n_estimators=20, total=   0.0s
[CV] max_depth=3, n_estimators=20 ....................................
[CV] ..................... max_depth=3, n_estimators=20, total=   0.0s
[CV] max_depth=3, n_estimators=20 ....................................
[CV] ..................... max_depth=3, n_estimators=20, total=   0.0s
[CV] max_depth=3, n_estimators=50 ....................................
[CV] ..................... max_depth=3, n_estimators=50, total=   0.1s
[CV] max_depth=3, n_estimators=50 ....................................
[CV] ..................... max_depth=3, n_estimators=50, total=   0.1s
[CV] max_depth=3, n_estimators=50 ....................................
[CV] ..................... max_depth=3, n_estimators=50, total=   0.1s
[CV] max_depth=3, n_estimators=100 ...................................
[CV] .................... max_depth=3, n_estimators=100, total=   0.1s
[CV] max_depth=3, n_estimators=100 ...................................
[CV] .................... max_depth=3, n_estimators=100, total=   0.1s
[CV] max_depth=3, n_estimators=100 ...................................
[CV] .................... max_depth=3, n_estimators=100, total=   0.1s
[Parallel(n_jobs=1)]: Done  27 out of  27 | elapsed:    1.7s finished
 optimal parameters:
{'max_depth': 2, 'n_estimators': 50}
best score:
0.9583333333333334
RandomForestClassifier GridSearchCV test AUC:    1.0

Process finished with exit code 0
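One detail the example above relies on implicitly: with refit=True (the default), GridSearchCV refits the best parameter combination on the whole training set, so the fitted search object can be used directly as a model, which is why `clf.predict_proba(test_data)` works. A minimal sketch, assuming the same iris split as above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
train_data, test_data, train_target, test_target = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = GridSearchCV(RandomForestClassifier(random_state=0),
                   {'n_estimators': [20, 50], 'max_depth': [2, 3]}, cv=3)
clf.fit(train_data, train_target)

# With refit=True (the default), the search object delegates to the
# best estimator refitted on the full training set:
print(clf.best_estimator_)           # the refitted RandomForestClassifier
pred = clf.predict(test_data)        # same as clf.best_estimator_.predict(test_data)
print((pred == test_target).mean())  # test accuracy of the best model
```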

random search

When searching for hyperparameters, if there are only a few of them (three or four at most), grid search, an exhaustive method, works well.

However, when the number of hyperparameters is larger, grid search's running time grows exponentially with each parameter added. Random search was therefore proposed: it samples dozens or hundreds of points at random from the hyperparameter space, and some of them are likely to land near the optimum. This is faster than searching a sparse grid, and experiments have shown that random search gives slightly better results than the sparse grid method.

RandomizedSearchCV is used in much the same way as GridSearchCV, but instead of trying every possible combination it evaluates a fixed number of combinations, sampling a random value for each hyperparameter at each iteration. This has two advantages: compared with the full parameter space, only a relatively small number of combinations need to be evaluated, yet if the search is allowed to run it still explores different values of every hyperparameter; and the computational budget is easy to control by setting the number of iterations, so adding parameter candidates neither hurts performance nor reduces efficiency. Its usage is otherwise consistent with GridSearchCV; it simply replaces the exhaustive grid with random sampling of the parameter space. For continuous parameters, RandomizedSearchCV can sample from a distribution, which grid search cannot do, and its search power depends on the n_iter setting.
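The example below still passes fixed lists, so to illustrate the point about continuous parameters, here is a hedged sketch that passes scipy.stats distributions instead (the parameter choices here are illustrative, not tuned):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions instead of fixed lists: each iteration draws a fresh value,
# so the search is not restricted to a predefined grid of points
param_dist = {
    'n_estimators': randint(10, 100),   # integers sampled from [10, 100)
    'max_features': uniform(0.1, 0.9),  # floats sampled from [0.1, 1.0)
}

clf = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_dist,
                         n_iter=5, cv=3, random_state=0)
clf.fit(X, y)
print(clf.best_params_)
```

With n_iter=5, exactly 5 combinations are evaluated no matter how many parameters or candidate values are added, which is what makes the cost controllable.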

from sklearn.metrics import roc_auc_score
import warnings
warnings.filterwarnings("ignore")
from sklearn.datasets import load_iris  # sklearn's built-in sample dataset


iris = load_iris()

X = iris.data  # 150 samples, 4 attributes
y = iris.target # 150 class labels
# Take random forest as an example to introduce the basic calling method




# Stochastic parameter optimization

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split  # Split the data
# Split data: 80% training, 20% validation
train_data, test_data, train_target, test_target = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier()
parameters = {'n_estimators': [10, 20, 30, 50], 'max_depth': [1, 2, 3]}

clf = RandomizedSearchCV(model, parameters, cv=3, verbose=2)
clf.fit(train_data, train_target)

score_test = roc_auc_score(test_target, clf.predict_proba(test_data), multi_class='ovr')

print("RandomForestClassifier RandomizedSearchCV test AUC:   ", score_test)
print("optimal parameters:")
print(clf.best_params_)
sorted(clf.cv_results_.keys())
D:\anaconda\python.exe C:/Users/Administrator/Desktop/data mining project/Code package test set/Random parameter optimization parameter adjustment.py
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] n_estimators=10, max_depth=3 ....................................
[CV] ..................... n_estimators=10, max_depth=3, total=   0.0s
[CV] n_estimators=10, max_depth=3 ....................................
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[CV] ..................... n_estimators=10, max_depth=3, total=   0.0s
[CV] n_estimators=10, max_depth=3 ....................................
[CV] ..................... n_estimators=10, max_depth=3, total=   0.0s
[CV] n_estimators=50, max_depth=2 ....................................
[CV] ..................... n_estimators=50, max_depth=2, total=   0.1s
[CV] n_estimators=50, max_depth=2 ....................................
[CV] ..................... n_estimators=50, max_depth=2, total=   0.1s
[CV] n_estimators=50, max_depth=2 ....................................
[CV] ..................... n_estimators=50, max_depth=2, total=   0.1s
[CV] n_estimators=20, max_depth=1 ....................................
[CV] ..................... n_estimators=20, max_depth=1, total=   0.0s
[CV] n_estimators=20, max_depth=1 ....................................
[CV] ..................... n_estimators=20, max_depth=1, total=   0.0s
[CV] n_estimators=20, max_depth=1 ....................................
[CV] ..................... n_estimators=20, max_depth=1, total=   0.0s
[CV] n_estimators=30, max_depth=3 ....................................
[CV] ..................... n_estimators=30, max_depth=3, total=   0.0s
[CV] n_estimators=30, max_depth=3 ....................................
[CV] ..................... n_estimators=30, max_depth=3, total=   0.0s
[CV] n_estimators=30, max_depth=3 ....................................
[CV] ..................... n_estimators=30, max_depth=3, total=   0.0s
[CV] n_estimators=10, max_depth=2 ....................................
[CV] ..................... n_estimators=10, max_depth=2, total=   0.0s
[CV] n_estimators=10, max_depth=2 ....................................
[CV] ..................... n_estimators=10, max_depth=2, total=   0.0s
[CV] n_estimators=10, max_depth=2 ....................................
[CV] ..................... n_estimators=10, max_depth=2, total=   0.0s
[CV] n_estimators=20, max_depth=2 ....................................
[CV] ..................... n_estimators=20, max_depth=2, total=   0.0s
[CV] n_estimators=20, max_depth=2 ....................................
[CV] ..................... n_estimators=20, max_depth=2, total=   0.0s
[CV] n_estimators=20, max_depth=2 ....................................
[CV] ..................... n_estimators=20, max_depth=2, total=   0.0s
[CV] n_estimators=50, max_depth=3 ....................................
[CV] ..................... n_estimators=50, max_depth=3, total=   0.1s
[CV] n_estimators=50, max_depth=3 ....................................
[CV] ..................... n_estimators=50, max_depth=3, total=   0.1s
[CV] n_estimators=50, max_depth=3 ....................................
[CV] ..................... n_estimators=50, max_depth=3, total=   0.1s
[CV] n_estimators=30, max_depth=1 ....................................
[CV] ..................... n_estimators=30, max_depth=1, total=   0.0s
[CV] n_estimators=30, max_depth=1 ....................................
[CV] ..................... n_estimators=30, max_depth=1, total=   0.0s
[CV] n_estimators=30, max_depth=1 ....................................
[CV] ..................... n_estimators=30, max_depth=1, total=   0.0s
[CV] n_estimators=10, max_depth=1 ....................................
[CV] ..................... n_estimators=10, max_depth=1, total=   0.0s
[CV] n_estimators=10, max_depth=1 ....................................
[CV] ..................... n_estimators=10, max_depth=1, total=   0.0s
[CV] n_estimators=10, max_depth=1 ....................................
[CV] ..................... n_estimators=10, max_depth=1, total=   0.0s
[CV] n_estimators=50, max_depth=1 ....................................
[CV] ..................... n_estimators=50, max_depth=1, total=   0.1s
[CV] n_estimators=50, max_depth=1 ....................................
[CV] ..................... n_estimators=50, max_depth=1, total=   0.1s
[CV] n_estimators=50, max_depth=1 ....................................
[CV] ..................... n_estimators=50, max_depth=1, total=   0.1s
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    0.9s finished
RandomForestClassifier RandomizedSearchCV test AUC:    1.0
 optimal parameters:
{'n_estimators': 30, 'max_depth': 3}

Process finished with exit code 0
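Both examples call `sorted(clf.cv_results_.keys())` but never inspect the result. cv_results_ holds the per-combination scores behind best_params_; a hedged sketch (assuming pandas is available) that loads it into a DataFrame for inspection:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

clf = GridSearchCV(RandomForestClassifier(random_state=0),
                   {'n_estimators': [20, 50], 'max_depth': [2, 3]}, cv=3)
clf.fit(X, y)

# cv_results_ is a dict of arrays, one entry per parameter combination
results = pd.DataFrame(clf.cv_results_)
print(results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']])
```

This shows not just the winner but how close the other combinations came, which helps judge whether the grid should be widened or refined.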


 

Tags: Python Machine Learning sklearn

Posted by AudiS2 on Wed, 05 Oct 2022 02:18:52 +0300