[Machine Learning] Kaggle Competition Killer: Model Fusion


More code: Gitee homepage: https://gitee.com/GZHzzz
Blog homepage: CSDN: https://blog.csdn.net/gzhzzaa

0 Foreword

  • This article does not dive into the principles of each algorithm in depth; the goal is to understand these model fusion methods from a macro perspective

1 Voting

Let's start with the simplest method, Voting, arguably the most intuitive form of model fusion. Suppose we have a binary classification problem and 3 base models; with voting, the class that receives the most votes becomes the final prediction 🤔
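
As a minimal sketch (assuming scikit-learn; the three base models and the toy data are illustrative assumptions), hard voting can look like this:

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy binary-classification data
X, y = make_classification(n_samples=200, random_state=0)

# Hard voting: the class with the most votes among the 3 base models wins
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('knn', KNeighborsClassifier()),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='hard')
vote.fit(X, y)
print(vote.predict(X[:5]))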

2 Averaging

For regression problems, a simple and direct idea is to average the predictions. A slightly better approach is a weighted average, where the weights can be determined by ranking. For example, suppose three base models A, B, and C are ranked 1st, 2nd, and 3rd by performance; the weights assigned to them would then be 3/6, 2/6, and 1/6, respectively
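
A minimal sketch of this rank-based weighted average; the three prediction arrays are hypothetical:

import numpy as np

# Hypothetical predictions from models A, B, C, ranked 1st, 2nd, 3rd by performance
pred_a = np.array([3.1, 2.9, 4.2])
pred_b = np.array([3.4, 2.7, 4.0])
pred_c = np.array([2.8, 3.2, 4.5])

# Rank-based weights 3/6, 2/6, 1/6: a better rank gets a larger weight
weights = np.array([3, 2, 1]) / 6
blend = weights[0] * pred_a + weights[1] * pred_b + weights[2] * pred_c
print(blend)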

These two methods look simple, but the more advanced algorithms that follow can be said to build on them: both Bagging and Boosting are ways of combining many weak classifiers into a strong classifier 😁

3 Bagging

Bagging builds sub-models on bootstrap samples (sampling with replacement) and trains each sub-model on its own sample; this process is repeated many times and the results are fused at the end. It can be roughly divided into two steps:

  • Repeat K times: draw a bootstrap sample (with replacement) and train a sub-model on it

  • Model fusion, classification problems: voting; regression problems: averaging
    Random forest is a typical Bagging-based algorithm, with decision trees as the base classifiers (a minimal sketch follows below)
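
A minimal Bagging sketch with scikit-learn, assuming toy data; decision trees serve as the base classifiers:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# K = 10 bootstrap samples (with replacement), one decision tree per sample;
# for classification the sub-models are fused by majority vote
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        random_state=0).fit(X, y)

# Random forest: Bagging over decision trees, plus random feature subsets
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)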

4 Boosting

Bagging can be trained in parallel, whereas Boosting is an iterative idea: each training round pays more attention to the examples misclassified so far by giving them larger weights, so that the next iteration focuses on recognizing the examples the previous round got wrong. Finally, the weak classifiers are combined by a weighted sum


Likewise, AdaBoost, GBDT, and others are built on the Boosting idea
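
A minimal Boosting sketch, again assuming scikit-learn and toy data:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# AdaBoost: each round gives larger weights to the examples misclassified
# so far, and the weak classifiers are combined by a weighted sum
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# GBDT: each round fits the residual error of the ensemble built so far
gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)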

5 Stacking

  • Stacking is an ensemble idea, and many ensemble algorithms are variants of it. To be precise, stacking fuses models by learning: in contrast to weighted or average fusion, which combine the predictions of several models with a fixed rule or formula, stacking feeds the predictions of the models to be fused into another learning model. This fusing model is called the meta-learner, and the individual models are the primary learners
  • Stacking is essentially this direct idea, but applying it naively does not work. The problem lies in how the primary learners' training-set predictions are obtained: using a model trained on the entire training set to predict the labels of that same training set overfits very badly. So the question becomes how to obtain primary-learner predictions while keeping overfitting under control, and that leads to a familiar tool: K-fold cross-validation

Steps of stacking fusion:

step1: Train T primary learners with cross-validation on the Train Set (the second-stage data for the meta-learner is output by the primary learners, so if the primary learners generalize poorly, the meta-learner will also overfit); a sketch of this step appears after this list
step2: The out-of-fold predictions of the T primary learners on the Train Set become the meta-learner's training data D; with T primary learners, D has T features, and the labels of D are the same labels used to train the primary learners
step3: The predictions of the T primary learners on the Test Set become the test set for the meta-learner; again, T models give T features
step4: Train the meta-learner; the labels of its training set D are the same as those used to train the primary learners

  • In fact, when training the second-layer meta-learner, the idea of cross-validation can be used again to improve the meta-learner's predictive power on the test set
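
Before the fusion code below, here is a sketch of step1: producing the inputs oof_i (out-of-fold predictions on the Train Set) and predictions_i (fold-averaged predictions on the Test Set) for one primary learner. The helper name get_oof, the random-forest model, and the numpy-array data names are assumptions, not from the original:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

def get_oof(model, X_train, y_train, X_test, n_splits=5):
    # Out-of-fold predictions of one primary learner on the Train Set (step2)
    oof = np.zeros((X_train.shape[0], 1))
    # Fold-averaged predictions of the same learner on the Test Set (step3)
    pred = np.zeros((X_test.shape[0], 1))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=2020)
    for trn_idx, val_idx in folds.split(X_train):
        model.fit(X_train[trn_idx], y_train[trn_idx])
        # Predict only the held-out fold, so the learner never sees its own labels
        oof[val_idx, 0] = model.predict(X_train[val_idx])
        pred[:, 0] += model.predict(X_test) / n_splits
    return oof, pred

# e.g. oof_1, predictions_1 = get_oof(RandomForestRegressor(random_state=0),
#                                     X_train, y_train, X_test)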

show me code, no bb

The following is the code for stacking fusion; it can be saved as a .py file and called directly 😎

import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import RepeatedKFold
from sklearn.metrics import mean_squared_error, log_loss


def stack_model(oof_1, oof_2, oof_3, predictions_1, predictions_2, predictions_3, y, eval_type='regression'):

    # Part 1. Data preparation
    # Stack the out-of-fold predictions column-wise: each primary learner
    # contributes one feature column (each oof_i is expected to have shape (n_train, 1))
    # train_stack is the training data of the meta-learner
    train_stack = np.hstack([oof_1, oof_2, oof_3])
    # Likewise stack the test-set predictions column-wise
    # test_stack is the test data of the meta-learner
    test_stack = np.hstack([predictions_1, predictions_2, predictions_3])
    # All-zero array with as many rows as the training set, to hold the
    # meta-learner's out-of-fold predictions
    oof = np.zeros(train_stack.shape[0])
    # All-zero array with as many rows as the test set, to accumulate test predictions
    predictions = np.zeros(test_stack.shape[0])
    
    # Part 2. Repeated cross-validation (the second stage of stacking):
    # 5 folds repeated twice; Bayesian ridge regression is trained on each
    # fold's training part and evaluated on the held-out part (the fusion step)
    folds = RepeatedKFold(n_splits=5, n_repeats=2, random_state=2020)
    
    # fold_ is the number of folds, trn_idx is the training set index for each fold, and val_idx is the validation set index for each fold
    for fold_, (trn_idx, val_idx) in enumerate(folds.split(train_stack, y)):
        # print fold information
        print("fold n°{}".format(fold_+1))
        # Samples and labels of this fold's training part
        trn_data, trn_y = train_stack[trn_idx], y[trn_idx]
        # Samples and labels of this fold's validation part
        val_data, val_y = train_stack[val_idx], y[val_idx]
        # Prompt to start training
        print("-" * 10 + "Stacking " + str(fold_+1) + "-" * 10)
        # Bayesian ridge regression is the meta-learner that fuses the results
        clf = BayesianRidge()
        # train on training data
        clf.fit(trn_data, trn_y)
        # Predict the validation part and record the results at the corresponding
        # positions of oof (used below to compute the evaluation metric)
        oof[val_idx] = clf.predict(val_data)
        # Predict the test set; each round contributes 1/(5*2) of the final
        # prediction (n_splits=5 times n_repeats=2 gives 10 rounds in total)
        predictions += clf.predict(test_stack) / (5 * 2)
        
    if eval_type == 'regression':
        print('rmse: ', np.sqrt(mean_squared_error(y, oof)))
    if eval_type == 'binary':
        print('logloss: ', log_loss(y, oof))
    
    # Return the meta-learner's out-of-fold predictions and test-set predictions
    return oof, predictions
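
A hypothetical call, assuming the oof_* and predictions_* arrays were produced for three primary learners (e.g. with a helper like get_oof above) and each has a single feature column:

# Hypothetical usage: oof_* have shape (n_train, 1), predictions_* have
# shape (n_test, 1), and y_train holds the training labels
oof, predictions = stack_model(oof_1, oof_2, oof_3,
                               predictions_1, predictions_2, predictions_3,
                               y_train, eval_type='regression')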

Written at the end

Ten years to sharpen a sword; may we encourage each other!
More code: Gitee homepage: https://gitee.com/GZHzzz
Blog homepage: CSDN: https://blog.csdn.net/gzhzzaa

  • Fighting!😎

Classic models based on PyTorch: typical agent models based on PyTorch
Classic reinforcement learning papers

while True:
	Go life

Thanks for the likes and exchanges! (❁´◡`❁)

