Li Mu's Hands-On Deep Learning v2 - NiN model and code implementation

1. NiN

LeNet, AlexNet, and VGG all share a common design pattern: extract spatial features with a series of convolutional and pooling layers, then process the resulting representation with fully connected layers. AlexNet and VGG improve on LeNet mainly by widening and deepening these two modules. The NiN model instead removes the fully connected module entirely, preserves the spatial structure of the features, and applies 1×1 convolutional layers, which act as fully connected layers over the channels at each pixel.

1.1 NiN block

Recall that the inputs and outputs of a convolutional layer are four-dimensional tensors whose axes correspond to the sample, channel, height, and width, while the inputs and outputs of a fully connected layer are usually two-dimensional tensors corresponding to the sample and the features. The idea of NiN is to apply a fully connected layer at every pixel location (that is, at each height and width position). If the weights are shared across spatial locations, this is exactly a 1×1 convolutional layer, or equivalently a fully connected layer acting independently at each pixel. From this perspective, each pixel in the spatial dimensions is treated as a single sample and the channel dimension as its features.
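As a quick illustration (my own sketch, not from the original post; the tensor sizes are arbitrary), the following check shows that a 1×1 convolution and a per-pixel fully connected layer with the same weights produce identical outputs:

import torch
from torch import nn

x = torch.randn(2, 8, 5, 5)                        # (batch, channels, height, width)

conv1x1 = nn.Conv2d(8, 16, kernel_size=1)          # 8 -> 16 channels, 1x1 window
fc = nn.Linear(8, 16)
fc.weight.data = conv1x1.weight.data.view(16, 8)   # reuse the convolution's weights and bias
fc.bias.data = conv1x1.bias.data

y_conv = conv1x1(x)
# Treat every pixel as a sample: move channels last, apply the linear layer, move channels back.
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(y_conv, y_fc, atol=1e-6))     # True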
The figure below shows the main architectural differences between VGG and NiN and between their blocks. A NiN block starts with an ordinary convolutional layer followed by two 1×1 convolutional layers, each followed by a ReLU activation. The window shape of the first convolutional layer is a hyperparameter set by the user; the window shapes of the subsequent convolutions are fixed at 1×1.

The implementation code of the NiN block:

import torch
from torch import nn

# The arguments are the hyperparameters of the first (ordinary) convolutional layer;
# the two 1x1 convolutions that follow keep the same number of output channels.
def NiN_blocks(in_channels, out_channels, kernel_size, padding, stride):
    return nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                  kernel_size=kernel_size, padding=padding, stride=stride),
        nn.ReLU(),
        # 1x1 convolutions: fully connected layers applied per pixel across channels
        nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1),
        nn.ReLU(),
        nn.Conv2d(in_channels=out_channels, out_channels=out_channels, kernel_size=1),
        nn.ReLU())
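
A quick sanity check (my own addition, using the NiN_blocks definition above): only the first convolution in a block changes the spatial size; the two 1×1 convolutions leave height and width unchanged.

blk = NiN_blocks(in_channels=1, out_channels=96, kernel_size=11, padding=0, stride=4)
X = torch.randn(1, 1, 224, 224)
print(blk(X).shape)    # torch.Size([1, 96, 54, 54]); (224 - 11) // 4 + 1 = 54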

1.2 NiN model

The original NiN network was proposed shortly after AlexNet and apparently took some inspiration from it. NiN uses convolutional layers with window shapes of 11×11, 5×5, and 3×3, and the number of output channels is the same as in AlexNet. There is a max pooling layer after each NiN block with a pooling window shape of 3×3 and a stride of 2.
A notable difference between NiN and AlexNet is that NiN eliminates the fully connected layers entirely. Instead, the last NiN block has as many output channels as there are label classes, and a global average pooling layer then produces the logits. One advantage of this design is that it significantly reduces the number of model parameters (convolutional layers need far fewer parameters than fully connected layers because the kernel weights are shared across positions). In practice, however, this design sometimes increases training time.
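To make the parameter saving concrete (a rough sketch of my own, not from the original post), compare the final NiN block plus global average pooling with an AlexNet-style fully connected head; the 6400-dimensional input assumed for the FC head is a flattened 256×5×5 feature map:

gap_head = nn.Sequential(NiN_blocks(in_channels=384, out_channels=10, kernel_size=3, padding=1, stride=1),
                         nn.AdaptiveAvgPool2d((1, 1)),
                         nn.Flatten())
fc_head = nn.Sequential(nn.Flatten(),
                        nn.Linear(6400, 4096), nn.ReLU(),
                        nn.Linear(4096, 4096), nn.ReLU(),
                        nn.Linear(4096, 10))
print(sum(p.numel() for p in gap_head.parameters()))   # about 35 thousand parameters
print(sum(p.numel() for p in fc_head.parameters()))    # about 43 million parameters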
The NiN model code is as follows:

def NiN():
    net = nn.Sequential(NiN_blocks(in_channels=1,out_channels=96,kernel_size=11,padding=0,stride=4),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        NiN_blocks(in_channels=96,out_channels=256,kernel_size=5,padding=2,stride=1),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        NiN_blocks(in_channels=256,out_channels=384,kernel_size=3,padding=1,stride=1),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        nn.Dropout(0.5),
                        # The number of label categories is 10
                        NiN_blocks(in_channels=384,out_channels=10,kernel_size=3,padding=1,stride=1),
                        nn.AdaptiveAvgPool2d((1,1)),
                        # Convert the 4D output to a 2D output with shape (batch size, 10)
                        nn.Flatten())
    return net
NiNNet = NiN()
X = torch.randn(size=(1,1,224,224))
for layer in NiNNet:
    X = layer(X)
    print(layer.__class__.__name__,"output shape : \t",X.shape)
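
If the padding and stride values above are kept, the printed shapes (one line per top-level layer) should come out as:

Sequential          torch.Size([1, 96, 54, 54])
MaxPool2d           torch.Size([1, 96, 26, 26])
Sequential          torch.Size([1, 256, 26, 26])
MaxPool2d           torch.Size([1, 256, 12, 12])
Sequential          torch.Size([1, 384, 12, 12])
MaxPool2d           torch.Size([1, 384, 5, 5])
Dropout             torch.Size([1, 384, 5, 5])
Sequential          torch.Size([1, 10, 5, 5])
AdaptiveAvgPool2d   torch.Size([1, 10, 1, 1])
Flatten             torch.Size([1, 10])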

1.3 Summary

  1. NiN uses a block consisting of one convolutional layer and multiple 1×1 convolutional layers.
  2. NiN removes the fully connected layers, which are prone to overfitting, and replaces them with a global average pooling layer (that is, averaging over all spatial positions within each channel); see the small check sketched after this list. The number of channels fed into this pooling layer is the desired number of outputs (for example, 10 for Fashion-MNIST).
  3. Removing the fully connected layers reduces overfitting and at the same time significantly reduces the number of parameters in NiN.
  4. The design of NiN influenced the design of many subsequent convolutional neural networks.
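
A minimal sketch of point 2 (my own addition): global average pooling over a (batch, channels, height, width) tensor is just the mean over the two spatial dimensions.

gap = nn.AdaptiveAvgPool2d((1, 1))
feat = torch.randn(2, 10, 5, 5)    # e.g. the output of the last NiN block
print(torch.allclose(gap(feat).flatten(1), feat.mean(dim=(2, 3))))    # True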

2. NiN model training - full code (learning rate lr = 0.1, number of training epochs = 10, batch_size = 128):

import d2l.torch
import torch
from torch import nn

def NiN_blocks(in_channels,out_channels,kernel_size,padding,stride):
    return nn.Sequential(nn.Conv2d(in_channels=in_channels,out_channels=out_channels,kernel_size=kernel_size,padding=padding,stride=stride),
                         nn.ReLU(),
                         nn.Conv2d(in_channels=out_channels,out_channels=out_channels,kernel_size=1),
                         nn.ReLU(),
                         nn.Conv2d(in_channels=out_channels,out_channels=out_channels,kernel_size=1),
                         nn.ReLU())

def NiN():
    net = nn.Sequential(NiN_blocks(in_channels=1,out_channels=96,kernel_size=11,padding=0,stride=4),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        NiN_blocks(in_channels=96,out_channels=256,kernel_size=5,padding=2,stride=1),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        NiN_blocks(in_channels=256,out_channels=384,kernel_size=3,padding=1,stride=1),
                        nn.MaxPool2d(kernel_size=3,stride=2),
                        nn.Dropout(0.5),
                        # The number of label categories is 10
                        NiN_blocks(in_channels=384,out_channels=10,kernel_size=3,padding=1,stride=1),
                        nn.AdaptiveAvgPool2d((1,1)),
                        # Convert the 4D output to a 2D output with shape (batch size, 10)
                        nn.Flatten())
    return net
NiNNet = NiN()
X = torch.randn(size=(1,1,224,224))
for layer in NiNNet:
    X = layer(X)
    print(layer.__class__.__name__,"output shape : \t",X.shape)

#model training
lr,num_epochs,batch_size = 0.1,10,128
train_iter,test_iter = d2l.torch.load_data_fashion_mnist(batch_size=batch_size,resize=224)
d2l.torch.train_ch6(NiNNet,train_iter,test_iter,num_epochs,lr,device=d2l.torch.try_gpu())
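
d2l.torch.train_ch6 trains the network with SGD and cross-entropy loss on the GPU if one is available and reports training and test accuracy per epoch. If the d2l package is not installed, a minimal plain-PyTorch loop along the same lines might look like this (a sketch under those assumptions, not the original code):

def train_simple(net, train_iter, test_iter, num_epochs, lr, device):
    # Rough stand-in for d2l.torch.train_ch6 (assumed behavior): Xavier init,
    # SGD + cross-entropy, with test accuracy evaluated after every epoch.
    def init_weights(m):
        if type(m) in (nn.Linear, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        net.train()
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            l = loss(net(X), y)
            l.backward()
            optimizer.step()
        net.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for X, y in test_iter:
                X, y = X.to(device), y.to(device)
                correct += (net(X).argmax(dim=1) == y).sum().item()
                total += y.numel()
        print(f'epoch {epoch + 1}, test acc {correct / total:.3f}')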

The training and test output of the NiN model is shown in the figure below:
