Image recognition on the CIFAR10 dataset with the ResNet18 residual network model, reaching 90% accuracy (detailed notes attached)

Author: Yuandong Li
Date: 2020/11/17
Welcome to my GitHub: Li-Y-D
The source code of this experiment and the Jupyter Notebook version of the code: Image-classification-CIFAR10-ResNet18

I Experiment requirements:

Use Jupyter Notebook + PyTorch to complete the training of a network. Requirements:

  1. Use a network model of your own choosing
  2. Use the CIFAR10 dataset
  3. Reach a final classification accuracy of at least 0.9

II Experiment preparation:

Hardware conditions:
Discrete graphics card: NVIDIA GeForce 940MX × 1
Graphics memory: 2048 MB
Software conditions:
Language environment: Python 3.7.2
Experimental tool: Jupyter Notebook
Deep learning framework: PyTorch

III Experimental process:

Dataset: CIFAR10
Network model: ResNet18 (modified)
The source code and the model are explained through the inline comments below.

1. Import relevant modules

The Dive-into-DL-PyTorch-master files (which provide the d2lzh_pytorch utility package imported below) can be downloaded from the companion code of the book Dive into Deep Learning (PyTorch edition).

import os
import time
import numpy as np
import torch
from torch import nn, optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import sys
sys.path.append("D:\Curriculum and teaching materials\be based on FPGA Design of hardware accelerated deep learning system\Dive-into-DL-PyTorch-master\code")
import d2lzh_pytorch as d2l
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(torch.__version__)
print(torchvision.__version__)
print(device)

Output is:
1.7.0
0.8.1
cuda

That is: torch version 1.7.0, torchvision version 0.8.1, computing device cuda.
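As an optional sanity check, the graphics card model and memory listed in the experiment preparation section can be confirmed from PyTorch itself. A minimal sketch, assuming a CUDA device is visible:

#Optional sanity check of the GPU described in the preparation section
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)  #expected to report the GeForce 940MX
    print(props.total_memory // (1024 ** 2), 'MB')  #expected to report about 2048 MB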

2. Obtain and preprocess the dataset (CIFAR10)

#Define the image preprocessing transforms
transform_train = transforms.Compose([
	transforms.RandomCrop(32, padding=4), #Pad 4 pixels on each side, then crop a random 32x32 patch
	transforms.RandomHorizontalFlip(), #Flip the image horizontally, with the default probability of 0.5
	transforms.ToTensor(), #Convert the PIL Image or ndarray to a tensor scaled to [0, 1]
	transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)), #Normalize the tensor image with the given per-channel mean and standard deviation;
	#(M1,..., Mn) and (S1,..., Sn) standardize each channel of the input
])
transform_test = transforms.Compose([ #The test set is also normalized (but not augmented)
	transforms.ToTensor(),
	transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

Preprocessing and augmenting the images in this way improves the model's accuracy and convergence speed to a certain extent.
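The mean (0.4914, 0.4822, 0.4465) and standard deviation (0.2023, 0.1994, 0.2010) passed to Normalize are per-channel statistics of the CIFAR10 training set. A minimal sketch of how such statistics can be recomputed (this loads all 50,000 images into memory, roughly 600 MB; note that slightly different std values circulate depending on how they are aggregated):

#Illustrative sketch: recompute the per-channel statistics of the CIFAR10 training set
raw_train = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=True,
                                         download=True, transform=transforms.ToTensor())
data = torch.stack([img for img, _ in raw_train])  #shape [50000, 3, 32, 32]
print(data.mean(dim=(0, 2, 3)))  #approximately (0.4914, 0.4822, 0.4465)
print(data.std(dim=(0, 2, 3)))   #approximately (0.247, 0.243, 0.262); the (0.2023, ...) values above stem from a different aggregation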

#Build the cifar10_train and cifar10_test datasets, specifying the storage directory, train or test split, whether to download, and the preprocessing transform to apply
cifar10_train = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=True, download=True, transform=transform_train)
cifar10_test = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=False, download=True, transform=transform_test)

Output is:
Files already downloaded and verified
Files already downloaded and verified

The CIFAR10 dataset used here had already been downloaded locally, so it does not need to be fetched online and can be read directly. CIFAR10 dataset download:
Link: Baidu cloud disk
Extraction code: q0k8

#Display the data set type and size
print(type(cifar10_train))
print(len(cifar10_train), len(cifar10_test))

Output is:
<class 'torchvision.datasets.cifar.CIFAR10'>
50000 10000
#Inspect one sample to see its image shape, data type and corresponding label
feature, label = cifar10_train[3]
print(feature.shape, feature.dtype)
print(label)

Output is:
torch.Size([3, 32, 32]) torch.float32
4
#This function maps numeric labels to their string labels, for readability
def get_CIFAR10_labels(labels):
	text_labels = ['airplane', 'automobile', 'bird', 'cat', 'deer',
					'dog', 'frog', 'horse', 'ship', 'truck']
	return [text_labels[int(i)] for i in labels]
print(get_CIFAR10_labels([0,1,2,3,4,5,6,7,8,9]))

Output is:
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
#This function displays images together with their corresponding labels
def show_cifar10(images, labels):
	d2l.use_svg_display() #Render the images as scalable vector graphics
	_, figs = plt.subplots(1, len(images), figsize=(13, 13)) #One row of len(images) subplots on a 13x13 canvas; the underscore discards the figure handle we do not use
	for f, img, lbl in zip(figs, images, labels): #zip pairs up the subplots, images and labels element by element
		img = torchvision.utils.make_grid(img).numpy() #make_grid tiles multiple images into one; here img is a single image
		f.imshow(np.transpose(img, (1, 2, 0))) #Rearrange the dimensions from [C,H,W] to [H,W,C] for display
		f.set_title(lbl) #Set the label as the subplot title
		f.axes.get_xaxis().set_visible(False) #Hide the x-axis ticks
		f.axes.get_yaxis().set_visible(False) #Hide the y-axis ticks
	plt.show() #Draw the figure

3. Define ResNet18 model

To meet the required accuracy of 0.9, this experiment first tried LeNet and VGG-16 as the network model, but their best test-set accuracies only reached 0.68 and 0.85 respectively. The more advanced residual networks from Kaiming He's team at MSRA were then adopted; weighing the constraints of the hardware, the comparatively shallow 18-layer ResNet-18 was chosen as the network model for training, and its test-set accuracy just reached 0.9.
More complex and advanced network structures can reach higher test-set accuracy, but because of the performance bottleneck of the graphics card they take a very long time to train, or even fail with out-of-memory errors, so no further experiments were run.
Since the original ResNet was designed for ImageNet-sized images, I made some changes to adapt it to the CIFAR10 dataset.
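For comparison only (the experiment below defines its own ResNet instead), the same CIFAR10-oriented changes — a 3x3 stride-1 stem instead of the ImageNet 7x7 stride-2 stem, and no initial max-pooling — can also be applied to torchvision's built-in resnet18. A hedged sketch; net_tv is an illustrative name:

#Alternative sketch: adapt torchvision's built-in resnet18 to 32x32 CIFAR10 inputs
import torchvision.models as models
net_tv = models.resnet18(num_classes=10)  #10 output classes for CIFAR10
net_tv.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)  #3x3 stem keeps the 32x32 resolution
net_tv.maxpool = nn.Identity()  #drop the initial max-pooling, which would halve small images too early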

#Define a residual basic block class with a two-convolution path and a shortcut
class BasicBlock(nn.Module):
	expansion = 1
	def __init__(self, in_planes, planes, stride=1): #in_planes is the number of input channels, planes the number of output channels; the stride defaults to 1
		super(BasicBlock, self).__init__()
		#First convolution: by default the image size is unchanged, but a stride other than 1 shrinks it, and the channel count may change
		self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
		#First batch normalization
		self.bn1 = nn.BatchNorm2d(planes)
		#Second convolution: the image size and the channel count stay unchanged
		self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
		#Second batch normalization
		self.bn2 = nn.BatchNorm2d(planes)
		#Shortcut: if the two convolutions change the output (the size changes because the stride is not 1, or the channel count changes),
		#the shortcut applies a 1x1 convolution with the same stride, and expansion adjusts its channel count,
		#so that the shortcut output matches the convolution-path output and the two can be added
		self.shortcut = nn.Sequential()
		if stride != 1 or in_planes != self.expansion*planes:
			self.shortcut = nn.Sequential(
				nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
				nn.BatchNorm2d(self.expansion*planes)
			)
	#Forward propagation: the input image is x, the output is out
	def forward(self, x):
		out = F.relu(self.bn1(self.conv1(x))) #First convolution and first batch normalization, activated with ReLU
		out = self.bn2(self.conv2(out)) #Second convolution and second batch normalization
		out += self.shortcut(x) #Add the shortcut output to the convolution-path output
		out = F.relu(out) #Activate the sum with ReLU
		return out
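A quick illustrative shape check confirms the behavior of the block: with stride 2 the spatial size halves, and the 1x1 shortcut convolution matches the new channel count (blk and x are illustrative names):

#Illustrative shape check of a BasicBlock
blk = BasicBlock(64, 128, stride=2)
x = torch.randn(1, 64, 32, 32)
print(blk(x).shape)  #torch.Size([1, 128, 16, 16])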
#Define the residual network ResNet18
class ResNet(nn.Module):
	#Initialization; the parameters are the residual block class and the number of residual blocks per layer, with a default of 10 classes
	def __init__(self, block, num_blocks, num_classes=10):
		super(ResNet, self).__init__()
		#Number of input channels of the first layer
		self.in_planes = 64
		#Initial convolution and batch normalization of the input image: keep the image size, change the channel count from 3 to 64
		self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
		self.bn1 = nn.BatchNorm2d(64)
		#First layer: 64 channels, num_blocks[0] residual blocks; the first convolution of its first residual block uses stride 1
		self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
		#Second layer: 128 channels, num_blocks[1] residual blocks; the first convolution of its first residual block uses stride 2
		self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
		#Third layer: 256 channels, num_blocks[2] residual blocks; the first convolution of its first residual block uses stride 2
		self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
		#Fourth layer: 512 channels, num_blocks[3] residual blocks; the first convolution of its first residual block uses stride 2
		self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
		#Fully connected layer: 512 * block.expansion input neurons, num_classes output neurons
		self.linear = nn.Linear(512*block.expansion, num_classes)
	#Layer construction helper; within one layer the channel count is the same. Parameters: residual block class, channel count, number of residual blocks, stride
	def _make_layer(self, block, planes, num_blocks, stride):
		#The first element of the strides list is the stride of the first residual block's first convolution; the remaining blocks use stride 1
		strides = [stride] + [1]*(num_blocks-1)
		#Empty list to collect the blocks of this layer
		layers = []
		#Iterate over the strides list, giving each residual block of this layer its own stride
		for stride in strides:
			layers.append(block(self.in_planes, planes, stride)) #Create a residual block and append it to this layer
			self.in_planes = planes * block.expansion #Update the input channel count for the next block (and, after the loop, for the next layer)
		return nn.Sequential(*layers) #Return the blocks wrapped in an nn.Sequential
	#Forward propagation: the input image is x, the output is the prediction
	def forward(self, x):
		out = F.relu(self.bn1(self.conv1(x))) #Initial convolution and batch normalization, activated with ReLU
		out = self.layer1(out) #First layer propagation
		out = self.layer2(out) #Second layer propagation
		out = self.layer3(out) #Third layer propagation
		out = self.layer4(out) #Fourth layer propagation
		out = F.avg_pool2d(out, 4) #4x4 average pooling
		out = out.view(out.size(0), -1) #Flatten the data
		out = self.linear(out) #Fully connected layer
		return out
#Instantiate ResNet (parameters: BasicBlock as the residual block, and [2,2,2,2] blocks in the four layers) as the object net
net = ResNet(BasicBlock, [2, 2, 2, 2])
#Print net to inspect the network structure
print(net)

Output is:
ResNet(
	(conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
	(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
	(layer1): Sequential(
		(0): BasicBlock(
			(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential()
		)
		(1): BasicBlock(
			(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential()
		)
	)
	(layer2): Sequential(
		(0): BasicBlock(
			(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential(
				(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
				(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			)
		)
		(1): BasicBlock(
			(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential()
		)
	)
	(layer3): Sequential(
		(0): BasicBlock(
			(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential(
				(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
				(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			)
		)
		(1): BasicBlock(
			(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential()
		)
	)
	(layer4): Sequential(
		(0): BasicBlock(
			(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential(
				(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
				(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			)
		)
		(1): BasicBlock(
			(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
			(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
			(shortcut): Sequential()
		)
	)
	(linear): Linear(in_features=512, out_features=10, bias=True)
)

It can be seen that the network structure consists of six parts (the sketch after this list verifies the feature-map sizes):
1. The initial convolution and batch normalization layer: input image [3,32,32], output image [64,32,32]
2. The first layer, with two residual blocks

The first residual block contains two convolutions and two batch normalizations; input [64,32,32], output [64,32,32], shortcut without a convolution
The second residual block contains two convolutions and two batch normalizations; input [64,32,32], output [64,32,32], shortcut without a convolution

3. The second layer, with two residual blocks

The first residual block contains two convolutions and two batch normalizations; input [64,32,32], output [128,16,16], shortcut adjusted by a stride-2 1×1 convolution so that its output is also [128,16,16]
The second residual block contains two convolutions and two batch normalizations; input [128,16,16], output [128,16,16], shortcut without a convolution

4. The third layer, with two residual blocks

The first residual block contains two convolutions and two batch normalizations; input [128,16,16], output [256,8,8], shortcut adjusted by a stride-2 1×1 convolution so that its output is also [256,8,8]
The second residual block contains two convolutions and two batch normalizations; input [256,8,8], output [256,8,8], shortcut without a convolution

5. The fourth layer, with two residual blocks

The first residual block contains two convolutions and two batch normalizations; input [256,8,8], output [512,4,4], shortcut adjusted by a stride-2 1×1 convolution so that its output is also [512,4,4]
The second residual block contains two convolutions and two batch normalizations; input [512,4,4], output [512,4,4], shortcut without a convolution

6. After a 4×4 average pooling reduces [512,4,4] to [512,1,1], the fully connected layer has 512 input neuron nodes and 10 output neuron nodes (matching in_features=512 in the printout above)
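These feature-map sizes can be checked directly by pushing a dummy batch through the layers; a minimal sketch (x and out are illustrative names):

#Illustrative verification of the feature-map sizes listed above
x = torch.randn(1, 3, 32, 32)
out = F.relu(net.bn1(net.conv1(x)))
for layer in [net.layer1, net.layer2, net.layer3, net.layer4]:
    out = layer(out)
    print(out.shape)
#prints [1, 64, 32, 32], [1, 128, 16, 16], [1, 256, 8, 8], [1, 512, 4, 4]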

4. Training model preparation

#Number of samples in one training batch
batch_size = 128
#Build iterable data loaders (parameters: dataset, batch size, whether to shuffle, number of worker processes)
train_iter = torch.utils.data.DataLoader(cifar10_train, batch_size=batch_size, shuffle=True, num_workers=2)
test_iter = torch.utils.data.DataLoader(cifar10_test, batch_size=100, shuffle=False, num_workers=2)
#Multi-GPU training and optimization (device is a torch.device object, so its type attribute is compared)
if device.type == 'cuda':
	#Wrap the net object so it can be processed in parallel on multiple GPUs
	net = torch.nn.DataParallel(net)
	#cuDNN is a GPU acceleration library for deep neural networks developed by NVIDIA; enabling the built-in cuDNN auto-tuner lets it find the most efficient algorithms for the current configuration, improving run-time efficiency.
	cudnn.benchmark = True
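Given the 2048 MB video-memory budget mentioned in the preparation section, it is worth knowing the model's size up front; a minimal sketch (num_params is an illustrative name):

#Count the trainable parameters of the model (this ResNet18 has about 11.2 million, i.e. roughly 45 MB of float32 weights)
num_params = sum(p.numel() for p in net.parameters() if p.requires_grad)
print('%.1fM trainable parameters' % (num_params / 1e6))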

An accuracy evaluation function is created next; during training, the network model obtained in each epoch is fed the test-set data to measure its generalization accuracy.

#Define the accuracy evaluation function (parameters: data loader, network model, computing device)
def evaluate_accuracy(data_iter, net, device=None):
	#If no device is specified, use the device of net's parameters
	if device is None and isinstance(net, torch.nn.Module):
		device = list(net.parameters())[0].device
	#Initialize the cumulative count of correctly predicted samples to 0.0 and the cumulative count of evaluated samples to 0
	acc_sum, n = 0.0, 0
	#Evaluation phase: gradient computation is disabled inside the torch.no_grad() context
	with torch.no_grad():
		#Read a batch of data X and labels y from the data loader
		for X, y in data_iter:
			if isinstance(net, torch.nn.Module): #If the network model inherits from torch.nn.Module
				net.eval() #Switch to evaluation mode, which disables dropout
				#net(X.to(device)).argmax(dim=1) takes, for every sample of the batch, the index of the largest network output and compares it
				#with the sample's true label in y.to(device); a correct prediction gives True (value 1), otherwise False (value 0).
				#Summing these values over the batch and adding them to acc_sum accumulates the number of correctly predicted samples
				acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()
				net.train() #Switch back to training mode
			else: #If a custom model is used
				#If the net function has a parameter named is_training, call it with is_training=False and accumulate the batch's correct predictions
				if('is_training' in net.__code__.co_varnames):
					acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item()
				#Otherwise call it directly and accumulate the batch's correct predictions
				else:
					acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()
			n += y.shape[0] #y.shape[0] is the number of labels (i.e. samples) in the batch, so n accumulates the total number of evaluated samples, correct or not
	return acc_sum / n #The accuracy is the cumulative number of correctly predicted samples divided by the total number of evaluated samples

A training function is created for the training loop; after each epoch it prints training information such as that epoch's training-set accuracy and test-set accuracy.

#Define the training function (parameters: network model, training loader, test loader, batch size, optimizer, computing device, number of epochs)
def train(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):
	net = net.to(device) #Move the network model to the specified device
	print("training on ", device) #Show which device this training run uses
	loss = torch.nn.CrossEntropyLoss() #Use the cross-entropy loss function
	batch_count = 0 #Set the batch counter to 0 (note it accumulates across epochs, so the printed loss is averaged over all batches seen so far)
	for epoch in range(num_epochs): #Each epoch trains on the complete training set batch by batch, for num_epochs epochs in total
		#Per-epoch initialization: cumulative training loss 0.0, cumulative correct predictions 0.0, total training samples 0; start is the epoch's starting time
		train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
		for X, y in train_iter: #Take one batch of images and labels at a time
			X = X.to(device) #Move the images to the specified device
			y = y.to(device) #Move the labels to the specified device
			y_hat = net(X) #Feed the batch of images X into the network model net to obtain the batch predictions y_hat
			l = loss(y_hat, y) #Compute the loss l between the batch predictions y_hat and the true batch labels y
			optimizer.zero_grad() #Reset the optimizer's gradients to zero
			l.backward() #Back-propagate the batch loss l to compute the gradients
			optimizer.step() #Let the optimizer apply the gradient update to the training parameters
			train_l_sum += l.cpu().item() #Add the batch loss l to the cumulative training loss train_l_sum
			train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item() #Add the batch's correctly predicted samples to the cumulative count train_acc_sum
			n += y.shape[0] #Add the batch's sample count to the total number of training samples
			batch_count += 1 #Increment the batch counter
		test_acc = evaluate_accuracy(test_iter, net) #Evaluate the parameters obtained in this epoch on the complete test set, batch by batch, to obtain the test-set accuracy
		#Print the epoch number, the running average loss, this epoch's training-set accuracy and test-set accuracy, and this epoch's elapsed time
		print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
				% (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))

5. Training model

#Set the learning rate lr and the number of training epochs num_epochs
lr, num_epochs = 0.01, 50
#Use the Adam optimizer; the arguments are the network model's parameters and the learning rate
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
#Start training. The arguments are the network model, training loader, test loader, batch size, optimizer, computing device and number of epochs
train(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

Output is:
training on cuda
epoch 1, loss 1.8085, train acc 0.337, test acc 0.420, time 391.4 sec
epoch 2, loss 0.6419, train acc 0.532, test acc 0.601, time 382.5 sec
epoch 3, loss 0.3349, train acc 0.640, test acc 0.659, time 381.2 sec
epoch 4, loss 0.2087, train acc 0.706, test acc 0.734, time 381.7 sec
epoch 5, loss 0.1401, train acc 0.755, test acc 0.751, time 382.8 sec
epoch 6, loss 0.0990, train acc 0.792, test acc 0.782, time 382.9 sec
epoch 7, loss 0.0743, train acc 0.819, test acc 0.806, time 381.3 sec
epoch 8, loss 0.0572, train acc 0.841, test acc 0.826, time 381.2 sec
epoch 9, loss 0.0463, train acc 0.855, test acc 0.840, time 381.5 sec
epoch 10, loss 0.0380, train acc 0.867, test acc 0.839, time 381.6 sec
epoch 11, loss 0.0315, train acc 0.881, test acc 0.855, time 382.6 sec
epoch 12, loss 0.0265, train acc 0.887, test acc 0.844, time 381.8 sec
epoch 13, loss 0.0230, train acc 0.896, test acc 0.852, time 382.6 sec
epoch 14, loss 0.0198, train acc 0.903, test acc 0.867, time 381.5 sec
epoch 15, loss 0.0172, train acc 0.910, test acc 0.863, time 381.7 sec
epoch 16, loss 0.0149, train acc 0.916, test acc 0.887, time 381.8 sec
epoch 17, loss 0.0132, train acc 0.921, test acc 0.875, time 385.0 sec
epoch 18, loss 0.0116, train acc 0.927, test acc 0.879, time 381.6 sec
epoch 19, loss 0.0104, train acc 0.931, test acc 0.884, time 381.3 sec
epoch 20, loss 0.0092, train acc 0.935, test acc 0.885, time 382.1 sec
epoch 21, loss 0.0082, train acc 0.940, test acc 0.895, time 381.3 sec
epoch 22, loss 0.0074, train acc 0.943, test acc 0.894, time 382.1 sec
epoch 23, loss 0.0065, train acc 0.947, test acc 0.892, time 383.7 sec
epoch 24, loss 0.0059, train acc 0.950, test acc 0.896, time 385.9 sec
epoch 25, loss 0.0055, train acc 0.951, test acc 0.893, time 384.1 sec
epoch 26, loss 0.0050, train acc 0.955, test acc 0.888, time 382.0 sec
epoch 27, loss 0.0046, train acc 0.957, test acc 0.903, time 381.7 sec
epoch 28, loss 0.0041, train acc 0.958, test acc 0.881, time 381.5 sec
epoch 29, loss 0.0040, train acc 0.960, test acc 0.900, time 382.5 sec
epoch 30, loss 0.0035, train acc 0.963, test acc 0.902, time 381.4 sec
epoch 31, loss 0.0032, train acc 0.967, test acc 0.905, time 381.4 sec
epoch 32, loss 0.0030, train acc 0.965, test acc 0.896, time 381.6 sec
epoch 33, loss 0.0028, train acc 0.968, test acc 0.900, time 381.4 sec
epoch 34, loss 0.0027, train acc 0.968, test acc 0.892, time 380.8 sec
epoch 35, loss 0.0025, train acc 0.970, test acc 0.907, time 381.3 sec
epoch 36, loss 0.0022, train acc 0.973, test acc 0.905, time 381.0 sec
epoch 37, loss 0.0021, train acc 0.972, test acc 0.908, time 380.7 sec
epoch 38, loss 0.0020, train acc 0.973, test acc 0.911, time 380.4 sec
epoch 39, loss 0.0019, train acc 0.974, test acc 0.907, time 380.9 sec
epoch 40, loss 0.0018, train acc 0.975, test acc 0.906, time 380.8 sec
epoch 41, loss 0.0018, train acc 0.973, test acc 0.911, time 380.8 sec
epoch 42, loss 0.0016, train acc 0.978, test acc 0.904, time 382.2 sec
epoch 43, loss 0.0014, train acc 0.979, test acc 0.893, time 380.6 sec
epoch 44, loss 0.0015, train acc 0.978, test acc 0.901, time 380.6 sec
epoch 45, loss 0.0014, train acc 0.979, test acc 0.907, time 380.6 sec
epoch 46, loss 0.0014, train acc 0.978, test acc 0.907, time 380.6 sec
epoch 47, loss 0.0012, train acc 0.981, test acc 0.903, time 380.7 sec
epoch 48, loss 0.0011, train acc 0.982, test acc 0.911, time 381.1 sec
epoch 49, loss 0.0011, train acc 0.981, test acc 0.901, time 380.9 sec
epoch 50, loss 0.0011, train acc 0.981, test acc 0.904, time 380.9 sec

After 50 epochs the loss hardly decreases any more and the training-set accuracy is nearly constant (the printed loss falls faster than usual because, as noted in train, it is divided by a batch counter that accumulates across epochs). The test acc column shows that the saturated model just reaches a test-set accuracy of 0.9, which meets the experimental requirement.

#Save the parameters of the network model to the file mymodel.pth
torch.save(net.state_dict(),'mymodel.pth')
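To reuse the saved parameters in a later session, they can be loaded back into a freshly built model. A sketch (net2 is an illustrative name); note that because net was wrapped in DataParallel above, the saved keys carry a 'module.' prefix, so the wrapping must be reproduced (or the prefix stripped) before loading:

#Sketch: restore the saved parameters in a later session
net2 = ResNet(BasicBlock, [2, 2, 2, 2])
net2 = torch.nn.DataParallel(net2)  #match the DataParallel wrapping used when saving, so the 'module.' key prefixes line up
net2.load_state_dict(torch.load('mymodel.pth', map_location=device))
net2 = net2.to(device)
net2.eval()  #switch to evaluation mode before running inference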

6. Verify the model

#Reload the training and test sets without the preprocessing transforms (apart from ToTensor), which makes the images easier to display.
cifar10_train = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=True, download=True, transform=transforms.ToTensor())
cifar10_test = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=False, download=True, transform=transforms.ToTensor())
#Rebuild the training and test loaders with a batch size of 6, convenient for displaying images.
train_iter = torch.utils.data.DataLoader(cifar10_train, batch_size=6, shuffle=True, num_workers=2)
test_iter = torch.utils.data.DataLoader(cifar10_test, batch_size=6, shuffle=False, num_workers=2)

Output is:
Files already downloaded and verified
Files already downloaded and verified
#Create another training set and test set with the preprocessing transforms applied, to be fed into the trained network model so its predicted labels can be displayed
cifar10_train_tran = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=True, download=True, transform=transform_train)
cifar10_test_tran = torchvision.datasets.CIFAR10(root='~/Datasets/CIFAR10', train=False, download=True, transform=transform_test)
#Build the corresponding training and test loaders, again with a batch size of 6 for convenient label display.
train_iter_tran = torch.utils.data.DataLoader(cifar10_train_tran, batch_size=6, shuffle=True, num_workers=2)
test_iter_tran = torch.utils.data.DataLoader(cifar10_test_tran, batch_size=6, shuffle=False, num_workers=2)

Output is:
Files already downloaded and verified
Files already downloaded and verified
dataiter = iter(test_iter) #Create an iterator over the non-preprocessed test loader
images, labels = dataiter.next() #Return one batch, i.e. batch_size = 6 images and labels
show_cifar10(images, get_CIFAR10_labels(labels)) #Display the six test-set images with their labels

print('GroundTruth: ', ' '.join('%5s' % get_CIFAR10_labels([labels[j].numpy()]) for j in range(6))) #Print the true labels (GroundTruth)
dataiter_tran = iter(test_iter_tran) #Create an iterator over the preprocessed test loader
images_tran, labels_tran = dataiter_tran.next() #Return one batch, i.e. batch_size = 6 images and labels
images_tran = images_tran.to(device) #Move the images to the GPU
labels_tran = labels_tran.to(device) #Move the labels to the GPU
outputs = net(images_tran) #Feed the preprocessed test batch into the trained network model to obtain the batch predictions outputs
_, predicted = torch.max(outputs.data, 1) #Take the index of the maximum of outputs.data along dimension 1 as the predicted class list
print('Predicted: ', ' '.join('%5s' % get_CIFAR10_labels([predicted[j].cpu().numpy()]) for j in range(6))) #Print the predicted labels

The output images, their true labels and the predicted labels are as follows:

It can be seen that for these six test-set images, the labels predicted by the trained model match the true labels exactly, achieving the goal of image classification. The experiment is complete.

IV Experimental summary

In this experiment the ResNet18 residual model is used as the network for image recognition, and its test accuracy reaches about 0.9. Because its depth is modest, however, it does not give full play to the advantages of residual networks. For plain models such as LeNet, AlexNet, VGG16 and GoogLeNet, making the network deeper brings not higher but lower accuracy, and the model becomes harder to train: vanishing gradients cause back propagation to fail.
The residual network invented by Dr. Kaiming He deepens the network while improving accuracy. The intuition is that if several identity-mapping layers (i.e. y = x, output equals input) are appended to a shallow saturated network, the depth increases but the error does not, so a deeper network need not have a higher training-set error. This idea of using an identity mapping to pass the output of an earlier layer directly to a later layer inspired the famous deep residual network ResNet: instead of fitting the desired mapping H(x) directly, each block fits the residual F(x) = H(x) - x, and driving F(x) to zero recovers the identity mapping, so deepening the network does not reduce accuracy.
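This can be seen directly in the BasicBlock defined above: if both convolutions output zero, the block computes ReLU(0 + x) = x for non-negative inputs, i.e. it degenerates to an identity mapping. A toy illustration (blk and x are illustrative names):

#Toy illustration: a BasicBlock whose convolution weights are all zero acts as an identity map
blk = BasicBlock(64, 64, stride=1)  #in/out channels match and stride is 1, so the shortcut is a plain identity
nn.init.zeros_(blk.conv1.weight)
nn.init.zeros_(blk.conv2.weight)
blk.eval()  #use the initial running batch-norm statistics, so zero activations stay zero
x = torch.rand(1, 64, 8, 8)  #non-negative input, so the final ReLU leaves it unchanged
print(torch.allclose(blk(x), x))  #True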

Thanks to the emergence of residual networks, the depth of deep neural networks jumped in one stroke from dozens of layers to hundreds, and even to more than a thousand. Many current state-of-the-art network structures adopt deep models of this kind to achieve higher accuracy.

For the CIFAR10 dataset, commonly reported test accuracies of some deep networks are as follows:

Network              Accuracy
VGG16                92.64%
ResNet18             93.02%
ResNet50             93.62%
ResNet101            93.75%
RegNetX_200MF        94.24%
RegNetY_400MF        94.29%
MobileNetV2          94.43%
ResNeXt29(32x4d)     94.73%
ResNeXt29(2x64d)     94.82%
DenseNet121          95.04%
PreActResNet18       95.11%
DPN92                95.16%
