5.4 Handwritten Digit Recognition Experiment Based on Residual Network
A residual network (ResNet) adds direct (shortcut) connections around the nonlinear layers of a neural network. This alleviates the vanishing gradient problem and makes deep neural networks easier to train.
In the residual network, the most basic unit is the residual unit.
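A short derivation shows why the shortcut helps with vanishing gradients. The notation is an assumption made here for illustration: x is the input of a residual unit, f(x; θ) is the wrapped nonlinear layers, and the final activation after the addition is ignored.

```latex
\[
  \mathbf{y} = \mathbf{x} + f(\mathbf{x};\theta),
  \qquad
  \frac{\partial \mathbf{y}}{\partial \mathbf{x}}
    = \mathbf{I} + \frac{\partial f(\mathbf{x};\theta)}{\partial \mathbf{x}} .
\]
```

Because of the identity term, the gradient with respect to x does not depend solely on the Jacobian of f, so it is less likely to shrink toward zero when many units are stacked.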
5.4.1 Model Construction
We first build the residual unit of ResNet18 and then assemble the complete network from it.
5.4.1.1 Residual unit
The input and output of the nonlinear layers wrapped by a residual unit must have the same shape. If the number of channels of the input feature map differs from that of the output feature map, the two cannot be added directly. To solve this, we can apply a 1×1 convolution to the input so that its number of channels matches the output of the stacked convolutions.
1×1 convolution: identical to a standard convolution except that the kernel size is 1×1. It therefore ignores the relationship between neighbouring positions in the input and operates only across channels. A 1×1 convolution can be used to:
Realize cross-channel interaction and integration of information. The input and output of a convolution are three-dimensional (width, height, channels), so a 1×1 convolution is a linear combination, at each pixel, of the values on the different channels, which integrates information across channels;
Reduce or increase the number of channels, for example to cut down the number of parameters. The output of a 1×1 convolution keeps the spatial layout of the input, and the dimension increase or reduction is achieved by changing the number of channels;
Add nonlinearity. Applying a nonlinear activation function after a 1×1 convolution increases the nonlinearity of the network while keeping the size of the feature map unchanged.
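The points above can be checked with a small sketch. The tensor sizes (batch of 2, 64 to 128 channels, 8×8 feature map) are arbitrary values chosen for illustration. The code confirms that a 1×1 convolution changes only the channel dimension and that it equals a per-pixel linear combination of the input channels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative input: batch of 2, 64 channels, 8x8 feature map
x = torch.randn(2, 64, 8, 8)

# 1x1 convolution that maps 64 channels to 128 channels
conv1x1 = nn.Conv2d(64, 128, kernel_size=1, bias=False)
y = conv1x1(x)

# The kernel has shape (128, 64, 1, 1); viewed as a 128x64 matrix it
# linearly combines the 64 input channels at every pixel independently
w = conv1x1.weight.view(128, 64)
y_manual = torch.einsum("oc,nchw->nohw", w, x)

same = torch.allclose(y, y_manual, atol=1e-4)
print(y.shape)  # spatial size is unchanged, channel count becomes 128
print(same)
```

The spatial size stays at 8×8 while the channel count changes, which is exactly what the shortcut branch of a residual unit needs when the channel counts differ.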
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, use_residual=True):
        """
        Residual unit
        Inputs:
            - in_channels: number of input channels
            - out_channels: number of output channels
            - stride: stride of the residual unit, controlled through the stride of the first convolutional layer
            - use_residual: whether to use the residual connection
        """
        super(ResBlock, self).__init__()
        self.stride = stride
        self.use_residual = use_residual
        # First convolutional layer: 3×3 kernel; the number of output channels and the stride are configurable
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, stride=self.stride, bias=False)
        # Second convolutional layer: 3×3 kernel, stride 1, does not change the shape of its input feature map
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)

        # use_1x1conv is True if the output of conv2 and the input of this residual unit have different shapes.
        # In that case a 1×1 convolution is applied to the input so that its shape matches the output of conv2.
        if in_channels != out_channels or stride != 1:
            self.use_1x1conv = True
        else:
            self.use_1x1conv = False
        # When the input and output shapes of the wrapped nonlinear layers differ,
        # adjust the input with a 1×1 convolution before the addition
        if self.use_1x1conv:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=self.stride, bias=False)

        # Each convolutional layer is followed by a batch normalization layer.
        # Batch normalization is described in detail in 7.5.1
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if self.use_1x1conv:
            self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, inputs):
        y = F.relu(self.bn1(self.conv1(inputs)))
        y = self.bn2(self.conv2(y))
        if self.use_residual:
            if self.use_1x1conv:
                # Apply the 1×1 convolution to the inputs so that their shape matches the output y of conv2
                shortcut = self.shortcut(inputs)
                shortcut = self.bn3(shortcut)
            else:
                # Otherwise add the inputs to the output y of conv2 directly
                shortcut = inputs
            y = torch.add(shortcut, y)
        out = F.relu(y)
        return out
5.4.1.2 Overall structure of residual network
Residual network is a very deep network composed of many residual units connected in series. The network structure of ResNet18 is shown in the figure below.
For ease of understanding, the ResNet18 network can be divided into 6 modules:
The first module: contains a 7×7 convolutional layer with a stride of 2 and 64 output channels. The output of the convolutional layer passes through batch normalization and a ReLU activation function, followed by a 3×3 max pooling layer with a stride of 2;
The second module: contains two residual units. After the operation, the number of output channels is 64, and the size of the feature map remains unchanged;
The third module: contains two residual units. After the operation, the number of output channels is 128, and the size of the feature map is reduced by half;
The fourth module: contains two residual units. After the operation, the number of output channels is 256, and the size of the feature map is reduced by half;
The fifth module: contains two residual units. After the operation, the number of output channels is 512, and the size of the feature map is reduced by half;
The sixth module: contains a global average pooling layer, which changes the feature map to a size of 1 × 1, and finally calculates the final output through the fully connected layer.
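The feature-map sizes listed above follow from the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The sketch below traces the side length of a square input through the six modules; the function names are hypothetical helpers introduced for illustration, and the kernel, stride and padding values are those used in the code in this section.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_feature_sizes(size):
    """Side length of the feature map at the input and after each of the six modules."""
    sizes = [size]
    # Module 1: 7x7 convolution (stride 2, padding 3), then 3x3 max pooling (stride 2, padding 1)
    size = conv_out(size, 7, 2, 3)
    size = conv_out(size, 3, 2, 1)
    sizes.append(size)
    # Modules 2 to 5: only the first 3x3 convolution of each module may use stride 2
    for stride in (1, 2, 2, 2):
        size = conv_out(size, 3, stride, 1)
        sizes.append(size)
    # Module 6: global average pooling always yields a 1x1 map
    sizes.append(1)
    return sizes


sizes_32 = resnet18_feature_sizes(32)
sizes_224 = resnet18_feature_sizes(224)
print(sizes_32)
print(sizes_224)
```

For the 32×32 inputs used in this experiment the map is already 1×1 after the fifth module, so the global average pooling in module six leaves it unchanged.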
The code implementation of the ResNet18 model is as follows:
Define module one:
def make_first_module(in_channels):
    # Module 1: 7×7 convolution, batch normalization, ReLU, max pooling
    m1 = nn.Sequential(nn.Conv2d(in_channels, 64, 7, stride=2, padding=3),
                       nn.BatchNorm2d(64),
                       nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return m1
Define module two to module five:
def resnet_module(input_channels, out_channels, num_res_blocks, stride=1, use_residual=True):
    blk = []
    # Create num_res_blocks residual units in a loop
    for i in range(num_res_blocks):
        if i == 0:
            # First residual unit of the module: may change the channel count and the stride
            blk.append(ResBlock(input_channels, out_channels, stride=stride, use_residual=use_residual))
        else:
            # Remaining residual units of the module
            blk.append(ResBlock(out_channels, out_channels, use_residual=use_residual))
    return blk
Package module two to module five:
def make_modules(use_residual):
    # Module 2: two residual units, 64 input channels, 64 output channels, stride 1, feature map size unchanged
    m2 = nn.Sequential(*resnet_module(64, 64, 2, stride=1, use_residual=use_residual))
    # Module 3: two residual units, 64 input channels, 128 output channels, stride 2, feature map size halved
    m3 = nn.Sequential(*resnet_module(64, 128, 2, stride=2, use_residual=use_residual))
    # Module 4: two residual units, 128 input channels, 256 output channels, stride 2, feature map size halved
    m4 = nn.Sequential(*resnet_module(128, 256, 2, stride=2, use_residual=use_residual))
    # Module 5: two residual units, 256 input channels, 512 output channels, stride 2, feature map size halved
    m5 = nn.Sequential(*resnet_module(256, 512, 2, stride=2, use_residual=use_residual))
    return m2, m3, m4, m5
Define the complete network:
# Define the complete network
class Model_ResNet18(nn.Module):
    def __init__(self, in_channels=3, num_classes=10, use_residual=True):
        super(Model_ResNet18, self).__init__()
        m1 = make_first_module(in_channels)
        m2, m3, m4, m5 = make_modules(use_residual)
        # Package module 1 to module 6
        self.net = nn.Sequential(m1, m2, m3, m4, m5,
                                 # Module 6: global average pooling layer, fully connected layer
                                 # (the PyTorch class is AdaptiveAvgPool2d, with a lowercase "d")
                                 nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten(),
                                 nn.Linear(512, num_classes))

    def forward(self, x):
        return self.net(x)
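torchsummary.summary can be used to count the parameters of the model. Note that the parameter counts in the output below include a bias term for every convolution (for example, 36,928 = 3×3×64×64 + 64), so they appear to come from a run with convolution biases enabled; with bias=False in the residual units, as in the listing above, the counts would be slightly smaller.
import torch
import torchsummary

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True).to(device)
# torchsummary defaults to device="cuda", so pass the device type explicitly to also work on CPU
torchsummary.summary(model, (1, 32, 32), device=device.type)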
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 16, 16]           3,200
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,928
       BatchNorm2d-6             [-1, 64, 8, 8]             128
            Conv2d-7             [-1, 64, 8, 8]          36,928
       BatchNorm2d-8             [-1, 64, 8, 8]             128
          ResBlock-9             [-1, 64, 8, 8]               0
           Conv2d-10             [-1, 64, 8, 8]          36,928
      BatchNorm2d-11             [-1, 64, 8, 8]             128
           Conv2d-12             [-1, 64, 8, 8]          36,928
      BatchNorm2d-13             [-1, 64, 8, 8]             128
         ResBlock-14             [-1, 64, 8, 8]               0
           Conv2d-15            [-1, 128, 4, 4]          73,856
      BatchNorm2d-16            [-1, 128, 4, 4]             256
           Conv2d-17            [-1, 128, 4, 4]         147,584
      BatchNorm2d-18            [-1, 128, 4, 4]             256
           Conv2d-19            [-1, 128, 4, 4]           8,320
      BatchNorm2d-20            [-1, 128, 4, 4]             256
         ResBlock-21            [-1, 128, 4, 4]               0
           Conv2d-22            [-1, 128, 4, 4]         147,584
      BatchNorm2d-23            [-1, 128, 4, 4]             256
           Conv2d-24            [-1, 128, 4, 4]         147,584
      BatchNorm2d-25            [-1, 128, 4, 4]             256
         ResBlock-26            [-1, 128, 4, 4]               0
           Conv2d-27            [-1, 256, 2, 2]         295,168
      BatchNorm2d-28            [-1, 256, 2, 2]             512
           Conv2d-29            [-1, 256, 2, 2]         590,080
      BatchNorm2d-30            [-1, 256, 2, 2]             512
           Conv2d-31            [-1, 256, 2, 2]          33,024
      BatchNorm2d-32            [-1, 256, 2, 2]             512
         ResBlock-33            [-1, 256, 2, 2]               0
           Conv2d-34            [-1, 256, 2, 2]         590,080
      BatchNorm2d-35            [-1, 256, 2, 2]             512
           Conv2d-36            [-1, 256, 2, 2]         590,080
      BatchNorm2d-37            [-1, 256, 2, 2]             512
         ResBlock-38            [-1, 256, 2, 2]               0
           Conv2d-39            [-1, 512, 1, 1]       1,180,160
      BatchNorm2d-40            [-1, 512, 1, 1]           1,024
           Conv2d-41            [-1, 512, 1, 1]       2,359,808
      BatchNorm2d-42            [-1, 512, 1, 1]           1,024
           Conv2d-43            [-1, 512, 1, 1]         131,584
      BatchNorm2d-44            [-1, 512, 1, 1]           1,024
         ResBlock-45            [-1, 512, 1, 1]               0
           Conv2d-46            [-1, 512, 1, 1]       2,359,808
      BatchNorm2d-47            [-1, 512, 1, 1]           1,024
           Conv2d-48            [-1, 512, 1, 1]       2,359,808
      BatchNorm2d-49            [-1, 512, 1, 1]           1,024
         ResBlock-50            [-1, 512, 1, 1]               0
AdaptiveAvgPool2d-51            [-1, 512, 1, 1]               0
          Flatten-52                  [-1, 512]               0
           Linear-53                   [-1, 10]           5,130
================================================================
Total params: 11,180,170
Trainable params: 11,180,170
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.05
Params size (MB): 42.65
Estimated Total Size (MB): 43.71
----------------------------------------------------------------
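The per-layer counts in the summary can be reproduced by hand. The sketch below is plain arithmetic, not part of the original code: the helper names are hypothetical, and it assumes every convolution carries a bias term, which is what the printed numbers imply.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameters of a k x k convolution: one k*k*c_in kernel per output channel, plus optional biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def bn_params(channels):
    """Batch normalization learns one scale and one shift per channel."""
    return 2 * channels


def linear_params(n_in, n_out):
    """Fully connected layer: weight matrix plus biases."""
    return n_in * n_out + n_out


def resnet18_params(in_channels=1, num_classes=10, bias=True):
    """Total trainable parameters of the six-module network described above."""
    total = conv_params(in_channels, 64, 7, bias) + bn_params(64)  # module 1
    c_in = 64
    for c_out, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:  # modules 2 to 5
        # First residual unit: two 3x3 convolutions, plus a 1x1 shortcut if the shape changes
        total += conv_params(c_in, c_out, 3, bias) + bn_params(c_out)
        total += conv_params(c_out, c_out, 3, bias) + bn_params(c_out)
        if c_in != c_out or stride != 1:
            total += conv_params(c_in, c_out, 1, bias) + bn_params(c_out)
        # Second residual unit: two 3x3 convolutions, no shortcut convolution
        total += 2 * (conv_params(c_out, c_out, 3, bias) + bn_params(c_out))
        c_in = c_out
    total += linear_params(512, num_classes)  # module 6
    return total


first_conv = conv_params(1, 64, 7)       # Conv2d-1 in the summary
block_conv = conv_params(64, 64, 3)      # Conv2d-5 in the summary
shortcut_conv = conv_params(64, 128, 1)  # Conv2d-19 in the summary
total_params = resnet18_params()
print(first_conv, block_conv, shortcut_conv, total_params)
```

Calling resnet18_params(bias=False) gives the count for a variant in which no convolution has a bias term.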
Computational cost (FLOPs), measured with thop:
import torch
from thop import profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True).to(device)
# A single dummy 1×32×32 input is enough for counting operations
dummy_input = torch.randn(1, 1, 32, 32).to(device)
flops, params = profile(model, (dummy_input,))
print(flops)
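To get a feel for where the computation goes, the cost of a single convolution can be estimated by hand. The sketch below counts multiply-accumulate operations (MACs) for a convolution, ignoring bias, batch normalization and activations. It is a rough estimate under those assumptions and is not claimed to match the exact number thop prints; the helper name is hypothetical.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of a k x k convolution: one k*k*c_in dot product per output element."""
    return k * k * c_in * c_out * h_out * w_out


# First 7x7 convolution: 1 -> 64 channels, 16x16 output for a 32x32 input
first_conv_macs = conv_macs(1, 64, 7, 16, 16)
# A 3x3 convolution inside module 2: 64 -> 64 channels, 8x8 output
block_conv_macs = conv_macs(64, 64, 3, 8, 8)
print(first_conv_macs, block_conv_macs)
```

Even though the feature map in module 2 is four times smaller than the output of the first convolution, each 3×3 convolution there costs more because it mixes 64 input channels instead of one.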
5.4.2 ResNet18 without residual connections (Plain Networks)
5.4.2.1 Model training
import gzip
import json
from PIL import Image
import matplotlib.pyplot as plt
from torchvision.transforms import Compose, Resize, Normalize, ToTensor
import random
import torch.utils.data as data
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.nn.init import constant_, normal_, uniform_
from torchsummary import summary
from thop import profile
import torch.optim as opt
import numpy as np

# Load the dataset and keep a small subset
train_set, dev_set, test_set = json.load(gzip.open('mnist (1).json.gz'))
train_images, train_labels = train_set[0][:2000], train_set[1][:2000]
dev_images, dev_labels = dev_set[0][:200], dev_set[1][:200]
test_images, test_labels = test_set[0][:200], test_set[1][:200]
train_set, dev_set, test_set = [train_images, train_labels], [dev_images, dev_labels], [test_images, test_labels]
print('Length of train/dev/test set:{}/{}/{}'.format(len(train_set[0]), len(dev_set[0]), len(test_set[0])))

image, label = train_set[0][0], train_set[1][0]
image, label = np.array(image).astype('float32'), int(label)
# The original image data is a row vector of length 784, which needs to be reshaped to an image of size [28, 28]
image = np.reshape(image, [28, 28])
image = Image.fromarray(image.astype('uint8'), mode='L')
print("The number in the picture is {}".format(label))
plt.figure(figsize=(5, 5))
plt.imshow(image)
plt.savefig('conv-number5.pdf')

# Data preprocessing
transforms = Compose([Resize(32), ToTensor(), Normalize(mean=[1], std=[1])])


class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # Only used to calculate the evaluation metric

        # Record the evaluation metric during training
        self.dev_scores = []

        # Record the loss during training
        self.train_epoch_losses = []  # One loss per epoch
        self.train_step_losses = []  # One loss per step
        self.dev_losses = []

        # Record the best score so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Log printing frequency; defaults to 100 if not given
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency
        eval_steps = kwargs.get("eval_steps", 0)

        # Model save path; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")

        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0

        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulated training loss of this epoch
            total_loss = 0
            for step, data in enumerate(train_loader):
                X, y = data
                # Get model predictions
                logits = self.model(X)
                loss = self.loss_fn(logits, y)  # Mean reduction by default
                # Accumulate a plain float so the computation graph is not kept alive
                total_loss += loss.item()

                # Save the loss of each step
                self.train_step_losses.append((global_step, loss.item()))

                if log_steps and global_step % log_steps == 0:
                    print(
                        f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                # Backpropagation: compute the gradient of each parameter
                loss.backward()

                if custom_print_log:
                    custom_print_log(self)

                # Mini-batch gradient descent parameter update
                self.optimizer.step()
                # Reset the gradients
                self.optimizer.zero_grad()

                # Decide whether to evaluate
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):

                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate] dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # If the current score is the best so far, save the model
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(
                            f"[Evaluate] best accuracy performance has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Average training loss of the current epoch
            trn_loss = total_loss / len(train_loader)
            # Save the epoch-level training loss
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")

    # During evaluation, use 'torch.no_grad()' so that gradients are neither computed nor stored
    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None

        # Set the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)

        # Accumulated loss on the validation set
        total_loss = 0

        # Reset the metric
        self.metric.reset()

        # Iterate over each batch of the validation set
        for batch_id, data in enumerate(dev_loader):
            X, y = data

            # Compute the model output
            logits = self.model(X)

            # Compute the loss
            loss = self.loss_fn(logits, y).item()
            # Accumulate the loss
            total_loss += loss

            # Accumulate the metric
            self.metric.update(logits, y)

        dev_loss = (total_loss / len(dev_loader))
        dev_score = self.metric.accumulate()

        # Record the validation loss
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)

        return dev_score, dev_loss

    # During prediction, use 'torch.no_grad()' so that gradients are neither computed nor stored
    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Set the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get the predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)


class MNIST_dataset(data.Dataset):
    def __init__(self, dataset, transforms, mode='train'):
        self.mode = mode
        self.transforms = transforms
        self.dataset = dataset

    def __getitem__(self, idx):
        # Get the image and the label
        image, label = self.dataset[0][idx], self.dataset[1][idx]
        image, label = np.array(image).astype('float32'), int(label)
        image = np.reshape(image, [28, 28])
        image = Image.fromarray(image.astype('uint8'), mode='L')
        image = self.transforms(image)

        return image, label

    def __len__(self):
        return len(self.dataset[0])


class Accuracy():
    def __init__(self, is_logist=True):
        # Number of correctly predicted samples
        self.num_correct = 0
        # Total number of samples
        self.num_count = 0

        self.is_logist = is_logist

    def update(self, outputs, labels):
        # shape[1] == 1 means binary classification, shape[1] > 1 means multi-class classification
        if outputs.shape[1] == 1:  # Binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # For logits, the predicted class is 1 when the value is at least 0
                preds = (outputs >= 0).float()
            else:
                # For probabilities, the predicted class is 1 when the value is at least 0.5, otherwise 0
                preds = (outputs >= 0.5).float()
        else:
            # Multi-class classification: the index of the largest output is the predicted class
            preds = torch.argmax(outputs, dim=1)

        # Number of correctly predicted samples in this batch
        labels = torch.squeeze(labels, dim=-1)
        batch_correct = (preds == labels).float().sum().item()
        batch_count = len(labels)

        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated counts
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the correct count and the total count
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"


# Visualization
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::30]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color='#8E004D', label="Train loss")
    if runner.dev_losses[0][0] != -1:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color='#E20079', linestyle='--', label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    # Plot the evaluation accuracy curve
    if runner.dev_losses[0][0] != -1:
        plt.plot(dev_steps, runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    else:
        plt.plot(list(range(len(runner.dev_scores))), runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    # Axis labels and legend
    plt.ylabel("score", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='lower right', fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()


# Fix the random seed
random.seed(0)
# Wrap the MNIST subsets as datasets
train_dataset = MNIST_dataset(dataset=train_set, transforms=transforms, mode='train')
test_dataset = MNIST_dataset(dataset=test_set, transforms=transforms, mode='test')
dev_dataset = MNIST_dataset(dataset=dev_set, transforms=transforms, mode='dev')

# Fix the PyTorch random seed
torch.manual_seed(100)
# Learning rate
lr = 0.005
# Batch size
batch_size = 64
# Build the data loaders
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network: a deep network without residual connections
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=False)
# Define the optimizer
optimizer = opt.SGD(lr=lr, params=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=5, log_steps=log_steps,
             eval_steps=eval_steps, save_path="best_model.pdparams")
# Plot the loss curves of the training set and the validation set
plot(runner, 'cnn-loss2.pdf')
5.4.2.2 Model Evaluation
# Load the best model
runner.load_model('best_model.pdparams')
# Model evaluation
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))
5.4.3 ResNet18 with residual connections
5.4.3.1 Model training
# Fix the random seed
random.seed(0)
# Wrap the MNIST subsets as datasets
train_dataset = MNIST_dataset(dataset=train_set, transforms=transforms, mode='train')
test_dataset = MNIST_dataset(dataset=test_set, transforms=transforms, mode='test')
dev_dataset = MNIST_dataset(dataset=dev_set, transforms=transforms, mode='dev')
# Learning rate
lr = 0.01
# Batch size
batch_size = 128
# Build the data loaders
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network: use_residual=True selects the deep network with residual connections
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True)
# Define the optimizer
optimizer = opt.SGD(lr=lr, params=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
runner.train(train_loader, dev_loader, num_epochs=5, log_steps=log_steps,
             eval_steps=eval_steps, save_path="best_model.pdparams")
# Plot the loss curves of the training set and the validation set
plot(runner, 'cnn-loss3.pdf')
5.4.3.2 Model Evaluation
# Load the best model
runner.load_model('best_model.pdparams')
# Model evaluation
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))
Comparing the test results of the two runs, the model with residual connections performs better than the ResNet without them. Note that the two runs also differ in learning rate (0.005 vs. 0.01) and batch size (64 vs. 128), so a strictly controlled comparison would keep these settings identical.
5.4.4 Comparison experiment with the high-level API implementation version
For classic image classification networks such as ResNet18, the PyTorch ecosystem (torchvision) already provides a ready-made implementation, so there is no need to implement them from scratch. Here we copy the weights of the high-level API resnet18 model into the custom ResNet18 model and feed both the same input data to check whether their outputs are consistent.
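import numpy as np
import torch
from torchvision.models import resnet18

# Pretrained torchvision model (newer torchvision versions prefer the `weights=` argument)
hapi_model = resnet18(pretrained=True)
# Custom ResNet18 model; num_classes is 1000 so that the fully connected layer matches the pretrained weights
model = Model_ResNet18(in_channels=3, num_classes=1000, use_residual=True)

# Get the weights of the torchvision network
params = hapi_model.state_dict()

# Holds the weights after mapping the parameter names
new_params = {}
# Map the parameter names
for key in params:
    if 'layer' in key:
        if 'downsample.0' in key:
            # 1×1 shortcut convolution, e.g. layer2.0.downsample.0.weight -> net.2.0.shortcut.weight
            new_params['net.' + key[5:8] + '.shortcut' + key[-7:]] = params[key]
        elif 'downsample.1' in key:
            # Batch normalization of the shortcut, e.g. layer2.0.downsample.1.weight -> net.2.0.bn3.weight
            new_params['net.' + key[5:8] + '.bn3' + key[21:]] = params[key]
        else:
            new_params['net.' + key[5:]] = params[key]
    elif 'conv1.weight' == key:
        new_params['net.0.0.weight'] = params[key]
    elif 'bn1' in key:
        new_params['net.0.1' + key[3:]] = params[key]
    elif 'fc' in key:
        new_params['net.7' + key[2:]] = params[key]

# The first convolution of the custom model has a bias term and torchvision's does not;
# set it to zero so that both models compute the same function
new_params['net.0.0.bias'] = torch.zeros(64)

# Load the mapped weights into the custom model
model.load_state_dict(new_params)

# Use evaluation mode so that batch normalization uses the stored running statistics
hapi_model.eval()
model.eval()

# Use np.random to create a random array as test data
inputs = np.random.randn(*[3, 3, 32, 32])
inputs = inputs.astype('float32')
x = torch.tensor(inputs)

with torch.no_grad():
    output = model(x)
    hapi_out = hapi_model(x)

# Compute the difference between the outputs of the two models
diff = output - hapi_out
# Take the largest absolute difference
max_diff = torch.max(torch.abs(diff))
print(max_diff)
With the same weights, the custom ResNet18 and the ResNet18 provided by torchvision should produce the same outputs up to floating-point rounding, so the printed maximum difference is expected to be at or very close to zero.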
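The weight-mapping idea can be tried without downloading pretrained weights. The sketch below uses two toy models invented for illustration (RefNet and CustomNet are hypothetical names, not part of torchvision or of the code above): they compute the same function but name their layers differently, so the reference state dict has to be renamed before it can be loaded.

```python
import torch
import torch.nn as nn


class RefNet(nn.Module):
    """Toy reference model with named layers (stands in for the library model)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(8)

    def forward(self, x):
        return torch.relu(self.bn1(self.conv1(x)))


class CustomNet(nn.Module):
    """Toy custom model with the same layers wrapped in nn.Sequential."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(8),
                                 nn.ReLU())

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
ref, custom = RefNet().eval(), CustomNet().eval()

# Map reference layer names to custom layer names
rename = {"conv1": "net.0", "bn1": "net.1"}
new_state = {}
for key, value in ref.state_dict().items():
    prefix, rest = key.split(".", 1)
    new_state[rename[prefix] + "." + rest] = value

# strict=True (the default) raises an error if any key is missing or unexpected
custom.load_state_dict(new_state)

x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    max_abs_diff = (ref(x) - custom(x)).abs().max().item()
print(max_abs_diff)
```

Relying on the strict check in load_state_dict is a cheap way to catch a misspelled target key, and comparing absolute differences avoids positive and negative deviations hiding behind a plain maximum.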
Summary
After studying the classic ResNet architecture, I completed MNIST handwritten digit recognition and compared ResNet18 with and without residual connections. With residual connections, the model performs better. I also experienced the convenience of the high-level API.