# 5.4 Handwritten Digit Recognition Experiment Based on Residual Network

A residual network (ResNet) adds direct (shortcut) connections around the nonlinear layers of a neural network. These connections alleviate the vanishing gradient problem and therefore make deep neural networks easier to train.

The most basic building block of a residual network is the residual unit.
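As a minimal sketch (not the implementation used later in this section), the residual unit below wraps a hypothetical fully connected layer `f` and outputs `relu(f(x) + x)`. With the weights of `f` set to zero, the unit passes non-negative inputs through unchanged. This illustrates why an identity mapping is easy for a residual unit to learn: the branch only has to output zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyResidualUnit(nn.Module):
    """Hypothetical residual unit: y = relu(f(x) + x), with f a single Linear layer."""

    def __init__(self, dim):
        super().__init__()
        self.f = nn.Linear(dim, dim)

    def forward(self, x):
        # The shortcut adds the unchanged input to the output of the nonlinear branch
        return F.relu(self.f(x) + x)


unit = ToyResidualUnit(4)
# Zero the branch so that f(x) == 0 for every input
with torch.no_grad():
    unit.f.weight.zero_()
    unit.f.bias.zero_()

x = torch.rand(2, 4)  # values in [0, 1), so relu(x) == x
y = unit(x)
print(torch.allclose(y, x))  # True: the unit acts as an identity mapping
```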

## 5.4.1 Model Construction

We first build the residual unit of ResNet18 and then assemble the complete network.

### 5.4.1.1 Residual unit

The nonlinear layers wrapped by a residual unit must produce an output with the same shape as their input. If the number of channels of a convolutional layer's input feature map differs from that of its output feature map, the output cannot be added to the input directly. To solve this, a 1×1 convolution (with the same stride, when the spatial size also changes) can map the input feature map to the same number of channels as the output of the stacked convolutions.
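The shape problem and its fix can be sketched in a few lines. The example assumes a main path that doubles the channels (64 to 128) and halves the spatial size with stride 2, similar to the first unit of a downsampling module. The raw input cannot be added to its output, but a 1×1 convolution with the same stride maps the input to a matching shape.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 8, 8)

# Main path: a 3x3 convolution that doubles the channels and halves the spatial size
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
y = main(x)  # shape [1, 128, 4, 4]

# Adding the raw input fails: shapes [1, 128, 4, 4] and [1, 64, 8, 8] do not broadcast
try:
    _ = y + x
    shapes_match = True
except RuntimeError:
    shapes_match = False

# Shortcut: a 1x1 convolution with the same stride maps the input to [1, 128, 4, 4]
shortcut = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
out = y + shortcut(x)

print(shapes_match, tuple(out.shape))  # False (1, 128, 4, 4)
```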

1×1 convolution: it works exactly like a standard convolution, except that the kernel size is 1×1. It therefore ignores the relationships between neighboring positions of the input and focuses only on the relationships between channels. A 1×1 convolution can be used to:

- Enable cross-channel interaction and information integration. Since the input and output of a convolution are 3-dimensional (width, height, channels), a 1×1 convolution computes, at each pixel, a linear combination of the values across channels, integrating the information of different channels.
- Reduce or increase the number of channels, which can reduce the number of parameters. The output of a 1×1 convolution keeps the spatial structure of the input, and the dimensionality is raised or lowered purely by adjusting the number of channels.
- Add nonlinearity. Applying a nonlinear activation function after a 1×1 convolution increases the nonlinearity of the network while keeping the size of the feature map unchanged.
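A quick way to see that a 1×1 convolution is a per-pixel linear combination of channels is to compare it with an explicit `torch.einsum`. The channel counts below (64 to 16) are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 64, 8, 8)  # [batch, channels, height, width]

# 1x1 convolution that reduces 64 channels to 16; height and width are unchanged
conv1x1 = nn.Conv2d(64, 16, kernel_size=1, bias=False)
y = conv1x1(x)
print(tuple(y.shape))  # (2, 16, 8, 8)

# Same result as an explicit per-pixel linear combination across channels:
# y[n, o, h, w] = sum_c W[o, c] * x[n, c, h, w]
w = conv1x1.weight.squeeze(-1).squeeze(-1)  # [16, 64]
y_manual = torch.einsum('oc,nchw->nohw', w, x)
print(torch.allclose(y, y_manual, atol=1e-5))  # True

# 64 * 16 = 1024 weights, versus 64 * 16 * 3 * 3 = 9216 for a 3x3 kernel
print(sum(p.numel() for p in conv1x1.parameters()))  # 1024
```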

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, use_residual=True):
        """
        Residual unit
        Args:
        - in_channels: number of input channels
        - out_channels: number of output channels
        - stride: stride of the residual unit, applied through the stride of its first convolutional layer
        - use_residual: whether to use the residual (shortcut) connection
        """
        super(ResBlock, self).__init__()
        self.stride = stride
        self.use_residual = use_residual
        # First convolutional layer: 3x3 kernel; the number of output channels and the stride are configurable
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, stride=self.stride, bias=False)
        # Second convolutional layer: 3x3 kernel, stride 1, does not change the shape of its input feature map
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)

        # use_1x1conv = True if the output of conv2 and the input of this residual unit have different shapes;
        # in that case a 1x1 convolution is applied to the input so that its shape matches the output of conv2
        if in_channels != out_channels or stride != 1:
            self.use_1x1conv = True
        else:
            self.use_1x1conv = False
        # When the input and output shapes of the wrapped nonlinear layers differ,
        # a 1x1 convolution adjusts the input before the addition
        if self.use_1x1conv:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=self.stride, bias=False)

        # Each convolutional layer is followed by a batch normalization layer (described in detail in 7.5.1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)
        if self.use_1x1conv:
            self.bn3 = nn.BatchNorm2d(out_channels)

    def forward(self, inputs):
        y = F.relu(self.bn1(self.conv1(inputs)))
        y = self.bn2(self.conv2(y))
        if self.use_residual:
            if self.use_1x1conv:  # apply a 1x1 convolution so the inputs match the shape of y
                shortcut = self.shortcut(inputs)
                shortcut = self.bn3(shortcut)
            else:  # otherwise the inputs are added to y directly
                shortcut = inputs
            # Residual connection: add the shortcut to the output of conv2
            y = y + shortcut
        out = F.relu(y)
        return out

```

### 5.4.1.2 Overall structure of residual network

A residual network is a very deep network built by connecting many residual units in series. The network structure of ResNet18 is shown in the figure below.

For ease of understanding, the ResNet18 network can be divided into 6 modules:

- Module 1: a 7×7 convolutional layer with stride 2 and 64 output channels. Its output passes through batch normalization and a ReLU activation, followed by a 3×3 max pooling layer with stride 2.
- Module 2: two residual units. The output has 64 channels and the size of the feature map is unchanged.
- Module 3: two residual units. The output has 128 channels and the size of the feature map is halved.
- Module 4: two residual units. The output has 256 channels and the size of the feature map is halved.
- Module 5: two residual units. The output has 512 channels and the size of the feature map is halved.
- Module 6: a global average pooling layer that reduces the feature map to 1×1, followed by a fully connected layer that computes the final output.
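The halving of the feature map can be checked with the usual output-size formula for convolution and pooling, `(H + 2p - k) // s + 1`. This sketch assumes a 32×32 input (the size used in the experiments of this section) and padding 1 for the max pooling layer, and traces the feature map size after each step of modules 1 to 5.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution or pooling layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # assumed 32x32 input
trace = [size]
size = conv_out(size, kernel=7, stride=2, padding=3)  # module 1: 7x7 conv, stride 2
trace.append(size)
size = conv_out(size, kernel=3, stride=2, padding=1)  # module 1: 3x3 max pooling, stride 2 (padding assumed)
trace.append(size)
# Modules 2-5: the first 3x3 conv of each module uses stride 1, 2, 2, 2 (padding 1)
for stride in (1, 2, 2, 2):
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    trace.append(size)

print(trace)  # [32, 16, 8, 8, 4, 2, 1]
```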

The ResNet18 model is implemented as follows. First, define module 1:

```python
def make_first_module(in_channels):
    # Module 1: 7x7 convolution, batch normalization, ReLU and 3x3 max pooling
    m1 = nn.Sequential(nn.Conv2d(in_channels, 64, 7, stride=2, padding=3),
                       nn.BatchNorm2d(64), nn.ReLU(),
                       nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    return m1

```

Define a function that builds any of modules 2 to 5:

```python
def resnet_module(input_channels, out_channels, num_res_blocks, stride=1, use_residual=True):
    blk = []
    # Generate num_res_blocks residual units in a loop
    for i in range(num_res_blocks):
        if i == 0:  # Create the first residual unit in the module
            blk.append(ResBlock(input_channels, out_channels,
                                stride=stride, use_residual=use_residual))
        else:  # Create the remaining residual units in the module
            blk.append(ResBlock(out_channels, out_channels, use_residual=use_residual))
    return blk

```

Assemble modules 2 to 5:

```python
def make_modules(use_residual):
    # Module 2: two residual units, 64 input channels, 64 output channels, stride 1; the feature map size is unchanged
    m2 = nn.Sequential(*resnet_module(64, 64, 2, stride=1, use_residual=use_residual))
    # Module 3: two residual units, 64 input channels, 128 output channels, stride 2; the feature map size is halved
    m3 = nn.Sequential(*resnet_module(64, 128, 2, stride=2, use_residual=use_residual))
    # Module 4: two residual units, 128 input channels, 256 output channels, stride 2; the feature map size is halved
    m4 = nn.Sequential(*resnet_module(128, 256, 2, stride=2, use_residual=use_residual))
    # Module 5: two residual units, 256 input channels, 512 output channels, stride 2; the feature map size is halved
    m5 = nn.Sequential(*resnet_module(256, 512, 2, stride=2, use_residual=use_residual))
    return m2, m3, m4, m5

```

Define the complete network:

```python
# Define the complete network
class Model_ResNet18(nn.Module):
    def __init__(self, in_channels=3, num_classes=10, use_residual=True):
        super(Model_ResNet18, self).__init__()
        m1 = make_first_module(in_channels)
        m2, m3, m4, m5 = make_modules(use_residual)
        # Package modules 1 to 6
        self.net = nn.Sequential(m1, m2, m3, m4, m5,
                                 # Module 6: global average pooling and a fully connected layer
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, num_classes))

    def forward(self, x):
        return self.net(x)

```

The `summary` function from the torchsummary package can be used to count the parameters of the model:

```python
import torchsummary

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True).to(device)
# torchsummary expects device to be "cuda" or "cpu"
torchsummary.summary(model, (1, 32, 32), device=device.type)

```
```
----------------------------------------------------------------
Layer (type)               Output Shape         Param #
================================================================
Conv2d-1           [-1, 64, 16, 16]           3,200
BatchNorm2d-2           [-1, 64, 16, 16]             128
ReLU-3           [-1, 64, 16, 16]               0
MaxPool2d-4             [-1, 64, 8, 8]               0
Conv2d-5             [-1, 64, 8, 8]          36,928
BatchNorm2d-6             [-1, 64, 8, 8]             128
Conv2d-7             [-1, 64, 8, 8]          36,928
BatchNorm2d-8             [-1, 64, 8, 8]             128
ResBlock-9             [-1, 64, 8, 8]               0
Conv2d-10             [-1, 64, 8, 8]          36,928
BatchNorm2d-11             [-1, 64, 8, 8]             128
Conv2d-12             [-1, 64, 8, 8]          36,928
BatchNorm2d-13             [-1, 64, 8, 8]             128
ResBlock-14             [-1, 64, 8, 8]               0
Conv2d-15            [-1, 128, 4, 4]          73,856
BatchNorm2d-16            [-1, 128, 4, 4]             256
Conv2d-17            [-1, 128, 4, 4]         147,584
BatchNorm2d-18            [-1, 128, 4, 4]             256
Conv2d-19            [-1, 128, 4, 4]           8,320
BatchNorm2d-20            [-1, 128, 4, 4]             256
ResBlock-21            [-1, 128, 4, 4]               0
Conv2d-22            [-1, 128, 4, 4]         147,584
BatchNorm2d-23            [-1, 128, 4, 4]             256
Conv2d-24            [-1, 128, 4, 4]         147,584
BatchNorm2d-25            [-1, 128, 4, 4]             256
ResBlock-26            [-1, 128, 4, 4]               0
Conv2d-27            [-1, 256, 2, 2]         295,168
BatchNorm2d-28            [-1, 256, 2, 2]             512
Conv2d-29            [-1, 256, 2, 2]         590,080
BatchNorm2d-30            [-1, 256, 2, 2]             512
Conv2d-31            [-1, 256, 2, 2]          33,024
BatchNorm2d-32            [-1, 256, 2, 2]             512
ResBlock-33            [-1, 256, 2, 2]               0
Conv2d-34            [-1, 256, 2, 2]         590,080
BatchNorm2d-35            [-1, 256, 2, 2]             512
Conv2d-36            [-1, 256, 2, 2]         590,080
BatchNorm2d-37            [-1, 256, 2, 2]             512
ResBlock-38            [-1, 256, 2, 2]               0
Conv2d-39            [-1, 512, 1, 1]       1,180,160
BatchNorm2d-40            [-1, 512, 1, 1]           1,024
Conv2d-41            [-1, 512, 1, 1]       2,359,808
BatchNorm2d-42            [-1, 512, 1, 1]           1,024
Conv2d-43            [-1, 512, 1, 1]         131,584
BatchNorm2d-44            [-1, 512, 1, 1]           1,024
ResBlock-45            [-1, 512, 1, 1]               0
Conv2d-46            [-1, 512, 1, 1]       2,359,808
BatchNorm2d-47            [-1, 512, 1, 1]           1,024
Conv2d-48            [-1, 512, 1, 1]       2,359,808
BatchNorm2d-49            [-1, 512, 1, 1]           1,024
ResBlock-50            [-1, 512, 1, 1]               0
AdaptiveAvgPool2d-51            [-1, 512, 1, 1]               0
Flatten-52                  [-1, 512]               0
Linear-53                   [-1, 10]           5,130
================================================================
Total params: 11,180,170
Trainable params: 11,180,170
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.05
Params size (MB): 42.65
Estimated Total Size (MB): 43.71
----------------------------------------------------------------
```
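The per-layer counts in this summary can be reproduced by hand: a `Conv2d` layer has `c_in * c_out * k * k` weights plus `c_out` bias terms, a `BatchNorm2d` layer has one scale and one shift per channel, and a `Linear` layer has `in * out + out` parameters. The logged numbers match convolutions *with* a bias term, so the run that produced the log appears to have used biased convolutions; with `bias=False`, as written in `ResBlock`, each 64-channel 3×3 convolution would have 36,864 parameters instead of 36,928.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Number of learnable parameters in an nn.Conv2d layer (groups=1)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def batchnorm_params(c):
    """BatchNorm2d learns one scale (weight) and one shift (bias) per channel."""
    return 2 * c


print(conv2d_params(1, 64, 7))    # 3200   (Conv2d-1)
print(conv2d_params(64, 64, 3))   # 36928  (Conv2d-5)
print(conv2d_params(64, 128, 1))  # 8320   (Conv2d-19, the 1x1 shortcut)
print(batchnorm_params(512))      # 1024   (BatchNorm2d-40)
print(512 * 10 + 10)              # 5130   (Linear-53)
print(conv2d_params(64, 64, 3, bias=False))  # 36864 without the bias term
```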

The computational cost of the model can be measured with the thop library:

```python
import torch
from thop import profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True).to(device)
dummy_input = torch.randn(1, 1, 32, 32).to(device)

# profile returns the operation count and the number of parameters
flops, params = profile(model, (dummy_input,))
print(flops)

```
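For a single convolution, the multiply-accumulate (MAC) count can be worked out by hand: each output element needs `c_in * k * k` multiply-adds. According to thop's README, the first value returned by `profile` is such a MAC count (often loosely called FLOPs). This sketch ignores bias additions and takes the output sizes from the summary above.

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations of a Conv2d layer (groups=1, bias ignored):
    every output element needs c_in * k * k multiply-adds."""
    return c_in * k * k * c_out * h_out * w_out


# First layer: 1 -> 64 channels, 7x7 kernel, 16x16 output
print(conv2d_macs(1, 64, 7, 16, 16))  # 802816
# A 3x3 convolution in module 2: 64 -> 64 channels, 8x8 output
print(conv2d_macs(64, 64, 3, 8, 8))   # 2359296
```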

## 5.4.2 ResNet18 without residual connections (Plain Networks)

### 5.4.2.1 Model training

```python
import gzip
import json
import random

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as opt
import torch.utils.data as data
from PIL import Image
from torchvision.transforms import Compose, Resize, Normalize, ToTensor

# Load the dataset and keep a subset: 2000 training, 200 validation and 200 test samples
train_set, dev_set, test_set = json.load(gzip.open('mnist (1).json.gz'))
train_images, train_labels = train_set[0][:2000], train_set[1][:2000]
dev_images, dev_labels = dev_set[0][:200], dev_set[1][:200]
test_images, test_labels = test_set[0][:200], test_set[1][:200]
train_set, dev_set, test_set = [train_images, train_labels], [dev_images, dev_labels], [test_images, test_labels]
print('Length of train/dev/test set:{}/{}/{}'.format(len(train_set[0]), len(dev_set[0]), len(test_set[0])))

image, label = train_set[0][0], train_set[1][0]
image, label = np.array(image).astype('float32'), int(label)
# The original image data is a row vector of length 784, which needs to be adjusted to an image of size [28,28]
image = np.reshape(image, [28, 28])
image = Image.fromarray(image.astype('uint8'), mode='L')
print("The number in the picture is {}".format(label))
plt.figure(figsize=(5, 5))
plt.imshow(image)
plt.savefig('conv-number5.pdf')
# data preprocessing
transforms = Compose([Resize(32), ToTensor(), Normalize(mean=[1], std=[1])])

class RunnerV3(object):
    def __init__(self, model, optimizer, loss_fn, metric, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric  # Only used to calculate the evaluation metric

        # Record how the evaluation metric changes during training
        self.dev_scores = []

        # Record how the loss changes during training
        self.train_epoch_losses = []  # one loss per epoch
        self.train_step_losses = []  # one loss per step
        self.dev_losses = []

        # Record the best score seen so far
        self.best_score = 0

    def train(self, train_loader, dev_loader=None, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency in steps; defaults to 100
        log_steps = kwargs.get("log_steps", 100)
        # Evaluation frequency in steps
        eval_steps = kwargs.get("eval_steps", 0)

        # Path for saving the best model; defaults to "best_model.pdparams"
        save_path = kwargs.get("save_path", "best_model.pdparams")

        custom_print_log = kwargs.get("custom_print_log", None)

        # Total number of training steps
        num_training_steps = num_epochs * len(train_loader)

        if eval_steps:
            if self.metric is None:
                raise RuntimeError('Error: Metric can not be None!')
            if dev_loader is None:
                raise RuntimeError('Error: dev_loader can not be None!')

        # Number of steps run so far
        global_step = 0

        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            # Accumulated training loss of this epoch
            total_loss = 0
            for step, batch in enumerate(train_loader):
                X, y = batch
                # Get model predictions
                logits = self.model(X)
                loss = self.loss_fn(logits, y)  # mean reduction by default
                total_loss += loss.item()

                # Save the loss of every step
                self.train_step_losses.append((global_step, loss.item()))

                if log_steps and global_step % log_steps == 0:
                    print(
                        f"[Train] epoch: {epoch}/{num_epochs}, step: {global_step}/{num_training_steps}, loss: {loss.item():.5f}")

                loss.backward()

                if custom_print_log:
                    custom_print_log(self)

                # Mini-batch gradient descent: update the parameters, then clear the gradients
                self.optimizer.step()
                self.optimizer.zero_grad()

                # Decide whether to evaluate
                if eval_steps > 0 and global_step > 0 and \
                        (global_step % eval_steps == 0 or global_step == (num_training_steps - 1)):

                    dev_score, dev_loss = self.evaluate(dev_loader, global_step=global_step)
                    print(f"[Evaluate]  dev score: {dev_score:.5f}, dev loss: {dev_loss:.5f}")

                    # Switch the model back to training mode
                    self.model.train()

                    # If the current score is the best so far, save the model
                    if dev_score > self.best_score:
                        self.save_model(save_path)
                        print(
                            f"[Evaluate] best accuracy performance has been updated: {self.best_score:.5f} --> {dev_score:.5f}")
                        self.best_score = dev_score

                global_step += 1

            # Average training loss of the current epoch, saved at epoch granularity
            trn_loss = total_loss / len(train_loader)
            self.train_epoch_losses.append(trn_loss)

        print("[Train] Training done!")

    @torch.no_grad()
    def evaluate(self, dev_loader, **kwargs):
        assert self.metric is not None

        # Set the model to evaluation mode
        self.model.eval()

        global_step = kwargs.get("global_step", -1)

        # Accumulated loss over the validation set
        total_loss = 0

        # Reset the metric
        self.metric.reset()

        # Iterate over each batch of the validation set
        for batch_id, batch in enumerate(dev_loader):
            X, y = batch

            # Compute the model output
            logits = self.model(X)

            # Compute and accumulate the loss
            loss = self.loss_fn(logits, y).item()
            total_loss += loss

            # Accumulate the metric
            self.metric.update(logits, y)

        dev_loss = total_loss / len(dev_loader)
        dev_score = self.metric.accumulate()

        # Record the validation loss and score
        if global_step != -1:
            self.dev_losses.append((global_step, dev_loss))
            self.dev_scores.append(dev_score)

        return dev_score, dev_loss

    @torch.no_grad()
    def predict(self, x, **kwargs):
        # Set the model to evaluation mode
        self.model.eval()
        # Run the forward pass to get the predictions
        logits = self.model(x)
        return logits

    def save_model(self, save_path):
        torch.save(self.model.state_dict(), save_path)

    def load_model(self, model_path):
        self.model.load_state_dict(torch.load(model_path))

class MNIST_dataset(data.Dataset):
    def __init__(self, dataset, transforms, mode='train'):
        self.mode = mode
        self.transforms = transforms
        self.dataset = dataset

    def __getitem__(self, idx):
        # Get images and labels
        image, label = self.dataset[0][idx], self.dataset[1][idx]
        image, label = np.array(image).astype('float32'), int(label)
        image = np.reshape(image, [28, 28])
        image = Image.fromarray(image.astype('uint8'), mode='L')
        image = self.transforms(image)

        return image, label

    def __len__(self):
        return len(self.dataset[0])

class Accuracy():
    def __init__(self, is_logist=True):
        # Number of correctly predicted samples
        self.num_correct = 0
        # Total number of samples
        self.num_count = 0

        self.is_logist = is_logist

    def update(self, outputs, labels):
        # shape[1] == 1 means a binary classification task, shape[1] > 1 a multi-class task
        if outputs.shape[1] == 1:  # binary classification
            outputs = torch.squeeze(outputs, dim=-1)
            if self.is_logist:
                # Logits: predict class 1 when the logit is >= 0
                preds = (outputs >= 0).float()
            else:
                # Probabilities: predict class 1 when the probability is >= 0.5, otherwise class 0
                preds = (outputs >= 0.5).float()
        else:
            # Multi-class: use torch.argmax to take the index of the largest element as the class
            preds = torch.argmax(outputs, dim=1)

        # Number of correctly predicted samples in this batch
        labels = labels.reshape(-1)
        batch_correct = (preds == labels).float().sum().item()
        batch_count = len(labels)

        # Update num_correct and num_count
        self.num_correct += batch_correct
        self.num_count += batch_count

    def accumulate(self):
        # Compute the overall metric from the accumulated counts
        if self.num_count == 0:
            return 0
        return self.num_correct / self.num_count

    def reset(self):
        # Reset the counters
        self.num_correct = 0
        self.num_count = 0

    def name(self):
        return "Accuracy"

# Visualization
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    train_items = runner.train_step_losses[::30]
    train_steps = [x[0] for x in train_items]
    train_losses = [x[1] for x in train_items]

    plt.plot(train_steps, train_losses, color='#8E004D', label="Train loss")
    if runner.dev_losses[0][0] != -1:
        dev_steps = [x[0] for x in runner.dev_losses]
        dev_losses = [x[1] for x in runner.dev_losses]
        plt.plot(dev_steps, dev_losses, color='#E20079', linestyle='--', label="Dev loss")
    # Plot the axes and legend
    plt.ylabel("loss", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    # Draw the validation accuracy curve
    if runner.dev_losses[0][0] != -1:
        plt.plot(dev_steps, runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    else:
        plt.plot(list(range(len(runner.dev_scores))), runner.dev_scores,
                 color='#E20079', linestyle="--", label="Dev accuracy")
    # Plot the axes and legend
    plt.ylabel("score", fontsize='x-large')
    plt.xlabel("step", fontsize='x-large')
    plt.legend(loc='lower right', fontsize='x-large')

    plt.savefig(fig_name)
    plt.show()

# Fix the random seeds
random.seed(0)
train_dataset = MNIST_dataset(dataset=train_set, transforms=transforms, mode='train')
test_dataset = MNIST_dataset(dataset=test_set, transforms=transforms, mode='test')
dev_dataset = MNIST_dataset(dataset=dev_set, transforms=transforms, mode='dev')
torch.manual_seed(100)
# Learning rate
lr = 0.005
# Batch size
batch_size = 64
# Wrap the datasets in data loaders
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network: a deep network without residual connections
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=False)
# Define the optimizer
optimizer = opt.SGD(lr=lr, params=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
num_epochs = 5  # number of training epochs (assumed value)
runner.train(train_loader, dev_loader, num_epochs=num_epochs, log_steps=log_steps,
             eval_steps=eval_steps, save_path="best_model.pdparams")
# Visualize how the training and validation losses change
plot(runner, 'cnn-loss2.pdf')
```

### 5.4.2.2 Model Evaluation

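```python
# Load the best model saved during training
runner.load_model('best_model.pdparams')
# Evaluate the model on the test set
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))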

```

## 5.4.3 ResNet18 with residual connections

### 5.4.3.1 Model training

```python
# Fix the random seed
random.seed(0)
train_dataset = MNIST_dataset(dataset=train_set, transforms=transforms, mode='train')
test_dataset = MNIST_dataset(dataset=test_set, transforms=transforms, mode='test')
dev_dataset = MNIST_dataset(dataset=dev_set, transforms=transforms, mode='dev')
# Learning rate
lr = 0.01
# Batch size
batch_size = 128
# Wrap the datasets in data loaders
train_loader = data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = data.DataLoader(dev_dataset, batch_size=batch_size)
test_loader = data.DataLoader(test_dataset, batch_size=batch_size)
# Define the network: use_residual=True gives a deep network with residual connections
model = Model_ResNet18(in_channels=1, num_classes=10, use_residual=True)
# Define the optimizer
optimizer = opt.SGD(lr=lr, params=model.parameters())
# Define the loss function
loss_fn = F.cross_entropy
# Define the evaluation metric
metric = Accuracy(is_logist=True)
# Instantiate RunnerV3
runner = RunnerV3(model, optimizer, loss_fn, metric)
# Start training
log_steps = 15
eval_steps = 15
num_epochs = 5  # number of training epochs (assumed value)
runner.train(train_loader, dev_loader, num_epochs=num_epochs, log_steps=log_steps,
             eval_steps=eval_steps, save_path="best_model.pdparams")
# Visualize how the training and validation losses change
plot(runner, 'cnn-loss3.pdf')

```

### 5.4.3.2 Model Evaluation

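```python
# Load the best model saved during training
runner.load_model('best_model.pdparams')
# Evaluate the model on the test set
score, loss = runner.evaluate(test_loader)
print("[Test] accuracy/loss: {:.4f}/{:.4f}".format(score, loss))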

```

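Comparing the test results of the two runs, adding residual connections improves the performance of ResNet18 to a certain extent. Note that the two runs also use different learning rates and batch sizes (0.005/64 versus 0.01/128), so part of the difference may come from these settings.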

## 5.4.4 Comparison experiment with the high-level API implementation version

For classic image classification networks such as ResNet18, torchvision provides ready-made implementations, so there is no need to implement them from scratch. Here, the same weights are assigned to the high-level API version of resnet18 and to the custom ResNet18 model, and both models receive the same input data to check whether their outputs are consistent.

```python
import numpy as np
import torch
from torchvision.models import resnet18

hapi_model = resnet18(pretrained=True)
# Custom resnet18 model; num_classes=1000 so the fc layer matches the ImageNet-pretrained weights
model = Model_ResNet18(in_channels=3, num_classes=1000, use_residual=True)

# Get the weights of the torchvision network
params = hapi_model.state_dict()
# Network weights after renaming the parameters
new_params = {}
# Map parameter names
for key in params:
    if 'layer' in key:
        if 'downsample.0' in key:
            new_params['net.' + key[5:8] + '.shortcut' + key[-7:]] = params[key]
        elif 'downsample.1' in key:
            new_params['net.' + key[5:8] + '.bn3' + key[21:]] = params[key]
        else:
            new_params['net.' + key[5:]] = params[key]
    elif 'conv1.weight' == key:
        new_params['net.0.0.weight'] = params[key]
    elif 'bn1' in key:
        new_params['net.0.1' + key[3:]] = params[key]
    elif 'fc' in key:
        new_params['net.7' + key[2:]] = params[key]
# torchvision's first convolution has no bias; a zero bias makes the custom layer equivalent
new_params['net.0.0.bias'] = torch.zeros(64)
# Load the renamed weights into the custom model
model.load_state_dict(new_params)

# Evaluation mode: batch normalization uses the stored running statistics
hapi_model.eval()
model.eval()

# Use np.random to create a random array as test data
inputs = np.random.randn(*[3, 3, 32, 32])
inputs = inputs.astype('float32')
x = torch.tensor(inputs)

with torch.no_grad():
    output = model(x)
    hapi_out = hapi_model(x)

# Calculate the difference between the outputs of the two models
diff = output - hapi_out
# Take the largest absolute difference
max_diff = torch.max(torch.abs(diff))
print(max_diff)

```
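The string slicing in the mapping loop is easier to follow on concrete keys. This sketch wraps the same rules in a hypothetical helper `map_key` (it names the shortcut batch normalization `bn3`, matching `ResBlock`) and applies it to a few torchvision parameter names.

```python
def map_key(key):
    """Hypothetical helper: map a torchvision resnet18 state_dict key to the
    corresponding Model_ResNet18 name; returns None for unmapped keys."""
    if 'layer' in key:
        stage = key[5:8]  # 'layer2.0.conv1.weight' -> '2.0'
        if 'downsample.0' in key:  # 1x1 shortcut convolution
            return 'net.' + stage + '.shortcut' + key[-7:]
        if 'downsample.1' in key:  # batch norm after the shortcut; key[21:] starts at the '.' after 'downsample.1'
            return 'net.' + stage + '.bn3' + key[21:]
        return 'net.' + key[5:]    # 'layerN' -> 'net.N' (net.0 is module 1)
    if key == 'conv1.weight':
        return 'net.0.0.weight'
    if key.startswith('bn1'):
        return 'net.0.1' + key[3:]
    if key.startswith('fc'):
        return 'net.7' + key[2:]   # net.7 is the final Linear layer
    return None


print(map_key('layer1.0.conv1.weight'))               # net.1.0.conv1.weight
print(map_key('layer2.0.downsample.0.weight'))        # net.2.0.shortcut.weight
print(map_key('layer2.0.downsample.1.running_mean'))  # net.2.0.bn3.running_mean
print(map_key('bn1.running_var'))                     # net.0.1.running_var
print(map_key('fc.bias'))                             # net.7.bias
```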

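The custom ResNet18 and the ResNet18 provided by torchvision compute the same function: with the same weights and inputs, the maximum absolute difference between their outputs is 0 (or at most on the order of floating-point rounding error).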

# Summary

After studying the classic residual network ResNet, I used it to recognize MNIST handwritten digits and compared ResNet18 with and without residual connections. With residual connections, the model performs better. I also experienced the convenience of the high-level API.

Posted by agsparta on Wed, 09 Nov 2022 21:02:41 +0300