(Beginner's introduction) Using the PyTorch framework to implement a feedforward neural network

Feedforward neural network

Common feedforward neural networks include perceptrons and BP (back propagation) networks. The feedforward neural network (FNN) is the earliest and simplest type of artificial neural network devised in the field of artificial intelligence. Its neurons are arranged in layers. Each neuron is connected only to the neurons of the previous layer: it receives the output of the previous layer and passes its own output to the next layer. There is no feedback between layers. Signals propagate in one direction, from the input layer through the hidden layers to the output layer. Unlike a recurrent neural network, it contains no directed cycles. The following figure is a schematic diagram of a simple feedforward neural network:

There is no feedback anywhere in the network, and the signal propagates in one direction from the input layer to the output layer, so the network can be represented by a directed acyclic graph.


A perceptron is in fact a single neuron in the structure of a neural network, so one perceptron constitutes the simplest neural network.
A multilayer perceptron (MLP) is an artificial neural network with a feedforward structure. It can be regarded as a directed graph composed of several layers of nodes, with each layer fully connected to the next. Apart from the input nodes, each node is a neuron (or processing unit) with a nonlinear activation function.
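
To make the layer-by-layer flow concrete, here is a minimal sketch of a forward pass in plain Python. The weights, biases, and the helper names relu, linear, and forward are made up for illustration; they are not part of the PyTorch code later in this post.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def linear(x, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Pass x through the layers in order, with ReLU after every layer except the last."""
    for idx, (weights, biases) in enumerate(layers):
        x = linear(x, weights, biases)
        if idx < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                     # output layer
]
y = forward([3.0, 1.0], layers)
print(y)  # [5.5]
```

Data only ever moves from one layer to the next, which is exactly the "no feedback" property described above.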

Implementing a feedforward neural network

Previous blog posts described how to set up a PyTorch GPU environment on Windows. Here we use PyTorch to implement our first feedforward neural network.
Source code:
I have added detailed comments to the source code for reference.

import torch
import torch.nn as nn
import torchvision.datasets as dsets  # torchvision is PyTorch's computer vision library; its datasets module loads data sets
import torchvision.transforms as transforms

The torchvision.datasets package contains MNIST, FakeData, COCO, LSUN, ImageFolder, DatasetFolder, ImageNet, CIFAR and other commonly used data sets, each of which can be loaded with a few simple arguments. Their constructors also show which parameters and methods a data set class typically needs, which is a useful guide when writing your own data set later.
The interfaces of these data sets are largely similar. They share at least two parameters, transform and target_transform, which transform the input and the target respectively.
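from torch.autograd import Variable
# torch.autograd provides classes and functions for automatic differentiation of arbitrary scalar-valued functions.
# Since PyTorch 0.4, Variable is deprecated: tensors support autograd directly and Variable(x) simply returns a tensor.
import torch.utils.data as Data
# torch.utils.data.DataLoader is used to load the data
import matplotlib.pyplot as plt
# Library required for plotting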

# Hyperparameters: algorithm settings chosen from experience (rather than learned) that affect how the weights and biases are trained, such as the number of epochs, the number of hidden layers, the number of neurons per layer, the learning rate, etc.
input_size = 784
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# MNIST Dataset
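# The argument values after root= are assumed typical settings.
train_dataset = dsets.MNIST(root='./data',  # directory where the data set is stored
                            train=True,  # use the training split
                            transform=transforms.ToTensor(),
                            download=True)  # download the data if it is not already in root
# transforms.ToTensor() converts a PIL Image or a numpy ndarray into a Tensor of shape (C, H, W) and, for 8-bit images, divides by 255 so the values lie in [0, 1.0]

test_dataset = dsets.MNIST(root='./data',
                           train=False,  # use the test split
                           transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
# dataset: the data set to load from
# batch_size: number of samples loaded per batch
# shuffle: whether to reshuffle the data at every epoch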

# Neural Network Model (1 hidden layer)
class Net(nn.Module):
    # Define the network structure
    def __init__(self, input_size, hidden_size, num_classes):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # input -> hidden, linear (fully connected) layer
        self.relu = nn.ReLU()  # ReLU activation applied to the hidden layer
        self.fc2 = nn.Linear(hidden_size, num_classes)  # hidden -> output, linear (fully connected) layer
    # forward defines how data flows through the network
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
net = Net(input_size, hidden_size, num_classes)

# Loss and Optimizer
criterion = nn.CrossEntropyLoss()  # use cross-entropy loss (LogSoftmax + NLLLoss)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)
# use the Adam optimizer, torch.optim.Adam
# Train the Model
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Flatten each 28 * 28 image into a vector of 784 values
        images = images.view(-1, 28*28)
        # Since PyTorch 0.4, tensors support autograd directly, so wrapping them in Variable is no longer needed
        # Forward + Backward + Optimize
        optimizer.zero_grad()  # zero the gradient buffer
        outputs = net(images)
        loss = criterion(outputs, labels)
        loss.backward()  # backward pass: compute the gradients
        optimizer.step()  # update the weights
        # Print the loss every 100 steps
        if (i+1) % 100 == 0:
            print('Epoch [%d/%d], Step [%d/%d], Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))
# Test the Model
correct = 0
total = 0
for images, labels in test_loader:
    images = images.view(-1, 28*28)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)  # index of the highest score = predicted class
    total += labels.size(0)  # count all labels
    correct += (predicted == labels).sum().item()  # count correctly predicted labels

print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))

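# Show a few training images with their labels
for i in range(1, 4):
    plt.imshow(train_dataset.data[i].numpy(), cmap='gray')
    plt.title('%i' % train_dataset.targets[i].item())
    plt.show()

# Save the Model
torch.save(net.state_dict(), 'model.pkl')
# net.state_dict() holds the model parameters; 'model.pkl' is the file they are written to

# Compare predictions with the true labels for the first 20 images of the last test batch
test_output = net(images[:20])
pred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()
print('prediction number', pred_y)
print('real number', labels[:20].numpy())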

Cross-entropy loss

class torch.nn.CrossEntropyLoss(weight=None, size_average=True)
This criterion combines LogSoftmax and NLLLoss in a single class.

It is very useful when training a multi-class classifier.

weight (tensor): a 1-D tensor with n elements giving the weight of each of the n classes. It is very useful when the training samples are highly unbalanced. The default value is None.
Parameters at call time:

input: a 2-D tensor of shape batch*n containing the score of each class

target: a 1-D tensor of size batch containing the class index (0 to n-1) of each sample.

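For a single sample with scores x and true class index class, the loss can be written as:

$$\text{loss}(x, \text{class}) = -\log\left(\frac{\exp(x[\text{class}])}{\sum_j \exp(x[j])}\right) = -x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)$$

When the weight parameter is specified, the loss becomes:

$$\text{loss}(x, \text{class}) = \text{weight}[\text{class}] \left(-x[\text{class}] + \log\left(\sum_j \exp(x[j])\right)\right)$$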
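
The statement that CrossEntropyLoss combines LogSoftmax and NLLLoss can be checked numerically. The sketch below uses made-up random scores and targets (the names scores, target, picked, and manual are illustrative) and compares three ways of computing the unweighted loss, averaged over the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                  # make the random scores reproducible
scores = torch.randn(4, 3)            # batch of 4 samples, 3 classes (raw scores)
target = torch.tensor([0, 2, 1, 2])   # one class index per sample

# Built-in criterion applied directly to the raw scores.
ce = nn.CrossEntropyLoss()(scores, target)

# The same value from LogSoftmax followed by NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)

# The same value from -x[class] + log(sum_j exp(x[j])), averaged over the batch.
picked = scores[torch.arange(4), target]
manual = (-picked + torch.logsumexp(scores, dim=1)).mean()

same = torch.allclose(ce, nll) and torch.allclose(ce, manual)
print(ce.item(), nll.item(), manual.item(), same)
```

Because the criterion applies LogSoftmax itself, the network should output raw scores and not softmax probabilities.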


torch.optim is a library that implements various optimization algorithms. Most commonly used methods are supported, and the interface is general enough that more sophisticated methods can be integrated in the future.
To use torch.optim, you construct an optimizer object. This object holds the current state and updates the parameters based on the computed gradients.
To construct an optimizer, you give it an iterable containing the parameters to optimize (Variable objects in old PyTorch versions, tensors since PyTorch 0.4). You can then set optimizer options such as the learning rate, weight decay, and so on.
For example:

optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr = 0.0001)
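
As a rough sketch of how such an optimizer object is used, the snippet below runs a single update step. The tiny nn.Linear model, the random data, and the MSE loss are placeholders chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)                       # hypothetical tiny model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(8, 3)                         # made-up inputs
y = torch.randn(8, 1)                         # made-up targets
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()                         # clear gradients left from earlier steps
loss = nn.MSELoss()(model(x), y)              # forward pass and loss
loss.backward()                               # compute gradients
optimizer.step()                              # update the parameters in place

changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(changed)
```

The three calls zero_grad(), backward(), and step() form the usual pattern of one training iteration.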

For Adam:

class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)


  1. params (iterable) – iterable of parameters to optimize, or dicts defining parameter groups
  2. lr (float, optional) – learning rate (default: 1e-3)
  3. betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: (0.9, 0.999))
  4. eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
  5. weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
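
To show where lr, betas, and eps enter the update, here is a simplified scalar version of the Adam rule in plain Python. It is an illustrative sketch, not PyTorch's implementation: the function name adam_step is hypothetical, weight_decay is left out, and the objective f(w) = (w - 3)^2 is made up.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a single scalar parameter (t counts steps from 1)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from w = 0: the step size is close to lr whatever the gradient scale.
w1, _, _ = adam_step(0.0, 2 * (0.0 - 3), 0.0, 0.0, 1, lr=0.01)

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w1, w)
```

Dividing by the square root of the running squared gradient is what makes the step size roughly independent of the gradient's scale, and eps only guards against division by zero.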

Explanations taken from the PyTorch documentation are attached below.


output = torch.max(input, dim)

1. Input

input is a tensor; in the code above it holds the class scores output by the network
dim is the dimension (0 or 1) along which the maximum is taken. For a 2-D tensor, 0 gives the maximum of each column and 1 gives the maximum of each row

2. Output

The function returns two tensors. The first holds the maximum value of each row. The second holds the index of the maximum value in each row, which is the predicted class. The network above outputs raw scores rather than softmax probabilities, so the maximum values are not bounded by 1, and the code uses only the indices.
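
A small example of the two return values, using a made-up 2 x 3 tensor of raw scores rather than real network output:

```python
import torch

scores = torch.tensor([[0.1, 2.5, -1.0],
                       [3.0, 0.2, 0.7]])    # 2 samples, 3 class scores each

values, indices = torch.max(scores, 1)      # maximum along each row
print(values)    # largest score in each row
print(indices)   # column (class) index of that score

col_values, col_indices = torch.max(scores, 0)  # maximum along each column
print(col_values, col_indices)
```

With dim=1 the indices tensor has one entry per sample, which is why the test loop compares it directly with labels.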


In PyTorch, state_dict is a simple Python dictionary object that maps each layer to its corresponding parameters (such as the weights and biases of each layer of the model).

(Note that only layers with learnable parameters, such as convolutional and linear layers, and registered buffers have entries in the model's state_dict.)
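
The sketch below inspects a state_dict and restores it into a fresh model. The nn.Sequential model is a hypothetical stand-in, and an in-memory buffer is used in place of a file such as 'model.pkl' so that nothing is written to disk.

```python
import io

import torch
import torch.nn as nn


def make_model():
    """Hypothetical small network: Linear -> ReLU -> Linear."""
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


model = make_model()
keys = list(model.state_dict().keys())
print(keys)  # ReLU has no parameters, so only the two Linear layers appear

buffer = io.BytesIO()                       # in-memory stand-in for a file
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = make_model()                     # same architecture, different random weights
restored.load_state_dict(torch.load(buffer))

match = all(torch.equal(a, b)
            for a, b in zip(model.state_dict().values(),
                            restored.state_dict().values()))
print(match)
```

Saving only the state_dict means the model class must be available, and instantiated with the same architecture, before the parameters are loaded.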

squeeze function

import numpy as np

x = np.array([[[0], [1], [2]]])

print(x.shape)  # (1, 3, 1)

x1 = np.squeeze(x)  # Remove the axes of length 1 from the array's shape

print(x1)  # [0 1 2]
print(x1.shape)  # (3,)
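
PyTorch tensors offer the same operation. A short sketch with a made-up tensor, showing that a dimension argument restricts the squeeze to that axis:

```python
import torch

t = torch.zeros(1, 3, 1)

all_squeezed = t.squeeze()      # drop every axis of length 1
first_only = t.squeeze(0)       # drop only axis 0
unchanged = t.squeeze(1)        # axis 1 has length 3, so nothing is removed

print(all_squeezed.shape, first_only.shape, unchanged.shape)
```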

Posted by silvermice on Fri, 20 May 2022 11:22:22 +0300