Using Python to implement a simple artificial neural network for handwritten digit recognition

This article builds a simple neural network model in Python and uses it to recognize handwritten digits.

1. Preliminary work

1.1 Environment configuration

This article uses the handwritten digit dataset built into the scikit-learn library. scikit-learn is a classic machine learning library; it and its dependencies need to be installed before use.
The main dependencies are numpy, scipy, matplotlib, Jupyter, pandas and seaborn.

For example: pip install numpy

One thing to note: when downloading third-party libraries from the original source inside China, the download speed can be very slow and downloads may even fail. It is therefore common to switch to a domestic mirror. There are two ways to do this: temporary and permanent.

Four common domestic mirrors:
Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
Douban: http://pypi.douban.com/simple/
USTC: https://pypi.mirrors.ustc.edu.cn/simple/
Tsinghua: https://pypi.tuna.tsinghua.edu.cn/simple/

Temporary method (using the Tsinghua mirror as an example):

pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple/

Permanent method

I use a Mac here, so the environment shown in this article is mainly based on macOS.
1. Open the terminal and run cd ~
2. Check whether a .pip folder already exists: ls -a
3. If it does not exist, create it with mkdir .pip, then create the pip.conf configuration file inside it: touch .pip/pip.conf

To permanently switch to the Alibaba Cloud mirror, write the following into pip.conf:

[global]
    index-url=http://mirrors.aliyun.com/pypi/simple/
[install]
    trusted-host=mirrors.aliyun.com
Permanently switching to a domestic mirror under Windows

Taking the Alibaba Cloud mirror as an example, go to the C:\Users\<your_username>\AppData\Roaming directory, create a pip folder in it, and inside that folder create a new file named pip.ini.
Open pip.ini and enter the following:

[global]
    index-url=http://mirrors.aliyun.com/pypi/simple/
[install]
    trusted-host=mirrors.aliyun.com
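
After saving the file, the configuration that pip will actually use can be checked (with a reasonably recent pip version) by running:

pip config list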

1.2 Preparing the dataset

When learning machine learning, a common workflow is to use Jupyter, so this article uses it as well. The tool is straightforward to use and will not be introduced here.

# Open jupyter 
jupyter notebook

Let's take a look at the dataset we will use. It contains 1797 handwritten digit images (digits 0 to 9), each converted into a matrix of numbers, and the target values are the digits 0-9.

# Import dataset
from sklearn import datasets

digits = datasets.load_digits()
digits


The loaded digits dataset contains three attributes:

Attribute   Description
images      8x8 matrices recording the grayscale value of each pixel of every handwritten digit image
data        each 8x8 matrix from images flattened into a row vector
target      the digit represented by each of the 1797 images
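
As a quick check, the shapes of these three attributes can be printed; for the digits dataset they should be (1797, 8, 8), (1797, 64) and (1797,) respectively.

# Inspect the shapes of the three attributes
digits.images.shape, digits.data.shape, digits.target.shape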

Based on the grayscale value matrix, use Matplotlib to display the grayscale image and the label of a character. In Jupyter you need to add %matplotlib inline; in PyCharm you do not.

# Based on the grayscale value matrix, use Matplotlib to display the image and label of a character
from matplotlib import pyplot as plt
%matplotlib inline

image1 = digits.images[0]
print("Label as:", digits.target[0])
plt.imshow(image1, cmap=plt.cm.gray_r)


As can be seen from the figure, the images we need to recognize are 8 x 8 grayscale images, and each label corresponds to the content of its image.

2. Artificial neural network

2.1 The fully connected layer of a neural network


Each connection between neurons carries a weight w. When the neural network runs, the outputs of the previous layer's neurons are multiplied by the weights, a bias is added, and the result is passed to the next layer of neurons. That is:

w11*a1 + w12*a2 + w13*a3 + bias1 = b1
w21*a1 + w22*a2 + w23*a3 + bias2 = b2
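
In matrix form this is simply b = W·a + bias. A small numeric illustration with arbitrarily chosen values (not taken from the dataset):

import numpy as np

a = np.array([[1.0], [2.0], [3.0]])      # outputs a1, a2, a3 of the previous layer
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])          # weights w11 ... w23
bias = np.array([[0.01], [0.02]])        # bias1, bias2
b = np.dot(W, a) + bias                  # b1, b2 received by the next layer
print(b)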

In essence, a neural network starts from a random set of w and bias values, evaluates how good the model is under those values, and then updates w and bias with some algorithm. This cycle repeats until good values for the w matrix and the bias vector are found. The process of obtaining these parameters is the training (learning) process of the model.

Forward propagation

We call the left-to-right flow of data through the network layers forward propagation.

import numpy as np


class FullyConnect:
    # len_x is the feature length of the input data (i.e. the number of neurons feeding this layer)
    # len_y is the number of outputs (i.e. the number of neurons in the next layer)
    def __init__(self, len_x, len_y):
        # The w matrix between a layer of m neurons and a layer of n neurons has shape (n, m)
        self.weights = np.random.randn(len_y, len_x) / np.sqrt(len_x)
        self.bias = np.random.randn(len_y, 1)  # Initialize with random numbers; the number of biases equals the number of outputs
        self.lr = 0  # Initialize the learning rate to 0 for now; it is set uniformly later

    # Forward propagation of the fully connected layer; the input is the training data
    def forward(self, x):
        self.x = x  # Save the intermediate result for back propagation
        # Compute the output of the fully connected layer, i.e. the code form of the matrix equations above
        self.y = np.array([np.dot(self.weights, xx) + self.bias for xx in x])
        return self.y  # Pass this layer's result forward

Input and output

For a neural network, each sample occupies one row, so the 8 x 8 image needs to be flattened into a row vector of length 64 before it is fed into the network. The data attribute of the digits dataset has already done this for us.

# Row vectors of the first two images
digits.data[0:2]
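
To confirm that data really is the flattened version of images, we can compare the first image with the first row vector (this check is not in the original code):

# The first image flattened should equal the first row of data
np.array_equal(digits.images[0].reshape(-1), digits.data[0])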


Next, we feed the first two row vectors into the fully connected layer and print its prediction output.

fully_connect = FullyConnect(64, 1)  # Lengths of the input layer and the output layer
full_result = fully_connect.forward(digits.data[0:2])
full_result  # Only two samples are passed in here for testing; we get the output for two images after one forward pass


The above result is the prediction result after a forward propagation calculation.

2.2 Activation function


In practice there are many activation functions to choose from; you can even define your own. Here we use the most classic one: the Sigmoid activation function. The fully connected output z is fed into the activation function, which produces the final output of the neuron.

class Sigmoid:
    def __init__(self):  # No parameters, no initialization required
        pass
    # Compute the sigmoid of the input x
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    # Forward propagation: feed the fully connected output z into the sigmoid and return the result h
    def forward(self, x):
        self.x = x
        self.y = self.sigmoid(x)
        return self.y

We can use matplotlib to plot the activation function.
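
A minimal sketch of such a plot, reusing the Sigmoid class defined above and the matplotlib import from earlier:

# Plot the sigmoid curve over a range of inputs
z = np.linspace(-10, 10, 200)
plt.plot(z, Sigmoid().forward(z))
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.title("Sigmoid activation function")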

2.3 Loss function

In fact, many neural networks simply pass the data through fully connected layers and activation layers to obtain predictions. The question then becomes: once we have the predictions, how do we tell whether the current model is good or bad, and how does the network know it still needs training? For that, we introduce the loss function.

The loss function measures the difference between the label predicted by the model and the true label; the function that defines this difference is called the loss function. Training in deep learning is essentially the process of minimizing the loss function. For example, we can compute the absolute error between the true value and the predicted value: when this value is large, the network's output deviates strongly from the expected output; when it is very small or even zero, the model works well and predicts the output correctly.

In fact, there are many kinds of loss functions for us to choose from. Here we use one of the most classic loss functions: Quadratic Loss Function.

One-hot encoding

Labels come in many forms. They may be weather labels such as cloudy, sunny and rainy, or letters such as a, b, c. How do we convert these labels into something a computer can work with? One option is to map each label to an integer. However, representing discrete labels with integers has a drawback. Suppose 0 means sunny, 1 means rainy and 2 means cloudy. The loss computed for confusing sunny with rainy would then differ from the loss for confusing sunny with cloudy, even though both predictions are equally wrong, and there is no reason for their losses to differ. This is why one-hot encoding was proposed.

One-hot code: each position of the code is either 0 or 1, and each position represents one label. If a position is 1, all other positions must be 0. For example:

When position 0 is 1 and all other positions are 0, the label is sunny; when position 1 is 1 and the others are 0, the label is rainy, and so on for the remaining labels. If we treat these codes as vectors, the distance between any two different labels (e.g. sunny and cloudy, or sunny and snowy) is the same, so the computed losses are equal.
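
For illustration, the one-hot vectors for the four weather labels (using the assumed mapping 0 = sunny, 1 = rainy, 2 = cloudy, 3 = snowy) can be generated with numpy:

# Each row is the one-hot code of one label
weather_labels = np.array([0, 1, 2, 3])   # sunny, rainy, cloudy, snowy
one_hot = np.eye(4)[weather_labels]
print(one_hot)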

# Implementing the quadratic loss function layer in Python
class QuadraticLoss:
    def __init__(self):
        pass
    # x is the predicted value, label is the true label
    def forward(self, x, label):
        self.x = x
        # Since each label is a single number, convert it into a one-hot vector matching the size of the model output
        self.label = np.zeros_like(x)
        for a, b in zip(self.label, label):
            a[b] = 1.0  # Only the position of the correct label is set to 1; the others stay 0
        # Calculate the loss
        self.loss = np.sum(np.square(x - self.label)) / \
            self.x.shape[0] / 2  # Average over the samples and divide by 2 for later convenience
        return self.loss

Next, we encode the four weather labels (sunny, rainy, cloudy, snowy) as above. Then, using the loss function we just wrote, we check whether the loss for (predicted sunny, actual rainy) is the same as the loss for (predicted sunny, actual cloudy).

# test
loss = QuadraticLoss()
# Assume the network predicts class 0, i.e. sunny
pred = np.zeros((1, 4))
pred[0][0] = 1
print("Loss when the actual label is rainy:", loss.forward(pred, [1]))
print("Loss when the actual label is cloudy:", loss.forward(pred, [2]))


It can be seen from the results that, after one-hot encoding, the loss for (predicted sunny, actual rainy) equals the loss for (predicted sunny, actual cloudy).

2.4 Accuracy function

class Accuracy:
    def __init__(self):
        pass

    def forward(self, x, label):  # Only a forward pass is needed
        self.accuracy = np.sum(
            [np.argmax(xx) == ll for xx, ll in zip(x, label)])  # Count the correctly predicted samples
        self.accuracy = 1.0 * self.accuracy / x.shape[0]  # Divide by the number of samples to get the accuracy
        return self.accuracy
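
A quick sanity check of this layer on two hand-made predictions (values chosen arbitrarily, not from the dataset):

acc = Accuracy()
preds = np.array([[0.1, 0.9], [0.8, 0.2]])  # argmax picks classes 1 and 0
print(acc.forward(preds, [1, 0]))           # both predictions are correct, so the accuracy is 1.0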

Using these layers, we build a complete forward pass of the neural network, feed in the dataset, run one forward propagation and look at the output.

# The picture size is 8 * 8
# At this time, a picture is a piece of data, and each picture corresponds to a label (within the range of 0-9)
x = digits.data
print(x[0])
labels = digits.target
print(labels[0])

# Start building neural networks
inner_layers = []
inner_layers.append(FullyConnect(8 * 8, 10))
inner_layers.append(Sigmoid())
# Neural network construction completed

losslayer = QuadraticLoss()  # Calculate loss
accuracy = Accuracy()  # Calculation accuracy

# Reshape each sample into a 64x1 column vector and send the data through the network for one forward pass
x = x.reshape(-1, 64, 1)
for layer in inner_layers:  # Forward calculation
    x = layer.forward(x)
loss = losslayer.forward(x, labels)  # Call the forward function of the loss layer to calculate the value of the loss function
accu = accuracy.forward(x, labels)
print('loss:', loss, 'accuracy:', accu)


After a single forward pass the loss of the model is very large and the accuracy is close to 0. Is there a way to reduce the loss and improve the accuracy? Here we use gradient descent to minimize the loss, implemented through back propagation; a toy illustration of gradient descent follows, after which we add a backward pass to each layer.
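
The following sketch (not part of the original article) minimizes f(w) = (w - 3)^2 by repeatedly stepping against the gradient:

# Gradient descent on f(w) = (w - 3)^2
w, lr = 0.0, 0.1
for _ in range(50):
    grad = 2 * (w - 3)   # derivative of f with respect to w
    w -= lr * grad
print(w)                 # approaches 3, the minimizer of f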

class QuadraticLoss:
    def __init__(self):
        pass
    # Forward propagation is the same as above
    def forward(self, x, label):
        self.x = x
        self.label = np.zeros_like(x)
        for a, b in zip(self.label, label):
            a[b] = 1.0
        self.loss = np.sum(np.square(x - self.label)) / \
        self.x.shape[0] / 2  # Average and divide by 2 for convenience
        return self.loss

    # Define back propagation
    def backward(self):
        # dx is the partial derivative of the loss with respect to x, i.e. the gradient; it is passed back and used later for the parameter updates
        self.dx = (self.x - self.label) / self.x.shape[0]  # the factor of 2 cancels the 1/2 in the loss
        return self.dx
# Back propagation of activation function
class Sigmoid:
    def __init__(self):  # No parameters, no initialization required
        pass
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, x):
        self.x = x
        self.y = self.sigmoid(x)
        return self.y
    
    def backward(self, d):
        sig = self.sigmoid(self.x)
        self.dx = d * sig * (1 - sig)
        return self.dx  # Pass the gradient backward
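
As a quick sanity check (this snippet is not part of the original layers), the analytic gradient produced by backward can be compared with a finite-difference approximation:

# Compare the sigmoid backward pass with a numerical derivative
sig = Sigmoid()
x0 = np.array([[0.5], [-1.2]])
sig.forward(x0)
analytic = sig.backward(np.ones_like(x0))          # upstream gradient of ones
eps = 1e-6
numeric = (sig.sigmoid(x0 + eps) - sig.sigmoid(x0 - eps)) / (2 * eps)
print(np.allclose(analytic, numeric))              # True if backward is correct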

2.5 Back propagation of the fully connected layer

This is the most important step. The layer receives the loss error passed back by the activation layer and uses it to compute the gradients of its parameters w and bias, i.e. dw and db.

# We now rewrite the fully connected layer, adding a backward pass that updates the parameters with gradient descent.
class FullyConnect:
    def __init__(self, l_x, l_y):  # The two parameters are the length of the input layer and the length of the output layer
        # Initialize the weights with random numbers; dividing by np.sqrt(l_x) simply keeps the initial weights small
        self.weights = np.random.randn(l_y, l_x) / np.sqrt(l_x)
        self.bias = np.random.randn(l_y, 1)  # Initialize parameters with random numbers
        self.lr = 0  # First initialize the learning rate to 0, and finally set the learning rate uniformly

    def forward(self, x):
        self.x = x  # Save the intermediate results for back propagation
        self.y = np.array([np.dot(self.weights, xx) +
                           self.bias for xx in x])  # Calculate the output of the full connection layer
        return self.y  # Pass the calculation results of this layer forward

    def backward(self, d):
        # By the chain rule, multiply the gradient passed back by x to obtain the gradient of the weights
        ddw = [np.dot(dd, xx.T) for dd, xx in zip(d, self.x)]
        # Each sample gives one ddw; average them to obtain the mean gradient
        self.dw = np.sum(ddw, axis=0) / self.x.shape[0]
        self.db = np.sum(d, axis=0) / self.x.shape[0]
        self.dx = np.array([np.dot(self.weights.T, dd) for dd in d])

        # Update the parameters with gradient descent; lr is the step size
        self.weights -= self.lr * self.dw
        self.bias -= self.lr * self.db
        return self.dx  # Pass the gradient back to the previous layer
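
A small sketch with made-up data (not from the article) showing the shapes that flow through one forward and backward pass of this layer:

# Check the shapes produced by one forward/backward pass
np.random.seed(0)
fc = FullyConnect(64, 10)
fc.lr = 0.1
xb = np.random.randn(5, 64, 1)        # 5 samples, each a 64x1 column vector
yb = fc.forward(xb)                   # shape (5, 10, 1)
dxb = fc.backward(np.ones_like(yb))   # shape (5, 64, 1); weights and bias are updated in place
print(yb.shape, dxb.shape)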

3. Training the neural network

Here we take the first 1500 samples as training data and the rest as test data, obtaining the following datasets:

# Split the dataset
train_data, train_target = digits.data[:1500], digits.target[:1500]
test_data, test_target = digits.data[1500:], digits.target[1500:]
train_data.shape, train_target.shape, test_data.shape, test_target.shape


Next, we use the layers written above to build a network for digit recognition. The structure is (fully connected layer, activation layer, fully connected layer, activation layer). The code is as follows:

inner_layers = []
inner_layers.append(FullyConnect(64, 60))  # Each sample has length 8 * 8 = 64, so the first fully connected layer takes 64 inputs
inner_layers.append(Sigmoid())
inner_layers.append(FullyConnect(60, 10))
inner_layers.append(Sigmoid())
inner_layers

Next, initialize the loss function, accuracy function, learning rate and number of iterations.

# Next, initialize the loss function, accuracy function, learning rate and number of iterations.
losslayer = QuadraticLoss()
accuracy = Accuracy()
for layer in inner_layers:
    layer.lr = 1000  # Set the learning rate for all layers
epochs = 150  # The number of passes over the training data, i.e. the number of training epochs
# At first, the accuracy increases as training goes on.
# Once the model has learned all the information in the training data, the accuracy levels off.
losslayer, accuracy, epochs

Finally, the model is trained, and every 10 epochs the test results are printed.

for i in range(epochs):
   
    losssum = 0
    iters = 0
    x = train_data
    label = train_target
    x = x.reshape(-1,64,1)
    for layer in inner_layers:  # Forward calculation
        x = layer.forward(x)
    loss = losslayer.forward(x, label)  # Call the forward function of the loss layer to calculate the value of the loss function
    losssum += loss
    iters += 1
    d = losslayer.backward()  # Call the backward function of the loss layer to calculate the gradient to be back propagated

    for layer in inner_layers[::-1]:  # Back propagation
        d = layer.backward(d)

    if i%10==0: 
        x = test_data
        label = test_target
        x = x.reshape(-1,64,1)
        for layer in inner_layers:
            x = layer.forward(x)
            
        accu = accuracy.forward(x, label)  # Call the forward() function of the accuracy layer to calculate the accuracy
        print('epochs:{},loss:{},test_accuracy:{}'.format(i,losssum / iters,accu))
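
After training, the same layers can be reused to classify a single test image, for example (a small sketch not present in the original code):

# Predict one test image with the trained network
sample = test_data[0].reshape(1, 64, 1)
out = sample
for layer in inner_layers:
    out = layer.forward(out)
print("Predicted:", np.argmax(out), "Actual:", test_target[0])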


You can observe the learning effect by trying different numbers of epochs and learning rates. In practice, for efficiency, neural networks are usually built with deep learning frameworks such as TensorFlow and PyTorch.
