Handwritten digit image recognition with a convolutional neural network

Import dependencies

import tensorflow as tf
from tensorflow import keras
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

 

Download dataset

The MNIST dataset is a public handwritten digit dataset. It contains 70,000 grayscale images of the digits 0-9, each 28*28 pixels, together with their labels: 60,000 form the training set and 10,000 form the test set.

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Here x_train holds the training set features (the images), y_train the training set labels, x_test the test set features, and y_test the test set labels.

 

Data normalization

The original gray values in the range 0-255 are scaled to the range 0-1. Keeping the inputs small makes the gradients better behaved, so training usually converges more easily.

x_train, x_test = x_train / 255.0, x_test / 255.0
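
To see what the division does to the data type and value range, here is a minimal sketch. It uses a small random array as a stand-in for MNIST (the name fake_images is my own placeholder, not part of the dataset API), so it runs without downloading anything.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images: 4 images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by 255.0 promotes uint8 to float64 and maps [0, 255] onto [0.0, 1.0].
scaled = fake_images / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

The shape is unchanged; only the dtype and the value range differ.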

 

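Add a channel dimension

Conv2D expects input of shape (height, width, channels), so a trailing dimension is added to the dataset. The training set becomes 60,000 samples of 28*28 single-channel data, which the convolution kernels can then use for feature extraction.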

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
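
As a side note, the same trailing channel axis can be added in a few equivalent ways in NumPy. The sketch below uses a placeholder array of zeros instead of real MNIST data, purely to compare the resulting shapes.

```python
import numpy as np

# Placeholder batch of 5 "images"; not real MNIST data.
images = np.zeros((5, 28, 28))

by_reshape = images.reshape(images.shape[0], 28, 28, 1)  # the form used in this post
by_expand = np.expand_dims(images, axis=-1)              # append an axis at the end
by_newaxis = images[..., np.newaxis]                     # indexing shorthand

print(by_reshape.shape, by_expand.shape, by_newaxis.shape)
```

All three give an array of shape (5, 28, 28, 1); which one to use is mostly a matter of taste.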

 

One-hot encoding

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

After one-hot encoding, each label becomes a vector of length 10 with one position per category: 1 means yes, 0 means no. If an image's label is 6, its one-hot code is: 0 0 0 0 0 0 1 0 0 0
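
To make the encoding concrete, here is a minimal NumPy sketch of the same idea. The helper one_hot is my own illustration and not the Keras implementation; it simply picks rows out of an identity matrix.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is exactly the one-hot vector for class k.
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([6, 0, 9])
print(encoded[0])

# argmax along each row recovers the original integer labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```

The first row is the code for label 6, and taking the argmax of each row gets the integer labels back.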

 

Split validation set

Take 5000 samples from the training set as the validation set. The validation set does not take part in training or gradient updates; it is only used to check the loss and accuracy on unseen data after each epoch.

x_validation = x_train[:5000]
y_validation = y_train[:5000]
x_train = x_train[5000:]
y_train = y_train[5000:]
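
A quick way to convince yourself that the two slices do not overlap is to run the same split on index placeholders. In this sketch split_validation is a hypothetical helper of mine, and the arrays hold sample indices rather than images.

```python
import numpy as np

def split_validation(x, y, n_val):
    """Hold out the first n_val samples for validation; keep the rest for training."""
    return (x[n_val:], y[n_val:]), (x[:n_val], y[:n_val])

# Placeholders: sample indices instead of images, and made-up labels.
x = np.arange(60000)
y = x % 10

(x_tr, y_tr), (x_val, y_val) = split_validation(x, y, 5000)

print(len(x_tr), len(x_val))
# No sample index appears on both sides of the split.
overlap = np.intersect1d(x_tr, x_val)
print(overlap.size)
```

This leaves 55,000 samples for training and 5,000 for validation.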

 

Build the network structure

The network uses three convolutional layers followed by two fully connected layers. The first convolutional layer uses 32 3*3 convolution kernels, and the second and third convolutional layers each use 64 3*3 convolution kernels. The purpose of convolution is to extract the spatial features of the image; max pooling downsamples the feature maps, which reduces computation and can help suppress over-fitting.

model = keras.models.Sequential([
    Conv2D(32, (3, 3), activation='relu',input_shape=(28, 28, 1)),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
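
To follow how the feature map shrinks through the layers, the arithmetic can be sketched in plain Python. The helper names are mine; the sketch assumes the Keras defaults used above: 'valid' padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPool2D.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel fits in (size - kernel + 1) positions.
    return size - kernel + 1

def pool_out(size, pool=2):
    # Non-overlapping pooling windows; a leftover row/column is dropped.
    return size // pool

def conv_params(in_ch, out_ch, kernel=3):
    # One kernel*kernel*in_ch filter plus one bias per output channel.
    return out_ch * (kernel * kernel * in_ch + 1)

def dense_params(in_units, out_units):
    # Weight matrix plus one bias per output unit.
    return in_units * out_units + out_units

size = 28
size = conv_out(size)    # after Conv2D(32)
size = pool_out(size)    # after MaxPool2D
size = conv_out(size)    # after Conv2D(64)
size = pool_out(size)    # after MaxPool2D
size = conv_out(size)    # after Conv2D(64)
flat = size * size * 64  # length of the Flatten() output

total = (conv_params(1, 32) + conv_params(32, 64) + conv_params(64, 64)
         + dense_params(flat, 64) + dense_params(64, 10))
print(size, flat, total)
```

The spatial size goes 28, 26, 13, 11, 5, 3, so Flatten() produces 576 values. The totals can be cross-checked against model.summary().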

 

Compile the model

The loss is the multi-class (categorical) cross-entropy, which matches the one-hot labels and the softmax output. The optimizer is rmsprop, which is a reasonable choice in most cases and is also the default optimizer of compile().

model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=['accuracy'])
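
For intuition about what this loss measures, here is a small NumPy sketch of softmax and categorical cross-entropy for a single sample. It is my own illustration, not the Keras implementation, and the logits are made up.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip so that log(0) can never occur.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred))

# One-hot label for the digit 6 and made-up scores that favour class 6.
y_true = np.zeros(10)
y_true[6] = 1.0
logits = np.zeros(10)
logits[6] = 2.0

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
uniform_loss = categorical_crossentropy(y_true, np.full(10, 0.1))
print(probs[6], loss, uniform_loss)
```

Because the label is one-hot, the loss reduces to the negative log of the probability assigned to the correct class. A uniform guess over 10 classes gives log(10), so a loss well below that means the model is doing better than chance.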

 

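Save the model

# Only the weights are saved, and only when the monitored value improves
# (ModelCheckpoint monitors val_loss by default).
# Note: newer Keras versions (Keras 3) expect this path to end in ".weights.h5"
# when save_weights_only=True.
checkpoint_save_path = "./checkpoint/mnist2.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_save_path,
    save_weights_only=True,
    save_best_only=True)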
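
The effect of save_best_only=True can be sketched without Keras. The function below is a hypothetical illustration of the "keep only the best" rule, assuming the default monitor (val_loss, lower is better); the loss values are made up and are not results from this model.

```python
def epochs_that_would_save(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be written."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved

# Made-up validation losses for 7 epochs, for illustration only.
made_up_val_losses = [0.080, 0.055, 0.060, 0.041, 0.043, 0.039, 0.045]
print(epochs_that_would_save(made_up_val_losses))
```

With these values the checkpoint would be overwritten at epochs 1, 2, 4 and 6, so the file on disk ends up holding the epoch 6 weights rather than those of the last epoch.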

 

Perform training

The training set is fed into the neural network in batches of 32 for a total of 7 epochs (full passes over the training data), and the validation loss and accuracy are computed once per epoch.

history = model.fit(
    x_train, y_train,
    batch_size=32,
    epochs=7,
    verbose=1,
    validation_data=(x_validation, y_validation),
    validation_freq=1,
    callbacks=[cp_callback])
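
The number of weight updates implied by these settings can be worked out with a few lines of arithmetic. This sketch assumes the 55,000 training samples left after the validation split above.

```python
import math

train_samples = 60000 - 5000  # training samples left after the validation split
batch_size = 32
epochs = 7

# The last batch of each epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)
```

So the progress bar shows 1719 steps per epoch, the final batch of each epoch contains 24 samples, and the whole run performs 12,033 gradient updates.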

 

Evaluate the model

score = model.evaluate(x_test, y_test, verbose=0, batch_size=32)
print('test accuracy:{}, test loss value: {}'.format(score[1], score[0]))
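
The accuracy number is simply the fraction of samples whose most probable class matches the label. Here is a minimal NumPy sketch of that calculation on four made-up predictions (the helper accuracy and the data are mine, for illustration only).

```python
import numpy as np

def accuracy(y_true_one_hot, y_prob):
    """Fraction of rows where the argmax of the prediction equals the true class."""
    true_cls = np.argmax(y_true_one_hot, axis=1)
    pred_cls = np.argmax(y_prob, axis=1)
    return float(np.mean(true_cls == pred_cls))

true_labels = [6, 1, 3, 3]
pred_labels = [6, 1, 3, 8]  # the last prediction is deliberately wrong

y_true = np.eye(10)[true_labels]
# Made-up softmax outputs: 0.55 on the predicted class, 0.05 everywhere else.
y_prob = np.full((4, 10), 0.05)
y_prob[np.arange(4), pred_labels] = 0.55

acc = accuracy(y_true, y_prob)
print(acc)
```

Three of the four predictions are correct, so the sketch reports 0.75.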

 

Visualize acc and loss curves

plt.rcParams['font.sans-serif']=['SimHei']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

plt.subplot(1, 2, 1)
plt.plot(acc, label='train Acc')
plt.plot(val_acc, label='val Acc')
plt.title('Acc curve')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(loss, label='train Loss')
plt.plot(val_loss, label='val Loss')
plt.title('Loss curve')
plt.legend()
plt.show()

At this point, run the program. After training completes, the accuracy and loss curves are displayed, and a checkpoint folder appears in the current directory.

 

With the convolutional layers added, the results improve to some extent, and the model's test accuracy reaches 99%.

 

Reproduce the network structure

After training is completed, the next step is to write an application that receives images, recognizes them, and returns the recognition results.

So I open a new .py file here.

from PIL import Image
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D,  MaxPool2D, Flatten, Dense

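First, reproduce the network structure used during training. Because only the weights were saved (save_weights_only=True), the same architecture has to be defined again before the weights can be loaded.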

model = keras.models.Sequential([
    Conv2D(32, (3, 3), activation='relu',input_shape=(28, 28, 1)),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])

 

Load the model weights

model_save_path = './checkpoint/mnist2.ckpt'
model.load_weights(model_save_path)

 

Image recognition

I drew ten images in Photoshop, one for each digit, to test the recognition.

 

# Paths ./img/p_0.jpg ... ./img/p_9.jpg, one image per digit
imgs = ['./img/p_{}.jpg'.format(i) for i in range(10)]

for path in imgs:

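    # Read the image, resize it to 28x28, and convert it to grayscale
    img = Image.open(path)
    # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
    img = img.resize((28, 28), Image.LANCZOS)
    img_arr = np.array(img.convert('L'))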

    # The training images are white digits on a black background, but the images
    # to recognize are black digits on a white background, so the colors must be inverted.
    # Pixels are also pushed to the two extreme values 0 and 255: this keeps the
    # useful information (the strokes) and filters out background noise.
    # Dark pixels (< 150) become 255 (stroke); everything else becomes 0 (background).
    img_arr = np.where(img_arr < 150, 255, 0)

    # Normalized
    img_arr = img_arr / 255.0

    # add a dimension
    x_predict = img_arr.reshape(1, 28, 28, 1)

    # Identify
    result = model.predict(x_predict)
    pred = tf.argmax(result[0])
    print('Identifying:{} ---- > {}'.format(path, pred))
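
When a drawing is misread, it can help to look at more than the single best class. The sketch below ranks the classes of one prediction vector; the probabilities are made up, standing in for result[0], and top_k is a hypothetical helper of mine.

```python
import numpy as np

def top_k(probs, k=3):
    """Return the k most probable (digit, probability) pairs, best first."""
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(int(i), float(probs[i])) for i in order]

# Made-up softmax output standing in for result[0]; it is not real model output.
fake_probs = np.array([0.01, 0.02, 0.01, 0.05, 0.01, 0.10, 0.70, 0.02, 0.07, 0.01])

ranking = top_k(fake_probs)
for digit, p in ranking:
    print('digit {}: {:.2f}'.format(digit, p))
```

Here the top candidate is 6, followed by 5 and 8. A small gap between the first and second candidates is a hint that the input image is ambiguous or was preprocessed badly.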


Tags: AI

Posted by SuperTini on Sun, 08 May 2022 07:16:41 +0300