[Self-study on the principles of artificial intelligence] Convolutional neural network: breaking the bottleneck of image recognition

😊Hi, I'm Xiaohang, an artsy young man who keeps getting balder and stronger.
🔔This article explains the convolutional neural network and how it breaks the bottleneck of image recognition. Let's roll it up together!

1. Handwriting recognition

In the field of machine learning and neural networks, there is a classic application-level "Hello World": handwriting recognition. That is why it has become the go-to practice project for many beginners.


This is a handwritten "5": a 28 × 28 grayscale image in which each pixel is a one-byte unsigned integer representing its brightness level. A value of 0 is the darkest (pure black) and 255 is the lightest (pure white).
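To make this representation concrete, here is a minimal NumPy sketch. The 3 × 3 patch below is made-up illustrative data, not real MNIST pixels:

```python
import numpy as np

# A tiny made-up grayscale patch: one unsigned byte (0-255) per pixel,
# where 0 = pure black and 255 = pure white
patch = np.array([
    [0,   128, 255],
    [64,  200, 32],
    [255, 0,   90],
], dtype=np.uint8)

print(patch.dtype)             # uint8
print(patch.min(), patch.max())
```

A real MNIST image works the same way, just with a 28 × 28 grid instead of 3 × 3.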

In reality, humans are different from precise but rigid computers: every time we write a digit by hand, it may come out differently. For example, the first time we might write it like this:

The second time, with a shaky hand, it comes out like this:

At this point, there is no fixed rule that maps the pixel gray values to a digit. In other words, this is no longer a problem suited to a computer's rigid mechanical logic; we need a system with some fault tolerance. For that, a neural network is clearly a great choice.

We take the MNIST dataset (images of handwritten digits), flatten each image into a 784-dimensional vector, and feed the vectors into the neural network for training.
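A quick sketch of the flattening step, using a random dummy image in place of a real MNIST sample:

```python
import numpy as np

# Dummy 28x28 grayscale image standing in for one MNIST sample
img = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten 28 * 28 = 784: the 2-D grid becomes a 1-D input vector.
# Scaling to [0, 1] narrows the value range, which helps gradient descent.
x = img.reshape(784) / 255.0

print(x.shape)  # (784,)
```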

The different performance of a model on the training set and the test set leads to three common situations in machine learning:
1. Training-set accuracy is very low. This model is probably useless. The phenomenon is called underfitting, and it often means the model is too simple.
2. Training-set accuracy is high but test-set accuracy is low, indicating that the model has poor generalization ability, i.e. it cannot solve new problems. This is called overfitting. It has many causes, such as using an overly complex model to fit a simple problem, and also many remedies, such as adjusting the neural network structure, L2 regularization, and node deactivation (Dropout) regularization.
3. Both training-set and test-set accuracy are high, indicating that the model generalizes well. This is the outcome we want.
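As a sketch of the remedies mentioned above, here is how Dropout and L2 regularization might be added to a small Keras model. The layer sizes and coefficients are illustrative choices, not values from this article:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2

model = Sequential()
# L2 regularization penalizes large weights, discouraging the model
# from memorizing the training set
model.add(Dense(units=256, activation='relu', input_dim=784,
                kernel_regularizer=l2(0.001)))
# Dropout randomly deactivates 30% of the nodes during training,
# forcing redundant, more robust representations
model.add(Dropout(0.3))
model.add(Dense(units=10, activation='softmax'))
model.summary()
```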

Of course, in this section we mainly use a fully connected neural network for the explanation. An image is a two-dimensional object, and adjacent pixels are always related; if we forcibly flatten it to one dimension, these spatial relationships are destroyed and important features are lost. In practice, a convolutional neural network gives better results and is the more commonly used method. This also shows that important features go a long way toward improving a model's generalization ability.

As for how to extract important features, here is a brief explanation. Let's take the "5" above as an example:

💡Hmm... how can we quickly understand this convolution kernel? Let's take a teacup image as an example and think about what the convolved image will look like.

It turns out that the vertical edges were extracted.

Let's look at the details with a small 8 × 8 image:

You will find that the resulting image highlights the vertical-stripe features.

Now consider an extreme case; again, take the cup above as an example:


Convolving this image, sharp as you are, you will notice that only the two columns in the middle have non-zero values; all the other columns are 0.

In the flat regions on either side, the left and right neighbors cancel each other out, one positive and one negative.

At the edge in the middle, the values on the left are large and those on the right are small, which is completely asymmetric. After summing, the result becomes very large; in other words, the feature is highlighted.
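This cancellation argument can be checked numerically. Below is a small NumPy sketch: a step image that is bright on the left and dark on the right, correlated row by row with a simple vertical-edge kernel [1, 0, -1] (this particular kernel is my own illustrative choice):

```python
import numpy as np

# Step image: left half bright (255), right half dark (0)
img = np.zeros((6, 6))
img[:, :3] = 255.0

# Vertical-edge kernel: left neighbor minus right neighbor
kernel = np.array([1.0, 0.0, -1.0])

# Slide the kernel along each row (valid region only)
out = np.array([
    [np.dot(row[j:j + 3], kernel) for j in range(len(row) - 2)]
    for row in img
])

print(out[0])  # [  0. 255. 255.   0.]
```

In flat regions the left and right neighbors cancel to 0; only the two columns straddling the step produce large values, exactly as described above.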

2. "Alchemy"

🔨Let's implement the above process in code: mnist_recognizer.py

```python
# import dataset
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
import matplotlib.pyplot as plt
# one-hot encoding conversion
from keras.utils import to_categorical

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Inspect the shapes of the sample data, e.g. X_train: (60000, 28, 28)
print("X_train.shape: " + str(X_train.shape))
print("X_test.shape: " + str(X_test.shape))
print("Y_train.shape: " + str(Y_train.shape))
print("Y_test.shape: " + str(Y_test.shape))
# # Print a label value
# print(Y_train[0])
# # Draw the first training sample in grayscale
# plt.imshow(X_train[0], cmap="gray")
# plt.show()

# Flatten 28 * 28 = 784: two-dimensional to one-dimensional,
# and scale to [0, 1] to narrow the value range and speed up gradient descent
X_train = X_train.reshape(60000, 784) / 255.0
X_test = X_test.reshape(10000, 784) / 255.0

# One-hot encode the labels into 10 classes
Y_train = to_categorical(Y_train, 10)
Y_test = to_categorical(Y_test, 10)

model = Sequential()
model.add(Dense(units=256, activation='relu', input_dim=784))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=256, activation='relu'))
model.add(Dense(units=10, activation='softmax'))
# Use the multi-class cross-entropy cost function
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.05),
              metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=5000, batch_size=256)

loss, accuracy = model.evaluate(X_test, Y_test)
print("loss: " + str(loss))
print("accuracy: " + str(accuracy))
```
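As noted earlier, a convolutional network usually beats this fully connected model on images, because it preserves the 2-D structure of the input. Here is a minimal Keras sketch of a CNN variant; the layer configuration is my own illustrative choice, not code from this article:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Convolutional variant: inputs stay 2-D with shape (28, 28, 1),
# so spatial relationships between neighboring pixels are preserved
cnn = Sequential()
cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
cnn.add(MaxPooling2D((2, 2)))
cnn.add(Conv2D(64, (3, 3), activation='relu'))
cnn.add(MaxPooling2D((2, 2)))
cnn.add(Flatten())
cnn.add(Dense(10, activation='softmax'))
cnn.compile(loss='categorical_crossentropy', optimizer='adam',
            metrics=['accuracy'])
cnn.summary()
```

To train this model, X_train would be reshaped to (60000, 28, 28, 1) instead of being flattened to (60000, 784).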

🚩Training results and model testing:

📌 [ author ]   Years of literature and art
📃 [ renew ]   2023.1.22
❌ [ Errata ]   /* none yet */
📜 [ statement ]   Given the author's limited level, errors and inaccuracies in this article are inevitable.
              If you spot any, I hope you will criticize and correct me!
🔗 [ the code ]   https://github.com/itxaiohanglover/ai_lesson

Tags: AI CNN

Posted by alchemist_fr on Mon, 23 Jan 2023 00:42:36 +0300