[deep learning] sprite pattern recognition (CNN convolutional neural network training)

glidedsky is an Internet skill certification website whose challenges are all web-crawler problems. One of them, the crawler sprite image 2 challenge, requires recognizing digits from images, so, following the MNIST example, I trained a model with a CNN (convolutional neural network).

GitHub project source code

# Based on tensorflow 2.0
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow==2.0.0
# Project structure
├─glidesky
    │  model.h5                      # Model file
    │  predict.py                    # Model call
    │  train.py                      # Model training
    │
    ├─data_source
    │  │  data.h5                 # Dataset file
    │  │  make_data_set.py        # Generates the dataset
    │  │  spider.py               # Crawler
    │  │
    │  └─imgs                     # Stores the collected images
    │
    ├─logs            # Training visualization logs
    │
    ├─test            # Test images

Data acquisition: spider.py

For data acquisition, first find a page that contains all the digits 0 to 9, then use a crawler to collect the digit images for later use as the deep-learning dataset.

  1. Each request returns a different sprite image, but the numbers on the page are fixed, so the crawler simply keeps requesting the same page
  2. Only 10 images (one per digit) are kept from each request, to keep the sample distribution uniform
  3. Collection is time-consuming and tedious; the original plan was 1,000,000 images, but only about 450,000 were collected in the end, which should be close enough
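
Each digit on the page is rendered by offsetting a shared sprite image with per-class CSS rules; the spider below recovers each digit's crop box from those rules with regular expressions. A minimal standalone sketch of that parsing step (the class name abc123 and the pixel offsets are made-up examples, not taken from the site):

import re

sample_css = """
.abc123 { background-position-x:-40px }
.abc123 { background-position-y:-20px }
.abc123 { width:20px }
.abc123 { height:20px }
"""
# The CSS offsets are negative; their absolute values give the crop origin in the sprite.
x = abs(int(re.findall(r'\.abc123 \{ background-position-x:(.*?)px \}', sample_css)[0]))
y = abs(int(re.findall(r'\.abc123 \{ background-position-y:(.*?)px \}', sample_css)[0]))
w = int(re.findall(r'\.abc123 \{ width:(.*?)px \}', sample_css)[0])
h = int(re.findall(r'\.abc123 \{ height:(.*?)px \}', sample_css)[0])
print((x, y, x + w, y + h))  # (40, 20, 60, 40) -> the box passed to Image.crop()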

import re
import os
import uuid
import base64
import requests

from PIL import Image
from io import BytesIO
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

Cookie = 'your cookies'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Cookie': Cookie,
    'Host': 'www.glidedsky.com',
    'Referer': 'http://www.glidedsky.com/level/web/crawler-basic-2?page=1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.92 Safari/537.36'
}


def get_img(text):
    """
    :param text: page HTML that embeds the sprite image as a base64 string
    :return: the decoded sprite as a PIL Image
    """
    img_str = re.findall('base64,(.*?)"', text)[0]
    img_fp = BytesIO(base64.b64decode(img_str.encode('utf-8')))
    img = Image.open(img_fp)
    return img


def crawler(url):
    text = requests.get(url, headers=headers).text
    img = get_img(text)
    rows = BeautifulSoup(text, 'lxml').find_all('div', class_="col-md-1")
    # The digit sequence rendered on page=999 is fixed, so it can be hard-coded as the labels
    num_labels = list(str(123171140339373274129338158411319368))
    num_imgs = []
    for row in rows:
        for div in row.find_all('div'):
            css_name = div.get('class')[0].split(' ')[0]
            tag_x = re.findall(f'\.{css_name} \{{ background-position-x:(.*?)px \}}', text)
            tag_y = re.findall(f'\.{css_name} \{{ background-position-y:(.*?)px \}}', text)
            width = re.findall(f'\.{css_name} \{{ width:(.*?)px \}}', text)
            height = re.findall(f'\.{css_name} \{{ height:(.*?)px \}}', text)
            tag_x = abs(int(tag_x[0]))
            tag_y = abs(int(tag_y[0]))
            width = int(width[0])
            height = int(height[0])
            box = (tag_x, tag_y, tag_x + width, tag_y + height)
            num_imgs.append(img.crop(box))
    # Keep at most one image per digit (0-9) from each request to balance the classes
    save_list = [str(i) for i in range(10)]
    for num_img, num_label in zip(num_imgs, num_labels):
        if num_label in save_list:
            file_name = f'./imgs/{num_label}_{uuid.uuid1()}.png'
            num_img = num_img.resize((20, 20))
            num_img.save(file_name)
            save_list.remove(num_label)

os.makedirs('./imgs', exist_ok=True)
urls = []
for _ in range(90000):
    url = f'http://www.glidedsky.com/level/web/crawler-sprite-image-2?page=999'
    urls.append(url)

pool = ThreadPoolExecutor(max_workers=20)
for result in pool.map(crawler, urls):
    ...

Make dataset: make_data_set.py

All images are resized to a uniform 20 * 20 and converted to grayscale values; the corresponding labels are converted to one-hot codes, the data are randomly split into a training set and a test set with sklearn, and the result is saved as an h5 dataset file.

  1. The test set must not overlap with the training set, so that it can be used to evaluate the generalization ability of the model
  2. Why no preprocessing is applied before saving: stored raw, the data file is about 190 MB; stored as normalized values, it grows to about 1.9 GB
  3. h5 is the 5th generation of the Hierarchical Data Format (HDF5), a file format and library for storing scientific data
  4. One-hot encoding means exactly one position is active. For the ten digits 0-9, each label can be represented as a list of length 10: 2 is [0,0,1,0,0,0,0,0,0,0], 9 is [0,0,0,0,0,0,0,0,0,1], and so on; the original value can be recovered with np.argmax() (see the short example after this list)
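
A minimal standalone illustration of that encode/decode round trip (separate from make_data_set.py below):

import numpy as np

label = 2
one_hot = [1 if i == label else 0 for i in range(10)]
print(one_hot)             # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(np.argmax(one_hot))  # 2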

import os
import h5py
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split

images = []
labels = []

for path in os.listdir('./imgs'):
    # The digit label is the filename prefix written by spider.py, e.g. "7_<uuid>.png"
    label = int(path.split('_')[0])
    label_one_hot = [0 if i != label else 1 for i in range(10)]
    labels.append(label_one_hot)

    # Resize to 20 * 20, convert to grayscale, and flatten to a 400-element vector
    img = Image.open('./imgs/' + path).resize((20, 20)).convert('L')
    img_arr = np.reshape(img, 20 * 20)
    images.append(img_arr)

# Split training set and test set
train_images, test_images, train_labels, test_labels = train_test_split(images, labels, test_size=0.1, random_state=0)

with h5py.File('./data.h5', 'w') as f:
    f.create_dataset('train_images', data=np.array(train_images))
    f.create_dataset('train_labels', data=np.array(train_labels))
    f.create_dataset('test_images', data=np.array(test_images))
    f.create_dataset('test_labels', data=np.array(test_labels))
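
As a quick sanity check (not part of the project), the saved datasets can be listed back with h5py; with roughly 450,000 collected images and a 10% test split, the training arrays should hold about 405,000 rows of 400 pixels each:

import h5py

with h5py.File('./data.h5', 'r') as f:
    for name in f:
        print(name, f[name].shape, f[name].dtype)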

Training the model: train.py

Build a convolutional neural network, feed it the data (400,000 training samples and 40,000 test samples), and train the model.

  1. The dataset consists of grayscale images of black digits on a white background, stored as 20 * 20 matrices whose values lie between 0 and 255 (black is 0, white is 255). Preprocessing inverts them to white digits on a black background and divides by 255.0 to normalize, e.g. a black digit pixel 0 becomes 1 - 0/255.0 = 1.0. Normalizing the data helps improve the accuracy of the model
  2. One full pass over the training set is one epoch; there is no universal formula for how many epochs are appropriate, it has to be found by experiment
  3. When compiling the model, you need to specify the optimizer, the loss function, the metrics, and other parameters
  4. During training you can register callback functions, e.g. for saving the model or logging (a sketch of a checkpoint callback follows the train.py code below)
  5. A trained model can be loaded and training continued from it
import os
import h5py
import tensorflow as tf
from tensorflow.keras import layers, models


class Train:
    def __init__(self):
        # Final model storage path
        self.modelpath = './model.h5'

        # Define model
        if os.path.exists(self.modelpath):
            self.model = tf.keras.models.load_model(self.modelpath)
            print(f"{self.model} The model is loaded successfully. Continue training...")
        else:
            self.model = models.Sequential([
                # First convolutional layer: 32 kernels of size 3 * 3; the input is a 20 * 20 single-channel image
                layers.Conv2D(32, (3, 3), activation='relu', input_shape=(20, 20, 1)),
                layers.MaxPooling2D((2, 2)),
                # Second convolutional layer: 64 kernels of size 3 * 3
                layers.Conv2D(64, (3, 3), activation='relu'),
                layers.MaxPooling2D((2, 2)),
                # Third convolutional layer: 64 kernels of size 3 * 3
                layers.Conv2D(64, (3, 3), activation='relu'),
                layers.Flatten(),
                layers.Dense(64, activation='relu'),
                layers.Dense(10, activation='softmax'),
            ])
        self.model.summary()

        # Read data
        with h5py.File('./data_source/data.h5', 'r') as f:
            self.train_images = f['train_images'][()]
            self.train_labels = f['train_labels'][()]
            self.test_images = f['test_images'][()]
            self.test_labels = f['test_labels'][()]

        # Use the first 400,000 training and 40,000 test samples, reshaped to (n, 20, 20, 1)
        train_count, test_count = 400000, 40000
        self.train_images = self.train_images[:train_count].reshape((train_count, 20, 20, 1))
        self.train_labels = self.train_labels[:train_count]
        self.test_images = self.test_images[:test_count].reshape((test_count, 20, 20, 1))
        self.test_labels = self.test_labels[:test_count]

        # Preprocessing: invert to white digits on a black background and normalize to [0, 1]
        self.train_images = 1 - self.train_images / 255.0
        self.test_images = 1 - self.test_images / 255.0

    def train(self):
        # Visualize with: tensorboard --logdir=D:\GitHub\antman\glidedsky\logs
        TensorBoardcallback = tf.keras.callbacks.TensorBoard(
            log_dir='logs',
            histogram_freq=1,
            write_graph=True,
            write_images=True,
            update_freq=1
        )
        self.model.compile(optimizer='Adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])
        self.model.fit(self.train_images, self.train_labels, epochs=5, callbacks=[TensorBoardcallback])
        self.model.save(self.modelpath)

    def test(self):
        self.model = tf.keras.models.load_model(self.modelpath)
        test_loss, test_acc = self.model.evaluate(self.test_images, self.test_labels)
        print("Accuracy: %.4f,A total of%d Picture " % (test_acc, len(self.test_labels)))


if __name__ == "__main__":
    app = Train()
    app.train()
    app.test()
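
As mentioned in point 4 of the notes above, callbacks other than TensorBoard can also be registered. A minimal sketch of a model-saving callback (my own illustration, not part of the original train.py; the checkpoint path is made up):

import tensorflow as tf

# Save the model whenever the training accuracy improves (hypothetical checkpoint path)
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath='./model_checkpoint.h5',
    monitor='accuracy',
    save_best_only=True,
)
# It would then be passed to fit() alongside the TensorBoard callback, e.g.
# model.fit(train_images, train_labels, epochs=5, callbacks=[TensorBoardcallback, checkpoint_cb])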

Model call: predict.py

Model input: a batch of normalized images shaped as a four-dimensional array (n, 20, 20, 1), converted to white digits on a black background; output: a list of one-hot-encoded prediction vectors.

  1. The model input must be preprocessed in exactly the same way as the training data
  2. The index of the maximum value in the one-hot output vector is the predicted label, i.e. the digit it represents
import numpy as np
import tensorflow as tf
from PIL import Image

class Predict(object):
    def __init__(self):
        self.cnn = tf.keras.models.load_model('./model.h5')

    def predict(self, image_path):
        # Read the image as 20 * 20 grayscale, then invert and normalize exactly as in training
        img = Image.open(image_path).resize((20, 20)).convert('L')
        img_arr = 1 - np.reshape(img, (20, 20, 1)) / 255.0
        x = np.array([img_arr])

        # API refer: https://keras.io/models/model/
        y = self.cnn.predict(x)

        # Since x contains a single image, take y[0]
        # np.argmax() returns the index of the maximum value, i.e. the predicted digit
        print(image_path)
        print(y[0])
        print('        -> Predict digit', np.argmax(y[0]))

if __name__ == "__main__":
    app = Predict()
    app.predict('./test/0.png')
    app.predict('./test/3.png')
    app.predict('./test/4.png')
    app.predict('./test/7.png')
    app.predict('./test/9.png')

Passing the crawler challenge

Call the model directly in the crawler. Since the predictions are probabilistic and occasionally wrong, running the crawler a few times is enough to pass the challenge; if you are interested, refer to the complete code in the glidedsky clearance notes.
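
A minimal sketch of how the pieces could be wired together (my own illustration: it reuses the sprite parsing from spider.py and the trained model, and assumes, as in the other glidedsky crawler levels, that the goal is to sum the numbers shown on each page; the headers dict is abbreviated here):

import re
import base64
import requests
import numpy as np
import tensorflow as tf
from io import BytesIO
from PIL import Image
from bs4 import BeautifulSoup

headers = {'Cookie': 'your cookies', 'User-Agent': 'Mozilla/5.0'}  # abbreviated, see spider.py
cnn = tf.keras.models.load_model('./model.h5')


def predict_digit(num_img):
    # Same preprocessing as in training: grayscale, invert, normalize
    arr = 1 - np.reshape(num_img.resize((20, 20)).convert('L'), (20, 20, 1)) / 255.0
    return np.argmax(cnn.predict(np.array([arr]))[0])


def page_sum(url):
    text = requests.get(url, headers=headers).text
    img_str = re.findall('base64,(.*?)"', text)[0]
    sprite = Image.open(BytesIO(base64.b64decode(img_str.encode('utf-8'))))
    total = 0
    for row in BeautifulSoup(text, 'lxml').find_all('div', class_="col-md-1"):
        digits = ''
        for div in row.find_all('div'):
            css_name = div.get('class')[0].split(' ')[0]
            x = abs(int(re.findall(f'\.{css_name} \{{ background-position-x:(.*?)px \}}', text)[0]))
            y = abs(int(re.findall(f'\.{css_name} \{{ background-position-y:(.*?)px \}}', text)[0]))
            w = int(re.findall(f'\.{css_name} \{{ width:(.*?)px \}}', text)[0])
            h = int(re.findall(f'\.{css_name} \{{ height:(.*?)px \}}', text)[0])
            digits += str(predict_digit(sprite.crop((x, y, x + w, y + h))))
        if digits:
            total += int(digits)
    return total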
