What you need to know when learning Keras

This note is based on Chapter 7 of Deep Learning with Python. I hope to commit this code to memory so I don't have to look things up every couple of lines (OTZ).

Creating the network structure

Imports

from keras import layers, Input
from keras.models import Sequential, Model

Creating a basic network using Sequential

seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

Using the functional API

Creating a basic network

input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = Model(input_tensor, output_tensor)

Creating a two-input network

Use keras.layers.add, keras.layers.concatenate, etc. to merge multiple inputs by addition or concatenation.

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500


# With multiple inputs, naming each input makes it easier to pass data to the model later
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

question_input = Input(shape=(None,), dtype='int32', name='question')  
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)

answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model = Model([text_input, question_input], answer)  # Note how the two input tensors are passed as a list
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

For the training part, a batch of random dummy data is generated:

import keras
import numpy as np
num_samples = 1000
max_length = 100
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))
answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)

There are two ways to call fit.

One is to pass all the input data as a list of arrays:

model.fit([text, question], answers, epochs=10, batch_size=128)

The other uses the Input names as dictionary keys:

model.fit({'text':text, 'question':question}, answers, epochs=10, batch_size=128)

Create a multi-output network (multi-head network)

vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)  # Embedding(input_dim, output_dim)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)

x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

# All output layers need to have names
age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)
model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])

The key point of a multi-output network is the loss. Different losses have different value ranges, so to balance their contributions, the individual losses are weighted.
The weight coefficients are chosen a priori.

model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])  # These weights are chosen a priori
# Equivalent dictionary form
model.compile(optimizer='rmsprop',
              loss={'age':'mse',
                    'income':'categorical_crossentropy',
                    'gender':'binary_crossentropy'},
              loss_weights={'age':0.25,
                            'income':1.,
                            'gender':10.})

The target data can also be passed as a list or a dictionary:

model.fit(posts, [age_targets, income_targets, gender_targets], epochs=10, batch_size=64)
model.fit(posts, {'age':age_targets,
                  'income':income_targets,
                  'gender':gender_targets},
          epochs=10, batch_size=64)

Directed acyclic graph

The only allowed loops (i.e. recurrent connections) are those internal to recurrent layers (the internals of recurrent layers are not covered here).

Inception structure

In fact, the full Inception V3 architecture is built into Keras as keras.applications.inception_v3.InceptionV3.

# Assumes x is a 4D input tensor
branch_a = layers.Conv2D(128, 1, activation='relu', strides=2)(x)

branch_b = layers.Conv2D(128, 1, activation='relu')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2)(branch_b)

branch_c = layers.AveragePooling2D(3, strides=2)(x)
branch_c = layers.Conv2D(128, 1, activation='relu')(branch_c)

branch_d = layers.Conv2D(128, 1, activation='relu')(x)
branch_d = layers.Conv2D(128, 3, activation='relu')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2)(branch_d)

output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

Residual connection

There are two kinds: identity residual connections (used when feature-map sizes match) and residual connections with a linear transformation (used when they differ).

# Identity residual connection; assumes x is a 4D input tensor
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

y = layers.add([y, x])

# Residual connection with a linear transformation (feature-map sizes differ)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)

residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)

y = layers.add([y, residual])

Reuse weights / share weights

from keras.models import Model
from keras import layers, Input

# Shared LSTM
lstm = layers.LSTM(32)  # Instantiate an LSTM layer once

left_input = Input(shape=(None, 128))  # Variable length sequence composed of vectors with length of 128
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)  # Calling an existing layer instance reuses its weights

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
model.fit([left_data, right_data], targets)  # When this model is trained, the weights of the LSTM layer are updated based on both inputs

Using a model as a layer

When you call a model instance, you are reusing the model's weights, just as calling a layer instance reuses the layer's weights. Calling an instance, whether a layer instance or a model instance, reuses the representations that instance has learned, which is intuitive.

Reuse model instances:

from keras import layers
from keras import applications
from keras import Input

# Reused model: xception_base
xception_base = applications.Xception(weights=None, include_top=False)

left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

left_features = xception_base(left_input)
right_features = xception_base(right_input)
merged_features = layers.concatenate([left_features, right_features], axis=-1)

Essential functions

Model-related

  1. model.summary() # print the model structure and number of parameters
  2. model.compile(optimizer='...', loss='...') # compile the model
  3. model.fit(x_train, y_train, epochs=10, batch_size=128) # train the model
  4. score = model.evaluate(x_train, y_train) # evaluate the model
  5. keras.applications.inception_v3.InceptionV3 and Xception (higher accuracy)
  6. Depthwise separable convolution: layers.SeparableConv2D(64, 3, activation='relu') (a short sketch follows below)
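As a quick illustration of how depthwise separable convolutions are used, here is a minimal sketch of a small image classifier built from SeparableConv2D layers; the input shape (64, 64, 3) and all layer sizes are arbitrary example values, not taken from the book.

from keras import layers
from keras.models import Sequential

# Minimal sketch: a tiny image classifier made of depthwise separable convolutions.
# Input shape and layer sizes are illustrative choices.
model = Sequential()
model.add(layers.SeparableConv2D(32, 3, activation='relu', input_shape=(64, 64, 3)))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(10, activation='softmax'))
model.summary()  # Compare the parameter count with an equivalent Conv2D stack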

Plotting the model

from keras.utils import plot_model

plot_model(model, to_file='model.png')
# Want to display shape information in the picture
plot_model(model, show_shapes=True, to_file='model.png')

Inspect the loss and accuracy of each epoch and plot them

from keras import optimizers

model.compile(optimizer=optimizers.RMSprop(lr=2e-5),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(train_features, train_labels,
                    epochs=30, batch_size=20,
                    validation_data=(validation_features, validation_labels))

import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc)+1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Inspecting and monitoring deep-learning models

Training can be interrupted as soon as the validation loss is observed to stop improving; this can be achieved with a Keras callback.

A callback is an object (a class instance implementing specific methods) that is passed to the model in the call to fit. It is called by the model at various points during training; it has access to all the available data about the state and performance of the model and can take action: interrupt training, save the model, load a different set of weights, or otherwise alter the state of the model.

Built-in callback examples

Some usage examples of callback functions are as follows:

  • Model checkpointing: save the current weights of the model at different points during training.
  • Early stopping: interrupt training when the validation loss is no longer improving (while saving the best model obtained during training).
  • Dynamically adjusting certain parameter values during training: for example, the learning rate of the optimizer.
  • Logging training and validation metrics during training, or visualizing the representations learned by the model as they are updated: the Keras progress bar you are familiar with is itself a callback!

The keras.callbacks module contains a number of built-in callbacks, some of which are listed below:

  • keras.callbacks.ModelCheckpoint
  • keras.callbacks.EarlyStopping
  • keras.callbacks.LearningRateScheduler
  • keras.callbacks.ReduceLROnPlateau
  • keras.callbacks.CSVLogger
import keras

# Callbacks are passed to the model via the callbacks argument of fit, which takes a list of callbacks.
# You can pass any number of callbacks.
callback_list = [
    keras.callbacks.EarlyStopping(  # Interrupt training when improvement stops
        monitor='acc',  # Monitor the model's accuracy
        patience=1,  # Interrupt training once accuracy has stopped improving for more than one epoch (i.e. two epochs)
    ),
    keras.callbacks.ModelCheckpoint(  # Save the current weights after every epoch
        filepath='my_model.h5',  # Path to the destination model file
        monitor='val_loss',
        save_best_only=True,  # Don't overwrite the model file unless val_loss has improved
    )
]
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
# Because the callbacks monitor the validation loss and validation accuracy, validation_data must be passed to fit
model.fit(x, y, epochs=10, batch_size=32, 
          callbacks=callback_list, 
          validation_data=(x_val, y_val))

If the validation loss is no longer improving, you can use the ReduceLROnPlateau callback to lower the learning rate. When a loss plateau is reached during training, increasing or decreasing the learning rate is an effective strategy for getting out of local minima.

callback_list = [
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',  # Monitor the model's validation loss
        factor=0.1,  # When triggered, divide the learning rate by 10
        patience=10,  # Trigger the callback if the validation loss has not improved for 10 epochs
    )
]

Writing your own callback

Create a subclass of the keras.callbacks.Callback class, then implement any of the following methods (their purpose is clear from their names); they are called at different points during training. A minimal example follows at the end of this section.

  • on_epoch_begin is called at the start of every epoch
  • on_epoch_end is called at the end of every epoch
  • on_batch_begin is called right before processing each batch
  • on_batch_end is called right after processing each batch
  • on_train_begin is called at the start of training
  • on_train_end is called at the end of training

These methods are all called with a logs argument, a dictionary containing information about the previous batch, epoch, or training run: training and validation metrics.

In addition, the callback function can access the following properties:

  • self.model: the model instance from which the callback is being called
  • self.validation_data: the value passed to fit as validation_data
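
As a minimal sketch of such a subclass (the class name and what it records are my own example choices, not from the book), a callback that logs the loss at the end of every epoch could look like this:

import keras

# Example (hypothetical) custom callback: records the training loss of every epoch
# and prints a short message when training starts and ends.
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs=None):
        self.losses = []
        print('Training started')

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))
        print('Epoch %d finished, loss = %s' % (epoch, logs.get('loss')))

    def on_train_end(self, logs=None):
        print('Training finished after %d epochs' % len(self.losses))

# Usage: pass an instance via the callbacks argument of fit
# model.fit(x, y, epochs=10, callbacks=[LossHistory()])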

Model ensembling

Ensembling consists of pooling together the predictions of a set of different models to produce better predictions.

The key to making ensembling work is the diversity of this set of classifiers. If all the models are biased in the same direction, the ensemble will retain that same bias; if each model is biased in a different direction, the biases cancel each other out and the ensemble is more robust and more accurate.

For this reason, the models being ensembled should be as good as possible and as different as possible. This usually means using very different architectures or even different types of machine-learning approaches.

Taking a classification problem as an example, the simplest way to pool the predictions of a set of classifiers (i.e. to ensemble the classifiers) is to average their predictions:

This only works if the classifiers in the group perform roughly equally well.

preds_a = model_a.predict(x_val)
preds_b = model_b.predict(x_val)
preds_c = model_c.predict(x_val)
preds_d = model_d.predict(x_val)

final_preds = 0.25 * (preds_a + preds_b + preds_c + preds_d)

An even better approach is a weighted average, where the weights are learned on the validation data. The weights can be found with random search or a simple optimization algorithm such as Nelder-Mead (I don't know this one yet, so it's something to learn later).
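
As a sketch of that weighted average, assuming the four weights below have already been found empirically (they are illustrative values that should sum to 1):

preds_a = model_a.predict(x_val)
preds_b = model_b.predict(x_val)
preds_c = model_c.predict(x_val)
preds_d = model_d.predict(x_val)

# Weighted average; the weights (0.5, 0.25, 0.1, 0.15) are assumed to have been
# found via random search or a simple optimizer on the validation data.
final_preds = 0.5 * preds_a + 0.25 * preds_b + 0.1 * preds_c + 0.15 * preds_d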
