[Learning frameworks from official examples: Keras] Character-level seq2seq with LSTM


Keras official examples link
TensorFlow official examples link
Paddle official examples link
PyTorch official examples link

Note: this series is only meant to help you get up to speed quickly, so that you can go on to use each framework independently for deeper study and research. Please study the theory on your own. The official classic examples of each framework are very well written and worth learning from. Most common tasks of this kind can be solved by fully understanding an official example and adapting it.

Abstract: [Learning frameworks from official examples: Keras] A character-level LSTM seq2seq model that translates English into French, with an illustrated explanation of how LSTM implements seq2seq.

1 Introduction

This example uses a character-level seq2seq model to translate English into French.


  • Input sequences come from one domain (English sentences) and target sequences from another (French sentences).
  • The encoder LSTM keeps its final states and discards its outputs.
  • The decoder LSTM is trained to turn the target sequence into the same sequence offset by one time step into the future, a training method known as teacher forcing.
  • At inference time, the input is encoded into state vectors (encoder_states) and passed to the decoder. The decoder is first fed the start-of-sequence character (here \t) and predicts one character. If it predicts the letter A, for example, A becomes the next decoder input, and character-level prediction repeats until the end-of-sequence character (here \n) is predicted.

2 Setup

Import the required packages

import numpy as np
import tensorflow as tf
from tensorflow import keras

2.1 Download the data

If you are working locally, you can copy this link into a browser and download the file directly: http://www.manythings.org/anki/fra-eng.zip

!!curl -O http://www.manythings.org/anki/fra-eng.zip
!!unzip fra-eng.zip

2.2 Configuration

Define the hyperparameters

# Configuration

batch_size = 64  # Batch size for training.
epochs     = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
# Path to the data txt file on disk.
data_path = "fra.txt"

3 Prepare the data

Vectorize the text data; see the comments in the code for details.

The key point here is the relationship between decoder_input_data and decoder_target_data, which differ by one time step. Before looking at the figure below, think about why this one-step offset is needed, and then look at the encoder-decoder example diagram.
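To make the one-step offset concrete, here is a minimal sketch that works on plain characters instead of one-hot arrays. The helper name teacher_forcing_pairs and the sample sentence are assumptions made for illustration; they are not part of the official example.

```python
def teacher_forcing_pairs(target_text):
    """Return (decoder_input, decoder_target) character lists for one sample.

    target_text must already be wrapped as: tab + text + newline.
    """
    decoder_input = list(target_text[:-1])   # starts with the tab start character
    decoder_target = list(target_text[1:])   # same characters, shifted one step left
    return decoder_input, decoder_target


dec_in, dec_out = teacher_forcing_pairs("\t" + "Va !" + "\n")
for step, (x, y) in enumerate(zip(dec_in, dec_out)):
    # At each time step the decoder is fed the true previous character x
    # and is trained to predict the next character y.
    print(step, repr(x), "->", repr(y))
```

The sketch trims the last input position for readability. The arrays built in this section instead keep every position up to max_decoder_seq_length and fill the remainder with the space character.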

# Vectorize the data.
input_texts = []  # Input texts
target_texts = []  # Target texts
input_characters = set()  # Character dictionary for the input texts
target_characters = set()  # Character dictionary for the target texts
with open(data_path, "r", encoding="utf-8") as f:
    lines = f.read().split("\n")
# Use the smaller of num_samples and the number of lines in the file
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text, _ = line.split("\t")
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = "\t" + target_text + "\n"
    input_texts.append(input_text)
    target_texts.append(target_text)
    # Add each character to the character dictionaries
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print("Number of samples:", len(input_texts))
print("Number of unique input tokens:", num_encoder_tokens)
print("Number of unique output tokens:", num_decoder_tokens)
print("Max sequence length for inputs:", max_encoder_seq_length)
print("Max sequence length for outputs:", max_decoder_seq_length)

# Map each token (here, a character) to an integer index, e.g. a: 0, b: 1
input_token_index = dict([(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict([(char, i) for i, char in enumerate(target_characters)])

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype="float32"
)
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    encoder_input_data[i, t + 1 :, input_token_index[" "]] = 1.0
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0
    decoder_input_data[i, t + 1 :, target_token_index[" "]] = 1.0
    decoder_target_data[i, t:, target_token_index[" "]] = 1.0

Here, the shape of the vectorized "time series data" is
(len(input_texts), max_encoder(decoder)_seq_length, num_encoder(decoder)_tokens)

  • Axis 0 is the number of translated text pairs.
  • Axis 1 is the maximum sequence length in the encoder (decoder). Shorter sequences are padded with the space character " ". This axis is the time step.
  • Axis 2 is the index of the token in the character dictionary.

For example, suppose the character dictionary is abc and the input texts are ['ac', 'bc']. Each character becomes a one-hot vector over the dictionary, so 'a' is [1, 0, 0], 'b' is [0, 1, 0] and 'c' is [0, 0, 1].
This is simply one-hot encoding. Rows that consist entirely of zeros are omitted here.
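The abc example can be reproduced in a few lines. This is a minimal sketch that uses nested Python lists instead of NumPy arrays so that it runs on its own; the variable names are chosen for illustration and the indexing mirrors encoder_input_data[i, t, token_index].

```python
characters = ["a", "b", "c"]   # character dictionary
texts = ["ac", "bc"]           # input texts
token_index = {char: i for i, char in enumerate(characters)}
max_len = max(len(text) for text in texts)

# encoded[sample][time step][token index]: the same layout as encoder_input_data
encoded = [
    [[0.0] * len(characters) for _ in range(max_len)] for _ in texts
]
for i, text in enumerate(texts):
    for t, char in enumerate(text):
        encoded[i][t][token_index[char]] = 1.0

for text, rows in zip(texts, encoded):
    print(text, rows)
```

The result has shape (2, 2, 3): two samples, two time steps and three possible tokens.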

4 Build the model

Let's look at the model first. With the help of the concrete example below, you should be able to understand how the encoder-decoder architecture implements seq2seq. The following two figures correspond exactly to each other and to the variable names in the code below.


Key point: in Keras, the Boolean values of return_sequences and return_state determine which outputs and states an LSTM layer returns.

These two parameters are described below. I did not understand them the first time I read about them.

  • return_sequences: defaults to False. Whether to return only the last hidden state of the output sequence or the hidden states of all time steps. False returns only the last one; True returns all of them.

  • return_state: defaults to False. Whether to return the final state in addition to the output.

Working through an example with the figure above should make the basic idea of implementing seq2seq with an encoder-decoder architecture clear.

As we know, the LSTM is an improvement on the plain RNN. A plain RNN passes along a single state, $h_t$, whereas an LSTM passes along two: the cell state $c_t$ and the hidden state $h_t$. Simply put, the added $c_t$ influences later values of $h_t$, which alleviates the vanishing and exploding gradient problems of the plain RNN.
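For reference, one common textbook formulation of a single LSTM step is sketched below. The weight names $W$, $U$, $b$ and the gate symbols $f_t$, $i_t$, $o_t$ are notation assumed here rather than taken from the official example, and the internal Keras implementation may organize these computations differently.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The cell state $c_t$ is updated additively from $c_{t-1}$, and $h_t$ is read out from $c_t$ through the output gate. This is the sense in which $c_t$ affects later values of $h_t$.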

Now look back at the Encoder part of the figure. Depending on the Boolean values of return_sequences and return_state, the layer returns the following:

return_sequences   return_state   output
False              False          h5
True               False          h1, h2, h3, h4, h5
False              True           h5, h5, c5
True               True           h1-h5, h5, c5
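The return convention in the table above can be imitated without Keras. The following is a toy sketch only: toy_lstm is a hypothetical scalar LSTM with made-up weights that are shared by all gates, so it reproduces what gets returned for each flag combination, not what a trained Keras layer computes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_lstm(inputs, return_sequences=False, return_state=False):
    """Scalar LSTM with fixed, made-up weights; mimics only the return convention."""
    w, u, b = 0.5, 0.3, 0.1  # shared by all gates to keep the sketch short (assumption)
    h, c = 0.0, 0.0
    hidden_states = []
    for x in inputs:
        z = w * x + u * h + b
        f = i = o = sigmoid(z)        # forget, input and output gates
        c = f * c + i * math.tanh(z)  # new cell state
        h = o * math.tanh(c)          # new hidden state
        hidden_states.append(h)
    output = hidden_states if return_sequences else hidden_states[-1]
    if return_state:
        return output, h, c           # output, last hidden state, last cell state
    return output


inputs = [1.0, 2.0, 3.0, 4.0, 5.0]                        # five time steps
h5 = toy_lstm(inputs)                                     # h5
all_h = toy_lstm(inputs, return_sequences=True)           # h1, ..., h5
out, state_h, state_c = toy_lstm(inputs, return_state=True)   # h5, h5, c5
seq, last_h, last_c = toy_lstm(inputs, return_sequences=True, return_state=True)
print(len(all_h), out == state_h, seq[-1] == last_h)
```

With return_state=True and return_sequences=False, the output and the returned hidden state are the same value, which is why h5 appears twice in the table.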

Question: in the encoder-decoder architecture, what role does the encoder play, and what information does it need to pass to the decoder?
Answer: the encoder needs to encode the time series Input_1 (the input in the figure above) into vectors that represent the information in Input_1, namely the encoder_states made up of h5 and c5.

For a simple classification task, the encoder alone is enough: take h5 and pass it to a Dense layer for classification.

Once this is clear, the effect of the Boolean values of return_sequences and return_state is easy to follow in the code.

The comments in the following code use the same notation as the figure above to help you follow along.

# Define an input sequence and process it.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens)) # It's a fine day today
encoder = keras.layers.LSTM(latent_dim, return_state=True) #h5,h5,c5 
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c] # h5,c5 

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens)) # <start> Today is a nice day

# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.

# h1-h6,h6,c6
decoder_lstm = keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
# h1-h6
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
# Each of h1-h6 is classified in turn
decoder_dense = keras.layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)

5 Train the model

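model.compile(
    optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"]
)
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data, batch_size=batch_size, epochs=epochs, validation_split=0.2,
)
# Save model
model.save("s2s")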

6 Run inference (sampling)

In the prediction stage, given a sequence, how do we feed it through the encoder-decoder model?

Answer: feed the sequence to the encoder to obtain encoder_states, which represent the input sequence. These encoder_states are passed to the decoder as its initial_state, and '\t' is the decoder's first input.

As before, the comments in the following code use the same notation as the example above.

# Define sampling models
# Restore the model and construct the encoder and decoder.
model = keras.models.load_model("s2s")

# The encoder side generates an encoder_states h5,c5
encoder_inputs = model.input[0]  # input_1
encoder_outputs, state_h_enc, state_c_enc = model.layers[2].output  # lstm
encoder_states = [state_h_enc, state_c_enc]
encoder_model = keras.Model(encoder_inputs, encoder_states)

# The decoder side generates a decoder_outputs h1,h2,h3,h4,h5,h6
decoder_inputs = model.input[1]  # input_2
decoder_state_input_h = keras.Input(shape=(latent_dim,), name="input_3")
decoder_state_input_c = keras.Input(shape=(latent_dim,), name="input_4")
# In the prediction stage, decoder_states_inputs is initialized
# with the encoder_states produced by the encoder.
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_lstm = model.layers[3] # lstm_1
# h1-h6,h6,c6
decoder_outputs, state_h_dec, state_c_dec = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs
)
decoder_states = [state_h_dec, state_c_dec]
decoder_dense = model.layers[4] # dense
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states
)

# Reverse-lookup token index to decode sequences back to
# something readable.
# index: char
reverse_input_char_index = dict((i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict((i, char) for char, i in target_token_index.items())

def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index["\t"]] = 1.0

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ""
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if sampled_char == "\n" or len(decoded_sentence) > max_decoder_seq_length:
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0

        # Update states
        states_value = [h, c]
    return decoded_sentence
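
for i in range(20):
    # Pick a random sample; len(input_texts) stays in range even if the
    # file has fewer than num_samples lines.
    seq_index = np.random.randint(len(input_texts))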
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index : seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print("Input sentence:", input_texts[seq_index])
    print("Decoded sentence:", decoded_sentence)
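The sampling loop in decode_sequence can also be tried without a trained model. Below is a minimal sketch of the same greedy loop: greedy_decode and fake_step are hypothetical names, and fake_step stands in for decoder_model.predict by replaying a fixed string, whereas a real decoder would use the previous character and the LSTM states.

```python
def greedy_decode(step_fn, initial_state, start_char="\t", end_char="\n", max_len=20):
    """Feed each predicted character back in until end_char or max_len is reached."""
    state = initial_state
    current = start_char
    decoded = ""
    while True:
        next_char, state = step_fn(current, state)
        decoded += next_char
        # Same exit condition as decode_sequence: stop character or length limit.
        if next_char == end_char or len(decoded) > max_len:
            break
        current = next_char  # the prediction becomes the next decoder input
    return decoded


def fake_step(prev_char, state):
    """Hypothetical decoder step: replays state["text"] and ignores prev_char."""
    text, pos = state["text"], state["pos"]
    return text[pos % len(text)], {"text": text, "pos": pos + 1}


result = greedy_decode(fake_step, {"text": "Va !\n", "pos": 0})
print(repr(result))
```

Because the loop always takes the single most likely character, it is greedy decoding. The length check guards against a decoder that never emits the end character.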

In this example, the encoder and the decoder have the same structure: both are LSTMs, and only their targets differ. Once you understand what the Boolean values of return_sequences and return_state return in Keras, implementing seq2seq with LSTM is straightforward.

Tags: Python Deep Learning NLP keras

Posted by sarah on Sun, 08 May 2022 07:29:41 +0300