Deep learning recommendation models: related papers and PyTorch implementations

Note: the code in this article was written by me based on the papers. Some details are inevitably left out and errors are hard to avoid, so it is recommended to read the papers' official code for the full model details. If you find errors in the code, please point them out in the comments.

I. AutoRec

1.1 paper

Paper title: AutoRec: Autoencoders Meet Collaborative Filtering, 2015 WWW

Paper link: <AutoRec: Autoencoders Meet Collaborative Filtering>

The first attempt to apply deep learning to a recommender system.

It performs collaborative filtering (CF) with an autoencoder and surpasses earlier CF methods on the MovieLens and Netflix datasets. The model structure is shown in the figure below.

CF infers a user's interests from the user-item interaction matrix (each row holds user u's ratings of the different items, each column holds the different users' ratings of item i), and then recommends items to the user according to those interests.

Each row of the interaction matrix (User-based AutoRec)

or each column (Item-based AutoRec)

is fed into the autoencoder, which is optimized with the following loss function (taking Item-based AutoRec as an example):
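Reconstructed from the paper's notation, the objective is

$$ \min_{\theta} \sum_{i=1}^{n} \left\| \mathbf{r}^{(i)} - h\big(\mathbf{r}^{(i)}; \theta\big) \right\|_{\mathcal{O}}^{2} + \frac{\lambda}{2} \left( \|W\|_{F}^{2} + \|V\|_{F}^{2} \right), \qquad h(\mathbf{r}; \theta) = f\big(W \cdot g(V\mathbf{r} + \boldsymbol{\mu}) + \mathbf{b}\big), $$

where $\|\cdot\|_{\mathcal{O}}$ means the error is computed only over the observed ratings.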

where h(·) is the reconstruction produced by the autoencoder and r is the true item rating column vector.

The u-th element of the autoencoder's output is the model's predicted rating of user u for the current item i. Feeding in the different item column vectors one by one yields user u's predicted ratings for all items, which are then used to make recommendations.

(

One might wonder why the output counts as a rating; at first glance the model seems to do nothing more than take in one vector and output another.

First, recall what an autoencoder does. An autoencoder is an unsupervised learning model made of two parts: an encoder and a decoder.

The encoder performs dimensionality reduction and compression on the input; its purpose is to extract the important information in the input and produce a code for it (the encoder part of the figure).

The decoder tries to reconstruct the original input from the encoder's code. If the encoder extracted good information, the decoder's output will be very close to the original input (the decoder part of the figure).

How the encoder is pushed to extract this important information is governed by the loss function, which in this paper is the one shown above.

In the autoencoder literature it is also called the 'reconstruction loss'.

Training with this loss, the encoder gradually learns to extract the important information from the input.

)

1.2 code

# coding=UTF-8
import torch
import torch.nn as nn
import torch.nn.functional as F


# Recommendation model based on an autoencoder
class AutoRec(nn.Module):
    # Network initialization layer
    def __init__(self, feature_dim, hidden_dim):
        # feature_dim: user / item scoring vector dimension
        # hidden_dim: hidden layer dimension
        super(AutoRec, self).__init__()
        self.feature_dim = feature_dim
        self.hidden_dim = hidden_dim

        # encoder
        self.encoder = nn.Sequential(
            nn.Linear(in_features=self.feature_dim, out_features=self.hidden_dim, bias=True),
            # The paper says that the effect of ReLU activation is the worst
            # sigmoid is used
            nn.Sigmoid(),
        )

        # decoder
        self.decoder = nn.Sequential(
            nn.Linear(in_features=self.hidden_dim, out_features=self.feature_dim, bias=True),
            nn.Sigmoid(),
        )

        # Initialize network layer
        self.init_layer()

    def init_layer(self):
        # Traverse all network layers
        for layer in self.modules():
            if isinstance(layer, nn.Linear):
                layer.bias.data.fill_(1)

    # Forward transfer and build calculation diagram
    def forward(self, x):
        # x is the entered user rating vector or item rating vector
        # Code first
        encoder_x = self.encoder(x)
        # Then decode
        decoder_x = self.decoder(encoder_x)
        # Return results
        return decoder_x


# Define loss function
class AutoRecLoss(nn.Module):
    # initialization
    def __init__(self):
        super(AutoRecLoss, self).__init__()

    # Calculate the loss
    def forward(self, r, pre_r, w, v):
        # Mean-squared reconstruction error (w and v are unused here;
        # L2 regularization is applied via the optimizer's weight_decay below)
        return F.mse_loss(pre_r, r)


# # PyTorch optimizers provide L2 regularization via weight_decay
# auto_rec = AutoRec(100, 50)
# # Regularization balance factor
# lambda_ = float(input())
# optimizer = torch.optim.Adam(auto_rec.parameters(), lr=0.001, weight_decay=lambda_ / 2)

# Model test
if __name__ == '__main__':
    # Randomly generate a user item scoring matrix
    x = torch.randn(50, 100)
    # Input each user rating vector into AutoRec
    auto_rec = AutoRec(100, 50)
    output = auto_rec(x)
    print(output)
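    # --- A minimal training sketch (my own illustration, not the paper's code) ---
    # Only observed ratings contribute to the loss (the paper's ||.||_O term),
    # and the optimizer's weight_decay supplies the L2 regularization.
    mask = (torch.rand(50, 100) > 0.5).float()  # toy mask: 1 where a rating is observed
    optimizer = torch.optim.Adam(auto_rec.parameters(), lr=0.001, weight_decay=1e-4)
    for epoch in range(10):
        optimizer.zero_grad()
        pred = auto_rec(x)
        # Masked mean-squared reconstruction error
        loss = (((pred - x) * mask) ** 2).sum() / mask.sum()
        loss.backward()
        optimizer.step()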

II. Deep Crossing

2.1 paper

Paper title: Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features, 2016 KDD

Paper link: <Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features>

Deep learning is used to automatically learn effective feature representations, replacing time-consuming manual feature engineering.

Deep Crossing learns effective feature representations from the input data and uses the learned features for the subsequent prediction.

Deep Crossing was proposed to solve the following problem: when a user searches for keywords in Microsoft Bing, how should the system recommend the corresponding ads for those keywords? The paper starts by explaining the parties involved in this process.

First, the raw features fed into the model (such as user age, search term, search time ...) are preprocessed, for example one-hot encoded, before being passed to the model. The model structure is shown in the figure below.

The raw input features are first embedded.

(

Embedding is an important idea in deep learning.

In deep learning, embedding refers to representing an object with a vector in a more meaningful way. For example, word embedding represents a word as a vector so that semantic relationships between words are reflected in Euclidean space; graph embedding represents a node (or a whole graph) as a vector so that the connectivity between nodes is also reflected in Euclidean space.

Concretely, it is implemented as a fully connected layer, i.e., a linear transformation whose parameters are learned during training.

It also serves to densify sparse vectors, which is exactly its role here.

)
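As a tiny illustration (my own example, not from the paper), embedding a one-hot vector with a linear layer is equivalent to looking up a row of an nn.Embedding table:

import torch
import torch.nn as nn

vocab_size, embedding_dim = 5, 3
one_hot = torch.eye(vocab_size)[2].unsqueeze(0)      # one-hot vector for index 2

linear = nn.Linear(vocab_size, embedding_dim, bias=False)
table = nn.Embedding(vocab_size, embedding_dim)
table.weight.data = linear.weight.data.T             # share the same parameters

print(linear(one_hot))                               # linear transform of the one-hot vector
print(table(torch.tensor([2])))                      # equivalent row lookup

In practice nn.Embedding is usually preferred for sparse ID features, since it skips the explicit multiplication with a one-hot vector.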

The outputs of the individual embeddings are then concatenated into a new feature vector.

This feature vector is fed into fully connected layers with residual connections to combine (cross) different features. This is also where the name Deep Crossing comes from: earlier methods expressed feature combinations with explicit formulas (e.g., FM), whereas Deep Crossing lets a neural network learn the combinations.

CTR prediction is then carried out, and a cross-entropy loss is used to optimize the network parameters.

2.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


# Residual fully connected block used for feature crossing in Deep Crossing
class ResidualBlock(nn.Module):
    def __init__(self, feature_dim, out_dim, use_residual=True):
        super(ResidualBlock, self).__init__()
        # Whether to use residual connection
        self.use_residual = use_residual
        # Enter the dimension of the feature
        self.feature_dim = feature_dim
        # Dimension of output feature of middle hidden layer
        self.out_dim = out_dim
        # Full connection layer for feature crossing
        self.feature_interaction_layer = nn.Sequential(
            # The residual in the paper is used in the full connection layer
            # Two linear transformation layers
            # Offset used
            # The activation function is ReLU
            # In Figure 2 of the paper
            nn.Linear(in_features=feature_dim, out_features=out_dim, bias=True),
            nn.ReLU(True),
            nn.Linear(in_features=out_dim, out_features=feature_dim, bias=True),
        )

    # Forward transfer and establish calculation diagram
    def forward(self, x):
        # x is the input feature
        # Obtained by Embedding+concat
        residual_out = self.feature_interaction_layer(x)
        # If residual connection is used
        if self.use_residual:
            # Make residual connection
            residual_out = residual_out + x
        # activation
        # Or ReLU function
        # See Figure 2 in the paper
        residual_out = F.relu(residual_out)
        return residual_out


class DeepCrossing(nn.Module):
    # Initialize network layer
    # See Figure 1 of the paper
    # Some of the input features need to be embedded
    # Other features are not embedded
    def __init__(
            self,
            embedding_layer_num=5,
            residual_layer_num=3,
            need_embd_dim=100,
            without_embd_dim=10,
            embedding_dim=50,
            output_dim=25,
    ):

        super(DeepCrossing, self).__init__()
        # Number of embedded layers
        self.embedding_layer_num = embedding_layer_num
        # Number of characteristic cross layers
        self.multiple_residual_units_num = residual_layer_num
        # Enter the dimension of the feature to be embedded in
        self.input_dim = need_embd_dim
        # Dimensions of features in the input that do not need to be embedded
        self.other_input_dim = without_embd_dim
        # Dimension of embedded vector
        self.embedding_dim = embedding_dim
        # multiple_residual_units dimension of hidden layer output in the middle
        self.output_dim = output_dim

        # Embedding layers (one independent linear embedding per feature field,
        # rather than one shared layer appended several times)
        self.embedding_layer = nn.ModuleList()
        for i in range(self.embedding_layer_num):
            self.embedding_layer.append(nn.Linear(self.input_dim, self.embedding_dim, bias=True))

        # Multiple Residual Units, characteristic cross layer
        self.multiple_residual_units = nn.ModuleList()
        for i in range(self.multiple_residual_units_num):
            self.multiple_residual_units.append(
                ResidualBlock(self.embedding_layer_num * self.embedding_dim + self.other_input_dim, self.output_dim))
        # Note this is handled differently from the embedding layers:
        # in Figure 1 of the paper the residual units are stacked vertically,
        # so they are unpacked into a Sequential and applied one after another,
        # while the embedding layers sit side by side and can stay in a ModuleList
        self.multiple_residual_units = nn.Sequential(*self.multiple_residual_units)

        # scoring layer
        # CTR result prediction
        # Second classification problem
        self.scoring_layer = nn.Linear(in_features=self.embedding_layer_num * self.embedding_dim + self.other_input_dim, out_features=2, bias=False)

    # Forward transfer and establish calculation diagram
    def forward(self, x_list, x):
        # The number of features to be embedded must be equal to the number of embedded layers
        assert len(x_list) == self.embedding_layer_num
        # Embed the features that need to be embedded
        embedding_result = []
        for i in range(self.embedding_layer_num):
            temp_result = self.embedding_layer[i](x_list[i])
            # Note: the paper's embedding layer clips negative values, formula (2) in the paper
            temp_result = torch.clamp(temp_result, min=0.0)
            embedding_result.append(temp_result)
        # Connect embedded results
        embedding_result = torch.cat(embedding_result, dim=-1)
        embedding_result = torch.cat([embedding_result, x], dim=-1)

        # Perform feature crossover
        feature_interaction = self.multiple_residual_units(embedding_result)
        
        # Result prediction
        out = self.scoring_layer(feature_interaction)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features that need to be embedded
    x_list = [[0, 1], [1, 0]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    # Randomly generate features that do not need to be embedded
    x = [16, 14, 13]
    x = torch.tensor(x, dtype=torch.float32)
    deep_crossing = DeepCrossing(
        embedding_layer_num=2,
        residual_layer_num=1,
        need_embd_dim=2,
        without_embd_dim=3,
        embedding_dim=5,
        output_dim=10
    )
    output = deep_crossing(x_list, x)
    print(output)

III. NeuralCF

3.1 paper

Paper title: Neural Collaborative Filtering, 2017 WWW

Paper link: <Neural Collaborative Filtering>

NCF (Neural network-based Collaborative Filtering) recasts matrix factorization in the form of a neural network.

Earlier matrix factorization (MF) methods factorize the user-item interaction matrix to obtain latent vectors for users and items, then use the inner product of these latent vectors to measure how much a user likes an item and make recommendations.

The authors argue that the inner product limits MF's expressiveness and cannot fully capture the interaction between user and item latent vectors. The paper therefore uses a feed-forward neural network to learn the interaction between the user and item latent vectors instead of the inner product. The model structure is shown in the figure below.

 

The user and item input vectors are each embedded (which plays the same role as looking up the latent vectors in MF); the two embedding vectors are then concatenated and passed through the stacked neural CF layers, whose final layer produces the predicted score.

The loss between the model's prediction and the ground truth is then computed to update the network parameters; the loss function is the cross-entropy loss.

The model expression is

The above is the standard NCF. The paper also proposes an extended, fused model (NeuMF in the paper; implemented as GMF in the code below); what changes is the way features are crossed.

The structure of the fused model is shown in the figure.

It integrates two feature-crossing approaches. On the left, the Hadamard product of the embedding results is taken (the two vectors are multiplied element by element).

On the right is the NCF described above, which uses a multi-layer feed-forward network for feature crossing. The outputs of the two branches are then concatenated for CTR prediction, and the model expression is
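For reference, the fused prediction (reconstructed from the paper) is

$$ \hat{y}_{ui} = \sigma\!\left( \mathbf{h}^{\top} \begin{bmatrix} \boldsymbol{\phi}^{GMF} \\ \boldsymbol{\phi}^{MLP} \end{bmatrix} \right), \qquad \boldsymbol{\phi}^{GMF} = \mathbf{p}_u \odot \mathbf{q}_i, $$

where $\mathbf{p}_u$ and $\mathbf{q}_i$ are the user and item embeddings of the GMF branch and $\boldsymbol{\phi}^{MLP}$ is the output of the MLP branch.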

3.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


class NCF(nn.Module):
    # initialization
    def __init__(self, user_feature_dim, item_feature_dim, embedding_dim, output_dim_list):
        super(NCF, self).__init__()
        # Dimension of user characteristics
        self.user_feature_dim = user_feature_dim
        # Dimension of item characteristics
        self.item_feature_dim = item_feature_dim
        # Dimension of embedded vector
        self.embedding_dim = embedding_dim
        # Feature cross layer output dimension list
        self.output_dim_list = output_dim_list
        # User feature embedding layer
        self.user_embedding = nn.Linear(self.user_feature_dim, self.embedding_dim)
        # Item feature embedding layer
        self.item_embedding = nn.Linear(self.item_feature_dim, self.embedding_dim)
        # Characteristic cross layer
        self.neural_cf_layers = nn.ModuleList()
        # Add full connection layer
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim * 2
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The activation function used in the paper is ReLU, which is on page 4 of the paper
                nn.ReLU(),
            )
            self.neural_cf_layers.append(layer)
            # The input dimension of updating the next linear layer is the output dimension of the current linear layer
            input_dim = self.output_dim_list[i]
        # Expand in order
        self.neural_cf_layers = nn.Sequential(*self.neural_cf_layers)

        # Output layer, two classification
        # The last output layer of the paper does not use bias
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=False)

    # Forward transfer and establish calculation diagram
    def forward(self, user_feature, item_feature):
        # The input user features and item features are embedded respectively
        user_embedding = self.user_embedding(user_feature)
        item_embedding = self.item_embedding(item_feature)
        # connect
        feature = torch.cat([user_embedding, item_embedding], dim=-1)
        # Perform feature crossover
        feature_interaction = self.neural_cf_layers(feature)
        out = self.output_layer(feature_interaction)
        out = torch.sigmoid(out)
        return out


# See Figure 3 in the paper
class GMF(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            user_feature_dim,
            item_feature_dim,
            embedding_dim,
            output_dim_list,
    ):
        super(GMF, self).__init__()
        # User's characteristic dimension
        self.user_feature_dim = user_feature_dim
        # Characteristic dimension of goods
        self.item_feature_dim = item_feature_dim
        # Dimension of embedded vector
        self.embedding_dim = embedding_dim
        # Output dimension of each layer of feature cross layer
        self.output_dim_list = output_dim_list
        # MLP embedded network
        self.mlp_user_embedding_layer = nn.Linear(self.user_feature_dim, self.embedding_dim, bias=True)
        self.mlp_item_embedding_layer = nn.Linear(self.item_feature_dim, self.embedding_dim, bias=True)
        # Embedded MF network
        self.mf_user_embedding_layer = nn.Linear(self.user_feature_dim, self.embedding_dim, bias=True)
        self.mf_item_embedding_layer = nn.Linear(self.item_feature_dim, self.embedding_dim, bias=True)
        # Characteristic cross layer
        self.neural_cf_layers = nn.ModuleList()
        # Add full connection layer
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim * 2
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The activation function used in the paper is ReLU, which is on page 4 of the paper
                nn.ReLU(),
            )
            self.neural_cf_layers.append(layer)
            # The input dimension of updating the next linear layer is the output dimension of the current linear layer
            input_dim = self.output_dim_list[i]
        # Expand in order
        self.neural_cf_layers = nn.Sequential(*self.neural_cf_layers)
        # Output layer for CTR prediction
        self.output_layer = nn.Linear(self.embedding_dim + self.output_dim_list[-1], 2, bias=False)

    # Forward transfer and establish calculation diagram
    def forward(self, user_feature, item_feature):
        # Embed
        mlp_user_embedding = self.mlp_user_embedding_layer(user_feature)
        mlp_item_embedding = self.mlp_item_embedding_layer(item_feature)
        mf_user_embedding = self.mf_user_embedding_layer(user_feature)
        mf_item_embedding = self.mf_item_embedding_layer(item_feature)
        mlp_input = torch.cat([mlp_user_embedding, mlp_item_embedding], dim=-1)
        mlp_result = self.neural_cf_layers(mlp_input)
        # The Hadamard product is the element-wise product;
        # in PyTorch the ordinary * operator on tensors computes it
        gmf_result = mf_user_embedding * mf_item_embedding
        # Concatenate the two branches
        final_result = torch.cat([gmf_result, mlp_result], dim=-1)
        out = self.output_layer(final_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate user characteristics and item characteristics
    user_feature = torch.randn(1, 10)
    item_feature = torch.randn(1, 10)
    ncf = NCF(user_feature_dim=10, item_feature_dim=10, embedding_dim=20, output_dim_list=[10, 5, 4])
    output = ncf(user_feature, item_feature)
    print(output)
    gmf = GMF(user_feature_dim=10, item_feature_dim=10, embedding_dim=20, output_dim_list=[10, 5, 4])
    output = gmf(user_feature, item_feature)
    print(output)

IV. PNN

4.1 paper

Paper title: Product-based Neural Networks for User Response Prediction, 2016 ICDM

Paper link: <Product-based Neural Networks for User Response Prediction>

The input data used in the recommendation system are generally high-dimensional sparse vectors

 

Earlier logistic regression and FM methods rely on manual feature engineering to extract the high-order features hidden in the data. Neural networks have drawn attention in recent years because they can automatically learn effective feature representations from data, and methods based on embeddings plus a feed-forward neural network (multilayer perceptron, MLP) were proposed to learn the combination information between different features. However, such methods cannot capture the interactions between features from different feature fields, so this paper proposes PNN (Product-based Neural Network) to learn the feature-combination information across feature fields.

PNN model structure is shown in the figure

The paper introduces the model from top to bottom, which reads a bit awkwardly; here it is described from bottom to top.

Input: the input consists of sparse feature vectors from different feature fields, such as age, address, purchase date ..., encoded as one-hot or multi-hot vectors.

Embedding: each field is embedded separately. Note that all embedding vectors share the same dimension so that the later computation works; for example, the sparse feature vectors of the different feature fields might all be mapped to 500-dimensional embedding vectors.

Product: this layer is the model's innovation. Earlier models simply concatenated the embedding vectors and fed them into a feed-forward network; here the author splits the layer into a z part and a p part to capture combinations of features from different feature fields. The z part captures linear feature combinations,

while the p part captures nonlinear feature-combination information. Depending on how p is computed, the author proposes two variants of PNN: IPNN and OPNN.

In IPNN, the p part uses inner products to capture the interactions between different features.

In OPNN, the p part uses outer products to capture the interactions between different features.

The inner product of two features is a single number and can be placed directly in the p part, but the outer product of two features is a matrix. The author's treatment is to sum all the outer-product matrices and then flatten (vectorize) the result so that it can be placed in the p part.
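In formula form (as I recall it from the paper), this superposition amounts to

$$ \mathbf{p} = f_{\Sigma} f_{\Sigma}^{\top}, \qquad f_{\Sigma} = \sum_{i=1}^{N} \mathbf{f}_{i}, $$

where $\mathbf{f}_{i}$ is the embedding vector of the $i$-th field.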

(Note: the z part keeps the N field embeddings themselves, while the p part contains the N(N-1)/2 pairwise product results.)

The data of the z part and the p part are then each passed through a fully connected layer for a linear transformation.

The transformed results are concatenated and fed into the subsequent feed-forward network to further combine the features. The rest is the conventional process: ReLU activations in the hidden layers and a sigmoid at the last layer.

 

The network parameters are updated with a cross-entropy loss.

4.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


# Fig.1 in the paper
class PNN(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            feature_dim,
            field_num,
            embedding_dim,
            output_dim_list,
    ):
        super(PNN, self).__init__()
        # Dimensions of different domain characteristics entered
        self.feature_dim = feature_dim
        # Number of features in different domains
        self.field_num = field_num
        # Feature embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of different layers of feature cross layer
        self.output_dim_list = output_dim_list

        # Embedded layer
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            self.embedding_layer.append(nn.Linear(self.feature_dim, self.embedding_dim, bias=False))
        # Part z
        self.z = nn.Linear(self.field_num * self.embedding_dim, self.field_num * self.embedding_dim, bias=True)
        self.z.bias = nn.Parameter(torch.ones(self.field_num * self.embedding_dim))
        # Characteristic cross layer
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                # N data of part z + n*(n-1)/2 data of part p
                input_dim = self.field_num * self.embedding_dim + int(self.field_num * (self.field_num - 1) / 2)
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(True),
            )
            self.hidden_layer.append(layer)
            # Updating the input dimension of the next layer is the output dimension of the current layer
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, x_list):
        # Embed
        # The length of the feature list must be equal to the number of feature fields
        assert len(x_list) == self.field_num
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # z part: linear transformation
        embedding_feature = torch.cat(embedding_result, dim=-1)
        z_result = self.z(embedding_feature)
        # p part: pairwise inner products
        embedding_feature = torch.stack(embedding_result, dim=0)
        # Inner-product matrix between all field embeddings
        innear_product = torch.matmul(embedding_feature, embedding_feature.T)
        # Keep only the entries above the diagonal (pairs i < j);
        # indexing avoids silently dropping inner products that happen to be zero
        row_idx, col_idx = torch.triu_indices(self.field_num, self.field_num, offset=1)
        innear_result = innear_product[row_idx, col_idx].reshape(-1)
        # Enter the characteristics of the following cross layer
        input_feature = torch.cat([z_result, innear_result], dim=-1)
        # Characteristic cross result
        interaction_result = self.hidden_layer(input_feature)
        out = self.output_layer(interaction_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features of different feature domains
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    pnn = PNN(feature_dim=3, field_num=3, embedding_dim=6, output_dim_list=[5, 4])
    output = pnn(x_list)
    print(output)

V. Wide & Deep

5.1 paper

Paper title: Wide & Deep Learning for Recommender Systems, 2016 RecSys

Paper link: <Wide & Deep Learning for Recommender Systems>

A model proposed by Google in 2016 for app recommendation in the Google Play store.

Wide refers to the model's memorization ability (a linear model that, like logistic regression, produces recommendations directly from the input features, as if the model had memorized those feature combinations), and deep refers to the model's generalization ability (a neural network that captures high-order information between different feature combinations, so that reasonable recommendations can be produced even for feature combinations rarely seen before).

The model structure is shown in the figure

The wide part feeds the features directly into a linear model.

In the actual application, a cross-product transformation is applied to the input features.

(Although a formula is given, the transformation is designed manually in advance: a chosen feature combination takes the value 1 and all others take 0.)
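For reference, the cross-product transformation defined in the paper is

$$ \phi_{k}(\mathbf{x}) = \prod_{i=1}^{d} x_{i}^{c_{ki}}, \qquad c_{ki} \in \{0, 1\}, $$

where $c_{ki} = 1$ if the $i$-th feature is part of the $k$-th transformation and $0$ otherwise.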

In the deep part, the features are first embedded, and the embedding vectors are then fed into a feed-forward neural network to capture information about different feature combinations.

Finally, the outputs of the two parts are concatenated and fed into the output layer for prediction, and the network parameters are updated with a cross-entropy loss.

 

5.2 code

 

import torch
import torch.nn as nn
import torch.nn.functional as F


class WideDeep(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            deep_feature_dim,
            wide_feature_dim,
            embedding_dim,
            output_dim_list
    ):
        super(WideDeep, self).__init__()
        # Dimension of some features of deep
        self.deep_feature_dim = deep_feature_dim
        # Dimension of some features of wide
        self.wide_feature_dim = wide_feature_dim
        # Dimension of feature embedding vector
        self.embedding_dim = embedding_dim
        # Output dimension list of middle network layer in deep part
        self.output_dim_list = output_dim_list

        # Embedded layer
        self.embedding_layer = nn.Linear(deep_feature_dim, embedding_dim, bias=True)

        # deep part
        self.deep = nn.ModuleList()
        # Add full connection layer
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The activation function used in the paper is ReLU, which is shown in Figure 4 of the paper
                nn.ReLU(),
            )
            self.deep.append(layer)
            # The input dimension of updating the next linear layer is the output dimension of the current linear layer
            input_dim = self.output_dim_list[i]
        # Expand in order
        self.deep = nn.Sequential(*self.deep)
        # Output layer
        self.output_layer = nn.Linear(wide_feature_dim + self.output_dim_list[-1], 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, wide_feature, deep_feature):
        embedding_result = self.embedding_layer(deep_feature)
        deep_result = self.deep(embedding_result)
        final_result = torch.cat([wide_feature, deep_result], dim=-1)
        out = self.output_layer(final_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    wide_feature = torch.randn(1, 20)
    deep_feature = torch.randn(1, 30)
    wide_deep = WideDeep(wide_feature_dim=20, deep_feature_dim=30, embedding_dim=50, output_dim_list=[30, 15, 10])
    output = wide_deep(wide_feature, deep_feature)
    print(output)

VI. DCN

6.1 paper

Paper title: Deep & Cross Network for Ad Click Predictions, 2017 KDD

Paper link: <Deep & Cross Network for Ad Click Predictions>

The authors argue that the wide part above does not cross features sufficiently, and that its feature interactions are specified manually, which limits generalization; the wide part is therefore improved. The model structure is shown in the figure below.

 

The cross network replaces the original wide part; nothing else changes.

The expression of a cross layer is
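In the paper's notation,

$$ \mathbf{x}_{l+1} = \mathbf{x}_{0}\, \mathbf{x}_{l}^{\top} \mathbf{w}_{l} + \mathbf{b}_{l} + \mathbf{x}_{l} $$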

 

where $\mathbf{x}_{l}$ is the input of the current cross layer, $\mathbf{x}_{l+1}$ is its output, and $\mathbf{x}_{0}$ is the feature vector obtained by concatenating the embeddings at the bottom of the network (equivalent to explicit feature crossing plus a residual connection).

The calculation process is shown in the figure below

 

6.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


# Layers in cross layer
class CrossLayer(nn.Module):
    def __init__(self, input_dim):
        super(CrossLayer, self).__init__()
        self.weight = nn.Linear(input_dim, 1, bias=False)
        self.bias = nn.Parameter(torch.zeros(input_dim))

    def forward(self, x0, xi):
        interaction_out = self.weight(xi) * x0 + self.bias
        return interaction_out


# Sub modules in cross layer
class CrossLayerBlock(nn.Module):
    def __init__(self, input_dim, layer_num):
        super(CrossLayerBlock, self).__init__()
        # Number of cross layers
        self.layer_num = layer_num
        # Enter the dimension of the feature
        self.input_dim = input_dim
        self.layer = nn.ModuleList(CrossLayer(self.input_dim) for _ in range(self.layer_num))

    def forward(self, x0):
        xi = x0
        for i in range(self.layer_num):
            xi = xi + self.layer[i](x0, xi)
        return xi


# Figure 1 in the paper
class DCN(nn.Module):
    # Initialize network layer
    def __init__(self, input_dim, embedding_dim, cross_layer_num, deep_layer_num):
        super(DCN, self).__init__()
        # Enter the dimension of the feature
        self.input_dim = input_dim
        # Dimension of embedded vector
        self.embedding_dim = embedding_dim
        # Number of cross layers
        self.cross_layer_num = cross_layer_num
        # Number of deep layers
        self.deep_layer_num = deep_layer_num

        # Embedded layer
        self.embedding_layer = nn.Linear(self.input_dim, self.embedding_dim, bias=False)
        # cross layer
        self.cross_layer = CrossLayerBlock(self.embedding_dim, self.cross_layer_num)
        # deep layer
        self.deep_layer = nn.ModuleList()
        for i in range(self.deep_layer_num):
            layer = nn.Sequential(
                nn.Linear(self.embedding_dim, self.embedding_dim, bias=True),
                nn.ReLU(),
            )
            self.deep_layer.append(layer)
        self.deep_layer = nn.Sequential(*self.deep_layer)
        # Output layer
        self.output_layer = nn.Linear(self.embedding_dim * 2, 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, x):
        # Embedding features
        x_embedding = self.embedding_layer(x)
        # cross layer
        cross_result = self.cross_layer(x_embedding)
        # deep layer
        deep_result = self.deep_layer(x_embedding)
        temp_result = torch.cat([cross_result, deep_result], dim=-1)
        out = self.output_layer(temp_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    x = torch.randn(1, 10)
    dcn = DCN(10, 20, 2, 2)
    output = dcn(x)
    print(output)

VII. FNN

7.1 paper

Paper title: Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction, 2016 ECIR

Paper link: <Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction>

The model structure is shown in the figure below

 

The input consists of features from different fields; compare the FM expression in the figure below with the Dense Real Layer of FNN.

An FM model is trained first, and the corresponding FM parameters are then used to initialize the Dense Real Layer of the FNN.

The resulting outputs are then concatenated and fed into the feed-forward neural network.
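As a rough sketch of that FM initialization (my own illustration; fm_w and fm_v are hypothetical stand-ins for the first-order weights and latent vectors of a pre-trained FM, not the paper's code):

import torch
import torch.nn as nn

feature_num, latent_dim = 10, 4                 # hypothetical sizes
fm_w = torch.randn(feature_num)                 # stand-in for pre-trained first-order FM weights
fm_v = torch.randn(feature_num, latent_dim)     # stand-in for pre-trained FM latent vectors

# Dense Real Layer: each feature i contributes (w_i, v_i), i.e. 1 + latent_dim outputs
dense_layer = nn.Linear(feature_num, feature_num * (1 + latent_dim), bias=True)

with torch.no_grad():
    init = torch.zeros(feature_num * (1 + latent_dim), feature_num)
    for i in range(feature_num):
        row = i * (1 + latent_dim)
        init[row, i] = fm_w[i]                              # first-order weight of feature i
        init[row + 1: row + 1 + latent_dim, i] = fm_v[i]    # latent vector of feature i
    dense_layer.weight.copy_(init)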

 

 

 

 

The final prediction is produced by the output layer of the feed-forward network,

and the network is again trained with a cross-entropy loss.

 

The author also presents SNN, a plain feed-forward network whose bottom layer is pre-trained with an RBM (restricted Boltzmann machine) or a DAE (denoising autoencoder) instead of being initialized from FM parameters as in FNN.

 

7.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


class FNN(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            dense_input_dim,
            dense_output_dim,
            output_dim_list,
    ):
        super(FNN, self).__init__()
        # Input dimension of the dense layer
        self.dense_input_dim = dense_input_dim
        # Output dimension of the dense layer
        self.dense_output_dim = dense_output_dim
        # List of output dimensions of the hidden layers
        self.output_dim_list = output_dim_list

        # dense layer (in the paper it is initialized from a pre-trained FM)
        self.dense_layer = nn.Linear(self.dense_input_dim, self.dense_output_dim, bias=True)
        # hidden layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.dense_output_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.Tanh()
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # # Initialize the parameters of FNN with the pre-trained FM parameters
        # self.__init_layer()
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # # Initialize FNN parameters
    # def __init_layer(self, w, b):
    #     # Traverse the network layer of FNN
    #     for m in self.modules():
    #         if isinstance(m, nn.Linear):
    #             m.weight.data = w
    #             m.bias.data = b
    #             break

    # Forward transfer and establish calculation diagram
    def forward(self, x):
        dense_result = self.dense_layer(x)
        out = self.hidden_layer(dense_result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    x = torch.randn(1, 10)
    fnn = FNN(dense_input_dim=10, dense_output_dim=20, output_dim_list=[15, 10, 5])
    output = fnn(x)
    print(output)

VIII. DeepFM

8.1 paper

Paper title: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017 IJCAI

Paper link: <DeepFM: A Factorization-Machine based Neural Network for CTR Prediction>

DeepFM improves the wide part of the Wide & Deep model by replacing it with an FM layer. The model structure is

 

The inputs are one-hot sparse vectors of the different feature fields (gender, age, address ...), which are passed through the Embedding layer to obtain embedding vectors. That is the conventional operation; the interesting part is the FM layer proposed by the model.

 

As in the product layer of PNN, the '+' here denotes the linear combination of the features, and the 'x' denotes the inner products between different features (m(m-1)/2 results); the two are concatenated and fed into the output layer.
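For reference, the overall prediction of DeepFM adds the FM component and the deep component:

$$ \hat{y} = \mathrm{sigmoid}\big( y_{FM} + y_{DNN} \big). $$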

 

The author also compares DeepFM with FNN, PNN, and Wide & Deep.

 

 

8.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepFM(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            field_num,
            feature_dim,
            embedding_dim,
            output_dim_list,
    ):
        super(DeepFM, self).__init__()
        # Number of features in different domains
        self.field_num = field_num
        # Dimensions of different domain characteristics entered
        self.feature_dim = feature_dim
        # Feature embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of different layers of feature cross layer
        self.output_dim_list = output_dim_list

        # Embedded layer
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            self.embedding_layer.append(nn.Linear(self.feature_dim, self.embedding_dim, bias=False))
        # FM layer
        self.fm_layer = nn.Linear(self.field_num * self.embedding_dim, 1, bias=False)
        # hidden layer
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.field_num * self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(True),
            )
            self.hidden_layer.append(layer)
            # Updating the input dimension of the next layer is the output dimension of the current layer
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        input_dim = 1 + int(self.field_num * (self.field_num - 1) / 2) + self.output_dim_list[-1]
        self.output_layer = nn.Linear(input_dim, 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, x_list):
        # Embed
        # The length of the feature list must be equal to the number of feature fields
        assert len(x_list) == self.field_num
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # Linear (first-order) part of FM
        embedding_feature = torch.cat(embedding_result, dim=-1)
        l_result = self.fm_layer(embedding_feature)
        # Hidden (deep) part
        hidden_result = self.hidden_layer(embedding_feature)
        # Second-order part of FM: pairwise inner products
        embedding_feature = torch.stack(embedding_result, dim=0)
        # Inner-product matrix between all field embeddings
        innear_product = torch.matmul(embedding_feature, embedding_feature.T)
        # Keep only the entries above the diagonal (pairs i < j);
        # indexing avoids silently dropping inner products that happen to be zero
        row_idx, col_idx = torch.triu_indices(self.field_num, self.field_num, offset=1)
        innear_result = innear_product[row_idx, col_idx].reshape(-1)
        fm_result = torch.cat([l_result, innear_result], dim=-1)
        input_feature = torch.cat([fm_result, hidden_result], dim=-1)
        out = self.output_layer(input_feature)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features of different feature domains
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    deepFM = DeepFM(feature_dim=3, field_num=3, embedding_dim=6, output_dim_list=[5, 4])
    output = deepFM(x_list)
    print(output)

 

IX. NFM

9.1 paper

Paper title: Neural Factorization Machines for Sparse Predictive Analytics, 2017 SIGIR

Paper link: <Neural Factorization Machines for Sparse Predictive Analytics>

NFM uses a neural network to overcome FM's limited expressive power and its inability to capture high-order feature interactions.

The expression for FM is
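In standard notation, it reads

$$ \hat{y}_{FM}(\mathbf{x}) = w_{0} + \sum_{i=1}^{n} w_{i} x_{i} + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_{i}, \mathbf{v}_{j} \rangle\, x_{i} x_{j}. $$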

 

NFM replaces the second-order feature-interaction term at the end with a neural network.

 

The neural network structure is

 

The input is embedded as usual, and the feed-forward network afterwards is the routine part; the interesting piece is the Bi-Interaction layer introduced by NFM.

 

where $\mathbf{v}_{i}$ is the embedding vector of the $i$-th feature. The Bi-Interaction layer takes the pairwise Hadamard products of these embedding vectors, sums them up, and feeds the result into the following feed-forward network.
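Reconstructed from the paper, the Bi-Interaction pooling is

$$ f_{BI}(\mathcal{V}_{x}) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} x_{i}\mathbf{v}_{i} \odot x_{j}\mathbf{v}_{j} = \frac{1}{2}\left[ \Big( \sum_{i=1}^{n} x_{i}\mathbf{v}_{i} \Big)^{2} - \sum_{i=1}^{n} \big( x_{i}\mathbf{v}_{i} \big)^{2} \right], $$

where the squares are taken element-wise; the identity on the right is what makes the layer cheap to compute, and it is also what the code below uses.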

The final NFM expression is

 

NFM also applies Dropout in the Bi-Interaction layer, and Batch Normalization (BN) to the output of the Bi-Interaction layer and of the feed-forward layers.

9.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


class NFM(nn.Module):
    # Initialization layer
    def __init__(
            self,
            field_num,
            feature_dim,
            embedding_dim,
            output_dim_list,
    ):
        super(NFM, self).__init__()
        # Number of characteristic fields
        self.field_num = field_num
        # Feature dimension
        self.feature_dim = feature_dim
        # Embedded dimension
        self.embedding_dim = embedding_dim
        # Output dimension of hidden layer
        self.output_dim_list = output_dim_list

        # Embedded layer
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            layer = nn.Linear(self.feature_dim, self.embedding_dim, bias=False)
            self.embedding_layer.append(layer)
        # hidden layer
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # nn.BatchNorm1d(),
                nn.ReLU(),
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, x_list):
        assert len(x_list) == self.field_num
        # embed
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))

        # Bi-Interaction pooling: sum of element-wise products over all pairs i < j,
        # computed with the identity 0.5 * ((sum of vectors)^2 - sum of squared vectors)
        stacked = torch.stack(embedding_result, dim=0)
        sum_square = stacked.sum(dim=0) ** 2
        square_sum = (stacked ** 2).sum(dim=0)
        bi_pool_result = 0.5 * (sum_square - square_sum)
        out = self.hidden_layer(bi_pool_result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features of different feature domains
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    nfm = NFM(field_num=3, feature_dim=3, embedding_dim=10, output_dim_list=[5, 4])
    output = nfm(x_list)
    print(output)

X. AFM

10.1 paper

Paper title: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks, 2017 IJCAI

Paper link: <Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks>

NFM's Bi-Interaction layer simply sums the results of the feature interactions, which implies an assumption that every feature interaction has the same impact on the final result. AFM argues that different feature interactions should be weighted differently and introduces an attention mechanism.

The model structure is

 

Most of the structure is the same as NFM, but after the feature crossing (Hadamard products), the results are fed into an attention network (just a small fully connected network, nothing complicated) to learn attention weights, and the crossed features are then summed using those weights.

 

The calculation formula of attention weight is
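Reconstructed from the paper, the attention score for the interaction of features $i$ and $j$ is

$$ a'_{ij} = \mathbf{h}^{\top} \mathrm{ReLU}\big( \mathbf{W} (\mathbf{v}_{i} \odot \mathbf{v}_{j})\, x_{i} x_{j} + \mathbf{b} \big), \qquad a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j)} \exp(a'_{ij})}. $$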

 

Finally, the expression of AFM model is

 

10.2 code

import torch
import torch.nn as nn
import torch.nn.functional as F


class AFM(nn.Module):
    # Initialize network layer
    def __init__(
            self,
            field_num,
            feature_dim,
            embedding_dim,
            output_dim_list,
            attention_dim,
    ):
        super(AFM, self).__init__()
        # Number of characteristic fields
        self.field_num = field_num
        # Feature dimension
        self.feature_dim = feature_dim
        # Embedded dimension
        self.embedding_dim = embedding_dim
        # Output dimension of hidden layer
        self.output_dim_list = output_dim_list
        # Hidden layer output dimension of attention network
        self.attention_dim = attention_dim

        # Embedded layer
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            layer = nn.Linear(self.feature_dim, self.embedding_dim, bias=False)
            self.embedding_layer.append(layer)
        # Attention network
        self.attention_layer = nn.Sequential(
            nn.Linear(self.embedding_dim, self.attention_dim, bias=True),
            nn.ReLU(),
            nn.Linear(self.attention_dim, 1, bias=False)
        )
        # hidden layer
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(),
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward transfer and establish calculation diagram
    def forward(self, x_list):
        assert len(x_list) == self.field_num
        # embed
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))

        # attention-based pooling
        pair_wise_interaction_result = []
        for i in range(self.field_num):
            for j in range(i + 1, self.field_num):
                pair_wise_interaction_result.append(embedding_result[i] * embedding_result[j])
        # Attention weights (one scalar per feature pair); keep them as tensors
        # so that gradients flow through the attention network
        attention_weight = [self.attention_layer(p) for p in pair_wise_interaction_result]
        attention_weight = F.softmax(torch.stack(attention_weight, dim=0), dim=0)
        # Attention-weighted sum of the pairwise interaction vectors
        result = 0
        for i in range(len(pair_wise_interaction_result)):
            result = result + attention_weight[i] * pair_wise_interaction_result[i]
        out = self.hidden_layer(result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features of different feature domains
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    afm = AFM(field_num=3, feature_dim=3, embedding_dim=10, attention_dim=5, output_dim_list=[5, 4])
    output = afm(x_list)
    print(output)
