Note: the code in this article was written by me based on the papers. Some details are inevitably missing and errors are possible, so please refer to each paper's official code for full model details. If you find errors in the code, please point them out in the comments.
I. AutoRec
1.1 paper
Paper title: AutoRec: Autoencoders Meet Collaborative Filtering, 2015 WWW
Paper link: <AutoRec: Autoencoders Meet Collaborative Filtering>
This was one of the first applications of deep learning in recommender systems.
It performs collaborative filtering (CF) with an autoencoder and outperforms earlier CF methods on the MovieLens and Netflix datasets. The model structure is shown in the figure below.
CF infers a user's interests from the user-item interaction matrix (each row holds user u's ratings of the different items, and each column holds the different users' ratings of item i), and then recommends items to the user according to those interests.
Each row of the interaction matrix (user-based AutoRec, U-AutoRec)
or each column (item-based AutoRec, I-AutoRec)
is fed into the autoencoder, which is optimized with the following loss function (taking I-AutoRec as an example):
$$\min_{\theta}\ \sum_{i=1}^{n} \left\| \mathbf{r}^{(i)} - h\!\left(\mathbf{r}^{(i)};\theta\right) \right\|_{\mathcal{O}}^{2} + \frac{\lambda}{2}\left( \|W\|_F^2 + \|V\|_F^2 \right)$$
where h(·) is the output of the autoencoder, r is the true item rating (column) vector, and the norm is computed only over the observed ratings.
The u-th element of the autoencoder's output is the model's predicted rating of the current item i by user u. Feeding in the column vectors of the different items in turn yields user u's predicted ratings for all items, which are then ranked to produce recommendations.
(
You may wonder why the author says the output is a set of ratings; on the surface the model does nothing more than map one vector to another vector.
First, recall what an autoencoder does. It is an unsupervised learning model made up of two parts: an encoder and a decoder.
The encoder reduces and compresses the input. Its purpose is to extract the important information in the input and produce a code for it (the first half of the figure).
The decoder tries to reconstruct the original input from the encoder's code. If the information the encoder extracts is good, the decoder's output will be very close to the original input (the second half of the figure).
What forces the encoder to extract the important information is the loss function, which in this paper is the one given above;
in encoder-decoder models it is also called the 'reconstruction loss'.
Training with this loss gradually teaches the encoder to extract the important information from the input.
)
1.2 code
# coding=UTF-8
import torch
import torch.nn as nn


# Recommendation model based on an autoencoder
class AutoRec(nn.Module):
    def __init__(self, feature_dim, hidden_dim):
        # feature_dim: dimension of the user/item rating vector
        # hidden_dim: dimension of the hidden layer
        super(AutoRec, self).__init__()
        self.feature_dim = feature_dim
        self.hidden_dim = hidden_dim
        # encoder
        self.encoder = nn.Sequential(
            nn.Linear(in_features=self.feature_dim, out_features=self.hidden_dim, bias=True),
            # The paper reports that ReLU works worst here, so sigmoid is used
            nn.Sigmoid(),
        )
        # decoder
        self.decoder = nn.Sequential(
            nn.Linear(in_features=self.hidden_dim, out_features=self.feature_dim, bias=True),
            nn.Sigmoid(),
        )
        # Initialize the network layers
        self.init_layer()

    def init_layer(self):
        # Traverse all layers and initialize the biases
        for layer in self.modules():
            if isinstance(layer, nn.Linear):
                layer.bias.data.fill_(1)

    # Forward pass, builds the computation graph
    def forward(self, x):
        # x is the input user or item rating vector
        encoder_x = self.encoder(x)
        decoder_x = self.decoder(encoder_x)
        return decoder_x


# Loss function
class AutoRecLoss(nn.Module):
    def __init__(self):
        super(AutoRecLoss, self).__init__()

    # Reconstruction loss; w and v are kept for the regularization terms,
    # which are handled by the optimizer's weight_decay below
    def forward(self, r, pre_r, w, v):
        return nn.MSELoss()(pre_r, r)


# # The PyTorch optimizer provides L2 regularization via weight_decay
# auto_rec = AutoRec(100, 50)
# # Regularization balance factor
# lamuda = int(input())
# optimer = torch.optim.Adam(auto_rec.parameters(), lr=0.001, weight_decay=lamuda / 2)

# Model test
if __name__ == '__main__':
    # Randomly generate a user-item rating matrix
    x = torch.randn(50, 100)
    # Feed each rating vector into AutoRec
    auto_rec = AutoRec(100, 50)
    output = auto_rec(x)
    print(output)
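With a trained model, generating recommendations follows the procedure described above: feed in the item rating columns, read off the user's predicted scores, and rank the items. Below is a minimal sketch that reuses the AutoRec class above; the model here is untrained, so the scores are placeholders, and treating a rating of 0 as "unrated" is my own assumption.

import torch

# A minimal ranking sketch using the AutoRec class above (untrained here,
# so the scores are placeholders). Assumption: a rating of 0 means "unrated".
rating_matrix = torch.randint(0, 6, (50, 100)).float()   # 50 items x 100 users
auto_rec = AutoRec(100, 50)

with torch.no_grad():
    reconstructed = auto_rec(rating_matrix)               # (50, 100) predicted ratings

user_id = 7
user_scores = reconstructed[:, user_id].clone()           # user 7's predicted rating for every item
# Only recommend items the user has not rated yet
already_rated = rating_matrix[:, user_id] != 0
user_scores[already_rated] = float('-inf')
top_items = torch.topk(user_scores, k=10).indices
print(top_items)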
II. Deep Crossing
2.1 paper
Paper title: Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features, 2016 KDD
Paper link: <Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features>
Deep learning is used to automatically learn effective feature representations, replacing time-consuming manual feature engineering.
Deep Crossing learns effective feature representations from the input data and uses them for the subsequent prediction.
Deep Crossing was proposed for this problem: when a user searches with keywords on Microsoft Bing, how should the system recommend the corresponding ads for those keywords? The paper first explains the entities involved in this process.
First, the raw features fed into the model (such as user age, search terms, search time...) are preprocessed, for example one-hot encoded, and then input into the model. The model structure is shown in the figure below.
The raw input features are first embedded.
(
Embedding is an important idea in deep learning.
In deep learning, an embedding is a way of representing an object with a vector in a more meaningful form. For example, word embedding represents a word with a vector so that semantic relationships between words are reflected in Euclidean space; graph embedding represents the nodes (or the whole graph) of graph-structured data with vectors so that the connectivity between nodes is reflected in Euclidean space.
Concretely, it is implemented as a fully connected layer, i.e. a linear transformation whose parameters are learned during training.
It also serves to densify sparse vectors, which is exactly its role here (a small sketch follows this note).
)
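As a small illustration of the point above: multiplying a one-hot vector by the weight matrix of a bias-free linear layer simply selects one column of that matrix, which is what a lookup-table embedding does. A minimal sketch (the vocabulary size and embedding dimension are arbitrary; the two modules are randomly initialized, so only the shapes are meant to match):

import torch
import torch.nn as nn

vocab_size, embedding_dim = 10, 4

# Embedding as a linear layer (no bias) applied to a one-hot vector
linear = nn.Linear(vocab_size, embedding_dim, bias=False)
one_hot = torch.zeros(vocab_size)
one_hot[3] = 1.0                       # sparse one-hot encoding of id 3
dense_from_linear = linear(one_hot)    # dense 4-dimensional embedding

# The same idea with PyTorch's lookup-table embedding
embedding = nn.Embedding(vocab_size, embedding_dim)
dense_from_lookup = embedding(torch.tensor(3))

print(dense_from_linear.shape, dense_from_lookup.shape)  # both torch.Size([4])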
The outputs of the embedding layers are then concatenated into a new feature vector.
This feature vector is fed into fully connected layers with residual connections to combine different features. This is also the origin of the name Deep Crossing: earlier methods such as FM express feature combinations ("crosses") with explicit formulas, while Deep Crossing lets the neural network learn them.
CTR prediction is then performed, and the cross-entropy loss is used to optimize the network parameters.
2.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


# Residual unit used for feature crossing in Deep Crossing
class ResidualBlock(nn.Module):
    def __init__(self, feature_dim, out_dim, use_residual=True):
        super(ResidualBlock, self).__init__()
        # Whether to use the residual connection
        self.use_residual = use_residual
        # Dimension of the input feature
        self.feature_dim = feature_dim
        # Dimension of the hidden layer inside the unit
        self.out_dim = out_dim
        # Fully connected layers for feature crossing:
        # the residual unit in the paper (Figure 2) uses two linear layers
        # with bias and a ReLU activation in between
        self.feature_interaction_layer = nn.Sequential(
            nn.Linear(in_features=feature_dim, out_features=out_dim, bias=True),
            nn.ReLU(True),
            nn.Linear(in_features=out_dim, out_features=feature_dim, bias=True),
        )

    # Forward pass, builds the computation graph
    def forward(self, x):
        # x is the input feature obtained by embedding + concatenation
        residual_out = self.feature_interaction_layer(x)
        # Residual connection
        if self.use_residual:
            residual_out = residual_out + x
        # ReLU activation after the addition (Figure 2 in the paper)
        residual_out = F.relu(residual_out)
        return residual_out


class DeepCrossing(nn.Module):
    # See Figure 1 of the paper:
    # some input features need to be embedded, others are fed in directly
    def __init__(
            self,
            embedding_layer_num=5,
            residual_layer_num=3,
            need_embd_dim=100,
            without_embd_dim=10,
            embedding_dim=50,
            output_dim=25,
    ):
        super(DeepCrossing, self).__init__()
        # Number of embedding layers
        self.embedding_layer_num = embedding_layer_num
        # Number of residual (feature crossing) units
        self.multiple_residual_units_num = residual_layer_num
        # Dimension of the input features that need embedding
        self.input_dim = need_embd_dim
        # Dimension of the input features that do not need embedding
        self.other_input_dim = without_embd_dim
        # Dimension of the embedding vector
        self.embedding_dim = embedding_dim
        # Hidden dimension inside each residual unit
        self.output_dim = output_dim
        # Embedding layers: one independent linear layer per feature field,
        # so the fields do not share weights
        self.embedding_layer = nn.ModuleList()
        for i in range(self.embedding_layer_num):
            self.embedding_layer.append(
                nn.Linear(self.input_dim, self.embedding_dim, bias=True))
        # Multiple Residual Units: the feature crossing layers.
        # Unlike the embedding layers, the residual units are stacked
        # vertically in Figure 1, so they are unrolled into a Sequential;
        # the embedding layers sit side by side and stay in a ModuleList
        self.multiple_residual_units = nn.ModuleList()
        for i in range(self.multiple_residual_units_num):
            self.multiple_residual_units.append(
                ResidualBlock(self.embedding_layer_num * self.embedding_dim + self.other_input_dim,
                              self.output_dim))
        self.multiple_residual_units = nn.Sequential(*self.multiple_residual_units)
        # Scoring layer: CTR prediction as a binary classification
        self.scoring_layer = nn.Linear(
            in_features=self.embedding_layer_num * self.embedding_dim + self.other_input_dim,
            out_features=2, bias=False)

    # Forward pass, builds the computation graph
    def forward(self, x_list, x):
        # The number of features to embed must equal the number of embedding layers
        assert len(x_list) == self.embedding_layer_num
        # Embed the features that need embedding
        embedding_result = []
        for i in range(self.embedding_layer_num):
            temp_result = self.embedding_layer[i](x_list[i])
            # The embedding layer in the paper has a truncation, formula (2)
            temp_result = torch.clamp(temp_result, min=0.0)
            embedding_result.append(temp_result)
        # Concatenate the embedding results with the raw (non-embedded) features
        embedding_result = torch.cat(embedding_result, dim=-1)
        embedding_result = torch.cat([embedding_result, x], dim=-1)
        # Feature crossing
        feature_interaction = self.multiple_residual_units(embedding_result)
        # Prediction
        out = self.scoring_layer(feature_interaction)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features that need embedding
    x_list = [[0, 1], [1, 0]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    # Randomly generate features that do not need embedding
    x = [16, 14, 13]
    x = torch.tensor(x, dtype=torch.float32)
    deep_crossing = DeepCrossing(
        embedding_layer_num=2,
        residual_layer_num=1,
        need_embd_dim=2,
        without_embd_dim=3,
        embedding_dim=5,
        output_dim=10
    )
    output = deep_crossing(x_list, x)
    print(output)
III. NeuralCF
3.1 paper
Paper title: Neural Collaborative Filtering, 2017 WWW
Paper link: <Neural Collaborative Filtering>
NCF (Neural network-based Collaborative Filtering) recasts matrix factorization in neural network form.
Traditional matrix factorization (MF) decomposes the user-item interaction matrix into latent vectors for users and items, and then uses the inner product of these latent vectors to measure how much a user likes an item and to make recommendations.
The authors argue that the fixed inner product limits MF's generalization and cannot fully capture the interaction between the user and item latent vectors. This paper therefore uses a feedforward neural network to learn the interaction between the user and item latent vectors, replacing the inner product. The model structure is shown in the figure below.
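To make the contrast concrete, here is a minimal sketch of the MF-style scoring that NCF replaces: a single dot product between the user and item latent vectors. The MFScorer class, the nn.Embedding lookups, and the dimensions are my own illustrative choices, not code from the paper; the MLP alternative is what the NCF code in 3.2 implements.

import torch
import torch.nn as nn

class MFScorer(nn.Module):
    """Classic matrix factorization: score = <user latent vector, item latent vector>."""
    def __init__(self, num_users, num_items, latent_dim):
        super().__init__()
        self.user_latent = nn.Embedding(num_users, latent_dim)
        self.item_latent = nn.Embedding(num_items, latent_dim)

    def forward(self, user_id, item_id):
        p_u = self.user_latent(user_id)
        q_i = self.item_latent(item_id)
        # Fixed inner product: no learnable interaction between the two vectors
        return (p_u * q_i).sum(dim=-1)

if __name__ == '__main__':
    mf = MFScorer(num_users=100, num_items=200, latent_dim=16)
    score = mf(torch.tensor([0]), torch.tensor([42]))
    print(score)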
The sparse input vectors of the user and the item are first embedded; these embeddings play the role of the latent vectors obtained by matrix factorization. The two embedding vectors are then concatenated and passed through the multi-layer feedforward network, whose last layer outputs the predicted score.
The loss between the model's prediction and the true value is computed to update the network parameters; the loss function is the cross-entropy loss (the model's full expression is given in the paper).
The above is the standard NCF (the MLP branch). The paper also proposes GMF (Generalized Matrix Factorization) and a fused model that extends NCF; the extension lies in how the features are crossed. The fused model is called NeuMF in the paper, but the code below keeps the name GMF.
The structure of the fused model is shown in the figure.
It integrates two feature crossing methods. On the left, the embeddings are combined with a Hadamard product (the two vectors are multiplied element by element).
On the right is the NCF MLP described above, which crosses features with a multi-layer feedforward network. The outputs of the left and right branches are then concatenated for CTR prediction (the full model expression is given in the paper).
3.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class NCF(nn.Module):
    def __init__(self, user_feature_dim, item_feature_dim, embedding_dim, output_dim_list):
        super(NCF, self).__init__()
        # Dimension of the user features
        self.user_feature_dim = user_feature_dim
        # Dimension of the item features
        self.item_feature_dim = item_feature_dim
        # Dimension of the embedding vector
        self.embedding_dim = embedding_dim
        # Output dimensions of the feature crossing layers
        self.output_dim_list = output_dim_list
        # User feature embedding layer
        self.user_embedding = nn.Linear(self.user_feature_dim, self.embedding_dim)
        # Item feature embedding layer
        self.item_embedding = nn.Linear(self.item_feature_dim, self.embedding_dim)
        # Feature crossing layers
        self.neural_cf_layers = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim * 2
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The paper uses ReLU activation (page 4)
                nn.ReLU(),
            )
            self.neural_cf_layers.append(layer)
            # The input dimension of the next layer is the output of the current one
            input_dim = self.output_dim_list[i]
        # Unroll in order
        self.neural_cf_layers = nn.Sequential(*self.neural_cf_layers)
        # Output layer, binary classification; the paper's output layer has no bias
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=False)

    # Forward pass, builds the computation graph
    def forward(self, user_feature, item_feature):
        # Embed the user and item features separately
        user_embedding = self.user_embedding(user_feature)
        item_embedding = self.item_embedding(item_feature)
        # Concatenate
        feature = torch.cat([user_embedding, item_embedding], dim=-1)
        # Feature crossing
        feature_interaction = self.neural_cf_layers(feature)
        out = self.output_layer(feature_interaction)
        out = torch.sigmoid(out)
        return out


# See Figure 3 in the paper
class GMF(nn.Module):
    def __init__(self, user_feature_dim, item_feature_dim, embedding_dim, output_dim_list):
        super(GMF, self).__init__()
        # Dimension of the user features
        self.user_feature_dim = user_feature_dim
        # Dimension of the item features
        self.item_feature_dim = item_feature_dim
        # Dimension of the embedding vector
        self.embedding_dim = embedding_dim
        # Output dimensions of the feature crossing layers
        self.output_dim_list = output_dim_list
        # Embedding layers for the MLP branch
        self.mlp_user_embedding_layer = nn.Linear(self.user_feature_dim, self.embedding_dim, bias=True)
        self.mlp_item_embedding_layer = nn.Linear(self.item_feature_dim, self.embedding_dim, bias=True)
        # Embedding layers for the MF branch
        self.mf_user_embedding_layer = nn.Linear(self.user_feature_dim, self.embedding_dim, bias=True)
        self.mf_item_embedding_layer = nn.Linear(self.item_feature_dim, self.embedding_dim, bias=True)
        # Feature crossing layers
        self.neural_cf_layers = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim * 2
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The paper uses ReLU activation (page 4)
                nn.ReLU(),
            )
            self.neural_cf_layers.append(layer)
            input_dim = self.output_dim_list[i]
        self.neural_cf_layers = nn.Sequential(*self.neural_cf_layers)
        # Output layer for CTR prediction
        self.output_layer = nn.Linear(self.embedding_dim + self.output_dim_list[-1], 2, bias=False)

    # Forward pass, builds the computation graph
    def forward(self, user_feature, item_feature):
        # Embed
        mlp_user_embedding = self.mlp_user_embedding_layer(user_feature)
        mlp_item_embedding = self.mlp_item_embedding_layer(item_feature)
        mf_user_embedding = self.mf_user_embedding_layer(user_feature)
        mf_item_embedding = self.mf_item_embedding_layer(item_feature)
        mlp_input = torch.cat([mlp_user_embedding, mlp_item_embedding], dim=-1)
        mlp_result = self.neural_cf_layers(mlp_input)
        # Hadamard (element-wise) product; in PyTorch this is the * operator
        gmf_result = mf_user_embedding * mf_item_embedding
        # Concatenate both branches
        final_result = torch.cat([gmf_result, mlp_result], dim=-1)
        out = self.output_layer(final_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate user and item features
    user_feature = torch.randn(1, 10)
    item_feature = torch.randn(1, 10)
    ncf = NCF(user_feature_dim=10, item_feature_dim=10, embedding_dim=20, output_dim_list=[10, 5, 4])
    output = ncf(user_feature, item_feature)
    print(output)
    gmf = GMF(user_feature_dim=10, item_feature_dim=10, embedding_dim=20, output_dim_list=[10, 5, 4])
    output = gmf(user_feature, item_feature)
    print(output)
IV. PNN
4.1 paper
Paper title: Product-based Neural Networks for User Response Prediction, 2016 ICDM
Paper link: <Product-based Neural Networks for User Response Prediction>
The input data used in recommender systems are generally high-dimensional sparse vectors.
Earlier logistic regression and FM methods rely on manual feature engineering to extract the high-order features hidden in the data. Neural networks have attracted attention because they can automatically learn effective feature representations from data, and a common approach is embedding + feedforward network (multilayer perceptron, MLP) to learn combinations of different features. However, that approach cannot capture interactions between features from different feature fields, so this paper proposes PNN (Product-based Neural Network) to learn feature combinations across feature fields.
The PNN model structure is shown in the figure.
The paper describes the model from top to bottom, which reads a bit awkwardly; here it is described from bottom to top.
Input: sparse feature vectors from different feature fields, such as age, address, purchase date, encoded with one-hot or multi-hot encoding.
Embedding: each field is embedded separately. Note that all fields use the same embedding dimension so that the later calculations work; for example, the sparse feature vector of every field is mapped to a 500-dimensional embedding vector.
Product: this layer is the model's innovation. Previous methods simply concatenate the embedding vectors and feed them into the feedforward network; here the author splits the layer into a z part and a p part to capture feature combinations across feature fields. The z part captures the linear feature signals.
The p part captures the nonlinear feature combination information. Depending on how p is computed, the author proposes two variants of PNN: IPNN and OPNN.
In IPNN, the p part uses inner products to capture the interactions between different features.
In OPNN, the p part uses outer products to capture the interactions between different features.
The inner product of two feature embeddings is a single number, which can go into the p part directly, but the outer product of two embeddings is a matrix. The author's treatment is to sum all the outer-product matrices and then vectorize the result so that it fits into the p part (see the sketch after the note below).
(Note: the z part contributes N values, and for IPNN the p part contributes N(N−1)/2 inner-product values, one per pair of feature fields.)
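For OPNN, the sum-then-vectorize treatment described above can be written in a few lines. A minimal sketch of the idea, with random tensors standing in for the field embeddings (the code in 4.2 implements the IPNN variant, so this is only an illustration of the outer-product reduction, where summing all pairwise outer products equals the outer product of the summed embeddings):

import torch

# Hypothetical embeddings: N feature fields, each embedded to dimension M
N, M = 4, 6
embeddings = torch.randn(N, M)

# Direct version: sum the pairwise outer products, then flatten (vectorize)
pairwise_sum = torch.zeros(M, M)
for i in range(N):
    for j in range(N):
        pairwise_sum += torch.outer(embeddings[i], embeddings[j])
p_direct = pairwise_sum.reshape(-1)            # M*M values for the p part

# Superposition trick: sum the embeddings first, then take one outer product,
# which gives the same matrix in O(M^2)
f_sum = embeddings.sum(dim=0)
p_fast = torch.outer(f_sum, f_sum).reshape(-1)

print(torch.allclose(p_direct, p_fast, atol=1e-5))  # True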
The z-part and p-part values are then each passed through a fully connected layer for a linear transformation.
The transformed results are concatenated and fed into the subsequent feedforward network to further combine the features. From there on it is the conventional pipeline: ReLU activations, with a sigmoid at the last layer.
The network parameters are updated with the cross-entropy loss.
4.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


# Fig. 1 in the paper
class PNN(nn.Module):
    def __init__(self, feature_dim, field_num, embedding_dim, output_dim_list):
        super(PNN, self).__init__()
        # Dimension of the input feature in each field
        self.feature_dim = feature_dim
        # Number of feature fields
        self.field_num = field_num
        # Embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of the feature crossing layers
        self.output_dim_list = output_dim_list
        # Embedding layers (one per field)
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            self.embedding_layer.append(nn.Linear(self.feature_dim, self.embedding_dim, bias=False))
        # z part: linear signals
        self.z = nn.Linear(self.field_num * self.embedding_dim,
                           self.field_num * self.embedding_dim, bias=True)
        self.z.bias = nn.Parameter(torch.ones(self.field_num * self.embedding_dim))
        # Feature crossing (hidden) layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                # N values from the z part + N*(N-1)/2 values from the p part
                input_dim = self.field_num * self.embedding_dim + int(self.field_num * (self.field_num - 1) / 2)
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(True),
            )
            self.hidden_layer.append(layer)
            # The input dimension of the next layer is the output of the current one
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, x_list):
        # The number of input features must equal the number of fields
        assert len(x_list) == self.field_num
        # Embed each field
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # z part: linear transformation
        embedding_feature = torch.cat(embedding_result, dim=-1)
        z_result = self.z(embedding_feature)
        # p part: pairwise inner products
        embedding_feature = torch.stack(embedding_result, dim=0)
        # Inner product matrix of all field embeddings
        inner_product = torch.matmul(embedding_feature, embedding_feature.T)
        # Keep the strictly upper-triangular entries (pairs i < j)
        idx = torch.triu_indices(self.field_num, self.field_num, offset=1)
        inner_result = inner_product[idx[0], idx[1]]
        # Input features of the following crossing layers
        input_feature = torch.cat([z_result, inner_result], dim=-1)
        # Feature crossing
        interaction_result = self.hidden_layer(input_feature)
        out = self.output_layer(interaction_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features from different fields
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    pnn = PNN(feature_dim=3, field_num=3, embedding_dim=6, output_dim_list=[5, 4])
    output = pnn(x_list)
    print(output)
V. Wide & Deep
5.1 paper
Paper title: Wide & Deep Learning for Recommender Systems, 2016 RecSys
Paper link: <Wide & Deep Learning for Recommender Systems>
A model proposed by Google in 2016 for app recommendation in the Google Play store.
Wide refers to the model's memorization: a linear part that, like logistic regression, produces recommendations directly from the input features, as if the model had memorized those feature combinations. Deep refers to the model's generalization: a neural network that captures higher-order information between feature combinations, so that reasonable recommendations can be produced even for combinations rarely seen before.
The model structure is shown in the figure
The wide part feeds the features directly into a linear model.
In the actual application, a cross-product transformation is applied to the input features.
(Although a formula is given, the cross-product features are designed manually in advance: a feature is 1 for a specific feature combination and 0 otherwise; see the sketch below.)
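A tiny sketch of what such a hand-designed cross-product feature looks like; the dictionary representation of a sample is my own illustrative choice, while the AND(gender=female, language=en) combination is the example used in the paper:

from typing import Dict

def cross_product_feature(sample: Dict[str, str]) -> float:
    # Hand-designed cross-product transformation:
    # 1.0 only when the whole target combination holds, else 0.0
    # (target combination from the paper's example AND(gender=female, language=en))
    target = {"gender": "female", "language": "en"}
    return 1.0 if all(sample.get(k) == v for k, v in target.items()) else 0.0

print(cross_product_feature({"gender": "female", "language": "en"}))  # 1.0
print(cross_product_feature({"gender": "male", "language": "en"}))    # 0.0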
In the deep part, the features are first embedded, and the embedding vectors are then fed into a feedforward network to capture information about different feature combinations.
Finally, the outputs of the two parts are concatenated and fed into the output layer for prediction, and the network parameters are updated with the cross-entropy loss.
5.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class WideDeep(nn.Module):
    def __init__(self, deep_feature_dim, wide_feature_dim, embedding_dim, output_dim_list):
        super(WideDeep, self).__init__()
        # Dimension of the deep-part features
        self.deep_feature_dim = deep_feature_dim
        # Dimension of the wide-part features
        self.wide_feature_dim = wide_feature_dim
        # Dimension of the embedding vector
        self.embedding_dim = embedding_dim
        # Output dimensions of the hidden layers in the deep part
        self.output_dim_list = output_dim_list
        # Embedding layer
        self.embedding_layer = nn.Linear(deep_feature_dim, embedding_dim, bias=True)
        # deep part
        self.deep = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # The paper uses ReLU activation (Figure 4)
                nn.ReLU(),
            )
            self.deep.append(layer)
            # The input dimension of the next layer is the output of the current one
            input_dim = self.output_dim_list[i]
        # Unroll in order
        self.deep = nn.Sequential(*self.deep)
        # Output layer
        self.output_layer = nn.Linear(wide_feature_dim + self.output_dim_list[-1], 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, wide_feature, deep_feature):
        embedding_result = self.embedding_layer(deep_feature)
        deep_result = self.deep(embedding_result)
        # Concatenate the wide features with the deep output
        final_result = torch.cat([wide_feature, deep_result], dim=-1)
        out = self.output_layer(final_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    wide_feature = torch.randn(1, 20)
    deep_feature = torch.randn(1, 30)
    wide_deep = WideDeep(wide_feature_dim=20, deep_feature_dim=30, embedding_dim=50, output_dim_list=[30, 15, 10])
    output = wide_deep(wide_feature, deep_feature)
    print(output)
VI. DCN
6.1 paper
Paper title: Deep & Cross Network for Ad Click Predictions, 2017 KDD
Paper link: <Deep & Cross Network for Ad Click Predictions>
The authors argue that the wide part above does not cross features sufficiently, and that its feature interactions are manually specified, which limits generalization, so they improve the wide part. The model structure is shown in the figure below.
The cross network replaces the original wide part; nothing else changes.
The cross layer is expressed as
$$x_{l+1} = x_0 x_l^\top w_l + b_l + x_l$$
where $x_l$ is the input of the current cross layer, $x_{l+1}$ is its output, and $x_0$ is the feature vector obtained by embedding and concatenation at the bottom of the network (so each cross layer is a feature cross plus a residual connection).
The calculation process is shown in the figure below
6.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


# A single layer inside the cross network
class CrossLayer(nn.Module):
    def __init__(self, input_dim):
        super(CrossLayer, self).__init__()
        self.weight = nn.Linear(input_dim, 1, bias=False)
        self.bias = nn.Parameter(torch.zeros(input_dim))

    def forward(self, x0, xi):
        # x0 * (w^T xi) + b
        interaction_out = self.weight(xi) * x0 + self.bias
        return interaction_out


# The stacked cross network
class CrossLayerBlock(nn.Module):
    def __init__(self, input_dim, layer_num):
        super(CrossLayerBlock, self).__init__()
        # Number of cross layers
        self.layer_num = layer_num
        # Dimension of the input feature
        self.input_dim = input_dim
        self.layer = nn.ModuleList(CrossLayer(self.input_dim) for _ in range(self.layer_num))

    def forward(self, x0):
        xi = x0
        for i in range(self.layer_num):
            # x_{l+1} = x0 * (w^T x_l) + b + x_l
            xi = xi + self.layer[i](x0, xi)
        return xi


# Figure 1 in the paper
class DCN(nn.Module):
    def __init__(self, input_dim, embedding_dim, cross_layer_num, deep_layer_num):
        super(DCN, self).__init__()
        # Dimension of the input feature
        self.input_dim = input_dim
        # Dimension of the embedding vector
        self.embedding_dim = embedding_dim
        # Number of cross layers
        self.cross_layer_num = cross_layer_num
        # Number of deep layers
        self.deep_layer_num = deep_layer_num
        # Embedding layer
        self.embedding_layer = nn.Linear(self.input_dim, self.embedding_dim, bias=False)
        # Cross network
        self.cross_layer = CrossLayerBlock(self.embedding_dim, self.cross_layer_num)
        # Deep network
        self.deep_layer = nn.ModuleList()
        for i in range(self.deep_layer_num):
            layer = nn.Sequential(
                nn.Linear(self.embedding_dim, self.embedding_dim, bias=True),
                nn.ReLU(),
            )
            self.deep_layer.append(layer)
        self.deep_layer = nn.Sequential(*self.deep_layer)
        # Output layer
        self.output_layer = nn.Linear(self.embedding_dim * 2, 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, x):
        # Embed the features
        x_embedding = self.embedding_layer(x)
        # Cross network
        cross_result = self.cross_layer(x_embedding)
        # Deep network
        deep_result = self.deep_layer(x_embedding)
        # Concatenate both parts
        temp_result = torch.cat([cross_result, deep_result], dim=-1)
        out = self.output_layer(temp_result)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    x = torch.randn(1, 10)
    dcn = DCN(10, 20, 2, 2)
    output = dcn(x)
    print(output)
VII. FNN
7.1 paper
Paper title: Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction, 2016 ECIR
Paper link: <Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction>
The model structure is shown in the figure below
The input consists of features from different fields. Compare the FM expression in the figure below with FNN's Dense Real Layer:
an FM is trained first, and the corresponding FM parameters are used to initialize FNN's Dense Real Layer (a sketch of this initialization follows at the end of this subsection).
The dense layer outputs are then concatenated and fed into the feedforward network.
The feedforward network's prediction function produces the final output (the estimated CTR).
The network is again trained with the cross-entropy loss.
The author also presents an SNN, a feedforward network whose bottom layer is pre-trained with an RBM (restricted Boltzmann machine) or a DAE (denoising autoencoder), instead of being initialized from FM parameters as in FNN.
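Here is a minimal sketch of the FM-based initialization described above: copy pre-trained FM parameters (each feature's first-order weight w_i and latent vector v_i) into the part of the dense real layer that corresponds to that feature. The FM tensors and the simplified one-feature-per-field layout below are hypothetical placeholders, not the paper's code:

import torch
import torch.nn as nn

# Hypothetical pre-trained FM parameters: 10 input features,
# each with a scalar weight w_i and a K-dimensional latent vector v_i
num_features, K = 10, 4
fm_w = torch.randn(num_features)            # first-order weights
fm_v = torch.randn(num_features, K)         # second-order latent vectors

# Dense real layer of FNN: each feature contributes (w_i, v_i), i.e. K+1 values
dense_layer = nn.Linear(num_features, num_features * (K + 1), bias=True)

with torch.no_grad():
    weight = torch.zeros(num_features * (K + 1), num_features)
    for i in range(num_features):
        rows = slice(i * (K + 1), (i + 1) * (K + 1))
        # Feature i's output slice only looks at input feature i:
        # the first row copies w_i, the remaining K rows copy v_i
        weight[rows, i] = torch.cat([fm_w[i].unsqueeze(0), fm_v[i]])
    dense_layer.weight.copy_(weight)
    dense_layer.bias.zero_()

print(dense_layer(torch.randn(1, num_features)).shape)  # torch.Size([1, 50])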
7.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class FNN(nn.Module):
    def __init__(self, dense_input_dim, dense_output_dim, output_dim_list):
        super(FNN, self).__init__()
        # Input dimension of the dense real layer
        self.dense_input_dim = dense_input_dim
        # Output dimension of the dense real layer
        self.dense_output_dim = dense_output_dim
        # Output dimensions of the hidden layers
        self.output_dim_list = output_dim_list
        # Dense real layer
        self.dense_layer = nn.Linear(self.dense_input_dim, self.dense_output_dim, bias=True)
        # Hidden layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.dense_output_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.Tanh()
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # # Initialize the dense layer with the pre-trained FM parameters
        # self.__init_layer(w, b)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # # Initialize the dense layer from FM parameters
    # def __init_layer(self, w, b):
    #     # Traverse the layers of FNN and overwrite the first linear layer
    #     for m in self.modules():
    #         if isinstance(m, nn.Linear):
    #             m.weight.data = w
    #             m.bias.data = b
    #             break

    # Forward pass, builds the computation graph
    def forward(self, x):
        dense_result = self.dense_layer(x)
        out = self.hidden_layer(dense_result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    x = torch.randn(1, 10)
    fnn = FNN(dense_input_dim=10, dense_output_dim=20, output_dim_list=[15, 10, 5])
    output = fnn(x)
    print(output)
VIII. DeepFM
8.1 paper
Paper title: DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, 2017 IJCAI
Paper link: <DeepFM: A Factorization-Machine based Neural Network for CTR Prediction>
DeepFM improves the wide part of the Wide & Deep model by replacing it with an FM layer. The model structure is shown below.
The inputs are one-hot sparse vectors from different feature fields (gender, age, address...), which are passed through the embedding layer to obtain embedding vectors. That part is conventional; the interesting piece is the FM layer proposed by the model.
Like PNN's product layer, the '+' node here is the linear combination of the features, and the 'x' nodes are the inner products between different features (m(m−1)/2 of them); these are concatenated and fed into the output layer.
The author also compares DeepFM with FNN, PNN, and Wide & Deep.
8.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeepFM(nn.Module):
    def __init__(self, field_num, feature_dim, embedding_dim, output_dim_list):
        super(DeepFM, self).__init__()
        # Number of feature fields
        self.field_num = field_num
        # Dimension of the input feature in each field
        self.feature_dim = feature_dim
        # Embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of the feature crossing layers
        self.output_dim_list = output_dim_list
        # Embedding layers (one per field)
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            self.embedding_layer.append(nn.Linear(self.feature_dim, self.embedding_dim, bias=False))
        # Linear ('+') part of the FM layer
        self.fm_layer = nn.Linear(self.field_num * self.embedding_dim, 1, bias=False)
        # Hidden (deep) layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.field_num * self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(True),
            )
            self.hidden_layer.append(layer)
            # The input dimension of the next layer is the output of the current one
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer: 1 linear value + N*(N-1)/2 inner products + deep output
        input_dim = 1 + int(self.field_num * (self.field_num - 1) / 2) + self.output_dim_list[-1]
        self.output_layer = nn.Linear(input_dim, 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, x_list):
        # The number of input features must equal the number of fields
        assert len(x_list) == self.field_num
        # Embed each field
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # Linear ('+') part of FM
        embedding_feature = torch.cat(embedding_result, dim=-1)
        l_result = self.fm_layer(embedding_feature)
        # Deep part
        hidden_result = self.hidden_layer(embedding_feature)
        # Inner-product ('x') part of FM
        embedding_feature = torch.stack(embedding_result, dim=0)
        inner_product = torch.matmul(embedding_feature, embedding_feature.T)
        # Keep the strictly upper-triangular entries (pairs i < j)
        idx = torch.triu_indices(self.field_num, self.field_num, offset=1)
        inner_result = inner_product[idx[0], idx[1]]
        fm_result = torch.cat([l_result, inner_result], dim=-1)
        # Concatenate the FM and deep outputs
        input_feature = torch.cat([fm_result, hidden_result], dim=-1)
        out = self.output_layer(input_feature)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features from different fields
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    deepFM = DeepFM(feature_dim=3, field_num=3, embedding_dim=6, output_dim_list=[5, 4])
    output = deepFM(x_list)
    print(output)
IX. NFM
9.1 paper
Paper title: Neural Factorization Machines for Sparse Predictive Analytics, 2017 SIGIR
Paper link: <Neural Factorization Machines for Sparse Predictive Analytics>
NFM uses a neural network to overcome FM's limited expressiveness and its inability to capture high-order feature interactions.
The FM expression is
$$\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$
NFM replaces the last term, the second-order feature interaction, with a neural network.
The neural network structure is shown below.
The input is embedded as usual, and the feedforward network on top is routine; the new piece is the Bi-Interaction layer introduced by NFM:
$$f_{BI}(\mathcal{V}_x) = \sum_{i=1}^{n}\sum_{j=i+1}^{n} x_i v_i \odot x_j v_j$$
where $v_i$ is the embedding vector of the i-th feature. The Bi-Interaction layer takes the Hadamard product of the embedding vectors pairwise, sums the results, and feeds the pooled vector into the following feedforward network.
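The pairwise sum above does not need a double loop: as with FM's second-order term, it can be rewritten as half of (the square of the sum minus the sum of the squares), which is linear in the number of fields. A minimal sketch, with random embeddings standing in for the x_i·v_i terms:

import torch

# Hypothetical embeddings: N feature fields, embedding dimension M
N, M = 5, 8
v = torch.randn(N, M)

# Direct Bi-Interaction pooling: sum of pairwise Hadamard products (i < j)
direct = torch.zeros(M)
for i in range(N):
    for j in range(i + 1, N):
        direct += v[i] * v[j]

# Equivalent O(N*M) form: 0.5 * ((sum of v)^2 - sum of v^2)
fast = 0.5 * (v.sum(dim=0) ** 2 - (v ** 2).sum(dim=0))

print(torch.allclose(direct, fast, atol=1e-5))  # True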
The final NFM prediction adds the first-order linear FM terms $w_0 + \sum_i w_i x_i$ to the output of the feedforward network applied to $f_{BI}(\mathcal{V}_x)$.
NFM also applies dropout to the Bi-Interaction layer, and batch normalization (BN) to the output of the Bi-Interaction layer and of the feedforward layers.
9.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class NFM(nn.Module):
    def __init__(self, field_num, feature_dim, embedding_dim, output_dim_list):
        super(NFM, self).__init__()
        # Number of feature fields
        self.field_num = field_num
        # Feature dimension
        self.feature_dim = feature_dim
        # Embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of the hidden layers
        self.output_dim_list = output_dim_list
        # Embedding layers (one per field)
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            layer = nn.Linear(self.feature_dim, self.embedding_dim, bias=False)
            self.embedding_layer.append(layer)
        # Hidden layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                # nn.BatchNorm1d(self.output_dim_list[i]),
                nn.ReLU(),
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, x_list):
        assert len(x_list) == self.field_num
        # Embed each field
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # Bi-Interaction pooling: sum of the pairwise Hadamard products (i < j)
        bi_pool_result = torch.zeros(self.embedding_dim)
        for i in range(self.field_num):
            for j in range(i + 1, self.field_num):
                bi_pool_result = bi_pool_result + embedding_result[i] * embedding_result[j]
        out = self.hidden_layer(bi_pool_result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features from different fields
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    nfm = NFM(field_num=3, feature_dim=3, embedding_dim=10, output_dim_list=[5, 4])
    output = nfm(x_list)
    print(output)
X. AFM
10.1 paper
Paper title: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks, 2017 IJCAI
Paper link: <Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks>
NFM's Bi-Interaction layer simply sums the feature interaction results, which implies an assumption: every feature interaction has the same impact on the final result. AFM argues that different feature interactions should be weighted differently, and introduces an attention mechanism.
The model structure is shown below.
Most of the model is the same as NFM, but after the feature crossing (Hadamard products), the results are fed into an attention network (just a small fully connected network, nothing complicated) to learn attention weights, and the interaction results are then summed using those weights.
The attention weights are computed as
$$a'_{ij} = \mathbf{h}^\top \mathrm{ReLU}\big(\mathbf{W}(v_i \odot v_j)\, x_i x_j + \mathbf{b}\big), \qquad a_{ij} = \frac{\exp(a'_{ij})}{\sum_{(i,j)} \exp(a'_{ij})}$$
Finally, the AFM model is expressed as
$$\hat{y}_{AFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \mathbf{p}^\top \sum_{i=1}^{n}\sum_{j=i+1}^{n} a_{ij}\,(v_i \odot v_j)\, x_i x_j$$
10.2 code
import torch
import torch.nn as nn
import torch.nn.functional as F


class AFM(nn.Module):
    def __init__(self, field_num, feature_dim, embedding_dim, output_dim_list, attention_dim):
        super(AFM, self).__init__()
        # Number of feature fields
        self.field_num = field_num
        # Feature dimension
        self.feature_dim = feature_dim
        # Embedding dimension
        self.embedding_dim = embedding_dim
        # Output dimensions of the hidden layers
        self.output_dim_list = output_dim_list
        # Hidden dimension of the attention network
        self.attention_dim = attention_dim
        # Embedding layers (one per field)
        self.embedding_layer = nn.ModuleList()
        for i in range(self.field_num):
            layer = nn.Linear(self.feature_dim, self.embedding_dim, bias=False)
            self.embedding_layer.append(layer)
        # Attention network (a small fully connected network)
        self.attention_layer = nn.Sequential(
            nn.Linear(self.embedding_dim, self.attention_dim, bias=True),
            nn.ReLU(),
            nn.Linear(self.attention_dim, 1, bias=False)
        )
        # Hidden layers
        self.hidden_layer = nn.ModuleList()
        for i in range(len(self.output_dim_list)):
            if i == 0:
                input_dim = self.embedding_dim
            layer = nn.Sequential(
                nn.Linear(input_dim, self.output_dim_list[i], bias=True),
                nn.ReLU(),
            )
            self.hidden_layer.append(layer)
            input_dim = self.output_dim_list[i]
        self.hidden_layer = nn.Sequential(*self.hidden_layer)
        # Output layer
        self.output_layer = nn.Linear(self.output_dim_list[-1], 2, bias=True)

    # Forward pass, builds the computation graph
    def forward(self, x_list):
        assert len(x_list) == self.field_num
        # Embed each field
        embedding_result = []
        for i in range(self.field_num):
            embedding_result.append(self.embedding_layer[i](x_list[i]))
        # Pairwise Hadamard products (i < j)
        pair_wise_interaction_result = []
        for i in range(self.field_num):
            for j in range(i + 1, self.field_num):
                pair_wise_interaction_result.append(embedding_result[i] * embedding_result[j])
        # Attention weights over the pairwise interactions
        attention_weight = [self.attention_layer(p) for p in pair_wise_interaction_result]
        attention_weight = F.softmax(torch.cat(attention_weight, dim=-1), dim=-1)
        # Attention-based pooling: weighted sum of the interactions
        result = torch.zeros(self.embedding_dim)
        for i in range(len(pair_wise_interaction_result)):
            result = result + attention_weight[i] * pair_wise_interaction_result[i]
        out = self.hidden_layer(result)
        out = self.output_layer(out)
        out = torch.sigmoid(out)
        return out


# Model test
if __name__ == '__main__':
    # Randomly generate features from different fields
    x_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x_list = torch.tensor(x_list, dtype=torch.float32)
    afm = AFM(field_num=3, feature_dim=3, embedding_dim=10, attention_dim=5, output_dim_list=[5, 4])
    output = afm(x_list)
    print(output)