Deep Eye paper reading notes: GNN 01. Node2Vec

Preface

This course is from Deep Eye, and some screenshots are taken from the course video.

Article title: node2vec: Scalable Feature Learning for Networks
(node2vec: representation learning for large-scale network nodes)
Authors: Aditya Grover, Jure Leskovec
Affiliation: Stanford
Venue and year: KDD 2016

Paper structure

  1. Abstract: introduces the background and proposes the node2vec model with a flexible, adjustable search strategy.
  2. Introduction: introduces the importance of graphs and compares with previous methods such as PCA, IsoMap and DeepWalk.
  3. Related Work: traditional hand-crafted graph feature algorithms, dimensionality-reduction algorithms, etc.
  4. Feature Learning: basic graph concepts, the skip-gram algorithm and the optimization objective.
  5. Search strategies: BFS and DFS search strategies.
  6. node2vec: the biased random walk algorithm, how to choose the parameters p and q, time complexity analysis, and alias sampling.
  7. Effectiveness: experiments exploring the effectiveness of the model: case study, baseline parameter settings, node classification and edge prediction tasks.
  8. Experiments: experiments exploring robustness and scalability.
  9. Discussion: summarizes the paper, proposes a search-adjustable network representation learning method, and suggests further research directions.

Learning objectives


Research background, achievements and significance of this paper

Research background

Graphs are ubiquitous: computer science, social sciences, biology, economics, statistics.
A graph is a model for describing complex data (an arbitrary number of nodes with complex relationships), in contrast to images (CNNs) and text (word2vec), which have regular structure.

SNAP: Stanford Large Network Dataset Collection
The SNAP datasets are network datasets continuously collected by Jure and collaborators, which have greatly promoted the development of the social network field. Research topics covered: node classification, link prediction, community detection, viral marketing and network similarity.

The learning goal of network embedding is to reduce the nodes or edges of a high-dimensional graph to d-dimensional vectors.

Research results

Case analysis based on the Les Misérables dataset
Dataset size: n = 77, m = 254
Visualization results for homophily (community attribute: nodes with the same attribute cluster together):

Visualization results for structural equivalence (the blue dots are structurally connected with the yellow and red dots):

Node classification task
Macro-F1 vs Micro-F1 (a quick sketch follows below):
Macro-F1: compute F1 for each class separately, then take the unweighted average.
Micro-F1: first sum the TP, FN and FP counts over all classes, then compute a single F1.
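
As a quick illustration, here is a minimal sketch using scikit-learn's f1_score (toy labels, not data from the paper):

from sklearn.metrics import f1_score

# Toy multi-class labels; class 2 is deliberately rare
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]

# Macro-F1: F1 per class, then the unweighted mean -> rare classes count fully
print(f1_score(y_true, y_pred, average='macro'))
# Micro-F1: pool TP/FP/FN over all classes, then one global F1
print(f1_score(y_true, y_pred, average='micro'))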
There are three datasets; their numbers of vertices, edges and label classes are shown in the table below:

Comparison with baseline:

The last row in the figure above is the percentage performance improvement, and the second-to-last row gives the two hyperparameter settings with which Node2Vec achieves its best result.

Research significance

·It is currently Jure's most-cited paper, with 2800+ citations (as of April 2020)
·Like DeepWalk [2014], it is a representative work of early network representation learning and later served as a classic baseline
·It inspired a lot of follow-up work on random-walk-based network representation learning

Extensive reading

Core points of the abstract

1. Emphasizes the shortcomings of previous feature-engineering-based work, leading to node2vec and the exploration of neighborhood diversity
2. Proposes an adjustable search strategy: a biased random walk algorithm that generates node sequences with different semantics
3. Discusses the efficiency and robustness of the algorithm, and characterizes the model through a case study and a large number of experiments
4. With the above algorithm, node2vec achieves the then-SOTA on network datasets from multiple domains

Paper outline

  1. Introduction
  2. Related Work
  3. Feature Learning Framework
    3.1 Classic search strategies
    3.2 node2vec
    3.3 Learning edge features
  4. Experiments
    4.1 Case study
    4.2 Experiments setup
    4.3 Multi-label classification
    4.4 Parameter sensitivity
    4.5 Perturbation analysis
    4.6 Scalability
    4.7 Link prediction
  5. Discussion and conclusion

Feature engineering on traditional graphs

Node centralities
·Degree (in/out): number of in/out neighbors
·Betweenness: how often the node acts as a bridge on shortest paths
·Closeness: average shortest-path length to other nodes
·PageRank: node importance
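
All of these hand-crafted features are one networkx call away; a minimal sketch (the built-in karate-club graph is just a stand-in example):

import networkx as nx

G = nx.karate_club_graph()  # small built-in social network

degree      = nx.degree_centrality(G)       # normalized degree
betweenness = nx.betweenness_centrality(G)  # how often a node bridges shortest paths
closeness   = nx.closeness_centrality(G)    # inverse of the average shortest-path length
pagerank    = nx.pagerank(G)                # stationary importance under random surfing

print(sorted(pagerank, key=pagerank.get, reverse=True)[:5])  # five most important nodes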

Construction of graphs

1. The fully connected graph
2. The ϵ-neighborhood graph
3. k-nearest neighbor graphs
4. Graphs built according to the practical application problem
5. Graphs synthesized according to the research question

Applications of graphs

Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba,
Alibaba, KDD 2018.
Recommendation system: the dotted lines in the left figure mark the boundaries of time periods; read horizontally, each row is a user's purchase time series. User 1 successively purchased the three items D, A, B, so in the second figure the nodes D, A, B are connected by directed edges. User 2 purchased B, E, D, E, F; because there is a long time gap between E and D, no directed edge is added between E and D. After the graph structure is built, DeepWalk-style random walks generate many sequences (figure c), and finally the word2vec algorithm is applied to learn the embeddings.
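
A minimal sketch of this session-graph construction, with hypothetical sessions that have already been split by the time gap (not Alibaba's actual pipeline):

import networkx as nx

# Hypothetical purchase sessions, one list per user per time period
sessions = [['D', 'A', 'B'], ['B', 'E'], ['D', 'E', 'F']]

# Consecutive purchases within a session add (or strengthen) a directed edge
G = nx.DiGraph()
for session in sessions:
    for src, dst in zip(session, session[1:]):
        if G.has_edge(src, dst):
            G[src][dst]['weight'] += 1
        else:
            G.add_edge(src, dst, weight=1)

print(list(G.edges(data=True)))  # this item graph is then fed to random walks + word2vec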

iFashion:
POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion, Alibaba, KDD 2019

Intensive reading

Overview of the algorithm model in this paper

Borrows from the word2vec algorithm and introduces a similar loss function
Neighborhoods obtained by traversing the graph with BFS and DFS express different semantics
Biased random walk algorithm: parameters p and q
Representation of the graph:
Given $G=(V,E)$, our goal is to learn a mapping
$$f: u \rightarrow \mathbb{R}^d$$
Log-likelihood objective:
$$\max_f \sum_{u\in V}\log\Pr(N_S(u)\mid f(u)) \tag{1}$$
where $N_S(u)$ is the neighborhood of node $u$.
Given node $u$, we want to learn feature representations predictive of the nodes in its neighborhood $N_S(u)$.
BFS: Micro-view of neighbourhood

DFS: Macro-view of neighbourhood

See the example of mages and archmages discussed in the text below.
Two classic strategies to define a neighborhood $N_S(u)$ of a given node $u$ (neighborhood size 3):

In the figure above:
$$N_{BFS}(u)=\{s_1,s_2,s_3\}\quad\text{(local, microscopic view)}$$
$$N_{DFS}(u)=\{s_4,s_5,s_6\}\quad\text{(global, macroscopic view)}$$

Details of the algorithm model in this paper

Detail 1

Optimization goal: similar to skip-gram
Independence assumption: the conditional likelihood factorizes over the set of neighbors
Negative sampling
SGD as the optimization method
The core of this paper: generating $N_S(u)$ by a random walk strategy
Because neighbor nodes are assumed not to affect each other, the probability in formula (1) factorizes (the product turns into a sum after taking the log):
$$\log\Pr(N_S(u)\mid f(u))=\sum_{n_i\in N_S(u)}\log\Pr(f(n_i)\mid f(u))$$
The conditional probability on the right-hand side is a softmax:
$$\Pr(f(n_i)\mid f(u))=\frac{\exp(f(n_i)\cdot f(u))}{\sum_{v\in V}\exp(f(v)\cdot f(u))}$$
Taking the log and substituting back, formula (1) can be written as (the denominator gives the left term, the numerator the right term; a constant factor on the $\log Z_u$ term is ignored):
$$\max_f \sum_{u\in V}\left[-\log Z_u+\sum_{n_i\in N_S(u)}f(n_i)\cdot f(u)\right]$$
where
$$Z_u=\sum_{v\in V}\exp(f(v)\cdot f(u))$$
Computing $Z_u$ requires summing over every vertex, which is computationally expensive. The paper borrows negative sampling and hierarchical softmax from word2vec to approximate this term.
The negative sampling in word2vec uses high-frequency words as negative samples. Here the degree of a vertex plays the role of word frequency: the higher the degree, the greater the sampling probability (a sketch follows).
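
A minimal numpy sketch of the negative-sampling form of this objective; the toy degrees and the 0.75 smoothing exponent (borrowed from word2vec) are assumptions, not taken from the paper:

import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
f = rng.normal(size=(n, d))          # embedding f(u) of each of the n nodes
degree = np.array([4, 3, 2, 2, 1])   # node degrees play the role of word frequency

neg_dist = degree ** 0.75            # higher degree -> higher sampling probability
neg_dist = neg_dist / neg_dist.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pair_objective(u, pos, k=3):
    # log sigmoid for the true neighbor pair, plus k sampled negative pairs,
    # replacing the expensive -log Z_u term
    negs = rng.choice(n, size=k, p=neg_dist)
    return (np.log(sigmoid(f[pos] @ f[u]))
            + np.sum(np.log(sigmoid(-(f[negs] @ f[u])))))

print(pair_objective(u=0, pos=1))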

Detail 2 BFS and DFS

The BFS algorithm uses the queue data structure: FIFO (first in, first out).
The paper associates BFS with structural equivalence (structural similarity: nodes with similar neighborhoods).

from collections import deque

def iter_bfs(G, s):
    S, Q = set(), deque()    # visited set and FIFO queue
    Q.append(s)              # we plan on visiting s, the starting vertex of the traversal
    while Q:                 # any planned nodes left? neighbors to visit sit in Q
        u = Q.popleft()      # get one from the front
        if u in S: continue  # already visited? skip it
        S.add(u)             # we've visited it now: add u to the visited set S
        Q.extend(G[u])       # schedule all neighbors; G[u] are the neighbors of u
        yield u              # report u as visited

The DFS algorithm uses the stack data structure: LIFO (last in, first out).
The paper associates DFS with homophily (community similarity: however the walk goes, it stays within one community).

def iter_dfs(G, s):
    S, Q = set(), []         # visited set and stack
    Q.append(s)              # we plan on visiting s
    while Q:                 # any planned nodes left?
        u = Q.pop()          # get one from the top (note: pop, not popleft)
        if u in S: continue  # already visited? skip it
        S.add(u)             # we've visited it now
        Q.extend(G[u])       # schedule all neighbors
        yield u              # report u as visited

Detail 3 biased Random Walk algorithm

Core of the paper:
The traditional random walk has no ability to explore different types of node neighborhoods, while this paper argues that a network exhibits homophily and structural equivalence at the same time. The transition probability of a traditional random walk (the probability of jumping from node v to node x) is:
$$P(c_i=x\mid c_{i-1}=v)=\begin{cases}\dfrac{\pi_{vx}}{Z} & \text{if } (v,x)\in E\\ 0 & \text{otherwise}\end{cases}$$

$w_{vx}=1$ corresponds to an unweighted graph.
$Z$ is actually not important; it is just the normalization constant over all the weights.
Section 3.2 of the paper focuses on the biased random walk algorithm (2nd order):

$$\alpha_{pq}(t,x)=\begin{cases}\dfrac{1}{p} & \text{if } d_{tx}=0\\ 1 & \text{if } d_{tx}=1\\ \dfrac{1}{q} & \text{if } d_{tx}=2\end{cases}$$
For example, in the figure above the shortest path from t back to t itself is 0, so $\alpha=\frac{1}{p}$;
the shortest path from t to x1 is 1, so $\alpha=1$;
the shortest path from t to x2 is 2, so $\alpha=\frac{1}{q}$;
the shortest path from t to x3 is 2, so $\alpha=\frac{1}{q}$.
$$\pi_{vx}=\alpha_{pq}(t,x)\cdot w_{vx}$$
Here $d_{tx}$ is the shortest-path distance between t and x, so its value can only be 0, 1 or 2.
At time step i the walk is at node v, and at time step i-1 it was at node t. Which node the walk visits next depends on both v and t, which is why this is called a 2nd-order biased random walk.
p and q control how fast the walk leaves the neighborhood of the previous node; they are hyperparameters. For example, with p = 1, q = 0.1 and unit edge weights, the walk tends to move on to x2 and x3, i.e., depth-first behavior.

p: return parameter
Large p: the walk tends not to backtrack, which reduces 2-hop redundancy in the samples
Small p: the walk tends to backtrack, so the sampled sequence concentrates around the starting point
q: in-out parameter
q > 1: BFS-like behavior, local view
q < 1: DFS-like behavior, global view (a sketch of the transition weights follows)
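
A minimal sketch of the resulting unnormalized transition weights $\pi_{vx}$ for a walk that just moved from t to v; it mirrors get_alias_edge in the reference code below (the karate-club graph is only an example):

import networkx as nx

def transition_weights(G, t, v, p, q):
    """Unnormalized 2nd-order weights pi_vx for a walk that moved t -> v."""
    weights = {}
    for x in G.neighbors(v):
        w = G[v][x].get('weight', 1.0)
        if x == t:                 # d_tx = 0: going back to t
            weights[x] = w / p
        elif G.has_edge(t, x):     # d_tx = 1: staying close to t
            weights[x] = w
        else:                      # d_tx = 2: moving away from t
            weights[x] = w / q
    return weights

G = nx.karate_club_graph()
print(transition_weights(G, t=0, v=1, p=1.0, q=0.5))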

Detail 4 algorithm

# G = (V, E, W), where W are the edge weights
# r sampled sequences per node, each of length l
# context window size k
LearnFeatures(Graph G = (V, E, W), Dimensions d, Walks per node r, Walk length l,
              Context size k, Return p, In-out q)
    π = PreprocessModifiedWeights(G, p, q)
    G′ = (V, E, π)
    Initialize walks to Empty
    for iter = 1 to r do                  # each node starts r walks; every walk is added to walks
        for all nodes u ∈ V do
            walk = node2vecWalk(G′, u, l)
            Append walk to walks
    f = StochasticGradientDescent(k, d, walks)
    return f                              # the learned embedding function (network)

node2vecWalk(Graph G′ = (V, E, π), Start node u, Length l)
    Initialize walk to [u]
    for walk_iter = 1 to l do
        curr = walk[−1]
        V_curr = GetNeighbors(curr, G′)   # all neighbors of the current node
        s = AliasSample(V_curr, π)        # biased random walk sampling; see the sampling technique below
        Append s to walk
    return walk

Detail 5 alias sampling

Suppose an event p has four states, occurring with probabilities:
$$p=[0.3, 0.2, 0.1, 0.4]$$
Accumulating them one by one gives:
$$\text{sum}_p=[0.3, 0.5, 0.6, 1.0]$$
To check which state a uniform random number corresponds to, you can use:
a one-by-one linear search: $O(n)$;
or, since the accumulated sequence is increasing, a binary search: $O(\log n)$;
or an even better approach: alias sampling, with time complexity $O(1)$.
The general steps are as follows:
Start from the four probabilities.

Since there are four events, scale the total area from 1 to 4 by multiplying each probability by 4 (for n events, multiply by n).

Cut horizontally at height 1.

The brown part on the right is now vacant, so find the tallest column (the biggest "rich" one) and move part of its mass over to fill the gap (pay attention to the colors):

Find the next tallest column and move part of it as well; after that a piece of blue is still missing:

Fill it from the pink column:

You can see that each column ends up holding at most two events, which gives the following table:

Prob is the probability of the column's own event, and Alias records the other event sharing the column. To sample: first pick a column uniformly at random, then use the probability value to decide between the column's own event and its alias (a standalone sketch follows).
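
A standalone sketch of building the Prob/Alias table for p = [0.3, 0.2, 0.1, 0.4] and drawing from it in O(1); the logic is the same as alias_setup/alias_draw in the reproduction section below:

import numpy as np

probs = [0.3, 0.2, 0.1, 0.4]
K = len(probs)
q = np.array(probs) * K     # scale so the total area is K: [1.2, 0.8, 0.4, 1.6]
J = np.zeros(K, dtype=int)  # alias: the "other" event sharing each column

smaller = [i for i in range(K) if q[i] < 1.0]
larger = [i for i in range(K) if q[i] >= 1.0]
while smaller and larger:
    small, large = smaller.pop(), larger.pop()
    J[small] = large                     # top up the short column from a tall one
    q[large] = q[large] + q[small] - 1.0
    (smaller if q[large] < 1.0 else larger).append(large)

print(q, J)  # the Prob and Alias rows of the table

rng = np.random.default_rng()
def draw():
    k = rng.integers(K)                        # pick a column uniformly
    return k if rng.random() < q[k] else J[k]  # the column's own event or its alias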

Experimental setup and result analysis

Case Study: Les Misérables network

The Les Misérables network (n = 77, m = 254) is visualized under two settings.
Setting p = 1, q = 0.5 (homophily dominates):

Setting p = 1, q = 2 (structural equivalence dominates):

Experimental setup

Then the comparison experiments begin.
The table below shows the three baselines in this paper: spectral clustering (Laplacian eigendecomposition of the graph) and two DL-based methods.

Note that when p = q = 1 there is no control over the direction of the random walk, and node2vec is identical to DeepWalk.
For fairness, the amount of training data used by the DL-based methods is kept the same:
$$K=r\cdot l\cdot |V|$$
that is, the number of walks per node (r = 10) times the walk length (l = 80) times the number of nodes.
All use SGD as the optimizer.
All use negative sampling.

Multi-label classification

Three datasets are used:
BlogCatalog [38]: a network of social relationships between the bloggers listed on the BlogCatalog website. The labels represent blogger interests inferred from the metadata provided by the bloggers. The network has 10,312 nodes, 333,983 edges, and 39 different labels.
Protein-Protein Interactions (PPI) [5]: a subgraph of the PPI network for Homo sapiens, induced by the nodes for which labels could be obtained from the hallmark gene sets [19]; the labels represent biological states. The network has 3,890 nodes, 76,584 edges, and 50 different labels.
Wikipedia [20]: a cooccurrence network of words appearing in the first million bytes of the Wikipedia dump. The labels represent the Part-of-Speech (POS) tags inferred using the Stanford POS-Tagger [32]. The network has 4,777 nodes, 184,812 edges, and 40 different labels.
The above three datasets share one feature: they have both structural and community characteristics.
All these networks exhibit a fair mix of homophily and structural equivalence.
The results are shown in the figure above.

Parameter sensitivity

Training with different fractions of labeled data, the Macro-F1 and Micro-F1 results are as follows:

Sensitivity of parameters:

The smaller p and q are, the more the walk backtracks and explores in depth, and the better the results; a larger embedding dimension also helps. The following three plots show the number of walks per node, the walk length, and the context window size: the larger, the better.

Scalability

Sampling time grows roughly linearly with the number of nodes.

Link prediction

The datasets for this task are different:
Facebook [14]: In the Facebook network, nodes represent users, and edges represent a friendship relation between any two users. The network has 4,039 nodes and 88,234 edges.
Protein-Protein Interactions (PPI) [5]: In the PPI network for Homo Sapiens, nodes represent proteins, and an edge indicates a biological interaction between a pair of proteins. The network has 19,706 nodes and 390,633 edges.
arXiv ASTRO-PH [14]: This is a collaboration network generated from papers submitted to the e-print arXiv where nodes represent scientists, and an edge is present between two scientists if they have collaborated in a paper. The network has 18,722 nodes and 198,110 edges.
result:

At the top are four traditional algorithms

Common Neighbors: predicts whether an edge exists from the number of common neighbor nodes.
Jaccard's Coefficient: a normalized version of the above. For example, two celebrities may share 1,000 common friends while each having 10 million friends in total, whereas two ordinary people may share 50 common friends out of about 100 each; the two ordinary people are then more likely to know each other than the celebrities.
Adamic-Adar score: builds on the previous scores by weighting each common friend (a low-degree common neighbor contributes more).
Preferential Attachment: users with more friends are more likely to form new links, i.e., "the rich get richer". Based on this idea, the product of the two users' friend counts is used as the score.
There are also some references: https://blog.csdn.net/a358463121/article/details/79350292
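
All four heuristics are available directly in networkx; a quick sketch (the graph and node pairs are arbitrary examples):

import networkx as nx

G = nx.karate_club_graph()
pairs = [(0, 33), (5, 16)]  # candidate node pairs to score

cn = {(u, v): len(list(nx.common_neighbors(G, u, v))) for u, v in pairs}
jc = {(u, v): s for u, v, s in nx.jaccard_coefficient(G, pairs)}
aa = {(u, v): s for u, v, s in nx.adamic_adar_index(G, pairs)}
pa = {(u, v): s for u, v, s in nx.preferential_attachment(G, pairs)}

print(cn, jc, aa, pa, sep='\n')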
The following (a)-(d) represent four different ways of combining the embeddings of an edge's two endpoints into an edge feature:
(a) Average, (b) Hadamard, (c) Weighted-L1, and (d) Weighted-L2

Operator (b), the Hadamard product, works best.
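
A minimal sketch of the four operators applied to the embeddings of an edge's two endpoints (random vectors here, just to show the shapes):

import numpy as np

fu, fv = np.random.rand(128), np.random.rand(128)  # embeddings of the two endpoints

edge_features = {
    'average': (fu + fv) / 2,        # (a)
    'hadamard': fu * fv,             # (b) element-wise product, best in the paper
    'weighted_l1': np.abs(fu - fv),  # (c)
    'weighted_l2': (fu - fv) ** 2,   # (d)
}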

Paper summary

Key points
·word2vec training framework
·Training sequences generated by random walks
·Performance: alias sampling
·Experimental setup
Innovations
·Discusses the semantics of BFS and DFS
·Designs the biased random walk
·Rich experiments demonstrating the effect
Inspiration points
·The role of graph understanding in network representation learning (must see)
·The authors are researchers in the graph / social network area, and their perspective comes from their understanding of graphs
·The random walk approach inspired a lot of follow-up work
·From DeepWalk in 2014 and node2vec in 2016 (this paper) through metapath2vec: Scalable Representation Learning for Heterogeneous Networks [KDD 17], random-walk-based work continued up to 2020
·Complexity analysis: the algorithm part is discussed in detail, showing the efficiency of the algorithm

Reproduction

The karate dataset can be downloaded from https://github.com/aditya-grover/node2vec/
It has 34 nodes.
The main part is the code implementation of the alias table; compare it with the pseudocode above.

main.py

'''
Reference implementation of node2vec. 

Author: Aditya Grover

For more details, refer to the paper:
node2vec: Scalable Feature Learning for Networks
Aditya Grover and Jure Leskovec 
Knowledge Discovery and Data Mining (KDD), 2016
'''

import argparse
import numpy as np
import networkx as nx
import node2vec
from gensim.models import Word2Vec


# Read the graph and set the model parameters
# Compute the alias tables for nodes and edges
# Generate node sequences by biased random walk
# Train the model with word2vec
# Present and visualize the results



# 1. Read the graph and set the model parameters
# 1) Model parameters: graph-related flags such as directed/undirected and weighted/unweighted, and model coefficients such as p, q and the embedding dimension
# 2) Reading the graph is very simple: load the edgelist directly with the networkx package
# 3) Input and output
# Input file: graph/karate.edgelist
# Output file: emb/karate.emb
def parse_args():
    '''
	Parses the node2vec arguments.
	'''
    parser = argparse.ArgumentParser(description="Run node2vec.")
	# input file 
    parser.add_argument('--input', nargs='?', default='graph/karate.edgelist', help='Input graph path')
	# output file
    parser.add_argument('--output', nargs='?', default='emb/karate.emb', help='Embeddings path')
	# Representation dimension
    parser.add_argument('--dimensions', type=int, default=128, help='Number of dimensions. Default is 128.')
	# sequence length
    parser.add_argument('--walk-length', type=int, default=80, help='Length of walk per source. Default is 80.')
	# The number of times each node generates a sequence
    parser.add_argument('--num-walks', type=int, default=10, help='Number of walks per source. Default is 10.')
	# Skip gram context window size
    parser.add_argument('--window-size', type=int, default=10, help='Context size for optimization. Default is 10.')

    parser.add_argument('--iter', default=1, type=int, help='Number of epochs in SGD')

    parser.add_argument('--workers', type=int, default=8, help='Number of parallel workers. Default is 8.')

    parser.add_argument('--p', type=float, default=1, help='Return hyperparameter. Default is 1.')

    parser.add_argument('--q', type=float, default=1, help='Inout hyperparameter. Default is 1.')

    parser.add_argument('--weighted', dest='weighted', action='store_true',
                        help='Boolean specifying (un)weighted. Default is unweighted.')
    parser.add_argument('--unweighted', dest='unweighted', action='store_false')
    parser.set_defaults(weighted=False)

    parser.add_argument('--directed', dest='directed', action='store_true',
                        help='Graph is (un)directed. Default is undirected.')
    parser.add_argument('--undirected', dest='undirected', action='store_false')
    parser.set_defaults(directed=False)

    return parser.parse_args()


def read_graph():
    '''
	Reads the input network in networkx.
	'''
    if args.weighted:  # weighted graph: read the edge weights from the file
        G = nx.read_edgelist(args.input, nodetype=int, data=(('weight', float),), create_using=nx.DiGraph())
    else:  # unweighted graph: set every edge weight to 1
        G = nx.read_edgelist(args.input, nodetype=int, create_using=nx.DiGraph())
        for edge in G.edges():
            G[edge[0]][edge[1]]['weight'] = 1

    if not args.directed:#Undirected graph
        G = G.to_undirected()

    return G


def learn_embeddings(walks):
    '''
    Learn embeddings by optimizing the Skipgram objective using SGD.
    '''
    # The original line converts ints to strings, but it relies on Python 2 map():
    # walks = [map(str, walk) for walk in walks]

    # Rewritten for Python 3 (map returns a lazy iterator there):
    walk_new = []
    for walk in walks:
        tmp = []
        for node in walk:
            tmp.append(str(node))
        walk_new.append(tmp)

    model = Word2Vec(walk_new, size=args.dimensions, window=args.window_size, min_count=0, sg=1, workers=args.workers,
                     iter=args.iter)
    # older gensim API; newer versions use model.wv.save_word2vec_format
    model.save_word2vec_format(args.output)

    return


def main(args):
    '''
	Pipeline for representational learning for all nodes in a graph.
	'''
    nx_G = read_graph()
    G = node2vec.Graph(nx_G, args.directed, args.p, args.q)
    G.preprocess_transition_probs()#Generate alias table
    walks = G.simulate_walks(args.num_walks, args.walk_length)# Generating node sequence by biased random walk
    learn_embeddings(walks)


if __name__ == "__main__":
    args = parse_args()
    main(args)

node2vec.py

import numpy as np
import networkx as nx
import random


class Graph():
	def __init__(self, nx_G, is_directed, p, q):
		self.G = nx_G
		self.is_directed = is_directed
		self.p = p
		self.q = q

	def node2vec_walk(self, walk_length, start_node):  # core algorithm: generate one biased random walk sequence
		'''
		Simulate a random walk starting from start node.
		'''
		G = self.G
		alias_nodes = self.alias_nodes
		alias_edges = self.alias_edges

		walk = [start_node]

		while len(walk) < walk_length:# Loop to the length of sequence
			cur = walk[-1]
			cur_nbrs = sorted(G.neighbors(cur))  # sorted so the order matches the alias tables computed below
			if len(cur_nbrs) > 0:
				if len(walk) == 1:  # first step: no previous node yet, sample from the node's 1st-order alias table
					walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0], alias_nodes[cur][1])])
				else:
					prev = walk[-2]#Previous node
					#Next node
					next = cur_nbrs[alias_draw(alias_edges[(prev, cur)][0], 
						alias_edges[(prev, cur)][1])]
					walk.append(next)
			else:
				break

		return walk

	def simulate_walks(self, num_walks, walk_length):  # repeatedly run the single-node walk for every node
		'''
		Repeatedly simulate random walks from each node.
		'''
		G = self.G
		walks = []  # list of all walks
		nodes = list(G.nodes())
		print('Walk iteration:')  # Python 3 print (the original repo uses Python 2 print statements)
		for walk_iter in range(num_walks):  # each node walks num_walks times
			print(str(walk_iter+1), '/', str(num_walks))
			random.shuffle(nodes)  # shuffle the order of the nodes
			for node in nodes:
				walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node))

		return walks

	def get_alias_edge(self, src, dst):
		'''
		Get the alias edge setup lists for a given edge.
		'''
		G = self.G
		p = self.p
		q = self.q

		unnormalized_probs = []
		for dst_nbr in sorted(G.neighbors(dst)):  # core of the bias: src is the previous node; p controls backtracking, q controls moving outward
			# compute the unnormalized weight of each candidate next node via p and q
			if dst_nbr == src:
				unnormalized_probs.append(G[dst][dst_nbr]['weight']/p)
			elif G.has_edge(dst_nbr, src):
				unnormalized_probs.append(G[dst][dst_nbr]['weight'])
			else:
				unnormalized_probs.append(G[dst][dst_nbr]['weight']/q)
		# normalization
		norm_const = sum(unnormalized_probs)
		normalized_probs =  [float(u_prob)/norm_const for u_prob in unnormalized_probs]

		return alias_setup(normalized_probs)

	def preprocess_transition_probs(self):
		'''
		Preprocessing of transition probabilities for guiding the random walks.
		'''
		G = self.G
		is_directed = self.is_directed  # whether the graph is directed

		alias_nodes = {}  # dictionary of 1st-order alias tables
		for node in G.nodes():  # loop over every node in the graph
			unnormalized_probs = [G[node][nbr]['weight'] for nbr in sorted(G.neighbors(node))]  # weights to each neighbor, e.g. 4 of them
			norm_const = sum(unnormalized_probs)  # sum, e.g. 1 + 1 + 1 + 1 = 4 for an unweighted graph
			normalized_probs =  [float(u_prob)/norm_const for u_prob in unnormalized_probs]  # each neighbor then has probability 1/4
			alias_nodes[node] = alias_setup(normalized_probs)  # build an alias table so that sampling costs O(1)

		alias_edges = {}
		triads = {}

		if is_directed:
			for edge in G.edges():  # directed graph: only the given direction needs an alias table
				alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
		else:
			for edge in G.edges():  # undirected graph: build alias tables for both directions
				alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
				alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0])

		self.alias_nodes = alias_nodes
		self.alias_edges = alias_edges

		return


def alias_setup(probs):
	'''
	Compute utility lists for non-uniform sampling from discrete distributions.
	Refer to https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
	for details
	'''
	K = len(probs)
	q = np.zeros(K)
	J = np.zeros(K, dtype=int)  # alias indices (np.int is deprecated in recent numpy)

	smaller = []
	larger = []
	for kk, prob in enumerate(probs):  # scale by K and split the columns into those below and above height 1
	    q[kk] = K*prob
	    if q[kk] < 1.0:
	        smaller.append(kk)
	    else:
	        larger.append(kk)
	# greedily fill each column below 1 with mass from a column above 1
	while len(smaller) > 0 and len(larger) > 0:
	    small = smaller.pop()
	    large = larger.pop()

	    J[small] = large
	    q[large] = q[large] + q[small] - 1.0
	    if q[large] < 1.0:
	        smaller.append(large)
	    else:
	        larger.append(large)

	return J, q

def alias_draw(J, q):#Specific sampling algorithm, O(1)
	'''
	Draw sample from a non-uniform discrete distribution using alias sampling.
	'''
	K = len(J)

	kk = int(np.floor(np.random.rand()*K))
	if np.random.rand() < q[kk]:
	    return kk
	else:
	    return J[kk]
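
A quick sanity check of the two functions above (my own addition, not part of the repository): draw many samples and compare the empirical frequencies with the target distribution.

if __name__ == '__main__':
	probs = [0.3, 0.2, 0.1, 0.4]
	J, q = alias_setup(probs)
	counts = np.zeros(len(probs))
	for _ in range(100000):
		counts[alias_draw(J, q)] += 1
	print(counts / counts.sum())  # should be close to [0.3, 0.2, 0.1, 0.4]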
