# Preface

These notes are based on the Deep Eye (深度之眼) paper-reading course, and some screenshots are taken from the course video.

Article title: node2vec: Scalable Feature Learning for Networks

node2vec: representation learning for nodes of large-scale networks

Authors: Aditya Grover, Jure Leskovec

Affiliation: Stanford

Venue and year: KDD 2016


## Thesis structure

- Abstract: introduces the background and proposes the node2vec model with a flexible, adjustable search strategy.
- Introduction: explains the importance of graphs and compares node2vec with earlier methods such as PCA, IsoMap, and DeepWalk.
- Related Work: traditional hand-engineered graph features, dimensionality-reduction algorithms, etc.
- Feature Learning: basic graph concepts, the skip-gram algorithm, and the optimization objective.
- Search strategies: BFS and DFS search strategies.
- node2vec: the biased random walk algorithm, how to choose the parameters p and q, time-complexity analysis, and alias sampling.
- Effectiveness: experiments exploring the effectiveness of the model: case study, baseline settings, node classification, and link prediction.
- Experiments: experiments exploring robustness and scalability.
- Discussion: summarizes the paper, presents node2vec as a network representation learning method with an adjustable search strategy, and proposes directions for further research.

## Learning objectives

# Research background, achievements and significance of this paper

## Research background

Graphs are ubiquitous: computer science, social sciences, biology, economics, statistics.

A graph is a model for describing complex data (an arbitrary number of nodes with complex relationships), in contrast to grid-structured images (CNN) and sequential text (word2vec).

SNAP: Stanford Large Network Dataset Collection

The SNAP datasets are network datasets continuously collected by Jure and collaborators, which have greatly advanced the social-network field. The research covers: node classification, link prediction, community detection, viral marketing, and network similarity.

The learning goal of network embedding is to map the nodes (or edges) of a high-dimensional graph into d-dimensional vectors.

## Research results

Case analysis based on the Les Misérables dataset

Data set size: n=77, m=254

Visualization of homophily (community attribute; nodes with the same attribute):

Visualization of structural equivalence (the blue nodes are structurally similar to the yellow and red nodes):

Node classification task

Macro-F1 vs Micro-F1

Macro-F1: compute F1 for each class separately, then average the per-class scores

Micro-F1: first sum TP, FN, and FP over all classes, then compute a single F1 from the totals
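The two averaging schemes can be made concrete with a small pure-Python sketch; the toy label lists `y_true`/`y_pred` below are made up for illustration:

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from raw counts; 0 when the denominator vanishes."""
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

def macro_micro_f1(y_true, y_pred, classes):
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but the true class was t
            fn[t] += 1   # class t was missed
    # Macro-F1: F1 per class, then the unweighted average
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro-F1: pool TP/FP/FN over all classes first, then one F1
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro

y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "a"]
macro, micro = macro_micro_f1(y_true, y_pred, ["a", "b", "c"])
```

For single-label multi-class data, Micro-F1 pooled this way coincides with accuracy, while Macro-F1 weights rare classes as heavily as frequent ones.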

Three datasets are used; their node, edge, and label counts are shown in the table below:

Comparison with baseline:

The last row in the table above is the percentage performance gain, and the second-to-last row shows the two hyperparameter settings with which node2vec achieves its best results.

## Research significance

·It is currently Jure's most-cited article, with 2800+ citations (as of April 2020)

·Like DeepWalk [2014], it is a representative work of early network representation learning and later served as a classic baseline

·It inspired a large amount of follow-up work on random-walk-based network representation learning

# Extensive reading

## Abstract core viewpoint

1. Emphasizes the shortcomings of previous feature-engineering-based work, motivating node2vec and its exploration of diverse neighborhoods

2. Proposes an adjustable search strategy: a biased random walk algorithm that generates node sequences with different semantics

3. Discusses the efficiency and robustness of the algorithm, and characterizes the model through a case study and extensive experiments

4. Based on the above, node2vec achieves state-of-the-art (SOTA) results on network datasets from multiple domains

## Paper outline

- Introduction
- Related Work
- Feature Learning Framework
  - 3.1 Classic search strategies
  - 3.2 node2vec
  - 3.3 Learning edge features
- Experiments
  - 4.1 Case study
  - 4.2 Experimental setup
  - 4.3 Multi-label classification
  - 4.4 Parameter sensitivity
  - 4.5 Perturbation analysis
  - 4.6 Scalability
  - 4.7 Link prediction
- Discussion and conclusion

## Feature engineering on traditional graphs

Node centralities:

·Degree (in/out): node degree

·Betweenness: bridge-like importance (how often a node lies on shortest paths)

·Closeness: based on path lengths to other nodes

·PageRank: node importance

## Construction of graph

1. The fully connected graph

2. The ϵ-neighborhood graph

3. k-nearest neighbor graphs

4. Graphs built according to the practical application problem

5. Graphs synthesized according to the research question

## Application of graph

Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba, Alibaba, KDD 2018.

Recommendation system: in the figure on the left, the dotted lines mark the boundaries of time windows; read horizontally, each row is a time series. User 1 purchased the three items D, A, B in succession, so in the second figure the nodes D, A, B are connected by directed edges. User 2 purchased B, E, D, E, F; because there is a long time gap between E and D, no directed edge is added between E and D. After the graph structure is constructed, DeepWalk-style random walks generate many sequences (figure c), and finally the word2vec algorithm is applied to learn the embeddings used for prediction.

iFashion:

POG:Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion, alibaba kdd 2019

# Intensive reading

## Overview of algorithm model in this paper

Borrows the word2vec training framework and introduces a similar loss function

Node representations obtained by BFS-like and DFS-like traversals of the graph carry different semantics

Biased random walk algorithm with parameters p and q

Representation of the graph:

Given $G=(V,E)$, our goal is to learn a mapping

$$f:u\rightarrow \mathbb{R}^d$$

Log-likelihood objective:

$$\underset{f}{\text{max}}\sum_{u\in V}\log \Pr(N_S(u)\mid f(u))\tag{1}$$

where $N_S(u)$ is the neighborhood of node $u$.

Given node $u$, we want to learn feature representations predictive of the nodes in its neighborhood $N_S(u)$.

BFS: Micro-view of neighbourhood

DFS: Macro-view of neighbourhood

Look at the examples of mages and archmages in the text below

Two classic strategies to define a neighborhood $N_S(u)$ of a given node $u$ (1-hop):

In the figure above:

$$N_{BFS}(u)=\{s_1,s_2,s_3\},\quad \text{local microscopic view}$$

$$N_{DFS}(u)=\{s_4,s_5,s_6\},\quad \text{global macroscopic view}$$

## Details of algorithm model in this paper

### Detail 1

Optimization objective: similar to skip-gram

Independence assumption: the conditional likelihood factorizes over the set of neighbors

Negative sampling

SGD as the optimization method

The core of this paper: generating $N_S(u)$ by a random walk strategy.

Because neighbor nodes are assumed not to affect each other, formula (1) factorizes (after taking the log, the product becomes a sum):

$$\log \Pr(N_S(u)\mid f(u))=\sum_{n_i\in N_S(u)}\log \Pr(f(n_i)\mid f(u))$$

The conditional probability on the right-hand side is modeled with a softmax:

$$\Pr(f(n_i)\mid f(u))=\cfrac{\exp(f(n_i)\cdot f(u))}{\sum_{v\in V}\exp(f(v)\cdot f(u))}$$

Taking the outer log, formula (1) can be written as follows (the softmax denominator becomes the $-\log Z_u$ term and the numerator becomes the dot-product term):

$$\underset{f}{\text{max}}\sum_{u\in V}\left[-\log Z_u+\sum_{n_i\in N_S(u)}f(n_i)\cdot f(u)\right]$$

where

$$Z_u=\sum_{v\in V}\exp(f(v)\cdot f(u))$$
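The rewrite from the softmax form to the $-\log Z_u$ form can be checked numerically. A tiny sketch with hypothetical 2-d embeddings for a 3-node graph:

```python
import math

# toy 2-d embeddings for a 3-node graph (hypothetical values)
f = {
    "u": [0.5, 1.0],
    "a": [1.0, -0.5],
    "b": [-0.2, 0.3],
}

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# softmax form: Pr(a | u) = exp(f(a)·f(u)) / sum_v exp(f(v)·f(u))
Z_u = sum(math.exp(dot(f[v], f["u"])) for v in f)
log_pr = math.log(math.exp(dot(f["a"], f["u"])) / Z_u)

# rewritten form: f(a)·f(u) - log Z_u
rewritten = dot(f["a"], f["u"]) - math.log(Z_u)
```

Both forms give the same value, which is the point of the rewrite: the dot-product term is cheap, and only $\log Z_u$ needs approximating.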

Computing $Z_u$ in the formula above requires a sum over every vertex, which is computationally expensive. The paper borrows negative sampling and hierarchical softmax from word2vec to approximate this term.

Negative sampling in word2vec draws high-frequency words as negative samples; here, the degree of a vertex plays the role of word frequency: the higher the degree, the greater the sampling probability.
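A minimal sketch of this idea (not the paper's exact implementation): draw negative samples with probability proportional to node degree. The toy adjacency list below is made up for illustration:

```python
import random
from collections import Counter

# toy adjacency list (hypothetical)
graph = {
    "a": ["b", "c", "d"],   # degree 3
    "b": ["a"],             # degree 1
    "c": ["a", "d"],        # degree 2
    "d": ["a", "c"],        # degree 2
}

nodes = list(graph)
degrees = [len(graph[u]) for u in nodes]

def sample_negatives(k, rng=random):
    # higher degree -> higher chance of being drawn as a negative sample,
    # mirroring word2vec's use of word frequency
    return rng.choices(nodes, weights=degrees, k=k)

random.seed(0)
draws = Counter(sample_negatives(8000))
# "a" (degree 3) should be drawn roughly 3x as often as "b" (degree 1)
```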

### Detail 2 BFS and DFS

The BFS algorithm uses a queue as its data structure (FIFO, first in first out).

The paper associates BFS with structural equivalence (structural similarity: nodes whose neighborhoods look alike).

```python
from collections import deque

def iter_bfs(G, s):
    S, Q = set(), deque()   # visited set and queue
    Q.append(s)             # we plan on visiting s, the starting point of the traversal
    while Q:                # planned nodes left?
        u = Q.popleft()     # get one
        if u in S:          # already visited? skip it
            continue
        S.add(u)            # we've visited u now; add it to the visited set
        Q.extend(G[u])      # schedule all neighbors; G[u] are the neighbors of u
        yield u             # report u as visited
```

The DFS algorithm uses a stack as its data structure (LIFO, first in last out).

The paper associates DFS with homophily (community similarity: however the walk goes, it stays inside one community).

```python
def iter_dfs(G, s):
    S, Q = set(), []        # visited set and stack
    Q.append(s)             # we plan on visiting s
    while Q:                # planned nodes left?
        u = Q.pop()         # get one (note: pop, not popleft -- stack behavior)
        if u in S:          # already visited? skip it
            continue
        S.add(u)            # we've visited u now
        Q.extend(G[u])      # schedule all neighbors
        yield u             # report u as visited
```

### Detail 3 biased Random Walk algorithm

Core of the paper:

The traditional random walk cannot explore different types of neighborhoods of a node. The paper argues that a network exhibits structural equivalence and homophily at the same time. The transition probability of a traditional random walk (the probability of jumping from node $v$ to node $x$) is:

$$P(c_i=x\mid c_{i-1}=v)=\begin{cases} \cfrac{\pi_{vx}}{Z} & \text{if } (v,x)\in E \\ 0 & \text{otherwise} \end{cases}$$

$w_{vx}=1$ corresponds to an unweighted graph.

$Z$ is just the normalization term over all the weights; its exact value is not important here.

Section 3.2 of the paper focuses on the (2nd-order) biased random walk algorithm:

$$\alpha_{pq}(t,x)=\begin{cases} \cfrac{1}{p} & \text{if } d_{tx}=0 \\ 1 & \text{if } d_{tx}=1 \\ \cfrac{1}{q} & \text{if } d_{tx}=2 \end{cases}$$

For example, in the figure above, the shortest path from $t$ to $t$ itself is 0, so $\alpha=\cfrac{1}{p}$;

the shortest path from $t$ to $x_1$ is 1, so $\alpha=1$;

the shortest path from $t$ to $x_2$ is 2, so $\alpha=\cfrac{1}{q}$;

the shortest path from $t$ to $x_3$ is 2, so $\alpha=\cfrac{1}{q}$.

$$\pi_{vx}=\alpha_{pq}(t,x)\cdot w_{vx}$$

$d_{tx}$ is the shortest-path distance between $t$ and $x$; it can only take the values 0, 1, or 2.

At time step $i$ the walk is at node $v$, having been at node $t$ at time step $i-1$. Which node the walk moves to next depends on the previous step as well, so this is called a 2nd-order biased random walk.

$p$ and $q$ are hyperparameters that control how fast the walk leaves the neighborhood of the previous node $t$. For example, when $p=1$, $q=0.1$, and all weights are 1, the walk tends to move outward to $x_2$ and $x_3$, i.e., depth-first behavior.

p: return parameter

Large p: the walk tends not to backtrack, which reduces 2-hop redundancy

Small p: the walk tends to backtrack, so the sampled sequence stays concentrated around the starting point

q: in-out parameter

q > 1: BFS-like behavior, local view

q < 1: DFS-like behavior, global view
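The α-rule combined with the edge weights gives the unnormalized transition weights $\pi_{vx}$. A minimal pure-Python sketch, where the tiny graph and the node names t, v, x1, x2 are hypothetical:

```python
def step_weights(graph, t, v, p, q):
    """Unnormalized pi_vx = alpha_pq(t, x) * w_vx for every neighbor x of v,
    where the walk came from t and currently sits at v."""
    weights = {}
    for x, w_vx in graph[v].items():
        if x == t:                 # d_tx = 0: stepping back to t
            alpha = 1.0 / p
        elif t in graph[x]:        # d_tx = 1: x is also a neighbor of t
            alpha = 1.0
        else:                      # d_tx = 2: x moves away from t
            alpha = 1.0 / q
        weights[x] = alpha * w_vx
    return weights

# hypothetical unweighted graph with edges t-v, t-x1, v-x1, v-x2
graph = {
    "t":  {"v": 1, "x1": 1},
    "v":  {"t": 1, "x1": 1, "x2": 1},
    "x1": {"t": 1, "v": 1},
    "x2": {"v": 1},
}

w = step_weights(graph, "t", "v", p=2.0, q=0.5)
# w == {"t": 0.5, "x1": 1.0, "x2": 2.0}
```

Normalizing these weights gives the transition distribution for the next step; with p = 2 the walk is discouraged from backtracking to t, and with q = 0.5 it is encouraged to move outward to x2.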

### Detail 4: the algorithm

```
# G = (V, E, W), where W are the edge weights
# r sequences are sampled per node, each of length l
# the context window size is k
LearnFeatures(Graph G = (V, E, W), Dimensions d, Walks per node r,
              Walk length l, Context size k, Return p, In-out q)
    π = PreprocessModifiedWeights(G, p, q)
    G′ = (V, E, π)
    Initialize walks to Empty
    for iter = 1 to r do              # each node starts r walks, giving r sequences
        for all nodes u ∈ V do
            walk = node2vecWalk(G′, u, l)
            Append walk to walks
    f = StochasticGradientDescent(k, d, walks)
    return f                          # the learned mapping
```

```
node2vecWalk(Graph G′ = (V, E, π), Start node u, Length l)
    Initialize walk to [u]
    for walk_iter = 1 to l do
        curr = walk[−1]
        Vcurr = GetNeighbors(curr, G′)   # all neighbors of the current node
        s = AliasSample(Vcurr, π)        # biased sampling; see the sampling technique below
        Append s to walk
    return walk
```

### Detail 5 alias sampling

Suppose an event $p$ has four possible outcomes, with probabilities:

$$p=[0.3,\,0.2,\,0.1,\,0.4]$$

Accumulating them one by one gives:

$$\text{sum}_p=[0.3,\,0.5,\,0.6,\,1]$$

To find which outcome a given probability value corresponds to:

A one-by-one linear search costs $O(n)$.

Since the cumulative sums form an increasing sequence, binary search can be used: $O(\log n)$.

There is an even cleverer approach: alias sampling, whose time per draw is $O(1)$.

The general steps are as follows:

There are four probabilities.

Since there are four events, scale the total area from 1 to 4, i.e., multiply every probability by 4 (for n events, multiply by n).

Cut horizontally at height 1.

The brown column on the right now has a vacancy, so take a 2/3 piece from the largest ("richest") column to fill it (note the colors):

Find the next richest column and move a 2/3 piece from it; after that, a piece of blue is still missing:

Move a 1/3 piece from the pink column:

Each column now contains at most two events, giving the following table:

Prob stores the probability of the column's own event, and Alias records which other event fills the rest of the column. To sample: first pick a column uniformly at random, then use the Prob value to decide whether to return the column's own event or its Alias.
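The steps above can be sketched in pure Python; `alias_setup`/`alias_draw` here are illustrative re-implementations of the table construction (the split order may differ from the figures, but each column still holds at most two events), using the p = [0.3, 0.2, 0.1, 0.4] example:

```python
import random

def alias_setup(probs):
    """Build the Prob/Alias tables; O(n) setup."""
    K = len(probs)
    prob_table = [0.0] * K
    alias_table = [0] * K
    smaller, larger = [], []
    for i, p in enumerate(probs):
        prob_table[i] = K * p               # scale the total area from 1 to K
        (smaller if prob_table[i] < 1.0 else larger).append(i)
    while smaller and larger:               # move mass from rich columns to poor ones
        small, large = smaller.pop(), larger.pop()
        alias_table[small] = large          # the rich event fills the poor column
        prob_table[large] -= 1.0 - prob_table[small]
        (smaller if prob_table[large] < 1.0 else larger).append(large)
    return prob_table, alias_table

def alias_draw(prob_table, alias_table, rng=random):
    """O(1) per draw: pick a column uniformly, then itself or its alias."""
    k = rng.randrange(len(prob_table))
    return k if rng.random() < prob_table[k] else alias_table[k]

prob, alias = alias_setup([0.3, 0.2, 0.1, 0.4])
```

Summing each event's own column share plus its alias shares recovers the original probabilities exactly, which is what makes the O(1) draw correct.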

## Experimental setup and result analysis

## Case Study: Les Misérables network

The dataset (n = 77 nodes, m = 254 edges) is visualized under two hyperparameter settings:

set p = 1, q = 0.5

set p = 1, q = 2

## Experimental setup

Next, the comparative experiments.

The following table lists the baselines used in this paper: spectral clustering (based on the Laplacian decomposition of the graph) and two DL-based methods.

Note that when p = q = 1, the random walk is not biased in any direction, and node2vec degenerates to DeepWalk.

For fairness, the amount of training data used by the two other DL methods is kept the same:

$$K=r\cdot l\cdot |V|$$

i.e., the number of walks per node (10) times the walk length (80) times the number of nodes.

Both use SGD as the optimizer.

All use negative sampling.

## Multi-label classification

There are three types of data sets:

BlogCatalog [38]: This is a network of social relationships of the bloggers listed on the BlogCatalog website. The labels represent blogger interests inferred through the metadata provided by the bloggers. The network has 10,312 nodes, 333,983 edges, and 39 different labels.

Protein-Protein Interactions (PPI) [5]: We use a subgraph of the PPI network for Homo sapiens. The subgraph corresponds to the graph induced by nodes for which we could obtain labels from the hallmark gene sets [19] and represent biological states. The network has 3,890 nodes, 76,584 edges, and 50 different labels.

Wikipedia [20]: This is a co-occurrence network of words appearing in the first million bytes of the Wikipedia dump. The labels represent the Part-of-Speech (POS) tags inferred using the Stanford POS-Tagger [32]. The network has 4,777 nodes, 184,812 edges, and 40 different labels.

All three datasets share one feature: they exhibit both structural and community characteristics.

All these networks exhibit a fair mix of homophilic and structural equivalences.

The results are shown in the figure above.

## Parameter sensitivity

Training with different amounts of data, the results on Macro-F1 and Micro-F1 are as follows:

Sensitivity of parameters:

The smaller p and q are, the more the walk backtracks and explores in depth, and the better the results; a larger embedding dimension also helps. The following three plots show the number of walks per node, the walk length, and the context window size; for all three, larger is better.

## Scalability

Scaling is basically linear.

## Link prediction

The datasets change for this task:

Facebook [14]: In the Facebook network, nodes represent users, and edges represent a friendship relation between any two users. The network has 4,039 nodes and 88,234 edges.

Protein-Protein Interactions (PPI) [5]: In the PPI network for Homo Sapiens, nodes represent proteins, and an edge indicates a biological interaction between a pair of proteins. The network has 19,706 nodes and 390,633 edges.

arXiv ASTRO-PH [14]: This is a collaboration network generated from papers submitted to the e-print arXiv where nodes represent scientists, and an edge is present between two scientists if they have collaborated in a paper. The network has 18,722 nodes and 198,110 edges.

Results:

At the top are four traditional heuristics:

Common Neighbors: predict whether an edge exists based on how many neighbors the two endpoints share.

Jaccard's Coefficient: a normalized version of the above. For example, two celebrity accounts may share 1,000 common friends while each has 10 million friends in total, whereas two ordinary people share 50 common friends out of roughly 100 each; the two ordinary people are then more likely to know each other than the celebrities.

Adamic-Adar score: builds on the previous heuristic by weighting each common friend rather than counting them equally.

Preferential Attachment: users with more friends are more likely to form new links, i.e., "the rich get richer"; the score is the product of the two users' friend counts.

Some further references: https://blog.csdn.net/a358463121/article/details/79350292
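The four heuristics can be sketched in a few lines of pure Python; the toy adjacency-set graph below is made up for illustration (for Adamic-Adar, each common neighbor is weighted by the inverse log of its degree, so rare neighbors count more):

```python
import math

def common_neighbors(G, u, v):
    return len(G[u] & G[v])

def jaccard(G, u, v):
    union = G[u] | G[v]
    return len(G[u] & G[v]) / len(union) if union else 0.0

def adamic_adar(G, u, v):
    # rare common neighbors (low degree) contribute more
    return sum(1.0 / math.log(len(G[w])) for w in G[u] & G[v] if len(G[w]) > 1)

def preferential_attachment(G, u, v):
    return len(G[u]) * len(G[v])

# toy graph as adjacency sets (hypothetical)
G = {
    "u": {"a", "b", "c"},
    "v": {"a", "b", "d"},
    "a": {"u", "v"},
    "b": {"u", "v", "c", "d"},
    "c": {"u", "b"},
    "d": {"v", "b"},
}

cn = common_neighbors(G, "u", "v")   # {"a", "b"} -> 2
```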

The following (a)-(d) are four different binary operators for turning two node embeddings into an edge feature:

(a) Average, (b) Hadamard, (c) Weighted-L1, and (d) Weighted-L2

The Hadamard operator (b) performs best.
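The four operators, applied element-wise to the endpoint embeddings f(u) and f(v), can be sketched as follows (the example vectors are hypothetical):

```python
def average(fu, fv):
    return [(a + b) / 2 for a, b in zip(fu, fv)]

def hadamard(fu, fv):          # element-wise product; the best performer in the paper
    return [a * b for a, b in zip(fu, fv)]

def weighted_l1(fu, fv):
    return [abs(a - b) for a, b in zip(fu, fv)]

def weighted_l2(fu, fv):
    return [(a - b) ** 2 for a, b in zip(fu, fv)]

fu, fv = [1.0, 2.0], [3.0, -2.0]
edge_feat = hadamard(fu, fv)   # [3.0, -4.0]
```

All four keep the edge feature in the same dimension d as the node embeddings, so the downstream link-prediction classifier does not change.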

## Paper summary

Key points:

- word2vec training framework
- Training sequences generated by random walks
- Performance: alias sampling
- Experimental setup

Innovations:

- Discusses the semantics of BFS and DFS
- Designs the biased random walk
- Demonstrates effectiveness with rich experiments

Inspiration points:

- The role of graph understanding in network representation learning (must-see)
- The authors are researchers in the graph/social-network area, and their perspective is grounded in an understanding of graphs
- The random walk method inspired a lot of follow-up work: from DeepWalk in 2014 and node2vec in 2016 (this paper) to heterogeneous extensions such as metapath2vec: Scalable Representation Learning for Heterogeneous Networks [KDD 17], with related work continuing into 2020
- Complexity analysis: the algorithm section is discussed in detail, demonstrating the efficiency of the algorithm

# Reappearance

The karate dataset can be downloaded from https://github.com/aditya-grover/node2vec/

34 nodes.

The main point is the code implementation of the alias table; compare it with the pseudocode above.

## main.py

```python
'''
Reference implementation of node2vec.

Author: Aditya Grover

For more details, refer to the paper:
node2vec: Scalable Feature Learning for Networks
Aditya Grover and Jure Leskovec
Knowledge Discovery and Data Mining (KDD), 2016
'''

import argparse
import numpy as np
import networkx as nx
import node2vec
from gensim.models import Word2Vec

# Pipeline:
#   1. Read the graph and set the model parameters
#   2. Compute the alias tables of nodes and edges
#   3. Generate node sequences by biased random walks
#   4. Train the model with word2vec
#   5. Present and visualize the results
#
# Step 1 details:
#   1) Model parameters: graph options (directed/undirected, weighted) and
#      model coefficients (p, q, embedding dimension, ...)
#   2) Reading the graph is simple: load the edge list directly with networkx
#   3) Input and output
#      input file:  graph/karate.edgelist
#      output file: emb/karate.emb

def parse_args():
    '''
    Parses the node2vec arguments.
    '''
    parser = argparse.ArgumentParser(description="Run node2vec.")
    # input file
    parser.add_argument('--input', nargs='?', default='graph/karate.edgelist',
                        help='Input graph path')
    # output file
    parser.add_argument('--output', nargs='?', default='emb/karate.emb',
                        help='Embeddings path')
    # embedding dimension
    parser.add_argument('--dimensions', type=int, default=128,
                        help='Number of dimensions. Default is 128.')
    # walk length
    parser.add_argument('--walk-length', type=int, default=80,
                        help='Length of walk per source. Default is 80.')
    # number of walks started from each node
    parser.add_argument('--num-walks', type=int, default=10,
                        help='Number of walks per source. Default is 10.')
    # skip-gram context window size
    parser.add_argument('--window-size', type=int, default=10,
                        help='Context size for optimization. Default is 10.')
    parser.add_argument('--iter', default=1, type=int,
                        help='Number of epochs in SGD')
    parser.add_argument('--workers', type=int, default=8,
                        help='Number of parallel workers. Default is 8.')
    parser.add_argument('--p', type=float, default=1,
                        help='Return hyperparameter. Default is 1.')
    parser.add_argument('--q', type=float, default=1,
                        help='Inout hyperparameter. Default is 1.')
    parser.add_argument('--weighted', dest='weighted', action='store_true',
                        help='Boolean specifying (un)weighted. Default is unweighted.')
    parser.add_argument('--unweighted', dest='unweighted', action='store_false')
    parser.set_defaults(weighted=False)
    parser.add_argument('--directed', dest='directed', action='store_true',
                        help='Graph is (un)directed. Default is undirected.')
    parser.add_argument('--undirected', dest='undirected', action='store_false')
    parser.set_defaults(directed=False)
    return parser.parse_args()

def read_graph():
    '''
    Reads the input network in networkx.
    '''
    if args.weighted:  # weighted graph: read the weight of each edge
        G = nx.read_edgelist(args.input, nodetype=int,
                             data=(('weight', float),), create_using=nx.DiGraph())
    else:              # unweighted graph: set every edge weight to 1
        G = nx.read_edgelist(args.input, nodetype=int, create_using=nx.DiGraph())
        for edge in G.edges():
            G[edge[0]][edge[1]]['weight'] = 1

    if not args.directed:  # undirected graph
        G = G.to_undirected()

    return G

def learn_embeddings(walks):
    '''
    Learn embeddings by optimizing the Skipgram objective using SGD.
    '''
    # The original line converts ints to strings, but only works in Python 2:
    #   walks = [map(str, walk) for walk in walks]
    # Rewritten for Python 3:
    walk_new = []
    for walk in walks:
        tmp = []
        for node in walk:
            tmp.append(str(node))
        walk_new.append(tmp)
    model = Word2Vec(walk_new, size=args.dimensions, window=args.window_size,
                     min_count=0, sg=1, workers=args.workers, iter=args.iter)
    model.save_word2vec_format(args.output)
    return

def main(args):
    '''
    Pipeline for representational learning for all nodes in a graph.
    '''
    nx_G = read_graph()
    G = node2vec.Graph(nx_G, args.directed, args.p, args.q)
    G.preprocess_transition_probs()  # build the alias tables
    walks = G.simulate_walks(args.num_walks, args.walk_length)  # biased random walks
    learn_embeddings(walks)

if __name__ == "__main__":
    args = parse_args()
    main(args)
```

## node2vec.py

```python
import numpy as np
import networkx as nx
import random

class Graph():
    def __init__(self, nx_G, is_directed, p, q):
        self.G = nx_G
        self.is_directed = is_directed
        self.p = p
        self.q = q

    def node2vec_walk(self, walk_length, start_node):
        '''
        Simulate a random walk starting from start node.
        '''
        # Core algorithm: generate a single biased random walk sequence
        G = self.G
        alias_nodes = self.alias_nodes
        alias_edges = self.alias_edges

        walk = [start_node]

        while len(walk) < walk_length:  # loop until the walk has the desired length
            cur = walk[-1]
            # sorted so neighbor indices match the order used to build the alias table
            cur_nbrs = sorted(G.neighbors(cur))
            if len(cur_nbrs) > 0:
                if len(walk) == 1:  # first step: no previous node, use the node alias table
                    walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0],
                                                    alias_nodes[cur][1])])
                else:
                    prev = walk[-2]  # previous node
                    # next node, drawn from the 2nd-order (edge) alias table
                    next = cur_nbrs[alias_draw(alias_edges[(prev, cur)][0],
                                               alias_edges[(prev, cur)][1])]
                    walk.append(next)
            else:
                break

        return walk

    def simulate_walks(self, num_walks, walk_length):
        '''
        Repeatedly simulate random walks from each node.
        '''
        # Iterate the single-node walk function over all nodes
        G = self.G
        walks = []
        nodes = list(G.nodes())
        print('Walk iteration:')
        for walk_iter in range(num_walks):  # num_walks walks per node
            print(str(walk_iter + 1), '/', str(num_walks))
            random.shuffle(nodes)  # shuffle the order of the nodes
            for node in nodes:
                walks.append(self.node2vec_walk(walk_length=walk_length,
                                                start_node=node))

        return walks

    def get_alias_edge(self, src, dst):
        '''
        Get the alias edge setup lists for a given edge.
        '''
        G = self.G
        p = self.p
        q = self.q

        unnormalized_probs = []
        for dst_nbr in sorted(G.neighbors(dst)):
            # core of the bias: weight the next hop with p (return) and q (in-out)
            if dst_nbr == src:
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / p)
            elif G.has_edge(dst_nbr, src):
                unnormalized_probs.append(G[dst][dst_nbr]['weight'])
            else:
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / q)
        # normalization
        norm_const = sum(unnormalized_probs)
        normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]

        return alias_setup(normalized_probs)

    def preprocess_transition_probs(self):
        '''
        Preprocessing of transition probabilities for guiding the random walks.
        '''
        G = self.G
        is_directed = self.is_directed  # whether the graph is directed

        alias_nodes = {}
        for node in G.nodes():  # loop over every node in the graph
            # edge weights to each neighbor (e.g. a node with 4 neighbors)
            unnormalized_probs = [G[node][nbr]['weight']
                                  for nbr in sorted(G.neighbors(node))]
            norm_const = sum(unnormalized_probs)  # e.g. 1+1+1+1 = 4 on an unweighted graph
            # each neighbor then has probability 1/4
            normalized_probs = [float(u_prob) / norm_const
                                for u_prob in unnormalized_probs]
            # build the alias table so each draw costs O(1)
            alias_nodes[node] = alias_setup(normalized_probs)

        alias_edges = {}
        triads = {}

        if is_directed:
            for edge in G.edges():  # edges also get an alias table
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
        else:
            for edge in G.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
                alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0])

        self.alias_nodes = alias_nodes
        self.alias_edges = alias_edges

        return

def alias_setup(probs):
    '''
    Compute utility lists for non-uniform sampling from discrete distributions.
    Refer to
    https://hips.seas.harvard.edu/blog/2013/03/03/the-alias-method-efficient-sampling-with-many-discrete-outcomes/
    for details
    '''
    K = len(probs)
    q = np.zeros(K)
    J = np.zeros(K, dtype=int)

    smaller = []
    larger = []
    for kk, prob in enumerate(probs):  # split the scaled probabilities into <1 and >=1
        q[kk] = K * prob
        if q[kk] < 1.0:
            smaller.append(kk)
        else:
            larger.append(kk)

    # greedily fill every column whose probability is less than 1
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()

        J[small] = large
        q[large] = q[large] + q[small] - 1.0
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)

    return J, q

def alias_draw(J, q):
    '''
    Draw sample from a non-uniform discrete distribution using alias sampling. O(1).
    '''
    K = len(J)

    kk = int(np.floor(np.random.rand() * K))  # pick a column uniformly
    if np.random.rand() < q[kk]:              # stay with the column's own event...
        return kk
    else:                                     # ...or fall through to its alias
        return J[kk]
```