Professional Documents
Culture Documents
Anshumali Shrivastava
Ping Li
I.
I NTRODUCTION
Algorithm 1 CovarianceRepresentation(A,k)
Input: Adjacency matrix A Rnn , k, the number of
power iterations. Initialize x0 = e Rn1 .
for t = 1 to k do
Axt1
xt = M(:),(t)
M(:),(t) = n ||Ax
t1 || ,
1
end for
k1
=eR
n
1
A
C = n i=1 (M(i),(:) )(M(i),(:) )T
return C A Rkk
identityP T = P 1 , it is not difficult to show that for any
permutation matrix P, (P AP T )k = P Ak P T . This along with
the fact P T e = e, yields
(P AP T )i e = P Ai e.
(1)
P AP
Thus, Ci,j
= Cov(P Ai e, P Aj e). The proof follows
from the fact that shuffling vectors under same permutation
does not change the value of covariance between them, i.e.,
Cov(x, y) = Cov(P x, P y)
T
A
P AP
which implies Ci,j
= Ci,j
i, j
IV.
n
+
e
e
n2 T i
n
[e A e][eT Aj e]
[eT Ai+j e]
= n T i
1
[e A e][eT Aj e]
Thus, we have
A
Ci,j
= n
[eT Ai+j e]
[eT Ai e][eT Aj e]
(2)
(3)
it st [eT vt ] =
1
n
3
A
n
1
C1,2
=
2m
(P2 + m)
where denotes the total number of triangles, P3 is the total
number of distinct simple paths of length 3, P2 is the total
number of distinct simple paths of length 2 and
n
2
n
1
1
deg(i)2
deg(i)
V ar(deg) =
n i=1
n i=1
is the variance of degree.
Proof:
To compute [eT Ai e], we use the fact that the vector Ai e can
be written in terms of eigenvalues and eigenvectors of A as
[eT Ai e] =
n
it s2t
t=1
(4)
The term [eT Ae] is the sum of all elements of adjacency matrix
A, which is equal to twice the number of edges. So,
[eT Ae] = 2m,
T
(5)
2
[eT A2 e] = 2m + 2P2 .
Q
(a)
(b)
R
P
(b)
4m
[eT A3 e] = 6 + 2P3 + 2n(V ar(deg)) + 2m
1 ,
n
n
2
n
1
1
2
where
V ar(deg) =
deg(i)
deg(i)
n i=1
n i=1
Q
R
(a)
(c)
(i,j)E
n
(deg(i)2 deg(i)) +
i=1
n
=2
i=1
n
deg(j)
i=1 jN gh(i)
2
deg(i)
n
deg(i)
i=1
Adding
contributions of all possible types of paths and using
n
i=1 deg(i) = 2m yields Lemma 2 after some algebra.
Substituting for the terms [eT A2 e] and [eT A3 e] in Eq. (4)
from Lemmas 1 and 2 leads to the desired expression.
Remarks on Theorem 3: From its proof, it is clear that terms
of the form [eT At e], for small values of t like 2 or 3, are
weighted combinations of counts of small sub-structures like
triangles and small paths along with global features like degree
variance. The key observation behind the proof is that Ati,j
counts paths (with repeated nodes and edges) of length t, which
in turn can be decomposed into disjoint structures over t + 1
nodes and can be counted separately. Extending this analysis
for t > 3, involves dealing with more complicated bigger
patterns. For instance, while computing the term [eT A4 e],
we will encounter counts of quadrilaterals along with more
complex patterns. The representation C A is informative in that
it captures all such information and is sensitive to the counts
of these different substructures present in the graph.
Empirical Evidence for Theorem 3: To empirically validate
Theorem 3, we take publicly available twitter graphs2, which
2 http://snap.stanford.edu/data/egonets-Twitter.html
Sim(C A , C B ) = eDist(C ,C )
det()
1
A
B
Dist(C , C ) = log
2
(det(C A )det(C B ))
=
(6)
(7)
CA + CB
2
Algorithm 2 ComputeSimilarity(A,B,k)
Input: Adjacency matrices A Rn1 n1 and B Rn2 n2 ,
k, the number of power iterations.
C A = CovarianceRepresentation(A, k)
C B = CovarianceRepresentation(B, k)
return Sim(C A , C B ) computed using Eq. (6)
TABLE I.
G RAPH STATISTICS OF EGO - NETWORKS USED IN THE PAPER . T HE R ANDOM DATASETS CONSIST OF RANDOM E RDOS -R EYNI GRAPHS ( SEE
S ECTION VI-A FOR MORE DETAILS )
STATS
Number of Graphs
Mean Number of Nodes
Mean Number of Edges
Mean Clustering Coefficient
High Energy
1000
131.95
8644.53
0.95
Condensed Matter
415
73.87
410.20
0.86
Astro Physics
1000
87.40
1305.00
0.85
Twitter
973
137.57
1709.20
0.55
Random
973
137.57
1709.20
0.18
They do not account for the formation of hubs. Formally, the degree distribution of Erdos-Reyni random
graphs converges to a Poisson distribution, rather than
a power law observed in many real-world networks.
Thus, one reasonable task is to discriminate social network structures from random Erdos-Reyni graphs. We expect
methodologies which capture properties like triadic closure and
the degree distribution to perform well on this task.
We used the Twitter7 ego networks [20], which is a large
public dataset of social ego networks, which contains around
950 ego networks of users from Twitter with a mean of around
130 nodes and 1700 edges per graph. Since we are interested
only in the graph structure, these directed graphs were made
undirected. We do not use any information other than the
adjacency matrix of the graph structure.
For each of the undirected twitter graphs, we generated a
corresponding random graph with the same number of nodes
and edges. We start with the required number of nodes and
then select two nodes at random and add an edge between
them. This process is repeated until the graph has the same
number of edges as the corresponding twitter graph. We label
5 http://snap.stanford.edu/data/ca-CondMat.html
6 http://snap.stanford.edu/data/ca-AstroPh.html
7 http://snap.stanford.edu/data/egonets-Twitter.html
1 T
e M e,
n1 n2
TABLE II.
P REDICTION ACCURACIES IN PERCENTAGE FOR PROPOSED AND THE STATE - OF - THE - ART SIMILARITY MEASURES ON DIFFERENT SOCIAL
NETWORK CLASSIFICATION TASKS . T HE REPORTED RESULTS ARE AVERAGED OVER 10 REPETITIONS OF 10- FOLD CROSS - VALIDATION . S TANDARD ERRORS
ARE INDICATED USING PARENTHESES . B EST RESULTS MARKED IN BOLD .
Methodology
PROPOSED (k =4)
PROPOSED (k =5)
PROPOSED (k =6)
SUBFREQ-5
SUBFREQ-4
SUBFREQ-3
RW
EIGS-5
EIGS-10
COLLAB
(HEnP Vs CM)
98.06(0.05)
98.22(0.06)
97.51(0.04)
96.97 (0.04)
97.16 (0.05)
96.38 (0.03)
96.12 (0.07)
94.85(0.18)
96.92(0.21)
COLLAB
(HEnP Vs ASTRO)
87.70(0.13)
87.47(0.04)
82.07(0.06)
85.61(0.1)
82.78(0.06)
80.35(0.06)
80.43(0.14)
77.69(0.24)
78.15(0.17)
COLLAB
(ASTRO Vs CM)
89.29(0.18)
89.26(0.17)
89.65(0.09)
88.04(0.14)
86.93(0.12)
82.98(0.12)
85.68(0.03)
83.16(0.47)
84.60(0.27)
COLLAB (Full)
82.94(0.16)
83.56(0.12)
82.87(0.11)
81.50(0.08)
78.55(0.08)
73.42(0.13)
75.64(0.09)
72.02(0.25)
72.93(0.19)
SOCIAL
(Twitter Vs Random)
99.18(0.03)
99.43(0.02)
99.48(0.03)
99.42(0.03)
98.30(0.08)
89.70(0.04)
90.23(0.06)
90.74(0.22)
92.71(0.15)
SOCIAL
1946
177.20
200.28
207.20
5678.67
193.39
115.58
19669.24
36.84
41.15
COLLAB (Full)
2415
260.56
276.77
286.87
7433.41
265.77
369.83
25195.54
26.03
29.46
C ONCLUSIONS
[2]
[3]
[4]
[5]