You are on page 1of 14

538 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO.

3, MARCH 2015

Friendbook: A Semantic-Based Friend


Recommendation System for Social Networks
Zhibo Wang, Student Member, IEEE, Jilong Liao, Qing Cao, Member, IEEE,
Hairong Qi, Senior Member, IEEE, and Zhi Wang, Member, IEEE

AbstractExisting social networking services recommend friends to users based on their social graphs, which may not be the most
appropriate to reflect a users preferences on friend selection in real life. In this paper, we present Friendbook, a novel semantic-based
friend recommendation system for social networks, which recommends friends to users based on their life styles instead of social
graphs. By taking advantage of sensor-rich smartphones, Friendbook discovers life styles of users from user-centric sensor data,
measures the similarity of life styles between users, and recommends friends to users if their life styles have high similarity. Inspired by
text mining, we model a users daily life as life documents, from which his/her life styles are extracted by using the Latent Dirichlet
Allocation algorithm. We further propose a similarity metric to measure the similarity of life styles between users, and calculate users
impact in terms of life styles with a friend-matching graph. Upon receiving a request, Friendbook returns a list of people with highest
recommendation scores to the query user. Finally, Friendbook integrates a feedback mechanism to further improve the
recommendation accuracy. We have implemented Friendbook on the Android-based smartphones, and evaluated its performance on
both small-scale experiments and large-scale simulations. The results show that the recommendations accurately reflect the
preferences of users in choosing friends.

Index TermsFriend recommendation, mobile sensing, social networks, life style

1 INTRODUCTION

T WENTY years ago, people typically made friends with


others who live or work close to themselves, such as
neighbors or colleagues. We call friends made through this
on recent sociology findings [16], [27], [29], [30]. According
to these studies, the rules to group people together include:
1) habits or life style; 2) attitudes; 3) tastes; 4) moral stand-
traditional fashion as G-friends, which stands for geograph- ards; 5) economic level; and 6) people they already know.
ical location-based friends because they are influenced by Apparently, rule #3 and rule #6 are the mainstream factors
the geographical distances between each other. With the considered by existing recommendation systems. Rule #1,
rapid advances in social networks, services such as Face- although probably the most intuitive, is not widely used
book, Twitter and Google+ have provided us revolutionary because users life styles are difficult, if not impossible, to
ways of making friends. According to Facebook statistics, a capture through web actions. Rather, life styles are usually
user has an average of 130 friends, perhaps larger than any closely correlated with daily routines and activities. There-
other time in history [2]. fore, if we could gather information on users daily routines
One challenge with existing social networking services is and activities, we can exploit rule #1 and recommend
how to recommend a good friend to a user. Most of them friends to people based on their similar life styles. This rec-
rely on pre-existing user relationships to pick friend candi- ommendation mechanism can be deployed as a standalone
dates. For example, Facebook relies on a social link analysis app on smartphones or as an add-on to existing social net-
among those who already share common friends and recom- work frameworks. In both cases, Friendbook can help mobile
mends symmetrical users as potential friends. Unfortu- phone users find friends either among strangers or within a
nately, this approach may not be the most appropriate based certain group as long as they share similar life styles.
In our everyday lives, we may have hundreds of activi-
ties, which form meaningful sequences that shape our lives.
 Z. Wang is with the Key Laboratory of Aerospace Information Security and In this paper, we use the word activity to specifically refer to
Trusted Computing, School of Computer, Wuhan University, China, the actions taken in the order of seconds, such as sitting,
430072, and Department of Electrical Engineering and Computer Science,
University of Tennessee, Knoxville, TN 37909.
walking, or typing, while we use the phrase life style to
E-mail: zwang32@utk.edu. refer to higher-level abstractions of daily lives, such as
 J. Liao, Q. Cao, and H. Qi are with the Department of Electrical Engineering office work or shopping. For instance, the shopping
and Computer Science, University of Tennessee, Knoxville, TN 37909. life style mostly consists of the walking activity, but may
E-mail: {jliao2, cao, hqi}@utk.edu.
 Z. Wang is with the State Key Laboratory of Industrial Control Technol- also contain the standing or the sitting activities.
ogy, Zhejiang University, Hangzhou 310027, P.R. China. To model daily lives properly, we draw an analogy
E-mail: wangzhi@iipc.zju.edu.cn. between peoples daily lives and documents, as shown in
Manuscript received 21 Jan. 2014; revised 21 Apr. 2014; accepted 26 Apr. Fig. 1. Previous research on probabilistic topic models in
2014. Date of publication 6 May 2014; date of current version 29 Jan. 2015. text mining has treated documents as mixtures of topics,
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below. and topics as mixtures of words [10]. Inspired by this, simi-
Digital Object Identifier no. 10.1109/TMC.2014.2322373 larly, we can treat our daily lives (or life documents) as a
1536-1233 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 539

The rest of the paper is organized as follows. Section 2


discusses related work. Section 3 provides the high-level
overview of Friendbook. Section 4 presents activity recog-
nition and life style modeling and extraction. In Section 5,
we describe the social graph construction and user impact
estimation. We elaborate on the user query and friend rec-
ommendation in Section 6. We describe the feedback
mechanism in Section 7. In Section 8, we evaluate the per-
formance of Friendbook intensively with both simulations
and real experiments. Finally, we conclude the paper and
Fig. 1. An analogy between word documents and peoples daily lives. present the future work in Section 9.

mixture of life styles (or topics), and each life style as a mix- 2 RELATED WORK
ture of activities (or words). Observe here, essentially, we
represent daily lives with life documents, whose semantic Recommendation systems that try to suggest items (e.g.,
meanings are reflected through their topics, which are life music, movie, and books) to users have become more and
styles in our study. Just like words serve as the basis of more popular in recent years. For instance, Amazon [1] rec-
documents, peoples activities naturally serve as the primi- ommends items to a user based on items the user previ-
tive vocabulary of these life documents. ously visited, and items that other users are looking at.
Our proposed solution is also motivated by the recent Netflix [3] and Rotten Tomatoes [4] recommend movies to a
advances in smartphones, which have become more and user based on the users previous ratings and watching
more popular in peoples lives. These smartphones (e.g., habits. Recently, with the advance of social networking sys-
iPhone or Android-based smartphones) are equipped with tems, friend recommendation has received a lot of atten-
a rich set of embedded sensors, such as GPS, accelerometer, tion. Generally speaking, existing friend recommendation
microphone, gyroscope, and camera. Thus, a smartphone is in social networking systems, e.g., Facebook, LinkedIn and
no longer simply a communication device, but also a power- Twitter, recommend friends to users if, according to their
ful and environmental reality sensing platform from which social relations, they share common friends.
we can extract rich context and content-aware information. Meanwhile, other recommendation mechanisms have
From this perspective, smartphones serve as the ideal plat- also been proposed by researchers. For example, Bian and
form for sensing daily routines from which peoples life Holtzman [8] presented MatchMaker, a collaborative filter-
styles could be discovered. ing friend recommendation system based on personality
In spite of the powerful sensing capabilities of smart- matching. Kwon and Kim [20] proposed a friend recom-
phones, there are still multiple challenges for extracting users mendation method using physical and social context. How-
life styles and recommending potential friends based on their ever, the authors did not explain what the physical and
similarities. First, how to automatically and accurately dis- social context is and how to obtain the information. Yu et al.
cover life styles from noisy and heterogeneous sensor data? [32] recommended geographically related friends in social
Second, how to measure the similarity of users in terms of life network by combining GPS information and social network
styles? Third, who should be recommended to the user structure. Hsu et al. [18] studied the problem of link recom-
among all the friend candidates? To address these challenges, mendation in weblogs and similar social networks, and pro-
in this paper, we present Friendbook, a semantic-based friend posed an approach based on collaborative recommendation
recommendation system based on sensor-rich smartphones. using the link structure of a social network and content-
The contributions of this work are summarized as follows: based recommendation using mutual declared interests.
Gou et al. [17] proposed a visual system, SFViz, to support
 To the best of our knowledge, Friendbook is the first users to explore and find friends interactively under the
friend recommendation system exploiting a users life context of interest, and reported a case study using the
style information discovered from smartphone sensors. system to explore the recommendation of friends based on
 Inspired by achievements in the field of text mining, peoples tagging behaviors in a music community. These
we model the daily lives of users as life documents existing friend recommendation systems, however, are sig-
and use the probabilistic topic model to extract life nificantly different from our work, as we exploit recent soci-
style information of users. ology findings to recommend friends based on their similar
 We propose a unique similarity metric to character- life styles instead of social relations.
ize the similarity of users in terms of life styles and Activity recognition serves as the basis for extracting
then construct a friend-matching graph to recom- high-level daily routines (in close correlation with life
mend friends to users based on their life styles. styles) from low-level sensor data, which has been widely
 We integrate a linear feedback mechanism that studied using various types of wearable sensors. Zheng
exploits the users feedback to improve recommen- et al. [33] used GPS data to understand the transportation
dation accuracy. mode of users. Lester et al. [21] used data from wearable
 We conduct both small-scale experiments and large- sensors to recognize activities based on the Hidden Markov
scale simulations to evaluate the performance of our Model (HMM). Li et al. [22] recognized static postures and
system. Experimental results demonstrate the effec- dynamic transitions by using accelerometers and gyro-
tiveness of our system. scopes. The advance of smartphones enables activity
540 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

Friendbook which adopts a client-server mode where each


client is a smartphone carried by a user and the servers are
data centers or clouds.
On the client side, each smartphone can record data of its
user, perform real-time activity recognition and report the
generated life documents to the servers. It is worth noting
that an offline data collection and training phase is needed
to build an appropriate activity classifier for real-time activ-
ity recognition on smartphones. We spent three months on
collecting raw data of eight volunteers for building a large
training data set. As each user typically generates around
50 MB of raw data each day, we choose MySQL as our low
level data storage platform and Hadoop MapReduce as our
computation infrastructure. After the activity classifier is
built, it will be distributed to each users smartphone and
Fig. 2. System architecture of Friendbook. then activity recognition can be performed in real-time
manner. As a user continually uses Friendbook, he/she will
recognition using the rich set of sensors on the smartphones. accumulate more and more activities in his/her life docu-
Reddy et al. [26] used the built-in GPS and the accelerome- ments, based on which, we can discover his/her life styles
ter on the smartphones to detect the transportation mode of using probabilistic topic model.
an individual. CenceMe [24] used multiple sensors on the On the server side, seven modules are designed to fulfill
smartphone to capture users activities, state, habits and the task of friend recommendation. The data collection mod-
surroundings. SoundSense [23] used the microphone on the ule collects life documents from users smartphones. The
smartphone to recognize general sound types (e.g., music, life styles of users are extracted by the life style analysis mod-
voice) and discover user specific sound events. EasyTracker ule with the probabilistic topic model. Then the life style
[7] used GPS traces collected from smartphones that are indexing module puts the life styles of users into the data-
installed on transit vehicles to determine routes served, base in the format of (life-style, user) instead of (user, life-
locate stops, and infer schedules. style). A friend-matching graph can be constructed accord-
Although a lot of work has been done for activity recog- ingly by the friend-matching graph construction module to
nition using smartphones, there is relatively little work on represent the similarity relationship between users life
discovery of daily routines using smartphones. The MIT styles. The impacts of users are then calculated based on the
Reality Mining project [12] and Farrahi and Gatica-Perez friend-matching graph by the user impact ranking module.
[14] tried to discover daily location-driven routines from The user query module takes a users query and sends a
large-scale location data. They could infer daily routines ranked list of potential friends to the user as response. The
such as leaving from home to office and eating at a restau- system also allows users to give feedback of the recommen-
rant. However, they could not discover the daily routines of dation results which can be processed by the feedback control
people who are staying at the same location. For instance, module. With this module, the accuracy of friend recom-
when one stays at home, his/her daily routines like eating mendation can be improved.
lunch and watching movie could not be discovered if In the following sections, we will elaborate on all the
only using the location information. In [13], Farrahi and components of the system.
Gatica-Perez took a step further and overcame the short-
coming of discovering daily routines of people staying in
the same location by considering combined location and 4 LIFE STYLE EXTRACTION USING TOPIC MODEL
physical proximity sensed by the mobile phone. Another 4.1 Life Style Modeling
closely related work was presented in [19], which used a As stated in Section 1, life styles and activities are reflections
topic model to extract activity patterns from sensor data. of daily lives at two different levels where daily lives can be
However, they used two wearable sensors, but not smart- treated as a mixture of life styles and life styles as a mixture
phones, to discover the daily routines. In our work, we of activities. This is analogous to the treatment of docu-
attempt to use the probabilistic topic model to discover life ments as ensemble of topics and topics as ensemble of
styles using the smartphone. We further utilize patterns dis- words. By taking advantage of recent developments in the
covered from activities as a basis for friend recommenda- field of text mining, we model the daily lives of users as life
tion that helps users find friends who have similar life documents, the life styles as topics, and the activities as words.
styles. Note that the work in this paper is significantly dif- Given documents, the probabilistic topic model could
ferent from our preliminary demo work of Friendbook [31] discover the probabilities of underlying topics. Therefore,
that recommended friends to users based on the similarity we adopt the probabilistic topic model to discover the prob-
of pictures taken by users. abilities of hidden life styles from the life documents.
In probabilistic topic models, the frequency of vocabulary is
particularly important, as different frequency of words
3 SYSTEM OVERVIEW denotes their information entropy variances. Following this
In this section, we give a high-level overview of the Friend- observation, we propose the bag-of-activity model (Fig. 3)
book system. Fig. 2 shows the system architecture of to replace the original sequences of activities recognized
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 541

Fig. 3. Bag-of-activity modeling for life document.

based on the raw data with their probability distributions.


Thereafter, each user has a bag-of-activity representation of
his/her life document, which comprises a mixture of activ- Fig. 4. The flowchart of activity recognition.
ity words.
Let w w1 ; w2 ; . . . ; wW  denote a set of activities, where gyroscope, are used to infer users motion activities. Gener-
wi is the ith activity and W is the total number of activities. ally speaking, there are two mainstream approaches: super-
Let z z1 ; z2 ; . . . ; zZ  denote a set of life styles, where zi is vised learning and unsupervised learning. For both
the ith life style and Z is the total number of life styles. Let approaches, mature techniques have been developed and
d d1 ; d2 ; . . . ; dn  denote a set of life documents, where di tested. In practice, the number of activities involved in the
is the ith life document and n is the total number of users. analysis is unpredictable and it is difficult to collect a large
Let pwi j dk denote the probability of the activity wi in a set of ground truth data for each activity, which makes
certain life document dk , pwi j zj denote the probability of supervised learning algorithms unsuitable for our system.
how much the activity wi contributes to the life style zj , and Therefore, we use unsupervised learning approaches to rec-
pzj j dk denote the probability of the life style zj embedded ognize activities. Here, we adopt the popular K-means clus-
in the life document dk . According to the probabilistic topic tering algorithm [9] to group data into clusters, where each
model, we have cluster represents an activity. Note that activity recognition
is not the main concern of our paper. Other more compli-
X
Z
pwi j dk pwi j zj pzj j dk : (1) cated clustering algorithms can certainly be used. We
j1 choose K-means for its simplicity and effectiveness.
Fig. 4 shows the flowchart of activity recognition. Since
Observe that pwi j dk can be easily calculated by using the raw data collected on the smartphones are noisy, we
the bag-of-activity representation for the life document dk : first use a median filter [5] with sliding windows to filter
out the outliers of the noisy data. In order to further
fk wi improve recognition accuracy, features are extracted to
pwi j dk PW ; (2) characterize the data after preprocessing. We have tested
i1 fk wi
several features, such as mean, standard deviation, correla-
tion, and the combination of them, on the data and found
where fk wi denotes the frequency of wi in dk .
that standard deviation is the most representative feature
We represent the life styles of a user using the life style
for characterizing motion activities. Therefore, a feature vec-
vector, denoted by Lk pz1 j dk ; pz2 j dk ; . . . ; pzZ j dk . In
tor f tc ; accx ; accy ; accz ; gyrx ; gyry ; gyrz  is used to charac-
this paper, our objective is to discover the life style vector
terize users activities instead of the raw data, where tc
for each user given the life documents of all users. However,
is the mean of normalized time; accx , accy and accz represent
in Eq. (1), although pwi j dk can be calculated easily,
the standard deviation of normalized data obtained from
pwi j zj and pzj j dk are difficult to solve because of the
the accelerometer in the x, y and z directions, respectively;
hidden feature of life styles.
gyrx , gyry and gyrz represent the standard deviation of nor-
In the following sections, we first present the details of
malized data obtained from the gyroscope in the x, y and z
activity recognition used to calculate pwi j dk , then show
directions, respectively. We then apply the K-means cluster-
how to use the Latent Dirichlet Allocation (LDA) decompo-
ing algorithm on the feature vectors to group them into dif-
sition algorithm to solve Eq. (1) so that we can obtain the
ferent clusters as well as calculating the cluster centroids.
life style vector of each user.
Our empirical study in Section 8.1.1 indicates that K 15 is
a good compromise between classification accuracy and
4.2 Activity Recognition computational time. The cluster centroids are then distrib-
To derive pwi j dk , we need to first classify or recognize the uted to the smartphones. Then each smartphone could inde-
activities of users. Life styles are usually reflected as a mix- pendently recognize activities based on the minimum
ture of motion activities with different occurrence probabil- distance rule and upload the activity sequence instead of
ity. Therefore, two motion sensors, accelerometer and the raw data to the server.
542 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

that correspond to possible daily routines. In Friendbook,


since we are to only compare similarity in activities or
topic patterns, there is no need to infer the physical mean-
ing of each cluster center or topic. On the other hand, not
revealing the actual physical meaning of activities and
topics also has advantages from the perspective of pre-
Fig. 5. Matrix decomposition for life styles analysis. (Redrawn from [19]).
serving privacy.
4.3 Life Style Extraction Using LDA
Given the life documents of all users, Eq. (1) can be further
5 FRIEND-MATCHING GRAPH AND USER IMPACT
represented as a matrix decomposition problem: To characterize relations among users, in this section, we
propose the friend-matching graph to represent the similarity
pw j d pw j zpz j d; (3) between their life styles and how they influence other peo-
ple in the graph. In particular, we use the link weight
where pw j d pw j d1 ; pw j d2 ; . . . ; pw j dn  is the
between two users to represent the similarity of their life
activity-document matrix as shown in Fig. 5 containing the
styles. Based on the friend-matching graph, we can obtain a
probability of each activity over each life document, and
users affinity reflecting how likely this user will be chosen
pw j dk pw1 j dk ; pw2 j dk ; . . . ; pwW j dk T is the kth
as another users friend in the network.
column in the activity-document matrix representing the
probabilities of activities over the life document dk of user k;
pw j z pw j z1 ; pw j z2 ; . . . ; pw j zZ  is the activity- 5.1 Similarity Metric
topic matrix as shown in Fig. 5 representing the probability We define a new similarity metric to measure the similarity
of each activity over each life style (topic), and pw j zk between two life style vectors. Let Li pz1 j di ; pz2 j di ;
pw1 j zk ; pw2 j zk ; . . . ; pwW j zk T is the kth column in the . . . ; pzZ j di  and Lj pz1 j dj ; pz2 j dj ; . . . ; pzZ j dj 
activity-topic matrix representing the probabilities of activi- denote the life style vectors of user i and user j, respectively.
ties over the life style zk ; pz j d pz j d1 ; pz j d2 ; . . . ; We argue that the similarity is not only affected by their
pz j dn  is the topic-document matrix as shown in Fig. 5 life style vectors as a whole, but also by the most important
containing the probability of each topic over each life docu- life styles, i.e., the elements within the vector with larger
ment, and pz j dk pz1 j dk ; pz2 j dk ; . . . ; pzZ j dk T is probability values, also known as the dominant life styles.
the kth column in the topic-document matrix representing We also argue that two users do not share much similarity
the probabilities of life styles over the life document dk of if majority of their life styles are totally different. Therefore,
user k. the similarity of life styles between user i and user j,
The above matrix decomposition problem is actually the denoted by Si; j, is defined as follows:
Latent Dirichlet Allocation model [10]. We use the Expecta-
Si; j Sc i; j  Sd i; j; (4)
tion-Maximization (EM) method to solve the LDA decompo-
sition, where the E-step is used to estimate the free where Sc i; j is used to measure the similarity of the life
variational Dirichlet parameter g [15] and multinomial style vectors of users as a whole, Sd i; j is used to empha-
parameter F in the standard LDA model [10] and the M- size the similarity of users on their dominant life styles.
step is used to maximize the log likelihood of the activities We adopt the commonly used cosine similarity metric for
under these parameters. After the EM algorithm converges, Sc i; j, that is,
we are able to calculate the decomposed activity-topic
Sc i; j cosLi ; Lj : (5)
matrix. Readers are referred to [10] for more details of the
LDA algorithm and alternative decomposition approaches.
It is worth noting that the matrix decomposition process In order to calculate Sd i; j, we first define the set of
can be implemented more efficiently through incremental dominant life styles of a user.
iteration. That is, when a users life document changes or a Definition 1 (Dominant life styles). The set of dominant life
new users life document is uploaded to the system, Friend- styles of a user i, Di , is a subset of all the life styles satisfying
book can calculate the new life style vectors for each user the following requirements:
based on previously derived life style vectors and the new
life document. We did not implement the incremental itera- 1) The total probability distribution of the set is larger
tion in the current framework by simply replying on the than or equal to  which is a predefined threshold.
computing power of cloud computing. As part of our future 2) The probability distribution of any life style in the set is
work, we could add this implementation to the framework larger than or equal to that of any life style not in the
to make Friendbook scalable to large-scale systems. set.
It is also worth noting that since our system uses unsu- 3) The set should have the minimum number of life styles.
pervised learning algorithms to recognize activities and These requirements guarantee that life styles with larger
the topic model to discover life styles, the physical mean- probabilities are more probably to be included in the set. To
ings of derived activities (or cluster centers from the K- find Di , we sort the life style vector Li in the descending
means algorithm) or topics are unknown to us. As men- order with respect to the probabilities of life styles. Then we
tioned in [19], such meaning can be estimated via the addi- have L ^ i pzi1 j di ; pzi2 j di ; . . . ; pziZ j di  where pzis j
tional step of comparing the topic activations to the actual di  pzit j di if s  t. The size of dominant life style set is
structure of the subjects day and then identifying topics calculated as
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 543

vertices mean that they do not share enough similar life


styles with others (e.g., user 4). We use the following repre-
sentation to convert the graph into a matrix representation:
2 3
0 v1; 2    v1; n
6 v2; 1 0    v2; n 7
6 7
N Nij nn 6 .. .. .. .. 7: (8)
4 . . . . 5
vn; 1 vn; 2    0
Note that the values on the diagonal are all 0 because we
would not recommend himself/herself to a user.

Fig. 6. An example of friend-matching graph for eight users. 5.3 User Impact Ranking
The friend-matching graph has been constructed to reflect
!
X
q life style relations among users. However, we still lack a
qi arg min pzik j di   : (6) measurement to identify the impact ranking of a user quan-
q
k1 titatively. Intuitively, the impact ranking means a users
Finally, we can obtain the dominant life style set capability to establish friendships in the network. In other
Di fzi1 ; . . . ; ziqi g. words, the higher the ranking, the easier the user can be
The similarity metric Sd i; j for measuring the similarity made friends with, because he/she shares broader life styles
of the dominant life style sets of two users is then defined as with others. Inspired by PageRank [25] which is used in
web page ranking, we form the idea that a users ranking is
2 jDi \ Dj j reflected by his neighbors in the friend-matching graph and
Sd i; j : (7)
jDi j jDj j how much his neighbors endorse the user as a friend.
Once the ranking of a user is obtained, it provides guide-
The range of Sd i; j is 0; 1. Observe that the higher the lines to those who receive the recommendation list on how
percentage of the same life styles, the larger the similarity. to choose friends. The ranking itself, however, should be
When there is no overlap between Di and Dj , the similarity independent from the query user. In other words, the rank-
is 0. When Di and Dj are the same, the similarity is 1. Since ing depends only on the graph structure of the friend-
both Sc i; j and Sd i; j vary between 0 and 1, we conclude matching graph, which contains two aspects: 1) how the
that the similarity metric Si; j varies between 0 and 1. edges are connected; 2) how much weight there is on every
As an example to show the calculation of two users life edge. Moreover, the ranking should be used together with
style similarity, we assume that there are two users 1 and 2 the similarity scores between the query user and the poten-
in the system, who have the life style vectors L1 tial friend candidates, so that the recommended friends are
0:3; 0:1; 0:2; 0:3; 0:1 and L2 0:2; 0:1; 0:4; 0; 0:3, respec- those who not only share sufficient similarity with the query
tively. The number of life style topics is 5. We first calculate user, and are also popular ones through whom the query
Sc 1; 2 cosL1 ; L2 0:6708. Given  0:8, we can calcu- user can increase their own impact rankings.
late the dominant life style sets of these two users, Let N i denote the set of neighbors of user i. Let
D1 fz1 ; z4 ; z3 g and D2 fz3 ; z5 ; z1 g, respectively. There- r r1; r2; . . . ; rnT denote the impact ranking vector
fore, the dominant life style similarity is calculated as where ri is the impact ranking of user i in the friend-
Sd 1; 2 32
2
3 0:67. Finally, the similarity of user 1 and 2 matching graph, and n is the number of users in the system.
is S1; 2 Sc 1; 2  Sd 1; 2 0:45. The calculation of ri is defined as follows:
P
j2N i vi; j  rj
5.2 Friend-Matching Graph Construction ri P : (9)
j2N i vi; j
Based on the similarity metric, we model the relations
between users in real life as a friend-matching graph. As shown in Eq. (9), the impact ranking of a user is
Definition 2 (Friend-matching graph). It is a weighted undi- affected by its neighbors from two aspects: first, the similar-
rected graph G V; E; W , where V fv1 ; v2 ; . . . ; vn g is the ity between itself and a neighbor; second, the impact rank-
set of users and n is the number of users, E fei; jg is the ing of its neighbors. Note that in friend-matching graph,
set of links between users, and W : E ! R is the set of weights vi; j 0 if j is not a neighbor of i. Also vi; i 0 because
of edges. There is an edge ei; j linking user i and user j if and we would not recommend himself/herself to a user. There-
only if their similarity Si; j  Sthr , where Sthr is the prede- fore, Eq. (9) can be rewritten as follows:
fined similarity threshold. The weight of that edge is repre- P
sented by the similarity, that is, vi; j Si; j. j vi; j  rj
ri P : (10)
j vi; j
Fig. 6 demonstrates a friend-matching graph based on the
life styles of eight users and the similarity threshold Sthr is The calculation of ri is an iterative process because any
set to 0.3. An edge linking two users means they have similar change of its neighbors will change ri accordingly. There-
life styles (e.g., e1; 7), and their similarity is quantified by fore, we use a matrix representation to clearly show the iter-
the weight of the edge (e.g., v1; 7 0:62). Some isolated ative process of Eq. (10):
544 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

rTk1 rTk  H; (11) Friendbook is scalable to large-scale systems if we could


implement the iterative matrix-vector multiplication method
where k and k 1 indicate two subsequent iteration steps, incrementally or distributively, which would be our future
and H Hij nn is the transitional matrix representing the work.
similarity of neighbors. Combining Eq. (8) and Eq. (10), an
element Hij in H is calculated as follows: 6 QUERY AND FRIEND RECOMMENDATION
Before a user initiates a request, he/she should have accu-
vi; j Nij
Hij P P : (12) mulated enough activities in his/her life documents for effi-
j vi; j j Nij cient life styles analysis. The period for collecting data
usually takes at least one day. Longer time would be
In practice, it is possible that people choose friends ran- expected if the user wants to get more satisfied friend rec-
domly rather than based on their importance. Therefore, we ommendation results. After receiving a users request (e.g.,
revise the transitional matrix by introducing a matrix with life documents), the server would extract the users life style
equal value. Then the transitional matrix is modified to Eq. (13): vector, and based on which recommend friends to the user.
The recommendation results are highly dependent on
~ H 1  1 eeT ;
H (13) users preference. Some users may prefer the system to rec-
n ommend users with high impact, while some users may
where e is the n  1 unit vector and is the damping factor used want to know users with the most similar life styles. It is also
to emphasize the importance of the friend-matching graph. possible that some users want the system to recommend
Finally, the iterative process for calculating the impact users who have high impact and also similar life styles to
rank of users is carried out as follows: them. To better characterize this requirement, we propose
the following metric to facilitate the recommendation:
rTk1 rTk  H:
~ (14)
Ri j bSi; j 1  brj k; (15)
The process terminates when the impact ranking vector where Ri j is the recommendation score of user j for the
converges to a stable
Pnvalue. In our experiment, the process query user i, Si; j is the similarity between user i and
terminates when i1 jrk1 i  rk ij  " where " is an user j, and rj is the impact of user j. b 2 0; 1 is the recom-
arbitrary small value larger than 0. The pseudocode for the mendation coefficient characterizing users preference. k is
iterative process is shown in Algorithm 1. After the impact introduced to make Si; j and rj in the same order of mag-
vector r is calculated, we can rank the users in a non- nitude, which can be roughly set to n=10, where n is the
descending order so that a user with higher impact ranking number of users in the system. When b 1, the recom-
is always ahead of a user with lower impact ranking. mendation is solely based on the similarity; when b 0,
the recommendation is solely based on the impact ranking.
With the metric in Eq. (15), our recommendation mecha-
nism for finding the most appropriate friends to a query
user is described as follows. For a query user i, the server
calculates the recommendation scores for all the users in the
system and sorts them in the descending order according to
their recommendation scores. The top p users will be
returned to the query user i. The parameter p is an integer
and can be defined by the querying user. The complexity of
our recommendation mechanism is On since it checks all
users in the system, where n is the overall number of users
in the system.
As the number of users increases, the overhead of query
and recommendation increases linearly. In reality, users
may have totally different life styles and it is not necessary
to calculate their recommendation scores at all. Therefore,
in order to speed up the query and recommendation pro-
In general, by using the Hadoop MapReduce framework, cess, we adopt the reverse index table using (life-style, user)
the impact ranking process can converge quickly. However, pair instead of (user, life-style) pair in the database. Fig. 7
this is becoming increasing infeasible when the size of the sys- shows the difference. With the reverse index table, before
tem is becoming very large. Fortunately, accordingly to the calculating recommendation score for each user, the server
incremental computation of PageRank [6], [11] and the dis- first picks up all the users having overlapping life styles
tributed computation of PageRank [28], the iterative matrix- with the query user and sets the similarities of rest users to
vector multiplication method in Eq. (14) can be implemented the query user to 0. The server then checks all the users to
incrementally or distributively for large-scale evolving calculate their recommendation scores. Although the com-
graphs. In [28], the authors presented a fast algorithm that
p plexity is still On, we can observe that the reverse index
takes O log n= rounds in undirected graphs, where n is table reduces the computation overhead, the advantage of
the network size and  is a fixed constant. Therefore, which is considerable when the system is in large-scale.
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 545

of uploading raw data to the servers, Friendbook processes


raw data and classifies them into activities in real-time. The
recognized activities are labeled by integers. In this way,
even if the documents containing the integers are compro-
mised, they cannot tell the physical meaning of the docu-
ments. Second, Friendbook protects users privacy at the life
pattern level. Instead of telling the similar life styles of
users, Friendbook only shows the recommendation scores
of the recommended friends with the users. With the recom-
mendation score, it is almost impossible to infer the life
styles of recommended friends.

7 FEEDBACK CONTROL
To support performance optimization at runtime, we also
integrate a feedback control mechanism into Friendbook.
After the server generates a reply in response to a query, the
feedback mechanism allows us to measure the satisfaction of
users, by providing a user interface that allows the user to rate
the friend list. Let  r denote the impact ranking vector calcu-
lated from the feedback of users. Here,  r1; r2; . . . ;
r 
Fig. 7. Illustration of the reverse index table.
rnT where n is the number of current users of the system.
Let ri; j denote the score that user j rates user i. Then we
The pseudocode of the friend recommendation mecha- have:
nism is shown in Algorithm 2. 
ri; j if user j rates user i;
ri; j (16)
ri otherwise,

where the second equation in Eq. (16) means that the feed-
back score is equal to the original score if user j does not
rate user i. This may commonly occur when the user does
not know the persons being recommended especially when
our system becomes very large.
Based on Eq. (16), we define the impact ranking of user i
influenced by the feedback of users as follows:
X
ri ri; j=n; (17)
j

which takes the feedback of all users into consideration.


Finally, the original impact ranking vector R calculated
from the friend-matching graph is updated as follows:

r ar 1  ar;
 (18)
where a is named as confidence factor and 0  a  1. The
final impact ranking vector considers both the influence of
friend-matching graph and the feedback from users. When
a > 0:5, the friend-matching graph dominates the impact
ranking, however, when a < 0:5, users feedback will sig-
Friendbook also uses GPS location information to help nificantly affect the impact ranking. In this way, the system
users find friends within some distance. In order to protect takes users feedback into consideration to improve the
the privacy of users, a region surrounding the accurate loca- accuracy of future recommendations.
tion will be uploaded to the system. When a user uses
Friendbook, he/she can specify the distance of friends 8 EVALUATION
before recommendation. In this way, only friends having
similarity with the user within the specified distance can be In this section, we present the performance evaluation of
recommended as friends. Friendbook on both small-scale field experiments and large-
Privacy is very important especially for users who are scale simulations.
sensitive to information leakage. In our design of Friend-
book, we also considered the privacy issue and the existing 8.1 Evaluation Using Real Data
system can provide two levels of privacy protection. First, We first evaluate the performance of Friendbook on small-
Friendbook protects users privacy at the data level. Instead scale experiments. Eight volunteers help contribute data
546 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

TABLE 1
Profession of Users

and evaluate our system. Table 1 demonstrates the profes-


sion of these users. Most of them are students, while the
rest include a businessman, an office worker, and a wait-
ress. Each volunteer carries a Nexus S smartphone with
Fig. 9. User interfaces: (a) query-recommendation interface; (b) connec-
Friendbook application installed in advance. tion interface; (c) rating interface.
They are required to start the application after they
wake up and turn it off before they go to bed. Aside from In the following, we first present the activity classifica-
this, we do not impose any additional requirement on the tion results using K-means clustering algorithm, and then
usage of the smartphone. For example, we do not require show the performance evaluation on real experiments.
them to carry the smartphone all the time during the day or
attach the smartphone to some special parts of the body.
8.1.1 Activity Classification
It is worth noting that some of the eight users are already
friends before experiments but some of them are not. In Fig. 8 shows the classification results using the K-means
fact, some strangers within the group become friends after- clustering algorithm on the data collected from the eight
wards. However, strangers living far away from each other users for a period of three months. Feature vectors instead
do not become friends although they choose each other as a of raw data are used for classification. Each feature vector
friend at the friend recommendation phase. This also moti- consists of seven attributes, f tc ; accx ; accy ; accz ; gyrx ; gyry ;
vates the usage of GPS information into the system to gyrz  (see Section 4.2), and we extract feature vectors every
improve the recommendation accuracy. 60 seconds. As shown in Fig. 8a, the sum of squared error
drops quickly when the cluster number K increases from 1
to 10, and then does not change too much after 15. This
implies that the daily lives of the volunteers are roughly
composed of 15 different types of activities. Although we
can use additional activities to characterize their daily lives
at a finer scale, we find 15 an appropriate compromise as
the most important activities are already involved. Other
activities occur rarely and cannot considerably affect the
friend recommendation results. Therefore, we use K 15
as the number of activities in Friendbook.
Fig. 8b demonstrates the distribution of 15 activities. Most
of activities have the probability of about 6 percent, and three
activities have the probability of larger than 10 percent, and
the remaining three activities have the probability of less than
2 percent. Corresponding to each activity, a centroid feature
vector is calculated by using the K-means clustering algo-
rithm. These 15 centroids are distributed to each smartphone
so that it can perform real-time onboard activity recognition.

8.1.2 Friend Recommendation Results


There are four free parameters used to generate the friend
recommendation results, including the similarity threshold
for friend-matching graph Sthr (Definition 2), the threshold
 (Eq. (6)) that controls the number of dominant life styles,
the damping factor that emphasizes the importance of the
friend matching graph (Eq. (13)), and the number of life
styles. In our practical experiments, we have used the fol-
lowing values as default through empirical studies, i.e., the
similarity threshold Sthr is set to 0:5, the threshold  is set to
0:8, the damping factor is set to 0:85, and the number of
life styles is set to 10.
Fig. 9 shows several user interfaces of Friendbook. Fig. 9a
Fig. 8. Classification performance using the K-means clustering. shows a snapshot of the query and recommendation user
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 547

TABLE 2
User Impact Ranking of Eight Users

8.2 Evaluation Using Simulated Data


We perform simulations to further evaluate performance of
Friendbook when the scale of the system is large. Our friend
Fig. 10. The gray image representation of the eight users similarity.
recommendation method is based on life styles extracted
from sensors on users smartphones, which is quite different
interface. The IDs and recommendation scores of recom- from existing friend recommendation methods. To the best
mended friends are shown in the list. Note that Friendbook of our knowledge, there is no real data set that can be used
returns the ID of users instead of their real names due to pri- for a large-scale performance evaluation.
vacy concerns in our experiments. Figs. 9b and 9c show the In our simulation, we randomly and independently gen-
snapshots of user feedback interfaces. Users can connect to erate the life style vectors for 1,000 users, Lk pz1 j dk ;
people in the recommended friend list through our system pz2 j dk ; . . . ; pz10 j dk ; k 1; . . . ; 1;000. Note that since the
and also give a score on the recommended friends. Note life style vector contains the probability of each life style
that we intentionally anonymize the personal information that sum to 1, each entry of the life style vector is randomly
in Fig. 9 to protect the privacy of subjects. In the real system, generated between 0 and 1, and normalized to guarantee
when a user wants to use the system, he/she will be encour- the sum of the values in the vector is equal to 1. For each
aged to complete his/her personal profile, e.g., name and user, the similarities between itself and all the other users
photo. Therefore, the name and photo information as well can be calculated based on the similarity metric in Eq. (4)
as the similarity score of each recommended friend will be and the 100 most similar users are chosen as its true friends,
shown to the user. denoted as Gi for user i. After the life styles are uploaded to
Fig. 10 illustrates the gray-scale image representation of the system, a friend-matching graph can be constructed and
the similarity matrix for eight users. The blocks in the diago- each user has an impact. We then let each user query the
nal has the darkest color because users always have the per- system and obtain its friend recommendation results. Let Fi
fect match with themselves. As shown in Fig. 10, user 1 has denote the set of recommended friends. The following mea-
strong relationship with user 2 and user 5, user 3 has strong surement metrics are used for performance evaluation:
relationship with user 7, user 6 has relationship with the
aforementioned users but not very strong, while user 4 and  Recommendation precision Rp . The average of the ratio
user 8 have no relationship with others at all. The result is of the number of recommended friends in the set of
consistent with the ground truth of professions shown in true friends of the query user over the total number
Table 1 because people have the same profession usually of recommended friends.
have the same life styles. P
The user impact rankings of the eight users are shown jFi \ Gi j=jFi j
Rp i ;
in Table 2. The top ranks are users 1 and 7, followed by 1;000
users 4 and 8 who seem to have high impact ranks. How-
where j  j denotes the number of elements in a set.
ever, users 4 and 8 are not supposed to be higher than
The dominator is 1,000 because Rp is the average of
others because they have no connections. Indeed, because
1,000 users in one experiment.
of this, they should always maintain the initial score.
 Recommendation recall Rr . The average of the ratio of
Since we only have eight users in the system, each of
the number of recommended friends in the set of
whom uses 18 0:125 as its initial random impact, as
true friends of the query user over the number of the
described in Algorithm 1, which results in that their rank-
set of true friends of the query user, which is 100 in
ings are even higher than some of the connected users. If
our experiments:
we have 1;000 users, they should only have 0:001 for their
P P
i jFi \ Gi j=jGi j jFi \ Gi j=100
final score, thus they would not have so high rank.
Although the independent users are not avoidable, we Rr i :
1;000 1;000
only recommend a small portion of top p friends. The
chance that an independent user ranks in top p is small Using these performance metrics, we study the effect of a
when the total number of users is large. Therefore, the few parameters, such as the similarity threshold, Sthr , for
recommendation quality should not be affected much friend-matching graph construction, and the recommenda-
from independent users. tion coefficient, b, for balancing users preference on impact
548 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

Fig. 11. Evaluation on similarity threshold. The number of returned rec-


ommended friends is 100.

and similarity. For all the simulation results presented in


this paper, each data point is an average of 100 experiments.

8.2.1 Effect of Similarity Threshold Sthr


We first evaluate the effect of the similarity threshold Sthr
which is important for friend-matching graph construction.
When Sthr is too small, almost all users in the system are con-
nected, which increases the overhead for graph construction
and maintenance as well as the ranking process. While if Sthr
is too large, the friend-matching graph cannot reflect the true
relationships between users since those with high similarity
may not be connected with each other. Therefore, the value
of the similarity threshold for the friend-matching graph
Fig. 12. Evaluation on recommendation coefficient b.
construction should be carefully selected.
Fig. 11 shows the effect of the similarity threshold on
the recommendation recall. We only consider the case that 10 percent when 1;000 users are recommended. This is
100 friends are recommended. In this case, the recommen- because we only choose 100 users as the set of true friends
dation precision equals the recommendation recall. We first for each user. The recommendation precision also increases
observe that the recommendation recall drops when Sthr when b increases because similarity plays more and more
increases. The reason is that links with small similarities are important roles as b increases.
gradually removed from the graph as Sthr increases. We It is obvious that our recommendation is much better
also notice that the influence of Sthr is more obvious for than random recommendation which is usually 10 percent
smaller b. This is because removing links affects the impact on average. However, note that we cannot conclude that the
rankings of users and meanwhile smaller b relies more on larger b the better, as shown in Fig. 12. Because users may
the impact rankings. In our following experiments, we prefer impact to similarity and our metrics cannot reflect
choose Sthr 0:5 when we evaluate other parameters. the recommendation accuracy on impact. To better charac-
terize the recommendation results, we define a new metric,
8.2.2 Effect of Recommendation Coefficient b called compromise score, as follows:
b is introduced to characterize users preference on impact
X
100
and similarity. Figs. 12a and 12b show the evaluation results B Si; ij rij ;
of b on recommendation recall and precision, respectively. j1
When b 0, the recommendation recall is relatively low
which is just a little better than random recommendation. where B is the compromise score and ij is the jth recom-
This is because the recommendation only relies on the mended friend for the query user i. In this experiment,
impact rankings of users rather than similarities between 100 friends are recommended for each query user. Fig. 13
users with the query user. The recommendation recall shows the impact of b on the metric. As we mentioned
increases as b increases because similarity is more and more before, each data point is an average of 100 experiments.
emphasized. When b reaches 1, all friends in a query users We can see that the metric achieves its maximum when b is
group are recommended. Note that even when b is not too around 0:3. Although the recommended friends have high
large (e.g., b 0:3), the recommendation recall is still high impact when b is small, the similarity between them and
and reaches 100 percent quickly as the number of recom- the query user is low. In contrast, the similarity between the
mended friends increases. As shown in Fig. 12, the recom- recommended friends and the query user is high, but their
mendation precision decreases when the number of popularity is low. Therefore, in order to well balance the
recommendation friends increases and finally reaches similarity and popularity, b should be carefully selected.
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 549

TABLE 3
Resources Requirements Comparison

9 CONCLUSION AND FUTURE WORK


In this paper, we presented the design and implementation
of Friendbook, a semantic-based friend recommendation
system for social networks. Different from the friend recom-
mendation mechanisms relying on social graphs in existing
Fig. 13. The impact of b on B. social networking services, Friendbook extracted life styles
from user-centric data collected from sensors on the smart-
Our system leaves the setting of b to users, so they can find phone and recommended potential friends to users if they
friends based on their preferences. share similar life styles. We implemented Friendbook on the
Android-based smartphones, and evaluated its performance
on both small-scale experiments and large-scale simulations.
8.3 Resource Consumption The results showed that the recommendations accurately
Finally, we evaluate the energy consumption performance reflect the preferences of users in choosing friends.
of the Friendbook client application. Usually, the user will Beyond the current prototype, the future work can be
not have the incentive to use the application if the battery four-fold. First, we would like to evaluate our system on
runs out in less than 10 hours, which is the typical hour of large-scale field experiments. Second, we intend to imple-
usage for a day (the user can re-charge the battery during ment the life style extraction using LDA and the iterative
the night). Therefore, energy consumption is another impor- matrix-vector multiplication method in user impact ranking
tant metric that has to be measured. We test the energy con- incrementally, so that Friendbook would be scalable to
sumption of the same smartphone under two modes: idle large-scale systems. Third, the similarity threshold used for
mode with Friendbook off and active mode with Friend- the friend-matching graph is fixed in our current prototype
book on. Either mode is under a users normal use such as of Friendbook. It would be interesting to explore the adap-
making phone calls, checking emails, sending SMS, etc. As tion of the threshold for each edge and see whether it can
shown in Fig. 14, Friendbook drops the battery to 15 percent better represent the similarity relationship on the friend-
in about 13 hours. The evaluation shows that Friendbook matching graph. At last, we plan to incorporate more sen-
achieves satisfactory results on the energy performance. sors on the mobile phones into the system and also utilize
In addition to energy, as shown in Table 3, we monitor the information from wearable equipments (e.g., Fitbit,
the runtime resources that our client application consumes iwatch, Google glass, Nike+, and Galaxy Gear) to discover
comparing to other Android system services and applica- more interesting and meaningful life styles. For example,
tions. It is easily observed that the client application is small we can incorporate the sensor data source from Fitbit, which
in size while the runtime memory usage is comparable to extracts the users daily fitness infograph, and the users
some popular applications, such as Google Maps. In addi- place of interests from GPS traces to generate an infograph
tion, the wireless data usage is low so that the user does not of the user as a document. From the infograph, one can
need to worry about the data plan of their mobile phone. easily visualize a users life style which will make more
Moreover, even when we turn on GPS, accelerometer and sense on the recommendation. Actually, we expect to incor-
gyroscope sensors all the time, the total CPU usage of the porate Friendbook into existing social services (e.g., Face-
client application is still very low on average. book, Twitter, LinkedIn) so that Friendbook can utilize
more information for life discovery, which should improve
the recommendation experience in the future.

ACKNOWLEDGMENTS
The authors would like to thank the anonymous reviewers
whose insightful comments have helped improve the
presentation of this paper significantly. This work was
supported in part by NSF CNS-1017156 and CNS-0953238,
and in part by NSFC 61273079, ANRNSFC 61061130563,
NSFC 61373167 and Natural Science Foundation of Hubei
Province No. 2013CFB297.

REFERENCES
[1] Amazon. (2014). [Online]. Available: http://www.amazon.com/
Fig. 14. Energy consumption comparison.
550 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 14, NO. 3, MARCH 2015

[2] Facebook statistics. (2011). [Online]. Available: http://www. [28] A. D. Sarma, A. R. Molla, G. Pandurangan, and E. Upfal, Fast Dis-
digitalbuzzblog.com/facebook-statistics-stats-facts-2011/ tributed Pagerank Computation. Berlin, Germany: Springer, pp. 11-
[3] Netfix. (2014). [Online]. Available: https://signup.netflix.com/ 26, 2013.
[4] Rotten tomatoes. (2014). [Online]. Available: http://www.rotten- [29] G. Spaargaren and B. Van Vliet, Lifestyles, consumption and the
tomatoes.com/ environment: The ecological modernization of domestic con-
[5] G. R. Arce, Nonlinear Signal Processing: A Statistical Approach. New sumption, Environ. Politic, vol. 9, no. 1, pp. 5076, 2000.
York, NY, USA: Wiley, 2005. [30] M. Tomlinson, Lifestyle and social class, Eur. Sociol. Rev.,
[6] B. Bahmani, A. Chowdhury, and A. Goel, Fast incremental and vol. 19, no. 1, pp. 97111, 2003.
personalized pagerank, Proc. VLDB Endowment, vol. 4, pp. 173 [31] Z. Wang, C. E. Taylor, Q. Cao, H. Qi, and Z. Wang, Demo:
184, 2010. Friendbook: Privacy preserving friend matching based on shared
[7] J. Biagioni, T. Gerlich, T. Merrifield, and J. Eriksson, EasyTracker: interests, in Proc. 9th ACM Conf. Embedded Netw. Sens. Syst., 2011,
Automatic transit tracking, mapping, and arrival time prediction pp. 397398.
using Smartphones, in Proc. 9th ACM Conf. Embedded Netw. Sen- [32] X. Yu, A. Pan, L.-A. Tang, Z. Li, and J. Han, Geo-friends recom-
sor Syst., 2011, pp. 6881. mendation in GPS-based cyber-physical social network, in Proc.
[8] L. Bian and H. Holtzman, Online friend recommendation Int. Conf. Adv. Social Netw. Anal. Mining, 2011, pp. 361368.
through personality matching and collaborative filtering, in Proc. [33] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W.-Y. Ma, Understanding
5th Int. Conf. Mobile Ubiquitous Comput., Syst., Services Technol., transportation modes based on GPS data for web applications,
2011, pp. 230235. ACM Trans. Web, vol. 4, no. 1, pp. 136, 2010.
[9] C. M. Bishop, Pattern Recognition and Machine Learning. New York,
NY, USA: Springer, 2006. Zhibo Wang received the B.E. degree in Auto-
[10] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet mation from Zhejiang University, China, in 2007,
allocation, J. Mach. Learn. Res., vol. 3, pp. 9931022, 2003. and his Ph.D. degree in Electrical Engineering
[11] P. Desikan, N. Pathak, and J. Srivastava, V. Kumar, Incremental and Computer Science from University of
page rank computation on evolving graphs, in Proc. Special Inter- Tennessee, Knoxville, in 2014. He is currently an
est Tracks Posters 14th Int. Conf. World Wide Web, 2005, pp. 1094 associate professor with the School of Computer,
1095. Wuhan University. His research interests include
[12] N. Eagle and A. S. Pentland, Reality mining: Sensing complex wireless sensor networks and mobile sensing
cocial systems, Pers. Ubiquitous Comput., vol. 10, no. 4, pp. 255 systems. He is a student member of the IEEE.
268, Mar. 2006.
[13] K. Farrahi and D. Gatica-Perez, Probabilistic mining of socio-
geographic routines from mobile phone data, IEEE J. Select.
Topics Signal Process., vol. 4, no. 4, pp. 746755, Aug. 2010. Jilong Liao received the BE degree from the
[14] K. Farrahi and D. Gatica Perez, Discovering routines from large- University of Electronic Science and Technology,
scale human locations using probabilistic topic models, ACM China, in 2010. He received the masters of sci-
Trans. Intell. Syst. Technol., vol. 2, no. 1, pp. 3:13:27, 2011. ence degree from the University of Tennessee,
[15] B. A. Frigyik, A. Kapila, and M. R. Gupta, Introduction to the Knoxville in 2013. His major research interests
Dirichlet distribution and related processes, Dept. Elect. Eng., included mobile system, data analysis, distrib-
Univ. Washington, Seattle, WA, USA, UWEETR-2010-0006, 2010. uted system, and wireless sensor networks. He
[16] A. Giddens, Modernity and Self-Identity: Self and Society in the Late is currently working at Microsofts Operating Sys-
Modern Age. Stanford, CA, USA: Stanford Univ. Press, 1991. tem Group (OSG) as a software development
[17] L. Gou, F. You, J. Guo, L. Wu, and X. L. Zhang, Sfviz: Interest- engineer.
based friends exploration and recommendation in social
networks, in Proc. Visual Inform. Commun.-Int. Symp., 2011, p. 15. Qing Cao received the BS degree from Fudan
[18] W. H. Hsu, A. King, M. Paradesi, T. Pydimarri, and T. Weninger, University, China, the MS degree from the Uni-
Collaborative and structural recommendation of friends using versity of Virginia, and the PhD degree from the
weblog-based social network analysis, in Proc. AAAI Spring University of Illinois in 2008. He is currently an
Symp. Ser., 2006, pp. 5560. assistant professor in the Department of Electri-
[19] T. Huynh, M. Fritz, and B. Schiel, Discovery of activity patterns cal Engineering and Computer Science at the
using topic models, in Proc. 10th Int. Conf. Ubiquitous Comput., University of Tennessee. His research interests
2008, pp. 1019. include wireless sensor networks, embedded
[20] J. Kwon and S. Kim, Friend recommendation method using systems, and distributed networks. He is a mem-
physical and social context, Int. J. Comput. Sci. Netw. Security, ber of the ACM, IEEE, and the IEEE Computer
vol. 10, no. 11, pp. 116120, 2010. Society.
[21] J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford,
A hybrid discriminative/generative approach for modeling
human activities, in Proc. Int. Joint Conf. Artif. Intell., 2005, p. 766-
772.
[22] Q. Li, J. A. Stankovic, M. A. Hanson, A. T. Barth, J. Lach, and G.
Zhou, Accurate, fast fall detection using gyroscopes and acceler-
ometer-derived posture information, in Proc. 6th Int. Workshop
Wearable Implantable Body Sensor Netw., 2009, pp. 138143.
[23] E. Miluzzo, C. T. Cornelius, A. Ramaswamy, T. Choudhury, Z.
Liu, and A. T. Campbell, Darwin phones: The evolution of sens-
ing and inference on mobile phones, in Proc. 8th Int. Conf. Mobile
Syst., Appl., Services, 2010, pp. 520.
[24] E. Miluzzo, N. D. Lane, S. B. Eisenman, and A. T. Campbell,
CenceMe: Injecting sensing presence into social networking
applications, in Proc. 2nd Eur. Conf. Smart Sensing Context, Oct.
2007, pp. 128.
[25] L. Page, S. Brin, R. Motwani, and T. Winograd, The Pagerank
citation ranking: Bringing order to the web, Stanford InfoLab,
Stanford, CA, USA, Tech. Rep. 1999-66, 1999.
[26] S. Reddy, M. Mun, J. Burke, D. Estrin, M. Hansen, and M.
Srivastava, Using mobile phones to determine transportation
modes, ACM Trans. Sens. Netw., vol. 6, no. 2, p. 13, 2010.
[27] I. Ropke, The dynamics of willingness to consume, Ecological
Econ., vol. 28, no. 3, pp. 399420, 1999.
WANG ET AL.: FRIENDBOOK: A SEMANTIC-BASED FRIEND RECOMMENDATION SYSTEM FOR SOCIAL NETWORKS 551

Hairong Qi received the BS and MS degrees Zhi Wang received the BS degree from
in computer science from Northern JiaoTong Shenyang Jian Zhu University, China, in 1991,
University, Beijing, China, in 1992 and 1995, the MS degree from Southeast University, China,
respectively, and the PhD degree in computer in 1997, and the PhD degree from Shenyang
engineering from North Carolina State University, Institute of Automation, the Chinese Academy of
Raleigh, in 1999. She is currently a professor Sciences, China, in 2000. During 2000-2002, as
with the Department of Electrical Engineering a postdoc, he has conducted research in Institut
and Computer Science at the University of National Polytechique de Lorraine, France and
Tennessee, Knoxville. Her current research inter- Zhejiang University, China, respectively. During
ests are in advanced imaging and collaborative 2010-2011, he has conducted research as an
processing in resource-constrained distributed advanced scholar at UW-Madion. He is currently
environment, hyperspectral image analysis, and bioinformatics. She an associate professor in the Department of Control Science and Engi-
received the US National Science Foundation (NSF) CAREER Award. neering of Zhejiang University. His research interests include wireless
She also received the Best Paper Award at the 18th International Confer- sensor networks, visual sensor networks, industrial communication and
ence on Pattern Recognition and the third ACM/IEEE International systems, and networked control systems. He is a member of the IEEE
Conference on Distributed Smart Cameras. She recently receives the and ACM.
Highest Impact Paper from the IEEE Geoscience and Remote Sensing
Society. She is a senior member of the IEEE.
" For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.

You might also like