
Link State Anomaly Detection For Social Networks

Prof. Sajitha N.
Assistant Professor, Computer Science and Engineering, BNMIT, Bengaluru, India

Abstract-Rapid technological growth plays an important role in people's day-to-day lives. People rely
heavily on social networking to communicate and interact with each other, and the rapid growth of
social networking has created interest in detecting emerging topics. The information exchanged between
users of social networks includes not only text messages but also URLs, videos, audio, etc., so
conventional term-frequency-based approaches may not be appropriate in this context. The focus here is
on detecting emerging topics from the social aspects of the network, namely the links generated
dynamically between users through replies, mentions, and re-tweets. A probability model of the
mentioning behavior of a social network user is proposed, and the emergence of a new topic is detected
from the anomalies measured through the model. By aggregating the anomaly scores from hundreds of
users, it is possible to detect an emerging topic based only on the reply/mention relationships in
social-network posts.
Index terms: Social networks, data mining, anomaly detection

I. INTRODUCTION
Communication and interaction between people over social networks such as Facebook and Twitter is
gaining more and more importance in daily life. The information exchanged over social networks
includes not only text but also URLs, images, audio, and video, which makes it a challenging subject
for data mining. Detecting emerging topics in social streams can be used to create automated breaking
news, to discover hidden market needs, or to learn of underground political movements. Compared with
conventional media, social media are able to capture the earliest, unedited voice of ordinary people.
It is the existence of mentions that makes social media social. Mentions are links to other users in
the same social network, either in the form of message-to, reply-to, and re-tweet-of, or explicitly in
the text. One post may contain a number of mentions. Some users rarely include mentions in their posts,
while others mention frequently. Likewise, celebrities may receive mentions every minute, whereas for
ordinary people being mentioned is a rare occasion. Thus, mentions can be described as a language in
which the number of words equals the number of users in the social network. By monitoring the
mentioning behavior of users, emerging topics can be detected. This is based on the assumption that an
emerging topic is something people feel like discussing, commenting on, or forwarding to their friends.
Conventional methods use the frequencies of textual words as the tool to detect emerging topics. The
problems with this approach are that homonyms and synonyms can be misinterpreted and that it requires
complex preprocessing and segmentation. Above all, it cannot be applied when the contents of the
messages are non-textual in nature.

1.1 Anomaly Detection


In data mining, anomaly detection (or outlier detection) [5] is the identification of items, events or
observations which do not conform to an expected pattern or other items in a dataset. Typically the
anomalous items will translate to some kind of problem such as bank fraud, a structural defect, medical

@IJRTER-2016, All Rights Reserved 1


International Journal of Recent Trends in Engineering & Research (IJRTER)
Volume 02, Issue 11; November - 2016 [ISSN: 2455-1457]

problems, or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations,
and exceptions.
Three broad categories of anomaly detection techniques exist.
1. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set, under the
assumption that the majority of the instances in the data set are normal, by looking for instances that
seem to fit least well with the remainder of the data set.
2. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and
"abnormal" and involve training a classifier (the key difference from many other statistical
classification problems is the inherently unbalanced nature of outlier detection).
3. Semi-supervised anomaly detection techniques construct a model representing normal behavior from a
given normal training data set, and then test the likelihood that a test instance was generated by the
learnt model.
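
As an illustration of the unsupervised category, the following minimal sketch flags values that fit the
rest of an unlabeled data set least well, using a modified z-score built from the median and the median
absolute deviation. The function name, the threshold of 3.5, and the sample counts are assumptions made
for this example only; they are not part of the proposed method.

```python
import statistics

def robust_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no measurable spread, so nothing can be ranked as unusual
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily mention counts for one user.
daily_mentions = [3, 4, 2, 5, 3, 4, 40, 3, 2, 4]
print(robust_outliers(daily_mentions))
```

No labels are needed: the only assumption is that most of the observations are normal.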

1.2 Social Networks


A social network [6] is a social structure made up of a set of social actors (such as individuals or
organizations) and a set of relationships between these actors. The social network perspective
provides a set of methods for analyzing the structure of whole social entities as well as a variety of
theories explaining the patterns observed in these structures.
Computer networks combined with social networking software produce a new medium for social
interaction. A relationship over a computerized social networking service can be characterized by
content, direction, and strength. The content of a relation refers to the resource that is exchanged. In
a computer-mediated communication context, social pairs exchange different kinds of information,
including sending a data file or a computer program as well as providing emotional support or arranging
a meeting. With the rise of electronic commerce, the information exchanged may also correspond to
exchanges of money, goods, or services in the real world.
A social networking service (also social networking site or SNS) is a platform for building social
networks or social relations among people who share interests, activities, backgrounds, or real-life
connections. Through data mining, companies are able to improve their sales and profitability. With
this data, companies create customer profiles that contain customer demographics and online behavior.
Examples of social networking sites include Twitter and Facebook. Twitter is a well-known source of
information about breaking news stories, which makes it ideal for identifying events as they happen.

1.3 Sequentially Discounting Normalized Maximum Likelihood (SDNML) Coding


SDNML [7] is a method for sequentially compressing a data sequence. It attains the least code length
for the sequence while gradually discounting the effect of past data as time goes on, so the
compression adapts to non-stationary data sources. SDNML is used to learn the mechanism of a time
series, and a change-point score at each time is then measured in terms of the SDNML code length. This
technique has recently received considerable attention in data mining because it can be applied to a
wide variety of important risk management issues, such as detecting failures of computer devices from
computer performance data and detecting masqueraders or malicious executables from computer access
logs.
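
The idea of scoring each time point by a discounted code length can be sketched as follows. This is not
SDNML itself: as a simplifying assumption it replaces the normalized maximum likelihood code with a
Gaussian whose mean and variance are exponentially discounted running estimates, and takes the negative
log-likelihood of each new value as its code length. The function name and the discount value are
hypothetical.

```python
import math

def discounted_code_lengths(series, discount=0.1, init_var=1.0):
    """Code length (negative log-likelihood, in nats) of each value given the past.

    The predictive model is a Gaussian with exponentially discounted mean and
    variance, a simplified stand-in for the SDNML code.
    """
    mean, var = series[0], init_var
    scores = []
    for x in series[1:]:
        # Score x with the model learned from earlier values only.
        scores.append(0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var))
        # Discounted update: older samples lose weight geometrically.
        delta = x - mean
        mean += discount * delta
        var = max((1 - discount) * (var + discount * delta * delta), 1e-6)
    return scores

# A synthetic series whose level jumps at index 30.
series = [0.0] * 30 + [5.0] * 30
scores = discounted_code_lengths(series)
print(scores.index(max(scores)) + 1)  # position in `series` with the longest code
```

A value that the discounted model predicts poorly needs a long code, so peaks in the score sequence
mark candidate change points, and the discounting lets the model settle on the new level afterwards.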

1.4 Burst Detection


A burst is a period of increased activity, determined by minimizing a cost function that assumes a set of
possible states (not bursting and various degrees of burstiness) with increasing event frequencies, where
it is expensive to go up a level and cheap to go down a level. It is useful for analyzing text streams
(such as e-mails, corpora, and publications). Burst detection [3] is particularly useful for examining
trends in collections of texts or communities of conversation. Even words that are used comparatively
little, but whose frequency of usage changes over time, stand out, unlike in burst detection algorithms
based on thresholds.
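
The cost-minimization view of bursts described above can be sketched with a reduced two-state version
of the automaton in [3] (baseline and burst only, rather than many levels). The sketch assumes
exponentially distributed gaps between events, a burst rate that is `s` times the baseline rate, and a
cost of `gamma * ln(n)` for moving up a level while moving down is free. The function name, the default
parameter values, and the sample gaps are illustrative assumptions.

```python
import math

def two_state_bursts(gaps, s=2.0, gamma=1.0):
    """Label each inter-arrival gap 0 (baseline) or 1 (burst).

    The labels minimize emission cost plus transition cost, found by
    dynamic programming (Viterbi) over the two states.
    """
    n = len(gaps)
    base_rate = n / sum(gaps)
    rates = (base_rate, s * base_rate)
    up_cost = gamma * math.log(n)  # going up a level is expensive

    def emit(state, gap):
        # Negative log of the exponential density rate * exp(-rate * gap).
        return rates[state] * gap - math.log(rates[state])

    # The automaton starts in the baseline state.
    cost = [emit(0, gaps[0]), up_cost + emit(1, gaps[0])]
    back = []
    for gap in gaps[1:]:
        stay0, drop = cost[0], cost[1]            # going down a level is free
        rise, stay1 = cost[0] + up_cost, cost[1]
        back.append((0 if stay0 <= drop else 1, 0 if rise < stay1 else 1))
        cost = [min(stay0, drop) + emit(0, gap), min(rise, stay1) + emit(1, gap)]

    # Trace the cheapest state sequence backwards.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for pointers in reversed(back):
        state = pointers[state]
        states.append(state)
    return states[::-1]

# Hypothetical gaps (e.g. in minutes) between posts: slow, fast, slow.
gaps = [10] * 10 + [1] * 10 + [10] * 10
print(two_state_bursts(gaps))
```

Because entering the burst state has a fixed cost, a single short gap is not enough to trigger a burst;
only a sustained run of short gaps pays for the transition.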

II. LITERATURE SURVEY


2.1 Topic Detection and Tracking (TDT)
TDT [2] is used to detect new events in a stream of broadcast news stories. For example, the tsunami
that struck Chennai in 2004 is considered an event, whereas similar tragedies caused by cyclones in
general are considered a class of events. Events can be expected or unexpected.

2.2 Bursty and Hierarchical Structure in Streams


A fundamental problem in text data mining is to extract meaningful structure from document streams
that arrive continuously over time. E-mail and news articles are two natural examples of such streams,
each characterized by topics that appear, grow in intensity for a period of time, and then fade away.
The approach in [3] models the stream using an infinite-state automaton, in which bursts appear
naturally as state transitions; it can be viewed as drawing an analogy with models from queuing theory
for bursty network traffic. The resulting algorithms are highly efficient and yield a nested
representation of the set of bursts that imposes a hierarchical structure on the overall stream.

METHODOLOGY
The proposed method uses a probability model that captures the normal mentioning behavior of a user,
which consists of both the number of mentions per post and the frequency with which each user is
mentioned. This model is used to measure the anomaly of future user behavior. The anomaly scores
obtained in this way are aggregated over several users, and the sequentially discounting normalized
maximum-likelihood (SDNML) coding based change-point detection technique is applied. This technique
detects changes in the statistical dependence structure of the time series of aggregated anomaly
scores, which indicate where a topic emerges. The overall flow of the proposed method is shown in
Fig. 3.1.

Fig. 3.1. Overall flow of the link-anomaly-based emerging topic discovery
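
The scoring and aggregation steps can be sketched as follows. The text above does not give the exact
distributions, so this sketch makes its own assumptions: the number of mentions per post is treated as
geometric with a smoothed parameter estimate, a previously mentioned user has probability proportional
to how often that user was mentioned before, and an unseen user receives a small fixed weight. The
anomaly score of a post is its negative log-probability. The class, method, and parameter names are
hypothetical.

```python
import math
from collections import Counter

class MentionModel:
    """Per-user model of how many users a post mentions and who is mentioned."""

    def __init__(self, new_user_weight=0.5):
        self.posts = 0
        self.mentions = Counter()     # mentioned user -> times mentioned so far
        self.gamma = new_user_weight  # assumed weight given to an unseen user

    def score(self, mentioned):
        """Anomaly score of a post: negative log-probability under the model."""
        total = sum(self.mentions.values())
        # Smoothed estimate of the geometric "stop mentioning" probability.
        theta = (self.posts + 1) / (self.posts + total + 2)
        s = -math.log(theta) - len(mentioned) * math.log(1 - theta)
        for user in mentioned:
            seen = self.mentions[user]
            s -= math.log((seen if seen else self.gamma) / (total + self.gamma))
        return s

    def update(self, mentioned):
        self.posts += 1
        self.mentions.update(mentioned)

def aggregate(scored_posts, window):
    """Average the scores of (timestamp, score) pairs in fixed-width windows."""
    buckets = {}
    for t, s in scored_posts:
        buckets.setdefault(int(t // window), []).append(s)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

model = MentionModel()
for _ in range(20):
    model.update(["alice"])           # the user habitually mentions one friend
print(model.score(["alice"]), model.score(["bob", "carol", "dave"]))
```

A post that mentions several previously unseen users scores much higher than a habitual one. In the
overall flow, the windowed averages returned by `aggregate` form the time series to which change-point
detection is then applied.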

III. CONCLUSION
A new method to detect the emergence of topics in a social network stream has been proposed. The
social aspect of the posts, reflected in the mentioning behavior of users, is used in the model. This
mentioning-behavior model is the basic idea of the proposed approach, in contrast to the conventional
approach based on textual content. A probability model that captures both the number of mentions per post and
the frequency of mentions is derived. The emergence of the topic is then detected using SDNML
change-point detection algorithm and Kleinberg's burst detection method, both applied to the output of
the proposed mention model. Future work could extend this approach to handle the social stream in real
time. Further research could also improve performance and reduce false alarms by combining the
proposed link-anomaly model with content-based topic detection.

REFERENCES
[1]. T. Takahashi, R. Tomioka, and K. Yamanishi, Discovering Emerging Topics in Social Streams via Link
Anomaly Detection, IEEE Trans. Knowledge and Data Eng., vol. 26, no. 1, Jan. 2014.
[2]. J. Allan et al., Topic Detection and Tracking Pilot Study: Final Report, Proc. DARPA Broadcast News
Transcription and Understanding Workshop, 1998.
[3]. J. Kleinberg, Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery,
vol. 7, no. 4, pp. 373-397, 2003.
[4]. A. Krause, J. Leskovec, and C. Guestrin, Data Association for Topic Intensity Tracking, Proc. 23rd
Int'l Conf. Machine Learning (ICML '06), pp. 497-504, 2006.
[5]. http://en.wikipedia.org/wiki/Anomaly_detection
[6]. http://en.wikipedia.org/wiki/Social_network
[7]. http://link.springer.com/chapter/10.1007%2F978-3-642-20847-8_16
