Prof. Sajitha N.
Assistant Professor, Computer Science and Engineering, BNMIT, Bengaluru, India
Abstract-The rapid growth of technology plays an important role in day-to-day life. People rely
heavily on social networking to communicate and interact with each other, and this growth has created
interest in detecting emerging topics. The information exchanged between users of social networks
includes not only text messages but also URLs, videos, audio, etc., so conventional frequency-based
approaches may not be appropriate in this context. The focus here is on detecting the emergence of
topics from the social aspects of the network, namely the links generated dynamically between users
through replies, mentions, and re-tweets. A probability model of the mentioning behavior of a social
network user is proposed, and the emergence of a new topic is detected from the anomalies measured
through the model. By aggregating the anomaly scores from hundreds of users, it is possible to detect
emerging topics based only on the reply/mention relationships in social network posts.
Index terms: Social networks, data mining, anomaly detection
I. INTRODUCTION
Communication and interaction over social networks such as Facebook and Twitter is gaining more and
more importance in daily life. The information exchanged over social networks includes not only text
but also URLs, images, audio, and video, which makes it a challenging subject for data mining. The
detection of emerging topics from social streams can be used to create automated breaking news, to
discover hidden market needs, or to learn of underground political movements. Compared to
conventional media, social media are able to capture the earliest, unedited voice of ordinary people.
It is the existence of mentions that makes social media social. Mentions are links to other users of
the same social network, in the form of message-to, reply-to, or re-tweet-of, or given explicitly in
the text. One post may contain a number of mentions. Some users rarely include mentions in their
posts, while others mention frequently. For example, celebrities may receive mentions every minute,
whereas for ordinary people being mentioned is a rare occasion. Thus, mentions can be described as a
language whose number of words equals the number of users in the social network. By monitoring the
mentioning behavior of users, emerging topics can be detected. This is based on the assumption that an
emerging topic is something people feel like discussing, commenting on, or forwarding to their
friends. Conventional methods use the frequencies of textual words as the tool to detect emerging
topics. The problem with this approach is the misinterpretation of homonyms and synonyms. It also
requires complex preprocessing and segmentation. Most importantly, it cannot be applied when the
contents of the messages are non-textual.
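As a rough illustration of how mentioning behavior could be scored, the following Python sketch
computes a link-anomaly score for one post as the negative log-probability of its mentions given the
same user's earlier posts, and then averages such scores over time windows. It assumes a geometric
distribution with a Beta prior for the number of mentions and a Chinese-restaurant-process-style rule
for who is mentioned, in the spirit of [1]. The function names and the hyperparameters alpha, beta and
gamma are hypothetical choices made for illustration, not values taken from this paper.

```python
import math
from collections import Counter


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def link_anomaly_score(mentions, history, alpha=0.5, beta=0.5, gamma=0.5):
    """Negative log-probability of a post's mentions given a user's past posts.

    mentions: list of user names mentioned in the new post.
    history:  list of the same user's past posts, each a list of mentioned names.
    alpha, beta, gamma are assumed hyperparameters (illustrative defaults).
    """
    n = len(history)                          # number of past posts
    m = sum(len(post) for post in history)    # total number of past mentions
    counts = Counter(name for post in history for name in post)
    k = len(mentions)

    # Predictive probability of seeing k mentions under a geometric model
    # whose parameter has a Beta(alpha, beta) prior.
    log_p_k = log_beta(n + alpha + 1, m + k + beta) - log_beta(n + alpha, m + beta)

    # Probability of each mentioned user: proportional to past mention counts,
    # with mass gamma reserved for users never mentioned before.
    log_p_v = 0.0
    for name in mentions:
        seen = counts.get(name, 0)
        weight = seen if seen > 0 else gamma
        log_p_v += math.log(weight / (m + gamma))

    return -(log_p_k + log_p_v)


def aggregate_scores(timed_scores, window):
    """Average (timestamp, score) pairs over consecutive windows of a given length."""
    buckets = {}
    for t, score in timed_scores:
        buckets.setdefault(int(t // window), []).append(score)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}


history = [["alice"], ["alice"], [], ["alice", "bob"]]
usual = link_anomaly_score(["alice"], history)
unusual = link_anomaly_score(["carol", "dave", "erin"], history)
print(usual < unusual)
```

A post that mentions several previously unseen users receives a higher score than one that mentions a
habitual contact, which is the kind of deviation the aggregated score is meant to expose.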
problems or finding errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations
and exceptions.
Three broad categories of anomaly detection techniques exist.
1. Unsupervised anomaly detection techniques detect anomalies in an unlabeled test data set under the
assumption that the majority of the instances in the data set are normal, by looking for instances that
seem to fit least with the remainder of the data set.
2. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and
"abnormal" and involve training a classifier (the key difference from many other statistical
classification problems is the inherently unbalanced nature of outlier detection).
3. Semi-supervised anomaly detection techniques construct a model representing normal behavior from a
given normal training data set, and then test the likelihood of a test instance being generated by the
learnt model.
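The semi-supervised case can be sketched in a few lines of Python. The example below is only an
illustration under a strong assumption: normal behavior is modeled as a single Gaussian fitted to
normal training values, and a test instance is scored by its negative log-likelihood under that model.
The function names and the sample data are hypothetical.

```python
import math
import statistics


def fit_normal_model(normal_values):
    """Fit a Gaussian (mean, standard deviation) to data assumed to be normal."""
    mu = statistics.fmean(normal_values)
    sigma = statistics.stdev(normal_values)
    return mu, sigma


def anomaly_score(x, model):
    """Negative log-likelihood of x under the fitted Gaussian; higher is more anomalous."""
    mu, sigma = model
    z = (x - mu) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2 * math.pi))


# Hypothetical training data: mentions per hour observed during normal periods.
model = fit_normal_model([9, 10, 11, 10, 10])
print(anomaly_score(10, model) < anomaly_score(25, model))
```

An instance would then be flagged when its score exceeds a threshold chosen on held-out normal data;
the threshold itself is a design choice that this sketch leaves open.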
it is expensive to go up a level and cheap to decrease a level. It is useful for text stream analysis (such as
emails, corpora, and publications). Burst detection [3] is particularly useful for examining the trends in
collections of texts or communities of conversation. Even words that are used comparatively little, but
that change in frequency of usage over time, stand out, unlike in burst detection algorithms based on
thresholds.
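The asymmetric cost of moving between levels can be made concrete with a minimal two-state sketch in
the style of Kleinberg's batched burst model [3]. It is a simplified illustration, not the full
hierarchical automaton: each time step has a count of relevant events out of a total, the burst state
has a rate s times the baseline, moving up costs gamma * ln(T), and moving down is free. The function
name and the default values of s and gamma are assumptions.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return one state per time step (0 = baseline, 1 = burst) via Viterbi decoding."""
    n_steps = len(relevant)
    p0 = sum(relevant) / sum(totals)          # baseline rate of relevant events
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline rate must lie strictly between 0 and 1")
    probs = (p0, min(s * p0, 0.9999))         # burst state has an elevated rate
    up_cost = gamma * math.log(n_steps)       # going up a level is expensive

    def fit_cost(state, d, n):
        # Negative log of the binomial probability of d relevant events out of n.
        p = probs[state]
        log_binom = math.lgamma(n + 1) - math.lgamma(d + 1) - math.lgamma(n - d + 1)
        return -(log_binom + d * math.log(p) + (n - d) * math.log(1 - p))

    cost = [0.0, math.inf]                    # the sequence starts in the baseline state
    back = []
    for d, n in zip(relevant, totals):
        new_cost, pointers = [], []
        for j in (0, 1):
            # Moving down or staying is free; only upward moves are charged.
            candidates = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best] + fit_cost(j, d, n))
            pointers.append(best)
        cost = new_cost
        back.append(pointers)

    # Trace the cheapest state sequence backwards from the final step.
    state = 0 if cost[0] <= cost[1] else 1
    states = []
    for pointers in reversed(back):
        states.append(state)
        state = pointers[state]
    states.reverse()
    return states


print(detect_bursts([1, 1, 1, 9, 10, 9, 1, 1], [20] * 8))
# prints [0, 0, 0, 1, 1, 1, 0, 0]
```

Because entering the burst state is charged once while leaving it is free, a short run of elevated
counts is labeled as a single burst only when the improved fit outweighs the entry cost.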
III. CONCLUSION
A new method to detect the emergence of topics in a social network stream has been proposed. The
model uses the social aspect of posts, as reflected in the mentioning behavior of users. This
mentioning-behavior model is the basic idea of the proposed approach, in contrast to conventional
approaches based on textual content. A probability model that captures both the number of mentions per
post and the frequency of mentions is derived. The emergence of a topic is then detected by applying
the SDNML change-point detection algorithm and Kleinberg's burst detection method to the proposed
mention model. Future improvements could allow this approach to handle the social stream in real time.
Further research could boost performance and reduce false alarms by combining the proposed
link-anomaly model with content-based topic detection.
REFERENCES
[1]. T. Takahashi, R. Tomioka, and K. Yamanishi, Discovering Emerging Topics in Social Streams via Link
Anomaly Detection, IEEE Trans. Knowledge and Data Eng., vol. 26, no. 1, Jan. 2014.
[2]. J. Allan et al., Topic Detection and Tracking Pilot Study: Final Report, Proc. DARPA Broadcast News
Transcription and Understanding Workshop, 1998.
[3]. J. Kleinberg, Bursty and Hierarchical Structure in Streams, Data Mining and Knowledge Discovery, vol. 7,
no. 4, pp. 373-397, 2003.
[4]. A. Krause, J. Leskovec, and C. Guestrin, Data Association for Topic Intensity Tracking, Proc. 23rd Int'l Conf.
Machine Learning (ICML '06), pp. 497-504, 2006.
[5]. http://en.wikipedia.org/wiki/Anomaly_detection
[6]. http://en.wikipedia.org/wiki/Social_network
[7]. http://link.springer.com/chapter/10.1007%2F978-3-642-20847-8_16