BiDA2014
A National Workshop On
Big Data Analytics
at CRRao AIMSCS and CMSD, UoH,
Hyderabad, India
on 22-24 August, 2014
Foreword
BiDA2014 Schedule
Technical Experts
V C V Rao, V Handa, R R Naik, S Rout, C-DAC Bangalore...
Foreword
Welcome to the First National Workshop on Big Data Analytics (BiDA2014), 22-24 August, 2014, in Hyderabad.
Data and information deluge is a truly remarkable phenomenon in today's world. Today we no longer need to go out
seeking precious bits of data; it is, in fact, the staggering amounts of data that stare at us, begging for our attention,
whether or not we are suitably equipped to tackle the numerous challenges that they present.
Indeed, with their ever-increasing volume, velocity and variety, large datasets generated from different sources in
all spheres of life continue to pose new challenges, and increasing demands, on computational analysis and data
management. The inherent complexities (e.g., data in motion, unstructured nature, security, veracity, ethical issues)
of Big Data have forced us to rethink how we can collect, store, combine and analyze it in an efficient, reliable and
cost-effective manner. The foundations and capabilities to understand and address these needs must be built up urgently.
Large amounts of data from all sectors, ranging from health to education, environment to agriculture, commerce to art
and humanities, security to governance, etc., now provide us, professionals and researchers alike, with an
unprecedented opportunity to make an informed difference: introduce new approaches to industry and policy-making,
take objective data-driven decisions, take advantage of inference and prediction, gain insights on risks and
protect health and wealth, create more useful infrastructure, offer enhanced and dynamic understanding of various
phenomena, and through all these myriad abilities, profoundly empower our lives and the world around us.
CR Rao AIMSCS embodies the foresight and vision of Professor C R Rao in uniting the analytical trinity of
Mathematics, Statistics and Computer Science under one roof. It is, therefore, both timely and appropriate for CR
Rao AIMSCS to organize BiDA2014, jointly with the Centre for Modelling, Simulation and Design (CMSD), University
of Hyderabad (UoH), and the Computer Society of India (CSI), which is celebrating its Golden Jubilee Year in 2014.
As the field of Big Data takes shape around the world, we have decided to offer in this workshop both knowledge
and skills that are relevant to Big Data. The participants will learn about different domains and applications, both
established and emerging, in which the task of dealing with large and complex datasets is becoming increasingly
unavoidable and, in fact, highly sought after. Distinguished experts will speak on the current problems and technical
aspects in their areas of research, while a panel of experts from the government and the industry will share their
perspectives with the participants. I thank all of them for agreeing to speak at the workshop.
Further, we have also arranged for an experienced team of technical experts from the Centre for Development
of Advanced Computing (C-DAC) to provide the participants with multiple sessions of hands-on laboratory training to
demonstrate and work on practical problems in Big Data. We thank them for their valuable time and expertise.
We are pleased to note that academia, industry and government are all represented in both our list of
speakers and our list of participants. Notably, the latter spans the length and breadth of the country, in the true
spirit of a National Workshop. We wish all of them a most enriching experience here.
I take this opportunity to thank all my colleagues and scholars, as well as the co-organizers and the Organizing
Committee members, for their hard work in organizing the workshop. I also thank the University of Hyderabad, C-DAC,
CSI Hyderabad Chapter and IIIT Hyderabad for their kind and enthusiastic co-operation. Finally, we express our
gratitude to all the sponsors, especially DST and MoS&PI, for providing generous support to the event.
Thank you and best wishes,
Saumyadipta Pyne.
Chair, BiDA2014,
CR Rao AIMSCS, Hyderabad, India.
15th August, 2014.
Message from Professor C R Rao
BiDA2014 Workshop Schedule
Day 1: Fri. 22nd August, 2014
Venue: Auditorium, Ramanujan building, CR Rao Advanced Institute of Mathematics, Statistics and
Computer Science (AIMSCS), University of Hyderabad Campus, Gachibowli, Hyderabad 500046.
Registration starts at 8:30 AM in CR Rao AIMSCS.
Session 1
Chair: Dr. S Pyne
09:30 AM
09:40
10:10
10:15
Session 2
Chair: Dr. B L S Prakasa Rao
10:45
11:30
12:15 PM
01:00
Session 3
Chair: Dr. S B Rao
02:00
02:45
03:30
04:15
Session 4
Chair: Dr. S K Udgata
04:45
05:30
Day 2: Sat. 23rd August, 2014
Venue: Morning session: Auditorium, Ramanujan Building, CR Rao AIMSCS.
Afternoon session: Centre for Modelling, Simulation and Design (CMSD), University of Hyderabad (UoH).
Session 1
Chair: Dr. S Pyne
09:00 AM
09:45
10:30
Session 2
Chair: Dr. A K Pujari
11:00
Dr. Yogesh Simmhan: Fast Data Analytics for the Internet of Things.
Dr. A K Pujari: Data Mining Trends in the Big Data Era.
Tea Break
Shri HR Mohan
President, Computer Society of India.
Chairman, IEEE Computer Society & IEEE Professional Communication Society
Vice Chairman, IEEE Communications Society
Interface to Technical Societies at ACM Chennai
Former Associate Vice President (Systems), The Hindu, Chennai
Title: BIG DATA: Opportunities Ahead.
Speaker Biosketch: Mr. HR Mohan is a graduate in Engineering from IIT Madras. He is currently a consultant in
the areas of Information and Communications Technology (ICT) and ICT education. He has rich experience in the
publishing industry. Until recently, he served at India's national newspaper, The Hindu, as Associate Vice President
(Systems). At The Hindu, Mr. Mohan looked after the Corporate MIS activities. He was instrumental in the Internet
publishing of The Hindu, the first newspaper from India to go online. Subsequently, Business Line, Frontline,
Sportstar and other group publications, including all supplements, were put online through his efforts. At The Hindu,
his other initiatives included Library Automation, Indexing, Digital Archives of 135 years of The Hindu, Information
Services, Content Syndication, and the compilation and publishing of special thematic publications.
Mr. Mohan is associated with a number of professional bodies in the areas of Information & Communication Technology,
Library, Information Sciences, Management, Technical Communication and Media, and with industry bodies such as FICCI,
Hindustan Chamber of Commerce, CII, AIMA & MMA. He has assisted in organizing over 750 technical meetings /
seminars / workshops / conferences during more than 30 years of association with professional societies.
Mr. Mohan, currently the President of the Computer Society of India, is a Fellow of CSI and has served on the
executive council of CSI at the Chennai chapter and at the national level for over two decades in various capacities.
Further, he has served as the Chairman of the Conferences Committee, Intersociety Relations Committee, Publications
Committee and Committee on Special Interest Groups. He is the Convener of the CSI Chennai CIO Forum, and a
Member of the CSI Digital Library Committee. He represents India in forums such as ICANN & SEARCC.
Mr. Mohan is closely associated with international associations such as IEEE & ACM as a Senior Member. He
serves as the Chairman of the IEEE Computer Society, Madras Chapter & IEEE Professional Communication
Society, Madras Chapter. He serves on the executive committees of the IEEE Communications Society (as Vice
Chairman), the IEEE Technology Management Council Madras Chapter (as Treasurer) and the ACM Madras Chapter (as
an interface to technical societies). He served as the Vice Chairman of the IEEE Madras Section for the term
2012-2013. He is a Trustee of the Ranganathan Centre for Information Studies. He is a Director at Internet Society India
Chennai Chapter. He is also associated with a number of educational institutions as a member of their academic and
governing councils.
Mr. Mohan has rich experience in editing, publishing and content management. He currently edits the monthly CSI
eNewsletter, which reaches over 100,000 members of CSI and members of a few other ICT-related eGroups with a
readership of over 20,000. He was the editor of IEEE India Info, the newsletter of the IEEE India Council (for
2013), which reaches about 55,000 members across the country. He also edited IEEE MAS LINK, the monthly
eNewsletter of the IEEE Madras Section, which reaches about 12,500 members, for over seven years till Dec 2013;
INFOLINE, the newsletter of CSI Madras Chapter, for about 20 years; and CSI Digest, a quarterly monograph of
CSI, for about three years.
Mr. Mohan manages a number of eGroups relating to his professional activities and interests and assists non-profit
organizations and institutions in managing their websites. He is a regular contributor to the ICT Happenings column
in CSI Communications and to ICT Quiz columns in the newsletters, and conducts ICT Quiz programmes. Mr. Mohan
delivers guest lectures and presentations at various institutions and forums in the areas of his interest such as
Information and Communication Technology, Open Source Software, Software Engineering, Knowledge
Management, Internet, Web & ePublishing, eLearning, Web Marketing & Electronic Commerce, Cloud Computing,
Big Data & Analytics, eGovernance, Information & Cyber Security, Digital Libraries & Archives, Library &
Information/Content Management and Services, Employability & Soft Skills, IT Education and related areas. Mr.
Mohan believes in information sharing and makes himself available for such initiatives.
Dr. P Manimaran
CR Rao Advanced Institute of Mathematics, Statistics and Computer Science
(AIMSCS), Hyderabad.
Title: Graph Mining Applications to Social Network Analysis.
Abstract: Nowadays, with the growth of social media, any individual in the world can easily
connect to another in cyberspace. With this information, a social network can be
constructed by considering the individuals as nodes and their interactions as edges between
them. The most challenging task is to mine the patterns in such social networks. In this
talk, we will discuss graph mining applications using centrality analysis and community
detection to extract information from social networks.
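As a rough illustration of the construction described in this abstract, the sketch below builds a small network from interaction pairs and computes degree centrality. It is not taken from the talk: the function names and the toy edge list are hypothetical, and connected components are used only as a crude stand-in for a real community detection method.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map: individuals are nodes, interactions are edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def connected_groups(adj):
    """Connected components found by breadth-first search.

    Assumption: this is only the simplest possible grouping of nodes,
    not a proper community detection algorithm.
    """
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            v = queue.popleft()
            group.append(v)
            for w in sorted(adj[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(sorted(group))
    return groups


# Hypothetical interaction data, for illustration only.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("dan", "eve")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
groups = connected_groups(graph)
```

In practice a graph library with dedicated community detection routines would likely replace `connected_groups`; the sketch only shows the shape of the data involved.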
Speaker Biosketch: Dr. Manimaran is an Assistant Professor at the CR Rao Advanced Institute of Mathematics,
Statistics and Computer Science, Hyderabad. Previously, he did his post-doctoral research at the Centre for
DNA Fingerprinting and Diagnostics, Hyderabad. Dr. Manimaran received his Ph.D. in Physics from the University of
Hyderabad. His main research interests are computational physics, time series analysis, complex networks and
wavelet transforms. He has published research papers in peer-reviewed journals and conferences. He has guided
ten M.Tech. students, and is currently guiding a Ph.D. student.
Dr. K S Rajan
Lab for Spatial Informatics, International Institute of Information Technology (IIIT),
Hyderabad.
Title: Challenges in Managing Spatio-Temporal Big Data: Sharing Current Research
Initiatives.
Abstract: With the increase in the range of location-based data collection models and
devices and in the frequency of such data collection, current computer systems are not
only challenged to store and manage these data but are also faced with the need to adapt and
develop various algorithms to help handle them. Some examples of these
include vehicle tracking systems for traffic flow analysis, mobile-based location-based
service requests and public contributed data, and mobile phone, GPS and other sensor
trajectories. In many of these cases, discovering science-related questions is still quite an effort.
Spatio-temporal big data has been looked upon to provide clues to some of these complex questions, with efforts ranging
from spatio-temporal hypothesis generation to attempts at answering some of these complex questions.
In this talk, we will cover some of the ongoing efforts in our research group, with a focus on our current research work
in using spatio-temporal data for traffic pattern flow understanding in a city-wide road network and in knowledge
discovery in the field of epidemiology. The latter has led to the development of MiSTIC, a spatio-temporal data mining
algorithm with applications in crime and climatic studies too.
Speaker Biosketch: Dr. KS Rajan is an Associate Professor at IIIT, Hyderabad, one of India's top ranked research
institutes, and leads the institute's Lab for Spatial Informatics (LSI). Recently, IIIT Hyderabad was ranked 3rd among
India's Top Technical Institutions in Dataquest-IDC's Survey 2012.
Dr. Rajan is a multi-disciplinarian, with major interests in Geo-Spatial Technologies (GIS and Remote Sensing), Land
use modelling and Environmental Policy. He has taken a key interest in the gap areas between computer science
and geospatial technologies, and through his research has helped focus on bridging this gap, be it in developing
spatio-temporal data mining algorithms, Web-based geospatial technologies, or new algorithms to help convert
satellite imagery into useful satellite-based thematic products. He is also an active proponent of Open Source in India,
and for geospatial technologies in particular. His Lab has recently released two Open Source tools, LSIViewer and
VRGeo.in. In environmental modelling, his work includes the integration, through analysis, of the multi-disciplinary
fields of science and engineering, and agent-based modelling of the Human-Land-Water-Energy linkages with
ecosystem-wide understanding and their interactions, for impact studies and national and global level policy
initiatives.
Dr. Rajan has handled more than 20 projects (from both Government and Industry), has over 100 publications in
books, journals and conferences, and has given more than 60 invited talks in a range of domains. He has been active in
curricular/academic and research matters of other Universities and Institutes, and of International and National programs.
Recently, Dr. Rajan was awarded the Indian National Geospatial Award 2013 of the Indian Society of Remote
Sensing.
Dr. S B Rao
CR Rao Advanced Institute of Mathematics, Statistics and Computer Science
(AIMSCS), Hyderabad.
Title: Privacy Preservation in Graphs and Social Networks.
Abstract: The recent proliferation of graph and network data in various application
domains has raised privacy-preservation concerns for the individuals/organizations
involved. Recent studies show that simply removing the identity of the persons/nodes
before publishing the graph/social network data does not guarantee privacy of the
individuals. The structure of the graph/network and its basic parameters, like the
degrees of the nodes, immediate neighbors, similarities and other centrality measures of
nodes such as eccentricity and betweenness centrality, can reveal the identity of the
individuals. To address these issues, we survey specific graph anonymization problems based on degree,
neighborhood and automorphisms, and some of the known solutions. We also discuss recent work in this regard on
other parameters / centrality measures of nodes, such as eccentricity, betweenness centrality, centroid, stress and
PageRank, and/or combinations of these.
Let P be a parameter set of a node that can be calculated easily (in polynomial time) for each node of a graph,
such as the degree, the neighbors, the eccentricity, the betweenness centrality or other centrality measures. We call a
graph Pk-anonymous if, for every node v, there are at least k-1 other nodes in the graph with the same values of the
parameter set P. This definition of anonymity prevents adversaries from identifying individuals / organizations
with probability more than 1/k, based on prior knowledge of the parameter set P for the node/individual v.
We formally define the graph Pk-anonymous problem as follows: Given a graph G and a parameter
set P, find a Pk-anonymous graph obtained from G with the minimum number of specified graph modifications / operations
(addition/deletion of edges etc.). The algorithms for finding a Pk-anonymous graph for a given graph G
and parameter set P will be based on the principle of realizability of the sequences of these parameter values for the
nodes of a new graph, which is Pk-anonymous.
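The definition of a Pk-anonymous graph given in this abstract can be checked directly on small examples. The following is a minimal sketch, assuming P is the node degree; the names `is_pk_anonymous`, `anonymity_level` and `degree`, and the two toy graphs, are illustrative and not from the talk. It only tests the property and does not attempt the minimum-modification problem.

```python
from collections import Counter


def degree(adj, v):
    """Example parameter P: the degree of node v (assumed choice of P)."""
    return len(adj[v])


def is_pk_anonymous(adj, param, k):
    """True if every node shares its parameter value with at least k-1 other nodes.

    `param(adj, v)` must return a hashable value, e.g. an int or a tuple.
    """
    counts = Counter(param(adj, v) for v in adj)
    return all(c >= k for c in counts.values())


def anonymity_level(adj, param):
    """Largest k for which the graph is Pk-anonymous (0 for an empty graph)."""
    counts = Counter(param(adj, v) for v in adj)
    return min(counts.values(), default=0)


# Path 1-2-3-4: degrees are 1, 2, 2, 1, so each degree value occurs twice.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

# Star with centre 0: degrees are 3, 1, 1, 1, so the centre is unique.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

path_is_2_anonymous = is_pk_anonymous(path, degree, 2)
star_level = anonymity_level(star, degree)
```

Under the degree parameter, the path is 2-anonymous while the star is only 1-anonymous, since an adversary who knows that a node has degree 3 can identify the centre with certainty.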
Speaker Biosketch: Dr. S B Rao is an Emeritus Professor at CR Rao Advanced Institute of Mathematics, Statistics
and Computer Science (AIMSCS), Hyderabad. He was the first Director of CR Rao AIMSCS. He was previously the
Director of the Indian Statistical Institute. He has held academic positions at the Indian Statistical Institute Kolkata, Ohio
State University, the Hungarian Academy of Sciences, the National University of Singapore, the University of Western
Australia and many more. His research interests encompass Graph Theory and Applications, Social, Biological and
Economic Networks, Ramsey Theory, and other areas of Discrete Mathematics as well as Design of Experiments. He
has published more than 70 research papers in journals of national and international repute, including the Journal of
Combinatorial Theory, Discrete Mathematics and Sankhya. Dr. S B Rao received his Ph.D. from the Indian Statistical Institute
Kolkata under the supervision of the legendary Dr. C R Rao. He has held various administrative positions, as well as
positions in various Government bodies.
Dr. V C V Rao
Centre for Development of Advanced Computing (C-DAC), Pune.
Title: An Overview of Cloud & Distributed Computing - Programming Paradigms.
Abstract: The author gives an overview of current trends in High Performance
Computing and High Throughput Computing from a distributed computing
perspective. Distributed computing technologies covering a wide
spectrum of message-passing clusters, massively parallel processors and Grid
Computing technologies are explained from a programming perspective. Most importantly,
the platform characteristics based on shared address space and message passing, with
emphasis on performance, scalability, energy efficiency and virtualization, are
summarized. Cloud Computing, which has evolved from cluster, grid and utility
computing, is explained. The author gives an overview of parallel and distributed computing paradigms such as
Message Passing Interface (MPI), Hadoop MapReduce and the Hadoop library from Apache, and of performance issues.
Computational results for application kernels such as image processing, large-scale matrix computations and
graph analytics based on graph partitioning algorithms are presented using the Big Data Hadoop MapReduce
framework.
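For readers unfamiliar with the MapReduce programming model mentioned in this abstract, the following in-memory sketch shows its three stages (map, shuffle, reduce) on a word count. It is an assumption-laden illustration in plain Python: it does not use the Hadoop API, and all function names and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)


# Hypothetical input; a real job would read splits of a distributed file.
lines = ["big data analytics", "big data big insights"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
```

In a real Hadoop job the map and reduce functions run on different machines and the shuffle moves data across the network; here everything runs in one process, so the sketch conveys only the structure of the computation.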
Speaker Biosketch: Dr. VCV Rao is an Associate Director and Head of the High-Performance Computing - Frontier
Technologies & Exploration (HPC-FTE) Group, at C-DAC, Pune, India. He specializes in the implementation of
parallel algorithms on emerging parallel processing platforms (clusters of multi-core processors with accelerator
devices: GPUs & CPUs). His group works on the performance of application and system benchmarks and on the
implementation of distributed computing algorithms, focusing on heterogeneous computing environments such as
distributed computing systems with coprocessors, accelerators, power-aware computing, and out-of-core
algorithms for large data processing.
Dr. VCV Rao has contributed to the design, development and deployment of C-DAC's PARAM Series of Supercomputers from
the year 1994 onwards. He also plays an active role in proliferating parallel processing technology through
workshops in India and has contributed to the PARAM series at premier institutes in India. Dr. VCV Rao has been associated with
C-DAC since 1993. He received his Ph.D. degree in Mathematics in 1993 from IIT-Kanpur. He was a visiting
faculty member at the Dept. of Computer Science, University of Minnesota (UoM), Minneapolis, and a Post-Doctoral fellow at the
Army High Performance Computing Research Center (AHPCRC), UoM, during the year 1997-98.
Dr. V C V Rao, Mr. Rahul Ravidas Naik, Mr. Swapnajit Rout, Mr. Vikalp Handa
High-Performance Computing - Frontier Technologies & Exploration Group, Centre for Development of
Advanced Computing (C-DAC), Pune.
An Overview of the Laboratory Sessions: The laboratory sessions focus on understanding the Hadoop
MapReduce framework, and on writing and executing codes for numerical and non-numerical computations on the
IBM Multi-Core Processor system (IBM P755, 32 CPUs, 124 GB RAM, AIX operating system) of the Message Passing
Cluster of CMSD, UoH. The basic codes (accessible from the C-DAC website) cover the following:
(i)
(ii)
(iii)
(iv)