BiDA2014

A National Workshop On
Big Data Analytics
at CRRao AIMSCS and CMSD, UoH,
Hyderabad, India
on 22-24 August, 2014

Organized jointly by:


CR Rao Advanced Institute of Mathematics, Statistics and Computer Science (AIMSCS),
Centre for Modelling, Simulation and Design (CMSD), University of Hyderabad (UoH),
and
Computer Society of India (CSI)

Foreword

Message from Professor C R Rao

BiDA2014 Schedule

Inaugural address by H R Mohan

Invited Speakers (in alphabetical order)
S Chattopadhyay, C-DAC Bangalore
C Hota, BITS Pilani, Hyderabad
K Karlapalem, IIIT Hyderabad
P Manimaran, CRRao AIMSCS
B L S Prakasa Rao, CRRao AIMSCS
K Prasad, C-DAC Bangalore
A K Pujari, UoH, Hyderabad
S Pyne, CRRao AIMSCS
K S Rajan, IIIT Hyderabad
S B Rao, CRRao AIMSCS
V C V Rao, C-DAC Pune
Y Simmhan, IISc Bangalore
S K Udgata, CMSD, UoH, Hyderabad

Expert Panel Discussion

Technical Experts
V C V Rao, V Handa, R R Naik, S Rout, C-DAC Bangalore

Foreword
Welcome to the First National Workshop on Big Data Analytics (BiDA2014), 22-24 August, 2014, in Hyderabad.
Data and information deluge is a truly remarkable phenomenon in today's world. Today we no longer need to go out
seeking precious bits of data; it is, in fact, the staggering amounts of data that stare at us begging our attention,
whether or not we are suitably equipped to tackle the numerous challenges that they present.
Indeed with their ever-increasing volume, velocity and variety, large datasets generated from different sources from
all spheres of life continue to pose new challenges, and increasing demands, on computational analysis and data
management. The inherent complexities (e.g., data in motion, unstructured nature, security, veracity, ethical issues)
of Big Data have forced us to rethink how we can collect, store, combine and analyze it in an efficient, reliable and
cost-effective manner. The foundations and capabilities to understand and address these needs must be built up urgently.
Large amounts of data from all sectors, ranging from health to education, environment to agriculture, commerce to art
and humanities, security to governance, etc., now provide us, professionals and researchers alike, with an
unprecedented opportunity to make an informed difference: introduce new approaches to industry and policy-making,
take objective data-driven decisions, take advantage of inference and prediction, gain insights on risks and
protect health and wealth, create more useful infrastructure, offer enhanced and dynamic understanding of various
phenomena, and through all these myriad abilities, profoundly empower our lives and the world around us.
CR Rao AIMSCS embodies the foresight and vision of Professor C R Rao in uniting the analytical trinity of
Mathematics, Statistics and Computer Science under one roof. It is, therefore, both timely and appropriate for CR
Rao AIMSCS to organize BiDA2014, jointly with the Centre for Modelling, Simulation and Design (CMSD), University
of Hyderabad (UoH), and the Computer Society of India (CSI), which is celebrating its Golden Jubilee Year in 2014.
As the field of Big Data is taking shape around the world, in this workshop, we have decided to offer both knowledge
and skills that are relevant to Big Data. The participants will learn about different domains and applications, both
established as well as emerging, in which the task of dealing with large and complex datasets is getting increasingly
unavoidable, and in fact, highly sought after. Distinguished experts will speak on the current problems and technical
aspects in their areas of research, while a panel of experts from the government and the industry will share their
perspectives with the participants. I thank all of them for agreeing to come to speak at the workshop.
Further, we have also arranged for an experienced team of technical experts from the Centre for Development
of Advanced Computing (C-DAC) to provide the participants with multiple sessions of hands-on laboratory training to
demonstrate and work on practical problems in Big Data. We thank them for their valuable time and expertise.
We are pleased to note that the academia, the industry and the government are all represented in both our list of
speakers as well as our list of participants. Notably, the latter spans the length and breadth of the country, in the true
spirit of a National Workshop. We wish all of them a most enriching experience here.
I take this opportunity to thank all my colleagues and scholars, as well as the co-organizers and the Organizing
Committee-members, for their hard work in organizing the workshop. I also thank University of Hyderabad, C-DAC,
CSI Hyderabad Chapter and IIIT Hyderabad for their kind and enthusiastic co-operation. Finally, we express our
gratitude to all the sponsors, especially DST and MoS&PI, for providing generous support to the event.
Thank you and best wishes,
Saumyadipta Pyne.
Chair, BiDA2014,
CR Rao AIMSCS, Hyderabad, India.
15th August, 2014.

Message from Professor C R Rao

Statisticians are used to developing methodologies for analysis of data
collected for a specific purpose in a planned way. Sample surveys and design of experiments
are typical examples. Big data refers to massive amounts of very high dimensional and
unstructured data which are continuously produced and stored at a much lower cost than
they used to be. High dimensionality combined with large sample size creates issues such
as heavy computational cost and algorithmic instability. The massive samples in big data are
typically aggregated from multiple sources at different time points using different
technologies. This creates issues of heterogeneity, experimental variations, and statistical
biases and requires us to develop more adaptive and robust procedures.
I am glad that Mahalanobis Professor Pyne has taken the initiative to conduct short courses,
discussion meetings and conferences to evolve suitable methodologies to extract information
from large data. I wish the workshop a great success.

C.R. Rao, Sc.D., F.R.S.
Padma Vibhushan, India
Research Professor, University at Buffalo,
Williamsville, NY 14221, USA.
15th August, 2014.

BiDA2014 Workshop Schedule
Day 1: Fri. 22nd August, 2014
Venue: Auditorium, Ramanujan building, CR Rao Advanced Institute of Mathematics, Statistics and
Computer Science (AIMSCS), University of Hyderabad Campus, Gachibowli, Hyderabad 500046.
Registration starts at 8:30 AM in CR Rao AIMSCS.

Session 1
Chair: Dr. S Pyne
09:30 AM   Welcome address by Dr. Allam Apparao
09:40      Inaugural address by Shri H R Mohan (Chief Guest): BIG DATA: Opportunities Ahead.
10:10      Introductory Remarks by Dr. S Pyne
10:15      Tea Break

Session 2
Chair: Dr. B L S Prakasa Rao
10:45      Dr. V C V Rao: An Overview of Cloud and Distributed Computing Programming Paradigms.
11:30      Dr. Subrata Chattopadhyay: The Role and Challenges of e-Infrastructure for Supporting Big Science Discoveries.
12:15 PM   Dr. Chittaranjan Hota: Security and Privacy Concerns in Campus Wide Networks: Easing Out Using Big Data Analytics.
01:00      Lunch

Session 3
Chair: Dr. S B Rao
02:00      Dr. B L S Prakasa Rao: Big Data and High-Dimensional Data Analysis.
02:45      Dr. Kamal Karlapalem: Towards Visualizing Clusters and Classes for Real Valued High Dimensional Data Sets.
03:30      Dr. S K Udgata: Large Sensor Network and IoT Data Management with Big Data: The Research Challenges.
04:15      Tea Break

Session 4
Chair: Dr. S K Udgata
04:45      Dr. K S Rajan: Challenges in Managing Spatio-Temporal Big Data - Sharing Current Research Initiatives.
05:30      Expert Panel Discussion (Big Data: New Challenges and Directions)
           Panelists: Mr. K Mohan Raidu (Chair), Mr. T. Krishna Kumar

Day 1 ends at 06:30 PM.

6:30-7:30 PM  Exhibition and tour of Professor C.R. Rao Gallery (CR Rao AIMSCS First Floor).

Day 2: Sat. 23rd August, 2014
Venue (morning session): Auditorium, Ramanujan Building, CR Rao AIMSCS.
Venue (afternoon session): Centre for Modelling, Simulation and Design (CMSD), University of Hyderabad (UoH).

Session 1
Chair: Dr. S Pyne
09:00 AM   Dr. Yogesh Simmhan: Fast Data Analytics for the Internet of Things.
09:45      Dr. A K Pujari: Data Mining Trends in the Big Data Era.
10:30      Tea Break

Session 2
Chair: Dr. A K Pujari
11:00      Dr. S Pyne: Multivariate Stream Data Analytics with Applications to Health Sciences and Technology.
11:45      Dr. S B Rao: Privacy Preservation in Graphs and Social Networks.
12:30 PM   Dr. P Manimaran: Graph Mining Applications to Social Network Analysis.
01:15      Lunch

Session 3 (CMSD: Please note change in venue)
Chair: Dr. V C V Rao
02:15      Lab Session 1: Dr. V C V Rao & Team
04:00      Tea Break

Session 4
Chair: Dr. Yogesh Simmhan
04:30      Lab Session 2: Dr. V C V Rao & Team

Day 2 ends at 06:00 PM.

Day 3: Sun. 24th August, 2014
Venue: Centre for Modelling, Simulation and Design (CMSD), University of Hyderabad (UoH).

Session 1
Chair: Dr. P Manimaran
09:00 AM   Lab Session 3: Dr. V C V Rao & Team
10:30      Tea Break

Session 2
Chair: Dr. S Pyne
11:00      Ms. Karuna Prasad: Processing Large Datasets Using Hadoop.
01:00 PM   Conclusion of Workshop
01:15      Lunch

Day 3 (and workshop) ends at 02:15 PM.


Shri HR Mohan
President, Computer Society of India.
Chairman, IEEE Computer Society & IEEE Professional Communication Society
Vice Chairman, IEEE Communications Society
Interface to Technical Societies at ACM Chennai
Former Associate Vice President (Systems), The Hindu, Chennai
Title: BIG DATA: Opportunities Ahead.

Speaker Biosketch: Mr. HR Mohan is a graduate in Engineering from IIT Madras. He is currently a consultant in the
Information and Communications Technology area and ICT Education. He has rich experience in the publishing
industry. He served at India's national newspaper, The Hindu, as Associate Vice President (Systems) until
recently. At The Hindu, Mr. Mohan looked after the Corporate MIS activities. He was instrumental in the Internet
publishing of The Hindu, the first newspaper from India to go online. Subsequently, Business Line, Frontline,
Sportstar & other group publications including all supplements were made online with his efforts. At The Hindu, his
other initiatives included Library Automation, Indexing, Digital Archives of 135 years of The Hindu, Information
Services, Content Syndication and compilation and publishing of special thematic publications.
Mr. Mohan is associated with a number of professional bodies in the areas of Information & Communication Technology,
Library, Information Sciences, Management, Technical Communication, Media and Industry bodies such as FICCI,
Hindustan Chamber of Commerce, CII, AIMA & MMA. He has assisted in organizing over 750 technical meetings /
seminars / workshops / conferences during the last 30 plus years of his association with the professional societies.
Mr. Mohan, currently the President of the Computer Society of India, is a Fellow of CSI and has served in the
executive council of CSI at Chennai chapter and at the national level for over two decades in various capacities.
Further, he has served as the Chairman of Conferences Committee, Intersociety Relations Committee, Publications
Committee and Committee on Special Interest Groups. He is the Convener of CSI Chennai CIO Forum, and a
Member of CSI Digital Library Committee. He represents India in the forums such as ICANN & SEARCC.
Mr. Mohan is closely associated with the international associations such as IEEE & ACM as Senior Member. He
serves as the Chairman of the IEEE Computer Society, Madras Chapter & IEEE Professional Communication
Society, Madras Chapter. He serves in the executive committees of IEEE Communications Society (as Vice
Chairman) and IEEE Technology Management Council Madras Chapter (as Treasurer) and ACM Madras Chapter as
an interface to technical societies. He has served as the Vice Chairman of the IEEE Madras Section for the term
2012-2013. He is a Trustee of Ranganathan Centre for Information Studies. He is a Director at Internet Society India
Chennai Chapter. He is also associated with a number of educational institutions as a member in their academic and
governing councils.

Mr. Mohan has rich experience in editing, publishing and content management. He currently edits the monthly CSI
eNewsletter, which reaches over 100,000 members of CSI and members of a few other ICT-related eGroups with a
readership of over 20,000. He was the editor of IEEE India Info, the newsletter of the IEEE India Council (for
2013), which reaches about 55,000 members across the country. He also edited IEEE MAS LINK, the monthly
eNewsletter of the IEEE Madras Section, which reaches about 12,500 members, for over seven years until Dec 2013;
INFOLINE, the newsletter of CSI Madras Chapter, for about 20 years; and CSI Digest, a quarterly monograph of
CSI, for about three years.
Mr. Mohan manages a number of eGroups relating to his professional activities and interest and assists non-profit
organizations and institutions in managing their websites. He is a regular contributor for the ICT Happenings column
in CSI Communications and ICT Quiz columns in the newsletters and conducts ICT Quiz programmes. Mr. Mohan
delivers guest lectures and presentations at various institutions and forums in the areas of his interest such as
Information and Communication Technology, Open Source Software, Software Engineering, Knowledge
Management, Internet, Web & ePublishing, eLearning, Web Marketing & Electronic Commerce, Cloud Computing,
Big Data & Analytics, eGovernance, Information & Cyber Security, Digital Libraries & Archives, Library &
Information/Content Management and Services, Employability & Soft Skills, IT Education and related areas. Mr.
Mohan believes in information sharing and makes himself available for such initiatives.


Dr. Subrata Chattopadhyay


Centre for Development of Advanced Computing (C-DAC), Bangalore.
Title: The role and challenges of e-Infrastructure for supporting Big Science discoveries.
Abstract: The basic steps of scientific discoveries are to conduct experiments which
normally generate huge data that need to be checked, shared and analyzed or
visualized. Finally these findings are published and announced to the communities.
Some of the well-known discoveries in the field of high energy physics and life science
will be presented to reflect on the basic steps and challenges of huge data handling and
processing addressed for these discoveries.
The role of data handling and analysis is a common challenge and is becoming more and more complex, as indicated
by some of the use cases presented above and some of the collaborative global experiments already planned. In
that context, understanding Big Data and innovating to address the future challenges are becoming more
critical. C-DAC has considerable experience in developing and managing e-Infrastructures that include High
Performance Computing, Grid and Cloud technologies. The present status and future roadmap of these
technologies being planned to address these challenges will be elaborated in this talk.
Speaker Biosketch: Dr. Subrata Chattopadhyay is currently Associate Director at C-DAC, Bangalore. He is also the
Chief Investigator of Garuda, the national grid computing initiative of India. Previously he was involved in setting up
the PARAM Padma, the first Indian supercomputing facility listed from India. He was also involved in setting up the
nationwide high speed communication fabric of GARUDA and deploying grid middleware across various platforms of
supercomputers. From C-DAC, he was the technical manager for the EUIndia Grid project that interconnected the Indian
grid project Garuda with the European grid initiative EGI. He is also leading another international project funded
by the European Commission entitled Co-ordination and Harmonization of Advanced E-Infrastructure - Research and
Education Data Sharing (CHAIN-REDS).
He received his Bachelor's in Engineering (BE) degree from NIT, Durgapur, Master's (M.Tech) from IIT, Kanpur and
Doctorate (Ph.D.) from University of British Columbia, Vancouver, Canada. He brings more than 27 years of
experience from both the IT industry and research organizations. His major areas of interest include high performance
computing, grid/cloud computing and process modeling and simulations.


Dr. Chittaranjan Hota


Department of Computer Science, Birla Institute of Technology and Science
(BITS) Pilani, Hyderabad Campus.
Title: Security and Privacy Concerns in Campus Wide Networks: Easing Out Using Big
Data Analytics.
Abstract: With the proliferation of P2P systems, it is critical to consider the impact of
these systems on the security of an Internet environment that is already struggling with
several security issues. Many developing and developed countries have less stringent
regulations on P2P application usage. Currently, the P2P traffic control is achieved by
either throttling the P2P bandwidth or allowing P2P traffic at certain times. Recent
empirical studies indicate that P2P and Web traffic together dominate today's Internet
traffic. Several open source and proprietary products detect and alert policy violations for usage of P2P applications
using techniques like port-based analysis, and protocol analysis. In this talk, we will assess the impact of P2P traffic
on perimeter security appliances, and develop intelligent approaches using machine learning techniques to counter
their impact on campus wide networks. We will discuss the usage of Hadoop, Hive and Mahout to scale the data
analytics framework that can capture Gigabytes of network traffic and try to figure out if there is any existence of
anomalous P2P traffic like Botnet, or Malware within the corporate network traffic.
Speaker Biosketch: Dr. Chittaranjan Hota did his Ph.D. in Computer Science and Engineering from Birla Institute of
Technology & Science, Pilani. He was the founding Head of the Computer Science Department at BITS Hyderabad,
and currently he is Associate Dean, Admissions. He has been teaching and researching in the Computer Science and
Engineering area at BITS-Pilani for the past 15 years, and overall for the past twenty-five years. He has been a visiting
researcher and visiting professor at University of New South Wales, Sydney; University of Cagliari, Italy; Aalto
University, Finland; City University, London. He has research funding from UGC, New Delhi; DIT, New Delhi; and
TCS, India. He has guided three Ph.D. students and is currently guiding five Ph.D. students in the areas of P2P
Overlays, Information Security, Wireless Networks, and Distributed Computing Systems. He is the recipient of an
Australian Vice Chancellors Committee award, an Erasmus Mundus fellowship from the European Commission,
and a Certificate of Excellence for Faculty Excellence Award from BITS Pilani. He has published
extensively in peer-reviewed journals and conferences. He has also edited LNCS volumes. He is a member of IEEE,
ACM, IE, and ISTE.


Dr. Kamalakar Karlapalem


Centre for Data Engineering, International Institute of Information Technology
(IIIT), Hyderabad.
Title: Towards Visualizing Clusters and Classes for Real Valued High Dimensional Data
Sets.
Abstract: High dimensional real valued data sets are the most difficult to process and
mine. Data mining such data sets is usually done by techniques such as clustering and
classification. A major challenge is to conceptualize the results of the clustering and
classification algorithms. Data visualization helps to comprehend and to get a deeper
insight into the data and the data mining results. The problem is to come up with
techniques and tools to visualize these results.
We have built four tools to visualize and comprehend high dimensional real valued data. Heidi helps in visualizing the
subspace overlap among the clusters. Beads helps in visualizing spatial spread, size and shape of the clusters.
PEARLS is a visual tool kit to simultaneously query and explore data sets using concepts behind Beads. CROVHD
helps to visualize the spread of data across dimensions, and to show closeness and separation of classes. In this talk, I
shall present the background for data visualization, technical insights behind the above tools, and list some open
problems.
Speaker Biosketch: Dr. Kamalakar Karlapalem is a faculty member at International Institute of Information
Technology, Hyderabad, and heads the Centre for Data Engineering. His research spans the areas of database
visualization, data analytics, workflow management systems, multi-agent systems, and data systems. He and his
students have won awards in academic competitions such as RoboCup, VAST, and TAC. He has graduated eight
Ph.D. and thirty-three Masters by Research students.
He is an alumnus of Indian Statistical Institute (M.Stat.), IIT, Kharagpur (M.Tech.) and Georgia Tech (Ph.D.) and was a
faculty member in the computer science department at HKUST (1992-2000), before joining IIIT, Hyderabad.


Dr. P Manimaran
CR Rao Advanced Institute of Mathematics, Statistics and Computer Science
(AIMSCS), Hyderabad.
Title: Graph Mining Applications to Social Network Analysis.
Abstract: Nowadays, with the growth of social media, any individual in this world can easily
connect to another in cyberspace. With this information, a social network can be
constructed considering the individuals as nodes and their interactions as edges between
them. The most challenging task is to mine the patterns in such social networks. In this
talk, we will discuss graph mining applications using centrality analysis and community
detection to extract information from social networks.
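
As a rough illustration of the construction described in the abstract (individuals as nodes, interactions as edges), the following minimal Python sketch computes degree centrality and groups nodes into connected components. It is not the speaker's method: the names and interaction pairs are invented for illustration, degree centrality is only one of several centrality measures, and connected components are used here as the crudest possible stand-in for community detection.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def connected_components(graph):
    """Group nodes that can reach each other (a crude proxy for communities)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical interaction data, invented for illustration only.
interactions = [("asha", "ravi"), ("asha", "meena"), ("ravi", "meena"),
                ("kiran", "latha")]
graph = build_graph(interactions)
centrality = degree_centrality(graph)
communities = connected_components(graph)
```

Real community-detection algorithms look for densely connected groups inside a single connected network, which this sketch does not attempt.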
Speaker Biosketch: Dr. Manimaran is an Assistant Professor at CR Rao Advanced Institute of Mathematics,
Statistics and Computer Science, Hyderabad. Previously, he did his post-doctoral research at the Centre for
DNA Fingerprinting and Diagnostics, Hyderabad. Dr. Manimaran received his Ph.D. in Physics from University of
Hyderabad. His main research interests are computational Physics, time series analysis, complex networks and
wavelet transform. He has published research papers in peer-reviewed journals and conferences. He has guided
ten M.Tech. students, and is guiding a Ph.D. student.


Dr. B L S Prakasa Rao


CR Rao Advanced Institute of Mathematics, Statistics and Computer Science
(AIMSCS), Hyderabad.
Title: Big Data and High Dimensional Data Analysis.
Abstract: Over the last ten to fifteen years, more and more corporations have been adopting a
data-driven approach to offer targeted services, reduce risks and improve performance.
They are implementing specialized data analytic programs to collect, store, manage and
analyze large data sets or what is now called BIG DATA. Such data sets are
characterized by massive sample size and high-dimensionality. Traditional statistical
methods are inappropriate to tackle such problems. There are many types of events
where there are a potentially large number of parameters/covariates but relatively few
instances of the event. This type of data is termed as HIGH-DIMENSIONAL DATA. We will discuss some issues
arising in analysis of BIG DATA and HIGH-DIMENSIONAL DATA.
Speaker Biosketch: Dr. B.L.S. Prakasa Rao holds the prestigious Ramanujan Chair Professorship at C R Rao
Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad. He was previously the Director of
Indian Statistical Institute, Kolkata. He has held academic positions at Indian Statistical Institute, Kolkata and New
Delhi, University of Iowa, University of Wisconsin, University of California, Davis, Purdue University, University of
Illinois, University of California, Berkeley, Universite de Montreal, Indian Institute of Technology, Kanpur, University of
Hyderabad and many others. He did his M.A. from Andhra University, M. Stat. at Indian Statistical Institute, Kolkata,
and Ph.D. at Michigan State University. His research interests span Limit Theorems, Stochastic inequalities,
Characterization of Distributions, Stochastic Processes, Inference for Stochastic Processes, Nonparametric
Functional Estimation and Asymptotic Theory of Statistical Inference. He is the Editor-in-Chief of Sankhya Series A
and Sankhya Series B, and member of Editorial boards of many journals of international repute. He has published
over 220 research papers in journals of international repute. He has also written 13 books as well as many expository
articles. He is a Fellow of all the Science Academies of the country, Fellow of Institute of Mathematical Statistics,
USA and an elected member of the International Statistical Institute and many reputed professional societies. Dr.
Prakasa Rao is the recipient of the prestigious S S Bhatnagar award and P V Sukhatme prize.


Ms. Karuna Prasad


Centre for Development of Advanced Computing (C-DAC), Bangalore.
Title: Processing Large Datasets Using Hadoop.
Abstract: The talk will give an overview of the Hadoop Distributed File System
(HDFS), its architecture and features. We will discuss MapReduce, the framework
for processing data, and components of the Hadoop ecosystem. Further, the need for
higher-level tools like Pig and Hive on a Hadoop cluster, the architecture of Pig
and Hive, and how their data models differ will be discussed.
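
For readers new to MapReduce, the following minimal sketch imitates its three stages (map, shuffle, reduce) in a single Python process using a word count. It does not use Hadoop or any Hadoop API; the function names and the input lines are assumptions made up for illustration, and a real job would distribute each stage across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input, invented for illustration only.
lines = ["big data needs big tools", "data in motion"]
counts = run_job(lines)
```

The mapper and reducer see only one record or one key at a time, which is what lets a framework run many copies of them in parallel on different parts of the data.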
Speaker Biosketch: Ms. Karuna Prasad is working as Senior Technical Officer at C-DAC, Bangalore. She
received an MCA degree from Nagpur University and an MS in Software Systems from BITS, Pilani. Her
experience in Distributed Computing areas includes grid and cloud computing. She has publications in
national and international conferences.


Dr. Arun K Pujari


School of Computer and Information Sciences, University of Hyderabad,
Hyderabad.
Title: Data Mining Trends in the Big Data Era.
Speaker Biosketch: Dr. A K Pujari is Professor of Computer Science at the
University of Hyderabad, Hyderabad. He is currently Dean, School of Computer &
Information Sciences. Prior to joining UoH, he served at Automated Cartography
Cell, Survey of India, and Jawaharlal Nehru University, New Delhi. He received his
Ph.D. from the Indian Institute of Technology, Kanpur and M.Sc. from Sambalpur
University, Sambalpur. From 2008 to 2011, he was the Vice Chancellor of Sambalpur
University, Sambalpur, Orissa.
He has 32 years of experience teaching post-graduate classes in several national institutions. During this
period he has taught, designed and developed several new courses. The important ones are Data Bases,
Theory of Computation, Computer Vision, Computational Geometry, Algorithms, Artificial Intelligence,
Computer Based Optimization Techniques, Knowledge Representation & Reasoning, Machine Learning,
Data Mining & Data Warehousing, Neural Networks, GIS and Bioinformatics.
His research interests include AI, GIS, Combinatorial Algorithms, Data Mining.


Dr. Saumyadipta Pyne


CR Rao Advanced Institute of Mathematics, Statistics and Computer Science,
Hyderabad.
Title: Multivariate Stream Data Analytics with Applications to Health Sciences and
Technology.
Abstract: Analytics for large volume, high velocity data streams presents serious
research challenges. In the recent years, many methodological developments have
been made to address a variety of problems in stream data analytics. In this talk,
we'll briefly review the field, and then look at different methods and algorithms for
tackling the key problem of anomaly detection in multivariate stream data as applied
to, in particular, disease outbreak analysis.
Speaker Biosketch: Dr. Saumyadipta Pyne holds the prestigious PC Mahalanobis
Chair and is Professor and Head of Bioinformatics in CR Rao Advanced Institute of
Mathematics, Statistics and Computer Science in Hyderabad. He is also Adjunct Professor in Public Health
Foundation of India, and Remote Research Associate of Broad Institute of MIT and Harvard University. Dr. Pyne is a
Ramalingaswami Fellow of Department of Biotechnology, Government of India, and a former research fellow of the
Indian Statistical Institute. Formerly he worked in Dana-Farber Cancer Institute of Harvard Medical School in Boston.
He received his doctorate from the State University of New York at Stony Brook in USA working simultaneously in the
Departments of Computer Science, Molecular Genetics and Microbiology. He conducted his postdoctoral research in
Broad Institute of MIT and Harvard University in Cambridge. Dr. Pyne conducted pioneering research in the field of
computational modeling of single cell level high-resolution, high-dimensional data. Dr. Pyne's research interests
include Big Data in Life Sciences and Health Informatics, Computational Statistics and High-dimensional Data
Modeling. He has published extensively in top international journals.
Dr. Pyne is actively engaged in promoting Big Data research and training activities in both India and abroad. He is
the Workshop Co-Chair of IEEE International Conference on Big Data 2014, to be held in Washington DC in October
2014. He is currently the only Member of the Program Committee from India, as also for the First IEEE International
Conference on Big Data 2013 held in Santa Clara, Silicon Valley, in 2013, as well as the ACM International
Workshop on Big Data in Life Sciences, to be held in September 2014. Dr. Pyne currently teaches a course on Big
Data and High-Dimensional Data Analysis to the final year Integrated Masters students in the School of Mathematics
and Statistics of the University of Hyderabad. He is also co-editing a book on the subject.


Dr. K S Rajan
Lab for Spatial Informatics, International Institute of Information Technology (IIIT),
Hyderabad.
Title: Challenges in Managing Spatio-Temporal Big Data - Sharing Current Research
Initiatives.
Abstract: With the increase in the range of location-based data collection models and
devices and the frequency of such data collection, current computer systems are not
only challenged to store and manage these data but are also faced with the need to adapt and
develop various algorithms to help handle such data. Some examples of these
include vehicle tracking systems for traffic flow analysis, mobile-based location-based
service requests and public contributed data, and mobile phone, GPS and other sensor
trajectories. In many of these cases, discovering science-related questions is still quite an effort.
Spatio-temporal big data has been looked upon to provide clues to some of these complex questions, with efforts ranging
from spatio-temporal hypothesis generation to attempts at answering some of these complex questions.
In this talk, we will cover some of the ongoing efforts in our research group, with a focus on our current research work
in using spatio-temporal data for traffic pattern flow understanding in a city-wide road network and in knowledge
discovery in the field of epidemiology. The latter has led to the development of MiSTIC, a spatio-temporal data mining
algorithm with applications in crime and climatic studies too.
Speaker Biosketch: Dr. KS Rajan is an Associate Professor at IIIT, Hyderabad, one of India's top ranked research
institutes, and leads the institute's Lab for Spatial Informatics (LSI). Recently, IIIT Hyderabad was ranked 3rd among
India's Top Technical Institutions in Dataquest-IDC's Survey 2012.
Dr. Rajan is a multi-disciplinarian, with major interests in Geo-Spatial Technologies - GIS and Remote Sensing; Land
use modelling and Environmental Policy. He has taken a key interest in the gap-areas between computer science
and geospatial technologies and through his research works has helped focus on bridging this gap, be it in developing
spatio-temporal data mining algorithms, Web-based Geospatial technologies, or new algorithms to help convert
satellite imagery to useful satellite-based thematic products. He is also an active proponent of Open Source in India
and for Geospatial technologies in particular. His Lab has recently released two Open Source tools, LSIViewer and
VRGeo.in. In environmental modelling, his work includes the integration through analysis of the multi-disciplinary
fields of science and engineering and agent-based modeling of the Human-Land-Water-Energy linkages with
ecosystem-wide understanding and their interactions for impact studies and national and global level policy
initiatives.
Dr. Rajan has handled more than 20 projects (from both Government and Industry), has over 100 publications in
Books, Journals and Conferences, and has given more than 60 invited talks in a range of domains. He has been active in
curricular, academic and research matters of other Universities and Institutes, and in International and National programs.
Recently, Dr. Rajan was awarded the Indian National Geospatial Award 2013 of the Indian Society of Remote
Sensing.


Dr. S B Rao
CR Rao Advanced Institute of Mathematics, Statistics and Computer Science
(AIMSCS), Hyderabad.
Title: Privacy Preservation in Graphs and Social Networks.
Abstract: The recent proliferation of graph and network data in various application
domains has raised privacy-preservation concerns for the individuals and organizations
involved. Recent studies show that simply removing the identity of the persons/nodes
before publishing the graph or social network data does not guarantee the privacy of the
individuals. The structure of the graph/network and its basic parameters, such as the
degrees of the nodes, their immediate neighbors, similarities, and centrality measures of
nodes such as eccentricity and betweenness centrality, can reveal the identity of the
individuals. To address these issues, we survey specific graph anonymity problems based on degree,
neighborhood and automorphisms, and some of the known solutions. We also discuss recent work in this regard on
other parameters and centrality measures of nodes, such as eccentricity, betweenness centrality, centroid, stress
and PageRank, and on combinations of these.
Let P be a parameter set of a node that can be calculated easily (in polynomial time) for each node of a graph,
such as the degree, the neighbors, the eccentricity, the betweenness centrality or other centrality measures. We call a
graph Pk-anonymous if, for every node v, there are at least k-1 other nodes in the graph with the same values of the
parameter set P. This definition of anonymity prevents adversaries from identifying an individual or organization
with probability more than 1/k, based on prior knowledge of the parameter set P for v, the
node/individual. We formally define the graph Pk-anonymity problem as follows: given a graph G and a parameter
set P, find a Pk-anonymous graph obtained from G with the minimum number of the specified graph modifications
(addition/deletion of edges, etc.). The algorithms for finding a Pk-anonymous graph for a given graph G
and parameter set P will be based on the principle of realizability of the sequences of these parameter values for the
nodes of a new graph that is Pk-anonymous.
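The Pk-anonymity definition above can be illustrated with a minimal sketch, assuming the simplest parameter set P, namely the degree of each node. The function names and the two toy graphs are hypothetical and are not taken from the talk; the sketch only checks the definition and does not attempt the minimum-modification problem.

```python
from collections import Counter


def degree_parameter(adjacency):
    """Return the value of P for every node; here P is just the degree (an assumption)."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def anonymity_level(parameter_values):
    """Largest k for which the graph is Pk-anonymous under the given parameter values."""
    counts = Counter(parameter_values.values())
    return min(counts.values()) if counts else 0


def is_pk_anonymous(parameter_values, k):
    """True if every node shares its parameter value with at least k-1 other nodes."""
    return anonymity_level(parameter_values) >= k


# Hypothetical toy graphs, given as undirected adjacency sets.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

path_ok = is_pk_anonymous(degree_parameter(path), 2)  # degrees 1, 2, 2, 1
star_ok = is_pk_anonymous(degree_parameter(star), 2)  # the hub has a unique degree
```

In the path graph every degree value occurs twice, so an adversary who knows only a node's degree cannot single it out with probability above 1/2; in the star graph the hub is uniquely identified by its degree.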
Speaker Biosketch: Dr. S B Rao is an Emeritus Professor at CR Rao Advanced Institute of Mathematics, Statistics
and Computer Science (AIMSCS), Hyderabad. He was the first Director of CR Rao AIMSCS. He was previously the
Director of the Indian Statistical Institute. He has held academic positions at the Indian Statistical Institute Kolkata, Ohio
State University, the Hungarian Academy of Sciences, the National University of Singapore, the University of Western
Australia and many more. His research interests encompass Graph Theory and Applications, Social, Biological and
Economic Networks, Ramsey Theory, and other areas of Discrete Mathematics as well as Design of Experiments. He
has published more than 70 research papers in journals of national and international repute, including the Journal of
Combinatorial Theory, Discrete Mathematics and Sankhya. Dr. S B Rao received his Ph.D. from the Indian Statistical Institute
Kolkata under the supervision of the legendary Dr. C R Rao. He has held various administrative positions and has
served in various Government bodies.


Dr. V C V Rao
Centre for Development of Advanced Computing (C-DAC), Pune.
Title: An Overview of Cloud & Distributed Computing - Programming Paradigms.
Abstract: The author gives an overview of current trends in High Performance
Computing and High Throughput Computing from a distributed computing
perspective. A summary of distributed computing technologies, covering a wide
spectrum of Message Passing Clusters, Massively Parallel Processors and Grid
Computing technologies, is given from a programming perspective. Most importantly,
the platform characteristics based on shared address space and message passing, with
emphasis on performance, scalability, energy efficiency and virtualization, are
summarized. An overview of Cloud Computing, which has evolved from cluster, grid and utility
computing, is given. The author also gives an overview of parallel and distributed computing paradigms such as
Message Passing Interface (MPI), Hadoop MapReduce and the Hadoop library from Apache, and of performance issues.
Computational results for application kernels such as image processing, large scale matrix computations and
graph analytics based on graph partitioning algorithms are presented, based on the Big Data Hadoop MapReduce
framework.
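As a rough illustration of the MapReduce paradigm mentioned above, the following minimal sketch simulates the map, shuffle and reduce phases in a single Python process for a sparse matrix-vector product. It does not use any Hadoop API and is not the speaker's code; the function names and the small matrix are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(entries, vector):
    """Map: for each nonzero entry (row, col, value), emit (row, value * vector[col])."""
    for row, col, value in entries:
        yield row, value * vector[col]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products that share a row index."""
    return {row: sum(values) for row, values in groups.items()}


# Hypothetical input: the matrix [[1, 2], [3, 4]] as (row, col, value) triples.
matrix_entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
vector = [5, 6]

result = reduce_phase(shuffle(map_phase(matrix_entries, vector)))
# result maps each row index to one component of the product
```

Because each map call looks at a single matrix entry and each reduce call at a single row, both phases can be distributed across machines; in this sketch only the vector is assumed to be small enough to be available to every mapper.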
Speaker Biosketch: Dr. V C V Rao is an Associate Director and Head of the High-Performance Computing - Frontier
Technologies & Exploration (HPC-FTE) Group at C-DAC, Pune, India. He specializes in the implementation of
parallel algorithms on emerging parallel processing platforms (clusters of multi-core processors with accelerator
devices: GPUs & CPUs). His group works on the performance of application and system benchmarks and on the
implementation of distributed computing algorithms, focusing on heterogeneous computing environments such as
distributed computing systems with coprocessors and accelerators, power-aware computing, and out-of-core
algorithms for large data processing.
Dr. V C V Rao has contributed to the design, development and deployment of C-DAC's PARAM Series of Supercomputers from
the year 1994 onwards. He also plays an active role in proliferating parallel processing technology through
workshops in India, and has contributed to the PARAM series at premier institutes in India. Dr. V C V Rao has been associated with
C-DAC since 1993. He received his Ph.D. degree in Mathematics in 1993 from IIT-Kanpur. He was a visiting
faculty member at the Dept. of Computer Science, University of Minnesota (UoM), Minneapolis, and a Post-Doctoral fellow at the
Army High Performance Computing Research Center (AHPCRC), UoM, during the year 1997-98.


Dr. Yogesh Simmhan
Supercomputer Education and Research Centre (SERC), Indian Institute of Science
(IISc), Bangalore.
Title: Fast Data Analytics for the Internet of Things.
Abstract: The pervasive spread of sensing, actuation and communication
technology is helping realize the hardware aspects of the Internet of Things (IoT).
However, capitalizing on the true potential of IoT for achieving societal impact in a
developing country like India requires both affordable IoT solutions as well as
meaningful analytics on top of the data collected. Data from IoT is often streaming in
nature, distributed in its sources, and intermittent. Real-time data aggregation and
analytics require research into stream processing systems that adapt to system
behavior, and complex event processing across edge devices and the Cloud. This
talk will discuss an IoT Architecture for India based on ongoing projects at IISc, and lay emphasis on managing the
velocity dimension of Big Data for intelligent actions.
Speaker Biosketch: Dr. Yogesh Simmhan is an Assistant Professor at the SERC Department at IISc. Previously, he
was a Research Assistant Professor in the Electrical Engineering Department (Computer Engineering) at the
University of Southern California, Los Angeles and Associate Director of the USC Centre for Energy Informatics. His
research explores abstractions, algorithms and applications on distributed data and computing systems. These span
Cloud Computing, Scalable and Distributed Computing, distributed data and metadata management, and software
architectures for large scale applications in eScience and eEngineering. His research advances fundamental
knowledge, and offers a practitioner's insight, on building scalable, resilient systems to empower dynamic, distributed
and Big Data applications. Yogesh has a Ph.D. in Computer Science from Indiana University and was earlier a
Postdoc at Microsoft Research, San Francisco. He is a Senior Member of IEEE and ACM.


Dr. Siba K Udgata
Centre for Modelling, Simulation and Design (CMSD), University of Hyderabad,
Hyderabad.
Title: Large Sensor Network and IoT Data Management with Big Data: The Research
Challenges.
Abstract: The ongoing convergence of evolution of devices (Internet of Things),
accumulation of Big Data (large scale sensors) and deployment of large shared
infrastructures (computing clouds) has created exciting new research challenges. The
Internet of Things (IoT) has generated a large amount of research interest across a wide
variety of technical areas. These include the physical devices themselves,
communications among them, and relationships between them. One of the effects of
ubiquitous sensors networked together into large ecosystems has been an enormous flow of data supporting a wide
variety of applications. Technical and management challenges abound in this area, including sensor network
management and data management, analysis, and visualization. New research, tools, and applications in the field of
big data have been exploding as researchers find new ways of addressing the challenges posed by the volume,
velocity and variety of data. The convergence of IoT and big data creates new opportunities for interesting and high
impact research. Many sensor network data flows exhibit high velocity, distributed streams of heterogeneous data,
often from mobile sources, and of varying quality. We will discuss some applications and challenges of
large scale sensor networks with IoT using Big Data and data analytics.
Speaker Biosketch: Dr. Siba K Udgata is a Professor in the School of Computer and Information Sciences,
University of Hyderabad, India. He is also presently the Director of the Centre for Modelling, Simulation and Design
(CMSD). He has a Ph.D. in Computer Science in the area of mobile computing and wireless communication. His
main research interests are Wireless Communication, Mobile Computing and Wireless Sensor Networks. He has
twenty years of experience teaching Master's students of Computer Science and guiding research
students. So far he has supervised more than 50 Master's theses and four Ph.D. theses.
He was a United Nations Fellow and worked at the United Nations University/International Institute for Software
Technology (UNU/IIST), Macau as a research fellow in the year 2001. He was a visiting fellow at Ball State University,
USA. He was also a visiting Professor at the Asian Institute of Technology, Bangkok; Mahasarakham University, Thailand;
and Tribhuvan University, Kathmandu, Nepal.
His research focus is on intelligent algorithms for wireless communication and related domains, mobile computing,
and sensor network algorithms and applications. He has worked as principal investigator in many Government of India
funded research projects, mainly on the development of wireless sensor network applications and the application of swarm
intelligence techniques in the cognitive radio network domain. Presently, he is leading a multi-institutional project funded
by the Information Technology Research Academy (ITRA), Department of Electronics and Information Technology
(DeITy), Govt. of India.


Expert Panel Discussion


Theme: Big Data: New Challenges and Directions.

Chair and Panelist: Mr. K Mohan Raidu, Director, Informatics India


Mr. Raidu founded Informatics India and presently heads it as its Director. He is
an Enterprise Solutions expert in the verticals of the Sugar, Cement and Banking industries.
He has 35 years of experience in Software Development, starting from 1979. As there
was no formal education available in many Computer subjects in those days, he
acquired his skills on the job. While in service, he took short term courses from
RR Labs-Hyderabad, CMC Hyderabad, IBM Bangalore, and the University of Genova, Italy.
Mr. Raidu holds an MBA (Osmania University), an MA (Philosophy, OU), and a BSc (OU).
Presently Mr. Raidu's company works through two divisions, viz. Turn-Key Projects and
Software Products. The Projects Division provides solutions for units of the DAE (Department of
Atomic Energy). The Products Division provides ECS Methods (Electronic Clearing Services)
to branches of SBI. Other products are Infrasoft (Infrastructure Asset Management
Solution), LawOffice (Law Office Management Solution), and FileManager, a workflow
management software for offices.
Currently, Mr. Raidu is the Convener of the CSI Golden Jubilee Event 2014. He is a Member of the CSI-SIG on e-Governance.
He is also President of the TITA Corporate Wing (Telangana Information Technology Association).

Co-Panelist: Mr. T Krishna Kumar, Vice President, Tech Mahindra


Mr. T Krishna Kumar is currently the Vice President at Tech Mahindra. He has about 18
years of experience in the areas of Management Consulting, IT and the Manufacturing sector. Before Tech
Mahindra he worked in organizations such as Avalon Consulting, KPMG, Satyam
Computer Services and Batliboi. He has consulted for several Fortune 500 companies and
international companies in the area of BPR and IT, in industries such as Manufacturing,
Retail and Telecom. He has successfully driven IT strategies in the areas of
Manufacturing, CRM, Retail, SAP and Data Warehousing solutions.
Mr. Krishna Kumar is a Six Sigma Black Belt, having done his MBA at SP Jain,
Mumbai, BE at REC-Surat, CFA at ICFAI and Business Management at Harvard
Business School. He has represented the Indian IT Industry at the G-15 summit in Malaysia.
His areas of passion are Business Performance Management and how IT can deliver significant value to
various human endeavors, business or otherwise.


Dr. V C V Rao, Mr. Rahul Ravidas Naik, Mr. Swapnajit Rout, Mr. Vikalp Handa
High-Performance Computing - Frontier Technologies & Exploration Group, Centre for Development of
Advanced Computing (C-DAC), Pune.
An Overview of the Laboratory Sessions: The laboratory sessions are focused on understanding the Hadoop
MapReduce framework, and on writing and executing codes for numerical and non-numerical computations on the
IBM Multi-Core Processor system (IBM P755, 32 CPUs, 124 GB RAM, with AIX as the operating system) of the Message
Passing Cluster of CMSD, UoH. The basic codes (accessible from the C-DAC website) cover the following:
(i) Hadoop MapReduce framework,
(ii) Large scale matrix computations,
(iii) Graph analytics based on Open Source Software Graph Distributed Computing GIRAPH & Bulk
Synchronous Parallel (BSP) model, and
(iv) Demonstration of Image Processing Kernels on Big Data Framework.
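
The Bulk Synchronous Parallel (BSP) model named in item (iii) can be sketched as follows. This is a minimal single-process simulation, assuming an undirected graph and a simple label-propagation computation of connected components; it does not use the Giraph API and is not one of the laboratory codes. All names are hypothetical.

```python
def bsp_connected_components(adjacency):
    """Label each vertex with the smallest vertex id in its connected component.

    Each loop iteration plays the role of one BSP superstep: vertices read the
    messages sent in the previous superstep, update their label, and send
    messages that only become visible after the barrier.
    """
    labels = {vertex: vertex for vertex in adjacency}

    # Superstep 0: every vertex sends its own label to all neighbours.
    inbox = {vertex: [] for vertex in adjacency}
    for vertex, neighbours in adjacency.items():
        for neighbour in neighbours:
            inbox[neighbour].append(labels[vertex])

    # Later supersteps: run until no vertex receives a message.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in adjacency}
        for vertex, messages in inbox.items():
            if messages and min(messages) < labels[vertex]:
                labels[vertex] = min(messages)
                for neighbour in adjacency[vertex]:
                    outbox[neighbour].append(labels[vertex])
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return labels


# Hypothetical undirected graph with two components: {1, 2, 3} and {4, 5}.
graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
components = bsp_connected_components(graph)
```

A vertex sends messages only when its label changes, so the computation halts once a superstep produces no messages, which mirrors the vote-to-halt convention of vertex-centric BSP systems.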

Mr. Rahul Ravidas Naik is a project engineer in the High-Performance Computing - Frontier
Technologies & Exploration (HPC-FTE) Group at C-DAC, Pune, India. Mr. Rahul specializes
in the implementation of parallel algorithms for large scale distributed computing based on
Hadoop MapReduce with GPU accelerators, and in tuning the performance of application and
system benchmarks. He did his post graduate course at C-DAC, Mumbai in the year 2012-13.
Prior to this, Mr. Rahul did his Master's in Computer Science at the University of Southern
California in the years 2006-08 and his Bachelor's in Computer Engineering at the American University of
Sharjah in the year 2006.

Mr. Swapnajit Rout is a project engineer in the High-Performance Computing - Frontier
Technologies & Exploration (HPC-FTE) Group at C-DAC, Pune, India. Mr. Swapnajit
specializes in the performance of application and system benchmarks on distributed computing
systems with Hadoop MapReduce and GPUs. He did his post graduate diploma in system
software at C-DAC's ACTS, Pune in the year 2013. He did his Bachelor's in Information
Technology at Biju Patnaik University of Technology, Odisha in the year 2011.

Mr. Vikalp Handa is a project engineer in the High-Performance Computing - Frontier
Technologies & Exploration (HPC-FTE) Group at C-DAC, Pune, India. Mr. Vikalp
specializes in the design and implementation of distributed parallel algorithms for information
science and scientific application kernels on distributed computing systems with Hadoop
MapReduce and GPUs. Prior to this, he worked on the implementation of computational
finance applications and machine learning algorithms. He did his Bachelor's in Computer
Science Engineering at UIET, Panjab University, Chandigarh in the year 2013.

