INTRODUCTION
Due to the rapid growth and innovation in communications and computer technologies, smart
cities are the subject of ongoing research in industry and academia. The final goal is to
provide numerous services such as real-time traffic monitoring, healthcare assistance,
security, and safety. It should be noted that smart buildings with many Internet-enabled
devices can be controlled from remote locations and communicate with each other, thus
becoming parts of smart cities. The Internet of Things (IoT), as a collection of smart
appliances sharing data across the globe, introduces a vision of future smart cities in which
users, computing systems and everyday objects [2] [1] cooperate with economic benefits.
One of the best options for processing the large collections of data produced by different
buildings is cloud computing. The ability to share resources, services, responsibility and
management among cloud providers is the fundamental assumption from the viewpoint of
cloud interoperability.
1.1 RESEARCH PROBLEM
In the previous section, it was discussed that big data introduces new security and
privacy issues. For the healthcare sector these issues are amplified, because healthcare data
are considered privacy-sensitive and traditional security and privacy methods to protect them
seem insufficient or even obsolete. This is a problem for patients, as personal information can
unwillingly be derived from these health information systems and end up in the wrong hands.
Besides the fact that individuals have certain rights against intrusion into their personal
information, in the wrong hands that information can potentially harm them.
On the other hand, weak security and privacy methods can hinder the adoption of big
data in healthcare. There can be public resistance from individuals or government against the
use of big data in healthcare when there is no trust in the protection of personal information.
Hindering the adoption of big data in healthcare could also forfeit the potential benefits it
could bring, such as improved quality of care. The owners of the problem are therefore the
hospitals and other healthcare organizations that could benefit from adopting big data. These
organizations have to deal with hurdles such as privacy legislation and the public perception
of privacy before they can successfully adopt big data.
The objective is to select the optimal cloud server for a mobile VM while minimizing
the total number of VM migrations and reducing task-execution time. A Honey Bee
Optimization (HBO) algorithm is used to identify the optimal target cloudlet.
OBJECTIVES
• Predictive accuracy
Fig. 1.2. Big Data Mining Platform
1.2.2 THESIS CONTRIBUTION
Seamless access to smart healthcare services requires resource migration, in the form
of VM migration during the offloading process, to ensure QoS for the user. This thesis
proposes a joint VM migration technique based on the Honey Bee Optimization (HBO)
algorithm in which user mobility is also considered.
Every generation introduces new data types, which require new capabilities to deal
with them. The first generation of BI&A applications and research focused mostly on
structured data collected by companies through legacy systems and stored in relational
database management systems (RDBMS).
Analytical techniques used in this generation of BI&A are rooted in statistical
methods and data mining techniques developed in the 1970s and 1980s, respectively.
The second generation of BI&A is a result of the development of the internet; it
encompasses analysis of web-based unstructured content.
The third generation of BI&A is emerging as a result of smartphones, tablets and
other sensor-based information sources, and includes location-based, person-centered
and context-relevant analysis.
Unlike RDBMS and NoSQL, Hadoop does not refer to a type of database but to a
software platform that allows for massively parallel computing. Hadoop is an open-source
software framework consisting of several software modules targeted at processing big data:
large volumes and a high variety of data. The core modules of the Hadoop ecosystem are the
Hadoop Distributed File System (HDFS) and Hadoop MapReduce. Below we describe the
most popular modules of the Hadoop framework.
HDFS is the software module that arranges storage in a Hadoop big data ecosystem.
HDFS breaks data into pieces and distributes these pieces over multiple nodes of physical
storage in the system. Its main advantages are that it is designed to be scalable and
fault-tolerant. Additionally, by dividing data into pieces, HDFS prepares the data for parallel
processing. Other modules in the Hadoop framework are designed to take advantage of data
distributed over multiple nodes.
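The splitting and replication described above can be sketched in a few lines. This is a toy model, not HDFS itself: real HDFS uses a 128 MB default block size, a replication factor of 3, and rack-aware placement rather than simple round-robin.

```python
# Sketch of how an HDFS-style file system splits a file into fixed-size
# blocks and spreads replicas over data nodes. Block size and placement
# policy are illustrative only.

def split_into_blocks(data: bytes, block_size: int):
    """Cut the byte stream into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    """Round-robin placement of each block's replicas on distinct nodes."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"x" * 1000, block_size=256)   # 4 blocks
layout = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
```

Because every block lives on several nodes, losing one node costs no data, and independent nodes can read different blocks in parallel, which is exactly what MapReduce exploits.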
MAP REDUCE is a software framework that provides a programming model that takes
full advantage of parallel processing. Tasks programmed in MapReduce are divided into
smaller tasks, which are sent to the relevant nodes in the system. The MapReduce framework
takes care of the whole process: managing communication between nodes, running tasks in
parallel and providing redundancy and fault tolerance.
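The programming model above is easiest to see on the classic word-count example: a map phase emits (key, value) pairs, a shuffle groups pairs by key, and a reduce phase aggregates each group. The sketch below runs sequentially; a real Hadoop job distributes the same three steps over many nodes.

```python
# Minimal word-count in the MapReduce style.
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted pairs by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregate each key's values; for word count, just sum them."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data on the cloud", "big data mining"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))   # {'big': 2, 'data': 2, ...}
```

Because the map calls are independent per line and the reduce calls are independent per key, both phases parallelize naturally across the nodes holding the HDFS blocks.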
HBASE is a software module that runs as a non-relational database on top of HDFS. HBase is
a NoSQL database that stores data according to a key-value model. As a NoSQL type of
database, it requires low-level programming to query. Like other software modules of Hadoop,
HBase is open source and is modeled after Google's BigTable database.
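The key-value access pattern this implies looks roughly like the toy store below: values are addressed by (row key, column) and reads are low-level get/put calls rather than SQL. This is purely illustrative; real HBase additionally groups columns into column families and versions every cell by timestamp.

```python
# Toy illustration of the key-value access pattern a NoSQL store like
# HBase exposes. Row keys and columns here are made-up examples.

class ToyKeyValueStore:
    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        """Return the stored value, or None if the cell does not exist."""
        return self._rows.get(row_key, {}).get(column)

store = ToyKeyValueStore()
store.put("patient:42", "vitals:heart_rate", 72)
store.put("patient:42", "info:name", "J. Doe")
rate = store.get("patient:42", "vitals:heart_rate")   # 72
```

Note there is no query planner: anything beyond a lookup by key (a filter, a join) must be written by hand, which is the "low-level programming" referred to above.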
HIVE is essentially a data warehouse that runs on top of HDFS. Hive structures data into
concepts like tables, columns, rows and partitions, similar to a relational database. Data in a
Hive database can be queried using a (limited) SQL-like language named HiveQL.
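Because HiveQL closely resembles SQL, the flavour of a Hive query can be shown with Python's built-in sqlite3 as a stand-in engine; the table and columns below are hypothetical, and a real Hive deployment would compile the same query into MapReduce jobs over HDFS.

```python
# sqlite3 used as a stand-in to show the kind of query HiveQL supports.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("Oslo", "traffic", 120.0), ("Oslo", "traffic", 80.0),
     ("Bergen", "traffic", 60.0)],
)
# The equivalent HiveQL would read almost identically:
#   SELECT city, AVG(value) FROM readings GROUP BY city;
rows = conn.execute(
    "SELECT city, AVG(value) FROM readings GROUP BY city ORDER BY city"
).fetchall()
# rows == [('Bergen', 60.0), ('Oslo', 100.0)]
```

The point of Hive is exactly this familiarity: analysts can express aggregations declaratively instead of hand-writing MapReduce code as in the HBase case.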
The phrase "big data" is often used in enterprise settings to describe large amounts of
data. It does not refer to a specific amount of data, but rather describes a dataset that cannot
be stored or processed using traditional database software.
Examples of big data include the Google search index, the database of Facebook user
profiles, and Amazon.com's product list. These collections of data (or "datasets") are so large
that the data cannot be stored in a typical database or even a single computer [2] [4]. Instead,
the data must be stored and processed using a highly scalable database management
system. Big data is often distributed across multiple storage devices, sometimes in several
different locations.
Many traditional database management systems have limits to how much data they
can store. For example, an Access 2010 database can only contain two gigabytes of data,
which makes it infeasible to store several petabytes or exabytes of data. Even if a DBMS can
store large amounts of data, it may operate inefficiently if too many tables or records are
created, which can lead to slow performance. Big data solutions solve these problems by
providing highly responsive and scalable storage systems.
There are several different types of big data software solutions, including data storage
platforms and data analytics programs. Some of the most common big data software products
include Apache Hadoop, IBM’s Big Data Platform, Oracle NoSQL Database, Microsoft
HDInsight, and EMC Pivotal One.
1.3.1 OVERVIEW OF RECENT BIG DATA STUDIES
Since the publication of the benchmark report on big data by the McKinsey Global
Institute in June 2011, a plethora of reports has been published that seek to define the term
'big data', establish potential user benefits, and forecast future uptake within the business
community. In view of this large volume of readily available supporting research [2] [1], we
have elected not to go into great depth about the benefits and pitfalls of big data adoption,
taking it as a well-identified emerging trend with well-recognized potential for business
creation and development. It was thought pertinent, however, to provide a brief overview of
some of the generic findings arising from research [2] [1] in this field and to highlight some
important caveats that have tended to be overlooked by many of those reporting on big data
developments in the media and elsewhere.
Fig. 1.3. The Big Data Stack Divided Into Three Different Layers
The challenges in big data projects typically include a combination of the following issues:
1.3.3 BIG DATA MANAGEMENT TECHNOLOGIES
Given the high volume, velocity and variety of big data, the traditional Data
Warehouse (DWH) and Business Intelligence (BI) architectures already existing in
companies need to be enhanced to meet the new requirements of storing and processing big
data. To optimize the performance of the big data analytics pipeline, it is important to select
the appropriate big data technology for the given requirements. This section contains an
overview of the various big data technologies and gives recommendations on when to use
them.
DISADVANTAGES
Big risks to security and privacy.
Challenges arise: it is expensive, and much must be spent to get it working.
A lot of analysis is needed: uncovering patterns, applying algorithms, finding connections and relationships.
Analysts still need specialization; the right skill set is hard to find.
1.3.5 BIG DATA AND CLOUD COMPUTING
Big Data is an umbrella term that encompasses all sorts of data that exist today.
From hospital records and digital data to the overwhelming amount of archived government
paperwork, there is more of it than we officially know.
Big Data cannot be captured in a single definition or description, because we are still
working on it. The great thing about information technology is that it has always been
available to technology companies, businesses and all types of institutions.
It was the emergence of cloud computing which made it easier to provide the best of
technology in the most cost-effective packages. Cloud computing not only reduced costs, but
also made a wide array of applications available to the smaller companies.
Just as the cloud is growing steadily, we are also noticing an explosion of information
across the web. Social media is a completely different world, where both marketers and
common users generate loads of data every day.
Organizations and institutions are also creating data on a daily basis, which can
eventually become difficult to manage. Take a look at these statistics on Big Data generation
over the last five years:
2.5 quintillion bytes (about 2.5 billion gigabytes) of data are created every day.
Most companies in the US have at least 100 terabytes (100,000 gigabytes) of stored
data.
Cloud computing and big data seem an ideal combination. Together, they provide a
solution that is both scalable and accommodating for big data and business analytics. The
analytics advantage is a huge benefit in today's world: imagine all the information resources
that will become easily accessible. Every field of life can benefit from this information. Let's
look at these advantages in detail:
AGILITY
AFFORDABILITY
Cloud computing is a blessing for a company that wishes to have up-to-date
technology on a budget. Companies can pick what they want and pay for it as they go. The
resources required to manage Big Data are easily available and do not cost big bucks.
Before the cloud, companies used to invest huge sums of money in setting up IT departments
and then paid more to keep that hardware updated. Now companies can host their Big Data
on off-site servers, or pay only for the storage space and power they use every hour.
DATA PROCESSING
The explosion of data leads to the issue of processing it. Social media alone generates
loads of unstructured, chaotic data such as tweets, posts, photos, videos and blogs, which
cannot be processed under a single category. With big data analytics platforms like Apache
Hadoop, both structured and unstructured data can be processed. Cloud computing makes the
whole process easier and more accessible to small, medium and large enterprises.
FEASIBILITY
While traditional solutions would require the addition of more physical servers to the
cluster in order to increase processing power and storage space, the virtual nature of the cloud
allows for seemingly unlimited resources on demand. With the cloud, enterprises can scale up
or down to the desired level of processing power and storage space easily and quickly. Big
Data analytics require new processing requirements for large data sets. The demand for
processing this data can raise or fall at any time of the year, and cloud environment is the
perfect platform to fulfill this task. There is no need for additional infrastructure, since the
cloud can provide most solutions through SaaS models.
Just as Big Data has provided organizations with terabytes of data, it has also
presented the issue of managing this data under a traditional framework. Analyzing these
large volumes of data to extract only the most useful bits often becomes a difficult task as
well. Even in the high-speed connectivity era, moving large data sets and providing the
details needed to access them is a problem. These large data sets often carry sensitive
information such as credit/debit card numbers, addresses and other details, raising data
security concerns.
Security issues in the cloud are a major concern for businesses and cloud providers
today. It seems the attackers are relentless, and they keep inventing new ways to find entry
points into a system. Other issues include ransomware, which deeply affects a company's
reputation and resources, denial-of-service attacks, phishing attacks and cloud abuse.
Globally, 40% of businesses experienced a ransomware incident during the past year.
Both clients and cloud providers carry their own share of risk when entering an
agreement on cloud solutions. Insecure interfaces and weak APIs can give away valuable
information to hackers, who can misuse it for the wrong reasons. Some cloud models are still
in the deployment stage, and basic DBMSs are not tailored for cloud computing. Data
protection acts are also a serious issue, as they may require data centers to be closer to the
user than to the provider. Data replication must be done in a way that leaves zero room for
error; otherwise it can affect the analysis stage. It is crucial to make the searching, sharing,
storage, transfer, analysis and visualization of this data as smooth as possible.
1.4 OVERVIEW OF MOBILE CLOUD
Here we assume a three-tier Mobile Cloud Computing (MCC) environment, where a
set of M access points (APs) comprises the backbone network. Tier one represents the master
cloud, which consists of several public cloud providers, such as Google App Engine,
Microsoft Azure and Amazon EC2. A set of high-speed interconnected cloudlets constitutes
tier two, the backbone layer of the mobile cloud architecture. Smartphones, wearable devices
and other mobile devices constitute tier three, the user layer. Users access the nearest cloud
resources using devices from tier three. A set of cloudlets is controlled and monitored by the
master cloud (MC). All cloudlets route their hypervisor information to the master cloud, and
they are connected to the MC with a high-speed network connection.
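In this three-tier model, a tier-three device picks the "nearest" cloudlet; one simple and common proxy for nearness is measured network latency. The sketch below is illustrative only, with made-up cloudlet names and latency values.

```python
# Sketch of a tier-three device choosing the nearest cloudlet, where
# "nearest" is approximated by measured round-trip latency in ms.
# Names and values are hypothetical.

def nearest_cloudlet(latencies_ms):
    """Return the cloudlet identifier with the lowest measured latency."""
    return min(latencies_ms, key=latencies_ms.get)

measured = {"cloudlet-1": 18.5, "cloudlet-2": 7.2, "cloudlet-3": 25.0}
target = nearest_cloudlet(measured)   # 'cloudlet-2'
```

When the user moves, these latencies change, which is precisely what makes VM migration between cloudlets necessary in the later chapters.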
1.5 MOBILE CLOUD ARCHITECTURE
Each of the considered smart city applications (energy, mobility, healthcare, disaster
recovery) can be defined through the services provided to citizens, with requirements in
terms of:
1.5.1 LATENCY
The amount of time that elapses between an event happening and the event being
acquired by the system.
1.5.3 THROUGHPUT
The amount of bandwidth required by a specific application to be reliably executed in
the smart city environment.
1.5.6 STORAGE
The amount of storage space required for storing the sensed data and/or the
processing application.
1.5.7 USERS
The number of users needed to achieve reliable service.
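The four requirement dimensions above can be captured in a simple per-application record, which makes comparisons between applications mechanical. The classes and threshold values below are illustrative, not taken from the thesis.

```python
# One record per smart city application, covering the four requirement
# dimensions defined above. All numbers are hypothetical examples.
from dataclasses import dataclass

@dataclass
class AppRequirements:
    latency_ms: float       # max tolerated event-to-system delay
    throughput_mbps: float  # bandwidth needed for reliable execution
    storage_gb: float       # space for sensed data and processing code
    users: int              # users needed to achieve a reliable service

healthcare = AppRequirements(latency_ms=50, throughput_mbps=2.0,
                             storage_gb=500, users=1000)
mobility = AppRequirements(latency_ms=200, throughput_mbps=10.0,
                           storage_gb=2000, users=50000)

# Healthcare tolerates far less delay than mobility monitoring.
stricter_latency = healthcare.latency_ms < mobility.latency_ms   # True
```

Such records are what a placement or migration algorithm would consume when matching applications to cloudlet resources.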
1.6 CHALLENGES OF PRESENT CLOUD
1.7 OBJECTIVES
The Mobile Cloud Computing (MCC) term was introduced after the concept of cloud
computing. Basically, MCC refers to an infrastructure where both the data storage and the
data processing happen outside the mobile device. Under this definition, mobile applications
move computation power and storage from mobile phones to the cloud. MCC can be thought
of as a combination of cloud computing and the mobile environment. The cloud can be used
for power and storage, as mobile devices lack powerful resources compared to traditional
computation devices.
1.9 MOBILE HEALTHCARE
There are many reasons to use cloud computing with mobile applications. MCC
provides solutions to the obstacles that mobile subscribers usually face. These advantages
are:
Battery life is one of the main concerns in the mobile environment. There are already
several solutions for extending battery life by enhancing CPU performance and using the
disk and screen efficiently to reduce power consumption.
But these solutions generally require changes to the mobile device's structure or new
hardware, which increases cost. Computation or data offloading techniques have been
suggested to migrate large and complex computations from resource-limited devices such as
mobile phones to powerful machines such as cloud servers. This avoids long application
execution times on mobile devices, which result in large amounts of power and/or read-write
time consumption [4]. Many evaluations show the effectiveness of these techniques.
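The offloading trade-off described above is often captured by a simple back-of-the-envelope model (a common textbook formulation, not taken from this thesis): offloading pays off when remote execution time plus data transfer time beats local execution time.

```python
# Common offloading rule of thumb: offload a task of `cycles`
# instructions with `data_bytes` of input when cloud execution plus
# transfer time beats local execution. All parameters are illustrative.

def should_offload(cycles, data_bytes, local_ips, cloud_ips, bandwidth_bps):
    local_time = cycles / local_ips
    remote_time = cycles / cloud_ips + data_bytes / bandwidth_bps
    return remote_time < local_time

# 10^9 instructions, 1 MB input, a 10x faster cloud, 1 MB/s uplink:
decision = should_offload(1e9, 1e6, local_ips=1e8, cloud_ips=1e9,
                          bandwidth_bps=1e6)   # True: 2 s remote vs 10 s local
```

The same formula shows why small tasks with large inputs stay local: the transfer term dominates, so offloading would waste both time and transmission energy.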
Another obstacle is the storage capacity of mobile devices, which is generally limited.
To overcome this problem, MCC can be used to access, query or store large data on the cloud
through wireless networks. Widely used examples include Amazon Simple Storage Service
(Amazon S3), which provides file storage on the cloud. In addition, MCC reduces the time
and energy consumption of compute-intensive applications, which is especially applicable to
limited-resource devices.
1.11.3. RELIABILITY
With the help of the CC paradigm, reliability can be improved, since data and
applications are stored and backed up on a number of computers in the cloud. This reduces
the chance of data being lost on mobile devices. In addition, copyrighting digital content and
preventing illegal distribution of music and video become more practical in this model.
Security services such as virus detection can also be provided and used efficiently without
affecting mobile device performance. Furthermore, the scalability and elasticity advantages
of CC apply to MCC as well, since cloud flexibility extends to the infrastructure as a whole.
1.11.4. PRIVACY
Privacy is an important issue when dealing with private data. As in the CC era, the
same trust problem arises with mobile network providers and cloud providers: they can
monitor all communication and data stored in the cloud or at the network provider, although
encryption mechanisms exist for data in transit and at rest. From this perspective, it remains
a major problem to be solved.
1.11.5. COMMUNICATION
Communication involves multiple parts, from the mobile subscriber to the cloud
provider. Therefore, there can be problems such as poor network speed or limited bandwidth.
This is a big concern because the number of mobile and cloud users is increasing
dramatically.
As mentioned in the previous section, Mobile Cloud Computing has many benefits and good
application examples for mobile users and service providers. On the other hand, there are
also challenges related to cloud computing and mobile network communication. This section
explains these obstacles and possible solutions. On the mobile network side, the main
obstacles and solutions are listed below:
1.12.2. AVAILABILITY
Network failures, out-of-signal errors and poor performance due to high traffic are the
main threats preventing users from connecting to the cloud. However, there are solutions that
help mobile users in case of disconnection from the cloud. One of them is the Wi-Fi-based
multihop MANET, a distributed content-sharing protocol for situations without any
infrastructure [7]. In this mechanism, nearby nodes are detected when a direct connection to
the cloud fails; instead of a direct link, the mobile user connects to the cloud through
neighboring nodes. Although there are some concerns about the security of such
mechanisms, these issues can also be solved.
1.12.3. HETEROGENEITY
Different types of networks are used simultaneously in the mobile environment, such
as WCDMA, GPRS, WiMAX, CDMA2000 and WLAN. As a result, handling such
heterogeneous network connectivity becomes very hard while satisfying mobile cloud
computing requirements such as always-on connectivity, on-demand scalable connectivity,
and energy efficiency of mobile devices. This problem can be addressed by using
standardized interfaces and messaging protocols to reach, manage and distribute content.
1.12.4. PRICING
Using multiple services on a mobile device requires agreements with both the mobile
network provider and the cloud service provider. However, these providers have different
payment methods and prices for services, features and facilities. This can lead to many
problems, such as how to determine the price, how the price should be shared among the
providers or parties, and how subscribers should pay. For example, when a mobile user wants
to run a paid mobile application on the cloud, three stakeholders participate: the application
provider for the application license, the mobile network provider for the data communication
from user to cloud, and the cloud provider for providing and running the application on the
cloud.
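The three-party billing described above can be made concrete with a toy split of one user's charges. All tariffs below are hypothetical; the point is only that a single user action produces amounts owed to three different stakeholders.

```python
# Toy illustration of splitting one bill across the three stakeholders
# named above. All prices are made-up examples.

def split_bill(license_fee, mb_transferred, price_per_mb,
               cpu_hours, price_per_cpu_hour):
    return {
        "application_provider": license_fee,                    # license
        "network_provider": mb_transferred * price_per_mb,      # data
        "cloud_provider": cpu_hours * price_per_cpu_hour,       # compute
    }

bill = split_bill(license_fee=2.0, mb_transferred=50, price_per_mb=0.01,
                  cpu_hours=1.5, price_per_cpu_hour=0.40)
total = sum(bill.values())   # 2.0 + 0.5 + 0.6
```

Even this trivial model shows why pricing is hard in practice: each provider sets its tariff independently, yet the subscriber sees (and must accept) only the combined total.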
CHAPTER 2
LITERATURE REVIEW
Marcos D. Assuncao et al. [1] have discussed approaches and environments for
carrying out analytics on clouds for big data applications. They identified possible gaps in
technology and provided recommendations to the research community on future directions
for cloud-supported big data computing and analytics solutions.
Khairul Munadi et al. [2] have proposed a conceptual image trading framework that
enables secure storage and retrieval over internet services. The aim is to facilitate secure
storage and retrieval of original images for commercial transactions, while preventing
untrusted server providers and unauthorized users from gaining access to the true contents.
Haibo Hu et al. [8] have proposed a holistic and efficient solution that comprises a
secure traversal framework and an encryption scheme based on privacy homomorphism.
The framework is scalable to large datasets by leveraging an index-based approach. Based
on this framework, they devise secure protocols for processing typical queries such as
k-nearest-neighbor (kNN) queries on an R-tree index.
Ku Rahane et al. [9] have proposed a framework for big data clustering that utilizes
grid technology and an ant-based algorithm.
Sudipto Das et al. [10] have discussed and clarified some of the critical concepts in
the design space of big data and cloud computing, such as the appropriate systems for a
specific set of application requirements on mobile data.
2.1 RELATED PAPERS
TITLE: A Genetic Algorithm for Virtual Machine Migration in Heterogeneous Mobile Cloud
Computing.
AUTHOR: Md. Mofijul Islam, Md. Abdur Razzaque and Md. Jahidul Islam
YEAR: 2016
DESCRIPTION:
Mobile Cloud Computing (MCC) improves the performance of a mobile application
by executing it at a resourceful cloud server.
Virtual Machine (VM) migration in MCC brings cloud resources closer to a user so as
to further minimize the response time of an offloaded application.
The key challenge is to find an optimal cloud server for migration that offers the
maximum reduction in computation time.
The goal of GAVMM is to select the optimal cloud server for a mobile VM and to
minimize the total number of VM migrations, resulting in a reduced task execution
time.
ADVANTAGES:
Mass storage capacity and high-speed computing power.
It assigns more tasks to VMs with larger bandwidth, while VMs with smaller
bandwidth are rarely assigned tasks.
Load balancing of the entire system can be handled dynamically by using
virtualization technology.
DISADVANTAGES:
The VM placement problem is central to scheduling and management in cloud data
centers.
Limited bandwidth and other limited resources.
The VM placement problem needs to consider the influence of network factors.
ALGORITHM:
Genetic Algorithm based Virtual Machine Migration (GAVMM).
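A genetic algorithm of this kind can be sketched in miniature: chromosomes encode VM-to-server assignments, fitness rewards low total execution time, and each generation keeps the best half and mutates it. The cost matrix, parameters and operators below are hypothetical, not GAVMM's actual implementation.

```python
# Toy GA in the spirit of GAVMM: evolve VM-to-server assignments
# toward low total execution time. All numbers are illustrative.
import random

random.seed(0)
EXEC_TIME = [[4, 2, 7], [3, 5, 1], [6, 2, 2]]   # time of VM i on server j

def fitness(chromosome):
    # Higher is better: negate the total execution time of the assignment.
    return -sum(EXEC_TIME[vm][srv] for vm, srv in enumerate(chromosome))

def evolve(pop_size=20, generations=40, n_vms=3, n_servers=3):
    pop = [[random.randrange(n_servers) for _ in range(n_vms)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # elitism: best first
        survivors = pop[:pop_size // 2]
        children = []
        for parent in survivors:                 # one mutation per child
            child = parent[:]
            child[random.randrange(n_vms)] = random.randrange(n_servers)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()   # best achievable total time in this tiny instance is 5
```

The real algorithm additionally uses crossover and encodes migration costs in the fitness; this sketch only shows the selection-mutation loop converging on a low-cost assignment.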
TITLE: A Survey of Mobile Cloud Computing Application Models.
AUTHOR: Atta ur Rehman Khan, Mazliza Othman, Sajjad Ahmad Madani and Samee Ullah
Khan.
YEAR: 2014
DESCRIPTION:
Smartphones are now capable of supporting a wide range of applications, many of
which demand an ever increasing computational power.
This poses a challenge because smartphones are resource-constrained devices with
limited computation power, memory, storage, and energy.
The cloud computing technology offers virtually unlimited dynamic resources for
computation, storage, and service provision.
The traditional smartphone application models do not support the development of
applications that can incorporate cloud computing features, so specialized mobile
cloud application models are required.
ADVANTAGES:
EXCloud transfers only the top stack frames, unlike traditional process migration
techniques in which full state migrations are performed.
MAUI provides a programming environment where independent methods can be
marked for remote execution.
The model offers a wide range of elasticity patterns to optimize the execution of
applications according to the users' desired objectives.
DISADVANTAGES:
The sharing of data and states between weblets that execute at distributed locations is
prone to security issues.
The data replication may give rise to data synchronization and integrity issues.
The latency issue is very crucial in mobile cloud application models.
ALGORITHM:
Application partitioning algorithms such as
All-step
K-step
TITLE: Big Data-Driven Service Composition Using Parallel Clustered Particle Swarm
Optimization in Mobile Environment.
AUTHOR: M. Shamim Hossain, Mohd Moniruzzaman, Ghulam Muhammad, Ahmed Ghoneim
and Atif Alamri.
DESCRIPTION:
Mobile service providers support numerous emerging services with differing quality
metrics but similar functionality.
The mobile environment is ambient and dynamic in nature, requiring more efficient
techniques to deliver the required service composition promptly to users.
Selecting the optimum required services in a minimal time from the numerous sets of
dynamic services is a challenge.
By using parallel processing, the optimum service composition is obtained in
significantly less time than alternative algorithms.
ADVANTAGES:
The performance of this algorithm can be improved by using efficient optimization
techniques like PSO.
Qualities of the mobile environment demand efficient optimization and clustering
techniques.
DISADVANTAGES:
The issue of parallel and distributed data operations where the structure of data is
multi-dimensional.
Dynamic QoS and the rapidly changing nature of services in the mobile environment.
ALGORITHM:
Particle swarm optimization
k-means clustering
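The core PSO loop used in work like this can be sketched in its simplest sequential form: particles track a personal best and a swarm-wide best and are pulled toward both. This is a generic, illustrative PSO (standard inertia and attraction coefficients), not the paper's parallel clustered variant, and the cost function is a stand-in.

```python
# Minimal particle swarm optimisation sketch, minimising f(x) = (x - 3)^2.
# Parameters are conventional illustrative choices (w=0.7, c1=c2=1.5).
import random

random.seed(1)

def pso(cost, n_particles=15, iters=60, lo=-10.0, hi=10.0):
    pos = [random.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                      # each particle's best-known position
    gbest = min(pos, key=cost)          # swarm-wide best-known position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (pbest[i] - pos[i])    # pull to own best
                      + 1.5 * r2 * (gbest - pos[i]))      # pull to swarm best
            pos[i] += vel[i]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i]
            if cost(pos[i]) < cost(gbest):
                gbest = pos[i]
    return gbest

best_x = pso(lambda x: (x - 3.0) ** 2)   # converges near x = 3
```

In the service composition setting, a "position" would encode a candidate set of services and the cost would combine the QoS metrics; the update rule stays the same, which is why the loop parallelizes well across clustered sub-swarms.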
TITLE: Clone Cloud: Elastic Execution between Mobile Device and Cloud
AUTHOR: Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik and Ashwin Patti.
DESCRIPTION:
Mobile applications are becoming increasingly ubiquitous and provide ever richer
functionality on mobile devices.
Such devices often enjoy strong connectivity with more powerful machines ranging
from laptops and desktops to commercial clouds.
Clone Cloud uses a combination of static analysis and dynamic profiling to partition
applications automatically at a fine granularity while optimizing execution time and
energy use for a target computation and communication environment.
At runtime, the application partitioning is effected by migrating a thread from the
mobile device at a chosen point to the clone in the cloud, executing there for the
remainder of the partition, and re-integrating the migrated thread back to the mobile
device.
ADVANTAGES:
Unlike desktops and laptops, mobile devices place demands on an extremely limited
supply of energy.
The granularity of partitioning is coarse since it is at class level, and it focuses on
static partitioning.
Supporting native method calls was an important design choice we made, which
increases its applicability.
DISADVANTAGES:
Web page Consistency problem.
Optimization problem.
ALGORITHM:
DEFLATE compression algorithm
TITLE: Federated Internet of Things and Cloud Computing Pervasive Patient Health
Monitoring System
AUTHOR: Jemal H. Abawajy and Mohammad Mehedi Hassan
YEAR: 2017
DESCRIPTION:
In the conventional hospital-centric healthcare system, patients are often tethered to
several monitors.
It develops an inexpensive but flexible and scalable remote health status monitoring
system that integrates the capabilities of IoT and cloud technologies for remote
monitoring of a patient's health status.
It addresses healthcare spending challenges by substantially reducing inefficiency and
waste, as well as enabling patients to stay in their own homes and receive the same or
better care.
To demonstrate the suitability of the proposed PPHM infrastructure, a case study of
real-time ECG monitoring of a patient suffering from congestive heart failure is
presented.
ADVANTAGES:
A flexible, energy-efficient, and scalable remote patient health status monitoring
framework.
A health data clustering and classification mechanism to enable good patient care.
Performance analysis of the PPHM framework to show its effectiveness.
DISADVANTAGES:
IoT-cloud convergence is a crucial issue in healthcare applications.
Access control, location privacy, data confidentiality.
ALGORITHM:
Rank correlation coefficient algorithm.
Classification algorithm.
TITLE: Healthcare Big Data Voice Pathology Assessment Framework.
AUTHOR: M. Shamim Hossain and Ghulam Muhammad
YEAR: 2016
DESCRIPTION:
Healthcare big data comprise data from different structured, semi-structured, and
unstructured sources.
A framework is required that facilitates collection, extraction, storage, classification,
processing, and modeling of this vast heterogeneous volume of data.
The machine learning algorithms in the form of a support vector machine, an extreme
learning machine and a Gaussian mixture model are used as the classifier.
The proposed VPA system shows its efficiency in terms of accuracy and time
requirement.
ADVANTAGES:
We are likely to see an increasingly diverse set of stakeholders involved, spanning the
technical, health, and policy domains.
Big data tools with their merits that facilitate the execution of specified tasks in the
healthcare ecosystem.
DISADVANTAGES:
Security, integrity and privacy violations of these data can cause irremediable damage
to the health, or even death, of the individual and loss to society.
The standardization and format of big data, big data transfer and processing, searching
and mining of big data, and management of services.
Patients with similar symptoms and diseases can share their experiences through
social media to get ad-hoc counseling, which constitutes a big data problem.
ALGORITHM:
Support vector machines (SVM)
Extreme learning machine (ELM)
Gaussian mixture model (GMM)
TITLE:Migrate or not? Exploiting dynamic task migration in Mobile cloud computing
systems.
AUTHOR: Lazaros Gkatzikis and Iordanis Koutsopoulos
YEAR: 2013
DESCRIPTION:
Contemporary mobile devices generate heavy loads of computationally intensive
tasks, which cannot be executed locally due to the limited processing and energy
capabilities of each device.
Cloud facilities enable mobile devices-clients to offload their tasks to remote cloud
servers, giving birth to Mobile Cloud Computing (MCC).
The challenge for the cloud is to minimize the task execution and data transfer time to
the user, whose location changes due to mobility.
Providing quality-of-service guarantees is particularly challenging in the dynamic
MCC environment, due to the time-varying bandwidth of the access links.
ADVANTAGES:
The elasticity of resource provisioning and the pay-as-you-go pricing model.
We delineate the performance benefits that arise for mobile applications and identify
the peculiarities of the cloud that introduce significant challenges in deriving optimal
migration strategies.
Reducing the energy consumption of individual servers by moving the processes from
heavily loaded to less loaded servers (load balancing).
DISADVANTAGES:
A strategy that does not consider migration cost and download time.
No migration.
TITLE:Mobiles on Cloud Nine: Efficient Task Migration Policies for Cloud Computing
Systems.
AUTHOR: Lazaros Gkatzikis and Iordanis Koutsopoulos
YEAR: 2014
DESCRIPTION:
Due to limited processing and energy resources, mobile devices outsource their
computationally intensive tasks to the cloud.
Clouds are shared facilities and hence task execution time may vary significantly.
It investigates the potential of task migrations to reduce contention for the shared
resources of a mobile cloud computing architecture in which local clouds are attached
to wireless access infrastructure.
It devises online migration strategies that at each time make migration decisions
according to the instantaneous load and the anticipated execution time.
ADVANTAGES:
The modification of programs to incorporate state-capture and recovery functions.
Simplified IT management and maintenance capabilities.
Enormous computing resources available on demand.
DISADVANTAGES:
Classifying current computation offloading frameworks. Analyzing them by
identifying their approaches and crucial issues.
Process migration applications are strongly connected with the system in the form of
sockets.
Application development complexity and unauthorized access to remote data
demand a systematic, comprehensive solution.
TITLE: Smart City Solution for Sustainable Urban Development
AUTHOR: Mostafa Basiri, Ali Zeynali Azim, Mina Farrokhi
YEAR: 2017
DESCRIPTION:
Large, dense cities can be highly resource-efficient, which makes them desirable
from the standpoint of green and sustainable urban development.
The influx of new citizens confronts city administrations with rapidly advancing
challenges.
The globalization of urban economics, cities increasingly have to compete directly
with worldwide and regional economies for international investment to generate
employment, revenue and funds for development.
Smart Cities are those towns which use information technology to improve both the
quality of life and accessibility for their inhabitants.
ADVANTAGES:
Reducing resource consumption, notably energy and water, hence contributing to
reductions in CO2 emissions.
Improving commercial enterprises through the publication of real-time data on the
operation of city services.
The growing penetration of fixed and wireless networks that allow such sensors
and systems to be connected to distributed processing centers and for these centers
in turn to exchange information among themselves.
DISADVANTAGES:
Where there are threats of serious or irreversible damage, lack of full scientific
certainty shall not be used as a reason for postponing cost-effective measures to
prevent environmental degradation.
The substitutability of capital.
Sustainable development problem.
TECHNIQUE:
Information management technique
CHAPTER 3
HEALTHCARE BIG DATA SOURCE ECO SYSTEM
Healthcare big data is a revolutionary tool in the healthcare industry, and is becoming
vital in current patient-centric care. Owing to the massive growth of data in the healthcare
industry, diverse data sources have been aggregated into the healthcare big data ecosystem.
These data sources are used by healthcare providers to make decisions and provide
appropriate care. Major data sources, along with the challenges
involved, are discussed below:
3.1. PHYSIOLOGICAL SIGNALS
These data are huge in terms of both volume and velocity. Regarding data volume, a
variety of signals is collected from heterogeneous sources to monitor patient characteristics,
including blood pressure, blood glucose, and heart rate. Sources include the
electroencephalogram, electrocardiogram, and electroglottogram. Data velocity can be
observed in the growing rate of data generation from continuous monitoring: especially for
patients in a critical condition, these signals must be processed in real time for decision
making. These signals need to be extracted efficiently and processed with a suitable
machine learning algorithm to provide meaningful data for effective patient care. Efficient
and comprehensive methods are also required to analyze and process the collected signals to
provide useable data to the healthcare professionals and other related stakeholders. The
combination of EHR and physiological signals may increase the precision of data based on
the surrounding context of the patient.
3.2. EHRS/EMRS
EHRs or electronic medical records (EMRs) are digitized structured healthcare data
from a patient. The EHRs are collected from and shared among hospitals, research centers,
government agencies, and insurance companies. Security, integrity and privacy violations of
these data can cause irremediable damage to the health, or even death, of the individual and
loss to society. Thus, big healthcare data security is now a key topic of research.
3.3. MEDICAL IMAGES
These images generate a huge volume of data that assists healthcare professionals in
identifying and detecting disease, planning treatment, and predicting and monitoring patient
status. Medical imaging techniques such as X-ray, ultrasound, and computed tomography
play a crucial role in diagnosis and prognosis. Owing to the complexity, dimensionality and
noise of the collected images, efficient image processing methods are required to provide
clinically suitable data for patient care.
3.4. SENSED DATA
Sensed data from patients are collected using different wearable or implantable
devices, environment-mounted devices, ambulatory devices, sensors, and smart phones,
at home or in hospitals. The sensed data form a key part of healthcare big data, as these
sensors are used to capture critical events or provide continuous monitoring. However, sensed
data must be collected, pre-processed, stored, shared and delivered correctly in a reasonable
time to be of use to healthcare providers when making clinical decisions. Owing to the
enormous volume of data collected, automated algorithms are required to reduce noise and to
allow for the deployment with big data analytics so that computation time can be reduced.
Moreover, it is a challenge to collect and collate multimodal sensed data from multiple
sources at the same time.
3.5. CLINICAL NOTES
Clinical notes, claims, recommendations, and decisions constitute one of the
largest unstructured sources of healthcare big data. Owing to the variety in format, reliability,
completeness, and accuracy of the clinical notes, it is challenging to ensure the health care
provider has the correct information. Efficient data mining and natural language processing
techniques are required to provide meaningful data.
Fig: 3.1. Big healthcare data source eco system.
3.6. SOCIAL MEDIA DATA
Owing to the heterogeneous nature of social healthcare media data, it is difficult to
conduct data analysis and provide meaningful data to healthcare big data stakeholders. Thus,
these data need to be appropriately mined, analyzed and processed to improve the quality of
the healthcare services delivered by healthcare providers.
The MapReduce programming model divides computation into map and reduce
phases, as shown below.
The map phase partitions input data into many input splits and automatically stores
them across a number of machines in the cluster. Once input data are distributed across the
cluster, the runtime creates a large number of map tasks that execute in parallel to process the
input data. The map tasks read in a series of key-value pairs as input and produce one or more
intermediate key-value pairs. A key-value pair is the basic unit of input to the map task. We
use the Word Count application, which counts the number of occurrences of each word in a
series of text documents, as an example. In the case of processing text documents, a key-
value pair can be a line in a text document. The user can customize the definition of a key-
value pair.
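The WordCount example can be sketched as a minimal in-memory simulation of the map and reduce phases (plain Python rather than the Hadoop API; function names are illustrative):

```python
from collections import defaultdict

def map_phase(documents):
    """Map task: treat each line as an input key-value pair, emit (word, 1) pairs."""
    for line in documents:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce task: group intermediate pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data in health care", "big data analytics"]
print(reduce_phase(map_phase(docs)))
# {'big': 2, 'data': 2, 'in': 1, 'health': 1, 'care': 1, 'analytics': 1}
```

In the real framework the shuffle between the two phases is done by the runtime across machines; this sketch only shows the programming model.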
The MapReduce runtime implements the MapReduce model first proposed by
Google. It automatically handles task distribution, fault tolerance and other
aspects of distributed computing, making it much easier for programmers to write data
parallel programs. It also enables Google to exploit a large number of commodity computers
to achieve high performance at a fraction of the cost of a system built from fewer but more
expensive high-end servers. Map Reduce scales performance by scheduling parallel tasks on
nodes that store the task inputs. Each node executes the tasks with loose communication with
other nodes. Hadoop is an open source implementation of MapReduce. To use the
Hadoop MapReduce framework, the user first writes a MapReduce application using the
programming model we described in the previous section. The user then submits the
MapReduce job to a job tracker, which is a Java application that runs in its own dedicated
JVM. The job tracker is responsible for coordinating the job run. It splits the job into a
number of map/reduce tasks and schedules the execution of the tasks.
Fig: 3.3. Hadoop runs a Map Reduce job
Task trackers have a fixed number of slots for map tasks and for reduce tasks. Each
slot corresponds to a JVM executing a task. Each JVM only employs a single computation
thread. To utilize more than one core, the user needs to configure the number of map/reduce
slots based on the total number of cores and the amount of memory available on each node.
The configuration can be set in the mapred-site.xml file. The relevant properties are
mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
The example in Figure 3.5 shows how to set four map slots and two reduce slots on
each compute node. The setting can be used to express heterogeneity of the machines in the
cluster.
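As an illustrative sketch (using the pre-YARN, MR1 property names; newer Hadoop releases replaced fixed slots with YARN containers), a mapred-site.xml fragment along these lines sets four map slots and two reduce slots per node:

```xml
<configuration>
  <!-- Maximum number of map tasks run simultaneously by a task tracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <!-- Maximum number of reduce tasks run simultaneously by a task tracker -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

Because the file is per-node, heterogeneous machines can carry different values.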
This setting can be different for each compute node. The reason is that different
machines in the cluster can have a different number of cores and differing amounts of
memory.
For example, a typical hash join application requires each map task to store a copy of
the lookup table in memory. Duplicating the lookup table will decrease the amount of
memory available to each map task. To make sufficient memory available to each map task,
memory intensive applications are often forced to restrict the number of JVMs created to be
smaller than the number of cores in a node at the expense of reducing CPU utilization. For
example, in a machine with four cores and 4 GB of RAM, the system needs to create four
map tasks to use the four cores. However, if 1 GB of RAM is insufficient for each map task, the
Hadoop MapReduce system can create only two map tasks, with 2 GB of RAM available to
each task. With two map tasks, the runtime system utilizes only two of the four available
cores, or 50 percent of the CPU resources.
Fig: 3.4. Hadoop MapReduce on a four-core system.
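The slot arithmetic above can be written as a small helper (a sketch; `per_task_gb` stands for whatever memory one map task actually needs):

```python
def usable_map_slots(cores, ram_gb, per_task_gb):
    """Slots are limited by both the core count and the memory available per task."""
    by_memory = ram_gb // per_task_gb
    return min(cores, by_memory)

# Machine with four cores and 4 GB of RAM:
print(usable_map_slots(4, 4, 1))  # 4 slots -> full CPU utilization
print(usable_map_slots(4, 4, 2))  # 2 slots -> only 50 percent of the cores busy
```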
3.12. MAP REDUCE FOR BIG DATA ANALYSIS
• There is a growing trend of applications that should handle big data. However,
analyzing big data is a very challenging problem today.
• For such applications, the Map Reduce framework has recently attracted a lot of
attention. Google’s Map Reduce or its open source equivalent Hadoop is a powerful
tool for building such applications.
• Effective management and analysis of large-scale data poses an interesting but critical
challenge.
Recently, big data has attracted a lot of attention from academia, industry as well as
government.
3.13. HEALTHCARE PROVIDER
This case study describes a big data proof-of-value and data lake build-out for a healthcare
provider. The goal was a big data solution for predictive analytics and enterprise reporting. The challenge was
that the organization did not have an enterprise data warehouse to consolidate data from their
enterprise operational systems. Further, key business stakeholders were not provided with
tools needed to access data. Lastly, teams tasked with creating analytics spent 90% of their
time integrating data in SAS, leaving them little time to analyze the data or create predictive
analytics. The successful project involved meeting with key business and IT stakeholders to
determine reporting and analytic challenges and priorities, and also performing a Current
State Assessment, along with metadata discovery, profiling and outlier analysis of source
data. Dell EMC proposed a data lake architecture to address enterprise reporting and predictive
analytic needs. The solution also initiated a governance program to ensure data quality and to
establish stewardship procedures. Finally, the project identified federated business data lake
hardware and Pivotal big data suite software as the target platform for the data lake.
The results of the project included a new client analytics environment that facilitated
the execution of analytics and reporting activities to reduce time to insight. Further, a client
governance structure ensured that metadata for new data sources entering the data lake was
shared with users. The environment also supported the rapid creation of sandboxes to support
analytics projects, boosting patient care, service levels and efficiency by simplifying data
access.
Hadoop is a strong example of a technology that allows healthcare to store data in its
native form. If Hadoop didn’t exist, decisions would have to be made about what can be
incorporated into the data warehouse or the electronic medical record (and what cannot).
Now everything can be brought into Hadoop, regardless of data format or speed of ingest. If
a new data source is found, it can be stored immediately. No data is left behind.
By the end of 2017, the number of health records of millions of people is likely to
increase into tens of billions. Thus, the computing technology and infrastructure must be able
to render a cost-efficient implementation of large-scale data storage and processing.
Hadoop technology is successful in meeting the above challenges faced by the healthcare
industry as the Map Reduce engine and Hadoop Distributed File System (HDFS) have the
capability to process thousands of terabytes of data. Hadoop makes use of highly optimized,
yet inexpensive commodity hardware making it a budget friendly investment for the
healthcare industry.
3.13.1. BIG DATA ANALYTICS IS MOTIVATED IN HEALTHCARE
THROUGH THE FOLLOWING ASPECTS
Healthcare data is now growing very rapidly in terms of size, complexity, and speed
of generation, and traditional database and data mining techniques are no longer
efficient in storing, processing and analyzing these data. New innovative tools are
needed in order to handle these data within a tolerable elapsed time.
The patient’s behavioral data is captured through several sensors; patients' various
social interactions and communications.
The standard medical practice is now moving from relatively ad-hoc and subjective
decision making to evidence-based healthcare.
Inferring knowledge from complex heterogeneous patient sources and leveraging the
patient/data correlations in longitudinal records.
Understanding unstructured clinical notes in the right context.
Efficiently handling large volumes of medical imaging data and extracting
potentially useful information and biomarkers.
Analyzing genomic data is a computationally intensive task and combining with
standard clinical data adds additional layers of complexity.
Fig: 3.6. Mobile Cloud Computing and Big data Analytics
A lot of data is produced on a routine basis by hospitals, laboratories, retail and non-retail
medical operations, and promotional activities. But most of it gets wasted because the
people responsible are not able to figure out what to do with that data. This is where cloud-
based big data comes into the picture.
Big data analytics tools and repositories take over the heavy lifting, generating
reliable, quantitative insights out of huge volumes of data within a matter of seconds. This
means that in the future we will need more doctors who are trained to work with big data.
The big data revolution is bringing up sophisticated methods of consolidating
information from tons of sources. The focus is on providing the most relevant and updated
information to doctors and medical practitioners in real time while they are consulting their
patients.
Until now, the collection of data has been limited to the major available resources in the
healthcare sector. However, with the advent of smartphone apps and wearables, data is now
everywhere. And this allows practitioners to know patients’ health conditions in a more
precise manner. Apps that act like pedometers to measure your steps, the calorie counter for
your diet, the app for monitoring and recording heart rate, blood pressure and blood sugar
levels, and wearable devices like Fitbit, Jawbone etc. are all sources of data nowadays. In the
near future, the patient will share this data with the doctor who can utilize it as a diagnostic
toolbox to provide better treatment in less time.
CHAPTER 4
BIG DATA HEALTHCARE USING ANT COLONY
OPTIMIZATION
Big Data Healthcare is the drive to capitalize on growing patient and health system
data availability to generate healthcare innovation. By making smart use of the ever-
increasing amount of data available, we can find new insights by re-examining the data or
combining it with other information. In healthcare this means not just mining patient records,
medical images, bio banks, test results , etc., for insights, diagnoses and decision support
advice, but also continuous analysis of the data streams produced for and by every patient in
a hospital, a doctor’s office, at home and even while on the move via mobile devices.
Current medical hardware, monitoring everything from vital signs to blood chemistry,
is beginning to be networked and connected to electronic patient records, personal health
records, and other healthcare systems.
Big Data has been characterized as raising five essentially independent challenges:
Volume,
Velocity,
Variety,
Veracity, and
Value.
As elsewhere, in Big Data Healthcare the data volume is increasing, and so is data
velocity as continuous monitoring technology becomes ever cheaper. With so many types of
tests, and the existing wide range of medical hardware and personalized monitoring devices,
healthcare data could not be more varied; yet data from this variety of sources must be
combined for processing to reap the expected rewards. In healthcare, veracity of data is of
paramount importance, requiring careful data curation and standardization efforts, while at
the same time seeming to be in opposition to the enforcement of privacy rights.
Finally, extracting value out of big healthcare data for all its beneficiaries (clinicians,
clinical researchers, pharmaceutical companies, healthcare policy-makers, etc.)
demands significant innovations in data discovery, transparency and openness, explanation
and provenance, summarization and visualization, and will constitute a major step towards
the coveted democratization of data analytics.
4.1 ANT Colony Optimization Technique
The Ant Colony Optimization (ACO) algorithm is a metaheuristic initially proposed
by Marco Dorigo in his PhD dissertation in 1992. “The original idea comes from observing the
exploitation of food resources among ants, in which ants’ individually limited cognitive
abilities have collectively been able to find the shortest path between a food source and the
nest”.
It was first used to solve the traveling salesman problem (TSP). Because of its
characteristics of distributed computing, self-organization and positive feedback, ACO has
been used in prior works for routing in Sensor Networks. “Node Potential” is the heuristic
used to evaluate the potential of next hop selection based on three factors: the candidate’s
distance to the sink node, its distance to the nearest aggregation node and its data correlation
with the current node.
In this algorithm, random searching for the destination (sink node) is needed in early
iterations. Some variants use a simpler heuristic that considers only the distance to the sink
node. An algorithm can be composed of path construction, path maintenance, and aggregation
schemes, including a synchronization scheme, a loop-free scheme, and a collision avoidance
scheme.
There is a problem ignored by the algorithms above. Although ACO aggregation
algorithms converge to a route very close to the optimum route, most of them only use a
single path to transfer data until an active node in the path runs out of battery. Then the path
construction and data delivery cycle starts again.
Although route discovery overhead can be reduced, those algorithms do not take into
consideration the limitations of WSNs, especially the energy limits of Sensor nodes and the
number of agents required to establish the routing. Repeatedly using the same optimal path
exhausts the relaying nodes’ energy quickly.
Relatively frequent efforts to maintain the Network and to explore new paths are
needed. Therefore, this approach is not energy efficient and results in shorter Sensor nodes’
lifetime and consequently Network lifetime. Algorithms that separate path establishment and
data delivery processes suffer from this problem.
Data aggregation approach improves energy efficiency in Wireless Sensor Networks
by eliminating redundant packets, reducing end-to-end delay and Network traffic. This
research studies the effect of combining data aggregation technique and multi-path ACO
algorithm with different heuristics on Network lifetime and end-to-end delay.
4.2 VIRTUALIZATION RESOURCE ACCESS
Consider the following scenario: a blind user is executing an application that takes an
image from his surroundings. Then, the application processes the image in the cloudlet and
gives a response to the user’s local client. That is, the application continuously uploads some
data and the cloud server processes this data to provide responses back to the user.
Fig: 4.1. Ant colony optimization
Now, if the blind user moves away from the current cloudlet, then he or she will
experience a delayed response from the mobile application executing in the cloudlet,
degrading the overall performance of the application. To avoid this performance degradation,
it is necessary for the system to adopt a VM migration method to choose a cloudlet that is
currently closer to the user to which to migrate the VM. User mobility is not the only reason
forcing a VM to migrate. Migration can be initiated to minimize the over provisioned
resources and thus improve the overall system objectives. For instance, if a VM is required to
be migrated from a cloudlet to any of the candidate cloudlets, the new cloudlet may not have
the same type of VM. In that case, a VM with more resources than the current one must be
chosen and provisioned in order to migrate the VM and thus minimize task-execution time.
Fig: 4.2. Double bridge experiment. (a) Ants start exploring the double bridge. (b)
Eventually most of the ants choose the shortest path. While each single ant is in
principle capable of building a solution (i.e., of finding a path between nest and food
source), it is only the colony of ants that presents the “shortest path finding” behavior.
In a sense, this behavior is an emergent property of the ant colony.
A VM migration may be provisioned more resources than required. This
over-provisioned resource greatly decreases the system objectives, as it reduces the number
of provisioned VMs in the cloudlets. Furthermore, the joint VM migration approach, where a
set of VMs is remapped based on the VM task execution time and over-provisioned resources
can help to effectively increase the overall system objectives. In contrast to the joint VM
migration approach, single VM migration can only improve particular user objectives but not
the system objectives.
4.3 ANT COLONY ALGORITHM
Step 1: while (termination criterion not satisfied)
Step 2:     ant generation and activity();
Step 3:     pheromone evaporation();
Step 4:     daemon actions(); “optional”
Step 5: end while
Step 6: end Algorithm
7. When moving from node f to neighbor node g, the agent updates the pheromone trail t(fg)
on the edge (f, g).
8. Once the data is retrieved from the cloud, the agent can retrace the same path backward,
update pheromone trails and close the operation.
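A minimal runnable sketch of this skeleton, applied to a tiny double-bridge-style graph (all parameter values and names are illustrative, not taken from the text):

```python
import random

# Tiny double-bridge-style graph: two routes from the nest to the food source.
graph = {
    "nest": {"a": 1.0, "b": 2.0},  # edge weights are distances
    "a": {"food": 1.0},
    "b": {"food": 2.0},
}
pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}

def choose(node, beta=2.0):
    """Node selection: probability proportional to pheromone * (1/distance)^beta."""
    nbrs = list(graph[node])
    weights = [pheromone[(node, v)] * (1.0 / graph[node][v]) ** beta for v in nbrs]
    return random.choices(nbrs, weights=weights)[0]

def run(iterations=50, rho=0.1):
    random.seed(1)
    for _ in range(iterations):
        # Ant generation and activity: one ant builds a nest-to-food path.
        path, node = [], "nest"
        while node != "food":
            nxt = choose(node)
            path.append((node, nxt))
            node = nxt
        # Pheromone evaporation on every edge.
        for e in pheromone:
            pheromone[e] *= (1.0 - rho)
        # Deposit on the traversed path, inversely proportional to its cost.
        cost = sum(graph[u][v] for u, v in path)
        for e in path:
            pheromone[e] += 1.0 / cost
    # Report which branch out of the nest accumulated the most pheromone.
    return max(graph["nest"], key=lambda v: pheromone[("nest", v)])

print(run())  # the colony concentrates on the shorter branch: 'a'
```

With the distance heuristic and per-path deposits, the pheromone converges on the cheaper branch, mirroring the double-bridge behavior shown in Fig. 4.2.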
4.4 ALGORITHM OVERVIEW
In ACO algorithms, a colony of artificial ants is used to construct solutions guided by
the pheromone trails and heuristic information. The original idea of ACO comes from
observing the exploitation of food resources among ants. Ants explore the area surrounding
their nest initially in a random manner. As soon as an ant finds a source of food (source
node), it evaluates the quantity and quality of the food and carries some of it to the nest (sink
node). During the backtracking, the ant deposits a pheromone trail on the ground. The
quantity of deposited pheromone, which may depend on the quantity and quality of the food,
will guide other ants to the food source. The pheromone trails are simulated via a
parameterized probabilistic model. The pheromone model consists of a set of parameters. In
general, the ACO approach attempts to find the optimal routing by iterating the following two
steps:
1. Solutions are constructed using a node selection model based on a predetermined
heuristic and the pheromone model, a parameterized probability distribution over the solution
space.
2. The solutions that were constructed in earlier iterations are used to modify the
pheromone values in a way that is deemed to bias the search toward high quality solutions.
The algorithm runs in two passes: forward and backward. In the forward pass, the
route is constructed by a group of ants, each of which starts from a unique source node. In the
first iteration, an ant searches for a route to the destination randomly. Later, an ant searches for
the nearest point of the previously discovered route. This can take many iterations before the
ant finds a correct path of reasonable length. One solution is to flood the sink node ID from
the sink to all the Sensor nodes in the Network before any ant starts. The points where
multiple ants join are aggregation nodes. In the backward pass, every ant starts from the sink
node and travels back to the corresponding source node by following the path discovered in
the forward pass. Pheromone is deposited hop by hop during the traversal.
Nodes of the discovered path are given weights as a result of node selection,
depending on the node potential, which indicates the heuristic for reaching the destination.
Pheromone trails are the means by which ants communicate the discovered route to other ants.
The trail followed by ants most often gets more and more pheromone and eventually
converges to the optimal route. Pheromone on non-optimal routes evaporates with time.
The aggregation points on the optimal tree identify where data aggregation occurs.
4.4.1 PATH DISCOVERY PROCEDURE
The procedure is mainly composed of forward and backward passes. In the forward
pass, an ant tries to explore a new path based on the heuristic rule and the pheromone amount
on the edges. Backtracking is used in the forward pass when an ant finds a dead end or is
running into a loop. In the backward pass, the ant updates the pheromone amount on the path
constructed in the forward pass. Other important components in the algorithms include data
aggregation, loop control, and Network maintenance. In WSN, each node has a unique
identity. Every node is able to calculate and remember its current heuristic value. Initially, the
sink node floods its identity to all the nodes in the Network. After a node receives the packet,
it computes its hop-count to the sink node and correspondingly its initial heuristic value.
Each ant is assigned a source node. After that, an ant starts from the source node and
moves towards the sink node using ad-hoc routing. The forward pass ends only if all the ants
have arrived at the sink node. Single ant-based solution construction uses the following steps:
If the node has been visited in the same iteration, follow a previous ant’s path
Use a node selection rule
If all the neighbors have been visited, use the shortest path
If no neighbor nodes, backtrack to the previous node
If no neighbor nodes and the previous node is dead, record the Network
Lifetime and exit the program.
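A sketch of one selection step implementing the fallback rules above (names are illustrative; `select_rule` stands in for the node selection rule of Section 4.6):

```python
def next_hop(current, visited, neighbors, prev_path, select_rule):
    """One forward-pass selection step following the rules above (a sketch).

    Returns a chosen neighbor, "BACKTRACK", or "NETWORK_DEAD".
    """
    nbrs = neighbors.get(current, [])
    if not nbrs:
        # No neighbor nodes: backtrack, or give up if there is no previous node.
        return "BACKTRACK" if prev_path else "NETWORK_DEAD"
    unvisited = [n for n in nbrs if n not in visited]
    if not unvisited:
        # All neighbors already visited: fall back to the shortest path
        # (represented here by simply taking the first neighbor).
        return nbrs[0]
    # Normal case: apply the node selection rule to the unvisited neighbors.
    return select_rule(unvisited)

# Usage: prefer the rule's pick among unvisited neighbors.
pick_first = lambda options: options[0]
print(next_hop("n1", {"n2"}, {"n1": ["n2", "n3"]}, [], pick_first))  # n3
```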
The current node sends the packet. The selected node receives the packet. Both Nodes
update the residual energy after transmission. If the current node does not have enough
energy to send, this transmission fails. The Network is maintained afterwards. Transmission
failure is mostly prevented by doing a receiving and sending energy check in the node
selection step.
Ants start from the sink node and move towards their source nodes. The ants follow
the paths discovered in the forward pass. Before an ant arrives at its source node, the
algorithm repeats:
Each Sensor node maintains two queues to store packets: a receiving queue and a
sending queue. The packet sending process includes:
For ”SinkDistNoAggre”, push all the packets into the sending queue.
For other aggregation algorithms, use the predefined function to aggregate all the
received packets into one packet and push it into the sending queue.
Among all the ants that arrived at this node, select the earliest ant as the aggregating ant.
The aggregating ant will finish the rest of the routing construction in this iteration.
All the later arrived ants become aggregated ants. They remember the aggregating
ant.
Each aggregated ant shares its path with the aggregating ant. The aggregating ant
updates its subsequent hops with all the aggregated ants.
4.5 LOOP CONTROL AND FAILURE HANDLING
“Loop” is defined as the situation in which an ant revisits an already-visited node in the
same forward pass. Since each ant remembers its path, it can avoid running into a loop by
comparing the candidate node’s ID with the visited nodes’ IDs.
An ant is considered to have failed its task in an iteration if all the neighborhood nodes of the
current node have been visited. In that case, the ant uses the shortest path to deliver the
packet to the sink node. The node’s previous visiting history is not considered when choosing
the next node. A path resulting in “failure” is discouraged.
If the “dead node” is the sink node, recharge the node with more energy. Sink node is
different from other nodes because it needs to perform more frequent transmission and
computation for the purposes of the application. Therefore, it is assumed that the sink node has
plenty of energy to last until the Network dies.
4.6.3 LEADING EXPLORATION
Among all the neighborhood nodes, select the first node with the highest probability,
even if there are multiple nodes with the same probability. This method is deterministic. In
every iteration, an ant always discovers the same path to the sink node until one of the
intermediate nodes dies. If the same Network topology is tested repeatedly, the total energy
cost and Network lifetime are the same.
p_ij^k = [ τ_ij · (η_j)^β ] / Σ_{l ∈ N_i} [ τ_il · (η_l)^β ],  j ∈ N_i        (1)
In equation (1), an ant k holding a data packet at node i chooses the next node j to move to, repeating the choice until it reaches the sink node, where τ is the pheromone, η is the heuristic, Ni is the set of neighbors of node i, and β is a parameter that determines the relative importance of pheromone versus distance (β > 0). The value η is calculated using equation (2); multiple factors can be used, each with its own weight.
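The transition rule of equation (1) can be evaluated as in the sketch below, where the arrays hold τ and η for each neighbor of node i; the class and method names are invented for the example and are not part of the project code.

```java
public class Transition {
    // Equation (1): p(i -> j) = tau_ij * eta_ij^beta / sum over l in N_i of tau_il * eta_il^beta
    static double[] probabilities(double[] tau, double[] eta, double beta) {
        double[] p = new double[tau.length];
        double sum = 0.0;
        for (int j = 0; j < tau.length; j++) {
            p[j] = tau[j] * Math.pow(eta[j], beta);  // unnormalized weight for neighbor j
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) p[j] /= sum;  // normalize over the neighborhood
        return p;
    }
}
```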
4.6.7 EVAPORATION ON ALL EDGES
After all the ants finish the forward pass, and before they start the backward pass, the pheromone values on all the edges in the network evaporate at rate ρ, so every value is consistently reduced. Equation (3) shows how the evaporated pheromone value is calculated.
τij ← (1 − ρ) · τij        (3)
In equation (4), ρ is the pheromone decay parameter, τij is the pheromone value on the edge between nodes i and j, and e0 is the encouraging or discouraging rate derived from the forward pass. A path resulting in less energy consumption and a smaller total hop-count is preferred. The best iteration is the one with the least energy consumption and hop-count among all previous iterations; it is used as the reference for calculating e0 in the current iteration. If the forward pass is a failed path exploration, or used more hops and energy than the best iteration, the path is discouraged: a very small amount of pheromone is deposited on the edge to differentiate it from links that have not been visited, and e0 is set to a predetermined “PunishRate,” a relatively low rate between 0 and 1.
If the forward pass found a path with the same hop-count and energy consumption as the best iteration, e0 is set to a relatively higher rate between 0 and 1, the “encourageRate.” If the forward pass found a path with the same hop-count but less energy consumption than the best iteration, e0 = 1.5 × encourageRate. If the forward pass found a path with fewer hops and less energy consumption than the best iteration, e0 = hop-count difference × encourageRate.
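The evaporation step and the e0 selection rules above can be sketched as follows. The ordering of the checks and the handling of cases the text does not cover (e.g. fewer hops but equal energy) are assumptions; punishRate and encourageRate are the tuning constants named in the text.

```java
public class PheromoneUpdate {
    // Equation (3): evaporation on every edge at decay rate rho.
    static double evaporate(double tau, double rho) {
        return (1.0 - rho) * tau;
    }

    // e0 selection per the rules in the text; punishRate and encourageRate
    // are assumed tuning constants in (0, 1).
    static double e0(boolean failed, int hops, int bestHops,
                     double energy, double bestEnergy,
                     double punishRate, double encourageRate) {
        if (failed || hops > bestHops || energy > bestEnergy)
            return punishRate;                         // discourage the path
        if (hops == bestHops && energy < bestEnergy)
            return 1.5 * encourageRate;                // same hops, less energy
        if (hops < bestHops)
            return (bestHops - hops) * encourageRate;  // fewer hops: scale by the difference
        return encourageRate;                          // matches the best iteration
    }
}
```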
In equation (5), ζ is a positive number, hi is the hop-count between node i and the sink, and hj is the hop-count between node j and the sink. If the value of (hi − hj) is greater than zero, node j is closer to the sink node than node i, so the algorithm rewards the path from node i to node j by depositing more pheromone. If the value equals zero, nodes i and j have the same hop-count to the sink, and the algorithm lays only a little pheromone on the path. If the value is less than zero, the algorithm lays no pheromone on this path. In equation (6), Rj is the total hop-count of these sources before visiting node j; therefore, Δωj is the total hop-count of some sources to the sink through node j. The smaller the total hop-count, the larger the amount of pheromone added on the path from node i to node j, as shown in equation (5).
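The hop-count-based deposit of equation (5) can be sketched as below. The small deposit used for the equal-hop-count case (a tenth of ζ here) is an assumed placeholder, since the text only says “little pheromone”; the names are illustrative.

```java
public class HopDeposit {
    // Equation (5) sketch: deposit proportional to how much closer node j is to the sink.
    static double deposit(double zeta, int hi, int hj) {
        int d = hi - hj;               // positive when j is closer to the sink
        if (d > 0) return zeta * d;    // reward moving toward the sink
        if (d == 0) return 0.1 * zeta; // same distance: lay only a little pheromone (assumed fraction)
        return 0.0;                    // moving away from the sink: no deposit
    }
}
```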
This means that more ants are encouraged to follow this path. When an ant moves to an aggregation node, that node updates the pheromone levels of all its neighbors using equation (4). If a node is not visited by any ants within a limited time, its pheromone evaporates according to equation (3).
CHAPTER 5
HONEY POTS DETECTION USING SMART HEALTH
CARE
In 1999 the idea was picked up again by the Honeynet Project, founded and led by Lance Spitzner. Over years of development, the Honeynet Project produced several papers on Honeypots and introduced techniques for building efficient Honeypots. The Honeynet Project is a non-profit research organization of security professionals dedicated to information security.
Honeypots fall into two categories:
Production Honeypots
Research Honeypots
5.2.1 PRODUCTION HONEYPOTS
Production honeypots are easy to use, capture only limited information, and are used primarily by corporations. An organization places production honeypots inside the production network alongside other production servers to improve its overall state of security. Normally, production honeypots are low-interaction honeypots, which are easier to deploy, but they give less information about attacks or attackers than research honeypots.
5.2.2 RESEARCH HONEYPOTS
Research honeypots are run to gather information about the motives and tactics of the black hat community targeting different networks. These honeypots do not add direct value to a specific organization; instead, they are used to research the threats that organizations face and to learn how to better protect against those threats. Research honeypots are complex to deploy and maintain, capture extensive information, and are used primarily by research, military, or government organizations.
Fig: 5.1. Single Honey pot detection in system
A common setup is to deploy a Honeypot within a production system. In the figure above, the Honeypot is colored orange. It is not registered in any naming servers or other production systems, e.g. a domain controller, so no one should know about the existence of the Honeypot. This is important, because only within a properly configured network can one assume that every packet sent to the Honeypot is suspected to be an attack.
If misconfigured packets arrive, the number of false alerts rises and the value of the Honeypot drops. Production Honeypots are primarily used for detection. Typically they work as an extension to Intrusion Detection Systems, performing an advanced detection function. They also prove whether existing security functions are adequate: if a Honeypot is probed or attacked, the attacker must have found a way to the Honeypot. This could be a known way, which is hard to close, or even an unknown hole. However, measures should be taken to avoid a real attack. With the knowledge of the attack on the Honeypot it is easier to determine and close security holes.
A Honeypot also helps justify the investment in a firewall. Without any evidence of attacks, someone in management could assume that there are no attacks on the network and suggest that investment in security be stopped, as there are no threats. With a Honeypot there is recorded evidence of attacks, and the system can provide statistics of the attacks that occur each month.
5.4 LOG RHYTHM’S HONEYPOT SECURITY ANALYTICS
SUITE
Log Rhythm’s Honeypot Security Analytics Suite allows customers to centrally
manage and continuously monitor honeypot event activity for adaptive threat defense. When
an attacker begins to interact with the honeypot, Log Rhythm’s Security Intelligence Platform
begins tracking the attacker’s actions, analyzing the honeypot data to create profiles of
behavioral patterns and attack methodologies based on the emerging threats. This automated
and integrated approach to honeypots eliminates the need for the manual review and
maintenance associated with traditional honeypot deployments.
The Honeypot Security Analytics Suite provides AI Engine rules that perform real-
time, advanced analytics on all activity captured in the honeypot, including successful logins
to the system, observed successful attacks, and attempted/successful malware activity on the
host. As a result, the Honeypot suite allows AI Engine to also detect when similar activity
captured from the honeypot is observed on the production network. For example, if an
observed attacker interaction on the honeypot is followed by a subsequent interaction with
legitimate hosts within the environment such as production web servers, Log Rhythm can
generate an alarm alerting IT and security personnel to the suspicious activity.
5.5 PREVENT COMPROMISED CREDENTIALS
5.5.1 CHALLENGE
The majority of attacks exploit valid user credentials to gain unrestricted access to the
corporate network. Organizations need an effective means of monitoring for insecure
accounts and passwords to prevent credentials from being compromised.
5.5.2 SOLUTION
Log Rhythm’s Honeypot Security Analytics Suite provides AI Engine rules that
monitor for successful and unsuccessful logon attempts to honeypot servers, capturing details
on the username and password. This allows analysts to see commonly attempted username
and password combinations on the honeypot hosts.
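The tallying of attempted credential pairs described above can be sketched as follows. This is an illustrative helper, not LogRhythm's actual API; the class name and the "username:password" string format are invented for the example.

```java
import java.util.*;

public class CredentialTally {
    // Count attempted "username:password" combinations observed on honeypot hosts,
    // so analysts can see which combinations are tried most often.
    static Map<String, Integer> tally(List<String> attempts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String attempt : attempts)
            counts.merge(attempt, 1, Integer::sum);  // increment the count for this pair
        return counts;
    }
}
```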
Fig: 5.3 MCC User Authentications
Transferring data from one remote system to another under the control of a local system is called remote uploading. Remote uploading is used by some online file hosting services. It is also used when the local computer has a slow connection to the remote systems, but the remote systems have a fast connection between them. Without remote uploading functionality, the data would first have to be downloaded to the local host and then uploaded to the remote file hosting server, both times over slow connections.
5.6 ALGORITHM IMPLEMENTATION
INPUT: VM set V and accessible cloudlet set Cu for each VM u; system parameters.
Step 13: k = k + 1
Step 20: iteration = iteration + 1
Step 21: end while
Honeypots turn the tables for hackers and computer security experts. While in the classical field of computer security a computer should be as secure as possible, in the realm of Honeypots the security holes are opened on purpose. In other words, Honeypots welcome hackers and other threats.
The purpose of a Honeypot is to detect and learn from attacks and use that information to improve security. A network administrator obtains first-hand information about the current threats on his network, and undiscovered security holes can be closed using the information gained from a Honeypot. A Honeypot is a computer connected to a network, and it can be used to examine vulnerabilities of the operating system or network.
Depending on the setup, security holes can be studied in general or in particular. Moreover, a Honeypot can be used to observe the activities of an individual who gained access to it. Honeypots are a unique tool for learning about the tactics of hackers. So far, network monitoring techniques have used passive devices such as Intrusion Detection Systems (IDS). An IDS analyzes network traffic for malicious connections based on patterns, which can be particular words in packet payloads or specific sequences of packets. However, there is the possibility of false positive alerts due to a pattern mismatch or, even worse, false negative alerts on actual attacks. On a Honeypot, every packet is suspicious.
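The detection principle, that any packet addressed to the unadvertised Honeypot is suspicious without any pattern matching, can be sketched as below; the class, method names, and addresses are invented for the example.

```java
import java.util.*;

public class HoneypotFilter {
    // In a properly configured network the Honeypot is never referenced by
    // production systems, so any packet addressed to it is suspicious.
    static boolean suspicious(String dstIp, String honeypotIp) {
        return dstIp.equals(honeypotIp);
    }

    // Collect the source addresses of all packets sent to the Honeypot.
    static List<String> alerts(List<String[]> packets, String honeypotIp) {
        List<String> sources = new ArrayList<>();
        for (String[] p : packets)             // p = {srcIp, dstIp}
            if (suspicious(p[1], honeypotIp)) sources.add(p[0]);
        return sources;
    }
}
```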
Fig: 5.4 Honey pot setup
As the name suggests, these honeypots are deployed and used by Smart health care organizations or curious individuals. They are used to gain knowledge about the methods used by the black hat community. They help Smart health care security teams learn more about attack methods and help in designing better security tools. They can also help detect new attack methods or bugs in existing protocols or software, they can be used to strengthen or verify existing intrusion detection systems, and they can provide valuable data for forensic or statistical analysis.
For example, a large number of HTTP scans detected by a honeypot is an indicator that a new HTTP exploit might be in the wild. Commercial servers normally have to deal with large amounts of traffic, and it is not always possible for intrusion detection systems to detect all suspicious activity. Honeypots can function as early warning systems and give security administrators hints and directions on what to look out for.
The real value of a honeypot lies in it being probed, scanned and even compromised,
so it should be made accessible to computers on the Internet or at least as accessible as other
computers on the network. As far as possible the system should behave as a normal system on
the Internet and should not show any signs of it being monitored or of it being a honeypot.
Even though we want the honeypot to be compromised it shouldn’t pose a threat to other
systems on the Internet. To achieve this, network traffic leaving the honeypot should be
regulated and monitored.
5.8.1 HONEYPOTS THAT FAKE OR SIMULATE
There are honeypot tools that simulate or fake services or even fake vulnerabilities.
They deceive any attacker to think they are accessing one particular system or service. A
properly designed tool can be helpful in gathering more information about a variety of servers
and systems. Such systems are easier to deploy and can be used as alerting systems and are
less likely to be used for further illegal activities.
A contrasting viewpoint states that honeypots should not be anything different from actual systems, since the main idea is to secure the systems that are in use. These honeypots do not fake or simulate anything and are implemented using actual systems and servers that are in use in the real world. Such honeypots reduce the chance of the hacker knowing that he
is on a honeypot. These honeypots have a high risk factor and cannot be deployed
everywhere. They need a controlled environment and administrative expertise. A
compromised honeypot is a potential risk to other computers on the network or for that matter
the Internet.
5.10 ROLE OF HONEYPOTS IN NETWORK SECURITY
Honeypots and related technologies have generated a great deal of interest in the past two years, and honeypots can be considered one of the latest technologies in network security today. The Honeynet Project is actively involved with the deployment and study of honeypots. Honeypots are used extensively in research, and it is only a matter of time before they are used in production environments as well.
To assess the value of Honeypots, we break security down into the three categories defined by Bruce Schneier in Secrets and Lies: prevention, detection, and response.
5.10.2 PREVENTION
Prevention means keeping the bad guys out. Normally this is accomplished by firewalls and well-patched systems. The value Honeypots can add to this category is small: if a random attack is performed, Honeypots can detect that attack, but not prevent it, as the targets are not predictable.
One case where Honeypots help with prevention is when an attacker is directly hacking into a server. In this case a Honeypot would cause the hacker to waste time on a worthless target and help prevent an attack on a production system. But this means that the attacker has attacked the Honeypot before attacking a real server, and not the other way around.
5.10.3 DETECTION
Detection is typically performed by Intrusion Detection Systems. The problems with these systems are false alarms and undetected attacks. A system might alert on suspicious or malicious activity even if the data was valid production traffic. Due to the high network traffic on most networks it is extremely difficult to process every packet, so the chance of false alarms increases with the amount of data processed. High traffic also leads to undetected attacks: when the system is not able to process all data, it has to drop certain packets, which leaves them unscanned. An attacker could benefit from such high loads on network traffic.
5.10.4 RESPONSE
This chapter defines concepts, architecture and terms used in the realm of Honeypots.
It describes the possible types of Honeypots and the intended usage and purpose of each type.
Further auxiliary terms are explained to gain a deeper understanding about the purpose of
Honeypot concepts.
In the computer security community, a Black hat is a skilled hacker who uses his or her ability to pursue his or her interests illegally. Black hats are often economically motivated, or may represent a political cause; sometimes, however, it is pure curiosity. The term “Black hat” is derived from old Western movies, where outlaws wore black hats and outfits and heroes typically wore white outfits with white hats.
White hats are ethically opposed to the abuse of computer systems. A White hat generally concentrates on securing IT systems, whereas a Black hat would like to break into them. Both Black hats and White hats are hackers, and both are skilled computer experts, in contrast to the so-called "script kiddies". Script kiddies could arguably be referred to as Black hats, but this would be a compliment to such individuals: script kiddies extract discovered and published exploits from the work of real hackers and merge them into a script.
thus the chance of failure is higher, which makes the use of medium-interaction Honeypots more risky.
Most attacks on the internet are performed by automated tools, often used by unskilled users, the so-called script kiddies, which search for vulnerabilities or already-installed backdoors (see introduction). This is like walking down a street and trying to open every car by pulling the handle: by the end of the day at least one car will be discovered unlocked. Most of these attacks are preceded by scans on the entire IP address range, which means that any device on the net is a possible target.
touched and often with unknown vulnerabilities. A good example of this is the theft of 40 million credit card details at MasterCard International. Card Systems Solutions, a third-party processor of payment data, encountered a security breach which potentially exposed more than 40 million cards of all brands to fraud: "It looks like a hacker gained access to Card Systems' database and installed a script that acts like a virus, searching out certain types of card transaction data." Direct attacks, by contrast, are performed by skilled hackers and require experienced knowledge. In contrast to the tools used for random attacks, the tools used by experienced Black hats are not common; often the attacker uses a tool which is not published in the Black hat community. This increases the threat of those attacks.
5.13.1 PREVENTION
Prevention means keeping the bad guys out. Normally this is accomplished by firewalls and well-patched systems. The value Honeypots can add to this category is small: if a random attack is performed, Honeypots can detect that attack, but not prevent it, as the targets are not predictable. One case where Honeypots help with prevention is when an attacker is directly hacking into a server. In this case a Honeypot would cause the hacker to waste time on a worthless target and help prevent an attack on a production system. But this means that the attacker has attacked the Honeypot before attacking a real server, and not the other way around. Also, if an institution publishes the information that it uses a Honeypot, this might deter attackers from hacking, but this is more in the field of psychology and too abstract to add proper value to security.
separate the demands on Honeypots. The use of a Honeypot poses risk and needs exact planning ahead to avoid damage. Therefore it is necessary to consider what environment will be the basis for the installation. Depending on the setup, the results are quite different and need to be analyzed separately. For example, the number of attacks occurring in a protected environment should be lower than the number of attacks coming from the internet. Therefore a comparison of results afterwards needs to take the environment into account.
In every case there is a risk in using a Honeypot; risk is added on purpose by the nature of a Honeypot. A compromised Honeypot, in hacker terms an “owned box,” needs intensive monitoring but also strong controlling mechanisms. Scenario VI discusses requirements for a Honeypot-out-of-the-box solution and elaborates the different functions which have to be provided.
CHAPTER 6
EXPERIMENTAL RESULT
6.1 HADOOP SERVER
The flexible nature of a Hadoop system means companies can add to or modify their data system as their needs change, using cheap and readily available parts from any IT vendor.
Just about all of the big online names use it, and as anyone is free to alter it for their own purposes, modifications made to the software by expert engineers at, for example, Amazon and Google are fed back to the development community, where they are often used to improve the "official" product. This form of collaborative development between volunteer and commercial users is a key feature of open source software.
In its "raw" state, using the basic modules supplied by Apache (http://hadoop.apache.org/), it can be very complex, even for IT professionals, which is why various commercial versions such as Cloudera have been developed to simplify the task of installing and running a Hadoop system, as well as offering training and support services.
6.3 ECLIPSE
1. Open Eclipse
2. Click File -> New Project -> Java Project
3. Copy all the Jar files from the location “D:\hadoop-2.6.0\”:
a. \share\hadoop\common\lib
b. \share\hadoop\mapreduce
c. \share\hadoop\mapreduce\lib
d. \share\hadoop\yarn
e. \share\hadoop\yarn\lib
Fig: 6.4 To Detect The Hadoop location
4 Configuration for Hadoop 1.x: fetch Hadoop using a version control system (subversion or git) and check out branch-1 or the particular release branch. Otherwise, download a source tarball from the CDH3 releases or Hadoop releases.
5 Generate Eclipse project information using Ant via the command line:
6 For Hadoop (1.x or branch-1), “ant eclipse”
7 For Smart City releases, “ant eclipse-files”
8 Pull sources into Eclipse:
9 Go to File -> Import.
10 Select General -> Existing Projects into Workspace.
11 For the root directory, navigate to the top directory of the above downloaded sources.
Fig: 6.5 Hadoop Initialized
2. Download the MR1 source tarball from CDH4 Downloads and untar it into a folder different from the one from Step 1.
3. Within the MR1 folder, generate Eclipse project information using Ant via the command line (ant eclipse-files).
4. Configure .classpath using this perl script to make sure all classpath entries point to the local Maven repository:
3. For the root directory, navigate to the top directory of the above downloaded sources.
Fig:6.6 Hadoop-0.19.1tar.gz
1. Generate Eclipse project information using Maven: mvn clean && mvn install -DskipTests && mvn eclipse:eclipse. Note: mvn eclipse:eclipse generates a static .classpath file that Eclipse uses; this file isn’t automatically updated as the project/dependencies change.
3. For the root directory, navigate to the top directory of the above downloaded source.
Execute tar -xzf hadoop-0.19.1.tar.gz in the cygwin prompt; this will start the process of unpacking the Hadoop distribution. Once this is done, it will display a newly created directory called hadoop-0.19.1.
81
Verify whether unpacking is success by executing cd Hadoop-0.19.1 and then -1,
which provides the output as mentioned below which tells that everything is unpacked
correctly.
In the next step, click on Configure Hadoop Installation link, displayed on the right
side of the project configuration window. Project preferences window display is shown in the
image below. Fill in the location of Hadoop directory in Hadoop Installation Directory in
preferences and click OK, and then close the project window after clicking on finish
6.4 SCENARIO I – UNPROTECTED ENVIRONMENT
In this scenario the Honeypot is connected to the internet through a firewall. The firewall limits the access to the Honeypot: not every port is accessible from the internet, and not every IP address on the internet is able to initiate connections to the Honeypot. This scenario does not state the degree of connectivity; it only states that there are some limitations. Those limitations can be either strict, allowing almost no connections, or loose, denying only a few connections.
The firewall can be a standard firewall or a firewall with NAT capabilities (see chapter 3.3). However, a public IP address is always assigned to the firewall.
This scenario focuses on the IP address of the Honeypot. In this scenario the Honeypot is assigned a public address. The Internet Assigned Numbers Authority (IANA) maintains a database [IANA 05] which lists the address ranges of publicly available addresses; all previous RFCs have been replaced by this database [RFC 3232]. A public IP can be addressed from any other public IP on the internet, which means that IP datagrams targeting a public IP are routed through the internet to the target. A public IP must occur only once; it may not be assigned twice.
Applications on the Honeypot can directly communicate with the internet as they have knowledge of the public internet address. This is in contrast to scenario IV, where an application on the Honeypot is not aware of the public IP. It is further possible to perform a query on the responsible Regional Internet Registry to look up the name of the address registrar; this is called a “whois search.”
6.6.1 REGIONAL INTERNET REGISTRIES
AfriNIC (African Network Information Centre) - Africa Region
http://www.afrinic.net/
APNIC (Asia Pacific Network Information Centre) - Asia/Pacific Region
http://www.apnic.net/
ARIN (American Registry for Internet Numbers) - North America Region
http://www.arin.net/
LACNIC (Regional Latin-American and Caribbean IP Address Registry) – Latin
America and some Caribbean Islands http://lacnic.net/en/index.html
RIPE NCC (Réseaux IP Européens) - Europe, the Middle East, and Central Asia
http://www.ripe.net/
Gateway which rewrites the IP in the payload. Therefore the applications on the Honeypot are not aware of the public IP and are limited by the functionality of the intermediate network device.
Security mechanisms need to make sure that this traffic does not affect the production systems. Moreover, the amount of traffic needs to be controlled: a hacker could use the Honeypot to launch a DoS or DDoS attack. Another possibility would be to use the Honeypot as a file server for stolen software, in hacker terms called warez. Both cases would increase bandwidth usage and slow production traffic.
As hacking techniques evolve, an experienced Black hat could launch a new kind of
attack which is not recognized automatically. It could be possible to bypass the controlling
functions of the Honeypot and misuse it. Such activity could escalate the operation of a
Honeypot and turn it into a severe threat. A Honeypot operator needs to be aware of this risk
and therefore control the Honeypot on a regular basis.
CHAPTER 7
SOURCE CODE
MapAndReduceJob.java
package cloudmapreduce;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
System.out.println("Line == : " + line);
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(MapAndReduceJob.class);
conf.setJobName("wordcountMapAndReduce");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/hadoop/mapred/system/workLoadFile.1.txt", "/hadoop/mapred/system/workLoadFile.2.txt"));
FileOutputFormat.setOutputPath(conf, new Path("/hadoop/mapred/system/wcMROutput.txt"));
FileSystem fs = FileSystem.get(conf);
if (fs.exists(new Path("/hadoop/mapred/system/wcMROutput.txt")))
fs.delete(new Path("/hadoop/mapred/system/wcMROutput.txt"));
JobClient.runJob(conf);
}
MabJob.java
package cloudmapreduce;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
System.out.println("Line == : " + line);
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}
conf.setJobName("wordcountMap");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/hadoop/mapred/system/workLoadFile.2.txt"));
FileOutputFormat.setOutputPath(conf, new Path("/hadoop/mapred/system/wcMapOutput2.txt"));
FileSystem fs = FileSystem.get(conf);
if (fs.exists(new Path("/hadoop/mapred/system/wcMapOutput2.txt")))
fs.delete(new Path("/hadoop/mapred/system/wcMapOutput2.txt"));
JobClient.runJob(conf);
Preprocessing2.java
package cloudmapreduce;
import java.io.*;
import java.util.ArrayList;
import java.util.StringTokenizer;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
public Preprocessing2() {
initComponents();
}
@SuppressWarnings("unchecked")
// <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
jPanel1.setLayout(null);
jSeparator1.setOpaque(true);
jPanel1.add(jSeparator1);
jSeparator2.setOpaque(true);
jPanel1.add(jSeparator2);
jLabel1.setHorizontalAlignment(javax.swing.SwingConstants.CENTER);
jLabel1.setText("MapReduce Across Datacenters");
jPanel1.add(jLabel1);
jSeparator3.setOpaque(true);
jPanel1.add(jSeparator3);
jSeparator4.setOpaque(true);
jPanel1.add(jSeparator4);
jButton1.setText("Display Data");
jButton1.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton1ActionPerformed(evt);
}});
jPanel1.add(jButton1);
jTextArea1.setColumns(20);
jTextArea1.setRows(5);
jScrollPane1.setViewportView(jTextArea1);
jPanel1.add(jScrollPane1);
jButton2.setText("Proceed");
jButton2.addActionListener(new java.awt.event.ActionListener() {
public void actionPerformed(java.awt.event.ActionEvent evt) {
jButton2ActionPerformed(evt);
}});
jPanel1.add(jButton2);
getContentPane().setLayout(layout);
layout.setHorizontalGroup(
layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
layout.setVerticalGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
pack();
}// </editor-fold>//GEN-END:initComponents
BufferedReader br = null;
try {
int countToken = 0;
StringTokenizer st;
int j = 0, line = 0;
if (line == 0) {
dataTable.add(new ArrayList());
dataTable1.add(new ArrayList());
token = st.nextToken();
if (line > 0) {
if (!((ArrayList) dataTable.get(j)).contains(token)) {
((ArrayList) dataTable.get(j)).add(token);
((ArrayList) dataTable1.get(j)).add(token);
} else {
attributes.add(token);
jTextArea1.append(token + "\t");
countToken++;
j++;
countToken = 0;
j = 0;
jTextArea1.append("\n\n");
line++;
}
int size;
if (size == 1) {
redundantIndex.add(i);
redundantData.add(attributes.get(i));
} finally {
try {
br.close();
}}
}//GEN-LAST:event_jButton1ActionPerformed
new Preprocessing3().setVisible(true);
}//GEN-LAST:event_jButton2ActionPerformed
//<editor-fold defaultstate="collapsed" desc=" Look and feel setting code (optional) ">
try {
for (javax.swing.UIManager.LookAndFeelInfo info :
javax.swing.UIManager.getInstalledLookAndFeels()) {
if ("Nimbus".equals(info.getName())) {
javax.swing.UIManager.setLookAndFeel(info.getClassName());
break;
}}
java.util.logging.Logger.getLogger(Preprocessing2.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
java.util.logging.Logger.getLogger(Preprocessing2.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
java.util.logging.Logger.getLogger(Preprocessing2.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
java.util.logging.Logger.getLogger(Preprocessing2.class.getName()).log(java.util.logging.Level.SEVERE, null, ex);
java.awt.EventQueue.invokeLater(new Runnable() {
new Preprocessing2().setVisible(true);
}});}
private javax.swing.JButton jButton1;
private javax.swing.JButton jButton2;
private javax.swing.JLabel jLabel1;
private javax.swing.JPanel jPanel1;
private javax.swing.JScrollPane jScrollPane1;
private javax.swing.JSeparator jSeparator1;
private javax.swing.JSeparator jSeparator2;
private javax.swing.JSeparator jSeparator3;
private javax.swing.JSeparator jSeparator4;
private javax.swing.JTextArea jTextArea1;
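The redundancy check in the listing above boils down to dropping any column whose distinct-value list has size one. A minimal, GUI-free sketch of that idea follows; the class and method names here are illustrative, not taken from the project.

```java
import java.util.*;

// Sketch of redundant-column detection: a column in which every record
// holds the same value carries no information and can be dropped.
public class RedundantColumnFilter {
    // rows[0] holds the attribute names; later rows hold the data.
    // Returns the indices of columns with exactly one distinct value.
    public static List<Integer> redundantColumns(String[][] rows) {
        List<Set<String>> distinct = new ArrayList<>();
        for (int j = 0; j < rows[0].length; j++) {
            distinct.add(new HashSet<>());
        }
        for (int line = 1; line < rows.length; line++) {
            for (int j = 0; j < rows[line].length; j++) {
                distinct.get(j).add(rows[line][j]);
            }
        }
        List<Integer> redundant = new ArrayList<>();
        for (int j = 0; j < distinct.size(); j++) {
            if (distinct.get(j).size() == 1) {
                redundant.add(j);
            }
        }
        return redundant;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"id", "city", "country"},
            {"1", "Oslo", "NO"},
            {"2", "Bergen", "NO"},
        };
        // column 2 ("country") is constant across all records
        System.out.println(redundantColumns(data));
    }
}
```

The same effect is achieved in the listing by keeping, per column, a list of values not seen before and flagging columns whose list ends up with size one.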
CHAPTER 8
SCREEN SHOTS
8.2 BROWSING MAPREDUCE ACROSS DATACENTER
8.4 MAP REDUCE PROCESSING ACROSS
8.5 REDUNDANT DATA REMOVAL
8.6 BEFORE AND AFTER PREPROCESSING
8.7 BEFORE AND AFTER PREPROCESSING WITH VALUES
8.8 SLOT ALLOCATIONS PROCESS
8.9 SLOT ALLOCATIONS PROCESS REPORT
8.10 MAPPING PROCESS
8.11 MAPPING PROCESS WITH VALUES
8.12 EXECUTION TIME FOR VM RESOURCE IN MOBILE CLOUD
8.13 DATA CENTERS OPTIMIZATION TIME
8.14 OPTIMIZATION EVALUATION
8.15 OPTIMIZED EXECUTION TIME
8.16 HONEYPOT-OUT-OF-THE-BOX IN HADOOP SMART CITY VM
8.17 ACCESS CONTROL
CHAPTER 9
CONCLUSION
This dissertation centers on performance modeling and resource management for
MapReduce applications. It introduces a performance modeling framework for estimating the
completion time of a complex MapReduce application, defined as a DAG of MapReduce jobs,
when it is executed on a given platform with different resource allocations and different input
data sets. Building on this framework, we further introduce resource allocation strategies and
a customized deadline-driven scheduler that estimate and control the amount of resources
allocated to each application so that it meets its (soft) deadline.
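The completion-time estimation summarized above can be illustrated with the classic lower/upper makespan bounds for a batch of independent tasks running on a fixed number of slots. The sketch below is illustrative only; the numbers and class name are assumptions, not the dissertation's measured results.

```java
// Classic makespan bounds for one MapReduce stage: n tasks of average
// duration avg (maximum max) processed greedily on s slots.
public class CompletionTimeEstimate {
    // Lower bound: with perfect packing and no stragglers, n tasks on
    // s slots cannot finish earlier than (n / s) * avg.
    public static double lowerBound(int tasks, int slots, double avgDuration) {
        return (double) tasks / slots * avgDuration;
    }

    // Upper bound: (n - 1) / s * avg + max covers the worst case where
    // the longest task is scheduled last (greedy list-scheduling bound).
    public static double upperBound(int tasks, int slots, double avgDuration,
                                    double maxDuration) {
        return (double) (tasks - 1) / slots * avgDuration + maxDuration;
    }

    public static void main(String[] args) {
        // Hypothetical job: 64 map tasks on 16 map slots (avg 20 s, max 35 s),
        // then 8 reduce tasks on 8 reduce slots (avg 40 s, max 50 s).
        double low = lowerBound(64, 16, 20) + lowerBound(8, 8, 40);
        double up = upperBound(64, 16, 20, 35) + upperBound(8, 8, 40, 50);
        System.out.println("completion time in [" + low + ", " + up + "] s");
    }
}
```

Summing such per-stage bounds along the DAG's critical path gives a completion-time interval for the whole application, which a deadline-driven scheduler can invert to find how many slots each job needs to meet its deadline.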
CHAPTER 10
BIBLIOGRAPHY
[13] L. Gkatzikis and I. Koutsopoulos, ‘‘Migrate or not? Exploiting dynamic task migration
in mobile cloud computing systems,’’ IEEE Wireless Commun., vol. 20, no. 3, pp. 24–32,
Jun. 2013.
[14] L. Gkatzikis and I. Koutsopoulos, ‘‘Mobiles on cloud nine: Efficient task migration
policies for cloud computing systems,’’ in Proc. IEEE 3rd Int. Conf. Cloud Netw. (CloudNet),
Oct. 2014, pp. 204–210.
[15] M. Guntsch, M. Middendorf, and H. Schmeck, ‘‘An ant colony optimization approach to
dynamic TSP,’’ in Proc. 3rd Annu. Conf. Genetic Evol.Comput., 2001, pp. 860–867.
[16] M. M. Hassan, ‘‘Cost-effective resource provisioning for multimedia cloud-based e-
health systems,’’ Multimedia Tools Appl., vol. 74, no. 14, pp. 5225–5241, 2015.
[17] Mumak: Map-Reduce simulator. [Online]. Available: https://issues.apache.org/jira/browse/MAPREDUCE-728
[18] Y. Wang and W. Shi, "On optimal budget-driven scheduling algorithms for MapReduce
jobs in the heterogeneous cloud," Tech. Rep. TR-13-02, Carleton University, 2013.
[19] Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst, "HaLoop: Efficient iterative data
processing on large clusters," Proc. VLDB Endow., vol. 3, no. 1, pp. 285–296, 2010.
[20] S. Chen, "Cheetah: A high performance, custom data warehouse on top of MapReduce,"
Proc. VLDB Endow., vol. 3, no. 2, pp. 1459–1468, 2010.
[21] C. Engle, A. Lupher, R. Xin, M. Zaharia, M. J. Franklin, S. Shenker, and I. Stoica,
"Shark: Fast data analysis using coarse-grained distributed memory," in Proc. ACM SIGMOD
Int. Conf. Manage. Data (SIGMOD '12), 2012, pp. 689–692.
[22] M. S. Hossain, G. Muhammad, "Healthcare big data voice pathology assessment
framework", IEEE Access, vol. 4, pp. 7806-7815, 2016.
[23] M. Islam, A. Razzaque, J. Islam, "A genetic algorithm for virtual machine migration in
heterogeneous mobile cloud computing", Proc. Int. Conf. Netw. Syst. Security (NSysS), pp.
1-6, Jan. 2016.
[24] A. R. Khan, M. Othman, S. A. Madani, S. U. Khan, "A survey of mobile cloud
computing application models", IEEE Commun. Surveys Tuts., vol. 16, no. 1, pp. 393-413,
Feb. 2014.
[25] S. Kosta, A. Aucinas, P. Hui, R. Mortier, X. Zhang, "ThinkAir: Dynamic resource
allocation and parallel execution in the cloud for mobile code offloading", Proc. IEEE
INFOCOM, pp. 945-953, Mar. 2012.
[26] P. Kulkarni, T. Farnham, "Smart city wireless connectivity considerations and cost
analysis: Lessons learnt from smart water case studies", IEEE Access, vol. 4, pp. 660-672,
2016.
[27] R. Kumari et al., "Application offloading using data aggregation in mobile cloud
computing environment" in Leadership Innovation Entrepreneurship as Driving Forces
Global Economy, Switzerland:Springer, pp. 17-29, 2017.
[28] P. G. J. Leelipushpam, J. Sharmila, "Live VM migration techniques in cloud
environment : A survey", Proc. IEEE Conf. Inf. Commun. Technol. (ICT), pp. 408-413, Apr.
2013.
[29] J. Li, K. Bu, X. Liu, B. Xiao, "Enda: Embracing network inconsistency for dynamic
application offloading in mobile cloud computing", Proc. 2nd ACM SIGCOMM Workshop
Mobile Cloud Comput., pp. 39-44, 2013.
[30] J. Liu, Y. Li, D. Jin, L. Su, L. Zeng, "Traffic aware cross-site virtual machine migration
in future mobile cloud computing", Mobile Netw. Appl., vol. 20, no. 1, pp. 62-71, Feb. 2015.
[31] J. Montgomery, M. Randall, T. Hendtlass, "Structural advantages for ant colony
optimisation inherent in permutation scheduling problems", Proc. 18th Int. Conf. Innov. Appl.
Artif. Intell., pp. 218-228, 2005.
[32] Z. L. Phyo, T. Thein, "Correlation based VMs placement resource provision", Int. J.
Comput. Sci. Inf. Technol., vol. 5, no. 1, p. 95, 2013.
[33] M. Rahimi, J. Ren, C. Liu, A. Vasilakos, N. Venkatasubramanian, "Mobile cloud
computing: A survey state of art and future directions", Mobile Netw. Appl., vol. 19, no. 2,
pp. 133-143, 2014.
[34] C. C. Sasan Adibi, N. Wickramasinghe, "CCmH: The cloud computing paradigm for
mobile health (mHealth)", Int. J. Soft Comput. Softw. Eng., vol. 3, no. 3, pp. 403-410, 2013.
[35] M. Satyanarayanan, P. Bahl, R. Caceres, N. Davies, "The case for VM-based cloudlets in
mobile computing", IEEE Pervas. Comput., vol. 8, no. 4, pp. 14-23, Oct. 2009.
[36] M. Sneps-Sneppe, D. Namiot, "On mobile cloud for smart city applications", 2016.
[Online]. Available: https://arxiv.org/abs/1605.02886
[37] T. Taleb, A. Ksentini, "An analytical model for follow me cloud", Proc. IEEE Global
Commun. Conf. (GLOBECOM), pp. 1291-1296, Dec. 2013.
[38] H. N. Van, F. Tran, J.-M. Menaud, "Sla-aware virtual resource management for cloud
infrastructures", Proc. 9th IEEE Int. Conf. Comput. Inf. Technol. (CIT), vol. 1, pp. 357-362,
Oct. 2009.
[39] U. Varshney, Pervasive Computing and Healthcare, Boston, MA, USA:Springer, pp. 39-
62, 2009.
[40] L. Wang, F. Zhang, A. V. Vasilakos, C. Hou, Z. Liu, "Joint virtual machine assignment
and traffic engineering for green data center networks", SIGMETRICS Perform. Eval. Rev.,
vol. 41, no. 3, pp. 107-112, Jan. 2014.
[41] S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, K. Leung, "Mobility-induced service
migration in mobile micro-clouds", Proc. IEEE Military Commun. Conf. (MILCOM), pp.
835-840, Oct. 2014.
[42] I. Yaqoob, I. A. T. Hashem, Y. Mehmood, A. Gani, S. Mokhtar, S. Guizani, "Enabling
communication technologies for smart cities", IEEE Commun. Mag., vol. 55, no. 1, pp. 112-
120, Jan. 2017.
[43] Q. Zhang, L. Cheng, R. Boutaba, "Cloud computing: State-of-the-art and research
challenges", J. Internet Services Appl., vol. 1, no. 1, pp. 7-18, 2010.
[44] L. Spitzner, Honeypots: Tracking Hackers. Addison-Wesley, 2002, pp. 68–70,
ISBN 0-321-10895-7.