
ABSTRACT

In recent years, the Smart City concept has become popular for its promise to improve the
quality of life of urban citizens. Smart City services typically comprise a set of applications with
data sharing options. Most services in Smart Cities are in fact mashups that combine data from
several sources, so access to all available data is vital to these services. Seamless access to Smart
Network services requires resource migration, in the form of VM migration during the offloading
process, to ensure QoS for the user. An Ant Colony Optimization (ACO) based joint VM migration
technique is proposed in which user mobility is also considered. We propose an ACO-based joint
VM migration model for a heterogeneous, MCC-based Smart Network system in a Smart City
environment. In this model, the user's mobility and the VM resources provisioned in the cloud
together drive the VM migration decision. DSMS, CEP, batch-based MapReduce and other
processing modes, together with FPGA, GPU, CPU and ASIC technologies, are used in different
ways to process the data at the point of data collection. The data are then structured, uploaded to
the cloud server and processed with MapReduce, exploiting the powerful computing capabilities
of the cloud architecture. We present a thorough performance evaluation to investigate the
effectiveness of the proposed model compared with state-of-the-art approaches.
CHAPTER 1

INTRODUCTION

Due to rapid growth and innovation in communication and computer technologies, smart
cities are the subject of continuous research in industry and academia. The final goal is to provide
numerous services such as real-time traffic monitoring, Network assistance, security and safety.
It should be noted that smart buildings with many Internet-enabled devices, which can be
controlled from remote locations and communicate with each other, are also becoming parts of
smart cities. The Internet of Things (IoT), as the collection of smart appliances sharing data across
the globe, introduces a vision of future smart cities where users, computing systems and everyday
objects cooperate with economic benefits. One of the best options for processing the large
volumes of data collected from different buildings is cloud computing. The ability to share
resources, services, responsibility and management among cloud providers is the fundamental
assumption from the viewpoint of cloud interoperability.

Fig: Data Analysis for Smart Home


Research Problem

In the previous section, it was discussed that big data introduces new security and privacy
issues. For the Network care sector these issues are amplified, because Network care data are
considered privacy-sensitive and traditional security and privacy methods for protecting such data
appear insufficient or even obsolete. This is a problem because personal information can
unwillingly be derived from these Network information systems and end up in the wrong hands.
Besides the fact that individuals have rights against intrusion into their personal information, such
information can, in the wrong hands, potentially harm individuals.

On the other hand, weak security and privacy methods can hinder the adoption of big data in
Network care. There can be public resistance from individuals or government against the use of
big data in Network care when there is no trust in the protection of personal information.
Hindering the adoption of big data in Network care could also hinder the potential benefits big
data could bring, for example improved quality of Network. The owners of the problem are
therefore the hospitals and other organizations in the Network that can potentially benefit from
adopting big data in Network care. These organizations have to deal with hurdles such as privacy
legislation and the public perception of privacy before they can successfully adopt big data.

Aim & Objective

The objective is to select the optimal cloud server for a mobile VM while minimizing the total
number of VM migrations and reducing task-execution time. A Honey Bee Optimization (HBO)
algorithm is used to identify the optimal target cloudlet.
Big Data Challenges in Network

 Inferring knowledge from complex, heterogeneous Security sources and leveraging the
Security/data correlations in longitudinal records.
 Understanding unstructured clinical notes in the right context.
 Efficiently handling large volumes of medical imaging data and extracting potentially
useful information and biomarkers.
 Analyzing genomic data is a computationally intensive task, and combining it with standard
clinical data adds additional layers of complexity.
 Capturing the Security’s behavioral data through several sensors, as well as their various
social interactions and communications.

Thesis Contribution:

This thesis addresses the fact that seamless access to Smart Network services requires
resource migration, in the form of VM migration during the offloading process, to ensure QoS
for the user. A Honey Bee Optimization (HBO) based joint VM migration technique is proposed
in which user mobility is also considered.

 To develop an Ant Colony Optimization (ACO)-based VM migration model, in which
VMs are migrated to candidate cloud servers so as to maximize the total utility of the
MCC system.
 Mobility-aware selection of cloudlets for VM provisioning in our proposed PRIMIO
system helps significantly to reduce service provisioning time.
 To introduce a joint VM migration approach to optimize both the resource utilization and
task execution time, diminishing the shortcomings of a single VM migration approach.
 The results of performance evaluation, depicted from test-bed implementation and
extensive experiments, show that the proposed PRIMIO system achieves significant
improvements compared to state-of-the-art works.
Big data as an extension to traditional BI

According to Oracle, big data is the inclusion of additional sources to augment existing
data analytics operations. Exploiting big data entails dealing with multiple sources and
combining structured and unstructured data. Various sources advise not to discard existing BI
infrastructures and capabilities; rather, current capabilities should be integrated with the new
requirements of Big Data. Big Data technologies work best in cooperation with the original
enterprise data warehouses, as used with Business Intelligence.

Big data as a new generation of BI & Analytics

Every generation introduces new data types, which require new capabilities to deal with
these new data types. The first generation of BI&A applications and research focused mostly on
structured data collected by companies through legacy systems and stored in relational database
management systems (RDBMS).

 Analytical techniques used in this generation of BI&A are rooted in statistical methods
and data mining techniques developed in the 70s and 80s respectively.
 The second generation of BI&A is a result of the development of the internet; it
encompasses analysis of web-based unstructured content.
 The third generation of BI&A is emerging as a result of smartphones, tablets and other
sensor-based information sources, and includes location-based, person-centered and
context-relevant analysis.
HADOOP FRAMEWORK

Unlike RDBMS and NoSQL, Hadoop does not refer to a type of database, but rather to a
software platform that allows for massively parallel computing. Hadoop is an open source
software framework consisting of several software modules that are targeted at processing big
data, that is, large volumes and a high variety of data. Core modules of the Hadoop ecosystem are
the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. Below we describe the
most popular modules of the Hadoop framework.

 HDFS is the software module that arranges the storage in a Hadoop big data ecosystem.
HDFS breaks data down into pieces and distributes these pieces to multiple nodes of
physical data storage in a system. The main advantages of HDFS are that it is designed to
be scalable and fault tolerant. Additionally, by dividing data into pieces HDFS prepares
data for parallel processing; a minimal usage sketch in Java is given after this list. Other
modules in the Hadoop framework are designed to take advantage of data distributed over
multiple nodes.
 MapReduce is a software framework that provides a programming model that takes full
advantage of parallel processing. Tasks programmed in MapReduce are divided into
smaller tasks, which are sent to the relevant nodes in the system. The MapReduce
framework takes care of the whole process: managing communication between nodes,
running tasks in parallel and providing redundancy and fault tolerance.
 HBase is a software module that runs as a non-relational database on top of HDFS. HBase
is a NoSQL database that stores data according to a key-value model. As a NoSQL type
of database, it requires low-level programming to query. Like other software modules of
Hadoop, HBase is open source and is modeled after Google’s BigTable database.
 Hive is essentially a data warehouse that runs on top of HDFS. Hive structures data into
concepts like tables, columns, rows and partitions, similar to a relational database. Data
in a Hive database can be queried using a (limited) SQL-like language named HiveQL.
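As an illustration of how an application interacts with HDFS, the following is a minimal sketch
using the Hadoop FileSystem Java API; the path and file contents are hypothetical, and the
configuration is assumed to be picked up from the cluster's core-site.xml.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);            // handle to the distributed file system

        Path file = new Path("/user/demo/sample.txt");   // hypothetical HDFS path

        // Write a small file; HDFS splits it into blocks and replicates them across data nodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes("UTF-8"));
        }

        // Read the file back through the same API.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}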
OVERVIEW OF MOBILE CLOUD

Here we assume a three-tier Mobile Cloud Computing (MCC) environment, where a set of M
access points (APs) comprise the backbone network. Tier one represents the master cloud, which
consists of several public cloud providers, such as Google App Engine, Microsoft Azure and
Amazon EC2. A set of high-speed interconnected cloudlets constitutes tier two, the backbone
layer of the mobile cloud architecture. Smartphones, wearable devices and other mobile devices
constitute tier three, the user layer. Users access the nearest cloud resources using devices from
tier three. A set of cloudlets is controlled and monitored by the master cloud (MC). All cloudlets
route their hypervisor information to the master cloud, to which they are connected via a
high-speed network connection.

Fig: Access control in the Big Data Analysis


Mobile Cloud Architecture

Each of the considered smart city applications (energy, mobility, Network, disaster
recovery) can be defined through the services provided to citizens and characterized by its
requirements in terms of:

Latency
The amount of time required by a certain application between the event happening and the event
being acquired by the system.

Energy consumption
The energy consumed for executing a certain application locally or remotely

Throughput
The amount of bandwidth required by a specific application to be reliably executed in the
smart city environment

Cloud Computing
The amount of computing processes requested by a certain application.
Exchanged data
The amount of input, output, and code information to be transferred by means of the
wireless network.
Storage
The amount of storage space required for storing the sensed data and/or the processing
application.

Users
The number of users needed to achieve reliable service
Challenges of Present Cloud

Providing sustainable solutions that jointly optimize data transfer over heterogeneous
networks and data processing is one of the main challenges. For handling video services, mobile
edge computing (MEC), which originated from the cloudlet concept, is one of the vital solutions.
MEC and the 5G mobile networks, when integrated, are going to improve the quality of service
in smart cities. In connection with the demands for mobile communication, current heterogeneous
networks face issues for information services in smart cities such as network convergence,
balancing in cell networks, handover, and interference. Cellular networks alone cannot resolve
these issues to satisfy the demands. Thus, considering the deployment of 4G ultra-dense wireless
networks, 4G converged cell-less communication networks are proposed to support mobile
terminals in smart cities.

OBJECTIVES

 Reduced cost, as the need to recollect and verify data is removed
 Integrated city systems and data-driven services
 A common understanding of the needs of communities
 Shared objectives, collaboratively developed and evidenced using data
 Engaged and enabled citizens and communities
 Transparency in decision-making models

Mobile Cloud Application:

The Mobile Cloud Computing (MCC) term was introduced after the concept of Cloud
Computing. Basically, MCC refers to an infrastructure where both the data storage and the data
processing happen outside of the mobile device. According to this definition, mobile applications
move the computation power and storage from the mobile phones to the cloud. MCC can be
thought of as a combination of cloud computing and the mobile environment. The cloud can be
used for computing power and storage, as mobile devices do not have powerful resources
compared to traditional computation devices.
Mobile Network

Medical applications in the mobile environment are called mobile Network applications
and are used for medical treatment, Security tracking, etc. The purpose of applying MCC in
medical applications is to reduce the drawbacks of traditional medical applications, such as small
physical storage, security and privacy issues, and medical errors.

Mobile Network provides these facilities:

Network monitoring services allow Security to be monitored at any time and anywhere
through the internet or a network provider. Emergency management systems help emergency
vehicles to be reached or managed effectively and in time when calls are received from incidents
and accidents. Network mobile devices detect pulse rate, blood pressure and level of alcohol,
integrated with a system that raises alerts in case of emergency. Network information of Security
can be stored for use in medical experiments or research. Mobile Network applications provide
users with ease and speed of access to resources at any time, from anywhere. With the help of the
cloud, mobile Network applications offer a variety of on-demand services on clouds rather than
standalone applications on local computers and servers. However, solutions have to be proposed
to protect participants' Network information and increase the privacy of users, as is done in
traditional applications.

Mobile Cloud Computing Advantages/Disadvantages

There are many reasons to use cloud computing with mobile applications. MCC provides
solutions to the obstacles that mobile subscribers usually face.

These advantages are:

Battery Life

Battery life is one of the main concerns in the mobile environment. There are already
several solutions for extending battery life, such as enhancing CPU performance or using the disk
and screen in an efficient manner to reduce power consumption. However, these solutions
generally require changes in the mobile devices’ structure or new hardware, which increases the
cost. Computation or data offloading techniques are instead suggested to migrate large and
complex computations from resource-limited devices such as mobile phones to powerful
machines such as servers in clouds. This avoids long application execution times on mobile
devices, which result in a large amount of power and/or read-write time consumption [4]. There
are many evaluations showing the effectiveness of these techniques.

Data storage capacity/Process power:

Another obstacle is the storage capacity of mobile devices, which is generally limited. To
overcome this problem, MCC can be used to access, query or store large data on the cloud through
wireless networks. There are several widely used examples, such as Amazon Simple Storage
Service (Amazon S3), which provides file storage on the cloud. In addition, MCC reduces the
time and energy consumption of compute-intensive applications, which is especially relevant for
resource-limited devices.

Reliability:

With the help of the CC paradigm, reliability can be improved since data and applications
are stored and backed up on a number of computers in the cloud. This provides more confidence
by reducing the chance of data loss on the mobile devices. In addition, protecting copyrighted
digital content such as music and video, and preventing its illegal distribution, becomes more
feasible in this model. Also, security services such as virus detection applications can easily be
provided and used in an efficient way without affecting mobile device performance. Furthermore,
the scalability and elasticity advantages of CC can be used in MCC as well, since cloud flexibility
applies to the whole infrastructure in the same way.

Privacy:

Privacy is an important issue when thinking about private data. As in the CC era, the same
trust problem arises with the mobile network providers and cloud providers. They can monitor all
the communication and data stored in the cloud or at the network provider, even though there are
encryption mechanisms to encrypt the data communicated or stored. From this perspective, it
remains a significant problem to be solved.
Communication:

The communication path is composed of multiple segments from the mobile subscriber to
the cloud provider. Therefore, there can be problems such as poor network speed or limited
bandwidth. This is a major concern because the number of mobile and cloud users is increasing
dramatically.

Challenges in mobile cloud computing

As mentioned in the previous section, Mobile Cloud Computing has many benefits and good
application examples for mobile users and service providers. On the other hand, as noted above,
there are also some challenges related to cloud computing and mobile network communication.
This section explains these obstacles and their solutions.

Mobile Side Challenges

On the mobile network side, the main obstacles and solutions are listed below:

Low Bandwidth

Bandwidth is one of the important issues in the mobile cloud environment because mobile
network resources are much smaller compared with traditional networks. Therefore, P2P media
streaming has been proposed for sharing limited bandwidth among users located nearby in the
same area who request the same content, such as the same video [6]. With this method, each user
transmits or exchanges parts of the same content with the other users, which results in improved
content quality, especially for videos.

Availability:

Network failures, out-of-signal errors, or poor performance due to high traffic are the main
threats that prevent users from connecting to the cloud. However, there are solutions to help
mobile users in the case of disconnection from the clouds. One of them is Wi-Fi based multihop
BDSA, a distributed content sharing protocol for situations without any infrastructure [7]. In this
mechanism, nearby nodes are detected in case the direct connection to the cloud fails. In this case,
instead of having a direct link to the cloud, the mobile user can connect to the cloud through
neighboring nodes. Although there are some concerns about security issues for such mechanisms,
these issues can also be solved.

Heterogeneity:

Various types of networks are used simultaneously in the mobile environment, such as
WCDMA, GPRS, WiMAX, CDMA2000, and WLAN. As a result, handling such heterogeneous
network connectivity becomes very hard while satisfying mobile cloud computing requirements
such as always-on connectivity, on-demand scalable connectivity, and energy efficiency of mobile
devices. This problem can be addressed by using standardized interfaces and messaging protocols
to reach, manage and distribute content.

Pricing:

Using multiple services on a mobile device requires agreements with both the mobile network
provider and the cloud service provider. However, these providers have different methods of
payment and different prices for services, features and facilities. This can lead to many problems,
such as how to determine the price, how the price should be shared among the providers or parties,
and how the subscribers should pay. As an example, when a mobile user wants to run a paid
mobile application on the cloud, three stakeholders are involved: the application provider for the
application license, the mobile network provider for the data communication from user to cloud,
and the cloud provider for providing and running the application on the cloud.
LITERATURE REVIEW

Marcos D. Assuncao et al. [1] have discussed approaches and environments for carrying
out analytics on clouds for Big Data applications. They have identified possible gaps in
technology and provided recommendations to the research community on future directions for
Cloud-supported Big Data computing and analytics solutions.
Khairul Munadi et al. [2] have proposed a conceptual image trading framework that
enables secure storage and retrieval over internet services. The aim is to facilitate secure
storage and retrieval of original images for commercial transactions, while preventing
untrusted server providers and unauthorized users from gaining access to the true contents.
Rupali S. Khachane et al. [3] have focused on the Privacy Homomorphism technique, which
aims to secure query processing from the client side to the cloud, using R-tree index queries and
a distance re-coding algorithm.
Badrish Chandramouli et al. [4] have proposed a new progressive analytics system based
on a progress model called Prism that allows users to communicate progressive samples to the
system and supports efficient and deterministic query processing over samples.
Satoshi Tsuchiya et al. [5] have discussed two fundamental technologies, distributed data
stores and complex event processing, as well as workflow description for distributed data
processing.
Divyakant Agrawal et al. [6] have presented an organized picture of the challenges faced
by application developers and DBMS designers in developing and deploying internet-scale
applications.
Preeti Tiwari et al. [7] have discussed that the performance of distributed query
optimization is improved when ACO is integrated with other optimization algorithms.
Haibo Hu et al. [8] have proposed a holistic and efficient solution that comprises a
secure traversal framework and an encryption scheme based on privacy homomorphism. The
framework is scalable to large datasets by leveraging an index-based approach. Based on this
framework, they devise secure protocols for processing typical queries such as k-nearest-
neighbor (kNN) queries on an R-tree index.
Ku Rahane et al. [9] have proposed a framework for big data clustering which
utilizes grid technology and an ant-based algorithm.
Sudipto Das et al. [10] have discussed and clarified some of the critical concepts in the
design space of big data and cloud computing, such as the appropriate systems for a specific
set of application requirements on mobile data.

Related Papers

Title: A Genetic Algorithm for Virtual Machine Migration in Heterogeneous Mobile Cloud
Computing.
Author: Md. Mofijul Islam, Md. Abdur Razzaque and Md. Jahidul Islam
Year: 2016
Description:
 Mobile Cloud Computing (MCC) improves the performance of a mobile application by
executing it at a resourceful cloud server.
 Virtual Machine (VM) migration in MCC brings cloud resources closer to a user so as to
further minimize the response time of an offloaded application.
 The key challenge is to find an optimal cloud server for migration that offers the
maximum reduction in computation time.
 The goal of GAVMM is to select the optimal cloud server for a mobile VM and to
minimize the total number of VM migrations, resulting in a reduced task execution time.

Advantages:
 Mass storage capacity and high-speed computing power.
 It assigns multiple tasks to VMs with larger bandwidth, while VMs with smaller bandwidth
are rarely assigned tasks.
 Load balancing of the entire system can be handled dynamically by using
virtualization technology
Disadvantages:
 VM placement problem is the hinge of scheduling and management in cloud data center.
 Limited-bandwidth and other limited resources.
 VM placement problem needs to consider the influence of network factors.
Algorithm:
 Genetic algorithm based virtual Machine migration (GAVMM).
Title: A Survey of Mobile Cloud Computing Application Models.
Author: Atta ur Rehman Khan, Mazliza Othman, Sajjad Ahmad Madani and Samee Ullah Khan.
Year: 2014
Description:
 Smartphones are now capable of supporting a wide range of applications, many of which
demand an ever increasing computational power.
 This poses a challenge because smartphones are resource-constrained devices with
limited computation power, memory, storage, and energy.
 The cloud computing technology offers virtually unlimited dynamic resources for
computation, storage, and service provision.
 The traditional smartphone application models do not support the development of
applications that can incorporate cloud computing features; this calls for specialized
mobile cloud application models.

Advantages:
 eXCloud transfers only the top stack frames, unlike traditional process migration
techniques in which full state migrations are performed.
 MAUI provides a programming environment where independent methods can be marked
for remote execution.
 The model offers a wide range of elasticity patterns to optimize the execution of applications
according to the users’ desired objectives.
Disadvantages:
 The sharing of data and state between weblets that execute at distributed locations
is prone to security issues.
 The data replication may give rise to data synchronization and integrity issues.
 The latency issue is very crucial in mobile cloud application models.

Algorithm:
Application partitioning algorithms such as
 All-step
 K-step

Title: Big Data-Driven Service Composition Using Parallel Clustered Particle Swarm
Optimization in Mobile Environment.
Author: M. Shamim Hossain, Mohd Moniruzzaman, Ghulam Muhammad, Ahmed Ghoneim and
Atif Alamri.
AtifAlamri.
Description:
 Mobile service providers support numerous emerging services with differing quality
metrics but similar functionality.
 The mobile environment is ambient and dynamic in nature, requiring more efficient
techniques to deliver the required service composition promptly to users.
 Selecting the optimum required services in a minimal time from the numerous sets of
dynamic services is a challenge.
 By using parallel processing, the optimum service composition is obtained in
significantly less time than alternative algorithms.

Advantages:
 The performance of this algorithm can be improved by using efficient optimization
techniques like PSO.
 Qualities of the mobile environment demand efficient optimization and clustering
techniques.

Disadvantages:
 The issue of parallel and distributed data operations where the structure of data is multi-
dimensional.
 Dynamic QoS and the rapidly changing nature of services in the mobile environment.

Algorithm:
 Particle swarm optimization
 k-means clustering

Title: Clone Cloud: Elastic Execution between Mobile Device and Cloud
Author: Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik and Ashwin Patti.
Description:
 Mobile applications are becoming increasingly ubiquitous and provide ever richer
functionality on mobile devices.
 Such devices often enjoy strong connectivity with more powerful machines ranging from
laptops and desktops to commercial clouds.
 CloneCloud uses a combination of static analysis and dynamic profiling to partition
applications automatically at a fine granularity while optimizing execution time and
energy use for a target computation and communication environment.
 At runtime, the application partitioning is effected by migrating a thread from the mobile
device at a chosen point to the clone in the cloud, executing there for the remainder of the
partition, and re-integrating the migrated thread back to the mobile device.

Advantages:
 Mobile devices lack the computing resources of desktops and laptops and place demands on
an extremely limited supply of energy.
 The granularity of partitioning is coarse since it is at class level, and it focuses on static
partitioning.
 Supporting native method calls was an important design choice we made, which
increases its applicability.

Disadvantages:
 Web page Consistency problem.
 Optimization problem.

Algorithm:
 DEFLATE compression algorithm
Title: Federated Internet of Things and Cloud Computing Pervasive Security Network
Monitoring System
Author: Jemal H. Abawajy and Mohammad Mehedi Hassan
Year: 2017
Description:
 In the conventional hospital-centric Network system, Security are often tethered to
several monitors.
 It develops an inexpensive but flexible and scalable remote Network status monitoring
system that integrates the capabilities of the IoT and cloud technologies for remote
monitoring of a Security’s Network status.
 It addresses Network spending challenges by substantially reducing inefficiency and waste,
as well as enabling Security to stay in their own homes and get the same or better care.
 To demonstrate the suitability of the proposed PPHM infrastructure, a case study for real-
time monitoring of a Security suffering from congestive heart failure using ECG is
presented.

Advantages:
 A flexible, energy-efficient, and scalable remote Security Network status monitoring
framework.
 A Network data clustering and classification mechanism to enable good Security care.
 Performance analysis of the PPHM framework to show its effectiveness.

Disadvantages:
 IoT-cloud convergence is crucial issue in Network application.
 Access control, location privacy, data confidentiality.

Algorithm:
 Rank correlation coefficient algorithm.
 Classification algorithm.

Title: Network Big Data Voice Pathology Assessment Framework.


Author: M. Shamim Hossain and Ghulam Muhammad
Year: 2016
Description:
 Network big data comprise data from different structured, semi-structured, and
unstructured sources.
 A framework is required that facilitates collection, extraction, storage, classification,
processing, and modeling of this vast heterogeneous volume of data.
 The machine learning algorithms in the form of a support vector machine, an extreme
learning machine and a Gaussian mixture model are used as the classifier.
 The proposed VPA system shows its efficiency in terms of accuracy and time
requirement.

Advantages:
 We are likely to see an increasingly diverse set of stakeholders involved, spanning the
technical, Network, and policy domains.
 Big data tools offer merits that facilitate the execution of specified tasks in the
Network ecosystem.

Disadvantages:
 Security, integrity and privacy violations of these data can cause irremediable damage to
the Network, or even death, of the individual and loss to society.
 The standardization and format of big data, big data transfer and processing, searching
and mining of big data, and management of services.
 Security with similar symptoms and diseases can share their experiences through social
media to get ad-hoc counseling, which constitutes a big data problem.

Algorithm:
 Support vector machines (SVM)
 Extreme learning machine (ELM)
 Gaussian mixture model (GMM)
Title: Migrate or not? Exploiting dynamic task migration in Mobile cloud computing systems.
Author: Lazaros Gkatzikis and Iordanis Koutsopoulos
Year: 2013
Description:
 Contemporary mobile devices generate heavy loads of computationally intensive tasks,
which cannot be executed locally due to the limited processing and energy capabilities of
each device.
 Cloud facilities enable mobile devices-clients to offload their tasks to remote cloud
servers, giving birth to Mobile Cloud Computing (MCC).
 The challenge for the cloud is to minimize the task execution and data transfer time to the
user, whose location changes due to mobility.
 Providing quality of service guarantees is particularly challenging in the dynamic MCC
environment, due to the time-varying bandwidth of the access links.

Advantages:
 The elasticity of resource provisioning and the pay-as-you-go pricing model.
 We delineate the performance benefits that arise for mobile applications and identify the
peculiarities of the cloud that introduce significant challenges in deriving optimal
migration strategies.
 Reducing the energy consumption of individual servers by moving the processes from
heavily loaded to less loaded servers (load balancing).

Disadvantages:
 A strategy that does not consider migration cost and download time.
 No migration.
Title: Mobiles on Cloud Nine: Efficient Task Migration Policies for Cloud Computing Systems.
Author: Lazaros Gkatzikis and Iordanis Koutsopoulos
Year: 2014
Description:
 Due to limited processing and energy resources, mobile devices outsource their
computationally intensive tasks to the cloud.
 Clouds are shared facilities and hence task execution time may vary significantly.
 It investigates the potential of task migrations to reduce contention for the shared
resources of a mobile cloud computing architecture in which local clouds are attached to
wireless access infrastructure.
 It devises online migration strategies that at each time make migration decisions
according to the instantaneous load and the anticipated execution time.

Advantages:
 The modification of program to incorporate state capture and recovery function.
 Simplified IT management and maintenance capabilities.
 Enormous computing resources available on demand.

Disadvantages:
 Classifying current computation offloading frameworks. Analyzing them by identifying
their approaches and crucial issues.
 Process migration applications are strongly connected with the system in the form of
sockets.
 Application development complexity and unauthorized access to remote data demand
a systematized plenary solution.
Title: Smart City Solution for Sustainable Urban Development
Author: Mostafa Basiri, Ali Zeynali Azim, Mina Farrokhi.
Year: 2017
Description:
 Large, dense cities can be highly efficient, and green, sustainable urban development is
highly desirable for their future.
 The rapid influx of citizens brings new challenges that city administrations must respond
to quickly.
 The globalization of urban economics, cities increasingly have to compete directly with
worldwide and regional economies for international investment to generate employment,
revenue and funds for development.
 Smart Cities are those towns which use information technology to improve both the
quality of life and accessibility for their inhabitants.

Advantages:
 Reducing resource consumption, notably energy and water, hence contributing to
reductions in CO2 emissions.
 Improving commercial enterprises through the publication of real-time data on the
operation of city services.
 The growing penetration of fixed and wireless networks that allow such sensors and
systems to be connected to distributed processing centers and for these centers in turn
to exchange information among themselves.
Disadvantages:
 Where there are threats of serious or irreversible damage, lack of full scientific
certainty shall not be used as a reason for postponing cost-effective measures to
prevent environmental degradation.
 The substitutability of capital.
 Sustainable development problem.

Technique:
 Information management technique
CHAPTER 3
NETWORK BIG DATA SOURCE ECO SYSTEM

Network big data is a revolutionary tool in the Network industry and is becoming vital in
current Security-centric care. Owing to the massive growth of data in the Network industry,
diverse data sources have been aggregated into the Network big data ecosystem. These data
sources are used by Network providers to make decisions and provide appropriate care. Major
data sources, along with the challenges involved, are discussed below:

1) Physiological data.

These data are huge in terms of volume and velocity. Regarding data volume, a variety of
signals is collected from heterogeneous sources to monitor Security characteristics, including
blood pressure, blood glucose, and heart rate. Sources include the electroencephalogram,
electrocardiogram, and electroglottogram. Data velocity can be observed in the growing rate of
data generation from continuous monitoring, which, especially for Security in a critical condition,
requires these signals to be processed in real time for decision making. These signals need to be
extracted efficiently and processed with suitable machine learning algorithms to provide
meaningful data for effective Security care. Efficient and comprehensive methods are also
required to analyze and process the collected signals to provide usable data to Network
professionals and other related stakeholders. The combination of EHRs and physiological signals
may increase the precision of data based on the surrounding context of the Security.

2) EHRs/EMRs.

EHRs or electronic medical records (EMRs) are digitized structured Network data from a
Security. The EHRs are collected from and shared among hospitals, research centers,
government agencies, and insurance companies. Security, integrity and privacy violations of
these data can cause irremediable damage to the Network, or even death, of the individual and
loss to society. Thus, big Network data security is now a key topic of research.
3) Medical images.

These images generate a huge volume of data that assists Network professionals in identifying or
detecting disease, planning treatment, and predicting and monitoring the Security. Medical
imaging techniques such as X-ray, ultrasound, or computed tomography scans play a crucial role
in diagnosis and prognosis. Owing to the complexity, dimensionality and noise of the collected
images, efficient image processing methods are required to provide clinically suitable data for
Security care.

4) Sensed data

Sensed data from Security are collected using different wearable or implantable
devices, environment-mounted devices, ambulatory devices, and sensors and smart phones, at
home or in hospitals. Sensed data form a key part of Network big data, as these sensors are
used to capture critical events or provide continuous monitoring. However, sensed data must be
collected, pre-processed, stored, shared and delivered correctly and in a reasonable time to be of
use to Network providers when making clinical decisions. Owing to the enormous volume of data
collected, automated algorithms are required to reduce noise and to allow for deployment
with big data analytics so that computation time can be reduced. Moreover, it is a challenge to
collect and collate multimodal sensed data from multiple sources at the same time.

5) Clinical notes.

The clinical notes, claims, recommendations, and decisions constitute one of the largest
unstructured sources of Network big data. Owing to the variety in format, reliability,
completeness, and accuracy of the clinical notes, it is challenging to ensure the Network care
provider has the correct information. Efficient data mining and natural language processing
techniques are required to provide meaningful data.
Fig: Big Network data source ecosystem.

6) Gene data.
The genome data makes a major contribution to Network big data. The human genome has a
huge number of genes; collecting, analyzing, and classifying data on these genes has taken years.
These gene data have now been integrated from the genetic level up to the physiological level of
the human being.

7) Social media data.


Social media connect Network professionals and Security outside their clinics, hospitals, and
homes through machine-to-machine, physician-to-Security, physician-to-physician, and
Security-to-Security communications. Security with similar symptoms and diseases can share
their experiences through social media to get ad-hoc counseling, which constitutes a big data
problem. Based on one study [18], 80% of unstructured data comes from social media. Messages,
trending medical images, location information, and other features of social media contribute to
Network big data. For example, social media has recently been used for investigating depression
and suicide rates using real-time emotional state analysis from Twitter. Because of the
heterogeneous nature of social Network media data, it is difficult to conduct data analysis and
provide meaningful data to Network big data stakeholders. Thus, these data need to be
appropriately mined, analyzed and processed to improve the quality of Network services for
Network providers.

Map Reduce Programming Model

The Map Reduce programming model divides computation into map and reduce phases,
as shown below.

Fig : The Map Reduce Programming Model

The map phase partitions the input data into many input splits and automatically stores them
across a number of machines in the cluster. Once the input data are distributed across the cluster,
the runtime creates a large number of map tasks that execute in parallel to process the input data.
The map tasks read in a series of key-value pairs as input and produce one or more intermediate
key-value pairs. A key-value pair is the basic unit of input to a map task. We use the Word
Count application, which counts the number of occurrences of each word in a series of text
documents, as an example. In the case of processing text documents, a key-value pair can be a
line in a text document. The user can customize the definition of a key-value pair.
public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    public void map(LongWritable key, Text value, Context context) { ... }
}

User-implemented Mapper class

public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    public void reduce(Text key, Iterable<LongWritable> values, Context context) { ... }
}

User-implemented Reducer class
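To make the Word Count example above concrete, the following is a minimal sketch of a
complete mapper and reducer pair using the standard org.apache.hadoop.mapreduce API; the class
names are illustrative, and each class would normally live in its own source file.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every word in an input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);   // intermediate key-value pair
        }
    }
}

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the per-word counts produced by all map tasks.
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        context.write(key, new LongWritable(sum));
    }
}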

Hadoop MapReduce Runtime

The Map Reduce runtime implements the Map Reduce model first proposed by Google. It
automatically handles task distribution, fault tolerance and other aspects of distributed computing,
making it much easier for programmers to write data-parallel programs. It also enabled Google to
exploit a large number of commodity computers to achieve high performance at a fraction of the
cost of a system built from fewer but more expensive high-end servers. Map Reduce scales
performance by scheduling parallel tasks on the nodes that store the task inputs. Each node
executes its tasks with only loose communication with other nodes. Hadoop is an open source
implementation of Map Reduce. To use the Hadoop Map Reduce framework, the user first writes
a Map Reduce application using the programming model described in the previous section. The
user then submits the Map Reduce job to a jobtracker, which is a Java application that runs in its
own dedicated JVM. The jobtracker is responsible for coordinating the job run. It splits the job
into a number of map/reduce tasks and schedules the execution of these tasks.
Fig : Hadoop runs a Map Reduce job
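A Map Reduce job is configured and submitted through a small driver program. The following is
a minimal sketch assuming the WordCountMapper and WordCountReducer classes shown
earlier; the input and output paths are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");      // job name is illustrative

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));     // hypothetical HDFS paths
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

        // Submits the job to the cluster and blocks until all map/reduce tasks finish.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The framework then splits the job into map and reduce tasks and schedules them on the nodes, as
described above.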

Hadoop Map Reduce for Multi-core

Task trackers have a fixed number of slots for map tasks and for reduce tasks. Each slot
corresponds to a JVM executing a task, and each JVM employs only a single computation thread.
To utilize more than one core, the user needs to configure the number of map/reduce slots based
on the total number of cores and the amount of memory available on each node. The configuration
can be set in the mapred-site.xml file; the relevant properties set the maximum numbers of map
and reduce tasks per task tracker. The example in Figure 3.5 shows how to set four map slots and
two reduce slots on each compute node. The setting can be used to express heterogeneity of the
machines in the cluster and can differ for each compute node, because different machines in the
cluster can have a different number of cores and differing amounts of memory.
For example, a typical hash join application requires each map task to store a copy of the lookup
table in memory. Duplicating the lookup table decreases the amount of memory available to each
map task. To make sufficient memory available to each map task, memory-intensive applications
are often forced to restrict the number of JVMs created to be smaller than the number of cores in
a node, at the expense of reduced CPU utilization. For example, in a machine with four cores and
4 GB of RAM, the system needs to create four map tasks to use the four cores. However, if 1 GB
of RAM is insufficient for each map task, the Hadoop MapReduce system can create only two
map tasks with 2 GB of RAM available to each task. With two map tasks, the runtime system
utilizes only two of the four available cores, or 50 percent of the CPU resources.

Fig: Hadoop Map Reduce on a four-core system
Map Reduce for Big Data Analysis

• There is a growing trend of applications that should handle big data. However, analyzing
big data is a very challenging problem today.

• For such applications, the Map Reduce framework has recently attracted a lot of
attention. Google’s Map Reduce or its open source equivalent Hadoop is a powerful tool
for building such applications

• Effective management and analysis of large-scale data poses an interesting but critical
challenge.

Recently, big data has attracted a lot of attention from academia and industry, as well as government.

Fig : Map Reduce for Big Data Analysis


Network provider

Big data proof-of-value and data lake build out for a Network provider. The goal was a
big data solution for predictive analytics and enterprise reporting. The challenge was that the
organization did not have an enterprise data warehouse to consolidate data from their enterprise
operational systems. Further, key business stakeholders were not provided with tools needed to
access data. Lastly, teams tasked with creating analytics spent 90% of their time integrating data
in SAS, leaving them little time to analyze the data or create predictive analytics. The successful
project involved meeting with key business and IT stakeholders to determine reporting and
analytic challenges and priorities, and also performing a Current State Assessment along with
metadata discovery, profiling and outlier analysis of the source data. Dell EMC proposed a data
lake architecture to address enterprise reporting and predictive analytic needs. The solution also
initiated a governance program to ensure data quality and to establish stewardship procedures.
Finally, the project identified federated business data lake hardware and the Pivotal big data suite
software as the target platform for the data lake.

The results of the project included a new client analytics environment that facilitated the
execution of analytics and reporting activities and reduced time to insight. Further, the client
governance structure ensured that metadata for new data sources brought into the data lake was
shared with users. The environment also supported the rapid creation of sandboxes to support
analytics work.

 boosting Security care, service levels and efficiency by simplifying data access
 staff can view Security information with high reliability
 saving money by avoiding data migrations and upgrades
 increasing agility by breaking free of proprietary file constraint

Hadoop is a strong example of a technology that allows Network to store data in its native
form. If Hadoop didn’t exist, decisions would have to be made about what can be incorporated
into the data warehouse or the electronic medical record (and what cannot). Now everything can
be brought into Hadoop, regardless of data format or speed of ingests. If a new data source is
found, it can be stored immediately. No data is left behind. By the end of 2017, the number of
Network records of millions of people is likely to increase into tens of billions. Thus, the
computing technology and infrastructure must be able to render a cost efficient implementation
of:

1. unconstrained parallel data processing
2. storage for billions and trillions of unstructured data sets
3. fault tolerance along with high availability of the system

Hadoop technology is successful in meeting the above challenges faced by the Network industry
as the Map Reduce engine and Hadoop Distributed File System (HDFS) have the capability to
process thousands of terabytes of data. Hadoop makes use of highly optimized, yet inexpensive
commodity hardware making it a budget friendly investment for the Network industry.

 Integrating Network data dispersed among different Network organizations and social
media.
 Providing a shared pool of computing resources that is capable of storing and analyzing
Network big data efficiently to take smarter decisions at the right time.
 Providing dynamic provision of reconfigurable computing resources which can be scaled
up and down upon user demand. This will help reduce the cost of cloud based Network
systems.
 Improving user and device scalability and data availability and accessibility in Network
systems

Big data analytics is motivated in Network through the following aspects:


 Network data are now growing very rapidly in terms of size, complexity, and speed of
generation, and traditional database and data mining techniques are no longer efficient at
storing, processing and analyzing these data. New, innovative tools are needed in order to
handle these data within a tolerable elapsed time.
 The Security’s behavioral data are captured through several sensors, along with the
Security's various social interactions and communications.
 The standard medical practice is now moving from relatively ad-hoc and subjective
Decision making to evidence-based Network.
 Inferring knowledge from complex heterogeneous Security sources and leveraging the
Security/data correlations in longitudinal records.
 Understanding unstructured clinical notes in the right context.
 Efficiently handling large volumes of medical imaging data and extracting potentially
useful information and biomarkers.
 Analyzing genomic data is a computationally intensive task, and combining it with standard
clinical data adds additional layers of complexity.

Fig: Mobile Cloud Computing and Big data Analytics


Big Data Analytics making efficient use of medical data
A lot of data is produced on a routine basis by hospitals, laboratories, retail, and non-
retail medical operations and promotional activities, but most of it is wasted because the people
concerned are not able to figure out what to do with those data. This is where Cloud-based Big
Data comes into the picture. Big data analytics tools and repositories take over much of the hard
work and generate reliable, quantitative insights from huge volumes of data within a matter of
seconds. This means that in the future we will need more doctors who are trained to work
with big data. The big data revolution is bringing up sophisticated methods of consolidating
information from tons of sources. The focus is on providing the most relevant and updated
information to doctors and medical practitioners in real time while they are consulting their
Security.

Improved computing resources at lower initial capital


Mobile Cloud computing is playing a vital role in making the Network industry more
Security-centric and data-driven. It helps in storing large amounts of data and sharing information
among hospitals, physicians, data analysts, and Security at a lower initial capital cost. It supports
big data sets for storing and processing medical imaging such as radiology, offloading genomic
data, and collecting Electronic Network Records. It is a boundary-less and flexible way to
compute and store data. Incorporating Mobile Cloud-based Big Data increases collaboration and
security in a cost-effective manner.

Big Data store and analyzing data from all possible resources
Until now, the collection of data has been limited to the major available resources in the
Network sector. However, with the advent of smartphone apps and wearables, data is now
everywhere, and this allows practitioners to know the Security's Network conditions in a more
precise manner. Apps that act like pedometers to measure your steps, calorie counters for your
diet, apps for monitoring and recording heart rate, blood pressure and blood sugar levels, and
wearable devices like Fitbit, Jawbone, etc. are all sources of data nowadays. In the near future,
the Security will share these data with the doctor, who can use them as a diagnostic toolbox to
provide better treatment in less time.
CHAPTER 4
BIG DATA NETWORK USING ANT COLONY OPTIMIZATION

Big Data Network is the drive to capitalize on growing Security and Network system data
availability to generate Network innovation. By making smart use of the ever-increasing amount
of data available, we can find new insights by re-examining the data or combining it with other
information. In Network this means not just mining Security records, medical images, biobanks,
test results , etc., for insights, diagnoses and decision support advice, but also continuous
analysis of the data streams produced for and by every Security in a hospital, a doctor’s office, at
home and even while on the move via mobile devices.
Current medical hardware, monitoring everything from vital signs to blood chemistry, is
beginning to be networked and connected to electronic Security records, personal Network
records, and other Network systems.

The resulting data stream is monitored by Network professionals and Network software systems.
This allows the former to care for more Security, or to intervene and guide Security earlier
before an exacerbation of their (chronic) diseases. At the same time data are provided for bio-
medical and clinical Smart Network care to mine for patterns and correlations, triggering a
process of “data-intensive scientific discovery”, building on the traditional uses of empirical
description, theoretical computer-based models and simulations of complex phenomena.

Big Data has been characterized as raising five essentially independent challenges:

 Volume
 Velocity
 Variety
 Veracity (lack thereof)
 Value (hard to extract)

As elsewhere, in Big Data Network the data volume is increasing, and so is data velocity
as continuous monitoring technology becomes ever cheaper. With so many types of tests, and the
existing wide range of medical hardware and personalized monitoring devices, Network data
could not be more varied; yet data from this variety of sources must be combined for processing
to reap the expected rewards. In Network, veracity of data is of paramount importance, requiring
careful data curation and standardization efforts, but at the same time it seems to be in opposition
to the enforcement of privacy rights.

Finally, extracting value out of big Network data for all its beneficiaries (clinicians,
clinical Smart Network care, pharmaceutical companies, Network policy-makers, etc.) demands
significant innovations in data discovery, transparency and openness, explanation and
provenance, summarization and visualization, and will constitute a major step towards the
coveted democratization of data analytics.

The following points will therefore need to be fleshed out:


 Management of big data
 Seamless end-to-end big data curation
 Data discovery, profiling, extraction, cleaning, integration, analysis, visualisation,
summarisation, explanation
 Use of big data
 Appropriate use of big data – avoiding over-reliance
 Responsible use of automated techniques
 Communicating big data findings to Security
 Integrating data analytics into clinical workflows

ANT Colony Optimization Technique

The Ant Colony Optimization (ACO) algorithm is a metaheuristic initially proposed by


Marco Dorigo in his PhD dissertation in 1992. “The original idea comes from observing the
exploitation of food resources among ants, in which ants’ individually limited cognitive abilities
have collectively been able to find the shortest path between a food source and the nest”.

It was first used to solve the traveling salesman problem (TSP). Because of its characteristics
of distributed computing, self-organization and positive feedback, ACO has been used in prior
works for routing in Sensor Networks. "Node Potential" is a heuristic used to evaluate the
potential of a next-hop candidate based on three factors: the candidate's distance to the sink node,
its distance to the nearest aggregation node, and its data correlation with the current node.

In this algorithm, random searching for the destination (sink node) is needed in early
iterations. We use a simpler heuristic that considers only the distance to the sink node. The overall
algorithm is composed of path construction, path maintenance, and aggregation schemes,
including a synchronization scheme, a loop-free scheme, and a collision-avoidance scheme. A
sketch of this next-hop selection rule is given below.
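The following is a minimal sketch of the standard ACO transition rule, in which the probability
of choosing a candidate node combines its pheromone level with a heuristic value, here taken as
the inverse of the distance to the sink in line with the simpler heuristic described above. The
parameter values, array layout and roulette-wheel selection are illustrative assumptions, not the
exact formulation used in the cited algorithms.

import java.util.List;
import java.util.Random;

// Sketch of ACO next-hop selection: weight(j) = pheromone(j)^ALPHA * eta(j)^BETA,
// with eta(j) = 1 / distanceToSink(j) as the heuristic.
public class AntNextHopSelector {
    private static final double ALPHA = 1.0;   // pheromone influence (assumed value)
    private static final double BETA = 2.0;    // heuristic influence (assumed value)
    private final Random random = new Random();

    public int selectNextHop(double[] pheromone, double[] distanceToSink, List<Integer> candidates) {
        double[] weights = new double[candidates.size()];
        double total = 0.0;
        for (int i = 0; i < candidates.size(); i++) {
            int node = candidates.get(i);
            double eta = 1.0 / distanceToSink[node];                 // closer to the sink -> more desirable
            weights[i] = Math.pow(pheromone[node], ALPHA) * Math.pow(eta, BETA);
            total += weights[i];
        }
        // Roulette-wheel selection: pick a node with probability proportional to its weight.
        double r = random.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < candidates.size(); i++) {
            cumulative += weights[i];
            if (r <= cumulative) {
                return candidates.get(i);
            }
        }
        return candidates.get(candidates.size() - 1);                // fallback for rounding errors
    }
}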

There is a problem ignored by the algorithms above. Although ACO aggregation algorithms
converge to a route very close to the optimum route, most of them only use a single path to transfer
data until an active node on the path runs out of battery. Then the path construction and data
delivery cycle starts again.

Although route discovery overhead can be reduced, those algorithms do not take into
consideration the limitations of WSNs, especially the energy limit of Sensor nodes and the number
of agents required to establish the routing. Repeatedly using the same optimal path exhausts the
relaying nodes’ energy quickly.

Relatively frequent efforts to maintain the Network and to explore new paths are therefore needed.
This approach is not energy efficient and results in a shorter Sensor node lifetime and,
consequently, a shorter Network lifetime. Algorithms that separate the path establishment and
data delivery processes suffer from this problem.

The data aggregation approach improves energy efficiency in Wireless Sensor Networks by
eliminating redundant packets, reducing end-to-end delay and Network traffic. This research
studies the effect of combining a data aggregation technique and a multi-path ACO algorithm with
different heuristics on Network lifetime and end-to-end delay.

Virtualization Resource Access

Virtualization technology introduces a middle layer between the hardware and software
layers in a cloudlet, allowing the hardware resources to be shared by means of VMs. Resources
(e.g., CPU, memory, network bandwidth, etc.) in a cloudlet are provisioned to these VMs.
Resource provisioning in cloud computing is a well-studied area. However, the mobility in MCC
introduces several challenges to maintaining an acceptable Quality of Service (QoS) when
provisioning cloud resources. Mobile users may move from one Access Point (AP) to another,
increasing the distance between their current locations and the cloudlet where the tasks are
provisioned. This increases the task-execution time. To address this issue, we propose a
VM migration technique for a heterogeneous MCC system that follows the user’s mobility pattern.
That is, when a user moves from one cloudlet to another cloudlet, the resource or VM must be
migrated to the cloudlet that is nearest to the user. Consider the following scenario: a blind
user is executing an application that takes an image from his surroundings. The application
processes the image in the cloudlet and gives a response to the user’s local client. That is, the
application continuously uploads some data and the cloud server processes this data to provide
responses back to the user.

Fig: Ant colony optimization


Now, if the blind user moves away from the current cloudlet, then he or she will experience a
delayed response from the mobile application executing in the cloudlet, degrading the overall
performance of the application. To avoid this performance degradation, it is necessary for the
system to adopt a VM migration method to choose a cloudlet that is currently closer to the user,
to which the VM can be migrated. User mobility is not the only reason forcing a VM to migrate.
Migration can also be initiated to minimize the over-provisioned resources and thus improve the
overall system objectives. For instance, if a VM is required to be migrated from a cloudlet to any
of the candidate cloudlets, the new cloudlet may not have the same type of VM. In that case, a VM
with more resources than the current one must be chosen and provisioned in order to migrate the
VM and thus minimize task-execution time.

Fig : Double bridge experiment. (a) Ants start exploring the double bridge. (b) Eventually
most of the ants choose the shortest path. Although a single ant is in principle capable of
building a solution (i.e., of finding a path between nest and food source), it is only the
colony of ants that presents the “shortest path finding” behavior. In a sense, this behavior is
an emergent property of the ant colony.

During VM migration, the migrated VM may be provisioned more resources than required. This
over-provisioned resource greatly decreases the system objectives, as it reduces the number of
VMs that can be provisioned in the cloudlets. Furthermore, a joint VM migration approach, where
a set of VMs is remapped based on the VM task-execution time and the over-provisioned resources,
can help to effectively increase the overall system objectives. In contrast to the joint VM
migration approach, single VM migration can only improve particular user objectives but not the
system objectives. In general, task migration decisions can be made at different levels:

 Cloud-wide task migration, where the task-migration decision is made by a central cloud,
which maximizes the objectives of a cloud provider.
 Server-centric task migration, where all migration decisions are made by the server,
where the task is currently executing.
 Task-based migration, where migration is initiated by the task itself.

Ant colony Algorithm


Step 1: while (termination criterion not satisfied)
Step 2: ant generation and activity();
Step 3: pheromone evaporation();
Step 4: daemon actions(); “optional”
Step 5: end while
Step 6: end Algorithm
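The pseudocode above is the generic ACO loop. The following Python sketch shows how the three
phases map onto code; the callables construct_solution, evaluate and deposit, and the parameter
values, are illustrative assumptions rather than part of the thesis.

def aco(construct_solution, evaluate, deposit, pheromone,
        n_ants=10, max_iterations=100, rho=0.1):
    """Generic ACO loop: ant activity, evaporation, optional daemon action."""
    best, best_cost = None, float("inf")
    for _ in range(max_iterations):                 # Step 1: termination criterion
        for _ in range(n_ants):                     # Step 2: ant generation and activity
            sol = construct_solution(pheromone)
            cost = evaluate(sol)
            deposit(pheromone, sol, cost)
            if cost < best_cost:
                best, best_cost = sol, cost
        for edge in pheromone:                      # Step 3: pheromone evaporation
            pheromone[edge] *= (1.0 - rho)
        deposit(pheromone, best, best_cost)         # Step 4: daemon action (optional)
    return best, best_cost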

Ant colony use Mobile Cloud Computing


1. Each cloud agent searches for the data requested for retrieval.
2. A cloud agent has its own memory, which enables it to store the searched data for later use;
based on the memory capacity, different types of data can be searched and stored for future
retrieval.
3. A cloud agent C can be assigned a start state Sc, and the retrieving storage buffer can be set
as Ec.
4. A cloud agent starts from an initial state and moves over all feasible data locations, building
the solution in an incremental way. The procedure stops when at least one queried data item has
been found.
5. A cloud agent that locates a data item at node f can move to a node g chosen in its feasible
neighborhood Nc through probabilistic decision rules. This can be formulated as follows: a cloud
agent c in state sr = <s(r-1), f> can move to any node j in its feasible neighborhood Nc, defined
as Nc = { j | (j ∈ Ni) ∧ (<sr, j> ∈ S) }, sr ∈ S, where S is the set of all states.
6. The probabilistic rule is a function of the following: a) the data stored in the local node and
the data structures Af = [Afg], called the ant routing table, obtained from pheromone trails and
heuristic values; b) the ant’s own memory from previous iterations; and c) the problem constraints.
7. When moving from node f to neighbor node g, the agent updates the pheromone trail τ(f,g) on
the edge (f, g).
8. Once the data is retrieved from the cloud, the agent retraces the same path backward, updates
the pheromone trails and closes the operation.
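A minimal sketch of steps 4–8 follows, assuming a neighbour map graph, a per-node heuristic
table eta and a pheromone dictionary keyed by edges; all of these names and the parameter values
are hypothetical illustrations, not the thesis's implementation.

import random

def retrieve(graph, eta, pheromone, start, has_data, beta=2.0, rho=0.1, max_hops=50):
    """One cloud agent: move probabilistically until queried data is found,
    then retrace the path backward and reinforce the pheromone trails."""
    path, node = [start], start
    while not has_data(node) and len(path) < max_hops:
        candidates = [g for g in graph[node] if g not in path]   # feasible neighbourhood
        if not candidates:
            return None                                          # dead end
        weights = [pheromone[(node, g)] * eta[g] ** beta for g in candidates]
        node = random.choices(candidates, weights=weights)[0]
        path.append(node)
    for f, g in zip(path, path[1:]):                             # backward update
        pheromone[(f, g)] = (1.0 - rho) * pheromone[(f, g)] + rho
    return path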

Algorithm Overview

In ACO algorithms, a colony of artificial ants is used to construct solutions guided by the
pheromone trails and heuristic information. The original idea of ACO comes from observing the
exploitation of food resources among ants. Ants explore the area surrounding their nest initially
in a random manner. As soon as an ant finds a source of food (source node), it evaluates the
quantity and quality of the food and carries some of it to the nest (sink node). During the
backtracking, the ant deposits a pheromone trail on the ground. The quantity of deposited
pheromone, which may depend on the quantity and quality of the food, will guide other ants to
the food source. The pheromone trails are simulated via a parameterized probabilistic model. The
pheromone model consists of a set of parameters. In general, the ACO approach attempts to find
the optimal routing by iterating the following two steps:
1. Solutions are constructed using a node selection model based on a predetermined
heuristic and the pheromone model, a parameterized probability distribution over the solution
space.
2. The solutions that were constructed in earlier iterations are used to modify the
pheromone values in a way that is deemed to bias the search toward high quality solutions.
The algorithm runs in two passes: forward and backward. In the forward pass, the route is
constructed by a group of ants, each of which starts from a unique source node. In the first
iteration, an ant searches for a route to the destination randomly. Later, an ant searches for the
nearest point of the previously discovered route. This can take many iterations before the ant
finds a correct path with a reasonable length. One solution is to flood the sink node ID from the
sink to all the Sensor nodes in the Network before any ant starts. The points where multiple ants
join are aggregation nodes. In the backward pass, every ant starts from the sink node and travels
back to the corresponding source node by following the path discovered in the forward pass.
Pheromone is deposited hop by hop during the traversal.
Nodes of the discovered path are given weights as a result of node selection, depending on the
node potential, which indicates the heuristic for reaching the destination. Pheromone trails are
the means by which ants communicate the discovered route to other ants. The trail followed by
ants most often gets more and more pheromone and eventually converges to the optimal route.
Pheromone on non-optimal routes evaporates with time. The aggregation points on the optimal
tree identify where data aggregation takes place.

Path Discovery Procedure

The procedure is mainly composed of forward and backward passes. In the forward pass,
an ant tries to explore a new path based on the heuristic rule and the pheromone amount on the
edges. Backtracking is used in the forward pass when an ant finds a dead end or is running into a
loop. In the backward pass, the ant updates the pheromone amount on the path constructed in the
forward pass. Other important components in the algorithms include data-aggregation, loop
control, and Network maintenance. In WSN, each node has a unique identity. Every node is able
to calculate and remember its current heuristic value. Initially, the sink node floods its identity to
all the nodes in the Network. After a node receives the packet, it computes its hop-count to the
sink node and correspondingly its initial heuristic value.

Forward Pass

Each ant is assigned a source node. After that, an ant starts from the source node and
moves towards the sink node using ad-hoc routing. The forward pass ends only when all the ants
have arrived at the sink node. Single-ant solution construction uses the following steps:

 If the node has been visited in the same iteration, follow a previous ant’s path
 Use a node selection rule
 If all the neighbors have been visited, use the shortest path
 If no neighbor nodes, backtrack to the previous node
 If no neighbor nodes and the previous node is dead, record the Network lifetime and exit the program

The current node sends the packet and the selected node receives it. Both nodes update their
residual energy after the transmission. If the current node does not have enough energy to send,
the transmission fails and the Network is maintained afterwards. Transmission failure is mostly
prevented by performing a receiving- and sending-energy check in the node selection step.
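The decision order in the list above can be sketched as follows; the network and ant objects and
their methods are assumptions made purely for illustration.

def forward_next_hop(ant, node, network, visited_this_iteration):
    """Choose the next hop for one ant in the forward pass."""
    if node in visited_this_iteration:                 # another ant already passed here
        return visited_this_iteration[node]            # follow the previous ant's path
    neighbours = network.alive_neighbours(node)
    fresh = [n for n in neighbours if n not in ant.path]
    if fresh:
        return ant.select_node(node, fresh)            # node selection rule (see below)
    if neighbours:
        return network.shortest_path_next_hop(node)    # all neighbours already visited
    if len(ant.path) > 1 and network.is_alive(ant.path[-2]):
        return ant.path[-2]                            # backtrack to the previous node
    network.record_lifetime()                          # previous node is dead: stop
    return None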

Backward Pass

Ants start from the sink node and move towards their source nodes. The ants follow the
paths discovered in the forward pass. Before an ant arrives at its source node, the algorithm
repeats:

 Retrieve the previous node in the path solution.
 Transmit the packet.
 If transmission fails, maintain the Network and terminate this ant.
 Encourage or discourage the node selection in the forward pass by depositing pheromone.



Data Aggregation in Forward Pass

Each Sensor node maintains two queues to store packets: a receiving queue and a sending
queue. The packet-sending process includes the following steps:

 Remove all the packets from the receiving queue.
 For “SinkDistNoAggre”, push all the packets into the sending queue.
 For the other aggregation algorithms, use the predefined function to aggregate all the received packets into one packet and push it into the sending queue.
 Among all the ants that arrived at this node, select the earliest ant as the aggregating ant.
 The aggregating ant will finish the rest of the routing construction in this iteration.
 All the later arriving ants become aggregated ants. They remember the aggregating ant.
 Each aggregated ant shares its path with the aggregating ant. The aggregating ant updates its subsequent hops with all the aggregated ants.
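A rough sketch of this aggregation step at one node is given below; the node and ant attributes
(receiving_queue, sending_queue, arrived_ants, merge_path, arrival_time) are illustrative
assumptions, not names from the thesis.

def process_packets(node, aggregate, algorithm="SinkAggreDist"):
    """Drain the receiving queue, aggregate, and pick the aggregating ant."""
    packets = list(node.receiving_queue)
    node.receiving_queue.clear()
    if algorithm == "SinkDistNoAggre":
        node.sending_queue.extend(packets)             # forward everything unchanged
    else:
        node.sending_queue.append(aggregate(packets))  # one aggregated packet
    ants = sorted(node.arrived_ants, key=lambda a: a.arrival_time)
    if not ants:
        return
    aggregating_ant = ants[0]                          # earliest ant keeps routing
    for ant in ants[1:]:                               # later ants become aggregated
        ant.aggregating_ant = aggregating_ant
        aggregating_ant.merge_path(ant.path)           # share paths for the backward pass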

Loop Control and Failure Handling


A “loop” is defined as the situation in which an ant revisits an already-visited node in the same
forward pass. Since each ant remembers its path, it can avoid running into a loop by comparing
the candidate node’s ID with the visited nodes’ IDs.
An ant is considered to have failed its task in an iteration if all the neighborhood nodes of the
current node have been visited. In that case, the ant uses the shortest path to deliver the packet
to the sink node; the node’s previous visiting history is not considered when choosing the next
node. A path resulting in “failure” is discouraged.
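Loop avoidance therefore amounts to a membership test against the ant's memory; a small
illustrative helper (hypothetical names) is shown below.

def loop_free_candidates(neighbours, visited_ids):
    """Return loop-free candidates and whether this hop counts as a failure."""
    fresh = [n for n in neighbours if n not in visited_ids]
    if fresh:
        return fresh, False
    # every neighbour was already visited: fall back to the shortest path,
    # and mark the iteration as a failure so the path is later discouraged
    return list(neighbours), True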

Network Maintenance

When a node does not have sufficient energy to send or receive (a “dead node”), it is
removed from the neighbor list of its neighborhood nodes. Nodes with a larger hop-count than the
“dead node” recalculate their hop-count and heuristic value. If the “dead node” is a source node,
the node with the maximum energy in the Network is selected as the new source. Afterwards, the
source node of the ant is updated.

If the “dead node” is the sink node, the node is recharged with more energy. The sink node is
different from other nodes because it needs to perform more frequent transmission and
computation for the purpose of the application. Therefore, it is assumed that the sink node has
plenty of energy to last until the Network dies.

 Next Node Selection


To support next node selection, rules are established and followed in the forward pass.
These rules check the candidate node’s probability calculated from heuristic and pheromone
values. The heuristic is updated whenever the value is changed. The pheromone is updated in the
backward pass according to the rules set.

 Node selection rules

Two node selection rules are used: “LeadingExploration” and “CombinedRule”. The
“SinkDistNoAggre,” “SinkDistLead,” and “ResidualEnergy” algorithms use “LeadingExploration”
because the first-found best candidate node needs to be selected. The “SinkDistComb” and
“SinkAggreDist” algorithms use “CombinedRule” so that multiple paths can be established.

 Leading Exploration

Among all the neighborhood nodes, select the first node with the highest probability, even
if there are multiple nodes with the same probability. This method is deterministic. In every
iteration, an ant always discovers the same path to the sink node until one of the intermediate
nodes dies. If the same Network topology is tested repeatedly, the total energy cost and Network
lifetime are the same.

 Combined Rule
Node selection is divided into sessions. Each session includes one or more iterations. A
node discovered in the current or a previous iteration is used. Similar to “Leading
Exploration,” the probability of each neighborhood node is calculated. A group of nodes with the
highest probability is stored in a cache. In each iteration, one node is randomly selected and
removed from the cache. When the cache is empty, the probability calculation over all the alive
neighborhood nodes is repeated.

 Probability Calculation
When a node is ready to send a packet, it calculates the probability of all its neighbors
using the equation below.

p_ij^k = [ τij · (ηj)^β ] / Σ_(l∈Ni) [ τil · (ηl)^β ], j ∈ Ni (1)

In equation (1), an ant k holding a data packet at node i chooses to move to node j on its way
towards the sink node, where τ is the pheromone, η is the heuristic, Ni is the set of neighbors of
node i, and β is a parameter which determines the relative importance of pheromone versus
distance (β > 0). The value of η is calculated using equation (2); multiple factors can be used,
and each one is weighted.
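Assuming the standard ACO form written as equation (1), the selection probabilities can be
computed as in the sketch below; tau (keyed by edges) and eta (keyed by nodes) are hypothetical
containers used only for illustration.

def selection_probabilities(i, neighbours, tau, eta, beta=2.0):
    """Equation (1): probability of moving from node i to each neighbour j."""
    scores = {j: tau[(i, j)] * (eta[j] ** beta) for j in neighbours}
    total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}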

 Pheromone Update Rules

The pheromone value is associated with the link (edge) between two nodes. Each edge has a
pheromone value, which is initially the same for all edges. The value is updated in each iteration
in order to bias the node selection process in the next iteration; it is updated twice per iteration.

 Evaporation on all edges


After all the ants finish the forward pass and before they go backward, the pheromone values on
all the edges in the Network evaporate at rate ρ; the value is consistently reduced. Equation (3)
shows how the evaporated pheromone value is calculated.

τij = (1 – ρ) × τij (3)

 Deposit pheromone during backward pass

In the backward pass, each ant deposits or reduces the pheromone value on its own
solution path. This step is different from the conventional ACO algorithm, in which pheromone
is always deposited using the same rate. Encouraging or discouraging a node choice in the
forward pass depends on the comparison of performance in the forward pass with the one of the
best iteration found so far. The new pheromone is calculated using equation (4). Equations (5)
and (6) are used to support equation (4).

τij = (τij + ρ·Δτij) × e0 (4)

Δτij = [ζ + (hi – hj)] × Δωj (5)

Δωj = ∑ (6)

In equation (4), ρ is the pheromone decay parameter, τij is the pheromone value on the
edge between nodes i and j, and e0 is the encouraging or discouraging rate derived from the
forward pass. A path resulting in less energy consumption and a smaller total hop-count is
preferred. The best iteration is the one with the least energy consumption and hop-count among
all previous iterations. It is used as a reference to calculate e0 in the current iteration. If the
forward pass is a failed path exploration, or used more hop-count and energy consumption than
the best iteration, the path is discouraged: a very small amount of pheromone is deposited on the
edge to differentiate it from links that have not been visited, and e0 is set to a predetermined
“PunishRate,” which is a relatively low rate between 0 and 1.
If the forward pass found a path using the same hop-count and energy consumption as the best
iteration, e0 is set to a relatively higher rate between 0 and 1, the “encourageRate.” If the
forward pass found a path with the same hop-count but less energy consumption than the best
iteration, e0 = 1.5 × encourageRate. If the forward pass found a path using less hop-count and
energy consumption than the best iteration, e0 = hop-count difference × encourageRate.
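The choice of e0 described above can be written compactly; path and best are hypothetical records
holding the hop-count, energy consumption and a failure flag of the current forward pass and of
the best iteration so far, and the default rates are arbitrary illustrative values.

def compute_e0(path, best, punish_rate=0.1, encourage_rate=0.6):
    """Encouraging/discouraging rate e0 used in equation (4) (sketch)."""
    worse = path.hop_count > best.hop_count and path.energy > best.energy
    if path.failed or worse:
        return punish_rate                              # discourage this path
    if path.hop_count == best.hop_count and path.energy == best.energy:
        return encourage_rate
    if path.hop_count == best.hop_count and path.energy < best.energy:
        return 1.5 * encourage_rate
    if path.hop_count < best.hop_count and path.energy < best.energy:
        return (best.hop_count - path.hop_count) * encourage_rate
    return encourage_rate   # mixed cases are not specified in the text above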

In equation (5), ζ is a positive number, hi is the hop-count between node i and the sink,
and hj is the hop-count between node j and the sink. If the value of (hi – hj) is greater than
zero, it can be concluded that node j is closer to the sink node than node i. Therefore, the
algorithm rewards the path from node i to node j by depositing more pheromone. If the value
equals zero, it means that nodes i and j have the same hop-count to the sink, and the algorithm
lays little pheromone on the path. If the value is less than zero, the algorithm does not lay
pheromone on this path. In equation (6), Rj is the total hop-count of these sources before
visiting node j. Therefore, Δωj is the total hop-count of some sources to the sink through node j.
The less the total hop-count, the larger the amount of pheromone added on the path from node i to
node j, as shown in equation (5).
This means that more ants are encouraged to follow this path. For an aggregation node, it updates
the pheromone levels of all its neighbors by equation (4) when an ant moves to it. If a node does
not have ants visiting it within a limited time, its pheromone is evaporated according to equation (3).
CHAPTER 4
HONEYPOT DETECTION USING SMART NETWORK CARE

History of Honeypots:
The concept of Honeypots was first described by Clifford Stoll in 1990, in a book based on a real
story that happened to Stoll. He discovered a hacked computer and decided to learn how the
intruder had gained access to the system. To track the hacker back to his origin, Stoll created a
fake environment with the purpose of keeping the attacker busy. The idea was to track the
connection while the attacker was searching through prepared documents. Stoll did not call his
trap a Honeypot; he just prepared a network drive with fake documents to keep the intruder on his
machine. Then he used monitoring tools to track the hacker’s origin and find out how he got in.

In 1999 the idea was picked up again by the Honeynet Project, founded and led by
Lance Spitzner. Over years of development the Honeynet Project created several papers on
Honeypots and introduced techniques to build efficient Honeypots. The Honeynet Project is a
non-profit research organization of security professionals dedicated to information security.

Types of Honeypots:

To describe Honeypots in greater detail it is necessary to define the types of Honeypots. The type
also defines their goal, as we will see in the following. A very good description of these types
can also be found in the literature.

The idea of Honeypots:

The general concept of Honeypots is to catch malicious network activity with a prepared
machine. This computer is used as bait. The intruder is meant to detect the Honeypot and try
to break into it. The type and purpose of the Honeypot then determine what the attacker will be
able to do. Often Honeypots are used in conjunction with Intrusion Detection Systems (IDS). In
these cases Honeypots serve as production Honeypots and only extend the IDS. In the concept of
Honeynets, however, the Honeypot is the major part and the IDS is set up to extend the
Honeypot’s recording capabilities.

Fig: Single Honey pot detection in system

A common setup is to deploy a Honeypot within a production system. In the figure above the
Honeypot is colored orange. It is not registered in any naming servers or other production
systems, i.e. the domain controller. In this way no one should know about the existence of the
Honeypot. This is important, because only within a properly configured network can one assume
that every packet sent to the Honeypot is suspected to be an attack. If misconfigured packets
arrive, the number of false alerts will rise and the value of the Honeypot drops.

Production Honeypots are primarily used for detection. Typically they work as an extension to
Intrusion Detection Systems, performing an advanced detection function. They also prove whether
existing security functions are adequate: if a Honeypot is probed or attacked, the attacker must
have found a way to the Honeypot. This could be a known way, which is hard to lock down, or even
an unknown hole. However, measures should be taken to avoid a real attack. With the knowledge of
the attack on the Honeypot it is easier to determine and close security holes.
A Honeypot also helps justify the investment in a firewall. Without any evidence that there were
attacks, someone from management could assume that there are no attacks on the network and
suggest stopping investment in security, as there appear to be no threats. With a Honeypot there
is recorded evidence of attacks, and the system can provide information for monthly attack
statistics.

LogRhythm’s Honeypot Security Analytics Suite


LogRhythm’s Honeypot Security Analytics Suite allows customers to centrally manage
and continuously monitor honeypot event activity for adaptive threat defense. When an attacker
begins to interact with the honeypot, LogRhythm’s Security Intelligence Platform begins
tracking the attacker’s actions, analyzing the honeypot data to create profiles of behavioral
patterns and attack methodologies based on the emerging threats. This automated and integrated
approach to honeypots eliminates the need for the manual review and maintenance associated
with traditional honeypot deployments.

Fig :LogRhythm’s Honeypot Security Analytics Suite

The Honeypot Security Analytics Suite provides AI Engine rules that perform real-time,
advanced analytics on all activity captured in the honeypot, including successful logins to the
system, observed successful attacks, and attempted/successful malware activity on the host. As a
result, the Honeypot suite allows AI Engine to also detect when similar activity captured from
the honeypot is observed on the production network. For example, if an observed attacker
interaction on the honeypot is followed by a subsequent interaction with legitimate hosts within
the environment such as production web servers, LogRhythm can generate an alarm alerting IT
and security personnel to the suspicious activity.

Prevent Compromised Credentials


Challenge
The majority of attacks exploit valid user credentials to gain unrestricted access to the corporate
network. Organizations need an effective means of monitoring for insecure accounts and
passwords to prevent credentials from being compromised.
Solution
LogRhythm’s Honeypot Security Analytics Suite provides AI Engine rules that monitor for
successful and unsuccessful logon attempts to honeypot servers, capturing details on the
username and password. This allows analysts to see commonly attempted username and
password combinations on the honeypot hosts.
Additional Benefit
By knowing which accounts are being targeted by hackers and which passwords are vulnerable
to exploit in the honeypot, organizations are able to strategically increase defense measures
within their network by monitoring at-risk user accounts and enforcing stricter password
policies. A SmartResponse plugin can automatically add the IP address observed in the
honeypot to a firewall block list to prevent interaction with the corporate network.
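The firewall step can be illustrated with a generic response script; this is a hypothetical
stand-in written in Python, not LogRhythm's actual SmartResponse API, and the iptables chain name
is an assumption.

import subprocess

def block_attacker(ip_address, chain="HONEYPOT_BLOCK"):
    """Append a DROP rule for an IP observed interacting with the honeypot."""
    subprocess.run(
        ["iptables", "-A", chain, "-s", ip_address, "-j", "DROP"],
        check=True,
    )

# e.g. block_attacker("203.0.113.42")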
Flow Chart

Fig: MCC User Authentication

Transferring data from one remote system to another under the control of a local system
is called remote uploading. Remote uploading is used by some online file hosting services. It is
also used when the local computer has a slow connection to the remote systems but the remote
systems have a fast connection between them. Without remote uploading functionality, the data
would first have to be downloaded to the local host and then uploaded to the remote file hosting
server, both times over slow connections.
Algorithm Implementation

Algorithm: Honey Bee Based VM Migration, at Cloudlet i

INPUT: VM set V and accessible cloudlet set Cu for each VM u; system parameters

OUTPUT: VM-cloudlet pairs

Step 1: Initialize system tuning parameter α

Step 2: Initialize system migration parameters γl, γg

Step 3: Determine candidate VM set Vs using Algorithm 1

Step 4: Initialize ant set A

Step 5: Generate an initial solution using the FFVM Algorithm 3

Step 6: Calculate initial pheromone γ0

Step 7: Set maximum iteration MAX_IT

Step 8: while (iteration ≤ MAX_IT) do

Step 9:   for (Ant a ∈ A) do

Step 10:    k = 0

Step 11:    repeat

Step 12:      Select a VM v in cloudlet j for VM uk ∈ Vs

Step 13:      k = k + 1

Step 14:    until every VM is provisioned to a cloudlet

Step 15:    for (VM k ∈ Vs) do

Step 16:      Update the local pheromone

Step 17:    end for

Step 18:  end for

Step 19:  Update the global pheromone

Step 20:  iteration = iteration + 1

Step 21: end while

Step 22: Return VM-cloudlet pairs
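The migration loop above can be sketched in Python as below; the cost function (e.g., estimated
task-execution time plus an over-provisioning penalty) and all parameter values are assumptions
made for illustration, not the thesis's actual implementation.

import random

def aco_vm_migration(vms, cloudlets, cost, n_ants=8, max_it=50,
                     rho=0.1, tau0=1.0, beta=2.0):
    """ACO-style migration: each ant maps every candidate VM to a cloudlet.
    cost(vm, c) is assumed to return a positive value."""
    pheromone = {(v, c): tau0 for v in vms for c in cloudlets}
    best_assign, best_cost = None, float("inf")

    for _ in range(max_it):
        for _ in range(n_ants):
            assign = {}
            for vm in vms:                                    # provision every VM
                weights = [pheromone[(vm, c)] / (1.0 + cost(vm, c)) ** beta
                           for c in cloudlets]
                chosen = random.choices(cloudlets, weights=weights)[0]
                assign[vm] = chosen
                # local pheromone update on the chosen (vm, cloudlet) pair
                pheromone[(vm, chosen)] = ((1 - rho) * pheromone[(vm, chosen)]
                                           + rho * tau0)
            total = sum(cost(vm, c) for vm, c in assign.items())
            if total < best_cost:
                best_assign, best_cost = assign, total
        # global pheromone update along the best assignment found so far
        for vm, c in best_assign.items():
            pheromone[(vm, c)] = (1 - rho) * pheromone[(vm, c)] + rho / best_cost
    return best_assign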

Honeypots improve network security


Honeypots turn the tables on hackers and computer security experts. While in the classical field
of computer security a computer should be as secure as possible, in the realm of Honeypots
security holes are opened on purpose. In other words, Honeypots welcome hackers and other
threats.

Purpose of Honeypot Technique:


The purpose of a Honeypot is to detect and learn from attacks and to use that information to
improve security. A network administrator obtains first-hand information about the current
threats on his network. Undiscovered security holes can be protected using the information gained
from a Honeypot. A Honeypot is a computer connected to a network. It can be used to examine
vulnerabilities of the operating system or network. Depending on the setup, security holes can be
studied in general or in particular. Moreover, it can be used to observe the activities of an
individual who gained access to the Honeypot. Honeypots are a unique tool to learn about the
tactics of hackers. So far, network monitoring techniques have used passive devices such as
Intrusion Detection Systems (IDS). An IDS analyzes network traffic for malicious connections
based on patterns, which can be particular words in packet payloads or specific sequences of
packets. However, there is the possibility of false positive alerts due to a pattern mismatch or,
even worse, false negative alerts on actual attacks. On a Honeypot, every packet is suspicious.
Fig: Honey pot setup

Research Smart Network Honeypots:


As the name suggests these honeypots are deployed and used by Smart Network care or
curious individuals. These are used to gain knowledge about the methods used by the black hat
community. They help security Smart Network care learn more about attack methods and help in
designing better security tools. They can also help us detect new attack methods or bugs in
existing protocols or software. They can also be used to strengthen or verify existing intrusion
detection systems. They can provide valuable data which can be used to perform forensic or
statistical analysis.

Production Honeypots:
These honeypots are deployed by organizations as part of their security infrastructure and add
value to the organization’s security measures. They can be used to refine an organization’s
security policies and validate its intrusion detection systems. Production honeypots can provide
warnings ahead of an actual attack. For example, a burst of HTTP scans detected by the honeypot
is an indicator that a new HTTP exploit might be in the wild. Normally, commercial servers have
to deal with large amounts of traffic and it is not always possible for intrusion detection
systems to detect all suspicious activity. Honeypots can function as early warning systems and
give security administrators hints and directions on what to look out for.

Fig: Process of finding the minimum distance

The real value of a honeypot lies in it being probed, scanned and even compromised, so it
should be made accessible to computers on the Internet or at least as accessible as other
computers on the network. As far as possible the system should behave as a normal system on
the Internet and should not show any signs of it being monitored or of it being a honeypot. Even
though we want the honeypot to be compromised it shouldn’t pose a threat to other systems on
the Internet. To achieve this, network traffic leaving the honeypot should be regulated and
monitored.

Security Issues:
Honeypots don’t provide security (they are not a securing tool) for an organization but if
implemented and used correctly they enhance existing security policies and techniques.
Honeypots can be said to generate a certain degree of security risk and it is the administrator’s
responsibility to deal with it. The level of security risk depends on their implementation and
deployment. There are two views of how honeypot systems should handle its security risks.
Honeypots that fake or simulate: There are honeypot tools that simulate or fake services or
even fake vulnerabilities. They deceive an attacker into thinking he is accessing a particular
system or service. A properly designed tool can be helpful in gathering more information about a
variety of servers and systems. Such systems are easier to deploy, can be used as alerting
systems and are less likely to be used for further illegal activities.

Honeypots that are real systems: This is a viewpoint that states that honeypots should not be
anything different from actual systems since the main idea is to secure the systems that are in
use. These honeypots don’t fake or simulate anything and are implemented using actual systems
and servers that are in use in the real world. Such honeypots reduce the chances of the hacker
knowing that he is on a honeypot. These honeypots have a high risk factor and cannot be
deployed everywhere. They need a controlled environment and administrative expertise. A
compromised honeypot is a potential risk to other computers on the network or for that matter
the Internet.

Legal issues:
To start with, a honeypot should be seen as an instrument of learning, though there is a
viewpoint that honeypots can be used to “trap” hackers. Such an idea could be considered
entrapment. The legal definition of entrapment is: “Entrapment is the conception and planning of
an offense by an officer, and his procurement of its commission by one who would not have
perpetrated it except for the trickery, persuasion, or fraud of the officers.”

This legal definition applies only to law enforcement, so organizations or educational
institutions cannot be charged with entrapment. The key to establishing entrapment is
“predisposition”: would the attacker have committed the crime without the “encouragement
activity”? As long as one does not entice the hacker in any way, it cannot be considered
entrapment. The issue of privacy is also of concern with respect to the monitoring and
intercepting of communication. Honeypots are systems intended to be used by nobody.
Role of Honeypots in Network Security:
Honeypots and related technologies have generated a great deal of interest in the past two
years. Honeypots can be considered one of the latest technologies in network security today. The
Honeynet Project is actively involved with the deployment and study of honeypots. Honeypots are
used extensively in research, and it is only a matter of time before they are used in production
environments as well.

Fig: Security Honeypot


Security categories:
To assess the value of Honeypots we will break down security into three categories as
defined by Bruce Schneier in Secrets and Lies. Schneier breaks security into prevention,
detection and response.

Prevention:
Prevention means keeping the bad guys out. Normally this is accomplished by firewalls and
well-patched systems. The value Honeypots can add to this category is small. If a random attack
is performed, Honeypots can detect that attack, but not prevent it, as the targets are not
predictable. One case where Honeypots help with prevention is when an attacker is directly
hacking into a server. In this case a Honeypot would cause the hacker to waste time on a
worthless target and help prevent an attack on a production system. But this means that the
attacker has attacked the Honeypot before attacking a real server, and not the other way around.
Also, if an institution publishes the information that it uses a Honeypot, this might deter
attackers from hacking; but this is more in the field of psychology and too abstract to add
measurable value to security.

Detection:
Detecting intrusions in networks is similar to the function of an alarm system for
protecting facilities. Someone breaks into a house and an alarm goes off. In the realm of
computers this is accomplished by Intrusion Detection Systems.

Fig: Intrusion Detection based on Web server

The problems with these systems are false alarms and undetected attacks. A system might alert on
suspicious or malicious activity even if the data was valid production traffic. Due to the high
traffic on most networks it is extremely difficult to process every packet, so the chance of
false alarms increases with the amount of data processed. High traffic also leads to undetected
attacks: when the system is not able to process all data, it has to drop certain packets, which
leaves them unscanned. An attacker could benefit from such high loads of network traffic.
Response:
After successfully detecting an attack we need information to prevent further threats of the same
type. Or, in case an institution has established a security policy and one of the employees has
violated it, the administration needs proper evidence. Honeypots provide exact evidence of
malicious activities. As they are not part of the production systems, any packet sent to them is
suspicious and recorded for analysis. The difference from a production server is that there is no
regular traffic, such as traffic to and from a web server. This reduces the amount of data
recorded dramatically and makes evaluation much easier. With that specific information it is
fairly easy to start effective countermeasures.

Concept, architecture and terms of a Honeypot:


This chapter defines concepts, architecture and terms used in the realm of Honeypots. It
describes the possible types of Honeypots and the intended usage and purpose of each type.
Further auxiliary terms are explained to gain a deeper understanding about the purpose of
Honeypot concepts.

Black hats and White hats:


In the computer security community, a Black hat is a skilled hacker who uses his or her ability to
pursue personal interests illegally. They are often economically motivated, or may be representing
a political cause; sometimes, however, it is pure curiosity. The term “Black hat” is derived from
old Western movies where outlaws wore black hats and outfits and heroes typically wore white
outfits with white hats. White hats are ethically opposed to the abuse of computer systems. A
White hat generally concentrates on securing IT systems, whereas a Black hat would like to break
into them.
Both Black hats and White hats are hackers, and both are skilled computer experts, in contrast to
the so-called “script kiddies.” Script kiddies could be referred to as Black hats, but this would
be a compliment to such individuals. From the work of real hackers, script kiddies extract
discovered and published exploits and merge them into a script. They do not develop their own
exploits or discover vulnerabilities. Instead they use tools published by the Black hat community
and create random damage.
Level of interaction

So far, Honeypots have been described by their field of application. To describe them in greater
detail it is necessary to explain the level of interaction with the attacker.

Low-interaction Honeypots:
A low-interaction Honeypot emulates network services only to the point that an intruder can log
in but perform no actions. In some cases a banner can be sent back to the origin, but not more.
Low-interaction Honeypots are used only for detection and serve as production Honeypots.
In comparison to IDS systems, low-interaction Honeypots also log and detect attacks. Furthermore,
they are capable of responding to certain login attempts, while an IDS stays passive. The attacker
will only gain access to the emulated service; the underlying operating system is not touched in
any way. Hence this is a very secure solution which poses little risk to the environment in which
it is installed.
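As an illustration only, a minimal low-interaction service of this kind can be sketched in a few
lines of Python; the port and banner string are arbitrary assumptions and this toy is not a
production honeypot.

import socket
import datetime

def banner_honeypot(port=2222, banner=b"SSH-2.0-OpenSSH_7.4\r\n"):
    """Toy low-interaction honeypot: send a banner, log the peer, allow nothing else."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        print(f"{datetime.datetime.now().isoformat()} connection from {addr[0]}:{addr[1]}")
        try:
            conn.sendall(banner)      # the emulated service stops here: no real login
            conn.recv(1024)           # record whatever the intruder sends first
        finally:
            conn.close()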

Medium-interaction Honeypots:
Medium-interaction Honeypots are further capable of emulating full services or specific
vulnerabilities, i.e. they could emulate the behavior of a Microsoft IIS web server. Their primary
purpose is detection and they are used as production Honeypots.
Similar to low-interaction Honeypots, medium-interaction Honeypots are installed as an
application on the host operating system and only the emulated services are presented to the
public. But the emulated services on medium-interaction Honeypots are more powerful, thus the
chance of failure is higher, which makes the use of medium-interaction Honeypots more risky.

High-interaction Honeypots:
These are the most elaborated Honeypots. They either emulate a full operating system or use a
real installation of an operating system with additional monitoring. High-interaction Honeypots
are used primarily as research Honeypots but can also serve as production Honeypots.
As they offer a full operating system the risk involved is very high. An intruder could easily use
the compromised platform to attack other devices in the network or cause bandwidth losses by
creating enormous traffic.
Types of attacks:
There are a lot of attacks on networks, but there are only two main categories of attacks.

Random attacks:
Most attacks on the internet are performed by automated tools, often used by unskilled
users, the so-called script kiddies, that search for vulnerabilities or already-installed backdoors
(see introduction). This is like walking down a street and trying to open every car by pulling the
handle: by the end of the day at least one car will be discovered unlocked. Most of these attacks
are preceded by scans of the entire IP address range, which means that any device on the net is a
possible target.

Direct attacks:
A direct attack occurs when a Black hat wants to break into a system of choice, such as
an e-commerce web server containing credit card numbers. Here only one system is targeted, often
through unknown vulnerabilities. A good example of this is the theft of 40 million credit card
details at MasterCard International: CardSystems Solutions, a third-party processor of payment
data, encountered a security breach which potentially exposed more than 40 million cards of all
brands to fraud. “It looks like a hacker gained access to CardSystems’ database and installed a
script that acts like a virus, searching out certain types of card transaction data.” Direct
attacks are performed by skilled hackers and require experienced knowledge. In contrast to the
tools used for random attacks, the tools used by experienced Black hats are not common; often the
attacker uses a tool which is not published in the Black hat community. This increases the threat
of those attacks.


Honeypots in the field of application


This section categorizes the field of application of Honeypots. It investigates different
environments and explains their individual attributes. Five scenarios have been developed to
separate the demands placed on Honeypots.
The use of a Honeypot poses risk and needs exact planning ahead to avoid damage. Therefore it is
necessary to consider which environment will be the basis for the installation. Depending on the
setup, the results are quite different and need to be analyzed separately. For example, the number
of attacks occurring in a protected environment should be less than the number of attacks coming
in from the internet. Therefore, a comparison of results afterwards needs to take the environment
into account.
In every case there is a risk in using a Honeypot; risk is added on purpose by the nature of a
Honeypot. A compromised Honeypot, in hacker terms an “owned box,” needs intensive monitoring but
also strong controlling mechanisms. Scenario VI discusses requirements for a Honeypot-out-of-the-box
solution and elaborates the different functions which have to be provided.
CHAPTER 6
EXPERIMENTAL RESULTS

Hadoop Server
Hadoop is an open source distributed processing framework that manages data processing
and storage for big data applications running in clustered systems. It is at the center of a growing
ecosystem of big data technologies that are primarily used to support advanced analytics
initiatives, including predictive analytics, data mining and machine learning applications.
Hadoop can handle various forms of structured and unstructured data, giving users more
flexibility for collecting, processing and analyzing data than relational databases and data
warehouses provide.

Fig: Hadoop Login

The Usage of Hadoop


The flexible nature of a Hadoop system means companies can add to or modify their data
system as their needs change, using cheap and readily-available parts from any IT vendor.
Just about all of the big online names use it, and as anyone is free to alter it for their own
purposes, modifications made to the software by expert engineers at, for example, Amazon and
Google, are fed back to the development community, where they are often used to improve the
"official" product. This form of collaborative development between volunteer and commercial
users is a key feature of open source software.
In its "raw" state - using the basic modules supplied by Apache at http://hadoop.apache.org/ - it
can be very complex, even for IT professionals, which is why various commercial versions such as
Cloudera have been developed; these simplify the task of installing and running a Hadoop system,
as well as offering training and support services.

Fig : To Find the Data Node


Eclipse
1. Open Eclipse
2. Click File -> New Project -> Java Project
3. Copy all the jar files from the following locations under “D:\hadoop-2.6.0\”:
a. \share\hadoop\common\lib
b. \share\hadoop\mapreduce
c. \share\hadoop\mapreduce\lib
d. \share\hadoop\yarn
e. \share\hadoop\yarn\lib

Fig : Hadoop Integration for Eclipse

Connect DFS in Eclipse: Eclipse -> Window -> Perspective -> Open Perspective -> Other ->
Map/Reduce -> Click OK. A bar appears at the bottom. Click on Map/Reduce Locations. Right-click
on the blank space, then click on “Edit settings,” and you will see the following screen.
Fig: To Detect The Hadoop location

 Configuration for Hadoop 1.x Fetch Hadoop using version control systems subversion
or git and checkout branch-1 or the particular release branch. Otherwise, download a
source tarball from the CDH3 releases or Hadoop releases.
 Generate Eclipse project information using Ant via command line:
 For Hadoop (1.x or branch-1), “ant eclipse”
 For Smart City releases, “ant eclipse-files”
 Pull sources into Eclipse:
 Go to File -> Import.
 Select General -> Existing Projects into Workspace.
 For the root directory, navigate to the top directory of the above downloaded source
Fig : Hadoop Initialized

1. Follow Steps 1 and 2 of the previous section (Hadoop 2.x).


2. Download MR1 source tarball from CDH4 Downloads and untar into a folder different
than the one from Step 1.
3. Within the MR1 folder, generate Eclipse project information using Ant via command line
(ant eclipse-files).
4. Configure .classpath using this perl script to make sure all classpath entries point to the
local Maven repository:
1. Copy the script to the top-level Hadoop directory.
2. Run $ perl configure-classpath.pl
5. Pull sources into Eclipse:
1. Go to File -> Import.
2. Select General -> Existing Projects into Workspace.
3. For the root directory, navigate to the top directory of the above downloaded
sources.
Fig : Hadoop-0.19.1tar.gz

1. Generate Eclipse project information using Maven: mvn clean && mvn install -DskipTests &&
mvn eclipse:eclipse. Note: mvn eclipse:eclipse generates a static .classpath file that Eclipse
uses; this file is not automatically updated as the project/dependencies change.
2. Pull sources into Eclipse:
1. Go to File -> Import.
2. Select General -> Existing Projects into Workspace.
3. For the root directory, navigate to the top directory of the above downloaded
source.

 Execute tar -xzf hadoop-0.19.1.tar.gz in the Cygwin prompt; this will start the process of
unpacking the Hadoop distribution. Once this is done, it will display a newly created directory
called hadoop-0.19.1.
 Verify whether the unpacking succeeded by executing cd hadoop-0.19.1 and then ls -l, which
produces the output mentioned below and confirms that everything was unpacked correctly.
Fig : Hadoop Advance Setting Configuration

In the next step, click on Configure Hadoop Installation link, displayed on the right side
of the project configuration window. Project preferences window display is shown in the image
below. Fill in the location of Hadoop directory in Hadoop Installation Directory in preferences
and click OK, and then close the project window after clicking on finish
Scenario I – unprotected environment
In an unprotected environment any IP address on the internet is able to initiate connections to
any port on the Honeypot. The Honeypot is accessible from the entire internet.

Fig : unprotected Environment


An adequate setup needs to ensure that the monitoring and logging capabilities are
sufficient for handling large numbers of packets. An experiment based on this scenario recorded
approximately 597 packets a second; depending on the current propagation of worms on the internet
this can be more or less. The monitoring device, either the Honeypot itself or an external
monitor, needs enough resources to handle the huge amount of traffic.
The address of the Honeypot can be public or private (definitions of public and private addresses
in 3.3 and 3.4). The type of network address the Honeypot is located in is defined in Scenario III
resp. Scenario IV. When specifying a setup, Scenarios I and II cannot occur alone; both have to be
used in conjunction with either Scenario III or Scenario IV. The reason for this is a limitation
described in Scenario IV.

Scenario II – protected environment

In this scenario the Honeypot is connected to the internet through a firewall. The firewall limits
access to the Honeypot: not every port is accessible from the internet, resp. not every IP address
on the internet is able to initiate connections to the Honeypot. This scenario does not state the
degree of connectivity; it only states that there are some limitations. Those limitations can be
either strict, allowing almost no connections, or loose, denying only a few connections. The
firewall can be a standard firewall or a firewall with NAT capabilities (see chapter 3.3).
However, a public IP address is always assigned to the firewall.
Fig: Protected Environment

Scenario III – public address


This scenario focuses on the IP address of the Honeypot. In this scenario the Honeypot is
assigned a public address. The Internet Assigned Numbers Authority (IANA) maintains a
database [IANA 05] which lists the address ranges of publicly available addresses; all previous
RFCs have been replaced by this database [RFC 3232]. A public IP can be addressed from any other
public IP on the internet. This means that IP datagrams targeting a public IP are routed through
the internet to the target. A public IP must occur only once; it may not be assigned twice.
Applications on the Honeypot can directly communicate with the internet, as they have knowledge
of the public internet address. This is in contrast to Scenario IV, where an application on the
Honeypot is not aware of the public IP. It is further possible to perform a query on the
responsible Regional Internet Registry to look up the name of the address registrar; this is
called a “whois” search.

Regional Internet Registries are:


 AfriNIC (African Network Information Centre) - Africa Region http://www.afrinic.net/
 APNIC (Asia Pacific Network Information Centre) - Asia/Pacific Region
http://www.apnic.net/
 ARIN (American Registry for Internet Numbers) - North America Region
http://www.arin.net/
 LACNIC (Regional Latin-American and Caribbean IP Address Registry) – Latin
America and some Caribbean Islands http://lacnic.net/en/index.html
 RIPE NCC (Réseaux IP Européens) - Europe, the Middle East, and Central Asia
http://www.ripe.net/

Scenario IV – private address

This scenario also focuses on the IP address of the Honeypot. In this scenario the Honeypot is
assigned a private address. Private addresses are specified in [RFC 1918]. In contrast to public
addresses, private IPs cannot be addressed from the internet; packets with private addresses are
discarded at internet gateway routers. To connect to a private address, the host needs to be
located within the same address range, or it needs a gateway with a route to the target network.

The Internet Assigned Numbers Authority (IANA) reserved three blocks of IP addresses, namely
10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 for private internets. For interconnecting private
and public networks an intermediate device is used. That device needs to implement Network
Address Port Translation (NAPT) [RFC 3022]. NAPT allows translating many IP addresses and
related ports to a single IP and related ports. This hides the addresses of the internal network
behind a single public IP. Outbound access is transparent to most of the applications.
Unfortunately some applications depend on the local IP address sent in the payload, i.e. FTP
sends a PORT command [RFC 959] with the local IP. Those applications require an Application
Layer Gateway which rewrites the IP in the payload. Therefore the applications on the Honeypot
are not aware of the public IP and limited by the functionality of the intermediate network
device.

Scenario V – risk assessment:

A Honeypot allows external addresses to establish a connection, which means that packets from
the outside are replied to. Without a Honeypot there would be no such response. So a Honeypot
increases traffic on purpose, especially traffic which is suspected to be malicious.
Security mechanisms need to make sure that this traffic does not affect the production systems.
Moreover, the amount of traffic needs to be controlled. A hacker could use the Honeypot to launch
a DoS or DDoS attack. Another possibility would be to use the Honeypot as a file server for stolen
software, in hacker terms called warez. Both cases would increase bandwidth usage and slow
production traffic.

As hacking techniques evolve, an experienced Black hat could launch a new kind of attack which
is not recognized automatically. It could be possible to bypass the controlling functions of the
Honeypot and misuse it. Such activity could escalate the operation of a Honeypot and turn it into
a severe threat. A Honeypot operator needs to be aware of this risk and therefore control the
Honeypot on a regular basis.

Scenario VI – Honeypot-out-of-the-box
A Honeypot-out-of-the-box is a ready-to-use solution, which could also be offered as a
commercial product. The question is which features are needed. As shown in the previous chapters,
there is a wide range of eventualities. To be sufficient, a complete product needs to cover
security, hiding from the attacker, good analyzability, easy access to captured data, and
automatic alerting functions.
The data owner splits the access right to the encrypted data into n pieces, with each legitimate
user holding one piece of the access right. This can effectively reduce the risk of information
leakage in big data storage.
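One simple way to realize such an n-piece split is an n-of-n XOR scheme; the sketch below is
offered only as an illustration under that assumption, not as the thesis's actual construction.

import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_access_key(key: bytes, n: int) -> list[bytes]:
    """n-of-n XOR splitting: every legitimate user holds one share and all n
    shares are required to reconstruct the access key."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, key))   # last share makes the XOR equal the key
    return shares

def recover_access_key(shares: list[bytes]) -> bytes:
    return reduce(xor_bytes, shares)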
Validate Result

Fig: Execution time for VM resource in Mobile Cloud

In the case of the TeraSort experiment, an interesting case (Case 2) has been observed.
It has been analyzed and verified that the execution time first decreases and then increases when
the data size is kept constant and the number of nodes is increased from 1 to 4. This happens
because the load (data set size) is constant but the communication between nodes increases as the
number of nodes grows from 1 to 4.
Fig: Data Centers Optimization Time

The MAP-REDUCE programming model is defined as follows: the MAP-REDUCE computing model
comprises two functions, Map() and Reduce(). The Map and Reduce functions are both defined over
the data structure of (key1; value1) pairs. The Map function is applied to each item in the input
dataset according to the format of the (key1; value1) pairs; each call produces a list of
(key2; value2) pairs.
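As a toy illustration of the (key1; value1) -> list(key2; value2) contract, the word-count sketch
below is written in plain Python rather than Hadoop code; the function and variable names are
illustrative assumptions.

from collections import defaultdict

def map_fn(key1, value1):
    """Map: applied to each (key1, value1) item; emits a list of (key2, value2)."""
    return [(word, 1) for word in value1.split()]

def reduce_fn(key2, values2):
    """Reduce: merges all intermediate values that share the same key2."""
    return key2, sum(values2)

def run_mapreduce(dataset):
    intermediate = defaultdict(list)
    for key1, value1 in dataset:
        for key2, value2 in map_fn(key1, value1):        # map phase
            intermediate[key2].append(value2)            # shuffle by key2
    return [reduce_fn(k, vs) for k, vs in intermediate.items()]   # reduce phase

# e.g. run_mapreduce([("doc1", "smart city data"), ("doc2", "city data stream")])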
Fig: Optimized Execution Time
Dynamic resource management framework. Building on the current performance
modeling framework, I hope to extend it towards a more general resource management and
optimization framework which dynamically allocates different types of resources according to
the characteristics of Map Reduce jobs and different service level objectives (e.g., completion
time, cost, energy consumption).
Fig: Honey pot-out-of-the-box in Hadoop Smart city VM

Performance modeling in public cloud with virtualization. Today’s public cloud platforms make
extensive use of virtualization across computing, storage, and network resources. An interesting
trend that has emerged in recent years is the virtualization of the network layer, first
demonstrated by the use of the OpenFlow API as part of the Software Defined Networking (SDN)
stack.
Fig: Access Control

The performance modeling framework also enables automated tuning of the job settings (i.e., the
number of reduce tasks) across applications defined as sequential Map Reduce workflows,
optimizing both completion time and resource usage for the workflows. Future work is to develop
novel performance models and resource allocation strategies that can take into consideration the
high degrees of variance in highly virtualized environments.
Conclusion
This dissertation centers on performance modeling and resource management for MapReduce applications. It introduces a performance modeling framework for estimating the completion time of complex MapReduce applications, defined as a DAG of MapReduce jobs, when executed on a given platform with different resource allocations and different input data sets. Based on this framework, we further introduce resource allocation strategies as well as a customized deadline-driven scheduler for estimating and controlling the amount of resources that should be allocated to each application to meet its (soft) deadline.
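The deadline-driven idea can be sketched by inverting a rough completion-time estimate to obtain the minimum slot count that satisfies a soft deadline; the divide-by-parallelism estimate below is an assumed stand-in for the dissertation's full performance model.

```python
import math

def slots_for_deadline(map_work: float, reduce_work: float, deadline: float) -> int:
    """Smallest slot count whose estimated completion time fits within the soft deadline,
    assuming both stages parallelize evenly across the allocated slots."""
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    return max(1, math.ceil((map_work + reduce_work) / deadline))

# Example: a job profiled at 2 h of map work and 1 h of reduce work
# with a 20-minute soft deadline needs at least 9 slots.
print(slots_for_deadline(map_work=7200, reduce_work=3600, deadline=1200))  # 9
```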

We first propose an improved Honey Bee Optimization cryptosystem to overcome the decryption failures of the original Honey Bee Optimization, and then present a secure and verifiable access control scheme based on the improved cryptosystem to protect outsourced big data stored in the cloud. Our scheme allows the data owner to dynamically update the data access policy and the cloud server to update the corresponding outsourced ciphertext accordingly, enabling efficient access control over the big data in the cloud. Among the compared approaches, one considers user mobility without looking at cloudlet resources; the task-centric VM migration method takes a single-VM migration approach, which fails to utilize cloudlet resources effectively; and the no-VM-migration approach, even though it does not increase over-provisioned resources, negatively impacts other aspects.
Future work
This work considered rates of resource over-provisioning during VM migration, allowing the overall system to utilize computing resources optimally. Left for future work is how to further optimize task-computation time and data-access latency by considering the presence of fog clouds and crowdsourcing. In this respect, we will explore emerging edge computing technology to further optimize task execution time while taking mobility and context-awareness into account.
