

Evolving Smart Grid Information Management Cloudward: A Cloud Optimization Perspective

Xi Fang, Dejun Yang, and Guoliang Xue, Fellow, IEEE
Arizona State University

(Fang, Yang, and Xue are affiliated with Arizona State University, Tempe, AZ 85281. Email: {xi.fang, dejun.yang, xue}@asu.edu. This research was supported in part by ARO grant W911NF-09-1-0467 and NSF grant 0901451. The information reported here does not reflect the position or the policy of the federal government.)
Abstract: Smart Grid (SG) is a power system with advanced communication and information technologies integrated and leveraged. In this paper, we study an optimization problem of leveraging the cloud domain to reduce the cost of information management in the SG. We propose a cloud-based SG information management model and present a cloud and network resource optimization framework to solve the cost reduction problem in cloud-based SG information storage and computation.

Index Terms: Smart Grid, Cloud Computing, Information Management, Optimization

I. INTRODUCTION
Smart Grid (SG) is an intelligent power system that uses
two-way communication and information technologies, and
computational intelligence to revolutionize power generation,
delivery, and consumption. Its evolution relies on the utilization and integration of advanced information technologies,
which transform the energy system from an analog one to
a digital one. In the vision of the SG, information plays a key
role and should be managed efficiently and effectively [10].
One of the important trends in today's information management is outsourcing management tasks to cloud computing,
which has been widely regarded as the next-generation computing and storage paradigm [5], [20]. The concept of cloud
computing is based on large data centers with massive computation and storage capacities operated by cloud providers,
which deliver computing and storage services as utilities.
The overwhelming data generated in the SG by widely deployed monitoring, metering, measurement, and control
devices calls for an information management paradigm shift
in large-scale data processing and storage mechanisms. Integrating the "DNA" of cloud computing into SG information
management makes sense for the following four reasons.
First, highly scalable computing and storage services provided by cloud providers fit well with the requirements of the
information processing in the SG. This is because the resource
demands for many computation and data intensive applications
in the SG vary so much that a highly scalable information
storage and computing platform is required. For instance, the
resource demands for the electric utility vary over the time
of the day, with peak operation occurring during the day and
information processing needs slowing down at night [22].
Second, the level of information integration in the SG
can be effectively improved by leveraging cloud information
sharing. As stated in [10], autonomous business activities often
lead to "islands of information." As a result, in many cases the information in one department of an electric utility is not easily accessible by applications in other departments or
organizations. However, a well-functioning SG requires the
information to be widely and highly available. The ease and convenience of information sharing enabled by cloud storage provide a cost-effective way to integrate these islands of information in the SG.
Third, the sophistication of the SG may lead to a highly
complex information management system [10]. For traditional
electric utilities, realizing such complicated information systems may be costly or even beyond their capacity. Therefore,
it would be a good option to get the information technology
sector involved and outsource some tasks to the clouds, which
provide cost-effective computing and storage solutions. This
spares electric utilities the costly work of information system design, deployment, maintenance, and upgrade during
the massive transformation to the SG.
Fourth, distributed electricity generation [10] lowers entry
barriers for new players (e.g. small businesses or households
capable of generating electricity) and brings about a new
ecosystem of the SG. Outsourcing information management to
the clouds allows these new players to focus on their business
innovation rather than on building data centers to achieve scalability goals. It also allows these players to cut their losses if their products do not succeed.
Although a cloud-based SG information management
paradigm is promising, we are still facing many challenges.
The first challenge is the systematic optimization of resources across the SG, cloud providers, and networking providers.
For example, a fully functioning SG may be a large-scale or
even continent-wide system, where information-generating
sources (e.g. smart meters and sensors [10]) are distributed
and scattered across a large geographical area, and heterogeneous communication networks (e.g. WiMax, fiber optic
networks, and powerline communications [10]) are used for
data transmission. Hence, many geographically and architecturally diverse cloud providers and networking providers may
get involved. Systematically optimizing the usage of different
resources in these diverse clouds and networks can reduce the
overall cost of the SG information management.
The second challenge is the fact that the healthy operation
of the SG is dependent on the high availability and the prompt
analysis of the critical information (e.g. power grid status monitoring data). Without a careful design, outsourcing information management to the clouds may bring about potential risks
to the operation of the SG. The first risk is that the information
security and privacy may be compromised, since outsourcing
information management may lead to electric utilities losing
full control of the integrity, confidentiality, and availability
of the information [9]. The second risk is that the quality

of service delivery may not be guaranteed. Although service-level agreements can be established between cloud providers
and actors in the SG to enforce quality of service delivery,
risks of outsourcing still exist. For example, electric utilities
may fail to receive critical alerts from a grid status analysis
service running in a public cloud, since public clouds are
often built upon the Internet, where real-time data delivery
with an extremely high probability is not easily guaranteed.
Compared with the consequences of failing to ensure quality
of service delivery in other industries outsourcing information
management, the consequences in the power industry may be
much more severe. For example, millions of households may lose power due to an outage.
In this paper, we study an optimization problem of leveraging clouds to reduce the cost of information management in
the SG, taking into account the concerns of security, privacy,
and protection of service quality. We propose a cloud-based
SG information management model, and present a resource
optimization framework to solve the cost reduction problem
in cloud-based SG information storage and computation.
The remainder of this paper is organized as follows. First,
we review the related works in Section II. Then, we describe
the system model in Section III. Next, we present an optimization framework for cloud-based SG information storage
in Section IV and an optimization framework for information
computation in Section V. We discuss practical issues in Section VI, present simulation results in Section VII, and conclude this paper in Section VIII.
II. RELATED WORKS
Recently, researchers have studied how to use cloud computing to help manage the SG. Simmhan et al. [21] analyzed
the benefit of using the cloud for demand response optimization
in the SG. Rusitschka et al. [19] presented a model for
the SG data management based on cloud computing, which
takes advantage of distributed data management for real-time
data gathering, parallel processing for real-time information
retrieval, and ubiquitous access. Nagothu et al. [17] proposed
to use cloud data centers as the central communication and optimization infrastructure supporting a cognitive radio network
of smart meters. Nikolopoulos et al. [18] presented a decision-support system and a cloud computing software methodology that brings together energy consultants and modern web-interoperable technologies. Kim et al. [14] proposed a cloud-based demand response architecture for fast response times
in large scale deployments. Mohsenian-Rad and Leon-Garcia
[16] designed an approach to improve load balancing in the
grid by distributing the service requests among data centers.
Xin et al. [24] presented a cloud-based virtual SG architecture
that embeds the SG into a cloud environment. The difference
between our work and the above line of research is that to
the best of our knowledge, we are the first to present a cloud
and network resource optimization framework for cloud-based
SG information management and, based on it, to study
the problem of reducing the cost of information storage and
computation.
Note that cloud optimization is also an active research
area. Bossche et al. [8] proposed an optimization approach
to maximize the utilization of the internal data center and

to minimize the cost of running the outsourced tasks in the


cloud, while fulfilling quality of service constraints. Chaisiri
et al. [6] proposed an optimal cloud resource provisioning
algorithm to provision resources offered by multiple cloud
providers. Li et al. [15] used a combination of bin-packing,
mixed integer programming and performance models to make
decisions on optimal deployments of large service centers and
clouds. Zhao et al. [25] proposed two resource rental planning
models to generate rental decisions, which aim to minimize
resource rental cost for running elastic applications in clouds.
Hajjat et al. [11] proposed algorithms to tackle challenges
in migrating enterprise services into hybrid cloud-based deployments. Wang et al. [23] proposed a practical mechanism
to securely outsource linear programming to the cloud. The
difference between our work and the research on general cloud
resource optimization is the following. First, our optimization
framework is designed based on our novel cloud-based SG
information management model, which takes into account the
requirements of actors in the SG and resources provided by
cloud providers and networking providers. Second, even from
the perspective of general cloud resource optimization, our
work is novel, since our cloud model is designed based on
novel computation flow structures and aims to optimize the
cost and information flow in multiple domains.
III. SYSTEM MODEL
A. Overview
Our cloud-based SG information management model consists of four domains: the SG domain, the cloud domain, the broker domain, and the network domain, as shown in Fig. 1.

Fig. 1. The Cloud-Based Smart Grid Information Management Model

SG Domain: As defined by the National Institute of Standards and Technology, the SG domain is composed of seven subdomains (refer to [4] for more details). Next, we introduce three concepts related to SG information management: the data item (DI), the computational project (CP), and the user.

• Data item: A DI is an information object generated by some information source in the SG, such as smart meters or phasor measurement units (PMUs) [10]. It should be securely stored, and it may be taken by some CPs as input. We use D to denote the set of DIs in the SG.
• Computational project: A CP consists of one or more tasks, each of which takes some DI(s) and/or the output(s) of previously finished task(s) as inputs and performs the required computing operations.
• User: A user is a party in the SG who is interested in accessing some stored DIs or the outputs generated by some CPs. We use U to denote the set of users in the SG.
Cloud Domain: The cloud domain consists of one or more
clouds which provide storage and/or computing services. Each
cloud has one pricing policy (including transfer-in, transfer-out, storage, and computation pricing), while different clouds
may have different pricing policies. For example, Amazon
Simple Storage Service is available in seven regions with
different pricing policies [2]. Therefore, we consider that
Amazon Simple Storage Service is provided by seven clouds.
We use C to denote the set of clouds available.
Broker Domain: The broker domain consists of one or more
cloud brokers that mediate between the SG domain and the
cloud domain for gathering requirements from the actors of the
SG domain (e.g. electric utilities), locating the suitable clouds,
and assisting the actors of the SG domain in buying, obtaining,
and releasing cloud services. The idea of a cloud broker is compelling because, as the number of services supported in the cloud domain increases, end users may have more difficulties in finding a service that meets their requirements such as cost,
availability, and service category [5], or have difficulties in
optimizing various system resources. A new class of cloud
brokers may evolve from traditional brokers and specifically
aim to solve SG information management problems.
Network Domain: Networking providers in the network
domain own the communications and network infrastructure
and provide the information transmission service between any
two of the above three domains.
Let us illustrate the above concepts using the following two
examples.
Information Sharing: As discussed in Section I, cloud
storage provides a way to integrate the islands of information
in the SG domain. These DIs can be stored in a storage cloud
and shared by different users for the purpose of advanced information processing. For example, customer electricity usage
data (i.e. DIs) would be useful to multiple parties (i.e. users),
such as the customer billing service in the electric utility,
the electricity market analysis department, and the customer
power saving recommendation service provider. The customer
electricity usage data is full of customer behavior information,
which can be mined to provide energy recommendations or
consulting advice [9]. This information can also be used to
advertise appropriate energy saving appliances or improve the
hit rate of advertising.
Coordinated Electric Vehicle Charging Analysis: In the
vision of the SG, electric vehicles (EVs) play an important role since they can be used to provide power to help balance loads by "peak shaving" (sending power back to the grid when demand is high) and "valley filling" (charging when demand is low) [10]. However, coordinated EV charging
is very important, since serious problems (e.g. significant
degradation of power system performance and efficiency, and
even overloading) can arise under high penetration levels of
uncoordinated charging. Thus, a service that collects grid status and charging station information, gathers the locations of a large number of EVs, and estimates user demand can help provide this coordination assistance. A cloud-based service is clearly a good candidate, because information integration via clouds is relatively cost-effective and cloud services are highly scalable, which makes it easy to handle fluctuations in the number of EVs involved. More specifically, a set of EV charging agents (i.e., users), each of which is responsible for coordinating the charging operations of a group of EVs, upload DIs, including EV battery status information and behavior predictions, to a cloud. At the same time, the electric utility uploads a DI containing the power grid status information to the cloud. A CP running in the cloud takes these DIs as inputs and outputs an optimized charging schedule for these EVs. The charging agents then download the charging schedule from the cloud.

TABLE I
FREQUENTLY USED NOTATIONS

C : Total cost
D : Set of DIs
D_S : Set of DIs required to be split
D_O : Set of DIs required to be stored in one cloud
T : Set of tasks
T_D(d) : Set of tasks taking DI d as input
T_T(t) : Set of tasks taking the output of task t as input
\mathcal{C} : Set of clouds
C_D(d) : Set of clouds in which DI d is allowed to be stored
C_T(t) : Set of clouds in which task t is allowed to be executed
U : Set of users
U_D(d) : Set of users requesting DI d
U_T(t) : Set of users requesting the output of task t
p^+(d, c) : Unit price of uploading DI d to cloud c
p^-(c, u) : Unit price of downloading data from cloud c to user u
p^S(c) : Unit storage price in cloud c
p^I(c1, c2) : Unit inter-cloud transfer price from cloud c1 to cloud c2
ρ(t, c) : Computational cost of task t charged by cloud c
s^D(d) : Size of DI d
s^T(t) : Size of the output of task t
s^D(d, c) : Size of the portion of DI d stored in cloud c
α(d) : Storage redundancy ratio of DI d
β(d) : Storage splitting ratio of DI d
X^+(d, c) : Binary variable indicating whether DI d is uploaded to cloud c for data computation
X^T(t, c) : Binary variable indicating whether task t is executed in cloud c
X^I(t1, t2, c1, c2) : Binary variable indicating whether cloud c1 transmits intermediate data to cloud c2 because task t2 takes the output of task t1 as input
The above examples depict two intriguing scenarios for cloud-based SG information management. The detailed design and implementation of practical algorithms and protocols would be an interesting topic to explore.
Our framework is focused on the SG, cloud, and network
domains, while cloud brokers help formulate the optimization programs according to demands. Note that virtualization technology
may be used in clouds for resource optimization. The focus
of our work is above the virtualization layer in a cloud.
For instance, recall the example of Amazon Simple Storage
Service discussed before, which has seven clouds. For a DI,
we focus on the problem of which cloud(s) should be used
to store this DI, rather than how to optimize the virtual and
physical resources in the chosen cloud(s) for data storage.
In the following, we use three submodels to characterize
cloud-based SG information storage, computation, and security, privacy, and protection requirements. Table I shows the notations
frequently used in this paper.

B. Cloud Storage Submodel


One of the key information management tasks in the SG is data storage. Each DI d ∈ D is uploaded to the cloud domain and stored in one or more clouds. If a DI is to be stored in two or more clouds, each of these clouds stores one portion of the DI. Let s^D(d) denote the size of DI d, and s^D(d, c) denote the size of the portion of DI d stored in cloud c. In this submodel, we use \mathcal{C} to denote the set of clouds providing storage services. Each user u ∈ U is interested in one or more DIs and downloads them from the cloud domain.
There are three types of prices related to this submodel.
• We use p^+(d, c) to represent the unit upload price of uploading DI d to cloud c. This price consists of the data transfer-in price charged by cloud c and the communication price, charged by networking providers, between cloud c and the information source of DI d.
• We use p^S(c) to represent the unit storage price of storing a DI in cloud c.
• We use p^-(c, u) to represent the unit download price of downloading data from cloud c to user u. This price consists of the data transfer-out price charged by cloud c and the communication price, charged by networking providers, between cloud c and user u.
An illustrative example is given in Fig. 2. DI d1 is requested
by both user u1 and user u2 . Although c1 has the highest
unit storage price, it is easy to verify that the optimal total
cost is achieved when d1 is stored in c1 due to the cheaper
download prices from c1 to u1 and u2. The download prices may be cheaper because both u1 and u2 are closer to the data center of c1 than to the data centers of c2 and c3. This example also justifies
our previous statement that we should more systematically
leverage resources in the diverse clouds and networks to reduce
the total cost. In addition, we will discuss how to store DI d2
in Section III-D.
C. Cloud Computation Submodel
Another key information management task in the SG is processing and analyzing the information generated in the SG. As defined before, a CP consists of one or more tasks, each of which takes the DI(s) generated in the SG or the output(s) of previously finished task(s) as input and performs the required computing operations. In this submodel, each DI d ∈ D is uploaded to the cloud domain and taken by some tasks as input. We use \mathcal{C} to denote the set of compute clouds, and T to denote the set of all tasks in the CPs. Each user u ∈ U is interested in the output of one or more tasks and downloads the output from the cloud domain.
Next, we explain CPs and tasks in more detail. Each CP
consists of a series of tasks, each of which aims to solve one
SG information management job. The relationships among the
tasks in a CP and the relationships among different CPs are
represented by a graph, called a CP structure graph.
The basic concept of the CP structure graph is similar to
those used in Microsoft Dryad [12] and Google MapReduce
[7]. In Dryad the overall structure of a Dryad job is determined
by its communication flow. A Dryad job is a directed acyclic
graph where each vertex is a program and edges represent data

channels. In MapReduce, there are two sequential steps: Map and Reduce. Tasks in MapReduce are a series of operations whose relationships can be modeled as a directed acyclic graph.
In our model, DIs, tasks, and users are represented as nodes
in a CP structure graph. Let DI node, task node, and user
node denote the nodes in this graph representing a DI, a task, and a user, respectively. A CP structure graph is a directed acyclic
graph that characterizes the work flow and communication
flow of the tasks in CPs. A directed link <t1, t2> from task node t1 to task node t2 represents that task t2 takes the output of task t1 as an input. A directed link <d, t> from DI node d to task node t represents that task t takes DI d as an input. A directed link <t, u> from task node t to user node u represents that user u downloads the output of task t from the cloud domain. In addition, we use s^T(t) to represent the size of the output of task t.
Fig. 3 shows an example of a CP structure graph that
represents two CPs, seven tasks, four DIs, and three users.
Let us take CP1 as an example. Task t1 takes DI d1 as an
input and its output is taken by task t2 as an input. Task t5
takes DI d3 , the output of task t2 , and the output of task
t4 as inputs. The output of task t5 is downloaded by user
u2 , and the output of task t4 is downloaded by user u1 . Let
T_T(t) denote the set of successor task nodes of task t in the CP structure graph. Therefore, T_T(t1) = {t2}, T_T(t2) = {t5}, T_T(t3) = {t4}, T_T(t4) = {t5}, and T_T(t5) = ∅. We use T_D(d) to denote the set of tasks that take DI d as input. Therefore, T_D(d1) = {t1, t3}, T_D(d2) = {t2}, and T_D(d3) = {t5, t6}.
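To make the graph concepts concrete, the following is a minimal Python sketch (ours, not from the paper) that encodes the edges of Fig. 3 described in the text and recovers T_T(t) and T_D(d); node names mirror the example, and edges not spelled out above (e.g., those involving d4 and t7) are omitted.

# A minimal sketch of a CP structure graph as a directed acyclic graph.
# Only the edges explicitly described in the text for Fig. 3 are encoded.
from collections import defaultdict

edges = [                     # <d, t>, <t1, t2>, and <t, u> links
    ("d1", "t1"), ("t1", "t2"), ("d2", "t2"), ("t2", "t5"),
    ("d1", "t3"), ("t3", "t4"), ("t4", "t5"), ("d3", "t5"),
    ("d3", "t6"), ("t4", "u1"), ("t5", "u2"),
]
successors = defaultdict(set)
for src, dst in edges:
    successors[src].add(dst)

def T_T(t):   # successor tasks of task t
    return {n for n in successors[t] if n.startswith("t")}

def T_D(d):   # tasks taking DI d as input
    return {n for n in successors[d] if n.startswith("t")}

assert T_T("t1") == {"t2"} and T_T("t5") == set()
assert T_D("d1") == {"t1", "t3"} and T_D("d3") == {"t5", "t6"}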
There are five types of prices related to this submodel.
• As in Section III-B, we use p^+(d, c) to represent the unit upload price of uploading DI d to cloud c. Note that only one copy of a DI is uploaded to a cloud even if two or more tasks in this cloud need this DI as an input.
• We use ρ(t, c) to represent the computational cost of task t charged by cloud c.
• We use p^I(c1, c2) to denote the unit inter-cloud transfer price of transferring data from cloud c1 to cloud c2.
• As in Section III-B, let p^S(c) represent the unit storage price of storing a DI in cloud c.
• As in Section III-B, let p^-(c, u) represent the unit download price of downloading data from cloud c to user u.
Fig. 4 shows an example of assigning clouds to the tasks shown in Fig. 3. For simplicity, we do not show the prices in this figure. DIs d1 and d2 are uploaded to clouds c1 and c2, in which tasks t1 through t4 are executed. The arrow from c2 to c1 represents that the output of task t1 will be transferred to c1, since t2 takes the output of task t1 as an input. Similarly, the arrow from c1 to c3 represents the information flow resulting from transferring the outputs of t2 and t4 from c1 to c3. Since user u1 needs the output of task t4, it downloads this output from cloud c1. In addition, although both t5 and t6 need DI d3, since they both run in cloud c3, DI d3 is uploaded to cloud c3 only once (to save upload cost).
D. Security, Privacy and Protection Submodel
Fig. 2. An example for cloud storage.

Fig. 3. An example of a CP structure graph.

1) Data Storage: The integrity, availability, and confidentiality of the data in the SG (e.g., the information generated from PMUs and smart meters) are critical for the operation of the SG [10]. We hence consider that each DI d ∈ D has one
or more of the following storage requirements (note that the third and fourth requirements are mutually exclusive):
• DI d should be stored in one or more pre-selected candidate clouds. Let C_D(d) denote the set of these clouds.
• DI d should be stored with redundancy.
• DI d should be split and stored in two or more clouds.
• DI d should be stored completely in one cloud.
In the following, we explain the above four requirements.
The first requirement: Deciding a set of candidate clouds can
be done by cloud brokers, which find a set of clouds satisfying
the preferences of data storage, upload, and download, such
as data center location, data availability, and properties of
communication networks among DI sources, clouds, and users.
The second requirement: Although some cloud providers
(e.g., Amazon) claim that they provide data redundancy services, the actual data processing in such a service is encapsulated in a black box and is done by the clouds rather
than the actors in the SG (e.g. electric utilities). In order
to allow the actors in the SG to have better control of the
redundancy of the critical data, they may need to ensure the
data redundancy by themselves.
The third requirement: Data splitting is an important approach to protecting sensitive DIs generated in the SG from
unauthorized access, by encrypting the DIs and storing different portions of each DI in different clouds. Even if some
clouds are compromised, the attacker may still be unable to
retrieve the whole DI. Fig. 2 shows an example where DI d2
is forced to be split and stored in clouds c2 and c3 .
The fourth requirement: This constraint is applicable when
a DI of some type must be stored completely as a whole in
order to, for example, facilitate data content checks.
2) Data Computation: Each task has its own quality of
service delivery requirements, such as computation deadlines
and data center locations. For example, a sensitive task (e.g.
a data analyzer of an electric utility that uses confidential
algorithms) should only be executed in the private clouds of
the electric utility. For each task t, cloud brokers decide a
set of candidate clouds (denoted by C_T(t)) satisfying these
requirements.
IV. AN OPTIMIZATION FRAMEWORK FOR CLOUD-BASED SMART GRID INFORMATION STORAGE
Fig. 4. An example of cloud computation.

We formulate the cost minimization problem for cloud-based SG information storage as a mixed-integer linear programming problem.
System 1:

$$\min \quad C=\sum_{d\in D}\sum_{c\in C_D(d)} s^D(d,c)\,p^S(c)+\sum_{d\in D}\sum_{c\in C_D(d)} s^D(d,c)\,p^+(d,c)+\sum_{d\in D}\sum_{c\in C_D(d)}\sum_{u\in U_D(d)} s^D(d,c)\,p^-(c,u); \quad (1)$$

s.t. (Data Redundancy Constraint)
$$\sum_{c\in C_D(d)} s^D(d,c)=\alpha(d)\,s^D(d),\quad \forall d\in D; \quad (2)$$

(Data Splitting Constraint)
$$s^D(d,c)\le \frac{\alpha(d)\,s^D(d)}{\beta(d)},\quad \forall c\in C_D(d),\ \forall d\in D_S; \quad (3)$$

(Data Exclusive Constraint)
$$\frac{s^D(d,c)}{\alpha(d)\,s^D(d)}\in\{0,1\},\quad \forall c\in C_D(d),\ \forall d\in D_O; \quad (4)$$

over
$$s^D(d,c)\in[0,\infty),\quad \forall c\in C_D(d),\ \forall d\in D. \quad (5)$$
Interpretations:
• The objective is to minimize the total cost. In Equation (1), the three terms represent the cost of storage, the cost of uploading DIs to the clouds, and the cost of downloading DIs from the clouds to users, respectively.
• Constraint (2) represents the second storage requirement (see Section III-D). We use α(d) to denote the storage redundancy ratio of DI d. In other words, for DI d, the total amount of this DI stored in the cloud domain equals the size of DI d times its storage redundancy ratio.
• Constraint (3) represents the third storage requirement (see Section III-D). Let D_S denote the set of DIs required to be split, and β(d) the storage splitting ratio of DI d. Constraint (3) means that no cloud stores more than 1/β(d) of the total amount of DI d stored in the cloud domain.
• Constraint (4) represents the fourth storage requirement (see Section III-D). We use D_O to denote the set of DIs required to be stored completely in one of the clouds.
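For concreteness, here is a minimal sketch of System 1 in Python using the open-source PuLP MILP library on a tiny hypothetical instance; all clouds, prices, sizes, and ratios below are illustrative assumptions rather than values from the paper, and the all-or-nothing requirement (4) is implemented with auxiliary binary variables, a standard MILP reformulation rather than the paper's own statement of the constraint.

# A sketch of System 1 on a toy instance; all data below are assumptions.
import pulp

D = ["d1", "d2"]                                   # data items
C = ["c1", "c2", "c3"]                             # clouds; C_D(d) = C here
U_D = {"d1": ["u1", "u2"], "d2": ["u1"]}           # users requesting each DI
size = {"d1": 10.0, "d2": 4.0}                     # s^D(d), in GB
alpha = {"d1": 2.0, "d2": 1.0}                     # redundancy ratios
beta = {"d1": 2.0}                                 # splitting ratios
D_S, D_O = ["d1"], ["d2"]                          # d2 must live in one cloud
p_store = {"c1": 0.15, "c2": 0.125, "c3": 0.14}    # p^S(c)
p_up = {(d, c): 0.1 for d in D for c in C}         # p^+(d, c)
p_down = {(c, u): 0.12 for c in C for u in ["u1", "u2"]}  # p^-(c, u)

prob = pulp.LpProblem("System1", pulp.LpMinimize)
s = {(d, c): pulp.LpVariable(f"s_{d}_{c}", lowBound=0) for d in D for c in C}
y = {(d, c): pulp.LpVariable(f"y_{d}_{c}", cat="Binary") for d in D_O for c in C}

# Objective (1): storage + upload + download costs.
prob += (pulp.lpSum(s[d, c] * (p_store[c] + p_up[d, c]) for d in D for c in C)
         + pulp.lpSum(s[d, c] * p_down[c, u] for d in D for c in C for u in U_D[d]))

for d in D:          # constraint (2): redundancy
    prob += pulp.lpSum(s[d, c] for c in C) == alpha[d] * size[d]
for d in D_S:        # constraint (3): no cloud holds more than 1/beta(d)
    for c in C:
        prob += s[d, c] <= alpha[d] * size[d] / beta[d]
for d in D_O:        # constraint (4): all-or-nothing placement via binaries
    for c in C:
        prob += s[d, c] == alpha[d] * size[d] * y[d, c]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: v.varValue for k, v in s.items()})

In this toy instance, d1 (with β(d1) = 2) must be spread over at least two clouds, while the binaries force d2 into exactly one cloud.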
V. AN OPTIMIZATION FRAMEWORK FOR CLOUD-BASED SMART GRID INFORMATION COMPUTATION

We formulate the cost minimization problem for cloud-based SG information computation as an integer linear programming problem.

System 2:

$$\begin{aligned}\min \quad C={}&\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^S(c)+\sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\rho(t,c)\\ &+\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^+(d,c)\\ &+\sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s^T(t_1)\,p^I(c_1,c_2)\\ &+\sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s^T(t)\,p^-(c,u);\end{aligned} \quad (6)$$

s.t. (Task Execution Constraint)
$$\sum_{c\in C_T(t)} X^T(t,c)=1,\quad \forall t\in T; \quad (7)$$

(Data Upload Constraint)
$$\frac{\sum_{t\in T_D(d)} X^T(t,c)}{|T_D(d)|}\le X^+(d,c)\le \sum_{t\in T_D(d)} X^T(t,c),\quad \forall c\in C_D(d),\ \forall d\in D; \quad (8)$$
$$X^T(t,c)=0,\quad \forall c\notin C_T(t),\ \forall t\in T; \quad (9)$$

(Inter-Cloud Intermediate Data Transfer Constraint)
$$\frac{1}{2}\bigl(X^T(t_1,c_1)+X^T(t_2,c_2)\bigr)-\frac{1}{2}\le X^I(t_1,t_2,c_1,c_2)\le \frac{1}{3}\bigl(X^T(t_1,c_1)+X^T(t_2,c_2)\bigr)+\frac{1}{3},\quad \forall c_1\in C_T(t_1),\ \forall c_2\in C_T(t_2),\ \forall t_2\in T_T(t_1),\ \forall t_1\in T; \quad (10)$$

over
$$X^+(d,c)\in\{0,1\},\quad \forall c\in C_D(d),\ \forall d\in D; \quad (11)$$
$$X^T(t,c)\in\{0,1\},\quad \forall t\in T,\ \forall c\in \mathcal{C}; \quad (12)$$
$$X^I(t_1,t_2,c_1,c_2)\in\{0,1\},\quad \forall c_1\in C_T(t_1),\ \forall c_2\in C_T(t_2),\ \forall t_2\in T_T(t_1),\ \forall t_1\in T. \quad (13)$$
Interpretations:
• The objective is to minimize the total cost, which is computed as in Equation (6). The five terms represent the cost of storing DIs, the cost of executing tasks, the cost of uploading DIs to the cloud domain, the cost of inter-cloud intermediate data transfer, and the cost of downloading the outputs of tasks from the cloud domain to users, respectively. Binary variable X^+(d, c) is 1 if and only if DI d is uploaded to cloud c. Binary variable X^T(t, c) is 1 if and only if task t is executed in cloud c. Binary variable X^I(t1, t2, c1, c2) is 1 if and only if 1) task t1 is executed in cloud c1, 2) task t2 is executed in cloud c2, and 3) cloud c1 transfers data to cloud c2 because task t2 takes the output of task t1 as an input.
• Constraint (7) ensures that each task is executed by exactly one of the clouds.
• Constraints (8) and (9) force X^+(d, c) to be 1 if X^T(t, c) is 1 for some t ∈ T_D(d). In other words, we need to upload DI d to cloud c once if there is at least one task executed in cloud c that takes DI d as an input.
• Constraint (10) ensures that X^I(t1, t2, c1, c2) is 1 if and only if X^T(t1, c1) = X^T(t2, c2) = 1. In other words, if task t1 is executed in cloud c1, task t2 is executed in cloud c2, and t2 takes the output of t1 as an input, then cloud c1 will transfer the output of t1 to cloud c2.
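As a quick sanity check on the linearization in constraint (10), the following sketch (ours, not from the paper) enumerates the four possible assignments of X^T(t1, c1) and X^T(t2, c2) and confirms that the two bounds leave X^I(t1, t2, c1, c2) = X^T(t1, c1) X^T(t2, c2) as the only feasible binary value; exact fractions are used to avoid floating-point artifacts at the boundaries.

# Verify that the bounds in constraint (10) force X^I to equal the
# product of the two assignment variables when X^I is binary.
from fractions import Fraction
from itertools import product

half, third = Fraction(1, 2), Fraction(1, 3)
for x1, x2 in product((0, 1), repeat=2):
    lower = (x1 + x2) * half - half
    upper = (x1 + x2) * third + third
    feasible = [xI for xI in (0, 1) if lower <= xI <= upper]
    assert feasible == [x1 * x2]
    print(f"X^T(t1,c1)={x1}, X^T(t2,c2)={x2} -> X^I forced to {x1 * x2}")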

VI. DISCUSSIONS

In this section, we discuss some practical issues.

A. Dealing with Uncertain Prices

In the SG information storage problem, we may not know the accurate values of the upload, storage, and download prices. We list some possible scenarios in the following.
• The data upload operation takes a long time and the communication price changes during the operation.
• The unit storage price is computed based on knowledge of how long DIs will be stored. If the actual storage period is longer than the planned one, the unit storage price will be larger.
• The actual unit download price may differ from the price used when the program is formulated and solved.
One approach to dealing with uncertain prices is to use a worst-case approximation. More specifically, although the actual values of these prices may be uncertain, we assume that they are upper bounded (i.e., p^+(d, c) ≤ p^+_UB(d, c), p^-(c, u) ≤ p^-_UB(c, u), and p^S(c) ≤ p^S_UB(c)) and that we know these bounds. We replace objective (1) with the following:

$$\min \quad C=\sum_{d\in D}\sum_{c\in C_D(d)} s^D(d,c)\,p^S_{UB}(c)+\sum_{d\in D}\sum_{c\in C_D(d)} s^D(d,c)\,p^+_{UB}(d,c)+\sum_{d\in D}\sum_{c\in C_D(d)}\sum_{u\in U_D(d)} s^D(d,c)\,p^-_{UB}(c,u).$$

We use the solution under this objective as an approximate solution to System 1 when some prices are uncertain. Another approach is to adopt stochastic programming [13] (if we know the distribution of the uncertain prices), which aims to minimize the expectation of objective (1).
In the SG information computation problem, we also need to deal with uncertain prices. First, we may not know the accurate values of the unit upload, storage, and download prices. Second, the unpredictable size of a task's output may lead to uncertainty in the download cost. Moreover, the computational costs of tasks may be uncertain for many reasons, such as unpredictable running times. The cost of inter-cloud transfer may also be uncertain due to varying inter-cloud transfer prices and the unpredictable size of task outputs. Similarly, one approach is to use the worst-case method for the treatment of uncertainty. We assume that s^T(t) ≤ s^T_UB(t), ρ(t, c) ≤ ρ_UB(t, c), and p^I(c1, c2) ≤ p^I_UB(c1, c2). We replace objective (6) with the following:

$$\begin{aligned}\min \quad C={}&\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^S_{UB}(c)+\sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\rho_{UB}(t,c)\\ &+\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^+_{UB}(d,c)\\ &+\sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s^T_{UB}(t_1)\,p^I_{UB}(c_1,c_2)\\ &+\sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s^T_{UB}(t)\,p^-_{UB}(c,u).\end{aligned}$$

If we know the distribution of the uncertain prices, we can adopt stochastic programming and aim to minimize the expectation of objective (6).
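Since the worst-case treatment changes only the objective coefficients, the same model can simply be re-solved with the upper-bound prices. A small illustrative helper follows; the nominal prices and the 25% margin are assumptions made for the sketch, not values from the paper.

# Worst-case variant: the constraints of System 1 (or System 2) are
# unchanged; only the objective coefficients are replaced by upper bounds.
def upper_bound(prices, margin=0.25):
    # Inflate each nominal unit price by a hypothetical uncertainty margin.
    return {k: v * (1 + margin) for k, v in prices.items()}

p_store = {"c1": 0.15, "c2": 0.125, "c3": 0.14}   # nominal p^S(c), $ per GB-month
p_store_ub = upper_bound(p_store)                 # plays the role of p^S_UB(c)
print(p_store_ub)

Note also that in System 1 the prices enter only the linear objective and none of the constraints, so if the price distributions are known, minimizing the expected cost reduces to solving System 1 with the expected unit prices; the worst-case variant differs only in which coefficients are plugged in.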

B. Jointly Optimizing Information Computation and Storage

In some scenarios, DIs would be taken by the CPs as inputs and also stored in clouds for information sharing. The objective function of the problem of minimizing the total cost of computation, storage, and sharing can be formulated as

$$\begin{aligned}\min \quad C={}&\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^S(c)+\sum_{t\in T}\sum_{c\in C_T(t)} X^T(t,c)\,\rho(t,c)\\ &+\sum_{d\in D}\sum_{c\in C_D(d)} X^+(d,c)\,s^D(d)\,p^+(d,c)\\ &+\sum_{t_1\in T}\sum_{t_2\in T_T(t_1)}\sum_{c_1\in C_T(t_1)}\sum_{c_2\in C_T(t_2)} X^I(t_1,t_2,c_1,c_2)\,s^T(t_1)\,p^I(c_1,c_2)\\ &+\sum_{t\in T}\sum_{c\in C_T(t)}\sum_{u\in U_T(t)} X^T(t,c)\,s^T(t)\,p^-(c,u)\\ &+\sum_{d\in D}\sum_{c\in C_D(d)}\sum_{u\in U_D(d)} X^+(d,c)\,s^D(d)\,p^-(c,u).\end{aligned}$$

C. Wisely Selecting Pricing Options

Cloud providers often offer multiple pricing options. For instance, three common options are on-demand instance pricing, reserved instance pricing, and spot instance pricing [1].

On-demand instances let customers pay for compute capacity by the hour with no long-term commitments or upfront payments. This pricing option is suitable for information management services with short-term, spiky, or unpredictable workloads. Reserved instances let customers make a low, one-time, upfront payment for an instance, reserve it for a one- or three-year term, and pay a significantly lower hourly rate for that instance. This pricing option is suitable for information management services with steady-state or predictable usage. Spot instances provide the ability for customers to purchase compute capacity with no upfront commitment and at hourly rates usually lower than the on-demand rate. The spot price fluctuates based on supply and demand for instances, but customers never pay more than the maximum price they have specified. We list the following two example scenarios and analyze the corresponding pricing options.
• Customer behavior data analysis service: In general, analyzing customer behavior data is a service with steady-state and predictable usage. Therefore, we can adopt the reserved instance pricing option to lower the hourly rate. The spot instance pricing option may also be applicable, because this service has flexible start and end times and can be executed when the spot price is low.
• Coordinated EV charging analysis: As discussed in Section III-A, a service can be run in the cloud to coordinate EV charging. One property of this service is that the analysis workload is fluctuating and often unpredictable, because the number of EVs involved varies over time. As a result, in this scenario the on-demand instance pricing option may be better than the other two options.

D. Refining Resource Optimization Model

Our model can be further refined to capture more practical and sophisticated properties. We list the following two extensions as future research work.
• The communication topology among DI sources, clouds, and users may be taken into account to optimize the information flow and reduce the cost. For example, consider a scenario where we want to transport a DI from a cloud to two users. In our current model, both users download this DI from this cloud. However, when these two users are quite close, the total cost may be further reduced if one user forwards this DI to the other once the former has downloaded the DI from the cloud. Therefore, it would be beneficial to optimize communication resources with a more sophisticated communication model that captures more details of the information flow.
• It might be beneficial to consider physical and virtual resources in clouds in some scenarios. For example, consider a scenario where Infrastructure as a Service (IaaS) [5] is provided to the electric utility for SG information management. Optimizing the physical and virtual resources catering to the information management tasks might lead to a cost reduction.

VII. PERFORMANCE EVALUATION


In this section, we evaluate the performance of our optimization framework. The formulated programs were solved using the CPLEX solver [3] on a 2.8 GHz Linux machine.
A. Cloud-Based Smart Grid Information Storage
We evaluated the following scenario. There is one large electric utility, two public clouds (Amazon-I and Amazon-II), four types of DIs (customer behavior data, customer account data, EV data, and PMU data), and four service departments in this utility (grid and market analysis service, recommendation service, customer billing service, and EV analysis service) requesting these data. The electric utility also maintains three geographically distinct private clouds of its own.
The number of residential areas served by this utility was varied from 50 to 500 in increments of 50. The number of households in each residential area was set to G(10000, 5000, 1, 100000), where G(a, b, c, d) denotes a Gaussian variable with mean a, variance b, and range [c, d].
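For reproducibility, here is one possible reading of the G(a, b, c, d) generator in Python (ours); since the paper specifies b as a variance and only states the range, we assume out-of-range draws are clamped to [c, d], though rejection sampling would be an alternative reading.

# G(a, b, c, d): Gaussian with mean a and variance b, restricted to [c, d].
# Clamping is an assumption; the paper only states the range.
import random

def G(a, b, c, d):
    return min(max(random.gauss(a, b ** 0.5), c), d)

households = [G(10000, 5000, 1, 100000) for _ in range(50)]  # one draw per area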
Each household is equipped with one smart meter. We
assumed that each smart meter generates 1 megabyte of
information per day (according to Austin Energy, a smart
meter generates about 1 megabyte of information per day if
the sampling interval is 15 minutes). The data aggregated from
the smart meters in each residential area is represented as one
DI and categorized as the customer behavior data. These DIs
can be stored in public and/or private clouds.
Each household also generates 1 kilobyte of important
account information per day. The data aggregated in each
residential area is represented as one DI and categorized as the
customer account data. Due to the sensitivity of the customer

account data, these DIs can only be stored in the private clouds, and the storage splitting ratio and the redundancy ratio were both set to 2.

Fig. 5. Performance Evaluation: (a) Information Storage Cost; (b) Running Time of System 1; (c) Information Computation Cost; (d) Running Time of System 2.
Each household has zero or more EVs. One EV generates 50 kilobytes of information per day. We assumed that the total number of EVs in each residential area is uniformly distributed in [R/2, R], where R is the number of households in this residential area. The data aggregated from EVs in each residential area is represented as one DI and categorized as the EV data. These DIs can be stored in public and/or private clouds.
In addition, the electric utility collects the information from
100 PMUs, each of which generates 500 megabytes of phasor
information per day. The data generated from each PMU is
represented as one DI and categorized as the PMU data. They
should be stored in one of the public or private clouds.
Now we explain the relationships between DIs and users.
Customer behavior data and PMU data are used by the grid and
market analysis service department to establish the grid and
market status in the past, which can be further used to improve
future operation of the SG. Customer behavior data is also used
by the recommendation service department, which mines these
DIs to provide customers with energy recommendations to
help them reduce their bills. Customer account data is used by
the customer billing service department. PMU data, customer
behavior data, and EV data are used by the EV analysis service
department to investigate the impacts of charging/discharging
EVs on the power grid status and customer electricity usage.
We assumed that all the DIs are stored for six months and
we evaluated the cost of storing the DIs generated in one day.
The unit storage price for one gigabyte per month of Amazon-I
is $0.125 (i.e. US standard pricing [2]) and that of Amazon-II
is $0.14 (i.e. US West-Northern California pricing [2]). The
unit storage price for one gigabyte per month of a private cloud
is uniformly distributed between $0.15 and $0.2. The rationale
behind setting the storage price of private clouds higher than that of public clouds is that public cloud providers are usually specialized IT companies whose scale and expertise allow them to reduce the price.
The unit upload (download) price is equal to the sum of
the transfer-in (transfer-out) price charged by the clouds and
the communication price charged by networking providers.
The average transfer-in (transfer-out) price of Amazon-I and Amazon-II is $0 ($0.12) per gigabyte. The average transfer-in (transfer-out) price of the private clouds was assumed to
be $0.2 ($0.2) per gigabyte. The communication price was
assumed to be G(0.1, 0.05, 0.05, 0.15).
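The price parameters above can be assembled as follows; this is a sketch whose dictionary layout and cloud names are our choices, with values following the settings just described (the G helper from earlier is repeated for self-containedness).

# Assemble unit prices for the storage experiment from the stated settings.
import random

def G(a, b, c, d):  # truncated Gaussian from Section VII-A (clamped reading)
    return min(max(random.gauss(a, b ** 0.5), c), d)

storage = {"amazon1": 0.125, "amazon2": 0.14}          # $ per GB-month
for c in ("private1", "private2", "private3"):
    storage[c] = random.uniform(0.15, 0.2)
transfer_in = {"amazon1": 0.0, "amazon2": 0.0,
               "private1": 0.2, "private2": 0.2, "private3": 0.2}
transfer_out = {"amazon1": 0.12, "amazon2": 0.12,
                "private1": 0.2, "private2": 0.2, "private3": 0.2}

def upload_price(cloud):
    # p^+(d, c) = cloud transfer-in price + a drawn communication price
    return transfer_in[cloud] + G(0.1, 0.05, 0.05, 0.15)

def download_price(cloud):
    # p^-(c, u) = cloud transfer-out price + a drawn communication price
    return transfer_out[cloud] + G(0.1, 0.05, 0.05, 0.15)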
In Fig. 5(a), we compared the costs of four cases: OPT,
STOC, WST, and PRVT. OPT represents the case where the
optimal solution is achieved. STOC and WST represent the

stochastic programming and the worst case method discussed
in Section VI-A, respectively, when the real storage and
download prices are unknown in advance. PRVT represents
the case where all the DIs must be stored in the private clouds
of the electric utility. We observe that in STOC and WST
although we may not know the exact values of storage and
download prices, the costs are just slightly higher than the
optimal one. We also observe that it would lead to a 40% cost
increase to enforce all the DIs to be stored in private clouds.
In addition, although solving a mixed-integer program is NP-hard in general, Fig. 5(b) shows that in practice the running time of CPLEX is very short.
B. Cloud-Based Smart Grid Information Computation
We evaluated a CP that analyzes electricity saving products
once a month (30 days). The settings of clouds, residential
areas, upload prices, and download prices are similar to those
in Section VII-A. In addition, the price of Amazon-I for a high-memory on-demand instance is $2.28 per hour, and that of Amazon-II is $2.00 per hour. The price of the private clouds for computational tasks is $3.00 per hour. The unit inter-cloud
transfer price between clouds c1 and c2 is equal to the sum
of the transfer-out price of c1 , the transfer-in price of c2 , and
the communication price between c1 and c2 (note that the
communication price was set to G(0.1, 0.05, 0.05, 0.15)).
For each residential area, the CP first needs a set of tasks,
each of which analyzes the customer behavior data generated
in one residential area. We assumed that these tasks output
10 kilobytes of data for each household. For example, in a
residential area with 10000 households, a 300 gigabyte DI
is generated (recall the setting of customer behavior data
in Section VII-A), and 100 megabyte of intermediate data
is generated by the corresponding task. We assumed that
the time each of these tasks takes is G(15, 1, 10, 20) hours.
After every DI has been analyzed by the corresponding task,
another task takes all the intermediate data and analyzes what
kinds of power saving products should be produced to meet
customers' electricity-saving requirements. We assumed that
1 megabyte output is generated by this task and G(5, 1, 1, 9)
hours is needed to finish this task. We further assumed that a
secret commercial model is used in this task, and hence this
task should be run in the private clouds for a privacy reason.
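As a quick check of the sizes quoted above (our arithmetic, using decimal units):

# One residential area with 10000 households, 1 MB per meter per day, 30 days.
households = 10000
di_size_gb = households * 1 * 30 / 1000    # 1 MB/day for 30 days, in GB
intermediate_mb = households * 10 / 1000   # 10 KB per household, in MB
assert di_size_gb == 300.0 and intermediate_mb == 100.0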
In Fig. 5(c), we compared the costs of four cases: OPT,
STOC, WST, and PRVT. OPT represents the case where the
optimal solution is achieved. STOC and WST represent the
stochastic programming and the worst case method, respectively, when the task computational costs and download prices
are unknown in advance. PRVT represents the case where all
the DIs must be stored in private clouds and all the tasks must
be executed in private clouds. We observe that in STOC and

WST, although we may not know the exact values of task


computational costs and download prices, the costs are just
slightly higher than the optimal one. We also observe that it
would lead to a 50% cost increase to enforce all the DIs to be
stored in the private clouds of the electric utility and all the
tasks to be executed in the private clouds. In addition, Fig. 5(d) shows that the running time of CPLEX is no more than 5 seconds in all cases studied, although our formulation is an integer program.
VIII. CONCLUSION
In this paper, we have proposed a cloud-based SG information management model and presented a cloud and network
resource optimization framework to solve the cost reduction
problem in cloud-based SG information storage and computation. Simulations have shown that our optimization framework
can significantly reduce the SG information management cost.
ACKNOWLEDGMENT
We thank the editor and the reviewers whose comments on
an earlier version of this paper have helped to significantly
improve the presentation and the content of this paper.

REFERENCES

[1] Amazon EC2 Instance Purchasing Options: http://aws.amazon.com/ec2/purchasing-options/.
[2] Amazon Simple Storage Service: http://aws.amazon.com/s3/.
[3] IBM ILOG CPLEX Optimizer: http://www-01.ibm.com/software/integration/optimization/cplex-optimizer.
[4] National Institute of Standards and Technology. NIST framework and roadmap for smart grid interoperability standards, release 1.0. 2010.
[5] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, pages 599-616, 2009.
[6] S. Chaisiri, B.-S. Lee, and D. Niyato. Optimization of resource provisioning cost in cloud computing. IEEE Transactions on Services Computing, 2011.
[7] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. USENIX OSDI, pages 137-150, 2004.
[8] R. V. den Bossche, K. Vanmechelen, and J. Broeckhove. Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads. IEEE Conference on Cloud Computing, pages 228-235, 2010.
[9] X. Fang, S. Misra, G. Xue, and D. Yang. Managing smart grid information in the cloud: Opportunities, model, and applications. IEEE Network, 2012.
[10] X. Fang, S. Misra, G. Xue, and D. Yang. Smart grid - the new and improved power grid: A survey. IEEE Communications Surveys and Tutorials, 2012.
[11] M. Hajjat, X. Sun, Y.-W. E. Sung, D. Maltz, S. Rao, K. Sripanidkulchai, and M. Tawarmalani. Cloudward bound: Planning for beneficial migration of enterprise applications to the cloud. In Proceedings of ACM SIGCOMM, pages 243-254, 2010.
[12] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. EuroSys, 2007.
[13] P. Kall and J. Mayer. Stochastic Linear Programming: Models, Theory, and Computation, volume 156. Springer Verlag, 2010.
[14] H. Kim, Y.-J. Kim, K. Yang, and M. Thottan. Cloud-based demand response for smart grid: Architecture and distributed algorithms. IEEE SmartGridComm'11, pages 398-403, 2011.
[15] J. Z. Li, M. Woodside, J. Chinneck, and M. Litoiu. CloudOpt: Multi-goal optimization of application deployments across a cloud. International Conference on Network and Service Management, pages 1-9, 2011.
[16] A.-H. Mohsenian-Rad and A. Leon-Garcia. Coordination of cloud computing and smart power grids. IEEE SmartGridComm, pages 368-372, 2010.
[17] K. Nagothu, B. Kelley, M. Jamshidi, and A. Rajaee. Persistent Net-AMI for microgrid infrastructure using cognitive radio on cloud data centers. IEEE Systems Journal, 6(1):4-15, 2012.
[18] V. Nikolopoulos, G. Mpardis, I. Giannoukos, I. Lykourentzou, and V. Loumos. Web-based decision-support system methodology for smart provision of adaptive digital energy services over cloud technologies. IET Software, 5(5):454-465, 2011.
[19] S. Rusitschka, K. Eger, and C. Gerdes. Smart grid data cloud: A model for utilizing cloud computing in the smart grid domain. IEEE SmartGridComm'10, pages 483-488, 2010.
[20] S. Sakr, A. Liu, D. M. Batista, and M. Alomari. A survey of large scale data management approaches in cloud environments. IEEE Communications Surveys and Tutorials, 13(3):311-336, 2011.
[21] Y. Simmhan, S. Aman, B. Cao, M. Giakkoupis, A. Kumbhare, Q. Zhou, D. Paul, C. Fern, A. Sharma, and V. Prasanna. An informatics approach to demand response optimization in smart grids. Technical Report, Computer Science Department, University of Southern California, 2011.
[22] Y. Simmhan, A. Kumbhare, B. Cao, and V. Prasanna. An analysis of security and privacy issues in smart grid software architectures on clouds. IEEE International Conference on Cloud Computing, 2011.
[23] C. Wang, K. Ren, and J. Wang. Secure and practical outsourcing of linear programming in cloud computing. In Proceedings of IEEE INFOCOM, pages 820-828, April 2011.
[24] Y. Xin, I. Baldine, J. Chase, T. Beyene, B. Parkhurst, and A. Chakrabortty. Virtual smart grid architecture and control framework. IEEE SmartGridComm, pages 1-6, 2011.
[25] H. Zhao, M. Pan, X. Liu, X. Li, and Y. Fang. Optimal resource rental planning for elastic applications in cloud market. IEEE International Parallel and Distributed Processing Symposium, 2012.

Xi Fang [StM'09] received B.S. and M.S. degrees


from Beijing University of Posts and Telecommunications, China, in 2005 and 2008, respectively. He is
a computer science Ph.D. candidate at Arizona State
University. He has received Best Paper Awards at
IEEE ICC 2012, IEEE MASS 2011, and IEEE ICC
2011. One of his co-authored papers was a runner-up
to the Best Paper Award at IEEE ICNP 2010.

Dejun Yang [StM'08] received his B.S. from Peking


University, Beijing, China, in 2007. Currently he
is a Ph.D. student in the School of Computing,
Informatics, and Decision Systems Engineering at
Arizona State University. He has received Best Paper
Awards at IEEE ICC 2012, IEEE MASS 2011, and
IEEE ICC 2011. One of his co-authored papers was
a runner-up to the Best Paper Award at IEEE ICNP
2010.

Guoliang Xue [M'96, SM'99, Fellow'11] is a Professor of Computer Science and Engineering at Arizona State University. His research interests include
survivability, security, and resource allocation issues
in networks. He has published over 200 papers in
these areas. He is an Associate Editor of IEEE/ACM
Transactions on Networking and IEEE Network. He
was a Keynote Speaker at IEEE LCN 2011, and
served as a TPC Co-Chair of IEEE INFOCOM 2010.
He is an IEEE Fellow.
