Promotors:
Prof. Dr. Jan Broeckhove
Dr. Kurt Vanmechelen
K. Vermeersch
List of Figures v
Preface x
Abstract xi
1 INTRODUCTION 1
1.1 What is Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 The Cloud (R)Evolution . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Service Models . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.4 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.5 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Amazon’s Cloud Computing Offering . . . . . . . . . . . . . . . . . . . 7
1.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 AWS Product Portfolio . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Instance Types . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.4 Instance Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Motivation for an EC2 Broker . . . . . . . . . . . . . . . . . . . . . . . 15
2 ENVIRONMENTAL ANALYSIS 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Price Evolution of the On-Demand and Reserved Instances . . . . . . . 20
2.3 Price Evolution Spot . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.2 Working Method . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.3 Price Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.4 SpotWatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Pricing Comparison of the Different Regions . . . . . . . . . . . . . . . 35
2.4.1 On-Demand . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.2 Reserved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.3 Spot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.4 Comparison of the Pricing Models . . . . . . . . . . . . . . . . 39
2.4.5 Data storage and transfer . . . . . . . . . . . . . . . . . . . . . 41
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 RESOURCE SCHEDULING 44
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Optimal Division between Pricing Models . . . . . . . . . . . . . . . . 46
3.2.1 Reserved vs On-Demand Instances . . . . . . . . . . . . . . . . 46
3.2.2 Spot Instances . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Workload Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.1 Workload Models . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.2 Workload Constraints . . . . . . . . . . . . . . . . . . . . . . . 52
3.4 Workload Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.4.1 Workload Model 1 (total VM hours needed is specified) . . . . . 54
3.4.2 Workload Model 2 (every hour #VMs needed is specified) . . . 56
3.5 Spot Decision Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.1 Checkpointing . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.5.2 SpotModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.5.3 Implementation Changes . . . . . . . . . . . . . . . . . . . . . 63
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 BROKER DESIGN 66
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2 Broker Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2.1 Task Generation and Specification . . . . . . . . . . . . . . . . 68
4.2.2 Price Gathering and Analysis . . . . . . . . . . . . . . . . . . . 71
4.3 Broker Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3.1 Region Allocation . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3.2 Workload Distribution . . . . . . . . . . . . . . . . . . . . . . . 74
4.4 Resource Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4.1 Reserved Model . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4.2 Spot Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.5 Broker Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.5.1 Graphical Representation . . . . . . . . . . . . . . . . . . . . . 78
4.5.2 Textual Representation . . . . . . . . . . . . . . . . . . . . . . 80
4.5.3 Detailed Cost Overview . . . . . . . . . . . . . . . . . . . . . . 80
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5 BROKER EVALUATION 83
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Cost Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.2.1 Workload Model 1 . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.2.2 Workload Model 2 . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3 Scalability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.3.1 Workload Model 1 . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.3.2 Workload Model 2 . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6 CONCLUSION 94
6.1 Conclusions and Contributions . . . . . . . . . . . . . . . . . . . . . . 94
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Appendices 98
Appendix G: Developed Software 117
G.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2 Environmental Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . 117
G.3 Broker Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
List of Figures
2.15 Boxplots Spot Price Evolution (High-CPU Extra Large Instance in
the US-East Region) . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.16 Screenshot of SpotWatch.eu . . . . . . . . . . . . . . . . . . . . . . . 34
2.17 Average Spot Price for Standard Large Linux instance in the US-East
Region between December 14th 2009 and February 14th 2010 . . . . 38
2.18 Instance Pricing Comparison US-East Region . . . . . . . . . . . . . 39
2.19 Instance Pricing Relative Comparison US-East Region . . . . . . . . 40
2.20 Instance Pricing Relative Comparison EU-West Region . . . . . . . . 41
4.11 Broker Output GUI Details SubTask . . . . . . . . . . . . . . . . . . 79
4.12 Broker Output GUI Cost Overview . . . . . . . . . . . . . . . . . . . 80
4.13 Broker Design Overview . . . . . . . . . . . . . . . . . . . . . . . . . 81
C.1 CPU Procurement Cost Evolution for Intel Xeon E5430 (per unit) . 107
C.2 CPU Procurement Cost Evolution for Intel Xeon E5507 (per unit) . 108
List of Tables
3.5 Spot Prices in Percentage of the On-Demand Prices . . . . . . . . . 51
3.6 Reserved prices become cheaper than spot prices after the stated
amount of days . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Preface
The subject of this thesis, designing a broker for intelligent (cost-efficient and QoS-aware) cloud resource allocation, was presented to me during my first year as a Computer Science Master's student at the University of Antwerp. I could not wait to take on the challenge. The choice of my Computer Science major, Distributed Systems and Computer Networks, reflects my special affinity with cloud computing.
This thesis was written using a data-analysis- and prototype-driven work method. Regular meetings with my mentors kept me on the right track, while keeping a blog1 up to date provided me with valuable feedback from the industry and obliged me to start the writing process early. The first term (fall 2010) focused on the research aspects, such as the price analysis and the region comparison. The second term (spring 2011) was spent finishing the broker prototype and writing this document.
First of all, I would like to thank my promotor Prof. Dr. Jan Broeckhove, co-promotor Dr. Kurt Vanmechelen and mentor Ruben Van den Bossche, all members of the research group Computational Modeling and Programming (CoMP) at the University of Antwerp. Without the continuous support of Kurt and Ruben my work would not be as valuable; it was a very pleasant experience to work with them. I am grateful for the opportunity I got and for the valuable skills I could develop while working on this thesis. Of course I would not have been able to successfully obtain my computer science degree without the support of my parents and my fellow students, who became great friends during the course of my college experience.
Finally, I would like to thank the providers of cloud applications, such as Dropbox and Google Docs, which helped me during the writing process. Most importantly, however, I want to thank the people behind EC2 at Amazon; all their tools and products are easy to use and well documented. I hope the information in this thesis will be as valuable for you as the experience of gathering it was for me.
1 The thesis blog can be found at http://www.thesis.kurtvermeersch.com/.
Abstract
In terms of pricing, an EC2 user has the choice between the following models:
1. On-Demand Instances are priced at a fixed hourly rate. Once launched, these instances are guaranteed to be kept live for as long as the user pays for them. A user, however, has no firm guarantee of being able to launch on-demand instances.
2. Reserved Instances require an upfront payment for an instance for a one- or three-year period, which is then supplemented with usage-based pricing at a fixed hourly rate. These instances come with guaranteed availability for the user.
3. Spot Instances have prices that vary hourly. They can be shut down by EC2 when the consumer's bid no longer exceeds the spot market price, so there are no availability guarantees for the user.
At present, consumers do not have any tools to optimally map their workload and QoS requirements (such as the deadline by which a workload needs to finish) to these different pricing plans. Nevertheless, the potential for cost reductions through intelligent instance allocation is huge; spot prices, for example, are on average far lower than on-demand prices. In this thesis we devise heuristics that can be used by a brokering component to realize this optimization goal. This involves an analysis of the consumer's workload characteristics, QoS requirements and the volatility of the EC2 spot market, in order to define an optimal portfolio of instance allocations across the three pricing plans.
CHAPTER 1
INTRODUCTION
This chapter first introduces the technology that forms the focus of this thesis: cloud computing. It then discusses the cloud concepts and how Amazon implements them in its Elastic Compute Cloud (EC2) products. The chapter concludes with a detailed overview of, and motivation for, the subject of this thesis.
1.1.1 Definition
One of the definitions of cloud computing that best fits our problem context is given in “Cloud Computing and Grid Computing 360-Degree Compared” [4].
The term cloud computing is often used as a buzzword for the big switch to an IT world where computation and storage resources are provided over the Internet. This is why many different definitions of, and misconceptions about, the term exist. In “The Cloud Revolution” [5], Charles Babcock sketches the typical case of a company CEO declaring that the cloud is ‘the next phase of Internet computing’, while the meaning of the term is now more muddled than ever. Since cloud computing is a rather new step in the computing evolution, it is constantly redefining itself, which makes it hard to capture in one generic definition. The definition by Ian Foster, however, emphasizes a number of the key features of cloud computing, which are explained here.
It all started in 1960, when John McCarthy said that “computation may someday be organized as a public utility” [9]. In 1966, many of the characteristics of cloud computing, such as elasticity, the illusion of infinite supply and the comparison with the electricity industry, were described by Douglas Parkhill in the book “The Challenge of the Computer Utility” [10]. The term ‘cloud’ originated in the telecommunication sector: when VPNs were introduced to balance the load instead of using direct point-to-point connections, the decentralized network structure was referred to as the cloud. The cloud symbol then found its way into telecommunication network diagrams, in which it represents the telephone network. Later on, the symbol also came to represent the Internet in computer network diagrams.
Nicholas Carr stated in “The Big Switch” [11]: “Cheap utility-supplied computing will ultimately change society as profoundly as cheap electricity did.” A hundred years ago, companies stopped producing their own power, since getting power through the electricity grid became cheaper. These companies could focus on their core business activities and no longer had to worry about power production. The same shift is now happening in IT. A number of non-IT companies will no longer need IT departments or their own server parks. Even IT startups no longer need to invest in expensive servers: FourSquare, a startup that created a location-based social network, runs its applications on top of Amazon EC2 [12]. Computing is becoming a utility; one plugs a cable into the wall and has access to it over the Internet. A commoditization of IT infrastructure is taking place, meaning that cloud computing is becoming commonplace and standardized. Cloud computing is a metered service: the customer only pays for the services and capacity that are actually used, and usage is measured according to well-defined models.
Over the last few years, large IT enterprises have been making big investments in cloud computing. Companies such as Google (Google Apps [13]), Microsoft (Windows Azure [14]), Amazon (EC2 [2] and S3 [15]) and IBM are investing heavily in their cloud computing services. These companies are building large next-generation data centers containing tens of thousands of servers. These investments are crucial to the growth of their services and to making the rise of cloud computing a reality.
Different deployment models for cloud computing exist: public, private and hybrid clouds.
A public cloud means that the customer uses an off-site, third-party cloud provider.
A private cloud means that the customer emulates the cloud computing concepts on a privately owned data center. This emulation is made possible by a number of virtualization products that offer the ability to host virtual machines on infrastructure used solely by one organization. These products provide some of the advantages of cloud computing, but still require an up-front investment in hardware; they lack the advantages inherent to provisioning capacity from a third-party provider.
A hybrid cloud combines a public and a private cloud. A company can host part of its application portfolio on managed dedicated servers, while the other part is hosted on public cloud instances.
1.1.4 Characteristics
A number of cloud computing characteristics were already touched upon while discussing the definition of cloud computing. This section gives an overview of the most important characteristics and the associated advantages of cloud computing.
Device and Location Independence: Cloud resources are accessible over the Internet, which makes access to the applications hosted on them independent of the device (as long as it has Internet access) and of the user's location.
Pay-what-you-use: The customer only pays for the services he actually uses. Clear rules define how the user is charged for the consumed resources, so it is important that resource usage is measured properly.
Green IT: Cloud computing is a form of green IT: a data center that increases the utilization rate of its hardware lowers the total power usage. Cloud providers invest in achieving an optimal Power Usage Effectiveness (PUE) [21] for their data centers; this matters because power consumption is one of the highest operational costs of a data center.
CapEx to OpEx: Cloud computing turns capital expenses, which existed in the form of up-front investments in servers, into operational expenses. A consumption-based pricing model is used, which means that one no longer pays for resources that are not needed.
1.1.5 Challenges
Not all characteristics of the cloud computing paradigm are advantageous; it also introduces a number of (new) challenges.
Data and vendor lock-in: Data and vendor lock-in can occur in a cloud context if a provider makes it difficult or impossible to move data or applications out of its data centers to another cloud provider. Solving this problem requires standardization of the technologies and procedures involved.
Privacy: Privacy concerns arise because countries have different regulations. The United States' PATRIOT Act [23], for example, makes it possible for the US government to request access to data when a terrorist threat is feared. This problem can be partially addressed by allowing customers to select an availability zone, which ensures that their instances run in the corresponding geographical part of the world, where legislation acceptable to the customer is in force.
2 A good illustration of this uncertainty is the failure Amazon EC2 experienced on April 22nd 2011, when a large number of instances and services were not available for multiple hours; see [22].
1.2.1 Introduction
Amazon decided to modernize its data centers after the dot-com bubble (in 2000), since its servers were using only a small fraction of their maximum capacity most of the time. By introducing the cloud model, which consolidates workloads on virtualized systems, Amazon obtained much higher resource utilization rates. This increased efficiency gave them the insight that they could reap even more of the advantages of cloud computing if a larger variety of workloads was available, a variety that could be achieved by offering cloud solutions to the public. Through operating the famous online book shop that made Amazon a well-known company, they had acquired the knowledge of the technologies needed to develop a cloud product. Amazon thus decided to develop one, which led to the beta release of Amazon Web Services (AWS) in 2006. This diversification, introducing a revenue stream from a new market, can also reduce a company's business risks.
Compute Products
EC2 is the web service that offers compute capacity in the cloud; it allows the user to obtain capacity, in the form of instances, in a public virtual computing environment.4
4 This overview was created in February 2011 and might not be up-to-date. See http://aws.amazon.com/ for detailed information.
Storage Products
Amazon S3 provides a web service that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. This service gives the customer the perception of infinite storage capacity. S3 users are likewise charged on a pay-what-you-use basis. Objects stored in S3 can be up to 5 TB in size (recently upgraded from 5 GB5). These objects are stored in buckets that can be accessed by providing a unique developer key.
The other storage services Amazon provides are Amazon Elastic Block Store (EBS) and the AWS Import/Export functionality. The latter accelerates moving large amounts of data into and out of AWS by using portable physical storage devices for transport. Amazon EBS provides an alternative root device from which an Amazon EC2 instance can be launched. Data on Amazon EBS persists independently of the lifetime of the instance, whereas data on the instance's local store only remains available during the lifetime of the instance.
Database Products
Amazon Relational Database Service (Amazon RDS) is a distributed relational
database in the cloud. Amazon RDS is based on the familiar MySQL database,
which means that applications and tools that work with MySQL databases will work
seamlessly with Amazon RDS. Amazon SimpleDB is a web service that provides the
functionality to do data indexing and fast querying of the data.
5 On December 9th 2010, the object size limit was raised to 5 TB.
Messaging Products
The Amazon Simple Queue Service (Amazon SQS) allows developers to move data between the distributed components of their applications that perform different tasks, without losing messages and without requiring each component to be permanently available. Amazon SNS (Simple Notification Service) enables sending notifications from the cloud to subscribers or other applications. This is done via a push mechanism: whenever the customer posts a notification, it is propagated directly to all subscribers of the corresponding topic and delivered over the protocol of the subscriber's choice.
Networking Products
Amazon Route 53 is a DNS service: it translates human-readable names into the numeric IP addresses of the servers associated with a web service. Route 53 answers DNS queries with low latency by using a global network of DNS servers. Amazon VPC (Virtual Private Cloud) creates a secure and seamless bridge between the existing IT infrastructure of a company and the AWS cloud; this is achieved by accessing EC2 through an IPsec-based virtual private network (VPN). In this way Amazon also embraces the hybrid cloud model that combines a private cloud with EC2. Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances using a number of protocols, namely HTTP, HTTPS, TCP and SSL. The service is also able to detect unhealthy instances, in which case it ceases routing traffic to them.
Conclusion
Amazon has developed an array of cloud products through the years6. Their cloud services have been expanding and evolving constantly since the beta introduction in 2006. James Hamilton wrote the following on his blog [25], which reflects the fact that Amazon operates in an emerging market:
Even working in Amazon Web Services, I’m finding the frequency of new
product announcements and updates a bit dizzying. It’s amazing how
fast the cloud is taking shape and the feature set is filling out. Utility
computing has really been on fire. I’ve never seen an entire new industry
created and come fully to life this fast. Fun times.
To specify the compute power of its instances, Amazon introduced the EC2 Compute Unit (ECU). One ECU matches the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent capacity of an early-2006 1.7 GHz Xeon processor, which is referenced in Amazon's original documentation. Amazon's data centers contain commodity off-the-shelf hardware, so different microprocessors have been used through the years. Because Amazon uses the ECU to specify the compute power of the different instance types it offers, it is quite difficult to compare prices across IaaS cloud providers, since this measure is Amazon-specific.
The instance types that exist today are divided into groups with similar characteristics, each well suited to a certain workload type. These instance type groups are Standard, Micro, High-Memory, High-CPU, Cluster-Compute and Cluster-GPU. For each instance group we compare the corresponding instances in a table. The stated ‘ECU’ amounts are obtained by offering a different number of virtual CPU cores (given in the ‘cores’ column). The ‘platform’ column of the instance comparison tables states whether the instance is a 32- or 64-bit system. The ‘I/O performance’ is labeled with a subjective grade (e.g. moderate). The fact that Amazon does not use an absolute measure indicates that the performance is highly dependent on how many customers are sharing the resource at a given moment in time.
Cluster instances come in two varieties: Compute and GPU instances. Cluster Compute instances are best suited for HPC (High Performance Computing) applications. These instances are powered by two quad-core Intel Xeon X5570 Nehalem processors and have improved I/O performance thanks to their increased network performance (10 Gigabit Ethernet connections). The Cluster GPU instances, on the other hand, are best suited for applications that make use of highly parallelized processing, such as rendering and media processing. These instances contain two NVIDIA Tesla Fermi M2050 GPUs and the same processor and networking interface as the Cluster Compute instances.
All these instances are launched from a certain AMI, which determines the operating system and software stack available at launch. Amazon EC2 currently supports a variety of operating systems, including RedHat Linux, Windows Server, OpenSuSE Linux, Fedora, Debian, OpenSolaris, CentOS, Gentoo Linux and Oracle Linux. This supply is constantly expanding, as EC2 supports more and more platforms.
Amazon offers three pricing models to its customers: On-Demand, Reserved and Spot pricing.
On-Demand Instances
In the on-demand pricing model the customer pays for compute capacity by the hour, at a fixed hourly rate. There is no upfront investment (i.e. no fixed fee per instance), which means there is no long-term commitment for the customer. This gives the user the flexibility to start and terminate instances whenever his application needs more or less compute capacity. It is, however, possible that for short periods of time no more instances are available in a certain availability zone. Whenever one needs assured availability of a certain number of instances, those instances should be reserved instead of acquired as on-demand instances. The on-demand pricing model is best suited for applications with short-term, spiky or unpredictable workloads that cannot be interrupted, and for applications in the testing or development phase.
Table 1.6 presents the pricing for the on-demand pricing model for all instance
types in the US East region.
Reserved Instances
Reserved instances require a one-time upfront payment (a fixed price), which reserves the instance for a one- or three-year term (depending on the customer's choice). In compensation for this upfront investment, the hourly rate is significantly lower and the customer is assured that his instance, with the chosen operating system and availability zone, will be available at any time. If one needs a certain instance constantly for a significant amount of time, the reserved pricing model is cheaper than the on-demand model. The tipping point at which a reserved instance becomes cheaper than an on-demand instance depends on the instance type and on the utilization rate of the instance; this is analyzed in section 3.2.1 of chapter 3. The reserved pricing model is best suited for applications with steady-state or predictable usage and for applications that require reserved capacity, including disaster recovery software.
Table 1.7 presents the pricing for the reserved pricing model for all instance types
in the US East region.
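The tipping point just described reduces to a simple amortization calculation. A minimal sketch in Python; the sample figures below are illustrative placeholders, not Amazon's published rates:

```python
def reserved_breakeven_hours(upfront, reserved_hourly, ondemand_hourly):
    """Hours of use after which a reserved instance becomes cheaper than
    paying the on-demand rate for the same usage: the point where the
    upfront fee is amortized by the hourly saving."""
    if ondemand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must undercut the on-demand rate")
    # Solve: upfront + reserved_hourly * h = ondemand_hourly * h
    return upfront / (ondemand_hourly - reserved_hourly)

# Placeholder figures: $227.50 upfront and $0.03/h reserved,
# versus $0.085/h on-demand for the same instance type.
hours = reserved_breakeven_hours(227.50, 0.03, 0.085)
print(round(hours))       # ~4136 hours of use
print(round(hours / 24))  # roughly 172 days of continuous running
```

At full utilization the break-even point falls well within a one-year term, which matches the claim that constant use favors the reserved model; at low utilization rates the on-demand model remains cheaper.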
Spot Instances
In December 2009 a new pricing model was released: spot pricing. Spot instances do not require an upfront commitment, and most of the time their hourly rate is lower than the on-demand rate. In this model the hourly price fluctuates; it is set by Amazon based on the supply of and demand for instances, which allows Amazon to sell its excess capacity. A customer specifies the maximum price (bid) he is willing to pay for the instance. When the spot price rises above the customer's maximum bid, the instance is shut down by Amazon. The spot pricing model is best suited for applications with flexible start and end times and for applications that are only feasible at very low compute prices. The average spot prices and an analysis of the spot price history follow in the next chapter.
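The bidding and termination rule can be illustrated with a toy hourly simulation. This is a sketch under simplified assumptions (hourly price granularity, each hour billed at the spot price rather than the bid); the price series and bid are invented:

```python
def run_spot_instance(spot_prices, bid):
    """Run a spot instance over an hourly spot-price series.

    The instance runs while the spot price stays at or below the bid and
    is terminated by the provider as soon as the price exceeds it.
    Returns (hours_run, total_cost)."""
    hours, cost = 0, 0.0
    for price in spot_prices:
        if price > bid:   # outbid: the provider shuts the instance down
            break
        hours += 1
        cost += price     # the customer is charged the spot price
    return hours, cost

# Invented hourly prices with a $0.040 bid: the fifth hour's price
# exceeds the bid, so the instance is terminated after four hours.
prices = [0.031, 0.033, 0.038, 0.035, 0.045, 0.032]
hours, cost = run_spot_instance(prices, bid=0.040)
print(hours, round(cost, 3))  # 4 0.137
```

The loss of the work in progress at termination is exactly why the later chapters pair spot instances with checkpointing and a decision model.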
Dedicated Instances
This pricing model was introduced on March 27th 2011 for the Amazon Virtual Private Cloud (VPC) product. Dedicated instances are launched on hardware dedicated to a single customer. This ensures that the Amazon EC2 compute instances run in an isolated environment, so that a customer's application performance cannot be influenced by the workloads of other customers. The single tenancy can be assumed to be limited to the local disk, processor and memory; the network and networked storage devices are still shared by multiple customers. Two versions exist: one without an upfront commitment, called Dedicated On-Demand instances, and one with a one- or three-year upfront fee, called Dedicated Reserved instances. For now this offering is only available in the US-East and EU-West regions. It is available for all instance types except the Micro, Cluster Compute and Cluster GPU instances. The prices for dedicated instances, presented in Table 1.8 and Table 1.9, are between 17% and 25% higher than those of regular on-demand and reserved instances (both the fixed fee and the hourly rate for reserved instances). On top of this, there is an additional dedication fee of $10 per hour for each region in which at least one dedicated instance is running. The convenience of having an isolated resource thus comes at a rather high cost.
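The cost structure described above, a premium on the regular rate plus a flat per-region fee, can be sketched as follows. The 17-25% premium range and the $10 per hour dedication fee come from the text; the 20% premium and $0.50 base rate chosen below are hypothetical:

```python
def dedicated_hourly_cost(n_instances, ondemand_rate, premium=0.20,
                          region_fee=10.0):
    """Hourly bill for n dedicated on-demand instances in one region:
    each instance pays a premium over the regular on-demand rate, and a
    flat dedication fee applies once per region as soon as at least one
    dedicated instance runs there."""
    if n_instances == 0:
        return 0.0
    per_instance_rate = ondemand_rate * (1 + premium)
    return n_instances * per_instance_rate + region_fee

# The flat region fee dominates the bill for small fleets and becomes
# negligible for large ones.
print(round(dedicated_hourly_cost(1, 0.50), 2))    # 10.6
print(round(dedicated_hourly_cost(100, 0.50), 2))  # 70.0
```

This shape suggests dedicated instances are only cost-effective for customers running many of them per region, which underlines the "rather high cost" remark.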
Free Tier
On November 1, 2010, Amazon introduced a free usage tier for its AWS services. The following is offered to new customers, per month, for a duration of 12 months:
• 750 hours of Micro instance usage.
• 750 hours of an Elastic Load Balancer and 15 GB of data processing.
• 10 GB of EBS storage, 1 million I/Os, 1 GB of snapshot storage, 10,000 snapshot GET requests and 1,000 snapshot PUT requests.
Another part of the free usage tier is offered to existing customers as well, and these offerings do not expire after 12 months. Everyone gets 25 Amazon SimpleDB machine hours with 1 GB of storage, 100,000 Amazon Simple Queue Service requests, 10 CloudWatch alarms, and a number of Simple Notification Service requests and notifications.
Table 1.10 gives an overview of the instance types offered by Amazon EC2 under the different pricing models and operating systems in the different geographical regions. The Cluster Compute and Cluster GPU instances are not offered everywhere; they are only available running Linux in the US-East region.
1.3 Motivation for an EC2 Broker
The pricing models offered by Amazon, as described in the previous section, give the user the following options:
• Allocate on-demand instances, which are priced at a fixed hourly charging rate.
• Allocate reserved instances, which carry a lower hourly rate; this model however requires an upfront payment per instance for a one- or three-year period.
• Allocate spot instances, whose prices vary hourly and which can be shut down by EC2 when the spot market price exceeds the consumer's bid.
Determining how much it costs to run an application on EC2 is a hard task. The cost depends on many properties of the application's workload: which instances and cloud services it requires, in which geographic location it has to run, how much data storage and transfer is needed, and so on. Although a simple cost calculator [26] is available, it still requires many manual decisions that are hard to make, such as which instances and instance pricing models should be used. It cannot determine the optimal solution, in the sense of the one with the lowest cost, for running your application on EC2 while still respecting the QoS constraints of your application and workload.
For the moment, consumers do not have any tools to optimally map their workload and QoS requirements (such as the deadline by which a workload needs to finish) to the different EC2 pricing plans. Nevertheless, the potential for cost reductions through intelligent instance allocation is huge, since the difference in hourly prices between the models is significant. The differences between these models and an analysis of the historic prices are discussed in the next chapter.
The goal of this thesis is to devise heuristics that can be used by a brokering component to realize this optimization goal. This involves an analysis of the consumer's workload characteristics, QoS requirements and the volatility of the EC2 spot market, in order to define an optimal portfolio of instance allocations across the three pricing plans.
CHAPTER 2
ENVIRONMENTAL ANALYSIS
This chapter explains which environmental parameters are taken into account in the
heuristic this research proposes for the EC2 resource broker. These parameters
mostly resulted from the analysis of Amazon's pricing models and of the actual prices
of the different AWS products and services. Environmental parameters that can
influence the customer's total cost considerably are considered important; other
parameters are ignored, since they have little influence on the total cost.
Determining which parameters to take into account is a trade-off that influences the
complexity of the broker.
2.1 Introduction
The price of running a particular application, with a corresponding workload, on
EC2 does not depend on the chosen pricing model alone. The first choice that
determines the associated cost is the instance type the application requires. In the
proposed broker we presume this choice is made by the user, although selecting an
instance type intelligently could be automated in further research: depending on the
type of workload, the application can be benchmarked for a fixed amount of time,
and from the acquired data the resource requirements of the application can be
derived, which directly yields a best-fit instance type.
There are other degrees of freedom when choosing the instance on which a workload
will run, and some of these choices influence the price. One such parameter is the
operating system (OS) of the instance: EC2 offers both Linux and Windows
instances, and this choice depends strongly on the application's characteristics.
A geographical region must be chosen for the instance as well; depending on the
application this choice can be completely unconstrained, and as will be shown later
on, it can make a big difference in the resulting cost. Another degree of freedom is
the availability zone, but all zones within a region offer the same prices on EC2.
The chosen pricing model clearly influences the total cost significantly. The
choice between on-demand and reserved instances depends only on how long the
customer will need the instance: there is a tipping point (as will be shown
in chapter three) from which on a reserved instance is the appropriate choice
to ensure the lowest possible total cost. Whether the spot pricing model should be
used depends on whether the application tolerates being interrupted at unexpected
moments. Certain types of workload will require a checkpointing technique; this
checkpointing/snapshotting overhead takes time away from the instance hour and
should be taken into account when determining the total cost for a workload. Since
these parameters are workload dependent, different workloads will be examined
(see chapter 3).
Amazon offers different instance purchasing options, and each exposes a number
of environmental parameters that can be exploited to reduce the cost of running a
workload on EC2.
• On-Demand Instances let customers pay for compute power by the hour. There
are no long-term upfront commitments, which frees the customer from the costs
of purchasing and maintaining hardware: large fixed costs (capital expenses) are
transformed into smaller variable operational expenses. The need for a privately
maintained capacity safety net, required in case of traffic peaks, disappears, since
extra instances can be launched, even automatically, in case of an unexpectedly
high traffic load. The evolution of the on-demand price over time will be discussed
in this chapter, to determine whether this price change has to be accounted for in
our model; we can assume that cloud provider competition and falling hardware
costs will cause the on-demand prices to drop over time. A comparison of the
on-demand prices in the different regions will also be made, from which we can
conclude whether certain instance types are cheaper in certain regions.
• Reserved Instances require the user to make a one-time upfront payment for a
term of one or three years, at the user's choice. The hourly rate for these instances
is, however, significantly lower than the corresponding on-demand price. The
customer has no further obligations: he may choose how much of the time a
reserved instance is used, and is only charged for the hours the instance is actually
running. The evolution of the reserved prices over time will be discussed in this
chapter, to determine whether there have been price changes both in the upfront
fee and in the hourly rate. We again assume these will be price reductions, since
the hardware cost for Amazon to offer these resources decreases over time. A
comparison of the reserved prices in the different regions will be made, from which
we can conclude whether certain instance types are cheaper in certain regions. We
will also determine from what level of usage a certain region becomes cheaper; the
difference is caused by differences in the ratio of upfront fee to hourly rate across
the regions.
• Spot Instances allow customers to bid on unused Amazon EC2 capacity; the
corresponding instances are periodically priced based on the level of supply and
demand (see appendix D). As long as this price is lower than the customer's
bid, the customer is able to run a requested instance. If an application is
flexible enough to run on spot instances, the Amazon EC2 costs can become
significantly lower. It is therefore important to study the spot price history in
order to develop an intelligent resource allocation heuristic that takes spot
instances into account. The statistical analysis of the spot price history focuses
on finding trends in the spot price evolution: investigating the difference in price
between the geographical regions, but also between night and day, between week
and weekend days, and so on. The spot price is also compared to the other
pricing models across the different regions. We try to determine whether these
prices are purely driven by the rules of supply and demand, or whether they are
artificially chosen to meet Amazon's needs. More information about the statistical
terms and the boxplots used in this chapter can be found in appendix E.
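The tipping point between on-demand and reserved instances mentioned above can be made concrete with a small calculation. The sketch below is illustrative; the fee and rate figures in the example are placeholders in the spirit of the 2011 US-East Standard Small prices, not authoritative values.

```python
def break_even_hours(upfront_fee, reserved_hourly, on_demand_hourly):
    """Return the number of instance hours above which a reserved
    instance becomes cheaper than running on-demand.

    Total reserved cost:  upfront_fee + hours * reserved_hourly
    Total on-demand cost: hours * on_demand_hourly
    Setting them equal and solving for hours gives the tipping point.
    """
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be below on-demand rate")
    return upfront_fee / (on_demand_hourly - reserved_hourly)


# Illustrative figures: a $227.50 upfront fee, a $0.03/hour reserved rate
# and a $0.085/hour on-demand rate give a tipping point of roughly 4136
# hours, i.e. about half a year of continuous use.
print(round(break_even_hours(227.50, 0.03, 0.085)))
```

A broker only has to compare the expected usage of an instance against this break-even point to pick between the two models.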
The research in this chapter concerns an empirical analysis based on the history
of the EC2 instance prices. An important preliminary step is to find a way to
acquire this pricing history. The current on-demand and reserved prices can
be found, subdivided per region, on the EC2 pricing website [27]; the EC2 API
does not provide a way to fetch them. The history of these on-demand and
reserved prices is quite hard to reconstruct. The only way to find out when and
by how much these prices changed was to examine the announcements in the
News and Events section of the AWS EC2 website [28] for price reductions, and
several blogs and forums had to be consulted to determine the previous prices.
The spot price history, on the other hand, can be requested through the Amazon
EC2 API, although only the spot prices of the last 3 months can be obtained;
the API is thus not the ideal source when a long-term analysis is desired. The
spot price history data can instead be found on cloudexchange.org [29] and
exported to a CSV file containing record tuples of a timestamp (date and time)
and the spot price at that particular moment, expressed in US dollars.
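As a sketch of how such an export can be processed, the snippet below parses a two-column CSV of (timestamp, price) tuples into a chronologically sorted list. The exact timestamp format of the cloudexchange.org export is an assumption here and may need adjusting.

```python
import csv
from datetime import datetime


def load_spot_history(path):
    """Read (timestamp, price-in-USD) tuples from a CSV export and
    return them in chronological order."""
    records = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Timestamp format is assumed; adjust to the actual export.
            ts = datetime.strptime(row[0], "%Y-%m-%d %H:%M:%S")
            records.append((ts, float(row[1])))
    records.sort()  # chronological order simplifies later analysis
    return records
```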
2.2 Price Evolution of the On-Demand and Reserved Instances
Appendix B presents the pricing history in detail; a general overview of the
evolution of the on-demand prices in the US-East region is given in Table 2.1.
Note that the dark grey colored cells contain the prices in US dollars that are
currently in use.
¹ Micro instances were announced on September 9th, 2010.
Table 2.1 demonstrates that the on-demand prices of the Standard and High-CPU
Unix instances have decreased by 15% since their introduction years ago, and that
the High-Memory instances have decreased by over 16% since their introduction in
2009. The instances introduced in 2010 have not yet received a price update. The
price of the Windows instances has decreased less over time: the Standard and
High-CPU instances became about 4% cheaper in 2 years. What causes this
difference in price reduction between the on-demand Linux and Windows instances
is hard to determine. One possible explanation is that Microsoft is still fine-tuning
the licensing costs it charges Amazon for the Windows operating system; another
is that the Windows instances were already priced more competitively at their
launch date. The Windows High-Memory instances, however, did get a serious
price decrease of about 14%. The Windows instances that launched in 2010 have
not had a price update yet.
Next, a general overview of the evolution of the reserved prices is given in Table
2.2.
When the hourly rate for reserved instances changes, this also affects the hourly
rate of reserved instances acquired before the rate change. This is not mentioned
in the official terms, but an official press release [28] about the price reduction of
September 2010 shows it is done in practice:
If you have existing Reserved Instances your hourly usage rate will
automatically be lowered to the new usage rate and your estimated bill
will reflect these changes later this month.
The customer has still paid the original one-time fee, so a possible reduction of the
one-time fee could still be taken into account by a broker. We notice that not many
price reductions have happened in the EC2 history; most new prices were
introduced together with a new instance type. The only real general price
reduction happened in October 2009, when all on-demand prices were lowered by
up to 15%. The increasing competition in the IaaS market could introduce more
regular price changes in the future, as competitors such as CloudSigma,
ElasticHosts, FlexiScale, GoGrid and RackSpace are gaining market share.
2.3 Price Evolution Spot
2.3.1 Introduction
Spot instances open up a market model in which unused resources are sold through
the mechanisms of supply and demand (see appendix D), but are these rules
actually followed? The pricing might not be purely supply/demand driven, because
there is only one seller, namely Amazon. No one can guarantee that Amazon does
not use all the information at its disposal to set the (for them most profitable)
current spot price. Amazon could, for example, use the maximum bids customers
placed on the spot market to choose the price that maximizes its profit.
Customers whose applications need resources urgently can bid a higher price on
the spot market to get the remaining resources at their disposal; specifying a higher
maximum bid raises the priority of a request for capacity. An interesting question
is why customers would not always bid the on-demand price on the spot market,
hoping to always pay less than the on-demand price. The problem with this
technique is that the spot price can actually rise above the on-demand price, which
would cause all such spot instances to be terminated at once. Spot instances are
especially applicable to flexible applications (easily stopped and started again) such
as financial modeling, web crawling, load testing and video conversion jobs. Note
that these tasks can be performed in iterations, which makes taking snapshots
rather easy. It remains difficult, however, to choose an intelligent checkpointing
scheme, since the best scenario, the one that causes the least amount of overhead,
would be to take a snapshot only right before the spot instance is terminated.
Snapshotting makes these types of applications more resilient to the fact that spot
instances are terminated when the current spot price exceeds the maximum bid.
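The risk of a fixed maximum bid can be illustrated by replaying a price trace against it. The toy replay below is a simplification that assumes one price observation per charging period; the EC2 termination mechanics are more involved.

```python
def replay_bid(price_trace, max_bid):
    """Replay a spot price trace against a fixed maximum bid.

    Returns (uptime_fraction, terminations): the fraction of periods in
    which the instance could run, and how often it was forcibly
    terminated because the spot price exceeded the bid.
    """
    up_periods, terminations, running = 0, 0, False
    for price in price_trace:
        if price <= max_bid:
            up_periods += 1
            running = True
        else:
            if running:
                terminations += 1  # price rose above the bid: instance killed
            running = False
    return up_periods / len(price_trace), terminations
```

Replaying a trace that occasionally spikes above $0.085 with a bid of exactly $0.085 shows why even an on-demand-level bid does not guarantee uninterrupted operation.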
The boxplot in Figure 2.2 shows that the spot price in the US-East region for
Standard Small instances rose above $0.085/hour, the on-demand price for that
region, on multiple days. On a number of days the boxplot indicates outliers that
represent spot prices higher than the corresponding on-demand price of the
instance. We notice that during most days of the month the quartiles are
positioned around the $0.031/hour point, which is the spot price of the
corresponding instance most of the time. This phenomenon can be explained as
follows: when a number of users need resources urgently, they can specify a
maximum bid higher than the on-demand price to raise the relative priority of
their requests, which allows them to gain access to as much immediate capacity as
possible. If only a limited number of spot instances is available, this can cause the
spot price to rise above the on-demand price, following the rules of supply and
demand. The spot price does not follow these rules all of the time, however, since
we sometimes get out-of-capacity errors without a change in the spot price, as
discussed previously.
Figure 2.2: Example of Spot Prices exceeding the On-Demand Prices in 2010
(Standard Small Linux instance in the US-East region)
• Average per week: We want to investigate whether there are certain weeks in
the year when the spot market behaves differently. For example, prices could
decrease during the Christmas holidays, since many people are off from work and
certain cloud applications need less capacity; or this could be a period of higher
demand, since people buy gifts online and web shops need more capacity.
• Average per day of the week: The average per day of the week is included to
determine whether a pattern can be found in the spot price across the days of the
week. This could, for example, make it possible to determine whether the spot
prices are lower (or higher) during the weekend, when many people are off from
work.
• Average per hour of the day: The average per hour of the day is included to
determine whether a pattern can be found in the spot price during the day. We
could, for example, find out whether prices decrease during the night in a certain
region (taking the time differences into account).
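All three scenarios boil down to the same grouping operation with a different bucketing key. A minimal sketch using only the Python standard library, assuming the (datetime, price) record tuples described earlier:

```python
from collections import defaultdict
from statistics import mean


def average_per(records, key):
    """Average spot price per bucket; `records` are (datetime, price)
    tuples and `key` maps a timestamp to a bucket, e.g.:
      lambda ts: ts.isocalendar()[1]  -> average per week of the year
      lambda ts: ts.weekday()         -> average per day of the week
      lambda ts: ts.hour              -> average per hour of the day
    """
    buckets = defaultdict(list)
    for ts, price in records:
        buckets[key(ts)].append(price)
    return {bucket: mean(prices) for bucket, prices in sorted(buckets.items())}
```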
For all these scenarios we created a boxplot of the data and a plot of the average
price. Boxplots are very good at presenting information about the central tendency,
symmetry and skew of the data, as well as its outliers. This data can be used to
make the EC2 capacity planner more intelligent; it could, for example, be used to
foresee cheaper prices at certain times of the day. To make this possible some
statistics, including the mean, skew and kurtosis, are extracted from the spot price
history data set; these values are stored, for every scenario, for each possible
OS-region-instance type combination. The different boxplot components and the
statistical values that are calculated are explained in appendix E.
Figure 2.3: Average Spot Price per Day (High-Memory Double Extra Large Linux
instance in the US-East region)
Although the average prices could suggest considerably higher spot prices on certain
dates, the boxplots show that the fluctuations in average price are caused by outliers.
The percentiles in the boxplots are aligned quite well, also on the days when the
average is higher. This means that most of the values on such a date are still lower
than the Q3 border, which is situated around $0.176/hour. If we take the lower and
upper whisker ends into account, it becomes even clearer that most prices lie within
the same range of values on all dates; the fluctuations in average spot price can only
be explained by the existence of outliers. For this instance we get the following
statistical values: Q1 = 0.165, Q2 = 0.172, Q3 = 0.176, an arithmetic mean of
0.1833, a skew of 4.73 and a kurtosis of 21.01. A positive skew indicates that the
tail on the right side is longer than on the left side and that the bulk of the values,
including the median, lies to the left of the mean. A high skew value is an
indication of outliers, since it tells us that most values are smaller than the average;
the high kurtosis value is another such indication.
Figure 2.4: Boxplots Spot Price per Day (High-Memory Double Extra Large Linux
instance in the US-East region)
We can use the information acquired from the average-per-day analysis in our
resource allocation algorithm, by determining a maximum bid price that
guarantees us, with a certain probability, access to the needed amount of spot
instances. This avoids renting instances at the moments that extreme outlier
values occur. The evolution of the quartiles should also be monitored to adjust our
maximum bid appropriately; this, however, requires our broker to have a real-time
component in its scheduler. We focus on general trends for now, by which we
mean conclusions that can be drawn from analysing the spot price history without
introducing a real-time component in our resource allocation algorithm.
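A non-real-time version of this bid policy can be sketched as picking an empirical quantile of the observed prices, optionally capped at the on-demand price. The 0.95 survival level used below is an arbitrary illustration, not a value derived in this thesis.

```python
def quantile_bid(price_history, survival=0.95, on_demand_price=None):
    """Choose a maximum bid that lies above `survival` of the observed
    spot prices, so that rare outlier spikes are simply not bid for."""
    ordered = sorted(price_history)
    index = min(int(survival * len(ordered)), len(ordered) - 1)
    bid = ordered[index]
    if on_demand_price is not None:
        # Never bid above on-demand: at that point spot loses its appeal.
        bid = min(bid, on_demand_price)
    return bid
```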
Average per Week
The average price per week graph (see Figure 2.5) shows fluctuations, but these
are less abrupt, which is understandable since averaging the spot prices per week
flattens out the outlier values.
Figure 2.5: Average Spot Price per Week (High-Memory Double Extra Large Linux
instance in the US-East region)
Once again the boxplot graph (Figure 2.6) shows fewer fluctuations; the quartiles
are well aligned across the different weeks, and the differences in average spot price
are again caused by outliers.
Figure 2.6: Boxplots Spot Price per Week (High-Memory Double Extra Large Linux
instance in the US-East region)
Figure 2.7 shows that on a bigger time scale, namely from the introduction of
spot instances (December 2009) until January 1st 2011, the same conclusion
holds. All quartiles are aligned very well; the only peculiarity is that during the
first couple of weeks there was a search for the right spot price.
Figure 2.7: Boxplots Spot Price per Week between December 2009 and January
2011 (Standard Large Linux instance in the US-East region)
Figure 2.8: Boxplots Spot Price per Week during the Christmas period between
December 1st 2010 and January 10th 2011 (High-Memory Double Extra Large Linux
instance in the US-East region)
Figure 2.9: Boxplots Spot Price per Day of the Week (High-Memory Double Extra
Large Linux instance in the US-East region)
Figure 2.10: Average Spot Price per Hour of the Day (High-Memory Double Extra
Large Linux instance in the US-East region)
The boxplots (see Figure 2.11) clearly show that the percentiles lie rather close to
each other, and thus that the fluctuations in average spot price are caused by
outliers. The statistical values are almost always around 0.165 for Q1 and around
0.176 for Q3. The kurtosis value is rather high most of the time, which indicates
the presence of outliers.
Figure 2.11: Boxplots Spot Price per Hour of the Day (High-Memory Double Extra
Large Linux instance in the US-East region)
Since a difference in price between weekend and week days is noticeable, it is
natural to investigate the price difference between day and night as well. We take
the day to run from 8AM until 8PM. The plot in Figure 2.12 shows the difference
between the day and night prices for every day; a positive peak means that the
spot price was more expensive during the day, which was the case for most days.
Looking at the statistics, we notice that the price during the day is on average
0.0096 US dollars higher than during the night. This difference is rather small for
an average price of $0.1833. We conclude that our suspicion of higher prices during
the day is confirmed in this case, but that the difference is very small. The small
difference between day and night can be explained by the fact that the US-East
region is the very first EC2 region and is used by developers from all over the
world: developers are attracted to this region by the pricing of the instances, and
new services are introduced at the US-East location first. The global consumer
base of the US-East region causes a constant workload on the region, which can
explain the small difference in price between day and night. It is worth bearing in
mind that this observation could become more pronounced in the future, when the
spot market becomes more active; the average difference between day and night
and its standard deviation are good metrics to keep an eye on.
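The day/night comparison above corresponds to the following computation, with the day fixed at 8AM-8PM as in the text:

```python
from statistics import mean


def day_night_difference(records, day_start=8, day_end=20):
    """Average day price minus average night price for a list of
    (datetime, price) tuples; positive means days are more expensive."""
    day = [p for ts, p in records if day_start <= ts.hour < day_end]
    night = [p for ts, p in records if not (day_start <= ts.hour < day_end)]
    return mean(day) - mean(night)
```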
Figure 2.12: Spot Price Difference between Day and Night (High-Memory Double
Extra Large Linux instance in the US-East region)
Figure 2.13: Boxplots of the spot prices between December 18th 2010 and February
13th 2011 (High-Memory Double Extra Large Linux instance in the US-East region)
Comparing the average spot prices of these two time frames (see Table 2.3) reveals
that the average prices did change over time. Most of the average spot prices
increased a little, except for the High-CPU Extra Large and Standard Large
instances, whose prices dropped considerably. For the Standard Large instance this
can be explained by the fact that during the earlier time frame its spot price was
too high compared to the other spot prices: its spot versus on-demand price ratio
was much higher than that of the other instance types, so the deviation can be
seen as a price correction. The High-Memory Double and Quadruple Extra Large
instance prices also decreased, which is explained by the on-demand hourly price
reduction these instances received on September 1st 2010; during the second time
frame the spot prices of these instances were adjusted to the newly introduced
prices. We conclude that monitoring the long-term evolution of the prices is an
important task.
Table 2.3: Comparison Average Spot Prices (High-Memory Double Extra Large
Linux instance in the US-East region)
Figure 2.14: Average Spot Price Evolution (High-CPU Extra Large Instance in the
US-East Region)
On the boxplot we notice that the price evolution starts to become interesting at
the beginning of October, when the percentiles start to be positioned differently.
Figure 2.15: Boxplots Spot Price Evolution (High-CPU Extra Large Instance in the
US-East Region)
Other graphs for this High-CPU Extra Large instance type also start to become
interesting in October 2010. It is therefore important to analyze the price traces
during different time frames, because time periods can be detected during which
the market mechanisms are not fully in operation.
Comparison to other Operating Systems and Regions
The US-East region is the cheapest and is presented by Amazon as the region to
play around in; all new functionality is introduced in this region first. It is
therefore reasonable to assume that customers from all over the world use
instances in this region, so that there is activity on the US-East spot market at all
times of the day, caused by the different time zones the customers are located in.
When the spot market in the EU-West region is analyzed, one would predict the
prices to decrease more during the night than in the US-East region. This is
however not the case, which can indicate a lack of activity in the spot market of
this region. The same conclusions as those made for the US-East region hold for
the other regions and for the Windows instances as well.
2.3.4 SpotWatch
To enable further research on the statistical analysis of EC2 spot pricing, we
created the website http://www.spotwatch.eu, where graphs containing averages
and the accompanying boxplots are generated on request. This web service allows
its users to create graphs for all existing regions (including the new Tokyo region),
instance types and operating systems offered by Amazon EC2, in any desired time
frame. The spot price history is available from the beginning of the EC2 spot
market until the current date and is updated daily through an Amazon EC2 API
call. SpotWatch offers four different chart types: the data can be plotted per date,
per week, per day of the week and per hour of the day. The features of this service
can easily be extended in the future; useful extensions include presenting a number
of statistical values about the queried time period, offering more graph types, and
letting users download graphs together with their corresponding CSV data files.
Since we received multiple positive reactions from people in the cloud industry,
including two AWS evangelists at Amazon.com, we plan to keep this site up and
improve its features over time. The screenshot in Figure 2.16 shows the service's
look and feel.
2.4 Pricing Comparison of the Different Regions
2.4.1 On-Demand
Table 2.4 presents the on-demand prices for the different instances across the
existing regions. For Linux instances we notice that the US-East region is the
cheapest. The US-West, EU-West and APAC-SouthEast (Singapore) regions have
the same prices, but these are considerably higher than those in the US-East
region: the Standard and High-CPU instances are 11.76% more expensive, the
Micro instances 25% and the High-Memory instances 14%. The newly introduced
APAC-North (Tokyo) region has the highest Linux instance prices. The fact that
the US-East region is the cheapest can have many causes, such as higher
operational costs (e.g. personnel, taxes, ...) in the other regions.
Table 2.5 presents a comparison of the on-demand prices for Windows instances
across the different regions. The same order does not hold for Windows instances:
the US-East, EU-West, APAC-SouthEast and APAC-North regions have the same
prices, while the US-West region is the most expensive one. There is however one
exception: the Micro instances are cheapest in the US-East region and have the
same price in all other regions. This proves that, even when the most important
objective is to get instances at the lowest possible cost, the US-East region is not
always the best option. When dealing with Windows instances, it would be
beneficial to study other parameters, such as latency, to determine the appropriate
region.
2.4.2 Reserved
Table 2.6 presents the hourly reserved prices for the different instances across the
existing geographical regions. For Linux instances the US-East region again has
the cheapest prices. The US-West, EU-West and APAC-SouthEast (Singapore)
regions have the same prices, which are between 25 and 30 percent higher than the
US-East prices. The prices in the new APAC-North (Tokyo) region are once more
(as with the on-demand prices) the highest: about 9 to 11 percent higher than
those in the US-West, EU-West and the other APAC region. It is important to
note that the Tokyo region is the first to introduce a different fixed fee, which
makes the region even more expensive compared to the others (its fixed fees are
about 5 percent higher).
Table 2.7 shows a comparison of the reserved hourly prices for Windows instances
across the different regions. Here the same order does hold as for the Linux
instances: the US-East region is the cheapest, followed by the US-West, EU-West
and APAC-SouthEast regions, which all have the same prices, between 13 and 25
percent higher than the US-East prices. For Windows instances the same increased
fixed fee applies in the newest region, APAC-North (Tokyo), which makes this
region even more expensive; its hourly prices were already between 7 and 9 percent
higher than in the US-West, EU-West and APAC-SouthEast regions, except for the
Micro instance price, which is about 23 percent higher.
2.4.3 Spot
For the following tables the average spot price for each possible OS, instance type
and region combination was first determined using the spot price history up until
the end of March 2011. Table 2.8 presents the spot prices for the Linux instances
across the different regions; it shows that for all instances the US-East region is the
cheapest. The US-West, EU-West and APAC-SouthEast regions have practically
the same average spot prices, which are a little higher than those in the US-East
region. The Tokyo region is once again the most expensive one, although the
High-Memory Double Extra Large and Quadruple Extra Large instances seem to
be an exception, since their prices do not exceed those of the other regions. The
Tokyo region has not existed long enough (only since the beginning of March 2011)
to be certain whether this is a robust trend.
Table 2.9 presents a comparison of the average spot prices for Windows instances
across the different regions. The same order roughly holds: the US-East region
remains the cheapest. All instances, except for the High-Memory Extra Large and
the Micro instance, have a cheaper Windows spot price in the Tokyo region than
in the other (non-US-East) regions; the average spot prices in the Tokyo region are
still considerably higher than in the US-East region, however. This could again
mean that the observation period was too short to get a good picture of the
average spot prices, or that there is an initialization phase during which the spot
prices behave a little differently. A phase in which the spot price had not yet
settled is noticeable in other regions as well, as can be seen in Figure 2.17, which
shows the price fluctuations during the first couple of days at the end of 2009,
when spot instances had just been introduced.
Figure 2.17: Average Spot Price for Standard Large Linux instance in the US-East
Region between December 14th 2009 and February 14th 2010
2.4.4 Comparison of the Pricing Models
Figure 2.19 shows a normalised version of the instance price comparison for the
US-East region. The reserved prices are always around 65% of the on-demand
prices; as discussed above this is the best-case scenario, with the smallest possible
fixed fee amount per hour (for the one-year period). The average spot price equals
about 35% to 45% of the on-demand price, so it is a lot cheaper, even compared to
the reserved prices. The Micro instance is an exception, however: its average spot
price is relatively high and only about 4% cheaper than the corresponding reserved
price.
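Figures of the "65% of on-demand" kind correspond to amortising the one-time fee over every hour of the term, as sketched below. The fee and rate in the example are illustrative placeholders, not the exact values behind Figure 2.19.

```python
def effective_reserved_hourly(upfront_fee, hourly_rate, term_hours=365 * 24):
    """Effective hourly cost of a reserved instance that runs for the
    whole term, spreading the upfront fee over every hour."""
    return hourly_rate + upfront_fee / term_hours


# Illustrative one-year reservation: $227.50 upfront at $0.03/hour,
# compared against an $0.085/hour on-demand rate; the ratio lands
# around 0.66, close to the ~65% observed in the normalised graph.
ratio = effective_reserved_hourly(227.50, 0.03) / 0.085
```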
Figure 2.20 shows the normalised graph for the EU-West region. The reserved
prices (assuming the instances are used the whole year) are about 70% of the
on-demand prices, which is relatively high compared to the US-East region. The
spot prices in the EU-West region lie between 43% and 53% of the on-demand
prices; this is also relatively high compared to the US-East region, where most
average spot prices were less than 40% of the on-demand prices. Either the
on-demand prices in the EU-West region are relatively low, or those in the
US-East region are relatively high.
In the other regions a graph similar to the one for the EU-West region is
observed, which indicates that the spot and reserved prices for Micro instances
lie closer together than the prices of the different pricing models do for the
other instance types.
The Asian regions are the most expensive ones for data transfer, with the Tokyo
region being the most expensive of all. For incoming data, on the other hand, all
regions offer the same price.
2.5 Conclusion
This chapter determined the scope of our model. The analysis of the EC2 instance
prices identified the environmental parameters that will be taken into account in
the heuristic this thesis proposes for its resource broker. Other parameters that
at first seemed important will be ignored, since it became clear that they do not
influence the total cost much. Determining the scope of the model influences the
complexity of the broker: there is a trade-off between the completeness of the
model and the complexity of the algorithms involved.
The cloud computing market is relatively new and thus very volatile; the product
portfolio and prices change continuously. It is nevertheless possible to draw
some trends and conclusions from the given analysis. The following items
considerably influence the total cost of running a customer’s workload on EC2:
• The choice of region will be taken into account; we have shown that always
choosing the US-East region does not constitute an optimal strategy. As said
before, it is also important to take the introduced latencies into account, since
these might be considered important for certain workloads.
• Concerning the spot instances, the differences between the regions and the fact
that a noticeable price evolution occurs during the hours of the day have to be
accounted for.
The following choices that influence the cost of running on EC2 will be assumed
to be fixed and will not be taken into account in our heuristics.
• The choice of instance type: our broker assumes that it knows, for a presented
workload, which instance type is the most appropriate. To make the broker
• The previous argument also holds for the choice of OS of the instance. We have
seen that different OSes have different prices on EC2, presumably caused by
varying licensing costs. We will however assume the OS to be known for a certain
workload, so this is no longer a degree of freedom for our heuristic.
• The different availability zones within a region will not be taken into
account; this is not necessary, since the availability zones of a given region
all hold the same prices.
• The long-term price evolution is not considered; for on-demand and reserved
pricing we have shown that, up until now, the prices have not decreased very
often. For spot pricing, however, the price trends will be taken into account.
The proposed model needs constant adaptation to the current trends and
conclusions. This process should be automated as much as possible. Also, the more
price-trend characteristics are taken into account, the more complete the model
becomes and the better our heuristic will help the broker minimize the total cost
for the cloud consumer.
CHAPTER 3
RESOURCE SCHEDULING
This chapter explains how, based upon the findings of the previous chapter, an
intelligent heuristic can be developed that approaches the optimal division
between the On-Demand, Reserved and Spot instance pricing models. An optimal
division in this case should be seen as one that minimizes the total cost for the
customer. The environmental characteristics that are considered valuable for the
heuristic were already presented in chapter 2. This chapter first presents a way
to make an optimal division between Reserved and On-Demand instances. It then
discusses the fact that a workload can be described by different models, each of
which requires an appropriate resource scheduling algorithm, and it concludes
with a discussion of how spot instances fit into the derivation.
3.1 Introduction
Cloud computing introduced a business change in many companies, since it alters
the way departments interact and the way costs are allocated [31]. Cloud
computing provides a layer of abstraction that hides the technical details and
allows companies to focus on aligning supply and demand while efficiently
provisioning infrastructure. IT departments are able to react to the business’
needs more quickly, since they are isolated from the arcane business of buying,
installing and managing IT infrastructure. Automating the resource allocation
process is what enables a customer to benefit from the advantages of cloud
computing to the fullest. The automation also ensures that infrastructure is
managed as efficiently as possible. This automatic resource allocation heuristic
should try to minimize the total cost for the cloud customer. This chapter
combines the capacity management approach that focuses on trend and threshold
models with a focus on the importance of workload analysis, intelligent workload
placement and resource allocation.
• Over-provisioning: the actual demand does not equal the foreseen demand, such
that too many resources get provisioned. This is a problem, since the phenomenon
leads to resources with no, or a low level of, utilization. This way money is
wasted on resources that are not really needed.
Figure 3.1 illustrates these properties and shows that a traditional data center
always encounters one or both of these problems. A traditional solution is not as
flexible in the resource allocation process as a cloud solution. One of the key
characteristics of cloud computing is on-demand accessibility of resources, but
exactly meeting the actual demand with the provisioned amount of resources is
still a rather difficult task. It is, however, possible to use the Auto Scaling
feature on AWS to automatically add on-demand instances when the current demand
requires more resources.
Amazon offers a reserved pricing model, which reintroduces the phenomenon of
over-provisioning at the instance level, since it is possible to have more
reserved instances than the number of instances needed at a particular moment in
time. To achieve the best fit that minimizes the total cost, it is important to
have a clear view of the resource demand over time. In this chapter we assume the
workload to be known for an entire year when determining the resource allocation
scheme. This does not take away that foreseeing/predicting the resource demand is
extremely important
in order to be able to draw some of the conclusions made in this chapter. Further
research should be done regarding patterns in the workload of typical cloud
applications. Monitoring tools should be developed that feed our broker
information about the evolution and trends in resource demand. We restrict the
planning of resource allocation to a one-year period, since it is hard to imagine
what the present value of the resources will be in three years. A customer cannot
foresee how his workload will evolve over such a long period of time, as
technology and a customer’s needs change rapidly.
The example workload requires 87 instance-days in total over the month, and the
maximum number of instances needed on a single day is 4. So the maximum number of
reserved instances needed will also be 4. We now extrapolate the workload,
presented for a period of a month (30 days), to a workload for a year by
multiplying the calculated price by 365/30. The tables in Figure 3.3 present all
the possible divisions between on-demand and reserved instances.
Thus the usage of 3 reserved instances yields the desired result, namely the
solution with the cheapest total cost. We now show that determining the tipping
point, denoted by x, amounts to solving a simple equation. The tipping point
expresses how much time an instance has to be in use for it to be cheaper as a
reserved instance than as an on-demand instance.
The reserved and on-demand instance cost price is equal when x equals 4136.36
instance hours, which is about 172.35 days or 47.22% of a year. If we now apply
this technique to the example, we get the results stated in Table 3.1. It shows,
for each possible number of instances, what percentage of the time the example
workload requires that many instances.
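The break-even point follows directly from solving x ∗ P_O = P_RY + x ∗ P_RH for x. As a sketch, in Java like the broker prototype (the prices are those of the Standard Small Linux example; the class name is ours, not the prototype’s):

```java
// Break-even utilisation between on-demand and reserved pricing.
// Prices are those of the Standard Small Linux example in the text.
public class TippingPoint {
    // Hours after which a reserved instance becomes cheaper:
    // x * onDemand = yearlyFee + x * reservedHourly
    static double breakEvenHours(double onDemand, double reservedHourly, double yearlyFee) {
        return yearlyFee / (onDemand - reservedHourly);
    }

    public static void main(String[] args) {
        double hours = breakEvenHours(0.085, 0.03, 227.5);
        System.out.printf("%.2f hours = %.2f days = %.2f%% of a year%n",
                hours, hours / 24, 100 * hours / (365 * 24));
        // prints: 4136.36 hours = 172.35 days = 47.22% of a year
    }
}
```

With the example prices this yields 4136.36 hours: an instance has to be in use about 47.22% of the year before reserving it pays off.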
We notice that 3 is the biggest number of reserved instances for which our tipping
point value of 47.22% is still reached. This yields an optimal division containing 3
reserved instances. On days that need more than 3 instances simultaneously, these
reserved instances are supplemented with on-demand instances.
Table 3.2 gives an overview of the tipping point values for Linux instances when
instances are reserved for a 1-year period.
Notice that the choice of geographical region influences the optimal division.
Micro instances, for example, have the lowest tipping point value in the Tokyo
region, which means that for this instance type the reserved pricing model is
already preferred over the on-demand pricing model for smaller loads than in
other regions.
Table 3.3 gives an overview for Windows instances for a 1-year period.
We notice that Windows instances have lower tipping point values than the
corresponding Linux instances. The tipping points calculated for instances that
are used during a 3-year period can be found in appendix F.
The choice of using the reserved pricing model has a lot to do with how certain
one is about the expected workload. This yields the interesting research question
of whether it is cheaper to take too few or too many reserved instances. We
derive the conditions under which taking one reserved instance fewer than the
optimal amount is cheaper than taking one more. For our example workload, 3 was
found to be the optimal number.
OPTMIN1 = X ∗ P_O ∗ S + (T − X) ∗ P_RH ∗ S + (OPT − 1) ∗ P_RY
        = 29 ∗ 0.085 ∗ (365 ∗ 24)/30 + (87 − 29) ∗ 0.03 ∗ (365 ∗ 24)/30 + 2 ∗ 227.5
        = 29 ∗ 24.82 + 58 ∗ 8.76 + 455
        = 719.78 + 963.08
        = 1682.86
OPTPLUS1 = Y ∗ P_O ∗ S + (T − Y) ∗ P_RH ∗ S + (OPT + 1) ∗ P_RY
         = 0 ∗ 0.085 ∗ (365 ∗ 24)/30 + (87 − 0) ∗ 0.03 ∗ (365 ∗ 24)/30 + 4 ∗ 227.5
         = 0 ∗ 24.82 + 87 ∗ 8.76 + 910
         = 0 + 1672.12
         = 1672.12
The OPTPLUS1 solution is the better choice, since it is the one with the smallest
total cost. Formally, we would determine whether the difference between OPTMIN1
and OPTPLUS1 is positive or negative to make a choice.
OPTMIN1 − OPTPLUS1 = (X − Y) ∗ P_O ∗ S + (Y − X) ∗ P_RH ∗ S − 2 ∗ P_RY
                   = 10.74
                   > 0
⇒ OPTMIN1 > OPTPLUS1
The tipping point for taking more or fewer reserved instances is the point where
the difference between OPTPLUS1 and OPTMIN1 equals 0.
0 = (X − Y) ∗ P_O ∗ S + (Y − X) ∗ P_RH ∗ S − 2 ∗ P_RY
2 ∗ P_RY = (X − Y) ∗ (P_O ∗ S − P_RH ∗ S)
X − Y = (2 ∗ P_RY) / (P_O ∗ S − P_RH ∗ S)
      = 455 / (24.82 − 8.76)
      = 28.33
When X is 28.33 or more instance-days higher than Y, taking one more reserved
instance is the better choice. Remember that X is the number of on-demand
instance-days needed in a 30-day period when taking one reserved instance fewer,
while Y is the number of on-demand instance-days needed in a 30-day period when
taking one more. The tipping point is almost equal (except for the Micro instance
type) across all instance types and geographical regions, as shown in Table 3.4.
Table 3.4: Tipping point from which taking one more reserved instance (versus
on-demand) is the better choice
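The two alternatives can be recomputed with a few lines of code. In the sketch below, S = (365 ∗ 24)/30 extrapolates the instance-days of the example month to hours per year; the constants follow the worked example above and the class is illustrative only:

```java
// Yearly cost when taking one reserved instance fewer (OPT - 1) or one more
// (OPT + 1) than the optimum; constants follow the worked example in the text.
public class ReservedChoice {
    static final double PO = 0.085, PRH = 0.03, PRY = 227.5; // $/h, $/h, $/yr
    static final double S = 365.0 * 24 / 30; // instance-days per month -> hours per year

    // onDemandDays is X (resp. Y): on-demand instance-days in the 30-day period
    static double cost(double onDemandDays, double totalDays, int reserved) {
        return onDemandDays * PO * S
             + (totalDays - onDemandDays) * PRH * S
             + reserved * PRY;
    }

    public static void main(String[] args) {
        double optMin1  = cost(29, 87, 2); // X = 29, OPT - 1 = 2 reserved
        double optPlus1 = cost(0, 87, 4);  // Y = 0,  OPT + 1 = 4 reserved
        System.out.printf("OPTMIN1 = %.2f, OPTPLUS1 = %.2f%n", optMin1, optPlus1);
        // OPTPLUS1 (1672.12) is cheaper than OPTMIN1 (1682.86)
    }
}
```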
First we investigate the division between Spot and On-Demand instances for our
example that uses Standard Small Linux instances.
The average spot price is cheaper than the on-demand rate, so choosing spot
instances will always be the cheapest solution. Table 3.5 presents the average
spot price expressed as a percentage of the on-demand price for the corresponding
instance. Since all percentages are smaller than 100, spot instances always
constitute the choice with the smallest total cost.
Next we apply the technique to the division between spot and reserved instances.
In this case a reserved instance becomes the cheapest option when x equals 32500
instance hours, which is about 1354.17 days. This exceeds a year, so choosing the
spot instance is always the cheapest option. Table 3.6 shows that this conclusion
holds for every instance-region combination, since the instance always needs to
run longer than 365 days, while only the fixed fee for a one-year reserved period
was taken into account. The ‘Not Available’ (NA)
fields were introduced for the cases where the average spot price was already lower
than the reserved hourly rate, such that the spot choice would always yield the
solution with the smallest cost.
Table 3.6: Reserved prices become cheaper than spot prices after the stated
number of days
1. In the first workload model, a task has a deadline and a certain amount of VM
hours that need to be executed; these work hours can be divided over time such
that they best fit the schedule. This workload model can be compared to the
BOINC Powered Projects [32], which have a deadline (number of days) and a
process time (task length).
2. The second model specifies a task with a deadline and a certain length; for
every work hour it also specifies how many VMs are needed to execute the
corresponding part of the task. This workload model could be compared to a
system that takes batch jobs, where the total load takes on a wave form.
type and operating system of the instance that should be used for the task. It
would be best to benchmark the task to determine the appropriate instance type,
but as discussed this falls outside the scope of this thesis. We can optionally
pin a task to a certain region; this is appropriate when small latencies are
important to the task. When the geographic region field is left blank, our broker
determines where it is cheapest to place the workload. We also specify whether
the task can make use of spot instances, or in other words whether the
application can deal with snapshotting easily.
This task is specified with a deadline, which is the moment in time when the
task has to be finished, and a length, which is the amount of instance hours
the task will take in total.
A task of workload model two is also constrained by a deadline. The length
specified reflects the total width in hours of the task, but each hour can
require multiple instances to complete the corresponding part of the task. These
amounts are specified in a separate workload specification file, which is
referenced in the task specification. The workload generator takes an upper and a
lower boundary and fluctuates the load between these boundaries. Starting from a
random load (within the boundaries) for the first hour of the task, random
amounts of workload are added each consecutive task hour until the upper bound is
reached. From then on random amounts of workload are subtracted until the lower
boundary is reached. We repeat this
process to generate a workload amount for every hour of the task. The values are
adjusted for every available instance type according to the Elastic Compute Units
(ECUs) that Amazon assigned to the corresponding instance type; the ratio of
workload to ECU is thus constant across instance types. These workload values
represent the number of instances required, so these numbers are rounded up to
the next integer in our broker. A snippet from a CSV file containing such a load
description is given in Figure 3.6.
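The generation process described above can be sketched in a few lines of Java; the random step size of one to three instances per hour is our assumption, as the text does not fix it:

```java
import java.util.Random;

// Sketch of the wave-shaped workload generator described in the text: the load
// starts at a random value within [lower, upper] and then moves up (resp. down)
// by random steps until a boundary is hit, producing a wave-formed load.
public class WaveWorkload {
    static int[] generate(int hours, int lower, int upper, long seed) {
        Random rnd = new Random(seed);
        int[] load = new int[hours];
        int current = lower + rnd.nextInt(upper - lower + 1); // random first-hour load
        boolean rising = true;
        for (int h = 0; h < hours; h++) {
            load[h] = current;
            int step = 1 + rnd.nextInt(3); // step size of 1..3 is an assumption
            current += rising ? step : -step;
            if (current >= upper) { current = upper; rising = false; }
            if (current <= lower) { current = lower; rising = true; }
        }
        return load;
    }

    public static void main(String[] args) {
        for (int v : generate(24, 2, 10, 42L)) System.out.print(v + " ");
        System.out.println();
    }
}
```

Scaling the values per instance type by its ECU count, as the text describes, would be a simple multiplication on top of this.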
to the deadline of the task is reached. If not all task hours of the task fit on
a resource, an extra resource is added.
This algorithm yields the following result (illustrated by Figure 3.7 and
Figure 3.8), with a total price of 619.92 US dollars for our example workload to
run on EC2.
To perform the scheduling more intelligently (preferably optimally) for this
workload model, the tasks are first sorted in an earliest-deadline-first manner.
Then for every task we assign the task hours to the instance hours of the
existing resources. We start from the earliest available instance hour of the
resource and process the task hours until all task hours are distributed or the
instance hour corresponding to the deadline of the task is reached. If the
instance hours of all existing instances are filled up until the task’s deadline
and more task hours need to be distributed, an extra resource is added. This
process yields an optimal scheduling solution that provisions the right amount of
reserved instances to reach the smallest total cost possible. How to determine
the boundary (from what amount of VM-hour utilization it is cheaper
tasks.performEDFSort();
for all t in tasks do
for all r in resources do
r.addPartUntilDeadlineOrEndReached(t);
if t.isDistributed() then
break;
end if
if r.isLast() then
resources.addNew();
end if
end for
end for
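This pseudocode could be rendered in compact Java as follows; the Task record and the boolean slot arrays are simplifying assumptions (a deadline of at least one hour is assumed), not the prototype’s actual classes:

```java
import java.util.*;

// Simplified Java rendering of the EDF pseudocode above. A resource is a boolean
// array of hour slots; a task's hours are packed into the earliest free slots
// before its deadline, and a new resource is added when the existing ones are full.
public class EdfScheduler {
    record Task(int deadline, int length) {}

    static List<boolean[]> schedule(List<Task> tasks, int horizon) {
        tasks.sort(Comparator.comparingInt(Task::deadline)); // earliest deadline first
        List<boolean[]> resources = new ArrayList<>();
        resources.add(new boolean[horizon]);
        for (Task t : tasks) {
            int remaining = t.length();
            for (int r = 0; remaining > 0; r++) {
                if (r == resources.size()) resources.add(new boolean[horizon]);
                boolean[] slots = resources.get(r);
                for (int h = 0; h < t.deadline() && remaining > 0; h++) {
                    if (!slots[h]) { slots[h] = true; remaining--; }
                }
            }
        }
        return resources;
    }

    public static void main(String[] args) {
        List<Task> tasks = new ArrayList<>(List.of(new Task(4, 6), new Task(8, 4)));
        System.out.println(schedule(tasks, 8).size() + " resources needed");
        // prints: 2 resources needed
    }
}
```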
This intelligent scheduling algorithm resulted in a solution with a total cost of
516 US dollars (illustrated by Figure 3.9 and Figure 3.10), which amounts to a
decrease of 16.76 percent compared to the naive scheduling method. The more
intelligent solution uses the reserved pricing model for all four resources.
width (in hours) of the task; each hour of the task can need multiple instances
to complete the corresponding part of the task. The task is constrained by a
deadline as well. The basic scheduling technique consists of simply starting
every task at the start date and scheduling all the hours of the task adjacently,
without gaps. When we do this for our example workload, we obtain the following
result, with a total cost price of 37509.08 US dollars (illustrated by
Figure 3.11 and Figure 3.12).
First we developed a way to optimally schedule these tasks. Finding the optimal
solution consists of first calculating all different divisions of the workload
hours over the time available for that task; this equals the number of
combinations of x out of y, with y the total number of time slots available
(between the start and deadline time) and x the number of time slots of the task.
The combination formula is given in Figure 3.14.
C(y, x) = y! / (x! ∗ (y − x)!)
Because not all possible combinations can be stored in memory at once, a
combination is only generated at the moment it is used. This is made possible by
a combination generator class that is initialized with x and y and can be used to
iterate over all possible combinations. Determining the optimal task schedule
then reduces to selecting the optimal solution among the schedules: we calculate
the cost of each schedule and pick the cheapest one. We came to the conclusion,
however, that the time complexity of this algorithm is too high. This is caused
by the fact that all possible combinations are tried, with the only constraints
that the task hours are scheduled in order and that the schedule does not exceed
the deadline. The technique becomes infeasible when the difference between the
task length and the time between the start and deadline of the task is large.
This leads to many possibilities to schedule the task, especially since the
generation of one single schedule is already time-consuming.
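Such a lazy generator can be sketched as an iterator that keeps only the current combination in memory and advances it in lexicographic order (illustrative code, not the prototype’s class):

```java
import java.util.Arrays;

// Sketch of a lazy combination generator: iterates over all ways to choose
// x slot indices out of y without materialising every combination in memory.
public class CombinationGenerator {
    private final int x, y;
    private int[] current; // null once the iteration is exhausted

    CombinationGenerator(int x, int y) {
        this.x = x; this.y = y;
        current = new int[x];
        for (int i = 0; i < x; i++) current[i] = i; // first combination: 0,1,...,x-1
    }

    boolean hasNext() { return current != null; }

    int[] next() {
        int[] result = current.clone();
        // advance to the next combination in lexicographic order
        int i = x - 1;
        while (i >= 0 && current[i] == y - x + i) i--;
        if (i < 0) current = null; // last combination emitted
        else {
            current[i]++;
            for (int j = i + 1; j < x; j++) current[j] = current[j - 1] + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        CombinationGenerator gen = new CombinationGenerator(2, 4);
        int count = 0;
        while (gen.hasNext()) { System.out.println(Arrays.toString(gen.next())); count++; }
        System.out.println(count + " combinations"); // C(4, 2) = 6
    }
}
```

Only one array of size x lives in memory at a time, which is exactly what makes the exhaustive search feasible at all for small intervals.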
We had to settle for a suboptimal solution that uses the above algorithm on
smaller intervals: we divide the time into intervals and distribute the workload
evenly among them. The size of an interval is determined by looking at the number
of combinations that would be possible within it and shrinking the intervals
until a given threshold value is reached. This threshold can be made larger on a
system with more CPU and memory resources available, or smaller when a quicker
scheduling result is desired. The solution within a bucket/interval is obtained
by minimizing the number of instances (resources) needed, using the
straightforward technique of trying all possible solutions; the concatenation of
the bucket schedules then yields the total schedule. The fact that we minimize
the number of resources needed already makes the solution suboptimal in terms of
finding the lowest total cost, since this requirement yields an even distribution
that is not necessarily the cheapest solution. We are now dealing with a small
time frame, which makes it difficult to select the solution that would use the
highest number of reserved instances. Note that the allocation of pricing models
to the different instances of the schedule happens post-mortem, i.e. after the
scheduling is completed. The allocation phase is explained in detail in
section 4.4. To enable the selection of the best solution we would have to
determine whether all buckets together would reach the reserved tipping point for
a certain resource, but this would take away the advantage of using intervals
(namely not having to keep all possible combinations in memory simultaneously).
We prefer an option that only makes use of information within the bucket itself,
since this simplifies the solution and facilitates making the algorithm
recursive. We also tried a scheduler that extrapolates the number of hours an
instance was needed in the interval to the bigger time frame (to divide between
reserved and on-demand that way), but this did not give better results than the
scheduler that minimizes the number of instances needed in each interval. The
described algorithm is given in pseudocode here.
Using this scheduling algorithm we obtain a result with a total price of 34531.56
US dollars (illustrated by Figure 3.15 and Figure 3.16), which corresponds to a
price decrease of 7.94 percent of the total cost compared to the basic scheduling
result. Only 27 instances are required now, instead of the 39 instances needed at
one point in the basic version. Thirteen of these instances are reserved, while
the basic scheduling version used only 11 reserved instances.
3.5.1 Checkpointing
Spot instances get terminated and are no longer available when the customer’s bid
no longer exceeds the current spot price. Because of this, it is a good idea to
take snapshots of the work that is already done. The following two possible
checkpointing schemes are considered:
Adding more checkpointing schemes to our model is not too difficult, since an
implementation simulating a number of checkpointing techniques can be found in
the source code that accompanies the paper “Reducing Costs of Spot Instances via
Checkpointing in the Amazon Elastic Compute Cloud” [33]. The work of Kondo shows
that hourly checkpointing reduces the costs significantly in the presence of
failures and that other schemes do not perform better. Only for a small set of
instance types was an edge-driven checkpointing technique found to result in a
smaller total cost. For more in-depth information, see the paper [33].
3.5.2 SpotModel
The proposed Decision Model has the following random variables (see Table 3.17),
on which constraints can be placed in the implementation.
The example in Figure 3.18 shows what the different variables mean in practice.
We notice that a failure occurs when the spot price exceeds the bid price, and
that checkpointing and restarting time is accounted for in the spot instance
hours adjacent to this gap.
The decision model performs simulations that use real price traces of Amazon EC2,
but it selects a random starting point in this price history from which to start
scheduling the task. Because of the number of simulations involved, the result
gives a realistic expected cost price. The output of the program can be used to
make an intelligent bid for the spot price (one that ensures the task finishes
before the given deadline and within the foreseen budget). To find the optimal
bid price, we determine whether there exist combinations of the model’s
parameters that are feasible; in other words, we select all possibilities that
result in the task meeting its time constraints. This is done by look-ups in
tables of previously computed distributions, such that the processing effort is
negligible. Among the feasible cases, we select the one with the smallest cost.
If no feasible cases exist, the job cannot be performed under the desired
constraints.
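The core simulation idea can be illustrated with a hedged sketch. This is not the Decision Model’s actual implementation: checkpointing and restart overhead are ignored, and an hour simply counts as lost whenever the spot price exceeds the bid.

```java
import java.util.Random;

// Illustrative Monte-Carlo sketch: pick random starting points in a spot price
// trace, run a task of `length` work hours at a given bid, and count an hour as
// lost whenever the spot price exceeds the bid. Returns the average wall-clock
// hours needed to finish, over `runs` simulations.
public class SpotSimulation {
    static double expectedHours(double[] trace, double bid, int length, int runs, long seed) {
        Random rnd = new Random(seed);
        long total = 0;
        for (int run = 0; run < runs; run++) {
            int t = rnd.nextInt(trace.length); // random starting point in the history
            int done = 0, elapsed = 0;
            while (done < length) {
                if (trace[t % trace.length] <= bid) done++; // an hour of useful work
                t++; elapsed++;
            }
            total += elapsed;
        }
        return (double) total / runs;
    }

    public static void main(String[] args) {
        double[] trace = {0.031, 0.030, 0.045, 0.032, 0.030, 0.050}; // toy trace
        System.out.println(expectedHours(trace, 0.035, 10, 1000, 7L));
    }
}
```

Sweeping the bid over a range and recording the expected hours gives exactly the kind of execution-time-versus-bid curve discussed below.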
Using the results of the example run of the decision model implementation in the
paper, the graph in Figure 3.20 was created. It shows, for a certain bid price
and confidence level (p) of meeting the given constraints, the expected execution
time. We notice that when the bid price becomes too low, the execution time
increases significantly, and also that an increasing level of confidence yields a
larger execution time. One of the findings of the paper was that bidding a low
price yields cost savings of about 10%, but can lengthen the execution time
significantly.
Figure 3.20: Decision Model [1]: execution time - bid price - confidence level graph
We also changed the input file format for the Decision Model. Previously it took
one data.csv file as input, containing the spot price data for every instance
type in different columns. This file did not contain any date information; every
record contained the price for one minute of time. The application now takes the
CSV files from cloudexchange.org as input, which means there is a separate file
for every region-OS-instance combination with two columns: the first contains
the date, the second the corresponding spot price for that instance at that time.
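Reading such a two-column file could be sketched as follows; the exact timestamp format and the class name are assumptions on our part:

```java
import java.util.*;

// Sketch of parsing the cloudexchange.org-style CSV described above: one line
// per price record, first column a timestamp, second column the spot price.
// Insertion order is preserved so the trace stays chronological.
public class SpotPriceCsv {
    static LinkedHashMap<String, Double> parse(List<String> lines) {
        LinkedHashMap<String, Double> prices = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) continue;           // skip empty lines
            String[] cols = line.split(",", 2);     // timestamp, price
            prices.put(cols[0].trim(), Double.parseDouble(cols[1].trim()));
        }
        return prices;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2011-02-21 06:11:42,0.031",
                "2011-02-21 09:01:10,0.030");
        System.out.println(parse(lines));
    }
}
```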
3.6 Conclusion
This chapter showed how to make the optimal division between reserved and
on-demand instances, as illustrated by Figure 3.21. The basic level of workload
that requires reserved instances to minimize the cost price is easily determined
for a given combination of instance type, operating system and geographical
region. It involves solving a simple equation to determine the tipping point for
the particular situation and comparing the actual use with this tipping point to
decide whether or not to use the reserved pricing model.
Whether taking too many or too few reserved instances is the better option (in
terms of total cost) was also investigated; it was found to depend on the
characteristics of the workload involved.
We then introduced the two workload models that are used during the development
of the broker prototype. The first one describes a task by the total amount of VM
hours that need to be executed and a deadline by which the task has to be
finished. The second workload model describes a task by its length, which in this
case equals the total width (in hours) of the task; every hour of the task can,
however, still require multiple instances to complete the corresponding part of
the task. We developed algorithms that schedule tasks of both models in a way
that tries to minimize the total cost. These can be used to determine, starting
from a number of constrained workload descriptions, a resource schedule that
satisfies all deadline constraints and divides the load as equally as possible
over time. Once the schedule is made, we can determine which pricing model is
most appropriate for each resource involved. To make the division between
reserved and on-demand instances, the tipping point values determined earlier in
this chapter (see section 3.2.1) can be used.
To introduce spot instances in our heuristic, we use the port of the ‘SpotModel’
application that accompanies the paper “Decision Model for Cloud Computing under
SLA Constraints” [1]. It provides a way to determine the bid price that minimizes
the total cost for a certain instance type and a task of a given length in the
spot market, while meeting deadline and budget constraints with a certain
confidence percentage. Details about the integration of this software in our
broker can be found in section 4.4.2.
CHAPTER 4
BROKER DESIGN
This chapter introduces the developed prototype of the broker and its underlying
heuristics and algorithms. The broker is a tool1 that tries to optimally map the
workload of a consumer onto the different pricing models offered by Amazon EC2,
namely on-demand, reserved and spot instances. The consumer’s jobs can be bound
by a set of quality-of-service constraints, such as a deadline by which the task
needs to finish. The previous chapters showed that the potential for cost
reductions through intelligent instance allocation is huge, since the spot prices
are almost always a lot cheaper than the on-demand prices. The different
algorithms and broker components involved in reaching an intelligent schedule
from an input of constrained workloads are discussed within this chapter. The
evaluation of the underlying heuristics of the broker is done in the next chapter
(see chapter 5).
4.1 Introduction
The cloud computing market is relatively new and rapidly changing; the spot
market, for example, was only introduced in EC2 in December 2009. EC2 now offers
three pricing models, namely on-demand, reserved and spot pricing. It does not,
however, offer its consumers any tools to optimally map workloads and
corresponding QoS requirements, such as a deadline by which a workload needs to
be finished, onto these pricing models. This chapter describes the broker
prototype, which implements the proposed heuristics in order to map a number of
constrained workloads onto an intelligent resource allocation schedule that tries
to maximize the cost reductions for the consumer while still meeting the given
deadlines.
1
The Java prototype of our broker can be found on-line, see appendix G for download
instructions.
Our broker consists of four easily separable, equally important tasks, which are
schematically shown in Figure 4.1 and briefly explained below.
• Input The first component of the broker prototype is labeled as the input of
the broker. It consists of a task generation and specification component that is
used to specify (and generate) tasks that utilize the previously defined workload
schemes, see 3.3.1. A task’s characteristics and the corresponding workload are
specified in CSV files using a predefined format. Another component of the broker
is the part that provides and analyzes the prices of the different instances
(which are identified by the combination of an operating system, an instance type
and a geographical region) across the existing pricing models. The on-demand,
reserved and spot prices cannot all be accessed through the EC2 API2 , so we need
to provide our own component that supplies this information to our broker. The
spot pricing component needs to track the price history and analyze the trends
that occur, as explained in the environmental analysis chapter (see chapter 2).
• Scheduling The next step in the brokering process is the scheduling of the
tasks (specified by the input component) across the different geographical
regions. The scheduling is performed in a way that tries to minimize the
cost for the consumer as much as possible. A second function of this part
of the broker is to spread the workloads over time, while taking into account
the start and deadline constraints of the workloads. Spreading the load in a
well-balanced way (meaning in an evenly distributed form) ensures that the
resources will be in use as much of the time as possible, which is necessary
to be able to use the reserved pricing model, which is cheaper than on-demand
pricing, as much as possible.
• Resource Allocation The third component determines which pricing model
should be used for a certain resource, given the load that was scheduled on this
resource in the previous step of the brokering process. The resource allocation
algorithms incorporate the conclusions that are based on the analysis of the
pricing history in the environmental analysis chapter (see chapter 2). This
component uses the pricing input to determine the tipping point in terms
of resource utilization percentage. These tipping points reflect from what
amount of resource utilization it is better to use the reserved pricing model
rather than the on-demand one for a certain instance, see section 3.2.1. The
determination about the usage of the spot pricing model depends on whether
the tasks scheduled on a resource are allowed to run on spot instances. If all
tasks are spot-enabled, the spot pricing model is used. If it’s a combination
of spot-enabled tasks and tasks that do not allow the spot pricing model, the
tipping point for reserved versus a combination of spot and on-demand is
calculated. The underlying algorithms are discussed in section 4.4.
• Output The last step of the brokering process consists of presenting the
calculated schedule and associated costs to the consumer. A graphical
representation, which is presented in the form of a Gantt chart, and a textual
representation of the schedule will be provided by our application. Both
representations are accompanied by a detailed overview of the costs associated
with the proposed schedule. The cost is the metric to measure the performance
of our broker, since our goal is to minimize the total cost for the consumer.
These different components of the proposed broker prototype are presented in this
chapter, their software designs are briefly discussed and the algorithms involved are
explained.
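The per-resource pricing model decision made in the resource allocation step can be sketched as follows. This is only an illustration: the class, method and parameter names are our own assumptions, not the actual API of the prototype.

```java
// Sketch of the resource allocation decision described above (assumed names).
public class AllocationDecision {
    public enum Model { RESERVED, ON_DEMAND, SPOT, SPOT_AND_ON_DEMAND }

    // allSpotEnabled: every task scheduled on the resource may use spot instances
    // anySpotEnabled: at least one task may use spot instances
    // utilization: fraction of the scheduling period the resource is loaded (0..1)
    // tippingPoint: utilization above which reserved pricing beats the alternatives
    public static Model choose(boolean allSpotEnabled, boolean anySpotEnabled,
                               double utilization, double tippingPoint) {
        if (allSpotEnabled) {
            // spot was found to be the cheapest model on average; the broker
            // falls back to on-demand when the spot price crosses it
            return Model.SPOT;
        }
        if (utilization >= tippingPoint) {
            return Model.RESERVED;
        }
        // below the tipping point: spot-enabled parts use spot, the rest on-demand
        return anySpotEnabled ? Model.SPOT_AND_ON_DEMAND : Model.ON_DEMAND;
    }
}
```

The tipping point itself is derived from the pricing input, as discussed in section 4.4.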
4.2 Broker Input
The task’s specification input file (see Figure 4.2) contains the name and a
description of the task. The file provides information about the associated workload
and the deadline of the task as well. The task specification includes information
about the EC2 instances on which the workload should run: the instance type, the
operating system and whether the spot pricing model may be used for the workload is
specified. The region in which to run the task can be specified as well; this is
however optional, since cost reductions can be achieved by automatically determining
the cheapest location. Sometimes it is nevertheless important to run a certain
application in a specific geographical region, for example when the users of the
application are situated in the same region and small latencies are desirable, or for
legal reasons. Whether a certain region is required by a workload is hard to decide,
since it depends on a large number of workload characteristics. The decision whether
the choice of region is free is left to the user of our broker for now.
Figure 4.3 gives an overview of the design of the task specification input component
of the broker.
The output of the broker input step is a TaskCollection, that is passed to the
scheduling component. The tasks in the list of the collection are read from task
specification CSV files, and each Task contains a task specification and a workload.
The workload is represented by an object corresponding to the workload model
of the task. A workload of the first model is specified by a list of SubTasks,
while a workload of the second type is specified by a list of SubTaskCollections,
and a SubTaskCollection is a list of SubTasks. This way, we can represent the
appropriate amount of workload for every hour of the task. The TaskSpecification
consists of a name and description for the task, an earliest start date and a deadline,
whether the task is allowed to run on spot instances and a description of the instance
on which the task can be performed. The InstanceSpecification consists of the
operating system, the instance type and the geographical region of the instance that
is required for the task.
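The data model just described can be sketched in Java as follows. The class and field names follow the text (TaskSpecification, InstanceSpecification), but the constructors and field types are our own assumptions, not the prototype's actual code.

```java
// Minimal sketch of the broker input data model described above (assumed details).
public class TaskModel {
    public static class InstanceSpecification {
        public final String operatingSystem;
        public final String instanceType;
        public final String region; // null when the broker may pick the cheapest region

        public InstanceSpecification(String operatingSystem, String instanceType, String region) {
            this.operatingSystem = operatingSystem;
            this.instanceType = instanceType;
            this.region = region;
        }
    }

    public static class TaskSpecification {
        public final String name;
        public final String description;
        public final long earliestStartHour; // hours since the start of the schedule
        public final long deadlineHour;
        public final boolean spotEnabled;    // may the task run on spot instances?
        public final InstanceSpecification instance;

        public TaskSpecification(String name, String description, long earliestStartHour,
                                 long deadlineHour, boolean spotEnabled,
                                 InstanceSpecification instance) {
            this.name = name;
            this.description = description;
            this.earliestStartHour = earliestStartHour;
            this.deadlineHour = deadlineHour;
            this.spotEnabled = spotEnabled;
            this.instance = instance;
        }
    }
}
```

A Task would then combine such a specification with a workload object of the appropriate model, and a TaskCollection would hold the list of tasks read from the CSV files.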
The broker introduced in this section has a couple of limitations, since
we require the user to specify certain properties manually in the task specifications.
The properties include the instance type and the operating system of the instance
that is most appropriate for the task, and whether spot instances can be used for the
task. These characteristics should ideally be determined automatically as well; this
however needs further research. The first part of the broker would then iterate over
the tasks that have to be scheduled in order to determine the different characteristics
of the given tasks.
for all task in providedTasks do
task.determineAppropriateOperatingSystem();
task.determineAppropriateInstanceType();
task.determineWhetherSpotEnabled();
end for
One of the limitations of the broker prototype is that whether a certain task can
be run on a spot instance has to be specified by the user of our broker. Whether
a task is spot-enabled depends on the ability of the application to handle frequent
gaps in the availability of the resource. These gaps are dealt with by using a
checkpointing scheme that takes snapshots at the appropriate moments in time (for
example at the end of every hour) and recovers from an outage in the next available
instance hour using the snapshot taken earlier; these schemes were discussed before
in section 3.5.1. Certain applications are able to deal with these situations better
and can thus efficiently use the spot instance market to reduce the cost for the
consumer.
The determination of the appropriate instance type for a given workload can
be automated too. The instance type for a certain workload can be determined
by benchmarking the workload in a virtualized Xen environment, since it is such
an environment that EC2 instances are hosted on. The benchmark would
measure certain characteristics of the workload, such as its CPU, disk and memory
requirements. With the acquired information a mapping can be made to the instance
types that are offered by EC2. My research project [35], entitled “Instrumentation
of Xen VMs for efficient VM scheduling and capacity planning in hybrid clouds”,
enables the possibility to monitor a Xen environment that is loaded with a certain
task and could be extended by a set of metrics that are valuable for the mapping
of a virtual machine (VM), running a certain task/workload, to the virtual machine
profiles of EC2.
The on-demand and reserved prices are provided through an input CSV file, since
these values can’t easily be acquired by an EC2 API call. The only way to acquire
the prices through the API is by launching an instance of the corresponding type and
querying for the cost of the instance hour. This is why the choice was made to
manually input the prices in the broker component. The required information can
be found in the EC2 pricing section on the AWS website [27].
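Such a manually maintained price input component can be sketched as below. The CSV column layout used here (operating system, instance type, region, on-demand hourly price, reserved hourly price, reserved one-time fee) is our own assumption, not the prototype's actual format.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a lookup table filled from a manually maintained price CSV file.
public class PriceTable {
    private final Map<String, double[]> prices = new HashMap<>();

    // Expected line format (assumed): os,instanceType,region,onDemand,resHourly,resFixed
    public void addLine(String csvLine) {
        String[] f = csvLine.split(",");
        String key = f[0] + "/" + f[1] + "/" + f[2]; // identifies one instance kind
        prices.put(key, new double[] {
            Double.parseDouble(f[3]),  // on-demand hourly price
            Double.parseDouble(f[4]),  // reserved hourly price
            Double.parseDouble(f[5])   // reserved one-time fixed price
        });
    }

    public double onDemand(String os, String type, String region) {
        return prices.get(os + "/" + type + "/" + region)[0];
    }

    public double reservedHourly(String os, String type, String region) {
        return prices.get(os + "/" + type + "/" + region)[1];
    }

    public double reservedFixed(String os, String type, String region) {
        return prices.get(os + "/" + type + "/" + region)[2];
    }
}
```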
Spot pricing information is provided by the SpotWatch tool; it retrieves the new
spot prices daily (through an API call to the EC2 web service) and adds them to its
history database. The SpotWatch application can be used to access the spot price
history and to acquire certain statistical properties of the price traces, see section
2.3.4.
4.3 Broker Scheduling
Figure 4.5 shows that the scheduling component uses the tasks with their correspond-
ing deadline constraints together with the EC2 pricing information as input. The
scheduling component then divides all workloads across the different geographical
regions of EC2, namely US-East, US-West, EU-West, APAC-Tokyo and APAC-
Singapore. This section explains how this distribution is made, such that the
goal of minimizing the cost is achieved. The workload distribution part of the
broker’s scheduling component then distributes the load evenly over time. An even
distribution maximizes the utilization rate of the involved resources in order to
maximize the usage of the reserved pricing model, which was shown to be cheaper
than the on-demand one once a certain utilization tipping point is reached (see
section 3.2.1). The scheduling algorithms for the different workload models are
already presented in the previous chapter, see section 3.4. These algorithms are
however improved upon here, so that workloads that are allowed to utilize
spot instances are supported too. The output of the scheduling component of the
broker is an in-memory schedule of the load per instance (identified by a certain
operating system, instance type and geographical region).
for all task in tasks do
if task.hasRegionAssigned() then
// keep the region pinned by the user
else
instanceDescription=task.getInstanceDescription();
if task.isSpotEnabled() then
cheapestRegion=determineSpotCheapestRegion(instanceDescription,
task.getTimePeriod());
else
cheapestRegion=determineCheapestRegion(instanceDescription);
end if
task.setRegion(cheapestRegion);
end if
end for
The algorithm iterates over all tasks and assigns each of them to a geographical
region, when this was not yet done by the user. Our users can pin a task to a region
when the task requires the application to be run at a certain location. This can be
important when low latencies (to reach an external service in a certain region) are
required by the application. When a region is not yet assigned to a task, we determine
the cheapest one purely based on the pricing and do not take into account the load
earlier appointed to the different regions. Note that further price reductions can be
reached by first scheduling the tasks that are already pinpointed to a certain region
by the user. The other tasks can afterwards be appointed to a region in a way that
minimizes the total cost, taking into account the existing schedules. The decision
in our broker is only based on the pricing in the different regions, which keeps the
broker’s strategy straightforward and reduces the computational complexity of the
schedule. There is a trade-off between broker complexity and performance.
The algorithm used to determine the ranking of the regions for on-demand and
reserved pricing for a specific instance (of which we know the instance type and
the operating system) is discussed now. The on-demand and reserved prices of the
instance are separately ranked from low to high. If these rankings select the same
geographical region as the cheapest one for the corresponding instance specification,
the broker will assign tasks to this region. For Linux instances we conclude that this
is always the case, Figure 4.6 demonstrates this for the Standard Small instances.
Note that the regions are numbered from the cheapest to the most expensive one,
where consecutive prices may be equal. For Standard
Small Windows instances the region ranking was not the same for the on-demand
and the reserved pricing model, but both rankings indicated that the US-East region
has the cheapest price (see Figure 4.7). There is no problem in selecting the cheapest
region here, but if the rankings diverge further, problems could arise. In case this
occurs we could introduce a parameter that indicates how much the on-demand and
the reserved price influence the decision. We could for example let the on-demand
price count for x% (and the reserved one for (100-x)%), with x being the average
percentage of the total number of instance hours a not-spot-enabled task runs using
the on-demand pricing model (based on empirical data). This technique ensures
that the typical usage of on-demand versus reserved prices is taken into account
when a geographical region is chosen.
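The proposed weighting can be written as a simple scoring function per region; the region with the lowest score would be chosen. The parameter x would have to be estimated from empirical data, as described above; here it is just an argument.

```java
// Sketch of the proposed weighted region score: the on-demand price counts
// for x% and the reserved price for (100-x)%.
public class RegionScore {
    public static double score(double onDemandPrice, double reservedPrice, double x) {
        return (x / 100.0) * onDemandPrice + ((100.0 - x) / 100.0) * reservedPrice;
    }
}
```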
For the determination of the ranking of the different regions for a spot-enabled task,
the average Q3 percentile value (during the period in which we are scheduling the
task) is used. The period in which the average Q3 value is examined is defined as
the period from the task’s start date until its deadline. If a broker is developed that
does price predictions, the coming spot prices are not known and the average Q3
value over the course of the last month could for example be used. For the Standard
Small instances the ranking that is found is shown in Figure 4.8 (for March 2011).
Note that the reserved instances give the same ranking for the geographical regions
as the ranking based on the spot prices. When there are instance hours left on a
reserved instance this unused capacity will be used by spot-enabled tasks as well. If
there are no free instance hours left, the cheapest solution is to use the spot pricing
model (instead of the on-demand one) (see the environmental analysis chapter 2).
A check is however performed during the allocation phase to see whether the spot
price crosses the on-demand price at the corresponding moment in time. If this is
the case, the on-demand pricing model is used.
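The Q3 (third quartile, or 75th percentile) statistic used for this ranking can be computed as shown below. The linear interpolation method is our own choice for illustration; the actual SpotWatch implementation may use a different percentile convention.

```java
import java.util.Arrays;

// Sketch of the Q3 (75th percentile) statistic over a spot price trace.
public class SpotStats {
    public static double q3(double[] prices) {
        double[] sorted = prices.clone();
        Arrays.sort(sorted);
        // linearly interpolated position of the 75th percentile
        double pos = 0.75 * (sorted.length - 1);
        int lo = (int) Math.floor(pos), hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        return sorted[lo] * (1 - frac) + sorted[hi] * frac;
    }
}
```

Averaging this value over the task's scheduling window per region then yields the ranking described above.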
The workload distribution step of the scheduling component spreads the
provided workloads as equally as possible over time, while still meeting the deadline
constraints of all the tasks involved. This section explains how to take spot-enabled
tasks into account in the scheduling algorithms of the broker.
Workload Model 1 (total VM hours needed is specified)
In the first workload model we handle the spot-enabled tasks separately, these
tasks are added to the schedule after the other tasks have been scheduled
already.
notSpotEnabledTasks=tasks.subsetNotSpotEnabled();
spotEnabledTasks=tasks.subsetSpotEnabled();
performSchedulingWLM1(notSpotEnabledTasks);
performSchedulingWLM1(spotEnabledTasks);
This is done by calling the algorithm presented in the scheduling section 3.4.1 twice
(the function is called performSchedulingWLM1 in algorithm 4.3.2): first
for the not-spot-enabled subset of the tasks and afterwards for the spot-enabled
ones. Scheduling the tasks in this order will cause a larger amount of gaps for the
spot-enabled tasks, which these tasks should be able to cope with better. It also
ensures that as much of the not-spot-enabled workloads as possible will be co-located
on the first resources in the list of the schedule. These resources have the highest
chance of reaching the reserved pricing tipping point, since the highest amount of
workload is scheduled on them. This way, as much of the not-spot-enabled tasks as
possible will be located on the resources that will be assigned to the reserved pricing
model in the next step of the brokering process. Not-spot-enabled task parts need to
use the on-demand pricing model when they are scheduled on a resource that does
not reach the reserved tipping point, since these are tagged as not suited for spot
instances. The on-demand prices have been shown to be more expensive than the
corresponding reserved prices, thus it is important to schedule the not-spot-enabled
tasks first in order to achieve the cheapest result possible.
Workload Model 2 (every hour #VMs needed is specified)
In the second workload model the spot instance possibility needs to be accounted
for too, here a little adjustment to the previously introduced algorithm 3.4.2 is
made.
for all t in tasks do
buckets.divideEqually(t);
end for
for all b in buckets do
//try all combinations, choose the one
//that minimizes the number
//of needed instances
b.makePlanningWithoutSpotEnabledTasks();
b.addSpotEnabledTasksToPlanning();
end for
In every bucket we want the spot-enabled tasks to take up as few reserved slots in
the final schedule as possible. Since we focus on putting as heavy a load as possible
on the first resources of the schedule, it is better to handle the spot-enabled tasks
after the other ones are already scheduled on as few resources as possible. Note
that when we add the spot-enabled tasks to the schedule, we again try to keep
the total number of resources required as small as possible.
4.4 Resource Allocation
Figure 4.9 shows that the resource allocation component of the broker uses the
task schedule made for every instance required by one of the provided tasks as
input. An instance is specified by an operating system, an instance type and a
geographical region. The resource allocation algorithm makes a lot of choices based
on the conclusions of the analysis on the pricing history in the environmental analysis
chapter (see chapter 2). The algorithm determines whether a certain resource of
the schedule (determined in the previous brokering step) reaches the tipping point
utilization rate at which it is better (in other words cheaper) to use the reserved
pricing model rather than the on-demand one. The output of this component of
the broker is an in-memory representation of the instance specific schedule it got as
input annotated with cost information.
• Only spot-enabled tasks are scheduled on the resource. In this case all the
task parts will be using the spot pricing model, since this was found to be
the cheapest model on average. The SpotModel (see section 4.4.2) is used to
determine how the subtasks assigned to the resource will be scheduled in time
using the bid price received from the spot model. When the spot price crosses
the on-demand price, the corresponding task hours use the on-demand pricing
model.
The equation to check when taking a reserved instance is the preferred choice
then becomes:

cost reserved instance ≤ cost on-demand instances + cost spot instances
fixedResPrice + (x + y) · hourlyResPrice ≤ y · hourlyOnDemandPrice + x · avgHourlySpotPrice
(x + y) · hourlyResPrice ≤ y · hourlyOnDemandPrice + x · avgHourlySpotPrice − fixedResPrice

where x is the number of spot-enabled task hours and y the number of
not-spot-enabled task hours scheduled on the resource.
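This inequality translates directly into code. The variable names mirror the equation; the reading of x as the spot-enabled task hours and y as the remaining ones follows from the price terms, as a sketch rather than the prototype's actual implementation.

```java
// Sketch of the reserved-versus-mixed tipping point check described above.
public class TippingPoint {
    public static boolean reservedIsCheaper(double x, double y,
            double hourlyResPrice, double fixedResPrice,
            double hourlyOnDemandPrice, double avgHourlySpotPrice) {
        // left-hand side: one-time fee plus hourly reserved rate for all hours
        double reservedCost = fixedResPrice + (x + y) * hourlyResPrice;
        // right-hand side: on-demand for the not-spot-enabled hours, spot for the rest
        double mixedCost = y * hourlyOnDemandPrice + x * avgHourlySpotPrice;
        return reservedCost <= mixedCost;
    }
}
```

For a heavily loaded resource the fixed reservation fee is amortized over many hours and the check succeeds; for a lightly loaded one it fails and the cheaper mix of on-demand and spot hours is kept.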
4.5 Broker Output
The graphical representation of the schedule takes the form of a Gantt chart [36].
A Gantt chart is a type of bar chart that generally illustrates a project schedule.
Its elements have a start and finish date, and sometimes the dependencies between
the components are indicated as well. The domain axis represents time. A vertical
line is drawn to show which components should already be finished according to
the schedule. This type of graph was found to be appropriate to represent the
information of our schedule. The JFreeChart library provided an easy way to present
the information in such a chart, and it makes it possible to zoom in on parts of the
schedule, as shown in Figure 4.10. Note that all subtasks of a certain task are
assigned the same unique color.
This cost overview is important for the evaluation of the performance of the proposed
broker, which is explained in the next chapter.
4.6 Conclusion
This chapter presented a design of the developed broker prototype. The broker maps
the consumer’s workloads and QoS requirements, such as the deadline by which a
workload needs to finish, to the different pricing plans offered by EC2. The potential
for cost reductions through this intelligent instance allocation scheme is large; the
spot prices, for example, are on average a lot lower than the on-demand prices. The
heuristics given in this and previous chapters all try to realize the optimization goal
of making the schedule of the workloads as cheap as possible for the consumer.
The broker’s working process is divided into four different components, an overview
of this is given by Figure 4.13.
A number of items are taken into account in the developed broker prototype,
since these were found to considerably influence the total cost of running the
customer’s workload on EC2. The geographical region is chosen automatically, in
order to minimize the cost. This choice however influences the quality of service,
in terms of network latency, of the application running on EC2. This requires the
possibility for the user to assign a task to a certain region (when high latencies are
not desirable). It is the user’s responsibility to weigh this advantage against the
possible cost reductions when the choice is left to the broker. A division between
the different pricing models is made by the broker too: a pricing model, namely
on-demand, reserved or spot, is allocated for every resource of the created schedule.
The fact that the spot price history showed certain price trends, such as cheaper
prices during weekends, is taken into account.
The operating system and instance type that should be used to process a certain
task were assumed to be provided by the user in this prototype. Benchmarking a
workload in order to acquire a number of resource utilization characteristics enables
the automation of the mapping of the workload to an instance type.
CHAPTER 5
BROKER EVALUATION
This chapter evaluates the performance of the proposed broker and the underlying
algorithms. The cost savings achieved by the introduction of the broker are
evaluated. The benchmarking of the broker does not only indicate the reachable
cost reduction, but also measures the scalability of the proposed heuristics.
5.1 Introduction
To evaluate the working of the broker model and implementation (see chapter 4),
the prototype needs to be benchmarked. The benchmark provides access to the
data required to draw conclusions concerning the cost reduction and scalability of
the broker prototype.
To be able to make a comparison between the achieved cost savings, the broker
is provided a certain workload using different scheduling and resource allocation
options. Remember that the scheduling part distributes the different task hours
across a number of required resources in time. The resource allocation process on
the other hand determines the appropriate pricing model for a resource on which
tasks are scheduled. In section 3.4 of the ‘Resource Scheduling’ chapter, different
versions of the scheduling algorithms for both workload models (see section 3.3) were
presented:
• Basic scheduling is the naive reference method; details about the underlying
algorithms, including the distinction between the two workload models, can
be consulted in section 3.4.
• Optimized scheduling is the method in which the task hours are distributed as
equally as possible over time, in order to get resources that are loaded as much
of the time as possible. This enables a larger number of the resources to use
the reserved pricing model, which is cheaper than using on-demand instances
once a certain utilization tipping point is reached. More details about the
distinction between the algorithms for both workload models can be found in
section 3.4.
• On-Demand & Reserved resource allocation means that for every resource
involved in the brokering process, it is checked whether the reserved tipping
point is reached. In other words, when using the reserved pricing model yields
a lower total cost than the on-demand pricing model, the reserved one is used.
• Spot enabled resource allocation is the scheme that takes spot instances into
account, it uses the spot model implementation to determine an appropriate
bid that results in an allocation decision according to the rules described in
section 4.4.2.
• Optimal Spot resource allocation is the scheme that does not use the
spot model; instead it determines the optimal spot instance hour allocations
according to the actual spot history prices. The algorithm uses the fact that
the history of the spot prices is known during the time period in which the
broker is creating a schedule, such that the task hours of a spot-enabled task
can be appointed to the instance hours that correspond to the lowest spot
prices. This results in a cost price that has an optimal (lowest possible) spot
price contribution. Optimal spot resource allocation can be seen as the best
achievable result and can thus be used as a reference point.
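The core of this optimal reference allocation is simple: with the spot price history known, a task's hours are assigned to the cheapest instance hours in the scheduling window. A minimal sketch (names are our own):

```java
import java.util.Arrays;

// Sketch of the 'Optimal Spot' reference allocation: pick the cheapest
// instance hours from the known spot price history and sum their cost.
public class OptimalSpot {
    public static double cheapestCost(double[] hourlyPrices, int taskHours) {
        double[] sorted = hourlyPrices.clone();
        Arrays.sort(sorted); // cheapest hours first
        double cost = 0;
        for (int i = 0; i < taskHours; i++) {
            cost += sorted[i];
        }
        return cost;
    }
}
```

A real allocation would additionally respect the task's start and deadline window and checkpointing constraints; this sketch only captures the lower-bound cost idea.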
The generated tasks have deadlines spread during the whole scheduling period, such
that the tipping point for the reserved pricing model can be reached from time to
time. The tasks are assigned a random
task length between one hour and a hundred days (or 2400 hours). For a task of
the second workload model, the maximum task length equals the number of hours
between the beginning of the schedule and the deadline assigned to the task itself.
These tasks also require a certain number of resources for each task hour, according
to the distribution explained in section 3.4.2. With a given probability, the task is
allowed to be run on spot instances. In our benchmark this was chosen to be 40% of
the tasks. If too many tasks are randomly chosen to be spot-enabled, the benchmark
starts to take a long time, because of the spot model, which will be shown to
be the most compute-intensive part of the brokering process. A benchmark starts
with one task and keeps adding them until a given maximum is reached, for every
version of the workload all different scheduling and resource allocation options are
benchmarked.
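The benchmark driver just described (grow the workload one task at a time and run every scheduling/allocation option combination at each size) can be sketched as follows. The option names are taken from this chapter; the Broker interface is a stand-in for the real prototype, not its actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the benchmark driver loop described above (assumed names).
public class BenchmarkDriver {
    public interface Broker {
        double run(int numberOfTasks, String scheduling, String allocation);
    }

    static final String[] SCHEDULING = { "basic", "spot-optimized" };
    static final String[] ALLOCATION = { "on-demand-only", "on-demand+reserved",
                                         "spot-enabled", "optimal-spot" };

    // Grows the workload one task at a time and benchmarks every option
    // combination at each size; returns one result line per run, mirroring
    // the per-run output file described above.
    public static List<String> run(Broker broker, int maxTasks) {
        List<String> lines = new ArrayList<>();
        for (int n = 1; n <= maxTasks; n++) {
            for (String s : SCHEDULING) {
                for (String a : ALLOCATION) {
                    double cost = broker.run(n, s, a);
                    lines.add(n + "," + s + "," + a + "," + cost);
                }
            }
        }
        return lines;
    }
}
```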
For every single run of the benchmark a line is written to the instance-specific
results file, which means that a separate file is created for the region, instance
type and operating system combination that is being processed. The output file
contains the general information of the benchmark run, such as a reference to the
workload being scheduled, the scheduling and resource allocation options that were
active and the cost and timing measurements. Figure 5.1 shows an example of the
benchmark output. The total time (in seconds) the brokering process took and the
time the four different phases (input, scheduling, allocation and output) needed are
presented. The total cost price (in US Dollars) and how much the different pricing
models contribute to this total cost is stated too.
5.2 Cost Reduction Evaluation
To get an idea of the cost reductions accomplished by the broker model, a slight
alteration is made to the benchmark described in section 5.1. Every benchmarking
step now generates a workload of ten random tasks, instead of starting with one
task and adding tasks one by one until a given maximum is reached. The brokering
process is executed a hundred times for every option-combination. Table 5.1
indicates the price reductions achieved by using the broker prototype when certain
options are used instead of others.
Table 5.1: Average price reduction from a set of brokering options to a set of different
brokering options [workload model 1]
Note that one transition increases the cost relative to the other option
scheme; this increase is indicated in the table by the negative price reduction.
The most interesting reduction stated in the table is the one our proposed broker
implementation achieves, the reduction equals on average 42.55%. So, the spot-
optimized scheduling that uses the spot-enabled allocation option results in a
schedule that is on average 42.55% cheaper than the one of the basic scheduling
technique. The variance on this percentage measures only 0.33%, which indicates
that the proposed broker almost always generated results that are around 42.55%
cheaper.
For workloads of the first model, it can be concluded from the broker benchmarking
that the broker cuts the cost price considerably for running randomly presented
workloads on EC2. This was shown in this section for a specific instance
specification, but further investigation shows that similar cost reductions are also
reached within other geographical regions and for different instance types and
operating systems.
Table 5.2: Average price reduction from a set of brokering options to a set of different
brokering options [workload model 2]
For workloads of the second model, it can be concluded that the broker
benchmarking indicates considerable cost savings for running randomly presented
workloads on EC2.
5.3 Scalability Evaluation
The results used in this section were obtained by performing the benchmark for
Linux instances in the US-East region.
Table 5.3: Average time distribution (in percentage of total brokering time) of the
different brokering phases [workload model 1]
To determine whether there is a scalability problem, the amount of time that is added
to the duration of the brokering process when more tasks need to be scheduled is
analyzed. Table 5.4 presents the average number of seconds the brokering duration
increases when a random task is added to the existing workload. When a spot-
enabled allocation scheme is used, thus ‘Spot-Enabled’ or ‘Optimal Spot’, the
duration increases by about twenty seconds for Standard Small instances. One run
of this experiment consists of starting with a workload of one random task and keep
on adding tasks until there are ten tasks in the workload. The time increases show
that only when an allocation algorithm that takes the spot pricing possibility into
account is used, scalability issues arise. When the number of tasks is increased to a
hundred tasks, the average duration increase reduces, but is still significant. The
spot allocation scheme of the broker should be altered, such that it becomes more
scalable. One solution would be to run the simulations of the spot model in advance,
but doing the simulation with all possible task lengths is unfeasible. The brokering
time does not increase too much when tasks are added and only a division between
on-demand and reserved is made. Further analysis of the data shows that there is
not much difference in duration increase between adding a spot-enabled or a
not-spot-enabled task once there are already spot-enabled tasks in the workload.
Table 5.4: Average time increase (in seconds) when a task is added to the workload
presented to the broker prototype (US-East region, Standard Small Linux Instance)
[workload model 1]
Figure 5.3 shows the box plot graphs containing the total time information gathered
during an altered benchmark run. The benchmark generated a random workload
of ten tasks twenty times in a row, for each workload all the different brokering
option combinations were used. The box plots compare the basic scheduling with
only on-demand allocation, the most naive scheduling and allocation option that
should be fast but has a high cost for the customer, to the spot-optimized scheduling
with the spot-enabled allocation technique (which is our proposed broker
implementation). The duration for the naive implementation is very small: it lies
between 0.05 and 0.06 seconds for a workload of ten random
tasks. For our proposed broker we notice that the duration is always a lot larger, as
would be expected. There are also large fluctuations in the duration, depending on
the presented workload.
Figure 5.3: Box plots total brokering time Basic Only On-Demand versus Spot-
Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance)
[workload model 1]
Table 5.5: Average time distribution (in percentage of total brokering time) of the
different brokering phases [workload model 2]
The amount of time that is added to the duration of the brokering process when more
tasks need to be scheduled is analyzed for the second workload model too. Table
5.6 presents the average number of seconds the brokering duration increases when
a random task is added to the existing workload. When a spot-enabled allocation
scheme is used, thus ‘Spot-Enabled’ or ‘Optimal Spot’, the duration increases by
over 400 seconds for the Standard Small instances that were benchmarked. This
increase is considerably higher than the one for workload model one tasks, due to
the complexity of the scheduling algorithm for this kind of tasks, see section 3.4.2.
Table 5.6: Average time increase (in seconds) when a task is added to the workload
presented to the broker prototype (US-East region, Standard Small Linux Instance)
[workload model 2]
Figure 5.4 shows the box plot graphs containing the total time information gathered
during the alternative benchmark version (see section 5.3.1). The basic scheduling
box plot is situated around the ten second mark and has a very small range, while
the spot-optimized scheduling with a spot-enabled allocation scheme results in a
box plot showing that durations between 200 and 1400 seconds are most common.
These durations seem very high, but take into consideration that a schedule for a
one-year period is created every time. Note that the brokering process for tasks
of workload model one generally takes only between 60 and 270 seconds (see
Figure 5.3).
Figure 5.4: Box plots total brokering time Basic Only On-Demand versus Spot-
Enabled Scheduling and Allocation (US-East region, Standard Small Linux Instance)
[workload model 2]
The brokering process for tasks of the second workload model takes longer than it
does for the first workload model. The scalability problem is again localized to
the spot allocation scheme, so this is where improvements have to be made.
5.4 Conclusion
The benchmarking of the broker performance indicates that the broker achieves
considerable cost savings when running randomly presented workloads on EC2, while
still meeting the imposed deadline constraints.
CONCLUSION
This chapter concludes the thesis 'A Broker for Cost-efficient QoS aware resource
allocation in EC2' by giving an overview of the steps that were performed to
arrive at the resulting broker prototype and its associated model. The overview
states the contributions of the work, such as the developed algorithms. The second
section of the chapter proposes a number of possible extensions to the broker and
presents future research that could be performed.
6.1 Conclusions and Contributions
Determining how much running an application on EC2 costs is still a hard task
to date. The cost depends on many properties of the application's workload,
such as what instances and cloud services it requires, in what geographic location
it has to run, how much data storage and data transfer is required, and so
on. A number of environmental parameters that can considerably influence the total
cost for the customer were presented in chapter 2. These parameters were
identified by comparing the different pricing models across the geographical
regions and by analyzing the price history. Determining which parameters to take
into account in the broker's model is a trade-off that influences the complexity
of the broker. The following parameters are taken into account in the broker's
model:
• The choice of geographical region influences the cost, and always choosing
the US-East region does not constitute an optimal strategy (see section 2.4).
Taking the latency associated with choosing a certain region into account is
important, since a workload might impose latency constraints.
• The division between the different pricing models, namely on-demand, reserved
and spot instances, influences the cost as well (see section 2.4.4). When spot
prices are not considered, an optimal division between on-demand and reserved
resources can be made that is purely based upon the resource utilization, as
explained in section 3.2.1. The base level of workload that requires reserved
instances to minimize the cost is easily determined by solving an equation for
the tipping point utilization.
• Concerning spot pricing, the differences between the regions and the noticeable
evolution during the hours of the day have to be accounted for. The time periods
for which it has been statistically shown that they exhibit lower spot prices are
preferred by the broker; these time periods include weekends and nights. The
volatility of the spot market makes it an interesting market to study (see
section 2.3), which is why a tool suite to analyze the statistical properties of
the spot price history was developed. Part of this program is made available as a
web service called SpotWatch, which can be found online [37]. SpotWatch presents
the spot history in an easy-to-interpret way, in the form of box plot graphs. The
determination of an appropriate bid is done by a port of the 'SpotModel' software
that accompanies the work of Kondo [1] on a decision model for cloud computing.
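The on-demand versus reserved tipping point mentioned above can be sketched as follows. The method name and the example prices are our own illustration (roughly in the spirit of a one-year Standard Small Linux reservation), not the broker's actual code or current EC2 rates.

```java
// Break-even utilization between on-demand and reserved pricing.
public class TippingPoint {

    // Utilization rate (fraction of the term the instance runs) above which
    // reserving is cheaper: solve u*H*onDemand = fee + u*H*reserved for u.
    static double tippingPointUtilization(double upfrontFee, double onDemandRate,
                                          double reservedRate, double termHours) {
        return upfrontFee / (termHours * (onDemandRate - reservedRate));
    }

    public static void main(String[] args) {
        // Placeholder prices: 227.50 upfront, 0.03/h reserved vs 0.085/h on-demand,
        // over a one-year term (8760 hours).
        double u = tippingPointUtilization(227.50, 0.085, 0.03, 8760.0);
        System.out.printf("Reserved becomes cheaper above %.1f%% utilization%n", 100 * u); // ~47.2%
    }
}
```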
The following choices influence the cost of running a workload on EC2, but are
ignored or assumed to be fixed in our broker's heuristics:
• The choice of the instance type on which to run a provided workload is assumed
to be made by the user in our broker prototype. To make the broker more complete,
a workload benchmark that determines the most appropriate instance type for a
given workload could be implemented (see chapter 4).
The broker prototype, described in chapter 4, uses two workload models. The first
one describes a task by stating the total amount of work hours that need to be
executed and a deadline by which the task has to be finished. The second workload
model describes a task by stating the length of the task, which in this case
equals the total width (in hours) of the task; every hour of the task may,
however, still require multiple instances to complete the corresponding part of
the task. The broker's operation consists of four phases: input, scheduling,
allocation and output (illustrated by Figure 6.1).
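The two workload models can be sketched as simple data structures; the class and field names below are illustrative, not the prototype's actual classes.

```java
// Illustrative sketches of the two workload models.
public class WorkloadModels {

    /** Workload model 1: a total amount of work and a deadline. */
    static final class Model1Task {
        final int workHours;    // total hours of work to execute
        final int deadlineHour; // hour index by which the task must be finished
        Model1Task(int workHours, int deadlineHour) {
            this.workHours = workHours;
            this.deadlineHour = deadlineHour;
        }
    }

    /** Workload model 2: a fixed length, with a per-hour instance demand. */
    static final class Model2Task {
        final int[] instancesPerHour; // entry i = instances needed during hour i
        Model2Task(int[] instancesPerHour) { this.instancesPerHour = instancesPerHour; }
        int lengthHours() { return instancesPerHour.length; } // the task's width
        int totalInstanceHours() {
            int sum = 0;
            for (int n : instancesPerHour) sum += n;
            return sum;
        }
    }

    public static void main(String[] args) {
        Model1Task t1 = new Model1Task(40, 72);              // 40 work hours, due by hour 72
        Model2Task t2 = new Model2Task(new int[] {2, 2, 3}); // 3 hours long, several instances/hour
        System.out.println(t1.workHours + " " + t2.lengthHours() + " " + t2.totalInstanceHours());
    }
}
```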
The input component (see section 4.2) provides pricing information and a task
generation and specification part that delivers the constrained workloads for which
the broker develops a cost-efficient schedule. The scheduling component (see section
4.3) provides the functionality that assigns the different workloads to a geographical
region. Per region, an instance-specific schedule is created, which divides the load as
equally as possible over time. The resource allocation component (see section 4.4)
then takes this schedule and determines, for every resource involved, which
pricing model is best suited to minimize the total cost for the consumer. The
task components are annotated with the corresponding costs and the schedule is
then graphically or textually presented by the output component of the broker (see
section 4.5). The graphical representation is a Gantt-like graph that shows the task
parts scheduled over time across different resources. An accompanying detailed cost
overview is presented to the user as well.
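The four phases can be pictured as a chain of components. The interfaces and the string-based task representation below are a simplification of ours, not the prototype's actual API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the broker's four phases chained together. Tasks are plain
// strings here for brevity; the prototype uses full task objects.
public class BrokerPipeline {
    interface Input     { List<String> tasks(); }                      // pricing + task specs
    interface Scheduler { List<String> schedule(List<String> tasks); } // assign region/time slots
    interface Allocator { List<String> allocate(List<String> sched); } // pick pricing model
    interface Output    { String render(List<String> allocated); }     // Gantt chart / cost report

    // Runs the phases in the fixed order: input -> scheduling -> allocation -> output.
    static String run(Input in, Scheduler s, Allocator a, Output out) {
        return out.render(a.allocate(s.schedule(in.tasks())));
    }

    public static void main(String[] args) {
        // Trivial stand-ins for each phase, only to show the data flow.
        String report = run(
            () -> List.of("taskA", "taskB"),
            tasks -> tasks.stream().map(t -> t + "@us-east").collect(Collectors.toList()),
            sched -> sched.stream().map(t -> t + ":spot").collect(Collectors.toList()),
            alloc -> String.join(",", alloc));
        System.out.println(report); // taskA@us-east:spot,taskB@us-east:spot
    }
}
```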
The broker performance was evaluated in chapter 5; it was found that the broker
achieves considerable cost savings when running randomly presented workloads on
EC2, while still meeting the imposed deadline constraints. Besides the cost
evaluation, the scalability of the broker was evaluated as well (see section 5.3).
A linear increase in time consumption was noticed when a number of tasks were
added to the workload one by one, such that the broker prototype can be considered
rather scalable. The duration of a broker benchmark run might seem long, but the
benchmark was generating schedules over the period of a whole year.
6.2 Future Work
• The broker model assumes that the appropriate instance type and operating
system to run a given workload on are provided by the user (see section 3.3).
Benchmarking a workload in order to acquire a number of resource utilization
characteristics would enable automating the mapping of a workload to an instance
type. A similar limitation of the broker prototype is that whether a certain task
can be run on a spot instance has to be specified by the user as well.
• The broker prototype should be transformed into a web service with a well-
defined protocol to perform brokering tasks. This should be relatively easy using
JAX-WS [38], since all brokering code is written in the Java language.
• Expand the spot model so that more of the conclusions drawn from the created
box plots are incorporated (see section 2.3). Modify the model so that it uses a
prediction mechanism that tries to foresee future spot prices based on the spot
price history and noticeable general trends. The broker prototype currently
requires the spot prices to be known during the scheduling period.
• More EC2-specific features (see section 1.2) can be incorporated into the
broker model, so that the model better represents the real price the customer
would pay to run their workload on the Amazon EC2 cloud. Data storage and
transfer costs can be taken into account, dedicated instances can be added to the
model, the possibility of having a free tier can be accounted for, and so on.
• Checkpointing is only incorporated in the spot model for now (see section
3.5.1); it is, however, not used by the broker prototype. Taking the overhead
cost of snapshotting into account would yield a more realistic cost overview and
is thus desirable.
Appendices
APPENDIX A
A.1 Introduction
To determine how large the EC2 markets are in the different geographical regions,
it is interesting to investigate the size of the infrastructure offered by Amazon
in these regions. On April 22nd 2011, AWS posted on the official EC2 forums [39]
the public IP ranges used by the different geographical EC2 regions. The post was
accompanied by the following introductory sentence: "We are pleased to announce
that as part of our ongoing expansion, we have added a new public IP range
(APAC-Tokyo)". The size of these IP ranges can be seen as an indication of the
size of the different EC2 regions.
From the provided data it can be seen that more than half of all the public IP
addresses provided by Amazon EC2 are situated in the US-East region, followed by
EU-West and US-West, which account for about 15% of the IP addresses. The Asian
regions represent less than 10% of the IP addresses, but these regions were
introduced most recently and are still growing faster than the other regions.
APPENDIX B
B.1 Introduction
The history of EC2 and its instances is given in chronological order in this
appendix. This overview contains a selection of the events announced by Amazon in
their 'What's New' section [28]. Only the events considered important for the
research in this thesis are stated. When the reserved or on-demand pricing in the
US-East region was updated, the changes are illustrated with a table containing
the new prices.
(22/10/2007) EC2 in unlimited beta and new instance types are announced.
(23/10/2008) EC2 exits beta and offers an SLA with a 99.95% uptime commitment.
(23/10/2008) EC2 instances running Windows Server become available.
(01/09/2010) Lower prices for High Memory Double and Quadruple XL instances.
APPENDIX C
C.1 Introduction
An interesting research question concerns the relation between the price
reductions in EC2 and the hardware cost reduction that is expected to happen over
time. In this appendix, which focuses on the CPU cost, the underlying hardware
Amazon is using for instances of a certain type is discussed. The evolution of
the hardware being used by EC2 instances is investigated as well.
The following pseudocode was used in a test to verify whether these are still the
microprocessors used by Amazon today (the tests were performed on March 26th
2011).

    for i = 0 to x do
        for all type in instanceTypes do
            instance = startInstanceUSRegion(type)
            procInfo = instance.execute("more /proc/cpuinfo")
            output.append(procInfo)
        end for
    end for

The command used in this pseudocode, "more /proc/cpuinfo", returns a number of
CPU-related characteristics of the underlying system [41].
This experiment, which tested what type of processor powers the machine a
requested instance (of a certain type) runs on, resulted in Table C.2, which
presents the microprocessors used by instances of each type.
These results show that Amazon started using newer microprocessors for a number
of earlier introduced instance types. They did not replace the old processors,
but added more machines with newer microprocessors to their data centers. The
processor
that has been around the longest, however, namely the AMD Opteron 2218 that was
used in the US-East region for Standard Small instances in the early days of EC2,
seems to have disappeared (or at least it has become rare to get a machine with
this processor). Table C.3 gives some more information about the microprocessors
in use in EC2, such as their official launch date and their original price (when
sold in bulk, per thousand units) [42].
Table C.3: Extra Information about the Microprocessors used by EC2 Instances
Figure C.1: CPU Procurement Cost Evolution for Intel Xeon E5430 (per unit)
Figure C.2: CPU Procurement Cost Evolution for Intel Xeon E5507 (per unit)
The fact that the CPU price did not decrease much does not mean that the hardware
cost for Amazon did not decrease. Amazon started using newer processor models
over time, which can reduce the hardware cost for the amount of compute power
needed to offer the advertised number of Elastic Compute Units (ECU) of a certain
instance type. The Standard Small instances on EC2 started using different
microprocessors over time, and this evolution is accompanied by a price
reduction. The AMD Opteron 2218, which was used in the early days of EC2 in the
US-East region, was a dual-core 2.6GHz processor and cost 873 dollars at the time
of its introduction. Expressed in price per GHz, this processor had a price of
167 dollars per GHz. The microprocessors used today for this kind of instance are
the Intel Xeon E5430 (quad-core at 2.66GHz for 455 dollars) and the Intel Xeon
E5507 (quad-core at 2.26GHz for 276 dollars). They have a corresponding price per
GHz of 42.76 and 30.53 dollars. So, substituting the AMD Opteron with the newer
E5430 processor comes with a price reduction of about 75%. This reduction
happened in a bit over a year, since the E5430 was launched in the fourth quarter
of 2007 and the AMD Opteron 2218 was introduced in the third quarter of 2006. The
newer Intel Xeon E5507, launched in the first quarter of 2010, signifies a
reduction of about 82% compared to the AMD Opteron processor and a decrease of
about 29 percent compared to the Intel Xeon E5430.
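The price-per-GHz arithmetic above can be reproduced with a few lines of code, using exactly the prices and clock speeds quoted in the text:

```java
// Price-per-GHz comparison using the figures quoted in the text.
public class PricePerGhz {
    static double pricePerGhz(double priceUsd, int cores, double ghz) {
        return priceUsd / (cores * ghz); // dollars per GHz of aggregate clock speed
    }
    static double reductionPercent(double oldVal, double newVal) {
        return 100.0 * (1.0 - newVal / oldVal);
    }
    public static void main(String[] args) {
        double opteron = pricePerGhz(873, 2, 2.60); // AMD Opteron 2218: ~167.9 $/GHz
        double e5430   = pricePerGhz(455, 4, 2.66); // Intel Xeon E5430:  ~42.76 $/GHz
        double e5507   = pricePerGhz(276, 4, 2.26); // Intel Xeon E5507:  ~30.53 $/GHz
        System.out.printf("E5430 vs Opteron: %.0f%% cheaper per GHz%n", reductionPercent(opteron, e5430)); // ~75%
        System.out.printf("E5507 vs Opteron: %.0f%% cheaper per GHz%n", reductionPercent(opteron, e5507)); // ~82%
        System.out.printf("E5507 vs E5430:   %.0f%% cheaper per GHz%n", reductionPercent(e5430, e5507));   // ~29%
    }
}
```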
C.4 Conclusion
Not much data about hardware costs and the hardware Amazon is using is publicly
available. During the last five years, in which Amazon EC2 has been active,
Amazon put newer microprocessors in use to provide certain instances to its
customers. This evolution was accompanied by a noteworthy hardware price decrease
of up to 80 percent. The hourly rate for EC2 instances, on the other hand, has
only seen a price reduction of up to 15 percent. It is however hard to draw
conclusions from this divergence, since the CPU procurement cost is only a small
portion of the total cost of the service that Amazon offers through EC2. The
price impact should be normalized using the percentage of the instance's hourly
price that the CPU cost represents. This percentage can however only be
determined by Amazon itself, since others cannot access the needed data. It is
reasonable to assume that the hardware price decrease gives Amazon room for price
reductions, should tougher competition become reality.
APPENDIX D
D: Basic Economics
APPENDIX E
E.1 Introduction
The empirical analysis of the spot price involves a number of statistical terms
that are briefly discussed in this appendix. Since the empirical analysis in
chapter 2 utilized a large number of box plots to illustrate its conclusions, it
is important to explain how to interpret the different components of a box plot
graph.
A box plot [45], also called a box-and-whisker graph, is a graphical
representation that gives an idea of the dispersion of the data; it is useful for
describing the behavior of the data in the middle as well as at the ends (tails)
of the distribution. A box is drawn between the upper (Q3, the 75th percentile)
and lower (Q1, the 25th percentile) quartiles, with a solid line drawn across the
box to indicate the median value. The interquartile (IQ) range is the difference
between the upper and the lower quartile. Whiskers are used to identify the
ranges that contain values that differ more from the median of the distribution.
Inner fences/whiskers are drawn (not shown on the illustration) at Q1 - 1.5*IQ
and Q3 + 1.5*IQ. The upper outer fence is located at Q3 + 3*IQ, while the lower
outer fence can be found at Q1 - 3*IQ. These fences give an idea of what the
tails of the distribution look like. Any observation outside these fences is
considered a potential outlier.
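A sketch of these box plot quantities in code; note that the quartile computation below uses one common convention (linear interpolation between order statistics), so other tools may report slightly different Q1 and Q3 values:

```java
import java.util.Arrays;

// Box plot quantities for a small, made-up set of spot prices.
public class BoxPlotFences {
    // Quantile by linear interpolation between order statistics; 'sorted'
    // must be in ascending order and p must lie in [0, 1].
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos), hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] prices = {0.031, 0.032, 0.033, 0.034, 0.035, 0.040, 0.120};
        Arrays.sort(prices);
        double q1 = quantile(prices, 0.25), q3 = quantile(prices, 0.75);
        double iq = q3 - q1;                                     // interquartile range
        double innerLo = q1 - 1.5 * iq, innerHi = q3 + 1.5 * iq; // inner fences
        double outerLo = q1 - 3.0 * iq, outerHi = q3 + 3.0 * iq; // outer fences
        System.out.printf("Q1=%.4f Q3=%.4f IQ=%.4f%n", q1, q3, iq);
        for (double p : prices)
            if (p < innerLo || p > innerHi)
                System.out.println("potential outlier: " + p); // flags the 0.120 spike
    }
}
```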
• Geometric mean: The geometric mean indicates the typical value of a set
of numbers. It is similar to the arithmetic mean, which is what most people
think of as the "average", except that the numbers are all multiplied together
and the nth root is taken of the resulting product (where n is the count of
numbers in the set).
• Maximum & Minimum: The maximum and minimum are, as you would expect, the
greatest and smallest values in the set.
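The geometric mean can be computed via logarithms, which is mathematically identical to taking the nth root of the product but avoids overflow for long series:

```java
// Geometric mean of a set of positive numbers.
public class GeoMean {
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) logSum += Math.log(v); // sum of logs == log of product
        return Math.exp(logSum / values.length);       // exp of mean log == nth root
    }

    public static void main(String[] args) {
        System.out.println(geometricMean(new double[] {2.0, 8.0})); // ~4.0
    }
}
```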
Thus, a negative skew indicates that the tail on the left side of the probability
density function is longer than the one on the right side, which means that most
of the values (including the median) lie to the right of the mean. A positive
skew, on the other hand, indicates that the tail on the right side is longer and
that the bulk of the values lie to the left of the mean. A zero value indicates
that the values are relatively evenly distributed on both sides of the mean,
which typically implies a symmetric distribution.
Sometimes a distinction with an extreme value [48] is made, but we consider those
values to be outliers as well. An extreme value is said to be an observation that
might have a low probability of occurrence but cannot be statistically shown to
originate from a different distribution than the rest of the data.
Outliers are often indicators of either measurement errors or of the fact that
the population has a heavy-tailed distribution. In the former case we can discard
the values; measurement errors will however not occur in our application, because
we use the actual spot prices set by Amazon EC2, which we obtained through their
API. In the case of a heavy-tailed distribution, outliers are indicated by a high
kurtosis value.
A number of outlier detection strategies [49] exist, but they all incorporate the
idea of a measure with a spread. This means that the non-outlier values fall
within a certain distance below and above the mean value. One possibility is to
use the mean +/- x times the standard deviation as the range for the 'normal'
values; the ones that fall outside of this range are considered possible
outliers. The range that is often applied here runs from the mean minus three
times the standard deviation to the mean plus three times the standard deviation.
Empirical analysis has shown that approximately 68% of the values of a normal
distribution fall within one standard deviation (SD) of the mean, 95% within 2 SD
of the mean, and 99.7% within 3 SD of the mean. Another possible detection
strategy uses a range defined by the characteristics used in a box plot: values
that fall below Q1 - 1.5*IQ or above Q3 + 1.5*IQ are considered outliers. IQ
stands for the interquartile range, the distance between the 25th and 75th
percentile, or in other terms: Q3 - Q1.
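The mean +/- 3*SD strategy can be sketched as follows (the spot price values are made up for illustration); the box plot variant only differs in how the 'normal' range is derived:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Flags values lying more than k standard deviations away from the mean.
public class SigmaOutliers {
    static List<Double> outliers(double[] data, double k) {
        double mean = 0;
        for (double d : data) mean += d;
        mean /= data.length;
        double var = 0;
        for (double d : data) var += (d - mean) * (d - mean);
        double sd = Math.sqrt(var / data.length); // population standard deviation
        List<Double> out = new ArrayList<>();
        for (double d : data)
            if (Math.abs(d - mean) > k * sd) out.add(d);
        return out;
    }

    public static void main(String[] args) {
        // Eleven hourly prices: ten stable observations and one spike.
        double[] prices = new double[11];
        Arrays.fill(prices, 0.030);
        prices[10] = 0.300;
        System.out.println(outliers(prices, 3.0)); // the 0.300 spike is flagged
    }
}
```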
APPENDIX F
F.1 Introduction
In chapter 3 (Resource Scheduling) a technique to make the optimal division
between on-demand and reserved instances is given. It is optimal in the sense
that it minimizes the total cost. For a particular instance (in a certain
geographical region, using a certain operating system and being of a certain
instance type), it was shown that it is possible to determine a tipping point
that expresses from what utilization rate onwards it is cheaper for an instance
to use the reserved pricing model. In this appendix the tipping points for Linux
and Windows instances that are rented for a 3-year period are presented.
Table F.2 gives an overview for Windows instances for a 3-year period.
Already at a utilization rate of 25 percent, the reserved version of a Windows
instance rented for a period of 3 years is cheaper than an on-demand instance.
The 3-year reserved option is not taken into account in our model, since it is
impossible to predict the utilization rate for such a long period. First of all,
it is difficult or even impossible for a company to predict its workload for the
next 3 years. Secondly, EC2 is evolving quickly and Amazon could start offering
new pricing models that better fit one's workload. It is impossible to predict
what technology will look like in 3 years' time, which makes both predicting
one's workload and foreseeing what Amazon EC2 will look like hard.
APPENDIX G
G: Developed Software
G.1 Introduction
In this appendix the information is provided to access the software that was
developed to accompany the research in this thesis. It consists of a Java
prototype of the broker application and the SpotWatch website that was developed
to make the spot price history publicly available.
http://kurtvermeersch.com/Thesis/EnvironmentalAnalysisFinal.xlsx
The statistical analysis of the EC2 spot pricing that was performed in the
environmental analysis chapter was made into a web service that allows the user
to create graphs for all existing regions, instance types and operating systems
offered by Amazon EC2 (last checked on April 25th 2011), in any desired time
frame. The spot price history is available from the beginning of the existence of
the EC2 spot market until the current date; the history is updated daily through
an Amazon EC2 API call. SpotWatch offers four different chart types: the data can
be plotted per date, per week, per day of the week or per hour of the day, and
for every view a graph of the average spot price and a corresponding box plot is
generated. The application can be accessed through its website
http://spotwatch.eu
http://kurtvermeersch.com/Thesis/PrototypeBrokerFinal.zip
http://kurtvermeersch.com/Thesis/SpotModelFinal.zip
Bibliography
[1] A. Andrzejak, D. Kondo, and S. Yi, “Decision model for cloud computing
under SLA constraints,” in Modeling, Analysis & Simulation of Computer
and Telecommunication Systems (MASCOTS), 2010 IEEE International
Symposium on, pp. 257–266, 2010.
[2] Amazon, “Elastic compute cloud.” http://aws.amazon.com/ec2, 2008.
[Accessed 22-12-08].
[3] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski,
G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the
clouds: A Berkeley view of cloud computing,” Tech. Rep. UCB/EECS-2009-28,
EECS Department, University of California, Berkeley, February 2009.
[4] I. Foster, Y. Zhao, I. Raicu, and S. Lu, Cloud Computing and Grid Computing
360-Degree Compared. Proc. 2008 Grid Computing Environments Workshop,
2008.
[5] C. Babcock, The Cloud Revolution. McGraw-Hill, 2010.
[6] P. Mell and T. Grance, “The NIST definition of cloud computing,” 2009.
[7] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, “Cloud
computing and emerging it platforms: Vision, hype, and reality for delivering
computing as the 5th utility,” Future Gener. Comput. Syst., vol. 25, no. 6,
pp. 599–616, 2009.
[8] M. Uddin and A. A. Rahman, “Server consolidation: An approach to make
data centers energy efficient & green.”
[9] J. McCarthy, Centennial Keynote Address. MIT, 1961.
[10] D. Parkhill, The Challenge of the Computer Utility. Addison-Wesley Publishing
Company, 1966.
[11] N. Carr, The Big Switch: Rewiring the World, from Edison to Google. W. W.
Norton & Company, 2008.
[33] S. Yi, D. Kondo, and A. Andrzejak, “Reducing costs of spot instances via
checkpointing in the Amazon Elastic Compute Cloud,” in 3rd International
Conference on Cloud Computing (IEEE CLOUD 2010), pp. 236–243, 2010.
[44] N. Geographic, The Knowledge Book: Everything you need to know to get by in
the 21st century. National Geographic, 2009.
[48] V. Barnett and T. Lewis, Outliers in Statistical Data. John Wiley & Sons, 1985.