Amril Nazir
Introduction
From networks of workstations through to the Internet, the high-performance computing community has
long advocated composing individual computing resources in an attempt to solve the most demanding
computational and data-intensive problems. In recent years this progression has been driven by the vision
of "Grid computing" [Foster et al., 2001] where the computational power, storage power, and specialist
functionality of arbitrary networked devices is to be made available on-demand to any other connected
device that is allowed to access them.
The Grid computing vision promises to provide the required platform for users to run a new and
more demanding range of high performance computing (HPC) applications. Using grid technologies,
it is now possible to generate virtual computers with processing capabilities that rival those of the high
cost, dedicated supercomputer. The advent of high-speed networks has also enabled the integration of
computational resources which are geographically distributed and administered in different domains.
Such possibilities offer a way to solve the most complex computation-intensive challenges such as protein
folding, molecular dynamics, financial analysis, fluid dynamics, structural analysis and many others.
The workflow management engine and the grid workflow scheduler are provided as mechanisms to schedule
the execution of tasks on grid computing resources that span multiple administrative domains across
the grid. The workflow management engine provides the required quality-of-service (QoS) support to
guarantee the execution of each task based on advance reservation. When scheduling, the workflow
management engine produces a workflow model that outlines where and when each task component
is to be processed. Based on this plan, the resources are then reserved on a number of participating
grid resources. For the reservations to be effective, detailed knowledge is required with respect to the
amount of time each task execution will take. Approaches such as advance reservation and backfilling
rely on precise information about job execution times to schedule jobs efficiently.
This approach, however, poses a significant problem for an emerging class of adaptive, real-time,
interactive high performance computing (HPC) applications1, because the processing requirements
of these applications fluctuate throughout their execution. In most cases, it is often not possible
to predict job execution times because of user interaction or unpredictable transitions between
states representing a very broad spectrum of resource loads. Workflows are not suitable for these
applications: late completions cannot be tolerated, as they interfere with the reservation plan
already agreed by all parties. Workflows that do not adhere to their deadlines consequently have
to be terminated, with potentially significant efficiency losses for such applications. This thesis
therefore departs sharply from previous work: we do not assume any knowledge of job execution times,
thereby freeing users of the need to estimate them. However,
this mandates an alternative approach: making scheduling decisions at the time of task execution.
This avoids the need to reserve resources for each job in a future time slot. However, this approach
is currently not feasible in grid environments, because on-demand allocation may produce a poor
schedule in such a dynamic environment, where the utilization and availability of resources vary
over time and resources from participating sites can join and leave unpredictably at any time.
This thesis addresses the above problems by introducing the possibility of managing distributed
resources in smaller units. These units (nodes, or capacity) can be rented from global grids as required.
Once rented, the nodes are moved outside the management of the resource providers (e.g., grid management
systems) and are managed instead by an entity known as a service provider. The creation of
such a unit maintains the global concept and at the same time introduces a local concept. This concept
has many benefits. Firstly, individual users no longer need to be recognised globally. The organization
they belong to can hire equipment and create a local service to serve its own users and/or applications.
As a result, it will be able to schedule with a minimum of hassle, as there are no competing
scheduling authorities and the resource pool is limited. This also provides users with the isolated,
customized execution environments they need, and it simplifies resource administration. Secondly,
an immediate opportunity this environment offers is for users to outsource work to a third-party
provider. Outsourcing has many benefits: users avoid the hassle
1 This type of application is also referred to as urgent high performance computing applications in this thesis.
and expense of maintaining their own equipment and can provision resources specifically during peak
loads. Finally, the system can use the nodes' processing capabilities more efficiently, because
resources are managed in significantly smaller units in comparison to global grids, while at the same
time retaining full control.
Incorporating such a concept within current grid systems does not require any major changes. The
only requirement is access to resources (i.e., nodes, machines and processors) for very long periods
of time (from a few hours to several days or even a week). In such a scenario, nodes are abstracted as
a service and given to applications or users. When the nodes are rented out, the owner loses the use
of these nodes and is therefore paid a rental fee. During the rental period, the service provider has
full control over how it utilises the nodes.
Since the system can be tailored accordingly, dynamic runtime negotiations for nodes are also
promoted. This enables users to customise their applications with a set of distinct resource types and
amounts to form ideal node configurations based on current load demand. Moreover, temporary and
unexpected spikes in demand for nodes can be accommodated by flexible rental arrangements. The
thesis defines the principles of these rental arrangements, or resource renting, from the perspective
of the service provider. The nature of the problem is such that the demand for resources and the
processing times of individual application requests are unknown, or estimates of them are unreliable.
The service provider faces the conflicting goals of renting sufficient computing nodes to provide an
adequate level of application satisfaction and keeping the cost of renting to a minimum: renting too
few nodes results in long wait times and application dissatisfaction. There is a need to balance the
cost of satisfying customer demand against the cost of renting computing resources. Therefore,
this thesis presents a novel costing model that can be used to identify important costs and to provide a
clear performance objective for the service provider. The cost model is useful for capacity planning and
for the evaluation of resource scheduling in the presence of bursty and unpredictable demand.
Using the proposed cost model, the service provider can explicitly measure and manage the trade-off
between the cost of renting computational nodes and the lost opportunity if customer demand is not met.
To this end, the thesis proposes several rental heuristics and policies that perform rental decisions based
on current and anticipated (future) workload. A rental policy provides the rules to decide what, when
and how many nodes to order. Such an approach allows a service provider to assemble an ideal set of
nodes to satisfy the current and future workload more effectively. For example, the service
provider may rent a large number of nodes for the period covered when demand is very high.
Alternatively, if demand is steady, the service provider may rent a few
nodes at any one time to reduce the rental cost. Overall, this provides greater flexibility, which allows
the systems to operate under dynamic environments such as existing Grids and potentially commercial
service Clouds.
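As a rough sketch of what such rules might look like (the function names echo the rigid, aggressive, and conservative heuristics examined later in the thesis, but all signatures, thresholds, and the burst factor below are illustrative assumptions, not the thesis's actual definitions):

```python
def rigid_policy(demand_estimate: int, owned: int) -> int:
    """Rigid: rent exactly the estimated shortfall."""
    return max(0, demand_estimate - owned)

def aggressive_policy(queued_jobs: int, idle_nodes: int, burst_factor: float = 1.5) -> int:
    """Aggressive: over-provision when a demand spike is observed."""
    shortfall = max(0, queued_jobs - idle_nodes)
    return int(shortfall * burst_factor)

def conservative_policy(queued_jobs: int, idle_nodes: int, low_watermark: int = 2) -> int:
    """Conservative: rent only once idle capacity runs below a watermark."""
    if idle_nodes >= low_watermark:
        return 0
    return max(0, queued_jobs - idle_nodes)
```

With 10 queued jobs and 4 idle nodes, the rigid policy orders 6 nodes, the aggressive one 9, and the conservative one nothing until idle capacity falls below its watermark.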
1. We present a novel service provider approach, or rental-based system, for supporting urgent
high-performance computing applications with minimal cost and infrastructure. The approach resolves
these issues by aggregating worldwide computing resources to form a localised rental-based
management system. The framework decomposes rental-based systems into three distinct tiers
which map well onto the structure of modern grid computing systems. The first tier offers the ability
for applications to interact directly with the rental management system using simple API calls.
The calls are handled by the application agent (AA), which resides between the application and the
rental management system to enable dynamic resource allocation at the application level, so that
application execution benefits from the system's dynamic, on-demand nature. The second tier is a
service provider that makes use of the QoS information provided by the AA and
appropriately schedules application jobs based on job description/requirement, SLA agreements,
and resource costs. A local scheduler uses information provided by the AA to estimate demand.
The third tier is a negotiator that obtains resources from resource providers which provide shared
pools of ready-to-use compute resources. The three-tier approach essentially differentiates the
roles of managing application processing requirements, scheduling application jobs and managing
distributed resources.
2. We present the fundamental requirements that a service provider, or rental-based system, must
meet in order to support the QoS requirements of a variety of HPC applications. Based on these
requirements, we further present an architectural framework for a rental management system that allows
autonomic and dynamic renting of resources for high performance computing applications. We
demonstrate how such systems are capable of providing rapid provisioning of computing resources
and of supporting dynamic resource negotiation at run-time.
4. We determine the cost-benefits of the rental management approach versus dedicated HPC sys-
tems. Based on an analysis of an existing grid infrastructure such as the international EGEE Grid
[Berlich et al., 2006], we identify the real cost of current grid infrastructure, and propose alter-
native mechanisms by which jobs can be managed in decentralized environments, providing new
avenues for agility, service improvement and cost control. We detail the specific costs of the EGEE
Grid and of the Lawrence Livermore National Laboratory (LLNL) HPC system. The EGEE Grid is currently
a world-leading production grid spanning Europe and beyond, while the LLNL system is a large
Linux cluster currently being used to
run a broad class of HPC applications including best effort, adaptive, and interactive applications.
With these performance and monetary cost-benefits in mind, we demonstrate the performance of
a rental management system compared to a dedicated HPC system.
5. We present a comprehensive costing model for the evaluation of a service provider. The pro-
posed cost model provides the mechanism for the service provider to quantitatively evaluate its
conflicting objectives in minimising operating and rental related costs subject to an application
satisfaction-level constraint. Effectively, the evaluation metric is profit: the balance between
the monetary value earned and the rental cost paid. Although both the service provider
and resource providers act independently, changes in behaviour from a resource provider may in-
fluence how a service provider makes decisions, and vice versa. For example, a resource provider
may choose to support long rental contracts, and therefore charge a lower unit rental fee for long
contracts than for short ones. This will in turn affect how the service provider makes
rental decisions. Both have their own expectations and strategies: users adopt the strategy of
solving their problems at low cost within a required timeframe and resource providers adopt the
strategy of obtaining best possible return on their investment. The responsibility of a service
provider is to offer a competitive service access cost in order to attract users. It may also choose
the resource providers that best meet users' requirements. The proposed cost model can be
used to evaluate the cost and the effectiveness of specific adopted strategies from the perspective
of both service provider and resource provider.
6. We propose and examine several cost-aware scheduling and rental policies that incorporate execu-
tion deadlines and monetary values when making scheduling and rental decisions. The provision
of sophisticated rental policies is essential for the economic viability of a service provider. A
rigid rental policy is examined whereby additional nodes are rented exactly based on estimates of
demand. We further introduce more sophisticated aggressive and conservative heuristics that operate
in a reactionary mode, where a service provider does not rent additional nodes until specific
conditions have been reached, e.g., when there is a sudden increase in demand or when nodes are
running low. We explore how these policies can be improved further by taking into account
job deadlines, monetary values, system revenue and system profitability, and examine how load,
job mix, job values, job deadlines, node heterogeneity, lease duration, node lead time, job sizes,
and rental price influence the service provider’s profit. We also examine the impact of uncertainty
in demand, uncertainty in resource availability and uncertainty in charging prices from resource
providers. Our results provide insight into the benefits of possible optimisations and are a step
towards understanding the balance between satisfying customer demand and the cost of renting computing
resources. The investigated policies serve as the foundation for improving productivity and return
on investment for satisfying demand without a heavy upfront investment and without the cost of
maintaining idle resources.
7. We propose a service level agreement management (SLAM) framework that can be effectively
used to enhance application quality-of-service (QoS) while minimising rental costs. Rental
policies which operate in a reactionary mode are not sufficient for highly variable or chaotic
workloads. Simple per-job information does not capture the true intent of the user, leaving the service
provider to do the best it can, but risking unhappy users, under-utilized nodes, or both. An ad-
ditional control is proposed by means of service level agreements (SLA), or long-term contracts,
which specify the service to be provided, its quality and quantity levels (e.g., the load that the user
can impose), price, and penalties for non-compliance. As such, a service provider can optimise its
rental decisions because there will be few unexpected surprises, and it will be able to respond
to unexpected demand more efficiently since an upper bound on workload can be established. We
present and evaluate three new SLA-aware policies that can be incorporated in such a framework
to improve scheduling and rental decisions.
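As a minimal illustration of the profit objective underlying the proposed cost model (the record fields, the linear rental cost, and all the numbers below are invented for illustration, not the model's actual form):

```python
def provider_profit(jobs, node_hours_rented, price_per_node_hour):
    """Profit = value earned from jobs that met their deadlines,
    minus the total rental cost; value lost on missed deadlines is
    the service provider's opportunity cost (forgone revenue)."""
    revenue = sum(j["value"] for j in jobs if j["finished"] <= j["deadline"])
    return revenue - node_hours_rented * price_per_node_hour

jobs = [
    {"value": 50.0, "deadline": 10.0, "finished": 8.0},  # met its deadline: earns value
    {"value": 30.0, "deadline": 5.0, "finished": 7.0},   # missed: no revenue earned
]
profit = provider_profit(jobs, node_hours_rented=20, price_per_node_hour=1.5)
# 50.0 earned - 30.0 rental cost = 20.0
```

A rental policy that maximises this quantity automatically balances application satisfaction against rental expenditure.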
The above contributions are very much complementary in nature. Used jointly, the resulting framework
presents a unique set of characteristics that distinguish it from existing distributed, grid, and cluster sys-
tems: the result is an adaptive self-centered approach to collaboration, allowing a service provider to
construct a dynamic resource management environment and to operate it in the most cost-effective
manner. Unlike centralized approaches or approaches based on fully autonomous
behaviour, where independent participants operate mostly in isolation, our framework fosters collabora-
tion without compromising site autonomy through a separation of interests between the service provider
and the resource provider. Furthermore, it is designed to ensure that significant benefit can still be ob-
tained even when the scope of deployment is limited, allowing it to be integrated into current and existing
distributed and grid environments incrementally, without major modifications to the environment itself,
and without the need for complete cooperation by all providers. This thesis will demonstrate that the
use of these mechanisms can help achieve significant gains for both the applications and the resource
providers.
their return on investment. The pricing2 policies consider the following question: “What should the
provider charge for the resources?” However, pricing policies and market dynamics imposed by resource
providers are beyond the scope of this thesis, although our rental heuristics are potentially compatible
with their pricing schemes. This thesis does not venture further into other market concepts such as user bid-
ding strategies [Wolski et al., 2001, Chun et al., 2005] and auction pricing [Lai et al., 2004, Waldspurger
et al., 1992, Das and Grosu, 2005] mechanisms.
2 We differentiate between pricing nodes and charging for services in the thesis.
after adding a resource, either process migration or data load balancing may take place to take advantage
of the newly added resource.
1.4 Organisation
We end this introductory chapter with an outline of the remainder of this dissertation. The outline of the
thesis is as follows:
Chapter 2 introduces the research background and related work from a wide variety of areas related
to distributed systems for high performance computing applications.
Chapter 3 sets the stage of the thesis by discussing the motivation behind the adoption of rental-
based mechanisms. We then present the fundamental requirements that a rental-based system must meet in
order to deliver QoS satisfaction for HPC applications.
Chapter 4 proposes a framework for building such a rental-based system. We also describe the
simulator framework we used to simulate the behaviour of rental-based systems. We discuss in detail
our simulation environment and the experimental design, including the application scenarios, workload,
traffic models, and performance metrics.
Chapter 5 presents HASEX, a prototype implementation of a rental-based system and we describe
HASEX’s key features and discuss how the implementation is realised.
Chapter 6 evaluates the cost-benefit of a rental-based system versus dedicated HPC systems.
Chapter 7 presents resource management strategies that incorporate cost-aware scheduling and
rental policies that consider both exact and heuristic approaches.
Chapter 8 introduces the service level agreement management (SLAM) framework that can be used
to enhance overall application satisfaction. We propose three new SLA-aware policies that make
use of SLA information to improve scheduling and rental decisions.
Finally, Chapter 9 concludes the thesis work and outlines directions for future work.
Grid computing can be considered a consolidated field in high performance computing. However, it
still presents serious limitations from the point of view of urgent HPC applications. Response time is
a persistent handicap in such environments. Each administrative domain in a grid infrastructure has its
own entities that handle information flow and scheduling issues. All these entities introduce
considerable delay before jobs start, which is a clear disadvantage for dynamic and interactive HPC
applications.
In the previous chapter, we have touched upon how current grid infrastructure lacks the mechanisms
necessary to harness the power of geographically distributed resources worldwide for HPC applications
that need to leverage these resources. In particular, we discussed the problems workflows have in
providing the QoS support required for adaptive, interactive, and parallel HPC applications. For these
applications, their processing requirements are difficult to predict since problem size can grow or shrink
over time, and some of their tasks need to complete within scheduled times.
In the remainder of this chapter, we identify several significant shortcomings of the current Grid
models, and outline the approaches used to resolve these issues. Based on these approaches, we present
the fundamental requirements that a service provider must meet in order to address all the limitations.
tion on resources but the approach is infeasible because the application requires prior knowledge of job
runtime estimates.
Our approach is to reduce the complexity of global scheduling optimisation problems. This can
be achieved by introducing the possibility of managing distributed resources in much smaller units, in
comparison to the traditional grid approach. The management of these smaller units is controlled by a
service provider, which defines a resource service that controls and manages a local scheduling policy.
A service provider offers a localised, exclusive environment for applications or job execution services
using virtual resources created from rented hardware, backed by a pool of resources that it may either
purchase or rent. The rental arrangement is possible because the provider makes agreements with
other pool owners (resource providers) to rent some of their resources in times of high utilization in
exchange for rental and usage fees. As such, a consumer is provided with secure and controlled access
to individually managed resources. Such an environment does not differ much, if at all, from that of a
dedicated HPC cluster. Importantly, it removes the interaction between applications and the distributed
resource managers, since the system should look like any multiprocessor solution to the applications.
The notion described here is similar to the concept of elastic computing in Cloud computing.
However, it is also important to achieve the above objectives without requiring major changes to
the overall structure of the grid systems. Such systems must be able to operate within the context of an
existing resource grid, because of the high availability and wide variety of resource types and amounts
offered by global grids.
allocated to a job is statically determined at allocation time. When allocated, applications will hold the
processors assigned to them until they terminate (i.e., for the lifetime of the application). With a
flexible execution approach, applications can dynamically request additional processors or release some of
their allocated processors during their execution. There are several motivations for this. Our primary
motivation is to provide extensive QoS support for different types of HPC applications. In modern
distributed computing, an application can be dynamic and its execution behavior cannot be predicted
in advance, i.e., its resource requirements may change in the middle of execution. This is an important
requirement in ensuring the adoption of the grid for everyday usage by users who wish to launch
computations from their desktops. In previous work [Liu et al., 2009], we introduced a software
framework to support adaptive applications in distributed and parallel computing, but we soon realised
that current grid systems lack the mechanisms to support such features, owing to the difficulty of
providing rapid responses to requests and exclusive access to resources. Hence, to date only batch-type
jobs are adopted in grid systems, due to the lack of mechanisms for such resource and QoS guarantees.
Secondly, such an adaptive HPC application requires an exclusive execution environment in order to
support dynamic resource addition and release during its execution. To accommodate this requirement,
it is desirable to have a mechanism that allows on-demand and rapid provisioning of additional resources
to accommodate fluctuations in demand. A service provider can choose to rent resources that reflect
the current load in the system. If it can predict the system load with reasonable accuracy within a short
interval or period, it will be able to schedule with a minimum of hassle because there are no competing
scheduling authorities and the resource pool is limited. Furthermore, it will be able to keep idle resources
to a minimum.
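A toy sketch of such short-interval load prediction (the moving-average window and the rent/release rule below are illustrative assumptions, not the thesis's actual mechanism):

```python
from collections import deque

class DemandEstimator:
    """Moving-average estimate of node demand over a short observation window."""
    def __init__(self, window: int = 5):
        self.samples = deque(maxlen=window)  # keeps only the most recent samples

    def observe(self, nodes_requested: int) -> None:
        self.samples.append(nodes_requested)

    def estimate(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

def adjustment(estimated_demand: float, currently_rented: int) -> int:
    """Positive result: rent additional nodes; negative: release rented nodes."""
    return round(estimated_demand) - currently_rented
```

After observing requests for 4, 6, and 8 nodes, the estimate is 6.0, so a provider holding 9 nodes would release 3 to keep idle resources to a minimum.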
This however requires the capability for a service provider to expand and shrink in size depending
on its customers' QoS requirements. Therefore, it is envisaged that such a capability could be achieved
through negotiations with third-party external resource providers. Negotiations consider when a service
provider should rent resources and when it should release them, with the aim of meeting customers' QoS
requirements. Similarly, with this capability, applications also need the ability to reconfigure
themselves with more or fewer processors at run-time to make efficient and full use
of rented resources. Resource provision in our model is based on fine-grained, multiple requests, instead
of a one-time allocation. Resources can be dynamically added to or removed from the service provider
according to local needs so as to meet consumers' QoS and/or service level agreements (SLAs).
In the case when there is high demand for resources, the service provider can always negotiate and rent
more resources during application execution to maintain certain expected QoS performance objectives.
Conversely, when demand is low, the service provider can choose to release rented resources to remove
the burden of unnecessary scheduling and management overhead. In this manner, the
system can be tailored according to demand; newer and faster resources will be allowed to replace
slower resources if deemed more desirable in specific situations. For example, the system may choose
to rent a large number of less powerful nodes for best-effort jobs, because they have no strict QoS
constraints. Alternatively, the system may choose to rent very fast nodes for interactive parallel jobs in order
to meet their deadlines. Such flexibility is desirable to support the requirements of multiple consumers
(applications) with different QoS (deadline) constraints.
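One way to picture this node-type flexibility (the profile names, relative speeds, and hourly prices below are purely hypothetical placeholders):

```python
# Hypothetical node profiles: (name, relative speed, price per hour).
PROFILES = [("slow", 1.0, 0.10), ("standard", 2.0, 0.25), ("fast", 4.0, 0.55)]

def cheapest_meeting_deadline(work_units: float, deadline_hours: float):
    """Pick the cheapest node profile fast enough to finish the job in time;
    return None if no profile can meet the deadline (rent elsewhere or reject)."""
    feasible = [p for p in PROFILES if work_units / p[1] <= deadline_hours]
    return min(feasible, key=lambda p: p[2]) if feasible else None
```

A 6-unit interactive job with a 2-hour deadline requires the fast profile, while the same work under a loose 10-hour best-effort deadline can run on the cheap slow node.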
systems could also potentially lead to resources not being fully utilised if the environment is not properly
managed. Unlike tangible goods, where ownership is completely transferred from a seller to a buyer,
computational resource capacity simply represents the right to use a shared resource, where a lease
agreement determines how resource sharing is initiated in time and space. Resource allocations vary
mainly in the quantity of the allocation and the period of time over which that allocation is delivered.
This applies to CPU allocations as well as to specific application QoS constraints, e.g., ensuring that
a job finishes within a deadline of one minute. Meeting such a QoS constraint requires sophisticated control of the level of
sharing such as the need for effective scheduling and rental decisions.
The system should effectively determine which types of nodes to rent, how many, and when to rent
them. For example, if an application requires N nodes for a specified duration, the system
can react immediately by renting exactly N nodes for the period covered. It is important to note,
however, that renting resources is not necessarily a one-time activity that occurs at the start of
application execution. The system may not always estimate the right amount of resources it needs,
since individual application processing requirements may fluctuate at runtime. Therefore, the system must evaluate
its rental decisions on a periodic or reactionary basis in response to sudden demand. For example, the
system may rent nodes one at a time until there is a sudden demand for additional nodes. In this
way, the system may also optimise the resource costs by allowing a single node to be used by multiple
applications at different periods of their execution. This can significantly reduce node idle times and un-
necessary overhead. To improve rental decisions further, the system should incorporate information on
both execution deadlines and monetary values (refer to Section 2.4.2.1 and Section 2.4.2.2) when making
scheduling and rental decisions. Nonetheless, careful rental decisions are important due to uncertainty
in resource availability from resource providers. In particular, nodes may not necessarily be available
at the time the system needs to rent them. Therefore, the uncertainty in waiting times for obtaining
nodes can affect how a service provider makes a rental decision. Furthermore, the pricing options can
also influence rental decisions greatly. For example, if resource providers offer lower charges for long
rental contracts in comparison to short-term contracts, the service provider may find that a long-term
core supplemented by short-term options is the best policy.
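For instance, under an assumed two-tier pricing scheme (all rates, demand figures, and the linear cost form below are invented for illustration), the choice between an all-short-term plan and a long-term core topped up with short-term nodes reduces to a simple cost comparison:

```python
def cheapest_mix(base_demand, peak_demand, peak_fraction, long_rate, short_rate):
    """Cost of covering a steady base load plus occasional peaks, either with
    short-term rentals only or with a long-term 'core' of base_demand nodes
    (at the discounted long_rate) topped up by short-term nodes during peaks.
    Rates are per node over a unit planning horizon; peak_fraction is the
    fraction of the horizon spent at peak demand."""
    peak_extra = (peak_demand - base_demand) * peak_fraction * short_rate
    all_short = base_demand * short_rate + peak_extra
    core_plus = base_demand * long_rate + peak_extra
    return ("core+short", core_plus) if core_plus < all_short else ("all-short", all_short)
```

With a base of 10 nodes, peaks of 30 nodes for 20% of the horizon, and long/short rates of 0.8 and 1.0 per node, the core-plus-short plan costs 12.0 against 14.0 for renting everything short-term.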
Therefore, a complete costing model is needed to measure several important overheads and metrics
that need to be considered by the service provider. The cost framework could be used for capacity
planning and for evaluating resource scheduling in the presence of bursty and unpredictable demand.
In particular, it should be incorporated into the rental decision-making process to identify and
balance the cost of rental against the lost opportunity if customer demand is not met. In this manner,
rental decisions can be made in a principled way, so that a proactive, rather than ad-hoc, approach to
performance measurement is promoted. The cost model relies on the following parameters, which
should be provided directly by the application or the application agent on its behalf:
2.4.2.1 Deadlines
The system needs to serve multiple running applications that compete for resources simultaneously.
Moreover, these competing applications often have diverse characteristics in terms of their computation,
communication and I/O requirements from the underlying system. They also impose diverse Quality-
of-Service (QoS) demands (low response time for execution and interactivity, high throughput, and even
bounded response times or guaranteed throughput) with different QoS parameters for each job, depending
on how important the job is [Zhang et al., 2000]. A rental-based system should therefore be able
to determine the correct priority of a job or task before it can decide how to prioritise the allocation
of resources. Without this information, there is no meaningful way of inferring how valuable jobs are when
deciding how to allocate resources.
There should be an absolute parameter that the system could use to prioritise competing application
jobs with reasonable accuracy. One of these parameters is the job deadline. The job deadline can either
be soft or hard [Yeo and Buyya, 2006]. A hard deadline does not tolerate a missed deadline at all whereas
a soft deadline can tolerate missed deadlines as long as the miss rate does not exceed the probability
of misses specified by the application's QoS. In this thesis, we consider hard deadlines
for the purpose of evaluation. We assume that deadlines are specified directly in application code, or
are determined by the application agent, which specifies the task deadline on the application's behalf.
Jobs that ask only for low response times or high throughput are referred to as best-effort (BE)
jobs [Zhang et al., 2000], whereas jobs that require bounded response times or guaranteed throughput
are referred to in this thesis as interactive jobs. Under this model, a best-effort job is likely to have a
longer soft deadline than an interactive job that needs its execution results within seconds
or minutes.
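The deadline-driven ordering described above can be sketched as an earliest-deadline-first queue. The sketch below is purely illustrative: the `Job` record, its field names, and the job names are hypothetical and not part of the thesis's system.

```python
from dataclasses import dataclass, field
import heapq

# Hypothetical job record: only the deadline participates in ordering;
# the name and QoS class (interactive vs. best-effort) are carried along.
@dataclass(order=True)
class Job:
    deadline: float                                  # seconds until the job must complete
    name: str = field(compare=False)
    interactive: bool = field(compare=False, default=False)

def schedule_order(jobs):
    """Earliest-deadline-first: pop jobs in increasing deadline order,
    so a short-deadline interactive job precedes a long best-effort one."""
    heap = list(jobs)
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]

jobs = [
    Job(3600.0, "batch-render"),          # best-effort, one-hour deadline
    Job(30.0, "steering-ui", True),       # interactive, needs results in seconds
    Job(7200.0, "param-sweep"),           # best-effort, two-hour deadline
]
print(schedule_order(jobs))               # interactive job is served first
```

Under this ordering, the 30-second interactive job runs before either best-effort job regardless of submission order, which matches the QoS distinction drawn above.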
Jobs that carry a high monetary value will be given higher execution priority than jobs with a low
monetary value. Hence, all applications will be served according to their reflected values. This provides
an incentive for the applications to specify honest deadlines for their jobs.
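One simple way to combine monetary value and urgency into a single priority is a value-density heuristic. This is an illustrative sketch under assumed semantics, not the cost model developed in this thesis; the function name and `epsilon` guard are hypothetical.

```python
def priority(value, seconds_to_deadline, epsilon=1.0):
    """Value-density heuristic: jobs offering more currency per unit of
    remaining time rank higher, so urgent, high-value work is served first.
    epsilon guards against division by zero for already-due jobs."""
    return value / max(seconds_to_deadline, epsilon)

# A high-value job due in a minute outranks a low-value job due in an hour.
print(priority(100, 60))     # urgent, high value
print(priority(10, 3600))    # relaxed, low value
```

A heuristic of this shape rewards honest deadlines: inflating a job's urgency without paying a correspondingly higher value does not raise its rank relative to genuinely urgent, well-funded work.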
There are also several other potential issues when using virtual currency. These include the stability
of the system which can be greatly affected if a few applications decide to save currency for long periods
of time in order to amass a disproportionate amount of wealth. Any amount of strategy-proof design
cannot prevent such an application from dominating control over all resources, since a consumer can
over-specify the actual value of each task [Chun et al., 2005]. These issues are beyond the scope of this
thesis. We assume instead that the value assigned to each task represents the true monetary value
of its deadline. The user, or the application agent acting on its behalf, is solely responsible for specifying
a value for each job based on the task deadline. From this information, it is then the responsibility of the
system to ensure that computing resources are sufficiently available to serve consumers' processing and
QoS requirements.
changes in demand.
• Three-tier Approach: A framework for a rental-based system is needed to support the flexible
runtime negotiation for reconfigurable applications and the dynamic negotiations for additional
resources from third-party (external) resource providers.
• Rental Policies: The provision of sophisticated rental heuristics that consider both execution dead-
lines and monetary values is essential to ensure the economic viability of rental-based systems.
The economic viability should be clearly measured by using a comprehensive cost model, which
enables the service provider to quantitatively evaluate its conflicting objectives in minimising op-
erating and rental related costs subject to application-level constraints.
• Service Level Agreements: A framework for enabling consumers or users to negotiate long-term
application/job execution contracts with the service provider is advocated. The provision of SLA-
aware policies is also essential for a service provider to make efficient scheduling and rental deci-
sions based on both per-contract and per-job information.
The above requirements were specifically motivated by practical experience with the deployment and use
of current grid technologies for high performance computing projects over the last five years. This experience
has allowed us to identify the difficulties of operating at grid scale and of reconciling the interests
of the various participants from different administrative organisations. It has led to the cornerstone
of our localised rental approach, which is to optimise scheduling by compensating for the lack of
direct control over resources.
Bibliography
Alexander Barmouta and Rajkumar Buyya. GridBank: A grid accounting services architecture (GASA) for
distributed systems sharing. In Proceedings of the 17th Annual International Parallel and Distributed
Processing Symposium (IPDPS 2003), pages 22–26. IEEE Computer Society Press, 2002.
Ruediger Berlich, Marcus Hardt, Marcel Kunze, Malcolm Atkinson, and David Fergusson. EGEE: Building
a pan-European grid training organisation. In Rajkumar Buyya and Tianchi Ma, editors, Fourth
Australasian Symposium on Grid Computing and e-Research (AusGrid 2006), volume 54 of CRPIT,
pages 105–111, Hobart, Australia, 2006. ACS.
Anca I. D. Bucur and Dick H. J. Epema. The influence of communication on the performance of co-
allocation. In JSSPP ’01: Revised Papers from the 7th International Workshop on Job Scheduling
Strategies for Parallel Processing, pages 66–86, London, UK, 2001. Springer-Verlag. ISBN 3-540-
42817-8.
Rajkumar Buyya. Economic-based distributed resource management and scheduling for grid computing.
CoRR, cs.DC/0204048, 2002.
Abdur Chowdhury, Lisa D. Nicklas, Sanjeev K. Setia, and Elizabeth L. White. Supporting dynamic
space-sharing on clusters of non-dedicated workstations. In Proceedings of the 17th International
Conference on Distributed Computing, 1997.
B.N. Chun, P. Buonadonna, A. AuYoung, Chaki Ng, D.C. Parkes, J. Shneidman, A.C. Sno-
eren, and A. Vahdat. Mirage: a microeconomic resource allocation system for sensor-
net testbeds. Embedded Networked Sensors, IEEE Workshop on, 0:19–28, 2005. doi:
http://doi.ieeecomputersociety.org/10.1109/EMNETS.2005.1469095.
Anubhav Das and Daniel Grosu. Combinatorial auction-based protocols for resource allocation in grids.
In IPDPS ’05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Sym-
posium (IPDPS’05) - Workshop 13, page 251.1, Washington, DC, USA, 2005. IEEE Computer Soci-
ety. ISBN 0-7695-2312-9. doi: http://dx.doi.org/10.1109/IPDPS.2005.140.
Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the grid: Enabling scalable virtual
organizations. Int. J. High Perform. Comput. Appl., 15(3):200–222, 2001. ISSN 1094-3420. doi:
http://dx.doi.org/10.1177/109434200101500302.
Kevin Lai. Markets are dead, long live markets. SIGecom Exch., 5(4):1–10, 2005. doi:
http://doi.acm.org/10.1145/1120717.1120719.
Kevin Lai, Lars Rasmusson, Eytan Adar, Stephen Sorkin, Li Zhang, and Bernardo A. Huberman. Tycoon:
An Implementation of a Distributed Market-Based Resource Allocation System. Technical Report
arXiv:cs.DC/0412038, HP Labs, Palo Alto, CA, USA, December 2004.
Hao Liu, Amril Nazir, and Søren-Aksel Sørensen. A software framework to support adaptive applications
in distributed/parallel computing. In HPCC, pages 563–570, 2009.
Sang-Min Park and Marty Humphrey. Feedback-controlled resource sharing for predictable escience.
SC Conference, 0:1–11, 2008. doi: http://doi.ieeecomputersociety.org/10.1145/1413370.1413384.
M. A. Rappa. The utility business model and the future of computing services. IBM Syst. J., 43(1):
32–42, 2004. ISSN 0018-8670.
C.A. Waldspurger, T. Hogg, B.A. Huberman, J.O. Kephart, and W.S. Stornetta. Spawn: A distributed
computational economy. IEEE Transactions on Software Engineering, 18:103–117, 1992. ISSN
0098-5589. doi: http://doi.ieeecomputersociety.org/10.1109/32.121753.
Rich Wolski, James S. Plank, John Brevik, and Todd Bryan. Analyzing market-based resource allocation
strategies for the computational grid. Int. J. High Perform. Comput. Appl., 15(3):258–281, 2001. ISSN
1094-3420. doi: http://dx.doi.org/10.1177/109434200101500305.
Chee Shin Yeo and Rajkumar Buyya. A taxonomy of market-based resource management systems for
utility-driven cluster computing. Softw. Pract. Exper., 36(13):1381–1419, 2006. ISSN 0038-0644.
doi: http://dx.doi.org/10.1002/spe.v36:13.
Yanyong Zhang, Anand Sivasubramaniam, Jose Moreira, and Hubertus Franke. A simulation-based
study of scheduling mechanisms for a dynamic cluster environment. In ICS ’00: Proceedings of the
14th international conference on Supercomputing, pages 100–109, New York, NY, USA, 2000. ACM.
ISBN 1-58113-270-0. doi: http://doi.acm.org/10.1145/335231.335241.