Professional Documents
Culture Documents
Abstract—We present a practical, market-based solution to low-level scheduling algorithms (e.g., time sharing) used to
the resource provisioning problem in a set of heterogeneous actually assign jobs to units of physical hardware.
resource clusters. We focus on provisioning rather than imme- Traditionally, such limits / quotas have been set manually
diate scheduling decisions to allow users to change long-term
job specifications based on market feedback. Users enter bids according to pre-defined policies. The operator either grants
to purchase quotas, or bundles of resources for long-term use. each user an equal share of the system or, more likely, decides
These requests are mapped into a simulated clock auction which that certain jobs / users are “more important” than others,
determines uniform, fair resource prices that balance supply and giving the former higher quotas or the ability to preempt
demand. The reserve prices for resources sold by the operator in lower-ranked tasks. This approach works well for small, ho-
this auction are set based on current utilization, thus guiding the
users as they set their bids towards under-utilized resources. By mogeneous systems, but scales poorly as the systems and their
running these auctions at regular time intervals, prices fluctuate supported user bases get larger and more heterogeneous. In the
like those in a real-world economy and provide motivation latter case, the information requirements on the operator can
for users to engineer systems that can best take advantage of become extremely onerous; even with complex optimization
available resources. procedures, the final allocation is often inefficient from a
These ideas were implemented in an experimental resource
market at Google. Our preliminary results demonstrate an system-wide perspective. These inefficiencies are manifested
efficient transition of users from more congested resource pools though uneven utilization, significant shortages and surpluses
to less congested resources. The disparate engineering costs for in certain resource pools, and general user unhappiness.
users to reconfigure their jobs to run on less expensive resource One solution is to instead allocate resources through a
pools was evidenced by the large price premiums some users market-based system. Indeed, this is the manner in which
were willing to pay for more expensive resources. The final
resource allocations illustrated how this framework can lead nearly all commodities (e.g., oil, food, real estate, etc.) are
to significant, beneficial changes in user behavior, reducing the priced and allocated in the “real-world” economy. People use
excessive shortages and surpluses of more traditional allocation money to purchase goods and services in such a way as to
methods. maximize their individual happiness. Participants with excess
resources, on the other hand, sell them to receive money.
I. I NTRODUCTION The prices for these goods and services, in turn, dynamically
The last decade has seen the rise of IT services supported fluctuate so as to match supply and demand. If the latter
by large, heterogeneous, distributed computing systems. These forces are not well-understood, then one can use auctions to
systems, which serve as the backbone for “grid,” “cloud,” gauge these and set prices appropriately. In this way, goods
and “utility” computing environments, provide the necessary are allocated efficiently without quotas or extensive centralized
speed, redundancy, and cost-effectiveness to enable the pow- control.
erful Internet applications offered by Google and others. In Much literature has proposed applying these market princi-
addition, such systems can be used internally, within the ples to the large-scale computing systems under study. Many
enterprise, to support essential business functions. In the of these models use auction-based pricing for the scheduling of
future, as costs per unit continue to go down, we will see individual jobs across multiple administrative domains [1], [2],
exciting new functionality and applications supported by these [3]. Within this environment, these authors propose replacing
environments. standard scheduling algorithms with “economically inspired”
Despite their size, however, the resources in these large- ones. Round robin, for instance, might be replaced by a
scale, distributed systems are necessarily finite. Not every simulated auction in which the priority of each job is mapped
external client or internal engineering team can get everything into a “bid” [4], [5].
that it demands. The system operator must place hard limits While the previous models are potentially useful for low-
on the CPU, disk, memory, etc. that each job or job class can level scheduling, our focus here is on the higher-level pro-
use. Otherwise, some tasks will be starved and user experience visioning of aggregate resources (e.g. quotas) across users
will suffer. These allocation limits are then mapped into the within a single administrative domain that spans sites around
the world. These types of allocation problems have been other secondary characteristics may (or may not) distinguish
studied extensively from a theoretical standpoint, but few a particular pool for a particular user. In the experiments
practical applications have been introduced and evaluated. described later, these pools represented high-level, resource
Those few appearing in the literature have been small-scale / location pairs (e.g., “CPUs in cluster 1”). The exact criteria
and limited to very specific application domains [6], [7]. As used to distinguish these groups, however, is flexible and
discussed in [8], much work remains to be done in making ultimately depends on the system being modeled. Similarly,
these “market-based” approaches both practical and scalable. the definition of a “user” is also flexible; in the case of our
In this paper, we propose a novel, high-level approach to the experiments, these were taken as distinct engineering teams
pricing and allocation of resources in large-scale, “grid” sys- within the company.
tems. In particular, users specify desired bundles of resources
along with the maximum amount they are willing to pay (or, These users, given some initial endowment of money and
if selling, the minimum amount they are willing to receive). resources, seek to buy, sell, and trade the latter in an individ-
In addition, the system operator acts as a seller of resources ually optimal way. The exact details of this optimization are
and sets reserve prices based on the current utilization of application-dependent. In many “grid” settings, however, the
each resource pool. The final winners and prices are then underlying user preferences are highly combinatorial; that is,
determined through a clock auction starting at the latter users get maximal utility from particular bundles of resources
prices. By repeating this process on a regular time scale (e.g., consisting of very specific resource combinations. CPUs in a
weekly), one can thus develop a dynamic, efficient “economy” particular place, for instance, are probably not useful unless the
for computing resources within a distributed, heterogeneous user can get colocated memory, disk, and network resources
environment. as well. At the same time, these preferences may also reveal
Given this framework, we implemented a combinatorial strong substitutability among resource bundles. For example, a
exchange within Google and ran experiments in which this was user may demand a certain combination of CPU, memory, and
used to allocate resources across the company’s engineering disk but may be indifferent with respect to the exact location.
teams. Our preliminary results indicate that this approach of-
fers several advantages over traditional allocation mechanisms:
These preferences are well-understood by the users but
1) Allows market participants to make engineering design
not necessarily by the system operator. As discussed in the
tradeoffs with the jobs they choose to run on their
introduction, therefore, we propose that prices and allocations
allocated resources. Teams that find resource A at a
be set through an auction. In particular, users announce bids
significant discount to resource B may bid on resource
encapsulating their desired bundles and “willingness to pay”
A and set about reengineering their job to use less of
criteria in a tree-based bidding language similar to TBBL
resource B and more of resource A.
described in [9]. More formally, we assume that each user
2) Long-term resource utilization can be taken into account
u submits some bid, B u , consisting of the pair {Q u , πu }.
when setting reserve prices for new resources brought
The first term is a set of R-component vectors, q u1 , qu2 , . . .
into the marketplace. In this way bidding behavior can
representing bundles of resources over which u is indifferent.
be encouraged that improves the overall bin-packing of
In other words, u desires [q u1 XOR qu2 XOR qu3 . . .]. The com-
system clusters. Specifically, in later sections we develop
ponents of each vector, in turn, encode either the quantities of
a utilization weighted reserve price calculation to encour-
each resource desired (if positive) or offered (if negative). The
age uniform utilization levels across resource pools.
second bid term, π u , gives the total, maximum amount that u
The remainder of this paper is organized as follows. We be-
is willing to pay (if positive) or the minimum amount that u is
gin in the next section by discussing our mathematical model willing to receive (if negative). We assume that this is a scalar
for resources and user preferences in a large-scale computing
value applying to all bundles in Q u . Extending the model to
environment. Section 3 describes the clock auction mechanism
allow for vector π’s, corresponding to distinct valuations for
used to set prices in our system. Section 4 addresses the each individual user bundle, does not significantly change our
utilization-based reserve pricing scheme described above. Our
results and is omitted for notational simplicity.
experimental results are discussed in Section 5. Finally, in
Section 6, we conclude and give directions for future research.
We also assume that all bids are entered simultaneously, i.e.
II. R ESOURCE AND U SER P REFERENCE M ODELS users cannot modify their bids based on the actions of others or
Consider a large-scale computing system with R resource provisional feedback from the system. In practice, participants
pools, indexed as r = 1, 2, . . . , R and U users, indexed are typically given some window of time in which to enter
as u = 1, 2, . . . U . The former represent aggregations of bids and, possibly, respond to environmental conditions. In
individual, physical resources which are divisible and shared this case, our analysis here still holds if we instead consider
across the users. Although the units of resources are the same the bids to be the final values as of the end of the entry
for all resource pools, the pools in practice need not have period. Modeling the dynamics of this initial phase is a very
exactly the same characteristics. Geographic location, location interesting problem, but one that is outside the scope of our
of other required resources or data, network connectivity, or investigation here.
III. P RICING A LGORITHM We refer to the above optimization as SYSTEM in the sequel.
A. Design Goals There are many possible forms for the objective function,
f (x, p), and the “best” choice from these varies significantly
Given the auction setup above, we now seek some algorithm
from application to application. Some possibilities include
that maps the bids described previously into both resource
using (1) the total surplus, i.e. the total difference between
prices and allocations. The former are important for two
what users are willing to pay minus what they actually pay or
reasons. First, these determine how much each winning user
(2) the total value of trade, i.e. the net value of all resources
pays for its awarded resources. Second, and perhaps more
that change hands. Optimizing over these and then comparing
importantly, though, these final prices act as signals to both
the relative costs and benefits of the outcomes produced are
the operator and the users. If demand exceeds supply for some
rich areas for future research. In this paper we focus on a
resource pool, for instance, then this should be reflected by a
tractable solution for satisfying the constraints. Note that these
significant price increase. This increase, in turn, indicates to
have the following interpretations:
losing bidders that they should increase their bids in future
auctions. Moreover, it indicates to the system operator that (1) xu ∈ {0 ∪ Qu } ∀u: Users either get one of their desired
there may be a shortage in the corresponding pool; the operator bundles
or nothing at all. Thus, no scaling is allowed.
should address this shortage by increasing the supply of (2) u xu ≤ 0: The final allocation leads to a net surplus
resources appropriately. of resources. There are no shortages created by awarding
Any pricing algorithm, therefore, should satisfy the follow- resources which are not available for distribution.
ing criteria: (3) πu ≥ xTu p ∀u ∈ W: All winners have provided values
1) Clear Signaling: As discussed above, one of the primary that are “high enough” given final prices.
goals of a grid-system resource auction is to provide clear (4) xTu p = minq∈Qu q T p ∀u ∈ W: Winners attain the
signals about supply and demand to the participants. This “cheapest” bundles in their indifference sets.
is easiest to achieve when the final prices are uniform and (5) πu < minq∈Qu q T p ∀u ∈ L: All losers have bid “too low”
linear. In other words, all winners within a given resource given the final prices.
pool pay exactly the same “price per unit.” These unit (6) p ≥ 0: Prices must be non-negative.
prices are clearly announced at the end of the auction Alternatively, one can tighten the second constraint to
so that losing bidders can adjust their behavior in future u xu = 0, i.e. eliminate all “left over,” surplus resources, at
auctions. the expense of violating the first one. Note that, in practice, it
2) Fair Outcomes: Winners should be those who have “bid is usually impossible to have the market perfectly clear without
enough” and losers those who have bid “too little” given scaling some user; supplies and demands will rarely “align”
the final auction prices. In this sense, allocations are to the necessary extent.
determined completely by the prices and bids, not by, Another possible modification is to replace the last con-
“unfair,” exogenously provided operator preferences. straint by p ≥ pmin , p ≤ pmax where pmin and pmax are
3) Computational Tractability: In a large grid computing reasonable lower and upper bounds, respectively, on system
environment, any pricing and allocation method has to prices. Doing so can keep the system away from “weird”
scale well in the size of the system (i.e., number of or “unfair” values. On the downside, though, these additional
resource pools and users). Procedures that require solving constraints reduce the size of the feasible region and increase
NP-hard optimizations are probably not a good choice. the possibility that no solution exists. In the remainder of this
document, we assume for simplicity that p is only constrained
B. Mathematical Description to be nonnegative. Replacing this with upper and lower bounds
We now describe the problem in more formal, mathe- does not significantly change the analysis or our results.
matical terms. Let the final auction prices be given by the Finally, one might want to relax the win / loss constraints
R-component vector p and let x 1 , x2 , . . . , xU represent the in the case of ties. For instance, if there is one unit of supply,
corresponding user allocations. Furthermore, let W and L and two users are both willing to pay up to $1.00 for this unit,
represent, respectively, the sets of “winning” and “losing” then the only feasible, “fair” outcome is that both lose and the
bidders. Our algorithm design problem can then be expressed resource remains unallocated. As this seems wasteful, it may
as be desirable in this case to break the “fairness” constraints
by allowing one to lose and the other to win. This is less of
SYSTEM: a worry in large systems, though, because the probability of
maxx,p f (x, p) having an exact tie is extremely low.
subject to: x
u ∈ {0 ∪ Qu } ∀u
C. Ascending Clock Auction
u xu ≤ 0
πu ≥ xTu p ∀u ∈ W The requirements above rule out the most commonly applied
xTu p = minq∈Qu q T p ∀u ∈ W combinatorial auction algorithms (e.g., the Vickrey-Clarke-
πu < minq∈Qu q T p ∀u ∈ L Groves (VCG) mechanism), as these are generally not com-
p ≥ 0 putationally tractable and do not produce fair, uniform prices
Algorithm 1 Ascending Clock Auction
User 1 Proxy x1
(t) 1: Given: U users, R resources, starting prices p̃, update
User 2 Proxy
x2 (t) increment function g : (x, p) → R R
Auctioneer 2: Set t = 0, p(0) = p̃
User 3 Proxy x 3 (t) P 3: loop
.. u xu (t)>0
) 4: Collect bids: xu (t) = Gu (p(t)) ∀u
. (t
xU 5: Calculate excess demand: z(t) = u xu (t)
User U Proxy
6: if z(t) ≤ 0 then
Price Clock 7: Break
p(t+1) 8: else
9: Update prices: p(t + 1) = p(t) + g(x(t), p(t))
Fig. 1. Schematic of price update loop in ascending clock auction. The 10: t←t+1
auctioneer collects the demands and/or supplies from each bidder proxy as a 11: end if
function of the current price. If demand exceeds supply, then the price clock 12: end loop
is increased and the process repeats.
gives us a “price ceiling” beyond which the auction must end. f1(x) = exp(2(x−0.5)
f2(x) = exp(x−0.5)
seek to buy more than they sell) which prevent these types of
results. Proving these formally, however, is quite tricky and 0.5
simulated
Utilization Percentile
auction
mapping 60
Bidder Proxies
40
Clock Auction
Simulator
20
0.0
C. Bidder Behavior
As the internal market economy has evolved we have
r1
r2
r3
r4
r5
r6
r7
r8
r9
r10
r11
r12
r13
r14
r15
r16
r17
r18
r19
r20
r21
r22
r23
r24
r25
r26
r27
r28
r29
r30
r31
r32
r33
r34