
Zipf law in the popularity distribution of chess openings

Bernd Blasius¹ and Ralf Tönjes²,³

¹ ICBM, University Oldenburg, 26111 Oldenburg, Germany
² Institut für Physik, Universität Potsdam, 14415 Potsdam, Germany
³ Ochadai Academic Production, Ochanomizu University, Tokyo 112-8610, Japan

We perform a quantitative analysis of extensive chess databases and show that the frequencies of opening moves are distributed according to a power-law with an exponent that increases linearly with the game depth, whereas the pooled distribution of all opening weights follows Zipf's law with universal exponent. We propose a simple stochastic process that is able to capture the observed playing statistics and show that the Zipf law arises from the self-similar nature of the game tree of chess. Thus, in the case of hierarchical fragmentation the scaling is truly universal and independent of a particular generating mechanism. Our findings are of relevance in general processes with composite decisions.

arXiv:0704.2711v2 [physics.soc-ph] 18 Nov 2009

PACS numbers: 89.20.-a, 05.40.-a, 89.75.Da

Decision making refers to situations where individuals have to select a course of action among multiple alternatives [1]. Such processes are ubiquitous, ranging from one's personal life to business, management and politics, and take a large part in shaping our life and society. Decision making is an immensely complex process and, given the number of factors that influence each choice, a quantitative understanding in terms of statistical laws remains a difficult and often elusive goal. Investigations are complicated by the shortage of reliable data sets, since information about human behavior is often difficult to quantify and not easily available in large numbers, whereas decision processes typically involve a huge space of possible courses of action. Board games, such as chess, provide a well-documented case where the players in turn select their next move among a set of possible game continuations that are determined by the rules of the game.

Human fascination with the game of chess is long-standing and pervasive [2], not least due to the sheer, seemingly infinite richness of the game. The total number of different games that can be played, i.e., the game tree complexity of chess, has roughly been estimated as the average number of legal moves in a chess position to the power of the length of a typical game, yielding the Shannon number $30^{80} \approx 10^{120}$ [3]. Obviously only a small fraction of all possible games can be realized in actual play. But even during the first moves of a game, when the game complexity is still manageable, not all possibilities are explored equally often. While the history of successful initial moves has been classified in opening theory [4], not much is known about the mechanisms underlying the formation of fashionable openings [5]. With the recent appearance of extensive databases, playing habits have become accessible to quantitative analysis, making chess an ideal platform for analyzing human decision processes.

The set of all possible games can be represented by a directed graph whose nodes are game situations and whose edges correspond to legal moves from each position (Fig. 1). Every opening is represented by its move sequence as a directed path starting from the initial node. We will differentiate between two game situations if they are reached by different move sequences. This way the graph becomes a game tree, and each node $\sigma$ is uniquely associated with an opening sequence.

Using a chess database [6] we can measure the popularity $n_\sigma$, or weight, of every opening sequence as the number of occurrences in the database. We find that the weighted game tree of chess is self-similar and the frequencies $S(n)$ of weights follow a Zipf law [7]

$$S(n) \sim n^{-\alpha} \tag{1}$$

with universal exponent $\alpha = 2$. Note the precise scaling in the histogram of weight frequencies $S(n)$ and in the cumulative distribution $C(n)$ over the entire observable range (Fig. 2A). Similar power-law distributions with universal exponent have been identified in a large number of natural, economic and social systems [7, 8, 9, 10, 11, 12, 13, 14, 15], a fact which has come to be known as the Zipf or Pareto law [7, 8]. If we count only the frequencies $S_d(n)$ of opening weights $n_\sigma$ after the first $d$ moves we still find broad distributions consistent with power-law behavior $S_d(n) \sim n^{-\alpha_d}$ (Fig. 2B). The exponents $\alpha_d$ are not universal, however, but increase linearly with $d$ (Fig. 2B, inset). The results are robust: similar power-laws could be observed in different databases and other board games, regardless of the considered game depth, constraints on player levels or the decade when the games were played. Stretching over six orders of magnitude, the distributions reported here are among the most precise examples of power-laws known today in social data sets.

As seen in Fig. 1, for each node $\sigma$ the weights of its subtrees define a partition of the integers $1 \ldots n_\sigma$. The assumption of self-similarity implies a statistical equivalence of the branching in the nodes of the tree. We can thus define the branching ratio distribution over the real interval $r \in [0, 1]$ by the probability $Q(r|n)$ that a random pick from the numbers $1 \ldots n$ is in a subset of
Figure 1: A) Schematic representation of the weighted game tree of chess based on the ScidBase [6] for the first three half moves. Each node indicates a state of the game. Possible game continuations are shown as solid lines together with the branching ratios $r_d$. Dotted lines symbolize other game continuations, which are not shown. B) Alternative representation emphasizing the successive segmentation of the set of games, here indicated for games following a 1.d4 opening until the fourth half move $d = 4$. Each node $\sigma$ is represented by a box of a size proportional to its frequency $n_\sigma$. In the subsequent half move these games split into subsets (indicated vertically below) according to the possible game continuations. Highlighted in (A) and (B) is a popular opening sequence 1.d4 Nf6 2.c4 e6 (Indian Defense).
size smaller or equal to $rn$. Taking $n$ to infinity, $Q(r|n)$ may have a continuous limit $Q(r)$ for which we find the probability density function (pdf) $q(r) = Q'(r)$. If the limit distribution $q(r)$ of branching ratios exists it carries the fingerprint of the generating process. For instance, the continuum limit of the branching ratio distribution for a Yule-Simon preferential growth process [13] in each node of the tree would be $q(r) \sim r^{\beta}$, where $\beta < 0$ is a model specific parameter. On the other hand, in a $k$-ary tree where each game continuation has a uniformly distributed random a-priori probability the continuum limit corresponds to a random stick breaking process in each node, yielding $q(r) \sim (1 - r)^{k-2}$. For the weighted game tree of chess $q(r)$ can directly be measured from the database (Fig. 3A). We find that $q(r)$ is remarkably constant over most of the interval but diverges with exponent 0.5 as $r \to 1$, and is very well fitted by the parameterless arcsine-distribution

$$q(r) = \frac{2}{\pi \sqrt{1 - r^2}}. \tag{2}$$

The form of the branching ratio distribution suggests that in the case of chess there is no preferential growth process involved, but something entirely different which must be rooted in the decision process during the opening moves of a chess game [5].

Figure 2: A) Histogram of weight frequencies $S(n)$ of openings up to $d = 40$ in the Scid database (brown dots) and with logarithmic binning (blue). A straight line fit (not shown) yields an exponent of $\alpha = 2.05$ with a goodness of fit $R^2 > 0.9992$. For comparison, the Zipf-distribution Eq. (8) with $\mu = 1$ is indicated as a solid line. Inset: number $C(n) = \sum_{m=n+1}^{N} S(m)$ of openings with a popularity $m > n$. $C(n)$ follows a power-law with exponent $\alpha = 1.04$ ($R^2 = 0.994$). B) Number $S_d(n)$ of openings of depth $d$ with a given popularity $n$ for $d = 16$ (brown dots) and histograms with logarithmic binning for $d = 4$ (blue), $d = 16$ (red) and $d = 22$ (black). Solid lines are regression lines to the logarithmically binned data ($R^2 > 0.99$ for $d < 35$). Inset: slope $\alpha_d$ of the regression line as a function of $d$ (red dots) and the analytical estimation Eq. (6) using $N = 1.4 \cdot 10^6$ and $\beta = 0$ (black solid line).

In the following we show that the asymptotic Zipf-Law in the weight frequencies arises independently from the specific form of the distribution $q(r)$, and hence, the microscopic rules of the underlying branching process. Consider $N$ realizations of a general self-similar random segmentation process of $N$ integers, with paths $(\sigma_0, \sigma_1, \ldots)$ in the corresponding weighted tree. In the context of chess each realization of this process corresponds to a random game from the database of $N$ games (e.g., dark shading in Fig. 1). The weights $n_d = n_{\sigma_d}$ describe a multiplicative random process

$$n_d = N \prod_{i=1}^{d} r_i, \qquad n_0 = N \tag{3}$$

where the branching ratios $r_d = n_d / n_{d-1}$ for sufficiently large $n_d$ are distributed according to $q(r)$ independent of $d$. For lower values of $n_d$ the continuous branching ratio distribution is no longer a valid approximation and a node of weight one has at most one subtree, i.e. the state $n_d = 1$ is absorbing.

To calculate the probability density function (pdf) $p_d(n)$ of the random variable $n_d$ after $d$ steps it is convenient to consider the log-transformed variables $\nu = \log(N/n)$ and $\rho = -\log r$. The corresponding process $\{\nu_d\}$ is a random walk $\nu_d = \sum_{i=1}^{d} \rho_i$ with non-negative increments $\rho_i$ and its pdf $\pi_d(\nu)$ transforms as $n\, p_d(n) = \pi_d(\nu)$. An analytic solution can be obtained for the class
$$q(r) = (1 + \beta)\, r^{\beta}, \qquad 0 \le r \le 1, \tag{4}$$

of power-law distributions, which typically arise in preferential attachment schemes. In this case the jump process $\nu_d$ is Poissonian and distributed according to a gamma distribution $\pi_d(\nu) = \frac{(1+\beta)^d}{(d-1)!}\, \nu^{d-1} e^{-(1+\beta)\nu}$. After retransformation to the original variables, and noting that the probability $p_d(n)$ for a single node at distance $d$ from the root to have the weight $n$ yields the expected number $S_d(n)$ of such nodes in $N$ realizations of the random process as $S_d(n) = N p_d(n)/n$, we obtain in particular

$$S_d(n) = \frac{(1+\beta)^d}{N\,(d-1)!} \left(\log\frac{N}{n}\right)^{d-1} \left(\frac{N}{n}\right)^{1-\beta}. \tag{5}$$
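As a numerical sanity check on the step from Eq. (4) to the gamma distribution, the following minimal sketch draws branching ratios from $q(r) = (1+\beta)\, r^{\beta}$ by inverse-transform sampling and compares the sample mean and variance of $\nu_d = \sum_i \rho_i$ with the gamma values $d/(1+\beta)$ and $d/(1+\beta)^2$. The function names, the seed, the sample size and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sample_branching_ratio(beta, rng):
    """Draw r from q(r) = (1 + beta) * r**beta on (0, 1] (requires beta > -1)."""
    # The CDF is r**(1 + beta), so inverse-transform sampling gives r = U**(1/(1 + beta)).
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0) later
    return u ** (1.0 / (1.0 + beta))


def sample_nu(d, beta, rng):
    """One realization of nu_d = sum_{i=1..d} rho_i with rho_i = -log(r_i)."""
    return sum(-math.log(sample_branching_ratio(beta, rng)) for _ in range(d))


def gamma_moments(d, beta):
    """Mean and variance of the gamma density pi_d(nu) with shape d and rate 1 + beta."""
    rate = 1.0 + beta
    return d / rate, d / rate**2


def empirical_moments(d, beta, samples=20000, seed=1):
    """Sample mean and (unbiased) sample variance of nu_d (hypothetical helper)."""
    rng = random.Random(seed)
    values = [sample_nu(d, beta, rng) for _ in range(samples)]
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / (samples - 1)
    return mean, var


if __name__ == "__main__":
    for beta in (0.0, -0.5):
        print(beta, gamma_moments(5, beta), empirical_moments(5, beta))
```

Agreement of the first two moments does not prove the full distributional claim, but it is a quick way to catch a wrong rate or shape parameter before using Eq. (5).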
The functions $S_d(n)$ are strongly skewed and can exhibit power-law like scaling over several decades. A logarithmic expansion for $1 < n \ll N$ shows that they approximately follow a scaling law $S_d(n) \sim n^{-\alpha_d}$ with exponent

$$\alpha_d = (1 - \beta) + \frac{1}{\log N}\,(d - 1). \tag{6}$$

The exponent $\alpha_d$ is linearly increasing with the game depth $d$ and with a logarithmic finite size correction which is in excellent agreement with the chess database (Fig. 2B, inset). Power-laws in the stationary distribution of random segmentation and multiplicative processes have been reported before [9] and can be obtained by introducing slight modifications, such as reflecting boundaries, frozen segments, merging or reset events [16, 17, 18]. In contrast, the approximate scaling of $S_d(n)$ in Eq. (5) is fundamentally different, as our process does not admit a stationary distribution. The exponents $\alpha_d$ increase due to the finite size of the database.

As shown in Fig. 3B we find excellent agreement between the weight frequencies $S_d(n)$ in the chess database and direct simulations of the multiplicative process, Eq. (3), using the arcsine distribution Eq. (2). If the branching ratios are approximated by a uniform distribution $q(r) = 1$ the predicted values of $S_d(n)$ are systematically too small, since a uniform distribution yields a larger flow into the absorbing state $n^* = 1$ than observed in the database. Still, due to the asymptotic behavior of $q(r)$ for $r \to 0$, this approximation yields the correct slope in the log-log plot so that the exponent $\alpha_d$ can be estimated quite well based on Eq. (6) with $\beta = 0$.

Figure 3: A) Probability density $q(r)$ of branching ratios $r$ sampled from all games in the Scid database with a bin size of $\Delta r = 0.01$ (red bars) and arcsine distribution Eq. (2) (black solid line). Every edge of the weighted game tree, from nodes of size $n_{d-1}$ to $n_d$, contributes to the bin corresponding to $r = n_d / n_{d-1}$ with weight $r$. We disregarded clusters with $n_d < 100$ so that, in principle, a cluster could contribute to any of the bins. We found $q(r)$ to be depth-independent. B) Distribution of opening popularities $S_d(n)$ for $d = 22$ obtained from the Scid database (black) and from a direct simulation of the multiplicative process Eq. (3), with branching ratios taken from a uniform (red) or arcsine (blue) distribution $q(r)$. Further indicated is the theoretical result Eq. (5) (dashed line). Similar results are obtained for other values of $d$.

By observing that $S_d(n)$ in Eq. (5) is the $d$-th term in a series expansion of an exponential function, we find the weight distribution in the whole game tree as $S(n) = \sum_d S_d(n)$ to be an exact Zipf-Law. For branching ratio distributions $q(r)$ different from Eq. (4) the weight frequencies are difficult to obtain analytically. But using renewal theory [19] the scaling can be shown to hold asymptotically for $n \ll N$ and a large class of distributions $q(r)$. For this, note that the random variable $\tau(\nu) = \max(d : \nu_d < \nu)$ is a renewal process in $\nu$. The expectation $E[\tau(\nu)]$ is the corresponding renewal function related to the distributions of the $\nu_d$ as $\sum_{d=1}^{\infty} \mathrm{Prob}(\nu_d < \nu) = E[\tau(\nu)]$. If the expected value $\mu = E[\rho] = E[-\log r]$ is finite and positive (e.g., for the distribution (4) $\mu = 1/(1 + \beta)$), the renewal theorem provides

$$\lim_{\nu \to \infty} \frac{d}{d\nu} E[\tau(\nu)] = \frac{1}{\mu}. \tag{7}$$

Thus, we obtain $\lim_{\nu \to \infty} \sum_{d=1}^{\infty} \pi_d(\nu) = \frac{1}{\mu}$ and finally

$$\lim_{n/N \to 0} S(n) = \frac{N}{\mu n^2}. \tag{8}$$
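For the power-law class of Eq. (4), the summation that leads to the Zipf law can be checked numerically. The sketch below evaluates Eq. (5) in log space (to avoid overflow in the factorial), sums over the depth $d$, and compares the result with $N/(\mu n^2)$ for $\mu = 1/(1+\beta)$. The function names, the truncation depth `d_max` and the sample values of $n$ and $\beta$ are assumptions made for illustration; $N = 1.4 \cdot 10^6$ is the database size quoted in the paper.

```python
import math


def s_d(n, d, N, beta):
    """Eq. (5): expected number of depth-d nodes with weight n (requires 1 <= n < N)."""
    nu = math.log(N / n)
    log_value = (
        d * math.log(1.0 + beta)
        - math.log(N)
        - math.lgamma(d)            # lgamma(d) equals log((d - 1)!)
        + (d - 1) * math.log(nu)
        + (1.0 - beta) * nu         # log of (N / n)**(1 - beta)
    )
    return math.exp(log_value)


def s_total(n, N, beta, d_max=400):
    """Truncated sum of Eq. (5) over depths d = 1 .. d_max (d_max is an assumption)."""
    return sum(s_d(n, d, N, beta) for d in range(1, d_max + 1))


def zipf(n, N, beta):
    """Eq. (8) with mu = 1 / (1 + beta): S(n) = N / (mu * n**2)."""
    mu = 1.0 / (1.0 + beta)
    return N / (mu * n**2)


if __name__ == "__main__":
    N = 1.4e6
    for beta in (0.0, -0.3):
        for n in (10, 1000, 100000):
            print(beta, n, s_total(n, N, beta), zipf(n, N, beta))
```

Because the series is an exponential for this class, the two columns should agree for every $1 \le n < N$ up to rounding, not only asymptotically; for other $q(r)$ only the limit in Eq. (8) is expected to hold.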
Thus, the multiplicative random process (Eq. 3) with any well behaving branching ratio distribution $q(r)$ on the interval $[0, 1]$ always leads to an asymptotically universal scaling for $n \ll N$ (compare also the excellent fit of Eq. (8) to the chess data in Fig. 2A). In [15] the same Zipf-Law scaling was found for the sizes of the directory trees in a computer cluster. The authors propose a growing mechanism based on linear preferential attachment. Here we have shown that the exponent $\alpha = 2$ for the weight distribution of subtrees in a self-similar tree is truly universal in the sense that it is the same for a much larger class of generating processes and not restricted to preferential attachment or growing.

There are direct implications of our theory for general composite decision processes, where each action is assembled from a sequence of $d$ mutually exclusive choices. What corresponds to an opening sequence in chess may be a multivariate strategy or a customized ordering in other situations. The question of how such strategies are distributed is important for management and marketing [21]. One consequence of our theory is that in a process of $d$ composite decisions the distribution $S_d(n) \sim n^{-\alpha_d}$ of decision sequences, or strategies, which occur $n$ times shows a transition from low exponents $\alpha_d \le 2$, where a few strategies are very common, to higher exponents $\alpha_d > 2$, where individual strategies are dominating. This is due to the divergence of the first moment in power-laws with exponents smaller than two [11]. From Eq. (6) the critical number $d_{cr}$ of decisions at which this transition occurs depends logarithmically on the sample size $N$ and on the leading order $\beta$ of $q(r)$ near zero as

$$d_{cr} = 1 + (1 + \beta) \log N. \tag{9}$$

Applied to the chess database with $N = 1.4 \cdot 10^6$ we obtain $d_{cr} \approx 15$ (see also Fig. 4 and Fig. 2B inset). This separates the database into two very different regimes: in their initial phase ($d < d_{cr}$) the majority of chess games is distributed among a small number of fashionable openings (for $d = 12$, for example, 80% of all games in the database are concentrated in about 23% of the most popular openings), whereas beyond the critical game depth rarely used move sequences are dominating such that in aggregate they comprise the majority of all games (Fig. 4). Note that this result arises from the statistics of iterated decisions and does not indicate a crossover of playing behavior with increasing game depth.

Figure 4: Inequality [11, 20] of the distribution $S_d(n)$. A) Proportion $W$ of games that is concentrating in the fraction $Q$ of the most popular openings, for several levels of the game depth $d$. B) $Q$ as a function of $d$ for three different values of $W$ (solid lines) and Gini-coefficient $G = 1 - 2\int_0^1 Q(W)\, dW$ as a function of game depth (dotted line).

Our study suggests the analysis of board games as a promising new perspective for statistical physics. The enormous amount of information contained in game databases, with its evolution resolved in time and in relation to an evolving network of players, provides a rich environment to study the formation of fashions and collective behavior in social systems.

We are indebted to Andriy Bandrivskyy for invaluable help with the data analysis.

[1] H. Simon, Administrative behaviour (Macmillan, New York, 1947); I. L. Janis, L. Mann, Decision making: A Psychological Analysis of Conflict, Choice, and Commitment (Free Press, New York, 1977).
[2] H. J. R. Murray, A History of Chess (Oxford University Press Reprint, 2002).
[3] C. E. Shannon, Phil. Mag. 41, 256-275 (1950).
[4] The Encyclopedia of Chess Openings A-E (Chess Informant, Beograd, Serbia, 4th edition 2001).
[5] M. Levena, J. Bar-Ilan, Computer Journal 50, 567 (2007).
[6] Here we present results based on ScidBase (http://scid.sourceforge.net) with $N = 1.4 \cdot 10^6$ recorded games. Each game was uniquely coded by a string (up to $d = 40$ half moves) and the game strings were sorted alphabetically, so that all games following the same move sequence up to a given game depth $d$ were grouped together in clusters. Popularities were obtained by counting the cluster sizes $n_d$.
[7] G. K. Zipf, Human Behaviour and the Principle of Least-Effort (Addison-Wesley, Cambridge, 1949).
[8] V. Pareto, Cours d'Economie Politique (Droz, Geneva, 1896).
[9] D. Sornette, Critical Phenomena in Natural Sciences (Springer, Heidelberg, 2nd edition 2003).
[10] M. Mitzenmacher, Internet Mathematics 1, 226 (2004).
[11] M. E. J. Newman, Contemp. Phys. 46, 323 (2005).
[12] J. Willis, G. Yule, Nature 109, 177 (1922).
[13] H. A. Simon, Biometrika 42, 425 (1955).
[14] R. A. K. Cox, et al., J. Cult. Econ. 19, 333 (1995); S. Redner, Eur. Phys. J. B 4, 131 (1998); A. L. Barabasi, R. Albert, Science 286, 509 (1999); R. L. Axtell, Science 293, 1818 (2001); X. Gabaix et al., Nature 423, 267 (2003).
[15] K. Klemm, et al., Phys. Rev. Lett. 95, 128701 (2005).
[16] H. Kesten, Acta Mathematica 131, 207 (1973); D. Sornette, Phys. Rev. E 57, 4811 (1998); D. Sornette, R. Cont, J. Phys. I 7, 431 (1997); S. C. Manrubia, D. H. Zanette, Phys. Rev. E 59, 4945 (1999).
[17] P. L. Krapivsky, et al., Phys. Rev. E 61, R993 (2000).
[18] J. R. Banavar, et al., Phys. Rev. E 69, 036123 (2004).

[19] W. Feller, An introduction to probability theory and its applications, Vol. 2 (John Wiley & Sons, 1971).
[20] J. L. Gastwirth, Rev. Econ. Stat. 54, 306 (1972).
[21] C. Anderson, The Long Tail: why the future of business is selling less of more (Hyperion, 2006).
