
Parallel Combinatorial Optimisation for Finding Ground

States of Ising Spin Glasses


Peter Alexander Foster
MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2008
To my Parents
Abstract
This dissertation deals with the Ising spin glass ground state problem. An exact approach to this optimisation problem is described, based on combining the Markov chain framework with dynamic programming. The resulting algorithms allow ground states of the aperiodic k^2-spin lattice to be computed in O(k \cdot 2^{2k}) time, which is subsequently improved to O(k^2 \cdot 2^k), thus resembling transfer matrix approaches. Based on parallel matrix/vector multiplication, cost-optimal parallel algorithms for the message passing architecture are described, using collective or, alternatively, cyclic communications. In addition, a parallel realisation of the Harmony Search heuristic is described. The implementation of both exact and heuristic approaches using MPI is detailed, as is an application framework which allows spin glass problems to be generated and solved.
Dynamic programming codes are evaluated on a small-scale AMD Opteron based SMP system and a large-scale IBM P575 based cluster, HPCx. On both systems, parallel efficiencies above 90% are obtained on 16 and 256 processors, respectively, when executing the O(k \cdot 2^{2k}) algorithm on problem sizes of 14^2 spins. The improved algorithm, while computationally less expensive, shows considerably diminished scalability. Results for the parallel heuristic approach suggest marginal improvements in solution accuracy over serial Harmony Search under certain conditions. However, the examined optimisation problem appears to be a challenge to obtaining near-optimum solutions using this heuristic.
Acknowledgements
I sincerely thank my project supervisor, Dr. Adam Carter, for guidance throughout the project and for commenting on this dissertation prior to its submission.
In addition, I am grateful for funding awarded by the Engineering and Physical Sciences Re-
search Council.
Table of Contents
1 Introduction 1
2 The Spin Glass 3
2.1 Introduction to magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Modelling magnetic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Spin interaction models . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Spin models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 The Ising spin glass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Computational Background 13
3.1 Ising spin glass ground states and combinatorial optimisation . . . . . . . . . . 13
3.1.1 Approximate approaches for determining ground states . . . . . . . . . 15
3.1.2 Exact methods for determining ground states . . . . . . . . . . . . . . 19
3.2 A dynamic programming approach to spin glass ground states . . . . . . . . . 21
3.2.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 Ising state behaviour as a Markov chain . . . . . . . . . . . . . . . . . 22
3.2.3 The ground state sequence . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.4 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.5 An order-n Markov approach to determining ground states . . . . . . . 27
4 Parallelisation Strategies 31
4.1 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1.1 Harmony search performance . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 Existing approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.3 Proposed parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 36
4.2 Dynamic programming approaches . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 First-order Markov chain approach . . . . . . . . . . . . . . . . . . . . 39
4.2.2 Higher-order Markov chain approach . . . . . . . . . . . . . . . . . . 43
5 The Project 45
5.1 Project description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.1 Available resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2 Project preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Initial investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.2 Design and implementation . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.3 Implementation language and tools . . . . . . . . . . . . . . . . . . . 48
5.2.4 Choice of development model . . . . . . . . . . . . . . . . . . . . . . 49
5.2.5 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.6 Risk analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.7 Changes to project schedule . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.8 Overview of project tasks . . . . . . . . . . . . . . . . . . . . . . . . 51
6 Software Implementation 53
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Implementation overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.3 Source code structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.1 Library functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3.2 Client functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7 Performance Evaluation 69
7.1 Serial performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2 Parallel performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.2.2 Harmony search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
8 Conclusion 99
8.1 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.1 Algorithmic approaches . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.2 Existing code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.1.3 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.2 Project summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A Project Schedule 103
B UML Chart 105
C Markov Properties of Spin Lattice Decompositions 107
C.1 First-order property of row-wise decomposition . . . . . . . . . . . . . . . . . 107
C.2 Higher-order property of unit spin decomposition . . . . . . . . . . . . . . . . 108
D The Viterbi Path 111
D.1 Evaluating the Viterbi path in terms of system energy . . . . . . . . . . . . . . 111
E Software usage 113
F Source Code Listings 115
List of Figures
2.1 Types of spin interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Graphs of spin interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Frustrated systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Subsystems and associated interaction energy . . . . . . . . . . . . . . . . . . 10
2.5 Clamping spins to determine interface energy. . . . . . . . . . . . . . . . . . . 10
3.1 Computing total system energy from subsystem interactions . . . . . . . . . . 14
3.2 Example first-order Markov chain with states a, b, c . . . . . . . . . . . . . . 22
3.3 Illustrating the principle of optimality. Paths within the dashed circle are known
to be optimal. Using this information, optimal paths for a larger subproblem can
be computed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4 Sliding a unit-spin window across a lattice . . . . . . . . . . . . . . . . . . . . 28
4.1 Using parallelism to improve heuristic performance . . . . . . . . . . . . . . . 32
4.2 Conceptual illustration of harmony search behaviour within search space . . . . 33
4.3 Parallelisation strategies for population based heuristics . . . . . . . . . . . . . 34
4.4 Harmony search parallelisation scheme . . . . . . . . . . . . . . . . . . . . . 37
4.5 Graph of subproblem dependencies for an n = 3, m = 2 spin problem . . . . . . 40
4.6 Parallel matrix operations. Numerals indicate order of vector elements. . . . . . 41
5.1 Spin glass structure design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Software framework design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1 Functions provided by spinglass.c . . . . . . . . . . . . . . . . . . . . . . . . 57
6.2 Schematic of operations performed by get_optimal_prestates() (basic dynamic
programming, collective operations). In contrast, when using cyclic communi-
cations, processes evaluate different configurations of row i-1, shifting elements
in minPath. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6.3 Sliding window for improved dynamic programming . . . . . . . . . . . . . . 65
6.4 Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors. The problem instance is a 2^2 spin
lattice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.1 Execution times for serial dynamic programming (basic algorithm) . . . . . . . 70
7.2 Log execution times for serial dynamic programming (basic algorithm) . . . . 71
7.3 Execution times for serial dynamic programming (improved algorithm) . . . . 72
7.4 Log execution times for serial dynamic programming (improved algorithm) . . 72
7.5 Memory consumption for serial dynamic programming (basic algorithm) . . . . 73
7.6 Log memory consumption for serial dynamic programming (basic algorithm) . 74
7.7 Memory consumption for serial dynamic programming (improved algorithm) . 75
7.8 Log memory consumption for serial dynamic programming (improved algorithm) 75
7.9 Parallel execution time for dynamic programming (basic algorithm, Ness) . . . 78
7.10 Parallel efficiency for dynamic programming (basic algorithm, Ness) . . . . . . 78
7.11 Vampir trace summary for dynamic programming (basic algorithm, Ness) . . . 79
7.12 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.13 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.14 Vampir trace summary for dynamic programming (basic algorithm, cyclic com-
munications, Ness) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.15 Parallel execution time for dynamic programming (improved algorithm, Ness) . 82
7.16 Parallel efficiency for dynamic programming (improved algorithm, Ness) . . . 83
7.17 Vampir trace summary for dynamic programming (improved algorithm, Ness) . 83
7.18 Parallel execution time for dynamic programming (basic algorithm, HPCx) . . 84
7.19 Parallel efficiency for dynamic programming (basic algorithm, HPCx) . . . . . 85
7.20 Parallel execution time for dynamic programming (basic algorithm, cyclic com-
munications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.21 Parallel efficiency for dynamic programming (basic algorithm, cyclic commu-
nications, HPCx) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.22 Parallel execution time for dynamic programming (improved algorithm, HPCx) 87
7.23 Parallel efficiency for dynamic programming (improved algorithm, HPCx) . . . 87
7.24 Summary of parallel efficiencies on HPCx . . . . . . . . . . . . . . . . . . . . 88
7.25 Conceptual representation of properties relevant to parallel performance . . . . 89
7.26 Parallel harmony search convergence durations (ZONEEXBLOCK= 100) . . . 91
7.27 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100) . . . . . 91
7.28 Parallel harmony search convergence durations (ZONEEXBLOCK= 1000) . . . 92
7.29 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 1000) . . . . 93
7.30 Parallel harmony search convergence durations (ZONEEXBLOCK= 10000) . . 94
7.31 Parallel harmony search energetic aberrance (ZONEEXBLOCK= 100000) . . . 94
7.32 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 100) . 95
7.33 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 1000) 96
7.34 Parallel harmony search solution Hamming distance (ZONEEXBLOCK= 10000) 96
A.1 Project schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
B.1 UML class diagram of source code module and header relationships . . . . . . 106
List of Tables
5.1 Identified project risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.1 Mean error, standard error and error rate of serial harmony search
ground states for increasing solution memory NVECTORS. Results are based
on the ground truth value 30.7214. The error rate is defined as the number of cor-
rectly obtained ground state configurations over the total number of algorithm
invocations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2 Serial execution times for basic dynamic programming on Ness, for various
GCC 4.0 optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.3 Serial execution times for basic dynamic programming on HPCx, for various
xlc optimisation flags . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.4 Results for parallel basic dynamic programming on HPCx using 32 processors,
for combinations of user space (US) or IP communications in conjunction with
the bulkxfer directive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Chapter 1
Introduction
This dissertation describes aspects concerned with obtaining solutions to an optimisation problem, namely finding ground states of the Ising spin glass. Attention is given to parallel approaches, their implementation, and their performance.
The first half of this work is devoted to theoretical aspects: the Ising spin glass is a model relevant to statistical physics and other fields. In Chapter 2, the origins of this model are described, and the relation is drawn between the project's physical background and the aforementioned optimisation problem. The Ising spin glass is but one possibility for modelling materials exhibiting glass-like properties; Chapter 2 also exposes its relation to more involved models. In Chapter 3, the theoretical background of optimisation is examined and existing approaches are reviewed. The two approaches bearing significance to the undertaken practical work are detailed, namely dynamic programming and the harmony search heuristic. Parallelisation strategies based on dynamic programming and harmony search are described in Chapter 4.
Having examined theoretical aspects, practical aspects are then considered: Chapter 5 describes work relevant to project organisation. It includes a description of the project's objectives and identified risks. This chapter is relevant to practical work undertaken during the project. The software implemented as a result of this practical work is described in Chapter 6. Software functionality is detailed, in addition to implemented libraries and the source code's structure. In Chapter 7, the implemented codes are evaluated. Experimental procedures are described, alongside parameters used for testing. Results are presented and interpreted. Finally, Chapter 8 concludes the work. The project's objectives are reviewed in relation to the undertaken practical work, and possibilities for further work are explored.
Chapter 2
The Spin Glass
2.1 Introduction to magnetic systems
The phenomenon of magnetism is ubiquitously harnessed in modern technology; it crucially underpins many applications in areas such as automotive engineering, information processing and telecommunications. While known since antiquity, the scientific process has enabled an increasingly accurate understanding of magnetic phenomena. In current research, investigating the magnetic properties of physical systems remains of great interest in the field of condensed matter physics. One physical system, the spin glass, is the subject of such investigations. It forms the background of the work undertaken during the course of this project.
Given a physical system, it is possible to characterise its magnetic properties by examining the relation between interactions occurring between internal subsystems and the system's external magnetic moment. The system's external magnetic moment is a manifestation of these interactions. More generally, all externally observable magnetic properties are the result of individual subsystems' properties. This concept is applicable both to microscopic and macroscopic systems, for single or multiple subsystems: as an extreme case, one might consider a single electron a system, as it possesses an intrinsic magnetic moment. In contrast, the interactions within a three-dimensional crystalline solid, for example, are considerably complex and motivate current investigations. This complexity is chiefly due to magnetic interactions at the atomic scale.
At the atomic level, the electron effects magnetism not only as a result of its intrinsic field, but also as a consequence of its orbital motion. The former is associated with a binary state, known as spin, which describes the particle's internal angular momentum. It is spin which determines the direction of the electron's intrinsic magnetic moment. In contrast, orbital motion contributes towards the particle's external angular momentum, since it describes the particle's movement about the nucleus. Atomic magnetic fields depend both on orbital configuration and spin alignment, where each electron contributes towards the atom's net magnetic moment.
In general, an electron's state is governed by quantum properties, which are subject to the Pauli exclusion principle [31]. This asserts that no two fermions, such as electrons, may simultaneously assume an identical quantum state. This has important consequences for the spin configuration of interacting electrons and therefore influences the magnetic properties of multiatomic systems.
The first implication of the exclusion principle is that for two electrons possessing identical orbital movement, spins must antialign to satisfy state uniqueness. Consequently, the electrons' intrinsic magnetic moments antialign, causing net cancellation of these fields for the particle pair.
The second implication relates to minimising a system's energy: for interacting electrons with different orbital motion, the Pauli exclusion principle states that parallel spin alignment will be favoured, since it guarantees that orbital movement remains disjoint. Because of electrostatic repulsion, decreasing proximity between electrons lowers the system's energy. It is this relation which allows certain materials to retain a magnetic field, the result of a surplus of aligned spins, as opposed to a disordered spin configuration, in a favourable energetic state.
It turns out that the difficulty in determining a system's magnetic properties stems from the complexity of spin interactions: the structure of a specified material may be irregular, resulting in differing ranges between electron orbitals. The type of atomic bonds and electron configurations present in the material is also influential, since these influence the orbital energy of electrons. It was previously mentioned that a system's energy is sought to be minimised. This energy depends on the proximity in which interactions occur and hence behaves characteristically for the examined system.
The energy associated with spin interaction is expressed exactly in the so-called exchange energy, first formulated by Heisenberg [38] and Dirac [20]. Based on consequences of the Pauli exclusion principle for the wavefunction of a system consisting of multiple fermions, the system wavefunction is defined for combinations of aligned and antialigned spins. These wavefunctions are then used to compute the exchange energy

J = 2 \int \psi_1(r_1)\, \psi_2(r_2)\, V_I(r_1, r_2)\, \psi_2(r_1)\, \psi_1(r_2)\, dr_1\, dr_2

where \psi_1, \psi_2 are wavefunctions of interacting particles with locations r_1, r_2 on the real line and V_I is the interaction energy.
Using eigenanalysis (an explanation is given by Griffiths [31]), it is furthermore possible to express the contribution towards the system's Hamiltonian arising from spin interaction, which depends on J and the spin operands s_1, s_2 for a pair of spins:

-J (s_1 \cdot s_2)   (2.1)

Figure 2.1: Types of spin interaction. (a) Ferromagnetic; (b) antiferromagnetic.
This object is of fundamental importance for describing the interaction energy of large systems, since these may be described in terms of their underlying interacting subsystems. It is employed, in simplified form, in models such as the Ising model [47] used in this project. The interaction variable J is commonly known as the coupling constant. Although it assumes a positive real value for spin interactions where parallel alignment is favoured, it is important to note that antiparallel alignment is also favoured in many materials. Bearing this in mind, positive J are associated with ferromagnetic coupling, whilst negative J are associated with antiferromagnetic coupling. Figures 2.1(a), 2.1(b) illustrate these interactions.
2.2 Modelling magnetic systems
As currently described, the simplest type of magnetic interaction is expressed by defining two fundamental operands and an associated coupling constant. Together with the coupling constant, these fundamental operands are evaluated using an interaction operator. The operands are commonly spins, whose state may be described using either a unit vector or an integer, for example.
2.2.1 Spin interaction models
Because spin coupling is a symmetric relation, it is possible to describe interactions occurring amongst multiple spins by considering the set E \subseteq \{\{s_i, s_j\} \mid s_i, s_j \in S, i \neq j\} of pairwise bonds amongst spins in a spin set S, given the weight function w : \{s_i, s_j\} \to \mathbb{R}. This corresponds to an undirected weighted graph. In the graph, the absence of an edge between two spins s_k, s_l is equivalent to the zero-coupled edge w(s_k, s_l) = 0. An example of such a graph is shown in Figure 2.2(a). Given this general case of an undirected graph, there are three specialisations which have been used extensively to investigate the properties of magnetic systems consisting of many spins.
In terms of spin interactions, a comparatively involved model is the so-called Axial Next Nearest Neighbour Interaction (ANNNI) model. Here, spins are arranged conceptually as a lattice in Euclidean n-space, with bond edges defined between neighbouring spins along each dimension. In addition to these bonds, interactions for each spin are extended in a next-spin-but-one fashion along each dimension. That is, interactions are defined by conducting a walk of length l \le 2 along the lattice in each dimension, given an initial node. A spin therefore interacts with 4d partner spins, as displayed in Figure 2.2(b). This model has been employed extensively in research [57, 17, 56].

Figure 2.2: Graphs of spin interactions. (a) General undirected case; (b) ANNNI model; (c) EA model.
If the ANNNI model is modified by extending the length of the walk to infinity in arbitrary direction, the graph defined by spin interactions E becomes fully connected: E = \{\{s_i, s_j\} \mid s_i, s_j \in S, i \neq j\}. This realisation of lattice interactions is known as the Sherrington-Kirkpatrick model [58], whose Hamiltonian is equal to

H = -\sum_{(i,j)} J_{ij} s_i s_j.

Here, the notation \sum_{(i,j)} indicates the sum over all spin interactions, as described. The Sherrington-Kirkpatrick model is employed by Parisi [54] for the purpose of exploring transition properties of magnetisation, using an approach known as mean field theory.
Given that spin interactions occur over short range, an elementary approach to representing a system considers only nearest neighbour interactions between spins. In a two-dimensional lattice model, the graph of spin interactions is then defined as E = \{\{s_i, s_j\} \mid s_i, s_j \in S, d(s_i, s_j) = 1\}, where d(s_i, s_j) is the block distance between spins s_i, s_j. This is illustrated in Figure 2.2(c). The Hamiltonian of such a system is

H = -\sum_{(i,j)} J_{ij} s_i s_j

where the notation \sum_{(i,j)} indicates the sum over nearest neighbour spin interactions. Due to Edwards and Anderson [22], this model is the subject of work undertaken during the course of this project.
Bonds
The exchange energy between two spins is governed by the magnitude of the coupling constant J. When dealing with multiple interactions, these bond strengths are often selected from a probability distribution. This distribution is a continuous uniform or Gaussian distribution for many modelling purposes [58, 60, 52]. When dealing with the Sherrington-Kirkpatrick model, the exchange energy distribution often includes the property of exponential decay over spin distance [12]. Another commonly used distribution [36, 21] permits only coupling constants J \in \{-1, 1\}, such that both values are equally probable.
Other distributions have also been employed for defining coupling constants, such as the twin-peaked Gaussian [15]. Ermilov et al. [23] provide an investigation of the implications for interactions with arbitrarily distributed bonds. In this project, the equally distributed variant of spin coupling is considered.
2.2.2 Spin models
As with the approaches to modelling spin interaction, the spin object itself may be modelled to varying levels of complexity. Most realistically, in a quantum Heisenberg model, each spin is described by its quantum state in three dimensions, so that the Hamiltonian for a two-dimensional Edwards-Anderson model becomes (cf. Baxter [8])

H = -\frac{1}{2} \sum_{(i,j)} \left( J_x \sigma^x_i \sigma^x_j + J_y \sigma^y_i \sigma^y_j + J_z \sigma^z_i \sigma^z_j \right)

where \sigma^x_k, \sigma^y_k, \sigma^z_k are Pauli matrices corresponding to spin s_k.
Alternatively, a classical Heisenberg formulation is also possible, as employed by Ding [19] and Kawamura [42]: here, spins are represented as three-dimensional real-valued unit vectors, so that the exchange energy between spins s_i, s_j is calculated by means of the inner vector product, as described in Equation 2.1. A simplification achieved by discretising spin state exists in the so-called Potts model [63]. Here, a spin may assume a state s_i \in \{1, \ldots, k\}, where k is the total number of states. The Hamiltonian of a system of spins with nearest-neighbour interaction is expressed as

H = -\sum_{(i,j)} J_{ij} \cos\left( \theta(s_i) - \theta(s_j) \right)

with \theta(s_i) = 2\pi s_i / k.
The Potts model may be simplified further by considering the case k = 2: define the Potts correlation function \epsilon(s_i, s_j) = \cos\left( \theta(s_i) - \theta(s_j) \right). Given that \theta : \{1, 2\} \to \{\pi, 2\pi\}, the mapping

\epsilon(s_i, s_j) = 1 if s_i = s_j, and \epsilon(s_i, s_j) = -1 if s_i \neq s_j,

is a sufficient definition for the correlation function in the described case. Alternatively, \epsilon(s_i, s_j) = s_i s_j, with s_i, s_j \in \{-1, 1\}. This leads to the definition of system energy as

H = -\sum_{(i,j)} J_{ij} s_i s_j,

with s_i, s_j \in \{-1, 1\}.
When combined with nearest neighbour interactions and constant J, this archetypal model of spin interaction is known as the Ising model [11]. As formulated, in the Ising model a spin's state effects an exchange energy whose sign is inverted if the spin's neighbour assumes the opposing state. In this respect, the model spin object is an abstraction of electron state which discards the consequences of orbital movement, considering only intrinsic angular momentum.
While comparatively restrictive, an adaptation of the Ising model has been the subject of intense research in its originating field of statistical physics [8]. In addition to certain applications in investigating the behaviour of neural networks [4] and biological evolution [49], this model has proven popular in examining the properties of materials in the field of condensed matter physics [26]. One application involves investigating the properties of materials collectively known as spin glasses. These possess distinctive properties, which are described in the following.
2.3 The Ising spin glass
Spin glasses are substances which are characterised by structural disorder. This is the case for
chemical glasses or certain types of dilute metal alloys. These materials possess highly irreg-
ular microscopic structure, which has implications for magnetic interactions between ions. In
particular, disorder results in a distribution of ferromagnetic and antiferromagnetic interactions,
which are the origin of the phenomenon known as frustration.
The dynamics of spin glasses are such that there exists a critical phase transition tempera-
ture, above which the system behaves like a conventional paramagnet or ferromagnet. Below
the transition temperature, however, a magnetic disorder manifests itself, called the spin glass phase. This magnetic disorder is responsible for the system's unique behaviour.
Frustration, the second component of this characteristic behaviour, arises when a system's energetically optimal state is the result of combined interactions which cannot individually assume their optimum state. Instead, the global optimum requires certain interactions to be suboptimal. Depending on the constituent interactions, this may imply that there exist multiple state configurations which yield the energetic optimum.
An example of this principle is shown in Figure 2.3(a). Here, three Ising spins s_0, s_1, s_2 \in \{-1, 1\} interact in a triangular lattice. Because bonds are not consistently ferromagnetic, it is apparent that some interactions require spins with opposing orientation in order to be optimal. This is the case for the antiferromagnetic bond between spins s_1, s_2. For either optimal configuration of this spin pair it is not possible, however, to set s_0 so that optimality of the remaining interactions is satisfied. Similarly, when evaluating the system commencing with the pairs s_0, s_1 or s_0, s_2, it is not possible to set the remaining spin so that all interactions are satisfied. It follows that there exists no configuration of this system in which all interactions are optimal.

Figure 2.3: Frustrated systems. (a) Three spins; (b) four-spin plaquette.
In the n-dimensional lattice Ising realisation of a spin glass, the smallest structure capable of exhibiting frustration is shown in Figure 2.3(b). Considering all 2^4 combinations of positive and negative coupling constants, it can be seen that frustrated interactions occur for odd numbers of antiferromagnetic or ferromagnetic bonds. For larger systems, it is possible to analyse frustration by decomposing the lattice into subsystems of this kind. In this context, the square substructure is termed a plaquette.
Uses of the Ising spin glass
The extent to which the Ising model departs from a realistic representation of magnetic phenomena was previously described. Although the model's accuracy presents a disadvantage, its comparative simplicity lends itself to certain analytical advantages: these advantages are based on the fact that the state space of a single spin is small, which has consequences for evaluating sets of spin systems. Also, since spins interact only over nearest neighbour boundaries, it is trivial to decompose a system into its constituent subsystems, should this be required. Using such a scheme, the total exchange energy is the sum of internal subsystem energy and subsystem interaction energy (Figure 2.4). This approach is employed in analytical methods described in following chapters.
For experimental purposes, it is of interest to examine computationally the behaviour of various realisations of spin glasses. As spin glasses are thermodynamic systems, knowledge of ground state energy is of particular importance towards this aim. Formally, given an n-spin system where S = \{s_0, s_1, \ldots, s_{n-1}\} represents some configuration of these spins,

\operatorname{argmin}_S H(S)

is the system's ground state. The Hamiltonian H(S) describes the energy of system configuration S. In the case of the Ising model with real-valued coupling constants, there exists a single ground state configuration, and an equivalent configuration with all spins inverted. For systems with discrete-valued coupling constants, a number of degenerate ground states may exist. Provided an algorithm for determining ground states, it may be of interest to examine the effect of system size on ground state energy.

Figure 2.4: Subsystems and associated interaction energy
Figure 2.5: Clamping spins to determine interface energy.
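To make H(S) concrete, the following sketch evaluates the nearest-neighbour Hamiltonian of a small two-dimensional lattice with open (aperiodic) boundaries. It is only an illustration of the energy function defined above; the fixed-size arrays and function names are assumptions made for this example and do not reflect the project's spinglass.c interface.

#include <stdio.h>

#define ROWS 4
#define COLS 4

/* H(S) = - sum over nearest-neighbour bonds of J_ij * s_i * s_j.
 * spin[r][c] holds +1 or -1; jright[r][c] couples site (r,c) to (r,c+1),
 * jdown[r][c] couples site (r,c) to (r+1,c). Open boundaries: edge bonds
 * simply do not exist.                                                    */
double lattice_energy(const int spin[ROWS][COLS],
                      const double jright[ROWS][COLS],
                      const double jdown[ROWS][COLS])
{
    double h = 0.0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (c + 1 < COLS)                    /* horizontal bond */
                h -= jright[r][c] * spin[r][c] * spin[r][c + 1];
            if (r + 1 < ROWS)                    /* vertical bond   */
                h -= jdown[r][c] * spin[r][c] * spin[r + 1][c];
        }
    }
    return h;
}

int main(void)
{
    int spin[ROWS][COLS];
    double jright[ROWS][COLS], jdown[ROWS][COLS];
    /* Uniform ferromagnetic couplings and an all-up configuration:
     * the energy equals minus the number of bonds, here -24.          */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++) {
            spin[r][c] = 1;
            jright[r][c] = jdown[r][c] = 1.0;
        }
    printf("H = %g\n", lattice_energy(spin, jright, jdown));
    return 0;
}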
Previous work investigates scaling with regard to a related quantity, the so-called interface energy [15]. For an Ising-like model, the interface energy is the absolute difference between ground state energies, obtained when altering the model instance's spin configuration with respect to a certain boundary condition (coupling constants are left unaltered). Figure 2.5 shows an example, again using the two-dimensional lattice Ising model. Here, ground state configurations are obtained for two experimental instances: in the first instance, the entire set of spin configurations is considered. In the second instance, spins in the rightmost column are clamped: their state is equal to that of the previously obtained configuration, only inverted. Enforcing this condition in the second instance allows the behaviour of adjacent spins to be examined.
A closely related aspect deals with exploring the behaviour of spin glass properties in the limit N \to \infty, where N is the system size. For certain purposes, it is beneficial to approximate this condition by introducing periodicity into spin interactions. In the Ising model, pairs of boundary spins along dimensions with periodic boundary conditions interact in the manner illustrated in Figure 2.2(b). This can easily be expressed mathematically by applying modular arithmetic to the one-dimensional Ising case H = -\sum_i J_i s_i s_{i+1}, requiring minor modification for models with d > 1.
In thermodynamic systems, attention must be given to the relation between macroscopic and microscopic properties. To this extent, an important object is the partition function, defined as

Z(T) = \sum_S e^{-H(S)/kT},

where H(S) is the system energy, T the absolute temperature and k the Boltzmann constant. The sum is over all (microscopic) system configurations S. Using the partition function, it is possible to determine the probability P(S) of a specific state as

P(S) = \frac{e^{-H(S)/kT}}{Z(T)}.

Fortunately, when examining an ensemble at T = 0 K it turns out that P(S) = 1 iff S is a ground state configuration, and otherwise P(S) = 0. This fact has implications for computing ground state energies of Ising spin glasses, the subject of this project.
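For very small systems, Z(T) and P(S) can be evaluated by direct enumeration, which illustrates how the Boltzmann weight concentrates on the ground states as T decreases. The sketch below is a stand-alone illustration only: spin configurations are encoded as bitmasks, the Boltzmann constant is absorbed into T, and the example energy function is a hypothetical one-dimensional ferromagnetic chain; none of this is taken from the project's code.

#include <math.h>
#include <stdio.h>

/* Energy callback: maps a bitmask encoding of n spins (bit set -> +1) to H(S). */
typedef double (*energy_fn)(unsigned config, int n);

/* P(S) = exp(-H(S)/T) / Z(T), obtained by enumerating all 2^n configurations. */
double boltzmann_probability(energy_fn energy, int n,
                             unsigned target, double temperature)
{
    double z = 0.0, w_target = 0.0;
    for (unsigned config = 0; config < (1u << n); config++) {
        double w = exp(-energy(config, n) / temperature);
        z += w;
        if (config == target)
            w_target = w;
    }
    return w_target / z;
}

/* Example energy: one-dimensional ferromagnetic chain with J = 1. */
double chain_energy(unsigned config, int n)
{
    double h = 0.0;
    for (int i = 0; i + 1 < n; i++) {
        int si = ((config >> i) & 1u) ? 1 : -1;
        int sj = ((config >> (i + 1)) & 1u) ? 1 : -1;
        h -= (double)(si * sj);
    }
    return h;
}

int main(void)
{
    /* The all-up chain is one of two degenerate ground states, so its
     * probability approaches 1/2 as T falls towards zero.              */
    for (double t = 2.0; t >= 0.25; t /= 2.0)
        printf("T = %.2f   P(all spins up) = %.4f\n", t,
               boltzmann_probability(chain_energy, 8, 0xFFu, t));
    return 0;
}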
Chapter 3
Computational Background
In the previous chapter, the Ising model was introduced. System energy was described as a type
of utility function for evaluating system configurations. The problem of obtaining ground state
energy was introduced.
In this chapter, finding ground states of the Ising spin glass is approached as a combinatorial
optimisation problem. In this context, existing solutions are examined, in addition to describing
two approaches implemented in this project, harmony search and dynamic programming. The
latter approach is the consequence of describing spin glass interactions as a Markov chain, which
lends itself to a formulation of the most likely sequence of events in the chain, i.e. the Viterbi
path [61].
3.1 Ising spin glass ground states and combinatorial optimisation
Formally, any instance of the Ising spin glass defines the energy function E(S) with E : \{-1, 1\}^n \to \mathbb{R}. Here, S = (s_1, s_2, \ldots, s_n) is an n-spin configuration, with each spin s_i \in \{-1, 1\}. For convenience, a notation for describing a configuration partitioned into p disjoint subsystems is introduced as S = \{S_1, S_2, \ldots, S_p\}. The real-valued co-domain of E(S) corresponds to the total system energy. The total system energy of a partitioned system is

E(S) = \sum_{k=1}^{p} E(S_k) - \sum_{(i,j)} J_{ij} s_i s_j,   with s_i \in S_\alpha, s_j \in S_\beta,

where (i, j) denotes nearest neighbour Ising interactions, as described in Chapter 2, and the subsystems S_\alpha, S_\beta are disjoint. By decomposing spin interactions occurring within the entire system, energy is expressed as the sum of subsystem energy and system boundary energy. Defining E_b(S_i, S_j) as the system boundary energy between disjoint subsystems S_i, S_j,

E_b(S_i, S_j) = -\sum_{(q,r)} J_{qr} s_q s_r,   with s_q \in S_i, s_r \in S_j,

the total system energy can be defined as

E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{(i,j)} E_b(S_i, S_j)

where (i, j) denotes nearest neighbour interactions between subsystems, in analogy to nearest neighbour interactions between individual spins. An example of system decomposition is presented in Figure 3.1, for a system with cyclic boundary interactions. Decomposition is relevant to approaches described in this chapter.

Figure 3.1: Computing total system energy from subsystem interactions
Determining ground states
The ground state configuration of an Ising spin glass is defined as S_{min} = \operatorname{argmin}_S E(S). The domain of the evaluated function E(S) implies that an exhaustive search of the system's state space requires 2^{|S|} individual evaluations. Such a brute force approach might be implemented using a depth-first traversal of the state space.
Clearly, using this method is only practicable for the very smallest of problem instances, as the search space grows exponentially with the number of spins in the system. Therefore, it is of interest to examine the possibility of restricting the search space, consequently reducing the complexity of obtaining solutions to problem instances.
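For illustration, the brute force approach can also be expressed as a simple enumeration over bitmask-encoded configurations rather than an explicit depth-first traversal. The sketch below assumes the caller supplies an energy callback (for example, one like the chain_energy function sketched earlier) and is practical only for a few tens of spins; it is not the project's implementation.

typedef double (*energy_fn)(unsigned config, int n);

/* Exhaustive ground state search: evaluate all 2^n configurations of an
 * n-spin system (n < 32) and return the minimising bitmask, storing its
 * energy in *e_min. The cost doubles with every additional spin.         */
unsigned ground_state_exhaustive(energy_fn energy, int n, double *e_min)
{
    unsigned best = 0;
    double best_e = energy(0, n);
    for (unsigned config = 1; config < (1u << n); config++) {
        double e = energy(config, n);
        if (e < best_e) {
            best_e = e;
            best = config;
        }
    }
    *e_min = best_e;
    return best;
}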
The fact that the upper bound of the search space size grows exponentially suggests that the ground state problem belongs to the class of NP problems. Due to Barahona [6], it is shown that in fact certain cases of the problem are NP-complete, such as the two-dimensional lattice where every spin interacts with an external magnetic field, and the three-dimensional lattice model. Istrail generalises the proof of NP-completeness to any model where interactions are represented as a non-planar graph [16].
Fortunately, planar instances of the Ising model are not necessarily NP-hard; a polynomial-time bound is shown by Barahona for the two-dimensional, finite-sized model. This fact implies that obtaining ground states is tractable for this case of the model, and motivates the development of efficient algorithms which obtain exact solutions. The latter are defined as solutions equivalent to those generated from an exhaustive search.
3.1.1 Approximate approaches for determining ground states
Regardless of NP-completeness, formulation of the ground state problem as a combinatorial optimisation problem allows a second approach to be considered, involving the class of metaheuristic algorithms. Although these algorithms are typically only guaranteed to search exhaustively as time goes towards infinity, many have been shown to produce optimal or near-optimal solutions to a wide number of problems, provided sufficient execution time. It is therefore of immediate interest to investigate the performance of these algorithms in the context of the Ising spin glass.
By common definition, a metaheuristic is a heuristic applicable to solving a broad class of problems [28]. In practice, this is achieved by defining a set of black-box procedures, i.e. routines specific to the problem. When dealing with combinatorial optimisation problems, these routines typically include a utility function, whose purpose is to evaluate candidate solutions selected from the state space. Utility is then used to compare solutions amongst one another.
To be of practical use for problems with large state spaces, a heuristic must arrive at a solution by considering only some subset of this space, its search space. The metaheuristic approach often achieves this by random sampling [28], which may cause the algorithm to produce suboptimal results. To apply a metaheuristic effectively, it may therefore be necessary to evaluate performance against different combinations of algorithm parameters. Generating sufficient numbers of samples may motivate parallel algorithmic approaches. Also, although it has been shown that the performance of optimisation algorithms, averaged over the class of all optimisation problems, is constant [62], there may be significant performance differences between algorithms when applied to a specific problem class. It is hence of interest to examine diverse metaheuristic approaches in conjunction with the described optimisation problem.
Evolutionary algorithms
One class of metaheuristic is inspired by biological evolution. Here, a population of candidate solutions is created and subsequently evolved in an iterative process, where individual parent solutions are selected stochastically in order to generate offspring solutions. The process of selection is designed to favour solutions which exhibit high fitness, the latter evaluated using a utility function. In a further biological analogy, offspring are generated by combining solution parameters from both parents, prior to randomised modification (mutation). These new solutions are then added to the population, whose size is typically maintained in equilibrium. The process is then repeated, terminating either on completing a specified number of iterations, or when a convergence criterion is fulfilled.
Evolutionary algorithmic approaches applicable to combinatorial optimisation are known as genetic algorithms [50]. The approach here involves representing a solution, i.e. the set of parameters supplied to the target function, as a string. After evaluating solution fitness as previously described, crossover is typically realised as a manipulation of substrings: for example, one might generate offspring as a combination of permuted substrings from the parent strings. Correspondingly, mutation might be realised as a permutation of substring elements from a single solution. It is evident that the multitude of possibilities in which selection, crossover and mutation may be implemented has the potential to cause deviations in optimisation performance.
Genetic algorithms have been applied to the spin glass ground state problem by Gropengiesser [32], who considers two variants of the basic evolution procedure. In the first, the population is initialised to multiple instances of a single solution, to which mutation is then applied iteratively. Using a local search heuristic, mutations conducive to lowering the system energy are accepted. In the second variant, the former regime is augmented with random parent selection and crossover, such that every child solution replaces one of its parents. Results show that performance is affected strongly by the method of admitting new candidate solutions to the population following mutation.
As one might expect, approaches incorporating local minimisation techniques have been shown to improve optimisation performance, as implemented by Hempel et al. [40] using a so-called hybrid genetic algorithm. This is in comparison to an early investigation by Sutton [59], using a general evolutionary approach. Houdayer and Martin [41] report good performance for the Ising model with discrete \pm J bond distribution, using a Genetic Renormalisation algorithm. Here, domain-specific knowledge is incorporated into the optimisation process by recursively partitioning the graph of spin interactions, in resemblance to the description at the beginning of this chapter. A local optimisation process is then applied to the partitioned system.
Given the nature of the project, methods of parallelising genetic algorithms are of special interest. In the general context of evolutionary computing, Cantu-Paz [14] describes a coarse-grained approach known as the island method. In the distributed memory paradigm, processes are arranged in a toroidal grid, each executing the algorithm in parallel. After each iteration, a subpopulation of local solutions is selected based on fitness and exported to neighbouring processes asynchronously. As an alternative, a fine-grained scheme may also be used, where crossover is allowed to take place between solutions residing at different processes.
Simulated annealing
Simulated annealing is a technique readily applicable to calculating ground states, as it is based on the principles in statistical physics which underpin the Ising model. The technique is derived from the Metropolis-Hastings algorithm [37], in which a probability distribution is sampled indirectly by means of a first-order Markov chain. That is, the distribution of a generated sample is sufficiently defined by the value of its predecessor. In simulated annealing, a candidate solution S in the state space is associated with the probability

P(S) \propto e^{-H(S)/(kT)},

the state probability of a canonical ensemble, which was introduced in Chapter 2.
Optimisation is performed by initialising a random solution configuration and sampling proximate configurations in the state space by stochastic parameter modification: specifically for the Ising model, this would involve perturbing spins by inverting their state. The new configuration is accepted if the perturbation resulted in lower system energy; otherwise the state is accepted with probability e^{-\Delta H/(kT)}, where \Delta H is the change in system energy. Of importance is the value of the temperature T, which is initialised to a certain value and decreased monotonically towards zero according to a specific annealing schedule as the algorithm progresses.
In Chapter 2, it was mentioned that as T approaches zero, P(S) = 1 iff S is a ground state. A consequence of this fact for the optimisation process is that if T is initialised to a finite temperature and decreased sufficiently slowly, the algorithm is guaranteed to arrive at the system's globally optimal state [51]. In practice, execution time is restricted to a fraction of that required for an exhaustive search, so that the annealing process becomes an approximation.
Simulated annealing was first applied to the spin glass problem by Kirkpatrick, Gelatt and Vecchi [44]. It is important to note that the choice of annealing schedule significantly affects the algorithm's ability to arrive at an optimal solution. This is because temperature influences the amount of selectivity involved as the state space is explored. Conversely, it follows that the solution landscape particular to a problem usually affects the accuracy of solutions obtained by the algorithm using a particular schedule.
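As a minimal sketch of the procedure just described (single-spin-flip proposals, Metropolis acceptance, and a geometric cooling schedule chosen purely for illustration), a serial annealing loop might look as follows. The function and parameter names are assumptions made for this example and are unrelated to the project's implementation.

#include <math.h>
#include <stdlib.h>

/* One simulated annealing run over n Ising spins stored as +1/-1 in spin[].
 * delta_energy() must return the change in H caused by flipping spin i;
 * it is supplied by the caller, since it depends on the lattice couplings. */
void anneal(int *spin, int n,
            double (*delta_energy)(const int *spin, int n, int i),
            double t_start, double t_end, double cooling, int sweeps)
{
    for (double t = t_start; t > t_end; t *= cooling) {   /* geometric schedule */
        for (int s = 0; s < sweeps * n; s++) {
            int i = rand() % n;                            /* propose one flip  */
            double dh = delta_energy(spin, n, i);
            /* Metropolis rule: always accept downhill moves, accept uphill
             * moves with probability exp(-dH / T).                           */
            if (dh <= 0.0 || (double)rand() / RAND_MAX < exp(-dh / t))
                spin[i] = -spin[i];
        }
    }
}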
Ram et al. describe an approach to parallelising the algorithm [55]. Clustering simulated annealing is based on the observation that a good initial solution typically reduces the number of iterations required for the algorithm to converge. After executing the algorithm on multiple processing elements with different initial states, an exchange of partial results takes place to determine the most favourable solution. This result is then redistributed to all processing elements in order to repeat the process for a set number of iterations, after which the final solution is determined.
Harmony search
A recently developed optimisation algorithm is due to Geem [27]. Known as harmony search, this algorithm has been applied to a number of optimisation problems such as structural design [45] and data mining [25]. Harmony search can be considered an evolutionary algorithm, as it maintains a population of candidate solutions, which compete with one another for permanency and influence the generation of successive candidates.
Inspired by the improvisational process exhibited by musicians playing in an ensemble, harmony search iteratively evolves new solutions as a composite of existing solutions. As with genetic algorithms, a utility function determines whether a newly generated solution is included in the candidate set. In addition to devising a probabilistic scheme for combining parameters from existing solutions, new solutions are modified according to a certain probability. This is designed to improve exploration of the state space, similar to genetic mutation.
Formally, the algorithm defines an ordered set \Phi = (\Phi^1, \Phi^2, \ldots, \Phi^m) of m candidate solutions, where each candidate is an n-tuple \Phi^k = (\phi^k_1, \phi^k_2, \ldots, \phi^k_n). Algorithm parameters are the memory selection rate P_{mem}, the so-called pitch adjustment rate P_{adj} and the distance bandwidth R. Random variables X \in \{1, 2, \ldots, m\} and Y \in [0, 1) are also defined. Using a termination criterion such as the number of completed iterations, the algorithm performs the following steps on the set of initially random candidates:

Generate: \Phi' = (\phi(1), \phi(2), \ldots, \phi(n)), where \phi(i) = \phi^X_i if Y \le P_{mem}, and \phi(i) is a random parameter value if Y > P_{mem}.

Update: For 1 \le i \le n, adjust \phi(i) by a step drawn from the distance bandwidth R if Y \le P_{adj}.

Replace: \Phi^w \leftarrow \Phi' if \Phi' is of higher utility than \Phi^w, the worst solution currently held in memory.

In the first step, the algorithm generates a new candidate whose parameters are selected at random both from existing solutions in the population and from a probability distribution. In a further stochastic procedure using the random variable Y, solution parameters are modified. This step is of particular significance for continuous optimisation problems; it may be preferable to omit it in other cases. Finally, the population is updated by replacing its worst solution if the generated candidate is of higher utility. The process is then repeated, using the updated population.
An application of harmony search to the discrete Ising ground state problem is trivial, by assigning each solution the ordered set of spins defined at the beginning of this chapter, i.e. \Phi^k = (s_1, s_2, \ldots, s_n). Because the set of solution parameter values is discrete and small, the effect of modifying solutions due to the distance bandwidth can be consolidated into the algorithm's generation step. The process thus consists solely of generating and conditionally replacing existing solutions in memory, governed by the parameters m (the candidate population size) and P_{mem} (the memory selection rate). Work undertaken for this project examines the performance of this algorithm for finding Ising spin glass ground states.
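A minimal serial sketch of this discrete specialisation is given below: each component of a new candidate is taken from a randomly chosen memory member with probability p_mem, or drawn at random otherwise, and the worst stored solution is replaced whenever the candidate attains lower energy. The data layout and names are assumptions for illustration and do not correspond to the project's implementation.

#include <stdlib.h>

/* One harmony search iteration for an n-spin Ising problem.
 * memory[m][n] holds m candidate spin vectors, energy[m] their energies,
 * energy_of() evaluates a configuration, p_mem is the memory selection rate. */
void harmony_step(int **memory, double *energy, int m, int n,
                  double p_mem,
                  double (*energy_of)(const int *spin, int n),
                  int *candidate /* scratch array of length n */)
{
    /* Generate: take each component from a random memory member with
     * probability p_mem, otherwise draw a random spin value.           */
    for (int i = 0; i < n; i++) {
        if ((double)rand() / RAND_MAX <= p_mem)
            candidate[i] = memory[rand() % m][i];
        else
            candidate[i] = (rand() & 1) ? 1 : -1;
    }

    /* Replace: locate the worst (highest-energy) stored solution and
     * overwrite it if the new candidate improves on it.                */
    int worst = 0;
    for (int k = 1; k < m; k++)
        if (energy[k] > energy[worst])
            worst = k;

    double e_new = energy_of(candidate, n);
    if (e_new < energy[worst]) {
        for (int i = 0; i < n; i++)
            memory[worst][i] = candidate[i];
        energy[worst] = e_new;
    }
}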
3.1.2 Exact methods for determining ground states
Graph theoretic methods
Returning to the spin glass as an exactly solvable model, it is necessary to examine the graph representation of spin interactions more closely. An undirected graph G = (V, E) is described by a set of vertices V = \{v_1, v_2, \ldots, v_n\} and edges E \subseteq \{\{v_i, v_j\} \mid v_i, v_j \in V\}. Given an Ising spin glass model S = \{s_1, s_2, \ldots, s_n\}, let S = V and E = \{\{s_i, s_j\} \mid |J_{ij}| > 0\}, where J_{ij} is the bond strength between spins s_i, s_j. The set of vertices is partitioned into subsets S^+, S^- such that S^+ = \{s_i \mid s_i = 1\} and S^- = \{s_i \mid s_i = -1\}.
Grötschel et al. [29] provide a description of a method which is the basis of algorithms developed by Barahona et al. [7]. Here, the system's Hamiltonian is described in terms of S^+ and S^- as

H(S) = -\sum_{(i,j) \in E(S^+)} J_{ij} s_i s_j - \sum_{(i,j) \in E(S^-)} J_{ij} s_i s_j - \sum_{(i,j) \in \delta(S^+)} J_{ij} s_i s_j

where E(T) = \{\{s_i, s_j\} \mid s_i, s_j \in T\} and \delta(T) = \{\{s_i, s_j\} \mid s_i \in T, s_j \in S \setminus T\}. Considering the effect of opposing spin interactions, the Hamiltonian can be rewritten as

H(S) = -\sum_{(i,j) \in E(S^+)} J_{ij} - \sum_{(i,j) \in E(S^-)} J_{ij} + \sum_{(i,j) \in \delta(S^+)} J_{ij},

from which it follows

H(S) + \sum_{(i,j) \in E} J_{ij} = 2 \sum_{(i,j) \in \delta(S^+)} J_{ij}.

The ground state energy can now be formulated in terms of the function \delta as

H_{min} = \min_{S^+ \subseteq S} \Big( 2 \sum_{(i,j) \in \delta(S^+)} J_{ij} - \sum_{(i,j) \in E} J_{ij} \Big).

Because the co-domain of \delta consists of edges which define a cut of the graph of spin interactions, i.e. a partition of nodes into two disjoint sets, obtaining ground states is now described in graph theoretical terms as a cut optimisation: as formulated, the ground state energy is expressed as the minimum cut of a weighted graph. Equivalently, the problem can be formulated as a maximisation if the signs of the interaction energies are inverted.
Hadlock [34] shows further that finding a maximum cut of a planar graph is equivalent to determining a maximum weighted matching of a graph, for which there exist polynomial time algorithms. Bieche et al. [10] and Barahona [6] follow this approach, where a graph is constructed based on interactions between spin plaquettes. A recent similar approach due to Pardella and Liers [53] allows very large systems to be solved exactly.
De Simone et al. employ a method known as branch-and-cut. Here, the cut optimisation problem is initially expressed as an integer programming problem. In integer programming, the objective is to determine \max\{u^T x \mid Ax \le b\}, where the components of the vector x \in \mathbb{Z}^n are determined subject to constraints defined by the vectors u, b and the matrix A. During execution, branch-and-cut specifically employs the linear relaxation of the programming problem, where it is permitted that x \in \mathbb{R}^n. This relaxation is combined with the branch and bound algorithm, which is invoked when a non-integral solution of x is determined. Substituting the non-integral component with integers, the problem is divided using a further algorithm, which recursively generates a tree of subproblems. By maintaining bounds on solution utility, it is possible to identify partial solutions which are guaranteed to be suboptimal. Since these are not required to be subdivided further, the search tree is pruned. Liers et al. [46] describe the branch-and-cut algorithm in detail, which permits tractable computation of spin glass models consisting of 50^2 spins without periodic boundaries.
Transfer matrix
A technique applicable to various problems in statistical mechanics is the transfer matrix method [8]. The requirement is as described at the beginning of this chapter, where a system is described in terms of adjacently interacting subsystems. Using the definition of system state probability, a matrix describing interactions is defined as A = (p_{ij}), where p_{ij} = P(S^i_{k+1}, S^j_k), given subsystems S_{k+1}, S_k assuming states S^i_{k+1} \in 2^{S_{k+1}}, S^j_k \in 2^{S_k}. Conditional independence from other subsystems is assumed, i.e. P(S^i_{k+1} \mid S^j_k) = P(S^i_{k+1} \mid S^j_k, S_1, S_2, \ldots, S_p). Here, the notation 2^S denotes the set of all spin configurations of system S.
By implications of conditional state probability, given an initial subsystem it is possible to evaluate the state of successive subsystems via a series of matrix multiplications. Problems such as determining the partition function can be solved using eigenanalysis, an example of which is given in [15]. The transfer matrix approach due to Onsager allows the partition function of the two-dimensional Ising model to be formulated [39].
In the next section, the framework of Markov chain theory is used to examine in detail the probabilistic interactions within the Ising spin glass. The Markov transition matrix is equivalent to the transfer matrix, hence it follows that the methods for determining system properties are closely related. The chosen approach exposes a dynamic programming formulation of the ground state problem, with implications for further parallelisation.
3.2 A dynamic programming approach to spin glass ground states
A system S is described by a set of states \{S^1, S^2, \ldots, S^n\}, for example the spin configurations \{S^i \mid S^i \in 2^S\}. Again, 2^S denotes the set of all system configurations. Residing in state S^\alpha, the system undergoes a series of non-deterministic state transitions, such that each successive system configuration S^\beta is determined from the assignment S^\beta = t(S^\alpha). The map t : 2^S \to 2^S is defined using a vector of random variables v = (v_{S^1}, v_{S^2}, \ldots, v_{S^n}), where v_{S^i} is a random successor state the system may assume when in state S^i. The probability mass function of these random variables is defined as

f_{v_{S^i}}(S^j) = P(v_{S^i} = S^j \mid S^i).

Given an initial distribution of states, it may be of interest to determine the most likely sequence of states. For this purpose, it is useful to examine the system in terms of its Markov properties.
3.2.1 Markov chains
Define a sequence of states C = (S^{x_1}, S^{x_2}, \ldots, S^{x_m}). The sequence is said to fulfil the first-order Markov property if the value of any single state sufficiently determines the probability distribution of the state's successor in the sequence, i.e.

\forall i: P(S^{x_{i+1}} \mid S^{x_i}) = P(S^{x_{i+1}} \mid S^{x_i}, S^{x_{i-1}}, \ldots, S^{x_1}).
Formulating the probabilities of state transitions in matrix form is convenient for evaluating
the behaviour of the sequence after nite or innite state emissions: Dene the transition matrix
between sequence elements i, i + 1 as
M
i, i+1
=
_

_
P(S
1
S
1
) P(S
1
S
2
) . . . P(S
1
S
n
)
P(S
2
S
1
) P(S
2
S
2
) . . . P(S
2
S
n
)
.
.
.
.
.
.
.
.
.
.
.
.
P(S
n
S
1
) P(S
n
S
2
) . . . P(S
n
S
n
)
_

_
,
where P(S

) denotes the probability of the emission S

as the i + 1
th
element in the chain af-
ter the i
th
emission, S

. It follows that the probability distribution of states d

=
_
P(S
1
), P(S
2
), . . . , P(S
n
)
_
T
22 Chapter 3. Computational Background
a b c
P(b|a) P(c|b)
P(a|b) P(b|c)
Figure 3.2: Example rst-order Markov chain with states a, b, c
after m sequence emissions can be evaluated as
d

=
_

_
m
_
k=1
M
k, k+1
_

_
d (3.1)
where vector d is the initial state distribution. If for all k, M
k, k+1
= M
k1, k
, the Markov chain
is termed time-homogeneous. Such a chain may be represented by a directed, weighted graph
as shown in Figure 3.2, where nodes represent states and labelled edges represent transition
probabilities. A detailed discussion of further Markov chain properties is provided by Meyn
and Tweedie [48].
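To make the matrix formulation concrete, the following C fragment (a minimal sketch; the number of states and the matrix contents are assumptions) applies a single transition step $d' = M d$ of Equation 3.1 to a state distribution; repeated application yields the distribution after $m$ emissions:

#define NSTATES 3

/* One step of Equation 3.1: d_next = M d, where M[i][j] = P(S^i | S^j). */
static void markov_step(const double M[NSTATES][NSTATES],
                        const double d[NSTATES], double d_next[NSTATES])
{
    int i, j;
    for (i = 0; i < NSTATES; i++) {
        d_next[i] = 0.0;
        for (j = 0; j < NSTATES; j++)
            d_next[i] += M[i][j] * d[j];
    }
}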
By the current definition, state emission is governed by an amount of memory, in that preceding sequence values influence state output at any given point in the sequence. The first-order Markov chain, where states are conditionally dependent on a single, immediate predecessor, is the simplest instance of a Markov process.
When extending the amount of chain memory, i.e. increasing the number of preceding states which determine the distribution of output states, the order-n Markov chain must be considered. A generalisation of the archetypal first-order model, the distribution of an emitted state depends on its $n$ immediate predecessors in the sequence. Following the definition of the first-order model, the requirement for an order-n chain is
$$\forall i \quad P\big(S^{x_i} \mid S^{x_{i-1}}, S^{x_{i-2}}, \ldots, S^{x_{i-n}}\big) = P\big(S^{x_i} \mid S^{x_{i-1}}, S^{x_{i-2}}, \ldots, S^{x_1}\big),$$
i.e. knowledge of the preceding $n$ states sufficiently defines the probability of state $S^{x_i}$ in the sequence. Both models have implications for algorithm design.
3.2.2 Ising state behaviour as a Markov chain
In the context of the previously described Markov model, the following approach examines Ising interactions within the two-dimensional lattice without boundary conditions. Initially, the lattice is partitioned into rows, as shown in Figure 3.1. Clearly, interactions between individual rows occur in nearest-neighbour fashion, significantly along a single dimension. That is, for an $n \times m$ spin system, the partition is defined as $S = S_1, S_2, \ldots, S_n$ with $S_i \in \{-1, 1\}^m$, $1 \le i \le n$. The energy of the system is
$$\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_{i-1}, S_i) = H(S_1) + \sum_{i=2}^{n} \Big[ H(S_i) + H_b(S_{i-1}, S_i) \Big],$$
where $H(S_i)$ is the Hamiltonian of subsystem $S_i$ and $H_b(S_i, S_j)$ is the boundary energy between subsystems $S_i$, $S_j$, as previously defined.
Since $\bigcup_i S_i = S$, the entire lattice's state is sufficiently described by the states of its constituent rows. It should be remembered that, because this is a statistical mechanical model, state is probabilistic, with $P(S) \propto e^{-H(S)/(kT)}$. Using the described partitioning scheme, it turns out that subsystem state probability fulfils the property of a first-order Markov chain (cf. Appendix C).
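For illustration, the two energy terms of this decomposition might be computed as in the following C sketch, assuming the usual Edwards-Anderson form $H = -\sum_{\langle ij \rangle} J_{ij} s_i s_j$ with spins ±1; the array names and layout (Jh for horizontal bonds within a row, Jv for vertical bonds between adjacent rows) are assumptions, not the project's data structures:

/* Energy H(S_r) contributed by the bonds inside row r (no cyclic boundary). */
double row_energy(int m, const double *Jh_row, const int *spins_row)
{
    double H = 0.0;
    int c;
    for (c = 0; c < m - 1; c++)
        H -= Jh_row[c] * spins_row[c] * spins_row[c + 1];
    return H;
}

/* Boundary energy H_b(S_r, S_{r+1}) between two adjacent rows. */
double boundary_energy(int m, const double *Jv_row,
                       const int *row_a, const int *row_b)
{
    double H = 0.0;
    int c;
    for (c = 0; c < m; c++)
        H -= Jv_row[c] * row_a[c] * row_b[c];
    return H;
}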
3.2.3 The ground state sequence
Given the Markov property under the chosen representation of Ising interactions, the implications of ground state for the chain of states $\big(S_1^{x_1}, S_2^{x_2}, \ldots, S_n^{x_n}\big)$ are next examined. Formally, the probability $P_{\mathrm{gnd}}$ of obtaining the ground state energy $\min_{S \in \mathcal{S}} H(S)$ is
$$P_{\mathrm{gnd}} \propto \exp\left( -\frac{1}{kT} \min_{S \in \mathcal{S}} H(S) \right) = \max_{S \in \mathcal{S}} \left\{ \exp\left( -\frac{1}{kT} H(S) \right) \right\},$$
from which it is clear that $P_{\mathrm{gnd}}$ must be maximised in order to infer the ground state configuration. This configuration is given by the sequence
$$\operatorname*{argmax}_{(S_1, S_2, \ldots, S_n)} \left\{ P(S_1) \prod_{i=2}^{n} P(S_i \mid S_{i-1}) \right\},$$
which is the most likely sequence of emitted states in a first-order Markov chain.
This result is of significance for obtaining an algorithm for computing ground states, because there exists a well-known approach due to Viterbi [61]. The basis of the Viterbi algorithm is the observation that the optimal state for the first symbol emission in the chain is simply $\operatorname*{argmin}_{S_1} H(S_1)$. Augmenting the size of the considered subproblems, optimum solutions are determined successively, until the size of the set of considered problems equals the originally specified problem. At this point, the optimisation is complete.
The probability of the most likely sequence of emissions $\big(S_1^{x_1^*}, S_2^{x_2^*}, \ldots, S_n^{x_n^*}\big)$, known as the Viterbi path, can be obtained from the recurrent formulation
$$P_{\mathrm{viterbi}}(S_i) = \begin{cases} \max_{S_i} P(S_i) & i = 1 \\ \max_{S_{i-1}} P(S_i \mid S_{i-1}) \, P_{\mathrm{viterbi}}(S_{i-1}) & i > 1, \end{cases}$$
by evaluating $\max_{S_n} P_{\mathrm{viterbi}}(S_n)$. It follows that the actual sequence can be formulated as
$$\mathrm{viterbi}(i) = \begin{cases} \operatorname*{argmax}_{S_i} P_{\mathrm{viterbi}}(S_i) & i = 1 \\ \operatorname*{argmax}_{S_i} P_{\mathrm{viterbi}}(S_i) + \mathrm{viterbi}(i-1) & i > 1, \end{cases}$$
determined by evaluating $\mathrm{viterbi}(n)$. In this case, the $+$ operator denotes symbol concatenation, so that $\big(S_1^{x_1^*}, S_2^{x_2^*}, \ldots, S_n^{x_n^*}\big) = S_1^{x_1^*} + S_2^{x_2^*} + \ldots + S_n^{x_n^*}$.

Figure 3.3: Illustrating the principle of optimality. Paths within the dashed circle are known to be optimal. Using this information, optimal paths for a larger subproblem can be computed.
It is important to note that the recursive definition of the Viterbi path differs from the (suboptimal) approach of optimising every conditional probability $P(S_i \mid S_{i-1})$ individually. Instead, the path is defined as the optimum of incremented subproblems, where the subproblems are themselves defined as optimal. Schematically depicted in Figure 3.3, this approach is an application of the principle of optimality due to Bellman [9]. Consequentially, the Viterbi algorithm is an instance of the dynamic programming problem, recursively defined for all $x \in X$ as
$$V(x) = \max_{y \in \Gamma(x)} \big\{ F(x, y) + \beta V(y) \big\},$$
where $\Gamma$ is a map and $0 \le \beta \le 1$ is the so-called discount factor. The function $V(x)$ is known as the value function, and is optimised using $F(x, y)$.
The concrete algorithm for computing the Viterbi path probability avoids the overhead and backtracking suggested by the aforementioned recursive formulation. It involves an iterative loop to increment the size of the considered system:

opt[] := 1
for i := 1 to n
    for S_i^j ∈ S_i
        p_max := 0
        for S_{i-1}^k ∈ S_{i-1}
            p := P(S_i^j | S_{i-1}^k) * opt[k]
            if p > p_max
                p_max := p
        optNew[j] := p_max
    opt := optNew

In the listing, S_i^j denotes configuration j of subsystem S_i, according to previous convention. The array opt[] records the optimum path probability for the preceding subsystems S_1, S_2, ..., S_i at every iteration i of the algorithm. Elements of the array are initially set to unity. A second array optNew[] is used to store updated path probabilities, which are subsequently copied to opt[] after each iteration of the outer loop. Although the values of the optimal state emissions are discarded in this pseudocode, it is possible to retain them by storing them in an associative data structure. An implementation of this approach is presented in Chapter 6.
Examining the algorithm's time complexity, it is apparent that execution time is proportional to the product of the three loops' lengths, since these assume a nested structure. That is,
$$t(n) \propto n \, \big|2^{S_1}\big|^2,$$
where $n$ is the number of subsystems, and $2^{S_1}$ is the set of configurations of subsystem $S_1$. It follows that if the spin lattice has dimensions $n \times m$,
$$t(n, m) \propto n \, 2^{2m}, \quad \text{which is } O\big(n \, 2^{2m}\big).$$
By further observation it turns out that the Viterbi path can also be used to evaluate system energy (cf. Appendix D). This provides a dynamic programming solution to the two-dimensional lattice without boundary conditions, which is
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} H(S_i) & i = 1 \\ \min_{S_{i-1}} \big[ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1}) \big] & i > 1. \end{cases} \tag{3.2}$$
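A minimal C sketch of how Equation 3.2 might be realised is given below. Row configurations are encoded as the integers 0 to 2^m - 1, and the helper functions H_row() and H_bond(), which return $H(S_i)$ and $H_b(S_{i-1}, S_i)$ for the given bond realisation, are assumed rather than shown; the project implementation described in Chapter 6 differs in detail:

#include <stdlib.h>
#include <float.h>

extern double H_row(int row, unsigned cfg);                 /* H(S_i)            */
extern double H_bond(int row, unsigned prev, unsigned cfg); /* H_b(S_{i-1}, S_i) */

/* Ground state energy of an n x m lattice without cyclic boundaries. */
double ground_state_energy(int n, int m)
{
    unsigned q = 1u << m;            /* number of row configurations */
    double *opt = malloc(q * sizeof *opt);
    double *opt_new = malloc(q * sizeof *opt_new);
    unsigned j, k;
    int i;
    double best;

    for (j = 0; j < q; j++)          /* i = 1: base case of Equation 3.2 */
        opt[j] = H_row(0, j);

    for (i = 1; i < n; i++) {        /* i > 1: minimise over predecessors */
        for (j = 0; j < q; j++) {
            double min_val = DBL_MAX;
            for (k = 0; k < q; k++) {
                double e = H_row(i, j) + H_bond(i, k, j) + opt[k];
                if (e < min_val)
                    min_val = e;
            }
            opt_new[j] = min_val;
        }
        for (j = 0; j < q; j++)      /* copy, as in the pseudocode above */
            opt[j] = opt_new[j];
    }

    best = opt[0];                   /* ground state energy = min over final row */
    for (j = 1; j < q; j++)
        if (opt[j] < best)
            best = opt[j];
    free(opt);
    free(opt_new);
    return best;
}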
3.2.4 Boundary conditions
It is of interest to examine the effects of introducing cyclic boundary conditions on state optimality, using the described approach. As the latter involves partitioning the spin lattice into rows, it is possible to differentiate between energetic contributions occurring within subsystems $S_1, S_2, \ldots, S_n$, and energetic contributions occurring between these. It is apparent that horizontal conditions have an effect on subsystem energy, whereas vertical conditions affect subsystem interactions.
The first effect is caused by horizontal boundary interactions, as these involve spins located at the outermost positions of each spin row. The Hamiltonian $H(S_i)$ thus effectively includes an additional term to account for an additional pairwise interaction. The Hamiltonian of the entire lattice is $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1})$, which sufficiently accounts for horizontal boundary interactions within the system. Since the recursive formulation of ground state energy in Equation 3.2 also computes the sum of all subsystem Hamiltonians and their interactions, the existing dynamic programming formulations and algorithms can be left unmodified. It follows that the algorithmic complexity of computing ground states does not increase for the case with cyclic boundaries along a single dimension.
In contrast, the vertical cyclic boundary condition results in pairwise interactions between subsystems $S_1$, $S_n$, i.e. the initial and final spin rows. Here, each row constituent spin $s_j \in S_k$ ($k \in \{1, n\}$) potentially has a non-zero bond interaction with its neighbour $s'_j \in S_{k'}$ ($k' \in \{1, n\} \setminus \{k\}$). Consequentially, the Hamiltonian for the entire lattice is given by $\sum_{i=1}^{n} H(S_i) + \sum_{i=2}^{n} H_b(S_i, S_{i-1}) + H_b(S_1, S_n)$, where the latter term is the interaction energy between the two boundary subsystems in question. It follows that the existing solution does not yield the ground state energy, as the recursive formulation does not include the additional term. Configuration optimality is therefore not guaranteed for the case with cyclic boundaries along both lattice dimensions.
As a modification of the original dynamic programming solution, it is conjectured that the ground state configuration can be determined by evaluating the set of problem instances in which both boundary rows are assigned spin configurations in advance, i.e.
$$H'_{\min} = \min_{S_1, S_n} H_{\min}(S_n, S_n, S_1),$$
with
$$H_{\min}(S_n, S_i, S_1) = \begin{cases} H(S_i) + H_b(S_1, S_n) & i = 1 \\ \min_{S_{i-1}} \big[ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_n, S_{i-1}, S_1) \big] & i > 1. \end{cases}$$
Adapting the previous algorithm, this formulation implies that the execution time $t'(n)$ is
$$t'(n) \propto \big| 2^{S_1} \big| \, t(n),$$
where $n$ is the number of subsystems, $2^{S_1}$ is the set of configurations of $S_1$ and $t(n)$ is the execution time of the previously specified algorithm. Therefore,
$$t'(n, m) \propto 2^{m} \big( n \, 2^{2m} \big) = n \, 2^{3m}, \quad \text{which is } O\big( n \, 2^{3m} \big),$$
where the system consists of $n \times m$ spins.
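The conjectured extension amounts to an outer loop over the clamped boundary row, as in the following C sketch; ground_state_energy_fixed() is a hypothetical variant of the earlier fragment that adds the term $H_b(S_1, S_n)$ for the supplied configuration of $S_n$:

#include <float.h>

extern double ground_state_energy_fixed(int n, int m, unsigned last_row_cfg);

/* Ground state energy with cyclic boundaries in both lattice dimensions:
 * run the row-wise dynamic programme once per clamped final-row
 * configuration and keep the overall minimum. */
double ground_state_energy_cyclic(int n, int m)
{
    unsigned q = 1u << m, cfg;
    double best = DBL_MAX;
    for (cfg = 0; cfg < q; cfg++) {
        double e = ground_state_energy_fixed(n, m, cfg);
        if (e < best)
            best = e;
    }
    return best;
}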
Proof of the conjecture is by induction. Since interactions within the system occur in a regular lattice, the two adjacent boundary subsystems can be chosen arbitrarily, so the recursive formulation becomes
$$H_{\min}(S_j, S_i, S_{j+1}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 1 \\ \min_{S_{i-1}} \big[ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_j, S_{i-1}, S_{j+1}) \big] & \text{otherwise}, \end{cases}$$
with subsystems $S_0, S_1, \ldots, S_{n-1}$, boundary subsystems $S_j$, $S_{j+1}$ and subsystem interactions taken mod $n$. It follows that the ground state energy is defined as
$$H'_{\min} = \min_{S_j, S_{j+1}} \big\{ H_{\min}(S_j, S_n, S_{j+1}) \big\}.$$
Choosing boundary subsystems $S_{j+1}$, $S_{j+2}$, the formulation further becomes
$$H_{\min}(S_{j+1}, S_i, S_{j+2}) = \begin{cases} H(S_i) + H_b(S_i, S_{i-1}) & i = j + 2 \\ \min_{S_{i-1}} \big[ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{j+1}, S_{i-1}, S_{j+2}) \big] & \text{otherwise}, \end{cases}$$
which clearly is the optimal sequence of emitted states, given states $S_{j+1}$, $S_{j+2}$. As the ground state configuration can be deduced from $\min_{S_{j+1}, S_{j+2}} \big\{ H_{\min}(S_{j+1}, S_n, S_{j+2}) \big\}$, the sequence remains optimal also for this case. Therefore, the sequence is optimal for all $j$, i.e.

$$\forall_{0 \le i < n} \; \forall_{k} \; \neg\exists_{S_i} \Big( \exists_{0 \le j < n} \; \exists_{S_j} \Big( H\big( S_j^{k'} \cup \bar{S}_j \big) < H\big( S_i^{k} \cup \bar{S}_i \big) \Big) \Big), \tag{3.3}$$
using the notation $\bar{S}_j$ to denote $S \setminus S_j$.
3.2.5 An order-n Markov approach to determining ground states
Having introduced the Markov model for both the first-order case and its higher-order extension, it is of interest to examine whether the latter lends itself to a more powerful formulation of Ising system state probability. Previously, the approach consisted of a row-wise system decomposition, which resulted in a sequence of subsystems with nearest-neighbour interactions along one dimension. Reducing subsystem size, it is apparent that interactions between subsystems are no longer restricted to occurring along one dimension.
Consider the extreme case, where a subsystem consists of a single spin. For the two-dimensional $n \times m$ spin lattice, there exist subsystems $S = S_0, S_1, \ldots, S_{nm-1}$. The system's total energy is the result of horizontal and vertical interactions between subsystems, which may be evaluated by sliding a window across the entire lattice, as shown in Figure 3.4. For each spin, this window considers the interactions originating from a vertical and a horizontal predecessor.

Figure 3.4: Sliding a unit-spin window across a lattice

Formally, the Hamiltonian is expressed as
$$H(S) = \sum_{i=0}^{nm-1} \Big[ H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \Big],$$
where $H_b(S_i, S_{i-m})$ is the interaction energy between $S_i$ and its vertical predecessor. Similarly, $H_b(S_i, S_{i-1})$ is the interaction due to the horizontal predecessor $S_{i-1}$. Also, subsystem indices are computed mod $(nm)$, in order to evaluate interactions occurring across lattice boundaries. Here, it indeed turns out that a higher-order formulation of system state is possible (cf. Appendix C), namely
$$P(S) = \prod_{i=0}^{nm-1} P\big(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}\big),$$
from which ground state probability can be formulated as
$$P_{\mathrm{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{\mathrm{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) & i > m, \end{cases}$$
for the lattice without cyclic boundary interactions. As previously described, this probability can be used to determine the actual ground state configuration, and can be reformulated to determine ground state energy. It follows that the algorithm for obtaining solutions to this dynamic programming problem is also a modification of the previous approach:
opt[] := 1
for i := m to n*m
    for (S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm}) ∈ (S_i, S_{i-1}, ..., S_{i-m})
        if i > m
            p_max := 0
            for S_{i-m-1}^k ∈ S_{i-m-1}
                p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]
                if p > p_max
                    p_max := p
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := p_max
        else
            p := P(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})
            optNew[(S_i^{j0}, S_{i-1}^{j1}, ..., S_{i-m}^{jm})] := p
    opt := optNew
The above pseudocode consists of three nested loops, the outermost of which is responsible for calculating the probability $P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})$ for iteratively increasing $i$. The loop thus effectively specifies a sliding window of size $m+1$, which is moved across the lattice in the fashion previously described. For each position of the window all spin configurations are evaluated, using the associative data structure opt[] to obtain the probabilities of preceding window configurations. These are referenced by the tuple $(S_i^{j_0}, S_{i-1}^{j_1}, \ldots, S_{i-m}^{j_m})$, which represents a window configuration. The algorithm is for the case without cyclic boundary conditions, therefore the window is not required to precede position $i = m+1$; at this position, window configuration probability is unconditional.
Adapting the algorithm for calculating ground state energy, the statement

p := P(S_i^{j0} | S_{i-1}^{j1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j1}, ..., S_{i-m-1}^k)]

becomes a summation of subsystem energies, and the optimisation proceeds by determining energetically minimal preceding window states for each position of the window on the system lattice. In this form, the algorithm performs identically to the transfer matrix optimisation scheme described in [15]. It follows that the described scheme must have equivalent computational complexity.
An analysis thereof confirms this assumption: given that the lattice consists of $n \times m$ spins, the algorithm's execution time is proportional to
$$t(n, m) \propto (nm - m - 1) \, \big| 2^{(S_1, S_2, \ldots, S_{m+2})} \big|,$$
where $2^{(S_1, S_2, \ldots, S_m)}$ is the set of configurations of the tuple $(S_1, S_2, \ldots, S_m)$. Therefore,
$$t(n, m) \propto (nm - m - 1) \, 2^{m+2} + 2^{m+1}, \quad \text{which is } O\big( (nm - m - 1) \, 2^{m+2} \big) = O\big( nm \, 2^{m} \big).$$
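Since a window covers m+1 single-spin subsystems, its configuration can be packed into the low-order bits of an integer, so that the associative array opt[] becomes a plain array indexed by the window code. The following C fragment sketches one such encoding; the names and the bit ordering are illustrative assumptions, not the project's implementation:

/* Pack a window of m+1 spins (each +/-1) into an unsigned integer, spin
 * i-k stored in bit k (0 encodes -1, 1 encodes +1). */
unsigned window_encode(const int *spins, int i, int m)
{
    unsigned code = 0u;
    int k;
    for (k = 0; k <= m; k++)
        code |= (unsigned)((spins[i - k] + 1) / 2) << k;
    return code;
}

/* Slide the window one position forward: shift the code left, truncate the
 * oldest spin and insert the newly covered spin in bit 0. */
unsigned window_slide(unsigned code, int new_spin, int m)
{
    unsigned mask = (1u << (m + 1)) - 1u;
    return ((code << 1) | (unsigned)((new_spin + 1) / 2)) & mask;
}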
Although not considered in further detail, the opportunity presents itself for further modification of this algorithm, to account for cyclic boundary interactions within the spin lattice. This entails invoking the algorithm for specified configurations of the spin tuple $(S_1, S_2, \ldots, S_{1+m})$, similar to the algorithm employing a row-wise lattice decomposition. This is conjectured to increase the algorithmic complexity to $O\big(nm \, 2^m \, 2^m\big)$, since there are $O(2^m)$ possible configurations of the specified spin tuple.
In the following chapter, parallelisation strategies are described for the harmony search heuristic, the first-order Markov chain solution and, as an extension, the aforementioned higher-order modification.
Chapter 4
Parallelisation Strategies
To be of practical use, a computational solution to a given problem must be able to be im-
plemented on a machine architecture, such that the algorithm completes within a reasonable
amount of time. While computational complexity provides a means of qualitatively evaluating
problem tractability, the properties of the machine determine the amount of time required for
solving a particular problem instance.
To reduce machine execution time, an approach applicable to physical architectures is to increase the processing rate of machine instructions. This may be achieved in practice by increasing the machine's CPU clock rate, improving memory bandwidth, and augmenting the architecture with additional features such as registers, caches and pipelining. In general terms, this requires no conceptual modification to the algorithm, although the algorithm's performance is usually amenable to optimisation for the respective architecture.
The second approach to increasing machine performance involves parallelisation. Here, per-
formance is improved by distributing computation among a set of processing elements. With
the exception of algorithms with implicit parallelism in operations on data structures in combi-
nation with vector processing architectures, it is necessary to adapt the algorithm and devise a
scheme for achieving this distribution. For message passing architectures, this includes defining
explicit communication operations.
In the following, the potential for implementing parallel versions of harmony search and
dynamic programming methods is considered, with regard to MIMD architectures.
4.1 Harmony search
In the previous chapter, harmony search was described as a probabilistic algorithm employing
an evolutionary strategy for both discrete and continuous optimisation. As such, it performs
a heuristic evaluation of problem state space, i.e. search is non-exhaustive. Since improving
performance motivates parallelisation, it is necessary to examine the heuristic for the purpose of
defining performance-relevant characteristics.

Figure 4.1: Using parallelism to improve heuristic performance: (a) no distribution (serial), (b) weak scaling, (c) strong scaling.
For any heuristic algorithm, performance can on the one hand be quantified by the accuracy of the search process. The latter is influenced by the algorithm's state space traversal policy, significantly by the size of the search space. It follows that performance can be improved by enlarging the search space, since in the limit of the search space approaching the state space, solution optimality is guaranteed.
On the other hand, it may be of interest to restrict the heuristic's execution time, as previously described for the general class of halting algorithms. In this case, the task is to increase the rate at which search is performed.
Using parallelism to improve either of these characteristics, it is apparent that distribution
of computation among processors bears similarity to the concepts of strong scaling and weak
scaling, commonly encountered in parallel performance analysis. Whereas weak scaling im-
plies increasing the number of processing elements while keeping the problem size constant
(therefore varying the fraction of computation assigned to a processor), strong scaling increases
the problem size with the number of processors (therefore keeping the fraction of computation
assigned to a processor constant). Similarly, in the case of the heuristic, parallelism can either
be applied for the purpose of distributing a search space of constant size (weak scaling), or for
increasing the size of the search space (strong scaling). Using a tree model, an example of this
relationship is shown in Figure 4.1.
4.1.1 Harmony search performance
The evolutionary strategy used by harmony search for combinatorial optimisation consists of ini-
tial candidate generation, followed by iterative randomised candidate recombination (including
randomised mutation) and solution replacement. The algorithm is probabilistic, hence search
is a random walk, whose average length is influenced by the memory choosing rate (Figure 4.2(a)). Also, the number of solution vectors influences search, such that for NVECTORS=1 the optimisation becomes greedy: a single solution is retained, which is only replaced when a solution of higher utility is found.

Figure 4.2: Conceptual illustration of harmony search behaviour within search space: (a) decreasing memory choosing rate (radius increases), (b) increasing NVECTORS.

For larger NVECTORS, i.e. maintaining a
larger set of candidate solutions, the search process becomes biased, reminiscent of rejection-sampling algorithms: here, the random walk is effectively centred around islands represented by candidate solutions, since these affect the composition of future candidates generated by the algorithm (Figure 4.2(b)). As the algorithm progresses, the positions of candidates in the solution space progress monotonically from their initial random positions towards local maxima, since the least favourable solutions are replaced by fitter candidates upon generation. Informally, it is easy to see that increasing the value of NVECTORS offers the benefit of a more diverse set of solutions from which to initiate the search. This is likely to improve the probability of obtaining an optimal solution. More significantly, this diversity allows for a large state space from which random walks are initiated to generate further candidates; as the algorithm progresses, an increasingly large set of local optima is held. Upon termination, it follows as a conjecture that there is greater potential for the solution set to hold diverse local optima. For various applications of harmony search, solution accuracy is indeed shown to improve when increasing NVECTORS [43, 45]. Considering parallelism, it hence suggests itself to apply strong scaling to increase the number of solution vectors, in order to enlarge the algorithm's search space. This might be achieved by assigning a set of solution vectors to each processor, such that each executes the harmony search algorithm on its allocated vectors. In this case, further consideration must be given to exchanging solutions between processors.
In contrast, a type of weak scaling might be achieved by maintaining a set of solution vec-
tors replicated among processing elements. Here, parallelisation assists in improving the rate
at which successive solutions are generated from vectors held in memory, and hence search is
conducted. This is in comparison to a single processor executing the algorithm, and updating
solutions stored in a set of equivalent size. Each processor executes the harmony search algo-
rithm, potentially generating an updated solution vector at each iteration of the process. Upon
replacing the solution vector in question, its value is communicated to every other processor, so
that these continue to operate on replicated solution vectors. A likely consequence of this method
is that convergence is obtained more quickly, due to the increased rate of generating solutions.
Figure 4.3: Parallelisation strategies for population based heuristics: (a) master-slave, (b) coarse-grained, (c) fine-grained.
4.1.2 Existing approaches
Parallelisation methods for metaheuristic algorithms were briefly mentioned in Chapter 3. These are considered in more detail, in order to assess their potential adaptation for harmony search.
Cantu-Paz [14] provides an overview of parallelisation schemes for evolutionary algorithms. Although these are discussed specifically in the context of genetic algorithms, they are also applicable to other evolutionary heuristics, such as those introduced by Koza for generating software programs [5]. Cantu-Paz discerns between three classes of approach, known as global master-slave, fine-grained and coarse-grained, respectively. These differ in the way the evolutionary process is distributed amongst processors and in the extent to which solutions are communicated amongst them.
Depicted schematically in Figure 4.3(a), the master-slave approach implements a single population; offspring are generated from potentially any parent solutions in the population (termed panmixia). This is achieved by assigning the population to a single master processor, allowing slave processors to access and modify individual solutions. Slave processors may be tasked with evaluating solution fitness, whereas the master is responsible for selection and crossover. It is possible to consider both a synchronous variant, where solutions are retrieved and modified in discrete generations, and an asynchronous variant, where a slave may initiate a retrieval in advance of its peers. Both are suited for implementation on shared-memory or message passing architectures; however, it is noted that the heterogeneous organisation of processes into master and slaves makes the approach generally less suitable for massively parallel architectures.
In the coarse-grained approach (Figure 4.3(b)), the evolutionary process is no longer pan-
mictic. The set of solutions which forms the population is partitioned among processors, so
that optimisation progresses primarily within semi-isolated demes [14]. To allow evolution to
progress globally, demes exchange a proportion of their population with neighbours in a predefined graph topology. This allows solutions of high utility to propagate across the graph, which promotes convergence towards a common, global solution. On the other hand, the insularity of subpopulations permits a high degree of diversity, allowing multiple local optima to be approached independently, thereby preventing early convergence. Previous work includes investigations based on coarse-grained approaches, using both fixed toroidal or hypercubic topologies and dynamic topologies. The distributed approach makes this technique particularly attractive
for implementation on message passing architectures.
The fine-grained approach, shown in Figure 4.3(c), is also based on distributing the solution population amongst processors. In contrast, however, exchange of solutions occurs more frequently during the evolutionary process: instead of periodically initiating migration between subpopulations, selection itself takes place between processor-assigned demes, which in the most extreme case consist of a single solution. Depending on the specified network topology, it may be practicable to select from all subpopulations within a certain vicinity of the initiating deme, which results in an overlapping selection scheme. Cantu-Paz notes that if this vicinity is equal to the network diameter for all nodes, evolution regains panmixia. Suited for massively parallel architectures due to its scalability, this approach appears to be especially effective because of its flexibility.
Aside from evolutionary algorithms, a potentially relevant approach to parallelising a heuristic is presented by Ram et al. [55]. Here, the simulated annealing algorithm is executed independently by multiple processors, where each initialises search with a random configuration. This allows parallel exploration of the search space, in analogy to the effect achieved by executing an evolutionary process such as a genetic algorithm using disjoint subpopulations: since annealing proceeds independently, the process executed by each processor potentially converges towards a different local optimum. To counteract excessive state space exploration, the most promising solution is periodically determined and exchanged between processors. Akin to migrating solutions between demes, this promotes global convergence towards a single solution. The number of algorithm iterations required for convergence is hence reduced. In their implementation, Ram et al. employ a collective exchange scheme for communicating solutions between individual annealing processes. However, the neighbourhood exchange scheme described by Cantu-Paz is equally applicable.
4.1.3 Proposed parallelisation scheme
In the described approaches, parallelism is applied with the intention of enhancing the explo-
rative or exploitative properties of heuristics: Whereas the coarse-grained evolutionary approach
improves exploration alone through parallel selection, the remaining approaches include an el-
ement of parallel search exploitation, by propagating promising solutions in order to accelerate
solution convergence. The method used by Ram et al. can be viewed as a simplification of
the coarse-grained evolutionary approach, where the graph defining solution exchanges is fully
connected.
Having stated the motivation for parallelising harmony search, the opportunity is given to
apply the described approaches to this heuristic. Given that harmony search is an evolutionary
algorithm, distributed state space exploration and exploitation are readily adapted from parallel
genetic algorithms.
Figure 4.4 schematically depicts the proposed parallelisation scheme. Here, optimisation
takes place in distributed fashion, so that the heuristic is executed by multiple processors, each
assigned a set of solution vectors. To allow solutions to be exchanged between processors,
the latter are arranged in a ring. Periodically, processors send solutions to their successors,
while receiving these from predecessors. This reflects the behaviour of the aforementioned
fine-grained approach. In addition, however, processors are organised into a twofold hierarchy,
where subordinate processors are not directly involved in cyclic exchange of solutions. Instead,
these exchange solutions using collective operations, based on the scheme described by Ram
et al. Subordinate processors are grouped in such a way that each subgroup includes a ring
exchange processor. It follows that collective exchanges consider solutions obtained through
the cyclic exchange process.
Although the proposed scheme is comparatively involved, it allows the behaviour of the
heuristic to be altered by introducing a bias towards search space exploration or conversely
search space exploitation: If the size of subgroups is equal to the total number of processors,
communication is restricted to collective solution exchanges, so that rapid convergence is pro-
moted. In this case, effectively only a single subgroup exists. Providing that communication
occurs at short intervals to ensure that similar solution vectors are held in memory, it is specu-
lated that the algorithm will exhibit the described weak scaling behaviour while increasing the
number of processors. On the other hand, for unit subgroup size, collective solution exchanges
are absent from the distributed search process. As a consequence, the ring-based approach is
reinstated. Here, the expectation is that the heuristic will emphasise on explorative search, and
therefore exhibit strong scaling behaviour when increasing the number of processors.
It is apparent that there is a multitude of parameters which influence parallel optimisation, in addition to the memory choosing rate and number of solution vectors defined by serial harmony search. These include the total number of processors involved in search, and the size of subgroups. Also of significance is the rate at which solutions are exchanged, both for the ring and for the collective subgroup operations. Finally, the latter two operations must be defined in detail; these may, for example, involve selecting solutions at random, or communicating the most promising solutions.

Figure 4.4: Harmony search parallelisation scheme (processors exchange solutions cyclically within a ring and collectively within subgroups).
The following describes a pseudocode prototype of a parallel harmony search algorithm for
obtaining Ising spin glass ground states, using the message passing model:
1  Solution[] solutions := initialise_random_solutions(NVECTORS);
2
3  for (i := 1; !has_converged(); i++)
4      Solution solution := new Solution;
5
6      float highest_energy := compute_highest_energy(solutions);
7      int highest_energy_vector := compute_highest_energy_vector(solutions);
8
9      for (j := 1; j <= solution.length; j++)
10         if (rand(0, 1) < MEMORY_CHOOSING_RATE)
11             solution[j] := solutions[rand()][j];
12         else
13             solution[j] := random_spin();
14
15
16     if (spin_glass_energy(solution) < highest_energy)
17         solutions[highest_energy_vector] := solution;
18
19     if (PROCESSOR_ID mod ZONE_SIZE = 0)
20         msg_send(solutions[rand()], (PROCESSOR_ID + ZONE_SIZE) mod N_PROCESSORS);
21         msg_rcv(rcv_solution);
22         copy_min(rcv_solution, solutions[rand()]);
23
24     if (i mod ZONEEXBLOCK = 0)
25         reduce_min_zone(solutions[highest_energy_vector]);
26
27
As with serial harmony search, the algorithm consists of an iterative loop, whose purpose is to generate successive solutions and evaluate their utility. The proposed algorithm involves terminating the loop when the most favourable configurations held by processes have identical energies. Although a more obvious approach might involve a less stringent termination criterion, it is thought that using this scheme, the number of iterations until termination provides a reasonable means of evaluating solution exploitation. Within the loop, solutions with random spins are generated, based on the configuration of existing solutions (lines 9–15), and replaced (lines 16–18). The constants NVECTORS and MEMORY_CHOOSING_RATE control the number of retained solution vectors and the memory choosing rate, respectively. Following this, each loop iteration contains communication instructions for processors involved in the ring exchange of solutions: lines 20 and 21 swap random solution vectors between processors, following which the function copy_min() on line 22 copies the value of the energetically more favourable argument to its complementary argument. In this way, energetically favourable solutions are propagated within the ring of search processes. There are N_PROCESSORS/ZONE_SIZE such processors in the ring.
In addition, solutions are periodically exchanged between subgroups of processes, using the collective operation reduce_min_zone(). This performs a reduction based on the most favourable of the argument solutions. As defined, the operation involves the highest-energy solutions held by each search process. The operation is executed at a rate determined by the constant ZONEEXBLOCK. Subgroup size is influenced by the value of the constant ZONE_SIZE. When this is equal to N_PROCESSORS, there exists a single group for which collective operations are defined, whereas ring communications are without effect. Conversely, for unit ZONE_SIZE all processes are involved in ring communications, whereas collective operations are without effect.
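One way the collective operation on line 25 might be realised with MPI is sketched below, assuming the processes of a subgroup have previously been placed in a communicator zone_comm (for instance via MPI_Comm_split); the function and variable names are illustrative, not those of the project code:

#include <mpi.h>

/* Identify the lowest-energy solution within the zone with MPI_MINLOC and
 * broadcast it from its owner, so that every process in the subgroup holds
 * the zone-wide best configuration afterwards. */
void reduce_min_zone(double *energy, int *spins, int nspins, MPI_Comm zone_comm)
{
    struct { double val; int rank; } local, global;
    int my_rank;

    MPI_Comm_rank(zone_comm, &my_rank);
    local.val = *energy;
    local.rank = my_rank;

    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MINLOC, zone_comm);
    MPI_Bcast(spins, nspins, MPI_INT, global.rank, zone_comm);
    *energy = global.val;
}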
4.2 Dynamic programming approaches
In the previous chapter, exact solutions to the ground state problem were presented, based on modelling spin interactions as Markov chains. The latter in turn were used to arrive at dynamic programming formulations of the respective optimisation problems. Run-time complexities are lower than the $2^{nm}$ bound required for finding the ground states of the $n \times m$ spin lattice using brute force; nevertheless they are high enough to merit investigating parallelisation strategies.
4.2.1 First-order Markov chain approach
Parallelisation is based on an approach by Grama et al. [30], where a dynamic programming
problem which is serial and monadic is decomposed into a tabular arrangement of solutions
to subproblems of increasing size. The order of operations required to solve the problem is
equivalent to the order of individual scalar multiplications and additions required for a series
of matrix/vector multiplications. The parallelisation approach is therefore given by parallel
matrix/vector multiplication, which is well studied.
A dynamic programming problem is monadic if its optimisation equation contains a single recursive term. That is, given the function $c = g\big(f(x_1), f(x_2), \ldots, f(x_n)\big)$, which assigns a cost to the solution constructed from subproblems $x_1, x_2, \ldots, x_n$, monadicity exists when $g$ is defined as $f(j) \oplus a(j, x)$, where $\oplus$ is an associative operator. In this form, each solution depends on a single subproblem.
Furthermore, a dynamic programming problem is serial if there are no cycles in the graph of dependencies between subproblems. More formally, the graph $G = (V, E)$ is defined by the set of nodes $V$, where each node represents a subproblem. An edge between nodes exists if the optimisation equation contains a recursive term indicating a dependency between the corresponding subproblems.
Examining the optimisation equation for lattice ground state energy (without cyclic boundary conditions),
$$H_{\min}(S_i) = \begin{cases} \min_{S_i} H(S_i) & i = 1 \\ \min_{S_{i-1}} \big[ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1}) \big] & i > 1, \end{cases}$$
it is apparent that the equation is monadic. To establish the existence of the serial property, the graph of subproblem dependencies is visualised (Figure 4.5(a)). As depicted, rows of nodes represent states of subsystems $S_i$, which characterise the values of subproblems. Since there are $n$ subsystems, there are $n \, \big|2^{S_1}\big|$ nodes in the graph. Since a subproblem may assume as many values as there are values of its preceding dependency, the graph has a trellis-like structure consisting of bipartite graph segments. Because this organisation into individual levels is acyclic, the dynamic programming problem is serial.
The graph is modified to include information on system energy. Given the pair of nodes associated with subsystem configurations $S_i^k$, $S_{i-1}^l$, define the weight function $w\big(S_i^k, S_{i-1}^l\big) = w_i^{k,l} = H\big(S_i^k\big) + H_b\big(S_i^k, S_{i-1}^l\big)$, for $1 < i \le n$. Further define an additional node $\epsilon$, such that the set of graph edges is extended to $E' = E \cup \big\{ (\epsilon, S_1^k) \mid 1 \le k \le q \big\}$ for $q$ subsystem configurations. For $i = 1$, the weight function is defined as $w\big(\epsilon, S_1^k\big) = H\big(S_1^k\big)$. Minimising system energy is then equivalent to obtaining $\min_k p\big(\epsilon, S_n^k\big)$, where $p\big(\epsilon, S_n^k\big)$ is the minimum path between the nodes $\epsilon$ and $S_n^k$.
Figure 4.5: Graph of subproblem dependencies for an n = 3, m = 2 spin problem: (a) first-order, with n levels of 2^m nodes; (b) higher-order, with (n-1)m levels of 2^{m+1} nodes.
A further observation is that the minimum paths $p\big(\epsilon, S_i^k\big)$, $1 \le k \le q$, are expressed as
$$\begin{aligned}
p(\epsilon, S_i^1) &= \min\big\{ w_i^{1,1} + p(\epsilon, S_{i-1}^1),\; w_i^{1,2} + p(\epsilon, S_{i-1}^2),\; \ldots,\; w_i^{1,q} + p(\epsilon, S_{i-1}^q) \big\}, \\
p(\epsilon, S_i^2) &= \min\big\{ w_i^{2,1} + p(\epsilon, S_{i-1}^1),\; w_i^{2,2} + p(\epsilon, S_{i-1}^2),\; \ldots,\; w_i^{2,q} + p(\epsilon, S_{i-1}^q) \big\}, \\
&\;\;\vdots \\
p(\epsilon, S_i^q) &= \min\big\{ w_i^{q,1} + p(\epsilon, S_{i-1}^1),\; w_i^{q,2} + p(\epsilon, S_{i-1}^2),\; \ldots,\; w_i^{q,q} + p(\epsilon, S_{i-1}^q) \big\},
\end{aligned}$$
for $i > 1$. For $i = 1$, $p(\epsilon, S_i^k) = w(\epsilon, S_i^k)$. In an analogy to matrix/vector multiplication, where addition is substituted by minimisation and multiplication is substituted by addition, the equations are equivalent to
$$p_i = M^{i,\,i-1} p_{i-1},$$
where $p_i = \big[\, p(\epsilon, S_i^1) \;\; p(\epsilon, S_i^2) \;\; \ldots \;\; p(\epsilon, S_i^q) \,\big]^T$. For $i > 1$, the matrix is defined as
$$M^{i,\,i-1} = \begin{pmatrix} w_i^{1,1} & w_i^{1,2} & \cdots & w_i^{1,q} \\ w_i^{2,1} & w_i^{2,2} & \cdots & w_i^{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ w_i^{q,1} & w_i^{q,2} & \cdots & w_i^{q,q} \end{pmatrix},$$
otherwise
$$M^{i,\,i-1} = \begin{pmatrix} w(\epsilon, S_i^1) & w(\epsilon, S_i^1) & \cdots & w(\epsilon, S_i^1) \\ w(\epsilon, S_i^2) & w(\epsilon, S_i^2) & \cdots & w(\epsilon, S_i^2) \\ \vdots & \vdots & \ddots & \vdots \\ w(\epsilon, S_i^q) & w(\epsilon, S_i^q) & \cdots & w(\epsilon, S_i^q) \end{pmatrix}.$$
Figure 4.6: Parallel matrix operations: (a) basic, (b) improved. Numerals indicate the order of vector elements.
Using a sequence of $n$ matrix/vector operations, it is now possible to compute the minimum paths $p\big(\epsilon, S_i^k\big)$, by initialising $p$ to a $q$-component zero vector: the first operation $M^{1,0} p_0$ yields minimum paths $p\big(\epsilon, S_1^k\big)$ for $1 \le k \le q$. Retaining the value of the resulting vector as the argument for the next matrix/vector operation, minimum paths $p\big(\epsilon, S_2^k\big)$ for $1 \le k \le q$ are computed. The process is continued until minimum paths $p\big(\epsilon, S_n^k\big)$ have been computed. The minimum vector component then corresponds to ground state energy.
Matrix operation parallelisation
A simple approach to parallelising the matrix/vector operation is shown in Figure 4.6(a). Here, the matrix is distributed in such a way that each processor stores the values of $q/p$ rows, where $p$ is the number of processors. Each is responsible for computing the same fraction of components of the resulting vector. It follows that the latter is assembled from partial results computed by each processor. In the message passing model, this can be achieved using a gather operation. For the required purpose, it is necessary for each processor to subsequently access all components of the resulting vector. Therefore, it is practical to gather collectively. The algorithm is described in the following pseudocode, where M[k,l] denotes the component $M^{i,\,i-1}_{k,l}$ in row $1 \le k \le q$, column $1 \le l \le q$ of matrix $M^{i,\,i-1}$:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M[k,l] < minval
            minval := p[l] + M[k,l]
    p'[k] := minval
all_gather(p', p')
In the pseudocode, the outer loop is responsible for iterating through matrix rows. For each row, elements are added to vector components stored in p. The minimum sum becomes a component of the vector p'. Matrix rows are assigned to processors based on the processor identifier proc_id, whose value is in the range [0, number of processors). The computation concludes with the collective operation all_gather().
Examining the algorithm's computational complexity, it can be seen that the execution time is $t(q) \propto \frac{q}{p} \, q$. Since determining ground state energy requires $n$ iterations of the algorithm, where $n$ is the number of rows in the spin lattice, the total execution time is $t(n, q) \propto n \, \frac{q^2}{p}$. Considering that the lattice contains $m = \log_2(q)$ spin columns, execution time expressed in terms of lattice size is $O\big(\frac{n}{p} \, 2^{2m}\big)$, which is cost optimal in comparison to the serial algorithm presented in Chapter 3.
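A C/MPI sketch of one such matrix/vector step is given below, assuming q is divisible by the number of processes and that each process already holds its q/p matrix rows contiguously; the names and data layout are assumptions rather than the project's implementation:

#include <mpi.h>
#include <stdlib.h>
#include <float.h>

/* One min-plus matrix/vector step (Figure 4.6(a)): each process processes
 * its q/p rows of the weight matrix (local_M, row-major) against the full
 * input vector, then the result vector is assembled on every process. */
void minplus_step(int q, const double *local_M, const double *p_in,
                  double *p_out, MPI_Comm comm)
{
    int nproc, rank, rows, k, l;
    double *partial;

    MPI_Comm_size(comm, &nproc);
    MPI_Comm_rank(comm, &rank);
    rows = q / nproc;
    partial = (double *)malloc(rows * sizeof(double));

    for (k = 0; k < rows; k++) {
        double minval = DBL_MAX;
        for (l = 0; l < q; l++) {
            double v = p_in[l] + local_M[k * q + l];
            if (v < minval)
                minval = v;
        }
        partial[k] = minval;
    }

    /* Assemble the full result vector on every process. */
    MPI_Allgather(partial, rows, MPI_DOUBLE, p_out, rows, MPI_DOUBLE, comm);
    free(partial);
}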
Memory efficient matrix/vector computation
Alternatively, it is possible to perform the desired matrix/vector computation using a parallel algorithm with reduced memory requirements for the vectors p, p'. In resemblance to Cannon's algorithm [13], it can be observed that although all processors access vector p in its entirety, individual components need not be accessed simultaneously, as in the approach described above. Instead, the vector can be distributed between processors, so that each holds $q/p$ components. Computation commences with each processor performing additions of matrix elements associated with its allocated vector components. After the latter have been processed, all processors perform a cyclic shift of vector components, which allows the minimisation operation to progress further. This procedure is repeated until processors have completed the minimisation operation on their assigned rows. The approach is illustrated in Figure 4.6(b), for which the modified pseudocode is:
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if (l mod q/p) = 1
            cyclic_shift(p)
        if p[(l-1) mod q/p + 1] + M[k,l] < minval
            minval := p[(l-1) mod q/p + 1] + M[k,l]
    p'[(k-1) mod q/p + 1] := minval

Here, the previously defined loop has been adapted to index the components of the distributed vectors. Since the result vector p' becomes an operand in successive iterations of the algorithm, performing a collective operation on p' is not necessary; this vector is thus distributed identically to p.
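The cyclic_shift() operation used above might be realised in MPI as follows; this is an illustrative helper only, with the block size and communicator as assumed parameters:

#include <mpi.h>

/* Pass the local block of q/p vector components to the next rank in the
 * ring and receive the previous rank's block, in place. */
void cyclic_shift(double *block, int block_len, MPI_Comm comm)
{
    int rank, nproc, next, prev;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nproc);
    next = (rank + 1) % nproc;
    prev = (rank + nproc - 1) % nproc;

    MPI_Sendrecv_replace(block, block_len, MPI_DOUBLE,
                         next, 0, prev, 0, comm, MPI_STATUS_IGNORE);
}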
In Chapter 3, a serial algorithm was presented for the ground state energy of the lattice with cyclic boundary conditions. This involved evaluating the boundaryless ground state energy $H_{\min}$ for all configurations of the boundary subsystems $S_1$, $S_n$. To adapt the parallel matrix algorithm for this problem, define the weight function between nodes $\epsilon$, $S_1^k$ as $w\big(\epsilon, S_1^k\big) = H\big(S_1^k\big) + H_b\big(S_1^k, S_n^l\big)$, for boundary subsystem configuration $S_n^l$. The ground state energy can then be obtained by performing the described series of matrix operations for all configurations of subsystem $S_n$. For each configuration $S_n^k$, the final result vector contains the minimum path lengths $p_n = \big[\, p(\epsilon, S_n^1) \;\ldots\; p(\epsilon, S_n^k) \;\ldots\; p(\epsilon, S_n^q) \,\big]^T$, of which the relevant component is retained. The ground state energy is the minimum of these retained components. The complexity of the entire computation is $O\big(\frac{n}{p} \, 2^{3m}\big)$ executed on $p$ processors, for an $n$-row, $m$-column lattice. In comparison to the serial algorithm, this is cost optimal.
4.2.2 Higher-order Markov chain approach
It remains to develop a parallel solution to the approach based on the higher-order Markov chain. For this model, it was formulated that ground state probability is
$$P_{\mathrm{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \max_{S_{i-m-1}} P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{\mathrm{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) & i > m, \end{cases}$$
where $m$ is the number of lattice columns. By the relation between state probability and energy, in analogy to the approach based on the row-wise lattice decomposition shown in Chapter 3, it was shown that
$$H_{\min}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases} H(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\ \min_{S_{i-m-1}} \big[ H_b\big(S_i, (S_{i-1}, \ldots, S_{i-m-1})\big) + H_{\min}(S_{i-1}, \ldots, S_{i-m-1}) \big] & i > m, \end{cases}$$
where $H(S_i, S_{i-1}, \ldots, S_{i-m})$ is the energy of the ordered set of subsystems $(S_i, S_{i-1}, \ldots, S_{i-m})$ and $H_b\big(S_i, (S_{i-1}, \ldots, S_{i-m-1})\big)$ is the interaction energy between system $S_i$ and the ordered set $(S_{i-1}, \ldots, S_{i-m-1})$. Examining this optimisation equation, it can be seen that it is monadic, since it contains a single recursive term.
As each level of recursion effects a unit decrease of the indices of the tuple $(S_i, S_{i-1}, \ldots, S_{i-m})$, there are no cyclic dependencies between subproblems. The dynamic programming formulation is therefore also serial. Considering this similarity, the opportunity is given to adapt the parallel matrix based computation to solve this dynamic programming problem. To achieve this, the weighted graph of subproblems is re-established, with an edge connecting two nodes if the recursive formulation indicates a dependency. For an $n \times m$ spin lattice, there are $(n-1)\,m\,2^m$ nodes in the graph, because each tuple $(S_i, S_{i-1}, \ldots, S_{i-m})$ has $2^m$ configurations and a solution is constructed from $(n-1)\,m$ subproblems. A given subproblem corresponds to a certain position of the sliding window on the lattice, as described in Chapter 3. The function $w\big((S_i, S_{i-1}, \ldots, S_{i-m}), (S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})\big) = H_b\big(S_i, (S_{i-1}, \ldots, S_{i-m-1})\big)$, defined for $i > m$, describes the weight of an edge. As before, the graph is extended with an additional node $\epsilon$, so that the set of edges is defined as $E' = E \cup \big\{ (\epsilon, (S_1, S_2, \ldots, S_{m+1})) \big\}$ for all configurations of $(S_1, \ldots, S_{m+1})$. For $i \le m$, define the weight function $w\big(\epsilon, (S_i, S_{i-1}, \ldots, S_{i-m})\big) = H(S_i, S_{i-1}, \ldots, S_{i-m})$. This results in a trellis-like graph, shown in Figure 4.5(b). Minimising system energy is equivalent to obtaining
$$\min_{(S_{nm}, S_{nm-1}, \ldots, S_{nm-m})} p\big(\epsilon, (S_{nm}, S_{nm-1}, \ldots, S_{nm-m})\big),$$
where the function $p$ is the minimum path between two nodes in the graph.
Previously, matrices of edge weights between trellis segments were used to compute minimum paths, for which the parallel matrix operation was presented. From the optimisation equation and Figure 4.5(b), it is observed that each node at a given level is connected to only two nodes at the preceding level. This is because there are two configurations of the tuple $(S_{i-1}, S_{i-2}, \ldots, S_{i-m-1})$ for any specified tuple $(S_i, S_{i-1}, \ldots, S_{i-m})$. Assigning infinite weights to unconnected nodes between trellis levels, it follows that the matrices are sparse with regard to infinite-valued elements.
Providing that matrix sparseness can be exploited, an adaptation of the existing parallel algorithm will execute in $t(n, m) \propto (n-1)\,m\,\frac{1}{p}\,2^m$ time on $p$ processors, since each matrix contains $2^m$ rows distributed between processors. With a total of $(n-1)\,m$ matrix operations, the ground state energy of the lattice without cyclic boundary conditions can be obtained in $O\big(\frac{nm}{p}\,2^m\big)$ time. This is cost optimal in comparison to the serial algorithm described in Chapter 3. Using bit string representations of spin tuples in combination with shift operations, an approach which considers matrix sparseness is described in Chapter 6.
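As an indication of how the sparseness might be exploited with bit strings, note that under a bit-string encoding in which the newest spin of the window occupies bit 0, a window configuration has exactly two possible predecessors, both obtainable by shift operations; the following C fragment is illustrative only:

/* Given a window code w for spins i..i-m (newest spin in bit 0), the two
 * possible predecessor windows (spins i-1..i-m-1) are obtained by undoing
 * the shift and appending either value of the spin that left the window. */
void window_predecessors(unsigned w, int m, unsigned pred[2])
{
    unsigned upper = w >> 1;           /* drop the newest spin (bit 0)     */
    pred[0] = upper;                   /* departed spin encoded as 0 (-1)  */
    pred[1] = upper | (1u << m);       /* departed spin encoded as 1 (+1)  */
}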
Chapter 5
The Project
In previous chapters, the theoretical background to the ground state optimisation problem was described. Having described the two approaches identified for solving this problem, this chapter deals with the practical work undertaken towards their implementation and evaluation.
5.1 Project description
The purpose of the project is to conduct a practical investigation into parallel algorithms for determining ground states of the Ising spin glass. Specifically, the project deals with the two-dimensional Edwards-Anderson model, i.e. the Ising model with lattice-aligned spins, in which spins are able to assume two discrete states.
Investigations deal with a method for obtaining spin glass ground states exactly. The method
is based on the transfer matrix method, in which the statistical-mechanical properties of the lat-
tice system are used to obtain solutions. It follows that one project objective is to develop a
parallel algorithm based on the Transfer Matrix method. As an additional objective, the project
includes investigating an alternative parallel algorithm, with which solutions to the ground state
problem are obtained heuristically. The performance of both parallel algorithms is to be evalu-
ated; in the case of the heuristic this entails evaluating solution accuracy.
Investigation requires that algorithms are developed in software. The software should be
self-contained: from the user's perspective, the software should offer sufficient functionality to be useful as a research tool, allowing various types of problem instance to be solved using the implemented algorithms. The software should be able to be executed on a wide range of MIMD multiprocessing architectures.
5.1.1 Available resources
There are two computing resources available for the project. The first of these, Ness, is a shared memory multiprocessor system [2]. It has a total of 32 back-end processors, which are partitioned into two interconnected groups. This configuration allows a single job to request 16 processors at maximum. The system is constructed from AMD 64-bit Opteron processors, which have a clock frequency of 2.6GHz. Jobs are submitted to the back-end from a dual-processor front-end, which executes the Sun Grid Engine scheduling system. The back-end has 32 × 2GB of RAM. The system is based on the Linux operating system, providing Fortran, C and Java
programming environments. Both shared memory and message passing model programming
are supported, using the MPI and OpenMP programming interfaces. Ness does not implement
a budget system for CPU time, however access to queues is restricted according to the amount
of requested computation time.
Also available is the supercomputing resource HPCx [3]. This consists of a cluster of IBM
P575 shared memory nodes, each containing 16 processors and 32GB of RAM. For executing
jobs, the system consists of 160 compute nodes. Nodes are constructed from Power5 proces-
sors, which have a clock frequency of 1.5GHz. The processor architecture allows for 6.0 Gflop/s
theoretical peak performance. Inter-node communication is supported using IBM High Perfor-
mance Switch interconnects. These provide a maximum unidirectional inter-node bandwidth
of 2GB/s, at MPI latencies of 4–6 µs [24]. Based on the AIX operating system, the serial and
parallel programming environments are similar to those provided on Ness. The job scheduler,
LoadLeveler, provides queues for serial and parallel jobs, using a budget system for CPU time.
5.2 Project preparation
Before commencing the project, an initial phase was designated to project preparation. This
consisted of investigating the problem background and defining the project's aims. Potential
approaches to solving the spin glass problem were identified and implemented as prototype soft-
ware. Project process activities were carried out, consisting of a risk analysis and scheduling. A
software development model was decided upon.
5.2.1 Initial investigations
Access to an existing serial transfer matrix code was provided before commencing the project
preparation phase. The potential was given for a code level analysis of parallelism; this approach
was considered an alternative to basing an implementation on the mathematical formulation of
the optimisation problem, which was subsequently undertaken. With a view to implementing
the parallel approach described by Grama et al. [30], initial work consisted of investigating the
exact optimisation technique described in Chapter 3.
The harmony search algorithm was identified as a potential secondary approach to com-
pare to the envisaged exact ground state solver. After initialising a CVS repository for project
source code and experiment data, a serial implementation of the heuristic was evaluated, in
order to assess the algorithm's suitability for further parallelisation. The evaluation consisted of determining solution accuracy, based on ground states obtained for a collection of random spin glasses, using an implementation of a brute force algorithm. As discussed in Chapter 7, results suggest that solution accuracy might be increased using a parallel implementation of the algorithm.

Figure 5.1: Spin glass structure design (spinglass.h: int xSize, int ySize, double[] weights, Spin[] initialSpins, Spin[] spins, boolean[] clamps).
5.2.2 Design and implementation
A basic software framework was developed, to facilitate the collation of performance data. This
framework consisted of a set of utilities, implementing rudimentary functionality for creating
spin glass problem instances and evaluating their energy. Based on this, a design for a more
extensive framework was created, based on the following list of client operations on a spin glass
API:
Initialisation of spin lattices with specific boundary conditions
Destruction of spin lattices
Calculation of system energy
Bond randomisation
Also, a spin glass data structure was designed. Shown in Figure 5.1, this consists of instance
variables for storing the height and width of the spin lattice. The values of spins themselves are
stored in an associative array-like data structure, as are the values of coupling constants. The
former are stored two-dimensionally in row-major fashion, while the latter require an additional
dimension. In the design, two 2-dimensional arrays store vertical and horizontal bonds, again
using a row-major storage scheme. To record whether a spin is clamped to a specific state, the
data structure includes a further array. Finally, the initial values of spins are stored in a separate
field; this records the actual state to which a spin is clamped, allowing the primary spin array to
be reserved for computation.
A schema of the framework is shown in Figure 5.2. This includes an interface for per-
forming input/output operations: It allows representations of coupling constants to be read from
Figure 5.2: Software framework design. The SpinGlass interface exports spinGlass_new(), spinGlass_remove() and spinGlass_energy(); the IO interface exports readBonds(), writeBonds(), readClamps() and writeClamps(). Clients include the writebonds and writeclamps utilities and solver implementations (e.g. a transfer matrix solver) of the Solver interface.
files; similarly, a function allows the clamping state of spins to be read. These operations are
complemented by functionality for writing representations to file.
The IO operations are required by the two utilities writebonds and writeclamps, which facilitate
creating spin glass problem instances. These are responsible for writing data to files,
which are subsequently read by solver utilities. The format of clamping state files is specified as
a UNIX UTF-8 encoded text file, containing the symbols 1 and 0. These provide a representation
of whether a spin is clamped, such that a string encodes the state of a lattice row. Strings
consist of the aforementioned symbols, separated by whitespace. Spin clamps are stored in the
file as consecutive strings, separated by line feed characters. The file format for spin coupling
constants is similar: here, symbols are floating point numbers in decimal notation, again separated
by whitespace and line feed characters. The format reflects the design of the spin glass
data structure, in that two consecutive blocks retain the values of vertical and horizontal bonds. The
format specifies that these blocks are separated by a single blank line.
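As an illustration of the two formats (the block shapes below are purely hypothetical, since the exact number of entries per block depends on lattice dimensions and boundary conditions), a coupling constant file for a small lattice could read

0.42 -0.17 0.88

-0.63 0.05
0.91 -0.36

with the vertical bond block first, a single blank line as separator, and the horizontal bond block second. A matching clamping state file would simply contain one string of 1 and 0 symbols per lattice row, e.g. 0 0 1.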
Figure 5.2 also shows the design of the spin glass API. This exports functionality to client
solvers, which themselves implement a simple interface for solving spin glass instances. A
solver uses the IO interface to construct a spin glass instance from bond and clamp state files.
Thereafter, it invokes its implementation of a ground state algorithm. The latter utilises further
API operations to evaluate spin glass energy. Finally, the spin glass instance is destroyed, after
an output of the determined solution has been generated.
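To make this call sequence concrete, the following is a minimal sketch of a solver's life cycle against the design of Figure 5.2. All names are design-level assumptions taken from the figure or invented for illustration (readBonds(), readClamps(), findGroundStates() and writeSolution() in particular); the functions actually implemented are described in Chapter 6.

/* Sketch of a solver's life cycle against the Figure 5.2 design;
 * the names below are design-level assumptions, not the implemented API. */
int solve_instance(const char *bondFile, const char *clampFile)
{
    struct SpinGlass *sg;

    /* Construct a spin glass instance from bond and clamp state files */
    sg = spinGlass_new(readBonds(bondFile), readClamps(clampFile));
    if (sg == NULL)
        return -1;

    /* Invoke the solver-specific ground state algorithm, which evaluates
     * candidate configurations via spinGlass_energy() */
    findGroundStates(sg);

    /* Output the determined solution, then destroy the instance */
    writeSolution(sg);
    spinGlass_remove(sg);
    return 0;
}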
5.2.3 Implementation language and tools
During the course of software design, the choice of implementation language and tools was
considered. The C language was selected due to its widespread use as a development
language on high performance systems, and the availability of compilers both on the two computation
resources and on development machines. To ensure portability, ANSI C 89 was selected as the
implementation standard.
To expedite software development, it was decided to implement the software using the GLib
library [1]. This is a cross-platform collection of utility functions which implement general
purpose data structures, parsers and similar facilities. Macros and type definitions are provided, which
potentially reduce the number of pointer casts required in a code. This in turn reduces cast errors
and debugging time.
A build management system, the GNU Autotools suite, was also selected. Widely used in conjunction with the C and
C++ programming languages on UNIX based systems, this allows makefiles to be generated
semi-automatically and configured for different target systems. This was considered useful for
providing an application package for a variety of systems.
Given that HPCx, one of the available computing resources, is a clustered system, the MPI
message passing library was chosen for parallel development. For this reason, the algorithms
described in Chapter 4 are given for the message passing model. Although a hybrid shared
memory/message passing approach using MPI together with e.g. OpenMP would have been possible,
this was considered beyond the scope of the project.
5.2.4 Choice of development model
For the choice of software development model, multiple factors were taken into account. These
included the amount of time available, the required functionality and overall software complex-
ity.
Intuitively, implementation can be realised in two phases, each relating to one of the two
algorithms. From previous experience and the design requirements, it was assumed that each of
the implementation tasks would involve a relatively small amount of written code. Instead, implementation
effort was expected to focus on the distribution of data, communication patterns and
algorithm correctness. Therefore, it was thought that applying staged delivery to
each phase would be advantageous to the project. Following the design of the framework's overall
architecture with multiple ground state solvers, this approach involves discrete design/implementation/testing
activities associated with one release for each ground state solver. Developing
each ground state solver is associated with iteratively augmenting software functionality.
5.2.5 Project schedule
The devised project schedule is shown in Appendix A. Based on an available time frame of 16
weeks, the schedule accounts for all project deliverables, implementation goals and exploratory
aims. Therefore both a practical component, consisting of software development and evaluation,
and the project report and presentation are included.
Risk Type Impact Likelihood Action
Data loss Schedule High Low Avoid
Lack of time Schedule, Scope High Moderate Reduce
Unavailable testing resources Schedule, Quality, Scope High Low Avoid
Algorithmic complexity Scope, Schedule Moderate Moderate Avoid
Table 5.1: Identified project risks
The practical component is split into two distinct phases. Each of these corresponds to the
development and evaluation of the dynamic programming and harmony search based ground
state solvers. A development/evaluation iteration is comprised of tasks for designing, imple-
menting, debugging and testing software, before gathering performance data. Following devel-
opment and evaluation, tasks are specied for producing the report and presentation. A single
week is left unallocated for making amendments to the produced work.
The implementation, debugging and testing tasks required for software development are
scheduled in parallel, as it was thought that this best reects the nature of the chosen develop-
ment model, where functionality is integrated iteratively. Evaluation tasks are interleaved with
software development, so as to minimise the eects of unavailable resources, should these have
occurred.
5.2.6 Risk analysis
To assess the chance of the project's successful completion, potentially detrimental factors were
considered. Such factors include those affecting the project plan and scheduling, software quality
and software scope. Table 5.1 lists risks identified during project preparation by type, estimated
impact, likelihood of occurring and proposed action.
Judging from the product of impact and likelihood of occurrence, the most significant risk is
lack of time. As the time frame for completing the project and required deliverables was short,
this was conceivable. To counteract this, care was taken to define project goals rigorously to
avoid feature creep; furthermore, all tasks were scheduled within a 15 week time frame, allowing
for a further week as float time.
The remaining risks were avoided by ensuring sufficient computing time on parallel machines
(pertaining to unavailable resources), backups and software version control (pertaining
to data loss) and sufficient background research (pertaining to the sophistication of algorithms). As
a fallback action in the event of not being able to implement the researched transfer matrix
scheme, the possibility of performing a code-level analysis of an existing serial transfer matrix
solver code remained. As a caveat, this approach would have offered less insight into the
underpinnings of parallelism in the transfer matrix method.
5.2.7 Changes to project schedule
A number of changes were made to the project schedule. These concerned both the order of
scheduled tasks and their estimated duration.
Most significantly, developing the parallel harmony search solver proved to require less
time than envisaged in the project schedule; it claimed only two schedule weeks in comparison
to the four weeks assigned during preparation. As a result, it was possible to implement a more
advanced exact parallel solver, as previously described.
Also, the original decision to designate performance evaluation to a single task for each of
the two solver types proved impractical. Instead, data were gathered separately for each computing
resource, with subtasks for each variant of the exact solver. Separating evaluation between
machines was prompted by the fact that implementing experiments on HPCx was delayed due to
compilation issues with the required version of the GLib library.
Furthermore, after devising the original project schedule, the communicated date for the
presentation proved to be after the date for the remaining deliverables. The time gained was
allocated to completing the project report.
5.2.8 Overview of project tasks
The following provides a description of tasks undertaken during the project, as an account of
the extent to which the project schedule was adhered to.
In weeks 1 and 2, the ideas presented in Chapter 3 were developed as a basic serial exact
ground state solver code. The parallelisation method using collective operations, discussed in
Chapter 4 was also implemented. In both cases, the algorithms were based on the spin lattice
without boundary conditions.
In week 2, timing data were collected for the previously implemented serial solver. In addi-
tion, scaling data for the parallel solver were collected on the Ness computing resource. Work
commenced on implementing the improved parallel ground state solver using cyclic commu-
nication patterns, also described in Chapter 4. The improved parallel ground state solver was
completed in week 3. In week 4, further scaling performance data were collected on Ness for
this code. Remaining time in week 4 was used to conduct a code review, based on the entirety
of implemented software.
In week 5, work commenced on developing the harmony search ground state solver. Both
serial and parallel code was completed in week 6, during which the dynamic programming
code was modied to support solving systems with cyclic boundary conditions. In week 6,
performance data for the dynamic programming code were collected on the HPCx machine.
In week 7, further performance data were gathered on HPCx. This was to evaluate the
dynamic programming code with cyclic communication patterns. Also, routines were developed
for evaluating harmony search performance, which was subsequently evaluated in week 8.
In weeks 9 and 10 a further modication to the exactly solving dynamic programming ap-
proach was implemented, based on the higher-order Markov chain theory described in Chapter
3. This was for the spin glass model without cyclic boundary conditions. In week 10, perfor-
mance data were gathered for this algorithm.
The remaining time was used to complete the project report and perform a final revision of
all deliverables.
Chapter 6
Software Implementation
6.1 Introduction
The implemented software is a framework for experimenting with two-dimensional lattice spin
glass ground state problems. It consists of utilities which assist with generating spin glass instances,
which may be subsequently solved using either exact or heuristic-based solver utilities.
The latter provide information on both the energy and spin configuration of ground states. While
aimed primarily at generating solutions using parallel algorithms, it is also possible to reconfigure
the software to use serial computation only.
The software is implemented in the C programming language. The GNU C compiler was
used on the development system. To increase C90 standard conformity, the compiler flags -ansi
-pedantic were used. Development took place predominantly on a 32 bit single processor Linux
system, on which gcc 4.1.2 and gdb 6.6 were installed. The MPI implementation was MPICH2,
version 1.0.6. To assist with debugging, the Valgrind suite was used to check for memory leaks.
The version control system CVS was used extensively during implementation. Based on a
central repository stored on the Ness machine, version control was used as a means of retrieving
the entire code base and synchronising code modifications between machines.
The build management system used for the software is the GNU autotools suite. This is used
to automatically configure the software prior to compiling it on the target architecture. Instructions
on how this can be achieved are given in Appendix E.
In the following, an overview of the software framework is given.
6.2 Implementation overview
From the users perspective, the framework consists of a set of binary executables. These are:
genbonds
genclamps
sbforce
dpsolver
dpsolverfast
hmsolver
The two utilities genbonds and genclamps are used to generate random coupling constants
and specify the clamping state of spins in the lattice, respectively. As implemented, the utilities
produce character-based representations as described in the design in Chapter 5. The utilities
write to the standard output. Using UNIX shell redirection, this output can be stored in files, in
preparation for invoking a ground state solver on the data. Using these utilities therefore facilitates
creating instance data. Both genbonds and genclamps use standard command line options
for specifying spin lattice dimensions and related parameters. For example, lattice dimensions
are specified using --xSize=x --ySize=y, for a system with x rows and y columns.
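For illustration (the option syntax shown is a hypothetical sketch rather than verified output of the utilities), an 8 × 8 instance could be prepared with

genbonds --xSize=8 --ySize=8 > bonds.dat
genclamps --xSize=8 --ySize=8 > clamps.dat

after which a solver is invoked on the two generated files.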
The remaining executables correspond to implementations of algorithms described in Chapters
3 and 4: for testing purposes, the sbforce utility implements a simple exhaustive search, while
hmsolver implements the harmony search algorithm in its parallel realisation. Similarly, dpsolver and
dpsolverfast provide exact solvers based on dynamic programming approaches. As before, all of
these executables use command line parameters for specifying options. In this case, the most
significant parameters are those for specifying bond and clamp configuration files. These utilities
write solutions to standard output.
From the perspective of implementation, the software is constructed using a modular approach.
Based on the design described in the previous chapter, there exist various library
modules, which provide functionality such as IO and spin glass manipulation. These are utilised
by client modules, which include implementations of ground state solvers. By means of C
headers, client modules are able to reference APIs. API implementations are used to generate
separate binary executables through the linking process.
Appendix B includes a UML class schema of the relationships between source code modules
and headers. As shown, source code modules reference various headers; arrays.h,
gstatefinder.h, io.h, random.h and spinglass.h are defined. Their purpose is as follows:
arrays.h Specifies multidimensional array operations
gstatefinder.h Specifies the interface to be implemented by ground state solvers
io.h Defines IO operations
random.h Defines randomisation functions
spinglass.h Defines the spin glass data structure and operations
As shown in Figure B.1, multidimensional arrays are used by the dynamic programming based
solvers, as befits the algorithms' requirements for associative data structures. The IO header
is used by module main.c, which implements an entry point for all executables. Furthermore,
gstatefinder.h is included by main.c, bforce_gstate_finder.c, dp_gstate_finder.c and
harmony_gstate_finder.c, the latter three implementing exhaustive search, dynamic programming
and harmony search, respectively. Whereas dp_gstate_finder.c implements the basic exact
optimisation algorithm described in Chapter 3, a further module dp_gstate_finder_fast.c provides
an implementation of the improved dynamic programming algorithm, described in the same
chapter.
6.3 Source code structure
Following the description of source module and header purposes, a more detailed description of
the implementation is now provided. This is given at function level for a selection of the
code base, to illustrate core functionality.
6.3.1 Library functionality
arrays.h
As previously mentioned, the implementation of the exactly solving algorithm requires access
to multidimensional arrays. Given that C only allows dynamic arrays to be defined in a single
dimension (besides static arrays), it is necessary to use pointer arithmetic and casts to implement
multidimensional arrays. Confined to source module arrays.c, functions are
provided for constructing and destroying arrays in two and three dimensions of arbitrary size.
Returning pointer types, the constructor functions allow data elements to be accessed using
conventional array syntax, while preserving memory contiguity. These functions are invoked
repeatedly by dp_gstate_finder.c and dp_gstate_finder_fast.c. While a less involved approach
might have offered increased performance, implementing the dynamic programming algorithm
otherwise was considered too cumbersome, given the time allocated for software development.
As an alternative, the header defines macros which emulate a multidimensional array, based
on performing arithmetic on a single pointer. Although syntactically less convenient, this approach
requires fewer dereferencing operations to access an element. For performance
reasons, the approach is utilised by the spin glass library functions in spinglass.c.
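For illustration, the following is a minimal sketch (not the project's arrays.c itself) of the contiguous two-dimensional allocation scheme described above: a single data block preserves memory contiguity, while a separate block of row pointers allows conventional a[i][j] indexing.

#include <stdlib.h>

/* Allocate a rows-by-cols array of doubles whose elements are contiguous. */
double **alloc_2d(size_t rows, size_t cols)
{
    double *data = malloc(rows * cols * sizeof *data);
    double **a = malloc(rows * sizeof *a);
    size_t i;

    if (data == NULL || a == NULL) {
        free(data);
        free(a);
        return NULL;
    }
    for (i = 0; i < rows; i++)
        a[i] = data + i * cols;   /* row i starts at offset i*cols */
    return a;
}

/* Release both the data block and the row pointer block. */
void free_2d(double **a)
{
    if (a != NULL) {
        free(a[0]);   /* contiguous data block */
        free(a);      /* row pointers */
    }
}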
io.h
Header io.h defines six functions, responsible for reading and writing files containing representations
of spin state, clamping state and coupling constants. The three functions responsible
for reading from file are of the form *read(char *fileName, int *xSize, int *ySize). Of these
parameters, which are all passed by reference, the value of fileName is read upon invoking the
function, whereas xSize and ySize hold the spin lattice dimensions after the function call has
completed. The function returns a pointer to state data read from file.
Complementary functions for writing to file are of the form write(struct SpinGlass *spinGlass,
char *fileName). Here, the parameters consist of a pointer to an instance of the spin
glass abstract data type (described in the previous chapter), and the name of the file to write to.
The function return type is void.
The file-reading functions in io.c are implemented using a single static function, GQueue
*parse_file(). As the name suggests, this provides simple parsing capabilities, using a loop to
iterate through string tokens obtained from the standard library function strtok(). Recording and
verifying the counts of symbols on each line, tokens are added to a queue. This queue is returned
by the function. Dequeuing elements stored in the queue, the aforementioned reading functions
then construct data structures representing spin glass parameters.
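As a sketch of this tokenising approach (assuming GLib is available; the helper name and buffer size are illustrative, and the per-line symbol count checks of the real parse_file() are omitted):

#include <stdio.h>
#include <string.h>
#include <glib.h>

/* Append every whitespace-separated symbol of the file to a queue,
 * which the caller drains to build its data structures. */
static GQueue *parse_file_sketch(const char *fileName)
{
    FILE *fp = fopen(fileName, "r");
    GQueue *queue;
    char line[1024];

    if (fp == NULL)
        return NULL;
    queue = g_queue_new();
    while (fgets(line, sizeof line, fp) != NULL) {
        char *token = strtok(line, " \t\n");
        while (token != NULL) {
            g_queue_push_tail(queue, g_strdup(token));
            token = strtok(NULL, " \t\n");
        }
    }
    fclose(fp);
    return queue;
}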
spinglass.h
The spin glass data structure is defined in header spinglass.h. Using a C struct type, the following
fields are defined:
struct SpinGlass {
    gint xSize;
    gint ySize;
    Spin *spins;
    gdouble *weights;
    gboolean *clamps;
    Spin *initialSpins;
};
As given by the design description in Chapter 5, the structure specifies variables† for storing
lattice dimensions. An enumeration type defines the Spin type; the pointer field is used to
reference a memory block storing the state of spins. The enumeration defines the integer states
UP=1 and DOWN=-1. Spin states are stored using a row-major scheme. This matches the
access method using a single pointer, defined in arrays.h. Coupling constants, clamping states
and the field initialSpins store states similarly. The latter field provides an account of spin state
† GLib specifies wrappers for standard C types; motivation for their use is discussed in the GLib documentation [1].
Figure 6.1: Functions provided by spinglass.c: (a) row_energy(), (b) interrow_energy(), (c) ensemble_delta().
distinct from the field spins, the latter storing the state of spins while performing optimisation. Using
two separate fields allows lattice configurations to be compared before and after optimisation.
Header functions in spinglass.h are grouped into four categories, associated with allocating
memory for the data type, computing lattice energy, writing lattice properties to file, and miscellaneous
activities. All functions operate on the spin glass data structure, which is passed by
reference from a caller function.
The purpose of the memory-related functions is as described in the design: these ensure that
the spin glass structure is initialised and terminated correctly. The constructor function is of the
form *spinglass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean
*clamps); it requires as parameters the lattice dimensions, initial spin configuration, coupling
constants, and clamping states. The function returns a pointer to a newly allocated data structure
(fields are assigned according to the supplied parameters). To assist in freeing memory after use,
the function spinglass_free() is implemented.
Lattice energy is computed using a collection of five functions. The simplest of these is defined
as spinglass_energy(struct SpinGlass *spinGlass), which returns as a floating point number
the energy arising from all interactions in the lattice. For convenience, a second variant of this
function returns the energy due to the coupling constants specified in *spinGlass, with the
configuration given as a separate array *conf. A comparison between the
remaining three energy calculating functions is given in Figure 6.1: the spinglass_row_energy()
function determines the energy of a spin row (considering horizontal bonds), whereas interrow_energy()
uses vertical bonds to calculate the interaction energy between adjacent rows.
With ensemble_delta(), the energetic contribution between a single spin and its predecessors in
the horizontal and vertical dimensions is calculated.
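To make the energy terms concrete, the following sketch evaluates the horizontal contribution of one row and the vertical contribution between two adjacent rows, following the Ising form E = −Σ J_ij s_i s_j. The row-major layout matches the design of Chapter 5, but the function names, parameter lists and weight indexing are illustrative assumptions rather than the implemented API.

/* Horizontal bond energy of row i: -sum_j J_h(i,j) s(i,j) s(i,j+1).
 * s[] holds spins (+1/-1) row-major; hWeights[] holds one horizontal
 * bond per neighbouring column pair in each row. */
static double row_energy_sketch(const int *s, const double *hWeights,
                                int ySize, int i)
{
    double e = 0.0;
    int j;

    for (j = 0; j < ySize - 1; j++)
        e -= hWeights[i * (ySize - 1) + j]
             * s[i * ySize + j] * s[i * ySize + j + 1];
    return e;
}

/* Vertical bond energy between rows i and i+1: -sum_j J_v(i,j) s(i,j) s(i+1,j). */
static double interrow_energy_sketch(const int *s, const double *vWeights,
                                     int ySize, int i)
{
    double e = 0.0;
    int j;

    for (j = 0; j < ySize; j++)
        e -= vWeights[i * ySize + j]
             * s[i * ySize + j] * s[(i + 1) * ySize + j];
    return e;
}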
The file output functions in spinglass.c are used to implement the output functions in io.c.
The functions are of the form write(struct SpinGlass *spinglass, FILE *file), i.e. arguments
include a pointer to a spin glass structure and a file pointer. If required, this allows spin glass
properties to be easily echoed to screen, using the file pointer stdout.
Finally, miscellaneous functions include get_random_spins() (used to generate random spin
configurations, while considering spin clamping state), has_vertical_boundary() (used to determine
whether cyclic boundary interactions are present along the lattice's vertical dimension),
and correlate(). The latter is used to compare spin configurations between spin glass structures
in terms of differing spin state.
6.3.2 Client functionality
Having described library functionality provided by the software, attention is now given to the
code modules utilising this functionality. These include the entry point module main.c, and
more importantly, the modules implementing optimisation algorithms. Note that the code base
includes additional modules for the utilities genbonds and genclamps. These do not make use
of library functions; as their implementation is trivial, these are not considered in further detail.
The source code for all algorithms is provided in Appendix F.
main.c
Module main.c uses the standard argument processing library provided by GLib to implement
execution parameter parsing for the solver utilities. This requires a number of auxiliary data types
and structures, which are defined as static global and local variables in the module's main()
function. The latter is responsible for reading file name arguments associated with specific flags,
describing the locations of coupling constant and clamping state files. Also, a file describing a
spin configuration to compare the solution to may be specified.
After parsing program arguments, the presence of required and optional parameters is verified.
A local function init() then initialises a spin glass data structure, using the previously described
function spinglass_alloc(). Optimisation is then initiated by invoking the header-defined function
find_ground_states(). After the solution has been obtained, spinglass_correlate() performs
a comparison, should the related flag have been specified. After deallocating the data structure,
init() and main() terminate. Since each optimisation algorithm implements find_ground_states()
in its own module and links with main.c, the main() function is provided by the same module
for all utilities. This promotes code reuse and facilitates extending the code base with new
algorithms.
bforce_gstate_finder.c
To generate ground truth data for testing purposes, module bforce_gstate_finder.c implements
a brute force ground state solver. The solver is based on an infix traversal of state space. This
is achieved using a function find_ground_states(), which is called recursively. A conditional
statement restricts recursion depth, based on a variable whose value represents the position of a
window on the spin lattice. For each invocation of the function, the state of the spin under the
window is flipped. Before and after flipping the spin state, recursive calls are performed, in each
case advancing the window by one spin. The base case effects an evaluation of system energy. If
the system energy is found to be lower than the recorded minimum, the energy and configuration are
output before updating the minimum. Since the search is exhaustive, the ground state configuration
is eventually output.
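A condensed sketch of this recursion is given below; it assumes k spins stored in s[] and an energy() callback, and unlike the implemented find_ground_states() it neither respects clamped spins nor prints configurations.

/* Exhaustively enumerate all 2^k configurations by recursion on the
 * window position, recording the lowest energy encountered. */
static double bestEnergy = 1e300;

static void search(int *s, int k, int pos,
                   double (*energy)(const int *s, int k))
{
    if (pos == k) {                 /* base case: evaluate the lattice */
        double e = energy(s, k);
        if (e < bestEnergy)
            bestEnergy = e;
        return;
    }
    search(s, k, pos + 1, energy);  /* recurse with the spin unchanged */
    s[pos] = -s[pos];               /* flip the spin under the window */
    search(s, k, pos + 1, energy);  /* recurse with the flipped spin */
    s[pos] = -s[pos];               /* restore before returning */
}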
harmony_gstate_finder.c
Serial and parallel harmony search algorithms were described in Chapters 3 and 4. The serial
algorithm consists of initial random solution generation (characterised by the parameter
NVECTORS) followed by an iterative process, in which low-utility solutions are replaced. Replacement
is based on combining the components of stored solutions, using randomisation. The
latter is controlled by the memory choosing rate parameter. The parallelisation strategy involves
a collection of harmony search processes which exchange solutions with each other, using a
hierarchical system of nearest-neighbour and collective communication patterns.
Excepting the number of processes, module harmony_gstate_finder.c defines all parameters
controlling the behaviour of harmony search using preprocessor directives. These parameters
include the number of solutions held by a process (NVECTORS), the memory choosing rate,
the number of iterations before performing a collective communication operation, and the size
of subgroups involved in collective communications.
In addition to the module's entry function find_ground_states(), the implementation consists
of seven static functions, responsible for initialising and finalising message passing communications,
collectively evaluating solution energy, and verifying the algorithm's state of convergence.
When the entry function is invoked, the implementation begins by allocating memory for a
single solution vector *neighbourSpins, which is used to store data from nearest-neighbour ring
communications. After initialising communications, solution vectors are generated randomly
and assigned to elements of an array Spin *spins[NVECTORS]. The latter is the collection
of solution vectors used during the heuristic process. The actual heuristic consists of a loop
executed directly after the aforementioned solution generation, which is of the form:
for (i = 1; get_stabilised_status() == FALSE; i++) {
    /* Create new vector */

    /* Compute highest energy vector */

    /* Set vector components */

    /* Replace vector in memory, if new vector is of higher fitness */

    /* Perform communication operations */
}
As shown, the loop's execution is controlled by get_stabilised_status(), responsible for evaluating
the state of convergence. Within the loop body, memory for a new solution vector is
allocated; like all other solution vectors, the memory block consists of xSize × ySize elements
of type Spin, where xSize and ySize are the dimensions of the spin lattice. After determining
the solution vector with the highest energy, the values of the new solution vector's components are
set from existing vectors, according to the algorithm described in Chapter 3. Following this,
the new solution's energy is determined. The highest energy solution is replaced, if comparison
yields that the new solution's energy is lower. Communication routines are executed, after
which the process begins anew.
The hierarchical communication scheme is implemented using two separate conditional
statements, responsible for performing nearest-neighbour ring communications and collective
operations:
1 if (Solver_ProcID % ZONE_SIZE == 0) {
2     gint random = g_random_int_range(0, NVECTORS);
3     MPI_Sendrecv(spins[random], 1, Type_Array, (Solver_ProcID + ZONE_SIZE) % Solver_NProcs, 0, neighbourSpins, 1, Type_Array, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
4     reduction_function(neighbourSpins, spins[random], NULL, NULL);
5 }
6
7 if (i % ZONEEXBLOCK == 0) {
8     reduce_minimal_spin_vector(spins[maxVector], Solver_Zone);
9 }
The exchange begins by processes selecting solutions at random (line 2) and sending them to
their neighbours. Ring communication is performed using the send/receive operation in line 3,
where each process with ID Solver_ProcID sends to process ID ((Solver_ProcID + ZONE_SIZE)
mod Solver_NProcs). Here, Solver_NProcs is the total number of processes and ZONE_SIZE
is the number of processes in a subgroup. In this way, ZONE_SIZE controls the number of
processes involved in ring communications. Every random solution is received into the memory
block referenced by *neighbourSpins. Whether this is committed to a process's solution set
spins[] depends on the result of applying reduction_function(). The latter performs identically
to the copy_min() function in Chapter 4, copying the energetically minimal argument to its
complement. Consequently, line 4 is responsible for accepting or rejecting solutions received
in the ring exchange operation. Line 7 performs the aforementioned collective operation; this
involves each subgroup performing a reduction on their least favourable solutions, using the
communicator Solver_Zone. The communicator refers to all processes in a subgroup, based on
the instruction
MPI_Comm_split(COMM, Solver_ProcID / ZONE_SIZE, 0, &Solver_Zone);
which partitions the set of all processes, such that processes with equal Solver_ProcID /
ZONE_SIZE share the same subgroup. The function reduce_minimal_spin_vector() is itself
based on the MPI_Allreduce() operation, using reduction_function() as a custom reduction operator.
The frequency of reduction is controlled by the value of the constant ZONEEXBLOCK.
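Since reduction_function() already has the signature of an MPI user-defined operator, the zone-wide reduction can be pictured as follows. This is a sketch rather than the implemented routine: it assumes Type_Array is a committed datatype covering one solution vector and that the operator may be declared commutative.

#include <mpi.h>

/* Implemented elsewhere: copies the energetically minimal of the two
 * vectors into inoutvec (cf. copy_min() in Chapter 4). */
extern void reduction_function(void *invec, void *inoutvec,
                               int *len, MPI_Datatype *datatype);

static void reduce_minimal_spin_vector_sketch(void *vec, MPI_Datatype Type_Array,
                                              MPI_Comm Solver_Zone)
{
    MPI_Op op;

    MPI_Op_create(reduction_function, 1 /* commutative */, &op);
    /* Every process contributes its least favourable solution; all members
     * of the zone end up holding the energetically minimal one. */
    MPI_Allreduce(MPI_IN_PLACE, vec, 1, Type_Array, op, Solver_Zone);
    MPI_Op_free(&op);
}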
After the optimisation loop has terminated, the function find_ground_states() performs a
number of operations to finalise the optimisation, such as determining the most favourable solution
held hitherto in the solution sets among processes. The obtained configuration data are copied to
the spins field of the spin glass data structure, and the solution is output by invoking the function
spinglass_write_spins(). Memory for storing solution vectors is deallocated, following which
MPI communications are terminated.
To complete the description of the harmony search module, it remains to detail the function
which controls the heuristic's termination, get_stabilised_status(). Like the collective operation
used for exchanging solutions between processes, this is based on reduction operations, used to
determine whether the most favourable solutions held by processes have equal energy. This is
achieved with the instructions
compute_lowest_energy(&minEnergy, &minVector);
MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);
if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);
the first of which determines the lowest energy locally, the second the lowest energy globally,
followed by a further reduction to determine whether all processes possess solutions with
energies corresponding to that of the globally most favourable solution. This implements the
termination condition described in Chapter 4.
dp_gstate_finder.c
In Chapter 3, it was established that the ground state energy of the Ising spin glass can be
obtained using an algorithm consisting of nested loops. Based on formulating ground state
energy as a dynamic programming problem, approaches to parallelisation inspired by those used
for matrix/vector multiplication were presented in Chapter 4. The basic O(nm 2^{2m})-time serial
algorithm for computing the ground state energy of the lattice without cyclic boundary conditions
leads to two parallel variants, using a collective communication operation between processes,
or alternatively a cyclic shift operation. The latter was shown to be more memory efficient. To
account for cyclic boundary conditions in more than one dimension, the algorithm is required
to be executed for all configurations of an arbitrary spin row (cf. Theorem 3.3). In the collective
variant, the basic algorithm for systems without cyclic boundary conditions is given by the
pseudocode
Float[] p
Float[] p'
for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
    Float minval := ∞
    for l := 1 to q
        if p[l] + M^{i,i-1}_{k,l} < minval
            minval := p[l] + M^{i,i-1}_{k,l}
    p'[k] := minval
all_gather(p', p)
which is executed n times for an n × m spin lattice, using vector p' as argument p in
successive iterations of the algorithm, and matrices M^{i,i-1} to store interaction energies between
configurations of spin rows i and i-1. The latter are evaluated in the ith iteration of the algorithm.
The all_gather() operation combines the vector distributed among p processors into a single vector.
Upon termination, vector p' contains ground state energies for all configurations of the nth
spin row, from which the ground state energy can be obtained for the entire lattice by determining
the minimum vector component.
As described, the algorithm is capable only of computing the ground state energy; implicit
information on the actual ground state configuration is discarded. To enable this information to
be computed, it is necessary to retain at each iteration of the algorithm the value of l yielding
the assignment p'[k] := minval, for all values of k. This corresponds to retaining the optimal
configuration of row i-1 for each of the q configurations of row i, with 1 < i ≤ n. This requires
a two-dimensional array.
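The following sketch shows how one iteration of the collective variant maps onto MPI, assuming q is divisible by the number of processes and that the interaction terms M(k,l) are supplied by a callback; in the implemented module these terms are computed on demand from the spin glass instance, and the optimal predecessor index l is additionally recorded.

#include <float.h>
#include <mpi.h>

/* One dynamic programming step: each process fills its block of p',
 * then the distributed blocks are combined into pNext on every process. */
static void dp_step_sketch(const double *p, double *pNext, double *localBlock,
                           int q, int rank, int nprocs,
                           double (*M)(int k, int l), MPI_Comm comm)
{
    int block = q / nprocs;          /* configurations owned by this process */
    int k, l;

    for (k = 0; k < block; k++) {
        int kGlobal = rank * block + k;
        double minval = DBL_MAX;
        for (l = 0; l < q; l++) {
            double cand = p[l] + M(kGlobal, l);
            if (cand < minval)
                minval = cand;
        }
        localBlock[k] = minval;
    }
    /* all_gather(p', p): every process obtains the complete vector */
    MPI_Allgather(localBlock, block, MPI_DOUBLE,
                  pNext, block, MPI_DOUBLE, comm);
}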
Module dp_gstate_finder.c implements the basic dynamic programming algorithm, suited
for both serial and parallel execution. Both parallel variants, based on collective and cyclic shift
operations, are implemented. To promote code reuse, this is achieved by using preprocessor
directives for conditional compilation.
Similar to the implementation of harmony search, in addition to the entry function find_ground_states(),
the module consists of six static functions. These are responsible for initialising and finalising
message passing, computing ground state energy, manipulating spin rows and applying the obtained
ground state configuration to the spin glass data structure.
Given the parallel algorithm in either of its variants, a problem the implementation must
address is how to distribute the set of configurations a spin row may assume among processes.
This amounts to distributing the rows of matrices M^{i,i-1} among processes, where each row accounts
for a unique configuration of spin row i. As spins assume binary state, a simple approach
is to represent spin subsystems as bit strings, e.g. assigning spin values +1 → 1, -1 → 0.
Exploiting the fact that processes are addressed using integer numbers in MPI, the bit string
representation can be split into a prefix and a suffix, where the prefix is given by the process
number. For an m-spin subsystem and p processors, prefixes consist of log2(p) bits, suffixes of
m - log2(p) bits. Provided the number of processes is a power of 2, it is possible to enumerate
all possible spin configurations by each process considering its process number prefix, and all
suffixes 0 ≤ k < 2^{m - log2(p)}. This is the approach implemented in dp_gstate_finder.c.
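A sketch of this enumeration scheme is shown below; it assumes the number of processes is a power of two and leaves the mapping of a configuration onto actual spin states to adjust_spin_row().

/* Enumerate the row configurations owned by one process: the log2(nprocs)
 * most significant bits are fixed to the process rank, the remaining
 * suffix bits run through all values. */
static void enumerate_local_configs_sketch(int rank, int nprocs, int m)
{
    unsigned long prefixBits = 0, suffixBits, suffix, conf;

    while ((1ul << prefixBits) < (unsigned long) nprocs)
        prefixBits++;                            /* log2(nprocs) */
    suffixBits = (unsigned long) m - prefixBits;

    for (suffix = 0; suffix < (1ul << suffixBits); suffix++) {
        conf = ((unsigned long) rank << suffixBits) | suffix;
        /* ... apply `conf` to the spin row and evaluate it ... */
        (void) conf;
    }
}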
When find_ground_states() is invoked, the implementation begins by initialising message
passing, following which the function get_minimum_path() is invoked. This is responsible for
initiating a series of further function calls, based on a loop which iterates through each row
in the lattice. After allocating memory for an array *minPath, get_minimum_path() allocates
**minPathConf, the two-dimensional array used to record optimal subsystem configurations.
The aforementioned loop then commences; for each spin row i, the function
get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);
is invoked, which performs the parallel matrix/vector operation previously described in
pseudocode. The arguments are the spin glass data structure to optimise, a memory block corresponding
to vector p, the matrix row to hold the optimal states of row i-1, the current spin
row, and the total number of elements in p. The final argument is used to enforce a particular
configuration of the final spin row. In the absence of cyclic boundary conditions its value is not
significant. Should the spin glass indeed possess cyclic boundary conditions, the loop over spin
rows is repeated for all configurations of this row, and the lowest obtained energy is accepted as
the ground state energy.
Using conditional compilation based on the constant CYCLIC_EXCHANGE, two implementations
of get_optimal_prestates() are provided, to account for both variants of the parallel
algorithm. If CYCLIC_EXCHANGE is left undefined, a further constant USE_MPI allows control
over whether message passing communications are used. If the latter is left undefined, the
optimisation proceeds serially.
Both implementations of get_optimal_prestates() are based on the pseudocode designs previously
discussed, using control flow instructions for dealing with spin rows when i=1, for which
cyclic boundary interactions must be considered. In contrast to the presented pseudocode, the
elements of matrices M^{i,i-1} are not stored explicitly in a data structure. Instead, loop variables
are used to determine matrix elements on demand, which are computed by invoking the functions
defined in spinglass.h on the spin glass instance. To this end, of importance is the function
adjust_spin_row(), which modifies a spin glass instance according to the bit string representation
of a spin row.
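For illustration, adjust_spin_row() can be pictured as the following sketch, which maps bit j of a configuration onto the spin in column j of the given row, using the SpinGlass fields described earlier in this chapter; the real function may additionally have to respect clamped spins.

/* Apply the bit string `conf` to spin row `row` of the lattice:
 * bit j selects UP or DOWN for the spin in column j. */
static void adjust_spin_row_sketch(struct SpinGlass *sg, int row,
                                   unsigned long conf)
{
    int j;

    for (j = 0; j < sg->ySize; j++)
        sg->spins[row * sg->ySize + j] = ((conf >> j) & 1ul) ? UP : DOWN;
}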
The collective implementation of get_optimal_prestates() begins by allocating the array
*minPathNew, which is equivalent to vector p' in the pseudocode, with elements distributed
among processes. Elements of *minPathNew are assigned values based on elements in *minPath
and interaction energies arising from the examined spin rows. Having completed this
evaluation, the distributed vector elements are combined and reassigned to *minPath, using the
instruction
MPI_Allgather(minPathNew, trellisCols/Solver_NProcs, MPI_DOUBLE, minPath, trellisCols/Solver_NProcs, MPI_DOUBLE, COMM);
where trellisCols/Solver_NProcs is the number of vector components stored by each process,
COMM is the global communicator and MPI_DOUBLE is the data type of the vector elements.
Figure 6.2: Schematic of operations performed by get_optimal_prestates() (basic dynamic programming,
collective operations): for each configuration of row i, the optimum states of row i-1 are determined
from minPath, results are written to minPathNew, and the distributed results are then gathered. In
contrast, when using cyclic communications, processes evaluate different configurations of row i-1,
shifting elements in minPath.
A schematic depiction of the optimisation process for a single invocation of get_optimal_prestates()
is shown in Figure 6.2.
Similar in its operation, the realisation of get_optimal_prestates() using cyclic shift operations
between processes distributes vector p' among processes using the array *minPathNew.
However, instead of assigning all components of vector p to each process, these are also distributed,
via *minPath. This requires multiple communication operations as optimisation
progresses for a single spin row. Here, elements in *minPath are examined in parallel by each
process; however, since each only retains a fraction of the components in p, it is necessary to perform
a cyclic shift of data. It turns out that as iteration through elements in *minPath progresses, it is
possible to communicate elements residing at neighbouring processes in advance. This suggests
a nonblocking communication scheme, which is implemented in the software module. The nonblocking
communication scheme utilises MPI_Issend(), MPI_Wait() and MPI_Recv() instructions inserted
into the optimisation loops (cf. Appendix F).
After get_optimal_prestates() has been invoked for all spin rows, it remains to obtain the
ground state energy from *minPath and the corresponding ground state configuration from
**minPathConf. Since the latter stores optimal configurations of preceding spin rows, for each
spin row, the ground state configuration can be recovered. This is achieved by determining the
optimum configuration of the final spin row, and traversing through matrix rows, referencing
preceding subsystem configurations. The function set_optimal_config() performs this activity. It is
invoked by get_minimum_path(), following which the ground state configuration is output using
spinglass_write_spins().
Figure 6.3: Sliding window for improved dynamic programming
dp_gstate_finder_fast.c
In Chapter 3, an improved serial algorithm for computing ground states was presented. In
contrast to the previous algorithm, instead of considering interacting spin rows in the lattice,
subsystems are taken to be positions of a sliding window. This window covers spin rows
horizontally, such that the total number of spins it contains is equal to the number of columns in the
lattice plus one. As with the row-wise approach, optimisation is achieved by comparing adjacent
subsystems. Here, adjacent subsystems are those obtained by advancing the sliding window by
one spin (Figure 6.3).
In Chapter 4, it was suggested that the matrix/vector approach can be used to arrive at an improved
parallel algorithm. As previously, matrices retain interaction energies between adjacent
subsystems. However, as a caveat of the sliding window approach, interacting subsystems must
share spin configurations in the overlapping region between window positions. This means that
for every subsystem configuration, it is only necessary to evaluate interactions with two configurations
of the preceding subsystem.
The module dp_gstate_finder_fast.c implements the improved algorithm for obtaining ground
states, for the lattice without cyclic boundary conditions. Similar in structure to dp_gstate_finder.c,
the module consists of a function get_minimum_path(), which is responsible for performing the
main optimisation. Given a spin glass instance, it proceeds to invoke get_optimal_prestates() in
a loop which iterates through all subsystems in the lattice.
Two main differences arise from the sliding window approach to subsystems. Firstly,
adjusting spin configurations based on bit strings requires a leading spin to be referenced
in the spin lattice, instead of a spin row. For this reason, the module implements the function
adjust_spin_ensemble(), whose arguments include the problem instance and the referential
spin. Secondly, interaction between subsystems involves the energy introduced by a single
spin interacting with its vertical and horizontal neighbours (Figure 6.1(c)). Therefore, function
get_optimal_prestates() utilises the library function spinglass_ensemble_delta().
Invoking get_optimal_prestates() serves the same purpose as previously, namely to record
the optimal energy for subsystems of increasing size, recording configuration data in a two-dimensional array.
Again, this is achieved using an argument *minPath which corresponds to vector p in the pseudocode
algorithm. After the function has returned, this array stores data equivalent to vector
p'. The computation performed by get_optimal_prestates() is shown in Figure 6.4. Here, elements
corresponding to vector p' are computed in parallel, such that interactions between each
corresponding subsystem configuration and the preceding subsystem in both of its two states are
compared. Given the irregular pattern in which elements in *minPath are accessed, the approach
using a collective operation to combine elements of the resulting array *minPathNew is
favourable.
The method of determining which configurations of the preceding subsystem to evaluate involves
manipulating the subsystem's bit string representation. Given a bit string where the most significant
bit describes the leading spin's state, conducting a left arithmetic shift reveals the permissible
configurations of the preceding subsystem (the least significant bit may assume 1 or 0), as sketched
below. Figure 6.4 illustrates bit strings corresponding to subsystem configurations, for a 2 × 2 spin lattice.
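A sketch of this manipulation (variable names are illustrative):

/* Given the current (m+1)-bit window configuration, derive the two
 * permissible configurations of the preceding window: shift left by one
 * bit, mask to the window width, and let the freed least significant bit
 * take both values. */
static void predecessor_configs_sketch(unsigned long conf, int windowBits,
                                       unsigned long *prev0, unsigned long *prev1)
{
    unsigned long mask = (1ul << windowBits) - 1ul;

    *prev0 = (conf << 1) & mask;   /* least significant bit = 0 */
    *prev1 = *prev0 | 1ul;         /* least significant bit = 1 */
}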
Once optimisation has completed, as with dp_gstate_finder.c it remains to restore the ground
state configuration from data stored in **minPathConf. Again, this is achieved using a function
set_optimal_config(). In this case, each row of **minPathConf yields information on the optimum
state of one spin. The final row is used to infer the state of an entire subsystem. The entire
ground state configuration can then be output.
Figure 6.4: Schematic of operations performed by get_optimal_prestates() (improved dynamic
programming), executed on four processors P1 to P4. The problem instance is a 2 × 2 spin lattice; the
diagram shows the bit strings of the window configurations at positions i-1 and i, the vectors minPath
and minPathNew, the determination of optimum states of position i-1 for each configuration of
position i, and the gathering of results held in minPathNew.
Chapter 7
Performance Evaluation
So far, approaches to solving spin glass ground states have been presented. These include
exactly solving methods based on dynamic programming, and the harmony search heuristic.
Both approaches are implemented in software, suited for serial and parallel execution using
MPI. The dynamic programming implementation incorporates two variants, which are referred
to as the basic and improved algorithms. Previous complexity analysis showed that the improved
algorithm requires less run time than its counterpart.
In examining techniques for parallelising these exact and heuristic algorithms, further al-
ternatives were described in Chapter 4. In the case of the dynamic programming algorithms,
approaches based on collective and cyclic communication patterns were given. The latter are
implemented using nonblocking synchronous send operations in MPI. Both collective and cyclic
variants are applicable to the basic dynamic programming algorithm, whereas the improved dy-
namic programming algorithm relies solely on collective communications.
In this chapter, the aforementioned solver implementations are examined in terms of their
performance. Data are presented against varying parameters and interpreted. For the parallel
exact solvers, a comparison is given between attainable performance on the Ness and HPCx
machines.
7.1 Serial performance
In the development process, serial versions of ground state solvers were implemented prior to
their parallel analogues. For the exact algorithms, besides facilitating an incremental develop-
ment strategy, this allowed an initial evaluation of performance, in order to gauge the possible
behaviour of parallel dynamic programming. Similarly, performance data for serial harmony
search were examined, in particular to assess the accuracy of solutions generated by the algo-
rithm.
Figure 7.1: Execution times for serial dynamic programming (basic algorithm); time (s) plotted against the number of spins.
7.1.1 Dynamic programming
Execution time data for serial dynamic programming were gathered on Ness. The experimental
procedure involved invoking both variants of the algorithm on the machine's back-end, against
varying problem sizes. Timing data were recorded using the shell's time command. While
offering limited accuracy and resolution, this method was deemed sufficient, considering the
magnitude of the execution times. The source code was compiled using the gcc compiler, supplying
the -O2 optimisation flag. Random problem instances were generated as square lattice k-spin
systems without cyclic boundary conditions.
Basic algorithm
Results for basic dynamic programming are shown in Figure 7.1. As shown, problem instances
were generated for systems of up to 14² spins. As one would expect, execution time rises monotonically,
such that the recorded time for 14² spins is approximately 42 min. Considering the
observations made in Chapter 3 about the algorithm's asymptotic behaviour, the graph appears
to confirm an exponential relationship between system size and execution time.
To examine run time behaviour more closely, the data are visualised as a logarithmic plot
(Figure 7.2). Here, it is apparent that execution time cannot be accurately approximated with
the function f(k) = α e^{βk}, since ln(f(k)) = ln(α) + βk, which corresponds to a straight line. Also,
the plot shows near-constant values for the first three data points. This is likely to result from
limited timing resolution.
In Chapter 3, the algorithm's asymptotic complexity was shown to be O(√k · 2^{2√k}) for a
square lattice k-spin system without cyclic boundary interactions.
Figure 7.2: Log execution times for serial dynamic programming (basic algorithm); ln(time) plotted against the number of spins, together with the curve fit.
From this fact, it is clear
that a more accurate model of execution time must consider an exponential relationship with
respect to the square root of the system size. The function f(k) = α e^{β√k} is thought to be an adequate
approximation.
Figure 7.2 includes a fit of the function ln(f(k)) = ln(α) + β√k to the log-plotted data points.
The first three data points are excluded from the fit. The fit was obtained using the Marquardt-Levenberg
algorithm implemented in Gnuplot. With asymptotic standard errors of 0.9365%
and 0.8656% respectively, values of α = 1.77111 × 10⁻⁶ and β = 1.50197 were computed. The
value β/ln(2) = 2.1667 bears similarity to the theoretical value of 2 in the exponential term of the
algorithm's asymptotic complexity. The greater value may be attributed to the approximation using
a constant α.
Improved algorithm
Results for improved dynamic programming are shown in Figure 7.3. Here, problem instances
were generated in the range of k = [4, 361] spins. Comparison with Figure 7.1 reveals that,
as expected, execution times are lower. As a practical advantage, this allowed the algorithm's
performance to be evaluated against larger problem instances during experimentation.
A log plot of these data is shown in Figure 7.4. As before, this representation reveals near-constant
execution time for the first data points in the series. A unique feature is the data point
at k = 49, which is an outlier in what appears to be another exponential curve against √k. It is
speculated that the outlier is due to caching effects: the Opteron 1218 processor on Ness has
a 64KiB L1 data cache, which is likely to be sufficient for containing the optimisation data held in
Figure 7.3: Execution times for serial dynamic programming (improved algorithm); time (s) plotted against the number of spins.
Figure 7.4: Log execution times for serial dynamic programming (improved algorithm); ln(time) plotted against the number of spins, together with the curve fit.
Figure 7.5: Memory consumption for serial dynamic programming (basic algorithm); resident memory consumption (KiB) plotted against the number of spins.
**minPathConf and *minPath (cf. Chapter 6): the former requires 6 · 7 · 2⁸ · 4 bytes = 42 KiB,
the latter 2⁸ · 4 bytes = 1 KiB. The spin glass data structure is estimated to require less than
1 KiB, yielding a total of less than 64 KiB (considering the size of additional memory blocks).
Fitting the log plot to the function used for analysing basic dynamic programming, ln(f(k)) =
ln(α) + β√k, allows further comparison of the two algorithms. Using the same procedure for
producing the fit, the obtained values are α = 1.0845 × 10⁻⁵ and β = 1.2275, with asymptotic standard
errors of 0.8924% and 0.9401%, respectively. The value of β is close to the theoretical value
of 1 in the exponential term of the algorithm's complexity function; compared to basic dynamic
programming, execution time is observed to grow at a slower rate, as expected.
Memory consumption
Brief experiments were conducted to assess memory consumed by the dynamic programming
implementations. Considering resident memory values, as reported by the top process utility,
data were recorded by initiating computation using increasingly large problem sizes. For both
algorithms, as allocated memory remains constant for the majority of computation, it was not
necessary to execute until termination.
Plots of memory consumption are shown in Figures 7.5 and 7.6. For basic dynamic programming,
the data reveal that to avoid swapping on a machine with 4GiB of memory (e.g. Ness), the maximum
problem size is a 24 × 24 spin lattice. With improved dynamic programming, the maximum problem
size decreases to 19 × 19 spins. This behaviour is expected, since **minPathConf contains
O(√k · 2^{√k}) vs. O(k · 2^{√k}) elements, for a k-spin square lattice.
Figure 7.6: Log memory consumption for serial dynamic programming (basic algorithm); ln(resident memory consumption in KiB) plotted against the number of spins, together with the curve fit.
Again using a log plot approach (Figures 7.7, 7.8), the memory data are fit to the function
f(k) = α k 2^{√k/β}, whose logarithm is ln(α) + ln(k) + (√k/β) ln(2). For basic dynamic programming,
the obtained values are α = 9.46851 and
β = 40.42 (asymptotic standard errors 1.401% and 1.924%, respectively). The values for improved
dynamic programming are α = 6.76659 and β = 27.1801 (asymptotic standard errors
2.092% and 2.844%). Comparing the two values of β, it is apparent that between the two variants
of dynamic programming, there exists a trade-off between execution time and memory
efficiency: in terms of execution time, improved dynamic programming is preferable, whereas
for memory consumption, the basic algorithm is preferable.
7.1.2 Harmony search
Serial harmony search was evaluated by comparing solutions generated by the heuristic to ground truth, based on a 6 × 6 spin problem instance with equally distributed bonds in the range [−1, 1). Ground truth was obtained by conducting an exhaustive search on the problem instance. While varying the number of solution vectors used, the search was executed multiple times. Results were used to compute minimum error, mean error and standard error values. Totalling 80 executions for each value of NVECTORS, results are presented in Table 7.1.
As shown, standard and mean error values improve monotonically when increasing algorithm memory capacity. No improvement in error rate is given when increasing memory to NVECTORS = 50; the algorithm's ability to find the exact ground state decreases under the specified parameter value. Despite this, µ_e and σ_e suggest that large NVECTORS benefits solution quality in general. This is in agreement with the behaviour of solution exploration described in Chapter 3. Exploring the algorithm's behaviour against large NVECTORS is indeed the motivation behind developing parallel harmony search.
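For illustration only, a minimal sketch of how such run statistics can be summarised against a known ground state energy; the function name, the tolerance for an "exact" hit, and the use of the standard error of the mean are assumptions rather than details of the project code:

#include <math.h>
#include <stdio.h>

/* Illustrative: summarise repeated heuristic runs against ground truth. */
void summarise_runs(const double *energies, int nRuns, double groundTruth)
{
    double sum = 0.0, sumSq = 0.0;
    int exact = 0;

    for (int i = 0; i < nRuns; i++) {
        double err = energies[i] - groundTruth;   /* energetic aberrance of run i */
        sum   += err;
        sumSq += err * err;
        if (fabs(err) < 1e-9)                     /* count as an exact ground state hit */
            exact++;
    }

    double mean = sum / nRuns;
    double var  = sumSq / nRuns - mean * mean;    /* population variance of the errors */

    printf("mean error %.4f, standard error %.4f, error rate %.2f\n",
           mean, sqrt(var > 0.0 ? var : 0.0) / sqrt((double) nRuns),
           (double) exact / nRuns);
}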
Figure 7.7: Memory consumption for serial dynamic programming (improved algorithm) [plot residue removed: resident memory consumption (KiB) against number of spins]
Figure 7.8: Log memory consumption for serial dynamic programming (improved algorithm) [plot residue removed: ln(resident memory consumption in KiB) against number of spins, with curve fit]
              NVECTORS = 1   NVECTORS = 2   NVECTORS = 10   NVECTORS = 50
µ_e           1.84           1.55           0.97            0.83
σ_e           0.83           0.77           0.77            0.61
error rate    0.06           0.10           0.14            0.10

Table 7.1: Mean error µ_e, standard error σ_e and error rate of serial harmony search ground states for increasing solution memory NVECTORS. Results are based on the ground truth value −30.7214. Error rate is defined as the number of correctly obtained ground state configurations over the total number of algorithm invocations.
Optimisation flags                                     Execution time
-O0                                                    10.682s
-O1                                                    10.542s
-O2                                                    6.354s
-O3                                                    6.340s
-O3 -funroll-loops                                     4.043s
-O3 -funroll-loops -ftree-loop-im                      4.043s
-O3 -funroll-loops -ftree-loop-im -funswitch-loops     4.043s

Table 7.2: Serial execution times for basic dynamic programming on Ness, for various GCC 4.0 optimisation flags
7.2 Parallel performance
The architecture of the Ness and HPCx machines was described in Chapter 5. In the following,
the method and results of performance assessment are presented for the implemented parallel
algorithms. As with the serial algorithms, results are interpreted.
7.2.1 Dynamic programming
Since the dynamic programming algorithms are deterministic, the opportunity is given to assess parallel performance in terms of execution time. That is, given parallel execution time T_p on p processors, and serial execution time T_s, it is possible to describe performance in terms of parallel efficiency T_s/(T_p · p).
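As an illustration with hypothetical numbers (not taken from the measurements below): if T_s = 1000 s and T_16 = 69 s on p = 16 processors, the parallel efficiency is 1000/(69 · 16) ≈ 0.91, i.e. 91%.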
In preparation for experiments on Ness, serial execution time was measured against various combinations of gcc compiler flags, based on the basic dynamic programming algorithm and an 11 × 11 test spin problem. Using the -O3 optimisation level with the flag -funroll-loops for automated loop unrolling offered the greatest gain in performance over unoptimised code. Timing data are shown in Table 7.2. This behaviour is not surprising, since the code is heavily reliant on loops for processing spin glass data structures. In contrast, rudimentary analysis of the source code reveals few cases where performance would likely benefit from loop-invariant motion (pertaining to other optimisation flags used).
On HPCx, the same test spin problem was used to assess execution time on the machine's serial job node. Here, the effect of target architecture optimisation was considered, using the xlc_r re-entrant compiler, version 8.0. For all tests, 64-bit compilation was enabled using the -q64 flag. Timing data are listed in Table 7.3. The set of compiler flags used for parallel performance evaluation was -qhot -qarch=pwr5 -O5 -Q -qstrict.
The parallel environment on HPCx allows control over a number of settings [3], potentially influencing distributed application performance. Specifically, the settings affect the protocol used for communicating between shared memory nodes, including use of remote direct memory
Optimisation flags                         Execution time
-g                                         91.50s
-qhot -qarch=pwr4 -O3                      19.99s
-qhot -qarch=pwr4 -O4                      19.29s
-qhot -qarch=pwr5 -O4                      19.23s
-qhot -qarch=pwr5 -O5                      18.38s
-qhot -qarch=pwr5 -O5 -Q                   18.26s
-qhot -qarch=pwr5 -O5 -Q -qstrict          17.95s

Table 7.3: Serial execution times for basic dynamic programming on HPCx, for various xlc optimisation flags
Communications directive    Execution time
US, bulkxfer                12.68s
US                          14.12s
IP, bulkxfer                14.32s
IP                          14.26s

Table 7.4: Results for parallel basic dynamic programming on HPCx using 32 processors, for combinations of user space (US) or IP communications in conjunction with the bulkxfer directive
access (RDMA). Using requested processor counts of 32 to examine the effect of inter-node communications, timing data were recorded using combinations of two LoadLeveler directives, network.MPI and bulkxfer. The former is responsible for defining the aforementioned protocol, while the latter controls direct memory access. Table 7.4 shows results obtained for message passing using IP and user space protocols, in conjunction with RDMA. Given these results, it was decided to utilise the user space protocol for computations involving multiple nodes.
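For reference, both directives are set in the LoadLeveler job command file. The following is only an illustrative sketch of where they appear; the exact keyword values used on HPCx are not reproduced here and the other directives shown are assumptions:

# @ job_type    = parallel
# @ total_tasks = 32
# @ network.MPI = sn_all,not_shared,US    # user space protocol; "IP" selects IP mode
# @ bulkxfer    = yes                     # enable RDMA bulk transfers
# @ queue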
Performance on Ness
Results for basic dynamic programming on Ness are shown in Figures 7.9, 7.10, 7.11. From Figure 7.9, it is apparent that execution time generally diminishes as the processor count is increased, with the exception of the smaller 10 × 10 and 11 × 11 spin instances. The log plot indicates that for larger problem instances, execution time becomes increasingly linear in relation to the number of processors. This is in agreement with the theoretical O((√k/p) · 2^(2√k)) execution time for a k-spin square lattice.
The parallel efficiency plot in Figure 7.10 confirms near-linear scaling for large problem instances. Good scalability is observable for problem instances larger than 13 × 13 spins, for which efficiency exceeds 90% on 16 processors. Efficiency drops approximately linearly for instances larger than 11 × 11 spins. As an extreme case, performance drops sharply for the
Figure 7.9: Parallel execution time for dynamic programming (basic algorithm, Ness) [plot residue removed: time (s) against processors, curves for 10×10 to 15×15 spin lattices]
Figure 7.10: Parallel efficiency for dynamic programming (basic algorithm, Ness) [plot residue removed: parallel efficiency against processors, curves for 10×10 to 15×15 spin lattices]
Figure 7.11: Vampir trace summary for dynamic programming (basic algorithm, Ness) [plot residue removed: application time / total execution time against processors, curves for 10×10 to 15×15 spin lattices]
10 × 10 instance, at a rate decreasing against p.
To interpret these results, it is reminded that the basic dynamic programming algorithm requires a sequence of √k blocking, collective gather operations to complete computation. For each of these operations, each processor contributes 2^√k elements. After the ground state energy has been obtained from array *minPath, the ground state configuration is recovered from **minPathConf through a similar sequence of √k gather operations.
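As an illustration of this communication pattern only (a minimal sketch: the array names, the assumption that the 2^√k row states divide evenly over p processes, and the use of MPI_Allgather are not taken from the project code):

#include <mpi.h>

/* Illustrative sketch: one blocking collective gather per spin row.
 * Each process fills a contiguous block of the per-row path costs,
 * after which every process receives the complete row. */
void gather_row_costs(int rows, long statesPerRow, double *localBlock,
                      double *fullRow, MPI_Comm comm)
{
    int p;
    MPI_Comm_size(comm, &p);
    long blockSize = statesPerRow / p;      /* assumes divisibility by p */

    for (int r = 0; r < rows; r++) {
        /* ... local dynamic programming step filling localBlock ... */
        MPI_Allgather(localBlock, (int) blockSize, MPI_DOUBLE,
                      fullRow,    (int) blockSize, MPI_DOUBLE, comm);
    }
}

Here rows corresponds to the √k spin rows and statesPerRow to the 2^√k row configurations mentioned above.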
Clearly, scalability is affected by the size of problem instances, since this influences the amount and size of messages sent between processors. If the cost of a single collective gather is approximated as t_gather = p · (T_0 + m/B), where p is the number of processors, T_0 the message initialisation cost, m the message size and B the bandwidth, it follows that for constant message size, overall cost relates linearly to p. This serves as a possible explanation for the linear reduction in parallel efficiency observed for the majority of problem instances in Figure 7.10.
The increase in efficiency for larger problem instances can then be attributed to the fact that computing the ground state energy requires (1/p) · m² operations per processor (cf. Chapter 4). Consequentially, for constant p, the fraction m/(m²/p) = p/m diminishes as m is increased; communication costs thus become less significant as the problem size increases. It is speculated that the 10 × 10 spin lattice causes severe imbalance between communication and computation, so that the amount of computation is closely approximated by a constant, regardless of p.
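To make the argument explicit (this is only a restatement of the approximations above, with m denoting the per-row message size):

t_comm / t_comp ≈ p · (T_0 + m/B) / (m²/p) = p² · T_0 / m² + p² / (B · m),

which, for fixed p, decreases as m grows, while for fixed m it grows roughly quadratically in p.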
Figure 7.11 shows the fraction T_c/T_m of parallel computation time over communication time. These data were gathered by re-linking compiled source code with the Vampir library and recording summary data as reported by the application's trace utility. Time spent on tracer API calls is omitted. As a general trend, it is observed from the plot that increasing the number of
Figure 7.12: Parallel execution time for dynamic programming (basic algorithm, cyclic communications, Ness) [plot residue removed: time (s) against processors, curves for 10×10 to 15×15 spin lattices]
processors does indeed increase the proportion of time spent on communication. For the 14 × 14 and 15 × 15 lattices, T_c/T_m does not decrease monotonically with p. This may be due to the accuracy of the trace data, which indicate a non-monotonic relation between lattice sizes and scalability.
Having examined the performance of basic dynamic programming using collective operations, a similar procedure is given for the approach based on cyclic communication. In Figures 7.12, 7.13, 7.14, plots of execution time, parallel efficiency and the fraction T_c/T_m are shown. From Figure 7.12, it is again observed that increasing the processor count causes execution time to diminish, with the exception of the 10 × 10 lattice. For the latter, performance appears to degrade more profoundly than with the collective variant of the algorithm, to the extent that execution time on 16 processors exceeds that obtained for a single processor. For larger processor counts and the remaining problem instances, performance appears to degrade uniformly; this effect is shown more clearly in Figure 7.13. Here, parallel efficiency fluctuates in the range of [1, 4] processors, before decreasing monotonically for each examined problem instance. Significantly, scalability does not improve monotonically as lattice size is increased. Nevertheless, it is possible to group problem instances into two categories, such that the smaller 10 × 10 and 11 × 11 lattices result in parallel efficiency in the range [.4, .5] on four processors, with the remainder attaining [.8, .99] efficiency. Increasing the processor count to 16, parallel efficiency drops to [.4, .5] and [.01, .2] for the respective groups. From Figure 7.14, it is observed that communication costs become significant for all problem sizes as the processor count increases: for p = 16, the fraction T_c/T_m lies in the range [.4, .5] for all examined lattices, except the 10 × 10 lattice, for which the fraction is further diminished due to communication costs.
Figure 7.13: Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, Ness) [plot residue removed: parallel efficiency against processors, curves for 10×10 to 15×15 spin lattices]
Figure 7.14: Vampir trace summary for dynamic programming (basic algorithm, cyclic communications, Ness) [plot residue removed: application time / total execution time against processors, curves for 10×10 to 15×15 spin lattices]
Figure 7.15: Parallel execution time for dynamic programming (improved algorithm, Ness) [plot residue removed: time (s) against processors, curves for 10×10 to 22×22 spin lattices]
Comparing the two variants' performance, it is observed that using collective communications reduces execution time on few processors. This suggests that in this case, collective communication costs are less expensive than cyclic operations. Also, it is reminded that the cyclic variant of the algorithm requires additional conditional statements, which increases the number of branch instructions in the code. Scalability is significantly reduced, indicating that problem instances significantly larger than 15 × 15 spins are required to obtain favourable efficiency at p > 16 processors. It is possible that sufficiently large problem instances might expose the cyclic approach as advantageous; these are however not explored due to restricted experimental time scales. For the examined problem sizes, reduced scalability is thought to be influenced by synchronisation overhead, such that the amount of computation within the nested loops† is not sufficient to merit overlapping communications.
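A minimal sketch of the kind of overlapped cyclic exchange referred to here (illustrative only; buffer names, the ring schedule and the placement of the computation are assumptions, not the project code):

#include <mpi.h>

/* Illustrative: shift a block of path costs around a process ring with
 * non-blocking communication, so local computation can proceed while
 * the transfer is in flight. */
void ring_shift_step(double *sendBlock, double *recvBlock, int blockSize,
                     MPI_Comm comm)
{
    int rank, p;
    MPI_Request reqs[2];

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int next = (rank + 1) % p;
    int prev = (rank + p - 1) % p;

    MPI_Isend(sendBlock, blockSize, MPI_DOUBLE, next, 0, comm, &reqs[0]);
    MPI_Irecv(recvBlock, blockSize, MPI_DOUBLE, prev, 0, comm, &reqs[1]);

    /* ... compute on data already held locally while the transfer proceeds ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}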
Results for improved dynamic programming executed on Ness are shown in Figures 7.15, 7.16, 7.17. For all examined problem instances, parallel execution times behave similarly as observed for the 10 × 10 lattice using basic dynamic programming: here, increasing the processor count causes performance to degrade severely for smaller lattices, such that parallel efficiency drops to around 20% at p = 4 processors. Larger lattices result in slightly enhanced parallel efficiency, however increasing to p = 16 causes near-uniform degradation to around 10%. Figure 7.17 shows performance degradation from the perspective of computation and communication time. The fraction T_c/T_m behaves as expected in relation to Figure 7.16, indicating that performance degradation is due to communication costs. In comparison to basic dynamic programming using cyclic communications, the effect of increasing processors is further pronounced,
† cf. Chapter 4
Figure 7.16: Parallel efficiency for dynamic programming (improved algorithm, Ness) [plot residue removed: parallel efficiency against processors, curves for 10×10 to 22×22 spin lattices]
Figure 7.17: Vampir trace summary for dynamic programming (improved algorithm, Ness) [plot residue removed: application time / total execution time against processors, curves for 10×10 to 22×22 spin lattices]
Figure 7.18: Parallel execution time for dynamic programming (basic algorithm, HPCx) [plot residue removed: time (s) against processors, curves for 11×11 to 16×16 spin lattices]
such that T_c/T_m is reduced to under 20% at 16 processors.
Comparing basic and improved variants of the algorithm, it appears there exists a trade-off between scalability and algorithmic complexity. Whereas basic dynamic programming has higher algorithmic complexity, results show favourable scalability up to 16 processors. In contrast, improved dynamic programming is a more efficient algorithm in terms of complexity, however scalability is considerably diminished on Ness for the examined problem sizes. A possible explanation for this behaviour is provided by the number of communication operations, which is O(k) for the improved variant, versus O(√k) required for the basic variant, for a k-spin lattice. Given that communication takes place every O(2^(2√k)) instructions, versus every O(2^√k) instructions, for basic (collective) and improved algorithms respectively, it is clear that the ratio of computation against communication is lower for the improved algorithm. Since communications are non-blocking in both cases, it follows that for improved dynamic programming, a greater proportion of execution time is due to communication operations. As a consequence, this reduces scalability.
Performance on HPCx
Plots of performance data on HPCx for basic dynamic programming using collective communications are shown in Figures 7.18, 7.19. Because of how the machine's resources are grouped into logical partitions, and their implication for time budgeting, the processor count was scaled as 16 · 2^n, albeit to greater magnitude than on Ness. For small problem sizes, behaviour is as observed on Ness, where increasing the processor count effects little improvement in execution time. Scalability improves as problem size is increased, to the extent that parallel efficiency is
Figure 7.19: Parallel efficiency for dynamic programming (basic algorithm, HPCx) [plot residue removed: parallel efficiency against processors, curves for 11×11 to 16×16 spin lattices]
greater than 95% for lattices with 15 × 15 and 16 × 16 spins, solved on 256 processors. A distinct feature is observed for the 15 × 15 lattice, where superlinear speedup appears to occur in the range of [16, 128] processors.
In Figures 7.20, 7.21, results for the algorithm variant using cyclic communications are shown. In comparison to the collective approach, performance again improves as problem size is increased. However, the obtained parallel efficiency is around 60% at 256 processors, for a 16 × 16 spin lattice. This decline in performance is similar to that observed on Ness. In contrast, on HPCx, increasing parallel efficiency reflects the ordering of problem sizes more accurately. Fluctuations observed on Ness are not present; for all examined problem instances execution time decreases monotonically against the number of processors. As with the collective variant, parallel efficiency obtained for the 15 × 15 lattice exceeds that for the 16 × 16 lattice on 16 and 32 processors. In contrast, scaling performance is not sufficient for superlinear speedup, as previously noted.
Results for improved dynamic programming on HPCx are shown in Figures 7.22, 7.23. Here, performance drops rapidly for all explored problem sizes, such that executing on 16 processors reduces parallel efficiency to below 50%. Increasing the number of processors, efficiency tails off further; at 256 processors, it is less than 10%. Significantly, in resemblance to the aforementioned results, the largest examined problem instance does not result in the most scalable computation: the 22 × 22 lattice falls behind the 18 × 18 and 20 × 20 instances in terms of parallel efficiency. This phenomenon is observed for all evaluated processor counts.
Concluding from the performance data on HPCx, the three algorithm variants exhibit varying degrees of scalability. From most to least scalable, the algorithms are ordered as:
Figure 7.20: Parallel execution time for dynamic programming (basic algorithm, cyclic communications, HPCx) [plot residue removed: time (s) against processors, curves for 11×11 to 16×16 spin lattices]
Figure 7.21: Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, HPCx) [plot residue removed: parallel efficiency against processors, curves for 11×11 to 16×16 spin lattices]
Figure 7.22: Parallel execution time for dynamic programming (improved algorithm, HPCx) [plot residue removed: time (s) against processors, curves for 12×12 to 22×22 spin lattices]
Figure 7.23: Parallel efficiency for dynamic programming (improved algorithm, HPCx) [plot residue removed: parallel efficiency against processors, curves for 12×12 to 22×22 spin lattices]
Figure 7.24: Summary of parallel efficiencies on HPCx [plot residue removed: parallel efficiency against processors for a 16×16 lattice, comparing improved DP, basic collective DP and basic cyclic DP]
1. Basic algorithm using collective communications
2. Basic algorithm using cyclic communications
3. Improved algorithm using collective communications
This ordering is as observed on Ness; however, problem scalability is higher on HPCx for each of the variants. This is attributed to lower communication costs on HPCx, resulting from the higher message passing bandwidth available on the machine. A summary of the algorithms' parallel efficiency on HPCx is shown in Figure 7.24, based on a 16 × 16 lattice.
7.2.2 Harmony search
The parallel harmony search algorithm introduced in Chapter 4 is based on a combination of two types of communication operation. Considering additional algorithm parameters, the algorithm exhibits a high degree of flexibility; this leads to a potentially large set of algorithm variants. The latter must be considered when examining performance. To restrict the space of algorithm variants, it was decided to confine the behaviour of communication operations: cyclic operations are based on exchanging random solution vectors between processes, such that favourable solutions are retained, while collective operations take place between process groups of specified size. Cyclic operations are executed every iteration of the harmony search algorithm, while collective operations are executed periodically.
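A minimal sketch of this communication schedule (illustrative only; the parameter names follow the text, but the buffer names, the choice of a broadcast from the best-ranked process, and the surrounding control flow are assumptions rather than the project code):

#include <mpi.h>

/* Illustrative schedule: a ring exchange every iteration, plus a collective
 * exchange within a subgroup communicator every zoneExBlock iterations. */
void communication_step(long iter, long zoneExBlock,
                        double *randomSolution, double *recvSolution, int vecLen,
                        double *bestSolution, double bestEnergy,
                        MPI_Comm world, MPI_Comm zone)
{
    int rank, p;
    MPI_Comm_rank(world, &rank);
    MPI_Comm_size(world, &p);

    /* Cyclic operation: pass one randomly chosen solution to the next process
     * in the ring and receive one from the previous process. */
    MPI_Sendrecv(randomSolution, vecLen, MPI_DOUBLE, (rank + 1) % p, 0,
                 recvSolution,   vecLen, MPI_DOUBLE, (rank + p - 1) % p, 0,
                 world, MPI_STATUS_IGNORE);
    /* ... keep recvSolution only if it improves on the worst stored vector ... */

    /* Collective operation within the process subgroup. */
    if (iter % zoneExBlock == 0) {
        struct { double energy; int rank; } local, best;
        int zoneRank;
        MPI_Comm_rank(zone, &zoneRank);
        local.energy = bestEnergy;
        local.rank   = zoneRank;
        MPI_Allreduce(&local, &best, 1, MPI_DOUBLE_INT, MPI_MINLOC, zone);
        /* Distribute the lowest-energy vector found within the subgroup. */
        MPI_Bcast(bestSolution, vecLen, MPI_DOUBLE, best.rank, zone);
    }
}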
The question arises how to assess the heuristic's parallel performance. For a deterministic algorithm, such as the exact dynamic programming based solver, performance is characterised
Figure 7.25: Conceptual representation of properties relevant to parallel performance [diagram residue removed: (a) non-heuristic, time against processors; (b) heuristic, time and accuracy against processors]
by scalability. Scalability is quantified in terms of the algorithm's execution time against the number of processors on which it is executed. From the latter, measures such as speedup and parallel efficiency can be computed. This leads to a two-dimensional space (Figure 7.25(a)), which may be explored experimentally; for a given problem size, it may for example be of interest to approximate the function which maps the number of processors to execution time. In the case of heuristic algorithms, however, an additional dimension is significant for characterising performance, namely the accuracy of generated solutions. As a result, the space in which performance is evaluated is three-dimensional (Figure 7.25(b)). Experimental exploration may involve assessing the relation between accuracy and execution time, for a given number of processors. Another possibility might involve approximating the boundary surface in the space, provided such a surface exists.
From the discussion in Chapter 3, it is evident that quantifying solution accuracy is non-trivial: it is necessary to define a measure to compare solutions with one another. An obvious approach is to use the utility function, if defined by the heuristic. However, it might prove advantageous to employ a measure more reflective of the problem's solution landscape, for example considering the distribution of solution utility values.
In the following description of an attempt at performance evaluation, parallel harmony search was executed on a number of test instances, while varying the number of processes and a selection of algorithm parameters. As previously explained, the algorithm possesses a significant number of parameters. Given the specified communication strategies, these include the number of solution vectors NVECTORS, the memory choosing rate, and the rate of performing collective operations ZONEEXBLOCK.
Experiment series are based on three lattice sizes of 12 × 12, 14 × 14 and 16 × 16 spins. For each size, five instances were generated, using random uniform bond distributions in the range [−1, 1). The procedure for every configuration of parameters and process count involved
executing the algorithm on each lattice instance five times. Result data were then collected and mean values computed. A single data point used in visualisation corresponds to the mean result obtained for a given lattice size instance.
Evidently, using several problem instances multiplies the number of times the parallel algorithm must be invoked. As a compromise to reduce the number of invocations, the two parameters NVECTORS and the memory choosing rate were held constant. More importantly, the three-dimensional space to explore is adapted, such that execution time is replaced by the number of loop iterations executed by harmony search. This is thought to better reflect the performance property of state space exploitation, described in Chapter 3. An advantage of the parallel algorithm's design is that it terminates when all processes hold identical solution vectors (cf. Chapter 4). Consequentially, the aforementioned performance property can be seen as a dependent variable reflecting solution exploitation, which need not be considered when permuting algorithm parameters. Effectively, this allows performance assessment to be divided between exploring the relations number of processes against accuracy and number of processes against algorithm iterations.
Experiments were carried out on Ness, using up to 16 processors. The size of processor subgroups ZONESIZE was varied in the range [1, 16], so that the number of processors lies in the range [ZONESIZE, 16] for each experiment. The parameter ZONEEXBLOCK was variably assigned the values 10², 10³ and 10⁴. For each lattice instance, solution accuracy was characterised in terms of energetic aberrance from ground truth data obtained using dynamic programming. Also, solution configurations were compared using the Hamming distance [35]†. Finally, the number of algorithm iterations was recorded.
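A minimal sketch of such a comparison (illustrative only; as noted in the footnote below, the implementation also considers the complement of a configuration, which the helper mirrors by reporting the smaller of the two distances — valid because, with only pairwise couplings and no external field, a configuration and its global spin flip have the same energy):

/* Illustrative: Hamming distance between two Ising configurations,
 * accounting for the global spin-flip symmetry. */
int spin_hamming_distance(const int *a, const int *b, int nSpins)
{
    int differ = 0;
    for (int i = 0; i < nSpins; i++)
        if (a[i] != b[i])
            differ++;

    /* distance to b or to the complement of b, whichever is smaller */
    return differ < nSpins - differ ? differ : nSpins - differ;
}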
Performance results
In Figure 7.27, performance data for ZONEEXBLOCK = 10² are shown, against varying processor numbers, lattice sizes and ZONESIZE. Quantitatively, the plot corresponds to the series of experiments where collective operations are performed frequently among processes. As the algorithm is defined, solutions are exchanged at a constant rate between process groups. The latter however vary in size with the parameter ZONESIZE, as previously mentioned. Given a subgroup size, the smallest collection of processes consists of a single subgroup; in general the processor count must be a multiple of ZONESIZE. For this reason, curves in the plot vary in length. As an example of reading the plot, consider the s16 curves which range from 4 to 16 processes. These correspond to invoking the algorithm with a subgroup size of 4. As a special case, for each plot there exist two curves per lattice size in the range [1, 16]. These correspond to subgroup sizes of 1 and 2.
Figure 7.27 describes ΔE, the difference between ground truth and mean solution energies,
† The implemented algorithm takes the complement of spin configurations into account, where all spin states are inverted.
Figure 7.26: Parallel harmony search convergence durations (ZONEEXBLOCK = 100) [plot residue removed: iterations against processors, curves for lattice series s12, s14, s16]
Figure 7.27: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 100) [plot residue removed: Delta E against processors, curves for lattice series s12, s14, s16]
Figure 7.28: Parallel harmony search convergence durations (ZONEEXBLOCK = 1000) [plot residue removed: Delta E against processors, curves for lattice series s12, s14, s16]
against the number of processors p. On initial consideration, it is observed that increasing the processor count reduces aberrance in some cases: accuracy for one of the 16 × 16 spin lattice series improves from around −160 to −60 at 16 processors. It turns out that this series corresponds to the parameter value ZONESIZE = 1. Similar improvements occur for the 12 × 12 and 14 × 14 lattices, from −120 and −85 to −35 and −17, respectively. However, increasing ZONESIZE to 2 effects an increase in solution accuracy in all cases, such that little improvement in accuracy is observed when increasing p.
Comparing Figures 7.27, 7.28, 7.29 allows insight to be gained into the effect of increasing the frequency of collective exchanges within processor subgroups. For increasing ZONEEXBLOCK, the effect of p becomes less significant: with the exception of experiment series conducted for ZONESIZE = 1, all processor counts yield energetic aberrances in the approximate range [−20, −10]. For ZONESIZE = 1, behaviour is consistent for all values of ZONEEXBLOCK, to the extent that increasing p effects a significant increase in solution accuracy, as observed for ZONEEXBLOCK = 10².
From the previous observations, two conclusions can be drawn with regard to solution exploration. Firstly, it appears that increasing the value of ZONEEXBLOCK causes solution exploration to improve, given that accuracy as characterised by ΔE improves. This is in agreement with the assumption made in Chapter 4, where solution exploration and exploitation were described as opposing qualities in the search process. Assuming that collectively exchanging solutions benefits solution exploitation, an obvious consequence of reducing the frequency of this operation is increased solution accuracy. Secondly, from the increase in solution accuracy between subgroups sized 1 and 2, it is concluded that, contrary to prior expectation, the ring-based
Figure 7.29: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 1000) [plot residue removed: Delta E against processors, curves for lattice series s12, s14, s16]
scheme of exchanging solutions contains an element of solution exploitation. In increasing the size of subgroups, more opportunity is evidently given for diverse solution islands, since there exist processes only participating in infrequent collective operations. A possible explanation for the increase in accuracy against p is the circumference of the ring in which processes exchange solutions: for large circumferences, propagating a solution across the ring becomes increasingly involved. This also improves solution diversity.
Figures 7.26, 7.30, 7.31 show performance results in terms of algorithm iterations until convergence. The scheme is identical to that used to visualise solution aberrance. In Figure 7.26, results for ZONEEXBLOCK = 100 (where collective operations occur frequently) show that increasing p above ZONESIZE causes a reduction in execution time for all lattice and process subgroup sizes. As previously observed, an exception are the series executed for unit ZONESIZE, where the number of iterations increases against the processor count. Also, maximum execution times occur for ZONESIZE = 16.
These results are interpreted as follows: firstly, the reduction in execution times against p is attributed to the solution exploitation property of ring-based communications: as p is increased, so too does the number of processor subgroups. Since the latter exchange solutions frequently, convergence is promoted between those processes involved in ring communications. Convergence between the remaining processors is affected by the rate of subgroup communications. Secondly, when no cyclic communications take place, it follows that convergence is only promoted by collective communications, which in all experiments occur infrequently in comparison to cyclic communications. This serves as an explanation for peak execution times when ZONESIZE = p. Thirdly, for unit ZONESIZE execution times are comparatively short, which is attributed to
Figure 7.30: Parallel harmony search convergence durations (ZONEEXBLOCK = 10000) [plot residue removed: iterations against processors, curves for lattice series s12, s14, s16]
Figure 7.31: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 100000) [plot residue removed: iterations against processors, curves for lattice series s12, s14, s16]
Figure 7.32: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 100) [plot residue removed: Hamming distance against processors, curves for lattice series s12, s14, s16]
the absence of processes exempt from cyclic communications. Since the latter occur frequently, convergence is promoted especially rapidly.
Figures 7.32, 7.33, 7.34 plot the Hamming distances of generated solutions against processors, for all conducted experiment series. This metric is designed to expose accuracy in terms of the number of dissimilar spin states in solutions generated by the heuristic. Increasing the number of processors to 16 appears to decrease the Hamming distance slightly, for all lattice instances. It is observed that the distances are approximately equal to k/2, where k is the number of spins. This suggests that the distribution of spin configurations against system energy might be uniform. Considering this, the metric does not appear expressive of solution accuracy.
Overall, results indicate that parallel harmony search does improve solution accuracy. However, it must be considered that the improvements shown in Figures 7.27, 7.28, 7.29 are marginal. Also, it is noted that comparatively good performance is achieved on few processors, provided algorithm parameters are selected carefully. Cyclic communications were observed to contain a significant element of solution exploitation. Unsurprisingly, considering the latter, the lowest energetic aberrance is achieved when communications are minimised. The attempt to quantify accuracy in terms of Hamming distance highlights the difficulty of obtaining solutions heuristically: the spin glass problem appears to have a rough solution landscape, which poses a difficulty for finding ground states using harmony search. In all conducted experiment series, only suboptimal solutions were found.
Because of their fundamental differences, comparison between the examined exact approaches and harmony search is difficult to achieve. Whereas dynamic programming places exact demands on computation due to its deterministic nature, the heuristic is flexible in terms of
Figure 7.33: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 1000) [plot residue removed: Hamming distance against processors, curves for lattice series s12, s14, s16]
Figure 7.34: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 10000) [plot residue removed: Hamming distance against processors, curves for lattice series s12, s14, s16]
resources, albeit at the expense of accuracy. All dynamic programming approaches were shown to benefit from the high bandwidth communications found on HPCx. The codes are thus suited for execution on non-vector supercomputer machines with many processors. In contrast, depending on algorithm parameters, execution performance of the heuristic on a commodity cluster system with low latency Gigabit Ethernet may prove adequate. This is estimated from a 153 s execution time on Ness, corresponding to around 20000 iterations of harmony search on 16 processors, for a 256 spin lattice. Guest [33] provides an overview of message passing performance on commodity systems, which suggests reasonable bandwidth would be obtained.
Chapter 8
Conclusion
In the previous chapters, the implemented parallel optimisation software was described and experimental results were presented. Given the project's scope, there exist numerous possibilities for conducting further work. Based on theoretical and practical aspects described in this dissertation, the following discusses such possibilities briefly, before concluding.
8.1 Further work
In Chapter 2, the spin glass problem was introduced. Here, it was established that the Ising spin glass is a simplification of spin interaction. The two objects defining the exchange energy between spins are the spins themselves, and the coupling constants. In general, the graph of spin interactions can be arbitrary. Spins assume a state whose representation can vary in complexity from the classical or quantum Heisenberg formulation to the binary Ising formulation. Coupling constants may be chosen from arbitrary distributions, such as a discrete or continuous Gaussian.
8.1.1 Algorithmic approaches
Considering that the project is concerned with the Ising spin glass, the opportunity presents itself to explore the behaviour of more involved models. As an intermediate model between the Heisenberg and Ising formulations, one might implement the Potts model, where spins assume discrete state. Provided that the model of spin interactions is left unaltered, this model appears comparatively simple to implement: applying the framework of subsystems and subsystem interactions to the Potts model, it is apparent that the total energy of a system is still the sum of subsystem energies and interaction energies between them. However, for a p-state model, the number of states a k-spin subsystem can assume is p^k, instead of 2^k. The consequence of greater diversity is that the computational complexity of basic dynamic programming increases to O(n · p^(2m)) for an n × m lattice. Similarly, improved dynamic programming has a complexity of O(nm · p^m). A further ramification of spin state concerns the algorithms' implementation, which is based on bit string representations of subsystems. Clearly, allowing more than binary state requires the code to be redesigned. A possible approach might involve representing subsystems as linked lists of integers. A likely consequence of this for all algorithms would be reduced performance from additional memory operations.
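To give a sense of this growth (a back-of-the-envelope illustration from the expressions above, not a measured figure): for a p = 3 Potts model on a 12 × 12 lattice, a single row subsystem already has 3^12 ≈ 5.3 × 10^5 configurations, compared with 2^12 = 4096 in the Ising case, so the basic algorithm's per-row work grows from roughly 4096² ≈ 1.7 × 10^7 to (5.3 × 10^5)² ≈ 2.8 × 10^11 predecessor evaluations.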
One might also consider extending the algorithms to higher dimensions. While this is trivial in the case of the heuristic, the dynamic programming approaches require the notion of a subsystem to be extended into higher dimensions: whereas basic dynamic programming is based on a sequence of interacting spin rows for the square lattice, it is necessary to consider a sequence of interacting lattices for the cubic lattice. The relation is analogous between hypercubes of d and d + 1 dimensions. As a caveat, the algorithms become computationally expensive: the basic algorithm requires O(n · 2^(2·n^(d−1))) time for an n^d-spin Ising hypercubic lattice, since there are n (d − 1)-dimensional subsystems in the lattice. For the improved algorithm, the sliding window approach is based on a sequence of (d − 2)-dimensional subsystems, yielding a time complexity of O(n^d · 2^(n^(d−1))). It is assumed that both algorithms' parallel performance will degrade, since higher-dimensional data are required to be communicated between processes. This places greater requirements on message passing bandwidth.
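As a rough illustration of the growth implied by these expressions (an estimate only): for a modest 10 × 10 × 10 cubic lattice (d = 3, n = 10), the basic algorithm's exponential factor is already 2^(2·10²) = 2^200, and even the improved variant's 2^(10²) = 2^100 ≈ 1.3 × 10^30 layer configurations place such instances far beyond exact treatment with this approach.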
Another possibility for further work involves applying the framework described in Chapter 3 to more general models of spin interaction: for an arbitrary graph of interacting spins, the probability of a spin configuration (s_1, s_2, ..., s_n) can be expressed as

P(s_1, s_2, ..., s_n) = \prod_{i=1}^{n} P(s_i \mid \pi_i),

where π_i is the set of precursor spins associated with spin s_i. The task is then to arrive at a formulation of the optimum spin configuration, as shown in Chapter 3. It is believed that the resulting dynamic programming problem must be both non-serial and polyadic, since the graph may contain cycles, and since a spin is permitted to have multiple ancestors. This is likely to have consequences for the complexity of the corresponding optimisation algorithm.
Of particular interest is the algorithm described by Pardella and Liers [53]. This provides a polynomial time solution to the planar spin glass problem, allowing ground states to be determined exactly, for problem instances far larger than those examined in this project. The approach is based on combining the cut optimisation problem with the notion of Kasteleyn cities, i.e. complete graphs which are subgraphs in the dual lattice, representing plaquette frustrations in the spin lattice. Pardella and Liers apply the algorithm to a 3000 × 3000 lattice, which represents an improvement over previous graph theoretical approaches [46]. Parallelisation of cut optimisation might be achieved using the approach described by Diaz, Gibbons et al. [18].
8.1.2 Existing code
Next to implementing additional algorithms for spin glass optimisation, further work might be conducted on the existing code base. Possible additional features include augmenting functionality to allow algorithm parameters to be controlled at runtime, or implementing further bond distributions. Unlike basic dynamic programming, the improved dynamic programming algorithm does not support lattices with periodic boundary conditions. This can be implemented by adapting the approach described in Chapter 3, where the algorithm is invoked repeatedly, for different configurations of boundary spins.
More pertinent is the optimisation of the existing code's performance. Considering the project's scope, it was decided to adopt a design promoting code maintainability, described in Chapters 5 and 6. Given additional time, it would be of interest to examine the cost of pointer operations, replacing them where possible by static arrays. Also, although state-of-the-art compilers were used during development and evaluation, the potential is given for optimising kernel code segments: in the function get_optimal_prestates(), one might for example consider manual function inlining or loop unrolling. Similar treatment for the harmony search module is conceivable.
As implemented, the codes use MPI for achieving message passing parallelism. Although the algorithms are indeed based on the message passing architecture, one might consider a shared memory approach: given the method of state space decomposition, where configurations of spin subsystems are distributed equally among processes, the parallel for directive as e.g. implemented in OpenMP appears an obvious instrument in implementing shared memory versions of the algorithms.
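A minimal sketch of what such a shared memory variant of the inner state loop might look like (illustrative only; the loop body and array names are placeholders, not the project code):

#include <omp.h>

/* Illustrative: distribute the enumeration of row configurations across
 * threads with OpenMP instead of MPI ranks. */
void evaluate_row(long nStates, const double *prevCost, double *rowCost)
{
    #pragma omp parallel for schedule(static)
    for (long s = 0; s < nStates; s++) {
        double best = prevCost[0];  /* plus interaction/internal energy terms ... */
        /* ... scan all predecessor configurations for the minimum ... */
        rowCost[s] = best;
    }
}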
8.1.3 Performance evaluation
In Chapter 7, performance data were gathered for the dynamic programming and harmony search algorithms. Scalability of the exact algorithms was examined on two machines. Further experimental work might be concerned with evaluating scalability on other machines, such as commodity clusters or the Blue Gene architecture, if available. A more detailed examination of performance on existing architectures might consider the implications of message passing latency and bandwidth, especially with regard to the dynamic programming code using asynchronous communications. It is also of interest to examine the scalability of harmony search; due to time constraints, undertaken work considered only the algorithm's accuracy. Additionally, one might consider the effect of processor count and communication frequency on algorithm iterations (ideally the latter should remain constant). Finally, there exists the potential to experiment with alternative communication strategies to those proposed in this work.
8.2 Project summary
During the course of the project, software was developed to compute ground states of the Ising spin glass. The software includes implementations of serial and parallel optimisation algorithms. The latter include parallel dynamic programming algorithms, available in two variants. The first of these allows lattice instances with arbitrary boundary conditions to be solved, while the second is computationally more efficient. Performance was examined, indicating good scalability for the first variant. In contrast, scalability is limited for the second variant. Also, a further algorithm was examined. This implements a parallel ground state optimiser, based on the harmony search heuristic. Performance was examined in terms of solution accuracy and algorithm convergence.
In Chapter 5, the project's goals were described. These consisted of developing an exact ground state solver based on the transfer matrix method. As an additional objective, investigation was to include an alternative, heuristic parallel algorithm. The performance of both algorithms was to be examined. It was intended that the software should be self-contained, offering sufficient functionality to be useful as a research tool.
In the light of undertaken work, the project's goals are considered fulfilled to a considerable extent: implemented software includes variants of exact optimisation algorithms. In theoretical work, the dynamic programming approach was shown to offer identical performance to transfer matrix based methods, therefore both approaches are considered computationally equivalent. The described harmony search heuristic was also implemented. Both dynamic programming and harmony search are implemented as message passing codes. Performance was investigated as proposed, examining scalability of the dynamic programming codes, and accuracy of parallel harmony search. Although it remains of interest to examine scalability of the alternative code, overall the project is considered a success.
8.3 Conclusion
In this dissertation, the Ising spin glass was introduced as a combinatorial optimisation problem.
The theoretical background was discussed, identifying and developing solutions to the problem.
A description of undertaken project work was provided. Implemented software was described
and experimental results were presented. Finally, possibilities for further work were identified.
Appendix A
Project Schedule
[Figure A.1 residue removed: Gantt chart spanning weeks 1 to 16, with bars for detailed design, implementation, debugging, testing and performance evaluation (carried out in two passes), followed by report writing, presentation, and submission/corrections.]
Figure A.1: Project schedule
Appendix B
UML Chart
[Figure B.1 residue removed: class diagram relating the modules io.c/io.h, spinglass.c/spinglass.h, main.c, random.c/random.h, arrays.c/arrays.h, gstatefinder.h, bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c and harmony_gstate_finder.c.]
Figure B.1: UML class diagram of source code module and header relationships
Appendix C
Markov Properties of Spin Lattice
Decompositions
C.1 First-order property of row-wise decomposition
Using a row-wise decomposition strategy of spin rows, the system state probability is expressed as
\[
P(S) = \frac{1}{Z(T)} \exp\!\left( -\frac{1}{kT} \left[ H(S_1) + \sum_{i=2}^{n} \left( H(S_i) + H_b(S_{i-1}, S_i) \right) \right] \right)
     = \frac{1}{Z(T)} \exp\!\left( -\frac{1}{kT} H(S_1) \right) \prod_{i=2}^{n} \exp\!\left( -\frac{1}{kT} \left( H(S_i) + H_b(S_{i-1}, S_i) \right) \right).
\]
The partition function is expanded in a similar manner to account for subsystems, as
\[
Z(T) = \sum_{S \in \mathcal{S}} \exp\!\left( -\frac{1}{kT} H(S) \right)
     = \sum_{S_1} \exp\!\left( -\frac{1}{kT} H(S_1) \right) \prod_{i=2}^{n} \left[ \sum_{S_i} \exp\!\left( -\frac{1}{kT} \left( H(S_i) + H_b(S_{i-1}, S_i) \right) \right) \right]
     = \prod_{i=1}^{n} Z_i(T),
\]
with
\[
Z_i(T) = \begin{cases}
  \sum_{S_i} \exp\!\left( -\frac{1}{kT} H(S_i) \right) & i = 1 \\
  \sum_{S_i} \exp\!\left( -\frac{1}{kT} \left( H(S_i) + H_b(S_{i-1}, S_i) \right) \right) & 1 < i \le n.
\end{cases}
\]
Substituting Z(T) in Equation C.1, the state probability is defined as
\[
P(S) = \frac{1}{Z_1(T)} \exp\!\left( -\frac{1}{kT} H(S_1) \right) \prod_{i=2}^{n} \frac{1}{Z_i(T)} \exp\!\left( -\frac{1}{kT} \left( H(S_i) + H_b(S_{i-1}, S_i) \right) \right)
     = P(S_1) \prod_{i=2}^{n} P(S_i \mid S_{i-1}),
\]
which shows that the chosen approach fulfils the property of a first-order Markov chain; the conditional probability P(S_i | S_{i−1}) is due to the dependence of row S_i on its predecessor's configuration.
C.2 Higher-order property of unit spin decomposition
Applying an analogous approach to determining the system state probability, P(S) is expressed as
\[
P(S) = \frac{1}{Z(T)} \exp\!\left( -\frac{1}{kT} \left[ \sum_{i=0}^{nm-1} H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \right] \right)
     = \frac{1}{Z(T)} \prod_{i=0}^{nm-1} \exp\!\left( -\frac{1}{kT} \left( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \right) \right).
\]
With Z(T) = \prod_{i=0}^{nm-1} Z_i(T) and Z_i(T) = \sum_{S_i} \exp\!\left( -\frac{1}{kT} \left( H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) \right) \right), it follows that
\[
P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
\]
It is reminded that ground state information can be obtained by optimising P(S). For this particular model, the ground state configuration is obtained by maximising P(S), i.e.
\[
\operatorname*{argmax}_{S_0, S_1, \ldots, S_{nm-1}} \left[ \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}) \right].
\]
Next, it is necessary to adapt the Viterbi path formulation, in order to arrive at a recursive expression of the ground state energy for the higher-order Markov model. Disregarding cyclic boundary interactions in the model, and noting that P(S_i | S_{i−1}, S_{i−m}) = P(S_i) for i = 0, a prototypical approach is
\[
P_{\mathrm{viterbi}}(S_i) = \begin{cases}
  \max_{S_i} P(S_i) & i = 0 \\
  \max_{S_{i-1}, S_{i-m}} P(S_i \mid S_{i-1}, S_{i-m}) \, Q_{\mathrm{viterbi}}(S_{i-1}) \, Q_{\mathrm{viterbi}}(S_{i-m}) & i > 0.
\end{cases}
\]
Unfortunately, there exists a caveat against recursively stating
\[
P_{\mathrm{viterbi}}(S_i) = \max_{S_{i-1}, S_{i-m}} P(S_i \mid S_{i-1}, S_{i-m}) \, P_{\mathrm{viterbi}}(S_{i-1}) \, P_{\mathrm{viterbi}}(S_{i-m}),
\]
because, by definition, the probability of subsystem S_i assuming a given state is conditionally dependent on subsystems S_{i−1} and S_{i−m}, which in turn are both conditionally dependent on subsystem S_{i−m−1}. This ordering requires that, when evaluating the terms P_viterbi(S_{i−1}) and P_viterbi(S_{i−m}), identical sets of subsystem configurations are considered. The mapping Q_viterbi must reflect this behaviour in terms of P_viterbi.
A solution to the dependency problem of vertical and horizontal predecessor spins can be obtained by increasing the order of the Markov model to m + 1. As a result, the system state probability is given by the product
\[
P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),
\]
from which the ground state probability can be formulated as
\[
P_{\mathrm{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) = \begin{cases}
  P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \le m \\
  \max_{S_{i-m-1}} P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{\mathrm{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) & i > m.
\end{cases}
\]
Appendix D
The Viterbi Path
D.1 Evaluating the Viterbi path in terms of system energy
It is of interest to examine the behaviour of the system state probability, which is present in the recursive formulation of the Viterbi path, and evaluated in the described pseudocode algorithm. Taking the natural logarithm of the state probability, it is observed that
\[
\ln(P(S)) = \ln\!\left( \frac{1}{Z(T)} \exp\!\left( -\frac{1}{kT} H(S) \right) \right)
          = \ln\!\left( \frac{1}{Z(T)} \right) - \frac{H(S)}{kT} \propto -H(S).
\]
Using this result, the natural logarithm of the conditionally dependent state probability P(S_i | S_{i−1}) is
\[
\ln(P(S_i \mid S_{i-1})) = \ln\!\left( \frac{P(S_i, S_{i-1})}{P(S_{i-1})} \right) = \ln(P(S_i, S_{i-1})) - \ln(P(S_{i-1}))
\propto -\left( H(S_i) + H(S_{i-1}) + H_b(S_i, S_{i-1}) \right) + H(S_{i-1})
= -\left( H(S_i) + H_b(S_i, S_{i-1}) \right),
\]
which allows the system probability to be evaluated quantitatively in terms of its Hamiltonian. This in turn permits reformulation of the dynamic programming optimisation problem;
\[
\ln(P_{\mathrm{viterbi}}(S_i)) = \begin{cases}
  \max_{S_i} \ln(P(S_i)) & i = 1 \\
  \max_{S_{i-1}} \ln(P(S_i \mid S_{i-1})) + \ln(P_{\mathrm{viterbi}}(S_{i-1})) & i > 1
\end{cases}
\]
\[
-\ln(P_{\mathrm{viterbi}}(S_i)) = \begin{cases}
  c + \min_{S_i} H(S_i) & i = 1 \\
  \min_{S_{i-1}} \left[ H(S_i) + H_b(S_i, S_{i-1}) + c \ln(P_{\mathrm{viterbi}}(S_{i-1})) \right] & i > 1,
\end{cases}
\]
with c ∈ R. It is trivial to apply the same approach to the recursive function viterbi(i), which evaluates to the actual sequence of emitted states in the Viterbi path, and to the described pseudocode algorithm.
Setting c = −1, the evaluated optimal sequence remains the Viterbi path. Further substitution yields
\[
H_{\min}(S_i) = \begin{cases}
  \min_{S_i} H(S_i) & i = 1 \\
  \min_{S_{i-1}} H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1}) & i > 1,
\end{cases} \tag{D.1}
\]
which is the Hamiltonian of the system (S_1, S_2, ..., S_i), whose states are equal to those emitted by the Viterbi algorithm. Since the Viterbi path corresponds to the most probable system state, H_min is the system's ground state energy. This provides a solution to the ground state problem for the two-dimensional lattice without vertical or horizontal boundary interactions.
Appendix E
Software usage
The following provides instructions on how to install and use the software described in this dissertation.

Requirements

The software requires the library glib-2.0 to be installed. By default, this library is expected to reside in the directory /usr/lib, with headers located at /usr/include/glib-2.0 and /usr/lib/glib-2.0/include. These settings may be changed by modifying the file Makefile.am. An implementation of MPI, such as MPICH2, is also required.

Configure and compile

The software is delivered as a compressed tarball with the .tar.gz file name extension. It is unpacked by issuing

tar xvzf ising.tar.gz

at the command prompt. Following this, it is necessary to initiate configuration by issuing

./configure

from within the package's root directory. Environment variables are used to specify configuration options, including the compiler used (which defaults to mpicc). For example, to disable optimisation, the necessary commands are:

export CFLAGS=-O0; ./configure

Providing configuration was successful, compilation is initiated using

make

Usage

Upon completion, the source directory contains the binaries genbonds, genclamps, sbforce, dpsolver, dpsolverfast and hmsolver, whose purpose is described in Chapter 6. Most significantly, the solver utilities dpsolver, dpsolverfast and hmsolver operate on spin bond configuration files, which are generated using genbonds. To generate a sample 12 × 12 spin configuration file BONDS, the required command is

./genbonds -x 12 -y 12 > BONDS

which is solved, e.g. using improved dynamic programming on a single process, by invoking

./dpsolverfast -b BONDS

Multiprocessing is enabled either by invoking mpiexec directly, or by using one of the SUN GridEngine scripts located inside the source root directory. All utilities support the -? flag for displaying a list of command line options.
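As an illustration of direct invocation, a run on four processes might look as follows (-n is the standard mpiexec process-count option; whether further solver options are required in parallel runs is not documented here, so this is a plausible example rather than a prescribed command):

mpiexec -n 4 ./dpsolver -b BONDS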
Appendix F
Source Code Listings
/*
 * File: main.c
 *
 * Implements common entry point for ground state solver utilities.
 * Responsible for processing command line options and initiating computation.
 */

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"
#include "gstatefinder.h"

/* These store values of command line arguments */
static gchar *spinConfig = NULL;
static gchar *bondConfig = NULL;
static gchar *clampConfig = NULL;
static gchar *compSpinConfig = NULL;

/* Data structure for command line processing.
   Specifies properties of command line options */
static GOptionEntry entries[] = {
    { "spin-initial-config", 's', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &spinConfig, "Initial spin configuration file", "spinConfig" },
    { "bond-config", 'b', 0, G_OPTION_ARG_FILENAME, &bondConfig, "Initial bond configuration file", "bondConfig" },
    { "clamp-config", 'c', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &clampConfig, "Initial spin clamp configuration file", "clampConfig" },
    { "spin-comparison-config", 'x', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME, &compSpinConfig, "Spin configuration to compare result with", "compSpinConfig" },
    { NULL }
};

static void initialise_computation();

int main(int argc, char *argv[])
{
    /* Initialise data structure for argument processing */
    GError *error = NULL;
    GOptionContext *context;

    context = g_option_context_new("Calculate spin glass ground states");
    g_option_context_add_main_entries(context, entries, NULL);
    /* Parse arguments */
    g_option_context_parse(context, &argc, &argv, &error);

    /* Handling of required arguments */
    if (bondConfig == NULL) {
        g_fprintf(stderr, "Please specify an input bond configuration file.\n");
        exit(EXIT_FAILURE);
    }
    if (clampConfig != NULL && spinConfig == NULL) {
        g_fprintf(stderr, "Specifying a clamp configuration file requires the use of an initial spin configuration file.\n");
        exit(EXIT_FAILURE);
    }

    initialise_computation();

    g_option_context_free(context);
    return (EXIT_SUCCESS);
}

void initialise_computation()
{
    gint xSize, ySize, xSize1, ySize1;

    /* Used to construct spin glass structure */
    gdouble *weights = NULL;
    gboolean *clamps = NULL;
    Spin *spins = NULL;
    Spin *compSpins = NULL;

    struct SpinGlass *spinGlass;

    /* Read weights from previously obtained file name */
    weights = read_weights(bondConfig, &xSize, &ySize);

    if (clampConfig != NULL) {
        /* Read spin clamps from previously obtained file name */
        clamps = read_clamps(clampConfig, &xSize1, &ySize1);

        /* Check that sizes of spin and clamp matrices match */
        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and clamp matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (spinConfig != NULL) {
        /* Read initial spin configuration from previously obtained file name */
        spins = read_spins(spinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and spin configuration matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (compSpinConfig != NULL) {
        /* Read comparison spin configuration from previously obtained file name */
        compSpins = read_spins(compSpinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Reference spin configuration and bond matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    /* Initialise spin glass */
    spinGlass = spin_glass_alloc(xSize, ySize, spins, weights, clamps);

    /* Compute ground state */
    find_ground_states(spinGlass);

    if (compSpins != NULL) {
        /* Compare resulting configuration to specified reference configuration */
        gint distance;
        struct SpinGlass *spinGlass2 = spin_glass_alloc(xSize, ySize, compSpins, NULL, NULL);
        distance = spin_glass_correlate(spinGlass, spinGlass2);

        g_printf("Correlation distance: %d\n", distance);
        spin_glass_free(spinGlass2);
    }

    spin_glass_free(spinGlass);
}

/*
 * File: dp_gstatefinder.c
 *
 * Implements serial and parallel basic dynamic programming algorithms
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* CYCLIC_EXCHANGE defines cyclic communication patterns */
#define CYCLIC_EXCHANGE

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static guint64 SolverProcessorMask = 0;
/* Communications data */

/* Adjust row of spins according to bit string representation
   spinGlass (write)  the spin glass structure to manipulate
   row                specifies the spin row in the range [0, NROWS)
   conf               the bit string representation of a spin row
   ignoreBitmask      if TRUE, the process ID does not influence the bit string */
static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
   spinGlass  spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin row row-1, for all configurations of row row
   spinGlass (read/write)    spin glass instance
   minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing row row
   minPathConf (read/write)  stores optimum configurations of rows
   row                       row of the spin lattice to process
   trellisCols               number of spin row configurations
   finalRowConf              used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalRowConf);

/* Set the configuration of spin rows, based on optimum configurations
   spinGlass (write)   spin glass to manipulate
   minPathConf (read)  stores optimum spin row configurations
   conf                optimum configuration of ultimate spin row */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass)
{
    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spin_glass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spin_glass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, t_int conf, gboolean ignoreBitmask)
{
    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Row configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    for (i = 0; i < spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set state of spin i within row */
        ArrayAccess2D(spinGlass->spins, spinGlass->ySize, row, i) = spin;

        conf = conf >> 1;
    }
}

/* Collective/serial variant */
#ifndef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalRowConf)
{
    t_int j;
    t_int k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);

    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            minPathConf[j] = finalRowConf;

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_row_energy(spinGlass, row) + spin_glass_inter_row_energy(spinGlass, previousRow);
        }
    }
    else {
        previousRow = row - 1;

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            t_int conf;

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;  /* Energetic contribution of current and previous row */
                gdouble rowEnergy;       /* Energetic contribution of current row */

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, k, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spin_glass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spin_glass_row_energy(spinGlass, row);

                if (minPath[k] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k] + interRowEnergy + rowEnergy;
                    conf = k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}
#endif

/* Cyclic variant */
#ifdef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint row, t_int trellisCols, t_int finalRowConf)
{
    t_int j, k;

    /* Compute neighbour process ID */
    gint leftNeighbour = (SolverProcID - 1 + SolverNProcs) % SolverNProcs;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);
    gdouble *buffer = g_new0(gdouble, trellisCols / SolverNProcs);

    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            minPathConf[j] = finalRowConf;  /* Theoretically redundant */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_row_energy(spinGlass, row) + spin_glass_inter_row_energy(spinGlass, previousRow);
        }
    }
    else {
        MPI_Request request;
        previousRow = row - 1;

        /* Iterate through subset of current row's states */
        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            t_int conf;

            /* Set spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Iterate through all states of preceding spin row */
            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;
                gdouble rowEnergy;

                /* Set previous row configuration ID */
                t_int cID = (SolverProcID * (trellisCols / SolverNProcs) + k) % trellisCols;

                /* Initiate neighbour rotation of minpath */
                if (k == 0) MPI_Issend(minPath, trellisCols / SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, cID, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spin_glass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spin_glass_row_energy(spinGlass, row);

                if (k % (trellisCols / SolverNProcs) == 0 && k != 0) {
                    /* Receive data */
                    MPI_Recv(buffer, trellisCols / SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
                    MPI_Wait(&request, MPI_STATUS_IGNORE);
                    memcpy(minPath, buffer, trellisCols / SolverNProcs * sizeof(gdouble));
                    /* ...receive data */
                    /* Send data */
                    MPI_Issend(minPath, trellisCols / SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);
                    /* Send data */
                }

                if (minPath[k % (trellisCols / SolverNProcs)] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k % (trellisCols / SolverNProcs)] + interRowEnergy + rowEnergy;
                    conf = cID;
                }
            }

            minPathConf[j] = conf;
            minPathNew[j] = path;

            /* Receive data */
            MPI_Recv(buffer, trellisCols / SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            MPI_Wait(&request, MPI_STATUS_IGNORE);
            memcpy(minPath, buffer, trellisCols / SolverNProcs * sizeof(gdouble));
        }
    }

    for (j = 0; j < trellisCols / SolverNProcs; j++) minPath[j] = minPathNew[j];

    /* Free memory */
    g_free(minPathNew);
    g_free(buffer);
}
#endif

static void get_minimum_path(struct SpinGlass *spinGlass)
{
    t_int j;
    guint i;

    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    /* Stores minimum path to currently examined subsystem for each of its states */
#ifdef CYCLIC_EXCHANGE
    gdouble *minPathPartial = g_new0(gdouble, trellisCols / SolverNProcs);
    gdouble *minPath = g_new0(gdouble, trellisCols);  /* Stores minimum path data of a subsystem in a subset of its states */
#else
    gdouble *minPath = g_new0(gdouble, trellisCols);
    gdouble *minPathPartial = minPath;
#endif

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    if (!spin_glass_has_vertical_boundary(spinGlass)) {
        for (i = 0; i < trellisRows; i++) {
            get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);  /* Last argument is zero, since we don't care about vertical boundary */
        }

#ifdef CYCLIC_EXCHANGE
        MPI_Allgather(minPathPartial, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#endif

        /* Get minimum path */
        for (j = 0; j < trellisCols; j++) {
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
            }
        }
        set_optimal_config(spinGlass, minPathConf, conf);
    }
    else {
        t_int **retainedMinPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs);

        for (j = 0; j < trellisCols; j++) {
            for (i = 0; i < trellisRows; i++) {
                /* Last argument corresponds to fixed spin for boundary interaction */
                get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, j);
            }

#ifdef CYCLIC_EXCHANGE
            MPI_Allgather(minPathPartial, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#endif

            /* Track energy */
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
                /* Retain states stored in minConf */
                memcpy(&(retainedMinPathConf[0][0]), &(minPathConf[0][0]), trellisRows * (trellisCols / SolverNProcs) * sizeof(t_int));
            }
        }

        set_optimal_config(spinGlass, retainedMinPathConf, conf);
        array_free_2D(retainedMinPathConf);
    }

    g_free(minPath);
    array_free_2D(minPathConf);
#ifdef CYCLIC_EXCHANGE
    g_free(minPathPartial);
#endif
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf)
{
    gint i;
    guint trellisRows = spinGlass->xSize;
    t_int trellisCols = 1 << (spinGlass->ySize);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    /* Iterate through spin rows in reverse */
    for (i = trellisRows - 1; i >= 0; i--) {
        /* Set row configuration */
        adjust_spin_row(spinGlass, i, conf, IGNORE_BITMASK);

        /* Reference optimum configuration of preceding spin row */
#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols / SolverNProcs, T_INT, minPathConfRow, trellisCols / SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass)
{
#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms()
{
#ifdef USE_MPI
    MPI_Finalize();
#endif
}

/*
 * File: dp_gstatefinder_fast.c
 *
 * Implements serial and parallel improved dynamic programming algorithms
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static t_int SolverProcessorMask = 0;
/* Communications data */

/* Adjust group of spins according to bit string representation
   spinGlass (write)  the spin glass structure to manipulate
   leadingSpin        specifies sliding window position in the range [ySize, xSize*ySize)
   conf               the bit string representation of a spin row
   ignoreBitmask      if TRUE, the process ID does not influence the bit string */
static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
   spinGlass  spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin group leadingSpin-1, for all configurations of group leadingSpin
   spinGlass (read/write)    spin glass instance
   minPath (read/write)      stores minimum path (i.e. ground state energy) of subsystem before and after incrementing by spin leadingSpin
   minPathConf (read/write)  stores optimum configurations of spin groups
   leadingSpin               position of sliding window in the range [ySize, xSize*ySize)
   trellisCols               number of spin group configurations */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols);

/* Set the configuration of spin groups, based on optimum configurations
   spinGlass (write)   spin glass to manipulate
   minPathConf (read)  stores optimum spin group configurations
   conf                optimum configuration of spin group at ultimate sliding window position */
static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass)
{
    gdouble energy;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spin_glass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spin_glass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, t_int conf, gboolean ignoreBitmask)
{
    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Row configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    for (i = 0; i <= spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set spin at position i within sliding window */
        spinGlass->spins[leadingSpin - (spinGlass->ySize) + i] = spin;

        conf = conf >> 1;
    }
}

static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, t_int *minPathConf, gint leadingSpin, t_int trellisCols)
{
    t_int j;
    t_int k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);

    if (leadingSpin == spinGlass->ySize) {
        /* spinGlass->ySize corresponds to the first spin in the second row of the lattice */

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            /* Set current spin group configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_ensemble_delta(spinGlass, leadingSpin) + spin_glass_row_energy(spinGlass, 0);
        }
    }
    else {
        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            gdouble ensembleEnergy;
            t_int confIndex, conf;

            /* Set current spin ensemble configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);
            ensembleEnergy = spin_glass_ensemble_delta(spinGlass, leadingSpin);

            /* Calculate index for accessing preceding ensemble configuration */
            confIndex = (((j | SolverProcessorMask) << 1) | trellisCols) - trellisCols;

            for (k = 0; k < 2; k++) {
                /* Minimise on sum of ensemble energies */
                if (minPath[confIndex + k] + ensembleEnergy < path) {
                    path = minPath[confIndex + k] + ensembleEnergy;
                    conf = confIndex + k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}

static void get_minimum_path(struct SpinGlass *spinGlass)
{
    t_int j;
    guint i;

    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

    gdouble path = G_MAXDOUBLE;
    t_int conf;

    gdouble *minPath = g_new0(gdouble, trellisCols);

    t_int **minPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs);  /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    for (i = 0; i < trellisRows; i++) {
        get_optimal_prestates(spinGlass, minPath, minPathConf[i], spinGlass->ySize + i, trellisCols);
    }

    /* Find optimum configuration of spin group at ultimate sliding window position */
    for (j = 0; j < trellisCols; j++) {
        if (minPath[j] < path) {
            path = minPath[j];
            conf = j;
        }
    }

    set_optimal_config(spinGlass, minPathConf, conf);

    g_free(minPath);
    array_free_2D(minPathConf);
}

static void set_optimal_config(struct SpinGlass *spinGlass, t_int **minPathConf, t_int conf)
{
    gint i;
    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    t_int trellisCols = 1 << (spinGlass->ySize + 1);

#ifdef USE_MPI
    t_int *minPathConfRow = g_new0(t_int, trellisCols);  /* Used to store exchanged (complete) row configuration data */
#endif

    for (i = trellisRows - 1; i > 0; i--) {
        /* Set spinGlass spin according to leading spin configuration */
        gint spinVal = conf >> (spinGlass->ySize);
        gint leadingSpin = (spinGlass->xSize * spinGlass->ySize - 1) - (trellisRows - 1 - i);
        if (spinVal != 0) (spinGlass->spins)[leadingSpin] = UP;
        else (spinGlass->spins)[leadingSpin] = DOWN;

#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols / SolverNProcs, T_INT, minPathConfRow, trellisCols / SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

    /* Set ensemble configuration due to first leading spin */
    adjust_spin_ensemble(spinGlass, spinGlass->ySize, conf, IGNORE_BITMASK);

#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass)
{
#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble)SolverNProcs) / log(2.0));
    SolverProcessorMask <<= (spinGlass->ySize) + 1 - (gint)binaryPlaces;  /* Shift log2(Nprocs) bits left */
#endif
}

static void term_comms()
{
#ifdef USE_MPI
    MPI_Finalize();
#endif
}

/*
 * File: harmony_gstatefinder.c
 *
 * Implements parallel harmony ground state solver
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <mpi.h>
#include <string.h>

#include "spinglass.h"
#include "gstatefinder.h"
#include "random.h"


/* Serial algorithm parameters */
#define NVECTORS 10
#define MEMORY_CHOOSING_RATE 0.95

/* Parallel algorithm parameters */
#define ITERBLOCK 100
#define ZONEEXBLOCK 100

/* Common spin glass data */
static struct SpinGlass *spinGlass;
static Spin *spins[NVECTORS];
static gint xSize;
static gint ySize;
/* Common spin glass data */

/* Communications data */
#define COMM MPI_COMM_WORLD
#define ZONE_SIZE 16
static MPI_Datatype Type_Array;
static MPI_Op Reduction_Op;
static MPI_Comm SolverZone;
static gint SolverProcID = 0;
static gint SolverNProcs = 1;
/* Communications data */

/* Determine highest energy spin glass held by this process
   highestEnergy (write)  the energy of the obtained solution vector
   vectorNum (write)      the index of the solution vector as stored in the array spins[] */
static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum);

/* Determine lowest energy spin glass held by this process
   lowestEnergy (write)   the energy of the obtained solution vector
   vectorNum (write)      the index of the solution vector as stored in the array spins[] */
static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum);

/* Determine the algorithm's convergence status, based on solution vectors held by each process
   returns TRUE, if the algorithm has converged */
static gboolean get_stabilised_status(void);

/* Collectively obtain energetically minimal solution vector held by processes
   spinVector (read/write)  specifies solution vector to perform reduction on, based on energy
   comm (read)              MPI communicator to specify processes involved in reduction */
static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm);

/* Defines operation, on which reduction is based
   vector1, vector2 (read/write)  operation arguments
   length (read)                  length of vectors
   dataType (read)                data type used for communications */
static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType);

/* Initialise message passing communications */
static void init_comms(void);

/* Terminate message passing communications */
static void term_comms(void);

gdouble find_ground_states(struct SpinGlass *paramSpinGlass)
{
    gint i, j;

    /* Used to store energy and identifier of highest energy vector in memory */
    gdouble highestEnergy;
    gint maxVector;

    /* Used to store energy and identifier of lowest energy vector in memory */
    gdouble minEnergy;
    gint minVector;

    /* Used for communicating spin vectors */
    Spin *neighbourSpins = g_new(Spin, paramSpinGlass->xSize * paramSpinGlass->ySize);

    /* Store spin glass globally */
    spinGlass = paramSpinGlass;
    xSize = paramSpinGlass->xSize;
    ySize = paramSpinGlass->ySize;

    init_comms();

    /* Initialise by generating random vectors */
    for (i = 0; i < NVECTORS; i++) spins[i] = spin_glass_get_random_spins(spinGlass);

    /* Begin iterative process */
    for (i = 1; get_stabilised_status() == FALSE; i++) {
        /* Create new vector */
        Spin *newSpins = g_new(Spin, xSize * ySize);

        /* Compute initial highest energy vector */
        compute_highest_energy(&highestEnergy, &maxVector);

        /* Set vector components */
        for (j = 0; j < xSize * ySize; j++) {
            if (spinGlass->clamps != NULL && (spinGlass->clamps)[j]) {
                /* Clamping condition */
                newSpins[j] = spinGlass->spins[j];
            }
            else if (rand_continuous(0, 1) < MEMORY_CHOOSING_RATE) {
                /* Memory selection condition */
                newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
            }
            else if (rand_cointoss()) {
                newSpins[j] = UP;
            }
            else {
                newSpins[j] = DOWN;
            }
        }

        /* Replace vector in memory, if the new vector is fitter */
        if (spin_glass_energy_conf(spinGlass, newSpins) < highestEnergy) {
            g_free(spins[maxVector]);  /* Free previous vector */
            spins[maxVector] = newSpins;
        }
        else {
            g_free(newSpins);
        }

        if (SolverProcID % ZONE_SIZE == 0) {
            /* Periodic exchange of spin vectors between neighbouring zones */
            /* Highest energy vector is replaced by random vector */
            gint random = g_random_int_range(0, NVECTORS);
            MPI_Sendrecv(spins[random], 1, Type_Array, (SolverProcID + ZONE_SIZE) % SolverNProcs, 0, neighbourSpins, 1, Type_Array, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            reduction_function(neighbourSpins, spins[random], NULL, NULL);
        }

        /* Zone internal vector exchange */
        if (i % ZONEEXBLOCK == 0) {
            reduce_minimal_spin_vector(spins[maxVector], SolverZone);
        }
    }

    /* Determine minimum vector, copy configuration back to original structure */
    compute_lowest_energy(&minEnergy, &minVector);
    reduce_minimal_spin_vector(spins[minVector], COMM);
    memcpy(spinGlass->spins, spins[minVector], sizeof(Spin) * xSize * ySize);

    /* Master process outputs solution */
    if (SolverProcID == 0) {
        printf("Stabilised after %d iterations.\n", i);
        g_printf("Energy: %E\n", minEnergy);
        spin_glass_write_spins(spinGlass, stdout);
    }

    term_comms();

    for (i = 0; i < NVECTORS; i++) g_free(spins[i]);
    g_free(neighbourSpins);

    return minEnergy;
}

static gboolean get_stabilised_status(void)
{
    gdouble minEnergy;
    gdouble globalMinEnergy;
    gboolean localHasOptimum = FALSE;
    gboolean allHaveOptimum;

    gint minVector;

    /* Perform reduction on lowest energy solutions */
    compute_lowest_energy(&minEnergy, &minVector);
    MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);

    /* Determine whether all processes retain identical lowest energy solutions */
    if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
    MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);

    return (allHaveOptimum);
}

static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum)
{
    gint i;

    *highestEnergy = -G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine highest energy */
        gdouble energy = spin_glass_energy_conf(spinGlass, spins[i]);
        if (energy > *highestEnergy) {
            *highestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum)
{
    gint i;

    *lowestEnergy = G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        /* Iterate through all solution vectors, determine lowest energy */
        gdouble energy = spin_glass_energy_conf(spinGlass, spins[i]);
        if (energy < *lowestEnergy) {
            *lowestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm)
{
    Spin *newSpins = g_new(Spin, xSize * ySize);

    MPI_Allreduce(spinVector, newSpins, 1, Type_Array, Reduction_Op, comm);
    memcpy(spinVector, newSpins, xSize * ySize * sizeof(Spin));
    g_free(newSpins);
}

static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType)
{
    gdouble energy1, energy2;

    energy1 = spin_glass_energy_conf(spinGlass, vector1);
    energy2 = spin_glass_energy_conf(spinGlass, vector2);

    /* Operation condition */
    if (energy1 < energy2) {
        memcpy(vector2, vector1, xSize * ySize * sizeof(Spin));
    }
}

static void init_comms(void)
{
    MPI_Datatype spinType;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);
    if (SolverProcID == 0) printf("NProcs: %d, zone size: %d\n", SolverNProcs, ZONE_SIZE);

    /* Split communicator */
    MPI_Comm_split(COMM, SolverProcID / ZONE_SIZE, 0, &SolverZone);

    /* Initialise reduction operation */
    MPI_Op_create((MPI_User_function *)reduction_function, 1, &Reduction_Op);
    MPI_Type_vector(1, sizeof(Spin), sizeof(Spin), MPI_BYTE, &spinType);
    MPI_Type_vector(xSize, ySize, ySize, spinType, &Type_Array);
    MPI_Type_commit(&Type_Array);
}

static void term_comms(void)
{
    MPI_Comm_free(&SolverZone);
    MPI_Type_free(&Type_Array);
    MPI_Finalize();
}

/*
 * File: spinglass.h
 *
 * Specifies spin glass operation interface and spin glass data structure
 */

#include <glib.h>
#include <stdio.h>

#ifndef SPINGLASS_H
#define SPINGLASS_H

/* Constants for spin glass IO */
#define STR_SPIN_UP "+"
#define STR_SPIN_DOWN "-"
#define STR_CLAMPED "1"
#define STR_UNCLAMPED "0"
#define WEIGHT_FMT "%lf "

/* Spin data type */
typedef enum Spin {
    UP = 1,
    DOWN = -1
} Spin;

/* Spin glass structure */
struct SpinGlass {
    /* Lattice dimensions */
    gint xSize;
    gint ySize;

    /* Vector of spin states */
    Spin *spins;

    /* Stores coupling constants. Data are stored as two row major mappings of spins to vectors,
       such that vertical bonds precede horizontal bonds. */
    gdouble *weights;
    /* Stores clamping states similarly */
    gboolean *clamps;
    /* Stores initial spin configuration */
    Spin *initialSpins;
};

/* Construct a new spin glass structure
   xSize                lattice rows
   ySize                lattice columns
   initialSpins (read)  vector of initial spin states. If NULL, a vector of UP spins is allocated
   weights (read)       vector of bonds. If NULL, zero weights are initialised
   clamps (read)        vector of clamping states.
   returns  spin glass data structure */
struct SpinGlass *spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps);

/* Destruct a spin glass structure. Performs deep deallocation.
   spinGlass  spin glass data structure */
void spin_glass_free(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass
   spinGlass (read)  spin glass data structure, whose spin states and bonds are referenced
   returns  total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass using alternative spin vector
   spinGlass (read)  spin glass data structure, whose bonds are referenced
   conf (read)       vector of spins whose states are referenced
   returns  total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf);

/* Determine energy of spin row
   row               spin row in range [0, NROWS)
   spinGlass (read)  spin glass data structure
   returns  total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy resulting from vertical interactions between two rows row, row+1
   spinGlass (read)  spin glass data structure
   row               row in spin lattice, in the range [0, NROWS)
   returns  row energy, accounting for cyclic boundary interactions */
gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy between spin and its neighbours immediately above and to the left of it
   spinGlass (read)  spin glass data structure
   leadingSpin       spin position in the range [0, XSIZE*YSIZE), with row major enumeration */
gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin);

/* Write spin states to file
   file (read)       file to write to
   spinGlass (read)  spin glass data structure */
void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file);

/* Write spin states to file
   conf (read)       spin configuration vector to output
   spinGlass (read)  used to specify lattice dimensions
   file (read)       file to write to */
void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file);

/* Write coupling constants to file
   spinGlass (read)  spin glass data structure
   file (read)       file to write to */
void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file);

/* Write clamping states to file
   spinGlass (read)  spin glass data structure
   file (read)       file to write to */
void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file);

/* Generate random spins based on uniform distribution, accounting for clamped spins
   spinGlass (read)  used to specify lattice dimensions and clamping states
   returns  vector of spins storing lattice configuration */
Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass);

/* Determine whether spin glass has cyclic vertical boundary interactions
   spinGlass (read)  spin glass data structure
   returns  TRUE if condition present */
gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass);

/* Compare spin states of two spin glasses
   spinGlass1 (read)  spin glass data structure
   spinGlass2 (read)  spin glass data structure
   returns  minimum number of differing spins, considering spinGlass1's inversion */
gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2);

#endif /* SPINGLASS_H */

1 /
2 F i l e i : s p i n g l a s s . c
3
4 I mpl ement s s pi n g l a s s o p e r a t i o n i n t e r f a c e
5
6 /
7
8 # i nc l ude <s t d i o . h>
9 # i nc l ude <s t r i n g . h>
10 # i nc l ude <g l i b . h>
11 # i nc l ude <g l i b / g p r i n t f . h>
12
13 # i nc l ude s p i n g l a s s . h
14 # i nc l ude a r r a y s . h
15 # i nc l ude random . h
16
17 s t r uc t Spi nGl a s s s p i n g l a s s a l l o c ( g i n t xSi ze , g i n t ySi ze , Spi n i n i t i a l S p i n s , gdoubl e
wei ght s , gbool ean cl amps )
18 g i n t i ;
19
20 s t r uc t Spi nGl a s s s p i n Gl a s s = g new ( s t r uc t Spi nGl as s , 1) ;
21
22 s pi nGl a s s >xSi ze = xSi ze ;
23 s pi nGl a s s >ySi ze = ySi ze ;
24 i f ( xSi ze < 2 ySi ze < 2)
25 g f p r i n t f ( s t d e r r , Warni ng : Tr i e d t o c o n s t r u c t s pi n g l a s s wi t h di mens i ons %d
by %d\n , xSi ze , ySi ze ) ;
26
27
28 / Al l o c a t e s pi n mat r i x /
29 i f ( i n i t i a l S p i n s == NULL)
30 s pi nGl a s s >s p i n s = g new ( Spi n , xSi ze ySi ze ) ;
31 / As s i gn d e f a u l t v a l ue s /
32 f or ( i =0; i <xSi ze ySi ze ; i ++) ( s pi nGl a s s >s p i n s ) [ i ] = UP;
33 s pi nGl a s s >i n i t i a l S p i n s = NULL;
34 e l s e
35 s pi nGl a s s >s p i n s = i n i t i a l S p i n s ;
36 / Se t i n i t i a l s p i n s /
37 s pi nGl a s s >i n i t i a l S p i n s = g new ( Spi n , xSi ze ySi ze ) ;
38 memcpy ( s pi nGl a s s >i n i t i a l S p i n s , s pi nGl a s s >s pi ns , s i z e o f ( Spi n ) xSi ze ySi ze ) ;
39
40
41 / Al l o c a t e bond wei ght mat r i x s t o r e s v e r t i c a l bonds , t he n h o r i z o n t a l bonds /
42 i f ( we i ght s == NULL) s pi nGl a s s >we i ght s = g new0 ( gdoubl e , xSi ze ySi ze 2) ;
43 e l s e s pi nGl a s s >we i ght s = we i ght s ;
44
45 s pi nGl a s s >cl amps = cl amps ;
46
47 ret urn s p i n Gl a s s ;
48
49
50 voi d s p i n g l a s s f r e e ( s t r uc t Spi nGl a s s s p i n Gl a s s )
51 / Free a l l f i e l d s /
52 i f ( s pi nGl a s s >s p i n s != NULL) g f r e e ( s pi nGl a s s >s p i n s ) ;
142 Chapter F. Source Code Listings
53 i f ( s pi nGl a s s >i n i t i a l S p i n s != NULL) g f r e e ( s pi nGl a s s >i n i t i a l S p i n s ) ;
54 i f ( s pi nGl a s s >we i ght s != NULL) g f r e e ( s pi nGl a s s >we i ght s ) ;
55 i f ( s pi nGl a s s >cl amps != NULL) g f r e e ( s pi nGl a s s >cl amps ) ;
56
57 g f r e e ( s p i n Gl a s s ) ;
58
59
60 gdoubl e s p i n g l a s s r o w e n e r g y ( s t r uc t Spi nGl a s s s pi nGl a s s , g i n t row)
61 g i n t i ;
62 gdoubl e ener gy = 0;
63
64 gdoubl e wei ght ; / Bond wei ght /
65 Spi n s pi n0 , s pi n1 ; / Nei ghbour s p i n s /
66
67 g i n t xSi ze = s pi nGl a s s >xSi ze ;
68 g i n t ySi ze = s pi nGl a s s >ySi ze ;
69 Spi n s p i n s = s pi nGl a s s >s p i n s ;
70 gdoubl e we i ght s = s pi nGl a s s >we i ght s ;
71
72 / I t e r a t e t hr ough row s p i n s /
73 f or ( i =0; i <ySi ze ; i ++)
74 s pi n0 = Ar r ayAccess2D ( s pi ns , ySi ze , row , i ) ;
75
76 / Ca l c u l a t e h o r i z o n t a l bond ener gy /
77 wei ght = Ar r ayAccess3D ( wei ght s , ySi ze , xSi ze , row , i , 1) ;
78 i f ( i <ySi ze 1) s pi n1 = Ar r ayAccess2D ( s pi ns , ySi ze , row , i +1) ;
79 e l s e s pi n1 = Ar r ayAccess2D ( s pi ns , ySi ze , row , 0) ;
80 ener gy += wei ght s pi n0 s pi n1 ;
81
82 / Se t ener gy t o MAXFLOAT, i f s pi n0 s t a t e i s u n p e r mi s s i b l e due t o cl amp s t a t e
/
83 i f ( s pi nGl a s s >cl amps != NULL)
84 gbool ean cl amp = Ar r ayAccess2D ( s pi nGl a s s >cl amps , ySi ze , row , i ) ;
85 i f ( cl amp == TRUE && s pi n0 != Ar r ayAccess2D ( s pi nGl a s s >i n i t i a l S p i n s , ySi ze ,
row , i ) )
86 ener gy = G MAXDOUBLE;
87
88
89
90
91 ret urn 1 ener gy ;
92
93
94 gdoubl e s p i n g l a s s i n t e r r o w e n e r g y ( s t r uc t Spi nGl a s s s pi nGl a s s , g i n t row)
95 g i n t i ;
96 gdoubl e ener gy = 0;
97
98 gdoubl e wei ght ;
99 Spi n s pi n0 , s pi n1 ;
100
101 g i n t xSi ze = s pi nGl a s s >xSi ze ;
102 g i n t ySi ze = s pi nGl a s s >ySi ze ;
103 Spi n s p i n s = s pi nGl a s s >s p i n s ;
104 gdoubl e we i ght s = s pi nGl a s s >we i ght s ;
143
105
106 / I t e r a t e t hr ough row s pi ns , ac c umul at i ng ener gy /
107 f or ( i =0; i <ySi ze ; i ++)
108 s pi n0 = Ar r ayAccess2D ( s pi ns , ySi ze , row , i ) ;
109
110 / Ca l c u l a t e v e r t i c a l bond ener gy /
111 wei ght = Ar r ayAccess3D ( wei ght s , ySi ze , xSi ze , row , i , 0) ;
112 i f ( row<xSi ze 1) s pi n1 = Ar r ayAccess2D ( s pi ns , ySi ze , row+1 , i ) ;
113 e l s e s pi n1 = Ar r ayAccess2D ( s pi ns , ySi ze , 0 , i ) ;
114 ener gy += wei ght s pi n0 s pi n1 ;
115
116
117 ret urn 1 ener gy ;
118
119
gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin) {
    gdouble energy = 0;

    Spin spin0, spin1;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    gint row = leadingSpin / spinGlass->ySize;
    gint column = leadingSpin % spinGlass->ySize;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;
    gdouble weight;

    if (row > 0) {
        /* Calculate vertical component */
        spin0 = ArrayAccess2D(spins, ySize, row, column);
        spin1 = ArrayAccess2D(spins, ySize, row - 1, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row - 1, column, 0);
        energy += weight * spin0 * spin1;
    }

    if (column > 0) {
        /* Calculate horizontal component */
        spin0 = ArrayAccess2D(spins, ySize, row, column - 1);
        spin1 = ArrayAccess2D(spins, ySize, row, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row, column - 1, 1);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_energy(struct SpinGlass *spinGlass)
{
    gdouble energy = 0;

    gint i;
    /* Total energy is the sum of row energies and row interactions */
    for (i = 0; i < spinGlass->xSize; i++) {
        energy += spin_glass_inter_row_energy(spinGlass, i);
        energy += spin_glass_row_energy(spinGlass, i);
    }

    return energy;
}

gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf) {
    gdouble energy;

    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    energy = spin_glass_energy(spinGlass);
    spinGlass->spins = currentSpins;

    return energy;
}

void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    Spin spin;

    /* Iterate through spins and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            spin = ArrayAccess2D(spinGlass->spins, spinGlass->ySize, i, j);
            if (spin == UP)
                g_fprintf(file, "%s", STR_SPIN_UP);
            else
                g_fprintf(file, "%s", STR_SPIN_DOWN);

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file) {
    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    spin_glass_write_spins(spinGlass, file);
    spinGlass->spins = currentSpins;
}

void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j, k;
    gdouble weight;

    /* Iterate through weights and format output */
    for (k = 0; k < 2; k++) {
        for (i = 0; i < spinGlass->xSize; i++) {
            for (j = 0; j < spinGlass->ySize; j++) {
                weight = ArrayAccess3D(spinGlass->weights, spinGlass->ySize, spinGlass->xSize, i, j, k);
                g_fprintf(file, WEIGHT_FMT, weight);
            }

            g_fprintf(file, "%s", "\n");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    gboolean clamp;

    /* Iterate through clamps and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            clamp = ArrayAccess2D(spinGlass->clamps, spinGlass->ySize, i, j);
            if (clamp)
                g_fprintf(file, "%s", STR_CLAMPED);
            else
                g_fprintf(file, "%s", STR_UNCLAMPED);

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

Spin *spin_glass_get_random_spins(struct SpinGlass *spinGlass) {
    gint total = spinGlass->xSize * spinGlass->ySize;
    gint i;

    /* Allocate spins */
    Spin *spins = g_new(Spin, total);

    /* Assign spin values */
    for (i = 0; i < total; i++) {
        if (spinGlass->clamps != NULL && (spinGlass->clamps)[i]) {
            /* Clamped status: retain the existing spin value */
            spins[i] = (spinGlass->spins)[i];
        } else {
            /* Assign random spin values */
            gboolean randomVal = rand_coin_toss();
            if (randomVal == TRUE)
                spins[i] = UP;
            else
                spins[i] = DOWN;
        }
    }

    return spins;
}

gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass) {
    gboolean hasVerticalBoundary = FALSE;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;

    gint i;
    /* Iterate through the final row, checking for non-zero vertical bond weights */
    for (i = 0; i < ySize && !hasVerticalBoundary; i++) {
        gdouble weight = ArrayAccess3D(spinGlass->weights, ySize, xSize, xSize - 1, i, 0);
        if (weight != 0) hasVerticalBoundary = TRUE;
    }

    return hasVerticalBoundary;
}

gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2) {
    gint i, j, k;
    gint finalDistance = G_MAXINT;
    gint distance;

    for (k = 0; k < 2; k++) {
        /* Repeat, comparing both original and inverse of spinGlass1 */
        distance = 0;

        for (i = 0; i < spinGlass1->xSize; i++) {
            for (j = 0; j < spinGlass1->ySize; j++) {
                Spin spin1 = ArrayAccess2D(spinGlass1->spins, spinGlass1->ySize, i, j);
                Spin spin2 = ArrayAccess2D(spinGlass2->spins, spinGlass2->ySize, i, j);
                if (k == 0) {
                    if (spin1 != spin2) distance++;
                } else {
                    if (spin1 == spin2) distance++;
                }
            }
        }

        if (distance < finalDistance) finalDistance = distance;
    }

    return finalDistance;
}
/*
    File: io.h

    Specifies IO operation interface
*/

#include "spinglass.h"

#ifndef IO_H
#define IO_H

/* For file input routines using fgets() */
#define MAX_LINE_LEN 100000

/* Read spin configuration from file
   fileName (read)  file name from which to initiate reading
   xSize (write)    number of rows in the obtained configuration
   ySize (write)    number of columns in the obtained configuration
   returns vector of spins, stored in row-major order */
Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin clamping state from file
   fileName (read)  file name from which to initiate reading
   xSize (write)    number of rows in the obtained configuration
   ySize (write)    number of columns in the obtained configuration
   returns vector of spin clamp states, stored in row-major order */
gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin bond configuration from file
   fileName (read)  file name from which to initiate reading
   xSize (write)    number of rows in the obtained configuration
   ySize (write)    number of columns in the obtained configuration
   returns vector of spin bonds, stored in row-major order;
   data for vertical bonds precede those for horizontal bonds */
gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize);

/* Write spin configuration to file
   spinGlass (read) data structure storing spin glass data
   fileName (read)  file name to write data to */
void write_spins(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin clamping state to file
   spinGlass (read) data structure storing spin glass data
   fileName (read)  file name to write data to */
void write_clamps(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin bond configuration to file
   spinGlass (read) data structure storing spin glass data
   fileName (read)  file name to write data to */
void write_weights(struct SpinGlass *spinGlass, gchar *fileName);

#endif /* IO_H */
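
The fragment below is a minimal usage sketch of this interface and is not part of the project sources: it loads a bond-weight file and a spin configuration assumed to be in the formats produced by write_weights() and write_spins(), and reports the lattice dimensions. The file names are hypothetical placeholders; on malformed input the read routines themselves print an error and terminate.

#include <glib.h>
#include <glib/gprintf.h>
#include "io.h"

int main(void)
{
    gint wRows, wCols;   /* lattice dimensions reported by read_weights() */
    gint sRows, sCols;   /* lattice dimensions reported by read_spins() */

    /* Hypothetical input files */
    gdouble *weights = read_weights("example_weights.dat", &wRows, &wCols);
    Spin *spins = read_spins("example_spins.dat", &sRows, &sCols);

    g_fprintf(stdout, "Read a %d x %d spin lattice.\n", sRows, sCols);

    g_free(weights);
    g_free(spins);
    return 0;
}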
/*
    File: io.c

    Implements IO operations specified in io.h
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"

/* Parses a file, adding tokens to a queue
   fileName (read) file name to read from
   xSize (write)   number of token rows contained in the file
   ySize (write)   number of token columns contained in the file
   returns queue containing parsed tokens */
static GQueue *parse_file(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows = 0;
    gint nCols = 0;
    gint nColCheck = 0;

    GQueue *tokenQueue = g_queue_new();

    FILE *file = fopen(fileName, "r");
    gchar line[MAX_LINE_LEN + 1];
    if (file != NULL)
    {
        /* Read lines until end of file, process if non-zero length */
        while (NULL != fgets(line, MAX_LINE_LEN, file)) {
            if (strlen(line) > 0 && line[0] != '\n') {
                gchar *token;
                nRows++;

                nColCheck = 0;
                /* Tokenise lines */
                token = strtok(line, " \t\n");
                while (token != NULL) {
                    gchar *tokenMem = g_malloc(strlen(token) + 1);
                    strcpy(tokenMem, token);

                    nColCheck++;

                    /* Add token to queue */
                    g_queue_push_tail(tokenQueue, tokenMem);
                    token = strtok(NULL, " \t\n");
                }

                /* Check for matching row lengths */
                if (nCols == 0) nCols = nColCheck;
                if (nColCheck != nCols) {
                    g_fprintf(stderr, "Error: The input data matrix does not contain rows of equal lengths.\n");
                    exit(1);
                }
            }
        }
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.\n", fileName);
        exit(1);
    }

    fclose(file);

    *xSize = nRows;
    *ySize = nCols;

    return tokenQueue;
}

Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    Spin *spins = g_new(Spin, (*xSize) * (*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_SPIN_UP) == 0) {
            spins[i] = UP;
        } else if (strcmp(token, STR_SPIN_DOWN) == 0) {
            spins[i] = DOWN;
        } else {
            g_fprintf(stderr, "Error: Unrecognised spin data.\n");
            exit(1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return spins;
}

void write_spins(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_spins(spinGlass, file);
        fclose(file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }
}

gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    gboolean *clamps = g_new(gboolean, (*xSize) * (*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_CLAMPED) == 0) {
            clamps[i] = TRUE;
        } else if (strcmp(token, STR_UNCLAMPED) == 0) {
            clamps[i] = FALSE;
        } else {
            g_fprintf(stderr, "Error: Unrecognised clamp data.\n");
            exit(1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return clamps;
}

void write_clamps(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_clamps(spinGlass, file);
        fclose(file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }
}

gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows, nCols;
    gint i = 0;

    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, &nRows, &nCols);
    gdouble *weights = g_new(gdouble, (nRows * nCols));

    /* Account for vertical and horizontal weights stored in file */
    *xSize = nRows / 2;
    *ySize = nCols;

    /* Simple check for matching vertical/horizontal bond numbers */
    if (nRows % 2 == 1) {
        g_fprintf(stderr, "Odd number of data rows detected when reading bond file. Should be even.\n");
        exit(1);
    }

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);
        gdouble weightVal = 0;

        /* Convert to double */
        if (sscanf(token, WEIGHT_FMT, &weightVal) != 1) {
            g_fprintf(stderr, "Error: Unrecognised bond data.\n");
            exit(1);
        }

        weights[i++] = weightVal;

        g_free(token);
    }

    g_queue_free(tokenQueue);
    return weights;
}

void write_weights(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spin glass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spin_glass_write_weights(spinGlass, file);
        fclose(file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }
}
/*
    File: arrays.h

    Specifies array operation interface
    and defines macros for array operations
*/

#include <glib.h>

#ifndef ARRAYS_H
#define ARRAYS_H

/* Emulates two-dimensional array access
   array      pointer to data
   rowlength  length of one row
   i, j       array indices */
#define ArrayAccess2D(array, rowlength, i, j) ((array)[(i) * (rowlength) + (j)])

/* Emulates three-dimensional array access
   array                    pointer to data
   rowlength, columnlength  extents of one plane
   i, j, k                  array indices */
#define ArrayAccess3D(array, rowlength, columnlength, i, j, k) \
    ((array)[(columnlength) * (rowlength) * (k) + (i) * (rowlength) + (j)])

/* Array data types */
typedef guint64 t_int;
typedef gdouble t_double;

/* Construct two-dimensional array. Data contiguity is ensured
   nRows     number of rows
   nColumns  number of columns
   returns pointer to allocated data */
t_int **array_new_2D(t_int nRows, t_int nColumns);

/* Destruct two-dimensional array previously allocated with array_new_2D()
   array  the array to destruct */
void array_free_2D(t_int **array);

/* Construct three-dimensional array. Data contiguity is ensured
   nZ        size of third dimension
   nRows     number of rows
   nColumns  number of columns
   returns pointer to allocated data */
t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns);

/* Destruct three-dimensional array previously allocated with array_new_3D()
   array  the array to destruct */
void array_free_3D(t_double ***array);

int array_utest(void);

#endif /* ARRAYS_H */
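
To illustrate the indexing performed by these macros, consider the bond-weight array used in spinglass.c: it holds the vertical-bond plane (k = 0) followed by the horizontal-bond plane (k = 1), each stored in row-major order, so for an xSize-by-ySize lattice ArrayAccess3D(weights, ySize, xSize, row, col, k) resolves to weights[k*xSize*ySize + row*ySize + col]. The fragment below is not part of the sources and uses hypothetical values.

#include <glib.h>
#include "arrays.h"

void arrays_example(void)
{
    /* 4 x 4 lattice, two bond planes (vertical, then horizontal) */
    gdouble *weights = g_new0(gdouble, 4 * 4 * 2);

    /* Horizontal bond (k = 1) at row 2, column 1: element 1*16 + 2*4 + 1 = 25 */
    ArrayAccess3D(weights, 4, 4, 2, 1, 1) = 0.5;
    g_assert(weights[25] == 0.5);

    g_free(weights);
}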
/*
    File: arrays.c

    Implements array operation interface specified in arrays.h
*/

#include <glib.h>
#include <stdio.h>

#include "arrays.h"

t_int **array_new_2D(t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_int **array = g_malloc(nRows * sizeof(t_int *));
    /* Allocate data block */
    array[0] = g_malloc(nRows * nColumns * sizeof(t_int));
    /* Assign data offsets */
    for (i = 1; i < nRows; i++) array[i] = array[0] + i * nColumns;

    return array;
}

void array_free_2D(t_int **array) {
    g_free(array[0]);
    g_free(array);
}

t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_double ***array = g_malloc(nZ * sizeof(t_double **));
    /* Allocate second-level pointer block */
    array[0] = g_malloc(nZ * nRows * sizeof(t_double *));
    /* Allocate data block */
    array[0][0] = g_malloc(nZ * nRows * nColumns * sizeof(t_double));

    /* Assign second-level pointer offsets */
    for (i = 0; i < nZ; i++) array[i] = array[0] + nRows * i;
    /* Assign data offsets */
    for (i = 0; i < nZ * nRows; i++) (*array)[i] = (*array)[0] + i * nColumns;

    return array;
}

void array_free_3D(t_double ***array)
{
    g_free(array[0][0]);
    g_free(array[0]);
    g_free(array);
}

int array_utest(void)
{
    gint i, j, k;
    t_int **array = array_new_2D(10, 10);
    t_double ***array2 = array_new_3D(5, 32, 32);

    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++) array[i][j] = i * 10 + j;

    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++) g_assert(array[i][j] == i * 10 + j);

    array_free_2D(array);

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                array2[i][k][j] = i * 1024 + k * 32 + j;
                g_assert(array2[i][k][j] == i * 1024 + k * 32 + j);
            }
        }
    }

    for (i = 0; i < 5; i++) {
        for (j = 0; j < 32; j++) {
            for (k = 0; k < 32; k++) {
                g_assert(array2[i][k][j] == i * 1024 + k * 32 + j);
            }
        }
    }

    array_free_3D(array2);

    return 0;
}
/*
    File: random.h

    Defines interface for random number generation
*/

#include <glib.h>

/* Generate continuously distributed random double in the range [lower, upper)
   lower  lower limit
   upper  upper limit */
gdouble rand_continuous(gdouble lower, gdouble upper);

/* Generate equally distributed random boolean */
gboolean rand_coin_toss();
/*
    File: random.c

    Implements interface for random number generation
*/

#include <stdio.h>
#include <glib.h>
#include "random.h"

gdouble rand_continuous(gdouble lower, gdouble upper) {
    return g_random_double_range(lower, upper);
}

gboolean rand_coin_toss() {
    gboolean value = g_random_boolean();
    return value;
}
/*
    File: bforce_gstatefinder.c

    Implements brute force ground state finder
*/

#include <glib.h>
#include <glib/gprintf.h>
#include <stdio.h>
#include "spinglass.h"
#include "gstatefinder.h"

static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy,
        struct SpinGlass *spinGlass);

gdouble find_ground_states(struct SpinGlass *spinGlass) {
    gint nSpins = spinGlass->xSize * spinGlass->ySize;
    gdouble minEnergy = G_MAXDOUBLE;

    /* Initiate brute force evaluation */
    find_ground_states_brute_force(nSpins, &minEnergy, spinGlass);

    return minEnergy;
}

/* Recursive brute force ground state evaluation
   leadingSpin             spin window position, used to specify the state to be flipped
                           and to evaluate the base case
   minEnergy (read/write)  records the current minimum energy. For each invocation of the
                           function, states are output if their energy is no higher than
                           the value currently held by this variable
   spinGlass (read/write)  spin glass data structure whose spins are manipulated during search */
static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy,
        struct SpinGlass *spinGlass) {
    /* Base case */
    if (leadingSpin == 0) {
        /* Compute energy */
        gdouble energy = spin_glass_energy(spinGlass);

        if (energy < *minEnergy) {
            *minEnergy = energy;
        }

        if (energy == *minEnergy) {
            g_printf("\nLeaf node with energy %E\n", energy);
            g_printf("Is current ground state\n");
            spin_glass_write_spins(spinGlass, stdout);
        }
    } else {
        /* Evaluate subtree with the leading spin up */
        spinGlass->spins[leadingSpin - 1] = UP;
        find_ground_states_brute_force(leadingSpin - 1, minEnergy, spinGlass);
        /* Flip spin down and evaluate the remaining subtree */
        spinGlass->spins[leadingSpin - 1] = DOWN;
        find_ground_states_brute_force(leadingSpin - 1, minEnergy, spinGlass);
    }
}
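
Since each level of the recursion branches over the two states of one spin, the base case is reached 2^N times for an N-spin lattice, and each leaf performs a full O(N) energy evaluation, giving roughly N * 2^N operations overall. For example, a 4 x 4 lattice requires 2^16 = 65,536 energy evaluations, whereas a 6 x 6 lattice already requires 2^36, approximately 6.9 x 10^10, which restricts this brute force enumerator to very small problem instances.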
/*
    File: gstatefinder.h

    Specifies interface for ground state solvers
*/

#include "spinglass.h"

#ifndef GSTATEFINDER_H
#define GSTATEFINDER_H

/* Determine ground states of spin glass
   spinGlass (read)  the spin glass to evaluate
   returns the ground state energy */
gdouble find_ground_states(struct SpinGlass *spinGlass);

#endif /* GSTATEFINDER_H */