You are on page 1of 13

Smoothed Aggregation Multigrid for Cloth Simulation

Rasmus Tamstorf Toby Jones Stephen F. McCormick


Walt Disney Animation Studios Walt Disney Animation Studios University of Colorado, Boulder

Abstract

Existing multigrid methods for cloth simulation are based on ge-


ometric multigrid. While good results have been reported, ge-
ometric methods are problematic for unstructured grids, widely
varying material properties, and varying anisotropies, and they
often have difficulty handling constraints arising from collisions.
This paper applies the algebraic multigrid method known as
smoothed aggregation to cloth simulation. This method is ag-
nostic to the underlying tessellation, which can even vary over
time, and it only requires the user to provide a fine-level mesh.
To handle contact constraints efficiently, a prefiltered precondi-
tioned conjugate gradient method is introduced. For highly effi-
cient preconditioners, like the ones proposed here, prefiltering is
essential, but, even for simple preconditioners, prefiltering pro-
vides significant benefits in the presence of many constraints.
Numerical tests of the new approach on a range of examples con-
firm 6 8 speedups on a fully dressed character with 371k ver-
tices, and even larger speedups on synthetic examples.

CR Categories: I.6.8 [Simulation and Modeling]: Types of Figure 1: The method presented in this paper provides an 8 speedup for
SimulationAnimation a walk cycle animation of this character and 6 for a run cycle anima-
tion. These numbers are compared to a block diagonally preconditioned
Keywords: cloth simulation, smoothed aggregation, algebraic CG method. The garments consist of a combined 371, 064 vertices.
multigrid, equality constrained optimization

1 Introduction paper investigates several possible causes and potential reme-


dies for these difficulties. One difficulty inherent to elasticity is
Multigrid methods are well-known and theoretically optimal in that the different displacement variables are interconnected in
that they promise to deliver a solution to a wide variety of discrete the sense that changes in the smooth components of one variable
equations with compute time proportional to the number of un- strongly affect others. In fact, neither geometric multigrid meth-
knowns. However, multigrid algorithms that obtain full compu- ods (GMG) [Brandt 1977] nor algebraic multigrid methods (AMG)
tational efficiency can be difficult to design for new applications, [Brandt et al. 1985], when designed in a standard way, work opti-
especially when constraints are introduced. mally when the variables are strongly coupled. We elaborate on
this below with a small didactic example.
The goal of this paper is the acceleration of high-end cloth sim-
ulation, with applications ranging from feature film production Our primary contribution here is to apply a multigrid precon-
to virtual try-on of garments in e-commerce. The methods de- ditioner based on smoothed aggregation (SA) [Mka and Vanek
veloped here also apply to the broader problem of thin-shell sim- 1992] to cloth simulation. SA is designed to handle coupling be-
ulation that has many applications in engineering. Common to tween variables and is superior to simple diagonally precondi-
all of the applications is that higher resolution leads to higher fi- tioned conjugate gradients even for relatively small problems. SA
delity, but also often prohibitive computation times for conven- is also purely algebraic, so it is agnostic to the choice of input
tional methods. meshes and relieves the user of explicitly generating hierarchies
or choosing meshes such that they are amenable to coarsening.
Existing multigrid approaches to these types of problems typ- In fact, SA works with completely irregular and potentially even
ically require structured meshes and have difficulty with colli- (time) adaptive meshes. It is also agnostic to the choice of tri-
sions. Furthermore, their convergence rates and scaling proper- angles or quads for the underlying discretization. To our knowl-
ties have often not attained full computational efficiency. This edge, this work is the first time that algebraic multigrid methods
have been applied to cloth simulation. The initial focus here is
on the formulation introduced by [Baraff and Witkin 1998], but it
has important implications on other formulations and it provides
a pathway to treating other models.

While many existing literature sources address multigrid funda-


mentals, we are not aware of any that provide the comprehen-
sive but condensed focus needed to address the several chal-
lenges present in cloth simulation. To this end, in sections 4
and 5, we present a concise discourse on the fundamentals of
multigrid methodology and the development of smoothed aggre-
gation, including some theoretical foundations and heuristics,
Figure 2: Visualization of the strength of connection within a matrix for each of the three variables, (x , y , z ), at two frames of a simulation. Strong negative
off-diagonal connections between vertices are shown in UV space as blue lines. The two red lines in the second-to-left image indicate strong positive off-
diagonal connections. For clarity, "weak" connections are omitted. In the undeformed state (three leftmost images), the x and z components are each
anisotropic but in different directions. In the deformed state (three rightmost images), different directions of anisotropy appear even within a single variable.

known and new, that guide the development of effective multi- cloth, where the boundary along the cut-out corner is held fixed
grid algorithms for novel applications. This discussion is new in while the rest falls under gravity. The corresponding strength of
that it focuses on the connection between existing multigrid the- connections in the associated stiffness matrix exibits not only
ory and the development of effective algebraic multigrid solvers distinct anisotropies, but also directions of anisotropy that vary
applied to thin shells, especially with collisions. Our experiences in space and time (see figure 2). Standard methods like semi-
with the failure of standard AMG and SA for these systems led us coarsening or line relaxation used with geometric multigrid are
to reconsider basic multigrid concepts and principles, and this thus ineffective for this problem.
v2
discussion will hopefully act as a guide to anyone who might at-
tempt to apply multigrid to other related applications. The theory The second challenge related to
strong coupling between vari- x
f2
is further elaborated upon in the supplemental material.
ables can easily be seen by con-
A critical component in cloth simulation is collision handling. We sidering the simple 2D example f0
do not propose any new methods for collision detection or re- shown in the inset figure to the
sponse, but we do introduce a novel way of handling the filter- right. In this example, two edges
ing step originally introduced by [Baraff and Witkin 1998]. The are bent from their rest config- y v0 v1
new method, called Prefiltered Preconditioned CG (PPCG), is one uration, which generates the set
of the principal contributions of this paper. PPCG is essential for of bending forces labeled f 0 , f 1 ,
maintaining the advantage of the SA preconditioner in the pres- and f 2 . If we apply a displace- x f1
ence of collision constraints, but it is also advantageous when ment, x , to the top vertex, v 2 , then all the bend forces increase
using a simple diagonal preconditioner. We introduce PPCG for in magnitude. In particular, this means that a change in the x
treating collisions in section 8. coordinate of v 2 leads to a change in the x coordinate of f 2 , but
it also leads to a change of the y coordinate of f 1 . Because the
In section 11, we document the effectiveness of SA and PPCG two changes have the same magnitude, the x and y variables are
with a comprehensive set of tests that show significant compu- interpreted to be strongly coupled from a multigrid point of view.
tation speedup over conventional methods for a wide range of
cloth simulations of interest. The method has been implemented The theory in section 4 suggests that this strong cross-variable
in a production-quality solver, and can be implemented in other coupling would lead to poor performance of standard AMG. We
systems simply by replacing the innermost linear solver. confirmed this numerically by running several tests for the L-
shaped problem mentioned above. Let the number of vertices in
a simulation be n. Standard AMG then performed well for each of
2 Challenges for multigrid the three n n blocks associated with the individual unknowns
when the coupling between unknowns was deleted in the ma-
Partial differential equations (PDEs) for thin shell elasticity with trix. However, for the full matrix, its convergence rate was poor
large deformations are complicated, yet most cloth models are even for small problems, and degraded further as the problem
approximations of these equations. Even the highly simplified sizes increased. Standard AMG simply was unable to provide a
models of a planar elastic membrane undergoing small defor- significant improvement over diagonally preconditioned conju-
mations result in biharmonic equations. For most methods, in- gate gradients (PCG).
cluding multigrid, these equations are substantially more com-
plicated to find solutions for than Poisson-like problems, which
is what is often considered in the multigrid literature. 3 Related work
Two challenges that affect our work directly are anisotropy and Given the wide range of applications and the cost of the simula-
strong coupling between variables. Anisotropy is usually associ- tions, it is not surprising that multigrid methods have been stud-
ated with constitutive models and, in fact, cloth is an example ied in the graphics community for cloth simulation and, in the
of an anisotropic material. However, in the context of multigrid wider multigrid community, for the more generic linear elasticity
methods, anisotropy simply means that certain variables are con- and thin-shell problems.
nected more strongly than others in the underlying matrix, and
that there is a pattern to the directionality of these strong con- Some of the earliest work on multigrid for thin shells appears
nections. The notion of a strong connection here corresponds in [Fish et al. 1996] where they investigated an unstructured
to a large off-diagonal value of the associated matrix element rel- multigrid method for thin shells amounting to an extension of
ative to the diagonal. What is important to note is that this type geometric multigrid that coarsened with knowledge of the prob-
of anisotropy occurs even with an isotropic homogeneous elastic lem. However, they acknowledge that their method has limita-
material. This behavior is due in part to the Poisson effect. As tions in that it does not address higher-order interpolation and it
an illustration, consider the simulation of an L-shaped piece of was never tested on large-scale problems.
More recently, Gee et al. [Gee et al. 2005] addressed more com- lation is free of continuous time collisions at the end of a time
plicated shell models using a finite element discretization. They step, we follow our linear solve by one or more loops of Bridsons
used an aggregation-based approach that has many similarities method [Bridson et al. 2002, section 7.4]. If this still fails to resolve
to an SA hierarchy. However, for their method, they treat the shell all collisions, then we apply the fail-safe in [Harmon et al. 2008].
as a (thin) 3d solid unlike the typical 2d manifold approach used Only the contributions from Baraff & Witkins paper actually af-
for cloth. To avoid severe ill-conditioning and to obtain conver- fect what goes into the linear system that our solver sees. This in-
gence rates independent of conditioning, they apply scaled di- cludes constraints for cloth/object collisions and repulsion forces
rector preconditioning. This also allows them to model varying for cloth/cloth collisions (the latter being akin to the repulsion
thickness across the shell. In their follow-up work [Gee and Tu- forces in section 7.2 of [Bridson et al. 2002] but included into the
minaro 2006], the focus is on using a nonlinear solver, and adap- implicit solve as originally suggested by [Baraff and Witkin 1998]).
tive SA is used to precondition the linearizations. In our expe- While multigrid methods can be used as stand-alone solvers, we
rience, adaptive SA is currently too expensive for typical cloth found it more effective in our system to use multigrid as a pre-
problems, although this may change in the future with improve- conditioner within CG. The only change we made to our system
ments in the adaptive algorithm and demands for much more is therefore within the CG method. All other features and limita-
computationally intense simulations. tions remain the same.
Related research within the graphics community has been fo- This work should apply without modifications to other cloth sim-
cused primarily on applying various types of geometric multigrid ulators such as the one in [Narain et al. 2012], which re-tessellates
to cloth simulation. Oh et al. [2008] implemented a GMG method the geometry adaptively throughout the simulation.
that uses CG for the smoother on each level, with a focus on pre-
serving energy and mass at all levels of the hierarchy. Linear in-
terpolation is used between levels, the level hierarchy is attained 4 Multigrid fundamentals
through regular subdivision, and constraints are only treated on
the fine grid. This produced a significant speedup for their simu- Conventional multigrid methods, whether geometric or alge-
lations but failed to show linear scaling in the size of the problem, braic, tend to perform poorly for thin-shell applications. To un-
and the performance deteriorated in the presence of constraints. derstand the difficulties inherent in this application and develop
Lee et al.[2010] used a multi-resolution approach that focused on a more advanced algebraic multigrid method with improved per-
using adaptive meshing to only place subdivisions where needed formance, we carefully consider the basic multigrid principles
to provide acceleration, compared to a GMG with regular subdi- and the nature of the equations that we are treating. We doc-
vision. It is not clear how to use this approach in a multilevel fash- ument the concepts that originally guided us through this pro-
ion, nor how it addresses the difficulties arising from the PDE and cess by providing a basic overview of the multigrid methodol-
collisions. Jeon et al. [2013] extended the work in [Oh et al. 2008] ogy tailored to thin-shell equations. This is covered in the next
to handle hard constraints, as presented in [Baraff and Witkin two sections, with additional background available in the sup-
1998], by converting them to soft constraints to avoid the chal- plemental material. The aim in this development is to give in-
lenges of coarsening them. They use GMG as a direct solver, with sight into the steps needed to design effective multigrid solvers
PCG as their smoother for restriction and GS as their smoother for thin-shell equations and provide a template for their devel-
for prolongation, but require an expensive V(5,10) cycle. opment for future applications. We draw on relevant existing
multigrid concepts and principles available in the basic tutorial
One of the exceptions to the use of geometric multigrid in the presented in [Briggs et al. 2000], the additional material devel-
graphics literature is an AMG-like algorithm developed by Kr- oped in [Trottenberg et al. 2000], and relevant theory given in
ishnan et al. [2013]. Their focus is on discrete Poisson equa- [McCormick 1984] and [Vassilevski 2008]. We also include several
tions written in terms of M-matrices, which (among other prop- new insights that we gained in this development effort.
erties) have only nonpositive off-diagonal entries. Basically,
their approach is to modify the original matrix by first select- Multigrid relies on two complementary processes: smoothing (or
ing unknowns that represent the coarse grid and then elimi- relaxation) to reduce oscillatory errors associated with the up-
nating interconnections between the remaining fine-grid-only per part of the spectrum, and coarsening (or coarse-grid correc-
unknowns. This elimination is accomplished in a sparsifica- tion) to reduce smooth errors associated with the lower part of
tion/compensation process that is analogous to AMGs so-called the spectrum. Many choices exist for smoothing, but the various
stencil collapsing procedure. The difference is that, while AMG multigrid methods are distinguished mostly in how they coarsen.
eliminates these connections to determine interpolation, their
goal is to produce a modified matrix (to be used as a precondi- Geometric multigrid methods rely on the ability to coarsen a
tioner) that naturally preserves M-matrix character on coarser grid geometrically and to (explicitly or implicitly) define dis-
levels. Their method is shown to be superior to two algebraic cretizations on coarser grids, as well as interpolation operators
multigrid techniques that were modified to similarly preserve M- between the grids. Unfortunately, geometric multigrid meth-
matrix character, so it is no doubt important in some applica- ods can be difficult to develop for problems with unstructured
tions. However, their results show a loss of efficiency compared grids, complex geometries, and widely varying coefficients and
to more standard algebraic multigrid methods. Our aim here is anisotropies. As a convenient alternative to GMG methods, AMG
to develop an AMG approach that is not restricted to M-matrix and its cousin SA were developed to provide automatic processes
discretizations. for coarsening based solely on the target matrix. AMG coarsens
a grid algebraically based on the relative size of the entries of the
The simulator we use for our experiments is based on a com- matrix to determine strong connections, thereby forming a hier-
bination of the methods presented by Baraff and Witkin [1998] archy of grids from the finest, on which the original problem is
and Bridson et al. [2002]. As stated above we do not propose any defined, down to the coarsest, which typically consists of just a
new contributions related to the material model or the way colli- few degrees of freedom. The AMG coarsening process produces
sions are handled. In particular, we use the material model from coarse grids whose degrees of freedom are subsets of those on
[Baraff and Witkin 1998], and the basic contact handling is as de- the fine grid (represented by identity rows in the interpolation
scribed in section 6 of that paper (including approximate han- matrix). Thus, while AMG is an algebraic approach, a geomet-
dling of friction). Since that does not guarantee that the simu- ric representation of coarse-grid nodes in the continuum is still
easily determined. of its residual, Ae :

For linear finite element discretizations of Poissons equation on C


regular 2D grids, the parameters for AMG can be selected to pro- min ke P u c k2A Ae , Ae .
c
u kAk
duce the usual geometric coarsening with linear interpolation.
In this case, the coarse-grid matrix is essentially what FE would
e ,Ae
produce by re-discretization on the coarse grid. AMG and GMG Let any vector e be called a near-kernel component if either kAk
solvers would then have similar interpolation, restriction, and is small compared to ke k2 = e , e or Ae ,Ae
is small compared to
kAk
coarse-grid components. It is thus often safe to make assump-
ke k2A = e , Ae . The SAP is sufficient to guarantee fast conver-
tions about the convergence of a standard GMG approach by
gence of the most cost-effective form of a multigrid solver called
looking at the convergence of an AMG implementation. Yet AMG
the V-cycle. (See the supplemental material for a detailed descrip-
can automatically produce effective coarse levels for many prob-
tion.) A consequence of this guarantee is that the SAP creates a
lems that do not lend themselves to geometric approaches.
clear goal for the interpolation operator in that it must, at the very
Smoothed aggregation is an advanced aggregation-based least, adequately approximate near-kernel components. This is
method founded on algebraic multigrid principles. When coars- an essential observation in what follows. However, the global na-
ening the grid, these methods form agglomerates (by grouping ture of the energy norm that the SAP is based on makes it difficult
fine-grid nodes) that each become a node of the coarse grid. to use as a design tool. It is more common in practice to develop
The points that go into agglomerates are also formed based on schemes with estimates based on the Euclidean norm, as the fol-
relative strength between elements of the matrix. However, for lowing weaker property provides.
standard SA, coarse nodes do not correspond to single fine-grid
nodes. So, for vertex-centered discretizations, it is generally not
Definition 4.2 The Weak Approximation Property (WAP) holds
possible to assign geometric meaning to the coarse grids that SA
if a fine-grid error, e , can be approximated in the Euclidean norm
produces, especially for systems of PDEs. Smoothed aggregation
by the coarse grid with accuracy that depends on its energy norm:
also tends to coarsen more aggressively than AMG and GMG, so
the coarse matrices and interpolation operators generally must C
work harder to obtain efficiency comparable to that of AMG and min ke P u c k2 Ae , e .
c u kAk
GMG.

Once a coarse grid has been selected, an interpolation opera- Developing schemes based on the WAP is easier because es-
tor, P , that maps coarse-grid to finer-grid functions must be timates involving the Euclidean norm can be made locally in
formed. Selection of the interpolation operator can be guided neighborhoods of a few elements or nodes. This locality provides
by classical multigrid theory. Solving a fine-grid matrix equation the basis for the classical development of both AMG and SA. Un-
of the form Au = f , where A is symmetric positive definite, is like the SAP, however, the WAP itself is not enough to ensure V-
equivalent to minimizing the discrete energy functional given by cycle convergence, which is why interpolation-operator smooth-
F (v ) Av , v 2v , f , where v is an approximation of the exact ing (see the next section) is used in aggregation-based methods.
solution u . Simple algebra shows that the best coarse-grid cor-
rection to a fixed approximation, v , in the sense of minimizing Even with the above requirements there are many ways to con-
F (v P v c ), is expressed by struct the interpolation operator, P . Standard GMG and AMG are
examples of so-called unknown-based multigrid methods, where
1
the degrees-of-freedom (DOF) are coarsened and interpolated

v v P P T AP P T Av f .

separately. To illustrate this approach, note that a cloth simula-
This objective leads to the two so-called variational conditions tion typically involves three displacement unknowns. The result-
that are used in forming AMG/SA hierarchies: restriction from ing matrix, A, can therefore be written in the form
the fine to the coarse grid is the transpose of interpolation, and
the coarse-grid matrix is given by the Galerkin" product A c A A 1,2 A 1,3
1,1
A = A 2,1 A 2,2 A 2,3 , (1)

P T AP . GMG by comparison is often done by re-discretizing the
problem on the coarse levels to obtain A c , and then computing A 3,1 A 3,2 A 3,3

v v P (A c )1 P T Av f . where each block is of size n n, with n the number of nodes used



in the discretization. When using an unknown-based method,
interpolation and coarsening are constructed based on the block
To construct an effective interpolation operator, we must under- diagonals to form the full interpolation operator:
stand why conventional iterative methods tend to work well for
a couple of iterations, but then soon stall. This phenomena is P 0 0
due to the error quickly becoming smooth and difficult to re- 1,1
P = 0 P2,2 0 . (2)

duce. But this smoothing property is precisely all that we ask 0 0 P3,3
of relaxation. The basic idea for multigrid is that smooth error
can then be eliminated efficiently on coarse levels. For coars- The coarse-grid matrix is then formed using the Galerkin oper-
ening to work well this way, interpolation must adequately ap- ator P T AP . This works well when the PDE is dominated by the
proximate smooth errors, as articulated in the following property connections within the unknowns. However, for problems like
1
[Vassilevski 2008]. Here and in what follows, kk and kkA kA 2 k elasticity with strong off-diagonal connections, unknown-based
denote respective Euclidean and energy norms. approaches can suffer poor convergence and scaling.
Another choice for applying multigrid to systems of PDEs is
Definition 4.1 The Strong Approximation Property (SAP) holds the nodal approach, where the fine-grid matrix is structured in
if a fine-grid error, e , can be approximated in the energy norm by blocks of the same size as the number of unknowns in the PDE at
the coarse grid with accuracy that depends on the Euclidean norm each grid point. This approach complicates the determination of
strength between nodes and, in turn, coarsening, but it provides a allocation is known at the beginning of the construction process.
bridge to smoothed aggregation. Instead of coarsening a grid by In general, S is not needed after setup, so it can be discarded after
identifying a set of nodes that becomes the coarse grid, SA par- that phase. Usually, the strength between nodes is defined in a
titions the grid into aggregates that are strongly interconnected. way that allows S to be symmetric, and the cost of assembling S
Akin to how finite elements use local functions, SA then assigns is reduced to constructing the upper (or lower) triangular part of
each aggregate a basis of local vectors that can be linearly com- the matrix. Classically, the strength between nodes is defined as
bined with each other and bases of the other aggregates to ade-
quately (according to the WAP) approximate all fine-grid smooth i = j,
error. The coefficients in these linear combinations constitute 1, 1/2 1/2

s i j = 1, Ai i Ai j A j j > i ,max ,
the coarse-grid unknowns, and the vectors themselves represent
an approximation of the near-kernel for the coarse problem. This 0, otherwise,
form gives SA the natural ability to interpolate across unknowns,
and it has the added benefit of being able to fit a much richer set where () denotes the spectral radius of a matrix, i ,max =
1/2 1/2
of errors. max j 6=i A i i A i j A j j , and (0, 1). s i j effectively deter-
mines strength relative to other off-diagonals in row i . Also, A i j
In summary, multigrid methodology involves approximating the here refers to the block (of size b 1 b 1 ) associated with a nonzero
algebraically smooth error left by relaxation by forming a coarse in the matrix between nodes i and j .
grid and an interpolation operator from it to the fine grid that ad-
equately represents these errors in accordance with the WAP. SA, Based on S , define the set S to be the special nodes, by which
in particular, approximates algebraically smooth errors by choos- we mean those that are not strongly connected to any other node
ing aggregates of nodes that are connected strongly enough to or that correspond to a row in the matrix that is very diagonally
enable one or a few basis elements to represent these errors lo- dominant. Relaxation leaves little to no error at the special nodes,
cally, with the WAP guiding the choice of the basis elements that so they need not be interpolated to or from coarser levels and
properly approximate these smooth errors. are therefore gathered into one aggregate that is not affected by
interpolation from coarser levels.
5 Smoothed Aggregation Next, to facilitate description of the aggregation process, let the
set of fine-level nodes be represented by their indices, D1 =
Construction in SA of a hierarchy of matrices and the correspond- {1, 2, . . . , n 1 }. The next phase then constructs a fine-level parti-
ing interpolation operators between successive levels proceeds tion, {A1 , A2 , . . . , An 2 }, of D1 \ S into disjoint aggregates:
in three stages: selection of aggregates (A i on level i = 1, 2, . . . , m
from fine to coarse grids), forming interpolation operators (Pi ), n2
[
and then forming coarse-grid operators (A i +1 = PiT A i Pi ). Since D1 \ S = Ai , Ai A j = ;, i 6= j .
SA is a nodal approach, on any given level i of the hierarchy, A i is i =1

assumed to have n i nodes, each corresponding to b i b i blocks.


Each aggregate here forms a node on the coarse level. Given the
At the finest level, b 1 is the number of unknowns in the original
set of special nodes, this phase is accomplished by two passes
PDE (i.e., 3 displacements in our case). The dimensions of A i is
through the nodes. In the first pass, an initial set of aggregates is
block n i n i when considered nodally, and (n i b i )(n i b i ) when
formed and, in the second, all unaggregated nodes are assigned
all unknowns are considered.
to existing aggregates.
Smoothed aggregation assumes that we are given a set of near-
kernel components that constitute the columns of a matrix, K . Algorithm 1 Form aggregates, pass 1
This near-kernel matrix is used below to construct bases for 1: input Set of nodes, D, and SOC matrix, S .
the agglomerates. K must have the property that any near- 2: R =D
kernel component e must be adequately approximated (accord- 3: k =0
ing to the WAP) in each aggregate by a linear combination of the 4: for i = 1, . . . , n 1 do
columns of K restricted to that aggregate. For scalar Poisson 5: Form Ni = {j : s i j = 1, i 6= j }
equations, one near-kernel component (typically the constant 6: if Ni R = Ni then
vector) is usually enough to obtain good performance. For 2D 7: Ak = Ni {i }
linear elasticity, three components (typically a constant for each 8: k =k +1
of the two displacement unknowns and a rotation) are usually 9: R = R \ (Ni {i })
needed. In section 6, we return to the problem of choosing K . 10: end if
11: end for
The first step in aggregating nodes is to form a strength-of-
12: n2 = k
connection (SOC) matrix, S , which serves multiple purposes. Its
13: return Aggregates, A1 , . . . , An 2 , and still un-aggregated nodes
primary function is to provide a structure where strength" be-
R.
tween any pair of nodes in the grid" is stored. This is used to de-
cide which nodes are strongly interconnected so that they can be
grouped together into small local aggregates. Another purpose of The goal of the first pass is to create a set of aggregates from a
S is to treat a problem caused by anisotropy. The problem arises maximally independent set of strongly connected nodes. One
because the interpolation should be in the direction of strength, way to do this is outlined here. Each node is examined once in
but the smoothing that is used to improve the interpolation op- turn, in any logical order. If none of the current nodes strongly
erator can smear in the direction of weakness. S can be used to connected neighbors are in an aggregate, then they join the cur-
identify the potential for this smearing and filter smoothing by rent node to form a new aggregate. Otherwise, the current node is
eliminating the weak connections in the matrix. left alone and the next node is examined similarly. More specif-
ically, let R be the set of node indices that are not currently as-
The SOC matrix is usually chosen with a sparsity pattern that is signed to an aggregate. Initially, R = D1 \ S . Let Ni = {j : s i j =
a subset of the original nodal matrix. This can be advantageous 1, i 6= j } be the set of points that are strongly connected to point
in the implementation because the size of the necessary memory i . The first pass is outlined in Algorithm 1.
K1 Q1 R1

R2 R1

n2 coarse level nodes


n1 fine level nodes
K2 Q2
R2

R3 R3
K3 Q3

R4
R4
K4 Q4

Fine level kernel n2 aggregates Coarse level kernel

Figure 3: To form the kernel for a coarse level, we start with the kernel for the fine level (far left), where the rows have been tagged according to which
aggregate the corresponding node belongs to. All the rows with identical tags are then combined to form the local kernels (middle), and a thin QR decom-
position is applied to each local kernel. The resulting Q matrices form the building blocks for the tentative interpolation operator, P , while the resulting R
matrices form the building blocks for the coarse-level kernel (far right). P is obtained by replacing each K i matrix with the corresponding Qi matrix and
then permuting back to the original row ordering (second from the left).

After the initial set of aggregates is formed, a subset of unaggre- kernel components.
gated nodes, R, remains. The goal now is to assign the nodes in
R to aggregates in the list A1 , . . . , An 2 . This assignment can be A local QR of each block can now be formed: K i = Qi R i ,QiT Qi =
done by looping over each aggregate and assigning to it all nodes I , which yields the matrices Q1 , . . . ,Qn 2 and R 1 , . . . , R n2 . The
left in R that are strongly connected to one of its nodes. (An alter- columns of Qi form a local basis spanning the near kernel in Ai .
native is to loop over each node in R, assigning it to the aggregate Given this decomposition, the tentative interpolation operator,
that it is most strongly connected to.) All non-special nodes are P , is formed via
strongly connected to at least one node, so this step ensures that

Q1 0 0 0 R1
they will all be aggregated. Each aggregate is represented by a 0 Q
2 0 0
R2

node on the coarse level, so that level will have size n 2 . This step P = .
.. .. .. , R = ..

.
. . .
is outlined in Algorithm 2. . . .
0 0 Qn 2 Rn 2
Algorithm 2 Form aggregates, pass 2
1: input Aggregates, A1 , . . . , An 2 , and still un-aggregated nodes
Here, P T P = I by construction.
R. A coarse near kernel must be determined to allow for recursion
2: for i = 1, . . . , n 2 do to coarser levels. But K = P R means that the fine-level near
3: Let m i be the number of elements in Ai kernel can be exactly represented on the coarse level by simply
4: for j = 1, . . . , m i do choosing K c = R . Note then that, with A c = P T A P , we have
5: Form N j A c K c = P T A P R P T 0 = 0 since A K 0.
6: Let Pj = N j R
7: for k Pj do As discussed in the supplemental material, this local non-
8: Ai = Ai {k } overlapping use of the near kernel may generally satisfy the weak
9: R = R \ {k } approximation property, but not the strong one needed to en-
10: end for sure optimal V-cycle performance. To improve accuracy of the
11: end for interpolation operator, we therefore smooth it by applying the
12: end for weighted Jacobi error propagation matrix: P = (I D 1 A) P .
13: return An independent set containing all nodes, A1 , . . . , An 2 . The block diagonal matrix, D , whose block diagonal agrees with
that of A, is used here because it does not change the sparsity pat-
tern of P and it responds better to the local nature of A. A typical
Interpolation is constructed in two basic steps. The first involves choice for is 3(D41 A) , with care needed in estimating (D 1 A)
choosing a tentative interpolation operator, P , while the second as discussed in section 6. Smoothed interpolation, while gen-
step consists of smoothing P . The tentative interpolation opera- erally causing overlap in the aggregate basis functions so that
tor is chosen such that the set of near-kernel components, K , is in P T P 6= I , often leads to an optimal V-cycle algorithm. Also, the
the range of P , and P does not connect neighboring aggregates, smoothed near kernel is exactly in the range of smoothed inter-
so P T P = I . The construction of P is illustrated in Figure 3. Con- polation: P K c = (I D 1 A)P K c = (I D 1 A)K , which gen-
ceptually, assume that the nodes are ordered so that they are con- erally preserves and even improves the near-kernel components
tiguous within each aggregate and in correspondence to the ag- in K . While the finest-level matrix has nodes with b 1 degrees of
gregate ordering. (This ordering is not necessary in practice, but freedom each, all coarser levels have degrees of freedom associ-
used here simply to facilitate the discussion.) The near kernel can ated with each finer-level aggregate. The complexity of the coarse
then be decomposed into n 2 blocks denoted by K 1 , K 2 , . . . , K n 2 level is thus dictated by the number of near-kernel components
T
and written in block form as K = K 1T K 2T K nT2 . This and the aggressiveness of coarsening (that is, the size of the ag-
representation means that the number of rows of K i equals the gregates). Both choices must be controlled so that the coarse-
number of nodes in Ai times the nodal block size for the current level matrix has substantially fewer nonzero entries than the next
level, and the number of columns equals the number, , of near- finer-level matrix has.
The above steps outline how a given matrix and near kernel pair in any depth, so it may benefit from additional work. However,
A, K are decomposed to form a coarse level, the operators be- just as relaxation sweeps may improve nearness to the kernel for
tween the two levels, and the appropriate coarse matrix and near modes that are altered to satisfy the boundary conditions and lie
kernel. The combined process is summarized in Algorithm 3. The in the finite element space, so too may relaxation benefit the lin-
coarsening routine is applied recursively until there is a coarse ear elasticity modes used for the nonlinear case.
matrix that can be easily inverted through iteration or a direct
solver. Because aggregation coarsens aggressively, the number of A potentially more serious concern is our assumption that the
levels is usually small, between three to six levels for all our tests. thin-shell cloth simulation is essentially a discretization of linear
elasticity. This assumption may not be valid and could suggest
Algorithm 3 Form a coarse level in the SA hierarchy that our slightly suboptimal convergence of SA on these prob-
lems is due to incorrect near-kernel components. If the near ker-
1: input The matrix and kernel for this level, A and K .
nel is unknown, then there are adaptive SA methods that attempt
2: Precompute inverse D 1 of block diagonal of A
to discover the near kernel as part of the setup process, [Brezina
3: Find spectral radius of D 1 A
et al. 2004; Brandt et al. 2011]. These methods have computation-
4: Smooth kernel K to match boundary conditions
ally expensive setup costs, which for this problem would be on
5: Form a matrix S to determine strength
the order of three times what it is for SA, not counting the need for
6: Use S to form aggregates from nodes in A
an additional 10 cycles. To amortize this cost, convergence from
7: For each aggregate form local QR of K
an adaptively found kernel would need to be near optimal. We
8: Use the local Q blocks to form tentative interpolation, P
did experiment with an adaptive approach for this discretization,
9: Smooth the tentative interpolation to get P
but, while increased convergence rates were indeed attainable,
10: Use P to form A c = P T AP
the extra setup cost made time to solution slower overall.
11: Use local R blocks to form coarse kernel K c
12: return Interpolation and restriction operators, P , R as well as
the coarse matrix and kernel, A c , K c . 7 Smoothing

Our algorithm uses multigrid as a preconditioner for conjugate


6 Null space gradients (CG) rather than as a stand-alone solver. As a conse-
quence, the multigrid preconditioner that we use must be sym-
In the previous section, we tacitly assumed that the near kernel metric and positive definite. This requirement has multiple im-
for the fine-grid problem is known. While this is not always the plications. To ensure symmetry, the V-cycle that we use must be
case, near-kernel components can be obtained for many prob- energy-symmetric and, to ensure positive definiteness, it is criti-
lems by examining the underlying PDE. For example, for elastic- cal that the smoother be convergent in energy. (See the supple-
ity, if we ignore boundary conditions, then it is well-known that mental material for theory that proves this claim.) While these
a rigid-body mode (constant displacement or rotation) has zero requirements are not surprising, it is important to take the nec-
strain and, therefore, zero stress. So rigid-body modes that are essary steps to ensure that they are satisfied.
not prevented by boundary conditions from being in the domain
of the PDE operator are actually in its kernel. Fixed boundary The basic relaxation scheme that we use is a Chebyshev
conditions prevent these modes from being admissible, so they smoother, which amounts to running a fixed number of Cheby-
cannot, of course, be kernel components in any global sense. But shev iterations at each level based on D 1 A [Golub and Varga
any rigid-body mode that is altered to satisfy typical conditions at 1961]. A nice introduction to this smoother can be found in
the boundary becomes a near-kernel component, and they can [Adams et al. 2003]. From a theoretical point of view, it has good
usually be used locally to represent all near-kernel components. properties and, in practice, it also performs the best as shown in
[Adams et al. 2003; Baker et al. 2011]. Most importantly, it has the
To be more specific, displacements for linear elasticity are as- property of being implementable with mat-vec operations and is
sumed to be small, thereby simplifying computation of the strain thus relatively easy to parallelize compared to other smoothers
tensor. In particular, the strain is given by " = 12 (u + u T ), like Gauss-Seidel.
where u is the displacement field. In this case, it is easy to
verify that the following vector functions (which represent rota- Chebyshev requires an upper bound for the spectral radius of
tions around each of the three axes) all lead to zero strain : u = D 1 A and an estimate of the smallest part of the spectrum that
(0, z , y ), u = (z , 0, x ), and u = (y , x , 0). Here, (x , y , z ) repre- we wish to attenuate, the same as needed for smoothing the in-
sents material (undeformed) coordinates. Since these rigid-body terpolation operator by weighted Jacobi. One approach to ob-
modes are linear functions, they should be well approximated in taining these estimates is to use the power method for computing
the interior of the domain by most finite element discretization the largest eigenvalue of a matrix. Unfortunately, it does not gen-
methods. Indeed, if the finite element space includes linear func- erally provide a rigorous bound on the largest eigenvalue, and its
tions, then these modes are exactly represented in the discretiza- convergence rate is limited by the ratio of the two largest eigen-
tion except in the elements abutting fixed boundaries. All that values [Golub and Loan 1983]. In practice, these two eigenval-
is needed in this case is to interpolate these rigid-body modes at ues are often very close, so that convergence is very slow, which
the nodes. In doing so, we make them admissible and retain their leads to a trade-off: too loose of an approximation to the spec-
near-kernel nature by forcing their values at the boundary nodes tral radius yields slow Chebyshev smoothing rates, while tighter
to satisfy any fixed conditions that might be there. For further as- approximations can be costly.
surance that they remain near-kernel components, a relaxation
sweep may be applied, using them as initial guesses to the solu- Another possibility is to use Gerschgorins theorem to estimate
tion of the homogeneous equation. the largest eigenvalue, but this approximation is too loose for our
requirements. A potentially better alternative is to use Lanczoss
As noted above, this discussion assumes linear elasticity. Our lim- method [Golub and Loan 1983]. However, while its convergence
ited experience suggests that the rigid-body modes that work for rate is better, it is also more expensive per iteration and may re-
those equations may be good candidates for the linearized equa- quire careful numerical treatment to ensure convergence (e. g.,
tions of nonlinear elasticity. We have not examined this issue periodic re-orthogonalization of the Krylov basis).
In the end we need an approximation to the spectral radius of By introducing c b Az and y x z , we can then rewrite the
D 1 A, rather than A. While D 1 A is not symmetric in the tradi- constrained minimization problem in Equation (3) as
tional sense, it is symmetric in the energy inner product defined
by u , v A = u , Av . In fact, for any symmetric positive definite min
y
kS c S ASy k
preconditioner M , we have (4)
s.t. (I S )y = 0.
u , M 1 Av A = u , AM 1 Av = M 1 Au , Av = M 1 Au , v A .
By construction, S is symmetric and, therefore, diagonalizable.
M 1 A is also positive definite in energy because Since it is also orthogonal, it follows that the eigenvalues must be
either 0 or 1. We can thus compute another orthogonal matrix,
u , M 1 Au A = u , AM 1 Au > 0
Q, such that
for any u 6= 0. Its eigenvalues are therefore positive, so its spec-  
tral radius can be computed by finding the largest such that I 0
S =Q r Q T Q J Q T , r = dim(range(S )).
M 1 Ax = x , x 6= 0, which is clearly equivalent to the generalized 0 0
eigenvalue problem Ax = M x , x 6= 0. We apply the generalized
Lanczos algorithm [van der Vorst 1982] to this generalized eigen- If we now partition Q into V Rn r and W Rn nr such that
value problem to compute the spectral radius of D 1 A. Q = (V , W ), then V is a basis for the constraint null space while
W is a basis for the constrained subspace. From this decomposi-
tion, it follows that S = V V T and I S = W W T .
8 Collision handling
Similarly, let
A challenge for many of the existing multigrid methods devel-    
oped for cloth simulation is proper handling of constraints. As d1 u1
d =QT c , u =QT y .
shown in [Boxerman and Ascher 2004], superior performance is d2 u2
achieved when the preconditioner for CG is based not on the
full system, but rather on the constraint null space, by which we Using the last definition, we have y = V u 1 + W u 2 and, therefore,
mean the null space of the constraint operator. The method they
proposed constructs a pre-filtered matrix restricted to the con- V V T y = V u1 and W W T y = W u 2 . (5)
straint null space, but it is neither symmetric nor easily treated
by our multigrid approach. A reduced set of equations can be Combining the above definitions and substituting Q J Q T for S ,
constructed based on a null-space basis for the constraints, but we can rewrite the objective function in Equation (4) as follows :
this leads to a system where the block size is no longer constant.
While this may seem like a minor inconvenience, it leads to either = kS c S ASy k
a substantial increase in code complexity and a reduced ability
generate highly optimized code or reduced performance due to = kQ J Q T c Q J Q T AQ J Q T y k
less coherent memory access patterns. In fact, with a constant = k J Q T c J Q T AQ J Q T y k
block size, we can use BSR (block sparse row) matrices, where the
innermost operations are effectively BLAS3 operations with high = k J d J Q T AQ J u k
     
arithmetic intensity. By comparison, to use standard libraries for d1 V T AV 0 u1
matrices with varying block size, we would have to use CSR ma- = .
0 0 0 u2
trices with comparatively low arithmetic intensity.
This system can clearly be reduced by eliminating u 2 since any
To obtain a system with a constant block size, we form a reduced
value of u 2 produces the same value of . However, eliminating
system, but replace all eliminated variables with dummy ones.
u 2 creates a smaller system, which means that if A is block sparse
This retains the block structure while ensuring that our precon-
with fixed block size, then the reduced system will in general be
ditioner operates on the constraint null space. We refer to this
block sparse with different block sizes. To keep the original size of
method as Pre-filtered Preconditioned CG (PPCG).
the system we could leave all the zero blocks in place, but the re-
To explain PPCG, the modified linear system solved by [Baraff sulting system would then be singular, thereby introducing other
and Witkin 1998] can be written in the notation from [Ascher and problems. On the other hand, we know from Equation (4) that
Boxerman 2003] as (I S )y = 0 and, since (I S )y = W W T y = W u 2 , it follows that
u 2 = 0. Minimizing subject to the desired constraint is there-
min kS (b Ax )k
x (3) fore equivalent to solving the following linear system :
s.t. (I S )x = (I S )z ,     
V T AV 0 u1 d1
where: A Rnn is the matrix for the full system; b is the cor- = . (6)
0 I u2 0
responding right-hand side; S , which represents the filtering op-
eration in [Baraff and Witkin 1998], is the orthogonal projection
Rotating this back to our original coordinates yields
matrix onto the constraint null space (not to be confused with the
SOC matrix in section 5); and z is a vector of the desired values 
V T AV 0
  
u1
 
d1
for the constrained variables. It is assumed that A is symmetric Q QTQ =Q
0 I u2 0
positive definite.
Due to the constraint, any feasible point must satisfy or, equivalently,

x = S x + (I S )x = S x + (I S )z .

V V T AV V T + W W T y = V d 1 = V V T c .
To incorporate this expression into the objective function, first
note that Since S = V V T and I S = W W T , we finally arrive at

S (b Ax ) = Sb S A(S x + (I S )z ) = S (b Az ) S AS (x z ). (S AS + I S ) y = S c . (7)
The importance of Equation (7) is that we now have a symmetric 10 Examples
positive definite (in particular, full rank) system of the same di-
mensions as the original system, but which correctly projects out To evaluate and compare our approach to existing methods, we
all of the constraints. From this system, the solution to Equation consider five procedurally generated examples at different res-
(3) is easily recovered as x = y + z . olutions, and a high-resolution production example to show its
practical applicability. The five simple examples are shown in fig-
Furthermore, the condition number of the new system is no
ure 4, while the production example is shown in figure 1. The
worse than that of A, and may in fact be better. This conclusion is
animation for all of these can be seen in the supplemental video.
based on the assumption that 1 is in the field of values of A, which
is defined to be the real numbers inclusively between the small- The first example is a fully pinned piece of cloth subjected to
est and largest eigenvalues of A. (This assumption often holds gravity. The corresponding PDE is fully elliptic with a full Dirich-
for PDEs; otherwise, we can simply multiply I S by a scalar that let boundary, meaning in part that it is positive definite with an
is in As field of values.) To prove this assertion, first note that order 1 constant. The cloth that is pinned on two sides in the sec-
our assumption implies that the field of values of V T AV is the ond example has a smaller constant and thus provides a test to
same as that of the matrix in Equation (6), which is in turn the see how sensitive our methods are to variations in the strength
same as that of the matrix in Equation (7) because they are re- of ellipticity. The third example has a re-entrant corner, which
lated by a similarity transformation. By Cauchys interlacing the- is more difficult yet because it does not possess full ellipticity,
orem [Horn and Johnson 1985], the field of values of V T AV is a which means that the standard approximation properties dis-
subset of that of A, which immediately implies that the condition cussed above do not apply. Standard discretization and solution
number of V T AV is bounded by the condition number of A. methods have serious difficulties in the presence of re-entrant
corners, so they provide an especially challenging test problem
For the constraints considered by Baraff and Witkin [1998], S is
for our methodology. In the fourth example, the cloth falls flat
block diagonal, so the computation of S AS amounts to simple
on a flat box with a hole in the middle. This generates many con-
blockwise row and column scaling of A, while the addition of I S
tact constraints and thus illustrates the performance of our PPCG
only affects the block diagonal. It should be noted that, while the
method well. Finally, in the last example, we drop the cloth on
derivation required S to be diagonalized, the final result does not.
the same box, but this time from a vertical configuration. In this
We refer to the system in Equation (7) as the prefiltered system,
way, we observe plenty of buckling and also lots of cloth-cloth
to which we apply a standard preconditioned CG algorithm.
collisions. The examples are referred to as pinned, drooping, re-
The constraints considered by Baraff and Witkin [1998] are lim- entrant, dropHorizontal, and dropVertical, respectively. Each of
ited to (cloth) vertices against (collision) objects. However, the the five simple examples were run with both regular and irregu-
above method easily generalizes to any kind of equality con- lar tessellations.
straint, including the face-vertex and edge-edge constraints used
in other collision response algorithms (e.g., [Harmon et al. 2008],
11 Evaluation
[Otaduy et al. 2009]).

Finally, it should be noted that none of the derivations in this sec- All simulations were run on dual socket systems with Intel(R)
tion depend on multigrid methods: the results can be used with Xeon(R) E5-2698 v3 @ 2.30GHz processors, each processor with
any type of linear solver. However, by using Equation (7) with 16 cores and each system configured with 64 GB DDR4 RAM. All
smoothed aggregation, we have an effective way of coarsening simulations were run with a time step of t = 2 ms. The stopping
not just the dynamics, but also the constraints. criterion used in the CG method is a small relative residual er-
ror : kr i kM 1 /kr 0 kM 1 < , where M is the chosen preconditioner,
1

9 Implementation k kM 1 kM 2 k, r i is the residual after i iterations, and is a


given tolerance, which we set to 105 .
One of the most expensive operations in any Galerkin multigrid algorithm is forming the coarse-grid operator, since it is based on a triple-matrix product. Despite the symmetric nature of this product, it is currently most efficiently implemented using two sparse mat-mat multiply operations. We use MKL for this purpose, while others have explored doing it on a GPU [Dalton et al. 2015]. Regardless of the implementation of the matrix products, care has to be taken when constructing the interpolation matrix in the presence of special nodes. The range of interpolation is the entire fine level and must contain zeros for all special nodes. However, in a naive implementation, if these zero entries in the interpolation matrix are not pruned, then the mat-mat products will introduce structural fill-in that can significantly affect the run time, especially after smoothing the interpolation operator.
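To make the pruning point concrete, here is a minimal SciPy sketch of the same construction (the function name and the explicit pruning call are illustrative only; our implementation uses MKL's sparse routines rather than SciPy):

```python
import scipy.sparse as sp

def galerkin_coarse_operator(A: sp.csr_matrix, P: sp.csr_matrix) -> sp.csr_matrix:
    """Form the coarse-grid operator A_c = P^T A P via two sparse products.

    Rows of P belonging to special nodes are identically zero. If those
    explicitly stored zeros are not pruned first, both products inherit
    their sparsity pattern and A_c acquires purely structural fill-in,
    which gets worse after the interpolation operator has been smoothed.
    """
    P = P.copy()
    P.eliminate_zeros()         # prune stored zeros before multiplying
    AP = A @ P                  # first sparse mat-mat product
    return (P.T @ AP).tocsr()   # second product; symmetry of A is not exploited
```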
Figure 4: The five examples shown here represent problems of increasing difficulty that we use for benchmarking. They are generated procedurally, with the vertex count ranging from 1,000 to 100,000. The ones shown here have 40,000 vertices.

[…]re-entrant, dropHorizontal, and dropVertical, respectively. Each of the five simple examples was run with both regular and irregular tessellations.

11 Evaluation

All simulations were run on dual-socket systems with Intel(R) Xeon(R) E5-2698 v3 @ 2.30GHz processors, each processor with 16 cores and each system configured with 64 GB DDR4 RAM. All simulations were run with a time step of $\Delta t = 2$ ms. The stopping criterion used in the CG method is a small relative residual error: $\|r_i\|_{M^{-1}} / \|r_0\|_{M^{-1}} < \epsilon$, where $M$ is the chosen preconditioner, $\|v\|_{M^{-1}} \equiv \|M^{-1/2} v\|$, $r_i$ is the residual after $i$ iterations, and $\epsilon$ is a given tolerance, which we set to $10^{-5}$.
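For concreteness, the stopping test amounts to the following check inside a standard PCG loop (a minimal NumPy sketch in our notation; `apply_M_inv` stands in for one application of whichever preconditioner $M$ is being tested):

```python
import numpy as np

def pcg(A, b, apply_M_inv, eps=1e-5, max_iter=1000):
    """Preconditioned CG with the stopping criterion
    ||r_i||_{M^-1} / ||r_0||_{M^-1} < eps, using r^T M^-1 r = ||r||_{M^-1}^2."""
    x = np.zeros_like(b)
    r = b.copy()                  # r_0 = b - A x_0 with x_0 = 0
    z = apply_M_inv(r)
    p = z.copy()
    rz = r @ z                    # ||r_0||_{M^-1}^2
    stop = (eps * eps) * rz       # compare squared norms
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        z = apply_M_inv(r)
        rz_new = r @ z
        if rz_new < stop:
            break
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

Note that $r^\top M^{-1} r$ is the scalar PCG already forms for its search-direction update, so this test adds no extra work per iteration.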
An important point to keep in mind is that the relative residual error used to judge convergence gives an unfair advantage to poor preconditioners like the block diagonal ones. This observation comes from first realizing that the relative residual in practice is only as good as how tightly it bounds the relative energy norm of the error. Remembering that $r = Ae$, we have the bound
$$
\frac{\|e_i\|_A}{\|e_0\|_A}
= \frac{\|A^{-1/2} r_i\|}{\|A^{-1/2} r_0\|}
= \frac{\|A^{-1/2} M^{1/2}\, M^{-1/2} r_i\|}{\|A^{-1/2} M^{1/2}\, M^{-1/2} r_0\|}
\le \sqrt{C}\, \frac{\|r_i\|_{M^{-1}}}{\|r_0\|_{M^{-1}}},
$$
where $C = \lambda_{\max}/\lambda_{\min}$, with $\lambda_{\max}$ and $\lambda_{\min}$ the respective largest and smallest eigenvalues of $M^{-1/2} A M^{-1/2}$ or, equivalently, of $M^{-1} A$. This bound means that the practical error measure that we use is sharp only up to the square root of the spectral condition number of $M^{-1} A$. If $M$ is a reasonable approximation to $A$, as it hopefully is for our multigrid scheme, then this bound is fairly tight. If $M$ is not a very accurate approximation to $A$, as when $M$ is its diagonal part, then we may believe that we have converged well before the true relative energy norm of the error is sufficiently small. Said differently, diagonally preconditioned iterations have the smoothing property that the residuals they produce are very small compared to the actual error. More precisely, as we show in the supplemental material, relaxation produces error whose relative energy norm is much larger than its relative residual norm. In other words, the smooth error left after relaxation is hidden in the residual because it consists mostly of low eigenmodes of $A$. SA tends much more to balance the error among the eigenmodes, so a small relative residual for SA is much more reflective of a small relative energy error. This should be kept in mind in the consideration of the results we present below.
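This behavior is easy to reproduce in isolation. In the following toy NumPy experiment (a 1D Poisson matrix stands in for the cloth system; the size, sweep count, and damping factor are arbitrary choices of ours), a few sweeps of damped Jacobi leave an error whose relative $M^{-1}$-residual norm is far smaller than its relative energy norm:

```python
import numpy as np

n = 200
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson stencil
Minv = np.diag(1.0 / np.diag(A))                        # diagonal preconditioner
e = np.random.default_rng(0).standard_normal(n)         # initial error
e0 = e.copy()
for _ in range(20):                                     # damped Jacobi on A e = 0
    e -= 0.6 * Minv @ (A @ e)
r, r0 = A @ e, A @ e0
energy = np.sqrt(e @ A @ e) / np.sqrt(e0 @ A @ e0)      # relative energy norm
resid = np.sqrt(r @ Minv @ r) / np.sqrt(r0 @ Minv @ r0) # relative M^-1 residual
print(energy, resid)   # the residual-based measure is far more optimistic
```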
The size of the cloth in all our examples is 1 m × 1 m and the material parameters are the defaults used by our production solver, which approximates cotton. We use $\theta = 0.48$, but have experimented with other values and found the results to be fairly insensitive to the exact choice. If a suboptimal value is chosen, then the convergence rate still tends to be good, but the time to solution suffers from excessive amounts of fill-in on the coarser levels. For smoothing, we use a single sweep of Chebyshev with a second-order polynomial. Higher-order polynomials improve the convergence rate significantly, but at too high a computational price.
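For reference, one Chebyshev sweep of this kind can be written with the classic three-term recurrence (a NumPy sketch; the target-interval fraction `lam_ratio` is our assumption, since the lower eigenvalue bound used in practice is not stated here):

```python
import numpy as np

def chebyshev_smoother(A, b, x, lam_max, degree=2, lam_ratio=30.0):
    """Chebyshev polynomial smoother for A x = b.

    Targets the upper part [lam_max/lam_ratio, lam_max] of the spectrum,
    the standard choice when Chebyshev is used as a multigrid smoother;
    lam_max comes from the spectral radius estimate.
    """
    lam_min = lam_max / lam_ratio
    theta = 0.5 * (lam_max + lam_min)   # center of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = b - A @ x
    d = r / theta
    x = x + d
    for _ in range(degree - 1):         # degree=2 gives a second-order polynomial
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        x = x + d
        rho = rho_new
    return x
```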
For the spectral radius estimation, we use 10 iterations of the generalized Lanczos method. Once again, the convergence rate improves if this number is increased, but not in a computationally cost-effective way. For the kernel, we picked six vectors, one constant for each of the unknowns and the three rotations described in section 5.
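The construction from section 5 is not repeated here, but for illustration the six vectors can be assembled as the rigid-body modes of the mesh; the following NumPy sketch assumes interleaved xyz unknowns, and centering the positions is a conditioning convention of our own:

```python
import numpy as np

def rigid_body_kernel(X):
    """Six candidate near-kernel vectors: three translations and three
    linearized rotations, stacked as columns of a (3n x 6) matrix B.

    X : (n, 3) array of vertex positions; unknowns are assumed to be
    interleaved as (x0, y0, z0, x1, y1, z1, ...).
    """
    n = X.shape[0]
    c = X - X.mean(axis=0)          # centered positions, for conditioning
    B = np.zeros((3 * n, 6))
    for axis in range(3):           # one constant (translation) mode per unknown
        B[axis::3, axis] = 1.0
    for j, w in enumerate(np.eye(3)):
        # linearized rotation about axis w: displacement u_i = w x c_i
        B[:, 3 + j] = np.cross(w, c).reshape(-1)
    return B
```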
The benchmark numbers are averages over all the solves required for 10 frames of animation for the pinned, drooping, and re-entrant examples, while we used 15 frames for dropHorizontal and 25 frames for dropVertical. These durations are chosen to capture most of the interesting behavior for each of the examples. In the following, we refer to block diagonally preconditioned conjugate gradients simply as Diag-PCG, while we refer to our smoothed aggregation preconditioned conjugate gradient method with pre-filtering as SA+PPCG.

11.1 Convergence rates

The expectation for a multigrid preconditioner is that it should improve the convergence rate of the conjugate gradient method significantly and that the convergence rate should be independent of (or only weakly dependent on) the problem size. To investigate this, define the convergence rate at iteration $i$ by $\rho_i = \|r_i\|_{M^{-1}} / \|r_{i-1}\|_{M^{-1}}$. The average convergence rate, $\bar\rho$, within a single solve is then the geometric mean of these values. For an entire animation, we refer to the average convergence rate as the average over all solves of $\bar\rho$.
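In code, $\bar\rho$ for one solve is a one-liner (a small helper, assuming the per-iteration norms $\|r_i\|_{M^{-1}}$ have been recorded; the geometric mean telescopes to a ratio of the last and first norms):

```python
import numpy as np

def average_convergence_rate(residual_norms):
    """Geometric mean of rho_i = ||r_i||_{M^-1} / ||r_{i-1}||_{M^-1}.

    residual_norms holds ||r_i||_{M^-1} for i = 0, ..., k of one solve;
    the product of the rho_i telescopes, so the geometric mean is
    (||r_k|| / ||r_0||)^(1/k).
    """
    rn = np.asarray(residual_norms, dtype=float)
    k = len(rn) - 1
    return (rn[-1] / rn[0]) ** (1.0 / k)
```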
Figure 5 shows average convergence rates of SA+PPCG and Diag-PCG for each of our five examples. While Diag-PCG approaches a convergence rate close to 1 very quickly, our method stays bounded away from 1 at significantly lower values. Note that the pinned example converges faster than the drooping example, which in turn converges faster than the re-entrant example. This observation is in agreement with our expectations based on the underlying PDEs. Note the irregularity of the curve for the re-entrant corner example, due perhaps to the loss of ellipticity.

11.2 Setup time

To achieve better convergence rates, we need both setup and prefiltering. The computational cost of both of these is illustrated in figure 6. The setup cost for AMG-type methods is often dominated by the cost of the triple-matrix product when forming the coarse matrices. In our case, this is still a significant cost, but not the dominant one. The computation of the spectral radius estimates is currently expensive and the computation of aggregates remains entirely serial. As such, there is room for improvement of these numbers.

To reduce the setup time, it is possible to only update the preconditioner periodically or adaptively based on the observed convergence rate. We experimented with only performing an update if the convergence rate after four iterations was substantially less than in the previous solve. However, we found the cost of restarting the solver to outweigh the benefit. A more sophisticated controller design might make adaptive setup more beneficial, but this remains future work.

11.3 Time to solution

Ultimately, the most important metric is usually time to solution. However, making fair comparisons is not entirely straightforward because different solvers have different strengths. As an example, Diag-PCG is generally attractive for systems that are highly diagonally dominant or when the required accuracy is very modest. For cloth simulations, diagonally dominant systems generally occur with small timesteps, soft materials, and/or large masses. At the other end of the spectrum, we compare our results to those of Intel MKL's highly optimized version of PARDISO. As a sparse direct solver, it is attractive if very high accuracy is required or if the problem is very ill-conditioned.

For our comparisons, we use a moderate accuracy (relative error less than $10^{-5}$). We note that our results do look better with stiffer materials and/or larger time steps, but do not report on that here. The results for four of the examples are shown in figure 7. As can be seen, Diag-PCG is consistently best for small problems, but our method is superior for problems with roughly 25k vertices or more. Somewhat surprisingly, PARDISO shows very good scaling behavior considering that it is a direct solver, but it remains about twice as slow as SA+PPCG. The empirical complexity of our method can be seen to be close to linear in the size of the problem, as expected.

The outlier in the above set of results is the dropVertical example. As already seen in figure 5, the convergence rate for this problem is worse than for the others, as is the time to solution. The reason for the different behavior in this example is the large number of cloth repulsion springs added to handle cloth-cloth collisions. While our method handles this without any special cases, they do not stem from an underlying and well-behaved PDE, so we lose some optimality. However, we still see the same basic scaling behavior as the problem size increases, and the method remains superior to Diag-PCG for large problems.
Figure 5: The average convergence rate over the length of the entire animation for each of our examples. The orange curves are for Diag-PCG while the blue curves are for SA+PPCG.

Figure 7: Average time for one linear solve using Diag-PCG (orange), SA+PPCG (blue), and PARDISO (pink). The graphs are shown for four examples: pinned (square markers), drooping (circle markers), re-entrant (diamond markers), and dropVertical (triangle markers).







Figure 6: The average time for prefiltering and setup as a percentage of the total solve time. The combined preprocessing time is the sum of the two. Setup is shown in blue while prefiltering is shown in orange. The corresponding total solve time is shown in figure 7.

Figure 8: Average time for one linear solve in our horizontal drop example. Note that the fastest time is always obtained using prefiltering.
In figure 8, we consider the solve time for our horizontal drop example, including now two additional solvers. The first is smoothed aggregation without our prefiltering method. The second is Diag-PCG with prefiltering added. For low resolutions, we see that prefiltering further improves the superior performance of Diag-PCG and, for all resolutions, we see that prefiltering is essential for the good performance of SA. In fact, without prefiltering, SA at large resolutions is no better than Diag-PCG and may in fact be worse. However, with prefiltering, SA+PPCG continues to be the best solution for large problems. The reason prefiltering turns out to be so critical for SA is that SA is a very good preconditioner, so a single step without any knowledge about the constraints can bring the solver far from the constraint manifold, and the subsequent projection step is likely to undo much of the progress just made by the solver.

Finally, we present a production example in figure 9. Overall, we observe an average 8× speedup for a walk-cycle animation and a 6× speedup for a run-cycle. However, as seen from the figure, the variation in the solve time from frame to frame was substantially less with our method than with Diag-PCG.

Figure 9: Time for each linear solve in our production example using a walk-cycle animation.

The speedups for all our examples are summarized in table 1.


Num. vertices   Pinned         Drooping       Re-entrant     DropHorizontal   DropVertical

Regular tessellation:
     961        0.39 / 0.76    0.58 / 0.72    0.55 / 0.72    0.57 / 0.61      0.53 / 0.61
    1681        0.45 / 1.06    0.63 / 0.99    0.60 / 0.95    0.57 / 0.75      0.51 / 0.80
    3721        0.40 / 1.31    0.50 / 1.18    0.54 / 1.17    0.62 / 0.96      0.35 / 0.94
    6561        0.43 / 1.51    0.53 / 1.35    0.50 / 1.31    0.65 / 1.14      0.35 / 1.04
   10201        0.54 / 1.78    0.66 / 1.57    0.72 / 1.53    0.77 / 1.37      0.40 / 1.06
   22801        0.85 / 2.23    1.03 / 1.89    1.02 / 1.86    1.07 / 1.69      0.50 / 1.14
   40401        1.07 / 2.29    1.35 / 2.03    1.36 / 2.01    1.40 / 1.89      0.60 / 1.18
   90601        2.23 / 2.27    2.87 / 2.07    2.67 / 1.95    2.27 / 1.92      2.35 / 1.11
  160801        3.89 / 2.14    5.01 / 1.95    4.87 / 1.92    3.64 / 1.84      1.14 / 0.77
  361201        5.48 / 2.21    7.10 / 2.01    6.38 / 1.76    4.70 / 1.85      2.29 / 1.32
  641601        7.12 / 2.26    9.28 / 2.10    8.84 / 1.96    6.09 / 1.89      3.46 / 1.53
 1002001        9.20 / 2.40   11.64 / 2.20   10.89 / 2.04    7.53 / 2.13      3.01 / 1.80

Irregular tessellation:
     961        0.43 / 0.87    0.49 / 0.80    0.51 / 0.80    0.52 / 0.64      0.56 / 0.71
    1681        0.46 / 1.17    0.52 / 1.12    0.62 / 1.04    0.60 / 0.94      0.54 / 1.01
    3721        0.43 / 1.61    0.46 / 1.45    0.55 / 1.39    0.56 / 1.21      0.38 / 1.06
    6561        0.46 / 1.86    0.50 / 1.65    0.56 / 1.72    0.69 / 1.44      0.44 / 1.32
   10201        0.56 / 2.06    0.61 / 1.87    0.67 / 1.93    0.89 / 1.74      0.53 / 1.42
   22801        0.81 / 2.49    0.87 / 2.11    0.97 / 2.20    1.13 / 2.13      0.57 / 1.43
   40401        1.22 / 2.61    1.19 / 2.21    1.43 / 2.32    1.33 / 2.23      0.76 / 1.56
   90601        2.32 / 2.54    2.51 / 2.21    2.36 / 2.33    2.20 / 2.19      1.29 / 1.39
  160801        3.78 / 2.45    4.20 / 2.18    4.22 / 2.16    3.26 / 2.10      2.03 / 1.39
  361201        5.32 / 2.49    5.98 / 2.22    6.13 / 2.21    4.63 / 2.13      2.63 / 1.57
  641601        7.02 / 2.56    7.84 / 2.24    7.72 / 2.19    5.93 / 2.21      2.39 / 1.68
 1002001        8.97 / 2.73    9.82 / 2.56    9.59 / 2.47    5.74 / 2.08      2.81 / 1.71

Table 1: The speedup factors of SA+PPCG relative to Diag-PCG (first number in each cell) and PARDISO (second number in each cell); entries greater than 1 (shaded in the original) favor SA+PPCG. Diag-PCG is on average 11% faster with the irregular tessellation, while PARDISO on average is 7% slower and SA+PPCG is 8% faster.

12 Limitations and future work

One limitation of SA compared to Diag-PCG is its somewhat larger memory overhead. However, the overhead is generally less than 50 percent. SA+PPCG also comes with a higher coding complexity than a simple Diag-PCG, and an unoptimized implementation may often be slower.

The current algorithm is limited by memory bandwidth and, in practice, we saw negligible speedups from parallelization after eight cores. But it remains superior to Diag-PCG and, for our problems, was comparable to the parallel efficiency of MKL's PARDISO. Even so, we believe there is room for improvement, either by exploring techniques such as those presented in [Bell et al. 2012] for forming aggregates in parallel or by algorithmic changes with better parallel implications. Incremental setup along the lines of what [Hecht et al. 2012] did for Cholesky factorizations might be another way to reduce the performance hit of time spent in setup.

While the results show near optimal behavior, they are not perfect. The convergence rates degrade (slowly) as the problem sizes grow, leading to more iterations while cycling, so time to solution is not quite scaling linearly. To attain ideal convergence and subsequently linear scaling, further improvements should be made to the construction of the kernel components to ensure that they match the discretization at all times. Also, a more accurate discretization of the underlying PDEs than that provided by Baraff and Witkin [1998] might offer a better foundation for building multigrid methods. We intend to investigate this in the future.

Finally, we note that the present work only considers linear problems, but a natural extension is to use the proposed solver in the inner loop of an outer nonlinear iteration that involves a linearization strategy such as Newton's method.
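Schematically, that extension would look like the following (a hypothetical sketch only; `linear_solve` stands in for SA+PPCG, `grad` and `hess` for the nonlinear force balance and its Jacobian, and no line search or constraint handling is shown):

```python
import numpy as np

def newton_outer_loop(grad, hess, x0, linear_solve, tol=1e-8, max_iter=20):
    """Outer Newton iteration with an inner preconditioned linear solve."""
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        dx = linear_solve(hess(x), -g)   # inner solve, e.g., SA+PPCG
        x = x + dx                       # a line search would go here
    return x
```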
13 Conclusion

This paper presented a new preconditioner for linear systems formed in implicit cloth simulations by developing an algebraic multigrid hierarchy based on the underlying PDEs and discretization. The SA solver provides a faster solution than existing methods for typical problems with 25,000 vertices or more. For problems that are stiffer, have smaller mass, or are subjected to larger time steps, the advantages of the method increase and show speedups for smaller problems as well.

To realize the full potential of SA in cloth simulation and attain near optimal scaling in the presence of collisions, we had to pair it with our new PPCG method. SA+PPCG is attractive because no changes need to be made to existing collision detection and response methods to handle the linear solves.

Acknowledgements

The authors would like to thank Intel's MKL team for their support and development during this project. We are also grateful for the assistance from Jeff MacNeill, Garrett Raine, Benjamin Huang, Heather Pritchett, and Angela McBride (WDAS) in setting up and rendering our simulations. Finally, Eitan Grinspun, Bernhard Thomaszewski, Marian Brezina, and John Ruge provided valuable discussions. All images are © Disney 2015.
References

ADAMS, M., BREZINA, M., HU, J., AND TUMINARO, R. 2003. Parallel Multigrid Smoothing: Polynomial Versus Gauss-Seidel. J. Comput. Phys. 188, 2 (July), 593-610.

ASCHER, U. M., AND BOXERMAN, E. 2003. On the modified conjugate gradient method in cloth simulation. The Visual Computer 19, 7-8, 526-531.

BAKER, A. H., FALGOUT, R. D., KOLEV, T. V., AND YANG, U. M. 2011. Multigrid Smoothers for Ultraparallel Computing. SIAM Journal on Scientific Computing 33, 5, 2864-2887.

BARAFF, D., AND WITKIN, A. 1998. Large Steps in Cloth Simulation. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, SIGGRAPH '98, 43-54.

BELL, N., DALTON, S., AND OLSON, L. 2012. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods. SIAM Journal on Scientific Computing 34, 4, C123-C152.

BOXERMAN, E., AND ASCHER, U. 2004. Decomposing Cloth. In Proceedings of the 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, Eurographics Association, Aire-la-Ville, Switzerland, SCA '04, 153-161.

BRANDT, A., MCCORMICK, S. F., AND RUGE, J. W. 1985. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its Applications, D. J. Evans, Ed. Cambridge University Press, 257-284.

BRANDT, A., BRANNICK, J., KAHL, K., AND LIVSHITS, I. 2011. Bootstrap AMG. SIAM J. Sci. Comput. 33, 2 (Mar.), 612-632.

BRANDT, A. 1977. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 31, 138, 333-390.

BREZINA, M., FALGOUT, R., MACLACHLAN, S., MANTEUFFEL, T., MCCORMICK, S., AND RUGE, J. 2004. Adaptive Smoothed Aggregation (αSA). SIAM Journal on Scientific Computing 25, 6, 1896-1920.

BRIDSON, R., FEDKIW, R., AND ANDERSON, J. 2002. Robust Treatment of Collisions, Contact and Friction for Cloth Animation. ACM Trans. Graph. 21, 3 (July), 594-603.

BRIGGS, W., HENSON, V., AND MCCORMICK, S. 2000. A Multigrid Tutorial, Second Edition. Society for Industrial and Applied Mathematics.

DALTON, S., OLSON, L., AND BELL, N. 2015. Optimizing Sparse Matrix-Matrix Multiplication for the GPU. ACM Transactions on Mathematical Software 41, 4.

FISH, J., PAN, L., BELSKY, V., AND GOMAA, S. 1996. Unstructured multigrid methods for shells. International Journal for Numerical Methods in Engineering 39, 7, 1181-1197.

GEE, M. W., AND TUMINARO, R. S. 2006. Nonlinear Algebraic Multigrid for Constrained Solid Mechanics Problems Using Trilinos. Tech. Rep. SAND2006-2256, Sandia National Laboratories, April.

GEE, M., RAMM, E., AND WALL, W. A. 2005. Parallel multilevel solution of nonlinear shell structures. Computer Methods in Applied Mechanics and Engineering 194, 21-24, 2513-2533.

GOLUB, G. H., AND VAN LOAN, C. F. 1983. Matrix Computations, 3rd ed. The Johns Hopkins University Press.

GOLUB, G. H., AND VARGA, R. S. 1961. Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods. Numerische Mathematik 3, 1, 157-168.

HARMON, D., VOUGA, E., TAMSTORF, R., AND GRINSPUN, E. 2008. Robust Treatment of Simultaneous Collisions. ACM Trans. Graph. 27, 3 (Aug.), 23:1-23:4.

HECHT, F., LEE, Y. J., SHEWCHUK, J. R., AND O'BRIEN, J. F. 2012. Updated Sparse Cholesky Factors for Corotational Elastodynamics. ACM Transactions on Graphics 31, 5 (Oct.), 123:1-123:13. Presented at SIGGRAPH 2012.

HORN, R. A., AND JOHNSON, C. R. 1985. Matrix Analysis. Cambridge University Press. Cambridge Books Online.

JEON, I., CHOI, K.-J., KIM, T.-Y., CHOI, B.-O., AND KO, H.-S. 2013. Constrainable Multigrid for Cloth. Computer Graphics Forum 32, 7, 31-39.

KRISHNAN, D., FATTAL, R., AND SZELISKI, R. 2013. Efficient Preconditioning of Laplacian Matrices for Computer Graphics. ACM Trans. Graph. 32, 4 (July), 142:1-142:15.

LEE, Y., YOON, S.-E., OH, S., KIM, D., AND CHOI, S. 2010. Multi-Resolution Cloth Simulation. Computer Graphics Forum 29, 7, 2225-2232.

MCCORMICK, S. 1984. Multigrid Methods for Variational Problems: Further Results. SIAM Journal on Numerical Analysis 21, 2, 255-263.

MÍKA, S., AND VANĚK, P. 1992. Acceleration of convergence of a two-level algebraic algorithm by aggregation in smoothing process. Applications of Mathematics 37, 5, 343-356.

NARAIN, R., SAMII, A., AND O'BRIEN, J. F. 2012. Adaptive Anisotropic Remeshing for Cloth Simulation. ACM Trans. Graph. 31, 6 (Nov.), 152:1-152:10.

OH, S., NOH, J., AND WOHN, K. 2008. A physically faithful multigrid method for fast cloth simulation. Computer Animation and Virtual Worlds 19, 3-4, 479-492.

OTADUY, M. A., TAMSTORF, R., STEINEMANN, D., AND GROSS, M. 2009. Implicit Contact Handling for Deformable Objects. Computer Graphics Forum 28, 2, 559-568.

TROTTENBERG, U., OOSTERLEE, C. W., AND SCHÜLLER, A. 2000. Multigrid. Academic Press.

VAN DER VORST, H. A. 1982. A Generalized Lanczos Scheme. Mathematics of Computation 39, 160 (Oct.), 559-561.

VASSILEVSKI, P. S. 2008. Multilevel Block Factorization Preconditioners, Matrix-based Analysis and Algorithms for Solving Finite Element Equations. Springer.
