
DISS. ETH NO. 21051

Stabilizing Explicit Dynamic Finite Element Simulations

A dissertation submitted to
ETH ZURICH

for the degree of


Doctor of Sciences

presented by
Basil Fierz
Master of Science ETH in Computer Science
born July 5, 1983
citizen of Winterthur, Switzerland

accepted on the recommendation of


Prof. Dr. Gábor Székely, examiner
Prof. Dr. Miguel Otaduy, co-examiner
Prof. Dr. Matthias Harders, co-examiner
Dr. Jonas Spillmann, co-examiner

2013
To my family
Abstract

Interactive simulation of deformable objects is one of the most important topics in visual
computing today. While explicit time integration offers an inexpensive way to numerically
integrate the evolution of deformable objects, it is of limited use, because the stability of
traditional explicit finite element integration depends on the shape and size of the body's
tetrahedral elements. Small or ill-shaped elements containing very large or very small dihedral
angles require very small time steps for the simulation to remain stable. This thesis explores
a novel way of looking at the integration stability of the linear finite element method under
explicit time integration. The focus lies on maintaining as many of the advantages
of explicit integration as possible while relaxing the constraint that element size and shape
impose on the integration time step. This allows us to simplify the simulation pipeline by
removing the need to run expensive meshing algorithms. In the case of topological changes,
we are able to maintain the initial time step by integrating ill-shaped elements separately.
We discuss the derivation and implementation of the finite element method and the methods
we developed to judge the stability of single elements. We focus on the
corotational linear finite element method, which allows for non-linear deformations of a
linear material, similar to the non-linear Saint Venant-Kirchhoff material model. We give
an overview of integration stability for explicit integration in the context of the simulation
of deformable bodies. Based on an analysis of the stiffness matrix, we develop
numerical schemes able to determine whether an individual element of an entire simulation
mesh violates the criterion for stable explicit simulation. Extensive numerical
tests show the performance and accuracy of the developed methods.
In the second part of the thesis, we use the traditional corotational finite element method
to solve the dynamics equations and compute deformable body motions. By combining it
with various methods to handle ill-shaped elements, we are able to use explicit integration
while keeping the integration stable. We present four methods that decouple the time step
from the element quality; each of them is designed as a two-pass process. In the first
pass, the tetrahedral elements are checked as to whether they can be simulated stably given the
simulation time step. In the second pass, during runtime, the identified ill-shaped elements
are handled separately such that they do not disturb the stability of the simulation.
As a consequence, larger integration time steps than in purely explicit methods are possible.
During topological modifications such as cutting, only newly created or modified
elements need to be reevaluated, thus making the techniques usable in interactive simulations.
We conclude by numerically evaluating and comparing the performance of the
developed methods.
Zusammenfassung

The interactive simulation of deformable bodies is one of the most important aspects of
rendering virtual worlds. While explicit methods offer inexpensive ways to solve the
equations of motion, their use is limited. When the finite element method (FEM) is combined
with explicit solvers, stability depends strongly on the size and shape of the discretization
elements. Small or ill-shaped elements with very large or very small dihedral angles require
the use of small simulation time steps in order to obtain a stable solution.

This dissertation investigates new methods that allow the stability of the linear FEM
under explicit time integration to be quantified anew. These methods aim to preserve as
many advantages of explicit integration as possible, while at the same time reducing the
dependence on the size and shape of the discretization elements. This makes it possible
to use cheaper mesh generation and optimization procedures, since isolated bad elements
can no longer endanger the stability of the simulation. In the case of topological changes
to the simulated objects, these new methods allow us to keep the initially chosen
integration time step and to treat bad elements separately.

In the first part of this work, we explain the implementation of the finite element method,
as well as the procedures that allow us to judge the influence of individual elements on
stability. We concentrate on the corotational linear FEM, which makes it possible to
compute large deformations even though only a linear material law is used. Based on an
analysis of the stiffness matrix, we develop numerical methods that enable us to judge
individual elements of a mesh with respect to their influence on the simulation stability
of the entire mesh. Extensive numerical tests demonstrate the efficiency and accuracy of
the developed techniques.

In the second part, we extend the corotational linear FEM. By combining it with various
procedures for the treatment of ill-shaped simulation elements, we obtain new simulation
algorithms that allow us to preserve the stability of explicit integration. Four methods
are presented which decouple the integration time step from the element quality; each of
them runs in two phases. In the first phase, the elements are examined as to whether they
permit a stable simulation for the given time step. In the second phase, which runs during
the simulation, the uncritical elements and the sorted-out, ill-shaped elements are
integrated separately, so that a stable scheme results. In comparison, these methods allow
larger time steps to be chosen.
After a runtime change to a mesh, only modified or newly inserted elements need to be
examined as to whether they require separate treatment. This makes the presented methods
attractive for use in interactive environments. We conclude the work with an extensive
numerical evaluation, as well as with comparisons of the behavior of the developed methods.
Acknowledgments

This thesis is the result of my scientific work at the Computer Vision Laboratory. In
research, one easily encounters obstacles, and even small things can take a lot more time
than initially anticipated. During my academic journey, I naturally experienced ups and
downs that taught me valuable lessons about scientific work in general as well as about how
to write publishable work. I am grateful that I had the opportunity to gain this experience
and that I had the chance and freedom to pursue my research.
I owe a great deal of the successful completion of my thesis to my supervisors Prof. Dr.
Gábor Székely and Prof. Dr. Matthias Harders. I thank them for giving me this opportunity
and for supporting me in every possible way. Furthermore, I would like to thank
Prof. Dr. Otaduy for being part of my thesis committee and reviewing my work.
My time at the BIWI would have been very boring without my excellent colleagues and
fellow students. I especially want to thank Jonas Spillmann, Iker Aguinaga and Martin
Seiler from BIWI, and Denis Steinemann and Dominik Berner from VirtaMed, for being
such good friends. During this time, we had many fruitful discussions, which gave me new
insights into my research questions.
Last but not least, I would like to thank my family, my friends, and my girlfriend Maggie
for their kind and tireless support, despite me overstretching their kindness once in a
while while completing my thesis.
Basil Fierz
Contents

List of Figures xiii

List of Tables xv

1 Introduction 1
1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Effect of Remeshing on the Simulation Time Step . . . . . . . . . . 8
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Deformable Body Dynamics 15


2.1 Basic Continuum Mechanics . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Strain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.2 Stress and Elasticity . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Finite Element Method . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.1 Natural Coordinates . . . . . . . . . . . . . . . . . . . . . . 23
2.2.2 Gauss Quadrature . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.3 Linear FEM . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.4 Linear Corotational FEM . . . . . . . . . . . . . . . . . . . . 29
2.2.5 Tangent Stiffness Matrix . . . . . . . . . . . . . . . . . . . . 30
2.3 Time Integration of the Dynamics Equations . . . . . . . . . . . . . 32
2.3.1 Explicit Integration . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.2 Implicit integration . . . . . . . . . . . . . . . . . . . . . . . 34
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3 Stability of Explicit Integration 37


3.1 Stability of Integration Methods . . . . . . . . . . . . . . . . . . . . 37
3.1.1 Explicit Integration Schemes . . . . . . . . . . . . . . . . . 39
3.2 Stable Integration Time Step for Deformable Body Dynamics . . . 41
3.3 Element Based Approximation . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Single Element Approximation . . . . . . . . . . . . . . . . 43

3.3.2 Extension to Local Neighborhood to Improve the Accuracy 46


3.3.3 Reduced 1-ring . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.4 Oscillation Frequency Quality Metric . . . . . . . . . . . . . . . . . 48
3.5 Influence of Material and Geometry . . . . . . . . . . . . . . . . . 49
3.6 Identification Performance . . . . . . . . . . . . . . . . . . . . . . . 52
3.6.1 Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6.2 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6.3 Element Quality Distribution . . . . . . . . . . . . . . . . . . 56
3.7 Extension to Nonlinear Models . . . . . . . . . . . . . . . . . . . . 59
3.7.1 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.8 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4 Handling Ill-shaped Elements 65


4.1 Filtering High Modal Frequencies . . . . . . . . . . . . . . . . . . . 65
4.1.1 Relaxation of Element Vibration Modes . . . . . . . . . . . 67
4.1.2 Constraining Element Deformation . . . . . . . . . . . . . . 67
4.1.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.1.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.2 Hybrid FE/Position-Based Deformation Method . . . . . . . . . . . 74
4.2.1 Position-Based Deformation Method . . . . . . . . . . . . . 75
4.2.2 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Mixed Implicit-Explicit Integration . . . . . . . . . . . . . . . . . . . 83
4.3.1 Element identification . . . . . . . . . . . . . . . . . . . . . 84
4.3.2 Matrix-free IMEX solver . . . . . . . . . . . . . . . . . . . . 84
4.3.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.4 Time Adaptive Explicit Integration . . . . . . . . . . . . . . . . . . . 91
4.4.1 Building Update Queues . . . . . . . . . . . . . . . . . . . . 91
4.4.2 Time Integration . . . . . . . . . . . . . . . . . . . . . . . . 92
4.4.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.5 Performance and Comparison . . . . . . . . . . . . . . . . . . . . . 94
4.5.1 Interaction with a Liver Model . . . . . . . . . . . . . . . . . 95
4.5.2 Comparison of Different Deformation Modes . . . . . . . . 97
4.5.3 Cutting Simulation using the FE/PBD Method . . . . . . . . 107
4.5.4 Cutting Simulation using IMEX Method . . . . . . . . . . . . 109
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5 Implementation Aspects 113


5.1 Rotation Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.2 IMEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.2.1 Clustered IMEX . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.3 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.3.1 CUDA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


5.3.2 IMEX using CUDA . . . . . . . . . . . . . . . . . . . . . . . 124
5.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.4.1 Speedup using CUDA for IMEX . . . . . . . . . . . . . . . . 127
5.5 Iterative Computation of the Highest Oscillation Frequency . . . . 128
5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6 Conclusion 135
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.2 Discussion of Contributions . . . . . . . . . . . . . . . . . . . . . . 136
6.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

Bibliography 141
List of Figures

1.1 Example surgical simulator . . . . . . . . . . . . . . . . . . . . . . 1


1.2 Performance comparison between explicit and implicit integration . 4
1.3 Sequence of cuts on various meshes . . . . . . . . . . . . . . . . . 9

2.1 Mapping function from initial to current state . . . . . . . . . . . . . 18


2.2 Natural coordinate in 2D . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Natural coordinates for 3D elements . . . . . . . . . . . . . . . . . 24

3.1 2D example of n-ring approximation schemes . . . . . . . . . . . . 47


3.2 Reduced 1-ring for a 2D example . . . . . . . . . . . . . . . . . . . 48
3.3 Relation between and max for an equilateral tetrahedron. . . . . 50
3.4 Relation between geometry and max for an equilateral tetrahedron. 51
3.5 Various 3D models . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Distribution of oscillation frequencies for the liver meshes . . . . . 57
3.7 Cumulative distribution of the oscillation frequencies . . . . . . . . 58
3.8 The Bar (C) mesh is stretched using non-linear FEM . . . . . . . . 60
3.9 Effect on the critical time step of the Bar (C) mesh . . . . . . . . . 60
3.10 The Bunny (C) mesh is stretched using non-linear FEM . . . . . . 61
3.11 Effect on the critical time step of the Bunny (C) mesh . . . . . . . . 62

4.1 Constraints for different tetrahedra types . . . . . . . . . . . . . . . 68


4.2 Comparison between corotational and filtered FEM under external
constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 Mean error of a sample deformation using the filtered FEM method 71
4.4 Effect of rigidification in the large stretch simulations . . . . . . . . 72
4.5 Effect of rigidification in the large compression simulations . . . . . 73
4.6 Exemplary 2D cluster with central node and ill-shaped elements . 76
4.7 Twisting deformation at different time steps . . . . . . . . . . . . . 79
4.8 Mean error for the twisting deformation . . . . . . . . . . . . . . . . 80
4.9 Error in a bending deformation example . . . . . . . . . . . . . . . 81
4.10 Mean error for the bending deformation . . . . . . . . . . . . . . . 81
4.11 Bending limitations of the hybrid FE/PBD Method . . . . . . . . . . 82
4.12 Stretching limitations of the hybrid FE/PBD Method . . . . . . . . . 82

4.13 IMEX performance for various models at different time steps . . . 87


4.14 Remaining total energy after one simulated second . . . . . . . . . 88
4.15 Implicit integration performance for various models at different time
steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.16 Speedup using clusters . . . . . . . . . . . . . . . . . . . . . . . . 90
4.17 2D example of the time adaptive integration scheme . . . . . . . . 93
4.18 Example load case on the Liver (C) model . . . . . . . . . . . . . . 95
4.19 Computation time per step for different deformation methods on the
Liver (C) model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.20 Computation time per simulated second for different methods on
the Liver (C) model . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.21 Inhomogeneous cube and bar meshes . . . . . . . . . . . . . . . . 98
4.22 Ill-shaped elements for the inhomogeneous meshes . . . . . . . . 98
4.23 Inhomogeneous bar and cube mesh subject to stretching constraints . . 100
4.24 Inhomogeneous bar and cube mesh subject to compression con-
straints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.25 Inhomogeneous cube mesh after high compression . . . . . . . . 102
4.26 Inhomogeneous bar and cube mesh subject to bending constraints 103
4.27 Inhomogeneous bar and cube mesh subject to twisting constraints 105
4.28 Inhomogeneous cube mesh subject to displacement constraints
using the hybrid FE/PBD method for various Poisson ratios . . . . 106
4.29 Cuts applied to the Liver (C) mesh . . . . . . . . . . . . . . . . . . 107
4.30 Dragon mesh is cut and stretched . . . . . . . . . . . . . . . . . . 109

5.1 Gathering and scattering computation schemes . . . . . . . . . . . 121


5.2 Schematic overview of a CUDA device . . . . . . . . . . . . . . . . 121
5.3 Graph coloring for hazard free writes . . . . . . . . . . . . . . . . . 123
5.4 Performance of the parallel implementations of the corotational FEM 126
5.5 Computation times for different iteration grouping sizes . . . . . . . 127
5.6 Correlation between normals and eigenvectors . . . . . . . . . . . 130
List of Tables

1.1 Decreasing critical time steps after cutting . . . . . . . . . . . . . . 10


1.2 Improved critical time steps after remeshing . . . . . . . . . . . . . 10
1.3 Computation costs of cutting and remeshing . . . . . . . . . . . . . 10

2.1 Parameters for Gaussian integration for 3D elements . . . . . . . . 27

3.1 Configurations and bounding boxes of various 3D models . . . . . 54


3.2 Relative error for the single element approximations . . . . . . . . 55
3.3 Relative error for the ring-based approximations . . . . . . . . . . 55
3.4 Relative error for the reduced 1-ring and 2-ring methods . . . . . . 56

4.1 Number of ill-shaped elements of the Liver (C) model. . . . . . . . 96


4.2 Errors for the stretching comparison example . . . . . . . . . . . . 100
4.3 Computation time per step for the inhomogeneous meshes . . . . 101
4.4 Errors on the compression comparison example . . . . . . . . . . 102
4.5 Errors on the bending comparison example . . . . . . . . . . . . . 104
4.6 Errors on the twisting comparison example . . . . . . . . . . . . . 105
4.7 Ill-shaped elements before and after each cut . . . . . . . . . . . . 108
4.8 Performance of the hybrid FE/PBD method to maintain the time step 108

5.1 Performance of various rotation extraction algorithms . . . . . . . . 115


5.2 Performance of the corotational FEM implementation . . . . . . . . 125
5.3 Performance advantage of using power iterations . . . . . . . . . . 129
5.4 Performance advantage of using normals as input for power iterations . . 131
5.5 Applying power iterations to various meshes . . . . . . . . . . . . 132
1 Introduction

Real-time simulation of rich virtual environments is an important topic in visual computing
today. Virtual reality (VR) applications are used to simulate aspects of the real world.
With growing computational resources, geometrically complex environments become possible.
Probably the most prominent applications are flight simulators used to train prospective
pilots. The combination of complex software, duplicates of real airplane cockpits, and
haptic feedback allows various flight situations to be simulated realistically. Most
importantly, such systems allow exceptional and dangerous situations to be trained. These
properties also make simulators very interesting in other fields where dangerous situations
emerge. The application we are mainly interested in is the training of surgical
interventions, especially minimally invasive surgical simulations. Typically, external
hardware connected to a computer system is used in order to operate on virtual patients.

Figure 1.1: Example of a surgical simulator. The physician uses a mockup endoscope
to interact with the virtual patient. The generated image of the virtual camera shows the
running simulation. Image provided by courtesy of VirtaMed AG.

Figure 1.1 shows an example of a virtual surgery setup. The physician uses a mockup
endoscope to interact with the virtual patient, and the generated image of the virtual
camera shows the running simulation. The variety of potential tasks, such as inspecting
the texture or resecting parts of organs, poses high demands on the versatility and
efficiency of the software systems.
Basdogan et al. [2007] compare a few available simulators and show the wide range of
possible applications, from training navigation and coordination skills to complete
procedures. The various parts required to create such simulators are summarized
in Liu et al. [2003] and Basdogan et al. [2007]. In general, surgical simulators already
achieve reasonable visual results, while efficient and convincing physical simulation and
haptic feedback remain open problems.
Integral to all medical simulators are the methods used to solve the dynamics equations
that drive the deformation of the virtual organs. Nealen et al. [2006] present the wide
range of available computational models used to simulate deformable objects. In order
to solve the dynamics equations based on constitutive laws, the simulation domains are
discretized. Of the available methods, however, mostly mass-spring models (MSM) or
mesh-based finite element methods (FEM) are used. MSM for volumetric objects
are simple and have low computational costs [Teschner et al. 2004]. However, it is hard
to parameterize them such that their behavior resembles real tissue [Lloyd et al. 2007].
FEM, on the other hand, has a well-established parameterization, which maps the behavior
of real-world objects to virtual objects well. Thus, MSM in medical simulators have
mainly been replaced by variants of FEM. Harders et al. [2003] show that linear FEM has
comparable costs to MSM while giving better results. However, for large displacements,
linear models introduce artificial forces leading to inflated objects. In order to avoid
these artifacts, Picinbono et al. [2003] resort to non-linear material models, which
are rotationally invariant. Similar rotational invariance can be achieved with linear FEM
by using a corotational correction [Müller and Gross 2004].
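To make the corotational idea concrete, the following is a minimal sketch of the per-element elastic force, not the exact implementation developed later in this thesis; the names `K_e`, `R`, and the node layout are illustrative assumptions.

```python
import numpy as np

def corotational_force(K_e, R, x, x0):
    """Elastic nodal forces of one tetrahedral element in the
    corotational linear FEM: rotate the current positions back into
    the reference frame, apply the precomputed linear stiffness,
    then rotate the resulting forces forward again.
    K_e: (12, 12) element stiffness (rest configuration),
    R: (3, 3) element rotation, x, x0: (4, 3) current/rest positions."""
    R_blk = np.kron(np.eye(4), R)                 # block-diagonal rotation
    u = R_blk.T @ x.reshape(12) - x0.reshape(12)  # corotated displacement
    return (-R_blk @ (K_e @ u)).reshape(4, 3)     # f = -R K_e (R^T x - x0)
```

For a rigidly rotated element, the corotated displacement vanishes and no forces are produced, which is exactly the ghost-force artifact of plain linear FEM that the correction removes.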
From the discretization and the deformation model we obtain a set of ordinary differential
equations describing the dynamics of the simulated objects. In turn, these equations
are solved numerically by discretizing time into small steps [Schwarz and Köckler 2006].
Real-time interaction with a VR scene is possible when the time necessary to compute
one second of simulation takes less than one physical second. This is essential because
only then are quantities perceived the same way they are simulated. Hence, this places
hard constraints on the computation time; all necessary computations must be completed
within these boundaries. Depending on the cost of a single update step, the update rate
has to be adjusted accordingly.
In addition, many VR simulations require high-fidelity haptic rendering systems. The
interaction of the user with virtual objects through real tools connected to force-feedback
systems allows rendering a touch sensation; e.g., in medical simulations, physicians use
surgical tools connected to the simulator in order to operate on a virtual patient's organs.
Otaduy and Lin [2006] present a complete overview of the various components and techniques
necessary to build haptic systems. Rendering these collision forces, and thus letting the
user feel the texture of organs, easily requires update rates of 0.5 to 1 kHz.
Using lower rates is possible; however, the user will experience disturbing discontinuities
in the feedback. Due to the mentioned performance constraints, the physical
simulations often need to be executed at low update rates of 50 to 100 Hz. Hence, the
simulation and the haptic loop are typically decoupled in order to ensure that the latter
runs at the required frequency. An example of such a system is [Galoppo et al. 2007]: the
haptic loop computes the interaction of the external tools with a low-resolution collision
model, which is embedded into a high-resolution deformable mesh running in a separate
simulation thread. However, this detachment can also lead to discontinuities in the haptic
feedback when the simulation loop runs too slowly.
Using shorter time steps, on the other hand, has implications for many parts of the
simulation. Collision detection and resolution become simpler if shorter integration steps
are taken: the number of collisions and the depth of penetrations are smaller, and thus
easier to resolve. Small time steps also allow temporal coherence to be exploited better,
which in turn may be used to reduce the computational effort. In similar ways, topological
changes to the meshes, such as cutting and tearing, benefit from smaller integration time
steps: with shorter update steps, the number of modified elements remains small and the
necessary computational work is reduced.
Preferably, we would set the simulation and the haptic update rates to the same value.
However, this has implications for how the deformable objects need to be integrated. The two
ways of solving the dynamics equations of deformable bodies are explicit and implicit
integration. Choosing between them depends on various factors. Engineers frequently choose
implicit integration because it is stable even for large step sizes of 10 to 20 ms, which
correspond to low update rates of 50 to 100 Hz. Large time steps compensate for high
computational costs, but introduce artificial damping in linearized deformation
models [Desbrun et al. 1999]. Explicit integration, on the other hand, is fast to compute
and does not suffer from artificial damping. However, its stability is only
conditional and often expensive to predict in complex scenarios. The critical time step
depends on the shape and size of the elements, the material parameters of the objects, and
the integration scheme. Various ways to determine the critical time step are found in [Debunne
et al. 2001, Shewchuk 2002, Miller et al. 2006] for FEM and [Kacic-Alesic et al.
2003] for MSM. Note that for non-linear FEM (and hyper-elastic materials in general) the
critical time step depends on the state of the deformation and thus changes throughout the
simulation.
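As an illustration of this dependence, a common heuristic (not the exact criterion derived later in this thesis) bounds the time step by the time a pressure wave needs to traverse the smallest element. The sketch below assumes a linear isotropic material:

```python
import math

def heuristic_critical_dt(E, nu, rho, min_edge):
    """Courant-style bound: dt <= L_min / c, where c is the
    dilatational wave speed of a linear isotropic material.
    E: Young's modulus [Pa], nu: Poisson's ratio,
    rho: density [kg/m^3], min_edge: shortest edge length [m]."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # first Lame parameter
    mu = E / (2.0 * (1.0 + nu))                     # shear modulus
    c = math.sqrt((lam + 2.0 * mu) / rho)           # P-wave speed
    return min_edge / c
```

Halving the shortest edge halves the admissible step, and stiffer or nearly incompressible materials (nu approaching 0.5) shrink it further, which is why a single small or degenerate element can dictate the step for the whole mesh.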
Other factors influencing the decision are found on the implementation side. Implicit
integration requires solving a large system of equations (linear or non-linear). Usually,
this results in complex precomputations, like factorizations of the stiffness matrix or
computing preconditioners, in order to improve the performance and convergence rate of
iterative solvers. If this can be done as a preparation step prior to the simulation, the
costs are secondary. However, if this is not possible, e.g., when the mesh topology changes
during the simulation, these computations need to be performed at runtime. Naturally, this
has a huge impact on the simulation performance. As a consequence, the integration
time step is chosen such that it is larger than the computation time per step. However,
this can lead to unsatisfying behavior of the physical simulation. If used with linear
deformation models, large time steps increase the damping caused by the solver and add
a viscous behavior to the simulation.
For these reasons, many resort to explicit integration [Debunne et al. 2001, Picinbono et
al. 2003, Miller et al. 2006, Comas et al. 2008, Joldes et al. 2010]. It requires no
complex precomputations, a single integration step is very cheap, and damping caused by
the integration scheme is not an issue. However, it is only conditionally stable, meaning
that the time step is highly dependent on the geometry and the material parameters; it
often has to be set below 1 ms. Thus, the algorithms and tools used to generate the
simulation mesh play a more important role in explicit than in implicit integration. A
single element of low quality can already dominate the mesh quality and require the
integration time step to be far smaller than necessary for elements of average quality.
Such ill-shaped elements can appear throughout a simulation as the result of topological
changes such as cutting or tearing. This places high requirements on the cutting and
remeshing algorithms.
Figure 1.2: Performance comparison between explicit and implicit integration (computation
time for 1 ms of simulated time versus number of tetrahedra). The first sample on the left
is built from a 10 × 1 × 1 cm³ bar with an edge length of 1 cm. The mesh is successively
refined until the edge length is 1.6 mm on the right.

Comparing the performance of explicit and implicit integration for an interactive real-time
application shows that explicit integration is competitive with implicit integration. Setting
a target time step of 1 ms, Figure 1.2 illustrates the computational effort of explicit and
implicit integration on a mesh using corotational linear FEM. The implicit integration
builds on an ordinary conjugate gradient algorithm, which does not require precomputing
or factorizing any matrix to solve the equations. The explicit model uses a symplectic
Euler integration scheme. The mesh is of size 10 × 1 × 1 cm³. One side is fixed, while
the rest is subject to simple gravitational forces. The mesh is divided into tetrahedral
elements, and the size of the elements is successively reduced. The edge length along the
dimensions ranges from 1 cm in the coarsest mesh to 1.6 mm in the finest mesh. Keeping
the target time step of 1 ms requires more iterations in the implicit integration as the
element count increases and the edges shrink. Similarly, explicit integration needs to take
more sub-steps in order to ensure stable integration, due to the conditional stability
limiting the maximum integration step as the edges get shorter. Both curves show almost
perfectly quadratic tendencies and are thus similarly expensive in this example.
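The symplectic (semi-implicit) Euler scheme used for the explicit curve above can be sketched in a few lines; `force_fn` is a stand-in for whatever force model (here, the corotational FEM) evaluates the current state.

```python
import numpy as np

def symplectic_euler_step(x, v, mass, force_fn, dt):
    """One semi-implicit Euler step: velocities are updated first
    from the forces of the current state, then positions are
    advanced with the *new* velocities. x, v: (n, 3) positions and
    velocities, mass: (n,) nodal masses."""
    a = force_fn(x, v) / mass[:, None]  # accelerations at the current state
    v_new = v + dt * a
    x_new = x + dt * v_new              # note: uses the updated velocity
    return x_new, v_new
```

The only difference from forward Euler is that `x_new` uses `v_new`; this small change keeps the energy of oscillatory systems bounded as long as the step stays below the stability limit.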
Meshes rarely contain only regularly shaped elements of the same size; especially topological
changes during the simulation introduce elements of various shapes and sizes. In
order to reduce the overall element count, meshing tools produce inhomogeneous meshes,
i.e., meshes containing elements of various shapes and sizes to account for differences in
curvature. Although meshing tools try to produce only elements of high quality, it is still
likely that a few ill-shaped elements emerge, which will in turn dominate the integration
time step. Typically, if explicit integration is used, a very conservative time step is chosen
in order to ensure stable integration. After topological changes, the time step often needs
to be adjusted in order to continue stable integration.
In this thesis we explore the use of explicit integration for interactive simulations. The
impact of ill-shaped elements on the time step for explicitly integrated linear FE simulations
is investigated more closely. We look at ways to robustly identify such elements and
determine a precise integration time step for each individual element. Furthermore, we
take advantage of the fact that only few elements are ill-shaped in a mesh and develop
integration schemes and deformation models to handle these elements. The basis for all
these methods is the linear corotational finite element method without a structural damping
component. Explicit integration is used for the majority of elements, which is comple-
mented with different ways of handling ill-shaped elements with the least possible
impact on the simulation quality. This allows us to use a time step which aligns with the
quality of the majority of the elements instead of the ill-shaped elements. Additionally,
we are able to use the same time step throughout a simulation, even though topological
changes may introduce new ill-shaped elements.

1.1 Related Work

Shape and size of the discretization elements of a mesh heavily influence the quality of
the computed solution. For explicit integration this dependency limits the range of valid
time steps. Moreover, topological changes create new elements, and thus the time step
needs to be adapted. A wide range of methods has been developed to cope with this problem.
The easiest way to counter ill-shaped elements is to prevent them completely by only
generating well-shaped elements at the meshing stage. While Delaunay-based meshing
approaches [Shewchuk 1998, Alliez et al. 2005] and advancing front methods [Schöberl
1997] obtain a very good average tetrahedron quality, they cannot guarantee that no ill-
shaped tetrahedra are created. As outlined, a single ill-shaped tetrahedron can control
the overall simulation time step. This problem is alleviated by the approach of Labelle
and Shewchuk [2007]. The authors guarantee that the dihedral angles lie between a defined
minimum and maximum. From a given surface mesh they compute a distance field on
a body centered cubic lattice. Given this distance field, they fill the domain of the mesh
with tetrahedra selected from a list of stencils. Although their approach ensures the angle
qualities, they generate a huge number of elements due to the regular structure of the
background grid. However, topological changes are still difficult to accomplish, because
the cuts cannot be approximated well due to the limited grid resolution. Increasing the
resolution locally would again generate very small elements that destabilize the explicit
integration. Employing the mentioned algorithms to remesh domains on the fly (as in
[Klingner et al. 2006, Wojtan and Turk 2008]) is currently not possible in real-time;
e. g. for the method in [Alliez et al. 2005], the authors state that they need a time of
16 seconds to create a torus mesh with 1000 vertices. Although the time to generate a
mesh with [Labelle and Shewchuk 2007] is only hundreds of milliseconds, creating and
evaluating the distance field takes much longer.

As an alternative to global remeshing, it is also possible to start from a given volumetric
mesh and optimize the elements locally. These approaches aim to produce meshes of
higher quality without the need to optimize the whole mesh. After generating an initial
mesh, Ito et al. [2004] optimize it using local remeshing techniques like node-snapping
and edge-swapping. Similarly, Cheng et al. [2004] use local techniques to generate tetra-
hedral meshes from polyhedra. Recently, Klingner and Shewchuk [2008] proposed to optimize the
mesh by using a sequence of different local optimization steps. In their report, they give
detailed results about the improvements achieved with their algorithms. However, the
resulting topological changes are very expensive and, for low resolution meshes (approx-
imately 1000 elements), the runtime is measured in seconds. Additionally, there is no
guarantee that the resulting mesh is of a distinct minimum quality. Due to the computa-
tionally intensive work and the lack of quality-assurances, remeshing techniques are not
useful in real-time simulations.

Closely connected to remeshing is the question, which elements in a mesh are of bad
quality and need to be optimized. Dompierre et al. [1998] provide a benchmark and com-
pare various geometric measures to judge the shape of tetrahedral elements. They state
that they do not consider linear scaling as part of the metrics. Moreover, they clearly
state that an optimal tetrahedral mesh in 3D does not exist and that the result of a mesh
optimization process depends on the used metric. Knupp [2001, 2003] takes a different
approach by defining metrics to measure shape and size based on the Jacobian of the
tetrahedral element. By combining the two metrics he obtains a size-shape metric capturing
all the geometric properties. However, all of these metrics only consider the geometry
and not the effects of time discretization, deformation model, and integration schemes.
Shewchuk [2002] builds on similar concepts as Knupp does. Additionally, he takes the
whole numerical integration into account and derives a quality measure. Similar ways to
determine the quality are found in [Debunne et al. 2001, Miller et al. 2006] for FEM and
[Kacic-Alesic et al. 2003] for MSM. However, as we will show later on, the computed
results are too conservative or not reliable enough to be useful as a criterion for computing
the time step. Instead, by considering the properties of the dynamic system of a mesh,
several authors compute stable time steps for various explicit integrators [Shinya 2004,
Gurfil and Klein 2007]. They determine a time step based on the spectral radius of the
matrix of a linear solver. However, this requires computing the eigenvalues of the system
matrix, which is not applicable for interactively changing topologies. Furthermore, the
determined time step may be significantly smaller than the necessary step size for a real
time simulation. Instead of selecting a valid time step for a given configuration, it would
also be possible to add damping to the simulation, in order to make it stable. Schmedding
et al. [2009] describe an iterative scheme damping the relative velocities. Their general
description allows it to be used with any deformation model, e. g. to reduce the negative
influence of ill-shaped elements on the simulation. However, structural damping is not
straightforward using this model. Furthermore, in the context of topological changes the
positive effect of the damping on the stability is unpredictable and thus it is not possible to
determine the improved time step. Askes et al. [2011] summarize different mass-scaling
approaches that allow increasing the critical time step for explicit simulation. However,
because the modified mass matrix is not diagonal, it needs to be inverted, which is not
applicable if the mesh topology changes often.

Another possible strategy to handle ill-shaped elements is adaptive methods. For instance,
Debunne et al. [2001] propose a space- and time-adaptive level of detail technique. Lo-
cal space and time resolution is refined in order to concentrate computational load in
strongly deforming regions. Adaptivity is enabled by embedding meshes of different res-
olutions and, depending on the deformation, by choosing another resolution per region.
A further adaptive approach has been introduced in [Grinspun et al. 2002]. Instead of
refining elements directly, they propose to refine basis functions. This allows them to re-
solve deformations on a fine level and thus reduce the error in high deformation regions.
Adaptive strategies would provide a sophisticated solution to dealing with ill-shaped ele-
ments. However, in the case of topological changes, difficulties can arise from necessary
updates. Multi-level adaptive meshing requires interactive updates on all nested mesh
levels, potentially adding ill-shaped elements at various levels. The creation of extremely
badly-shaped elements in general poses a problem for adaptive methods.

A very different way of simulating deformable objects is to approximate the global de-
formation modes. Instead of allowing all possible deformations, the simulation is con-
strained to a subset. Pentland and Williams [1989] and Hauser et al. [2003] identify high
modal frequencies as a source of small amplitude vibrations causing unpleasant visible
artifacts. Moreover, the high modal frequencies require small time steps. By omitting
high frequency modes they improve the performance and remove unwanted oscillatory
visual artifacts. As a result Pentland and Williams compute a polynomial approxima-
tion of the lower modes and use them to perform the animation. Hauser et al. extend
the method by using the vibrating modes directly and include the possibility to use con-
straints, e. g. collisions, in frequency space. Barbić and James [2005] extend the concept
to non-linear FEM by precomputing a polynomial approximation. By using the
lower deformation modes they construct an optimal small orthogonal basis, which allows
them to simulate complex objects in real-time. Similarly, Kim et al. [2011] and Barbić
et al. [2011] do computations in frequency space, but constrain the modal reductions to
sub-spaces. Global behavior is achieved by connecting the individual subspaces using
constraints. However, the above methods require precomputing the vibration modes for
the stiffness matrix, which is a costly step. Any change in the topology of a mesh requires
recomputing the modal reduction, which is computationally too expensive for situations
where real-time cutting is part of the simulation. Furthermore, modal reduction removes
small local deformations from the simulations, which are important when interacting with
virtual objects locally.
Similar in spirit are sub-space simulation techniques based on data-driven simulation in-
stead of modal reduction. These allow to efficiently separate the simulation of low and
high frequency details. A coarse base mesh is simulated using e. g. FEM while the fine
details are added on top of the coarse simulation state. De et al. [2011] use a data-driven
method to simulate soft-tissue models. In a preparation phase high resolution simulations
are run to train a neural network, through which a set of radial basis functions are de-
fined. Later, a coarser resolution mesh is employed for the simulation and the position of
a rigid body is used to select the radial basis functions to add details to the fine mesh. In
this spirit, the method by Seiler et al. [2012] separates the simulation of low and high fre-
quency details. Their main goal is to simulate the high frequency features observed when
a rigid body is interacting with an incompressible deformable body. A pre-simulation
recording phase generates high frequency details and stores them in a database. During
simulation the stored details are added to the coarse simulation, where the rigid bodies
interact with the deformable object. Martin et al. [2011] describe a method which allows
to direct an underlying simulation to certain target poses. One of their use cases embeds
a high resolution surface in a low resolution simulation mesh, where the appropriate sur-
face is chosen from a precomputed set of configurations. Although they use their method
for offline animations, a similar approach for real-time scenarios is imaginable. These
approaches work well as long as the interaction of the virtual environment and the de-
formable body is simple enough. However, precomputing the space of all possible target
states is very cumbersome and time consuming. Moreover, as soon as the mesh is cut,
the precomputed functions/displacements lose their validity and need to be recomputed.
This makes such methods difficult to apply in this setting.

1.2 Effect of Remeshing on the Simulation Time Step

Given the performance requirements of interactive applications, we need to ensure that
topological changes on the mesh do not degrade the performance by introducing too many
new elements or lowering the required integration time step. Although cutting algorithms
are not the topic of this thesis, cutting itself is an important motivation. Changing the
topology of a mesh during the simulation is an important part of VR systems and can
heavily influence the critical time step by introducing new elements of various sizes and
shapes. Traditionally, implementations try to minimize this effect by optimizing the simu-
lation meshes after a cut has occurred. However, we show that remeshing approaches are
not the best option in terms of computational costs and achieved mesh improvement. In
more detail, topological changes typically involve element subdivisions which potentially
decrease mesh quality by introducing small or ill-shaped elements. Besides the grow-
ing number of elements, ill-shaped elements may decrease the simulation time step for
explicit integration. Therefore, conditional stability requires adapting the time step
to the decreasing mesh quality. In the following, we give an overview of the impact of
cutting on the mesh quality and the limitations remeshing algorithms face when trying to
maintain the initial time step. The simulation using the corotational FEM is integrated
with the Symplectic Euler scheme (see Chapter 2). We examine the performance on a
set of meshes using a Young's modulus of 30 kPa, a Poisson ratio of 0.3, and a density
of 1000 kg/m³, and perform the tests on an Intel Core 2 Quad Q9400 processor running at
2.66 GHz.

Figure 1.3: Sequence of cuts performed on the Armadillo, Bunny (C), Dragon, and Duck
model.

The cutting algorithm is implemented based on [Steinemann et al. 2006] by using a
combination of element subdivision and node snapping. As a first optimization step, we
process the contour edges of the cut surface. Vertices adjacent to short contour edges are
displaced to increase the volume of the incident elements. In a second phase, we use a
set of standard improvements as they can be found in [Burkhart et al. 2010, Klingner
and Shewchuk 2008]: multi-face removal, edge removal, and basic flips. We base these
operations on the characteristic length L_e = 3V_e / A_e^max, where V_e is the volume and
A_e^max is the area of the largest face of a tetrahedron [Aguinaga et al. 2010]. Finally,
we smooth the neighborhood of a cut using a mass-spring simulation to regularize the
element shapes further [Serby et al. 2001].
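The characteristic length above can be evaluated directly from the four vertex positions of a tetrahedron. A possible helper (our own sketch, not code from the thesis):

```python
import numpy as np

def characteristic_length(p0, p1, p2, p3):
    """L_e = 3 V_e / A_e^max for a tetrahedron [Aguinaga et al. 2010]:
    three times the volume over the area of the largest face, which equals
    the element's height over that face."""
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    volume = abs(np.dot(np.cross(p1 - p0, p2 - p0), p3 - p0)) / 6.0
    faces = [(p0, p1, p2), (p0, p1, p3), (p0, p2, p3), (p1, p2, p3)]
    max_area = max(np.linalg.norm(np.cross(b - a, c - a)) / 2.0
                   for a, b, c in faces)
    return 3.0 * volume / max_area

# Unit right tetrahedron: L_e is the height over the largest (diagonal) face.
L = characteristic_length([0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1])
```

Because 3V_e/A equals the height over face A, a sliver element with a large face and tiny volume immediately yields a small L_e, flagging it as problematic.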
Figure 1.3 shows a few example meshes, where we performed the series of four cuts
using element subdivision and node snapping. After each cut the critical time step is
computed by evaluating the spectral radius of the system matrix of the whole mesh (see
Section 3.1). We focus on two use cases: we take the cut mesh as it is, and, in a second
Name Initial 1st cut 2nd cut 3rd cut 4th cut
Armadillo 0.032 ms 0.017 ms 0.017 ms 0.017 ms 0.017 ms
Bunny (C) 1.64 ms 0.34 ms 0.34 ms 0.34 ms 0.34 ms
Dragon 0.039 ms 0.024 ms 0.024 ms 0.024 ms 0.017 ms
Duck 0.058 ms 0.058 ms 0.038 ms 0.017 ms 0.009 ms

Table 1.1: Decreasing critical time steps after performing several cuts. Each time new
ill-shaped elements are introduced the necessary simulation time step decreases. Note
that not each cut introduces new elements worse than the most ill-shaped before the cut.

case, we optimize the mesh after each cut. The decrease of the critical time steps without
doing any remeshing after the cuts is summarized in Table 1.1. Note that the time step
does not change with each cut; this indicates that the performed cut did not introduce
elements which are worse than the most ill-shaped elements already in the mesh.
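The critical time steps in these tables follow from the spectral radius of the system matrix (see Section 3.1). For the undamped case, a standard bound for symplectic Euler is Δt_crit = 2/ω_max, where ω_max² is the largest eigenvalue of M⁻¹K. A hedged sketch (an illustration of the principle for a lumped mass matrix, not the exact procedure of Section 3.1):

```python
import numpy as np

def critical_time_step(m_lumped, K):
    """Delta t_crit = 2 / omega_max, with omega_max^2 the spectral radius
    of M^{-1} K for a diagonal (lumped) mass matrix."""
    A = K / m_lumped[:, None]                      # row-wise division = M^{-1} K
    omega_max = np.sqrt(np.max(np.abs(np.linalg.eigvals(A))))
    return 2.0 / omega_max

# Single mass-spring system: m = 1 kg, k = 1e6 N/m  ->  omega = 1000 rad/s.
dt_crit = critical_time_step(np.array([1.0]), np.array([[1.0e6]]))
```

The cost of this evaluation is dominated by the eigenvalue computation on the full system matrix, which is exactly why the thesis later develops cheaper per-element estimates.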

Name Initial 1st cut 2nd cut 3rd cut 4th cut
Armadillo 0.032 ms 0.017 ms 0.017 ms 0.017 ms 0.017 ms
Bunny (C) 1.64 ms 0.63 ms 0.6 ms 0.6 ms 0.6 ms
Dragon 0.039 ms 0.033 ms 0.027 ms 0.027 ms 0.027 ms
Duck 0.058 ms 0.077 ms 0.056 ms 0.056 ms 0.011 ms

Table 1.2: Improved critical time steps after remeshing. For most of the cutting results
the mesh quality is higher compared to Table 1.1. Still, after several cuts the critical time
step is lower than the initial one.

In order to increase the critical time step the mesh is optimized after each cut. Table 1.2
shows the results after optimizing the mesh after each cut. Compared to the case above
where the mesh is not optimized, the critical time step is higher after optimization. How-
ever, it still decreases and as a consequence, the time step still needs to be adapted in order
for the simulation to be stable.

                       Cutting costs [s]                    Remeshing costs [s]
Name          1st cut  2nd cut  3rd cut  4th cut    1st cut  2nd cut  3rd cut  4th cut
Armadillo       0.028    0.023    0.279    0.330      0.424    0.578    5.655    5.731
Bunny (C)       0.067    0.128    0.034    0.038      0.236    0.328    0.154    0.141
Dragon          0.230    0.360    0.216    0.207      0.533    0.877    0.519    0.491
Duck            0.024    0.013    0.028    0.193      0.159    0.123    0.095    2.946

Table 1.3: Computation costs for each of the cuts and for each remeshing sequence for
each of the meshes. Optimizing the mesh is several times more expensive than cutting
itself.

Besides the decreasing time step, which influences the performance of the integration pro-
cess, the performance costs of the cutting and remeshing algorithms matter. In Table 1.3
1.3. C ONTRIBUTIONS 11

the computation times for cutting and remeshing for the different cuts are plotted. The
cutting procedure itself is not cheap. However, mesh optimization is an order of magni-
tude more expensive. While it clearly improves the mesh quality, the costs of remeshing
are high compared to their benefit. Moreover, there is no guarantee that the critical time
step of the optimized mesh is above the simulation time step. It is still necessary to adapt
the simulation to the new mesh. However, as we will show in this thesis, we can main-
tain the simulation time step even in the presence of ill-shaped elements, which would
otherwise require a smaller time step.

1.3 Contributions

In order to make explicit integration a viable option for simulating deformable objects, we
propose an alternative to traditional remeshing to handle the negative effect of ill-shaped
elements on the simulation time step. This thesis contributes two significant methods
which enable explicit integration using large time steps, while, at the same time, handling
ill-shaped elements which would otherwise destabilize the simulation:

- A novel way of determining the stable integration time step for explicit integration
  on a per-element basis by regionally isolating modal frequencies. The developed
  method is valid for a wide range of FEM-based deformation models such as linear
  FEM, corotational linear FEM, and non-linear FEM. In order to evaluate the
  developed metric efficiently, we present a simple, fast iterative method.

- Based on the developed identification scheme, we designed four methods to cope
  with ill-shaped elements during the simulation. Given a target time step, they can
  maintain a stable integration by handling any ill-shaped elements dynamically.
  Furthermore, this allows handling changes to the mesh topology efficiently during
  runtime. Thus, we are able to maintain a stable explicit integration without adapting
  the time step.

Based on the developed methods, we demonstrate the advantage of using our method over
traditional explicitly integrated FEM. Moreover, we show that depending on the method,
we can balance between speed and quality. This allows choosing the appropriate one
depending on the requirements of the application. The developed solutions relax
the interdependence between the mesh quality and the simulation time step. We are
able to handle defects of meshing and remeshing algorithms, which allows us to use less
expensive techniques. As a consequence, we are not bound to the time step defined by
the most ill-shaped elements. Rather, we can use an integration time step oriented at the
quality of the majority of elements.
1.4 Outline

Chapter 2 presents the background of the finite element method. This consists of an intro-
duction to the basics of continuum mechanics and the numerical method to discretize its
partial differential equations. Furthermore, the numerical integration schemes necessary
to solve the dynamics equations are introduced.
Chapter 3 first gives an introduction to the stability theory for ordinary differential equa-
tions and shows its application to the finite element method. In order to evaluate the sta-
bility criteria efficiently, we introduce approximate metrics to determine critical elements.
The chapter is completed with accuracy and performance validations and applications to
linear and non-linear FEM.
Chapter 4 presents the methods we have developed, allowing a stable integration of the
dynamics equations by using explicit integration schemes in the presence of elements
violating the stability criteria. For each method we show the performance in example
simulations, and discuss characteristics and shortcomings. Furthermore, performance
measurements and an overall comparison between the different methods provide details
to reason about their usability.
Chapter 5 is dedicated to implementation aspects of various parts of the developed meth-
ods. Amongst other things, we compare different ways to compute polar decompositions
and demonstrate possible ways of parallelizing FEM on single-machine workstations.
Chapter 6 completes the thesis by summarizing the introduced methods and their impli-
cations. Finally, we outline possible future directions.
1.5 Publications

Work presented in this thesis appeared in the following journals and proceedings.

Journals

[Fierz et al. 2010] B. Fierz, J. Spillmann, and M. Harders, 2010.
Stable Explicit Integration of Deformable Objects by
Filtering High Modal Frequencies.
Journal of WSCG, 18(1-3), 81 - 88.
[Aguinaga et al. 2010] I. Aguinaga, B. Fierz, J. Spillmann, and M. Harders, 2010.
Filtering of High Modal Frequencies for Stable Real-time
Explicit Integration of Deformable Objects using the Finite
Element Method.
Progress in Biophysics and Molecular Biology,
103(2-3), 225 - 235.
[Fierz et al. 2012] B. Fierz, J. Spillmann, I. Aguinaga, and M. Harders, 2012.
Maintaining Large Time Steps in Explicit Finite Element
Simulations using Shape Matching.
IEEE Transactions on Visualization and Computer Graphics,
18(5), 717 - 728.

Conference Proceedings

[Gutierrez et al. 2011] L. Gutierrez, I. Aguinaga, B. Fierz, F. Ramos, and M. Harders, 2011.
Pitting a New Hybrid Approach for Maintaining Simulation Stability
after Mesh Cutting Against Standard Remeshing Strategies.
Proceedings of Computer Graphics International.
[Fierz et al. 2011] B. Fierz, J. Spillmann, and M. Harders, 2011.
Element-wise Mixed Implicit-Explicit Integration for Stable
Dynamic Simulation of Deformable Objects.
Proceedings of the ACM SIGGRAPH/Eurographics
Symposium on Computer Animation, 257 - 266.
2 Deformable Body Dynamics

In this chapter, we introduce the basic theory behind deformable body dynamics along
with an introduction to the finite element method, which we use to solve the equations.
We provide the derivations in a manner making it possible to easily implement deformable
body dynamics using FEM.
Describing the behavior of deformable objects is not as straightforward as for rigid bodies.
For the latter it is enough to characterize the motion of the center of mass. However,
in case of deformable objects it does not maintain its relative position within the object
when the shape changes. Thus, instead of a single mass point, the object is represented by
a set of N interacting particles. The motion of the deformable object is then described by
the combined behavior of all the particles. Newton's law of motion connects the inertial
forces of each particle i, $m_i \ddot{x}_i$, with external forces, e. g. gravity, and internal
inter-particle forces
$$m_i \ddot{x}_i = f_i^{ext} + \sum_j f_{ij},\qquad (2.1)$$
where $f_{ij}$ is the force between two particles i and j. The dynamics of the whole particle
system in 3D result in 3N coupled 2nd-order ordinary differential equations (ODEs)
$$M \ddot{x} = f^{ext}(x) + f^{int}(x),\qquad (2.2)$$

where M is the lumped mass matrix and $f^{ext}(x)$ and $f^{int}(x)$ are functions computing
external and internal body forces from the positions; $\ddot{x}$ and $x$ are the accelerations
and the current positions, respectively. Tracking the dynamics of the particles defines how the
whole body evolves over time. Over the years, a variety of methods to set up and solve the
above differential equations have been defined. Nealen et al. [2006] summarize the methods
typically used in interactive VR simulations. Each method has different properties with
respect to discretization, material parameterization, accuracy, stability and computational
efficiency. As already mentioned, this chapter discusses the mathematical modeling and
implementation of soft tissue using FEM.
In order to derive the dynamics equations (2.2), we first introduce the underlying me-
chanical concepts. Thereafter, we show how to discretize the obtained partial differential
equations (PDEs) using FEM. Depending on the use case, we obtain different sets of
ODEs for linear and non-linear problems. For the derivations of the equations, we follow
[Hughes 2003, Athanasiou and Natoli 2008, Ibrahimbegovic 2009].

2.1 Basic Continuum Mechanics

The dynamics of deformable bodies can be modeled by continuum mechanics. Instead of
discrete particles, this model assumes a continuous matter distribution. This allows us to
formulate PDEs describing the behavior of differential quantities of a body. Note that in
order to solve the resulting equations, we need to discretize the simulation domain later.
We assume that matter is composed of infinitesimal particles, which may be associated
with different properties (mass, velocity, pressure, ...). In order to describe the quantities
of these particles, we need to choose between different spatial descriptions:

- Lagrangian (material) coordinates, denoted with X,

- Eulerian (spatial) coordinates, denoted with x.

Lagrangian coordinates define the spatial location as an explicit property of a particle
with respect to the initial configuration of the body. Eulerian coordinates, on the other
hand, describe a general position in space, which can be occupied by different particles
during the simulation. In the former system, we track the material particles explicitly.
In the latter, we track the distribution of the material particles in the spatial domain.
We express the relation between spatial and material coordinates in terms of a bijective
transfer function $\phi$ at time t
$$x = \phi(X, t), \quad \text{with} \quad \phi(X, 0) = X.\qquad (2.3)$$

From a theoretical point of view Lagrangian and Eulerian descriptions are equivalent and
can be used interchangeably. Lagrangian descriptions are typically used in cases where we
are interested in the properties of the particles themselves, which we need to track, as we
do in the case of deformable body dynamics. Eulerian descriptions are more convenient
when we are interested in the spatial distribution of properties rather than the properties
of a distinct particle. A prominent example of this representation is grid-based fluid
dynamics.
In the following we restrict our derivations to the Lagrangian representation of continuum
mechanics. Any discussed property is associated with a fixed reference frame, i. e. a par-
ticle. During the course of the simulation these properties may change, however they are
still associated with their initial reference frame. We study the dynamics of deformable
bodies by tracking position, velocity, and acceleration of the particles. The three functions
x, v, a take care of this tracking:

x = x(X, t)
v = v(X, t)
a = a(X, t) (2.4)

Note that the function x calculates the spatial position from the material coordinates of
particle X. Analogously, the functions v and a calculate the velocity and acceleration of
a particle in the spatial domain, respectively.
A core concept of deformable body dynamics is the change of the relative displacement
between particles. Next, we introduce the notion of strain describing a geometrical mea-
sure of the spatial relation of neighboring particles.

2.1.1 Strain

Consider two material particles P (t), Q(t) of a body at time t, which are close to each
other in material space. Their distance can be described by the material differential dX
at t = 0.
We describe the spatial coordinates of P and Q in terms of the coordinate function defined
in (2.4). In order to simplify the following derivations, we define the displacement
function u
$$u(X, t) = x(X, t) - x(X, 0),\qquad (2.5)$$
which relates the current position of a particle to its initial position. The current positions
of P at $X_P$ and Q at $X_Q = X_P + dX$ can then be written as

$$x_P(t) = x(X_P, 0) + u(X_P, t)$$
$$x_Q(t) = x(X_P + dX, 0) + u(X_P + dX, t).\qquad (2.6)$$

Figure 2.1 shows a visual representation of this relation. The magnitude of the strain
is the relation between the distance of P and Q at some time t and at t = 0, meaning
$\|x_P - x_Q\|_2 / \|X_P - X_Q\|_2$. While the difference $X_P - X_Q = dX$ is written in material
coordinates, we write the difference $x_P - x_Q$ as a differential quantity dx, in spatial
coordinates, in terms of the material differential and the displacement function
$$dx = dX + u(X + dX, t) - u(X, t).\qquad (2.7)$$

X and dX are three-dimensional variables, thus we use multi-variable calculus to define
the displacement gradient
$$\nabla u = \frac{u(X + dX, t) - u(X, t)}{dX}.\qquad (2.8)$$
Figure 2.1: Relation between the current state x and the initial state X, connected by the
mapping function $\phi$. The points P and Q are separated by the differential quantities dX and
dx. The dashed lines represent the initial configuration, while the continuous line is the
deformed state.

Finally, the spatial differential dx in terms of material differentials is


$$dx = dX + (\nabla u)\,dX.\qquad (2.9)$$
The displacement gradient describes how the spatial coordinates vary, when the material
coordinates change by the differential quantity dX. Dividing (2.9) by dX, we obtain the
deformation gradient in terms of the displacement gradient
$$F = \frac{dx}{dX} = I + \nabla u.\qquad (2.10)$$
If the displacement gradient between two particles is 0, the relative position between the
particles did not change. Thus, the deformation gradient is the identity matrix, which is
equivalent to the fact that no deformation is present.
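For linear tetrahedral elements the deformation gradient is constant per element and can be assembled from edge-vector matrices in material and spatial coordinates. The construction F = D_s D_m⁻¹ below is a common discrete realization under the assumption of linear shape functions, sketched as an illustration (not necessarily the exact formulation used later in the thesis):

```python
import numpy as np

def deformation_gradient(X, x):
    """F maps material edge vectors to spatial edge vectors:
    F = D_s D_m^{-1}, with edge columns taken from vertex 0."""
    X, x = np.asarray(X, float), np.asarray(x, float)
    Dm = np.column_stack([X[i] - X[0] for i in (1, 2, 3)])  # material edges
    Ds = np.column_stack([x[i] - x[0] for i in (1, 2, 3)])  # spatial edges
    return Ds @ np.linalg.inv(Dm)

# Undeformed element: the spatial edges equal the material edges, so F = I,
# matching the statement above that grad(u) = 0 implies no deformation.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
F = deformation_gradient(X, X)
```

Since D_m depends only on the rest configuration, its inverse can be precomputed per element and reused every time step.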
One way to define the strain is to look at the way the orthogonality of two vectors
$dX_a, dX_b$ in material coordinates changes in spatial coordinates. Given arbitrary
displacements, orthogonal vectors in material coordinates are transformed to spatial
coordinate vectors $dx_a, dx_b$ using (2.9)
$$dx_a = dX_a + (\nabla u)\,dX_a$$
$$dx_b = dX_b + (\nabla u)\,dX_b.\qquad (2.11)$$
We measure the amount of deformation by looking at how much the vectors in spatial
coordinates deviate from orthogonality by taking the dot product
$$dx_a^T dx_b = dX_a^T dX_b + dX_a^T (\nabla u)\,dX_b + dX_a^T (\nabla u)^T dX_b + [(\nabla u)\,dX_a]^T [(\nabla u)\,dX_b].\qquad (2.12)$$
By separating the variables, the relation in terms of the deformation gradient becomes

$$dx_a^T dx_b = dX_a^T \left(I + \nabla u + (\nabla u)^T + (\nabla u)^T (\nabla u)\right) dX_b = dX_a^T F^T F\, dX_b.\qquad (2.13)$$

In fact, this leads to the definition of the Right Cauchy-Green deformation tensor

$$C = F^T F.\qquad (2.14)$$

In addition, there exist other deformation tensors, for which we refer the reader to the
literature. A further definition we need later on is the finite Green-Lagrange strain tensor
$$E = \frac{1}{2}\left(\nabla u + (\nabla u)^T + (\nabla u)^T \nabla u\right) = \frac{1}{2}\left(F^T F - I\right),\qquad (2.15)$$
which can be understood as measuring the deviation from orthogonality when going from
material coordinates to spatial coordinates. If no deformation is present then F T F is I
and the strain thus is zero.
In cases where only small deformations are present, we can neglect the quadratic term of E and effectively linearize the Green-Lagrange strain tensor, arriving at the infinitesimal Cauchy strain tensor

\[ \epsilon = \frac{1}{2} \left( \nabla u + (\nabla u)^T \right) = \frac{1}{2} \left( (F - I) + (F^T - I) \right). \tag{2.16} \]

Note that the infinitesimal Cauchy strain tensor is not rotationally invariant and thus only valid in the linear regime. The rotational invariance of E can be observed by performing a polar decomposition F = RU, where R is the rotation and U the stretch tensor. Computing F^T F effectively removes the rotational part: (RU)^T (RU) = U².
Given the strain within a differential material element, we are able to define a strain-energy density function Ψ, describing the internal potential energy. In this thesis, we focus on the well-known Saint Venant-Kirchhoff material model and its linearization. The internal potential energy is formulated in terms of the Green-Lagrange strain tensor

\[ \Psi(E) = \frac{\lambda}{2} [\operatorname{tr}(E)]^2 + \mu \operatorname{tr}(E^2), \tag{2.17} \]

where λ and μ are the first and second Lamé coefficients, respectively. These two parameters describe linear elastic behavior for homogeneous and isotropic materials. While the first parameter λ is a general stiffness coefficient of the material, the second parameter μ describes its response to shearing strains.

2.1.2 Stress and Elasticity

So far, we have looked at quantities which are defined on particles. Handling real matter as a collection of interacting infinitesimal particles is computationally infeasible. Statistical physics defines rules describing the average behavior of the collection of particles. In this section, we review the concept of stress as a measure of the forces acting on the deformable body.
Stress is a vectorial quantity defined as the force Δf acting on a small area ΔA

\[ t_n = \lim_{\Delta A \to 0} \frac{\Delta f}{\Delta A}, \tag{2.18} \]

where t_n has the units of N/m² = Pa. Given a normal n, we can define the Cauchy stress tensor σ(F) through

\[ t_n = \sigma(F)\, n, \tag{2.19} \]
which is a second order tensor and can be defined in terms of the deformation gradient.
In order to arrive at σ, we first need the 2nd Piola-Kirchhoff stress tensor S from the strain energy

\[ S = \frac{\partial \Psi}{\partial E}. \tag{2.20} \]

The formulation for the Saint Venant-Kirchhoff material model in (2.17) is

\[ S(E) = \lambda \operatorname{tr}(E)\, I_3 + 2\mu E. \tag{2.21} \]

The Cauchy stress tensor is then defined as

\[ \sigma = J^{-1} F S F^T, \tag{2.22} \]

where the Jacobian J = det(F) is the determinant of the deformation gradient F. Similar to (2.16), we can define a linearized version of (2.21) by using the Cauchy strain tensor

\[ \sigma(\epsilon) \approx S(\epsilon) = \lambda \operatorname{tr}(\epsilon)\, I_3 + 2\mu \epsilon. \tag{2.23} \]

The formulation of the stress vector given the normal of the current configuration can be rewritten in terms of the initial configuration. This is what we will use throughout the thesis, because it allows us to precompute many of the necessary tensors. In contrast to (2.19), we write the stress with respect to the initial configuration

\[ t_{n_0} = P\, n_0, \tag{2.24} \]

where P is the 1st Piola-Kirchhoff stress tensor (nominal or Lagrange stress) and n₀ is the unit normal vector to a surface of the reference configuration. Given the generic definition in (2.18), we can relate the stress vectors in the current and reference configuration

\[ df = t_n\, dA = t_{n_0}\, dA_0. \tag{2.25} \]



We transform n to n₀ by Nanson's relation dA\,n = J\, dA_0\, F^{-T} n_0. Therefore, we can also compute the 1st Piola-Kirchhoff stress tensor from the 2nd Piola-Kirchhoff stress tensor

\[ P = F S. \tag{2.26} \]

This definition of the stress is in many cases more convenient, as it allows us to precompute tensors depending on the initial configuration only.
The assembled equations describe the laws of elasticity. Any given strain in a body causes stress in the material as defined by the material model. This induced stress acts against the deformation, effectively trying to restore the initial state.

2.2 Finite Element Method

In the previous sections, we defined the PDEs describing the internal body forces. These
elasticity equations relate the strains to the resulting stress forces. In order to be able to
solve these PDEs, we discretize the continuum, which leads to the dynamics equations
(2.2). In this section, we discuss the finite element method based on the Lagrangian
formulation presented above. By using the variational method and the principle of virtual
work, the forces in a body are written as:

\[ 0 = \int_{\Omega_x} \delta\epsilon(u) \cdot \sigma(x)\, dx - \int_{\Omega_x} \delta u \cdot b_x\, dx - \int_{\Gamma_x} \delta u \cdot t_x\, dx, \tag{2.27} \]

where the first term describes the internal body forces, δε is a variation of the real strain ε, and δu are virtual displacements; b_x and t_x are external forces. All three integrals are written in terms of the current configuration. In order to be able to precompute the tensors, we need to rewrite the above expressions in terms of the initial configuration. For simplicity, we only consider the first term describing the elastic forces. The external body and surface forces can be derived similarly.

\[ \int_{\Omega_x} \delta\epsilon(u) \cdot \sigma(x)\, dx = \int_{\Omega_X} \nabla\delta u \cdot \sigma(x)\, \frac{\partial x}{\partial X}\, dX. \tag{2.28} \]

Instead of the Cauchy stress tensor we want to use the 1st Piola-Kirchhoff stress tensor to
simplify the expressions later on

\[ \int_{\Omega_X} \nabla\delta u \cdot P(x)\, dX. \tag{2.29} \]

In order to compute the integral, we need to discretize the equation. As is commonly done, we use Galerkin's method to approximate the integration domain Ω. For the complete derivation we refer to [Ibrahimbegovic 2009]. The spatial domain of a body is discretized

into a set of non-overlapping elements e, e.g. tetrahedra in 3D, building a mesh. The integration is split and performed for each element individually. Hence, we can write the former integral as a sum of integrals over the elements e

\[ \sum_e \left( \int_{\Omega_X^e} \nabla\delta u^T P(x)\, d\Omega_X^e \right). \tag{2.30} \]

The continuous values are discretized and stored at the nodes of the mesh. The final nodal forces exerted by a single element are

\[ f^{\mathrm{int},e}(x) = \int_{\Omega_X^e} \nabla^T P(x)\, d\Omega_X^e, \tag{2.31} \]

where the gradient operator acts on the element shape functions and is later made concrete as the matrix B in (2.38).

Because we are not only interested in the values at the nodes, but also at any point in an element e, Galerkin's method defines how the nodal values are interpolated within an element. The interpolation functions are required to be at least C⁰ continuous in order for the method to work.
This interpolation is based on a linear combination of basis functions N_{e,i}(ξ), which are defined for each node i of each element e in a local coordinate system ξ. For brevity, we will omit the subscript e in the basis function and write N_i(ξ), assuming that basis functions are always associated with a single element. The interpolation of any quantity, written here for the position, is then given as

\[ x(\xi) = \sum_i^{n_e} x_i N_i(\xi), \tag{2.32} \]

where n_e is the number of nodes of the element e.


As we have seen above, we need to compute the deformation gradient F (see (2.10)) as part of the dynamics equations. Note that the notations are similar to [Irving et al. 2004]. We can rewrite (2.10) using the chain rule as

\[ F(\xi) = \frac{\partial x(\xi)}{\partial X(\xi)} = \frac{\partial x(\xi)}{\partial \xi} \left( \frac{\partial X(\xi)}{\partial \xi} \right)^{-1}. \tag{2.33} \]

Inserting (2.32) into (2.33) gives us a way to compute the deformation gradient at any point in the mesh

\[ F(\xi) = \left( \sum_i^{n_e} x_i \frac{\partial N_i(\xi)}{\partial \xi} \right) \left( \sum_i^{n_e} X_i \frac{\partial N_i(\xi)}{\partial \xi} \right)^{-1}. \tag{2.34} \]

For convenience, we assemble the derivatives of the shape functions in a matrix. Using the same terminology as in most literature, we assemble the matrix H ∈ R^{n_e×3} containing all the shape function derivatives

\[ H(\xi) = \left[ \frac{\partial N_1(\xi)}{\partial \xi}, \ldots, \frac{\partial N_{n_e}(\xi)}{\partial \xi} \right]^T. \tag{2.35} \]


Furthermore, we define D_X ∈ R^{n_e×3} containing the initial coordinates of the nodes of element e

\[ D_X = [X_1, \ldots, X_{n_e}]. \tag{2.36} \]

Using these definitions and similarly defining D_x as the matrix containing the current coordinates, we rewrite F as

\[ F(\xi) = D_x B(\xi) = D_x H(\xi) (D_X H(\xi))^{-1}, \tag{2.37} \]

where the term B(ξ) = H(ξ)(D_X H(ξ))^{-1} is constant for a specific ξ. Given the deformation gradient F, we are able to determine the stress using (2.22). Using this parametrization we can write f^int in terms of ξ

\[ f^{\mathrm{int},e}(x) = \int_{\Omega_\xi^e} B(\xi)^T P(F(\xi))\, d\xi. \tag{2.38} \]

Next, we will see how the basis functions are defined in terms of natural coordinates for
different elements building the foundation to solve the integral above.

2.2.1 Natural Coordinates

Natural coordinates provide an efficient way to parameterize regular elements. Their pur-
pose is to allow computing properties at any point within the element given the discretized
nodal values. In this section, we describe natural coordinates for different types of con-
vex 3D elements. We restrict the discussion to linear elements, i. e. elements which have
nodes only at their corners. Natural coordinates for higher dimensional elements can be
constructed from lower-dimensional elements. The basic 1D element is the line. Its linear form is represented by two nodes, a start and an end node. Taking the Cartesian product of two 1D elements creates the 2D bilinear rectangular element, as depicted in Figure 2.2, where ξ and η are the two 1D coordinates. A further lower-dimensional basic element is the triangle, which uses barycentric coordinates for interpolation.

Figure 2.2: Natural coordinate mapping between parameter and real space in 2D. The orthogonal coordinate axes ξ, η ∈ [−1, 1] are mapped to real space by an interpolation function.

In the following, we present the coordinate functions for 3D tetrahedra, wedges, pyramids,
and hexahedra [Hughes 2003]. The reason for choosing these elements is that tetrahedra
and hexahedra are commonly used for modeling deformable bodies. Using both of them
within the same model requires connecting elements, which can be used as a transition
between triangular and rectangular faces. This can be efficiently achieved by employing
wedges and pyramids. The four element types are shown in Figure 2.3. While tetrahedral
elements are used throughout the thesis, the other elements are shown for completeness.

Figure 2.3: Natural coordinates for tetrahedra, hexahedra, wedges, and pyramids. The tetrahedron coordinates are defined using barycentric coordinates ζ₁, ζ₂, ζ₃, ζ₄ ∈ [0, 1]. The coordinates for hexahedra and pyramids ξ, η, ζ ∈ [−1, 1] are defined on orthogonal axes. Prisms are interpolated using barycentric coordinates ζ₁, ζ₂, ζ₃ ∈ [0, 1] and a linear function ξ ∈ [−1, 1].

Tetrahedron  The first and simplest type of 3D element is the tetrahedron. As for the triangle, the tetrahedron is a simplex and not constructible from lower order elements. It is the extension of the 2D triangle to the third spatial dimension. The interpolation function is based on barycentric interpolation and thus very similar to the one for triangles

\[ x(\zeta_1, \zeta_2, \zeta_3) = \sum_{a=1}^{4} N_a(\zeta_1, \zeta_2, \zeta_3)\, x_a. \tag{2.39} \]

The shape functions are written in terms of barycentric coordinates, with 0 ≤ ζ₁ ≤ 1, 0 ≤ ζ₂ ≤ 1, 0 ≤ ζ₃ ≤ 1, 0 ≤ ζ₄ ≤ 1 and ζ₁ + ζ₂ + ζ₃ + ζ₄ = 1. Eliminating ζ₄ results in the final shape functions

\[ N_1 = \zeta_1, \quad N_2 = \zeta_2, \quad N_3 = \zeta_3, \quad N_4 = 1 - \zeta_1 - \zeta_2 - \zeta_3. \tag{2.40} \]

The derivatives of the interpolation function with respect to the shape function parameters ξ = (ζ₁, ζ₂, ζ₃) are needed in (2.33)

\[ dx(\xi) = \sum_{a=1}^{4} \frac{\partial N_a(\xi)}{\partial \xi}\, x_a\, d\xi. \tag{2.41} \]

For the linear tetrahedron the derivatives are computed by taking the difference between nodes

\[ \frac{\partial x}{\partial \zeta_1} = x_1 - x_4, \qquad \frac{\partial x}{\partial \zeta_2} = x_2 - x_4, \qquad \frac{\partial x}{\partial \zeta_3} = x_3 - x_4. \tag{2.42} \]

The Jacobian J = ∂x/∂ξ describes how the spatial coordinates change with respect to the shape function parameters. Computing J for tetrahedra is straightforward. From the formula it is clear that the derivatives are constant throughout the element, in contrast to other elements, which have different derivatives depending on the values of the natural coordinates. For the formulation of the FEM we need J⁻¹ = ∂ξ/∂x, obtained by inverting J.
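Because the derivatives in (2.42) are constant, the deformation gradient of a linear tetrahedron reduces to a product of two edge matrices. A minimal numpy sketch (function names are ours, not from the thesis):

```python
import numpy as np

def edge_matrix(p):
    """Columns x_a - x_4 for a = 1..3, i.e. the Jacobian dx/dzeta of Eq. (2.42)."""
    return np.column_stack([p[0] - p[3], p[1] - p[3], p[2] - p[3]])

def deformation_gradient(X, x):
    """F = (dx/dzeta)(dX/dzeta)^-1 for a linear tetrahedron, cf. Eq. (2.33)."""
    return edge_matrix(x) @ np.linalg.inv(edge_matrix(X))

# A reference tetrahedron (rows are nodes X_1..X_4).
X = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]])

# Undeformed element: F is the identity.
assert np.allclose(deformation_gradient(X, X), np.eye(3))

# Uniform scaling of all nodes by 2: F = 2 I.
assert np.allclose(deformation_gradient(X, 2.0 * X), 2.0 * np.eye(3))
```

Since F is constant per element, a single evaluation per tetrahedron suffices, which is what makes the one-point quadrature of Section 2.2.2 exact for this element.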

Hexahedron  Besides tetrahedral finite elements, hexahedral space divisions are very popular. Their interpolation is based on the Cartesian product of three independent linear 1D shape functions

\[ x(\xi, \eta, \zeta) = \sum_{a=1}^{8} N_a(\xi, \eta, \zeta)\, x_a. \tag{2.43} \]

The definition of the shape functions is straightforward

\[
N_1 = \tfrac{1}{8}(1-\xi)(1-\eta)(1-\zeta), \quad
N_2 = \tfrac{1}{8}(1+\xi)(1-\eta)(1-\zeta), \quad
N_3 = \tfrac{1}{8}(1+\xi)(1+\eta)(1-\zeta), \quad
N_4 = \tfrac{1}{8}(1-\xi)(1+\eta)(1-\zeta),
\]
\[
N_5 = \tfrac{1}{8}(1-\xi)(1-\eta)(1+\zeta), \quad
N_6 = \tfrac{1}{8}(1+\xi)(1-\eta)(1+\zeta), \quad
N_7 = \tfrac{1}{8}(1+\xi)(1+\eta)(1+\zeta), \quad
N_8 = \tfrac{1}{8}(1-\xi)(1+\eta)(1+\zeta). \tag{2.44}
\]

In contrast to the tetrahedron the gradient of the shape functions varies with the natural
coordinates.

Wedge  The first of the transition elements is the wedge, connecting triangular and rectangular elements. The wedge is constructed as a combination of lower order elements, i.e. a 2D triangle and a 1D line,

\[ x(\xi, \zeta_1, \zeta_2) = \sum_{a=1}^{6} N_a(\xi, \zeta_1, \zeta_2)\, x_a, \tag{2.45} \]

where ξ is the natural coordinate for the line element and ζ₁, ζ₂ are the barycentric coordinates for the triangular element. The shape functions describe two triangular sides and a linear interpolation between them

\[
N_1 = \tfrac{1}{2}(1-\xi)\zeta_1, \quad
N_2 = \tfrac{1}{2}(1-\xi)\zeta_2, \quad
N_3 = \tfrac{1}{2}(1-\xi)(1-\zeta_1-\zeta_2),
\]
\[
N_4 = \tfrac{1}{2}(1+\xi)\zeta_1, \quad
N_5 = \tfrac{1}{2}(1+\xi)\zeta_2, \quad
N_6 = \tfrac{1}{2}(1+\xi)(1-\zeta_1-\zeta_2). \tag{2.46}
\]

Pyramid  The second transition element is the pyramid. Its shape functions are constructed by collapsing one face of a hexahedron to a single point. Hence, the interpolation is similar to the hexahedral element

\[ x(\xi, \eta, \zeta) = \sum_{a=1}^{5} N_a(\xi, \eta, \zeta)\, x_a. \tag{2.47} \]

The shape functions for the four base points are identical to those of the hexahedron, while shape function N₅ is the sum of the shape functions N₅, N₆, N₇, N₈ of the hexahedron

\[
N_1 = \tfrac{1}{8}(1-\xi)(1-\eta)(1-\zeta), \quad
N_2 = \tfrac{1}{8}(1+\xi)(1-\eta)(1-\zeta), \quad
N_3 = \tfrac{1}{8}(1+\xi)(1+\eta)(1-\zeta),
\]
\[
N_4 = \tfrac{1}{8}(1-\xi)(1+\eta)(1-\zeta), \quad
N_5 = \tfrac{1}{2}(1+\zeta). \tag{2.48}
\]

2.2.2 Gauss Quadrature

The previously defined interpolation functions allow computing the properties at any point inside the element from its nodes. This allows us to numerically integrate properties over the element domain. Any numerical integration scheme could be used for this task. However, we employ the widely used Gauss quadrature because of its optimality regarding the number of integration points. The integration of the internal forces over the domain Ω^e of the element e using natural coordinates,

\[ f^{\mathrm{int},e}(x) = \int_{\Omega^e} B^T P\, d\xi, \tag{2.49} \]

can be discretized and written as a sum of function evaluations at integration points g

\[ f^{\mathrm{int},e}(x) \approx \sum_g^{n_g} B_g^T P_g\, |\bar{J}_g|\, W_g, \tag{2.50} \]

where W_g are the quadrature weights and J̄_g is the scaled determinant at point g. Table 2.1 summarizes the quadrature points necessary to integrate the elements from Section 2.2.1 accurately.

Element     | Gauss Points | Determinant   | Natural Coordinates                             | Weights
------------|--------------|---------------|-------------------------------------------------|------------------------
Tetrahedron | 1            | J̄ = det(J)/6  | {1/4, 1/4, 1/4}                                 | 1
Wedge       | 2            | J̄ = det(J)/2  | {1/3, 1/3, −1/√3}, {1/3, 1/3, 1/√3}             | 1, 1
Pyramid     | 5            | J̄ = det(J)    | {±(8/5)√(2/5), ±(8/5)√(2/5), −2/3}, {0, 0, 2/5} | 0.81, 0.81, 0.81, 0.81, 125/27
Hexahedron  | 8            | J̄ = det(J)    | {±1/√3, ±1/√3, ±1/√3}                           | 1, 1, 1, 1, 1, 1, 1, 1

Table 2.1: Parameters for the Gaussian integration over the domains of a linear tetrahedron, wedge, pyramid, and hexahedron. A comprehensive list of integration points can be found in [Felippa 2009].
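As an illustration of Table 2.1, the one-point tetrahedron rule integrates a constant over the element exactly: with the scaled determinant J̄ = det(J)/6 and weight 1, the integral of 1 recovers the element volume. A small numpy sketch:

```python
import numpy as np

# Nodes of a reference tetrahedron with exact volume 1/6.
X = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]])

# For a linear tetrahedron the Jacobian J = dx/dzeta is the constant
# edge matrix of Eq. (2.42); the scaled determinant from Table 2.1 is det(J)/6.
J = np.column_stack([X[0] - X[3], X[1] - X[3], X[2] - X[3]])
J_bar = np.linalg.det(J) / 6.0

# One Gauss point with weight 1: the integral of the constant 1 over the
# element, i.e. the element volume.
volume = 1.0 * abs(J_bar)
assert abs(volume - 1.0 / 6.0) < 1e-12
```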

Summing the element forces f^{int,e} over all elements in a mesh results in the internal body forces f^int as defined in (2.2). However, computing the non-linear forces requires evaluating the Gauss integration each time the strain within an element changes, because of the non-linear dependence of the material function (2.21) on the strain. In structural mechanics, the linearized version is often used, which allows many terms to be simplified. Stiff materials and small observed strains, and thus negligible element rotations, result in linear ODEs, which are easier to solve than non-linear equations. However, ignoring the rotational part in the elasticity equation introduces artefacts in case of

large geometric deformations. In order to allow for such deformations, but still work with a linear material, we use the corotational correction from [Müller and Gross 2004]. Linear FEM and corotational linear FEM are introduced in the next two sections.

2.2.3 Linear FEM

In contrast to non-linear FEM, linear FEM is valid only for linear materials with small displacements. It is assumed that the rotational part of the deformation tensor is equal to the identity matrix. This allows precomputing the strain-stress relation of an element. Linear FEM relies on the linearized strain formulation as defined in (2.16). We write the infinitesimal strain and the stress tensors as vectors using Voigt notation. Taking advantage of the symmetric nature of both tensors, we reduce them to six degrees of freedom. In order to simplify the formulation of the stiffness matrix, we multiply the shear strains by 2:

\[ \epsilon = [\epsilon_{xx}, \epsilon_{yy}, \epsilon_{zz}, 2\epsilon_{xy}, 2\epsilon_{yz}, 2\epsilon_{xz}]^T \tag{2.51} \]
\[ \sigma = [\sigma_{xx}, \sigma_{yy}, \sigma_{zz}, \sigma_{xy}, \sigma_{yz}, \sigma_{xz}]^T. \tag{2.52} \]
Typically, linear elasticity theory is described by the Young's modulus E and the Poisson ratio ν, which describe the elasticity in Pa and the amount of volume conservation, respectively. These two values can be computed from the Lamé coefficients λ, μ: E = μ(3λ + 2μ)/(λ + μ) and ν = λ/(2(λ + μ)). The strain-stress relation from (2.23) can be rewritten in terms of E and ν:

\[ a = 1 - \nu, \qquad b = \frac{1 - 2\nu}{2}, \qquad c = \frac{E}{(1 + \nu)(1 - 2\nu)} \]

\[
\begin{bmatrix} \sigma_x \\ \sigma_y \\ \sigma_z \\ \sigma_{xy} \\ \sigma_{yz} \\ \sigma_{xz} \end{bmatrix}
= c
\begin{bmatrix}
a & \nu & \nu & 0 & 0 & 0 \\
\nu & a & \nu & 0 & 0 & 0 \\
\nu & \nu & a & 0 & 0 & 0 \\
0 & 0 & 0 & b & 0 & 0 \\
0 & 0 & 0 & 0 & b & 0 \\
0 & 0 & 0 & 0 & 0 & b
\end{bmatrix}
\begin{bmatrix} \epsilon_x \\ \epsilon_y \\ \epsilon_z \\ 2\epsilon_{xy} \\ 2\epsilon_{yz} \\ 2\epsilon_{xz} \end{bmatrix}
= \mathbf{E}\, \epsilon, \tag{2.53}
\]

where 𝐄 denotes the 6 × 6 elasticity matrix (not to be confused with the scalar Young's modulus E). Note that b is divided by 2 compared to (2.23) in order to correct for the factor of 2 in the shear strain components. The change in notation allows us to rewrite σ in terms of the displacement vector u and the matrix B of (2.37)

\[ \sigma = \mathbf{E}\epsilon = \mathbf{E} B u. \tag{2.54} \]
Applying the Gauss quadrature (2.49), we compute the per-element stiffness matrix

\[ K^e = \sum_g^{n_g} B_g^T \mathbf{E} B_g\, |\bar{J}_g|\, W_g, \tag{2.55} \]

where B_g ∈ R^{6×3n} contains the spatial derivatives of the element shape functions. We compute the derivatives of the element shape functions at the integration point as defined in (2.35). From these derivatives H(ξ), we compute B = H(ξ)(D_X H(ξ))^{-1}. For each node i of the element e we have a 6 × 3 matrix contributing to B_g

\[
\begin{bmatrix}
\frac{\partial N_i}{\partial x} & 0 & 0 \\
0 & \frac{\partial N_i}{\partial y} & 0 \\
0 & 0 & \frac{\partial N_i}{\partial z} \\
\frac{\partial N_i}{\partial y} & \frac{\partial N_i}{\partial x} & 0 \\
0 & \frac{\partial N_i}{\partial z} & \frac{\partial N_i}{\partial y} \\
\frac{\partial N_i}{\partial z} & 0 & \frac{\partial N_i}{\partial x}
\end{bmatrix}. \tag{2.56}
\]

For a tetrahedron this results in a 6 × 12 matrix and for a hexahedron in a 6 × 24 matrix.


Because the stiffness matrix depends only on the initial configuration, it can be precomputed and stays constant throughout the simulation. The internal forces are then computed as

\[ f^{\mathrm{int}} = (A_e K^e)\, u, \tag{2.57} \]

where A_e is an assembly operator summing up the element stiffness matrices into a global stiffness matrix and u are the displacements.
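Putting (2.53), (2.55), and (2.56) together for a single linear tetrahedron with one-point quadrature gives a compact stiffness routine. The sketch below (our own helper names, not the thesis implementation) also checks two expected properties: K^e is symmetric, and a rigid translation produces no elastic force:

```python
import numpy as np

def elasticity_matrix(E_mod, nu):
    """Isotropic 6x6 elasticity matrix of Eq. (2.53) in Voigt notation."""
    a, b = 1.0 - nu, (1.0 - 2.0 * nu) / 2.0
    c = E_mod / ((1.0 + nu) * (1.0 - 2.0 * nu))
    M = np.diag([a, a, a, b, b, b])
    M[:3, :3] += nu * (1.0 - np.eye(3))   # nu on the upper-left off-diagonals
    return c * M

def tet_stiffness(X, E_mod, nu):
    """K^e = V B^T E B for a linear tetrahedron (one-point quadrature, Eq. (2.55))."""
    J = np.column_stack([X[0] - X[3], X[1] - X[3], X[2] - X[3]])
    V = abs(np.linalg.det(J)) / 6.0
    # Gradients of N_1..N_4 w.r.t. the natural, then the spatial coordinates.
    dN_dzeta = np.array([[1., 0, 0], [0, 1., 0], [0, 0, 1.], [-1., -1., -1.]])
    dN = dN_dzeta @ np.linalg.inv(J)
    B = np.zeros((6, 12))
    for i in range(4):
        dx, dy, dz = dN[i]
        B[:, 3*i:3*i+3] = [[dx, 0, 0], [0, dy, 0], [0, 0, dz],
                           [dy, dx, 0], [0, dz, dy], [dz, 0, dx]]  # Eq. (2.56)
    return V * B.T @ elasticity_matrix(E_mod, nu) @ B

X = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]])
K = tet_stiffness(X, E_mod=1e5, nu=0.3)

assert K.shape == (12, 12)
assert np.allclose(K, K.T)                 # stiffness matrices are symmetric
# A rigid translation of all four nodes produces no elastic force: K u = 0.
u = np.tile([1.0, 2.0, -0.5], 4)
assert np.max(np.abs(K @ u)) < 1e-6
```

In a full simulation the element matrices would be summed into a global sparse matrix by the assembly operator A_e of (2.57); a single element suffices to illustrate the computation.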

2.2.4 Linear Corotational FEM

Using linear FEM in applications with large geometric deformations, i.e. large element rotations, inevitably leads to an increase of the volume, see e.g. [Müller et al. 2002, Picinbono et al. 2003, Müller and Gross 2004]. In order to be able to use linear FEM in such applications, we employ a corotational approach [Müller and Gross 2004]. The basic idea behind this method is to handle the rotational part R of the deformation tensor F independently from the stretch tensor U. We start by computing the polar decomposition F = RU. In a second step, we use the rotation tensor to remove the rotational part of the current positions

\[ x_r^e = (R^e)^T x^e, \tag{2.58} \]

where x^e are the positions per element and (R^e)^T is the element's rotation applied to each 3 × 1 block independently. The rotated positions x_r^e are now in the same frame as the initial positions x₀. Due to the linearity of the strain-stress relation, we can now compute the forces in the initial frame

\[ f^e = K^e (x_r^e - x_0). \tag{2.59} \]

Finally, we need to rotate the forces back to the current frame and sum up the results

\[ f^{\mathrm{int}} = A_e R^e K^e \left( (R^e)^T x^e - x_0 \right). \tag{2.60} \]

The resulting forces can be used in the same way as for linear FEM, allowing simulations with large displacements.
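The corotational force computation can be sketched in a few lines. Here we extract R from F via an SVD-based polar decomposition (one common way to compute it; the thesis does not prescribe a specific algorithm) and apply (2.60) for a single element, with the per-node rotation realized as a block-diagonal matrix:

```python
import numpy as np

def polar_rotation(F):
    """Rotation factor R of the polar decomposition F = R U, via SVD."""
    W, _, Vt = np.linalg.svd(F)
    R = W @ Vt
    if np.linalg.det(R) < 0:        # guard against reflections
        W[:, -1] *= -1
        R = W @ Vt
    return R

def corotational_force(K, R, x, x0):
    """f^e = R K (R^T x - x0), Eq. (2.59)/(2.60), for one element.
    R is applied to each 3x1 node block via a block-diagonal expansion."""
    n = len(x) // 3
    R_blk = np.kron(np.eye(n), R)
    return R_blk @ K @ (R_blk.T @ x - x0)

# Check 1: the polar factor recovers a known rotation from F = R U.
theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1.0]])
U = np.diag([1.3, 0.8, 1.0])
assert np.allclose(polar_rotation(R @ U), R)

# Check 2: a rigidly rotated element produces no elastic force,
# independent of the (here random symmetric) element stiffness.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 12))
K = A + A.T
x0 = rng.standard_normal(12)
x = np.kron(np.eye(4), R) @ x0      # rotate every node rigidly
assert np.allclose(corotational_force(K, R, x, x0), 0.0)
```

An equivalent rotation factor could also be obtained with `scipy.linalg.polar`; the SVD route above avoids the extra dependency.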

2.2.5 Tangent Stiffness Matrix

For the full non-linear FEM, where the reaction of the material depends on the strain, the internal forces are a non-linear function of the current positions, f = f(x(t)). However, when solving the resulting equations, often a local linearization is used, e.g. the quasi-static method by Teran et al. [2005]. The gradient of the force function is computed by linearizing f using a first-order Taylor approximation

\[ f(x(t+h)) \approx f(x(t)) + \frac{\partial f(x(t))}{\partial x} (x(t+h) - x(t)). \tag{2.61} \]

In the case of linear FEM, f(x(0)) = 0 and K = ∂f(x(0))/∂x is constant and corresponds to the stiffness matrix. In non-linear FEM, K changes throughout the simulation and needs to be recomputed at each simulation step. In the following, we show how to compute K for linear tetrahedra. The force acting on a single node is the superposition of the forces of all adjacent tetrahedra. Following Section 2.2.2, we can describe the force function for a linear tetrahedron with a one-point quadrature

\[ f(x) = V B^T P(F), \tag{2.62} \]

where f is the function generating the 3n × 1 matrix of the internal forces for an element with n nodes, F is the deformation tensor, P the 1st Piola-Kirchhoff stress tensor, and B contains the spatial derivatives of the shape functions. The first step is to expand the derivative of the force function

\[ \frac{\partial f(x(t))}{\partial x} = V B^T \frac{\partial P}{\partial F} \frac{\partial F}{\partial x}. \tag{2.63} \]

Using Voigt notation, we rewrite the 3 × 3 matrix F as a vector

\[ F_v = [F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33}]. \tag{2.64} \]

The spatial derivative ∂F_v/∂x results in a constant 9 × 3n matrix, where n is the number of nodes of the element. In the case of a tetrahedral element, F depends on the current and initial positions

\[ F = [x_1 - x_4, x_2 - x_4, x_3 - x_4][X_1 - X_4, X_2 - X_4, X_3 - X_4]^{-1}. \tag{2.65} \]

The derivative ∂F/∂x_{1,x}, i.e. the derivative of the deformation tensor with respect to the x-component of the node position x₁, is

\[ \frac{\partial F}{\partial x_{1,x}} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} [X_1 - X_4, X_2 - X_4, X_3 - X_4]^{-1}, \tag{2.66} \]

which can be written as a column vector of the resulting derivative matrix ∂F_v/∂x. The derivative of the 1st Piola-Kirchhoff stress tensor is computed component-wise. For each

component of the 3 × 3 matrix P, the derivative with respect to each element of F is computed, which results in a 9 × 9 matrix. The 1st Piola-Kirchhoff stress tensor is derived from the 2nd Piola-Kirchhoff stress tensor (see (2.26)) by multiplying it with the deformation tensor, P = FS:

\[ P(F) = \lambda\, F \operatorname{tr}\!\left( \tfrac{1}{2}(F^T F - I) \right) I + 2\mu\, F\, \tfrac{1}{2}(F^T F - I). \tag{2.67} \]

The derivative with respect to a generic component of F is

\[
\frac{\partial P(F)}{\partial F_{ij}} =
\frac{\lambda}{2} \frac{\partial F}{\partial F_{ij}} \operatorname{tr}\!\left( F^T F - I \right) I
+ \frac{\lambda}{2} F \underbrace{\frac{\partial}{\partial F_{ij}}\!\left[ \operatorname{tr}\!\left( F^T F - I \right) I \right]}_{(1)}
+ \mu \frac{\partial F}{\partial F_{ij}} (F^T F - I)
+ \mu F \underbrace{\frac{\partial (F^T F)}{\partial F_{ij}}}_{(2)}, \tag{2.68}
\]

where ∂F/∂F_{ij} is a 3 × 3 matrix with ∂F_{kl}/∂F_{ij} = δ_{ik} δ_{jl}. Terms (1) and (2) are

\[
(1) \quad \frac{\partial}{\partial F_{ij}}\!\left[ \operatorname{tr}\!\left( F^T F - I \right) I \right]
= \frac{\partial \operatorname{tr}(F^T F)}{\partial F_{ij}} I - \underbrace{\frac{\partial \operatorname{tr}(I)}{\partial F_{ij}} I}_{= 0}
= \operatorname{tr}\!\left( \underbrace{\frac{\partial (F^T F)}{\partial F_{ij}}}_{(2)} \right) I
\]
\[
(2) \quad \frac{\partial (F^T F)}{\partial F_{ij}}
= \frac{\partial F^T}{\partial F_{ij}} F + F^T \frac{\partial F}{\partial F_{ij}}.
\]
The last remaining term, B, relates the derivatives of P to the nodes of the volume element

\[
B_{9 \times 12} =
\begin{bmatrix}
\frac{\partial N_i}{\partial x} & 0 & 0 & \frac{\partial N_{i+1}}{\partial x} & 0 & 0 & \cdots \\
\frac{\partial N_i}{\partial y} & 0 & 0 & \frac{\partial N_{i+1}}{\partial y} & 0 & 0 & \cdots \\
\frac{\partial N_i}{\partial z} & 0 & 0 & \frac{\partial N_{i+1}}{\partial z} & 0 & 0 & \cdots \\
0 & \frac{\partial N_i}{\partial x} & 0 & 0 & \frac{\partial N_{i+1}}{\partial x} & 0 & \cdots \\
0 & \frac{\partial N_i}{\partial y} & 0 & 0 & \frac{\partial N_{i+1}}{\partial y} & 0 & \cdots \\
0 & \frac{\partial N_i}{\partial z} & 0 & 0 & \frac{\partial N_{i+1}}{\partial z} & 0 & \cdots \\
0 & 0 & \frac{\partial N_i}{\partial x} & 0 & 0 & \frac{\partial N_{i+1}}{\partial x} & \cdots \\
0 & 0 & \frac{\partial N_i}{\partial y} & 0 & 0 & \frac{\partial N_{i+1}}{\partial y} & \cdots \\
0 & 0 & \frac{\partial N_i}{\partial z} & 0 & 0 & \frac{\partial N_{i+1}}{\partial z} & \cdots
\end{bmatrix}. \tag{2.69}
\]

As pointed out by Teran et al., special care has to be taken to make the stiffness matrix more robust in cases of inverted elements. So far the computation of the 1st Piola-Kirchhoff stress tensor relies directly on the deformation tensor, which may be computed from an inverted element. The derivation for other element types (like prisms, wedges, and hexahedra) is similar. Instead of a single-point integration, we replace (2.63) with a Gauss integration (see Section 2.2.2); V is replaced with the corresponding integration weight.

The derivative of F and the definition of B change to take the other nodes into account, while the derivative of P remains the same. Note that the stiffness matrix of the linear FEM formulation and the tangent stiffness matrix using the Saint Venant-Kirchhoff material model at t = 0 are equal. The only difference is that in the derivation here, we do not exploit any symmetry in the tensors.
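Since the deformation gradient of a linear tetrahedron is linear in the node positions, the analytic derivative (2.66) can be validated against a finite difference. A small numpy check (helper names are ours):

```python
import numpy as np

def deformation_gradient(x, X):
    """F for a linear tetrahedron, Eq. (2.65); x, X are 4x3 node arrays."""
    D = lambda p: np.column_stack([p[0] - p[3], p[1] - p[3], p[2] - p[3]])
    return D(x) @ np.linalg.inv(D(X))

X = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0, 0, 0]])
x = X + 0.05 * np.arange(12).reshape(4, 3)       # some deformed state

# Analytic derivative of F w.r.t. the x-component of node 1, Eq. (2.66).
E11 = np.zeros((3, 3))
E11[0, 0] = 1.0
dF_analytic = E11 @ np.linalg.inv(
    np.column_stack([X[0] - X[3], X[1] - X[3], X[2] - X[3]]))

# Central finite difference for comparison.
h = 1e-6
xp, xm = x.copy(), x.copy()
xp[0, 0] += h
xm[0, 0] -= h
dF_numeric = (deformation_gradient(xp, X) - deformation_gradient(xm, X)) / (2 * h)

assert np.allclose(dF_analytic, dF_numeric, atol=1e-6)
```

The same finite-difference strategy is a useful debugging tool for the full tangent stiffness: compare ∂f/∂x column by column against perturbed force evaluations.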

2.3 Time Integration of the Dynamics Equations


Starting from the PDEs, we developed terms for the dynamics equations (2.2). The internal forces were derived using FEM (see Section 2.2). However, the resulting 2nd order ODEs are still continuous functions. By discretizing the continuous time and performing small discrete time steps, we solve the equation of motion. In the following sections, we present explicit and implicit schemes to solve these 2nd order ODEs. While explicit schemes compute the next state based on the current configuration only, implicit schemes try to optimize the next state given continuity assumptions of the functions. In this thesis we are mainly concerned with a linear relation between displacements and forces. Thus, we replace the non-linear force function f^int(x) with the linearized version K(x − x₀). For the rotational correction (see Section 2.2.4) we find a similar formulation; for simplicity, we omit writing the rotation matrices. In general, 2nd order ODEs are solved by transforming them to 1st order ODEs. Thus, we first introduce an intermediate variable in order to split the equation into a system of equations

\[ \dot{x}(t) = v(t) \]
\[ \dot{v}(t) = -M^{-1} K (x(t) - x(0)), \tag{2.70} \]

where t is the time, x are the positions and v are the velocities; M is the mass matrix and K is the linear stiffness matrix. In order to be able to easily invert M, we use a common mass lumping approach to compute the matrix. Each element e with density ρ and volume V_e contributes a scalar factor to the mass matrix: each node of e receives ρV_e/n, where n is the number of nodes of e. In the following sections we present the discretized integration schemes, where h denotes the integration time step.
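The mass lumping described above can be sketched directly: each element distributes ρV_e equally among its n nodes, and the diagonal entries of the lumped matrix are the per-node sums. A minimal sketch (the data layout is a hypothetical example):

```python
import numpy as np

def lumped_masses(elements, volumes, rho, n_nodes):
    """Diagonal of the lumped mass matrix: each element distributes
    rho * V_e equally among its nodes."""
    m = np.zeros(n_nodes)
    for nodes, V in zip(elements, volumes):
        m[list(nodes)] += rho * V / len(nodes)
    return m

# Two tetrahedra sharing the face (1, 2, 3), five nodes in total.
elements = [(0, 1, 2, 3), (1, 2, 3, 4)]
volumes = [1.0 / 6.0, 1.0 / 6.0]
m = lumped_masses(elements, volumes, rho=1000.0, n_nodes=5)

# Lumping preserves the total mass of the body.
assert abs(m.sum() - 1000.0 * sum(volumes)) < 1e-9
```

Because the lumped matrix is diagonal, M⁻¹ in (2.70) is just an elementwise division, which is what makes explicit steps so cheap.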

2.3.1 Explicit Integration

In the following, we summarize commonly used single-step explicit integration schemes. In the next chapter, we will discuss them in more detail in order to determine their stability properties. The archetype of the explicit integration schemes is the Explicit Euler. It is known to be unstable for undamped 2nd order ODEs and is included for illustrative purposes only. The Symplectic Euler is used as the representative of 1st order accurate integrators, while the Störmer-Verlet scheme is a 2nd order accurate scheme. Finally, the 4th order Runge-Kutta scheme represents the special class of Runge-Kutta schemes.

Explicit Euler

The simplest method to solve differential equations is Euler's explicit integration scheme [Hairer et al. 2006]. It is based on a first-order Taylor approximation of the function we want to integrate. The update step for a system of first order ODEs is

\[ v(t+h) = v(t) - h M^{-1} K (x(t) - x(0)) \]
\[ x(t+h) = x(t) + h v(t). \tag{2.71} \]

In the described case of solving 2nd order ODEs, the Explicit Euler produces an unstable, diverging solution.

Symplectic Euler

The Symplectic Euler, also known as semi-implicit Euler, belongs to the class of first order explicit integration schemes [Hairer et al. 2006]. It computes the new velocity with an Euler step and uses this result to update the position:

\[ v(t+h) = v(t) - h M^{-1} K (x(t) - x(0)) \]
\[ x(t+h) = x(t) + h v(t+h). \tag{2.72} \]

The Symplectic Euler belongs to the class of geometric integrators, which ensure conservation of phase-space symplecticity. This essentially means that it preserves the area of the space of all possible configurations of the system, i.e. all possible configurations of positions and velocities. In contrast to the Explicit Euler, this method is conditionally stable and is the explicit integration scheme of choice throughout the rest of the thesis. This is mainly due to two reasons: it is simple in its formulation and provides the best tradeoff between stability and computation costs.
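The qualitative difference between the two Euler variants is easy to demonstrate on the scalar oscillator m ẍ = −k x: the explicit scheme gains energy every step and diverges, while the symplectic scheme keeps the energy bounded for step sizes below the stability limit. A small sketch:

```python
import numpy as np

def step_explicit(x, v, h, k, m):
    """Explicit Euler: both updates use the old state."""
    return x + h * v, v - h * (k / m) * x

def step_symplectic(x, v, h, k, m):
    """Symplectic Euler: the position update uses the new velocity."""
    v_new = v - h * (k / m) * x
    return x + h * v_new, v_new

k, m, h = 1.0, 1.0, 0.1          # step size well below the limit h < 2/omega = 2

def energy(x, v):
    return 0.5 * m * v**2 + 0.5 * k * x**2

xe, ve = 1.0, 0.0
xs, vs = 1.0, 0.0
for _ in range(10000):
    xe, ve = step_explicit(xe, ve, h, k, m)
    xs, vs = step_symplectic(xs, vs, h, k, m)

# Explicit Euler gains energy every step and diverges ...
assert energy(xe, ve) > 1e3
# ... while Symplectic Euler keeps the energy bounded near its initial value.
assert abs(energy(xs, vs) - energy(1.0, 0.0)) < 0.1
```

Chapter 3 makes this observation precise by analyzing the amplification factors of the discrete update maps.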

Störmer-Verlet

The Störmer-Verlet integration scheme [Hairer et al. 2006] is a 2nd order accurate explicit solver. Basically, it is a central difference scheme. It was introduced by Loup Verlet in 1967 in the field of computational molecular dynamics:

\[ x(t+h) - 2x(t) + x(t-h) = -h^2 M^{-1} K (x(t) - x(0)). \tag{2.73} \]

The Verlet method shares stability and symplecticity with the Symplectic Euler. Its reliance on the previous time step makes it less favorable for the implementation of interactive systems, where we sometimes need to modify velocities and positions directly.

4th Order Runge-Kutta-Nyström

Runge-Kutta methods represent a whole family of explicit and implicit integration methods for solving ODEs. As an example, we focus on the widely used 4th order Runge-Kutta method. It uses function evaluations at four different positions to calculate the final result.

\[
\begin{aligned}
k_1 &= v(t) \\
l_1 &= -M^{-1} K (x(t) - x(0)) \\
k_2 &= v(t) + \tfrac{h}{2} l_1 \\
l_2 &= -M^{-1} K \left( x(t) + \tfrac{h}{2} k_1 - x(0) \right) \\
k_3 &= v(t) + \tfrac{h}{2} l_2 \\
l_3 &= -M^{-1} K \left( x(t) + \tfrac{h}{2} k_2 - x(0) \right) \\
k_4 &= v(t) + h\, l_3 \\
l_4 &= -M^{-1} K \left( x(t) + h\, k_3 - x(0) \right) \\
x(t+h) &= x(t) + \tfrac{h}{6} (k_1 + 2k_2 + 2k_3 + k_4) \\
v(t+h) &= v(t) + \tfrac{h}{6} (l_1 + 2l_2 + 2l_3 + l_4)
\end{aligned} \tag{2.74}
\]

The maximum stable integration time step is larger than for the Symplectic Euler, and it offers higher accuracy than the Störmer-Verlet method. However, in terms of computation costs, this scheme is four times as expensive as the ones presented above, because the forces are evaluated four times.
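A single Runge-Kutta step for the scalar oscillator ẍ = −ω²x can be written compactly; integrating to t = 10 with h = 0.1 reproduces the analytic solution cos(t) to high accuracy, illustrating the 4th order convergence. A minimal sketch (the full-step k₄/l₄ stage follows the standard RK4 tableau):

```python
import numpy as np

def rk4_step(x, v, h, omega2):
    """One 4th-order Runge-Kutta step for x'' = -omega2 * x."""
    acc = lambda x: -omega2 * x
    k1, l1 = v, acc(x)
    k2, l2 = v + 0.5 * h * l1, acc(x + 0.5 * h * k1)
    k3, l3 = v + 0.5 * h * l2, acc(x + 0.5 * h * k2)
    k4, l4 = v + h * l3,       acc(x + h * k3)
    return (x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4),
            v + h / 6 * (l1 + 2 * l2 + 2 * l3 + l4))

h, omega2 = 0.1, 1.0
x, v = 1.0, 0.0
for _ in range(100):                  # integrate to t = 10
    x, v = rk4_step(x, v, h, omega2)

# The exact solution is x(t) = cos(t); the RK4 error is tiny.
assert abs(x - np.cos(10.0)) < 1e-4
```

Each step costs four force evaluations, matching the cost remark above.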

2.3.2 Implicit Integration

In contrast to explicit integration schemes, implicit methods compute the solution of the next state assuming it is already known. In this thesis we discuss implicit integration only for linear FEM, where it can be computed by solving a system of linear equations. In contrast, in order to solve the equations from non-linear FEM, iterative processes such as Newton-Raphson methods are used. These are based on linearizing the force computation for each iteration step. For a more thorough discussion of this topic and its application to simulation in computer animation, we refer to [Hauth et al. 2003]. Note that this thesis focuses on explicit integration, thus we only cover the Implicit Euler for completeness.

Implicit Euler

In contrast to the Explicit Euler, it computes both position and velocity based on the information of the next time step

\[ v(t+h) = v(t) + h M^{-1} \left( f_{\mathrm{ext}} - K (x(t+h) - x(0)) \right) \]
\[ x(t+h) = x(t) + h v(t+h). \tag{2.75} \]

In order to solve these equations, we assume a constant velocity within one time step. Thus, we can replace x(t+h) in the first equation with x(t) + h v(t+h). This leads to an equation for v(t+h), which depends only on the current configuration

\[ (M + h^2 K)\, v(t+h) = M v(t) + h f_{\mathrm{ext}} - h K (x(t) - x(0)) \]
\[ x(t+h) = x(t) + h v(t+h). \tag{2.76} \]

The new positions x(t+h) are computed using the velocity of the next time step. Although the integration scheme is not symplectic and energy is constantly lost, it is popular due to its simple formulation and its unconditional stability.

2.4 Summary

In this chapter, we derived the necessary theory of the deformation model we use throughout the thesis. We provided the basics behind continuum mechanics applied to soft body deformations. We showed how to compute the internal body forces for various discretization elements using the finite element method. This provides a general way to characterize non-linear and linear FEM, along with the corotational correction. The resulting ordinary differential equations need to be solved in order to simulate the dynamics of deformable bodies. In that light, we presented a few commonly used integration schemes, which we will discuss in more depth in Chapter 3.
3 Stability of Explicit Integration

Chapter 2 presented the differential equations through which we describe deformable


body dynamics. Depending on the choice of the numerical integration scheme, different
difficulties arise regarding stability and accuracy. In this chapter, we take a closer look at
the stability issues when using explicit integration schemes and derive a stability criterion.
Next, we introduce ways to approximate the evaluation of this criterion. These take the
geometry, the deformation model, the material parameters, and the integration method
into account. This allows us to develop a quality metric, which evaluates the effect of a
mesh element on the simulation stability and determines critical elements.

In Section 3.1 we review the stability of explicit integration in general and discuss each
of the integration schemes presented in the previous chapter based on a simple scalar
function. Following, Section 3.2 extends this discussion to the multi-dimensional case.
Next, Section 3.3 describes several element-based approximations to efficiently evaluate
integration stability. Based on these, we introduce our quality metric in Section 3.4.
Section 3.5 then examines the influence of the geometry and the material parameters
on stability more closely. Section 3.6 completes the discussion by looking at numerical
experiments to verify our claims. Additionally, Section 3.7 presents preliminary results
applying the developed methods to deformation models derived from non-linear FEM.

3.1 Stability of Integration Methods

In order to analyze the stability of explicit integration, we look at a single degree of
freedom, 2nd-order ODE example. It is the prototype function for Newton's equation of
motion (see (2.2)) describing the motion of an oscillating particle

m ẍ + d ẋ + k x = 0,    (3.1)

where m is the mass, d and k are the damping and the stiffness coefficients, respectively.
The analytic solution to this equation describes a harmonic oscillation

x(t) = A e^{−(d/2m) t} cos(ω_d t + φ),    (3.2)

where A is the initial amplitude and φ the phase. The undamped and damped harmonic
frequencies of the system are ω = √(k/m) and ω_d = √(k/m − (d/2m)^2), respectively. Note that
for the methods in this chapter we consider only the undamped harmonic frequencies. In
order to solve (3.1) numerically it is discretized in time and integrated using a numerical
integration scheme. Depending on the size of the discretization the number of steps to
recover the solution and the errors vary. Many integration schemes are only conditionally
stable, meaning that the applicable step size is limited. If it is chosen too
large, the computed solution diverges from the analytic result (3.2): the error results in
a growing amplitude. Any explicit integration scheme is only conditionally stable. In
the following we discuss how to determine the stability of numerical explicit integration
without knowing the analytical solution.

In a first attempt to derive an appropriate sampling rate for the numerical integration of
(3.2) we look at Nyquist's sampling theorem [Shannon 1984]. This allows us to connect
the sampling frequency, and hence the integration step size, to the function we want to
solve. In order to reproduce (3.2) accurately we need a sampling frequency of at least
twice ω. This provides us with an upper limit for the step size for explicit integration.
The necessary condition, which relates the maximum time step to the frequency of the
function, is

h ≤ π/ω,    (3.3)

where ω is the frequency at which we want to sample. When we look at discretized
configurations such as the equations arising from FEM, geometric properties influence ω.
The simplest example would be the oscillation frequency obtained from a single vibrating
spring. As the length of the undeformed spring decreases, the vibration frequency increases.
The relation between material parameters, discretization and the step size can be characterized
by the Courant-Friedrichs-Lewy (CFL) condition. The CFL condition describes the
observation that if a wave (e. g., a pressure wave through the material) crosses a discrete
grid, the time step size must be less than the time necessary for the wave to propagate
through an element. More precisely, it relates the time step of the numerical integration
to the highest vibration mode of the underlying discretized mesh. This is inversely proportional
to the propagation time of a pressure wave as seen in (3.3). For a more thorough
discussion on this topic we refer to [Courant et al. 1967]. Note that (3.3) describes the
optimal case. Depending on the properties of a numerical integration scheme, we have to
use a smaller time step. In the following, we discuss how to derive a stable integration
time step for explicit integration schemes.

3.1.1 Explicit Integration Schemes

We sketch the stability conditions for the explicit integration methods we have presented
in Section 2.3.1. Without loss of generality, we show the derivations based on scalar
equations. We describe the integration schemes using the notation of the state transition
matrix [Schwarz and Köckler 2006]

(ẋ(t), v̇(t))^T = A (x(t), v(t))^T.    (3.4)

In order to solve (3.4), we need to discretize the equation with respect to time. The values
of an integration step always depend on the values of the previous step plus a derivative.
This leads to the formulation
(x(t + h), v(t + h))^T = (I + hC) (x(t), v(t))^T,    (3.5)

where C is a 2 × 2 matrix specific to the integration scheme. The eigenvalues of (I + hC)
allow us to compute the stability criterion of an integration method. Using (3.5), we define
a function F(hC) to get from x(t) to x(t + h). In order to reason about stability, we
are interested in the long-term behavior of the system lim_{n→∞} F(hC)^n. The integration
is stable if the spectral radius fulfills ρ(F(hC)) ≤ 1. For each of the four following
schemes we derive the CFL condition by computing the spectral radius of the transition
matrix.

Explicit Euler

The most basic explicit integration scheme makes a single linear step to integrate the
quantities. Position and velocity are both integrated using the values from the previous
time step

(x(t + h), v(t + h))^T = (I + h [[0, 1], [−m^{-1}k, 0]]) (x(t), v(t))^T.    (3.6)
Computing the eigenvalues of the state transition matrix yields no real solution which
fulfills ρ(F(hC)) ≤ 1. Therefore, the undamped form cannot be used to solve (3.4)
stably for any given integration step size. Gurfil and Klein [2007] compute the CFL
condition for the damped explicit Euler scheme. They derive it from a state transition
matrix which includes a damping term, which leads to the CFL condition

h ≤ 2δ / (k/m),    (3.7)

where δ is the factor with which k/m is damped.
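The failure of the undamped explicit Euler can also be checked numerically: the eigenvalues of the transition matrix in (3.6) are 1 ± ihω, so the spectral radius √(1 + h^2 ω^2) exceeds 1 for every h > 0. A small sketch with illustrative m and k:

```python
import numpy as np

# Spectral radius of the undamped explicit Euler transition matrix
# I + h*C from (3.6); m and k are illustrative values.
m, k = 1.0, 100.0
for h in (1e-4, 1e-3, 1e-2, 1e-1):
    T = np.eye(2) + h * np.array([[0.0, 1.0],
                                  [-k / m, 0.0]])
    rho = max(abs(np.linalg.eigvals(T)))
    assert rho > 1.0   # unstable for every positive step size
```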

Symplectic Euler

The Symplectic Euler proves to be better than the original explicit Euler. Due to the leap-
frog-like update rule the method is conditionally stable without damping. Similar to the
simple Euler, the state transition can be written as
(x(t + h), v(t + h))^T = (I + h [[−h m^{-1}k, 1], [−m^{-1}k, 0]]) (x(t), v(t))^T.    (3.8)

Following Müller et al. [2005], we compute the eigenvalues λ_0, λ_1 of the transition matrix
in (3.8) analytically

λ_0 = 1 − (1/2m) (h^2 k − √(h^4 k^2 − 4m h^2 k))

λ_1 = 1 − (1/2m) (h^2 k + √(h^4 k^2 − 4m h^2 k)).    (3.9)

Solving these equations for h such that |λ_i| ≤ 1 results in the CFL criterion necessary to
ensure ρ(F(hC)) ≤ 1:

h ≤ 2 √(m/k) = 2/ω.    (3.10)
Thus the Symplectic Euler is stable if the time step is set below the computed boundary.
We find the same formulation for the oscillation frequency as in (3.2), where stiffness and
mass directly influence the stability. In order to generalize the concept we directly use the
oscillation frequency from here on.
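The bound (3.10) can be observed numerically. The sketch below (illustrative m and k, not values from the thesis) runs the Symplectic Euler update just below and just above the limit h = 2/ω:

```python
import numpy as np

# Symplectic Euler for m*x'' + k*x = 0, checked against the CFL
# bound (3.10); m and k are illustrative values.
m, k = 1.0, 100.0
omega = np.sqrt(k / m)
h_limit = 2.0 / omega

def max_amplitude(h, steps=2000):
    x, v = 1.0, 0.0
    peak = abs(x)
    for _ in range(steps):
        v -= h * (k / m) * x   # velocity update first (leap-frog-like)
        x += h * v             # position uses the new velocity
        peak = max(peak, abs(x))
    return peak

assert max_amplitude(0.90 * h_limit) < 10.0   # bounded oscillation
assert max_amplitude(1.01 * h_limit) > 1e3    # diverging amplitude
```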

Störmer-Verlet

In order to characterize the Verlet integration scheme we need to change the notation of
the state transition matrix slightly. It integrates the positions directly without including
the velocity explicitly by looking at x(t) and x(t − h)

(x(t + h), x(t))^T = [[2 − h^2 m^{-1}k, −1], [1, 0]] (x(t), x(t − h))^T.    (3.11)

Similar to the Symplectic Euler, the eigenvalues of the update matrix can be computed in
order to derive the CFL condition (see [Klein 2007]). This leads to the same equations as
for the Symplectic Euler. Hence, in order to fulfill the condition |λ| ≤ 1, we obtain the
same CFL condition

h ≤ 2/ω.    (3.12)

The stability criteria of the Symplectic Euler and the Störmer-Verlet are the same. With
respect to the maximum integration time step both methods are equally stable. However,
because the Verlet method is of a higher order, its accuracy is higher.

4th Order Runge-Kutta-Nyström

In order to enhance both stability and accuracy the symplectic variant of the Runge-Kutta
4 (RKN4) integration scheme is often used. Writing the formulation in (2.74) in terms of
the state transition matrix (3.5), we define the scheme as
F = [[0, 1], [−m^{-1}k, 0]]

k_1 = F

k_2 = F (I + (h/2) F) = F + (h/2) F^2

k_3 = F (I + (h/2) (F + (h/2) F^2)) = F + (h/2) F^2 + (h^2/4) F^3

k_4 = F (I + h (F + (h/2) F^2 + (h^2/4) F^3)) = F + h F^2 + (h^2/2) F^3 + (h^3/4) F^4

(x(t + h), v(t + h))^T = (I + (h/6) (k_1 + 2k_2 + 2k_3 + k_4)) (x(t), v(t))^T

                       = (I + h F + (h^2/2) F^2 + (h^3/6) F^3 + (h^4/24) F^4) (x(t), v(t))^T.    (3.13)

The stability radius is computed by extracting the eigenvalues of the transition matrix
[Schwarz and Köckler 2006]:

h ≤ 2.78/ω.    (3.14)

The explicit RKN4 integration scheme is more stable and accurate. However, the additional
costs outweigh the larger stability range, yielding a lower efficiency. We can achieve
approximately a √2 times larger time step, but pay four times the computational costs.
Hence, it is computationally cheaper to perform two steps using the Symplectic Euler
than a single RKN4 step.
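This can be checked numerically on the transition matrix of (3.13): its spectral radius stays at or below one up to the quoted limit 2.78/ω and exceeds one somewhat beyond it (m and k below are illustrative values):

```python
import numpy as np

# Spectral radius of the RK4-style transition matrix in (3.13);
# m and k are illustrative values.
m, k = 1.0, 100.0
omega = np.sqrt(k / m)
F = np.array([[0.0, 1.0], [-k / m, 0.0]])

def spectral_radius(h):
    T = (np.eye(2) + h * F + h**2 / 2 * F @ F
         + h**3 / 6 * F @ F @ F + h**4 / 24 * F @ F @ F @ F)
    return max(abs(np.linalg.eigvals(T)))

assert spectral_radius(2.78 / omega) <= 1.0   # stable at the quoted limit
assert spectral_radius(2.90 / omega) > 1.0    # unstable beyond it
```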

3.2 Stable Integration Time Step for Deformable Body Dynamics

After looking at the CFL condition for an oscillating single degree-of-freedom function,
we extend the discussion to the integration of the deformable body dynamics described
in Chapter 2. For this, we look at the general multi-dimensional degree-of-freedom case.

Finding an appropriate integration time step in such a configuration depends on the high-
est oscillation frequency of the whole system. This is observed best by diagonalizing the
system matrix M^{-1}K from the coupled differential equations (2.2). We obtain the
oscillation frequencies for the decoupled equations by solving the generalized eigenvalue
problem (see [Pentland and Williams 1989])

Φ Λ = (M^{-1}K) Φ,    (3.15)

where the diagonal matrix Λ contains the eigenvalues of M^{-1}K, and the columns of matrix
Φ correspond to the eigenvectors of M^{-1}K. The eigenvalues λ_i = [Λ]_ii correspond
to the squared modal frequencies ω_i^2, and the eigenvectors [Φ]_i correspond to the directions
along which the vibrations work. By plugging ω_max^2 = λ_max := max_i λ_i into the CFL
conditions of the discussed explicit integration schemes, we can determine the maximum
time step allowed for the stable simulation of the deformable object. This ensures that
the highest oscillation frequency is integrated stably. Hence, all lower frequencies are also
integrated stably.
Comparing the time it takes to compute the largest eigenvalue λ_max of the system matrix
M^{-1}K to the size of the update time step shows that it takes many times longer to
compute λ_max than the available computation time for the step. Computation times for
λ_max depend on the number of nodes and the shape of the elements. By using power iterations
(see Section 5.5) on an Intel Core 2 Quad Q9400 running at 2.66 GHz, identifying
the largest eigenvalue for the Liver (C) model with 766 tetrahedra (see Figure 3.5) takes
about 1.5 ms, which is 7.5 times more than the identified limit time step of 0.2 ms. These
costs are crucial in real-time simulation scenarios, where the stability condition needs to
be evaluated often, e. g. after topological modifications like cutting. Moreover, although
this allows determining a stable time step, it is not possible to determine which elements
in the mesh cause the maximum oscillation frequency. The goal of the following section is
to develop methods that allow approximating the largest eigenvalue of the system matrix
locally.
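A minimal sketch of such a power iteration, applied to M^{-1}K with a lumped (diagonal) mass matrix; the matrices below are random stand-ins rather than an assembled FEM system:

```python
import numpy as np

# Sketch of power iteration for the largest eigenvalue of M^{-1}K with
# a lumped (diagonal) mass matrix. K is a random symmetric stand-in
# here, not an assembled FEM stiffness matrix.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
K = A @ A.T                              # symmetric positive semi-definite
M_inv = 1.0 / rng.uniform(0.5, 2.0, n)   # inverse lumped nodal masses

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(5000):
    y = M_inv * (K @ x)                  # apply M^{-1} K
    x = y / np.linalg.norm(y)
lam_max = x @ (M_inv * (K @ x))          # converged Rayleigh estimate

reference = max(np.linalg.eigvals(np.diag(M_inv) @ K).real)
assert abs(lam_max - reference) / reference < 1e-3
```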

3.3 Element Based Approximation

Identifying bounds of the eigenvalues of the system matrix has been an important topic
in structural engineering for a long time. Fried [1972b] showed that it is possible to
determine a reasonable upper bound for the eigenvalues of M and K by considering
each element e in isolation. These upper bounds depend on the maximal eigenvalues
max of the element matrices M e , K e and the maximum vertex valence n of the mesh,

max max (K e ) max (K) n max max (K e )


e e
max max (M e ) max (M ) n max max (M e ), (3.16)
e e

where λ_max(·) corresponds to the maximum eigenvalue of the matrix passed as argument.
Based on this, Fried [1972a] devised a metric to compute bounds on the condition number
of the system matrix. Similarly, Shewchuk [2002] derived bounds on the maximum
eigenvalue for tetrahedral elements,

max_e λ_max(J_e) ≤ λ_max(M^{-1}K) ≤ n max_e λ_max(J_e),    (3.17)

where J_e = M_e^{-1/2} K_e M_e^{-1/2}, M_e = diag(m_i, m_j, m_k, m_l) are the nodal masses, n is
the maximum node valence in the mesh, and i, j, k, l are the nodes of e. As a consequence,
the maximum eigenvalue of M^{-1}K is proportional to the maximum eigenvalue of J_e
of the worst element e. In Section 3.6.2, we compare the identification performance of
Shewchuk's metric to our methods, and show that it is reliable but too conservative for
our needs.
Although not applicable to FEM, Kačić-Alesić et al. [2003] derived a metric to estimate
the integration time step for explicitly integrated mass-spring models. In spirit it is similar
to our approach by making use of the CFL condition to determine a stable time step. Their
formulation is comparable to Shewchuk's method in that it scales the maximum oscillation
frequency with the connectivity.
A different approach is taken by Debunne et al. [2001]. Instead of using stiffness and
mass matrices they directly use the material parameters and the integration scheme. They
do not make an attempt to develop a metric for single nodes or elements; rather they try
to determine a stable time step for the whole mesh
h < l √(ρ / (λ + 2μ)),    (3.18)

where h is the time step, λ and μ are the Lamé parameters, ρ is the material's rest density,
and l is the shortest distance in the mesh. This formulation is similar to our method
in Section 3.3.1. In contrast to us, they state that their usage of the semi-implicit Euler
integration scheme allows them to double the time step. Although the method in general
is conservative, doubling the time step can lead to an unstable simulation due to a too
optimistic estimation. Since no verification is presented, we assume that the results are
similar to the ones of our method in Section 3.3.1.
In the following, we cover the approximation methods we have developed based on the
ideas and observations presented above. We first discuss the approaches
based on single elements; extensions to larger regions are presented later on. A detailed
performance evaluation is deferred to a later point in this chapter.

3.3.1 Single Element Approximation

Instead of computing max for the whole system matrix M 1 K, we focus on two meth-
ods which work on single, independent elements. A method with a constant runtime per

element is preferable because it scales linearly with the number of elements in the mesh,
while computations on the full matrix scale worse. The two methods are inspired by the
works discussed above. The first method is based on Rayleigh's principle as in [Fried
1972b]. The second method is based on pressure wave propagation in materials similar to
[Debunne et al. 2001].

Estimation Based on Single Element Oscillation Frequencies

Inspired by [Shewchuk 2002] we propose a slightly different formulation based on the
behavior of the worst element, i. e. the element with the maximal oscillation frequency

λ_max(M^{-1}K) ≤ max_e (1/m_e) λ_max(K_e),    (3.19)

where m_e = (1/4) ρ_0 V_e is the average mass of a corner of the element e with volume V_e and
density ρ_0. The highest eigenvalue of the assembled system matrix is always smaller than
the largest eigenvalue of the element matrices scaled with the element mass per node m_e.
In contrast to (3.17), we consider only the mass of the element itself. This is based on the
assumption that each element could be a mesh of its own and could also be simulated on
its own.
Based on Rayleigh's principle we provide a proof that the approximation is valid. Our
derivation follows the procedure described by Wathen [1987]. The system stiffness K and
lumped mass M matrices are the result of an assembly procedure where we accumulate
the element matrices K_e, M_e of all elements using the Boolean assembly operator L:

K = L^T D(K_e) L

M = L^T D(M_e) L,    (3.20)

where D is an operator to produce a diagonal block matrix from the element matrices; i. e.,
D(K_e) and D(M_e) are the diagonal block matrices containing the element stiffness and
mass matrices. If λ is an eigenvalue of M^{-1}K then we know, by using the Rayleigh
quotient, that the following relation holds for any non-zero vector x:

min_{x≠0} (x^T M^{-1}K x)/(x^T x) ≤ λ ≤ max_{x≠0} (x^T M^{-1}K x)/(x^T x).    (3.21)

In the case of the lumped diagonal matrix M we can rewrite (3.21) as

min_{x≠0} (x^T K x)/(x^T M x) ≤ λ ≤ max_{x≠0} (x^T K x)/(x^T M x).    (3.22)

In our case the stability is related to the highest eigenvalue of the matrix M^{-1}K. Thus,
in the following we will examine only the upper bound. We substitute the matrices K and
M with (3.20) to get an expression of the Rayleigh quotient that depends on the element
mass and stiffness matrices

λ ≤ max_{x≠0} (x^T L^T D(K_e) L x)/(x^T L^T D(M_e) L x).    (3.23)

We simplify the equations by substituting y for Lx

λ ≤ max_{y≠0} (y^T D(K_e) y)/(y^T D(M_e) y).    (3.24)

Finally, we can rewrite the Rayleigh quotient as

λ ≤ max_{y≠0} (y^T D(M_e^{-1} K_e) y)/(y^T y).    (3.25)

Thus, any eigenvalue of the global system is bounded by the highest eigenvalues of the
block diagonal matrix D(M_e^{-1} K_e). Each block contains the product of the inverse mass
matrix and the stiffness matrix for an element. The highest eigenvalue of a matrix with
this structure is defined by the eigenvalues λ_e of each block. Therefore, the eigenvalues
of the original system are bounded by

λ_max ≤ max_e λ_e^max.    (3.26)

Thus, the highest eigenvalue is bounded by the maximal eigenvalues of the elements.
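The bound can be checked on a small toy system. The sketch below uses a 1D chain of linear spring elements with lumped nodal masses as a stand-in for the tetrahedral case; each element contributes K_e = k_e [[1, −1], [−1, 1]] and an average nodal mass m_e, so λ_max(M_e^{-1}K_e) = 2k_e/m_e:

```python
import numpy as np

# Toy check of the element-wise eigenvalue bound on a 1D chain of
# linear spring elements; stiffness and mass values are illustrative.
n_el = 20
rng = np.random.default_rng(1)
k_e = rng.uniform(1.0, 10.0, n_el)        # per-element stiffnesses
m_e = rng.uniform(0.1, 1.0, n_el) / 2.0   # average nodal mass per element

n = n_el + 1
K = np.zeros((n, n))
M = np.zeros(n)
for e in range(n_el):
    Ke = k_e[e] * np.array([[1.0, -1.0], [-1.0, 1.0]])
    K[e:e + 2, e:e + 2] += Ke             # assemble stiffness
    M[e:e + 2] += m_e[e]                  # assemble lumped masses

lam_global = max(np.linalg.eigvals(np.diag(1.0 / M) @ K).real)
bound = max(2.0 * k_e[e] / m_e[e] for e in range(n_el))
assert lam_global <= bound + 1e-9         # global eigenvalue obeys the bound
```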

Estimation Based on Sound Wave Propagation

The second method is a heuristic approach also based on single elements. It is similar
to [Debunne et al. 2001], but defined per element instead of per node. It is built on the
observation that the forces within the body only affect a certain region within one time
step. The momentum transmission is caused by pressure waves crossing an object. In our
case, the models have been discretized into simple tetrahedral elements of various sizes
and shapes. For a stable simulation, the time step should be small enough such that it can
reliably represent the transmission of a pressure wave across a characteristic distance of
the elements. The critical time step is, therefore, equal to this characteristic length L_e of
an element divided by the dilatational wave speed s: h ≤ L_e/s. The speed can be easily
computed in the case of linear elasticity from the mechanical properties of the material as
done in [Kinsler et al. 1999]

s = √( E(1 − ν) / (ρ (1 + ν)(1 − 2ν)) ),    (3.27)

where E is Young's modulus, ν is the Poisson ratio and ρ is the rest density of the
material. In case of hexahedral elements, [Miller et al. 2006] propose a value for the

specific distance of an element of L_e = V_e / max A_e, where V_e is the volume of an element
and max A_e is the maximal area of the faces of the hexahedron. Because the volumetric
meshes we use are based on tetrahedral elements, we obtain the characteristic distance in
a similar way by dividing the volume of a tetrahedron by the maximal area of its faces

L_e = 3V_e / max A_e,    (3.28)
where Ve is the volume of element e and max Ae is the maximal area of the faces of the
tetrahedron. The volume of the tetrahedron is defined as Ve = Ai hi /3 for any face Ai
whose corresponding height is hi . Therefore, the specific distance is considered to be the
minimal height of the tetrahedron above its faces. For these elements at least one of the
heights will be smaller than the others. The specific distance of the whole mesh is the
smallest specific distance of all tetrahedra L = min Le .
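A short sketch evaluating (3.27) and (3.28); the material and element values are illustrative placeholders, not values from the thesis:

```python
import math

# Dilatational wave speed (3.27) and characteristic length (3.28);
# material and element values below are illustrative, not from the thesis.
E   = 10e3    # Young's modulus [Pa]
nu  = 0.45    # Poisson ratio
rho = 1000.0  # rest density [kg/m^3]
s = math.sqrt(E * (1.0 - nu) / (rho * (1.0 + nu) * (1.0 - 2.0 * nu)))

V_e   = 1e-7  # tetrahedron volume [m^3]
A_max = 5e-5  # largest face area [m^2]
L_e = 3.0 * V_e / A_max   # characteristic length = minimal height
h_limit = L_e / s         # time step bound h <= L_e / s
```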

3.3.2 Extension to Local Neighborhood to Improve the Accuracy

Instead of looking at a single element at a time, we additionally consider how it is embed-
ded in the mesh. The dynamics of a mesh node is influenced by all the elements connected
to this node. Therefore, a more precise way to obtain the critical time step is to consider
the neighboring nodes and elements in the local estimation. Obviously, we can define
several different sets of connected elements. Smaller sets mimic a behavior similar to the
single node (element); larger sets produce a behavior closer to the one of the complete
mesh. In fact, we could consider the whole mesh as one of these sets. Similar to the case
above, where we computed the highest oscillation frequency for a single element, we de-
termine the largest eigenvalue of the system matrix of the region. Instead of using the
matrices from a single element, the system matrix is built from the matrices of all the
elements of the set. The computational costs of solving the extended eigenvalue
problem increase with the number of elements in a set. Without loss of generality, we
restrict the discussion to the four smallest sets. The CFL condition describes how far a
pressure wave travels within a single time step. Thus, when only checking whether the
condition is fulfilled, looking at small sets is sufficient:

1. Vertex 1-ring: In this case the highest oscillation frequency is computed for each
vertex of the mesh (see Figure 3.1(a)). For a given vertex, the stiffness and mass
matrices are built by assembling the contributions of all the tetrahedra sharing that
vertex.
2. Vertex 2-ring: In addition to the vertex 1-ring we add all elements sharing a ver-
tex with the boundary of the 1-ring (see Figure 3.1(b)). Adding this second ring
includes the mutual interactions of the 1-ring nodes with their adjacent elements.
3. Element 1-ring: Instead of assembling the neighborhood of a single vertex, we
combine the vertex 1-ring sets of the four nodes of the central tetrahedron (see
Figure 3.1(c)). Effectively, we build the 1-ring neighborhood of the central element.

(a) Vertex 1-ring (b) Vertex 2-ring (c) Element 1-ring (d) Element 2-ring

Figure 3.1: Extension of the single element approximation to rings illustrated on a 2D


example. The central node or triangle, for which the oscillation frequency is computed, is
highlighted.

4. Element 2-ring: Similar to above, we add all elements sharing a vertex with the
element 1-ring. Hence, similar to the vertex 2-ring case, the mutual interactions of
the 1-ring with its neighboring elements are modeled.

The regions can be extended further to improve the approximation. However, the
larger the regions are, the less valid the results are for describing the CFL condition for a
single element. In the following, we extend this approach such that it keeps the validity
for individual elements.

3.3.3 Reduced 1-ring

Although extending the neighborhood locally increases the accuracy of the approximated
highest eigenfrequency, it is less clear how much the central element influences the result.
Consequently, it is not possible to assign a maximum eigenfrequency to a single element.
Regardless, it is preferable to include the mutual interactions of the central element with
its neighborhood. In order to improve the approximation of the highest oscillation frequency
compared to the single element methods in Section 3.3.1, but still have the possibility
to do the approximation precisely for a single element, we introduce the concept of
the reduced 1-ring neighborhood. Instead of computing the highest eigenfrequency for
the central element alone, this method incorporates the influence of the boundary nodes
of the 1-ring on the nodes of the central element, but not vice-versa. This is equivalent
to using the element 1-ring as before but assuming that the boundary nodes are fixed. As
before, we assemble the 1-ring system matrix for each element from the element
stiffness and mass matrices. The ordering of the nodes in the 1-ring is such that the
nodes corresponding to the central element occupy the first n degrees of freedom. In case
of a tetrahedron this defines a 12 × 12 submatrix. Figure 3.2 shows a 2D example
triangle mesh. The numbers in brackets correspond to the links between the nodes of the
mesh. The degrees of freedom of the dark colored element e in the middle of the mesh
occupy the upper left of the matrix.

Figure 3.2: Matrix of the reduced 1-ring for a 2D example. The degrees of freedom
corresponding to the central element occupy the upper left corner.

Instead of computing the maximum modal frequency of the full 1-ring system matrix, we
obtain the eigenvalues λ_{e,i} only for the n × n submatrix containing only the degrees
of freedom of element e. Thus, we determine the modal frequencies of the nodes of e,
while also considering the influence of the 1-ring on the central element. As will become
evident, this allows a reliable identification of the highest vibration mode of the central
element e. Neglecting parts of the matrix, however, results in inaccuracies; the computed
highest frequency is too low, i. e. the determined time step is too optimistic. This error
stems from the assumption that the 1-ring boundary is fixed, which has an additional,
artificial stabilizing effect on the central element. The errors introduced this way are
documented in Section 3.6.2.

In order to reduce these errors, we also model the influence of the nodes adjacent to the
central element e on the boundary nodes of the 1-ring by extending the reduced 1-ring to
a reduced 2-ring. This way all the interactions taking place within a single time
step can be captured. The results in Section 3.6.2 clearly demonstrate that the accuracy
on all the models tested is very high.

However, there exists a trade-off between the reliability of the element identification and
the accuracy of the time step approximation. Even though the reduced 2-ring has a higher
accuracy, we opted for using the reduced 1-ring. Its approximation is more accurate than
the ones obtained from evaluating full n-rings. In contrast to the reduced 2-ring it allows
identifying the highest oscillation frequency for single elements. Currently, we handle the
inaccuracies of the reduced 1-ring by adding a heuristic safety margin. It was set to 10%
based on the evaluation of a wide variety of meshes.
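The construction can be sketched as follows; the matrix here is a random symmetric stand-in for the 1-ring system matrix, which is enough to show that taking the leading sub-block gives an optimistic (interlaced) eigenvalue estimate, motivating the 10% safety margin:

```python
import numpy as np

# Sketch of the reduced 1-ring idea: order the central element's DoFs
# first and take eigenvalues of the leading sub-block only. The matrix
# is a random symmetric stand-in, not a real FEM 1-ring assembly.
rng = np.random.default_rng(2)
n_ring, n_center = 30, 12            # 12 = 4 nodes x 3 DoFs
A = rng.standard_normal((n_ring, n_ring))
J = A @ A.T                          # symmetric stand-in system matrix

sub = J[:n_center, :n_center]        # boundary of the 1-ring held fixed
lam_sub = max(np.linalg.eigvalsh(sub))
lam_full = max(np.linalg.eigvalsh(J))
assert lam_sub <= lam_full           # reduced estimate is optimistic

h_e = 0.9 * 2.0 / np.sqrt(lam_sub)   # Symplectic Euler limit, 10% margin
```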

3.4 Oscillation Frequency Quality Metric

In the previous section we developed approximation methods for the highest oscillation
frequency for individual elements within a mesh. Together with the CFL condition they
can be used to select the maximum possible time step for a mesh with specific material

parameters and given integration scheme. Next, we are going to use the presented mecha-
nisms to establish a metric in order to distinguish between ill-shaped elements, which are
critical for the integration stability, and well-shaped elements, which are non-critical.
A metric which can distinguish well- and ill-shaped elements has to be able to accurately
identify single elements. We have presented two localized methods - single element oscil-
lation frequencies and reduced 1-ring - which can be used to determine for each element
the maximum oscillation frequencies. In order to establish the metric, we start by select-
ing a simulation time step h. For each element, we compute the maximum oscillation
frequency ω_e^max according to the chosen approximation method. Next, we apply the CFL
condition to determine the maximum time step h_e for element e

h_e < c / ω_e^max,    (3.29)

where c depends on the integration scheme; for the Symplectic Euler and the Störmer-
Verlet we get c = 2 and for the RKN4 c = 2.78. The decision whether an element is ill-shaped
is binary. If h_e < h the element needs a smaller time step for a stable integration than the
one selected for the metric, thus we label e as ill-shaped. Stable explicit time integration
requires that each integration step is performed with a time step fulfilling the CFL con-
dition. For the linear and the linear corotational model we can precompute the stiffness
matrices, which stay constant during simulation. The corotational method uses the same
rest state configuration up to an orthonormal transformation, thus it is equivalent to the
linear method. This allows us to classify the elements beforehand. In the case of topological
changes during the simulation only new or modified elements need to be classified.
In summary, given a target time step h, we iterate over all elements e of the mesh and
build the system matrix of the reduced 1-ring of e. By computing the highest vibration
frequencies of e and checking the CFL condition, we obtain the critical limit time step h_e
for each element e. If h_e > h, we consider e to be well-shaped, otherwise ill-shaped.
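The classification step can be sketched in a few lines; the per-element maximum frequencies below are placeholder values, not measurements from the thesis:

```python
import numpy as np

# Classification sketch for the metric: label an element ill-shaped if
# its limit step h_e = c / omega_e falls below the target step h. The
# per-element frequencies are placeholder values.
h = 1e-3                  # target simulation time step [s]
c = 2.0                   # Symplectic Euler / Stoermer-Verlet constant
omega_e = np.array([500.0, 2500.0, 4000.0, 900.0])   # omega_e^max values

h_e = c / omega_e
ill_shaped = h_e < h      # binary decision per element
assert list(ill_shaped) == [False, True, True, False]
```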
Before we evaluate each of the presented methods, we discuss how different sizes and
shapes numerically influence the vibration of tetrahedral elements. The following section
discusses these dependencies on the basis of the linear tetrahedron as part of the finite
element method.

3.5 Influence of Material Parameters and Geometry on the Oscillation Frequencies

As has already been shown by Shewchuk [2002], tetrahedral elements are more critical
for the stability when their shape differs from a regular tetrahedron. Additionally, not only
the shape but also the size of the elements, the material parameters, and the integration
scheme play a role. A small uniform tetrahedron can be as critical for the stability as a

large thin element. Thus, geometrical properties and the material of the body influence
the magnitude of the oscillation frequencies. A stiffer material will generate higher fre-
quencies, while denser material oscillates with a lower frequency. In the following, we
look at how the parameters of the material model in (2.53), the geometry, and the density
influence oscillation frequencies and thus the applicable time step.
First, we show the influence of the material parameters based on an equilateral tetrahedron
with an edge length of 2 m and a volume of V_e = 1 m^3. As we have shown before,
computing the eigenvalues λ_max(M^{-1}K) results in the maximum squared oscillation
frequency ω_max^2. Using a linear material it is clear that ω_max depends linearly on the
square root of the Young's modulus E. Similarly, ω_max depends linearly on 1/√ρ, since
the mass M is computed from ρV_e. The scaling factor √(E/ρ) is also found in the wave
propagation in a material (see Section 3.3.1). In order to visualize the dependency of
Poisson ratio and geometry, we rewrite the computation of M^{-1}K for a single tetrahedral
element e

M^{-1}K = (4/(ρ_e V_e)) V_e E B_e^T D B_e = (4E/ρ_e) B_e^T D B_e,    (3.30)

where D is the material matrix without the Young's modulus E. Note, for the special
case of a linear tetrahedron, the volume is equal to the determinant of the Jacobian of
the positions, and thus it can be removed from the equation. Because density and elastic
modulus are constant scalar values, which are independent of the geometry, we omit
them in the following discussions. However, we compute the eigenvalues of B_e^T D B_e for
different geometries and a series of different ν.

Figure 3.3: Relation between ν and ω_max for an equilateral tetrahedron. As the Poisson
ratio reaches 0.5, and thus incompressibility, the maximum frequency goes to infinity.

For the equilateral tetrahedron the relation between the maximum oscillation frequency
and the Poisson ratio is depicted as the full line in Figure 3.3. For moderate values of ν, from
0.05 to 0.4, we see a modest increase of the maximum oscillation frequency by a factor of
2. Increasing the Poisson ratio further results in frequencies being orders of magnitude
higher than for smaller ratios. However, comparing this to the volume conservation term
in homogeneous material, (1 − ν)/((1 + ν)(1 − 2ν)), as we find it in the wave propagation in a material
(see (3.27)), shows the significant influence of the discretization. The relation between
this term (dashed line in Figure 3.3) and the full term (full line) is not constant due to
the missing geometry term in the former. As a consequence, we find even for moderate
Poisson ratios of 0.4 a significant difference, which shows that the geometric term is
necessary to capture the full influence of the Poisson ratio.

Instead of varying only the Poisson ratio, we change shape and size of the initial equi-
lateral tetrahedron to observe the effects of the geometry on the maximum oscillation
frequency. First, we transform the initial tetrahedron into a cap element by decreasing the
distance of one point to its opposing face and observe the resulting maximum oscillation
frequencies. The dependency between the height of the cap and the frequency is shown in
Figure 3.4(a). The x-axis shows the height relative to the initial height; the y-axis shows
the computed maximum oscillation frequency relative to the maximum frequency at the
initial height. The increase in frequency is roughly inversely proportional to the compression
of the height: halving the height doubles the maximum frequency. In a second example,
we shrink the initial tetrahedron uniformly, thus maintaining its shape (see Figure 3.4(b)).
The x-axis shows the ratio of the edge length to the initial length and the y-axis again the
ratio of the computed frequency to the initial value. The results are similar to the case
above and we find the same inverse proportionality. We conducted both evaluations for
different Poisson ratios. With increasing ν the computed oscillation frequencies grew.
However, relative to the initial value, the results for the different Poisson ratios are similar,
e. g. when shrinking the edges of the tetrahedron to one fifth of the initial length, the
maximum oscillation frequency increases by a factor of five independent of the Poisson ratio.

(a) Transforming uniform tetrahedron to cap (b) Shrinking tetrahedron uniformly

Figure 3.4: Relation between geometry and ω_max for an equilateral tetrahedron. In the left
figure one node is moved towards its opposing face. For increasing compression ratios the
impact on the oscillation frequency is smaller for higher Poisson ratios. In the right figure
the tetrahedron is scaled uniformly. The horizontal axis shows the ratio of the shortened
height/edges to their initial length. The vertical axis shows the increase of the maximum
oscillation frequency relative to the maximum frequency of the initial tetrahedron.

The results of these three experiments show that size/shape and Poisson ratio do not influ-
ence each other strongly. While higher Poisson ratios cause higher oscillation frequencies,
the steepness of the increase has no strong dependency on the geometry. For flatter ele-
ments the increase is less steep than for more equilateral elements, and the steepness
decreases further for higher ratios. When varying the size or shape of the uniform tetra-
hedron we find the same inverse proportionality as in Section 3.3.1. However, while
the shape influences the oscillation frequency significantly, the size of the element has a
stronger impact: a uniform equilateral tetrahedron has a much higher maximum oscilla-
tion frequency than a cap element with the same height but a much larger base area. Thus,
the simplified model correlating wave speed and characteristic length is not sufficient to
judge the quality of an element.
As a consequence of the observed behavior we can make some simplifications when
evaluating our developed quality metrics. Density and material stiffness need no further
consideration as they only scale the final results linearly. Moreover, the influence
of different Poisson ratios on the oscillation frequencies for the same geometry is limited.
Thus, in the following section, we limit the discussions to a single Poisson ratio and ignore
density and stiffness. This allows us to verify the applicability of the presented methods
easily and to demonstrate the advantages compared to previous approaches in a concise way.

3.6 Identification Performance

For each of the presented approximation methods we compare the identification perfor-
mance on a variety of 3D models. As a reference, we use the maximum oscillation fre-
quency computed from the global system matrix, which describes the stability property
accurately for linear FEM. While we could present a proof that some of the methods are
correct, we resort to experimental validation for the remaining ones; moreover, for none
of the methods do we have an analytic bound to quantify the error they make. First, we
compare the oscillation frequency based single element approximation (see Section 3.3.1)
to the method from Shewchuk and the material wave propagation approach. Next, we
demonstrate the improvements obtained using the extension to the local neighborhoods
(see Section 3.3.2). Because we are missing an analytic proof to show the correctness of
the reduced 1-ring approximation, we verify the quality and applicability experimentally.
Additionally, we show how the element qualities are distributed for several meshes.

3.6.1 Models

In order to get a good evaluation of the performance of our identification schemes, we
test them on a variety of 3D models. In Figure 3.5 we depict the models we run the
tests on, along with some statistics in Table 3.1. For several meshes, we use different
discretizations to show how the element sizes depend on the resolution. The top
row lists four resolutions of a liver model; coarse (C), medium (M), fine (F), extra fine
(XF) resolution from left to right. The second row shows a coarse resolution (C) and a
fine resolution (F) mesh of a bar. The third row displays four Stanford bunny models (C,
M, F, XF). In the last row we find a model of the Stanford dragon and armadillo, a duck,
and a regular cube. All models (with the exception of the cube) were generated from surface
meshes using TetGen [Si 2009]. The bunny, dragon, and armadillo surface meshes were
taken from the Stanford 3D Scanning Repository.

Figure 3.5: 3D models used to show the performance of the identification process. The
corresponding bounding boxes and element numbers are listed in Table 3.1.

3.6.2 Accuracy

In order to compare the accuracy of the various eigenvalue approximation methods, we
use a setting with as few parameters as possible. In Section 3.5, we showed that the

Name            # Nodes   # Tetrahedra   Bounding box (l × w × h) [cm³]

Liver (C)       258       766            20 × 15 × 16
Liver (M)       825       2749           20 × 15 × 16
Liver (F)       2396      8458           20 × 15 × 16
Liver (XF)      6640      24115          20 × 15 × 16
Bar (C)         192       635            20 × 4 × 4
Bar (F)         592       2323           20 × 4 × 4
Bunny (C)       255       801            86 × 66 × 84
Bunny (M)       940       3957           86 × 66 × 84
Bunny (F)       2108      7403           86 × 66 × 84
Bunny (XF)      3667      11779          86 × 66 × 84
Dragon          1270      4163           20 × 14 × 9
Armadillo       758       2629           3 × 3 × 3
Duck            199       610            6 × 5 × 5
Regular Cube    216       625            5 × 5 × 5

Table 3.1: Configurations of the models in Figure 3.5.

Young's modulus and the density can be factored out a priori for linear FEM. Thus,
we perform the following tests with a Young's modulus of E = 1 Pa and a density of
ρ = 1 kg/m³. The Poisson ratio influences the results and cannot be separated from the
geometry. However, we showed in Section 3.5 that the relative differences between the
results for different Poisson ratios behave similarly. For example, if we take a uniform
tetrahedron and halve the length of its edges, the maximum oscillation frequency doubles.
This is independent of the Poisson ratio; for ν = 0.2 it increases from ω = 1.8 rad/s to
ω = 3.6 rad/s and for ν = 0.49 from ω = 9.8 rad/s to ω = 19.6 rad/s, respectively. Thus,
for the rest of this section, we restrict the discussions to a soft-tissue-like Poisson ratio
of ν = 0.45.
In order to assess the accuracy of the eigenvalue approximation algorithms, we compare
their results to reference values, obtained by computing the largest eigenvalue of the
system matrix M⁻¹K of the complete mesh. Throughout this section we depict the re-
sults on selected meshes from Figure 3.5, which show a wide range of outcomes. First, we
compare the estimations of our single element approximation, the pressure wave method,
and Shewchuk's method (introduced in (3.17)) to the reference values. Table 3.2 shows
the relative differences of the three methods to the reference. A perfect approximation
would yield 0% difference, while positive values indicate too optimistic estimations and
negative values indicate that the approximation is more conservative. The single element
approximation shows the best behavior of the three methods. Its results are conservative
and it can approximate the largest eigenvalue exactly in some cases. However, it can also
be off by several orders of magnitude. The pressure wave approximation shows a similar
behavior; however, because it can be too optimistic (in case of Liver (XF) and Regular
Cube), it is a less practical method. Shewchuk's method

Model
Method Liver (C) Liver (XF) Bunny (M) Armadillo Regular Cube
Single Element -94% 0% -2055% -2400% -29%
Pressure Wave -63% 26% -1948% -1853% 4%
Shewchuk -3337% -5900% -4654% -4372% -3081%

Table 3.2: Relative difference of the single element approximations to the maximum
eigenvalue of the system matrix M⁻¹K. The oscillation frequency method is conserva-
tive; the resulting approximation errors are mostly small. A similar range is shown by the
pressure wave method; however, it also shows too optimistic results. Shewchuk's method is
also conservative, but the computed approximations show a very high error.

yields valid upper bounds for the maximum eigenvalue and the results are distributed
over a smaller range. However, they are too conservative compared to the single
element approximation.
Extending the region from single elements to n-rings improves the approximation of the
largest eigenvalue significantly. Table 3.3 depicts the accuracy achieved for the element
1-ring, the element 2-ring, the vertex 1-ring, and the vertex 2-ring. The results are relative
to the largest eigenvalue of the full system matrix. Evidently, the element n-ring methods

Model
Method Liver (M) Bar (F) Bunny (F) Dragon Duck
Single Element -96% -166% -2839% -1931% -1223%
Element 1-ring -13% -29% -1498% -202% -231%
Element 2-ring -5% -19% -796% -202% -98%
Vertex 1-ring -41% -39% -1498% -202% -242%
Vertex 2-ring -41% -39% -1498% -202% -242%

Table 3.3: Relative difference of the ring-based approaches to the largest eigenvalue of
the full system matrix. Using larger regions improves the results compared to the single
element approximation. Note that the vertex based rings are less efficient than the element
based rings.

improve the estimation significantly for most meshes. However, quite a few meshes do
not show any improvement using the 2-ring. This is the case for the Dragon, Bunny (C),
Bunny (M), and Armadillo meshes. Similarly, the vertex 1-ring improves the estimation,
but enlarging the region to a 2-ring around a vertex does not improve the results further.
This behavior is observed for all the meshes. In general, using larger regions improves the
accuracy of the computed oscillation frequencies, but reduces the precision of identifying
a specific element.
In Section 3.3.3 we introduced the concept of reduced n-rings, especially the reduced 1-
ring. By including the direct neighborhood, but fixing the boundary nodes, we account for
the influence of the neighbors on the central element but still perform the identification per
element. Similarly, the reduced 2-ring takes the neighborhood of the 1-ring into account.
The results in Table 3.4 show that the reduced 1-ring and reduced 2-ring methods are very
accurate for all meshes. While the reduced 1-ring metric is too optimistic with a very

Model
Method Liver (C) Bar (F) Bunny (C) Bunny (F) Dragon
Reduced 1-ring 0.25% 1.28% 0.03% 1.28% 0.00%
Reduced 2-ring 0.00% -0.17% 0.00% 0.00% 0.00%

Table 3.4: Relative difference of the reduced 1-ring and 2-ring approximations to the
maximum eigenvalue of the system matrix. The computed bounds are very accurate,
however, a bit too optimistic in the case of the reduced 1-ring.

small error, the extension to the reduced 2-ring is an extremely accurate approximation;
only the Bar (F) shows a small, albeit conservative, error. Again, using regions larger
than a single element reduces the precision of the method significantly; instead of only
the central element, we would need to mark the whole region as ill-shaped. Moreover,
because we need to process the elements further, a precise identification is preferable to
keep the set of ill-shaped elements small and thus minimize the computational overhead.
By adding a margin of 10% to the reduced 1-ring method, as suggested in Section 3.3.3,
we remove the false negatives, which makes this method the best compromise between
accuracy and precision.
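A minimal sketch of the reduced-1-ring evaluation (names and the dense eigensolver are ours): fixing the ring's boundary nodes is realized by deleting their rows and columns from the assembled ring matrices, and the heuristic margin inflates the resulting frequency:

```python
import numpy as np

def reduced_ring_omega(K_ring, M_lumped, interior_dofs, margin=0.1):
    """Largest oscillation frequency of an element's 1-ring with the ring's
    boundary nodes held fixed. Fixing is realized by deleting the boundary
    rows and columns; the margin compensates for the slightly too
    optimistic estimate. (Sketch; not the thesis implementation.)"""
    K = K_ring[np.ix_(interior_dofs, interior_dofs)]
    m = np.asarray(M_lumped)[interior_dofs]           # lumped (diagonal) masses
    lam_max = np.max(np.linalg.eigvals(K / m[:, None]).real)  # eig of M^-1 K
    return (1.0 + margin) * np.sqrt(lam_max)
```

The element's time step limit then follows from the CFL condition, e.g. he = 2/ω under the symplectic Euler bound assumed here.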

3.6.3 Element Quality Distribution

In Section 3.4, we presented localized identification metrics to approximate the maximum
oscillation frequency. In the previous section we showed that the single element and
the reduced 1-ring metric are both valid approximation methods. They can be used to
assign a quality value to each element in a mesh. In this section, these metrics are used to
compute the highest oscillation frequency for each element in a mesh. From the computed
frequencies, we build a histogram to visualize the distribution of the various element
qualities in a mesh. This allows us to visualize the effect of applying the quality metrics
with a distinct target time step ht. Typically, when creating an arbitrary mesh, only few
elements have very high oscillation frequencies above the mesh average. The target time
step ht is best chosen such that it is as high as possible while affecting as few elements
as possible.

For the liver meshes given in Figure 3.5, we apply the reduced 1-ring metric to visualize
the distribution of the element qualities within the mesh. The mean ratio R/(3r) of the
circum-sphere radius R to the inscribed-sphere radius r for the four meshes are 1.46, 1.39,
1.34, and 1.81, with a standard deviation of 0.39, 0.36, 0.25, and 0.77, respectively (note
that this standard mesh quality metric is normalized such that a uniform tetrahedron has a
ratio of 1). For each of the four meshes, Figure 3.6 shows the histogram of the oscillation

[Figure 3.6 consists of four stacked histograms — Liver (C), Liver (M), Liver (F), and
Liver (XF) — with the oscillation frequency ×10³ [rad/s] on the horizontal axis and the
element count on the vertical axis.]

Figure 3.6: Distribution of oscillation frequencies for the liver meshes. The means shift
to higher oscillation frequencies as the discretization size of the meshes gets smaller.

frequencies of all the elements. The first and the last bar in each histogram contain the
combined count of all the elements which have a smaller/higher maximum frequency,
respectively. The frequencies are computed for soft-tissue-like parameters with a Young's
modulus of E = 90 kPa, a Poisson ratio of ν = 0.45, and a density of ρ = 1000 kg/m³.
While the four meshes occupy approximately the same space, their discretization varies.
From Liver (C) to Liver (XF) the number of elements grows by about a factor of 30,
thus the average volume per element decreases. The histograms show this effect nicely
as the mean of the distributions shifts to higher frequencies. The highest frequencies for
the meshes from top to bottom are 1.5 × 10⁴ rad/s, 4 × 10⁴ rad/s, 2.9 × 10⁴ rad/s, and
3.4 × 10⁶ rad/s. Note that these highest frequencies are collectively shown in the last bar on
the right of each plot. Only a few elements in each mesh have very high oscillation
frequencies and thus dominate the time step for explicit integration; most elements have
smaller frequencies accumulating on the left side. A typical goal of mesh optimization
techniques is to remove these high frequency elements and thus reduce the width of the
histogram. However, these optimizations are hard and thus some of these elements usually
remain in the mesh and still dominate the time step.

Next, we study a small example, which visualizes the number of ill-shaped elements in a
mesh when setting a target time step ht. From the data of the Liver (C) mesh in Figure 3.6,
we compute for each element the maximum allowable time step using (3.10) and build the
cumulative distribution; as an integration scheme we choose the Symplectic Euler. The
cumulative distribution is given in Figure 3.7(a). If ht is set below 0.12 ms, none of the
elements is identified as ill-shaped. As ht is set to higher values, more and more elements
are identified as ill-shaped. Setting ht to 0.3 ms, the reduced 1-ring metric would identify
about 14% of the elements as critical to the stability of the explicit integration.
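The cumulative-distribution reading above can be sketched as follows, assuming the per-element limit he = 2/ωe for the Symplectic Euler (our reading of (3.10); the function name is ours):

```python
import numpy as np

def identified_fraction(omegas, h_target, cfl_const=2.0):
    """Fraction of elements flagged as ill-shaped for a target time step.
    Assumes the per-element limit h_e = cfl_const/omega_e, with cfl_const = 2
    for the symplectic Euler bound h * omega <= 2 (assumption, see (3.10))."""
    h_e = cfl_const / np.asarray(omegas, dtype=float)
    return float(np.mean(h_e < h_target))
```

Sweeping `h_target` over a range of values and plotting the returned fraction yields exactly a curve such as the one in Figure 3.7(a).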

[Figure 3.7 consists of two cumulative-distribution plots with the target time step [ms] on
the horizontal axis and the percentage of identified elements [%] on the vertical axis.]

(a) Reduced 1-ring metric (b) Single element metric

Figure 3.7: Cumulative distribution of the oscillation frequencies for the Liver (C) mesh
using oscillation frequency quality metrics. The single element metric (right) clearly
identifies more elements as ill-shaped for a given time step than the reduced 1-ring metric
(left).

Alternatively, instead of using the reduced 1-ring metric, the results using the single el-
ement oscillation frequency metric on the Liver (C) are plotted in Figure 3.7(b). At the
same ht of 0.3 ms we now identify 34% of the elements as being ill-shaped. Both meth-
ods provide robust approximations of the critical time step, however, the single element
metric identifies more elements as ill-shaped for a given time step ht due to being more
conservative. Ideally, the time step is set such that no element is ill-shaped. However, this
is not always possible due to computation time limits or limitations of remeshing
algorithms. In Chapter 4, we investigate methods that allow handling these ill-shaped
elements dynamically.

3.7 Extension to Nonlinear Models

So far we have looked at linear FEM only. In this section we use the same techniques to discuss
a preliminary extension of our metric system to non-linear FEM. Following [Schwarz and
Köckler 2006], the explicit integration stability can be assessed by linearizing the func-
tion f(x) and looking at its Jacobian J. Computing the first order Taylor approxima-
tion of f at the position x yields the necessary Jacobian:

    f(x + u) ≈ f(x) + (∂f/∂x)|_x u.    (3.31)

The Jacobian J = ∂f/∂x has the same properties as K with respect to the stability of the
system, as both are the first derivative of the force function. Thus, instead of looking at the
non-linear function, we compute the linear tangent stiffness matrix K_T (see Section 2.2.5)
and use the largest eigenvalue of M⁻¹K_T to compute a stable time step for the
explicit integration.
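As a numerical illustration (not the analytic tangent stiffness assembly of Section 2.2.5), the Jacobian of a force function can be approximated by finite differences and used to estimate the critical step; the names and the CFL constant of 2 for the Symplectic Euler are our assumptions:

```python
import numpy as np

def jacobian_fd(force, x, eps=1e-6):
    """Central finite-difference Jacobian of a force function at state x,
    a numerical stand-in for the analytic tangent stiffness."""
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        J[:, i] = (force(x + d) - force(x - d)) / (2.0 * eps)
    return J

def critical_time_step(force, x, masses, cfl_const=2.0):
    """Critical explicit time step of the linearized system; assumes a
    lumped (diagonal) mass vector and the bound h <= cfl_const/omega_max
    (cfl_const = 2 for the symplectic Euler is an assumption)."""
    K_T = -jacobian_fd(force, x)              # restoring force: K_T = -df/dx
    lam = np.linalg.eigvals(K_T / np.asarray(masses)[:, None]).real
    return cfl_const / np.sqrt(lam.max())
```

For a unit mass on a linear spring f(x) = -4x this yields ω = 2 and a critical step of 1.0 s; for a nonlinear force the estimate changes with the state x, which is why the identification must be repeated during the simulation.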
Similar to Section 3.3, we use the same technique to provide an element-based approxi-
mation of the largest eigenvalue. However, instead of constant element stiffness matrices,
the element tangent stiffness matrices are used. The process of identifying ill-shaped el-
ements in non-linear FEM given the target time step ht is then similar to the linear case.
However, the dependency of the material laws on the current position requires performing
the identification process whenever the positions of the mesh change. Thus, in each simu-
lation time step, we recompute the tangent stiffness matrix Ke(t) of each tetrahedral
element. The identification algorithm is then the same as in Section 3.4. Given a target
time step ht, we iterate over all elements e of the mesh and build the system matrix of
the reduced 1-ring of e. Computing the highest oscillation frequency of e and using
the CFL condition, we obtain an element limit time step he. If he > ht, the element e is
well-shaped; otherwise, e is ill-shaped.
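The identification loop can be sketched compactly, assuming the largest reduced-1-ring frequency ωe of each element has already been computed (the function name, the dict-based interface, and the bound he = 2/ωe are our assumptions):

```python
def identify_ill_shaped(ring_omegas, h_t, cfl_const=2.0):
    """Classify elements given each element's largest reduced-1-ring
    oscillation frequency (hypothetical dict: element id -> omega_e).
    An element is ill-shaped if h_e = cfl_const/omega_e <= h_t."""
    return {e for e, omega in ring_omegas.items() if cfl_const / omega <= h_t}
```

In the non-linear case this set must be recomputed whenever the node positions, and with them the tangent stiffness matrices, change.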

3.7.1 Performance

In this section, we demonstrate the application of the non-linear version and discuss the
runtime costs and usability. In two experiments, we show that the time step is related to
the current strain the elements are undergoing.
In a first experiment, we take the Bar (C) mesh, fix it on one side, and stretch it with a
constant displacement per time step. Figure 3.8 shows the initial bar and the deformation
after 0.44 s. In Figure 3.9 the computed maximum time step is plotted against the simu-
lation time; the total strain in stretching direction grows linearly with time. The maximum
time step is computed once using the reduced 1-ring method (full line) and once by computing
the largest eigenvalue of the global system matrix M⁻¹K_T (dashed line), where K_T is
the global tangent stiffness matrix.

Figure 3.8: The Bar (C) mesh is stretched using non-linear FEM. The figure on the left
depicts the rest state; the one on the right the deformed mesh after 0.44 s.

Figure 3.9: Effect on the critical time step when stretching the Bar (C) mesh. During the
simulation the strain grows linearly while the computed maximum time step decreases. The
dashed-line graph is obtained using the maximum eigenvalue of the global stiffness matrix,
the full-line graph using the reduced 1-ring metric. The simulation time step is 1.1 ms.
On the right of the plot the simulation is unstable and the solution diverges.

The simulation time step is 1.1 ms and the material parameters are E = 30 kPa, ν = 0.3,
and ρ = 1000 kg/m³. Initially, the estimated time steps are higher than the simulation time
step, which allows us to visualize a decreasing time step without stability problems. Higher
strains increase the stiffness of the equations and thus both methods estimate a decreasing
time step. This relation is expected and is intrinsic to non-linear material models. Without
adapting the time step throughout the simulation, the CFL condition is violated after a
certain strain threshold. As a consequence, the simulation becomes unstable, which can be
observed in the graph as a rapidly falling critical time step. Using the tangent
stiffness matrix is itself an approximation and thus, in contrast to the linear case, it is
possible to use a slightly higher time step than the one provided by the best estimate.
However, if the estimation provided by the Jacobian falls below the simulation time step,
we can assume that the integration stability is no longer guaranteed.

Figure 3.10: The Bunny (C) mesh is stretched using non-linear FEM. The figure on the
left depicts the rest state; the one on the right the deformed mesh after 80 ms.

Instead of displacing nodes directly, we provide a second example where the Bunny (C) is
subject to a homogeneous force field stretching the mesh while a few nodes are fixed; the
same material parameters as above were used with a time step of 1.6 ms. The resulting
deformation is depicted in Figure 3.10; the computed maximum time steps are plotted in
Figure 3.11. While the estimation using the full matrix is initially above the simulation
time step, the maximum time steps computed using the reduced 1-ring metric are lower. For
about the first two-thirds of the simulation no change in the critical time step is observed; in
contrast to the former example the elements do not stretch equally. However, after some
point the critical time step starts decreasing. After the full matrix estimation falls below
the time step, large oscillations start appearing and the simulation destabilizes. This
unstable region is marked in the figure.
Due to the margin used, the reduced 1-ring method is more conservative than the full
system matrix estimation, but it still provides a reasonable estimate to judge the stability
of the nonlinear simulation. Moreover, in contrast to the latter, it can be used to identify
single elements. The critical time step needs to be evaluated in each step.
Looking at the bar mesh, the identification time is many times larger than the integration
time: computing the dynamics takes 0.36 ms, while computing the acceptable time step
takes 4.6 ms using the reduced 1-ring method and 10.8 ms for the full matrix evaluation.
In case of the bunny the integration time is 0.47 ms, the identification time using the
reduced 1-ring is 3.9 ms, and for the full matrix 8.3 ms. Given the accuracy of the
identification, the tangent stiffness matrix is a valuable tool to track the stability of non-
linear FEM simulations efficiently. However, for real-time simulations, the metrics are

[Figure 3.11 plots the time step [ms] against the simulation time [s]; curves: reduced
1-ring, full matrix, and the simulation time step.]

Figure 3.11: Effect on the critical time step when stretching the Bunny (C) mesh. With
growing strain the maximum time step decreases. The dashed-line graph is obtained using
the maximum eigenvalue of the global stiffness matrix, the full-line graph using the reduced
1-ring metric. Below the simulation time step of 1.6 ms, the simulation starts destabilizing.

not fast enough because they need to be evaluated in every integration step. In contrast to
linear FEM the tangent stiffness matrix depends on the current state of the deformation
and thus the highest oscillation frequency changes.

3.8 Discussion

This chapter discussed the problems concerning stable explicit integration for linear and
non-linear FEM. In order to determine a stable simulation time step, we used the CFL
condition derived from the oscillation frequencies of the system matrix. The stable time
step has to be chosen small enough such that the highest oscillation frequency can be
integrated without the solution diverging. This allows us to determine the integration
stability of a mesh easily, without taking a closer look at the exact geometry or material
parametrization. However, determining a valid simulation time step for a complete mesh
is costly. Moreover, it is then not possible to determine which elements of the mesh are
limiting the size of the time step.

With this disadvantage in mind, we developed different single-element based methods
able to approximate the highest oscillation frequency locally. By considering each el-
ement in isolation, we achieve a reliable, albeit not very accurate, result. No auxiliary
matrices need to be computed; only the mass and stiffness matrices of a single element.
However, in practice we find meshes whose maximum eigenvalue is magnitudes smaller
than the estimation derived by this method. In case of linear FEM, this approach uses
values that can be precomputed, thereby simplifying the computations. The possibility
to store the estimations for each element naturally makes these approximations a good
choice in situations where the topology of the mesh is changing. Only elements changed
or created during simulation, e. g. through cutting, need to be evaluated because the only
information used is local to the element. The second single-element based method ex-
ploits the CFL condition directly by considering the time a pressure wave needs to travel
through the material. This method can be applied to any mesh with known geometry and
material properties, without having to compute expensive stiffness matrices. In general
the method is conservative, but in a few cases the estimator is too optimistic; therefore,
in practice it is not very reliable. Any method neglecting the influence of the integration
scheme and the exact geometry will produce results which are only applicable in lim-
ited cases. Moreover, the fact that both methods ignore the influence of the neighboring
elements makes them more conservative than necessary.
In order to compensate for the missing neighborhood information we extended the region,
effectively showing that the accuracy can be increased without being too optimistic. The
downside, however, is that we obtain the critical time step of a region and not of a single
element. This makes it impossible to determine which elements actually are ill-shaped
and which are not. Consider a mesh where one element is extremely ill-shaped and the
surrounding elements are not critical. Applying the extended methods will determine a
high oscillation frequency for each set that contains this one element. As a consequence,
more elements will be marked as ill-shaped than necessary. Furthermore, when modifying
the topology, all sets in which elements changed need to be re-evaluated. On the other
hand, the extended regions can serve as a good first estimate, which could be refined
later on using secondary estimators. However, we did not pursue this option because the
overhead of re-evaluating changed configurations is not practical.
Instead of computing the highest oscillation frequency from the system matrix of the ex-
tended region, we demonstrated that the one obtained from our reduced 1-ring metric suffices
to produce an approximation with a very low error. The assumption that the boundary nodes
of the 1-ring are fixed increased the accuracy tremendously. However, this simplification
introduced errors and led to slightly too optimistic results. In order to obtain a reliable
method, we heuristically determined a small margin. When facing changing topologies,
only few elements need to be re-evaluated, i. e. newly created or changed elements and
those directly adjacent to them. The reduced 1-ring metric is a very good compromise
between the accuracy of the approximation and the number of elements erroneously clas-
sified as ill-shaped.
Typically only few elements in a mesh dominate the highest oscillation frequencies and
hence require a small time step. However, these elements typically cannot be removed by
remeshing approaches. For the majority of elements the required time steps are distributed
over a wide range. Thus, we could use these metrics to steer remeshing algorithms to
specifically remove these elements. Unfortunately, it may still not be possible to fix these
elements. Even more importantly, in interactive simulations where the topology can change
dynamically, an extensive remeshing algorithm may be computationally infeasible. For
these reasons, the following chapter presents ways to specifically handle such ill-shaped
elements dynamically without requiring any changes to the geometry.
4
Handling Ill-shaped Elements

In the previous chapters we have looked at how deformations are computed using explicit in-
tegration and what influences the stability of this integration. This allowed us to determine
a stable integration time step for a given combination of mesh, material parameterization,
and integration scheme. Moreover, it allowed a precise prediction of the necessary integra-
tion time step for each element. This chapter presents the methods we developed that allow
using a time step higher than the critical one and handling ill-shaped elements dynamically
at runtime. The developed algorithms share a common basic scheme, working in two
phases. The first phase is executed at initialization: for each element it determines the
necessary maximum integration time step given the simulation parameters. This step,
however, varies slightly for each method. The second phase is the actual integration. Tak-
ing into account the identified ill-shaped elements, different relaxations are applied such
that the integration remains stable. For all methods the underlying deformation model
is the linear corotational FEM introduced in Chapter 2.

Over the next sections, we present four approaches to handle ill-shaped elements. The first
method in Section 4.1 applies a filtering method to remove high modal frequencies from
the simulation. Next, in Section 4.2, we show a hybrid deformation model mixing a classi-
cal FE model with a geometric deformation model. Section 4.3 details an implicit-explicit
integration scheme to handle ill-shaped elements. The last method, in Section 4.4, han-
dles ill-shaped elements by employing an adaptive time integration scheme. Section 4.5
completes this chapter with elaborate performance measurements, comparisons between
the methods and applications to cutting simulations.

4.1 Filtering High Modal Frequencies

As discussed in Chapter 3, high modal frequencies require small time steps in order to
run the simulation in a stable manner. Similar in spirit to modal reduction techniques, we
present a technique to filter those high modal frequencies that require the use of a small
integration time step. In contrast to existing approaches, we perform this filtering on a
per-element basis instead of filtering all the frequencies globally.

In a standard corotational FEM simulation we apply filters to the computed forces and
velocities to ensure stable explicit time integration. This approach is a combination of
two components. We assume that the time step h_min of the simulation is given. It can
be chosen heuristically, e.g., based on real-time considerations such as the expected computa-
tion time for one simulation step. In cutting simulations, we additionally demand that the
time step h_min does not change during the simulation, even though topological changes
might induce ill-shaped tetrahedra requiring a smaller time step. First, we identify the
elements which cannot be simulated with h_min, i.e., which have vibration modes exceed-
ing the maximum allowable modal frequency. As input, we take an object with a domain
discretized into tetrahedral elements. In addition, we assume that the material properties
(Young's modulus, Poisson ratio and density) of the object are given. Furthermore, we
employ an explicit integration method, such as the symplectic Euler scheme (see Sec-
tion 2.3.1), to numerically evolve the object in time. We use the single-element approach
for the identification, as detailed in Section 3.3.1. This process requires the solution of
an eigenvalue problem of size 12 × 12 for each element. We then filter the computed
modal frequencies of the found elements along the affected directions by directly altering
the stiffness matrices. Note that this filtering prevents us from using the reduced 1-ring
identification scheme because it would require re-evaluating the neighboring elements.
This process is done before computing restitution forces and time evolution.
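As an illustration, the per-element identification step can be sketched as follows. This is a minimal numerical sketch rather than the thesis implementation: the helper names are hypothetical, and it assumes the standard stability bound ω_max = 2/h_min of the symplectic Euler scheme.

```python
import numpy as np

def max_modal_frequency(M_e, K_e):
    """Highest modal frequency of one element: the square root of the
    largest eigenvalue of M_e^{-1} K_e (generalized eigenvalue problem)."""
    lam = np.linalg.eigvals(np.linalg.solve(M_e, K_e))
    return np.sqrt(np.max(lam.real))

def is_ill_shaped(M_e, K_e, h_min):
    """An element cannot be simulated with the target step h_min if its
    highest frequency exceeds omega_max = 2 / h_min (symplectic Euler)."""
    omega_max = 2.0 / h_min
    return max_modal_frequency(M_e, K_e) > omega_max
```

In the thesis setting the element matrices are 12 × 12; the sketch works for any size.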

In order to account for the changed material properties, and to prevent element inversion
of the softened elements, the second part of the algorithm rigidifies those elements along
the determined directions. To accomplish this, we define a set of constraints and
employ the fast projection method [Goldenthal et al. 2007], implemented as a velocity
filter, to enforce the constraints. In doing so, we use an approach recently proposed by
Thomaszewski et al. [2009], who extend this concept to continuum-based constraints
and as such define a Neo-Hookean material law. This new material is linear within the
constraints and inextensible otherwise. While Thomaszewski et al. limit the strain of all
elements, we identify the ill-shaped elements beforehand and only consider their defor-
mation. Therefore, we obtain a significantly smaller system of constraints whose solution
is feasible in real-time.

In the following sections we present the process in more detail. In Section 4.1.1 we outline
how to identify ill-shaped elements, and compute and filter the modal frequencies. Next,
in Section 4.1.2 we detail the constraint handling. Section 4.1.3 discusses simulation
characteristics and finally, Section 4.1.4 examines limiting aspects of this method.

4.1.1 Relaxation of Element Vibration Modes

The basic idea behind this method is that we can approximate the highest modal frequency
per element as detailed in Section 3.3. Given a target time step h_min, we want to modify
the modal frequencies of each element such that it fulfills the CFL condition. Evaluating
the CFL condition is done using the single element approximation (see Section 3.3.1). In
contrast to just classifying an element, we compute all its vibration modes. We determine
the vibration modes of each element e by solving the generalized eigenvalue problem
defined by the element's mass matrix M_e and stiffness matrix K_e,

    Φ_e Λ_e = (M_e⁻¹ K_e) Φ_e,    (4.1)

where the diagonal matrix Λ_e contains the eigenvalues of M_e⁻¹ K_e, and the columns of
the matrix Φ_e correspond to the eigenvectors. The eigenvalues λ_i = [Λ_e]_ii are equal to
the squared modal frequencies ω_i², and the eigenvectors φ_i are identical to the directions
along which the vibrations work. Stable integration can be achieved by integrating only
the modes which satisfy the CFL condition.
We first compute the matrix Λ_e by solving the corresponding 12 × 12 eigenvalue problem
with a QR decomposition. If a frequency exceeds the maximum oscillation frequency ω_max
from (3.19), we filter the corresponding eigenvalue λ_i and form a new matrix Λ̃_e from the
filtered eigenvalues λ̃_i,

    λ̃_i = { λ_i     if λ_i ≤ ω_max²
          { ω_max²  if λ_i > ω_max².    (4.2)

From Λ̃_e, we assemble the filtered stiffness matrix K̃_e = Φ_e⁻ᵀ Λ̃_e Φ_e⁻¹. The modified mate-
rial will react less to forces in the direction of the eigenvectors corresponding to the mod-
ified modes. Thereafter, we obtain stable time integration. These material modifications
can be done as a pre-computation step for simulations without topological changes. In
cutting or fracturing simulations, additional filtering needs to be done for those elements
whose topology has changed. This is possible at runtime due to the derived per-element
approximation of the modal frequencies.
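The eigenvalue clamping and stiffness reassembly can be sketched as follows. This sketch uses a plain eigendecomposition of M_e⁻¹K_e instead of the M-orthonormal modal basis above; both yield a filtered matrix whose modes all satisfy the CFL condition. The function name is hypothetical.

```python
import numpy as np

def filter_stiffness(M_e, K_e, omega_max):
    """Clamp the eigenvalues of M_e^{-1} K_e at omega_max^2 and reassemble
    a filtered element stiffness matrix."""
    lam, Phi = np.linalg.eig(np.linalg.solve(M_e, K_e))
    lam_f = np.minimum(lam.real, omega_max ** 2)
    # M^{-1} K = Phi Lam Phi^{-1}  =>  K_f = M Phi Lam_f Phi^{-1}
    return M_e @ Phi.real @ np.diag(lam_f) @ np.linalg.inv(Phi.real)
```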
However, because the modified elements react less to strains in the direction of the re-
moved modes, they tend to invert or distort. In the limit the material becomes infinitely
soft along the filtered directions. In order to prevent elements from collapsing, we add
constraints rigidifying the filtered modes. These constraints try to keep the filtered el-
ements as close as possible to the ground truth. The ground truth, in this case, is the
element without material relaxation, simulated with a sufficiently small time step.

4.1.2 Constraining Element Deformation

To constrain the deformations of the filtered elements, we use the fast projection method
[Goldenthal et al. 2007], which is a manifold projection technique [Hairer et al. 2006].
The conceptual advantage is that fast projection can be implemented as a velocity filter,
which directly modifies the future positions and velocities of the points after the numerical
time-integration.

(a) Well-shaped, but small (b) Needle (c) Wedge

(d) Spindle (e) Sliver (f) Cap

Figure 4.1: We distinguish six different types of ill-shaped elements [Bern et al. 1995].
Highlighted line segments mark edge (b), (c) or height (f) constraints. The other elements
(a), (d), (e) use volume constraints. The arrows show the direction of the highest modal
frequency.

An intuitive way to rigidify the elements along the filtered directions would be to directly
employ the eigenvectors corresponding to the removed vibrations as constraint directions.
However, since the eigenvectors and their derivatives cannot be computed analytically, we
instead classify the ill-shaped elements according to [Bern et al. 1995]. For each type of
degeneracy (see Figure 4.1), we use tailored constraints to rigidify the element along the
filtered directions. We further normalize these constraints to increase numerical stability.
The filtered directions align with the shortest distances in the tetrahedron, which can either
be an edge or a height. If more than one vibration mode is filtered or if multiple directions
within the same element have been filtered, we rigidify the whole element:

Well-shaped elements as in Figure 4.1(a) can cause instability if they are small
compared to the average tetrahedron size in the mesh. Because all four heights are
relaxed, we rigidify the element completely by introducing a volumetric constraint
similar to [Teschner et al. 2004],

    C_V(x, y, z, w) = ( ((y − x) × (z − x)) · (w − x) − V₀ ) / V₀ = 0,    (4.3)

where V₀ is the undeformed volume and x, y, z, w are the corners of the tetrahe-
dron.

Spindles and slivers (Figures 4.1(d) and 4.1(e)) have relaxed directions perpendic-
ular to the plane spanned by the crossing edges. As such all four heights have to be
constrained, which we obtain by adding a volume constraint as in (4.3).

For wedges (Figure 4.1(c)) we use one edge constraint along the shortest edge. The
length constraint C_e(x, y) for an edge (x, y) is

    C_e(x, y) = ( ‖y − x‖₂ − l₀ ) / l₀ = 0,    (4.4)

where l₀ is the undeformed length of the edge.

For needles (Figure 4.1(b)) we use three edge constraints to approximate the relaxed
heights as in (4.4). Edge constraints are shared between tetrahedra and thus reduce
the overall number of constraints.

Finally, for caps (Figure 4.1(f)) we set a constraint along the shortest height. The
function C_h(x, y, z, w) maintaining the initial height h₀ between a tetrahedron cor-
ner w and its opposite face (x, y, z) is

    C_h(x, y, z, w) = ( ((y − x) × (z − x)) · (w − x) / ‖(y − x) × (z − x)‖₂ − h₀ ) / h₀ = 0.    (4.5)
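A minimal sketch of the three constraint functions follows; the helper names are hypothetical, and V₀ is taken as the undeformed value of the triple product, in line with the normalized form of (4.3).

```python
import numpy as np

def c_volume(x, y, z, w, V0):
    """Normalized volumetric constraint; zero in the rest configuration."""
    return (np.dot(np.cross(y - x, z - x), w - x) - V0) / V0

def c_edge(x, y, l0):
    """Normalized edge-length constraint."""
    return (np.linalg.norm(y - x) - l0) / l0

def c_height(x, y, z, w, h0):
    """Normalized height constraint of corner w over the face (x, y, z)."""
    n = np.cross(y - x, z - x)
    return (np.dot(n, w - x) / np.linalg.norm(n) - h0) / h0
```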

These constraints are symbolically differentiated and stacked in the constraint Jacobian
matrix ∇C(x(t)). Then, the fast projection method (see [Goldenthal et al. 2007]) is em-
ployed to compute feasible positions. Fast projection is an algorithm that iteratively
converges to C(x(t + h)) = 0. In each iteration j, the system

    h² (∇C(x_j) M⁻¹ ∇C(x_j)ᵀ) δλ_{j+1} = C(x_j)    (4.6)

is solved using the sparse Cholesky decomposition from [Guennebaud et al. 2010], where
M is the mass matrix and x_j are the positions at iteration j. The solution δλ_{j+1} is used to
compute updates to the positions,

    Δx_{j+1} = −h² M⁻¹ ∇C(x_j)ᵀ δλ_{j+1}
    x_{j+1} = x_j + Δx_{j+1}.    (4.7)

With each iteration, the constraints are better enforced. Solving the system of equations
scales linearly with the number of non-zero blocks. Thus, the method is linear in the
number of constraints. Experiments indicate that performing a small number of iterations
provides sufficient accuracy. Notice that in contrast to [Thomaszewski et al. 2009], we
do not impose constraints on the well-shaped tetrahedra, leading to significantly fewer
constraints.
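The iteration (4.6)–(4.7) can be sketched with dense matrices as follows; this is a small illustrative sketch of fast projection, not the sparse blocked implementation, and the helper names are hypothetical.

```python
import numpy as np

def fast_projection(x, M_inv, C_fn, gradC_fn, h, iters=5):
    """Iteratively project positions x onto the constraint manifold
    C(x) = 0, following the fast projection scheme."""
    for _ in range(iters):
        C = C_fn(x)                    # constraint values, shape (m,)
        J = gradC_fn(x)                # constraint Jacobian, shape (m, n)
        A = h * h * (J @ M_inv @ J.T)  # system matrix of Eq. (4.6)
        dlam = np.linalg.solve(A, C)
        x = x - h * h * (M_inv @ J.T @ dlam)   # position update, Eq. (4.7)
    return x
```

For a single edge-length constraint the iteration converges in essentially one step, since each pass is a Newton step towards the constraint manifold.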

4.1.3 Characteristics

In the following, we show how the filtered stiffness matrices augmented with constraints
influence the global deformation behavior. The material parameters were chosen to be
comparable to soft tissue under low-frequency interaction. We set the Young's modulus
to E = 30 kPa, the Poisson ratio to ν = 0.3, and the density to ρ = 1000 kg m⁻³. We
defer examples using higher Poisson ratios to Section 4.5.

[Color scales: 0–1 mm and 0–2 mm]

(a) Rest state (b) Deformation using force constraints (c) Deformation using displacement constraints

Figure 4.2: Comparison between the corotational and the filtered FEM under external con-
straints. The error is largest near the area where the constraints are applied, for both force
and positional constraints. Compared to the tip movement of 2.6 cm
and the size of the mesh, the errors are small.

We show the behavior of the constrained method with respect to force- and position-based
external inputs. We compare the resulting motion to that of an unmodified corotational
FEM simulation running at a smaller time step. Ideally, both the constrained and the unmod-
ified simulation are stable and visually indistinguishable. We inspect the behavior of the
deformation model by running a simulation using the Liver (C) mesh subject to external
inputs. While the force-based constraints are chosen such that the simulation produces
a reasonable and stable deformation, the position-based constraints are chosen such that
the resulting simulation is similar. In both cases, the mesh is fixed at the back and the
load is applied at the front, pulling the nodes to the right. While the position constraint
applies a constant displacement per time step, the applied force increases linearly up
to a maximum value. The critical time step for these settings is 0.3 ms. The simulation
time step was set to 0.8 ms, at which the constrained method filters 25% of the elements
(see Figure 4.2(a)).

The effect of a force load on a small number of nodes is depicted in Figure 4.2(b). It
shows the steady-state deformation after 2 s. Visually, the results cannot easily be distin-
guished; the spatial difference is therefore visualized using a color-coding scheme. The
average error and standard deviation are plotted in Figure 4.3(a).

Instead of using external forces, we then apply a direct position-based load, i.e., nodes are con-
strained to a fixed path. In Figure 4.2(c) the same nodes are constrained as in the previous
example. Their motion is along the same direction as the force was before. Mean error
and standard deviation are shown in Figure 4.3(b). Note that in the case of direct position
displacements the errors are naturally zero at the nodes where the constraints are applied.
Hence, we exclude these nodes when determining the mean error.

[Plots: error in mm over a simulation time of 0–2 s]

(a) Mean error and standard deviation using force-based constraints (b) Mean error and standard deviation using position-based constraints

Figure 4.3: Mean error of a sample deformation using the filtered FEM method. The
errors using both types of constraints, force- and position-based, are comparable. Note
that the motion is not exactly the same due to the different types of external constraints.
The distance between the initial and current position of the tip is approximately 2.6 cm
in both cases.

Comparing the mean errors of both samples to the mesh size and the applied deformation
gives us a hint on how noticeable the errors are. The bounding box of the mesh is 20 ×
15 × 16 cm³ and the displacement of the tip is approximately 2.6 cm for both simulations.
The mean errors are below 1 mm for the force constraints and 2 mm for the position-
based constraints. These are only visible in a direct overlay rendering. If the nodes are
displaced much further, the errors start to appear visually and increase rapidly to a point
where the simulation is no longer stable. For both types of external constraints, the
error grows with the amount of overall deformation. Especially in areas where filtered
elements are compressed, this leads to artifacts in the constraint resolution. The constraints
cannot be resolved stably, leading to popping artifacts and finally unsolvable constraints. This
type of limitation is explained in more depth in the following section. It renders this
method difficult to apply in cases where high compression is present.

4.1.4 Limitations

Using constraints to modify deformations is limited in some cases. In [Goldenthal et al.
2007] it is mentioned that too restrictive constraints lead to unsolvable systems and lock-
ing problems. In their case, they change the constraint set from edge-based to quad-based to
prevent these locking problems. Similarly, [Irving et al. 2007] do not use per-tetrahe-
dron constraints for their volume preservation algorithm. In order to circumvent locking,
their constraints are defined on the 1-ring of a node. In our case, a growing number of
constraints in a single region may result in locking problems.
This locking problem is twofold. On the one hand, the system of equations may simply
be overconstrained, resulting in a singular matrix due to conflicting constraints. On the
other hand, although the system is still solvable, the resulting behavior may be too stiff.
In turn, this limits the applicability of this method to configurations where connected
ill-shaped elements are sparse. Similar in spirit are conflicts between external
positional constraints and the internal constraints. Internal constraints and adjacent ex-
ternal constraints are modeled separately and thus may easily conflict, leading to locking
problems.

(a) Rest state (b) Corotational FEM (c) Filtered FEM

Figure 4.4: Effect of rigidification in the large-stretch simulations. In Figure (a), the
mesh is in its rest state. Figures (b) and (c) show the deformation after the mesh has
been stretched by a factor of 2 using the unmodified and the filtered FEM, respectively. The
enlarged region shows the elements which are subject to the rigidification constraints.
Note that the wedge elements are less stretched when the constraints are enforced.

Moreover, the constraints try to maintain the initial configuration of the tetrahedra (up
to the rigid modes), which works well in cases where only small strains are present.
Although the method can handle deformations with large displacements, large strains
conflict with the constraints. This effect shows differently for stretch and compression.
The effect when large stretches are involved can be seen in Figure 4.4. The thin wedge
elements in the middle are ill-shaped and augmented with edge constraints along their
short edges. The mesh is stretched by a factor of 2, once using constraints and once
using a small-time-step reference simulation. Comparing the resulting deformations shows
that the constrained elements are less stretched than the corresponding elements of the
reference simulation. However, note that the thin elements do not contribute much to the
total stretching, and thus this limitation has only minor visual effects.

(a) Unmodified constraints

(b) Modified constraints

Figure 4.5: Effect of rigidification in the large-compression simulations. In contrast
to the simulation in Figure 4.4, the wedges are compressed. The top row shows three
close frames of a simulation using constraints to rigidify the vertical edges of the wedges.
The middle nodes oscillate due to the constraint resolution method. The bottom row
depicts the same simulation frames as the top row; however, this time the constraints were
modified to match the overall compression of the simulation, which leads to a reduction
of the artifacts.

A different effect can be observed for constrained elements which are subject to high
compression. Instead of stretching, as in Figure 4.4, we compress the mesh, as depicted
in Figure 4.5. Again, we have edge constraints along the vertical short edges of the
thin wedges. Solving the system of constraints displaces the nodes along the edges. Any
lateral movement of such constrained points inevitably moves them away from a common
line, inducing additional forces. Thus, it would be necessary to constrain the movement
of the points such that no lateral movement occurs. In this experiment, this leads
to jittering of the middle nodes under high compression. These artifacts are depicted in
Figure 4.5(a), which shows three close frames of the simulation. Note that the middle row
of highlighted nodes in each line does not move steadily; rather, the nodes oscillate, causing
jittering artifacts.
There are two possible solutions to this problem. First, we could replace the edge con-
straints with volume constraints, which increases the possibility of locking. Second, the
goal length for the constraint resolution of the edges can be adjusted to match the com-
pression of the surrounding elements, which reduces these artifacts. An example using
this method is depicted in Figure 4.5(b). The figure shows the same frames as Fig-
ure 4.5(a); the simulation behaves more smoothly, with less jittering of the middle nodes. The
goal lengths of the constraints were set according to the overall compression of the model.
However, this way of adjusting the constraints is only preliminary and not practical for
meshes of larger size and arbitrary configuration.

4.2 Hybrid FE/Position-Based Deformation Method


In the previous section, we have proposed an approach to handle ill-shaped elements by
modifying the element's stiffness matrix. These modifications are performed by relaxing
high eigenmodes and introducing constraints to prevent element inversion along the re-
laxed modes. Locking problems and the necessity to solve a linear system of equations
still make the method cumbersome. Thus, we opt for a less strict constraint method.
In this section, we present a hybrid deformation model mixing corotational FE and a
position-based deformation method in order to perform stable explicit integration. For
brevity, we will refer to it as the FE/PBD method. In contrast to the previous method, we only
need to identify the ill-shaped elements; we do not need the exact modal frequencies.
Similar to the filtering approach, this solution strategy consists of two steps. First, we
determine the ill-shaped elements of a given tetrahedral mesh with corresponding mate-
rial parameters and target time step. The underlying algorithm builds upon the efficient
element-wise reduced 1-ring analysis of the local neighborhood of the tetrahedral ele-
ments (see Section 3.3.3). In summary, given a target time step h_t, we iterate over all
elements e of the mesh, build the system matrix of the reduced 1-ring of e, and compute
the highest vibration frequencies of the nodes of e. Based on this, we obtain an elemen-
tal limit time step h_e. If h_e > h_t, we consider e to be well-shaped, i.e., e ∉ M̄, and
otherwise e ∈ M̄, where M̄ denotes the set of ill-shaped elements. In the second step, we
combine a standard FEM and a simple geometric restitution model in a hybrid deformation
method. That is, the deformations of elements that can be stably simulated are computed
with a standard corotational linear FEM [Müller and Gross 2004]; in contrast, the
deformations of the remaining elements are determined with a geometric deformation
model based on [Müller et al. 2005] and [Rivers and James 2007], and the resulting
displacements are blended with the restitution forces at the interface nodes. In order to
guarantee the geometric and dynamic continuity of the resulting hybrid deformation
model, we employ a novel parametrization of the geometric deformation model such
that the superposition of the resulting nodal restitution displacements with the restitution
forces resulting from the FE formulation preserves the global static and dynamic
deformation properties of the model. Next, we introduce the restitution model with
which we augment the FEM model.

4.2.1 Position-Based Deformation Method

The original method presented in [Müller et al. 2005] takes a point cloud without explicit
connectivity as input. A rotation matrix R and a translation vector t are computed such
that the difference between the current configuration x(t) and the translated and rotated
initial configuration is minimal in a least-squares sense. The displacement d_i of point i
from its current position x_i(t) towards its transformed initial position x_i(0) is obtained
from the difference between the current and the transformed initial position,

    d_i = R x_i(0) + t − x_i(t).    (4.8)

The drawback of this method is that deformations are intrinsically global and local de-
formations cannot be modeled. Consequently, [Müller et al. 2005] proposed to partition
the mesh into clusters and extract the transformation per cluster in order to allow local
deformations. A method tackling this problem based on a regular grid clustering has also
been proposed by Rivers and James [2007].

Clustering

We adopt the clustering technique in order to realize the restitution of the ill-shaped el-
ements. However, instead of directly working on the ill-shaped elements, we define the
clusters as sets of nodes. In accordance with Rivers and James, who control material stiff-
ness by sizing the clusters wider or narrower, we build the clusters as small as possible.
This allows us to take the original connectivity structure into account, where each node is
influenced by its directly connected nodes. We define each node i of an ill-shaped element
e_j ∈ M̄ as the center of a cluster C_i; each node directly connected to i that is also adjacent
to an ill-shaped element e ∈ M̄ is added to C_i. Intuitively, C_i contains all points around
i that are incident to at least one ill-shaped element connected to i. Because a cluster is
defined for each node i adjacent to an ill-shaped element, there are at most 4|M̄| clus-
ters; by construction, they are overlapping. Figure 4.6 shows a typical configuration in
2D. The elements e'_1, e'_2, e'_3 have been identified as ill-shaped, and thus are simulated with
the position-based deformation model. The cluster C_i for the point i contains the point i
itself, plus the points j, k, l, and m that are adjacent to i and part of an ill-shaped element.
Note that only nodes adjacent to ill-shaped elements are added to the clusters.
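The cluster construction described above can be sketched as follows; this is a pure-Python sketch with hypothetical names, where tetrahedra are given as tuples of node indices and ill-shaped elements as indices into that list.

```python
def build_clusters(tets, ill_shaped):
    """One cluster per node of an ill-shaped tetrahedron: the node itself
    plus every directly connected node that also touches an ill-shaped
    element."""
    # nodes adjacent to at least one ill-shaped element
    ill_nodes = {v for e in ill_shaped for v in tets[e]}
    # node -> directly connected nodes (edges induced by the tetrahedra)
    neighbors = {}
    for tet in tets:
        for a in tet:
            for b in tet:
                if a != b:
                    neighbors.setdefault(a, set()).add(b)
    return {i: {i} | {j for j in neighbors[i] if j in ill_nodes}
            for i in ill_nodes}
```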
For each cluster C_i, we extract the optimal rotation R^C and translation t^C, and obtain the
displacement d_i^C of the point i with respect to the cluster C_i as

    d_i^C = R^C x_i(0) + t^C − x_i(t).    (4.9)

[Figure: 2D sketch of the cluster around node i, surrounded by the ill-shaped elements e'_1, e'_2, e'_3]

Figure 4.6: Exemplary 2D cluster for the central node i with ill-shaped elements e'_1, e'_2, e'_3
∈ M̄; the nodes j, k, l, m belong to the cluster C_i.

Since this process is repeated for all clusters, we obtain displacements d_i^C for all points
and all clusters. However, simply adding up all the displacements would overshoot
the desired goal position. In order to address this problem, Müller et al. take the
average final displacement, while Rivers and James weight the resulting displacements
with the node masses. However, both solutions do not preserve linear momentum. In
order to prevent overshooting and preserve linear momentum, we scale the displacements
of each cluster with the maximum number n^{C_i} of overlaps with other clusters. To obtain
the restitution displacement d_i, we sum all the d_i^C,

    d_i = Σ_C d_i^C / n^{C_i}.    (4.10)

The resulting displacement preserves linear momentum and does not overshoot the goal
position. Finally, the restitution displacements are multiplied by h⁻¹ to obtain the resti-
tution velocities of the mass points, which are finally time-integrated.
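The overlap-scaled combination of the per-cluster displacements can be sketched as follows, assuming the displacements d_i^C and the overlap counts n^{C_i} are already computed (names hypothetical).

```python
import numpy as np

def combine_displacements(cluster_disp, overlap, n_nodes, dim=3):
    """Sum per-cluster displacements, scaling each contribution by the
    node's overlap count so the result does not overshoot the goal."""
    d = np.zeros((n_nodes, dim))
    for disp in cluster_disp:          # one dict {node: d_i^C} per cluster
        for i, d_ic in disp.items():
            d[i] += np.asarray(d_ic) / overlap[i]
    return d
```

A node shared by two clusters that both request the same unit displacement ends up displaced by exactly that unit, not twice it.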

Rotation extraction

At this point, we have to consider the case of a single isolated ill-shaped element. Espe-
cially for sliver- and wedge-type elements, the geometric arrangement of these points will
be close to co-linear or co-planar, and thus a standard procedure to extract the rotation R^C
as found in [Müller and Gross 2004] will fail. Consequently, we follow Irving et al. [2004]
and compute R^C using a singular value decomposition. We decompose the linear trans-
formation matrix from the rest state to the current deformed configuration into U Σ V^T,
where U and V are orthonormal matrices and Σ contains the singular values. The
rotation is obtained as the combination of the orthonormal matrices U and V,

    R^C = U V^T.    (4.11)

For a valid rotation, det(U V^T) = 1 has to hold. If an element is inverted along a single
direction, the determinant becomes −1. Irving et al. assume that inversions commonly
happen only along a single direction. Thus, if the determinant is negative, we multiply
the column of V which corresponds to the smallest singular value by −1. In doing
so, we eliminate the inversion along the shortest distance in a cluster. This makes the rotation
extraction more robust, especially when small and thin elements, which invert easily, need
to be handled.
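The SVD-based rotation extraction can be sketched as follows; this is a sketch following Irving et al.'s reflection fix, with hypothetical names, and works for point sets of any dimension (rows are points).

```python
import numpy as np

def extract_rotation(P0, P1):
    """Least-squares rotation from rest points P0 to current points P1,
    using an SVD of the covariance of the centered point sets."""
    q0 = P0 - P0.mean(axis=0)
    q1 = P1 - P1.mean(axis=0)
    A = q1.T @ q0                  # covariance between the two point sets
    U, S, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0:  # reflection: flip the column of V that
        Vt[-1, :] *= -1            # belongs to the smallest singular value
    return U @ Vt
```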

Elasticity parameter estimation

So far, we have computed displacements in (4.10) that, if applied to the mass points of
the elements, would immediately restore their undeformed positions. However, this is
not exactly our desired result. Instead, the goal is to mimic the behavior of elements in an
elastic material, although the computation is non-physical. Thus, we try to approximate
this behavior by scaling the displacements in (4.10) by a constant α^C ∈ [0, 1],

    d_i = Σ_C α^C d_i^C / n^{C_i}.    (4.12)

We propose a heuristic analytic formula to compute α^C for each cluster C such that the
dynamic behavior of a physically-based elastic material is approximated. We start by writ-
ing down the position update rule in the geometric model for a node i (in the following,
we drop the indices i and C),

    x(t + h) = x(t) + α d.    (4.13)

To determine an appropriate value for α, we compare this update rule with the computa-
tion step in an implicit Euler time-integration method,

    (I + h² M⁻¹ K)(x(t + h) − x(0)) = x(t) − x(0).    (4.14)

The rationale behind this approach is that (4.13) can effectively be regarded as a simple
numerical time-integration method, where α is the integration parameter. Consequently,
by determining α such that the result of the update rule (4.13) is as close as possible to the
one of the physically-based update rule (4.14), we hope to approximate its behavior. The
update rule (4.14) holds only for small deformations. To account for large deformations,
we rotate the initial positions x(0) by the rotation matrix R and translate them by t. The
stiffness tensor K is updated accordingly,

    x(t + h) − (R x(0) + t) = (I + h² M⁻¹ K′)⁻¹ (x(t) − (R x(0) + t))
                            = −(I + h² M⁻¹ K′)⁻¹ d,    (4.15)

where K′ = R K R⁻¹ and, by (4.8), x(t) − (R x(0) + t) = −d.

Note that this relation corresponds to the original corotational formulation for linear elas-
ticity. We use the expression for x(t + h),

    x(t + h) = R x(0) + t − (I + h² M⁻¹ K′)⁻¹ d,    (4.16)

to determine α by plugging (4.16) into (4.13), which results in

    x(t) + α d = R x(0) + t − (I + h² M⁻¹ K′)⁻¹ d.    (4.17)

Finally, to obtain an expression including α, we subtract x(t) and simplify, yielding

    α d = (I − (I + h² M⁻¹ K′)⁻¹) d.    (4.18)

In order to determine the scalar value α, we examine the matrix on the right-hand side
more closely. The time an element needs to return to its rest-state configuration is domi-
nated by the highest eigenvalue λ_max of M⁻¹K. Thus, we use this notion to approximate
the time a cluster of the geometric model needs to return to its rest state. Then, since R
is an orthonormal transformation that does not change the eigenvalues, we can consider
K instead of K′ in the derived relation. We finally obtain

    α = 1 − (1 + h² λ_max(M⁻¹K))⁻¹.    (4.19)

In a sense, α is related to the fastest changing mode of a cluster. This creates a smooth
transition between the well- and ill-shaped elements. Elements which have a critical time
step much smaller than the simulation time step will be nearly rigid, while those with a
critical time step closer to the CFL condition will be softer. Note that it is not possible to
capture all material properties with a single parameter.
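Equation (4.19) amounts to a one-line blend factor; a minimal sketch (hypothetical function name):

```python
def elasticity_blend(h, lam_max):
    """alpha = 1 - (1 + h^2 * lambda_max)^(-1), Eq. (4.19).
    Approaches 1 (near-rigid snap-back) for very stiff clusters and 0
    for clusters whose modes are slow relative to the time step h."""
    return 1.0 - 1.0 / (1.0 + h * h * lam_max)
```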

Combined Method

Given the integration time step and thus the set of ill-shaped elements, clusters are com-
puted as a preprocessing step as described in Section 4.2.1; during runtime only necessary
clusters need to be updated. The displacements of the well-shaped elements (using the
FEM) and of the ill-shaped elements (using the geometric model) are computed
independently. The superposition of the results of both methods defines the final
displacement of the hybrid FE/PBD method.
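The combined update can be sketched as follows (Python; the step functions and the node/cluster bookkeeping are hypothetical placeholders for the actual FEM and geometric solvers, and displacements are scalars here instead of 3-vectors for brevity):

```python
def hybrid_step(nodes, fem_part, clusters, step_fem, step_pbd):
    """One step of the hybrid FE/PBD scheme (schematic sketch).

    step_fem(fem_part) -- advances the well-shaped part, returns a
                          dict {node: displacement}
    step_pbd(cluster)  -- advances one geometric cluster, same format
    Both parts are evaluated independently and their displacements are
    superposed to obtain the final displacement per node.
    """
    du = {i: 0.0 for i in nodes}
    for i, d in step_fem(fem_part).items():     # FEM, well-shaped elements
        du[i] += d
    for cluster in clusters:                    # geometric model, ill-shaped
        for i, d in step_pbd(cluster).items():
            du[i] += d
    return du
```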

4.2.2 Characteristics

In this section we show that our combination of the standard corotational FEM with a
geometric deformation model does not significantly influence the global deformation be-
havior. All timings were obtained using an Intel Core2 Quad Q9400 Processor running

at 2.66 GHz. The material properties are comparable to soft tissue for low-frequency in-
teraction. We set the Young's modulus to E = 30 kPa, the Poisson ratio to ν = 0.3, and
the density to ρ = 1000 kg/m³. Comparisons using higher Poisson ratios are given in
Section 4.5.
In the following experiments, we illustrate that the geometric errors introduced by the
non-physical geometric deformation model have only a minor influence on the global
deformation behavior. We provide two typical experimental setups. The first example
considers a bar subject to torsional loads. In the second example, we show a bar under-
going bending deformation. Both meshes are compared to reference simulations using
explicitly integrated FEM with a small time step.

Torsion

[Color scale: 0 to 0.4 cm]

Figure 4.7: Twisting deformation at different time steps. The time steps are 0.75 ms,
0.8 ms, and 0.9 ms, respectively. In the top row the elements treated with the geometric
model are highlighted. In the bottom row the absolute geometric error is shown, computed
with respect to a reference FEM simulation. The error is color-coded according to the
shown scale. The experiment indicates that the error grows with larger time steps.

We consider the Bar (F) mesh as shown in Figure 3.5 with a size of 4 × 4 × 20 cm³.
While fixing one end, we rotate the other end of the bar around its central axis for 10 s
with an angular velocity of 0.25 rad/s, thus twisting it. For the mesh, we determine
a limit time step of 0.55 ms. We now perform the dynamic simulation of the twisting
process with target time steps of 0.75 ms, 0.8 ms, and 0.9 ms, respectively. This requires
the replacement of 8.5%, 16.8%, and 45.4% of the elements, respectively. The elements

classified as ill-shaped and handled with the geometric model are highlighted in the top
row of Figure 4.7.


Figure 4.8: The evolution of the mean error (along with the standard deviation) is plotted
over time for the three deformations in Figure 4.7. The error grows linearly as long as the
bar is being deformed and stabilizes after the twisting stops.

The larger the target time step, the more elements are replaced and simulated with the ge-
ometric deformation model, potentially leading to geometric inaccuracies. The mean ver-
tex distance error and its standard deviation for the time step of 0.75 ms are
μ = 0.0632 cm and σ = 0.0388 cm. For the time steps of 0.8 ms and 0.9 ms we ob-
tain μ = 0.1032 cm, σ = 0.0696 cm and μ = 0.124 cm, σ = 0.0836 cm, respectively.
The bottom row of Figure 4.7 shows the measured per-vertex differences to the reference
simulation, color-coded. As can be seen, the geometric differences remain within
reasonable bounds. Figure 4.8 depicts the development of the error over simulation time.
It grows linearly and stabilizes when the maximum deformation is reached. For the ex-
periments, α was set according to Section 4.2.1. In comparison, when simply setting this
parameter to 1, the error in these examples grows by 9%, 10%, and 19%, respectively.

Bending

For this experiment the same test mesh is used. As in the previous case we fix one end of
the bar. The other end is rotated downwards around the geometrical center of the bar in
rest state with an angular velocity of 0.25 rad/s.

A time step of 0.8 ms is used and the simulation is run for 7 s, resulting in a frame rate of
500 Hz. For the time step we have to handle 16.8% of the elements with our geometric
deformation model (as in the previous experiment). The result of the simulation is shown
on the left in Figure 4.9. Again, the absolute error measured in cm is shown color-coded.
The mean error is 0.152 cm with a standard deviation of 0.112 cm. The strongly bent, high
strain region of the bar shows increased differences, but the overall error still remains
reasonable. The same linear dependency between error and time is encountered in this
experiment as depicted in Figure 4.10.

[Color scale: 0 to 0.6 cm]

Figure 4.9: Error in a bending deformation example. The color-coding visualizes the
absolute geometric error, computed with respect to a reference FEM simulation. Elements
using the geometric model are highlighted. The example uses a time step of 0.8 ms,
resulting in only a few ill-shaped elements.

Figure 4.10: The evolution of the mean error (along with the standard deviation) is plotted
over time for the bending deformation in Figure 4.9. The error grows linearly as long as
the bar is being deformed and stabilizes after the bending stops.

4.2.3 Limitations

The presented method has difficulties controlling the deformation in areas with
long, thin clusters. The model of a polyp in Figure 4.11(a) illustrates these limitations very
well. In this dynamic simulation, we fix the base vertices and apply a gravitational force
to the model. In addition, a force pulling to the left is applied to one of the nodes at the
top. The value of this force increases linearly from 0 N to 5 N within one second. The
parameters are E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³. The critical time step is
0.1 ms. The actual simulation time step is 0.4 ms, which means that 18 of 136 elements
are considered ill-shaped. As can be seen, the hybrid model is stiffer in the lower part of
the model; the slim area at the bottom resists bending and is too rigid. This is due to the
fact that the constrained elements are more rigid and in this case occupy a considerable
part of the total volume of the mesh.
Similarly, this stiffening artifact shows up when large strains are applied. Due to the
limited elasticity parameter of the geometric part and the averaging of the displacements
of the individual clusters, volume is not preserved well. For the example in Figure 4.12

(a) Rest state (b) Corotational linear FEM (c) Hybrid FE/PBD Method

Figure 4.11: Rest state (a), corotational linear FEM solution (b) and hybrid FE/PBD
method (c). The polyp is deformed by exerting a linearly increasing force on a single
node. In the middle the reference simulation is run at a time step of 0.1 ms, on the right
the hybrid simulation uses a time step of 0.4 ms. In the lower part the material is too stiff
due to the introduced constraints.

(a) Corotational linear FEM

(b) Hybrid FE/PBD Method (h = 0.75 ms)

(c) Hybrid FE/PBD Method (h = 1 ms)

Figure 4.12: The three figures show the Bar (F) model stretched by 50%. The reference
solution using the corotational FEM is the topmost. The middle and bottom images show
the deformation using the hybrid FE/PBD method with time steps of h = 0.75 ms and
h = 1 ms.

again the Bar (F) model with the same parameters as before is used. The applied bound-
ary constraints force the model to stretch by 50%. The top figure shows the resulting

deformation using the corotational FEM at a small time step. The figures in the middle and
at the bottom depict the results using the hybrid FE/PBD method at target time steps of
0.75 ms and 1 ms, respectively. The regions where the elements are simulated using the
geometric model are stiffer. These elements stretch more isotropically; their diameter
orthogonal to the stretching direction shrinks less than that of a corresponding element
using the FEM.

4.3 Mixed Implicit-Explicit Integration

So far, we presented two methods combining FE with geometric constraint approaches. In
this section, we opt for a different strategy. Instead of selecting one particular scheme in
advance, we combine both explicit and implicit integration methods within the same sim-
ulation, in order to exploit their intrinsic benefits. This idea is known by the term IMEX
(IMplicit-EXplicit integration) and is used in many areas. For instance, in computational
engineering, IMEX schemes are used to handle coupled simulations, where fluid and solid
parts are connected by interface nodes [Belytschko and Mullen 1978]. In computer ani-
mation, IMEX has been used to combine linear with non-linear forces in cloth simulation,
e. g., to employ an implicit scheme to integrate stiff linear in-plane forces while explicitly
integrating the non-linear but weaker out-of-plane forces [Eberhardt et al. 2000].

We use the same concept to build an element-wise IMEX scheme for dynamic simulation
of deformable objects in order to handle ill-shaped elements. Similar to the approaches
in the previous sections, this method is based on the observation that we can separate the
elements into well- and ill-shaped ones based on the integration time step. In Section 3.4,
we have shown that this is possible efficiently and accurately given a target time step ht. As
before, this can be done as a pre-computation step, since the elemental stiffness matrices
are constant.

During simulation, the nodes connected to any identified ill-shaped element are inte-
grated implicitly, while the remaining ones are evolved with an explicit scheme. Be-
lytschko and Mullen [1978] proved that such a combination of explicitly and implicitly
integrated nodes within the same mesh is stable as long as the highest oscillation fre-
quency of the explicit partition fulfills the CFL criterion. In cases of topological changes,
it is not necessary to reevaluate all the elements. Instead, only newly created elements
and those where the neighboring elements changed need to be evaluated. For mesh sizes
used in interactive simulations only limited computational effort is required, thus making
the approach applicable to real-time modifications such as cutting procedures. Before
we present the IMEX scheme, we look at how to divide the nodes into explicitly and
implicitly integrated nodes.

4.3.1 Element identification

By using the reduced 1-ring metric, we first determine all ill-shaped elements of a sim-
ulation mesh. We define two sets E and Ē containing the well- and ill-shaped elements,
respectively. The identification is accomplished by using the reduced 1-ring approach
presented in Section 3.3.3 as it shows the best accuracy. Next, we group the mesh nodes
into a set N̄ of nodes needing higher update rates than a target time step ht and a set
N of nodes being integrable with the target time step. In order to keep the implicit set
as small as possible, the accurate identification of ill-shaped elements is a central element
of our method. Having as few false positives as possible is crucial, since the runtime
cost of the implicit part scales with the number of nodes in the implicit set N̄. For each
tetrahedron in Ē we place all its nodes in the set N̄ of nodes connected to ill-shaped
elements. The complementary set N is initialized with all the mesh nodes not in N̄.
Additionally, we assemble two sets of tetrahedra T and T̄. The former set contains all
tetrahedra connected to the nodes in N, while the latter contains all elements connected
to the nodes in N̄. Note that T̄ is a strict super-set of Ē. Naturally, elements which are
connected to nodes in both N and N̄ are found in both sets and thus counted twice.
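The set construction can be sketched as follows (Python; the mesh representation and all names are assumptions, not the thesis implementation):

```python
def partition(tets, ill):
    """Build the IMEX sets from a tetrahedral mesh.

    tets -- list of tetrahedra, each a tuple of four node indices
    ill  -- set of indices into tets marking ill-shaped elements (E-bar)

    Returns (N, N_bar, T, T_bar): the explicit and implicit node sets
    and the tetrahedra connected to each of them.  Interface elements
    appear in both T and T_bar.
    """
    n_bar = {v for e in ill for v in tets[e]}      # nodes of ill-shaped tets
    nodes = {v for tet in tets for v in tet}
    n = nodes - n_bar                              # remaining, explicit nodes
    t = {e for e, tet in enumerate(tets) if any(v in n for v in tet)}
    t_bar = {e for e, tet in enumerate(tets) if any(v in n_bar for v in tet)}
    return n, n_bar, t, t_bar

# two tetrahedra sharing a face; element 1 is ill-shaped
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
n, n_bar, t, t_bar = partition(tets, {1})
```

In this toy mesh, element 0 touches nodes of both sets and therefore shows up in both T and T̄, matching the double counting noted above.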

4.3.2 Matrix-free IMEX solver

The main part of this approach is the time evolution of the mesh nodes grouped into
explicitly integrated nodes N and implicitly integrated nodes N̄. In the following, we
use i to denote nodes in N and ī to denote nodes in N̄; k denotes the total number
of tetrahedra, and n and m the number of nodes in N and N̄.
Although the principles should work with any deformation model, we use the corotational
linear FEM (see Section 2.2.4) as in the previous approaches. The first step is then to
extract the rotations of the elements, which are used for both the explicit and the implicit
integration step. We compute them at the beginning of each time step, assuming that
the rotations remain constant throughout the step. The error introduced by this assumption
causes a small energy dissipation [Chao et al. 2010]; however, the effect is negligible for
the very small time steps we use for explicit integration. Next, we perform an explicit
integration step for the nodes i ∈ N. We use the symplectic Euler integration scheme, but
any single-step scheme could be applied. The integration is done in a node-wise fashion, i. e.,
v_i(t + h) = v_i(t) + (h/m_i) (f_i − K_i u(t))    (4.20)
x_i(t + h) = x_i(t) + h v_i(t + h),    (4.21)

where K_i u(t) ∈ R³ are the rotationally corrected nodal forces computed from the stiff-
ness matrix and displacements assembled from the contributions of the adjacent elements,
and m_i is the nodal mass. The nodal stiffness matrices K_i are assembled from the ele-
mental stiffness matrices adjacent to the nodes in N, i. e. the elements in T. The first step

results in future positions xi (t + h) of all nodes which are adjacent to only well-shaped
elements. The explicitly computed velocities and positions are stable given the properties
of the CFL condition as shown in [Belytschko and Mullen 1978], and they are correct
with respect to the used integration scheme.
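A minimal sketch of the node-wise update (4.20)–(4.21) in Python (force assembly is abstracted away; the function name and the harmonic-oscillator toy setup below are ours):

```python
def symplectic_euler_node(x, v, f_ext, f_el, m, h):
    """Node-wise symplectic Euler step, cf. (4.20) and (4.21).

    x, v  -- current position and velocity of one node (component lists)
    f_ext -- external force on the node
    f_el  -- elastic force K_i u(t) assembled from adjacent elements
    m     -- lumped nodal mass, h -- time step
    """
    v_new = [vi + h / m * (fe - fk) for vi, fe, fk in zip(v, f_ext, f_el)]
    x_new = [xi + h * vi for xi, vi in zip(x, v_new)]  # uses the NEW velocity
    return x_new, v_new

# toy check: a unit mass pulled back by a linear spring (f_el = k * x)
x, v = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
for _ in range(2000):
    f_el = [1.0 * c for c in x]                         # k = 1
    x, v = symplectic_euler_node(x, v, [0.0] * 3, f_el, 1.0, 0.05)
```

For a step well below the stability limit, the amplitude and energy of the toy oscillator stay bounded, illustrating the conditional stability of the explicit partition.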

The positions and velocities of the remaining nodes (the nodes in N̄) are integrated im-
plicitly in the third step. In order to update the implicitly integrated nodes ī ∈ N̄, we
consider a global system, where the previously computed future displacements of the
explicit nodes are conceptually treated as zero-displacement boundary conditions, as pro-
posed in [Belytschko et al. 1979]. We build a linear system of equations with 3m degrees
of freedom (i. e. the nodes in N̄) for the future velocities v̄(t + h) ∈ R^{3m}

(M̄ + h² K̄) v̄(t + h) = M̄ v̄(t) + h f̄ − h K̄ ū(t) − h f̃    (4.22)

ū(t + h) = ū(t) + h v̄(t + h),    (4.23)

where the 3m × 3m matrices M̄ and K̄ are composed of the contributions of the implicit
nodes. Likewise, the vectors v̄(t), ū(t), and f̄ are of size 3m. Further, the vector f̃ is
constructed from the nodal forces f̃_i = K_i u(t + h) of the explicitly integrated interface
nodes, i. e., those explicitly integrated nodes i ∈ N which are adjacent to implicitly in-
tegrated nodes ī ∈ N̄. f̃ effectively encodes the aforementioned boundary conditions.
Since we do not know the appropriate rotations of the interface tetrahedra, we make a
simplification and assume that the rotations applied to the interface nodes do not change.
Of note is that the system (4.22) is of size 3m × 3m. Compared to a full implicit system
of size 3k × 3k, it can be considerably smaller, depending on the number of nodes in N̄.

In order to solve (4.22), we use the conjugate gradient (CG) method following [Shewchuk
1994]. We implemented the algorithm from scratch to incorporate the handling of topology
changes resulting from cutting. The main difference from standard solvers is that we do
not assemble any large matrices, thus reducing the need for additional data structures to
store them. Instead of reassembling the stiffness matrix each time step due to changed
rotations, we compute the matrix-vector product on the fly. Further, since we have full
knowledge of the matrix structure, the algorithm is easy to implement on parallel computing
devices. Using the current velocity as the starting point, the computation of the initial residual
is simplified. A full reference of the algorithm is presented in Section 5.2.

Naturally, the presented algorithm works with any combination of ill-shaped elements. If
all nodes are in N̄, the system is the same as it would be using the implicit Euler
integration scheme. In most use cases, however, the number of ill-shaped elements is
small. Moreover, these elements are typically spread over the mesh and often form groups
of independent clusters separated by non-critical nodes.

Clustered IMEX

In many cases there exist independent clusters of implicit nodes, not connected by tetra-
hedra in the implicit set T̄. Thus, the system of equations resulting from these implicitly
integrated nodes can contain independent sub-systems. The matrix-free formulation makes
it easy to separate these smaller independent systems of equations. Solving each cluster
individually is possible because the implicitly integrated nodes do not influence the prop-
erties of the interface nodes. In general, computing the update variables for each cluster
independently increases the overall convergence rate. We observe that smaller systems
converge faster than larger ones, and thus the number of iterations is limited by the size
of the largest groups. In general, the iterative process terminates with fewer iterations,
speeding up the overall solving process. Implementation aspects of the clustered variant
are covered in Section 5.2.1.
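Finding the independent clusters amounts to a connected-components search over the implicit nodes, where two nodes are linked when they share a tetrahedron of T̄. A sketch (Python; names and the mesh representation are assumptions):

```python
from collections import deque

def implicit_clusters(n_bar, t_bar, tets):
    """Group implicit nodes into independent clusters.

    Two implicit nodes belong to the same cluster if they share a
    tetrahedron in the implicit set T-bar; each cluster then yields a
    small, independently solvable system of equations.
    """
    adj = {i: set() for i in n_bar}            # adjacency over implicit nodes
    for e in t_bar:
        inner = [v for v in tets[e] if v in n_bar]
        for a in inner:
            for b in inner:
                if a != b:
                    adj[a].add(b)
    clusters, seen = [], set()
    for start in n_bar:                        # BFS per unseen component
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        clusters.append(comp)
    return clusters

# two ill-shaped regions separated by an explicitly integrated tetrahedron
tets = [(0, 1, 2, 3), (3, 4, 5, 6), (7, 8, 9, 10)]
n_bar = {0, 1, 2, 7, 8}                        # implicit nodes
t_bar = {0, 2}                                 # tets touching implicit nodes
clusters = implicit_clusters(n_bar, t_bar, tets)
```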

4.3.3 Characteristics

In this section, we compare the performance of our IMEX formulation to fully implicit
and explicit algorithms. We also illustrate the relation between the time step of the nu-
merical integration and the numerical dissipation in implicit schemes. Deformations were
computed using a linear corotational finite element method (see Section 2.2.4). The tim-
ings were obtained on a single core of an Intel Core2 Quad Q9400 CPU running at 2.66
GHz.

Note that this method only provides a benefit within a certain region of time step val-
ues. For time steps smaller than a minimum time step hmin , all simulation nodes can be
integrated explicitly, thus our method corresponds to an explicit method. For time steps
larger than a maximum time step hmax , all nodes must be simulated implicitly, thus our
method corresponds to an implicit method. The values hmin and hmax cannot be obtained
analytically, but instead depend on the geometry of the mesh and on the parameters of the
modeled material. As a rule of thumb, inhomogeneous meshes where elements differ sig-
nificantly in shape and size will result in a larger range [hmin , hmax ], while homogeneous
meshes with only regular sized and shaped elements will result in a very narrow range.

For a given time step h ∈ [hmin, hmax], the element identification method detects the set
of nodes ī ∈ N̄ which cannot be integrated stably with h; those nodes are integrated
implicitly. As a consequence, the performance of the IMEX scheme for h ∈ [hmin, hmax]
depends on both the number and the connectivity of the nodes in N̄; still, performance is
commonly higher than for pure explicit solvers. Note that the timings do not include the time
to identify the ill-shaped elements, because this is commonly done as a pre-processing
step. However, online identification of cut and newly created elements is comparably
cheap. Timings show that the identification test takes about 0.014 ms per element.

Comparison with explicit methods

In order to compare our method to explicit solvers, we perform a series of experiments.
Elastic objects are first deformed, while one part is subject to boundary displacement con-
straints. The loads are then removed and the velocities are set to zero, thus causing the
objects to vibrate about their resting state. In order to demonstrate the performance inde-
pendently of any random cluster configuration, we solve the implicit system of equations
in one block, ignoring any independent clusters. However, we also run all of the subsequent
experiments with cluster support; these results are deferred to a later part of this
section. The simulations for all integration methods start from the same deformed state
and thus with the same deformation energy. As we do not include any structural damping
in the deformation model, the initial energy is preserved throughout the simulation. We first
determine the minimum time step hmin at which all nodes can be integrated explicitly.
Then, the performance of a semi-implicit Euler method running at a time step of hmin is
used as the basis for comparison.

(a) Armadillo (b) Dragon

(c) Liver (M) (d) Bar (F)

Figure 4.13: Performance (full line) of one simulated second of various models at differ-
ent time steps. The dotted line denotes the performance of an explicit method running at
the highest possible time step, e. g. 0.01 ms for the Armadillo model. The diamonds show
the number of detected ill-shaped elements for the different time steps.

We stage this experiment for a number of meshes shown in Figure 3.5 (Armadillo, Dragon,
Liver (M), and Bar (F)). The dimensions of the models are summarized in Table 3.1; the

material parameters for all models are set to E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³.
The results of the experiment using the Armadillo mesh are shown in Figure 4.13(a). The
time to compute one simulation second is plotted for different time steps h > hmin =
0.01 ms along with the number of elements classified as ill-shaped for a selected time
step. Note that for the purpose of comparison the resulting time of performing a purely
explicit integration with hmin is visualized by projecting the value as a line into the plot.
This only serves as a reference (the explicit method becomes unstable for h > hmin ). As
can be seen, this method reduces the computation time by up to a factor of
4.6. For time steps h > hmax = 0.15 ms, all nodes have to be simulated implicitly, thus
our method then corresponds to a purely implicit method.

(a) Armadillo (b) Dragon

(c) Liver (M) (d) Bar (F)

Figure 4.14: Remaining total energy after one simulated second. The dotted line indicates
the energy of the explicit integration. The full lines indicate the total energy of the IMEX
and the implicit method.

Similar results have been found for the Dragon mesh (Figure 4.13(b)) and the Liver (M)
mesh (Figure 4.13(c)). We obtain a maximum speedup of 3.3 and 2.7, respectively. The
Bar (F) model (Figure 4.13(d)) turns out to be a worst-case scenario. The elements in the
mesh have similar shapes and sizes. There are no singular, particularly badly shaped ele-
ments present in the mesh. As a consequence, the limit explicit time step hmin = 0.5 ms is
already comparatively large. Our method only exhibits an improvement over the explicit
method within a relatively small region of 0.5 ms < h < 0.7 ms. For larger time steps
h 0.7 ms, the costs of solving the implicit system dominate, which is not compensated
by the reduced number of required integration steps. For meshes with equally shaped and

sized elements, the range of different element qualities, and thus of admissible time steps,
is very narrow. Thus, our method will likely yield only a small range where performance
improvements can be expected.

Comparison with implicit methods

Implicit methods, being unconditionally stable, allow for considerably larger time
steps. However, for larger steps, numerical dissipation can reduce the simulation quality
of moving objects. This is particularly true when the absolute displacement dominates
the deformation, as, e. g., in a swinging deformable pendulum. In order to quantify these
effects, and show that our method comes with both a reduced numerical dissipation and a
better performance for a range of time steps, we repeat the experiments described above
using implicit integration. We employ an implicit Euler method to evolve the object.
In Figure 4.14, we plot the total energy Etot, obtained as the sum of the kinetic energy
Ekin = ½ vᵀM v and the deformation energy Edef = ½ uᵀK u, after one simulated second,
for both methods at different time steps. Note that the energy level at the start of the
simulation is also plotted as a reference line.
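The energies follow directly from these definitions; a small pure-Python sketch with a lumped (diagonal) mass matrix, using made-up toy values:

```python
def total_energy(v, u, m, K):
    """E_tot = E_kin + E_def = 1/2 v^T M v + 1/2 u^T K u.

    v, u -- stacked velocity and displacement vectors
    m    -- diagonal of the lumped mass matrix M
    K    -- stiffness matrix given as a list of rows
    """
    e_kin = 0.5 * sum(mi * vi * vi for mi, vi in zip(m, v))
    Ku = [sum(kij * uj for kij, uj in zip(row, u)) for row in K]
    e_def = 0.5 * sum(ui * kui for ui, kui in zip(u, Ku))
    return e_kin + e_def

# toy 2-DOF system: E_kin = 1, E_def = 1
e = total_energy([1.0, 0.0], [1.0, 1.0], [2.0, 2.0],
                 [[2.0, -1.0], [-1.0, 2.0]])
```

Tracking this quantity over the simulated second is what produces the dissipation curves of Figure 4.14.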

(a) Armadillo (b) Dragon

(c) Liver (M) (d) Bar (F)

Figure 4.15: Computation time of one simulated second for the same four models using
the IMEX and the implicit integration method.

A higher value indicates that less energy has been lost to numerical dissipation. In
Figure 4.14(a) the measurements illustrate that for time steps h < hmax = 0.15 ms, the

dissipation rate of our method is smaller compared to the implicit scheme. For time steps
h ≤ 0.01 ms, our approach corresponds to an explicit method. For time steps h ≥ hmax, our
scheme corresponds to an implicit integration. A similar behavior is observed for the other
simulations. Employing the IMEX schemes effectively leads to less energy dissipation in
time step ranges where significant portions of the mesh can be integrated explicitly.

While Figure 4.13 compares the IMEX performance only to the explicit performance,
in Figure 4.15 we plot the time to compute one simulation second for the IMEX and
the implicit scheme for comparison. The plot shows that using the IMEX schemes is
always at least as fast as the implicit method and thus provides an excellent transition
between explicit and implicit integration. For most meshes IMEX is faster than explicit
and implicit integration.

Speedup using Clusters

(a) Armadillo (b) Dragon

(c) Liver (M) (d) Bar (F)

Figure 4.16: Speedup using clusters for the previous simulations.

In Section 4.3.2 we extended the IMEX method by solving independent clusters individ-
ually, thereby obtaining smaller systems of equations. We rerun the experiments above by
solving the clusters individually (see Figure 4.16). For the tested models and time steps,
only the Armadillo and the Dragon show a respectable speedup. For the other models the
speedup is less than 12%. These very mixed results show that the speedup using clusters

is difficult to predict and cannot be generalized. The potential speedup largely depends on
the number of clusters. More clusters automatically means smaller clusters and thus fewer
iterations and higher performance. On the other hand, if only few clusters are present the
performance is at best equal to the original method. In this case, the potential gain in
performance is lowered due to a small management overhead for the clusters.

4.4 Time Adaptive Explicit Integration

Another way of computing forces for elements with different time step requirements is
to use an adaptive integration model. In the literature, only a few time-adaptive implemen-
tations are available, mostly for mass-spring models [Bielser and
Gross 2000] or as part of spatially adaptive methods [Debunne et al. 2001]. Our imple-
mentation extends the idea of several integration queues presented in [Bielser and Gross
2000] to work with the corotational linear FEM. Depending on the critical time step of the
elements, we sort the nodes in different queues. We determine for each node an appro-
priate integration time step and integrate each queue with a different update rate. Note
that adaptive schemes are common to improve performance in various fields. Thus, the
method at hand mainly serves as comparative implementation to the solutions presented
in this chapter. However, to our knowledge adaptive schemes have not yet been used for
corotational linear FEM.

We first describe how the node queues are built based on the reduced 1-ring metric (see
Section 3.3.3). Next, we describe the hierarchical scheme used to integrate the nodes
in the queues. Then, basic performance considerations are presented. A more thorough
comparison of performance and accuracy is deferred to Section 4.5.

4.4.1 Building Update Queues

Similar to the methods in Sections 4.2 and 4.3, we first identify an appropriate maximum
integration time step he for each element by using the reduced 1-ring. Given a simulation
time step ht, we define a number of queues for nodes with different time steps. Starting
with ht, we create a first queue Q0 which is integrated with the time step ht. The second
queue Q1 is integrated with ht/2. In general, each queue Qn uses the time step ht/2^n.

We start from the set of tetrahedra E and the set of nodes N. Given the target time step ht, we
build two sets E0 and Ē0 of tetrahedra. E0 contains the elements being well-shaped with
respect to ht, i. e. he ≥ ht, and Ē0 contains the elements being ill-shaped, i. e. he < ht.
All nodes N̄0 adjacent to tetrahedra in Ē0 need to be integrated with a time step smaller
than ht. The rest of the nodes, N \ N̄0, are added to Q0 and by construction are safe to
integrate with ht.

We repeat the above procedure using the target time step ht/2 for the elements in Ē0. We
build the sets E1 and Ē1 containing the well- and ill-shaped elements with respect to the
target time step ht/2. The nodes adjacent to elements in Ē1 form a subset N̄1 of N̄0. As
before, the nodes N̄0 \ N̄1 are safe to integrate with ht/2 and thus are added to Q1.
Proceeding recursively with the set Ē1 and the target time step ht/4, we fill the queues
Q0, Q1, ..., Qn with nodes which are safe to integrate with the time steps ht, ht/2, ..., ht/2^n.
Using a recursive scheme that constantly halves the target time step ht results in an
integration algorithm that is easier to implement. However, empty queues may be generated
by this process, which is rare in practice and does not lead to any significant impact on the
runtime.
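The recursive halving described above can be sketched compactly (Python; names and the mesh representation are assumptions, and the per-element limit steps he are taken as given):

```python
def build_queues(h_elem, tets, nodes, ht):
    """Distribute mesh nodes into update queues Q_0, Q_1, ...

    h_elem -- limit time step h_e per element (e.g. from the reduced
              1-ring metric); queue Q_i is advanced with ht / 2**i
    """
    queues = []
    ill = set(range(len(tets)))     # candidate ill-shaped elements
    pool = set(nodes)               # nodes not yet assigned to a queue
    h = ht
    while pool:
        ill = {e for e in ill if h_elem[e] < h}        # E-bar_i
        unsafe = {v for e in ill for v in tets[e]}     # N-bar_i
        queues.append(pool - unsafe)                   # Q_i = pool \ N-bar_i
        pool = unsafe
        h /= 2.0
    return queues

# three chained tetrahedra with limit steps 1.0, 0.4 and 0.15 (ms)
tets = [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7)]
queues = build_queues([1.0, 0.4, 0.15], tets, range(8), 0.5)
```

For this toy mesh and a target step of 0.5 ms, the nodes end up in three queues integrated with 0.5 ms, 0.25 ms, and 0.125 ms, respectively.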

4.4.2 Time Integration

In order to obtain a stable adaptive time integration scheme, we integrate the nodes in the
queues Qi at different rates with their respective time step. In the previous section, we
identified the stable time step for each node and sorted them into queues Qi . The time
step for queue Qi is ht/2^i. Within the time step ht, we assume that no external influence
happens. Any external changes, e. g. gravitational forces, are applied at the end of the
time step.
The integration algorithm is recursive with a depth of n and 2^n leaves at the bottom. At
the beginning of the time step, the nodes in each queue are at time t. Starting from level 0
down to level n, all the nodes are advanced with their respective time step. Figure 4.17(a)
shows an example with 7 nodes distributed to three queues Q0, Q1, Q2 with their respec-
tive time steps ht, ht/2, and ht/4. We illustrate the algorithm on this example without loss of
generality. On the right, Figure 4.17(b) depicts the same configuration after all the nodes
have been advanced once with their respective time step, e. g. the nodes in Q1 are at time
t + ht/2.
After the first step the nodes in Q0 are at their final position at time t + ht. However,
for the nodes in Q1 we need to perform a second step, and for the ones in Q2 an additional
three steps. In order to advance those in Q1 further in time, we need the positions of all
adjacent nodes at time t + ht/2. For nodes that are already advanced further, like those
in Q0, we linearly interpolate their positions at time t + ht/2. Any remaining queues are
not yet integrated far enough, e. g. the nodes in Q2 are still at time t + ht/4. Thus, in order
to advance the nodes in Q1, the nodes in Q2 need to be updated first (see Figure 4.17(c)).
The procedure just described again applies to the nodes in Q2, i. e. nodes in Q0 and Q1
are linearly interpolated to t + ht/4, in order to be able to advance Q2. In this example, all
queues are processed up to t + ht/2, which allows to update Q1 to t + ht. Subsequently,
those in Q2 can be advanced in two steps to t + ht (see Figure 4.17(d)).
In general, if we want to update the nodes in Qi from time ta to ta + ht/2^i, then the nodes
in Qi+1 need to be advanced to ta first. This defines a recursive scheme, where the forces for
[Figure 4.17: (a) sample 2D mesh and integration queues Q0, Q1, Q2; (b) queues after the first step; (c) queues after the second step; (d) queues after the third step; time axis from t to t + ht]

Figure 4.17: 2D example of the time adaptive integration scheme. Initially, the nodes of
the mesh are distributed to the update queues. Figures (b), (c), and (d) depict the first three
update steps. The light shaded nodes are interpolated and used to integrate the nodes of
the current step. The result of the current time step is depicted by the arrows.

the nodes in Q0 are computed once, the ones in Q1 are computed twice, and the forces for the nodes in Qn are computed 2^n times. The integration algorithm progresses with a step size of ht/2^n each time the n-th level is visited.
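The recursion just described can be sketched as follows. The queue layout, function, and variable names are our own illustrative stand-ins; the force computation is reduced to a step counter, and the linear interpolation of neighbour positions is only indicated in a comment.

```python
def integrate(queues, level, t, h, times, step_counts):
    """Advance queue `level` and all finer queues from time t to t + h/2^level.

    Sketch of the recursive scheme described above; `queues[i]` holds the
    node indices integrated with step h/2^i. A real implementation would
    compute forces here, linearly interpolating the positions of neighbour
    nodes on other levels to time t before taking the step.
    """
    dt = h / 2 ** level
    for node in queues[level]:          # one explicit step of size dt
        times[node] = t + dt
        step_counts[node] += 1
    if level + 1 < len(queues):
        integrate(queues, level + 1, t, h, times, step_counts)           # first half
        integrate(queues, level + 1, t + dt / 2, h, times, step_counts)  # second half

# Toy configuration mirroring Figure 4.17: 7 nodes in three queues.
queues = [[0], [1, 2, 3], [4, 5, 6]]
times = {n: 0.0 for q in queues for n in q}
steps = {n: 0 for q in queues for n in q}
integrate(queues, 0, 0.0, 1.0, times, steps)
# All nodes end at t + ht; the nodes of Q1 took 2 steps and those of Q2 took 4.
```

With three levels, the nodes of Q2 are visited four times per frame, matching the 2^n force evaluations noted above.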
In case of the corotational linear FEM, we need to take special care of the rotations. In
order to reduce the error and keep the integration process stable, the computed rotations
need to be sufficiently accurate. Extracting the rotations at the beginning of the time step
is only sufficient if few levels are present. In general, the n-th level takes 2^n steps, so the number of steps between two rotation extractions grows and errors accumulate. Thus, we recompute the rotations each time the positions of a tetrahedron have changed.
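One common way to re-extract an element rotation is a polar decomposition of the per-tetrahedron deformation gradient; the sketch below assumes this approach (the vertex layout and helper name are illustrative, and the thesis' extraction procedure may differ in detail).

```python
import numpy as np

def tet_rotation(rest, deformed):
    """Rotation of a tetrahedron via polar decomposition of the deformation
    gradient F = Ds Dm^-1. `rest` and `deformed` are 4x3 vertex arrays;
    vertex 3 is used as the common reference corner."""
    Dm = np.column_stack([rest[i] - rest[3] for i in range(3)])          # rest edges
    Ds = np.column_stack([deformed[i] - deformed[3] for i in range(3)])  # current edges
    F = Ds @ np.linalg.inv(Dm)
    U, _, Vt = np.linalg.svd(F)
    if np.linalg.det(U @ Vt) < 0:   # guard against reflections
        U[:, -1] = -U[:, -1]
    return U @ Vt
```

Re-running such a routine whenever a tetrahedron's positions change keeps the corotational forces consistent across the many sub-steps of the finer levels.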

4.4.3 Characteristics

In the following, we illustrate the performance of this method. We take the Dragon mesh with the material parameters E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³. The CPU used is an Intel Core 2 Quad Q9400 running at 2.66 GHz. The target time step we want to achieve is 1 ms, which results in six levels with the time steps 1 ms, 0.5 ms, 0.25 ms, 0.125 ms, 0.0625 ms, and 0.03125 ms, containing 1, 243, 674, 275, 57, and 16 nodes, respectively.
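The level assignment implied by these numbers can be sketched as follows; we assume the rule is the smallest i such that ht/2^i lies below the node's stable time step (the function name is ours).

```python
import math

def queue_level(h_stable, h_target):
    """Smallest queue index i such that h_target / 2^i <= h_stable
    (sketch; the thesis' exact assignment rule is assumed to be of
    this form)."""
    if h_stable >= h_target:
        return 0
    return math.ceil(math.log2(h_target / h_stable))

# Dragon example: target step 1 ms, smallest stable node step 0.035 ms
# -> level 5, i.e. a queue step of 1/32 ms = 0.03125 ms.
level = queue_level(0.035, 1.0)
```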
We compare the performance of three different cases: the unmodified corotational linear
FEM, the time adaptive scheme where the rotations are extracted at the beginning of the
time step, and the time adaptive model where the rotations are extracted each time it is
necessary. A single time step using the full corotational FEM takes 4.56 ms. The stable time step of 0.035 ms was identified by the reduced 1-ring method, which accumulates to a computation time of 130.3 ms for the integration of 1 ms of simulation time. For the adaptive model, if the rotation extraction is done once at the beginning of the time step, the integration costs are 32.54 ms. Thus, the performance improves by a factor of 4 compared to the ordinary model. Unfortunately, with increasing deformations and rotations the integration becomes unstable. This can be fixed by recomputing the rotations whenever positions change. The costs for an integration step increase to 52.86 ms, which is slower than before, but still a 2.5x improvement over the unmodified model.
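The quoted speedups follow directly from the step counts and per-step costs; a quick check of the arithmetic, with the numbers taken from the measurements above:

```python
h_stable = 0.035       # ms, stable step found by the reduced 1-ring method
cost_full = 4.56       # ms per step of the unmodified corotational FEM

# Integrating 1 ms of simulation time with the unmodified model:
cost_per_ms = (1.0 / h_stable) * cost_full      # ~130.3 ms

speedup_once  = cost_per_ms / 32.54   # rotations extracted once      ~4.0x
speedup_every = cost_per_ms / 52.86   # rotations recomputed per step ~2.5x
```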
Comparisons to our other methods are presented in the following section. Again, the
results above show that rotation extraction is a significant cost driver of corotational linear
FEM. Unfortunately, accurate information about the rotation of an element is important
to integrate the dynamics stably.

4.5 Performance and Comparison

In the following, we compare the performance of the presented methods. As we have seen before, each of the methods can be used to relax the dependency on the time step and handle small and ill-shaped elements. First, we apply our methods on a generic
mesh in order to compare the computation costs. Next, in order to be able to judge the
deformation quality, we test our methods on a synthetic sample set and apply typical
deformations. For the filtering approach (see Section 4.1) and the hybrid FE/PBD method
(see Section 4.2) we have presented experiments showing that these methods exhibit a
behavior close to a reference simulation. We extend these experiments and additionally
compare them with each other, and to the IMEX and time adaptive methods. Closing
this section, we show two applications where the hybrid FE/PBD method and the IMEX method are used to simulate a deformable object subject to cutting. The simulations were run using a serial implementation on an Intel Core 2 Quad Q9400 running at 2.66 GHz.

4.5.1 Interaction with a Liver Model

In the first experiment the simulation timings of the four methods are measured. We
apply a load to the Liver (C) model as depicted in Figure 4.18. The material parameters
are E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³. The mesh is fixed at the back and a

Figure 4.18: Example load case on the Liver (C) model. The mesh is fixed at the marked
points (blue nodes), while a load is applied to the front part.

point load is applied at the front as shown. The resulting deformations are computed
using increasing simulation time steps. Due to this, an increasing number of elements are
detected as ill-shaped and need to be treated specially. The resulting computation times
per time step are plotted in Figure 4.19. The performance numbers were gathered using a
single threaded implementation.
The standard corotational FEM is used as the reference method. Computing the forces is independent of the time step, and thus its costs are the same for every time step in the figure.

Figure 4.19: Computation time per time step of the filtered FE, hybrid FE/PBD, IMEX
and adaptive method for increasing simulation time steps. At the time step of 1 ms 24%
of the elements are ill-shaped.

Time step [ms] 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Single element 0 12 28 69 106 151 195 223 257
Reduced 1-ring 0 0 4 17 34 66 97 129 179

Table 4.1: Number of ill-shaped elements for various time steps of the Liver (C) model.
The top row shows the number of ill-shaped elements using the single element approxi-
mation. The bottom row uses the reduced 1-ring method instead.

Note that these numbers serve only as a reference. For the other methods, the time needed to compute the forces for a single step depends on the time step, more precisely on the number of ill-shaped elements. For the sake of comparison, we limited the number of iterations for the constraint-based approach to five. As can be seen, the runtime cost of the constraint-based method grows significantly when more constraints are added. In contrast, the IMEX and adaptive methods scale better with an increasing number of ill-shaped elements. Due to the conservative estimation and the high solving costs, the constraint-based method is the slowest. The adaptive integration scheme has higher computation costs even if no element is ill-shaped and no adaptive integration occurs. In this small example the ratio between the numerical code and the management code is small and thus, compared to the samples in Section 4.5.2, the adaptive method performs worse. While the fastest approach certainly is the hybrid FE/PBD method, the IMEX approach provides a good trade-off between performance and physical accuracy.

In order to get an overview over the number of ill-shaped elements, Table 4.1 shows the
numbers for the single element identification method and the reduced 1-ring scheme. The single element method is used for the constraint-based method and the reduced 1-ring
method for the hybrid FE/PBD, IMEX, and adaptive method. At the time step of 1 ms the
fraction of ill-shaped elements reaches 34% and 24%, respectively.

Figure 4.20: Computation time per simulated second on the Liver (C) model for the
filtered FE, hybrid FE/PBD, IMEX and adaptive method for increasing simulation time
steps. The average computation times per time step are given in Figure 4.19.

Taking a closer look at the computation time for a simulated second, which takes both the cost per step and the step size into account, Figure 4.20 shows that the hybrid FE/PBD and the adaptive method are by far the most scalable. However, the adaptive method is still more expensive than the hybrid FE/PBD method. The constraint-based method and the IMEX approach do not scale in the same manner. Initially, the computation costs decrease because the number of integration steps is reduced. Later, the increasing cost per step outweighs the advantage of requiring fewer update steps. This effect is described
in more detail for the IMEX method in Section 4.3.
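The scaling behavior just discussed follows from a simple relation: the cost per simulated second is the number of steps times the cost per step. A sketch with purely illustrative per-step costs (made-up numbers, not the measured values from Figure 4.19) shows how the total can first fall and then rise again when the per-step cost grows with the step size:

```python
def cost_per_simulated_second(h_ms, step_cost_ms):
    """Total computation time (s) for one simulated second."""
    return (1000.0 / h_ms) * step_cost_ms / 1000.0

# Illustrative per-step costs that grow with the time step, as more
# elements become ill-shaped and need special handling (made-up numbers):
step_cost = {0.2: 1.0, 0.5: 1.4, 1.0: 3.5}   # ms per step
totals = {h: cost_per_simulated_second(h, c) for h, c in step_cost.items()}
# The totals first fall (~5.0 s -> ~2.8 s) and then rise again (~3.5 s).
```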

4.5.2 Comparison of Different Deformation Modes

In the set of examples in this section, we compare all four methods on synthetic meshes with many small and large elements. We show the performance on two inhomogeneous meshes: a cube and a bar. The cube was constructed with small elements on one
side and larger elements on the other. It contains 1428 nodes and 5889 tetrahedra. The
bar was created from the cube by mirroring it at the side with the small elements, which
results in 2340 nodes and 11778 tetrahedra. The meshes are depicted in Figure 4.21.
These configurations allow us to observe the effect of the presented methods on large groups of ill-shaped elements. The dimensions of the cube are 5 × 5 × 5 cm³ and those of

Figure 4.21: Inhomogeneous cube and bar meshes in rest state. The bar on the left is built
by mirroring the cube at the face with the small elements.

the bar 10 × 5 × 5 cm³. The Young's modulus is set to E = 30 kPa and the density to ρ = 1000 kg/m³. The following examples use a Poisson ratio of ν = 0.3. Later, we
extend the discussion to higher Poisson ratios and show selected cases. We use two types
of constraints, external forces and direct position manipulations, exerting four types of
common deformations: stretching, compression, twisting and bending. For the position
based manipulations the constrained nodes are not subject to internal forces, but only to
external manipulations. The deformations subject to force based constraints are simulated
only for a limited time period. When the stored restitution forces become higher than the
external forces, the mesh begins to oscillate trying to return to the rest state. We do not
compare these oscillating motions because the changes in the deformation models lead
to different oscillation frequencies. For the position-based constraints there are no such
restrictions. The force- and position-based constraints exert similar types of deformations,
but are not directly comparable.

Figure 4.22: In the three images the ill-shaped elements are highlighted for the inhomo-
geneous meshes. On the left and right the ill-shaped elements are identified using the
reduced 1-ring metric. In the middle image the highlighted elements are identified by the
single element metric.

The time step for the reference simulations is set to 0.1 ms, which is smaller than the
0.15 ms computed using the maximum eigenvalue of the global system matrix M^-1 K.
The time step used for the filtered FEM model is 0.2 ms; for the hybrid, the IMEX, and
time adaptive method it is 0.4 ms. The ill-shaped elements are highlighted in Figure 4.22.
The time steps are chosen such that the small elements are identified as critical. However,
setting a higher time step for the filtered FEM simulation would create a system of equations which is over-constrained and unsolvable. Even at this time step, in the case of the bar the constraints yield an unsolvable system of equations. Thus, the filtered FEM method is only applied to the experiments with the cube. In the case of the cube, the filtered FEM method identifies 1690
elements as ill-shaped by the single element metric; for the hybrid method and IMEX
method 2989 elements are identified by the reduced 1-ring metric. For the bar 5970
elements are identified by the reduced 1-ring metric. Note that the constraint method uses a smaller time step and thus fewer elements are identified as ill-shaped.
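The reference time step above comes from the standard stability bound for undamped explicit central differences, h <= 2/omega_max with omega_max^2 = lambda_max(M^-1 K); a small self-contained sketch of that bound follows, using toy mass and stiffness matrices rather than the mesh matrices.

```python
import numpy as np

def critical_time_step(M, K):
    """h_crit = 2 / sqrt(lambda_max(M^-1 K)), the classical bound for
    undamped explicit central-difference integration."""
    lam_max = np.max(np.linalg.eigvals(np.linalg.solve(M, K)).real)
    return 2.0 / np.sqrt(lam_max)

# Toy system: two unit masses coupled by unit springs (illustrative values).
M = np.diag([1.0, 1.0])
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
h_crit = critical_time_step(M, K)   # 2 / sqrt(3), since lambda_max = 3
```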

Stretching

First, we compare the behavior of the methods when the models are stretched (Figure 4.23). The difference between each of the methods and the reference simulation is color-coded; the error scale is shown on the left of each row, and the average errors for each of the simulations are summarized in Table 4.2. The first row in Figure 4.23 shows the bar mesh subject to stretching displacement constraints, where the nodes on the right are pulled towards the right. Note, as mentioned above, the missing results for the filtered FEM method are due to the unsolvable constraints. While the errors for the IMEX and the time adaptive method are very small, the hybrid FE/PBD method shows a larger error for the geometrically handled elements. A similar error behavior appears when replacing the displacement constraints with external forces. In both cases, we observe the same stiffening limitation as discussed in Section 4.2.3. The large group of connected ill-shaped elements is not stretched to the same extent as the surrounding well-shaped elements. These two examples again demonstrate that, even with the parameter adjustment, the geometric elements are stiffer than their FEM counterparts. Moreover, the lack of a linear deformation mode does not allow the elements to diverge much from their initial shape. Grouped into larger clusters, this stiffening becomes more apparent.

The third and fourth rows show the same experiment on the cube mesh. However, in contrast to the bar, the stretch is applied in the vertical direction. The overall stretch for the cube mesh is 140%, i. e. the height grew from 5 cm to 7 cm. Again, for both rows the IMEX and time adaptive model have a low error. Although the errors are noticeable, the filtered model shows only small errors. Note that its time step is only half that of the other models, thus the errors are expected to be smaller. For both the filtered FE and the hybrid FE/PBD method, the errors accumulate mostly near the edges of the face where the ill-shaped elements are. These errors are the result of the stiffening originating from the constraints and the geometric method; the vertical edges are less bent compared to the reference model. In the case of the force boundary condition, the hybrid FE/PBD model shows a disturbing behavior. While the overall deformation looks good, the elements near the top edge, where the ill-shaped elements are, do not handle large forces well. This is an artifact resulting from the way the clusters are processed by averaging the displacements. In such boundary cases the restitution forces can be too small.

To sum up, the time adaptive method is faster than the reference model and exhibits only
very small errors. However, the IMEX method is cheaper than the reference and the time

[Figure 4.23: error plots for the Filtered FE, FE/PBD, IMEX, and Time Adaptive methods; rows alternate displacement- and force-constrained cases, with per-node error color scales in mm (maxima 2.1, 1.1, 1.1, and 3 mm)]

Figure 4.23: Inhomogeneous bar and cube mesh subject to stretching constraints. The
first two rows show the bar subject to displacement and force constraints. The latter
two rows show the cube subject to displacement and force constraints. The errors are
computed per node and color coded using the coding on the left side of each row.

Error [mm]
Model Filtered FE FE/PBD IMEX Adaptive
Force Position Force Position Force Position Force Position
Bar - - 0.34 0.73 0.01 0.07 0.03 0.1
Cube 0.13 0.18 0.26 0.25 0.02 0.06 0.007 0.03

Table 4.2: Errors for the stretching comparison example. For each of the methods the er-
ror is computed with respect to the reference solution; the first columns show the average
error using force and the second columns the average error using position constraints.

adaptive method, and has similarly small errors. In contrast, the filtered FEM is extremely costly due to the repeated solving of a huge system of equations. The hybrid FE/PBD method is by far the fastest method, but exhibits a larger error than the other methods. The performance costs for each of the methods are summarized in Table 4.3.

Timings [ms]
Model Reference Filtered Hybrid IMEX Adaptive
Bar 92 ms - 16 ms 58 ms 67 ms
Cube 50 ms 362 ms 9 ms 36 ms 34 ms

Table 4.3: Computation time per step for the inhomogeneous meshes using the four meth-
ods to handle ill-shaped elements.

Compressing

In the following, we observe the behavior of the developed methods using compressing
boundary constraints. We apply constraints with the same magnitude as before, but in
opposite direction. Similar to Figure 4.23, the resulting deformations are shown in Fig-
ure 4.24. Again, the bar simulations are not performed with the filtered FEM due to the
over-constrained system of equations. The final deformations using the time adaptive and
the IMEX method are omitted because the errors are hardly visible.

[Figure 4.24: error plots for the compression experiments (FE/PBD on the bar; Filtered FE and FE/PBD on the cube); rows show displacement- and force-constrained cases, with per-node error color scales in mm]

Figure 4.24: Inhomogeneous bar and cube mesh subject to compression constraints. The
first column shows the bar model subject to displacement and force constraints. The latter
two columns show the cube model subject to displacement and force constraints for the
filtered FEM and hybrid FE/PBD method. The errors are computed per node and color
coded using the coding on the left side of each row. The plots using the IMEX and time
adaptive method are omitted because the errors are very small.

While we observe the desired behavior in case of the IMEX and the time adaptive model,
the filtered FE and the hybrid FE/PBD method deviate from the reference method under

Error [mm]
Model Filtered FE FE/PBD IMEX Adaptive
Force Position Force Position Force Position Force Position
Bar - - 0.29 0.7 0.009 0.18 0.003 0.05
Cube 0.17 0.3 0.19 0.4 0.02 0.07 0.005 0.03

Table 4.4: Errors on the compression comparison example. For each of the methods the
error is computed with respect to the reference solution; the first columns show the aver-
age error using force and the second columns the average error using position constraints.

high compression. Similar to the stretching case, the quality of the deformation decreases with growing strains. For larger strains, the geometric model generates forces that are too small, and elements start to collapse. In the case of the bar mesh, the artifacts are contrary to the stretching case. The volume preservation should result in a growing cross-sectional area orthogonal to the direction of compression. However, the elements in the center are uniformly compressed, resulting in a cross-sectional area that is too small when displacement-based boundary constraints are used.

[Figure 4.25 panels: (a) linear corotational FEM at a small time step; (b) filtered FEM; (c) hybrid FE/PBD method]

Figure 4.25: Compressing the cube mesh using direct positional constraints to 60% re-
sults in artifacts and high errors for the filtered FE (Figure (b)) and the hybrid FE/PBD
method (Figure (c)). Figure (a) shows the same simulation using the linear corotational
FEM running at a smaller time step.

A similar behavior is observed when using external pressure to compress the cube. The second row in Figure 4.24 shows the cube compressed to approximately 90%. As with the displacement constraints, the reference simulation is not well behaved due to the asymmetry in the size of the elements. Instead of undergoing just vertical compression, the generated forces additionally start bending the cube after a compression to approximately 80%. While the adaptive and the IMEX method behave the same as the reference model, the hybrid FE/PBD method has problems maintaining the shape of the elements in the geometric part, which compresses too fast. The filtered FEM again develops stability problems after a compression to approximately 90%. The overall errors for the states in Figure 4.24 are comparable to those of the stretching case, as can be seen in Table 4.4.

In the case of compressing the cube, we observe the limitation of the unmodified linear corotational FEM: small elements are stiffer than the larger ones. The result is that the small elements start to bend over when the cube is compressed too far. In the first row in Figure 4.24 the cube is compressed to 80% of its initial height by using direct displacement
constraints. Note that applying high strains is difficult in general for corotational linear
FEM. At these high strains, we expect to observe deficiencies of the model. While filtered
FE, the IMEX, and the time adaptive method produce similar results, the hybrid FE/PBD
method already shows the mentioned errors. They appear earlier and thus less compres-
sion can be applied. As an illustrative example to demonstrate the limits, we compress
the cube mesh further to 60% of its initial height. We observe the described errors in
the reference method (see Figure 4.25(a)), where the small elements on the top start to
flip over. Figure 4.25(c) shows the same simulation state for the hybrid FE/PBD method.
However, due to the generally stiffer behavior, the deviation from the linear corotational
FEM is large. The filtered FEM shows the same errors as discussed in Section 4.1.4. High
compression leads to constraints not being resolved stably, resulting in oscillation and a
diverging solution (see Figure 4.25(b)).

Bending

[Figure 4.26: error plots for the bending experiments (FE/PBD on the bar; Filtered FE and FE/PBD on the cube); rows show displacement- and force-constrained cases, with per-node error color scales in mm]

Figure 4.26: Inhomogeneous bar and cube mesh subject to bending constraints. The first
column shows the bar model subject to displacement and force constraints. The latter
two columns show the cube model subject to displacement and force constraints for the
filtered FEM and the hybrid FE/PBD method. The errors are computed per node and color
coded using the coding on the left side of each row. The plots using the IMEX and time
adaptive method are omitted because the errors are very small.

Bending shows similar errors to stretching and compressing. Again, the errors are concentrated at the small elements where the bending constraints are applied. For both meshes, the IMEX and time adaptive model behave like the reference solution with only small errors. Hence, they are not shown in Figure 4.26. The filtered FE and the hybrid FE/PBD method show similar errors as in the cases above. In contrast to the bending experiment in Section 4.2.2, the ill-shaped elements are not distributed over the entire mesh but concentrated in a small volume. Bending the bar (by direct position manipulation or by force field) using the hybrid FE/PBD method places a high pressure on the small elements at the top of the mesh. This results in errors similar to those when compressing the bar (see Table 4.5). Bending the cube shows similar artifacts for the filtered FE and the hybrid FE/PBD method. While the former shows the stability issues due to high compression of the small elements, the latter is not able to compress the small elements to the same extent as the reference model.

Error [mm]
Model Filtered FE/PBD IMEX Adaptive
Force Position Force Position Force Position Force Position
Bar - - 0.37 0.35 0.14 0.02 0.11 0.01
Cube 0.27 0.21 0.31 0.20 0.06 0.012 0.02 0.01

Table 4.5: Errors on the bending comparison example. For each of the methods the error
is computed with respect to the reference solution; the first columns show the average
error using force and the second columns the average error using position constraints.

Twisting

Notably fewer errors occur when the meshes are twisted. For the IMEX and time adaptive method, the errors are in the range of the other experiments; however, they are visible in the figures because the overall error level is lower. In general, the error is smaller than in the previous cases because the elements are less stretched or compressed. Still, Figure 4.27 shows that the errors are mainly found near the small elements, but they do not cause significant visible artifacts. The overall errors for the twisting case are tabulated in Table 4.6.

Performance

The computation costs in each of the presented cases are similar. Table 4.3 summarizes
the timings for a single integration step of 0.4 ms. The measurements were performed for
the experiment using stretching constraints. The numbers using compressing and bending
constraints are similar, while in case of twisting the timings are approximately 20% higher
due to the higher workload of the rotations. The first column shows the performance of

[Figure 4.27: error plots for the twisting experiments (Filtered FE, FE/PBD, IMEX, Time Adaptive); rows alternate displacement- and force-constrained cases, with per-node error color scales in mm]

Figure 4.27: Inhomogeneous bar and cube mesh subject to twisting constraints. The first
two rows show the bar subject to displacement and force constraints. The latter two rows
show the cube subject to displacement and force constraints. The errors are computed per
node and color-coded using the coding on the left side of each row.

the linear corotational method. The fastest is clearly the hybrid FE/PBD method; when using the IMEX and time adaptive method we still observe a significant speedup. The timings for the filtered FEM method show the limitations of the constraint approach. The complexity of the bar model results in an unsolvable set of equations. In the case of the cube model, the set of equations is so large that the computational effort is much higher than the cost of the reference method.

Error [mm]
Model Filtered FE/PBD IMEX Adaptive
Force Position Force Position Force Position Force Position
Bar - - 0.11 0.21 0.03 0.11 0.02 0.03
Cube 0.08 0.33 0.09 0.23 0.05 0.03 0.04 0.02

Table 4.6: Errors on the twisting comparison example. For each of the methods the error is computed with respect to the reference solution; the first columns show the average error using force and the second columns the average error using position constraints.

High Poisson Ratios

Many materials, especially soft-tissue-like materials, have high Poisson ratios. Thus, we next inspect the quality of the deformations using the same samples, but higher Poisson ratios. The Poisson ratio is increased to 0.4, 0.45, and 0.49, which reduces the limit time step from 0.15 ms to 0.12 ms, 0.093 ms, and 0.044 ms, respectively. Note that this decrease roughly corresponds to the decrease we measured in Section 3.5. Increasing the Poisson ratio from 0.3 to 0.49 increases the oscillation frequency by approximately a factor of 4. For the first two increases, the target time step for the hybrid method, the IMEX method, and the time adaptive method is still set to 0.4 ms. For the filtered FE method, we use 3 and 4 sub-steps, and for the reference simulation we use 4 and 5 sub-steps, respectively. For ν = 0.49, we use the same number of sub-steps as for ν = 0.45, but we reduce the target time step to 0.2 ms. Increasing the Poisson ratio has little impact on the time adaptive and the IMEX method. Both show similar results with low errors. However, the picture changes for the filtered FE and the hybrid FE/PBD method. While the filtered FE method shows a general increase in the error, the hybrid FE/PBD method again behaves too stiffly. This behavior gets more noticeable with higher Poisson ratios. While the vertical edges using the corotational FEM start to bend more inwards, the same does not happen for the hybrid method.

Figure 4.28: Inhomogeneous cube mesh subject to displacement constraints using the
hybrid FE/PBD method for various Poisson ratios. From left to right the results use a
Poisson ratio of 0.3, 0.4, 0.45, and 0.49.

Figure 4.28 shows a comparison of simulation states after stretching the cube model using
the hybrid method for various Poisson ratios. The figures clearly show the disadvantages
of the hybrid FE/PBD method at higher Poisson ratios. While the tetrahedra using the
corotational linear FEM stretch more at higher ratios, the elements using the geometric
deformation maintain their initial shape. Similar errors are found when compressing elements, where the geometrically animated elements show a comparable growth in the error when the Poisson ratio is increased.

Summary

The various deformations tested above demonstrate the performance of the models developed in this chapter. For the IMEX and time adaptive method, we observe a significant speedup with small differences compared to an unmodified simulation using small time steps. While the hybrid FE/PBD method outperforms the others computationally, it shows deficiencies for large strains when significant groups of elements are simulated using the geometric model. While the filtered FE method works in smaller scenes, it scales badly when the number of ill-shaped elements grows. Locking, compression artifacts, and general performance problems limit its applicability.

4.5.3 Cutting Simulation using the FE/PBD Method

Figure 4.29: Cuts A to D were made in sequential order to the Liver (C) model. The
highlighted elements are ill-shaped for the simulation time step of 0.8 ms.

In the following experiment, we apply the hybrid FE/PBD method in a scenario where
the simulation mesh is modified. We are able to maintain an initially defined target time
step throughout the simulation. This is done by classifying the changed and newly created
elements and handling any ill-shaped element with the geometric model. For the exper-
iment, we use the Liver (C) mesh in Figure 3.5. For cutting and remeshing, we apply a
simple element subdivision scheme. The material parameters are E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³; we employ a target time step of 0.8 ms. In contrast to the previous
examples, the numbers for this test were obtained using a multi-threaded implementation.
Starting from the initial mesh we applied a series of four cuts (A to D in Figure 4.29).
Table 4.7 summarizes how the total number of elements and the number of ill-shaped

        Total # of elements   Ill-shaped for target time step   Identification process
        Before    After       Before      After                 # Elements   Time per     Total
                                                                to check     element      time
Cut A   766       961         97 (13%)    246 (26%)             182          0.015 ms     2.73 ms
Cut B   961       1187        246 (26%)   490 (41%)             215          0.015 ms     3.23 ms
Cut C   1187      1545        490 (41%)   828 (54%)             335          0.014 ms     4.69 ms
Cut D   1545      1970        828 (54%)   1188 (60%)            400          0.014 ms     5.6 ms

Table 4.7: Total number and fraction of ill-shaped elements before and after each cut. The
last three columns illustrate the computational effort for identifying ill-shaped elements.

elements changes from one cut to the next. Note that the number of ill-shaped elements
increases, thus larger regions have to be handled using the position-based approach. The
last three columns show the number of elements we have to check after the cut and the
time needed for the identification. The time spent per element remains constant. The
computational effort is acceptable since the number of elements introduced by a cut is usually limited. Moreover, in the case of a large number of new elements, it is also possible to tentatively assign them all to the ill-shaped set. The simulation can then continue, while the identification process to update the sets is completed in parallel.
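The last column of Table 4.7 is, up to rounding, simply the product of the number of checked elements and the time per element; the reported figures are consistent:

```python
# (elements to check, ms per element, reported total ms) for cuts A to D,
# taken from Table 4.7.
cuts = [(182, 0.015, 2.73),
        (215, 0.015, 3.23),
        (335, 0.014, 4.69),
        (400, 0.014, 5.6)]
for n_check, per_elem, total in cuts:
    # Each reported total matches the product to within rounding.
    assert abs(n_check * per_elem - total) < 0.01
```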

                                   Initial   Cut A     Cut B     Cut C     Cut D
Stable time step         FE/PBD    0.8 ms    0.8 ms    0.8 ms    0.8 ms    0.8 ms
                         FEM       0.3 ms    0.09 ms   0.03 ms   0.03 ms   0.02 ms
Computation time         FE/PBD    0.6 s     1.3 s     2.25 s    3.28 s    3.6 s
for 1 simulated second   FEM       1.6 s     13 s      39.5 s    52.6 s    90.4 s

Table 4.8: The hybrid FE/PBD method allows to maintain a stable time step with only
minor performances degradations. If no special handling of ill-shaped elements is applied
the stable time step decreases rapidly, leading to much higher runtime costs.

Table 4.8 summarizes the simulation performance after the various cuts. The first two
rows show that with our method we can keep the initial time step, while we have to
reduce it significantly without a special handling of ill-shaped elements. In the latter
two rows we compare the runtime costs of the presented method to a corotational FE
approach. As can be seen, for the latter the necessary small time step leads to a huge
degradation in performance. Nevertheless, the computation time for the hybrid FE/PBD
method also increases. This is due to the huge increase in the number of elements in the
mesh. However, it should be noted that in a typical cutting scenario, the excised pieces
would be removed from the simulation, thus keeping the number of elements manageable.
For instance, after Cut D only the right lobe would remain. It consists of 539 elements
of which 159 are ill-shaped. For this part, the computation using the presented method
requires 0.66 s for one simulated second, thus still allowing real-time simulation.

4.5.4 Cutting Simulation using IMEX Method

Figure 4.30: The Dragon mesh is first stretched and then cut through an element subdi-
vision algorithm. Using the IMEX method, the existing and new ill-shaped elements are
handled, keeping the simulation stable.

In order to illustrate the IMEX method in a more complex application context, we per-
form an experiment where the Dragon mesh is non-progressively cut into two pieces (see
Figure 4.30). For the entire simulation the IMEX method is used. Again the material
parameters are E = 30 kPa, ν = 0.3, and ρ = 1000 kg/m³. The cutting path is approximated
through straightforward element subdivision. The desired time step of the numerical in-
tegration is 0.3 ms. Before the cut, 512 ill-shaped elements are detected for the desired
time step. Thus, 483 affected nodes have to be integrated implicitly. The average cost for
computing one time step is 26 ms. After the cut, the total number of simulation elements
increases to 4370 tetrahedra. 122 additional ill-shaped elements appeared after the cut,
resulting in a total of 606 nodes to be integrated implicitly. This raises the average costs
for computing one time step to 36 ms. The time to compute one physical second is 120 s.
In contrast, had a purely explicit scheme been employed, the time step before the
cut would have had to be set to 0.03 ms to maintain stability. After the cut, it would have
had to be reduced further to 0.01 ms due to newly generated ill-shaped elements. In this
case, computing one physical second would take up to 880 s. Note that the region which
is cut initially does not contain any ill-shaped elements. These results demonstrate the
advantage of using a mixed implicit-explicit approach to treat any dynamically appearing
ill-shaped elements, which makes it possible to maintain an initial time step throughout
the simulation. Finally, it should be noted that the time to perform the element identifica-
tion after the cutting is only 0.014 ms per element. The complete online identification for
this case was performed in 3.7 ms.

4.6 Discussion
We have presented four approaches for computing the dynamic evolution of a deformable
object by employing explicit numerical time-integration that overcome the stability prob-
lems of previous approaches. The methods consist of two stages: First, we determine
which elements cannot be simulated stably, given a target time step using explicit integra-
tion. This classification is enabled by computing the highest eigenmodes of the elements.

We then combine a standard corotational linear FE model to animate the well-shaped
elements with different methods to handle the ill-shaped elements. The first method in
Section 4.1 handles ill-shaped elements by filtering the eigenmodes of each element. In
order to cope with material getting too soft, the relaxed modes are constrained. In the
second approach in Section 4.2, we compute the displacements generated by ill-shaped
elements using a geometric deformation model. The third scheme in Section 4.3 uses an
implicit integration scheme for nodes adjacent to ill-shaped elements. Finally, the fourth
method in Section 4.4 determines for each node connected to an ill-shaped element an
individual integration time step. These nodes are then grouped, such that each group is
integrated with a single time step leading to a time adaptive approach.
The key benefit of our approaches is the alleviation of the dependency between the mesh-
ing, material parameters, and the time step size. In contrast to pure explicit integration
schemes, we can take larger time steps, resulting in a significant speedup. Further, the
meshing process is eased since ill-shaped elements do not need to be strictly avoided.
For many meshes it is possible to double the time step, while only handling about 10
percent of the elements with an additional method. This significantly reduces the compu-
tational costs per simulated second. Furthermore, an initially defined target time step can
be maintained throughout a simulation, even if an object undergoes limited topological
changes. This makes the methods particularly well-suited for interactive cutting simu-
lations. Note that we do not handle element inversion, but our presented methods can
be combined with any inversion handling technique available, e. g. [Irving et al. 2004,
Schmedding and Teschner 2008, Stomakhin et al. 2012].
Depending on the type of scenario, different methods are preferable. If the overall runtime
is the most important factor, then the hybrid FE/PBD method is the best choice. For
reasonably well distributed ill-shaped elements the errors are small. However, the results
using high Poisson ratios, large stretches or huge clusters of ill-shaped elements show the
limitations of the method. The simplicity of the implementation and the good balance
between error and performance make the IMEX method a good choice for most scenarios
as long as the number of ill-shaped elements is small. The time adaptive method has
similar properties to the IMEX method, but it is more complex to implement. Compared
to IMEX, it exhibits less time-step-dependent damping and is more suitable for scenarios
where full control of the damping behavior is needed. However, additional care needs to
be taken to prevent the situation where a single element requires an infinitesimally small
time step.
We have discussed the effect of remeshing on the simulation time step in Section 1.2. We
demonstrated that ill-shaped elements introduced by, e. g. cutting, often reduce the critical
simulation time step. While the employed remeshing steps improve the mesh quality, the
initial time step cannot be maintained. In general, mesh optimization reduces the
negative effect of a cut; however, it cannot be removed entirely in most cases. Moreover,
we measured the costs of optimizing the mesh topology. Comparing these high costs
to the observed quality improvements favors our methods of handling ill-shaped
elements to keep a stable simulation time step. Fewer resources need to be invested
into methods maintaining a mesh with reasonably well-shaped elements. Still, remeshing
bears potential, especially if the handling of the critical elements is more expensive, as
is the case when using the filtered FE and the IMEX methods. Run in parallel to the
simulation, remeshing can reduce the implicit set, thus lowering the integration costs.
Note that we cannot evade the time step dependency of explicit integration schemes com-
pletely. If the target time step is significantly larger than the theoretic limit time step, then
all elements are classified as ill-shaped. Depending on the resolution method, we would
arrive at a purely geometric deformation or a fully implicitly integrated method.
5
Implementation Aspects

In the previous chapters, we presented the theory and the algorithms to identify and handle
ill-shaped elements in mesh-based FEM simulation. In the following, we discuss some
implementation aspects in more detail. First, we look at different algorithms to extract the
rotation from an arbitrary deformation tensor. Next, Section 5.2 presents the implementa-
tion of the IMEX integration scheme. In Section 5.3 we discuss parallelization strategies
using OpenMP and CUDA. Thereafter, in Section 5.4, we compare the performance of
the scalar to the parallelized implementations. Finally, we discuss our iterative approach
to compute the largest eigenvalue of the stiffness matrix of a tetrahedron.

5.1 Rotation Extraction

The most expensive operation, when implementing the corotational FEM, is to extract the
rotation from the deformation tensor F = RS by polar decomposition. In this section, we
review available schemes and compare their performance. Choosing a particular method
requires balancing between several goals:

Stability: Besides being able to extract the rotation of a well-shaped tetrahedron,
it should also work correctly in case of collapsed or inverted elements. While col-
lapsed elements lead to singular matrices and can result in invalid rotations, inverted
elements produce general orthonormal transformations, which are represented by a
rotation and a reflection operation. The reflective part has to be omitted
in order to guarantee a stable simulation.

Speed: Because the rotation extraction is expensive, the scheme should be as fast as
possible.

Accuracy: The decomposition should be accurate enough such that correctness
and stability of the simulation are not decreased.

Techniques to extract the rotation efficiently have received a significant amount of atten-
tion in the literature. Müller et al. [2005] and Rivers and James [2007] use cyclic Jacobi
iterations to compute the eigenvalues and eigenvectors of F^T F = V Σ^2 V^T. The desired
rotation is obtained by R = F (V Σ^2 V^T)^(-1/2). In order to further stabilize the result, the
columns of R are modified. The one corresponding to the smallest eigenvalue of F^T F
is recomputed as the cross product of the other two columns. This stabilizes the rotation
matrix in case of nearly collapsed elements and element inversions along a single direc-
tion. In order to reduce the costs of the algorithm, Rivers and James proposed to exploit
temporal coherence. Each time the rotation of an element is extracted, the diagonalization
matrix V is stored. Before the next rotation extraction, the stored matrix of the previ-
ous extraction V_prev is applied to F^T F, i.e., F^T F → V_prev^T (F^T F) V_prev. Starting the
eigenvalue computation from this transformed matrix reduces the number of needed iterations sig-
nificantly. For the sake of comparison, we replaced the cyclic Jacobi iterations by a direct
eigenvalue computation for symmetric matrices from [Guennebaud et al. 2010].
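As an illustration of this extraction scheme, the following NumPy sketch computes R = F (F^T F)^(-1/2) via an eigendecomposition and rebuilds the column belonging to the smallest eigenvalue as a cross product. This is our own minimal version (including the sign fix on V, which we add to keep V a proper rotation), not the thesis implementation:

```python
# Illustrative sketch of rotation extraction via the eigendecomposition of
# F^T F, with the cross-product stabilization described above.
import numpy as np

def extract_rotation(F, eps=1e-12):
    # Diagonalize the symmetric matrix F^T F = V diag(w) V^T.
    w, V = np.linalg.eigh(F.T @ F)            # w ascending, w[0] smallest
    if np.linalg.det(V) < 0.0:                # keep V a proper rotation
        V[:, 0] = -V[:, 0]
    s = np.sqrt(np.maximum(w, 0.0))           # singular values of F
    U = F @ V @ np.diag(1.0 / np.maximum(s, eps))
    # Stabilization: rebuild the column of U belonging to the smallest
    # eigenvalue as the cross product of the other two. This keeps the
    # result valid for nearly collapsed elements and removes a reflection
    # if F is inverted along a single direction.
    U[:, 0] = np.cross(U[:, 1], U[:, 2])
    return U @ V.T                            # R = U V^T
```

For det F < 0 (inversion along the smallest singular direction), the cross product flips the offending column, so the returned matrix stays a proper rotation.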
Recently, McAdams et al. [2011] introduced an optimized iterative polar decomposition.
They employ a two-step algorithm, which first computes the eigenvectors V of F^T F
using Jacobi iterations. In the second step, they perform a QR decomposition of F V to
compute U and Σ for a full singular value decomposition (SVD) F = U Σ V^T. The
final rotation R is obtained by computing U V^T. Their implementation is essentially branch-
free and thus perfectly parallelizable using vector instructions. Instead of making the
detour of computing the eigenvalues of F^T F, we directly compute the SVD of F. We
tested two different methods: we used the iterative implementation from the Eigen library
[Guennebaud et al. 2010] and implemented a two-sided cyclic Jacobi iteration following
[Brent et al. 1982]. Handling inverted tetrahedra is then done as described in [Irving et al.
2004] by assuming that a potential inversion happens along the direction corresponding
to the smallest singular value.
The threshold for the iterative algorithms is chosen such that the error does not destabilize
the simulation. Next, we compare the runtime performance of the presented methods.
For the algorithms by Müller, Brent, and the two from Eigen (iterative SVD and direct
eigenvalue computation) we measure a scalar implementation only. The performance of
the algorithm by McAdams was measured using the benchmark program from their web
site, which provides a scalar and a vectorized version. As a test machine, an Intel Core2Quad
Q9400 at 2.6 GHz was used. Table 5.1 compares the computation times per element from
a series of 1024 × 1024 random matrices.
The Eigen (SVD)-implementation is tuned towards precision and is the most accurate
algorithm, thus it is also the slowest. The two-sided approach (Brent) requires a more com-
plex rotation pattern, and thus could not match the performance of Müller's method.
Testing the performance of the algorithm by [McAdams et al. 2011] reveals a twofold
result. Their scalar version runs slower than our implementation of the simple
cyclic Jacobi iteration algorithm. However, the merit of their implementation is that it
is nearly perfectly parallelizable. Enabling vectorization shows an immediate speedup

Algorithm                 Computation time [ns]

Müller                    787.28
Brent                     1270.78
Eigen (SVD)               1926.43
Eigen (Direct)            832.63
McAdams                   953.67
McAdams (Vectorized)      228.88

Table 5.1: Performance of various rotation extraction algorithms. Apart from the Eigen
(SVD)-implementation, all other algorithms finish within similar time frames. The imple-
mentation by McAdams provides a 4-way vector parallelization, which shows an almost
linear speedup.

making it the fastest algorithm. We settled on the method based on [Müller et al. 2005,
Rivers and James 2007] due to its stability and the ability to choose between accuracy
and speed by limiting the number of iterations and thus the tolerable error. Although the
implementation by McAdams is faster, it was not possible to integrate it into our software
framework due to different data layout schemes. However, it shows the potential speedup
that is possible if the data layout is changed accordingly.

5.2 IMEX

In Section 4.3 we presented the IMEX algorithm to handle the integration of ill-shaped
elements. This section describes the implementation of the scheme in more detail. We
decompose it into three steps: First, the rotations of the elements are computed. Hereby,
we assume that these are constant throughout the integration step. Next, we describe the
explicit integration of the non-critical nodes. The last step then is the implicit integration
of the critical nodes. We wrap up this section by discussing some algorithmic details of
the clustered IMEX. In addition to the notation for the sets used in Section 4.3, we use the
following notation to denote the corner i of an element e: {e, i}. The first and simplest
pass is summarized in Algorithm 5.1. For each tetrahedron, we compute the deformation
tensor F , which is passed to the polar decomposition method as described in Section 5.1.
Next, the non-critical nodes are integrated explicitly by computing the rotationally cor-
rected linear forces as shown in Section 2.2.4. The velocities and positions are then up-
dated using the symplectic Euler (see Section 2.3.1). In Algorithm 5.2 we compute the
displacements for each tetrahedron in the stable set T. They are determined in the ele-
ment's rest frame by rotating the nodal positions using the previously computed rotations.
The computed nodal forces are then rotated back to the current frame. If the node be-
longs to the set of non-critical nodes N we store the result. Finally, the symplectic Euler
integration is applied to the nodes in N using the accumulated forces.

Algorithm 5.1 Rotation extraction part of the IMEX algorithm.

for all t ∈ (T ∪ T′) do
    x0 = RestPosition({t, 1}) − RestPosition({t, 0})
    x1 = RestPosition({t, 2}) − RestPosition({t, 0})
    x2 = RestPosition({t, 3}) − RestPosition({t, 0})
    p0 = Position({t, 1}) − Position({t, 0})
    p1 = Position({t, 2}) − Position({t, 0})
    p2 = Position({t, 3}) − Position({t, 0})
    F = [p0 p1 p2] · [x0 x1 x2]^(−1)
    R_t ← ComputeRotationFromTensor(F)
end for

Algorithm 5.2 Explicit integration part of the IMEX algorithm.

% Compute states at t + h for the nodes i ∈ N.
for all e ∈ T do
    d0 = R_e^T · Position({e, 0}) − RestPosition({e, 0})
    d1 = R_e^T · Position({e, 1}) − RestPosition({e, 1})
    d2 = R_e^T · Position({e, 2}) − RestPosition({e, 2})
    d3 = R_e^T · Position({e, 3}) − RestPosition({e, 3})
    d = {d0, d1, d2, d3} ∈ R^12
    K_e ← LoadStiffnessMatrix(e) ∈ R^(12×12)
    {f0, f1, f2, f3} ← K_e · d
    for all k ∈ [0, 3] do
        if {e, k} ∈ N then
            Force({e, k}) ← Force({e, k}) − R_e · f_k
        end if
    end for
end for
for all i ∈ N do
    Velocity(t + h, i) ← Velocity(t, i) + h · Force(i) / Mass(i)
    Position(t + h, i) ← Position(t, i) + h · Velocity(t + h, i)
end for
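The structure of this explicit pass (rotation to the rest frame, stiffness application, force scatter, symplectic Euler) can be sketched compactly in a scripting language. This is an illustrative single-threaded NumPy version with hypothetical names (`explicit_step`, `stiffness`, `noncritical`), not the thesis code:

```python
# Illustrative sketch of the explicit IMEX pass (cf. Algorithms 5.1/5.2).
import numpy as np

def explicit_step(positions, velocities, ext_forces, elements, stiffness,
                  rotations, rest_positions, masses, noncritical, h):
    forces = ext_forces.copy()
    for e, nodes in enumerate(elements):
        R = rotations[e]
        # Displacements in the element's rest frame, stacked into R^12.
        d = np.concatenate([R.T @ positions[i] - rest_positions[i]
                            for i in nodes])
        f = (stiffness[e] @ d).reshape(4, 3)    # four nodal force blocks
        for k, i in enumerate(nodes):
            if i in noncritical:
                forces[i] -= R @ f[k]           # rotate back and accumulate
    # Symplectic Euler for the non-critical nodes only.
    for i in noncritical:
        velocities[i] = velocities[i] + h * forces[i] / masses[i]
        positions[i] = positions[i] + h * velocities[i]
    return positions, velocities
```

With identity rotations and zero rest-frame displacements, the step reduces to plain symplectic Euler under the external forces.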
end for

The positions and velocities of the non-critical points do not change in the following im-
plicit part as they are correct with respect to explicit integration. The implicit integration
tries to minimize the stored deformation energy with respect to the internal forces and the
external boundary conditions. For the explicitly integrated nodes, we know the solution
at the end of the integration step and thus they act as boundary condition to the implicit
problem. The system of equations stated in (4.23) is solved using a CG algorithm. The
first step is to compute the initial residual as summarized in Algorithm 5.3. The process
is similar to the one above evaluating the internal forces of the non-critical nodes. We

start with (4.23) and replace f by the forces of the interface nodes, K(x_i(t + h) − x_i(0))
(note that we omitted the rotation here for brevity). Instead of using the current posi-
tions of the critical nodes in N′, we use a projection of them to the end of the time step,
x_i(t + h) = x_i(t) + h · v_i(t + h). Because the velocities v_i(t + h) are not known yet,
we use the current velocities as an estimate. If node i was already integrated explicitly,
then v_i(t + h) is known and thus available to the implicit integration. Thus, from an
algorithmic point of view we do not distinguish between critical and non-critical nodes.
However, for the non-critical nodes we assume that the solution is correct and thus the
residual is 0. Setting up the equations for the residual and reordering the terms we can
eliminate M and arrive at a very compact formulation.

Algorithm 5.3 Computation of the initial residual for the CG of the IMEX algorithm.

% Compute states at t + h for the nodes i′ ∈ N′.
for all i′ ∈ N′ do
    Residual(i′) ← h · ExternalForce(i′)
end for
for all e′ ∈ T′ do
    d0 = R_e′^T · (Position({e′, 0}) + h · Velocity({e′, 0})) − RestPosition({e′, 0})
    d1 = R_e′^T · (Position({e′, 1}) + h · Velocity({e′, 1})) − RestPosition({e′, 1})
    d2 = R_e′^T · (Position({e′, 2}) + h · Velocity({e′, 2})) − RestPosition({e′, 2})
    d3 = R_e′^T · (Position({e′, 3}) + h · Velocity({e′, 3})) − RestPosition({e′, 3})
    d = {d0, d1, d2, d3} ∈ R^12
    K_e′ ← LoadStiffnessMatrix(e′) ∈ R^(12×12)
    {f0, f1, f2, f3} ← h · K_e′ · d
    for all k ∈ [0, 3] do
        if {e′, k} ∈ N′ then
            Residual({e′, k}) ← Residual({e′, k}) − R_e′ · f_k
        end if
    end for
end for
for all i′ ∈ N′ do
    Direction(i′) ← Residual(i′)
end for

In the iterative process (see Algorithm 5.4) the residual is continuously reduced by up-
dating the velocities with a better approximation. Our implementation of the iterations
follows [Rodriguez-Navarro and Susin 2006]. In their work, they reorder the operations
such that all the dot products are computed at one place instead of two. This allows
computing the updates for the vectors in the same loop. This results in a more compact rep-
resentation with a reduced number of synchronization points, such that more work can be
done in parallel without interruption.

Algorithm 5.4 Iterations of the CG for the implicit integration part of the IMEX algorithm.

while iter < 3|N′| do
    for all i′ ∈ N′ do
        Q(i′) = Mass(i′) · Direction(i′)
    end for
    for all e′ ∈ T′ do
        d0 ← R_e′^T · Direction({e′, 0})
        d1 ← R_e′^T · Direction({e′, 1})
        d2 ← R_e′^T · Direction({e′, 2})
        d3 ← R_e′^T · Direction({e′, 3})
        d ← {d0, d1, d2, d3} ∈ R^12
        K_e′ ← LoadStiffnessMatrix(e′) ∈ R^(12×12)
        {q0, q1, q2, q3} ← h^2 · K_e′ · d
        for all k ∈ [0, 3] do
            if {e′, k} ∈ N′ then
                Q({e′, k}) ← Q({e′, k}) + R_e′ · q_k
            end if
        end for
    end for
    r ← 0, g ← 0, b ← 0, a ← 0
    for all i′ ∈ N′ do
        r ← r + Residual(i′)^T · Residual(i′)
        g ← g + Direction(i′)^T · Q(i′)
        b ← b + Residual(i′)^T · Q(i′)
        a ← a + Q(i′)^T · Q(i′)
    end for
    α ← r / g
    β ← (r − 2αb + α^2 a) / r
    for all i′ ∈ N′ do
        Velocity(i′) ← Velocity(i′) + α · Direction(i′)
        Residual(i′) ← Residual(i′) − α · Q(i′)
        Direction(i′) ← Residual(i′) + β · Direction(i′)
    end for
    if |r| < threshold then
        break
    end if
end while
5.3. PARALLELIZATION 119

In each iteration, first the vector Q is computed. This process is similar to the scheme
which is used to evaluate the forces of the non-critical nodes. The stiffness matrices
are applied to the current Direction-vector, which is rotated to the local frame of each
tetrahedron. Next, the update magnitudes α, β are calculated in a series of dot products;
the solution and intermediate vectors are then updated accordingly. With each iteration,
the velocities of the critical points improve by minimizing the residual. After the iteration
process has converged, we update the positions of the critical nodes with the computed
velocities.
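The reordered iteration with gathered dot products can be sketched for a generic dense SPD system. This is an illustrative NumPy version (the thesis operates matrix-free, applying the element stiffness matrices instead of an assembled A; all names are ours):

```python
# Illustrative sketch of conjugate gradients with all dot products
# gathered at one place, following the reordering idea described above.
import numpy as np

def cg_fused(A, b, x0, threshold=1e-12):
    x = x0.copy()
    res = b - A @ x
    direction = res.copy()
    for _ in range(3 * len(b)):
        r = res @ res                  # squared residual norm
        if r < threshold:
            break
        q = A @ direction
        # The remaining dot products are computed in the same pass,
        # reducing the number of synchronization points.
        g = direction @ q
        br = res @ q
        a = q @ q
        alpha = r / g
        beta = (r - 2.0 * alpha * br + alpha * alpha * a) / r
        x = x + alpha * direction
        res = res - alpha * q
        direction = res + beta * direction
    return x
```

Note that β equals the usual ratio of new to old squared residual norms, since ||res − αq||² = r − 2αb + α²a.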

5.2.1 Clustered IMEX

As an extension to the plain IMEX method, Section 4.3.2 introduced clusters in order to
improve the performance. This results in smaller systems, which require fewer iterations to
converge. In the following, we provide some details on how we handle clusters within the
solving process. In order to find the sets of disjoint nodes efficiently, we use a union-find
structure. This process starts by assigning each node in N′ its own set. In an iterative
process, traversing all the tetrahedra in T′, the sets are joined appropriately. For each
tetrahedron t ∈ T′, we join the sets containing a node of t by assigning them the same
parent node, which also acts as the set label. Accessing the set label of a node is done
by recursively accessing the parent of the node until the top level is reached. During this
traversal, we store the visited nodes and directly reassign each of them its
set label in a second pass, which speeds up later queries. Based on the final node sets, each
tetrahedron in T′ is assigned to a list such that all tetrahedra in the same cluster are in a
single list.
The solving process described above is modified such that each cluster is solved indepen-
dently. For each set we allocate additional data structures to store the temporary variables
for the CG iterations. There is no need to handle node data explicitly because the nodes
of different clusters are not connected. This allows to compute the initial residual for
each cluster in a single pass over the elements in the previously computed lists. Simi-
larly, during the iterative part of the algorithm (see Algorithm 5.4) each cluster is treated
independently using the additional structures. This allows to terminate the loop for each
cluster individually. The maximum number of iterations is typically dominated by the
largest cluster. Nevertheless, if there are many small clusters, computation time is re-
duced significantly as these terminate early and are removed from the work queue.
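The two-pass union-find lookup and the cluster construction can be sketched as follows. This is illustrative Python with our own names (`find`, `build_clusters`); it assumes every node of a critical tetrahedron is itself critical:

```python
# Illustrative union-find clustering of critical tetrahedra with the
# two-pass path compression described above.

def find(parent, i):
    root = i
    while parent[root] != root:          # walk up to the set label
        root = parent[root]
    while parent[i] != root:             # second pass: relabel directly
        parent[i], i = root, parent[i]
    return root

def build_clusters(critical_tets):
    parent = {}
    for tet in critical_tets:            # every node starts as its own set
        for i in tet:
            parent.setdefault(i, i)
    for tet in critical_tets:            # join the sets touched by each tet
        for a, b in zip(tet, tet[1:]):
            parent[find(parent, a)] = find(parent, b)
    clusters = {}
    for tet in critical_tets:            # one tetrahedron list per cluster
        clusters.setdefault(find(parent, tet[0]), []).append(tet)
    return clusters
```

Tetrahedra sharing a node end up in the same list, so each cluster can be solved and terminated independently.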

5.3 Parallelization

Parallelization is an important topic when dealing with interactive simulations. Although
this thesis focuses on algorithmic improvements, we discuss how to parallelize FEM and

show the limitations of real-time simulations. In this section, we cover the details of
parallelizing FEM on both the CPU by using OpenMP [Chapman et al. 2007] and on
the nowadays commonly used compute coprocessors, e. g. the Compute Unified Device
Architecture (CUDA) by NVIDIA [2011]. A recent example for CPU side paralleliza-
tion is e. g. [McAdams et al. 2011], who implement a hexahedra-based parallel multigrid
solver for FEM. Recently, many works have been published discussing CUDA imple-
mentations of the FEM, e. g. [Taylor et al. 2008, Comas et al. 2008, Joldes et al. 2010,
Allard et al. 2011, Kim and Pollard 2011]. The basic concepts behind each of the
implementations are similar, independent of using either explicit or implicit integration. The
implementation by [Allard et al. 2011] is the most similar to ours; however, it was
developed independently. Their method is part of the SOFA framework [Allard et al.
2007] for medical simulations.

The main code path for both OpenMP and CUDA remains serial, while the loops iterating
over nodes and elements are parallelized. Looking at the integration code only, we see that
the size of the serial part contains only control flow instructions, while the parallelizable
part increases as larger problem are solved. Usually Amdahls law or Gustafsons law
[Gustafson 1988] are used to characterize the potential speedup by breaking the program
in to a serial and a parallel part, assuming that the latter can be parallelized perfectly.
Although, the application of these laws to our problem is only limited, they still serve as a
good estimate. We should see a good speedup behavior because only one synchronization
points requires serial execution. However, in order to exploit parallelization well, we
need to consider memory write-operations of the parallel part. Distributing the iteration
over nodes or elements to several execution units inevitable raises the question of how
to prevent read-write hazards. The parallelization algorithms can be viewed from two
fundamentally different positions: an output centric and an input centric view. The former
avoids hazards by iterating over the output elements instead of the input elements. The
downside of this approach is that the gathering process incurs redundant computations
if the output element depends on several input elements. On the other hand, the input
centric view focuses on minimizing redundant computations, but has to deal with the
problem of scattering results, potentially creating read-write hazards. These two views are
sketched in Figure 5.1. On the left each output element needs to fetch its input elements,
potentially doing the same computations for an input element several times. On the right
the computations are done for each input element once and the results are distributed to
the output elements. While the former is more expensive, the latter has potential read-write
hazards. In our parallelization approach, we opt for the latter version, similar to [Comas
et al. 2008]. In order to avoid any hazards, we use multiple output arrays, and merge
them in a later step.

Still, parallelizing the computations with OpenMP is straightforward. For each thread
we intend to use, we assign a number of FEM elements by adding appropriate compiler
instructions to the loop iterating over the elements. Additionally, each thread is assigned a


Figure 5.1: Gathering and scattering oriented computation. On the left each output el-
ement fetches several input elements. On the right the computations are done per input
element and the results are scattered to the output. Gathering potentially incurs redundant
computations, while scattering creates read-write hazards.

full-sized output array. It processes its assigned elements and stores the result in its output
array. In a second step the partial results are accumulated to compute the results per node.
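The two-step scatter can be illustrated sequentially as follows. In the actual implementation the outer loop over threads is an OpenMP parallel-for; here `elem_force` is a hypothetical stand-in for the per-element force computation, and all names are ours:

```python
# Sequential illustration of the two-step scatter: each (emulated) thread
# owns a full-sized output array, so element results are written without
# conflicts; a reduction merges them per node afterwards.
import numpy as np

def accumulate_forces(num_nodes, elements, elem_force, num_threads=4):
    partial = np.zeros((num_threads, num_nodes, 3))
    for t in range(num_threads):                      # parallel region
        for e in range(t, len(elements), num_threads):
            f = elem_force(e)                         # (4, 3) contributions
            for k, node in enumerate(elements[e]):
                partial[t, node] += f[k]              # private array: no hazard
    return partial.sum(axis=0)                        # second step: reduce
```

The memory cost is one full output array per thread, which trades space for the absence of locks or atomic operations.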

5.3.1 CUDA

Using CUDA to parallelize the computation is more complicated, but follows the same
pattern. First, we give a short introduction to the compute device architecture to identify
the crucial algorithmic design aspects.


Figure 5.2: Schematic overview of a CUDA device. Threads are grouped and assigned to
a streaming multiprocessor (SMP). Each SMP has its own register file and its private block
of shared memory.

The Compute Unified Device Architecture (CUDA) is a general description of how com-
putational units are organized. We provide a small overview only; for a full description
we refer to [NVIDIA 2011]. Figure 5.2 shows a schematic of a CUDA device. It is or-
ganized as a collection of independent streaming multiprocessors (SMPs), each of which

executes a number of threads in parallel. Because there are more threads assigned to a
single SMP than there are hardware resources to execute them in parallel, threads are
collected in groups of 32, called warps. Threads in a warp can be thought of as being
executed in parallel. The hardware is allowed to switch between warps to maximize re-
source utilization. In order to do this switching efficiently, each thread is assigned a fixed
number of registers it may operate on.
Additionally, each SMP contains a block of shared memory, which is accessible from all
the threads running on it. It acts as a place to store data in order for several threads to
share their computations. Communication between threads running on different SMPs is
not trivially possible. All of them have access to global random access memory, where
the input and output data are stored. Its size is several hundred megabytes, but accessing it
incurs high latency. This is usually not a problem because the latency can be hidden by
the scheduling unit.
In order to achieve high computational throughput, several issues have to be taken into
account. The efficiency highly relies on the fact that all threads in a warp are running in
lock step, executing the same instruction sequence. If computations on different threads
branch the execution, the diverging branches are serialized, executing first one branch,
then the other one. Thus, it is important that each thread executes the same instruction
sequence. Besides incurring high latency, global memory should be accessed as sequentially as
possible. Threads are linearly indexed, which allows issuing sequential memory access
commands based on the indices. The granularity and alignment of the access depend on
the hardware architecture. The last issue is accessing shared memory inappropriately.
Shared memory is organized into banks where no two threads of a warp are allowed to
access the same bank concurrently. If they do, the access will be serialized, stalling
the computations.
The computations are organized in units called kernels. A single kernel can be thought of
as a for-each loop executed in parallel, where each iteration is assigned to an individual thread.
In the following, we describe the kernels for the corotational FEM: rotation extraction and
force computation. In total, we distribute the work to four different kernels.

1. The first kernel prepares the rotation extraction by computing F and F^T F. Four
contiguous threads are dedicated to performing the matrix multiplication for the result-
ing 3×3 matrix. First, the kernel computes

    F = [(p0; 1), (p1; 1), (p2; 1), (p3; 1)] · [(r0; 1), (r1; 1), (r2; 1), (r3; 1)]^(−1),   (5.1)

where the p_i are the current and the r_i are the rest positions, each written as a column
vector extended by a homogeneous 1. We use four threads in order to
optimize the memory access pattern. Each loads one of the positions and a column
of the inverted position matrix. In a second step, F^T F is computed using three of
the four threads (the fourth thread remains idle). The results are stored in global
memory.

2. Using F and F^T F, the second kernel extracts the rotations using cyclic Jacobi
iterations (see Section 5.1). Each thread is assigned its own input matrix. Compared
to the rest of the computations, the costs of this part are negligible, thus we did not
optimize the method intensively. We use a scalar variant of the polar decomposition,
where only the branches are eliminated using a static code path. In order to optimize
the later memory loading, the rotation matrices are stored aligned to a 12 word
boundary.

3. The third kernel computes the internal forces for each tetrahedron. As described
above, we define a scattering scheme assigning each node of an element its own
storage location. The partial forces are computed by rotating the current positions
to the reference frame, multiplying the displacements with the stiffness matrix and
rotating the result back.

4. The final kernel accumulates the partial forces.
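The homogeneous formulation of the first kernel can be reproduced with a few lines of NumPy. This is an illustrative CPU sketch under our own naming, not the CUDA kernel; the 3×3 deformation gradient is the upper-left block of the 4×4 matrix product of Equation (5.1).

```python
import numpy as np

def deformation_gradient(p, r):
    """Compute F from current positions p and rest positions r (each a
    list of four 3D corner positions) via the homogeneous formulation:
    stack each position with a trailing 1, multiply by the inverse of
    the rest-position matrix, and take the upper-left 3x3 block."""
    P = np.vstack([np.column_stack(p), np.ones(4)])  # 4x4, columns (p_i, 1)
    R = np.vstack([np.column_stack(r), np.ones(4)])  # 4x4, columns (r_i, 1)
    F4 = P @ np.linalg.inv(R)
    return F4[:3, :3]

# Undeformed unit tetrahedron: F is the identity, so F^T F is as well.
rest = [np.array([0., 0., 0.]), np.array([1., 0., 0.]),
        np.array([0., 1., 0.]), np.array([0., 0., 1.])]
F = deformation_gradient(rest, rest)
FtF = F.T @ F  # input to the rotation-extraction kernel
```

Uniformly scaling the current positions by a factor of two yields F = 2I, which is a quick sanity check for the formulation.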

The most complex kernel is the one computing the forces per tetrahedron. The most ex-
pensive operation is the multiplication of a 12×12 matrix with a 12×1 vector. Because the
matrix is by far the largest piece of memory, the kernel cannot store it locally and needs
to stream it. 12 threads perform the data loading and multiplication in conjunction, such
that each thread computes one component of the output. Thus, we assign 12 consecutive
threads to compute the forces generated by a tetrahedron. Each tetrahedron is defined by
four indices pointing to its nodes and four indices describing the unique output locations
to avoid read-write hazards. For higher memory throughput, we pad the tetrahedron data
to a 12 word boundary. Figure 5.3 shows a 2D example of the graph coloring scheme
which assigns each corner of each tetrahedron its own output index. The total memory
requirements for this method scale linearly with the maximum vertex valence. In this ex-
ample the nodes A and B require 5, and node C 6 output locations. Thus, we generate 6
output locations for each node.
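The scattering scheme can be sketched as follows; the function name and data layout are illustrative, not the thesis code. Each (tetrahedron, corner) pair receives the next free slot of its node, and the number of slot arrays equals the maximum vertex valence:

```python
from collections import defaultdict

def assign_output_slots(tets):
    """Assign each corner of each tetrahedron a per-node output slot so
    that no two tetrahedra ever write to the same location; the number
    of slot arrays equals the maximum vertex valence."""
    next_slot = defaultdict(int)  # next free slot per node
    slots = {}                    # (tet index, corner index) -> slot
    for t, tet in enumerate(tets):
        for c, node in enumerate(tet):
            slots[(t, c)] = next_slot[node]
            next_slot[node] += 1
    num_arrays = max(next_slot.values())  # = maximum vertex valence
    return slots, num_arrays

# Two tetrahedra sharing a face: the shared nodes need two slots each.
tets = [(0, 1, 2, 3), (1, 2, 3, 4)]
slots, num_arrays = assign_output_slots(tets)
```

The final accumulation kernel then sums, for each node, over all of its slots, which is free of read-write hazards by construction.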


Figure 5.3: Graph coloring for hazard-free writes on CUDA devices. Each node has a
separate output location for each adjacent triangle, e. g., the algorithm generates 5 output
locations for node A.

The first step of the kernel loads the necessary data (except for the stiffness matrix) and
stores them in the shared memory. The 12 threads share the task of loading the data
to maximize hardware utilization. Next, the rotations are applied to the positions and
the displacements in the rest-frame are computed. The result again is stored in shared
memory. The main task of computing the matrix-vector product is done by accumulating
the point-wise product of each column of the stiffness matrix with the displacement vector.
Each thread loads a single scalar of a column and multiplies it with the corresponding
displacement value. In order to efficiently allow this operation, the memory layout of the
stiffness matrix is reorganized in a preprocessing step such that the first columns of all
stiffness matrices are adjacent in memory. The same is applied to the rest of the columns.
This storage layout allows loading the matrix data consecutively without violating any
memory alignment rules. Finally, the computed forces are rotated back to the current
frame and stored in the defined memory location for the corner of the tetrahedron.
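The per-corner force computation of this kernel (rotate to the rest frame, multiply with the stiffness matrix, rotate back) can be sketched densely in NumPy. The helper name and sign convention are our own, and the actual kernel streams the 12×12 matrix column by column instead of forming it explicitly:

```python
import numpy as np

def element_forces(K, R, p, r):
    """Internal forces of one corotational tetrahedron.
    K: 12x12 element stiffness matrix, R: 3x3 extracted rotation,
    p, r: 4x3 arrays of current and rest corner positions."""
    Rb = np.kron(np.eye(4), R)                # block-diagonal 12x12 rotation
    u = Rb.T @ p.reshape(12) - r.reshape(12)  # displacement in the rest frame
    return -(Rb @ (K @ u)).reshape(4, 3)      # partial force per corner

# Undeformed element with identity rotation: all forces vanish.
r = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
f = element_forces(np.eye(12), np.eye(3), r.copy(), r)
```

With K set to the identity and no rotation, a uniform displacement of the corners produces the opposite uniform force, which makes the convention easy to verify.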

5.3.2 IMEX using CUDA

Implementing our IMEX algorithm on a massively data-parallel architecture using CUDA
is a straightforward process using the building blocks of the explicit integration imple-
mented in CUDA in the previous section. This section summarizes the implementation,
which is based on the algorithm outlined in Section 5.2.

The CUDA implementation uses the same control flow as the CPU version, just replacing
the loops over the nodes and elements with device kernels. The loop over the tetrahedral
elements is performed by almost the same kernel as is used for explicit integration. Only
small additions are needed for the kernels which compute the initial residual and q = Ad.
These additions contain fetching additional node data and checking which nodes need to
be updated, which are local and inexpensive operations. Additional kernels were written
only for the dot products and the vector updates. In order to reduce the number of invoked
CUDA kernels and also communication overhead, we do not use the clustering approach
for the IMEX. Typically, the number of ill-shaped elements is small anyway and thus the
CUDA device will not be used to its full capacity.

5.4 Performance

In this section we look at the performance of the implemented FEM algorithms. In the
previous chapters the computation times were mostly gathered using single-threaded
versions. Here, we measure the performance of the parallelized implementations in order
to determine the applicability of explicitly integrated corotational FEM for interactive
simulations. We show the potential speedup using OpenMP and CUDA. The benchmarks

are performed on an Intel Core2 Quad 2.66 GHz on Windows 7 64-bit using the Mi-
crosoft Compiler version 16 (Visual Studio 2010). The OpenMP results were obtained
on the same system, using different numbers of threads. The CUDA benchmarks were
running on a NVIDIA GTX 285.

Dimensions (l × w × h)   # Nodes   # Tetrahedra   Rotation Extraction [ms]   Compute Forces [ms]   Total [ms]
10 × 1 × 1                   44           50              0.024                      0.018             0.041
10 × 2 × 2                   99          200              0.095                      0.070             0.16
10 × 3 × 3                  176          450              0.21                       0.16              0.37
10 × 4 × 4                  275          800              0.37                       0.29              0.66
10 × 5 × 5                  396         1250              0.59                       0.45              1.04
10 × 6 × 6                  539         1800              0.84                       0.65              1.49
10 × 7 × 7                  704         2450              1.15                       0.91              2.06
10 × 8 × 8                  891         3200              1.49                       1.20              2.69
10 × 9 × 9                 1100         4050              1.90                       1.60              3.50

Table 5.2: Performance of the corotational FEM implementation. In 1 ms the forces of
approximately 1000 tetrahedra can be computed. The average computation time for a
single element is 0.9 µs. This yields an approximate throughput of 1 million tetrahedra
per second.

The first benchmark is implemented as a single-threaded version. In order to make it scal-
able we use a tetrahedral mesh based on a regular grid, where each cell is subdivided into
5 tetrahedra. One side of the mesh is subject to a fixed boundary condition, while the rest
of the nodes are subject to gravitational forces. Table 5.2 shows the computation time
necessary to perform a force update for one time step. Real-time performance is given
when the computation time is lower than the step size. Given a practical target time step of
1 ms, this would limit the maximum mesh size to less than 1000 elements. Further, it can
be noted that the rotation extraction takes about 60% of the computation time, which is a
substantial part. Changing the memory layouts to take advantage of vector instructions is
important to reduce these costs, as discussed in Section 5.1.
Using OpenMP to parallelize the computations gives a speedup even for very small sam-
ples. For the same test cases as above, the performance numbers using up to four threads
are plotted in Figure 5.4. Up to a single synchronization point all the computations can
be performed in parallel. Thus, even for small sample sizes there is a performance gain,
albeit setting up the parallelized calls incurs latencies. Increasing the number of threads
from one to four, increases execution performance even for the smallest sample sizes.
Additionally, by increasing the sample size, the latencies are amortized and, in accordance
with Gustafson's law, the speedup factor approaches almost 2x, 3x, and 4x.
Running the same samples on the GPU shows an entirely different picture. In Figure 5.4
the performance of the GPU implementation is depicted along with the OpenMP results.

Figure 5.4: Performance of the parallel implementations of the corotational FEM. With
growing work sets, both OpenMP and CUDA get more efficient by amortizing the over-
head associated with starting parallel computations. For OpenMP the performance gain
grows from approximately 2x to approximately 4x, while for CUDA small samples are
slower but the improvements grow to approximately 10x.

Compared to the CPU results the computation times are low and increase only slightly
with a growing working set. Note that for small cases the CUDA version is slower than
the CPU version. This has two reasons: First, the measurements include the data upload
to the device, e. g., positions, the invocation of the kernels, and the data download;
invoking memory transfers is comparably expensive and causes high latency. Second,
calling kernels is expensive in general. We use a small sample size of 5 tetrahedra to
measure the overhead involved in calling the required device kernels only (i. e. doing only
minimal amount of computations). This overhead is roughly 0.2 ms on a NVIDIA GTX
285, which naturally limits the minimum possible simulation time step. With upcoming
systems (lower latencies, more flexible CUDA devices) this overhead could be reduced.
Furthermore, in most cases it is possible to place the memory transfers such that the in-
curred latency is masked by other computations. Thus, the second CUDA graph (CUDA
(compute)) shows the computation time for the kernels only, assuming that the costs of the
memory transfers could have been hidden. The costs incurred by the memory transfers
account for up to 30% of the runtime. Through this, it is possible to improve the performance
by a factor of 10 in these examples. Higher speedups are observed with larger sample
sizes. However, if, due to other parts of the simulation, the maximum number of elements
were limited to 1000, the performance improvements are only approximately a factor of 3 to
5, depending on whether memory transfers can be hidden or not. Note that the OpenMP
implementation with four threads performs about the same in this case.

Figure 5.5: Computation times for different mesh sizes and different iteration grouping
sizes to execute 40 iterations.
In summary, there is no parallelization path that is clearly the best. OpenMP is much
easier to use than CUDA and shows an immediate benefit even for small working sets.
While the potential speedup is much larger using CUDA, the available time per step is
limited and thus the involved latency may limit the usability of CUDA.

5.4.1 Speedup using CUDA for IMEX

As already mentioned above, launching a device kernel and waiting for the computed
results entails a significant latency. While explicit integration requires invoking four ker-
nels, the IMEX requires many more. In addition to the four of the explicit integration, we
invoke two kernels to initialize the residual and another three for each iteration. Assuming
that 20 iterations are necessary for the CG to converge, the number of invoked kernels
accumulates to 60. Furthermore, for 20 iterations this means that we need to stop the
computations 20 times in order to check if the result has already converged; this check
increases the latency significantly. Since we cannot reduce the number of called kernels,
we try to reduce the latency within the iterations by grouping several iterations without
checking the error.
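The grouping idea can be illustrated with a plain conjugate gradient loop that defers the convergence check. This is a CPU sketch with hypothetical names; in the CUDA implementation each check corresponds to a costly device synchronization, which is exactly what the grouping avoids:

```python
import numpy as np

def cg_grouped(A, b, n_group=10, tol=1e-8, max_iter=200):
    """Conjugate gradients that inspects the residual only every
    n_group iterations, mimicking the grouped kernel launches."""
    x = np.zeros_like(b)
    r = b - A @ x
    d = r.copy()
    rr = r @ r
    for i in range(max_iter):
        q = A @ d
        dq = d @ q
        if dq == 0.0:          # breakdown guard: already converged
            break
        alpha = rr / dq
        x += alpha * d
        r -= alpha * q
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
        # Deferred convergence check, only once per group of iterations.
        if (i + 1) % n_group == 0 and np.sqrt(rr) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # small SPD test system
b = np.array([1.0, 2.0])
x = cg_grouped(A, b)
```

Omitting the check entirely and running a fixed iteration count, as suggested above, corresponds to setting n_group larger than max_iter.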
The performance improvements of grouping iterations are summarized in Figure 5.5. Sim-
ilar to the benchmark in Section 5.4 we use a tetrahedral mesh based on a regular grid.

We configure the IMEX such that all elements are classified as critical. The performance
is measured for a total of 40 iterations for different grouping sizes n. Note that the per-
formance increases with higher values of n. Choosing a good value for n depends on
the expected number of iterations. Typically, the number of iterations per time step is
nearly constant for a simulation, thus the easiest optimization is to omit the checks com-
pletely and just stop after a distinct number of iterations. Additionally, this would ensure
a constant runtime for the algorithms, which is important if only limited computational
resources are available.
Although we obtain significant improvements in case of explicit FEM (see Section 5.4),
the speedup in the case of the IMEX is not very impressive. We perform the experiments
from Section 4.3.3 again using the CUDA-based IMEX and compare the performance.
In order to improve the implicit part, we group 10 iterations and execute them without
checking for convergence. In cases where all elements are well-shaped we find speedups
of approximately 10x, similar to a fully explicit integration. As soon as ill-shaped elements
appear the speedup factor decreases to approximately 3x to 4x. In many cases the set of
ill-shaped elements is small and thus the compute device cannot exploit its full potential.
Thus, for small problem sizes we observe low speedup factors which can also be achieved
using OpenMP. With a growing number of elements in the implicit set, the speedup factor
increases quickly because the costs of the CPU implementation grow faster than the costs
of the CUDA implementation.

5.5 Iterative Computation of the Highest Oscillation Frequency

In the previous sections, we discussed how to improve the performance of FEM by par-
allelizing the computations. This section shows how to efficiently evaluate the CFL con-
dition in case topological changes occur and elements need to be checked. Determining
the oscillation frequencies of a system matrix is an expensive procedure because it requires
solving an eigenvalue problem. However, because we only need the highest frequency,
we can use an iterative process to determine the largest eigenvalue only. The straight-
forward approach of solving the eigenvalue problem using established numerical procedures
is implemented with the Eigen library [Guennebaud et al. 2010], which is based on the
NIST public domain code of JAMA [Hicklin et al. 2005]. It computes a full solution, i. e.,
all the eigenvalues and eigenvectors. The iterative algorithm computes the highest fre-
quency directly using a method called power iterations [Schwarz and Köckler 2006].
Repeatedly applying a matrix to a vector lets it converge to the eigenvector corresponding
to the largest eigenvalue:

x ← normalize((1, ..., 1)ᵀ)
λ₀ ← 0
for i = 1, ..., i_max do
    v ← M⁻¹K x
    λᵢ ← vᵀ x
    x ← normalize(v)
    % Stop iterations if convergence is reached.
    if |λᵢ − λᵢ₋₁| / λᵢ < ε then
        return λᵢ
    end if
end for

The input to the algorithm is the system matrix M⁻¹K and a unit-length vector as approx-
imation to the final eigenvector. Each iteration improves the approximation of eigenvalue
and eigenvector. The convergence of the method depends on the difference in magnitude
between the largest and the second-largest eigenvalue. A larger difference results in a higher
convergence rate; fewer iterations are necessary.
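The pseudocode above translates directly into a short NumPy routine; the function name is ours, and the matrix argument stands for M⁻¹K:

```python
import numpy as np

def largest_eigenvalue(A, x0, tol=1e-9, max_iter=100):
    """Power iteration for the largest eigenvalue of A, following the
    pseudocode above: repeatedly apply A, estimate the eigenvalue with
    the Rayleigh quotient, and stop on small relative change."""
    x = x0 / np.linalg.norm(x0)
    lam_prev = 0.0
    for _ in range(max_iter):
        v = A @ x
        lam = v @ x                # Rayleigh quotient (x has unit length)
        x = v / np.linalg.norm(v)  # normalized next iterate
        if abs((lam - lam_prev) / lam) < tol:
            return lam
        lam_prev = lam
    return lam

# Diagonal test matrix: the largest eigenvalue is 5.
A = np.diag([1.0, 2.0, 5.0])
lam = largest_eigenvalue(A, np.ones(3))
```

The convergence rate is governed by the ratio of the two largest eigenvalues, matching the observation above.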

In a numerical experiment, we test the performance of the method on the prototypical
elements depicted in Figure 5.6. We normalize the measurements by employing the same
method as in Section 3.5; Young's modulus and the density are excluded from the system
matrices by setting them to E = 1 Pa and ρ = 1 kg/m³, respectively. Likewise, we vary
the Poisson ratio to provide a good coverage of the material parameters. Table 5.3 shows
that using power iterations is approximately 20 times faster than using a full eigenvalue
computation algorithm. However, it is obvious that the computation time can be reduced
if a better input vector is used; ideally the eigenvector itself.

               ν = 0.3              ν = 0.4              ν = 0.49
Model      TRef [µs]   T [µs]   TRef [µs]   T [µs]   TRef [µs]   T [µs]
Cap          19.48      1.26      20.04      1.24      18.39      0.84
Needle       18.61      1.78      18.59      0.94      19.13      0.81
Round        15.49      1.54      15.21      1.25      15.15      0.84
Sliver       19.52      1.23      18.98      1.39      18.95      0.83
Spindle      19.49      1.83      19.32      1.39      18.85      1.19
Wedge        19.67      1.25      19.00      1.41      18.86      0.84

Table 5.3: Performance advantage of using power iterations to compute the largest eigen-
value for the single element identification. TRef is the computation time in microseconds
needed for the baseline algorithm to compute the eigenvalues. T is the computation time
in microseconds for our algorithm using a uniform vector as input.

It was shown by Kidger [1992] that the vibration mode corresponding to the largest eigen-
value of the system matrix reflects the change in volume. For isotropic materials this is
expressed by a scaling in the direction of this mode, which largely corresponds to the
points moving along their normal. We use this relation to speed up the identification of
the maximum oscillation frequency for a single tetrahedron. An improvement is observed
for both single-element metrics, the single element identification and the reduced 1-ring
method, even though the latter includes contributions from adjacent elements.

Figure 5.6: With growing Poisson ratio ((a) ν = 0, (b) ν = 0.3, (c) ν = 0.4, (d) ν = 0.49)
the normals (green) align better with the estimated eigenvectors (purple). The normals
are a good estimate for the eigenvectors and thus, when used as input to the algorithm,
reduce the number of needed iterations.

We show the correspondence of the highest mode eigenvector to the normals in Figure 5.6
using prototypical tetrahedral elements at different Poisson ratios. The node normals cor-
respond to the normals of the opposite faces. As can be seen, the correspondence in gen-
eral is very good. For higher Poisson ratios the angle between the normals and the highest
eigenmode decreases. Thus, the normals are a precise and easy-to-compute approximation
of the eigenmode corresponding to the highest oscillation frequency. Hence, we use the
face normals {n0, n1, n2, n3} as input instead of the normalized uniform vector.
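Assembling the start vector from the four outward face normals can be sketched as follows; the vertex ordering and the orientation test are our own assumptions for the illustration:

```python
import numpy as np

def normal_start_vector(verts):
    """Stack the outward unit normals of the four faces (face i lying
    opposite node i) of a tetrahedron into a normalized 12-vector,
    suitable as a start vector for the power iteration.
    verts: 4x3 array of node positions."""
    faces = [(1, 2, 3), (0, 3, 2), (0, 1, 3), (0, 2, 1)]
    centroid = verts.mean(axis=0)
    normals = []
    for a, b, c in faces:
        nv = np.cross(verts[b] - verts[a], verts[c] - verts[a])
        nv /= np.linalg.norm(nv)
        if nv @ (verts[a] - centroid) < 0:  # flip inward-pointing normals
            nv = -nv
        normals.append(nv)
    x0 = np.concatenate(normals)
    return x0 / np.linalg.norm(x0)

# Unit tetrahedron: the face opposite the origin has normal (1,1,1)/sqrt(3).
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
x0 = normal_start_vector(verts)
```

The resulting 12-vector replaces the normalized uniform vector as input to the power iteration.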
This improvement of the algorithm yields an additional speedup, which is summarized in
Table 5.4; it reduces the number of necessary iterations and thus the computation time. T0 and Tn
are the computation times in microseconds for our algorithm using a uniform vector and
the normals as input, respectively. I0 and In are the average numbers of iterations per element.
In contrast, the full evaluation using the Eigen library needs about 20 µs per element.
In general, for larger Poisson ratios fewer iterations are necessary because the difference
between the largest and the second-largest eigenvalue grows. In summary, within 1 ms
approximately 1000 elements can be tested for whether they fulfill the CFL condition.

                 ν = 0.3                   ν = 0.4                   ν = 0.49
Model      T0 [µs]  Tn [µs]  I0  In   T0 [µs]  Tn [µs]  I0  In   T0 [µs]  Tn [µs]  I0  In
Cap          1.26     0.91    7   3     1.24     0.75    7   2     0.84     0.75    4   2
Needle       1.78     1.16   11   5     0.94     1.05    5   4     0.81     0.76    4   2
Round        1.54     0.77    9   1     1.25     0.58    7   1     0.84     0.71    4   1
Sliver       1.23     0.75    7   2     1.39     0.75    8   2     0.83     0.75    4   2
Spindle      1.83     1.17   11   5     1.39     1.04    8   4     1.19     0.78    5   2
Wedge        1.25     0.76    7   2     1.41     0.75    8   2     0.84     0.75    4   2

Table 5.4: Performance advantage of using normals as input for power iterations. T0, Tn
are the computation times in microseconds for our algorithm using a uniform vector or
the normals as input; I0, In are the numbers of iterations.

Applying the iterative method to more complex models shows similar results. Table 5.5
lists the average identification time per element for some of the meshes in Figure 3.5.
Comparing the computation time for single elements to the time needed to apply power
iterations to a complete mesh demonstrates the advantage of approximating the largest
oscillation frequency per element. On average, the costs per node and iteration for com-
puting the largest eigenvalue of the complete system matrix are approximately 0.1 µs. E.g.,
the complete identification process for Liver (M) with ν = 0.4 takes 11.3 ms, whereas

                 ν = 0.3                   ν = 0.4                   ν = 0.49
Model       T0 [µs]  Tn [µs]  I0  In   T0 [µs]  Tn [µs]  I0  In   T0 [µs]  Tn [µs]  I0  In
Liver (M)     1.51     1.20    8   5     1.18     0.99    6   3     0.87     0.75    4   2
Bar (F)       1.52     1.20    8   5     1.18     0.99    6   3     0.87     0.75    4   1
Bunny (XF)    1.49     1.09    8   4     1.19     0.94    6   3     0.87     0.75    4   1
Dragon        1.48     1.08    8   4     1.18     0.95    6   3     0.87     0.75    4   1
Armadillo     1.50     1.14    8   4     1.19     0.97    6   3     0.87     0.76    4   1
Duck          1.44     1.00    8   3     1.18     0.90    6   3     0.86     0.75    4   1

Table 5.5: Average computational effort per element to compute the largest eigenvalue
for various meshes. T0, Tn are the computation times using a uniform vector or the
normals as input; I0, In are the numbers of iterations.

identifying each element individually takes 1 µs per element; in total 2.75 ms. In con-
trast, for the Duck mesh with ν = 0.4 the process takes 0.52 ms for the whole matrix and
0.55 ms for the individual elements. In general, running the identification process for the
whole mesh or for all elements individually takes about the same computational effort
for small meshes, while for larger meshes identifying the elements individually is more
efficient.

5.6 Discussion
In this chapter we discussed various implementation-specific aspects important for im-
plementing our methods efficiently. First, we looked at different ways to extract the rota-
tion from a deformation tensor. We showed that all methods are roughly comparable
performance-wise. We believe that the computational work cannot be significantly re-
duced; however, as the results by McAdams [2011] indicate, vectorizing the processes
can be done effectively. This is essential as the rotation extraction makes up about
60% of the overall computation time for the explicit corotational linear FEM. In general,
parallelizing explicit integration scales very well with the number of additional processors.
However, more processors also means that larger problems need to be solved in order to
hide the initial latency. This may not always be possible due to various circumstances lim-
iting the number of elements in a simulation. While the CUDA implementation shows
a tremendous speedup for explicit FEM, the gain is much lower for the IMEX method.

Solving the implicit part of it using CUDA is efficient only for large problems. The small
problems typically appearing when using the IMEX scheme do not use the hardware to
its full capacity and thus, the improvements we measure are only in the range of OpenMP
parallelization. Finally, we showed that it is possible to efficiently compute the highest
eigenvalue of a tetrahedral system matrix M⁻¹K by exploiting the fact that the normal
vectors of a tetrahedron align with the eigenvector of the highest oscillation frequency.
6 Conclusion

This final chapter first gives a short summary of the presented topics and then discusses
the main contributions. We wrap up this thesis by looking into shortcomings we observed
and by giving an outlook on possible extensions to our work.

6.1 Summary
We presented a framework which enables us to detect and handle ill-shaped elements in
generic explicitly integrated FEM simulations. First, this allows us to efficiently identify
the critical integration time step for a given mesh by only looking at single elements.
Second, we are able to select an update step which is above the critical time step by
handling the elements that need a smaller one. As a consequence, the time step can be
maintained throughout the simulation by handling any changed or newly created elements
resulting from topological changes.
In the first part of this thesis we discussed general stability properties of explicit integra-
tion algorithms. We showed that the CFL condition is an appropriate measure to capture
all the factors that influence the critical time step. However, the necessity of solving
a huge eigenvalue problem makes this approach inadequate for interactive simulations.
Based on these observations we developed several element-based methods to approxi-
mate the stability criterion. We showed that single elements are sufficient to compute
a bound for the highest oscillation frequency and that an extension to the neighborhood
significantly improves the results. As one of these extensions we presented the reduced
1-ring approximation, which takes the influence of the direct neighbor elements into ac-
count. With extensive numerical tests we verified that the error is low, which makes this
method perfectly usable to identify the critical time step for explicit integration of linear
FEM. Finally, we extended the method to non-linear FEM and showed that it can be used
to verify the stability of a simulation at any given time.
In the second part, we developed four approaches for computing the dynamic evolution
of deformable objects. These overcome the stability problems of previous approaches

when employing explicit numerical time integration. All methods consist of two stages:
In the first step we determine the ill-shaped elements, i. e., those which cannot be sim-
ulated stably given a target time step. This classification is done by evaluating the CFL
condition for each element individually. For the filtered FEM we compute all the eigen-
modes of each element without any neighborhood information. For the other methods
we consider the mutual interactions with the neighboring elements and employ the re-
duced 1-ring method to find the critical time step for each element. This identification
step can be performed as a precomputation step in case of using linear FEM. The sec-
ond step computes the dynamics of the deformable objects using the corotational linear
FEM. In order to keep the explicit integration stable we handle the ill-shaped elements
separately. Each of the four presented methods uses a different approach. The filtered
FEM modifies the stiffness matrices of the elements such that no oscillation frequency
exceeds the stability criterion. In order to prevent elements from collapsing, modified ele-
ments are constrained along the filtered directions. Instead of filtering, the hybrid FE/PBD
method uses a position-based approach to solve the dynamics equations of the ill-shaped
elements. In contrast, the IMEX method uses an implicit solver to handle the ill-shaped
elements. The last method uses a time-adaptive integration approach. The nodes of
the mesh are sorted into queues depending on the critical time step of the adjacent elements.
Each queue is then integrated using its distinct update rate.

In the last part of this thesis we presented various implementation aspects. We took a
closer look at the implementation of rotation extraction algorithms, which account for the
majority of the computation costs. We conducted a small survey and compared various
methods in order to find the best algorithm. Furthermore, we discussed different paral-
lelization technologies and applied them to the corotational linear FEM and the IMEX
method. The chapter closes by showing an iterative algorithm to efficiently compute
the largest eigenvalue of the element system matrices.

6.2 Discussion of Contributions

The key benefit of the presented methodology is the alleviation of the dependency between
meshing and material parameters from the size of the time step. As a consequence, an
initially defined target time step can be maintained throughout a simulation, even if an
object undergoes limited topological changes. This makes our approach particularly well-
suited for interactive cutting simulations.

The element-based approximation of the highest eigenfrequency of the system matrix of
a simulation mesh allows us to create an accurate metric to judge the quality of single
elements. While this thesis mainly discusses the application to tetrahedral elements, the
extension to other element types is straightforward. In contrast to other methods, our ap-
proximation approach takes the exact element geometry and material model into account
by directly working on the element stiffness matrices. Moreover, while current methods
only take the element itself into account, our reduced 1-ring metric accurately includes
the contributions of the neighboring elements. It proves to be the best method to approx-
imate the CFL criterion on the basis of individual elements. It shows only small errors
in the computed results and is efficient to evaluate, thus allowing individual elements to
be classified precisely. The locality of the developed methods makes them applicable in
situations where topological changes are frequent, e. g., due to users performing cutting
operations. Due to the intrinsic error of the reduced 1-ring, there is no definite guarantee
that no mis-identification happens, even though the probability is low.

However, just identifying the critical time step is not always enough. Through topological
changes the critical time step may decrease significantly to a point where the simulation is
not interactive anymore. Similarly, mesh generation often creates a few elements of low
quality, which require an unnecessarily small time step. In these cases the four developed
deformation models make it possible to maintain the initial time step by handling any
appearing ill-shaped elements. By the same means, the simulation time step can be chosen
above the critical time step from the beginning. Typically, it can be increased by a factor
of two while only a few elements need to be handled specially.

While all the developed deformation methods allow handling ill-shaped elements and
improving the time step, the quality of the resulting simulations varies. For a small set of
ill-shaped elements, the filtered FE method provides good results with a low error rate.
However, due to the strict definition of the constraints, it is not able to cope with large
connected regions of ill-shaped elements due to locking problems. From a performance
perspective the hybrid FE/PBD method is superior, because its costs are close to those
of the unmodified FE method. The downside is that the observed errors are comparably
large in regions of large strains or if high Poisson ratios are used. However, typically
the strains in medical simulations are small and thus the errors will also be small. The
remaining two methods show only very small errors even in case of large strains or large
clusters of ill-shaped elements. On the other hand, the IMEX method scales badly with
the number of ill-shaped elements, because a system of equations needs to be solved.
While the time adaptive FE method scales well, it is cumbersome to implement and the
runtime is dominated by the worst shaped element, which may potentially take infinitely
many steps. This requires additional care to ensure that such elements do not appear.

In any case, we cannot evade the conditional stability of explicit integration schemes com-
pletely. If the target time step is too large all elements are classified as ill-shaped, and thus
we would have no explicitly integrated elements left. While our methods, with exception
of the filtered FEM, are still applicable, their usage is questionable. The methods pre-
sented in this thesis have been developed with a specific type of interactive real-time ap-
plication in mind, where using small time steps can be desirable. For instance in surgical
simulations, where small time steps are typical and ease tasks such as collision handling
or haptic rendering.

6.3 Future Work

Although we can use explicit integration effectively in interactive simulations without the
need to constantly adapt the time step, various open questions remain.
In order for the reduced 1-ring method (see Section 3.3.3) to evaluate the CFL condition
conservatively, we need to apply a margin. So far, this margin is chosen heuristically based
on experimental results such that any observed error lies within it. A theoretical derivation
or a better heuristic method would make the classification algorithms more solid. Although the
application of the developed metrics to the (corotational) linear FEM is clear and works
well, we have only preliminarily verified its usage on the non-linear Saint Venant-Kirchhoff
model. More rigorous tests and the extension to more complex hyper-elastic materials are
necessary.
A common way to extend the material models is to include additional structural damp-
ing. However, adding damping in explicit integration is cumbersome because the CFL
condition needs to be adapted. Klein [2007] describes the CFL condition for a central
difference scheme with damping. For a fully damped simulation, the computed critical
time step is 41% of the undamped case. Even if we accept the reduced critical time step,
extending the scalar damping definition to a multi-dimensional case is not straightforward.
Instead, it would be favorable to develop a damping scheme which influences the CFL
condition in a predictable, positive way and, similar to Rayleigh damping, only acts
against the internal forces.
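For reference, the damped bound for the central difference scheme has the standard form dt <= (2/omega_max)(sqrt(1 + xi^2) - xi), with damping ratio xi. A short sketch reproduces the 41% figure quoted above:

```python
import math

def damped_critical_step(omega_max, xi):
    """Critical central-difference time step with damping ratio xi:
    dt_crit = (2 / omega_max) * (sqrt(1 + xi**2) - xi)."""
    return (2.0 / omega_max) * (math.sqrt(1.0 + xi * xi) - xi)

ratio = damped_critical_step(1.0, 1.0) / damped_critical_step(1.0, 0.0)
print(ratio)  # sqrt(2) - 1 = 0.414..., i.e. the 41% quoted above
```

For xi = 0 the familiar undamped bound 2/omega_max is recovered.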
The filtered FEM shows shortcomings which limit its applicability. First, the definition of
the constraints is too strict, which leads to the creation of too many constraints and to locking
problems. Moreover, we have seen that high compression can lead to unstable constraint
resolution. These two problems require adapting the definition of the constraints. Second,
solving the constraints is expensive due to the necessity of solving a system of linear equa-
tions. A different approach to resolving the constraints could reduce these costs; a possible
candidate is Position-based Dynamics by Müller et al. [2007].
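To indicate why this direction is attractive: a single PBD projection is a closed-form position update with no linear system. A minimal sketch of the distance-constraint projection from Müller et al. [2007], with an illustrative particle layout:

```python
def project_distance(p1, p2, w1, w2, rest, stiffness=1.0):
    """One Gauss-Seidel projection of the distance constraint
    C = |p1 - p2| - rest from Position-based Dynamics; w1, w2 are
    inverse masses, `stiffness` the simple per-iteration stiffness."""
    d = [a - b for a, b in zip(p1, p2)]
    dist = sum(c * c for c in d) ** 0.5
    if dist == 0.0 or w1 + w2 == 0.0:
        return p1, p2  # degenerate or fully fixed: nothing to do
    s = stiffness * (dist - rest) / (dist * (w1 + w2))
    p1 = [a - w1 * s * c for a, c in zip(p1, d)]
    p2 = [b + w2 * s * c for b, c in zip(p2, d)]
    return p1, p2

# Two unit-mass particles 2.0 apart with rest length 1.0:
a, b = project_distance([0.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                        1.0, 1.0, 1.0)
# both move 0.5 towards each other: a = [0.5, 0, 0], b = [1.5, 0, 0]
```

Iterating such projections over all constraints replaces the linear solve, at the cost of stiffness becoming iteration-dependent.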
In the case of the hybrid FE/PBD method we find two distinct deficiencies. First, anisotropic
behavior is caused by the shapes of the clusters. Using a more elaborate set of parameters
instead of a single scalar value could reduce this artifact by obtaining a more isotropic
behavior despite the anisotropic element shape. Second, as Section 4.5.2 demonstrates,
the current geometric model shows a very different behavior than the linear FEM for large
strains or high Poisson ratios. Extending the rigid transformation to a linear transformation
should be sufficient to represent the deformation modes of a tetrahedral element. How-
ever, a more elaborate parameterization is then necessary to match the deformation of the
linear modes to the original finite element deformation.
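The suggested extension from a rigid to a linear transformation has a closed-form least-squares solution, A = (sum p q^T)(sum q q^T)^{-1}, familiar from shape matching (Müller et al. [2005]). A 2D sketch of the construction; the thesis operates on 3D tetrahedral clusters, so this only illustrates the idea:

```python
def optimal_linear_transform(rest, cur):
    """Least-squares linear map A = (sum p q^T)(sum q q^T)^-1, where
    q and p are rest/current positions relative to their centers of
    mass (2D illustration; equal particle masses assumed)."""
    n = float(len(rest))
    cr = [sum(v[i] for v in rest) / n for i in (0, 1)]
    cc = [sum(v[i] for v in cur) / n for i in (0, 1)]
    q = [(x - cr[0], y - cr[1]) for x, y in rest]
    p = [(x - cc[0], y - cc[1]) for x, y in cur]
    Apq = [[sum(pi[r] * qi[c] for pi, qi in zip(p, q)) for c in (0, 1)]
           for r in (0, 1)]
    Aqq = [[sum(qi[r] * qi[c] for qi in q) for c in (0, 1)]
           for r in (0, 1)]
    det = Aqq[0][0] * Aqq[1][1] - Aqq[0][1] * Aqq[1][0]
    inv = [[Aqq[1][1] / det, -Aqq[0][1] / det],
           [-Aqq[1][0] / det, Aqq[0][0] / det]]
    return [[sum(Apq[r][k] * inv[k][c] for k in (0, 1)) for c in (0, 1)]
            for r in (0, 1)]

# Unit square stretched by a factor of two along x:
A = optimal_linear_transform([(0, 0), (1, 0), (1, 1), (0, 1)],
                             [(0, 0), (2, 0), (2, 1), (0, 1)])
# A recovers the pure stretch [[2, 0], [0, 1]]
```

Unlike the rigid case, A captures stretch and shear directly, which is exactly the additional expressiveness the paragraph above calls for.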
With the possibility of using explicit integration, collision handling should be adjusted
and simplified to take advantage of small time steps. Between consecutive time steps, the
set of colliding points does not change much and the penetrations are small. Thus, the
generated resolution displacements are also small. This allows using simpler methods for
collision resolution, and less care has to be taken, e.g., to check for elements inverted due
to the collision.
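As an illustration of this simplification: resolving shallow penetrations against a static half-space reduces to projecting each penetrating point back onto the surface, with no system solve and no inversion checks. This is a hypothetical sketch of such a simpler resolution step, not a method from this thesis:

```python
def resolve_small_penetrations(points, plane_n, plane_d):
    """Project points penetrating the half-space n.x >= d back onto
    its boundary along the (unit) normal n. Valid only because small
    time steps keep penetrations shallow."""
    resolved = []
    for x in points:
        pen = plane_d - sum(n * c for n, c in zip(plane_n, x))
        if pen > 0.0:  # point lies below the plane by `pen`
            x = [c + pen * n for c, n in zip(x, plane_n)]
        resolved.append(x)
    return resolved

# One point 0.01 below the ground plane y = 0, one point above it:
pts = resolve_small_penetrations([[0.0, -0.01, 0.0], [0.0, 0.2, 0.0]],
                                 [0.0, 1.0, 0.0], 0.0)
# the first point is lifted onto the plane, the second is untouched
```

With temporal coherence, the set of candidate points can additionally be warm-started from the previous step.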
Bibliography

[Aguinaga et al. 2010] I. Aguinaga, B. Fierz, J. Spillmann, and M. Harders. Filtering of High Modal Frequencies for Stable Real-time Explicit Integration of Deformable Objects using the Finite Element Method. Progress in Biophysics and Molecular Biology, 103:225–235, 2010.
[Allard et al. 2007] J. Allard, S. Cotin, F. Faure, P.-J. Bensoussan, F. Poyer, C. Duriez, H. Delingette, and L. Grisoni. SOFA - An Open Source Framework for Medical Simulation. Studies in Health Technology and Informatics, 125:13–18, 2007.
[Allard et al. 2011] J. Allard, H. Courtecuisse, and F. Faure. Implicit FEM Solver on GPU for Interactive Deformation Simulation. In GPU Computing Gems Jade Edition. 2011.
[Alliez et al. 2005] P. Alliez, D. Cohen-Steiner, M. Yvinec, and M. Desbrun. Variational Tetrahedral Meshing. ACM Transactions on Graphics, 24:617–625, 2005.
[Askes et al. 2011] H. Askes, D. C. D. Nguyen, and A. Tyas. Increasing the critical time step: micro-inertia, inertia penalties and mass scaling. Computational Mechanics, 47:657–667, 2011.
[Athanasiou and Natoli 2008] K. A. Athanasiou and R. M. Natoli. Introduction to Continuum Biomechanics, volume 3. 2008.
[Barbič and James 2005] J. Barbič and D. L. James. Real-Time Subspace Integration for St. Venant-Kirchhoff Deformable Models. ACM Transactions on Graphics, 24:982–990, 2005.
[Barbič and Zhao 2011] J. Barbič and Y. Zhao. Real-time Large-deformation Substructuring. ACM Transactions on Graphics, 30:91, 2011.
[Basdogan et al. 2007] C. Basdogan, M. Sedef, M. Harders, and S. Wesarg. VR-Based Simulators for Training in Minimally Invasive Surgery. IEEE Computer Graphics and Applications, 27:54–66, 2007.
[Belytschko and Mullen 1978] T. Belytschko and R. Mullen. Stability of Explicit-Implicit Mesh Partitions in Time Integration. International Journal for Numerical Methods in Engineering, 12:1575–1586, 1978.
[Belytschko et al. 1979] T. Belytschko, H. Yen, and R. Mullen. Mixed Methods for Time Integration. Computer Methods in Applied Mechanics and Engineering, 17-18:259–275, 1979.
[Bern et al. 1995] M. Bern, P. Chew, D. Eppstein, and J. Ruppert. Dihedral Bounds for Mesh Generation in High Dimensions. In Proc. Symposium on Discrete Algorithms, pages 189–196, 1995.
[Bielser and Gross 2000] D. Bielser and M. Gross. Interactive Simulation of Surgical Cuts. In Proc. Pacific Conference on Computer Graphics and Applications, pages 116–125, 2000.
[Brent et al. 1982] R. P. Brent, F. T. Luk, and C. Van Loan. Computation of the Singular Value Decomposition Using Mesh-Connected Processors. Technical report, Cornell University, 1982.
[Burkhart et al. 2010] D. Burkhart, B. Hamann, and G. Umlauf. Adaptive and Feature-Preserving Subdivision for High-Quality Tetrahedral Meshes. Computer Graphics Forum, 29:117–127, 2010.
[Chao et al. 2010] I. Chao, U. Pinkall, P. Sanan, and P. Schröder. A Simple Geometric Model for Elastic Deformations. ACM Transactions on Graphics, 29:1–6, 2010.
[Chapman et al. 2007] B. Chapman, G. Jost, and R. van der Pas. Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation). 2007.
[Cheng et al. 2004] S.-W. Cheng, T. K. Dey, E. A. Ramos, and T. Ray. Quality Meshing for Polyhedra with Small Angles. In Proc. Symposium on Computational Geometry, pages 290–299, 2004.
[Comas et al. 2008] O. Comas, Z. A. Taylor, J. Allard, S. Ourselin, S. Cotin, and J. Passenger. Efficient Nonlinear FEM for Soft Tissue Modelling and Its GPU Implementation within the Open Source Framework SOFA. In Proc. International Symposium on Biomedical Simulation, volume 5104, pages 28–39, 2008.
[Courant et al. 1967] R. Courant, K. Friedrichs, and H. Lewy. On the Partial Difference Equations of Mathematical Physics. IBM Journal of Research and Development, 11:215–234, 1967.
[De et al. 2011] S. De, D. Deo, G. Sankaranarayanan, and V. S. Arikatla. A Physics-Driven Neural Networks-Based Simulation System (PhyNNeSS) for Multimodal Interactive Virtual Environments Involving Nonlinear Deformable Objects. Presence: Teleoperators and Virtual Environments, 20:289–308, 2011.
[Debunne et al. 2001] G. Debunne, M. Desbrun, M.-P. Cani, and A. Barr. Dynamic Real-Time Deformations using Space & Time Adaptive Sampling. In Proc. ACM SIGGRAPH, pages 31–36, 2001.
[Desbrun et al. 1999] M. Desbrun, P. Schröder, and A. Barr. Interactive Animation of Structured Deformable Objects. In Proc. Graphics Interface, pages 1–8, 1999.
[Dompierre et al. 1998] J. Dompierre, P. Labbé, F. Guibault, and R. Camarero. Proposal of benchmarks for 3D unstructured tetrahedral mesh optimization. In Proc. International Meshing Roundtable, pages 26–28, 1998.
[Eberhardt et al. 2000] B. Eberhardt, O. Etzmuß, and M. Hauth. Implicit-Explicit Schemes for Fast Animation with Particle Systems. In Eurographics Computer Animation and Simulation Workshop, pages 137–151, 2000.
[Felippa 2009] C. A. Felippa. Appendix I: A Compendium of FEM Integration Formulas. http://www.colorado.edu/engineering/CAS/courses.d/AFEM.d/Home.html, 2009.
[Fierz et al. 2010] B. Fierz, J. Spillmann, and M. Harders. Stable Explicit Integration of Deformable Objects by Filtering High Modal Frequencies. Journal of WSCG, 18:81–88, 2010.
[Fierz et al. 2011] B. Fierz, J. Spillmann, and M. Harders. Element-wise Mixed Implicit-Explicit Integration for Stable Dynamic Simulation of Deformable Objects. In Proc. Symposium on Computer Animation, pages 257–266, 2011.
[Fierz et al. 2012] B. Fierz, J. Spillmann, I. Aguinaga, and M. Harders. Maintaining Large Time Steps in Explicit Finite Element Simulations using Shape Matching. IEEE Transactions on Visualization and Computer Graphics, 18:717–728, 2012.
[Fried 1972a] I. Fried. Bounds on the Extremal Eigenvalues of the Finite Element Stiffness and Mass Matrices and their Spectral Condition Number. Journal of Sound and Vibration, 22:407–418, 1972.
[Fried 1972b] I. Fried. Condition of Finite Element Matrices Generated from Nonuniform Meshes. AIAA Journal, 10:219–221, 1972.
[Galoppo et al. 2007] N. Galoppo, S. Tekin, M. A. Otaduy, M. Gross, and M. C. Lin. Interactive haptic rendering of high-resolution deformable objects. pages 215–233, 2007.
[Goldenthal et al. 2007] R. Goldenthal, D. Harmon, R. Fattal, M. Bercovier, and E. Grinspun. Efficient Simulation of Inextensible Cloth. ACM Transactions on Graphics, 26, 2007.
[Grinspun et al. 2002] E. Grinspun, P. Krysl, and P. Schröder. CHARMS: A Simple Framework for Adaptive Simulation. ACM Transactions on Graphics, 21:281–290, 2002.
[Guennebaud et al. 2010] G. Guennebaud, B. Jacob, and Others. Eigen v3. http://eigen.tuxfamily.org/, 2010.
[Gurfil and Klein 2007] P. Gurfil and I. Klein. Stabilizing the Explicit Euler Integration of Stiff and Undamped Linear Systems. Journal of Guidance, Control, and Dynamics, 30:1659–1667, 2007.
[Gustafson 1988] J. L. Gustafson. Reevaluating Amdahl's Law. Communications of the ACM, 31:532–533, 1988.
[Gutierrez et al. 2011] L. Gutierrez, I. Aguinaga, B. Fierz, F. Ramos, and M. Harders. Pitting a New Hybrid Approach for Maintaining Simulation Stability after Mesh Cutting Against Standard Remeshing Strategies. In Proc. Computer Graphics International, 2011.
[Hairer et al. 2006] E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations. 2006.
[Harders et al. 2003] M. Harders, R. Hutter, A. Rutz, P. Niederer, and G. Székely. Comparing a simplified FEM approach with the mass-spring model for surgery simulation. Studies in Health Technology and Informatics, 94:103–9, 2003.
[Hauser et al. 2003] K. K. Hauser, C. Shen, and J. F. O'Brien. Interactive Deformation Using Modal Analysis with Constraints. In Proc. Graphics Interface, pages 247–256, 2003.
[Hauth et al. 2003] M. Hauth, O. Etzmuß, and W. Straßer. Analysis of Numerical Methods for the Simulation of Deformable Models. The Visual Computer, 19:581–600, 2003.
[Hicklin et al. 2005] J. Hicklin, C. Moler, P. Webb, R. F. Boisvert, B. Miller, R. Pozo, and K. Remington. JAMA: A Java Matrix Package, 2005.
[Hughes 2003] T. J. Hughes. The Finite Element Method. 2003.
[Ibrahimbegovic 2009] A. Ibrahimbegovic. Nonlinear Solid Mechanics: Theoretical Formulations and Finite Element Solution Methods (Solid Mechanics and Its Applications). 2009.
[Irving et al. 2004] G. Irving, J. Teran, and R. Fedkiw. Invertible Finite Elements for Robust Simulation of Large Deformation. In Proc. Symposium on Computer Animation, pages 131–140, 2004.
[Irving et al. 2007] G. Irving, C. Schroeder, and R. Fedkiw. Volume Conserving Finite Element Simulations of Deformable Models. ACM Transactions on Graphics, 26:13, 2007.
[Ito et al. 2004] Y. Ito, A. M. Shih, and B. K. Soni. Reliable Isotropic Tetrahedral Mesh Generation Based on an Advancing Front Method. In Proc. International Meshing Roundtable, pages 95–106, 2004.
[Joldes et al. 2010] G. Joldes, A. Wittek, and K. Miller. Real-Time Nonlinear Finite Element Computations on GPU - Application to Neurosurgical Simulation. Computer Methods in Applied Mechanics and Engineering, 199:3305–3314, 2010.
[Kacic-Alesic et al. 2003] Z. Kacic-Alesic, M. Nordenstam, and D. Bullock. A Practical Dynamics System. pages 7–16, 2003.
[Kidger and Smith 1992] D. Kidger and I. Smith. Eigenvalues of Element Stiffness Matrices Part II: 3-D Solid Elements. Engineering Computations, 9:317–328, 1992.
[Kim and James 2011] T. Kim and D. L. James. Physics-based Character Skinning using Multi-Domain Subspace Deformations. In Proc. Symposium on Computer Animation, pages 63–72, 2011.
[Kim and Pollard 2011] J. Kim and N. S. Pollard. Fast Simulation of Skeleton-driven Deformable Body Characters. ACM Transactions on Graphics, 30:119, 2011.
[Kinsler et al. 1999] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders. Fundamentals of Acoustics. 1999.
[Klein 2007] B. Klein. FEM - Grundlagen und Anwendungen der Finite-Elemente-Methode im Maschinen- und Flugzeugbau. 2007.
[Klingner and Shewchuk 2008] B. M. Klingner and J. R. Shewchuk. Aggressive Tetrahedral Mesh Improvement. In Proc. International Meshing Roundtable, pages 3–23, 2008.
[Klingner et al. 2006] B. M. Klingner, B. E. Feldman, N. Chentanez, and J. F. O'Brien. Fluid Animation with Dynamic Meshes. ACM Transactions on Graphics, 25:820–825, 2006.
[Knupp 2001] P. M. Knupp. Algebraic Mesh Quality Metrics. SIAM Journal on Scientific Computing, 23:193–218, 2001.
[Knupp 2003] P. M. Knupp. Algebraic Mesh Quality Metrics for Unstructured Initial Meshes. Finite Elements in Analysis and Design, 39:217–241, 2003.
[Labelle and Shewchuk 2007] F. Labelle and J. R. Shewchuk. Isosurface Stuffing: Fast Tetrahedral Meshes with Good Dihedral Angles. ACM Transactions on Graphics, 26:57, 2007.
[Liu et al. 2003] A. Liu, F. Tendick, K. Cleary, and C. Kaufmann. A Survey of Surgical Simulation: Applications, Technology, and Education. Presence: Teleoperators and Virtual Environments, 12:599–614, 2003.
[Lloyd et al. 2007] B. Lloyd, G. Székely, and M. Harders. Identification of spring parameters for deformable object simulation. IEEE Transactions on Visualization and Computer Graphics, 13:1081–94, 2007.
[Martin et al. 2011] S. Martin, B. Thomaszewski, E. Grinspun, and M. Gross. Example-based Elastic Materials. ACM Transactions on Graphics, 30:1, 2011.
[McAdams et al. 2011] A. McAdams, Y. Zhu, A. Selle, M. Empey, R. Tamstorf, J. Teran, and E. Sifakis. Efficient Elasticity for Character Skinning with Contact and Collisions. ACM Transactions on Graphics, 30:37, 2011.
[Miller et al. 2006] K. Miller, G. Joldes, D. Lance, and A. Wittek. Total Lagrangian Explicit Dynamics Finite Element Algorithm for Computing Soft Tissue Deformation. Communications in Numerical Methods in Engineering, 23:121–134, 2006.
[Müller and Gross 2004] M. Müller and M. Gross. Interactive Virtual Materials. In Proc. Graphics Interface, pages 239–246, 2004.
[Müller et al. 2002] M. Müller, J. Dorsey, L. McMillan, R. Jagnow, and B. Cutler. Stable Real-time Deformations. In Proc. Symposium on Computer Animation, page 49, 2002.
[Müller et al. 2005] M. Müller, B. Heidelberger, M. Teschner, and M. Gross. Meshless Deformations Based on Shape Matching. ACM Transactions on Graphics, 24:471–478, 2005.
[Müller et al. 2007] M. Müller, B. Heidelberger, M. Hennix, and J. Ratcliff. Position Based Dynamics. Journal of Visual Communication and Image Representation, 18:109–118, 2007.
[Nealen et al. 2006] A. Nealen, M. Müller, R. Keiser, E. Boxerman, and M. Carlson. Physically Based Deformable Models in Computer Graphics. Computer Graphics Forum, 25:809–836, 2006.
[NVIDIA 2011] NVIDIA. NVIDIA CUDA C Programming Guide, 2011.
[Otaduy and Lin 2006] M. A. Otaduy and M. C. Lin. High Fidelity Haptic Rendering. 2006.
[Pentland and Williams 1989] A. Pentland and J. Williams. Good Vibrations: Modal Dynamics for Graphics and Animation. ACM SIGGRAPH Computer Graphics, 23:207–214, 1989.
[Picinbono et al. 2003] G. Picinbono, H. Delingette, and N. Ayache. Non-linear anisotropic elasticity for real-time surgery simulation. Graphical Models, 65:305–321, 2003.
[Rivers and James 2007] A. R. Rivers and D. L. James. FastLSM: Fast Lattice Shape Matching for Robust Real-Time Deformation. ACM Transactions on Graphics, 26:82, 2007.
[Rodriguez-Navarro and Susin 2006] J. Rodriguez-Navarro and A. Susin. Non Structured Meshes for Cloth GPU Simulation using FEM. In Workshop in Virtual Reality, Interactions, and Physical Simulations, pages 1–7, 2006.
[Schmedding and Teschner 2008] R. Schmedding and M. Teschner. Inversion Handling for Stable Deformable Modeling. The Visual Computer, 24:625–633, 2008.
[Schmedding et al. 2009] R. Schmedding, M. Gissler, and M. Teschner. Optimized Damping for Dynamic Simulations. In Proc. Spring Conference on Computer Graphics, page 189, 2009.
[Schöberl 1997] J. Schöberl. NETGEN - An Advancing Front 2D/3D-Mesh Generator Based on Abstract Rules. Computing and Visualization in Science, 1:41–52, 1997.
[Schwarz and Köckler 2006] H.-R. Schwarz and N. Köckler. Numerische Mathematik. 2006.
[Seiler et al. 2012] M. Seiler, J. Spillmann, and M. Harders. Enriching Coarse Interactive Elastic Objects with High-Resolution Data-Driven Deformations. In Proc. Symposium on Computer Animation, pages 9–17, 2012.
[Serby et al. 2001] D. Serby, M. Harders, G. Székely, W. Niessen, and M. Viergever. A New Approach to Cutting into Finite Element Models. Medical Image Computing and Computer-Assisted Intervention, 2208:425–433, 2001.
[Shannon 1984] C. Shannon. Communication in the Presence of Noise. Proceedings of the IEEE, 72:1192–1201, 1984.
[Shewchuk 1994] J. R. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. Technical report, 1994.
[Shewchuk 1998] J. R. Shewchuk. Tetrahedral Mesh Generation by Delaunay Refinement. In Proc. Symposium on Computational Geometry, pages 86–95, 1998.
[Shewchuk 2002] J. R. Shewchuk. What is a Good Linear Element? Interpolation, Conditioning, and Quality Measures. Technical report, 2002.
[Shinya 2004] M. Shinya. Stabilizing Explicit Methods in Spring-Mass Simulation. In Proc. Computer Graphics International, pages 528–531, 2004.
[Si 2009] H. Si. TetGen: A Quality Tetrahedral Mesh Generator and Three-Dimensional Delaunay Triangulator, 2009.
[Steinemann et al. 2006] D. Steinemann, M. Harders, M. Gross, and G. Székely. Hybrid Cutting of Deformable Solids. In Proc. Virtual Reality, pages 35–42, 2006.
[Stomakhin et al. 2012] A. Stomakhin, R. Howes, C. Schroeder, and J. M. Teran. Energetically Consistent Invertible Elasticity. In Proc. Symposium on Computer Animation, pages 25–32, 2012.
[Taylor et al. 2008] Z. A. Taylor, M. Cheng, and S. Ourselin. High-speed Nonlinear Finite Element Analysis for Surgical Simulation using Graphics Processing Units. IEEE Transactions on Medical Imaging, 27:650–63, 2008.
[Teran et al. 2005] J. Teran, E. Sifakis, G. Irving, and R. Fedkiw. Robust Quasistatic Finite Elements and Flesh Simulation. In Proc. Symposium on Computer Animation, pages 181–190, 2005.
[Teschner et al. 2004] M. Teschner, B. Heidelberger, M. Müller, and M. Gross. A Versatile and Robust Model for Geometrically Complex Deformable Solids. In Proc. Computer Graphics International, pages 312–319, 2004.
[Thomaszewski et al. 2009] B. Thomaszewski, S. Pabst, and W. Straßer. Continuum-based Strain Limiting. Computer Graphics Forum, 28:569–576, 2009.
[Wathen 1987] A. J. Wathen. Realistic Eigenvalue Bounds for the Galerkin Mass Matrix. IMA Journal of Numerical Analysis, 7:449–457, 1987.
[Wojtan and Turk 2008] C. Wojtan and G. Turk. Fast Viscoelastic Behavior with Thin Features. ACM Transactions on Graphics, 27:47, 2008.
