Bjørn-Ove Heimsund
Department of Mathematics
University of Bergen
March 2005
Preface
The work presented herein constitutes my thesis for the partial fulfilment of
the requirements for the degree of Doctor Scientiarum at the University of
Bergen. The studies started in January 2002 and were conducted at both the
Department of Mathematics and the Centre for Integrated Petroleum Research.
Professor Magne S. Espedal has been the principal supervisor of my thesis,
and Professor Kenneth H. Karlsen has been the co-supervisor. Funding has
been provided by the Norwegian Research Council (NFR) within the programme Computational Mathematics in Applications (BeMatA) under the project
Robust Numerical Methods for Partial Differential Equations. Some additional
funding has been provided by the Centre for Integrated Petroleum Research.
The overall goal of the BeMatA project is to develop and analyse mathematical
models, numerical techniques, and method oriented software for solving problems
within science and technology. Under that umbrella, I have worked on a project to
develop robust numerical methods for reservoir fluid flow problems, with a focus
on emerging techniques for compositional fluids.
A model for reservoir fluid flow, called a reservoir simulator, provides a tool
for investigating the flow of fluids in the subsurface. In the oil industry, a reservoir simulator can be used to decide the optimal operating policy, or it can be
used to forecast future production. But such a tool can also be used within environmental studies, for instance by predicting the impact of contaminants, performing
groundwater management, or remediating polluted soil.
Reservoir simulation incorporates elements from all main branches of science:
physics, chemistry, geology, biology and mathematics. In this work, I have only
considered certain mathematical aspects which may be of interest to other mathematicians in the field of reservoir flows. The reader will probably notice some
discrepancies between the introductory part of the thesis and the attached papers
(apart from differences in notation). This is because I have found better and clearer
ways, and I wish to present these as well.
Outline of thesis
The thesis consists of three parts. In the first part an overview of the field
of reservoir flow modelling is provided, with emphasis on compositional fluids
and numerical methods. The second part deals with selected aspects of practical
simulation, e.g., software design and fast solvers. The last part is a collection of
scientific papers which cover several facets of reservoir flows, both mathematical
and numerical.
Part I: Elementary reservoir flow modelling
Chapter 1 provides a broad overview of reservoir flow modelling, with an emphasis on oil recovery. The basics of mathematical modelling with conservation
laws are introduced.
Then chapter 2 discusses the fundamental mathematical model for the flow of
fluids in a porous medium (permeability, pressure and Darcy's law). Classical one- and two-phase flow equations are derived.
These simple models are extended to the flow of multiple chemical species in
chapter 3. A large part of that chapter is devoted to the equilibrium calculations
for determining phase properties from pressure, temperature and composition. A
modern formulation of the classical black-oil model closes the chapter.
Numerical methods for these flow equations are supplied in chapter 4. We
distinguish between two classes: elliptic/parabolic and hyperbolic/parabolic, and
present numerical schemes for both classes.
Part II: Large scale simulation
Since the numerical solution of our flow models can be complicated, chapter 5
provides the outline of a flexible reservoir simulator software design. It is kept
general so that it may serve as a starting point for the development of specific
simulators.
To ensure high performance, parallel computers must often be used. To this
end, chapter 6 discusses domain decomposition parallelisation with an emphasis
on general, black-box type methods. In addition to the classical one-level Schwarz
Part III: Papers
Seven papers are included at the end of the thesis. Their content is summarised in
chapter 8, and chapter 9 outlines possible avenues for future research work.
Paper A: A Parallel Eulerian-Lagrangian Localized Adjoint Method.
Accepted for publication in Advances in Water Resources.
Paper B: Adjoint methods are particle methods.
Draft manuscript.
Paper C: Multiscale Discontinuous Galerkin Methods for Elliptic Problems
with Multiple Scales.
Accepted for publication in Lecture Notes in Computational Science and
Engineering.
Paper D: Level set methods for a parameter identification problem.
Published in the book Analysis and optimization of differential systems,
pages 189–200, by Kluwer Academic Publishers in 2003. Volume 121 of the
International Federation for Information Processing (IFIP) series.
Paper E: On a class of ocean model instabilities that may occur when
applying small time steps, implicit methods, and low viscosities.
Published in Ocean Modelling, volume 7 (2004), pages 135–144.
Paper F: A two-mesh superconvergence method with applications for
adaptivity.
Accepted for publication in AMS Contemporary Mathematics.
Paper G: High performance numerical libraries in Java.
Draft manuscript.
Acknowledgements
I would first like to thank my supervisors Magne S. Espedal and Kenneth H.
Karlsen for (perhaps unintentionally) giving me considerable leeway in how to
approach and conduct the research. As a result, I have been able to work on a
diverse selection of topics during my thesis.
I gratefully acknowledge both the financial support provided by the Norwegian
Research Council under project number 135420 (Robust Numerical Methods for
Partial Differential Equations), and the additional support furnished by the Centre
for Integrated Petroleum Research.
Rainer Helmig was my host during a three-month research visit to the Institut
für Wasserbau, Universität Stuttgart in the spring of 2004. Thank you for kindling
my interest in mathematical modelling of reservoir flows, and for making my stay
a pleasant and useful one.
Thanks to Helge K. Dahle and Ivar Aavatsmark for many a useful conversation, and to Gunnar E. Fladmark and Erlend Øian for the cooperation we had (and
have) on compositional reservoir flow simulation. Many have provided feedback
on this work, including Geir Terje Eigestad and Raymond Martinsen, and with
such help, any remaining faults are entirely my own.
Finally, I would like to thank my family for their support over the years. I
believe a creative work is as much a product of oneself as of one's closest.
Bjørn-Ove Heimsund
Bergen, March 2005.
Contents
I Elementary reservoir flow modelling

4 Numerical methods
    4.1 Preliminaries
        4.1.1 Solving the discretised equations
    4.2 Solving elliptic/parabolic problems
        4.2.1 Integral formulation: Finite differences
        4.2.2 Variational formulation: Finite elements
        4.2.3 Saddle point formulation: Mixed finite elements
        4.2.4 Comparisons
    4.3 Solving hyperbolic/parabolic problems
        4.3.1 Hyperbolic formulations
        4.3.2 Eulerian formulation: Finite volumes
        4.3.3 Lagrangian formulation: Characteristic methods

II Large scale simulation

7 Parameter estimation
    7.1 Parameter representation
        7.1.1 Coarse grid representation
        7.1.2 Level set representation
    7.2 Parameter identification
        7.2.1 Regularisation
        7.2.2 Solving the optimisation problem

9 Further work
    9.1 Numerical methods
    9.2 Large scale simulations

III Papers
Part I
Elementary reservoir flow modelling
Chapter 1
Reservoir flow modelling overview
A subsurface reservoir is a trap in which fluids such as oil, gas and water have
accumulated over millions of years by migration from source rocks. The reservoir
rock is typically sedimentary in nature and consists of an interconnected porous
network where fluids may flow, subject to forces such as fluid pressure, capillarity
and gravity.
To recover the oil and gas, wells are drilled into the reservoir, some of which
produce oil and others which inject water and gas to provide pressure support.
Since it is costly to drill and operate wells, it is desirable to optimize their number,
placement and operation in the reservoir. For this to be done, a good understanding
of the fluid dynamics in the reservoir is necessary.
In this chapter we provide a brief overview of the main oil/gas recovery mechanisms and where mathematical flow modelling is useful. Later chapters will then
provide the actual flow models for the types of problems discussed herein, and
supplement these with suitable numerical methods. See [237, 207, 123, 112] for
more in-depth information.
1.1
Oils and gases have complex origins, both inorganic and organic. Chemical reactions between carbon-containing substances and water can form hydrocarbons,
primarily methane. But the amount of hydrocarbon generated is usually small
compared to the organic generation. The organic origin is from continuous sedimentation of dead plants and animals. At shallow subsurface conditions, bacterial
decay produces methane, water and carbon dioxide, and is primarily a reduction
of the relative oxygen content, leaving hydrocarbon rich kerogen substances. This
process is termed diagenesis.
Kerogen is a family of complex cyclic hydrocarbon molecules bonded with
Figure 1.1: Hydrocarbons are formed from organic matter in source rock which
has been subjected to high pressure and temperature. As the resulting oil and gas
phases are lighter than water, they are driven through the carrier bed and trapped
in the reservoir. A typical recovery strategy is to inject water below the oil/gas cap
to provide continual pressure support so that the production well can recover as
much hydrocarbon as possible. The horizontal scale can be as large as 100 km,
while the vertical scale can be as small as a few hundred meters.
oxygen, nitrogen and sulfur. Deeper in the subsurface, at temperatures
between 60 °C and 120 °C, kerogen cracks into heavier hydrocarbon components
which form the oil phase, while gas is formed deeper, at temperatures between
120 °C and 225 °C. This process, catagenesis, is believed to be the largest source of
hydrocarbons.
At greater depths, and temperatures exceeding 225 °C, the kerogen has expelled
most of the hydrocarbons, and the remainder cracks into methane until no hydrogen is left. The remaining carbon forms graphite. This final stage is called
metagenesis.
The generation of hydrocarbons takes place in the source rocks, which are
usually buried far below the surface, see figure 1.1. Primary migration is the
movement of hydrocarbons from the source rocks into the permeable carrier beds,
but the exact nature of this migration is still not fully understood. Once in the
carrier beds, oil and gas will migrate through a water saturated medium by a process termed secondary migration. As oil and gas are less dense than water,
gravity will drive them above the water and upwards in the reservoir by buoyancy, but during the migration the reservoir is undergoing gradual
compaction which may either limit the upward motion, or force the hydrocarbons
downwards. In addition, capillary and hydrodynamic forces can act against or
the coarse pores, and oil/gas in the finer pores is not pushed towards the production
well.
The hydrocarbon fluids can also be rather immobile, either because of large
viscosity in heavier hydrocarbon mixtures such as asphaltenes, or because of capillary forces binding fluids to the host rock. As the relative saturation of the oil
decreases, the capillary effects can become stronger and trap the oil in place.
Tertiary production is a set of methods designed to address these issues. Some
of the well-known techniques are:
Polymer injection We may add polymer substances to injected water. The polymers increase the viscosity of the water and can also block larger pores.
Consequently, the injected water must choose a different path, and can then
displace oil trapped in smaller pores.
Miscible displacement Gaseous fluids consisting of components such as carbon
dioxide, nitrogen and methane can mix with the oil and form a single phase.
This single phase flow regime between the oil and the gas phase has reduced
interfacial tensions and can result in a more effective displacement.
Surfactant flooding Surfactants are complex chemical compounds designed to
reduce the interfacial tensions between oil and water, resulting in higher
total phase mobility.
Thermal flooding The viscosity of oil can be reduced by heating. Injection of
hot water or steam is a common method. Other methods include igniting
the most viscous of the hydrocarbon banks and carefully controlling the
oxygen supply.
These methods tend to be expensive and their precise effect on the reservoir flow
performance can be hard to ascertain a priori. Consequently, tertiary production
methods induce a fair amount of risk, but still methods such as reinjection of gas
(for miscible displacement) and steam injection are becoming more mainstream.
Figure 1.2: A control volume $\Omega$ with surface $\Gamma$. Every point on the surface has an
outward normal vector $\vec{n}$.
Using simple mathematical models with analytical solutions, engineers can
provide basic performance predictions. However, for the more advanced models
analytical answers may not be available. Instead, numerical methods for simulating the models have become popular, especially with the advent of fast computers.
But there are still trade-offs. In particular, the more detailed the mathematical model, the slower the computer can calculate a solution. Therefore we must
seek fast numerical schemes combined with suitable mathematical models for the
reservoir. And even the geological description of the reservoir may be riddled
with uncertainty, leading us to perform multiple realizations and model calibration based on field production data.
Q is here positive for a source, and negative for a sink. Using the Gauss divergence
theorem, we may convert the boundary flux integral into a volume integral:

$$\int_\Omega \frac{\partial u}{\partial t} \, dV + \int_\Omega \nabla \cdot \vec{F} \, dV = \int_\Omega Q \, dV, \qquad (1.2)$$

which may be collected into

$$\int_\Omega \left( \frac{\partial u}{\partial t} + \nabla \cdot \vec{F} - Q \right) dV = 0. \qquad (1.3)$$

Since this must hold for any size of $\Omega$, we may drop the integrals to obtain the
conservation law in differential form:

$$\frac{\partial u}{\partial t} + \nabla \cdot \vec{F} = Q. \qquad (1.4)$$
In reservoir flow modelling the conserved quantity can be the mass of a phase
(water, oil, gas), the mass of a molecular substance, or the thermal energy. We
shall in the following chapters give expressions for $u$ and its flux $\vec{F}$, and then
provide methods which solve equation (1.4) accurately and efficiently. A topic
we will not discuss in any detail is the initial generation of a simulation grid (i.e.,
the set of connected control volumes). To accurately capture important geological
features, these grids may be very large, and the need for speed in a numerical
solution scheme is obvious. An actual simulation grid is shown in figure 1.3.
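Finite-volume grids like the one in figure 1.3 discretise the conservation law (1.4) cell by cell, and mass conservation is built into the scheme itself. The following is a minimal sketch (not code from the thesis) of a first-order upwind finite-volume update for the 1D linear flux $F = vu$ on a periodic grid; all parameter values are arbitrary. Because every interface flux leaves one cell and enters its neighbour, the total of $u$ is conserved to round-off.

```python
import numpy as np

# Minimal sketch of a finite-volume discretisation of (1.4),
# du/dt + dF/dx = Q, with F = v*u on a periodic 1D grid.
n = 50
dx, dt, v = 1.0 / n, 0.005, 1.0
x = (np.arange(n) + 0.5) * dx
u = np.exp(-100.0 * (x - 0.5) ** 2)   # initial conserved quantity
Q = np.zeros(n)                        # no sources or sinks

total0 = u.sum() * dx
for _ in range(100):
    F = v * u                          # upwind flux at the right face of each cell
    u = u - dt / dx * (F - np.roll(F, 1)) + dt * Q

assert abs(u.sum() * dx - total0) < 1e-12   # total amount of u is unchanged
```

The same cancellation of interior fluxes is what makes finite-volume methods attractive for the flow equations of the following chapters.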
Figure 1.3: A simulation grid for the PUNQ (Production forecasting with Uncertainty Quantification) model by Elf Exploration, viewed from the top. Oil and gas
will be trapped in the dome in the center of the grid, with water around. In total
there are 1761 control volumes (cells) in five layers, and each cell has a width and
length of 180 m, and a height of around 5 m.
Chapter 2
Immiscible reservoir flows
For immiscible fluids the phases do not mix or exchange mass between each other.
This constraint is often quite reasonable when considering oil and water flows at
constant temperature, but with gases or at high temperatures mixing does occur.
However, the equations derived in this chapter will be extended in the next chapter
for miscible non-isothermal reservoir flows.
In this chapter we will briefly cover the primary physical and geological parameters influencing fluid flows in porous media. Then we introduce Darcy's law,
which is an empirical relation giving the effective fluid velocity. With this we
provide mathematical models for the immiscible flows of one, two and three
phases along with some common simplifications. Further information can be
found in the books [139, 215, 166, 220, 15, 23, 88, 61].
2.1
The flow of reservoir fluids in porous media can be described at several different
scales, from the smallest molecular scale up to an averaged continuum scale. For
the purpose of large-scale numerical simulations, a molecular description is too
demanding due to the large number of molecules in even the smallest amount of
fluids (a gram of carbon contains over $10^{22}$ atoms).
A continuum scale description of fluids averages the molar masses so that
we get a continuous body of fluid, and its motion is governed by forces acting
between different fluid bodies and the reservoir medium itself. This is all well
described by the Navier-Stokes equations, which are much used for surface and
atmospheric flows. However, when we are dealing with the subsurface we do not
know the actual geometry of the problem as measuring equipment for determining
the pore-networks are still not comprehensive, and even if they were, the geometry
of the pore networks is too complex for mathematical/numerical modelling.
Figure 2.1: Volume averaged porosity as a function of the averaging volume size.
Consequently, a reservoir level continuum model averages not just the fluids,
but the whole control volume state (geometry and fluids). Therefore we describe
volume-average properties such as phase saturations, permeabilities, porosities,
etc, which are more amenable to measurements and experiments. The size V
of the averaging volume (also called the representative elementary volume, REV)
must be chosen such that it is larger than the microscopic length scale $V_m$, while smaller
than the large scale variations $V_M$:

$$V_m \ll V \ll V_M. \qquad (2.1)$$
This ensures that the significant microscale variations are smoothed out, and also
ensures that the large scale heterogeneities are preserved, see figure 2.1.
A reservoir model is built from multiple control volumes V , and the larger
their size, the smaller the number of them. In upscaling, the size V is taken to
be of order O(VM ), and special techniques must be taken so that the subscale
information is not lost. This is itself a large topic [226], and we shall briefly
consider it later when discussing numerical solution methods.
The porosity $\phi$ is defined as the fraction of the control volume occupied by the
pore volume $V_p$:

$$\phi = \frac{V_p}{V}. \qquad (2.2)$$
Figure 2.2: Three phase system diagram. If none of the phase saturations are at
a residual level, we have a full three phase case, but if one phase saturation is
residual it reduces into a two-phase problem. It may also further reduce into a
single phase system (the corners of the triangle).
In the pore space, the different fluid phases such as oil, water and gas exist, each
with their own volume. The phase saturations are
$$S_\alpha = \frac{V_\alpha}{V_p}, \qquad \alpha = w, o, g, \qquad (2.3)$$
where the subscripts denote the phases water (w), oil (o) and gas (g), respectively.
Since the pores are always fully saturated, we must have
$$\sum_\alpha S_\alpha = 1. \qquad (2.4)$$
2.2.1 Permeability
The hydraulic conductivity tensor $\lambda_\alpha$ [m²/(Pa s)] describes the influence of the fluid
and rock properties on the volumetric flow density (flow velocity), and is given as

$$\lambda_\alpha = \frac{k_{r\alpha}}{\mu_\alpha} K. \qquad (2.5)$$

Here, $\mu_\alpha$ [Pa s] is the phase viscosity, and $K$ [m²] represents the intrinsic or absolute permeability of the given porous medium. $K$ is typically a spatially dependent
tensor as the pores have a preferred flow direction and may be (nearly) sealed in
other directions. By Onsager's principle [91], this tensor must always be symmetric and positive definite to ensure a physically consistent conductivity.
The quantity $k_{r\alpha} \in [0, 1]$ is the relative permeability of phase $\alpha$, which depends
on all the phase saturations. If only a single phase is present, its relative permeability is 1, but when multiple phases flow in a pore, they interfere with each
other and their relative permeabilities decrease. Since $k_{r\alpha} \to 0$ faster than $S_\alpha \to 0$,
phase $\alpha$ gets a zero mobility at a residual saturation $S_{r\alpha} > 0$. Figure 2.3 illustrates
this by showing that phases with a too low saturation are immobilised.
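As a small numerical sketch of equation (2.5), the snippet below forms $\lambda_\alpha = (k_{r\alpha}/\mu_\alpha) K$ and checks the symmetry and positive definiteness that Onsager's principle requires of $K$. The tensor entries and scalar values are purely illustrative, not data from the thesis.

```python
import numpy as np

# Sketch of the phase conductivity (2.5): lambda = (kr/mu) * K.
# All numbers below are illustrative only.
K = np.array([[1.0e-12, 2.0e-13],
              [2.0e-13, 5.0e-13]])     # intrinsic permeability tensor [m^2]
kr = 0.4                                # relative permeability [-]
mu = 1.0e-3                             # phase viscosity [Pa s]

lam = (kr / mu) * K                     # hydraulic conductivity [m^2/(Pa s)]

# Physical-consistency requirements on K (and hence on lambda):
assert np.allclose(K, K.T)              # symmetric
assert np.all(np.linalg.eigvalsh(K) > 0.0)  # positive definite
```

Since $k_{r\alpha}/\mu_\alpha$ is a positive scalar, $\lambda_\alpha$ inherits symmetry and positive definiteness directly from $K$.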
Single-phase experiments can provide the fluid viscosity $\mu_\alpha$, see [227, Chapter
9] and [218, Chapters 3 and 11]. Once the viscosity is known, we can measure the
permeability of a rock species by single-phase flooding experiments. But for the
relative permeabilities we must resort to multiphase experiments by considering
pairs of phases. For each pair of phases, one phase will wet the rock more than the
other phase, and that phase will be referred to as the wetting phase ($\alpha = w$). The
other phase is then the non-wetting phase ($\alpha = n$). Normally, water is the wetting
phase in a water-oil system, and oil is the wetting phase in an oil-gas system. For a
two-phase system, we can eliminate the non-wetting phase saturation by
$$S_w + S_n = 1 \quad \Rightarrow \quad S_n = 1 - S_w. \qquad (2.6)$$
For three-phase relative permeability functions of oil and gas it has been common to
use the relationships provided by Stone [249, 250]. He built relative permeability
functions by a scaled product of the two-phase functions $k_r^{ow}$ and $k_r^{go}$. See the
above references for details.
Figure 2.3: Some idealised relative permeability and capillary pressure curves.
The blue curves are for the Brooks-Corey model, while the red ones illustrate
the Van Genuchten model, both with some typical parameters. For the relative
permeability, the dotted curves are for the non-wetting phase, while the solid are
for the wetting phase.
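Curves like the Brooks-Corey ones in figure 2.3 can be generated from the standard Brooks-Corey relations; the residual saturations, pore-size index and entry pressure in the sketch below are illustrative choices, not the parameters used in the thesis.

```python
import numpy as np

# Standard Brooks-Corey relative permeability and capillary pressure relations
# (cf. figure 2.3); all parameter values are illustrative.
def brooks_corey(Sw, Swr=0.1, Snr=0.1, lam=2.0, pd=1.0e4):
    Se = (Sw - Swr) / (1.0 - Swr - Snr)          # effective saturation
    krw = Se ** ((2.0 + 3.0 * lam) / lam)        # wetting phase
    krn = (1.0 - Se) ** 2 * (1.0 - Se ** ((2.0 + lam) / lam))  # non-wetting
    pc = pd * Se ** (-1.0 / lam)                 # capillary pressure [Pa]
    return krw, krn, pc

Sw = np.linspace(0.15, 0.85, 50)
krw, krn, pc = brooks_corey(Sw)

# Wetting mobility vanishes as Sw approaches its residual value (section 2.2.1),
# while the capillary pressure grows.
assert np.all(np.diff(krw) > 0.0) and np.all(np.diff(pc) < 0.0)
```

Here `lam` is the pore-size distribution index and `pd` the entry pressure of the Brooks-Corey model.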
2.2.2 Pressure
The phase pressure $p_\alpha$ [Pa] can be related to the phase density $\rho_\alpha$ [kg/m³] through
the compressibility $c_\alpha$ [1/Pa]:

$$\frac{1}{\rho_\alpha} \frac{\partial \rho_\alpha}{\partial p_\alpha} = c_\alpha. \qquad (2.10)$$

Ideal fluids have a constant compressibility, so equation (2.10) may be integrated
to obtain

$$\ln \frac{\rho_\alpha}{\rho_\alpha^0} = c_\alpha \left( p_\alpha - p_\alpha^0 \right), \qquad \rho_\alpha(p_\alpha^0) = \rho_\alpha^0, \qquad (2.11)$$

$$\rho_\alpha = \rho_\alpha^0 \exp \left( c_\alpha \left( p_\alpha - p_\alpha^0 \right) \right). \qquad (2.12)$$

A special case is that of incompressible fluids with $c_\alpha = 0$. Then equation (2.12)
becomes

$$\rho_\alpha = \rho_\alpha^0. \qquad (2.13)$$
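Equation (2.12) is straightforward to evaluate directly; a short sketch with an illustrative water-like compressibility (the numbers are not from the thesis):

```python
import math

# Equation (2.12): rho = rho0 * exp(c * (p - p0)) for a constant
# compressibility c; parameter values are illustrative only.
def density(p, rho0=1000.0, c=4.5e-10, p0=1.0e5):
    return rho0 * math.exp(c * (p - p0))

assert density(1.0e5) == 1000.0         # reference state returns rho0
assert density(5.0e7, c=0.0) == 1000.0  # c = 0 recovers the incompressible (2.13)
assert density(5.0e7) > 1000.0          # compression increases the density
```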
Real fluids have more complicated pressure/density relationships given by equations of state. We shall return to this in Chapter 3.
Due to interfacial tensions between the different fluids and the rock matrix,
the pressure in the wetting phase is smaller than the pressure in the non-wetting
phase. This difference is the capillary pressure, and it is always at least a function
of saturation:
$$p_c(S_w) = p_n - p_w. \qquad (2.14)$$
Likewise, a capillary pressure function is needed to express the third pressure for
2.2.3 Hysteresis
The capillary pressure and relative permeability relationships have so far been
assumed to only depend on the saturations, but experiments show that they are
also dependent on the flow process itself. A two-phase problem may then have
distinct capillary pressure / relative permeability curves when water is displaced
instead of oil. This is referred to as hysteresis, and makes these parameters history
dependent.
When a process is reversed, we have to switch between different curves. Since
the process is smooth, intermediary scanning curves connecting the two curves are
introduced, see figure 2.4. Nested scanning curves may have to be introduced if
there are multiple reversions in a short time span, but we will not provide details
on their construction. See instead [242, 164, 103].
More physically correct methods treat the capillary pressure/saturation relationship as directly flow dependent, typically [135]

$$p_c(S_w) = p_n - p_w - \tau \frac{\partial S_w}{\partial t}, \qquad \tau > 0, \qquad (2.17)$$

where $\tau$ is an empirical parameter that may be dependent on $S_w$. During drainage
processes, the change in $S_w$ will be positive and thus the pressure in the non-wetting phase will be relatively larger than the wetting phase pressure when compared to the $\tau = 0$ case. This may aid in the oil displacement process.
Figure 2.4: Capillary pressure hysteresis for a two-phase problem. The blue curve
is used for drainage (reduction of oil saturation) and the red for imbibition (increase of oil saturation). If the process changes from imbibition to drainage at
A, a scanning curve is followed instead of the primary drainage curve. The same
argument applies at B if the process changes from drainage to imbibition.
The opposite effect occurs if there is an imbibition process (decrease of Sw ).
Then pn becomes relatively smaller and the wetting phase is more rapidly displaced where there are large changes in Sw .
While the hysteretic and dynamic effects in capillary pressure and relative permeability can be significant in some complex flow processes, we shall not discuss
them further in our mathematical models. However, the modification necessary
to incorporate these effects into a numerical model is usually minor, albeit it may
make the model harder to solve.
2.3
Mathematical models
The fundamental law of fluid flow in a porous medium is Darcy's law. It gives the
effective flow velocity across a representative elementary volume and thus does
not describe the intrinsic particle velocity, which is typically larger. In differential
form, the law states that the phase volumetric flow density $\vec{u}_\alpha$ (flow velocity) is
given by

$$\vec{u}_\alpha = -\lambda_\alpha \nabla p_\alpha. \qquad (2.18)$$

The equation states that phase $\alpha$ will move from regions of high pressure to regions of
lower pressure, and the velocity is dependent on the medium and phase conductivities. However, since $\lambda_\alpha$ is a tensor, the actual direction of the flow may not
For each phase we may have distinct sources, so $Q_\alpha = q_\alpha$ [kg/(s m³)]. Inserting these
back into the conservation law (2.20) gives

$$\frac{\partial (\phi \rho_\alpha S_\alpha)}{\partial t} - \nabla \cdot \left( \rho_\alpha \lambda_\alpha \left( \nabla p_\alpha + \rho_\alpha g \nabla h \right) \right) = q_\alpha. \qquad (2.23)$$
Incompressible fluid
Compressible fluid
For a fluid with constant compressibility, the pressure gradient may be expressed
through the density gradient by equation (2.10):

$$\nabla p = \frac{1}{c \rho} \nabla \rho. \qquad (2.31)$$

This is then inserted into equation (2.24) to eliminate the pressure, and we find
the following equation which can be solved for the density:

$$\frac{\partial (\phi \rho)}{\partial t} - \nabla \cdot \left( \frac{\lambda}{c} \nabla \rho + \rho^2 \lambda g \nabla h \right) = q. \qquad (2.32)$$

This may be significantly simplified by assuming horizontal flow ($\nabla h = \vec{0}$) and
time-constant porosity:

$$\phi \frac{\partial \rho}{\partial t} - \nabla \cdot \left( \frac{\lambda}{c} \nabla \rho \right) = q. \qquad (2.33)$$

This is a standard parabolic type of equation, and is similar to the heat equation
[111, Chapter 2].
For two-phase flow, the conservation law (2.23) gives for the non-wetting phase

$$\frac{\partial (\phi \rho_n S_n)}{\partial t} - \nabla \cdot \left( \rho_n \lambda_n \left( \nabla p_n + \rho_n g \nabla h \right) \right) = q_n. \qquad (2.34)$$

The system is closed by the constitutive relationships

$$S_n = 1 - S_w, \qquad (2.36)$$

$$p_n = p_w + p_c(S_w), \qquad (2.37)$$
and some unspecified equation of state is used for the densities. In the following
we shall first find an equation for the wetting phase pressure pw . Then a saturation
equation for Sw will be provided. By our constitutive relationships Sn and pn are
then automatically found.
Pressure equation
Dividing the conservation law (2.23) for phase $\alpha$ by $\rho_\alpha$ and expanding the time
derivative gives

$$\phi \frac{\partial S_\alpha}{\partial t} + S_\alpha \frac{\partial \phi}{\partial t} + \frac{\phi S_\alpha}{\rho_\alpha} \frac{\partial \rho_\alpha}{\partial t} + \frac{1}{\rho_\alpha} \nabla \cdot (\rho_\alpha \vec{u}_\alpha) = \frac{q_\alpha}{\rho_\alpha}. \qquad (2.38)$$

Summing over the two phases then yields

$$\frac{\partial \phi}{\partial t} + \frac{\phi S_w}{\rho_w} \frac{\partial \rho_w}{\partial t} + \frac{\phi S_n}{\rho_n} \frac{\partial \rho_n}{\partial t} + \frac{1}{\rho_w} \nabla \cdot (\rho_w \vec{u}_w) + \frac{1}{\rho_n} \nabla \cdot (\rho_n \vec{u}_n) = \frac{q_w}{\rho_w} + \frac{q_n}{\rho_n}. \qquad (2.40)$$

Notice that the time-derivatives of the saturations have cancelled out. The porosity
and density derivatives are transformed into pressure derivatives by applying the
chain rule:

$$\frac{\partial \phi}{\partial t} = \frac{\partial \phi}{\partial p_w} \frac{\partial p_w}{\partial t}, \qquad (2.41)$$

$$\frac{\partial \rho_w}{\partial t} = \frac{\partial \rho_w}{\partial p_w} \frac{\partial p_w}{\partial t}, \qquad (2.42)$$

$$\frac{\partial \rho_n}{\partial t} = \frac{\partial \rho_n}{\partial p_n} \frac{\partial p_n}{\partial t} = \frac{\partial \rho_n}{\partial p_n} \left( \frac{\partial p_w}{\partial t} + \frac{\partial p_c}{\partial t} \right). \qquad (2.43)$$
Inserting these relations into equation (2.40) gives the pressure equation

$$\phi c_t \frac{\partial p_w}{\partial t} + \phi S_n c_n \frac{\partial p_c}{\partial t} + \frac{1}{\rho_w} \nabla \cdot (\rho_w \vec{u}_w) + \frac{1}{\rho_n} \nabla \cdot (\rho_n \vec{u}_n) = \frac{q_w}{\rho_w} + \frac{q_n}{\rho_n}, \qquad (2.44)$$

where the total compressibility is

$$c_t = \frac{1}{\phi} \frac{\partial \phi}{\partial p_w} + S_w c_w + S_n c_n, \qquad c_\alpha = \frac{1}{\rho_\alpha} \frac{\partial \rho_\alpha}{\partial p_\alpha}. \qquad (2.45)$$
Once equation (2.44) has been solved for $p_w$, the Darcy velocity $\vec{u}_w$ is found:

$$\vec{u}_w = -\lambda_w \left( \nabla p_w + \rho_w g \nabla h \right). \qquad (2.46)$$
The wetting phase saturation can then be obtained from its conservation law,

$$\frac{\partial (\phi \rho_w S_w)}{\partial t} + \nabla \cdot (\rho_w \vec{u}_w) = q_w. \qquad (2.47)$$
However, there are some other common saturation equations based on total fluid
velocity $\vec{u}_T$. Notice that we can rewrite $\vec{u}_n$ in terms of $\vec{u}_w$ as

$$\vec{u}_n = \frac{\lambda_n}{\lambda_w} \vec{u}_w - \lambda_n \nabla p_c + \lambda_n \left( \rho_w - \rho_n \right) g \nabla h. \qquad (2.48)$$

The total velocity is then

$$\vec{u}_T = \vec{u}_w + \vec{u}_n = \left( 1 + \frac{\lambda_n}{\lambda_w} \right) \vec{u}_w - \lambda_n \nabla p_c + \lambda_n \left( \rho_w - \rho_n \right) g \nabla h, \qquad (2.49)$$

which may be solved for the wetting phase velocity:

$$\vec{u}_w = \frac{\lambda_w}{\lambda_n + \lambda_w} \left( \vec{u}_T + \lambda_n \nabla p_c - \lambda_n \left( \rho_w - \rho_n \right) g \nabla h \right). \qquad (2.51)$$
Figure 2.5: The fractional flow function fw (Sw ) of equation (2.54). The phase viscosities are both set to unity, and we have used the relative permeability functions
from figure 2.3. Blue curve is for Brooks-Corey, and red for Van Genuchten.
Substituting this into the conservation law for the wetting phase gives

$$\frac{\partial (\phi \rho_w S_w)}{\partial t} + \nabla \cdot \left( \rho_w \frac{\lambda_w}{\lambda_n + \lambda_w} \left( \vec{u}_T + \lambda_n \nabla p_c - \lambda_n \left( \rho_w - \rho_n \right) g \nabla h \right) \right) = q_w. \qquad (2.52)$$

Note that the phase conductivities $\lambda_n$ and $\lambda_w$ are directly dependent on $S_w$, while

$$\nabla p_c = \frac{\partial p_c}{\partial S_w} \nabla S_w. \qquad (2.53)$$

By now defining the fractional flow function

$$f_w(S_w) = \frac{\lambda_w}{\lambda_n + \lambda_w}, \qquad (2.54)$$

equation (2.52) may be written more compactly as

$$\frac{\partial (\phi \rho_w S_w)}{\partial t} + \nabla \cdot \left( \rho_w f_w \left( \vec{u}_T + \lambda_n \nabla p_c - \lambda_n \left( \rho_w - \rho_n \right) g \nabla h \right) \right) = q_w. \qquad (2.55)$$
We will now make some simplifying assumptions to find a set of pressure/saturation equations amenable to analytical solution procedures and thus popular as
reference solutions for numerical schemes. The simplification proceeds by first
assuming that
the two fluids are incompressible,
and that there are no sources, so that the total velocity is divergence free,

$$\nabla \cdot \vec{u}_T = 0, \qquad (2.57)$$

and that the capillary pressure gradient may be neglected. The saturation equation
(2.55) then reduces to

$$\phi \frac{\partial S_w}{\partial t} + \nabla \cdot \left( f_w \vec{u}_T - f_w \lambda_n \left( \rho_w - \rho_n \right) g \nabla h \right) = 0. \qquad (2.58)$$

Expanding the first term of the flux and using equation (2.57) gives

$$\nabla \cdot (f_w \vec{u}_T) = f_w \nabla \cdot \vec{u}_T + \vec{u}_T \cdot \nabla f_w = \vec{u}_T \cdot \nabla f_w. \qquad (2.59)$$

The saturation equation then becomes

$$\phi \frac{\partial S_w}{\partial t} + \vec{u}_T \cdot \nabla f_w(S_w) - \nabla \cdot \left( f_w \lambda_n \left( \rho_w - \rho_n \right) g \nabla h \right) = 0. \qquad (2.61)$$

See [133, 132] for more details. The classical Buckley-Leverett problem [51] is
usually formulated without gravity:

$$\phi \frac{\partial S_w}{\partial t} + \vec{u}_T \cdot \nabla f_w(S_w) = 0. \qquad (2.62)$$
Conservation laws (2.23) for each of the three phases are then obtained:

∂(φρ_w(1 − S_o − S_g))/∂t − ∇·(ρ_w λ_w(∇p_w + ρ_w g∇h)) = q_w,    (2.70)

∂(φρ_o S_o)/∂t − ∇·(ρ_o λ_o(∇p_w + ∇p_c^o + ρ_o g∇h)) = q_o,    (2.71)

∂(φρ_g S_g)/∂t − ∇·(ρ_g λ_g(∇p_w + ∇p_c^g + ρ_g g∇h)) = q_g.    (2.72)
As earlier, we can now solve for two phase saturations using any two of the three
equations (2.70)-(2.72). However, we shall also provide an alternative formulation based on the total phase velocity [246], as this will prove useful in certain
numerical methods later. Using the Darcy velocity (2.69), the total three phase velocity is

ũ_T = Σ_α ũ_α    (2.73)
    = −Σ_α λ_α K(∇p_w + ∇p_c^α + ρ_α g∇h)
    = −λ_T K∇p_w − Σ_α λ_α K∇p_c^α − (Σ_α λ_α ρ_α) gK∇h,    (2.74)

where λ_T = Σ_α λ_α is the total mobility.    (2.75)

Solving for K∇p_w gives

K∇p_w = −(1/λ_T)(ũ_T + Σ_β λ_β K∇p_c^β + (Σ_β λ_β ρ_β) gK∇h).    (2.76)

Here, β is an auxiliary phase index like α. Substituting equation (2.76) for K∇p_w
Chapter 3
Miscible reservoir flows
In the last chapter we assumed no mass transfer between the phases. This assumption is mostly valid for two phase flows of water and oil, which is often the case
with primary and secondary production mechanisms. However, it is well known
that gas may boil out of the oil phase as the pressure drops, so we can end up with
three phases and mass transfers. Furthermore, in tertiary production, mass transfer and compositional effects are essential to model correctly as they may become
the driving mechanisms for the flow.
A typical reservoir fluid consists of hundreds of distinct chemical components. These include inorganic compounds such as helium, argon, krypton, radon, nitrogen and carbon dioxide, as well as hydrogen gas and hydrogen sulfide, and of course the hydrocarbons themselves. It is the hydrocarbons which are of commercial interest, while the others may have adverse effects; in particular, hydrogen sulfide seriously damages production equipment. The hydrocarbons can be subdivided into three groups:
Alkanes These are simple chained hydrocarbon segments, possibly with
branches (iso-alkanes). All bonds are simple electron bonds. Methane, ethane, propane and butane are common alkanes and are typically gaseous at
surface conditions. Heavier alkanes include pentane, hexane, heptane etc.,
and these are instead usually in the liquid phase.
Cyclo-alkanes are hydrocarbons with simple bonds and ring structures of five or
six carbon atoms and bonded hydrogen. Some common cyclo-alkanes are
cyclopentane and cyclohexane.
Arenes Ring structures with double bonds between the carbon atoms are called arenes. They tend to be found in heavier oils, with benzene and toluene being the most common aromatic molecules.
All these components may exist in a multitude of phases. We have already mentioned the water, oil and gas phases, and typically hydrocarbons do not mix (easily) with water. Other phases include solidified material (asphalt, tar, wax), which causes problems for flows in reservoir pipes, gas condensate, which is gaseous
in the reservoir but condenses into liquid at surface conditions, and gas hydrates
which are frozen mixtures of water and lighter hydrocarbons.
In this chapter we will present general conservation laws for mass, energy
and volume suitable for this compositional setup. We include energy since the
interphase mass exchange is affected by the temperature. Our derivation is based
on the previous chapter and basic continuum mechanics. Then the chemistry of
interphase mass transfer is described by thermodynamical equilibrium conditions.
As these calculations can be complex, a popular and simplified compositional
model known as the black-oil model is presented at the end of the chapter.
We refer to the books [227, 218, 4, 116, 14] for further background on the
chemical aspects of this chapter, and also to the doctoral theses [59, 245].
m^ℓ = φ Σ_α ρ_α S_α C_α^ℓ.    (3.1)
Notice that this is the mass density of a component distributed over all the phases. We may do the same for the flux, keeping in mind that Darcy's law is still assumed valid:
F̃^ℓ = Σ_α ρ_α ũ_α C_α^ℓ.    (3.4)

The mass conservation law for component ℓ then reads

∂/∂t(φ Σ_α ρ_α S_α C_α^ℓ) + ∇·(Σ_α ρ_α ũ_α C_α^ℓ) = q^ℓ.    (3.5)
The mole densities are related to the mass densities through the component molecular weights M^ℓ [kg/mol] by

ξ_α^ℓ = ρ_α^ℓ/M^ℓ.    (3.7)

Observe now that

ρ_α C_α^ℓ = ρ_α^ℓ = M^ℓ ξ_α^ℓ = M^ℓ ξ_α Ĉ_α^ℓ,    (3.8)

so that the mass conservation law may be written in molar form,

∂/∂t(φ Σ_α ξ_α S_α Ĉ_α^ℓ) + ∇·(Σ_α ξ_α ũ_α Ĉ_α^ℓ) = q̂^ℓ,    (3.9)

with q̂^ℓ = q^ℓ/M^ℓ having the unit [mol/s·m³]. We next wish to clearly identify a set of primary variables. For this we shall use the mole numbers given in

ξ_α^ℓ = N_α^ℓ/V_α.    (3.10)
Figure: microscale causes of dispersion: Taylor diffusion, stream splitting, and tortuosity effects.
Using S_α = V_α/V_p, V_p = φV and ξ_α C_α^ℓ = N_α^ℓ/V_α, each term of the accumulation transforms as

φ S_α ξ_α C_α^ℓ = (V_p/V)(V_α/V_p)(N_α^ℓ/V_α) = N_α^ℓ/V,    (3.11)–(3.14)

so that

φ Σ_α S_α ξ_α C_α^ℓ = (1/V) Σ_α N_α^ℓ = N^ℓ/V.    (3.15)

The number of moles of component ℓ, namely N^ℓ [mol], will be used as the primary variable in the mass conservation law

(1/V)(∂N^ℓ/∂t) + ∇·(Σ_α ξ_α ũ_α C_α^ℓ) = q^ℓ.    (3.16)

As the presence of N^ℓ implies molar units, we have here dropped the ˆ-notation on C_α^ℓ and q^ℓ.
Dispersion
Dispersion can also represent small scale movements not captured by the volume
averaging in our mathematical model, but quantifying this macroscale dispersion
is an open problem. It is therefore neglected here, but [108, 169] may provide
further details. The microscale dispersive velocity is given mathematically by
[234, 233]
w̃_α^ℓ = −d_{α,l} ‖ũ_α‖ (∂C_α^ℓ/∂e_l) ẽ_l − d_{α,t} ‖ũ_α‖ (∂C_α^ℓ/∂e_t) ẽ_t.    (3.17)

Here, d_{α,l} [m] and d_{α,t} [m] are the longitudinal (parallel) and transversal (perpendicular) dispersion coefficients, and ẽ_l and ẽ_t are the unit vectors parallel and perpendicular to ũ_α, respectively. Typically,
d_{α,l} > d_{α,t}.    (3.18)

Equation (3.17) may be written in tensor form, where

T_α = (ũ_α ũ_α^T)/‖ũ_α‖²,    (3.20)

D_α = d_{α,l}‖ũ_α‖ T_α + d_{α,t}‖ũ_α‖ (I − T_α),    (3.21)

giving the dispersive velocity

w̃_α^ℓ = −D_α ∇C_α^ℓ.    (3.22)
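The projection structure of the dispersion tensor can be verified numerically: along the flow direction the dispersivity is d_l‖ũ‖, and transverse to it d_t‖ũ‖. A small sketch with assumed coefficient values:

```python
import numpy as np

def dispersion_tensor(u, d_l, d_t):
    """D = d_l|u| T + d_t|u| (I - T) with T = u u^T / |u|^2, eqs. (3.20)-(3.21)."""
    speed = np.linalg.norm(u)
    T = np.outer(u, u) / speed ** 2   # orthogonal projection onto the flow direction
    I = np.eye(len(u))
    return d_l * speed * T + d_t * speed * (I - T)

u = np.array([3.0, 4.0])              # |u| = 5
D = dispersion_tensor(u, d_l=1e-2, d_t=1e-3)
# u is an eigenvector of D with eigenvalue d_l|u|; any normal direction
# is an eigenvector with eigenvalue d_t|u|.
```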
It should be noted that many numerical solution methods add artificial diffusion,
and this artificial diffusion is often of far greater magnitude than the physical
dispersion derived here. When using a naturally diffusive scheme we may hence
neglect physical dispersion altogether [112].
e_α = ε_α + Φ + (ũ_α·ũ_α)/2,    (3.24)

for which ε_α is the specific internal energy, Φ is the potential energy driving the volume forces, and the last term is the specific kinetic energy. The sum of potential and kinetic energy is the mechanical energy. From basic continuum mechanics (i.e., [271, 104]) we have that the conserved quantity is

u = Σ_α φ ρ_α S_α e_α.    (3.25)
Here, J̃_α [W/m²] is the conductive heat flux and σ_α [Pa] is the stress tensor. If there is also interphase energy transfer, Q_α = q_α^e [W/m³], then equation (1.4) becomes

∂(φρ_α S_α e_α)/∂t + ∇·(ρ_α e_α ũ_α + J̃_α − σ_α ũ_α) = q_α^e.    (3.27)
The terms in the energy flux will now be specified. The conductive heat flux J̃_α is usually given by Fourier's law

J̃_α = −κ_α ∇T,    (3.28)

where κ_α [W/m·K] is the heat conductivity tensor and T [K] is the system temperature. If we assume that the fluid is ideal, that is, it has constant compressibility, Cauchy's laws of continuum mechanics simplify the stress tensor into the isotropic form

σ_α = −p_α I.    (3.29)
Before substituting (3.29) into equation (3.27), we note that in the majority of thermal flooding applications, most of the phase energy is the internal energy of the heated fluid. Thus, we may neglect the mechanical energies contained in the potential and kinetic terms, and we get the equality

e_α = ε_α.    (3.30)

∂(φρ_α S_α ε_α)/∂t + ∇·(ρ_α ε_α ũ_α − κ_α ∇T + p_α ũ_α) = q_α^e.    (3.31)
ρ_α ε_α ũ_α − κ_α ∇T + p_α ũ_α = ρ_α h_α ũ_α − κ_α ∇T.    (3.32)–(3.33)

Here we have introduced the enthalpy h_α = ε_α + p_α/ρ_α [J/kg], which is a thermodynamical potential that may be computed from an equation of state. Equation (3.31) is now

∂(φρ_α S_α ε_α)/∂t + ∇·(ρ_α h_α ũ_α − κ_α ∇T) = q_α^e.    (3.34)
Since the temperature is assumed equal for all the phases, we may sum (3.34) over the phases, yielding

∂/∂t(Σ_α φρ_α S_α ε_α) + ∇·(Σ_α ρ_α h_α ũ_α − κ∇T) = q^e.    (3.35)
In this form q^e reduces to just the contribution of energy from external sources. As energy is also contained within the rock phase α = r, we include this phase in the above summation. Convective energy transfer in the rock phase is negligible compared to its conductive transfer, so we have ũ_r = 0̃. Furthermore, the internal energy of the rock is

ε_r = h_r − p_r/ρ_r.    (3.36)
Assuming further a negligible change in porosity and rock density with time, we obtain

∂(φ_r S_r ρ_r ε_r)/∂t = (1 − φ)ρ_r (∂h_r/∂T)(∂T/∂t).    (3.37)
In this equation we also assumed no change in rock pressure with time. Now, equation (3.35) becomes the porous media temperature equation

(1 − φ)ρ_r (∂h_r/∂T)(∂T/∂t) + ∂/∂t(Σ_{α≠r} φρ_α S_α ε_α) + ∇·(Σ_α ρ_α h_α ũ_α − κ∇T) = q^e.    (3.38)
The second time derivative term can be treated implicitly in temperature if we
have a temperature-explicit expression for the specific internal energy . See
[71, 72] for details.
The fluid volumes must exactly fill the pore volume,

Σ_α V_α = V_p = φV,    (3.40)

and the deviation from this is measured by the residual volume

R = Σ_α V_α − V_p.    (3.41)
zt is the coordinate at the reservoir ceiling (the seal, see figure 1.1), and Wt is the
overburden there. Naturally, this phase sum should include the rock = r.
Using the chain rule on the residual volume derivative in equation (3.42) yields the volume balance equation

∂R/∂t + (∂R/∂p_w)(∂p_w/∂t) + (∂R/∂W)(∂W/∂t) + (∂R/∂T)(∂T/∂t) + Σ_ℓ (∂R/∂N^ℓ)(∂N^ℓ/∂t) = 0.    (3.44)
The individual derivatives are

∂R/∂p_w = Σ_α ∂V_α/∂p_w − ∂V_p/∂p_w,    (3.46)

∂R/∂W = −∂V_p/∂W,    (3.47)

∂R/∂T = Σ_α ∂V_α/∂T,    (3.48)

∂R/∂N^ℓ = Σ_α ∂V_α/∂N^ℓ.    (3.49)
By the same approach as for the total volume balance, we can also provide phase volume balance equations. Since V_α is dependent on pressure, temperature and masses, clearly

∂V_α/∂t = (∂V_α/∂p_w)(∂p_w/∂t) + (∂V_α/∂T)(∂T/∂t) + Σ_ℓ (∂V_α/∂N^ℓ)(∂N^ℓ/∂t).    (3.50)

Overburden affects the phase volumes through the change in the pressure, so having solved for pressure we may neglect overburden here. The time derivatives of pressure, temperature, and mass are known from earlier equations. Most of the physical parameters used in the equations above are functions of saturations rather than volumes, and using

V_α = V_p S_α,    (3.51)

we get the compositional saturation equation

∂(φS_α)/∂t = (1/V)(∂V_α/∂p_w)(∂p_w/∂t) + (1/V)(∂V_α/∂T)(∂T/∂t) + Σ_ℓ (1/V)(∂V_α/∂N^ℓ)(∂N^ℓ/∂t).    (3.52)
However, for a multi-phase system we note that the energy change can also be affected by the phase composition:

dε_α = T ds_α − p dV_α + Σ_ℓ μ_α^ℓ dN_α^ℓ,    (3.59)

where μ_α^ℓ is the chemical potential of component ℓ in phase α. To find the equilibrium condition, we must consider the total system energy:

dε_g + dε_o = T(ds_g + ds_o) − p(dV_g + dV_o) + Σ_ℓ (μ_g^ℓ dN_g^ℓ + μ_o^ℓ dN_o^ℓ).    (3.61)

Since mass leaving one phase enters the other, dN_o^ℓ = −dN_g^ℓ, and the last sum becomes

Σ_ℓ (μ_g^ℓ − μ_o^ℓ) dN_g^ℓ.    (3.62)

At the equilibrium state there is no change in entropy or total energy, and consequently we must have

μ_g^ℓ = μ_o^ℓ,  ℓ = 1, …, n_c.    (3.63)

The equality of the chemical potentials for all components in both of the phases is our thermodynamical equilibrium condition. In the following sections we shall provide a closed expression for μ_α^ℓ, followed by some solution techniques for equation (3.63).
μ^ℓ = μ_0^ℓ + RT ln(f^ℓ/f_0^ℓ),    (3.65)

with μ_0^ℓ being the potential, and f_0^ℓ being the fugacity, both of a pure component fluid at ideal (low pressure) conditions. This shows that equality of the chemical potentials across phases implies that the fugacities should also be equal across phases:

μ_g^ℓ = μ_o^ℓ  ⇒  f_g^ℓ = f_o^ℓ.    (3.66)

Hence, we can limit ourselves to just determining the fugacities, and they are defined through the relation

f_α^ℓ = ϕ_α^ℓ C_α^ℓ p_α,    (3.67)

where ϕ_α^ℓ is called the fugacity coefficient, which satisfies

lim_{p→0} ϕ = lim_{p→0} f/(Cp) = 1.    (3.68)

To determine the fugacity, start by subtracting RT d ln(Cp) from both sides of equation (3.64):

dμ − RT d ln(Cp) = RT d ln f − RT d ln(Cp) = RT d ln ϕ.    (3.69)
We can identify dμ by considering the Gibbs energy function. The Gibbs energy is defined as the summation of the internal energy, the volume changing work and the energy loss due to entropy:

G_α = ε_α + pV_α − Ts_α.    (3.70)

dG_α = dε_α + p dV_α + V_α dp − T ds_α − s_α dT = V_α dp − s_α dT + Σ_ℓ μ_α^ℓ dN_α^ℓ.    (3.71)

The chemical potential is thus the derivative of the Gibbs energy with respect to N^ℓ at fixed p and T.
By equality of the mixed second derivatives of the Gibbs energy,

∂μ^ℓ/∂p = ∂²G/(∂N^ℓ ∂p) = ∂²G/(∂p ∂N^ℓ) = ∂V/∂N^ℓ,    (3.77)

and therefore the change in μ^ℓ when only the phase composition N^ℓ can vary is

dμ^ℓ = (∂V/∂N^ℓ) dp.    (3.78)

d ln ϕ^ℓ = dμ^ℓ/(RT) − d ln(C^ℓ p).    (3.79)

With the composition fixed,

d ln(C^ℓ p) = d(C^ℓ p)/(C^ℓ p) = C^ℓ dp/(C^ℓ p) = dp/p = d ln p,    (3.80)–(3.83)

such that

d ln ϕ^ℓ = (1/(RT))(∂V/∂N^ℓ) dp − d ln p.    (3.84)
If the fugacity coefficient can be determined, then the fugacity itself may be found. To do this requires a further pressure/volume/temperature relationship.
Figure 3.2: Phase envelope for a given system in the pressure-temperature diagram. Outside the envelope there is only a single phase, oil or gas, determined by the position relative to the critical point C. The left portion of the envelope is called the bubble-point curve, and the right part is the dew-point curve.
For an ideal fluid the phase volume obeys pV_α = N_α RT,    (3.86)

and a real fluid is measured against this through the phase compressibility factor Z_α,

pV_α = Z_α N_α RT.    (3.87)

A real fluid becomes ideal as pressure tends to zero or as the molar volume tends to infinity. Hence,

lim_{p→0} Z_α = lim_{V→∞} Z_α = 1.    (3.88)

There are numerous ways to determine Z_α, and this is a large research area by itself. Here we shall start by providing pressure-explicit equations of state, and then Z_α will be identified. In the classical pressure-explicit form we assume that

p = p_rep + p_att,    (3.89)
where p_rep are the repulsive forces and p_att are the attractive forces. Van der Waals [260] provided a simple expression for these by

p = RT/(V − b) − a/V²,    (3.90)

in which a and b are empirical parameters that satisfy

∂p/∂V = 0,    (3.91)

∂²p/∂V² = 0,    (3.92)

at the critical point of a single-component system. The critical point is where the system hovers between two-phase oil/gas, single phase oil and single phase gas, see figure 3.2.
The Van der Waals equation of state, although simple, is not used nowadays since more accurate equations of state are available. We shall here consider the class of general two-parameter equations

p = RT/(V − b) − a/((V + Ω₁b)(V + Ω₂b)),    (3.93)

where

Ω₂ < Ω₁    (3.94)

are empirical constants. Notice that the Van der Waals equation, while not strictly of this class, can be considered a special case with the choice Ω₁ = Ω₂ = 0.
The two parameters a and b are given by the general mixing rules

a = Σ_i Σ_j C^i C^j a_ij(T),    (3.95)

b = Σ_i C^i b_i,    (3.96)

for which

a_ij(T) = √(a_i(T) a_j(T)) (1 − d_ij).    (3.98)
b_i = Ω_b RT_i^c/p_i^c.    (3.100)

               Soave-Redlich-Kwong [244]          Peng-Robinson [219]
  Ω₁           1                                  1 + √2
  Ω₂           0                                  1 − √2
  Ω_a          1/(9(∛2 − 1))                      0.45724
  Ω_b          (∛2 − 1)/3                         0.0778
  m(ω_i)       0.48 + 1.574ω_i − 0.176ω_i²        0.37464 + 1.54226ω_i − 0.26992ω_i²

Table 3.1: Two common equations of state suitable for hydrocarbon mixtures.
Some new parameters have now been introduced. T_i^c and p_i^c are the pure component critical temperature and pressure, respectively. Further, Ω_a and Ω_b can be determined based on the conditions of the system at the critical point, but they may be regarded as empirical parameters. Pitzer's acentric factor, ω_i, is a measure of the deviation of a molecular substance from a perfect spherical form. The function m(ω_i) is then some polynomial fitting function.

It should come as no surprise that there is a vast number of available state equations. Table 3.1 lists two common methods which are used widely, and their choice of parameters. More complicated equations of state exist, but these do not seem to have gained widespread acceptance within the oil industry yet.
Comparing equation (3.87) with the two-parameter form (3.93), we can identify a phase compressibility factor Z_α by noting that

pV_α = Z_α N_α RT  ⇔  V_α = Z_α N_α RT/p.    (3.101)

Substitution of (3.101) into equation (3.93) yields a cubic algebraic equation in Z_α:

Z_α³ − [1 + B_α(1 − Ω₁ − Ω₂)]Z_α² + [A_α − B_α(1 + B_α)(Ω₁ + Ω₂) + Ω₁Ω₂B_α²]Z_α = A_αB_α + B_α²(1 + B_α)Ω₁Ω₂,    (3.102)

with

A_α = p a_α/(RT)²,    (3.103)

B_α = p b_α/(RT).    (3.104)

In solving the cubic equation, we can find up to three real roots, but we obviously only need one. To remedy this, note that gas tends to have large volumes, while oil has smaller volumes. Thus, for the gas phase the largest real root is selected, which then maximises V_α. Conversely, the oil phase should use the smallest real root, minimising its phase volume.
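The root selection rule can be sketched as follows: the cubic (3.102) is brought to monic form, all roots are computed, and the phase determines which real root to keep. The function below is an illustrative sketch (not thesis code), with Ω₁, Ω₂ supplied as in table 3.1:

```python
import numpy as np

def z_factor(A, B, omega1, omega2, phase):
    """Solve the cubic EOS (3.102) for Z and select the phase-appropriate root."""
    # Monic coefficients of Z^3 + a2 Z^2 + a1 Z + a0 = 0.
    a2 = -(1.0 + B * (1.0 - omega1 - omega2))
    a1 = A - B * (1.0 + B) * (omega1 + omega2) + omega1 * omega2 * B ** 2
    a0 = -(A * B + B ** 2 * (1.0 + B) * omega1 * omega2)
    roots = np.roots([1.0, a2, a1, a0])
    real = roots[np.abs(roots.imag) < 1e-10].real
    real = real[real > B]          # only V > b is physically meaningful
    # Gas: largest volume -> largest Z.  Oil: smallest volume -> smallest Z.
    return real.max() if phase == "gas" else real.min()
```

In the ideal-gas limit A = B = 0, the only admissible root is Z = 1, consistent with (3.88).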
Differentiating (3.87) at fixed composition and temperature gives

dZ = (V dp + p dV)/(RT).    (3.106)

Division by Z gives

dZ/Z = dp/p + dV/V,    (3.107)

and further

d ln p = dp/p = dZ/Z − dV/V.    (3.108)
The term

(1/(RT))(∂V/∂N^ℓ) dp    (3.110)

simplifies by using the fact that the system pressure has the dependencies

p = p(V, T, N¹, …, N^{n_c}).    (3.111)
The total change in the pressure is

dp = (∂p/∂V) dV + (∂p/∂T) dT + Σ_ℓ (∂p/∂N^ℓ) dN^ℓ.    (3.112)
Division by dN^ℓ, and keeping all other quantities except for V fixed, yields

0 = (∂p/∂V)(∂V/∂N^ℓ) + ∂p/∂N^ℓ.    (3.113)

The total derivative on the phase volume, dV, has been changed into a partial derivative since it is only N^ℓ that is variable. Rearranging gives

(∂V/∂N^ℓ) dp = −(∂p/∂N^ℓ) dV.    (3.114)
Hence,

d ln ϕ^ℓ = −(1/(RT))(∂p/∂N^ℓ) dV − dZ/Z + dV/V.    (3.115)
∂ ln ϕ^ℓ/∂V = −(1/(RT))(∂p/∂N^ℓ) − (1/Z)(∂Z/∂V) + 1/V.    (3.116)

This is then to be integrated with respect to V from zero pressure to a fixed pressure. At zero pressure V = ∞, and at a fixed pressure the molar volume is naturally finite. With these integration limits, we use the relations (3.68) and (3.88), yielding

ln ϕ^ℓ = (1/(RT)) ∫_V^∞ [(∂p/∂N^ℓ) − RT/V] dV − ln Z.    (3.117)
The pressure derivative can be computed straight from the equation of state (3.93), and then integration and some simplifications give

ln ϕ^ℓ = β^ℓ(Z − 1) − ln(Z − B) + (A/((Ω₁ − Ω₂)B))(α^ℓ − β^ℓ) ln[(Z + Ω₂B)/(Z + Ω₁B)],    (3.118)

or, in exponential form,

ϕ^ℓ = (exp(β^ℓ(Z − 1))/(Z − B)) [(Z + Ω₂B)/(Z + Ω₁B)]^{A(α^ℓ − β^ℓ)/((Ω₁ − Ω₂)B)},    (3.119)

where

α^ℓ = (2/a) Σ_{j=1}^{n_c} C^j a_ℓj,    (3.120)

β^ℓ = b_ℓ/b.    (3.121)
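For a pure component, α^ℓ = 2 and β^ℓ = 1, and (3.118) collapses to the familiar single-component expression. A sketch with the Peng-Robinson constants Ω₁ = 1 + √2, Ω₂ = 1 − √2, reusing the cubic root selection from above (illustrative code, not from the thesis):

```python
import numpy as np

def ln_phi_pure(A, B, phase="gas"):
    """ln(phi) of a pure component from eq. (3.118) with alpha = 2, beta = 1,
    using the Peng-Robinson constants Omega1 = 1 + sqrt(2), Omega2 = 1 - sqrt(2)."""
    w1, w2 = 1.0 + np.sqrt(2.0), 1.0 - np.sqrt(2.0)
    # Z from the cubic (3.102) in monic form.
    a2 = -(1.0 + B * (1.0 - w1 - w2))
    a1 = A - B * (1.0 + B) * (w1 + w2) + w1 * w2 * B ** 2
    a0 = -(A * B + B ** 2 * (1.0 + B) * w1 * w2)
    roots = np.roots([1.0, a2, a1, a0])
    real = roots[np.abs(roots.imag) < 1e-10].real
    real = real[real > B]
    Z = real.max() if phase == "gas" else real.min()
    # (alpha - beta) = 1 for a pure component.
    return (Z - 1.0) - np.log(Z - B) \
        + A / ((w1 - w2) * B) * np.log((Z + w2 * B) / (Z + w1 * B))
```

In the near-ideal limit (small A and B) the fugacity coefficient tends to one, in agreement with (3.68).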
At equilibrium, each component must have equal fugacities in the two phases,

f_g^ℓ = f_o^ℓ,  ℓ = 1, …, n_c.    (3.122)

Furthermore, the system must always maintain the material mole balance

N^ℓ = N_g^ℓ + N_o^ℓ,  ℓ = 1, …, n_c.    (3.123)
Newton's method

G^ℓ = ln f_g^ℓ − ln f_o^ℓ = 0,  ℓ = 1, …, n_c.    (3.124)

Choosing N_g^ℓ as the unknowns, N_o^ℓ may be found from the material balance equation (3.123). A Taylor expansion of equation (3.124) then yields

G^ℓ,[n+1] ≈ G^ℓ,[n] + Σ_{m=1}^{n_c} (∂G^ℓ/∂N_g^m) ΔN_g^m,    (3.125)

with

ΔN_g^m = N_g^m,[n+1] − N_g^m,[n].    (3.126)
Here, [n] is the nonlinear iteration index. Since our aim is G^ℓ = 0 for all ℓ, the left hand side of equation (3.125) may be posed to be zero, and the equation is rewritten into matrix form:

Σ_{m=1}^{n_c} (∂G^ℓ/∂N_g^m)^{[n]} ΔN_g^m = −G^ℓ,[n],  ℓ = 1, …, n_c,    (3.127)

an n_c × n_c linear system for the updates ΔN_g^m.
The entries of the Jacobian are

∂G^ℓ/∂N_g^m = ∂ ln f_g^ℓ/∂N_g^m − ∂ ln f_o^ℓ/∂N_g^m,    (3.128)

and by the chain rule the differentiation passes through the phase fractions,

∂/∂N_g^m = Σ_ℓ (∂C_g^ℓ/∂N_g^m)(∂/∂C_g^ℓ) + Σ_ℓ (∂C_o^ℓ/∂N_g^m)(∂/∂C_o^ℓ),    (3.129)

where the fraction derivatives are

∂C_g^ℓ/∂N_g^m = (δ^{ℓm} − C_g^ℓ)/N_g,    (3.130)

∂C_o^ℓ/∂N_g^m = −(δ^{ℓm} − C_o^ℓ)/N_o.    (3.131)
Differentiating equation (3.118) with respect to N_g^m gives

∂ ln ϕ^ℓ/∂N_g^m = (∂β^ℓ/∂N_g^m)(Z − 1) + β^ℓ(∂Z/∂N_g^m) − (∂Z/∂N_g^m − ∂B/∂N_g^m)/(Z − B)
 + ((α^ℓ − β^ℓ)/((Ω₁ − Ω₂)B))(∂A/∂N_g^m − (A/B)(∂B/∂N_g^m)) ln[(Z + Ω₂B)/(Z + Ω₁B)]
 + (A/((Ω₁ − Ω₂)B))(∂α^ℓ/∂N_g^m − ∂β^ℓ/∂N_g^m) ln[(Z + Ω₂B)/(Z + Ω₁B)]
 + (A(α^ℓ − β^ℓ)/((Ω₁ − Ω₂)B)) [(∂Z/∂N_g^m + Ω₂ ∂B/∂N_g^m)/(Z + Ω₂B) − (∂Z/∂N_g^m + Ω₁ ∂B/∂N_g^m)/(Z + Ω₁B)].    (3.132)

The auxiliary derivatives are

∂α^ℓ/∂N_g^m = (2/a) Σ_j (∂C_g^j/∂N_g^m) a_ℓj − (α^ℓ/a)(∂a/∂N_g^m),    (3.133)

∂β^ℓ/∂N_g^m = −(β^ℓ/b)(∂b/∂N_g^m),    (3.134)

∂A/∂N_g^m = (p/(RT)²)(∂a/∂N_g^m),    (3.135)

∂B/∂N_g^m = (p/(RT))(∂b/∂N_g^m),    (3.136)

while ∂Z/∂N_g^m follows from implicit differentiation of the cubic Z³ + a₂Z² + a₁Z + a₀ = 0:

∂Z/∂N_g^m = −[(∂a₂/∂N_g^m)Z² + (∂a₁/∂N_g^m)Z + ∂a₀/∂N_g^m]/(3Z² + 2a₂Z + a₁),    (3.137)–(3.138)

with

a₂ = −[1 + B(1 − Ω₁ − Ω₂)],    (3.139)

a₁ = A − B(1 + B)(Ω₁ + Ω₂) + Ω₁Ω₂B²,    (3.140)

a₀ = −[AB + B²(1 + B)Ω₁Ω₂].    (3.141)

Finally, from the mixing rules,

∂a/∂N_g^m = Σ_i Σ_j ((∂C_g^i/∂N_g^m)C_g^j + C_g^i(∂C_g^j/∂N_g^m)) a_ij(T),    (3.142)

∂b/∂N_g^m = Σ_i (∂C_g^i/∂N_g^m) b_i.    (3.143)
The phase mole fractions are

C_α^ℓ = N_α^ℓ / Σ_{m=1}^{n_c} N_α^m,  α = g, o.    (3.144)

Since the hydrocarbon phase fractions are

F_α = N_α/N_HC = N_α/(N_g + N_o),    (3.145)

the total fractions satisfy

C^ℓ = F_g C_g^ℓ + F_o C_o^ℓ    (3.146)
    = F_g C_g^ℓ + (1 − F_g) C_o^ℓ.    (3.147)

Define the equilibrium ratios

K^ℓ = C_g^ℓ/C_o^ℓ.    (3.148)

Combining (3.147) and (3.148), the phase fractions may be expressed as

C_o^ℓ = C^ℓ/((K^ℓ − 1)F_g + 1),    (3.150)

C_g^ℓ = K^ℓ C_o^ℓ = K^ℓ C^ℓ/((K^ℓ − 1)F_g + 1).    (3.151)
Do:
    Solve the Rachford-Rice equation (3.153) for F_g.
    Compute C_g^ℓ and C_o^ℓ from equations (3.150)–(3.151), and evaluate the fugacities f_g^ℓ and f_o^ℓ.
    Update the equilibrium ratios K^ℓ.
While f_g^ℓ ≠ f_o^ℓ.

Algorithm 3.2: The successive substitution algorithm for a two-phase mixture.
Clearly, the sum of each phase fraction is 1, so their difference must be zero:

Σ_ℓ (C_g^ℓ − C_o^ℓ) = 0,    (3.152)

and into this we substitute both equation (3.150) and (3.151), getting the Rachford-Rice equation

Σ_{ℓ=1}^{n_c} C^ℓ(K^ℓ − 1)/((K^ℓ − 1)F_g + 1) = 0.    (3.153)

The successive substitution algorithm is given in algorithm 3.2. To start it, one additional set of inputs compared to the Newton iteration is required, namely initial estimates of the equilibrium ratios. At low pressures a correlation formula based on ideal fluid properties can be used, for instance Wilson's formula:

K^ℓ = (p_c^ℓ/p) exp[5.37(1 + ω^ℓ)(1 − T_c^ℓ/T)].    (3.154)
This is only necessary at the first time step of the reservoir flow process; at later times we can use the previously computed K^ℓ values from the same grid block. The successive substitution is clearly a simple algorithm, and for the two-phase problem here it is always convergent [195, 196]. But it is also a slow method. Therefore one may combine it with Newton's method by alternating between some successive substitution steps and a Newton step [197].
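The first two steps of algorithm 3.2 are easy to sketch: initial K^ℓ from Wilson's formula (3.154), then the Rachford-Rice equation (3.153) solved for F_g. Since (3.153) is monotonically decreasing in F_g, a simple bisection on (0, 1) is safe whenever the K values straddle unity; the fugacity update is omitted here:

```python
import math

def wilson_k(p, T, p_c, T_c, omega):
    """Initial equilibrium ratios from Wilson's formula (3.154)."""
    return [pc / p * math.exp(5.37 * (1.0 + w) * (1.0 - Tc / T))
            for pc, Tc, w in zip(p_c, T_c, omega)]

def rachford_rice(C, K, tol=1e-12):
    """Solve sum_l C^l (K^l - 1) / ((K^l - 1) F_g + 1) = 0, eq. (3.153), for F_g."""
    def g(Fg):
        return sum(c * (k - 1.0) / ((k - 1.0) * Fg + 1.0) for c, k in zip(C, K))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:            # g is strictly decreasing in F_g
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For an equimolar binary mixture with K = (2, 1/2), symmetry gives F_g = 1/2, which the solver reproduces.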
The derivatives of the phase mole densities follow from ξ_α = p_w/(Z_α RT) by the chain rule; for a variable x,

∂ξ_α/∂x = (∂ξ_α/∂p_w)(∂p_w/∂x) + (∂ξ_α/∂T)(∂T/∂x) + (∂ξ_α/∂Z_α)(∂Z_α/∂x),    (3.156)–(3.157)

where ∂Z_α/∂x again follows from implicit differentiation of the cubic:

∂Z_α/∂x = −[(∂a₂/∂x)Z_α² + (∂a₁/∂x)Z_α + ∂a₀/∂x]/(3Z_α² + 2a₂Z_α + a₁).    (3.158)
The temperature derivative is zero unless differentiating with respect to temperature, in which case the derivative equals 1. Likewise for the pressure. It remains to compute the mass derivative:

∂N_α/∂x = Σ_ℓ ∂N_α^ℓ/∂x.    (3.159)
We will use the fugacity equality condition to find these mole mass derivatives. First note that

∂N^ℓ/∂x = ∂N_g^ℓ/∂x + ∂N_o^ℓ/∂x.    (3.160)

If x = p_w or T, the left hand side is zero since the mass is an independent variable:

∂N_g^ℓ/∂x = −∂N_o^ℓ/∂x.    (3.161)

If instead x = N^m, we get

∂N_g^ℓ/∂N^m = δ^{ℓm} − ∂N_o^ℓ/∂N^m.    (3.162)
Starting from the equality f_g^ℓ = f_o^ℓ, differentiating with respect to x and substituting equations (3.160)–(3.162) yields

Σ_m (∂f_g^ℓ/∂N_g^m + ∂f_o^ℓ/∂N_o^m)(∂N_o^m/∂x) = (∂f_g^ℓ/∂p − ∂f_o^ℓ/∂p)(∂p/∂x) + (∂f_g^ℓ/∂T − ∂f_o^ℓ/∂T)(∂T/∂x) + Σ_m (∂f_g^ℓ/∂N_g^m) δ_x^m,    (3.163)–(3.164)

where δ_x^m = 1 if x = N^m, and otherwise zero. Defining

F_ij^α = ∂f_α^i/∂N_α^j,    (3.165)

and collecting the right hand side of (3.164) into a vector r̃_x,    (3.166)

the mole mass derivatives are found from the linear system

(F^g + F^o) [∂N_o^1/∂x, …, ∂N_o^{n_c}/∂x]^T = r̃_x.    (3.167)
3.3 The black-oil model
             Water        Gas          Oil
  Aquatic    C_a^w = 1    C_a^g = 0    C_a^o = 0
  Light      C_l^w = 0    C_l^g = 1    C_l^o = 1 − C_h^o
  Heavy      C_h^w = 0    C_h^g = 0    C_h^o

Table 3.2: Phase fractions of the classical black-oil model. The light component may exist in both the gas and the oil phase, as indicated by the fraction C_h^o.
We may simplify the auxiliary thermodynamical calculations into explicit relationships between pressure, temperature and composition, and this will result in equations of much the same form as those in Chapter 2. There are two types of thermodynamical quantities: intensive and extensive. The former have point values and are independent of the system size, while the latter depend on the system size.

It is a fundamental postulate that the intensive state of an n_c component system is given by n_c + 1 variables, while its extensive state is given by n_c + 2 variables. These variables may be freely chosen, and we use pressure p_w, temperature T, and system volume V to reduce the number of variables to n_c − 1 for the extensive system.
We now need to find the functional dependencies of C_h^o. Since the aquatic component is restricted to water, and the hydrocarbon components cannot mix with water, our system is actually two systems: one pure water system and a hydrocarbon mixture. The hydrocarbon mixture has just two components, and its intensive

             Water              Gas                 Oil
  Aquatic    C_a^w = 1 − C_l^w  C_a^g = 0           C_a^o = 0
  Light      C_l^w              C_l^g               C_l^o = 1 − C_h^o
  Heavy      C_h^w = 0          C_h^g = 1 − C_l^g   C_h^o

Table 3.3: Phase fractions of the extended black-oil model with solubility of the light component into the water phase and a volatile heavy component. C_l^w, C_l^g and C_h^o are now variable.
state can be described by three variables. We choose these three to be pressure p_w, temperature T and the oil saturation relative to the total hydrocarbon phase saturation,

S_or = S_o/(S_o + S_g).    (3.169)

Therefore,

C_h^o = C_h^o(p_w, T, S_or).    (3.170)

The phase mole densities are also intensive state variables, and therefore have the dependencies

ξ_w = ξ_w(p_w, T),  ξ_g = ξ_g(p_w, T, C_h^o),  ξ_o = ξ_o(p_w, T, C_h^o).    (3.171)
Mass conservation is given by equation (3.16), with one equation for each component:

(1/V)(∂N^ℓ/∂t) + ∇·(Σ_α ξ_α ũ_α C_α^ℓ) = q^ℓ,  ℓ = a, l, h.    (3.172)
Extended black-oil
The classical black-oil model may easily be extended to more complicated mixtures. For instance, the light component group can include substances which are
water soluble, while some of the heavier components may vaporise into the gas
phase. This model is called the extended black-oil, but without hydrocarbon solubility in water, it is instead referred to as just the symmetrical black-oil model.
Since this is a coupled three component problem, we need four variables to describe the intensive system state, and those will be p_w, T, and the phase saturations S_w and S_o. The phase fractions are tabulated in table 3.3, and we must prescribe the fractions

C_l^w = C_l^w(p_w, T, S_w, S_o),  C_l^g = C_l^g(p_w, T, S_w, S_o),  C_h^o = C_h^o(p_w, T, S_w, S_o).    (3.173)

The phase mole densities now have the dependencies

ξ_g = ξ_g(p_w, T, C_l^g, C_h^g),  ξ_o = ξ_o(p_w, T, C_l^o, C_h^o).    (3.174)–(3.175)
For the pressure and phase saturations we shall simply use the volume balance equations (3.44) and (3.52). To use these, ∂N^ℓ/∂t and ∂T/∂t are needed along with the phase volume derivatives. The time derivatives of mass and temperature are known from before, so we are left with calculating ∂V_α/∂p_w, ∂V_α/∂T and ∂V_α/∂N^ℓ. For the sake of generality, we will consider the extended black-oil model. Applying the results to the classical black-oil model is then just a matter of zeroing some terms.
Now, the number of moles of each component is

N^ℓ = Σ_α C_α^ℓ V_α ξ_α,    (3.176)

that is,

N^a = V_w ξ_w,
N^l = C_l^w V_w ξ_w + C_l^g V_g ξ_g + C_l^o V_o ξ_o,
N^h = C_h^g V_g ξ_g + C_h^o V_o ξ_o.

Solving for the phase volumes gives

V_w = N^a/ξ_w,    (3.177)

V_g = (C_h^o N^l − C_l^w C_h^o N^a − C_l^o N^h)/(ξ_g (C_l^g C_h^o − C_l^o C_h^g)),    (3.178)

V_o = (N^h − C_h^g ξ_g V_g)/(ξ_o C_h^o).    (3.179)
Figure 3.3: B_o and R_s for the classical black-oil model plotted against pressure. ρ_o decreases initially due to dissolution of the light component into the oil phase. At the bubble point pressure p_w^BP, no more of the light component can be dissolved, and R_s becomes constant at higher pressures while B_o decreases since ∂ρ_o/∂p_w > 0. The bubble point pressure varies with temperature and composition, and we have plotted two. The y-axis is not to any particular scale.
Substituting the volume derivatives into equation (3.44) gives the black-oil pressure equation, and substitution into equation (3.52) gives saturation equations for
Sw and So . Temperature is solved from equation (3.38). Notice that there is no
need to solve any of the mass equations (3.16) explicitly anymore.
ρ_w = M^a C_a^w ξ_w + M^l C_l^w ξ_w ≡ ρ_w^a + ρ_w^l,    (3.181)

ρ_g = M^l C_l^g ξ_g + M^h C_h^g ξ_g ≡ ρ_g^l + ρ_g^h,    (3.182)

ρ_o = M^l C_l^o ξ_o + M^h C_h^o ξ_o ≡ ρ_o^l + ρ_o^h.    (3.183)
Recall that the M^ℓ are the component molecular weights, and the ρ_α^ℓ are the partial phase densities. Furthermore, let ρ_α^SC be the phase densities at surface (standard) conditions, and it is understood that the phases there consist of pure components, i.e., the water phase consists exclusively of the aquatic component. Then, define the volume factors

B_w = ρ_w^SC/ρ_w,    (3.184)

B_g = ρ_g^SC/ρ_g,    (3.185)

B_o = ρ_o^SC/ρ_o.    (3.186)
These are clearly all 1 at the surface, but can exhibit different types of behaviours at reservoir pressure, see figure 3.3. Next, define the dissolution ratios

R_l^w = B_w ρ_w^l/ρ_w^SC,    (3.187)

R_h^g = B_g ρ_g^h/ρ_g^SC,    (3.188)

R_s = R_l^o = B_o ρ_o^l/ρ_o^SC.    (3.189)

These ratios are zero at surface conditions, and typically increase downwards in the reservoir up to a certain point. Using the dissolution ratios and volume factors, we can solve for C_α^ℓ and ξ_α as needed in our black-oil formulation.
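As a consistency exercise for the definitions (3.186) and (3.189), the partial oil densities can be recovered from tabulated B_o and R_s values; a sketch with made-up input numbers (all values are illustrative):

```python
def oil_partial_densities(B_o, R_s, rho_o_SC):
    """Invert (3.186) and (3.189): given B_o = rho_o^SC / rho_o and
    R_s = B_o rho_o^l / rho_o^SC, recover the partial densities of the
    light and heavy components in the oil phase."""
    rho_o = rho_o_SC / B_o          # total oil phase density, from (3.186)
    rho_o_l = R_s * rho_o_SC / B_o  # dissolved light component, from (3.189)
    rho_o_h = rho_o - rho_o_l       # the remainder is the heavy component
    return rho_o_l, rho_o_h

rho_l, rho_h = oil_partial_densities(B_o=1.2, R_s=0.3, rho_o_SC=800.0)
```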
Chapter 4
Numerical methods
Comparatively few of the reservoir flow equations provided admit analytical solutions. Under the assumption of homogeneous fluid and medium conditions, the single phase immiscible equations (2.30) and (2.33) can be solved in some regular domains. For two phases, the Buckley-Leverett problem (2.57), (2.62) can be solved for a restricted set of fractional flow functions f_w. In addition, the McWhorter problem (2.57), (2.60) can be solved by quasi-analytical means, that is, reduced to an integral expression that can be evaluated by numerical quadrature.

However, an arbitrary, physically correct problem described by our reservoir flow equations has in general no known analytical or quasi-analytical solution. Therefore, we will resort to numerical means by discretising the partial differential equations into a set of coupled nonlinear equations, which in turn are formulated into a sequence of linear equations by Newton's method. In formulating discretisation methods, we will make use of some canonical forms. From linear partial differential equation theory [111], the types of partial differential equations may be classified as
Elliptic Time-independent potential equations:

−∇·(λK∇u) = q.    (4.1)

Parabolic Time-dependent potential equations:

c(∂u/∂t) − ∇·(λK∇u) = q.    (4.2)

Hyperbolic Time-dependent transport equations:

∂u/∂t + ∇·f̃(u) = q.    (4.3)
4.1 Preliminaries

Let Ω be the reservoir domain and ∂Ω its boundary. We subdivide Ω into the grid cells Ω_i, and the number of grid cells will be n_Ω. Let γ_i be the surface of Ω_i, and further subdivide the surface into the subsurfaces γ_{i_k}, where k = 1, …, n_i and n_i is the number of faces of Ω_i. Thus,

γ_i = ∂Ω_i,  γ_i = ∪_{k=1}^{n_i} γ_{i_k}.    (4.4)

We let V_i be the volume of Ω_i, A_i the surface area of γ_i, and A_{i_k} likewise the surface area of γ_{i_k}. x̃_i is the centre spatial point in Ω_i and x̃_{i_k} is the midpoint of γ_{i_k}.
After discretisation we obtain a system of residual equations

R_i = 0,  i = 1, …, n_Ω,    (4.5)

whose Taylor expansion about the current iterate is

R_i^{[t+1]} ≈ R_i^{[t]} + Σ_j (∂R_i/∂u_j) Δu_j,    (4.6)

with

Δu_i = u_i^{[t+1]} − u_i^{[t]},    (4.7)

in which [t] is the nonlinear iteration index. The goal is to obtain the unknowns such that R_i = 0 for all i, and therefore we set the left hand side of equation (4.6) equal to zero, and rearrange into the matrix form

Σ_j (∂R_i/∂u_j)^{[t]} Δu_j = −R_i^{[t]},  i = 1, …, n_Ω.    (4.8)
After solving equation (4.8), we update the residual and the Jacobian matrix, and keep on solving the system until the corrections Δu_i are sufficiently close to zero. Let us here note that most discretisations yield sparse Jacobian matrices, and then it is recommended to use a Krylov iteration technique, possibly preconditioned [25]. This combined technique often goes under the name of Newton-Krylov methods, and it ranks among the more popular nonlinear solution algorithms [161, 228].
c(∂u/∂t) − ∇·(λK∇u) = q.    (4.9)
Here the mobility is phase dependent,

λ = λ_α,  α = w, g, o,    (4.10)

while the temperature equation also has a ∇T potential. Here we just include one term for the sake of simplicity. We have also split the conductivity into an intrinsic media property K and a solution dependent mobility λ. This distinction is of some importance to certain numerical methods.
Now, there are roughly three schools for the solution of these types of equations, namely
Finite difference Replaces the derivatives by their finite difference approximations. Since we will use the method in an integral conservation law setting,
it is also coined the control volume or integrated finite difference method.
Finite element Uses a mathematically rigorous variational formulation, and like the finite volume method it applies to a wide variety of problems.
Mixed finite element An extension of the finite element method which simultaneously computes fluxes and potentials.
Let us here emphasise that in the discretisation we must honour the conservation law from which the equations were derived. In particular, the flux (λK∇u)·ñ must be continuous once it is discretised. Failure to do so implies that the amount of mass leaving from cell i to j may not equal the amount entering j from i.
Another aspect is the treatment of boundary conditions. Boundary conditions
can be used for coupling different spatial regions together, for instance a water
aquifer with an oil reservoir, a production/injection well with the reservoir, or
even the source rock with the carrier bed for secondary oil migration. The two
types of boundary conditions we consider are
Specified flux (Neumann) (λK∇u)·ñ is given on the boundary. For an inward flux,

(λK∇u)·ñ < 0.    (4.11)

Specified potential (Dirichlet) u is given on the boundary.

Each of the three discretisation methods has numerous subclasses of methods, and we shall briefly cover some of these, but we make no pretence of a comprehensive treatment; for that refer to [5] and the references therein.
Integrating equation (4.9) over a grid cell Ω_i and applying the divergence theorem gives

∫_{Ω_i} c(∂u/∂t) dV − ∮_{γ_i} (λK∇u)·ñ dS = ∫_{Ω_i} q dV.    (4.15)

Replacing the time derivative by finite differences, and using that our parameters are cell centred, yields

V_i c_i (u_i^{n+1} − u_i^n)/Δt − ∫_{γ_i} (λK∇u)·ñ dS = V_i q_i.    (4.16)

Splitting the surface integral over the faces,

V_i c_i (u_i^{n+1} − u_i^n)/Δt − Σ_{k=1}^{n_i} ∫_{γ_{i_k}} (λK∇u)·ñ dS = V_i q_i.    (4.17)

Let the set of control volumes contributing to the flux integral across γ_{i_k} be given by the set M_{i_k}:

M_{i_k} = {i, i_k, …}.    (4.18)
The flux integral across γ_{i_k} is approximated by the linear expression

∫_{γ_{i_k}} (λK∇u)·ñ dS ≈ −λ_{i_k} Σ_{j∈M_{i_k}} t_{ij} u_j.    (4.19)

The values t_{ij} are the transmissibilities for the conductivity tensor K, and we will show how these can be computed shortly. To ensure stability, the mobility is evaluated upstream on γ_{i_k}, giving λ_{i_k}. This upstream value is defined by

λ_{i_k} = λ_i,  if Σ_{j∈M_{i_k}} t_{ij} u_j < 0,
λ_{i_k} = λ_{i_k},  if Σ_{j∈M_{i_k}} t_{ij} u_j > 0.    (4.20)

An alternative is the saturation weighted average

λ_{i_k} = ((λS)_i + (λS)_{i_k})/(S_i + S_{i_k}).    (4.22)

Note that the upstream weighting causes the fluid to flow preferably along grid directions [112, Chapter 1]. Substituting equation (4.19) into (4.17) yields
V_i c_i (u_i^{n+1} − u_i^n)/Δt + Σ_{k=1}^{n_i} λ_{i_k} Σ_{j∈M_{i_k}} t_{ij} u_j^{n+1} = V_i q_i.    (4.23)
It is conventional to evaluate the flux potential at the next (unknown) time step
for optimal stability since the elliptic operator has infinite propagation speed.
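Upstream evaluation of the mobility is also what keeps explicit transport schemes stable. A one-dimensional sketch for the Buckley-Leverett equation (2.62) with unit total velocity and φ = 1, using first-order upwinding in space and forward Euler in time (grid size and CFL number are illustrative choices):

```python
def upwind_step(s, f, dt_dx):
    """One forward-Euler step of s_t + f(s)_x = 0 with first-order upwinding.

    Flow is assumed left-to-right, so each cell face flux is taken from the
    upstream (left) cell.
    """
    flux = [f(v) for v in s]
    out = s[:]
    for i in range(1, len(s)):
        out[i] = s[i] - dt_dx * (flux[i] - flux[i - 1])
    return out

# Water injection into a dry core: s = 1 at the inlet, 0 elsewhere.
f = lambda s: s * s / (s * s + (1.0 - s) ** 2)   # fractional flow, eq. (2.54)
s = [1.0] + [0.0] * 99
for _ in range(40):
    s = upwind_step(s, f, dt_dx=0.4)
```

Under the CFL restriction dt/dx·max|f′| ≤ 1 the scheme is monotone, so the initially monotone saturation profile stays monotone and bounded in [0, 1] as the front advances.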
Two point flux approximation

The two point flux approximation retains only the two cells sharing a face. The flux integral is split as

∫_{γ_{i_k}} λK∇u·ñ dS    (4.25)
≈ λ_{i_k} ∫_{γ_{i_k}} (ñ·K∇u) dS.    (4.26)
Assuming the potential varies linearly between the cell centre x̃_i and the face midpoint x̃_{i_k}, the flux from cell i towards the face is

f_{i_k} ≈ A_{i_k} ‖K_i ñ‖ (ū_{i_k} − u_i)/‖x̃_{i_k} − x̃_i‖,    (4.27)–(4.28)

or, solved for the potential drop,

ū_{i_k} − u_i = f_{i_k}/(A_{i_k} T_{i,i_k}),    (4.29)–(4.30)

where the half transmissibility is

T_{i,i_k} = ‖K_i ñ‖/‖x̃_{i_k} − x̃_i‖.    (4.31)

Writing the analogous expression from the neighbouring cell i_k and eliminating the face potential ū_{i_k} by flux continuity gives the two point transmissibility

t_{i,i_k} = A_{i_k} (1/T_{i,i_k} + 1/T_{i_k,i})^{−1} = t_{i_k,i},    (4.32)–(4.34)

a harmonic average of the half transmissibilities.
The two point flux approximation is only valid when $K\vec n$ is orthogonal to the other normal vectors of the same control volume; i.e., the grid is K-orthogonal [8]. Skewed grids or a non-diagonal tensor $K$ are typical violations of this. By coupling several cell potentials into each face flux, the multi point flux approximation remains consistent on such grids.
Each face is subdivided, and the flux across a sub face $\gamma_{i_k}$ is approximated using the potentials of the surrounding cells in an interaction region,

$$f_{i_k} = \frac{A_{i_k}}{2}\sum_m \Phi_m\,\vec n_{i_k}\cdot K\nabla\phi_m. \quad (4.37)$$
To make the discussion concrete, we consider the O-method in 2D, as shown in
figure 4.2. We wish to find the fluxes across the four half edges included in the
given interaction region, and the potential will be assumed to be linear in each
cell. Consider the lower left cell in the figure, where we represent by
= 1 1 + 1 2 + 4 3 ,
(4.38)
in which 1 equals one in the cell centre and zero in the two mid points, 2 equals
one at 1 and zero in the two other points, and 3 equals one at 4 and it is zero
in the two other points. We do likewise on the other cells. As for the two-point
flux, we wish to eliminate the midpoint potentials by assuming flux continuity.
The half edge flux from cell i to cell j is

$$f_{i,j} = \frac{A_{i,j}}{2}\sum_m \Phi_m\,\vec n_{i,j}\cdot K_i\nabla\phi_m, \qquad i\neq j. \quad (4.39)$$

Flux continuity across each half edge requires

$$f_{i,j} = -f_{j,i}, \qquad i\neq j. \quad (4.40)$$

For the four half edges of the interaction region this gives

$$f_{1,2} = -f_{2,1}, \quad (4.41)$$
$$f_{2,3} = -f_{3,2}, \quad (4.42)$$
$$f_{3,4} = -f_{4,3}, \quad (4.43)$$
$$f_{4,1} = -f_{1,4}. \quad (4.44)$$
The unknowns here are both the cell centred potentials and the midpoint potentials. We can write the left hand side of these equations in the matrix form

$$\vec f = C\vec{\bar\Phi} - D\vec\Phi, \quad (4.45)$$

where $\vec{\bar\Phi}$ is the vector of midpoint potentials and $\vec\Phi$ is the vector of cell centred potentials. By the flux continuity constraints (4.40), we also have

$$A\vec{\bar\Phi} = B\vec\Phi. \quad (4.46)$$

This equation can be solved for the midpoint potentials, yielding the half edge fluxes

$$\vec f = \underbrace{\left(CA^{-1}B - D\right)}_{T}\vec\Phi. \quad (4.47)$$
Each row in the matrix T corresponds to transmissibilities for each half edge flux in the interaction region. Two interaction regions must be used to construct the whole edge flux transmissibilities in 2D, and four in 3D.
While we have just presented the O-method type of interaction region, other types of interaction regions must be used for highly skewed grids (the Z-method [206]), and special care must be taken for faulted grids (O-method for faults [102, 11]). Furthermore, computationally cheaper multi point flux methods which couple fewer cells in 3D are also available (the U-method [9, 10]). A recent overview can be found in [7].
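The elimination step in (4.47) is a small dense linear solve per interaction region; a minimal NumPy sketch using the matrix names from the text:

```python
import numpy as np

def mpfa_transmissibilities(A, B, C, D):
    """Eliminate the midpoint potentials: T = C A^{-1} B - D, cf. (4.47).
    Each row of T gives the transmissibilities of one half edge flux."""
    return C @ np.linalg.solve(A, B) - D
```

Because the interaction regions are local, these solves are tiny and embarrassingly parallel over the grid.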
Boundary conditions
For a Neumann type boundary condition we know the flux on the boundary face $\gamma_{i_k}$. Then in equation (4.19) we insert this specified flux rather than using transmissibilities. The mobilities are still evaluated upstream: for an outward flux we use the inner mobility, but for an inward flux a boundary mobility must be supplied.
However, for a Dirichlet boundary condition with specified boundary potential
, we must modify the transmissibility calculation slightly. In the preceding we
have always eliminated the midpoint potentials to arrive at an expression for the
transmissibilities, but since the boundary midpoint potentials are now known, we
can instead insert these into the flux continuity constraints directly, see figure 4.3.
Figure 4.3: Handling of Dirichlet boundary conditions for the multi point O-method. $\bar\Phi_2$ and $\bar\Phi_3$ are known potentials, and need not be eliminated by flux continuity assumptions.
In the finite element method the unknown is represented in a set of basis functions,

$$u = \sum_j u_j\phi_j, \quad (4.48)$$

where $u_j$ are (unknown) scalars and $\phi_j$ are (known) basis functions. We shall return to the nature of these basis functions later.
To derive the formulation, multiply equation (4.9) by $\phi_i$ and integrate over the whole domain, giving

$$\int c\frac{\partial u}{\partial t}\phi_i\,dV - \int \nabla\cdot(\lambda K\nabla\Phi)\,\phi_i\,dV = \int q\,\phi_i\,dV. \quad (4.49)$$

Next, we replace the time differential by finite differences and perform integration by parts on the flux term:

$$\int c\frac{u^{n+1}-u^n}{\Delta t}\phi_i\,dV + \int \lambda K\nabla\Phi\cdot\nabla\phi_i\,dV - \int (\lambda K\nabla\Phi\cdot\vec n)\,\phi_i\,dS = \int q\,\phi_i\,dV. \quad (4.50)$$
Rearranging, we arrive at

$$\int \frac{c}{\Delta t}u^{n+1}\phi_i\,dV + \int \lambda K\nabla\Phi\cdot\nabla\phi_i\,dV = \int \frac{c}{\Delta t}u^n\phi_i\,dV + \int q\,\phi_i\,dV + \int (\lambda K\nabla\Phi\cdot\vec n)\,\phi_i\,dS. \quad (4.51)$$
Inserting the discrete representation (4.48) for u yields the residual equation for
Figure 4.4: Linear hat basis functions in the one dimensional case. These satisfy equation (4.55) in each cell with the (internal) boundary conditions $\phi_i(x_j) = \delta_{ij}$.
node i:

$$R_i \stackrel{\text{def}}{=} \sum_j u_j^{n+1}\int\frac{c}{\Delta t}\phi_j\phi_i\,dV + \int\lambda K\nabla\Phi\cdot\nabla\phi_i\,dV - \sum_j u_j^n\int\frac{c}{\Delta t}\phi_j\phi_i\,dV - \int q\,\phi_i\,dV - \int(\lambda K\nabla\Phi\cdot\vec n)\,\phi_i\,dS. \quad (4.52)$$
The flux potential is represented in the same basis,

$$\Phi = \sum_j\Phi_j\phi_j. \quad (4.53)$$
If the pressure is the unknown that is being solved for, it would now be easy to
handle it implicitly.
Basis functions
All that remains is the computation of the integrals in the residual equation Ri .
To do this, we must specify the basis functions i so that the integrals can be calculated. There are many different choices [46, 47, 281], so to limit the exposition
we will only discuss linear finite elements and the multiscale variant.
The linear finite elements, also called hat functions, are the weak (variational) solutions of the elliptic boundary value problem

$$\nabla\cdot(K\nabla\phi_i) = 0 \quad (4.55)$$

with the (internal) boundary conditions $\phi_i(\vec x_j) = \delta_{ij}$ and consistent (linear) boundary conditions on the boundaries of each cell. See figure 4.4 for an illustration.
Figure 4.5: Oversampling for the construction of the multiscale basis functions. First, equation (4.56) is solved to find $\bar\phi$ in the oversampled cell (dashed). Then we restrict and rescale to the actual cell (solid) to find the basis functions $\phi$, which capture subscale information, but are free of numerical boundary layers. The symbols in the corners indicate where the different basis functions equal one.
These basis functions are simple to construct, and they have integrable differentials, which is what is necessary to carry out the integration in the residual
equation (4.52). Depending on the geometry of the cells, the integrals can be
carried out analytically rather than numerically.
However, the numerical simulation grid is usually constructed by upscaling a much larger geological grid, and the upscaling transforms the fine scale absolute permeability $K_f$ into the coarse scale tensor $K$, losing accuracy in the process. The multiscale finite element method retains the fine scale information by instead computing the basis functions from

$$\nabla\cdot(K_f\nabla\phi_i) = 0. \quad (4.56)$$
As shown in [146, 147], this method correctly captures fine scale phenomena, and its convergence was proven for periodic fine scale oscillations. But it was also demonstrated that using the same boundary conditions in the multiscale case as in the linear case gave rise to an oscillatory boundary layer in the basis functions.
Since the thickness of the boundary layer is proportional to the frequency of
the subscale variations, it was proposed to use an oversampling technique which
computes the basis functions on slightly enlarged elements, and then extracts oscillation free multiscale basis functions. Consider the two dimensional case in
figure 4.5. We there first compute the multiscale basis functions j on the enlarged (dashed) cell, then we wish to restrict to the actual cell by
$$\phi_i = \sum_j c_{ij}\bar\phi_j, \quad (4.57)$$

where $c_{ij}$ are scalars chosen such that $\phi_i(\vec x_j) = \delta_{ij}$. In practice, this means solving the above linear system where the two sets of basis functions are evaluated in all the nodes of the actual cell.
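Determining the restriction coefficients is a small linear solve per cell; a sketch under the assumption that the oversampled basis functions can be evaluated in the cell nodes (the matrix layout is a hypothetical choice):

```python
import numpy as np

def restriction_coefficients(M):
    """Given M[k, j] = oversampled basis function k evaluated at node x_j of
    the actual cell, return C with phi_i = sum_k C[i, k] * phibar_k such
    that phi_i(x_j) = delta_ij, i.e. C @ M = I (cf. (4.57))."""
    n = M.shape[0]
    return np.linalg.solve(M.T, np.eye(n)).T
```

The solve is well posed whenever the oversampled functions remain linearly independent on the nodes of the actual cell.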
Boundary conditions
For Neumann conditions we merely substitute the specified flux $(\lambda K\nabla\Phi)\cdot\vec n$ into the boundary integral of equation (4.52), and evaluate this integral using either the inner mobility for an outward flux, or a specified outer mobility for an inward flux.
In the case of Dirichlet boundary conditions for the potential $\Phi$, we fix its coefficients from equation (4.53), so that if $\phi_i$ is nonzero on a Dirichlet boundary, $\Phi_i$ is set equal to the boundary potential there. As a result, the above integral makes no contribution to either the residual or the Jacobian matrix of equation (4.8), and we can therefore omit its evaluation.
Flux recovery
A problem with the finite element method is that it does not guarantee that the fluxes are continuous across the cell boundaries. Using the representation of $\Phi$ from equation (4.53), the flux is

$$\vec F = -\lambda K\nabla\Phi = -\sum_j\Phi_j\,\lambda K\nabla\phi_j. \quad (4.59)$$
The finite element method only guarantees continuity in $\Phi$, but not in the flux $\vec F$. Let us therefore try to find another representation for the flux, and project the computed flux onto this new representation space. Henceforth, we introduce a new set of continuous vector basis functions $\vec\psi_j$, and represent the flux by

$$-\lambda K\nabla\Phi = \vec F = \sum_j F_j\vec\psi_j. \quad (4.60)$$
The vector basis functions weakly satisfy the elliptic equation

$$\nabla\cdot\vec\psi_i = \frac{1}{V_i}, \quad (4.61)$$

with the boundary conditions

$$\vec\psi_j\cdot\vec n = \frac{\delta_{ij}}{A_{i,j}}. \quad (4.62)$$
The $L^2$ projection of the discontinuous flux onto the function space spanned by the functions $\vec\psi_j$ is

$$\int\vec F\cdot\vec\psi_i\,dV = -\int(\lambda K\nabla\Phi)\cdot\vec\psi_i\,dV, \qquad\forall i. \quad (4.63)$$
Inserting our representation for the unknown continuous flux $\vec F$ and the known potential $\Phi$ gives

$$\sum_j F_j\int\vec\psi_j\cdot\vec\psi_i\,dV = -\sum_j\Phi_j\int\lambda K\nabla\phi_j\cdot\vec\psi_i\,dV, \qquad\forall i. \quad (4.64)$$
This is a set of linear equations in the Fj coefficients, and once solved gives a continuous, conservative flux field. If our original problem has Neumann boundary
conditions, then we should fix the Fj values on the boundary to the specified flux.
A few remarks are in order here. There are many other ways to choose the vector basis functions $\vec\psi_j$. For instance, the use of higher order polynomials on coarser meshes can give much higher accuracy [138] for smooth problems, a phenomenon known as superconvergence. But for non-smooth and nearly singular problems, this may not be the case.
The continuous flux recovery need not be done globally; it can also be done locally, using just a few $\vec\psi_j$ functions at a time. While this is computationally cheaper, the accuracy can also be lower.
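Once assembled, the projection system (4.64) is a linear solve for the coefficients $F_j$; a hypothetical sketch, with the mass matrix and right hand side assumed pre-assembled, and boundary fluxes optionally fixed as the text prescribes for Neumann problems:

```python
import numpy as np

def recover_flux(mass, rhs, fixed=None):
    """Solve mass @ F = rhs for the flux coefficients of (4.60), cf. (4.64).
    fixed: optional {index: value} map fixing boundary F_j to specified
    Neumann fluxes before solving."""
    mass = mass.astype(float).copy()
    rhs = np.asarray(rhs, dtype=float).copy()
    if fixed:
        for j, val in fixed.items():
            mass[j, :] = 0.0     # overwrite the row with a trivial equation
            mass[j, j] = 1.0
            rhs[j] = val
    return np.linalg.solve(mass, rhs)
```

The mass matrix is symmetric positive definite, so a Cholesky or conjugate gradient solver would be the natural choice in a real code.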
In the mixed formulation, the parabolic equation is written as the first order system

$$c\frac{\partial u}{\partial t} + \nabla\cdot\vec F = q, \quad (4.65)$$

$$\vec F = -\lambda K\nabla\Phi. \quad (4.66)$$
We shall now use two sets of basis functions. First we have $\phi_i$ for representing $u$, and secondly we have $\vec\psi_i$ which will be used to represent the flux. Multiplying the first equation by $\phi_i$ and integrating gives

$$\int\frac{c}{\Delta t}u^{n+1}\phi_i\,dV + \int\nabla\cdot\vec F\,\phi_i\,dV = \int\frac{c}{\Delta t}u^n\phi_i\,dV + \int q\,\phi_i\,dV. \quad (4.67)$$
The Darcy equation (4.66) is multiplied by $\vec\psi_i$ and $(\lambda K)^{-1}$ and integrated:

$$\int(\lambda K)^{-1}\vec F\cdot\vec\psi_i\,dV = -\int\nabla\Phi\cdot\vec\psi_i\,dV \quad (4.68)$$

$$= \int\Phi\,\nabla\cdot\vec\psi_i\,dV - \int\Phi\,\vec\psi_i\cdot\vec n\,dS. \quad (4.69)$$
As before, the unknowns are expanded in the bases,

$$u = \sum_j u_j\phi_j, \quad (4.70)$$

$$\vec F = \sum_j F_j\vec\psi_j. \quad (4.71)$$
This yields the residual equations

$$R_i \stackrel{\text{def}}{=} \sum_j u_j^{n+1}\int\frac{c}{\Delta t}\phi_j\phi_i\,dV + \sum_j F_j\int\nabla\cdot\vec\psi_j\,\phi_i\,dV - \sum_j u_j^n\int\frac{c}{\Delta t}\phi_j\phi_i\,dV - \int q\,\phi_i\,dV, \quad (4.72)$$

$$R_i^F \stackrel{\text{def}}{=} \sum_j F_j\int(\lambda K)^{-1}\vec\psi_j\cdot\vec\psi_i\,dV - \int\Phi\,\nabla\cdot\vec\psi_i\,dV + \int\Phi\,\vec\psi_i\cdot\vec n\,dS. \quad (4.73)$$
Let us here remark that there are no derivatives on $\Phi$ anymore, so for the Darcy flux case we simply use

$$\Phi = p + \rho h. \quad (4.74)$$
While the mixed finite element method can guarantee continuous fluxes, it does so by the introduction of additional variables, and the resulting system is much harder to solve. In fact, the problem can be considered a saddle point type optimisation problem, hence the formulation's name. There exist many customised iterative methods for solving the discrete saddle point equations; see [34] for a recent overview. Despite these techniques, the problem is still more computationally demanding than the other discretisations discussed herein.
Basis functions
The residual equations have no derivatives on the $\phi_j$ basis functions, and consequently the simplest choice is the piecewise constants $\phi_j|_{\Omega_i} = \delta_{ij}$. For the vector basis functions
$\vec\psi_j$ we can use the same type as for the continuous flux recovery in the finite element method. Notice that with this choice, we can integrate the divergence of $\vec\psi_j$ exactly. For the multiscale variant, the basis functions are computed from the fine scale permeability,

$$\nabla\cdot\vec\psi_j = \frac{1}{V_i}, \qquad \vec\psi_j = -K_f\nabla\varphi_i, \quad (4.75)$$

with the boundary condition

$$\vec\psi_j\cdot\vec n = \frac{\delta_{ij}}{A_{i,j}}. \quad (4.76)$$

This must now be solved numerically, for instance by using the Raviart-Thomas mixed finite element method.
As for the multiscale finite element case, oversampling should be used to remove oscillations from the basis functions. We first compute the basis functions $\vec{\bar\psi}_j$ on an oversampled domain, and then restrict to the actual cell by

$$\vec\psi_i = \sum_j c_{ij}\vec{\bar\psi}_j. \quad (4.77)$$

The $c_{ij}$ are chosen such that the boundary condition (4.76) holds. However, we cannot in general enforce this pointwise along the whole cell boundary. Instead we can let the average value of $\vec\psi_j\cdot\vec n$ on $\gamma_{i,j}$ equal $1/A_{i,j}$, and zero on the other surfaces.
Boundary conditions
Dirichlet boundary conditions have a prescribed potential, which means that the boundary integral in equation (4.73),

$$\int\Phi\,\vec\psi_i\cdot\vec n\,dS, \quad (4.78)$$

can be evaluated directly. For Neumann conditions, the boundary flux coefficients $F_j$ are instead fixed to the specified values.

In the discontinuous Galerkin method the basis functions are supported on single elements, and the weak Darcy law is posed elementwise,

$$\int_{\Omega_k}(\lambda K)^{-1}\vec F\cdot\vec\psi_i\,dV = -\int_{\Omega_k}\nabla\Phi\cdot\vec\psi_i\,dV. \quad (4.80)$$
Since elementwise smoothness is satisfied, integration by parts can be applied, but the lack of inter element continuity yields a slightly different set of equations than earlier:

$$\int\frac{c}{\Delta t}u^{n+1}\phi_i\,dV - \int\vec F\cdot\nabla\phi_i\,dV = \int\frac{c}{\Delta t}u^n\phi_i\,dV + \int q\,\phi_i\,dV - \sum_k\sum_{l=1}^{n_k}\int_{\gamma_{kl}}\vec F\cdot\vec n\,\phi_i\,dS, \quad (4.81)$$

$$\int(\lambda K)^{-1}\vec F\cdot\vec\psi_i\,dV = \int\Phi\,\nabla\cdot\vec\psi_i\,dV - \sum_k\sum_{l=1}^{n_k}\int_{\gamma_{kl}}\Phi\,(\vec\psi_i\cdot\vec n)\,dS. \quad (4.82)$$
The summation over all the boundary segments $\gamma_{kl}$ has appeared now because there is no explicit inter element continuity anymore. For the determination of $\vec F$ and $\Phi$ on $\gamma_{kl}$, the following averaging notations are introduced:

$$[\![\Phi]\!]_{k,k_l} \stackrel{\text{def}}{=} \Phi_k\vec n_k + \Phi_{k_l}\vec n_{k_l}, \quad (4.83)$$
$$\{\vec F\}_{k,k_l} \stackrel{\text{def}}{=} \frac{1}{2}\left(\vec F_k + \vec F_{k_l}\right), \quad (4.84)$$
$$\{\Phi\}_{k,k_l} \stackrel{\text{def}}{=} \frac{1}{2}\left(\Phi_k + \Phi_{k_l}\right), \quad (4.85)$$
$$[\![\vec F]\!]_{k,k_l} \stackrel{\text{def}}{=} \vec F_k\cdot\vec n_k + \vec F_{k_l}\cdot\vec n_{k_l}. \quad (4.86)$$
The simplest choice is to use the averages directly:

$$\Phi|_{\gamma_{kl}} = \{\Phi\}_{k,k_l}, \quad (4.87)$$
$$\vec F|_{\gamma_{kl}} = \{\vec F\}_{k,k_l}. \quad (4.88)$$

However, for certain grids this method produces singular Jacobian matrices. A stabilised method [48] is

$$\Phi|_{\gamma_{kl}} = \{\Phi\}_{k,k_l}, \quad (4.89)$$
$$\vec F|_{\gamma_{kl}} = \{\vec F\}_{k,k_l} - \eta\,[\![\Phi]\!]_{k,k_l}, \quad (4.90)$$

in which $\eta > 0$ is a penalty term which is problem and grid dependent. Finally, we mention the somewhat popular local discontinuous Galerkin (LDG) methods, which have

$$\Phi|_{\gamma_{kl}} = \{\Phi\}_{k,k_l} - \vec\beta\cdot[\![\Phi]\!]_{k,k_l}, \quad (4.91)$$
$$\vec F|_{\gamma_{kl}} = \{\vec F\}_{k,k_l} + \vec\beta\,[\![\vec F]\!]_{k,k_l} - \eta\,[\![\Phi]\!]_{k,k_l}. \quad (4.92)$$
Within each element, the solution is represented as

$$u|_{\Omega_i} = \sum_j u_{i,j}\phi_j. \quad (4.93)$$
Multiple fluxes
For the multiphase equations, multiple fluxes need to be computed, and for the sake
of conservation, these fluxes must be continuous. We can handle this in a mixed
finite element formulation by ensuring that just the total velocity is continuous
[65, 113].
Consider the idealised multiphase pressure equation:

$$c\frac{\partial p_w}{\partial t} + \nabla\cdot\left(\sum_\alpha\vec u_\alpha\right) = q. \quad (4.94)$$

Introducing the total velocity $\vec u_T$, this becomes

$$c\frac{\partial p_w}{\partial t} + \nabla\cdot\vec u_T = q, \quad (4.95)$$

$$K^{-1}\vec u_T = K^{-1}\sum_\alpha\vec u_\alpha = -\sum_\alpha\lambda_\alpha\left(\nabla p_w + \nabla p_{c,\alpha} + \rho_\alpha\nabla h\right). \quad (4.96)$$

The individual phase velocities are then recovered from the total velocity, which in weak form involves integrals of the type

$$\int\left(f_\alpha\vec u_T + \lambda_\alpha K\nabla h\sum_\beta f_\beta(\rho_\beta - \rho_\alpha) + f_\alpha\sum_\beta\lambda_\beta K(\nabla p_{c,\beta} - \nabla p_{c,\alpha})\right)\cdot\vec\psi_i\,dV. \quad (4.97)$$
The difficulty here is the representation and computation of the $K\nabla h$ and $K\nabla p_c$ terms. A way around this is to pose a Darcy law for each phase velocity; the equations are instead
$$c\frac{\partial p_w}{\partial t} + \nabla\cdot\left(\sum_\alpha\vec u_\alpha\right) = q, \quad (4.98)$$

$$K^{-1}\vec u_n = -\lambda_n\left(\nabla p_w + \nabla p_c^n + \rho_n\nabla h\right), \quad (4.99)$$

$$K^{-1}\vec u_w = -\lambda_w\left(\nabla p_w + \rho_w\nabla h\right). \quad (4.100)$$
Of course, the mixed formulation of this system gives a much larger number of
unknowns and it can thus be harder to solve.
4.2.4 Comparisons
Let us conclude this section by offering some comparative remarks on the three
distinct discretisation methods presented.
Finite differences

Continuity in all fluxes and explicit representations for all terms of the form $\lambda K\nabla\Phi$ are enforced. However, the flux is only known on the cell surfaces, not in the cell interior. This is unfortunate for some hyperbolic solvers which rely on a global flux representation.

Convergence has only been proven for some classes of problems (homogeneous conductivity on certain regular grids [165]).

The use of upstream mobility evaluation causes the computed solution to be grid orientation dependent.

Upstream mobility renders the Jacobian matrices unsymmetric, so they cannot be solved by the fastest available iterative methods.
Finite elements

Mixed finite elements

Solves both for the fluxes and the potentials, resulting in high accuracy and global representations for both.
Similar convergence theory as for the finite element methods exists, but it is
not as comprehensive.
No grid orientation effects for the flux.
The introduction of additional unknowns gives a much larger system to
solve. Furthermore, the system matrix is not positive definite.
4.3 Hyperbolic problems
We now turn to the second major topic of this chapter, namely the hyperbolic formulation of the conservation laws and the subsequent numerical discretisation. While the discretisation methods of the preceding section can be used for all of our conservation equations, they miss the advection driven nature of the flow process, and can produce numerical artifacts such as oscillations or excessive smearing. The saturation, mass and temperature equations have advective fluxes primarily determined by the Darcy velocities, and we shall exploit this to derive some different advection-diffusion formulations. Based on these formulations, we present two common discretisation methods:
Finite volumes Similar to the control volume method for parabolic equations,
this one computes discrete fluxes across cell interfaces and discretely satisfies the integral conservation law.
Characteristic methods Follow the Lagrangian nature of the advective transport, which allows long time steps and avoids grid orientation issues.
We shall not make use of much of the theory of hyperbolic conservation laws, but it is necessary to recall the conditions for the existence of a unique solution to the pure hyperbolic problem (4.3) [6, 145, 175, 174, 159]:

$$\frac{\partial u}{\partial t} + \nabla\cdot(f(u)\vec v) = 0, \quad (4.101)$$

where $\vec v$ is a vector independent of $u$.
A discontinuity between a left state $u_L$ and a right state $u_R$ propagates with the shock speed $\sigma$ given by the Rankine-Hugoniot condition

$$\sigma = \frac{f(u_L) - f(u_R)}{u_L - u_R}. \quad (4.102)$$

For uniqueness, the shock must in addition satisfy the Olejnik entropy condition

$$\frac{f(u) - f(u_L)}{u - u_L} \geq \sigma \geq \frac{f(u) - f(u_R)}{u - u_R}, \qquad \forall u\in(u_L, u_R). \quad (4.103)$$
and we shall look into different ways to use the Darcy phase velocity in this flux. Fluxes in the saturation or temperature equations are similar, albeit usually somewhat simpler. The goal is to arrive at equations of the advection-diffusion form

$$c\frac{\partial u}{\partial t} + \nabla\cdot(f(u)\vec v) - \nabla\cdot(K\nabla\Phi) = q, \quad (4.105)$$

and to do that, we must identify $f$ and $\vec v$. Notice that, as earlier, we can and do have multiple advective and diffusive fluxes.
Phase velocity
The very simplest formulation assumes no change in the Darcy velocity u~ while
solving the flow equations. Then each component will have as many advective
directions as there are phases. While this neglects the dependency of u~ on the
changing phase state, it is so easy to apply that the formulation is commonly found
in IMPES (IMplicit Pressure, Explicit Saturations) type simulators.
Total velocity

We can instead use the total velocity formulation presented by equation (2.77). Its assumption is that the total velocity changes more slowly than the phase velocities, and it leads to the component fluxes

$$\sum_\alpha C_\alpha\vec u_\alpha = f_T\vec u_T + f_h\vec v_h - \underbrace{\sum_\alpha C_\alpha f_\alpha\sum_\beta\lambda_\beta K\left(\nabla p_{c,\alpha} - \nabla p_{c,\beta}\right)}_{\text{diffusive}}, \quad (4.106)$$

where

$$f_T = \sum_\alpha C_\alpha f_\alpha, \quad (4.108)$$

$$f_h = \sum_\alpha C_\alpha f_\alpha\sum_\beta\lambda_\beta(\rho_\beta - \rho_\alpha), \quad (4.109)$$

and

$$\vec v_h = K\nabla h. \quad (4.110)$$
We now see that there are just two advective directions: one for the total velocity, and another for the gravity direction. Furthermore, the capillary forces now constitute a diffusive force, just as was the case for the pure parabolic formulation. Moreover, as demonstrated in [246] and [215, Chapter 6], the diffusive forces are better behaved numerically, since the possibly large derivatives in the capillary pressure curves are damped by the product of the phase mobilities and the fractional flow function $f_\alpha\in[0, 1]$.
Constant pressure

A more pragmatic hyperbolic formulation can be had by assuming that only the water phase pressure is constant, allowing the rest to vary. To formulate it, we rewrite the Darcy phase velocity as

$$\vec u_\alpha = -\lambda_\alpha K\left(\nabla p_\alpha + \rho_\alpha\nabla h\right) \quad (4.111)$$

$$= -\lambda_\alpha K\left(\nabla p_w + \nabla p_{c,\alpha} + \rho_\alpha\nabla h\right) \quad (4.112)$$

$$= -\lambda_\alpha K\nabla p_w - \lambda_\alpha\rho_\alpha K\nabla h - \lambda_\alpha K\nabla p_{c,\alpha}. \quad (4.113)$$
Defining

$$\vec v_{p_w} = -K\nabla p_w, \quad (4.114)$$

we find that

$$\vec u_\alpha = \lambda_\alpha\vec v_{p_w} - \lambda_\alpha\rho_\alpha\vec v_h - \lambda_\alpha K\nabla p_{c,\alpha}. \quad (4.115)$$
This is then substituted into the advective compositional flux term, yielding

$$\sum_\alpha C_\alpha\vec u_\alpha = \underbrace{f_p\vec v_{p_w} - f_h\vec v_h}_{\text{advective}} - \underbrace{\sum_\alpha C_\alpha\lambda_\alpha K\nabla p_{c,\alpha}}_{\text{diffusive}}, \quad (4.116)$$

where

$$f_p = \sum_\alpha\lambda_\alpha C_\alpha, \quad (4.117)$$

$$f_h = \sum_\alpha\lambda_\alpha\rho_\alpha C_\alpha. \quad (4.118)$$
This is as simple as the total velocity formulation, with the caveat that the capillary
diffusive forces are not multiplied by as small a factor. This can in turn give
numerical difficulties.
Integrating equation (4.105) over a control volume $\Omega_i$ gives

$$\int_{\Omega_i} c\frac{\partial u}{\partial t}\,dV + \int_{\partial\Omega_i}(f\vec v - K\nabla\Phi)\cdot\vec n\,dS = \int_{\Omega_i} q\,dV. \quad (4.119)$$

Replacing the time derivative by a finite difference approximation, and integrating from $t = t^n$ to $t = t^{n+1}$, yields

$$\int_{\Omega_i} cu^{n+1}\,dV - \int_{\Omega_i} cu^n\,dV + \int_{t^n}^{t^{n+1}}\!\!\int_{\partial\Omega_i}(f\vec v - K\nabla\Phi)\cdot\vec n\,dS\,dt = \int_{t^n}^{t^{n+1}}\!\!\int_{\Omega_i} q\,dV\,dt. \quad (4.120)$$
Dividing by $c_iV_i$ and introducing cell averages gives

$$u_i^{n+1} = u_i^n - \frac{1}{c_iV_i}\int_{t^n}^{t^{n+1}}\!\!\int_{\partial\Omega_i}(f\vec v - K\nabla\Phi)\cdot\vec n\,dS\,dt + \frac{1}{c_i}\int_{t^n}^{t^{n+1}} q_i\,dt. \quad (4.121)$$
Splitting the cell boundary into its faces yields

$$u_i^{n+1} = u_i^n - \frac{1}{c_iV_i}\int_{t^n}^{t^{n+1}}\sum_{k=1}^{n_i}\int_{\gamma_{i_k}}(f\vec v - K\nabla\Phi)\cdot\vec n\,dS\,dt + \frac{1}{c_i}\int_{t^n}^{t^{n+1}} q_i\,dt. \quad (4.122)$$
Defining

$$F_{i_k} = \frac{1}{c_iV_i}\int_{t^n}^{t^{n+1}}\!\!\int_{\gamma_{i_k}} f\,(\vec v\cdot\vec n)\,dS\,dt = \int_{t^n}^{t^{n+1}} g\,dt, \quad (4.123)$$
and using the finite difference approximation (4.19) for the diffusive flux $K\nabla\Phi$, we get

$$u_i^{n+1} + \frac{\Delta t}{c_iV_i}\sum_{k=1}^{n_i}\sum_{j\in M_{i_k}} t_{ij}\Phi_j^{n+1} = u_i^n - \sum_{k=1}^{n_i}F_{i_k} + \frac{\Delta t}{c_i}q_i. \quad (4.124)$$
We have here replaced the time integrals on the source term and the diffusive
flux term by first order finite differences. For maximum stability, these should be
treated implicitly. Equation (4.124) can then be cast into residual form and solved
by a Newton iteration as earlier.
Riemann problems
To finish the discretisation, it remains to compute the flux integral Fik from equation (4.123). Since the advective flux is local, we will mostly consider explicit
solution procedures based on solving Riemann problems. The Riemann problem
is the calculation of Fik where u = uL on one side of ik and u = uR on the other.
Thus we may consider $g$ in equation (4.123) to be a function

$$g = g(u_L, u_R), \quad (4.125)$$
and we must integrate it from $t = t^n$ to $t = t^{n+1}$. By satisfying the Rankine-Hugoniot (4.102) and Olejnik (4.103) conditions, we get a set of wavespeeds $\{\sigma_p\}$. Since we assume that $g$ depends only on the cells adjacent to $\gamma_{i_k}$, all waves emanating from $\gamma_{i_k}$ must be contained within $\Omega_i\cup\Omega_{i_k}$. This can be done in one of two ways:
g can be dependent on the solution state in other cells. The most extreme
case is an implicit solution technique which couples all cells together in
computing g.
We can restrict the time step t so that the waves do not propagate too far
in each step.
Figure 4.6: Illustration of the CFL limit (4.126). Three waves are emanating from $\gamma_{i_k}$. The fastest has the speed $\sigma_1$, and in this space-time plot, the point on the time axis where it intersects $\gamma_{i_{k_j}}$ gives the largest stable time step.
The first option will not be pursued here, so we shall instead formalise the second. Let the outward normal distance from $\gamma_{i_k}$ to the first cell boundary be $\Delta x$. Then to keep the wave propagation local, we must satisfy

$$\Delta t \leq \min_p\frac{\Delta x}{|\sigma_p|}. \quad (4.126)$$
This is known as the Courant-Friedrichs-Lewy (CFL) limit [81, 82], and by obeying it we can proceed to solve the Riemann problems.
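The CFL limit is trivial to evaluate once the wavespeeds are known; a minimal sketch (names are illustrative):

```python
def cfl_timestep(wave_speeds, dx):
    """Largest stable time step allowed by the CFL limit (4.126):
    dt <= min_p dx / |sigma_p|, skipping stationary waves."""
    return min(dx / abs(s) for s in wave_speeds if s != 0.0)
```

In a real simulator this minimum would be taken over all faces of the grid, with dx the local normal distance described above.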
Riemann solvers
There is a wide variety of available solvers for Riemann problems, see [257] for a
recent overview. Here we will just briefly mention some common ones:
Upwind Evaluates $g$ based on the local velocity field:

$$g = \begin{cases} g_i, & \vec v\cdot\vec n > 0,\\ g_{i_k}, & \vec v\cdot\vec n < 0, \end{cases} \qquad F_{i_k} = \Delta t\,g. \quad (4.127)$$
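The upwind evaluation takes only a few lines; the sketch below uses hypothetical names and the convention of (4.127):

```python
def upwind_flux(g_i, g_k, v_dot_n, dt):
    """Upwind Riemann flux, cf. (4.127): take g from the upwind side of
    the face and integrate it over the time step, F_ik = dt * g."""
    g = g_i if v_dot_n > 0.0 else g_k
    return dt * g
```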
High resolution This encompasses a class of methods trying to attain higher order accuracy in smooth regions. They are all based on the ReconstructSolve-Average (RSA) algorithm:
1. Reconstruct a higher order function u from the cell-centred u.
2. Solve the advective part of the equation by using u as initial data.
3. Average the computed u on each cell to find u at next time step.
The reconstruction step must be careful to avoid the introduction of oscillations near discontinuities of the solution; consequently, minimum modulus (minmod) type reconstructions into linear or quadratic polynomials are common. When solving with this higher order u data, approximate Riemann solvers are almost always used.
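A minmod limited linear reconstruction, as used in the Reconstruct step, can be sketched in 1D (uniform grid assumed; helper names are illustrative):

```python
def minmod(a, b):
    """Minmod limiter: zero at extrema, otherwise the smaller slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def limited_slopes(u, dx):
    """Limited cell slopes for a piecewise linear reconstruction; boundary
    cells are left with zero slope for simplicity."""
    s = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        s[i] = minmod((u[i] - u[i - 1]) / dx, (u[i + 1] - u[i]) / dx)
    return s
```

Zeroing the slope at local extrema is what prevents the reconstruction from creating new oscillations.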
Central schemes The central schemes are a class of methods combining the simplicity of the upwind scheme with the accuracy of the high resolution methods. The simplest of these is the first order Lax-Friedrichs method [173], but more recently we have the second order Nessyahu-Tadmor scheme [201] and further generalisations [168].
Boundary conditions
Flux boundary conditions are treated naturally by prescribing advective fluxes for $f\,(\vec v\cdot\vec n)$ on the boundary, and treating the diffusive fluxes as for the finite difference method.
Dirichlet type boundary conditions are treated by introducing a set of ghost
cells adjacent to the boundary. In these cells we fix the solution state, and then
the Riemann solver will naturally compute the correct advective boundary flux.
Notice that this may require the evaluation of f outside the boundary, depending
on which kind of Riemann solver is in use.
The characteristic methods are based on the material derivative

$$\frac{\partial u}{\partial\tau} = \frac{\partial u}{\partial t} + \vec v\cdot\nabla u, \quad (4.128)$$

where $\tau$ is the time along the characteristics with advective speed $\vec v$. Clearly we can expect that

$$\left\|\frac{\partial u}{\partial\tau}\right\| \ll \left\|\frac{\partial u}{\partial t}\right\| \quad (4.129)$$

holds for large $\|\vec v\|$, as demonstrated in [150]. Closely tracking the advective field
thus leads to small temporal truncation errors. Furthermore, since the tracking can
be done on an auxiliary streamline grid, the orientation of the Eulerian grid will
have a small effect on the characteristics themselves.
However, we must also consider that the process is driven by multiple fluxes.
Operator splitting is used to decompose a problem with several advective and
diffusive fluxes into one hyperbolic problem for each advective flux, and one collected parabolic problem for the diffusive fluxes. The hyperbolic problems will be
treated here, while a standard technique may be used for the parabolic problem.
To maintain the advantages of the characteristic methods, it may be preferable to
use a highly implicit finite element type discretisation to ensure long time steps
and no grid orientation dependencies.
With this in mind, we will present some classes of characteristic methods
[114], (arbitrarily) classified as:
Streamlines Decouples the multidimensional advection-diffusion process into
multiple 1D streamlines for advection and then a diffusion solver is applied.
Characteristics Splits the advection-diffusion equation into one linearised advection equation and a corrective advection-diffusion equation.
Adjoint characteristics Closely related to the characteristic methods, adjoint
methods are based on a variational framework where the test functions satisfy a dual linear characteristic equation.
Of course, a drawback in most characteristic methods is either a failure to conserve mass or a failure to satisfy the hyperbolic conditions (4.102)-(4.103). These
problems will be discussed for each method.
Streamlines
Perhaps the most popular of the characteristic methods, the streamline philosophy
is to decouple the advection into a set of independent one-dimensional hyperbolic
equations which can be solved efficiently and independently.
The advective flux is split from the diffusive one, giving an advection equation and a diffusion equation,

$$c\frac{\partial u}{\partial t} + \nabla\cdot(f(u)\vec v) = 0, \quad (4.130)$$

$$c\frac{\partial u}{\partial t} - \nabla\cdot(K\nabla\Phi) = q, \quad (4.131)$$

and the advective flux is expanded as

$$\nabla\cdot(f(u)\vec v) = \vec v\cdot\nabla f(u) + f(u)\,\nabla\cdot\vec v. \quad (4.132)$$

The first term gives the advective velocity $\vec V$, while the second term can be considered a source/sink term. If $\vec v$ is the total velocity, its divergence is zero for incompressible problems (confer equation (2.57)). Otherwise, we subtract $f\,\nabla\cdot\vec v$ from the right hand side of equation (4.131), so that equations (4.130)-(4.131) become

$$\frac{\partial u}{\partial t} + \vec V\cdot\nabla f(u) = 0, \qquad \vec V = \frac{1}{c}\vec v, \quad (4.133)$$

$$c\frac{\partial u}{\partial t} - \nabla\cdot(K\nabla\Phi) = q - f\,\nabla\cdot\vec v. \quad (4.134)$$
Each streamline is traced from the velocity field,

$$\frac{d\vec r}{d\tau} = \vec V, \quad (4.135)$$

and along a streamline the advection equation (4.133) reduces to the one-dimensional problem

$$\frac{\partial u}{\partial t} + \frac{\partial f(u)}{\partial\tau} = 0. \quad (4.136)$$

The streamline is parameterised as

$$\vec r(\tau) = \left(x(\tau), y(\tau), z(\tau)\right), \quad (4.137)$$

that is, $\vec r(\tau)$ maps from $\tau$ to the usual $(x, y, z)$ space. The streamline algorithm is summarised in algorithm 4.1, and most of its steps are now straight-forward.
However, the difficulty lies within step 4, as mapping back to the Eulerian grid
cannot be done uniquely in general since some grid blocks may contain multiple
streamlines, while some may contain none at all. While some possible remedies
are available [256, 29], some mass is usually lost in the transfer.
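Streamline tracing itself is just the numerical integration of (4.135); a deliberately simple forward Euler sketch (practical tracers instead integrate exactly cell by cell, e.g. with Pollock's method, so this is only illustrative):

```python
def trace_streamline(x0, y0, velocity, dtau=0.1, n_steps=10):
    """Forward Euler tracing of d r / d tau = V, cf. (4.135).
    velocity: callable (x, y) -> (vx, vy). Returns the traced path."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(n_steps):
        vx, vy = velocity(x, y)
        x, y = x + dtau * vx, y + dtau * vy
        path.append((x, y))
    return path
```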
For a linear advection problem, the method of characteristics yields the analytical solution [111, Chapter 3], but for nonlinear fluxes we encounter issues of non-uniqueness due to shock formations. To apply a characteristic method, we must create an auxiliary flux function that ensures a unique solution.

As an example, consider the idealised Buckley-Leverett flux function

$$f(u) = \frac{u^2}{u^2 + (1-u)^2}, \quad (4.138)$$
as shown in figure 4.7, and consider a discontinuous state with $u_R = 0$ on one side and $u_L = 1$ on the other. If we assume that the solution evolves as a pure shock, then the Rankine-Hugoniot condition (4.102) states $\sigma = 1$, but one may verify that Olejnik's entropy condition (4.103) is now violated for some values of $u$. Instead, we can verify that a shock with $u_L = 1/\sqrt{2}$ and $u_R = 0$, together with a rarefaction with $u_L = 1$ and $u_R = 1/\sqrt{2}$, satisfies the entropy condition.
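These claims are easy to check numerically; the sketch below samples the Olejnik condition (4.103) on interior states (the tolerances and sampling density are ad hoc choices):

```python
def f_bl(u):
    """Idealised Buckley-Leverett flux, cf. (4.138)."""
    return u * u / (u * u + (1.0 - u) ** 2)

def shock_speed(u_left, u_right):
    """Rankine-Hugoniot speed, cf. (4.102)."""
    return (f_bl(u_left) - f_bl(u_right)) / (u_left - u_right)

def entropy_ok(u_left, u_right, n=500, tol=1e-9):
    """Sample the Olejnik condition (4.103) on interior states."""
    s = shock_speed(u_left, u_right)
    for k in range(1, n):
        u = u_right + (u_left - u_right) * k / n
        lhs = (f_bl(u) - f_bl(u_left)) / (u - u_left)
        rhs = (f_bl(u) - f_bl(u_right)) / (u - u_right)
        if lhs < s - tol or s < rhs - tol:
            return False
    return True
```

The pure shock from 1 to 0 has speed 1 but fails the check, while the shock from $1/\sqrt{2}$ to 0 passes, consistent with the tangent construction in the text.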
Figure 4.7: Flux function $f(u)$ in blue, the concave envelope $f_c(u)$ in green, and the residual flux $f_r(u) = f(u) - f_c(u)$ in red.
Consequently, a consistent solution scheme must use a modified envelope flux function $f_c$ to ensure convergence to the correct solution of the hyperbolic problem. Therefore we split the advection-diffusion equation (4.105) into the system

$$c\frac{\partial u}{\partial t} + \nabla\cdot(f_c(u)\vec v) = 0, \quad (4.139)$$

$$c\frac{\partial u}{\partial t} + \nabla\cdot(f_r(u)\vec v) - \nabla\cdot(K\nabla\Phi) = q. \quad (4.140)$$
Here $f_r = f - f_c$ is the residual flux function. It is only nonzero at shock discontinuities, and it serves to sharpen the front, since the flux $f_c$ may smear it [109, 86, 84]. We may now apply the characteristic method to the linear advection equation (4.139); then we solve the advection-diffusion equation (4.140) using the solution of the first equation as initial data.
This method is known as the corrected operator splitting [160], in contrast to other operator splittings which do not take the entropy conditions into consideration, and thus do not end up with a linear advection equation. Let us note that the construction of the entropy corrected flux $f_c$ and its residual $f_r$ cannot always be done analytically; it may instead be approximated by numerical means [160].
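One such numerical stand-in is to compute the concave (upper) envelope of a sampled flux; the sketch below builds the upper hull of the sample points and interpolates it back (an illustrative approximation, not the construction of [160]):

```python
import numpy as np

def concave_envelope(u, f):
    """Smallest concave function above the sampled values (u, f):
    build the upper hull of the points, then interpolate it back to u.
    Can be used to approximate f_c, with f_r = f - f_c."""
    hull = [0]
    for i in range(1, len(u)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (u[i1] - u[i0]) * (f[i] - f[i0]) \
                  - (u[i] - u[i0]) * (f[i1] - f[i0])
            if cross >= 0.0:   # middle point on or below the chord: drop it
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(u, u[hull], f[hull])
```

For the Buckley-Leverett flux on [0, 1], the envelope follows the tangent line of slope about 1.207 from the origin up to $u = 1/\sqrt{2}$, then the flux itself.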
To solve equation (4.139) we rewrite it into a canonical form. Using the same flux expansion as in (4.132), we get

$$c\frac{\partial u}{\partial t} + f'(u)\,\vec v\cdot\nabla u = -f\,\nabla\cdot\vec v. \quad (4.141)$$

Along a characteristic $(t(\tau), \vec x(\tau))$, the chain rule gives

$$\frac{du}{d\tau} = \frac{\partial u}{\partial t}\frac{dt}{d\tau} + \nabla u\cdot\frac{d\vec x}{d\tau}. \quad (4.142)$$
Comparing with equation (4.141), we get the coupled ordinary differential equation system

$$\frac{dt}{d\tau} = c, \qquad \frac{d\vec x}{d\tau} = f'(u)\,\vec v, \qquad \frac{du}{d\tau} = -f\,\nabla\cdot\vec v. \quad (4.143)$$
This system is solved for $u$, $t$ and $\vec x$, using any suitable quadrature rule [130, 131]. However, there are two distinct directions to integrate: forward or backward in $\tau$. With forward tracking we choose spatial points $\vec x^n$, and use $u^n(\vec x^n, t^n)$ as initial conditions to integrate forward.
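The characteristic system (4.143) can be integrated with any ODE scheme; a forward Euler sketch with illustrative callables (a real code would use the higher order quadratures the text refers to):

```python
def track_forward(u0, x0, t0, c, f, fprime, v, div_v, dtau, n_steps):
    """Forward Euler integration of the characteristic system (4.143):
        dt/dtau = c,  dx/dtau = f'(u) v(x),  du/dtau = -f(u) div v(x).
    f, fprime, v, div_v are callables; all other arguments are scalars."""
    u, x, t = u0, x0, t0
    for _ in range(n_steps):
        t = t + dtau * c
        x = x + dtau * fprime(u) * v(x)
        u = u + dtau * (-f(u) * div_v(x))
    return u, x, t
```

For a divergence free velocity field, u is exactly constant along the characteristic, which the sketch reproduces.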
Backtracking would start at $t^{n+1}$ in $\vec x^{n+1}$, but as we do not know $u^{n+1}$ yet, a nonlinear iteration procedure such as a Picard iteration [128] can be used to attain convergence from an initial guess ($u^{n+1}\approx u^n$).
The advantage of forward over backward tracking is that all mass can be accounted for, since we can evolve characteristics from every cell at $t^n$. Backward tracking cannot guarantee that characteristics will hit all cells at $t^n$. However, forward tracking has the converse problem of not necessarily hitting all the cells at $t^{n+1}$, leading to possible oscillations unless some sophisticated interpolation scheme is used.
Another issue is treatment of boundary conditions. If a backward tracked
characteristic intersects the outer boundary, it is natural to prescribe a Dirichlet
condition for it. And conversely, if a forward tracked characteristic leaves the
domain a compatible flux condition must be used. These are naturally handled by
the pressure solver when computing the velocity v~. However, the general handling
of boundary conditions with the characteristic method is still somewhat ad hoc
[114].
We will now discuss some alternative characteristic methods, which can be applied to either the full advection-diffusion equation (4.105) or the residual equation (4.140). The idea is to use a Petrov-Galerkin finite element formulation,
which chooses test-functions satisfying an adjoint characteristic equation while
using usual finite element bases for the solution itself [229].
Multiply equation (4.105) by some test function wi and integrate both in space
Figure 4.8: The test function $w_i$ is given as a linear hat function at $t = t^{n+1}$, fully aligned to the underlying Eulerian grid (refer to figure 4.4). As we move backwards along the adjoint characteristic equation (4.147) to find $w_i$ at $t = t^n$, $w_i$ may no longer be aligned to that mesh.
and time:

$$\int_{t^n}^{t^{n+1}}\!\!\int c\frac{\partial u}{\partial t}w_i\,dV\,dt + \int_{t^n}^{t^{n+1}}\!\!\int \nabla\cdot(f\vec v - K\nabla\Phi)\,w_i\,dV\,dt = \int_{t^n}^{t^{n+1}}\!\!\int q\,w_i\,dV\,dt. \quad (4.144)$$
Applying integration by parts to both terms on the left hand side yields

$$\left[\int cu\,w_i\,dV\right]_{t=t^n}^{t=t^{n+1}} - \int_{t^n}^{t^{n+1}}\!\!\int cu\frac{\partial w_i}{\partial t}\,dV\,dt - \int_{t^n}^{t^{n+1}}\!\!\int (f\vec v - K\nabla\Phi)\cdot\nabla w_i\,dV\,dt + \int_{t^n}^{t^{n+1}}\!\!\int (f\vec v - K\nabla\Phi)\cdot\vec n\,w_i\,dS\,dt = \int_{t^n}^{t^{n+1}}\!\!\int q\,w_i\,dV\,dt, \quad (4.145)$$
which is rearranged into

$$\left[\int cu\,w_i\,dV\right]_{t=t^n}^{t=t^{n+1}} + \int_{t^n}^{t^{n+1}}\!\!\int K\nabla\Phi\cdot\nabla w_i\,dV\,dt - \int_{t^n}^{t^{n+1}}\!\!\int\left(cu\frac{\partial w_i}{\partial t} + f\vec v\cdot\nabla w_i\right)dV\,dt + \int_{t^n}^{t^{n+1}}\!\!\int (f\vec v - K\nabla\Phi)\cdot\vec n\,w_i\,dS\,dt = \int_{t^n}^{t^{n+1}}\!\!\int q\,w_i\,dV\,dt. \quad (4.146)$$
If we now choose test functions $w_i$ satisfying the adjoint equation

$$cu\frac{\partial w_i}{\partial t} + f\vec v\cdot\nabla w_i = 0, \qquad\text{i.e.}\qquad \frac{\partial w_i}{\partial t} + \frac{f}{cu}\vec v\cdot\nabla w_i = 0, \quad (4.147)$$
then we get

$$\left[\int cu\,w_i\,dV\right]_{t=t^n}^{t=t^{n+1}} + \int_{t^n}^{t^{n+1}}\!\!\int K\nabla\Phi\cdot\nabla w_i\,dV\,dt + \int_{t^n}^{t^{n+1}}\!\!\int (f\vec v - K\nabla\Phi)\cdot\vec n\,w_i\,dS\,dt = \int_{t^n}^{t^{n+1}}\!\!\int q\,w_i\,dV\,dt. \quad (4.148)$$
The presence of the diffusion term means that $u$ and $w_i$ must both have integrable differentials, and we therefore use the representation (4.48) for $u$ at all times:

$$u^n = \sum_j u_j^n\phi_j, \qquad u^{n+1} = \sum_j u_j^{n+1}\phi_j, \quad (4.149)$$

where $u_j$ are scalar coefficients and $\phi_j$ are differentiable basis functions, typically
linear hat functions. For a symmetric discretisation we use the same basis for the
wi functions at tn+1 , while wi at tn is given by solving the linear (in wi ) advection
equation (4.147), see figure 4.8.
The diffusive flux integral should be handled fully implicitly, and then it is treated identically to the finite element method for parabolic problems. We also
treat the mass integral at tn+1 and the source term in the same way as earlier. The
boundary flux integral takes care of boundary condition treatment in a systematic
manner, see [263] for details.
Consequently, we only need to evaluate $\int c u\, w_i \,dV$ at $t^n$ to close the method. While $u$ at $t^n$ is known, $w_i$ is not, and we must solve equation (4.147) to find it. We shall use the rest of this section to discuss the nature of the adjoint equation and its solution.
Notice first that the adjoint method represents mass or particle motion, in contrast to the wave motion induced by the primary characteristics. To see this, we consider (4.148) without diffusion and sources, and away from the boundary:

$$\int_{\Omega_i^{n+1}} (c u)^{n+1}\, w_i^{n+1} \,dV = \int_{\Omega_i^n} (c u)^n\, w_i^n \,dV. \tag{4.150}$$
Figure 4.9: Adjoint characteristics at a shock front. The moving front is drawn in bold. Notice that the adjoint characteristics speed up over the shock. This corresponds to an increase in particle density.
Equation (4.147) is linear in $w_i$, in contrast to the intersecting wave characteristics. However, the velocity of the adjoint characteristics does depend on the solution $u$ itself, so tracking them is not trivial.
Let us illustrate by means of a one-dimensional example, and for this purpose we consider one of the simplest nonlinear flux functions, Burgers';

$$f(u) = \frac{u^2}{2}, \tag{4.151}$$

giving the conservation law

$$\frac{\partial u}{\partial t} + \frac{\partial f(u)}{\partial x} = 0, \tag{4.152}$$

$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\left(\frac{u^2}{2}\right) = 0. \tag{4.153}$$
Shock Consider the Riemann problem with $u_L = 1$ and $u_R = 1/2$. We can verify that we get a travelling shock of speed $\sigma = 3/4$. Behind the shock $u = 1$ and the adjoint characteristics move with speed $1/2$, while ahead of it they have the speed $1/4$, see figure 4.9.

Since the adjoint equation is linear, the adjoint characteristics cannot cross one another. Instead they bend as they cross the primal shock, thereby speeding up. Notice that particles move closer to one another over the shock, meaning that the particle density increases ($u = 1/2 \to u = 1$).
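These speeds follow directly from the flux function: the shock moves with the Rankine-Hugoniot speed, while the adjoint characteristics move with the velocity $f(u)/(cu)$ (here $c = 1$). A small sketch verifying the numbers above (the class name is illustrative only):

```java
// Adjoint characteristic speeds for Burgers' flux f(u) = u^2/2 (with c = 1).
class BurgersAdjoint {
    static double flux(double u) { return 0.5 * u * u; }

    // Rankine-Hugoniot shock speed: (f(uL) - f(uR)) / (uL - uR)
    static double shockSpeed(double uL, double uR) {
        return (flux(uL) - flux(uR)) / (uL - uR);
    }

    // Adjoint characteristic velocity f(u)/(c u) = u/2
    static double adjointSpeed(double u) { return flux(u) / u; }
}
```

With $u_L = 1$ and $u_R = 1/2$ this reproduces the shock speed $3/4$ and the adjoint speeds $1/2$ and $1/4$ quoted above.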
Figure 4.10: Adjoint characteristics at a rarefaction. While the primal characteristics would form a fan of linear $(x, t)$ vectors, the adjoint characteristics are bent backwards when entering the rarefaction fan. This corresponds to a decrease in particle density.
Rarefaction Conversely, let now $u_L = 0$ and $u_R = 1$. This gives us a complete rarefaction with a linearly interpolated profile:

$$u(x, t) = \begin{cases} 0, & x \le 0, \\ \dfrac{x}{t}, & 0 < x < t, \\ 1, & x \ge t. \end{cases} \tag{4.154}$$

In this rarefaction $u$ changes continuously, and since the adjoint velocity is $u/2$, we find the characteristic equations

$$\frac{dx}{d\tau} = \frac{u}{2} = \frac{x}{2t}, \qquad \frac{dt}{d\tau} = 1, \qquad \frac{dw}{d\tau} = 0. \tag{4.155}$$
Part II
Large scale simulation
Chapter 5
Design of reservoir simulators
A reservoir simulator is a computer program for solving the reservoir flow problems of part I using numerical solution techniques. As digital computers are
a relatively recent development, so are reservoir simulators, with the first versions appearing in the middle of the 1950s [269, 73]. Before that, forecasting
was based on solving analytical models such as the single-phase equations, the
Buckley-Leverett equations [270], or by simple material balance considerations
[88, Chapter 3].
However, from the 1960s and onward, there was a surge of interest in the use
and development of computerised simulation tools [70], with the model of choice
being the black-oil compositional model. As computers grew ever more capable,
complete compositional models were starting to emerge in the 1970s. The need
for general simulators handling many different production scenarios (black-oil,
compositional, thermal, chemical, etc) was soon realised [163], and led to the use
of volume balance type compositional models.
A complementary development was underway in the field of numerical linear
algebra. Early on in the use of computers, matrix equations were solved either by direct methods (Gaussian elimination or Gram-Schmidt orthogonalisation), or by crude iterative methods (Jacobi, Gauss-Seidel or SOR). However, the matrices arising
from the discretised partial differential equations are very sparse, and the current
generation of matrix solvers, the Krylov subspace methods, exploit this expertly.
The first practical Krylov method, conjugate gradients (CG) [142], appeared early
in the 1950s, but it was not until the 1970s that the method saw significant use.
As CG only applies to symmetric positive definite matrices, the Krylov methods GMRES [231] and ORTHOMIN [261] were developed for solving general
matrix equations. These methods are all iterative and search for the solution in an
expanding Krylov subspace. To increase their efficiency, many preconditioning
techniques were developed (incomplete decompositions or multigrid smoothers
[25]).
5.1 Foundations
Before discussing the simulator itself, we must lay a foundation for its design.
This includes choosing a suitable software development environment, and constructing associated solvers for linear and nonlinear equations.
entirely in a black box manner, so that the actual data structure (dense or sparse) of
the matrix is hidden from the user. The benefit is a simple interface and flexibility
to chose different implementations freely. In the same manner, we define the
following vector interface:
interface Vector {
    // Returns x_i
    double get(int i);

    // x_i = value
    void set(int i, double value);

    // x_i = x_i + value
    void add(int i, double value);

    // Calculates the inner product x . y
    double dot(Vector y);
}
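As an illustration, a minimal dense implementation of this interface might look as follows; this is a sketch for exposition (not the thesis code), with the interface restated so the example is self-contained:

```java
// The Vector interface from the text, restated for self-containment.
interface Vector {
    double get(int i);
    void set(int i, double value);
    void add(int i, double value);
    double dot(Vector y);
}

// A minimal dense-array implementation of the interface.
class DenseVector implements Vector {
    private final double[] x;

    DenseVector(int n) { x = new double[n]; }

    public double get(int i) { return x[i]; }
    public void set(int i, double value) { x[i] = value; }
    public void add(int i, double value) { x[i] += value; }

    // The inner product goes through the interface, so y may use any storage
    public double dot(Vector y) {
        double s = 0.0;
        for (int i = 0; i < x.length; ++i)
            s += x[i] * y.get(i);
        return s;
    }
}
```

Because all access goes through the interface, a sparse implementation could be substituted without changing calling code, which is precisely the black-box flexibility the design aims for.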
Define now the general iterative solver interface

interface IterativeSolver {
    void setPreconditioner(Preconditioner M);

    // Solves Ax = b iteratively, starting from x
    Vector solve(Matrix A, Vector b, Vector x);
}
(5.2)

and solving it by an iterative method. Since a naive Newton method can often fail to converge, we should instead use either a damped iteration with a truncated line-search, or a trust region. Either method can be represented by the interface:
Figure 5.1: Reservoir simulator design indicating the six main modules (Numerics, Geometry, Field, Well, Fluid, Rock) and their interaction with one another.
interface NewtonSolver {
    // Solves the nonlinear problem R(u) = 0 starting from u
    Vector solve(NonlinearAssembler R, Vector u);
}
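To sketch the damping idea, consider the scalar case $R(u) = 0$: a Newton step is computed and then halved until the residual decreases. The names below are illustrative only, not part of the simulator design:

```java
// Damped Newton iteration with a truncated line-search for a scalar
// problem R(u) = 0; the safeguarding idea carries over to the vector case.
class DampedNewton {
    interface Scalar { double eval(double u); }

    static double solve(Scalar R, Scalar dR, double u) {
        for (int k = 0; k < 100; ++k) {
            double r = R.eval(u);
            if (Math.abs(r) < 1e-12)
                break;

            // Full Newton step
            double du = -r / dR.eval(u);

            // Truncated line-search: halve the step until the residual decreases
            double damp = 1.0;
            while (damp > 1e-8 && Math.abs(R.eval(u + damp * du)) >= Math.abs(r))
                damp *= 0.5;

            u += damp * du;
        }
        return u;
    }
}
```

Far from the solution the damping prevents divergence; close to it the full step is always accepted and the quadratic Newton convergence is retained.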
5.2 Simulator design
The simulator design we outline is based on the experiences of [209]. To keep the
exposition general and brief, we will primarily cover the binding interfaces and
only mention implementation details in passing. As such, we hope the design will
prove useful for other simulation development efforts.
All the principal modules of the simulator are shown in figure 5.1, and their
responsibilities are:
Fluid Stores primary component and fluid properties, and is responsible for computing the thermodynamical equilibrium and associated fluid properties
(viscosity, heat conductivity, enthalpy, etc).
Well Connects production and injection wells to the reservoir, and maintains the internal well fluid state. However, it is the numerics module which must provide the actual flow solvers inside the wells.
Geometry Provides an interface to the computational mesh, boundary conditions
and spatially varying reservoir parameters (media conductivities, well conditions, etc).
Rock Holds the relative permeability and capillary pressure curves, and controls the fluid/rock interactions that may change these curves (through hysteresis) or compact the porous medium itself.
Field The field module stores the complete reservoir system state for a given
time step. This includes all control volumes and associated control surfaces.
Furthermore, the field module performs the initial hydrostatic equilibration
of the fluid state.
Numerics The numerics module performs primary time stepping and stability
control. It has classes for discretising the flow equations, and can be used
with a large number of different numerical methods.
Since the design is modular, certain parts, such as the fluid or rock modules, can be made into external packages instead of being an intrinsic part of the simulator itself. This may be useful for researchers who need to perform thermodynamical calculations as part of an experimental compositional formulation, but have no need for the complete apparatus.
Methods for computing phase properties can now be added. These would take the pressure, $p^w$, temperature, $T$, and the global composition, $N$, as input.

Black-oil fluids

$$\frac{\bar C}{C} = \begin{cases} \bar C_a / C_a, & A, \\ \bar C_l / C_l, & L, \\ \bar C_h / C_h, & H, \end{cases} \tag{5.4}$$

$$\frac{\partial V}{\partial N} = \begin{cases} \partial V / \partial N_a, & A, \\ \partial V / \partial N_l, & L, \\ \partial V / \partial N_h, & H. \end{cases} \tag{5.5}$$
$$F = \sum_\ell C_\ell\, u^w_\ell, \tag{5.7}$$

$$F_T = \sum_\ell h_\ell\, u^w_\ell. \tag{5.8}$$

The well fluxes can be specified as either internal boundary conditions or source/sink terms.
Fluid flow inside a well

Since fluids may flow freely between the well and the reservoir, we need to model the flow inside the pipeline and couple this with the reservoir flow. A sophisticated model would also take into account the coupling between the wells and the surface production facilities [69], but that is beyond the scope of this work. Instead, we shall only treat decoupled wells.
Following [189, 126] we let $u^w_\ell$ be the interior phase fluid velocity. Then we may adapt equation (3.16) to give the mass flow equation in the well:

$$\frac{1}{V} \frac{\partial N}{\partial t} + \frac{\partial}{\partial x}\!\left(\sum_\ell C_\ell\, u^w_\ell\right) = q, \tag{5.9}$$

where $x$ is the coordinate along the 1D pipeline, and $q$ now corresponds to mass entering or leaving through perforations of the pipe walls. Energy conservation from equation (3.38) becomes

$$\frac{1}{V} \frac{\partial E}{\partial t} + \frac{\partial}{\partial x}\!\left(\sum_\ell h_\ell\, u^w_\ell\right) = q_e, \tag{5.10}$$

in which $q_e$ is heat loss/gain through the pipeline walls, heat conduction has been neglected due to assumed high flow velocities, and $\phi = 1$ inside the pipe. For the well fluid pressure, $p^w$, we utilise the volume balance equation (3.44):

$$\frac{\partial R(t)}{\partial t} + \frac{\partial R}{\partial p^w} \frac{\partial p^w}{\partial t} + \frac{\partial R}{\partial T} \frac{\partial T}{\partial t} + \sum_\nu \frac{\partial R}{\partial N_\nu} \frac{\partial N_\nu}{\partial t} = 0. \tag{5.11}$$
Control mechanisms

For the determination of the interior well velocity $u^w_\ell$ we may use either Euler's equations, the Navier-Stokes equations, or, more popularly, the drift-flux relationship. The latter is fast and well suited to 1D gas/liquid pipeline flows.

The details of these velocity calculations may be found in the hydrodynamics literature [104]. Here, we shall consider one aspect peculiar to well flows: the control mechanisms. These mechanisms will be used as boundary conditions to determine the well velocity, and we distinguish between two controls:

- Pressure fixed at one point (typically the bottom) of the well.
- Fixed flux rate, where a pressure must be calculated to satisfy the constraint.
The former is a Dirichlet type condition, and a velocity must be found satisfying it. If instead our well is rate controlled, then we should find a pressure such that the velocity, $u^w_\ell$, ensures that the computed flux equals the required production rate. For an injection well we can limit the flux of each phase:

$$\sum_\nu C_{\nu\ell}\, u^w_\ell = P_\ell, \tag{5.12}$$

where $P_\ell$ [mol/m$^2$s] is the phase flux target rate. For a production well it may be more convenient to impose a constraint on the total mass production:

$$\sum_\ell \sum_\nu C_{\nu\ell}\, u^w_\ell = P, \qquad \text{where } P = \sum_\ell P_\ell. \tag{5.13}$$
Each well has a list of geometry indices indicating where it is perforated (completed). For every completion, a well index must be given to indicate how permeable the well is to the surrounding reservoir:

class WellCompletion {
    int[]    perforations;
    double[] WI;
}
The well itself consists of the internal fluid state, the completions, and a production control system:
Figure 5.2: The cells in the left blue layer are connected to the cells in the blue layer on the right of the fault plane, which is an example of a non-neighbouring connection. There is negligible flow between the blue and green layers across the fault.
class Well {
    WellCompletion compl;
    WellControl    ctrl;

    // State of the fluids at each perforation
    FluidState[] fluid;

    // From the cross-sectional area of the well we can compute its
    // volume in each segment
    double areal;
}
The control system can be a class for fixed pressure control, a class for fixed rate
control, or even a flexible schedule.
An element is a geometrical cell of the mesh with a set of interfaces and corner
points. A connecting interface represents grid connections, which usually are
local, but need not be (see figure 5.2).
First, an element allows access to its containing geometry and associated properties:
interface Element {
    Iterable<Interface>           interfaces();
    Iterable<ConnectingInterface> connectingInterfaces();
    Iterable<Point>               points();

    double    getVolume();
    RockState getRockState();
}
The iterators here just return the local interfaces and points, in contrast to the iterators in the Mesh interface. The element has a set of rock properties which are described in the next section.
An interface has access to its normal vector and potential boundary conditions:
interface Interface {
    Element                       element();
    Iterable<ConnectingInterface> connectingInterfaces();
    Iterable<Point>               points();

    double  getAreal();
    Vector3 getNormal();

    // If this is a boundary, the boundary conditions can be retrieved
    boolean isBoundary();
    MassBC  getMassBC();
    HeatBC  getHeatBC();
}
We distinguish three boundary conditions: Dirichlet (specified potential), Neumann (specified flux), or well. The latter will only exist if wells are modelled as boundary conditions, and the numerical solver must then take the well flow into account. The specific details depend on the type of well flow and its control mechanism.
The actual grid connections are represented through the interface

interface ConnectingInterface {
    Interface here();
    Interface there();

    double getFluxMultiplier();
}

Figure 5.3: Cartesian grid, radial grid, and unstructured grid.

This couples two interfaces, which need not be geometrically adjacent. A flux multiplier can also be given on the connections, and can be used to model flow reduction through partially blocked pores.
Finally, we have the simple spatial points
interface Point {
    Iterable<Element>   elements();
    Iterable<Interface> interfaces();

    double getX();
    double getY();
    double getZ();
}
Its iterators return the set of elements and interfaces which have the point as a
corner point.
Some examples of mesh implementations are (see figure 5.3):
Cartesian A grid given by $\Delta x$, $\Delta y$, and $\Delta z$ arrays. Elements can be accessed by the indices $(i, j, k)$. Cartesian grids are suitable for modelling simple sedimentary (non-faulted) reservoirs.

Radial Inflow to production wells is usually radial, and cylindrical coordinates are here appropriate. The grid is then specified by $\Delta r$ (radial increment), $\Delta\theta$ ($x$-$y$ plane angle increment) and $\Delta z$ (height increment).

It may be convenient to formulate the equations in radial $(r, \theta, z)$ rather than Cartesian $(x, y, z)$ coordinates when using radial grids. This also converts the non-conformal grid cells in the radial grid into conformal ones.
Unstructured This type of grid is specified by connections and spatial coordinates of every point. The popular corner point geometry found within reservoir simulation can be converted into a conformal unstructured grid by subdivision of non-conformal quadrilaterals into conformal multilaterals.
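When a radial near-well grid is embedded in an otherwise Cartesian model, coordinates must be converted between the two systems; a minimal sketch (the class name is illustrative):

```java
// Converts radial grid coordinates (r, theta, z) to Cartesian (x, y, z).
class RadialToCartesian {
    static double[] toXYZ(double r, double theta, double z) {
        return new double[] { r * Math.cos(theta), r * Math.sin(theta), z };
    }
}
```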
    // Component dispersion
    PhaseData<double[]> dl;  // Longitudinal dispersion
    PhaseData<double[]> dt;  // Transversal dispersion

    PhaseData<Double> kr;    // Relative permeability kr

    // Capillary pressures pc_ow and pc_go
    double pcow, pcgo;
}
and the RockRegion class;

class RockRegion {
    Compaction compaction;

    // Hysteresis models
    RelPermHysteresis getRelPerm();  // kr
    CapPresHysteresis getCapPres();  // pc_ow and pc_go
}
The models herein compute and change the stored rock data depending on how the reservoir state evolves. Compaction primarily affects porosity, absolute permeability and rock density, and is dependent on the fluid pressure, $p^w$, and the reservoir overburden, $W$ (3.45):
interface Compaction {
    double calcPorosity(double p, double W, double phi);
    double calcPorosityDerivP(double p, double W, double phi);
    double calcPorosityDerivW(double p, double W, double phi);

    Tensor calcPermeability(double p, double W, Tensor K);

    double calcRockDensity(double p, double W, double rho);
}
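As a hypothetical example of such a model (not the thesis' own), a constant rock compressibility $c_r$ gives an exponential porosity law $\phi = \phi_{\mathrm{ref}}\, e^{c_r (p - p_{\mathrm{ref}})}$, with the pressure derivative available analytically:

```java
// A hypothetical compaction model with constant rock compressibility cr.
// The overburden W is ignored in this simple sketch.
class ExponentialCompaction {
    private final double cr, pRef;

    ExponentialCompaction(double cr, double pRef) {
        this.cr = cr;
        this.pRef = pRef;
    }

    // phi = phiRef * exp(cr * (p - pRef))
    double calcPorosity(double p, double W, double phiRef) {
        return phiRef * Math.exp(cr * (p - pRef));
    }

    // d(phi)/dp = cr * phi
    double calcPorosityDerivP(double p, double W, double phiRef) {
        return cr * calcPorosity(p, W, phiRef);
    }
}
```

Having the analytic derivative available is exactly what the calcPorosityDerivP method in the interface is for: it feeds the Jacobian assembly of the nonlinear solver.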
The relative permeability and capillary pressures are history dependent, so the objects must maintain an internal history. When calculating new relative permeabilities or capillary pressures, the latest phase saturations must be provided:
interface RelPermHysteresis {
    PhaseData<Double> calcRelPerm(double Sw, double So, PhaseData<Double> kr);
}

interface CapPresHysteresis {
    double calcPcow(double Sw, double So);  // pc_ow
    double calcPcgo(double Sw, double So);  // pc_go
}
Equilibrium initialisation

To start the simulation, we need initial values for the fluid state in the control volumes. We can either get these values from a previous simulation run or we can calculate a global equilibrium state such that all fluids are at rest. The former corresponds to a hydrodynamic equilibrium, since fluids may flow, while the latter is a hydrostatic equilibrium since the weight of the fluids balances with the phase pressures:

$$\frac{\partial p}{\partial h} = -\gamma. \tag{5.14}$$
Figure 5.4: Phase saturations as functions of depth at equilibrium. The size of the transition zone between the phases is determined by the strength of the capillary forces. In this case, these forces are quite strong.
This is the hydrostatic condition, and using it we clearly have that all fluids are at rest since

$$\vec u = -\lambda \underbrace{\nabla\left(p + \gamma h\right)}_{=\vec 0} = \vec 0. \tag{5.15}$$

$$V = 0. \tag{5.16}$$

$$S_o(\mathrm{WOC}) = 0, \qquad S_g(\mathrm{WOC}) = 0, \tag{5.17}$$

and

$$S_w(\mathrm{GOC}) = S_{wr}, \qquad S_o(\mathrm{GOC}) = 1 - S_{wr}, \qquad S_g(\mathrm{GOC}) = 0. \tag{5.18}$$
We remark that hysteresis-free capillary pressure curves should be used for the
initialisation, since there is no actual fluid flow taking place, and therefore no flow
history.
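For a fluid of constant weight $\gamma = \rho g$, the hydrostatic condition integrates to a linear pressure profile; a small sketch of this initialisation step (illustrative names, with $h$ measuring height so that pressure increases downwards):

```java
// Hydrostatic pressure at height h, integrating dp/dh = -gamma downward
// from a known datum pressure pDatum at height hDatum.
class HydrostaticInit {
    static double pressure(double pDatum, double hDatum, double gamma, double h) {
        return pDatum + gamma * (hDatum - h);
    }
}
```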
n <- 0, F^{n+1} <- F^n
While t^{n+1} < t_max
    Output F^{n+1} if desired
    Do
        Check CFL conditions for all explicit discretisations
        Call all solvers in sequence, yielding the primary variables in F^{n+1}
        Calculate secondaries in F^{n+1}
    Until R^{n+1} = 0
    F^n <- F^{n+1}, n <- n + 1

Algorithm 5.1: The time stepping algorithm in the TimeStepper class. F^n denotes the field state at time step t^n.
class Transmissibility {
    double k;  // Transmissibility coefficient
    double j;  // Associated control volume index
}
$$f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}, \tag{5.19}$$

$$f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}, \qquad \Delta x > 0. \tag{5.20}$$
Chapter 6
Parallelisation of reservoir
simulators
Numerical simulators are notoriously demanding computer applications, with an
essentially unbounded need for more power (faster processors, more memory,
etc). The reason is that we can always increase the spatial or temporal resolution, or we can use more advanced numerical / physical models.
One answer to this ever-growing demand for speed lies in faster and more efficient discretisations, another lies in the use of parallel computers. As the name
indicates, a parallel computer has numerous processing units and several associated banks of memory.
Since the advent of parallel computers in the 1980s, much effort has been applied to develop algorithms for solving partial differential equations on these machines. It turns out that most such algorithms are captured within the framework
of domain decomposition methods [243, 258, 224]. A domain decomposition algorithm splits the spatial domain into pieces, and each piece is solved on a single
processor. To attain global convergence, the processors must communicate with
one another, and the way this is done is an important differentiator between the
specific methods.
Many of the common domain decomposition methods are algebraic, meaning
that they work as parallel solvers for the discretised matrix equation
$$A \vec x = \vec b. \tag{6.1}$$
While this approach is appealing, it has the drawback of intrinsically linking the
discretisation method to the parallelisation approach. This leads to rigid software
that is hard to maintain. Software which is hard to maintain is also hard to extend,
and consequently may not be suitable for emerging applications.
We therefore wish to present a parallelisation method which clearly separates
the flow simulation from parallelisation. This allows us to parallelise a sequential
Figure 6.1: The union of the circle and the rectangle forms the computational domain, with the green region indicating the overlapping part.
simulator without material change to that code. A further benefit is that a change
of discretisation will not affect the manner in which parallelisation is conducted.
Letting

$$\Gamma_{12} = \partial\Omega_1 \cap \Omega_2 \tag{6.3}$$

be the boundary of the circle which intersects the rectangle, and

$$\Gamma_{21} = \partial\Omega_2 \cap \Omega_1 \tag{6.4}$$

be the boundary of the rectangle which intersects the circle, Schwarz's method alternates between solving

$$L u_1^{[t]} = q \quad \text{in } \Omega_1, \qquad u_1^{[t]} = g \quad \text{on } \partial\Omega_1 \setminus \Gamma_{12}, \qquad u_1^{[t]} = u_2^{[t-1]} \quad \text{on } \Gamma_{12}, \tag{6.5}$$
Figure 6.2: Geometric convergence proof of the Schwarz method for a one-dimensional Laplace problem ($L = \partial^2/\partial x^2$, $q = 0$, $g = 0$). Starting with $u^{[0]}$, the solution in $\Omega_1$ yields $u_1^{[1]}$, and is followed by the solution in $\Omega_2$ yielding $u_2^{[1]}$. Notice the gradual descent to $u = 0$, the exact solution.
and

$$L u_2^{[t]} = q \quad \text{in } \Omega_2, \qquad u_2^{[t]} = g \quad \text{on } \partial\Omega_2 \setminus \Gamma_{21}, \qquad u_2^{[t]} = u_1^{[t]} \quad \text{on } \Gamma_{21}. \tag{6.6}$$
Problem (6.2) can be hard to solve because of the complicated geometry. The splitting into the two subproblems (6.5) and (6.6) gives us geometrically regular domains (a circle and a rectangle). Equipped with analytical solution methods applicable only to regular domains, this splitting was a necessity. A geometrical convergence proof is sketched in figure 6.2.

Nowadays, the advantage of the splitting is different: the solution of (6.5) can be quite decoupled from the solution of (6.6). The domains only need to communicate with one another by passing updated internal boundary conditions. Consequently one processor may solve on $\Omega_1$, another on $\Omega_2$, and they communicate the latest available boundary conditions. The solvers on the subdomains can be unmodified sequential codes [50]. Since the subdomain solvers in our case can be very complicated, the fact that we can reuse them in a parallel setting by merely updating boundary conditions is a significant advantage.
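For the one-dimensional Laplace problem of figure 6.2, the alternating iteration can be written out explicitly, since each subdomain solution is linear and is determined entirely by its two boundary values (a sketch; the subdomain split $(0, 0.6)$ and $(0.4, 1)$ is chosen for illustration):

```java
// Alternating Schwarz for u'' = 0 on (0, 1) with u(0) = u(1) = 0,
// using overlapping subdomains (0, 0.6) and (0.4, 1). Each subdomain
// solve is linear interpolation between its boundary values, so only
// the two interface values at x = 0.4 and x = 0.6 need to be tracked.
class Schwarz1D {
    static double iterate(double a, int steps) {
        double b;
        for (int t = 0; t < steps; ++t) {
            // Solve on (0, 0.6): u1(0) = 0, u1(0.6) = a; evaluate at x = 0.4
            b = a * (0.4 / 0.6);
            // Solve on (0.4, 1): u2(0.4) = b, u2(1) = 0; evaluate at x = 0.6
            a = b * (0.4 / 0.6);
        }
        return a; // interface value, tending to the exact solution u = 0
    }
}
```

Each sweep contracts the interface value by the factor $(2/3)^2 = 4/9$, illustrating both the geometric convergence of figure 6.2 and its dependence on the overlap width.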
Figure 6.3: A domain split into 16 subdomains. Domains of equal colour can be solved independently of each other by the parallel Schwarz method, while domains of different colour can be solved in sequence by the sequential Schwarz method. In this case, four processors can solve the problem in parallel.
- the larger the number of subdomains, the slower the global convergence is, while
- greater overlap between the subdomains leads to faster convergence.

To see this, consider Poisson's elliptic equation

$$-\Delta u = q \quad \text{in } \Omega, \qquad u = g \quad \text{on } \partial\Omega. \tag{6.7}$$
We can express its solution, $u$, by Green's function [111, 2.2.4]:

$$u(\vec x) = \int q(\vec y)\, G(\vec x, \vec y) \,dV(\vec y) - \int g(\vec y)\, \nabla_y G(\vec x, \vec y) \cdot \vec n \,dS(\vec y), \tag{6.8}$$

$$-\Delta_y G(\vec x, \vec y) = \delta_x(\vec y), \tag{6.9}$$

where $\delta_x$ equals one in $\vec x$ and zero elsewhere. This implies that Green's function has a peak at $\vec x$, but rapidly decays to zero away from $\vec x$. From equation (6.8) we see that any internal changes in $q$ or boundary data, $g$, imply changes everywhere in $u$. However, the rapid decay of Green's function means that the effect is primarily local.
Now, the overlap between the subdomains is used to capture the local effect of Green's function. However, when we have many subdomains, the global effect of Green's function is not captured very well. The updated information computed in one subdomain is conveyed to another subdomain only by passing through all the intermediary subdomains. For the domain splitting in figure 6.3, this can mean that four Schwarz iterations may have to be done for information in one corner to travel to the opposite corner.
Advection dominated problems
Figure 6.4: The computational domain $\Omega$ is initially partitioned into a set of non-overlapping domains $\{\hat\Omega_i\}$. Each of these domains is extended to form a set of overlapping subdomains $\{\Omega_i\}$. The internal boundary of $\Omega_1$, $\Gamma_1 = \partial\Omega_1 \setminus \partial\Omega$, is split into parts $\Gamma_{12}$, $\Gamma_{13}$ and $\Gamma_{14}$ given by the intersection with the non-overlapping subdomains $\{\hat\Omega_i\}$.
A better initial estimate of $v_f^n$ can be found by using the last fine boundary data $v_f^{n-1}$. Clearly

$$v_f^{n-1} = v_c^{n-1} + \delta v_f^{n-1}, \tag{6.11}$$

which can be solved for the fine scale correction, $\delta v_f^{n-1}$, giving

$$\delta v_f^{n-1} = v_f^{n-1} - v_c^{n-1}. \tag{6.12}$$

$$v_f^n \approx v_c^n + \delta v_f^{n-1}. \tag{6.13}$$
Another problem is that there is no information passing from the fine scale to the coarse scale. If the coarse scale model is not adequate, its solution may drift away from the fine scale solution, and the boundary data, $v_c^n$, it produces can become progressively more incorrect. A simple remedy is to replace the initial conditions of the coarse scale model by the converged fine scale solution.
Chapter 7
Parameter estimation
An important prerequisite for useful production forecasting with a reservoir simulator is accurate and comprehensive input data. Parameter estimation, or history matching, is the process of finding missing field data which matches the production history of the reservoir. The known data that we must match include:
- pressure and fluid data in and around wells,
- production history (ratio of oil to water, ratio of gas to oil),
- tracer tests and recorded water breakthrough, and
- core samples.
These measurements are usually quite accurate, but may contain errors, and are
also spatially restricted to the well regions. Other data include reservoir interpretations based on geological insight and seismic measurements. While not as
tangible, they serve as an initial estimate of the reservoir and its properties, and
can help to steer the parameter estimation away from obvious dead ends.
Now, the type of fluids and their properties can be readily determined from laboratory experiments on reservoir fluid samples. What we must match is primarily rock data:
- porosity and absolute permeability,
- rock compressibility,
- relative permeability and capillary pressure,
- identification of fluid contact regions (WOC and GOC), and
- specific geological features, such as faults.
Figure 7.1: A simulation grid with about 10 distinct geological layers and three
major faults. Taken from the Visund geometry.
The core samples provide this data in selected parts of the reservoir, but it is our
task to determine these properties everywhere.
It is conventional to carry out parameter estimation manually. One then starts with a reservoir interpretation, and performs simulation runs with this model. The output of the model is compared with measured well data, and the reservoir data is modified until a match in all parameters is achieved. Naturally, this is very time consuming and error prone, hence the proverb: "a good history match is obtained when you run out of time or money".
Instead, by formulating the history matching as an optimisation problem, we
can make the process automatic. In the optimisation we use the measurements
as solution constraints, while reservoir interpretations are used as initial estimates of the subsurface conditions. As a part of the optimisation, numerous flow
simulations must be conducted, and these must clearly be fast and capable of incorporating a highly detailed geo-description.
$$\phi(\vec x) \begin{cases} > 0, & \vec x \text{ in the interior of } \Gamma, \\ = 0, & \vec x \in \Gamma, \\ < 0, & \vec x \text{ in the exterior of } \Gamma. \end{cases} \tag{7.2}$$

To use this level set function for parameter representation, we shall utilise the Heaviside function

$$H(x) = \begin{cases} 0, & x \le 0, \\ 1, & x > 0. \end{cases} \tag{7.3}$$

Clearly $H(\phi) = 1$ inside of $\Gamma$ and zero elsewhere.
Figure 7.2: Two curves divide the domain $\Omega$ into four regions. If the curves do not overlap, $\Omega_{++}$ will disappear, and we would have three regions left.
Let our unknown parameter be $a$, and assume for simplicity that it has only two values, $a_1$ inside and $a_2$ outside of $\Gamma$. Then

$$a = a_1 H(\phi) + a_2 \left(1 - H(\phi)\right). \tag{7.4}$$

This reduces the identification of $a$ to just identifying $a_1$, $a_2$ and $\Gamma$. And to find $\Gamma$, we only have to move the zero level set of the level set function $\phi$. This is a substantial gain over identifying $a$ in every grid block.
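The representation (7.4) is trivial to evaluate pointwise; a small sketch (illustrative names, using the convention $H(0) = 0$ from (7.3)):

```java
// Piecewise-constant parameter through a level set function phi,
// as in (7.4): a = a1 * H(phi) + a2 * (1 - H(phi)).
class LevelSetParam {
    // Heaviside function (7.3): H(x) = 0 for x <= 0, 1 for x > 0
    static double heaviside(double x) { return x > 0.0 ? 1.0 : 0.0; }

    static double a(double phi, double a1, double a2) {
        return a1 * heaviside(phi) + a2 * (1.0 - heaviside(phi));
    }
}
```

During the identification only the level set values move; the evaluation of $a$ itself never changes.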
This approach may be generalised to multiple level sets following the idea in [60]. Consider two closed curves $\Gamma_1$ and $\Gamma_2$ with associated level set functions $\phi_1$ and $\phi_2$. These will partition $\Omega$ into four distinct regions, see figure 7.2, where

$$\Omega_{++} = \left\{\vec x : \phi_1(\vec x) > 0,\ \phi_2(\vec x) > 0\right\}, \tag{7.5}$$

$$\Omega_{+-} = \left\{\vec x : \phi_1(\vec x) > 0,\ \phi_2(\vec x) < 0\right\}, \tag{7.6}$$

$$\Omega_{-+} = \left\{\vec x : \phi_1(\vec x) < 0,\ \phi_2(\vec x) > 0\right\}, \tag{7.7}$$

$$\Omega_{--} = \left\{\vec x : \phi_1(\vec x) < 0,\ \phi_2(\vec x) < 0\right\}. \tag{7.8}$$
7.2 Parameter identification
We now have a representation of our parameters. Since there is usually more than a single parameter to find, we may use the same representation for each (i.e., the same coarse grid or the same set of level set functions).

The parameter identification procedure must find parameters $a$ and solution $u$ such that the equation errors $e_i$ in each of the reservoir flow equations $E$ are zero:

$$e_i(u; a) = 0, \qquad i \in E. \tag{7.17}$$

$$u(\vec x, t) = \bar u(\vec x, t), \qquad \vec x \in \bar\Omega. \tag{7.18}$$

The function $\bar u$ can contain noise or other uncertainties, and $\bar\Omega$ is the subset of $\Omega$ where the measurements were taken. The solution of equations (7.17)-(7.18) can be formulated as an output least-squares problem:

$$\text{minimise } \left\|u(\vec x, t) - \bar u(\vec x, t)\right\|, \qquad \text{subject to } e_i(u; a) = 0,\ i \in E. \tag{7.19}$$

Another possibility is the total least-squares formulation:

$$\text{minimise } \left\|u(\vec x, t) - \bar u(\vec x, t)\right\| + \sum_{i \in E} \left\|e_i(u; a)\right\|. \tag{7.20}$$

Any norm can be used in either (7.19) or (7.20), but it is common to use the $L_2$ norm. To facilitate differentiation, the norms are often squared. This squaring does not change the solution of the optimisation problem, however.
7.2.1 Regularisation
Hadamard's definition [129] of a well-posed problem is that

1. it must have a solution,
Tikhonov regularisation [124] minimises the norm of the parameter $a$ or any of its derivatives. As such, we augment either of the minimisation problems (7.19) or (7.20) with

$$\epsilon \left\|\nabla^n a\right\|, \tag{7.21}$$

where $\epsilon > 0$ is some scalar weighting the regularisation against the least squares optimisation, and $n \ge 0$ is an integer. With $n = 0$, we try to keep $a$ small, with $n = 1$, we instead wish to minimise the total variation, and with $n = 2$, it is the curvature that should be minimised.

If we have some estimate $\bar a$ of what $a$ should be, we can supplement (7.21) with

$$\bar\epsilon \left\|\nabla^n a - \nabla^n \bar a\right\|, \tag{7.22}$$

where $\bar\epsilon > 0$ is another scalar weighting factor.
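The effect of the $n = 0$ regularisation can be seen on a one-parameter least-squares problem: minimising $(a - d)^2 + \epsilon a^2$ has the closed-form minimiser $a = d/(1 + \epsilon)$, which shrinks toward zero as $\epsilon$ grows. A sketch (illustrative, using squared $L_2$ norms):

```java
// Minimiser of (a - d)^2 + eps * a^2: setting the derivative
// 2(a - d) + 2*eps*a to zero gives a = d / (1 + eps).
class Tikhonov {
    static double minimiser(double d, double eps) {
        return d / (1.0 + eps);
    }
}
```

This illustrates the trade-off controlled by $\epsilon$: with $\epsilon = 0$ the data is matched exactly, while a large $\epsilon$ pulls the parameter toward the regularisation target.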
Level set specific regularisation

The Tikhonov regularisation (7.21) can be applied directly to the level set parameter representation. For $n = 1$ this gives

$$\epsilon \sum_j \left\|\delta(\phi_j)\, \nabla\phi_j\right\|. \tag{7.23}$$

Here, $\delta$ is the Dirac delta function, which equals one if its argument is zero, and zero if not. Therefore we can interpret the above as the area of the $\Gamma_j$ surfaces in some suitable norm, since $\delta(\phi_j) = 1$ only if $\phi_j = 0$, and this in turn occurs only on $\Gamma_j$.
We seek $\vec d^{[k]}$ which minimises the right hand side, and by differentiation, we get

$$\nabla^2 f\left(\vec s^{[k]}\right) \vec d^{[k]} = -\nabla f\left(\vec s^{[k]}\right). \tag{7.29}$$

This linear matrix system can be solved to yield $\vec d^{[k]}$.
$$\text{minimise } f(\vec s), \tag{7.30}$$

$$\text{subject to } e_i(\vec s) = 0, \qquad i \in E. \tag{7.31}$$

The scalars $\lambda_i$ are the Lagrangian multipliers at the optimum, and are additional unknowns to be determined during the solution process. From this we define the Lagrangian functional

$$L\left(\vec s, \vec\lambda\right) = f(\vec s) - \sum_{i \in E} \lambda_i e_i(\vec s), \tag{7.33}$$

$$L_A\left(\vec s, \vec\lambda; \mu\right) = f(\vec s) - \sum_{i \in E} \lambda_i e_i(\vec s) + \frac{1}{2\mu} \sum_{i \in E} e_i(\vec s)^2, \tag{7.35}$$

where $\mu$ is some small and positive barrier parameter. The last term is always positive, and an unconstrained optimiser must try to ensure $e_i = 0$ if $L_A$ is to be minimised. Differentiating $L_A$ with respect to $\vec s$ yields

$$\nabla_s L_A\left(\vec s, \vec\lambda; \mu\right) = \nabla f(\vec s) - \sum_{i \in E} \lambda_i \nabla e_i(\vec s) + \frac{1}{\mu} \sum_{i \in E} e_i(\vec s)\, \nabla e_i(\vec s) \tag{7.36}$$

$$= \nabla f(\vec s) - \sum_{i \in E} \left(\lambda_i - \frac{e_i(\vec s)}{\mu}\right) \nabla e_i(\vec s). \tag{7.37}$$
Comparing this with the KKT conditions (7.32) indicates that the optimal Lagrangian multiplier can be approximated by

$$\lambda_i^* \approx \lambda_i - \frac{e_i(\vec s)}{\mu}. \tag{7.38}$$

If $\left\|\nabla_s L_A\left(\vec s^{[k+1]}, \vec\lambda^{[k]}; \mu^{[k]}\right)\right\| < \epsilon$, return with the approximative solution $\vec s^{[k+1]}$; otherwise the multipliers are updated by $\lambda_i^{[k+1]} = \lambda_i^{[k]} - e_i\left(\vec s^{[k+1]}\right)/\mu^{[k]}$.
[k]
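This outer iteration can be sketched in code for a small hypothetical example; the quadratic objective and single linear constraint below are our own illustration, chosen so that the inner minimisation of $L_A$ reduces to one linear solve:

```python
import numpy as np

# Sketch of the augmented Lagrangian iteration with the multiplier update
# (7.38). Toy problem: minimise f(s) = ||s - c||^2 subject to the linear
# constraint e(s) = s_0 + s_1 - 1 = 0. The constrained minimiser is
# s = [0, 1] with multiplier lambda = -2.
c = np.array([1.0, 2.0])
a = np.array([1.0, 1.0])            # e(s) = a.s - 1
mu = 0.1                            # barrier parameter (held fixed here)
lam = 0.0                           # initial multiplier

for _ in range(50):
    # Inner step: grad_s L_A = 2(s - c) - lam*a + (1/mu) e(s) a = 0
    H = 2.0 * np.eye(2) + np.outer(a, a) / mu
    s = np.linalg.solve(H, 2.0 * c + lam * a + a / mu)
    e = a @ s - 1.0
    lam = lam - e / mu              # multiplier update, Eq. (7.38)

print(s, lam)   # converges to [0, 1] and -2
```

For this quadratic problem the multiplier error contracts by a constant factor per outer iteration, so a fixed $\mu$ already suffices; in practice $\mu$ is usually decreased as the iteration proceeds.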
Part III
Papers
Chapter 8
Summary of the papers
Seven papers are included as part of this thesis. Two of the papers have been
accepted for journal publication, three will be included in conference proceedings,
and two are still in a draft state.
The papers cover a broad range of topics in computational mathematics. In
this chapter we provide a summary of the contents of each paper, and list the main
contributions.
Paper A
A Parallel Eulerian-Lagrangian Localized Adjoint Method
Bjørn-Ove Heimsund and Magne S. Espedal
This paper considers parallelisation aspects of the adjoint characteristic
method introduced in 4.3.3. Parallelisation of characteristic methods has been
studied in several other papers [265, 253], but the algorithms proposed there are
usually somewhat complicated and/or inconsistent with the discretisation from the
sequential case.
Our goal was to perform a parallel discretisation such that the matrix equation
in the parallel case coincided with the matrix equation in the sequential setup.
In the paper, we considered a linear advection-diffusion problem, and recall that
the adjoint characteristic method must then solve the adjoint characteristic equation (4.147)
\[
\frac{\partial w}{\partial t} + \vec{v} \cdot \nabla w = 0 .
\tag{8.1}
\]
Since this is a purely hyperbolic equation, it does not depend upon the global system state, and can be solved locally in each subdomain. At subdomain boundaries
with adjacent processors, we only have to communicate the local velocity field v~
for the solution of the adjoint equation.
This results in a matrix equation
\[
A \vec{x} = \vec{b} .
\tag{8.2}
\]
Main results of the paper:

- Developed a simple and robust parallelisation scheme for advection-diffusion equations by the adjoint characteristic method.
- Discussed implementation aspects of this method on different parallel architectures.
- Provided comparisons of the new method with other parallelisation methods for adjoint characteristic methods, and performed a simple scalability test on a parallel computer.
Paper B
Adjoint methods are particle methods
Thomas F. Russell, Bjørn-Ove Heimsund, Helge K. Dahle and Magne S. Espedal
The adjoint methods were originally developed for linear advection-diffusion
problems. For that class of problems, the adjoint method is very efficient since
it allows essentially arbitrarily long time steps [58]. In this paper we study the
application of the adjoint characteristics method to nonlinear advection problems.
It turns out that the adjoint equation corresponds to particle motion instead of wave motion. Particles can always be traced, and there are never problems of non-uniqueness. A shock wave only affects particles by speeding them up.
Based on this particle interpretation, we designed a numerical algorithm for
solving the one-dimensional adjoint particle equation (4.147)
\[
\frac{\partial w}{\partial t} + \frac{f(u)}{u} \frac{\partial w}{\partial x} = 0 .
\tag{8.3}
\]
This linear advection equation gives the particle speed once we have the primary
solution u. To effectively predict the evolution of u, we used Riemann solvers.
This works well for shocks, but at rarefactions we experienced difficulties due to
the continuous change in u. We were able to overcome some of these difficulties
by introducing a small amount of artificial diffusion. The introduction of this
diffusion can be motivated by the mass dispersion present in compositional flows,
see equation (3.23).
We extended the one-dimensional algorithm to two dimensions by splitting the
advection field into decoupled streamlines. Along each streamline we could then
apply our one-dimensional algorithm. Since we assumed that the dispersion is
mostly longitudinal and not transversal, there was no need to map the streamline
solution back onto the original Cartesian grid.
Main results of the paper:

- Established that an adjoint method corresponds to mass motion, and that the adjoint characteristics are particle paths.
- Developed a numerical algorithm for solving nonlinear advection-diffusion equations in one spatial dimension. The algorithm relies on a Riemann solver to ensure accurate particle tracing. Extended the algorithm to two dimensions by a streamline splitting.
- Numerical results for one- and two-dimensional problems showed that the method provided accurate solutions with time steps far greater than the explicit CFL limit.
Paper C
Multiscale Discontinuous Galerkin Methods for Elliptic Problems
with Multiple Scales
Jørg Aarnes and Bjørn-Ove Heimsund
The discontinuous finite element method has gained popularity within the fields of fluid and structural mechanics [74]. This paper investigates its potential use within reservoir mechanics, with a special emphasis on the multiscale structure of the medium permeability.
4.2.2-4.2.3 introduced both the multiscale finite element method and the discontinuous finite element discretisation. The paper combines these two, motivated by the potentially higher accuracy of the discontinuous finite element method
and the possibility of capturing subgrid effects in permeability offered by the
multiscale methods.
The resulting method was applied to the single phase equation (2.30),
\[
- \nabla \cdot \big( K \nabla \Phi \big) = q ,
\tag{8.4}
\]
discretised by the discontinuous Galerkin formulations (8.5)-(8.6), which make the solution dependent on the free parameter $\sigma$. Theoretical results for isotropic and homogeneous problems show that convergence is achieved for
\[
\sigma > O\!\left( \frac{1}{h} \right) ,
\tag{8.7}
\]
where h is the grid size [57]. But for realistic problems, no such results seem to be
available. Indeed, it is common to use $\sigma = 0$, even if that may give non-convergent methods [48, 27].
Through several numerical tests the paper illustrates the impact of $\sigma$, and it is clearly established that a suboptimal choice (i.e., $\sigma = 0$) leads to errors of order O(1). Larger values of $\sigma$ give better results, but too large values may ruin the accuracy.
Main results of the paper:

- Combined the multiscale and discontinuous finite elements into a new discretisation method for elliptic equations with rapidly varying subscale permeability.
- Established the potential performance of the method on some problems with a highly oscillatory subscale permeability.
- Provided a numerical convergence comparison between multiscale and monoscale methods, of finite element, mixed finite element and discontinuous finite element type.
Paper D
Level set methods for a parameter identification problem
Bjørn-Ove Heimsund, Tony Chan, Trygve K. Nilssen and Xue-Cheng Tai
This paper shows how to apply the level set representation to parameter identification problems. Along with [149], it was the first paper to consider level set
methods for this purpose.
As shown in chapter 7, the level set representation of a piecewise constant
parameter is done by an implicit representation of the boundary interface between
the distinct regions. So when trying to identify this parameter, we do not have to
determine its value cell by cell. Instead we only have to determine the boundary
interface. By regularising the size of the interface (to avoid folding), we achieve
a large reduction in the degrees of freedom of the problem.
In the paper we considered the single phase pressure equation (2.30)
\[
- \nabla \cdot \big( K \nabla \Phi \big) = q .
\tag{8.8}
\]
The task was to identify the scalar permeability K given measurements of the potential $\Phi$ in selected nodes. Using an output least-squares formulation, the resulting minimisation problem was solved by the augmented Lagrangian method.
Main results of the paper:
Paper E
On a class of ocean model instabilities that may occur when applying
small time steps, implicit methods, and low viscosities
Bjørn-Ove Heimsund and Jarle Berntsen
With more efficient numerical methods and faster computers, it has become
feasible to perform simulation studies with reduced or omitted artificial diffusion.
In ocean modelling, artificial diffusion can be introduced by increasing the viscosity of water.
It has been observed that with physical viscosity, the time stepping procedure fails to converge, even as the time step size $\Delta t \to 0$ [115]. These difficulties are not peculiar to ocean physics; they may very well occur in sophisticated reservoir simulation tools.
Therefore we tried to find a possible explanation for this counter-intuitive phenomenon. The paper considers the ordinary differential equation (ODE)
\[
\frac{\partial u}{\partial t} = \lambda u , \qquad \lambda \in \mathbb{C} .
\tag{8.9}
\]
An ODE solver is absolutely stable if it produces bounded solutions to this equation as $t \to \infty$. Now, some solvers have a stability region of $\lambda \Delta t$ values in the complex plane that includes part of the right (positive) half plane. A combination of negligible diffusion and specific implicit ODE solvers can then give stable solutions for time steps above a critical size, while below that size the same methods produce unbounded oscillations. The paper properly identifies the conditions for these instabilities so that they may be avoided.
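The mechanism can be sketched numerically. Here we use backward Euler as a stand-in implicit solver (the paper analyses other integrators) and a hypothetical weakly unstable oscillatory mode; both choices are our own illustration:

```python
# Sketch of the instability mechanism. Backward Euler applied to
# du/dt = lambda*u has amplification factor R(z) = 1/(1 - z) with
# z = lambda*dt, and its stability region |R(z)| <= 1 contains part of the
# right (positive) half plane. For a mode pushed slightly into the right
# half plane, a large step is damped while a smaller one grows.
lam = 0.1 + 10.0j     # hypothetical mode with a tiny positive real part

def growth_factor(dt):
    z = lam * dt
    return abs(1.0 / (1.0 - z))

print(growth_factor(0.01))    # < 1: bounded solution for the larger step
print(growth_factor(0.001))   # > 1: growing oscillation for the small step
```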
Main results of the paper:

- Furnished stability requirements for some common multistage and multistep ordinary differential equation integrators.
- Tested multistep, multistage (Runge-Kutta) and implicit ODE solvers on the linearised 2D shallow water equations.
- Obtained a perfect match between theoretical and numerical results.
- The analysed implicit ODE solver has been implemented in the Bergen Ocean Model [36].
Paper F
A two-mesh superconvergence method with applications for adaptivity
Bjørn-Ove Heimsund and Xue-Cheng Tai
The flux recovery method for finite element discretisations has been explained
in 4.2.2, and it is known that the recovered flux can be more accurate than the
original flux, a phenomenon known as superconvergence [282]. As a follow-up to
[138], this paper considers the flux recovery method as a mesh refinement indicator.
Large discrepancies between the computed flux of the finite element method
and the recovered flux can be used to indicate regions where the mesh should be
refined. The equation studied is
\[
- \nabla \cdot \big( K \nabla \Phi \big) = q .
\tag{8.10}
\]
A large emphasis is placed on numerical results; hence we consider several different permeability fields K, sources q and domains $\Omega$. Non-convex domains and singular problems were also considered to test the method to its limits.
The refinement is compared to the classical jump indicator [107, 154]
\[
E_J(\Omega_i) = \sqrt{ \, |\Omega_i| \, \big\| q \big\|_{\Omega_i}^2 + \sum_{k=1}^{n_i} |\gamma_{ik}|^2 \, \big[\, \vec{n}_{ik} \cdot \big( K \nabla \Phi \big) \big]^2 \, } ,
\tag{8.11}
\]
which is based on an error estimate for the computed flux. In the paper we
show that our flux recovery based method is superior to this classical indicator
for smooth problems. For less regular problems, the recovered gradient computed
from the flux recovery adaptivity is generally the most accurate.
Main results of the paper:
Paper G
High performance numerical libraries in Java
Bjørn-Ove Heimsund
In the last paper we consider the construction of a set of software components
for scientific computing. Aside from basic arithmetic, the fundamental software
component for scientific calculations is a comprehensive matrix library. Such a
library must provide matrix factorisations and solvers, and should preferably be
efficient to program with and efficient in terms of computer time.
As argued for in chapter 5, modern programming environments such as Java
can significantly enhance the productivity of the programmer by removing much
of the burden of code tuning and debugging. Consequently, we want to provide
matrix tools which are easily usable from Java whilst not compromising performance.
Starting from basic vector and matrix operations, the paper builds up a matrix
library with useful abstractions to the underlying machine architecture and matrix
data structures, while providing consistently high performance through a range of
industry standard benchmarks. In particular, the matrices arising from the discretisation of partial differential equations tend to be very sparse, so we designed a
sparse matrix data structure with an easy interface. In standard benchmarks, our
implementation outperformed established sparse matrix software.
The design of the matrix toolkit was so flexible that an extension to parallel architectures was implemented. The implementation includes parallel matrix
solvers and preconditioners, and good scalability was demonstrated.
Main results of the paper:
- Contrary to popular belief, our Java matrix library was able to perform on par with, or better than, counterpart software written in C or C++.
- Provided the design schematics of an extensible matrix library.
- Demonstrated auxiliary functions such as iterative solvers and matrix factorisations.
- Showed parallel extensions, along with suitable distributed matrix data structures.
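To make the data-structure discussion concrete, here is a minimal sketch of the standard compressed sparse row (CSR) kernel such a library builds on. This is our own illustration in Python; the paper's library is written in Java and its API is not reproduced here:

```python
# A minimal compressed sparse row (CSR) matrix-vector product. The layout
# (rowptr, colind, values) is the standard CSR format: row i owns the
# nonzeros values[rowptr[i]:rowptr[i+1]] in columns colind[...].
def csr_matvec(rowptr, colind, values, x):
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        # accumulate the nonzeros of row i
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += values[k] * x[colind[k]]
    return y

# 1D Poisson matrix tridiag(-1, 2, -1), n = 4, stored in CSR form
rowptr = [0, 2, 5, 8, 10]
colind = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
print(csr_matvec(rowptr, colind, values, [1.0, 1.0, 1.0, 1.0]))  # [1.0, 0.0, 0.0, 1.0]
```

Such a format stores only the nonzeros of the discretised PDE operator, which is what makes iterative solvers on large sparse systems feasible.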
Chapter 9
Further work
Based on the two introductory parts and the papers, we can identify some areas
of future research. This includes a further investigation of numerical methods
with model formulations, and development of techniques for practical large scale
simulations.
9.1
Numerical methods
Hyperbolic formulations

In 4.3.1 we provided several different ways to formulate the hyperbolic/parabolic flow equations. Essentially, we identified several choices for f(u) and $\vec{v}$ in the flux term
\[
\nabla \cdot \big( f(u) \, \vec{v} \, \big) .
\tag{9.1}
\]
The goal should be a velocity $\vec{v}$ which changes as little as possible when the solution u changes. Furthermore, if the velocity is divergence free, $\nabla \cdot \vec{v} = 0$, and
\[
\nabla \cdot \big( f(u) \, \vec{v} \, \big) = \vec{v} \cdot \nabla f(u) .
\tag{9.2}
\]
This removes a potentially troublesome source term. Capillary forces can be ill behaved, with large gradients. To avoid numerical difficulties, the capillary pressure gradient $\nabla p_c$ should be multiplied with a damping factor. The total velocity formulation multiplies $\nabla p_c$ by the product of the relative permeabilities, and this leads to enhanced numerical stability [246, 215].
Different hyperbolic formulations have been investigated in [64, 39], but not
comprehensively. In particular, we should seek a formulation that closely satisfies
the above conditions. Then we will have a numerically stable discretisation, and
the flux computation can be done accurately with a Riemann solver for f (u).
(9.3)
Here, q is the mass transfer of a component between phases. This equation can be solved for the fraction C by tracking the flux $\vec{u}$. In a particle (adjoint) formulation, we solve the adjoint equation
\[
\frac{\partial w}{\partial t} + \frac{1}{\phi S} \, \vec{u} \cdot \nabla w = 0 .
\tag{9.4}
\]
Notice that this adjoint equation is only dependent on phase properties, not component properties. This means that for a three-phase, multi-component system, only three equations (9.4) are needed for every particle to be traced, and this can result in large computational savings.
9.2
Parameter identification

Parameter estimation for compositional problems

Parameter identification by optimisation and level set representations has so far only been carried out
for one- and two-phase problems. Typically, it is only porosity and permeability which are the unknown parameters. Therefore it would be useful to study
automatic history matching with a complete compositional model. Such a model
incorporates much more fluid data, and the fluid data can be assumed known.
Consequently, we get much more known data into the problem compared to
one- and two-phase problems, but the unknown parameters ($\phi$ and K) remain the
same. It should then be expected that better results can be achieved by using the
more sophisticated forward model.
Paper A
A Parallel Eulerian-Lagrangian
Localized Adjoint Method
Bjørn-Ove Heimsund and Magne S. Espedal
Abstract
We describe an approach to parallelize Eulerian-Lagrangian localized adjoint
methods such that no errors are introduced compared to the sequential case. This
parallelization approach fully captures the hyperbolic features of the underlying
problem. It uses an overlapping domain decomposition technique, and does not
involve the introduction of artificial boundary conditions between subdomains.
Implementation details on different parallel architectures are discussed.
A.1
Introduction
The Eulerian-Lagrangian localized adjoint method (ELLAM) is a method developed by Celia, Russell, Herrera and Ewing [58] to solve linear advection-diffusion equations. Later works by Binning and Celia [38, 40] have extended
the method to multiple spatial dimensions. A general overview of ELLAM is
found in [229], while [263] contains numerous implementation details.
Advection-diffusion equations arise in applications such as groundwater flows
and reservoir flooding. In many cases, the size of these problems is such that
parallelization is necessary, both to increase accuracy by mesh refinement and
to decrease the solution time. Much effort has been applied to extend domain
decomposition methods [243], originally developed as a preconditioner for elliptic
and parabolic problems, to solve advection-diffusion equations. See [253, 262,
264] and the references therein for more information.
Those approaches subdivide the computational domain into disjoint pieces and solve nested advection-diffusion problems on each subdomain. The domains are joined together by choosing artificial boundary conditions which try to capture the flow direction. However, the errors which are thereby introduced have not been analyzed.
In this work we will propose a method for parallelization of the ELLAM such that no errors are introduced, and with, as we will argue, a theoretically nearly linear scalability. No artificial boundary conditions are introduced between subdomains; instead, characteristic information is passed between the subdomains to fully capture the hyperbolic nature of the problem.
The outline is as follows. Section A.2 gives a variational formulation of the
continuous equation to derive the basic ELLAM formulation, and briefly discusses
the choice of a basis for the approximation space and the test-space. Section A.3
describes how to perform the numerical integration and characteristic tracking. With
this background, parallelization is discussed in Section A.4, followed by a brief
comparison and a numerical example.
A.2
Variational formulation

We consider the linear advection-diffusion equation
\[
\frac{\partial u}{\partial t} + \nabla \cdot \big( V u - D \nabla u \big) = 0 .
\tag{A.1}
\]
Multiplying by a test function w and integrating over the space-time domain $U_T$ gives
\[
\int_{U_T} \left( \frac{\partial u}{\partial t} + \nabla \cdot \big( V u - D \nabla u \big) \right) w \, dx \, dt = 0 .
\tag{A.2}
\]
The first term is integrated by parts with respect to t, while the other term is integrated by parts in x, giving
\[
\int_U u w \, dx \, \Big|_{t=0}^{t=T}
- \int_{U_T} u \left( \frac{\partial w}{\partial t} + V \cdot \nabla w \right) dx \, dt
+ \int_{U_T} \nabla u \cdot D \nabla w \, dx \, dt
+ \int_{\partial U_T} \big( V u - D \nabla u \big) \cdot n \, w \, ds \, dt = 0 .
\tag{A.3}
\]
Next, we partition the time-domain into finite intervals of size $\Delta t = t^{n+1} - t^n$, and in each time-interval, w satisfies the localized adjoint equation
\[
\frac{\partial w}{\partial t} + V \cdot \nabla w = 0 .
\tag{A.4}
\]
Using this, our variational formulation becomes
\[
\int_U u w \, dx \, \Big|_{t=t^{n+1}}
+ \int_{U_T^{n+1}} \nabla u \cdot D \nabla w \, dx \, dt
+ \int_{\partial U_T^{n+1}} \big( V u - D \nabla u \big) \cdot n \, w \, ds \, dt
= \int_U u w \, dx \, \Big|_{t=t^n} .
\tag{A.5}
\]
Figure A.1: Here the test-function wi is given as a hat-function. At tn+1 it is
aligned to the Eulerian mesh, but as we move back to tn , it follows the Lagrangian
nature of the problem, and it may no longer be properly aligned. The dashed lines
refer to the spatial representation, while the dotted lines show the time-evolution.
We choose spatially dependent basis functions $\{\phi_i\}_{i=1}^M$, where M is the number of spatial nodes, and represent u at $t = t^n$ by
\[
u(t^n) = u^n = \sum_{i=1}^M \xi_i^n \phi_i ,
\]
where $\xi_i^n \in \mathbb{R}$ are the coefficients of the discrete representation of $u(t^n)$. For each $\phi_i$, we associate a test function $w_i$ which equals $\phi_i$ at $t = t^{n+1}$, but satisfies Equation (A.4) in the time interval $\big( t^n , t^{n+1} \big)$. Consequently, $w_i$ is constant along the characteristic
\[
\frac{dx}{dt} = V ,
\tag{A.6}
\]
and we can backtrack to find the value of $w_i$ anywhere in the time interval. For $t \notin \big( t^n , t^{n+1} \big)$, we set $w_i = 0$. See Figure A.1.
Note that u is defined by a fixed (Eulerian) grid. $w_i$ is also defined by a fixed grid, but only at $t^{n+1}$. As we backtrack, $w_i$ follows the Lagrangian characteristics of the problem and may not be aligned to the Eulerian grid at $t^n$.
A.3
Numerical integration
functions, such a system may be very sparse and iterative linear solvers can be
used.
The first integral on the left hand side of (A.5) gives a mass-matrix. Both u and
wi will be on the same fixed mesh at tn+1 , and one can in fact compute this integral
analytically, as is done in standard finite element formulations. This contribution is added to the system matrix A.
On the right hand side there is also a mass integral, but it represents the mass
at the previous time t = tn . While it is possible to treat it analytically for some
special velocity fields V , in general, a numerical approach must be used.
The numerical approach we use here is forward-tracking. We choose quadrature points at $t = t^n$, and forward track them using Equation (A.6) to $t = t^{n+1}$, where we can evaluate $w_i$. A simple prediction tracking of a point $x^n$ is given by
\[
x^{n+1} = x^n + V\big( x^n , t^n \big) \, \Delta t ,
\tag{A.7}
\]
and uses only information on V at the time $t^n$. A better tracking is the predictor-corrector method
\[
\begin{cases}
x_0^{n+1} = x^n + V\big( x^n , t^n \big) \, \Delta t , \\[4pt]
x_{j+1}^{n+1} = x^n + \frac{1}{2} \Big( V\big( x^n , t^n \big) + V\big( x_j^{n+1} , t^{n+1} \big) \Big) \Delta t , \quad j = 0, 1, 2, \ldots
\end{cases}
\tag{A.8}
\]
The initial prediction is corrected iteratively using the average of the initial velocity and the velocity at the predicted spatial point at $t = t^{n+1}$. This corrective cycle typically converges after only a few iterations. Depending on the form of V and the time-stepping procedure used, it is also possible to use more sophisticated tracking schemes, see [13, 125, 136].
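As an illustration, the tracking schemes (A.7)-(A.8) can be sketched as follows, here for a hypothetical one-dimensional velocity field whose exact characteristic is known:

```python
import math

# Predictor-corrector characteristic tracking as in (A.7)-(A.8), for the
# hypothetical velocity field V(x, t) = x, whose exact characteristic is
# x(t) = x0 * exp(t). The corrector averages the velocities at both ends
# of the step, giving second-order accuracy.
def V(x, t):
    return x

def track(x, t, dt, corrections=3):
    x_new = x + V(x, t) * dt                              # prediction (A.7)
    for _ in range(corrections):                          # corrections (A.8)
        x_new = x + 0.5 * (V(x, t) + V(x_new, t + dt)) * dt
    return x_new

x1 = track(1.0, 0.0, 0.1)
print(x1, math.exp(0.1))   # close to the exact value exp(0.1)
```

With a few corrector sweeps the tracked point agrees with the exact characteristic to within the second-order truncation error of the averaged-velocity rule.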
Having chosen a tracking scheme, we proceed to evaluate the mass-integral by performing element-wise integration
\[
\int_U u w \, dx \, \Big|_{t=t^n} = \sum_g \int_{U_g} u w \, dx \, \Big|_{t=t^n} .
\]
Here g loops over all the elements of the grid. Next, we choose a quadrature rule with p integration points, and approximate the integral by
\[
\sum_g \int_{U_g} u w \, dx \, \Big|_{t=t^n} \approx \sum_g \sum_p W_p^g \, u\big( x_p^g , t^n \big) \, w\big( x_p^g , t^n \big) .
\]
$W_p^g$ are integration weights of the chosen quadrature rule, and $x_p^g$ are the integration points. At $t = t^n$, u is known and can be substituted into the integration. To evaluate w, we forward-track each $x_p^g$ to $t = t^{n+1}$, giving $\bar{x}_p^g$. At $\bar{x}_p^g$ we note
Figure A.2: A part of the spatial domain has been divided into Ui and Uj . The
forward-tracked characteristic from Ui enters the domain Uj .
which wi are non-zero, and add an integral contribution to row i of the right-hand
side vector b.
Returning to the left hand side of Equation (A.5), we have the diffusion term. It may be approximated by an implicit Euler,
\[
\int_{U_T} \nabla u \cdot D \nabla w \, dx \, dt \approx \Delta t \int_U \nabla u \cdot D \nabla w \, dx \, \Big|_{t=t^{n+1}} ,
\]
or by the trapezoidal rule,
\[
\int_{U_T} \nabla u \cdot D \nabla w \, dx \, dt \approx \frac{\Delta t}{2} \left( \int_U \nabla u \cdot D \nabla w \, dx \, \Big|_{t=t^{n+1}} + \int_U \nabla u \cdot D \nabla w \, dx \, \Big|_{t=t^n} \right) .
\]
A.4
Parallelization
Our method to parallelize the solver for (A.5) is to assemble the matrix A and the
vector b in parallel, so as to achieve both data and computational parallelization.
To do this we partition the spatial domain U into the set $\{U_i\}_{i=1}^P$ such that $\cup_{i=1}^P U_i = U$. The coefficients V and D are also given on the same spatial partition, and by
associating one processor on each subdomain, that processor will only have access
to its local coefficients and local solution state u(tn ).
Consider Figure A.2. Using Equation (A.7) to predict the characteristic,
its path may enter neighbouring subdomains. Correcting the path using Equation (A.8) is now not possible unless V in Uj is made available to the solver
[Figure: the overlap region between subdomains $U_i$ and $U_j$ at $t = t^n$.]
[Figure: outflow boundary of $U_i$ and inflow boundary of $U_j$.]
[Figure A.5: the solution u at (a) t = 0 and (b) t = 0.8.]
\[
V = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \qquad
D = 10^{-5} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} .
\]
This corresponds to advection in the positive x-direction, and homogeneous, isotropic diffusion of small magnitude. This velocity field and our choice of $\Delta x$ and $\Delta t$ give an integer Courant number of 10, thus there should not be any additional numerical diffusion due to interpolation errors. No-flow (zero Robin) boundary conditions were used on the whole boundary, which in effect eliminates the boundary integral.
The initial value of u is zero everywhere, except in $x \in [0.05, 0.1]$, $y \in [0.3, 0.7]$, $z \in [0.2, 0.8]$, where u = 1. This initial state is then propagated forward from t = 0 to t = 0.8, see Figure A.5.
Time (sec)   1782   961     665     548
Speedup      1      1.854   2.680   3.252

Table A.1: Scalability results for the problem in Figure A.5. To reduce spurious variations, the elapsed times reported are the average of times of 10 runs.
We solved this problem on a shared memory supercomputer, an IBM p690
Regatta with 32 Power4 processors each running at 1.3GHz and with 32MB of
cache. While our implementation is not tuned for performance, it still shows good
scalability, see Table A.1 for some timings. Due to our choice of a shared memory
implementation, the scalability for our code is mostly unchanged when varying V
and t.
A.5
Concluding remarks
Acknowledgements
Bjørn-Ove Heimsund is grateful to The Research Council of Norway (Computational mathematics in applications) for support. Thanks to Philip Binning for
reading an early draft of the manuscript, and to Hong Wang and Thomas Russell
for useful remarks.
Paper B
Adjoint methods are particle
methods
Thomas F. Russell, Bjørn-Ove Heimsund, Helge K. Dahle and Magne S. Espedal
Abstract
We extend the Eulerian-Lagrangian localized adjoint method (ELLAM) to nonlinear hyperbolic equations. The adjoint to the hyperbolic operator is shown to
be a linear advection operator which corresponds to a mass transfer equation.
Couplings between these operators are illustrated for some classical examples,
and a numerical algorithm is developed which exploits this adjoint framework.
Some numerical examples are provided.
Draft manuscript
B.1
Introduction
B.2
A scalar equation

\[
\frac{\partial}{\partial t} \big( g(u) \big) + \vec{V} \cdot \nabla f(u) = 0 , \qquad \vec{x} \in \Omega , \quad t \in J = [\, t^0 , t^1 \,] ,
\tag{B.1}
\]
Multiplying (B.1) by a test function w and integrating by parts yields
\[
0 = \langle A u , w \rangle
= \int_\Omega \big( g(u) w \big)^1 \, d\vec{x}
+ \int_{t^0}^{t^1} \!\! \int_{\partial \Omega} \big( \vec{V} \cdot \vec{n} \, \big) f(u) \, w \, ds \, dt
- \int_\Omega \big( g(u) w \big)^0 \, d\vec{x}
- \int_{t^0}^{t^1} \!\! \int_\Omega \big( g(u) w_t + f(u) \, \vec{V} \cdot \nabla w \big) \, d\vec{x} \, dt .
\tag{B.2}
\]
Thus, if $w(\vec{x}, t)$ satisfies the adjoint equation
\[
A_u^* w \equiv g(u) \, w_t + f(u) \, \vec{V} \cdot \nabla w = 0 ,
\tag{B.3}
\]
the unknown $u(\vec{x}, t)$ will be the solution of the weak form
\[
\int_\Omega \big( g(u) w \big)^1 \, d\vec{x}
+ \int_{t^0}^{t^1} \!\! \int_{\partial \Omega} \big( \vec{V} \cdot \vec{n} \, \big) f(u) \, w \, ds \, dt
= \int_\Omega \big( g(u) w \big)^0 \, d\vec{x} .
\tag{B.4}
\]
Note that (B.4) has no advective term; the advection is tacitly represented by the
test function via (B.3), so that the t0 mass on the right-hand side of (B.4) advects
to the t1 mass on the left-hand side.
Next, rewrite the adjoint equation (B.3) in the form
\[
w_t + \vec{v} \cdot \nabla w = 0 ,
\tag{B.5}
\]
where
\[
\vec{v}(\vec{x}, t, u) = \vec{V}(\vec{x}, t) \, \frac{f(u)}{g(u)} .
\tag{B.6}
\]
It is evident that (B.5) is a linear equation for w with a velocity $\vec{v}$ that depends nonlinearly on u, in contrast with the nonlinear equation (B.1) for u. We observe that $\vec{v}(\vec{x}, t, u)$ is a particle velocity, the velocity of movement of a fluid particle governed by (B.1). To see this, note that if the particle velocity were $\vec{V}$, the volume flux across a unit cross-section per unit time would be $\vec{V} \cdot \vec{n}$, hence the flux of the conserved quantity would be $g(u) \, \vec{V} \cdot \vec{n}$. The actual flux is $f(u) \, \vec{V} \cdot \vec{n}$, so that the correct particle velocity is obtained by multiplying $\vec{V}$ by $f(u)/g(u)$, as in (B.6). The next section is devoted to an explanation of why the adjoint velocity should be a particle velocity.
B.3
The key to this relationship is the understanding that the test function $w(\vec{x}, t)$ is really a linear functional belonging to the dual space $X^*$ of the primal space X that contains the solution $u(\vec{x}, t)$. We need not specify X precisely for the purpose of this discussion. The duality pairing is the angle bracket $\langle A u, w \rangle$ in (B.2), and the formal adjoint of the nonlinear operator A on X is the linear operator $A_u^*$ on $X^*$ in (B.3). When $A_u^* w = 0$ as in (B.3), the weak form of (B.1) involves nothing but the boundary terms (at $t^0$, $t^1$, and $\partial \Omega$) of the adjoint relation, as in (B.4).
We are interested in advection of mass, and the $\partial \Omega$ term is peripheral to our concerns; if necessary, assume that the mass in question is far from the boundary. Then (B.3) implies that the mass-conservation equation (B.4) is
\[
\int_\Omega \big( g(u) w \big)^1 \, d\vec{x} = \int_\Omega \big( g(u) w \big)^0 \, d\vec{x} .
\tag{B.7}
\]
View $w^1$ as a linear functional belonging to the dual space $X_1^*$ of the primal space $X_1$ that contains the solution $u^1 \equiv u(\cdot, t^1)$, and analogously view $w^0 \equiv w(\cdot, t^0)$ as an element of $X_0^*$. Then $w^1$ evaluates the mass in $\Omega_1$ at time $t^1$, $w(\vec{x}, t)$ propagates the mass, and $w^0$ evaluates the conserved mass at time $t^0$ that has been propagated forward in time from a set $\Omega_0$ to $\Omega_1$. Note that, like $w^1$, the values of $w(\vec{x}, t)$ are 0 and 1, easily seen by (B.5). Thus, $w^0$ is the indicator function of $\Omega_0$.
View the determination of $\Omega_0$ from $\Omega_1$, or equivalently $w^0$ from $w^1$, as the action of a linear operator $T^* : X_1^* \to X_0^*$, so that $w^0 = T^* w^1$. Then $T^*$ tracks sets, or equivalently the functionals that evaluate the mass in them, backward in time. We can write (B.7) in terms of duality pairings as follows:
\[
\big\langle g(u^1) , w^1 \big\rangle_{X_1, X_1^*} = \big\langle g(u^0) , T^* w^1 \big\rangle_{X_0, X_0^*} .
\tag{B.8}
\]
Mass particles propagate at the adjoint velocity because they are represented by limits of masses in sets (finite volumes), which correspond to indicator test functions, which are linear functionals in the dual space, which track via the linear operator $T^*$ associated with the dual adjoint equation (B.5).
Since (B.5) is linear in w, its solution always exists in the classical sense of generalized solutions: characteristics do not intersect, and shocks do not form. This matches the physical interpretation of particle propagation: trailing particles do not overtake leading ones, and particles do not vanish into a shock. The dual space is the natural framework for a particle equation that propagates sets to sets and always makes sense physically. Adjoint methods like ELLAM operate in this framework.
B.4

The wave velocity of (B.1) is
\[
\vec{\lambda}(\vec{x}, t, u) = \vec{V}(\vec{x}, t) \, \frac{f'(u)}{g'(u)} ,
\tag{B.11}
\]
along which
\[
\frac{D u}{D t} = 0 .
\tag{B.12}
\]
(B.14)
Here, $u^1$ and $u^0$ directly represent mass, rather than evaluating it on sets as $w^1$ and $w^0$ did in Section B.3. View the propagation of mass from $t^0$ to $t^1$ as the action of a nonlinear operator $T : X_0 \to X_1$, where
\[
\big( T u^0 \big)(\vec{x}\,) = u^0\!\left( \vec{x} - \frac{f'(u)}{g'(u)} \, \vec{V} \, \Delta t \right) .
\tag{B.15}
\]
T tracks mass forward in time at the wave velocity $\vec{\lambda}$, and $T u^0$ is evaluated via backtracking as in (B.15). In terms of duality pairings, (B.14) becomes
\[
\big\langle g'(u^1) \, u^1 , w^1 \big\rangle_{X_1, X_1^*} = \big\langle g'(u^1) \, T u^0 , w^1 \big\rangle_{X_1, X_1^*} .
\tag{B.16}
\]
B.5
We will now illustrate how the adjoint characteristics interact with the primal
waves in some simple settings. We will consider both shocks and rarefactions,
and the effects of an improper numerical treatment.
\[
u^0(x) = \begin{cases} 1 , & x < 0 , \\ 1/2 , & x > 0 . \end{cases}
\]
Figure B.2: Shock wave
1, x < 2t,
0, x > 2t.
Note that all particles propagate more slowly than the wave. The characteristics of the adjoint equation like (B.5),
\[
w_t + v \, w_x = 0 ,
\tag{B.19}
\]
are shown in Figure B.3.
The dual (ELLAM) weak form of (B.17) will be, as in (B.7),
\[
\int_{\mathbb{R}} u^1 w^1 \, dx = \int_{\mathbb{R}} u^0 w^0 \, dx .
\tag{B.20}
\]
\[
\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\!\left( \frac{u^2}{2} \right) = 0 , \qquad x \in \mathbb{R} , \; t > 0 ,
\qquad u(x, 0) = u^0(x) , \quad x \in \mathbb{R} .
\tag{B.21}
\]
The flux-function $f(u) = u^2/2$ is given in Figure B.4. The wave velocity (B.11) and particle velocity (B.6) here are
\[
\lambda(u) = f'(u) = u , \qquad v(u) = \frac{f(u)}{u} = \frac{u}{2} .
\]
The initial data is
\[
u^0(x) = \begin{cases} 0 , & x < 0 , \\ 1 , & x > 0 , \end{cases}
\]
illustrated in Figure B.5. By the entropy conditions, the solution is given explicitly by
\[
u(x, t) = \begin{cases} 0 , & x < 0 , \\ x / t , & 0 < x < t , \\ 1 , & x > t . \end{cases}
\]
As u changes continuously through the rarefaction, both the wave and particle velocities will also change continuously. In the rarefaction, the velocity of the particles is given by
\[
\frac{dx}{dt} = \frac{u}{2} = \frac{x}{2t} .
\]
This can be solved for x(t) to give
\[
x(t) = C \sqrt{t} ,
\tag{B.22}
\]
where C is determined by the initial position of the particle. Behind the rarefaction $\dot{x}(t) = 0$, and ahead of it $\dot{x}(t) = 1/2$.
In contrast to the shock, the particle paths going into a rarefaction change continuously, see Figure B.6. This is a much harder problem than that of a shock, where a single bending will suffice. Tangential approximations will overestimate the particle speed, and lead to a slower development of the rarefaction.
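The exact path (B.22) is easy to verify numerically. The sketch below (our own illustration, with arbitrary start point and step size) integrates the particle velocity through the rarefaction with a standard RK4 step:

```python
import math

# Integrate the particle path dx/dt = x/(2t) through the Burgers
# rarefaction and compare with the exact solution x(t) = C*sqrt(t)
# of (B.22).
def v(t, x):
    return x / (2.0 * t)

def rk4_step(t, x, dt):
    k1 = v(t, x)
    k2 = v(t + dt / 2, x + dt / 2 * k1)
    k3 = v(t + dt / 2, x + dt / 2 * k2)
    k4 = v(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

t, x, dt = 1.0, 0.5, 0.01      # start inside the rarefaction (0 < x < t)
C = x / math.sqrt(t)           # path constant of (B.22)
for _ in range(100):           # integrate from t = 1 to t = 2
    x = rk4_step(t, x, dt)
    t += dt

print(x, C * math.sqrt(2.0))   # numerical vs exact particle position
```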
Another problem is that ELLAM has no mechanism to ensure that it computes
the correct entropy solution. For example, in Figure B.5, both a rarefaction and
Figure B.5: Initial wave (solid) and the resulting rarefaction (dotted)
such as lumping the whole or parts of the mass matrix have been used; however,
this can lead to excessive smearing, and for the non-linear case can lead to a very
incorrect shock value u, which in turn gives large errors in the shock velocity.
Round-off errors when solving the linear systems in ELLAM add a small
error in every timestep. While of small magnitude, the non-linear features of the
hyperbolic equation can magnify this error to such large proportions as to render
the solution unusable. We shall return to this in Section B.7 when we perform
numerical experiments.
The most reliable way to stabilize a non-linear ELLAM solver is to incorporate
artificial numerical diffusion as in Equation (B.23). Its magnitude must be as
small as possible, while ensuring that oscillations do not occur.
B.6 Algorithm
We will now outline an algorithm for the solution of a simplified variant of Equation (B.1),

\[ u_t + f(u)_x = D u_{xx}, \tag{B.24} \]

with the adjoint equation

\[ w_t + \frac{f(u)}{u}\, w_x = 0, \tag{B.25} \]

where we assume periodic boundary conditions (B.26). For details on how to solve linear equations with ELLAM, see [40, 263]. Here our focus will be on the differences between a linear ELLAM and the non-linear formulation.

Algorithm 1 (Non-linear ELLAM) Solves Equation (B.26) from t = tⁿ to t = tⁿ⁺¹ using a timestep of size Δt = tⁿ⁺¹ − tⁿ.
1. For every element i, do

(a) Compute the elemental mass integral at t = tⁿ⁺¹,

\[ \int_{\Omega_i} u^{n+1} w \, dx, \quad \forall w. \]

(b) Approximate the mass integral at t = tⁿ by quadrature,

\[ \int_{\Omega_i} u^n w \, dx \approx \sum_{j=1}^{m} W_j\, u^n(x_j)\, w(\tilde{x}_j), \]

where Wⱼ are the quadrature weights and xⱼ are the quadrature points. To evaluate w(x̃ⱼ), we use forward-tracking of xⱼ to x̃ⱼ at tⁿ⁺¹. The particle speeds are given by

\[ \frac{dx}{dt} = \frac{f(u)}{u}, \]

and are easily approximated using prediction tracking

\[ \tilde{x} \approx x + \Delta t\, \frac{f(u^n)}{u^n}, \qquad u^n = u(t = t^n). \]
However, as we explained earlier, this is insufficient for hyperbolic equations. Another approach is the prediction-correction tracking

\[ \tilde{x} \approx x + \frac{\Delta t}{2} \left( \frac{f(u^n)}{u^n} + \frac{f(u^{n+1})}{u^{n+1}} \right). \]

The first step of this scheme is to compute an approximation ũⁿ⁺¹ to uⁿ⁺¹ using the prediction tracking. This approximation is then inserted into the prediction-correction tracking to produce a correction of uⁿ⁺¹. Further corrections can be applied until the computed uⁿ⁺¹ converges.
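The two tracking steps above can be sketched as follows; `u_field` is a hypothetical callable returning the (known or predicted) solution at a point, and the function name is my own:

```python
import numpy as np

def predict_correct_track(x, u_field, t0, t1, f, n_corr=2):
    """Forward-track a particle from (x, t0) to t1 with speed v(u) = f(u)/u.

    The prediction step uses the speed at the starting point; each
    correction sweep averages the speeds at the foot and at the current
    head of the path (a trapezoidal average).
    """
    dt = t1 - t0
    v = lambda u: f(u) / u if u != 0 else 0.0
    u0 = u_field(x, t0)
    xt = x + dt * v(u0)                      # prediction tracking
    for _ in range(n_corr):                  # prediction-correction sweeps
        u1 = u_field(xt, t1)
        xt = x + 0.5 * dt * (v(u0) + v(u1))
    return xt
```

For the rarefaction of Section B.5, where u = x/t and the exact path is x(t) = C√t, a couple of correction sweeps already reduce the overshoot of the pure prediction step.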
However, since the simple prediction tracking can have low accuracy, there is little reason to believe that the prediction-correction tracking will be more accurate in the presence of strong non-linear behaviour. For problems with a fixed velocity field, however, such a tracking works well.

What is necessary is to incorporate the information from the primal waves into the adjoint tracking. The idea is to use a Riemann solver at discontinuities to resolve the primal wave structures into shocks and rarefaction waves. Assuming piecewise linear basis functions for u, discontinuities are detected numerically by checking for large jumps, that is, all indices i such that

\[ |u^n_{i+1} - u^n_i| > \epsilon_s, \]

where εₛ is some specified tolerance. We then solve a Riemann problem using uⁿᵢ as left state and uⁿᵢ₊₁ as right state. If the solution of the Riemann problem gives us a shock wave, we store the shock saturation u*, its speed and its origin at t = tⁿ. This information is then used in the particle tracking. We start by performing a prediction tracking, and if we detect that a particle path intersects a shock (at a point (xⱼ, tⱼ), say), then we use the shock value u = u*, giving us the scheme

\[ \tilde{x}_j = x_i + (t_j - t^n)\, \frac{f(u(x_i))}{u(x_i)} + (t^{n+1} - t_j)\, \frac{f(u^*)}{u^*}. \]
The generalization to the case of multiple shock-crossings is straightforward.
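A sketch of the single-crossing case follows; all names are mine, and the shock position x_s and speed s are assumed to come from the Riemann solve at t = tⁿ:

```python
def track_across_shock(x_i, u_i, u_star, t_n, t_np1, x_s, s, f):
    """Two-leg particle tracking across a shock (illustrative sketch).

    The shock starts at position x_s at t = t_n and moves with speed s.
    A particle leaving x_i with speed v_i = f(u_i)/u_i crosses it at
    time t_j; afterwards it moves with the shock-value speed f(u*)/u*.
    """
    v_i = f(u_i) / u_i
    v_star = f(u_star) / u_star
    if v_i == s:
        return x_i + (t_np1 - t_n) * v_i        # parallel paths: no crossing
    # intersection of x_i + (t - t_n) v_i with x_s + (t - t_n) s
    t_j = t_n + (x_s - x_i) / (v_i - s)
    if not (t_n <= t_j <= t_np1):
        return x_i + (t_np1 - t_n) * v_i        # no crossing in this step
    return x_i + (t_j - t_n) * v_i + (t_np1 - t_j) * v_star
```

For the Burgers flux with left state 1 and right state 0.2, the Rankine-Hugoniot speed is s = 0.6, and a slow particle ahead of the shock is overtaken and picks up the faster post-shock speed.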
Rarefactions are harder, as the particle path bends continuously. A natural approach is to Taylor-expand

\[ \frac{f(u(x, t))}{u(x, t)} \]

in the variables (x, t), starting at (xᵢ, tⁿ). A major motivation of ELLAM is the possibility of using long timesteps, and it is well known that Taylor expansions with few terms are only valid in a small vicinity of the starting point. Thus such expansions are likely to be expensive and potentially unstable.
Instead, we use the results of the Riemann solver to determine the presence of
a rarefaction. If a rarefaction is found, we increase the number of particles tracked
forward in time, which increases the accuracy of the numerical integral slightly.
As the Riemann solver is only used on distinct discontinuities, this will only help
in the initial development of a rarefaction; in practice we use artificial diffusion to
distribute mass properly.
B.7 Numerical results

We will compare our non-linear ELLAM formulation with a simple upwind finite volume scheme. The ELLAM uses simple forward-tracking, an adaptive composite Simpson's 1/3 rule, and piecewise linear basis functions.
Figure B.7: Stability of the solution of Burgers equation with an initial condition of u = 1 on an infinite domain. The curve depicts Equation (B.27) as a function of time. The y-axis is logarithmic.

We monitor the stability quantity (B.27) as a function of t. The grid used has the spacings Δx = 1/100 and Δt = Δx. We see
that just before t = 2, the oscillations have grown to order O(1), and the solution u = 1 has been destroyed. The errors introduced are solely due to round-off errors in the course of the computation, and to the propagation of those errors into the velocity field and the solution. L2 projection errors play no part here.
Figure B.8: Initial condition for solving the discretized Burgers equation
Adding diffusion of size D = 10⁻³ completely damped the resulting oscillations in this case. We have not found a functional relationship between the needed stabilizing diffusion and the discretized problem (Δx, Δt, f(u), etc.), so in the remaining numerical experiments we have determined D experimentally. We note here that lumping the mass matrix did work for Δt = Δx, but for larger stepsizes lumping could not prevent the rising instability.
The initial condition is

\[ u(x, 0) = \begin{cases} 0, & x < 1/5, \\ 1, & 1/5 \le x \le 3/5, \\ 0, & x > 3/5. \end{cases} \]
This domain is discretized using Δx = 1/100, see Figure B.8. We compare a simple conservative upwind method [174],

\[ u_i^{n+1} = u_i^n - \frac{\Delta t}{\Delta x} \left( f(u_i^n) - f(u_{i-1}^n) \right), \]

against our non-linear ELLAM. The results of the upwind method are given in Figure B.9 with the analytical solution overlaid. It used Δt = Δx/10.
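A minimal implementation of this upwind scheme for the Burgers flux can look as follows; the periodic wrap via `np.roll` is my own simplification, and the scheme in this one-sided form is valid here because the wave speeds are non-negative:

```python
import numpy as np

def upwind_burgers(u0, dx, dt, n_steps):
    """Conservative first-order upwind for u_t + f(u)_x = 0, f(u) = u^2/2.

    With non-negative wave speeds the flux difference is taken to the
    left: u_i^{n+1} = u_i^n - (dt/dx) * (f(u_i^n) - f(u_{i-1}^n)).
    """
    f = lambda u: 0.5 * u * u
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        u = u - dt / dx * (f(u) - f(np.roll(u, 1)))  # periodic for brevity
    return u
```

Being conservative, the scheme preserves the total mass exactly, and being monotone (under the CFL condition), it keeps the solution within the bounds of the initial data.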
Next, we solve using ELLAM, with an artificial diffusion of D = 2·10⁻³ and a timestep of Δt = 5Δx. The computed result without the frontal Riemann solver is shown in Figure B.10, and the result with a Riemann solver is in Figure B.11.
Figure B.9: Solution of the Burgers equation with a simple upwind scheme. The dashed line is the analytical solution and the solid line is the computed one, both displayed at t = 1/2.

Figure B.10: ELLAM without Riemann solver. As particles in front of the shock move too slowly, the result is oscillations.

Figure B.11: ELLAM with Riemann solver. The left figure uses a diffusion coefficient of D = 2·10⁻³, and the right uses the double, D = 4·10⁻³.
Without shock handling, oscillations occur and the accuracy degrades badly. Particles in front of the shock do not speed up properly, and contribute mass behind the shock, leading to excessive mass build-up and oscillations. However, the results using a Riemann solver are much better, and quite comparable to the upwind method. The hump in the rarefaction is caused by the tangential approximation to the particle paths; adding more diffusion will smear this out, at the cost of also smearing the shock front.
B.7.3 Buckley-Leverett

A more challenging problem than Burgers is the Buckley-Leverett equation for two-phase flooding of a reservoir. Instead of the simple variant considered earlier, we will now solve the full form:

\[ \frac{\partial u}{\partial t} + \frac{\partial f(u)}{\partial x} = 0, \quad x \in (0, 1),\ t > 0, \]
\[ u(x, 0) = 0, \quad x \in (0, 1], \tag{B.28} \]
\[ u(0, t) = 1, \quad t \ge 0, \]
\[ f(u(1, t)) \cdot n = 0, \quad t \ge 0. \]

The flux-function is

\[ f(u) = \frac{u^2}{u^2 + (1 - u)^2}. \tag{B.29} \]
For the upwind method, the same discretization is used as for the Burgers case (Δx = 1/100, Δt = Δx/10). Its results are in Figure B.12. The ELLAM now needs an artificial diffusion of D = 10⁻², and uses Δx = 1/100, Δt = 5Δx. See Figure B.13 for the computed results. The solution states are shown at t = 1/2 with the analytical solution overlaid.
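The fractional-flow function (B.29) is simple to tabulate; note that it is S-shaped and non-convex, which is what produces the combined shock-rarefaction profile (the function name is my own):

```python
import numpy as np

def bl_flux(u):
    """Buckley-Leverett fractional-flow function f(u) = u^2 / (u^2 + (1-u)^2).

    f(0) = 0 and f(1) = 1, with an inflection point in between, so the
    flux is neither convex nor concave on [0, 1].
    """
    u = np.asarray(u, dtype=float)
    return u**2 / (u**2 + (1.0 - u)**2)
```

The same upwind scheme as for Burgers applies with this flux substituted, since the wave speeds remain non-negative for this injection problem.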
B.7.4 2D Buckley-Leverett

We proceed to solve the 2D Buckley-Leverett equation on a rotational velocity field:

\[ \frac{\partial u}{\partial t} + \vec{V} \cdot \nabla f(u) = 0, \quad \vec{x} \in \Omega,\ t > 0, \tag{B.30} \]

where the velocity is spatially dependent,

\[ \vec{V}(\vec{x}) = 2\pi\, [-y, x], \]

and the flux function is given in Equation (B.29). Using the polar coordinates r and θ, the spatial domain is Ω = {(r, θ) | r < 2}. Initial conditions are

\[ u(r, \theta) = \begin{cases} 1, & \theta \in (0, \pi/2), \\ 0, & \theta \in [\pi/2, 2\pi]. \end{cases} \]
In the streamline approach, the 2D problem (B.30) is decomposed into a family of 1D problems (B.31): each streamline x⃗(τ) satisfies

\[ \frac{d\vec{x}}{d\tau} = \vec{V}(\vec{x}), \tag{B.32} \]

and along each streamline u satisfies the 1D conservation law

\[ \frac{\partial u}{\partial t} + \frac{\partial f(u)}{\partial \tau} = 0. \tag{B.33} \]

The streamlines (B.32) are computed by an ODE solver, and the computation is done only once initially, as the velocity field is fixed in time. After computing the streamlines, which here are closed circles, Equation (B.33) is solved using our 1D non-linear algorithm.
Thirty streamlines positioned at (r = rᵢ, θ = 0) are traced by solving Equation (B.32), and for the 1D Buckley-Leverett problem we use the same discretization as before. The results are shown in Figure B.14. We see that the streamline approach has successfully reduced the 2D problem to 1D problems with good accuracy.
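The streamline-tracing step can be sketched as follows; the factor 2π and the sign pattern in the velocity field are my reading of the field definition above, and the RK4 stepper stands in for whatever ODE solver was actually used:

```python
import numpy as np

def trace_streamline(x0, y0, n_steps=2000, T=1.0):
    """Trace one streamline of the rotational field V = 2*pi*(-y, x) with RK4.

    For this field the streamlines are circles; after one period T = 1
    the path should close on itself.
    """
    def V(p):
        x, y = p
        return np.array([-2.0 * np.pi * y, 2.0 * np.pi * x])
    h = T / n_steps
    p = np.array([x0, y0], dtype=float)
    for _ in range(n_steps):
        k1 = V(p)
        k2 = V(p + 0.5 * h * k1)
        k3 = V(p + 0.5 * h * k2)
        k4 = V(p + h * k3)
        p = p + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p
```

Checking that each traced streamline closes on itself is a cheap sanity test before the 1D solves along the streamlines are attempted.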
B.8 Concluding remarks
In this paper we have developed a non-linear Eulerian-Lagrangian localized adjoint method for solving hyperbolic equations in multiple spatial dimensions. The adjoint formulation is simple and a natural extension of the linear case, and we have provided several examples to illustrate its performance, both analytical (Section B.5) and numerical (Section B.7).
Figure B.14: (a) Initial at t = 0. (b) Final at t = 1.
Paper C
Multiscale Discontinuous Galerkin
Methods for Elliptic Problems with
Multiple Scales
Jørg Aarnes and Bjørn-Ove Heimsund
Abstract
We introduce a new class of discontinuous Galerkin (DG) methods for solving elliptic problems with multiple scales arising from, e.g., composite materials and flows in porous media. The proposed methods may be seen as a generalization of the multiscale finite element (FE) methods. In fact, the proposed DG methods are derived by combining the approximation spaces of the multiscale FE methods and relaxing the continuity constraints at the inter-element interfaces. We demonstrate the performance of the proposed DG methods through numerical comparisons with the multiscale FE methods for elliptic problems in two dimensions.
Accepted for publication in Lecture Notes in Computational Science and Engineering, Springer
C.1 Introduction
\[ -\nabla \cdot (a(x) \nabla u) = f, \quad \text{in } \Omega \subset \mathbb{R}^d, \]
\[ u = 0, \quad \text{on } \partial\Omega_D, \]
\[ a(x) \nabla u \cdot n = 0, \quad \text{on } \partial\Omega_N = \partial\Omega \setminus \partial\Omega_D, \tag{C.1} \]

where the coefficient a(x) is assumed uniformly elliptic: yᵀ a(x) y > 0 for all x ∈ Ω and y ∈ ℝᵈ, y ≠ 0.
We will interpret the variable u as the (flow) potential and q as the (flow) velocity. The homogeneous boundary conditions are chosen for presentational brevity; general boundary conditions can be handled without difficulty.

Equation (C.1) may represent incompressible single-phase porous media flow or steady-state heat conduction through a composite material. In single-phase flow, u is the flow potential, q = −a(x)∇u is the Darcy filtration velocity and a(x) is the (rock) permeability of the porous medium. For heat conduction in composite materials, u, q and a(x) represent temperature, heat flow density, and thermal conductivity, respectively. These are typical examples of problems where a(x) can be highly oscillatory and the solution of (C.1) displays a multiscale structure. This leads to some fundamental difficulties in the development of robust and reliable numerical models.
In this paper we introduce a new class of DG methods for solving this particular type of multiscale elliptic problem. Until recently, DG methods have been used mainly for solving partial differential equations of hyperbolic type; see, e.g., [74] for a comprehensive survey of DG methods for convection-dominated problems. Indeed, whereas DG methods for hyperbolic problems have been subject to active research since the early seventies, it is only during the last decade or so that DG methods have been applied to purely elliptic problems, cf. [21] and the references therein. The primary motivation for applying DG methods to elliptic problems is perhaps their flexibility in approximating rough solutions that may occur in elliptic problems arising from heterogeneous and anisotropic materials. However, to our knowledge, previous research on DG methods for elliptic problems has been confined to elliptic partial differential equations with smooth coefficients.
DG methods approximate the solution to partial differential equations in finite-dimensional spaces spanned by piecewise polynomial base functions. As such, they resemble the FE methods, but, unlike the FE methods, no continuity constraints are explicitly imposed at the inter-element interfaces. This implies that the weak formulation subject to discretization must include jump terms across interfaces and that some artificial penalty terms must be added to control the jump terms. On the other hand, the weak continuity constraints give DG methods a flexibility which allows a simple treatment of, e.g., unstructured meshes, curved boundaries and h- and p-adaptivity. Another key feature of DG methods is their natural ability to impose mass conservation locally. Moreover, the local formulation of the discrete equations allows us to use grid cells of arbitrary shapes without difficulty. We may therefore choose the gridlines to be aligned with sharp contrasts in, for instance, underlying heterogeneous materials.
The multiscale FE methods (MsFEMs) introduced in [67, 146] have been successfully applied to multiscale elliptic problems, but their accuracy is to some degree sensitive to the selection of the boundary conditions that determine the FE base functions. If, for instance, strong heterogeneous features penetrate the inter-cell interfaces, then simple, e.g. linear, boundary conditions may be inadequate. In such situations, oversampling strategies or other techniques for the generation of adaptive boundary conditions must be used to recover the desired order of accuracy. This sensitivity to the selection of boundary conditions is partly due to the strong continuity requirements at the inter-element interfaces implicit in the FE methods.
Here we propose a class of multiscale DG methods (MsDGMs) for solving elliptic problems with multiple scales. One of the primary motives for developing MsDGMs is to obtain multiscale methods that are less sensitive to the selection of boundary conditions for the base functions than is the case for the MsFEMs. Another nice feature of MsDGMs is that they produce solutions for both the potential variable (e.g. pressure or temperature) and the velocity variable (e.g. phase velocity or thermal flux density) that reflect important subgrid variations in the elliptic coefficients. We will demonstrate the benefit of using multiscale methods in comparison with ordinary monoscale numerical methods, and perform numerical experiments to display the performance of the MsDGMs relative to the original and mixed MsFEMs. We thereby attempt to show that there is a need for multiscale methods, and to demonstrate under what circumstances it may be advantageous to relax the inter-element continuity assumptions implicit in the MsFEMs.
The paper is organized as follows. We give the general mathematical setting for the DG methods in Section C.2 and show how they are related to the more familiar FE methods. In particular, we show that both standard and mixed FE methods may be viewed as special DG methods. This observation allows us to extend this type of FE methods to corresponding DG methods. In Section C.3 we outline the MsFEMs introduced in [146] and [67] and exploit the relationship between FE methods and DG methods to derive a corresponding class of MsDGMs. Finally, Section C.4 contains the numerical experiments, and we conclude with a discussion of the results in Section C.5.
C.2 Mathematical formulations

We first write (C.1) as a first-order system:

\[ q = -a(x) \nabla u, \quad \text{in } \Omega, \]
\[ \nabla \cdot q = f, \quad \text{in } \Omega, \]
\[ u = 0, \quad \text{on } \partial\Omega_D, \]
\[ q \cdot n = 0, \quad \text{on } \partial\Omega_N. \]

The corresponding mixed weak formulation seeks (q, u) such that

\[ \int_\Omega a^{-1} q \cdot p \, dx = \int_\Omega u \, \nabla \cdot p \, dx \quad \forall p \in Q_N, \]
\[ \int_\Omega \nabla \cdot q \, v \, dx = \int_\Omega f v \, dx \quad \forall v \in U_D. \]
In the DG methods, a similar set of equations is derived for each grid cell. However, for the grid cell equations it is not natural to impose homogeneous boundary conditions. The boundary conditions are therefore approximated from neighbouring values of the unknown solution. Essentially, we want to ensure that the potential u and the velocity q are almost continuous at the interfaces. Since we do not want to enforce continuity by imposing constraints on the approximation spaces, as the FE methods do, we have to penalize the deviation from continuity by introducing an artificial penalty term. To understand the mechanism behind the penalty term, we digress for a moment in order to consider an example that illustrates the basic principle.
Example: Consider the Poisson equation with Dirichlet data,

\[ -\Delta u = f, \quad \text{in } \Omega, \]
\[ u = g, \quad \text{on } \partial\Omega, \]

and for each ε > 0, let u_ε ∈ H¹(Ω) be the solution to the regularized problem

\[ \int_\Omega \nabla u_\epsilon \cdot \nabla v \, dx + \frac{1}{\epsilon} \int_{\partial\Omega} (u_\epsilon - g) v \, ds = \int_\Omega f v \, dx \quad \forall v \in H^1(\Omega). \tag{C.2} \]

Here ds denotes the surface area measure. This problem corresponds to perturbing the boundary data so that instead of u = g we have u + ε∇u·n = g on ∂Ω. One can show that (C.2) is well posed and that u_ε → u in H¹(Ω) as ε → 0 [181]. Hence, we see that the extra penalty term is added in order to force, in the limit ε → 0, the satisfaction of the boundary conditions.
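The effect of the penalty term can be reproduced with a few lines of finite differences in 1D; the discretization below is my own illustration, not the one used in the paper:

```python
import numpy as np

def poisson_penalty_1d(eps, n=100):
    """Solve -u'' = 0 on (0,1) with penalized Dirichlet data u(0)=0, u(1)=1.

    The regularized problem replaces u = g with u + eps * du/dn = g on
    the boundary (a Robin condition); as eps -> 0 the solution tends to
    the exact u(x) = x.
    """
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(1, n):                           # interior: -u'' = 0
        A[i, i - 1], A[i, i], A[i, i + 1] = -1.0, 2.0, -1.0
    # Robin rows with one-sided differences for the outward derivative
    A[0, 0], A[0, 1] = 1.0 + eps / h, -eps / h      # at x=0: u - eps*u' = 0
    A[n, n], A[n, n - 1] = 1.0 + eps / h, -eps / h  # at x=1: u + eps*u' = 1
    b[n] = 1.0
    return x, np.linalg.solve(A, b)
```

For this toy problem the regularized solution is exactly u_ε(x) = (ε + x)/(1 + 2ε), so the boundary error decays linearly in ε.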
Just as the satisfaction of the Dirichlet boundary data was imposed weakly in (C.2), so can inter-element continuity be attained in a similar fashion. It was this observation that originally led to the development of the interior penalty (IP) methods [20, 83, 272]. Arnold et al. [21] recently recognized that the IP methods, along with several other methods with discontinuous approximation spaces, can be classified as DG methods. These methods differ in the flux-approximating schemes used to force continuity at the inter-element interfaces. We now describe the general framework for the DG methods with respect to the elliptic problem (C.1).

Let T(Ω) = {T} be a family of elements in a partitioning of Ω and define ∂T = ∪{∂T : T ∈ T(Ω)}, Γ = ∂T\∂Ω and γᵢⱼ = ∂Tᵢ ∩ ∂Tⱼ for Tᵢ, Tⱼ ∈ T(Ω). Next, introduce approximation spaces Qʰ × Uʰ ⊂ (H¹(T))ᵈ × H¹(T), where

\[ H^k(\mathcal{T}) = \{ w \in L^2(\Omega) : w|_T \in H^k(T),\ T \in \mathcal{T}(\Omega) \}. \]
The DG method then seeks qʰ ∈ Qʰ_N = Qʰ ∩ Q_N and uʰ ∈ Uʰ_D = Uʰ ∩ U_D such that

\[ \int_{\mathcal{T}} a^{-1} q^h \cdot p \, dx = \int_{\mathcal{T}} u^h \, \nabla \cdot p \, dx - \int_{\partial\mathcal{T}} \hat{u} \, p \cdot n_T \, ds \quad \forall p \in Q^h_N, \tag{C.3} \]
\[ \int_{\mathcal{T}} \nabla \cdot q^h \, v \, dx = \int_{\mathcal{T}} f v \, dx + \int_{\partial\mathcal{T}} v \, (\hat{q} - q^h) \cdot n_T \, ds \quad \forall v \in U^h_D, \tag{C.4} \]

where û and q̂ are numerical fluxes approximating u and q on the element boundaries.
The perhaps simplest and most natural choice of numerical fluxes is to set

\[ (\hat{q}, \hat{u}) = \frac{1}{2} \left( (q^h, u^h)|_{T_i} + (q^h, u^h)|_{T_j} \right) \quad \text{on } \gamma_{ij}. \]
We see that this option, which was considered by Bassi and Rebay in [27], does not involve a penalty term and simply computes the fluxes by taking the average of the functional limits on each side of the inter-element interfaces γᵢⱼ. Though this option seems attractive, the lack of a penalty term renders the method unstable and may lead to a singular discretization matrix on certain grids. It is therefore clear that the stabilization of the DG methods via the inclusion of a penalty term is crucial. In fact, without it, not only is stability affected, but convergence is degraded or lost [21].
To define the numerical fluxes that will be used in this paper, it is convenient to introduce, for q ∈ Qʰ, u ∈ Uʰ, and x ∈ γᵢⱼ, the mean value operators

\[ \{u\}(x) = \frac{1}{2} (u_i(x) + u_j(x)), \qquad \{q\}(x) = \frac{1}{2} (q_i(x) + q_j(x)), \]

and the jump operators

\[ [u](x) = (u_i(x) - u_j(x)) \, n_{ij}, \qquad [q](x) = (q_i(x) - q_j(x)) \cdot n_{ij}. \]

Here (q_k, u_k) = (q, u)|_{T_k} and nᵢⱼ is the unit normal on γᵢⱼ pointing from Tᵢ to Tⱼ. We shall employ the numerical fluxes associated with the method of Brezzi et al. [48], which are

\[ \hat{u} = \{u^h\}, \qquad \hat{q} = \{q^h\} - \alpha [u^h]. \tag{C.5} \]
These numerical fluxes have been analyzed in [57] in the wider context of LDG (Local Discontinuous Galerkin) methods, and give a stable, convergent method when α = O(1/h). While many other numerical fluxes have been proposed for DG methods, see e.g. [21], we have chosen to use the Brezzi fluxes (C.5) because they are simple, stable, and consistent, and give the same rate of convergence (at least for elliptic problems with smooth coefficients) as more elaborate DG methods.
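At a single interface the average and jump operators, and the resulting Brezzi fluxes, look as follows; the jump convention [u] = (u_i − u_j) n_ij is the standard one assumed above, and the function name is my own:

```python
import numpy as np

def brezzi_fluxes(u_i, u_j, q_i, q_j, n_ij, alpha):
    """Numerical fluxes of Brezzi et al. at one interface gamma_ij.

    {u} and {q} are arithmetic means of the traces from T_i and T_j;
    the jump [u] = (u_i - u_j) * n_ij penalizes discontinuities in u:
        u_hat = {u},    q_hat = {q} - alpha * [u].
    """
    avg_u = 0.5 * (u_i + u_j)
    avg_q = 0.5 * (q_i + q_j)
    jump_u = (u_i - u_j) * n_ij
    return avg_u, avg_q - alpha * jump_u
```

When the traces agree across the interface, the jump vanishes and the penalized flux reduces to the plain average, which is the consistency property mentioned above.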
The need to construct approximation spaces for both the potential variable and the velocity variable leads to a relatively large number of degrees of freedom per element. However, it is standard procedure in the literature on DG methods to eliminate the velocity variable from the discretized equations. This elimination leads to the primal formulation:

\[ B^h(u^h, v) = \int_\Omega f v \, dx \quad \forall v \in U^h, \tag{C.6} \]

where the bilinear form Bʰ(·,·), defined in (C.7), contains the volume terms and the jump terms over Γ and ∂T\∂Ω, and q̂ = q̂(uʰ, qʰ) is defined with the understanding that qʰ satisfies

\[ \int_{\mathcal{T}} a^{-1} q^h \cdot p \, dx = \int_{\mathcal{T}} u^h \, \nabla \cdot p \, dx + \int_{\partial\mathcal{T}} [\hat{u} - u^h] \cdot \{p\} \, ds + \int_{\partial\mathcal{T} \setminus \partial\Omega} \{\hat{u} - u^h\} [p] \, ds. \tag{C.8} \]

If the unknowns associated with the velocity variable qʰ are numbered sequentially, element by element, then the matrix block that stems from the term on the left-hand side of (C.8) becomes block diagonal. This allows us to perform a Schur elimination of the discretization matrix to give the reduced form corresponding to Bʰ(·,·) at a low cost. Thus, to compute uʰ using the primal formulation, we first eliminate the velocity variable by Schur elimination. The next step is to solve (C.6) for uʰ. Finally, one obtains an explicit expression for the fluxes by back-solving for qʰ in (C.8).
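The Schur elimination exploits the block-diagonal velocity block; a dense toy sketch follows (in practice the block-diagonal inverse would never be assembled as a full matrix, and all names here are my own):

```python
import numpy as np

def schur_solve(A_blocks, B, C, g1, g2):
    """Solve the saddle system  A q + B u = g1,  C q = g2  by Schur elimination.

    A is block diagonal (one block per element), so A^{-1} is applied
    block-by-block at low cost; q is recovered by back-substitution.
    """
    Ainv = [np.linalg.inv(Ab) for Ab in A_blocks]
    sizes = [Ab.shape[0] for Ab in A_blocks]
    Ai = np.zeros((sum(sizes), sum(sizes)))
    off = 0
    for s, blk in zip(sizes, Ainv):          # assemble block-diagonal A^{-1}
        Ai[off:off + s, off:off + s] = blk
        off += s
    S = C @ Ai @ B                           # Schur complement
    u = np.linalg.solve(S, C @ Ai @ g1 - g2)
    q = Ai @ (g1 - B @ u)                    # back-solve for the velocity
    return q, u
```

The returned pair satisfies both block equations, which is easy to verify on a small example.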
For the numerical fluxes considered in this paper, we have û = {uʰ}. Thus, since q̂ is conservative, i.e., single-valued on ∂T, the integral over ∂T\∂Ω in Bʰ(uʰ, v) vanishes, and the primal form reduces to

\[ B^h(u^h, v) := \int_{\mathcal{T}} \nabla u^h \cdot a \nabla v \, dx - \int_\Gamma \left( [u^h] \cdot \{a \nabla v\} + \{\hat{q}\} \cdot [v] \right) ds. \tag{C.9} \]

Inserting the Brezzi flux q̂ from (C.5) gives the final primal form (C.10).
A rigorous analysis of the primal form (C.10) in the case of polynomial elements can be found in [21]. There it was shown that the bilinear form (C.10) is bounded and stable, provided that the stabilizing coefficient α is chosen sufficiently large. Hence, the same type of constraint applies whether we formulate the DG method using the mixed formulation (C.3)-(C.4) or the primal formulation (C.6) and (C.8) with the primal form (C.10).
If we choose a continuous numerical flux (C.11) for the velocity variable and an approximation space Uʰ ⊂ C(Ω), the discontinuous Galerkin formulation reduces to the standard FE variational formulation.
Similarly, in mixed FE methods one seeks a solution (qʰ, uʰ) of the elliptic problem (C.1) in a finite-dimensional subspace Qʰ_N × Uʰ_D of H(div, Ω) × L²(Ω). The subscripts N and D indicate that functions in Qʰ_N and Uʰ_D satisfy the homogeneous Neumann and Dirichlet conditions on ∂Ω_N and ∂Ω_D, respectively. The mixed FE solution is defined by the following mixed formulation:

\[ \int_\Omega a^{-1} q^h \cdot p \, dx = \int_\Omega u^h \, \nabla \cdot p \, dx \quad \forall p \in Q^h_N, \]
\[ \int_\Omega \nabla \cdot q^h \, v \, dx = \int_\Omega f v \, dx \quad \forall v \in U^h_D. \]
C.3 Multiscale methods
Many areas of science and engineering face the problem of unresolvable scales. For instance, in porous media flows the permeability of the porous medium is rapidly oscillatory and can span many orders of magnitude across short distances. By elementary considerations, it is impossible to do large-scale numerical simulations on models that resolve all pertinent scales down to, e.g., the scale of the pores. The standard way of resolving the issue of unresolvable scales is to build coarse-scale numerical models in which small-scale variations in the coefficients of the governing differential equations are homogenized and upscaled to the size of the grid blocks. Thus, in this approach small-scale variations in the coefficients are replaced with some kind of effective properties in regions that correspond to a grid block in the numerical model.

Multiscale methods have a different derivation and interpretation. In these methods one tries to derive coarse-scale equations that incorporate the small-scale variations in a more consistent manner. Here we present three different types of multiscale methods.

In many reservoir flow scenarios, the flow velocity field varies slowly in time away from the propagating saturation front. In such situations the base functions for the proposed multiscale methods need only be generated once, or perhaps a few times during the simulation [2]. In other words, all computations at the subgrid level become part of an initial preprocessing step.
The MsFEM base functions φ_kT are defined elementwise by a local homogeneous problem,

\[ \nabla \cdot (a(x) \nabla \phi_{kT}) = 0, \quad \text{in } T, \]
\[ \phi_{kT} = \nu_{kT}, \quad \text{on } \partial T \setminus \partial\Omega, \tag{C.12} \]
\[ \phi_{kT} = 0, \quad \text{on } \partial\Omega_D \cap \partial T, \]
\[ a \nabla \phi_{kT} \cdot n = 0, \quad \text{on } \partial\Omega_N \cap \partial T. \]

To ensure that the base functions are continuous, and hence belong to H¹(Ω), we require that φ_kTᵢ = φ_kTⱼ on all non-degenerate interfaces γᵢⱼ. The MsFEM now seeks uᵐˢ ∈ Uᵐˢ = span{φ_k : φ_k = Σ_T φ_kT} such that

\[ \int_\Omega \nabla u^{ms} \cdot a \nabla v \, dx = \int_\Omega f v \, dx \quad \forall v \in U^{ms}. \tag{C.13} \]

Since the base functions are determined by the homogeneous equation (C.12), it is clear that the properties of the approximation space Uᵐˢ, and hence the accuracy of the multiscale solution uᵐˢ, are determined by the boundary data ν_kT for the multiscale base functions φ_kT.
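In one spatial dimension the local problem (C.12) can be integrated exactly, which makes it a convenient sanity check for multiscale basis codes; the following sketch (names mine) uses a midpoint rule on a fine subgrid:

```python
import numpy as np

def msfem_basis_1d(a, n=200):
    """Multiscale basis on one coarse element [0,1]: (a(x) phi')' = 0,
    phi(0) = 0, phi(1) = 1.

    In 1D the local problem integrates exactly:
        phi(x) = int_0^x 1/a  /  int_0^1 1/a,
    so the basis function "sees" the fine-scale coefficient a(x).
    """
    x = np.linspace(0.0, 1.0, n + 1)
    xm = 0.5 * (x[:-1] + x[1:])             # midpoints of the fine cells
    w = (1.0 / a(xm)) * (x[1:] - x[:-1])    # cellwise int of 1/a
    phi = np.concatenate([[0.0], np.cumsum(w)])
    return x, phi / phi[-1]
```

For a constant coefficient the basis degenerates to the standard linear hat-function restriction, while an oscillatory a(x) bends φ to match the fine-scale structure.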
In [146, 147] it was shown using homogenization theory that for elliptic problems in two dimensions with two-scale periodic coefficients, the solution uᵐˢ tends to the correct homogenized solution as the scale of periodicity ε tends to zero. For a positive scale of periodicity, a relation between ε and the discretization scale h was established for linear boundary conditions. Moreover, using multiscale expansion of the base functions, they showed that with linear boundary conditions ν_kT at the interfaces, the resulting solution exhibits a boundary layer near the cell boundaries and satisfies

\[ \| u - u^{ms} \|_{L^2(\Omega)} = O(h^2 + \epsilon/h). \]
This shows that when ε and h are of the same order, a large resonance error is introduced. To reduce the resonance effect, which is caused by improper boundary conditions, Hou and Wu also introduced an oversampling technique, motivated by the observation that the boundary layer has a finite thickness of order O(ε). For further details about this oversampling technique we refer the reader to the article by Hou and Wu [146].
In the mixed MsFEM, a base function ψᵢⱼ is associated with each interface γᵢⱼ and is defined by

\[ \psi_{ij} = -a \nabla \phi_{ij}, \quad \text{in } T_i \cup T_j, \]
\[ \nabla \cdot \psi_{ij} = \begin{cases} |T_i|^{-1}, & \text{in } T_i, \\ -|T_j|^{-1}, & \text{in } T_j, \end{cases} \tag{C.14} \]

with suitable conditions imposed on γᵢⱼ, on (∂Tᵢ ∪ ∂Tⱼ) ∩ ∂Ω_D, and on (∂Tᵢ ∪ ∂Tⱼ)\(γᵢⱼ ∪ ∂Ω_D). Here nᵢⱼ is the coordinate unit normal to γᵢⱼ pointing from Tᵢ to Tⱼ and n is the outward unit normal on ∂(Tᵢ ∪ γᵢⱼ ∪ Tⱼ). We now define Qᵐˢ = span{ψᵢⱼ : γᵢⱼ ⊂ Γ} and seek (qᵐˢ, u) ∈ Qᵐˢ × P₀(T) which solves

\[ \int_\Omega a^{-1} q^{ms} \cdot p \, dx = \int_\Omega u \, \nabla \cdot p \, dx \quad \forall p \in Q^{ms}, \]
\[ \int_\Omega \nabla \cdot q^{ms} \, v \, dx = \int_\Omega f v \, dx \quad \forall v \in P_0(\mathcal{T}). \]

Again we see that the method is determined by the local boundary conditions for the base functions.
It is also possible to choose the right-hand side of the equations (C.14) differently, and in some cases it would be natural to do so. For instance, in reservoir simulation the right-hand side of equation (C.1) represents wells, and wells give rise to source terms that are nearly singular. For simulation purposes it is important that the velocity field is mass conservative. To this end, the right-hand side of equation (C.14) in the well blocks must be replaced with a scaled source term at the well location; see [2] for further details.
A rigorous convergence analysis for the mixed MsFEM has been carried out in [67] for the case of two-scale periodic coefficients, using results from homogenization theory. There it was shown that

\[ \| q - q^{ms} \|_{H(\mathrm{div}, \Omega)} + \| u - u^{ms} \|_{L^2(\Omega)} = O\!\left( h + \sqrt{\epsilon/h} \right). \]

Hence, again we see that a large resonance error is introduced when ε/h = O(1). As for the MsFEM, the possibility of using oversampling as a remedy for resonance errors was explored, and it was shown that oversampling can indeed be used to reduce resonance errors caused by improper boundary conditions. The need for oversampling strategies to reduce resonance errors is, however, a drawback of the MsFEMs, since oversampling leads to additional computational complexity.
In addition to the selection of boundary conditions for the base functions, the MsDGM is determined by the choice of numerical fluxes. As indicated in Subsection C.2.1, we limit our study to the numerical fluxes (C.5) of Brezzi et al.
We observe that MsDGMs have much in common with the mortar FE methods. Indeed, here as in the mortar methods, we construct the base functions locally in a manner which corresponds to the underlying partial differential equation, and glue the pieces together using a weak formulation at the interfaces. The new element here is that we derive the above formulation directly from the original and mixed MsFEMs.
Apart from imposing inter-element continuity weakly, the MsDGMs differ from the MsFEMs by using multiscale approximation spaces for both the velocity variable and the potential variable. Another motive for introducing a new multiscale method for elliptic problems is that the accuracy of MsFEM solutions can be sensitive to the boundary conditions for the base functions. Indeed, previous numerical experience [2, 1] shows that the MsFEMs with simple boundary conditions may produce solutions with poor accuracy when strong heterogeneous features penetrate the inter-element interfaces. Thus, by introducing a MsDGM we aim toward a class of multiscale methods that are less sensitive to resonance errors caused by improper boundary conditions for the multiscale basis functions.
(C.17)

Thus, since Uᵐˢ coincides with U, it follows from the projection property of the FE method that uᵐˢ = u_I, where u_I is the interpolant of the exact solution u in Uᵐˢ = U. This property is due to the fact that there is no resonance error caused by improper boundary conditions, and it implies that the conforming MsFEM induces an ideal domain decomposition preconditioner for elliptic problems in one spatial dimension.
In higher dimensions the choice of boundary conditions is no longer insignificant. In fact, the MsFEM may be viewed as an extension operator acting on Γ. Hence, the restriction of the solution uᵐˢ to Γ must lie in the space spanned by the boundary conditions for the base functions. To clarify the relation between the approximation properties of the MsFEM and the selection of boundary conditions, we consider the following homogeneous boundary value problem: find u ∈ H₀¹(Ω) such that

\[ a(u, v) := \int_\Omega \nabla u \cdot a(x) \nabla v \, dx = \int_\Omega f v \, dx \quad \forall v \in H^1_0(\Omega). \]

Now, let M = H₀¹(Ω)|_Γ and define the following extension operator H : M → H₀¹(Ω):

\[ a(H\mu, v) = 0 \quad \forall v \in H^1_0(\Omega \setminus \Gamma), \qquad (H\mu)|_\Gamma = \mu. \tag{C.18} \]
C.4 Numerical results
the approximation space Uᵐˢ for the potential variable u, and the Raviart-Thomas mixed FEM to compute the basis functions that span the approximation space Qᵐˢ for the velocity variable q. Finally, the reference solution is computed using the Raviart-Thomas mixed FEM. We assess the accuracy of the tested methods with the weighted error measures

\[ E(u_h) = \frac{\| u_h - u_r \|_2}{\| u_r \|_2}, \qquad E(q_h) = \frac{1}{2} \left( \frac{\| q_h^x - q_r^x \|_2}{\| q_r^x \|_2} + \frac{\| q_h^y - q_r^y \|_2}{\| q_r^y \|_2} \right). \]

Here ‖·‖₂ is the L²(Ω)-norm, the subscript h denotes computed solutions, the subscript r refers to the reference solution, and the superscripts x and y signify velocity components. When comparing velocity fields we do not include the FEMs, since these methods are not conservative.
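These error measures translate directly into code, assuming the solutions are sampled on a common uniform grid so that the discrete 2-norm is proportional to the L² norm (function names are my own):

```python
import numpy as np

def potential_error(u_h, u_r):
    """Weighted L2 potential error E(u_h) = ||u_h - u_r|| / ||u_r||."""
    return np.linalg.norm(u_h - u_r) / np.linalg.norm(u_r)

def velocity_error(qx_h, qy_h, qx_r, qy_r):
    """E(q_h): average of the componentwise relative L2 velocity errors."""
    ex = np.linalg.norm(qx_h - qx_r) / np.linalg.norm(qx_r)
    ey = np.linalg.norm(qy_h - qy_r) / np.linalg.norm(qy_r)
    return 0.5 * (ex + ey)
```

Because both measures are relative, any constant grid weight in the discrete norm cancels between numerator and denominator.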
This type of coefficient a(x, y) gives rise to spurious oscillations in the velocity field, and the source term f(x, y) exerts a low-frequency force. We shall fix P = 1.8, NM = 512 and ε = 1/64 for our numerical test cases. We thus get significant subgrid variation for N < 64, while the resonance is greatest at N = 64. When N > 64 the characteristic scale of variation is resolved by the coarse mesh, and the use of multiscale methods is no longer necessary.
Table C.1 shows the errors E(u_h) in the potential for all the methods. We see that none of the monoscale methods perform particularly well, as they cannot pick up subgrid variations in the coefficients, but the DGM is somewhat more accurate than the other methods. The multiscale methods, on the other hand, generate quite accurate potential fields. The MsFEM is most accurate here, but the MsDGM is nearly as accurate, and for very coarse meshes it is the most accurate method. The least accurate multiscale method is the MsMFEM. This is probably due to the fact that piecewise constant functions are used to approximate the potential. The results shown in Table C.2 demonstrate that the monoscale methods tend to give more accurate velocity fields than potential fields, but we still see that the multiscale methods give much higher accuracy. We observe also that for this test case the MsMFEM gives more accurate velocity fields than the MsDGM.
The accuracy of the DG methods depends on the penalty parameter α in (C.5). The results in Table C.1 and Table C.2 correspond to the value of α that produced
Table C.1: Potential errors for oscillatory coefficients. For the DGM and the MsDGM, the numbers presented correspond to the choice of α that gave the smallest error.

N   M   MFEM    DGM     MsMFEM  MsDGM
8   64  0.5533  0.6183  0.2985  0.4291
16  32  0.5189  0.5388  0.2333  0.2842
32  16  0.5093  0.5144  0.2377  0.2579
64  8   0.5058  0.5079  0.2866  0.3177
Table C.2: Relative velocity errors for oscillatory coefficients. For the DG methods the numbers presented correspond to the choice of that gave the smallest
error.
the best solutions. Since we do not know a priori what is, it is natural to ask
how the error behaves as a function of , and, in particular, how sensitive the
DG methods are to the penalty parameter . We have therefore plotted E(uh )
and E(qh ) for both the DG method and the MsDGM in Figures C.1 and C.2 as
functions of . We see that the DG method only converges for a succinct choice of
, except for N = 64. In contrast, the MsDGM converges with good accuracy for
sufficiently large . These plots thus demonstrate that; (1): for elliptic problems
with oscillating coefficients the MsDGM is less sensitive to than the monoscale
DG methods, and (2): the convergence behavior for the MsDGM seem to be in
accordance with the convergence theory for DG methods for elliptic problems
with smooth coefficients.
Figure C.1: Errors induced by the DG method as functions of the penalty parameter, for N = 8, 16, 32 and 64. The solid line is the potential error, and the dashed line is the velocity error.
Figure C.2: Errors induced by the MsDGM as functions of the penalty parameter, for (N, M) = (8, 64), (16, 32), (32, 16) and (64, 8). The solid line is the potential error, and the dashed line is the velocity error. Note that the error decreases monotonically as the penalty parameter increases, and that the method converges for penalty parameters larger than O(1/h).
  M    MsMFEM   MsDGM
  64   0.3375   0.4849
  32   0.2851   0.3716
  16   0.2941   0.3472
  8    0.3346   0.3624

Table C.3: Potential errors E(uh) (left) and velocity errors E(qh) (right) for an elliptic problem with random coefficients.
the MsDGM generates by far the most accurate potential field, in fact, by almost an order of magnitude. The MsMFEM still produces the most accurate velocity field, but the MsDGM produces a velocity field with comparable accuracy. These results are representative of the results we obtained for a variety of different random coefficient functions. Again, the penalty parameter in the MsDGM numerical flux function was tuned to give the best results.
The results in Table C.3 indicate that the MsDGM may be more robust than the MsFEM. Unfortunately, for this case the MsDGM was more sensitive to the penalty parameter than what we observed in Subsection C.4.1, and choosing it too large can deteriorate the convergence, see Figure C.3. However, the optimal penalty for the potential (approximately 13, 34, 83 and 188 for N = 8, 16, 32 and 64, respectively) still scales like O(1/h). This suggests that good accuracy should be obtained by choosing the penalty parameter proportional to 1/h with a fixed O(1) constant.
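As a quick plausibility check (our own, not part of the original experiments), the optimal penalties quoted above can be divided by N = 1/h to confirm the claimed scaling:

```python
# Plausibility check (ours): the optimal penalty parameters quoted in the text
# should scale like O(1/h) = O(N), i.e. penalty/N should stay roughly constant.
optimal = {8: 13, 16: 34, 32: 83, 64: 188}  # N -> best observed penalty

ratios = [penalty / n for n, penalty in sorted(optimal.items())]
print(ratios)  # [1.625, 2.125, 2.59375, 2.9375]

# All ratios sit in a narrow O(1) band, consistent with penalty ~ C/h.
assert all(1.0 < r < 3.5 for r in ratios)
```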
C.5 Concluding remarks
In this paper, we have used approximation spaces from two different multiscale finite element methods in order to develop a multiscale discontinuous Galerkin method for elliptic problems with multiple-scale coefficients. Unlike the multiscale finite element methods, the multiscale discontinuous Galerkin method introduced in this paper provides detailed solutions for both velocity and potential that reflect fine-scale structures in the elliptic coefficients. This makes the multiscale discontinuous Galerkin method an attractive tool for solving, for instance, pressure equations that arise from compressible flows in heterogeneous porous media. Indeed, in compressible flow simulations it is not sufficient to resolve the velocity field well; an accurate pressure field is also needed.
Numerical comparisons with both monoscale and multiscale methods have been given. The results show that monoscale numerical methods are inadequate when it comes to solving elliptic problems with multiple-scale solutions. We have further demonstrated that the multiscale discontinuous Galerkin method produces
Figure C.3: Logarithmic plot of errors for the MsDGM for random coefficients, for (N, M) = (8, 64), (16, 32), (32, 16) and (64, 8). The plots should be compared with the results depicted in Figure C.2.
solutions with comparable or higher accuracy than the solutions produced by the corresponding multiscale finite element methods. To summarize the results for the multiscale methods, we plot the errors for all of them in Figure C.4. This figure shows that the velocity solutions obtained with the multiscale discontinuous Galerkin method have accuracy comparable to the velocity solutions obtained with the corresponding mixed multiscale finite element method. The potential solutions produced by the multiscale discontinuous Galerkin method, on the other hand, are at least as accurate as the potential solutions obtained with both of the multiscale finite element methods.
In the present paper we have not provided any convergence theory, but it is likely that convergence results can be obtained using results from homogenization theory in conjunction with the convergence theory for discontinuous Galerkin methods for elliptic problems with smooth coefficients. Moreover, since the discontinuous Galerkin methods appear to give accuracy comparable to the multiscale mixed finite element methods, for which error estimates based on homogenization theory have been established, one can expect the discontinuous Galerkin methods to enjoy similar error estimates. We have also shown that the multiscale discontinuous Galerkin method appears to converge for values of the penalty parameter in the numerical flux function that are in accordance with the convergence theory for Galerkin methods for elliptic problems with smooth coefficients.

Figure C.4: Plot of the errors as functions of grid size for oscillatory (top) and random (bottom) coefficients, respectively. The left figures show potential errors, and the right figures show the velocity errors. The MsDGM is the solid line, the MsMFEM is the dashed line, and the MsFEM is the dash-dotted line.

A rigorous convergence analysis of the multiscale discontinuous Galerkin methods is a topic for further research.
Acknowledgments
We would like to thank Kenneth H. Karlsen and Ivar Aavatsmark for useful comments. The second author would also like to thank the Research Council of Norway for financial support.
Paper D
Level set methods for a parameter
identification problem
Bjørn-Ove Heimsund, Tony Chan, Trygve K. Nilssen and Xue-Cheng Tai
Abstract
We present a level set based method for recovering piecewise constant diffusion coefficients from an elliptic equation. The level set method is used to represent the coefficients, and the augmented Lagrangian method is used to solve the output-least-squares functional with the equation as a constraint. We do not assume that the number of different constant coefficients is known a priori; it suffices to know only an upper bound on the number of constant values. For a maximum of n different coefficient values, only log2 n level set functions are needed. The geometry of the different constant-coefficient regions is automatically determined by the level set method.

Published in the book Analysis and optimization of differential systems, pages 189–200, by Kluwer Academic Publishers in 2003. Volume 121 of the International Federation for Information Processing (IFIP) series.
D.1 Introduction

    −∇·(q∇u) = f  in Ω.    (D.1)
D.2 The level set method

Here, we state some of the details of the level set idea. Let Γ be a closed curve in Ω. Associated with Γ, we define φ as a signed distance function by

    φ(x) =  distance(x, Γ)   for x in the interior of Γ,
    φ(x) = −distance(x, Γ)   for x in the exterior of Γ.

In many applications, the movement of the curve Γ can be described by a partial differential equation for the function φ. The function φ is called a level set function for Γ. In fact, φ is the unique viscosity solution of the following partial differential equation

    |∇φ| = 1  in Ω.    (D.2)
In this work, we shall use the level set method to identify the coefficient q, which is assumed to be piecewise constant. First look at a simple case, i.e. assume that q has a constant value q1 inside a closed curve Γ and another constant value q2 outside the curve Γ. Utilizing the Heaviside function H(φ), which is equal to 1 for positive φ and 0 elsewhere, it is easy to see that q can be represented as

    q = q1 H(φ) + q2 (1 − H(φ)).    (D.3)

In order to identify the coefficient q, we just need to identify a level set function φ and the piecewise constant values qi.
If q has many pieces, then we need to use multiple level set functions. This idea was introduced in Chan and Vese [60]. Assume that we have two closed curves Γ1 and Γ2, and we associate the two level set functions φj, j = 1, 2, with these curves. Then the domain is divided into the four parts

    Ω++ = {x : φ1 > 0, φ2 > 0},
    Ω+− = {x : φ1 > 0, φ2 ≤ 0},
    Ω−+ = {x : φ1 ≤ 0, φ2 > 0},
    Ω−− = {x : φ1 ≤ 0, φ2 ≤ 0}.

Allowing some of the subdomains defined above to be empty, we can easily handle the case that the zero level set curves merge, split or disappear. Using the Heaviside function again, we can express q with possibly up to four pieces with constant values as

    q = q1 H(φ1)H(φ2) + q2 H(φ1)(1 − H(φ2)) + q3 (1 − H(φ1))H(φ2) + q4 (1 − H(φ1))(1 − H(φ2)).    (D.4)

More generally, with n level set functions φ1, φ2, …, φn, up to 2^n constant values can be represented:

    q = q1 H(φ1) H(φ2) ⋯ H(φn) + ⋯ + q_{2^n} (1 − H(φ1))(1 − H(φ2)) ⋯ (1 − H(φn)).    (D.5)
Even if we need fewer than 2^n distinct regions, we can still use n level set functions, since some subdomains may be empty. In using such a representation, we need to determine the maximum number of level set functions we want to use before we start.
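A minimal sketch of the representation (D.3)–(D.5) — our own illustration, where the bit-pattern ordering of the values qi is just one possible convention — evaluates q pointwise from n level set functions:

```python
import itertools

def heaviside(phi):
    """H(phi) = 1 for positive phi and 0 elsewhere."""
    return 1.0 if phi > 0 else 0.0

def coefficient(values, levelsets, x):
    """Evaluate q(x) as in (D.5): a sum of 2^n products of Heaviside factors.

    values    -- list of 2^n constants q_1, ..., q_{2^n}
    levelsets -- list of n level set functions phi_j
    The i-th value is paired with a sign pattern: bit 0 means the factor
    H(phi_j), bit 1 means the factor 1 - H(phi_j).
    """
    n = len(levelsets)
    q = 0.0
    for i, bits in enumerate(itertools.product([0, 1], repeat=n)):
        term = values[i]
        for phi, bit in zip(levelsets, bits):
            h = heaviside(phi(x))
            term *= h if bit == 0 else 1.0 - h
        q += term
    return q

# Two level sets splitting the line at 0 and 1 into distinct sign regions.
phi1 = lambda x: x          # positive for x > 0
phi2 = lambda x: x - 1.0    # positive for x > 1
q = lambda x: coefficient([4.0, 3.0, 2.0, 1.0], [phi1, phi2], x)

print(q(2.0), q(0.5), q(-1.0))  # → 4.0 3.0 1.0
```

At each point exactly one of the 2^n products is nonzero, so q(x) picks out the constant of the region containing x, and empty regions simply never contribute.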
For many practical applications, such a priori information is often available or is chosen according to the measurements that are available to us. Also, to ensure ellipticity of equation (D.1), we need each qi to be positive; that is, we assume that there exist 0 < ai < bi < ∞, known a priori, such that qi ∈ [ai, bi].
D.3
    ⟨e(q, u), v⟩ = (q∇u, ∇v) − (f, v)  for all v ∈ H¹₀(Ω).    (D.6)

Here and later (·, ·) denotes the L²-inner product over Ω, and we will also use the notation ‖·‖ to denote the associated norm. For a given q and a given u, we say that they satisfy the equation (D.1) if and only if e(q, u) = 0. In order to solve our inverse problem, we shall try to find a q and u such that e(q, u) = 0 and such that u also fits the measurements best among all admissible functions q and u. Using the level set representation, the coefficient q is a function of the level set functions φj and the piecewise constant values qi, and the minimization problem we need to solve takes the form

    min over qi, φj, u of  (1/2) ‖u − u*‖²_{L²(Ω)} + Σ_{j=1}^n ∫_Ω |∇H(φj)| dx,    (D.7)

where u* denotes the measured data, subject to

    |∇φj| = 1  for all j.    (D.8)
D.3.1 Calculation of ∂L/∂qi

The derivative of L with respect to qi is

    ∂L/∂qi = ⟨λ, ∂e/∂qi⟩ + c ⟨e, ∂e/∂qi⟩ = ⟨∂e/∂qi, λ + c e⟩.

From (D.6), it follows that the derivative of e with respect to qi is

    ⟨∂e/∂qi, v⟩ = ((∂q/∂qi) ∇u, ∇v).

Taking v to be λ + c e gives

    ∂L/∂qi = ((∂q/∂qi) ∇u, ∇(λ + c e)).
D.3.2 Calculation of ∂L/∂φj

For clarity of the presentation, we shall first calculate the Gâteaux derivative of the regularization term, i.e. we first calculate the Gâteaux derivative of the functional

    R(φj) = ∫_Ω |∇H(φj)| dx = ∫_Ω δ(φj) |∇φj| dx.

Here and later, δ denotes the Dirac function. To get the derivative of R with respect to φj in a direction ψj, we proceed as

    ⟨∂R/∂φj, ψj⟩ = ∫_Ω δ′(φj) ψj |∇φj| dx + ∫_Ω δ(φj) (∇φj · ∇ψj)/|∇φj| dx.

Applying Green's formula to the last term, which can be theoretically verified by replacing the delta function by a smooth function and then passing to the limit, we get

    ⟨∂R/∂φj, ψj⟩ = ∫_Ω δ′(φj) ψj |∇φj| dx − ∫_Ω [δ′(φj) |∇φj|²/|∇φj| + δ(φj) ∇·(∇φj/|∇φj|)] ψj dx
                 = −∫_Ω δ(φj) ∇·(∇φj/|∇φj|) ψj dx,    (D.9)

since |∇φj|²/|∇φj| = |∇φj| makes the δ′ terms cancel; i.e.

    ∂R/∂φj = −δ(φj) ∇·(∇φj/|∇φj|).

Taking v to be λ + c e, we get that

    ⟨∂L/∂φj, ψj⟩ = ((∂q/∂φj) ψj ∇u, ∇(λ + c e)) + ⟨∂R/∂φj, ψj⟩.    (D.10)

From (D.5), it is easy to calculate the Gâteaux derivative ∂q/∂φj. For simplicity of the presentation, let us take the case where we only have two level set functions. Then q takes the form (D.4). Consequently, the Gâteaux derivative with respect to φj in a direction ψj is

    (∂q/∂φ1) ψ1 = [(q1 − q3) H(φ2) + (q2 − q4)(1 − H(φ2))] δ(φ1) ψ1,    (D.11)
    (∂q/∂φ2) ψ2 = [(q1 − q2) H(φ1) + (q3 − q4)(1 − H(φ1))] δ(φ2) ψ2.    (D.12)
D.3.3 Calculation of ∂L/∂u

We perturb u to u + w and calculate the Gâteaux derivative of L with respect to u in the direction w.

2. If ‖e‖ < εe, update the multiplier by λ^{k+1} = λ^k + c e(qi^{k+1}, φj^{k+1}, u^{k+1}); else increase c by a fixed factor.
φj^{k+1} = 0 on Γj^k. For a one-dimensional problem this can easily be done, but in higher dimensions we can use methods described in Osher and Fedkiw [210], and in Smereka, Sussman, Fatemi and Osher [252]. There are fast and cheap algorithms to solve this problem, see [252, 210].
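A toy one-dimensional reinitialization (our own sketch, not the algorithms of [210, 252]) illustrates the idea: locate the zero crossings of φ and rebuild the signed distance to the nearest one, keeping the original sign:

```python
def reinitialize_1d(x, phi):
    """Rebuild phi as a signed distance function on a 1-D grid.

    x, phi -- lists of grid points and level set values.
    Zero crossings are located by linear interpolation; the new phi keeps the
    sign of the old one, but its magnitude becomes the distance to the nearest
    zero crossing, so |phi'| = 1 away from the crossings.
    """
    zeros = []
    for i in range(len(x) - 1):
        a, b = phi[i], phi[i + 1]
        if a == 0.0:
            zeros.append(x[i])
        elif a * b < 0.0:  # sign change: interpolate the crossing
            zeros.append(x[i] - a * (x[i + 1] - x[i]) / (b - a))
    if phi[-1] == 0.0:
        zeros.append(x[-1])
    return [
        (1.0 if p > 0 else -1.0) * min(abs(xi - z) for z in zeros) if p != 0.0 else 0.0
        for xi, p in zip(x, phi)
    ]

# A distorted level set with a zero at x = 0.5 becomes a true distance function.
x = [i / 10.0 for i in range(11)]
phi = [(xi - 0.5) ** 3 for xi in x]  # same zero set, wrong slope
print(reinitialize_1d(x, phi)[0])  # → -0.5
```

The zero set is preserved exactly while the slope is restored to one, which is the purpose of the reinitialization step.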
Figure D.1: The first case; panel (a) shows the initial and panel (b) the final level set curves. Convergence was attained after about 300 iterations. Also, ‖uh − u‖₂ ≈ 1.1906 × 10⁻⁴ at the time of convergence.
Figure D.2: The second case, with perturbed qi's, starting from the same initial level sets as in the first case. For q⁻ it took about 350 iterations to converge, and ‖uh − u‖₂ ≈ 1.3027 × 10⁻⁴ at that time, while for q⁺ it only took about 100 iterations, and here ‖uh − u‖₂ ≈ 1.1182 × 10⁻⁴. Since the discontinuities are smaller in this latter case, this quicker convergence is expected.
219
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Figure D.3: Here is the third case, with noise added to u . After about
100 iterations, we had convergence, and the error kuh uk2 was about
1.7251 104 , 1.5019 104 and 1.4223 104 for noise amplitudes of 102 , 5 103
and 103 respectively. More noise made it generally impossible to get convergence, and with less noise the solution was not distinguishable from the case
without noise.
D.4 Numerical experiments

In our numerical tests, we will consider three cases. First, the qi's are all known, and we try to identify φj and u; then we perturb the qi's, and attempt to identify φj and u with the qi's fixed. Thirdly, we add noise to the observations of u, and try to identify φj and u while the qi's are known.

The equation we will use is

    −∇·(q∇u) = 2π² sin(πx) sin(πy)  in Ω,
            u = 0                   on ∂Ω.    (D.13)

Here, Ω = (0, 1) × (0, 1). All experiments are done with a uniform 2D mesh of Ω with 24 × 24 elements. The numerical parameters used are a tolerance of 10⁻⁷, c = 5 × 10⁶, a factor of 1.1 for updating c, and s = 25.
We shall use q given by two level set functions as follows: q1 = 1 when φ1, φ2 ≤ 0; q2 = 1 + 2·1/3 when φ1 ≤ 0, φ2 > 0; q3 = 1 + 2·2/3 when φ1 > 0, φ2 ≤ 0; and finally q4 = 3 when φ1, φ2 > 0. φ1 is positive within the union of the rectangles

    {x, y : 3/4 < x < 7/8, 1/8 < y < 7/8} ∪ {x, y : 1/8 < x < 7/8, 3/4 < y < 7/8},

and φ2 is likewise positive within the union

    {x, y : 1/8 < x < 1/4, 1/8 < y < 7/8} ∪ {x, y : 1/8 < x < 7/8, 1/8 < y < 1/4}.

In all the figures, we draw the zero level set curves of the level set functions. The dashed lines are for the true solution, and the solid lines are for the computed solution. Furthermore, φ1 and its computed approximation are in the upper-right part of the figures, while φ2 is in the lower-left.

In all cases, we let the observations extend from the boundary of Ω into the interior by a length of 1/3 from all sides. In the remainder, we let the observations be on a mesh coarser than the computational mesh by a factor of two. Thus we have complete observations of u along the boundary, and coarser observations in the interior. Note that ‖∇u‖ ≈ 0 near the center of Ω. Because of this, we cannot easily identify q in the center, since for ‖∇u‖ = 0, q is no longer unique.
Solving the first case yields the results in Figure D.1, which are quite accurate.
For the second case, we will use the two sets of coefficients q⁻ and q⁺, obtained by perturbing q1 and q4 by 0.1, and q2 and q3 by 0.05; the signs of the perturbations are chosen so that q⁻ has larger jumps and q⁺ smaller jumps. The results are in Figure D.2, and we see that it is easier to identify q⁺ than q⁻, due to the smaller jumps. Also note that the deviations from Figure D.1 are small.
We now come to the third case, where we add noise to our observations and try to find φj and u. Adding normally distributed noise of varying magnitude gives the results in Figure D.3. With a noise magnitude larger than 10⁻² it is generally hard to get convergence, while a noise magnitude of less than 10⁻³ hardly makes much of an impact on our case.
Paper E
On a class of ocean model
instabilities that may occur when
applying small time steps, implicit
methods, and low viscosities
Bjørn-Ove Heimsund and Jarle Berntsen
Abstract
The horizontal grid size in ocean models may be gradually reduced with increasing computer power. This allows simulations with less viscosity. The hyperbolic,
free wave nature of the problem will become more dominant. However, as viscosities are reduced, instabilities may occur more easily. It is well known from the literature on numerical methods for ordinary differential equations that implicit methods have attractive stability properties. Therefore, it may be tempting to add implicitness to the time stepping in low-viscosity ocean modelling as well, to improve robustness. However, when using methods with
implicit features and low viscosity, it may happen that models are stable for longer
time steps, but become unstable as the time step is reduced. In the present paper,
an explanation for this phenomenon is given. It is shown that the explanation may
be valid when solving the shallow water equations on a C-grid.
E.1 Introduction
E.2 Stability analysis
Consider the scalar test equation du/dt = λu and the θ-method with θ ∈ [0, 1], and let z = λΔt. Assuming a solution of the form uⁿ = κⁿ, the amplification factor is

    κ = (1 + (1 − θ)z) / (1 − θz).

Since the solution is assumed to be κⁿ, it will be bounded if |κ| ≤ 1. The set of z-values which give bounded solutions is the stability region of the method. As an example, for θ = 0 the method is the forward Euler method, and

    κ = 1 + z,

which gives a stability region given by z = r exp(iα) − 1, α ∈ [0, 2π), r ∈ [0, 1]. This is the interior of a unit circle centred at z = −1. For θ = 1/2, the method is the trapezoidal method, and

    κ = (2 + z) / (2 − z).    (E.1)

For this method the stability region is the negative half of the complex plane, Re z ≤ 0.
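The amplification factor above is easy to evaluate directly; the following sketch (function names are ours) confirms the forward Euler and trapezoidal special cases:

```python
def amplification(z, theta):
    """Amplification factor of the theta-method applied to u' = lambda*u, z = lambda*dt."""
    return (1.0 + (1.0 - theta) * z) / (1.0 - theta * z)

# Forward Euler (theta = 0): kappa = 1 + z, so z = -1 is the centre of the
# unit-circle stability region and maps to kappa = 0.
assert amplification(-1.0, 0.0) == 0.0

# Trapezoidal rule (theta = 1/2): |kappa| = 1 exactly on the imaginary axis,
# and |kappa| > 1 whenever Re z > 0, i.e. the stability region is Re z <= 0.
assert abs(abs(amplification(0.3j, 0.5)) - 1.0) < 1e-12
assert abs(amplification(0.01 + 0.3j, 0.5)) > 1.0
```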
An often used ODE solver is the multistage Runge–Kutta 4 (RK4) method,

    u^{n+1} = uⁿ + (1/6)(a + 2b + 2c + d),

with

    a = zuⁿ,  b = z(uⁿ + a/2),  c = z(uⁿ + b/2),  and  d = z(uⁿ + c).

Assuming a polynomial solution κⁿ, we get

    κ = 1 + z + (1/2)z² + (1/6)z³ + (1/24)z⁴.    (E.2)

This is evaluated numerically, and the boundary of the stability region is given in Figure E.1.
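Equation (E.2) can be checked numerically; the sketch below (our own helper) verifies that κ reduces to 1 at z = 0, tracks exp(z) for small z, and exceeds one in magnitude whenever Re z > 0, no matter how small the step:

```python
import cmath

def kappa_rk4(z):
    """Amplification factor (E.2) of the classical RK4 method, z = lambda*dt."""
    return 1.0 + z + z**2 / 2.0 + z**3 / 6.0 + z**4 / 24.0

assert kappa_rk4(0.0) == 1.0

# For small z, kappa approximates exp(z) to fourth order; with Re z > 0 the
# modulus therefore exceeds one however small the time step is.
z = 1e-3 + 0.05j
assert abs(kappa_rk4(z) - cmath.exp(z)) < 1e-8
assert abs(kappa_rk4(z)) > 1.0
```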
The third method to be considered is the Leapfrog predictor, Adams–Moulton 2-step corrector scheme (AM2+LF), as used in ROMS:

    ũ^{n+1} = u^{n−1} + 2zuⁿ,
    u^{n+1} = uⁿ + z(a1 ũ^{n+1} + a2 uⁿ + a3 u^{n−1}).
Figure E.1: Stability region for the RK4 method. The real axis ranges from −3 to 0.5, and the imaginary axis ranges from −3 to 3. The amplification factor is exactly equal to one in magnitude along the drawn lines, and the magnitude is smaller than one on the inside. Note that parts of the positive half-plane are included in the stability region.
The coefficients are a1 = 5/12, a2 = 2/3 and a3 = −1/12. To analyse stability, we insert the predictor into the corrector, getting

    u^{n+1} − uⁿ = 2a1 z² uⁿ + (a2 uⁿ + (a1 + a3) u^{n−1}) z.

Assuming a polynomial solution uⁿ = κⁿ, the previous equation may be written as

    κ(κ − 1) = 2a1 z² κ + (a2 κ + a1 + a3) z.    (E.3)

For each value of z, the equation yields two roots κ. As is known from the theory of multistep methods, both roots must satisfy |κ| ≤ 1, with strict inequality if they are equal. Using numerical evaluation, the outline of the stability region is given in Figure E.2.
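The two roots of (E.3) follow from the quadratic formula; the sketch below (our own helper names) verifies that the principal root tracks exp(z) for small z and exceeds one in magnitude when Re z > 0:

```python
import cmath

A1, A2, A3 = 5.0 / 12.0, 2.0 / 3.0, -1.0 / 12.0

def roots_am2lf(z):
    """Both roots kappa of (E.3): kappa^2 - (1 + 2*a1*z^2 + a2*z)*kappa - (a1 + a3)*z = 0."""
    b = 1.0 + 2.0 * A1 * z**2 + A2 * z
    c = -(A1 + A3) * z          # constant term of the quadratic
    disc = cmath.sqrt(b**2 - 4.0 * c)
    return (b + disc) / 2.0, (b - disc) / 2.0

# z = 0: the principal (physical) root is 1, the spurious root is 0.
assert roots_am2lf(0.0) == (1.0, 0.0)

# For a small z with positive real part the principal root tracks exp(z),
# so its modulus exceeds one and instability appears as dt -> 0.
principal, spurious = roots_am2lf(1e-3)
assert abs(principal - cmath.exp(1e-3)) < 1e-6
assert abs(principal) > 1.0
```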
E.3 Numerical experiment
Let f be the Coriolis parameter, g the gravity constant, U and V the depth-integrated transports in the x and y directions, respectively, η the sea surface elevation,
Figure E.2: Stability region for the AM2+LF predictor–corrector method. The real axis ranges from −1.2 to 0.075, and the imaginary axis ranges from −1.6 to 1.6. The amplification factor is exactly equal to one in magnitude along the drawn line, and the magnitude is smaller than one on the inside. As for the RK4 method, parts of the positive half-plane are included in the stability region.
and H the undisturbed water depth. The linearised shallow water equations in Cartesian coordinates (x, y) may then be written

    ∂U/∂t = −gH ∂η/∂x + fV,
    ∂V/∂t = −gH ∂η/∂y − fU,
    ∂η/∂t = −∂U/∂x − ∂V/∂y.    (E.4)
modes. A theoretical explanation was also given, and three case studies were presented. To illustrate the main point, the case with the simplest possible C-grid ocean model involving U, V and η points is repeated below, see Figure E.3. The space-discretised system may be written

    w_t = A w,

where w contains the five unknowns η1, η2, η3, U and V, and the entries of the 5 × 5 matrix A involve f/4, gHu/Δx, gHv/Δy, 1/Δx and 1/Δy. Here Hu and Hv are the depths in the U and V points, respectively, and Δx and Δy are the lengths of the grid cells in the two directions. In the experiment g = 9.81 m s⁻², f = 1.3 × 10⁻⁴ s⁻¹, Δx = Δy = 20000.0 m, Hu = 200.0 m and Hv = 2000.0 m. The initial values are η1 = η3 = 0.0 m, η2 = 1.0 m, U = V = 0.0 m² s⁻¹. For the values of the parameters given above, the 5 eigenvalues of the matrix A are λ1,2 = 0.7665 × 10⁻⁵ ± 0.0100i, λ3,4 = −0.7665 × 10⁻⁵ ± 0.0026i, and λ5 = 0.0. From the initial values, the eigenvectors, and the eigenvalues, the exact solution of the space-discretised system can be computed at any time.
  Δt      η2        U         V        |κ|
  256.0   −0.61386  17.570    297.81   1.000740577
  128.0   −4.5845   28.611    457.13   1.000694774
  64.0    0.3028    −108.953  −2013.9  1.000444791
  32.0    −1.3130   135.981   2505.6   1.000239146
  16.0    −8.9838   −102.860  −2022.0  1.000121864
  8.0     −0.3633   −146.604  −2727.7  1.000061224
  4.0     14.618    9.495     350.1    1.000030649
  2.0     12.528    71.245    1468.8   1.000015329
  1.0     11.621    85.031    1713.1   1.000007665
  0.5     11.373    88.327    1771.1   1.000003833
  0.25    11.310    89.141    1785.4   1.000001916

Table E.1: Results for the trapezoidal method. The values are, from left to right: the time step Δt in s, and η2 in m, U and V in m² s⁻¹ after 120 hours. In the last column the magnitude of the amplification factor |κ| is given, computed using the eigenvalue of A with positive real part, the given time step, and Equation (E.1). The method is stable for |κ| ≤ 1.
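The last column of Table E.1 can be reproduced directly from Equation (E.1) and the eigenvalue with positive real part; in the sketch below (our own helper) the small residual at the largest step comes from the eigenvalue being quoted to four digits:

```python
# Reproduce the |kappa| column of Table E.1 from Equation (E.1), using the
# eigenvalue of A with positive real part (as quoted, rounded).
lam = 0.7665e-5 + 0.0100j

def amp_trapezoidal(dt):
    z = lam * dt
    return abs((2.0 + z) / (2.0 - z))

# Matches the tabulated values 1.000001916 (dt = 0.25) and 1.000740577
# (dt = 256) to within the rounding of the eigenvalue.
assert abs(amp_trapezoidal(0.25) - 1.000001916) < 1e-8
assert abs(amp_trapezoidal(256.0) - 1.000740577) < 1e-5
```

Note that |κ| − 1 ≈ Re(λ)Δt for small Δt, which is exactly the halving of the last column seen in the table as the time step is halved.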
  Δt      η2       U        V        |κ|
  512.0   Infinite values after 31.57 hours   24.30160691
  256.0   0.3327   −0.1823  0.0985   0.5534924853
  128.0   0.3383   0.0420   −0.0233  0.9758215121
  64.0    0.2385   6.478    116.66   1.000024747
  32.0    12.605   36.552   826.65   1.000237683
  16.0    11.472   86.281   1734.4   1.000122527
  8.0     11.301   89.224   1786.8   1.000061321
  4.0     11.289   89.400   1790.0   1.000030661
  2.0     11.289   89.411   1790.1   1.000015330
  1.0     11.289   89.411   1790.2   1.000007665
  0.5     11.289   89.411   1790.2   1.000003833

Table E.2: Results for the RK4 method. The values are, from left to right: the time step Δt in s, and η2 in m, U and V in m² s⁻¹ after 120 hours. In the last column |κ| is given, computed using the eigenvalue of A with positive real part, the given time step, and Equation (E.2). The method is stable for |κ| ≤ 1.
to the exact solution. When increasing the time step, the Courant number will
become greater than 1, and the solution will soon tend to infinity.
  Δt      η2        U        V         max|κi|
  512.0   Infinite values after 32.49 hours   4.70350655
  256.0   0.33329   0.01239  −0.00668  0.910921537
  128.0   0.33329   0.01239  −0.00668  0.910921537
  64.0    0.33618   0.11061  −0.06006  0.990509169
  32.0    0.33813   0.11501  −0.06706  0.999365416
  16.0    1.6016    23.967   457.56    1.000060768
  8.0     8.9737    74.424   1483.8    1.000057326
  4.0     10.980    87.200   1745.4    1.000030409
  2.0     11.250    89.120   1784.3    1.000015314
  1.0     11.284    89.374   1789.4    1.000007664
  0.5     11.288    89.407   1790.1    1.000003832

Table E.3: Results for the AM2+LF method. The values are, from left to right: the time step Δt in s, and η2 in m, U and V in m² s⁻¹ after 120 hours. In the last column max |κi|, i = 1, 2, is given, where the maximum is computed using the eigenvalue of A with positive real part, the given time step, and Equation (E.3). The method is stable for max |κi| ≤ 1.
E.4 Concluding remarks
The stability properties of numerical methods are often studied with the Fourier method. This method is only applicable to linear equations and constant parameters, i.e. constant depth of the ocean and a constant Coriolis parameter.
Based on studies for these simplified cases, methods are often selected for application to more realistic problems involving non-linearities, variable coefficients and coupled systems of equations. In [110] it was demonstrated that, even for the relatively simple and well studied linear shallow water equations, it is enough to introduce variable depth or a variable Coriolis parameter to make the space-discretised system unstable. If a time stepping method with a region of stability that includes parts of the right half complex plane is applied, the models may still run stably for longer time steps. However, as the time steps are reduced, the solution will grow with the growth rate given by the positive real parts of the eigenvalues of the propagation matrix. This may explain why models for complicated sets of equations may become more unstable as the time steps are reduced.
With increasing computer power, it will be feasible to apply higher spatial resolution. This may allow simulations with smaller viscosities, and the hyperbolic
wave nature of the solutions will become more dominant and realistic. However,
in [110] it is shown that the spatial discretisations may force us to increase the
233
viscosities more than one would like from physical considerations to move the
eigenvalues of the propagation matrices out of the right half complex plane.
An alternative to increasing the viscosity when encountering this growth is to introduce time stepping methods with implicit features. These are typically methods with regions of stability that include parts of the positive half complex plane. The models may then run stably for longer time steps, but for consistent methods the instabilities will appear for small enough time steps. This is demonstrated here for the RK4 method and the AM2+LF method.
Even if one may achieve stability by applying implicit methods and longer
time steps, the price may be that the waves become more damped than they should
be. Thus it is not clear that it is physically more correct to add implicitness instead
of increasing the viscosities when encountering instabilities.
The spatial discretisations should be chosen such that the non-growth property of the underlying partial differential equations is maintained. A new Coriolis
weighting is suggested in [110] that brings all the eigenvalues of the present problem without friction onto the imaginary axis, and therefore non-growing wave
solutions are obtained.
An alternative to applying this linear algebra or eigenvalue approach is to apply the energy method. The task is then to construct discrete spatial operators such
that for instance energy and potential enstrophy are conserved, see [18] and [19].
The schemes produced with the energy method are rather complicated involving
large computational stencils and may be difficult to apply in practise, see [200].
The [110] method only involves the same four points as in the usual 1/4 weighting, but accuracy and conservation of potential enstrophy properties of the method
has not been studied. Thus, the two approaches give energy conserving methods
that are different. Systematic comparisons of other properties of the Espelid and
Arakawa discretisations remain to be done.
It should be noted that ROMS does not apply the 1/4 Coriolis weighting given here. However, there may be other spatial discretisation errors that move the eigenvalues into the right half plane. It is for instance well known that the errors in the internal pressure gradients in σ-coordinate ocean models may grow, see [33, 193].
For full size ocean models it is not feasible to analyse the propagation matrices.
However, the technique illustrated here may be applied to more complicated equations for cases involving a limited number of grid cells. A necessary condition for
stability will be that the model is stable also in cases involving a few cells.
For non-linear systems of equations, the propagation matrices will vary in time. For these systems one may set up the space and time discretised system

    w^{n+1} = A(n) wⁿ,

and at each time step analyse the eigenvalues of the time dependent propagation matrix A(n). The eigenvalues of A(n) should, in some average sense, be on or inside the unit circle in the complex plane to ensure stability. This may also be studied for problems involving small numbers of grid cells. The method for analysing time and space discretised problems indicated here has recently been applied to investigate two-way nesting techniques for the linearised shallow water equations [137].
Acknowledgements
Bjørn-Ove Heimsund is grateful to The Research Council of Norway (Computational mathematics in applications) for support. Thanks to Tal Ezer, Hernan
Arango, Alexander F. Shchepetkin, Terje O. Espelid, and two anonymous referees
for useful remarks.
Paper F
A two-mesh superconvergence
method with applications for
adaptivity
Bjørn-Ove Heimsund and Xue-Cheng Tai
Abstract
A two-mesh superconvergent gradient recovery mechanism for elliptic equations is presented. The method first computes the gradient over a fine mesh and then projects it to a coarser mesh. This projected gradient has superconvergence properties on general unstructured meshes. The difference between the computed and projected gradients is used as the error indicator when refining a mesh adaptively. This new superconvergence mesh refinement technique is easy to implement and can be used for a large class of problems. Numerical experiments for smooth and singular elliptic problems given in this work show the efficiency of the technique. Comparisons with a classical mesh adaptivity method are also given to show the advantages.
F.1 Introduction
In recent years, adaptive algorithms for finite element approximation have been extensively investigated. When producing the mesh, different indicators can be used to refine it [63, 107, 199, 282, 178, 277, 56, 55, 143, 144]. One of the popular indicators is the superconvergence indicator. The well-known ZZ method [282] has been intensively tested and studied for different kinds of applications. Recently, we have extended the original ZZ method to a method called the MZZ (modified ZZ) method [138]. The MZZ method normally works with two meshes: one is called the "master" mesh, which is coarser, and one is called the "slave" mesh, which is produced by refining the master mesh. The finite element solution is computed on the finer slave mesh. After the computation of the finite element solution, we project the gradient, or the solution itself, onto a finite element space of equal or higher order on the coarser master mesh. It is proved that such a procedure gives a solution with superconvergence properties [266, 267]. Contrary to many other superconvergence methods, this procedure does not need to impose uniformity or geometrical symmetry on the mesh. Thus, we use such an indicator to do mesh refinement, and we do not need to impose any conditions on how to refine the mesh. Compared with other adaptive mesh methods [178, 179, 278, 280, 186, 185, 276, 184, 183, 182, 100, 277, 66, 62, 68, 54, 56, 53, 55, 32, 143], our method is easy to implement and can be used for a large class of problems.
F.2
The method can be used for different kinds of applications. We shall consider the model problem

    −Σ_{i,j=1}^d ∂j(aij ∂i u) + Σ_{i=1}^d bi ∂i u + cu = f  in Ω ⊂ R^d,    (F.1)

where a = (aij)_{i,j=1}^d is the coefficient tensor, symmetric, bounded, and uniformly positive definite in a polygonal domain Ω ⊂ R^d, with measurable entries aij = aij(x). The other coefficients b = (bi(x))_{i=1}^d and c = c(x) are assumed to ensure uniqueness of solutions of (F.1) such that u ∈ H¹(Ω) and u = g on ∂Ω. The method is also correct for other kinds of boundary conditions.
Let

    a(u, v) = Σ_{i,j=1}^d ∫_Ω aij ∂i u ∂j v dx + Σ_{i=1}^d ∫_Ω bi ∂i u v dx + ∫_Ω cuv dx

be a bilinear form defined on H¹(Ω) × H¹(Ω). A weak form of the problem (F.1) seeks a function u ∈ H¹(Ω) such that u = g on ∂Ω and

    a(u, v) = (f, v)  for all v ∈ H¹₀(Ω).    (F.2)
The corresponding finite element solution u_h \in S_h satisfies

a(u_h, v) = (f, v) \quad \forall v \in S_h^0,   (F.3)

and the recovered flux is defined by the L^2-projection onto the coarse space:

(Q_\tau q_h, v) = (q_h, v) \quad \forall v \in L_\tau.   (F.4)
Let q = -a\nabla u. Under some conditions, the projected flux has superconvergence.
In order to get superconvergence, we need to assume that there is a fixed real
number 1 < s \le k + 1. The dual problem and the projection finite element space
should have some properties. First, the dual problem of (F.2) has H^s-regularity in
the sense that for any given f \in H^{s-2}(\Omega), the following problem

a(v, w) = (f, v) \quad \forall v \in H_0^1(\Omega)   (F.5)

has a solution w \in H^s(\Omega), where n is the unit outward normal vector of \partial\Omega and
\partial w / \partial n_a = (a\nabla w) \cdot n denotes the
normal component of the flux variable on the boundary for the dual solution
w. Second, the projection space L_\tau needs to have the following regularity:

L_\tau \subset [H^{s-1}(\Omega)]^d.   (F.6)
Under those conditions, it can be proved that the projected flux has the following superconvergence property [138]:

\|q - Q_\tau q_h\|_0 \le C \tau^{r+1} \|q\|_{r+1,\Omega} + C (h \tau^{-1})^{s-1} \|u - u_h\|_1.   (F.7)

Since s > 1 and h \tau^{-1} \ll 1, we have \|q - Q_\tau q_h\|_0 \ll \|u - u_h\|_1 if we choose the
polynomial order r sufficiently high and the master mesh sufficiently coarse. Accordingly,
we have

\|Q_\tau q_h - q_h\|_0 \approx \|q - q_h\|_0,   (F.8)

which means that we can use \|Q_\tau q_h - q_h\| as an indicator for mesh refinement.
To this end, for a given coarse mesh T_\tau, we compute the maximum value of
the left-hand side of Equation (F.8) over all the coarse mesh elements:

\eta = \max_{K \in T_\tau} \|Q_\tau q_h - q_h\|_{0,K}.

The refinement process is stopped if either (F.8) is satisfied, the memory limit has
been reached, or the change of the computed solution in the energy norm is less
than a given tolerance. More formally, this indicator function will be referred to as

E_G(K) = \|Q_\tau q_h - q_h\|_{0,K},

meaning that the element K \in T_\tau is refined if

E_G(K) \ge \delta \max_{e \in T_\tau} E_G(e).
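One sweep of such indicator-driven marking can be sketched in Java (a sketch for illustration only; the per-element indicator values E_G(K) are assumed to be precomputed, and delta is the refinement threshold):

```java
import java.util.ArrayList;
import java.util.List;

// One sweep of indicator-driven refinement: an element K is marked
// when its indicator is at least delta times the largest indicator.
// Computing the indicator itself requires the projected flux, which
// is not reproduced here; the values are assumed given.
public class MarkElements {
    static List<Integer> mark(double[] indicator, double delta) {
        double max = 0;
        for (double e : indicator)
            max = Math.max(max, e);

        List<Integer> marked = new ArrayList<>();
        for (int k = 0; k < indicator.length; ++k)
            if (indicator[k] >= delta * max)
                marked.add(k);
        return marked;
    }
}
```

With delta = 0.4, an element is refined when its indicator reaches 40% of the largest element indicator.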
In order to show the efficiency of our error indicator, we shall compare the
mesh produced by our method with the jump error indicator of [107, 154]. The
error indicator of [107, 154] is based on the homogeneous Dirichlet problem for

-\Delta u = f,

where one can show that the computed finite element flux q_h satisfies
the error estimate

\|q_h - q\|_{0,K} \le \alpha \|h f\|_K + \beta \left( \sum_{\gamma_i \subset \partial K} |\gamma_i|^2 \left| \left[ \frac{\partial u_h}{\partial n_i} \right] \right|^2 \right)^{1/2},

where [\cdot] is the jump on the edge \gamma_i of K \in T_h, and \alpha and \beta are mesh independent
constants. When the conductivity is non-constant, and we solve for

-\nabla \cdot (a \nabla u) = f,

the error indicator

E_J(K) = \alpha \|h f\|_K + \beta \left( \sum_{\gamma_i \subset \partial K} |\gamma_i|^2 \left| [n_i \cdot q_h] \right|^2 \right)^{1/2}   (F.9)

is used instead. This method does not need to work on two meshes simultaneously.
As earlier, an element K \in T_h is then refined if

E_J(K) \ge \delta \max_{e \in T_h} E_J(e).
In the numerical experiments, the indicators EG (K) and EJ (K) will be compared.
F.3
Numerical results
To illustrate the performance of our flux recovery method and its use in mesh
adaption, we will compare our indicator E_G(K) with the standard mesh adaption technique of [107, 154] given by E_J(K). Comparisons with other recovery
schemes have been provided in [138] and will not be repeated here.
The function value errors are computed by

\|u_h - u\|_0 = \left( \sum_{K \in T_h} \int_K (u_h - u)^2 \, dx \, dy \right)^{1/2},   (F.10)

and the gradient errors by

\|\nabla u_h - \nabla u\|_0 = \left( \sum_{K \in T_h} \int_K |\nabla u_h - \nabla u|^2 \, dx \, dy \right)^{1/2}.   (F.11)
The integration over each element is computed by a sixth order quadrature rule.
Furthermore, we set \alpha = \beta = 0.15 in Equation (F.9). The tolerance \delta is chosen
experimentally on a case by case basis to ensure a smooth refinement, typically
\delta = 0.4.
In the error plots, we always use logarithmic axes, in which the x-axis is the number of
degrees of freedom (DoF) of the triangulation, while the y-axis is either the function error or the gradient error. A solid line denotes the error computed on the
mesh produced by E_G, a dashed line the error for E_J, and a dash-dotted
line the error of the recovered gradient Q_\tau q_h.
We shall also measure the rate of convergence of the gradients. The convergence rate is evaluated using the following formula:

\text{order} = 2 \, \frac{\ln(\text{error}(0)/\text{error}(k))}{\ln(\text{DoF}(k)/\text{DoF}(0))},   (F.12)

in which error(0) is the first computed gradient error, while error(k) is the
last. DoF(0) and DoF(k) are the corresponding numbers of degrees of freedom.
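In two dimensions the mesh size behaves like h \sim DoF^{-1/2}, so a rate of this kind can be computed as in the following sketch (the factor 2 converts the DoF-based decay into a rate with respect to h; this is an assumed reading of (F.12), not code from the thesis):

```java
public class ConvergenceRate {
    // Estimate the convergence order in terms of the mesh size h,
    // assuming the 2D relation h ~ DoF^(-1/2), from the first and
    // last (DoF, error) pairs of a refinement sequence.
    static double order(double dof0, double err0, double dofK, double errK) {
        return 2.0 * Math.log(err0 / errK) / Math.log(dofK / dof0);
    }

    public static void main(String[] args) {
        // If the error decays like h^1 and DoF is quadrupled
        // (h halved), the error halves and the estimated order is ~1.
        System.out.println(order(100, 0.5, 400, 0.25));
    }
}
```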
Figure F.1: Error plots for the smooth problem in F.3.1. The rate of convergence
of the recovered gradient using E_G is 0.85, while the rate is 0.53 using E_J.
with zero Dirichlet boundary conditions. The solution to this problem is obviously
u = \sin(\pi x) \sin(\pi y).
The function error and gradient errors are shown in Figure F.1, and the meshes
produced by the two indicator functions are presented in Figure F.2. The convergence rate for the gradient recovered using E_G is clearly better than that
recovered by E_J, while the function error is generally smaller for the mesh
produced by the E_G indicator.
It is also instructive to investigate the error distribution of the gradients on the
different meshes. In Figures F.3-F.4 we have plotted the pointwise error

\sqrt{(u_{h,x} - u_x)^2 + (u_{h,y} - u_y)^2},

in which u_{h,x} and u_{h,y} are the computed solution derivatives. From the figure, it is
clear that the error of the recovered gradient on the mesh produced by the E_G indicator is
overall smaller, while the E_J mesh has a gradient with larger errors near the four
corners.
Figure F.2: Meshes for the smooth problem of F.3.1. The left mesh has 2513
nodes and a gradient error of 9.309 × 10^{-3}, while the right has 2589 nodes with a
gradient error of 5.7 × 10^{-2}.
Figure F.3: The distribution of the gradient errors on some coarser meshes produced by the two indicators for the smooth problem of F.3.1. The EG mesh has
1153 nodes with largest pointwise error of 0.060982, while the EJ mesh has 1337
nodes and largest error of 0.144607. Dark areas have large errors, while light
areas have small errors.
Figure F.4: 3D plot of the same data as in Figure F.3. Note that the recovered
gradient is piecewise linear and globally continuous, while the gradient produced
by the linear finite element method is piecewise constant.
where a(x, y) is scalar-valued and equal to 1 in the first and third quadrants, and
equal to R in the other two quadrants. The solution in polar coordinates is u(r, \theta) =
r^\gamma \mu(\theta), and is due to [162]. It is

\mu(\theta) =
\begin{cases}
\cos((\pi/2 - \sigma)\gamma) \cos((\theta - \pi/2 + \rho)\gamma), & 0 \le \theta \le \pi/2, \\
\cos(\rho\gamma) \cos((\theta - \pi + \sigma)\gamma), & \pi/2 \le \theta \le \pi, \\
\cos(\sigma\gamma) \cos((\theta - \pi - \rho)\gamma), & \pi \le \theta \le 3\pi/2, \\
\cos((\pi/2 - \rho)\gamma) \cos((\theta - 3\pi/2 - \sigma)\gamma), & 3\pi/2 \le \theta \le 2\pi,
\end{cases}   (F.14)

where the parameters \gamma, \rho, and \sigma satisfy

\max(0, \pi\gamma - \pi) < 2\gamma\rho < \min(\pi\gamma, \pi),
\max(0, \pi - \pi\gamma) < -2\gamma\sigma < \min(\pi, 2\pi - \pi\gamma).   (F.15)
Figure F.5: Plots of the errors (F.11) for the problem in F.3.2 with \gamma = 0.5. The
convergence rate for the gradients is slower here, and it is clear that the E_J indicator does a rather better job than the E_G indicator. The rate of convergence of the
E_J indicator is 0.55, while E_G has a convergence rate of 0.36.
For a given value of \gamma, Equations (F.14)-(F.15) can be solved for R, \rho, and \sigma, and in
[162] several solutions were provided where \rho = \pi/4. Here we will consider

\gamma = 0.5, \quad R \approx 5.8284271247461907, \quad \sigma \approx -2.3561944901923448.   (F.16)
Adaptive finite element methods for this problem have been studied [63, 199]; however, their goal was reducing the error \|u_h - u\|_0 and not necessarily the error in the
gradient. Applying the two indicator functions to this problem gives the computational results in Figure F.5, along with the produced meshes in Figure F.6. We see
that the function errors are steadily decreased, while the reduction of the gradient
error is slower. Also, the classical indicator E_J consistently outperforms the E_G
indicator, even for the gradient.
Figure F.7 shows the gradient error distributions, as was shown for the smooth
problem in F.3.1. While the E_J indicator has a small error in large portions of
the domain, it has a larger error at the origin. As the mesh is further refined,
the error at the origin dominates.
Figure F.6: Meshes produced by the E_G and E_J error indicators used on the problem in F.3.2 with \gamma = 0.5. The meshes for the other values of \gamma do not differ
much. The E_G mesh has 7993 nodes and a gradient error of 8.196 × 10^{-2} after
the gradient recovery has been applied. The indicator E_J produced a mesh with
8715 degrees of freedom and a gradient error of 4.325 × 10^{-2}.
Figure F.7: Gradient error distribution plot for the discontinuous coefficient problem of F.3.2. The left mesh has 909 nodes with a maximum error of 2.105516, and
the right mesh has 826 nodes and a maximum error of 4.148048.
Figure F.8: Plots of the errors for the corner singularity problem in F.3.3 with
\alpha = 4/3, giving the meshes in Figure F.9. The rate of convergence of the recovered
gradient is 0.84, while the rate of convergence of the gradient produced by the
standard adaption is 0.51.
-\Delta u = 0 \quad \text{in } \Omega,
u = u_0 \quad \text{on } \partial\Omega.

The exact solution is u(r, \theta) = r^\alpha \sin(\alpha\theta), and the boundary data u_0 is taken to be
consistent with the analytical solution. It is easy to see that the regularity of the
solution depends directly on \alpha. We will consider the cases \alpha = 4/3 and \alpha = 2/3;
u \in H^2(\Omega) for the former choice, but not for the latter.
The meshes produced for the case \alpha = 4/3 are shown in Figure F.9, and
plots of the error (F.11) are in Figure F.8. This can be compared with the case
\alpha = 2/3 given in Figures F.10-F.11. In both cases we see that the recovered
gradient is clearly superconvergent, while the function errors on the two
meshes are comparable to one another.
We note that this particular corner singularity problem has been studied earlier
in [106] where an indicator function based on the jumps of the gradient across the
edges of each element was used. The tests here show clearly that our method can
Figure F.9: Meshes produced by the mesh adaptivity schemes applied to the corner
singularity problem of F.3.3 with \alpha = 4/3. The mesh given by E_G is shown with
2144 nodes and has a recovered gradient error of 2.967 × 10^{-3}, while the mesh
produced with E_J has 1925 nodes and an error of 9.313 × 10^{-3}.
Figure F.10: Plots of the errors for the corner singularity problem in F.3.3 with
\alpha = 2/3, with the associated meshes of Figure F.11. Here the convergence rate
of the recovered gradient using E_G is 1.12, and it has a rate of 0.69 for the E_J
indicator.
Figure F.11: Meshes produced by the mesh adaptivity schemes applied to the
corner singularity problem of F.3.3 with \alpha = 2/3. The left mesh has 1297 nodes
with a recovered gradient error of 1.03 × 10^{-2}, and the right has 1473 nodes and a
gradient error of 2.127 × 10^{-2}.
handle this problem better than the E_J indicator, especially with regard to the
accuracy of the recovered gradient.
The analytical solution is u(r, \theta) = \sqrt{r} \sin(\theta/2), and the Dirichlet boundary data is
chosen consistent with this solution. Note that \nabla u is singular as r \to 0.
Error plots are given in Figure F.12, followed by the corresponding meshes in
Figure F.13. Again, note the superconvergence behaviour of the recovered gradient. This time around, the E_J indicator produces the smallest function error, but
the recovered gradient given by E_G is an order of magnitude more accurate than
the regular gradient.
Figure F.12: Plots of the errors for the mixed Dirichlet/Neumann problem in
F.3.4. The rate of convergence for the recovered gradient for E_G is 1.12, while
the E_J indicator has a rate of 0.72.
Figure F.13: Meshes produced by the different mesh adaptivity schemes applied to
the mixed Dirichlet/Neumann problem of F.3.4. The mesh for the E_G indicator is
shown with 1447 nodes and a projected gradient error of 7.671 × 10^{-3}, while the E_J
produced mesh has 1565 degrees of freedom and a gradient error of 2.078 × 10^{-2}.
F.4
Concluding remarks
Mesh adaptivity has been widely studied in the literature. We have proposed a
two-mesh superconvergence method which provides both a much more accurate
gradient than classical methods and an associated mesh refinement indicator.
The advantage of this scheme is that it can easily be used for a wide range
of problems, and its implementation for different kinds of problems is easier than
for other approaches. Our numerical experiments show that the method works well
for smooth problems as well as for problems with singularities.
Paper G
High performance numerical
libraries in Java
Bjørn-Ove Heimsund
Abstract
In this paper we outline a set of numerical libraries for high performance scientific
computing. The libraries are implemented in the Java programming language,
which offers modern features such as clean object orientation, platform independence, garbage collection, and more. Java has been regarded as unsuitable for high
performance computing due to its perceived low performance. In this article we show how our
design has resulted in a set of libraries which match the performance of native
numerical libraries implemented in Fortran and C++, while offering better ease of
use.
Draft manuscript
G.1 Introduction
In the last few years there have been several attempts at creating structured, extensible, and user-friendly numerical software to supplement the large existing base of
Fortran codes. Fortran 77 has been a preferred language for numerical calculations due to its high-performance compilers and intrinsic support for matrices
and mathematical functions. However, with the increasing complexity of application and library codes, more sophisticated language
constructs have become necessary in order to better manage this complexity.
One of the chief developments has been object oriented (OO) programming, which is based on the principles of object polymorphism and reuse. Polymorphism allows the use of one interface with multiple different implementations, and reuse is a mechanism to extend an object to either change its intrinsic
behaviour or to implement other interfaces. For example, we could create a matrix interface, and define dense and banded matrix objects. An end-user can thus
use either storage type without changing the application code. Other examples
lie in iterative system solvers, where the user can easily change between different
algorithms. Furthermore, by extending the dense matrix, we can create a custom
algorithm implementation without creating a new matrix structure from scratch.
We could for example implement the Strassen matrix/matrix product for easy performance comparisons.
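The matrix example above can be sketched in a few lines of Java. The names SimpleMatrix, Dense, and Diagonal are illustrative only, not part of the libraries described in this paper:

```java
// A minimal sketch of interface-based polymorphism for matrices.
interface SimpleMatrix {
    double get(int i, int j);
    int size();

    // A generic matrix/vector product; implementations may override
    // it with a storage-aware, faster version.
    default double[] mult(double[] x) {
        double[] y = new double[size()];
        for (int i = 0; i < size(); ++i)
            for (int j = 0; j < size(); ++j)
                y[i] += get(i, j) * x[j];
        return y;
    }
}

class Dense implements SimpleMatrix {
    private final double[][] a;
    Dense(double[][] a) { this.a = a; }
    public double get(int i, int j) { return a[i][j]; }
    public int size() { return a.length; }
}

class Diagonal implements SimpleMatrix {
    private final double[] d;
    Diagonal(double[] d) { this.d = d; }
    public double get(int i, int j) { return i == j ? d[i] : 0; }
    public int size() { return d.length; }

    // Overrides the generic product to exploit sparsity:
    // O(n) instead of O(n^2).
    public double[] mult(double[] x) {
        double[] y = new double[d.length];
        for (int i = 0; i < d.length; ++i) y[i] = d[i] * x[i];
        return y;
    }
}
```

Application code written against SimpleMatrix works unchanged with either storage type, which is exactly the point made in the text.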
Another, perhaps more important development is the use of automatic garbage
collectors. In traditional programming, programmers are forced to perform both
resource allocation and deallocation themselves, and there are numerous pitfalls
when deallocating resources, such as freeing memory twice, or forgetting to free,
causing a memory leak. Studies [42] have shown that 30% to 40% of the development time is spent on storage management, and this estimate does not include the
time spent debugging memory related errors. Garbage collectors circumvent the
whole issue by moving resource deallocation away from the programmer and into the
runtime environment. Modern garbage collectors are also highly concurrent and
can approach the performance of manual memory management [43, 283].
There has also been an increased focus on portability of both codes and compiled applications. While standards such as POSIX (Portable Operating System
Interface) provide a common interface across a wide range of computers, they do
not expose much of the advanced functionality available, and consequently there
are many applications which use more functions than are found in POSIX and
are not easily ported. Furthermore, commercial software is seldom delivered with
source code, and much time may pass before it can be made available
on new architectures. There has therefore been an increased interest in binary compatibility by the use of a virtual machine (VM), which provides a rich and uniform
interface to the underlying operating system. In the last decade many VMs have
[Figure G.1 diagram: DMT, MPP, SMT, MVIO, and MT built on JLAPACK, NNI, and the Java VM; NNI wraps the machine dependent LAPACK and BLAS.]
Figure G.1: Hierarchy of the main components presented in this paper with their
interdependencies. Components in italic are external.
gained prominence, examples being Sun Java and Microsoft .Net. Of these, the
Java VM is available on almost every combination of hardware and operating system.
In this paper we will describe a set of numerical libraries built using
the Java programming language. Java is a modern language offering the full benefits of object oriented programming while providing features such as garbage
collection and complete binary portability. Early versions of Java gained
a reputation for slow execution speed; however, newer releases have added advanced techniques in order to provide highly optimized performance. The most
prominent improvement has been the addition of just-in-time (JIT) compilers,
which recompile Java code into optimized assembly code fully on the fly without
compromising portability. A JIT compiler has several advantages over a traditional compiler, one of which is that it can examine program execution and detect performance bottlenecks (hotspots) as they occur, and then apply aggressive
optimization selectively. Another advantage is the elimination of checks and
synchronisations that can be proven redundant during execution (that is, a given
bounds check will never fail, or there will never be any contention on a given
synchronisation lock). A JIT compiler can also perform other types of optimisations, for example function call inlining and automatic data-alignment.
Our aim is to show that the performance of our numerical libraries is competitive with the best numerical libraries currently available, while offering substantial benefits due to garbage collection and portability. We will first describe the
basic libraries, then add and extend functionality such that we end up with a set
of objects suitable for solving large scale numerical problems on supercomputers
and clusters of computers, see Figure G.1. During the course of this development, we will compare against basic libraries such as BLAS [172, 97, 96, 94, 95],
Sparskit [230], and more sophisticated libraries such as [99, 98, 24, 241]. Both
performance and functionality will be compared.
G.2
Java code
public class BLAS {
public static native double norm2(int N, double[] X, int incX);
// ...
}
C code
JNIEXPORT jdouble JNICALL Java_BLAS_norm2
(JNIEnv *env, jclass clazz, jint N, jdoubleArray X, jint incX) {
jdouble *Xl = NULL, ret = 0;
Xl = (*env)->GetPrimitiveArrayCritical(env, X, NULL);
ret = cblas_dnrm2(N, Xl, incX);
(*env)->ReleasePrimitiveArrayCritical(env, X, Xl, JNI_ABORT);
return ret;
}
Figure G.2: Example of calling the BLAS subroutine nrm2 from Java. Notice that
the method in the Java source has the keyword native and does not include an
implementation. A corresponding C code does the call to the C BLAS. The use of
the GetPrimitiveArrayCritical function returns a pointer to the Java array X,
and this pointer is then passed on to the BLAS subroutine.
especially for larger problems. Despite this, JLAPACK is useful for prototyping, small
dense matrix problems, and also for certain large-scale problems involving large
numbers of small matrices or vectors.
We will now illustrate the performance differences between the libraries we
have discussed. BLAS has three types of operations: those acting on vectors (level
1), those acting on matrices and vectors (level 2), and operations solely on matrices
(level 3). We test the performance of one routine from each level, comparing
different implementations.
From level 1 we chose daxpy, which is the vector update operation

y = \alpha x + y, \quad x, y \in \mathbb{R}^n, \quad \alpha \in \mathbb{R}.

From level 2 we chose dgemv, the dense matrix/vector multiplication

y = \alpha A x + \beta y, \quad A : \mathbb{R}^n \mapsto \mathbb{R}^n, \quad x, y \in \mathbb{R}^n, \quad \alpha, \beta \in \mathbb{R}.

Lastly, we will also test the classical level 3 routine dgemm for dense matrix/matrix multiplication:

C = \alpha A B + \beta C, \quad A, B, C : \mathbb{R}^n \mapsto \mathbb{R}^n, \quad \alpha, \beta \in \mathbb{R}.
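For illustration, the level 1 operation is only a few lines of plain Java (a sketch with unit strides; real BLAS implementations add stride handling and loop unrolling):

```java
public class Daxpy {
    // y := alpha*x + y, the BLAS level 1 daxpy operation for unit
    // strides. Costs about 2n floating point operations.
    static void daxpy(double alpha, double[] x, double[] y) {
        for (int i = 0; i < x.length; ++i)
            y[i] += alpha * x[i];
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3}, y = {10, 10, 10};
        daxpy(2.0, x, y);
        // y is now {12.0, 14.0, 16.0}
        System.out.println(java.util.Arrays.toString(y));
    }
}
```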
The measure of performance is the number of floating point operations per second (flops).
Assuming that an addition and a multiplication consume one flop each, and
disregarding memory performance issues, the three BLAS routines require
roughly 2n, 2n^2, and 2n^3 flops, respectively.
G.3
BLAS and LAPACK have similar functions for a range of different types of
matrices. For instance, LAPACK offers optimized equation solvers for Ax = b
depending on whether A is symmetric positive definite, just symmetric, triangular, or general. These algebraic properties help LAPACK choose more efficient
algorithms. Furthermore, LAPACK can also exploit some types of sparsity, such
as banded, triangular, and tridiagonal matrices, all of which help save memory.
When using BLAS and LAPACK directly, the user must handle all the details
of matrix storage, which can be non-trivial for some of the sparse formats, and
[Figures G.3 and G.4 (plots): millions of floating point operations per second for the level 1 and level 2 routines as a function of problem size, comparing the BLAS, C, NNI, Java, and JLAPACK implementations.]
Figure G.5: Millions of floating point operations per second in performing dense
matrix/matrix multiplication C = \alpha A B + \beta C as a function of the matrix size.

must also find the optimal method for the given combination of algebraic
properties and storage format.
It is obvious that we can create a set of matrices with a common interface that
masks all the differences. These matrices should encapsulate matrix properties
such as symmetry or triangularity, but also space-savings such as bandedness and
packed layouts. In good object-oriented style, we will try to maximise reuse by
designing a hierarchy.
In what follows, we will outline the Matrix Toolkit (MT), which provides a
convenient high-level Java interface to BLAS and LAPACK. It has generic BLAS
and LAPACK interfaces, and upon startup a check is made to see if NNI is available for use. If so, it is used; otherwise JLAPACK is used instead. This ensures high performance without compromising portability. The overhead of an
extra function call to NNI or JLAPACK is negligible, see Figure G.6.
Figure G.6: Performance comparison of calling dgemm from C BLAS and from
the Matrix Toolkit. The differences are negligible.
Conventional storage

The conventional dense storage is based on the Fortran matrix construct. Since
Fortran 77 lacks the means to create user-defined structures and classes, it provides
a small set of useful high-level mathematical types, such as complex numbers and
matrices. A Fortran matrix can be accessed as follows:

DOUBLE PRECISION A(M, N)
DO J = 1, N
   DO I = 1, M
      A(I,J) = ...
   END DO
END DO

Matrices are laid out column major, that is, columns follow each other sequentially
in memory. For example, the above code fragment has an inner loop moving down
one column at a time, which is the optimal way to traverse a Fortran matrix.
Since computers only operate on linear memory segments (vectors), A(I,J)
is actually accessed (zero-based) as

A_{ij} = A[i + jm], \quad A \in \mathbb{R}^{m \times n}.

In our Java codes, we create one long array of double precision numbers and access it as a column major construct. This results in full Fortran interoperability.
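In Java, such a column major matrix might be sketched as follows (illustrative names, not the toolkit's actual API):

```java
// A sketch of a column major dense matrix backed by a single array,
// matching the Fortran layout so the data could be passed to
// BLAS/LAPACK unchanged.
public class ColMajorMatrix {
    final int m, n;
    final double[] data;

    ColMajorMatrix(int m, int n) {
        this.m = m;
        this.n = n;
        this.data = new double[m * n];
    }

    // Zero-based indexing: entry (i,j) lives at i + j*m.
    double get(int i, int j) { return data[i + j * m]; }
    void set(int i, int j, double v) { data[i + j * m] = v; }
}
```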
[Figure G.7: An upper triangular 4 × 4 matrix in the packed format; the array stores the columns of the upper triangle in sequence: a11 a12 a22 a13 a23 a33 a14 a24 a34 a44.]

[Figure G.8: A lower triangular 4 × 4 matrix in the packed format; the array stores the columns of the lower triangle in sequence: a11 a21 a31 a41 a22 a32 a42 a33 a43 a44.]

It is clear that a more economical storage is possible for symmetric and triangular matrices. By packing the relevant triangle by columns into a one dimensional vector we get the packed matrix format. For upper triangular/symmetric
matrices, A(I,J) is accessed (zero-based) as (see Figure G.7)

A_{ij} = A\left[i + \frac{(j + 1)j}{2}\right], \quad A \in \mathbb{R}^{n \times n},

and lower triangular/symmetric matrices are accessed by (see Figure G.8)

A_{ij} = A\left[i + jn - \frac{(j + 1)j}{2}\right], \quad A \in \mathbb{R}^{n \times n}.

Unit triangular matrices can also be stored in the packed format; the main
diagonal is then still stored, but it is not accessed.
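The packed triangular layout can be indexed with small helper functions (a sketch; zero-based indexing, with the columns of the triangle stored in sequence as in Figures G.7 and G.8):

```java
// Zero-based index maps for packed triangular/symmetric storage.
public class PackedIndex {
    // Upper triangle (i <= j): column j of the triangle holds j+1
    // entries, so its offset is j(j+1)/2.
    static int upper(int i, int j) {
        return i + (j + 1) * j / 2;
    }

    // Lower triangle (i >= j) of an n-by-n matrix: column j holds
    // n-j entries, giving the offset jn - j(j+1)/2 relative to row j.
    static int lower(int i, int j, int n) {
        return i + j * n - (j + 1) * j / 2;
    }
}
```

For the 4 × 4 examples of Figures G.7 and G.8, upper(1, 1) = 2 and lower(1, 1, 4) = 4, which are exactly the positions of a22 in the two packed arrays.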
[Figure G.9 diagram: a 5 × 5 banded matrix and its storage array, column by column: a11 a21 a31 | a12 a22 a32 a42 | a23 a33 a43 a53 | a34 a44 a54 | a45 a55.]

Figure G.9: Matrix stored in the banded format. It has one upper diagonal (ku = 1)
and two lower diagonals (kl = 2). An asterisk denotes entries that are not accessed but still
stored. The linear array shows how the matrix is laid out in memory.
Banded matrices

Banded matrices are stored column by column, with each column holding the
kl + ku + 1 entries of the band. In zero-based indexing, A(I,J) is accessed as

A_{ij} = A[ku + i - j + j(kl + ku + 1)], \quad A \in \mathbb{R}^{n \times n}.
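A zero-based index map for this LAPACK-style band layout can be sketched as follows (kl and ku as in Figure G.9; an assumed formula consistent with that figure, not code from the paper):

```java
// Index map for LAPACK-style band storage with kl subdiagonals and
// ku superdiagonals: column j occupies kl+ku+1 consecutive slots,
// and A(i,j) sits at band row ku + i - j within that column.
public class BandIndex {
    static int index(int i, int j, int kl, int ku) {
        return ku + i - j + j * (kl + ku + 1);
    }
}
```

For the matrix of Figure G.9 (kl = 2, ku = 1), entry a11 lands at slot 1 (slot 0 of the first column is padding) and a12 at slot 4, matching the linear array shown there.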
Matrix
  AbstractMatrix
    AbstractDenseMatrix: DenseMatrix, TriangDenseMatrix, SymmDenseMatrix
    AbstractPackMatrix: TriangPackMatrix, SymmPackMatrix
    AbstractBandMatrix: BandMatrix, TriangBandMatrix, SymmBandMatrix
Figure G.10: The main parts of the matrix hierarchy. The topmost Matrix is an
abstract interface which specifies the available methods, but not their implementation. AbstractMatrix implements the methods in Matrix by means of generic
algorithms using iterators and elemental access operations. However it does not
specify the storage layout. Extending AbstractMatrix is a number of abstract
structures which specify actual storage layout. Finally, concrete matrices add
BLAS and LAPACK algorithms to override and complement the algorithms in
AbstractMatrix while using the specified storage layout.
At the top is the generic Matrix interface. It specifies methods which retrieve entries,
set entries, multiply the matrix, and so forth.
AbstractMatrix implements the Matrix interface, but does not specify how the
matrix is stored. Instead, it provides implementations of most of the specified
methods by means of elemental access operations and iterators. This allows subclasses to only implement those methods which they can perform efficiently, while
delegating the rest to AbstractMatrix.
Further extending AbstractMatrix is a set of matrices which specify the
storage layout and elementary access operations, but not the algorithms themselves. These provide get and set methods for working with the entries of the
matrix. So whether the matrix is banded, packed or dense,

double Matrix.get(int i, int j);

always retrieves A_{ij}, relieving the user of the sometimes complicated underlying
details.
LAPACK and BLAS algorithms are chosen by specific matrices such as
DenseMatrix or TriangPackMatrix. For instance, the direct solvers are different for
these formats: the former must use a general LU decomposition with partial pivoting, while the
latter can be solved for without factorising. As always, the underlying differences
are masked away, and one can solve Ax = b by calling

Vector Matrix.solve(Vector b, Vector x);
An important part of the matrix implementations is the matrix iterators. A matrix
iterator passes over all the stored matrix entries, returning one entry at a time;
the details of the traversal are matrix specific.
Iterators are very useful in constructing generic algorithms [247, 248], and
have been used with much success in constructing high performance generic matrix operations [241]. Furthermore, sparsity is naturally exploited by iterators, since
they only traverse the relevant parts of the matrix.
Figure G.11 shows two implementations of the matrix scaling operation

A \leftarrow \alpha A,

one using get and set operations, the other using an iterator. There are several
differences. Firstly, the syntax for the iterator is clear and concise, resulting
in more maintainable code. Secondly, the iterator correctly exploits any kind of
sparsity, while the elemental operations loop over the whole of the matrix. Thirdly,
and most importantly, iterators can be much faster when traversing very sparse
structures such as narrow banded and tridiagonal matrices.
The matrix entry returned at each step of the iterator is typically reused for
higher performance. It also contains the current row and column indices.
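In the spirit of Figure G.11, the two variants might look as follows (a self-contained sketch; Entry, TinyDense, and Scaling are illustrative names, not the toolkit's actual API):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A sketch of the two scaling variants discussed above.
interface Entry {
    double get();
    void set(double v);
}

class TinyDense implements Iterable<Entry> {
    final int m, n;
    final double[] data; // column major

    TinyDense(int m, int n) { this.m = m; this.n = n; data = new double[m * n]; }
    double get(int i, int j) { return data[i + j * m]; }
    void set(int i, int j, double v) { data[i + j * m] = v; }

    // Iterates over all stored entries; a sparse matrix would
    // instead skip the entries that are not stored.
    public Iterator<Entry> iterator() {
        return new Iterator<Entry>() {
            int k = 0;
            public boolean hasNext() { return k < data.length; }
            public Entry next() {
                if (!hasNext()) throw new NoSuchElementException();
                final int idx = k++;
                return new Entry() {
                    public double get() { return data[idx]; }
                    public void set(double v) { data[idx] = v; }
                };
            }
        };
    }
}

class Scaling {
    // Elemental version: visits every (i,j), including zeros.
    static void scaleBrute(TinyDense A, double alpha) {
        for (int i = 0; i < A.m; ++i)
            for (int j = 0; j < A.n; ++j)
                A.set(i, j, alpha * A.get(i, j));
    }

    // Iterator version: only the stored entries are touched.
    static void scale(TinyDense A, double alpha) {
        for (Entry e : A)
            e.set(alpha * e.get());
    }
}
```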
Vectors
Figure G.12: Code snippet for solving a linear 3 × 3 system and computing the
final residual. An LU decomposition from LAPACK is used when calling the
solve method.
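The snippet of Figure G.12 is not reproduced here; as a stand-in, the same computation can be sketched in plain Java (Gaussian elimination with partial pivoting in place of the toolkit's LAPACK-backed solve; all names are illustrative):

```java
public class Solve3x3 {
    // Solve Ax = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] A, double[] b) {
        int n = b.length;
        double[][] M = new double[n][];
        double[] x = b.clone();
        for (int i = 0; i < n; ++i) M[i] = A[i].clone();

        for (int k = 0; k < n; ++k) {
            int p = k; // partial pivoting: pick the largest pivot
            for (int i = k + 1; i < n; ++i)
                if (Math.abs(M[i][k]) > Math.abs(M[p][k])) p = i;
            double[] t = M[k]; M[k] = M[p]; M[p] = t;
            double s = x[k]; x[k] = x[p]; x[p] = s;

            for (int i = k + 1; i < n; ++i) { // eliminate column k
                double f = M[i][k] / M[k][k];
                for (int j = k; j < n; ++j) M[i][j] -= f * M[k][j];
                x[i] -= f * x[k];
            }
        }
        for (int k = n - 1; k >= 0; --k) { // back substitution
            for (int j = k + 1; j < n; ++j) x[k] -= M[k][j] * x[j];
            x[k] /= M[k][k];
        }
        return x;
    }

    // Euclidean norm of the residual r = b - Ax.
    static double residualNorm(double[][] A, double[] x, double[] b) {
        double sum = 0;
        for (int i = 0; i < b.length; ++i) {
            double ri = b[i];
            for (int j = 0; j < x.length; ++j) ri -= A[i][j] * x[j];
            sum += ri * ri;
        }
        return Math.sqrt(sum);
    }
}
```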
Due to the results in Figure G.3, DenseVector does not call BLAS for any
of its operations; instead they are all implemented in pure Java. We have not
registered any performance degradation due to this in practical applications.
Finally, vectors also have iterators which work in the same way as those of the
matrices, the only difference being that they traverse just one dimension. While an
iterator is not so useful for a dense vector, they are much used for the sparse vectors
introduced later on.
G.4
The simplest matrix is the coordinate format. It uses three arrays, rows,
columns and entries, each of the same length, see Figure G.13. When retrieving
an entry from the matrix, the arrays must be searched linearly for a match, which can
be very slow. For the converse case of adding a new entry, a search must first be
done to see if the entry is already in the matrix. If it is not there, the new entry can
be appended to the end of the three arrays, possibly after performing a reallocation. It is clear that the coordinate format does not offer very good performance,
and it is also not the most memory efficient.
[Figure G.13: A sparse matrix stored in the coordinate format, with the three parallel arrays entries, columns, and rows.]
Figure G.14: Storage of a matrix in the compressed row format. The rowP array
holds the row offsets; its formatting here is simplified.
Compressed storage

The compressed storage is the most common layout found in practice. For the row
oriented type, three arrays are used: one for the column indices, one for the entries,
and a third which stores the offsets of each row, see Figure G.14. Retrieval can
now be done more efficiently. First the correct row is found by looking at rowP,
then the column indices of that row are searched. By keeping the column indices
within a row sorted, the performance is further increased.
When adding an entry, we also do an initial search to see if it is already there.
If the entry is not found, the resulting reallocation can be very expensive, as
the column and entry arrays must be recreated. However, by preallocating reserve
storage this expensive operation may not be necessary.
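Retrieval from the compressed row format can be sketched as follows (illustrative names; a binary search over the sorted column indices of row i):

```java
import java.util.Arrays;

// A sketch of entry retrieval in the compressed row format: rowP
// holds the offset of each row, and the column indices within a
// row are kept sorted so a binary search can be used.
public class Csr {
    final int[] rowP, columns;
    final double[] entries;

    Csr(int[] rowP, int[] columns, double[] entries) {
        this.rowP = rowP;
        this.columns = columns;
        this.entries = entries;
    }

    double get(int i, int j) {
        // Search only within the slice belonging to row i.
        int k = Arrays.binarySearch(columns, rowP[i], rowP[i + 1], j);
        return k >= 0 ? entries[k] : 0.0;
    }
}
```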
Vector storage

A third popular format is to store each row of the matrix as a sparse vector. The
advantage of this is that reallocation is cheap: only the relevant row needs to be
allocated again, and not the whole matrix. A minor drawback is that the extra layer
of abstraction may reduce performance slightly.
The Sparse Matrix Toolkit provides two sparse vectors: one which stores indices and entries in sorted arrays, much like the coordinate storage, and another
backed by a custom high performance hashmap. The hashmap is often faster for
y = 0
for i = 1 : nnz
    y(rows(i)) = y(rows(i)) + entries(i) * x(columns(i))

for i = 1 : m
    y(i) = 0
    for j = rowP(i) : rowP(i + 1) - 1
        y(i) = y(i) + entries(j) * x(columns(j))

for i = 1 : m
    y(i) = row(i)^T x

Figure G.15: Pseudocodes for the matrix/vector product y = Ax for the three
sparse matrix formats. The topmost is the coordinate, the second is the compressed,
and the last is the vector storage.
frequent retrieval operations, but has some extra overhead in terms of memory
usage compared to the sparse array based vector.
When the sparsity pattern is known, a compressed format is usually the fastest in
both assembly and when performing matrix/vector products. However, for changing memory requirements, a flexible vector format can be preferable instead.
Matrix/vector product pseudocodes are given in Figure G.15. Comparing them, we see that the coordinate format traverses the vectors x and y unordered. The compressed format has ordered traversal of y, which may help performance. Most interesting is the vector storage, where the product can be expressed as a simple inner product in which one of the vectors is sparse.
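For instance, the compressed-format product of Figure G.15 translates directly into Java (a sketch with 0-based indices; SMT's own classes are not shown):

```java
// Matrix/vector product y = Ax in compressed row storage, following
// the middle pseudocode of Figure G.15. rowP has length m+1;
// columns/entries hold the nonzeros row by row.
class CompressedProduct {
    static double[] mult(int m, int[] rowP, int[] columns,
                         double[] entries, double[] x) {
        double[] y = new double[m]; // zero-initialised by Java
        for (int i = 0; i < m; i++)
            for (int j = rowP[i]; j < rowP[i + 1]; j++)
                y[i] += entries[j] * x[columns[j]];
        return y;
    }
}
```

Note the ordered traversal of y: each y[i] is finished before the next row starts, which is friendly to the cache.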
To compare the performance of the matrices, we run a simple application which first inserts entries into an empty matrix and times how long the insertion takes. While this test can be run without preallocation, that is very slow, so we instead preallocate sufficient storage for all entries. Next, the numerical values of the matrix are zeroed (but the index structure is preserved) and the entries are reinserted. This measures any benefit of structural reuse. Lastly, matrix/vector performance is also measured. See Table G.1.
[Table G.1 data: formats Sparse, Compressed, Vector and Hash; columns Initialisation, Assembly and y = Ax. The individual timings are not recoverable here.]
Table G.1: Performance of the sparse matrices for assembly and in matrix/vector products. Initialisation refers to creating the matrix entries (indices and numerical values) starting from a preallocated storage. Assembly refers to just inserting the numerical values into an existing sparse matrix. These two tests are reported in seconds, while the last test is in millions of floating point operations per second. The matrix is a 100,000 × 100,000 matrix with 100 nonzero entries on every row.
A.multAdd(-1, x, b, r);
for (iter.setFirst(); !iter.converged(r, x); iter.next()) {
    M.apply(r, z);
    rho = r.dot(z);
    if (iter.isFirst())
        p.set(z);
    else {
        beta = rho / rho_1;
        p.add(z, beta);
    }
    A.mult(p, q);
    alpha = rho / p.dot(q);
    x.add(alpha, p);
    r.add(-alpha, q);
    rho_1 = rho;
}
Figure G.16: The mathematical PCG algorithm and its implementation in SMT.
Matrix (entries)          Sparselib   SMT, 1 thread   Speedup, 2 threads
MEMPLUS (126,150)            326           573                 0%
FIDAPM37 (765,944)           173           199                61%
FIDAP011 (1,091,362)         109           120                62%
S3DKQ4M2 (4,427,725)           -            29                67%
Table G.2: Number of iterations per second for the PCG method. The C++
Sparselib is compared against the Java SMT library for some selected matrices
from [44]. The speedup of the threaded SMT is also shown in percent. Matrix
S3DKQ4M2 was not compatible with Sparselib.
G.5 Distributed computing
Figure G.17: Two typical scenarios for connecting compute nodes. The switch-based network allows direct communication between nodes, while the grid requires nodes to pass information along using tagged messages.
These send or receive to/from the given peer node. Similar functions are provided
for all primitive arrays (char, short, int, long, float and double).
The most important difference from MPI is the lack of a message tag. Messages need to be tagged when nodes cannot communicate directly with one another, but must send messages in stages. For example, consider the case of Figure G.17(b). If node 1 is to send a message to node 4, it can be routed either through node 2 or node 3. Now, if multiple distinct messages are sent from 1 to 4, each message must have a unique tag to distinguish them from one another. The point-to-point nature of Figure G.17(a) guarantees that messages are delivered in the correct order, and consequently no tags are needed.
A further simplification we can make lies in the communication of large datasets. In the grid case, suppose node 1 is to send a large array of data to node 4 in a single message. The intermediary node (say 2) has two options: wait until node 4 is ready to receive and then start streaming data from 1 to 4, or locally store the data from node 1, passing it on to 4 when 4 is ready. For the fully connected network, sends do not need to wait until matching receives have been posted. Instead sends are processed immediately and the underlying operating system handles all the buffering details.
As our implementation is rather simple, we will show the core send and receive
methods as implemented. The code for send is as follows:
void send(byte[] data, int offset, int length, int peer) {
    while (length > 0) {
        // Determine how many bytes to write
        int cLength = Math.min(length, outputBuffer.capacity());

        // Fill buffer with data from current offset
        outputBuffer.clear();
        outputBuffer.put(data, offset, cLength);

        // Write the buffer to the channel
        int num = cLength;
        outputBuffer.flip();
        while (num > 0)
            num -= outputChannel[peer].write(outputBuffer);

        // Advance offsets
        offset += cLength;
        length -= cLength;
    }
}
The matching receive method mirrors the structure of send:

void recv(byte[] data, int offset, int length, int peer) {
    while (length > 0) {
        // Determine how many bytes to read
        int cLength = Math.min(length, inputBuffer.capacity());

        // Read from the channel into the buffer,
        // limiting the read to exactly cLength bytes
        int num = cLength;
        inputBuffer.clear();
        inputBuffer.limit(cLength);
        while (num > 0)
            num -= inputChannel[peer].read(inputBuffer);

        // Get data from the buffer
        inputBuffer.flip();
        inputBuffer.get(data, offset, cLength);

        // Advance offsets
        offset += cLength;
        length -= cLength;
    }
}
Asynchronous operations
Asynchronous sends and receives run the operations in the background, freeing the application to perform other tasks simultaneously. This is useful for efficiently implementing collective operations and for overlapping computation with communication in numerical codes. The signatures for these sends and receives are:
Future Communicator.isend(byte[] data, int offset, int length, int peer);
Future Communicator.irecv(byte[] data, int offset, int length, int peer);
The asynchronous operations are implemented by running a synchronous operation in a separate thread. For efficiency, a pool of worker threads is used to run all the asynchronous tasks. Since threads are recycled rather than recreated, this results in a very low latency.
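The pattern can be sketched with the standard java.util.concurrent classes. This is an illustrative stand-in and not the MPP internals; the send body here merely records the byte count, where the real method would perform the blocking channel writes shown above:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of asynchronous operations on top of a synchronous one:
// the synchronous call is submitted to a pool of recycled worker
// threads, and the returned Future lets the caller wait for completion.
class AsyncSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    final AtomicLong bytesSent = new AtomicLong(); // stub bookkeeping only

    // Stand-in for the synchronous send; a real version would do the
    // blocking channel writes of the send method in the text.
    void send(byte[] data, int offset, int length, int peer) {
        bytesSent.addAndGet(length);
    }

    // Asynchronous send: run the synchronous operation on the pool.
    Future<?> isend(final byte[] data, final int offset,
                    final int length, final int peer) {
        return pool.submit(new Runnable() {
            public void run() { send(data, offset, length, peer); }
        });
    }

    void shutdown() { pool.shutdown(); }
}
```

Because the pool's threads are created once and reused, submitting a task avoids the cost of spawning a new thread per operation.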
Collective operations
The most common collective operations found in MPI 1.2 are implemented. The list includes:

- fast cyclic barriers,
- one-to-all broadcasts,
- all-to-one reductions and prefix scans,
- all-to-one gathers and vectorised gathers,
- one-to-all scatters and vectorised scatters,
- all-to-all personalised communications.

The operations are coded for the fully connected network model; we will not provide details here, see [127] instead.
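As an illustration only, a one-to-all broadcast over the point-to-point layer can be as simple as the following naive linear sketch. The interface is a hypothetical stand-in for the communicator, and the actual implementation follows [127] rather than this code:

```java
// Naive one-to-all broadcast sketched over an abstract point-to-point
// layer: the root sends to every other node, the others receive.
interface PointToPoint {
    void send(byte[] data, int offset, int length, int peer);
    void recv(byte[] data, int offset, int length, int peer);
}

class Collectives {
    static void broadcast(PointToPoint comm, int rank, int size,
                          int root, byte[] data) {
        if (rank == root) {
            for (int peer = 0; peer < size; peer++)
                if (peer != root)
                    comm.send(data, 0, data.length, peer);
        } else {
            comm.recv(data, 0, data.length, root);
        }
    }
}
```

A production broadcast would typically use a tree of sends to reduce the root's work from size-1 messages to a logarithmic number of rounds.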
Figure G.18: Bandwidth performance comparison between MPP and LAM. The curve plotted is the number of bytes transmitted per second from one node to another as a function of the array size.
Performance comparisons
To verify the performance of MPP we compare it against LAM (Local Area Multicomputer, a freely available MPI library). Here we test the bandwidth and latency characteristics of the point-to-point operations.
Bandwidth is measured as the number of bytes per second one node is able to send to another node using a synchronous transfer mode. Latency is how many milliseconds pass before a message arrives at the other node. For both tests, we vary the number of bytes transmitted. Results are plotted in Figures G.18-G.19 for a switched gigabit network. It is clear that MPP offers performance competitive with LAM.
Instead of performing microbenchmarks for the collective and asynchronous operations, we introduce the Distributed Matrix Toolkit (DMT) and run large scale benchmarks using it. DMT uses many of the asynchronous and collective operations, and it is therefore suitable as a benchmark tool.
Figure G.19: Latency performance comparison between MPP and LAM. Plotted is the number of transactions per millisecond as a function of the array size.
[Figure G.20 panels: (a) row-wise partition of A into A1, ..., A4; (b) each Ai split into a block diagonal part and a remainder Bi; (c) the augmented overlapping blocks.]
Figure G.20: Partition of the global system matrix A onto four nodes. Ai and
Bi are local to node i. Note that Bi is exactly zero in the block Ai occupies. A
corresponding overlapping preconditioner matrix is easy to construct.
Our aim is to use the Krylov subspace iterative methods, and we thus consider computing the matrix/vector product

    y = Ax,    A : R^N -> R^N,    x, y ∈ R^N,    (G.1)
1. Compute z = Bi xi for a z ∈ R^N.
2. Send the non-zero entries of z to the relevant nodes.
3. Compute yi = Āi xi.
4. Add into yi the values of z computed on other nodes.

Figure G.21: Algorithm for computing y = Ax with the distributed matrix format given in Figure G.20(b). The communication started in step 2 can be overlapped with the computation in step 3.
to the global set, Equation (G.1) can be rewritten locally as

    y += Ai xi.    (G.2)

The symbol += means that locally computed results are added into the global vector. There are two problems with this simple decomposition: we must have access to the whole of y on every node, and we cannot overlap computations and communications. To overcome these issues, we split Ai into a block diagonal part Āi and a remainder Bi:

    Ai = Āi + Bi,    Āi : R^n -> R^n,    Bi : R^n -> R^N.

It is assumed that Bi = 0 where it overlaps Āi, see Figure G.20(b). Rewriting Equation (G.2) to use this decomposition gives

    yi = Āi xi,    y += Bi xi,    yi ∈ R^n.    (G.3)
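In code, the two local products of (G.3) amount to the following sketch for a single node. Dense arrays and the class name are illustrative only; DMT itself stores the local blocks as sparse matrices:

```java
// Sketch of the two local products in (G.3): yi over the n-by-n
// block diagonal part, and z over the n-to-N remainder. The nonzero
// entries of z are what would be sent to the other nodes (step 2 of
// the algorithm), overlapping with the block diagonal product.
class LocalProducts {
    // yi = Abar * xi, where Abar is the n-by-n block diagonal part
    static double[] blockDiagonal(double[][] Abar, double[] xi) {
        int n = xi.length;
        double[] yi = new double[n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                yi[r] += Abar[r][c] * xi[c];
        return yi;
    }

    // z = B * xi, where B is N-by-n and maps into the global vector
    static double[] remainder(double[][] B, double[] xi) {
        double[] z = new double[B.length];
        for (int r = 0; r < B.length; r++)
            for (int c = 0; c < xi.length; c++)
                z[r] += B[r][c] * xi[c];
        return z;
    }
}
```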
Table G.3: Number of iterations per second on average for the PCG method when
using the Distributed Matrix Toolkit (DMT). Percentage of optimal scalability,
compared to Table G.2, is given in parenthesis.
the latter requires that we create an augmented matrix Ãi which contains some rows and columns of Bj, j ∈ {i-1, i, i+1}, and Aj, j ∈ {i-1, i+1}, see Figure G.20(c). Sequential subdomain solvers such as ILU or ICC can then be applied.
To test DMT, we solve the preconditioned Conjugate Gradients (PCG) problem again as we did for SMT. The only difference is that we now use distributed
memory matrices and vectors. Results are given in Table G.3.
Finally, we mention that our design has been inspired by [259, 24]; however, it is somewhat more flexible, since we allow any kind of subdomain matrices Ai and Bi, not just compressed row storage.
G.6 Summary
We have presented a comprehensive set of numerical libraries for scientific computing in Java. Performance figures show them to be competitive with other libraries while offering many of the benefits of Java, such as object orientation and a comprehensive runtime environment. Also, all the software presented in this paper is freely available at

http://www.math.uib.no/bjornoh/mtj

Only the most noteworthy features of the libraries have been presented herein. Some omissions are column-oriented sparse formats, sparse eigenvalue solvers, solvers for saddle point problems and nested parallelization (combining SMT and DMT for multilevel parallelization).
There are some limitations of our implementation. The first is the almost exclusive use of double precision as the underlying numerical format. Complex numbers and extended precision are not supported, but may be if proper template mechanisms are added to Java. Another limitation is that Java arrays are addressed with signed 32-bit integers. Since an array index must be non-negative, this limits us to arrays of length 2^31. While this may seem like a lot, dense matrices are stored as one long array, and a square n × n matrix can then not exceed n = sqrt(2^31) ≈ 46341, which is not a particularly large number. It is hoped that true 64-bit addressing is added before this becomes problematic.
Bibliography
[1] J. Aarnes. Efficient domain decomposition methods for elliptic problems
in heterogeneous porous media. To appear in Computing and Visualization
in Science.
[2] J. Aarnes. On the use of a mixed multiscale finite element method for greater flexibility and increased speed or improved accuracy in reservoir simulation. SIAM Journal on Multiscale Modeling and Simulation, 2(3):421–439, 2004.
[3] J. Aarnes and T. Y. Hou. Multiscale domain decomposition methods for elliptic problems with high aspect ratios. Acta Mathematicae Applicatae Sinica, 18(1):63–76, 2002.
[4] I. Aavatsmark. Mathematische Einführung in die Thermodynamik der Gemische. Akademie Verlag, 1995.
[5] I. Aavatsmark. Bevarelsesmetoder for elliptiske differensialligninger. Technical report, Institutt for Informatikk, University of Bergen, 2004.
[6] I. Aavatsmark. Bevarelsesmetoder for hyperbolske differensialligninger. Technical report, Institutt for Informatikk, University of Bergen, 2004.
[7] I. Aavatsmark. Mehrpunktflussverfahren. Technical report, Institut für Wasserbau, Universität Stuttgart, 2004.
[8] I. Aavatsmark. Interpretation of a two-point flux stencil for skew parallelogram grids. Submitted to Computational Geosciences, 2005.
[9] I. Aavatsmark, T. Barkve, Ø. Bøe, and T. Mannseth. Discretization on unstructured grids for inhomogeneous, anisotropic media. Part I: Derivation of the methods. SIAM Journal on Scientific Computing, 19:1700–1716, 1998.
[10] I. Aavatsmark, T. Barkve, Ø. Bøe, and T. Mannseth. Discretization on unstructured grids for inhomogeneous, anisotropic media. Part II: Discussion and numerical results. SIAM Journal on Scientific Computing, 19:1717–1736, 1998.
[11] I. Aavatsmark, E. Reiso, and R. Teigland. Control-volume discretization method for quadrilateral grids with faults and local refinement. Computational Geosciences, 5(1):1–23, 2001.
[12] G. Ács, S. Doleschall, and E. Farkas. General purpose compositional model. SPE, 10515, 1985.
[13] M. Al-Lawatia, R. C. Sharpley, and H. Wang. Second order Characteristic Methods for Advection-Diffusion Equations and Comparisons to Other Schemes. Advances in Water Resources, 22(7):741–768, 1999.
[14] M. B. Allen. Collocation Techniques for Modeling Compositional Flows in
Oil Reservoirs, volume 6 of Lecture Notes in Engineering. Springer, 1984.
[15] M. B. Allen, G. A. Behie, and J. A. Trangenstein. Multiphase Flow in
Porous Media, volume 34 of Lecture Notes in Engineering. Springer, 1988.
[16] L. Ambrosio and H. M. Soner. Level set approach to mean curvature flow in arbitrary codimension. Journal of Differential Geometry, 43(4):693–737, 1996.
[17] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, third edition, 1999.
[18] A. Arakawa. Computational Design for Long-Term Numerical Integration of the Equations of Fluid Motion: Two-Dimensional Incompressible Flow. Part I. Journal of Computational Physics, 1:119–143, 1966.
[19] A. Arakawa and V. Lamb. A Potential Enstrophy and Energy Conserving Scheme for the Shallow Water Equations. Monthly Weather Review, 109:18–36, 1981.
[20] D. Arnold. An interior penalty finite element method with discontinuous elements. SIAM Journal on Numerical Analysis, 19(4):742–760, 1982.
[21] D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini. Unified analysis of discontinuous Galerkin methods for elliptic problems. SIAM Journal on Numerical Analysis, 39(5):1749–1779, 2002.
[70] K. H. Coats. Use and misuse of reservoir simulation models. SPE, 2367,
1969.
[71] K. H. Coats. Simulation of steamflooding with distillation and solution gas.
SPE, 5015, 1976.
[72] K. H. Coats. A highly implicit steamflood model. SPE, 6105, 1978.
[73] K. H. Coats. Reservoir simulation: State of the art. SPE, 10020, 1982.
[74] B. Cockburn, G. Karniadakis, and C.-W. Shu, editors. Discontinuous
Galerkin Methods. Theory, Computation and Applications, volume 11 of
Lecture Notes in Computational Science and Engineering. Springer, 2000.
[75] B. Cockburn and C.-W. Shu. The local discontinuous Galerkin method for time-dependent convection-diffusion systems. SIAM Journal on Numerical Analysis, 35:2440–2463, 1998.
[76] D. Colton, R. Ewing, and W. Rundell. Inverse Problems in Partial Differential Equations. SIAM, 1992.
[77] Computer Modelling Group. GEM: Advanced Compositional Reservoir
Simulator.
[78] Computer Modelling Group. IMEX: Advanced Oil/Gas Reservoir Simulator.
[79] Computer Modelling Group.
Reservoir Simulator.
[173] P. D. Lax. Weak solutions of nonlinear hyperbolic equations and their numerical computation. Communications on Pure and Applied Mathematics, 7:159–193, 1954.
[174] R. J. LeVeque. Numerical Methods for Conservation Laws. Birkhäuser, 1992.
[175] R. J. LeVeque. Finite Difference Methods for Differential Equations. http://www.amath.washington.edu/rjl/booksnotes.html, 1998. Unpublished text.
[176] M. C. Leverett. Capillary behaviour in porous solids. Transactions of the AIME, 142:159–172, 1941.
[177] W. B. Lewis and M. C. Leverett. Steady flow of gas-oil-water mixtures through unconsolidated sands. Transactions of the AIME, 142:107–116, 1941.
[178] R. Li, T. Tang, and P. Zhang. Moving mesh methods in multiple dimensions based on harmonic maps. Journal of Computational Physics, 170(2):562–588, 2001.
[179] R. Li, T. Tang, and P. Zhang. A moving mesh finite element algorithm for singular problems in two and three space dimensions. Journal of Computational Physics, 177(2):365–393, 2002.
[180] S. Liang. Java Native Interface: Programmer's Guide and Specification. Addison-Wesley, 1999.
[181] J.-L. Lions. Problèmes aux limites non homogènes à données irrégulières: une méthode d'approximation. In Numerical Analysis of Partial Differential Equations, pages 283–292. CIME, 1968.
[182] W. Liu and N. Yan. Quasi-norm a priori and a posteriori error estimates for the nonconforming approximation of p-Laplacian. Numerische Mathematik, 89(2):341–378, 2001.
[183] W. Liu and N. Yan. Quasi-norm local error estimators for p-Laplacian. SIAM Journal on Numerical Analysis, 39(1):100–127, 2001.
[184] W. Liu and N. Yan. Some a posteriori error estimators for p-Laplacian based on residual estimation or gradient recovery. Journal of Scientific Computing, 16(4):435–477, 2001.
[199] P. Morin, R. H. Nochetto, and K. G. Siebert. Data oscillation and convergence of adaptive FEM. SIAM Journal on Numerical Analysis, 38:466–488, 2000.
[200] I. Navon. Implementation of a posteriori methods for enforcing conservation of potential enstrophy and mass in discretized shallow-water equations models. Monthly Weather Review, 109:946–958, 1981.
[201] H. Nessyahu and E. Tadmor. Non-oscillatory central differencing for hyperbolic conservation laws. Journal of Computational Physics, 87:408–463, 1990.
[202] T. K. Nilssen. Parameter estimation with the augmented Lagrangian
method and a study of some fourth order problems. PhD thesis, University of Bergen, October 2001.
[203] T. K. Nilssen, 2002. Personal communications.
[204] T. K. Nilssen and X.-C. Tai. Parameter estimation with the augmented Lagrangian method for a parabolic equation. Journal of Optimization Theory and Applications, 124(2), February 2005.
[205] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, 1999.
[206] J. M. Nordbotten and G. T. Eigestad. Discretization on quadrilateral grids
with improved monotonicity properties. Journal of Computational Physics,
203(2), 2004.
[207] F. K. North. Petroleum geology. Allen & Unwin, 1985.
[208] J. T. Oden, I. Babuska, and C. E. Baumann. A discontinuous hp finite element method for diffusion problems. Journal of Computational Physics, 146:491–519, 1998.
[209] E. Øian. Modeling of Flow in Faulted and Fractured Media. PhD thesis, University of Bergen, March 2004.
[210] S. Osher and R. R. Fedkiw. Level set methods. Technical Report 0008, Computational and Applied Mathematics, University of California, Los Angeles, February 2000.
[211] S. Osher and J. A. Sethian. Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 79:12–49, 1988.
[212] S. J. Osher and R. P. Fedkiw. Level Set Methods and Dynamic Implicit
Surfaces. Springer, 2002.
[213] J. C. Parker and R. J. Lenhard. A Model for Constitutive Relations Governing Multiphase Flow in Porous Media 1. Saturation-Pressure Relations. Water Resources Research, 23(12):2187–2196, 1987.
[214] D. W. Peaceman. Improved treatment of dispersion in numerical calculation of multidimensional miscible displacement. SPE, 1362, 1965.
[215] D. W. Peaceman. Fundamentals of Numerical Reservoir Simulation. Elsevier Scientific Publishing Company, 1977.
[216] D. W. Peaceman. Interpretation of well-block pressures in numerical reservoir simulation. SPE, 6893, 1978.
[217] D. W. Peaceman. A new method for calculating well indexes for multiple wellblocks with arbitrary rates in numerical reservoir simulation. SPE,
79687, 2003.
[218] K. S. Pedersen, A. Fredenslund, and P. Thomassen. Properties of Oils
and Natural Gases, volume 5 of Contributions in Petroleum Geology and
Engineering. Gulf Publishing Company, 1989.
[219] D.-Y. Peng and D. B. Robinson. A new two-constant equation of state. Industrial & Engineering Chemistry Fundamentals, 15(1):59–64, 1976.
[220] Ø. Pettersen. Grunnkurs i Reservoarmekanikk. Department of Mathematics, University of Bergen, 1990.
[221] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, New York, 1969.
[222] M. Prevost. Accurate Coarse Reservoir Modeling Using Unstructured
Grids, Flow-Based Upscaling and Streamline Simulation. PhD thesis,
Stanford University, December 2003.
[223] W. Pugh and J. Spacco. MPJava: High-Performance Message Passing in
Java using Java.nio. Presented at Languages and Compilers for Parallel
Computing, 2003.
[224] A. Quarteroni and A. Valli. Domain Decomposition Methods for Partial
Differential Equations. Oxford University Press, 1999.
[225] P. Raviart and J. Thomas. A mixed finite element method for second order elliptic equations. In I. Galligani and E. Magenes, editors, Mathematical Aspects of Finite Element Methods, pages 292–315. Springer, 1977.
[226] V. Reichenberger, H. Jakobs, P. Bastian, R. Helmig, and J. Niessner. Complex Gas-Water Processes in Discrete Fracture-Matrix Systems: Upscaling, Mass-Conservative Discretization and Efficient Multilevel Solution. Technical Report 130, Institut für Wasserbau, Universität Stuttgart, 2004.
[227] R. C. Reid, J. M. Prausnitz, and T. K. Sherwood. The Properties of Gases
and Liquids. McGraw-Hill Book Company, third edition, 1977.
[228] W. C. Rheinboldt. Methods for solving systems of nonlinear equations.
SIAM, second edition, 1998.
[229] T. F. Russell and M. A. Celia. An overview of research on Eulerian-Lagrangian localized adjoint methods (ELLAM). Advances in Water Resources, 25(8-12):1215–1231, 2002.
[230] Y. Saad. SPARSKIT: a basic tool for sparse matrix computations, second
edition, June 1994.
[231] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7:856–869, 1986.
[232] F. Santosa. A level-set approach for inverse problems involving obstacles. European Series in Applied and Industrial Mathematics. Contrôle, Optimisation et Calcul des Variations, 1:17–33, 1995/96.
[233] A. E. Scheidegger. General theory of dispersion in porous media. Journal of Geophysical Research, 66(10):3273–3278, 1961.
[234] A. E. Scheidegger. The physics of flow through porous media. University
of Toronto Press, third edition, 1974.
[235] Schlumberger. ECLIPSE: Technical Description.
[236] H. A. Schwarz. Gesammelte mathematische Abhandlungen. Vierteljahrsschrift der Naturforschenden Gesellschaft Zürich, 15:272–286, 1870.
[237] R. C. Selley. Elements of petroleum geology. W. H. Freeman and Company,
1985.
[276] N. Yan. A posteriori error estimators of gradient recovery type for elliptic obstacle problems. Advances in Computational Mathematics, 15(1-4):333–362, 2001.
[277] N. Yan and A. Zhou. Gradient recovery type a posteriori error estimates for finite element approximations on irregular meshes. Computer Methods in Applied Mechanics and Engineering, 190(32-33):4289–4299, 2001.
[278] Z. Zhang. Polynomial preserving gradient recovery and a posteriori estimate for bilinear element on irregular quadrilaterals. International Journal of Numerical Analysis and Modeling, 1(1):1–24, 2004.
[279] H.-K. Zhao, T. F. Chan, B. Merriman, and S. Osher. A variational level set approach to multiphase motion. Journal of Computational Physics, 127(1):179–195, 1996.
[280] A. Zhou, editor. A posteriori error estimation and adaptive computational
methods, volume 15. Kluwer Academic Publishers, 2002.
[281] O. C. Zienkiewicz and R. L. Taylor.
The finite element method.
Butterworth-Heinemann, fifth edition, 2000.
[282] O. C. Zienkiewicz and J. Z. Zhu. The superconvergent patch recovery and a posteriori error estimates, parts 1 and 2. International Journal for Numerical Methods in Engineering, 33:1331–1364 and 1365–1382, 1992.
[283] B. G. Zorn. The measured cost of conservative garbage collection. Software - Practice and Experience, 23(7):733–756, 1993.