
Risky Measures of Risk:

Error Analysis of
Numerical Differentiation
Dr. Harvey J. Stein
Head, Quantitative Finance R&D
Bloomberg L.P.
22 June 2005
Revision: 1.13

1. Overview

Issues in numerical differentiation
- Roundoff error
- Convexity error
- Cancellation error
- Correlated errors

Methods to improve accuracy
- h control
- Smoothing techniques
- Computation specific approaches

2. Derivatives as finite difference


To compute the derivative of a function f, one must compute

    f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}.

Although one could try to take this limit numerically, this is a lot of work. More
commonly, one chooses a small value of h = h0, and tries to verify that the
approximation

    f'(x) \approx f'_r(x) = \frac{f(x+h_0) - f(x)}{h_0}

is sufficiently close to the desired derivative.
But, once we're not taking the limit, we run into questions, such as:

- What value of h0?
- Why f'_r, and not f'_l(x) = \frac{f(x) - f(x-h_0)}{h_0} or f'_c(x) = \frac{f(x+h_0) - f(x-h_0)}{2h_0}?
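As a concrete reference for these definitions, here is a minimal sketch in Python. The choice of SciPy's normal CDF as the test function (its exact derivative is the normal PDF) and the step size are illustrative, not from the original slides:

    from scipy.stats import norm

    def right_diff(f, x, h):
        # One sided (forward) difference f'_r
        return (f(x + h) - f(x)) / h

    def left_diff(f, x, h):
        # One sided (backward) difference f'_l
        return (f(x) - f(x - h)) / h

    def central_diff(f, x, h):
        # Centered difference f'_c
        return (f(x + h) - f(x - h)) / (2 * h)

    # The exact derivative of the normal CDF is the normal PDF.
    print(central_diff(norm.cdf, 0.0, 1e-5), norm.pdf(0.0))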

3. Investigating behavior as a function of h0


Naively, one might think that since we want the limit as h goes to zero, small is
better.
However, graphing the computed derivatives as functions of the step size indicates
otherwise.

[Figure: Normal CDF overlaid with central difference derivatives computed with h = 10^-12 and h = 10^-13.]
As can be seen in the above graph, h0 = 10^-13 exhibits some noise (which, I'm
afraid, is barely visible on this slide).

Using h0 = 10^-14 gives visible noise:

[Figure: Normal CDF overlaid with the central difference derivative, stepsize 10^-14.]

As h0 is decreased, the approximation gets worse and worse, with h0 = 10^-17
giving complete nonsense - a derivative approximation that's mostly zero:

[Figure: Normal CDF overlaid with central difference derivatives, stepsizes 10^-15, 10^-16 and 10^-17.]

4. Error analysis 101 - machine precision


Why does the derivative look so bad for h0 = 10^-17? Because then h0 is disappearing into the resolution of our computer arithmetic:

[Figure: The cumulative normal over the range 1 - 10^-15 to 1 + 10^-15, where at this magnification it appears as a step function.]

Doubles by the IEEE standard are 64 bits long with a 53 bit mantissa. For any
given exponent, one can only represent 2^52 different positive values (one bit is
the sign bit). 2^52 is about 10^16, so we only get to use at most 16 decimal digits to
represent a mantissa. In particular, our computers think that 1 + 10^-17 = 1.
This discreteness affects the input value, the output value, and all intermediate
computations.
Inspecting the above graph, we see that a stepsize of 10^-16 for x values around 1
will give either zero or a ridiculously high value for the derivative. 10^-15 will also
give noisy derivatives, depending on where the actual x + h0 and x - h0 lie on
the step function that constitutes the normal CDF at this level of magnification.
There's another effect that's commonly discussed: the fact that (x + h) - (x - h) ≠ 2h
because of roundoff error. This means that the denominator of our
approximation shouldn't really be 2h. There are two ways to fix this. One is to
use a power of 2 for h instead of a power of 10. The other is to set h = (x + h) - x.
The latter requires a little bit of effort to prevent an optimizing compiler from
reducing it to h = h.
But, I've only observed minor impact from such adjustments. For clarity, I'll
continue using powers of 10 instead of powers of 2 here.
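These effects are easy to reproduce. A small sketch in Python (whose floats are IEEE doubles); the particular x and h are illustrative:

    import numpy as np

    x = 1.0
    print(x + 1e-17 == x)       # True: 1e-17 vanishes below the 53 bit mantissa
    print(np.finfo(float).eps)  # 2**-52, the gap between 1 and the next double

    # The h = (x + h) - x adjustment: h_eff is what was actually added to x.
    # Python evaluates this as written; in C, a volatile temporary may be
    # needed to keep an optimizing compiler from folding it back to h = h.
    h = 1e-5
    h_eff = (x + h) - x
    print(h, h_eff)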


5. Error analysis 102A - The right h


But is this the whole story? Since we're differentiating the normal CDF, we
can compare it to the analytic solution. Comparing our original calculation with
h0 = 10^-12, we see that the derivative computed wasn't nearly as accurate as
we had thought. In fact, h0 = 10^-10 gives much smaller errors:
[Figure: normal - delta f for h = 10^-12, 10^-11 and 10^-10; errors on the order of 10^-4.]

Let's see how large we need to make h0.


[Figure: normal - delta f for h = 10^-10, 10^-9 and 10^-8; errors on the order of 10^-7.]
Clearly, 10^-8 works much better than 10^-10.


Continuing along, we see that 10^-6 works much better than 10^-8:
[Figure: normal - delta f for h = 10^-8, 10^-7 and 10^-6; errors on the order of 10^-9.]

Finally, we see that 10^-5 gives the best results, with 10^-4 being smooth, but
giving a high bias.
[Figure: normal - delta f for h = 10^-6, 10^-5 and 10^-4.]

To prove I'm not cheating, let's do the same with f(x) = x^3:


[Figure: error of the central difference derivative of x^3 with h = 10^-14.]

The best h0 for f(x) = x^3 ends up again being around 10^-5. Coincidence?
[Figure: error of the central difference derivative of x^3 with h = 10^-6, 10^-5 and 10^-4.]

Let's look more closely at the error graph using h0 = 10^-5:


[Figure: normal - delta f with h = 10^-5; errors on the order of 10^-11.]

It appears to consist of noise plus some periodic error. The noise is from cancellation error, and the periodic component is from convexity error.

6. Convexity error
    f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2!}h^2 + \frac{f'''(x)}{3!}h^3 + \ldots

    f(x+h) - f(x-h) = 2f'(x)h + 2\frac{f'''(x)}{3!}h^3 + \ldots

    \frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{f'''(x)}{3!}h^2 + \ldots

When h is large enough, the \frac{f'''}{3!}h^2 term contributes to the error. This error
tends to zero as h tends to zero.

This is one of the reasons why a centered derivative is favored over a one sided
derivative: the f''h error term drops out, so the convexity error is smaller.
Note also that the one sided derivative is the same as the two sided derivative
computed at half the stepsize and shifted by half the stepsize, so in some sense
the error in the one sided derivative is that it's estimating the derivative at the
wrong x value.
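The different convergence rates are easy to see numerically. A sketch (the test point and step sizes are illustrative):

    from scipy.stats import norm

    x, exact = 1.0, norm.pdf(1.0)
    for h in [1e-1, 1e-2, 1e-3]:
        one_sided = (norm.cdf(x + h) - norm.cdf(x)) / h
        centered = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
        # The one sided error shrinks like h (the f''h/2 term survives);
        # the centered error shrinks like h^2 (only f'''h^2/3! remains).
        print(h, one_sided - exact, centered - exact)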

7. Cancellation error

Consider f(x + h) - f(x - h). Suppose each value has 3 significant digits of
accuracy. How many digits of accuracy are in the difference?

      f(x + h) = .335 + noise
    -(f(x - h) = .231 + noise)
    f(x + h) - f(x - h) = .104 + noise

However,

      f(x + h) = .335 + noise
    -(f(x - h) = .331 + noise)
    f(x + h) - f(x - h) = .004 + noise


In the first case, the difference yielded 3 significant digits. In the second case,
the high order digits cancelled, leaving only 1 significant digit in the difference.
This is cancellation error.
Clearly, cancellation error increases as h decreases. For h sufficiently small,
f(x + h) = f(x - h) and the relative error becomes infinite.
In general, we can use relative errors to encode the number of significant digits
in our computations. Let

    \hat{f}(x) = f(x) + \epsilon(x) f(x)

where \hat{f}(x) is what we actually get when computing f(x). Here ε(x) is a random
quantity that quantifies the relative error in calculating f(x). If we're accurate
to machine precision, then |ε| ≤ 2^-53. If our calculation only yields 5 decimal
digits of accuracy, then |ε| ≤ 10^-5/2.


    \hat{f}(x+h) - \hat{f}(x-h) = f(x+h) - f(x-h) + \epsilon(x+h) f(x+h) - \epsilon(x-h) f(x-h)
                                \approx f(x+h) - f(x-h) + \epsilon(x) f(x)

assuming h is small enough that f(x + h) and f(x - h) are about the same
magnitude, that ε(x + h) and ε(x - h) are independent noise, and ignoring
the fact that summing them might cause the loss of one additional significant
bit. This analysis can be done more carefully, but this will get us into the right
ballpark.

This quantifies the problem of cancellation error. The absolute error in
calculating f is roughly εf. For small h, f(x + h) - f(x - h) is small, leaving the
error to dominate the calculation.

The best value of h will balance the cancellation error and the convexity error:

    \frac{f(x+h) - f(x-h)}{2h} \approx f'(x) + \frac{f'''(x)}{3!} h^2 + \frac{\epsilon(x) f(x)}{2h}


To minimize the error, we balance the convexity term against the cancellation term:

    \frac{f'''(x)}{3!} h^2 = \frac{\epsilon(x) f(x)}{2h}

which implies

    f''' h^3 = 3 \epsilon f, \qquad \text{or} \qquad h = \sqrt[3]{3 \epsilon f / f'''}.

If our calculations are exact except for roundoff error, and f and f''' are around
the same order of magnitude, then |ε| ≈ 2^-53 (52 accurate digits, base 2), which
gives an optimal h of about 7 × 10^-6. This ties out with our above empirical
work, and indicates that both functions are being computed with around full
accuracy.
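This is easy to check empirically. A sketch that scans step sizes for the centered difference of the normal CDF against the exact derivative (the grid of x values and of step sizes is illustrative):

    import numpy as np
    from scipy.stats import norm

    xs = np.linspace(-3.0, 3.0, 601)
    for h in 10.0 ** np.arange(-12.0, -2.0):
        approx = (norm.cdf(xs + h) - norm.cdf(xs - h)) / (2 * h)
        err = np.abs(approx - norm.pdf(xs)).max()
        print(f"h = {h:.0e}  max error = {err:.2e}")
    # The reported error bottoms out for h around 1e-6 to 1e-5.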


For x^2, we know the 3rd derivative is zero. This means that the only error is
from cancellation, which means that larger h should automatically be better.
Graphs confirm this theory:
[Figure: error of the central difference derivative of x^2 with h = 10^-2, 10^-1 and 10^0.]

Note that in finance our error is often on the order of a penny on a 100 value, a
relative error of about 10^-5, which requires h0 ≈ 0.03, a rather large value!

8. Error analysis 102B - Flying without a reference


More commonly we're faced with calculating derivatives without being able to
verify against a known analytic formula. If we don't know the derivative, and
we don't know how much error we have in our function, how does one pick h0?
One way is to inspect higher order derivatives. If our first derivative is jumping
around, then the second derivative with the same step size will be visibly noisy.
We'll use

    f''(x) \approx \frac{f(x+h_0) + f(x-h_0) - 2f(x)}{h_0^2}

and

    f'''(x) \approx \frac{f(x+2h_0) - 2f(x+h_0) + 2f(x-h_0) - f(x-2h_0)}{2h_0^3}

so that the second and third derivative approximations sample f with the
same spacing as the first derivative calculation.
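In code, the two estimators look like this (a sketch; the test function and step sizes are illustrative):

    from scipy.stats import norm

    def d2(f, x, h):
        # Second derivative estimate, sampling f with spacing h
        return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

    def d3(f, x, h):
        # Third derivative estimate, also sampling with spacing h
        return (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)

    # If these jump around as h varies, the first derivative at the same h
    # is suspect too. For the normal CDF the exact values for comparison
    # are -x*phi(x) and (x**2 - 1)*phi(x).
    for h in [1e-7, 1e-6, 1e-5, 1e-4]:
        print(h, d2(norm.cdf, 1.0, h), d3(norm.cdf, 1.0, h))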


Graphing f'' and f''' as a function of step size shows that the second derivative
is visibly poor for h0 = 10^-7 and that the third derivative is visibly poor for
h0 = 10^-5:
[Figure: Normal CDF; derivative with h = 10^-5; 2nd derivative with h = 10^-7 and 10^-6; 3rd derivative with h = 10^-5 and 10^-4.]

This at least indicates that something between 10^-6 and 10^-4 is called for when
computing f'. Can we make this more precise? Maybe, but we haven't tried.
One might consider trying to integrate the derivative to see how close it comes
back to the original function. Unfortunately this doesn't help, because the sum is
effectively a telescoping series. If Δx = 2h and X = {x_1, x_1 + Δx, x_1 + 2Δx, \ldots, x_2},
then

    \sum_{x \in X} \frac{f(x+h) - f(x-h)}{2h} \, \Delta x = f(x_2 + h) - f(x_1 - h).

In other words, the errors in the derivative from one point to the next cancel each
other. This is clear because if a given point is too high, then the derivative to
the left will be too large while on the right it will be too small.

9. Error analysis 201 - Correlated errors


This is where most error analysis ends up, but it is far from the whole story. One
of the key assumptions in the above analysis is that the error is random and
uncorrelated from one x value to another. This is rarely the case.
Consider finite difference and lattice (aka tree) approaches to option valuation.
In these, the pricing function is a weighted average of the payoff sampled at various
points. The weights change slightly as a function of the underlying, but the actual
payoffs used change substantially as the strike passes a sample point. This makes
the pricing function calculation roughly a piecewise linear approximation of the
actual function. In the case of a European option on a stock under Black-Scholes
and using a binomial lattice, it's exactly a piecewise linear approximation. In
the case of an option on a bond, it's closer to piecewise exponential.
To prove this, let S_{ij} be the stock value at node j at time t_i. With starting value
S_0, volatility σ, maturity time T, N steps, Δt = T/N, and risk free rate r, a
typical binomial lattice uses future stock values of

    S_{ij} = S_0 u^j d^{i-j},

where u = e^{σ√Δt} and d = 1/u.


The value of an option of strike K is then

    C = e^{-rT} \sum_{j=0}^{N} \binom{N}{j} p^j q^{N-j} \max(S_{Nj} - K, 0)
      = e^{-rT} \sum_{j=j(S_0)}^{N} \binom{N}{j} \left( S_0 (pu)^j (qd)^{N-j} - K p^j q^{N-j} \right)

where p = (e^{rΔt} - d)/(u - d), q = 1 - p, and j(S_0) is the minimum j such that S_{Nj} > K.

Then,

    \frac{dC}{dS_0} = e^{-rT} \sum_{j=j(S_0)}^{N} \binom{N}{j} (pu)^j (qd)^{N-j}.

The derivative is a step function, only changing value when j(S_0) changes.
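A sketch of such a lattice calculation makes the step function visible. It assumes the CRR parameterization given above; the function name, implementation details and bump points are mine, and the parameters mirror the 12 step example graphed below:

    import math

    def binom_call(S0, K, r, sigma, T, N):
        # CRR lattice as in the text: u = exp(sigma*sqrt(dt)), d = 1/u
        dt = T / N
        u = math.exp(sigma * math.sqrt(dt))
        d = 1.0 / u
        p = (math.exp(r * dt) - d) / (u - d)
        q = 1.0 - p
        logC = 0.0  # log of the binomial coefficient C(N, j)
        total = 0.0
        for j in range(N + 1):
            if j > 0:
                logC += math.log(N - j + 1) - math.log(j)
            payoff = max(S0 * u**j * d**(N - j) - K, 0.0)
            total += math.exp(logC + j*math.log(p) + (N - j)*math.log(q)) * payoff
        return math.exp(-r * T) * total

    # Delta by centered difference with small h: constant until j(S0) changes.
    for S0 in [95.0, 97.5, 100.0, 102.5, 105.0]:
        h = 0.01
        up = binom_call(S0 + h, 100.0, 0.03, 0.3, 1.0, 12)
        dn = binom_call(S0 - h, 100.0, 0.03, 0.3, 1.0, 12)
        print(S0, (up - dn) / (2 * h))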


This is rarely noticed when graphing the function, just like the errors in the
derivative calculations weren't noticed in our initial graphs:
[Figure: Option value vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice.]


But it becomes clearly evident when we inspect the difference derivative:


[Figure: dC/dS vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, h = .01, for 12 step and 120 step lattices.]

A 12 step lattice gives us large piecewise linear sections. A 120 step lattice, while
increasing computation by a factor of 100, only decreases the sizes of the steps
by about a factor of 3.


Why only a factor of three? Because u = e^{σ√Δt}. Decreasing the time step by a
factor of 10 only decreases σ√Δt, and hence the lattice spacing, by a factor of
√10 ≈ 3.16, which only gives about 3 times the level density.
When the approximation is piecewise linear, and the stepsize is much smaller
than the support of the linear segments, the first derivative is poor. In computing
the second derivative, the sample endpoints almost always land on the same
segment, making the estimate of the second derivative zero almost everywhere.

[Figure: ddC (second difference) vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, h = .01, for 12 step and 120 step lattices.]

10. Smoothing
When the calculation is a black box, we can't get inside to use the internals in
the calculation. In this case, how can one compute a good derivative?
One trick is to use a large h. We accept the convexity error because it's swamped
by the error from the piecewise linearity of the function. Picking h around 1 to 2
times the support of the linear segments will do it.

11. H adjustment
Here we can see that with a 12 step lattice, we need to compute the derivative
with h0 ≈ 17.
[Figure: dC/dS vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice, for h = .01, 1, 10 and 17.]


The second derivative is also helped by using a larger stepsize, but still isn't
especially good:
[Figure: 2nd deriv vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, 12 step lattice, for h = 1, 10 and 17.]


More commonly, people would use 120 levels for a 1 year stock option, but even
this requires a large value of h0:
[Figure: dC/dS vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice, for h = .01, 1 and 5.]


Second derivative:
[Figure: 2nd deriv vs. initial stock value, 1 yr opt, 30% vol, 3% risk free rate, 120 step lattice, h = 5.]


For fun, let's take a look at what happens with 1200 levels, which is over 3
levels/day:
[Figure: dC/dS vs. initial stock value, 30% vol, 3% risk free rate, 1200 step lattice, for h = .01 and 2.]

As you can see, we still have fairly large piecewise linear sections. We need to
make h0 around 2 to get reasonable derivative estimates.


Second derivative:
[Figure: 2nd deriv vs. initial stock value, 30% vol, 3% risk free rate, 1200 step lattice, h = 2.]


Why did an h0 of 17 for 12 levels, 5 for 120 levels and 2 for 1200 levels work
reasonably well?
As mentioned before, the stepsize needed is roughly the lattice spacing. This is
approximately 2S_0 σ√Δt, which is 17 for 12 steps/year, 5.5 for 120 steps/year,
and 1.7 for 1200 steps/year.
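The arithmetic, as a quick sketch:

    import math

    S0, sigma, T = 100.0, 0.3, 1.0
    for steps in [12, 120, 1200]:
        dt = T / steps
        # Spacing of adjacent lattice levels near S0 is roughly
        # S0*(u - d), approximately 2*S0*sigma*sqrt(dt).
        print(steps, 2 * S0 * sigma * math.sqrt(dt))
    # Prints roughly 17.3, 5.5 and 1.7.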
Even for a dense lattice of 1200 levels, a much larger stepsize is required than is
commonly recognized.
In fact, it's common to use monthly steps in a binomial lattice for long dated
bonds, and a bump size of 10bp for modified duration and key rate duration with
a one sided derivative.
Let's take a look at the behavior of this. We'll use a trinomial lattice, which
gives better results than a binomial lattice.


First, consider the error in computing the change in a callable bond as a function
of the step size using a centered derivative.
[Figure: Centered derivatives: dC/dS for 10bp, 25bp and 50bp shifts, vs. curve shift from 0.025 to 0.075.]

A 25bp shift is bumpy, but looks fairly close to what it should be.


Compare this to a one sided derivative, which is what's commonly used:


[Figure: One sided derivatives: dC/dS 25bp centered, plus 10bp, 25bp and 50bp one sided, vs. curve shift.]

The stepping in the one sided derivative for a given h0 is the same as that for
the centered derivative at h0/2, but the convexity error is much worse.


Next, consider the key rate sensitivities. The bond isn't sensitive to the 3mo
rate, so the 1st key rate sensitivity should be zero.
[Figure: Key rate duration (k1): call sensitivity vs. level, for 10bp, 25bp and 50bp shifts, one sided and centered.]

It ends up being close to zero, but noisy, and pretty similar for all step sizes
and seemingly unaffected by whether we use a centered derivative or a one sided
derivative.


The second key rate sensitivities suffer from the piecewise nature of the calculation, both the centered ones as well as the one sided ones.
[Figure: Key rate duration (k2): call sensitivity vs. level, for 10bp, 25bp and 50bp shifts, one sided and centered.]

The other key rates look similar.


Comparing the sum of the one sided key rates to the 25bp centered derivatives, we
see that the sum suffers both from the piecewise nature as well as the convexity,
and suffers worse than the one sided full sensitivity.
[Figure: Sum of key rates, one sided: dC/dS 25bp vs. sums of 10bp, 25bp and 50bp one sided key rates, vs. curve shift.]


The centered key rates are better, with the 25bp step size landing fairly close to
the 25bp derivative.
[Figure: Sum of key rates, centered: dC/dS 25bp vs. sums of 10bp and 25bp centered key rates, vs. curve shift.]


Comparing the 50bp centered key rates to the 50bp difference derivative, we
see that the two are close, but are significantly different. This is because the
key rates interact with the piecewise nature differently than the full curve shift.
[Figure: Sum of key rates, centered: dC/dS 50bp vs. sum of 50bp centered key rates, vs. curve shift.]

12. Filtering
A more sophisticated approach is to smooth our pricing function. Essentially,
we'd like to filter out the high frequencies that come from the corners where the
slope changes, leaving only the lower frequency data arising from the changing
function values.
This amounts to computing the Fourier transform of the price function,
multiplying by a function that decays to zero (to dampen out the high frequency
noise), and transforming back, or

    \text{Smooth } f = \mathcal{F}^{-1}(\mathcal{F}(f) D)

where D is our damping function (or smoothing kernel), and \mathcal{F} is the Fourier
transform.


Since \mathcal{F}(f * g) = \mathcal{F}(f)\mathcal{F}(g) (where * is the convolution operator),

    \text{Smooth } f = \mathcal{F}^{-1}(\mathcal{F}(f) D)
                     = \mathcal{F}^{-1}(\mathcal{F}(f) \mathcal{F}(\mathcal{F}^{-1}(D)))
                     = \mathcal{F}^{-1}(\mathcal{F}(f * \mathcal{F}^{-1}(D)))
                     = f * \mathcal{F}^{-1}(D)

So, smoothing a function is the same as computing its convolution with the
inverse transform of the smoothing kernel. Since (f * g)' = f' * g = f * g',
smoothing the derivative can be done by convolving with the derivative of the
inverse transform of the smoothing kernel. Finally, since the Fourier transform
of a Gaussian PDF is a Gaussian (up to scaling), we can smooth by integrating
against a Gaussian and its derivatives.
All that's left is to integrate a function times a Gaussian, which is best done by
Gaussian quadrature:

    \frac{d}{dx_0} \frac{1}{\sigma\sqrt{2\pi}} \int f(x_0 - x)\, e^{-x^2/2\sigma^2}\, dx
        = -\frac{1}{\sigma^3\sqrt{2\pi}} \int f(x_0 - x)\, x\, e^{-x^2/2\sigma^2}\, dx
        = -\frac{1}{\sigma}\sqrt{\frac{2}{\pi}} \int f(x_0 - \sqrt{2}\sigma x)\, x\, e^{-x^2}\, dx
        \approx -\frac{1}{\sigma}\sqrt{\frac{2}{\pi}} \sum_i w_i\, f(x_0 - \sqrt{2}\sigma x_i)\, x_i

where x_i are the Gaussian quadrature points and w_i are the associated weights.
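A sketch of the quadrature step; the kernel width sigma and the number of points are illustrative, and hermgauss is NumPy's routine for nodes and weights under the weight function e^{-t^2}:

    import numpy as np
    from numpy.polynomial.hermite import hermgauss
    from scipy.stats import norm

    def smoothed_derivative(f, x0, sigma, npts=5):
        # Convolve f with the derivative of a Gaussian of width sigma,
        # evaluating the integral above by Gauss-Hermite quadrature.
        t, w = hermgauss(npts)
        vals = f(x0 - np.sqrt(2.0) * sigma * t)
        return -np.sqrt(2.0 / np.pi) / sigma * np.sum(w * t * vals)

    # Smoke test on a smooth function: close to the exact derivative.
    print(smoothed_derivative(norm.cdf, 1.0, sigma=0.05), norm.pdf(1.0))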
The theory sounds beautiful, and looks like exactly what we need, but it
doesn't live up to its promise in practice. Although I've used this method in
the past, and it has applications in signal processing, I've been unable to make
it perform better than a two point difference derivative. It seems that it works
better in the random noise case than on piecewise linear functions.


Using 5 points on a 120 level Black-Scholes lattice yields:


[Figure: FFT vs centered difference, 1st derivative: formula with h = 10^-5, centered with h = 5, Gauss 3pt with h = 5, vs. stock price.]

The 5 point FFT method yields similar results to the 2 point difference derivative.
Both look good by inspection.


But of course, the best way to check is to compare to a good reference. In this
case, we'll compare the error relative to a difference derivative computed on the
formula using a step size of 10^-5.
[Figure: FFT vs centered difference, 1st derivative errors: centered error, FFT error with sig = 3 and sig = 2, vs. stock price.]

The FFT method is hard pressed to do better than a well chosen step size.


It's hard to make either method produce both a smooth and accurate second
derivative:
[Figure: FFT vs centered difference, 2nd derivative: formula with h = 10^-4, centered with h = 15, FFT with h = 6, vs. stock price.]

Both look equally poor, with the FFT method requiring twice the computational
effort.


Error graphs confirm what we saw in the previous graph:


[Figure: FFT vs centered difference, 2nd derivative errors: centered with h = 15, FFT error with sig = 6, vs. stock price.]

13. Complex arithmetic


Another approach makes use of complex analysis. If f is a real valued function
of one real variable, and can be extended to a complex analytic function, then

    f(x + ih) = f(x) + f'(x) ih - \frac{f''(x) h^2}{2} - \frac{f^{(3)}(x) i h^3}{3!} + \frac{f^{(4)}(x) h^4}{4!} + \ldots

so

    \frac{\Im(f(x + ih))}{h} = f'(x) - \frac{f^{(3)}(x) h^2}{3!} + \ldots

This has the same convexity error as the centered derivative, but doesn't directly
suffer from cancellation error, allowing one to reduce h to lower convexity error
without increasing cancellation error.
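A sketch of the technique on a function that is already analytic (the tiny h is the point: there is no subtraction, hence no cancellation):

    import cmath

    def complex_step(f, x, h=1e-20):
        # f must accept complex arguments and be analytic near x.
        # No differencing occurs, so there is no cancellation error,
        # and h can be made tiny to kill the h^2 convexity term.
        return f(complex(x, h)).imag / h

    print(complex_step(cmath.exp, 1.0))  # ~ e = 2.718281828459045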
While this approach can be useful in analytic methods, difficulties are encountered
when trying to apply it in finance. It doesn't correct for correlated errors
when the function is piecewise linear; it just does a very good job of returning
the slope of the linear sections, yielding a step function for the derivative. It's
also not as straightforward as it looks. One can't just change all references
of double to complex, because numerical code in finance makes heavy use of
inequalities, as in

    \max(S - K, 0),

which are meaningless on the complex plane. They need to be replaced by
something else. One source recommends comparing the real parts, but this
prevents the function from being analytic, thus breaking the above Taylor series
analysis. Finally, our analytic formulas in finance typically involve cumulative
normal distributions. While there is a unique continuation to the complex plane,
computing it is more involved than just calculating erf(x/sqrt(2))/2 + 1/2. One
would need to develop fast and accurate numerical methods for the calculation
of a complex cumulative normal before this method is useful in such a context.
This method is commonly compared to a one sided derivative because both
require one additional function evaluation. But evaluating a function at a complex
point can triple the computational effort. One complex addition is over double
the effort of a real addition, in that it requires two real additions and works with
more memory. One complex multiplication requires four real multiplications plus
two real additions, and thus is over four times as expensive as a real multiplication.


A centered derivative is more comparable in computational effort, in which case
both methods have the same convergence properties as h0 tends to zero. The
only difference is in the cancellation error.
It's easy to see why this doesn't help for Black-Scholes binomial lattices.
Recalling that the lattice computation for the value of a call option is

    C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \binom{N}{j} \left( S_0 (pu)^j (qd)^{N-j} - K p^j q^{N-j} \right)

we see that

    \Im(C(S_0 + ih))/h = e^{-rT} \sum_{j=j(S_0)}^{N} \binom{N}{j} (pu)^j (qd)^{N-j}.

Up to roundoff error, the complex method gives the same results as the centered
difference.


Another complex technique is to exploit the Cauchy integral formula, which
states that

    f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_\gamma \frac{f(z)}{(z - z_0)^{n+1}} \, dz

where \gamma is a counterclockwise loop enclosing z_0.
One can then compute the above integral numerically. Bruno Dupire and Arun
Verma have looked into this method a little, deriving formulas for getting 4th
order accuracy using 4 points for the first 4 derivatives.
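A sketch of the idea; this particular discretization (the trapezoidal rule on a circle of illustrative radius r with m points) is mine, not Dupire and Verma's formulas:

    import cmath
    import math

    def cauchy_derivs(f, z0, r=0.1, m=8):
        # Trapezoidal rule on a circle of radius r around z0; returns
        # approximations to f, f', f'', f''' at z0.
        samples = [f(z0 + r * cmath.exp(2j * math.pi * k / m)) for k in range(m)]
        out = []
        for n in range(4):
            s = sum(fk * cmath.exp(-2j * math.pi * k * n / m)
                    for k, fk in enumerate(samples)) / m
            out.append(math.factorial(n) * s / r**n)
        return out

    # On exp, every derivative at 0 is 1 (up to tiny quadrature error).
    print([d.real for d in cauchy_derivs(cmath.exp, 0.0)])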

14. Algorithm specific approaches


There are additional approaches that can be taken if one can modify the internals
of the numerical calculations.

15. Using internal lattice spacing


In finite difference approaches, one can often read extra information from the
lattice itself. In a simple Black-Scholes lattice, one can start the lattice two
levels early. This gives the option value as the middle value after the second
step. The values at the other two nodes can be used for the up and down values.
One reference for this method is a 1994 article by Pelsser and Vorst, where they
call it a well known alternative to the difference derivative.
Pelsser and Vorst compute the derivative as Δf/Δx, which introduces convexity error
by doing a difference derivative around the wrong point. Here we avoid this by
using another numerical technique: fitting all three points (the up, the down
and the center) to a quadratic and reading the derivatives from there, as in the
sketch below.
This latter technique could also be used in general when three points are available,
and should reduce convexity error, but I haven't tested it.
Shifting the lattice often gives the best derivatives that can be gotten from a
lattice.
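A sketch of the quadratic readout; the node locations and values below are hypothetical, just to show the fit:

    import numpy as np

    def quadratic_greeks(S_dn, S_mid, S_up, V_dn, V_mid, V_up):
        # Fit V = a*S^2 + b*S + c through the three nodes and read off
        # delta = 2*a*S_mid + b and gamma = 2*a at the center node.
        a, b, c = np.polyfit([S_dn, S_mid, S_up], [V_dn, V_mid, V_up], 2)
        return 2 * a * S_mid + b, 2 * a

    # Hypothetical node values from a lattice started two levels early:
    delta, gamma = quadratic_greeks(91.7, 100.0, 109.05, 10.2, 14.2, 19.3)
    print(delta, gamma)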


Unfortunately, the approach can't always be applied. In interest rate lattices, the
values at the other nodes don't always correspond to a shift of the yield curve.
In normal short rate models they do, but in lognormal models they don't. In the
latter case, to apply this approach, one would have to either adjust the derivative
or settle for differentiating with respect to a different sort of curve move.


Nonetheless, where this method applies, it works quite well. We'll compare
it to the best of fixed h0 selection. Consider Black-Scholes again with a 1 year
option, 30% vol, 3% risk free rate, computed using a 120 step binomial lattice.
Again, the first derivatives are visually fine:
[Figure: Centered difference vs lattice shift, 1st derivative: formula with h = 10^-5, centered with h = 5, lattice shift, vs. stock price.]


But the differences to a reference show that the shifted lattice approach is far
smoother and more accurate:
[Figure: Centered difference vs lattice shift, 1st derivative errors: centered error vs. lattice shift, vs. stock price.]


The results on the second derivative are more pronounced. The fixed h selection
is visibly poor, while the shifted lattice still looks quite good:
[Figure: Centered difference vs lattice shift, 2nd derivative: formula with h = 10^-4, centered with h = 15, lattice shift, vs. stock price.]


Checking the differences to a reference shows how much better:


[Figure: Centered difference vs lattice shift, 2nd derivative errors: centered with h = 15, FFT error with sig = 6, lattice shift, vs. stock price.]


Looking at the errors for the lattice shift by itself, we can see the errors in the
second derivative calculation are around 10^-4, which is about a 1.5% error in
the second derivative.
[Figure: Lattice shift 2nd derivative error vs. stock price.]


Surprisingly, this method yields reasonable results even with a monthly lattice.
Here's the first derivative:
[Figure: Centered difference vs lattice shift, 1st derivative, monthly lattice: formula with h = 10^-5 vs. lattice shift, vs. stock price.]


Here's the second derivative:


[Figure: Centered difference vs lattice shift, 2nd derivative, monthly lattice: formula with h = 10^-4 vs. lattice shift, vs. stock price.]


The shifted lattice performs well because it samples the price function at exactly
the right points. At option expiration, on the up tree there's exactly one more
node in the money and one less out of the money, and the rest get exactly the
same value.
When the call option price is

    C(S_0) = e^{-rT} \sum_{j=j(S_0)}^{N} \binom{N}{j} \left( S_0 (pu)^j (qd)^{N-j} - K p^j q^{N-j} \right)

and 1 ≤ j(S_0) ≤ N - 1, the up price is

    C(S_0 u) = e^{-rT} \sum_{j=j(S_0)-1}^{N} \binom{N}{j} \left( S_0 u (pu)^j (qd)^{N-j} - K p^j q^{N-j} \right)

and the down price is

    C(S_0 d) = e^{-rT} \sum_{j=j(S_0)+1}^{N} \binom{N}{j} \left( S_0 d (pu)^j (qd)^{N-j} - K p^j q^{N-j} \right).


Whereas we used a quadratic approximation, for simplicity just consider a one
sided derivative. Its value is

    \frac{C(S_0 u) - C(S_0)}{S_0 u - S_0}
      = e^{-rT} \left[ \binom{N}{j(S_0)-1} \frac{u (pu)^{j(S_0)-1} (qd)^{N-j(S_0)+1} - (K/S_0)\, p^{j(S_0)-1} q^{N-j(S_0)+1}}{u - 1}
        + \sum_{j=j(S_0)}^{N} \binom{N}{j} (pu)^j (qd)^{N-j} \right].

It's exactly the centered difference derivative for a small shift, plus a correction
term that's a linear function of 1/S_0. It's the correction term varying as a
function of S_0 while j(S_0) remains fixed that makes up the appropriate correction
for the derivative calculation.
It must be noted that despite the above graphs, optimizing h0 and the shifted
lattice method are actually the same numerically. Picking h0 gives poorer results
in the above tests predominantly because we're not picking a different h0 for each
underlying. Fixing it for the entire computation is why it's not behaving nearly
as well.
Setting h = S_0 u - S_0 would make the difference derivative quite close to the
shifted lattice value. Using separate up and down shifts instead of doing a
centered derivative would make them identical. But this is the same value at
almost three times the computational effort.

16. Differentiation under the integral sign


In Monte Carlo calculations, one computes an integral via random sampling of
the payoff. Pricing errors in Monte Carlo based calculations are typically much
larger than in other methods, making shifting methods particularly poor.
One approach (advocated by Vladimir Piterbarg) is to exploit the fact that
the integral and derivative commute, integrating the derivative of the payoff
function instead of differentiating the integral of the payoff. This approach may
lead to staircasing, but even so, it's still better than the random noise observed
in attempting a direct finite difference calculation.
Another approach (advocated by Fournie, Lasry, Lebuchoux, Lions and Touzi,
as well as by Benhamou) applies Malliavin calculus in an effort to reduce the
error in computing the expectation of the derivative. Here, instead of computing
dE[X(S_0)]/dS_0, we find a random variable π (a Malliavin weight) such that

    \frac{d}{dS_0} E[X(S_0)] = E[X(S_0)\, \pi],

which again allows computing the derivative directly by Monte Carlo instead of
taking the difference of two Monte Carlo price calculations.
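A sketch of the first approach (differentiating the payoff pathwise) for a Black-Scholes call, where the pathwise derivative is available in closed form; the function name, parameters and path count are illustrative:

    import numpy as np

    def mc_pathwise_delta(S0, K, r, sigma, T, n_paths=100_000, seed=0):
        # Differentiate the payoff inside the expectation:
        # d/dS0 max(S_T - K, 0) = 1{S_T > K} * S_T / S0,
        # since S_T is linear in S0 along each path.
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(n_paths)
        ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
        return np.exp(-r * T) * np.mean((ST > K) * ST / S0)

    print(mc_pathwise_delta(100.0, 100.0, 0.03, 0.3, 1.0))  # ~ BS delta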
Differentiation under the integral can also be used when valuing options via FFT.


The derivative can be computed by computing the FFT of the derivative of the
characteristic function.

17. Analytic techniques


Theres a large literature on working out various greeks analytically which I
havent reviewed. Because of the pricing PDE, there are relations that can be
exploited to avoid the need to compute all the greeks some can be gotten from
others. Symmetries and in general, behavior under specific transformations can
be exploited as well. Papers by Peter Carr as well as by Oliver Reiss and Uwe
Wystup are good places to get started.

18. Summary
- Approximating the derivative by a difference magnifies the error of the original function.
- Small step sizes give huge errors due to cancellation error.
- Large step sizes give huge errors due to convexity error.
- Balancing convexity error and cancellation error requires unexpectedly large step sizes, as large as 10^-5 when calculations are accurate to machine precision.
- It's hard to judge accuracy without an accurate reference, but one can try to make do by graphing higher order derivatives with small stepsizes.
- Finite difference methods produce piecewise linear (or exponential) functions, which require extra care. Large step sizes are needed to produce reasonable results. We observed the need for step sizes of 17 for a 12 level binomial lattice, and 25-50bp for a 12 level trinomial lattice. Hedges in practice could be way off.
- Fixing this by increasing lattice density is computationally infeasible because level spacing is proportional to √Δt.


- Beware of key rate durations. They're especially inaccurate.
- Beware of one sided derivatives. They're more sensitive to piecewise linear functions and more sensitive to convexity: the worst of both worlds.
- Other methods appear in the literature, but don't always help.
- One simple method that does help is using the points in the lattice for the up and down values, extending the lattice back in time if necessary to get those points.

19. References

What Every Computer Scientist Should Know About Floating-Point Arithmetic, David Goldberg, Computing Surveys, March 1991.
http://docs.sun.com/source/806-3568/ncg goldberg.html

Numerical Recipes in C/C++/Fortran, William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery.

The Binomial Model and the Greeks, Antoon Pelsser and Ton Vorst, Journal of Derivatives, Spring 1994.

The Complex-Step Derivative Approximation, Sensitivity Analysis Workshop, Livermore, August 2001.
http://mdolab.utias.utoronto.ca/documents/livermore2001.pdf

The Connection Between the Complex-Step Derivative Approximation and Algorithmic Differentiation, J. R. R. A. Martins, P. Sturdza, J. J. Alonso, AIAA Paper 2001-0921, Jan. 2001.


Using Complex Variables to Estimate Derivatives of Real Functions, William Squire and George Trapp, SIAM Review, Vol. 40, No. 1, March 1998.

Risk Sensitivities of Bermuda Swaptions, Vladimir Piterbarg, Bank of America Working Paper, November 1, 2002.

Applications of Malliavin Calculus to Monte Carlo Methods in Finance, Eric Fournié, Jean-Michel Lasry, Jérôme Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 3, No. 4, August 1999.

Applications of Malliavin Calculus to Monte Carlo Methods in Finance, II, Eric Fournié, Jean-Michel Lasry, Jérôme Lebuchoux, Pierre-Louis Lions, Nizar Touzi, Finance and Stochastics, Vol. 5, No. 2, April 2001.

Smart Monte Carlo: Various Tricks Using Malliavin Calculus, E. Benhamou, Quantitative Finance, Volume 2, Number 5, 2002.

Optimal Malliavin Weighting Function for the Computation of the Greeks, E. Benhamou, Mathematical Finance, Volume 13, Issue 1, 2003.

Deriving Derivatives of Derivative Securities, Peter Carr, Journal of Computational Finance, Vol. 4, No. 2, Winter 2000.

Computing Option Price Sensitivities Using Homogeneity and Other Tricks, Oliver Reiss and Uwe Wystup, The Journal of Derivatives, Winter 2001.
