OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© Oxford University Press 2006
The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First published 2006
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Printed in Great Britain
on acid-free paper by
Biddles Ltd., King's Lynn, Norfolk
10 9 8 7 6 5 4 3 2 1
Preface
The name "Econophysics" denotes the application of mathematical techniques,
developed for the study of random processes in physical systems, to the economic
and financial worlds. Since a substantial number
of physicists are now employed in the financial arena or are doing research in
this area, it is appropriate to give a course that emphasizes and relates physical
applications to financial applications.
The course and text on Random Processes in Physics and Finance differs from
mathematical texts by emphasizing the origins of noise, as opposed to an analysis
of its transformation by linear and nonlinear devices. Of course, the latter enters
any analysis of measurements, but it is not the focus of this work.
The text opens with a chapter-long review of probability theory to refresh those
who have had an undergraduate course, and to establish a set of tools for those who
have not. Of course, this chapter can be regarded as an oxymoron since probability
includes random processes. But we restrict probability theory, in this chapter, to
the study of random events, as opposed to random processes, the latter being a
sequence of random events extended over a period of time.
It is intended, in this chapter, to raise the level of approach by demonstrating
the usefulness of delta functions. If an optical experimenter does his work with
lenses and mirrors, a theorist does it with delta functions and Green's functions. In
the spirit of Mark Kac, we shall calculate the chi-squared distribution (important
in statistical decision making) with delta functions. The normalization condition
of the probability density in chi-square leads to a geometric result; namely, we
can calculate the volume of a sphere in n dimensions without ever transforming to
spherical coordinates.
The use of a delta function description permits us to sidestep the need for using
Lebesgue measure and Stieltjes integrals, greatly simplifying the mathematical
approach to random processes. The problems associated with Ito integrals used
both by mathematicians and financial analysts will be mentioned below. The prob-
ability chapter includes a section on what we call the first and second laws of
gambling.
Chapters 2 and 3 define random processes and provide examples of the most
important ones: Gaussian and Markovian processes, the latter including Brownian
motion. Chapter 4 provides the definition of a noise spectrum, and the Wiener-
Khinchine theorem relating this spectrum to the autocorrelation. Our point of view
here is to relate the abstract definition of spectrum to how a noise spectrum is
measured.
vi PREFACE
Chapter 5 provides an introduction to thermal noise, which can be regarded
as ubiquitous. This chapter includes a review of the experimental evidence, the
thermodynamic derivation for Johnson noise, and the Nyquist derivation of the
spectrum of thermal noise. The latter touches on the problem of how to handle
zero-point noise in the quantum case. The zero-frequency Nyquist noise is shown
to be precisely equivalent to the Einstein relation (between diffusion and mobility).
Chapter 6 provides an elementary introduction to shot noise, which is as ubiq-
uitous as thermal noise. Shot noise is related to discrete random events, which, in
general, are neither Gaussian nor Markovian.
Chapters 7-10 constitute the development of the tools of random processes.
Chapter 7 provides in its first section a summary of all results concerning the
fluctuation-dissipation theorem needed to understand many aspects of noisy sys-
tems. The proof, which can be omitted for many readers, is a succinct one in
density matrix language, with a review of the latter provided for those who wish
to follow the proof.
Thermal noise and Gaussian noise sources combine to create a category of
Markovian processes known as Fokker-Planck processes. A serious discussion of
Fokker-Planck processes is presented in Chapter 8, including generation-recom-
bination processes, linearly damped processes, Doob's theorem, and multivariable
processes.
Just as Fokker-Planck processes are a generalization of thermal noise, Langevin
processes constitute a generalization of shot noise, and a detailed description is
given in Chapter 9.
The Langevin treatment of the Fokker-Planck process and diffusion is given
in Chapter 10. The form of our Langevin equation is different from the stochastic
differential equation using Ito's calculus lemma. The transform of our Langevin
equation obeys the ordinary rules of calculus; hence it can be performed easily,
and misleading results can be avoided. The origin of the difference between our
approach and that using Ito's lemma comes from the different definitions of the
stochastic integral.
Applications of these tools make up the remainder of the book. These appli-
cations fall primarily into two categories: physical examples and examples from
finance. The two can be pursued independently.
The physical application that required learning all these techniques was the
determination of the motion and noise (line-width) of self-sustained oscillators
like lasers. Adding nonlinear terms to a linear system usually introduces
background noise of the convolution type but does not create a sharp line. The
question "Why is a laser line so narrow?" (it can be as low as one cycle per second,
even when the laser frequency is of the order of 10^15 per second) is explained in
Chapter 11. It is shown that autonomous oscillators (those with no absolute time
origin) all behave like van der Pol oscillators, have narrow line-widths, and have
a behavior near threshold that is calculated exactly.
Chapter 12 shows that noise in semiconductors (in homogeneous systems) can be
treated entirely by the Lax-Onsager "regression theorem".
The random motion of particles in a turbid medium, due to multiple elastic scat-
tering, obeys the classic Boltzmann transport equation. In Chapter 13, the center
position and the diffusion behavior of an incident collimated beam into an infinite
uniform turbid medium are derived using an elementary analysis of the random
walk of photons in a turbid medium. In Chapter 14, the same problem is treated
based on cumulant expansion. An analytical expression for cumulants (defined in
Chapter 1) of the spatial distribution of particles at any angle and time, exact up to
an arbitrarily high order, is derived in an infinite uniform scattering medium. Up
to second order, a Gaussian spatial distribution solving the Boltzmann transport
equation is obtained, with exact average center and exact half-width as functions
of time.
Chapter 15 on the extraction of signals in a noisy, distorted environment has
applications in physics, finance and many other fields. These problems are ill-
posed and the solution is not unique. Methods for treating such problems are
discussed.
Having developed the tools for dealing with physical systems, we learned that
the Fokker-Planck process is the one used by Black and Scholes to calculate the
value of options and derivatives. Although there are serious limitations to the
Black-Scholes method, it created a revolution because there were no earlier meth-
ods to determine the values of options and derivatives. We shall see how hedging
strategies that lead to a riskless portfolio have been developed based on the Black-
Scholes ideas. Thus financial applications, such as arbitrage, based on this method
are easy to handle after we have defined forward contracts, futures and put and call
options in Chapter 16.
The finance literature expends a significant effort on teaching and using Ito
integrals (integrals over the time of a stochastic process). This effort is easily
circumvented by redefining the stochastic integral by a method that is correct
for processes with nonzero correlation times, and then approaching the limit in
which the correlation time goes to zero (the Brownian motion limit). The limiting
result that follows from our iterative procedure disagrees with the Ito definition
of the stochastic integral and agrees with the Stratonovich definition. It is also less
likely to mislead; conflicting results appear, for example, in John Hull's book on
Options, Futures and Other Derivative Securities.
In Chapter 17 we turn to methods that apply to economic time series and other
forms including microwave devices and global warming. How can the spectrum of
economic time series be evaluated to detect and separate seasonal and long term
trends? Can one devise a trading strategy using this information?
How can one determine the presence of a long term trend, such as global warm-
ing, from climate statistics? Why are these results sensitive to the choice among
solar year, sidereal year, equatorial year, etc.? Which one is best? The most
careful study of such time series by David J. Thomson will be reviewed. For exam-
ple, studies of global warming are sensitive to whether one uses the solar year,
sidereal year, the equatorial year or any of several additional choices!
This book is based on a course on Random Processes in Physics and Finance
taught at the City College of the City University of New York to students in physics
who have had a first course in "Mathematical Methods". Students in engineering
and economics who have had comparable mathematical training should also be
capable of coping with the text. A review/summary is given of an undergraduate
course in probability. This also includes an appendix on delta functions, and a fair
number of examples involving discrete and continuous random variables.
Contents
1 Review of probability 1
1.1 Meaning of probability 1
1.2 Distribution functions 4
1.3 Stochastic variables 5
1.4 Expectation values for single random variables 5
1.5 Characteristic functions and generating functions 7
1.6 Measures of dispersion 8
1.7 Joint events 12
1.8 Conditional probabilities and Bayes' theorem 16
1.9 Sums of random variables 19
1.10 Fitting of experimental observations 24
1.11 Multivariate normal distributions 29
1.12 The laws of gambling 32
1.13 Appendix A: The Dirac delta function 35
1.14 Appendix B: Solved problems 40
5 Thermal noise 82
5.1 Johnson noise 82
5.2 Equipartition 84
5.3 Thermodynamic derivation of Johnson noise 85
5.4 Nyquist's theorem 87
5.5 Nyquist noise and the Einstein relation 90
5.6 Frequency dependent diffusion constant 90
6 Shot noise 93
6.1 Definition of shot noise 93
6.2 Campbell's two theorems 95
6.3 The spectrum of filtered shot noise 98
6.4 Transit time effects 101
6.5 Electromagnetic theory of shot noise 104
6.6 Space charge limiting diode 106
6.7 Rice's generalization of Campbell's theorems 109
Bibliography 307
Index 323
A note from co-authors
Most of this book was written by Distinguished Professor Melvin Lax
(1922-2002) and originated from the notes for the class he taught at City University
of New York from 1985 to 2001. During his last few years, Mel made a great effort
to edit this book but, unfortunately, was unable to complete it before his untimely
illness.
Our work on the book is mostly technical: correcting misprints and
errors in the text and formulas, making minor revisions, and converting the book to
LaTeX. In addition, Wei Cai wrote Chapter 14, Sections 10.3-10.5, and Section 16.8,
and made changes to Sections 8.3, 16.4, 16.6, and 16.7; Min Xu wrote Chapter 13
and part of Section 15.6.
We dedicate our work on this book to the memory of our mentor, colleague, and
friend Melvin Lax. We would like to thank our colleagues at the City College
of New York, in particular Professors Robert R. Alfano, Joseph L. Birman, and
Herman Z. Cummins, for their strong support as we completed this book.
Wei Cai
Min Xu
1
Review of probability
Introductory remarks
The definition of probability has been (and still is) the subject of controversy. We
shall mention, briefly, three approaches.
it is regarded as the definition of the probability of success. One can object that
this definition is meaningless, since the limit does not exist in the ordinary sense
that for any ε there exists an N such that for all M > N, |P_M − P| < ε. This
limit will exist, however, in a probability sense; namely, the probability that these
inequalities fail can be made arbitrarily small. The Chebyshev inequality of
Eq. (1.32) is an example of a proof that the probability of a deviation will become
arbitrarily small for large deviations. What is the proper statement for the definition
of probability obtained as a "limit" of ratios in a large series of trials?
We shall sidestep the above controversy by assuming that for our applications there
exists a set of elementary events whose probabilities are equal, or at least known,
and shall describe how to calculate the probability associated with compound
events. Bertrand's paradox in Appendix 1.B illustrates the clear need for prop-
erly choosing the underlying probabilities. Three different solutions are obtained
there in accord with three possible choices of that uniform set. Which choice is
correct turns out not to be a question of mathematics but of the physics underlying
the measurements.
Suppose we have a random variable X that can take a set S of possible values
x_j for j = 1, 2, ..., N. It is then assumed that the probability
of each event j is known. Moreover, since the set of possible events is complete,
and something must happen, the total probability must be unity:
as given, and assume completeness for the density function p(x) in the form
The "discrete" case can be reformatted in continuous form by writing
where δ(x) is the Dirac delta function discussed in Appendix 1.A. It associates a
finite probability P_j with the value X = x_j.
Since mathematicians (until the time of Schwartz) regarded delta functions as
improper mathematics, they have preferred to deal with the cumulative density
function
which they call a distribution whereas physicists often use the name distribution
for a density function p(x). The cumulative probability replaces delta functions
by jump discontinuities which are regarded as more palatable. We shall only
occasionally find it desirable to use the cumulative distribution.
We shall refer to X as a random (or stochastic) variable if it can take a set (discrete
or continuous) of possible values x with known probabilities. With no loss of gen-
erality, we can use the continuous notation. These probabilities are then required
to obey
Mathematicians prefer the latter form and refer to the integral as a Stieltjes integral.
We have tacitly assumed that X is a single (scalar) random variable. However,
the concept can be immediately extended to the case in which X represents a
multidimensional object and x represents its possible values.
More generally, the nth moment of the probability distribution is defined by:
the probability density itself. Equation (1.65), below, provides one example in
which this definition is a useful way to determine the density distribution. In
attempting to determine the expectations of a random variable, it is often more
efficient to obtain an equation for a generating function of the random variable
first. For example, it may be faster to calculate the expectation ⟨exp(itX)⟩, which
includes all the moments ⟨X^n⟩, than to calculate each moment separately.
1.5 Characteristic functions and generating functions
which is the Fourier transform of the probability distribution. Note that φ(t) exists
for all real t and has the properties
and
for all t. We shall assume that the Stieltjes form of integral is used if needed. If all
moments of Eq. (1.14) exist then
provides a connection between the characteristic function and the moments. This
function φ(t) is a so-called generating function of a random variable.
A frequently used generating function can be obtained by setting
so that
Note that t is not the time, but just a parameter. One could equally well have used
k. The variable z is frequently used directly when the range of x is the set of
integers x_r = r, and then
provided that the variable X is not a lattice variable whose range of values is given
by:
If the range of X has an upper bound, i.e., p(x) = 0 for x > x_max, then it is
convenient to deal with the generating function
When neither condition is obeyed, one may still use these definitions with s
understood to be pure imaginary.
1.6 Measures of dispersion
In this section, we shall introduce moments that are taken with respect to the
mean. They are independent of origin, and give information about the shape of
the probability density. In addition, there is a set of moments known as cumulants
to physicists, or Thiele semi-invariants to statisticians. These are useful in describ-
ing the deviation from the normal error curve since all cumulants above the second
vanish in the Gaussian case.
Moments
The most important measure of dispersion in statistics is the standard deviation σ
defined by
since it describes the spread of the distribution p(x) about the mean value of x,
m = ⟨x⟩.
Chebychev's inequality
guarantees that the probability of a deviation exceeding a large multiple h of the
standard deviation σ must be small.
Proof
since the full value of σ² would be obtained by adding the (positive) integral over
the region m − hσ < x < m + hσ. In each of these integrals, (x − m)² ≥ (hσ)².
The inequality remains if we replace the RHS by its smallest possible value
or
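The completed bound, P(|x − m| ≥ hσ) ≤ 1/h², can be checked numerically. A minimal sketch (the exponential distribution, sample size, and seed are our illustrative choices, not from the text):

```python
import numpy as np

# Empirical check of Chebyshev's inequality: P(|X - m| >= h*sigma) <= 1/h^2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)
m, sigma = x.mean(), x.std()

for h in (2.0, 3.0, 4.0):
    tail = np.mean(np.abs(x - m) >= h * sigma)  # empirical tail probability
    assert tail <= 1.0 / h**2                   # the Chebyshev bound holds
```

For this distribution the actual tail probabilities (roughly e^−3, e^−4, e^−5) lie far below the bounds 1/4, 1/9, 1/16, illustrating how conservative the inequality is.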
for moments about the mean, m = ⟨x⟩, and to use Eq. (1.14)
for the ordinary moments. Thus μ₂ = σ² and μ₁ = 0. The binomial expansion of
Eq. (1.33) yields
where C(n, j) = n!/[j!(n − j)!] is the binomial coefficient. In particular:
Conversely:
Cumulants
The cumulants to be described in this section are useful since they indicate clearly
the deviation of a random variable from that of a Gaussian. They are sometimes
referred to as Thiele semi-invariants (Thiele 1903).
The cumulants KJ are defined by
Note that normalization of the probability density p(x) guarantees that μ₀ = 1 and
κ₀ = 0. Equivalently,
Thus κ₁ = m, and the higher κ's are expressible in terms of the moments of
(x — m). In particular:
where the individual linked moments must still be calculated by Eqs. (1.45)-
(1.49). However, Eq. (1.43) can be written in a nice symbolic form:
Example The normal error distribution (with mean zero) associated with a
Gaussian random variable X,
has the characteristic function
The integral can be performed by completing the square and introducing x′ =
x − iσ²t as the new variable of integration. In particular, the cumulants are all
determined by ln φ(t) to be
where
These measures γ₁ and γ₂ clearly vanish in the Gaussian case. Moreover, they
provide a pure description of shape independent of horizontal position or scale.
1.7 Joint events
Suppose that we have two random variables X and Y described (when taken sep-
arately) by the probability densities p₁(x) and p₂(y), respectively. The probability
that X is found in the interval (x, x + dx) and at the same time Y is found in the
interval (y, y + dy) is described by the joint probability density
Example
Two points x and y are selected, at random, uniformly on the line from 0 to 1.
(a) What is the density function p(ξ) of the separation ξ = |x − y|?
(b) What is the mean separation?
(c) What is the root mean squared separation [⟨ξ²⟩ − ⟨ξ⟩²]^(1/2)?
(d) What is ⟨W(|x − y|)⟩ for an arbitrary function W?
Solution
It is necessary to map the square with vertices at the four points in the (x, y) plane:
Using Eq. (1.62) and Eq. (1.64), the density function p(£) is then given by
Note that our use of a delta function to specify the variable we are interested in
is one of our principal tools. It fulfills our motto that experimentalists do it with
mirrors, and theorists do it with delta functions (and Green's functions).
The solution (1.65) is even in u, so we can integrate over half the interval and
double the result:
where the last integral, whose value is unity, was inserted as a means of intro-
ducing the variable ξ. Rearranging the order of integration, we get for the right
Fig. 1.2. The events A and B are nonoverlapping, and the probability of at least
one of these occurring is the sum of the separate probabilities.
hand side:
where the second integral is simply the definition of p(ξ). Restoring the left hand
side, we have established another tautology:
where W(ξ) is an arbitrary function and p(ξ) was given in Eq. (1.66). Finally, we
obtain an explicit formula
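These results can be verified by simulation, assuming the standard separation density p(ξ) = 2(1 − ξ) for two uniform points on [0, 1], which gives ⟨ξ⟩ = 1/3 and ⟨ξ²⟩ = 1/6 (a sketch; the sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the separation density p(xi) = 2(1 - xi) on [0, 1]:
# its first two moments are <xi> = 1/3 and <xi^2> = 1/6.
rng = np.random.default_rng(1)
x = rng.uniform(size=500_000)
y = rng.uniform(size=500_000)
xi = np.abs(x - y)          # separation of the two random points

mean_sep = xi.mean()        # compare with 1/3
mean_sq = (xi**2).mean()    # compare with 1/6
assert abs(mean_sep - 1/3) < 5e-3
assert abs(mean_sq - 1/6) < 5e-3
```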
Disjoint events
If A and B are disjoint events (events that cannot both occur), then
where A ∪ B means the union of A and B, that is, at least one of the events A or B
has occurred. In the language of set theory the intersection, A ∩ B, is the region in
which both events have occurred. For disjoint sets such as those shown in Fig. 1.2,
the intersection vanishes.
Overlapping events
The probability that at least one of two events A and B has occurred when overlap
is possible is
because the sum of the first two terms counts twice the probability P(A ∩ B) that
both events occur (the shaded area in Fig. 1.3).
Fig. 1.3. Events A and B that overlap are displayed. The hatched overlap region
is called the intersection and denoted A ∩ B.
Note that an event A or B need not be elementary. For example, A could repre-
sent the tossing of a die with an odd number appearing, which is the union of the
events of a one, a three, or a five appearing. B could be the union of the events one
and two.
Suppose Y_j is a random variable that takes the value 1 if event A_j occurs, and
zero otherwise. We have thus introduced a projection operator, that is, the analogue
for discrete variables of our introduction of a delta function for continuous vari-
ables. The probability that none of the events A₁, A₂, ..., A_n has occurred can be
written as
The probability that one or more events has occurred can be written as a
generalization of Eq. (1.75):
1.8 Conditional probabilities and Bayes' theorem
If X and Y are two, not necessarily independent, variables, the conditional proba-
bility density P(y|x) dy that Y takes a value in the range [y, y + dy], given that X
has the value x, is defined by
The notation in which the conditioned variables appear on the right is common in
the mathematical literature. It is also consistent with quantum mechanical notation
in which one reads the indexes from right to left. Thus verbally we say that the
probability that X and Y take the values x and y is the probability that X takes
the value x times the probability that Y will take the value y knowing that X has
taken the value x, a conclusion that now appears obvious.
Equation (1.82) is a general equation that imposes no requirements on the
nature of the random variables. Moreover, the same idea applies to events A and
B which may be more complicated than single random variables. Thus
Suppose that A^c is the complementary event to A (anything but A). Then these
events are mutually exclusive and exhaustive:
Then the events A ∩ B and A^c ∩ B are mutually exclusive and their union is B.
Thus
By the same argument, if the set of events A_j are mutually exclusive, A_i ∩ A_j =
∅, and exhaustive.
Bayes' theorem
One can determine the probability of a hypothesis A_j if we have made a mea-
surement B. This conditional probability P(A_j|B) is given by Bayes'
theorem
The first equality follows directly from the definition Eq. (1.80) of a conditional
probability. The second equality is obtained by inserting Eq. (1.88). The impor-
tance of Bayes' theorem is that it extracts the a posteriori probability, P(Aj\B),
of a hypothesis A_j after the observation of an event B from the a priori probability
P(A_j) of the hypothesis A_j.
For simple systems like the tossing of a die, the a priori probabilities are
known. In more general problems they have to be estimated, possibly as subjective
probabilities. Bayesians believe that this step is necessary. Anti-Bayesians do not.
They try to use another approach, such as maximum likelihood. In our opinion this
approach is equivalent to making a tacit assumption for the a priori probabilities.
We would prefer explicit assumptions.
Bernstein (1998) notes that Thomas Bayes (1763), an English minister,
published no mathematical works while he was alive. But he bequeathed his
manuscripts, in his will, to a preacher, Richard Price, who passed them to another
member of the British Royal Society, and his paper Essay Towards Solving a Prob-
lem in the Doctrine of Chances was published two years after his death. Although
Bayes' work was ignored for two decades after his death in 1761, he has since
become famous among statisticians and social and physical scientists.
Example
It is known that of 100 million quarters, 100 are two-headed. Thus the a priori
probability of a coin being two-headed is 10^-6. A quarter is selected at random
from this population and tossed 10 times. Ten heads are obtained. What is the
probability that this coin is two-headed?
Solution
Let A₁ = two-headed, A₂ = A₁^c = fair coin, and B = ten heads in ten tosses. We
have.
Then,
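The arithmetic can be laid out explicitly; a minimal sketch of the Bayes computation (the variable names are ours):

```python
# Bayes' theorem for the two-headed-quarter example:
# P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)].
p_a1 = 1e-6             # a priori probability of a two-headed quarter
p_a2 = 1.0 - p_a1       # a priori probability of a fair quarter
p_b_given_a1 = 1.0      # ten heads, given two-headed
p_b_given_a2 = 0.5**10  # ten heads, given fair: 1/1024

posterior = (p_b_given_a1 * p_a1
             / (p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2))
print(posterior)        # about 1.02e-3: the coin is still almost surely fair
```

Even after ten heads in a row, the tiny prior keeps the posterior near one in a thousand.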
Example
Two points are chosen at random in the interval [0,1]. They are connected by a
line. Two more points are then chosen over the same interval and connected by a
second line. What is the probability that the lines overlap?
Solution
We will answer the complementary question, which is easier: what is the
probability that the lines do not overlap?
Suppose the first two points are x and y. No overlap will occur if the next two
points are both left of the smaller of x, y or both right of the larger of x, y. By
symmetry, the second probability is the same as the first.
Suppose x is the smaller of x and y. The probability that the third point is
less than x is x. The probability that the fourth point is less than x is also x. The
probability that both are less than x is x². What is the probability density P(x = ξ)
given that x < y? This conditional probability is
where H(x) is the Heaviside step function, H(x) = 1 if x > 0 and H(x) = 0
otherwise. We can evaluate the denominator in Eq. (1.91):
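The final answer can be checked by simulation: the no-overlap probability works out to 1/3, so the two lines overlap with probability 2/3. A Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check: two random line segments on [0, 1] fail to overlap with
# probability 1/3, hence overlap with probability 2/3.
rng = np.random.default_rng(2)
pts = rng.uniform(size=(400_000, 4))     # four random points per trial
lo1 = np.minimum(pts[:, 0], pts[:, 1])
hi1 = np.maximum(pts[:, 0], pts[:, 1])
lo2 = np.minimum(pts[:, 2], pts[:, 3])
hi2 = np.maximum(pts[:, 2], pts[:, 3])

no_overlap = (hi2 < lo1) | (hi1 < lo2)   # one segment entirely beside the other
p_overlap = 1.0 - no_overlap.mean()
assert abs(p_overlap - 2/3) < 5e-3
```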
If the variables are independent, then the averages over x and y can be performed
separately with the result
Because the cumulants are defined in terms of the logarithm of the characteristic
function, the cumulants are additive:
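Additivity is easy to check numerically for the lowest orders, where κ₂ is the variance and κ₃ the third central moment. A sketch with exponential variables, for which κ_n = (n − 1)! θⁿ at scale θ (distribution, scales, and sample size are our illustrative choices):

```python
import numpy as np

# Numerical check that low-order cumulants add for independent variables.
rng = np.random.default_rng(3)
n = 1_000_000
x = rng.exponential(2.0, n)   # kappa_2 = 4,  kappa_3 = 16
y = rng.exponential(1.0, n)   # kappa_2 = 1,  kappa_3 = 2
s = x + y                     # cumulants of the sum should be 5 and 18

def k2(a):
    return np.mean((a - a.mean())**2)   # second central moment (variance)

def k3(a):
    return np.mean((a - a.mean())**3)   # third central moment

assert abs(k2(s) - (k2(x) + k2(y))) < 0.1
assert abs(k3(s) - (k3(x) + k3(y))) < 2.0
```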
More generally, if
The characteristic function of the joint distribution, p(x,y), of the two random
variables is defined by
If these variables X, Y are uncorrelated, i.e., ⟨XY⟩ = ⟨X⟩⟨Y⟩, then the
characteristic function factors:
Since p(x, y) can be obtained by taking the inverse Fourier transform of φ(s, t),
it too must factor. Hence we arrive at the result: if two Gaussian variables are
uncorrelated, they are necessarily independent.
If we ask for the probability that there are r successes in n trials without regard to
order, the probability will be
P_r(n) = C(n, r) p^r q^(n−r),   q ≡ 1 − p
With z = e^(it), the generating function can be expanded using the binomial theorem.
Since the coefficient of z^r in the generating function is P_r [or P_r(n) with n fixed
and r variable], we have established that
Comparison with Eq. (1.105) shows that the combinatorial and binomial coeffi-
cients are equal
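This identification can be verified by expanding the generating function (q + pz)ⁿ directly: the coefficient of z^r should equal C(n, r) p^r q^(n−r). A sketch with illustrative values of n and p:

```python
from math import comb

# Expand (q + p*z)**n in powers of z and compare each coefficient of z**r
# with the binomial probability C(n, r) * p**r * q**(n - r).
n, p = 6, 0.3
q = 1.0 - p

coeffs = [1.0]                       # polynomial coefficients in z
for _ in range(n):                   # multiply repeatedly by (q + p*z)
    nxt = [0.0] * (len(coeffs) + 1)
    for r, c in enumerate(coeffs):
        nxt[r] += c * q              # constant part
        nxt[r + 1] += c * p          # part proportional to z
    coeffs = nxt

for r in range(n + 1):
    assert abs(coeffs[r] - comb(n, r) * p**r * q**(n - r)) < 1e-12

assert abs(sum(coeffs) - 1.0) < 1e-12   # normalization at z = 1
```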
becomes a Gaussian random variable of mean zero and variance equal to unity.
These statements are heuristic. They tacitly assume what is known as the con-
tinuity theorem of probability discussed by E. Parzen (1960) in his Section 10.3.
This theorem states that the cumulative distribution function P_n and the charac-
teristic function φ_n are related in a continuous way: P_n converges at all points
of continuity of P if and only if the sequence of characteristic functions φ_n(t)
converges at each real t to the characteristic function φ(t) of P.
The result we have just found for the binomial distribution, that the normalized
sum variable
tends to 0 as n → ∞ for some δ > 0. Then the cumulative probability, P_n(u), that
See for example Uspensky (1937), Chapter 14. The condition (1.120) is less
stringent than the set of conditions in Eq. (1.115).
The cumulants then take the remarkably simple single value for all s
Linear fit
Suppose we expect, on theoretical grounds, that some theoretical variable Y should
be expressible as a linear combination of the variables X^μ. We wish to determine
the coefficients a_μ of this linear expansion in such a way as to best fit a set of
experimental data, i = 1 to N, of values X_i^μ, Y_i. We choose to minimize the
least squares deviation between the "theoretical" value of Y_i
and the observed value Y_i by minimizing the sum of the squares of the deviations
between experiment and theory:
We can minimize by differentiating with respect to a_λ. After removing a factor
of two, we find that the a_ν obey a set of linear equations:
which provides a least squares fit to the data by a linear function of the set of
individual random variables.
The logic behind this procedure is that the measurement of Y_i is made subject
to an experimental error that we treat as a Gaussian random variable ε_i. Errors
in the points X_i^μ of measurement are assumed negligible, and for simplicity, we
assume ⟨ε_i ε_j⟩ = δ_ij σ² with σ² independent of i. If σ depends on i, then the ith
term in Eq. (1.133) should contain a factor 1/σ_i². A more interesting question is how
to test the adequacy of the assumption that a linear fit is the correct one.
If there were only two independent variables X^1 and X^2, we could think of
these as horizontal coordinates and Y as a vertical coordinate. We have minimized
the sum of the squares of the vertical deviations because it was assumed that there
are no horizontal errors.
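The normal equations above can be solved directly. A sketch with synthetic data for two variables (the design matrix, noise level, and "true" coefficients are our illustrative choices):

```python
import numpy as np

# Least squares fit of Y to a linear combination a_1*X^1 + a_2*X^2 by solving
# the normal equations (X^T X) a = X^T Y; all data here are synthetic.
rng = np.random.default_rng(4)
N = 200
X = rng.uniform(size=(N, 2))                     # columns hold X^1_i and X^2_i
a_true = np.array([2.0, -1.0])                   # "true" coefficients
Y = X @ a_true + 0.01 * rng.standard_normal(N)   # small Gaussian errors eps_i

a_fit = np.linalg.solve(X.T @ X, X.T @ Y)        # solve the normal equations
assert np.allclose(a_fit, a_true, atol=0.02)
```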
Nonlinear fit
If one expects a nonlinear relation of the form
where the a_λ are the parameters of the fit, then it is customary to attempt to
minimize the sum of the squares of the deviations
of a_j < x < b_j, and an expected number ⟨n_j⟩ = np_j of events in that interval.
We then observe an actual set of n_j's. Are these observations compatible with the
assumed p(x) distribution? Karl Pearson (1900) established that the n_j's are
described by a multivariate Gaussian of the form
where
that describes the probability of observing data leading to a chi-square larger than
the observed value. A fairly general but somewhat arbitrary convention is to use
5% as the dividing line between small and large. Thus a deviation large enough
to have a less than 5% chance of being observed (if the hypothesis is correct) is
used to cast suspicion on the validity of the hypothesis. Actually, this procedure
gives no probability of correctness of the hypothesis, only the probability
of observing the given event if the hypothesis is assumed correct. As Bayesians,
we claim that one needs an a priori probability of correctness to deduce the a
posteriori probability of correctness with the help of Bayes' theorem. With only
qualitative a priori information, one could reduce the 5% level say to 1% if one
has peripheral information leading one to have some faith in the hypothesis.
Can chi-square be too small? There was an article in Science magazine within
the last 10 years that argued that the experimental results supporting Coulomb's
inverse square law had such a small chi-square that the fit was too good and that
the experimental data was doctored.
If we let
we obtain
The result is
This result is used in Appendix 1 .A to determine the volume and surface area of a
sphere in n dimensions.
Let us consider a simple dice throwing example from Alexander (1991).
Example
A single die is tossed n = 100 times. The frequency with which each side of
the die is observed is indicated in Table 1.1 below. Calculate chi-square from this
table.
Solution In this case, all pi = 1/6, so npi = 100/6 = m. There are six possible
frequencies, but there is one constraint: the sum must add to 100. Thus there are
five degrees of freedom.
TABLE 1.1.
Value      1   2   3   4   5   6
Frequency  18  18  17  13  18  16
Is this large? No. Indeed, the 5% acceptance level permits fluctuations as large
as 11.070 in chi-square when the number of degrees of freedom is only 5.
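The chi-square value for Table 1.1 can be checked directly (a small sketch; the 5% critical value 11.070 for 5 degrees of freedom is the one quoted above):

```python
# Chi-square for the dice data of Table 1.1:
#   chi2 = sum_j (n_j - n p_j)^2 / (n p_j),  with all p_j = 1/6.
freq = [18, 18, 17, 13, 18, 16]      # observed frequencies from Table 1.1
n = sum(freq)                        # 100 tosses
m = n / 6                            # expected count n p_j = 100/6
chi2 = sum((f - m) ** 2 / m for f in freq)
print(chi2)                          # 1.16, far below the 5% level of 11.070
                                     # for 6 - 1 = 5 degrees of freedom
```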
where x denotes a vector with n components, x₁, x₂, ..., x_n, x′ denotes the trans-
posed vector and A denotes an n × n matrix which can be chosen to be symmetric.
When the matrix Aij possesses off-diagonal terms, the components of x are
correlated.
The normalization factor N is determined by
we can set
and
is diagonal. Thus
But
Thus
where
The multivariate distribution, Eq. (1.158), that we started with has all means equal
to zero, ⟨X⟩ = 0.
The slightly generalized distribution
As in the univariate case, the exponent is a quadratic form and all cumulants
of order higher than 2 vanish. Moreover, the second moment of the distribution
is given directly by the coefficients of the quadratic terms in the exponent. In
particular, with
Writing V = A⁻¹, which is the variance matrix,
so that
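The statement that the variance matrix is V = A⁻¹ can be verified by brute-force numerical integration of the multivariate Gaussian (a sketch with an arbitrarily chosen 2 × 2 matrix A):

```python
import numpy as np

# Check numerically that for P(x) proportional to exp(-x' A x / 2) the
# second moments are <x_i x_j> = (A^{-1})_{ij}, i.e. V = A^{-1}.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric, positive definite (arbitrary)

# brute-force grid integration (adequate since the Gaussian decays fast)
s = np.linspace(-6, 6, 601)
x1, x2 = np.meshgrid(s, s, indexing="ij")
w = np.exp(-0.5 * (A[0, 0]*x1**2 + 2*A[0, 1]*x1*x2 + A[1, 1]*x2**2))
norm = w.sum()
V_num = np.array([[(x1*x1*w).sum(), (x1*x2*w).sum()],
                  [(x2*x1*w).sum(), (x2*x2*w).sum()]]) / norm
print(V_num)              # close to inv(A)
print(np.linalg.inv(A))
```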
Introduction
In this section we propose two laws of gambling that appear to contradict one
another. We shall state them loosely first to demonstrate the apparent contradiction:
The first law of gambling states that no betting scheme, i.e., method of varying
the size of bets, can change one's expected winnings.
The second law of gambling states that if you are betting against a slightly
unfair "house" there is a way to arrange one's bets to maximize the probability of
winning.
Proof
Let d be the odds given, so that if b_r is the amount bet on the rth trial, the loss is b_r
on failure and the amount won on success is b_r d. If S is the sum won, its expected
value is
where
is a measure of the game's unfairness. In a fair game, the odds would be d = q/p
and the expected winnings remain zero regardless of the choice of the br. In any
case, the expected winnings depend on the total bet B, not on how the bets were
distributed.
Another fallacy with the scheme of doubling one's bets is that it presumes the
bettor has infinite capital. The problem of winning is reformulated in the second
law of gambling.
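The first law can be checked by enumerating a concrete betting scheme; the sketch below uses the doubling scheme, with illustrative values of p and d (not from the text):

```python
# First law of gambling: for any betting scheme, the expected winnings are
#   E[S] = -eps * E[B],  with eps = q - d p and B the total amount bet.
# Check by enumerating the classic doubling scheme (bet 1, 2, 4, 8; stop
# at the first win or after four rounds).
p, d = 0.48, 1.0                      # illustrative win probability and odds
q = 1.0 - p
eps = q - d * p                       # degree of unfairness

ES = EB = 0.0
for k in range(4):                    # lose k times, then win on round k + 1
    prob = q**k * p
    bets = 2**(k + 1) - 1             # total bet 1 + 2 + ... + 2^k
    winnings = d * 2**k - (bets - 2**k)   # win d*b_k, lose the earlier bets
    ES += prob * winnings
    EB += prob * bets
ES += q**4 * (-15.0)                  # all four bets lost: 1 + 2 + 4 + 8 = 15
EB += q**4 * 15.0

print(ES, -eps * EB)                  # equal: no scheme changes the expectation
```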
where ε = q − dp is the degree of unfairness of the bet, and B is the total amount
bet.
Proof
If P is the (unknown) probability of winning, and Q = 1 − P, the expected
winnings are
Conjecture
Consistent with the restriction that one should never bet more than necessary to
win the game in a single step, the best strategy is to make a sufficiently large bet
that one can win in a single try. We assume the game is unfair, and this procedure
is designed to minimize the total bet.
Suppose bets are available with odds up to d = W/C. Then one should make
a single bet of one's entire capital C, at these odds. The probability of winning in
this single step is p, which by Eq. (1.186) can be expressed in terms of the available
odds as
Any betting scheme has the probability, P, of Eq. (1.187), of winning. If the odds
are lower, several bets will have to be made, and we will have B > C, thus
Suppose odds up to d₁ = W/(C/2) are available. Then one can bet C/2 on the
first bet and win with probability
The best one can do, then, is to stop if one wins on the first step, and to bet C/2
again if one loses. The expected amount bet is then
so that the probability P of winning is better than that in the first scheme, Eq.
(1.190). It is clear that if the degree, ε, of unfairness is the same at all odds, it is
favorable to choose the highest odds available and bet no more than necessary to
achieve C + W.
1.13 Appendix A: The Dirac delta function
Point sources enter into electromagnetic theory, acoustics, circuit theory, prob-
ability and quantum mechanics. In this appendix, we shall attempt to develop
a convenient representation for a point source, and establish its properties. The
results will agree with those simply postulated by Dirac (1935) in his book on
quantum mechanics and called delta functions, or "Dirac delta functions" in the
literature.
What are the required properties of the density associated with a point source?
The essential property of a point source density is that it vanishes everywhere
except at the point. There it must go to infinity in such a manner that its integral
over all space (for a unit source) must be unity. We shall start, in one dimension,
by considering a source function δ(ε, x) of finite size, ε, which is small for x ≫ ε,
and of order 1/ε for x ≲ ε, such that the area for any ε is unity:
A problem involving a point source can always be treated using one of the finite
sources, δ(ε, x), by letting ε → 0 at the end of the calculation. Many of the steps
in the calculation (usually integrations) can be performed more readily if we can let
ε → 0 at the beginning of the calculation. This will be possible provided that the
results are independent of the shape g(y) of the source. Only if this is the case,
however, can we regard the point source as a valid approximation in the physical
problem at hand.
The limiting process ε → 0 can be accomplished at the beginning of the
calculation by introducing the Dirac delta function
This is not a proper mathematical function because the shape g(y) is not specified.
We shall assume, however, that it contains those properties which are common to
all shape factors. These properties can be used in all problems in which the point
source is a valid approximation. It is understood that the delta function will be
used in an integrand, where its properties become well defined.
The most important property of the δ function is
for a < b < c, and zero if b is outside these limits. Setting b = 0, for simplicity of
notation, we can prove this theorem in the following manner:
In the first term, the limit as ε → 0 can be taken. The integral over g(y) is then
1 if a < 0 < c since the limits then extend from — oo to oo, and 0 otherwise
since both limits approach plus (or minus) infinity. The result then agrees with
the desired result, Eq. (1.202). The second integral can be easily shown to vanish
under mild restrictions on the functions / and g. For example, if / is bounded and
g is positive, the limits can be truncated to fixed finite values, say a' and c' (to any
desired accuracy), since the integral converges. Then the limit can be performed
on the integrand, which then vanishes.
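The sifting property can be illustrated numerically with a Gaussian shape factor (a sketch; the particular f and the ε values are arbitrary):

```python
import math

# Numerical check of the sifting property: with a normalized Gaussian shape
#   g_eps(x) = exp(-x^2 / eps^2) / (eps sqrt(pi)),
# the integral of f(x) g_eps(x) approaches f(0) as eps -> 0.
def sift(f, eps, h=1e-3, L=10.0):
    n = int(2 * L / h)
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h     # midpoint rule on [-L, L]
        total += f(x) * math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi)) * h
    return total

for eps in (1.0, 0.5, 0.1):
    print(eps, sift(math.cos, eps))   # tends to cos(0) = 1 as eps shrinks
```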
The odd part of the function makes no contribution to the above integral, for
any f(x). It is therefore customary to choose g(y), and hence δ(x), to be even
functions
when the range of integration includes the singular point. The indefinite integral
over the delta function is simply the Heaviside unit function H(x)
With g(y) taken as an even function, its integral from negative infinity to zero is
one half, so that the appropriate value of the Heaviside unit function at the origin
is
since
and Eq. (1.211) is clearly valid underneath an integral sign in accord with Eq.
(1.202).
As a special case Eq. (1.211) yields
or
Thus a symmetric region [−ε, ε] is excised by the principal valued reciprocal before
the integral is performed. The function x/(x² + ε²) behaves as 1/x for |x| ≫ ε
and deemphasizes the region near x = 0. Thus, in the limit, it reduces to the
principal valued reciprocal. The combination
This theorem follows from the fact that a delta function vanishes everywhere
except at its zeros, and near each zero, we can approximate
The denominator is just the Jacobian for the transformation from Cartesian to
spherical coordinates. This is the natural generalization of the Jacobian found in
Eq. (1.221), and guarantees that the same result, Eq. (1.226), is obtained regardless
of which coordinate system is used.
where the Heaviside unit function confines the integration region to the interior of
the sphere. If we now differentiate this equation with respect to R2 to convert the
Heaviside function to a delta function, we get
Now, if we let x_i = Ru_i for all i and use the scaling property, Equation (1.220), of
delta functions we get
where
Solution
Out of the 36 possible tosses of a pair, only the four combinations, 1 + 4, 2 + 3,
3 + 2, and 4 + 1 add to 5. Similarly, six combinations add to 7: 1 + 6, 2 + 5,
3 + 4, 4 + 3, 5 + 2, and 6 + 1. Thus in a single toss the three relevant probabilities
are P₅ = 4/36 and P₇ = 6/36 for 5 and 7, and P₀ = 26/36 for all other possibilities
combined. The probability of r tosses of "other", and s ≥ 1 tosses of
5, followed by a toss of a 7 is given by
where the sum on s starts at 1 to ensure the presence of a 5 toss, r has been replaced
by n — s and the combinatorial coefficient has been inserted to allow the "other"
tosses and the 5 tosses to appear in any order. The 7 toss always appears at the end.
The sum over s can be accomplished by adding and subtracting the s = 0 term:
The corresponding result for 7 to appear first is obtained using the formulas with
5 and 7 interchanged:
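Summing the geometric series over the "other" tosses gives P₅/(1 − P₀) = P₅/(P₅ + P₇) for the probability that a 5 appears before a 7; a quick numerical check:

```python
# Probability that a 5 appears before a 7 in repeated tosses of two dice:
#   P = sum_r P0^r * P5 = P5 / (1 - P0) = P5 / (P5 + P7).
P5, P7 = 4/36, 6/36
P0 = 26/36                      # neither a 5 nor a 7

series = sum(P0**r * P5 for r in range(500))   # r "other" tosses, then a 5
closed = P5 / (P5 + P7)
print(series, closed)           # both equal 2/5
```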
Solution
Because of the distribution of the gold coins, the probability that the coin came
from the first drawer is p₁ = 3/6, because three of the six available gold coins
were in that drawer. Similarly p₂ = 2/6, and p₃ = 1/6. The probability that there
is a second coin in the same drawer is 1 × p₁ + 1 × p₂ + 0 × p₃ = 5/6. Similarly,
the probability that the second selected coin (from the same drawer) is gold is
1 × p₁ + (1/2) × p₂ = 2/3, since the second coin is surely gold in the first drawer,
has a 1/2 chance of being gold in the second drawer, and there are no gold coins
left in the third drawer. Note that these values of 1, 1/2, and 0 are conditional
probabilities given the outcome of the first choice.
Solution 2
The chord will be greater than the triangle side if the angle subtended by the
chord is greater than 120 degrees (out of a possible 180) which it achieves with
probability 2/3.
Solution 3
Draw a tangent line to the circle at an intersection with the chord. Let φ be the
angle between the tangent and the chord. The chord will be larger than the triangle
side if φ is between 60 and 120 degrees, which it will be with probability (120 −
60)/180 = 1/3.
Solution 2 is given in Kyburg (1969). Solution 3 is given by Uspensky (1937)
and the first solution is given by both.
Which solution is the correct one? Answer: the problem is not well defined.
We do not know which measures have uniform probability unless an experiment is
specified. If a board is ruled with a set of parallel lines separated by the diameter,
and a circular disk is dropped at random, the first solution is correct. If one spins a
pointer at the circle edge, the third solution would be correct.
Gambler's ruin
Hamming (1991) considers a special case of the gambler's ruin problem, in which
gambler A starts with capital C, and gambler B starts with W (or more) units. The
game will be played until A loses his capital C or wins an amount W (even if B is
a bank with infinite capital). Each bet is for one unit, and there is a probability p
that A will win and q = 1 − p that B will win.
Solution
The problem is solved using the recursion relation
where P(n) is the probability that A will win if he holds n units. The boundary
conditions are
Strictly speaking, the recursion relation only needs to be satisfied for 0 < n < T,
which omits the two end points. However, the boundary conditions, Eq. (1.240),
then lead to a unique solution. The solution to a difference equation with constant
coefficients is analogous to that of a differential equation with constant coeffi-
cients. In the latter case, the solution is an exponential. In the present case, we
search for a power law solution, P(n) = rⁿ, which is an exponential in n. The
result is a quadratic equation with roots 1 and p/q. The solutions, 1ⁿ and (p/q)ⁿ,
actually obey the recursion relation, Eq. (1.239), for all n. But they do not obey
the boundary conditions. Thus we must, as in the continuous case, seek a linear
combination
and impose the boundary conditions of Eq. (1.240) to obtain simultaneous linear
conditions on A and B. The final solution is found to be
Since our solution obeys the boundary conditions, as well as the difference equa-
tion everywhere (hence certainly in the interior) it is clearly the correct, unique
solution.
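The final solution can be checked against the recursion and both boundary conditions (a sketch with illustrative numbers, written here with the ratio q/p; whether p/q or q/p appears depends only on the direction in which the recursion is written):

```python
# Gambler's ruin check: with win probability p per unit bet, q = 1 - p, and
# target T = C + W, the probability of winning starting from n units is
#   P(n) = (1 - (q/p)^n) / (1 - (q/p)^T).
p, q = 0.48, 0.52               # illustrative, slightly unfair game
C, W = 10, 5
T = C + W
r = q / p

def P(n):
    return (1 - r**n) / (1 - r**T)

assert P(0) == 0.0 and abs(P(T) - 1.0) < 1e-15   # boundary conditions
for n in range(1, T):                             # recursion in the interior
    assert abs(P(n) - (p * P(n + 1) + q * P(n - 1))) < 1e-12
print(P(C))       # probability of winning W before losing the capital C
```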
2
is known for all possible sets [t₁, t₂, ..., t_n] of times. Thus we assume that a set of
functions
A stationary process is one which has no absolute time origin. All probabilities are
independent of a shift in the origin of time. Thus
In particular, this probability is a function only of the relative times, as can be seen
by setting τ = −t₁. Specifically, for a stationary process, we expect that
reduces to the stationary state, independent of the starting point when this limit
exists. For the otherwise stationary Brownian motion and Poisson processes in
Chapter 3, the limit does not exist. For example, a Brownian particle will have a
distribution that continues to expand with time, even though the individual steps
are independent of the origin of time.
A Gaussian process is one for which the multivariate distributions
pn(xn,xn-i, ...,xi) are Gaussians for all n. A Gaussian process may, or may
not be stationary (and conversely).
A Markovian process is like a student who can remember only the last thing he
has been told. Thus it is defined by
that is, the probability distribution of x_n is sensitive only to the last known event
x_{n−1} and forgets all prior events. For a Markovian process, the conditional
probability formula, Eq. (2.5), specializes to
or
where w_{a′a} is the transition probability per unit time and the second term has been
added to conserve probability. It describes the particles that have not left the state
a provided that
If we set t = t₀ + Δt₀, we can evaluate the right hand side of the Chapman-
Kolmogorov condition to first order in Δt and Δt₀:
which is just the value p(a′, t₀ + Δt + Δt₀ | a₀, t₀) expected from Eq. (2.18).
Note, however, that this proof did not make use of the conservation condition,
Eq. (2.19). This will permit us, in Chapter 8, to apply the Chapman-Kolmogorov
condition to processes that are Markovian but whose probability is not normalized.
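The Chapman-Kolmogorov condition can be illustrated with the simplest example, a symmetric two-state process (a sketch; the rate w and the times are arbitrary):

```python
import math

# Chapman-Kolmogorov check for a symmetric two-state Markov process with
# transition rate w per unit time.  The conditional probabilities are
#   P_same(t) = (1 + exp(-2 w t)) / 2,   P_flip(t) = (1 - exp(-2 w t)) / 2.
w = 0.7

def prop(t):
    s = 0.5 * (1 + math.exp(-2 * w * t))
    f = 0.5 * (1 - math.exp(-2 * w * t))
    return [[s, f], [f, s]]        # matrix P[a'][a0]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t1, t2 = 0.4, 1.1
lhs = matmul(prop(t2), prop(t1))   # sum over the intermediate state
rhs = prop(t1 + t2)
print(lhs, rhs)                    # equal: the C-K condition holds
```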
3
Examples of Markovian processes
Consider two physical problems describable by the same random process. The
first process is the radioactive decay of a collection of nuclei. The second is the
production of photoelectrons by a steady beam of light on a photodetector. In both
cases, we can let a discrete, positive, integer valued, variable n(t) represent the
number of counts emitted in the time interval between 0 and t. In both cases there
is a constant probability per unit time ν such that ν dt is the expected number of
photocounts in [t, t + dt] for small dt. We use the initial condition
Then n − n₀ will be the number of counts in the interval [0, t]. When we talk
of P(n, t) we can understand this to mean P(n, t | n₀, 0), the conditional density
distribution. Since the state n(t) = n is supplied by transitions from the state n − 1
with production of photoelectrons at a rate ν dt and is diminished by transitions
from state n to n + 1, we have the equation
with the middle term supplying the increase in P(n) by a transition from the n − 1
state, and the last term describing the exit from state n by emission from that state.
These are usually referred to as rate in and rate out terms respectively. Canceling
a factor dt we obtain the rate equation
In the first term, n increases from n — 1 to n, in the second from n to n +1. Thus n
never decreases. Such a process is called a birth process in the statistics literature,
or a generation process in the physics literature. A more general process is called
a birth and death process or a generation-recombination process.
Since n ≥ n₀ we have no supply from the state P(n₀ − 1, t), so that
whose solution is
since P(n, 0) = δ_{n,n₀} at time t = 0, corresponding to the certainty that no
counts have occurred by time t = 0.
The form, Eq. (3.5), of this solution suggests the transformation
Thus any Q(n, t) may be readily obtained if Q(n − 1) is known. But n, as described
by Eq. (3.3), can only increase. Thus
or, setting n = n₀ + m,
for n ≥ n₀, with a vanishing result for n < n₀. This result specializes to the usual
Poisson answer
for the usual case n₀ = 0 (see also Eq. (1.128)). The two formulas, Eqs. (3.12)
and (3.13), are, in fact, identical, since n − n₀ has the meaning of the number of
events occurring in the interval (0, t). The more general form is useful in verifying
the Chapman-Kolmogorov conditions
where the last step recognizes the binomial expansion that occurred in the previous
step. The final result is equal to that in Eq. (3.13) if t is replaced by (t − t₀) in the
latter.
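The verification of the Chapman-Kolmogorov condition for the Poisson process, via the binomial expansion noted above, can be checked numerically (with illustrative parameters):

```python
import math

# Chapman-Kolmogorov for the Poisson process: summing over the intermediate
# count m reproduces the Poisson law for the total elapsed time,
#   sum_m P(n, t | m, t0) P(m, t0 | n0, 0) = P(n, t | n0, 0).
def poisson(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k) if k >= 0 else 0.0

nu, t0, t, n0, n = 2.0, 0.7, 1.9, 0, 5     # illustrative values
lhs = sum(poisson(n - m, nu * (t - t0)) * poisson(m - n0, nu * t0)
          for m in range(n0, n + 1))
rhs = poisson(n - n0, nu * t)
print(lhs, rhs)     # equal, by the binomial expansion of (nu t)^(n - n0)
```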
The Poisson process is stationary, so that P(n, t | n₀, t₀) is a function only of
t − t₀. However, no limit exists as t − t₀ → ∞, so that there is no time independent
P(n). We shall therefore evaluate the characteristic function of the conditional
probability density
This result reduces to Eq. (1.130) if one sets n₀ = 0. The cumulants can be
calculated as follows:
Here the subscript L is used to denote the linked moment or cumulant as in Section
1.6.
points at the positions ja where j = 0, ±1, ±2, etc. and a is the spacing between
the points. At each interval of time, T, a hop is made with probability p to the right
and q = 1 — p to the left.
The distribution of r, of hops to the right, in N steps is given as before by the
Bernoulli distribution:
The first moment, and the second moment about the mean are given as before in
Section 1.9 by
A particle that started at 0 and has taken r steps to the right, and N − r to the left,
arrives at position
Notice, if p = q = 1/2, or equal probability to jump to the right or the left, the
average position after N steps will remain 0. The second moment about the mean
is given by
From the central limit theorem, discussed in Section 1.9, the limiting dis-
tribution after many steps is Gaussian with the first and second moments just
obtained:
The factor 2 in the definition of the diffusion coefficient D is appropriate for one
dimension, and would be replaced by 2d if we were in a space of dimension d. Thus
the distribution moves with a "drift" velocity
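The first two moments of the walk can be confirmed by direct summation over the Bernoulli distribution (a sketch with arbitrary N, p, and step size a):

```python
from math import comb

# Moments of the random walk position x = (2r - N) a, with r Bernoulli
# distributed: <x> = (p - q) N a  and  <(x - <x>)^2> = 4 N p q a^2.
N, p, a = 20, 0.3, 1.0          # illustrative values
q = 1.0 - p

pmf = [comb(N, r) * p**r * q**(N - r) for r in range(N + 1)]
mean = sum(pmf[r] * (2 * r - N) * a for r in range(N + 1))
var = sum(pmf[r] * ((2 * r - N) * a - mean) ** 2 for r in range(N + 1))
print(mean, var)    # (p - q) N a = -8.0  and  4 N p q a^2 = 16.8
```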
The problem we discussed, in connection with the second law of gambling, that
of winning a specific sum W starting with a finite capital C, is referred to as the
Gambler's ruin problem.
To make connection to physical problems, we map the probability to a ran-
dom walk problem on a line. It is distinguished from conventional random walk
problems because it involves absorbing boundaries. Since the game ends at these
boundaries it is also a first passage time problem - a member of a difficult class.
The gambling problem with bet b and odds d can be described as a random
walk problem on a line with steps to the left of size b if a loss is incurred, and a
step to the right of size bd if a win occurs. Instead of dealing with the probability of
winning at each step, we shall define P(x) as the probability of eventually winning
if one starts with capital x.
Our random walk starts at the initial position C. The game is regarded as lost if
one arrives at 0, i.e., no capital left to play, and it is regarded as won if one arrives
at the objective C + W.
Our random walk can be described by the recursion relation:
since the right hand side describes the situation after one step. With probability p
one is at position x + bd with winning probability P(x + bd) and with probability
q, one is at position x — b with winning probability P(x — b). Since the probability
of eventual winning depends on x, but not how we got there, this must also be the
probability P(x).
The procedure we have just described of going directly after the final answer,
rather than following the individual steps, is given the fancy name "invariant
embedding" by mathematicians, e.g., Bellman (1964).
The boundary conditions we have are
We establish in Appendix 3.A that there are exactly two roots, one with λ =
0, and one with λ > 0. Calling the second root λ, the general solution of Eq.
(3.28) is a linear combination of 1 and exp(λx), subject to the boundary conditions,
Eq. (3.29), with the result
Although Eq. (3.30) does not supply an explicit expression for λ (except in the
case C ≪ 1), we know that λ > 0. The denominator in Eq. (3.32) then increases
more rapidly with λ than the numerator. Thus
Since the condition (3.30) involves only the product λb, an increase in b causes
a decrease in λ, hence an increase in P. Thus the probability of winning is an
increasing function of b. Of course, at the starting position, a bet greater than C is
impossible. Thus the optimum probability is obtained if λ is calculated from Eq.
(3.30) with b replaced by C:
Our arguments have tacitly assumed that no bet requires a step outside the
domain 0 ≤ x ≤ W + C. Thus if a game with large odds d = 2W/C were
allowed, the preceding argument would not apply, and it would be appropriate to
bet C/2, since the objective is to win no more than W, and to terminate the game
as soon as possible, in order to minimize the total amount bet.
In the large N limit, the distribution function (3.23) for the one-dimensional
random walk can be written as
Equation (3.35) is the Green's function of the diffusion equation that is written
down explicitly in Eq. (3.50) below. That is, Eq. (3.35) is the solution of Eq. (3.50)
that obeys the initial condition:
Let us compare this result with the macroscopic theory of diffusion in which a
concentration c of particles obeys the conservation law
where the particle (not electrical) current density is given by Fick's law
and D is the macroscopic diffusion constant. Thus c obeys the diffusion equation
in agreement with our random walk result, Eq. (3.35), for v = 0 but the initial
position at x₀.
where the mechanical mobility B is the mean velocity of a particle per unit applied
force. Thus the drift current per unit of concentration c is proportional to the
applied field F.
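That the Gaussian of Eq. (3.35) solves the drift-diffusion equation can be checked pointwise by finite differences (a sketch; the values of D, v, and the test point are arbitrary):

```python
import math

# Numerical check that the Green's function
#   c(x, t) = exp(-(x - v t)^2 / (4 D t)) / sqrt(4 pi D t)
# satisfies the drift-diffusion equation  dc/dt = D d2c/dx2 - v dc/dx.
D, v = 0.5, 1.2                # illustrative values

def c(x, t):
    return math.exp(-(x - v * t) ** 2 / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

x, t, h = 0.8, 1.0, 1e-4       # arbitrary test point; h sets the stencil size
dc_dt = (c(x, t + h) - c(x, t - h)) / (2 * h)
dc_dx = (c(x + h, t) - c(x - h, t)) / (2 * h)
d2c_dx2 = (c(x + h, t) - 2 * c(x, t) + c(x - h, t)) / h**2
print(dc_dt, D * d2c_dx2 - v * dc_dx)   # agree to discretization error
```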
However, if a force F is applied in an open circuit, a concentration gradient
will build up large enough to cancel the drift current
or
The simplest example of this is the concentration distribution set up in the atmo-
sphere subject to the gravitational force plus diffusion. This steady state result
must agree with the thermal equilibrium Boltzmann distribution
Comparison of the two expressions for c(x) yields the Einstein relation between
diffusion, D, and mobility, B:
For charged particles, F = eE, and the electrical mobility is μ = v/E = eB, so
that
where v = BF (or μE in the charged case). Equation (3.50) is a special case
of a Fokker-Planck equation to be discussed in Section 8.3. We note, here, that
the drift is contained in the first derivative coefficient and the diffusion in the
second derivative coefficient.
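As a numerical illustration of the Einstein relation in the charged case, D = μkT/e (the germanium mobility below is a rough, order-of-magnitude textbook value, not taken from this text):

```python
# Einstein relation for charged carriers: D = mu * kT / e.  At room
# temperature kT/e is about 0.0259 volts; the electron mobility in
# germanium used here (~3900 cm^2/V s) is an approximate textbook figure.
kT_over_e = 0.0259          # volts, at T = 300 K
mu = 3900.0                 # cm^2 / (V s), approximate electron mobility in Ge
D = mu * kT_over_e
print(D)                    # on the order of 100 cm^2/s
```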
The solution of this equation for a pulse starting at x = 0 at t = 0 is
which is the precise analog of the discrete random walk solution, Eq. (3.35).
By injecting a pulse of minority carriers into a semiconductor and examining the
response on an oscilloscope at a probe a distance down the sample, a direct mea-
surement can be made of the "time of flight" of the pulse and the spread in its
width. This technique, introduced by Haynes, was applied by his class, the Transistor
Teacher's Summer School (1952), to verify the Einstein relation for electrons and
holes in germanium. Note that Eq. (3.51) describes a Gaussian pulse whose center
travels with a velocity v, so that the position of the pulse center grows linearly
with time. Also, the pulse has a Gaussian shape, and the root mean square
width is given by (2Dt)^{1/2}. Measurements were made by each of the 64
students in the class. The reference above contained the average results that ver-
ified the Einstein relation between the diffusion constant and the mobility. With
holes or electrons injected into a semiconductor, a pulse will appear on a computer
screen connected by probes to the semiconductor. For several probes at different
distances, the time of arrival can be noted and the width of the pulse is measured
at each probe. Thus a direct measurement is made of both the mobility μ and the
diffusion constant D.
The biologist Robert Brown (1828) observing tiny pollen grains in water under
a microscope, concluded that their movement "arose neither from currents in the
fluid, nor from its gradual evaporation, but belonged to the particle itself".
MacDonald (1962) points out that there were numerous explanations of Brownian
motion, proposed and disposed of in the more than 70 years until Einstein (1905,
1906, 1956) established the correct explanation that the motion of the particles was
due to impact with fluid molecules subject to their expected Boltzmann distribution
of velocities.
It is of interest to comment on the work of von Nägeli (1879), who proposed
molecular bombardment but then ruled out this explanation because it yielded
velocities two orders of magnitude less than the observed velocities of order
10⁻⁴ cm/sec. von Nägeli assumed that the liquid molecules would have a velocity
given by
where the mass of the Brownian particle M is proportional to the cube of its radius
so that
Our introduction to the Langevin treatment of Brownian motion comes from the
paper of Chandrasekhar (1943) and the earlier paper of Uhlenbeck and Ornstein
(1930), both of which are in the excellent collection made by Wax (1954).
However, a great simplification can be made in the algebra if one assumes from
the start that the process is Gaussian in both velocity and position. The justification
is given in Appendix 3.B.
The distribution of velocity is first considered. The free particle of mass M
subject to collisions by fluid molecules is described by the equation (for simplicity,
we discuss the one-dimensional case, instead of the actual three-dimensional case)
It was Langevin's (1908) contribution to recognize that the total force F exerted
by the fluid molecules contains a smooth part —v/B associated with the viscosity
58 EXAMPLES OF M A R K O V I A N PROCESSES
of the fluid that causes the macroscopic motion of a Brownian particle to decay
plus a fluctuating force F(t) whose average vanishes
This fluctuating part will be shown to give rise to the diffusion of the Brownian
particle. The relation between the fluctuating part and the diffusion part is the
Einstein relation to be derived below. It is also a special case of the fluctuation-
dissipation theorem to be derived in Chapter 7.
Note that if a steady external force G is applied, the average response at long
times is v = BG so that B is to be interpreted as the mechanical mobility. If the
particle is a sphere of radius a moving in a medium of viscosity η, then Stokes' law
yields (in the three-dimensional case)
must fall off in times of order 10 sec, much shorter than the 10 sec decay
time. It is therefore permissible to approximate the correlation as a Dirac delta
function.
for the particular ^(s) of Eq. (3.62). The limiting value at long times is
In the limit as t → ∞, ⟨v²(t)⟩_{v₀} must approach the thermal equilibrium value
that relates a measure d of diffusion in velocity space to the mobility B (or the
dissipation). Equation (3.64) can be rewritten
Thus the mean square deviation of the velocity from its mean, starting with the
initial velocity v₀, namely σ_vv, is independent of the starting velocity v₀! This is a
special case, with u = t, of Eq. (8.18) of Classical Noise I in Lax (1960).
For the delta correlated case, Eq. (3.62) shows that the velocity is a sum of
uncorrelated (hence independent) Gaussian variables, since ⟨A(s)A(s′)⟩ = 0 for
s ≠ s′. Since each term is Gaussian, the sum will also be a Gaussian random
variable (see Appendix 3.B). Thus the statistics of v(t) are completely determined
by its mean and second cumulant since all higher cumulants vanish. Thus the
conditional probability density for v(t) is given by
where ⟨v⟩_{v₀} = v₀ exp(−λt) and the unconditional average ⟨v²⟩ = kT/M is just
the thermal average, independent of time. In the limit as t → ∞ we approach the
steady state solution
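The approach to the steady state can be illustrated with the velocity variance σ_vv(t) = (kT/M)(1 − e^{−2λt}), consistent with the discussion above; a sketch checking that it obeys dσ/dt = −2λσ + 2d with d = λkT/M, and tends to the thermal value kT/M:

```python
import math

# Velocity variance of the Langevin (Ornstein-Uhlenbeck) process:
#   sigma_vv(t) = (kT/M) (1 - exp(-2 lam t))
# It vanishes at t = 0 (the velocity is known exactly), satisfies
#   d sigma/dt = -2 lam sigma + 2 d,   d = lam kT/M,
# and approaches the thermal value kT/M as t -> infinity.
lam, kT_over_M = 1.5, 0.8            # illustrative values
d = lam * kT_over_M                  # diffusion constant in velocity space

def sigma(t):
    return kT_over_M * (1 - math.exp(-2 * lam * t))

t, h = 0.6, 1e-6
lhs = (sigma(t + h) - sigma(t - h)) / (2 * h)     # numerical d sigma / dt
rhs = -2 * lam * sigma(t) + 2 * d
print(lhs, rhs, sigma(1e9))          # lhs matches rhs; the limit is kT/M
```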
Since the position of a particle is determined by the time integral of the velocity,
we would expect that the statistics of Brownian motion of a particle, that is the
random motion in position space, can be determined fairly directly by a knowl-
edge of its motion in velocity space. More generally, one would like to determine
the joint distribution in position and velocity space. We shall see in this section
that the manipulations to be performed involve only minor difficulties provided
that the distribution in positions, velocities and the joint distribution are all Gaus-
sians. That is because the distributions can be written down fairly readily from the
first and second moments if the distributions are Gaussian in all variables. But its
proof depends on the Gaussian nature of the sum of Gaussian variables. And we
have only established that Gaussian nature if the variables are independent. Since
positions and velocities are correlated, it is not clear whether the scaffold we have
built using independent variables will collapse.
To separate the computational difficulties from the fundamental one, we shall
perform the calculations in this section assuming that all the variables involved are
Gaussian, and reserve for Appendix 3.B a proof that this is the case.
The average position may be obtained by integrating Eq. (3.62) with respect to
time and averaging:
Here, all averages are understood to be taken contingent on given initial velocities
and positions.
Next, we calculate ⟨(x(t) − ⟨x(t)⟩)²⟩. The general value of the random variable,
x(t), can be obtained by integrating Eq. (3.62) over time, by setting t = w in Eq.
(3.62) and integrating over w from 0 to t. The expression is simplest if we subtract
off the average position given by Eq. (3.70). The result takes the form
where
with
where
The fluctuations in position are then described after applying Eq. (3.61) by
where v0 has again canceled out. It is of interest to examine σ_xx(t) for small and
large t. For small t, we may expand the exponential so that
when λt ≪ 1. Conversely, for large t, we may omit the exponential terms in σ²(t) to obtain
where the diffusion constant D for x is given by comparison with Eq. (3.79) to be
in terms of the diffusion constant d for velocity. After use of Eqs. (3.65), (3.66) we
find
where the Δ's describe deviations from the corresponding mean values conditional
on a given v0 and x0.
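The two limits just discussed can be checked against the closed-form conditional position variance of the Uhlenbeck–Ornstein process, σ²(t) = (2d/λ²)[t − (2/λ)(1 − e^{−λt}) + (1/2λ)(1 − e^{−2λt})], a standard result; the numerical sketch below (our own, with arbitrary parameters) confirms the (2d/3)t³ growth for λt ≪ 1 and the diffusive 2Dt law, D = d/λ², for λt ≫ 1.

```python
import numpy as np

lam, d = 2.0, 3.0                  # arbitrary decay rate and velocity diffusion constant

def sigma2(t):
    """Conditional position variance of the Uhlenbeck-Ornstein process."""
    return (2 * d / lam**2) * (t
                               - 2 * (1 - np.exp(-lam * t)) / lam
                               + (1 - np.exp(-2 * lam * t)) / (2 * lam))

D = d / lam**2                     # spatial diffusion constant D = d/lam^2

t_small = 1e-3                     # lam*t << 1: fluctuations grow as (2d/3) t^3
r_small = sigma2(t_small) / (2 * d * t_small**3 / 3)

t_large = 1e3                      # lam*t >> 1: ordinary diffusion, 2 D t
r_large = sigma2(t_large) / (2 * D * t_large)
```

Both ratios approach 1, reproducing the cubic short-time growth and the Einstein 2Dt law.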
The characteristic function of the conditional probability is then determined
from the first and second cumulants in the form
Both the first moments x and v and the second-order cumulants are understood
to be conditional on the initial position and velocity that were just computed. The
original conditional distribution P(x, v, t|x0, v0, 0) appearing in Eq. (3.86) can be
obtained by taking the inverse Fourier transform of Eq. (3.86). This was already
done in Eq. (1.182) for the case of two Gaussian variables, and expressed directly
in terms of the second moments σ_xx = σ1², σ_vv = σ2² and the correlation coefficient
Adiabatic elimination
The conditional probability P(x, t\xo, 0) for position does not obey the Chapman-
Kolmogorov condition. Thus it does not describe a Markovian process. In what
way can the Uhlenbeck, Ornstein, Chandrasekhar problem of diffusion in x and
v space be reduced to the usual Einstein Brownian motion problem in ordinary
space? If one recognizes that the time 1/λ is short compared to the time interval
Δt over which one measures the positions of the particles, with a similar condition
on the accuracy of positions
Then, if we regard the slow motion in d/dt as small compared to λ, we neglect the
former and have
so that
Now x(t) obeys a Brownian motion directly, and the diffusion constant is
in agreement with Eq. (3.81) obtained earlier. Equation (3.89) is exactly parallel
to the Brownian motion Eq. (3.69) for velocity, and the analogous solution is the
standard diffusion solution of Eq. (3.41).
An adiabatic elimination procedure was used extensively to reduce a six-
variable problem (two fields, two populations, and two polarizations in a gas
laser) in Lax (1964QIII) to a two-variable problem whose solution was feasible
(Hempstead and Lax 1967).
Some of the original references to the work on Brownian motion are given in
Smoluchowski (1916), Einstein (1905), and Furth (1920).
3.8 Chaos
There are chaotic and other processes that differ from Brownian motion in that the
root mean square growth is
This occurs in many natural phenomena. An early known example relates to the
flow of water in the Nile river. Records of this water flow have been kept over
many centuries. A detailed investigation by a civil engineer, Hurst (1951), found an
exponent a that differs from 1/2 and fits closely to a value of 0.6.
The scaling properties of the fluctuations studied by Hurst (1951) were
investigated further by Anis and Lloyd (1976).
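A crude version of such a scaling analysis (not Hurst's rescaled-range method, just a root-mean-square displacement fit; our own sketch) recovers a ≈ 1/2 for an ordinary random walk; persistent, Nile-like records would give a larger exponent:

```python
import numpy as np

rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(2**20))   # ordinary Brownian walk

windows = 2 ** np.arange(4, 12)                # window sizes 16 ... 2048
rms = []
for n in windows:
    m = len(walk) // n
    segs = walk[:m * n].reshape(m, n)
    disp = segs[:, -1] - segs[:, 0]            # net displacement over each window
    rms.append(np.sqrt(np.mean(disp**2)))

# rms displacement ~ n^a; fit the exponent a on a log-log plot
a_exponent = np.polyfit(np.log(windows), np.log(rms), 1)[0]
```

Applied to river-flow or market data, the same fit would reveal an exponent differing from 1/2 whenever successive increments are correlated.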
Many of the phenomena in chaotic motion are described in terms of fractional
power laws associated with "fractals", a term introduced by Mandelbrot (1983). A
brief chapter is provided in Arfken and Weber (1995). An introduction to chaos is
provided by G. P. Williams (1997).
Let
and
By Eq. (3.30), the roots we desire obey F(Z) = 1. F(Z) is infinite at Z = 0 and
Z = ∞ and possesses a single minimum at Z_m determined by
or
This is clearly true for ε → 1. By expansion, it can be verified for small ε. It can
be extended to all ε by taking the derivative with respect to ε and, after canceling a
factor (1 + d)/d, verifying that
Taking the logarithm of Eq. (3.103), we can prove the stronger statement
as long as we restrict ourselves to gambling that favors the house (with ε > 0). The
second inequality
is true for all positive ε. Thus our original inequality F(Z_m) < 1 is true for all ε. Hence
there are two roots. By inspection, Z = 1 is one root.
Thus we have two possibilities. If the smaller root is Z = 1, the larger root
will have Z_r > 1. We shall establish this by showing that the slope is negative at
Z = 1, showing that Z = 1 is the smaller root:
Theorem
The sum, C, of two independent Gaussian variables A and B is Gaussian.
Proof
The characteristic function of the sum is
where the independence of the variables permits the factorization. If A and B are
Gaussian, only linear and quadratic powers of t appear in each exponent involving
A and B, hence also for C. Thus the random variable C is Gaussian (proved).
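The factorization argument is easy to illustrate numerically: the empirical characteristic function of C = A + B factors into the product for A and B, and the fourth cumulant of C vanishes, as it must for a Gaussian (a sketch of ours, with arbitrary means and variances):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6
A = rng.normal(1.0, 2.0, N)          # independent Gaussians with different
B = rng.normal(-0.5, 0.7, N)         # means and variances
C = A + B

# characteristic function factorizes: <exp(itC)> = <exp(itA)><exp(itB)>
t = 0.4
phi_C = np.mean(np.exp(1j * t * C))
phi_AB = np.mean(np.exp(1j * t * A)) * np.mean(np.exp(1j * t * B))
factorization_gap = abs(phi_C - phi_AB)

# fourth cumulant of C vanishes if C is Gaussian
m = C - C.mean()
excess_kurtosis = np.mean(m**4) / np.mean(m**2)**2 - 3.0
```

Only linear and quadratic terms survive in the exponent of each factor, so the product is again a Gaussian characteristic function.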
If A + B is known to be Gaussian the calculations can be performed by direct
use of Ott's theorem (1.56):
will obey Ott's theorem (1.56) if the variables A(s) are independent, or if C is
otherwise known to be Gaussian.
Even if C is not Gaussian its mean value is given by
To deal with averages of products of linear stochastic integrals of the form found
in Eq. (3.62), we introduce a theorem that will make all the requirements easy.
Although not stated explicitly, this theorem is used implicitly and extensively by
Chandrasekhar (1943) and Uhlenbeck and Ornstein (1930).
The average of the product of two linear stochastic integrals can be written
and in the special case, when the motion is pure Brownian, Eq. (3.61) is obeyed
and
They do this in connection with determining the fourth moment of the velocity:
Since the four times can be partitioned into pairs in three ways, they state that the
right-hand side is, in effect, multiplied by 3. When applied to the fourth moment of
the velocity, Eq. (3.116) can be translated into
which is consistent with the velocity being Gaussian. Conversely, since the
Maxwell distribution has the Gaussian form
If v is Gaussian, this implies that for all linked moments of the velocity
Since the velocity can be written as an integral over a force as in Eq. (3.64)
For all the linked moments to vanish for n > 2 this must be true of the random
forces as well:
In this chapter we shall compare three definitions of noise: The standard engineer-
ing definition that takes a Fourier transform over a finite time interval, squares
it, divides by the time and then takes the limit as the time approaches infinity.
The second definition is the Fourier transform of the autocorrelation function. The
equality between these two definitions is known as the Wiener (1930)-Khinchine
(1934, 1938) theorem. The third procedure, which we adopt, is to pass the signal
through a realizable filter of finite bandwidth, square it, and average over some
large finite time. As the bandwidth is allowed to approach zero, the result will
(aside from a normalization factor) approach the ideal value of the two preceding
definitions.
The standard engineering definition of noise is chosen, for the case of a stationary
process, to obey the normalization
and verify the normalization later. We use the subscript s to denote the standard
engineering (SE) definition.
The letter j is used to denote the imaginary unit, as is customary in electrical engi-
neering. The SE convention is that exp(jωt) describes positive frequencies, and
R + jωL + 1/(jωC) is the impedance of a series circuit of a resistance R, an
inductance L and a capacitance C. Because propagating waves are described by
exp(ikx − iωt) in physical problems, we regard exp(−iωt) as describing pos-
itive frequencies, so that the physics convention is equivalent to setting j = −i
consistently. It is also consistent with the convention in quantum mechanics that
a Schrödinger wave function has the factor exp(−iEt/ℏ),
where E is the energy of the system and is positive for positive energies (or positive
frequencies E/ℏ).
In this definition, Eq. (4.3), the interval on t is truncated to the region −T <
t < T, its Fourier transform is taken, and the result squared. Since a measure-
ment would attempt to filter out one frequency component and square it, this definition is
reasonable. What is not yet clear is why one divides by T rather than T², which
will become clear later. The brackets ⟨·⟩ denote an ensemble average. It is curious that
both the limit T → ∞ and an ensemble average are taken. For ergodic systems, a
time average and an ensemble average are equal because in such systems, over an
infinite time, the system will visit all the points in phase space over which an ensem-
ble average is made. (The more precise statement, for quasi-ergodic systems, is that
over time a single system comes arbitrarily close to all points in phase space.)
This is the reason why statistical mechanics works. The experimenter measures a
time average. The theorist finds it much easier to calculate an ensemble or phase
space average. Yet their results agree. For an experimenter to make an ensemble
average, he would have to average over an infinite number of systems. Instead, he
averages over time for one system. In most cases, then, either average is adequate,
and performing both is redundant. The above assumption is, however, wrong for
the measurement of noise. Middleton (1960) shows that if the ensemble average is
not performed, substantial fluctuations occur in the value of G_s(ω).
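Middleton's point is easy to reproduce: the periodogram of a single record of white noise has a relative standard deviation near 100% at each frequency, no matter how long the record; only ensemble (or bandwidth) averaging tames it. A sketch of ours, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 4000, 1024                          # M independent records, N samples each
x = rng.standard_normal((M, N))            # unit-variance white noise, true G = 1

P = np.abs(np.fft.rfft(x, axis=1))**2 / N  # one periodogram per record
single = P[:, 20]                          # estimates at one interior frequency bin

mean_est = single.mean()                   # ~1: the estimate is unbiased
rel_std_single = single.std() / mean_est   # ~1: a single record never converges
avg100 = single.reshape(40, 100).mean(axis=1)
rel_std_avg = avg100.std() / avg100.mean() # ~0.1: averaging 100 records helps
```

Lengthening a single record only sharpens the frequency resolution; it does not reduce the scatter at any one frequency.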
Presumably, this sensitivity occurs because we are asking for the noise at a
precise frequency. Because of the Fourier relation between frequency and time,
a measurement accurate to Δω requires a time t > 1/Δω. Realistic noise mea-
surements, to be discussed below, using filters of finite width, are presumably
ergodic.
guaranteed to yield a real, but not necessarily positive, G(a, ω, t). In the stationary
case.
since a shift of the time origin by τ/2 is permitted. But this new form is not
generally correct, and indeed is not necessarily real.
The Wiener-Khinchine (W-K) theorem states that the noise spectrum, Eq. (4.3), is
given by the Fourier transform of the autocorrelation function, Eq. (4.4). This
is equivalent to the statement that the above two definitions of noise are equivalent.
We shall prove the Wiener-Khinchine theorem by evaluating G_s(a, ω) in terms
of G(a, ω):
This result was obtained by writing the squared integral in Eq. (4.3) as a product of
two separate integrals, and using different integration variables in each factor. In
the stationary case (for which the W-K theorem is valid) Eq. (4.7) can be written
The last step moved the limiting procedure under the integral sign and used
The appropriateness of the limit, Eq. (4.13), as discussed in Section 1.13 on
delta functions, is based on the facts that (a) the integral of the left-hand side, for
any T, is 1; and (b) the width of the function is of order 1/T and the maximum height, at
ω′ = ω, is of order T. This function becomes very tall and narrow. An integration
against this function of any function G(ω′) of bounded variation will be sensitive
only to its value at the peak ω′ = ω. See Eq. (1.200).
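The W-K equality can be verified numerically for a process with a known autocorrelation. For the AR(1) sequence y_{n+1} = a y_n + w_n driven by unit-variance white noise, R(k) = a^{|k|}/(1 − a²), and the ensemble-averaged squared Fourier transform should match the Fourier transform of R (our sketch; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
a, M, N = 0.8, 2000, 512
w = rng.standard_normal((M, N + 256))

# generate M AR(1) records, discarding a transient of 256 steps
y = np.empty((M, N))
s = np.zeros(M)
for n in range(N + 256):
    s = a * s + w[:, n]
    if n >= 256:
        y[:, n - 256] = s

# standard-engineering definition: ensemble-averaged squared Fourier transform
G_se = np.mean(np.abs(np.fft.rfft(y, axis=1))**2 / N, axis=0)

# Wiener-Khinchine: Fourier transform of the autocorrelation R(k) = a^|k|/(1-a^2)
k = np.arange(N)
R = a**k / (1 - a**2)
omega = 2 * np.pi * np.arange(N // 2 + 1) / N
G_wk = np.array([R[0] + 2 * np.sum(R[1:] * np.cos(om * k[1:])) for om in omega])

mean_rel_err = np.mean(np.abs(G_se - G_wk) / G_wk)
```

The two spectra agree to within the statistical scatter of the finite ensemble, which shrinks as 1/√M.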
Note that Eq. (4.11) with u = t leads to the normalization condition
is the customary symbol for spectral density used by statisticians. This normaliza-
tion in Eq. (4.14) is equivalent to the customary choice, Eq. (4.2), when G(a, ω)
is even in ω, but more general when it is not. It follows easily from time rever-
sal that evenness holds for classical variables, but this is not true for quantum
mechanical variables (our definitions apply to the quantum case if a* is replaced
by the Hermitian conjugate, a†). The quantum case will be
discussed in Chapter 7 in deriving the fluctuation-dissipation theorem.
4.4 Noise measurements
where K(t) is known as the indicial response of the filter, or its response to a δ(t)
input pulse. In order that the filter be realizable, hence causal, output can only
appear after input, so that
The upper limit in Eq. (4.16) can thus be extended to infinity without changing the
value of the integral. In terms of Fourier components
where
is chosen to emphasize the frequency region near ω0. Thus we expect the output
spectrum to be |k(ω, ω0)|² times the input spectrum
However, this argument is heuristic, since the integral for a(ω) does not converge
in the usual sense: the integrand in Eq. (4.18) does not decrease as t → ∞.
What is actually measured is
the time average of the squared signal. The subscript m denotes the definition of
noise using the filter. For long enough T, we expect ergodicity, and can replace the
time average by the ensemble average. Equation (4.22) and Eq. (4.16) combine to
yield
Equation (4.23) and Eq. (4.24) are valid for nonstationary processes. Stationarity
was assumed only in the last step to obtain Eq. (4.25). Order has been preserved in
the above steps so that they remain valid for noncommuting operators. Using the
Wiener-Khinchine theorem in reverse, Eq. (4.11), to eliminate the autocorrelation
we obtain
The factor 4π arises because of the convention followed in Eq. (4.14). Thus the
desired spectrum at frequency ω0 can be extracted by using a sharp enough filter,
|k(ω, ω0)|². With an appropriate choice of filter K(t) we have described a Hewlett-
Packard spectrum analyzer.
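Digitally, the filter-based definition looks like the following sketch (our own construction, not the book's RLC example): white noise is passed through a narrow, causal two-pole resonator, the output is squared and time averaged, and the result is normalized by the filter's power gain ΣK_n² (Parseval). The estimate converges to the flat input spectrum.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2**19
x = rng.standard_normal(N)                 # white noise: flat spectrum, S = 1

r, theta = 0.99, 0.3                       # pole radius and angle: a narrow resonance
b1, b2 = 2 * r * np.cos(theta), -r**2      # y_n = x_n + b1*y_{n-1} + b2*y_{n-2}

def filt(signal):
    """Causal two-pole resonator (a realizable filter)."""
    out = np.empty(len(signal))
    y1 = y0 = 0.0
    for n, xn in enumerate(signal):
        yn = xn + b1 * y1 + b2 * y0
        y0, y1 = y1, yn
        out[n] = yn
    return out

y = filt(x)

imp = np.zeros(6000); imp[0] = 1.0
h = filt(imp)                              # impulse response K_n
gain = np.sum(h**2)                        # power-bandwidth normalization (Parseval)

S_est = np.mean(y[2000:]**2) / gain        # time average of the squared, filtered signal
```

Narrowing the resonance (r → 1) sharpens the frequency selectivity but lengthens the averaging time needed, exactly the time-bandwidth trade discussed below.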
with
where the coefficient πR/(2L) was chosen to yield the correct integral
This integral was evaluated exactly using formula 031.10 in Grobner and Hofreiter
(1950), shown below. For ab > 0,
Before we answer the above question, let us note that G(a, ω) by the standard
definition of noise is manifestly real. This reality extends to the quantum case,
even when a is non-Hermitian. To see this, let us introduce the correlation noise
G_{A†B}(ω) by
where A and B are possibly non-Hermitian operators and the dagger represents
Hermitian conjugation. (For the classical case, simply regard the dagger as taking
a complex conjugate.) Note that with t → −t and the use of stationarity, Eq.
(4.36) can be rewritten as
we obtain
by comparing to Eq. (4.36). Clearly, then, B†B is always Hermitian and G_{B†B}(ω)
is real. The question we raised above was under what circumstances G_{B†B}(ω) is
an even function of ω. Alternatively, when is R_{B†B}(t) an even function of t, where
we define
Although the principal applications in this book are to classical physics and eco-
nomics, in which random variables commute (can be written in any order), we
maintain the order of our variables so that our results remain valid in a quantum context.
Thus, where the complex conjugate (classically) is replaced by the Hermitian conjugate,
using a dagger, we write for the Hermitian conjugate of a product
This is consistent with the requirement that a measurement of the noise at a single
frequency, ω, requires an infinite measurement time, in agreement with the limiting
process T → ∞ used in Eq. (4.3).
One consequence of this difficulty is that a number of definitions have been sug-
gested in the literature, for example, by Page (1952) and Lampard (1954). A detailed
analysis of the spectra proposed by Page and Lampard is made by Eberly and
Wodkiewicz (1977), who also provide a fourth definition of the time dependent
spectrum, which they call "the physical spectrum of light".
We shall not attempt to review this work here since Eberly and Wodkiewicz
(1977) have already made detailed comparisons.
What appears to have been overlooked in these references is that a number
of solutions appeared earlier for an analogous problem in quantum mechanics.
Position and momentum variables also cannot be measured simultaneously with
complete precision because of the Heisenberg uncertainty principle. Thus a simul-
taneous distribution function for position and momentum would appear to be just
as much an oxymoron. However, Wigner (1932) proposed an elegant solution for
the density in phase (position and momentum space) which was expounded later
at some length by Moyal (1949).
In the Brandeis Lectures Lax (1968) suggested that the Wigner-Moyal choice
could be applied to the case of noise in nonstationary problems. Equation (4A18)
of the Brandeis lectures is the same as Eqs. (4.5), (4.6) here.
In Lax (1968QXI), however, it is shown that there are many possible choices
of distribution functions. In particular, there is the Wigner symmetric distribution,
the de Rivier symmetric distribution, and the normal and antinormal distributions.
If q is position and p is momentum, then there are combinations roughly
that are the negative and positive frequency parts (or in quantum mechanics
the destruction and creation operators respectively). Normal order involves all
creation operators to the left of all destruction operators. It is then possible to con-
struct normal ordered and antinormally ordered distributions. They are different
numerically, but related.
In Lax (1968QXI), the point is made that any of these distributions can be used, and
they should all lead to the same final answer. But which one is most convenient to
use depends on which physical quantity is of concern and is to be determined. For example,
if measurements are made for photon counters, then the antinormal distribution is
best in the sense that the desired results can be obtained by a simple integration
over a classical distribution function. But if another choice is made, corrections
will have to be calculated, as described in Lax (1968QXI) and in Lax and Yuen
(1968QXIII).
In our work on laser line-widths and photocount distributions, the distribution
function can be calculated analytically. And so the antinormal one was calculated.
In this section, our objective is different. We must choose the form of spectral
distribution that is easiest to measure. An additional consideration is to make a
choice that is best for computing the physical result of interest. The latter may
depend on the nature of the measuring devices. It also depends on the nature of the
process involved, particularly, if we have some knowledge about it. For example,
Bendat and Piersol (1971) display three processes. See Fig. 4.1. The first is one in
which a random process (with zero average) is modified by a time varying mean
value. In the second, the mean is zero but the mean square varies randomly. In
the third, the frequency varies randomly. It is doubtful that one choice of spectral
formula is better than all others for all three cases.
There are other practical considerations in the construction of an algorithm for
spectral calculation. A true ensemble average might require an enormous amount
of repetitions of the experiment. Even if this could be done, it might be meaning-
less. For example, consider stock market prices. One could take a set of series,
each of which begins on January 1 and ends on December 31. Conditions
(other than seasonal effects) could be sufficiently different at the start of each year
that averages (say over 100 years) might give very misleading results.
Perhaps the most suitable case for analysis is one in which the time-scale of
the nonstationary part of the processes is much longer than the time scale of the
stationary part.
An appropriate starting point for the nonstationary case would be Eq. (4.23) or
(4.24), in which one passes the signal a(t) through a filter by a convolution with K(t − t′)
and then takes the absolute square of the result. The absolute squared result can be time
averaged over a suitably chosen time interval, as shown in Eq. (4.22). The latter
step can also be replaced by averaging with an exponential weight:
The realizable filter chosen in Eq. (4.30) was an RLC
circuit, for simplicity. It has two resonances at frequencies whose real parts are
equal but opposite in sign. The procedure used by Eberly and Wodkiewicz (1977)
is equivalent to creating a filter with only a positive frequency resonance. Since
FIG. 4.1. Three processes displayed by Bendat and Piersol (1971). The first is
one in which a random process (with zero average) is modified by a time vary-
ing mean value. In the second, the mean is zero but the mean square varies
randomly. In the third, the frequency varies randomly.
both their filter and ours have a dissipative term that controls the frequency line-
width, that term also restricts the range of time data used. Our subsequent average over
a time 2T or 1/γ, with γ in Eq. (4.47), provides separate control over the time
and frequency intervals. This freedom may be illusory, since the time-frequency
product must exceed unity. Our procedures are thus very similar. There are some
small errors inherent in both, since our predicted noise, calculated over a time
interval, is probably a better estimate of the value in the middle of the interval.
A detailed examination of the problem of time-varying spectra has recently
been made by Cohen (1995) who compares a wide variety of possible choices.
If we regard the problem as one of determining the spectrum from experimental data
in the presence of distortion, it becomes an ill-posed problem of the
sort discussed in Chapter 15 on "Signal Extraction in the Presence of Smoothing
and Noise". The ill-posed nature of such problems is overcome by regularization
procedures. Although they are not referred to in this way, the wide variety of win-
dowing procedures perform this function. We shall return to this problem of noise
in nonstationary systems in Chapter 17 on the "Spectral Analysis of Economic
Time Series".
Appendix A: Complex variable notation
Even if a(t) is a real variable, such as the voltage V(t), the associated Fourier
transform
is complex. If a(t) is a stationary variable, this expression for a(ω) is not a conver-
gent integral. Mathematicians might say this expression is meaningless. However,
its moments are meaningful:
Inserting the inverse Wiener-Khinchine relation, Eq. (4.11), for the last factor in
Eq. (4.49), one gets
an expression in terms of the noise spectrum of the variable a itself. The pres-
ence of the delta function shows that only two Fourier components at the same
frequency interfere with each other. Note that ω = 2πf relates angular frequencies to
ordinary frequencies. The last term uses the notation S(a, f) = (1/2)G(a, ω) of
Eq. (4.15), common in mathematics and statistics books.
Thus, if we have two variables related by a complex factor, such as a current
I(ω) and a voltage V(ω) related by an impedance Z(ω), then we can relate the
corresponding noise spectra by
Thermal noise
FIG. 5.1. The noise measured by Johnson (1928) versus resistance in six diverse
materials.
FIG. 5.2. Thermal noise for two resistors in parallel versus temperature obtained
by Williams (1937), plotted as an effective resistance ⟨V²⟩/[4k df T_a] against
T₂/T_a. Williams takes T_a to be T₁, except in the one-resistor case, for which
R₁ is infinite, and he then chooses T_a to be room temperature. Both theory,
in Eq. (5.2), and experiment are linear functions of temperature. Line B is the
two-resistance case, and line A is the one-resistance case.
Johnson found that the measured noise power in the frequency interval is
proportional to the temperature of the resistor from which the noise emanates
where k is Boltzmann's constant. Theoretical results from Eq. (5.2) are represented
by solid lines in Fig. 5.2 compared with experimental data.
Moullin (1938) generalizes this result to the case of an arbitrary number of
impedances in parallel:
and write V = ZI, this result takes a simpler form in terms of current fluctuations
Other useful references on networks and noise are Murdoch (1970), Bell (1960)
and Robinson (1962, 1974).
5.2 Equipartition
where we have set ω = ω0 x, with ω0 = 1/(LC)^{1/2} the circuit resonant fre-
quency, and Q = ω0 L/R the circuit Q factor (energy stored over energy lost
per cycle). This integral can be performed using the residue theorem of complex
variable theory, or by using Eq. (4.33), which was obtained from Grobner and
Hofreiter (1950).
Similarly, the energy stored on the capacitance is
Thus, for both the inductor and the capacitor, the energy stored because of the
noise in the resistor is precisely that expected from the equipartition theorem. Note
that Eq. (5.7) is a relation between Fourier components, so that, for example, I and
q should be written I_ω and q_ω, whereas in Eqs. (5.8) and (5.9) we are really dealing
with the time dependent quantities ⟨I(t)²⟩ and ⟨q(t)²⟩ respectively.
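The two equipartition integrals can be checked by direct numerical quadrature of the Johnson noise spectrum driving a series RLC circuit: with S_V = 4kTR, the current noise is 4kTR/|Z|² and the charge noise is that divided by ω². Both stored energies come out to kT/2 for any R, L, C (our sketch, in arbitrary units):

```python
import numpy as np

kT, R, L, C = 1.0, 0.5, 2.0, 3.0              # arbitrary units

f = np.logspace(-6, 3, 100001)                # log-spaced frequency grid (Hz)
w = 2 * np.pi * f
Z2 = R**2 + (w * L - 1.0 / (w * C))**2        # |Z|^2 of the series RLC circuit

S_I = 4 * kT * R / Z2                         # current spectral density
S_q = S_I / w**2                              # charge spectral density

def trapz(yv, xv):                            # simple trapezoid rule
    return np.sum(0.5 * (yv[1:] + yv[:-1]) * np.diff(xv))

energy_L = 0.5 * L * trapz(S_I, f)            # (1/2) L <I^2>  -> kT/2
energy_C = trapz(S_q, f) / (2 * C)            # <q^2>/(2C)     -> kT/2
```

Changing R, L, or C rescales the spectra but leaves both stored energies pinned at kT/2, which is the content of the equipartition check.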
A fundamental truth now emerges. Fluctuations must be associated with dissi-
pation in order that the system does not decay to zero, but maintains the appropriate
thermal equilibrium energy.
In view of the compatibility with thermal equilibrium shown in the preceding sec-
tion, it is not surprising that a simple thermodynamic argument can be used to
demonstrate that the noise emanating from a resistor must be proportional to its
resistance.
Consider the two resistors in the series circuit shown in Fig. 5.3. The current I₁
through resistor R₂ produced by the first resistor is
FIG. 5.3. Power transfer from resistance R₁ to R₂ and vice versa, where V_j is the
Johnson noise voltage in resistor Rj.
If both resistors are at the same temperature, the second law of thermodynamics
requires that there can be no steady net flow (in either direction). Equating Eqs.
(5.11) and (5.12), we obtain
Since the left-hand side of the equation is independent of R₂, and the right-hand side is
independent of R₁, this equality requires both sides to be independent of both resis-
tances. Each side therefore must be an (as yet unknown) universal function of frequency,
W(f). In summary, the noise spectrum associated with an arbitrary resistance R
is given by
Thus we can conclude that the noise G(V, f) associated with an impedance Z(f) is
given by:
where ℜ denotes the real part. Thus the noise is proportional to R(f) = ℜZ(f) even
when the impedance Z(f) is frequency dependent. The Johnson law has therefore
been generalized to the case of frequency dependent impedances.
On the other hand, a transmission line can be terminated with its "characteristic
impedance"
where L is the inductance per unit length of the line and C is its shunt capacitance
per unit length. In this case, waves down the line are not reflected. The line acts
as if it were infinite. Nyquist therefore chooses as his proof vehicle a transmission
line terminated by RQ at both ends. The line is assumed to have length I. The trans-
mission line can be described in terms of its modes which are harmonic oscillators.
If U is the energy density per mode, then the energy per mode is
where we have made use of the equipartition theorem valid for modes that behave
like harmonic oscillators.
If the modes are described as plane waves exp(±ikx) in a periodic system of
length l, then k takes the discrete values
Since each mode carries an energy U with a velocity v, the power transmission
down the line is
diverges. Nyquist suggested that this problem could be removed if the classical
energy, kT, associated with a harmonic oscillator were replaced by the quantum
energy
includes the zero-point energy. If the latter is retained, the divergence in the
integrated energy reappears.
It is sometimes argued that zero-point energy can be ignored because only dif-
ferences in energy can be observed. However, this is not true for magneto-optical
transitions between Landau levels in the valence band and similar levels in the
conduction band. These levels possess a level structure like a harmonic oscillator,
but the frequency is the cyclotron frequency associated with the magnetic field.
Thus it is inversely proportional to the effective mass of the electrons in the con-
duction band, or the holes in the valence band. Since these masses are different,
the energy differences contain the difference of the two zero-point energies which
is therefore observable. See Lax (1967).
In the Casimir effect (Casimir and Polder 1948), two closely spaced metal-
lic plates are attracted by the influence of vacuum fluctuation in the gap on van
der Waals forces. The Casimir effect has been experimentally verified by Derya-
gin and Abrikosova (1956), Deryagin, Abrikosova and Lifshitz (1956), Kitchener
and Prosser (1957), and Chan et al. (2001). An alternate derivation of the "Lamb
shift" between the otherwise degenerate s and p levels in a hydrogen atom was given
by Welton (1948) based on the effects of the zero-point fluctuations of the
electromagnetic field on the electron.
The relevance of zero-point energies in the electromagnetic field is discussed
further in Chapter 7 as it relates to the area of quantum optics. We also note that
absolute energies (not just differences) are relevant in general relativity, since space
is distorted by the total energy content. For an elementary discussion of these
points see Power (1964).
Callen and Welton (1951) considered a general class of systems (quantum
mechanically) and established that Eq. (5.26), and its dual form with Y = 1/Z,
the admittance, and g = ℜY(ω), the conductance:
apply to all systems near equilibrium with the replacement of kT by Eq. (5.28)
when necessary. The importance of the Callen-Welton work is the great generality
of potential applications. The fact that all dissipative systems have corresponding
noises associated with them is necessary in order that the second law of ther-
modynamics not be violated when such systems are connected. The fluctuation-
dissipation theorem will be discussed in more detail in Chapter 7, after the density
operator tools needed to give a short proof are developed.
5.5 Nyquist noise and the Einstein relation
Consider a mechanical system with velocity v. Then the standard engineering (SE)
noise associated with v can be defined by
Conversely, the fluctuation-dissipation theorem for the velocity (which is analo-
gous to a current rather than a voltage) is given by Eq. (5.30)
where
with F the applied force, is the admittance, or velocity per unit applied force. At
zero frequency, we refer to v/F as the mechanical mobility, B, and v/E as the
(electrical) mobility μ, so that
which is simply the Einstein relation between diffusion and mobility. An experi-
mental verification of the Einstein relation for electrons and holes in semiconduc-
tors is given in the Transistor Teacher's Summer School (1953).
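The Einstein relation is also easy to verify in a simulated Langevin model (our own sketch, arbitrary units): measure the mobility B from the drift under a constant force, measure D from the force-free mean-square displacement, and compare D with BkT.

```python
import numpy as np

rng = np.random.default_rng(7)
M_mass, lam, kT = 1.5, 4.0, 2.0
d = lam * kT / M_mass                   # velocity diffusion constant
dt, nsteps, npaths = 1e-3, 8000, 5000
F = 6.0                                 # constant applied force

# (1) drift under force F gives the mechanical mobility B = <v>/F = 1/(M lam)
v = np.zeros(npaths)
for _ in range(nsteps):
    v += (-lam * v + F / M_mass) * dt \
         + np.sqrt(2 * d * dt) * rng.standard_normal(npaths)
B_est = v.mean() / F

# (2) force-free diffusion gives D from the mean-square displacement 2 D t
x = np.zeros(npaths)
v = np.zeros(npaths)
for _ in range(nsteps):
    x += v * dt
    v += -lam * v * dt + np.sqrt(2 * d * dt) * rng.standard_normal(npaths)
D_est = x.var() / (2 * nsteps * dt)

einstein_ratio = D_est / (B_est * kT)   # Einstein relation predicts 1
```

The ratio is independent of the particular values of M, λ, and kT chosen, which is the content of the relation D = BkT.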
where μ is the electronic mobility, suggests that we can define a complex frequency
dependent diffusion constant:
where we have used stationarity and t′ = t + τ. Thus we can write for the mean-
square displacement:
By Eqs. (5.38) and (5.41), D(ω) is G(v, f)/4. Thus we obtain MacDonald's
theorem
Shot noise is the name given to electrical fluctuations caused by the discreteness
of electronic charge. For excellent elementary discussions of shot noise see Robin-
son (1962, 1974), MacDonald (1962), and Bell (1960). The most typical example
concerns the emission of electrons from the cathode of a vacuum tube. For a dis-
cussion of noise in vacuum tubes see Lawson and Uhlenbeck (1950) and Valley
and Wallman (1948). In Fig. 6.1 we display an example discussed by Robinson.
The switch S is connected to A for a time interval τ. The current through the
diode charges the condenser C. Then the switch is shifted to position B and the accu-
mulated charge is measured by the ballistic galvanometer G. The actual charge
measured will be
with integral n in a single measurement. The mean charge (average over many
measurements) will be
with nonintegral results. Assuming random arrival of the electrons, we have Pois-
son statistics (see Section 3.1), and the root-mean-square fluctuation in charge is
FIG. 6.1. Apparatus for measuring shot noise associated with charge accu-
mulated on a condenser using a ballistic galvanometer. After Robinson
(1962).
94 SHOT NOISE
given by
These results were confirmed by the even more accurate experiments of Williams and Huxford (1929).
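The Poisson root-mean-square law quoted above is easy to check by simulation; the mean electron count per interval below is an assumed illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)
e = 1.602e-19          # electronic charge, C
nbar = 1.0e6           # mean number of electrons per measurement (assumed)
trials = 200_000

n = rng.poisson(nbar, size=trials)   # electrons collected in each trial
q = n * e                            # accumulated charge on the condenser
dq_rms = q.std()                     # measured rms charge fluctuation

# Poisson statistics predict dq_rms = e * sqrt(nbar)
print(dq_rms / e, np.sqrt(nbar))
```

The simulated rms electron-number fluctuation agrees with the √n̄ prediction to within sampling error.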
CAMPBELL'S TWO THEOREMS 95
FIG. 6.2. Shot noise into a circuit containing a C in parallel with a series RL circuit, from Hull and Williams (1925), Fig. 8, or Fig. 4-12 of Lawson. The
currents from the two circuits can be adjusted to add to zero (on the average)
but the shot noises do not cancel from independent emissions.
The paper by Williams and Vincent (1926) measured the shot effect from a
vacuum tube diode directly into a noninductive resistance and simplified the the-
ory for emission into a nonperiodic circuit. The experimental results of the work
of Williams and his collaborators required a careful analysis of both theory and
experiment before adequate accuracy and understanding were obtained.
An even more detailed analysis of experiment and theory is needed in the recent
ingenious experimental work by de-Picciotto et al. (1997, 1998) to show that in
the case of the fractional Hall effect, the effective charge can be e* = e/3. Thus,
although the charges are discrete, they are not necessarily integral.
Campbell (1909) was concerned with the measurement of the charge on the alpha
particle. The charge q, given to an electrode system, generates a voltage q/C
(where C is the electrode capacity) which decays through some leakage resistance
so that (q/C) exp(−pt) is the voltage V on the electrometer plates. The voltage V
generates a torque KV on the electrometer needle whose response is determined
by its moment of inertia I, torsional stiffness k, and damping μ through the relation
The parameters I, k, μ, K are presumed known from a separate experiment. The
solution for Θ can be written qf(t). If a set of pulses arrives at the times t_j, the
complete response is
but we need not restrict our calculations to any particular form of f(t). The latter
is the indicial response, that is, the response of the apparatus to a delta function
source. We are now in a position to state:
Campbell's theorem
If the pulses in Eq. (6.8) arrive at random at an average rate ν per second, the
average response is given by the first Campbell theorem:
and the variance of the response is given by the second Campbell theorem
Proof
Equation (6.8) can be written in the form
The last step, setting the average time-dependent rate ⟨ν(s)⟩ to a constant ν, the average number of events per second, is appropriate (only) in the stationary case.
Thus we obtain Campbell's first theorem:
In performing the average, one must separate the double sum over i and j into the i = j terms and the i ≠ j terms:
The second term is the definition of the (possibly correlated) joint rate:
Thus
If one takes the limit in which the charge e goes to zero and ν goes to infinity at fixed current I = νe, the discreteness of the charge, and the shot noise associated with that discreteness, disappear.
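Both Campbell theorems can be checked by direct Monte Carlo simulation. In the sketch below the pulse shape f(t) = exp(−t/τ) and the rate are assumed for illustration; the first theorem predicts a mean ν∫f dt = ντ and the second a variance ν∫f² dt = ντ/2:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, tau, T = 50.0, 0.1, 400.0    # pulse rate, decay time, run length (assumed)
t_j = np.sort(rng.uniform(0.0, T, rng.poisson(nu * T)))  # Poisson arrival times

def theta(t):
    """Filtered shot noise: sum of exponential pulses from all earlier arrivals."""
    dt = t - t_j
    return np.exp(-dt[dt >= 0.0] / tau).sum()

samples = np.array([theta(t) for t in np.linspace(5.0, T, 2000)])
print(samples.mean(), nu * tau)         # first Campbell theorem:  ~5.0
print(samples.var(), nu * tau / 2.0)    # second Campbell theorem: ~2.5
```

The sampled mean and variance reproduce both theorems to within the statistical scatter of the run.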
If we regard |f(t)|² as the density of energy in time, then |F(f)|² can be regarded as the energy density in frequency f. Parseval's theorem shows that the total energy
THE SPECTRUM OF FILTERED SHOT NOISE 99
can be obtained by adding either the frequency components or the time compo-
nents, with equal result. A simple nonrigorous proof can be given, using Eq. (6.12),
to rewrite the right hand side in the form
We have reversed the order of integration. Since the last integral over frequency f is simply the delta function δ(u − t), representing completeness, the integral reduces to the left hand side of Eq. (6.29).
Since f(t) is real, F(f)* = F(−f), so |F(f)|² is even in f and
is the spectrum of ideal shot noise. We shall derive this spectrum below directly from Eq. (6.14). The factor |F(f)|² can then be interpreted as the filter that reduces the ideal shot noise of Eq. (6.14) to that of Θ(t). The function f(t) is the indicial
response of the filter, namely the response to a delta function input. That the spec-
trum at the output is that of the input pure shot noise multiplied by the spectrum
of the filter is so reasonable, it hardly requires proof. We shall, however, make a
direct evaluation of the spectrum of pure shot noise in the next section, since we
are then sure that all our normalizations are correct, in addition to that of the shape
of the spectrum.
The current associated with the arrival of charges q at times t_j can be written
where the symbol S is used to remind us that we are referring to the current
associated with pure shot noise. If the expected arrival rate is ν(t) per second
then
since the average current I is time independent, as anticipated in Eq. (6.33). Such
a constant spectrum is referred to as "white".
We shall illustrate the above result by calculating the voltage spectrum across
the condenser C in Fig. 6.3 without using Campbell's theorem. The full current,
i_full = i, passes through the parallel circuit of condenser, C, and resistance, R, with i_C and i_R passing through the condenser and resistance, respectively, in proportion to the admittance of these elements:
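For the case of C in parallel with R alone, Campbell's second theorem gives a Lorentzian voltage spectrum across the condenser; a numerical sketch (the operating point is assumed, and the spectrum 2eIR²/[1 + (f/f_c)²] is the standard shot-noise-through-RC form, not a formula quoted from the text) checks that integrating it over frequency reproduces the total variance eIR/(2C):

```python
import numpy as np

e, I, R, C = 1.602e-19, 1e-3, 1e4, 1e-9   # assumed diode current and RC load
f_c = 1.0 / (2.0 * np.pi * R * C)         # RC corner frequency, ~15.9 kHz

f = np.linspace(0.0, 1000.0 * f_c, 2_000_001)
S_V = 2.0 * e * I * R**2 / (1.0 + (f / f_c)**2)   # one-sided Lorentzian spectrum

# trapezoidal integration of the spectrum over frequency
var_numeric = float((0.5 * (S_V[:-1] + S_V[1:]) * np.diff(f)).sum())
var_campbell = e * I * R / (2.0 * C)      # Campbell's second theorem result
print(var_numeric, var_campbell)
```

The two numbers agree to about 0.1%, the small deficit coming from truncating the 1/f² tail of the integral.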
Shot noise arises because charge is discrete. In a vacuum tube diode each electron
crosses from cathode to anode. It would be incorrect, however, to assume that
the external circuit sees a delta pulse associated with the time of arrival of each
electron. Instead we shall follow a simple model developed by Shockley (1938)
as discussed by Freeman (1952) and in Section 6.5 we shall supply an elementary
proof of the validity of the model.
If a charge e has advanced a distance x, a fraction x/L of the total distance
L from the cathode to the anode, we shall assume that the external circuit will
vary continuously as if a charge ex(t)/L has arrived at the anode. The full charge
transfer is completed when x(t) = L. A set of charges at positions x_j(t) leads to a charge transfer of
and v_j(t) = dx_j(t)/dt is the velocity of charge j. Of course, only charges in the region 0 < x_j < L contribute to either sum above. Equation (6.45) is equivalent to using an average current over the region 0 < x < L,
where the actual current density has the expected form (Jackson 1975, Section
5.6),
We shall explore the consequences of Eq. (6.45) and prove in the next section that
the average expression in Eq. (6.46) or Eq. (6.45) is, in fact, exactly correct; see
Eq. (6.76). The transit time T for any carrier with velocity v(t) obeys
The velocity v(t) is not assumed constant, but we shall, in what follows, state the
general answer, and answers for the simple special case of uniform velocity. For
example, if v(t) = v
It has been tacitly understood that each term in Eq. (6.45) contributes only
while the position of the charge is in the active region:
Application of Campbell's first theorem to Eq. (6.45) with the help of Eq. (6.48)
yields:
which is the charge e times v, the number per second, at which they appear.
Campbell's second theorem takes the form
and G(S, ω) is the pure shot noise associated with S(s).
If we regard t_j as the time an electron leaves the cathode, no signal can appear in the output circuit until t > t_j. Thus we set v(t) = 0 for t < 0. After the electron hits at t_j + T, this v(t − t_j) term no longer contributes, so we can set v(t) = 0 for t > T. Thus the pure shot noise is filtered by the "window" factor
where T is the transit time of the electron in passing from cathode to anode. One
can readily verify that the mean current
is unaffected by the window. All that remains is to calculate the spectrum of pure
shot noise itself, which we have already given as a theorem in Eq. (6.39) as
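For the special case of uniform velocity, the induced current is a rectangular pulse of duration T, so the window factor takes the familiar form [sin(πfT)/(πfT)]²; a minimal numeric sketch (the 1 ns transit time is an assumed value):

```python
import numpy as np

T = 1.0e-9   # transit time, assumed 1 ns

def window(f):
    """|F(f)/F(0)|^2 for a rectangular current pulse of duration T."""
    x = np.pi * f * T
    return np.where(x == 0.0, 1.0, (np.sin(x) / np.maximum(x, 1e-300)) ** 2)

f = np.array([0.0, 0.5 / T, 1.0 / T])
print(window(f))   # 1 at dc, (2/pi)^2 at f = 1/(2T), 0 at f = 1/T
```

The window equals unity at zero frequency (leaving the mean current unaffected, as stated above) and first vanishes at f = 1/T.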
In this section we shall prove that the intuitive "average" model of Shockley (1938)
and Freeman (1952) discussed in the previous section is rigorously correct. A one-
dimensional sheet of charge whose total charge is e moves from the cathode to
the anode of a diode with velocity v(t) (see Fig. 6.4). The charge sheet enters the
region between cathode and anode at time t₀ and arrives at the anode at time t₀ + T where T is the transit time. Describe the current that appears in the external circuit
in the time interval — oo < t < oo. The results will justify the use of the smooth
current in Eq. (6.45).
A quasistatic approach is permissible. We shall therefore use Poisson's equa-
tion:
in MKS units. To get Gaussian units set ε₀ = 1/(4π). For our case
where dx/dt = v(t) and x(0) = 0. Thus we need the Green's solution of Poisson's equation. On each side of the sheet, d²φ/dx² = 0, since no charge is located in the vacuum region. Thus φ is linear in x and both E and D are constants. By Gauss' law, the jump across the sheet is
is independent of x and represents the current flow in the circuit. In the exter-
nal wire, D = 0, since charges disappear in a conductor (within the dielectric
relaxation time).
To evaluate the total current in the vacuum transit region we note that the
conduction part of the current density (in the one-dimensional case) is
With the help of the Heaviside unit function H(x), Eqs. (6.68) and (6.69) can be
combined into the single equation:
Differentiation with respect to t leads to four terms, two of which cancel, leading to the simple result:
FIG. 6.5. The potential between cathode and anode in a simple diode displaying
the barrier plane.
Combining this result with Eq. (6.72) for the conduction current, the combined
current simplifies to:
Note, that the total current contains no singular or discontinuous terms. This
combined current appears in the external electrical circuit:
The smooth result in Eq. (6.76) is just what we assumed in Eq. (6.45). The time T
in Eq. (6.76) is determined by:
where the velocity v(t) will be governed by the applied voltage and Newton's laws.
Note, an extra dc field V/L does not contribute to dD/dt although it contributes to D.
At low concentrations, or for one particle (with no screening), one can write a
simple expression for the velocity (in a constant potential)
Our previous discussion of the shot noise in a diode applies to the temperature-limited case. When significant space charge is present, the noise is limited by correlations induced
SPACE CHARGE LIMITING DIODE 107
by the space charge. The potential produced by the space charge obeys Poisson's
equation
where the density of electronic space charge p is everywhere negative. Thus the
potential is concave upwards. If the potential V would increase monotonically
from cathode to anode, emitted electrons of all energies would be accelerated
from cathode to anode, and no space charge would develop. Since space charge
develops, the potential must develop a minimum within the region from cathode
to anode as shown in Fig. 6.5. In practice, a small minimum is found close to
the cathode. This minimum provides a potential barrier, and only electrons with
higher emission kinetic energy will overcome the barrier. Since electrons have a
negative charge, they can be visualized by inverting Fig. 6.5. Since the electrons
have a Boltzmann distribution of energy, the current is
where V_b is the barrier height in energy units. Clearly, V_b = kT log(I₀/I), and the barrier position can be determined from its height and the temperature. This would
reduce the current but not change the ratio of shot noise to current. However, a
positive fluctuation in the current will cause an increase in the barrier height that
will turn back some electrons. This "negative feedback" causes a reduction in shot
noise, by a factor Γ². For anode voltages larger than 30kT/e, an approximate formula for the smoothing factor is given by Rack (1938):
FIG. 6.6. Equivalent circuit of a diode feeding into a noisy resistor, after Williams and Moullin, p. 74, combining the thermal noise current with that of a space-charge limited diode.
where the reduction factor Γ² has a contribution from the electrons reflected at the
barrier and a second contribution from those that get to the plate.
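The barrier-height relation V_b = kT log(I₀/I) quoted above is easily evaluated; the cathode temperature and the two currents below are assumed illustrative values:

```python
import math

k_B = 1.380649e-23
e = 1.602176634e-19
T = 1100.0      # cathode temperature in K (assumed, typical of oxide cathodes)
I0 = 50e-3      # saturation (temperature-limited) current, A (assumed)
I = 5e-3        # actual space-charge-limited current, A (assumed)

V_b = (k_B * T / e) * math.log(I0 / I)   # barrier height in volts
print(f"V_b = {V_b * 1e3:.1f} mV")
```

A tenfold current suppression at this cathode temperature corresponds to a barrier of roughly a fifth of a volt.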
A simplified description of the space charge reduction Γ² in a triode is given on p. 564 of Valley and Wallman (1948). Williams (1936) and p. 74 of Moullin (1938) show that the equivalent circuit of a diode feeding into a resistor is that shown in Fig. 6.6.
The equations found to fit the data are given by
This is a combination of shot noise and thermal noise. Here ρ is the differential resistance dV/dI of the diode. The notation I_C and I_T reminds us that the first noise is space-charge limited, and the second is temperature limited. This conversion from current to voltage noise was adopted by
Williams (1936) following a suggestion by Moullin and Ellis (1934) and discussed
extensively in Moullin (1938), p. 74.
The noise dependence from a resistance R at temperature T₂ was then measured by comparing the effective resistance of the diode resistance combination
RICE'S GENERALIZATION OF CAMPBELL'S THEOREMS 109
with that of a metallic resistance at another temperature T₁. The results in Fig. 23
of Moullin (1938) agree with the effective resistance formula
Williams extended this verification to the case of two diodes in parallel with a
resistance R, with one of the diodes being temperature limited and the other space
charge limited. In that case, the formula is
Rice (1944, 1945, 1948a, 1948b) not only generalized Campbell's theorem to
obtain all the higher moments, he also considered the more general process
where the η_j's are random jumps with a distribution independent of t_j, of t, and of j. Our procedure for dealing with the same problem consists in writing
Equation (6.88) describes Θ(t) as filtered shot noise. Thus we can relate the ordinary characteristic function of Θ to the generalized characteristic function of the shot noise function, G(s)
The average in Eq. (6.90), for general y(s), was evaluated in two ways in Lax (1966QIV). The first made explicit use of Langevin techniques, which we will discuss later. The second, which follows Rice, will be presented here. It makes use of
the fact that the average can be factored:
Here, we have supposed that N pulses are distributed uniformly over a time interval T at the rate ν = N/T. All N factors are independent of each other and have equal averages, so that the result of the RHS of Eq. (6.91) is
where s(η) is the normalized probability density for the random variable η. In the last step, we assumed that the integral over s converges, and replaced it by its limit before taking a final limit in which N and T approach infinity simultaneously with the fixed ratio N/T = ν. Setting y(s) = kqf(t − s), the generalized Campbell's theorem is obtained
The cumulants are then given by the coefficients of kⁿ/n! in the exponent:
The choice s(η) = δ(η − 1) restores the original Campbell process, which includes only the cases n = 1, 2. The probability density of this variable may then be
obtained by taking the inverse Fourier transform of the characteristic function in
Eq. (6.93)
This form of generalized Campbell's theorem, like its antecedents, assumes that
the tj are randomly (and on the average uniformly) distributed in time. Moreover,
there is assumed to be no correlation between successive pulse times. Lax and
Phillips (1958) have found it convenient to exploit Eq. (6.93) and Eq. (6.94) in
studying one dimensional impurity bands.
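Rice's cumulant formula, with κ_n = ν⟨ηⁿ⟩∫f(t)ⁿ dt, can likewise be checked by simulation. In the sketch below the exponential amplitude distribution and pulse shape are assumed for illustration; for η distributed as a unit exponential, ⟨η⟩ = 1 and ⟨η²⟩ = 2:

```python
import numpy as np

rng = np.random.default_rng(2)
nu, tau, T = 40.0, 0.1, 400.0          # rate, decay time, run length (assumed)
t_j = rng.uniform(0.0, T, rng.poisson(nu * T))
eta = rng.exponential(1.0, t_j.size)   # random amplitudes: <eta>=1, <eta^2>=2

def theta(t):
    """Shot noise with random pulse heights eta_j and f(t) = exp(-t/tau)."""
    dt = t - t_j
    keep = dt >= 0.0
    return (eta[keep] * np.exp(-dt[keep] / tau)).sum()

s = np.array([theta(t) for t in np.linspace(5.0, T, 2000)])
print(s.mean(), nu * 1.0 * tau)        # kappa_1 = nu <eta>   * integral f
print(s.var(), nu * 2.0 * tau / 2.0)   # kappa_2 = nu <eta^2> * integral f^2
```

Both the mean and the variance match the generalized cumulant predictions (4.0 each for these parameters) within the sampling scatter.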
Rice (1944) has also determined ⟨Θ²(t)⟩ for the case in which the time interval between successive pulses has a distribution p(τ) that is not necessarily equal to the Poisson value, p(τ) = ν exp(−ντ), appropriate to uncorrelated pulses. A
simplified argument can be given for the second moment of
since
If we write
where the factor 2 takes account of the fact that t_i can be less than t_j as well as greater than t_j. If we define
then
If we write
and
then
These results are readily evaluated for the case considered by Rice (1944)
so that
where
Since
where stationarity is needed to get the second form. A and B can be thought of
as (possibly complex) random variables in the classical case and operators in the
quantum case.
We also consider the response of the variable B(t), governed by a Hamiltonian
K, to an infinitesimal force associated with A produced by changing the Hamiltonian from K to K + λA exp(+iωt). Here λ is an arbitrarily small number. The
notation used here is consistent with that used in Lax (1964QIII). If the average response changes from ⟨B(t)⟩ in the absence of the A force to ⟨B(t)⟩_A in the presence of the force, we can define a response, or transition, function T_BA(ω) by the
114 THE FLUCTUATION-DISSIPATION THEOREM
change due to the force
This response function T_BA(ω) can be computed for a given system. Some examples of T_BA(ω) can be found in Eq. (7.17) and Eq. (7.76). We shall establish in Section 7.3 that
where the ⟨· · ·⟩ denote an average over the stationary and possibly equilibrium ensemble present before the A force was applied. In the quantum case, an average of an arbitrary observable M is given by
where ρ is the equilibrium (Gibbs) density operator for Hamiltonian K, and β = 1/kT, where k is Boltzmann's constant and T is the absolute temperature.
In the classical case, ⟨M⟩ is simply an integral of M against the equilibrium distribution function. Our parenthesis-comma construct was defined in Lax (1964QIII) to be
where
In the classical limit, ℏn(ω) is replaced by kT/ω. Note that n appears, rather than n + 1/2, so that the zero-point contribution is absent with the present order of the operators.
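The classical limit ℏωn(ω) → kT quoted above is easy to verify numerically (the frequencies below are chosen arbitrarily for illustration):

```python
import numpy as np

hbar, k_B, T = 1.054571817e-34, 1.380649e-23, 300.0

def n_bose(omega):
    """Occupation number n(omega) = 1 / (exp(hbar*omega/kT) - 1)."""
    return 1.0 / np.expm1(hbar * omega / (k_B * T))

for omega in (1e9, 1e11, 1e13):
    x = hbar * omega / (k_B * T)
    ratio = hbar * omega * n_bose(omega) / (k_B * T)
    print(f"hbar*omega/kT = {x:.2e}   hbar*omega*n/kT = {ratio:.4f}")
```

For ℏω ≪ kT the ratio approaches unity, recovering equipartition; at ℏω comparable to kT the quantum suppression of the occupation number becomes visible.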
SUMMARY OF IDEAS AND RESULTS 115
The reason for the particular combination of T matrices in Eq. (7.7) is that this
combination is expressible as an integration over the complete time interval
With no further information about the operators A and B, the only relation between
the two terms in Eq. (7.1) is
where ε_{AB} = ±1. In that case we can specialize the time reversal relation of Eq. (10.5.23) of Lax (1974) by setting the external magnetic field to zero:
The order of these operators is relevant in the quantum case. It then follows that
and
Since ℏωn(ω) → kT, we obtain the correct classical limit for Johnson noise.
Again there is no zero-point contribution to this noise.
If we had followed the Ekstein-Rostoker (1955) antisymmetrized definition of
noise:
where
Here N is the number of systems in an ensemble, not the number of atoms. The average
over an ensemble is
where the sum is over the states J. P_J is the "probability" of a system being in the state J, i.e., the number of systems in the state J divided by the total number of systems, N. The states Ψ_J are normalized, but need not be orthogonal.
We now introduce an arbitrary (orthonormal) basis set φ_n. In terms of these φ_n's, the pure state wave function Ψ_J can be expanded as
has the correct matrix elements ρ_{nm} in any system of basis vectors.
where H is the total Hamiltonian of the system, where we have assumed, for simplicity, that the Hamiltonian H does not depend explicitly on the time. Using this
equation, Eq. (7.33) becomes
Taking the derivative of this equation with respect to t, the equation of motion of
the density operator is
or
and treat A as a weak perturbation, via the small parameter λ. This discussion follows that in Lax (1964QIII). If we set
then if λ were zero, ρ(t) would reduce to the constant value ρ(0) of Eq. (7.35).
Equation (7.40) transforms away the rapid motion associated with the unperturbed Hamiltonian, K. This is called a transformation to the interaction picture. The only
motion that remains is that induced by the perturbation added to the unperturbed
Hamiltonian. It should be no surprise, then, that if we substitute Eq. (7.40) into
Eq. (7.39) we obtain
is the operator A with the time dependence induced by the unperturbed Hamilto-
nian, sometimes referred to as the operator A in the interaction representation.
We assume that the system starts at equilibrium at the time t = — oo:
where Z is the partition sum calculated as a trace. In the classical case, this would
be an integral over all phase space.
the density operator evaluated to exactly the first order in λ. Finally, as K is time independent, we have
But
Thus
where stationarity is applied to obtain the second form from the first. The final
result is Eq. (7.3) as predicted.
For future reference, we note that the interchange of the names A and B in the
first form of Eq. (7.50) yields (with stationarity)
If we set u = —t,
The passage from Eq. (7.51) to Eq. (7.52) involves three minus signs: dt = —du,
the reversal of limits, and the reversal of the order of the operators. Finally, we can
combine Eqs. (7.50) and (7.52) to obtain the simpler form
which involves an integration over the complete region from negative to positive
infinity.
EQUILIBRIUM THEOREMS 121
7.4 Equilibrium theorems
When Planck's constant goes to zero, we reduce to classical physics, and the
operators commute. If we take the Fourier transform of this equation, and intro-
duce t + ih/3 as a new integration variable on the right hand side, we obtain the
corresponding theorem in the frequency domain:
The Fourier integral, on the left hand side of Eq. (7.56), yields a delta function factor δ(ℏω − (E_m − E_n)) which permits the replacement of exp[β(E_m − E_n)] by exp[βℏω] and the theorem follows with no assumption of analyticity.
In the classical limit, A(0) and B(t) commute. The factor exp[βℏω] is the price one must pay, in quantum mechanics, for switching the order of the operators.
This factor is identical to (and responsible for) the ratio of Stokes to anti-Stokes
radiation intensities.
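For example, ignoring prefactors, the exp(βℏω) factor gives the Stokes to anti-Stokes intensity ratio for a Raman line; the 520 cm⁻¹ silicon optical phonon at room temperature is an assumed illustrative case:

```python
import math

hbar = 1.054571817e-34
k_B = 1.380649e-23
c = 2.99792458e10      # cm/s, to convert a wavenumber to angular frequency

T = 300.0
nu_cm = 520.0          # Raman shift in cm^-1 (Si optical phonon; assumed example)
omega = 2.0 * math.pi * c * nu_cm

# exp(beta * hbar * omega): anti-Stokes suppressed relative to Stokes
ratio = math.exp(hbar * omega / (k_B * T))
print(f"I_Stokes / I_antiStokes ~ exp(hbar*omega/kT) = {ratio:.1f}")
```

At room temperature the anti-Stokes line of this mode is about an order of magnitude weaker than the Stokes line.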
For relating T_BA(ω) − T_AB(−ω), which represents the response function or dissipation, to the noise spectrum, we use
With the help of Eq. (7.56), and noticing that A = A(0), Eq. (7.53) can be written
By Eq. (7.1), the integral on the right is (1/2)G_BA. Using Eq. (7.8) we obtain the fluctuation-dissipation theorem in the form
where the left hand side follows the conventions of Section 4.7.
7.5 Hermiticity and time reversal
In the more general case, time reversal is assumed in the classical case by Onsager
(1931a, 1931b), and derived in the quantum case by Lax (1974).
where ε_A (ε_B) = ±1 according as A (or B) is even or odd under the barring operation, which combines a classical time reversal transformation with a Hermitian adjoint:
with a similar statement for B. Onsager stated Eq. (7.66) as a classical macroscopic
relation, without derivation. It was not clear what the order of the factors should
APPLICATION TO A HARMONIC OSCILLATOR 123
be. Lax (1974) derived this result for quantum systems described by a Hamilto-
nian even under time reversal. Our order of the operators is a consequence of that
derivation. We can immediately obtain
then
The above example applies to any case in which A and B have opposite parity
under time reversal. Application to a harmonic oscillator will be considered in the
next section.
The harmonic oscillator is an important example, since it can stand for an RLC
circuit, a mechanical circuit, or a mode of the electromagnetic field. The electric
circuit response is usually the response of a current to an applied voltage. In the
associated mechanical problem, the response, or mobility, is that of a velocity to
the applied force.
We can therefore use the result, Eq. (7.73), of the previous section by setting A to the displacement variable Q, with B = Ȧ = Q̇ a velocity; the applied potential energy is +λQ exp(+iωt), corresponding to the perturbation described in Section 7.1.
The equations of motion of an oscillator of position Q, momentum P, mass M, and resonant frequency ω₀ are given by
Combining Eqs. (7.74) and (7.75), and setting Q(t) = Q_m exp(+iωt), with the position amplitude Q_m, we have
Here, and in the discussion that follows, γ can be frequency dependent, but we shall avoid using the explicit form γ(ω) for simplicity. The velocity Q̇ is then expressible as
which is T_Q̇Q̇ in the form of Eq. (7.73) with A = Q̇ and B = Q̇. Applying the fluctuation-dissipation theorem, we get for the velocity noise
The notation follows another convention common in the literature. See footnote 24 of Lax (1966QIV). The dagger correlates with the first variable, since it carries the time t in Eq. (7.67).
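A numerical sketch of this oscillator response can make the shape of the velocity noise concrete. All parameter values below are assumed, and the two-sided classical convention S(ω) = 2kT Re T(ω) is taken for illustration:

```python
import numpy as np

M_osc, omega0, gamma = 1.0e-3, 2.0e3, 50.0   # kg, rad/s, 1/s (assumed values)
k_B, T_K = 1.380649e-23, 300.0

def mobility(omega):
    """T(omega) = i*omega / [M*(omega0^2 - omega^2 + i*gamma*omega)]."""
    return 1j * omega / (M_osc * (omega0**2 - omega**2 + 1j * gamma * omega))

omega = np.linspace(1.0, 4.0e3, 4000)
S_vv = 2.0 * k_B * T_K * mobility(omega).real   # classical velocity noise
print(omega[np.argmax(S_vv)])                   # Lorentzian peaked at omega0
```

The real (dissipative) part of the mobility, and hence the velocity noise, is a Lorentzian of width set by γ centered on the resonance ω₀.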
If we apply such relations as shown in Eq. (7.73), we can obtain position-position correlations. Noticing that T_QQ = Q_m/λ, we have
With this transform of variables, to a good approximation b(t) will contain terms predominantly of the form exp(−iωt) with positive ω, and b† will contain only frequencies of the opposite sign.
The inverses to Eqs. (7.85) and (7.86) are
where
The same ratio applies to the Fourier components, so that the power ratio is
The result is that these new Langevin forces have the astonishingly simple form
If the analysis is repeated with the operators in the opposite order, say ⟨Q(0)Q(t)⟩,
one obtains
where the added unity is the result of the commutation rules and yields, as
expected, the corrected Stokes-anti-Stokes ratio
The procedure used by Lax (1966QIV) to produce a Markovian model that can
be solved by a well established set of tools is
1. To make a rotating wave approximation in Eqs. (7.88)-(7.89) by omitting b† in the equation for db/dt and vice versa, and
2. To force the noise source to be white by evaluating it at ω = ω₀.
An exact solution can be made for each oscillator. For each oscillator j subject to a force A(t), which is an abbreviation for −a_j Q(t), the position and momentum (called briefly q and p) at time t are given in terms of the initial values q(t₀) and p(t₀) with u = t − t₀,
When each of the solutions for q_j and p_j are inserted into Eq. (7.97), the system momentum is found to obey
If we define
then
Then we have
whereas
The integrations over frequency in this section extend only over positive fre-
quency since each frequency ujj of an oscillator is, by convention, positive. The
relation between the Fourier components of K(u) or L(u), which describe transport and involve γ(ω), and the noise anticommutator, Eq. (7.110), which involves γ(ω) and E(ω), represents the standard fluctuation-dissipation theorem. Here we
have not given an abstract proof of the fluctuation-dissipation theorem for a gen-
eral system, but a demonstration of how it arises in a simple case of a harmonic
system, and a harmonic reservoir. If we were to replace the potential energy of
the harmonic oscillator by V(Q), the remaining analysis would remain unchanged except that Mω₀²Q in Eq. (7.101) would be replaced by dV(Q)/dQ.
If the coefficients a_j²/(m_j ω_j²), which are regarded as a function of frequency, are chosen to be constant, then γ(ω) = γ, and we get
and
8.1 Objectives
The first objective of this chapter is to reduce the determination of the behavior of a
Markovian random variable, a(t), to the solution of a partial differential equation
for the probability, P(a,t), that a(t) will assume the value a at time t. In this
way, a problem of stochastic processes has been reduced to a more conventional
mathematical problem, the solution of a partial differential equation.
But the spectrum of a random process requires that the Fourier transform be
taken of a two-time correlation. We therefore introduce a regression theorem that
states that for a Markovian process, the time dependence of a two-time correlation
of the form (a(t)a(O)) is the same as that of (a(t)). The motion of (a(t)) is just
the motion or transport of the variable a itself. Thus we relate the spectrum of
the noise to an understanding of the one-time transport or the mean motion of the
system. Onsager (1931a, 1931b) was the first to follow this procedure. He simply
stated it as an assumption within the context of a classical system near equilib-
rium. A proof, for the classical case is given in Section 8.4. A detailed discussion
of the quantum case is given in Lax (1968QXI), and a comparison between the
exact treatment of a harmonic oscillator, in the quantum case, with a Markovian
procedure introduced in Lax (1966QIV) is given by Ford and O'Connell (2000)
and Lax (2000).
However, the initial condition of ⟨a(t)a(0)⟩ is not the same as the initial condition for ⟨a(t)⟩. Thus we must obtain, in another way, information about the total fluctuation ⟨[a(0)]²⟩ at the initial time. For stationary processes, at equilibrium, these fluctuations can be determined from thermodynamical arguments. For
example, the mean square velocity of a molecule in three dimensions is determined
from equipartition to be
relating the diffusion constant D to the mobility A, we can then deduce the value of
the diffusion constant D in the equilibrium case. In the nonequilibrium case (v2)
is unknown, and Eq. (8.2) is used to obtain it from D and A which are determined
directly from the stochastic model used to describe the system. In particular, D
is determined by the second moments of the velocity jumps, and A is determined
directly from the first moments of the velocity jumps.
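The equipartition value ⟨v²⟩ = 3kT/m is easily evaluated; a nitrogen molecule at room temperature is an assumed illustrative case:

```python
import math

k_B = 1.380649e-23
T = 300.0
m = 28.0 * 1.66053907e-27     # mass of an N2 molecule, kg (assumed example)

v2_mean = 3.0 * k_B * T / m   # equipartition: <v^2> = 3kT/m
print(f"v_rms = {math.sqrt(v2_mean):.0f} m/s")
```

The resulting rms speed of roughly 500 m/s is of the order of the speed of sound in the gas, as one expects from kinetic theory.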
Explicit evaluation of the spectrum is carried out, in this chapter, for quasilinear
systems. The simplest example will be generation-recombination noise. Noise in
a system that cannot be approximated as quasilinear, such as a laser because it is
a self-sustained oscillator, is discussed in Chapter 11. In this case, an analytic (or
numerical) solution must be made of the Fokker-Planck equation for the one-time
motion of the system, and a regression theorem must be exploited to obtain the
spectrum.
We are then in a position to calculate the (two-time) noise spectrum by exploit-
ing a classical regression theorem which is established in Section 8.4. The form in
which that theorem is proved here is the statement that the equation obeyed by the
two-time (or conditional) distribution P(a(t), t | a(0), 0) obeys precisely the same differential equation in time as the one-time distribution, P(a(t), t). The proof is
based explicitly on the system being Markovian, with no assumption of equilibrium. Our classical regression theorem is not equivalent to the Onsager regression
hypothesis in two ways: (1) it is a theorem, with a proof, not a hypothesis; (2) it
does not assume that the fluctuations take place from equilibrium but can be from
a nonequilibrium stationary state.
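The content of the regression theorem can be illustrated by simulation. For an Ornstein-Uhlenbeck process da = −γa dt + √(2D) dW (an assumed model, with assumed parameter values), the two-time correlation ⟨a(t)a(0)⟩ decays at the same rate γ as the conditional mean ⟨a(t)⟩:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, D = 1.0, 0.5                    # decay rate and diffusion (assumed)
dt, nsteps, npaths = 1e-3, 2000, 20_000

a = rng.normal(0.0, np.sqrt(D / gamma), npaths)  # start in the stationary state
a0 = a.copy()
corr = []
for _ in range(nsteps):
    # Euler-Maruyama step for da = -gamma*a dt + sqrt(2D) dW
    a += -gamma * a * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=npaths)
    corr.append(float((a0 * a).mean()))

t = dt * np.arange(1, nsteps + 1)
# Regression: <a(t)a(0)> = <a^2> exp(-gamma t), same decay law as <a(t)>
print(corr[999], (D / gamma) * np.exp(-gamma * t[999]))
```

The simulated correlation at t = 1/γ matches the predicted (D/γ)e⁻¹, with the initial value fixed by the stationary fluctuation ⟨a²⟩ = D/γ rather than by the transport equation itself, as the text emphasizes.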
To what extent can we expect the dynamics of a system of random variables
to be Markovian, namely that the probability of future events is determined by a knowledge of the present? This is analogous to the question: to what extent can
we expect the future of a set of (nonrandom) variables to be deterministic, namely,
that their future values are completely determined by the initial conditions? The
answer to the latter question is yes if the set of variables is complete. But in any real
problem, one doesn't include all the variables in the whole universe. The scientist
must decide which are the relevant variables, and the rest can be discarded. In the
random case, an analogous choice must be made. If enough variables are used, we
would expect that the noise sources would be white. If they are not white, they may
have come from a filtering process via an intermediate system. If we add the latter
to our system, we can make the ultimate noise white, and Markovian methods
become available to solve the problem.
Although it is beyond the scope of this book, we will comment on the quantum
case. Because of the commutation rules, noise at positive and negative frequencies
cannot be equal. Their ratio is the familiar Stokes-anti-Stokes ratio exp(ℏω/kT) discussed in Section 7.6. However, when the damping rate γ is small compared to the frequency difference 2ω₀ between positive and negative frequencies it is
DRIFT VECTORS AND DIFFUSION COEFFICIENTS 131
possible to separate these two degrees of freedom by a rotating wave approxima-
tion. Then the noise can be treated as approximately white over the width of each
resonance. The correct Stokes-anti-Stokes ratio can still be maintained. The error
in this procedure is first order in γ/ω₀. For an optical laser, typical numbers are
γ = 10⁹ per second and ω = 10¹⁵ per second.
In spite of the negligible error in the application for which the approximation
was made, the Lax-Onsager procedure has been attacked by Ford and O'Connell
(1996, 2000). Although these authors recognize the often negligible errors, they
have repeatedly stated that there is no "quantum regression theorem", and this
is of course true. The initial statement, Lax (1963QII) on regression was based
on an approximate decoupling between system and reservoir. So clearly, it was
understood to be an approximation in the quantum case, not a theorem. Later
papers, clarifying the nature of the approximation, see Lax (1964QIII, 1966QIV),
were not mentioned in the initial draft of Ford and O'Connell (2000). To under-
stand why the approximate procedure worked in a variety of cases, Lax (1968QXI)
showed that it would work whenever the system was Markovian. Why this is true
in the classical case is shown in Section 8.4. The quantum mechanical proof in
Lax (1968QXI) merely showed that a system would obey a regression theorem if
it were Markovian, and vice versa. But an exact quantum treatment of a damped
harmonic oscillator was already used to show in Lax (1966QIV) that the ratio of
the noise at positive frequency to that at the corresponding negative frequency is
exp(ℏω/kT), usually called the Stokes-anti-Stokes ratio. Since this ratio is not
unity, the noise cannot be white, which would imply an exactly flat spectrum.
Ford and O'Connell (1996) wrote: "But the so-called quantum regression the-
orem appears in every modern textbook exposition of quantum optics and, so far
as we know, there are no flagrant errors in its application. How can it be that a
nonexistent theorem gives correct results?"
The answer, of course, is that many real systems are approximately Markovian.
That is why the method works. It is not necessary for the noise to be white over all
frequencies, but only over each resonance, where most of the energy resides.
A general random process (not necessarily a Markovian one) obeys the relation on
conditional probabilities, Eq. (2.15), which is rewritten in the form
at a at time t, remembering that one started the entire process at a₀ at time t₀. This
last bit of information is irrelevant if the process is Markovian. If this dependence
is dropped, Eq. (8.3) reduces to the Chapman-Kolmogorov condition, Eq. (2.17).
For many Markovian processes one can write (for a′ ≠ a)
where Δt is small and w_{a′a} is the transition probability per unit time, with the
normalization condition (including a′ = a),
This is in fact obeyed even for the Brownian motion process, which can be easily
proved from Eq. (8.7). Again, for a Markovian process the dependence on a₀ can
be omitted everywhere.
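The transition-rate bookkeeping can be checked on a toy example. The sketch below (a two-state process with made-up rates) builds P(a′, t + Δt | a, t) ≈ δ + wΔt, where the columns of the rate matrix W sum to zero so that probability is conserved, and verifies the Chapman-Kolmogorov composition numerically.

```python
import numpy as np

# Toy two-state process (made-up rates): W[a', a] plays the role of the
# transition probability per unit time; each column sums to zero, which is
# the normalization condition including the a' = a term.
w01, w10 = 0.3, 0.7                        # rates 0 -> 1 and 1 -> 0
W = np.array([[-w01,  w10],
              [ w01, -w10]])

dt = 1e-4
P_dt = np.eye(2) + W*dt                    # P(a', t+dt | a, t) ≈ delta + w*dt

def P(t):
    """Propagate by repeated short steps."""
    return np.linalg.matrix_power(P_dt, int(round(t/dt)))

# Chapman-Kolmogorov composition: P(t1 + t2) = P(t2) @ P(t1)
lhs = P(1.5)
rhs = P(0.9) @ P(0.6)
print(np.max(np.abs(lhs - rhs)))           # ~ 0
```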
Equation (8.9) is the central axiom we set for random processes. Random
variables differ from nonrandom variables in their smoothness with time. For
a nonrandom variable, (a′ − a)ⁿ is proportional to (Δt)ⁿ, because it varies smoothly
with time; hence lim[(a′ − a)ⁿ/Δt] = 0 for n > 1. For a random variable, the
nth moments for n > 1 are proportional to Δt because of the strong fluctuations.
It is customary to refer to D₁ as a drift vector, A, since
so that we expect to be able to show later that the mean motion of our random
variable a obeys
a diffusion constant in the variable a, in analogy with the Brownian motion result
that D = ⟨(Δx)²⟩/(2Δt) for a one-dimensional position variable. However, a
need not be a position variable. It could, for example, be an angular variable, or
the number of particles in a given state.
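The drift vector D₁ and diffusion coefficient D₂ can be estimated directly from a sample path. The following sketch uses an illustrative Langevin model, da = −λa dt + (2D)^{1/2} dW, with made-up parameters (not an equation from the text), and measures ⟨Δa⟩/Δt and ⟨(Δa)²⟩/(2Δt) conditional on a(t) near a chosen value.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative Langevin model (made-up parameters): da = -lam*a dt + sqrt(2D) dW.
# Estimate the drift D1(a) = <Δa>/Δt and diffusion D2(a) = <(Δa)^2>/(2Δt)
# from path increments that start near a chosen value of a.
lam, D, dt, n = 0.5, 0.2, 1e-3, 2_000_000
a = np.empty(n); a[0] = 0.0
noise = np.sqrt(2*D*dt)*rng.standard_normal(n-1)
for i in range(n-1):
    a[i+1] = a[i] - lam*a[i]*dt + noise[i]

da = np.diff(a)
sel = np.abs(a[:-1] - 1.0) < 0.05      # condition on a(t) ≈ 1
d1 = da[sel].mean()/dt
d2 = (da[sel]**2).mean()/(2*dt)
print(d1, d2)   # D1(1) ≈ -lam = -0.5, D2 ≈ D = 0.2
```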
If one does not impose a(t₀) = a₀, every random process obeys
which adds up all the ways of arriving at a′, weighting each with the probability of
its starting point. Thus it is also possible to define the set of diffusion constants
Equation (8.15) follows from Eq. (8.9) where all information for times earlier than
t is ignored, which is appropriate (only) for Markovian processes.
134 GENERALIZED FOKKER-PLANCK EQUATION
which adds up all the ways of arriving at a′, weighting each with the probability
of its starting point. The average motion of an arbitrary function M(a, t) may be
obtained by integrating M(a′, t) against P(a′, t + Δt). Thus, we should multiply
Eq. (8.16) by M(a′, t) and integrate over a′. On the right hand side of the equation,
we shall replace M(a′, t) by its Taylor expansion:
The integrals of (a′ − a)ⁿ over a′ give rise to the diffusion coefficients in Eq.
(8.8). Then integration in Eq. (8.16) over a at time t leads to
In obtaining ⟨M(a, t)⟩ in the second term on the right hand side of the above
equation, the normalization property of the transition probability is used:
Thus we obtain
AVERAGE MOTION OF A GENERAL RANDOM VARIABLE 135
where ⟨· · ·⟩ for any function M(a) is defined by
This formula (8.20) is valid in the sense of the expectation value for a general
random variable M(a, t). We remind readers of the two different symbols. The
symbol ⟨(Δa)ⁿ⟩_{a(t)=a} = ⟨[a(t + Δt) − a(t)]ⁿ⟩_{a(t)=a} in the definition of Dₙ(a, t)
in Eq. (8.4), where the value of a at t is first subtracted from the value at t + Δt and
an average is then made over the values of a at t + Δt subject to a fixed a(t) = a
at time t, involves only an integration over a(t + Δt). The symbol d⟨M(a)⟩ =
⟨M(a, t + Δt)⟩ − ⟨M(a, t)⟩, which represents the change of the expectation value of
M(a, t) with time, where the averages of M(a) at the two times t and t + Δt
are first made and then the difference is taken, involves two integrations, over a(t)
and a(t + Δt).
We shall illustrate its value by setting M(a) = a,
and for the variable M, we have d⟨M⟩/dt = ⟨A(M, t)⟩. When M(a) = a², we
have
where we have used A(a, t) = D₁(a, t), D(a, t) = D₂(a, t), ..., and ⟨...⟩ rep-
resents the average over a. The last two terms are written in Eq. (8.23) instead of
2⟨A(a, t)a⟩, as that form is needed for the multidimensional cases and for the
quantum case in physics.
The conditional expectation under the condition of a(t) = a is obtained by
setting
The first equation determines the operating point of the stationary state. The sec-
ond is the usual Einstein relation which relates the diffusion coefficient D to the
dissipative response contained in A.
If the noise is weak so that the fluctuations extend over a range in a which is
small compared to that over which A(a) and D(a) vary appreciably, it is permissi-
ble to make a quasilinear approximation. Let a_op represent the point at which the
drift vector vanishes:
and set
as the deviation from the operating point a_op. In the quasilinear approximation we
shall make the approximations
i.e., we retain first order deviations in the drift vector. Thus our drift, or transport
equation, simplifies to
In the steady state, the left hand sides of Eqs. (8.33) and (8.34) vanish and for the
single variable, classical case, our Einstein relation is then simply
THE GENERALIZED FOKKER-PLANCK EQUATION 137
which relates a diffusion constant D, a decay or dissipation constant A, and the
mean-square fluctuations ⟨a²⟩ in the steady state. Here ⟨a²⟩ plays the role of kT,
but thermal equilibrium has not been assumed as it was in Einstein's original work.
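This Einstein relation can be checked numerically on the simplest quasilinear model (made-up parameters; the steady state here is not assumed to be thermal equilibrium).

```python
import numpy as np

rng = np.random.default_rng(2)
# Quasilinear Langevin model (made-up parameters): da = -A a dt + sqrt(2D) dW.
# The Einstein relation predicts <a^2> = D/A in the steady state, with no
# assumption of thermal equilibrium.
A, D, dt, n = 1.0, 0.3, 1e-3, 1_000_000
s = np.sqrt(2*D*dt)
a, total, count, burn = 0.0, 0.0, 0, 100_000
for i in range(n):
    a += -A*a*dt + s*rng.standard_normal()
    if i >= burn:                  # discard the initial transient
        total += a*a; count += 1
msq = total/count
print(msq, D/A)                    # mean-square fluctuation ≈ D/A = 0.3
```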
In general, the equations for f(a), with f(a) = 1, a, a², do not form a closed
set. Therefore it is necessary to obtain an equation for the full probability distribu-
tion P(a, t) or P(a, t | a₀t₀), and not just the moments of those distributions; this
is presented in Section 8.4.
To obtain the equation of motion for P(a, t) we write Eq. (8.20) in the explicit
form, when M(a) does not explicitly depend on time,
Since this equation is to be valid for any choice of M(a), the coefficient of M(a) in
the above equations must vanish yielding the generalized Fokker-Planck equation:
The ordinary Fokker-Planck equation is the special case in which the series ter-
minates at n = 2. The ordinary Fokker-Planck equation, which can describe a
nonlinear Brownian motion, plays a special role because of the following theorem.
Proof
Define Aₙ = n!Dₙ. Then the Cauchy-Schwarz inequality takes the form
This argument is due to Pawula (1967). From Eq. (8.41) we see that
and
Thus if any An exists for n > 2, this string of inequalities guarantees the existence
of an infinite number of such coefficients. Such an infinite number corresponds to
an integral equation rather than a differential equation. Actually, Pawula proves
the converse. If one assumes the existence of a closed equation of order greater
than 2, then some finite order coefficients must vanish. By using the inequalities
in reverse, one can show that all coefficients above n = 2 would vanish.
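Since Eq. (8.41) itself is not reproduced in this extraction, the following is our reconstruction of the Cauchy-Schwarz step (all averages are conditional on a(t) = a):

```latex
% Cauchy-Schwarz applied to (\Delta a)^m and (\Delta a)^n,
% with \Delta a = a(t+\Delta t) - a(t):
\left[ \langle (\Delta a)^{m+n} \rangle \right]^2
  \le \langle (\Delta a)^{2m} \rangle \, \langle (\Delta a)^{2n} \rangle .
% Inserting \langle (\Delta a)^k \rangle = A_k\,\Delta t + O(\Delta t^2)
% with A_k = k!\,D_k, both sides are of order \Delta t^2, so
A_{m+n}^2 \le A_{2m}\, A_{2n}, \qquad m, n \ge 1 .
% For example A_3^2 \le A_2 A_4 and A_4^2 \le A_2 A_6; if some A_n with
% n > 2 is nonzero, the chain forces infinitely many A_k to be nonzero.
```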
The existence of a Fokker-Planck equation for a random process does not guar-
antee that the process is Markovian. If we start from Eq. (8.9) instead of (8.20) we
obtain the equation
In the Markovian case, Eq. (8.44) reduces to Eq. (8.38) since the dependence on
the earlier time t₀ can be discarded. Thus if (and only if) the process is Markovian,
P(a, t | a₀, t₀), a two-time object, obeys the same equation of motion as the one-
time object, P(a, t). In the Markovian case, then, we can calculate the conditional
probability, P(a, t | a₀, t₀), a two-time object, by solving the single-time equation,
Eq. (8.38), subject to the initial condition
and
thus we have
and we obtain
Here, a and y may both be regarded as operators but all y's must remain to the left
of all a's.
where ⟨A(n)⟩ is our average drift vector. Since the occupancy of state n is
increased by generation from the state n − 1 and reduced by generation out of the
state n, whereas it is increased by recombination out of state n + 1 and reduced by
recombination out of state n, the probability distribution function, P(n, t), obeys
the following generalization of Eq. (3.3)
The rth order diffusion constant, read off from the coefficient of the rth derivative
term, agrees with that found in Eq. (8.56). Compare with Eq. (8.38) and see Lax
(1966III) for a detailed discussion of the generalized Fokker-Planck equation. We
have
Thus all the even numbered diffusion constants are proportional to the sum of
the rate in plus the rate out, while all the odd numbered diffusion constants are
proportional to the difference, i.e., the rate in minus the rate out. In particular, the
second moment, D₂, which will be the moment of the noise source, obeys
These results are characteristic of shot noise. A Langevin approach to the gen-
eralized Fokker-Planck equation is presented in Lax (1966QIV) and discussed in
Chapters 9 and 10.
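A direct stochastic simulation (Gillespie's algorithm; the rates below are made up) illustrates these shot-noise results: with constant generation rate g and recombination rate rn, the operating point is n_op = g/r, and the Einstein relation with D₂ = (g + rn)/2 evaluated at n_op predicts Poisson fluctuations, ⟨(Δn)²⟩ = n_op.

```python
import numpy as np

rng = np.random.default_rng(3)
# Gillespie simulation of a generation-recombination process (made-up
# rates): constant generation rate g in, recombination rate r*n out.
g, r = 50.0, 1.0
n, t, t_end = 0, 0.0, 2000.0
occ, wts = [], []
while t < t_end:
    rate_in, rate_out = g, r*n
    total = rate_in + rate_out
    wait = rng.exponential(1.0/total)      # time to the next event
    occ.append(n); wts.append(wait)
    t += wait
    n += 1 if rng.random() < rate_in/total else -1

occ = np.asarray(occ, float); wts = np.asarray(wts)
mean = np.average(occ, weights=wts)         # time-weighted occupancy stats
var = np.average((occ - mean)**2, weights=wts)
print(mean, var)   # both ≈ n_op = g/r = 50, i.e. Fano factor ≈ 1
```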
In the quasilinear approximation, the operating point, n_op, and decay parame-
ter, A, are determined by
Then, using the Einstein relation, the fluctuation from the average value becomes
where we have tacitly assumed that t > 0. By the definition, Eq. (1.80), of
conditional probability
We have replaced t by \t\ here, but the justification is based on time reversal as
developed in Section 8.10.
Using Eq. (8.63) in the quasilinear approximation,
Thus the Poisson process of Section 3.1 is a special case of the generation-
recombination (shot noise) process, in which the noise arises because
THE CHARACTERISTIC FUNCTION 143
of the discreteness of the occupancy number n. An even more general case, in
which many states have occupancies, was given in Lax (1960I).
Let us summarize the procedure we use to obtain the spectrum of fluc-
tuations from the quasilinear stationary state. For a Markovian process, the
time-dependent decay (regression) of a correlation, ⟨Δn(t)Δn(0)⟩, is the same
as that of ⟨Δn(t)⟩_{Δn(0)}, the decay of the mean motion from a deviation. This is
the basis of the Onsager (1931a, 1931b) regression hypothesis for the equilibrium
state, but it is proposed as a theorem by Lax (1968QXI) for Markovian systems
with no assumption regarding equilibrium. Of course classical systems can be
exactly Markovian, whereas quantum systems can only be approximately Marko-
vian. For the classical physics case, this proposed theorem is a consequence of
the definition of conditional probability, Eq. (8.65). Thus the conditional average
of An(t) obeys the same time dependent equation of motion as the unconditional
average. See Lax (1960I). In this way, the frequency dependence of the noise is
determined by the Fourier transform of the mean motion - the transport. The nor-
malization of the noise is determined by its total, ⟨[Δn(0)]²⟩, and the latter can
be calculated via the Einstein relation. Thus ⟨[Δn(0)]²⟩ is determined by D and
A, and D must be calculated directly from the nature of the random process. In
the equilibrium case, the procedure is the reverse. The total fluctuations are deter-
mined by the Gibbs distribution in classical mechanics, or the associated density
matrix. The Einstein relations can then be used in the reverse direction to calculate
the diffusion constant.
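The procedure just summarized can be sketched in a few lines (illustrative numbers): the Fourier transform of the mean motion gives a Lorentzian spectrum, and integrating it recovers the total fluctuation ⟨a²⟩ = D/A fixed by the Einstein relation.

```python
import numpy as np

# Spectrum of a quasilinear Markovian process: the correlation
# <a(t)a(0)> = <a^2> exp(-A|t|) Fourier-transforms to the Lorentzian
#   S(omega) = 2A<a^2>/(A^2 + omega^2) = 2D/(A^2 + omega^2),
# where the normalization <a^2> = D/A comes from the Einstein relation.
# (Illustrative numbers; A sets the transport, D the noise strength.)
A, D = 2.0, 0.5
omega, domega = np.linspace(-4000.0, 4000.0, 4_000_001, retstep=True)
S = 2.0*D/(A**2 + omega**2)

total = S.sum()*domega/(2.0*np.pi)   # integral of S over d(omega)/(2 pi)
print(total, D/A)                    # total noise recovers <a^2> = D/A
```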
We found earlier that the easiest way to obtain solutions to the Poisson process is
to solve for its characteristic function. This suggests examining the characteristic
function of the generalized Fokker-Planck process.
The characteristic function, φ(y, t), of a normalized probability density P was
defined in Section 1.5 as
The moments ⟨aⁿ⟩ are determined by the nth order derivatives of φ(y, t) with
respect to y at y = 0, since from the above equation
Since
thus
where
Together with
Thus we have forms, Eqs. (8.81) and (8.80), analogous to the quantum mechanical
space and momentum representations, respectively. The best form to use, as in
quantum mechanics, depends on the particular problem.
Although our diffusion coefficients are originally defined in terms of ordinary
moments, i.e., Eq. (8.14)
since the lower moments to be subtracted off, i.e., the unlinked parts, contain the
product of at least two averages of the form ⟨[a(t + Δt) − a(t)]^l⟩ with l ≥ 1,
and the product is at least quadratic in Δt, so it can be discarded. The cumulants
have an earlier history: Thiele (1903) called them the semi-invariants (see
Section 1.6), but the use of the linked-moment notation (L subscripts) in quantum
problems is due to Kubo (1962).
In both Eq. (8.82) and Eq. (8.83) averages are taken subject to the initial con-
dition a(t) = a. With the notation Δa = a(t + Δt) − a(t), Eq. (8.83) can be used
to rewrite Eq. (8.48) for L(y, a, t) in the elegant form
As an example, let us consider the shot noise case. We had found the diffusion
coefficients, Dₙ, shown in Eq. (8.71), to be
or
i.e., all the linked moments are the same and equal to the number of events
expected, ⟨a⟩, or νt as shown in Eq. (3.16).
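This equality of all cumulants is easy to verify by sampling (a numerical sketch with a made-up expected count):

```python
import numpy as np

rng = np.random.default_rng(4)
# For shot noise all linked moments (cumulants) are equal to the expected
# number of events nu*t.  A Poisson sample (made-up nu*t) shows this:
# kappa_1 (mean), kappa_2 (variance), and kappa_3 (third central moment)
# all come out equal to nu*t.
nu_t = 3.0
x = rng.poisson(nu_t, 2_000_000).astype(float)
k1 = x.mean()
k2 = x.var()
k3 = ((x - k1)**3).mean()
print(k1, k2, k3)   # all ≈ 3.0
```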
8.7 Path integral average
All our previous work has dealt with averages taken at one time or at most, taken
at two different times. We shall now look at a truly multitime function.
A number of important problems in the theory of random processes can be
reduced to an expectation value of the form (see, for example, Lax 1966III; Deutch
1962; Middleton 1960; Stratonovich 1963),
This is evaluated for real A by Lax and Zwanziger (1973) and the inverse transform,
an ill-posed problem, is obtained using a Laguerre expansion procedure.
The most important object of attention is the generalized characteristic func-
tional F[· · ·] of a(s)
which becomes
We note that we do not integrate over da₀ in the above equation since M₀ is the
average under condition a(t₀) = a₀. To calculate the total average, M, we would
set
For a Markovian process, using the factorization of probabilities, Eq. (2.11),
Eq. (8.95) becomes
This equation is essentially the equation of a Feynman and Wiener path integral;
the relation between Feynman (1948) and Wiener (1926) path integrals is dis-
cussed by Montroll (1952). We now define
We may regard P̄(a_{j+1} | a_j) as the transition probability of a new Markovian pro-
cess. It obeys the usual properties of transition probabilities except for a change in
the normalization
to first order in At. If we regard P(a, t) as a density of systems, then Q(a, t) can
be regarded as the rate at which these systems disappear. The higher moments of
P̄(a′ | a) are the same as those of P(a′ | a),
which is Eq. (8.81) with an added loss term. Thus on the right hand side we
have the usual Fokker-Planck operator, plus a loss term. Nowhere in our previ-
ous discussion have we made use of the normalization of the probabilities except
148 GENERALIZED FOKKER-PLANCK EQUATION
Since P̄ is not normalized and is not an ordinary Markovian process, the use of
the Chapman-Kolmogorov condition requires some justification. We can provide
an intuitive proof as follows. The term in Q(a) represents a rate of disappearance
from which there is no return. If we simply add a new discrete state which holds
all the escaped probability, then P̄ will again be a normalized Markovian process,
since no memory has been added. Equation (8.102) then describes a composition
of probabilities in which the system ("particle") passes from a₀ to aₙ having sur-
vived, i.e., not disappeared or escaped. This is accomplished by passing through
the intermediate states a₁, a₂, . . . , a_{n−1}, having survived at each step. As a formal
proof of the Chapman-Kolmogorov condition we note that the P̄ process differs
from an ordinary Markovian process only in having
increased by an amount Q(a). But these transition probabilities have been shown
to obey the Chapman-Kolmogorov condition in Section 2.4 without specifying
r(a). The P̄ process is already included in that proof. Thus Eq. (8.97) becomes
which just describes a decay in the normalization of the P̄'s associated with taking
the mean of the exponential, Eq. (8.91). We note, however, that Q may be positive
or negative.
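The intuitive proof can be made concrete with a small sub-stochastic transition matrix (entries made up): appending one absorbing state that collects the escaped probability restores a normalized Markov chain, and the unnormalized matrix still satisfies the Chapman-Kolmogorov composition.

```python
import numpy as np

# Sub-stochastic transition matrix (made-up entries): each column sums to
# less than one, the deficit Q(a)*dt being the probability of escaping with
# no return.  Appending one absorbing state that collects the escaped
# probability restores a normalized Markov chain.
Pbar = np.array([[0.60, 0.20],
                 [0.25, 0.55]])
loss = 1.0 - Pbar.sum(axis=0)              # escape probability per step

P = np.zeros((3, 3))                       # [system states | absorbing state]
P[:2, :2] = Pbar
P[2, :2] = loss
P[2, 2] = 1.0
assert np.allclose(P.sum(axis=0), 1.0)     # properly normalized again

# The unnormalized process still composes (Chapman-Kolmogorov):
lhs = np.linalg.matrix_power(Pbar, 5)
rhs = np.linalg.matrix_power(Pbar, 3) @ np.linalg.matrix_power(Pbar, 2)
print(np.max(np.abs(lhs - rhs)))           # ~ 0
# The surviving-path probabilities embed unchanged in the augmented chain:
assert np.allclose(np.linalg.matrix_power(P, 5)[:2, :2], lhs)
```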
Associated with P̄ we define a characteristic function, φ̄, which is the Fourier
transform of P̄, just as φ was the Fourier transform of P, i.e., we define
LINEAR DAMPING AND HOMOGENEOUS NOISE 149
Comparing Eq. (8.107) with Eq. (8.105) we see that
We conclude that if we can solve the equation for the characteristic function φ̄, we
can evaluate the path integral Eq. (8.91). In a manner similar to the derivation of
Eq. (8.101), we find the equation of motion of the characteristic function φ̄ to be
where a = −i∂/∂y. Thus, as in the equation of motion for P̄, we obtain an extra
term in the equation of motion for φ̄, compared with the equation of motion for φ,
Eq. (8.80).
We now specialize these results to an easy case for which explicit answers can
be obtained: the case of linear damping and homogeneous noise. In our present
language, linear damping means
i.e., our Dₙ(a)'s for n ≥ 2 are independent of a, but could be functions of time.
Actually, A can also be a function of time, but to simplify the equations, we
temporarily stick to a constant A.
In this case the operator L becomes
where the complete dependence on a is contained in the Aa term, and the noise
contained in K(y, t) is homogeneous, or independent of a:
Note that terms linear in y have been separated, and K(y, t) contains all terms
quadratic and higher in y. We can then solve for P̄ and φ̄ by using Eq. (8.101) and
(8.109). We are interested in the form
Since
Since a only appears linearly in L, only first derivatives with respect to y appear
in the partial differential equation (PDE). If we had K = 0, the method of
characteristics of Appendix 8.A, which applies to PDEs that involve only first
derivatives, would become applicable. The method of characteristics yields the equation
of characteristics
(We write yA rather than Ay since this is the appropriate order when y is a vector
y and A is a matrix, the case discussed in Lax (1966III), Section 7F.) The solution
can be written
where
is the special solution appropriate to y(0) = 0, and the first term is a solution of
the homogeneous equation. To conform with the notation in Eq. (8.206) we note
that y(0) = Y. If K = 0, the exact solution of the PDE is then
where Y(y, t) is the inverse of y(Y, t), the solution of y(t) in terms of its initial
value Y, and φ̄(y, 0) has the form:
which is the prescribed initial condition, since P̄(a, t | a₀, 0) approaches δ(a − a₀)
and φ̄(y, 0 | a₀, 0) is defined by Eq. (8.107).
We can deal with the case K(y, t) ≠ 0 by introducing the new variable z by
the transformation
We note that the q(s) function has been completely arbitrary in our solution of
this problem. We have desired this arbitrariness in order to make a comparison in
Section 9.5 with a corresponding Langevin problem. Equality for general q(s) will
guarantee the full equivalence of these problems.
We then take the Taylor expansion of P(a, t | a′, t₀) about a₀:
EXTENSION TO MANY VARIABLES 153
and using Eqs. (8.99) and (8.100) obtain the backward equation (Lax 1966III,
Eq. 5.21).
Thus we have obtained an equation of motion for our final result, the path integral
itself. Such a procedure that obtains an equation directly for the quantity of interest
is referred to as "invariant embedding" by Bellman (1964).
is a set of N variables. Thus the reader may replace in his mind each a(t) by the
same symbol in boldface.
We shall not restate all the results in the multidimensional case, but only indi-
cate a few where it is worthwhile to display the subscripts explicitly. All results
were stated in multidimensional form in Lax (1966III). For example, the equation
of motion of a general operator can be written
where we have also generalized to include the loss term Q so as to be able to
evaluate path integrals later. The diffusion coefficients themselves are defined by
where
For a shot noise process, because all the cumulants are equal, all terms of order
(Δa)ⁿ yield a contribution of first order in Δt. Thus we must write
and retain terms to all orders for shot noise. A subsequent average over a yields
our previous equation of motion, Eq. (8.135). For a Brownian motion process Δa_μ
contains terms of order (Δt)^{1/2}. See the Brownian motion results, Eqs. (3.27) and
(3.41). Since Dₙ = 0 for n > 2, only terms in Δa_μ to second order are needed for
Brownian motion problems.
Now, we discuss the noise spectrum in the case of multiple variables. When the
fluctuations are small, it is appropriate to introduce the deviations
If the elements a_μ (or α_μ) are real (Hermitian) then so is the matrix A. For simplic-
ity, we have specialized here to the case Q = 0. The matrix A is not random and,
for fluctuations from a steady state, is time-independent, although, in general, A
can be time dependent.
The complex noise tensor can be written
or when components are exhibited and the process is stationary,
where the dagger denotes the complex conjugate for classical processes and the
Hermitian adjoint for quantum random processes. Generally, one measures the
noise in a composite variable
But Eq. (8.145) is valid only for t > 0, and the integral in Eq. (8.148) extends from
−∞ to ∞. We can overcome this problem by splitting the second integral in Eq.
(8.148) into two regions: positive and negative t. In the second region, we replace
t by −t and use stationarity to simplify the result. Thus
where
Note that we have succeeded, in Eq. (8.154), in expressing the second term in terms
of positive times. This is needed since Markovian techniques normally express the
future (positive times) in terms of the present. Whether the variables a are real
(Hermitian) or not, the second integral is related to the first by
The positive and negative time components of the noise can thus be written briefly
where Ã denotes the transpose of A. As in the single variable case, the spectrum
of the noise is determined by the transport problem, as embodied in A, and the
magnitude of the noise, ⟨α†α⟩, which will be determined with the help of the
Einstein relation. For this purpose, we take the second moment equation of Eq.
(8.138) and write it in terms of α instead of a:
simplifies to
or
where the indices 1, 2, and 3 constitute the left-to-right ordering of the symbols,
regardless of how they are written down. The formal solution:
is not explicit, since it gives no indication of how the expression can be written in
the desired order. We can start disentangling by first writing the previous result as
an integral representation:
This expression can then be written with the operators correctly ordered from left
to right in A, B, C order. Then the ordering subscripts are no longer needed. Thus
we arrive at the completely disentangled form:
The example in Eq. (8.170) shows that volume fluctuations, and hence concentra-
tion fluctuations, are proportional to the compressibility.
In the equilibrium case, the Einstein relation is used not to determine ⟨α†α⟩ but
to determine D. In quantum applications it is convenient to determine D_{μν} directly
using one expression in Eq. (8.138) with Q = 0, which is called the "generalized
Einstein relation":
to remind us that D_{μν}, in Eq. (8.171), represents the extent to which the law of
differentiating a product is violated. In quantum applications, we will consider a
system in interaction with a reservoir and, by elimination of the reservoir, obtain an
effective equation for the density operator of the system. Such an equation permits
one to calculate the motion of all operators: a_μ, a_ν, and a_μa_ν, and hence D_{μν}. We
use the phrase "generalized Einstein relation" because it is not restricted to the
stationary state.
Another concept which generalizes readily to the multivariable case is the
linked-average. With
as in Eq. (1.51).
One of the most general questions one can ask about a set of random variables
a(s) is the multitime characteristic function. We have an explicit expression for
the characteristic function for the case of linear damping plus homogeneous noise.
The multidimensional analogue of Eq. (8.126) is
When time reversal is obeyed, Eq. (7.66), with A(t) = α_i(t) and B(0) = α_j(0),
yields the condition
valid even in the non-Markovian case. The order of random variables has been
chosen so that the time reversal condition remains valid even in the quantum case.
As applied to our linear response, Eqs. (8.151), (8.157), we get
At t = 0 this specializes to
In the classical limit, α_i and α_j commute. Thus if one variable is even (say α_i)
and the other is odd (say β_j, using β rather than α to make the oddness visible) we
have
(classical only), even in the non-Markovian case. Thus the variables odd under
time reversal do not correlate with the variables even under time reversal.
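This decorrelation of even and odd variables can be observed in a simulation. The sketch below (an underdamped Langevin oscillator with made-up parameters) has position x even and velocity v odd under time reversal, so the stationary cross-correlation ⟨xv⟩ should vanish within sampling error.

```python
import numpy as np

rng = np.random.default_rng(5)
# Underdamped Langevin oscillator (made-up parameters): x is even and
# v = dx/dt is odd under time reversal, so in the stationary state the
# classical cross-correlation <x v> should vanish.
k, gamma, D, dt, n = 1.0, 0.5, 0.25, 1e-3, 2_000_000
s = np.sqrt(2*D*dt)
x = v = 0.0
xs = np.empty(n); vs = np.empty(n)
for i in range(n):
    x += v*dt                                   # position update
    v += (-k*x - gamma*v)*dt + s*rng.standard_normal()
    xs[i] = x; vs[i] = v

burn = 200_000                                  # discard the transient
xv = np.mean(xs[burn:]*vs[burn:])
print(xv)   # ≈ 0 within sampling error
```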
TIME REVERSAL IN THE LINEAR CASE 161
If we equate terms linear in t in Eq. (8.178) we obtain
In the classical case, when both variables are even under time reversal, this
simplifies to
An example for the inertial systems, containing even and odd variables, will be
discussed in more detail in Section 9.7.
Time reversal leads to a paradox. If a(t) is a set of even variables then β(t) =
da(t)/dt is a set of odd variables under time reversal. Thus
For t < 0, take the Hermitian adjoint and use this result again
In view of Eq. (8.178), these expressions are in fact equivalent provided that the
absolute value sign is used. The derivative of Eqs. (8.186)-(8.187) with respect to
t at t = 0 then displays a cusp or discontinuity in slope:
In view of the time reversal condition, Eq. (8.181), these slopes are, in fact, equal
and opposite in sign. This situation is displayed in Fig. 8.1.
Perhaps the best way to understand this dilemma is that the cusp is correct for a
truly Markovian system. However, the Markovian approximation may not be valid
for exceedingly small times. And the slope will be continuous for small times, and
hence vanish there. This is what would be expected for the occupancy of an excited
state of an atom as a function of time. However, a deviation of the excited state
occupancy from a simple exponential decay has, so far, not been observed.
Time reversal symmetry of correlation
FIG. 8.1. The true regression of a fluctuation (solid curve) is compared to the
Markovian approximation (dashed curve). For t > τ_d, the decay is exp(−t/τ),
where τ is a typical relaxation time and τ_d is the duration of a collision, or the
forgetting time of the system. This figure is from Lax (1960I).
Theorem
A Gaussian process is necessarily linear. By linearity, we mean
Proof
The conditional probability
where all the dependence on {a_j} is shown. If one calculates the mean of a, e.g.,
by finding the maximum exponent, or by completing the square, one finds
The process is nonstationary if the T_j and a are time dependent, but it remains
linear. The proof also applies to a set of random variables a(t) conditional on
{a(t_j)}. Then a and the T_j are possibly time dependent matrices.
Doob's theorem
A random process that is stationary, Gaussian, and Markovian possesses an
exponential autocorrelation
Doob (1942) stated this theorem for a one-dimensional random process. Kac
extended it to the multidimensional case as discussed in Appendix II of Wang
and Uhlenbeck (1945).
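Doob's theorem can be illustrated numerically. The Ornstein-Uhlenbeck process is the standard stationary, Gaussian, Markovian process; using its exact one-step update (parameters made up), the sampled autocorrelation should follow exp(−A|t|).

```python
import numpy as np

rng = np.random.default_rng(6)
# Exact Ornstein-Uhlenbeck update with unit stationary variance (A made
# up): the chain is exactly stationary, Gaussian, and Markovian, so Doob's
# theorem predicts an exponential autocorrelation exp(-A|t|).
A, dt, n = 1.0, 0.05, 1_000_000
rho = np.exp(-A*dt)
innov = np.sqrt(1.0 - rho**2)*rng.standard_normal(n)
a = np.empty(n)
a[0] = rng.standard_normal()               # draw from the stationary law
for i in range(1, n):
    a[i] = rho*a[i-1] + innov[i]

for lag_t in (0.5, 1.0, 2.0):
    lag = int(lag_t/dt)
    c = np.mean(a[:-lag]*a[lag:])
    print(lag_t, c, np.exp(-A*lag_t))      # c ≈ exp(-A t)
```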
A number of people have asked how I got into the study of noise. Blame it on
the editor of the Physical Review in the late 1950s. I had written a paper entitled
"Generalized Mobilities" in which I derived the formula usually called the Kubo
formula. That formula was shown to obey the fluctuation dissipation theorem. As a
result the editor assumed that I knew something about noise and promptly started to
send me a series of papers on noise in semiconductor transistors, diodes and other
devices. These papers were written by good experimentalists who observed noise
and felt obliged to explain it. Since general methods of dealing with noise were
limited, I saw many ad hoc explanations that neither I nor the authors understood.
In self-defense I tried to learn what aspects were shared by most semiconductor
devices. I decided that the relevant feature was that the noise consisted of small
fluctuations about an operating point. The operating point was typically a steady
state, but since current was drawn it was not an equilibrium state. So Lax (1960I),
"Fluctuations from the Nonequilibrium Steady State" was born. The lesson learned
was that if the transport theory (time dependent average motion) was understood,
the noise spectrum was readily determined, since the two-time decay or transport
would be the same as the regression or one-time motion. But the normalization
must be determined separately. And the latter was fixed by the Einstein relation.
For the equilibrium case, the integrated noise can be expressed by thermodynamic
formulas. In the nonequilibrium case, one cannot call on thermodynamics, but
can call on the second moments, or the diffusion coefficients that described the
underlying fluctuations.
The framework was later generalized to stronger noise and nonlinear fluctua-
tions. For the Markovian case, this led to the conventional Fokker-Planck equation
discussed in Lax (1966III).
Finally, I decided to apply these methods to the problem of noise in lasers. This
applied the Langevin approach discussed in Lax (1960I) and Lax (1966IV) to a
quantum system, thereby creating a "Quantum Theory of Noise Sources" in Lax
(1966QIV).
Quasilinear methods become completely inappropriate since we are dealing
with a self-sustained oscillator. However, classical self-sustained oscillators exist,
and the Fokker-Planck equation for a "rotating wave van der Pol oscillator" was
solved exactly, but numerically by Hempstead and Lax (1967CVI) and Risken
and Vollmer (1967), as will be seen in Chapter 11. Even here, in most cases, the
fluctuations are small and a quasilinear treatment is satisfactory.
The Einstein relation between fluctuations and mobility was extended by
Lax (1960I) to the nonequilibrium case. A treatment by Reggiani, Lugli, and
Mitin (1988) considered the fluctuation-dissipation relation in the strong-field
case in a semiconductor.
The problem is to solve a first order partial differential equation (PDE) of the form
Suppose that we have found a solution of the partial differential equation of the
form
Then
APPENDIX A: A METHOD OF SOLUTION OF FIRST ORDER PDES 165
By eliminating dy and dz, using Eq. (8.196) we get
Thus the integral of the "characteristic equations" (8.196) is a solution of the partial
differential equation (8.195).
If we regard x and y as independent variables and z = z(x,y) then we have
Take P times the first equation (8.198) plus Q times the second equation (8.199)
plus R times du/dz to get
or
Thus the solution of the characteristic equations also satisfies the inhomogeneous
partial differential equation (8.201).
Note: If u(x, y, z) = a and v(x, y, z) = b are two solutions of the characteristic
equations then an arbitrary function φ(u, v) is a solution of the PDEs (8.195) and
(8.201).
The above procedure can be extended to any number of independent variables.
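The relations the appendix refers to can be summarized as follows; this is a hedged reconstruction of the standard method-of-characteristics formulas, using the symbols P, Q, R named in the text and the equation numbers cited there:

```latex
% Hedged reconstruction of the standard method of characteristics
% (symbols P, Q, R as in the text; equation numbers are those cited there).
% Homogeneous first order PDE, Eq. (8.195):
P\,\frac{\partial u}{\partial x} + Q\,\frac{\partial u}{\partial y}
  + R\,\frac{\partial u}{\partial z} = 0 ,
% with characteristic equations, Eq. (8.196):
\frac{dx}{P} = \frac{dy}{Q} = \frac{dz}{R} ,
% and, regarding z = z(x, y), the inhomogeneous form, Eq. (8.201):
P\,\frac{\partial z}{\partial x} + Q\,\frac{\partial z}{\partial y} = R .
```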
It is convenient then to note that
so that
represents a conservation of density along the motion, that is, Liouville's theorem.
Thus if particles move according to the dynamical equations, Eq. (8.204),
then u(x(t), y(t), z(t)) will be a constant of the motion as discussed by Goursat
(1917).
In particular, if the dynamical equations possess a solution
x = x(X, Y, Z, t);   y = y(X, Y, Z, t);   z = z(X, Y, Z, t),    (8.206)
where X, Y, and Z are the initial values of x, y, and z at t = t_0, then [X, Y, Z] can
be thought of as the name of a particle which remains fixed as its position [x, y, z]
changes. If we solve for the name, or material variable, X, Y, Z in terms
of the spatial variables:
or
then the motion x = [x(t), y(t), z(t)] is such that X = X(x(t), y(t), z(t)) is fixed
at the value X, etc. Thus X, Y, and Z are three constants of the motion, and any
function
This result shows that a point source remains a point as it moves although the nor-
malization integral may change. The first delta function reminds us of the meaning
of X(x, t) as the inverse of x(X, t), but the notation δ(X(x, t) − X) reminds us
that in the integration over X only the term after the minus sign is actually an
integration variable.
The solution of the first order Fokker-Planck equation
where
We wrote the solution down by guessing that it was the appropriate solution in
which a "point" remains a point for all time (this simplicity will disappear when
diffusion is added).
We may verify the solution by evaluating dP/dt:
Since the other factors are functions of X, the d/dxi can be pulled all the way to
the left
But X is the material variable or name of the particle, so that [dx_i/dt]_X is in fact
the material time derivative (the one which follows the particle), i.e.,
and because of the presence of the delta function, we can replace A_j(x(X, t)) by
A_j(x) to obtain
where the last step uses the original ansatz Eq. (8.212).
9
Langevin processes
The autocorrelation of the noise source, F(t), defined in Eq. (9.1), is then given
by (in a form appropriate to the complex case)
where we have replaced a†A† by A*a†. The ← sign reminds us that ∂/∂u acts to
the left on ⟨a†(t)a(u)⟩.
Using a shifting theorem of the form
we have
If we insert Eq. (9.3) into Eq. (9.6), and use Eq. (9.8), we get
where the second equality follows from the Einstein relation, Eq. (8.163).
An interesting consequence of the above results is the theorem written in
component form.
Theorem
Products of random variables and random forces for Markovian processes:
Equation (9.12) is the statement that for a Markovian process the Langevin force
has no memory.
Proof
Suppose that we start at some time t_0 < s, t_0 < t with a specified initial value
a(t_0) at t = t_0. Then Eq. (9.1), a linear first order equation, with constant
HOMOGENEOUS NOISE WITH LINEAR DAMPING 171
coefficients, has the usual solution
where the left hand side could be labeled with a subscript a(t_0) to remind us of the
initial condition. If we multiply by F(t) on the right and take an ensemble average,
the first term vanishes since a(t_0) is fixed and ⟨F(t)⟩ = 0. With the help of Eq.
(9.11) the second term yields
When t > s, the delta function is not included in the region of integration, and
Eq. (9.12) results. When t = s, only half of the delta function is integrated over,
and this establishes Eq. (9.14). The three results can be combined into a single
expression
provided that the Heaviside unit function H(s) takes the value 1/2 at s = t. The
subscript on the average reminds us that it is conditional on a given value of a(t_0)
at t = t_0. An additional average over a(t_0) permits us, also, to drop the initial
condition, yielding Eqs. (9.12)-(9.14) with the constraint removed on the starting
value.
That a random force for a Markovian system does not depend on random variable
values at earlier times seems intuitively clear, but a proof is needed. Indeed,
our proof, and our entire discussion, so far, has been limited to linear, stationary
processes.
The factor of 2 reduction in Eq. (9.14) from integrating over half a delta func-
tion is consistent with our view that in the nonideal world, the correlation of the
forces is not a delta function, but a sharp even function. Strictly speaking it is not
a function at all, but a sequence of such even functions whose width approaches
zero.
For a large class of noise problems, it is appropriate to treat the noise as weak
and make a quasilinear approximation about the operating point. The purpose
of Lax (1960I) was to show that the noise and correlations in such a system
can be obtained in three steps: (1) the time dependence of correlations such as
⟨a†(t)a(0)⟩ obeys the same equations as that for ⟨a†(t)⟩, so that the average
"transport" or "relaxation" equations determine the correlations, and hence the
frequency dependence of the noise; (2) the normalization is determined by the
single-time correlations ⟨a†a⟩; (3) for a nonequilibrium system, the steady state
values ⟨a†a⟩ must be determined by solving the Einstein relations of Eq. (8.163):
For N variables, this is a set of N² equations. One major benefit of the Langevin
approach is that one can avoid the solution of Eq. (9.18) and never solve any system
of more than N equations.
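The N² linear equations can, of course, also be solved directly on a computer. The following sketch (my illustration; the 2×2 matrices are arbitrary) assumes the quasilinear form da/dt = −Aa + F(t) with ⟨F(t)Fᵀ(s)⟩ = 2Dδ(t − s), for which the stationary Einstein relation takes the Lyapunov form AΣ + ΣAᵀ = 2D with Σ = ⟨aaᵀ⟩:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Sketch: solve the stationary Einstein relation  A @ Sigma + Sigma @ A.T = 2*D
# for the covariance Sigma = <a a^T>, assuming the quasilinear Langevin form
# da/dt = -A a + F with <F(t) F(s)^T> = 2 D delta(t - s).  A, D are illustrative.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])          # decay matrix (stable: eigenvalues 2 and 1)
D = np.array([[1.0, 0.2],
              [0.2, 0.5]])          # diffusion matrix (symmetric)

# scipy solves  A X + X A^H = Q, which is exactly this N^2-equation linear system.
Sigma = solve_continuous_lyapunov(A, 2.0 * D)

residual = A @ Sigma + Sigma @ A.T - 2.0 * D
print(Sigma)
print(np.abs(residual).max())       # essentially zero: the relation is satisfied
```

For N variables this is one call on an N×N system, though as the text notes the Langevin route lets one bypass it entirely.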
We shall obtain our principal result using the Langevin approach in the heuristic
manner described in Lax (1966IV), Section 1. The noise correlation, Eq. (4.36),
can be written
Using the Langevin approach, Eq. (9.1) leads to the associated equation
where Ã is the transpose of A. And Eq. (9.2) translates into the adjoint of Eq.
(9.21)
With the understanding that, even though the Fourier transforms are not well-
defined mathematical objects, products of two are, in the sense that
CONDITIONAL CORRELATIONS 173
This notation is consistent with that in Section 4.7, Eq. (4.50). Equation (9.19) can
then be written as
where D is simply the diffusion matrix associated with a due to the Langevin force
F(t) from Eq. (9.11). Thus an explicit answer for the noise in the variables a_μ, a_ν
is
The integrals contained in Eq. (9.28) can always be evaluated by residue tech-
niques since there are a finite number of poles. Thus the second moments, although
no longer needed, except as a measure of total noise, can also be obtained simply.
This avoids the use of ordered operators discussed in Section 8.10.
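In the single-variable case the residue evaluation can be made explicit; the following is a sketch, assuming the quasilinear form da/dt = −λa + F with ⟨F(t)F(s)⟩ = 2Dδ(t − s):

```latex
% One-variable sketch: spectrum and total noise by residues.
% For  da/dt = -\lambda a + F(t),  \langle F(t)F(s)\rangle = 2D\,\delta(t-s):
S_{a}(\omega) = \frac{2D}{\omega^{2} + \lambda^{2}} ,
\qquad
\langle a^{2} \rangle
  = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi}\,
    \frac{2D}{\omega^{2} + \lambda^{2}}
  = \frac{D}{\lambda} ,
% the last step closing the contour around the single pole at \omega = i\lambda.
```

The single pole in the upper half plane at ω = iλ carries the entire integral, which is what makes the residue evaluation automatic for any finite set of poles.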
Another advantage of the Langevin method, at least in the linear case, is that it is
easy to calculate second order correlations conditional on an initial condition. The
equations are Eqs. (9.1) and (9.2):
If Eqs. (9.31) and (9.32) are multiplied and averaged, the result is
with
If t > u, the integral over r should be done first in Eq. (9.35). Then the delta
function is always satisfied somewhere in the range of integration. Thus
The absolute value |t − u|, in the first term, is unnecessary since t > u. However,
when u > t, the integral must be done over s first, and both answers are given
correctly by using |t − u|.
Although we are dealing with a stationary process, Eq. (9.40) is not stationary
(not a function only of |t − u|) because initial conditions have been imposed at
t = t_0. However, if t_0 → −∞, the results approach the stationary limit
Equation (9.40) can also be rewritten by subtracting off the mean values
In this section we will obtain the result in Section 8.7 using the Langevin approach,
which was presented in Lax (1966IV), Section 2.
We can continue our discussion of homogeneous noise with linear damping
using the same Langevin equation
Thus all linked-moments are assumed maximally delta correlated. The parameters
A, D and D_n can be functions of time as discussed in Lax (1966IV), but we
shall ignore that possibility here for simplicity of presentation. Here, L denotes
the linked average (or cumulant) which is defined by
Here the symbol ":" means summation on all the indices. The n = 1 term vanishes
in view of Eq. (9.46). Equation (9.50) defines K(y, s) which was previously
defined in the scalar case in Eq. (8.113).
Instead of evaluating MQ by solving a partial differential equation, as in Section
8.8, we consider the direct evaluation of Eq. (8.114):
where
Equation (9.53) is now of the form, Eq. (9.49), for which the average is known.
The final result
is in agreement with Eq. (8.126), except that here, we have explicitly dealt with
the multivariable case.
where the η_k are random variables. We use the symbol G rather than F to remind
us that ⟨G⟩ ≠ 0. The choice of linked-moments
with
is appropriate to describe Rice's generalized shot noise of Section 6.7. With this
choice, the linked-moment relation of Eq. (9.58), with F replaced by G yields
These results describe the properties of the noise source G. We are concerned,
however, with the average
then
GENERALIZED SHOT NOISE 179
Reversing the order of integration in MQ leads, as in Eq. (9.53), to a result
where
Equation (9.65) is of the form to which Eq. (9.60) can be applied. Since we have G
in place of G, the first factor on the right hand side of Eq. (9.60) should be omitted
in MQ:
When inserted into Eq. (9.63), these results, Eqs. (9.69), (9.71), contain all the
multitime statistics of the conditional multitime average MQ. The corresponding
multitime correlations can be determined by simply differentiating MQ with
respect to ξ(u_1), ξ(u_2), . . . .
9.7 Systems possessing inertia
and the cross-moments are clearly odd in time (see Eq. (7.14))
Hence such moments must vanish at equal time in the classical case:
where
and F is an external force. In the presence of noise, random forces can be added to
the right hand side of Eqs. (9.78) and (9.79). Hashitsume (1956) presented heuris-
tic arguments that no noise source should be added to the momentum-velocity
SYSTEMS POSSESSING INERTIA 181
relation. However, a proof is required, which we shall base on the Einstein
relation. Equations (9.78) and (9.79) correspond to the decay matrix
The nonequilibrium case is discussed in Lax (1960I), Section 10. The Einstein
relation Eq. (9.18) in the equilibrium case then yields
The presence of elements only in the lower right hand corner means that, as
desired, random forces only enter Eq. (9.79). If F in Eq. (9.79) is regarded as
the random force then
may be regarded as a different way of stating the Einstein relation and the
fluctuation-dissipation theorem.
10
Langevin treatment of the Fokker-Planck process
In Chapter 9, the Langevin processes are discussed based on the Langevin equa-
tions in the quasilinear form, Eqs. (9.1), (9.2). In this chapter, we consider the
nonlinear Langevin process defined by
The coefficients B and a may explicitly depend on the time, but we will not display
this time dependence in our equations.
We will now limit ourselves to Langevin processes which lead back to the ordinary
Fokker-Planck equation, i.e., a generalized Fokker-Planck equation that terminates with
second derivatives. We shall later find that the classical distribution function of the
laser, which corresponds to the density matrix of the laser, obeys, to an excellent
approximation, an ordinary Fokker-Planck equation. We assume
The Gaussian nature of the force f(t) is implied by Eq. (10.4); it is needed for a
conventional Fokker-Planck process, namely one with no derivatives higher than
the second. It is possible to construct a Langevin process which can reproduce any
given Fokker-Planck process and vice versa.
The process defined by Eq. (10.1) is Markovian, because a(t + At) can be cal-
culated in terms of a(t), and the result is uninfluenced by information concerning
a(u) for u < t. The f's at any time t are not influenced by the f's at any other
t', see Eqs. (10.2)-(10.4), nor are they influenced by the a's at any previous time
since f is independent of previous a. Thus the moments D_n in Section 8.1 can be
calculated from the Markovian expression:
DRIFT VELOCITY 183
The difference between the unlinked and linked averages in Eq. (10.5) vanishes
in the limit Δt → 0. Rewriting Eq. (10.1) as an integral equation and denoting
The first term is already of order Δt and need not be calculated more accurately.
In the second term of Eq. (10.7) we insert the first approximation
Retaining only terms of order Δt, or f², but not fΔt, or higher, we arrive at the
second and final approximation
Let us now take the moments, D_n. For n > 2, using Eq. (10.4), to order Δt, we
have
Thus from Eq. (10.5) all D_n = 0 for n > 2. For n = 2, using Eqs. (10.2)-(10.5),
we obtain
and for n = 1,
The double integral in Eq. (10.14), evaluated using Eq. (10.3), is half that in Eq.
(10.12) since only half of the integral over the delta function is taken. Integration
over half a delta function is not too well defined. From a physical point of view,
we can regard the correlation function in Eq. (10.3) to be a continuous symmetric
function, such as a Gaussian of integral unity. As the limit is taken with the
correlation becoming a narrower function, the integral does not change from 1/2
at any point of the limiting process.
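This limiting argument can be checked directly. The following numerical sketch (my illustration; the widths are arbitrary) represents the delta function by unit-integral Gaussians of shrinking width and integrates each over only the half-line t ≥ 0:

```python
import numpy as np

# Represent delta(t) by centered Gaussians of unit integral and shrinking width,
# and integrate each over only half the line, t >= 0 (trapezoidal rule).
# The half-integral stays at 1/2 throughout the limiting process.
def half_line_integral(width, npts=20001):
    t = np.linspace(0.0, 20.0 * width, npts)      # [0, ~infinity) for this width
    y = np.exp(-t**2 / (2.0 * width**2)) / (width * np.sqrt(2.0 * np.pi))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

halves = [half_line_integral(w) for w in (0.5, 0.1, 0.02)]
print(halves)   # each value ~ 0.5, independent of the width
```

The half-integral is 1/2 for every member of the sequence, so the limit is unambiguous even though "half a delta function" is not.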
Equations (10.13) and (10.14) have shown that, given a Fokker-Planck process
described by a drift vector A(a) and a diffusion D(a), we can determine the
functions B(a) and σ(a):
that leads to a Langevin process with the correct drift A(a) and diffusion D(a) of
the associated Fokker-Planck process described by Eq. (10.1). The reverse is also
true. Given the coefficients B(a) and σ(a) in the Langevin equation, we can construct
the corresponding coefficients A and D of the Fokker-Planck process.
The procedure used in the above section may be regarded as controversial, because
we have used an iterative procedure which is in agreement with Stratonovich's
(1963) treatment of stochastic integrals, as opposed to Ito's (1951) treatment.
This disagreement arises when integrating over random processes that contain white
noise, the so-called Wiener processes. We shall therefore consider an example
which can be solved exactly. Moreover, the example will assume a Gaussian ran-
dom force that is not delta correlated, but has a finite correlation time. In that case,
there can be no controversy about results. In the end, however, we can allow the
correlation time to approach zero. In that way we can obtain exact answers even in
the white noise limit, without having to make one of the choices proposed by Ito
or Stratonovich, as discussed in Lax (1966IV), Section 3.
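The distinction can also be seen numerically in a simple multiplicative model, which I use here purely for illustration (it is not the book's example): da/dt = a f(t) with ⟨f(t)f(s)⟩ = 2Dδ(t − s). The ordinary-calculus (Stratonovich) solution a = a₀ exp(∫f dt) gives ⟨a(t)⟩ = a₀ e^{Dt}, while the Ito prescription, which evaluates the integrand at the start of each interval, keeps ⟨a(t)⟩ = a₀:

```python
import numpy as np

# Compare Ito and Stratonovich readings of  da/dt = a f(t),
# <f(t)f(s)> = 2*D*delta(t-s), by Monte Carlo.  Model chosen for illustration.
rng = np.random.default_rng(1)
D, dt, nstep, nsamp = 0.25, 0.01, 100, 200000
t_final = nstep * dt

a_ito = np.ones(nsamp)
a_str = np.ones(nsamp)
for _ in range(nstep):
    dw = np.sqrt(2.0 * D * dt) * rng.normal(size=nsamp)  # integrated force f dt
    a_ito = a_ito * (1.0 + dw)          # Ito: integrand at the start of the interval
    a_str = a_str * np.exp(dw)          # ordinary calculus (Stratonovich) update
print(a_ito.mean(), a_str.mean(), np.exp(D * t_final))
# the Ito mean stays near 1; the Stratonovich mean tracks exp(D*t)
```

With a finite correlation time for f, as in the exact example that follows, the simulation would converge to the Stratonovich answer, which is why no choice of stochastic calculus needs to be imposed by hand.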
AN EXAMPLE WITH AN EXACT SOLUTION 185
The example we consider is:
The average is then expressed in terms of the cumulants. But for the Gaussian case,
the series in the exponent terminates at the second cumulant:
It was permissible, here, to obey the ordinary rules of calculus in this
transformation, without requiring Ito's calculus lemma, because delta function
correlations are absent. The equation of motion for x is
Since x would be constant in the absence of the random force f(t), the probability
of x at time t, P(x, t), is necessarily Gaussian and determined only by the random
force f(t); it therefore has the normalized solution form
and changing back to the original random variable a, Eq. (10.24) leads to
We ask what is the Langevin equation for M? Following the procedure in Section
8.2, the drift vector for M in the Fokker-Planck process is determined by
and
We obtain that
Therefore, the transform in our Langevin equation obeys the ordinary calculus
rule.
The average is contributed not only from B(M, t) but also from the second
term
which is not zero except when σ(M) is a constant. For the conditional average with
M(t) = M, the contribution from the second term is
Hence, Eq. (10.37) can be obtained simply from Eq. (10.36) by multiplying by S.
To obtain the average, or the conditional average with s(t) = s, of d⟨S⟩/dt, we have
and for the conditional average, (S) in the last expression of Eq. (10.38) is replaced
by S.
The stochastic differential equations, in which Ito's calculus lemma is used for
the transform of random variables, are broadly applied in the financial area. Ito's
stochastic equation is written as
Our Eq. (10.30) is similar to Ito's calculus lemma, which indicates that the
ordinary calculus rule is not valid for the Fokker-Planck processes, as shown in
Section 8.2. However, the calculus rule for our Langevin equation is not Ito's
calculus lemma.
The difference between our Langevin stochastic equation and that using Ito's
lemma originates from the different definitions of the stochastic integral. In Ito's
integral, σ(a(t))f(t) is evaluated at the beginning of the interval: σ(a(t))f(t) is replaced
by σ(a(t_c))f(t), where t_c = t − ε is slightly earlier than t. This leads to the average
EXTENDING TO THE MULTIPLE DIMENSIONAL CASE 189
⟨σ(a(t_c))f(t)⟩ = 0, and B(a, t) = A(a, t) in Eq. (10.39). Hence, Ito's calculus
lemma must be introduced in order to obtain the correct answer. In our Langevin
description ⟨σ(a(t))f(t)⟩ is estimated based on an integrand that is a function of t,
which can be recognized as a limit of an analytic function for which the Riemann
integral exists, as shown in Section 10.2, and which finally approaches a delta
correlated function: ⟨f(t)f(s)⟩ = 2δ(t − s). Hence, ⟨σ(a(t))f(t)⟩ is estimated to be
nonzero, except in the case that σ is a constant.
There is an unimportant difference in notation between ours and that used in
some literature. The standard Wiener notation is equivalent to the correlation
whereas the customary physics (and our) notation would have a factor of 2 placed
on the right hand side of these equations. Hence, there is a notational correspondence
between that used in this book and that in other literature:
Although the two stochastic approaches lead to the same mathematical result at the
level of averages or of conditional averages, we, as physicists, prefer a method
more compatible with actual natural processes, in which the integrand is a function
of time t. The ordinary calculus rule can be used in our Langevin stochastic
equation. This is a major advantage of our approach. As shown by the example in
Eq. (10.38), for a random variable dx = dS/S, one cannot simply write dS =
S dx when Ito's calculus lemma is used. This could be misleading, and will
be discussed in Chapter 16. Other examples of applying the two different stochastic
approaches in the financial area are presented in Section 16.6 and Section 16.7.
Consider a set of random variables a = [a_1, a_2, . . . , a_n], which obey the Langevin
equations:
and
For a set of functions M(a) = [M_1(a), M_2(a), . . . , M_n(a)], the Langevin equation
for M is written as
We ask what are B_j(M, t) and σ_j(M)? Following the procedure in Section 8.2,
the drift vector for M in the Fokker-Planck process is determined by
where [D_1(a, t)]_i and [D_2(a, t)]_kl are given, separately, by Eq. (10.49) and Eq.
(10.48).
The fluctuation term for M is determined by
Therefore, the transform in the Langevin formula of M(a) obeys the ordinary
calculus rule.
The average of the fluctuation term ⟨σ_j(M)f(t)⟩ is given by
and for the conditional average, the average symbol ⟨. . .⟩ on the right hand side of
the above equation is dropped.
In this section we use another method to estimate ⟨M(a)F(t)⟩, and extend it to the
multidimensional case. Here, F(t) is restricted to be independent of a; hence this
is a linear fluctuation model. This approach will be used in the next chapter.
Let us consider an arbitrary function M(a) of the set of random variables
a = [a_1, . . . , a_n] which obey the usual coupled Langevin equations:
i.e., the noise sources at time t are independent of the variables a at the time s < t.
Our calculation here will be different from the iteration procedure used in Section
10.1, and simpler. We set
and will later let ε → 0. We rewrite ⟨M(t)F_j(t)⟩, with the notation M(t) =
M(a(t), t), as
By Eq. (10.56), the first term vanishes, and Eq. (10.58) can be rewritten as an
integral
Only the last term of Eq. (10.60) is sufficiently singular, containing a product of
two forces, to yield a finite contribution as t − t_c = ε → 0.
MEANS OF PRODUCTS OF RANDOM VARIABLES AND NOISE SOURCE 193
To lowest order in ε, (∂M/∂a_i)_s ≈ (∂M/∂a_i)_{t_c}, so that
Using Eq. (10.55) and omitting a factor 2 since we are integrating only over one
half of the delta function, we obtain:
We note that on the right hand side we have taken the a's fixed at t_c, and then looked
at the fluctuation of a at time t, i.e., a(t_c) + fluctuation. Then we have calculated
the correlation of the components of this fluctuation with F_j(t). We can then take
the limit t → t_c and write Eq. (10.62) as
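The final formula, Eq. (10.63), is not reproduced in this copy; the following is a hedged reconstruction from the derivation just given (half the delta function survives, leaving the diffusion matrix D_ij):

```latex
% Hedged reconstruction of the final result, Eq. (10.63):
% the equal-time correlation of M(a(t)) with the noise source F_j(t)
% picks up half the delta function, leaving the diffusion matrix D_{ij}.
\bigl\langle M(\mathbf a(t))\,F_j(t) \bigr\rangle
  = \sum_i \Bigl\langle D_{ij}\,
      \frac{\partial M}{\partial a_i} \Bigr\rangle .
```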
The question has been raised with an entirely different meaning by Scully and
Lamb (1967). However, optical lasers with frequencies as high as 10^15 radians per
second can have line-widths of 10^4 and even smaller. The addition of nonlinearity
to a system generally leads to combination frequencies, but not to an extremely
narrow resonance. The clue for the solution of this difficulty is described in Lax's
1966 Brandeis lectures (Lax 1968), in which Lax considered a viable classical
model for a laser containing two field variables (like A and A†), two population
levels such as the upper and lower level populations in a two level atomic system,
plus a pair of atomic polarization operators that represent raising and lowering
operators.
When Lax took this nonlinear 6 by 6 system, sought a stationary state, and
examined the deviations from the stationary operating state in a quasilinear man-
ner, he discovered that there is always one nonstable degree of freedom: a phase
variable that is a combination of field and atomic phases. The next step was the
realization that this degree of instability is not an artifact of the particular exam-
ple, but a necessary consequence of the fact that this system, like many others,
including Hewlett-Packard radio-frequency oscillators, is autonomous, namely that
the equations of motion contained no time origin and no metronome-like driving
source. Mathematically, this means that the system is described by differential
equations that do not contain explicit time dependence. As a consequence, if
x(t) is a solution, where x(t) is a six-component vector, then x(t + r) is also
necessarily a solution of the differential equation system. But this means that the
solution is unstable to a time shift, or more pictorially to a frequency shift. Under
an instability, a new, perhaps sharp line can occur, as opposed to the occurrence
of summation or difference lines that arise from nonlinearity. Hempstead and Lax
(1967CVI) illustrate the key idea in a simpler system, the union of a positive and
negative impedance. In this chapter, we will first describe this nonlinear model,
build the corresponding differential equation of motion in Section 11.2, and trans-
form this equation to a dimensionless canonical form in Section 11.4. In Section
AN OSCILLATOR WITH PURELY RESISTIVE NONLINEARITIES 195
11.3, the diffusion coefficients in a Markovian process are defined, and the
condition for the validity of this approximation is described. In Section 11.5 the phase
fluctuations and the corresponding line-width are calculated. The main result,
the line-width W obtained in Eq. (11.66), is shown to be very narrow. In Section
11.6, the amplitude fluctuations are calculated using a quasilinear treatment. In
Sections 11.7 and 11.8, the exact solutions of the fluctuations are calculated based
on the Fokker-Planck equation of this model.
where
where
196 THE ROTATING WAVE VAN DER POL OSCILLATOR (RWVP)
It is consistent with the rotating wave approximation only to retain the slowly
varying parts of A and A*, so that the term A* e^{2iω_0 t} in Eq. (11.6) is dropped,
leaving
We first use the equations for A and A* to determine the operating point, i.e.,
We call this operating point p_00 because we will later find a slightly better one,
p_0, using a different reference frame. From Eq. (11.8) we obtain the equation of
THE DIFFUSION COEFFICIENT 197
motion of A*,
where
We are doing this problem thoroughly because we will find that this classical
random process, which is associated with the quantum mechanical problem of the
laser in the difficult region near threshold, reduces to the Fokker-Planck equation
for this rotating wave van der Pol oscillator.
Noticing that e_−(t) and e_+(t) are random forces, we now calculate the diffusion
coefficients D_−+ = D_+− defined by
Equation (11.15) is an appropriate definition for processes that are Markovian over
a time interval Δτ ≫ (ω_0)^−1 (see Lax 1966IV, Section 5 for a more detailed
discussion of processes containing short, nonzero correlation times). Equation (11.15)
can be rewritten as
Using the definition of G(e, ω_0) in Eq. (4.50), we see that the diffusion constant in
the limit of Δτ → ∞ is
and describes the noise at the resonant frequency ω_0. If we had chosen
then our spectrum would be that of white noise, i.e., independent of frequency.
In the case of a not exactly Markovian process, we are assuming the spectrum does
not vary too rapidly about ω_0 (see Fig. 11.2), and thus we can approximate it by
white noise evaluated at the frequency ω_0.
FIG. 11.2. In the case of a not exactly Markovian process we can approximate it
by white noise evaluated at the frequency ω_0.
The spectrum of the noise source is not
necessarily white, but only the change over the frequency width of the oscillator
is important, and that change may be small enough to permit approximating the
noise as white. Indeed, the situation is even more favorable in a laser containing N
photons: the line width will be smaller than the ordinary resonator line-width by a
factor of N.
In general, for an oscillatory circuit such as shown in Fig. 11.1, it is essential to
choose Δτ to be large compared to the period of the circuit, but it is often chosen
small compared to the remaining time constants to avoid nonstationary errors. The
condition for the validity of this choice is actually
where Δt is the relaxation time associated with the system and δω is the
frequency interval over which the noise displays its nonwhiteness. To order
(ω_0Δτ)^−1, we have shown in Lax (1967V) that
Thus we actually require two conditions, Eq. (11.19) and the less stringent
condition (ω_0Δτ)^−1 ≪ 1.
11.4 The van der Pol oscillator scaled to canonical form
The oscillator shown above is a rotating wave oscillator, but not a van der Pol
oscillator, since Eq. (11.8) has an arbitrary nonlinearity R(p). Therefore we expand
R(p) about the operating point, forming the linear function
We shall later discuss the condition under which this approximation is valid. We
now perform a transformation
and
where
and
where
and
The coefficients ξ and Γ were determined by the requirement that A' and h satisfy
Eqs. (11.27) and (11.29).
The condition for neglect of higher terms in the expansion of R(p) about the
operating point is
In Eq. (11.10) we found that an oscillator chooses to operate at a point at which its
net resistivity and its line-width vanish. Noise in a stable nonlinear system would
add to this signal possessing a delta function spectrum, but not broaden it. Fortu-
nately, an autonomous oscillator (described by a differential equation with time
independent coefficients) is indifferent to a shift in time origin and thus is unsta-
ble against phase fluctuations. These unstable phase fluctuations will broaden the
line, whereas amplitude fluctuations only add a background. For the purpose of
understanding phase broadening, therefore, it is adequate to linearize with regard
to amplitude fluctuations. Indeed, for a purely resistive oscillator, there is no cou-
pling (at the quasilinear level) between amplitude and phase fluctuations. At least
in the region well above threshold, then, when amplitude fluctuations are small, it
is adequate to treat phase fluctuations by neglecting amplitude fluctuations entirely.
If in Eq. (11.8) we introduce
from which R(p) has disappeared. Amplitude fluctuations are neglected by setting
u = 0. The only vestige of dependence on R(p) is through p_0, which is |A|² at the
operating point. Equation (11.34), with u = 0, is a differential equation containing
no time origin and no metronome-like driving source. As a consequence if x(t) is
a solution, then x(t + T) is also necessarily a solution. This means that the solution
is unstable to a time shift.
p_0 could be replaced by the more accurate ⟨p⟩. Since R(p) no longer enters the
problem, we can with no loss of generality work with the dimensionless variables
introduced in Section 11.4 for the RWVP oscillator. Dropping the primes in Eq.
(11.27), and defining p̄ = ⟨p⟩/ξ², Eq. (11.34) (with u = 0) takes the dimensionless
form
Using Eq.
Since we already have the product of two h's, using Eqs. (11.29) and (11.30) we
obtain
using Eq. (11.29) and integrating over half the delta function. Adding the complex
conjugate in Eq. (11.39) we get
By inserting the above first and second moments into the generalized Fokker-
Planck equation, Eq. (8.38), we obtain a simple Fokker-Planck equation for this
process,
which describes pure phase diffusion. Since the process is Markovian, it is permis-
sible to use the standard Fokker-Planck Equation (8.38) with the delta function
initial condition, Eq. (8.45), without requiring conditional moments, Eq. (8.44).
But this is the well-known Green's function for diffusion
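The resulting phase decorrelation, and hence the Lorentzian line, can be illustrated numerically (a sketch of my own, with an arbitrary phase diffusion constant Dphi): for pure phase diffusion the accumulated phase is Gaussian with ⟨φ²⟩ = 2D_φ t, so the Gaussian property gives ⟨e^{iφ(t)}e^{−iφ(0)}⟩ = e^{−D_φ|t|}, whose Fourier transform is a Lorentzian of half-width D_φ:

```python
import numpy as np

# Pure phase diffusion: phi(t) - phi(0) is Gaussian with variance 2*Dphi*t,
# so the oscillator correlation <exp(i*phi(t)) exp(-i*phi(0))> decays as
# exp(-Dphi*t), giving a Lorentzian line of half-width Dphi.  Dphi illustrative.
rng = np.random.default_rng(2)
Dphi, t, nsamp = 0.5, 1.0, 200000

dphi = np.sqrt(2.0 * Dphi * t) * rng.normal(size=nsamp)  # accumulated phase
corr = np.mean(np.exp(1j * dphi))                        # ensemble average
print(corr.real, np.exp(-Dphi * t))   # both ~ exp(-Dphi*t)
```

The exponential decay rate, not the diffusion of the amplitude, is what sets the line-width, in agreement with the quasilinear discussion that follows.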
We can now calculate the phase line-width, which by Eq. (4.4) is given by the
Fourier transform of (a*(t)a(0)}, where a is defined in Eq. (11.4)
Since φ is a Gaussian variable, linked averages beyond the second disappear, and
Eq. (1.56) yields
Using Eqs. (11.36) and (11.38) for D(φ), Eq. (11.48) becomes
where A_p describes the associated line-width, with the subscript p standing for
phase, and we have
where the width Δω is the cavity width with just the positive impedance R_p
present. We see that the line-width is proportional to 1/P. For a laser, the
line-width is proportional to one over the mean number of photons N.
Now we calculate ⟨e²⟩_{ω_0}. We set
where the p and n refer to the e² associated with the positive and negative
resistance respectively. Using the definition of ⟨e²⟩_{ω_0}, Eqs. (11.19)-(11.20) and the
Wiener-Khinchine theorem, Eq. (4.4), we have
The equilibrium theorem in Section 7.4 for this noise, including zero-point
contributions, leads to
where
T_p is the positive temperature and C(ω, T) is the quantum correction factor which
approaches 1 for T → ∞ and gives us the quantum corrections at low temper-
atures. A detailed discussion of this correction factor is given in Lax (1960I),
Section 7.
Similarly
where T_n is the negative temperature. Since both T_n and R_n are negative, the
above equation can be rewritten as
where
When zero-point contributions are included at this semiclassical level, Eq. (11.63)
becomes
In Eq. (11.69) A and A* are not constant; hence, ⟨A(t)h*(t)⟩ is not zero. Let us
consider ⟨A(t)h*(t)⟩. Using the method in Section 10.6 and then Eq. (11.27), and
integrating over only half the delta function, we have
where
On the other hand, when using Eq. (11.71), the operating point for the variable
p = |A|² is
The advantage of p₀ over p₀₀ is that it yields a nonvanishing value p₀ > 0 below
threshold.
We will now calculate the amplitude fluctuations. In the quasilinear (QL)
approximation, the decay constant for amplitude fluctuations is
FOKKER-PLANCK EQUATION FOR RWVP 207
where the subscript a denotes the amplitude. Using Eq. (11.81), and then Eq.
(11.82),
We note that we need not have used the quasilinear approximation, as we could
have solved this problem exactly, using the drift vector Eq. (11.81) and the
diffusion coefficient, Eq. (11.78) to obtain the Fokker-Planck equation:
with
with
However, if we are only concerned with radial fluctuations, our answers are inde-
pendent of φ and thus this last term need not appear. We can now see the meaning
of the approximation made in Section 11.5 on phase fluctuations. If we replace p⁻¹
in this last term by ⟨p⟩⁻¹, a number, then Eq. (11.96) can be separated into phase
and radial motions, Eqs. (11.43) and (11.87), respectively. Therefore, in the region
well above threshold, since the amplitude fluctuations are small, the amplitude and
phase fluctuations are nearly uncorrelated.
We shall look for exact answers by considering the eigenfunctions of the Fokker-
Planck equation (11.94). We consider solutions of the form
FIG. 11.3. The line-width Λ_p that includes phase fluctuations versus the dimen-
sionless pump rate p. This figure was first presented in Lax (1967V).
since we then have the correct phase dependence. From Eq. (11.95) the amplitude
part becomes
We thus need to find the eigenvalues of this second order differential equation.
When λ = 0, our solution has no φ dependence, and we are only looking at radial
fluctuations. Λ_{0,0} = 0 corresponds to the steady state. The lowest nonvanishing
eigenvalue with λ = 0 is called Λ_a, i.e.
which is appropriate for amplitude noise. On the other hand, when λ = 1, we have
a solution proportional to e^{iφ}, which is appropriate for considering ⟨e^{i[φ(t)−φ(0)]}⟩ (see
Section 11.5), and thus is called
contains more than 98% of the weight. See Table VI of Hempstead and Lax
(1967CVI).
Equation (11.50) says Λ_p p̄ = 1. We see that above threshold this is approxi-
mately true. Below threshold, Λ_p p̄ → 2. The reason for this behavior is discussed
in Lax (1967V). The Schawlow-Townes formula was wrong by a factor of 2
because they were basically deriving their results by linear methods valid below
threshold, not valid above threshold.
In Fig. 11.4 we plot the half-width of the amplitude spectrum Λ_a versus the
dimensionless pump rate p. The exact solution for intensity fluctuations is obtained
by solution of the Fokker-Planck equation in this section. The line-width using
the QL approximation is obtained from Eq. (11.85). The IQL curve is obtained
from Eq. (11.86), with a better approximation for the operating point. Actually for pure
intensity fluctuations the two lowest nonzero eigenvalues are close to degenerate
and it is necessary to plot the appropriately weighted average of all the eigenvalues.
We see that when p > 10 or p < −10, i.e., when we are away from threshold, the
quasilinear results are very close to the exact ones.
12
Noise in homogeneous semiconductors
since each integral yields one or zero according as E_j is in the interval (E_a, E_b)
or not. Thus n(E), which can be interpreted as the density of states (DOS), is given
by
For electrons quantized in a box of dimensions L₁, L₂ and L₃, with periodic
boundary conditions, the eigenstates are plane waves
where E(k) is the energy-wave vector relationship. For large L_j, the sums can be
converted to a triple integral:
Since
212 NOISE IN HOMOGENEOUS SEMICONDUCTORS
In a crystal the integral in Eq. (12.8) is understood to sum over one Brillouin zone
(BZ).
One can interpret V/(2π)³ as the density of states in k space. If one introduces
the momentum variable p = ℏk, this is equivalent to
where
so that there is one quantum state for each volume h³ in phase space.
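The statement that the states have density V/(2π)³ in k space (one state per volume h³ of phase space) can be checked numerically by counting the plane-wave states k = 2πn/L of a periodic box inside a k-sphere; the box size and cutoff below are arbitrary illustrative choices:

```python
import numpy as np

L = 30.0      # box side (hbar = 1; arbitrary illustrative units)
kmax = 2.0    # count states with |k| <= kmax

# allowed wave vectors k = 2*pi*n/L for integer n (periodic boundary conditions)
n = np.arange(-25, 26)
kx, ky, kz = np.meshgrid(2*np.pi*n/L, 2*np.pi*n/L, 2*np.pi*n/L, indexing="ij")
count = np.sum(kx**2 + ky**2 + kz**2 <= kmax**2)

# density-of-states prediction: V/(2*pi)^3 times the volume of the k-sphere
predicted = L**3 / (2*np.pi)**3 * (4.0/3.0)*np.pi*kmax**3
print(count, predicted)   # agree up to a small surface correction
```

The discrepancy is a surface term that becomes negligible for large L, which is the sense in which the sums over states become the triple integral of Eq. (12.8).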
Although we have used a box with three perpendicular axes, in a crystal we
could have used a box with edges parallel to the vectors a₁, a₂, a₃ of the primitive
cell. The sum would then be over cell positions. The integral, Eq. (12.8), over
k could become an integral over the reciprocal lattice. The integral would still
extend over one Brillouin zone whose shape takes that of one cell of the reciprocal
lattice. Periodicity permits a rearrangement so that the integral is over a Brillouin
zone with the full symmetry of the lattice. General results, such as Eqs. (12.8) and
(12.9), remain valid.
Near the bottom of the conduction band in a semiconductor there is an
approximate effective mass relationship of the form
If we let
we get
where
DENSITY OF STATES AND STATISTICS OF FREE CARRIERS 213
is the corresponding density of states for the isotropic case. It is convenient to
define
where dΩ = sin θ dθ dφ, the solid angle, can be integrated over to yield a factor 4π.
Here, D_c(E) represents the density of electronic states per unit energy, and f(E)
is the probability that any one of these states is occupied. The energy E_c is the
energy at the bottom of the conduction band. If the conduction band is isotropic
near its minimum, the energy takes the simple form:
Here m* is the effective mass of electrons in the conduction band. In this case, the
density of states is shown to take the simple form:
Here, ζ = E_F is the Fermi energy. We avoid the customary symbol μ since the
latter is used for mobility in this chapter. The last form in Eq. (12.20) is appropriate
to the nondegenerate case. (Nondegenerate means that the density is sufficiently
low that Boltzmann statistics are adequate.)
Free holes at equilibrium
Holes are simply empty electron states. Their statistics can be written in a
completely analogous manner. The density of holes, called p, is given by:
Here D_v(E) represents the density of states in the valence band and is given by:
As before, we have assumed that the states near the valence band edge (now the
top of the valence band) obey a simple effective mass relationship:
The probability of a hole is the probability that the corresponding electron state is
empty:
and represent effective numbers of states at the band edges that correspond to
Boltzmann occupancy of a distributed set of states. The letters n and p are pre-
sumably used to correspond to the mnemonic, n for negative, and p for positive.
Note that the expressions, Eqs. (12.25) and (12.26), for the densities are correct,
without specifying how the Fermi level £ is to be determined.
CONDUCTIVITY FLUCTUATIONS 215
Law of mass action
The Fermi energy appears with opposite signs in Eqs. (12.25) and (12.26). Hence
it cancels out of the product. The result,
is an example of the law of mass action, with Eg representing the energy gap
between the conduction and valence bands. If donors or acceptors are present, then
the Fermi level is shifted, but Eq. (12.29) is unaffected. If N_D, the donor density,
is a function of x, then ζ, n, and p will be functions of x but the product n(x)p(x)
will be a constant. This law is a special case of the law of chemical equilibrium.
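The mass-action statement can be made concrete with a small numerical sketch (the gap, temperature, and effective densities of states below are invented, silicon-like illustrative values, not taken from the text): in the nondegenerate limit the product np = N_c N_v exp(−E_g/kT) is independent of where the Fermi level ζ sits.

```python
import numpy as np

kT = 0.0259               # eV, room temperature (illustrative)
Eg = 1.12                 # eV, assumed band gap
Nc, Nv = 2.8e19, 1.0e19   # cm^-3, assumed effective densities of states

def carrier_densities(zeta):
    """Nondegenerate (Boltzmann) densities for Fermi level zeta, measured
    from the valence band edge Ev = 0, with Ec = Eg."""
    n = Nc * np.exp(-(Eg - zeta) / kT)   # electrons, Eq. (12.25)-type form
    p = Nv * np.exp(-zeta / kT)          # holes, Eq. (12.26)-type form
    return n, p

# shifting zeta (e.g. by doping) changes n and p but not their product
for zeta in [0.3, 0.56, 0.8]:
    n, p = carrier_densities(zeta)
    print(zeta, n * p, Nc * Nv * np.exp(-Eg / kT))
```

The Fermi level enters n and p with opposite signs in the exponent, so it cancels identically in the product, which is the law of mass action stated above.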
Having the above preliminary knowledge, we will concentrate on the calcula-
tion of noise in semiconductors.
where μ_p and μ_n are the hole and electron mobilities, p and n are the hole and
electron concentrations, and P and N are the total hole and electron numbers
over the volume AL between the electrodes, of area A and separation L. Thus the
fractional voltage fluctuations are given by
If only electrons and holes are present (and not traps) charge neutrality will
be enforced up to the (very high) dielectric relaxation frequency, so that to an
excellent approximation
and the total voltage fluctuation may be obtained by replacing the integral by unity.
The total noise, which only involves Φ(0) = 1, is consistent with the normalization
condition in the noise spectrum. The after-effect function will be calculated in later
sections.
This result is not entirely surprising, since the total number of carriers is an integral
Indeed, Eq. (12.38) can be derived directly from Eq. (12.39) using only the
assumption that the r_j are uniformly distributed in space.
THERMODYNAMIC TREATMENT OF CARRIER FLUCTUATIONS 217
A less obvious case is that of a set of N_t traps interacting with a reservoir of
chemical potential ζ_t. We assume that the trap occupancy is sufficiently high that
Fermi statistics are necessary. In that case, the number of filled traps by use of
Fermi-Dirac statistics is
It can be seen that the fluctuations are reduced by a factor equal to the fraction of
empty states. The reason for this result is made clear in the next section, in which a
kinetic approach is used for the same problem. If both N and N_t are allowed to vary
simultaneously, the simplest distribution consistent with these second moments is
The term in ΔNΔN_t vanishes because N does not depend on ζ_t and N_t does not
depend on ζ. Within the quasilinear approximation, it is appropriate to ignore
cumulants higher than the second and stop at the Gaussian approximation.
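The variance reduction by the fraction of empty states can be checked with a Monte Carlo sketch: if each of N_t traps is independently occupied with Fermi-Dirac probability f, the number fluctuation is ⟨(ΔN_t)²⟩ = N_t f(1 − f), i.e. the Poisson value N̄_t = N_t f reduced by the factor (1 − f). (The parameter values below are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, f, trials = 1000, 0.7, 20000   # traps, occupation probability, samples

# total number of filled traps: sum of Nt independent Bernoulli(f) occupancies
filled = rng.binomial(Nt, f, size=trials)

mean = filled.mean()
var = filled.var()
print(mean, Nt * f)              # mean occupancy ~ Nt*f
print(var, Nt * f * (1 - f))     # Poisson value Nt*f reduced by the factor (1-f)
```

For f → 0 the Poisson result is recovered; for f → 1 the traps are nearly full and the fluctuations are suppressed, in line with the kinetic explanation of the next section.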
Suppose, now, that the electrons in traps do not have an independent reservoir,
but are obtained from the free carriers. Then we must impose the conservation
condition
The situation for holes is similar to that for electrons. If the holes have their
own reservoir, then the typical Poisson process prevails
If holes, traps and electrons are all present and coupled to each other then charge
neutrality imposes the constraint
In the presence of compensating centers, Nco, there is also a neutrality condition
for the steady state
which includes all three statistical cases, Fermi, Boltzmann and Bose, with the
three choices of ε above. This result is true in equilibrium.
The master transition probability from occupation number n(a) to n'(a) can be
written as
GENERAL THEORY OF CONCENTRATION FLUCTUATIONS 219
We now perform the calculation, for the bth state, of the first moment of the
transition probability defined by Eq. (8.10):
If one inserts Eq. (12.52) and sums first over n', the only terms in the sum which
contribute are those for which a = b and a' = b,
In a steady state, one is tempted to make the terms on the right hand side of Eq.
(12.55) cancel in pairs by detailed balance:
However, if one requires that three states a, b, c be consistent with one another
under this requirement, one finds that
Thus if the ratio of forward to reverse transition probabilities has the form Eq.
(12.58), then the steady state solution has the form
or
where
for c ≠ b. The two terms in D_{bc} are equal under detailed balance but not otherwise.
For c = b, we obtain
The steady state second moments ⟨Δn(b)Δn(c)⟩ are now chosen so that the right
hand side of Eq. (12.64) vanishes, i.e., so that the Einstein relation is obeyed.
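The logic of the detailed-balance requirement can be illustrated with a minimal numeric sketch (the level energies and attempt rate are invented for illustration): if every forward/reverse rate ratio has the Boltzmann form W(b→c)/W(c→b) = exp[(E_b − E_c)/kT], then the steady state of the master equation is the Boltzmann distribution, and each pair of terms cancels separately.

```python
import numpy as np

kT = 1.0
E = np.array([0.0, 0.7, 1.5])      # invented level energies
nstates = len(E)

# W[c, b] = transition rate b -> c; the symmetric split of the Boltzmann
# factor guarantees W[c,b]/W[b,c] = exp((E_b - E_c)/kT)
W = np.zeros((nstates, nstates))
for b in range(nstates):
    for c in range(nstates):
        if b != c:
            W[c, b] = 0.3 * np.exp((E[b] - E[c]) / (2 * kT))

# master-equation generator: dP/dt = M P
M = W - np.diag(W.sum(axis=0))

# steady state = null vector of M, normalized to unit probability
w, v = np.linalg.eig(M)
P = np.real(v[:, np.argmin(np.abs(w))])
P /= P.sum()

boltz = np.exp(-E / kT); boltz /= boltz.sum()
print(P)        # matches the Boltzmann distribution
print(boltz)
```

One can also verify directly that W[c, b]·P[b] = W[b, c]·P[c] for every pair, which is the pairwise cancellation invoked in Eq. (12.56).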
Assuming that there is no correlation between fluctuations in different states, we
try a solution of form
satisfies the Einstein relation of Eq. (12.64), provided the steady state obeys
detailed balance, Eq. (12.56). Using Eq. (12.59), we have
which leads to
The total number of systems is N = Σ_c n(c), which yields a formula similar to the
thermodynamic case, Eq. (12.38).
In our model, however, the total number N should be fixed, and we need to
enforce a constraint. The solution Eq. (12.68) we found is only a particular solution,
to which can be added a solution of the homogeneous equation
For the Boltzmann case (ε = 0), the solution Eq. (12.68) is replaced by
which obeys the constraint ⟨[Σ_a Δn(a)]Δn(c)⟩ = 0. For the Fermi and Bose
cases, we have
The added term is of order 1/N and therefore is unimportant in calculating the
fluctuations in any small portion of a large system. However, this term does affect
fluctuations in appreciable parts of a system.
Equation (12.50) can be readily applied to the case in which n(1) = N, the
number of conduction electrons, n(2) = N_t, the number of trapped electrons, and
n(3) = N_v − P, the number of electrons in the valence band, where N_v is the number
of valence band states and P is the number of holes. Thus
We have assumed nondegeneracy for the holes and the free electrons, but not for
the trapped electrons. Since Δn(3) = −ΔP, we can write the second moments in
the form
where
Note that this normalization is four times that used for g(ω) in Lax and Mengert
(1960). For simplicity, we confine ourselves to a one-dimensional geometry, as
INFLUENCE OF DRIFT AND DIFFUSION ON MODULATION NOISE 223
was done by Hill and van Vliet (1958), and calculate the total hole fluctuation
by the integral
we can write
so that the correlation at two times is, as usual, related to the pair correlation at the
initial time
where the coefficient of the delta function is chosen so that the fluctuation in the
total number of carriers ⟨(ΔP)²⟩ is given correctly by Eq. (12.82). Here L is the
distance between the electrodes.
The definition, Eq. (12.34), of Φ(t) yields the expression
Here, v and D are the bipolar drift velocity and diffusion constant found by van
Roosbroeck (1953) to describe the coupled motion of electrons and holes while
maintaining charge neutrality
where the individual diffusion constants and mobilities are related by the Einstein
relation.
Equation (12.95) for the Green's function can be solved by a Fourier transform
method
where
With Eq. (12.99) for k, the after-effect function can be calculated from Eq. (12.94)
Lax and Mengert (1960) provide an exact evaluation of this integral. However,
the resulting expressions are complicated. It is therefore worthwhile to treat some
limiting cases. For example, if there is no diffusion, then
where T_a = L/v is the transit time and the spectrum is governed by a windowing
factor W
Indeed, the current noise, in this special case, can be written in the form given by
Hill and van Vliet (1958)
which emphasizes the similarity to shot noise. The equivalent current is defined by
a windowing factor similar to that found associated with the effect of transit time
on shot noise.
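As a numerical sketch of this drift-only limit, assume (as for carriers created uniformly in space and swept out at speed v) that the after-effect function decays linearly, Φ(t) = 1 − t/T_a for t < T_a and zero beyond — an assumption made here for illustration, not taken from the text. Its one-sided cosine transform then reproduces a sinc²-type transit-time window, the same shape known from transit-time-limited shot noise:

```python
import numpy as np

Ta = 1.0                       # transit time T_a = L/v (arbitrary units)
t = np.linspace(0.0, Ta, 4001)
dt = t[1] - t[0]
phi = 1.0 - t / Ta             # assumed linear after-effect function

for w in [1.0, 5.0, 20.0]:     # angular frequency, in units of 1/Ta
    f = phi * np.cos(w * t)
    # trapezoidal evaluation of the one-sided cosine transform 2*Int phi cos(wt) dt
    spectrum = 2.0 * (f.sum() - 0.5 * f[0] - 0.5 * f[-1]) * dt
    x = w * Ta / 2.0
    window = Ta * (np.sin(x) / x) ** 2      # sinc^2 transit-time window
    print(w, spectrum, window)              # the two columns agree
```

The analytic identity behind the match is 2∫₀^{T_a}(1 − t/T_a)cos(ωt)dt = T_a [sin(ωT_a/2)/(ωT_a/2)]², which rolls off as 1/ω² above the transit frequency.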
In the opposite limit, in which diffusion is retained but drift is neglected, the
exact result for the spectrum is given by
where
and
is the reciprocal of the diffusion length. The exponential term represents an inter-
ference term between the two boundaries that is usually negligible since they are
separated by substantially more than a diffusion length. A simple approximate
form over intermediate frequencies is
In summary, in addition to the first term, which represents the volume noise easily
computed just by using the total carrier number P(t), the term proportional to an inverse
frequency to the three-halves power arises from diffusion across the boundary at
the electrodes.
13
Random walk of light in turbid media
13.1 Introduction
Clouds, sea water, milk, paint and tissues are some examples of turbid media. A
turbid medium scatters light strongly. Visible light shone on one side of a cup of
milk appears much weaker and diffuse when observed on the other side of the cup,
because light is strongly scattered in milk while the absorption of light by milk is very low.
The scattering and absorption properties of a turbid medium are described by the
scattering and absorption coefficients μ_s and μ_a, respectively. Their values depend
on the number density of scatterers (absorbers) in the medium and the cross-section
of scattering (absorption) of each individual scatterer (absorber). For a collimated
beam of intensity I₀ incident at the origin and propagating along the z direction
inside a uniform turbid medium, the light intensity in the forward direction at
228 RANDOM WALK OF LIGHT IN TURBID MEDIA
position z is attenuated according to Beer's law:
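In formula form, Beer's law for the collimated (unscattered) beam reads I(z) = I₀ exp[−(μ_s + μ_a) z]; a one-line numeric sketch (the coefficient values are invented for illustration):

```python
import numpy as np

mu_s, mu_a = 10.0, 0.1    # scattering/absorption coefficients (1/cm, illustrative)
I0 = 1.0
z = np.array([0.0, 0.1, 0.2, 0.5])    # depth in cm

# attenuation of the collimated beam: total coefficient mu_s + mu_a
I = I0 * np.exp(-(mu_s + mu_a) * z)
print(I)
```

Note that in a milky medium with μ_s ≫ μ_a, the forward beam is extinguished within a few mean free paths even though almost no light is absorbed; it is converted into the diffuse, multiply scattered field studied in the rest of this chapter.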
FIG. 13.1. A photon moving along n is scattered to n' with a scattering angle
θ and an azimuthal angle φ in a photon coordinate system xyz whose z-axis
coincides with the photon's propagation direction prior to scattering. XYZ is
the laboratory coordinate system.
on the angle between s and s' rather than the directions, and the phase function can
be written in the form P(s · s').
Denote the position, direction and step-size of a photon after the ith scattering event
as x⁽ⁱ⁾, s⁽ⁱ⁾ and S⁽ⁱ⁾, respectively. The initial condition is x⁽⁰⁾ = (0, 0, 0) for the
starting point and s⁽⁰⁾ = s₀ = (0, 0, 1) for the incident direction. The laboratory
Cartesian components of x⁽ⁱ⁾ and s⁽ⁱ⁾ are x_α⁽ⁱ⁾ and s_α⁽ⁱ⁾ (α = 1, 2, 3). The photon
is incident at time t₀ = 0. For simplicity the speed of light is taken as the unit of
speed and the mean free path μ_s⁻¹ as the unit of length.
The scattering of photons takes a simple form in an orthonormal coordinate
system attached to the moving photon itself where n is the
photon's propagation direction prior to scattering and m is an arbitrary unit vector
not parallel to n (see Fig. 13.1). The distribution of the scattering angle θ ∈ [0, π]
is given by the phase function of the medium and the azimuthal angle φ is uni-
formly distributed over [0, 2π). For one realization of the scattering event with angles
(θ, φ) in the photon coordinate system, the outgoing propagation direction n' of
the photon will be:
FIG. 13.2. The average photon propagation direction (a vector) decreases as gⁿ,
where g is the anisotropy factor and n is the number of scattering events.
The freedom of choice of the unit vector m reflects the arbitrariness of the xy axes
of the photon coordinate system. For example, taking m = (0, 0,1), Eq. (13.2)
gives
Similar equalities are obtained for x and y components as the labels are rotated
due to the symmetry between x,y,z directions. The correlations between the
propagation directions are hence given by
On the other hand, the correlation between s⁽ⁱ⁾ and s⁽ʲ⁾ (j > i) can be reduced
to a correlation of the form of Eq. (13.7) owing to the following observation
where p(s⁽ʲ⁾|s⁽ⁱ⁾) means the conditional probability that a photon jumps from s⁽ⁱ⁾
at the ith step to s⁽ʲ⁾ at the jth step. Equation (13.8) is a result of the Chapman-
Kolmogorov condition (2.17), p(s⁽ʲ⁾|s⁽ⁱ⁾) = ∫ ds⁽ʲ⁻¹⁾ p(s⁽ʲ⁾|s⁽ʲ⁻¹⁾) p(s⁽ʲ⁻¹⁾|s⁽ⁱ⁾),
of the Markov process, and of the fact that ∫ ds⁽ʲ⁾ s_α⁽ʲ⁾ p(s⁽ʲ⁾|s⁽ʲ⁻¹⁾) = g s_α⁽ʲ⁻¹⁾ from
Eq. (13.3). Combining Eqs. (13.7) and (13.8), and using the initial condition of
s⁽⁰⁾, that is,
we conclude
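The conclusion — that the mean propagation direction decays as ⟨s⁽ⁿ⁾⟩ = gⁿ s₀, as illustrated in Fig. 13.2 — can be checked with a Monte Carlo sketch of the scattering rotation described above. A Henyey-Greenstein phase function is assumed here purely for concreteness; only its mean cosine g enters the result:

```python
import numpy as np

rng = np.random.default_rng(0)

def scatter(n, g):
    """One scattering event: sample (theta, phi) and rotate the direction n."""
    # cos(theta) sampled from the Henyey-Greenstein distribution (assumed choice)
    xi = rng.uniform()
    ct = (1 + g*g - ((1 - g*g) / (1 - g + 2*g*xi))**2) / (2*g)
    st = np.sqrt(max(0.0, 1.0 - ct*ct))
    phi = rng.uniform(0.0, 2.0*np.pi)
    # orthonormal frame attached to the photon, from an arbitrary m not parallel to n
    m = np.array([1.0, 0.0, 0.0]) if abs(n[2]) > 0.9 else np.array([0.0, 0.0, 1.0])
    u = np.cross(m, n); u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return st*np.cos(phi)*u + st*np.sin(phi)*v + ct*n

g, nsteps, nphot = 0.8, 5, 20000
sz = np.zeros(nsteps)
for _ in range(nphot):
    n = np.array([0.0, 0.0, 1.0])          # s0 along z
    for k in range(nsteps):
        n = scatter(n, g)
        sz[k] += n[2]
sz /= nphot

print(sz)                              # Monte Carlo <s_z> after 1..nsteps scatterings
print(g ** np.arange(1, nsteps + 1))   # predicted g^n
```

Because the azimuthal angle is uniform, only the ⟨cos θ⟩ = g factor survives each averaging step, which is exactly the Chapman-Kolmogorov argument leading to the gⁿ decay.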
The connection between the macroscopic physical quantities describing the photon dis-
tribution and the microscopic statistics of the photon propagation direction is made
by the probability, p_n(t), that the photon has undergone exactly n scattering events
before time t (the (n + 1)th event comes at t). We claim p_n(t) obeys the general-
ized Poisson distribution. This claim was previously proved by Wang and Jacques
(1994):
which is the Poisson distribution for the number of scattering events, with expected
rate of occurrence μ_s, multiplied by an exponential decay factor due to absorption.
Here we have used μ_s⁻¹ = 1 as the unit of length. This form of p_n(t) can be easily
verified by recognizing first that p₀(t) = exp(−t) equals the probability that the
photon experiences no events within time t (and the first event occurs at t); and
second that the probability p_{n+1}(t) is given by
MACROSCOPIC STATISTICS 233
in which the first event, occurring at t', is a scattering and is followed by n scattering
events up to but not including time t; this confirms Eq. (13.11) at n + 1 if
Eq. (13.11) is valid at n. The total probability of finding a photon at time t
revealing that the center of the photon cloud moves along the incident direction for
one transport mean free path l_t before it stops (see Fig. 13.3).
The second moment of the photon density is calculated as follows. Denote by
p(s₂, t₂|s₁, t₁) the conditional probability that a photon jumps from a propagation
direction s₁ at time t₁ to a propagation direction s₂ at time t₂ (t₂ > t₁ > 0). The
conditional correlation of the photon propagation direction subject to the initial
condition is given by
Denote the numbers of scattering events encountered by the photon at states (s₁, t₁)
and (s₂, t₂) as n₁ and n₂, respectively. Here n₂ ≥ n₁ since the photon jumps from
(s₁, t₁) to (s₂, t₂). Equation (13.17) can be rewritten as
where C_{n₂}^{n₁} = n₂!/[(n₂ − n₁)! n₁!] and we have repeatedly used the binomial
expansion (a + b)ⁿ = Σ_{0≤k≤n} C_n^k a^{n−k} b^k. With this result, we obtain
This exact result for ⟨s_β(t₂) s_α(t₁)⟩ can be easily verified to agree with the
regression theorem discussed in Chapter 8.
The second moment of the position is then
after integration. Our main result, Eqs. (13.16) and (13.22), agrees with Eqs.
(14.31)-(14.33) in Chapter 14 derived by the cumulant expansion.
The general form of the photon distribution depends on all moments of the
distribution. However, after a sufficiently large number of scattering events have
taken place, the photon distribution approaches a Gaussian distribution over space
according to the central limit theorem (Kendall 1999). This asymptotic Gaussian
distribution, characterized by its central position and half-width √(2Dt), is then
where the normalizing factor is C(t) = exp(−μ_a t) owing to Eq. (13.13). This
provides a "proper" diffusion solution to radiative transfer, revealing a picture
of light propagation in which photons migrate with a center that advances in time, and
with an ellipsoidal contour that grows and changes shape (see Fig. 13.3).
It is also worth mentioning that the absorption coefficient appears in
the generalized Poisson distribution p_n(t) only through an exponential decay factor
exp(−μ_a t). This exponential factor is canceled in the evaluation of the condi-
tional moments of the photon distribution, see Eqs. (13.17) and (13.18). Hence, the
sole role played by absorption is to annihilate photons; it affects neither the shape
of the distribution function nor the diffusion coefficient (Durduran et al. 1997; Cai
et al. 2002).
The results, except for the Gaussian photon distribution Eq. (13.23), are exact
under the sole assumption of a Markov random process of photon migration. The
deviation from a Poisson distribution of scattering or absorption events can be
dealt with by modifying pn(t). The Markov random process is usually a good
description of scattering due to short-range forces such as photon migration in
turbid media. In situations where interference of light is appreciable, the phase
of the photon, which depends on its full past history, must be considered, and this
is non-Markovian. Non-Markov processes may also occur in scattering involving
long-range forces such as Coulomb interaction between charged particles in which
the many-body effect cannot be ignored.
FIG. 13.3. The center of a photon cloud approaches l_t along the incident direction
and the diffusion coefficient approaches l_t/3 with increase of time.
We should finally point out that this treatment is for a scalar photon. Light is
a vector wave. The vector nature produces some intriguing effects in multiply scat-
tered light, including the polarization memory effect, where light polarization is
preserved over long distances at which light is already diffusing. A scattering matrix,
as opposed to the scalar phase function, needs to be used to describe polarized light
scattering in turbid media. Nevertheless, the simple picture of a random walk of
light can be generalized to treat propagation and depolarization of polarized light in
turbid media. Characteristic lengths governing depolarization of multiply scattered
light can be determined analytically and explain the observed memory effects. The
interested reader may refer to Xu and Alfano (2005, 2006) and references therein.
14
Analytical solution of the elastic transport equation
14.1 Introduction
where μ_s is the scattering rate, μ_a is the absorption rate, and P(s, s') is the phase
function, normalized to ∫ P(s, s') ds' = 1. When the phase function depends only
on the scattering angle in an isotropic medium, we can expand the phase function
in Legendre polynomials with constant coefficients,
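Such an expansion can be sketched numerically. Here a Henyey-Greenstein phase function is assumed as an example, and the coefficient convention a_l = 2π(2l + 1)∫P(cos θ)P_l(cos θ) d cos θ is an illustrative choice (normalization conventions vary); with it, Gauss-Legendre quadrature recovers a_l = (2l + 1)gˡ, consistent with the relation g_l = μ_s[1 − a_l/(2l + 1)] quoted later in this chapter:

```python
import numpy as np

g = 0.6   # assumed anisotropy factor

def hg(ct):
    """Henyey-Greenstein phase function, normalized so its solid-angle integral is 1."""
    return (1 - g*g) / (4*np.pi * (1 + g*g - 2*g*ct)**1.5)

x, w = np.polynomial.legendre.leggauss(64)   # quadrature nodes/weights on [-1, 1]
for l in range(4):
    Pl = np.polynomial.legendre.Legendre.basis(l)(x)
    # a_l = 2*pi*(2l+1) * Int P(cos t) P_l(cos t) d cos t   (assumed convention)
    a_l = 2*np.pi * (2*l + 1) * np.sum(w * hg(x) * Pl)
    print(l, a_l, (2*l + 1) * g**l)   # numeric coefficient vs (2l+1) g^l
```

With this convention a₀ = 1 expresses the normalization of P(s, s'), and a₁/3 = g = ⟨cos θ⟩ gives the anisotropy factor.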
A difficulty in solving Eq. (14.1) is that the term v s · ∇_r f(r, s, t) makes the
spherical harmonic components of f(r, s, t) couple with each other. We first study
the dynamics of the distribution in direction space, F(s, s₀, t), on a spherical sur-
face of radius 1. The kinetic equation for F(s, s₀, t) can be obtained by integrating
Eq. (14.1) over the whole spatial space, r. The spatial independence of μ_s, μ_a, and
P(s, s') preserves translational invariance. Thus the integral of Eq. (14.1) obeys
Since the integral of the gradient term over all space vanishes, in contrast to Eq.
(14.1), if we expand F(s, s₀, t) in spherical harmonics, its components do not
DERIVATION OF CUMULANTS TO AN ARBITRARILY HIGH ORDER 239
couple with each other. Therefore, it is easy to obtain the exact solution of Eq.
(14.3):
where
Two special values of g_l are: g₀ = 0, which follows from the normalization
of P(s, s'), and g₁ = v/l_tr, where l_tr is the transport mean free path, defined by
l_tr = v/[μ_s(1 − ⟨cos θ⟩)], where ⟨cos θ⟩ is the average of s · s' with P(s, s') as
weight. In Eq. (14.4), Y_{lm}(s) are spherical harmonics normalized to 4π/(2l + 1).
Equation (14.4) serves as the exact Green's function of particle propagation
in the velocity space. Since in an infinite uniform medium this function is inde-
pendent of the source position, r₀, the requirements for a Green's function are
satisfied, and in particular the Chapman-Kolmogorov condition (see Section 2.4) is
obeyed:
where ⟨...⟩ means the ensemble average in the velocity space. The first delta func-
tion imposes that the displacement, r, is given by the path integral. The second
delta function assures the correct final value of the direction. Equation (14.6) is an
exact formal solution of Eq. (14.1), but cannot be evaluated directly. We make a
Fourier transform of the first delta function in Eq. (14.6), then make a cumulant
expansion (for a detailed explanation of cumulants, see Section 1.7), and obtain
240 ANALYTICAL SOLUTION OF THE ELASTIC TRANSPORT EQUATION
where T denotes time-ordered multiplication. In Eq. (14.7), the index c denotes
the cumulant, which is defined as ⟨A⟩_c = ⟨A⟩, ⟨A²⟩_c = ⟨A²⟩ − ⟨A⟩⟨A⟩. A general
expression relating the moments ⟨A^m⟩ and the cumulants ⟨A^m⟩_c is given by:
Hence, if ⟨A^m⟩ (m = 1, 2, ..., n) have been calculated, ⟨A^m⟩_c (m = 1, 2, ..., n) can
be recursively obtained, and conversely.
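The recursion can be sketched as follows. It uses the standard moment-cumulant relation m_n = Σ_{k=1}^{n} C(n−1, k−1) c_k m_{n−k} (assumed here to match the book's convention for Eq. (14.8)), and its inversion, checked against the Gaussian case where all cumulants beyond the second vanish:

```python
from math import comb

def moments_from_cumulants(c, nmax):
    # m_n = sum_{k=1..n} C(n-1, k-1) * c_k * m_{n-k},  with m_0 = 1
    m = [1.0]
    for n in range(1, nmax + 1):
        m.append(sum(comb(n-1, k-1) * c[k-1] * m[n-k] for k in range(1, n+1)))
    return m

def cumulants_from_moments(m, nmax):
    # the same recursion solved for c_n ("and conversely")
    c = []
    for n in range(1, nmax + 1):
        c.append(m[n] - sum(comb(n-1, k-1) * c[k-1] * m[n-k] for k in range(1, n)))
    return c

# Gaussian example: cumulants (0, 1, 0, 0, ...) give moments 1, 0, 1, 0, 3, 0, 15
m = moments_from_cumulants([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], 6)
print(m)                              # [1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0]
print(cumulants_from_moments(m, 6))   # recovers [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Truncating the cumulant list after the second entry is exactly the Gaussian (second-cumulant) approximation used later in this chapter.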
In the following, we derive the analytical expression for the ensemble average
⟨∫₀ᵗ dt_n ... ∫₀ᵗ dt₁ T[s_{j_n}(t_n) ... s_{j_1}(t₁)]⟩. Using a standard time-dependent Green's
function approach, it is given by
with the row index (from above) j = —1,0,1 and the column index (from the left)
i = 1,0, — 1. The orthogonality relation of spherical harmonics is given by
Using Eq. (14.11) and Eq. (14.13), the integrals over ds_n ... ds₁ in Eq. (14.9)
can be analytically performed. We obtain, when s₀ is set along z, that
Note that all ensemble averages have been performed. Equation (14.15) involves
integrals of exponential functions, which can be analytically performed. Equation
(14.15) includes all related scattering and absorption parameters, g_l, l = 0, 1, ...
and μ_a, and determines the time evolution dynamics. The final particle direction,
s, appears as the argument of the spherical harmonics Y_{lm}(s) in Eq. (14.14). Sub-
stituting Eq. (14.15) into Eq. (14.14), and using a standard cumulant procedure,
the cumulants as functions of angle s and time t up to an arbitrary nth order can
be analytically calculated. The final position, r, appears in Eq. (14.7), and its com-
ponents can be expressed as |r| Y_{1j}(r̂), j = 1, 0, −1, with r and r̂, separately,
the magnitude and the unit direction vector of r. Then, performing a numerical
three-dimensional inverse Fourier transform over k, an approximate distribution
function, f(r, s, t), accurate up to the nth cumulant, is obtained.
By a cut-off at the second cumulant, the integral over k in Eq. (14.7) can be analyt-
ically performed, which directly leads to the Gaussian spatial distribution displayed
in Eq. (14.17). The exact first cumulant provides the correct center position of the
distribution. The exact second cumulant provides the correct half-width of the spread
of the distribution. The expressions below are given in Cartesian coordinates with
indexes α, β = [x, y, z]. These expressions are obtained by use of a unitary trans-
form s_α = U_{αj} s_j, j = 1, 0, −1, from Eq. (14.14) (up to second order), which is
based on s_j = Y_{1j}(s), with
We set s₀ along the z direction and denote s as (θ, φ). Our cumulant approximation
to the particle distribution function is given by
with the center of the packet (the first cumulant), denoted by rc, located at
B_yz is obtained by replacing cos φ in Eq. (14.25) by sin φ. In Eqs. (14.22)-(14.25)
In contrast to Eqs. (14.18), (14.19) and (14.22)-(14.25), the results for N(r, t)
are independent of g_l for l ≥ 2. Each distribution in Eq. (14.17) and Eq. (14.30)
describes a particle "cloud" anisotropically spreading from a moving center, with
time-dependent diffusion coefficients. At early times t → 0, f(g_l) ≈ t + O(t²) in
Eq. (14.20), and E⁽ʲ⁾ ≈ t²/2 + O(t³) for j = 1, 2, 3, 4 in Eqs. (14.26)-(14.29).
From Eqs. (14.18), (14.19), Eqs. (14.22)-(14.25), and Eqs. (14.31)-(14.33), we
see that for the density distribution, N(r, t), and the dominant distribution func-
tion, that is, I(r, s, t) along s = s₀, the center moves as vts₀ and the B_{αβ} in Eq.
(14.21) are proportional to t³ as t → 0. These results present a clear picture of
nearly ballistic motion at t → 0. With increase of time, the motion of the center
IMPROVING CUMULANT SOLUTION OF THE TRANSPORT EQUATION 245
FIG. 14.1. The moving center of photons, R_z, and the diffusion coefficients, D_zz
and D_xx, as functions of time, where the g_l are calculated by Mie theory, assuming
water droplets with a/λ = 1, with a the radius of a droplet and λ the wavelength
of light, and the index of refraction m = 1.33.
slows down, and the diffusion coefficients increase from zero. This stage of parti-
cle migration is often called a "snake-like mode". At large times, the distribution
function tends to become isotropic. The particle density, at t ≫ l_tr/v and r > l_tr,
tends towards the center-displaced (by l_tr) diffusion solution with the diffusion coeffi-
cient l_tr/3. Therefore, our solution quantitatively describes how particles migrate
from nearly ballistic motion to diffusive motion, as shown in Fig. 14.1.
Figure 14.2 shows the light distribution as a function of time at different
receiving angles in an infinite uniform medium, computed by the second cumu-
lant solution, where the detector is located at 5/tr from the source in the incident
direction of the source.
The analytical solution obtained, although it has the exact center and half-width, is
not satisfactory in two respects. First, at very early times, exp(−g_l t) → 1 for all l;
hence, one cannot ensure that the summation over l is convergent. Second, particles at
the front edge of the Gaussian distribution travel faster than the speed v, thus violating
causality.
246 ANALYTICAL SOLUTION OF THE ELASTIC TRANSPORT EQUATION
The moments of the ballistic component can be easily calculated. When s₀ is along
z, we have
hence, the moments of the scattered component can be obtained by subtracting the
corresponding ballistic moments from the moments of I(r, s, t). For example, we
have
Notice that
Substituting Eqs. (14.38) and (14.35) into Eq. (14.37), the corresponding cumulants for
the scattered component I^(s)(r, s, t) can be easily obtained; they are given by the following
replacements of Eqs. (14.4), (14.18), and (14.22):
The expressions for the other components of the first and second cumulants are
unchanged, provided every F(s, s₀, t) in G in Section 14.3 is replaced by
F^(s)(s, s₀, t). Note that Eq. (14.38) is actually equal to zero for s ≠ s₀; there is
no ballistic component in these directions.
The replacement of the equations in Section 14.3 by Eqs. (14.39)-(14.41) greatly
improves the calculation of cumulants at very early times. With the subtraction introduced
above, the terms for large l approach zero, and the summation over l becomes
convergent at very early times. Because g_l = μ_s[1 − a_l/(2l + 1)], which approaches
μ_s for large l, f(g_l − g_{l±1}) ~ t and E^(j) ~ t²/2 as t → 0, which results
in cancellation in the summand for large l at very early times.
An example of the successful use of this replacement is the calculation of backscat-
tering. When θ = 180°, P_l(cos θ) = 1 or −1, depending on whether l is even or odd.
The computed R_cz at very early times using Eq. (14.18) oscillates with the cut-off
in l. But the computed R_cz at very early times using Eq. (14.40) becomes sta-
ble. Calculation shows that R_cz^(s) = 0 at any time for any phase function when
θ = 180°.
Figure 14.3 shows the computed time profile of the backscattering intensity
I^(s)(r, s, t) at a detector centered at r = 0 with detection angle θ = 180°, compared
with the Monte Carlo simulation. The absolute value of the intensity, as well as the
shape of the time-resolved profile, computed using our analytical cumulant solu-
tion matches well with that of the Monte Carlo simulation. The inset diagram is
the same result drawn using a log scale for intensity. Note that this result for backscat-
tering, based on the solution of the transport equation, is for a detector located near the
source, different from other backscattering results based on the diffusion model,
which are only valid when the detector is located at a distance of several l_tr from
the source.
Figure 14.4 shows I(r, s, t) with a detector located at z = 6 l_tr in front of the source
and receiving direction along θ = 0, computed using the analytical cumulant solu-
tion up to the tenth order of cumulants (solid curve), to the second order cumulants
(dotted curve), the diffusion approximation (thick dotted curve), and the Monte
Carlo simulation (discrete dots). The figure shows that the tenth order cumulant
solution is located in the middle of the data obtained by the Monte Carlo simu-
lation, and I(r, s, t) ≈ 0 before the ballistic time t_b = 6 l_tr/v. The second order
cumulant solution has nonzero I(r, s, t) before t_b, which violates causality. The
computed N(r, t)/4π using the diffusion model has a large discrepancy with the
Monte Carlo simulation, and the diffusion solution has more nonzero components
before t_b, which violates causality.
Using the second order cumulant solution, the distribution function can be com-
puted very quickly. The associated Legendre functions can be quickly computed using
recurrence relations with accuracy limited only by machine error. It takes about
a minute to produce 10^5 values of I(r, s, t) on a personal computer. On the other
hand, in order to reduce the statistical fluctuation to the level shown in Fig. 14.4, 10^9
events are counted in the Monte Carlo simulation, which takes tens of hours of com-
putation time on a personal computer. Computation of high order cumulants is also
a cumbersome task, because the number of terms involved grows rapidly with
the order n. Also, it has been proved that as long as some cumulants higher than
second order are nonzero, all cumulants up to infinite order must be nonzero (see
Section 8.3). Therefore, no matter how a cut-off at a finite order n > 2 is taken,
the cumulant solution of the Boltzmann transport equation cannot be regarded as
exact.
B. Reshaping the particle distribution. For practical applications, we use a
semiphenomenological model. The Gaussian distribution is replaced by a newly
shaped form, which maintains the correct center position and the correct half-width
of the distribution. The new distribution satisfies causality, namely, I(r, s, t) = 0
outside the ballistic limit, vt. There are an infinite number of choices of the shape of
the distribution under the above conditions. We choose a simple analytical form as
discussed later. At long times, the half-width of the distribution σ ~ (4B)^(1/2), with
B shown in Eq. (14.21), spreads as t^(1/2); hence σ ≪ vt at large t, and the Gaus-
sian distribution at long times with half-width σ can be regarded as lying completely inside
the ballistic sphere. The new reshaped distribution of I(r, s, t), hence, should
approach the Gaussian distribution at long times.
where R_cz and D_zz are given in Eqs. (14.31), (14.32). As shown in Fig. 14.5, although
the 1D Gaussian spatial distribution (the dashed curve) at time t = 2 l_tr/v, Eq.
(14.42), has the correct center and half-width, the curve deviates from the distri-
bution computed by the Monte Carlo simulation (dots), and a remarkable part of
the distribution appears outside the ballistic limit vt = 2 l_tr. At early times the
spatial distribution is not symmetric about the center R_c. While R_c moves from the
source toward the forward side, causality prohibits particles from appearing beyond vt.
This requires that the particles on the forward side be squeezed into a narrow region
between R_c and vt. To balance the parts of the distribution on the forward and
backward sides of R_c, the peak of the distribution should move to a point on the
forward side and the height of the peak should increase. Based on this observation
we propose the following analytical prescription: (1) move the peak position of
the distribution from R_cz to z_c, where the parameter z_c will be determined later; (2)
take this point as the origin of the new coordinates; and (3) use the following
form for the shape of the 1D density in the new coordinates:
where
At the ballistic limit z = z±, N(z) reduces to zero, and N(z) = 0 when z is
outside z±. The parameter b in Eq. (14.43) can be determined by normalization;
the parameters (a, z_c) can be determined by fitting the center and half-width of the
distribution. This fit requires
The integrals in Eqs. (14.45), (14.46), and (14.47) can be performed analytically
in terms of the standard error function:
The solid curve in Fig. 14.5 shows the reshaped spatial distribution, Eq.
(14.43), of the 1D density at time t = 2 l_tr/v, using the Henyey-Greenstein phase
function with g = 0.9, which satisfies causality and matches the Monte Carlo
result much better than the Gaussian distribution.
For nonlinear fitting, a difficulty is how to quickly find the global minimum. The
optimization codes require setting good initial values of the parameters, so that the
local minimum obtained is the true global minimum. The following procedure is
used to quickly obtain the global minimum. In the long time limit, z_c ≈ R_cz and
a² ≈ (4 D_zz vt)^(−1), and the distribution approaches the original Gaussian distribution.
We set the parameters to these values at a long time t_m, and take them as initial values,
using a nonlinear fit, to determine the parameters at t_{m−1} = t_m − Δt, where
Δt is a small time interval. Then, we use the parameters at t_{m−1} as initial values to
determine the parameters at t_{m−2}. Step by step, the parameters over a whole time period
can be computed.
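The backward continuation just described can be sketched numerically. In the following illustrative sketch the residual function is a toy stand-in for the actual matching conditions (which involve R_cz, D_zz, and the error-function integrals of Eqs. (14.45)-(14.48)); the point is only the strategy of reusing each fitted parameter set as the initial guess at the next, earlier time:

```python
import numpy as np

def newton_2d(residual, p0, t, tol=1e-10, max_iter=50):
    """Solve residual(p, t) = 0 for a 2-vector p by Newton iteration
    with a finite-difference Jacobian."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = residual(p, t)
        if np.max(np.abs(r)) < tol:
            break
        J = np.empty((2, 2))
        h = 1e-7
        for j in range(2):
            dp = np.zeros(2)
            dp[j] = h
            J[:, j] = (residual(p + dp, t) - r) / h
        p = p - np.linalg.solve(J, r)
    return p

# Toy stand-in for the matching conditions: the "true" parameters
# (a, z_c) drift smoothly with time.
def residual(p, t):
    a, zc = p
    return np.array([a - 1.0 / (1.0 + t), zc - 0.5 * t])

# Walk backward from a long time t_m, reusing each fitted parameter
# pair as the initial guess at the next, earlier time.
times = np.linspace(5.0, 0.5, 10)
p = np.array([1.0 / (1.0 + times[0]), 0.5 * times[0]])  # long-time values
fits = []
for t in times:
    p = newton_2d(residual, p, t)
    fits.append((t, p.copy()))
```

Because each time step is small, the previous solution is already near the new minimum, and the local solver converges to the correct (global) solution at every step.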
3D density. In this case the ballistic limit is represented by a sphere with center
located at the source position and radius vt. We move the peak position of the
distribution from R_cz to z_c along the s₀ = z direction, take this point as the origin
of the new coordinates, and use the following form for the shape of the 3D density as a
function of the position in the new coordinates, r̃:
where N(r̃) = 0 when r̃ > r̃*, and χ is the polar angle of r̃ in the new coordinates;
r̃* is the distance between the new origin and the point obtained by extrapolating r̃ to the
surface of the ballistic sphere:
The parameter b can be determined by normalization; the parameters (a_∥, a_⊥, z_c)
are determined by fitting the center and half-width of the distribution. This fit
requires
before t_b = 3 l_tr/v has been completely removed in the reshaped form, while
the Gaussian distribution has nonzero components before t_b. The reshaped time
profile matches the result of the Monte Carlo simulation over most time periods,
but the peak value is about 20% lower. The errors are much smaller than those of
the Gaussian distribution. By integration over time, the density for the steady state
can be obtained. The difference in the steady state density between the reshaped
analytical model and the Monte Carlo simulation is about 3%.
Distribution function I^(s)(r, s, t). When the detector is located less than 8 l_tr from
the source in a medium with large g-factor, the distribution function I^(s)(r, s, t) is
highly anisotropic, and the intensity received strongly depends on the angle. One
needs to use the photon distribution function I^(s)(r, s, t) instead of the photon
density N(r, t).
In this case the center position r_c, as a function of (s, t), is not located on the
axis along the incident direction s₀. Without loss of generality, we set the scattering plane
(s, s₀) as the x-o-z plane. The center position now is located at r_c = (r_cx, 0, r_cz).
The orientations and lengths of the axes of the ellipsoid, which characterize the half-
width of the spread of the distribution, can be computed as follows. The nonzero
FIG. 14.7. Schematic diagram describing the geometry of the particle spatial
distribution for scattering along a direction s ≠ s₀. At a certain time t, the center
of the distribution is located at r_c. The half-width of the spread is characterized
by an ellipsoid (the gray area). The large sphere represents the ballistic limit.
The origin of the new coordinates is set by extending from |r_c| to z_c. r̃* is a point
obtained by extrapolating a position r̃ (in the new coordinates) to the surface of the
ballistic sphere, and the length r̃* is determined by Eq. (14.53).
the lengths and directions of the other two axes of the ellipsoid in the scattering plane
can be obtained. In fact, calculation shows that the direction of r_c is also the direc-
tion of one axis of the ellipsoid, since at a certain time t the direction r_c can
replace s as the unique special direction in the scattering plane. In order to reshape
the distribution we choose a new z axis along the r_c direction, move the peak
position of the distribution from |r_c| to z_c, and take this point as the origin of the
new coordinates (x̃, ỹ = y, z̃), as shown schematically in Fig. 14.7.
In the new coordinates we use a shaped form similar to that of the 3D density,
Eq. (14.52), while a(χ) in Eq. (14.52) is
where χ and φ are, respectively, the polar angle and the azimuthal angle of a position
r̃ in the new coordinates. The parameters (a_x, a_y, a_z, z_c) are determined by fitting the
center r_c and the lengths of the three axes of the ellipsoid characterizing the half-width
of the distribution. In many cases, the ellipsoid can be approximately treated as
an ellipsoid of revolution, with the length of the axis of the ellipsoid along the x
direction approximately equal to that along the y direction; then the computation
can be simplified. The new shaped distribution function I^(s)(r, s, t) for a certain
direction s is normalized to F^(s)(s, s₀, t).
Figure 14.8 shows the computed time profile of the distribution function
I^(s)(r, s, t), when the detector is located at 4 l_tr in front of the source, using the
Henyey-Greenstein phase function with g = 0.9. Figures 14.8(a) and (b) are, respec-
tively, for different directions of light s: θ = 0 and 30°. The solid curves are
for the reshaped form, Eq. (14.52), and the dashed curves are for the Gaussian form.
The dots are for the Monte Carlo simulation. The anisotropy of the distribution is shown by
comparing Fig. 14.8(a) and Fig. 14.8(b). The reshaped distribution removes
the intensity before t_b = 4 l_tr/v, which appears in the Gaussian distribution.
The reshaped distribution matches the Monte Carlo result much better than the
Gaussian distribution.
While causality, together with the correct center and half-width of the distribu-
tion, are the major controlling factors in determining the shape and the range of the
particle distribution, the detailed shapes are, to some extent, different for the
different models.
For s near the backscattering direction, the Gaussian distribution can be a
good approximation, as shown in Fig. 14.3, because most particles suffer many
scattering events in transferring from the forward direction to the backward direction.
Our calculation shows that the center position r_c is close to the source for θ ≈ 180°
and far from the ballistic limit; hence, reshaping has little effect in the backscattering
case.
Besides improving convergence, separating the ballistic component from the
scattered component also provides a more proper time-resolved profile for trans-
mission. In the time-resolved transmission profile the ballistic component is
described by a sharp jump exactly at the ballistic time t_b, separated from the later scattered
component. The intensity of the ballistic component, compared to the scattered
component, strongly depends on the g-factor. For g = 0, l_tr = l_s, and the ballistic
component decays to exp(−1) = 0.368 at distance l_tr. But for g = 0.9 it decays
to exp(−10) = 4.54 × 10⁻⁵ at l_tr, because l_tr = 10 l_s. The jump of the ballis-
tic component can be seen in experiments on transmission of light in a medium of
small-sized scatterers (small g-factor), but is difficult to observe in a medium
of large-sized scatterers (large g-factor). Our formula provides a proper estimate
for both small and large g-factors by explicitly separating these two components.
Using the obtained analytical expressions, the distribution I(r, s, t) can be
computed very quickly. The cumulant solution has been extended to the polarized
photon distribution (Cai, Lax, and Alfano 2000c), and to semi-infinite
and slab geometries (Xu, Cai, and Alfano 2002; Cai, Xu, and Alfano 2003).
FIG. 14.8. Time-resolved profile of the photon distribution function, for light direc-
tions (a) θ = 0, (b) θ = 30°, where the detector is located at z = 4 l_tr
from the source along the incident direction, obtained by the reshaped form,
Eq. (14.52) (solid curves), and the Gaussian form (dashed curves), compared
with the Monte Carlo simulation (dots). The Henyey-Greenstein phase
function with g = 0.9 is used, with the absorption coefficient 1/l_a = 0.
If there were no noise, and if m(x) were measured with infinite precision, one
could invert the integral equation to write
where
i.e., K⁻¹ is the kernel inverse to K(x, y).
SOLUTION CONCEPTS 259
The difficulty is that all kernels K(x, y) perform some smoothing on their
input. Thus one could add to the solution s(x) a high frequency term A sin ωx
such that
for any small ε by choosing ω sufficiently large - for any A, even an intense A.
Thus, if our measured result is precise only to within a noise n(x) of order ε, many
different solutions are possible such that
One may object, however, that the solution s(x) should be smooth and not con-
tain a superposition of high frequency components. The answer is that the problem
is, in general, not well-posed (that is, with a unique solution) until the nature of the
smoothness is specified. Unfortunately, the smoothness of the solution may not be
known in advance. One is attempting to determine this from the measured results!
However, if one makes no specification of smoothness, it is difficult to tell which
of the frequencies in the measurement m(x) are properties of the signal, and which
are spurious. It is our opinion, and that of a number of others whose work will be
discussed shortly, that this issue can only be resolved by making a separate mea-
surement of the spectrum associated with the noise n(x) (or its autocorrelation
⟨n(x)n(x + x′)⟩). Not only the shape of the noise spectrum but also its intensity
is relevant, since Fourier components in a particular measurement much below the
corresponding noise intensity cannot be regarded as significant, and should be
excluded from the estimated signal ŝ(x). Thus we see that the correct procedure
for computing an estimate ŝ(x) from m(x) must be a nonlinear one.
We shall use as a guide to the literature the paper by Price (1982) and the extensive
review of statistical methods presented by Turchin, Kozlov and Malkevich (1971).
All of the methods to be discussed below reduce the continuum problem to
a discrete one. In the simplest case, the observation points used are at x_i and the
solution s(y) is to be evaluated at y_j. Equation (15.1) then reduces to a set of
coupled linear equations
Filtering
If noise is ignored and the kernel possesses translational invariance,
where m̂(p) and K̂(p) are the Fourier transforms of m(x) and K(x), respectively.
The ill-posed nature of the problem can be made evident by considering an
instrument K with a Gaussian line shape:
The use of Eq. (15.10) in Eq. (15.8) clearly produces a large enhancement of any
high frequency components in m̂(p). If there were no noise, m̂(p) would vanish
as p → ∞ more rapidly than K̂(p), so that a well-defined expression would result
for the signal s(x).
But the added noise n(x) can be white noise, meaning that n̂(p), and hence
m̂(p), do not fall off but remain constant as p → ∞. The most elementary way to
avoid this difficulty is to set
where F(p) is a filter factor that falls off rapidly as p → ∞, chosen to impose
the desired smoothness on s(x). The problem is the arbitrariness involved in
specifying F(p).
METHODS OF SOLUTION 261
Regularization
A second class of methods, known as regularization methods, replaces the problem
of minimizing ||Ks − m||² by a well-posed problem of the form
where D is a linear operator that measures the degree of nonsmoothness, say a sec-
ond derivative, and α is a parameter that determines the amount of nonsmoothness
allowed.
or
In Fourier space, the factor in brackets is the filter factor of Eq. (15.13), but the
matrices in Eq. (15.16) can be taken in any basis set φ_μ(x).
262 SIGNAL EXTRACTION IN PRESENCE OF SMOOTHING AND NOISE
Perhaps the earliest suggestion for regularization was made by Phillips (1962),
who proposed that, to keep the solution smooth, one should for fixed ||Ks − m||
minimize
But then
so that
which is simply the statement that the spectrum of s″(x) is k⁴ times that of s(x).
A more general discussion of regularization is given by Tikhonov (1977).
Iteration
One of the earliest deconvolution schemes is the iteration scheme of van Cittert
(1931). In this scheme one starts with
and passes from the μth to the (μ + 1)th iterate according to
which is, in effect, a Jacobi iterative solution of the simultaneous equations. (The
prime on the sum omits the diagonal j = i term.) Jansson (1970) proposed an
overrelaxation scheme of the form
where κ need not equal unity. In this case, all components of s are updated
simultaneously. If, in updating any component, the updated values of earlier com-
ponents are used, we get a generalization (for κ ≠ 1) of the Gauss-Seidel iteration
procedure:
However, the matrix K_ij will always be ill-conditioned, Eq. (15.26) will be dis-
obeyed, and convergence will never occur. Jansson succeeds, however, for a
different reason. At the start, and after each iteration, he applies a smoothing
procedure of the form

with a = 0, b = 1, s(y) = 1, and m(x) computed from this integral with a kernel
that arises from potential theory:
He converts the integral equation to a set of simultaneous equations using
Simpson's rule with n points. He solves for s(0) [exact s(0) = 1], with the results:
"Contrary to what we might expect at first sight, the larger the number of points,
the worse the results are; the smoother the kernel, the worse the results are".
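The quoted behavior is easy to reproduce qualitatively. The potential-theory kernel of the original example is not reproduced here; the sketch below uses an assumed smooth kernel 1/(1 + (x − y)²), trapezoidal rather than Simpson weights for brevity, and a perturbation near the round-off level. The recovered s(0) deteriorates as the number of points grows:

```python
import numpy as np

def solve_for_s0(n):
    """Discretize int_0^1 K(x, y) s(y) dy = m(x) with trapezoidal
    weights, build m from the exact solution s = 1, perturb m at the
    round-off level, and solve the linear system; return |s(0) - 1|."""
    x = np.linspace(0.0, 1.0, n)
    w = np.full(n, x[1] - x[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    K = 1.0 / (1.0 + (x[:, None] - x[None, :]) ** 2)  # smooth stand-in kernel
    A = K * w[None, :]
    m = A @ np.ones(n)                    # exact data for s = 1
    m = m + 1e-13 * (-1.0) ** np.arange(n)  # tiny oscillatory perturbation
    s = np.linalg.solve(A, m)
    return abs(s[0] - 1.0)

# "The larger the number of points, the worse the results":
err_coarse = solve_for_s0(8)
err_fine = solve_for_s0(24)
```

Refining the grid adds more and more nearly-dependent rows, so the condition number explodes and the tiny perturbation is amplified catastrophically.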
Franklin's method
Our section heading is borrowed from the title of a lucid contribution by Franklin
(1970). Our description of Franklin's work follows that of Shaw (1972), whose
improvement will be detailed in the next section. In our notation, Franklin
considers the solution of the problem
where the noise n has given Gaussian statistics, and the signal s also has given
statistics. The statistics of m are derived from the corresponding statistics of s and
n. Presumably, the statistics of the noise n can be obtained by measurements in
the absence of a signal. The aim is to construct a linear operator L such that an
estimate ŝ of s can be constructed from m via
If we vary L* in Eq. (15.32) and use the cyclic property of the trace, we get
in agreement with Shaw's (3.15). Note, however, that the statistics of m are deter-
mined by those of n and s according to Eq. (15.30): the scalar product with s
yields
or
The explicit result can be written as a matrix relation by suppressing the subscripts:
Thus
In the usual case, in which the noise is uncorrelated with the signal, R_sn = R_ns = 0,
and L reduces to:
This is the result used by Franklin. We can verify that, when the noise is neglected,
L reduces to the inverse of K.
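A sketch of Franklin's estimator for uncorrelated noise, L = R_ss Kᵀ(K R_ss Kᵀ + R_nn)⁻¹, in matrix form; the kernel and the prior covariances R_ss and R_nn below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
x = np.linspace(0.0, 1.0, N)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.05**2)) * (x[1] - x[0])

# Assumed prior statistics: smooth zero-mean signals with covariance
# Rss, white noise with covariance Rnn.
Rss = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.1**2))
Rnn = (0.01**2) * np.eye(N)

# Franklin's estimator when noise is uncorrelated with the signal:
#   L = Rss K^T (K Rss K^T + Rnn)^(-1),   s_hat = L m
L = Rss @ K.T @ np.linalg.inv(K @ Rss @ K.T + Rnn)

# Draw one signal from the prior, blur it, add noise, and estimate.
s = np.linalg.cholesky(Rss + 1e-8 * np.eye(N)) @ rng.standard_normal(N)
m = K @ s + 0.01 * rng.standard_normal(N)
s_hat = L @ m
```

The estimate is strongly correlated with the drawn signal, even though the measurement is both blurred and noisy; the prior covariance is what keeps the inversion stable.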
Franklin's procedure had one significant defect: he assumed that the signal s was
drawn from a space in which the mean value ⟨s⟩ = 0. In addition, one often wishes
to use a number M of measured values m_j significantly larger than the number N
of signal values s_j to be estimated. Franklin's procedure requires the solution of M
simultaneous equations. Shaw has produced an algorithm that requires inversion
only of a smaller N × N matrix, as one does in a least squares calculation. In
addition, he makes an initial estimate ŝ⁽⁰⁾ of the signal and iterates, assuming
⟨s⟩ = ŝ⁽ⁿ⁾ in computing ŝ⁽ⁿ⁺¹⁾.
We first note that a simple least squares algorithm requires the minimization
of
Here K†K has the reduced N × N size. This problem now has the same form as
Franklin's original problem with the replacements
The estimate based on Franklin's procedure, but using the reduced matrices, takes
the form:
SHAW'S IMPROVEMENT OF FRANKLIN'S ALGORITHM 267
or in expanded form:
Now, R_mm in Eq. (15.56) has a factor K†K on the left, so that its reciprocal takes
the form
where I is a unit matrix in the ji space. Then the estimated signal, Eq. (15.56),
can be rewritten as:
Since the matrix R_ss K†K commutes with itself as well as with the unit matrix I,
we can move it through to the right to obtain:
Our aim is to determine ⟨s⟩ = h from the measured data. Franklin's procedure,
Eqs. (15.34) and (15.37), is modified to
in its original form. In the reduced form, Eq. (15.56) is replaced by:
In the case of white noise we can return to Eq. (15.60) and replace s by s − h, and
m by m − h. After the subtraction involving the h terms is performed, we get the
simplified result:
Shaw then provides a starting estimate ŝ⁽⁰⁾ from the least squares equation
by assuming that K†K is sufficiently sharp that s_j can be replaced (on the left) by
ŝ_j, with the result
which is just the least squares solution. Thus the iterative procedure should
eventually become unstable!
STATISTICAL REGULARIZATION 269
These fluctuations may represent actual noise that contaminates the signal. How-
ever, even when the signal is not contaminated by noise, but noise is only added later,
the Franklin and Shaw procedures would break down (the problem becomes ill-
posed) if R_ss were set equal to zero. Another interpretation is that we impose on
the problem an a priori distribution P([s]) of possible signals. This distribution,
for example, should give weight to our prejudice that the s_j = s(x_j) arise from
a smooth function s(x). Phillips's regularization procedure emphasized this
point by adding a term in the minimization procedure
which becomes large if s(x) becomes highly oscillatory. These ideas can be cast
in the language of statistical decision theory. Let
If we do not know the a priori probability in detail but only its correlation matrix
Nonlinear methods have been introduced in connection with the problem of image
restoration. These methods recognize that an image is likely to have sharp edges.
The methods introduced consist of a mixture of a regularized solution with the
unregularized result, with the degree of admixture varied in a local manner that is
sensitive to the gradient of the measured signal. This procedure reduces the amount
of undesirable smoothing that occurs in the vicinity of an edge. But no investiga-
tion has been made of the stability of these new procedures. The work of Abramatic
and Silverman (1982) is based upon a procedure introduced in geophysics by
Backus and Gilbert (1970) and on the work of Frieden (1975).
The idea of Abramatic and Silverman is to allow the regularization parameter,
which controls the smoothness of the solution, to adapt to the local characteristics of
the image (a flat field or an edge). This was done by taking into account the mask-
ing effect of the human eye. The eye is quite sensitive to a small amount of noise
in a flat field, but is able to tolerate a large amount of noise in the surroundings of
an edge. In their procedure, the masking function is estimated in the form
from the noisy image, where g(i, j) is the gradient of the image at the pixel (i, j)
and d₀ is of the order of the typical size of an edge. The amount of regulariza-
tion at each pixel (i, j) is scaled by a visibility function, f(M(i, j)), a monotonically
decreasing function from 1 to 0 as M goes from 0 to ∞. Abramatic and Silverman
used the visibility function
where a > 0 is a tuning parameter. Via the visibility function, stronger regular-
ization is then applied to the flat field, where M is small, and weaker regularization
near an edge, where M is large.
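The adaptive weighting can be sketched as follows. The masking estimate here is an illustrative stand-in built from the gradient magnitude (the exact masking form used by Abramatic and Silverman differs in detail), combined with the monotone visibility function f(M) = 1/(1 + aM):

```python
import numpy as np

def visibility_weights(image, d0=2.0, a=1.0):
    """Per-pixel regularization weights from a gradient-based masking
    estimate M and the visibility function f(M) = 1/(1 + a*M).
    f is 1 in flat fields (strong smoothing allowed) and tends to 0
    near edges (little smoothing)."""
    gy, gx = np.gradient(image.astype(float))
    M = np.hypot(gx, gy) / d0
    return 1.0 / (1.0 + a * M)

# A flat field with one vertical edge: full regularization weight in
# the flat region, reduced weight on the edge.
img = np.zeros((8, 8))
img[:, 4:] = 10.0
w = visibility_weights(img)
```

Scaling a per-pixel regularization parameter by `w` then smooths the flat field strongly while leaving the edge nearly untouched.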
In short, nonlinear image restoration is much harder than linear restoration. An
excellent summary of image restoration can be found in G. Demoment (1989).
16
Stochastic methods in investment decision
A forward contract is an agreement by one person to sell, at a time T (in years), for
K dollars (at delivery), an asset whose value at the current time t (in years) is S.
The forward price F is that delivery price K chosen to make the value of
the contract zero. If the interest rate r on risk free money were zero, we would
have F = S. However, if the delivery price is K, one only needs cash equal to
K exp[−r(T − t)] at the present time to be able to pay K at the time T − t later.
If we assign f to be the value of the forward contract, then
since the first two items are equivalent to owning the third.
The forward price F is then the value of K that makes f = 0.
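In code, these two relations amount to the following minimal sketch (function names are ours; the relations are f = S − K exp[−r(T − t)] and F = S exp[r(T − t)] from the discussion above):

```python
import math

def forward_value(S, K, r, tau):
    """Value of a forward contract with delivery price K, time to
    delivery tau = T - t, spot price S, and risk free rate r."""
    return S - K * math.exp(-r * tau)

def forward_price(S, r, tau):
    """The delivery price that makes the contract value zero."""
    return S * math.exp(r * tau)
```

By construction, `forward_value(S, forward_price(S, r, tau), r, tau)` is zero: the forward price is exactly the delivery price at which neither side pays for the contract.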
As an example, from the Wall Street Journal of Friday, May 22, 1998, we take the
price in dollars for 100 yen (Table 16.1).
Except for commission, which we neglect, the ratios of the 30 day, 90 day and
180 day forward prices describe the interest rate factor exp[r(T − t)] for the three
different periods. In the third column we list the rate r consistent with the above
data.
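Conversely, quoted spot and forward prices imply the rate. A small sketch (the quotes below are illustrative numbers, not the values of Table 16.1):

```python
import math

def implied_rate(spot, forward, tau):
    """Interest rate implied by a forward quote:
    F = S * exp(r * tau)  ->  r = ln(F / S) / tau."""
    return math.log(forward / spot) / tau

# Hypothetical spot and 90-day forward quotes (illustrative only):
r_90 = implied_rate(0.7300, 0.7350, 90.0 / 365.0)
```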
It would appear that Eq. (16.2) contains a hidden assumption that the price S
(at the initial time t) will be equal to the final price S_T at the settlement time T.
But this is not the case! An arbitrageur can buy the asset at the spot price S and
take the short (seller) side of the forward contract. To do this, he must borrow S
dollars at a total cost of S exp[r(T − t)]. When he sells the asset, he receives S_T
for a gain (possibly negative) of
From the forward contract, he receives F, but then must supply an asset of value
S_T, leading to a gain of
In the total gain, the value of S_T disappears. This result agrees with Eq. (16.2). Thus if F >
S exp[r(T − t)] he makes a risk free gain. If F < S exp[r(T − t)], an arbi-
trage in the opposite direction also yields a risk free gain. This expression, Eq.
(16.2), for the value of a forward contract is independent of the final value S_T of
the asset.
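The cancellation of S_T in the combined position can be checked directly (a minimal sketch with hypothetical numbers):

```python
import math

def arbitrage_gain(S, F, r, tau, S_T):
    """Total gain from buying the asset with borrowed cash and taking
    the short side of the forward contract; S_T cancels in the sum."""
    gain_spot = S_T - S * math.exp(r * tau)   # sell asset, repay the loan
    gain_forward = F - S_T                    # deliver into the contract
    return gain_spot + gain_forward

# The combined gain is the same whatever the final price S_T turns out
# to be, and is positive whenever F > S * exp(r * tau):
g1 = arbitrage_gain(100.0, 103.0, 0.05, 0.5, S_T=80.0)
g2 = arbitrage_gain(100.0, 103.0, 0.05, 0.5, S_T=140.0)
```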
Futures contracts, like forward contracts, are an agreement now to buy or sell an
asset in the future at an agreed upon delivery price. However, futures contracts are
handled on an exchange, such as the Chicago Board of Trade. To buy a futures con-
tract through a broker, a deposit, the initial margin, must be supplied to guarantee
delivery. This could be 20% of the value of the contract.
A VARIETY OF FUTURES 273
Thus to buy 100 ounces of gold at $400/ounce, a contract of $40,000 might
require an $8000 deposit. If the price of gold goes up by $10, the buyer of the futures
contract finds that his margin account has gone up by 100 × 10 = $1000. However,
any balance above the initial margin can be withdrawn. Even if not withdrawn,
additional interest is earned. Conversely, if the price goes down, the value of the
margin account declines by a corresponding amount, and the interest earned declines.
If the margin falls below a maintenance level, the investor will receive a margin
call to make up the difference. If the margin is not received, the broker sells the contract,
thus closing out the position.
In Appendix 2A of Hull (1989), or 3B of Hull (2001), it is established that if the
interest rate is the same for all maturities, futures contracts and forward contracts
should have identical prices. However, if interest rates change, particularly if they
change in a way that is correlated with changes in the price S, the equivalence no longer
holds. We shall ignore these fine points in the discussion that follows.
Typically, European contracts involve action only at the closing date. But
American puts and calls can be exercised at any intermediate date, or held to the
close. This introduces a need for a strategy as to when to take action, and can also
cause a modification of the price of the put or call.
The general behavior of futures contracts was described in the previous section,
but there are differences that depend on the nature of the assets. If we are dealing
with stock index futures, where the stock has a dividend yield q, the forward price,
Eq. (16.2), is modified to
Stock:
since the underlying asset has a dividend yield q that partly compensates for the
interest rate r.
For futures contracts involving currencies, if the local currency has interest rate
r and the foreign currency has interest rate r_f, we get
since the yield in the foreign currency plays the role of a dividend.
Table 16.2 shows a decrease of price with maturity, corresponding to the fact
that interest rates in the US are less than those in Canada.
For gold and silver, the forward price is
Gold:
or
Gold:
where U is the present value of all storage costs over the life of the contract. If the
storage cost is proportional to the value of the gold, with the storage cost u per
year per dollar of value, this is equivalent to a total storage cost
Black and Scholes (1973) have developed a procedure for estimating the appropri-
ate price for puts and calls which are forward contracts where the asset involved is
a stock. A similar contribution was made by Merton (1973) at the same time. (The
Nobel prize was shared for this work.)
For this purpose, they need a model for how the price S of a stock varies with
time. If the change in S is proportional to S, the growth is necessarily exponential.
In the absence of fluctuations, then, the model assumes

dS/dt = μ S,     (16.11)

where the growth rate μ in the stock price can presumably be estimated from the
growth rate in earnings of the stock. In the lowest order, one might expect the stock
to execute a Brownian motion with fluctuation parameter σ_H S. But this choice has
two disadvantages. The first, as remarked in Hull (1989), Section 3.3, and Hull
(2001), Section 10.3, is that investors expect to derive a return as a percentage of
the stock value, independent of the price. Thus they classify stocks by their growth
rate. To add Brownian motion, Eq. (16.11) is rewritten in the form
Hull:   dS/S = μ dt + σ_H dz,     (16.12; 3.7|10.6)

where (3.7) refers to Hull (1989) and (10.6) refers to Hull (2001). Hull, of course,
does not use the subscript H in his work. Here dz is the differential of a Wiener
A MODEL FOR STOCK PRICES 275
process of pure Brownian motion, namely one whose mean remains zero, and
whose standard deviation is (Δt)^{1/2}.
If φ(m, s) describes a normal distribution with mean m and standard deviation s,
then the distribution of Δx = ΔS/S is described correctly by

φ(μ Δt, σ_H (Δt)^{1/2}),

to allow for the growth of the mean value m = μ Δt and the standard devia-
tion σ_H (Δt)^{1/2} with time. This correct result, stated as Eq. (3.8) in Hull (1989)
and Eq. (10.9) in Hull (2001), also avoids the second disadvantage. If S itself
were described as a Wiener process, the price S could reach unacceptable negative
values. In the accepted model, all that can happen is for In S to go negative. We
therefore advocate the use of Eq. (16.12) as the fundamental description of a model
for stock prices. The model, Eq. (16.12), defines μ as the slope of the ensemble
average of x(t) if an ensemble of measurements can be made. If not, one takes the
logarithm of one sample price series and asks for the slope of the best linear fit to
that series of price logarithms.
Now we ask: what is the Ito stochastic differential equation (ISDE) for the stock
S? Using the Ito calculus lemma, Eq. (10.41), with dS/dx = S and d²S/dx² =
S, this leads to the ISDE for S,

dS = (μ + σ_H²/2) S dt + σ_H S dz.     (16.14)
This simplest example already displays the subtlety of applying the Ito calculus
lemma. Yet this kind of manipulation has appeared in some well known books in
the financial area. For example, in Hull's Eq. (10.6), with the word "or", Eq. (16.12)
is rewritten as
Equations (16.12) and (16.16) appear to be regarded as equivalent by Hull and
the finance community. But we believe the second choice is not equivalent to the
first.
Before further analysis we must note an annoying but unimportant difference
in notation. The standard Wiener notation is equivalent to the correlation
whereas the customary physics notation would place a factor of 2 on the
right hand side of these equations. In this chapter we follow Hull's convention,
common in the economics world, and set
Equation (16.21) seems similar to Eq. (16.16); however, in our approach
the average of the product in the second term is not zero. Thus Eq. (16.21) should
not be regarded as an Ito equation.
The custom in the economics field, under the definition of the Ito integral, is to
replace Eq. (16.16) by the equation
Hull:
where t_c = t − ε is a slightly earlier time than t. This guarantees that the average of
the second term vanishes, because in an integration from t to t + Δt the time t_c is
not included in the region of integration. Equation (16.22) is then a true Ito
equation, but it is not equivalent to Eq. (16.21).
Do these models, Eq. (16.12) and Eq. (16.16), or Eq. (16.21) and Eq. (16.22),
yield different results?
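The two readings can be compared in a direct Monte Carlo experiment (a sketch with arbitrary parameters, not from the text): if x = ln S performs Brownian motion with drift μ, the mean price grows as exp[(μ + σ_H²/2)T], whereas Eq. (16.16), taken literally as an Ito equation, yields mean growth exp(μT).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T = 0.05, 0.4, 1.0
n_steps, n_paths = 250, 100_000
dt = T / n_steps

x = np.zeros(n_paths)       # Eq. (16.12): x = ln S does Brownian motion with drift
S_ito = np.ones(n_paths)    # Eq. (16.16) read literally as an Ito equation
for _ in range(n_steps):
    dz = rng.standard_normal(n_paths) * np.sqrt(dt)
    x += mu * dt + sigma * dz
    S_ito *= 1.0 + mu * dt + sigma * dz   # Euler-Maruyama step on dS = mu S dt + sigma S dz

S_log = np.exp(x)
print(S_log.mean())   # theory: exp((mu + sigma**2 / 2) * T)
print(S_ito.mean())   # theory: exp(mu * T)
```

The gap between the two sample means is exactly the σ_H²/2 drift shift discussed in this section.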
Equation (16.21) has been solved for a well behaved stochastic variable f(t)
(finite, not a δ function), such as one with a Gaussian correlation in time (see
Section 10.2). The average of the product in the second term is not zero in a finite
time interval, and it remains nonzero as one approaches the white noise limit by
letting the correlation time approach zero as Δt → 0. By specializing to the
delta-correlated case, Eq. (10.27) can be written in the form
Lax :
which is in agreement with Eq. (16.14). For the average, S on the right hand side
of Eq. (16.26) is replaced by ⟨S⟩.
In our Langevin expression, the first term represents motion driven by a finite
number of driving forces on the system and hence is a deterministic function of
time. The second term represents the fluctuation driven by many unknown random
forces. Under a transformation of variables, the ordinary calculus rule can be applied,
separately, to both terms. Hence, the meanings of drift and fluctuation remain clearly
separated in the first and second terms after the transformation of variables. The average
of the second term is generally nonzero when σ_H S is not a constant.
16.5 The Ito stochastic differential equation
where
The Riemann integral exists if the sum approaches a limit independent of the
placement of t_j in the interval in Eq. (16.29). The Riemann integral exists, accord-
ing to Jeffreys and Jeffreys (1950), when the integrand is bounded over the interval
of integration and, for any positive ω and η, the interval of integration can be
divided into a finite set of intervals such that those with hops (jump discontinu-
ities) > ω have a total length < η. Our point is that Brownian motion violates
these conditions. See Ito (1951) and Doob (1953).
Ito (1951) avoids the difficulty by evaluating σ(a) at the beginning of the
interval, and evaluating the integral over f(t) as a Stieltjes integral

However, even the Stieltjes sum does not converge to a unique integral, and
the evaluation at the beginning of the interval is an arbitrary choice. The effect of
this choice, since f(s) is independent of a(t) for t < s, is that the average of the
second term in Eq. (16.27) vanishes, so that the Ito drift vector is
where D = σ(a)² is the diffusion coefficient. This result is in agreement with that
found in Stratonovich,
The justification for our procedure is that physical processes are described by
noise that is only approximately white. For the physical process, one can use the
ordinary methods of calculus. The iteration is necessary to retain terms that keep
a finite value in the limit as the correlation of the noise approaches a delta
function. Direct use of the Ito choice, Eq. (16.32), starting from Eq. (16.16),
leads to Eqs. (4.6|11.1) in Hull (1989|2001), the result quoted in our Eq. (16.25).
We simply claim that this result is not the answer to the original model, namely
that the logarithm of the price obeys the standard Brownian motion.
A direct proof of this remark can be made without using stochastic inte-
grals. Note that the Gaussian distribution P(x, t) satisfies the Fokker-Planck
equation

∂P/∂t = −A ∂P/∂x + D ∂²P/∂x²,

which contains the constant diffusion term D = (1/2)σ_H² and the constant drift
term A = μ. One can obtain the equation for S from the equation for P(x, t) by
introducing the relation x = ln[S/S(0)]:
written in our notation, where t_c = t − ε guarantees that the last term averages to
zero. The same equation written in Ito notation looks like:
The confusion in the financial literature arises because Eqs. (3.7|10.6) in Hull
(1989|2001) state that his model is

Lax :

but Hull occasionally multiplies this equation by S (without strict use of the
Ito calculus lemma) to obtain
or Hull :
In summary, we do not claim that the Ito definition is wrong, but that it requires
extreme care to obtain correct results. It tends to mislead smart people into obtain-
ing an incorrect answer. The proper intuitive view of the Black-Scholes model
is that x, the logarithm of the price, obeys a standard Brownian motion. In
other words, Eq. (16.12) is the correct model regardless of which calculus is used.
When one makes a change of variable from the logarithm x to the actual price
S, the appropriate stochastic differential equation that obeys the Ito rules will be
Eq. (16.39). The differences between Hull's results and mine (Lax) (as well as
with some of his own results) are due primarily to the use of two different models.
The Ito notation merely obscures this point; it can be used, with great care, to obtain
correct results.
Models based on Ito's lemma should be avoided because they are counter-
intuitive for physical reasons. On the other hand, the procedure used in Section
10.3 would reduce the number of errors and avoid the use of Stieltjes and Lebesgue
integration. The main disadvantage of our proposal is that it will reduce the number
of jobs needed for mathematicians to teach measure theory. The discussion of
models for stock prices and market behavior can then be devoted more heavily to
real world questions and less heavily to formalism.
For volatile stocks, the difference between the two possible market models,
Eq. (16.12) and Eq. (16.16), can be appreciable. In a completely rational world, the
growth parameter would be determined completely by the growth rate in earnings
per share. Since this is not the case in the real world, the parameter μ is obtained
by fitting against stock prices. Depending on how this is done, the fitting procedure
might cancel the error made in using the model Eq. (16.16) instead of Eq. (16.12).
In our work on laser line-widths discussed in Chapter 11, the growth and decay
rates can be determined separately, and there is no flexibility in our choice. The
excellent agreement of the laser line-widths with experiment, as shown in Fig.
11.3, supports the iterative procedure used in Chapter 10 for relating the Langevin
to the Fokker-Planck pictures. We expect that the mathematical techniques
developed for the study of random processes in physical systems can be applied in
the future to the economic and financial worlds.
VALUE OF A FORWARD CONTRACT ON A STOCK 281
16.6 Value of a forward contract on a stock
As the simplest application of Ito's lemma, Hull (Example 4.1) considers the value
of a forward contract on a nondividend paying stock. We already found in Eq.
(16.2) that the forward price should be

F = S e^{r(T−t)}.     (16.41)

The extra term involving ∂²F/∂S² due to Ito's lemma vanishes for the choice of
Eq. (16.41), since F is linear in S. Thus F obeys dynamics, including the noise,
similar to those of the stock price S, but with the growth rate ν reduced by the risk-
free interest rate r. However, in Eq. (16.43), ν = μ + (1/2)σ_H² when the physical
stock model is used in Ito's formula, which leads to [μ − r + (1/2)σ_H²] F dt in the
first term.
Now we use our Langevin approach, described in Section 10.3, where
ordinary calculus can be used. From Eq. (16.12) we have for S

we have

and for the conditional average, ⟨F⟩ on the right hand side is replaced by F, which
agrees with the result of the Ito approach.
Our Langevin approach is easier than the one using Ito's lemma. Instead of applying
the Ito calculus lemma at each step of the transformation from x to S, and then to F,
the ordinary calculus rule can be used at each step of our approach, and d⟨F⟩/dt is then
determined using Eq. (16.46) at the last step.
We showed in Section 16.2 that the risk of owning an asset can be canceled
by also holding a forward contract to sell the asset, and the combination is risk
neutral provided that an appropriate value F is set for the forward contract. Can
this scheme be extended to the case where the asset is a stock that has a growth rate
μ and is subject to noise proportional to the value of the stock? It is assumed that
the derivative asset (say a put) has a value g = g(S, t) per derivative security.
Using Ito's lemma, g(S, t) obeys the stochastic differential equation

where, by the Ito convention, the second term has a vanishing average value. To
obtain a risk free portfolio, we must have a combination of assets in which the
term related to the rate of change of the stock, ν, vanishes. This can be accomplished by
combining a put, the equivalent of −1 shares, which takes the value −g, with ∂g/∂S
shares of the stock. The combined value is

where the subscript t reminds us that the number of shares does not change during
the time evolution, except when it is adjusted by the investor.
The ratio of these two components was chosen so that the ν contributions
cancel each other. This cancellation occurs if one follows Hull and writes

Assuming the validity of this result, the value of Π must grow at the interest rate r,
or arbitrageurs could make a risk free profit:
DISCUSSION 283
The result is the differential equation

∂g/∂t + r S ∂g/∂S + (1/2) σ_H² S² ∂²g/∂S² = r g,

where
The contribution to the average, or the conditional average with Π(t) = Π, from
the second term in Eq. (16.55), is the average of
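A numerical cross-check is possible (a sketch; it assumes the standard closed-form Black-Scholes call price, which is not derived in this section): finite differences confirm that the call value g satisfies ∂g/∂t + rS ∂g/∂S + (1/2)σ²S² ∂²g/∂S² = rg.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Standard Black-Scholes European call, tau = T - t years to maturity."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Finite-difference check of  g_t + r S g_S + (1/2) sigma^2 S^2 g_SS = r g.
# Note g depends on calendar time t through tau = T - t, so g_t = -dg/dtau.
S, K, r, sigma, tau, h = 100.0, 95.0, 0.05, 0.3, 0.5, 1e-3
g = bs_call(S, K, r, sigma, tau)
g_t = -(bs_call(S, K, r, sigma, tau + h) - bs_call(S, K, r, sigma, tau - h)) / (2 * h)
g_S = (bs_call(S + h, K, r, sigma, tau) - bs_call(S - h, K, r, sigma, tau)) / (2 * h)
g_SS = (bs_call(S + h, K, r, sigma, tau) - 2 * g + bs_call(S - h, K, r, sigma, tau)) / h**2
lhs = g_t + r * S * g_S + 0.5 * sigma**2 * S**2 * g_SS
print(abs(lhs - r * g))   # near zero: the risk-free-growth equation holds
```

The residual is at the level of finite-difference error, illustrating that the no-arbitrage growth condition on the hedged portfolio is built into the option price.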
16.8 Discussion
Stock model
The motivation for separating the right hand side of the stochastic differential equation
into two terms, and for assigning a standard fluctuation force f(t) to the second term,
comes not from mathematics but from modeling of the real world.
In physics, using physical modeling of a system with a finite number of parameters
(for example, an oscillator, or a system of coupled oscillators) and using the physical
rules (for example, Newton's laws), the deterministic drift veloc-
ity B(a, t) of a system can be determined, and it should be a smooth function of
time t. On the other hand, the fluctuation force comes from the connection of this sys-
tem with a "heat reservoir", which has infinitely many degrees of freedom. See Sections 7.6
and 7.7 for a detailed discussion.
Concerning the price of a stock: in a completely rational world, the growth
parameter of the price would be determined by the growth rate in earn-
ings per share and other known conditions. Since this is not the case in the real
world, the stock price fluctuates because of many unknown causes. The fluctuation
force introduces an irregular, nondeterministic, and very rapidly varying oscillatory
motion of S with time.
The LSDE builds a model to represent the effects of two different original forces on
the system. Of course, there are some stochastic processes (for example, scattering
in a turbid medium in physics) where fluctuation plays the dominant role, and
separating the LSDE into two terms is not appropriate. Here, however, we limit
ourselves to cases where the model in the LSDE is suitable, as it is in the ISDE.
The equation of motion described by a linear model of a stock, Eq. (16.12)
or Eq. (16.15), is only the first perturbative approximation to a nonlinear model,
and it is valid only over a very short period from the spot time. Over a longer period
of time, the first term in this equation implies that the price S will increase (or
decrease) exponentially, which certainly does not reflect the real development of S.
Also, the second term implies that the width of the fluctuation of S will
increase with time, with no force to limit the amplitude of the fluctuation.
We suggest the following model for a stock S. First, a model for the underlying
value of the stock, S_0(t), can be built based on fundamental analysis; it is a
deterministic and slowly varying function of time. One may also build a linear model
of S_0(t) by extrapolating previous data of S to obtain S_0(t) = n(t −
t_0) S_0(t_0), where S_0(t_0) is the extension of the previous value of S_0 up to the spot time
t_0, and n is the rate of increase (or decrease) of the stock. S_0(t) provides an
"operating point" for the stock S as a random variable.
We define S̃(t) = S(t) − S_0(t), and build a stochastic differential equation
for S̃(t). The stock value S(t) oscillates around S_0(t); hence B(S̃, t) represents
a model of the drift velocity of S̃. The simplest model for this purpose is a harmonic
oscillator, in which a restoring force F = −αS̃ makes S̃(t) oscillate
around S̃ = 0 with a certain amplitude. The amplitude of S̃(t) limits the value of
S(t) to a certain range. When a fluctuation force is added, the value of S(t)
will not diffuse to infinity with time.
However, the velocity of a harmonic motion is too fast when S̃ is near zero,
and too slow when S̃ is near its amplitude value. The real situation is that S(t)
changes slowly when its value is near S_0(t), but changes rapidly when its value
deviates greatly from S_0(t). This suggests that an anharmonic model is needed;
for example, the restoring force can be chosen as F = −αS̃ − βS̃³.
Recently, the form B(S̃, t) = −αS̃ has often been used, which originates from
Uhlenbeck's model. Under this linear model, S(t) exponentially approaches
S_0(t) from its spot value. The advantage of this model is that it is easy to manipulate in
calculations. However, one may ask why S(t) only approaches S_0(t) from one
side and cannot cross through S_0(t) to the other side.
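A quick simulation (a sketch with arbitrary illustrative parameters) shows that once the fluctuation force is added, the linear restoring model does let the deviation s(t) = S(t) − S_0(t) cross zero repeatedly, while its fluctuations stay bounded near the stationary width σ/√(2α):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, sigma, dt, n = 2.0, 0.5, 0.01, 5000

s = np.zeros(n)   # s(t) = S(t) - S0(t), the deviation from the operating point
for t in range(1, n):
    # Euler step for ds = -alpha s dt + sigma dz (linear restoring force)
    s[t] = s[t - 1] - alpha * s[t - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

print(s.std())                                         # near sigma / sqrt(2 * alpha) = 0.25
print(int((np.sign(s[1:]) != np.sign(s[:-1])).sum()))  # many crossings of s = 0
```

So the deterministic solution approaches S_0(t) from one side, but the full stochastic path crosses it many times; the one-sided behavior belongs only to the noiseless drift.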
If S̃ is limited to oscillate in a certain range, σ(S) in the second term of the SDE
should not change dramatically, and the extra term in Ito's lemma (or the contri-
bution to the expectation from the second term in the LSDE) may be relatively small and
can be treated by some approximation, which reduces the difficulty of solving the
equation of motion of the expectation value of a derivative security with time.
Conditional expectation
We emphasize that the stochastic processes in finance, like those in physical
science, are natural processes. People who have some prior information and
knowledge may build a more appropriate model to match the development of the nat-
ural stochastic process of a marketed asset, but in general they do not change this
natural process. The state of a random variable a at time t is best described by
its probability distribution P(a, t), not by its spot value realized at time t, because
the spot value of a is undetermined and jumps up and down very rapidly.
Based on this viewpoint, we would like to discuss the concept of conditional
expectation.
Mathematicians denote the conditional expectation by the symbol E(X_{t′} | F_t),
t′ > t, where X_t is a stochastic process, and F_t is a σ-field known as the "natural
filtration". For example, an estimate of the price of an option (or future) H(T − t)
with maturity time T is determined based on the spot value of its underlying stock
S at the current time t. Put differently, one already has the information that the
closing price of S(t) today is S. This information provides a filtration under which
the expectation of S(t′) in the future can be calculated. However, one may ask
whether the price H(T − t) determined from S at 4:00 PM, or
that from S at 3:50 PM, is more reasonable, since the spot value of S may
differ markedly because of irregular jumps during the last ten minutes before closing.
In our opinion, the expectation based on the probability distribution P(S, t) at the spot
time t could provide a more reasonable estimate of H(T − t) than the conditional
expectation, because the probability P(S, t) cannot change dramatically during a
small interval of time, whereas its realized value can jump up or down during a very
short period. In physics, we use an "ensemble" to describe all possible states
of a system under a certain probability, and do not regard a single sample in an ensemble
as a meaningful quantity. In Monte Carlo simulations we do not regard a single
path as essential; similarly, the point-by-point values in the time path of a stock are not
essential.
Discrete processes
Discrete random processes, for example the early process after a sudden big
jump of a stock, are beyond the LSDE description and also beyond the ISDE. In dis-
crete random processes, the nth order diffusion coefficients D_n are nonzero up to
infinite order in n, and Eq. (10.4) should be replaced by a series of equations:
16.9 Summary
17.1 Overview
Thomson's contributions
My (Lax) interest in the field of time series has been greatly stimulated by personal
contact with David J. Thomson, who has spent a lifetime career covering all aspects
of time series. We can describe the present chapter as our attempt to learn enough
about time series to be able to read Thomson's work. We shall therefore record a
subset of his publications to indicate the breadth of topics covered.
His work started, appropriately for Bell Laboratories, with the analysis of time
series in waveguides used to transmit information in the telephone network. See
Thomson (1977). This work was expanded in Thomson (1982) already referred to.
His 1982 paper constitutes the foundation of much of his later work. In Kleiner,
Martin and Thomson (1979) Thomson shows how to apply Tukey's ideas of robust-
ness to spectral estimation. Tukey's book on Exploratory Data Analysis shows how
to deal with real data. When is an apparently deviating point an outlier to be dis-
carded? See Tukey (1977) and Thomson (1982). His work was also applied to the
THE WIENER-KHINCHINE AND WOLD THEOREMS 291
global warming problem in Kuo, Lindberg, Craig, and Thomson (1990). This first
serious paper on 'recent' climate has been cited by Al Gore! Thomson (1990a)
extended the analysis of the earth's climate to a period of 20,000 years, and correlated
CO2 data with tree-ring data. His next work, Thomson (1990b), extended the time
series over 600,000 years and established a sensitivity of the results to the small
time differences between the sidereal year, the equatorial year, and the solar year.
Thomson and Chave (1991) also adapted jack-knife procedures to deal with non-
normal variables and confidence limits. The current picture of global heating is
discussed in Thomson (1995). Thomson, MacLennan, and Lanzerotti (1995) use
time series techniques to analyze the propagation of solar oscillations through the
interplanetary medium.
Perhaps Thomson's most important work on global heating filters the preces-
sion signal out of the data and establishes a strong correlation between global
warming and the CO2 concentration. See Thomson (1997).
An application is made to the financial world by studying stock and commodity
data over a 40 year period; see Ramsey and Thomson (1999). A comprehensive
review was given by Thomson (1998) of his work on "Multitaper Analysis of
Nonstationary and Nonlinear Series Data", presented at the Isaac Newton Institute.
The Stieltjes form is needed for mathematical rigor when the spectrum of X con-
tains Brownian motion (white noise), or a delta-function time autocorrelation.
In the real world, in which the noise can be approximately white and spectral lines
are narrow, but not infinitely so, Stieltjes integrals are unnecessary, and our simpler
292 SPECTRAL ANALYSIS OF ECONOMIC TIME SERIES
notation can be followed. Even when white noise is present, the second moments
are defined via Eq. (4.50) and the relation
The first and last terms follow the notation of the time series texts by Percival
and Walden (1993) and Priestley (1981), and the middle terms follow the notation
in this book. The Wiener-Khinchine theorem (for continuous time), Eq. (4.11),
takes the form
For the discrete time case, with x(t) known only at the integers t = 0, ±1, ±2, ...,
the same theorem applies, but the limits of integration in Eq. (17.5) extend
only from −1/2 to 1/2. That is because exp(j2πft) is indistinguishable from
exp[j2π(f + 1)t] when t is an integer. Thus all frequencies outside the basic band-
width from −1/2 to 1/2 are folded into that basic bandwidth. This process is
referred to as aliasing, and is well known to anyone who has watched rotating
wheels in the movies (at 1/24th of a second) and found that they can appear to be
rotating backward. In the study of crystals, there is a similar folding of all wave
vectors into the first Brillouin zone.
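This statement is easy to check numerically (a trivial sketch): at integer sampling times, frequencies f and f + 1 give identical samples.

```python
import numpy as np

t = np.arange(8)                        # integer sampling times
f = 0.2
a = np.exp(2j * np.pi * f * t)
b = np.exp(2j * np.pi * (f + 1.0) * t)
print(np.max(np.abs(a - b)))            # zero up to rounding: f and f + 1 alias
```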
Since the spectrum S(f) is, by definition, positive, the normalized p(f) =
S(f) / ∫ S(f) df is a probability density. Correspondingly, we define
Priestley (1981) describes the Wold (1938) theorem, Eq. (17.7) below, as the
necessary and sufficient condition for a set of numbers ρ(±n) to be an
autocorrelation. The Wold condition is

that is, the autocorrelation ρ(τ) must be the Fourier transform of a probability
density p(f).
Assuming that we are dealing with a stationary random process, the first objec-
tive in dealing with a time series is to obtain its spectrum. The simplest procedure,
for a single discrete sample of x(t) for t = 1, 2, 3, ..., N, is to use the estimate
where S_T is given by

with

the Nyquist sampling frequency.
If one makes the common choice of units such that Δt = 1, one obtains Eq. (17.8).
Equation (17.10) is biased: if all the summands were unity, the result would be
1 − |τ|/N. The correlator associated with this periodogram can be modified by
replacing 1/N by 1/N′ in Eq. (17.10), with N′ given the value N − |τ|, and the
"periodogram" is then said to be unbiased. However, due to variance errors, the biased
(original) periodogram is often superior. We here follow the notation and approach
of Percival and Walden (1993), including the assumption that the process mean is
zero, which is easily arranged by subtracting the mean from each variable. By
rearranging the order of summation, the inverse of Eq. (17.10) then permits the
spectrum to be estimated by
Thomson and Chave (1991), however, claim that the use of autocorrelations
and spectral densities throws away phase information. Thomson therefore follows a
procedure outlined in Section 17.4.
since E(X) = μ. Note that X̄ is still a random variable; no ensemble average has
been performed. The variance of X_t is defined by
where, again, the last expression is valid only in the stationary case.
The covariance of two RV is defined by
If one sums first along the diagonal (at fixed τ = t − s), Eq. (17.18) can be rewritten
as
where we note that N − |τ| is the length of the relevant diagonal and |τ| < N.
Since the sum converges, var(X̄) approaches 0 as N → ∞, so the unbiased
estimate X̄ is also consistent.
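The diagonal summation can be checked against the raw double sum in a few lines (a sketch; the exponential autocovariance R(τ) = ρ^{|τ|} is an assumed example):

```python
import numpy as np

def var_of_mean(R):
    """var(Xbar) = (1/N^2) * sum_{|tau|<N} (N - |tau|) R(tau),
    obtained by summing the covariance double sum along its diagonals."""
    N = len(R)
    tau = np.arange(1, N)
    return (N * R[0] + 2.0 * np.sum((N - tau) * R[tau])) / N**2

N, rho = 10, 0.5
R = rho ** np.arange(N)                  # assumed autocovariance R(tau) = rho^|tau|
brute = sum(rho ** abs(t - s) for t in range(N) for s in range(N)) / N**2
print(var_of_mean(R), brute)             # identical values
```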
The Karhunen-Loeve theorem (Karhunen 1947; see also Kac and Siegert 1947)
states that if the orthogonal functions are chosen to be eigenfunctions of the cor-
relation function R(t, s), the expansion coefficients will be uncorrelated random
variables.
SLEPIAN FUNCTIONS 295
Proof
Start with the expansion
with
In the case of discrete time, the integral over time is replaced by a sum over time.
But the theorem remains valid.
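The content of the theorem is easy to see numerically in the discrete case (a sketch with an assumed exponential correlation matrix): expanding Gaussian samples in the eigenvectors of R yields coefficients whose sample covariance is nearly diagonal, with variances given by the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
t = np.arange(N)
R = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)  # assumed correlation matrix

w, U = np.linalg.eigh(R)                  # R = U diag(w) U^T, orthonormal columns
X = rng.multivariate_normal(np.zeros(N), R, size=20000)
C = X @ U                                 # rows of C: expansion coefficients c_k
cov = np.cov(C, rowvar=False)

off_diag = cov - np.diag(np.diag(cov))
print(np.max(np.abs(off_diag)))           # small: coefficients are uncorrelated
print(np.max(np.abs(np.diag(cov) - w)))   # variances track the eigenvalues
```

Any other orthogonal basis would leave the coefficient covariance non-diagonal; diagonality is special to the eigenfunctions of R.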
where for convenience, times are measured from the center of the series. The limits
±1/2 are appropriate for the spacing At = 1 since that corresponds to the Nyquist
sampling rate, and frequencies outside the first zone can be shifted into the first
zone by a change of an integer, which has no effect on the exponential. The Stieltjes
integral can be replaced by a Riemann integral for a continuous spectrum by
making the replacement
as was done to get the second form of Eq. (17.27). It is necessary to remember that
x(n) is the discrete time series, and that dZ(v) and x(v) are in frequency space.
A zeroth approximation to the frequency amplitude is given by taking the discrete
FFT:
Since Eq. (17.29) provides a Fourier series representation of y(f) with period 1,
the Fourier coefficients are given by
If one now inserts Eq. (17.27) for x(n) into Eq. (17.29) one obtains
In the presence of band limitation, W will be less than 1/2. These eigenfunctions
are referred to as the Slepian functions, in honor of Slepian who recognized their
importance in signal representation. Each eigenfunction depends on N and W as
parameters, so a complete labeling of the Slepian eigenfunctions is Uk(N, W, /).
These are also referred to as Discrete Prolate Spheroidal Functions (DPSF).
To obtain spectral amplitudes, we must solve Eq. (17.31) for x(ν) in terms of
y(f), the preliminary direct spectrum (where ν is also a frequency). By expanding
the known y(f) in terms of the eigenvectors
or in final form
The properties of the Slepian (1978) functions are analyzed in great detail in his
paper. The eigenvalues are found to be nondegenerate and monotonically decreasing:
Comparison between Eqs. (17.38) and (17.39) shows that the eigenvalue λ_k repre-
sents the ratio of the energy in the inner region [−W, W] to that in the full region
[−1/2, 1/2]. This guarantees that all the eigenvalues are less than one, and in view
of Eq. (17.37), the shape most concentrated in frequency is associated with the
k = 0 mode. Indeed, that is why Eberhard (1973) advocated using that mode,
alone, as the appropriate window.
where
TABLE 17.1. Eigenvalues, λ_k(N, W), of the discrete prolate spheroidal series
(after Percival and Walden 1993).

k    λ_k(31, 6/31)         λ_k(31, 7/31)         λ_k(31, 8/31)
0    0.9999999999999997    1.000000000000007     1.000000000000002
1    0.9999999999999769    0.9999999999999933    1.000000000000001
2    0.9999999999978725    0.9999999999999921    0.9999999999999945
3    0.9999999998764069    0.9999999999998924    0.9999999999999908
The eigenvalues are so close to unity that the table even includes some values above
unity, which are clearly wrong. This inaccuracy means that the associated eigenvectors,
computed using these values, will also be inaccurate.
In the case of continuous time problems, Slepian and Pollak (1961) found that the
eigenfunctions that minimize the amount of energy outside some time boundary
[−1/2, 1/2], and obey an integral equation, can also be found as solutions of a
second order differential equation. For the discrete time problem, the solutions of
an integral equation of the form of Eq. (17.33) can also be obtained as solutions of
a second order difference equation
This matrix equation has eigenvectors v_k and eigenvalues θ_k as functions of N
and W. The eigenvectors v_k(N, W) are identical to those of the original integral
equation. However, the eigenvalues θ_k(N, W) are now well separated. As a result,
it is easy to calculate the eigenvectors v_k accurately, without requiring quadruple
precision, and λ_k can then be determined by
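A minimal sketch of this tridiagonal route (our own illustrative implementation, using the standard difference-equation coefficients; λ_k is recovered afterward from the sinc kernel of the concentration problem) reproduces the near-unity values of Table 17.1 for N = 31, W = 6/31:

```python
import numpy as np

def slepian_sequences(N, W, K):
    """Leading K Slepian sequences from the tridiagonal difference operator,
    whose well-separated eigenvalues make the eigenvectors easy to compute."""
    t = np.arange(N)
    diag = ((N - 1 - 2.0 * t) / 2.0) ** 2 * np.cos(2.0 * np.pi * W)
    off = t[1:] * (N - t[1:]) / 2.0
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(T)
    return vecs[:, ::-1][:, :K]          # most concentrated sequences first

def concentration(v, W):
    """lambda_k: fraction of the energy of v's transform inside [-W, W],
    computed with the sinc kernel of the integral equation."""
    N = len(v)
    d = np.subtract.outer(np.arange(N), np.arange(N))
    A = np.sin(2.0 * np.pi * W * d) / (np.pi * np.where(d == 0, 1, d))
    np.fill_diagonal(A, 2.0 * W)
    return float(v @ A @ v) / float(v @ v)

V = slepian_sequences(31, 6 / 31, 4)
lams = [concentration(V[:, k], 6 / 31) for k in range(4)]
print(lams)     # compare Table 17.1: all four values extremely close to 1
```

Because the tridiagonal eigenvalues θ_k are well separated, double precision suffices here, while computing λ_k directly from the nearly degenerate concentration kernel would not resolve the eigenvectors.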
The procedure used by Thomson (1982) to analyze time series, in 41 IEEE pages,
is sufficiently complicated that we would like to give a road-map.
(1) The first (optional) step is to replace the original input data by a new set that is
prewhitened.
An example of such a procedure is outlined by Percival and Walden (1993).
Convolve X_t with g_u to get
If one can choose g(f) to cancel most of the frequency dependence of S_X(f), then
S_Y(f) will be nearly constant and easier to evaluate accurately by the methods
described below.
Of course, this is a chicken-and-egg problem, since it supposes that an approxi-
mate spectrum, Ŝ(f), is already known. A periodogram, or a parametric approach,
can be used to get a zeroth approximation to the spectrum of X. If one can choose
a prewindow filter that reduces the dynamic range of the resulting RV, the use of a
Slepian window will yield a more accurate estimate of the spectrum.
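Prewhitening can be illustrated with an assumed AR(1) example (a sketch; here the filter g = (1, −φ) is taken from the known model, sidestepping the chicken-and-egg estimation step discussed above):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, N = 0.9, 4096
e = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = phi * x[t - 1] + e[t]      # AR(1): strongly colored ("red") spectrum

y = x[1:] - phi * x[:-1]              # prewhitening filter g = (1, -phi)

Px = np.abs(np.fft.rfft(x)) ** 2 / N
Py = np.abs(np.fft.rfft(y)) ** 2 / (N - 1)
# the dynamic range of the periodogram collapses after prewhitening
print(Px.max() / Px.mean(), Py.max() / Py.mean())
```

The filtered series is (apart from the initial transient) the white innovation sequence, so its spectrum is nearly flat and far easier to estimate accurately.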
(2) Post-smoothing by a second window produces a modified estimate
The first preliminary estimate of the spectral amplitude is y(f), the discrete Fourier
transform of x(n), given in Eq. (17.29). But this covers the full frequency range,
[−1/2, 1/2] with a single formula. Thomson (1982), in his Eq. (3.1), suggests that a
Fourier transform truncated to the interval [f − W, f + W] would provide better
resolution, using the formula
This proposal is an excellent one, since one can prove that if S(f) were flat over
this smaller interval, it would yield the correct spectral density at /.
Presumably because Eq. (17.51) cannot be expressed directly in terms of the
observed x(n), Thomson introduces another formula for the quantity y_k(f), related
to Z_k(f) by Eq. (17.58) in the next section,

which also covers the truncated frequency region. Since y(f) is expressible in
terms of the data x(n) via Eq. (17.31), he then obtains
after using the integral equation (17.33) for the Slepian functions. The advantage
of this new form is that it is directly expressible in terms of the data.
The disadvantage of this estimate is that it is no longer local but has contributions
from the entire [−1/2, 1/2] domain.
302 SPECTRAL ANALYSIS OF ECONOMIC TIME SERIES
Thomson then uses y_k(f_0) to obtain an estimate of the spectral density S(f; f_0)
at any f in the interval f_0 − W < f < f_0 + W. He then averages this result over
the interval to obtain the average result
where each
can be regarded as an individual spectral estimate, with the kth data window
namely the differences between the ideal estimate of the amplitude and the
weighted estimate d_k(f) y_k(f). The result is that the weights are estimated from
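The eigenspectrum average described above can be sketched as follows. This is an illustrative stand-in, not Thomson's full estimator: sine tapers replace the Slepian windows (they are orthonormal and need no eigenvalue computation), and equal weights 1/K replace the adaptive weights d_k(f).

```python
import numpy as np

def multitaper_psd(x, K=7):
    """Average of K orthonormally tapered eigenspectra |y_k(f)|^2.

    Sine tapers v_k(n) = sqrt(2/(N+1)) sin(pi k n / (N+1)) stand in
    here for Thomson's Slepian windows, and equal weights 1/K stand in
    for his adaptive weights d_k(f); both substitutions keep the
    sketch short.
    """
    N = len(x)
    n = np.arange(1, N + 1)
    k = np.arange(1, K + 1)
    tapers = np.sqrt(2.0 / (N + 1)) * np.sin(
        np.pi * k[:, None] * n[None, :] / (N + 1))
    yk = np.fft.rfft(tapers * x, axis=1)   # eigenspectra y_k(f)
    Sk = np.abs(yk) ** 2                   # individual spectral estimates
    freqs = np.fft.rfftfreq(N)             # f in cycles per sample
    return freqs, Sk.mean(axis=0)          # average over the K tapers

# Usage: for unit-variance white noise, whose true spectral density is
# flat, the estimate should hover near 1 at all frequencies.
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
freqs, S = multitaper_psd(x)
```

Because each taper has unit energy, each |y_k(f)|² is an (approximately unbiased) spectral estimate, and averaging K of them reduces the variance by roughly 1/K.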
For an elementary discussion of the removal of periodicity and trend, see Williams
(1997). For a broad perspective on spectral analysis, see Tukey (1961). A good gen-
eral reference is Handbook of Statistics III, by Brillinger and Krishnaiah (1983),
as are Harris (1967) and Tukey (1967).
A more detailed description of the techniques used in what is now called
"complex demodulation" is given in Hassan (1963).
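As a rough numerical illustration of what complex demodulation does (the function and parameter choices below are my own, not from Hassan (1963)): multiply the series by e^(−2πi f_0 n) to shift the band around f_0 down to zero frequency, then low-pass filter; the slowly varying amplitude and phase of the f_0 component remain.

```python
import numpy as np

def complex_demodulate(x, f0, width=20):
    """Shift the band near frequency f0 (cycles per sample) down to zero
    frequency, then smooth with a length-`width` moving average (a crude
    low-pass filter; width is chosen so that width * 2 * f0 is an
    integer, which nulls the double-frequency term exactly)."""
    n = np.arange(len(x))
    z = x * np.exp(-2j * np.pi * f0 * n)        # demodulate: band at f0 -> 0
    kernel = np.ones(width) / width
    smooth = np.convolve(z, kernel, mode="same")
    return 2 * np.abs(smooth), np.angle(smooth)  # amplitude, phase

# Usage: recover a slowly varying envelope riding on a tone at
# 0.05 cycles per sample.
n = np.arange(2000)
envelope = 1 + 0.5 * np.sin(2 * np.pi * n / 2000)
x = envelope * np.cos(2 * np.pi * 0.05 * n)
amp, ph = complex_demodulate(x, 0.05)
```

Away from the edges of the record, `amp` tracks the envelope and `ph` stays near zero, which is the essential content of the demodulation technique.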
The methods described in this chapter are nonparametric, in the sense that one
does not fit a model with a small number of parameters to the experimental time
series data. Fourier techniques have been used, and in a sense parameters are
involved; but there are so many of them that no real model assumptions are made,
except possibly that one is dealing with a stationary process. A recent comparison
of the relative merits of parametric and Fourier-transform methods has been given
by Gardiner (1992), who also suggests methods of combining the two. For an
excellent elementary presentation of methods of dealing with nonstationary time
series, see Cohen (1995), who discusses the effects of many of the popular filters.
This theorem can be established by taking the left-hand sum from −N to N and
then taking the limit. A simpler approach is to note that the right-hand side, g(x),
regarded as a function of x, is periodic.
The Fourier series representation over the domain 0 < x < a leads to the relation
where
In order for Eq. (17.65) to be true, we must have
In order that g(x) be periodic, Δ(x − x') must possess the same delta function in
each interval from na to (n + 1)a. In other words,
If this result is equated to the definition, Eq. (17.66) we have the special Poisson
formula, Eq. (17.63).
If we integrate Eq. (17.63) against an arbitrary function f(x), we obtain the usual
Poisson sum formula (see Titchmarsh, 1948),
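The Poisson sum formula can be checked numerically for a Gaussian, for which both sides converge rapidly. The convention assumed here, F(k) = ∫ f(x) e^(−ikx) dx with Σ_n f(na) = (1/a) Σ_m F(2πm/a), may differ from the book's normalization by factors of 2π.

```python
import numpy as np

# Numerical check of the Poisson sum formula
#     sum_n f(n a) = (1/a) sum_m F(2 pi m / a)
# for the Gaussian f(x) = exp(-x^2), whose transform under the
# convention F(k) = integral f(x) exp(-i k x) dx is
#     F(k) = sqrt(pi) * exp(-k^2 / 4).
a = 0.7
n = np.arange(-50, 51)
lhs = float(np.sum(np.exp(-(n * a) ** 2)))          # lattice sum of f

m = np.arange(-50, 51)
k = 2 * np.pi * m / a                                # dual lattice in k
rhs = float(np.sum(np.sqrt(np.pi) * np.exp(-(k ** 2) / 4)) / a)
```

Both truncated sums have already converged to machine precision at |n|, |m| = 50, so `lhs` and `rhs` agree to roundoff, as the formula asserts.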
Proof
Since F(k) = 0 unless |k| < B = π/a, we can represent F(k) by a Fourier series
over this finite region:
APPENDIX A: THE SAMPLING THEOREM 305
where the Fourier coefficients are determined by
where the second form makes use of the definition, Eq. (17.70). If Eq. (17.74) is
inserted into Eq. (17.73), we obtain
Thus F(k) is uniquely determined by the sample values f(na). Inserting Eq.
(17.75) into Eq. (17.70), but restricting the integration region to the band from
−π/a to π/a, we immediately obtain Eq. (17.71), the sampling theorem.
An alternate view of the sampling theorem can be phrased as follows. Suppose
one has a set of values f(na) at the lattice points na, and we wish to inter-
polate to obtain the values f(x) at other points. The sampling theorem, in the
form of Eq. (17.71), provides the smoothest interpolation in the sense that any
other interpolation will not be band-limited, and hence will involve higher Fourier
components.
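The interpolation reading of Eq. (17.71) can be sketched directly, assuming the standard Whittaker–Shannon form f(x) = Σ_n f(na) sinc((x − na)/a) with sinc(t) = sin(πt)/(πt):

```python
import numpy as np

def sinc_interpolate(samples, a, x):
    """Whittaker-Shannon interpolation: reconstruct a band-limited f at
    the points x from its lattice values samples[n] = f(n a).

    np.sinc(t) equals sin(pi t)/(pi t), so sinc((x - n a)/a) is the
    band-limited interpolation kernel for the band |k| < pi/a."""
    n = np.arange(len(samples))
    kernel = np.sinc((x[:, None] - n[None, :] * a) / a)
    return kernel @ samples

# Usage: a tone whose frequency lies inside the band is recovered at
# off-lattice points (up to truncation of the infinite sum at the edges
# of the finite record).
a = 0.5                                      # lattice spacing; band edge pi/a
n = np.arange(400)
f = lambda t: np.cos(2 * np.pi * 0.3 * t)    # 0.3 < Nyquist rate 1/(2a) = 1.0
samples = f(n * a)
x = np.linspace(50.0, 150.0, 301)            # interior points, away from edges
fx = sinc_interpolate(samples, a, x)
```

Any other interpolation through the same lattice values differs from this one by a function that vanishes on the lattice, and such a function necessarily contains frequency components above the band edge π/a, which is the sense of "smoothest" used above.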
Bibliography