OXFORD
UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi
Kuala Lumpur Madrid Melbourne Mexico City Nairobi
New Delhi Shanghai Taipei Toronto
With offices in
Argentina Austria Brazil Chile Czech Republic France Greece
Guatemala Hungary Italy Japan Poland Portugal Singapore
South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© Oxford University Press 2006
The moral rights of the authors have been asserted
Database right Oxford University Press (maker)
First published 2006
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
Library of Congress Cataloging in Publication Data
Data available
Printed in Great Britain
on acid-free paper by
Biddles Ltd., King's Lynn, Norfolk
10 9 8 7 6 5 4 3 2 1
Preface
The name "Econophysics" denotes the application of mathematical techniques,
developed for the study of random processes in physical systems, to the economic
and financial worlds. Since a substantial number
of physicists are now employed in the financial arena or are doing research in
this area, it is appropriate to give a course that emphasizes and relates physical
applications to financial applications.
The course and text on Random Processes in Physics and Finance differs from
mathematical texts by emphasizing the origins of noise, as opposed to an analysis
of its transformation by linear and nonlinear devices. Of course, the latter enters
any analysis of measurements, but it is not the focus of this work.
The text opens with a chapter-long review of probability theory to refresh those
who have had an undergraduate course, and to establish a set of tools for those who
have not. Of course, this chapter can be regarded as an oxymoron since probability
includes random processes. But we restrict probability theory, in this chapter, to
the study of random events, as opposed to random processes, the latter being a
sequence of random events extended over a period of time.
It is intended, in this chapter, to raise the level of approach by demonstrating
the usefulness of delta functions. If an optical experimenter does his work with
lenses and mirrors, a theorist does it with delta functions and Green's functions. In
the spirit of Mark Kac, we shall calculate the chi-squared distribution (important
in statistical decision making) with delta functions. The normalization condition
of the probability density in chi-square leads to a geometric result; namely, we
can calculate the volume of a sphere in n dimensions without ever transforming to
spherical coordinates.
The use of a delta function description permits us to sidestep the need for using
Lebesgue measure and Stieltjes integrals, greatly simplifying the mathematical
approach to random processes. The problems associated with Ito integrals used
both by mathematicians and financial analysts will be mentioned below. The prob-
ability chapter includes a section on what we call the first and second laws of
gambling.
Chapters 2 and 3 define random processes and provide examples of the most
important ones: Gaussian and Markovian processes, the latter including Brownian
motion. Chapter 4 provides the definition of a noise spectrum, and the Wiener-
Khinchine theorem relating this spectrum to the autocorrelation. Our point of view
here is to relate the abstract definition of spectrum to how a noise spectrum is
measured.
vi PREFACE
Chapter 5 provides an introduction to thermal noise, which can be regarded
as ubiquitous. This chapter includes a review of the experimental evidence, the
thermodynamic derivation for Johnson noise, and the Nyquist derivation of the
spectrum of thermal noise. The latter touches on the problem of how to handle
zero-point noise in the quantum case. The zero-frequency Nyquist noise is shown
to be precisely equivalent to the Einstein relation (between diffusion and mobility).
Chapter 6 provides an elementary introduction to shot noise, which is as ubiq-
uitous as thermal noise. Shot noise is related to discrete random events, which, in
general, are neither Gaussian nor Markovian.
Chapters 7-10 constitute the development of the tools of random processes.
Chapter 7 provides in its first section a summary of all results concerning the
fluctuation-dissipation theorem needed to understand many aspects of noisy sys-
tems. The proof, which can be omitted for many readers, is a succinct one in
density matrix language, with a review of the latter provided for those who wish
to follow the proof.
Thermal noise and Gaussian noise sources combine to create a category of
Markovian processes known as Fokker-Planck processes. A serious discussion of
Fokker-Planck processes is presented in Chapter 8, including generation-recom-
bination processes, linearly damped processes, Doob's theorem, and multivariable
processes.
Just as Fokker-Planck processes are a generalization of thermal noise, Langevin
processes constitute a generalization of shot noise, and a detailed description is
given in Chapter 9.
The Langevin treatment of the Fokker-Planck process and diffusion is given
in Chapter 10. The form of our Langevin equation is different from the stochastic
differential equation using Ito's calculus lemma. The transform of our Langevin
equation obeys the ordinary rules of calculus; hence it can be performed easily,
and misleading results can be avoided. The origin of the difference between our
approach and that using Ito's lemma comes from the different definitions of the
stochastic integral.
Applications of these tools make up the remainder of the book. These appli-
cations fall primarily into two categories: physical examples and examples from
finance. The two can be pursued independently.
The physical application that required learning all these techniques was the
determination of the motion and noise (line-width) of self-sustained oscillators
like lasers. Adding nonlinear terms to a linear system usually introduces
background noise of the convolution type but does not create a sharp line. The
question "Why is a laser line so narrow?" (it can be as low as one cycle per second,
even when the laser frequency is of the order of 10^15 per second) is explained in
Chapter 11. It is shown that autonomous oscillators (those with no absolute time
origin) all behave like van der Pol oscillators, have narrow line-widths, and have
a behavior near threshold that is calculated exactly.
Chapter 12 shows that noise in semiconductors (in homogeneous systems) can be
treated entirely by the Lax-Onsager "regression theorem".
The random motion of particles in a turbid medium, due to multiple elastic scat-
tering, obeys the classic Boltzmann transport equation. In Chapter 13, the center
position and the diffusion behavior of an incident collimated beam into an infinite
uniform turbid medium are derived using an elementary analysis of the random
walk of photons in a turbid medium. In Chapter 14, the same problem is treated
based on cumulant expansion. An analytical expression for cumulants (defined in
Chapter 1) of the spatial distribution of particles at any angle and time, exact up to
an arbitrarily high order, is derived in an infinite uniform scattering medium. Up
to second order, a Gaussian spatial distribution solving the Boltzmann transport
equation is obtained, with exact average center and exact half-width as functions
of time.
Chapter 15 on the extraction of signals in a noisy, distorted environment has
applications in physics, finance and many other fields. These problems are ill-
posed and the solution is not unique. Methods for treating such problems are
discussed.
Having developed the tools for dealing with physical systems, we learned that
the Fokker-Planck process is the one used by Black and Scholes to calculate the
value of options and derivatives. Although there are serious limitations to the
Black-Scholes method, it created a revolution because there were no earlier meth-
ods to determine the values of options and derivatives. We shall see how hedging
strategies that lead to a riskless portfolio have been developed based on the Black-
Scholes ideas. Thus financial applications, such as arbitrage, based on this method
are easy to handle after we have defined forward contracts, futures and put and call
options in Chapter 16.
The finance literature expends a significant effort on teaching and using Ito
integrals (integrals over the time of a stochastic process). This effort is easily
circumvented by redefining the stochastic integral by a method that is correct
for processes with nonzero correlation times, and then approaching the limit in
which the correlation time goes to zero (the Brownian motion limit). The limiting
result that follows from our iterative procedure disagrees with the Ito definition
of the stochastic integral and agrees with the Stratonovich definition. It is also less
likely to mislead; conflicting results appear, for example, in John Hull's book on
Options, Futures and Other Derivative Securities.
In Chapter 17 we turn to methods that apply to economic time series and other
forms including microwave devices and global warming. How can the spectrum of
economic time series be evaluated to detect and separate seasonal and long term
trends? Can one devise a trading strategy using this information?
How can one determine the presence of a long term trend, such as global warm-
ing, from climate statistics? Why are these results sensitive to the choice among
solar year, sidereal year, equatorial year, etc.? Which one is best? The most
careful study of such time series by David J. Thomson will be reviewed. For exam-
ple, studies of global warming are sensitive to whether one uses the solar year,
sidereal year, the equatorial year or any of several additional choices!
This book is based on a course on Random Processes in Physics and Finance
taught at the City College of the City University of New York to students in physics
who have had a first course in "Mathematical Methods". Students in engineering
and economics who have had comparable mathematical training should also be
capable of coping with the text. A review/summary is given of an undergraduate
course in probability. This also includes an appendix on delta functions, and a fair
number of examples involving discrete and continuous random variables.
Contents
1 Review of probability 1
1.1 Meaning of probability 1
1.2 Distribution functions 4
1.3 Stochastic variables 5
1.4 Expectation values for single random variables 5
1.5 Characteristic functions and generating functions 7
1.6 Measures of dispersion 8
1.7 Joint events 12
1.8 Conditional probabilities and Bayes' theorem 16
1.9 Sums of random variables 19
1.10 Fitting of experimental observations 24
1.11 Multivariate normal distributions 29
1.12 The laws of gambling 32
1.13 Appendix A: The Dirac delta function 35
1.14 Appendix B: Solved problems 40
5 Thermal noise 82
5.1 Johnson noise 82
5.2 Equipartition 84
5.3 Thermodynamic derivation of Johnson noise 85
5.4 Nyquist's theorem 87
5.5 Nyquist noise and the Einstein relation 90
5.6 Frequency dependent diffusion constant 90
6 Shot noise 93
6.1 Definition of shot noise 93
6.2 Campbell's two theorems 95
6.3 The spectrum of filtered shot noise 98
6.4 Transit time effects 101
6.5 Electromagnetic theory of shot noise 104
6.6 Space charge limiting diode 106
6.7 Rice's generalization of Campbell's theorems 109
Bibliography 307
Index 323
A note from co-authors
Most of this book was written by Distinguished Professor Melvin Lax
(1922-2002) and originated from the notes for the class he taught at City University
of New York from 1985 to 2001. During his last few years, Mel made a great effort
to edit this book but, unfortunately, was unable to complete it before his untimely
illness.
Our work on the book is mostly technical: correcting misprints and
errors in the text and formulas, making minor revisions, and converting the book to
LaTeX. In addition, Wei Cai wrote Chapter 14, Sections 10.3-10.5, and Section 16.8,
and made changes to Sections 8.3, 16.4, 16.6, and 16.7; Min Xu wrote Chapter 13
and part of Section 15.6.
We dedicate our work on this book to the memory of our mentor, colleague, and
friend Melvin Lax. We would like to thank our colleagues at the City College
of New York, in particular Professors Robert R. Alfano, Joseph L. Birman, and
Herman Z. Cummins, for their strong support as we completed this book.
Wei Cai
Min Xu
1
Review of probability
Introductory remarks
The definition of probability has been (and still is) the subject of controversy. We
shall mention, briefly, three approaches.
it is regarded as the definition of the probability of success. One can object that
this definition is meaningless, since the limit does not exist in the ordinary sense
that for any ε there exists an N such that for all M > N, |P_M − P| < ε. This
limit will exist, however, in a probability sense; namely, the probability that these
inequalities fail can be made arbitrarily small. The Chebyshev inequality of
Eq. (1.32) is an example of a proof that the probability of a deviation will become
arbitrarily small for large deviations. What is the proper statement for the definition
of probability obtained as a "limit" of ratios in a large series of trials?
We shall sidestep the above controversy by assuming that for our applications there
exists a set of elementary events whose probabilities are equal, or at least known,
and shall describe how to calculate the probability associated with compound
events. Bertrand's paradox in Appendix 1.B illustrates the clear need for prop-
erly choosing the underlying probabilities. Three different solutions are obtained
there in accord with three possible choices of that uniform set. Which choice is
correct turns out not to be a question of mathematics but of the physics underlying
the measurements.
Suppose we have a random variable X that can take a set S of possible values
x_j for j = 1, 2, ..., N. It is then assumed that the probability
of each event j is known. Moreover, since the set of possible events is complete,
and something must happen, the total probability must be unity:
as given, and assume completeness for the density function p(x) in the form
The "discrete" case can be reformatted in continuous form by writing
where δ(x) is the Dirac delta function discussed in Appendix 1.A. It associates a
finite probability P_j with the value X = x_j.
Since mathematicians (until the time of Schwartz) regarded delta functions as
improper mathematics, they have preferred to deal with the cumulative density
function
which they call a distribution whereas physicists often use the name distribution
for a density function p(x). The cumulative probability replaces delta functions
by jump discontinuities which are regarded as more palatable. We shall only
occasionally find it desirable to use the cumulative distribution.
We shall refer to X as a random (or stochastic) variable if it can take a set (discrete
or continuous) of possible values x with known probabilities. With no loss of gen-
erality, we can use the continuous notation. These probabilities are then required
to obey
Mathematicians prefer the latter form and refer to the integral as a Stieltjes integral.
We have tacitly assumed that X is a single (scalar) random variable. However,
the concept can be immediately extended to the case in which X represents a
multidimensional object and x represents its possible values.
More generally, the nth moment of the probability distribution is defined by:
the probability density itself. Equation (1.65), below, provides one example in
which this definition is a useful way to determine the density distribution. In
attempting to determine the expectations of a random variable, it is often more
efficient to obtain an equation for a generating function of the random variable
first. For example, it may be faster to calculate the expectation ⟨exp(itX)⟩, which
includes all the moments ⟨X^n⟩, than to calculate each moment separately.
1.5 Characteristic functions and generating functions
which is the Fourier transform of the probability distribution. Note that φ(t) exists
for all real t and has the properties
and
for all t. We shall assume that the Stieltjes form of integral is used if needed. If all
moments of Eq. (1.14) exist then
provides a connection between the characteristic function and the moments. This
function φ(t) is a so-called generating function of a random variable.
A frequently used generating function can be obtained by setting
so that
Note that t is not the time, but just a parameter. One could equally well have used
k. The variable z is frequently used directly when the range of x is the set of
integers x_r = r, and then
provided that the variable X is not a lattice variable whose range of values is given
by:
If the range of X has an upper bound, i.e., p(x) = 0 for x > x_max, then it is
convenient to deal with the generating function
When neither condition is obeyed, one may still use these definitions with s
understood to be pure imaginary.
1.6 Measures of dispersion
In this section, we shall introduce moments that are taken with respect to the
mean. They are independent of origin, and give information about the shape of
the probability density. In addition, there is a set of moments known as cumulants
to physicists, or Thiele semi-invariants to statisticians. These are useful in describ-
ing the deviation from the normal error curve since all cumulants above the second
vanish in the Gaussian case.
Moments
The most important measure of dispersion in statistics is the standard deviation σ
defined by
since it describes the spread of the distribution p(x) about the mean value of x,
m = ⟨x⟩.
Chebychev's inequality
guarantees that the probability of a deviation exceeding a large multiple h of the
standard deviation σ must be small.
Proof
since the full value of σ² would be obtained by adding the (positive) integral over
the region m − hσ < x < m + hσ. In each of these integrals, (x − m)² ≥ (hσ)².
The inequality remains if we replace the RHS by its smallest possible value
or
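The completed bound, P(|x − m| ≥ hσ) ≤ 1/h², can be checked numerically. A minimal sketch (the exponential distribution, sample size, and seed are our illustrative choices, not from the text):

```python
import numpy as np

# Empirical check of Chebyshev's inequality: P(|X - m| >= h*sigma) <= 1/h^2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)
m, sigma = x.mean(), x.std()

for h in (2.0, 3.0, 4.0):
    tail = np.mean(np.abs(x - m) >= h * sigma)  # empirical tail probability
    assert tail <= 1.0 / h**2                   # the Chebyshev bound holds
```

For this distribution the actual tail probabilities (roughly e^−3, e^−4, e^−5) lie far below the bounds 1/4, 1/9, 1/16, illustrating how conservative the inequality is.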
for moments about the mean, m = ⟨x⟩, and to use Eq. (1.14)
for the ordinary moments. Thus μ₂ = σ² and μ₁ = 0. The binomial expansion of
Eq. (1.33) yields
where C(n, j) = n!/[j!(n − j)!] is the binomial coefficient. In particular:
Conversely:
Cumulants
The cumulants to be described in this section are useful since they indicate clearly
the deviation of a random variable from that of a Gaussian. They are sometimes
referred to as Thiele semi-invariants (Thiele 1903).
The cumulants KJ are defined by
Note that normalization of the probability density p(x) guarantees that μ₀ = 1 and
κ₀ = 0. Equivalently,
Thus κ₁ = m, and the higher κ's are expressible in terms of the moments of
(x — m). In particular:
where the individual linked moments must still be calculated by Eqs. (1.45)-
(1.49). However, Eq. (1.43) can be written in a nice symbolic form:
Example The normal error distribution (with mean zero) associated with a
Gaussian random variable X,
has the characteristic function
The integral can be performed by completing the square and introducing x′ =
x − iσ²t as the new variable of integration. In particular, the cumulants are all
determined by ln φ(t) to be
where
These measures γ₁ and γ₂ clearly vanish in the Gaussian case. Moreover, they
provide a pure description of shape independent of horizontal position or scale.
1.7 Joint events
Suppose that we have two random variables X and Y described (when taken sep-
arately) by the probability densities p₁(x) and p₂(y), respectively. The probability
that X is found in the interval (x, x + dx) and at the same time Y is found in the
interval (y, y + dy) is described by the joint probability density
Example
Two points x and y are selected, at random, uniformly on the line from 0 to 1.
(a) What is the density function p(ξ) of the separation ξ = |x − y|?
(b) What is the mean separation?
(c) What is the root mean squared separation [⟨ξ²⟩ − ⟨ξ⟩²]^(1/2)?
(d) What is ⟨W(|x − y|)⟩ for an arbitrary function W?
Solution
It is necessary to map the square with vertices at the four points in the (x, y) plane:
Using Eq. (1.62) and Eq. (1.64), the density function p(£) is then given by
Note that our use of a delta function to specify the variable we are interested in
is one of our principal tools. It fulfills our motto that experimentalists do it with
mirrors, and theorists do it with delta functions (and Green's functions).
The solution (1.65) is even in u, so we can integrate over half the interval and
double the result:
where the last integral, whose value is unity, was inserted as a means of intro-
ducing the variable ξ. Rearranging the order of integration, we get for the right
Fig. 1.2. The events A and B are nonoverlapping, and the probability of at least
one of these occurring is the sum of the separate probabilities.
hand side:
where the second integral is simply the definition of p(ξ). Restoring the left hand
side, we have established another tautology:
where W(ξ) is an arbitrary function and p(ξ) was given in Eq. (1.66). Finally, we
obtain an explicit formula
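These results can be verified by simulation, assuming the standard separation density p(ξ) = 2(1 − ξ) for two uniform points on [0, 1], which gives ⟨ξ⟩ = 1/3 and ⟨ξ²⟩ = 1/6 (a sketch; the sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check of the separation density p(xi) = 2(1 - xi) on [0, 1]:
# its first two moments are <xi> = 1/3 and <xi^2> = 1/6.
rng = np.random.default_rng(1)
x = rng.uniform(size=500_000)
y = rng.uniform(size=500_000)
xi = np.abs(x - y)          # separation of the two random points

mean_sep = xi.mean()        # compare with 1/3
mean_sq = (xi**2).mean()    # compare with 1/6
assert abs(mean_sep - 1/3) < 5e-3
assert abs(mean_sq - 1/6) < 5e-3
```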
Disjoint events
If A and B are disjoint events (events that cannot both occur), then
where A ∪ B means the union of A and B, that is, at least one of the events A or B
has occurred. In the language of set theory the intersection, A ∩ B, is the region in
which both events have occurred. For disjoint sets such as those shown in Fig. 1.2,
the intersection vanishes.
Overlapping events
The probability that at least one of two events A and B has occurred when overlap
is possible is
because the sum of the first two terms counts twice the probability P(A ∩ B) that
both events occur (the shaded area in Fig. 1.3).
Fig. 1.3. Events A and B that overlap are displayed. The hatched overlap region
is called the intersection and denoted A ∩ B.
Note that an event A or B need not be elementary. For example, A could repre-
sent the tossing of a die with an odd number appearing, which is the union of the
events of a one, a three, or a five appearing. B could be the union of the events one
and two.
Suppose Y_j is a random variable that takes the value 1 if event A_j occurs, and
zero otherwise. We have thus introduced a projection operator, that is, the analogue
for discrete variables of our introduction of a delta function for continuous vari-
ables. The probability that none of the events A₁, A₂, ..., A_n has occurred can be
written as
The probability that one or more events has occurred can be written as a
generalization of Eq. (1.75):
1.8 Conditional probabilities and Bayes' theorem
If X and Y are two, not necessarily independent, variables, the conditional proba-
bility density P(y|x) dy that Y takes a value in the range [y, y + dy], given that X
has the value x, is defined by
The notation in which the conditioned variables appear on the right is common in
the mathematical literature. It is also consistent with quantum mechanical notation
in which one reads the indexes from right to left. Thus verbally we say that the
probability that X and Y take the values x and y is the probability that X takes
the value x times the probability that Y will take the value y knowing that X has
taken the value x, a conclusion that now appears obvious.
Equation (1.82) is a general equation that imposes no requirements on the
nature of the random variables. Moreover, the same idea applies to events A and
B which may be more complicated than single random variables. Thus
Suppose that A^c is the complementary event to A (anything but A). Then these
events are mutually exclusive and exhaustive:
Then the events A ∩ B and A^c ∩ B are mutually exclusive and their union is B.
Thus
By the same argument, if the set of events A_j are mutually exclusive, A_i ∩ A_j =
∅, and exhaustive.
Bayes' theorem
One can determine the probability of a hypothesis A_j if we have made a mea-
surement B. This conditional probability P(A_j|B) is given by Bayes'
theorem
The first equality follows directly from the definition Eq. (1.80) of a conditional
probability. The second equality is obtained by inserting Eq. (1.88). The impor-
tance of Bayes' theorem is that it extracts the a posteriori probability, P(Aj\B),
of a hypothesis A_j after the observation of an event B from the a priori probability
P(A_j) of the hypothesis A_j.
For simple systems like the tossing of a die, the a priori probabilities are
known. In more general problems they have to be estimated, possibly as subjective
probabilities. Bayesians believe that this step is necessary. Anti-Bayesians do not.
They try to use another approach, such as maximum likelihood. In our opinion this
approach is equivalent to making a tacit assumption for the a priori probabilities.
We would prefer explicit assumptions.
Bernstein (1998) notes that Thomas Bayes (1763), an English minister,
published no mathematical works while he was alive. But he bequeathed his
manuscripts, in his will, to a preacher, Richard Price, who passed them to another
member of the British Royal Society, and his paper Essay Towards Solving a Prob-
lem in the Doctrine of Chances was published two years after his death. Although
Bayes' work was ignored for two decades after his death in 1761, he has since
become famous among statisticians and social and physical scientists.
Example
It is known that of 100 million quarters, 100 are two-headed. Thus the a priori
probability of a coin being two-headed is 10^-6. A quarter is selected at random
from this population and tossed 10 times. Ten heads are obtained. What is the
probability that this coin is two-headed?
Solution
Let A₁ = two-headed, A₂ = A₁^c = fair coin, and B = ten heads in ten tosses. We
have.
Then,
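The arithmetic can be laid out explicitly; a minimal sketch of the Bayes computation (the variable names are ours):

```python
# Bayes' theorem for the two-headed-quarter example:
# P(A1|B) = P(B|A1)P(A1) / [P(B|A1)P(A1) + P(B|A2)P(A2)].
p_a1 = 1e-6             # a priori probability of a two-headed quarter
p_a2 = 1.0 - p_a1       # a priori probability of a fair quarter
p_b_given_a1 = 1.0      # ten heads, given two-headed
p_b_given_a2 = 0.5**10  # ten heads, given fair: 1/1024

posterior = (p_b_given_a1 * p_a1
             / (p_b_given_a1 * p_a1 + p_b_given_a2 * p_a2))
print(posterior)        # about 1.02e-3: the coin is still almost surely fair
```

Even after ten heads in a row, the tiny prior keeps the posterior near one in a thousand.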
Example
Two points are chosen at random in the interval [0,1]. They are connected by a
line. Two more points are then chosen over the same interval and connected by a
second line. What is the probability that the lines overlap?
Solution
We will answer the complementary question, which is easier: what is the
probability that the lines do not overlap?
Suppose the first two points are x and y. No overlap will occur if the next two
points are both left of the smaller of x, y or both right of the larger of x, y. By
symmetry, the second probability is the same as the first.
Suppose x is the smaller of x and y. The probability that the third point is
less than x is x. The probability that the fourth point is less than x is also x. The
probability that both are less than x is x². What is the probability density P(x = ξ)
given that x < y? This conditional probability is
where H(x) is the Heaviside step function, H(x) = 1 if x > 0 and H(x) = 0
otherwise. We can evaluate the denominator in Eq. (1.91):
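The final answer can be checked by simulation: the no-overlap probability works out to 1/3, so the two lines overlap with probability 2/3. A Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

# Monte Carlo check: two random line segments on [0, 1] fail to overlap with
# probability 1/3, hence overlap with probability 2/3.
rng = np.random.default_rng(2)
pts = rng.uniform(size=(400_000, 4))     # four random points per trial
lo1 = np.minimum(pts[:, 0], pts[:, 1])
hi1 = np.maximum(pts[:, 0], pts[:, 1])
lo2 = np.minimum(pts[:, 2], pts[:, 3])
hi2 = np.maximum(pts[:, 2], pts[:, 3])

no_overlap = (hi2 < lo1) | (hi1 < lo2)   # one segment entirely beside the other
p_overlap = 1.0 - no_overlap.mean()
assert abs(p_overlap - 2/3) < 5e-3
```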
If the variables are independent, then the averages over x and y can be performed
separately with the result
Because the cumulants are defined in terms of the logarithm of the characteristic
function, the cumulants are additive:
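Additivity is easy to check numerically for the lowest orders, where κ₂ is the variance and κ₃ the third central moment. A sketch with exponential variables, for which κ_n = (n − 1)! θⁿ at scale θ (distribution, scales, and sample size are our illustrative choices):

```python
import numpy as np

# Numerical check that low-order cumulants add for independent variables.
rng = np.random.default_rng(3)
n = 1_000_000
x = rng.exponential(2.0, n)   # kappa_2 = 4,  kappa_3 = 16
y = rng.exponential(1.0, n)   # kappa_2 = 1,  kappa_3 = 2
s = x + y                     # cumulants of the sum should be 5 and 18

def k2(a):
    return np.mean((a - a.mean())**2)   # second central moment (variance)

def k3(a):
    return np.mean((a - a.mean())**3)   # third central moment

assert abs(k2(s) - (k2(x) + k2(y))) < 0.1
assert abs(k3(s) - (k3(x) + k3(y))) < 2.0
```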
More generally, if
The characteristic function of the joint distribution, p(x,y), of the two random
variables is defined by
If these variables X, Y are uncorrelated, i.e., ⟨XY⟩ = ⟨X⟩⟨Y⟩, then the
characteristic function factors:
Since p(x, y) can be obtained by taking the inverse Fourier transform of φ(s, t),
it too must factor. Hence we arrive at the result: if two Gaussian variables are
uncorrelated, they are necessarily independent.
If we ask for the probability that there are r successes in n trials without regard to
order, the probability will be
P_r(n) = C(n, r) p^r q^(n−r),   q ≡ 1 − p
With z = e^(it), the generating function can be expanded using the binomial theorem.
Since the coefficient of z^r in the generating function is P_r [or P_r(n) with n fixed
and r variable], we have established that
Comparison with Eq. (1.105) shows that the combinatorial and binomial coeffi-
cients are equal
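This identification can be verified by expanding the generating function (q + pz)ⁿ directly: the coefficient of z^r should equal C(n, r) p^r q^(n−r). A sketch with illustrative values of n and p:

```python
from math import comb

# Expand (q + p*z)**n in powers of z and compare each coefficient of z**r
# with the binomial probability C(n, r) * p**r * q**(n - r).
n, p = 6, 0.3
q = 1.0 - p

coeffs = [1.0]                       # polynomial coefficients in z
for _ in range(n):                   # multiply repeatedly by (q + p*z)
    nxt = [0.0] * (len(coeffs) + 1)
    for r, c in enumerate(coeffs):
        nxt[r] += c * q              # constant part
        nxt[r + 1] += c * p          # part proportional to z
    coeffs = nxt

for r in range(n + 1):
    assert abs(coeffs[r] - comb(n, r) * p**r * q**(n - r)) < 1e-12

assert abs(sum(coeffs) - 1.0) < 1e-12   # normalization at z = 1
```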
becomes a Gaussian random variable of mean zero and variance equal to unity.
These statements are heuristic. They tacitly assume what is known as the con-
tinuity theorem of probability discussed by E. Parzen (1960) in his Section 10.3.
This theorem states that the cumulative distribution function P_n and the charac-
teristic function φ_n are related in a continuous way: P_n converges at all points
of continuity of P if and only if the sequence of characteristic functions φ_n(t)
converges at each real t to the characteristic function φ(t) of P.
The result we have just found for the binomial distribution, that the normalized
sum variable
tends to 0 as n → ∞ for some δ > 0. Then the cumulative probability, P_n(u), that
See for example Uspensky (1937), Chapter 14. The condition (1.120) is less
stringent than the set of conditions in Eq. (1.115).
The cumulants then take the remarkably simple single value for all s
Linear fit
Suppose we expect, on theoretical grounds, that some theoretical variable Y should
be expressible as a linear combination of the variables X^μ. We wish to determine
the coefficients a_μ of this linear expansion in such a way as to best fit a set of
experimental data, i = 1 to N, of values X_i^μ, Y_i. We choose to minimize the
least squares deviation between the "theoretical" value of Y_i
and the observed value Y_i by minimizing the sum of the squares of the deviations
between experiment and theory:
We can minimize by differentiating with respect to a_λ. After removing a factor
of two, we find that the a_ν obey a set of linear equations:
which provides a least squares fit to the data by a linear function of the set of
individual random variables.
The logic behind this procedure is that the measurement of Y_i is made subject
to an experimental error that we treat as a Gaussian random variable ε_i. Errors
in the points X_i^μ of measurement are assumed negligible, and for simplicity, we
assume ⟨ε_i ε_j⟩ = δ_ij σ² with σ² independent of i. If σ depends on i, then the ith
term in Eq. (1.133) should contain a factor 1/σ_i². A more interesting question is how
to test the adequacy of the assumption that a linear fit is the correct one.
If there were only two independent variables X^1 and X^2, we could think of
these as horizontal coordinates and Y as a vertical coordinate. We have minimized
the sum of the squares of the vertical deviations because it was assumed that there
are no horizontal errors.
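The normal equations above can be solved directly. A sketch with synthetic data for two variables (the design matrix, noise level, and "true" coefficients are our illustrative choices):

```python
import numpy as np

# Least squares fit of Y to a linear combination a_1*X^1 + a_2*X^2 by solving
# the normal equations (X^T X) a = X^T Y; all data here are synthetic.
rng = np.random.default_rng(4)
N = 200
X = rng.uniform(size=(N, 2))                     # columns hold X^1_i and X^2_i
a_true = np.array([2.0, -1.0])                   # "true" coefficients
Y = X @ a_true + 0.01 * rng.standard_normal(N)   # small Gaussian errors eps_i

a_fit = np.linalg.solve(X.T @ X, X.T @ Y)        # solve the normal equations
assert np.allclose(a_fit, a_true, atol=0.02)
```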
Nonlinear fit
If one expects a nonlinear relation of the form
where the a_λ are the parameters of the fit, then it is customary to attempt to
minimize the sum of the squares of the deviations
of a_j < x < b_j, and an expected number ⟨n_j⟩ = np_j of events in that interval.
We then observe an actual set of n_j's. Are these observations compatible with the
assumed p(x) distribution? Karl Pearson (1900) established that the n_j's are
described by a multivariate Gaussian of the form
where
that describes the probability of observing data leading to a chi-square larger than
the observed value. A fairly general but somewhat arbitrary convention is to use
5% as the dividing line between small and large. Thus a deviation large enough
to have a less than 5% chance of being observed (if the hypothesis is correct) is
used to cast suspicion on the validity of the hypothesis. Actually, this procedure
gives no probability of correctness of the hypothesis, only the probability
of observing the given event if the hypothesis is assumed correct. As Bayesians,
we claim that one needs an a priori probability of correctness to deduce the a
posteriori probability of correctness with the help of Bayes' theorem. With only
qualitative a priori information, one could reduce the 5% level say to 1% if one
has peripheral information leading one to have some faith in the hypothesis.
Can chi-square be too small? There was an article in Science magazine within
the last 10 years that argued that the experimental results supporting Coulomb's
inverse square law had such a small chi-square that the fit was too good and that
the experimental data was doctored.
If we let
we obtain
The result is
This result is used in Appendix 1 .A to determine the volume and surface area of a
sphere in n dimensions.
Let us consider a simple dice throwing example from Alexander (1991).
Example
A single die is tossed n = 100 times. The frequency with which each side of
the die is observed is indicated in Table 1.1 below. Calculate chi-square from this
table.
Solution In this case, all pi = 1/6, so npi = 100/6 = m. There are six possible
frequencies, but there is one constraint: the sum must add to 100. Thus there are
five degrees of freedom.
TABLE 1.1.
Value      1   2   3   4   5   6
Frequency  18  18  17  13  18  16
Is this large? No. Indeed, the 5% acceptance level permits fluctuations as large
as 11.070 in chi-square when the number of degrees of freedom is only 5.
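The chi-square value for Table 1.1 can be checked directly (a small sketch; the 5% critical value 11.070 for 5 degrees of freedom is the one quoted above):

```python
# Chi-square for the dice data of Table 1.1:
#   chi2 = sum_j (n_j - n p_j)^2 / (n p_j),  with all p_j = 1/6.
freq = [18, 18, 17, 13, 18, 16]      # observed frequencies from Table 1.1
n = sum(freq)                        # 100 tosses
m = n / 6                            # expected count n p_j = 100/6
chi2 = sum((f - m) ** 2 / m for f in freq)
print(chi2)                          # 1.16, far below the 5% level of 11.070
                                     # for 6 - 1 = 5 degrees of freedom
```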
where x denotes a vector with n components, x₁, x₂, ..., x_n, x′ denotes the trans-
posed vector and A denotes an n × n matrix which can be chosen to be symmetric.
When the matrix Aij possesses off-diagonal terms, the components of x are
correlated.
The normalization factor N is determined by
we can set
and
is diagonal. Thus
But
Thus
where
The multivariate distribution, Eq. (1.158), that we started with has all means equal
to zero, ⟨X⟩ = 0.
The slightly generalized distribution
As in the univariate case, the exponent is a quadratic form and all cumulants
of order higher than 2 vanish. Moreover, the second moment of the distribution
is given directly by the coefficients of the quadratic terms in the exponent. In
particular, with
Writing V = A⁻¹, which is the variance matrix,
so that
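The statement that the variance matrix is V = A⁻¹ can be verified by brute-force numerical integration of the multivariate Gaussian (a sketch with an arbitrarily chosen 2 × 2 matrix A):

```python
import numpy as np

# Check numerically that for P(x) proportional to exp(-x' A x / 2) the
# second moments are <x_i x_j> = (A^{-1})_{ij}, i.e. V = A^{-1}.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])          # symmetric, positive definite (arbitrary)

# brute-force grid integration (adequate since the Gaussian decays fast)
s = np.linspace(-6, 6, 601)
x1, x2 = np.meshgrid(s, s, indexing="ij")
w = np.exp(-0.5 * (A[0, 0]*x1**2 + 2*A[0, 1]*x1*x2 + A[1, 1]*x2**2))
norm = w.sum()
V_num = np.array([[(x1*x1*w).sum(), (x1*x2*w).sum()],
                  [(x2*x1*w).sum(), (x2*x2*w).sum()]]) / norm
print(V_num)              # close to inv(A)
print(np.linalg.inv(A))
```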
Introduction
In this section we propose two laws of gambling that appear to contradict one
another. We shall state them loosely first to demonstrate the apparent contradiction:
The first law of gambling states that no betting scheme, i.e., method of varying
the size of bets, can change one's expected winnings.
The second law of gambling states that if you are betting against a slightly
unfair "house" there is a way to arrange one's bets to maximize the probability of
winning.
Proof
Let d be the odds given, so that if b_r is the amount bet on the rth trial, the loss is b_r
on failure and the amount won on success is b_r d. If S is the sum won, its expected
value is
where
is a measure of the game's unfairness. In a fair game, the odds would be d = q/p
and the expected winnings remain zero regardless of the choice of the br. In any
case, the expected winnings depend on the total bet B, not on how the bets were
distributed.
Another fallacy with the scheme of doubling one's bets is that it presumes the
bettor has infinite capital. The problem of winning is reformulated in the second
law of gambling.
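The first law can be checked by enumerating a concrete betting scheme; the sketch below uses the doubling scheme, with illustrative values of p and d (not from the text):

```python
# First law of gambling: for any betting scheme, the expected winnings are
#   E[S] = -eps * E[B],  with eps = q - d p and B the total amount bet.
# Check by enumerating the classic doubling scheme (bet 1, 2, 4, 8; stop
# at the first win or after four rounds).
p, d = 0.48, 1.0                      # illustrative win probability and odds
q = 1.0 - p
eps = q - d * p                       # degree of unfairness

ES = EB = 0.0
for k in range(4):                    # lose k times, then win on round k + 1
    prob = q**k * p
    bets = 2**(k + 1) - 1             # total bet 1 + 2 + ... + 2^k
    winnings = d * 2**k - (bets - 2**k)   # win d*b_k, lose the earlier bets
    ES += prob * winnings
    EB += prob * bets
ES += q**4 * (-15.0)                  # all four bets lost: 1 + 2 + 4 + 8 = 15
EB += q**4 * 15.0

print(ES, -eps * EB)                  # equal: no scheme changes the expectation
```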
where ε = q − dp is the degree of unfairness of the bet, and B is the total amount
bet.
Proof
If P is the (unknown) probability of winning, and Q = 1 − P, the expected
winnings are
Conjecture
Consistent with the restriction that one should never bet more than necessary to
win the game in a single step, the best strategy is to make a sufficiently large bet
that one can win in a single try. We assume the game is unfair, and this procedure
is designed to minimize the total bet.
Suppose bets are available with odds up to d = W/C. Then one should make
a single bet of one's entire capital C, at these odds. The probability of winning in
this single step is p, which by Eq. (1.186) can be expressed in terms of the available
odds as
Any betting scheme has the probability, P, of Eq. (1.187), of winning. If the odds
are lower, several bets will have to be made, and we will have B > C, thus
Suppose odds up to d₁ = W/(C/2) are available. Then one can bet C/2 on the
first bet and win with probability
The best one can do, then, is to stop if one wins on the first step, and to bet C/2
again if one loses. The expected amount bet is then
so that the probability P of winning is better than that in the first scheme, Eq.
(1.190). It is clear that if the degree, ε, of unfairness is the same at all odds, it is
favorable to choose the highest odds available and bet no more than necessary to
achieve C + W.
1.13 Appendix A: The Dirac delta function
Point sources enter into electromagnetic theory, acoustics, circuit theory, prob-
ability and quantum mechanics. In this appendix, we shall attempt to develop
a convenient representation for a point source, and establish its properties. The
results will agree with those simply postulated by Dirac (1935) in his book on
quantum mechanics and called delta functions, or "Dirac delta functions" in the
literature.
What are the required properties of the density associated with a point source?
The essential property of a point source density is that it vanishes everywhere
except at the point. There it must go to infinity in such a manner that its integral
over all space (for a unit source) must be unity. We shall start, in one dimension,
by considering a source function δ(ε, x) of finite size, ε, which is small for x ≫ ε,
and of order 1/ε for x ≲ ε, such that the area for any ε is unity:
A problem involving a point source can always be treated using one of the finite
sources, δ(ε, x), by letting ε → 0 at the end of the calculation. Many of the steps
in the calculation (usually integrations) can be performed more readily if we can let
ε → 0 at the beginning of the calculation. This will be possible provided that the
results are independent of the shape g(y) of the source. Only if this is the case,
however, can we regard the point source as a valid approximation in the physical
problem at hand.
The limiting process ε → 0 can be accomplished at the beginning of the
calculation by introducing the Dirac delta function
This is not a proper mathematical function because the shape g(y) is not specified.
We shall assume, however, that it contains those properties which are common to
all shape factors. These properties can be used in all problems in which the point
source is a valid approximation. It is understood that the delta function will be
used in an integrand, where its properties become well defined.
The most important property of the δ function is
for a < b < c, and zero if b is outside these limits. Setting b = 0, for simplicity of
notation, we can prove this theorem in the following manner:
In the first term, the limit as ε → 0 can be taken. The integral over g(y) is then
1 if a < 0 < c since the limits then extend from — oo to oo, and 0 otherwise
since both limits approach plus (or minus) infinity. The result then agrees with
the desired result, Eq. (1.202). The second integral can be easily shown to vanish
under mild restrictions on the functions / and g. For example, if / is bounded and
g is positive, the limits can be truncated to fixed finite values, say a' and c' (to any
desired accuracy), since the integral converges. Then the limit can be performed
on the integrand, which then vanishes.
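The sifting property can be illustrated numerically with a Gaussian shape factor (a sketch; the particular f and the ε values are arbitrary):

```python
import math

# Numerical check of the sifting property: with a normalized Gaussian shape
#   g_eps(x) = exp(-x^2 / eps^2) / (eps sqrt(pi)),
# the integral of f(x) g_eps(x) approaches f(0) as eps -> 0.
def sift(f, eps, h=1e-3, L=10.0):
    n = int(2 * L / h)
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h     # midpoint rule on [-L, L]
        total += f(x) * math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi)) * h
    return total

for eps in (1.0, 0.5, 0.1):
    print(eps, sift(math.cos, eps))   # tends to cos(0) = 1 as eps shrinks
```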
The odd part of the function makes no contribution to the above integral, for
any f(x). It is therefore customary to choose g(y), and hence δ(x), to be even
functions
when the range of integration includes the singular point. The indefinite integral
over the delta function is simply the Heaviside unit function H(x)
With g(y) taken as an even function, its integral from negative infinity to zero is
one half, so that the appropriate value of the Heaviside unit function at the origin
is
since
and Eq. (1.211) is clearly valid underneath an integral sign in accord with Eq.
(1.202).
As a special case Eq. (1.211) yields
or
Thus a symmetric region [−ε, ε] is excised by the principal valued reciprocal before
the integral is performed. The function x/(x² + ε²) behaves as 1/x for |x| ≫ ε
and deemphasizes the region near x = 0. Thus, in the limit, it reduces to the
principal valued reciprocal. The combination
This theorem follows from the fact that a delta function vanishes everywhere
except at its zeros, and near each zero, we can approximate
The denominator is just the Jacobian for the transformation from Cartesian to
spherical coordinates. This is the natural generalization of the Jacobian found in
Eq. (1.221), and guarantees that the same result, Eq. (1.226), is obtained regardless
of which coordinate system is used.
where the Heaviside unit function confines the integration region to the interior of
the sphere. If we now differentiate this equation with respect to R2 to convert the
Heaviside function to a delta function, we get
Now, if we let x_i = Ru_i for all i and use the scaling property, Equation (1.220), of
delta functions we get
where
Solution
Out of the 36 possible tosses of a pair, only the four combinations, 1 + 4, 2 + 3,
3 + 2, and 4 + 1 add to 5. Similarly, six combinations add to 7: 1 + 6, 2 + 5,
3 + 4, 4 + 3, 5 + 2, and 6 + 1. Thus in a single toss the three relevant probabilities
are P₅ = 4/36 and P₇ = 6/36 for 5 and 7, and P₀ = 26/36 for all other possibilities
combined. The probability of r tosses of "other", and s ≥ 1 tosses of
5, followed by a toss of a 7 is given by
where the sum on s starts at 1 to ensure the presence of a 5 toss, r has been replaced
by n — s and the combinatorial coefficient has been inserted to allow the "other"
tosses and the 5 tosses to appear in any order. The 7 toss always appears at the end.
The sum over s can be accomplished by adding and subtracting the s = 0 term:
The corresponding result for 7 to appear first is obtained using the formulas with
5 and 7 interchanged:
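Summing the geometric series over the "other" tosses gives P₅/(1 − P₀) = P₅/(P₅ + P₇) for the probability that a 5 appears before a 7; a quick numerical check:

```python
# Probability that a 5 appears before a 7 in repeated tosses of two dice:
#   P = sum_r P0^r * P5 = P5 / (1 - P0) = P5 / (P5 + P7).
P5, P7 = 4/36, 6/36
P0 = 26/36                      # neither a 5 nor a 7

series = sum(P0**r * P5 for r in range(500))   # r "other" tosses, then a 5
closed = P5 / (P5 + P7)
print(series, closed)           # both equal 2/5
```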
Solution
Because of the distribution of the gold coins, the probability that the coin came
from the first drawer is p₁ = 3/6, because three of the six available gold coins
were in that drawer. Similarly p₂ = 2/6, and p₃ = 1/6. The probability that there
is a second coin in the same drawer is 1 × p₁ + 1 × p₂ + 0 × p₃ = 5/6. Similarly,
the probability that the second selected coin (from the same drawer) is gold is
1 × p₁ + (1/2) × p₂ = 2/3, since the second coin is surely gold in the first drawer,
has a 1/2 chance of being gold in the second drawer, and there are no gold coins
left in the third drawer. Note that these values of 1, 1/2, and 0 are conditional
probabilities given the outcome of the first choice.
Solution 2
The chord will be greater than the triangle side if the angle subtended by the
chord is greater than 120 degrees (out of a possible 180) which it achieves with
probability 2/3.
Solution 3
Draw a tangent line to the circle at an intersection with the chord. Let φ be the
angle between the tangent and the chord. The chord will be larger than the triangle
side if φ is between 60 and 120 degrees, which it will be with probability (120 −
60)/180 = 1/3.
Solution 2 is given in Kyburg (1969). Solution 3 is given by Uspensky (1937)
and the first solution is given by both.
Which solution is the correct one? Answer: the problem is not well defined.
We do not know which measures have uniform probability unless an experiment is
specified. If a board is ruled with a set of parallel lines separated by the diameter,
and a circular disk is dropped at random, the first solution is correct. If one spins a
pointer at the circle edge, the third solution would be correct.
Gambler's ruin
Hamming (1991) considers a special case of the gambler's ruin problem, in which
gambler A starts with capital C, and gambler B starts with W (or more) units. The
game will be played until A loses his capital C or wins an amount W (even if B is
a bank with infinite capital). Each bet is for one unit, and there is a probability p
that A will win and q = 1 − p that B will win.
Solution
The problem is solved using the recursion relation
where P(n) is the probability that A will win if he holds n units. The boundary
conditions are
Strictly speaking, the recursion relation only needs to be satisfied for 0 < n < T,
which omits the two end points. However, the boundary conditions, Eq. (1.240),
then lead to a unique solution. The solution to a difference equation with constant
coefficients is analogous to that of a differential equation with constant coeffi-
cients. In the latter case, the solution is an exponential. In the present case, we
search for a power law solution, P(n) = rⁿ, which is an exponential in n. The
result is a quadratic equation with roots 1 and p/q. The solutions, 1ⁿ and (p/q)ⁿ,
actually obey the recursion relation, Eq. (1.239), for all n. But they do not obey
the boundary conditions. Thus we must, as in the continuous case, seek a linear
combination
and impose the boundary conditions of Eq. (1.240) to obtain simultaneous linear
conditions on A and B. The final solution is found to be
Since our solution obeys the boundary conditions, as well as the difference equa-
tion everywhere (hence certainly in the interior) it is clearly the correct, unique
solution.
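The final solution can be checked against the recursion and both boundary conditions (a sketch with illustrative numbers, written here with the ratio q/p; whether p/q or q/p appears depends only on the direction in which the recursion is written):

```python
# Gambler's ruin check: with win probability p per unit bet, q = 1 - p, and
# target T = C + W, the probability of winning starting from n units is
#   P(n) = (1 - (q/p)^n) / (1 - (q/p)^T).
p, q = 0.48, 0.52               # illustrative, slightly unfair game
C, W = 10, 5
T = C + W
r = q / p

def P(n):
    return (1 - r**n) / (1 - r**T)

assert P(0) == 0.0 and abs(P(T) - 1.0) < 1e-15   # boundary conditions
for n in range(1, T):                             # recursion in the interior
    assert abs(P(n) - (p * P(n + 1) + q * P(n - 1))) < 1e-12
print(P(C))       # probability of winning W before losing the capital C
```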
2
is known for all possible sets [t₁, t₂, ..., t_n] of times. Thus we assume that a set of
functions
A stationary process is one which has no absolute time origin. All probabilities are
independent of a shift in the origin of time. Thus
In particular, this probability is a function only of the relative times, as can be seen
by setting τ = −t₁. Specifically, for a stationary process, we expect that
reduces to the stationary state, independent of the starting point when this limit
exists. For the otherwise stationary Brownian motion and Poisson processes in
Chapter 3, the limit does not exist. For example, a Brownian particle will have a
distribution that continues to expand with time, even though the individual steps
are independent of the origin of time.
A Gaussian process is one for which the multivariate distributions
pn(xn,xn-i, ...,xi) are Gaussians for all n. A Gaussian process may, or may
not be stationary (and conversely).
A Markovian process is like a student who can remember only the last thing he
has been told. Thus it is defined by
that is, the probability distribution of x_n is sensitive only to the last known event
x_{n−1} and forgets all prior events. For a Markovian process, the conditional
probability formula, Eq. (2.5), specializes to
or
where w_{a′a} is the transition probability per unit time and the second term has been
added to conserve probability. It describes the particles that have not left the state
a provided that
If we set t = t₀ + Δt₀, we can evaluate the right hand side of the Chapman-
Kolmogorov condition to first order in Δt and Δt₀:
which is just the value p(a′, t₀ + Δt + Δt₀ | a₀, t₀) expected from Eq. (2.18).
Note, however, that this proof did not make use of the conservation condition,
Eq. (2.19). This will permit us, in Chapter 8, to apply the Chapman-Kolmogorov
condition to processes that are Markovian but whose probability is not normalized.
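The Chapman-Kolmogorov condition can be illustrated with the simplest example, a symmetric two-state process (a sketch; the rate w and the times are arbitrary):

```python
import math

# Chapman-Kolmogorov check for a symmetric two-state Markov process with
# transition rate w per unit time.  The conditional probabilities are
#   P_same(t) = (1 + exp(-2 w t)) / 2,   P_flip(t) = (1 - exp(-2 w t)) / 2.
w = 0.7

def prop(t):
    s = 0.5 * (1 + math.exp(-2 * w * t))
    f = 0.5 * (1 - math.exp(-2 * w * t))
    return [[s, f], [f, s]]        # matrix P[a'][a0]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t1, t2 = 0.4, 1.1
lhs = matmul(prop(t2), prop(t1))   # sum over the intermediate state
rhs = prop(t1 + t2)
print(lhs, rhs)                    # equal: the C-K condition holds
```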
3
Examples of Markovian processes
Consider two physical problems describable by the same random process. The
first process is the radioactive decay of a collection of nuclei. The second is the
production of photoelectrons by a steady beam of light on a photodetector. In both
cases, we can let a discrete, positive, integer valued, variable n(t) represent the
number of counts emitted in the time interval between 0 and t. In both cases there
is a constant probability per unit time ν such that ν dt is the expected number of
photocounts in [t, t + dt] for small dt. We use the initial condition
Then n − n₀ will be the number of counts in the interval [0, t]. When we talk
of P(n, t) we can understand this to mean P(n, t | n₀, 0), the conditional density
distribution. Since the state n(t) = n is supplied by transitions from the state n − 1
with production of photoelectrons at a rate ν dt and is diminished by transitions
from state n to n + 1, we have the equation
with the middle term supplying the increase in P(n) by a transition from the n − 1
state, and the last term describing the exit from state n by emission from that state.
These are usually referred to as rate in and rate out terms respectively. Canceling
a factor dt we obtain the rate equation
In the first term, n increases from n — 1 to n, in the second from n to n +1. Thus n
never decreases. Such a process is called a birth process in the statistics literature,
or a generation process in the physics literature. A more general process is called
a birth and death process or a generation-recombination process.
Since n ≥ n₀ we have no supply from the state P(n₀ − 1, t), so that
whose solution is
since P(n, 0) = δ_{n,n₀} at time t = 0, corresponding to the certainty that no
counts have occurred by time t = 0.
The form, Eq. (3.5), of this solution suggests the transformation
Thus any Q(n, t) may be readily obtained if Q(n − 1) is known. But n, as described
by Eq. (3.3), can only increase. Thus
or, setting n = n₀ + m,
for n ≥ n₀, with a vanishing result for n < n₀. This result specializes to the usual
Poisson answer
for the usual case n₀ = 0 (see also Eq. (1.128)). The two formulas, Eqs. (3.12)
and (3.13), are, in fact, identical, since n − n₀ has the meaning of the number of
events occurring in the interval (0, t). The more general form is useful in verifying
the Chapman-Kolmogorov conditions
where the last step recognizes the binomial expansion that occurred in the previous
step. The final result is equal to that in Eq. (3.13) if t is replaced by (t − t₀) in the
latter.
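The verification of the Chapman-Kolmogorov condition for the Poisson process, via the binomial expansion noted above, can be checked numerically (with illustrative parameters):

```python
import math

# Chapman-Kolmogorov for the Poisson process: summing over the intermediate
# count m reproduces the Poisson law for the total elapsed time,
#   sum_m P(n, t | m, t0) P(m, t0 | n0, 0) = P(n, t | n0, 0).
def poisson(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k) if k >= 0 else 0.0

nu, t0, t, n0, n = 2.0, 0.7, 1.9, 0, 5     # illustrative values
lhs = sum(poisson(n - m, nu * (t - t0)) * poisson(m - n0, nu * t0)
          for m in range(n0, n + 1))
rhs = poisson(n - n0, nu * t)
print(lhs, rhs)     # equal, by the binomial expansion of (nu t)^(n - n0)
```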
The Poisson process is stationary, so that P(n, t | n₀, t₀) is a function only of
t − t₀. However, no limit exists as t − t₀ → ∞, so that there is no time independent
P(n). We shall therefore evaluate the characteristic function of the conditional
probability density
This result reduces to Eq. (1.130) if one sets n₀ = 0. The cumulants can be
calculated as follows:
Here the subscript L is used to denote the linked moment or cumulant as in Section
1.6.
points at the positions ja where j = 0, ±1, ±2, etc. and a is the spacing between
the points. At each interval of time, T, a hop is made with probability p to the right
and q = 1 — p to the left.
The distribution of r, of hops to the right, in N steps is given as before by the
Bernoulli distribution:
The first moment, and the second moment about the mean are given as before in
Section 1.9 by
A particle that started at 0 and has taken r steps to the right, and N − r to the left,
arrives at position
Notice, if p = q = 1/2, or equal probability to jump to the right or the left, the
average position after N steps will remain 0. The second moment about the mean
is given by
From the central limit theorem, discussed in Section 1.9, the limiting dis-
tribution after many steps is Gaussian with the first and second moments just
obtained:
The factor 2 in the definition of the diffusion coefficient D is appropriate for one
dimension, and would be replaced by 2d if we were in a space of dimension d. Thus
the distribution moves with a "drift" velocity
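The first two moments of the walk can be confirmed by direct summation over the Bernoulli distribution (a sketch with arbitrary N, p, and step size a):

```python
from math import comb

# Moments of the random walk position x = (2r - N) a, with r Bernoulli
# distributed: <x> = (p - q) N a  and  <(x - <x>)^2> = 4 N p q a^2.
N, p, a = 20, 0.3, 1.0          # illustrative values
q = 1.0 - p

pmf = [comb(N, r) * p**r * q**(N - r) for r in range(N + 1)]
mean = sum(pmf[r] * (2 * r - N) * a for r in range(N + 1))
var = sum(pmf[r] * ((2 * r - N) * a - mean) ** 2 for r in range(N + 1))
print(mean, var)    # (p - q) N a = -8.0  and  4 N p q a^2 = 16.8
```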
The problem we discussed, in connection with the second law of gambling, that
of winning a specific sum W starting with a finite capital C, is referred to as the
Gambler's ruin problem.
To make connection to physical problems, we map the probability to a ran-
dom walk problem on a line. It is distinguished from conventional random walk
problems because it involves absorbing boundaries. Since the game ends at these
boundaries it is also a first passage time problem - a member of a difficult class.
The gambling problem with bet b and odds d can be described as a random
walk problem on a line with steps to the left of size b if a loss is incurred, and a
step to the right of size bd if a win occurs. Instead of dealing with the probability of
winning at each step, we shall define P(x) as the probability of eventually winning
if one starts with capital x.
Our random walk starts at the initial position C. The game is regarded as lost if
one arrives at 0, i.e., no capital left to play, and it is regarded as won if one arrives
at the objective C + W.
Our random walk can be described by the recursion relation:
since the right hand side describes the situation after one step. With probability p
one is at position x + bd with winning probability P(x + bd) and with probability
q, one is at position x — b with winning probability P(x — b). Since the probability
of eventual winning depends on x, but not how we got there, this must also be the
probability P(x).
The procedure we have just described of going directly after the final answer,
rather than following the individual steps, is given the fancy name "invariant
embedding" by mathematicians, e.g., Bellman (1964).
The boundary conditions we have are
We establish in Appendix 3.A that there are exactly two roots, one with λ =
0, and one with λ > 0. Calling the second root λ, the general solution of Eq.
(3.28) is a linear combination of 1 and exp(λx), subject to the boundary conditions,
Eq. (3.29), with the result
Although Eq. (3.30) does not supply an explicit expression for λ (except in the
case C ≪ 1), we know that λ > 0. The denominator in Eq. (3.32) then increases
more rapidly with λ than the numerator. Thus
Since the condition (3.30) involves only the product λb, an increase in b causes
a decrease in λ, hence an increase in P. Thus the probability of winning is an
increasing function of b. Of course, at the starting position, a bet greater than C is
impossible. Thus the optimum probability is obtained if λ is calculated from Eq.
(3.30) with b replaced by C:
Our arguments have tacitly assumed that no bet requires a step outside the
domain 0 ≤ x ≤ W + C. Thus if a game with large odds d = 2W/C were
allowed, the preceding argument would not apply, and it would be appropriate to
bet C/2, since the objective is to win no more than W, and to terminate the game
as soon as possible, in order to minimize the total amount bet.
In the large N limit, the distribution function (3.23) for the one-dimensional
random walk can be written as
Equation (3.35) is the Green's function of the diffusion equation that is written
down explicitly in Eq. (3.50) below. That is, Eq. (3.35) is the solution of Eq. (3.50)
that obeys the initial condition:
Let us compare this result with the macroscopic theory of diffusion in which a
concentration c of particles obeys the conservation law
where the particle (not electrical) current density is given by Fick's law
and D is the macroscopic diffusion constant. Thus c obeys the diffusion equation
in agreement with our random walk result, Eq. (3.35), for v = 0 but the initial
position at x₀.
where the mechanical mobility B is the mean velocity of a particle per unit applied
force. Thus the drift current per unit of concentration c is proportional to the
applied field F.
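That the Gaussian of Eq. (3.35) solves the drift-diffusion equation can be checked pointwise by finite differences (a sketch; the values of D, v, and the test point are arbitrary):

```python
import math

# Numerical check that the Green's function
#   c(x, t) = exp(-(x - v t)^2 / (4 D t)) / sqrt(4 pi D t)
# satisfies the drift-diffusion equation  dc/dt = D d2c/dx2 - v dc/dx.
D, v = 0.5, 1.2                # illustrative values

def c(x, t):
    return math.exp(-(x - v * t) ** 2 / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

x, t, h = 0.8, 1.0, 1e-4       # arbitrary test point; h sets the stencil size
dc_dt = (c(x, t + h) - c(x, t - h)) / (2 * h)
dc_dx = (c(x + h, t) - c(x - h, t)) / (2 * h)
d2c_dx2 = (c(x + h, t) - 2 * c(x, t) + c(x - h, t)) / h**2
print(dc_dt, D * d2c_dx2 - v * dc_dx)   # agree to discretization error
```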
However, if a force F is applied in an open circuit, a concentration gradient
will build up large enough to cancel the drift current
or
The simplest example of this is the concentration distribution set up in the atmo-
sphere subject to the gravitational force plus diffusion. This steady state result
must agree with the thermal equilibrium Boltzmann distribution
Comparison of the two expressions for c(x) yields the Einstein relation between
diffusion, D, and mobility, B:
For charged particles, F = eE, and the electrical mobility is μ = v/E = eB, so
that
where v = BF (or μE in the charged case). Equation (3.50) is a special case
of a Fokker-Planck equation to be discussed in Section 8.3. We note, here, that
the drift is contained in the first derivative coefficient and the diffusion in the
second derivative coefficient.
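As a numerical illustration of the Einstein relation in the charged case, D = μkT/e (the germanium mobility below is a rough, order-of-magnitude textbook value, not taken from this text):

```python
# Einstein relation for charged carriers: D = mu * kT / e.  At room
# temperature kT/e is about 0.0259 volts; the electron mobility in
# germanium used here (~3900 cm^2/V s) is an approximate textbook figure.
kT_over_e = 0.0259          # volts, at T = 300 K
mu = 3900.0                 # cm^2 / (V s), approximate electron mobility in Ge
D = mu * kT_over_e
print(D)                    # on the order of 100 cm^2/s
```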
The solution of this equation for a pulse starting at x = 0 at t = 0 is
which is the precise analog of the discrete random walk solution, Eq. (3.35).
By injecting a pulse of minority carriers into a semiconductor and examining the
response on an oscilloscope at a probe a distance down the sample, a direct mea-
surement can be made of the "time of flight" of the pulse and the spread in its
width. This technique, introduced by Haynes, was applied by his class, the Transistor
Teacher's Summer School (1952), to verify the Einstein relation for electrons and
holes in germanium. Note that Eq. (3.51) describes a Gaussian pulse whose center
travels with a velocity v, so that the position of the pulse center grows linearly
with time. Also, the pulse has a Gaussian shape, and the root mean square
width is given by (2Dt)^{1/2}. Measurements were made by each of the 64
students in the class. The reference above contained the average results that ver-
ified the Einstein relation between the diffusion constant and the mobility. With
holes or electrons injected into a semiconductor, a pulse will appear on a computer
screen connected by probes to the semiconductor. For several probes at different
distances, the time of arrival can be noted and the width of the pulse is measured
at each probe. Thus a direct measurement is made of both the mobility μ and the
diffusion constant D.
The biologist Robert Brown (1828) observing tiny pollen grains in water under
a microscope, concluded that their movement "arose neither from currents in the
fluid, nor from its gradual evaporation, but belonged to the particle itself".
MacDonald (1962) points out that there were numerous explanations of Brownian
motion, proposed and disposed of in the more than 70 years until Einstein (1905,
1906, 1956) established the correct explanation that the motion of the particles was
due to impact with fluid molecules subject to their expected Boltzmann distribution
of velocities.
It is of interest to comment on the work of von Nägeli (1879), who proposed
molecular bombardment but then ruled out this explanation because it yielded
velocities two orders of magnitude less than the observed velocities of order
10⁻⁴ cm/sec. von Nägeli assumed that the liquid molecules would have a velocity
given by
where the mass of the Brownian particle M is proportional to the cube of its radius
so that
Our introduction to the Langevin treatment of Brownian motion comes from the
paper of Chandrasekhar (1943) and the earlier paper of Uhlenbeck and Ornstein
(1930), both of which are in the excellent collection made by Wax (1954).
However, a great simplification can be made in the algebra if one assumes from
the start that the process is Gaussian in both velocity and position. The justification
is given in Appendix 3.B.
The distribution of velocity is first considered. The free particle of mass M
subject to collisions by fluid molecules is described by the equation (for simplicity,
we discuss the one-dimensional case, instead of the actual three-dimensional case)
It was Langevin's (1908) contribution to recognize that the total force F exerted
by the fluid molecules contains a smooth part —v/B associated with the viscosity
58 EXAMPLES OF M A R K O V I A N PROCESSES
of the fluid that causes the macroscopic motion of a Brownian particle to decay
plus a fluctuating force F(t) whose average vanishes
This fluctuating part will be shown to give rise to the diffusion of the Brownian
particle. The relation between the fluctuating part and the diffusion part is the
Einstein relation to be derived below. It is also a special case of the fluctuation-
dissipation theorem to be derived in Chapter 7.
Note that if a steady external force G is applied, the average response at long
times is v = BG so that B is to be interpreted as the mechanical mobility. If the
particle is a sphere of radius a moving in a medium of viscosity η, then Stokes' law
yields (in the three-dimensional case)
must fall off in times of order 10 sec, much shorter than the 10 sec decay
time. It is therefore permissible to approximate the correlation as a Dirac delta
function.
for the particular ^(s) of Eq. (3.62). The limiting value at long times is
In the limit as t → ∞, ⟨v²(t)⟩_{v₀} must approach the thermal equilibrium value
that relates a measure d of diffusion in velocity space to the mobility B (or the
dissipation). Equation (3.64) can be rewritten
Thus the mean square deviation of the velocity from its mean, starting with the
initial velocity v₀, namely σ_vv, is independent of the starting velocity v₀! This is a
special case, with u = t, of Eq. (8.18) of Classical Noise I in Lax (1960).
For the delta correlated case, Eq. (3.62) shows that the velocity is a sum of
uncorrelated (hence independent) Gaussian variables, since ⟨A(s)A(s′)⟩ = 0 for
s ≠ s′. Since each term is Gaussian, the sum will also be a Gaussian random
variable (see Appendix 3.B). Thus the statistics of v(t) are completely determined
by its mean and second cumulant since all higher cumulants vanish. Thus the
conditional probability density for v(t) is given by
where ⟨v⟩_{v₀} = v₀ exp(−λt) and the unconditional average ⟨v²⟩ = kT/M is just
the thermal average, independent of time. In the limit as t → ∞ we approach the
steady state solution
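The approach to the steady state can be illustrated with the velocity variance σ_vv(t) = (kT/M)(1 − e^{−2λt}), consistent with the discussion above; a sketch checking that it obeys dσ/dt = −2λσ + 2d with d = λkT/M, and tends to the thermal value kT/M:

```python
import math

# Velocity variance of the Langevin (Ornstein-Uhlenbeck) process:
#   sigma_vv(t) = (kT/M) (1 - exp(-2 lam t))
# It vanishes at t = 0 (the velocity is known exactly), satisfies
#   d sigma/dt = -2 lam sigma + 2 d,   d = lam kT/M,
# and approaches the thermal value kT/M as t -> infinity.
lam, kT_over_M = 1.5, 0.8            # illustrative values
d = lam * kT_over_M                  # diffusion constant in velocity space

def sigma(t):
    return kT_over_M * (1 - math.exp(-2 * lam * t))

t, h = 0.6, 1e-6
lhs = (sigma(t + h) - sigma(t - h)) / (2 * h)     # numerical d sigma / dt
rhs = -2 * lam * sigma(t) + 2 * d
print(lhs, rhs, sigma(1e9))          # lhs matches rhs; the limit is kT/M
```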
Since the position of a particle is determined by the time integral of the velocity,
we would expect that the statistics of Brownian motion of a particle, that is the
random motion in position space, can be determined fairly directly by a knowl-
edge of its motion in velocity space. More generally, one would like to determine
the joint distribution in position and velocity space. We shall see in this section
that the manipulations to be performed involve only minor difficulties provided
that the distribution in positions, velocities and the joint distribution are all Gaus-
sians. That is because the distributions can be written down fairly readily from the
first and second moments if the distributions are Gaussian in all variables. But its
proof depends on the Gaussian nature of the sum of Gaussian variables. And we
have only established that Gaussian nature if the variables are independent. Since
positions and velocities are correlated, it is not clear whether the scaffold we have
built using independent variables will collapse.
To separate the computational difficulties from the fundamental one, we shall
perform the calculations in this section assuming that all the variables involved are
Gaussian, and reserve for Appendix 3.B a proof that this is the case.
The average position may be obtained by integrating Eq. (3.62) with respect to
time and averaging:
Here, all averages are understood to be taken contingent on given initial velocities
and positions.
Next, we calculate ⟨(x(t) − ⟨x(t)⟩)²⟩. The general value of the random variable,
x(t), can be obtained by integrating Eq. (3.62) over time, by setting t = w in Eq.
(3.62) and integrating over w from 0 to t. The expression is simplest if we subtract
off the average position given by Eq. (3.70). The result takes the form
where
with
where
The fluctuations in position are then described after applying Eq. (3.61) by
where v0 has again canceled out. It is of interest to examine σ_xx(t) for small and
large t. For small t, we may expand the exponential so that
when λt ≪ 1. Conversely, for large t, we may omit the exponential terms in σ²(t) to obtain
where the diffusion constant D for x is given by comparison with Eq. (3.79) to be
in terms of the diffusion constant d for velocity. After use of Eqs. (3.65), (3.66) we
find
where the Δ's describe deviations from the corresponding mean values conditional
on a given v0 and x0.
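The two limits just discussed can be checked against the closed-form conditional position variance of the Uhlenbeck–Ornstein process, σ²(t) = (2d/λ²)[t − (2/λ)(1 − e^{−λt}) + (1/2λ)(1 − e^{−2λt})], a standard result; the numerical sketch below (our own, with arbitrary parameters) confirms the (2d/3)t³ growth for λt ≪ 1 and the diffusive 2Dt law, D = d/λ², for λt ≫ 1.

```python
import numpy as np

lam, d = 2.0, 3.0                  # arbitrary decay rate and velocity diffusion constant

def sigma2(t):
    """Conditional position variance of the Uhlenbeck-Ornstein process."""
    return (2 * d / lam**2) * (t
                               - 2 * (1 - np.exp(-lam * t)) / lam
                               + (1 - np.exp(-2 * lam * t)) / (2 * lam))

D = d / lam**2                     # spatial diffusion constant D = d/lam^2

t_small = 1e-3                     # lam*t << 1: fluctuations grow as (2d/3) t^3
r_small = sigma2(t_small) / (2 * d * t_small**3 / 3)

t_large = 1e3                      # lam*t >> 1: ordinary diffusion, 2 D t
r_large = sigma2(t_large) / (2 * D * t_large)
```

Both ratios approach 1, reproducing the cubic short-time growth and the Einstein 2Dt law.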
The characteristic function of the conditional probability is then determined
from the first and second cumulants in the form
Both the first moments x and v and the second-order cumulants are understood
to be conditional on the initial position and velocity that were just computed. The
original conditional distribution P(x, v, t|x0, v0, 0) appearing in Eq. (3.86) can be
obtained by taking the inverse Fourier transform of Eq. (3.86). This was already
done in Eq. (1.182) for the case of two Gaussian variables, and expressed directly
in terms of the second moments σ_xx = σ1², σ_vv = σ2² and the correlation coefficient
Adiabatic elimination
The conditional probability P(x, t\xo, 0) for position does not obey the Chapman-
Kolmogorov condition. Thus it does not describe a Markovian process. In what
way can the Uhlenbeck, Ornstein, Chandrasekhar problem of diffusion in x and
v space be reduced to the usual Einstein Brownian motion problem in ordinary
space? If one recognizes that the time 1/λ is short compared to the time interval
Δt over which one measures the positions of the particles, with a similar condition
on the accuracy of positions
Then, if we regard the slow motion in d/dt as small compared to λ, we neglect the
former and have
so that
Now x(t) obeys a Brownian motion directly, and the diffusion constant is
in agreement with Eq. (3.81) obtained earlier. Equation (3.89) is exactly parallel
to the Brownian motion Eq. (3.69) for velocity, and the analogous solution is the
standard diffusion solution of Eq. (3.41).
An adiabatic elimination procedure was used extensively to reduce a six-
variable problem (two fields, two populations, and two polarizations in a gas
laser) in Lax (1964QIII) to a two-variable problem whose solution was feasible
(Hempstead and Lax 1967).
Some of the original references to the work on Brownian motion are given in
Smoluchowski (1916), Einstein (1905), and Furth (1920).
3.8 Chaos
There are chaotic and other processes that differ from Brownian motion in that the
root mean square growth is
This occurs in many natural phenomena. An early known example relates to the
flow of water in the Nile river. Records of this water flow have been kept over
many centuries. A detailed investigation by a civil engineer, Hurst (1951), found an
exponent a that differs from 1/2 and fits closely to a value of 0.6.
The scaling properties of the fluctuations studied by Hurst (1951) were
investigated further by Anis and Lloyd (1976).
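A crude version of such a scaling analysis (not Hurst's rescaled-range method, just a root-mean-square displacement fit; our own sketch) recovers a ≈ 1/2 for an ordinary random walk; persistent, Nile-like records would give a larger exponent:

```python
import numpy as np

rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(2**20))   # ordinary Brownian walk

windows = 2 ** np.arange(4, 12)                # window sizes 16 ... 2048
rms = []
for n in windows:
    m = len(walk) // n
    segs = walk[:m * n].reshape(m, n)
    disp = segs[:, -1] - segs[:, 0]            # net displacement over each window
    rms.append(np.sqrt(np.mean(disp**2)))

# rms displacement ~ n^a; fit the exponent a on a log-log plot
a_exponent = np.polyfit(np.log(windows), np.log(rms), 1)[0]
```

Applied to river-flow or market data, the same fit would reveal an exponent differing from 1/2 whenever successive increments are correlated.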
Many of the phenomena in chaotic motion are described in terms of fractional
power laws associated with "fractals", a term introduced by Mandelbrot (1983). A
brief chapter is provided in Arfken and Weber (1995). An introduction to chaos is
provided by G. P. Williams (1997).
Let
and
By Eq. (3.30), the roots we desire obey F(Z) = 1. F(Z) is infinite at Z = 0 and
Z = ∞ and possesses a single minimum at Z_m determined by
or
This is clearly true for ε → 1. By expansion, it can be verified for small ε. It can
be extended to all ε by taking the derivative with respect to ε and, after canceling a
factor (1 + d)/d, verifying that
Taking the logarithm of Eq. (3.103), we can prove the stronger statement
as long as we restrict ourselves to gambling that favors the house (with ε > 0). The
second inequality
is true for all positive ε. Thus our original inequality F(Z_m) < 1 is true for all ε. Hence
there are two roots. By inspection, Z = 1 is one root.
Thus we have two possibilities. If the smaller root is Z = 1, the larger root
will have Z_r > 1. We shall establish this by showing that the slope is negative at
Z = 1, showing that Z = 1 is the smaller root:
Theorem
The sum, C, of two independent Gaussian variables A and B is Gaussian.
Proof
The characteristic function of the sum is
where the independence of the variables permits the factorization. If A and B are
Gaussian, only linear and quadratic powers of t appear in each exponent involving
A and B, hence also for C. Thus the random variable C is Gaussian (proved).
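The factorization argument is easy to illustrate numerically: the empirical characteristic function of C = A + B factors into the product for A and B, and the fourth cumulant of C vanishes, as it must for a Gaussian (a sketch of ours, with arbitrary means and variances):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 10**6
A = rng.normal(1.0, 2.0, N)          # independent Gaussians with different
B = rng.normal(-0.5, 0.7, N)         # means and variances
C = A + B

# characteristic function factorizes: <exp(itC)> = <exp(itA)><exp(itB)>
t = 0.4
phi_C = np.mean(np.exp(1j * t * C))
phi_AB = np.mean(np.exp(1j * t * A)) * np.mean(np.exp(1j * t * B))
factorization_gap = abs(phi_C - phi_AB)

# fourth cumulant of C vanishes if C is Gaussian
m = C - C.mean()
excess_kurtosis = np.mean(m**4) / np.mean(m**2)**2 - 3.0
```

Only linear and quadratic terms survive in the exponent of each factor, so the product is again a Gaussian characteristic function.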
If A + B is known to be Gaussian the calculations can be performed by direct
use of Ott's theorem (1.56):
will obey Ott's theorem (1.56) if the variables A(s) are independent, or if C is
otherwise known to be Gaussian.
Even if C is not Gaussian its mean value is given by
To deal with averages of products of linear stochastic integrals of the form found
in Eq. (3.62), we introduce a theorem that will make all the requirements easy.
Although not stated explicitly, this theorem is used implicitly and extensively by
Chandrasekhar (1943) and Uhlenbeck and Ornstein (1930).
The average of the product of two linear stochastic integrals can be written
and in the special case, when the motion is pure Brownian, Eq. (3.61) is obeyed
and
They do this in connection with determining the fourth moment of the velocity:
Since the four times can be partitioned into pairs in three ways, they state that the
right-hand side is, in effect, multiplied by 3. When applied to the fourth moment of
the velocity, Eq. (3.116) can be translated into
which is consistent with the velocity being Gaussian. Conversely, since the
Maxwell distribution has the Gaussian form
If v is Gaussian, this implies that for all linked moments of the velocity
Since the velocity can be written as an integral over a force as in Eq. (3.64)
For all the linked moments to vanish for n > 2 this must be true of the random
forces as well:
In this chapter we shall compare three definitions of noise: The standard engineer-
ing definition that takes a Fourier transform over a finite time interval, squares
it, divides by the time and then takes the limit as the time approaches infinity.
The second definition is the Fourier transform of the autocorrelation function. The
equality between these two definitions is known as the Wiener (1930)-Khinchine
(1934, 1938) theorem. The third procedure, which we adopt, is to pass the signal
through a realizable filter of finite bandwidth, square it, and average over some
large finite time. As the bandwidth is allowed to approach zero, the result will
(aside from a normalization factor) approach the ideal value of the two preceding
definitions.
The standard engineering definition of noise is chosen, for the case of a stationary
process, to obey the normalization
and verify the normalization later. We use the subscript s to denote the standard
engineering (SE) definition.
The letter j is used to denote the imaginary unit, as is customary in electrical engi-
neering. The SE convention is that exp(jωt) describes positive frequencies, and
R + jωL + 1/(jωC) is the impedance of a series circuit of a resistance R, an
inductance L and a capacitance C. Because propagating waves are described by
exp(ikx − iωt) in physical problems, we regard exp(−iωt) as describing pos-
itive frequencies, so that the physics convention is equivalent to setting j = −i
consistently. It is also consistent with the convention in quantum mechanics that
a Schrödinger wave function has the factor exp(−iEt/ℏ),
where E is the energy of the system and is positive for positive energies (or positive
frequencies E/ℏ).
In this definition, Eq. (4.3), the interval on t is truncated to the region −T <
t < T, its Fourier transform is taken, and the result squared. Since a measure-
ment would attempt to filter out one frequency component and square it, this definition is
reasonable. What is not yet clear is why one divides by T rather than T², which
will become clear later. The brackets ⟨·⟩ denote an ensemble average. It is curious that
both the limit T → ∞ and an ensemble average are taken. For ergodic systems, a
time average and an ensemble average are equal because in such systems, over an
infinite time, the system will visit all the points in phase space over which an ensem-
ble average is made. (The more precise statement, for quasi-ergodic systems, is that
over time a single system comes arbitrarily close to all points in phase space.)
This is the reason why statistical mechanics works. The experimenter measures a
time average. The theorist finds it much easier to calculate an ensemble or phase
space average. Yet their results agree. For an experimenter to make an ensemble
average, he would have to average over an infinite number of systems. Instead, he
averages over time for one system. In most cases, then, either average is adequate,
and performing both is redundant. The above assumption is, however, wrong for
the measurement of noise. Middleton (1960) shows that if the ensemble average is
not performed, substantial fluctuations occur in the value of G_s(ω).
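Middleton's point is easy to reproduce: the periodogram of a single record of white noise has a relative standard deviation near 100% at each frequency, no matter how long the record; only ensemble (or bandwidth) averaging tames it. A sketch of ours, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 4000, 1024                          # M independent records, N samples each
x = rng.standard_normal((M, N))            # unit-variance white noise, true G = 1

P = np.abs(np.fft.rfft(x, axis=1))**2 / N  # one periodogram per record
single = P[:, 20]                          # estimates at one interior frequency bin

mean_est = single.mean()                   # ~1: the estimate is unbiased
rel_std_single = single.std() / mean_est   # ~1: a single record never converges
avg100 = single.reshape(40, 100).mean(axis=1)
rel_std_avg = avg100.std() / avg100.mean() # ~0.1: averaging 100 records helps
```

Lengthening a single record only sharpens the frequency resolution; it does not reduce the scatter at any one frequency.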
Presumably, this sensitivity occurs because we are asking for the noise at a
precise frequency. Because of the Fourier relation between frequency and time,
a measurement accurate to Δω requires a time t > 1/Δω. Realistic noise mea-
surements, to be discussed below, using filters of finite width, are presumably
ergodic.
guaranteed to yield a real, but not necessarily positive, G(a, ω, t). In the stationary
case.
since a shift of the time origin by τ/2 is permitted. But this new form is not
generally correct, and indeed is not necessarily real.
The Wiener-Khinchine (W-K) theorem states that the noise spectrum, Eq. (4.3), is
given by the Fourier transform of the autocorrelation function, Eq. (4.4). This
is equivalent to the statement that the above two definitions of noise are equivalent.
We shall prove the Wiener-Khinchine theorem by evaluating G_s(a, ω) in terms
of G(a, ω):
This result was obtained by writing the squared integral in Eq. (4.3) as a product of
two separate integrals, and using different integration variables in each factor. In
the stationary case (for which the W-K theorem is valid) Eq. (4.7) can be written
The last step moved the limiting procedure under the integral sign and used
The appropriateness of the limit, Eq. (4.13), as discussed in Section 1.13 on
delta functions, is based on the facts that (a) the integral of the left-hand side, for
any T, is 1; and (b) the width of the function is of order 1/T and the maximum height, at
ω′ = ω, is of order T. This function becomes very tall and narrow. An integration
against this function of any function G(ω′) of bounded variation will be sensitive
only to its value at the peak ω′ = ω. See Eq. (1.200).
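The W-K equality can be verified numerically for a process with a known autocorrelation. For the AR(1) sequence y_{n+1} = a y_n + w_n driven by unit-variance white noise, R(k) = a^{|k|}/(1 − a²), and the ensemble-averaged squared Fourier transform should match the Fourier transform of R (our sketch; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
a, M, N = 0.8, 2000, 512
w = rng.standard_normal((M, N + 256))

# generate M AR(1) records, discarding a transient of 256 steps
y = np.empty((M, N))
s = np.zeros(M)
for n in range(N + 256):
    s = a * s + w[:, n]
    if n >= 256:
        y[:, n - 256] = s

# standard-engineering definition: ensemble-averaged squared Fourier transform
G_se = np.mean(np.abs(np.fft.rfft(y, axis=1))**2 / N, axis=0)

# Wiener-Khinchine: Fourier transform of the autocorrelation R(k) = a^|k|/(1-a^2)
k = np.arange(N)
R = a**k / (1 - a**2)
omega = 2 * np.pi * np.arange(N // 2 + 1) / N
G_wk = np.array([R[0] + 2 * np.sum(R[1:] * np.cos(om * k[1:])) for om in omega])

mean_rel_err = np.mean(np.abs(G_se - G_wk) / G_wk)
```

The two spectra agree to within the statistical scatter of the finite ensemble, which shrinks as 1/√M.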
Note that Eq. (4.11) with u = t leads to the normalization condition
is the customary symbol for spectral density used by statisticians. This normaliza-
tion in Eq. (4.14) is equivalent to the customary choice, Eq. (4.2), when G(a, ω)
is even in ω, but more general when it is not. It follows easily from time rever-
sal that evenness holds for classical variables, but this is not true for quantum
mechanical variables (our definitions apply to the quantum case if a* is replaced
by the Hermitian conjugate, a†). The quantum case will be
discussed in Chapter 7 in deriving the fluctuation-dissipation theorem.
4.4 Noise measurements
where K(t) is known as the indicial response of the filter, or its response to a δ(t)
input pulse. In order that the filter be realizable, hence causal, output can only
appear after input, so that
The upper limit in Eq. (4.16) can thus be extended to infinity without changing the
value of the integral. In terms of Fourier components
where
is chosen to emphasize the frequency region near ω0. Thus we expect the output
spectrum to be |k(ω, ω0)|² times the input spectrum
However, this argument is heuristic, since the integral for a(ω) does not converge
in the usual sense: the integrand in Eq. (4.18) does not decrease as t → ∞.
What is actually measured is
the time average of the squared signal. The subscript m denotes the definition of
noise using the filter. For long enough T, we expect ergodicity, and can replace the
time average by the ensemble average. Equation (4.22) and Eq. (4.16) combine to
yield
Equation (4.23) and Eq. (4.24) are valid for nonstationary processes. Stationarity
was assumed only in the last step to obtain Eq. (4.25). Order has been preserved in
the above steps so that they remain valid for noncommuting operators. Using the
Wiener-Khinchine theorem in reverse, Eq. (4.11), to eliminate the autocorrelation
we obtain
The factor 4π arises because of the convention followed in Eq. (4.14). Thus the
desired spectrum at frequency ω0 can be extracted by using a sharp enough filter,
|k(ω, ω0)|². With an appropriate choice of filter K(t) we have described a Hewlett-
Packard spectrum analyzer.
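Digitally, the filter-based definition looks like the following sketch (our own construction, not the book's RLC example): white noise is passed through a narrow, causal two-pole resonator, the output is squared and time averaged, and the result is normalized by the filter's power gain ΣK_n² (Parseval). The estimate converges to the flat input spectrum.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 2**19
x = rng.standard_normal(N)                 # white noise: flat spectrum, S = 1

r, theta = 0.99, 0.3                       # pole radius and angle: a narrow resonance
b1, b2 = 2 * r * np.cos(theta), -r**2      # y_n = x_n + b1*y_{n-1} + b2*y_{n-2}

def filt(signal):
    """Causal two-pole resonator (a realizable filter)."""
    out = np.empty(len(signal))
    y1 = y0 = 0.0
    for n, xn in enumerate(signal):
        yn = xn + b1 * y1 + b2 * y0
        y0, y1 = y1, yn
        out[n] = yn
    return out

y = filt(x)

imp = np.zeros(6000); imp[0] = 1.0
h = filt(imp)                              # impulse response K_n
gain = np.sum(h**2)                        # power-bandwidth normalization (Parseval)

S_est = np.mean(y[2000:]**2) / gain        # time average of the squared, filtered signal
```

Narrowing the resonance (r → 1) sharpens the frequency selectivity but lengthens the averaging time needed, exactly the time-bandwidth trade discussed below.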
with
where the coefficient πR/(2L) was chosen to yield the correct integral
This integral was evaluated exactly using formula 031.10 in Grobner and Hofreiter
(1950), shown below. For ab > 0,
Before we answer the above question, let us note that G(a, ω) by the standard
definition of noise is manifestly real. This reality extends to the quantum case,
even when a is non-Hermitian. To see this, let us introduce the correlation noise
G_{A†B}(ω) by
where A and B are possibly non-Hermitian operators and the dagger represents
Hermitian conjugation. (For the classical case, simply regard the dagger as taking
a complex conjugate.) Note that with t → −t and the use of stationarity, Eq.
(4.36) can be rewritten as
we obtain
by comparing to Eq. (4.36). Clearly, then, B†B is always Hermitian and G_{B†B}(ω)
is real. The question we raised above was under what circumstances G_{B†B}(ω) is
an even function of ω. Alternatively, when is R_{B†B}(t) an even function of t, where
we define
Although the principal applications in this book are to classical physics and eco-
nomics, in which random variables commute (can be written in any order), we
maintain the order of our variables so that our results remain valid in a quantum context.
Thus, where the complex conjugate (classically) is replaced by the Hermitian conjugate,
using a dagger, we write for the Hermitian conjugate of a product
This is consistent with the requirement that a measurement of the noise at a single
frequency, ω, requires an infinite measurement time, in agreement with the limiting
process T → ∞ used in Eq. (4.3).
One consequence of this difficulty is that a number of definitions have been sug-
gested in the literature, for example, by Page (1952) and Lampard (1954). A detailed
analysis of the spectra proposed by Page and Lampard is made by Eberly and
Wodkiewicz (1977), who also provide a fourth definition of the time dependent
spectrum, which they call "the physical spectrum of light".
We shall not attempt to review this work here since Eberly and Wodkiewicz
(1977) have already made detailed comparisons.
What appears to have been overlooked in these references is that a number
of solutions appeared earlier for an analogous problem in quantum mechanics.
Position and momentum variables also cannot be measured simultaneously with
complete precision because of the Heisenberg uncertainty principle. Thus a simul-
taneous distribution function for position and momentum would appear to be just
as much an oxymoron. However, Wigner (1932) proposed an elegant solution for
the density in phase (position and momentum space) which was expounded later
at some length by Moyal (1949).
In the Brandeis Lectures Lax (1968) suggested that the Wigner-Moyal choice
could be applied to the case of noise in nonstationary problems. Equation (4A18)
of the Brandeis lectures is the same as Eqs. (4.5), (4.6) here.
In Lax (1968QXI), however, it is shown that there are many possible choices
of distribution functions. In particular, there is the Wigner symmetric distribution,
the de Rivier symmetric distribution, and the normal and antinormal distributions.
If q is position and p is momentum, then there are combinations roughly
that are the negative and positive frequency parts (or in quantum mechanics
the destruction and creation operators respectively). Normal order involves all
creation operators to the left of all destruction operators. It is then possible to con-
struct normal ordered and antinormally ordered distributions. They are different
numerically, but related.
In Lax (1968QXI), the point is made that any of these distributions can be used, and
they should all lead to the same final answer. But which one is most convenient to
use depends on which physical quantity is of concern and is to be determined. For example,
if measurements are made for photon counters, then the antinormal distribution is
best in the sense that the desired results can be obtained by a simple integration
over a classical distribution function. But if another choice is made, corrections
will have to be calculated, as described in Lax (1968QXI) and in Lax and Yuen
(1968QXIII).
In our work on laser line-widths and photocount distributions, the distribution
function can be calculated analytically. And so the antinormal one was calculated.
In this section, our objective is different. We must choose the form of spectral
distribution that is easiest to measure. An additional consideration is to make a
choice that is best for computing the physical result of interest. The latter may
depend on the nature of the measuring devices. It also depends on the nature of the
process involved, particularly, if we have some knowledge about it. For example,
Bendat and Piersol (1971) display three processes. See Fig. 4.1. The first is one in
which a random process (with zero average) is modified by a time varying mean
value. In the second, the mean is zero but the mean square varies randomly. In
the third, the frequency varies randomly. It is doubtful that one choice of spectral
formula is better than all others for all three cases.
There are other practical considerations in the construction of an algorithm for
spectral calculation. A true ensemble average might require an enormous amount
of repetitions of the experiment. Even if this could be done, it might be meaning-
less. For example, consider stock market prices. One could take a set of series,
each of which begins on January 1 and ends on December 31. Conditions
(other than seasonal effects) could be sufficiently different at the start of each year
that averages (say over 100 years) might give very misleading results.
Perhaps the most suitable case for analysis is one in which the time-scale of
the nonstationary part of the processes is much longer than the time scale of the
stationary part.
An appropriate starting point for the nonstationary case would be Eq. (4.23) or
(4.24), in which one passes the signal a(t) through a filter by a convolution with K(t − t′)
and then takes the absolute square of the result. The absolute squared result can be time
averaged over a suitably chosen time interval, as shown in Eq. (4.22). The latter
step can also be replaced by averaging with an exponential weight:
The realizable filter chosen in Eq. (4.30) was an RLC
circuit, for simplicity. It has two resonances at frequencies whose real parts are
equal but opposite in sign. The procedure used by Eberly and Wodkiewicz (1977)
is equivalent to creating a filter with only a positive frequency resonance. Since
FIG. 4.1. Three processes displayed by Bendat and Piersol (1971). The first is
one in which a random process (with zero average) is modified by a time vary-
ing mean value. In the second, the mean is zero but the mean square varies
randomly. In the third, the frequency varies randomly.
both their filter and ours have a dissipative term that controls the frequency line-
width, that term also restricts the range of time data used. Our subsequent average over
a time 2T or 1/γ, with γ in Eq. (4.47), provides separate control over the time
and frequency intervals. This freedom may be illusory, since the time-frequency
product must exceed unity. Our procedures are thus very similar. There are some
small errors inherent in both, since our predicted noise, calculated over a time
interval, is probably a better estimate of the value in the middle of the interval.
A detailed examination of the problem of time-varying spectra has recently
been made by Cohen (1995) who compares a wide variety of possible choices.
If we regard the problem as one of determining the spectrum from experimental data
in the presence of distortion, it becomes an ill-posed problem of the
sort discussed in Chapter 15 on "Signal Extraction in the Presence of Smoothing
and Noise". The ill-posed nature of such problems is overcome by regularization
procedures. Although they are not referred to in this way, the wide variety of win-
dowing procedures perform this function. We shall return to this problem of noise
in nonstationary systems in Chapter 17 on the "Spectral Analysis of Economic
Time Series".
Appendix A: Complex variable notation
Even if a(t) is a real variable, such as the voltage V(t), the associated Fourier
transform
is complex. If a(t) is a stationary variable, this expression for a(ω) is not a conver-
gent integral. Mathematicians might say this expression is meaningless. However,
its moments are meaningful:
Inserting the inverse Wiener-Khinchine relation, Eq. (4.11), for the last factor in
Eq. (4.49), one gets
an expression in terms of the noise spectrum of the variable a itself. The pres-
ence of the delta function shows that only two Fourier components at the same
frequency interfere with each other. Note that ω = 2πf relates angular frequencies to
ordinary frequencies. The last term uses the notation S(a, f) = (1/2)G(a, ω) of
Eq. (4.15), common in mathematics and statistics books.
Thus, if we have two variables related by a complex factor, such as a current
I(ω) and a voltage V(ω) related by an impedance Z(ω), then we can relate the
corresponding noise spectra by
Thermal noise
FIG. 5.1. The noise measured by Johnson (1928) versus resistance in six diverse
materials.
FIG. 5.2. Thermal noise for two resistors in parallel versus temperature obtained
by Williams (1937), plotted as an effective resistance ⟨V²⟩/[4k df T_a] against
T₂/T_a. Williams takes T_a to be T₁, except in the one-resistor case, for which
R₁ is infinite, and he then chooses T_a to be room temperature. Both theory,
in Eq. (5.2), and experiment are linear functions of temperature. Line B is the
two-resistance case, and line A is the one-resistance case.
Johnson found that the measured noise power in the frequency interval is
proportional to the temperature of the resistor from which the noise emanates
where k is Boltzmann's constant. Theoretical results from Eq. (5.2) are represented
by solid lines in Fig. 5.2 compared with experimental data.
Moullin (1938) generalizes this result to the case of an arbitrary number of
impedances in parallel:
and write V = ZI, this result takes a simpler form in terms of current fluctuations
Other useful references on networks and noise are Murdoch (1970), Bell (1960)
and Robinson (1962, 1974).
5.2 Equipartition
where we have set ω = ω0 x, with ω0 = 1/(LC)^{1/2} the circuit resonant fre-
quency, and Q = ω0 L/R the circuit Q factor (energy stored over energy lost
per cycle). This integral can be performed using the residue theorem of complex
variable theory, or by using Eq. (4.33), which was obtained from Grobner and
Hofreiter (1950).
Similarly, the energy stored on the capacitance is
Thus, for both the inductor and the capacitor, the energy stored because of the
noise in the resistor is precisely that expected from the equipartition theorem. Note
that Eq. (5.7) is a relation between Fourier components, so that, for example, I and
q should be written I_ω and q_ω, whereas in Eqs. (5.8) and (5.9) we are really dealing
with the time dependent quantities ⟨I(t)²⟩ and ⟨q(t)²⟩ respectively.
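The two equipartition integrals can be checked by direct numerical quadrature of the Johnson noise spectrum driving a series RLC circuit: with S_V = 4kTR, the current noise is 4kTR/|Z|² and the charge noise is that divided by ω². Both stored energies come out to kT/2 for any R, L, C (our sketch, in arbitrary units):

```python
import numpy as np

kT, R, L, C = 1.0, 0.5, 2.0, 3.0              # arbitrary units

f = np.logspace(-6, 3, 100001)                # log-spaced frequency grid (Hz)
w = 2 * np.pi * f
Z2 = R**2 + (w * L - 1.0 / (w * C))**2        # |Z|^2 of the series RLC circuit

S_I = 4 * kT * R / Z2                         # current spectral density
S_q = S_I / w**2                              # charge spectral density

def trapz(yv, xv):                            # simple trapezoid rule
    return np.sum(0.5 * (yv[1:] + yv[:-1]) * np.diff(xv))

energy_L = 0.5 * L * trapz(S_I, f)            # (1/2) L <I^2>  -> kT/2
energy_C = trapz(S_q, f) / (2 * C)            # <q^2>/(2C)     -> kT/2
```

Changing R, L, or C rescales the spectra but leaves both stored energies pinned at kT/2, which is the content of the equipartition check.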
A fundamental truth now emerges. Fluctuations must be associated with dissi-
pation in order that the system does not decay to zero, but maintains the appropriate
thermal equilibrium energy.
In view of the compatibility with thermal equilibrium shown in the preceding sec-
tion, it is not surprising that a simple thermodynamic argument can be used to
demonstrate that the noise emanating from a resistor must be proportional to its
resistance.
Consider the two resistors in the series circuit shown in Fig. 5.3. The current I₁
through resistor R₂ produced by the first resistor is
FIG. 5.3. Power transfer from resistance R₁ to R₂ and vice versa, where V_j is the
Johnson noise voltage in resistor Rj.
If both resistors are at the same temperature, the second law of thermodynamics
requires that there can be no steady net flow (in either direction). Equating Eqs.
(5.11) and (5.12), we obtain
Since the left-hand side of the equation is independent of R₂, and the right-hand side is
independent of R₁, this equality requires both sides to be independent of both resis-
tances. Each side therefore must be an (as yet unknown) universal function of frequency,
W(f). In summary, the noise spectrum associated with an arbitrary resistance R
is given by
Thus we can conclude that the noise G(V, f) associated with an impedance Z(f) is
given by:
where ℜ denotes the real part. Thus the noise is proportional to R(f) = ℜZ(f) even
when the impedance Z(f) is frequency dependent. The Johnson law has therefore
been generalized to the case of frequency dependent impedances.
On the other hand, a transmission line can be terminated with its "characteristic
impedance"
where L is the inductance per unit length of the line and C is its shunt capacitance
per unit length. In this case, waves down the line are not reflected. The line acts
as if it were infinite. Nyquist therefore chooses as his proof vehicle a transmission
line terminated by RQ at both ends. The line is assumed to have length I. The trans-
mission line can be described in terms of its modes which are harmonic oscillators.
If U is the energy density per mode, then the energy per mode is
where we have made use of the equipartition theorem valid for modes that behave
like harmonic oscillators.
If the modes are described as plane waves exp(±ikx) in a periodic system of
length l, then k takes the discrete values
Since each mode carries an energy U with a velocity v, the power transmission
down the line is
diverges. Nyquist suggested that this problem could be removed if the classical
energy, kT, associated with a harmonic oscillator were replaced by the quantum
energy
includes the zero-point energy. If the latter is retained, the divergence in the
integrated energy reappears.
It is sometimes argued that zero-point energy can be ignored because only dif-
ferences in energy can be observed. However, this is not true for magneto-optical
transitions between Landau levels in the valence band and similar levels in the
conduction band. These levels possess a level structure like a harmonic oscillator,
but the frequency is the cyclotron frequency associated with the magnetic field.
Thus it is inversely proportional to the effective mass of the electrons in the con-
duction band, or the holes in the valence band. Since these masses are different,
the energy differences contain the difference of the two zero-point energies which
is therefore observable. See Lax (1967).
In the Casimir effect (Casimir and Polder 1948), two closely spaced metal-
lic plates are attracted by the influence of vacuum fluctuation in the gap on van
der Waals forces. The Casimir effect has been experimentally verified by Derya-
gin and Abrikosova (1956), Deryagin, Abrikosova and Lifshitz (1956), Kitchener
and Prosser (1957), and Chan et al. (2001). An alternate derivation of the "Lamb
shift" between the otherwise degenerate s and p levels in a hydrogen atom was given
by Welton (1948) based on the effects of the zero-point fluctuations of the
electromagnetic field on the electron.
The relevance of zero-point energies in the electromagnetic field is discussed
further in Chapter 7 as it relates to the area of quantum optics. We also note that
absolute energies (not just differences) are relevant in general relativity, since space
is distorted by the total energy content. For an elementary discussion of these
points see Power (1964).
Callen and Welton (1951) considered a general class of systems (quantum
mechanically) and established that Eq. (5.26), and its dual form with Y = 1/Z,
the admittance, and g = ℜY(ω), the conductance:
apply to all systems near equilibrium with the replacement of kT by Eq. (5.28)
when necessary. The importance of the Callen-Welton work is the great generality
of potential applications. The fact that all dissipative systems have corresponding
noises associated with them is necessary in order that the second law of ther-
modynamics not be violated when such systems are connected. The fluctuation-
dissipation theorem will be discussed in more detail in Chapter 7, after the density
operator tools needed to give a short proof are developed.
5.5 Nyquist noise and the Einstein relation
Consider a mechanical system with velocity v. Then the standard engineering (SE)
noise associated with v can be defined by
Conversely, the fluctuation-dissipation theorem for the velocity (which is analo-
gous to a current rather than a voltage) is given by Eq. (5.30)
where
with F the applied force, is the admittance, or velocity per unit applied force. At
zero frequency, we refer to v/F as the mechanical mobility, B, and v/E as the
(electrical) mobility μ, so that
which is simply the Einstein relation between diffusion and mobility. An experi-
mental verification of the Einstein relation for electrons and holes in semiconduc-
tors is given in the Transistor Teacher's Summer School (1953).
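The Einstein relation is also easy to verify in a simulated Langevin model (our own sketch, arbitrary units): measure the mobility B from the drift under a constant force, measure D from the force-free mean-square displacement, and compare D with BkT.

```python
import numpy as np

rng = np.random.default_rng(7)
M_mass, lam, kT = 1.5, 4.0, 2.0
d = lam * kT / M_mass                   # velocity diffusion constant
dt, nsteps, npaths = 1e-3, 8000, 5000
F = 6.0                                 # constant applied force

# (1) drift under force F gives the mechanical mobility B = <v>/F = 1/(M lam)
v = np.zeros(npaths)
for _ in range(nsteps):
    v += (-lam * v + F / M_mass) * dt \
         + np.sqrt(2 * d * dt) * rng.standard_normal(npaths)
B_est = v.mean() / F

# (2) force-free diffusion gives D from the mean-square displacement 2 D t
x = np.zeros(npaths)
v = np.zeros(npaths)
for _ in range(nsteps):
    x += v * dt
    v += -lam * v * dt + np.sqrt(2 * d * dt) * rng.standard_normal(npaths)
D_est = x.var() / (2 * nsteps * dt)

einstein_ratio = D_est / (B_est * kT)   # Einstein relation predicts 1
```

The ratio is independent of the particular values of M, λ, and kT chosen, which is the content of the relation D = BkT.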
where μ is the electronic mobility, suggests that we can define a complex frequency
dependent diffusion constant:
where we have used stationarity and t′ = t + τ. Thus we can write for the mean-
square displacement:
By Eqs. (5.38) and (5.41), D(ω) is G(v, f)/4. Thus we obtain MacDonald's
theorem
Shot noise is the name given to electrical fluctuations caused by the discreteness
of electronic charge. For excellent elementary discussions of shot noise see Robin-
son (1962, 1974), MacDonald (1962), and Bell (1960). The most typical example
concerns the emission of electrons from the cathode of a vacuum tube. For a dis-
cussion of noise in vacuum tubes see Lawson and Uhlenbeck (1950) and Valley
and Wallman (1948). In Fig. 6.1 we display an example discussed by Robinson.
The switch S is connected to A for a time interval τ. The current through the
diode charges the condenser C. Then the switch is shifted to position B and the accu-
mulated charge is measured by the ballistic galvanometer G. The actual charge
measured will be
with integral n in a single measurement. The mean charge (average over many
measurements) will be
with nonintegral results. Assuming random arrival of the electrons, we have Pois-
son statistics (see Section 3.1), and the root-mean-square fluctuation in charge is
FIG. 6.1. Apparatus for measuring shot noise associated with charge accu-
mulated on a condenser using a ballistic galvanometer. After Robinson
(1962).
94 SHOT NOISE
given by
These results were confirmed by the even more accurate experiments of Williams and Huxford (1929).
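The Poisson root-mean-square law quoted above is easy to check by simulation; the mean electron count per interval below is an assumed illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)
e = 1.602e-19          # electronic charge, C
nbar = 1.0e6           # mean number of electrons per measurement (assumed)
trials = 200_000

n = rng.poisson(nbar, size=trials)   # electrons collected in each trial
q = n * e                            # accumulated charge on the condenser
dq_rms = q.std()                     # measured rms charge fluctuation

# Poisson statistics predict dq_rms = e * sqrt(nbar)
print(dq_rms / e, np.sqrt(nbar))
```

The simulated rms electron-number fluctuation agrees with the √n̄ prediction to within sampling error.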
CAMPBELL'S TWO THEOREMS 95
FIG. 6.2. Shot noise into a circuit containing a C in parallel with a series RL circuit, from Hull and Williams (1925), Fig. 8, or Fig. 4-12 of Lawson. The
currents from the two circuits can be adjusted to add to zero (on the average)
but the shot noises do not cancel from independent emissions.
The paper by Williams and Vincent (1926) measured the shot effect from a
vacuum tube diode directly into a noninductive resistance and simplified the the-
ory for emission into a nonperiodic circuit. The experimental results of the work
of Williams and his collaborators required a careful analysis of both theory and
experiment before adequate accuracy and understanding were obtained.
An even more detailed analysis of experiment and theory is needed in the recent
ingenious experimental work by de-Picciotto et al. (1997, 1998) to show that in
the case of the fractional Hall effect, the effective charge can be e* = e/3. Thus,
although the charges are discrete, they are not necessarily integral.
Campbell (1909) was concerned with the measurement of the charge on the alpha
particle. The charge q, given to an electrode system, generates a voltage q/C
(where C is the electrode capacity) which decays through some leakage resistance
so that (q/C) exp(−pt) is the voltage V on the electrometer plates. The voltage V
generates a torque KV on the electrometer needle whose response is determined
by its moment of inertia I, torsional stiffness k, and damping μ through the relation
The parameters I, k, μ, K are presumed known from a separate experiment. The
solution for Θ can be written qf(t). If a set of pulses arrives at the times t_j, the
complete response is
but we need not restrict our calculations to any particular form of f(t). The latter
is the indicial response, that is, the response of the apparatus to a delta function
source. We are now in a position to state:
Campbell's theorem
If the pulses in Eq. (6.8) arrive at random at an average rate ν per second, the
average response is given by the first Campbell theorem:
and the variance of the response is given by the second Campbell theorem
Proof
Equation (6.8) can be written in the form
The last step, setting the average time-dependent rate ⟨ν(s)⟩ to a constant ν, the average number of events per second, is appropriate (only) in the stationary case.
Thus we obtain Campbell's first theorem:
In performing the average, one must separate the double sum over i and j into the i = j terms and the i ≠ j terms:
The second term is the definition of the (possibly correlated) joint rate:
Thus
If one takes the limit in which the charge e goes to zero and ν goes to infinity at fixed current I = νe, the discreteness of the charge, and the shot noise associated with that discreteness, disappear.
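Both Campbell theorems can be checked by direct Monte Carlo simulation. In the sketch below the pulse shape f(t) = exp(−t/τ) and the rate are assumed for illustration; the first theorem predicts a mean ν∫f dt = ντ and the second a variance ν∫f² dt = ντ/2:

```python
import numpy as np

rng = np.random.default_rng(1)
nu, tau, T = 50.0, 0.1, 400.0    # pulse rate, decay time, run length (assumed)
t_j = np.sort(rng.uniform(0.0, T, rng.poisson(nu * T)))  # Poisson arrival times

def theta(t):
    """Filtered shot noise: sum of exponential pulses from all earlier arrivals."""
    dt = t - t_j
    return np.exp(-dt[dt >= 0.0] / tau).sum()

samples = np.array([theta(t) for t in np.linspace(5.0, T, 2000)])
print(samples.mean(), nu * tau)         # first Campbell theorem:  ~5.0
print(samples.var(), nu * tau / 2.0)    # second Campbell theorem: ~2.5
```

The sampled mean and variance reproduce both theorems to within the statistical scatter of the run.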
If we regard |f(t)|² as the density of energy in time, then |F(f)|² can be regarded as the energy density in frequency f. Parseval's theorem shows that the total energy
THE SPECTRUM OF FILTERED SHOT NOISE 99
can be obtained by adding either the frequency components or the time compo-
nents, with equal result. A simple nonrigorous proof can be given, using Eq. (6.12),
to rewrite the right hand side in the form
We have reversed the order of integration. Since the last integral over frequency f is simply the delta function δ(u − t), representing completeness, the integral reduces to the left hand side of Eq. (6.29).
Since f(t) is real, F(f)* = F(−f), so |F(f)|² is even in f and
is the spectrum of ideal shot noise. We shall derive this spectrum below directly from Eq. (6.14). The factor |F(f)|² can then be interpreted as the filter that reduces the ideal shot noise of Eq. (6.14) to that of Θ(t). The function f(t) is the indicial
response of the filter, namely the response to a delta function input. That the spec-
trum at the output is that of the input pure shot noise multiplied by the spectrum
of the filter is so reasonable, it hardly requires proof. We shall, however, make a
direct evaluation of the spectrum of pure shot noise in the next section, since we
are then sure that all our normalizations are correct, in addition to that of the shape
of the spectrum.
The current associated with the arrival of charges q at times t_j can be written
where the symbol S is used to remind us that we are referring to the current
associated with pure shot noise. If the expected arrival rate is ν(t) per second
then
since the average current I is time independent, as anticipated in Eq. (6.33). Such
a constant spectrum is referred to as "white".
We shall illustrate the above result by calculating the voltage spectrum across
the condenser C in Fig. 6.3 without using Campbell's theorem. The full current,
i_full = i, passes through the parallel circuit of condenser, C, and resistance, R, with i_C and i_R passing through the condenser and resistance, respectively, in proportion to the admittance of these elements:
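For the case of C in parallel with R alone, Campbell's second theorem gives a Lorentzian voltage spectrum across the condenser; a numerical sketch (the operating point is assumed, and the spectrum 2eIR²/[1 + (f/f_c)²] is the standard shot-noise-through-RC form, not a formula quoted from the text) checks that integrating it over frequency reproduces the total variance eIR/(2C):

```python
import numpy as np

e, I, R, C = 1.602e-19, 1e-3, 1e4, 1e-9   # assumed diode current and RC load
f_c = 1.0 / (2.0 * np.pi * R * C)         # RC corner frequency, ~15.9 kHz

f = np.linspace(0.0, 1000.0 * f_c, 2_000_001)
S_V = 2.0 * e * I * R**2 / (1.0 + (f / f_c)**2)   # one-sided Lorentzian spectrum

# trapezoidal integration of the spectrum over frequency
var_numeric = float((0.5 * (S_V[:-1] + S_V[1:]) * np.diff(f)).sum())
var_campbell = e * I * R / (2.0 * C)      # Campbell's second theorem result
print(var_numeric, var_campbell)
```

The two numbers agree to about 0.1%, the small deficit coming from truncating the 1/f² tail of the integral.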
Shot noise arises because charge is discrete. In a vacuum tube diode each electron
crosses from cathode to anode. It would be incorrect, however, to assume that
the external circuit sees a delta pulse associated with the time of arrival of each
electron. Instead we shall follow a simple model developed by Shockley (1938)
as discussed by Freeman (1952) and in Section 6.5 we shall supply an elementary
proof of the validity of the model.
If a charge e has advanced a distance x, a fraction x/L of the total distance
L from the cathode to the anode, we shall assume that the external circuit will
vary continuously as if a charge ex(t)/L has arrived at the anode. The full charge
transfer is completed when x(t) = L. A set of charges at positions x_j(t) leads to a charge transfer of
and v_j(t) = dx_j(t)/dt is the velocity of charge j. Of course, only charges in the region 0 < x_j < L contribute to either sum above. Equation (6.45) is equivalent to using an average current over the region 0 < x < L,
where the actual current density has the expected form (Jackson 1975, Section
5.6),
We shall explore the consequences of Eq. (6.45) and prove in the next section that
the average expression in Eq. (6.46) or Eq. (6.45) is, in fact, exactly correct; see
Eq. (6.76). The transit time T for any carrier with velocity v(t) obeys
The velocity v(t) is not assumed constant, but we shall, in what follows, state the
general answer, and answers for the simple special case of uniform velocity. For
example, if v(t) = v
It has been tacitly understood that each term in Eq. (6.45) contributes only
while the position of the charge is in the active region:
Application of Campbell's first theorem to Eq. (6.45) with the help of Eq. (6.48)
yields:
which is the charge e times v, the number per second, at which they appear.
Campbell's second theorem takes the form
and G(S, ω) is the pure shot noise associated with S(s).
If we regard t_j as the time an electron leaves the cathode, no signal can appear in the output circuit until t > t_j. Thus we set v(t) = 0 for t < 0. After the electron hits at t_j + T, this v(t − t_j) term no longer contributes, so we can set v(t) = 0 for t > T. Thus the pure shot noise is filtered by the "window" factor
where T is the transit time of the electron in passing from cathode to anode. One
can readily verify that the mean current
is unaffected by the window. All that remains is to calculate the spectrum of pure
shot noise itself, which we have already given as a theorem in Eq. (6.39) as
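For the special case of uniform velocity, the induced current is a rectangular pulse of duration T, so the window factor takes the familiar form [sin(πfT)/(πfT)]²; a minimal numeric sketch (the 1 ns transit time is an assumed value):

```python
import numpy as np

T = 1.0e-9   # transit time, assumed 1 ns

def window(f):
    """|F(f)/F(0)|^2 for a rectangular current pulse of duration T."""
    x = np.pi * f * T
    return np.where(x == 0.0, 1.0, (np.sin(x) / np.maximum(x, 1e-300)) ** 2)

f = np.array([0.0, 0.5 / T, 1.0 / T])
print(window(f))   # 1 at dc, (2/pi)^2 at f = 1/(2T), 0 at f = 1/T
```

The window equals unity at zero frequency (leaving the mean current unaffected, as stated above) and first vanishes at f = 1/T.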
In this section we shall prove that the intuitive "average" model of Shockley (1938)
and Freeman (1952) discussed in the previous section is rigorously correct. A one-
dimensional sheet of charge whose total charge is e moves from the cathode to
the anode of a diode with velocity v(t) (see Fig. 6.4). The charge sheet enters the
region between cathode and anode at time t₀ and arrives at the anode at time t₀ + T where T is the transit time. Describe the current that appears in the external circuit
in the time interval — oo < t < oo. The results will justify the use of the smooth
current in Eq. (6.45).
A quasistatic approach is permissible. We shall therefore use Poisson's equa-
tion:
in MKS units. To get Gaussian units set ε₀ = 1/(4π). For our case
where dx/dt = v(t) and x(0) = 0. Thus we need the Green's solution of Poisson's equation. On each side of the sheet, d²φ/dx² = 0, since no charge is located in the vacuum region. Thus φ is linear in x and both E and D are constants. By Gauss' law, the jump across the sheet is
is independent of x and represents the current flow in the circuit. In the exter-
nal wire, D = 0, since charges disappear in a conductor (within the dielectric
relaxation time).
To evaluate the total current in the vacuum transit region we note that the
conduction part of the current density (in the one-dimensional case) is
With the help of the Heaviside unit function H(x), Eqs. (6.68) and (6.69) can be
combined into the single equation:
Differentiation with respect to t leads to four terms, two of which cancel, leading to the simple result:
FIG. 6.5. The potential between cathode and anode in a simple diode displaying
the barrier plane.
Combining this result with Eq. (6.72) for the conduction current, the combined
current simplifies to:
Note, that the total current contains no singular or discontinuous terms. This
combined current appears in the external electrical circuit:
The smooth result in Eq. (6.76) is just what we assumed in Eq. (6.45). The time T
in Eq. (6.76) is determined by:
where the velocity v(t) will be governed by the applied voltage and Newton's laws.
Note, an extra dc field V/L does not contribute to dD/dt although it contributes to D.
At low concentrations, or for one particle (with no screening), one can write a
simple expression for the velocity (in a constant potential)
Our previous discussion of the shot noise in a diode applies to the temperature-limited case. When significant space charge is present, the noise is limited by correlations induced
SPACE CHARGE LIMITING DIODE 107
by the space charge. The potential produced by the space charge obeys Poisson's
equation
where the density of electronic space charge p is everywhere negative. Thus the
potential is concave upwards. If the potential V would increase monotonically
from cathode to anode, emitted electrons of all energies would be accelerated
from cathode to anode, and no space charge would develop. Since space charge
develops, the potential must develop a minimum within the region from cathode
to anode as shown in Fig. 6.5. In practice, a small minimum is found close to
the cathode. This minimum provides a potential barrier, and only electrons with
higher emission kinetic energy will overcome the barrier. Since electrons have a
negative charge, they can be visualized by inverting Fig. 6.5. Since the electrons
have a Boltzmann distribution of energy, the current is
where V_b is the barrier height in energy units. Clearly, V_b = kT log(I₀/I), and the barrier position can be determined from its height and the temperature. This would
reduce the current but not change the ratio of shot noise to current. However, a
positive fluctuation in the current will cause an increase in the barrier height that
will turn back some electrons. This "negative feedback" causes a reduction in shot
noise, by a factor Γ². For anode voltages larger than 30kT/e, an approximate formula for the smoothing factor is given by Rack (1938):
FIG. 6.6. Equivalent circuit of a diode feeding into a noisy resistor, after Williams and Moullin, p. 74, combining the thermal noise current with that of a space-charge limited diode.
where the reduction factor Γ² has a contribution from the electrons reflected at the
barrier and a second contribution from those that get to the plate.
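The barrier-height relation V_b = kT log(I₀/I) quoted above is easily evaluated; the cathode temperature and the two currents below are assumed illustrative values:

```python
import math

k_B = 1.380649e-23
e = 1.602176634e-19
T = 1100.0      # cathode temperature in K (assumed, typical of oxide cathodes)
I0 = 50e-3      # saturation (temperature-limited) current, A (assumed)
I = 5e-3        # actual space-charge-limited current, A (assumed)

V_b = (k_B * T / e) * math.log(I0 / I)   # barrier height in volts
print(f"V_b = {V_b * 1e3:.1f} mV")
```

A tenfold current suppression at this cathode temperature corresponds to a barrier of roughly a fifth of a volt.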
A simplified description of the space charge reduction Γ² in a triode is given on p. 564 of Valley and Wallman (1948). Williams (1936) and p. 74 of Moullin (1938) show that the equivalent circuit of a diode feeding into a resistor is that shown in Fig. 6.6.
The equations found to fit the data are given by
This is a combination of shot noise and thermal noise. Here ρ is the differential resistance dV/dI of the diode. The notation I_C and I_T reminds us that the first noise is space-charge limited, and the second is temperature limited. This conversion from current to voltage noise was adopted by
Williams (1936) following a suggestion by Moullin and Ellis (1934) and discussed
extensively in Moullin (1938), p. 74.
The noise dependence from a resistance R at temperature T₂ was then measured by comparing the effective resistance of the diode resistance combination
RICE'S GENERALIZATION OF CAMPBELL'S THEOREMS 109
with that of a metallic resistance at another temperature T₁. The results in Fig. 23
of Moullin (1938) agree with the effective resistance formula
Williams extended this verification to the case of two diodes in parallel with a
resistance R, with one of the diodes being temperature limited and the other space
charge limited. In that case, the formula is
Rice (1944, 1945, 1948a, 1948b) not only generalized Campbell's theorem to
obtain all the higher moments, he also considered the more general process
where the η_j's are random jumps with a distribution independent of t_j, of t, and of j. Our procedure for dealing with the same problem consists in writing
Equation (6.88) describes Θ(t) as filtered shot noise. Thus we can relate the ordinary characteristic function of Θ to the generalized characteristic function of the shot noise function, G(s)
The average in Eq. (6.90), for general y(s), was evaluated in two ways in Lax (1966QIV). The first made explicit use of Langevin techniques, which we will discuss later. The second, which follows Rice, will be presented here. It makes use of
the fact that the average can be factored:
Here, we have supposed that N pulses are distributed uniformly over a time interval T at the rate ν = N/T. All N factors are independent of each other and have equal averages, so that the result of the RHS of Eq. (6.91) is
where s(η) is the normalized probability density for the random variable η. In the last step, we assumed that the integral over s converges, and replaced it by its limit before taking a final limit in which N and T approach infinity simultaneously with the fixed ratio N/T = ν. Setting y(s) = kqf(t − s), the generalized Campbell's theorem is obtained
The cumulants are then given by the coefficients of kⁿ/n! in the exponent:
The choice s(η) = δ(η − 1) restores the original Campbell process, which includes only the cases n = 1, 2. The probability density of this variable may then be
obtained by taking the inverse Fourier transform of the characteristic function in
Eq. (6.93)
This form of generalized Campbell's theorem, like its antecedents, assumes that
the tj are randomly (and on the average uniformly) distributed in time. Moreover,
there is assumed to be no correlation between successive pulse times. Lax and
Phillips (1958) have found it convenient to exploit Eq. (6.93) and Eq. (6.94) in
studying one dimensional impurity bands.
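Rice's cumulant formula, with κ_n = ν⟨ηⁿ⟩∫f(t)ⁿ dt, can likewise be checked by simulation. In the sketch below the exponential amplitude distribution and pulse shape are assumed for illustration; for η distributed as a unit exponential, ⟨η⟩ = 1 and ⟨η²⟩ = 2:

```python
import numpy as np

rng = np.random.default_rng(2)
nu, tau, T = 40.0, 0.1, 400.0          # rate, decay time, run length (assumed)
t_j = rng.uniform(0.0, T, rng.poisson(nu * T))
eta = rng.exponential(1.0, t_j.size)   # random amplitudes: <eta>=1, <eta^2>=2

def theta(t):
    """Shot noise with random pulse heights eta_j and f(t) = exp(-t/tau)."""
    dt = t - t_j
    keep = dt >= 0.0
    return (eta[keep] * np.exp(-dt[keep] / tau)).sum()

s = np.array([theta(t) for t in np.linspace(5.0, T, 2000)])
print(s.mean(), nu * 1.0 * tau)        # kappa_1 = nu <eta>   * integral f
print(s.var(), nu * 2.0 * tau / 2.0)   # kappa_2 = nu <eta^2> * integral f^2
```

Both the mean and the variance match the generalized cumulant predictions (4.0 each for these parameters) within the sampling scatter.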
Rice (1944) has also determined ⟨Θ²(t)⟩ for the case in which the time interval between successive pulses has a distribution p(τ) that is not necessarily equal to the Poisson value, p(τ) = ν exp(−ντ), appropriate to uncorrelated pulses. A
simplified argument can be given for the second moment of
since
If we write
where the factor 2 takes account of the fact that t_i can be less than t_j as well as greater than t_j. If we define
then
If we write
and
then
These results are readily evaluated for the case considered by Rice (1944)
so that
where
Since
where stationarity is needed to get the second form. A and B can be thought of
as (possibly complex) random variables in the classical case and operators in the
quantum case.
We also consider the response of the variable B(t), governed by a Hamiltonian
K, to an infinitesimal force associated with A produced by changing the Hamiltonian from K to K + λA exp(+iωt). Here λ is an arbitrarily small number. The
notation used here is consistent with that used in Lax (1964QIII). If the average response changes from ⟨B(t)⟩ in the absence of the A force to ⟨B(t)⟩_A in the presence of the force, we can define a response, or transition, function T_BA(ω) by the
114 THE FLUCTUATION-DISSIPATION THEOREM
change due to the force
This response function T_BA(ω) can be computed for a given system. Some examples of T_BA(ω) can be found in Eq. (7.17) and Eq. (7.76). We shall establish in Section 7.3 that
where the ⟨· · ·⟩ denote an average over the stationary and possibly equilibrium ensemble present before the A force was applied. In the quantum case, an average of an arbitrary observable M is given by
where ρ is the equilibrium (Gibbs) density operator for Hamiltonian K, and β = 1/kT, where k is Boltzmann's constant and T is the absolute temperature.
In the classical case, ⟨M⟩ is simply an integral of M against the equilibrium distribution function. Our parenthesis-comma construct was defined in Lax (1964QIII) to be
where
In the classical limit, ℏn(ω) is replaced by kT/ω. Note that n appears, rather than n + 1/2, so that the zero-point contribution is absent with the present order of the operators.
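The classical limit ℏωn(ω) → kT quoted above is easy to verify numerically (the frequencies below are chosen arbitrarily for illustration):

```python
import numpy as np

hbar, k_B, T = 1.054571817e-34, 1.380649e-23, 300.0

def n_bose(omega):
    """Occupation number n(omega) = 1 / (exp(hbar*omega/kT) - 1)."""
    return 1.0 / np.expm1(hbar * omega / (k_B * T))

for omega in (1e9, 1e11, 1e13):
    x = hbar * omega / (k_B * T)
    ratio = hbar * omega * n_bose(omega) / (k_B * T)
    print(f"hbar*omega/kT = {x:.2e}   hbar*omega*n/kT = {ratio:.4f}")
```

For ℏω ≪ kT the ratio approaches unity, recovering equipartition; at ℏω comparable to kT the quantum suppression of the occupation number becomes visible.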
SUMMARY OF IDEAS AND RESULTS 115
The reason for the particular combination of T matrices in Eq. (7.7) is that this
combination is expressible as an integration over the complete time interval
With no further information about the operators A and B, the only relation between
the two terms in Eq. (7.1) is
where ε_{AB} = ±1. In that case we can specialize the time reversal relation of Eq. (10.5.23) of Lax (1974) by setting the external magnetic field to zero:
The order of these operators is relevant in the quantum case. It then follows that
and
Since ℏωn(ω) → kT, we obtain the correct classical limit for Johnson noise.
Again there is no zero-point contribution to this noise.
If we had followed the Ekstein-Rostoker (1955) antisymmetrized definition of
noise:
where
Here N is the number of systems in an ensemble, not the number of atoms. The average
over an ensemble is
where the sum is over the states J. P_J is the "probability" of a system being in the state J, i.e., the number of systems in the state J divided by the total number of systems, N. The states Ψ_J are normalized, but need not be orthogonal.
We now introduce an arbitrary (orthonormal) basis set φ_n. In terms of these φ_n's, the pure state wave function Ψ_J can be expanded as
has the correct matrix elements ρ_{nm} in any system of basis vectors.
where H is the total Hamiltonian of the system, where we have assumed, for simplicity, that the Hamiltonian H does not depend explicitly on the time. Using this
equation, Eq. (7.33) becomes
Taking the derivative of this equation with respect to t, the equation of motion of
the density operator is
or
and treat A as a weak perturbation, via the small parameter λ. This discussion follows that in Lax (1964QIII). If we set
then if λ were zero, ρ(t) would reduce to the constant value ρ(0) of Eq. (7.35).
Equation (7.40) transforms away the rapid motion associated with the unperturbed Hamiltonian, K. This is called a transformation to the interaction picture. The only
motion that remains is that induced by the perturbation added to the unperturbed
Hamiltonian. It should be no surprise, then, that if we substitute Eq. (7.40) into
Eq. (7.39) we obtain
is the operator A with the time dependence induced by the unperturbed Hamilto-
nian, sometimes referred to as the operator A in the interaction representation.
We assume that the system starts at equilibrium at the time t = — oo:
where Z is the partition sum calculated as a trace. In the classical case, this would
be an integral over all phase space.
the density operator evaluated to exactly the first order in λ. Finally, as K is time independent, we have
But
Thus
where stationarity is applied to obtain the second form from the first. The final
result is Eq. (7.3) as predicted.
For future reference, we note that the interchange of the names A and B in the
first form of Eq. (7.50) yields (with stationarity)
If we set u = —t,
The passage from Eq. (7.51) to Eq. (7.52) involves three minus signs: dt = —du,
the reversal of limits, and the reversal of the order of the operators. Finally, we can
combine Eqs. (7.50) and (7.52) to obtain the simpler form
which involves an integration over the complete region from negative to positive
infinity.
EQUILIBRIUM THEOREMS 121
7.4 Equilibrium theorems
When Planck's constant goes to zero, we reduce to classical physics, and the
operators commute. If we take the Fourier transform of this equation, and intro-
duce t + ih/3 as a new integration variable on the right hand side, we obtain the
corresponding theorem in the frequency domain:
The Fourier integral, on the left hand side of Eq. (7.56), yields a delta function factor δ(ℏω − (E_m − E_n)) which permits the replacement of exp[β(E_m − E_n)] by exp[βℏω] and the theorem follows with no assumption of analyticity.
In the classical limit, A(0) and B(t) commute. The factor exp[βℏω] is the price one must pay, in quantum mechanics, for switching the order of the operators.
This factor is identical to (and responsible for) the ratio of Stokes to anti-Stokes
radiation intensities.
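For example, ignoring prefactors, the exp(βℏω) factor gives the Stokes to anti-Stokes intensity ratio for a Raman line; the 520 cm⁻¹ silicon optical phonon at room temperature is an assumed illustrative case:

```python
import math

hbar = 1.054571817e-34
k_B = 1.380649e-23
c = 2.99792458e10      # cm/s, to convert a wavenumber to angular frequency

T = 300.0
nu_cm = 520.0          # Raman shift in cm^-1 (Si optical phonon; assumed example)
omega = 2.0 * math.pi * c * nu_cm

# exp(beta * hbar * omega): anti-Stokes suppressed relative to Stokes
ratio = math.exp(hbar * omega / (k_B * T))
print(f"I_Stokes / I_antiStokes ~ exp(hbar*omega/kT) = {ratio:.1f}")
```

At room temperature the anti-Stokes line of this mode is about an order of magnitude weaker than the Stokes line.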
For relating T_BA(ω) − T_AB(−ω), which represents the response function or dissipation, to the noise spectrum, we use
With the help of Eq. (7.56), and noticing that A = A(0), Eq. (7.53) can be written
By Eq. (7.1), the integral on the right is (1/2)G_BA. Using Eq. (7.8) we obtain the fluctuation-dissipation theorem in the form
where the left hand side follows the conventions of Section 4.7.
7.5 Hermiticity and time reversal
In the more general case, time reversal is assumed in the classical case by Onsager
(1931a, 1931b), and derived in the quantum case by Lax (1974).
where ε_A (ε_B) = ±1 according as A (or B) is even or odd under the barring operation, which combines a classical time reversal transformation with a Hermitian adjoint:
with a similar statement for B. Onsager stated Eq. (7.66) as a classical macroscopic
relation, without derivation. It was not clear what the order of the factors should
APPLICATION TO A HARMONIC OSCILLATOR 123
be. Lax (1974) derived this result for quantum systems described by a Hamilto-
nian even under time reversal. Our order of the operators is a consequence of that
derivation. We can immediately obtain
then
The above example applies to any case in which A and B have opposite parity
under time reversal. Application to a harmonic oscillator will be considered in the
next section.
The harmonic oscillator is an important example, since it can stand for an RLC
circuit, a mechanical circuit, or a mode of the electromagnetic field. The electric
circuit response is usually the response of a current to an applied voltage. In the
associated mechanical problem, the response, or mobility, is that of a velocity to
the applied force.
We can therefore use the result, Eq. (7.73), of the previous section by setting A to the displacement variable Q, with B = Ȧ = Q̇ a velocity; the applied potential energy is +λQ exp(+iωt), corresponding to the perturbation described in Section 7.1.
The equations of motion of an oscillator of position Q, momentum P, mass M, and resonant frequency ω₀ are given by
Combining Eqs. (7.74) and (7.75), and setting Q(t) = Q_m exp(+iωt), with the position amplitude Q_m, we have
Here, and in the discussion that follows, γ can be frequency dependent, but we shall avoid using the explicit form γ(ω) for simplicity. The velocity Q̇ is then expressible as
which is T_Q̇Q̇ in the form of Eq. (7.73) with A = Q̇ and B = Q̇. Applying the fluctuation-dissipation theorem, we get for the velocity noise
The notation follows another convention common in the literature. See footnote 24 of Lax (1966QIV). The dagger correlates with the first variable, since it carries the time t in Eq. (7.67).
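A numerical sketch of this oscillator response can make the shape of the velocity noise concrete. All parameter values below are assumed, and the two-sided classical convention S(ω) = 2kT Re T(ω) is taken for illustration:

```python
import numpy as np

M_osc, omega0, gamma = 1.0e-3, 2.0e3, 50.0   # kg, rad/s, 1/s (assumed values)
k_B, T_K = 1.380649e-23, 300.0

def mobility(omega):
    """T(omega) = i*omega / [M*(omega0^2 - omega^2 + i*gamma*omega)]."""
    return 1j * omega / (M_osc * (omega0**2 - omega**2 + 1j * gamma * omega))

omega = np.linspace(1.0, 4.0e3, 4000)
S_vv = 2.0 * k_B * T_K * mobility(omega).real   # classical velocity noise
print(omega[np.argmax(S_vv)])                   # Lorentzian peaked at omega0
```

The real (dissipative) part of the mobility, and hence the velocity noise, is a Lorentzian of width set by γ centered on the resonance ω₀.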
If we apply such relations as shown in Eq. (7.73), we can obtain position-position correlations. Noticing that T_QQ = Q_m/λ, we have
With this transform of variables, to a good approximation b(t) will contain terms predominantly of the form exp(−iωt) with positive ω, and b† will contain only frequencies of the opposite sign.
The inverses to Eqs. (7.85) and (7.86) are
where
The same ratio applies to the Fourier components, so that the power ratio is
The result is that these new Langevin forces have the astonishingly simple form
If the analysis is repeated with the operators in the opposite order, say ⟨Q(0)Q(t)⟩,
one obtains
where the added unity is the result of the commutation rules and yields, as
expected, the corrected Stokes-anti-Stokes ratio
The procedure used by Lax (1966QIV) to produce a Markovian model that can
be solved by a well established set of tools is
1. To make a rotating wave approximation in Eqs. (7.88)-(7.89) by omitting b† in the equation for db/dt and vice versa, and
2. To force the noise source to be white by evaluating it at ω = ω₀.
An exact solution can be made for each oscillator. For each oscillator j subject to a force A(t), which is an abbreviation for −a_j Q(t), the position and momentum (called briefly q and p) at time t are given in terms of the initial values q(t₀) and p(t₀) with u = t − t₀,
When each of the solutions for q_j and p_j are inserted into Eq. (7.97), the system momentum is found to obey
If we define
then
Then we have
whereas
The integrations over frequency in this section extend only over positive fre-
quency since each frequency ujj of an oscillator is, by convention, positive. The
relation between the Fourier components of K(u) or L(u), which describe transport and involve γ(ω), and the noise anticommutator, Eq. (7.110), which involves γ(ω) and E(ω), represents the standard fluctuation-dissipation theorem. Here we
have not given an abstract proof of the fluctuation-dissipation theorem for a gen-
eral system, but a demonstration of how it arises in a simple case of a harmonic
system, and a harmonic reservoir. If we were to replace the potential energy of
the harmonic oscillator by V(Q), the remaining analysis would remain unchanged except that Mω₀²Q in Eq. (7.101) would be replaced by dV(Q)/dQ.
If the coefficients a_j²/(m_j ω_j²), which are regarded as a function of frequency, are chosen to be constant, then γ(ω) = γ, and we get
and
8.1 Objectives
The first objective of this chapter is to reduce the determination of the behavior of a
Markovian random variable, a(t), to the solution of a partial differential equation
for the probability, P(a,t), that a(t) will assume the value a at time t. In this
way, a problem of stochastic processes has been reduced to a more conventional
mathematical problem, the solution of a partial differential equation.
But the spectrum of a random process requires that the Fourier transform be
taken of a two-time correlation. We therefore introduce a regression theorem that
states that for a Markovian process, the time dependence of a two-time correlation
of the form (a(t)a(O)) is the same as that of (a(t)). The motion of (a(t)) is just
the motion or transport of the variable a itself. Thus we relate the spectrum of
the noise to an understanding of the one-time transport or the mean motion of the
system. Onsager (1931a, 1931b) was the first to follow this procedure. He simply
stated it as an assumption within the context of a classical system near equilib-
rium. A proof, for the classical case is given in Section 8.4. A detailed discussion
of the quantum case is given in Lax (1968QXI), and a comparison between the
exact treatment of a harmonic oscillator, in the quantum case, with a Markovian
procedure introduced in Lax (1966QIV) is given by Ford and O'Connell (2000)
and Lax (2000).
However, the initial condition of ⟨a(t)a(0)⟩ is not the same as the initial condition for ⟨a(t)⟩. Thus we must obtain, in another way, information about the total fluctuation ⟨[a(0)]²⟩ at the initial time. For stationary processes, at equilibrium, these fluctuations can be determined from thermodynamical arguments. For
example, the mean square velocity of a molecule in three dimensions is determined
from equipartition to be
relating the diffusion constant D to the mobility A, we can then deduce the value of
the diffusion constant D in the equilibrium case. In the nonequilibrium case (v2)
is unknown, and Eq. (8.2) is used to obtain it from D and A which are determined
directly from the stochastic model used to describe the system. In particular, D
is determined by the second moments of the velocity jumps, and A is determined
directly from the first moments of the velocity jumps.
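The equipartition value ⟨v²⟩ = 3kT/m is easily evaluated; a nitrogen molecule at room temperature is an assumed illustrative case:

```python
import math

k_B = 1.380649e-23
T = 300.0
m = 28.0 * 1.66053907e-27     # mass of an N2 molecule, kg (assumed example)

v2_mean = 3.0 * k_B * T / m   # equipartition: <v^2> = 3kT/m
print(f"v_rms = {math.sqrt(v2_mean):.0f} m/s")
```

The resulting rms speed of roughly 500 m/s is of the order of the speed of sound in the gas, as one expects from kinetic theory.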
Explicit evaluation of the spectrum is carried out, in this chapter, for quasilinear
systems. The simplest example will be generation-recombination noise. Noise in
a system that cannot be approximated as quasilinear, such as a laser because it is
a self-sustained oscillator, is discussed in Chapter 11. In this case, an analytic (or
numerical) solution must be made of the Fokker-Planck equation for the one-time
motion of the system, and a regression theorem must be exploited to obtain the
spectrum.
We are then in a position to calculate the (two-time) noise spectrum by exploit-
ing a classical regression theorem which is established in Section 8.4. The form in
which that theorem is proved here is the statement that the equation obeyed by the
two-time (or conditional) distribution P(a(t), t | a(0), 0) obeys precisely the same differential equation in time as the one-time distribution, P(a(t), t). The proof is
based explicitly on the system being Markovian, with no assumption of equilibrium. Our classical regression theorem is not equivalent to the Onsager regression
hypothesis in two ways: (1) it is a theorem, with a proof, not a hypothesis; (2) it
does not assume that the fluctuations take place from equilibrium but can be from
a nonequilibrium stationary state.
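The content of the regression theorem can be illustrated by simulation. For an Ornstein-Uhlenbeck process da = −γa dt + √(2D) dW (an assumed model, with assumed parameter values), the two-time correlation ⟨a(t)a(0)⟩ decays at the same rate γ as the conditional mean ⟨a(t)⟩:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, D = 1.0, 0.5                    # decay rate and diffusion (assumed)
dt, nsteps, npaths = 1e-3, 2000, 20_000

a = rng.normal(0.0, np.sqrt(D / gamma), npaths)  # start in the stationary state
a0 = a.copy()
corr = []
for _ in range(nsteps):
    # Euler-Maruyama step for da = -gamma*a dt + sqrt(2D) dW
    a += -gamma * a * dt + np.sqrt(2.0 * D * dt) * rng.normal(size=npaths)
    corr.append(float((a0 * a).mean()))

t = dt * np.arange(1, nsteps + 1)
# Regression: <a(t)a(0)> = <a^2> exp(-gamma t), same decay law as <a(t)>
print(corr[999], (D / gamma) * np.exp(-gamma * t[999]))
```

The simulated correlation at t = 1/γ matches the predicted (D/γ)e⁻¹, with the initial value fixed by the stationary fluctuation ⟨a²⟩ = D/γ rather than by the transport equation itself, as the text emphasizes.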
To what extent can we expect the dynamics of a system of random variables
to be Markovian, namely that the probability of future events is determined by a knowledge of the present? This is analogous to the question: to what extent can
we expect the future of a set of (nonrandom) variables to be deterministic, namely,
that their future values are completely determined by the initial conditions? The
answer to the latter question is yes if the set of variables is complete. But in any real
problem, one doesn't include all the variables in the whole universe. The scientist
must decide which are the relevant variables, and the rest can be discarded. In the
random case, an analogous choice must be made. If enough variables are used, we
would expect that the noise sources would be white. If they are not white, they may
have come from a filtering process via an intermediate system. If we add the latter
to our system, we can make the ultimate noise white, and Markovian methods
become available to solve the problem.
Although it is beyond the scope of this book, we will comment on the quantum
case. Because of the commutation rules, noise at positive and negative frequencies
cannot be equal. Their ratio is the familiar Stokes-anti-Stokes ratio exp(ℏω/kT) discussed in Section 7.6. However, when the damping rate γ is small compared to the frequency difference 2ω₀ between positive and negative frequencies it is
DRIFT VECTORS AND DIFFUSION COEFFICIENTS 131
possible to separate these two degrees of freedom by a rotating wave approxima-
tion. Then the noise can be treated as approximately white over the width of each
resonance. The correct Stokes-anti-Stokes ratio can still be maintained. The error
in this procedure is first order in γ/ω₀. For an optical laser, typical numbers are
γ = 10⁹ per second and ω = 10¹⁵ per second.
In spite of the negligible error in the application for which the approximation
was made, the Lax-Onsager procedure has been attacked by Ford and O'Connell
(1996, 2000). Although these authors recognize the often negligible errors, they
have repeatedly stated that there is no "quantum regression theorem", and this
is of course true. The initial statement, Lax (1963QII) on regression was based
on an approximate decoupling between system and reservoir. So clearly, it was
understood to be an approximation in the quantum case, not a theorem. Later
papers, clarifying the nature of the approximation, see Lax (1964QIII, 1966QIV),
were not mentioned in the initial draft of Ford and O'Connell (2000). To under-
stand why the approximate procedure worked in a variety of cases, Lax (1968QXI)
showed that it would work whenever the system was Markovian. Why this is true
in the classical case is shown in Section 8.4. The quantum mechanical proof in
Lax (1968QXI) merely showed that a system would obey a regression theorem if
it were Markovian, and vice versa. But an exact quantum treatment of a damped
harmonic oscillator was already used to show in Lax (1966QIV) that the ratio of
the noise at positive frequency to that at the corresponding negative frequency is
exp(ℏω/kT), usually called the Stokes-anti-Stokes ratio. Since this ratio is not
unity, the noise cannot be white, which would imply an exactly flat spectrum.
Ford and O'Connell (1996) wrote: "But the so-called quantum regression the-
orem appears in every modern textbook exposition of quantum optics and, so far
as we know, there are no flagrant errors in its application. How can it be that a
nonexistent theorem gives correct results?"
The answer, of course, is that many real systems are approximately Markovian.
That is why the method works. It is not necessary for the noise to be white over all
frequencies, but only over each resonance, where most of the energy resides.
A general random process (not necessarily a Markovian one) obeys the relation on
conditional probabilities, Eq. (2.15), which is rewritten in the form
at a at time t, remembering that one started the entire process at a₀ at time t₀. This
last bit of information is irrelevant if the process is Markovian. If this dependence
is dropped, Eq. (8.3) reduces to the Chapman-Kolmogorov condition, Eq. (2.17).
For many Markovian processes one can write (for a′ ≠ a)
where Δt is small and w_{a′a} is the transition probability per unit time, with the
normalization condition (including a′ = a),
This is in fact obeyed even for the Brownian motion process, which can be easily
proved from Eq. (8.7). Again, for a Markovian process the dependence on a₀ can
be omitted everywhere.
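The transition-rate bookkeeping can be checked on a toy example. The sketch below (a two-state process with made-up rates) builds P(a′, t + Δt | a, t) ≈ δ + wΔt, where the columns of the rate matrix W sum to zero so that probability is conserved, and verifies the Chapman-Kolmogorov composition numerically.

```python
import numpy as np

# Toy two-state process (made-up rates): W[a', a] plays the role of the
# transition probability per unit time; each column sums to zero, which is
# the normalization condition including the a' = a term.
w01, w10 = 0.3, 0.7                        # rates 0 -> 1 and 1 -> 0
W = np.array([[-w01,  w10],
              [ w01, -w10]])

dt = 1e-4
P_dt = np.eye(2) + W*dt                    # P(a', t+dt | a, t) ≈ delta + w*dt

def P(t):
    """Propagate by repeated short steps."""
    return np.linalg.matrix_power(P_dt, int(round(t/dt)))

# Chapman-Kolmogorov composition: P(t1 + t2) = P(t2) @ P(t1)
lhs = P(1.5)
rhs = P(0.9) @ P(0.6)
print(np.max(np.abs(lhs - rhs)))           # ~ 0
```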
Equation (8.9) is the central axiom we set for random processes. Random
variables differ from nonrandom variables in their smoothness with time. For
a nonrandom variable, (a′ − a)ⁿ is proportional to (Δt)ⁿ, because it varies smoothly
with time; hence lim[(a′ − a)ⁿ/Δt] = 0 for n > 1. For a random variable, the
nth moments for n > 1 are proportional to Δt because of the strong fluctuations.
It is customary to refer to D₁ as a drift vector, A, since
so that we expect to be able to show later that the mean motion of our random
variable a obeys
a diffusion constant in the variable a, in analogy with the Brownian motion result
that D = ⟨(Δx)²⟩/(2Δt) for a one-dimensional position variable. However, a
need not be a position variable. It could, for example, be an angular variable, or
the number of particles in a given state.
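The drift vector D₁ and diffusion coefficient D₂ can be estimated directly from a sample path. The following sketch uses an illustrative Langevin model, da = −λa dt + (2D)^{1/2} dW, with made-up parameters (not an equation from the text), and measures ⟨Δa⟩/Δt and ⟨(Δa)²⟩/(2Δt) conditional on a(t) near a chosen value.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative Langevin model (made-up parameters): da = -lam*a dt + sqrt(2D) dW.
# Estimate the drift D1(a) = <Δa>/Δt and diffusion D2(a) = <(Δa)^2>/(2Δt)
# from path increments that start near a chosen value of a.
lam, D, dt, n = 0.5, 0.2, 1e-3, 2_000_000
a = np.empty(n); a[0] = 0.0
noise = np.sqrt(2*D*dt)*rng.standard_normal(n-1)
for i in range(n-1):
    a[i+1] = a[i] - lam*a[i]*dt + noise[i]

da = np.diff(a)
sel = np.abs(a[:-1] - 1.0) < 0.05      # condition on a(t) ≈ 1
d1 = da[sel].mean()/dt
d2 = (da[sel]**2).mean()/(2*dt)
print(d1, d2)   # D1(1) ≈ -lam = -0.5, D2 ≈ D = 0.2
```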
If one does not impose a(t₀) = a₀, every random process obeys
which adds up all the ways of arriving at a′, weighting each with the probability of
its starting point. Thus it is also possible to define the set of diffusion constants
Equation (8.15) follows from Eq. (8.9) where all information for times earlier than
t is ignored, which is appropriate (only) for Markovian processes.
134 GENERALIZED FOKKER-PLANCK EQUATION
which adds up all the ways of arriving at a′, weighting each with the probability
of its starting point. The average motion of an arbitrary function M(a, t) may be
obtained by integrating M(a′, t) against P(a′, t + Δt). Thus, we should multiply
Eq. (8.16) by M(a′, t) and integrate over a′. On the right hand side of the equation,
we shall replace M(a′, t) by its Taylor expansion:
The integrals of (a′ − a)ⁿ over a′ give rise to the diffusion coefficients in Eq.
(8.8). Then integration in Eq. (8.16) over a at time t leads to
In obtaining ⟨M(a, t)⟩ in the second term on the right hand side of the above
equation, the normalization property of the transition probability is used:
Thus we obtain
AVERAGE MOTION OF A GENERAL RANDOM VARIABLE 135
where ⟨· · ·⟩ for any function M(a) is defined by
This formula (8.20) is valid in the sense of the expectation value for a general
random variable M(a, t). We remind readers of the two different symbols. The
symbol ⟨(Δa)ⁿ⟩_{a(t)=a} = ⟨[a(t + Δt) − a(t)]ⁿ⟩_{a(t)=a} in the definition of Dₙ(a, t)
in Eq. (8.4), where the value of a at t is first subtracted from the value at t + Δt and
an average is then made over the values of a at t + Δt subject to a fixed a(t) = a
at time t, involves only an integration over a(t + Δt). The symbol d⟨M(a)⟩ =
⟨M(a, t + Δt)⟩ − ⟨M(a, t)⟩, which represents the change of the expectation value of
M(a, t) with time, where the averages of M(a) at the two times t and t + Δt
are first made and then the difference is taken, involves two integrations, over a(t)
and a(t + Δt).
We shall illustrate its value by setting M(a) = a,
and for the variable M, we have d⟨M⟩/dt = ⟨A(M, t)⟩. When M(a) = a², we
have
where we have used A(a, t) = D₁(a, t), D(a, t) = D₂(a, t), ..., and ⟨...⟩ rep-
resents the average over a. The last two terms are written in Eq. (8.23) instead of
2⟨A(a, t)a⟩, as that form is needed for the multidimensional cases and for the
quantum case in physics.
The conditional expectation under the condition of a(t) = a is obtained by
setting
The first equation determines the operating point of the stationary state. The sec-
ond is the usual Einstein relation which relates the diffusion coefficient D to the
dissipative response contained in A.
If the noise is weak so that the fluctuations extend over a range in a which is
small compared to that over which A(a) and D(a) vary appreciably, it is permissi-
ble to make a quasilinear approximation. Let a_op represent the point at which the
drift vector vanishes:
and set
as the deviation from the operating point a_op. In the quasilinear approximation we
shall make the approximations
i.e., we retain first order deviations in the drift vector. Thus our drift, or transport
equation, simplifies to
In the steady state, the left hand sides of Eqs. (8.33) and (8.34) vanish and for the
single variable, classical case, our Einstein relation is then simply
THE GENERALIZED FOKKER-PLANCK EQUATION 137
which relates a diffusion constant D, a decay or dissipation constant A, and the
mean-square fluctuations ⟨a²⟩ in the steady state. Here ⟨a²⟩ plays the role of kT,
but thermal equilibrium has not been assumed as it was in Einstein's original work.
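This Einstein relation can be checked numerically on the simplest quasilinear model (made-up parameters; the steady state here is not assumed to be thermal equilibrium).

```python
import numpy as np

rng = np.random.default_rng(2)
# Quasilinear Langevin model (made-up parameters): da = -A a dt + sqrt(2D) dW.
# The Einstein relation predicts <a^2> = D/A in the steady state, with no
# assumption of thermal equilibrium.
A, D, dt, n = 1.0, 0.3, 1e-3, 1_000_000
s = np.sqrt(2*D*dt)
a, total, count, burn = 0.0, 0.0, 0, 100_000
for i in range(n):
    a += -A*a*dt + s*rng.standard_normal()
    if i >= burn:                  # discard the initial transient
        total += a*a; count += 1
msq = total/count
print(msq, D/A)                    # mean-square fluctuation ≈ D/A = 0.3
```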
In general, the equations for f(a), with f(a) = 1, a, a², do not form a closed
set. Therefore it is necessary to obtain an equation for the full probability distribu-
tion P(a, t) or P(a, t | a₀t₀), and not just the moments of those distributions; this
is presented in Section 8.4.
To obtain the equation of motion for P(a, t) we write Eq. (8.20) in the explicit
form, when M(a) does not explicitly depend on time,
Since this equation is to be valid for any choice of M(a), the coefficient of M(a) in
the above equations must vanish yielding the generalized Fokker-Planck equation:
The ordinary Fokker-Planck equation is the special case in which the series ter-
minates at n = 2. The ordinary Fokker-Planck equation, which can describe a
nonlinear Brownian motion, plays a special role because of the following theorem.
Proof
Define Aₙ = n!Dₙ. Then the Cauchy-Schwarz inequality takes the form
This argument is due to Pawula (1967). From Eq. (8.41) we see that
and
Thus if any An exists for n > 2, this string of inequalities guarantees the existence
of an infinite number of such coefficients. Such an infinite number corresponds to
an integral equation rather than a differential equation. Actually, Pawula proves
the converse. If one assumes the existence of a closed equation of order greater
than 2, then some finite order coefficients must vanish. By using the inequalities
in reverse, one can show that all coefficients above n = 2 would vanish.
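Since Eq. (8.41) itself is not reproduced in this extraction, the following is our reconstruction of the Cauchy-Schwarz step (all averages are conditional on a(t) = a):

```latex
% Cauchy-Schwarz applied to (\Delta a)^m and (\Delta a)^n,
% with \Delta a = a(t+\Delta t) - a(t):
\left[ \langle (\Delta a)^{m+n} \rangle \right]^2
  \le \langle (\Delta a)^{2m} \rangle \, \langle (\Delta a)^{2n} \rangle .
% Inserting \langle (\Delta a)^k \rangle = A_k\,\Delta t + O(\Delta t^2)
% with A_k = k!\,D_k, both sides are of order \Delta t^2, so
A_{m+n}^2 \le A_{2m}\, A_{2n}, \qquad m, n \ge 1 .
% For example A_3^2 \le A_2 A_4 and A_4^2 \le A_2 A_6; if some A_n with
% n > 2 is nonzero, the chain forces infinitely many A_k to be nonzero.
```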
The existence of a Fokker-Planck equation for a random process does not guar-
antee that the process is Markovian. If we start from Eq. (8.9) instead of (8.20) we
obtain the equation
In the Markovian case, Eq. (8.44) reduces to Eq. (8.38) since the dependence on
the earlier time t₀ can be discarded. Thus if (and only if) the process is Markovian,
P(a, t | a₀, t₀), a two-time object, obeys the same equation of motion as the one-
time object, P(a, t). In the Markovian case, then, we can calculate the conditional
probability, P(a, t | a₀, t₀), a two-time object, by solving the single-time equation,
Eq. (8.38), subject to the initial condition
and
thus we have
and we obtain
Here, a and y may both be regarded as operators but all y's must remain to the left
of all a's.
where ⟨A(n)⟩ is our average drift vector. Since the occupancy of state n is
increased by generation from the state n − 1 and reduced by generation out of the
state n, whereas it is increased by recombination out of state n + 1 and reduced by
recombination out of state n, the probability distribution function, P(n, t), obeys
the following generalization of Eq. (3.3)
The rth order diffusion constant, read off from the coefficient of the rth derivative
term, agrees with that found in Eq. (8.56). Compare with Eq. (8.38) and see Lax
(1966III) for a detailed discussion of the generalized Fokker-Planck equation. We
have
Thus all the even numbered diffusion constants are proportional to the sum of
the rate in plus the rate out, while all the odd numbered diffusion constants are
proportional to the difference, i.e., the rate in minus the rate out. In particular, the
second moment, D₂, which will be the moment of the noise source, obeys
These results are characteristic of shot noise. A Langevin approach to the gen-
eralized Fokker-Planck equation is presented in Lax (1966QIV) and discussed in
Chapters 9 and 10.
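A direct stochastic simulation (Gillespie's algorithm; the rates below are made up) illustrates these shot-noise results: with constant generation rate g and recombination rate rn, the operating point is n_op = g/r, and the Einstein relation with D₂ = (g + rn)/2 evaluated at n_op predicts Poisson fluctuations, ⟨(Δn)²⟩ = n_op.

```python
import numpy as np

rng = np.random.default_rng(3)
# Gillespie simulation of a generation-recombination process (made-up
# rates): constant generation rate g in, recombination rate r*n out.
g, r = 50.0, 1.0
n, t, t_end = 0, 0.0, 2000.0
occ, wts = [], []
while t < t_end:
    rate_in, rate_out = g, r*n
    total = rate_in + rate_out
    wait = rng.exponential(1.0/total)      # time to the next event
    occ.append(n); wts.append(wait)
    t += wait
    n += 1 if rng.random() < rate_in/total else -1

occ = np.asarray(occ, float); wts = np.asarray(wts)
mean = np.average(occ, weights=wts)         # time-weighted occupancy stats
var = np.average((occ - mean)**2, weights=wts)
print(mean, var)   # both ≈ n_op = g/r = 50, i.e. Fano factor ≈ 1
```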
In the quasilinear approximation, the operating point, n_op, and decay parame-
ter, A, are determined by
Then, using the Einstein relation, the fluctuation from the average value becomes
where we have tacitly assumed that t > 0. By the definition, Eq. (1.80), of
conditional probability
We have replaced t by \t\ here, but the justification is based on time reversal as
developed in Section 8.10.
Using Eq. (8.63) in the quasilinear approximation,
Thus the Poisson process of Section 3.1 is a special case of the generation-
recombination (shot noise) process, in which the noise arises because
THE CHARACTERISTIC FUNCTION 143
of the discreteness of the occupancy number n. An even more general case, in
which many states have occupancies, was given in Lax (1960I).
Let us summarize the procedure we use to obtain the spectrum of fluc-
tuations from the quasilinear stationary state. For a Markovian process, the
time-dependent decay (regression) of a correlation, ⟨Δn(t)Δn(0)⟩, is the same
as that of ⟨Δn(t)⟩_{Δn(0)}, the decay of the mean motion from a deviation. This is
the basis of the Onsager (1931a, 1931b) regression hypothesis for the equilibrium
state, but it is proposed as a theorem by Lax (1968QXI) for Markovian systems
with no assumption regarding equilibrium. Of course classical systems can be
exactly Markovian, whereas quantum systems can only be approximately Marko-
vian. For the classical physics case, this proposed theorem is a consequence of
the definition of conditional probability, Eq. (8.65). Thus the conditional average
of An(t) obeys the same time dependent equation of motion as the unconditional
average. See Lax (1960I). In this way, the frequency dependence of the noise is
determined by the Fourier transform of the mean motion - the transport. The nor-
malization of the noise is determined by its total, ⟨[Δn(0)]²⟩, and the latter can
be calculated via the Einstein relation. Thus ⟨[Δn(0)]²⟩ is determined by D and
A, and D must be calculated directly from the nature of the random process. In
the equilibrium case, the procedure is the reverse. The total fluctuations are deter-
mined by the Gibbs distribution in classical mechanics, or the associated density
matrix. The Einstein relations can then be used in the reverse direction to calculate
the diffusion constant.
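The procedure just summarized can be sketched in a few lines (illustrative numbers): the Fourier transform of the mean motion gives a Lorentzian spectrum, and integrating it recovers the total fluctuation ⟨a²⟩ = D/A fixed by the Einstein relation.

```python
import numpy as np

# Spectrum of a quasilinear Markovian process: the correlation
# <a(t)a(0)> = <a^2> exp(-A|t|) Fourier-transforms to the Lorentzian
#   S(omega) = 2A<a^2>/(A^2 + omega^2) = 2D/(A^2 + omega^2),
# where the normalization <a^2> = D/A comes from the Einstein relation.
# (Illustrative numbers; A sets the transport, D the noise strength.)
A, D = 2.0, 0.5
omega, domega = np.linspace(-4000.0, 4000.0, 4_000_001, retstep=True)
S = 2.0*D/(A**2 + omega**2)

total = S.sum()*domega/(2.0*np.pi)   # integral of S over d(omega)/(2 pi)
print(total, D/A)                    # total noise recovers <a^2> = D/A
```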
We found earlier that the easiest way to obtain solutions to the Poisson process is
to solve for its characteristic function. This suggests examining the characteristic
function of the generalized Fokker-Planck process.
The characteristic function, φ(y, t), of a normalized probability density P was
defined in Section 1.5 as
The moments ⟨aⁿ⟩ are determined by the nth order derivatives of φ(y, t) with
respect to y at y = 0, since from the above equation
Since
thus
where
Together with
Thus we have forms, Eqs. (8.81) and (8.80), analogous to the quantum mechanical
space and momentum representations, respectively. The best form to use, as in
quantum mechanics, depends on the particular problem.
Although our diffusion coefficients are originally defined in terms of ordinary
moments, i.e., Eq. (8.14)
since the lower moments to be subtracted off, i.e., the unlinked parts, contain the
product of at least two averages of the form ⟨[a(t + Δt) − a(t)]^l⟩ with l ≥ 1,
and the product is at least quadratic in Δt, so it can be discarded. The cumulants
have an earlier history: Thiele (1903) called them the semi-invariants (see
Section 1.6), but the use of the linked-moment notation (L subscripts) in quantum
problems is due to Kubo (1962).
In both Eq. (8.82) and Eq. (8.83) averages are taken subject to the initial con-
dition a(t) = a. With the notation Δa = a(t + Δt) − a(t), Eq. (8.83) can be used
to rewrite Eq. (8.48) for L(y, a, t) in the elegant form
As an example, let us consider the shot noise case. We had found the diffusion
coefficients, Dₙ, shown in Eq. (8.71), to be
or
i.e., all the linked moments are the same and equal to the number of events
expected, ⟨a⟩, or νt as shown in Eq. (3.16).
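This equality of all cumulants is easy to verify by sampling (a numerical sketch with a made-up expected count):

```python
import numpy as np

rng = np.random.default_rng(4)
# For shot noise all linked moments (cumulants) are equal to the expected
# number of events nu*t.  A Poisson sample (made-up nu*t) shows this:
# kappa_1 (mean), kappa_2 (variance), and kappa_3 (third central moment)
# all come out equal to nu*t.
nu_t = 3.0
x = rng.poisson(nu_t, 2_000_000).astype(float)
k1 = x.mean()
k2 = x.var()
k3 = ((x - k1)**3).mean()
print(k1, k2, k3)   # all ≈ 3.0
```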
8.7 Path integral average
All our previous work has dealt with averages taken at one time or at most, taken
at two different times. We shall now look at a truly multitime function.
A number of important problems in the theory of random processes can be
reduced to an expectation value of the form (see, for example, Lax 1966III; Deutch
1962; Middleton 1960; Stratonovich 1963),
This is evaluated for real A by Lax and Zwanziger (1973) and the inverse transform,
an ill-posed problem, is obtained using a Laguerre expansion procedure.
The most important object of attention is the generalized characteristic func-
tional F[· · ·] of a(s)
which becomes
We note that we do not integrate over da₀ in the above equation since M₀ is the
average under condition a(t₀) = a₀. To calculate the total average, M, we would
set
For a Markovian process, using the factorization of probabilities, Eq. (2.11),
Eq. (8.95) becomes
This equation is essentially the equation of a Feynman and Wiener path integral;
the relation between Feynman (1948) and Wiener (1926) path integrals is dis-
cussed by Montroll (1952). We now define
We may regard P̄(a_{j+1} | a_j) as the transition probability of a new Markovian pro-
cess. It obeys the usual properties of transition probabilities except for a change in
the normalization
to first order in At. If we regard P(a, t) as a density of systems, then Q(a, t) can
be regarded as the rate at which these systems disappear. The higher moments of
P̄(a′ | a) are the same as those of P(a′ | a),
which is Eq. (8.81) with an added loss term. Thus on the right hand side we
have the usual Fokker-Planck operator, plus a loss term. Nowhere in our previ-
ous discussion have we made use of the normalization of the probabilities except
148 GENERALIZED FOKKER-PLANCK EQUATION
Since P̄ is not normalized and is not an ordinary Markovian process, the use of
the Chapman-Kolmogorov condition requires some justification. We can provide
an intuitive proof as follows. The term in Q(a) represents a rate of disappearance
from which there is no return. If we simply add a new discrete state which holds
all the escaped probability, then P̄ will again be a normalized Markovian process,
since no memory has been added. Equation (8.102) then describes a composition
of probabilities in which the system ("particle") passes from a₀ to aₙ having sur-
vived, i.e., not disappeared or escaped. This is accomplished by passing through
the intermediate states a₁, a₂, . . . , a_{n−1}, having survived at each step. As a formal
proof of the Chapman-Kolmogorov condition we note that the P̄ process differs
from an ordinary Markovian process only in having
increased by an amount Q(a). But these transition probabilities have been shown
to obey the Chapman-Kolmogorov condition in Section 2.4 without specifying
r(a). The P̄ process is already included in that proof. Thus Eq. (8.97) becomes
which just describes a decay in the normalization of the P̄'s associated with taking
the mean of the exponential, Eq. (8.91). We note, however, that Q may be positive
or negative.
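The intuitive proof can be made concrete with a small sub-stochastic transition matrix (entries made up): appending one absorbing state that collects the escaped probability restores a normalized Markov chain, and the unnormalized matrix still satisfies the Chapman-Kolmogorov composition.

```python
import numpy as np

# Sub-stochastic transition matrix (made-up entries): each column sums to
# less than one, the deficit Q(a)*dt being the probability of escaping with
# no return.  Appending one absorbing state that collects the escaped
# probability restores a normalized Markov chain.
Pbar = np.array([[0.60, 0.20],
                 [0.25, 0.55]])
loss = 1.0 - Pbar.sum(axis=0)              # escape probability per step

P = np.zeros((3, 3))                       # [system states | absorbing state]
P[:2, :2] = Pbar
P[2, :2] = loss
P[2, 2] = 1.0
assert np.allclose(P.sum(axis=0), 1.0)     # properly normalized again

# The unnormalized process still composes (Chapman-Kolmogorov):
lhs = np.linalg.matrix_power(Pbar, 5)
rhs = np.linalg.matrix_power(Pbar, 3) @ np.linalg.matrix_power(Pbar, 2)
print(np.max(np.abs(lhs - rhs)))           # ~ 0
# The surviving-path probabilities embed unchanged in the augmented chain:
assert np.allclose(np.linalg.matrix_power(P, 5)[:2, :2], lhs)
```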
Associated with P̄ we define a characteristic function, φ̄, which is the Fourier
transform of P̄, just as φ was the Fourier transform of P, i.e., we define
LINEAR DAMPING AND HOMOGENEOUS NOISE 149
Comparing Eq. (8.107) with Eq. (8.105) we see that
We conclude that if we can solve the equation for the characteristic function φ̄, we
can evaluate the path integral Eq. (8.91). In a manner similar to the derivation of
Eq. (8.101), we find the equation of motion of the characteristic function φ̄ to be
where a = −i∂/∂y. Thus, as in the equation of motion for P̄, we obtain an extra
term in the equation of motion for φ̄, compared with the equation of motion for φ,
Eq. (8.80).
We now specialize these results to an easy case for which explicit answers can
be obtained: the case of linear damping and homogeneous noise. In our present
language, linear damping means
i.e., our Dₙ(a)'s for n ≥ 2 are independent of a, but could be functions of time.
Actually, A can also be a function of time, but to simplify the equations, we
temporarily stick to a constant A.
In this case the operator L becomes
where the complete dependence on a is contained in the Aa term, and the noise
contained in K(y, t) is homogeneous, or independent of a:
Note that terms linear in y have been separated, and K(y, t) contains all terms
quadratic and higher in y. We can then solve for P̄ and φ̄ by using Eq. (8.101) and
(8.109). We are interested in the form
Since
Since a only appears linearly in L, only first derivatives with respect to y appear
in the partial differential equation (PDE). If we had K = 0, the method of
characteristics of Appendix 8.A, which applies to PDEs that involve only first
derivatives, would become applicable. The method of characteristics yields the equation
of characteristics
(We write yA rather than Ay since this is the appropriate order when y is a vector
y and A is a matrix, the case discussed in Lax (1966III), Section 7F.) The solution
can be written
where
is the special solution appropriate to y(0) = 0, and the first term is a solution of
the homogeneous equation. To conform with the notation in Eq. (8.206) we note
that y(0) = Y. If K = 0, the exact solution of the PDE is then
where Y(y, t) is the inverse of y(Y, t), the solution of y(t) in terms of its initial
value Y, and φ̄(y, 0) has the form:
which is the prescribed initial condition, since P̄(a, t | a₀, 0) approaches δ(a − a₀)
and φ̄(y, 0 | a₀, 0) is defined by Eq. (8.107).
We can deal with the case K(y, t) ≠ 0 by introducing the new variable z by
the transformation
We note that the q(s) function has been completely arbitrary in our solution of
this problem. We have desired this arbitrariness in order to make a comparison in
Section 9.5 with a corresponding Langevin problem. Equality for general q(s) will
guarantee the full equivalence of these problems.
We then take the Taylor expansion of P(a, t | a′, t₀) about a₀:
EXTENSION TO MANY VARIABLES 153
and using Eqs. (8.99) and (8.100) obtain the backward equation (Lax 1966III,
Eq. 5.21).
Thus we have obtained an equation of motion for our final result, the path integral
itself. Such a procedure that obtains an equation directly for the quantity of interest
is referred to as "invariant embedding" by Bellman (1964).
is a set of N variables. Thus the reader may replace in his mind each a(t) by the
same symbol in boldface.
We shall not restate all the results in the multidimensional case, but only indi-
cate a few where it is worthwhile to display the subscripts explicitly. All results
were stated in multidimensional form in Lax (1966III). For example, the equation
of motion of a general operator can be written
where we have also generalized to include the loss term Q so as to be able to
evaluate path integrals later. The diffusion coefficients themselves are defined by
where
For a shot noise process, because all the cumulants are equal, all terms of order
(Δa)ⁿ yield a contribution of first order in Δt. Thus we must write
and retain terms to all orders for shot noise. A subsequent average over a yields
our previous equation of motion, Eq. (8.135). For a Brownian motion process Δa_μ
contains terms of order (Δt)^{1/2}. See the Brownian motion results, Eqs. (3.27) and
(3.41). Since Dₙ = 0 for n > 2, only terms in Δa_μ to second order are needed for
Brownian motion problems.
Now, we discuss the noise spectrum in the case of multiple variables. When the
fluctuations are small, it is appropriate to introduce the deviations
If the elements a_μ (or α_μ) are real (Hermitian) then so is the matrix A. For simplic-
ity, we have specialized here to the case Q = 0. The matrix A is not random and,
for fluctuations from a steady state, is time-independent, although, in general, A
can be time dependent.
The complex noise tensor can be written
or when components are exhibited and the process is stationary,
where the dagger denotes the complex conjugate for classical processes and the
Hermitian adjoint for quantum random processes. Generally, one measures the
noise in a composite variable
But Eq. (8.145) is valid only for t > 0, and the integral in Eq. (8.148) extends from
−∞ to ∞. We can overcome this problem by splitting the second integral in Eq.
(8.148) into two regions: positive and negative t. In the second region, we replace
t by −t and use stationarity to simplify the result. Thus
where
Note that we have succeeded, in Eq. (8.154), in expressing the second term in terms
of positive times. This is needed since Markovian techniques normally express the
future (positive times) in terms of the present. Whether the variables a are real
(Hermitian) or not, the second integral is related to the first by
The positive and negative time components of the noise can thus be written briefly
where Ã denotes the transpose of A. As in the single variable case, the spectrum
of the noise is determined by the transport problem, as embodied in A, and the
magnitude of the noise, ⟨α†α⟩, which will be determined with the help of the
Einstein relation. For this purpose, we take the second moment equation of Eq.
(8.138) and write it in terms of α instead of a:
simplifies to
or
where the indices 1, 2, and 3 constitute the left-to-right ordering of the symbols,
regardless of how they are written down. The formal solution:
is not explicit, since it gives no indication of how the expression can be written in
the desired order. We can start disentangling by first writing the previous result as
an integral representation:
This expression can then be written with the operators correctly ordered from left
to right in A, B, C order. Then the ordering subscripts are no longer needed. Thus
we arrive at the completely disentangled form:
The example in Eq. (8.170) shows that volume fluctuations, and hence concentra-
tion fluctuations, are proportional to the compressibility.
In the equilibrium case, the Einstein relation is used not to determine ⟨α†α⟩ but
to determine D. In quantum applications it is convenient to determine D_{μν} directly
using one expression in Eq. (8.138) with Q = 0, which is called the "generalized
Einstein relation":
to remind us that D_{μν}, in Eq. (8.171), represents the extent to which the law of
differentiating a product is violated. In quantum applications, we will consider a
system in interaction with a reservoir and, by elimination of the reservoir, obtain an
effective equation for the density operator of the system. Such an equation permits
one to calculate the motion of all operators: a_μ, a_ν, and a_μa_ν, and hence D_{μν}. We
use the phrase "generalized Einstein relation" because it is not restricted to the
stationary state.
Another concept which generalizes readily to the multivariable case is the
linked-average. With
as in Eq. (1.51).
One of the most general questions one can ask about a set of random variables
a(s) is the multitime characteristic function. We have an explicit expression for
the characteristic function for the case of linear damping plus homogeneous noise.
The multidimensional analogue of Eq. (8.126) is
When time reversal is obeyed, Eq. (7.66), with A(t) = α_i(t) and B(0) = α_j(0),
yields the condition
valid even in the non-Markovian case. The order of random variables has been
chosen so that the time reversal condition remains valid even in the quantum case.
As applied to our linear response, Eqs. (8.151), (8.157), we get
At t = 0 this specializes to
In the classical limit, α_i and α_j commute. Thus if one variable is even (say α_i)
and the other is odd (say β_j, using β rather than α to make the oddness visible) we
have
(classical only), even in the non-Markovian case. Thus the variables odd under
time reversal do not correlate with the variables even under time reversal.
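This decorrelation of even and odd variables can be observed in a simulation. The sketch below (an underdamped Langevin oscillator with made-up parameters) has position x even and velocity v odd under time reversal, so the stationary cross-correlation ⟨xv⟩ should vanish within sampling error.

```python
import numpy as np

rng = np.random.default_rng(5)
# Underdamped Langevin oscillator (made-up parameters): x is even and
# v = dx/dt is odd under time reversal, so in the stationary state the
# classical cross-correlation <x v> should vanish.
k, gamma, D, dt, n = 1.0, 0.5, 0.25, 1e-3, 2_000_000
s = np.sqrt(2*D*dt)
x = v = 0.0
xs = np.empty(n); vs = np.empty(n)
for i in range(n):
    x += v*dt                                   # position update
    v += (-k*x - gamma*v)*dt + s*rng.standard_normal()
    xs[i] = x; vs[i] = v

burn = 200_000                                  # discard the transient
xv = np.mean(xs[burn:]*vs[burn:])
print(xv)   # ≈ 0 within sampling error
```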
TIME REVERSAL IN THE LINEAR CASE 161
If we equate terms linear in t in Eq. (8.178) we obtain
In the classical case, when both variables are even under time reversal, this
simplifies to
An example for the inertial systems, containing even and odd variables, will be
discussed in more detail in Section 9.7.
Time reversal leads to a paradox. If a(t) is a set of even variables then β(t) =
da(t)/dt is a set of odd variables under time reversal. Thus
For t < 0, take the Hermitian adjoint and use this result again
In view of Eq. (8.178), these expressions are in fact equivalent provided that the
absolute value sign is used. The derivative of Eqs. (8.186)-(8.187) with respect to
t at t = 0 then displays a cusp or discontinuity in slope:
In view of the time reversal condition, Eq. (8.181), these slopes are, in fact, equal
and opposite in sign. This situation is displayed in Fig. 8.1.
Perhaps the best way to understand this dilemma is that the cusp is correct for a
truly Markovian system. However, the Markovian approximation may not be valid
for exceedingly small times. And the slope will be continuous for small times, and
hence vanish there. This is what would be expected for the occupancy of an excited
state of an atom as a function of time. However, a deviation of the excited state
occupancy from a simple exponential decay has, so far, not been observed.
Time reversal symmetry of correlation
FIG. 8.1. The true regression of a fluctuation (solid curve) is compared to the
Markovian approximation (dashed curve). For t > τ_d, the decay is exp(−t/τ),
where τ is a typical relaxation time and τ_d is the duration of a collision, or the
forgetting time of the system. This figure is from Lax (1960I).
Theorem
A Gaussian process is necessarily linear. By linearity, we mean
Proof
The conditional probability
where all the dependence on {a_j} is shown. If one calculates the mean of a, e.g.,
by finding the maximum exponent, or by completing the square, one finds
The process is nonstationary if the T_j and a are time dependent, but it remains
linear. The proof also applies to a set of random variables a(t) conditional on
{a(t_j)}. Then a and the T_j are possibly time dependent matrices.
Doob's theorem
A random process that is stationary, Gaussian, and Markovian possesses an
exponential autocorrelation
Doob (1942) stated this theorem for a one-dimensional random process. Kac
extended it to the multidimensional case as discussed in Appendix II of Wang
and Uhlenbeck (1945).
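Doob's theorem can be illustrated numerically. The Ornstein-Uhlenbeck process is the standard stationary, Gaussian, Markovian process; using its exact one-step update (parameters made up), the sampled autocorrelation should follow exp(−A|t|).

```python
import numpy as np

rng = np.random.default_rng(6)
# Exact Ornstein-Uhlenbeck update with unit stationary variance (A made
# up): the chain is exactly stationary, Gaussian, and Markovian, so Doob's
# theorem predicts an exponential autocorrelation exp(-A|t|).
A, dt, n = 1.0, 0.05, 1_000_000
rho = np.exp(-A*dt)
innov = np.sqrt(1.0 - rho**2)*rng.standard_normal(n)
a = np.empty(n)
a[0] = rng.standard_normal()               # draw from the stationary law
for i in range(1, n):
    a[i] = rho*a[i-1] + innov[i]

for lag_t in (0.5, 1.0, 2.0):
    lag = int(lag_t/dt)
    c = np.mean(a[:-lag]*a[lag:])
    print(lag_t, c, np.exp(-A*lag_t))      # c ≈ exp(-A t)
```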
A number of people have asked how I got into the study of noise. Blame it on
the editor of the Physical Review in the late 1950s. I had written a paper entitled
"Generalized Mobilities" in which I derived the formula usually called the Kubo
formula. That formula was shown to obey the fluctuation dissipation theorem. As a
result the editor assumed that I knew something about noise and promptly started to
send me a series of papers on noise in semiconductor transistors, diodes and other
devices. These papers were written by good experimentalists who observed noise
and felt obliged to explain it. Since general methods of dealing with noise were
limited, I saw many ad hoc explanations that neither I nor the authors understood.
In self-defense I tried to learn what aspects were shared by most semiconductor
devices. I decided that the relevant feature was that the noise consisted of small
fluctuations about an operating point. The operating point was typically a steady
state, but since current was drawn it was not an equilibrium state. So Lax (1960I),
"Fluctuations from the Nonequilibrium Steady State" was born. The lesson learned
was that if the transport theory (time dependent average motion) was understood,
the noise spectrum was readily determined, since the two-time decay or transport
would be the same as the regression or one-time motion. But the normalization
must be determined separately. And the latter was fixed by the Einstein relation.
For the equilibrium case, the integrated noise can be expressed by thermodynamic
formulas. In the nonequilibrium case, one cannot call on thermodynamics, but
can call on the second moments, or the diffusion coefficients that described the
underlying fluctuations.
The framework was later generalized to stronger noise and nonlinear fluctua-
tions. For the Markovian case, this led to the conventional Fokker-Planck equation
discussed in Lax (1966III).
Finally, I decided to apply these methods to the problem of noise in lasers. This
applied the Langevin approach discussed in Lax (1960I) and Lax (1966IV) to a
quantum system, thereby creating a "Quantum Theory of Noise Sources" in Lax
(1966QIV).
Quasilinear methods become completely inappropriate since we are dealing
with a self-sustained oscillator. However, classical self-sustained oscillators exist,
and the Fokker-Planck equation for a "rotating wave van der Pol oscillator" was
solved exactly, but numerically by Hempstead and Lax (1967CVI) and Risken
and Vollmer (1967), as will be seen in Chapter 11. Even here, in most cases, the
fluctuations are small and a quasilinear treatment is satisfactory.
The Einstein relation between fluctuations and mobility was extended by
Lax (1960I) to the nonequilibrium case. A treatment by Reggiani, Lugli, and
Mitin (1988) considered the fluctuation-dissipation relation in the strong-field
case in a semiconductor.
The problem is to solve a first order partial differential equation (PDE) of the form
Suppose that we have found a solution of the partial differential equation of the
form
Then
APPENDIX A: A METHOD OF SOLUTION OF FIRST ORDER PDES 165
By eliminating dy and dz, using Eq. (8.196) we get
Thus the integral of the "characteristic equations" (8.196) is a solution of the partial
differential equation (8.195).
If we regard x and y as independent variables and z = z(x,y) then we have
Take P times the first equation (8.198) plus Q times the second equation (8.199)
plus R times du/dz to get
or
Thus the solution of the characteristic equations also satisfies the inhomogeneous
partial differential equation (8.201).
Note: If u(x, y, z) = a and v(x, y, z) = b are two solutions of the characteristic
equations then an arbitrary function φ(u, v) is a solution of the PDEs (8.195) and
(8.201).
The above procedure can be extended to any number of independent variables.
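The relations the appendix refers to can be summarized as follows; this is a hedged reconstruction of the standard method-of-characteristics formulas, using the symbols P, Q, R named in the text and the equation numbers cited there:

```latex
% Hedged reconstruction of the standard method of characteristics
% (symbols P, Q, R as in the text; equation numbers are those cited there).
% Homogeneous first order PDE, Eq. (8.195):
P\,\frac{\partial u}{\partial x} + Q\,\frac{\partial u}{\partial y}
  + R\,\frac{\partial u}{\partial z} = 0 ,
% with characteristic equations, Eq. (8.196):
\frac{dx}{P} = \frac{dy}{Q} = \frac{dz}{R} ,
% and, regarding z = z(x, y), the inhomogeneous form, Eq. (8.201):
P\,\frac{\partial z}{\partial x} + Q\,\frac{\partial z}{\partial y} = R .
```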
It is convenient then to note that
so that
represents a conservation of density along the motion, that is, Liouville's theorem.
Thus if particles move according to the dynamical equations, Eq. (8.204),
then u(x(t), y(t), z(t)) will be a constant of the motion as discussed by Goursat
(1917).
In particular, if the dynamical equations possess a solution
x = x(X, Y, Z, t);   y = y(X, Y, Z, t);   z = z(X, Y, Z, t),    (8.206)
where X, Y, and Z are the initial values of x, y, and z at t = t_0, then [X, Y, Z] can
be thought of as the name of a particle which remains fixed as its position [x, y, z]
changes. If we solve for the name, or material variable, X, Y, Z in terms
of the spatial variables:
or
then the motion x = [x(t), y(t), z(t)] is such that X = X(x(t), y(t), z(t)) is fixed
at the value X, etc. Thus X, Y, and Z are three constants of the motion, and any
function
This result shows that a point source remains a point as it moves although the nor-
malization integral may change. The first delta function reminds us of the meaning
of X(x, t) as the inverse of x(X, t), but the notation δ(X(x, t) − X) reminds us
that in the integration over X only the term after the minus sign is actually an
integration variable.
The solution of the first order Fokker-Planck equation
where
We wrote the solution down by guessing that it was the appropriate solution in
which a "point" remains a point for all time (this simplicity will disappear when
diffusion is added).
We may verify the solution by evaluating dP/dt:
Since the other factors are functions of X, the d/dxi can be pulled all the way to
the left
But X is the material variable or name of the particle, so that [dx_i/dt]_X is in fact
the material time derivative (the one which follows the particle), i.e.,
and because of the presence of the delta function, we can replace A_j(x(X, t)) by
A_j(x) to obtain
where the last step uses the original ansatz Eq. (8.212).
9
Langevin processes
The autocorrelation of the noise source, F(t), defined in Eq. (9.1), is then given
by (in a form appropriate to the complex case)
where we have replaced a†A† by A*a†. The ← sign reminds us that ∂/∂u acts to
the left on ⟨a†(t)a(u)⟩.
Using a shifting theorem of the form
we have
If we insert Eq. (9.3) into Eq. (9.6), and use Eq. (9.8), we get
where the second equality follows from the Einstein relation, Eq. (8.163).
An interesting consequence of the above results is the theorem written in
component form.
Theorem
Products of random variables and random forces for Markovian processes:
Equation (9.12) is the statement that for a Markovian process the Langevin force
has no memory.
Proof
Suppose that we start at some time t_0 < s, t_0 < t with a specified initial value
a(t_0) at t = t_0. Then Eq. (9.1), a linear first order equation, with constant
HOMOGENEOUS NOISE WITH LINEAR DAMPING 171
coefficients, has the usual solution
where the left hand side could be labeled with a subscript a(t_0) to remind us of the
initial condition. If we multiply by F(t) on the right and take an ensemble average,
the first term vanishes since a(t_0) is fixed and ⟨F(t)⟩ = 0. With the help of Eq.
(9.11) the second term yields
When t > s, the delta function is not included in the region of integration, and
Eq. (9.12) results. When t = s, only half of the delta function is integrated over,
and this establishes Eq. (9.14). The three results can be combined into a single
expression
provided that the Heaviside unit function H(s) takes the value 1/2 at s = t. The
subscript on the average reminds us that it is conditional on a given value of a(t_0)
at t = t_0. An additional average over a(t_0) permits us, also, to drop the initial
condition, yielding Eqs. (9.12)-(9.14) with the constraint removed on the starting
value.
That a random force for a Markovian system does not depend on random variable
values at earlier times seems intuitively clear, but a proof is needed. Indeed,
our proof, and our entire discussion, so far, has been limited to linear, stationary
processes.
The factor of 2 reduction in Eq. (9.14) from integrating over half a delta func-
tion is consistent with our view that in the nonideal world, the correlation of the
forces is not a delta function, but a sharp even function. Strictly speaking it is not
a function at all, but a sequence of such even functions whose width approaches
zero.
For a large class of noise problems, it is appropriate to treat the noise as weak
and make a quasilinear approximation about the operating point. The purpose
of Lax (1960I) was to show that the noise and correlations in such a system
can be obtained in three steps: (1) the time dependence of correlations such as
⟨a†(t)a(0)⟩ obeys the same equations as that for ⟨a†(t)⟩, so that the average
"transport" or "relaxation" equations determine the correlations, and hence the
frequency dependence of the noise; (2) the normalization is determined by the
single-time correlations ⟨a†a⟩; (3) for a nonequilibrium system, the steady state
values ⟨a†a⟩ must be determined by solving the Einstein relations of Eq. (8.163):
For N variables, this is a set of N² equations. One major benefit of the Langevin
approach is that one can avoid the solution of Eq. (9.18) and never solve any system
of more than N equations.
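The N² linear equations can, of course, also be solved directly on a computer. The following sketch (my illustration; the 2×2 matrices are arbitrary) assumes the quasilinear form da/dt = −Aa + F(t) with ⟨F(t)Fᵀ(s)⟩ = 2Dδ(t − s), for which the stationary Einstein relation takes the Lyapunov form AΣ + ΣAᵀ = 2D with Σ = ⟨aaᵀ⟩:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Sketch: solve the stationary Einstein relation  A @ Sigma + Sigma @ A.T = 2*D
# for the covariance Sigma = <a a^T>, assuming the quasilinear Langevin form
# da/dt = -A a + F with <F(t) F(s)^T> = 2 D delta(t - s).  A, D are illustrative.
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])          # decay matrix (stable: eigenvalues 2 and 1)
D = np.array([[1.0, 0.2],
              [0.2, 0.5]])          # diffusion matrix (symmetric)

# scipy solves  A X + X A^H = Q, which is exactly this N^2-equation linear system.
Sigma = solve_continuous_lyapunov(A, 2.0 * D)

residual = A @ Sigma + Sigma @ A.T - 2.0 * D
print(Sigma)
print(np.abs(residual).max())       # essentially zero: the relation is satisfied
```

For N variables this is one call on an N×N system, though as the text notes the Langevin route lets one bypass it entirely.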
We shall obtain our principal result using the Langevin approach in the heuristic
manner described in Lax (1966IV), Section 1. The noise correlation, Eq. (4.36),
can be written
Using the Langevin approach, Eq. (9.1) leads to the associated equation
where Ã is the transpose of A. And Eq. (9.2) translates into the adjoint of Eq.
(9.21)
With the understanding that, even though the Fourier transforms are not well-
defined mathematical objects, products of two are, in the sense that
CONDITIONAL CORRELATIONS 173
This notation is consistent with that in Section 4.7, Eq. (4.50). Equation (9.19) can
then be written as
where D is simply the diffusion matrix associated with a due to the Langevin force
F(t) from Eq. (9.11). Thus an explicit answer for the noise in the variables a_μ, a_ν
is
The integrals contained in Eq. (9.28) can always be evaluated by residue tech-
niques since there are a finite number of poles. Thus the second moments, although
no longer needed, except as a measure of total noise, can also be obtained simply.
This avoids the use of ordered operators discussed in Section 8.10.
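In the single-variable case the residue evaluation can be made explicit; the following is a sketch, assuming the quasilinear form da/dt = −λa + F with ⟨F(t)F(s)⟩ = 2Dδ(t − s):

```latex
% One-variable sketch: spectrum and total noise by residues.
% For  da/dt = -\lambda a + F(t),  \langle F(t)F(s)\rangle = 2D\,\delta(t-s):
S_{a}(\omega) = \frac{2D}{\omega^{2} + \lambda^{2}} ,
\qquad
\langle a^{2} \rangle
  = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi}\,
    \frac{2D}{\omega^{2} + \lambda^{2}}
  = \frac{D}{\lambda} ,
% the last step closing the contour around the single pole at \omega = i\lambda.
```

The single pole in the upper half plane at ω = iλ carries the entire integral, which is what makes the residue evaluation automatic for any finite set of poles.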
Another advantage of the Langevin method, at least in the linear case, is that it is
easy to calculate second order correlations conditional on an initial condition. The
equations are Eqs. (9.1) and (9.2):
If Eqs. (9.31) and (9.32) are multiplied and averaged, the result is
with
If t > u, the integral over r should be done first in Eq. (9.35). Then the delta
function is always satisfied somewhere in the range of integration. Thus
The absolute value |t − u|, in the first term, is unnecessary since t > u. However,
when u > t, the integral must be done over s first, and both answers are given
correctly by using |t − u|.
Although we are dealing with a stationary process, Eq. (9.40) is not stationary
(not a function only of |t − u|) because initial conditions have been imposed at
t = t_0. However, if t_0 → −∞, the results approach the stationary limit
Equation (9.40) can also be rewritten by subtracting off the mean values
In this section we will obtain the result in Section 8.7 using the Langevin approach,
which was presented in Lax (1966IV), Section 2.
We can continue our discussion of homogeneous noise with linear damping
using the same Langevin equation
Thus all linked-moments are assumed maximally delta correlated. The parameters
A, D and D_n can be functions of time as discussed in Lax (1966IV), but we
shall ignore that possibility here for simplicity of presentation. Here, L denotes
the linked average (or cumulant) which is defined by
Here the symbol ":" means summation on all the indices. The n = 1 term vanishes
in view of Eq. (9.46). Equation (9.50) defines K(y, s) which was previously
defined in the scalar case in Eq. (8.113).
Instead of evaluating MQ by solving a partial differential equation, as in Section
8.8, we consider the direct evaluation of Eq. (8.114):
where
Equation (9.53) is now of the form, Eq. (9.49), for which the average is known.
The final result
is in agreement with Eq. (8.126), except that here, we have explicitly dealt with
the multivariable case.
where the η_k are random variables. We use the symbol G rather than F to remind
us that ⟨G⟩ ≠ 0. The choice of linked-moments
with
is appropriate to describe Rice's generalized shot noise of Section 6.7. With this
choice, the linked-moment relation of Eq. (9.58), with F replaced by G yields
These results describe the properties of the noise source G. We are concerned,
however, with the average
then
GENERALIZED SHOT NOISE 179
Reversing the order of integration in MQ leads, as in Eq. (9.53), to a result
where
Equation (9.65) is of the form to which Eq. (9.60) can be applied. Since we have G
in place of G, the first factor on the right hand side of Eq. (9.60) should be omitted
in MQ:
When inserted into Eq. (9.63), these results, Eqs. (9.69), (9.71), contain all the
multitime statistics of the conditional multitime average MQ. The corresponding
multitime correlations can be determined by simply differentiating MQ with
respect to ξ(u_1), ξ(u_2), . . . .
9.7 Systems possessing inertia
and the cross-moments are clearly odd in time (see Eq. (7.14))
Hence such moments must vanish at equal time in the classical case:
where
and F is an external force. In the presence of noise, random forces can be added to
the right hand side of Eqs. (9.78) and (9.79). Hashitsume (1956) presented heuris-
tic arguments that no noise source should be added to the momentum-velocity
SYSTEMS POSSESSING INERTIA 181
relation. However, a proof is required, which we shall base on the Einstein
relation. Equations (9.78) and (9.79) correspond to the decay matrix
The nonequilibrium case is discussed in Lax (1960I), Section 10. The Einstein
relation Eq. (9.18) in the equilibrium case then yields
The presence of elements only in the lower right hand corner means that, as
desired, random forces only enter Eq. (9.79). If F in Eq. (9.79) is regarded as
the random force then
may be regarded as a different way of stating the Einstein relation and the
fluctuation-dissipation theorem.
10
Langevin treatment of the Fokker-Planck process
In Chapter 9, the Langevin processes are discussed based on the Langevin equa-
tions in the quasilinear form, Eqs. (9.1), (9.2). In this chapter, we consider the
nonlinear Langevin process defined by
The coefficients B and a may explicitly depend on the time, but we will not display
this time dependence in our equations.
We will now limit ourselves to Langevin processes which lead back to the ordinary
Fokker-Planck equation, i.e., a generalized Fokker-Planck equation that terminates with
second derivatives. We shall later find that the classical distribution function of the
laser, which corresponds to the density matrix of the laser, obeys, to an excellent
approximation, an ordinary Fokker-Planck equation. We assume
The Gaussian nature of the force f(t) is implied by Eq. (10.4); it is needed for a
conventional Fokker-Planck process, namely one with no derivatives higher than
the second. It is possible to construct a Langevin process which can reproduce any
given Fokker-Planck process and vice versa.
The process defined by Eq. (10.1) is Markovian, because a(t + At) can be cal-
culated in terms of a(t), and the result is uninfluenced by information concerning
a(u) for u < t. The f's at any time t are not influenced by the f's at any other
t', see Eqs. (10.2)-(10.4), nor are they influenced by the a's at any previous time
since f is independent of previous a. Thus the moments D_n in Section 8.1 can be
calculated from the Markovian expression:
DRIFT VELOCITY 183
The difference between the unlinked and linked averages in Eq. (10.5) vanishes
in the limit Δt → 0. Rewriting Eq. (10.1) as an integral equation and denoting
The first term is already of order Δt and need not be calculated more accurately.
In the second term of Eq. (10.7) we insert the first approximation
Retaining only terms of order Δt, or f², but not fΔt, or higher, we arrive at the
second and final approximation
Let us now take the moments, D_n. For n > 2, using Eq. (10.4), to order Δt, we
have
Thus from Eq. (10.5) all D_n = 0 for n > 2. For n = 2, using Eqs. (10.2)-(10.5),
we obtain
and for n = 1,
The double integral in Eq. (10.14), evaluated using Eq. (10.3), is half that in Eq.
(10.12) since only half of the integral over the delta function is taken. Integration
over half a delta function is not too well defined. From a physical point of view,
we can regard the correlation function in Eq. (10.3) to be a continuous symmetric
function, such as a Gaussian of integral unity. As the limit is taken with the
correlation becoming a narrower function, the integral does not change from 1/2
at any point of the limiting process.
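This limiting argument can be checked directly. The following numerical sketch (my illustration; the widths are arbitrary) represents the delta function by unit-integral Gaussians of shrinking width and integrates each over only the half-line t ≥ 0:

```python
import numpy as np

# Represent delta(t) by centered Gaussians of unit integral and shrinking width,
# and integrate each over only half the line, t >= 0 (trapezoidal rule).
# The half-integral stays at 1/2 throughout the limiting process.
def half_line_integral(width, npts=20001):
    t = np.linspace(0.0, 20.0 * width, npts)      # [0, ~infinity) for this width
    y = np.exp(-t**2 / (2.0 * width**2)) / (width * np.sqrt(2.0 * np.pi))
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

halves = [half_line_integral(w) for w in (0.5, 0.1, 0.02)]
print(halves)   # each value ~ 0.5, independent of the width
```

The half-integral is 1/2 for every member of the sequence, so the limit is unambiguous even though "half a delta function" is not.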
Equations (10.13) and (10.14) have shown that, given a Fokker-Planck process
described by a drift vector A(a) and a diffusion D(a), we can determine the
functions B(a) and σ(a):
that leads to a Langevin process with the correct drift A(a) and diffusion D(a) of
the associated Fokker-Planck process described by Eq. (10.1). The reverse is also
true. Given the coefficients B(a) and σ(a) in the Langevin equation, we can construct
the corresponding coefficients A and D of the Fokker-Planck process.
The procedure used in the above section may be regarded as controversial, because
we have used an iterative procedure which is in agreement with Stratonovich's
(1963) treatment of stochastic integrals, as opposed to Ito's (1951) treatment.
This disagreement arises when integrating over random processes that contain white
noise, the so-called Wiener processes. We shall therefore consider an example
which can be solved exactly. Moreover, the example will assume a Gaussian ran-
dom force that is not delta correlated, but has a finite correlation time. In that case,
there can be no controversy about results. In the end, however, we can allow the
correlation time to approach zero. In that way we can obtain exact answers even in
the white noise limit, without having to make one of the choices proposed by Ito
or Stratonovich, as discussed in Lax (1966IV), Section 3.
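The distinction can also be seen numerically in a simple multiplicative model, which I use here purely for illustration (it is not the book's example): da/dt = a f(t) with ⟨f(t)f(s)⟩ = 2Dδ(t − s). The ordinary-calculus (Stratonovich) solution a = a₀ exp(∫f dt) gives ⟨a(t)⟩ = a₀ e^{Dt}, while the Ito prescription, which evaluates the integrand at the start of each interval, keeps ⟨a(t)⟩ = a₀:

```python
import numpy as np

# Compare Ito and Stratonovich readings of  da/dt = a f(t),
# <f(t)f(s)> = 2*D*delta(t-s), by Monte Carlo.  Model chosen for illustration.
rng = np.random.default_rng(1)
D, dt, nstep, nsamp = 0.25, 0.01, 100, 200000
t_final = nstep * dt

a_ito = np.ones(nsamp)
a_str = np.ones(nsamp)
for _ in range(nstep):
    dw = np.sqrt(2.0 * D * dt) * rng.normal(size=nsamp)  # integrated force f dt
    a_ito = a_ito * (1.0 + dw)          # Ito: integrand at the start of the interval
    a_str = a_str * np.exp(dw)          # ordinary calculus (Stratonovich) update
print(a_ito.mean(), a_str.mean(), np.exp(D * t_final))
# the Ito mean stays near 1; the Stratonovich mean tracks exp(D*t)
```

With a finite correlation time for f, as in the exact example that follows, the simulation would converge to the Stratonovich answer, which is why no choice of stochastic calculus needs to be imposed by hand.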
AN EXAMPLE WITH AN EXACT SOLUTION 185
The example we consider is:
The average is then expressed in terms of the cumulants. But for the Gaussian case,
the series in the exponent terminates at the second cumulant:
It was permissible, here, to obey the ordinary rules of calculus in this
transformation, without requiring Ito's calculus lemma, because delta function
correlations are absent. The equation of motion for x is
Since x would be constant in the absence of the random force f(t), the probability
of x at time t, P(x, t), is necessarily Gaussian and determined only by the random
force f(t); it therefore has the normalized solution form
and changing back to the original random variable a, Eq. (10.24) leads to
We ask what is the Langevin equation for M? Following the procedure in Section
8.2, the drift vector for M in the Fokker-Planck process is determined by
and
We obtain that
Therefore, the transform in our Langevin equation obeys the ordinary calculus
rule.
The average is contributed not only from B(M, t) but also from the second
term
which is not zero except when σ(M) is a constant. For the conditional average with
M(t) = M, the contribution from the second term is
Hence, Eq. (10.37) can be obtained simply from Eq. (10.36) by multiplying by S.
To obtain the average, or the conditional average with s(t) = s, of d⟨S⟩/dt, we have
and for the conditional average, (S) in the last expression of Eq. (10.38) is replaced
by S.
The stochastic differential equations, in which Ito's calculus lemma is used for
the transform of random variables, are broadly applied in the financial area. Ito's
stochastic equation is written as
Our Eq. (10.30) is similar to Ito's calculus lemma, which indicates that the
ordinary calculus rule is not valid for the Fokker-Planck processes, as shown in
Section 8.2. However, the calculus rule for our Langevin equation is not Ito's
calculus lemma.
The difference between our Langevin stochastic equation and that using Ito's
lemma originates from the different definitions of the stochastic integral. In Ito's
integral, σ(a(t))f(t) is evaluated at the beginning of the interval: σ(a(t))f(t) is replaced
by σ(a(t_c))f(t), where t_c = t − ε is slightly earlier than t. This leads to the average
EXTENDING TO THE MULTIPLE DIMENSIONAL CASE 189
⟨σ(a(t_c))f(t)⟩ = 0, and B(a, t) = A(a, t) in Eq. (10.39). Hence, Ito's calculus
lemma must be introduced in order to obtain the correct answer. In our Langevin
description ⟨σ(a(t))f(t)⟩ is estimated based on an integrand that is a function of t,
which can be recognized as a limit of an analytic function for which the Riemann
integral exists, as shown in Section 10.2, and which finally approaches a delta
correlated function: ⟨f(t)f(s)⟩ = 2δ(t − s). Hence, ⟨σ(a(t))f(t)⟩ is estimated to be
nonzero, except in the case that σ is a constant.
There is an unimportant difference in notation between ours and that used in
some literature. The standard Wiener notation is equivalent to the correlation
whereas the customary physics (and our) notation would have a factor of 2 placed
on the right hand side of these equations. Hence, there is a notational correspondence
between that used in this book and that in other literature:
Although the two stochastic approaches lead to the same mathematical result at the
level of averages or of conditional averages, we, as physicists, prefer a method
more compatible with actual natural processes, in which the integrand is a function
of time t. The ordinary calculus rule can be used in our Langevin stochastic
equation. This is a major advantage of our approach. As shown by the example in
Eq. (10.38), for a random variable dx = dS/S, one cannot simply write dS =
S dx when Ito's calculus lemma is used. This could be misleading, and will
be discussed in Chapter 16. Other examples of applying the two different stochastic
approaches in the financial area are presented in Section 16.6 and Section 16.7.
Consider a set of random variables a = [a_1, a_2, . . . , a_n], which obey the Langevin
equations:
and
For a set of functions M(a) = [M_1(a), M_2(a), . . . , M_n(a)], the Langevin equation
for M is written as
We ask what are B_j(M, t) and σ_j(M)? Following the procedure in Section 8.2,
the drift vector for M in the Fokker-Planck process is determined by
where [D_1(a, t)]_i and [D_2(a, t)]_kl are given, separately, by Eq. (10.49) and Eq.
(10.48).
The fluctuation term for M is determined by
Therefore, the transform in the Langevin formula of M(a) obeys the ordinary
calculus rule.
The average of the fluctuation term ⟨σ_j(M)f(t)⟩ is given by
and for the conditional average, the average symbol ⟨. . .⟩ on the right hand side of
the above equation is dropped.
In this section we use another method to estimate ⟨M(a)F(t)⟩, and extend it to the
multidimensional case. Here, F(t) is restricted to be independent of a; hence this
is a linear fluctuation model. This approach will be used in the next chapter.
Let us consider an arbitrary function M(a) of the set of random variables
a = [a_1, . . . , a_n] which obey the usual coupled Langevin equations:
i.e., the noise sources at time t are independent of the variables a at the time s < t.
Our calculation here will be different from the iteration procedure used in Section
10.1, and simpler. We set
and will later let ε → 0. We rewrite ⟨M(t)F_j(t)⟩, with the notation M(t) =
M(a(t), t), as
By Eq. (10.56), the first term vanishes, and Eq. (10.58) can be rewritten as an
integral
Only the last term of Eq. (10.60) is sufficiently singular, containing a product of
two forces, to yield a finite contribution as t − t_c = ε → 0.
MEANS OF PRODUCTS OF RANDOM VARIABLES AND NOISE SOURCE 193
To lowest order in ε, (∂M/∂a_i)_s ≈ (∂M/∂a_i)_{t_c}, so that
Using Eq. (10.55) and omitting a factor 2 since we are integrating only over one
half of the delta function, we obtain:
We note that on the right hand side we have taken the a's fixed at t_c, and then looked
at the fluctuation of a at time t, i.e., a(t_c) + fluctuation. Then we have calculated
the correlation of the components of this fluctuation with F_j(t). We can then take
the limit t → t_c and write Eq. (10.62) as
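The final formula, Eq. (10.63), is not reproduced in this copy; the following is a hedged reconstruction from the derivation just given (half the delta function survives, leaving the diffusion matrix D_ij):

```latex
% Hedged reconstruction of the final result, Eq. (10.63):
% the equal-time correlation of M(a(t)) with the noise source F_j(t)
% picks up half the delta function, leaving the diffusion matrix D_{ij}.
\bigl\langle M(\mathbf a(t))\,F_j(t) \bigr\rangle
  = \sum_i \Bigl\langle D_{ij}\,
      \frac{\partial M}{\partial a_i} \Bigr\rangle .
```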
The question has been raised with an entirely different meaning by Scully and
Lamb (1967). However, optical lasers with frequencies as high as 10^15 radians per
second can have line-widths of 10^4 and even smaller. The addition of nonlinearity
to a system generally leads to combination frequencies, but not to an extremely
narrow resonance. The clue for the solution of this difficulty is described in Lax's
1966 Brandeis lectures (Lax 1968), in which Lax considered a viable classical
model for a laser containing two field variables (like A and A†), two population
levels such as the upper and lower level populations in a two level atomic system,
plus a pair of atomic polarization operators that represent raising and lowering
operators.
When Lax took this nonlinear 6 by 6 system, sought a stationary state, and
examined the deviations from the stationary operating state in a quasilinear man-
ner, he discovered that there is always one nonstable degree of freedom: a phase
variable that is a combination of field and atomic phases. The next step was the
realization that this degree of instability is not an artifact of the particular exam-
ple, but a necessary consequence of the fact that this system, like many others,
including Hewlett-Packard radio-frequency oscillators, is autonomous, namely that
the equations of motion contained no time origin and no metronome-like driving
source. Mathematically, this means that the system is described by differential
equations that do not contain explicit time dependence. As a consequence, if
x(t) is a solution, where x(t) is a six-component vector, then x(t + r) is also
necessarily a solution of the differential equation system. But this means that the
solution is unstable to a time shift, or more pictorially to a frequency shift. Under
an instability, a new, perhaps sharp line can occur, as opposed to the occurrence
of summation or difference lines that arise from nonlinearity. Hempstead and Lax
(1967CVI) illustrate the key idea in a simpler system, the union of a positive and
negative impedance. In this chapter, we will first describe this nonlinear model,
build the corresponding differential equation of motion in Section 11.2, and trans-
form this equation to a dimensionless canonical form in Section 11.4. In Section
AN OSCILLATOR WITH PURELY RESISTIVE NONLINEARITIES 195
11.3, the diffusion coefficients in a Markovian process are defined, and the
condition for the validity of this approximation is described. In Section 11.5 the phase
fluctuations and the corresponding line-width are calculated. The main result,
the line-width W obtained in Eq. (11.66), is shown to be very narrow. In Section
11.6, the amplitude fluctuations are calculated using a quasilinear treatment. In
Sections 11.7 and 11.8, the exact solutions of the fluctuations are calculated based
on the Fokker-Planck equation of this model.
where
where
196 THE ROTATING WAVE VAN DER POL OSCILLATOR (RWVP)
It is consistent with the rotating wave approximation only to retain the slowly
varying parts of A and A*, so that the term A* e^{2iω_0 t} in Eq. (11.6) is dropped,
leaving
We first use the equations for A and A* to determine the operating point, i.e.,
We call this operating point p_00 because we will later find a slightly better one,
p_0, using a different reference frame. From Eq. (11.8) we obtain the equation of
THE DIFFUSION COEFFICIENT 197
motion of A*,
where
We are doing this problem thoroughly because we will find that this classical
random process, which is associated with the quantum mechanical problem of the
laser in the difficult region near threshold, reduces to the Fokker-Planck equation
for this rotating wave van der Pol oscillator.
Noticing that e_−(t) and e_+(t) are random forces, we now calculate the diffusion
coefficients D_−+ = D_+− defined by
Equation (11.15) is an appropriate definition for processes that are Markovian over
a time interval Δτ ≫ (ω_0)^−1 (see Lax 1966IV, Section 5 for a more detailed
discussion of processes containing short, nonzero correlation times). Equation (11.15)
can be rewritten as
Using the definition of G(e, ω_0) in Eq. (4.50), we see that the diffusion constant in
the limit of Δτ → ∞ is
and describes the noise at the resonant frequency ω_0. If we had chosen
then our spectrum would be that of white noise, i.e., independent of frequency.
In the case of a not exactly Markovian process, we are assuming the spectrum does
not vary too rapidly about ω_0 (see Fig. 11.2), and thus we can approximate it by
white noise evaluated at the frequency ω_0.
FIG. 11.2. In the case of a not exactly Markovian process we can approximate it
by white noise evaluated at the frequency ω_0.
The spectrum of the noise source is not
necessarily white, but only the change over the frequency width of the oscillator
is important, and that change may be small enough to permit approximating the
noise as white. Indeed, the situation is even more favorable in a laser containing N
photons: the line width will be smaller than the ordinary resonator line-width by a
factor of N.
In general, for an oscillatory circuit such as shown in Fig. 11.1, it is essential to
choose Δτ to be large compared to the period of the circuit, but it is often chosen
small compared to the remaining time constants to avoid nonstationary errors. The
condition for the validity of this choice is actually
where Δt is the relaxation time associated with the system and δω is the
frequency interval over which the noise displays its nonwhiteness. To order
(ω_0Δτ)^−1, we have shown in Lax (1967V) that
Thus we actually require two conditions, Eq. (11.19) and the less stringent
condition (ω_0Δτ)^−1 ≪ 1.
11.4 The van der Pol oscillator scaled to canonical form
The oscillator shown above is a rotating wave oscillator, but not a van der Pol
oscillator, since Eq. (11.8) has an arbitrary nonlinearity R(p). Therefore we expand
R(p) about the operating point, forming the linear function
We shall later discuss the condition under which this approximation is valid. We
now perform a transformation
and
where
and
where
and
The coefficients ξ and Γ were determined by the requirement that A' and h satisfy
Eqs. (11.27) and (11.29).
The condition for neglect of higher terms in the expansion of R(p) about the
operating point is
In Eq. (11.10) we found that an oscillator chooses to operate at a point at which its
net resistivity and its line-width vanish. Noise in a stable nonlinear system would
add to this signal possessing a delta function spectrum, but not broaden it. Fortu-
nately, an autonomous oscillator (described by a differential equation with time
independent coefficients) is indifferent to a shift in time origin and thus is unsta-
ble against phase fluctuations. These unstable phase fluctuations will broaden the
line, whereas amplitude fluctuations only add a background. For the purpose of
understanding phase broadening, therefore, it is adequate to linearize with regard
to amplitude fluctuations. Indeed, for a purely resistive oscillator, there is no cou-
pling (at the quasilinear level) between amplitude and phase fluctuations. At least
in the region well above threshold, then, when amplitude fluctuations are small, it
is adequate to treat phase fluctuations by neglecting amplitude fluctuations entirely.
If in Eq. (11.8) we introduce
from which R(p) has disappeared. Amplitude fluctuations are neglected by setting
u = 0. The only vestige of dependence on R(p) is through p_0, which is |A|² at the
operating point. Equation (11.34), with u = 0, is a differential equation containing
no time origin and no metronome-like driving source. As a consequence if x(t) is
a solution, then x(t + T) is also necessarily a solution. This means that the solution
is unstable to a time shift.
p_0 could be replaced by the more accurate ⟨p⟩. Since R(p) no longer enters the
problem, we can with no loss of generality work with the dimensionless variables
introduced in Section 11.4 for the RWVP oscillator. Dropping the primes in Eq.
(11.27), and defining p̄ = ⟨p⟩/ξ², Eq. (11.34) (with u = 0) takes the dimensionless
form
Using Eq.
Since we already have the product of two h's, using Eqs. (11.29) and (11.30) we
obtain
using Eq. (11.29) and integrating over half the delta function. Adding the complex
conjugate in Eq. (11.39) we get
By inserting the above first and second moments into the generalized Fokker-
Planck equation, Eq. (8.38), we obtain a simple Fokker-Planck equation for this
process,
which describes pure phase diffusion. Since the process is Markovian, it is permis-
sible to use the standard Fokker-Planck Equation (8.38) with the delta function
initial condition, Eq. (8.45), without requiring conditional moments, Eq. (8.44).
But this is the well-known Green's function for diffusion
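The resulting phase decorrelation, and hence the Lorentzian line, can be illustrated numerically (a sketch of my own, with an arbitrary phase diffusion constant Dphi): for pure phase diffusion the accumulated phase is Gaussian with ⟨φ²⟩ = 2D_φ t, so the Gaussian property gives ⟨e^{iφ(t)}e^{−iφ(0)}⟩ = e^{−D_φ|t|}, whose Fourier transform is a Lorentzian of half-width D_φ:

```python
import numpy as np

# Pure phase diffusion: phi(t) - phi(0) is Gaussian with variance 2*Dphi*t,
# so the oscillator correlation <exp(i*phi(t)) exp(-i*phi(0))> decays as
# exp(-Dphi*t), giving a Lorentzian line of half-width Dphi.  Dphi illustrative.
rng = np.random.default_rng(2)
Dphi, t, nsamp = 0.5, 1.0, 200000

dphi = np.sqrt(2.0 * Dphi * t) * rng.normal(size=nsamp)  # accumulated phase
corr = np.mean(np.exp(1j * dphi))                        # ensemble average
print(corr.real, np.exp(-Dphi * t))   # both ~ exp(-Dphi*t)
```

The exponential decay rate, not the diffusion of the amplitude, is what sets the line-width, in agreement with the quasilinear discussion that follows.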
We can now calculate the phase line-width, which by Eq. (4.4) is given by the
Fourier transform of (a*(t)a(0)}, where a is defined in Eq. (11.4)
Since φ is a Gaussian variable, linked averages beyond the second disappear, and
Eq. (1.56) yields
Using Eqs. (11.36) and (11.38) for D(φ), Eq. (11.48) becomes
where A_p describes the associated line-width, with the subscript p standing for
phase, and we have
where the width Δω is the cavity width with just the positive impedance R_p
present. We see that the line-width is proportional to 1/P. For a laser, the
line-width is proportional to one over the mean number of photons N.
Now we calculate ⟨e²⟩_{ω_0}. We set
where the p and n refer to the e² associated with the positive and negative
resistance respectively. Using the definition of ⟨e²⟩_{ω_0}, Eqs. (11.19)-(11.20) and the
Wiener-Khinchine theorem, Eq. (4.4), we have
The equilibrium theorem in Section 7.4 for this noise, including zero-point
contributions, leads to
where
T_p is the positive temperature and C(ω, T) is the quantum correction factor which
approaches 1 for T → ∞ and gives us the quantum corrections at low temper-
atures. A detailed discussion of this correction factor is given in Lax (1960I),
Section 7.
Similarly
where T_n is the negative temperature. Since both T_n and R_n are negative, the
above equation can be rewritten as
where
When zero-point contributions are included at this semiclassical level, Eq. (11.63)
becomes
In Eq. (11.69) A and A* are not constant; hence, ⟨A(t)h*(t)⟩ is not zero. Let us
consider ⟨A(t)h*(t)⟩. Using the method in Section 10.6 and then Eq. (11.27), and
integrating over only half the delta function, we have
where
On the other hand, when using Eq. (11.71), the operating point for the variable
p = |A|² is
The advantage of p₀ over p₀₀ is that it yields a nonvanishing value p₀ > 0 below
threshold.
We will now calculate the amplitude fluctuations. In the quasilinear (QL)
approximation, the decay constant for amplitude fluctuations is
FOKKER-PLANCK EQUATION FOR RWVP 207
where the subscript a denotes the amplitude. Using Eq. (11.81), and then Eq.
(11.82),
We note that we need not have used the quasilinear approximation, as we could
have solved this problem exactly, using the drift vector Eq. (11.81) and the
diffusion coefficient, Eq. (11.78) to obtain the Fokker-Planck equation:
with
with
However, if we are only concerned with radial fluctuations, our answers are inde-
pendent of φ and thus this last term need not appear. We can now see the meaning
of the approximation made in Section 11.5 on phase fluctuations. If we replace p⁻¹
in this last term by ⟨p⟩⁻¹, a number, then Eq. (11.96) can be separated into phase
and radial motions, Eqs. (11.43) and (11.87), respectively. Therefore, in the region
well above threshold, since the amplitude fluctuations are small, the amplitude and
phase fluctuations are nearly uncorrelated.
We shall look for exact answers by considering the eigenfunctions of the Fokker-
Planck equation (11.94). We consider solutions of the form
FIG. 11.3. The line-width Λ_p that includes phase fluctuations versus the dimen-
sionless pump rate p. This figure was first presented in Lax (1967V).
since we then have the correct phase dependence. From Eq. (11.95) the amplitude
part becomes
We thus need to find the eigenvalues of this second order differential equation.
When λ = 0, our solution has no φ dependence, and we are only looking at radial
fluctuations. Λ_{0,0} = 0 corresponds to the steady state. The lowest nonvanishing
eigenvalue with λ = 0 is called Λ_a, i.e.
which is appropriate for amplitude noise. On the other hand, when λ = 1, we have
a solution proportional to e^{iφ}, which is appropriate for considering ⟨e^{i[φ(t)−φ(0)]}⟩ (see
Section 11.5), and thus is called
contains more than 98% of the weight. See Table VI of Hempstead and Lax
(1967CVI).
Equation (11.50) says Λ_p p̄ = 1. We see that above threshold this is approxi-
mately true. Below threshold, Λ_p p̄ → 2. The reason for this behavior is discussed
in Lax (1967V). The Schawlow-Townes formula was wrong by a factor of 2
because they were basically deriving their results by linear methods valid below
threshold, not valid above threshold.
In Fig. 11.4 we plot the half-width of the amplitude spectrum Λ_a versus the
dimensionless pump rate p. The exact solution for intensity fluctuations is obtained
by solution of the Fokker-Planck equation in this section. The line-width using
the QL approximation is obtained from Eq. (11.85). The IQL curve is obtained
from Eq. (11.86), with a better approximation for the operating point. Actually for pure
intensity fluctuations the two lowest nonzero eigenvalues are close to degenerate
and it is necessary to plot the appropriately weighted average of all the eigenvalues.
We see that when p > 10 or p < −10, i.e., when we are away from threshold, the
quasilinear results are very close to the exact ones.
12
Noise in homogeneous semiconductors
since each integral yields one or zero according as E_j is in the interval (E_a, E_b)
or not. Thus n(E), which can be interpreted as the density of states (DOS), is given
by
For electrons quantized in a box of dimensions L₁, L₂ and L₃, with periodic
boundary conditions, the eigenstates are plane waves
where E(k) is the energy-wave vector relationship. For large L_j, the sums can be
converted to a triple integral:
Since
212 NOISE IN HOMOGENEOUS SEMICONDUCTORS
In a crystal the integral in Eq. (12.8) is understood to sum over one Brillouin zone
(BZ).
One can interpret V/(2π)³ as the density of states in k space. If one introduces
the momentum variable p = ℏk, this is equivalent to
where
so that there is one quantum state for each volume h³ in phase space.
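The statement that the states have density V/(2π)³ in k space (one state per volume h³ of phase space) can be checked numerically by counting the plane-wave states k = 2πn/L of a periodic box inside a k-sphere; the box size and cutoff below are arbitrary illustrative choices:

```python
import numpy as np

L = 30.0      # box side (hbar = 1; arbitrary illustrative units)
kmax = 2.0    # count states with |k| <= kmax

# allowed wave vectors k = 2*pi*n/L for integer n (periodic boundary conditions)
n = np.arange(-25, 26)
kx, ky, kz = np.meshgrid(2*np.pi*n/L, 2*np.pi*n/L, 2*np.pi*n/L, indexing="ij")
count = np.sum(kx**2 + ky**2 + kz**2 <= kmax**2)

# density-of-states prediction: V/(2*pi)^3 times the volume of the k-sphere
predicted = L**3 / (2*np.pi)**3 * (4.0/3.0)*np.pi*kmax**3
print(count, predicted)   # agree up to a small surface correction
```

The discrepancy is a surface term that becomes negligible for large L, which is the sense in which the sums over states become the triple integral of Eq. (12.8).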
Although we have used a box with three perpendicular axes, in a crystal we
could have used a box with edges parallel to the vectors a₁, a₂, a₃ of the primitive
cell. The sum would then be over cell positions. The integral, Eq. (12.8), over
k could become an integral over the reciprocal lattice. The integral would still
extend over one Brillouin zone whose shape takes that of one cell of the reciprocal
lattice. Periodicity permits a rearrangement so that the integral is over a Brillouin
zone with the full symmetry of the lattice. General results, such as Eqs. (12.8) and
(12.9), remain valid.
Near the bottom of the conduction band in a semiconductor there is an
approximate effective mass relationship of the form
If we let
we get
where
DENSITY OF STATES AND STATISTICS OF FREE CARRIERS 213
is the corresponding density of states for the isotropic case. It is convenient to
define
where dΩ = sin θ dθ dφ, the solid angle, can be integrated over to yield a factor 4π.
Here, D_c(E) represents the density of electronic states per unit energy, and f(E)
is the probability that any one of these states is occupied. The energy E_c is the
energy at the bottom of the conduction band. If the conduction band is isotropic
near its minimum, the energy takes the simple form:
Here m* is the effective mass of electrons in the conduction band. In this case, the
density of states is shown to take the simple form:
Here, ζ = E_F is the Fermi energy. We avoid the customary symbol μ since the
latter is used for mobility in this chapter. The last form in Eq. (12.20) is appropriate
to the nondegenerate case. (Nondegenerate means that the density is sufficiently
low that Boltzmann statistics are adequate.)
Free holes at equilibrium
Holes are simply empty electron states. Their statistics can be written in a
completely analogous manner. The density of holes, called p, is given by:
Here D_v(E) represents the density of states in the valence band and is given by:
As before, we have assumed that the states near the valence band edge (now the
top of the valence band) obey a simple effective mass relationship:
The probability of a hole is the probability that the corresponding electron state is
empty:
and represent effective numbers of states at the band edges that correspond to
Boltzmann occupancy of a distributed set of states. The letters n and p are pre-
sumably used to correspond to the mnemonic, n for negative, and p for positive.
Note that the expressions, Eqs. (12.25) and (12.26), for the densities are correct,
without specifying how the Fermi level £ is to be determined.
CONDUCTIVITY FLUCTUATIONS 215
Law of mass action
The Fermi energy appears with opposite signs in Eqs. (12.25) and (12.26). Hence
it cancels out of the product. The result,
is an example of the law of mass action, with Eg representing the energy gap
between the conduction and valence bands. If donors or acceptors are present, then
the Fermi level is shifted, but Eq. (12.29) is unaffected. If N_D, the donor density,
is a function of x, then ζ, n, and p will be functions of x but the product n(x)p(x)
will be a constant. This law is a special case of the law of chemical equilibrium.
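The mass-action statement can be made concrete with a small numerical sketch (the gap, temperature, and effective densities of states below are invented, silicon-like illustrative values, not taken from the text): in the nondegenerate limit the product np = N_c N_v exp(−E_g/kT) is independent of where the Fermi level ζ sits.

```python
import numpy as np

kT = 0.0259               # eV, room temperature (illustrative)
Eg = 1.12                 # eV, assumed band gap
Nc, Nv = 2.8e19, 1.0e19   # cm^-3, assumed effective densities of states

def carrier_densities(zeta):
    """Nondegenerate (Boltzmann) densities for Fermi level zeta, measured
    from the valence band edge Ev = 0, with Ec = Eg."""
    n = Nc * np.exp(-(Eg - zeta) / kT)   # electrons, Eq. (12.25)-type form
    p = Nv * np.exp(-zeta / kT)          # holes, Eq. (12.26)-type form
    return n, p

# shifting zeta (e.g. by doping) changes n and p but not their product
for zeta in [0.3, 0.56, 0.8]:
    n, p = carrier_densities(zeta)
    print(zeta, n * p, Nc * Nv * np.exp(-Eg / kT))
```

The Fermi level enters n and p with opposite signs in the exponent, so it cancels identically in the product, which is the law of mass action stated above.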
Having the above preliminary knowledge, we will concentrate on the calcula-
tion of noise in semiconductors.
where μ_p and μ_n are the hole and electron mobilities, p and n are the hole and
electron concentrations, and P and N are the total hole and electron numbers
over the volume AL between the electrodes, of area A and separation L. Thus the
fractional voltage fluctuations are given by
If only electrons and holes are present (and not traps) charge neutrality will
be enforced up to the (very high) dielectric relaxation frequency, so that to an
excellent approximation
and the total voltage fluctuation may be obtained by replacing the integral by unity.
The total noise, which only involves Φ(0) = 1, is consistent with the normalization
condition in the noise spectrum. The after-effect function will be calculated in later
sections.
This result is not entirely surprising, since the total number of carriers is an integral
Indeed, Eq. (12.38) can be derived directly from Eq. (12.39) using only the
assumption that the r_j are uniformly distributed in space.
THERMODYNAMIC TREATMENT OF CARRIER FLUCTUATIONS 217
A less obvious case is that of a set of N_t traps interacting with a reservoir of
chemical potential ζ_t. We assume that the trap occupancy is sufficiently high that
Fermi statistics are necessary. In that case, the number of filled traps by use of
Fermi-Dirac statistics is
It can be seen that the fluctuations are reduced by a factor equal to the fraction of
empty states. The reason for this result is made clear in the next section, in which a
kinetic approach is used for the same problem. If both N and N_t are allowed to vary
simultaneously, the simplest distribution consistent with these second moments is
The term in ΔNΔN_t vanishes because N does not depend on ζ_t and N_t does not
depend on ζ. Within the quasilinear approximation, it is appropriate to ignore
cumulants higher than the second and stop at the Gaussian approximation.
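The variance reduction by the fraction of empty states can be checked with a Monte Carlo sketch: if each of N_t traps is independently occupied with Fermi-Dirac probability f, the number fluctuation is ⟨(ΔN_t)²⟩ = N_t f(1 − f), i.e. the Poisson value N̄_t = N_t f reduced by the factor (1 − f). (The parameter values below are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, f, trials = 1000, 0.7, 20000   # traps, occupation probability, samples

# total number of filled traps: sum of Nt independent Bernoulli(f) occupancies
filled = rng.binomial(Nt, f, size=trials)

mean = filled.mean()
var = filled.var()
print(mean, Nt * f)              # mean occupancy ~ Nt*f
print(var, Nt * f * (1 - f))     # Poisson value Nt*f reduced by the factor (1-f)
```

For f → 0 the Poisson result is recovered; for f → 1 the traps are nearly full and the fluctuations are suppressed, in line with the kinetic explanation of the next section.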
Suppose, now, that the electrons in traps do not have an independent reservoir,
but are obtained from the free carriers. Then we must impose the conservation
condition
The situation for holes is similar to that for electrons. If the holes have their
own reservoir, then the typical Poisson process prevails
If holes, traps and electrons are all present and coupled to each other then charge
neutrality imposes the constraint
In the presence of compensating centers, Nco, there is also a neutrality condition
for the steady state
which includes all three statistical cases, Fermi, Boltzmann and Bose, with the
three choices of ε above. This result is true in equilibrium.
The master transition probability from occupation number n(a) to n'(a) can be
written as
GENERAL THEORY OF CONCENTRATION FLUCTUATIONS 219
We now perform the calculation, for the bth state, of the first moment of the
transition probability defined by Eq. (8.10):
If one inserts Eq. (12.52) and sums first over n', the only terms in the sum which
contribute are those for which a = b and a' = b,
In a steady state, one is tempted to make the terms on the right hand side of Eq.
(12.55) cancel in pairs by detailed balance:
However, if one requires that three states a, b, c be consistent with one another
under this requirement, one finds that
Thus if the ratio of forward to reverse transition probabilities has the form Eq.
(12.58), then the steady state solution has the form
or
where
for c ≠ b. The two terms in D_{bc} are equal under detailed balance but not otherwise.
For c = b, we obtain
The steady state second moments ⟨Δn(b)Δn(c)⟩ are now chosen so that the right
hand side of Eq. (12.64) vanishes, i.e., so that the Einstein relation is obeyed.
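The logic of the detailed-balance requirement can be illustrated with a minimal numeric sketch (the level energies and attempt rate are invented for illustration): if every forward/reverse rate ratio has the Boltzmann form W(b→c)/W(c→b) = exp[(E_b − E_c)/kT], then the steady state of the master equation is the Boltzmann distribution, and each pair of terms cancels separately.

```python
import numpy as np

kT = 1.0
E = np.array([0.0, 0.7, 1.5])      # invented level energies
nstates = len(E)

# W[c, b] = transition rate b -> c; the symmetric split of the Boltzmann
# factor guarantees W[c,b]/W[b,c] = exp((E_b - E_c)/kT)
W = np.zeros((nstates, nstates))
for b in range(nstates):
    for c in range(nstates):
        if b != c:
            W[c, b] = 0.3 * np.exp((E[b] - E[c]) / (2 * kT))

# master-equation generator: dP/dt = M P
M = W - np.diag(W.sum(axis=0))

# steady state = null vector of M, normalized to unit probability
w, v = np.linalg.eig(M)
P = np.real(v[:, np.argmin(np.abs(w))])
P /= P.sum()

boltz = np.exp(-E / kT); boltz /= boltz.sum()
print(P)        # matches the Boltzmann distribution
print(boltz)
```

One can also verify directly that W[c, b]·P[b] = W[b, c]·P[c] for every pair, which is the pairwise cancellation invoked in Eq. (12.56).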
Assuming that there is no correlation between fluctuations in different states, we
try a solution of form
satisfies the Einstein relation of Eq. (12.64), provided the steady state obeys
detailed balance, Eq. (12.56). Using Eq. (12.59), we have
which leads to
The total number of systems is N = Σ_c n(c), which yields a formula similar to the
thermodynamic case, Eq. (12.38).
In our model, however, the total number N should be fixed, and we need to
enforce a constraint. The solution Eq. (12.68) we found is only a particular solution,
to which can be added a solution of the homogeneous equation
For the Boltzmann case (ε = 0), the solution Eq. (12.68) is replaced by
which obeys the constraint ⟨[Σ_a Δn(a)]Δn(c)⟩ = 0. For the Fermi and Bose
cases, we have
The added term is of order 1/N and therefore is unimportant in calculating the
fluctuations in any small portion of a large system. However, this term does affect
fluctuations in appreciable parts of a system.
Equation (12.50) can be readily applied to the case in which n(1) = N, the
number of conduction electrons, n(2) = N_t, the number of trapped electrons, and
n(3) = N_v − P, the number of electrons in the valence band, where N_v is the number
of valence band states and P is the number of holes. Thus
We have assumed nondegeneracy for the holes and the free electrons, but not for
the trapped electrons. Since Δn(3) = −ΔP, we can write the second moments in
the form
where
Note that this normalization is four times that used for g(ω) in Lax and Mengert
(1960). For simplicity, we confine ourselves to a one-dimensional geometry, as
INFLUENCE OF DRIFT AND DIFFUSION ON MODULATION NOISE 223
was done by Hill and van Vliet (1958), and calculate the total hole fluctuation
by the integral
we can write
so that the correlation at two times is, as usual, related to the pair correlation at the
initial time
where the coefficient of the delta function is chosen so that the fluctuation in the
total number of carriers ⟨(ΔP)²⟩ is given correctly by Eq. (12.82). Here L is the
distance between the electrodes.
The definition, Eq. (12.34), of Φ(t) yields the expression
Here, v and D are the bipolar drift velocity and diffusion constant found by van
Roosbroeck (1953) to describe the coupled motion of electrons and holes while
maintaining charge neutrality
where the individual diffusion constants and mobilities are related by the Einstein
relation.
Equation (12.95) for the Green's function can be solved by a Fourier transform
method
where
With Eq. (12.99) for k, the after-effect function can be calculated from Eq. (12.94)
Lax and Mengert (1960) provide an exact evaluation of this integral. However,
the resulting expressions are complicated. It is therefore worthwhile to treat some
limiting cases. For example, if there is no diffusion, then
where T_a = L/v is the transit time and the spectrum is governed by a windowing
factor W
Indeed, the current noise, in this special case, can be written in the form given by
Hill and van Vliet (1958)
which emphasizes the similarity to shot noise. The equivalent current is defined by
a windowing factor similar to that found associated with the effect of transit time
on shot noise.
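As a numerical sketch of this drift-only limit, assume (as for carriers created uniformly in space and swept out at speed v) that the after-effect function decays linearly, Φ(t) = 1 − t/T_a for t < T_a and zero beyond — an assumption made here for illustration, not taken from the text. Its one-sided cosine transform then reproduces a sinc²-type transit-time window, the same shape known from transit-time-limited shot noise:

```python
import numpy as np

Ta = 1.0                       # transit time T_a = L/v (arbitrary units)
t = np.linspace(0.0, Ta, 4001)
dt = t[1] - t[0]
phi = 1.0 - t / Ta             # assumed linear after-effect function

for w in [1.0, 5.0, 20.0]:     # angular frequency, in units of 1/Ta
    f = phi * np.cos(w * t)
    # trapezoidal evaluation of the one-sided cosine transform 2*Int phi cos(wt) dt
    spectrum = 2.0 * (f.sum() - 0.5 * f[0] - 0.5 * f[-1]) * dt
    x = w * Ta / 2.0
    window = Ta * (np.sin(x) / x) ** 2      # sinc^2 transit-time window
    print(w, spectrum, window)              # the two columns agree
```

The analytic identity behind the match is 2∫₀^{T_a}(1 − t/T_a)cos(ωt)dt = T_a [sin(ωT_a/2)/(ωT_a/2)]², which rolls off as 1/ω² above the transit frequency.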
In the opposite limit, in which diffusion is retained but drift is neglected, the
exact result for the spectrum is given by
where
and
is the reciprocal of the diffusion length. The exponential term represents an inter-
ference term between the two boundaries that is usually negligible since they are
separated by substantially more than a diffusion length. A simple approximate
form over intermediate frequencies is
In summary, in addition to the first term, which represents the volume noise easily
computed just by using the total carrier number P(t), the term proportional to an inverse
frequency to the three-halves power arises from diffusion across the boundary at
the electrodes.
13
Random walk of light in turbid media
13.1 Introduction
Clouds, sea water, milk, paint and tissues are some examples of turbid media. A
turbid medium scatters light strongly. Visible light shone on one side of a cup of
milk appears much weaker and diffuse when observed on the other side of the cup,
because light is strongly scattered in milk while the absorption of light by milk is very low.
The scattering and absorption properties of a turbid medium are described by the
scattering and absorption coefficients μ_s and μ_a, respectively. Their values depend
on the number density of scatterers (absorbers) in the medium and the cross-section
of scattering (absorption) of each individual scatterer (absorber). For a collimated
beam of intensity I₀ incident at the origin and propagating along the z direction
inside a uniform turbid medium, the light intensity in the forward direction at
228 RANDOM WALK OF LIGHT IN TURBID MEDIA
position z is attenuated according to Beer's law:
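In formula form, Beer's law for the collimated (unscattered) beam reads I(z) = I₀ exp[−(μ_s + μ_a) z]; a one-line numeric sketch (the coefficient values are invented for illustration):

```python
import numpy as np

mu_s, mu_a = 10.0, 0.1    # scattering/absorption coefficients (1/cm, illustrative)
I0 = 1.0
z = np.array([0.0, 0.1, 0.2, 0.5])    # depth in cm

# attenuation of the collimated beam: total coefficient mu_s + mu_a
I = I0 * np.exp(-(mu_s + mu_a) * z)
print(I)
```

Note that in a milky medium with μ_s ≫ μ_a, the forward beam is extinguished within a few mean free paths even though almost no light is absorbed; it is converted into the diffuse, multiply scattered field studied in the rest of this chapter.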
FIG. 13.1. A photon moving along n is scattered to n' with a scattering angle
θ and an azimuthal angle φ in a photon coordinate system xyz whose z-axis
coincides with the photon's propagation direction prior to scattering. XYZ is
the laboratory coordinate system.
on the angle between s and s' rather than the directions, and the phase function can
be written in the form P(s · s').
Denote the position, direction and step-size of a photon after the ith scattering event
as x⁽ⁱ⁾, s⁽ⁱ⁾ and S⁽ⁱ⁾, respectively. The initial condition is x⁽⁰⁾ = (0, 0, 0) for the
starting point and s⁽⁰⁾ = s₀ = (0, 0, 1) for the incident direction. The laboratory
Cartesian components of x⁽ⁱ⁾ and s⁽ⁱ⁾ are x_α⁽ⁱ⁾ and s_α⁽ⁱ⁾ (α = 1, 2, 3). The photon
is incident at time t₀ = 0. For simplicity the speed of light is taken as the unit of
speed and the mean free path μ_s⁻¹ as the unit of length.
The scattering of photons takes a simple form in an orthonormal coordinate
system attached to the moving photon itself where n is the
photon's propagation direction prior to scattering and m is an arbitrary unit vector
not parallel to n (see Fig. 13.1). The distribution of the scattering angle θ ∈ [0, π]
is given by the phase function of the medium and the azimuthal angle φ is uni-
formly distributed over [0, 2π). For one realization of the scattering event with angles
(θ, φ) in the photon coordinate system, the outgoing propagation direction n' of
the photon will be:
FIG. 13.2. The average photon propagation direction (a vector) decreases as gⁿ,
where g is the anisotropy factor and n is the number of scattering events.
The freedom of choice of the unit vector m reflects the arbitrariness of the xy axes
of the photon coordinate system. For example, taking m = (0, 0,1), Eq. (13.2)
gives
Similar equalities are obtained for x and y components as the labels are rotated
due to the symmetry between x,y,z directions. The correlations between the
propagation directions are hence given by
On the other hand, the correlation between s⁽ⁱ⁾ and s⁽ʲ⁾ (j > i) can be reduced
to a correlation of the form of Eq. (13.7) owing to the following observation
where p(s⁽ʲ⁾|s⁽ⁱ⁾) means the conditional probability that a photon jumps from s⁽ⁱ⁾
at the ith step to s⁽ʲ⁾ at the jth step. Equation (13.8) is a result of the Chapman-
Kolmogorov condition (2.17), p(s⁽ʲ⁾|s⁽ⁱ⁾) = ∫ ds⁽ʲ⁻¹⁾ p(s⁽ʲ⁾|s⁽ʲ⁻¹⁾) p(s⁽ʲ⁻¹⁾|s⁽ⁱ⁾),
of the Markov process, and of the fact that ∫ ds⁽ʲ⁾ s_α⁽ʲ⁾ p(s⁽ʲ⁾|s⁽ʲ⁻¹⁾) = g s_α⁽ʲ⁻¹⁾ from
Eq. (13.3). Combining Eqs. (13.7) and (13.8), and using the initial condition of
s⁽⁰⁾, that is,
we conclude
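The conclusion — that the mean propagation direction decays as ⟨s⁽ⁿ⁾⟩ = gⁿ s₀, as illustrated in Fig. 13.2 — can be checked with a Monte Carlo sketch of the scattering rotation described above. A Henyey-Greenstein phase function is assumed here purely for concreteness; only its mean cosine g enters the result:

```python
import numpy as np

rng = np.random.default_rng(0)

def scatter(n, g):
    """One scattering event: sample (theta, phi) and rotate the direction n."""
    # cos(theta) sampled from the Henyey-Greenstein distribution (assumed choice)
    xi = rng.uniform()
    ct = (1 + g*g - ((1 - g*g) / (1 - g + 2*g*xi))**2) / (2*g)
    st = np.sqrt(max(0.0, 1.0 - ct*ct))
    phi = rng.uniform(0.0, 2.0*np.pi)
    # orthonormal frame attached to the photon, from an arbitrary m not parallel to n
    m = np.array([1.0, 0.0, 0.0]) if abs(n[2]) > 0.9 else np.array([0.0, 0.0, 1.0])
    u = np.cross(m, n); u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return st*np.cos(phi)*u + st*np.sin(phi)*v + ct*n

g, nsteps, nphot = 0.8, 5, 20000
sz = np.zeros(nsteps)
for _ in range(nphot):
    n = np.array([0.0, 0.0, 1.0])          # s0 along z
    for k in range(nsteps):
        n = scatter(n, g)
        sz[k] += n[2]
sz /= nphot

print(sz)                              # Monte Carlo <s_z> after 1..nsteps scatterings
print(g ** np.arange(1, nsteps + 1))   # predicted g^n
```

Because the azimuthal angle is uniform, only the ⟨cos θ⟩ = g factor survives each averaging step, which is exactly the Chapman-Kolmogorov argument leading to the gⁿ decay.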
The connection between the macroscopic physical quantities describing the photon dis-
tribution and the microscopic statistics of the photon propagation direction is made
by the probability, p_n(t), that the photon has undergone exactly n scattering events
before time t (the (n + 1)th event comes at t). We claim p_n(t) obeys the general-
ized Poisson distribution. This claim was previously proved by Wang and Jacques
(1994):
which is the Poisson distribution for the number of scattering events, with expected
rate of occurrence μ_s, multiplied by an exponential decay factor due to absorption.
Here we have used μ_s⁻¹ = 1 as the unit of length. This form of p_n(t) can be easily
verified by recognizing first that p₀(t) = exp(−t) equals the probability that the
photon experiences no events within time t (and the first event occurs at t); and
second that the probability p_{n+1}(t) is given by
MACROSCOPIC STATISTICS 233
in which the first event, occurring at t', is a scattering and is followed by n scattering
events up to but not including time t; this confirms Eq. (13.11) at n + 1 if
Eq. (13.11) is valid at n. The total probability of finding a photon at time t
revealing that the center of the photon cloud moves along the incident direction for
one transport mean free path l_t before it stops (see Fig. 13.3).
The second moment of the photon density is calculated as follows. Denote by
p(s₂, t₂|s₁, t₁) the conditional probability that a photon jumps from a propagation
direction s₁ at time t₁ to a propagation direction s₂ at time t₂ (t₂ > t₁ > 0). The
conditional correlation of the photon propagation direction subject to the initial
condition is given by
Denote the numbers of scattering events encountered by the photon at states (s₁, t₁)
and (s₂, t₂) as n₁ and n₂, respectively. Here n₂ ≥ n₁ since the photon jumps from
(s₁, t₁) to (s₂, t₂). Equation (13.17) can be rewritten as
where C_{n₂}^{n₁} = n₂!/[(n₂ − n₁)! n₁!] and we have repeatedly used the binomial
expansion (a + b)ⁿ = Σ_{0≤k≤n} C_n^k a^{n−k} b^k. With this result, we obtain
This exact result for ⟨s_β(t₂) s_α(t₁)⟩ can be easily verified to agree with the
regression theorem discussed in Chapter 8.
The second moment of the position is then
after integration. Our main result, Eqs. (13.16) and (13.22), agrees with Eqs.
(14.31)-(14.33) in Chapter 14 derived by the cumulant expansion.
The general form of the photon distribution depends on all moments of the
distribution. However, after a sufficiently large number of scattering events have
taken place, the photon distribution approaches a Gaussian distribution over space
according to the central limit theorem (Kendall 1999). This asymptotic Gaussian
distribution, characterized by its central position and half-width √(2Dt), is then
where the normalizing factor is C(t) = exp(−μ_a t) owing to Eq. (13.13). This
provides a "proper" diffusion solution to radiative transfer, revealing a picture
of light propagation in which photons migrate with a center that advances in time, and
with an ellipsoidal contour that grows and changes shape (see Fig. 13.3).
It is also worth mentioning that the absorption coefficient appears in
the generalized Poisson distribution p_n(t) only through an exponential decay factor
exp(−μ_a t). This exponential factor is canceled in the evaluation of the condi-
tional moments of the photon distribution, see Eqs. (13.17) and (13.18). Hence, the
sole role played by absorption is to annihilate photons; it affects neither the shape
of the distribution function nor the diffusion coefficient (Durduran et al. 1997; Cai
et al. 2002).
The results, except for the Gaussian photon distribution Eq. (13.23), are exact
under the sole assumption of a Markov random process of photon migration. The
deviation from a Poisson distribution of scattering or absorption events can be
dealt with by modifying pn(t). The Markov random process is usually a good
description of scattering due to short-range forces such as photon migration in
turbid media. In situations where interference of light is appreciable, the phase
of the photon, which depends on its full past history, must be considered, and this
is non-Markovian. Non-Markov processes may also occur in scattering involving
long-range forces such as Coulomb interaction between charged particles in which
the many-body effect cannot be ignored.
FIG. 13.3. The center of a photon cloud approaches l_t along the incident direction
and the diffusion coefficient approaches l_t/3 with increase of time.
We should finally point out that this treatment is for a scalar photon. Light is
a vector wave. The vector nature produces some intriguing effects in multiply scat-
tered light, including the polarization memory effect, where light polarization is
preserved over long distances at which light is already diffusing. A scattering matrix,
as opposed to the scalar phase function, needs to be used to describe polarized light
scattering in turbid media. Nevertheless, the simple picture of a random walk of
light can be generalized to treat propagation and depolarization of polarized light in
turbid media. Characteristic lengths governing depolarization of multiply scattered
light can be determined analytically and explain the observed memory effects. The
interested reader may refer to Xu and Alfano (2005, 2006) and references therein.
14
Analytical solution of the elastic transport equation
14.1 Introduction
where μ_s is the scattering rate, μ_a is the absorption rate, and P(s, s') is the phase
function, normalized to ∫ P(s, s') ds' = 1. When the phase function depends only
on the scattering angle in an isotropic medium, we can expand the phase function
in Legendre polynomials with constant coefficients,
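Such an expansion can be sketched numerically. Here a Henyey-Greenstein phase function is assumed as an example, and the coefficient convention a_l = 2π(2l + 1)∫P(cos θ)P_l(cos θ) d cos θ is an illustrative choice (normalization conventions vary); with it, Gauss-Legendre quadrature recovers a_l = (2l + 1)gˡ, consistent with the relation g_l = μ_s[1 − a_l/(2l + 1)] quoted later in this chapter:

```python
import numpy as np

g = 0.6   # assumed anisotropy factor

def hg(ct):
    """Henyey-Greenstein phase function, normalized so its solid-angle integral is 1."""
    return (1 - g*g) / (4*np.pi * (1 + g*g - 2*g*ct)**1.5)

x, w = np.polynomial.legendre.leggauss(64)   # quadrature nodes/weights on [-1, 1]
for l in range(4):
    Pl = np.polynomial.legendre.Legendre.basis(l)(x)
    # a_l = 2*pi*(2l+1) * Int P(cos t) P_l(cos t) d cos t   (assumed convention)
    a_l = 2*np.pi * (2*l + 1) * np.sum(w * hg(x) * Pl)
    print(l, a_l, (2*l + 1) * g**l)   # numeric coefficient vs (2l+1) g^l
```

With this convention a₀ = 1 expresses the normalization of P(s, s'), and a₁/3 = g = ⟨cos θ⟩ gives the anisotropy factor.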
A difficulty in solving Eq. (14.1) is that the term v s · ∇_r f(r, s, t) makes the
spherical harmonic components of f(r, s, t) couple with each other. We first study
the dynamics of the distribution in direction space, F(s, s₀, t), on a spherical sur-
face of radius 1. The kinetic equation for F(s, s₀, t) can be obtained by integrating
Eq. (14.1) over the whole spatial space, r. The spatial independence of μ_s, μ_a, and
P(s, s') preserves translational invariance. Thus the integral of Eq. (14.1) obeys
Since the integral of the gradient term over all space vanishes, in contrast to Eq.
(14.1), if we expand F(s, s₀, t) in spherical harmonics, its components do not
DERIVATION OF CUMULANTS TO AN ARBITRARILY HIGH ORDER 239
couple with each other. Therefore, it is easy to obtain the exact solution of Eq.
(14.3):
where
Two special values of g_l are: g₀ = 0, which follows from the normalization
of P(s, s'), and g₁ = v/l_tr, where l_tr is the transport mean free path, defined by
l_tr = v/[μ_s(1 − ⟨cos θ⟩)], where ⟨cos θ⟩ is the average of s · s' with P(s, s') as
weight. In Eq. (14.4), Y_{lm}(s) are spherical harmonics normalized to 4π/(2l + 1).
Equation (14.4) serves as the exact Green's function of particle propagation
in the velocity space. Since in an infinite uniform medium this function is inde-
pendent of the source position, r₀, the requirements for a Green's function are
satisfied, and in particular the Chapman-Kolmogorov condition (see Section 2.4) is
obeyed:
where ⟨...⟩ means the ensemble average in the velocity space. The first delta func-
tion imposes that the displacement, r, is given by the path integral. The second
delta function assures the correct final value of the direction. Equation (14.6) is an
exact formal solution of Eq. (14.1), but cannot be evaluated directly. We make a
Fourier transform of the first delta function in Eq. (14.6), then make a cumulant
expansion (for a detailed explanation of cumulants, see Section 1.7), and obtain
240 ANALYTICAL SOLUTION OF THE ELASTIC TRANSPORT EQUATION
where T denotes time-ordered multiplication. In Eq. (14.7), the index c denotes
the cumulant, which is defined as ⟨A⟩_c = ⟨A⟩, ⟨A²⟩_c = ⟨A²⟩ − ⟨A⟩⟨A⟩. A general
expression relating the moments ⟨A^m⟩ and the cumulants ⟨A^m⟩_c is given by:
Hence, if ⟨A^m⟩ (m = 1, 2, ..., n) have been calculated, ⟨A^m⟩_c (m = 1, 2, ..., n) can
be recursively obtained, and conversely.
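The recursion can be sketched as follows. It uses the standard moment-cumulant relation m_n = Σ_{k=1}^{n} C(n−1, k−1) c_k m_{n−k} (assumed here to match the book's convention for Eq. (14.8)), and its inversion, checked against the Gaussian case where all cumulants beyond the second vanish:

```python
from math import comb

def moments_from_cumulants(c, nmax):
    # m_n = sum_{k=1..n} C(n-1, k-1) * c_k * m_{n-k},  with m_0 = 1
    m = [1.0]
    for n in range(1, nmax + 1):
        m.append(sum(comb(n-1, k-1) * c[k-1] * m[n-k] for k in range(1, n+1)))
    return m

def cumulants_from_moments(m, nmax):
    # the same recursion solved for c_n ("and conversely")
    c = []
    for n in range(1, nmax + 1):
        c.append(m[n] - sum(comb(n-1, k-1) * c[k-1] * m[n-k] for k in range(1, n)))
    return c

# Gaussian example: cumulants (0, 1, 0, 0, ...) give moments 1, 0, 1, 0, 3, 0, 15
m = moments_from_cumulants([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], 6)
print(m)                              # [1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 15.0]
print(cumulants_from_moments(m, 6))   # recovers [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Truncating the cumulant list after the second entry is exactly the Gaussian (second-cumulant) approximation used later in this chapter.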
In the following, we derive the analytical expression for the ensemble average
⟨∫₀ᵗ dt_n ... ∫₀ᵗ dt₁ T[s_{j_n}(t_n) ... s_{j_1}(t₁)]⟩. Using a standard time-dependent Green's
function approach, it is given by
with the row index (from above) j = —1,0,1 and the column index (from the left)
i = 1,0, — 1. The orthogonality relation of spherical harmonics is given by
Using Eq. (14.11) and Eq. (14.13), the integrals over ds_n ... ds₁ in Eq. (14.9)
can be analytically performed. We obtain, when s₀ is set along z, that
Note that all ensemble averages have been performed. Equation (14.15) involves
integrals of exponential functions, which can be analytically performed. Equation
(14.15) includes all related scattering and absorption parameters, g_l, l = 0, 1, ...
and μ_a, and determines the time evolution dynamics. The final particle direction,
s, appears as the argument of the spherical harmonics Y_{lm}(s) in Eq. (14.14). Sub-
stituting Eq. (14.15) into Eq. (14.14), and using a standard cumulant procedure,
the cumulants as functions of angle s and time t up to an arbitrary nth order can
be analytically calculated. The final position, r, appears in Eq. (14.7), and its com-
ponents can be expressed as |r| Y_{1j}(r̂), j = 1, 0, −1, with r and r̂, separately,
the magnitude and the unit direction vector of r. Then, performing a numerical
three-dimensional inverse Fourier transform over k, an approximate distribution
function, f(r, s, t), accurate up to the nth cumulant, is obtained.
By a cut-off at the second cumulant, the integral over k in Eq. (14.7) can be analyt-
ically performed, which directly leads to the Gaussian spatial distribution displayed
in Eq. (14.17). The exact first cumulant provides the correct center position of the
distribution. The exact second cumulant provides the correct half-width of the spread
of the distribution. The expressions below are given in Cartesian coordinates with
indexes α, β = [x, y, z]. These expressions are obtained by use of a unitary trans-
form s_α = U_{αj} s_j, j = 1, 0, −1, from Eq. (14.14) (up to second order), which is
based on s_j = Y_{1j}(s), with
We set s₀ along the z direction and denote s as (θ, φ). Our cumulant approximation
to the particle distribution function is given by
with the center of the packet (the first cumulant), denoted by rc, located at
B_yz is obtained by replacing cos φ in Eq. (14.25) by sin φ. In Eqs. (14.22)-(14.25)
In contrast to Eqs. (14.18), (14.19) and (14.22)-(14.25), the results for N(r, t)
are independent of g_l for l ≥ 2. Each distribution in Eq. (14.17) and Eq. (14.30)
describes a particle "cloud" anisotropically spreading from a moving center, with
time-dependent diffusion coefficients. At early times t → 0, f(g_l) ≈ t + O(t²) in
Eq. (14.20), and E⁽ʲ⁾ ≈ t²/2 + O(t³) for j = 1, 2, 3, 4 in Eqs. (14.26)-(14.29).
From Eqs. (14.18), (14.19), Eqs. (14.22)-(14.25), and Eqs. (14.31)-(14.33), we
see that for the density distribution, N(r, t), and the dominant distribution func-
tion, that is, I(r, s, t) along s = s₀, the center moves as vts₀ and the B_{αβ} in Eq.
(14.21) are proportional to t³ as t → 0. These results present a clear picture of
nearly ballistic motion at t → 0. With increase of time, the motion of the center
IMPROVING CUMULANT SOLUTION OF THE TRANSPORT EQUATION 245
FIG. 14.1. The moving center of photons, R_z, and the diffusion coefficients, D_zz
and D_xx, as functions of time, where the g_l are calculated by Mie theory, assuming
water droplets with a/λ = 1, with a the radius of a droplet and λ the wavelength
of light, and the index of refraction m = 1.33.
slows down, and the diffusion coefficients increase from zero. This stage of parti-
cle migration is often called a "snake-like mode". At large times, the distribution
function tends to become isotropic. The particle density, at t ≫ l_tr/v and r > l_tr,
tends towards the center-displaced (by l_tr) diffusion solution with the diffusion coeffi-
cient l_tr/3. Therefore, our solution quantitatively describes how particles migrate
from nearly ballistic motion to diffusive motion, as shown in Fig. 14.1.
Figure 14.2 shows the light distribution as a function of time at different
receiving angles in an infinite uniform medium, computed by the second cumu-
lant solution, where the detector is located at 5/tr from the source in the incident
direction of the source.
The analytical solution obtained, although it has the exact center and half-width, is
not satisfactory in two respects. First, at very early times, exp(−g_l t) → 1 for all l;
hence, one cannot ensure that the summation over l is convergent. Second, particles at
the front edge of the Gaussian distribution travel faster than the speed v, thus violating
causality.
246 ANALYTICAL SOLUTION OF THE ELASTIC TRANSPORT EQUATION
The moments of the ballistic component can be easily calculated. When s₀ is along
z, we have
hence, the moments of the scattered component can be obtained by subtracting the
corresponding ballistic moments from the moments of I(r, s, t). For example, we
have
Notice that
Substituting Eqs. (14.38) and (14.35) into Eq. (14.37), the corresponding cumulants for
the scattered component I^(s)(r, s, t) can be easily obtained; they are given by the following
replacements of Eqs. (14.4), (14.18), and (14.22):
The expressions for the other components of the first and second cumulants are
unchanged, provided every F(s, s₀, t) in G in Section 14.3 is replaced by
F^(s)(s, s₀, t). Note that Eq. (14.38) is actually equal to zero for s ≠ s₀; there is
no ballistic component in these directions.
The replacement of the equations in Section 14.3 by Eqs. (14.39)-(14.41) greatly
improves the calculation of cumulants at very early times. With the subtraction introduced
above, the terms for large l approach zero, and the summation over l becomes
convergent at very early times. Because g_l = μ_s[1 − a_l/(2l + 1)], which approaches
μ_s for large l, f(g_l − g_{l±1}) ~ t and E^(j) ~ t²/2 as t → 0, which results
in cancellation in the summand for large l at very early times.
An example of the successful use of this replacement is the calculation of backscat-
tering. When θ = 180°, P_l(cos θ) = 1 or −1, depending on whether l is even or odd.
The computed R_cz at very early times using Eq. (14.18) oscillates with the cut-off
in l. But the computed R_cz at very early times using Eq. (14.40) becomes sta-
ble. Calculation shows that R_cz^(s) = 0 at any time for any phase function when
θ = 180°.
Figure 14.3 shows the computed time profile of the backscattering intensity
I^(s)(r, s, t) at a detector centered at r = 0 with detection angle θ = 180°, compared
with the Monte Carlo simulation. The absolute value of the intensity, as well as the
shape of the time-resolved profile, computed using our analytical cumulant solu-
tion matches well with that of the Monte Carlo simulation. The inset diagram is
the same result drawn using a log scale for intensity. Note that this result for backscat-
tering, based on the solution of the transport equation, is for a detector located near the
source, different from other backscattering results based on the diffusion model,
which are only valid when the detector is located at a distance of several l_tr from
the source.
Figure 14.4 shows I(r, s, t) with a detector located at z = 6 l_tr in front of the source
and receiving direction along θ = 0, computed using the analytical cumulant solu-
tion up to the tenth order of cumulants (solid curve), to the second order cumulants
(dotted curve), the diffusion approximation (thick dotted curve), and the Monte
Carlo simulation (discrete dots). The figure shows that the tenth order cumulant
solution is located in the middle of the data obtained by the Monte Carlo simu-
lation, and I(r, s, t) ≈ 0 before the ballistic time t_b = 6 l_tr/v. The second order
cumulant solution has nonzero I(r, s, t) before t_b, which violates causality. The
computed N(r, t)/4π using the diffusion model has a large discrepancy with the
Monte Carlo simulation, and the diffusion solution has more nonzero components
before t_b, which violates causality.
Using the second order cumulant solution, the distribution function can be com-
puted very quickly. The associated Legendre functions can be quickly computed using
recurrence relations with accuracy limited only by machine error. It takes about
a minute to produce 10^5 values of I(r, s, t) on a personal computer. On the other
hand, in order to reduce the statistical fluctuation to the level shown in Fig. 14.4, 10^9
events are counted in the Monte Carlo simulation, which takes tens of hours of com-
putation time on a personal computer. Computation of high order cumulants is also
a cumbersome task, because the number of terms involved grows rapidly with
the order n. Also, it has been proved that as long as some cumulants higher than
second order are nonzero, all cumulants up to infinite order must be nonzero (see
Section 8.3). Therefore, no matter how a cut-off at a finite order n > 2 is taken,
the cumulant solution of the Boltzmann transport equation cannot be regarded as
exact.
B. Reshaping the particle distribution. For practical applications, we use a
semiphenomenological model. The Gaussian distribution is replaced by a newly
shaped form, which maintains the correct center position and the correct half-width
of the distribution. The new distribution satisfies causality, namely, I(r, s, t) = 0
outside the ballistic limit, vt. There are an infinite number of choices of the shape of
the distribution under the above conditions. We choose a simple analytical form as
discussed later. At long times, the half-width of the distribution σ ~ (4B)^(1/2), with
B shown in Eq. (14.21), spreads as t^(1/2); hence σ ≪ vt at large t, and the Gaus-
sian distribution at long times with half-width σ can be regarded as lying completely inside
the ballistic sphere. The new reshaped distribution of I(r, s, t), hence, should
approach the Gaussian distribution at long times.
where R_cz and D_zz are given in Eqs. (14.31), (14.32). As shown in Fig. 14.5, although
the 1D Gaussian spatial distribution (the dashed curve) at time t = 2 l_tr/v, Eq.
(14.42), has the correct center and half-width, the curve deviates from the distri-
bution computed by the Monte Carlo simulation (dots), and a remarkable part of
the distribution appears outside the ballistic limit vt = 2 l_tr. At early times the
spatial distribution is not symmetric about the center R_c. While R_c moves from the
source toward the forward side, causality prohibits particles from appearing beyond vt.
This requires that the particles on the forward side be squeezed into a narrow region
between R_c and vt. To balance the parts of the distribution on the forward and
backward sides of R_c, the peak of the distribution should move to a point on the
forward side and the height of the peak should increase. Based on this observation
we propose the following analytical prescription: (1) move the peak position of
the distribution from R_cz to z_c, where the parameter z_c will be determined later; (2)
take this point as the origin of the new coordinates; and (3) use the following
form for the shape of the 1D density in the new coordinates:
where
At the ballistic limit z = z±, N(z) reduces to zero, and N(z) = 0 when z is
outside z±. The parameter b in Eq. (14.43) can be determined by normalization;
the parameters (a, z_c) can be determined by fitting the center and half-width of the
distribution. This fit requires
The integrals in Eqs. (14.45), (14.46), and (14.47) can be performed analytically
in terms of the standard error function:
The solid curve in Fig. 14.5 shows the reshaped spatial distribution, Eq.
(14.43), of the 1D density at time t = 2 l_tr/v, using the Henyey-Greenstein phase
function with g = 0.9, which satisfies causality and matches the Monte Carlo
result much better than the Gaussian distribution.
For nonlinear fitting, a difficulty is how to quickly find the global minimum. The
optimization codes require setting good initial values of the parameters, so that the
local minimum obtained is the true global minimum. The following procedure is
used to quickly obtain the global minimum. In the long time limit, z_c ≈ R_cz and
a² ≈ (4 D_zz vt)^(−1), and the distribution approaches the original Gaussian distribution.
We set the parameters to these values at a long time t_m, and take them as initial values,
using a nonlinear fit, to determine the parameters at t_{m−1} = t_m − Δt, where
Δt is a small time interval. Then, we use the parameters at t_{m−1} as initial values to
determine the parameters at t_{m−2}. Step by step, the parameters over a whole time period
can be computed.
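The backward continuation just described can be sketched numerically. In the following illustrative sketch the residual function is a toy stand-in for the actual matching conditions (which involve R_cz, D_zz, and the error-function integrals of Eqs. (14.45)-(14.48)); the point is only the strategy of reusing each fitted parameter set as the initial guess at the next, earlier time:

```python
import numpy as np

def newton_2d(residual, p0, t, tol=1e-10, max_iter=50):
    """Solve residual(p, t) = 0 for a 2-vector p by Newton iteration
    with a finite-difference Jacobian."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = residual(p, t)
        if np.max(np.abs(r)) < tol:
            break
        J = np.empty((2, 2))
        h = 1e-7
        for j in range(2):
            dp = np.zeros(2)
            dp[j] = h
            J[:, j] = (residual(p + dp, t) - r) / h
        p = p - np.linalg.solve(J, r)
    return p

# Toy stand-in for the matching conditions: the "true" parameters
# (a, z_c) drift smoothly with time.
def residual(p, t):
    a, zc = p
    return np.array([a - 1.0 / (1.0 + t), zc - 0.5 * t])

# Walk backward from a long time t_m, reusing each fitted parameter
# pair as the initial guess at the next, earlier time.
times = np.linspace(5.0, 0.5, 10)
p = np.array([1.0 / (1.0 + times[0]), 0.5 * times[0]])  # long-time values
fits = []
for t in times:
    p = newton_2d(residual, p, t)
    fits.append((t, p.copy()))
```

Because each time step is small, the previous solution is already near the new minimum, and the local solver converges to the correct (global) solution at every step.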
3D density. In this case the ballistic limit is represented by a sphere with center
located at the source position and radius vt. We move the peak position of the
distribution from R_cz to z_c along the s₀ = z direction, take this point as the origin
of the new coordinates, and use the following form for the shape of the 3D density as a
function of the position in the new coordinates, r̃:
where N(r̃) = 0 when r̃ > r̃*, and χ is the polar angle of r̃ in the new coordinates;
r̃* is the distance between the new origin and the point obtained by extrapolating r̃ to the
surface of the ballistic sphere:
The parameter b can be determined by normalization; the parameters (a_∥, a_⊥, z_c)
are determined by fitting the center and half-width of the distribution. This fit
requires
before t_b = 3 l_tr/v has been completely removed in the reshaped form, while
the Gaussian distribution has nonzero components before t_b. The reshaped time
profile matches the result of the Monte Carlo simulation over most time periods,
but the peak value is about 20% lower. The errors are much smaller than those of
the Gaussian distribution. By integration over time, the density for the steady state
can be obtained. The difference in the steady state density between the reshaped
analytical model and the Monte Carlo simulation is about 3%.
Distribution function I^(s)(r, s, t). When the detector is located less than 8 l_tr from
the source in a medium with large g-factor, the distribution function I^(s)(r, s, t) is
highly anisotropic, and the intensity received strongly depends on the angle. One
needs to use the photon distribution function I^(s)(r, s, t) instead of the photon
density N(r, t).
In this case the center position r_c, as a function of (s, t), is not located on the
axis along the incident direction s₀. Without loss of generality, we set the scattering plane
(s, s₀) as the x-o-z plane. The center position now is located at r_c = (r_cx, 0, r_cz).
The orientations and lengths of the axes of the ellipsoid, which characterize the half-
width of the spread of the distribution, can be computed as follows. The nonzero
FIG. 14.7. Schematic diagram describing the geometry of the particle spatial
distribution for scattering along a direction s ≠ s₀. At a certain time t, the center
of the distribution is located at r_c. The half-width of the spread is characterized
by an ellipsoid (the gray area). The large sphere represents the ballistic limit.
The origin of the new coordinates is set by extending from |r_c| to z_c. r̃* is a point
obtained by extrapolating a position r̃ (in the new coordinates) to the surface of the
ballistic sphere, and the length r̃* is determined by Eq. (14.53).
the lengths and directions of the other two axes of the ellipsoid in the scattering plane
can be obtained. In fact, calculation shows that the direction of r_c is also the direc-
tion of one axis of the ellipsoid, since at a certain time t the direction r_c can
replace s as the unique special direction in the scattering plane. In order to reshape
the distribution we choose a new z axis along the r_c direction, move the peak
position of the distribution from |r_c| to z_c, and take this point as the origin of the
new coordinates (x̃, ỹ = y, z̃), as shown schematically in Fig. 14.7.
In the new coordinates we use a shaped form similar to that of the 3D density,
Eq. (14.52), while a(χ) in Eq. (14.52) is
where χ and φ are, respectively, the polar angle and the azimuthal angle of a position
r̃ in the new coordinates. The parameters (a_x, a_y, a_z, z_c) are determined by fitting the
center r_c and the lengths of the three axes of the ellipsoid characterizing the half-width
of the distribution. In many cases, the ellipsoid can be approximately treated as
an ellipsoid of revolution, with the length of the axis of the ellipsoid along the x
direction approximately equal to that along the y direction; then the computation
can be simplified. The new shaped distribution function I^(s)(r, s, t) for a certain
direction s is normalized to F^(s)(s, s₀, t).
Figure 14.8 shows the computed time profile of the distribution function
I^(s)(r, s, t), when the detector is located at 4 l_tr in front of the source, using the
Henyey-Greenstein phase function with g = 0.9. Figures 14.8(a) and (b) are, respec-
tively, for different directions of light s: θ = 0 and 30°. The solid curves are
for the reshaped form, Eq. (14.52), and the dashed curves are for the Gaussian form.
The dots are for the Monte Carlo simulation. The anisotropy of the distribution is shown by
comparing Fig. 14.8(a) and Fig. 14.8(b). The reshaped distribution removes
the intensity before t_b = 4 l_tr/v, which appears in the Gaussian distribution.
The reshaped distribution matches the Monte Carlo result much better than the
Gaussian distribution.
While causality, together with the correct center and half-width of the distribu-
tion, are the major controlling factors in determining the shape and the range of the
particle distribution, the detailed shapes are, to some extent, different for the
different models.
For s near the backscattering direction, the Gaussian distribution can be a
good approximation, as shown in Fig. 14.3, because most particles suffer many
scattering events in transferring from the forward direction to the backward direction.
Our calculation shows that the center position r_c is close to the source for θ ≈ 180°
and far from the ballistic limit; hence, reshaping has little effect in the backscattering
case.
Besides improving convergence, separating the ballistic component from the
scattered component also provides a more proper time-resolved profile for trans-
mission. In the time-resolved transmission profile the ballistic component is
described by a sharp jump exactly at the ballistic time t_b, separated from the later scattered
component. The intensity of the ballistic component, compared to the scattered
component, strongly depends on the g-factor. For g = 0, l_tr = l_s, and the ballistic
component decays to exp(−1) = 0.368 at distance l_tr. But for g = 0.9 it decays
to exp(−10) = 4.54 × 10⁻⁵ at l_tr, because l_tr = 10 l_s. The jump of the ballis-
tic component can be seen in experiments on transmission of light in a medium of
small-sized scatterers (small g-factor), but is difficult to observe in a medium
of large-sized scatterers (large g-factor). Our formula provides a proper estimate
for both small and large g-factors by explicitly separating these two components.
Using the obtained analytical expressions, the distribution I(r, s, t) can be
computed very quickly. The cumulant solution has been extended to the polarized
photon distribution (Cai, Lax, and Alfano 2000c), and to semi-infinite
and slab geometries (Xu, Cai, and Alfano 2002; Cai, Xu, and Alfano 2003).
FIG. 14.8. Time-resolved profile of the photon distribution function, for light direc-
tions (a) θ = 0, (b) θ = 30°, where the detector is located at z = 4 l_tr
from the source along the incident direction, obtained by the reshaped form,
Eq. (14.52) (solid curves), and the Gaussian form (dashed curves), compared
with the Monte Carlo simulation (dots). The Henyey-Greenstein phase
function with g = 0.9 is used, with the absorption coefficient 1/l_a = 0.
If there were no noise, and if m(x) were measured with infinite precision, one
could invert the integral equation to write
where
i.e., K⁻¹ is the kernel inverse to K(x, y).
SOLUTION CONCEPTS 259
The difficulty is that all kernels K(x, y) perform some smoothing on their
input. Thus one could add to the solution s(x) a high frequency term A sin ωx
such that
for any small ε by choosing ω sufficiently large - for any A, even an intense A.
Thus, if our measured result is precise only to within a noise n(x) of order ε, many
different solutions are possible such that
One may object, however, that the solution s(x) should be smooth and not con-
tain a superposition of high frequency components. The answer is that the problem
is, in general, not well-posed (that is, with a unique solution) until the nature of the
smoothness is specified. Unfortunately, the smoothness of the solution may not be
known in advance. One is attempting to determine this from the measured results!
However, if one makes no specification of smoothness, it is difficult to tell which
of the frequencies in the measurement m(x) are properties of the signal, and which
are spurious. It is our opinion, and that of a number of others whose work will be
discussed shortly, that this issue can only be resolved by making a separate mea-
surement of the spectrum associated with the noise n(x) (or its autocorrelation
⟨n(x)n(x + x′)⟩). Not only the shape of the noise spectrum but also its intensity
is relevant, since Fourier components in a particular measurement much below the
corresponding noise intensity cannot be regarded as significant, and should be
excluded from the estimated signal ŝ(x). Thus we see that the correct procedure
for computing an estimate ŝ(x) from m(x) must be a nonlinear one.
We shall use as a guide to the literature the paper by Price (1982) and the extensive
review of statistical methods presented by Turchin, Kozlov and Malkevich (1971).
All of the methods to be discussed below reduce the continuum problem to
a discrete one. In the simplest case, the observation points used are at x_i and the
solution s(y) is to be evaluated at y_j. Equation (15.1) then reduces to a set of
coupled linear equations
Filtering
If noise is ignored and the kernel possesses translational invariance,
where m̂(p) and K̂(p) are the Fourier transforms of m(x) and K(x), respectively.
The ill-posed nature of the problem can be made evident by considering an
instrument K with a Gaussian line shape:
The use of Eq. (15.10) in Eq. (15.8) clearly produces a large enhancement of any
high frequency components in m̂(p). If there were no noise, m̂(p) would vanish
as p → ∞ more rapidly than K̂(p), so that a well-defined expression would result
for the signal s(x).
But the added noise n(x) can be white noise, meaning that n̂(p), and hence
m̂(p), do not fall off but remain constant as p → ∞. The most elementary way to
avoid this difficulty is to set
where F(p) is a filter factor that falls off rapidly as p → ∞, chosen to impose
the desired smoothness on s(x). The problem is the arbitrariness involved in
specifying F(p).
METHODS OF SOLUTION 261
Regularization
A second class of methods, known as regularization methods, replaces the problem
of minimizing ||Ks − m||² by a well-posed problem of the form
where D is a linear operator that measures the degree of nonsmoothness, say a sec-
ond derivative, and α is a parameter that determines the amount of nonsmoothness
allowed.
or
In Fourier space, the factor in brackets is the filter factor of Eq. (15.13), but the
matrices in Eq. (15.16) can be taken in any basis set φ_μ(x).
262 SIGNAL EXTRACTION IN PRESENCE OF SMOOTHING AND NOISE
Perhaps the earliest suggestion for regularization was made by Phillips (1962),
who proposed that, to keep the solution smooth, one should for fixed ||Ks − m||
minimize
But then
so that
which is simply the statement that the spectrum of s″(x) is k⁴ times that of s(x).
A more general discussion of regularization is given by Tikhonov (1977).
Iteration
One of the earliest deconvolution schemes is the iteration scheme of van Cittert
(1931). In this scheme one starts with
and passes from the μth to the (μ + 1)th iterate according to
which is, in effect, a Jacobi iterative solution of the simultaneous equations. (The
prime on the sum omits the diagonal j = i term.) Jansson (1970) proposed an
overrelaxation scheme of the form
where κ need not equal unity. In this case, all components of s are updated
simultaneously. If, in updating any component, the updated values of earlier com-
ponents are used, we get a generalization (for κ ≠ 1) of the Gauss-Seidel iteration
procedure:
However, the matrix K_ij will always be ill-conditioned, Eq. (15.26) will be dis-
obeyed, and convergence will never occur. Jansson succeeds, however, for a
different reason. At the start, and after each iteration, he applies a smoothing
procedure of the form

with a = 0, b = 1, s(y) = 1, and m(x) computed from this integral with a kernel
that arises from potential theory:
He converts the integral equation to a set of simultaneous equations using
Simpson's rule with n points. He solves for s(0) [exact s(0) = 1], with the results:
"Contrary to what we might expect at first sight, the larger the number of points,
the worse the results are; the smoother the kernel, the worse the results are".
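The quoted behavior is easy to reproduce qualitatively. The potential-theory kernel of the original example is not reproduced here; the sketch below uses an assumed smooth kernel 1/(1 + (x − y)²), trapezoidal rather than Simpson weights for brevity, and a perturbation near the round-off level. The recovered s(0) deteriorates as the number of points grows:

```python
import numpy as np

def solve_for_s0(n):
    """Discretize int_0^1 K(x, y) s(y) dy = m(x) with trapezoidal
    weights, build m from the exact solution s = 1, perturb m at the
    round-off level, and solve the linear system; return |s(0) - 1|."""
    x = np.linspace(0.0, 1.0, n)
    w = np.full(n, x[1] - x[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    K = 1.0 / (1.0 + (x[:, None] - x[None, :]) ** 2)  # smooth stand-in kernel
    A = K * w[None, :]
    m = A @ np.ones(n)                    # exact data for s = 1
    m = m + 1e-13 * (-1.0) ** np.arange(n)  # tiny oscillatory perturbation
    s = np.linalg.solve(A, m)
    return abs(s[0] - 1.0)

# "The larger the number of points, the worse the results":
err_coarse = solve_for_s0(8)
err_fine = solve_for_s0(24)
```

Refining the grid adds more and more nearly-dependent rows, so the condition number explodes and the tiny perturbation is amplified catastrophically.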
Franklin's method
Our section heading is borrowed from the title of a lucid contribution by Franklin
(1970). Our description of Franklin's work follows that of Shaw (1972), whose
improvement will be detailed in the next section. In our notation, Franklin
considers the solution of the problem
where the noise n has given Gaussian statistics, and the signal s also has given
statistics. The statistics of m are derived from the corresponding statistics of s and
n. Presumably, the statistics of the noise n can be obtained by measurements in
the absence of a signal. The aim is to construct a linear operator L such that an
estimate ŝ of s can be constructed from m via
If we vary L* in Eq. (15.32) and use the cyclic property of the trace, we get
in agreement with Shaw's (3.15). Note, however, that the statistics of m are deter-
mined by those of n and s according to Eq. (15.30): the scalar product with s
yields
or
The explicit result can be written as a matrix relation by suppressing the subscripts:
Thus
In the usual case, in which the noise is uncorrelated with the signal, R_sn = R_ns = 0,
and L reduces to:
This is the result used by Franklin. We can verify that, when the noise is neglected,
L reduces to the inverse of K.
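A sketch of Franklin's estimator for uncorrelated noise, L = R_ss Kᵀ(K R_ss Kᵀ + R_nn)⁻¹, in matrix form; the kernel and the prior covariances R_ss and R_nn below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 60
x = np.linspace(0.0, 1.0, N)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.05**2)) * (x[1] - x[0])

# Assumed prior statistics: smooth zero-mean signals with covariance
# Rss, white noise with covariance Rnn.
Rss = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * 0.1**2))
Rnn = (0.01**2) * np.eye(N)

# Franklin's estimator when noise is uncorrelated with the signal:
#   L = Rss K^T (K Rss K^T + Rnn)^(-1),   s_hat = L m
L = Rss @ K.T @ np.linalg.inv(K @ Rss @ K.T + Rnn)

# Draw one signal from the prior, blur it, add noise, and estimate.
s = np.linalg.cholesky(Rss + 1e-8 * np.eye(N)) @ rng.standard_normal(N)
m = K @ s + 0.01 * rng.standard_normal(N)
s_hat = L @ m
```

The estimate is strongly correlated with the drawn signal, even though the measurement is both blurred and noisy; the prior covariance is what keeps the inversion stable.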
Franklin's procedure had one significant defect: he assumed that the signal s was
drawn from a space in which the mean value ⟨s⟩ = 0. In addition, one often wishes
to use a number M of measured values m_j significantly larger than the number N
of signal values s_j to be estimated. Franklin's procedure requires the solution of M
simultaneous equations. Shaw has produced an algorithm that requires inversion
only of a smaller N × N matrix, as one does in a least squares calculation. In
addition, he makes an initial estimate ŝ⁽⁰⁾ of the signal and iterates, assuming
⟨s⟩ = ŝ⁽ⁿ⁾ in computing ŝ⁽ⁿ⁺¹⁾.
We first note that a simple least squares algorithm requires the minimization
of
Here K†K has the reduced N × N size. This problem now has the same form as
Franklin's original problem with the replacements
The estimate based on Franklin's procedure, but using the reduced matrices, takes
the form:
SHAW'S IMPROVEMENT OF FRANKLIN'S ALGORITHM 267
or in expanded form:
Now, R_mm in Eq. (15.56) has a factor K†K on the left, so that its reciprocal takes
the form
where I is a unit matrix in the ji space. Then the estimated signal, Eq. (15.56),
can be rewritten as:
Since the matrix R_ss K†K commutes with itself as well as with the unit matrix I,
we can move it through to the right to obtain:
Our aim is to determine ⟨s⟩ = h from the measured data. Franklin's procedure,
Eqs. (15.34) and (15.37), is modified to
in its original form. In the reduced form, Eq. (15.56) is replaced by:
In the case of white noise we can return to Eq. (15.60) and replace s by s − h, and
m by m − h. After the subtraction involving the h terms is performed, we get the
simplified result:
Shaw then provides a starting estimate ŝ⁽⁰⁾ from the least squares equation
by assuming that K†K is sufficiently sharp that s_j can be replaced (on the left) by
ŝ_j, with the result
which is just the least squares solution. Thus the iterative procedure should
eventually become unstable!
STATISTICAL REGULARIZATION 269
These fluctuations may represent actual noise that contaminates the signal. How-
ever, even when the signal is not contaminated by noise, but noise is only added later,
the Franklin and Shaw procedures would break down (the problem becomes ill-
posed) if R_ss were set equal to zero. Another interpretation is that we impose on
the problem an a priori distribution P([s]) of possible signals. This distribution,
for example, should give weight to our prejudice that the s_j = s(x_j) arise from
a smooth function s(x). Phillips's regularization procedure emphasized this
point by adding a term in the minimization procedure
which becomes large if s(x) becomes highly oscillatory. These ideas can be cast
in the language of statistical decision theory. Let
If we do not know the a priori probability in detail but only its correlation matrix
Nonlinear methods have been introduced in connection with the problem of image
restoration. These methods recognize that an image is likely to have sharp edges.
The methods introduced consist of a mixture of a regularized solution with the
unregularized result, with the degree of admixture varied in a local manner that is
sensitive to the gradient of the measured signal. This procedure reduces the amount
of undesirable smoothing that occurs in the vicinity of an edge. But no investiga-
tion has been made of the stability of these new procedures. The work of Abramatic
and Silverman (1982) is based upon a procedure introduced in geophysics by
Backus and Gilbert (1970) and on the work of Frieden (1975).
The idea of Abramatic and Silverman is to allow the regularization parameter,
which controls the smoothness of the solution, to adapt to the local characteristics of
the image (a flat field or an edge). This was done by taking into account the mask-
ing effect of the human eye. The eye is quite sensitive to a small amount of noise
in a flat field, but is able to tolerate a large amount of noise in the surroundings of
an edge. In their procedure, the masking function is estimated in the form
from the noisy image, where g(i, j) is the gradient of the image at the pixel (i, j)
and d₀ is of the order of the typical size of an edge. The amount of regulariza-
tion at each pixel (i, j) is scaled by a visibility function, f(M(i, j)), a monotonically
decreasing function from 1 to 0 as M goes from 0 to ∞. Abramatic and Silverman
used the visibility function
where a > 0 is a tuning parameter. Via the visibility function, stronger regular-
ization is then applied to the flat field, where M is small, and weaker regularization
near an edge, where M is large.
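The adaptive weighting can be sketched as follows. The masking estimate here is an illustrative stand-in built from the gradient magnitude (the exact masking form used by Abramatic and Silverman differs in detail), combined with the monotone visibility function f(M) = 1/(1 + aM):

```python
import numpy as np

def visibility_weights(image, d0=2.0, a=1.0):
    """Per-pixel regularization weights from a gradient-based masking
    estimate M and the visibility function f(M) = 1/(1 + a*M).
    f is 1 in flat fields (strong smoothing allowed) and tends to 0
    near edges (little smoothing)."""
    gy, gx = np.gradient(image.astype(float))
    M = np.hypot(gx, gy) / d0
    return 1.0 / (1.0 + a * M)

# A flat field with one vertical edge: full regularization weight in
# the flat region, reduced weight on the edge.
img = np.zeros((8, 8))
img[:, 4:] = 10.0
w = visibility_weights(img)
```

Scaling a per-pixel regularization parameter by `w` then smooths the flat field strongly while leaving the edge nearly untouched.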
In short, nonlinear image restoration is much harder than linear restoration. An
excellent summary of image restoration can be found in G. Demoment (1989).
16
Stochastic methods in investment decision
A forward contract is an agreement by one person to sell, at a time T (in years), for
K dollars (at delivery), an asset whose value at the current time t (in years) is S.
The forward price F is that delivery price K chosen to make the value of
the contract zero. If the interest rate r on risk free money were zero, we would
have F = S. However, if the delivery price is K, one only needs cash equal to
K exp[−r(T − t)] at the present time to be able to pay K at the time T − t later.
If we assign f to be the value of the forward contract, then
since the first two items are equivalent to owning the third.
The forward price F is then the value of K that makes f = 0.
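In code, these two relations amount to the following minimal sketch (function names are ours; the relations are f = S − K exp[−r(T − t)] and F = S exp[r(T − t)] from the discussion above):

```python
import math

def forward_value(S, K, r, tau):
    """Value of a forward contract with delivery price K, time to
    delivery tau = T - t, spot price S, and risk free rate r."""
    return S - K * math.exp(-r * tau)

def forward_price(S, r, tau):
    """The delivery price that makes the contract value zero."""
    return S * math.exp(r * tau)
```

By construction, `forward_value(S, forward_price(S, r, tau), r, tau)` is zero: the forward price is exactly the delivery price at which neither side pays for the contract.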
As an example, from the Wall Street Journal of Friday, May 22, 1998, we take the
price in dollars for 100 yen (Table 16.1).
Except for commission, which we neglect, the ratios of the 30 day, 90 day and
180 day forward prices describe the interest rate factor exp[r(T − t)] for the three
different periods. In the third column we list the rate r consistent with the above
data.
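Conversely, quoted spot and forward prices imply the rate. A small sketch (the quotes below are illustrative numbers, not the values of Table 16.1):

```python
import math

def implied_rate(spot, forward, tau):
    """Interest rate implied by a forward quote:
    F = S * exp(r * tau)  ->  r = ln(F / S) / tau."""
    return math.log(forward / spot) / tau

# Hypothetical spot and 90-day forward quotes (illustrative only):
r_90 = implied_rate(0.7300, 0.7350, 90.0 / 365.0)
```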
It would appear that Eq. (16.2) contains a hidden assumption that the price S
(at the initial time t) will be equal to the final price S_T at the settlement time T.
But this is not the case! An arbitrageur can buy the asset at the spot price S and
take the short (seller) side of the forward contract. To do this, he must borrow S
dollars at a total cost of S exp[r(T − t)]. When he sells the asset, he receives S_T
for a gain (possibly negative) of
From the forward contract, he receives F, but then must supply an asset of value
S_T, leading to a gain of
In the total gain, the value of S_T disappears. This result agrees with Eq. (16.2). Thus if F >
S exp[r(T − t)] he makes a risk free gain. If F < S exp[r(T − t)], an arbi-
trage in the opposite direction also yields a risk free gain. This expression, Eq.
(16.2), for the value of a forward contract is independent of the final value S_T of
the asset.
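The cancellation of S_T in the combined position can be checked directly (a minimal sketch with hypothetical numbers):

```python
import math

def arbitrage_gain(S, F, r, tau, S_T):
    """Total gain from buying the asset with borrowed cash and taking
    the short side of the forward contract; S_T cancels in the sum."""
    gain_spot = S_T - S * math.exp(r * tau)   # sell asset, repay the loan
    gain_forward = F - S_T                    # deliver into the contract
    return gain_spot + gain_forward

# The combined gain is the same whatever the final price S_T turns out
# to be, and is positive whenever F > S * exp(r * tau):
g1 = arbitrage_gain(100.0, 103.0, 0.05, 0.5, S_T=80.0)
g2 = arbitrage_gain(100.0, 103.0, 0.05, 0.5, S_T=140.0)
```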
Futures contracts, like forward contracts, are an agreement now to buy or sell an
asset in the future at an agreed upon delivery price. However, futures contracts are
handled on an exchange, such as the Chicago Board of Trade. To buy a futures con-
tract through a broker, a deposit, the initial margin, must be supplied to guarantee
delivery. This could be 20% of the value of the contract.
A VARIETY OF FUTURES 273
Thus to buy 100 ounces of gold at $400/ounce, a contract of $40,000 might
require an $8000 deposit. If the price of gold goes up by $10, the buyer of the futures
contract finds that his margin account has gone up by 100 × 10 = $1000. However,
any balance above the initial margin can be withdrawn. Even if not withdrawn,
additional interest is earned. Conversely, if the price goes down, the value of the
margin account declines by a corresponding amount, and the interest earned declines.
If the margin falls below a maintenance level, the investor will receive a margin
call to make up the difference. If the margin is not received, the broker sells the contract,
thus closing out the position.
In Appendix 2A of Hull (1989), or 3B of Hull (2001), it is established that if the
interest rate is the same for all maturities, futures contracts and forward contracts
should have identical prices. However, if interest rates change, particularly if they
change in a way that is correlated with changes in the price S, the equivalence no longer
holds. We shall ignore these fine points in the discussion that follows.
Typically, European contracts involve action only at the closing date. But
American puts and calls can be exercised at any intermediate date, or held to the
close. This introduces a need for a strategy as to when to take action, and can also
cause a modification of the price of the put or call.
The general behavior of futures contracts was described in the previous section,
but there are differences that depend on the nature of the assets. If we are dealing
with stock index futures, where the stock has a dividend yield q, the forward price,
Eq. (16.2), is modified to
Stock:
since the underlying asset has a dividend yield q that partly compensates for the
interest rate r.
For futures contracts involving currencies, if the local currency has interest rate
r and the foreign currency has interest rate r_f, we get
since the yield in the foreign currency plays the role of a dividend.
Table 16.2 shows a decrease of price with maturity, corresponding to the fact
that interest rates in the US are less than those in Canada.
For gold and silver, the forward price is
Gold:
or
Gold:
where U is the present value of all storage costs over the life of the contract. If the
storage cost is proportional to the value of the gold, with the storage cost u per
year per dollar of value, this is equivalent to a total storage cost
Black and Scholes (1973) have developed a procedure for estimating the appropri-
ate price for puts and calls which are forward contracts where the asset involved is
a stock. A similar contribution was made by Merton (1973) at the same time. (The
Nobel prize was shared for this work.)
For this purpose, they need a model for how the price S of a stock varies with
time. If the change in S is proportional to S, the growth is necessarily exponential.
In the absence of fluctuations, then, the model assumes

dS/dt = μ S,     (16.11)

where the growth rate μ in the stock price can presumably be estimated from the
growth rate in earnings of the stock. In the lowest order, one might expect the stock
to execute a Brownian motion with fluctuation parameter σ_H S. But this choice has
two disadvantages. The first, as remarked in Hull (1989), Section 3.3, and Hull
(2001), Section 10.3, is that investors expect to derive a return as a percentage of
the stock value, independent of the price. Thus they classify stocks by their growth
rate. To add Brownian motion, Eq. (16.11) is rewritten in the form
Hull:   dS/S = μ dt + σ_H dz,     (16.12; 3.7|10.6)

where (3.7) refers to Hull (1989) and (10.6) refers to Hull (2001). Hull, of course,
does not use the subscript H in his work. Here dz is the differential of a Wiener
A MODEL FOR STOCK PRICES 275
process of pure Brownian motion, namely one whose mean remains zero, and
whose standard deviation is (Δt)^{1/2}.
If φ(m, s) describes a normal distribution with mean m and standard deviation s,
then the distribution of Δx = ΔS/S is described correctly by

φ(μ Δt, σ_H (Δt)^{1/2}),

to allow for the growth of the mean value m = μ Δt and the standard devia-
tion σ_H (Δt)^{1/2} with time. This correct result, stated as Eq. (3.8) in Hull (1989)
and Eq. (10.9) in Hull (2001), also avoids the second disadvantage. If S itself
were described as a Wiener process, the price S could reach unacceptable negative
values. In the accepted model, all that can happen is for In S to go negative. We
therefore advocate the use of Eq. (16.12) as the fundamental description of a model
for stock prices. The model, Eq. (16.12), defines μ as the slope of the ensemble
average of x(t) if an ensemble of measurements can be made. If not, one takes the
logarithm of one sample price series and asks for the slope of the best linear fit to
that series of price logarithms.
Now we ask: what is the Ito stochastic differential equation (ISDE) for the stock
S? Using the Ito calculus lemma, Eq. (10.41), with dS/dx = S and d²S/dx² =
S, this leads to the ISDE for S,

dS = (μ + σ_H²/2) S dt + σ_H S dz.     (16.14)
This simplest example already displays the subtlety of applying the Ito calculus
lemma. Yet this kind of manipulation has appeared in some well known books in
the financial area. For example, in Hull's Eq. (10.6), with the word "or", Eq. (16.12)
is rewritten as
Equations (16.12) and (16.16) appear to be regarded as equivalent by Hull and
the finance community. But we believe the second choice is not equivalent to the
first.
Before further analysis we must note an annoying but unimportant difference
in notation. The standard Wiener notation is equivalent to the correlation
whereas the customary physics notation would place a factor of 2 on the
right hand side of these equations. In this chapter we follow Hull's convention,
common in the economics world, and set
Equation (16.21) seems similar to Eq. (16.16); however, in our approach
the average of the product in the second term is not zero. Thus Eq. (16.21) should
not be regarded as an Ito equation.
The custom in the economics field, under the definition of the Ito integral, is to
replace Eq. (16.16) by the equation
Hull:
where t_c = t − ε is a slightly earlier time than t. This guarantees that the average of
the second term vanishes, because in an integration from t to t + Δt the time t_c is
not included in the region of integration. Equation (16.22) is then a true Ito
equation, but it is not equivalent to Eq. (16.21).
Do these models, Eq. (16.12) and Eq. (16.16), or Eq. (16.21) and Eq. (16.22),
yield different results?
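The two readings can be compared in a direct Monte Carlo experiment (a sketch with arbitrary parameters, not from the text): if x = ln S performs Brownian motion with drift μ, the mean price grows as exp[(μ + σ_H²/2)T], whereas Eq. (16.16), taken literally as an Ito equation, yields mean growth exp(μT).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T = 0.05, 0.4, 1.0
n_steps, n_paths = 250, 100_000
dt = T / n_steps

x = np.zeros(n_paths)       # Eq. (16.12): x = ln S does Brownian motion with drift
S_ito = np.ones(n_paths)    # Eq. (16.16) read literally as an Ito equation
for _ in range(n_steps):
    dz = rng.standard_normal(n_paths) * np.sqrt(dt)
    x += mu * dt + sigma * dz
    S_ito *= 1.0 + mu * dt + sigma * dz   # Euler-Maruyama step on dS = mu S dt + sigma S dz

S_log = np.exp(x)
print(S_log.mean())   # theory: exp((mu + sigma**2 / 2) * T)
print(S_ito.mean())   # theory: exp(mu * T)
```

The gap between the two sample means is exactly the σ_H²/2 drift shift discussed in this section.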
Equation (16.21) has been solved for a well behaved stochastic variable f(t)
(finite, not a δ function), such as one with a Gaussian correlation in time (see
Section 10.2). The average of the product in the second term is not zero in a finite
time interval, and it remains nonzero as one approaches the white noise limit by
letting the correlation time approach zero as Δt → 0. By specializing to the
delta-correlated case, Eq. (10.27) can be written in the form
Lax :
which is in agreement with Eq. (16.14). For the average, S on the right hand side
of Eq. (16.26) is replaced by ⟨S⟩.
In our Langevin expression, the first term represents motion driven by a finite
number of driving forces on the system and hence is a deterministic function of
time. The second term represents the fluctuation driven by many unknown random
forces. Under a transformation of variables, the ordinary calculus rule can be applied,
separately, to both terms. Hence, the meanings of drift and fluctuation remain clearly
separated in the first and second terms after the transformation of variables. The average
of the second term is generally nonzero when σ_H S is not a constant.
16.5 The Ito stochastic differential equation
where
The Riemann integral exists if the sum approaches a limit independent of the
placement of t_j in the interval in Eq. (16.29). The Riemann integral exists, accord-
ing to Jeffreys and Jeffreys (1950), when the integrand is bounded over the interval
of integration and, for any positive ω and η, the interval of integration can be
divided into a finite set of intervals such that those with hops (jump discontinu-
ities) > ω have a total length < η. Our point is that Brownian motion violates
these conditions. See Ito (1951) and Doob (1953).
Ito (1951) avoids the difficulty by evaluating σ(a) at the beginning of the
interval, and evaluating the integral over f(t) as a Stieltjes integral

However, even the Stieltjes sum does not converge to a unique integral, and
the evaluation at the beginning of the interval is an arbitrary choice. The effect of
this choice, since f(s) is independent of a(t) for t < s, is that the average of the
second term in Eq. (16.27) vanishes, so that the Ito drift vector is
where D = σ(a)² is the diffusion coefficient. This result is in agreement with that
found in Stratonovich,
The justification for our procedure is that physical processes are described by
noise that is only approximately white. For the physical process, one can use the
ordinary methods of calculus. The iteration is necessary to retain terms that keep
a finite value in the limit as the correlation of the noise approaches a delta
function. Direct use of the Ito choice, Eq. (16.32), starting from Eq. (16.16),
leads to Eqs. (4.6|11.1) in Hull (1989|2001), the result quoted in our Eq. (16.25).
We simply claim that this result is not the answer to the original model, namely
that the logarithm of the price obeys the standard Brownian motion.
A direct proof of this remark can be made without using stochastic inte-
grals. Note that the Gaussian distribution P(x, t) satisfies the Fokker-Planck
equation

∂P/∂t = −A ∂P/∂x + D ∂²P/∂x²,

which contains the constant diffusion term D = (1/2)σ_H² and the constant drift
term A = μ. One can obtain the equation for S from the equation for P(x, t) by
introducing the relation x = ln[S/S(0)]:
written in our notation, where t_c = t − ε guarantees that the last term averages to
zero. The same equation written in Ito notation looks like:
The confusion in the financial literature arises because Eqs. (3.7|10.6) in Hull
(1989|2001) state that his model is

Lax :

but Hull occasionally multiplies this equation by S (without strict use of the
Ito calculus lemma) to obtain
or Hull :
In summary, we do not claim that the Ito definition is wrong, but that it requires
extreme care to obtain correct results. It tends to mislead smart people into obtain-
ing an incorrect answer. The proper intuitive view of the Black-Scholes model
is that x, the logarithm of the price, obeys a standard Brownian motion. In
other words, Eq. (16.12) is the correct model regardless of which calculus is used.
When one makes a change of variable from the logarithm x to the actual price
S, the appropriate stochastic differential equation that obeys the Ito rules will be
Eq. (16.39). The differences between Hull's results and mine (Lax) (as well as
with some of his own results) are due primarily to the use of two different models.
The Ito notation merely obscures this point; it can be used, with great care, to obtain
correct results.
Models based on Ito's lemma should be avoided because they are counter-
intuitive for physical reasons. On the other hand, the procedure used in Section
10.3 would reduce the number of errors and avoid the use of Stieltjes and Lebesgue
integration. The main disadvantage of our proposal is that it will reduce the number
of jobs needed for mathematicians to teach measure theory. The discussion of
models for stock prices and market behavior can then be devoted more heavily to
real world questions and less heavily to formalism.
For volatile stocks, the difference between the two possible market models,
Eq. (16.12) and Eq. (16.16), can be appreciable. In a completely rational world, the
growth parameter would be determined completely by the growth rate in earnings
per share. Since this is not the case in the real world, the parameter μ is obtained
by fitting against stock prices. Depending on how this is done, the fitting procedure
might cancel the error made in using the model Eq. (16.16) instead of Eq. (16.12).
In our work on laser line-widths discussed in Chapter 11, the growth and decay
rates can be determined separately, and there is no flexibility in our choice. The
excellent agreement of the laser line-widths with experiment, as shown in Fig.
11.3, supports the iterative procedure used in Chapter 10 for relating the Langevin
to the Fokker-Planck pictures. We expect that the mathematical techniques
developed for the study of random processes in physical systems can be applied in
the future to the economic and financial worlds.
VALUE OF A FORWARD CONTRACT ON A STOCK 281
16.6 Value of a forward contract on a stock
As the simplest application of Ito's lemma, Hull (Example 4.1) considers the value
of a forward contract on a nondividend paying stock. We already found in Eq.
(16.2) that the forward price should be

F = S e^{r(T−t)}.     (16.41)

The extra term involving ∂²F/∂S² due to Ito's lemma vanishes for the choice of
Eq. (16.41), since F is linear in S. Thus F obeys dynamics, including the noise,
similar to those of the stock price S, but with the growth rate ν reduced by the risk-
free interest rate r. However, in Eq. (16.43), ν = μ + (1/2)σ_H² when the physical
stock model is used in Ito's formula, which leads to [μ − r + (1/2)σ_H²] F dt in the
first term.
Now we use our Langevin approach, described in Section 10.3, where
ordinary calculus can be used. From Eq. (16.12) we have for S

we have

and for the conditional average, ⟨F⟩ on the right hand side is replaced by F, which
agrees with the result of the Ito approach.
Our Langevin approach is easier than the one using Ito's lemma. Instead of applying
the Ito calculus lemma at each step of the transformation from x to S, and then to F,
the ordinary calculus rule can be used at each step of our approach, and d⟨F⟩/dt is then
determined using Eq. (16.46) at the last step.
We showed in Section 16.2 that the risk of owning an asset can be canceled
by also holding a forward contract to sell the asset, and the combination is risk
neutral provided that an appropriate value F is set for the forward contract. Can
this scheme be extended to the case where the asset is a stock that has a growth rate
μ and is subject to noise proportional to the value of the stock? It is assumed that
the derivative asset (say a put) has a value g = g(S, t) per derivative security.
Using Ito's lemma, g(S, t) obeys the stochastic differential equation

where, by the Ito convention, the second term has a vanishing average value. To
obtain a risk free portfolio, we must have a combination of assets in which the
term related to the rate of change of the stock, ν, vanishes. This can be accomplished by
combining a put, the equivalent of −1 shares, which takes the value −g, with ∂g/∂S
shares of the stock. The combined value is

where the subscript t reminds us that the number of shares does not change during
the time evolution, except when it is adjusted by the investor.
The ratio of these two components was chosen so that the ν contributions
cancel each other. This cancellation occurs if one follows Hull and writes

Assuming the validity of this result, the value of Π must grow at the interest rate r,
or arbitrageurs could make a risk free profit:
DISCUSSION 283
The result is the differential equation

∂g/∂t + r S ∂g/∂S + (1/2) σ_H² S² ∂²g/∂S² = r g,

where
The contribution to the average, or the conditional average with Π(t) = Π, from
the second term in Eq. (16.55), is the average of
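A numerical cross-check is possible (a sketch; it assumes the standard closed-form Black-Scholes call price, which is not derived in this section): finite differences confirm that the call value g satisfies ∂g/∂t + rS ∂g/∂S + (1/2)σ²S² ∂²g/∂S² = rg.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Standard Black-Scholes European call, tau = T - t years to maturity."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Finite-difference check of  g_t + r S g_S + (1/2) sigma^2 S^2 g_SS = r g.
# Note g depends on calendar time t through tau = T - t, so g_t = -dg/dtau.
S, K, r, sigma, tau, h = 100.0, 95.0, 0.05, 0.3, 0.5, 1e-3
g = bs_call(S, K, r, sigma, tau)
g_t = -(bs_call(S, K, r, sigma, tau + h) - bs_call(S, K, r, sigma, tau - h)) / (2 * h)
g_S = (bs_call(S + h, K, r, sigma, tau) - bs_call(S - h, K, r, sigma, tau)) / (2 * h)
g_SS = (bs_call(S + h, K, r, sigma, tau) - 2 * g + bs_call(S - h, K, r, sigma, tau)) / h**2
lhs = g_t + r * S * g_S + 0.5 * sigma**2 * S**2 * g_SS
print(abs(lhs - r * g))   # near zero: the risk-free-growth equation holds
```

The residual is at the level of finite-difference error, illustrating that the no-arbitrage growth condition on the hedged portfolio is built into the option price.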
16.8 Discussion
Stock model
The motivation for separating the right hand side of the stochastic differential equation
into two terms, and for assigning a standard fluctuation force f(t) to the second term,
comes not from mathematics but from modeling of the real world.
In physics, using physical modeling of a system with a finite number of parameters
(for example, an oscillator, or a system of coupled oscillators) and using the physical
rules (for example, Newton's laws), the deterministic drift veloc-
ity B(a, t) of a system can be determined, and it should be a smooth function of
time t. On the other hand, the fluctuation force comes from the connection of this sys-
tem with a "heat reservoir", which has infinitely many degrees of freedom. See Sections 7.6
and 7.7 for a detailed discussion.
Concerning the price of a stock: in a completely rational world, the growth
parameter of the price would be determined by the growth rate in earn-
ings per share and other known conditions. Since this is not the case in the real
world, the stock price fluctuates because of many unknown causes. The fluctuation
force introduces an irregular, nondeterministic, and very rapidly varying oscillatory
motion of S with time.
The LSDE builds a model to represent the effects of two different original forces on
the system. Of course, there are some stochastic processes (for example, scattering
in a turbid medium in physics) where fluctuation plays the dominant role, and
separating the LSDE into two terms is not appropriate. Here, however, we limit
ourselves to cases where the model in the LSDE is suitable, as it is in the ISDE.
The equation of motion described by a linear model of a stock, Eq. (16.12)
or Eq. (16.15), is only the first perturbative approximation to a nonlinear model,
and it is valid only over a very short period from the spot time. Over a longer period
of time, the first term in this equation implies that the price S will increase (or
decrease) exponentially, which certainly does not reflect the real development of S.
Also, the second term implies that the width of the fluctuation of S will
increase with time, with no force to limit the amplitude of the fluctuation.
We suggest the following model for a stock S. First, a model for the underlying
value of the stock, S_0(t), can be built based on fundamental analysis; it is a
deterministic and slowly varying function of time. One may also build a linear model
of S_0(t) by extrapolating previous data of S to obtain S_0(t) = n(t −
t_0) S_0(t_0), where S_0(t_0) is the extension of the previous value of S_0 up to the spot time
t_0, and n is the rate of increase (or decrease) of the stock. S_0(t) provides an
"operating point" for the stock S as a random variable.
We define S̃(t) = S(t) − S_0(t), and build a stochastic differential equation
for S̃(t). The stock value S(t) oscillates around S_0(t); hence B(S̃, t) represents
a model of the drift velocity of S̃. The simplest model for this purpose is a harmonic
oscillator, in which a restoring force F = −αS̃ makes S̃(t) oscillate
around S̃ = 0 with a certain amplitude. The amplitude of S̃(t) limits the value of
S(t) to a certain range. When a fluctuation force is added, the value of S(t)
will not diffuse to infinity with time.
However, the velocity of a harmonic motion is too fast when S̃ is near zero,
and too slow when S̃ is near its amplitude value. The real situation is that S(t)
changes slowly when its value is near S_0(t), but changes rapidly when its value
deviates greatly from S_0(t). This suggests that an anharmonic model is needed;
for example, the restoring force can be chosen as F = −αS̃ − βS̃³.
Recently, the form B(S̃, t) = −αS̃ has often been used, which originates from
Uhlenbeck's model. Under this linear model, S(t) exponentially approaches
S_0(t) from its spot value. The advantage of this model is that it is easy to manipulate in
calculations. However, one may ask why S(t) only approaches S_0(t) from one
side and cannot cross through S_0(t) to the other side.
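A quick simulation (a sketch with arbitrary illustrative parameters) shows that once the fluctuation force is added, the linear restoring model does let the deviation s(t) = S(t) − S_0(t) cross zero repeatedly, while its fluctuations stay bounded near the stationary width σ/√(2α):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, sigma, dt, n = 2.0, 0.5, 0.01, 5000

s = np.zeros(n)   # s(t) = S(t) - S0(t), the deviation from the operating point
for t in range(1, n):
    # Euler step for ds = -alpha s dt + sigma dz (linear restoring force)
    s[t] = s[t - 1] - alpha * s[t - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()

print(s.std())                                         # near sigma / sqrt(2 * alpha) = 0.25
print(int((np.sign(s[1:]) != np.sign(s[:-1])).sum()))  # many crossings of s = 0
```

So the deterministic solution approaches S_0(t) from one side, but the full stochastic path crosses it many times; the one-sided behavior belongs only to the noiseless drift.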
If S̃ is limited to oscillate in a certain range, σ(S) in the second term of the SDE
should not change dramatically, and the extra term in Ito's lemma (or the contri-
bution to the expectation from the second term in the LSDE) may be relatively small and
can be treated by some approximation, which reduces the difficulty of solving the
equation of motion of the expectation value of a derivative security with time.
Conditional expectation
We emphasize that the stochastic processes in finance, like those in physical
science, are natural processes. People who have some prior information and
knowledge may build a more appropriate model to match the development of the nat-
ural stochastic process of a marketed asset, but in general they do not change this
natural process. The state of a random variable a at time t is best described by
its probability distribution P(a, t), not by its spot value realized at time t, because
the spot value of a is undetermined and jumps up and down very rapidly.
Based on this viewpoint, we would like to discuss the concept of conditional
expectation.
Mathematicians denote the conditional expectation by the symbol E(X_{t′} | F_t),
t′ > t, where X_t is a stochastic process, and F_t is a σ-field known as the "natural
filtration". For example, an estimate of the price of an option (or future) H(T − t)
with maturity time T is determined based on the spot value of its underlying stock
S at the current time t. Put differently, one already has the information that the
closing price of S(t) today is S. This information provides a filtration under which
the expectation of S(t′) in the future can be calculated. However, one may ask
whether the price H(T − t) determined from S at 4:00 PM, or
that from S at 3:50 PM, is more reasonable, since the spot value of S may
differ markedly because of irregular jumps during the last ten minutes before closing.
In our opinion, the expectation based on the probability distribution P(S, t) at the spot
time t could provide a more reasonable estimate of H(T − t) than the conditional
expectation, because the probability P(S, t) cannot change dramatically during a
small interval of time, whereas its realized value can jump up or down during a very
short period. In physics, we use an "ensemble" to describe all possible states
of a system under a certain probability, and do not regard a single sample in an ensemble
as a meaningful quantity. In Monte Carlo simulations we do not regard a single
path as essential; similarly, the point-by-point values in the time path of a stock are not
essential.
Discrete processes
Discrete random processes, for example the early process after a sudden big
jump of a stock, are beyond the LSDE description and also beyond the ISDE. In dis-
crete random processes, the nth order diffusion coefficients D_n are nonzero up to
infinite order in n, and Eq. (10.4) should be replaced by a series of equations:
16.9 Summary
17.1 Overview
Thomson's contributions
My (Lax) interest in the field of time series has been greatly stimulated by personal
contact with David J. Thomson, who has spent a lifetime career covering all aspects
of time series. We can describe the present chapter as our attempt to learn enough
about time series to be able to read Thomson's work. We shall therefore record a
subset of his publications to indicate the breadth of topics covered.
His work started, appropriately for Bell Laboratories, with the analysis of time
series in waveguides used to transmit information in the telephone network. See
Thomson (1977). This work was expanded in Thomson (1982) already referred to.
His 1982 paper constitutes the foundation of much of his later work. In Kleiner,
Martin and Thomson (1979) Thomson shows how to apply Tukey's ideas of robust-
ness to spectral estimation. Tukey's book on Exploratory Data Analysis shows how
to deal with real data. When is an apparently deviating point an outlier to be dis-
carded? See Tukey (1977) and Thomson (1982). His work was also applied to the
THE WIENER-KHINCHINE AND WOLD THEOREMS 291
global warming problem in Kuo, Lindberg, Craig, and Thomson (1990). This first
serious paper on 'recent' climate has been cited by Al Gore! Thomson (1990a)
extended the analysis of the earth's climate to a period of 20,000 years, and correlated
CO2 data with tree-ring data. His next work, Thomson (1990b), extended the time
series over 600,000 years and established a sensitivity of the results to the small
time differences between the sidereal year, the equatorial year, and the solar year.
Thomson and Chave (1991) also adapted jack-knife procedures to deal with non-
normal variables and confidence limits. The current picture of global heating is
discussed in Thomson (1995). Thomson, MacLennan, and Lanzerotti (1995) use
time series techniques to analyze the propagation of solar oscillations through the
interplanetary medium.
Perhaps Thomson's most important work on global heating filters the preces-
sion signal out of the data and establishes a strong correlation between global
warming and the CO2 concentration. See Thomson (1997).
An application is made to the financial world by studying stock and commodity
data over a 40 year period; see Ramsey and Thomson (1999). A comprehensive
review was given by Thomson (1998) of his work on "Multitaper Analysis of
Nonstationary and Nonlinear Series Data", presented at the Isaac Newton Institute.
The Stieltjes form is needed for mathematical rigor when the spectrum of X con-
tains Brownian motion (white noise), or a delta-function time autocorrelation.
In the real world, in which the noise can be approximately white and spectral lines
are narrow, but not infinitely so, Stieltjes integrals are unnecessary, and our simpler
292 SPECTRAL ANALYSIS OF ECONOMIC TIME SERIES
notation can be followed. Even when white noise is present, the second moments
are defined via Eq. (4.50) and the relation
The first and last terms follow the notation of the time series texts by Percival
and Walden (1993) and Priestley (1981), and the middle terms follow the notation
in this book. The Wiener-Khinchine theorem (for continuous time), Eq. (4.11),
takes the form
For the discrete time case, with x(t) known only at the integers t = 0, ±1, ±2, ...,
the same theorem applies, but the limits of integration in Eq. (17.5) extend
only from −1/2 to 1/2. That is because exp(j2πft) is indistinguishable from
exp[j2π(f + 1)t] when t is an integer. Thus all frequencies outside the basic band-
width from −1/2 to 1/2 are folded into that basic bandwidth. This process is
referred to as aliasing, and is well known to anyone who has watched rotating
wheels in the movies (at 1/24th of a second) and found that they can appear to be
rotating backward. In the study of crystals, there is a similar folding of all wave
vectors into the first Brillouin zone.
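This statement is easy to check numerically (a trivial sketch): at integer sampling times, frequencies f and f + 1 give identical samples.

```python
import numpy as np

t = np.arange(8)                        # integer sampling times
f = 0.2
a = np.exp(2j * np.pi * f * t)
b = np.exp(2j * np.pi * (f + 1.0) * t)
print(np.max(np.abs(a - b)))            # zero up to rounding: f and f + 1 alias
```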
Since the spectrum S(f) is, by definition, positive, the normalized p(f) =
S(f) / ∫ S(f) df is a probability density. Correspondingly, we define
Priestley (1981) describes the Wold (1938) theorem, Eq. (17.7) below, as the
necessary and sufficient condition for a set of numbers ρ(±n) to be an
autocorrelation. The Wold condition is

that is, the autocorrelation ρ(τ) must be the Fourier transform of a probability
density p(f).
Assuming that we are dealing with a stationary random process, the first objec-
tive in dealing with a time series is to obtain its spectrum. The simplest procedure,
for a single discrete sample of x(t) for t = 1, 2, 3, ..., N, is to use the estimate
where S_T is given by

with

the Nyquist sampling frequency.
If one makes the common choice of units such that Δt = 1, one obtains Eq. (17.8).
Equation (17.10) is biased: if all the summands were unity, the result would be
1 − |τ|/N. The correlator associated with this periodogram can be modified by
replacing 1/N by 1/N′ in Eq. (17.10), with N′ given the value N − |τ|, and the
"periodogram" is then said to be unbiased. However, due to variance errors, the biased
(original) periodogram is often superior. We here follow the notation and approach
of Percival and Walden (1993), including the assumption that the process mean is
zero, which is easily arranged by subtracting the mean from each variable. By
rearranging the order of summation, the inverse of Eq. (17.10) then permits the
spectrum to be estimated by
Thomson and Chave (1991), however, claim that the use of autocorrelations
and spectral densities throws away phase information. Thomson therefore follows a
procedure outlined in Section 17.4.
since E(X) = μ. Note that X̄ is still a random variable; no ensemble average has
been performed. The variance of X_t is defined by
where, again, the last expression is valid only in the stationary case.
The covariance of two RV is defined by
If one sums first along the diagonal (at fixed τ = t − s), Eq. (17.18) can be rewritten
as
where we note that N − |τ| is the length of the relevant diagonal and |τ| < N.
Since the sum converges, var(X̄) approaches 0 as N → ∞, so the unbiased
estimate X̄ is also consistent.
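The diagonal summation can be checked against the raw double sum in a few lines (a sketch; the exponential autocovariance R(τ) = ρ^{|τ|} is an assumed example):

```python
import numpy as np

def var_of_mean(R):
    """var(Xbar) = (1/N^2) * sum_{|tau|<N} (N - |tau|) R(tau),
    obtained by summing the covariance double sum along its diagonals."""
    N = len(R)
    tau = np.arange(1, N)
    return (N * R[0] + 2.0 * np.sum((N - tau) * R[tau])) / N**2

N, rho = 10, 0.5
R = rho ** np.arange(N)                  # assumed autocovariance R(tau) = rho^|tau|
brute = sum(rho ** abs(t - s) for t in range(N) for s in range(N)) / N**2
print(var_of_mean(R), brute)             # identical values
```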
The Karhunen-Loeve theorem (Karhunen 1947; see also Kac and Siegert 1947)
states that if the orthogonal functions are chosen to be eigenfunctions of the cor-
relation function R(t, s), the expansion coefficients will be uncorrelated random
variables.
SLEPIAN FUNCTIONS 295
Proof
Start with the expansion
with
In the case of discrete time, the integral over time is replaced by a sum over time.
But the theorem remains valid.
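The content of the theorem is easy to see numerically in the discrete case (a sketch with an assumed exponential correlation matrix): expanding Gaussian samples in the eigenvectors of R yields coefficients whose sample covariance is nearly diagonal, with variances given by the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
t = np.arange(N)
R = np.exp(-np.abs(t[:, None] - t[None, :]) / 10.0)  # assumed correlation matrix

w, U = np.linalg.eigh(R)                  # R = U diag(w) U^T, orthonormal columns
X = rng.multivariate_normal(np.zeros(N), R, size=20000)
C = X @ U                                 # rows of C: expansion coefficients c_k
cov = np.cov(C, rowvar=False)

off_diag = cov - np.diag(np.diag(cov))
print(np.max(np.abs(off_diag)))           # small: coefficients are uncorrelated
print(np.max(np.abs(np.diag(cov) - w)))   # variances track the eigenvalues
```

Any other orthogonal basis would leave the coefficient covariance non-diagonal; diagonality is special to the eigenfunctions of R.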
where for convenience, times are measured from the center of the series. The limits
±1/2 are appropriate for the spacing At = 1 since that corresponds to the Nyquist
sampling rate, and frequencies outside the first zone can be shifted into the first
zone by a change of an integer, which has no effect on the exponential. The Stieltjes
integral can be replaced by a Riemann integral for a continuous spectrum by
making the replacement
as was done to get the second form of Eq. (17.27). It is necessary to remember that
x(n) is the discrete time series, and that dZ(v) and x(v) are in frequency space.
A zeroth approximation to the frequency amplitude is given by taking the discrete
FFT:
Since Eq. (17.29) provides a Fourier series representation of y(f) with period 1,
the Fourier coefficients are given by
If one now inserts Eq. (17.27) for x(n) into Eq. (17.29) one obtains
In the presence of band limitation, W will be less than 1/2. These eigenfunctions
are referred to as the Slepian functions, in honor of Slepian who recognized their
importance in signal representation. Each eigenfunction depends on N and W as
parameters, so a complete labeling of the Slepian eigenfunctions is Uk(N, W, /).
These are also referred to as Discrete Prolate Spheroidal Functions (DPSF).
To obtain spectral amplitudes, we must solve Eq. (17.31) for x(ν) in terms of
y(f), the preliminary direct spectrum (where ν is also a frequency). By expanding
the known y(f) in terms of the eigenvectors
or in final form
The properties of the Slepian (1978) functions are analyzed in great detail in his
paper. The eigenvalues are found to be nondegenerate and monotonically decreasing:
Comparison between Eqs. (17.38) and (17.39) shows that the eigenvalue λ_k repre-
sents the ratio of the energy in the inner region [−W, W] to that in the full region
[−1/2, 1/2]. This guarantees that all the eigenvalues are less than one, and in view
of Eq. (17.37), the shape most concentrated in frequency is associated with the
k = 0 mode. Indeed, that is why Eberhard (1973) advocated using that mode,
alone, as the appropriate window.
where
TABLE 17.1. Eigenvalues, λ_k(N, W), of the discrete prolate spheroidal series
(after Percival and Walden 1993).

k    λ_k(31, 6/31)         λ_k(31, 7/31)         λ_k(31, 8/31)
0    0.9999999999999997    1.000000000000007     1.000000000000002
1    0.9999999999999769    0.9999999999999933    1.000000000000001
2    0.9999999999978725    0.9999999999999921    0.9999999999999945
3    0.9999999998764069    0.9999999999998924    0.9999999999999908
The eigenvalues are so close to unity that the table even includes some values above
unity, which are clearly wrong. This inaccuracy means that the associated eigenvectors,
computed using these values, will also be inaccurate.
In the case of continuous time problems, Slepian and Pollak (1961) found that the
eigenfunctions that minimize the amount of energy outside some time boundary
[−1/2, 1/2], and obey an integral equation, can also be found as solutions of a
second order differential equation. For the discrete time problem, the solutions of
an integral equation of the form of Eq. (17.33) can also be obtained as solutions of
a second order difference equation
This matrix equation has eigenvectors v_k and eigenvalues θ_k as functions of N
and W. The eigenvectors v_k(N, W) are identical to those of the original integral
equation. However, the eigenvalues θ_k(N, W) are now well separated. As a result,
it is easy to calculate the eigenvectors v_k accurately, without requiring quadruple
precision, and λ_k can then be determined by
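A minimal sketch of this tridiagonal route (our own illustrative implementation, using the standard difference-equation coefficients; λ_k is recovered afterward from the sinc kernel of the concentration problem) reproduces the near-unity values of Table 17.1 for N = 31, W = 6/31:

```python
import numpy as np

def slepian_sequences(N, W, K):
    """Leading K Slepian sequences from the tridiagonal difference operator,
    whose well-separated eigenvalues make the eigenvectors easy to compute."""
    t = np.arange(N)
    diag = ((N - 1 - 2.0 * t) / 2.0) ** 2 * np.cos(2.0 * np.pi * W)
    off = t[1:] * (N - t[1:]) / 2.0
    T = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(T)
    return vecs[:, ::-1][:, :K]          # most concentrated sequences first

def concentration(v, W):
    """lambda_k: fraction of the energy of v's transform inside [-W, W],
    computed with the sinc kernel of the integral equation."""
    N = len(v)
    d = np.subtract.outer(np.arange(N), np.arange(N))
    A = np.sin(2.0 * np.pi * W * d) / (np.pi * np.where(d == 0, 1, d))
    np.fill_diagonal(A, 2.0 * W)
    return float(v @ A @ v) / float(v @ v)

V = slepian_sequences(31, 6 / 31, 4)
lams = [concentration(V[:, k], 6 / 31) for k in range(4)]
print(lams)     # compare Table 17.1: all four values extremely close to 1
```

Because the tridiagonal eigenvalues θ_k are well separated, double precision suffices here, while computing λ_k directly from the nearly degenerate concentration kernel would not resolve the eigenvectors.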
The procedure used by Thomson (1982) to analyze time series, in 41 IEEE pages,
is sufficiently complicated that we would like to give a road-map.
(1) The first (optional) step is to replace the original input data by a new set that is
prewhitened.
An example of such a procedure is outlined by Percival and Walden (1993).
Convolve X_t with g_u to get
If one can choose g(f) to cancel most of the frequency dependence of S_X(f), then
S_Y(f) will be nearly constant and easier to evaluate accurately by the methods
described below.
Of course, this is a chicken-and-egg problem, since it supposes that an approxi-
mate spectrum, Ŝ(f), is already known. A periodogram, or a parametric approach,
can be used to get a zeroth approximation to the spectrum of X. If one can choose
a prewindow filter that reduces the dynamic range of the resulting RV, the use of a
Slepian window will yield a more accurate estimate of the spectrum.
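Prewhitening can be illustrated with an assumed AR(1) example (a sketch; here the filter g = (1, −φ) is taken from the known model, sidestepping the chicken-and-egg estimation step discussed above):

```python
import numpy as np

rng = np.random.default_rng(3)
phi, N = 0.9, 4096
e = rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = phi * x[t - 1] + e[t]      # AR(1): strongly colored ("red") spectrum

y = x[1:] - phi * x[:-1]              # prewhitening filter g = (1, -phi)

Px = np.abs(np.fft.rfft(x)) ** 2 / N
Py = np.abs(np.fft.rfft(y)) ** 2 / (N - 1)
# the dynamic range of the periodogram collapses after prewhitening
print(Px.max() / Px.mean(), Py.max() / Py.mean())
```

The filtered series is (apart from the initial transient) the white innovation sequence, so its spectrum is nearly flat and far easier to estimate accurately.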
(2) Post-smoothing by a second window produces a modified estimate
The first preliminary estimate of the spectral amplitude is y(f), the discrete Fourier
transform of x(n), given in Eq. (17.29). But this covers the full frequency range,
[−1/2, 1/2] with a single formula. Thomson (1982), in his Eq. (3.1), suggests that a
Fourier transform truncated to the interval [f − W, f + W] would provide better
resolution, using the formula
This proposal is an excellent one, since one can prove that if S(f) were flat over
this smaller interval, it would yield the correct spectral density at /.
Presumably because Eq. (17.51) cannot be expressed directly in terms of the
observed x(n), Thomson introduces another formula for the quantity y_k(f), related
to Z_k(f) by Eq. (17.58) in the next section,

which also covers the truncated frequency region. Since y(f) is expressible in
terms of the data x(n) via Eq. (17.31), he then obtains
after using the integral equation (17.33) for the Slepian functions. The advantage
of this new form is that it is directly expressible in terms of the data.
The disadvantage of this estimate is that it is no longer local but has contributions
from the entire [−1/2, 1/2] domain.
302 SPECTRAL ANALYSIS OF ECONOMIC TIME SERIES
Thomson then uses y_k(f_0) to obtain an estimate of the spectral density S(f; f_0)
at any f in the interval f_0 − W < f < f_0 + W. He then averages this result over
the interval to obtain the average result
where each
can be regarded as an individual spectral estimate, with the kth data window
namely the differences between the ideal estimate of the amplitude and the
weighted estimate d_k(f) y_k(f). The result is that the weights are estimated from
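The eigenspectrum average described above can be sketched as follows. This is an illustrative stand-in, not Thomson's full estimator: sine tapers replace the Slepian windows (they are orthonormal and need no eigenvalue computation), and equal weights 1/K replace the adaptive weights d_k(f).

```python
import numpy as np

def multitaper_psd(x, K=7):
    """Average of K orthonormally tapered eigenspectra |y_k(f)|^2.

    Sine tapers v_k(n) = sqrt(2/(N+1)) sin(pi k n / (N+1)) stand in
    here for Thomson's Slepian windows, and equal weights 1/K stand in
    for his adaptive weights d_k(f); both substitutions keep the
    sketch short.
    """
    N = len(x)
    n = np.arange(1, N + 1)
    k = np.arange(1, K + 1)
    tapers = np.sqrt(2.0 / (N + 1)) * np.sin(
        np.pi * k[:, None] * n[None, :] / (N + 1))
    yk = np.fft.rfft(tapers * x, axis=1)   # eigenspectra y_k(f)
    Sk = np.abs(yk) ** 2                   # individual spectral estimates
    freqs = np.fft.rfftfreq(N)             # f in cycles per sample
    return freqs, Sk.mean(axis=0)          # average over the K tapers

# Usage: for unit-variance white noise, whose true spectral density is
# flat, the estimate should hover near 1 at all frequencies.
rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
freqs, S = multitaper_psd(x)
```

Because each taper has unit energy, each |y_k(f)|² is an (approximately unbiased) spectral estimate, and averaging K of them reduces the variance by roughly 1/K.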
For an elementary discussion of the removal of periodicity and trend, see Williams
(1997). For a broad perspective on spectral analysis, see Tukey (1961). A good gen-
eral reference is Handbook of Statistics III, by Brillinger and Krishnaiah (1983),
as are Harris (1967) and Tukey (1967).
A more detailed description of the techniques used in what is now called
"complex demodulation" is given in Hassan (1963).
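As a rough numerical illustration of what complex demodulation does (the function and parameter choices below are my own, not from Hassan (1963)): multiply the series by e^(−2πi f_0 n) to shift the band around f_0 down to zero frequency, then low-pass filter; the slowly varying amplitude and phase of the f_0 component remain.

```python
import numpy as np

def complex_demodulate(x, f0, width=20):
    """Shift the band near frequency f0 (cycles per sample) down to zero
    frequency, then smooth with a length-`width` moving average (a crude
    low-pass filter; width is chosen so that width * 2 * f0 is an
    integer, which nulls the double-frequency term exactly)."""
    n = np.arange(len(x))
    z = x * np.exp(-2j * np.pi * f0 * n)        # demodulate: band at f0 -> 0
    kernel = np.ones(width) / width
    smooth = np.convolve(z, kernel, mode="same")
    return 2 * np.abs(smooth), np.angle(smooth)  # amplitude, phase

# Usage: recover a slowly varying envelope riding on a tone at
# 0.05 cycles per sample.
n = np.arange(2000)
envelope = 1 + 0.5 * np.sin(2 * np.pi * n / 2000)
x = envelope * np.cos(2 * np.pi * 0.05 * n)
amp, ph = complex_demodulate(x, 0.05)
```

Away from the edges of the record, `amp` tracks the envelope and `ph` stays near zero, which is the essential content of the demodulation technique.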
The methods described in this chapter are nonparametric, in the sense that one
does not fit a model with a small number of parameters to the experimental time
series data. Fourier techniques have been used, and in a sense parameters are
involved; but there are so many of them that no real model assumptions are made,
except possibly that one is dealing with a stationary process. A recent comparison
of the relative merits of parametric and Fourier-transform methods has been given
by Gardiner (1992), who also suggests methods of combining the two. For an
excellent elementary presentation of methods of dealing with nonstationary time
series, see Cohen (1995), who discusses the effects of many of the popular filters.
This theorem can be established by taking the left-hand sum from −N to N and
then taking the limit. A simpler approach is to note that the right-hand side, g(x),
regarded as a function of x, is periodic.
The Fourier series representation over the domain 0 < x < a leads to the relation
where
In order for Eq. (17.65) to be true, we must have
In order that g(x) be periodic, Δ(x − x') must possess the same delta function in
each interval from na to (n + 1)a. In other words,
If this result is equated to the definition, Eq. (17.66) we have the special Poisson
formula, Eq. (17.63).
If we integrate Eq. (17.63) against an arbitrary function f(x), we obtain the usual
Poisson sum formula (see Titchmarsh, 1948),
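The Poisson sum formula can be checked numerically for a Gaussian, for which both sides converge rapidly. The convention assumed here, F(k) = ∫ f(x) e^(−ikx) dx with Σ_n f(na) = (1/a) Σ_m F(2πm/a), may differ from the book's normalization by factors of 2π.

```python
import numpy as np

# Numerical check of the Poisson sum formula
#     sum_n f(n a) = (1/a) sum_m F(2 pi m / a)
# for the Gaussian f(x) = exp(-x^2), whose transform under the
# convention F(k) = integral f(x) exp(-i k x) dx is
#     F(k) = sqrt(pi) * exp(-k^2 / 4).
a = 0.7
n = np.arange(-50, 51)
lhs = float(np.sum(np.exp(-(n * a) ** 2)))          # lattice sum of f

m = np.arange(-50, 51)
k = 2 * np.pi * m / a                                # dual lattice in k
rhs = float(np.sum(np.sqrt(np.pi) * np.exp(-(k ** 2) / 4)) / a)
```

Both truncated sums have already converged to machine precision at |n|, |m| = 50, so `lhs` and `rhs` agree to roundoff, as the formula asserts.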
Proof
Since F(k) = 0 unless |k| < B = π/a, we can represent F(k) by a Fourier series
over this finite region:
APPENDIX A: THE SAMPLING THEOREM 305
where the Fourier coefficients are determined by
where the second form makes use of the definition, Eq. (17.70). If Eq. (17.74) is
inserted into Eq. (17.73), we obtain
Thus F(k) is uniquely determined by the sample values f(na). Inserting Eq.
(17.75) into Eq. (17.70), but restricting the integration region to the band from
−π/a to π/a, we immediately obtain Eq. (17.71), the sampling theorem.
An alternate view of the sampling theorem can be phrased as follows. Suppose
one has a set of values f(na) at the lattice points na, and we wish to inter-
polate to obtain the values f(x) at other points. The sampling theorem, in the
form of Eq. (17.71), provides the smoothest interpolation in the sense that any
other interpolation will not be band-limited, and hence will involve higher Fourier
components.
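The interpolation reading of Eq. (17.71) can be sketched directly, assuming the standard Whittaker–Shannon form f(x) = Σ_n f(na) sinc((x − na)/a) with sinc(t) = sin(πt)/(πt):

```python
import numpy as np

def sinc_interpolate(samples, a, x):
    """Whittaker-Shannon interpolation: reconstruct a band-limited f at
    the points x from its lattice values samples[n] = f(n a).

    np.sinc(t) equals sin(pi t)/(pi t), so sinc((x - n a)/a) is the
    band-limited interpolation kernel for the band |k| < pi/a."""
    n = np.arange(len(samples))
    kernel = np.sinc((x[:, None] - n[None, :] * a) / a)
    return kernel @ samples

# Usage: a tone whose frequency lies inside the band is recovered at
# off-lattice points (up to truncation of the infinite sum at the edges
# of the finite record).
a = 0.5                                      # lattice spacing; band edge pi/a
n = np.arange(400)
f = lambda t: np.cos(2 * np.pi * 0.3 * t)    # 0.3 < Nyquist rate 1/(2a) = 1.0
samples = f(n * a)
x = np.linspace(50.0, 150.0, 301)            # interior points, away from edges
fx = sinc_interpolate(samples, a, x)
```

Any other interpolation through the same lattice values differs from this one by a function that vanishes on the lattice, and such a function necessarily contains frequency components above the band edge π/a, which is the sense of "smoothest" used above.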
Bibliography