Exact Distribution of Continuous Variables in Sequential Analysis

Author(s): Paul A. Samuelson


Source: Econometrica, Vol. 16, No. 2 (Apr., 1948), pp. 191-198
Published by: The Econometric Society
Stable URL: http://www.jstor.org/stable/1907233
EXACT DISTRIBUTION OF CONTINUOUS VARIABLES
IN SEQUENTIAL ANALYSIS
By PAUL A. SAMUELSON
SUMMARY
ONE of the most important wartime developments in statistical
theory has been the field of "sequential analysis." Thanks largely to
the work of Wald,¹ we can now improve greatly upon the "classical"
methods that test hypotheses by means of samples of a predetermined
size. Instead, we can cut the work almost in half by letting the
observations themselves decide whether we should, at any stage, come
to a final decision or continue examining additional observations.
Because the results at any stage in sequential analysis depend upon
the results at previous stages, some complex problems of conditional
probability arise. For most practical purposes, it is not necessary to
solve these problems. Also, in two important special cases, exact
expressions for the relevant statistical distributions are available: (a)
for the case where the likelihood ratio is an equally-spaced discrete
variable; and (b) where we can neglect the "excess of the likelihood
ratio" beyond the critical boundaries.² Useful limits are available for
still other cases.
It may be of interest, nevertheless, to present general expressions for
the exact sequential distribution (1) of the likelihood ratio, (2) of the
power function, (3) of the probability of coming to a decision at the
end of n observations, and (4) of the factorial moment-generating
function of the latter probability distribution. Results will be given for
only the continuous case; but by means of Stieltjes integrals, the same
formulation will cover the discrete case, or more complicated mixed cases.
Some of the relationships between sequential analysis and the classical
probability problem of gambler's ruin are briefly developed.
* * *

1. Let $P_a(x)\,dx$ and $P_b(x)\,dx$ represent the probability distributions of
$x$ under two alternative hypotheses, $H_a$ and $H_b$; and let $z = \log_e [P_b(x)/P_a(x)] = z(x)$. Then the Wald sequential procedure is to calculate successively

$$Z_1 = z_1, \quad Z_2 = z_1 + z_2, \quad \cdots, \quad Z_n = z_1 + z_2 + \cdots + z_n, \quad \text{etc.}$$

¹ A. Wald, Sequential Analysis, New York: Wiley, 1947. See references there
to earlier work. Also, see the fundamental paper, Walter Bartky, "Multiple Sampling
with Constant Probability," Annals of Mathematical Statistics, Vol. 14,
September, 1943, pp. 363-377.

² Wald, op. cit., Appendix A.5; M. A. Girshick, "Contributions to the Theory of
Sequential Analysis, II," Annals of Mathematical Statistics, Vol. 17, September,
1946, pp. 282-291.


If at any stage $Z$ exceeds some specified constant, $a$, then $H_a$ is rejected
and $H_b$ accepted; if at any stage $Z$ is less than a negative constant, $b$,
$H_a$ is accepted and $H_b$ rejected; if at any stage $b < Z < a$, then a new
independent observation is drawn. The values of $a$ and $b$ are chosen in
advance according to how often we are willing to tolerate rejecting $H_a$
when it is really true, and rejecting $H_b$ when it is true.
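
To make the procedure concrete, here is a minimal sketch in Python, assuming two unit-variance normal hypotheses with means 0 and 1; the function name sprt, the data stream, and the boundary values are illustrative choices, not part of the original text (boundaries near $\pm 2.94$ correspond to roughly five-per-cent error rates under Wald's approximation).

```python
import random

def sprt(observations, a=2.944, b=-2.944):
    """Wald's sequential test: accumulate Z = sum of z(x) = log[P_b(x)/P_a(x)]
    and stop as soon as Z leaves the interval (b, a)."""
    Z = 0.0
    for n, x in enumerate(observations, start=1):
        # For H_a: N(0, 1) vs. H_b: N(1, 1), z(x) reduces to x - 1/2.
        Z += x - 0.5
        if Z >= a:
            return "accept H_b", n
        if Z <= b:
            return "accept H_a", n
    return "no decision", n

random.seed(0)
stream = (random.gauss(0.0, 1.0) for _ in range(10_000))  # H_a is true here
print(sprt(stream))   # typically decides within a dozen observations
```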
Whether $H_a$, $H_b$, or any other hypothesis is true, there will be a probability
density distribution of $z$: $F(z)\,dz$. The frequency density of $Z_1$,
$F_1(Z_1)$, is, of course, of exactly the same form, $F(Z_1)\,dZ_1$. But the conditional
distribution of $Z_2 = z_1 + z_2$ is no longer given by the simple "convolution,"
$F_2(Z_2) = F * F = \int_{-\infty}^{\infty} F(Z_2 - Z_1)\,F(Z_1)\,dZ_1$, and the usual theorems
concerning cumulative sums (approach to normality so long as $F$
has finite moments of given order, etc.) no longer hold. However, it
is clear that $F_2(Z_2)$ is given by a "truncated" convolution, in which $Z_1$
never goes beyond the limits $b$ and $a$; and more generally that
$F_n(Z_n)\,dZ_n$ can be expressed in terms of $F_{n-1}(Z_{n-1})\,dZ_{n-1}$ in the form

$$F_2(Z_2) = \int_b^a F(Z_2 - Z_1)\,F_1(Z_1)\,dZ_1 = \int_b^a F(Z_2 - s)\,F_1(s)\,ds, \qquad -\infty < Z_2 < \infty,$$

$$F_n(Z_n) = \int_b^a F(Z_n - s)\,F_{n-1}(s)\,ds, \qquad -\infty < Z_n < \infty.$$

These iterated integrals can always be calculated numerically or
otherwise. In connection with the random-walk problem, Kac³ has suggested
one possible expression which involves the "eigenvalues" and
"eigenfunctions" of the homogeneous Fredholm integral equation
$h(t) = \lambda \int_b^a F(t - s)\,h(s)\,ds$. However, particularly in the case of
nonsymmetrical frequency distributions, there seems to be little
computational merit in this approach.
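
For instance, the truncated convolutions can be iterated directly on a quadrature grid. A minimal numerical sketch in Python, assuming (purely for illustration) that $z$ is normal with mean 0.2 and unit variance and that trapezoidal weights are accurate enough; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def iterated_densities(F, a, b, n_max, m=2001):
    """Iterate F_n(Z) = int_b^a F(Z - s) F_{n-1}(s) ds on an equally
    spaced grid over (b, a), using the trapezoidal rule."""
    s = np.linspace(b, a, m)
    w = np.full(m, (a - b) / (m - 1))
    w[0] = w[-1] = w[0] / 2                  # trapezoidal end weights
    K = F(s[:, None] - s[None, :])           # kernel F(Z - s) on the grid
    dens = [F(s)]                            # F_1, restricted to (b, a)
    for _ in range(n_max - 1):
        dens.append(K @ (w * dens[-1]))      # F_n from F_{n-1}
    return s, dens

# The integral of F_n over (b, a) is the probability of still sampling at stage n.
s, dens = iterated_densities(lambda t: norm.pdf(t, loc=0.2), a=2.0, b=-2.0, n_max=5)
```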
2. But suppose we ask for the frequency density of $Z$, regardless of
the subscript showing when it occurs. Call this $g(Z)\,dZ$. This is arrived
at by taking the sum of all the $F$'s, namely

$$g(Z) = F_1(Z) + F_2(Z) + \cdots + F_n(Z) + \cdots. \tag{1}$$

Note that $g$, like the $F$'s, represents a "frequency density"; it is not a
"probability density" because its integral is different from unity.
³ M. Kac, "Random Walk in Presence of Absorbing Barriers," Annals of
Mathematical Statistics, Vol. 16, March, 1945, pp. 62-67.


The whole turns out to be simpler than each of its parts. The function,
$g(Z)$, can be shown to satisfy a basic nonhomogeneous integral
equation of the Fredholm type.

THEOREM 1: The frequency density of any $Z$ is uniquely defined by the relation

$$g(Z) = F(Z) + \int_b^a F(Z - s)\,g(s)\,ds. \tag{2}$$

If we substitute (1) into (2), the proof is immediate, once we verify
that the probability character of the problem leads to rapid convergence
of the infinite series in question.

However, (2) can be established directly by intuitive reasoning as
follows. Any given value of $Z$ can arise in only two ways: (a) by being
observed as a first observation, in which case its frequency density is
given by the first term on the right-hand side of (2); or (b) as a result
of a previously observed $Z$, provided that the previous $Z$ lies between
$b$ and $a$. The second term on the right-hand side of (2) defines, in terms
of the distribution of $g$ between $b$ and $a$, the frequency density of
observing such a "repeat" $Z$.
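
Numerically, (2) reduces on a quadrature grid to a finite linear system for $g$ inside $(b, a)$, after which $g$ at any outside point follows from one further quadrature. A minimal sketch under the same illustrative normal assumption as above; the names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def solve_g(F, a, b, m=2001):
    """Discretize g(Z) = F(Z) + int_b^a F(Z - s) g(s) ds (trapezoidal rule)
    and solve the resulting linear system for g on (b, a)."""
    s = np.linspace(b, a, m)
    w = np.full(m, (a - b) / (m - 1))
    w[0] = w[-1] = w[0] / 2
    K = F(s[:, None] - s[None, :]) * w            # kernel times quadrature weights
    g_in = np.linalg.solve(np.eye(m) - K, F(s))   # g on the grid inside (b, a)
    g_at = lambda Z: F(Z) + F(Z - s) @ (w * g_in) # g anywhere else, by quadrature
    return s, w, g_in, g_at

s, w, g_in, g_at = solve_g(lambda t: norm.pdf(t, loc=0.2), a=2.0, b=-2.0)
print("E(n) = 1 + integral of g over (b, a) =", 1 + w @ g_in)  # cf. property c. below
```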
3. A number of important properties of $g(Z)$ follow immediately from
its definition by equation (2). These may be briefly indicated:

a. $\int_a^{\infty} g(Z)\,dZ$ is the power function.

b. $\int_{-\infty}^{b} g(Z)\,dZ = 1 - \int_a^{\infty} g(Z)\,dZ$.

c. Expected value of $n$ = $E(n) = \int_{-\infty}^{\infty} g(Z)\,dZ = 1 + \int_b^a g(Z)\,dZ$.

d. $E(n) = \dfrac{\int_{-\infty}^{b} Z g(Z)\,dZ + \int_a^{\infty} Z g(Z)\,dZ}{\int_{-\infty}^{\infty} z F(z)\,dz}$.
The first three follow immediately from the fundamental integral
equation; the fourth can also be derived from (2), but follows more
directly from the well-known Wald relation

$$E(n) = \frac{E(Z_n)}{E(z)},$$

which is to be replaced by


$$E(n) = \frac{E(Z_n^2)}{E(z^2)}$$

if $E(z) = 0$.
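
Both forms of the relation can be checked by straightforward simulation. A minimal Monte Carlo sketch, assuming (illustratively) $z \sim$ Normal(0.2, 1) and boundaries $a = 2$, $b = -2$; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, a, b = 0.2, 2.0, -2.0          # illustrative drift and boundaries

def one_run():
    """Sample until Z leaves (b, a); return the stage n and the final Z_n."""
    Z, n = 0.0, 0
    while b < Z < a:
        Z += rng.normal(mu, 1.0)
        n += 1
    return n, Z

ns, Zs = np.array([one_run() for _ in range(20_000)]).T
print("E(n), simulated:     ", ns.mean())
print("E(Z_n)/E(z), by Wald:", Zs.mean() / mu)   # the two should nearly agree
```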
Wald's fundamental identity concerning characteristic functions can
also be derived from the properties of the earlier indicated iterated
truncated integrals. Also, let us define

$${}_1g(t) = g(t), \quad b < t < a; \qquad = 0 \text{ elsewhere};$$
$${}_2g(t) = g(t), \quad t \le b \text{ or } t \ge a; \qquad = 0, \quad b < t < a;$$
$$g(t) = {}_1g(t) + {}_2g(t);$$

and similarly define ${}_1F(t)$ and ${}_2F(t)$. Then taking the "bilateral Laplace
transform," or characteristic function, of both sides of our integral
equation gives us

$$\int_{-\infty}^{\infty} e^{-pt}\,{}_1g(t)\,dt + \int_{-\infty}^{\infty} e^{-pt}\,{}_2g(t)\,dt = \left[\int_{-\infty}^{\infty} e^{-pt} F(t)\,dt\right]\left[1 + \int_{-\infty}^{\infty} e^{-pt}\,{}_1g(t)\,dt\right]. \tag{3}$$

By differentiating this identity with respect to $p$, the Wald formula
for the average sample size can also be derived. Numerous other
relationships between the moments, semi-invariants, and characteristic
functions of $F$, ${}_1F$, ${}_2F$, $G$, ${}_1G$, ${}_2G$, $F_2, \cdots, F_n, \cdots$ can also be readily
derived.
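
Wald's fundamental identity asserts that $E\{e^{-pZ_n}[\varphi(p)]^{-n}\} = 1$ for the stopped sum, where $\varphi(p) = E(e^{-pz})$; for the illustrative normal case $z \sim$ Normal(0.2, 1) one has $\varphi(p) = \exp(-0.2p + p^2/2)$. A quick Monte Carlo sketch, with all parameter choices illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, a, b, p = 0.2, 2.0, -2.0, 0.7              # illustrative parameters
phi = np.exp(-mu * p + p * p / 2)              # phi(p) = E[exp(-p z)]

vals = []
for _ in range(20_000):
    Z, n = 0.0, 0
    while b < Z < a:                           # run one sequential test
        Z += rng.normal(mu, 1.0)
        n += 1
    vals.append(np.exp(-p * Z) * phi ** (-n))  # e^{-p Z_n} phi(p)^{-n}
print("mean (should be close to 1):", np.mean(vals))
```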
4. The basic Fredholm integral equation (2) can be solved or
approximated by a variety of well-known devices. But rather than go
into these, it will be illuminating to generalize the relationship and to
sketch briefly a derivation of the generating function of the probabilities
of coming to a definite decision at stage $1, 2, 3, \cdots, n, \cdots$. Call
these probabilities $P_1, P_2, \cdots, P_n, \cdots$. As usual, let

$$\sum_{1}^{\infty} i^{[k]} P_i = \sum_{1}^{\infty} \left[ i(i-1) \cdots (i-k+1)\,P_i \right]$$

be the $k$th factorial moment of the $P$'s. Also, by definition,

$$P_n = \int_{-\infty}^{b} F_n(Z_n)\,dZ_n + \int_a^{\infty} F_n(Z_n)\,dZ_n.$$

Intuitively, we can arrive at the first moment of the $P$'s, or what is
the same thing, the expected average sample size, by the following
rather intricate reasoning. Suppose that we were to change our original
frequency distribution $F(z)$ from one times $F$ to some number just less
than one times $F$, say $0.999F$, so that there is a little "leakage" of
probability at each stage. Then $P_1$ would be reduced by about 1/1000;
$P_2$ would be reduced by about 2/1000; $P_3$ by 3/1000; and $P_n$ by $n$/1000.
Therefore, the change in $\sum_{1}^{\infty} P_i$ with respect to the leakage factor will
be approximately proportionate to the first moment $\sum i P_i$.
This suggests the following basic theorem:

THEOREM 2: The "generating" function of the probability of arriving
at a decision at each stage is given by

$$G(\lambda) = \int_{-\infty}^{b} g(Z, \lambda)\,dZ + \int_a^{\infty} g(Z, \lambda)\,dZ, \tag{4}$$

where

$$g(Z, \lambda) = \lambda F(Z) + \lambda \int_b^a F(Z - s)\,g(s, \lambda)\,ds \tag{5}$$

for $-\infty < Z < \infty$; and consequently the probability distribution of the $P$'s
and its factorial moments are given by the following derivatives:

$$P_i = \frac{G^{(i)}(0)}{i!}, \qquad i = 1, 2, \cdots; \tag{6}$$
$$\sum_{1}^{\infty} i^{[k]} P_i = G^{(k)}(1), \qquad k = 1, 2, \cdots.$$

To prove this we need only note that

$$g(Z, \lambda) = \lambda F_1(Z) + \lambda^2 F_2(Z) + \cdots + \lambda^n F_n(Z) + \cdots$$

does satisfy the integral equation (5); and that consequently

$$G(\lambda) = \int_{-\infty}^{b} g(Z, \lambda)\,dZ + \int_a^{\infty} g(Z, \lambda)\,dZ = \sum_{1}^{\infty} P_i \lambda^i.$$
Actually, if we work with $g^*(Z) = g(Z, \lambda)/\lambda$, then $g^*$ satisfies the more
familiar Fredholm integral equation of the second kind with a parameter;
namely,

$$g^*(Z) = F(Z) + \lambda \int_b^a F(Z - s)\,g^*(s)\,ds.$$

This can be solved in terms of an ascending power series in $\lambda$; or by the
Fredholm method as the ratio of two everywhere-converging power
series in $\lambda$; by "resolvent kernels"; and in still other ways. Where the
first of these methods is practicable, it has the virtue of yielding the
iterated integrals needed for the $F_n$'s and the $P_n$'s.
Note that Theorem 2 includes Theorem 1 as a special case.
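
Where the ascending power series is used numerically, the coefficient of $\lambda^n$ in $g(Z, \lambda)$ is just $F_n(Z)$, so the $P_n$, and through them $G$ and its derivatives, come directly from the iterated truncated convolutions. A minimal sketch under the same illustrative normal assumption as before:

```python
import numpy as np
from scipy.stats import norm

a, b, m = 2.0, -2.0, 2001
F = lambda t: norm.pdf(t, loc=0.2)             # illustrative F
s = np.linspace(b, a, m)
w = np.full(m, (a - b) / (m - 1))
w[0] = w[-1] = w[0] / 2

K = F(s[:, None] - s[None, :])
Fn, surv = F(s), [1.0]                         # F_1 on (b, a); P(N > 0) = 1
for _ in range(80):
    surv.append(w @ Fn)                        # P(N > n) = int_b^a F_n
    Fn = K @ (w * Fn)                          # next truncated convolution

Pn = -np.diff(np.array(surv))                  # P_n = P(N > n-1) - P(N > n)
n_vals = np.arange(1, Pn.size + 1)
print("P_1, P_2, P_3:", Pn[:3])
print("E(n) = G'(1) ~", n_vals @ Pn)           # first factorial moment
```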
5. If $z$ can take on only equally-spaced integral values, our probability
density $F$ must be replaced by discrete probabilities of the form

$$F(-2),\ F(-1),\ F(0),\ F(1),\ \cdots,\ F(i),\ \cdots.$$

Then our integrals are replaced by sums, our kernels by matrices, and
we have, corresponding to Theorem 2,

$$g(i, \lambda) = \lambda F(i) + \lambda \sum_{b}^{a} F(i - j)\,g(j, \lambda), \qquad -\infty < i < \infty,$$
$$G(\lambda) = \sum_{-\infty}^{b} g(i, \lambda) + \sum_{a}^{\infty} g(i, \lambda). \tag{7}$$

Only between $b$ and $a$ need any simultaneous equations be solved, and
there the $g$'s are given in matrix terms by

$$[g(i, \lambda)] = [\delta_{ij} - \lambda F(i - j)]^{-1} [\lambda F(j)], \tag{8}$$

where the right-hand side is the ratio of two polynomials in $\lambda$ of the
same degree.
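
A minimal sketch of (7) and (8), assuming (purely for illustration) a symmetric unit step, $F(-1) = F(1) = 1/2$, with boundaries $b = -3$, $a = 3$; for this walk the decision is certain, so $G(1) = 1$, and $E(n) = G'(1) = 9$.

```python
import numpy as np

F = {-1: 0.5, 1: 0.5}                # discrete step probabilities F(i)
b, a = -3, 3                         # decision boundaries

def G(lam):
    """Generating function G(lam) = sum_n P_n lam^n via the matrix form (8)."""
    inside = range(b + 1, a)         # continuation states b < i < a
    M = np.array([[(i == j) - lam * F.get(i - j, 0.0) for j in inside]
                  for i in inside])
    g_in = np.linalg.solve(M, [lam * F.get(i, 0.0) for i in inside])
    # g(i, lam) for i outside (b, a), summed to give G(lam):
    out = [i for i in range(b - 1, a + 2) if not b < i < a]
    return sum(lam * (F.get(i, 0.0) +
                      sum(F.get(i - j, 0.0) * gj for j, gj in zip(inside, g_in)))
               for i in out)

print("G(1)  =", G(1.0))                                # total decision probability: 1
print("G'(1) =", (G(1 + 1e-6) - G(1 - 1e-6)) / 2e-6)    # E(n); equals 9 here
```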
This result corresponds to the previous Girshick⁴ solution of the
equally-spaced discrete case, although the present derivation is a
different one. It may also be remarked that the present approach helps
to bring out the similarity between Bartky's special matrix methods
and the general Wald theory.
6. In conclusion, the relationship between sequential analysis and
the classical problem of "gambler's ruin" may be briefly sketched.
Imagine two individuals, A and B, with initial money "fortunes" of
size $a$ and $|b| = -b$. In each game they wager so that B stands to win
from A the algebraic amount $z$, with relative probability $F(z)\,dz$. Then
at the end of $n$ games A is ruined if $Z_n = z_1 + \cdots + z_n > a$; or B is
ruined if $Z_n < b$; or if neither of these events occurs, a new game in the
series is to be played, until finally one of the players is ruined.

This classical probability problem, studied by Huygens, James Bernoulli,
Montmort, De Moivre, Lagrange, Laplace, Markoff, and others,
was recognized by Barnard⁵ to be formally equivalent to the Wald
sequential-probability-ratio test: A being ruined is equivalent to
rejecting $H_a$, and B's ruin means $H_b$ is rejected.
As Barnard has shown, the classical writers attacked this problem
by studying a function not explicitly treated by Wald: namely, the
probability, once $Z$ has already reached the value $X$, that the final $Z$
is less than $b$, so that B is ruined.
⁴ Girshick, op. cit.
5 G. A. Barnard, "Sequential Tests in Industrial Statistics," Supplement to
the Journal of the Royal Statistical Society, Vol. 8, 1946, pp. 1-26.


For the discrete case this function
satisfies a fairly simple difference equation with easily prescribed
boundary conditions. In the same discrete case, if we ask for $U_n(X)$,
the probability of the above outcome in exactly $n$ games, we are led to
a corresponding partial-difference equation.
For brevity I shall confine my attention to the continuous case, to
show the connection between this approach and the above integral
equations which define the exact solution for the Wald process.
The integral equation

$$g(Z) = F(Z) + \int_b^a F(Z - s)\,g(s)\,ds \tag{2}$$

is known from the Fredholm-Volterra theory to have an explicit solution
of the form

$$g(Z) = F(Z) + \int_b^a M(Z, X)\,F(X)\,dX, \tag{9}$$

where the "resolvent kernel," M(Z, X), satisfies either of the integral
equations
ra
M(Z, X) = F(Z - X) + f F(Z - s)1(s, X)ds
(10) b
= F(Z - X) + J M(Z, s)F(s - X)ds.

Both of these relations could be given a simple, intuitive probability
interpretation. Similar reasoning could also show that $M(Z, X)\,dZ$ is
the probability, when already at $X$, of encountering a given final $Z$.
Let us now define

$$U(X) = \int_{-\infty}^{b} M(Z, X)\,dZ \quad \text{and} \quad V(X) = \int_a^{\infty} M(Z, X)\,dZ.$$

Then it is not too hard to show that $U(X)$ has the already described
probability interpretation of the classical writers. Moreover, a single
direct integration with respect to $X$ of the first form of (10) will yield
the following integral equations:

$$U(X) = \int_{-\infty}^{b} F(s - X)\,ds + \int_b^a F(s - X)\,U(s)\,ds, \tag{11}$$
$$V(X) = \int_a^{\infty} F(s - X)\,ds + \int_b^a F(s - X)\,V(s)\,ds = 1 - U(X).$$
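
Like (2), equations (11) discretize into a linear system. A minimal sketch, once more assuming $z \sim$ Normal(0.2, 1), for which $\int_{-\infty}^{b} F(s - X)\,ds$ is just a normal c.d.f.; all names and parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

mu, a, b, m = 0.2, 2.0, -2.0, 2001
X = np.linspace(b, a, m)
w = np.full(m, (a - b) / (m - 1))
w[0] = w[-1] = w[0] / 2

K = norm.pdf(X[None, :] - X[:, None], loc=mu)        # F(s - X) on the grid
U = np.linalg.solve(np.eye(m) - K * w, norm.cdf(b - X, loc=mu))
V = np.linalg.solve(np.eye(m) - K * w, 1 - norm.cdf(a - X, loc=mu))

print("U(0), probability B is ruined from the start:", U[m // 2])
print("max |U + V - 1| =", np.abs(U + V - 1).max())  # consistency check on (11)
```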


To evaluate the exact distribution of $n$, the number of games or
observations, we define $U(X, \lambda)$, $V(X, \lambda)$, $M(Z, X, \lambda)$, etc., by putting
$\lambda F(Z)$ for $F(Z)$ in all of the above relations. Then by partial differentiation
with respect to $\lambda$, as in Section 5, we can easily evaluate terms
of the form $U_n(X)$, $V_n(X)$, and the higher moments and generating
functions of such terms and of $P_n = U_n(0) + V_n(0)$. Other generalizations
will immediately suggest themselves.
Massachusetts Institute of Technology
