Chapter 4
Chaining
This chapter introduces chaining and some related tools used to construct approximations to stochastic processes.
Section 4.1 describes two related methods for quantifying the complexity of
a metric space: by means of covering/packing numbers or by means of
majorizing measures.
Section 4.2 defines covering and packing numbers.
Section 4.3 explains why chaining bounds are often expressed as integrals.
Section 4.4 presents a few simple ways to bound the Orlicz norm of a
maximum of finitely many random variables.
Section 4.5 illustrates the method for combining chaining with maximal inequalities like those from Section 4.4 to control the norm of the oscillation of a stochastic process.
Section 4.6 mentions a more subtle alternative to packing/covering, which
is discussed in detail in Chapter 7.
Section 4.7 discusses chaining with tail probabilities.
Section 4.8 compares various chaining methods by applying them to the
humble example of Brownian motion on the unit interval.
With such a convention the real challenge becomes: find bounds for finite $S$ that do not grow unhelpfully large as the size (cardinality) of $S$ increases. For example, if $T_0$ is a countable subset of $T$ one might hope to get a reasonable bound for $\mathbb{P}\sup_{t\in T_0} X_t$ by passing to the limit along a sequence of finite subsets $S_n$ that increase to $T_0$. If $T_0$ is dense in $T$ the methods from Chapter 5 then take over, leading to bounds for $\mathbb{P}\sup_{t\in T} X_t$ if the version of $X$ has suitable sample paths.
The workhorse of the modern approach to approximating stochastic processes is called chaining. Suppose you wish to approximate $\max_{t\in S} X_t$ on a (possibly very large) finite subset $S$ of $T$. You could try a union bound, such as
$$\mathbb{P}\{\max_{t\in S} |X_t| \ge \epsilon\} \le \sum_{t\in S} \mathbb{P}\{|X_t| \ge \epsilon\},$$
but typically the upper bound grows larger than 1 as the size of $S$ increases. Instead you could creep up on $S$ through a sequence of finite subsets, $S_0, S_1, \dots, S_m = S$, breaking the process on $S$ into a contribution from $S_0$ plus a sum of increments across each $(S_i, S_{i+1})$ pair.
To carry out such a strategy you need maps $\ell_i : S_{i+1} \to S_i$ for $i = 0, \dots, m-1$. The composition $L_p = \ell_p \circ \ell_{p+1} \circ \cdots \circ \ell_{m-1}$ maps $S_m$ into $S_p$, for $0 \le p < m$. Each $t = t_m$ in $S_m$ is connected to a point $t_0 = L_0 t$ in $S_0$ by a chain of points
<2>
$$t_m = t \;\xrightarrow{\;\ell_{m-1}\;}\; t_{m-1} = L_{m-1}(t) \;\xrightarrow{\;\ell_{m-2}\;}\; t_{m-2} \;\to\cdots\;\xrightarrow{\;\ell_0\;}\; t_0 = L_0(t),$$
so that
<3>
$$X(t) - X(t_0) = \sum_{i=0}^{m-1} \big( X(t_{i+1}) - X(t_i) \big) \qquad\text{with } t_i = L_i(t).$$
[Figure: a chain running from $t = t_m$ in $S_m$ through $t_{m-1} = \ell_{m-1} t$, $t_{m-2} = \ell_{m-2} t_{m-1}$, and so on, down to $t_0 = L_0(t)$ in $S_0$.]
Remark. To me $\ell$ stands for link. The pair $(t, \ell_i t)$ with $t \in S_{i+1}$ defines a link in the chain connecting $t$ to $L_0 t$.
If we bound each increment by a maximum over $S_{i+1}$, the resulting sum is the same for every $t$; it therefore provides an upper bound for the maximum over $t$:
<4>
$$\max_{t\in S} |X(t) - X(t_0)| \le \sum_{i=0}^{m-1} \max_{s\in S_{i+1}} |X(s) - X(\ell_i(s))|.$$
The tail probability $\mathbb{P}\{\max_{t\in S} |X_t| \ge \epsilon + \eta\}$ is less than $\mathbb{P}\{\max_{t\in S_0} |X_t| \ge \epsilon\}$ plus the double sum over $i$ and $S_{i+1}$ of the corresponding increment tail probabilities. If the growth in the size of the $S_i$'s could be offset by a decrease in the $d(s, \ell_i s)$ distances you might get probabilistic bounds that do not depend explicitly on the size of $S$.
The extra details involving the choice of the $\eta_i$'s make chaining with tail probabilities seem more complicated than the analogous argument where one merely takes some norm of both sides of inequality <4>, giving
<5>
$$\Big\| \max_{t\in S} |X(t) - X(t_0)| \Big\| \le \sum_{i=0}^{m-1} \Big\| \max_{s\in S_{i+1}} |X(s) - X(\ell_i(s))| \Big\|.$$
(i) $\Gamma(X + Y) \le \Gamma(X) + \Gamma(Y)$
<7> Definition. For a subset $F$ of $T$ write $N_T(\epsilon, F, d)$ for the $\epsilon$-covering number, the smallest number of closed $\epsilon$-balls needed to cover $F$. That is, the covering number is the smallest $N$ for which there exist points $t_1, \dots, t_N$ in $T$ with $\min_{i\le N} d(t, t_i) \le \epsilon$ for each $t$ in $F$. The set of centers $\{t_i\}$ is called an $\epsilon$-net for $F$.
Remark. Notice a small subtlety related to the subscript $T$ in the definition. If we regard $F$ as a metric space in its own right, not just as a subset of $T$, then the covering numbers might be larger because the centers $t_i$ would be forced to lie in $F$. It is an easy exercise (select a point of $F$ from each covering ball that actually intersects $F$) to show that $N_F(2\epsilon, F, d) \le N_T(\epsilon, F, d)$. The extra factor of 2 would usually be of little consequence. When in doubt, you should interpret covering numbers to refer to $N_F$.
Some metric spaces (such as the whole real line under its usual metric) cannot be covered by a finite set of balls of a fixed radius. A metric space $T$ for which $N_T(\epsilon, T, d) < \infty$ for every $\epsilon > 0$ is said to be totally bounded. A metric space is compact if and only if it is both complete and totally bounded (Dudley, 2003, Section 2.3).
I prefer to work with the packing number $\mathrm{pack}(\epsilon, F, d)$, defined as the largest $N$ for which there exist points $t_1, \dots, t_N$ in $F$ that are $\epsilon$-separated, that is, for which $d(t_i, t_j) > \epsilon$ if $i \ne j$. Notice the lack of a subscript $T$; the packing numbers are an intrinsic property of $F$, and do not depend on $T$ except through the metric it defines on $F$.
<8> Lemma. For each $\epsilon > 0$,
$$N_T(\epsilon, F, d) \le \mathrm{pack}(\epsilon, F, d) \le N_T(\epsilon/2, F, d) \le \mathrm{pack}(\epsilon/2, F, d).$$
Proof For the middle inequality, observe that no closed ball of radius $\epsilon/2$ can contain points more than $\epsilon$ apart. Each of the centers for $\mathrm{pack}(\epsilon, F, d)$ must therefore lie in a distinct $\epsilon/2$ covering ball. The other inequalities have similarly simple proofs.
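For a finite $F$ these quantities can be computed directly. The sketch below (hypothetical helper names of my own; greedy selection produces a maximal, not maximum-cardinality, packing) exploits the fact that a maximal $\epsilon$-packing is automatically an $\epsilon$-net, so by Lemma <8> the two greedy counts bracket $\mathrm{pack}(\epsilon, F, d)$.

```python
import numpy as np

def greedy_packing(points, eps):
    """Greedily collect points with pairwise distances > eps.  The result is a
    maximal eps-packing of the input set, hence also an eps-net for it."""
    centers = []
    for p in points:
        if not centers or np.min(
                np.linalg.norm(np.asarray(centers) - p, axis=1)) > eps:
            centers.append(p)
    return centers

rng = np.random.default_rng(1)
F = rng.uniform(size=(500, 2))        # a finite subset of the unit square
eps = 0.2

g1 = len(greedy_packing(F, eps))      # N(eps) <= g1 <= pack(eps, F, d)
g2 = len(greedy_packing(F, eps / 2))  # pack(eps, F, d) <= N(eps/2) <= g2
print(g1, "<= pack(eps, F, d) <=", g2)
```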
<9> Example. Let $\|\cdot\|$ denote any norm on $\mathbb{R}^k$. For example, it might be ordinary Euclidean distance (the $\ell^2$ norm), or the $\ell^1$ norm, $\|x\|_1 = \sum_{i\le k} |x_i|$. The covering numbers for such norms share a common geometric bound.
Write $B_R$ for the ball of radius $R$ centered at the origin. For a fixed $\epsilon$, with $0 < \epsilon \le 1$, how many balls of radius $\epsilon R$ does it take to cover $B_R$? Equivalently, what are the packing numbers for $B_R$?
Let $\{x_1, \dots, x_N\}$ be any set of points in $B_R$ that is $\epsilon R$-separated. The closed balls $B[x_i, \epsilon R/2]$ of radius $\epsilon R/2$ centered at the $x_i$ are disjoint and their union lies within $B_{R + \epsilon R/2}$. Write $\gamma$ for the Lebesgue measure of the unit ball $B_1$. Each $B[x_i, \epsilon R/2]$ has Lebesgue measure $\gamma(\epsilon R/2)^k$ and $B_{R+\epsilon R/2}$ has Lebesgue measure $\gamma(R + \epsilon R/2)^k$. It follows that
$$N \le \frac{(R + \epsilon R/2)^k}{(\epsilon R/2)^k} = \Big(\frac{2+\epsilon}{\epsilon}\Big)^k \le (3/\epsilon)^k.$$
That is, $\mathrm{pack}(\epsilon R, B_R, d) \le (3/\epsilon)^k$ for $0 < \epsilon \le 1$, where $d$ denotes the metric corresponding to $\|\cdot\|$.
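A quick experiment is consistent with the volume bound. The following sketch (my own illustration: greedy packing of uniform samples, which only lower-bounds the true packing number) counts an $\epsilon$-separated subset of the unit Euclidean ball and compares it with $(3/\epsilon)^k$.

```python
import numpy as np

rng = np.random.default_rng(2)

def ball_sample(k, n):
    """n points drawn uniformly from the unit Euclidean ball in R^k."""
    x = rng.normal(size=(n, k))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * rng.uniform(size=(n, 1)) ** (1.0 / k)

for k in (2, 3):
    for eps in (0.5, 0.3):
        centers = []
        for p in ball_sample(k, 4000):    # greedy eps-packing of B_1 (R = 1)
            if not centers or np.min(
                    np.linalg.norm(np.asarray(centers) - p, axis=1)) > eps:
                centers.append(p)
        # greedy count <= pack(eps, B_1, d) <= (3/eps)^k
        print(k, eps, len(centers), (3.0 / eps) ** k)
```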
(i) $\mathbb{P}M \le \Psi^{-1}(N)$
(ii) $\mathbb{P}_B M \le \Psi^{-1}(N/\mathbb{P}B)$
(iii) $\|M\|_\Psi \le 2\,\Psi^{-1}(N)$
(iv) $\|M\|_\Psi \le C_0\,\Psi^{-1}(N)$
By Jensen's inequality,
$$\Psi\big(\mathbb{P}M/\sigma\big) \le \mathbb{P}\Psi(M/\sigma).$$
Divide both sides of the second inequality by $N$, add, then take expectations to deduce that
$$\mathbb{P}\Psi(M/\sigma) \le 1 + \frac{1}{N}\sum_i \mathbb{P}\Psi(X_i/\sigma) \le 2,$$
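The displays above are fragmentary, but they are consistent with the standard maximal inequality $\mathbb{P}M \le \Psi^{-1}(N)$ for $M = \max_{i\le N}|X_i|$ when each $\|X_i\|_\Psi \le 1$. A small simulation for $\Psi = \Psi_2$ (a sketch under that reading; the exact constants in Theorem <10> may differ) illustrates the $\sqrt{\log(1+N)}$ growth:

```python
import numpy as np

rng = np.random.default_rng(3)

# Psi_2(x) = exp(x^2) - 1.  For Z ~ N(0,1), E exp((Z/s)^2) = (1 - 2/s^2)^(-1/2),
# which equals 2 at s = sqrt(8/3), so ||Z||_{Psi_2} = sqrt(8/3) exactly.
sigma = np.sqrt(8.0 / 3.0)

for N in (10, 100, 1000, 10000):
    Z = rng.normal(size=(500, N)) / sigma   # rows are independent replications
    M = np.abs(Z).max(axis=1)               # max of N variables with Psi_2 norm 1
    print(N, M.mean().round(3), np.sqrt(np.log(1.0 + N)).round(3))
```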
Choose $L$ so that $L\,\Psi_2^{-1}(1) \le c_1$. Then $\Psi(L\eta)\,\Psi_2(L\eta) \le c_1$ if $\Psi(\eta)\,\Psi_2(\eta) \le 1$, and
$$\Psi(\eta)\,\Psi_2(\eta) \le \widetilde\Psi(\eta) := \Psi\big(c_0 K_1 L^2\,\eta\big) \qquad\text{if } \Psi(\eta)\,\Psi_2(\eta) \ge 1.$$
for all finite sets of increments with $d(s_i, t_i) > 0$. For example, if $\Gamma$ were expected value and $\|X_s - X_t\|_{\Psi_2} \le d(s, t)$ then $H(N) = \Psi_2^{-1}(N) = \sqrt{\log(1 + N)}$.
Proof The first inequality results from applying $\Gamma$ to both sides of the inequality
$$\max_{t\in S_m} |X(t) - X(L_0 t)| \le \sum_{i=0}^{m-1} \delta_i \max_{s\in S_{i+1}} \frac{|X(s) - X(\ell_i s)|}{d(s, \ell_i s)}.$$
<13> Example. Suppose the $\{X_t : t \in T\}$ process has subgaussian increments, with $\|X_s - X_t\|_{\Psi_2} \le d(s, t)$. The function $\Psi_2(t) = e^{t^2} - 1$ satisfies the assumptions of Theorem <10> part (iv). Inequality <11> holds with $\Gamma$ equal to the $\|\cdot\|_{\Psi_2}$ norm and $H(N)$ a constant multiple of $\Psi_2^{-1}(N) = \sqrt{\log(1 + N)}$.
Because $\mathrm{pack}(r, T, d) = 1$ for all $r > \mathrm{diam}(T)$ we may as well assume $\delta_0 \le \mathrm{diam}(T)$, in which case $\mathrm{pack}(r, T, d) \ge 2$ for all $r \le \delta_1$. That lets us absorb the pesky $1+$ from the $\log(1+\cdot)$ into the constant, leaving
$$\Big\| \max_{t\in S_m} |X(t) - X(L_0 t)| \Big\|_{\Psi_2} \le C \int_0^{\delta_0} \sqrt{\log \mathrm{pack}(r, T, d)}\; dr,$$
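For Brownian motion on $[0,1]$ with $d(s,t) = \sqrt{|s-t|}$ (the setting of Section 4.8) an $r$-ball in $d$ is a time interval of length $r^2$, so $\mathrm{pack}(r, T, d)$ is of order $r^{-2}$. A numeric sketch (that packing rate is the only modelling input; the rest is crude quadrature) shows the entropy integral is finite and of order $\delta\sqrt{\log(1/\delta)}$:

```python
import numpy as np

# pack(r, T, d) ~ 1/r^2 for Brownian motion with d(s, t) = sqrt(|s - t|)
for delta in (0.1, 0.01, 0.001):
    r = np.linspace(1e-9, delta, 100001)
    entropy = np.sqrt(np.log(np.maximum(r ** -2.0, 2.0)))
    integral = float(entropy.sum() * (r[1] - r[0]))  # Riemann sum of integrand
    print(delta, integral, delta * np.sqrt(np.log(1.0 / delta**2)))
```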
Lemma <12> captures the main idea for chaining with norms. The next Theorem uses the example of oscillation control to show why the uniform approximation of $\{X_s : s \in S_m\}$ by $\{X_{L_0 s} : s \in S_m\}$ is such a powerful tool.
Then for each $\epsilon > 0$ there exists a $\delta > 0$ for which $\Gamma\big(\mathrm{osc}(\delta, X, S)\big) < \epsilon$ for every finite subset $S$ of $T$.
Define $\Delta := \max_{t\in S_m} |X(t) - X(L_0 t)|$. Lemma <12> gives $\Gamma(\Delta) < \epsilon/5$. The value $N_0 = \#S_0$, which depends only on $\epsilon$, is now fixed.
There might be many pairs $s, t$ in $S_m$ for which $d(s, t) < \delta$, but they correspond to at most $N_0^2$ pairs $L_0 s, L_0 t$ in $S_0$ for which
<16>
$$d(L_0 s, L_0 t) \le \sum_{i=0}^{m-1} d(L_{i+1} s, L_i s) + \delta + \sum_{i=0}^{m-1} d(L_{i+1} t, L_i t) \le 4\delta_0 + \delta.$$
For the subgaussian case, where $H(N)$ grows like $\sqrt{\log N}$, we could afford to invoke a finite maximal inequality to control the contributions from the pairs in $S_0$ by a constant multiple of $H(N_0^2)\,(4\delta_0 + \delta)$, which could be controlled by <15> because $H(N_0^2) \le \text{constant} \times H(N_0)$. Without such behavior for $H$ we would need something stronger than <15>.
For general $H$ it pays to make a few more trips up and down the chains, using a clever idea of Ledoux and Talagrand (1991, Theorem 11.6).
[Figure: points $s \in E$ and $t \in F$ joined through the chosen pair $t_{E,F} \in E$ and $t_{F,E} \in F$.]
The map $L_0$ defines an equivalence relation on $S_m$, with $t \sim t'$ if and only if $L_0 t = L_0 t'$. The corresponding equivalence classes define a partition $\pi_m$ of $S_m$ into at most $N_0$ subsets. For each distinct pair $E, F$ from $\pi_m$ choose points $t_{E,F} \in E$ and $t_{F,E} \in F$ such that $d(t_{E,F}, t_{F,E}) = d(E, F)$, then define
$$\Lambda := \max\Big\{ \frac{|X(t_{E,F}) - X(t_{F,E})|}{d(E, F)} : E, F \in \pi_m,\ E \ne F \Big\}.$$
Assumption <11> implies $\Gamma(\Lambda) \le H(N_0^2)$.
Remark. The definition of $\Lambda$ might be awkward if $d$ were a semi-metric and $d(E, F) = 0$ for some pair $E \ne F$. By working with equivalence classes we avoid such awkwardness.
If $s$ and $t$ belong to different sets, $E$ and $F$, from $\pi_m$, so that $L_0 s = L_0 t_{E,F}$ and $L_0 t = L_0 t_{F,E}$, then
Remark. Notice that such an inequality can hold only for a bounded parameter set, because $h$ is a bounded metric. Typically it would follow from a slight strengthening of a Hellinger differentiability condition. In a sense, the right metric to use is $h$. The upper bound in Assumption <19> ensures that the packing numbers under $h$ behave like packing numbers under ordinary Euclidean distance. The lower bound ensures that $\mathbb{P}\sqrt{L_n(t)}$ decays rapidly as $t$ increases; see inequality <22> below.
Then a chaining argument will show that, for each $\theta_0$ in $[0, 1]$, the estimator $\widehat\theta_n$ converges at an $n^{-1/2}$-rate to $\theta_0$. More precisely, there exist constants $C_3$ and $C_4$ for which
<20>
$$\mathbb{P}_{\theta_0,n}\big\{\sqrt{n}\,|\widehat\theta_n - \theta_0| \ge y\big\} \le C_3 \exp(-C_4 y^2) \qquad\text{for all } y \ge 0 \text{ and all } n.$$
The MLE also maximizes the square root of the likelihood process. The standardized estimator $\widehat t_n = \sqrt{n}\,(\widehat\theta_n - \theta_0)$ maximizes, over the interval $T_n = \{t \in \mathbb{R} : \theta_0 + t/\sqrt{n} \in [0, 1]\}$, the process
$$Z_n(t) = \sqrt{\frac{L_n(\theta_0 + t/\sqrt{n})}{L_n(\theta_0)}} = \prod_{i\le n} \xi(x_i, t) \qquad\text{where } \xi(z, t) := \sqrt{p(z, \theta_0 + t/\sqrt{n})\,\big/\,p(z, \theta_0)}.$$
$$\mathbb{P}_{\theta_0,n}\{|\widehat t_n| \ge y_0\} \le \mathbb{P}_{\theta_0,n}\big\{\sup_{|t|\ge y_0} Z_n(t) \ge 1\big\} \le \mathbb{P}_{\theta_0,n} \sup_{|t|\ge y_0} Z_n(t).$$
The last expected value involves a supremum over an uncountable set, which sample path continuity allows us to calculate as a limit of maxima over finite sets.
Let me show how Lemma <12> handles half the range, $\sup_{t\ge y_0} Z_n(t)$. The argument for the other half is analogous.
To simplify notation, abbreviate $\mathbb{P}_{\theta_0,n}$ to $\mathbb{P}$, with $\theta_0$ fixed for the rest of the Example. Split the set $\{t \in T_n : t \ge y_0\}$ into a union of intervals $J_k := [y_k, y_{k+1}) \cap T_n$, where $y_k = y_0 + k$ for $k \in \mathbb{N}$. Then
<21>
$$\mathbb{P}\sup_{t\ge y_0} Z_n(t) \le \sum_{k\ge 0} \mathbb{P}\sup_{t\in J_k} Z_n(t).$$
Inequality <19> provides the means for bounding the $k$th term in the sum <21>. The lower bound from <19> gives us some control over the expected value of $Z_n$: for each $t$ in $T_n$,
<22>
$$\mathbb{P}Z_n(t) = \Big(\mathbb{P}\sqrt{p(x, \theta_0 + t/\sqrt{n})\,p(x, \theta_0)}\Big)^n = \Big(1 - \tfrac12 h^2\big(P_{\theta_0}, P_{\theta_0 + t/\sqrt{n}}\big)\Big)^n \le \exp\big(-\tfrac12 C_1^2 t^2\big),$$
the final inequality coming from $1 - x \le e^{-x}$ and the lower bound $h \ge C_1 |t|/\sqrt{n}$. The upper bound in <19> gives some control over the increments of the $Z_n$ process for $t_1, t_2 \in T_n$. After some fiddling around with constants, the bound <20> emerges.
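To see Assumption <19> and the bound <22> in a concrete case, take the Bernoulli($\theta$) family as a stand-in (my choice purely for illustration; the chapter leaves $p(x, \theta)$ general). The ratio $h\sqrt{n}/t$ stays nearly constant, and the exact mean $(1 - h^2/2)^n$ sits below $\exp(-nh^2/2)$, the elementary inequality behind <22>:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    return np.sqrt(1.0 - np.sqrt(p * q) - np.sqrt((1.0 - p) * (1.0 - q)))

theta0, n = 0.4, 400
for t in (0.5, 1.0, 2.0, 4.0):
    theta = theta0 + t / np.sqrt(n)
    h = hellinger(theta0, theta)
    mean_Zn = (1.0 - 0.5 * h**2) ** n      # exact value of P Z_n(t)
    print(t, h * np.sqrt(n) / t, mean_Zn, np.exp(-0.5 * n * h**2))
```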
which highlights the role of the $\Psi_2$-Orlicz norm in controlling the increments of Gaussian processes.
Fernique (1975) proved many results about the sample paths of a centered Gaussian process $X$. For example, he proved (his Section 6) a result stronger than: existence of a majorizing measure implies that
<26>
$$\sup_{t\in T} |X(\omega, t)| < \infty \qquad\text{for almost all } \omega.$$
with the convention that any terms for which $\delta_i(s) = 0$ are omitted from the max; it matters only that $|X(s) - X(\ell_i s)| \le M_{i+1}\,\delta_i(s)$ for all $s \in S_{i+1}$. For $i \ge 0$,
$$\mathbb{P}\{M_{i+1} \ge \eta_i\} \le \sum_{s\in S_{i+1}} \mathbb{P}\big\{|X(s) - X(\ell_i s)| \ge \eta_i\,\delta_i(s)\big\} \le N_{i+1}\big/\Psi(\eta_i).$$
Now comes the clever part: how should we choose the $S_i$'s and the $\eta_i$'s? I do not know any systematic way to handle that question but I do know some heuristics that have proven useful.
Let me attempt to explain for the specific case of $\Psi_\alpha$-norms, for $\alpha > 0$, with
$$\Psi_\alpha(x) = \exp(x^\alpha) - 1,$$
first for the traditional approach based on packing numbers, then for the majorizing measure alternative.
Remark. You should not be misled by what follows into believing that
all chaining arguments with tail probabilities involve a lot of tedious
fiddling. I present some of the messy details just because so few authors
seem to explain the reasons for their clever choices of constants.
that is,
<30>
$$\mathbb{P}\Big\{\max_{t\in S_m} |X(t) - X(L_0 t)| > \sum_{i=0}^{m-1} \eta_i\,\delta_i\Big\} \le \sum_{i=0}^{m-1} N_{i+1}\big/\Psi(\eta_i).$$
We could choose the $\{\eta_i\}$ to control $\sum_i \eta_i\delta_i$, then hope for a useful $\sum_i N_{i+1}/\Psi(\eta_i)$, or choose to control $\sum_i N_{i+1}/\Psi(\eta_i)$, hoping for a useful $\sum_i \eta_i\delta_i$.
Here are some heuristics for $\Psi = \Psi_\alpha$ as in <29>.
To make the right-hand side of <30> small we should expect $\sum_{i=0}^{m-1} \eta_i\delta_i$ to be larger than $\mathbb{P}\Delta_m$, which an analog of Lemma <12> with $H = C_\alpha \Psi_\alpha^{-1}$ bounds by a constant multiple of $\sum_{i=0}^{m-1} \Psi^{-1}(N_{i+1})\,\delta_i$. That suggests we should make $\eta_i$ a bit bigger than $\Psi^{-1}(N_{i+1})$. Try
$$\eta_i = \Psi^{-1}\big(N_{i+1}\,\Psi(y_i)\big) \le c_\alpha\big(\Psi^{-1}(N_{i+1}) + y_i\big) \qquad\text{for some } y_i > 0.$$
I write the extra factor in that strange way because it gives a clean bound,
$$\min\big(1,\, N_{i+1}/\Psi(\eta_i)\big) = \min\big(1,\, 1/\Psi(y_i)\big) \le C_\alpha\, e^{-y_i^\alpha}.$$
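The inequality $\Psi^{-1}(N\Psi(y)) \le c_\alpha(\Psi^{-1}(N) + y)$ that justifies this choice can be checked numerically for $\Psi = \Psi_\alpha$. A sketch (with $c_\alpha = \max(1, 2^{1/\alpha - 1})$, one valid choice: it follows from $\log(1 + N\Psi(y)) \le \log(1+N) + y^\alpha$ together with $(a+b)^{1/\alpha} \le c_\alpha(a^{1/\alpha} + b^{1/\alpha})$):

```python
import numpy as np

def Psi(x, a):
    return np.expm1(x ** a)              # Psi_alpha(x) = exp(x^alpha) - 1

def Psi_inv(z, a):
    return np.log1p(z) ** (1.0 / a)      # its inverse

for a in (0.5, 1.0, 2.0):
    c = max(1.0, 2.0 ** (1.0 / a - 1.0))             # candidate c_alpha
    N, y = np.meshgrid(np.logspace(0, 6, 40), np.linspace(0.1, 3.0, 40))
    eta = Psi_inv(N * Psi(y, a), a)                  # eta_i with N_{i+1} = N
    assert (eta <= c * (Psi_inv(N, a) + y) + 1e-9).all()
print("eta <= c_alpha (Psi_inv(N) + y) verified on the sampled grid")
```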
Inequality <30> then implies
<31>
$$\mathbb{P}\Big\{\max_t \Delta_m(t) > \sum_{i=0}^{m-1} c_\alpha\big(\Psi^{-1}(N_{i+1}) + y_i\big)\delta_i\Big\} \le C \sum_{i=0}^{m-1} e^{-y_i^\alpha}.$$
Now we need to choose a $\{y_i\}$ sequence to make the sums tractable, then maybe bound sums by integrals to get neater expressions, and so on. If you find such games amusing, look at Problem [2], which guides you to a more elegant bound. For the special case of subgaussian increments it gives existence of positive constants $K_1$, $K_2$, and $c$ for which
$$\mathbb{P}\Big\{\max_t \Delta_m(t) > K_1 \int_0^{\delta_0} \Big( y + \sqrt{\log\mathrm{pack}(r, T, d)} + \sqrt{\log(1/r)} \Big)\, dr \Big\} \le K_2\, e^{-cy^2}.$$
Remark. The integral of $\sqrt{\log\mathrm{pack}(r, T, d)}$ is a surrogate for $\mathbb{P}\max_t \Delta_m(t)$. The presence of a factor $K_1$, which might be much larger than 1, disappoints. With a lot more effort, and sharper techniques, one can get bounds for $\mathbb{P}\{\max_t \Delta_m(t) \ge \mathbb{P}\max_t \Delta_m(t) + \dots\}$. See, for example, Massart (2000), who showed how tensorization methods (see Chapter 12) can be used to rederive a concentration inequality due to Talagrand (1996).
(ii) there exists an $x_0$ for which $\Psi(x) \ge \tfrac12 \exp(x^\alpha)$ for all $x \ge x_0$, so there exists a $y_0$ for which $\Psi^{-1}(y) \le \log^{1/\alpha}(2y)$ for all $y \ge y_0$.
where $E_i(t)$ denotes the unique member of $\pi_i$ that contains the index point $t$.
Remark. If you check back you might notice that originally the $i$th summand contained $\Psi^{-1}(N_i)$. The change has only a trivial effect on the sum because $\Psi^{-1}(y)$ grows like $\log^{1/\alpha}(y)$.
[Figure: the sets $S_0, S_1, S_2, S_3$ at successive levels of the construction.]
Remark. We could also ensure, for any given finite subset $S$ of $T$, that $S_0 \subseteq S_1 \subseteq \cdots \subseteq S_m = S$, for some $m$.
for the link length, $\delta_i(t) = d(L_{i+1}(t), L_i(t)) \le \mathrm{diam}(E_i(t))$. If we choose $\eta_i = y\,\Psi^{-1}(N_{i+1})$ with $y$ large enough,
$$1\big/\Psi(\eta_i) \le 1\Big/\Psi\big(y \log^{1/\alpha}(N_{i+1})\big) \le 2\exp\big(-y^\alpha \log N_{i+1}\big).$$
Remark. Note that $d_2(s, t)$ is just the $L^2(\text{Leb})$ distance between the indicator functions of the intervals $[0, s]$ and $[0, t]$.
<34>
$$\lim_{\delta\to 0} \frac{\mathrm{osc}_1(\delta)}{h_1(\delta)} = 1 \qquad\text{almost surely, where } h_1(\delta) = \sqrt{2\delta\log(1/\delta)}.$$
See McKean (1969, Section 1.6) for a detailed proof (which is similar in spirit to the tail chaining described in subsection 4.8.3) of Lévy's theorem.
Under the $d_2$ metric the result becomes
$$\lim_{\delta\to 0} \frac{\mathrm{osc}_2(\delta)}{h_2(\delta)} = 1 \qquad\text{almost surely, where } h_2(\delta) = h_1(\delta^2) = 2\delta\sqrt{\log(1/\delta)}.$$
To avoid too much detail I will settle for something less than <34>, namely upper bounds that recover the $O(h_2(\delta))$ behavior. For reasons that should soon be clear, it simplifies matters to replace the function $h_2$ by the increasing function (see Problem [5])
$$h(\delta) := \delta\,\Psi_2^{-1}(1/\delta^2) = \delta\sqrt{\log(1 + \delta^{-2})}.$$
Given a $\delta < 1$ let $p$ be the integer for which $\delta_{p+1} \le \delta < \delta_p$. The chains will only extend to the $S_p$-level, rather than to the $S_0$-level. If the change from $S_0$ to $S_p$ disturbs you, you could work with $\widetilde S_i := S_{i+p}$ and $\widetilde\ell_i := \ell_{i+p}$ for $i = 0, 1, \dots$.
The map $L_p$ takes points $s < t$ in $S_m$ to points $L_p s \le L_p t$ in $S_p$ with $d_2(L_p s, L_p t) \le \delta_p + d_2(s, t)$. If $d_2(s, t) < \delta$ then $d_2(L_p s, L_p t) < 2\delta_p$, which means that either $L_p s = L_p t$ or $L_p t = L_p s + \delta_p^2$. Define
$$\Delta_{m,p} := \max_{t\in S_m} |X(t) - X(L_p t)| \le \sum_{i=p}^{m-1} \max_{s\in S_{i+1}} |X(s) - X(\ell_i s)|.$$
$$\mathbb{P}_B\,\Delta_{m,p} \le \sum_{i=p}^{m-1} \delta_i\, H\big(N_{i+1}/\mathbb{P}B\big) \le c_0\, h(\delta_p) + c_0\,\delta_p\,\Psi_2^{-1}(1/\mathbb{P}B).$$
By Problem [5] the $h(\delta_i)$ decrease geometrically fast: $h(\delta_{i+1}) < 0.77\,h(\delta_i)$ for all $i$. It follows that there exists a universal constant $C$ for which
<36>
$$\mathbb{P}_B\,\mathrm{osc}(\delta, S_m, d_2) \le C\Big( h(\delta_{p+1}) + \delta_{p+1}\,\Psi_2^{-1}(1/\mathbb{P}B) \Big) \le C\Big( h(\delta) + \delta\,\Psi_2^{-1}(1/\mathbb{P}B) \Big).$$
Now let $m$ tend to infinity to obtain an analogous upper bound for $\mathbb{P}_B\,\mathrm{osc}_2(\delta)$. If we choose $B$ equal to the whole sample space the $\Psi_2^{-1}(1/\mathbb{P}B)$ term is superfluous. We have $\mathbb{P}\,\mathrm{osc}_2(\delta) \le C\,h(\delta)$, which is sharp up to a multiplicative constant: Problem [6] uses the independence of the Brownian motion increments to show that $\mathbb{P}\,\mathrm{osc}_2(\delta) \ge c\,h(\delta)$ for all $0 < \delta \le 1/2$, where $c$ is a positive constant.
Now for the surprising part. If, for an $x > 0$, we choose the $\eta_i$'s as in the argument below, then we get a bound which is the same as <36> except that $\mathbb{P}_B$ has been replaced by $\|\cdot\|_{\Psi_2}$ (and the constant is different). With those modifications, repeat the chaining argument, starting from the inequality
$$\mathrm{osc}(\delta, S_m, d_2) \le 2\Delta_{m,p} + M_p, \qquad\text{where } M_p := \max\big\{|X(u) - X(v)| : u, v \in S_p,\ v = u + \delta_p^2\big\}.$$
For $y = \eta_p\,\delta_p + 2\sum_{i=p}^{m-1} \eta_i\,\delta_i$,
$$\mathbb{P}\{\mathrm{osc}(\delta, S_m, d_2) \ge y\} \le \mathbb{P}\{M_p \ge \eta_p\,\delta_p\} + \sum_{i=p}^{m-1} \mathbb{P}\Big\{\max_{s\in S_{i+1}} |X(s) - X(\ell_i s)| \ge \eta_i\,\delta_i\Big\}$$
$$\le 2N_p\, e^{-\eta_p^2/2} + \sum_{i=p}^{m-1} 2N_{i+1}\, e^{-\eta_i^2/2}$$
<38>
$$= 2\exp\big(\log N_p - \eta_p^2/2\big) + \sum_{i=p}^{m-1} 2\exp\big(\log N_{i+1} - \eta_i^2/2\big).$$
How should we choose the $\eta_i$'s and $\eta_p$? For the sake of comparison with the inequality <37> obtained by the clever choice of the conditioning event $B$, let me try for an upper bound that is a constant multiple of $e^{-x^2/2}$, for a given $x \ge 0$.
Consider the first term in the bound <38>. To make the exponential exactly equal to $e^{-x^2/2}$ we should choose
$$\eta_p = \sqrt{2\log N_p + x^2} \le \sqrt{2\log N_p} + x,$$
A similar idea works for each of the $\eta_i$'s, except that we need to add on a little bit more to keep the sum bounded as $m$ goes off to infinity. If the terms were to decrease geometrically then the whole sum would be bounded by a multiple of the first term, which would roughly match the $\eta_p$ contribution. With those thoughts in mind, choose
$$\eta_i = \sqrt{2\log N_{i+1} + x^2 + 2(i-p)\log 2} \le \sqrt{2\log N_{i+1}} + x + \sqrt{2(i-p)\log 2},$$
so that
$$\sum_{i=p}^{m-1} 2\exp\big(\log N_{i+1} - \eta_i^2/2\big) \le 4\,e^{-x^2/2}$$
and $\sum_{i=p}^{m-1} \eta_i\,\delta_i$ is less than
$$\sum_{i=p}^{\infty} 2\delta_{i+1}\sqrt{2\log(1/\delta_{i+1})} + x\sum_{i=p}^{\infty} \delta_i + \delta_p\sum_{k=0}^{\infty} 2^{-k}\sqrt{2k\log 2} \le \sum_{i=p}^{\infty} 8\,h(\delta_{i+1}) + \delta_p\,(x + c_1) \le c_2\big(h(\delta) + \delta x\big)$$
for universal constants $c_1$ and $c_2$. (I absorbed the $c_1$ into the $h(\delta)$ term.)
With these simplifications, inequality <38> gives a clean bound,
$$\mathbb{P}\{\mathrm{osc}(\delta, S_m, d_2) > C\,h(\delta) + C\,\delta x\} \le 5\,e^{-x^2/2}$$
for some universal constant $C$. Here I very cunningly changed the $\ge$ to a $>$ to ensure a clean passage to the limit as $m \to \infty$.
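As a sanity check on the $O(h(\delta))$ oscillation bounds of this section, one can simulate a single Brownian path and compare $\mathrm{osc}(\delta, X, d_2)$ with $h(\delta)$. A rough sketch (one discretized path; $d_2(s,t) \le \delta$ corresponds to $|s - t| \le \delta^2$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2 ** 16                                   # grid points on [0, 1]
X = np.concatenate([[0.0], np.cumsum(rng.normal(scale=n ** -0.5, size=n))])

def osc2(delta):
    """osc(delta, X, d_2): max |X_s - X_t| over grid pairs with |s-t| <= delta^2."""
    k = max(int(delta ** 2 * n), 1)
    return max(np.abs(X[j:] - X[:-j]).max() for j in range(1, k + 1))

for delta in (0.2, 0.1, 0.05):
    o, hd = osc2(delta), delta * np.sqrt(np.log1p(delta ** -2.0))
    print(delta, round(o, 4), round(hd, 4), round(o / hd, 3))
```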
4.9 Problems
[1] Prove the 1934 result of Kolmogorov that is cited at the start of Section 4.10. (It might help to look at Chapter 5 first.)
[2] In inequality <31> put $y_i = (y^\alpha + i - p)^{1/\alpha}$ for some nonnegative $y$. Use the fact that, for each decreasing nonnegative function $g$ on $\mathbb{R}^+$ and nonnegative integers $a$ and $b$,
$$\sum_{i=a}^{b} g(i) \le g(a) + \int_a^{b} g(r)\, dr,$$
to deduce a cleaner version of the bound, valid for all $y \ge 0$.
(i) For each $t \in T$ and $r > \delta_i$ show that $\mu B[t, r] \ge 2^{-i-1}\,\mu_i B[t, r] \ge (2^{i+1} N_i)^{-1}$. Hint: Could $S_i \cap B[t, r]$ be empty?
(ii) By splitting the range of integration into intervals where $\delta_{i-1} \ge r > \delta_i$, deduce (cf. Section 4.3) that
$$\int_0^D \Psi^{-1}\Big(\frac{1}{\mu B[t, r]}\Big)\, dr \le 2 C_0 \sum_k \delta_{k-1}\Big( \Psi^{-1}(2^{k+1}) + \Psi^{-1}(N_k) \Big) < \infty.$$
[5] From the fact that $g(y) = \Psi_2(y)/y^2$ is an increasing function on $\mathbb{R}^+$ deduce that the function
$$h(\delta) = 1\Big/\sqrt{g\big(\Psi_2^{-1}(1/\delta^2)\big)} = \delta\,\Psi_2^{-1}(1/\delta^2) = \delta\sqrt{\log(1 + \delta^{-2})}$$
is an increasing function of $\delta$.
[6] Suppose $Z_1, \dots, Z_n$ are independent random variables, each distributed $N(0, 1)$ (with density $\phi$). Define $M_n = \max_{i\le n} |Z_i|$. For a positive constant $c$, let $x_n = x_n(c)$ be the value for which $\mathbb{P}\{|Z_i| \ge x_n\} = c/n$.
(i) Show that $\mathbb{P}\{M_n \le x_n\} = (1 - c/n)^n \to e^{-c}$ as $n \to \infty$.
(ii) If $0 \le x + 1 \le \sqrt{2\log n}$, show that $\mathbb{P}\{|Z_i| \ge x\} \ge 2\phi(1 + x) \ge n^{-1}\sqrt{2/\pi}$. Deduce that $x_n(\sqrt{2/\pi}) + 1 > \sqrt{2\log n}$.
(iii) Deduce that there exists some positive constant $C$ for which $\mathbb{P}M_n \ge C\sqrt{\log n}$ for all $n$.
(iv) For $X$ a standard Brownian motion as in Section 4.8, deduce from (iii) that $\mathbb{P}\,\mathrm{osc}_2(\delta, X) \ge c\,\delta\sqrt{\log(1/\delta)}$ for all $0 < \delta \le 1/2$, where $c$ is a positive constant. Hint: Write $2^{m/2} X_1$ as a sum of $2^m$ independent standard normals.
[7] (A sharpening of the result from the previous Problem.) The classical bounds (Feller, 1968, Section VII.1 and Problem 7.1) show that the normal tail probability $\overline\Phi(x) = \mathbb{P}\{N(0,1) > x\}$ behaves asymptotically like $\phi(x)/x$. More precisely,
$$1 - x^{-2} \le x\,\overline\Phi(x)/\phi(x) \le 1 \qquad\text{for all } x > 0.$$
Less precisely,
$$-\log\big(c_0\,\overline\Phi(x)\big) = \tfrac12 x^2 + \log x + O(x^{-2}) \qquad\text{as } x \to \infty,\ \text{where } c_0 = \sqrt{2\pi}.$$
(i) (Compare with Leadbetter et al. 1983, Theorem 1.5.3.) Define $a_n = \sqrt{2\log n}$ and $L_n = \log a_n$. For each constant $\beta$ define $m_{\beta,n} = a_n - (1+\beta)L_n/a_n$. Show that
$$c_0\,\overline\Phi(m_{\beta,n}) = n^{-1}\exp\big(\beta L_n + o(1)\big).$$
4.10 Notes
Credit for the idea of chaining as a method of successive approximations clearly belongs to Kolmogorov, at least for the case of a one-dimensional index set. For example, the Theorem stated at the start of the paper of Chentsov (1956) was footnoted by the comment "This theorem was first published in a paper by E. E. Slutskii [2]", with a reference to a 1937 paper that I have not seen. See Billingsley (1968, Section 12) for a small generalization of the theorem (with credit to Kolmogorov, via Slutsky, and Chentsov) and a chaining proof.
See Dudley (1973, Section 1) and Dudley (1999a, Section 1.2 and Notes)
for more about packing and covering. The definitive early work is due to
Kolmogorov and Tikhomirov (1959).
Dudley (1973) used chaining with packing/covering numbers and tail in-
equalities to establish various probabilistic bounds for Gaussian processes.
Dudley (1978) adapted the methods, using the Bernstein inequality and metric entropy with inclusion assumptions (now called bracketing; see Chapter 13), to extend the Gaussian techniques to empirical processes indexed by collections of sets. He also derived bounds for processes indexed by VC classes of sets (see Chapter 9) via symmetrization (see Chapter 8) arguments. In each case he controlled the increments of the empirical processes by exponential inequalities like those in the chapter on the Hoeffding and Bennett inequalities.
Pisier (1983) is usually credited for realizing that the entropy methods used for Gaussian processes could also be extended to non-Gaussian processes with Orlicz norm control of the increments. However, as Pisier (page 127) remarked:
For the proof of this theorem, we follow essentially [10]; I have included a slight improvement over [10] which was kindly pointed out to me by X. Fernique. Moreover, I should mention that N. Kono [6] proved a result which is very close to the above; at the time of [10], I was not aware of Kono's paper [6].
Here [10] = Pisier (1980) and [6] = Kono (1980). The earlier paper [10]
included extensive discussion of other precursors for the idea. See also the
Notes to Section 2.6 of Dudley (1999b).
Using methods like those in Section 4.5, Nolan and Pollard (1988) proved a functional central limit theorem for the U-statistic analog of the empirical process.
Kim and Pollard (1990) and Pollard (1990) proved limit theorems for a
variety of statistical estimators using second moment control for suprema of
empirical processes.
My analysis in Example <18> is based on arguments of Ibragimov and
Hasminskii (1981, Section 1.5), with the chaining bound replacing their
method for deriving maximal inequalities. The analysis could be extended
to unbounded subsets of R by similar adaptations of their arguments for
unbounded sets.
See Pollard (1985) for one way to use a form of oscillation bound (under
the name stochastic differentiability) to establish central limit theorems for
M-estimators. Pakes and Pollard (1989, Lemma 2.17) used a property more
easily recognized as oscillation around a fixed index point.
References
Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
Dudley, R. M. (1978). Central limit theorems for empirical measures. Annals of Probability 6, 899–929.
Dudley, R. M. (2003). Real Analysis and Probability (2nd ed.). Cambridge Studies in Advanced Mathematics. Cambridge University Press.
Kim, J. and D. Pollard (1990). Cube root asymptotics. Annals of Statistics 18, 191–219.
Nolan, D. and D. Pollard (1988). Functional limit theorems for U-processes. Annals of Probability 16, 1291–1298.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Econometrica 57, 1027–1058.
Pollard, D. (1985). New ways to prove central limit theorems. Econometric Theory 1, 295–314.
Talagrand, M. (2005). The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer-Verlag.