
Kernel regression uniform rate estimation for censored data under α-mixing condition

arXiv:0802.2800v1 [math.ST] 20 Feb 2008
Zohra Guessoum¹ and Elias Ould-Saïd²

¹ Faculté de Mathématiques, Univ. des Sci. et Tech. Houari Boumedienne, BP 32, El Alia, 16111, Algeria.
e-mail: z0guessoum@hotmail.com
² L.M.P.A. J. Liouville, Univ. du Littoral Côte d'Opale, BP 699, 62228 Calais, France.
e-mail: ouldsaid@lmpa.univ-littoral.fr (corresponding author)

Abstract: In this paper, we study the behavior of a kernel estimator of the regression function in the right censored model with α-mixing data. The uniform strong consistency of the estimate over a real compact set is established, along with a rate of convergence. Some simulations are carried out to illustrate the behavior of the estimate with different examples for finite sample sizes.

Keywords and phrases: Censored data; Kernel estimator; Nonparametric regression; Rate of convergence; Strong consistency; Strong mixing.

Mathematics Subject Classification (2000): 62G05, 62G20.

Contents

1 Introduction
2 Definition of estimators
3 Assumptions and main result
4 Simulations Study
5 Proofs
References

1. Introduction.

Consider a real random variable (rv) Y and a sequence of strictly stationary rvs (Y_i)_{i≥1} with common unknown absolutely continuous distribution function (df) F. In survival analysis, the rvs may be the lifetimes of patients under study. Let (C_i)_{i≥1} be a sequence of censoring rvs with common unknown df G. In contrast to statistics for complete-data studies, the censored
imsart-generic ver. 2008/01/24 file: ejs_2008_195.tex date: January 4, 2014
Z. Guessoum and E. Ould-Saïd / Regression for censored data under α-mixing condition

model involves the pairs (T_i, δ_i), i = 1, ..., n, where only T_i = Y_i ∧ C_i and δ_i = 1I_{Y_i ≤ C_i} are observed.
Let X be an ℝ^d-valued random vector. Let (X_i)_{i≥1} be a sequence of copies of the random vector X and denote by X_{i,1}, ..., X_{i,d} the coordinates of X_i. The study we perform below is then based on the set of observations (T_i, δ_i, X_i)_{i≥1}. In regression analysis one expects to identify, if any, the relationship between the Y_i's and the X_i's. This means looking for a function of X describing this relationship that minimizes the mean squared error criterion. It is well known that this minimum is achieved by the regression function m(x), defined on ℝ^d by

    m(x) = 𝔼(Y | X = x).
There is a wide range of literature on nonparametric estimation of the regression function and many nonlinear smoothers, including kernel, spline, local polynomial and orthogonal-series methods. For an overview of methods and results, from both theoretical and applied points of view and covering the independent and dependent cases, we refer the reader to Collomb [13], Silverman [45], Härdle [25], Wahba [48], Wand and Jones [47], Masry and Fan [37], Cai [8] and Cai and Ould-Saïd [9].
In the uncensored case, the behavior of nonparametric estimators built upon mixing sequences has been extensively studied, and their consistency has been investigated by many authors. Without being exhaustive, we quote Robinson [39, 40], Collomb [13], Roussas [42] and Laïb [29]. Some other types of dependence structure have been considered: we refer to Yakowitz [49] for Markov chains, Delecroix [18] and Laïb and Ould-Saïd [30] for ergodic processes, Hall and Hart [24] for long-range memory processes, and Cai and Roussas [11] for associated random variables. Collomb and Härdle [15] obtained the uniform convergence with rates, and some other asymptotic results, for a family of kernel robust estimators under a φ-mixing condition, whereas González-Manteiga et al. [22] developed a nonparametric test, based on kernel smoothers, to decide whether some covariates could be suppressed in a multidimensional nonparametric regression study. Under the α-mixing condition, the uniform strong convergence of the Nadaraya-Watson estimator is treated in Doukhan [19], Bosq [4] and Liebscher [34]. Roussas [42] established the consistency with rate of the regression estimator under some summability requirement. Techniques used in nonparametric regression estimation are closely related to density estimation, where kernel estimators have been extensively studied: see for example Roussas [41], Tran [44], Vieu [46] and Liebscher [33, 35]. Cai and Roussas [10] established the strong convergence with rate of the kernel density estimator in the multidimensional case, while Tae and




Cox [43] established the same result with a slight difference in the rate. Andrews [1] provides a comprehensive set of results concerning uniform almost sure convergence. Masry [36] derived sharp rates for the same kind of convergence, but confined attention to the case of bounded regression.
Our goal is to establish the strong uniform convergence with rate of the kernel regression estimate under an α-mixing condition in random censorship models. For this kind of model, Cai [5, 7] established the asymptotic properties of the Kaplan-Meier estimator. The strong convergence of a hazard rate estimator was examined by Lecoutre and Ould-Saïd [31], while Liebscher [35] derived a uniform rate for the strong convergence of kernel density and hazard rate estimators; his result improves on that given in Cai [6]. The consistency results concerning the nonparametric estimates of the conditional survival function introduced by Beran [2] and Dabrowska [16, 17] in the iid case were extended by Lecoutre and Ould-Saïd [32] to the strong mixing case. We point out that, in the independent case, the behavior of the regression function under the censorship model has been extensively studied; we can quote Carbonez et al. [12], Kohler et al. [28] and Guessoum and Ould-Saïd [23]. However, few papers deal with the regression function under censoring in the dependent case.
To this end, we extend the result of Guessoum and Ould-Saïd [23] from the iid setting to the dependent case. The paper is organized as follows: in Section 2 we give some definitions and notation for the regression function under the censorship model and for the strong mixing process. Section 3 is devoted to the assumptions and the main result. In Section 4, some simulations are drawn to lend further support to our theoretical results. Proofs, with auxiliary results, are relegated to Section 5.

2. Definition of estimators

Suppose that {Y_i, i ≥ 1} and {C_i, i ≥ 1} are two independent sequences of stationary random variables. We want to estimate m(x) = 𝔼(Y | X = x), which can be written as m(x) = r₁(x)/ℓ(x), where

    r₁(x) = ∫_ℝ y f_{X,Y}(x, y) dy    (1)

with f_{X,Y}(x, y) being the joint density of (X, Y) and ℓ(·) the density function of the covariates.
Now, it is well known that the kernel estimator of the regression function




m(·) under the censorship model (see, e.g., Carbonez et al. [12]) is given by

    m̄_n(x) = Σ_{i=1}^n W_{in}(x) (δ_i T_i / Ḡ(T_i))    (2)

where Ḡ is the survival function of the rv C and

    W_{in}(x) = K_d((x − X_i)/h_n) / Σ_{j=1}^n K_d((x − X_j)/h_n)

are the Nadaraya-Watson weights, K_d is a probability density function (pdf) defined on ℝ^d and h_n is a sequence of positive numbers converging to 0 as n goes to infinity. Then (2) can be written

    m̄_n(x) =: r̄_{1,n}(x) / ℓ_n(x),

where

    r̄_{1,n}(x) = (1/(n h_n^d)) Σ_{i=1}^n (δ_i T_i / Ḡ(T_i)) K_d((x − X_i)/h_n)   and   ℓ_n(x) = (1/(n h_n^d)) Σ_{i=1}^n K_d((x − X_i)/h_n).    (3)
(3)
In practice, Ḡ is usually unknown; we replace it by the corresponding Kaplan-Meier [27] estimator (KME) Ḡ_n defined by

    Ḡ_n(t) = ∏_{i=1}^n ( 1 − (1 − δ_{(i)})/(n − i + 1) )^{1I_{T_{(i)} ≤ t}}   if t < T_{(n)},   and   Ḡ_n(t) = 0   if t ≥ T_{(n)},

where T_{(1)} ≤ ... ≤ T_{(n)} are the order statistics of the T_i and δ_{(i)} is the censoring indicator associated with T_{(i)}.
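As a concrete illustration, the product above can be coded directly. The following sketch is ours, not part of the paper, and the function name `km_censoring_survival` is our own choice; it estimates the censoring survival function Ḡ from the observed pairs (T_i, δ_i), the factor 1 − δ appearing because the Kaplan-Meier product here tracks the censoring times rather than the lifetimes.

```python
import numpy as np

def km_censoring_survival(T, delta):
    """Kaplan-Meier estimator of the censoring survival function
    G_bar(t) = P(C > t), built from observed pairs (T_i, delta_i).
    The jumps sit at the censored observations (delta_(i) = 0)."""
    order = np.argsort(T)
    T_sorted, delta_sorted = T[order], delta[order]
    n = len(T)
    # factors (1 - (1 - delta_(i)) / (n - i + 1)) for i = 1, ..., n (1-based)
    factors = 1.0 - (1.0 - delta_sorted) / (n - np.arange(1, n + 1) + 1.0)

    def G_bar_n(t):
        if t >= T_sorted[-1]:
            return 0.0                       # convention: 0 beyond T_(n)
        return float(np.prod(factors[T_sorted <= t]))  # product over T_(i) <= t

    return G_bar_n
```

With no censored observation (all δ_i = 1) every factor equals 1, so Ḡ_n(t) = 1 for all t < T_{(n)}, as expected.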

The properties of the KME for dependent variables can be found in Cai [5, 7]. Then a feasible estimator of m(x) is given by m_n(x) = r_{1,n}(x)/ℓ_n(x), where

    r_{1,n}(x) = (1/(n h_n^d)) Σ_{i=1}^n (δ_i T_i / Ḡ_n(T_i)) K_d((x − X_i)/h_n)    (4)

is an estimator of r₁(x) and ℓ_n(x) (defined in (3)) is an estimator of ℓ(x).
In what follows, we define the endpoints of F and G by τ_F = sup{y : 1 − F(y) > 0} and τ_G = sup{y : Ḡ(y) > 0}, and we assume that τ_F < ∞ and Ḡ(τ_F) > 0 (this implies τ_F < τ_G).
For technical reasons (see Lemma 5.1), we assume that {C_i, i ≥ 1} and {(X_i, Y_i), i ≥ 1} are independent; furthermore, this condition is plausible whenever the censoring is independent of the characteristics of the patients under study. We point out that, since Y can be a lifetime, we may suppose it is bounded. We put ‖t‖ = Σ_{j=1}^d |t_j| for t ∈ ℝ^d.

In order to define the α-mixing property, we introduce the following notation. Denote by F_i^k(Z) the σ-algebra generated by {Z_j, i ≤ j ≤ k}.

Definition 2.1 Let {Z_i, i = 1, 2, ...} denote a sequence of rvs. Given a positive integer n, set

    α(n) = sup{ |ℙ(A ∩ B) − ℙ(A)ℙ(B)| : A ∈ F_1^k(Z) and B ∈ F_{k+n}^{∞}(Z), k ∈ ℕ }.

The sequence is said to be α-mixing (strong mixing) if the mixing coefficient α(n) → 0 as n → ∞.
There exist many processes fulfilling the strong mixing property. We quote here the usual ARMA processes, which are geometrically strongly mixing, i.e., there exist ρ ∈ (0, 1) and a > 0 such that, for any n ≥ 1, α(n) ≤ a ρⁿ (see, e.g., Jones [26]). The threshold models, the EXPAR models (see Ozaki [38]), the simple ARCH models (see Engle [20]), their GARCH extension (see Bollerslev [3]) and the bilinear Markovian models are geometrically strongly mixing under some general ergodicity conditions.
We suppose that the sequences {Y_i, i ≥ 1} and {C_i, i ≥ 1} are α-mixing with coefficients α₁(n) and α₂(n), respectively. Cai ([7], Lemma 2) showed that {T_i, i ≥ 1} is then strongly mixing, with coefficient

    α(n) = 4 max(α₁(n), α₂(n)).

From now on, we suppose that {(T_i, δ_i, X_i), i = 1, ..., n} is strongly mixing. Now we are in a position to give our assumptions and main result.

3. Assumptions and main result


Let C be a compact set of ℝ^d which is included in C₀ = {x ∈ ℝ^d : ℓ(x) > 0}. We will make use of the following assumptions, gathered here for easy reference:

A1) The bandwidth h_n satisfies lim_{n→+∞} n h_n^d = +∞ and lim_{n→+∞} h_n^τ log log n = 0, where 0 < τ < d.

A2) The kernel K_d is bounded and satisfies:
    i) ∫_{ℝ^d} ‖t‖ K_d(t) dt < +∞,
    ii) ∫_{ℝ^d} (t₁ + t₂ + ... + t_d) K_d²(t) dt < +∞ and ∫_{ℝ^d} K_d²(t) dt < +∞,
    iii) ∀(t, s) ∈ C², |K_d(t) − K_d(s)| ≤ C ‖t − s‖^γ for some γ > 0.

A3) The mixing coefficient is such that α(n) = O(n^{−ν}) for some ν > p + √(p² + 3γ(d − 1)), where p = (γ(4 + d) + d)/2.

A4) The function r₁(·) defined in (1) is continuously differentiable.

A5) The function r₂(x) := ∫_ℝ y² f_{X,Y}(x, y) dy is continuously differentiable.

A6) ∃ D > 0 such that sup_{u,v∈C} |ℓ_{ij}(u, v) − ℓ(u)ℓ(v)| < D, where ℓ_{ij} is the joint density of (X_i, X_j).

A7) ∃ β > 0, c₁ > 0, c₂ > 0 such that

    c₁ n^{(3−ν)/(ν(γ+1)+2γ+1)} ≤ h_n^d ≤ c₂ n^{d/(β+d) − 1}.

A8) The marginal density ℓ(·) is continuously differentiable and there exists θ > 0 such that ℓ(x) > θ, ∀x ∈ C.

Remark 3.1 Assumption A1 is very common in functional estimation, in both the independent and dependent cases. However, it must be reinforced by Assumptions A3 and A7, which ensure a practical calculation of the covariance terms and the convergence of the series appearing in the proof of Lemma 5.3. Assumptions A2, A4, A5 and A6 are needed in the study of the bias term of r̄_{1,n}(x), which is the kernel estimator of r₁(x). We point out that we do not require K_d to be symmetric, as in Guessoum and Ould-Saïd [23]. Assumption A8 intervenes in the convergence of the kernel density estimator. Finally, the boundedness of Y is assumed only to simplify the proofs; it can be dropped by using truncation methods as in Laïb and Ould-Saïd [30]. In the sequel, the letter C denotes any generic constant.
Our main result is given in the following theorem, which gives the rate of almost sure uniform convergence of the regression estimator.

Theorem 3.1 Under Assumptions A1-A8, we have

    sup_{x∈C} |m_n(x) − m(x)| = O( max{ √(log n/(n h_n^d)), h_n } )   a.s. as n → ∞.

Remark 3.2 The rate obtained here is slightly different from that obtained by Guessoum and Ould-Saïd [23] in the independent case, which is O(max{√(log log n/(n h_n)), h_n²}) for d = 1. Their result can easily be generalized to a higher-dimensional covariate, i.e. X ∈ ℝ^d, by adapting their Assumptions A1 and A2 to obtain the rate O(max{√(log log n/(n h_n^d)), h_n²}).



 
Remark 3.3 If we choose h_n = O((log n/n)^{1/(d+2)}), then Theorem 3.1 becomes:

    sup_{x∈C} |m_n(x) − m(x)| = O((log n/n)^{1/(d+2)})   a.s.

This is the optimal rate obtained by Liebscher [34] in the uncensored case.
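As a quick numerical illustration of this remark (our own computation, not from the paper): with d = 1 and h_n = (log n/n)^{1/3}, the two terms of the bound in Theorem 3.1 balance exactly, so the uniform error bound decays like (log n/n)^{1/3}.

```python
import math

def optimal_bandwidth(n, d=1):
    # h_n = (log n / n)^{1/(d+2)} as in Remark 3.3
    return (math.log(n) / n) ** (1.0 / (d + 2))

def rate_bound(n, d=1):
    # max of the two terms of Theorem 3.1 at this bandwidth
    h = optimal_bandwidth(n, d)
    return max(math.sqrt(math.log(n) / (n * h ** d)), h)

for n in (50, 100, 300, 10000):
    print(n, rate_bound(n, d=1))
```

For d = 1, √(log n/(n h_n)) coincides with h_n itself, so the printed bound is simply (log n/n)^{1/3}, decreasing in n.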

4. Simulations Study

First, we consider the strong mixing two-dimensional process generated by

    X_i = ρ X_{i−1} + √(1 − ρ²) ε_i,
    Y_i = X_{i+1},    i = 1, 2, ..., n,

where 0 < ρ < 1, (ε_i)_i is a white noise with standard Gaussian distribution and X₀ is a standard Gaussian rv independent of (ε_i)_i. We also simulate n iid rvs C_i, exponentially distributed with parameter λ = 1.5. It is clear that the process (X_n, Y_n, C_n) is stationary and strongly mixing; in fact, the process (X_n) is an AR(1) and, given X₁ = x, we have Y₁ = ρx + √(1 − ρ²) ε₂, so that Y₁ ∼ N(ρx, 1 − ρ²). In all cases we took ρ = 0.9. We calculate our estimator based on the observed data (X_i, T_i, δ_i), i = 1, ..., n, choosing a Gaussian kernel K. In this case, we have m(x) = 𝔼(Y₁ | X₁ = x) = ρx. In all cases we took h_n satisfying A1 and A7, that is h_n = O((log n/n)^{1/3}).
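The design just described can be reproduced with a short script. The following is our own sketch (the seed, the sample size and all variable names are ours), combining the AR(1) simulation, the Kaplan-Meier estimator of Ḡ, and the feasible estimator m_n of Section 2:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, lam, n = 0.9, 1.5, 2000

# AR(1) covariate and response Y_i = X_{i+1}, so m(x) = rho * x
eps = rng.standard_normal(n + 1)
X = np.empty(n + 1)
X[0] = rng.standard_normal()                     # X_0 ~ N(0, 1), stationary start
for i in range(1, n + 1):
    X[i] = rho * X[i - 1] + np.sqrt(1 - rho**2) * eps[i]
Xobs, Y = X[:n], X[1:]

# censoring: C_i ~ Exp(lambda = 1.5); observe T_i = Y_i ^ C_i, delta_i = 1{Y_i <= C_i}
C = rng.exponential(scale=1.0 / lam, size=n)
T = np.minimum(Y, C)
delta = (Y <= C).astype(float)

# Kaplan-Meier estimator of the censoring survival function G_bar
order = np.argsort(T)
Ts, ds = T[order], delta[order]
factors = 1.0 - (1.0 - ds) / (n - np.arange(n))  # denominators n - i + 1
def G_bar_n(t):
    return float(np.prod(factors[Ts <= t]))

# feasible estimator m_n(x) = r_{1,n}(x) / l_n(x) with a Gaussian kernel
h = (np.log(n) / n) ** (1.0 / 3.0)               # h_n = (log n / n)^{1/3}
Gvals = np.maximum([G_bar_n(t) for t in T], 1e-10)  # guard; delta_i = 0 terms vanish
def m_n(x):
    w = np.exp(-((x - Xobs) / h) ** 2 / 2.0)     # kernel constant cancels in the ratio
    return np.sum(w * delta * T / Gvals) / np.sum(w)

grid = np.linspace(-1.0, 1.0, 5)
est = np.array([m_n(x) for x in grid])
print(np.max(np.abs(est - rho * grid)))          # uniform error on the grid
```

Increasing n, as in Figure 1, makes the fitted curve track ρx more closely; replacing the line defining Y by the sinus or parabolic transformation reproduces the nonlinear examples of this section.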




Fig 1. m(x) = ρx, with n = 50, 100 and 300, respectively. (In each panel, the continuous curve is the true curve and the dashed curve the estimated curve.)

We notice that the quality of fit increases with n (see Figure 1).

We also consider two nonlinear cases:

    Y_i = sin((π/2) X_i),            sinus case,       (5)
    Y_i = (5/12) X_{i+1}² − 2,       parabolic case.   (6)

Then we have m(x) = sin((π/2) x) for (5) and m(x) = (5ρ²/12) x² + (5/12)(1 − ρ²) − 2 for (6).
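The closed form for the parabolic case follows from 𝔼(X²_{i+1} | X_i = x) = ρ²x² + (1 − ρ²); a quick Monte Carlo check (ours, at an arbitrary test point x = 0.7) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, x = 0.9, 0.7
# Given X_i = x: X_{i+1} = rho * x + sqrt(1 - rho^2) * eps with eps ~ N(0, 1)
eps = rng.standard_normal(2_000_000)
X_next = rho * x + np.sqrt(1 - rho**2) * eps
mc = np.mean(5.0 / 12.0 * X_next**2 - 2.0)       # simulated E[Y | X = x]
closed_form = 5.0 * rho**2 / 12.0 * x**2 + 5.0 / 12.0 * (1 - rho**2) - 2.0
print(mc, closed_form)
```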



Fig 2. m(x) = sin((π/2) x), with n = 50, 100 and 300, respectively. (In each panel, the continuous curve is the true curve and the dashed curve the estimated curve.)

Fig 3. m(x) = (5ρ²/12) x² + (5/12)(1 − ρ²) − 2, with n = 50, 100 and 300, respectively. (In each panel, the continuous curve is the true curve and the dashed curve the estimated curve.)

Figures 2 and 3 show that the quality of fit for the nonlinear models is as good as in the linear model.

5. Proofs

We split the proof of Theorem 3.1 into the following lemmata.


Lemma 5.1 Under Assumptions A1, A2 i) and A4, for n large enough:

    sup_{x∈C} |𝔼(r̄_{1,n}(x)) − r₁(x)| = O(h_n)   as n → +∞.

Proof of Lemma 5.1: Observe that

    𝔼( δ₁ T₁/Ḡ(T₁) | X₁ = u ) = 𝔼[ 𝔼( 1I_{Y₁ ≤ C₁} Y₁/Ḡ(Y₁) | Y₁ ) | X₁ = u ]
        = 𝔼[ (Y₁/Ḡ(Y₁)) 𝔼( 1I_{Y₁ ≤ C₁} | Y₁ ) | X₁ = u ]
        = 𝔼( Y₁ | X₁ = u )
        = m(u).

Then, we have from (3)

    𝔼(r̄_{1,n}(x)) − r₁(x) = (1/h_n^d) 𝔼[ (δ₁ T₁/Ḡ(T₁)) K_d((x − X₁)/h_n) ] − r₁(x)
        = (1/h_n^d) 𝔼[ K_d((x − X₁)/h_n) 𝔼( δ₁ T₁/Ḡ(T₁) | X₁ ) ] − r₁(x)
        = (1/h_n^d) ∫_{ℝ^d} K_d((x − u)/h_n) m(u) ℓ(u) du − r₁(x)
        = ∫_{ℝ^d} K_d(t) [ r₁(x − h_n t) − r₁(x) ] dt,

since r₁ = m ℓ.
A Taylor expansion gives

    r₁(x − h_n t) − r₁(x) = −h_n ( t₁ ∂r₁/∂x₁(x*) + ... + t_d ∂r₁/∂x_d(x*) ),

where x* is between x − h_n t and x. Then

    sup_{x∈C} |𝔼(r̄_{1,n}(x)) − r₁(x)| = sup_{x∈C} | ∫_{ℝ^d} K_d(t) [ r₁(x − h_n t) − r₁(x) ] dt |
        ≤ h_n sup_{x∈C} | ∫_{ℝ^d} K_d(t) ( t₁ ∂r₁/∂x₁(x*) + ... + t_d ∂r₁/∂x_d(x*) ) dt |.

Then Assumptions A1, A2 i) and A4 give the result. □


Now, we introduce the following lemma (see Ferraty and Vieu [21], Proposition A.11 ii), p. 237).

Lemma 5.2 Let {U_i, i ∈ ℕ} be a sequence of real random variables with strong mixing coefficient α(n) = O(n^{−ν}), ν > 1, such that ∀n ∈ ℕ, ∀i ∈ ℕ with 1 ≤ i ≤ n, |U_i| < +∞. Then, for each ε > 0 and each q > 1:

    ℙ{ |Σ_{i=1}^n U_i| > ε } ≤ C ( 1 + ε²/(q S_n²) )^{−q/2} + n C q^{−1} ( 2q/ε )^{ν+1},

where S_n² = Σ_i Σ_j |cov(U_i, U_j)|.
Lemma 5.3 Under Assumptions A1-A7, we have

    sup_{x∈C} |r̄_{1,n}(x) − 𝔼(r̄_{1,n}(x))| = O( √(log n/(n h_n^d)) )   a.s. as n → ∞.




Proof of Lemma 5.3: Since C is a compact set, it admits a covering by a finite number ℓ_n of balls B_k(x^k, a_n^d) centered at x^k = (x_{1,k}, ..., x_{d,k}), k ∈ {1, ..., ℓ_n}. Then for all x ∈ C there exists k ∈ {1, ..., ℓ_n} such that ‖x − x^k‖ ≤ a_n^d, where a_n verifies a_n^d = h_n^{d(γ+1/2)} n^{−d/2} (γ is the same as in Assumption A2 iii)). Since C is bounded, there exists a constant M > 0 such that ℓ_n ≤ M a_n^{−d}.
Now we set, for x ∈ C:

    Δ_i(x) = (1/(n h_n^d)) (δ_i T_i/Ḡ(T_i)) K_d((x − X_i)/h_n) − (1/(n h_n^d)) 𝔼[ (δ₁ T₁/Ḡ(T₁)) K_d((x − X₁)/h_n) ].

It is obvious that

    Σ_{i=1}^n Δ_i(x) = r̄_{1,n}(x) − 𝔼(r̄_{1,n}(x)).

Writing Δ_i(x) = Δ_i(x) − Δ_i(x^k) + Δ_i(x^k), we clearly have |Δ_i(x)| ≤ |Δ_i(x) − Δ_i(x^k)| + |Δ_i(x^k)|. Then

    sup_{x∈C} |Σ_{i=1}^n (Δ_i(x) − Δ_i(x^k))| ≤ sup_{x∈C} (1/n) Σ_{i=1}^n (δ_i |T_i|/Ḡ(T_i)) (1/h_n^d) | K_d((x − X_i)/h_n) − K_d((x^k − X_i)/h_n) |
        + sup_{x∈C} 𝔼[ (δ₁ |T₁|/Ḡ(T₁)) (1/h_n^d) | K_d((x − X₁)/h_n) − K_d((x^k − X₁)/h_n) | ].

From Assumption A2 iii),

    sup_{x∈C} |Σ_{i=1}^n (Δ_i(x) − Δ_i(x^k))| ≤ (2 𝔼|Y₁|/Ḡ(τ_F)) (1/h_n^d) sup_{x∈C} ‖(x − x^k)/h_n‖^γ ≤ C (𝔼|Y₁|/Ḡ(τ_F)) a_n^d / h_n^{γ+d}
        = C (𝔼|Y₁|/Ḡ(τ_F)) h_n^{d(γ+1/2)} n^{−d/2} / h_n^{γ+d} ≤ (C/√(n h_n^d)) h_n^{γ(d−1)}.

Assumption A1 then implies that sup_{x∈C} |Σ_{i=1}^n (Δ_i(x) − Δ_i(x^k))| = O(1/√(n h_n^d)) a.s.
On the other hand, let U_i = n h_n^d Δ_i(x^k). In order to apply Lemma 5.2, we have to evaluate S_n². It is clear that

    S_n² = Σ Σ_{i≠j} |cov(U_i, U_j)| + n Var(U₁).
We have

    Var(U₁) = 𝔼[ (δ₁² T₁²/Ḡ²(T₁)) K_d²((x^k − X₁)/h_n) ] − ( 𝔼[ (δ₁ T₁/Ḡ(T₁)) K_d((x^k − X₁)/h_n) ] )² =: I₁ − I₂.

Using the conditional expectation properties and a change of variables, we get

    I₁ = 𝔼[ (δ₁² T₁²/Ḡ²(T₁)) K_d²((x^k − X₁)/h_n) ]
       = 𝔼[ K_d²((x^k − X₁)/h_n) 𝔼( δ₁² T₁²/Ḡ²(T₁) | X₁ ) ]
       ≤ (h_n^d/Ḡ(τ_F)) ∫_{ℝ^d} K_d²(t) r₂(x^k − h_n t) dt.

By a Taylor expansion around x^k, under Assumptions A2 ii) and A5, we obtain

    I₁ = O(h_n^d).

From Assumption A4,

    I₂ = ( 𝔼[ K_d((x^k − X₁)/h_n) 𝔼( δ₁ T₁/Ḡ(T₁) | X₁ ) ] )² = ( ∫_{ℝ^d} K_d((x^k − u)/h_n) r₁(u) du )² = O(h_n^{2d}).

Finally, Var(U₁) = O(h_n^d).
Now let S_n'² = Σ Σ_{i≠j} |cov(U_i, U_j)|; a direct calculation of |cov(U_i, U_j)| gives

    |cov(U_i, U_j)| = |𝔼(U_i U_j)|
      = | 𝔼{ [ (δ_i T_i/Ḡ(T_i)) K_d((x^k − X_i)/h_n) − 𝔼( (δ_i T_i/Ḡ(T_i)) K_d((x^k − X_i)/h_n) ) ]
            × [ (δ_j T_j/Ḡ(T_j)) K_d((x^k − X_j)/h_n) − 𝔼( (δ_j T_j/Ḡ(T_j)) K_d((x^k − X_j)/h_n) ) ] } |
      ≤ | 𝔼[ (Y_i/Ḡ(Y_i)) K_d((x^k − X_i)/h_n) (Y_j/Ḡ(Y_j)) K_d((x^k − X_j)/h_n) ] |
        + | 𝔼[ (Y_i/Ḡ(Y_i)) K_d((x^k − X_i)/h_n) ] 𝔼[ (Y_j/Ḡ(Y_j)) K_d((x^k − X_j)/h_n) ] |
      ≤ C h_n^{2d} ∫_{ℝ^d} ∫_{ℝ^d} K_d(z) K_d(t) | ℓ_{ij}(x^k − z h_n, x^k − t h_n) − ℓ(x^k − z h_n) ℓ(x^k − t h_n) | dz dt.

Assumption A6 gives

    |cov(U_i, U_j)| = O(h_n^{2d}).    (7)




On the other hand, from a result in Bosq ([4], p. 22), we have

    |cov(U_i, U_j)| ≤ C α(|i − j|).    (8)

Then, to evaluate S_n'², the idea is to introduce a sequence of integers w_n, made precise below, and to use (7) for close i and j and (8) otherwise. That is,

    S_n'² = Σ Σ_{0<|i−j|≤w_n} |cov(U_i, U_j)| + Σ Σ_{|i−j|>w_n} |cov(U_i, U_j)|
         ≤ C n h_n^{2d} w_n + C n² α(w_n).

Now, choosing w_n = [h_n^{−d}] + 1, we have S_n'² ≤ O(n h_n^d) + C n² α(h_n^{−d}). Assumption A3 and the right part of Assumption A7 yield n² α(h_n^{−d}) = O(n h_n^d). So

    S_n'² = O(n h_n^d).

Finally, we have

    S_n² = S_n'² + n Var(U₁) = O(n h_n^d).
Then, for ε > 0, applying Lemma 5.2, we have

    ℙ{ |Σ_{i=1}^n Δ_i(x^k)| > ε } = ℙ{ |Σ_{i=1}^n U_i| > ε n h_n^d }
      ≤ C ( 1 + C ε² n h_n^d / q )^{−q/2} + n C q^{−1} ( 2q/(ε n h_n^d) )^{ν+1}.    (9)

If we replace ε by ε₀ √(log n/(n h_n^d)), for any ε₀ > 0, in (9), we get

    ℙ{ |Σ_{i=1}^n Δ_i(x^k)| > ε₀ √(log n/(n h_n^d)) } ≤ C ( 1 + C ε₀² log n / q )^{−q/2} + n C q^{−1} ( 2q/(ε₀ √(n h_n^d log n)) )^{ν+1}.    (10)

By choosing q = (log n)^{1+b} (b > 0), (10) becomes

    ℙ{ |Σ_{i=1}^n Δ_i(x^k)| > ε₀ √(log n/(n h_n^d)) } ≤ C n^{−C ε₀²} + C ε₀^{−(ν+1)} (log n)^{(1+b)ν} n^{1−(ν+1)/2} h_n^{−d(ν+1)/2}.




Now we can write

    ℙ{ max_{k=1,...,ℓ_n} |Σ_{i=1}^n Δ_i(x^k)| > ε₀ √(log n/(n h_n^d)) } ≤ Σ_{k=1}^{ℓ_n} ℙ{ |Σ_{i=1}^n Δ_i(x^k)| > ε₀ √(log n/(n h_n^d)) }
      ≤ M a_n^{−d} ( C n^{−C ε₀²} + C ε₀^{−(ν+1)} (log n)^{(1+b)ν} n^{1−(ν+1)/2} h_n^{−d(ν+1)/2} )
      ≤ C M h_n^{−d(γ+1/2)} n^{d/2 − C ε₀²} + M C ε₀^{−(ν+1)} (log n)^{(1+b)ν} n^{1 + d/2 − (ν+1)/2} h_n^{−d(γ+1/2) − d(ν+1)/2}
      =: C M J₁ + M C ε₀^{−(ν+1)} J₂.    (11)

From the left part of Assumption A7, we have

    J₂ ≤ C (log n)^{(1+b)ν} n^{1 + d/2 − (ν+1)/2} n^{(ν−3)(γ + 1/2 + (ν+1)/2)/(ν(γ+1)+2γ+1)}.

Then, for an appropriate choice of ν, J₂ is the general term of a convergent series. In the same way, J₁ ≤ C h_n^{−d(γ+1/2)} n^{d/2 − C ε₀²}, and we can choose ε₀ such that J₁ is the general term of a convergent series. Finally, applying the Borel-Cantelli lemma to (11) gives the result. □
Remark 5.1 We point out that the parameter β of Assumption A7 can be chosen such that

    1/(ν(γ+1)+2γ+1) < β < (3−ν)/(1 − d[ν(γ+1)+2γ+1]).

This condition ensures the convergence of the series in Lemma 5.3.
Lemma 5.4 Under Assumptions A1-A3 and A6-A8,

    sup_{x∈C} |ℓ_n(x) − ℓ(x)| = O( max{ √(log n/(n h_n^d)), h_n } )   a.s. as n → ∞.

Proof of Lemma 5.4: We have

    sup_{x∈C} |ℓ(x) − ℓ_n(x)| ≤ sup_{x∈C} |ℓ_n(x) − 𝔼(ℓ_n(x))| + sup_{x∈C} |𝔼(ℓ_n(x)) − ℓ(x)|.

Under Assumptions A1-A3 and A6-A8, by a proof analogous to that of Lemma 5.3 without censoring (that is, with Ḡ(T_i) = 1, δ_i = 1 and Y_i = 1) and putting ε = ε₀ √(log n/(n h_n^d)), we get

    sup_{x∈C} |ℓ_n(x) − 𝔼(ℓ_n(x))| = O( √(log n/(n h_n^d)) ).    (12)

Furthermore, under A2 i) and A8 and using a Taylor expansion, we get

    sup_{x∈C} |𝔼(ℓ_n(x)) − ℓ(x)| = O(h_n),

which permits us to conclude. □

Lemma 5.5 Under Assumptions A1-A3, A6-A8, we have

    sup_{x∈C} |r_{1,n}(x) − r̄_{1,n}(x)| = o( 1/√(n h_n) )   a.s. as n → ∞.

Proof of Lemma 5.5: We have, from (3) and (4),

    |r_{1,n}(x) − r̄_{1,n}(x)| = | (1/(n h_n^d)) Σ_{i=1}^n [ (1I_{Y_i ≤ C_i} Y_i/Ḡ_n(Y_i)) K_d((x − X_i)/h_n) − (1I_{Y_i ≤ C_i} Y_i/Ḡ(Y_i)) K_d((x − X_i)/h_n) ] |
      ≤ (1/(n h_n^d)) Σ_{i=1}^n |Y_i| K_d((x − X_i)/h_n) |Ḡ(Y_i) − Ḡ_n(Y_i)| / (Ḡ_n(Y_i) Ḡ(Y_i))
      ≤ (1/(Ḡ_n(τ_F) Ḡ(τ_F))) sup_{t≤τ_F} |Ḡ_n(t) − Ḡ(t)| (1/(n h_n^d)) Σ_{i=1}^n |Y_i| K_d((x − X_i)/h_n).

In the same way as for Theorem 2 of Cai [7], it can be shown under A3 that

    sup_{t≤τ_F} |Ḡ_n(t) − Ḡ(t)| = O( √(log log n/n) )   a.s.

Furthermore, from the definition of ℓ_n(x), Lemma 5.4, Assumptions A1, A2 and A8, and the fact that Y is bounded, we get the result. □
Proof of Theorem 3.1: We have

    sup_{x∈C} |m_n(x) − m(x)| ≤ sup_{x∈C} | r_{1,n}(x)/ℓ_n(x) − r̄_{1,n}(x)/ℓ_n(x) | + sup_{x∈C} | r̄_{1,n}(x)/ℓ_n(x) − 𝔼(r̄_{1,n}(x))/ℓ_n(x) |
        + sup_{x∈C} | 𝔼(r̄_{1,n}(x))/ℓ_n(x) − r₁(x)/ℓ_n(x) | + sup_{x∈C} | r₁(x)/ℓ_n(x) − r₁(x)/ℓ(x) |
      ≤ (1/ inf_{x∈C} ℓ_n(x)) { sup_{x∈C} |r_{1,n}(x) − r̄_{1,n}(x)| + sup_{x∈C} |r̄_{1,n}(x) − 𝔼(r̄_{1,n}(x))|
        + sup_{x∈C} |𝔼(r̄_{1,n}(x)) − r₁(x)| + (1/ inf_{x∈C} ℓ(x)) sup_{x∈C} |r₁(x)| sup_{x∈C} |ℓ(x) − ℓ_n(x)| }.    (13)



The kernel estimator ℓ_n(x) is almost surely bounded away from 0, by Lemma 5.4 and the second part of Assumption A8. Then (13), in conjunction with Lemmas 5.1, 5.3, 5.4 and 5.5, concludes the proof. □

References

[1] Andrews, D.W.K. (1995). Nonparametric kernel estimation for semiparametric models. Econometric Theory 11:560-596.
[2] Beran, R. (1981). Nonparametric regression with randomly censored survival data. Technical Report, University of California, Berkeley.
[3] Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31:307-327.
[4] Bosq, D. (1998). Nonparametric statistics for stochastic processes. Es-
timation and Prediction. Lecture Notes in Statistics, 110 , Springer-
Verlag, New York.
[5] Cai, Z. (1998a). Asymptotic properties of Kaplan-Meier estimator for
censored dependent data. Statist. & Probab. Lett. 37:381-389.
[6] Cai, Z. (1998b). Kernel density and hazard rate estimation for censored
dependent data. J. Multivariate Anal. 67:23-34.
[7] Cai, Z. (2001). Estimating a distribution function for censored time
series data. J. Multivariate Anal. 78:299-318.
[8] Cai, Z. (2003). Weighted local linear approach to censored nonparametric regression. In Recent Advances and Trends in Nonparametric Statistics (eds M. G. Akritas & D. N. Politis), Elsevier, Amsterdam, The Netherlands; San Diego, CA, 217-231.
[9] Cai, Z., Ould-Saïd, E. (2003). Local M-estimator for nonparametric time series. Statist. & Probab. Lett. 65:433-449.
[10] Cai, Z., Roussas, G.G. (1992). Uniform strong estimation under α-mixing, with rates. Statist. & Probab. Lett. 15:47-55.
[11] Cai, Z., Roussas, G.G. (1998). Kaplan-Meier estimator under associa-
tion. J. Multivariate Anal. 67: 318-348.
[12] Carbonez, A., Györfi, L., van der Meulen, E.C. (1995). Partition-estimates of a regression function under random censoring. Statist. & Decisions 13:21-37.
[13] Collomb, G. (1981). Estimation non-paramétrique de la régression: revue bibliographique. Internat. Statist. Rev. 49:75-93.
[14] Collomb, G. (1984). Propriétés de convergence presque complète du prédicteur à noyau. Z. Wahrsch. Verw. Gebiete 66:441-460.




[15] Collomb, G., Härdle, W. (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: kernel regression estimation from dependent observations. Stochastic Proc. Appl. 23:77-89.
[16] Dabrowska, D. (1987). Nonparametric regression with censored survival
data. Scand. J. Statist. 14:181-197
[17] Dabrowska, D. (1989). Uniform consistency of the kernel conditional
Kaplan-Meier estimate. Ann. Statist. 17, 1157-1167
[18] Delecroix, M. (1990). L2 -consistency of functional parameters under
ergodicity assumption. Pub. Inst. Statist. Univ. Paris, XXXX:33-56.
[19] Doukhan, P. (1994). Mixing: Properties and examples. Lecture Notes in
Statistics, 85, Springer-Verlag, New York.
[20] Engle, R.F. (1982), Autoregressive conditional heteroskedasticity with
estimates of the variance of U.K. inflation. Econometrica 50:987-1007.
[21] Ferraty, F., Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, New York.
[22] González-Manteiga, W., Quintela-del-Río, A., Vieu, P. (2002). A note on variable selection in nonparametric regression with dependent data. Statist. & Probab. Lett. 57:259-268.
[23] Guessoum, Z., Ould-Saïd, E. (2006). On the nonparametric estimation of the regression function under censorship model. Technical Report No. 308, L.M.P.A., Univ. du Littoral Côte d'Opale (submitted).
[24] Hall, P., Hart, J. (1990). Nonparametric regression with long range
dependence. Stochastic Proc. Appl. 36:339-351.
[25] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press.
[26] Jones, D.A. (1978). Nonlinear autoregressive processes. Proc. Roy. Soc.
London A 360, 71-95.
[27] Kaplan, E. L., Meier, P. (1958). Nonparametric estimation from incom-
plete observations. J. Amer. Statist. Assoc. 53:457-481.
[28] Kohler, M., Máthé, K., Pintér, M. (2002). Prediction from randomly right-censored data. J. Multivariate Anal. 80:73-100.
[29] Laïb, N. (1993). Nonparametric estimation for regression with dependent data: application to the prevision. C. R. Acad. Sci. Paris 317:1173-1177.
[30] Laïb, N., Ould-Saïd, E. (2000). A robust nonparametric estimation of the autoregression function under an ergodic hypothesis. Canad. J. Statist. 28:817-828.
[31] Lecoutre, J.P., Ould-Saïd, E. (1995a). Hazard rate estimation for strong mixing and censored processes. J. Nonparametr. Statist. 5:83-89.




[32] Lecoutre, J.P., Ould-Saïd, E. (1995b). Convergence of the conditional Kaplan-Meier estimate under strong mixing. J. Statist. Plann. Inference 44:359-369.
[33] Liebscher, E. (1996). Strong convergence of sums of α-mixing random variables. Stoch. Proc. Appl. 65:69-80.
[34] Liebscher, E. (2001). Estimation of the density and the regression function under mixing conditions. Statist. & Decisions 19:9-26.
[35] Liebscher, E. (2002). Kernel density and hazard rate estimation for censored data under α-mixing condition. Ann. Inst. Statist. Math. 54:19-28.
[36] Masry, E. (1986). Recursive probability density estimation for weakly dependent processes. IEEE Trans. Inform. Theory 32:254-267.
[37] Masry, E., Fan, J. (1997). Local polynomial estimation of regression
function for mixing processes. Scand. J. Statist. 24:165-179.
[38] Ozaki, T. (1979). Nonlinear time series models for nonlinear random vibrations. Technical Report, Univ. of Manchester.
[39] Robinson, P.M. (1983). Nonparametric estimators for time series. J. Time Ser. Anal. 4:185-207.
[40] Robinson, P.M. (1986). On the consistency and finite sample properties of nonparametric kernel time series regression, autoregression and density estimators. Ann. Inst. Statist. Math. 38:539-549.
[41] Roussas, G.G. (1990). Nonparametric regression estimation under mix-
ing condition. Stochastic Proc. Appl. 36:107-116.
[42] Roussas, G.G. (1988). Nonparametric estimation in mixing sequences
of random variables. J. Statist. Plann. Inference 18:135-149.
[43] Tae, Y.K., Cox, D.D. (1996). Uniform strong consistency of kernel den-
sity estimator under dependence. Statist. & Probab. Lett. 26:179-185.
[44] Tran, L.T. (1990). Nonparametric estimation under dependence.
Statist. & Probab. Lett. 10:193-201.
[45] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. Monographs on Statist. and Appl. Probab., Chapman & Hall, London.
[46] Vieu, P. (1991). Quadratic errors for nonparametric estimates under
dependence. J. Multivariate Anal. 39:324-347.
[47] Wand, M.P., Jones, M.C. (1995). Kernel smoothing. Chapman & Hall.
London.
[48] Wahba, G. (1990). Spline models for observational data. SIAM,
Philadelphia.
[49] Yakowitz, S. (1989). Nonparametric density estimation and regression
for Markov sequences without mixing assumption. J. Multivariate Anal.
30:124-136.

