You are on page 1of 8

Panel data

1 Panel data
Cross-sectional data typically collect information on units (individuals, rms,
countries, etc.) at a given moment in time. Time-series data collect information
on a single unit (typically, a country) over many (T) time periods. So far, we
have concentrated our attention on cross-sectional data and ignored time-series
data. Both types of data move along a single dimension: individuals (cross-
sectional data) or time (time-series data).
Panel data (or longitudinal data) move instead along two (or more) dimen-
sions. The most frequent case is one where units are observed over T time
periods. The total number of observations is therefore T. Three dimensions
of data are also possible: for example, we could have information on multina-
tional rms followed over T time periods and over the C countries where they
operate. The total number of observations in this case is TC.
Two-dimension panels do not necessarily extend over units and time periods.
We could have two-dimension panels extending over units and sub-units, such
as when we have data on siblings within families, or branches within rms, or
counties within states.
Panels can be balanced (when the units are followed for an equal number of
periods) or unbalanced (when the units are followed for a dierent number of
periods). Panels can be continuous (when there is no refreshing, i.e., no entry
of new units -an example is the Panel Study of Income Dynamics) or rotating
(when units leave and are replaced by new units -an example is the Consumer
Expenditure Survey).
1.1 Advantages of panel data
Panel data allow us to control for unobserved heterogeneity. For example,
in the classical Mundlaks example, we can control directly for managerial
ability and inputs use, so as to eliminate the omitted variable bias. In
general, any unobserved heterogeneity component that remains xed over
time can be handled thus reducing considerably the omitted variable bias
problem.
They oer obvious statistical advantages. They may help us reduce the
problem of collinearity among variables, and may give us more precise
estimates due to the eciency gain brought by more data.
They can help us addressing questions of dynamics as discussed in class.
1
1.2 Disadvantages of panel data
Panel data suer from attrition, i.e., the fact that people who are initially
part of the data set fail to be interviewed in subsequent waves. For ex-
ample, people may move and the survey may be unable to track them
down. The attrition of units would not be problematic if attrition were
completely at random. However, this is unlikely to be the case. If we are
studying the dynamics of poverty, for example, we may be worried that
the people who move (and therefore disappear from the survey) are the
poorest, for example due to lack of job opportunities in the local labor
market or, to the extreme, because they become homeless. Since the poor
disappears, we will have the impression that things are improving simply
because the only people who remain in the panel are the rich.
1.3 Prototypical panel data model
The prototypical panel data model is
n
it
= c +r
it
+1
i
+
it
for i = 1. 2. .... and t = 1. 2. .... T for a total of T observations. This is
a model with a single regressor r
it
. The new term is 1
i
, which is assumed to
be individual-specic and time invariant. The classical reference is Mundlak,
who though of n
it
as the output of a farm, r
it
as inputs (fertilizers, etc.), 1
i
as
managerial ability, and
it
as random event outside the managers control (such
as weather).
Here we will make the assumption that 1 (
it
j1
i
. r
it
) = 0. As for 1
i
, two
models are usually proposed based on the assumption one is willing to make
regrading the correlation between 1 and r:
(a) the random eect model, corresponding to 1 (1
i
jr
it
) = 0;
(b) the xed eect model, corresponding to 1 (1
i
jr
it
) 6= 0.
In case (a), we assume that 1
i
is an error component, while in case (b) we
assume it is a parameter to estimate. In the example above, if we believe that
managers ability is randomly allocated among farms, the random eect is a
good description of the reality; in contrast, if we believe that managers ability
is correlated with inputs choice and eciency, the xed eect model should be
used. The main dierence between the two models is that they lead to two
dierent estimators.
2 Random eect model
(Wooldridge, ch. 14.2)
The random eect model is a variant of GLS (generalized least squares).
GLS is used when the error term of the regression
2
n = A +n
is heteroskedastic, i.e., 1 (nn
0
) 6= o
2
1.
Lets take the prototypical panel data model of the previous section:
n
it
= c +r
it
+ (1
i
+
it
)
. .. .
uit=error term
= c +r
it
+n
it
for i = 1. 2. .... and t = 1. 2. .... T. In matrix format
y
(NT1)
= X
(NT2)

(21)
+ u
(NT1)
Lets assume that 1
i
is a zero-mean error term with variance o
2
f
and n
it
is an
i.i.d. zero-mean error term with variance o
2
v
. As well, 1
i
and
it
are orthogonal
(1 (1
i

is
) = 0 for all :). The global" error term n
it
is indeed heteroskedastic.
In fact:
1

n
2
it

= 1

(1
i
+
it
)
2

= 1

1
2
i

+1

2
it

+1 (1
i

it
)
= o
2
f
+o
2
v
and for t 6= :
1 (n
it
n
is
) = 1 ((1
i
+
it
) (1
i
+
is
)) = 1

1
2
i

+1 (1
i

is
) +1 (1
i

it
) +1 (
it

is
)
= o
2
f
Thus, if u
i
=

n
i1
n
i2
...
n
it

for a generic individual i, we have that his/her


variance matrix of the error is:
1 (u
i
u
0
i
)
(TT)
=

o
2
f
+o
2
v
o
2
f
... o
2
f
o
2
f
o
2
f
+o
2
v
... o
2
f
... ... ... ...
o
2
f
o
2
f
... o
2
f
+o
2
v

= o
2
v
I +o
2
f
ii
0
=
3
where I is the identity matrix and i a vector of ones. Stacking the vectors
u
i
for all individuals, we get that the error term of the pooled regression is:
u =

u
1
u
2
...
u
N

, and so its variance matrix


1 (uu
0
)
(NTNT)
=

0 ... 0
0 ... 0
... ... ... ...
0 0 ...

= I
is a block-diagonal matrix, and indicates a Kronecker product.
2.1 GLS
The general model
n = A +n (1)
could certainly be estimated by OLS even in the presence of heteroskedasticity
of the error term. The estimates will be unbiased and consistent. To see this,
suppose for simplicity that A is xed (non-stochastic). Then OLS is
^

OLS
= (A
0
A)
1
A
0
n
= + (A
0
A)
1
A
0
n
Its expectation is
1

OLS

= + (A
0
A)
1
A
0
1 (n)
=
because 1 (n) = 0. It is also easy to prove consistency of
^

OLS
.
What about eciency? Assume 1 (nn
0
) = o
2
, so that theres heteroskedas-
ticity. The true" variance of
^

OLS
(i.e., the variance matrix that should be
used to conduct inference) is
ar

OLS

HT
= ar

+ (A
0
A)
1
A
0
n

= (A
0
A)
1
A
0
ar (n) A (A
0
A)
1
= o
2
(A
0
A)
1
A
0
A (A
0
A)
1
where ar

OLS

HT
means that this is the variance of the OLS estimator when
the econometician knows that there is heteroskedasticity.
4
There are two important things to notice here:
(a) OLS is no longer BLUE. We can nd another estimator that is also
unbiased, linear, but is more ecient than OLS. This estimator is the GLS
estimator (see below).
(b) An econometrician that estimates the model (1) by OLS ignoring het-
eroskedasticity of the error term will conduct inference on using the wrong
variance matrix of the estimates, i.e., he/she will use ar

OLS

HM
= o
2
(A
0
A)
1
instead of the correct ar

OLS

HT
(given above). Thus hypothesis tests will
give wrong results.
How do we get the GLS estimator? The idea is to transform a heteroskedastic
error term into an homoskedastic error term, and then apply the OLS idea to
the transformed model.
Suppose you can nd a (non-singular, square, and non-stochastic) matrix 1
such that

1
= 1
0
1
(see Johnstons chapter for more details on this). Then take the heteroskedastic
model
n = A +n
with 1 (nn
0
) = o
2
. Pre-multiply both sides by 1:
1n = 1A +1n
~ n =
~
A + ~ n
Is ~ n homoskedastic? First note that 1 (~ n) = 1 (1n) = 11 (n) = 0. Then
note that
ar (~ n) = ar (1n) = 1ar (n) 1
0
= o
2
11
0
Now we will prove that 11
0
= 1 starting from the denition

1
= 1
0
1

1
1
1
= 1
0
11
1
(1
0
)
1

1
1
1
= (1
0
)
1
1
0
. .. .
I
11
1
. .. .
I
(1
0
)
1

1
1
1
= 1
and since (1C)
1
= C
1
1
1

1
, it follows that
11
0
= 1
5
and therefore
ar (~ n) = o
2
11
0
= o
2
1
is homoskedastic. Thus OLS applied to the transformed model will give unbiased
and consistent estimates of and will also be BLUE:
^

OLS;tm
=

~
A
0
~
A

1
~
A
0
~ n
= (A
0
1
0
1A)
1
A
0
1
0
1n
=

A
0

1
A

1
A
0

1
n
=
^

GLS
where
^

OLS;tm
stands for OLS applied to the transformed model. The variance
of this estimator, still assuming A is xed for simplicity, is
ar

GLS

= ar

A
0

1
A

1
A
0

1
n

= ar

A
0

1
A

1
A
0

1
n

=

A
0

1
A

1
A
0

1
ar (n)
1
A

A
0

1
A

1
= o
2

A
0

1
A

1
A
0

1
A

A
0

1
A

1
= o
2

A
0

1
A

1
This estimator can also be obtained by
:i:
^

^ n
0

1
^ n
so its similar to the IV idea of weighting residuals dierently instead of equally
(as in OLS). The solution of this problem is indeed
^

GLS
=

A
0

1
A

1
A
0

1
n
Intuitively, the GLS estimator puts more weight on observations with a lower
variance of the error. Thats where its precision gain comes from relative to OLS.
The general form of the GLS estimator in a model
n = A +n
with n having a generic variance matrix ar (n) is:
^

GLS
=

A
0
ar (n)
1
A

1
A
0
ar (n)
1
n
Hence, for the random eect model in which ar (n) = 1 we have
^

RE
=

A
0
( 1)
1
A

1
A
0
( 1)
1
n
6
2.2 Feasible GLS
There is one obvious problem in the implementation of the random eect estima-
tor (and in general in the implementation of any GLS estimator):
^

RE
depends
on , which is unknown unless one has some outside information about o
2
f
and
o
2
v
(an unlikely case). One way out is to replace with a consistent estimate
of it, say
^
. If this is done, one implements a so called feasible RE (or more
generally, feasible GLS) estimator:
^

FRE
=

A
0

1
A

1
A
0

1
n
To obtain an estimate of

.we need to nd estimates of the two parameters
that appear in it, o
2
f
and o
2
v
. We will use a couple of results:
(1) Suppose you estimate the model by OLS. As said, OLS gives consistent
estimates. Dene the OLS residual n
it
= n
it
^ c
OLS
^
OLS
r
it
. In general, we
have learned that an estimate of the variance of the error term when we have a
cross-section of individuals and a model with / regressors is

N
i=1
^ n
2
it
/
We proved that 1
P
N
i=1
^ u
2
it
Nk

= o
2
u
. In panel data, however, we have a total
of T observations, so that the analog for panel data is

o
2
u
=

N
i=1

T
t=1
^ n
2
it
T /
which is also an unbiased estimate of o
2
u
.
(2) Since n
it
= 1
i
+
it
and 1
i
and
it
are both mean zero and orthogonal to
each other,
o
2
u
= 1

n
2
it

= o
2
f
+o
2
v
Thus if we have an estimate of o
2
u
(calculated using result (1) above) and an
estimate of one of its two component (o
2
f
or o
2
v
), the other one can be determined
by dierence.
Now we have to nd an estimate of o
2
f
(which is easier than nding an
estimate of o
2
v
). Note that
1 (n
is
n
it
) = o
2
f
for all : 6= t
Consider a generic individual i. There are
(T1)T
2
possible n
is
n
it
products,
namely n
i1
n
i2
,n
i1
n
i3
. .... n
i1
n
iT
,n
i2
n
i3
,...,n
iT1
n
iT
. Taking their sum:
T1

s=1
T

t=s+1
n
is
n
it
7
the expectation of which is
1

T1

s=1
T

t=s+1
n
is
n
it

=
T1

s=1
T

t=s+1
1 (n
is
n
it
) =
T1

s=1
T

t=s+1
o
2
f
=
(T 1) T
2
o
2
f
and hence 1

P
T1
s=1
P
T
t=s+1
uisuit
(T1)T
2

= 1

P
N
i=1
P
T1
s=1
P
T
t=s+1
uisuit
N
(T1)T
2

= o
2
f
, where
the second equality comes from the fact that we can use any individual in the
sample.
We dont have the true errors n, just the residuals ^ n. We can use these as
long as we make a degrees of freedom correction and we obtain:
1

T1
s=1

T
t=s+1
^ n
is
^ n
it

(T1)T
2
/
. .. .
c

2
f

= o
2
f
At this point we can obtain:

o
2
v
=

o
2
u

o
2
f
and construct
^
and implement FRE (or FGLC in general).
To summarize, FGLS is obtained using a two step procedure:
(a) Run OLS on the heteroskedastic model n = A + n, and construct the
residual ^ n = n A
^

OLS
. Use this to obtain estimates

o
2
u
and

o
2
f
and thus by
dierence

o
2
v
. This allows to construct
^
.
(b) The FGLS estimator is
^

FGLS
=

A
0

1
A

1
A
0

1
n
in the random eect model case. Use the correct variance matrix of
^

FGLS
,
ar

FGLS

A
0

1
A

1
to conduct inference on .
8

You might also like