Robert J. Mislevy
September 1988
RR-88-54-ONR
Subject terms: Complex samples, item response theory, latent structure, missing data, multiple imputation, National Assessment of Educational Progress, sample surveys
Abstract
examples and with data from the 1984 National Assessment for
surveys.
Randomization-Based Inference
Introduction
different test items are compared using item response theory (IRT)
to analyze correctly.
data, first when all responses are in fact observed, then when
are known for all units before observations are made, but the
values of the survey variables, Y, are not. Let (X,Z) denote the
different towns).
$(s - S) \sim N(0, U)$
reasons that may or may not be related to the values that would
were missing and observed. Much progress has been made extending
now known, but the values of Ymis are not. It is not possible to
conditional expectation:
knowledge about what the missing responses might have been, given
large surveys with few missing values, one can use these empirical
$p(Y_{mis} \mid Y_{obs}, Z) = \prod_i p(y_{i,mis} \mid y_{i,obs}, z_i)$
)
P(Xmislyobs' ) = f P(Xmis'Xobs'Z'O) p('obs di , (3)
been observed.
s--by filling in each missing response with a random draw from its
as follows:
for the responses that were observed, and, for those that
U () Y(m))
$s_M = \sum_{m=1}^{M} s_{(m)} / M$ (4)
$V_M = U_M + (1 + M^{-1}) B_M$ , (5)
where
$U_M = \sum_{m=1}^{M} U_{(m)} / M$
$B_M = \sum_{m=1}^{M} (s_{(m)} - s_M)^2 / (M-1)$
$\nu = (M-1)(1 + r_M^{-1})^2$ ,
missingness:
$r_M = (1 + M^{-1}) B_M / U_M$
$r_M = (1 + M^{-1})\,\mathrm{Trace}(B_M U_M^{-1}) / k$
suffices.
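As a rough illustration of the k-dimensional quantity, the sketch below evaluates $r_M = (1 + M^{-1})\,\mathrm{Trace}(B_M U_M^{-1})/k$ for a two-dimensional statistic. The matrices and the function name `relative_increase` are hypothetical, chosen only so that the result is easy to check by hand; they are not taken from the report.

```python
def relative_increase(B, U, M):
    """r_M = (1 + 1/M) * Trace(B U^{-1}) / k, written out for 2x2 matrices (k = 2)."""
    (u11, u12), (u21, u22) = U
    det = u11 * u22 - u12 * u21
    # Explicit inverse of the 2x2 within-imputation covariance matrix U.
    U_inv = [[u22 / det, -u12 / det], [-u21 / det, u11 / det]]
    # Trace(B U^{-1}) = sum over i, j of B[i][j] * U_inv[j][i].
    trace = sum(B[i][j] * U_inv[j][i] for i in range(2) for j in range(2))
    k = 2
    return (1 + 1 / M) * trace / k

# Hypothetical inputs: B_M is set to 0.1 * U_M, so Trace(B U^{-1}) / k = 0.1.
U_M = [[0.04, 0.01], [0.01, 0.09]]
B_M = [[0.004, 0.001], [0.001, 0.009]]
r_M = relative_increase(B_M, U_M, M=5)
print(round(r_M, 4))
```

With M = 5 the factor $(1 + M^{-1})$ is 1.2, so a between-imputation covariance one tenth the size of the within-imputation covariance gives a relative increase of about 0.12.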
size 12. If y values for all 12 units in the realized sample were
$(\bar{y} - \mu) \sim N(0, 1/12)$. Suppose, though, that the values for the last
two sampled units are missing (at random). The first ten are
accomplished as follows.
$N(\mu, 1)$ too. What we know about $\mu$ from the first ten observations
results are .028, .106, .143, .031, and .295. For each m,
obtained by (5) as
.098.
actually are.
$(.121 - \mu) \sim t_{184}(0, .098)$
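The arithmetic of this example can be retraced with a few lines of code. The sketch below applies the combining rules to the five imputation-based means quoted above. It assumes that the within-imputation variance is 1/12 for every completed data set, as the known-variance setup with twelve sampled units implies; the variable names are illustrative only.

```python
from statistics import mean, variance

s_m = [0.028, 0.106, 0.143, 0.031, 0.295]   # the five imputation-based means
M = len(s_m)
U_m = [1 / 12] * M        # assumption: unit variance, n = 12, for each completed data set

s_M = mean(s_m)                       # equation (4): average of the M estimates
U_M = mean(U_m)                       # average within-imputation variance
B_M = variance(s_m)                   # between-imputation variance, divisor M - 1
V_M = U_M + (1 + 1 / M) * B_M         # equation (5): total variance
r_M = (1 + 1 / M) * B_M / U_M         # relative increase in variance due to missingness
nu = (M - 1) * (1 + 1 / r_M) ** 2     # degrees of freedom of the reference t distribution

print(round(s_M, 3), round(V_M, 3), round(nu))
```

The point estimate and total variance reproduce the .121 and .098 of the text. The degrees of freedom computed from unrounded inputs fall in the mid-180s; small differences from a printed value are to be expected when intermediate quantities are rounded.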
Latent-Variable Models
$p(x \mid \theta, \beta) = \prod_{j=1}^{n} p(x_{(j)} \mid \theta, \beta_j)$ , (8)
individuals.
$(x_1, \ldots, x_N)$ of an SRS of respondents. The starting point for the
$p(x \mid \alpha, \beta) = \prod_{i=1}^{N} \int p(x_i \mid \theta, \beta)\, p(\theta \mid \alpha)\, d\theta$ (10)
1977; Bock and Aitkin, 1981; Dempster, Laird, and Rubin, 1977;
secondary analyst.
latent:
$(s - S) \sim N(0, U)$.
$p(x \mid \theta, \beta, y, z) = p(x \mid \theta, \beta)$
$= \prod_{i=1}^{N} p(x_i \mid \theta_i, \beta)$ (12)
that
$p(\theta \mid y, z, \alpha) = \prod_{i=1}^{N} p(\theta_i \mid y_i, z_i, \alpha)$ (13)
design. While the sample design for selecting units from the
$\prod_{i=1}^{N} K_i \, p(x_i \mid \theta_i, \beta)\, p(\theta_i \mid y_i, z_i, \alpha)$ , (14)
of (i) the likelihood function induced for $\theta_i$ by $x_i$ via the latent-
2. Produce M "completed" datasets $(\theta_{(m)}, x, y)$. For the mth
$U_{(m)} = U(\theta_{(m)}, x, y)$.
$V_M = U_M + (1 + M^{-1}) B_M$ ,
respectively.
given in (6).
previously.
simple ways.
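$\begin{bmatrix} 1+\sigma_e^2 & & & \text{(sym)} \\ 1 & 1 & & \\ r_{\theta 1} & r_{\theta 1} & 1 & \\ r_{\theta 2} & r_{\theta 2} & r_{12} & 1 \end{bmatrix}$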
p(Oly) - 2 (15)
where
2 2
°Oly
$E(\theta \mid y_1) = E(x \mid y_1) = \beta_1 y_1 = (\beta_{1|2} + r_{12}\,\beta_{2|1})\, y_1$
$\theta$ given $y$ as
$\rho_c = \sigma^2_{\theta|y} / (\sigma^2_{\theta|y} + \sigma_e^2)$
Note that $0 \le \rho_c \le 1$ and $\rho_c\,\sigma^2_{x|y} = \sigma^2_{\theta|y}$.
$p(\theta \mid x, y) = N(\bar{\theta}, \sigma^2_{\theta|xy})$ ,
where
and
$\sigma^2_{\theta|xy} = \mathrm{Var}(\theta \mid x, y) = (1-\rho_c)\,\sigma^2_{\theta|y} = (1-\rho_c)(1-R^2)$ .
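A small numerical check may help fix ideas about the conditional reliability and the conditional variance of $\theta$ given $x$ and $y$. The sketch below assumes standardized $\theta$, $y_1$ and $y_2$, an observed score $x = \theta + e$ with independent normal error, and hypothetical values for the correlations and the error variance; none of these numbers come from the report.

```python
# Hypothetical correlations and error variance, chosen only for illustration.
r_t1, r_t2, r_12 = 0.5, 0.5, 0.5    # corr(theta, y1), corr(theta, y2), corr(y1, y2)
sigma2_e = 1.0                       # error variance of x = theta + e

# Multiple regression of theta on (y1, y2), all variables standardized.
b_1g2 = (r_t1 - r_t2 * r_12) / (1 - r_12 ** 2)
b_2g1 = (r_t2 - r_t1 * r_12) / (1 - r_12 ** 2)
R2 = b_1g2 * r_t1 + b_2g1 * r_t2     # squared multiple correlation
sigma2_t_y = 1 - R2                  # Var(theta | y)

rho_c = sigma2_t_y / (sigma2_t_y + sigma2_e)   # conditional reliability
var_t_xy = (1 - rho_c) * sigma2_t_y            # Var(theta | x, y)

# Cross-check against the precision-weighting rule for normal models.
var_check = 1 / (1 / sigma2_t_y + 1 / sigma2_e)
print(rho_c, var_t_xy, var_check)
```

Under these assumed values the two routes to the conditional variance agree, which is the sense in which $(1-\rho_c)\,\sigma^2_{\theta|y}$ combines what $x$ and $y$ each convey about $\theta$.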
9 - 0 + f
MLE $\hat{\theta} = x$; nor does it minimize mean square error over the
example,
- Pc 0 + (1-p) ' 0 + 0
0 ...
-E(O)
-p r6 + (l-P)+p, r12
- ov(6,y1 )
By similar calculation,
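$\mathrm{Var}(\tilde{\theta}) = 1 = \mathrm{Var}(\theta)$
$\mathrm{Var}(\tilde{\theta} \mid y) = \sigma^2_{\theta|y} = \mathrm{Var}(\theta \mid y)$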
rather than 0 itself? For an SRS of size N, the SEM 2for N
correct expectations for some attributes, but not for others. For
estimating the mean of $\theta$, all three turn out to have the correct
The first test has a reliability that might be expected with ten
imputations, are readily apparent for the short test but are less
and y are low--say less than five latent variables and five
practical applications.
E
9 from
p(OIx,Y) c
and z and their interactions, which may number into the millions,
P (O{5,X,E)
when he mentions that the imputer's model may differ from the
suggest that biases for population means and variances may be mild
Example 2 (continued)
imputations is
$p(\theta \mid x, y) = N[\rho_c\, x + (1-\rho_c)(\beta_{1|2}\, y_1 + \beta_{2|1}\, y_2),\ \sigma^2_{\theta|xy}]$
$p^*(\theta \mid x, y) = N[\rho_c^*\, x + (1-\rho_c^*)\,\beta_1 y_1,\ \sigma^{*2}_{\theta|xy}]$ ,
or
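$\rho_c^* = (1 - r_{\theta 1}^2) / (1 - r_{\theta 1}^2 + \sigma_e^2)$
and
$\sigma^{*2}_{\theta|xy} = \mathrm{Var}^*(\theta \mid x, y) = (1-\rho_c^*)\,\sigma^2_{\theta|y_1} = (1-\rho_c^*)(1 - r_{\theta 1}^2)$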
0i = PcXi + (l-pc)plYil + g
I 1 1
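$(\tilde{\theta}^*, y)$, and its covariance matrix agrees with that of $(\theta, y)$ for all
elements except the one for $\tilde{\theta}^*$ with $y_2$. Rather than $r_{\theta 2}$, one
obtains
$\mathrm{Cov}(\tilde{\theta}^*, y_2) = \rho_c^*\, r_{\theta 2} + (1-\rho_c^*)\, r_{\theta 1}\, r_{12}$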
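The attenuation of the covariance with a variable omitted from the imputation model can be seen in a small simulation. The sketch below is only an illustration under assumed values: it generates standardized $(\theta, y_1, y_2)$ with hypothetical correlations, forms $x = \theta + e$, draws imputations from a model that conditions on $x$ and $y_1$ but not $y_2$, and compares the simulated covariance of the imputations with $y_2$ to the weighted expression $\rho_c^* r_{\theta 2} + (1-\rho_c^*) r_{\theta 1} r_{12}$.

```python
import math
import random

random.seed(1)
r_t1, r_t2, r_12 = 0.5, 0.5, 0.5   # hypothetical correlations
sigma2_e = 1.0                      # hypothetical error variance of x
n = 40000

# Regression of theta on (y1, y2), used here only to generate theta.
b1 = (r_t1 - r_t2 * r_12) / (1 - r_12 ** 2)
b2 = (r_t2 - r_t1 * r_12) / (1 - r_12 ** 2)
resid_sd = math.sqrt(1 - (b1 * r_t1 + b2 * r_t2))

# Imputation model that conditions on x and y1 only.
rho_c_star = (1 - r_t1 ** 2) / (1 - r_t1 ** 2 + sigma2_e)
imp_sd = math.sqrt((1 - rho_c_star) * (1 - r_t1 ** 2))

sum_ty2 = 0.0
for _ in range(n):
    y1 = random.gauss(0, 1)
    y2 = r_12 * y1 + math.sqrt(1 - r_12 ** 2) * random.gauss(0, 1)
    theta = b1 * y1 + b2 * y2 + resid_sd * random.gauss(0, 1)
    x = theta + math.sqrt(sigma2_e) * random.gauss(0, 1)
    theta_imp = (rho_c_star * x + (1 - rho_c_star) * r_t1 * y1
                 + imp_sd * random.gauss(0, 1))
    sum_ty2 += theta_imp * y2       # all means are zero, so this estimates the covariance

simulated = sum_ty2 / n
predicted = rho_c_star * r_t2 + (1 - rho_c_star) * r_t1 * r_12
print(round(simulated, 3), round(predicted, 3), "true r_theta2 =", r_t2)
```

With these assumed values the simulated covariance lies close to the predicted one and below the true correlation, in line with the direction of the bias described here.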
instance,
$E(\tilde{\theta}^*) = 0 = E(\theta)$
and
$\mathrm{Var}(\tilde{\theta}^*) = 1 = \mathrm{Var}(\theta)$
imputer and the analyst use the same model, one obtains
$E(\tilde{\theta}^* \mid y_1) = \beta_1 y_1 = E(\theta \mid y_1)$
and
$E(\tilde{\theta} \mid y) = E(\theta \mid y) = \beta_{1|2}\, y_1 + \beta_{2|1}\, y_2$
from yI.
considered.
regression of $\theta$ on $y$ is obtained as
$\beta_{1|2} = (r_{\theta 1} - r_{\theta 2}\, r_{12}) / (1 - r_{12}^2)$
the corresponding coefficient in the multiple regression of $\tilde{\theta}^*$ on
$y$ is
$\beta^*_{1|2} = (r^*_{\theta 1} - r^*_{\theta 2}\, r_{12}) / (1 - r_{12}^2)$ (22)
$= \beta_{1|2} + (1 - \rho_c^*)\, \beta_{2|1}\, r_{12}$ (23)
02 r0 2 pc r02 ( r81 r12
c-Pc)
S - (-r12
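The effect on regression coefficients can be worked through numerically. The sketch below assumes standardized variables, hypothetical correlations, and an imputation model that conditions on $x$ and $y_1$ only, so that the imputations keep the correct covariance with $y_1$ while their covariance with $y_2$ is $\rho_c^* r_{\theta 2} + (1-\rho_c^*) r_{\theta 1} r_{12}$. The helper `partial_betas` is an illustrative name, not something defined in the report.

```python
r_t1, r_t2, r_12 = 0.5, 0.5, 0.5   # hypothetical correlations
sigma2_e = 1.0                      # hypothetical error variance of x

def partial_betas(c1, c2, r12):
    """Coefficients for regressing a variable on standardized (y1, y2),
    given its covariances c1 and c2 with y1 and y2."""
    d = 1 - r12 ** 2
    return (c1 - c2 * r12) / d, (c2 - c1 * r12) / d

# Coefficients for the latent variable itself.
beta_1g2, beta_2g1 = partial_betas(r_t1, r_t2, r_12)

# Coefficients for imputations built without y2.
rho_c_star = (1 - r_t1 ** 2) / (1 - r_t1 ** 2 + sigma2_e)
cov_star_y2 = rho_c_star * r_t2 + (1 - rho_c_star) * r_t1 * r_12
beta_star_1g2, beta_star_2g1 = partial_betas(r_t1, cov_star_y2, r_12)

print(beta_1g2, beta_2g1)
print(beta_star_1g2, beta_star_2g1)
```

Under these assumptions the coefficient for $y_1$ is inflated by $(1-\rho_c^*)\,\beta_{2|1}\, r_{12}$ and the coefficient for the omitted $y_2$ is shrunk by the factor $\rho_c^*$, so both distortions fade as the conditional reliability approaches one.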
* 2
* This bias is reduced as either P, the test reliability, or r2
0I
attitude items.
the scale was worth ten items, the thirty items from the other
as many items.
and 17, and in the modal grades associated with those ages, namely
size, except that extreme rural and low-SES urban areas were
with the total sample, and once with a run corresponding to each
PSU pair, with one of its members left out of the sample but with
school.
three ages, but most appeared in only one or in two adjacent ages.
The 3-parameter logistic (3PL) IRT model was used. Under the 3PL,
$P(x_{(j)} = 1 \mid \theta, a_j, b_j, c_j) = c_j + (1 - c_j) / (1 + \exp[-1.7\, a_j (\theta - b_j)])$ ,
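For readers who prefer code to notation, the 3PL response function can be written as a short function. The item parameters in the usage lines are hypothetical and serve only to show the shape of the curve.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model,
    using the 1.7 scaling constant that appears in the formula."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical item: discrimination 1.0, difficulty 0.0, lower asymptote 0.2.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.0, b=0.0, c=0.2), 3))
```

At $\theta = b_j$ the probability is halfway between the lower asymptote $c_j$ and one, and it rises toward one as $\theta$ increases.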
would have served equally well to set the scale. The resulting
distributions:
$\theta \mid y, z \sim N[(y,z)^{c\prime}\,\Gamma,\ \sigma^2]$ ,
where $\Gamma$ is a vector of seventeen main effect parameters and $\sigma^2$ is
the residual variance. Together, $\Gamma$ and $\sigma^2$ comprise the
not completely, as STOC and region have been included but PSU,
membership in $(y,z)^c$ are largely mitigated.
Maximum likelihood estimates of $\Gamma$ and $\sigma^2$ were obtained
allow for all two-way interactions between cohort and each of the
values thereafter.
where $p(\theta = \theta_s \mid y_i, z_i, \alpha = \hat{\alpha})$ is the density at $\theta_s$ of the normal pdf
been better.) Five imputations were drawn in this manner for each
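One way to picture drawing imputations from a posterior evaluated at a set of points is sketched below. It is a minimal illustration, not the operational NAEP procedure: it assumes a 3PL likelihood, a normal conditional density for $\theta$ given background variables, and an equally spaced grid. The grid, the item parameters, the conditional mean and standard deviation, and the function names are all hypothetical.

```python
import math
import random

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def draw_imputations(responses, items, prior_mean, prior_sd, grid, M=5, rng=random):
    """Draw M values of theta from a posterior evaluated on a grid of points."""
    weights = []
    for t in grid:
        # Likelihood of the observed right/wrong responses at this grid point.
        like = 1.0
        for x, (a, b, c) in zip(responses, items):
            p = p_3pl(t, a, b, c)
            like *= p if x == 1 else 1 - p
        # Normal density of theta given the respondent's background variables;
        # the normalizing constant cancels when the weights are normalized.
        prior = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        weights.append(like * prior)
    # Sample grid points with probability proportional to likelihood times density.
    return rng.choices(grid, weights=weights, k=M)

random.seed(0)
grid = [-4 + 0.2 * s for s in range(41)]                       # hypothetical grid
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]   # hypothetical (a, b, c)
draws = draw_imputations([1, 1, 0], items, prior_mean=0.3, prior_sd=0.9, grid=grid)
print(draws)
```

Each call returns five grid values for one respondent; repeating it for every respondent gives five completed data sets, each of which can then be analyzed as if $\theta$ had been observed.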
Illustrative Results
weighted mean was calculated five times, once from each completed
data set. The average of the five was the reported estimate. A
and $1 + M^{-1} = 1.2$ times the variance of the five aforementioned means.
illustrates some results for Age 17. The full public report is
average scoring group, for whom the test items were relatively
among pupils whose primary language was not English came due, the
involving both variables that had and variables that had not been
twice, once with the original completed datasets and once with new
Extensions
below.
$p(x_1, \ldots, x_4 \mid \theta_1, \ldots, \theta_4)$ is simply the product of the four
the huge sample sizes from which these effects are estimated, but
sampling.
the 1984 NAEP survey of writing, Beaton and Johnson (1987) worked
under the ARM than with IRT models because the assumed linearity
Conclusion
extends over time, IRT does indeed make it possible to solve many
References
374.
Bock, R.D., Mislevy, R.J., and Woodson, C.E.M. (1982). The next
4-11, 16.
Mislevy, R.J., and Bock, R.D. (1983). BILOG: Item Analysis and
Academic Press.
Testing Service.
Mislevy, R.J., and Sheehan, K.M. (in press). The role of
581-592.
Table 2
Dependent Variable
Population - A
Simple Regression
Residual variance            .750   1.750    .350    .438    .875
% variance accounted for     .250    .125    .417    .125    .125
.................................................................
Table 3
Dependent Variable
4 Population - A
6 0 0 (r)
Attribute 0 and 6
xy x x
Mean .000 .000 .000 .000 .000
Simple Regression
Residual variance            .750    .850    .690    .703    .773
% variance accounted for     .250    .227    .266    .227    .227
.................................................................
Table 4
Dependent Variable
Population - *
Long Test
Table 5
Dependent Variable
Population * *
Attribute 0 and 0 6 (pT* 50) 0 (pT=.91)
Simple regression
Multiple regression
Table 6
Estimates of Conditional Distribution Parameters
Table 7
Table 8
White; language minority         6.08     4.23    3.96    1.07   -43.74
White; language non-minority    12.22    13.72    2.94    4.67    10.93
Hispanic; lang. non-minority     -.76     1.22    3.25     .38   162.30
Asian; language minority        -2.90    -6.25    4.39   -1.42    53.60
Asian; language non-minority     9.54    17.34    4.66    3.72    44.98
Black; language non-minority    -8.64   -10.82    2.95   -3.67    20.15
Home language minority          -9.41   -13.78    2.78   -4.96    31.71
Years academic courses            .91     1.25     .14    8.93    27.20
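The internal arithmetic of these rows can be checked with a short script. Reading the five numeric columns as (first estimate, second estimate, standard error of the second estimate, ratio of the second estimate to its standard error, percent difference relative to the second estimate) is an assumption inferred from the numbers themselves, not a definition stated in the table; the variable names are illustrative.

```python
rows = {
    "White; language minority":      (6.08, 4.23, 3.96, 1.07, -43.74),
    "White; language non-minority":  (12.22, 13.72, 2.94, 4.67, 10.93),
    "Hispanic; lang. non-minority":  (-0.76, 1.22, 3.25, 0.38, 162.30),
    "Asian; language minority":      (-2.90, -6.25, 4.39, -1.42, 53.60),
    "Asian; language non-minority":  (9.54, 17.34, 4.66, 3.72, 44.98),
    "Black; language non-minority":  (-8.64, -10.82, 2.95, -3.67, 20.15),
    "Home language minority":        (-9.41, -13.78, 2.78, -4.96, 31.71),
    "Years academic courses":        (0.91, 1.25, 0.14, 8.93, 27.20),
}

max_ratio_gap = 0.0
max_pct_gap = 0.0
for label, (first, second, se, ratio, pct) in rows.items():
    # Assumed reading: fourth column = second estimate / standard error.
    max_ratio_gap = max(max_ratio_gap, abs(second / se - ratio))
    # Assumed reading: fifth column = 100 * (second - first) / second.
    max_pct_gap = max(max_pct_gap, abs(100 * (second - first) / second - pct))

print(max_ratio_gap < 0.01, max_pct_gap < 0.01)
```

Under that reading every row agrees with the printed values to within rounding, which suggests the last column expresses how much each effect changed between the two sets of estimates.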
* 1988/07/06