Intro 2 Irt

INTRO 2 IRT
Tim Croudace
Descriptions of IRT
IRT refers to a set of mathematical models that describe, in probabilistic terms, the relationship between a person's response to a survey question/test item and his or her level of the latent variable being measured by the scale.
This latent variable is usually a hypothetical construct [trait/domain or ability] which is postulated to exist but cannot be measured by a single observable variable/item. Instead it is indirectly measured by using multiple items or questions in a multi-item test/scale.
Fayers and Hays, p55. Assessing Quality of Life in Clinical Trials. Oxford Univ Press: chapter on Applying IRT for evaluating questionnaire item and scale properties.
Binary Factor / Latent Trait Analysis
Results: logit-probit model
Warming up to this sort of thing soon.
[Diagram: items U1, U2, U3, ..., Up]
[Figure: 2 items with similar thresholds and similar slopes]
[Figure: 3 items with different thresholds but similar slopes]
The key concept: latent factor models for constructs underpinning multiple binary (0/1) responses
based on innovations in educational testing and psychometric statistics that are > 50 years old
The same models used in educational testing with correct / incorrect answers can be applied to symptom present / absent data (both binary)
Extensions to ordinal outcomes (Likert scales)
Flexibility in parametric form available
Semi- and non-parametric approaches too
Binary IRT: the A B C D of it
Linear vs non-linear regression of response probability on latent variable
[Figure: y-axis = probability of response (Yes) on a simple binary (Yes/No) scale item; x-axis = score on the latent construct being measured. Adapted without permission from a slide by Prof H Goldstein.]
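A rough sketch of why the regression of response probability on the latent variable is usually taken to be non-linear: a straight line eventually leaves the 0 to 1 range, whereas a logistic curve cannot. The intercepts and slopes below are made-up illustrative values, not numbers from the slide.

```python
import math

def linear_prob(z, intercept=0.5, slope=0.25):
    """Linear regression of response probability on z (can leave [0, 1])."""
    return intercept + slope * z

def logistic_prob(z, intercept=0.0, slope=1.0):
    """Logistic regression of response probability on z (stays inside (0, 1))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

# Compare the two curves over a range of latent scores (z-score metric).
for z in (-4, -2, 0, 2, 4):
    print(f"z={z:+d}  linear={linear_prob(z):+.2f}  logistic={logistic_prob(z):.2f}")
```

At the extremes of the latent scale the linear version gives "probabilities" below 0 or above 1, which is one way to guess when the linear approximation will be poor.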
Ordinal IRT: the A B C D of the GRM (graded response model)
IRT models
Simplest case of a latent trait analysis
Manifest variables are binary: only 2 distinctions are made
these take 0/1 values
Yes / No
Right / Wrong
Symptom present / absent
Agree / disagree distinctions for attitudes are more likely to be ordinal [>2 response categories]; see next lecture, IRT 2, on Friday
For scoring of individuals (not parameter estimation for items) it is frequently assumed that the UNOBSERVED (latent) variable <the latent factor / trait> is not only continuous but normally distributed
[or the prior distribution is normal but the posterior distribution may not be]
IRT for binary data
The most commonly used model is the Lord-Birnbaum model (Lord, 1952; Birnbaum)
2-parameter logistic
[a.k.a. the logit-probit model; Bartholomew (1987)]
The model is essentially a non-linear single factor model
When applied to binary data, the traditional linear factor model is only an approximation to the appropriate item response model
sometimes satisfactory, but sometimes very poor (we can guess when)
Some accounts of Item Response Theory make it sound like a revolutionary & very modern development
this is not true!
It should not replace or displace classical concepts, and has suffered from being presented and taught as disconnected from these
A unified treatment can be given that builds one from the other (McDonald, 1999), but this would be a one-term course on its own
What IRT does
IRT models provide a clear statement [picture!] of the performance of each item in the scale/test and of how the scale/test functions, overall, for measuring the construct of interest in the study population
The objective is to model each item by estimating the properties describing item performance characteristics
hence Item Characteristic Curve or Symptom Response Function.
Very bland (but simple) example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information:
radio
newspapers
(solid) reading (books?)
lectures
Fitting a latent variable model might be proposed as a way of constructing a measure of how well informed an individual is about cancer
A second stage might relate knowledge about cancer to knowledge about other diseases or general knowledge
2 to the power 4, i.e. 16, possible response patterns from 0000 to 1111
Data
Lombard and Doering (1947) data
2 to the power 4, i.e. 16, possible response patterns (all occur)
with more items this is neither likely nor necessary
frequency (n) is the number of individuals with each item response pattern, shown for 0000 to 1111
pattern    n
0000     477
1000      63
0001      12
0010     150
1001       7
1010      32
0011      11
1011       4
0100     231
1100      94
0101      13
0110     378
1101      12
1110     169
0111      45
1111      31
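A small sketch of how the pattern frequencies could be held in code and checked. It assumes the four digits of each pattern are in the order q1 radio, q2 newspapers, q3 reading, q4 lectures; the names FREQ, ITEMS and endorsed are only illustrative.

```python
# Response patterns and their frequencies, transcribed from the slide.
# Assumption: the digits are in the order q1 q2 q3 q4.
FREQ = {
    "0000": 477, "1000": 63, "0001": 12, "0010": 150,
    "1001": 7, "1010": 32, "0011": 11, "1011": 4,
    "0100": 231, "1100": 94, "0101": 13, "0110": 378,
    "1101": 12, "1110": 169, "0111": 45, "1111": 31,
}
ITEMS = ("q1 radio", "q2 newspapers", "q3 reading", "q4 lectures")

n_total = sum(FREQ.values())

# Number of individuals endorsing each item (digit "1" in that position).
endorsed = [
    sum(n for pattern, n in FREQ.items() if pattern[k] == "1")
    for k in range(len(ITEMS))
]

print("patterns observed:", len(FREQ), "of", 2 ** len(ITEMS))
print("n =", n_total)
for name, count in zip(ITEMS, endorsed):
    print(f"{name}: {count} ({count / n_total:.1%})")
```

The total agrees with the n=1729 individuals quoted on the sum score slide.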
Basic objectives of modelling
When multiple items are applied in a test / survey, latent variable modelling can be used to
explore inter-relationships among observed responses
determine whether the inter-relationships can be explained by a small number of factors
THEN, to assign a SCORE to each individual on the basis of their responses
Basically to rank order (arrange) or quantify (score) survey participants, test takers, individuals who have been studied
CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR DATASET FOR EACH INDIVIDUAL
This analysis will also help you to understand the properties of each item, as a measure of the target construct (what properties?)
GRAPHICAL REPRESENTATION IS BEST
The item properties that we are interested in are captured graphically by so-called Item Characteristic Curves (ICCs)
Item/Symptom & Test/Scale INFORMATION
is useful and necessary to examine score precision (the accuracy of estimated scores)
we are interested in this for different individuals (individuals with different score values)
by inspecting the amount of information about each score level, across the score range (range of estimated scores), we are identifying variations in measurement precision (reliability of individuals' estimated scores)
this enables us to make statements about the effective measurement range of an instrument in a population
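e.g. Item Characteristic Curves
[Figure: item characteristic curves]
Item information functions
- add them together to get the TIF (test information function)
beware y-axis scaling: not all the same
[Figure: item information functions]
Test Information Function
[Figure: test information function]
Item information functions
- shown alongside their ICCs
[Figure: item information functions alongside Item Characteristic Curves]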
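A minimal sketch of "add them together to get the TIF", assuming a 2-parameter logistic model in which the information of an item at latent score z is slope squared times P(z) times (1 - P(z)). The slopes are the weights quoted elsewhere in these slides (0.72, 3.40, 1.34, 0.77); the intercepts are not given in the slides, so the zeros used here are purely hypothetical placeholders.

```python
import math

SLOPES = {"q1": 0.72, "q2": 3.40, "q3": 1.34, "q4": 0.77}   # from the slides
INTERCEPTS = {"q1": 0.0, "q2": 0.0, "q3": 0.0, "q4": 0.0}   # hypothetical values

def prob(z, intercept, slope):
    """2-parameter logistic response probability at latent score z."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def item_information(z, intercept, slope):
    """Information contributed by one item at latent score z."""
    p = prob(z, intercept, slope)
    return slope ** 2 * p * (1.0 - p)

def total_information(z):
    """Test information function: the sum of the item information functions."""
    return sum(item_information(z, INTERCEPTS[q], SLOPES[q]) for q in SLOPES)

for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z={z:+d}  TIF={total_information(z):.2f}")
```

Because each item's information peaks at slope squared over 4, the steep item (q2) dominates the total, which is why the y-axis scaling of the separate item plots needs watching.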
1 / Sqrt[Information] = s.e.m.
Info   Sqrt(Info)   1/Sqrt(Info)
 1       1.0          1.0
 2       1.4          0.7
 3       1.7          0.6
 4       2.0          0.5
 5       2.2          0.4
 6       2.4          0.4
 7       2.6          0.4
 8       2.8          0.4
 9       3.0          0.3
10       3.2          0.3
11       3.3          0.3
12       3.5          0.3
Standard error of measurement is not constant (U-shaped, not symmetric ...)
Approximate reliability
$\text{Reliability} = 1 - 1/\text{Info} = 1 - 1/[1/\text{s.e.m.}^2] = 1 - \text{s.e.m.}^2$
s.e.m. = standard error of measurement
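The conversion from information to s.e.m. and to approximate reliability can be sketched in a few lines. This assumes the latent variable is in a z-score metric (variance 1), so that reliability is roughly 1 - 1/Info; the function names are illustrative.

```python
import math

def sem(info):
    """Standard error of measurement at a given amount of information."""
    return 1.0 / math.sqrt(info)

def approx_reliability(info):
    """Approximate reliability, assuming the latent variance is 1."""
    return 1.0 - 1.0 / info

print("Info  s.e.m.  reliability")
for info in range(1, 13):
    print(f"{info:4d}  {sem(info):6.2f}  {approx_reliability(info):11.2f}")
```

For example, information of 4 corresponds to an s.e.m. of 0.5 and an approximate reliability of 0.75.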
Back to the Data
Lombard and Doering (1947) data: the 16 response patterns (0000 to 1111) and their frequencies (n), as listed earlier
What would be the easiest thing to do with these numbers, to score the patterns?
Answer: simply add them up
Simple sum scores (n=1729 new individual values)
pattern  [n]   Total score
0000     477   0
1000      63   1
0001      12   1
0010     150   1
1001       7   2
1010      32   2
0011      11   2
1011       4   3
0100     231   1
1100      94   2
0101      13   2
0110     378   2
1101      12   3
1110     169   3
0111      45   3
1111      31   4
477 zeros added to data set (new ...)
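As a sketch, the simple sum score just counts the 1s in each response pattern; weighting each pattern by its frequency gives the distribution of total scores across all individuals. The dictionary name FREQ and the helper sum_score are illustrative.

```python
from collections import Counter

# Response patterns and frequencies, transcribed from the slide.
FREQ = {
    "0000": 477, "1000": 63, "0001": 12, "0010": 150,
    "1001": 7, "1010": 32, "0011": 11, "1011": 4,
    "0100": 231, "1100": 94, "0101": 13, "0110": 378,
    "1101": 12, "1110": 169, "0111": 45, "1111": 31,
}

def sum_score(pattern):
    """Simple (unweighted) sum score: the number of items endorsed."""
    return sum(int(digit) for digit in pattern)

# How many individuals receive each total score.
score_counts = Counter()
for pattern, n in FREQ.items():
    score_counts[sum_score(pattern)] += n

for score in sorted(score_counts):
    print(f"total score {score}: {score_counts[score]} individuals")
```

Every individual gets one new value, so the counts add up to the 1729 individuals in the data.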
Weighted [by discriminating power] scores
For each response pattern this slide lists [n], the total score, the factor score and the component score [weighted by alpha h1]; the same values are listed again under "Estimated component scores" later in these slides
Mplus version 4.1 ML estimates of alpha h1 (S.E.): Q1 0.721 (0.093), Q2 3.358 (1.035), Q3 1.344 (0.167), Q4 0.769 (0.145); variance of Z = 1
Compare with Bartholomew (1987) p160: 0.72 (0.09), 3.40 (1.14), 1.34 (0.17), 0.77 (0.15)
Something a little more subtle
Simple sum scores assume that all item responses are equally useful at defining the construct
this may not be the case
If items are differentially important (have different discriminating power with respect to what we are measuring), we might want to take that into account
How? Weighted sum scores [Component scores]
weighted by what?
weighted by the estimates (factor loading type parameters) from a latent variable model [latent trait model with a single latent factor]
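$\operatorname{logit}\{\pi_{hi}\} = \alpha_{h0} + \alpha_{h1} z_i$
where $\pi_{hi}$ is the probability that individual i responds positively to item h, $\alpha_{h0}$ is the item intercept, $\alpha_{h1}$ is the item slope (its discriminating power) and $z_i$ is the individual's Cancer Knowledge
The data: the 16 response patterns and their frequencies (n), as listed earlier
[Figure: item information functions shown alongside their Item Characteristic Curves]
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
A single latent dimension Z ~ Normal (mean 0; std dev = 1), so Var = 1 too!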
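To make the model concrete, here is a hedged sketch of how it turns a latent score z into the probability of a whole response pattern. It assumes the logit form above plus conditional (local) independence of the items given z, which is the usual single-factor assumption. The slopes are the weights quoted in these slides; the intercepts are not reported there, so the values used are hypothetical.

```python
import math
from itertools import product

SLOPES = (0.72, 3.40, 1.34, 0.77)       # alpha h1 for q1..q4, from the slides
INTERCEPTS = (-1.0, 0.5, 0.0, -2.5)     # alpha h0: hypothetical, not from the slides

def item_prob(z, intercept, slope):
    """P(positive response) for one item: inverse logit of intercept + slope * z."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def pattern_prob(pattern, z):
    """P(response pattern | z), assuming items are independent given z."""
    prob = 1.0
    for digit, a0, a1 in zip(pattern, INTERCEPTS, SLOPES):
        p = item_prob(z, a0, a1)
        prob *= p if digit == "1" else 1.0 - p
    return prob

ALL_PATTERNS = ["".join(bits) for bits in product("01", repeat=4)]

for z in (-1.0, 0.0, 1.0):
    print(f"z={z:+.1f}  P(0000)={pattern_prob('0000', z):.3f}  "
          f"P(1111)={pattern_prob('1111', z):.3f}")
```

At any fixed z the 16 pattern probabilities sum to 1, and higher z shifts probability towards patterns with more 1s.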

Mplus version 4.1 ML estimates of the weights (alpha h1 parameters)
           Estimate   S.E.    Bartholomew (1987) p160
Z by Q1    0.721      0.093   0.72 (0.09)
Z by Q2    3.358      1.035   3.40 (1.14)
Z by Q3    1.344      0.167   1.34 (0.17)
Z by Q4    0.769      0.145   0.77 (0.15)
Variances: Z = 1
These numbers ?????
Q1 0.72   Q2 3.40   Q3 1.34   Q4 0.77
Estimated component scores (weighted values)
pattern  [n]  Total score  Factor score  Weighted sum of alpha h1   Component score
0000     477  0            -0.98         0                          0
1000      63  1            -0.68         0.72                       0.72
0001      12  1            -0.67         0.77                       0.77
0010     150  1            -0.46         1.34                       1.34
1001       7  2            -0.41         0.72 + 0.77                1.48
1010      32  2            -0.23         0.72 + 1.34                2.06
0011      11  2            -0.22         1.34 + 0.77                2.10
1011       4  3             0.0          0.72 + 1.34 + 0.77         2.82
0100     231  1             0.16         3.40                       3.40
1100      94  2             0.42         0.72 + 3.40                4.12
0101      13  2             0.43         3.40 + 0.77                4.16
0110     378  2             0.66         3.40 + 1.34                4.74
1101      12  3             0.72         0.72 + 3.40 + 0.77         4.88
1110     169  3             0.99         0.72 + 3.40 + 1.34         5.46
0111      45  3             1.02         3.40 + 1.34 + 0.77         5.50
1111     (row cut off in the source)
Some listed component scores differ by 0.01 from the sum of the two-decimal weights, presumably because they were computed from unrounded weights.
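The weighted sums can be reproduced with a short sketch. It assumes the pattern digits are in the order q1 q2 q3 q4 and uses the two-decimal weights 0.72, 3.40, 1.34, 0.77, so a few results differ by 0.01 from the listed component scores.

```python
# Two-decimal weights (alpha h1) for q1..q4, as quoted in the slides.
WEIGHTS = (0.72, 3.40, 1.34, 0.77)

# Response patterns in the order in which the slides list them.
SLIDE_ORDER = [
    "0000", "1000", "0001", "0010", "1001", "1010", "0011", "1011",
    "0100", "1100", "0101", "0110", "1101", "1110", "0111", "1111",
]

def component_score(pattern, weights=WEIGHTS):
    """Weighted sum score: add the weight of every endorsed item."""
    return sum(w for digit, w in zip(pattern, weights) if digit == "1")

for pattern in SLIDE_ORDER:
    print(pattern, f"{component_score(pattern):.2f}")
```

Sorting the patterns by this score reproduces the order used in the slides, which suggests the listing is ordered by component score rather than by total score.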
But the bee's knees are...
The estimated factor scores from the model
Not just some simple sum of unweighted or weighted items
Takes into account the proposed score distribution (gaussian normal) and the estimated model parameters (but not the fact that they are estimates rather than known values) and more besides (when missing data are present)
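One common way to obtain such model-based scores is the posterior mean of z given the response pattern (often called an EAP score), combining the normal (0,1) prior with the model likelihood. The sketch below assumes that approach; the slides do not state exactly how their factor scores were computed, and the intercepts are hypothetical, so the numbers will not match the listed factor scores.

```python
import math

SLOPES = (0.72, 3.40, 1.34, 0.77)       # alpha h1 for q1..q4, from the slides
INTERCEPTS = (-1.0, 0.5, 0.0, -2.5)     # alpha h0: hypothetical, not from the slides

def pattern_likelihood(pattern, z):
    """P(response pattern | z) under the logit model with independent items."""
    like = 1.0
    for digit, a0, a1 in zip(pattern, INTERCEPTS, SLOPES):
        p = 1.0 / (1.0 + math.exp(-(a0 + a1 * z)))
        like *= p if digit == "1" else 1.0 - p
    return like

def eap_score(pattern, lo=-6.0, hi=6.0, n_points=241):
    """Posterior mean of z on an evenly spaced grid, with a Normal(0, 1) prior."""
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        z = lo + k * step
        weight = math.exp(-0.5 * z * z) * pattern_likelihood(pattern, z)
        num += z * weight
        den += weight
    return num / den

for pattern in ("0000", "1000", "0100", "1111"):
    print(pattern, f"{eap_score(pattern):+.2f}")
```

Unlike a sum or weighted sum, this score is pulled towards the prior mean of 0, so patterns with few or many endorsed items receive finite, shrunken values.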
A graphical and interactive introduction to IRT
Play with the key features of IRT models
www2.unijena.de/svw/metheval/irt/VisualIRT.pdf
a b (see) [2 parameter IRT model]
VisualIRT (pdf)
Individual's score = new ruler value
Any hypothetical latent variable [factor/trait] contin... expressed in a z-score metric (gaussian normal (0,1))
Item properties
slope = item discrimination
location = item commonality [difficulty/prevalence/...]
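The slope and location description can be linked to the intercept and slope form of the logit model by a simple reparameterisation: if logit P = intercept + slope * z, then a = slope and b = -intercept / slope, so that logit P = a * (z - b). The sketch below illustrates this under that standard convention; the intercept value is hypothetical and the function names are illustrative.

```python
import math

def to_discrimination_location(intercept, slope):
    """Convert (intercept, slope) to (a, b) with logit P = a * (z - b)."""
    return slope, -intercept / slope

def prob_from_ab(z, a, b):
    """2-parameter logistic response probability in the a, b form."""
    return 1.0 / (1.0 + math.exp(-a * (z - b)))

# Slope 3.40 is quoted in the slides; the intercept -1.7 is a made-up value.
a, b = to_discrimination_location(intercept=-1.7, slope=3.40)
print(f"a (discrimination) = {a:.2f}, b (location) = {b:.2f}")
print(f"P(z = b) = {prob_from_ab(b, a, b):.2f}")
```

The location b is the point on the latent scale where the response probability is 0.5, and a controls how steeply the curve rises there.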
IRT Resources
A visual guide to Item Response Theory, I. Partchev
Introduction to IRT, R. Baker: http://ericae.net/irt/baker/toc.htm
An introduction to modern measurement theory, B Reeve
Chapter in Fayers and Machin QoL book, P Fayers
ABC of Item Response Theory, H Goldstein
Moustaki papers, and online slides (FA at 100)
LSE books (Bartholomew, Knott, Moustaki, Steele)
Item Response Theory Books
Applying The Rasch Model. Trevor G. Bond and Christine M. Fox. 255 pages. 2001.
Constructing Measures: An Item Response Modeling Approach. Mark Wilson. 248 pages. 2005.
The EM Algorithm and Related Statistical Models. Michiko Watanabe and Kazunori Yamaguchi. 250 pages. 2004.
Essays on Item Response Theory. Edited by Anne Boomsma, Marijtje A.J. van Duijn, Tom A.A. Snijders. 438 pages. 2001.
Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
Fundamentals of Item Response Theory. Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers. 184 pages. 1991.
Handbook of Modern Item Response Theory. Edited by Wim J. van der Linden and Ronald K. Hambleton. 510 pages. 1997.
Introduction to Nonparametric Item Response Theory. Klaas Sijtsma and Ivo W. Molenaar. 168 pages. 2002.
Item Response Theory. Mathilda Du Toit. 906 pages. 2003.
Item Response Theory for Psychologists. Susan E. Embretson and Steven P. Reise. 376 pages. 2000.
Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and Expanded w/CD). Frank Baker and Seock-Ho Kim. 495 pages. 2004.
Item Response Theory: Principles and Applications. Ronald K. Hambleton and Hariharan Swaminathan. 332 pages. 1984.
Logit and Probit: Ordered and Multinomial Models. Vani K. Borooah. 96 pages. 2002.
Markov Chain Monte Carlo in Practice. W.R. Gilks, Sylvia Richardson, and D.J. Spiegelhalter. 512 pages. 1995.
Monte Carlo Statistical Methods. Christian P. Robert and George Casella. 645 pages. 2004.
Polytomous Item Response Theory Models. Remo Ostini and Michael L. Nering. 120 pages. 2005.
Rasch Models for Measurement. David Andrich. 96 pages. 1988.
Rasch Models: Foundations, Recent Developments, and Applications. Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.
