
Item Response Theory

Shortcomings of Classical True Score Model

Sample dependence
Limitation to the specific test situation.
Dependence on the parallel forms
Same error variance for all

Sample Dependence
The first shortcoming of CTS is that the values of
commonly used item statistics in test development,
such as item difficulty and item discrimination,
depend on the particular examinee samples from
which they are obtained. The average level of
ability and the range of ability scores in an
examinee sample influence, often substantially, the
values of the item statistics.
The difficulty index changes with the ability level
of the sample, and the discrimination index differs
between a heterogeneous sample and a
homogeneous sample.

Limitation to the Specific Test Situation
The task of comparing examinees who have
taken samples of test items of differing
difficulty cannot easily be handled with
standard testing models and procedures.

Dependence on Parallel Forms
The fundamental CTS concept of test reliability is
defined in terms of parallel forms.

Same Error Variance For All


CTS presumes that the variance of errors of
measurement is the same for all examinees.

Item Response Theory


The purpose of any test theory is to describe how
inferences from examinee item responses and/or
test scores can be made about unobservable
examinee characteristics or traits that are
measured by a test.
An individual's expected performance on a
particular test question, or item, is a function of
both the level of difficulty of the item and the
individual's level of ability.

Item Response Theory


Examinee performance on a test can be predicted
(or explained) by defining examinee
characteristics, referred to as traits, or abilities;
estimating scores for examinees on these traits
(called "ability scores"); and using the scores to
predict or explain item and test performance.
Since traits are not directly measurable, they are
referred to as latent traits or abilities. An item
response model specifies a relationship between
the observable examinee test performance and the
unobservable traits or abilities assumed to underlie
performance on the test.

Assumptions of IRT
Unidimensionality
Local independence

Unidimensionality Assumption
It is possible to estimate an examinee's ability on
the same ability scale from any subset of items in
the domain of items that have been fitted to the
model. The domain of items needs to be
homogeneous in the sense of measuring a single
ability: If the domain of items is too heterogeneous,
the ability estimates will have little meaning.
Most of the IRT models that are currently being
applied make the specific assumption that the
items in a test measure a single, or unidimensional
ability or trait, and that the items form a
unidimensional scale of measurement.

Local Independence
This assumption states that an examinee's
responses to different items in a test are
statistically independent. For this
assumption to be true, an examinee's
performance on one item must not affect,
either for better or for worse, his or her
responses on any other items in the test.
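The factoring implied by local independence can be sketched in code: once ability is fixed, the probability of a whole response pattern is just the product of the per-item probabilities. This is an illustrative sketch; the function name and example probabilities are hypothetical, not from the source.

```python
def pattern_likelihood(probs, responses):
    """Likelihood of a full response pattern under local independence.

    probs: per-item probability of a correct answer for one examinee
    responses: observed answers, 1 = correct, 0 = incorrect
    The pattern likelihood factors into a product over items.
    """
    likelihood = 1.0
    for p, u in zip(probs, responses):
        likelihood *= p if u == 1 else (1 - p)
    return likelihood

# Example: P(correct) = 0.8 and 0.6 on two items; examinee gets the
# first right and the second wrong -> 0.8 * 0.4 = 0.32
pattern_likelihood([0.8, 0.6], [1, 0])
```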

Item Characteristic Curves


Specific assumptions about the relationship
between the test taker's ability and his or her
performance on a given item are explicitly
stated in a mathematical formula, whose graph
is the item characteristic curve (ICC).

Item Characteristic Curves


The form of the ICC is determined by the
particular mathematical model on which it is
based. The types of information about item
characteristics may include:
(1) the degree to which the item
discriminates among individuals of differing
levels of ability (the 'discrimination'
parameter a);

Item Characteristic Curves


(2) the level of difficulty of the item (the
'difficulty' parameter b), and
(3) the probability that an individual of low
ability can answer the item correctly (the
'pseudo-chance' or 'guessing' parameter c).
One of the major considerations in the
application of IRT models, therefore, is the
estimation of these item parameters.
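The three parameters combine in the standard three-parameter logistic (3PL) formula, P(θ) = c + (1 − c) / (1 + e^(−Da(θ − b))). A minimal sketch (the function name is illustrative; D = 1.7 is the conventional scaling constant that makes the logistic curve approximate the normal ogive):

```python
import math

def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic ICC: the probability that an examinee
    of ability theta answers the item correctly.

    a: discrimination, b: difficulty, c: pseudo-chance (lower asymptote)
    """
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

# At theta = b the probability is halfway between c and 1:
# with c = 0.2, icc_3pl(0.0, 1.0, 0.0, 0.2) = 0.6
```

Note that for very low ability the probability approaches c rather than zero, which is what distinguishes the 3PL from the one- and two-parameter models.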

ICC

[Figure: ICCs plotted as the probability of a correct response against the ability scale.]
Pseudo-chance parameter c: the lower asymptote of the curve (p = 0.20 for two of the items shown).
Difficulty parameter b: the ability level at which the probability is halfway between the pseudo-chance parameter and one.
Discrimination parameter a: proportional to the slope of the ICC at the point of the difficulty parameter. The steeper the slope, the greater the discrimination parameter.

Ability Score
1. The test developer collects a set of observed
item responses from a relatively large number of
test takers.
2. After an initial examination of how well
various models fit the data, an IRT model is
selected.
3. Through an iterative procedure, parameter
estimates are assigned to items and ability scores
to individuals, so as to maximize the agreement, or
fit, between the particular IRT model and the test
data.
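Step 3 can be illustrated with a deliberately crude version of the ability side of the procedure: holding the item parameters fixed, search for the θ that maximizes the likelihood of the observed responses. This is a sketch under simplifying assumptions (a two-parameter logistic model, grid search instead of the iterative numerical methods real IRT software uses; all names are hypothetical):

```python
import math

def p_correct(theta, a, b):
    # Two-parameter logistic ICC (no pseudo-chance parameter).
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, responses):
    """Maximum-likelihood ability estimate by grid search over
    theta in [-4, 4].

    items: list of (a, b) pairs with known parameter estimates
    responses: observed answers, 1 = correct, 0 = incorrect
    """
    best_theta, best_ll = 0.0, -math.inf
    for step in range(-400, 401):
        theta = step / 100.0
        # Log-likelihood of the response pattern at this theta,
        # using local independence to sum over items.
        ll = 0.0
        for (a, b), u in zip(items, responses):
            p = p_correct(theta, a, b)
            ll += math.log(p if u == 1 else 1 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

In practice both item parameters and abilities are unknown, and estimation alternates between the two sets until the fit stops improving.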


Item Information Function


The limitations on CTS theory approaches to
precision of measurement are addressed in the IRT
concept of information function. The item
information function refers to the amount of
information a given item provides for estimating
an individual's level of ability, and is a function of
both the slope of the ICC and the amount of
variation at each ability level.
The information function of a given item will be at
its maximum for individuals whose ability is at or
near the value of the difficulty parameter.
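For the two-parameter logistic model this relationship has a simple closed form: the item information is I(θ) = a²P(θ)(1 − P(θ)), which peaks where P = 0.5, i.e. at θ = b. A minimal sketch (function names are illustrative):

```python
import math

def p_correct(theta, a, b):
    # Two-parameter logistic ICC.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P).

    Maximal at theta = b, where it equals a^2 / 4, so more
    discriminating items (larger a) are more informative.
    """
    p = p_correct(theta, a, b)
    return a * a * p * (1 - p)
```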

Item Information Function

[Figure: item information functions for three items with different parameters.]
(1) The first item provides the most information about
differences in ability at the lower end of the ability
scale.
(2) The second provides relatively little information at any
point on the ability scale.
(3) The third provides the most information about
differences in ability at the high end of the ability
scale.

Test Information Function


The test information function (TIF) is the sum of
the item information functions, each of which
contributes independently to the total, and is a
measure of how much information a test provides
at different ability levels.
The TIF is the IRT analog of CTS theory
reliability and the standard error of measurement.

Item Bank
If there is a need for regular test administration and
analysis, the construction of an item bank may be
considered.
An item bank is not a simple collection of test items
organized in their raw form, but a collection with
parameters assigned on the basis of CTS or IRT
models.
An item bank should also have a data processing
system that assures the steady quality of the data in
the bank (describing, classifying, accepting, and
rejecting items).

Specifications in CTS Item Bank

Form of items
Type of item parts
Describing data
Classifying data

Form of Items
Dichotomous
Listening comprehension
Statement + question + choices
Short conversation + question + choices
Long conversation / passage + some questions + choices
Reading comprehension
Passage + some questions + choices
Passage + T/F questions
Syntactic knowledge / vocabulary
Question stem with blank/underlined parts + choices
Cloze
Passage + choices

Form of Items
Nondichotomous
Listening comprehension
Dictation
Dictation passage with blanks to be filled

Describing Data

Ability measured
Difficulty index
Discrimination
Storage code
