
Build Bright University

Language Testing and Assessment

Chapter 2
Principles of Language
Assessment
Prepared by Kheang Sokheng
Ph.D. Candidate and M.Ed. in TESOL
Principles of Language Assessment

 Five cardinal criteria for “testing a test” are as follows:
 Practicality
 Reliability
 Validity
 Authenticity
 Washback
Practicality
An effective test is practical. This means that it:
 is not excessively expensive,
 stays within appropriate time constraints,
 is relatively easy to administer, and
 has a scoring/evaluation procedure that is specific and time-efficient.
Examples of a practicality checklist
 1. Are administrative details clearly
established before the test?
 2. Can students complete the test
reasonably within the set time frame?
 3. Is the cost of the test within budget
limits?
Reliability
 Reliability means the degree to which an
assessment tool produces stable and
consistent results.
 A reliable test is consistent and dependable.
 According to Brown (2004), “if you give the same test to the same student or matched students on two different occasions, the test should yield similar results.”
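A minimal numerical sketch of this idea (not from the original slides; all scores below are invented for illustration): test-retest consistency is often summarized as the correlation between the two administrations, computed here in Python.

```python
# Hypothetical illustration: the same six students take the same test twice,
# and we estimate reliability as the Pearson correlation between the two sittings.

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

first_sitting = [72, 65, 88, 54, 91, 60]    # invented scores, occasion 1
second_sitting = [70, 68, 85, 57, 93, 58]   # same students, occasion 2

print(f"Estimated test-retest reliability: r = {pearson_r(first_sitting, second_sitting):.2f}")
# A coefficient close to 1.0 suggests the test yields similar results on both occasions.
```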
Student-Related Reliability

 The most common learner-related issue in reliability is caused by temporary illness, fatigue, a “bad day”, anxiety, and other physical or psychological factors.
Rater Reliability
 Inter-rater reliability:
When two or more scorers yield inconsistent scores on the same test.
Factors: lack of attention to scoring, inexperience, inattention, etc.
 Intra-rater reliability:
When a single scorer is inconsistent in his or her own scoring.
Factors: scoring criteria, fatigue, bias toward particular “good” and “bad” students, or simple carelessness.
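As a very rough, invented illustration of rater reliability (not part of the slides), one can compare two raters' scores for the same set of essays and report how often they agree exactly:

```python
# Hypothetical band scores (0-5) given by two raters to the same five essays.
rater_a = [4, 3, 5, 2, 4]
rater_b = [4, 2, 5, 3, 4]

# A crude inter-rater reliability check: the proportion of exact agreements.
agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
agreement_rate = agreements / len(rater_a)
print(f"Exact agreement between raters: {agreement_rate:.0%}")
# Low agreement signals an inter-rater reliability problem; in practice,
# correlation-based indices are also used.
```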
Test Administration Reliability
 This involves the conditions in which the test is administered.
 Unreliability occurs due to outside interference like noise, variations in photocopying, temperature variations, the amount of light in various parts of the room, and even the condition of desks and chairs.
Test Administration Reliability
 Brown (2010) stated that he once witnessed the administration of a test of aural comprehension in which an audio player was used to deliver items for comprehension, but due to street noise outside the building, test-takers sitting next to open windows could not hear the stimuli clearly.
Test Reliability
Factors that cause unreliability:
 If a test is too long, test takers may become fatigued by the time they reach the later items and hastily respond incorrectly.
 Ambiguous items.
Validity
 “Validity is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226).
“Measuring what should be measured.”
 Content-related evidence
 Criterion-related evidence
 Construct-related evidence
 Consequential validity
 Face validity
Content-Related Evidence
 A test demonstrates content-related evidence of validity if it samples the subject matter about which conclusions are to be drawn
 and if it requires the test-taker to perform the behavior that is being measured.
Criterion-Related Evidence
 Criterion-related evidence is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid.
 For instance, imagine a hands-on driving test has been shown to be an accurate test of driving skills. By comparing the scores on the written driving test with the scores from the hands-on driving test, the written test can be validated by using a criterion-related strategy in which the hands-on driving test is compared to the written test.
1. Concurrent validity (also called empirical validity): a test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself; for example, the validity of a high score on the final exam of a foreign language course will be substantiated by actual proficiency in the language.
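A sketch of how such a comparison is often summarized (not from the slides; all names and numbers are hypothetical): the scores on the test being validated are correlated with scores from the already-trusted criterion measure.

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical data: final-exam scores for six learners, alongside ratings from
# an already-trusted criterion measure such as an oral proficiency interview.
final_exam = [78, 62, 85, 55, 90, 70]
interview_rating = [4.0, 3.0, 4.5, 2.5, 5.0, 3.5]

# Concurrent (criterion-related) evidence: correlate the test with the criterion.
validity_coefficient = statistics.correlation(final_exam, interview_rating)
print(f"Validity coefficient against the criterion: {validity_coefficient:.2f}")
# A strong positive coefficient supports interpreting exam scores as evidence of proficiency.
```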
Criterion-Related Evidence
2. Predictive validity is used to assess
(and predict) a test-taker’s likelihood of
future success.
E.g., placement tests, admissions assessment batteries, and language aptitude tests.
Consequential Validity
 It encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of the test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test’s interpretation.
Face Validity
 “It refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers” (Mousavi, 2002, p. 244).
Face Validity
 Sometimes students don’t know what is being tested when they tackle a test. They may feel, for a variety of reasons, that a test isn’t testing what it is “supposed” to test. Face validity means that the students perceive the test to be valid.
 Face validity will likely be high if the learners encounter:
 a well-constructed, expected format with familiar tasks,
 a test that is clearly doable within the allotted time limit,
 items that are clear and uncomplicated,
 directions that are crystal clear,
 tasks that relate to their course work (content validity), and
 a difficulty level that presents a reasonable challenge.
Authenticity
 Bachman and Palmer (1996, p. 23) define authenticity as “the degree of correspondence of the characteristics of a given language test task to the features of a target language task,” and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
 The authenticity of a test may be present in the following ways:
 The language in a test is as natural as
possible.
 Items are contextualized rather than isolated.
 Topics are meaningful (relevant, interesting) for the learner.
 Some thematic organization to items is
provided, such as through a story line or
episode.
 Tasks represent, or closely approximate,
real-world tasks.
Washback
 The term ‘washback’ (or ‘backwash’) refers to “the effect of testing on teaching and learning” (Hughes, 2003, p. 1).
 For instance, it includes the extent to which assessment affects a student’s future language development.
 Factors that provide beneficial washback in a test (Brown, 2010):
 It can positively influence what and how teachers teach and what and how students learn;
 It offers learners a chance to adequately prepare;
 It gives learners feedback that enhances their language development;
 It is more formative in nature than summative;
 It provides conditions for peak performance by learners.
 In large-scale assessment, washback refers to the effects that tests have on instruction in terms of how students prepare for the test (e.g., cram courses and teaching to the test).
Washback
 Washback also includes the effects of an
assessment on teaching and learning prior to
the assessment itself, i.e. on preparation for
the assessment.
 The challenge to teachers is to create classroom tests that serve as learning devices through which washback is achieved.
 Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, and self-confidence, among others.
Washback
 Ways to improve washback:
 To comment generously and specifically on test performance.
 Through a specification of the numerical scores on the various subsections of the test.
 Formative versus summative tests:
 Formative tests provide washback in the form of
information to the learner on progress towards
goals.
 Summative tests provide washback for learners to initiate further pursuits, more learning, more goals, and more challenges to face.
