Assignment 1

The six qualities of a useful test
English has long played an important role in almost every aspect of
modern life, from business to studies. Therefore, the teaching and learning
of the language have been of great concern among academics and
experts. Much research and hard work have been put into finding ways to
improve the delivery of English lessons among teachers and the retention
of lessons among learners; one of those is the designing and evaluation of
tests. When designing and evaluating a test, there are different qualities
that a test needs to meet so that it can be useful, namely reliability,
validity, authenticity, interactiveness, impact and practicality. These
qualities are examined in turn below.
The first and most important quality of a useful test is reliability.
According to Bachman and Palmer (1996), test reliability is the situation
whereby test scores do not vary under different test conditions or
occasions; in other words, the scores remain consistent whenever and
wherever a test is done. Therefore, reliability is a vital characteristic of a
good test. A reliable test has five features: score consistency across
various test conditions, clear directions for scoring/evaluation, consistent
rubrics for scoring/evaluation, consistent application of those rubrics among
scorers, and unambiguous items/tasks from the perspective of test
takers. However, in reality, it is almost impossible to eliminate all factors
which affect testees' scores or performances. In fact, testers can only
improve test reliability by minimizing the sources of inconsistency during
the design of tests and minimizing variation in test task characteristics.
Besides, consistent administration and marking also play a crucial role in
the reliability of a test. To illustrate this characteristic, let us examine
the following examples.
This is a real situation in my hometown, Ben Tre Province,
where teachers must take the FCE test by Cambridge ESOL. There is an
authorized test site in Ben Tre which many FCE test takers
blamed for poor listening conditions that, in their view, caused them to
fail the test. As a consequence, they rushed to Ho Chi Minh City to
take the test in the hope that their scores would improve,
because they believed that the facilities in Ho Chi Minh City were better.
Sadly, the results disappointed them: there was
no improvement in their listening scores. This suggests that the
reliability of Cambridge ESOL papers is high and that scores are
consistent across different test conditions and settings.
Another example concerns the FCE speaking test by Cambridge ESOL. A
surprising fact is that the speaking materials are recycled too frequently.
Many test-takers sit the test every month; some of them have
taken the test around ten times and have been given the same set of
speaking material. Moreover, at many test sites the administration is not
strict enough, so those who have taken the test share
information with those taking it later. All these problems have, in one
way or another, undermined the reliability of the test.
Test validity refers to the extent to which a test achieves the purpose it
sets out to achieve. If the test fails to achieve this aim, it does not satisfy
the quality of validity. According to Heaton (1988), test validity was
traditionally subdivided into four categories: content, criterion-related,
empirical, and construct validity. When designing a test, it is important to
keep this quality in mind. The example below gives a scenario in which
test validity is violated and shows how the situation can be
changed to satisfy this quality.
A listening task is given with a summary for students to listen and fill in
the numbered gaps. The content of the recording is familiar to some
students sitting the test, and as a result, they can come up with
answers to the gaps without relying on the recording. Although their
answers are not exactly the same as the answer key, they are acceptable.
In this case, the test no longer measures listening alone, so the
test designer has violated the validity quality. To improve
this situation, the instruction should clearly state that the numbered
spaces must be filled in with words from the recording.
In short, there is a close link between test reliability and test validity, as
stated by Bachman and Palmer (1996, p.29): "The two
measurement qualities, reliability and construct validity, are thus
essential to the usefulness of any language test. Reliability is the
necessary condition for construct validity, and hence for usefulness.
However, reliability is not a sufficient condition for either construct
validity or usefulness."
Authenticity comes next. It is an important quality as it shows the
relationship between the test and the real world. This term can be
viewed in two senses. The broader sense of authenticity
refers to the use of real-life materials such as recordings taken from
news reports or interviews on TV, or reading texts taken from
newspapers, magazines, or brochures. The narrower sense concerns how
closely a test task corresponds to language use in a
specific target domain outside the test itself. It is important to take
into consideration the target language use (TLU) domain and the test
tasks when a test is designed.
Another quality closely linked to authenticity
is test interactiveness. It is defined as test-takers'
reactions to the test given to them. Their reactions can be positive or
negative; however, a highly positive attitude towards the test is
preferable. Therefore, effort should be made to ensure test
interactiveness. These two qualities contribute significantly to the
usefulness of a test because they influence how test takers
perceive the test and how they perform in it.
For example, a news report on current affairs taken from TV is highly
authentic in the first sense. However, if this report is given to a group
of students studying medicine, it is of little value because it is not
about their field of study and may not be of interest to them.
Meanwhile, if a news report about Ebola is given to them, they may
feel more engaged in the listening, and the test becomes
more authentic in the second sense because it is linked not only to
the real world but also to the specific domain which these students
are studying.
Another feature is test impact. Bachman and Palmer (1996, p.39)
define impact as the different ways in which test use can influence
society, the educational system as a whole, and individual
testees. Three areas demand our
attention concerning this quality: washback, impact on test-takers, and
impact on teachers.
First, washback refers to how testing influences teaching and learning,
which can be constructive or destructive (Bachman & Palmer 1996:30).
For example, in many high schools in Vietnam now, the practice of
teaching grammar solely for the purpose of semester and university
entrance tests has sidelined the two most important skills in language
learning: listening and speaking. This phenomenon is happening because
the tests do not require these skills. This is known as teaching to the test.
It is not in itself a negative practice, but as the ultimate goal of
learning a language is the ability to use it to communicate, the tests
should include these skills and strike a balance across all skills.
Second, there is the impact on test-takers. A test may have a positive
or negative impact on testees. For example, with good preparation and
feedback on good test results, students might feel encouraged in their
learning process. However, for those with poor test performance,
reporting the test results can be a painful task, both for the
teachers and for the learners themselves. Therefore, in many developed
countries, test scores are reported confidentially and the
testee is the only one who knows his or her scores. This is done to protect
testees' self-esteem and is seen as highly ethical.
Finally, teachers can also use tests to improve their teaching. Good
teachers look at areas where their students need to practice
more. This leads to the inevitable situation of teaching to the test, as
mentioned above. For example, after a test, the teachers may
see that their students still have problems with the use of
relative pronouns in writing, and this particular grammar point
frequently appears in tests. The teachers can then adjust
their lesson plans so that next time their students can successfully deal
with this problem area.
The last component of test usefulness is test practicality. According to
Bachman and Palmer (1996, p.36), practicality is the
relationship between the resources required for the design,
development, and use of the test and the resources available.
Although this is mentioned last among the six qualities, it is no less
important than the other qualities.
For example, consider a school with eight English teachers and twenty classes
of forty students. This school wants to conduct a speaking test and
wants to ensure its reliability by assigning two
teachers to each test room and testing students one by one.
On the face of it, there is no problem because there is an even number
of teachers and no teacher must work harder than any other.
However, with eight hundred students in total and only eight teachers, this
is no easy task. Tiredness will certainly affect the teachers' marking,
so this is not a practical way to conduct a speaking test.
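The workload behind this example can be made explicit with a little arithmetic. The short Python sketch below uses the figures given above (eight teachers, twenty classes of forty students, two teachers per room). The interview length of ten minutes per student is only an assumption for illustration, since the example does not state one.

```python
# Figures taken from the school example above.
TEACHERS = 8
CLASSES = 20
STUDENTS_PER_CLASS = 40
EXAMINERS_PER_ROOM = 2

# Assumption for illustration only: the example gives no interview length.
MINUTES_PER_STUDENT = 10


def speaking_test_workload(minutes_per_student=MINUTES_PER_STUDENT):
    """Return the workload implied by testing every student one by one."""
    total_students = CLASSES * STUDENTS_PER_CLASS
    rooms = TEACHERS // EXAMINERS_PER_ROOM
    students_per_room = total_students / rooms
    hours_per_room = students_per_room * minutes_per_student / 60
    return {
        "total_students": total_students,
        "rooms": rooms,
        "students_per_room": students_per_room,
        "hours_per_room": hours_per_room,
    }


workload = speaking_test_workload()
for name, value in workload.items():
    print(f"{name}: {value}")
```

Under these assumptions, each pair of teachers would have to interview two hundred students, which is more than thirty hours of continuous examining per room. A shorter or longer interview changes the total, but not the conclusion that the scheme demands more resources than the school has.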
In conclusion, when designing a test, it is important for test developers
to take into consideration all six qualities discussed
above. With one quality missing, the usefulness of the test will be
affected; in other words, such a test will be of little value both to test
takers and to teachers.
Question 2: Measures to ensure the six qualities of a useful test.
1. Measures to ensure test reliability
a. First, ensure that the length of the test is suitable for the
length of time allocated for testees to finish the test. If a test
is too short, it may lead to cheating, or if the test is too long,
it may cause irritation among test-takers as they do not have
enough time to finish the test. Consequently, testers may not
be able to measure their students' knowledge.
b. If there are two or more versions of a test given to students of the
same class, it is important that test tasks be kept similar. The
only things to change are the positions of options and the order
of the questions in the test paper.
c. Another measure is to minimize the use of items where
educated guesses can be applied and the chance of arriving
at the correct answer is high, such as True/False items.
Expand them into True/False/Not Given items instead.
d. It is important to write clear instructions and ensure
consistent administration across different testing settings.
This is important because there are some invigilators who are
more lenient or tolerant than others, which may cause
unfairness to testees.
e. Forward planning is essential in test design. Last minute
construction of tests in most cases leads to improper
directions or instructions, misspellings and so on.
f. Regular tests should be given to students in classroom
settings to reduce potential inconsistencies due to students'
internal factors, such as illness.
g. Finally, testees should be made familiar with the test
format by being given sample papers and should also be equipped
with techniques to deal with the test successfully.
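The reasoning behind measure (c) can be checked with a simple probability calculation. The Python sketch below compares the chance of reaching a pass mark by blind guessing alone on True/False items (one chance in two per item) and on True/False/Not Given items (one chance in three per item). The test length of ten items and the pass mark of six are hypothetical figures chosen only for illustration.

```python
from math import comb


def chance_of_passing_by_guessing(items, pass_mark, options):
    """Probability of at least `pass_mark` correct answers out of `items`
    when every answer is a blind guess among `options` equally likely choices."""
    p = 1 / options
    return sum(
        comb(items, k) * p**k * (1 - p) ** (items - k)
        for k in range(pass_mark, items + 1)
    )


# Hypothetical test: 10 items, pass mark of 6 correct answers.
ITEMS = 10
PASS_MARK = 6

true_false = chance_of_passing_by_guessing(ITEMS, PASS_MARK, options=2)
true_false_not_given = chance_of_passing_by_guessing(ITEMS, PASS_MARK, options=3)

print(f"True/False: {true_false:.3f}")
print(f"True/False/Not Given: {true_false_not_given:.3f}")
```

Adding the third option lowers the chance that a score reflects guessing rather than knowledge, which is why the expanded item type supports reliability.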
2. Measures to ensure validity
a. First, it is important to ensure the reliability of a test because
it is the necessary condition for test validity. If test reliability
is not satisfied, the test itself cannot be valid.
b. When a test is designed, the point being tested should be in
the stem of the test items, not in the options.
c. The proportion of the stem to the options should also be
taken into account. For example, when a listening test is
given, the written questions should not be longer than
the recording; otherwise, the task becomes a reading
comprehension test, which violates test validity.
d. Finally, make sure that the scores given directly and
accurately reflect what is tested.
3. Measures to ensure test authenticity and interactiveness
a. First, choose test material which is from real-life sources.
b. Second, take into account students' level and field of study.
c. Third, identify features that define tasks in the target
language use domains, or choose material which can
accommodate testees' needs, interests, or concerns.
d. Finally, design tasks which both satisfy linguistic skills and
TLU domains.
4. Measures to enhance test impact
a. First, help students to be well prepared for their upcoming
tests by equipping them with techniques and skills.
b. Second, provide feedback which is relevant and always try to
keep test scores confidential.
c. Third, use encouraging comments for those who fail to
achieve good test results.
5. Measures to ensure practicality
a. First, when designing tests, take into account the availability
of teaching aids and resources which can be used in the
tests.
b. Second, consider the human resources when carrying out the
test.
c. Third, allocate the resources available to achieve the highest
levels of satisfaction of all six qualities.
d. Finally, specify the lowest acceptable level for each
individual quality (Bachman and Palmer 1996:144).
References
Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. London: Oxford
University Press.
Heaton, J. B. (1988). Writing English language tests (2nd ed.). New York: Longman Inc.
