Test Review

Test Review: TOEFL iBT & IELTS

Assessment of English Language Learner

Mengzi Cai

Introduction

Language assessment has become an important topic in both education and employment. According to
Pill and Harding (2013), the influence of a test is real and possibly far-reaching for test takers
and for a wide range of stakeholder groups. The authors explain that these groups may not take
part in constructing test materials, yet they rely on language assessment scores when making
decisions. Language assessment is therefore used in many areas of social life. Because English is
a global language, English language assessment is widely used in many domains. Students have to
take an English language assessment as one of their admission tests when they want to study in
English-speaking countries. Moreover, people who want a job at an international company often
take an English language assessment as a certification for their job application. I am interested
in two English language assessments that are widely used in many countries.

According to the ETS website, the TOEFL test has more than 35 million users around the world and
is accepted by more than 1,000 universities and other institutions. The IELTS test is accepted in
many countries, such as Australia, New Zealand, and the United Kingdom. These popular language
proficiency tests focus on listening, speaking, reading, and writing skills, and the majority of
their test takers are students. By comparing the two tests, I can suggest which one is more
suitable for specific students who want to pursue higher education in the US.


Test Review

TOEFL iBT

Publisher: Educational Testing Service (ETS)


Mail Stop 50-L
Princeton, NJ 08541, USA
1-609-771-7100
Publication Date: 2005
Target Population: Non-native English speakers
Cost: $205

Overview

The full name of the TOEFL test is the Test of English as a Foreign Language. The test was first
administered in 1964. After several major revisions, the TOEFL iBT test was launched in 2005.
TOEFL iBT assesses listening, speaking, reading, and writing in ways that reflect real-world
academic environments. It aims to provide high-quality English assessment for a variety of
academic uses and contexts. The motivation for designing the TOEFL iBT test was to reflect
communicative competence more fully in the test and to extend its reach to test takers across the
world. The Internet was chosen as the medium of delivery, so the test was named the
internet-based test, abbreviated as iBT.

Test purpose: The TOEFL iBT test provides a proficiency assessment for non-native English
learners. Typically, the purpose of the test is to examine the communicative
language ability of people whose first language is not English
in an academic environment, such as university life.
(TOEFL research volume 1, p. 4)

Test Structure: The test is divided into four parts: Reading, Listening, Speaking, and Writing.

Reading:
Items: 35-56 questions
Time: 60-80 minutes
Content: Three or four passages of about 700 words each, with thirteen or fourteen
questions per passage. The questions assess the test taker’s
comprehension of factual information, the author’s purpose, connections
between facts and ideas, inferences from the passage, and the meaning of
vocabulary.

Listening:
Items: 34-51 questions
Time: 60-90 minutes
Content: Four to six lectures from different academic areas and two or three
conversations involving faculty, staff, or fellow students.
Each lecture focuses on a specific topic.
The questions assess test takers’ understanding of main ideas and important
details, the speaker’s attitude or function, the organization of the information,
and the relationships between ideas, as well as the ability to make inferences or
connections among pieces of information.

Speaking:
Items: 6 tasks
Time: 20 minutes
Content: Two independent tasks ask test takers to respond based on their own opinions
and experience.
Integrated speaking tasks:
Read/Listen/Speak (campus situation): Test takers first read a short passage
about a typical campus situation or policy and then listen to a conversation
about the reading. They should combine and retell the key information from the
reading and listening materials.
Read/Listen/Speak (academic course topic): The reading passage defines a term,
process, or idea from an academic subject, and the listening passage is a
lecture that provides examples and a more precise explanation of the term.
Test takers should combine the key information from the reading and the
listening.
Listen/Speak (campus situation): Test takers listen to a conversation
about a problem a student faces at school and two possible solutions. They should
express their opinions about the problem and the two solutions.
Listen/Speak (academic course topic): Test takers listen to a lecture that defines
a term or concept by providing concrete examples. They should combine
the explanation and the examples.

Writing:
Items: 2 tasks
Time: 50 minutes
Content:
Independent Writing Task: Test takers write their opinion on a specific issue
and explain it.
Integrated Writing Task: Test takers first read a passage and then
listen to part of a lecture that challenges the information in the
reading. They should summarize the two sources.
(TOEFL iBT Test Framework and Test Development, 2018)
Test takers:
• Students planning to study at a higher education institution
• People seeking admission to or exit from English-language learning programs
• Scholarship and certification candidates
• English learners who want to track their English learning progress
• People applying for visas
(Retrieved from: https://www.ets.org/toefl/ibt/about?WT.ac=toeflhome_aboutibt_180910)
Scoring of test:
Reading Section: 0-30
Listening Section: 0-30
Speaking Section: 0-30
Writing Section: 0-30
Total Score: 0-120

The Speaking rubric has four parts: General Description,
Delivery, Language Use, and Topic Development. Each response receives a score
from 0 to 4 depending on performance in these four areas.

Based on the TOEFL iBT writing rubric, each response receives a score
from 0 to 5 according to the standards for topic, organization, context, and
language use.

Statistical Distribution of Scores: ETS collected the data from test takers who took the
TOEFL iBT test between January 2017 and December 2017.

        Reading   Listening   Speaking   Writing   Total
Mean    20.7      20.3        20.4       20.8      82
S.D.    6.7       6.7         4.5        4.8       20

(Test and Score Data Summary for TOEFL iBT Tests, 2017)
Validity Evidence: Overall, there are six propositions with related evidence:
• The relevance and representativeness of test content
• Task design and scoring rubrics
• Linguistic knowledge, processes, and strategies
• Test structure
• Relationship between TOEFL iBT scores and other criteria of language
proficiency
1) Self-assessment: “Observed correlations between the scores for each of
the four self-assessment scales averaged .46 with test scores on the
measures of four skills and .52 with the total test score.” (Validity, 11)
2) Academic placement: Mean total test scores were summarized for test
takers studying in English-speaking countries in different programs. IEP:
approximately 55-60; ESL & Content: approximately 58-65; Content only:
approximately 75-80
3) Local institutional tests for International Teaching Assistants (see
Appendix A)
4) Performance on simulated academic listening tasks
5) Performance on real-world speaking and writing tasks
• Test use and consequences
(Validity Evidence Supporting the Interpretation and Use of TOEFL iBT Scores, 2018)


Evidence for Reliability: ETS uses the reliability coefficient and the standard error of
measurement (SEM) to evaluate the reliability of the TOEFL test.

            Scale    Reliability Estimate   SEM
Reading     0-30     0.87                   2.34
Listening   0-30     0.87                   2.38
Speaking    0-30     0.86                   1.57
Writing     0-30     0.80                   2.14
Total       0-120    0.95                   4.26
The reliability of the Writing section is lower than that of the other three sections. ETS
explained that the Writing section consists of two tasks, each of which is
time-consuming, and that a section built from a few long tasks yields lower reliability
than one built from shorter tasks. In addition, the reliability across different forms of
the test was estimated among 148,000 test takers (Reading: 0.76; Listening: 0.75; Speaking:
0.80; Writing: 0.9). This result indicates a high degree of consistency
in the scores of these repeat test takers. Furthermore, a study examined
the correlation between Time 1 scores, for which repeaters’ writing was
scored by e-rater and a human rater, and Time 2 scores, for which their
writing was scored by human raters (see Appendix B). For the integrated task, the
e-rater score and the Human Rater 1 score have the same correlation with the Time 2
human rater score. For the independent task, however, the e-rater score predicts
the Time 2 score better than the human rater score does. For the raw Writing score,
combining the human rater and e-rater scores on each task best predicts the Time
2 score.
(Reliability and Comparability of TOEFL iBT® Scores, 2018)
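
One way to read the SEM column in the TOEFL iBT reliability table is as an uncertainty band around an observed score. The short Python sketch below assumes the classical test theory interpretation, in which roughly two thirds of observed scores fall within one SEM of a test taker's true score. The function name score_band and the two dictionaries are illustrative only and do not come from ETS.

```python
# SEM values copied from the TOEFL iBT reliability table in this review.
SEM = {"Reading": 2.34, "Listening": 2.38, "Speaking": 1.57,
       "Writing": 2.14, "Total": 4.26}

# Upper end of each reported score scale (the lower end is 0).
SCALE_MAX = {"Reading": 30, "Listening": 30, "Speaking": 30,
             "Writing": 30, "Total": 120}


def score_band(section, observed, n_sem=1.0):
    """Return (low, high) = observed +/- n_sem * SEM, clipped to the scale.

    Assumption: the classical test theory reading of SEM, not an ETS procedure.
    """
    half_width = n_sem * SEM[section]
    low = max(0.0, observed - half_width)
    high = min(float(SCALE_MAX[section]), observed + half_width)
    return round(low, 2), round(high, 2)


# Band around the 2017 mean total score of 82.
low, high = score_band("Total", 82)
print(f"Total score 82: about {low} to {high}")
```

Under this assumption, a total score of 82 corresponds to a band of about 77.7 to 86.3, which suggests that small score differences between two applicants should be interpreted with caution.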

IELTS

Publisher: British Council, IDP: IELTS Australia and Cambridge Assessment English
Publication Date: 1989
Target Population: Non-native English speakers who want to study or work in an English-speaking
country
Cost: $260

Overview:

IELTS is the abbreviation of the International English Language Testing System. It is used to test
the language proficiency of people who intend to study or work in an English-speaking country. For
example, more than 10,000 education and training providers worldwide accept the IELTS certificate
as evidence of English proficiency, and some universities in non-English-speaking countries that
offer courses in English also ask for an IELTS score. IELTS can be used for professional
registration too: English skill is a prerequisite for applicants, and an IELTS score may be
required for vocational training. In addition, an IELTS score can serve as evidence for
immigration to Australia, Canada, New Zealand, and the UK. To serve these purposes, there
are two types of IELTS test. One is IELTS Academic, which is used to apply for higher education.
The other is IELTS General Training, which is intended for people who want to pursue secondary
education, work experience, or a training program in an English-speaking country. For this test
review, I will focus on IELTS Academic.

Test purpose: IELTS Academic provides a tool for test takers who want to apply for higher
education or professional registration to demonstrate their English proficiency.

Test Structure: The Listening, Reading, and Writing sections are taken on the same day, but
the Speaking section can be taken up to a week before or after the other sections.

Listening:
Items: 4 recordings with 40 questions
Time: 30 minutes
Content: The four recordings cover the following general topics: 1) a conversation
between two people about an everyday social topic; 2) a monologue set in an everyday
social context; 3) a conversation between up to four people set in an
educational or training context; 4) a monologue on an academic
subject.
The Listening section contains six task types:
1) Multiple choice: a question is followed by three possible answers, and
test takers choose the one correct answer. Test takers may also
be asked to choose more than one answer from
a longer list of possible answers. This task type focuses on detailed
understanding of specific points or overall understanding of the main
points.
2) Matching: test takers match a numbered list of items
to a set of options on the question paper. This task type targets
listening for detail, understanding information
given in a conversation, and the ability to follow a conversation between
two people and identify the relationships and connections between
facts.
3) Plan, map, diagram labelling: test takers complete or
select labels (usually from a list provided on the question paper) on a
plan, map, or diagram. This task type assesses the ability to understand
and follow language expressing spatial relationships and directions.
4) Form, note, table, flow-chart, summary completion: test takers
select answers from a list or identify missing words from the
recording to complete the form, notes, table, or flow-chart. It tests
the ability to record the main points of the listening material.
5) Sentence completion: test takers read a set of sentences
that summarize key information from all or part of the listening material
and fill each gap with information from the recording. This task type
emphasizes the ability to identify key information.
6) Short-answer questions: test takers read a question about the
listening text and answer it within a word limit. It
focuses on the ability to listen for concrete facts.

Reading:
Items: three reading passages with 40 questions
Time: 60 minutes
Content: The Reading section uses eleven task types.
1) Multiple choice: test takers choose the best answer from four
alternatives, the best two answers from five alternatives, or the best three
answers from seven alternatives. It tests both detailed
and overall understanding of the reading passage.
2) Identifying information: test takers answer true, false,
or not given. It assesses the ability to
identify specific points of information in the reading passages.
3) Identifying writer’s views/claims: test takers answer
yes, no, or not given to the question “Do the following statements
agree with the views/claims of the writer?” It tests whether
test takers recognize opinions or ideas in the reading.
4) Matching information: test takers locate specific
information in the lettered paragraphs/sections and write the
letters of the correct paragraphs or sections on the answer sheet. It tests the
ability to scan for specific information.
The remaining task types are similar to the second through sixth task types of the
Listening section.

Writing:
Items: 2 tasks
Time: 60 minutes
Content: For Task 1, test takers describe the facts or statistics shown in one or
more graphs, charts, or tables on a related topic, writing
at least 150 words. This task tests the ability to identify the most important
and relevant information in a graph or chart. Task 2 asks test takers to write
at least 250 words on a given topic in an academic or semi-formal/neutral style.
This task checks the ability to write a clear and
well-organized argument.

Speaking:
Items: three parts
Time: 11-14 minutes
Content:
1) Part 1 - Introduction and interview: After the examiner
introduces himself or herself and checks the test taker’s identity,
the examiner asks the test taker questions about familiar topics.
This part checks the ability to communicate about everyday topics in
spoken English.
2) Part 2 - Long turn: Test takers receive a task card with the
topic they have to talk about. The card lists
points they can cover in the talk and asks them to explain one
aspect of the topic. After the test taker speaks for up to 2 minutes, the
examiner asks questions about the talk, and the test taker
answers them. This part assesses the ability to
speak at length on a given topic.
3) Part 3 - Discussion: The examiner and the test taker
continue to discuss the topic from Part 2, but in greater depth. This
part requires test takers to express their opinions and
analyze issues.
(Retrieved from: https://www.ielts.org/about-the-test/test-format)
Scoring of test:
Listening & Reading:

The Listening and Reading sections each contain 40 questions, and each
correct answer earns one mark. Scores out of 40 are converted to the
IELTS nine-band scale (see Appendix C).

Criteria for Writing:

Task achievement (for Task 1) and task response (for Task 2)
Coherence and cohesion
Lexical resource
Grammatical range and accuracy
Test takers receive a band score for the Writing section.

Criteria for Speaking:


Fluency and coherence
Lexical resource
Grammatical range and accuracy
Pronunciation
Test takers receive a band score for the Speaking section.
(Retrieved from: https://www.ielts.org/ielts-for-organisations/ielts-scoring-in-detail)
Statistical Distribution of Scores: IELTS reports score statistics by several categories:
Academic and General Training test takers, gender, place of origin, and first language. As an
example, I selected some of the 2017 figures from the place of origin category.

           Listening   Reading   Writing   Speaking   Overall
China      5.90        6.11      5.37      5.39       5.76
Canada     7.09        6.78      6.16      7.15       6.86
Japan      5.91        6.09      5.41      5.59       5.81
Germany    7.76        7.52      6.60      7.36       7.37

(Retrieved from: https://www.ielts.org/teaching-and-research/test-taker-performance)
Evidence of validity: IELTS is developed in step with advances in applied linguistics,
language pedagogy, language assessment, and technology, which helps ensure the
validity of the test. IELTS follows a defined test-design process to support
validity: commissioning, pre-editing, editing, pretesting, standards
fixing, and test construction and grading. In addition, the test writers come from
different English-speaking countries to ensure that the content reflects real-life materials,
is free of cultural bias, and is fair to all test takers.
Kerstjens and Nery (2000) investigated the
predictive validity of IELTS among students in Technical and Further
Education (TAFE) and Higher Education programs. They report that “In the total
sample, significant correlations were found between the Reading and Writing
tests and GPA (.262, .204 respectively). When Higher Education and TAFE
scores were looked at separately, only the Reading score remained significant
for the Higher Education group. While none of the correlations was significant
in the TAFE group, the magnitude of the correlation between the Writing test
and GPA (.194) was very similar to that for the total sample, which was
statistically significant.” (Kerstjens & Nery, 2000, p. 1)
Evidence of Reliability: The reliability estimates below are for the modules used in 2017.

Section            Mean   SD    Alpha   SEM
Listening          6.10   1.3   0.92    0.37
Academic reading   6.02   1.2   0.90    0.38

IELTS explains that the reliability of the Writing and Speaking sections cannot be
estimated in the same way because they are not item-based; scores in
these two sections are awarded according to rating criteria. However, experimental
generalizability studies were carried out as part of the Speaking and Writing Revision
Projects to study the reliability of ratings, and recent data show
coefficients of 0.83-0.86 for Speaking and 0.81-0.89 for Writing. The composite
reliability estimate for both the Academic and General Training modules was
0.96, with an SEM of 0.23, based on data from 2009.
(Retrieved from: https://www.ielts.org/teaching-and-research/test-performance)
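
The SEM values reported for the IELTS Listening and Academic Reading modules can be checked against the standard classical test theory relation SEM = SD × sqrt(1 − reliability). The Python sketch below applies that relation to the SD and alpha values from the reliability table. The formula is an assumption about how the SEM was derived, since this review does not quote IELTS on that point, and the function name sem is illustrative.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement under classical test theory.

    Assumption: SEM = SD * sqrt(1 - reliability); not quoted from IELTS.
    """
    return sd * math.sqrt(1.0 - reliability)


# (SD, alpha) pairs copied from the 2017 IELTS reliability table in this review.
ielts_2017 = {
    "Listening": (1.3, 0.92),
    "Academic reading": (1.2, 0.90),
}

for section, (sd, alpha) in ielts_2017.items():
    print(f"{section}: SEM is about {sem(sd, alpha):.2f}")
```

Both results round to the SEM values shown in the table (0.37 and 0.38), so the reported figures are consistent with this relation.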

Discussion

The context I envision is an EFL class of non-native English speakers in China. The students
speak Mandarin as their native language and have graduated from high school. They have never
been to an English-speaking country and have had few chances to talk with English speakers. They
come to the AEP class to improve their language skills and prepare for their studies before they
enroll in a university in the US. The students have to choose one language assessment as
evidence for university enrollment in the US, and the AEP class will provide training aimed at
the assessment they choose. They have approximately two months to prepare for the test.

There are 15 students, all at the intermediate level, with similar proficiency in listening,
reading, writing, and speaking. The AEP class has already taught the students grammar and
vocabulary, and the instructor is going to provide training for their language assessment. For
example, the instructor can introduce the format of the chosen assessment and give the students
practice over the two months. Therefore, the students have to choose an assessment that
universities in the US accept and that they can prepare for well.

Based on the conditions described above, I think TOEFL iBT is the better choice for these
students. First of all, TOEFL iBT is designed for non-native English speakers and is accepted by
most universities in the US. TOEFL iBT focuses on listening, speaking, reading, and writing,
which are also emphasized in the AEP class. Based on the test review, the validity evidence for
TOEFL iBT covers test design, test implementation, and test use and consequences. This indicates
that the test score reflects students’ performance and language proficiency well.

The IELTS test does have an advantage: the Speaking section is separated from the Listening,
Reading, and Writing sections, which means that test takers do not face one long testing session
and can prepare well for the Speaking section. However, test takers have to speak face to face
with an English-speaking examiner, which might cause a problem. The students have had few
opportunities to talk with native English speakers, so they might be shy or stressed if it is
the first or second time they talk with a foreigner face to face. Moreover, they have only two
months to prepare for the test. IELTS has more task types than TOEFL iBT, so it would take the
students more time to become familiar with the task types if they chose IELTS. In addition, when
I reviewed the documents on the IELTS website, I noticed some minor differences between British
English and American English, such as the spelling “recognise.” IELTS is used mostly in
Australia, the UK, and New Zealand, which use British English. However, these students have
chosen to pursue higher education in the US. For this reason, I think TOEFL might be a better
fit for the students.

In conclusion, TOEFL iBT is the better choice for the students in the given context. Both
assessments have their advantages and disadvantages, but we should choose the appropriate
assessment based on the situation.


Appendix
Appendix A

Table 2. Correlations Between the Scores on the TOEFL iBT Speaking Section and
Different Types of Local ITA Assessments

Type of Local ITA Assessment                                                 Observed Correlation
Simulated teaching test (content and noncontent combined) scored on
the basis of linguistic qualities (n = 84)                                   .78
Simulated teaching test (separate content- and noncontent-based tests)
scored on the basis of linguistic qualities and teaching skills (n = 45)     .70
Simulated teaching test (content based) scored on the basis of linguistic
qualities, teacher presence, and nonverbal communication (n = 53)            .53
Real classroom teaching sessions scored on the basis of linguistic
qualities, lecturing skills, and awareness of American classroom
culture (n = 23)                                                             .44

Appendix B

Table 2. Correlation of Time 1 Writing Scores with Time 2 (Independent +
Integrated) Human Rater Scores in a Sample of Repeat Test Takers

Task              Time 1 Score                                      Correlation with Time 2
                                                                    Human Rater Scores
Integrated        e-rater score                                     0.57
                  Human Rater 1 score                               0.57
                  Combined e-rater & Human Rater 1 score            0.62
Independent       e-rater score                                     0.60
                  Human Rater 1 score                               0.53
                  Combined e-rater & Human Rater 1 score            0.60
Total Raw Score   Human Rater 1 score on integrated task +
                  Human Rater 1 score on independent task           0.64
                  Combined e-rater & Human Rater 1 score on
                  integrated task + Human Rater 1 score on
                  independent task                                  0.66
                  Human Rater 1 score on integrated task +
                  combined e-rater & Human Rater 1 score on
                  independent task                                  0.66
                  e-rater score on integrated task + e-rater
                  score on independent task                         0.63
                  Combined e-rater & Human Rater 1 score on
                  integrated task + Combined e-rater & Human
                  Rater 1 score on independent task

Appendix C

References

About the TOEFL iBT Test. (n.d.). Retrieved April 1, 2019, from
https://www.ets.org/toefl/ibt/about

Kerstjens, M., & Nery, C. (2000). Predictive validity in the IELTS test: A study of the
relationship between IELTS scores and students’ subsequent academic performance.
International English Language Testing System (IELTS) Research Reports 2000, Volume 3, 86–108.

Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence from a
parliamentary inquiry. Language Testing, 30(3), 381–402.

Reliability and Comparability of TOEFL iBT Scores. (2018). TOEFL Research Insight Series,
Volume 3, 1–16.

TOEFL iBT Test Framework and Test Development. (2018). TOEFL Research Insight Series,
Volume 1, 1–12.

Test and Score Data Summary for TOEFL iBT Tests. (2017). Retrieved April 1, 2019, from
www.ets.org/toefl/research

Test Format. (n.d.). Retrieved April 25, 2019, from https://www.ielts.org/about-the-test/test-format

Test Taker Performance 2017. (n.d.). Retrieved April 25, 2019, from
https://www.ielts.org/teaching-and-research/test-taker-performance

Validity Evidence Supporting the Interpretation and Use of TOEFL iBT Scores. (2018). TOEFL
Research Insight Series, Volume 4, 1–17.
