
Journal of English for Academic Purposes 24 (2016) 78–88


Source-based tasks in academic writing assessment: Lexical diversity, textual borrowing and proficiency

Atta Gebril a, *, Lia Plakans b

a The American University in Cairo, Egypt
b The University of Iowa, Iowa City, IA, United States

ARTICLE INFO

Article history:
Received 15 September 2015
Received in revised form 15 September 2016
Accepted 13 October 2016
Available online 3 November 2016

Keywords:
Integrated writing
Source-based writing
Lexical diversity
Academic writing
Textual borrowing
Writing assessment

ABSTRACT

With the growing interest in integrating reading with writing to assess academic English writing, several questions have been raised about the role of source vocabulary in test takers' writing and, consequently, how scores from these tasks should be interpreted. The current study investigates issues related to the influence of textual borrowing on lexical diversity and the difference in lexical diversity across test scores on integrated tasks. To this end, 130 students in a Middle Eastern university completed a reading-based integrated task. The essays were analyzed for lexical diversity using CLAN software, a computer program developed to compute lexical diversity. Then, to illuminate the impact of the source texts, vocabulary originating from the reading was removed from the essays and the D index was recomputed for a lexical diversity score with borrowed vocabulary omitted. A paired-samples t-test and analysis of variance (ANOVA) were used to answer the research questions. The results showed that borrowing from source texts significantly affects lexical diversity values in integrated writing. Further, the results demonstrated that lexical diversity plays a substantial role in integrated writing scores.

© 2016 Elsevier Ltd. All rights reserved.

1. Introduction

Integration of multiple skills in second language (L2) academic writing settings has recently received increasing attention,
with many testing programs employing integrated writing tasks in their assessments. For example, the Internet-based Test of
English as a Foreign Language (TOEFL iBT) uses an integrated writing task based on both reading and listening sources. A more
generic term used to describe the integration of reading and listening materials in writing tasks is source-based writing,
where students synthesize information from multiple sources while producing texts. Research has proposed that integrated
tasks improve the authenticity of academic writing assessment since they simulate, in part, actual practices in academic
contexts (Gebril, 2009; Knoch & Sitajalabhorn, 2013). According to Weigle (2004), students “are rarely if ever asked to write
essays based solely on their background knowledge; before they write on a given topic they are expected to read, discuss, and
think critically about that topic” (p. 30). Academic writing literature shows source-based writing as common in university
classes (e.g., Carson, 2001; Hale et al., 1996; Moore & Morton, 1999). Jordan (1997) suggests a number of skills required in an
academic setting that are closely related to source-based writing: (a) summarizing, paraphrasing, and synthesizing, (b) using
quotations and bibliography, and (c) locating and analyzing evidence.

* Corresponding author.
E-mail addresses: agebril@aucegypt.edu (A. Gebril), lia-plakans@uiowa.edu (L. Plakans).

http://dx.doi.org/10.1016/j.jeap.2016.10.001

Source-based writing also provides background knowledge and, consequently, offers writers resources for unfamiliar
topics (Gebril, 2009; Weigle, 2004). Sources help students generate more ideas when working on unfamiliar topics and also
help them support their argument. In addition, research has reported that L2 writers frequently use source texts for language support while composing integrated tasks, including borrowing words (Cumming, 2002; Weigle, 2004; Plakans & Gebril, 2012). These advantages have contributed to the popularity of source-based writing among both teachers and testers in different academic contexts.
However, the design and use of integrated writing tasks are not without challenges, and issues related to the effect of
source text materials on writing performance have been a concern. The literature has reported a number of problems
associated with source-based writing tasks, including problems with task design, inappropriate textual borrowing practices,
and the various conceptions of plagiarism in different instructional settings (Hirvela & Du, 2013; Pennycook, 1996; Shi, 2010;
Weigle, 2004; Gebril, 2010). One concern that has not received due attention in research is how source materials affect the lexical quality of academic writing. While an assumption may exist that writers use source-text vocabulary as they write in integrated tasks, research has not confirmed this assumption, nor has it determined whether this lexical borrowing significantly affects examinees' scores. Such research is needed to provide empirical validity evidence for score interpretation.
Understanding discourse features in written performance is also essential in developing integrated test tasks and rating
scales. For example, rating scales are often designed based on intuitive methods (Brindley, 1991; Fulcher, 2003), and thus such
assumptions about writing at different score levels may be verified through this line of research. Cumming et al. (2005) stress
the importance of analyzing text features at different score levels in integrated writing for this purpose:
The discourse of written texts cannot be assumed consistent for examinees with differing levels of proficiency in
English, so consideration also needs to be given to how the written discourse of examinees varies in particular tasks
with their English proficiency. This information is needed to verify, or refine, the scoring schemes being developed to
evaluate examinees' performance on these writing tasks (pp. 8–9).
With integrated tasks, questions arise as L2 writers borrow vocabulary items from sources: How does this borrowing affect the overall lexical quality of their writing? What is the relationship between these newer integrated writing tasks and lexical quality? Such questions highlight the need for more research exploring the relationship between lexical quality and performance on integrated writing tasks. For this purpose, this study investigates whether lexical diversity is affected by borrowing words from source materials in integrated tasks, and how lexical diversity differs across integrated writing score levels.

2. Literature review

The following section includes an overview of the construct of lexical diversity, lexical quality research in L2 writing, and
research on lexical diversity in integrated writing contexts. These three strands were identified based on the recurrent themes
in the literature and their relevance to the current study. The first section on the lexical diversity construct clarifies the
operational definition of lexical diversity. This section is followed by a detailed survey of lexical diversity research in the L2 writing context, while the third strand narrows the discussion to lexical quality in source-based writing contexts, the main focus of the present study.

2.1. What is lexical diversity?

Lexical measures in second language (L2) writing research are critical because they help identify the quality of a written
text as well as a writer's vocabulary knowledge and vocabulary size (Laufer & Nation, 1995). Issues related to lexical quality
are important in instructional settings especially for curriculum development and for decisions related to selection of class
materials. They are equally important in a writing assessment context where this evidence could provide information about
typical profiles at different proficiency levels.
Various terminology exists to describe lexical quality, including lexical range, verbal creativity, semantic abilities, semantic
proficiency, semantic factors, vocabulary size, lexical richness, lexical sophistication, lexical variation, lexical density, and
lexical diversity (Crystal, 1982; Fradis, Mihailescu, & Jipescu, 1992; Laufer, 2003). As Yu (2010) indicates, there is a sense of
“nomenclature diffusions and confusions” in the literature, which has resulted from using different concepts interchangeably,
quantifying lexical quality with various measures, and using lexical quality to describe both “language abilities of producers”
and “the quality of products” (p. 238). A number of researchers have attempted to deploy an umbrella term for the various
concepts. Wolfe-Quintero, Inagaki, and Kim (1998) used the term 'lexical complexity' to refer to the range and size of vocabulary produced by a language user, arguing that any investigation of lexical complexity should consider how "varied or
sophisticated the words or word types are” (p.101). Malvern, Richards, Chipere, and Duran (2004) used the label “vocabulary
richness” to subsume both lexical sophistication and lexical diversity. In a related vein, Read (2000) classified lexical richness
into four different components: lexical variation (diversity), lexical sophistication, lexical density, and number of errors
(lexical accuracy).
The current study focuses on lexical diversity, which Malvern et al. (2004) define as “the range of vocabulary and
avoidance of repetition" (p. 3). They argue that these terms could be used synonymously with lexical variation; however, 'lexical diversity' is more commonly used in language research. The term lexical diversity was adopted from John B. Carroll
(1938), who used the phrasing diversity of vocabulary. He defined it as the “relative amount of repetitiveness, or the relative
variety in vocabulary” (p.379). Generally, a language user with high lexical diversity employs many different words. Less
repetition of vocabulary items leads to higher lexical variation (diversity).
This definition is based on the assumption that lexical diversity is a function of the number of different words in a text
(types) and how often they are repeated. Building on this work, Johnson (1944) suggested the type-token ratio (TTR) as a
measure of vocabulary flexibility or variability. He acknowledged the limitation of TTR as a sample-dependent measure of
lexical diversity and thus recommended calculating TTR from a fixed number of words from each text. Since the 1940s, a
number of researchers have suggested different solutions for the text length problem (e.g., Honoré, 1979; Yule, 1944). According to McCarthy and Jarvis (2007), these efforts are merely mathematical conversions of TTR, and they have unfortunately failed to accurately tap into the lexical diversity construct. Recently, the D index (Malvern et al., 2004) was proposed as an alternative measure that attempts to account for text length through mathematical modeling of the data. The methodology section includes a detailed description of the D index and its strengths and weaknesses.
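To make the text-length problem concrete, the following short Python sketch (ours, not part of the original study or of CLAN) computes a raw type-token ratio; the toy sentences are invented for illustration only.

# Minimal sketch of the type-token ratio (TTR) and its length problem:
# because the denominator grows with every token, longer and more
# repetitive texts tend to receive lower TTR values, which motivates
# length-corrected measures such as the D index.

def type_token_ratio(text: str) -> float:
    """Number of distinct word forms (types) divided by total words (tokens)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

short_text = "global warming is real and warming is accelerating"
long_text = short_text + " the warming is real and the evidence for warming is growing and growing"

print(round(type_token_ratio(short_text), 2))  # fewer tokens, relatively little repetition
print(round(type_token_ratio(long_text), 2))   # more tokens and more repetition, so a lower TTR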

2.2. Lexical diversity in L2 academic writing research

In academic settings, appropriate vocabulary usage is considered "a strong indicator of whether the writer has adopted the conventions of the relevant discourse community" (Nation, 2002, p. 178). University students generally demonstrate improvements in their academic field when their writing shows an increase in academic vocabulary (Laufer, 1994). Research in L2 writing has consistently shown lexical skill to be a critical feature of L2 writing proficiency (Engber, 1995), finding relatively high correlations between vocabulary knowledge and writing quality, with coefficients ranging from 0.6 to 0.7 (Koda, 1993; Schoonen et al., 2003). In a survey investigating ESL students' perceptions about writing for academic purposes,
Leki and Carson (1994) reported that students cited vocabulary as one of the most important skills when writing.
Researchers have explored differences in lexical diversity between native and nonnative writers. A study of high school students from Sweden and Britain investigated differences in vocabulary usage between the two groups (Linnarud, 1986). Both groups were asked to write a narrative text in English based on a series of pictures, and the texts were then analyzed for lexical features. The results showed less lexical diversity and originality in the L2 students' writing when compared to native
writers. In addition, there was a significant difference in word frequency between the two groups with the most substantial
difference in the use of adverbs and adjectives. Jarvis (2002) also found significant differences between the writing of English
native speakers and nonnative speakers in lexical diversity.
Researchers have also considered the effects of proficiency and topic on lexical diversity in L2 writing, showing general agreement that lexical diversity improves with increased proficiency (Engber, 1995; Grant & Ginther, 2000; Jarvis, 2002). In a
recent study, Crossley and McNamara (2012) conducted a corpus analysis using essays written by students taking the Hong
Kong Advanced Level Examination (HKALE). The study focused on a wide range of text features including both linguistic
sophistication and cohesion. The study results showed that higher proficiency writers demonstrated more lexical diversity
compared to writers at lower levels; lexical features accounted for 29% of score variance.
Yu (2010) investigated the relationship between lexical diversity and both proficiency and topic in Michigan English
Language Assessment Battery (MELAB) writing tasks. Data were collected from five different MELAB tasks and the D index
was used to compute lexical diversity. Yu's analysis showed that lexical diversity has a significant positive correlation with
both general language proficiency and writing score. The results yielded variation in the correlation values between lexical
diversity and writing scores across a number of factors, including L1 background, gender, and test-taking purpose. Furthermore, the analysis found that impersonal topics yielded higher lexical diversity than personal topics, as did familiar topics. Yu's study is similar to the current study in its use of the D index to compute lexical diversity. However,
unlike previous research on writing and lexical diversity, our study uses a different type of writing activity, an integrated
reading-writing task, and investigates issues of lexical borrowing by exploring how source-related vocabulary affects lexical
diversity.

2.3. Source-based writing and lexical diversity

There is relatively little research on lexical diversity in source-based writing. One of the few studies that considered this
issue was conducted by Baba (2009) using a summary writing task. She investigated the relationship between different
aspects of lexical proficiency (including lexical diversity) and the quality of summaries written by Japanese students. This review addresses only the lexical diversity data from Baba's study. The participants completed eight different tasks: two summaries written in English, a vocabulary size test, a vocabulary depth test, a word definition test, reading comprehension tests, a self-assessment questionnaire of communicative English ability, a writing task in Japanese, and finally a
test of Japanese vocabulary knowledge. The results showed a non-linear correlation between lexical diversity and quality of
summary writing. Baba argued that this variation in lexical diversity is due to the heavy reliance on source materials while
composing, and because of this nonlinear relationship, lexical diversity did not significantly contribute to variability of
summary writing scores. While these results were interpreted as reflecting source-text borrowing, the study did not
directly measure the impact of words from the source texts on lexical diversity. To test Baba's hypothesis, the current study
investigates the potential impact of borrowed words from source texts on lexical diversity in an academic reading-writing
assessment task.

As part of the TOEFL iBT validation, Cumming et al. (2005) studied the differences between independent and integrated
prototype tasks across a number of discourse features. The researchers used three different writing tasks: independent,
integrated reading-writing, and integrated listening-writing. Analysis of lexical features focused on two measures: average
word length and type/token ratio. While the first index is typically used to compute lexical sophistication, the second one
measures lexical diversity. Their results showed the two integrated task types yielding higher average word length than the
independent task. A similar pattern was observed in the type-token ratio results with the integrated tasks yielding higher
lexical diversity values. The analysis also revealed statistically significant differences in the two lexical features across
different score levels. Although this study was pioneering, a number of issues were not addressed. First, type-token ratio was
used for computing lexical diversity, a measure heavily affected by the length of a written text (Malvern et al., 2004). Further, it
did not control for the content words that test takers borrowed from the source texts, a limitation acknowledged by the
authors.
Overall, the results of previous research on lexical diversity in writing, and in integrated writing in particular, are not conclusive and perhaps even contradictory regarding its relationship to proficiency and the impact of textual borrowing on source-based performances. Some of the inconsistency may result from the different techniques used for measuring lexical diversity, creating difficulty in comparing results across studies. In addition, many studies did not control for text length, which can lead to imprecision since, as research suggests, lexical diversity is a function of text length (Malvern et al., 2004). Moreover, textual borrowing was not controlled for in these studies, although researchers have mentioned the potential impact of source texts on lexical diversity when explaining their results (Gebril & Plakans, 2009; Baba, 2009; Cumming et al., 2005; Yu, 2010). To provide empirical grounding for this assumption and to inform the use and scoring of source-based writing, the current study investigates lexical diversity in integrated reading-writing assessment tasks and how it is affected by borrowing from source texts. The following questions are addressed in this research:

(1) Is lexical diversity affected by borrowing vocabulary from sources in reading-based integrated writing tasks?
(2) Do essays at different score levels differ in lexical diversity? If so, does this difference persist when source-related vocabulary is removed from the essays?

3. Methods and materials

3.1. Participants

This study includes writing from 130 undergraduate students at an English-medium Middle Eastern university. The writers
were female students majoring in a number of humanities and social science fields, including general linguistics, applied
linguistics, translation, and social work. Arabic is the native language of all the participants. Since classes are taught in English,
students in this university typically spend their first two years in a foundational language program. The dataset on which this
analysis is based is part of a larger published research project (Gebril & Plakans, 2009, 2013; Plakans & Gebril, 2012).

3.2. Writing task

The reading-based integrated task (Gebril & Plakans, 2009, 2013; Plakans & Gebril, 2012) used in this study requires students to write an argumentative essay on the topic of global warming. Two reading passages presenting opposing views on this issue were attached to the prompt (a link to an online copy of this task will be included after the review is completed, for blinding reasons). In this task, students were asked to read the two passages and then synthesize information from these sources in their writing. Source information could be used to support their argument through different discourse synthesis strategies, such as quoting, paraphrasing, and summarizing. Global warming was selected because it allows discussion of two opposing views and, as a relatively unfamiliar topic for participants, encourages use of the sources. The reading passages were chosen from authentic sources but modified to ensure similar length and difficulty level. Readability statistics were calculated for both difficulty and length, and the two texts yielded relatively similar values. Following Carrell's (1987) suggestion to consider topic familiarity and cultural appropriateness, the task was also given to five faculty members at the same university where data were collected to evaluate the texts for these issues. Before operational data collection, a group of 46 students from the same university piloted the task; the piloting data were not included in the final analysis. During operational data collection, students were given one hour to complete the integrated task.

3.3. Scoring

The writing samples were scored using the TOEFL iBT integrated writing scoring rubric (http://www.ets.org/toefl/ibt/scores/) with some modifications, since the current study used a reading-based integrated task while the TOEFL task includes both reading and listening. This adaptation was originally made by Gebril (2009, 2010) by removing all references to listening sources in the rubric. Two raters were recruited for this study and completed a training session, which started with a brief introduction to the study purpose and familiarization with the task and the scoring rubric. The raters also read sample essays representing different levels on the scoring rubric. After that, the raters scored a number of writing
samples, followed by a discussion of each scoring decision. Once adequate agreement on scoring decisions was reached, the raters began operational scoring. Each writing sample was scored independently by both raters, and an inter-rater reliability value of 0.75 (Cohen's kappa) was obtained. In cases of disagreement, a third rater served as a tie-breaker. After scoring was completed, the researchers collapsed the rubric scores into three score levels for data analysis: scores of 1 and 2 became Level 1, a score of 3 became Level 2, and scores of 4 and 5 became Level 3. This decision was made because of the relatively small number of observations at some levels. In addition, research has shown that adjacent proficiency levels tend to yield non-significant differences (Ortega, 2003; Wolfe-Quintero et al., 1998). Table 1 shows the distribution of scores obtained from the two raters across the three proficiency levels.

3.4. Measuring lexical diversity

A wide range of lexical diversity measures have been used in previous research (Baba, 2009; Cumming et al., 2005; Duran,
Malvern, Richards, & Chipere, 2004; Malvern et al., 2004; Yu, 2010). One of the most common measures employed in these
studies is the type-token ratio (TTR), which looks into the number of different words in a text in relation to the total number of
words in the same text. As explained earlier, this measure is inherently problematic as it is affected by text length (Duran et al.,
2004; Malvern et al., 2004); since the denominator in the TTR formula is the text tokens (i.e. the total number of words in a
text), this leads to a smaller ratio when text length increases. To overcome this problem, researchers have employed a number
of techniques, such as using a standardized sample (Stickler, 1987) or algebraic transformations (Carroll, 1964; Yoder, Davies,
& Bishop, 1994). Unfortunately, these suggested solutions have not overcome this problem, as serious issues with the reliability and validity of the data collected were found in these studies (Malvern et al., 2004). Therefore, researchers have
continued to seek a lexical diversity measure that can do the following (Duran et al., 2004):

▪ take into account the range, amount of repetition, and function of text vocabulary.
▪ use all words in a text.
▪ employ the same sampling methods when analyzing different texts.
▪ analyze short texts.

The D index (Malvern et al., 2004) has become a preferred measure since it takes into consideration the aforementioned
criteria. The D index is based on a probability model that takes random samples of tokens (words) drawn from the same text.
The mathematical modeling procedure used depends on random sampling without replacement, meaning that words already
sampled from a specific text cannot be selected again (Malvern et al., 2004). Consequently, text length is not a main concern
when calculating the D index, an assumption that has been tested empirically (Silverman & Ratner, 2002). However, this does
not mean that text length has no impact on the D value, and it is recommended that texts be more than 50 words long. The
following quotation may give a better description of this issue (Malvern et al., 2004, p. 64):
We have claimed that by using a mathematical modeling procedure on a standard-sized window of 35–50 tokens of the
TTR versus token curve, lexical diversity values will no longer be a function of text length. The use of the word ‘function of’ is
crucial here, as we do not mean to imply that there is no relationship between lexical diversity values and total number of
words.
McCarthy and Jarvis (2007) discussed similar issues in their paper, recommending that researchers be careful when
interpreting lexical diversity values from short texts.
Recently a number of computer programs were developed to calculate the D index. We chose the CLAN suite, which is a
computer program developed by McKee and offered for free to researchers by the Child Language Data Exchange System
(CHILDES) (MacWhinney, 2011). One of the programs in the CLAN suite is called vocd, which is used to calculate the D index.
The following quotation provides a description of the procedures used by vocd (McCarthy & Jarvis, 2007):
First, vocd estimates a text's level of LD (lexical diversity) by taking 100 random samples of 35 tokens drawn from the text
and calculating a mean TTR for these samples. This procedure is repeated for samples of 36 tokens, 37 tokens, and so on,
all the way to samples of 50 tokens. The program then plots the mean TTR values for each sample size in order to create
a random-sampling TTR curve for the text.
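As an illustration of the sampling-and-curve-fitting idea behind vocd, the following Python sketch is our own simplified reading of the procedure, not the CLAN implementation: it computes mean TTRs for random samples of 35-50 tokens and grid-searches the D value whose model curve, TTR(N) = (D/N)(sqrt(1 + 2N/D) - 1) (Malvern et al., 2004), best fits them. The essay file name is hypothetical.

# Simplified vocd-style estimate of D (a sketch under our assumptions,
# not the CLAN program): sample without replacement, average the TTRs,
# then fit the Malvern et al. (2004) curve by grid search.
import math
import random

def mean_ttr(tokens, n, trials=100):
    """Average type-token ratio over `trials` random samples of n tokens."""
    ratios = []
    for _ in range(trials):
        sample = random.sample(tokens, n)          # sampling without replacement
        ratios.append(len(set(sample)) / n)
    return sum(ratios) / trials

def estimate_d(tokens, sizes=range(35, 51)):
    """Return the candidate D whose model curve best fits the empirical TTRs."""
    empirical = {n: mean_ttr(tokens, n) for n in sizes}
    best_d, best_err = None, float("inf")
    for d in (x / 10 for x in range(10, 2001)):    # candidate D values 1.0-200.0
        err = sum((empirical[n] - (d / n) * (math.sqrt(1 + 2 * n / d) - 1)) ** 2
                  for n in sizes)
        if err < best_err:
            best_d, best_err = d, err
    return best_d

essay_tokens = open("essay_001.txt").read().lower().split()   # hypothetical file
if len(essay_tokens) >= 50:                                    # D is unreliable below ~50 tokens
    print("Estimated D:", estimate_d(essay_tokens))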
We followed the guidelines provided by the CLAN manual when preparing the data for vocd analysis. First, the written
texts were checked for any spelling and typographical errors. Next, the text files were converted into a CHAT format, which is
the CLAN default, and the written texts were entered into the CLAN program to calculate the D index. A low D value shows that the text repeats many of the same words, and consequently it is considered not lexically rich. On the other hand, a high D value indicates that the essay uses a wide range of vocabulary, which makes the text lexically diverse.

Table 1
Participant proficiency profile.

Score level    Original scores    # participants
Level 1        1–2                57
Level 2        3                  49
Level 3        4–5                24
As discussed earlier, there is an assumption that source texts may impact the lexical diversity values in integrated tasks
since writers are expected to use vocabulary items from the source materials in their writing. Indeed, from our preliminary
analysis, it was clear that some writers at certain score levels used the source texts more than others. Therefore, to explore the effect of source vocabulary on lexical diversity, an additional analysis was developed. This analysis was relatively challenging since the literature has not provided any suggestions for addressing this issue, although researchers (Gebril & Plakans, 2009, 2013; Baba, 2009; Cumming et al., 2005; Yu, 2010) have referred to this potential problem.
After some lengthy discussions, we devised an analysis plan that removed specific source-related vocabulary from the test
takers’ responses and then recalculated D for all essays.
In order to identify the source-related vocabulary to delete from the essays, we decided on a process in which words from the reading passages were identified through human judgment and then triangulated with participants' written performances. As reading relies on reader interpretation, we wished to follow a process not dependent on computational analysis.
Two raters who were experienced teachers of second language writing were involved in the process of determining source-
related vocabulary (they were different from the raters who scored the essays). We hoped that raters’ expertise in source-
based writing would help them identify borrowed words and phrases that appeared in the essays. First, raters examined
the reading texts, and then highlighted the key source words that might potentially be used by writers. Function words and
highly common words were not added to this list; only words that were considered “unique” to the source materials were
included. Since the process involved subjective judgments, the two raters selected these words independently and were
asked to err on the side of caution. Next, a follow-up discussion session on the identification decisions was held to determine
an agreed-upon list of source-related vocabulary. The next step was to finalize and triangulate the list based on actual performances from the task. Using the list of potential borrowed words generated from the reading passages, the two raters highlighted these words in a subset of 20 essays with two essays at each score level. During this process, the raters independently considered whether the source-related words on the list were appropriate for deletion from the essays, and also whether
there were any other words that should be added. Following this marking of 20 essays, a meeting was held with the two raters
to decide on the final list of source-related vocabulary to delete before recalculating D.
To further narrow the list of words, the source-related vocabulary in the essays was categorized into four groups based on
the raters' feedback: (1) general words related to the topic (e.g., "scientists", "air", "plants"), (2) technical terms or proper nouns (e.g., "oceanographic institute", "Greenland's ice cap", "water cycle effects"), (3) high-level vocabulary (e.g., "evidence", "undeniable", "policymakers"), and (4) borrowed phrases (e.g., "well under way", "reliable global satellite data").1
The first group of words was considered likely to be in the writers' active vocabulary and, thus, was not included in the
deletion list. The words from the second group were deleted as we felt that they were less likely to be in the writers' active
lexicon and more likely to be borrowed from the source texts.2 Similarly, the words from the other two groups were considered 'borrowed' from the source texts since they were either less likely to be part of the writers' active vocabulary or,
in the case of phrases, were strings of words that might be considered problematic borrowing (plagiarism) in academic
writing (Gebril & Plakans, 2013; Cumming et al., 2005). The deletion of these words was only to recalculate lexical diversity
with borrowed words removed; the essays, with the words deleted, were no longer readable. We did not assume that writers
could have written the essays without these words, but only focused on uncovering how much the borrowing was impacting
lexical diversity in the essays (See Appendix A for the final list).
After this identification process, words from the list in Appendix A were deleted from all the essays and the D values were
recalculated. The final data set included two variables for lexical diversity: a variable where lexical diversity was calculated
while keeping the source vocabulary (LD) and another variable where lexical diversity was calculated after content-specific
vocabulary (words from Appendix A) was deleted (LDD).
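The following Python sketch illustrates our reading of this deletion step; the word list is a small excerpt in the spirit of Appendix A rather than the full list, the file name is hypothetical, and the reduced text would then be re-entered into CLAN's vocd to obtain LDD.

# Sketch of preparing the LDD version of an essay (our assumptions, not the
# authors' scripts): delete the agreed list of source-related words and
# phrases, longest first so multi-word strings are removed before their parts.
import re

borrowed = ["reliable global satellite data", "well under way",
            "oceanographic institute", "policymakers", "undeniable"]   # excerpt only

def strip_borrowed(text: str, phrases) -> str:
    """Remove each listed phrase (case-insensitive) and collapse the extra spaces."""
    for phrase in sorted(phrases, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

essay = open("essay_001.txt").read()                     # hypothetical essay file
reduced = strip_borrowed(essay, borrowed)
print(len(essay.split()), "->", len(reduced.split()))    # tokens before and after deletion
# `essay` feeds the LD calculation; `reduced` feeds the LDD calculation in vocd.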

3.5. Data analysis

After completing the lexical diversity analysis using the CLAN software, descriptive statistics were run to obtain the means
and standard deviations for the different study variables (See Table 2). In order to answer Question 1, investigating the
differences in lexical diversity before and after deleting the source-related vocabulary, a paired samples t-test was used to
compare these two variables (LD and LDD) for all the test takers. Question 2 focused on one independent variable (integrated
writing score) that has three levels and two dependent variables (LD and LDD). Two ANOVAs were conducted, one for each
dependent variable. Pair-wise comparisons were conducted to consider significant differences in lexical diversity among
these different score levels.
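A brief SciPy sketch of these two analyses is shown below; it is our illustration rather than the study's actual code, and the arrays are made-up placeholders, not the study data.

# Hedged sketch of the two analyses (placeholder numbers, not the dataset).
import numpy as np
from scipy import stats

ld = np.array([62.1, 55.3, 71.8, 48.9, 66.5, 70.2])    # D with source vocabulary kept
ldd = np.array([51.0, 47.2, 60.4, 40.1, 58.8, 63.9])   # D with source vocabulary deleted
score_level = np.array([1, 1, 2, 2, 3, 3])             # collapsed score level per essay

# Question 1: paired-samples t-test comparing LD and LDD for the same writers.
t_stat, p_paired = stats.ttest_rel(ld, ldd)

# Question 2: one-way ANOVA of LD across the three score levels.
groups = [ld[score_level == level] for level in (1, 2, 3)]
f_stat, p_anova = stats.f_oneway(*groups)

print(f"t = {t_stat:.2f}, p = {p_paired:.3f}; F = {f_stat:.2f}, p = {p_anova:.3f}")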
We had some concerns about the unbalanced number of students in the three groups, and for this reason we conducted a follow-up ANOVA with a smaller sample by randomly selecting 30 participants from Level 1, 30 from Level 2, and all 24 students from Level 3. This analysis showed results similar to those obtained from the complete data set, supporting the use of the original data set. This decision was also reinforced by the tests of ANOVA assumptions described in the next section.
1. The types of words in the latter group are not mutually exclusive from the other three; however, this step was helpful in considering the writers' use of the words.
2. We recognize that a pre-task vocabulary test would be the most assured way to control for participants' active vocabulary, but since the essays from the task were used to generate the borrowed word list, pre-testing was not possible.

Table 2
Descriptive statistics for lexical diversity.

Variable    N      Mean     SD
LD          130    60.03    25.14
LDD         130    50.35    24.62
LD100       130    54.82    25.52
LD100D      130    44.60    27.26


4. Results

The following section provides a detailed description of data analysis for the two research questions. The first part focuses
on the t-test results for Question 1 while the second part addresses the ANOVA results for Question 2.

4.1. The effect of borrowing on lexical diversity

The first question looks into whether source-related vocabulary affects lexical diversity scores on the integrated writing
task. For this purpose, a paired-samples t-test was conducted to analyze the difference in lexical diversity before and after the
deletion of source-related words. As shown in Tables 2 and 3, the results indicated that the mean for lexical diversity with source vocabulary (LD: M = 60.03, SD = 25.14) was significantly higher than the mean for lexical diversity with the source vocabulary removed (LDD: M = 50.35, SD = 24.62). This result shows that borrowing words from the source texts affected writers' lexical diversity in the integrated reading-based writing task (see Table 4).
Since deleting source-related vocabulary would affect the total number of words in an essay and thus could itself decrease lexical diversity values, the researchers decided to recalculate lexical diversity for these two variables using only the first 100 words of each text. For this purpose, the first 100 words for LD and LDD were entered into the vocd program and a D value was obtained. This analysis yielded two variables: lexical diversity based on the first 100 words of the essays (LD100) and lexical diversity based on the first 100 words of the essays after deleting the source-related vocabulary (LD100D). Another paired-samples t-test was conducted to consider whether there was a significant difference between LD100 and LD100D. The analysis confirmed the initial results, showing a significant difference between lexical diversity with source vocabulary based on the first 100 words (LD100: M = 54.82, SD = 25.52) and lexical diversity based on the first 100 words after deleting source-related vocabulary (LD100D: M = 44.60, SD = 27.26). These results show that text length did not substantially affect lexical diversity values, while borrowing words from the source texts did.
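A minimal sketch of this length control, under our reading of the procedure, is shown below; the file name is hypothetical and the truncated texts would then be re-entered into vocd.

# Truncate each version of an essay to its first 100 words so that LD100 and
# LD100D are estimated from samples of equal length (our sketch, not the
# authors' scripts).
def first_n_words(text: str, n: int = 100) -> str:
    """Keep only the first n whitespace-delimited words of a text."""
    return " ".join(text.split()[:n])

essay = open("essay_001.txt").read()        # hypothetical essay file
print(len(first_n_words(essay).split()))    # at most 100 tokens, ready for vocd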

4.2. Differences in lexical diversity across score levels

In order to answer Question 2, the data set was analyzed using a one-way ANOVA with score level as the independent variable and lexical diversity (LD) as the dependent variable. Levene's test indicated equal population variances across the three score levels, so the homogeneity of variance assumption was met. The ANOVA yielded significant differences in lexical diversity across the writing score levels, F(2, 127) = 8.45, p < .001, η² = 0.12, a medium effect size (see Table 5). This result indicates that there is a relationship between lexical diversity and integrated writing scores. To investigate pair-wise comparisons across the three score levels, LSD post-hoc analysis was used. As shown in Table 6, Level 1 (M = 51.84, SD = 24.02) differed significantly in lexical diversity from both Level 2 (M = 68.12, SD = 24.15) and Level 3 (M = 69.21, SD = 17.16). However, no significant difference in lexical diversity was found between Level 2 and Level 3.
Table 3
Paired-samples t-test for lexical diversity.

Pair                      M       SD      SE of mean    95% CI of the difference    t       df     Sig.
Pair 1: LD - LDD          9.67    15.46   1.35          [7.02, 12.34]               7.19    129    .000
Pair 2: LD100 - LD100D    10.22   19.94   1.74          [6.79, 13.65]               5.89    129    .000

Table 4
Descriptive statistics of lexical diversity across proficiency levels.

Proficiency level    LD Mean    LD SD    LDD Mean    LDD SD
Level 1              51.84      24.02    41.87       24.48
Level 2              68.12      24.15    57.34       22.46
Level 3              69.21      17.16    63.82       15.67
Total                60.21      24.57    50.50       24.26

Table 5
Results of ANOVA and effect sizes for lexical diversity.

                                                     df     SS           MS         F      p      Partial η²
Lexical diversity with source vocabulary (LD)
  Score level                                        2      9133.67      4566.84    8.45   .000   .12
  Error                                              127    68754.67     541.37
  Total                                              130    577516.64
Lexical diversity with source vocabulary deleted (LDD)
  Score level                                        2      10231.40     5115.70    9.88   .000   .135
  Error                                              127    65728.42     517.55
  Total                                              130    407494.14

Table 6
Post hoc tests of lexical diversity across score levels (mean differences).

           LD                            LDD
           vs. Level 1    vs. Level 2    vs. Level 1    vs. Level 2
Level 2    16.28 a                       15.47 a
Level 3    17.91 a        1.63           21.95 a        6.48

a Statistically significant difference using LSD (p = .05).

To see whether similar findings hold when lexical diversity is calculated with the source vocabulary removed (LDD), a second ANOVA explored the relationship between integrated scores and lexical diversity values. This ANOVA also showed significant differences, with a relatively large effect size, across integrated score levels, F(2, 127) = 9.88, p < .001, η² = 0.135. Table 6 shows the results of the post-hoc comparisons, which were similar to those for lexical diversity with text borrowing. Level 2 and Level 3 were found to be significantly different from Level 1, but not from each other. Fig. 1 shows a line
graph of lexical diversity across score levels for the two lexical diversity variables. While there are no significant differences between Level 2 and Level 3, the results demonstrate that Level 2 lexical diversity values decreased more than Level 3 values when source-related vocabulary was deleted, possibly indicating that Level 2 writers benefited the most from the use of sources in their writing.

Fig. 1. LD and LDD across the three score levels.

5. Discussion

The unique finding of this study is that lexical diversity was affected by vocabulary borrowed from source materials. When
the source-related vocabulary was removed from the written responses, lexical diversity values substantially decreased, even
when essay length was controlled by using just the first 100 words. This result provides empirical evidence that source-related vocabulary improves lexical diversity in integrated academic writing. Test takers utilize the source texts in ways that
increase their word diversity while composing integrated writing tasks, as suggested in the literature. Our results provide empirical evidence to explain the findings of Cumming et al. (2005), who showed integrated writing tasks yielding higher lexical diversity values than independent tasks. Baba (2009) also acknowledged the potential impact of source materials on lexical diversity, which our results confirm with statistical significance. In a study of writers' processes in using the source texts from the integrated task in our analysis (Plakans & Gebril, 2012), test takers reported using source texts to serve a number of
related functions in support of their writing, including gaining ideas and language support; in terms of language support, they
explicitly mentioned finding source vocabulary to use in written responses. Fifty-three percent of the participants in the study
reported using words from the reading sources while writing, especially technical terminology. The results of the present
study confirm, empirically, that these writers reported accurately what they were doing.
The writing scores yielded statistically significant differences in lexical diversity across the various levels of integrated
scores regardless of lexical borrowing. This result is in agreement with numerous other studies that have shown advanced L2
writers employing a wide range of words, leading to higher lexical diversity (Crossley & McNamara, 2009; Engber, 1995;
Jarvis, 2002; Yu, 2010). For example, Crossley and McNamara found less lexical overlap in higher-scored essays when
compared to lower score levels. In source-based writing research (Gebril & Plakans, 2009; Cumming et al., 2005), similar
results have been obtained. Cumming et al. showed that type-token ratio is different across various score levels, especially in
contrast between the lower and higher texts. Our study obtained the same result, as Level 1 was found to be significantly different from both Level 2 and Level 3. The difference in lexical diversity between the mid- and upper-level writing was not significant, suggesting that this feature plays a lesser role in distinguishing academic writing at these levels. Overall, the textual borrowing analysis showed that the distinction among score levels becomes more discernible when the source-related vocabulary is deleted from the texts, as reflected in the descriptive statistics and the larger effect size (see Fig. 1).
This study holds a number of implications for the assessment and instruction of academic English source-based writing. It
is clear from the study results that the use of sources can increase writers' lexical range and consequently enhance the
perceived quality of their writing. As Engber (1995) indicated, rich vocabulary is likely to positively affect readers and raters in
assessment contexts. This conclusion has been confirmed by other researchers who found a strong positive relationship
between writing scores and lexical quality (e.g., Yu, 2010). For this reason, instructors and raters alike should approach source-based texts carefully when grading integrated tasks, recognizing that the apparent level of lexical diversity may be partially due to the source material rather than the writers' lexical competence. In addition, scoring rubrics for integrated
writing tasks should reflect this reality, with the role of source-related vocabulary emphasized. For example, the accurate use
of borrowed words should be reflected in the rubric across the score levels.
The results of this study hold a number of implications for L2 academic writing classes. Given the relationship between
text borrowing and lexical quality, writing classes, particularly in academic contexts, should provide students with the skills needed to use sources adequately in integrated tasks. Teachers could focus on how to use source materials to compensate for an 'inadequate repertoire of vocabulary' (Oxford, 1990). Along with these writerly reading strategies, students will need guidance in the tricky balance between expanding one's vocabulary through source use and not appropriating the ideas of others. Activities related to vocabulary usage, such as word-focused instruction, are also important (Laufer, 2005, 2011; Webb,
2005). These skills are of strategic importance in academic settings where students are required to synthesize information
from sources in most of their university classes.
This study was an exploratory step into systematically investigating the impact of source vocabulary on lexical diversity in
integrated writing performances. The conclusions are based solely on quantitative measures. Therefore, future research
should examine this borrowing through a qualitative lens to better understand its impact. For example, were there
any differences in borrowing behavior across score levels? Perhaps the difference between Level 2 and Level 3 writing
appeared in the choice of words borrowed rather than in the D index. Were there certain words and phrases borrowed more
than others in general? These qualitative or mixed methods studies could add valuable insight for test development and
integrated assessment research.

5.1. Limitations

This study focused on reading-based integrated writing tasks, and thus the results may not generalize to other task types. Our study looked into lexical diversity but did not investigate the accuracy of vocabulary usage in test takers' writing. Mere exposure to vocabulary items in a source text does not guarantee that test takers will use them correctly. In addition, the procedures used to remove source-related vocabulary were relatively subjective and may have been affected by the raters. We followed a systematic process and cross-checked the vocabulary removed from essays. In this process, we used the writing performances to inform the list of words for removal; however, it is possible that some of the words removed were part of participants' active vocabulary. In future research, a vocabulary pre-test of the words from the source texts might help refine this process and control this aspect of the study. Finally, the study worked with students from the Middle East, so the linguistic background of the participants should be taken into consideration when applying the results to other contexts.

6. Conclusion

The current study investigated the effects of borrowing words from sources on lexical diversity. The role of source-related
vocabulary in integrated writing tasks requires ongoing investigation, to provide further evidence of this impact and to illuminate the nuances of textual borrowing. A number of questions remain to be answered, including how writers use vocabulary borrowed from the source texts and whether the lexical quality of their writing is affected by proficiency and the
topics they are writing on. Although previous research on lexical diversity (Baba, 2009; Yu, 2010) looked at some of these
issues, research on source-based writing and lexical borrowing is still in its infancy. The current study suggests a relatively
new approach to lexical diversity investigation in integrated tasks by considering the impact of lexical borrowing from
sources on vocabulary quality. The results offer empirical support for the significant impact of lexical borrowing in integrated
tasks. Consequently, L2 writing instructors should invest in training students on how to approach source texts and how to
appropriately select the vocabulary needed to support their writing. Lastly, the study provides validity evidence for integrated tests that can help language testers make informed decisions about integrated task design and the development of scoring rubrics.

Appendix A. Source text words eliminated for second lexical diversity analysis.

Passage 1: climate change; caused by human activity; melting ice; ice leave no doubt; getting warmer; people are to blame; weather is going to suffer; computer models; ocean temperatures; clearest signal; well under way; Mr. Barnett; climate models; most of the evidence; clear effects; Ruth Curry; oceanographic institute; water cycle affects; causing droughts; Greenland's ice cap; Sharon Smith; melting ice; important base; food supply; disappearing ice; polar bears; seals; losing their homes; caused by global warming; abused the earth; act immediately; save our planet; serious problems; causing drought; damaging our planet; David Smith; Reuters; Tim Barnett.

Passage 2: scientific agreement; current predictions; wrong theories; wrong belief; ground-level temperature; earth has warmed; show no evidence; evidence; must act now; world's governments; policymakers; act immediately; gather more data; industry; reducing its influence; supporters of the theory; human-caused; environmental problems; reject these beliefs; sea levels; around the globe; not equally; contrary to the predictions; current rate; average rate; undeniable; Sterling Burnett; reliable global satellite data.

References

Baba, K. (2009). Aspects of lexical proficiency in writing summaries in a foreign language. Journal of Second Language Writing, 18(3), 191–208.
Brindley, G. (1991). Defining language ability: The criteria for criteria. In S. Anivan (Ed.), Current developments in language testing (pp. 139–164). Singapore: Regional Language Centre.
Carroll, J. B. (1938). Diversity of vocabulary and the harmonic series law of word-frequency distribution. The Psychological Record, 2, 379–386.
Carroll, J. B. (1964). Language and thought. Englewood Cliffs, NJ: Prentice-Hall.
Carson, J. (2001). A task analysis of reading and writing in academic contexts. In D. Belcher, & A. Hirvela (Eds.), Linking literacies: Perspectives on L2 reading-writing connections (pp. 48–83). Ann Arbor, MI: The University of Michigan Press.
Crossley, S. A., & McNamara, D. S. (2009). Computational assessment of lexical differences in L1 and L2 writing. Journal of Second Language Writing, 18(2), 119–135.
Crossley, S. A., & McNamara, D. S. (2012). Predicting second language writing proficiency: The roles of cohesion and linguistic sophistication. Journal of Research in Reading, 35(2), 115–135.
Crystal, D. (1982). Profiling linguistic disability. London: Edward Arnold.
Cumming, A. (2002). Assessing L2 writing: Alternative constructs and ethical dilemmas. Assessing Writing, 8(2), 73–83.
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 5–43.
Duran, P., Malvern, D., Richards, B., & Chipere, N. (2004). Developmental trends in lexical diversity. Applied Linguistics, 25(2), 220.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139–155.
Fradis, A., Mihailescu, L., & Jipescu, I. (1992). The distribution of major grammatical classes in the vocabulary of Romanian aphasic patients. Aphasiology, 6(5), 477–489.
Fulcher, G. (2003). Testing second language speaking. London: Pearson Education.
Grant, L., & Ginther, A. (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9(2), 123–145.
Gebril, A. (2009). Score generalizability of academic writing tasks: Does one test method fit it all? Journal of Language Testing, 26, 507–531. http://dx.doi.org/10.1177/0265532209340188.
Gebril, A. (2010). Bringing reading-to-write and writing-only assessment tasks together: A generalizability analysis. Assessing Writing, 15, 100–117. http://dx.doi.org/10.1016/j.asw.2010.05.002.
Gebril, A., & Plakans, L. (2009). Investigating source use, discourse features, and process in integrated writing tests. In Spaan Fellow Working Papers in Second/Foreign Language Assessment, 7 (pp. 47–84). Ann Arbor: The University of Michigan.
Gebril, A., & Plakans, L. (2013). Towards a transparent construct of reading-to-write assessment tasks: The interface between discourse features and proficiency. Language Assessment Quarterly, 10(1), 9–27. http://dx.doi.org/10.1080/15434303.2011.642040.
Hale, G., Taylor, C., Bridgeman, J., Carson, J., Kroll, B., & Kantor, R. (1996). A study of writing tasks assigned in academic degree programs. Princeton, NJ: Educational Testing Service.
Hirvela, A., & Du, Q. (2013). "Why am I paraphrasing?": Undergraduate ESL writers' engagement with source-based academic writing and reading. Journal of English for Academic Purposes, 12(2), 87–98.
Honoré, A. (1979). Some simple measures of richness of vocabulary. Association for Literary and Linguistic Computing Bulletin, 7, 172–177.
Jarvis, S. (2002). Short texts, best-fitting curves and new measures of lexical diversity. Language Testing, 19(1), 57.
Johnson, W. (1944). Studies in language behavior: I. A program of research. Psychological Monographs, 56, 1–15.
Jordan, R. R. (1997). English for academic purposes: A guide and resource book for teachers. Cambridge: Cambridge University Press.
Knoch, U., & Sitajalabhorn, W. (2013). A closer look at integrated writing tasks: Towards a more focused definition for assessment purposes. Assessing Writing, 18(4), 300–308.
Koda, K. (1993). Task-induced variability in FL composition: Language-specific perspectives. Foreign Language Annals, 26(3), 332–346.
Laufer, B. (1994). The lexical profile of second language writing: Does it change over time? RELC Journal, 25(2), 21–33.
Laufer, B. (2003). The influence of L2 on L1 collocational knowledge and on L1 lexical diversity in free written expression. In Vivian Cook (Ed.), Effects of the second language on the first (pp. 19–31). New York: Multilingual Matters Ltd.
Laufer, B. (2005). Focus on form in second language vocabulary learning. Eurosla Yearbook, 5(1), 223–250.
Laufer, B. (2011). The contribution of dictionary use to the production and retention of collocations in a second language. International Journal of Lexicography, 24(1), 29–49.
Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16(3), 307–322.
Leki, I., & Carson, J. G. (1994). Students' perceptions of EAP writing instruction and writing needs across the disciplines. TESOL Quarterly, 28(1), 81–101.
Linnarud, M. (1986). Lexis in composition: A performance analysis of Swedish learners' written English. Malmö, Sweden: CWK Gleerup.
MacWhinney, B. (2011). CLAN manual. Retrieved from http://childes.psy.cmu.edu/manuals/clan.pdf.
Malvern, D., Richards, B. J., Chipere, N., & Duran, P. (2004). Lexical diversity and language development: Quantification and assessment. New York: Palgrave Macmillan.
McCarthy, P. M., & Jarvis, S. (2007). vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488.
Moore, T., & Morton, J. (1999). Authenticity in the IELTS academic module writing test: A comparative study of task 2 items and university assignments (IELTS Research Reports no. 2). Canberra: IELTS Australia.
Nation, I. S. (2002). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518.
Oxford, R. L. (1990). Language learning strategies: What every teacher should know. New York: Newbury House.
Pennycook, A. (1996). Borrowing others' words: Text, ownership, memory, and plagiarism. TESOL Quarterly, 30(2), 201–230.
Plakans, L., & Gebril, A. (2012). A close investigation of source use in integrated writing tasks. Assessing Writing, 17(1), 18–34. http://dx.doi.org/10.1016/j.asw.2011.09.002.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Schoonen, R., Gelderen, A., Glopper, K., Hulstijn, J., Simis, A., Snellings, P., et al. (2003). First language and second language writing: The role of linguistic knowledge, speed of processing, and metacognitive knowledge. Language Learning, 53(1), 165–202.
Shi, L. (2010). Textual appropriation and citing behaviors of university undergraduates. Applied Linguistics, 31(1), 1–24.
Silverman, S. W., & Ratner, N. B. (2002). Measuring lexical diversity in children who stutter: Application of vocd. Journal of Fluency Disorders, 27, 289–304.
Stickler, K. R. (1987). Guide to analysis of language transcripts. Eau Claire, WI: Thinking Publications.
Webb, S. (2005). Receptive and productive vocabulary learning. Studies in Second Language Acquisition, 27(1), 33–52.
Weigle, S. C. (2004). Integrating reading and writing in a competency test for non-native speakers of English. Assessing Writing, 9, 27–55.
Wolfe-Quintero, K., Inagaki, S., & Kim, H. Y. (1998). Second language development in writing: Measures of fluency, accuracy, & complexity. Honolulu, HI: University of Hawaii Press.
Yoder, P. J., Davies, B., & Bishop, K. (1994). Adult interaction style effects on the language sampling and transcription process with children who have developmental disabilities. American Journal on Mental Retardation, 99(3), 270–282.
Yu, G. (2010). Lexical diversity in writing and speaking task performances. Applied Linguistics, 31(2), 236.
Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge, UK: Cambridge University Press.

Atta Gebril is an associate professor in the Department of Applied Linguistics, the American University in Cairo. He obtained his PhD from the University of
Iowa in foreign language and ESL education with a minor in language testing. He teaches courses in language assessment and research methods in applied
linguistics.

Lia Plakans is an associate professor in Foreign Language/ESL Education at the University of Iowa. She teaches language assessment, language planning &
policy, and L2 learning. Her research investigates L2 reading–writing connections and integrated skills assessment. She has been an English language
educator in Texas, Ohio, Iowa, and Latvia.
