The Linguistics Journal

July 2014
Volume 8 Issue 1

Editors: Paul Robertson and Biljana Čubrović

http://www.linguistics-journal.com
English Language Education Publishing
Brisbane
Australia

This E-book is in copyright. Subject to statutory exception,
no reproduction of any part may take place without
the written permission of English Language Education Publishing.
No unauthorized photocopying
All rights reserved. No part of this book may be reproduced, stored
in a retrieval system or transmitted in any form or by any means,
electronic, mechanical, photocopying or otherwise, without the prior
written permission of English Language Education Publishing.
linguisticsj@yahoo.com
Editors: Dr. Paul Robertson and Dr. Biljana Čubrović
Chief Editor: Dr. Biljana Čubrović
Senior Advisor: Dr. John Adamson
Journal Production Editor: Dr. Erin Carrie
ISSN 1738-1460

Table of Contents:
Foreword by Biljana Čubrović

Research Articles
1. Dina Awad
Diverse Acquisition Patterns

2. Ibrahim M. R. Al-Shaer
The Use of Third-Person Pronouns by Native and Non-Native Speakers of English

3. Napasri Timyam
An Analysis of Learner Use of Argument Structure Constructions: A Case of Thai
Learners Using the Passive and Existential Constructions in English

4. Mohammad Aliakbari, Mahmoud Qaracholloo and Ali Mansouri Nejad
Social Class and Language Structure: A Methodological Inquiry into Bernstein's
Theory of Sociology of Education

Research Notes
5. Ming Wei
Code-Switching in a Virtual English Community in China: An International Perspective

6. Jabulani Sibanda
Interrogating Current Conceptualisations of Word for Word Knowledge Studies:
Challenges and Prospects

7. María José Serrano and Miguel A. Aijón Oliva
On Gendered Styles and their Socio-Cognitive Foundations

Foreword
This year's edition of the journal comprises seven articles: four full research articles and three
research notes. Thanks are extended primarily to the authors who have contributed to this
edition, and the Associate Editors, reviewers, and the production team under Dr. Erin Carrie
for their efforts in preparing the papers for publication. This last year has been unique for the
journal in terms of the significant changes affecting the Editorial Board, a healthy volume of
submissions and a large number of new reviewers together with a brand new production team
who have joined the journal in their new roles. Congratulations must be extended to all the
new editors, who have become part of the team recently and have already proved to be
dependable, constructive and highly professional. Special thanks go to John Adamson, who
has moved on to a sister journal but has helped me take over the position of the Chief Editor
he has successfully held for many years and helped with all my questions and concerns since
January 2014.
The first contribution, entitled "Diverse Acquisition Patterns" by Dina Awad,
elaborates on second language acquisition issues, featuring one of the most problematic areas
of English grammar - the articles as used by native speakers of Arabic. Awad's original study
of the acquisition of the definite and indefinite articles in SLA shows that the developmental
patterns of the two articles are divergent in both accuracy rates and error types, and that they
cannot be easily predicted because their acquisition is influenced by multiple and diverse
factors, such as proficiency level, first language, task-type and the processing demands of
each linguistic feature. The next research article, contributed by Ibrahim M. R. Al-Shaer, is
"The Use of Third-Person Pronouns by Native and Non-Native Speakers of English",
especially in the context of pronoun-antecedent agreement, an area where it proves difficult to
draw the line between standard and non-standard usage. Similar to Dina Awad's study of the
acquisition of articles by Arabic non-native speakers of English, Al-Shaer looks into the
differences in the use of pronouns. The results of the study show that most native speakers
choose third-person pronouns depending on the socio-cultural context and pragmatic factors,
bending the formal rule of pronoun-antecedent agreement, especially when dealing with
gender-unspecified words. However, the majority of non-native speakers show an inclination
to follow prescriptive grammar rules, due to the absence of social and cultural sensitivity
evidenced in English as L2. Napasri Timyam's study, entitled "An Analysis of Learner Use of
Argument Structure Constructions: A Case of Thai Learners Using the Passive and
Existential Constructions in English", focuses on the aforementioned two types of common
constructions in English with the aim of discovering the deviations in terms of their general
characteristics in the written English of non-native speakers with a Thai language background.
The results reveal that Thai learners' constructions differ from the prevalent native speaker
norms in that they are much more limited in terms of structural complexity, semantic and
pragmatic functions. In the last paper in the research article section, "Social Class and
Language Structure: A Methodological Inquiry into Bernstein's Theory of Sociology of
Education", Mohammad Aliakbari, Mahmoud Qaracholloo and Ali Mansouri Nejad explore
the manifestations and credibility of Bernstein's Language Codes Theory in an Iranian context
so as to check whether there are any significant differences between working- and middle-class Iranian native speakers in their use of linguistic patterns. Even though
Bernstein's view of the relationship between language and social class has been largely
disputed, Aliakbari and colleagues provide some evidence supporting the manifestations of
the two dichotomous language codes: restricted code (lower strata of society) and elaborated
code (higher socioeconomic class of language users).
Three additional research notes are presented in the next section of this edition. The
first article, entitled "Code-Switching in a Virtual English Community in China: An
International Perspective" and written by Ming Wei, looks into the concept of code-switching
as used in chat rooms. The study examines how code-switching negotiates social and
interactional meanings in virtual conversations as conducted by Chinese speakers of English,
as well as how it contributes to the creation of an authentic, slightly adapted context of social
interaction between interlocutors. Speakers tend to adjust their choice of code as well as the
degree of code-switching, both of which are firmly entrenched in social distance and face
management in synchronous conversations. The study also considers how manipulation of code
interpretation and selection was achieved in the virtual English community. Jabulani
Sibanda's paper, "Interrogating Current Conceptualisations of Word for Word Knowledge
Studies: Challenges and Prospects", questions the efficacy of the conceptualisation of the
construct 'word' represented by different terms, 'token', 'type', 'lemma', and 'word family',
as units of measurement of the English lexicon as seen in the vocabulary expansion of South
African learners of English. Sibanda points out that an implementation of an extension of
Nation and Bauer's (1983) levels of word family membership, through an association of
inflected and derived forms with base words, seems a desirable proposition in second
language acquisition studies. Last but not least, the concluding research note, entitled "On
Gendered Styles and their Socio-Cognitive Foundations", is written by María José Serrano
and Miguel A. Aijón Oliva. The main purpose of their investigation is to outline a theoretical
and analytical framework that reconciles the quantitative and qualitative perspectives on
language and gender as used by male vs. female speakers of European Spanish. The authors
develop a view of the statistical patterning of linguistic usage which reflects the meaningful
use of linguistic elements in local contexts.

We hope you find the articles in the 2014 edition of the journal interesting. Your own
submissions and feedback are always welcome, and we look forward to receiving them.
Biljana Čubrović, Ph.D.
Chief Editor

Diverse Acquisition Patterns


Dina Awad
Leicester University
dinafirasawad@gmail.com
Bioprofile: Dina Awad holds a Ph.D. degree in Linguistics from Lancaster University (2011).
She received her M.A. in English Language Teaching and Applied Linguistics from King's
College London in 2001. She is currently a lecturer at Leicester University, UK. Research
interests include second language acquisition, cognitive linguistics and teaching methods.

Abstract
Acquiring a second language is a complex and nonlinear process in which learner hypotheses
and production constantly change and evolve towards the target language. In order to find out
more about developmental patterns in SLA, we examined the L2 use of English articles in the
free composition of students in the United Arab Emirates, all of whom are L1 speakers of
Arabic. The participants were grouped into three proficiency levels (PL) according to the
Oxford Placement Test (OPT) to simulate diachronic progression. It was expected that
learners' performance on both articles would improve with higher competence. However, by
comparing accuracy and error rates across the three groups, we found that the articles 'a(n)'
and 'the' not only develop independently of each other but can sometimes progress in
diverse directions. The most influential factors that contributed to determining the final
outcome were the non-existence of a one-to-one form-function relation between the two
English articles, the dissimilarity between L1/L2 representations of definiteness and number,
and learners' competence levels.

Keywords: English, second language acquisition, articles, pattern

Introduction
English articles have always been difficult for second language learners regardless of their
first language, and the difficulty persists into advanced levels. Articles are notorious as one of
the most difficult features of English to be learned or taught (Kaluza 1963, Brown 1973, Dulay
et al. 1982, Pica 1983, Master 1990, inter alia), and their misuse ranks highest among L2
learners' errors (Covitt 1976, cited in Celce-Murcia and Larsen-Freeman 1999, Richards and
Simpson 1974). Sharma (2005) established that article errors account for 60.37% of the total
number of errors committed by L1 Indian learners of English, while Thu (2005) found that
article errors constituted 31.5% of all errors made by L1 Vietnamese learners. Thus, articles represent
an area of considerable prominence in any error analysis since, as traditionally believed,
performance regarding articles reflects overall linguistic competence (Oller and Redding
1971: 85). Later, researchers such as Lightfoot (1998) suggested that learners' performance
on articles does not necessarily reflect their proficiency levels (PLs). Bataineh (2005) found that senior Jordanian
learners overused the indefinite article more frequently than lower ability learners.
Research on SLA of English articles has shown that articles develop at different rates
(Chaudron and Parker 1990, Kellerman 1977, inter alia) due to the differences in the meaning
and function of each article. While the function of the definite article in English is to signal
that a particular entity in a limited context is uniquely identified by the interlocutors in a
particular pragmatic setting (Hawkins 1978, Lyons 1999), its absence is sufficient to indicate
indefiniteness, such as the case with plural and uncountable nouns. This leaves the indefinite
article primarily with a cardinality function assigned only to singular indefinite contexts.
Therefore, the disparity could arise from the fact that there is no one-to-one relationship
between the two.
In addition to the point that the two articles develop independently of each other, what
is proposed in this paper is that this nonlinearity can, under certain conditions, culminate in a
progression in different directions. The purpose of this study is to draw attention to the
complexity of article acquisition in L2 and to alert educators that progression in L2 does not
always correlate positively with performance, as advanced learners can make more errors, in
certain contexts, than weaker ones before finally improving. This pattern has often been
described as U-shaped development (see Kellerman 1977, Master 1997, Haznedar 2001), but
the consistent article errors even for advanced level L2 users undermine this proposition.

Literature Review
The two articles in English have not been reported to be acquired at the same time, nor to follow
the same route of development in SLA. Studies show that each article is produced and
mastered at different stages and incurs different error types at different PLs. Several criteria,
such as difference in function, L1 grammar and task type, determine the L2 development map
of each article separately.
Except for two known studies (Leung 2001, Young 1996), most researchers seem to
agree that mastering the definite article precedes that of the indefinite (Hakuta 1976, Huebner
1983, Master 1987, Thomas 1989, Yamada and Matsuura 1982). The rationale is that
definiteness, as a semantic concept, is at least encoded before indefiniteness (Chaudron and
Parker 1990) which involves grammatical notions of number and countability. This position
is corroborated by findings from many studies (e.g., Hamdallah 1988, Kharma and Hajjaj
1989, Maalej 2004). It is therefore noticeable that better performance on the definite article at
earlier stages is a common occurrence in the SLA process.
From a transfer perspective, the absence of articles in the L1 impedes the L2
acquisition and vice-versa (Ringbom 1987, Goad and White 2004). Despite the fact that
Arabic is considered a language with definiteness grammaticalised (+ART), there is no
explicit marker of indefiniteness. Suffix accents, or nunation (Smith 2001), sometimes mark
indefinite nouns, but their presence is optional and largely limited to classic, formal and
written registers. The indefinite NP in 'This is a big house', for example, can be expressed
formally where explicit markers appear as suffixes (1), or informally (2) without markers.
(1) Haatha    bait-un           kabeer-un
    Dem:prox  house-N-Indef-Sg  big-Adj-Indef-Sg
    'This house big'
(2) Haatha bait kabeer
    'This house big'

Learners could transfer the semantic notions from Arabic, in which the absence of definite
marking in an NP is a sufficient indication of its indefinite status. This principle, however,
is not entirely exclusive to Arabic. Leech contends that "it is convenient, from many points of
view, to regard an initial determiner as obligatory for English noun phrases, so that the
absence of an article is itself a mark of indefiniteness" (1992: 15).
Studies on the production of learners whose L1s lack formal representation of articles
(−ART) suggest that the failure to supply articles persists into advanced stages (Thomas
1989, Master 1997, Trenkic 2002, Ekiert 2004). Zdorenko and Paradis (2008) recorded more
omissions in the L2 production of Korean, Chinese and Japanese (−ART) learners of English
than in the production of Spanish, Romanian and Arab (+ART) learners. The high omission rates
of the indefinite article observed in Arab learners' production are a typical instance of what
Eckman (1977) describes as the most difficult aspect to acquire in the target language, namely
the production of elements which are not present in the L1 but marked in L2. Tsimpli's
(2003) conviction that the absence of features in the L1 causes syntactic representations in L2
production to become defective applies to the difficulties which Arab learners encounter.
SLA researchers, such as Hawkins and Chan (1997) and Prévost and White (2000),
ascribed the difficulty that second language learners (2LL) have in the employment of a
feature that does not exist in their L1 to a failure in mapping functional features present in the
L2 (FFFH) onto their production of the target language. With L1 transfer most operative at
weaker PLs (Odlin 1989, Sharma 2005, Slabakova 2000, Snape 2005), better performance is
expected on the definite than the indefinite article.
Previous research has provided evidence for the tendency of Arab learners to overuse
the definite article across indefinite contexts (cf. Bataineh 2005, Kharma 1981, Maalej 2004).
This error was attributed to two different sources. One group of researchers (e.g., Al-Fotih
2003, Diab 1996, Habash 1982, Kharma and Hajjaj 1989) ascribes the error to the negative
transfer of the definite article norms in Arabic. While the definite marker in Arabic is used to
generalise as well as to identify, rendering all generic references grammatically definite
(Hawas 1989, Kremers 2003), non-referential NPs in English are largely left unmarked¹ as
native speakers' most favoured option (Behrens 2005). The fact that the definite article
tends to be overused in non-specific contexts while the indefinite is expected to be
underrepresented can cause a gap in the development of the two articles.
Other researchers, including those whose data was collected from free production,
such as Abi Samra (2003) and Bataineh (2005), believe that the 'the'-flooding tendency is a
universal interlanguage (IL) phenomenon, a stage that all L2 learners go through regardless of
their L1. In a study on university students in the Arab Emirates, Crompton (2011) contends that the most
common error is the overuse of the definite article in generic contexts. Therefore, overuse of
'the' is expected in indefinite plural/uncountable contexts, especially at lower PLs.
The higher overuse rate of 'the' at lower PLs is not by any means exclusive to L1 Arabic
learners. Similar findings were reported in SLA studies on other L1s, including languages
that possess or lack a formal representation of articles (Huebner 1983, Nagata et al. 2005,
Thomas 1989, Young 1996). Master's (1987) study of Japanese learners, for example, found
that the definite article was flooded into indefinite contexts although Japanese does not
possess an article system.
Test type can also influence article choice, causing inconsistency in production and
accuracy/error rates. Research findings suggest that free writing tasks yield higher accuracy
rates than controlled cloze tests. Dulay et al. (1982) argue that errors in form-focused tests
occur when formally learned rules have not yet become part of the learner's linguistic
competence, i.e. learners need time to practise their explicitly learned L2 rules in order to
produce grammatically appropriate forms in free production. This is largely attributed to
avoidance strategies that are available to learners in production-based tasks (Kharma and
Hajjaj 1989, Mizuno 1985, Tarone and Parrish 1988). Accordingly, learners resort to other
determiners such as quantifiers and demonstratives to reduce the risk of committing errors in
article use. In this case, when given the choice, the definite article presents a safer option
since it collapses elements of countability and number, which endanger the grammatical
accuracy of the NP. Furthermore, 'the' is already available in learners' subconscious and
easily automated in free production, while the indefinite article, learned mostly through
explicit instruction, is more accessible in tasks that draw on metalinguistic information such
as cloze tests. With communicating meaning being the primary goal in a free production task,
learners' attention might not be fully directed towards form, causing the production and
accuracy rates of the indefinite article to be relatively low.
Advocates of teaching articles (e.g., Master 1997) in the EFL/ESL classroom propose
that informing learners of explicit rules can eventually lead to automated use: knowing how a
feature operates precedes, and leads to, the voluntary application of
these rules in communicative settings (DeKeyser 2003, Doughty 2003, Ellis 2001, VanPatten
1994). More time is needed, however, for this declarative knowledge to become internally
proceduralised and voluntarily produced in meaningful output. Therefore, participants could
have achieved different results had the task been form-focused.

Method
Participants
Sixty undergraduate students from different colleges in the UAE University, United Arab
Emirates, volunteered to participate in this study. Each participant was given a reference
number. A background survey was conducted to ensure uniformity of first
language, Arabic, while participants who had studied in English medium schools or lived in
an English speaking country for more than three months were excluded.

Materials and Procedure


The Oxford Placement Test (OPT) was used to determine proficiency levels. Participants
with scores of Elementary (30 out of 60) were placed in the weakest group (G1), while those
whose scores were between 31 and 44 formed the second group (G2), to include Lower
Intermediate and average Intermediate levels by OPT standards. The highest group, (G3),
included students with 45 points and above, i.e. Upper Intermediate and Advanced by OPT
banding. In order to ensure sufficient gaps between the groups, borderline scores were
excluded from the test, leaving 51 students to take the following test.

Table 1 Banding criteria according to OPT results

Levels              OPT Scores  Range  Groups
Beginner            0-17        0-30   G1
Elementary          18-29
Lower Intermediate  30-39       31-45  G2
Upper Intermediate  40-47
Advanced            48-54       46-60  G3
Very Advanced       54-60

The succession of levels is an attempt to follow, synchronically, natural L2 progress,
otherwise operationalised longitudinally, as variation across proficiencies can reflect some
aspects observed in diachronic development (Raymond et al. 2002).
The data was collected from a composition task in which learners were asked to
describe their hometowns in 350-500 word essays. The topic provides students with an
opportunity to express themselves freely and creatively by introducing new information and
referring to it later in the text, which ensures the availability of definite and indefinite
constructions. No indication of the purpose of the test was given in the prompts in order for
the production to better reflect learners' communicative competence as it approximates
real-life interaction (Lightbown and Spada 1999, Power 2003). Free production tests are known to
direct learners' attention mainly towards delivering meaning, providing the researcher with a
sample of L2 data in non-test situations (Skehan 1989). Therefore, the outcomes of this study
might not resemble those obtained from cloze tests.
Observed by teachers, the participants were given one hour to write. Time pressure
adds a processing constraint on participants to prevent conscious contemplation of the forms
produced (cf. Robinson 1996, Sorace 1996).

Data Analysis and Statistical Analysis


NPs were numbered by order of appearance in each essay and described in terms of the
criteria that determine article use i.e. definiteness, countability and number (Celce-Murcia
and Larsen-Freeman 1999, Quirk et al. 1991). NPs were described as possessing (1) or
lacking (0) these criteria in order to facilitate calculations.
(3) *It's quite big town. (14C4)²
(4) *He told us an interesting stories. (1A22)

According to the categories of analysis, the NP in (3) was described as [Def=0] [Count=1]
[Sing=1], while the NP in example (4) was [Def=0] [Count=1] [Sing=0].
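The binary coding described above can be illustrated with a short sketch. The class and field names (`NPCoding`, `definite`, `countable`, `singular`) are hypothetical labels chosen for illustration and are not taken from the study's datasheets; only the two coded examples come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPCoding:
    """Binary coding of one noun phrase (1 = has the property, 0 = lacks it)."""
    ref: str        # essay/NP reference as given in the text, e.g. "14C4"
    definite: int   # [Def]
    countable: int  # [Count]
    singular: int   # [Sing]

    def label(self) -> str:
        # Render the coding in the bracket notation used in the article.
        return (f"[Def={self.definite}] [Count={self.countable}] "
                f"[Sing={self.singular}]")


# The two learner examples coded in the text.
example_3 = NPCoding(ref="14C4", definite=0, countable=1, singular=1)
example_4 = NPCoding(ref="1A22", definite=0, countable=1, singular=0)

print(example_3.label())
print(example_4.label())
```

Storing the three criteria as 0/1 values makes it straightforward to count how many NPs fall into each context type when rates are calculated later.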
Article use was categorised as either correct or incorrect. The approach followed to
determine the correctness of the definite article is derived from Liu and Gleason's (2002)
classification of non-generic contexts of definite article use, which, in turn, is based on the
theory of definiteness advanced by Hawkins (1978). Semantically definite NPs, such as
proper nouns, pronouns and demonstratives as well as quantifiers that exclude articles were
not included in the dataset because [+Def] categorisation would automatically require the
supply of the definite article, leading to confusion in subsequent calculations. However, the
determiner 'some' was accepted as a correct indefinite plural marker. Incorrect use was
subdivided into errors of overuse, omission and replacement. Overuse errors refer to
instances where articles should not have appeared (Pica 1983) while omission errors denote
the failure to supply either article in contexts where they are deemed obligatory. Thus, the
error in example 3 is that of omission while in 4 it is overuse. Replacement errors refer to the
employment of the indefinite article in uniquely identifiable referents (a-for-the), or the
supply of the definite article in indefinite singular contexts (the-for-a). A sample datasheet is
shown in Table 2.
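The error categories described above can be sketched as a small decision procedure. This is a simplified illustration under stated assumptions, not the coding scheme actually applied by the raters: the function names are hypothetical, the supplied article is reduced to `"the"`, `"a"` or `None` (no article), and the acceptance of 'some' as a plural marker is ignored.

```python
from typing import Optional


def expected_article(definite: int, countable: int, singular: int) -> Optional[str]:
    """Return the article the context calls for, or None for a zero-article context."""
    if definite == 1:
        return "the"
    if countable == 1 and singular == 1:
        return "a"
    return None  # indefinite plural or uncountable NP


def classify_use(definite: int, countable: int, singular: int,
                 supplied: Optional[str]) -> str:
    """Classify one instance as correct, omission, overuse or replacement."""
    expected = expected_article(definite, countable, singular)
    if supplied == expected:
        return "correct"
    if supplied is None:
        return "omission"            # an obligatory article is missing
    if expected is None:
        return f"overuse of {supplied}"  # an article appears where none belongs
    return f"replacement ({supplied}-for-{expected})"


# Example (3): indefinite singular countable NP with no article.
print(classify_use(0, 1, 1, None))
# Example (4): 'an' supplied with an indefinite plural NP.
print(classify_use(0, 1, 0, "a"))
```

Under these assumptions, the two calls classify example (3) as an omission and example (4) as overuse, in line with the description in the text.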

Table 2 Sample data sheet
[Table layout not recoverable. Column headings: No.; NP; NP description (Definite, Countable, Singular); 'the' (Correct, Overuse, Omission); 'a(n)' (Correct, Overuse, Omission); Ref.; Student name. Sample entries: place to work; has relatives; I live in small town; the most beautiful town; in the world; from all over the world; It has mountains; beaches where you breathe; fresh air; and farms where you find; different kinds of; fruit and; Vegetable; It has the most important factor which are; tourists; It has many places which attract; a special place to live; Safety; Quietness; and purity; whenever I have a problem; my place in the society; small simple houses.]

Two speakers of English as a first language volunteered to review the datasheets to ensure the
reliability of the coding. Some expressions were marked as (grammatically) correct, although
more target-like constructions would have been preferable.

      Learner data            Native-like choice
(10)  As a conclusion         In conclusion
(11)  The houses of people    People's houses

To calculate accuracy rates, the number of correct supplies was divided by the total sum of
NP environments in which articles should have appeared.

\[
\text{Correct use} = \frac{\text{Number of observed occurrences}}{\text{Total number of obligatory contexts}} \times 100\%
\]

Outcomes were measured in percentages to allow comparisons across groups with varying
numbers of participants and unequal obligatory contexts. In principle, the formula used in
calculating errors was similar to the one used for accuracy, i.e., the observed instances were
compared against the total number of contexts where such occurrences were expected to
appear. For example, to calculate the percentage of overuse of a(n), the following equation
was used:

\[
\text{Overuse of a(n)} = \frac{\text{Number of incorrect instances of a(n)}}{\text{Total number of } [-\text{Def}]\ [-\text{Count}] \text{ NPs}} \times 100\%
\]

A similar method was followed to examine the occurrence of the indefinite article in plural
contexts, simply by changing the [−Count] contexts into [−Sing] ones. The overuse rates of
the definite article were calculated by dividing the total number of overuse instances in
learner data by the total number of indefinite NPs in a given group.
The omission rates of a(n) were obtained by comparing the total number of
obligatory contexts, i.e. [−Def] [+Count] [+Sing] NPs, against observed instances. The same
method was used for definite article omissions.
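As a worked illustration of these calculations, the sketch below applies the same ratio to counts reported for G1 in the Results section. The helper name `rate` is hypothetical; the counts are the ones given in the article.

```python
def rate(observed: int, contexts: int) -> float:
    """Percentage of observed instances out of the relevant contexts."""
    if contexts <= 0:
        raise ValueError("contexts must be positive")
    return 100.0 * observed / contexts


# Counts reported for G1 in the Results section.
g1_accuracy_the = rate(150, 199)  # correct 'the' / obligatory definite contexts
g1_accuracy_a = rate(64, 106)     # correct a(n) / indefinite singular contexts
g1_omission_a = rate(42, 106)     # omitted a(n) / indefinite singular contexts
g1_overuse_the = rate(52, 385)    # overused 'the' / indefinite NP contexts

for name, value in [("accuracy the", g1_accuracy_the),
                    ("accuracy a(n)", g1_accuracy_a),
                    ("omission a(n)", g1_omission_a),
                    ("overuse the", g1_overuse_the)]:
    print(f"{name}: {value:.2f}%")
```

Because every measure is a ratio over its own set of contexts, groups with different numbers of participants and NPs can be compared directly.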


Replacement errors had to be calculated in a manner that would make the two articles
more comparable since it is grammatically acceptable for the definite article to replace the
indefinite while the reverse is not always possible. Therefore, only [+Sing] [+Count] nouns
were selected as constants for both articles leaving definiteness as the only dependent variable
that determines the appropriate choice of either article.

\[
\text{a-for-the} = \frac{\text{Total number of overuse instances of a(n)}}{\text{Total number of } [+\text{Def}] \text{ NPs}} \times 100\%
\]

A similar calculation was used to examine the error of replacing the indefinite article with the
definite.
Finally, to ensure that there is consistency within the responses of each group, the
following analysis was performed.

Table 3 Consistency within groups

Group  N   Range  1st quartile  Median  3rd quartile
G1     19  18-30  27            30      32
G2     20  31-45  35            37      40
G3     17  46-53  45            47      49

There was also sufficient cross-group difference to justify the categorisation. In order to
measure cross-group variance, we used a non-paired, two-tailed t-test assuming equal
variance with 95% confidence, comparing two groups at a time. Cross group differences were
statistically significant as is shown in Table 4.
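The statistic behind an unpaired, equal-variance t-test can be sketched as follows. The scores are hypothetical placeholders, not the study's data, and the function name `pooled_t` is an assumption made for illustration. The sketch stops at the t statistic and degrees of freedom; obtaining the two-tailed p-value requires the t distribution, which a statistics package would normally supply (for instance, SciPy's `ttest_ind` with `equal_var=True` runs this kind of test).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for an unpaired, equal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical placement scores, NOT taken from the study.
group_low = [22, 25, 27, 28, 30, 30]
group_mid = [31, 33, 35, 37, 40, 44]

t_stat, df = pooled_t(group_low, group_mid)
print(f"t = {t_stat:.2f}, df = {df}")
```

Comparing two groups at a time, as described above, means running this calculation once for each pair of groups.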

Table 4 Cross group variation

Group  N            CI at 95%
G1     19  124.07   7.89
G2     20  159.1    8.26
G3     17  177.57   8.57

Comparison  p
G1 v G2     <0.0001
G1 v G3     <0.0001
G2 v G3     0.0057

Results
Accuracy
G1 Learners employed the definite article correctly 150 times in 199 obligatory contexts,
while a(n) was correctly supplied 64 times in 106 indefinite singular contexts. The
significant difference (p=0.0079) strongly suggests that Arab learners initially perform better
on the definite than the indefinite article.

G2 achieved higher accuracy rates on both articles. The gap between the accuracy rates of the
two articles was smaller. However, the error pattern remains in line with that detected in G1's
production as the accuracy rates of the definite article (84%) remained significantly higher
than those of the indefinite (p=0.0395).

G3 The highest accuracy rates were, as expected, achieved by more advanced learners.
Unlike the results from the two lower groups, there was little difference in the accuracy rates
of the definite and indefinite articles. However, G3 performed better on the indefinite (89%)
than the definite (86%) article.
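The article does not state which test produced the p-values for the comparison between the two articles. As one possible illustration only, the sketch below applies a pooled two-proportion z-test to the G1 counts reported above (150/199 for 'the', 64/106 for a(n)). The choice of test and the function name `two_proportion_z` are assumptions, so the resulting p-value need not coincide with the one reported.

```python
from math import erfc, sqrt


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (z, two-tailed p) for a pooled two-proportion z-test."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-tailed p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# G1 counts from the text: correct 'the' vs correct a(n).
z, p = two_proportion_z(150, 199, 64, 106)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z here indicates a higher accuracy rate on the definite article, which is the direction of the difference described for G1.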


Figure 1 Accuracy rates across groups

The results show sustained improvement in learners' performance on both articles, yet the
progress on the indefinite article was more noticeable and consistent, correlating positively
with PLs, with significant differences between all groups. On the other hand, the difference between
G2's and G3's accuracy rates on the definite article was not significant (p=0.3742), as is shown
in Table 5.
Table 5 Accuracy rates of both articles compared across groups

Group  N   Correct 'the'  %   Correct a/an  %
G1     19  150/199        75  64/106        60
G2     17  245/293        84  86/116        73
G3     12  191/221        86  57/64         89

Comparison  p ('the')  p (a/an)
G1 v G2     0.0276     0.0277
G2 v G3     0.3742     0.0080
G1 v G3     0.0039     <0.0001

Diverse acquisition patterns can be detected as the highest scores shift from being achieved
on one article ('the') to the other ('a(n)') with PL progression. The diagram in Figure 2 further
illustrates this trend.

Figure 2 Accuracy trend-lines for both articles

Omission
G1 This group omitted the definite article in 48 obligatory instances, which is 24% of all
definite contexts. The failure to supply a(n) with indefinite singular countable nouns was
the most noticeable difficulty in the lower group's performance: the omission of the
indefinite article was the most frequent of all grammatical errors recorded in G1's production
(44%). G1 omitted the indefinite article 42 times in 106 contexts; in other words, weaker
learners failed to supply a(n) 40% of the time with singular indefinite NPs.

G2 Although there were fewer omissions by this group than by the weaker group (p=0.0476),
G2's performance was similar to that of G1 as intermediate PL participants omitted the
indefinite article more frequently than the definite article. The omission of a(n) constituted
34% of all grammatical errors made by G2. They omitted the indefinite article 30 times in
116 indefinite singular NP contexts (26%), a significantly higher rate than that of the definite
article (17%).


G3 Omission rates of the indefinite article seem to have decreased regularly and significantly
as PLs improve. However, it was interesting to find that G3 participants made more
omissions of the definite than the indefinite article. The omission rate of the definite article
was 12% while the rate of a(n) omission was only 11%.

Diverse, if not inverse, patterns are clearly evident in Figure 3.

Figure 3 Omission patterns across groups with linear trend.
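
The linear trend mentioned in the caption can be approximated from the omission percentages quoted in the text. This is a minimal sketch, assuming the three groups are coded as equally spaced levels 1, 2 and 3 (an assumption made here for illustration; the paper does not describe how the trend lines were fitted). The names are hypothetical.

```python
# Omission rates (%) per group as quoted in the text above.
GROUPS = [1, 2, 3]                 # G1, G2, G3 coded as equally spaced levels (assumption)
OMISSION_THE = [24.0, 17.0, 12.0]  # definite article
OMISSION_A = [40.0, 26.0, 11.0]    # indefinite article


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


SLOPE_THE = slope(GROUPS, OMISSION_THE)
SLOPE_A = slope(GROUPS, OMISSION_A)

# Gap between indefinite and definite omission rates in each group.
GAPS = [a - t for a, t in zip(OMISSION_A, OMISSION_THE)]

if __name__ == "__main__":
    print("slope (the):", SLOPE_THE)
    print("slope (a/an):", SLOPE_A)
    print("gaps (a/an minus the):", GAPS)
```

Both slopes are negative, but the indefinite line falls more steeply, so the gap between the two rates narrows and changes sign at G3.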

Overuse
G1 The results obtained from the weaker group's production reveal that indefinite nouns were
unconventionally preceded by the definite form 52 times in 385 possible contexts (14%). All
of these instances were non-referential plural/uncountable contexts. Compared to the overuse
of the indefinite article, which was lower than 2%, the overuse of the definite article was
significantly higher (p<0.0001). Overuse of the definite article was considerably more
frequent than the combined ungrammatical supply of a(n) in plural/uncountable
constructions and in contexts where the definite article should have appeared.

G2 The recorded overuse rate of the definite article dropped to 10% (47 out of 461
indefinite contexts) in the production of the intermediate group, with most instances
observed in generic, non-referential contexts, as is the case in the learners' L1. The
ungrammatical supply of a(n) with plural and uncountable nouns did not exceed 2.3%, which
means that the disparity between the overuse rates of the two articles was smaller than in
the weaker group's performance.
G3 The most noticeable improvement in learners' production was the significant and
systematic drop in the overuse rate of the definite article with improved L2 competence. The
definite article was overused in 14 of 236 indefinite NP environments, which reduces the rate
to only 6%. However, the advanced group's overuse rates of the indefinite article were
slightly higher than those of the two weaker groups, as shown in Table 6.
Table 6 Overuse rates

Group   N    the      %       a(n)    p (the : a)
G1      19   52/385   13.51   6/279   <0.0001
G2      17   47/461   10.2    8/345   <0.0001
G3      12   14/236   5.93    4/172   0.0603

From the table above, it is noticeable that while the overuse of the definite article falls
sharply, the indefinite article comes to be oversupplied (flooded). Figure 4 illustrates the
contrast in error trends.

Figure 4 Overuse rates of articles across groups
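
The paper does not say which test produced the p-values in Table 6. As a rough check, the comparison of the two overuse rates within each group can be sketched with a pooled two-proportion z-test. That choice of test is an assumption made here purely for illustration, the function and variable names are hypothetical, and the resulting values are not expected to match the table exactly.

```python
import math

# Overuse counts (errors, possible contexts) as reported in Table 6.
OVERUSE = {
    "G1": {"the": (52, 385), "a(n)": (6, 279)},
    "G2": {"the": (47, 461), "a(n)": (8, 345)},
    "G3": {"the": (14, 236), "a(n)": (4, 172)},
}


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test.

    Assumption: the source does not name its test, so this is an
    illustrative stand-in rather than the author's method.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


P_VALUES = {
    group: two_proportion_p(*arts["the"], *arts["a(n)"])
    for group, arts in OVERUSE.items()
}

if __name__ == "__main__":
    for group, p in P_VALUES.items():
        print(f"{group}: p = {p:.4g}")
```

Under this assumed test the difference between the two overuse rates is well below 0.0001 for G1 and G2 and above 0.05 for G3, which agrees with the pattern in Table 6 even though the exact G3 value differs.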



Replacement
The phenomenon of diverse acquisition patterns is most evident in replacement errors.
Replacement errors constituted 59% of all errors committed in the test, a considerable rate
compared to the total sum of all other errors (41%).

G1 In analysing data entries, it was evident that the definite article was the preferred
option, especially for weaker learners, as it replaced the indefinite article in many
[+Count] [+Sing] contexts. This group used the definite article to replace the indefinite
four times as often as they did the opposite. The indefinite article replaced the definite
in only one instance out of 111 possible replacement contexts.

G2 The intermediate group made fewer replacement errors. The improvement is also seen in the
narrower gap between the two replacement rates: G2 participants used the to replace a(n)
twice as often as they used a(n) to replace the, compared with the four-fold ratio observed
in the production of G1. Despite this improvement, intermediate learners still preferred to
substitute the indefinite article with the definite rather than the reverse, while the rate
of supplying a(n) instead of the increased from 0.9% to 1.2%.
G3 At a later learning stage, the higher group's replacement rates became very close, i.e. the
rate of the-for-a replacement was almost equal to that of a-for-the replacement, with the
indefinite article slightly preferred. A summary of the above results is presented in
Table 7.

Table 7 Replacement errors

Groups   N    the for a(n)   %     a(n) for the   %
G1       19   5/106          4.7   1/111          0.9
G2       17   4/116          3.4   2/161          1.2
G3       12   2/64           3.1   4/116          3.4

The inclination to substitute a(n) with the was reduced with improving PLs while the
production of the indefinite article in definite contexts increased steadily.
The error map of replacement in the learners' data is most reflective of diverse
acquisition patterns. This is perhaps clearer in the presentation in Figure 5.

Figure 5 Replacement errors
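
The change of preference in Table 7 can be expressed as a single ratio per group: the rate of the-for-a(n) replacement divided by the rate of a(n)-for-the replacement. The sketch below is illustrative only, with hypothetical names. It works from the raw counts rather than the rounded percentages, so the ratios may differ somewhat from the approximate multiples quoted in the text.

```python
# Replacement counts (errors, possible contexts) as reported in Table 7.
REPLACEMENT = {
    "G1": {"the_for_a": (5, 106), "a_for_the": (1, 111)},
    "G2": {"the_for_a": (4, 116), "a_for_the": (2, 161)},
    "G3": {"the_for_a": (2, 64), "a_for_the": (4, 116)},
}


def rate(errors, contexts):
    """Proportion of possible contexts in which the replacement occurred."""
    return errors / contexts


def preference_ratio(cells):
    """Ratio of the-for-a(n) rate to a(n)-for-the rate.

    A value above 1 means 'the' is the preferred substitute;
    a value below 1 means 'a(n)' is.
    """
    return rate(*cells["the_for_a"]) / rate(*cells["a_for_the"])


RATIOS = {group: preference_ratio(cells) for group, cells in REPLACEMENT.items()}

if __name__ == "__main__":
    for group, ratio in RATIOS.items():
        print(f"{group}: {ratio:.2f}")
```

The ratio falls from group to group and drops below 1 only for G3, which is the reversal of preference described above.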

Discussion
Accuracy
The accuracy rates of the weaker group were higher than those reported by studies on learners
of -ART L1s (cf. Butler 2002, Ekiert 2004, Master 1997, Trenkić 2002), which confirms
propositions of stronger L1 influence at lower L2 levels. This can be construed as positive
transfer of L1 semantic properties to the L2, as both languages agree on most conditions for
obligatory supply. The lower accuracy scores of the indefinite article, resulting from little
production or erroneous use, also suggest stronger L1 influence at earlier stages. G1 learners
seem not to have internalised the rules governing the use of the indefinite article well
enough to supply it automatically where necessary. It is not surprising that G2 learners
performed better on the definite article despite the improvement in PL, since this type of
test better reflects implicit knowledge, in which the representation of a feature with a
semantic equivalent in the participants' L1 is more accessible than the indefinite article,
which is not readily available in the learners' subconscious knowledge and perhaps requires
direct prompts to activate the newly learned L2 form. G3's higher PL is reflected in the
accuracy rates of the indefinite article, which approach and then exceed those of the
definite. Although the difference between the accuracy rates of the two articles in G3 is
small and statistically insignificant, it nonetheless indicates a change of trend (see
Figure 1). Thus, we can assume that with stronger L2 ability, learners' mastery of the two
articles becomes more compatible.

Omission
With the focus on expressing thoughts and describing locations and attractions, and with no
prompting in the rubric as to the purpose of the test, this type of test was expected to
yield a high number of omissions. This lends support to Granfeldt's (2000) observation that
accuracy will decrease if learners' attentional resources (Bialystok and Ryan 1985) are
channelled towards goals other than accuracy.
G1 participants' omission rates of the indefinite article were significantly higher than
those of the definite. The failure to provide the indefinite article can also be driven by
learners' assumption that its absence does not constitute a hindrance to successful
communication of ideas. It is likely that weaker learners have subconsciously applied the
Economy Principle (Poulisse 1997), whereby maximal comprehensibility is achieved while
exerting minimal processing effort. G2 learners might also have found it redundant to mark
nominals overtly for indefiniteness if their [-DEF] value is readily inferred from the
absence of the definite marker. However, lower omission rates suggest that G2 learners have
become more aware of the conditions of indefinite article employment while beginning to
realise that the definite article is limited to specific environments and lacks the
generalising function it has in Arabic. Since free composition better reflects subconscious
knowledge, lower omissions and higher production of a(n) indicate that G3 learners' command
of the indefinite article has become internalised enough to be produced spontaneously in
communicative output.
Although the disparity in the omission rates of the two articles was not significant in
G3's results, the switch in tendency is quite clear. While the weaker and intermediate
learners omitted the indefinite article more frequently than the definite, the advanced
group were more aware of the necessity to provide a(n) and at the same time reduced the
provision of the definite even in obligatory contexts. This is consistent with the findings
of researchers such as Chaudron and Parker (1990), Cziko (1986), Ekiert (2004) and Habuto
(2000).


Overuse
The overuse errors made by the weaker group were lower than originally expected. A
possible rationale for this is that free production tests are known to yield lower overuse
rates (see Tarone and Parrish 1988) since learners are not directed to provide a particular
form, a direction which is known to encourage overuse in cloze tests (see note 3). While the
weaker group overwhelmingly preferred the definite article, this was less noticeable in G2's
production. The decreased difference between the overuse rates of the two articles marks a
change in learners' underlying hypotheses on article use and indicates fluctuation
characteristic of their IL stage. This type of overuse is typical of what Richards (1971)
refers to as partial understanding of target language features. The significantly lower
overuse rates of the indefinite article compared to those of the definite in G1 and G2
production may not be entirely due to learners' developed awareness of article use. Instead,
they could well be attributed to task type and L1 transfer.
The increase in overuse errors of a(n) by the advanced group could be interpreted as
a form of regression, but it could also be a result of hyper-correction as learners try to
avoid omission errors committed during past learning experience, over-applying instructions
to produce a(n), which leads to a flooding stage similar to the one observed in definite
article use. Richards (1976) maintains that failure to observe restrictions of countability
and number in article use may be due to faulty analogy. In many cases, the analogy is derived
from formulaic expressions learned as chunks in existential and have constructions, memorised
at earlier stages and incorrectly overgeneralised.

Replacement
The reason underlying the preference of G1 to replace the indefinite article with the definite
is mainly developmental, through flooding and avoidance, but also involves L1 influence in the
absence of an explicit marker of indefiniteness in L1. Although both rates of replacement
errors are very low in G2, what emerges at this stage is an obvious change of trend
from that observed in the production of the weaker group. G3's preference for the indefinite
article to replace the definite is probably a result of learners' recently increased awareness
of the importance of supplying the indefinite article. Moreover, this result could have been
equally influenced by the receding influence of L1, represented by the drop in the overuse
rates of the before singular indefinite nouns, since the use of the definite singular to
deliver generic reference is substantially recurrent in Arabic. Although this use is
acceptable in certain expressions in English (e.g., She plays the piano), it is not likely
that learners have been sufficiently exposed to authentic material to the extent that would
enable them to detect similar uses and employ them unprompted. If we suppose that, in marking
indefiniteness, Arabic is a -ART language, then G3's understanding of the indefinite article
corresponds to that of Leung's (2001) Japanese (-ART) learners, who preferred a-for-the more
often than the-for-a.
This suggests that Arab learners experience a problem in mapping a(n) into IL
grammar, which is more in line with the performance of Japanese, Chinese and Korean
learners (-ART) than with that of the Spanish and Romanian groups in Snape et al.'s (2006)
and Zdorenko and Paradis's (2008) studies.

Implications
The results of this study show that second language development is neither homogeneous nor
simultaneous. The advancement in one aspect of L2 knowledge does not imply an identical level
of achievement in another. Rather, there is evidence for a complex, non-linear and sometimes
inverse progression, guided by multiple factors such as proficiency level, first language,
task type and the processing demands of each linguistic feature.
The developmental patterns of the two articles are divergent in both accuracy rates and
error types. The learning curve seems to start with higher awareness and a better supply of a
feature which already exists in the L1 (the definite article), but with improved PLs and
reduced L1 influence, the trend gradually shifts towards a better conceptualisation, and
therefore a higher production, of the newly acquired feature (the indefinite article). Error
patterns are also converse. Learners begin by overproviding the definite article in
non-referential contexts, and gradually reduce production until it is undersupplied in
obligatory contexts at later developmental stages. In contrast, the overuse of the indefinite
article is scarce in the production of weaker learners, yet with overall L2 progress, rates
exceeded those of the definite.
A mirror image of the above pattern is observed in omission errors, as high rates of
indefinite article omissions were observed in early stages. With better PLs, the rates fell
considerably. Although the definite article was properly supplied in obligatory contexts at
elementary levels, scoring very low omission rates, the error increased in the production of
more able groups, leading to higher omissions. A diverse progression map is also detected in
replacement errors as participants started with higher the-for-a rates but ended with greater
a-for-the substitutions. The switch of preferences from the to a(n) reflects the regular and
systematic move from limited, L1-influenced use towards more target-like, internalised
knowledge.
It is worth mentioning that if occurrences of the indefinite article within formulaic
expressions had been excluded from our calculations, since they are mostly memorised and not
automatically produced in corresponding contexts, the rates would have been more
contrastive. It is therefore reasonable to propose that articles not only develop
independently of one another but can also progress in diverse directions.

References
Abi Samra, N. (2003). An analysis of errors in Arabic speakers' English writings. American
University of Beirut. Retrieved 25 October, 2005 from
http://abisamra03.tripod.com/nada/languageacq-erroranalysis.html
Al-Fotih, T. A. (2003). Acquisition of the English articles by Arabic-speaking students.
Indian Linguistics, 64, 157-174.
Bataineh, R. F. (2005). Jordanian undergraduate EFL students' errors in the use of the
indefinite article. Asian EFL Journal, 7(1), 56-76.
Behrens, L. (2005). Genericity from a cross-linguistic perspective. Linguistics, 43(2),
275-344.
Bialystok, E. and E. B. Ryan. (1985). A metacognitive framework for the development of
first and second language skills. In D. L. Forrest-Pressley, G. E. Mackinnon, and T. G.
Waller (Eds.), Metacognition, cognition, and human performance: Vol. 1. Theoretical
perspectives (pp. 207-252). San Diego, CA: Academic Press.
Butler, Y. G. (2002). Second language learners' theories on the use of English articles: An
analysis of the metalinguistic knowledge used by Japanese students in acquiring the
English article system. Studies in Second Language Acquisition, 24(3), 451-480.
Celce-Murcia, M. and D. Larsen-Freeman. (1999). The Grammar Book. Los Gatos: Sky Oaks
Production.
Chaudron, C. and K. Parker. (1990). Discourse markedness and structural markedness: The
acquisition of English noun phrases. Studies in Second Language Acquisition, 12(1),
43-64.
Crompton, P. (2011). Article errors in the English writing of advanced L1 Arabic learners:
The role of transfer. Asian EFL Journal, 50, 4-32.
Cziko, G. (1986). Testing the language hypothesis: A review of children's acquisition of
articles. Language, 62, 878-898.
DeKeyser, R. M. (2003). Implicit and explicit learning. In C. J. Doughty and M. H. Long
(Eds.), The Handbook of second language acquisition (pp. 313-348). Malden, MA:
Blackwell.
Diab, N. (1996). The transfer of Arabic in the English writings of Lebanese students. The
ESPecialist, 18(1), 71-83.
Doughty, C. J. (2003). Instructed SLA: Constraints, compensation, and enhancement. In C. J.
Doughty and M. H. Long (Eds.), The Handbook of Second Language Acquisition. (pp.
256-310). Malden, MA: Blackwell.
Dulay, H., M. Burt, and S. Krashen. (1982). Language Two. New York: Oxford University
Press.
Eckman, F. (1977). Markedness and the contrastive analysis hypothesis. Language Learning,
27(2), 315-330.
Ekiert, M. (2004). Acquisition of the English article system by speakers of Polish in ESL and
EFL settings. Columbia University Working Papers in TESOL and Applied Linguistics,
4(1), 1-23.
Ellis, R. (2001). Investigating form-focused instruction. Language Learning, 51(1), 1-46.
Foster, P. and P. Skehan. (1996). The influence of planning and task type on second language
performance. Studies in Second Language Acquisition, 18(3), 299-323.
Garcia Mayo, M. P. (2008). The acquisition of four nongeneric uses of the article the by
Spanish EFL learners. System, 36, 550-565.
Goad, H. and L. White. (2004). Ultimate attainment of L2 inflection: Effects of L1 prosodic
structure. European Second Language Association Yearbook, 4 (pp. 119-145). John
Benjamins.
Granfeldt, J. (2000). The acquisition of the determiner phrase in bilingual and second
language French. Bilingualism: Language and Cognition, 3, 263-280.
Habash, Z. (1982). Common errors in the use of English prepositions in the written work of
UNRWA students at the end of the preparatory cycle in the Jerusalem area. Retrieved 3
July, 2006 from http://www.zeinab-habash.ws/education/books/master.pdf
Habuto, J. (2000). Comprehensible output hypothesis: Study of Japanese ESL students and
the acquisition of the English article system. In Moroishi, M. (Ed.), Classroom Second
Language Acquisition. FLL679S.


Hakuta, K. (1976). A case study of a Japanese child learning English as a second language.
Language Learning, 26, 321-351.
Hamdallah, R. (1988). Syntactic errors in written English: Study of errors made by Arab
students of English. Unpublished doctoral dissertation. Lancaster University, UK.
Hawas, H. M. (1989). The articles in English and Arabic: A contrastive study. Indian Journal
of Applied Linguistics, 15(2), 23-52.
Hawkins, J. A. (1978). Definiteness and indefiniteness. London: Croom Helm.
Hawkins, R. (2004). Explaining full and partial success in the acquisition of second language
grammatical properties. Paper presented at J-SLA, Gunma Prefectural Women's
University, Gunma, Japan.
Hawkins, R. and Y. Chan. (1997). The partial availability of universal grammar in second
language acquisition: The failed functional features hypothesis. Second Language
Research, 13(3), 187-226.
Haznedar, B. (2001). The acquisition of the IP system in child L2 English. Studies in Second
Language Acquisition, 23(1), 1-39.
Huebner, T. (1983). A longitudinal analysis of the acquisition of English. Ann Arbor.
Kellerman, E. (1977). Towards a characterization of the strategies of transfer in second
language learning. Interlanguage Studies Bulletin, 2, 58-145.
Kharma, N. (1981). Analysis of the errors committed by Arab university students in the use
of the English definite/indefinite articles. International Review of Applied Linguistics, 19,
331-345.
Kharma, N. and A. Hajjaj. (1989). Errors in English among Arabic speakers: Analysis and
remedy. London: Longman Group UK Limited.
Kremers, J. M. (2003). The Arabic noun phrase. LOT: The Netherlands.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition.
Applied Linguistics, 18(2), 141-165.
Leech, G. (1992). Introducing English grammar. London: Penguin.
Lightbown, P. M. and N. Spada. (1999). How Languages are Learned. (2nd ed.). Oxford:
Oxford University Press.
Lightfoot, A. R. (1998). Japanese second-language learners and the English article system: A
study in error analysis. University of Leeds. Retrieved 6 November, 2008 from
http://ardle.net/linguistics.html


Liu, D. and J. I. Gleason. (2002). Acquisition of the article the by nonnative speakers of
English: An analysis of four nongeneric uses. Studies in Second Language Acquisition,
24(1), 1-26.
Lyons, C. (1999). Definiteness. Cambridge Textbooks in Linguistics. Cambridge University
Press.
Maalej, Z. (2004). On the misuse of determination in Arab students' writing. University of
Manouba-Tunis. Retrieved 18 February, 2006 from www.executivetranslators.com
Master, P. (1987). A cross-linguistic interlanguage analysis of the acquisition of articles.
Unpublished doctoral dissertation. University of California, Los Angeles.
Master, P. (1990). Teaching the English articles as a binary system. TESOL Quarterly, 24,
461-478.
Master, P. (1997). The English article system: acquisition, function, and pedagogy. System,
25, 215-232.
Mizuno, H. (1985). A psycholinguistic approach to the article system in English. JACET
Bulletin, 16, 1-29.
Nagata, R., T. Iguchi, K. Wakidera, F. Masui and A. Kawai. (2005). Recognizing article
errors in the writing of Japanese learners of English. Systems and Computers in Japan,
36(7), 54-62.
Oller, J. W. and E. Z. Redding. (1971). Article usage and other language skills. Language
Learning, 21(1), 85-95.
Parrish, B. (1987). A new look at methodologies in the study of article acquisition for learners
of ESL. Language Learning, 37, 361-383.
Pica, T. (1983). Adult acquisition of English as a second language under different conditions
of exposure. Language Learning, 33, 465-97.
Poulisse, N. (1997). Some words in defense of the psycholinguistic approach: a response to
Firth and Wagner. The Modern Language Journal, 81(3), 324-328.
Power, T. (2003). Communicative language teaching: The appeal and poverty of
communicative language teaching. Retrieved 15 June, 2007 from
http://www.btinternet.com/~ted.power/esl0404.html
Prévost, P. and L. White. (2000). Missing surface inflection or impairment in second
language acquisition? Evidence from tense and agreement. Second Language Research,
16, 103-133.


Raymond, W., J. A. Fisher, and A. F. Healy. (2002). Linguistic knowledge and language
performance in English article variant preference. Language and Cognitive Processes,
17(6), 613-662.
Richards, J. C. (1971). A non-contrastive approach to error analysis. English Language
Teaching Journal, 25, 204-19.
Richards, J. C. (1976). The role of vocabulary teaching. TESOL Quarterly, 10(1), 77-89.
Ringbom, H. (1987). The role of the first language in foreign language learning. Clevedon,
UK: Multilingual Matters.
Robinson, P. (1996). Learning simple and complex second language rules under implicit,
incidental, rule-search, and instructed conditions. Studies in Second Language Acquisition,
18(1), 27-67.
Sharma, D. (2005). Transfer and universals in Indian English article use. Studies in Second
Language Acquisition, 27(4), 535-566.
Skehan, P. (1989). Language testing. Language Teaching, 22, 1-13.
Slabakova, R. (2000). L1 transfer revisited: the L2 acquisition of telicity marking in English
by Spanish and Bulgarian native speakers. Linguistics, 38(4), 739-770.
Smith, B. (2001). Learner English: A teacher's guide to interference and other problems.
Cambridge: Cambridge University Press.
Snape, N. (2005). The uses of articles in L2 English by Japanese and Spanish speakers. Paper
submitted to the annual conference on language acquisition. Essex Graduate Student
Papers in Language and Linguistics, 7, (pp. 1-23).
Snape, N., Y. I. Leung and H-C. Ting. (2006). Comparing Chinese, Japanese and Spanish
speakers in L2 English article acquisition: evidence against the fluctuation hypothesis. In
M. Grantham O'Brien, C. Shea, and J. Archibald (Eds.), Proceedings of the 8th Generative
Approaches to Second Language Acquisition Conference (pp. 132-139). Somerville, MA:
Cascadilla Proceedings Project.
Sorace, A. (1996). The use of acceptability judgments in second language acquisition
research. In W. Ritchie and T. Bhatia (Eds.), Handbook of Second Language Acquisition
(pp. 375-409). San Diego, CA: Academic Press.
Tarone, E. and B. Parrish. (1988). Task-related variation in interlanguage: the case of articles.
Language Learning, 38, 21-44.
Thomas, M. (1989). The acquisition of articles by native and non-native speakers of first and
second language learners. Applied Psycholinguistics, 10, 335-355.

Trenkić, D. (2000). The acquisition of English articles by Serbian speakers. Unpublished
doctoral dissertation. University of Cambridge.
Trenkić, D. (2002). Form-meaning connections in the acquisition of English articles. In
Foster Cohen, S., T. Ruthenberg and M. Poschen (Eds.), European Second Language
Association Yearbook, 2 (pp. 115-133). Amsterdam: John Benjamins.
Tsimpli, I. M. (2003). Clitics and determiners in L2 Greek. In J. M. Liceras, H. Zobl and H.
Goodluck (Eds.), Proceedings of the 6th Generative Approaches to Second Language
Acquisition Conference (pp. 331-339). Somerville, MA: Cascadilla Proceedings Project.
VanPatten, B. (1994). Cognitive aspects of input processing in second language acquisition.
In P. Heshemipour, I. Maldonado, and M. Van Naerssen (Eds.), Festschrift in honour of
Tracy D. Terrill (pp. 170-183). New York: McGraw-Hill.
Yamada, J. and N. Matsuura. (1982). The use of the English article among Japanese students.
RELC Journal, 13, 50-63.
Young, R. (1996). Form-function relations in articles in English interlanguage. In R. Bayley
and D. R. Preston (Eds.), Second language acquisition and linguistic variation (pp.
135-175). Amsterdam: John Benjamins.
Zdorenko, T. and J. Paradis. (2008). The acquisition of articles in child second language
English: Fluctuation, transfer or both? Second Language Research, 24(2), 227-250.

Notes
1. Some researchers (e.g., Master 1987) consider bare nouns as marked with a zero article.
2. The number in brackets reflects the student's serial number, her PL group (A/B/C), and the
ordinal number of the NP in the essay.
3. For task type effect on L2 production, see Foster and Skehan (1996).


The Use of Third-Person Pronouns by Native and Non-Native Speakers of English


Ibrahim M. R. Al-Shaer
Al-Quds Open University
ishaer@qou.edu

Bioprofile:
Dr Ibrahim Al-Shaer has 23 years of experience in higher education. He spent his first 7 years
of professional experience teaching different English language and linguistics courses at
several universities. He was also the Director of Al-Quds Open University in Bethlehem for
10 years. He is currently the President Assistant for Innovation and Excellence.
Dr Al-Shaer obtained a Bachelor of Arts in English language and a Diploma in secondary
education in 1986 from Bethlehem University. He is a recipient of a 1989 scholarship from
the British Council, to study for a Master of Linguistics for ELT at Lancaster University. Dr
Al-Shaer is also a recipient of a 1998 scholarship from ASAI in conjunction with Al-Quds
Open University to study for a Ph.D. in Applied Linguistics from the University of Reading.
Dr Al-Shaer's main research interests are in the fields of psycholinguistics, construction
grammar, semantics, syntax, ELT applications, writing skill, corpus linguistics, e-learning,
innovation, and creativity.

Abstract
This study addresses research questions concerning the use of third-person pronouns by
native and non-native speakers of English. For this purpose, a corpus-based analysis of these
pronouns in naturally-occurring data was carried out, highlighting the different constraints
that cause writers to choose one pronoun over another. Then, thirteen sentences with tricky
third-person pronouns taken from the IBM-Lancaster Associated Press corpus were presented
in writing to two groups of native and non-native speakers of English. The results indicated
that most native speakers choose third-person pronouns depending on the socio-cultural
context and pragmatic factors, showing an inclination to bend the formal rule of
pronoun-antecedent agreement. However, the majority of non-native speakers had a tendency to abide
by the prescriptive rule of pronoun-antecedent agreement, showing little or no sensitivity to
context. The study concluded that pronoun-antecedent agreement has proven to be an area
where it is difficult to draw a line between standard and non-standard usage.

Keywords: third-person pronouns, cohesive devices, native speakers, non-native speakers,
pragmatic constraints


Introduction
Traditionally speaking, pronouns are simply defined as words used instead of a noun or a
noun phrase to avoid repetition. Quirk et al. (1985) have defined pronouns in English as
noun-like words that differ from nouns in having distinct forms for case, person, number,
and gender. Fromkin et al. (2007) have described them as substantives whose interpretation
depends on syntax and context.
Standard English grammar provides the reader with the prescriptive rule that a
pronoun must agree with its antecedent in person, number, and gender (Kroeger 2005: 138).
When the gender of an antecedent is unspecified, as with student, nurse, everyone, standard
grammar states that the default pronoun to be employed is the masculine one. According to
the Chicago Manual of Style (2010), this approach is no longer acceptable as it is taken to be
outdated and sexist. As such, other approaches are adopted in an attempt to offer a
gender-neutral resolution, as in (1).

(1)  a. A student must do his/her homework.
     b. A student must do their homework.

However, some people find repeating his or her throughout a long piece of writing irritating,
and others find using plural pronouns in such contexts ungrammatical. For example, Mangan
(2010) has asked for a gender-neutral third-person singular pronoun. Einsohn has gone even
further, saying that the newer grammar books recommend using the plural pronoun after an
indefinite subject (2011: 361).
Third-person pronouns are the only class of pronouns which are inherently cohesive,
in that a third-person pronoun form typically refers anaphorically or cataphorically to another
item in the text. By contrast, first- and second-person forms do not normally refer to the text
at all; their referents are defined by the speaker and hearer speech roles and are normally
interpreted exophorically by reference to the situation. A third-person form implies the
presence of a referent somewhere in the text, and in the absence of such a referent the text
appears incomplete.
Third-person pronouns are very important for the semantic interpretation of texts
because they contribute to cohesion. The concept of cohesion is a semantic one referring to
the relations of meaning that exist within a text and that define it as a text (Halliday and
Hasan 1976). As such, cohesion is not a structural relation; although cohesion relations
could be in the same sentence, they are not restricted by sentence boundaries. In its most
normal form, it is simply the presupposition of something that has been mentioned somewhere
in the text (endophora), whether in the preceding sentences (anaphora) or in the following
ones (cataphora). In addition, third-person pronouns may sometimes co-refer with entities
which cannot be found in the text itself but in the extralinguistic context (exophora)
(Quirk et al. 1985).
According to Wilson (1990, cited in Partington 2003), the first-person pronoun we can
be used by politicians in their strategies either inclusively to convey solidarity or exclusively
to stress joint responsibility. Clearly, there is more to pronouns than the simple formal
definition which describes them as words used instead of nouns that must agree with their
referents in gender and number. Pronouns can reflect language users' attitudes and social
orientations. As Curzan has put it:

[P]ronoun selection depends on speaker attitudes and involvement as well as
cultural prototypes [and] all of these factors in turn rest on the same
foundation: the concepts of sex and gender held by language users and the
society in which they express themselves. (2003: 29)
In the same vein, Gocheco (2012: 5) has claimed that pronouns, among other linguistic
features, can shed light on how participants project themselves and how they express
associations with others.
Apart from this brief introduction, this paper is presented in four further sections.
Section two outlines the general research methodology. Section three offers the results of the
corpus-based analysis of the behavior of third-person pronouns in journalistic texts and
presents the elicited performance of native and non-native speakers in a set of sentences
selected from Associated Press news articles as compared with their syntactic, semantic and
pragmatic behavior in the data. Section four offers a discussion of the results. Section five
gives a brief summary of the conclusions drawn from this empirical work.

Statement of the Problem


The researcher's students, as EFL learners, often complain that they get confused
by pronoun-antecedent agreement when interacting with native speakers. For instance, one
student complained that she does not know if generic he is still used in present-day English as
inclusive of she or as an acceptable choice to refer to generic antecedents like someone or
dual-gender words like student. She wanted to know whether using the coordinate
construction he or she irritates native speakers, or if the plural they is acceptable to refer to
individuals with unknown gender. When the researcher approached a native speaker of
American English for advice, she replied: "Who knows exactly what pronoun to use
anymore!" This definitely puts a greater burden on non-native teachers who have limited
exposure to English, as non-native learners of English, especially beginners, need explicit
rules to learn the language; otherwise, they will be lost. Given this challenge, the current
study attempts to provide insights on the reality of pronoun agreement and the challenges it
poses to both native and non-native speakers.

Research Objectives and Questions


Since the great bulk of linguistic arguments and conclusions are currently derived from
reliable evidence stemming from gauging native speakers' natural performance, from
spoken or written corpora, or from contrastive studies, this paper is concerned with the
natural function and behavior of third-person pronouns (he, she, it, they) as cohesive elements
in naturally-occurring data rather than with their rigid theoretical features. It then offers a
comparative analysis of the spontaneous choices and preferences of a group of American
native English speakers and those of a group of non-native Palestinian speakers on
the same items extracted from Associated Press articles.
More specifically, the current study can contribute by providing some answers to the
following questions:

1. What grammatical, textual, and extralinguistic factors constrain co-reference in
American English journalistic texts?
2. Are there group differences between native and non-native speakers of English in the
use of third-person pronouns?
3. To what extent do the factors of gender and age play a role in shaping language users'
choice of one pronoun over another?

Methods
The data used in this study come from two sources, which are combined to
generate insights concerning third-person pronoun-antecedent agreement. The first
source was a collection of examples taken from the IBM-Lancaster Associated Press corpus
(A001–A010), consisting of some one million words of tagged 1970s American Press
material. The second source was a survey of native and non-native speakers' performance.

Corpus-based analysis
For the purpose of this analysis, the Associated Press was selected for its prestigious character
and because the topics it deals with are of interest to international audiences, although they are
directed at the general American public.
In this analysis, the frequency distributions of the various types of cohesive devices
were presented and examined. Then, aspects of usage in the corpora that required the choice
of a given pronoun were identified and described. All examples were manually processed and
systematically classified in order to identify the environments in which the writer chose one
pronoun over another.
Survey of native and non-native speakers' performance
Instrument
The second source of data is a survey of 40 native and 40 non-native speakers' usage of
pronoun-antecedent agreement in a selected set of sentences mostly taken from the
Associated Press corpus. As shown in the Appendix, the survey consists of two parts. In the
first part, the participants were asked to fill in each blank space with an appropriate
third-person pronoun to complete the sentence, and in the second part they were asked to mark
their preferred choice: either a passive construction with the third-person singular neuter it
used as its subject or an active construction.

Participants
The survey involved two groups of participants. The first consisted of native speakers with no
background in linguistics. Since the tested materials were assumed to be so basic and
universal that the results could be generalized beyond the given sample, non-probability
(snowball) sampling was used. Snowballing allowed for locating information-rich
key informants. The first wave of participants was given selection criteria (e.g., age,
gender, and no background in linguistics) that helped randomize the sampling process; they
were also asked to recommend, for the second wave, potential participants who lived the
farthest away. This sampling was not a stand-alone tool; it was simply a way of selecting
participants, after which the survey was conducted.
The 40 native participants were selected from the US states of Kansas and Missouri.
The median age was 35 and ages ranged from 18 to 65. The second group consisted of
non-native speaker participants. They were third-year English majors studying at Al-Quds Open
University who are native speakers of Arabic. Their median age was 24, and ages ranged
from 18 to 36 years.
All participants were instructed to base their responses solely on their immediate
reactions, without worrying too much about any rules they might have learnt about so-called
correct English. Respondents needed approximately ten minutes to complete the survey.

Findings
Corpus-based analysis
The main concern of this paper is the use of third-person pronouns as cohesive elements.
However, the existence of other cohesive devices in the corpus affects the frequency
distribution of these pronouns. Therefore, it is useful to give a sense of the incidence of such
cohesive devices as compared to the referential functions of third-person pronouns
(Table 1). In this respect, Halliday has said:

Continuity may be established in a text by the choice of words. This may
take the form of word repetition; or the choice of a word that is related in
some way to a previous one – either semantically, such that the two are in
the broadest sense synonymous, or collocationally, such that the two have a
more than ordinary tendency to co-occur. (1985: 289)

Table 1 Distribution of cohesive devices in the corpus

Cohesive Devices                          Number   Percentage   Total
Lexical Devices
  Repetition                              364      57%
  Substitution (specific to general)      96       15%
  Substitution (general to specific)      25       4%
  Substitution (synonyms or others)       153      23%
  Ellipsis                                5        1%
                                                                643 = 50.5%
Reference
  Cataphoric                              8        1.5%
  Anaphoric                               576      91%
  Undecided                               47       7.5%
                                                                631 = 49.5%
Grand Total                               1274     100%         1274

Starting with repetition, the data show that the use of this device has an important role in
journalistic language. In the data, there are 364 instances of repetition out of a grand total of
1274 different cohesive elements (see Table 1). In many texts, some nouns or noun phrases
are continuously repeated many times, though it is possible to use other cohesive devices in
the same places. For instance, in A001 127/ 128/ 129/ 130, the noun phrase the offender is
repeated four times. This phenomenon has one possible interpretation: the writer might have
found it safer to avoid the dilemma of choosing a pronoun appropriate to the situation.
Another case of repetition appears when a text is densely packed with nouns and
noun phrases. For instance, in A 008 43-50, which is a very short sports report about
basketball, repetition is the only cohesive device used throughout the whole text. A potential
reason for this is that as there are nine noun phrases (teams and players), the writer had to
repeat every noun to avoid confusion on the part of the reader.
The next partially lexical cohesive device used in the data is substitution. What
distinguishes this device from pronominal substitution is that it operates on both the syntactic
and the semantic levels. In other words, grammatically speaking, the substitution element has
to match its referent in terms of syntactic features (especially word class); and lexically, the
substitution element helps produce more coherent texts and solves the problem of intensive
use of other types of cohesive device.
As shown in Table 1 above, lexical substitution is the next most frequent cohesive
device after repetition. Of the 1274 cohesive devices identified, 274 are substitution
cases (general to specific = 25, specific to general = 96, synonyms or others = 153).
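The totals quoted from Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the counts are those reported in the table and the text, while the dictionary layout and the helper name `share` are assumptions introduced here.

```python
# Counts as reported in Table 1 and in the accompanying text.
lexical = {
    "repetition": 364,
    "substitution: specific to general": 96,
    "substitution: general to specific": 25,
    "substitution: synonyms or others": 153,
    "ellipsis": 5,
}
reference = {"anaphoric": 576, "cataphoric": 8, "undecided": 47}

lexical_total = sum(lexical.values())
reference_total = sum(reference.values())
grand_total = lexical_total + reference_total

# All three substitution sub-classes taken together.
substitution_total = sum(
    count for label, count in lexical.items() if label.startswith("substitution")
)


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("lexical devices:", lexical_total, f"({share(lexical_total, grand_total)}%)")
print("reference:", reference_total, f"({share(reference_total, grand_total)}%)")
print("grand total:", grand_total)
print("substitution cases:", substitution_total)
```

Run as written, the sums reproduce the 643 lexical devices, 631 referential uses, 1274 devices overall and 274 substitution cases given in the text.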
To begin with the first sub-class, general to specific: in A 004 1 (A man opened fire
with a 22 caliber rifle ...), then in A 004 3 (The man was subdued by bar patrons ...), and
finally in A 004 5 (A deputy at El Paso County Jail said Barry Chvarak 21 ...), the writer
starts the report with a general noun accompanied by the indefinite article a, then continues
with the same noun but with the definite article the, and finally replaces it with the man's name
(a man → the man → Barry Chvarak).
The second substitution sub-class is from specific to general. For example, in A 001
45 (A dormitory fire at the University of Northern Colorado that sent hundreds of students
scurrying from the building on Saturday ...), a long noun phrase modified by a relative
clause describes the fire in a very specific way; the word fire is replaced later by
a synonym (blaze) in A 001 47 (... just after the blaze was discovered about 3.00 a.m.
...).
The third sub-division of lexical substitution is the use of synonyms. A 005 52-55
(The pact approved by the Association's executive ... I think this contract is a major step
forward ... Under the agreement, the players would receive an increase ...) offers a good
example of a series of synonyms (the pact → this contract → the agreement) referring to the
same idea.
The last sub-class is lexical ellipsis. Halliday and Hasan (1976: 142) have argued that
the starting point of the discussion of ellipsis can be the familiar notion that it is "something
left unsaid", where "unsaid" is used in the special sense of "going without saying".
In the data, only 5 cases of lexical ellipsis have been identified. For instance, in A 005
102-103 (Police said the impact split the car in half ... One of the passengers, Kern Jones of
Cushing, remained in critical condition), the phrase One of the passengers could have been
written in a more explicit way (e.g., One of the car's passengers); however, the omission of
the word car, in the researcher's opinion, does not change the meaning or cause any
confusion because the semantic relation between the words passenger and car is very strong
in this context.
Last but not least, an important characteristic of the data in this study is the use of
referential chains which are produced by the combination of lexical cohesion (repetition and
synonyms) and reference. A typical referential chain in the data can be found in A 004 1-21,
in which the noun man (A 004 1) is gradually replaced by the man (A 004 8), Chvarak (A
004 8), he (A 004 12), the guy and his brother (A 004 13), he (A 004 14), he and his vehicle
(A 004 15), the suspect (A 004 21).
As for the referential functions of third-person pronouns, as presented in Table 1
above, despite the significance of other reference tools and lexical devices in journalistic
discourse, the study reveals the predominance of anaphoric reference in the data. In
A 001–A 010, out of 631 referential occurrences of the third-person pronouns, there are 576
anaphoric cases. Taking one of these cases in A 009 10 (The Whalers have been playing their
home games in the Springfield...), the possessive pronoun their refers anaphorically to the
noun Whalers.
With regard to cataphoric reference, in A 003 19 (Although it has expressed support
for holding the Games, the German Olympic committee has taken no final stand ...), the
pronoun it cataphorically refers to the noun phrase the German Olympic committee, which is
introduced later in the text.
Table 1 above indicates the low occurrence of cataphora in the journalistic texts; only
8 of the 631 referential uses of third-person pronouns are cataphoric. This is not surprising as
cataphora is generally uncommon. However, a more specific interpretation is that
because journalistic writing is usually meant to be more straightforward and accessible to
ordinary readers than other types of writing, journalists might try to use the simplest
referential devices and avoid more difficult ones, such as cataphora.
A point to be added here is that, where the cataphoric reference occurs, the anaphoric
one is possible as well. In other words, we can equate two synonymous sentences in which
the positions of the pronoun and antecedent can be reversed (Quirk et al. 1985: 351). As
such, the above example can be easily transformed from cataphora into anaphora, as in:
Although the German Olympic committee has expressed support for holding the Games, it
has taken no final stand...
Moreover, the data reveal that the linguistic choices that writers tend to make may
affect the semantic and stylistic interpretation of a text. On the one hand, in A 010 125 (He
refused to take questions and returned inside), the writer did not repeat the pronoun he after
the conjunction and, though s/he could have done so. On the other hand, in A 007 107
(Taylor played for the Denver Broncos and Houston Oilers when those clubs were part of the
American Football League, and he coached for seven seasons), the writer does repeat he.
The researcher's interpretation of these two cases is that, in the first, there is only one referent
and so the writer may have found it redundant to use he again. In the second case, the
repetition of the pronoun he helps the reader easily identify the right referent because of the
large distance between the two elements.
There is another special case where reference is both anaphoric and cataphoric at the
same time. For example, in A 010 11 (They were: Suffolk, Queens, Brooklyn, Nassau, Erie,
and Manhattan), the third-person pronoun they co-refers with an antecedent in the previous
sentence, A 010 10 (Six countries accounted for 38.6 percent of all traffic related deaths),
as well as forward with the items (Suffolk, Queens, ...) that come after the same pronoun they
in the same sentence. This is a case of copular relationship, which is not the same as
cohesion, as it is a structural relationship.
All the cases discussed thus far involve specific reference; the pronoun has a definite
referent somewhere in the text. This implies that the data do not have any third-person
pronouns with generic reference, i.e. there is no co-reference with any unspecified entity such
as people, animals, plants, etc.
The use of these various types of lexical substitutions appears to depend on the nature
of this genre. The fact that journalistic writing is intended for ordinary readers imposes some
constraints on the way information is presented. On the one hand, journalists tend to take into
account the tastes and needs of those who may get irritated by the intensive use of repetition,
confusion, ellipsis, or even pronominal reference and difficult synonyms. This is the device
known as elegant variation (Fowler 1965). On the other hand, they avoid tricky pronoun uses
(e.g., cataphora, generic masculine he, and plural they referring to singular referents with
unspecified gender), and use, for example, repetition or substitution, which proved to be
the most frequently used devices in the data. These techniques spare their readers potential
ambiguity or complexity.
The analysis above reveals that the traditional prescriptive rule that antecedents must
take gender- and number-matched pronouns is not highly respected. In the data, the plural
their is used to refer to the singular antecedent anyone in one case and to the people of
Canada in another. The pronoun he is used instead of the pronoun it to co-refer with the
animal horse. Moreover, the third-person singular neuter it is used as a subject of a passive
construction where the more straightforward active construction could have been used. These
interesting cases and others break the prescriptive requirements for the use of third-person
singular pronouns. Clearly, much still remains to be done to clarify how this affects native
and non-native speakers' usage.
Survey of native and non-native speakers' performance
This section offers a comparison of native and non-native speakers' use of third-person
pronouns in problematic sentences taken from the Associated Press corpus and other sources.
In the light of what prescriptive grammarians say concerning particular points of
pronoun-antecedent agreement and the journalists' choices of third-person pronouns within
the given environments, the performance of native speakers as compared with that of
non-natives on the usage of 13 sentences will be examined. It should be noted that Figures 1-10
employ these abbreviations: NNs = non-natives; Ns = natives; F = female; M = male; O = old;
Y = young; S = sentence.

Sentence 1: Every student must bring ____ books to class.


According to prescriptive grammarians, a pronoun must agree with its antecedent in gender
and in number (Celce-Murcia 1985). Despite those grammarians' dissatisfaction with using
they or the coordinate construction he or she as alternatives to the masculine he when
the gender of a referent is unknown (Quirk et al. 1985), usage is changing. As shown in
Figure 1 below, the results of the survey confirm this observation: out of 40, 16 native
speakers chose his/her; 13 went for the plural their; and only 11 chose his to refer to the
genderless word student. Native speakers thus proved to be almost equally divided between
their and his/her and showed readiness to bend the traditional rule, although more of the
older native speakers abided by the formal rule.
Interestingly, although this sentence would be grammatical as Every student must
bring books to class, the null option did not show up in the native
performance. This can be attributed to the way the sentence was presented to informants:
it had a marked gap after bring, and informants were told to write a third-person pronoun in
the blank.

Figure 1 Comparison of Ns and NNs' performance for S1 (bar chart; pronoun options: his, his/her, their)

Equally important, the results revealed that the majority of non-native speakers (26 out of 40)
abided by the prescriptive rule and chose the masculine pronoun his. This suggests that native
speakers do not consciously follow any single systematic criterion when using their
language, whereas non-native speakers rely on well-defined rules.
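The raw counts for Sentence 1 are easier to compare as shares of each 40-person group. The following Python sketch is offered only as an illustration: the counts are those reported above, while the variable names and the helper `percent` are assumptions introduced here.

```python
GROUP_SIZE = 40  # respondents in each group, as reported

# Native speakers' choices for Sentence 1, as reported in the text.
native_s1 = {"his/her": 16, "their": 13, "his": 11}
# Only the count for "his" is reported for the non-native group.
non_native_his = 26

# The three native counts should account for the whole group.
assert sum(native_s1.values()) == GROUP_SIZE


def percent(count: int, total: int = GROUP_SIZE) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


for pronoun, count in native_s1.items():
    print(f"Ns  {pronoun:<8}{percent(count):5.1f}%")
print(f"NNs {'his':<8}{percent(non_native_his):5.1f}%")
```

Expressed this way, no single option reaches half of the native group, whereas his alone accounts for roughly two thirds of the non-native responses.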

Sentence 2: A child learns to speak the language of ____ environment. (Quirk et al. 1985:
316)
According to Quirk et al. (1972: 360), words like child are exceptionally referred to by the
pronoun its. According to the survey results, very few native speakers (only 4 out of 40) used
the pronoun its to refer to child. Almost half of them (19 of 40) chose his/her.
Native speakers were not keen to make a gender distinction and used the coordinate
pronouns his or her or the plural their to refer to the noun child. However, about one fourth of the
respondents (11 out of 40) made a gender distinction in favor of the masculine pronoun his.
Perhaps parents tend to refer to their baby with personal reference, whereas those without children
may prefer non-personal reference. Quirk et al. (1985: 316) have described such speakers as
emotionally unrelated to the child.

Figure 2 Comparison of Ns and NNs' performance for S2 (bar chart; pronoun options: his, his/her, their, its)

Half of the non-native speakers chose its based on what they learn in their EFL
classes, and nearly the other half opted for his mainly because this pronoun in Arabic has a
generic reference.
An interesting result here has to do with the clear interaction between age and gender
in the native speakers' performance regarding the use of his/her. Younger males use it twice
as much as older males, but older and younger females use it most frequently.

Sentence 3: Ridden by jockey Aki Kato, Tally Ho the Fox, scored ____ second consecutive
stakes win.
According to Quirk et al. (1985), the pronoun it is mainly used to refer to lower animals. In
the data, there is an exceptional occurrence which does not lend itself to the formal rule. In
this sentence, the pronoun his is used to refer to the horse Tally Ho the Fox instead of the
pronoun it. If the horse is viewed as a non-personal entity, it is mainly referred to by the
neutral pronoun it. But, according to Quirk et al. (1985), people express male/female gender
distinctions with higher animals.
In this case, in which syntactic and lexicogrammatical rules do not seem to be in
operation, readers would not have been able to understand what Tally Ho the Fox was, if the
immediate environment had not provided them with the information ridden by jockey Aki
Kato (A004 45). The particular choice of the masculine he to refer to a horse may depend
on a number of variables, primarily the speaker's relation to the species in question, but also
on her/his individual preference for pronoun usage. If the horse was not male, then the choice may be
explained in terms of the human-like behavior of the horse, which scores just like humans do. One
may add a further factor encouraging the use of he or she: the horse was named.
Animals are mainly referred to with non-personal gender pronouns (it, its, itself).
However, Quirk et al. (1985: 314) have asserted that persons are not only human beings, but
may also include supernatural beings and higher animals.

Figure 3 Comparison of Ns and NNs' performance for S3 (bar chart; pronoun options: its, his/her, his)

The majority of native speakers (23 out of 40) opted for the masculine he to refer to the word
horse. This is not surprising in racing contexts or with pets. However, a striking finding with
age and gender is that most of the younger females went for his/her.
Most non-native speakers (26 out of 40) went for the non-personal pronoun its to refer
to the horse. About one third of them (14 out of 40) chose the pronoun his, taking the horse
as male, and none chose his or her.
According to Quirk et al. (1985), since English lacks gender-neutral third-person
singular pronouns, the plural they represents an alternative to using the masculine pronoun he
in reference to mixed-gender groups or persons of unknown gender.
Sentence 4: When the average person walks into a bank, ____ looks over brochures in the
lobby.
In Figure 4, the results show that almost half of the native speakers, mostly the young group,
chose he or she (18 out of 40). Only one fourth of the native speakers (10 out of 40) went for
the singular they as an alternative to the masculine generic pronoun he.

Figure 4 Comparison of Ns and NNs' performance for S4 (bar chart; pronoun options: they, he/she, he)

This result is consistent with the findings of a study by Madson and Hessling (2001) of American
readers' perceptions of four alternatives to the masculine generic pronoun, in which the
respondents rated the they version as lowest in overall quality. However, this is inconsistent
with Johnson (2004), who claims that many English speakers prefer the singular they and
proposes, based on evidence, endorsement of the singular they rather than other alternative
strategies. In the researcher's analysis, the form of the verb looks makes the alternative they
ungrammatical; otherwise, it would have been used more. Not surprisingly, perhaps, 10 older
native speakers went for he, as compared to only 2 younger ones. This
reflects the gap between the old and young cohorts.
As for non-native speakers, the majority of them fell back on what they learnt in their
EFL classes (30 out of 40) and chose he as a pronoun referring to the antecedent average
person.

Sentence 5: It was a singular act of courage on the part of Canada to spirit out of Iran a
group of diplomats who were not even ____ own citizens.
This sentence begins with the prop it, the most neutral and semantically unmarked of the
personal pronouns. The prop it in this sentence appears to function as an empty theme
(Quirk et al. 1985). This prop it is followed by the verb to be and a construction which
makes it natural to achieve focus on the item that follows: in effect, end focus within an SVC
clause (Quirk et al. 1985: 1384). This equals extraposition of subject clauses.
The observation here has to do with the writer's apparent violation of number
concord, that is, her/his choice of the plural pronoun their to refer to a singular entity Canada.
This choice does have a purpose if explained within the politeness framework.
According to Brown and Levinson (1987: 180), plurality signifies respect
throughout the pronominal paradigm of reference. Likewise, Lin has argued that the idea of
plural is naturally and historically connected with power (1988: 159-160). It is also believed
that plurality is a very old and ubiquitous metaphor for power, the earliest instance of which
was its use to address the emperors of Rome in the 4th century (Brown and Gilman
1960). Evidently, the writer takes Canada as a plural to show collectivity (as a nation
consisting of millions of people).
Surprisingly, the pronoun her, which represents another alternative, did not show up
in the native speakers' performance. As shown in Figure 5, the data illustrate that the vast
majority of native speakers (36) and non-native speakers (37) chose the pronoun its. Although
the contextual information surrounding the antecedent Canada in sentence (5) presents it as
a political entity, the respondents' immediate reactions portray Canada as a geographical entity
(i.e., inanimate).

Figure 5 Comparison of Ns and NNs' performance for S5 (bar chart; pronoun options: their, its)

Sentence 6: I don't think anyone would approve of having ____ children attend classes in
this setting.
Figure 6 shows that the majority of non-native speakers (27 out of 40) chose the masculine
pronoun he to refer to the non-specific referent anyone. Native speakers, nonetheless, were
less observant of prescriptive rules: 12 of them chose the plural they, 12 chose the coordinate
construction he or she, 12 chose one's, and only 10 older speakers chose his.
Regarding native speakers' performance, the results support Holmes's (1998)
conclusion, after conducting an analysis of generic pronouns in New Zealand, that 80% of
non-specific referents, such as anyone, are referred to by they. However, the results show that
almost half of non-native speakers stuck to the traditional rule and chose his rather than their.
By using their as a gender-free pronoun, the majority of native participants in this
survey appeared to be socially sensitive and to avoid gender bias. This is consistent with Mair and
Leech's conclusion that an ideological motivation (avoidance of sexual inequality) [can be a
reason, among others] for replacing an older pronoun usage by a newer one (2006: 336). In
addition, big differences were observed in the survey in the natives' performance between
older and younger females regarding this point.

Figure 6 Comparison of Ns and NNs' performance for S6 (bar chart; pronoun options: his, his/her, one's, their)

In this sentence, the plural pronoun their is used in defiance of strict number
concord in co-reference to the indefinite pronoun anyone. This violation appears to have a
different interpretation from the one mentioned above. The reporter, here, may have used the
plural as a convenient means of avoiding the traditional use of the third person masculine he,
as the syntactically unmarked form. Since the gender of the indefinite pronoun anyone is
unspecified, the writer chooses their in order to avoid possible attacks from those who view
the use of the generic he as a kind of sexual bias in language. In addition, by the choice of
their to co-refer with anyone, s/he also avoids being vulnerable to the objection of seeming
to have a male orientation (Greenbaum et al. 1990: 451). The choice of their seems to be
governed by more contingent, context-dependent pragmatic as well as social orientations.
Sentence 7: The hairdresser turned down the offer and ____ returned inside.
The overwhelming majority of non-native speakers (37 out of 40) chose the feminine pronoun
she to refer to the antecedent hairdresser, and almost half of the native speakers (18 out of
40) chose she as well. This is by no means surprising since the default inference of
hairdresser in some communities is female.
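The study reports raw counts without a significance test. As one hedged illustration of how the group difference for Sentence 7 could be examined, the Python sketch below computes a Pearson chi-square statistic (without continuity correction) for a 2x2 table of she versus any other response. Collapsing the responses in this way, the choice of test, and the function name `chi_square_2x2` are assumptions made for illustration and are not part of the original analysis.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


GROUP_SIZE = 40      # respondents in each group, as reported
native_she = 18      # native speakers who chose she (reported)
non_native_she = 37  # non-native speakers who chose she (reported)

# Rows: natives, non-natives. Columns: she, any other response.
statistic = chi_square_2x2(
    native_she, GROUP_SIZE - native_she,
    non_native_she, GROUP_SIZE - non_native_she,
)

# 3.841 is the standard critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841
print(f"chi-square = {statistic:.2f}; exceeds {CRITICAL_VALUE}: {statistic > CRITICAL_VALUE}")
```

Under these assumptions the statistic is well above the critical value, which is in line with the large gap between the two groups' raw counts; the sketch says nothing about age or gender effects within each group.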

Figure 7 Comparison of Ns and NNs' performance for S7 (bar chart; pronoun options: he, she, he/she)

Sentence 8: The blacksmith remained silent and ____ refused to leave the coach.
As shown in Figure 8, the overall results show that the antecedent word blacksmith is
treated as male and referred to as he.

Figure 8 Comparison of Ns and NNs' performance for S8 (bar chart; pronoun options: he/she, she, he)

The overwhelming majority of non-native speakers (38 out of 40) chose the masculine
pronoun he to refer to the antecedent blacksmith, and half of the native speakers (20 out of
40) chose it as well. This is by no means surprising since the default inference of blacksmith
in some communities is male. Although none of the non-native speakers treated blacksmith as
female, three native speakers singled out the feminine she and seven of them preferred the
coordinate construction he or she.
Sentence 9: The Titanic was massive because ____ killed thousands and thousands of
people.
The neutral pronoun it is almost always used in place of a single thing. However, there are,
according to Quirk et al. (1985), a few exceptions. For example, the feminine pronoun she
can be exceptionally used in a case of personification to refer to a ship.

Figure 9 Comparison of Ns and NNs' performance for S9 (bar chart; pronoun options: she, it)

However, the results presented in Figure 9 show that the majority of native and non-native
speakers treated even the ship Titanic as a single thing rather than female and chose the
neuter pronoun it.
In this particular case, it seems that the reference is not to the ship itself but to the disaster
event, which is named after the ship involved in it. Clearly, the option of she for the ship is not
a popular choice, and almost all language users indicated that the gap is best filled by it.
Sentence 10: The offender argued logically and calmly. This could eventually help
change the attitudes of the taxpayers and officials, who are in a position to give more
support to ____ as well as to the victims.
As shown in Figure 10, the majority of native (24) and non-native speakers (27) chose the
masculine pronoun him to refer to the antecedent word offender. This goes in line with the
widely-held assumption that the default gender interpretation of offender is male. It seems
that this applies to both American and Palestinian communities.

Figure 10 Comparison of Ns and NNs' performance for S10 (bar chart; pronoun options: them, him/her, her, him)

Interestingly, when the same sentence is considered in its context, the reporter did not use any
pronoun and chose to repeat the noun phrase the offender, although it would have been
equally explicit if it had been replaced by a pronoun. The same noun phrase is repeated in
A 001 127 and in A 001 128, and the context itself gives enough information for the reader to
interpret it in the right way. A possible explanation is that the reporter wanted to avoid the
dilemma of choosing a pronoun appropriate to the situation. According to Mair and Leech
(2006), the generic use of he for both male and female was prevalent in the 1960s, but it
declined in the 1990s owing to the efforts of women's movements. The feminist

49

recommendations in this regard, together with the need to fill the gap left by the downfall of
the generic he, allowed for the deeply-rooted they to re-emerge.
A few years ago, the choice of the third person plural they would have been totally
unacceptable in terms of number concord, since the offender signals a singular entity which
requires a singular co-referent. Choice of the third person masculine he could have been seen
as male oriented or another manifestation of the subjection of women to men, whereas the
third person feminine she would have been awkward, since readers are not used to the idea of
the feminine as a generic pronoun. The writer successfully managed to avoid the dilemma by
repeating the noun.

Discussion
The two most striking results to emerge from both the corpus-based analysis and the survey
can be summarized as follows. The first is that the traditional prescriptive rule that
antecedents must take gender- and number-matched pronouns is not highly respected. In the
journalistic corpus data, many reporters' pronoun choices hinge upon contingent,
context-dependent pragmatic, social and cultural factors. For example, the plural their is used to refer
to the singular antecedent anyone in one case and to the state Canada in another. The pronoun
he is also used to refer to the animal horse instead of the pronoun it. Moreover, the results
obtained from the survey show that native speakers are deeply divided, according to their age
and gender, about what pronoun to use when dealing with entities of unknown gender.
Moreover, the systematic way in which the language users' responses pattern around certain
pronouns provides evidence that the age factor, for instance, constrains their choices
and leads to apparent sensitivity of judgment relative to the given socio-cultural context. This
suggests that usage nowadays is changing under the pressure of social, cultural, and pragmatic
constraints.
Further evidence can be obtained from sentences (11-13) below, which are extracted
from the survey. They show how the third-person singular neuter it is used as the subject of a
passive construction where the more straightforward active construction could have been
used.

11. A. It is hoped that as a result, the public might view the offender in a more positive light.
B. I hope that the public might view the offender in a more positive light.
12. A. It was not known immediately what interest rates would be charged.
B. The minister did not know immediately what interest rates would be charged.
13. A. It also was announced that Bowman's squad had lost three players to injury.
B. The coach announced that Bowman's squad had lost three players to injury.

To begin with sentence (11) (A 001 28), the third person singular neuter it is used as the subject
of a passive construction. A possible interpretation of this choice is that, since newspaper
language has to be objective and unbiased, the writer tries to disassociate her/himself from the
potential hope expressed in the utterance, simply because s/he is expected to report facts
rather than to express feelings and hopes.
Let us consider for a moment how the sentence could have been stated otherwise: I
hope that the public might view the offender in a more positive light. The use of the first
person pronoun, together with the modal auxiliary might, expresses a strong hope on the part
of the writer, but at the same time, it can be viewed as a kind of mild imperative (an
ethical code requires that you should view the offender ...), or as a strong suggestion, which
automatically turns a simple utterance into a Face Threatening Act (FTA) (Brown and
Levinson 1987: 10).
The Face Threatening Act, which is closely related to the notion of politeness,
imposes many constraints on the linguistic choices language users make, both in spoken and
written discourse. Brown and Levinson (1987), in discussing the notion of politeness, have
proposed that face consists of two related aspects. Negative face refers to the want of
every individual that his actions be unimpeded by others (i.e., one's freedom of action and
freedom from imposition). Positive face refers to the want of every member that his wants
be desirable to at least some others (Brown and Levinson 1987: 61).
Brown and Levinson (1987) have also highlighted the options available to the speaker
who must decide whether and how to utter a Face Threatening Act, that is, an act which poses
a threat to either the positive or the negative face of the addressee. These options range from
simply not doing the FTA (off-record), to doing the act boldly, with little or no concern for
face (on record, without redressive action). Between these two options, for a speaker who
chooses to do the FTA but who wishes to show an appropriate concern for face, there are
various FTA-minimizing strategies and devices for mitigating the illocutionary force of
particular utterances (cf. Brown and Levinson 1987, Lakoff 1972, Leech 1983).
Therefore, one could say that the journalist uses a negative politeness strategy, that
is, the passive construction, in order to preserve the addressee's (the public's) negative face, as
well as to avoid any kind of impingement on their desire to be free from imposition.
Likewise, in example (12), it was not known immediately what interest rates would
be charged... (A009 83), the writer again chooses the passive construction without an agent,
in order to avoid putting the blame on anyone, on the authorities or on the particular president
of the institution in our case. If the reporter had used the third person plural they to mean
persons unspecified, or persons with responsibility (Halliday and Hasan 1976: 53), s/he
would have again performed an FTA, that is, s/he would have shown disapproval or
contempt, expressions that both threaten the addressee's positive face want, by indicating that
the speaker doesn't care about the addressee's feelings, wants, etc. (Brown and Levinson
1987: 66).
Something similar can be observed in (13): it also was announced that Bowman's
squad, which already had lost three players to injury (A 007 13 in the data). The pronoun
it here does not co-refer with a previous antecedent, but it occupies the subject role of the
utterance. The journalist may have used this construction in order to avoid attribution of
blame or responsibility to persons involved in the situation.
When the same examples were presented to native and non-native speakers out of
their context, they, as shown in Figure (11) below, overwhelmingly selected the active form.
Putting the results obtained from the analysis and the survey together shows how the
contextual meaning of individual examples shapes their structural form relative to what the
speaker intends her/his meaning to be (i.e., by means of a pragmatic rather than a syntactic or
semantic explanation). Clearly, pragmatics goes a step further than text and textual meaning,
clarifying what exactly a piece of language means to a given person, to the speaker or
addressee, in a given speech situation (Leech 1980: 80).
These pragmatic constraints on the occurrence of the third-person pronouns refute the
claim that since semantic interpretation is the study of what a piece of language means
(Leech 1980: 80), pragmatic explanation of any piece of spoken or written discourse is
redundant.

[Bar chart: percentage of passive and active choices for S11, S12 and S13 by native and non-native speakers]

Figure 11 Comparison of Ns and NNs' performance for S11-S13

The second striking result is that the information obtained on the spontaneous choices
of Americans as compared with those of non-native speakers paints a rather fuzzy picture.
However, a couple of patterns are worth mentioning here. Native speakers were more flexible
than non-native speakers in their choices of the plural they or the coordinate he or she to refer
to singular words with unspecified gender. When using they as a gender-free pronoun, native
speakers show social sensitivity in avoiding gender bias in their communities. Interestingly, a
clear interaction between age and gender in the native participants, influencing the use of
his/her, has been observed in many cases; younger males used it twice as much as older
males, but older and younger females used it most and to the same extent. Not surprisingly,
perhaps, older native speakers went for the masculine pronoun he to refer to non-specific
referents like anyone.
The data analysis highlights the problem of the nonexistence of gender-neutral
singular pronouns in English. An antecedent like student or anyone does not indicate whether
the referent is male or female. This study has shown the usage of third-person pronouns in
native and non-native speakers' completion of sentences extracted from Associated Press
news articles.
This is fair enough, if the issue is restricted to native speakers living in one society.
But can this be easily adopted by non-native speakers to become socially sensitive to the
culture-specific rules in English-speaking countries? Should EFL teachers and students
follow what prescriptive grammarians say, or study language as it is used by its speakers?
Although the plural they or the coordinate construction he or she is widely acceptable
nowadays in English-speaking societies to refer to gender-unknown singular words, their use
poses a real problem for non-native speakers who need systematic formal rules that can be
easily followed.
Clearly, non-native speakers of English tend to follow the prescriptive rule that a
pronoun must agree with its antecedent in gender and number without paying much attention
to social developments in the English-speaking communities. As the survey results indicate,
there is a clear gap between native and non-native speakers' performance on the choice of
third-person pronouns. This can be explained in two ways. First, it may be partly attributed to
language interference in which L2 learners in general, and Arabic-speaking learners in
particular, transfer the pronoun system of their native language to L2 (Al-Jarf 2010). Unlike
nouns in Arabic, which show grammatical gender, nouns (including indefinite pronouns) in
English (e.g., someone, anyone) do not display gender (Khalil 1999). Second, it may be
attributed to the lack of cultural knowledge and awareness on the part of the non-native
speakers. To bridge such a gap and avoid intercultural miscommunication, culture teaching is
badly needed to develop EFL students' cultural awareness and competence. Clearly, EFL
teachers need to integrate some cultural knowledge into classroom teaching of certain
grammar points.
These findings run counter to the prescriptive requirements for the use of third-person singular
pronouns. The overall impression one gets from the discussion above is that there is no
concrete, well-defined criterion as to what pronoun to use when talking about an entity with
unspecified gender. Language users, whether native or non-native, need to be sensitive to the
culture-specific rules in English-speaking countries in order to use third-person pronouns
appropriately. Some people may accept that it is important to raise EFL learners' and
teachers' awareness of native speakers' use, and to train them on how to notice the difference
in cultural orientations. Others may argue that non-native speakers should not be left at the
mercy of native speakers' attitudes and desires, and that they should not be left hanging
between strict prescriptive rules and users' actual practices or applications.

Summary and Conclusion


The purpose of this study is threefold: first, to highlight the factors that constrain co-reference
in American English journalistic texts using grammatical, textual and extralinguistic
parameters; second, to examine the extent to which native and non-native speakers of English
differ in terms of their use of third-person pronouns; and third, to measure the impact of the
factors of gender and age on language users' choice of one pronoun rather than another.
The analysis of the frequency and usage of third-person pronouns in Associated Press
articles has provided insights into their important role in achieving cohesion. It has also offered
a pragmatic explanation of the speakers' (reporters') intended meaning in order to account for
third-person pronouns as cohesive devices, which could not be interpreted either syntactically
or semantically. Moreover, the study has shown how the reporters' pronoun choices reflect
their relations toward participants' acts in the discourse. That is,
third-person pronouns, among other linguistic features, have displayed how reporters project
themselves and how they express associations or disassociations with others' acts.
In order to carry out a performance comparison of native and non-native speakers' use
of English third-person pronouns, thirteen sentences with tricky pronouns taken from the
corpus were presented in writing to two groups of native and non-native speakers. The results
have revealed that most native speakers chose third-person pronouns depending on the
socio-cultural context and pragmatic factors, tending to bend the formal rule of pronoun-antecedent
agreement, especially when dealing with gender-unspecified words. However, the majority of
non-native speakers showed an inclination to abide by the prescriptive rules of grammar,
demonstrating little social and cultural sensitivity.
This seems to imply that a treatment of third-person pronouns, or pronouns in general,
based on syntactic conditions alone, may not lead to a consistent and convincing explanation
of their behavior. Going a little bit further, the results of this study suggest that the choice of
different forms in a particular discourse type may be a matter of emotional reflection, as well
as a matter of particular linguistic needs and attitudes, which have to be taken seriously into
consideration well before attempting any kind of syntactic, semantic or pragmatic analysis.
This has been reflected, for example, in the native speakers' divided responses regarding the
antecedent child. In many cases, the results obtained from native speakers have shown an
interesting interaction between age and gender, influencing the use of one pronoun rather than
another. The performance of younger males or females was different in many cases from that
of older males and females. Pronoun-antecedent agreement has proven to be an area where it
is difficult to draw the line between standard and non-standard usage.
It should be noted that this study has not fully covered the broad topic of pronoun
usage. One limitation stems from the fact that the data was of a particular discourse type,
namely American newspaper reports, which are copy-edited according to prescriptive
stylebooks. Other limitations may be attributed to the participants' characteristics. However,
this study should, hopefully, provide insights into the reality of pronouns and the challenges
they pose to both native and non-native speakers.

Acknowledgements:
I am immensely grateful to Professors Mike Garman and Aziz Khalil for their invaluable
comments on an earlier draft of this paper. Special thanks go to Professor Steve Schwegler
for helping me recruit American participants for the survey. I also extend my deepest thanks
to the anonymous reviewers for The Linguistics Journal for their helpful remarks and
insightful suggestions. My special thanks are also due to all American and Palestinian
participants who willingly volunteered to complete the survey.

References
Al-Jarf, R. (in press). Interlingual pronoun errors in English-Arabic translation. King Saud
University. Retrieved July 21, 2013 from
http://faculty.ksu.edu.sa/aljarf/Publications/Forms/AllItems.asp
Brown, P. and S. C. Levinson (1987). Politeness: Some universals in language usage.
Cambridge: Cambridge University Press.
Brown, R. and A. Gilman (1960). The pronouns of power and solidarity. In T. A. Sebeok
(Ed.), Style in language (pp. 253-276). Cambridge, Mass: MIT Press.
Celce-Murcia, M. (1985). Making informed decisions about the role of grammar in
language teaching. TESOL Newsletter, 1, 4-5.
Christophersen, P. and A. Sandred. (1969). An advanced English grammar. London:
Macmillan.
Curzan, A. (2003). Gender shifts in the history of English. Cambridge: Cambridge University
Press.
Einsohn, A. (2011). The copyeditor's handbook: A guide for book publishing and corporate
communications, with exercises and answer keys. Berkeley: University of California
Press.
Fowler, H. W. (1965). Fowler's modern English usage. In E. Gowers (Ed.), A dictionary of
modern English usage (2nd ed.). London: Oxford University Press.
Fromkin, V., R. Rodman and N. Hyams. (2007). An introduction to language (8th ed.). New
York: Thomson Corporation.
Gocheco, P. (2012). Pronominal choice: a reflection of culture and persuasion in Philippine
political campaign discourse. The Philippine ESL Journal, 8, 4-25.
Greenbaum, S., R. Quirk, G. Leech and J. Svartvik. (1990). A student's grammar of the
English language. Essex: Longman.
Halliday, M. A. K. and R. Hasan. (1976). Cohesion in English. London: Longman.
Halliday, M. A. K. (1985). An introduction to functional grammar. London: Edward Arnold.
Holmes, J. (1998). Generic pronouns in the Wellington corpus of spoken New Zealand
English. Kotare: New Zealand notes and queries, 1(1), 32-40.
Johnson, S. (2004). Exploring the use of the 'they' pronoun singularly in English. California
Linguistics Notes, 29(1), 1-5.
Khalil, A. (1999). A contrastive grammar of English and Arabic. Amman: Jordan Book
Center.
Kroeger, P. R. (2005). Analyzing grammar: An introduction. Cambridge: Cambridge
University Press.
Lakoff, G. (1972). Hedges, fuzzy logic and multiple meaning criteria. Papers from the
Chicago Linguistic Society 8, 183-228.
Lakoff, R. T. (1984). Remarks on THIS and THAT. Papers from the Chicago Linguistic
Society 10, 345-356.
Leech, G. N. (1980). Explorations in semantics and pragmatics. Amsterdam: John
Benjamins.
Leech, G. N. (1983). The principles of pragmatics. London: Longman.
Lin, Yang-Yong (1988). The English pronoun of address: A matter of self-compensation.
Sociolinguistics, 2, 157-180.
Linde, C. (1979). Focus of attention and the choice of pronouns in discourse. In T. Givón
(Ed.), Syntax and semantics, 12. New York: Academic Press.
Lyons, J. (1975). Deixis as a source of reference. In E. L. Keenan (Ed.), Formal semantics of
natural language (pp. 61-83). Cambridge: Cambridge University Press.
Lyons, J. (1977). Semantics. Cambridge: Cambridge University Press.
Madson, L. and R. Hessling. (2001). Readers' perceptions of four alternatives to masculine
generic pronouns. Journal of Social Psychology, 141(1), 156-158.
Mair, Ch. and G. N. Leech. (2006). Current changes in the English syntax. In B. Aarts and A.
McMahon (Eds.), The Handbook of English Linguistics (pp. 318-342). Oxford:
Blackwell.
Mangan, L. (2010). All style and substance. The Guardian. Retrieved 24 July, 2010 from
http://www.theguardian.com/lifeandstyle/mind-your-language/2010/jul/24/styleguide-grammar-lucy-mangan
Partington, A. (2003). Politics, power and politeness. In A. Partington (Ed.), The linguistics of
political argument (pp. 124 - 155). London and New York: Routledge.
Quirk, R., S. Greenbaum, G. N. Leech and J. Svartvik. (1972). A grammar of contemporary
English. London: Longman.
Quirk, R., S. Greenbaum, G. N. Leech and J. Svartvik. (1985). A comprehensive grammar of
the English language. London: Longman.
The Chicago Manual of Style (16th ed.). (2010). Chicago: University of Chicago Press.

Appendix I Use of Third-Person Pronouns in English


Dear colleague,
The purpose of this mini-research project is to survey native speakers' opinions about the use of the third-person
pronouns. I would be grateful if you could spare ten minutes to do the following exercises. Don't worry about
any rules you may have learnt about what proper or correct English is. Work as quickly as you can; what we
are interested in is your immediate reaction.
Thanks for your co-operation
Native Speaker (......................)
Non-native speaker (..............................)
Age (........................)
Sex (...........................................)

Part I: Fill in each blank in the following sentences with an appropriate third-person pronoun and briefly
explain why.

Every student must bring (1) books to class.

The child learns to speak the language of (2) environment.

Ridden by jockey Aki Kato, Tally Ho the Fox, scored (3) second consecutive stakes win.

When the average person walks into a bank, (4) looks over brochures in the lobby.

It was a singular act of courage on the part of Canada to spirit out of Iran a group of diplomats who were not
even (5) own citizens.

I don't think anyone would approve of having (6) children attend classes in this setting.

The hairdresser turned down the offer and (7) returned inside.

The blacksmith remained silent and (8) refused to leave the coach.

The Titanic was massive because (9) killed thousands and thousands of people.

The offender argued logically and calmly. This could eventually help change the attitudes of the taxpayers
and officials, who are in a position to give more support to (10) as well as to the victims.

Part II: Which sentence would you prefer to use in your writing? Please tick the box next to it.
11. A. It is hoped that as a result, the public might view the offender in a more positive light.
B. I hope that the public might view the offender in a more positive light.
12. A. It was not known immediately what interest rates would be charged.
B. The minister did not know immediately what interest rates would be charged.
13. A. It also was announced that Bowman's squad had lost three players to injury.
B. The coach announced that Bowman's squad had lost three players to injury.

Thank you

An Analysis of Learner Use of Argument Structure Constructions: A Case of Thai
Learners Using the Passive and Existential Constructions in English
Napasri Timyam
Kasetsart University
napasrit@yahoo.com
Bioprofile: Napasri Timyam earned her Ph.D. degree in Linguistics from the Department of
Linguistics, University of Hawaii at Manoa, USA. She is currently an assistant professor at
the Department of Foreign Languages, Faculty of Humanities, Kasetsart University, Thailand.
Her current research interests include syntactic theory, Thai learners of ELF, and acquisition
of child Thai.
Abstract
Taking the Construction Grammar (CxG) and English as a Lingua Franca (ELF) approaches
together, this study examined whether Thai learners' use of the English passive and
existential constructions deviated from the native speaker norms and how such deviations
reflected the general, universal characteristics of ELF. Data were taken from 70
English-major students who represented ELF speakers at the upper-intermediate level. Two kinds of
writing tasks were designed: writing with prompts and free essay writing.
The results revealed that passive and existential sentences produced by Thai learners,
compared to those of native speakers, are much more limited in structural complexity and in
semantic and pragmatic functions. Moreover, the results reflected that Thai learners' use of
the English clausal constructions is also governed by three general and universal
characteristics, i.e., simplicity, regularity, and analogy, which have been found in the
phonology and pragmatics of different varieties of ELF.
The study extended the CxG scope from L1 settings to L2 phenomena by showing the
differences in constructional use between native and non-native speakers, which can be used
as guidelines for teaching argument structures to English learners. It also broadened the scope
of ELF research; ELF deviations at all levels (sounds, words, phrases, discourse, and also
sentences) are governed by some universal characteristics which reflect speakers' motivation
to shape English in the direction that results in a simple and effective form of communication.

Keywords: argument structure constructions, varieties of ELF, the passive construction, the
existential construction

Introduction
In the theory of Construction Grammar (CxG), all levels of description in language lie in the
notion of construction, which refers to a pairing of form and meaning. Morphemes, words,
idioms, and phrasal patterns are all constructions since they are instances of form-meaning
correspondences (Fillmore 1988). Generalizations about particular arguments being topical,
focused, inferable, etc., as well as facts about the actual use such as frequencies are also
stated as part of the constructional representation (Goldberg 2002, 2009). Such a perspective on
constructional properties suggests a more precise definition of a construction implied in the
theory, i.e., an association of form, meaning, and use.
Clause-level syntactic patterns, often referred to as argument structures, are one type
of construction because they are associated with a particular form, meaning, and use. A
fundamental idea behind the CxG approach to argument structure constructions is that they
designate event types, which are basic to human experience. The meanings of these event
types are rather general and abstract (Goldberg 1995). For instance, in English, the transitive
construction (of the form Subject-Verb-Object, as in Pat opened the door) denotes something
acting on something; the ditransitive construction (Subject-Verb-Object1-Object2, as in Pat
gave Jill a gift) denotes possessive transfer from one participant to another.
Compared to constructions at the lower levels, argument structures are more difficult
to acquire. When English-speaking children encounter new words, for example, they can
quite quickly pick up the form and meaning of those unfamiliar expressions from the
immediate context. In contrast, properties of an argument structure are general and abstract.
Children need to be exposed to a number of instances of one argument structure before they
can make generalizations about the form, meaning, and use inherently attached to that
construction.
Children learn their first language by making generalizations and drawing conclusions
based on the linguistic input they have received. They tend to lose this innate linguistic ability
when they grow up (Bley-Vroman 1988), and the process of learning a second language is
more explicit and depends heavily on instructors' explanations. Based on this fact in the
acquisition literature, the task of learning an argument structure becomes even more
challenging for second language learners. While some constructional features are noticeable
and easy to describe, many general and abstract constructional features are hard to explain. In
order to appropriately use one argument structure, English learners need to recognize all of its
syntactic, semantic, and pragmatic principal properties. Given that their deviations in using
clausal patterns are often found, it is evident that this is not always the case.
Nevertheless, deviations produced by English learners do not always occur
sporadically. According to research on English as a Lingua Franca (ELF), an approach to the
study of English used for communication among people speaking different languages, there is
a universal motivation underlying learners' usage of English. In other words, there are some
general characteristics which are produced by English learners across language backgrounds,
including repetition, explicitness, and regularization. Due to these universal features,
researchers adhering to this approach believe that the notion of ELF encompasses not only the
use of English internationally, but also the use and modification of a particular form of
English which does not necessarily conform to native speaker norms (Dewey 2007, Jenkins
2006, Seidlhofer 2001).
Based on this tenet of the ELF approach, ELF researchers regard common and
systematic forms of deviations as variations which are part of the natural process of
language contact and language change, rather than errors caused by incomplete acquisition
of the target language. Moreover, they hold that despite differences in minute detail, different
groups of L2 users have developed ELF in a rather similar direction, with general and
universal characteristics underlying the use of constructions at all levels.
Taking the two lines of the CxG and ELF approaches together, this study aims to
investigate learners' use of argument structure constructions in English. Since a clausal
construction possesses general and abstract properties, the study hypothesizes that English
learners do not acquire all features associated with the construction. As a result, their use of
the construction deviates from the native speaker norms. Yet such differences should be
neither completely unexpected nor a source of disunity, because they are partly governed by the
universal principles of second language usage which have been observed in numerous varieties of ELF,
particularly in their phonological and pragmatic features (Cogo and Dewey 2006, Seidlhofer
2004).
The study focuses on Thai learners using the passive and existential constructions in
English. The passive is a construction in which the subject corresponds to the theme, as in
The glass was broken by the boy. The existential construction expresses the existence of an
entity, as in There are two books on the table. These constructions were chosen for the case
study for two major reasons. First, both constructions are known to possess a number of
linguistic and pragmatic properties, which trigger variations in the use of English learners.
Second, the two constructions have their unmarked counterparts. The active (as in The boy
broke the glass) is the unmarked structure for the passive while the non-existential structure
(as in Two books are on the table) is the unmarked version of the existential-there sentence.
Thus, the use of the passive and existential constructions is a linguistic option. Native
speakers choose them over the more basic structures due to very specific properties. It is
difficult for L2 learners to differentiate between the alternative structures and recognize all
properties particularly attached to each of the two constructions. In sum, by dealing with the
passive and existential constructions, the objectives of the study are: (1) to investigate Thai
learners' use of the English constructions, in comparison with the native speaker norms, and
(2) to analyze the deviations in terms of the general, universal characteristics of ELF.

Literature Review
The review of the literature covers four areas: (1) CxG, (2) ELF, (3) the English passive
construction, and (4) the English existential construction.

Construction Grammar
The basic tenet of CxG is that constructions, i.e., form-meaning correspondences, constitute the
basic units of language (Goldberg 1995). The main objective of the theory is to account for the
full range of facts in language on the basis of the various types of constructions available in
human languages.
Argument structures hold a special interest in the theory. This type of construction is
marked by syntactic, semantic, and pragmatic properties. According to the Principle of No
Synonymy of Grammatical Forms, the form of a construction is very specific; even slight
changes in a sentence structure can result in differences in meaning, either denotational or
pragmatic (Goldberg 1995). Thus, pairs of alternating sentences such as an active
and its passive counterpart belong to different constructions that denote subtle differences in
meaning. Semantically, an argument structure designates a scene basic to human experience,
and its meaning can be polysemous, having a family of different, but related senses. As a
result, there are semantic variations in the way speakers use a construction. For example,
while the English ditransitive typically expresses successful transfer, some ditransitive
sentences denote other related senses of transfer, including future transfer, intended transfer,
and negation of transfer. Pragmatically, the use of a construction varies along different kinds
of pragmatic dimensions, such as packaging of information structure, grammatical heaviness,
and register. All of these properties in form, meaning, and use contribute to the existence of
an argument structure construction in a language.

English as a Lingua Franca


Lingua Franca refers to a language used as a means of communication between people who
speak different languages. Speakers of English as a lingua franca are those who have learned
English as an additional language, and to whom it serves as the most useful instrument for
communication that cannot be conducted in the mother tongue, be it in business, casual
conversation, science, politics, etc. (Seidlhofer 2001).
With the growing role of non-native speakers and the increased acceptance of
various forms of English in the era of globalization, ELF, a recent approach to the study of
English, has emerged. The growing body of ELF research has revealed the patterns of
change and linguistic fluidity emerging in the way English is transformed in lingua franca
interaction (Dewey 2007, Jenkins, Cogo and Dewey 2011). Close examination of a number of
features, mostly at the levels of phonology and pragmatics, reflects the underlying motivation
of ELF speakers. That is, there is a tendency to shape the language in a direction that
renders communication simple and effective. As Breiteneder (2009) pointed out,
this is a universal tendency for second language usage; speakers from different lingua cultures
who enter into intercultural communication situations usually shift their focus to simplicity
and effectiveness. ELF researchers (e.g., Breiteneder 2009, Cogo and Dewey 2006, Dewey
2007) have summarized a set of general characteristics found in ELF interactions among
speakers across various linguistic and cultural backgrounds. Table 1 presents these shared
characteristics, all of which contribute to simplicity and effectiveness in communication.

Table 1 General and Universal Characteristics of ELF

Characteristic                Definition
Repetition                    ELF speakers often repeat their words and other speakers'
                              words. Repetition is an accommodation strategy to achieve
                              efficiency of communication, signal agreement and alignment,
                              show attention and engagement in the conversation, and
                              establish cohesion.
Explicitness and redundancy   Extra words are inserted to ensure clarity of the conversation.
Simplification                Complex forms are replaced by simple, shortened forms.
                              Complex rules are simplified.
Regularization                ELF speakers make use of rule regularizations to make the
                              rules more general and consistent and to avoid exceptions.
Analogy                       ELF speakers prefer generalizing uses of expressions to all or
                              more varied contexts on the basis of predominant cases.

Table 2 NS Norms of the Passive Construction

Property                      NS Norm
Syntax                        A passive verb appears in many forms, with various tenses,
                                aspects, and auxiliaries.
                              An agent is usually omitted when it is unknown or irrelevant
                                to the point being discussed, when it is predictable by the
                                context or world knowledge, and when it refers to people in
                                general.
                              An agent is retained when it conveys new information.
                                Typically, it is introduced by the preposition by.
Semantics and pragmatics      The theme functions as the topic of a passive sentence; it
                                usually expresses given information.
                              Speakers tend to choose the passive when an agent at
                                sentence-final position is structurally heavy.
Non-basic passives            The get passive is used with an event whose subject is partly
                                responsible for the result, or which happens unexpectedly.
                              The ditransitive passive is formed from a ditransitive verb
                                (e.g., she was sent a note).
                              The prepositional passive is formed from an intransitive verb
                                that occurs with a preposition (e.g., the project was
                                thought about).

The English Passive Construction


The passive is the construction in which the theme, instead of the agent, is linked to the
subject. As the passive sentence in (1) illustrates, the theme NP (the thieves) serves as the
subject, and the agent NP (police) is downgraded to an oblique.

(1)

The thieves were caught by police.

In terms of form, the passive structure includes a theme subject, a passive verb form
(usually consisting of be and a past participle), and an optional agent phrase. As to meaning, a
passive sentence is used to talk about an action from the viewpoint of the theme. Apart from
this basic form and meaning, the English passive is associated with a set of syntactic,
semantic, and pragmatic properties. The major characteristics of the construction as
frequently discussed in the literature (e.g., Downing and Locke 2006, Finegan 2004, O'Grady
2001, Parrott 2000) are listed in Table 2.

Table 3 NS Norms of the Existential Construction

Property                      NS Norm
Syntax                        The form of be is varied, with various tenses, aspects, and
                                auxiliaries.
                              In addition to be, a small number of verbs appear in the
                                construction. Most are intransitive verbs.
                              The displaced subject denotes countable, uncountable, or
                                abstract entities.
                              The displaced subject tends to be long, having various kinds
                                of modifiers.
                              The bare existential structure contains there, be, and a
                                displaced subject. The extended existential structure also
                                contains an extension, often a locative or temporal
                                expression.
                              Existential sentences often appear in the declarative form
                                and in the simple structure.
Semantics and pragmatics      The existential construction typically serves a
                                presentational function. It draws an addressee's attention
                                to the displaced subject.
                              The displaced subject typically conveys new information; its
                                position is usually occupied by an indefinite noun phrase.

The English Existential Construction


The existential construction expresses the existence of an entity. The locative expression
functions as the expletive subject which appears in the form of unstressed there.

(2)

There were ten students in the classroom.


The existential construction requires an unusual agreement pattern (O'Grady 2001).
As the example in (2) shows, the verb agrees with the pivot noun phrase that follows, rather
than with the expletive subject there, which is neutral for number. As a result, the pivot
nominal is called a displaced subject, i.e., the real subject that is moved from the pre-verbal
original position to the position after the verb.
In terms of form, the existential structure consists of the expletive there, the verb be, a
displaced subject, and an optional extension. As to meaning, an existential sentence denotes
the presence of something. Apart from this basic form and meaning, the English existential
construction is associated with a set of syntactic, semantic, and pragmatic properties. The
major characteristics of the construction as frequently discussed in the literature (e.g., Collins
2002, Downing and Locke 2006, Huddleston and Pullum 2005, O'Grady 2001) are listed in
Table 3.

Research Methodology
The study employed a qualitative approach: a writing task with prompts and a free writing
task were assigned to collect data, and the results were interpreted in terms of the common and
systematic characteristics of Thai learners' use of the English passive and existential
constructions. The details of the subjects and instruments are as follows:

Subjects
Since their deviations should reflect systematic variations, not sporadic errors of beginning
learners, the target population was upper-intermediate Thai ELF learners who had received
formal instruction in English and had been schooled to conform to Standard English norms
over several years. Both the purposive and random sampling procedures were used to select
the representatives of the population. That is, the subjects were among those who met the
following language criteria. First, undergraduate students majoring in English at Kasetsart
University who had been in the program for more than one year were targeted since they had
studied the four skills of English extensively (listening, speaking, reading, and writing),
especially during the period of their study at the university. Second, to ensure that they had
upper-intermediate level English knowledge and skills, only those with an average grade of
over 3.25 for all English classes taken at the university were considered. The subjects were
randomly selected from this group of students who met the two criteria.
Subjects meeting the selection criteria were in the third and fourth years of their study.
They were in the regular and special programs of English, affiliated with the Department of
Foreign Languages, Faculty of Humanities. The two programs shared the same curriculum;
they differed only in the class times. There were 139 third-year students and 122 fourth-year
students, yielding 261 third-year and fourth-year students in the two English programs. 35
third-year students and 35 fourth-year students were chosen to participate in the study. Of
these 70 subjects, 50 (71.4%) were female, and 20 (28.6%) were male; 40 subjects (57.1%)
studied in the regular program while 30 (42.9%) studied in the special program. The average
age of all the subjects was 22, and the average number of years of English study was 16.
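
The random selection step described above can be illustrated with a minimal sketch. It assumes that the 139 third-year and 122 fourth-year students form the pools from which the subjects were drawn, which the text does not state explicitly, and it uses hypothetical student IDs. The study's actual selection procedure is not specified, so this is an illustration rather than a reconstruction.

```python
import random


def draw_subjects(pools, per_year, seed=None):
    """Draw `per_year` subjects at random, without replacement, from each year's pool."""
    rng = random.Random(seed)  # a fixed seed only makes the illustration repeatable
    return {year: rng.sample(pool, per_year) for year, pool in pools.items()}


# Hypothetical ID lists; only the pool sizes (139 and 122) come from the text.
pools = {
    "third-year": [f"T{n:03d}" for n in range(1, 140)],
    "fourth-year": [f"F{n:03d}" for n in range(1, 123)],
}

subjects = draw_subjects(pools, per_year=35, seed=1)
print({year: len(chosen) for year, chosen in subjects.items()})
```

Sampling each year separately guarantees the equal split of 35 third-year and 35 fourth-year subjects reported above, which a single draw from the combined 261 students would not.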

Instruments
Two types of writing tasks were designed. In order that the subjects could concentrate on
their writing, they were assigned to do the tasks in two separate sessions, which took place on
different days. There was no time limit on finishing each task; however, most subjects could
finish within two hours. The designs and instructions of the tasks are as follows:

Writing Task with Prompts


The writing task with prompts included two sub-tasks: picture description and Thai-English
translation. For the first sub-task, three pictures depicting people doing various activities in
different places (such as a beach where people were doing relaxing activities) were prepared.
Based on several syntactic studies which have demonstrated that production of the target
structure is likely to be enhanced by using lexical items as prompts (e.g., McDonough and
Kim 2009), twelve expressions relevant to the scene depicted in each picture were given.
They included five critical items that prompted the target structures (i.e., the verb be, the verb
get, two past participle forms, and the expletive there) and seven fillers, which were related to
other constructions or provided unfamiliar vocabulary (e.g., on vacation, easel). To minimize
hints to the students, words corresponding to the same target structure (e.g., there and be,
be/get and a past participle) were placed separately, with one or more fillers between them.
Each subject was randomly presented with one of the three pictures. On the top of the
picture, there was an instruction to write what they saw by using about ten to twelve
sentences. A list of twelve lexical items was provided in the box below the instruction. The
subjects were encouraged to use the given expressions in their description of the picture, and
they were allowed to use any of those expressions more than once.
As to the second sub-task, a test containing eight Thai sentences was constructed. Two
sentences were targeted at the passive construction; two were targeted at the existential
construction; and the other four served as fillers. A list of twelve expressions relevant to the
content in the eight sentences was provided. The list included five critical items prompting
the target structures (the verb be, the verb get, two past participle forms, and the expletive
there) and seven fillers related to other constructions or providing unfamiliar vocabulary. The
critical items of the same target structure were placed separately.
The students were instructed to translate all sentences into English. They were
encouraged to use the given expressions in the box below the instruction. They were allowed
to use any of the expressions more than once.

Free Writing Task


For the second session of the tasks, the subjects were asked to write one essay on a topic of
their own interest or choose one of six suggested topics. Three of these topics were nonacademic (e.g., my favorite hobbies) while the other three were concerned with more
serious or academic issues (e.g., the problem of deforestation in my country). The objective
of this task was to stimulate the subjects to tell stories they were interested in concerning
relaxing or serious topics from their own experience and linguistic knowledge by using
expressions and structures they were familiar with.
The subjects were given sheets of paper; instructions were in English. They were told
to write an essay (approximately 1,200-1,500 words) about one topic. Moreover, to ensure
that the target sentences obtained from the task would be sufficiently substantial for the
analysis, the researcher encouraged the subjects to write more than one essay on different
topics. Since all of the subjects had taken several English writing classes, most of them were
able to write on two topics and some subjects could finish three topics in one writing session.
The total number of the essays written by the 70 subjects was 158.

Results
This section is divided into three parts. The first part involves the passive construction. The
second part discusses the existential construction. The last part analyzes how the Thai
learners' use of the constructions reflects the general, universal characteristics of ELF.

Thai Learners' Use of the English Passive Construction
Table 4 presents the number of passive sentences and passive verb phrases taken from each
task and sub-task. Since several sentences contained more than one passive verb phrase, the
number of the passive verb phrases outnumbered that of the passive sentences.

Table 4 Number of Passive Sentences and Passive Verb Phrases

Task/Sub-task           Number of Passive Sentences         Number of Passive Verb Phrases
Picture description     65                                  70
Translation             271                                 277
Essay writing           455                                 501
Total                   791                                 848

Of these three data sources, the sentences from the students' essays are considered the
best indicator of how the Thai students used the English construction. The sentences from
essay writing are naturalistic, or naturally occurring data; the students produced these
sentences from their own linguistic repertoire, with no hints or stimulation to use any
particular features through the provided word prompts, pictures, or Thai counterpart
sentences. Accordingly, the results from the essay and the writing with prompts are presented
separately for both the passive and existential constructions. This is to see whether the results
from the naturalistic data and elicited data supported each other regarding the Thai students'
use of the constructions.

Table 5 Passive Verb Forms

Essay Writing                                   Picture Description & Translation
Verb Form                       Frequency       Verb Form                       Frequency
Present simple                  232 (46.3%)     Present simple                  147 (42.4%)
Past simple                     67 (13.4%)      Present perfect                 60 (17.3%)
The modal can                   56 (11.2%)      The modal can                   53 (15.3%)
Present perfect                 29 (5.8%)       Past simple                     38 (11%)
The modal should                26 (5.2%)       Future simple                   34 (9.8%)
To infinitive                   25 (4.9%)       To infinitive                   4 (1.2%)
Future simple                   23 (4.6%)       Present participle & gerund     4 (1.2%)
Present participle & gerund     8 (1.6%)        Present continuous              2 (0.6%)
The modal may                   8 (1.6%)        Present perfect continuous      2 (0.6%)
The modal could                 7 (1.4%)        Past continuous                 1 (0.2%)
The modal must                  4 (0.8%)        The modal would                 1 (0.2%)
The modal have to               4 (0.8%)        The modal may                   1 (0.2%)
The modal would                 3 (0.6%)
Bare infinitive                 2 (0.4%)
Present continuous              2 (0.4%)
The modal might                 2 (0.4%)
Past continuous                 1 (0.2%)
Past perfect                    1 (0.2%)
Imperative                      1 (0.2%)
Total                           501             Total                           347

1. Passive Verb Forms


Although they used a variety of passive verb forms, the Thai students wrote passive
sentences predominantly in the present simple tense in essay writing (46.3%) and in the
picture description and translation (42.4%). Since Thai verbs are not inflected for time
reference, this finding suggests that many Thai students generalize the present simple tense to
various situations: not only facts or timeless events but also other situations whose time
reference they do not want to specify.

2. Auxiliary Verbs
The students' passive sentences were predominantly formed by the typical passive auxiliary
verb be in essay writing (97.2%) and in the picture description and translation (96.8%). This
finding shows that Thai students usually produce the basic form of the English passive verb
phrase; variant forms containing other auxiliaries are uncommon.

Table 6 Auxiliary Verbs

Essay Writing                                   Picture Description & Translation
Auxiliary                       Frequency       Auxiliary                       Frequency
be                              487 (97.2%)     be                              336 (96.8%)
become                          6 (1.2%)        get                             10 (2.9%)
get                             4 (0.8%)        seem                            1 (0.3%)
feel                            3 (0.6%)
look                            1 (0.2%)
Total                           501             Total                           347

3. The Agentless Passive


The students frequently omitted agent phrases in essay writing (77.4%) and in the picture
description and translation (66%). This suggests that Thai students perceive the most distinctive
pragmatic property of the construction, i.e., presenting an event from the perspective of the
theme, which makes the agent less prominent and very often leads to its elimination from
the structure (O'Grady 2001). Because they are aware of the agent's downgraded status,
Thai students tend to produce the passive without explicitly identifying the doer of the action.
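
As a worked check of how these shares relate to the raw counts, the following minimal sketch recomputes the two percentages from the agentless-passive counts and totals reported in Table 7. The function name, the dictionary keys, and the one-decimal rounding are illustrative assumptions rather than part of the study.

```python
def share(count, total):
    """Return `count` as a percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# (agentless passives, all passive verb phrases) per data source, from Table 7
agentless = {"essay": (388, 501), "prompted": (229, 347)}

shares = {task: share(count, total) for task, (count, total) in agentless.items()}
print(shares)
```

The same helper applies to any frequency cell in the tables, since each percentage is a count divided by its column total.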

Table 7 Agentless Passive

Essay Writing                                   Picture Description & Translation
Agent Phrase                    Frequency       Agent Phrase                    Frequency
Passives with no agent          388 (77.4%)     Passives with no agent          229 (66%)
Passive with an agent           113 (22.6%)     Passive with an agent           118 (34%)
Total                           501             Total                           347

4. Contexts for Agent Omission


The students most often omitted the agent phrase when it was unidentified or irrelevant to the
point being discussed in essay writing (61.3%) and in the picture description and translation
(65.1%). This is the context that is also most typical in native speaker English (Finegan
2004). This finding supports the result reported in the previous subsection. Thai students perceive the
English passive as the structure for downgrading the agent role; thus, they tend to choose the
passive when they do not know or are not interested in the agent.

Table 8 Contexts for Agent Omission

Essay Writing                                   Picture Description & Translation
Context                         Frequency       Context                         Frequency
Unknown or irrelevant           238 (61.3%)     Unknown or irrelevant           149 (65.1%)
Predictable by context          87 (22.4%)      Referring to people             63 (27.5%)
Referring to people             41 (10.6%)      Predictable by context          16 (7%)
Predictable by world knowledge  22 (5.7%)       Predictable by world knowledge  1 (0.4%)
Total                           388             Total                           229

5. Prepositions of the Agent Phrases


The students mostly put the preposition by before the agent phrase in essay writing (76.1%)
and in the picture description and translation (60.2%). This finding reveals once again that
Thai students usually write passive sentences of the basic, typical pattern; they frequently use
the typical preposition by as the agent marker.

Table 9 Prepositions of Agent Phrases

Essay Writing                                   Picture Description & Translation
Preposition                     Frequency       Preposition                     Frequency
by                              86 (76.1%)      by                              71 (60.2%)
to                              11 (9.7%)       from                            41 (34.7%)
with                            5 (4.4%)        with                            6 (5.1%)
because of                      4 (3.5%)
due to                          4 (3.5%)
from                            3 (2.7%)
Total                           113             Total                           118

Table 10 Weight of Agent Phrases

Essay Writing                                   Picture Description & Translation
Weight                          Frequency       Weight                          Frequency
1-2 words                       51 (45.1%)      1-2 words                       37 (31.4%)
3-4 words                       28 (24.8%)      3-4 words                       35 (29.7%)
5-6 words                       11 (9.7%)       5-6 words                       37 (31.4%)
7-8 words                       7 (6.2%)        7-8 words                       6 (5.1%)
9-10 words                      7 (6.2%)        9-10 words                      1 (0.8%)
11-12 words                     2 (1.8%)        11-12 words                     1 (0.8%)
13 words or more                7 (6.2%)        13 words or more                1 (0.8%)
Total                           113             Total                           118

6. Weight of the Agent Phrases


The agent phrases mostly belonged to the two lightweight groups containing not more than
four words in essay writing (69.9%) and in the picture description and translation (61.1%).
This indicates that Thai students do not associate the construction with the end-weight
principle. For English speakers, the passive is preferred when the retained agent phrase is
long because it is allowed to occur at the end of the sentence, the usual position for a heavy
element in the language (Downing and Locke 2006). For Thai students, however, the agent
phrase tends to be short. This finding is not surprising given that the principal pragmatic
property of the passive is concerned with the theme being topical and the agent being
downgraded. Many Thai students are aware only of this distinctive pragmatic property, which
involves the omission of the agent, and they do not recognize additional functions, including
the end-weight principle, which involves the presence of the agent.

7. The Theme Subjects


The theme subjects were often expressed as given and definite noun phrases in essay writing
(43.3%) and in the picture description and translation (32.8%). In fact, the most common
correlation in the sub-tasks with prompts was new and indefinite subjects (34.9%). Since
these sub-tasks gave the pictures and sentences for translation with no prior context, the
students were likely to present the subject nouns mentioned for the first time as new and
indefinite. Yet, a close relation between given information and definiteness could be
identified. Therefore, in general, Thai students produce passive subjects showing the most
typical correlation between information structure and definiteness. The subjects of their
passive sentences, like those of native speakers, are usually given and definite.

Table 11 Theme Subjects

Essay Writing                                   Picture Description & Translation
Theme Subject                   Frequency       Theme Subject                   Frequency
Given & definite                217 (43.3%)     New & indefinite                121 (34.9%)
Given & indefinite              103 (20.5%)     Given & definite                114 (32.8%)
New & indefinite                80 (16%)        Given & indefinite              68 (19.6%)
New & definite                  76 (15.2%)      New & definite                  41 (11.8%)
Dummy it                        20 (4%)         Dummy it                        2 (0.6%)
Interrogative pronoun           5 (1%)          Interrogative pronoun           1 (0.3%)
Total                           501             Total                           347

8. Sentence Types by Grammatical Structures


The students often produced passive sentences in two structures, the simple and the complex,
in essay writing (81.1%) and in the picture description and translation (96.4%).
This means that when writing in English, Thai students often express their idea in one
independent clause, i.e., the simple structure, which is considered the basic sentence structure.
In cases where they want to expand the message, they usually do it by adding one or more
dependent clauses to the independent clause, resulting in the complex structure.

Table 12 Sentence Types by Grammatical Structures

Essay Writing                                   Picture Description & Translation
Sentence Type                   Frequency       Sentence Type                   Frequency
Complex                         235 (51.6%)     Simple                          258 (76.8%)
Simple                          134 (29.5%)     Complex                         66 (19.6%)
Compound-complex                58 (12.7%)      Compound                        10 (3%)
Compound                        28 (6.2%)       Compound-complex                2 (0.6%)
Total                           455             Total                           336

9. Sentence Types by Communicative Purposes


The predominant sentence type of the passive produced by the students in essay writing and
in the picture description and translation was the declarative (95.8% and 99.4%, respectively).
As with several results above, this finding suggests that Thai students usually produce
passives of the basic form; most passive sentences belong to the declarative structure, which
is considered the canonical sentence type.

Table 13 Sentence Types by Communicative Purposes

Essay Writing                                   Picture Description & Translation
Sentence Type                   Frequency       Sentence Type                   Frequency
Declarative                     480 (95.8%)     Declarative                     345 (99.4%)
Indirect interrogative          17 (3.4%)       Indirect interrogative          2 (0.6%)
Direct interrogative            4 (0.8%)
Total                           501             Total                           347

10. Basic and Non-Basic Passives


The passive sentences mostly belonged to the basic passive structure in essay writing (98%)
and in the picture description and translation (97.1%). Once again, the finding shows Thai
students' preference for the basic structure; they tend to produce passive sentences of the
basic type. Non-basic passives are rare.

Table 14 Basic and Non-Basic Passives

Essay Writing                                   Picture Description & Translation
Passive Type                    Frequency       Passive Type                    Frequency
Basic                           491 (98%)       Basic                           337 (97.1%)
Ditransitive passive            6 (1.2%)        Get passive                     10 (2.9%)
Get passive                     4 (0.8%)
Total                           501             Total                           347

Thai Learners' Use of the English Existential Construction


Table 15 presents the number of existential sentences and clauses taken from each task and
sub-task. Since some sentences contained two existential clauses, the number of the
existential clauses was a little higher than that of the existential sentences.

Table 15 Number of Existential Sentences and Clauses

Task/Sub-task           Number of Existential Sentences     Number of Existential Clauses
Picture description     121                                 125
Translation             162                                 163
Essay writing           244                                 248
Total                   527                                 536

1. Verb Forms
The students wrote existential sentences mainly in the present simple tense for essay writing
(87.1%) and the picture description and translation (93.1%). This shows that many Thai
students generalize the use of the present simple tense to talk about not only facts and habits,
but also other event types in which they do not want to clarify the time reference.

Table 16 Verb Forms

Essay Writing                                   Picture Description & Translation
Verb Form                       Frequency       Verb Form                       Frequency
Present simple                  216 (87.1%)     Present simple                  268 (93.1%)
Past simple                     16 (6.5%)       Past simple                     10 (3.5%)
Future simple                   6 (2.4%)        Present perfect                 5 (1.7%)
Present perfect                 2 (0.8%)        Future simple                   4 (1.4%)
The modal may                   2 (0.8%)        The modal might                 1 (0.3%)
The modal would                 2 (0.8%)
The modal must                  2 (0.8%)
Past perfect                    1 (0.4%)
The lexical verb seem to        1 (0.4%)
Total                           248             Total                           288

2. Types of Verbs
The students overwhelmingly chose the typical verb be in essay writing (98%) and in the
picture description and translation (100%). The finding reflects that Thai students usually
produce existential sentences of the basic form. Moreover, it suggests that they consider the
form there + be an essential part of the construction; they treat this specific pattern as an
idiomatic expression whose elements always co-occur and do not allow much variation.

Table 17 Types of Verbs

Essay Writing                                   Picture Description & Translation
Verb                            Frequency       Verb                            Frequency
be                              243 (98%)       be                              288 (100%)
come                            2 (0.8%)
come up with                    1 (0.4%)
remain                          1 (0.4%)
seem to be                      1 (0.4%)
Total                           248             Total                           288

3. Types of the Displaced Subjects


The students strongly associated the existential construction with countable nouns in essay
writing (85.5%) and in the picture description and translation (95.1%). This suggests that Thai
students use the construction mainly to talk about the presence of countable, discrete entities.
This is not surprising because countable nouns are the most common type of nouns, and all
the results reported so far have shown that Thai students tend to use the basic, typical forms
of the English constructions.

Table 18 Types of Displaced Subjects

Essay Writing                                   Picture Description & Translation
Displaced Subject               Frequency       Displaced Subject               Frequency
Countable noun                  212 (85.5%)     Countable noun                  274 (95.1%)
Abstract noun                   15 (6.1%)       Uncountable noun                13 (4.5%)
Uncountable noun                13 (5.2%)       Countable & uncountable nouns   1 (0.4%)
Indefinite pronoun              8 (3.2%)
Total                           248             Total                           288

Table 19 Weight of Displaced Subjects

Essay Writing                                   Picture Description & Translation
Weight                          Frequency       Weight                          Frequency
1-2 words                       36 (14.5%)      1-2 words                       27 (9.4%)
3-4 words                       34 (13.7%)      3-4 words                       65 (22.5%)
5-6 words                       47 (19%)        5-6 words                       25 (8.7%)
7-8 words                       42 (16.9%)      7-8 words                       28 (9.7%)
9-10 words                      27 (10.9%)      9-10 words                      46 (16%)
11-12 words                     17 (6.9%)       11-12 words                     31 (10.8%)
13-14 words                     12 (4.8%)       13-14 words                     11 (3.8%)
15 words or more                33 (13.3%)      15 words or more                55 (19.1%)
Total                           248             Total                           288

4. Weight of the Displaced Subjects


The displaced subjects mostly belonged to heavy weight groups containing more than four
words in essay writing (71.8%) and in the picture description and translation (68.1%). This
implies that Thai students are aware of the most distinct pragmatics of the construction, i.e.,
to introduce a new referent into the discourse (Collins 2002). Since the displaced subject is
new or unfamiliar to the addressee, it needs detailed description, resulting in the form of a
long noun phrase.
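
The heavy-weight shares quoted above can be recovered from the word-count bins in Table 19. The sketch below is only an illustration: the bin labels, the `LIGHT` set, and the function name are assumptions introduced here, and "heavy" is taken to mean every bin above four words, as in the text.

```python
# Displaced-subject counts per word-length bin, from Table 19
essay = {"1-2": 36, "3-4": 34, "5-6": 47, "7-8": 42,
         "9-10": 27, "11-12": 17, "13-14": 12, "15+": 33}
prompted = {"1-2": 27, "3-4": 65, "5-6": 25, "7-8": 28,
            "9-10": 46, "11-12": 31, "13-14": 11, "15+": 55}

LIGHT = {"1-2", "3-4"}  # bins with four words or fewer


def heavy_share(bins):
    """Percentage of displaced subjects longer than four words, to one decimal place."""
    total = sum(bins.values())
    heavy = sum(count for label, count in bins.items() if label not in LIGHT)
    return round(100 * heavy / total, 1)


print(heavy_share(essay), heavy_share(prompted))
```

Computing the share from the raw counts, rather than adding the rounded percentages of the individual bins, avoids accumulating rounding error.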

5. Information Structure and Definiteness of the Displaced Subjects


Most of the displaced subjects were expressed as new and indefinite noun phrases in essay
writing (78.6%) and in the picture description and translation (99%). This is also the most
typical kind of correlation for native speakers (Huddleston and Pullum 2005). This finding
confirms that Thai students are aware of the principal pragmatic function of the construction.
They usually encode the displaced subject that is newly introduced as an indefinite noun
phrase.

Table 20 Information Structure and Definiteness of Displaced Subjects

Essay Writing                                   Picture Description & Translation
Displaced Subject               Frequency       Displaced Subject               Frequency
New & indefinite                195 (78.6%)     New & indefinite                285 (99%)
Given & indefinite              50 (20.2%)      New & definite                  3 (1%)
New & definite                  2 (0.8%)        Given & definite                -
Given & definite                1 (0.4%)        Given & indefinite              -
Total                           248             Total                           288

6. Types of Existential Sentences


The number of bare existential sentences was much higher than extended ones in essay
writing (73.4%) and in the picture description and translation (75.3%). This indicates that
Thai students usually produce existential sentences in the basic structure, containing just the
three main components (there + be + displaced subject), without an additional extension.

Table 21 Types of Existential Sentences

Essay Writing                                   Picture Description & Translation
Type                            Frequency       Type                            Frequency
Bare                            182 (73.4%)     Bare                            217 (75.3%)
Extended                        66 (26.6%)      Extended                        71 (24.7%)
Total                           248             Total                           288

7. Types of Modifiers of the Bare Existential


Most of the displaced subjects of the bare structure contained one or more modifiers in essay
writing (89%) and in the picture description and translation (98.6%). Moreover, relative
clauses and prepositional phrases accounted for a big proportion of modifiers in both data
sources (53.1% and 68.6%, respectively). They are among the most common modifiers of
nouns in English (Downing and Locke 2006). Since Thai students are aware that the main
pragmatic function of the construction is to introduce a novel entity, they tend to give the full
description of this unfamiliar referent by adding various pre- and post-modifiers.

Table 22 Types of Modifiers of the Bare Existential

Essay Writing                                   Picture Description & Translation
Modifier                        Frequency       Modifier                        Frequency
Relative clause                 71 (29.2%)      Relative clause                 101 (35%)
Prepositional phrase            58 (23.9%)      Prepositional phrase            97 (33.6%)
Adjective                       42 (17.3%)      Present participial phrase      47 (16.3%)
Infinitive phrase               29 (11.9%)      Adjective                       28 (9.7%)
Present participial phrase      14 (5.8%)       Noun                            7 (2.4%)
Adjective phrase                9 (3.7%)        Past participial phrase         4 (1.4%)
Noun phrase                     7 (2.9%)        Adjective phrase                3 (1%)
Past participial phrase         7 (2.9%)        Noun phrase                     1 (0.3%)
Noun                            6 (2.5%)        Adverb                          1 (0.3%)
Total                           243             Total                           289

8. Types of Extensions of the Extended Existential


Locative extensions were common in the extended existential sentences in essay writing
(72.2%) and in the picture description and translation (77.3%). The locative expression is the
most common type of extension in native speaker English (Huddleston and Pullum 2005).
Once again, the result shows that Thai students usually produce existential sentences of the
basic, typical structure. They prefer the locative expression, which is the most typical
extension of the extended existential structure.

Table 23 Types of Extensions of the Extended Existential

| Extension (Essay Writing) | Frequency | Extension (Picture Description & Translation) | Frequency |
|---|---|---|---|
| Locative | 52 (72.2%) | Locative | 58 (77.3%) |
| Temporal | 19 (26.4%) | Temporal | 17 (22.7%) |
| Comparison | 1 (1.4%) | | |
| Total | 72 | Total | 75 |

9. Sentence Types by Grammatical Structures


The students mostly produced existential sentences in two structures, the complex and the
simple, in essay writing (80.3%) and in the picture description and translation
(93.6%). As with the passive, when writing an argument structure, Thai students often express
their idea in one independent clause, i.e., the simple structure. When they want to expand the
message, they usually do it by adding one or more dependent clauses to the independent
clause, creating the complex structure.

Table 24 Sentence Types by Grammatical Structures

| Sentence Type (Essay Writing) | Frequency | Sentence Type (Picture Description & Translation) | Frequency |
|---|---|---|---|
| Complex | 126 (51.6%) | Complex | 135 (47.7%) |
| Simple | 70 (28.7%) | Simple | 130 (45.9%) |
| Compound-complex | 39 (16%) | Compound | 11 (3.9%) |
| Compound | 9 (3.7%) | Compound-complex | 7 (2.5%) |
| Total | 244 | Total | 283 |

10. Sentence Types by Communicative Purposes


The predominant sentence type of the existential construction produced by the students in
essay writing and in the picture description and translation was the declarative (98.8% and
100% respectively). This finding suggests again that Thai students usually produce existential
sentences of the basic structure; most sentences belong to the declarative, canonical form.

Table 25 Sentence Types by Communicative Purposes

| Sentence Type (Essay Writing) | Frequency | Sentence Type (Picture Description & Translation) | Frequency |
|---|---|---|---|
| Declarative | 245 (98.8%) | Declarative | 288 (100%) |
| Indirect interrogative | 3 (1.2%) | | |
| Total | 248 | Total | 288 |

Thai Learners and Universal Characteristics of ELF


All these characteristics of passive and existential sentences reveal one fact about Thai
learners' usage of the English constructions. That is, when used by native speakers, the two
constructions are known to be associated with a variety of basic and non-basic properties.
However, when used by Thai learners, the two constructions are simplified and generalized to
such an extent that they usually exhibit only the most distinct, fundamental properties in
terms of syntax, semantics, and pragmatics. Accordingly, sentences in the two constructions
produced by Thai learners are much more limited in terms of structural complexity and
semantic and pragmatic value. It is important to note that passive and existential sentences
produced by native speakers are also associated with basic linguistic characteristics, but the
association is not as strong, and thus various non-basic patterns are prevalent.
Applying such unique usage to the ELF framework, we find that the properties of the
passive and existential constructions produced by Thai learners serve to reflect three general
and universal characteristics of ELF. These include (i) simplification, (ii) regularization, and
(iii) analogy.
Simplification: Simple and Basic Structural Patterns
Simplification is revealed in many properties of the constructions produced by Thai learners.
Most of them involve syntax; shortened and basic syntactic forms are preferred to complex
and non-basic ones. This characteristic results in the association of the constructions with
simple, basic structural patterns.
For instance, passive and existential sentences are usually of the basic type; the
passive consists of the typical auxiliary be and a past participle while the existential structure
is made up of the expletive there, the typical verb be, and the displaced subject. More
complex or non-basic forms, such as ditransitive passives and extended existential sentences,
are not frequently found among Thai learners.
Regularization and Analogy: No Variety in Form and Meaning
Regularization and analogy are reflected by many properties of the two constructions
produced by Thai learners. They involve syntax, semantics, and pragmatics; various kinds of
constructional features are regularized and generalized to become more general and consistent
on the basis of predominant cases. These characteristics result in no great variety in the use of
the constructions.
In terms of syntax, for example, passive and existential sentences do not appear in
various verb forms. In most cases, they are in the present simple tense, which is regarded as
the unmarked verb form of English. Moreover, since by is the typical marker of the passive
agent (Parrott 2000), most agent phrases produced by Thai learners, by means of analogy,
are introduced by this preposition. Likewise, since the majority of nouns are countable,
almost all existential sentences produced by Thai learners talk about the presence of this type
of nouns which function as the displaced subject.
As to semantics and pragmatics, for example, both the theme subject of the passive
and the displaced subject of the existential follow the main tendencies of the constructional
usage. On the basis of predominant cases, the former usually appears as given and definite
and the latter as new and indefinite. Moreover, like native speakers who mainly choose the
passive when they want to focus the theme and downgrade the agent, Thai learners frequently
omit the agent phrase in their passive sentences. Likewise, the forms of displaced subjects in
Thai learners' existential sentences are quite consistent. As entities newly introduced, most
displaced subjects are structurally heavy, containing various modifiers, particularly relative
clauses and prepositional phrases, which are among the most common kinds of English noun
modifiers.
Associated with these characteristics (simplification, regularization, and analogy),
English passive and existential sentences produced by Thai learners involve only
the most distinct and fundamental properties in syntax, semantics, and pragmatics. Moreover,
their uses are more regular and consistent, not as varied as those of native speakers. In other
words, due to these universal tendencies of second language usage, Thai learners treat the
passive and existential constructions in English as idiomatic expressions or pre-fabricated
chunks which are made up of rather fixed components and do not allow much variation and
flexibility in both form and meaning.

Discussion
Based on the characteristics of the students' use of the English passive and existential
constructions, we can draw four general properties of argument structures typically produced
by Thai learners of English.

The Present Simple Tense


Thai learners usually produce argument structure constructions in the present simple tense.
The preference for this tense is largely due to first language interference. Thai verbs do not
have inflection to show tense or time reference; situation and context provide clues to avoid
any ambiguity (Swan and Smith 2001). The present simple tense in English is the unmarked
tense; it is used to describe general actions and states which are not viewed as being in any
way temporary or limited in time (Parrott 2000). Thus, the present simple tense is generalized
to talk about various situations whose time reference is not needed. For example, the results
of the study show that although passive and existential sentences produced by Thai learners
appear in various verb forms, the predominant verb form for both constructions is the present
simple tense.

The Basic Sentence Type


Thai learners usually express argument structure constructions in the basic sentence type. On
the criterion of grammatical structures, sentences in a clausal construction occur frequently in
the simple structure. When they are used to convey an extended message, they are likely to
appear in the complex structure, by attaching one or more dependent clauses to the existing
independent clause. As to the criterion of communicative purposes, sentences in a clausal
construction frequently appear in the declarative form, with the basic SVO order. For
instance, the results of the study indicate that many passive and existential sentences
produced by Thai learners are in the simple, declarative structures.

The Most Basic Structure


Many English clausal constructions have their variant structures, which slightly differ in form
and meaning (Goldberg 1995). Thai learners prefer the most basic structure, which is made
up of only the core components of the construction. For example, the basic passive
construction, consisting of the typical auxiliary be and a past participial verb, is the kind of
passive structure most commonly produced by Thai learners. Likewise, the bare existential
construction, consisting of the expletive subject there, the typical verb be, and the displaced
subject, is the most prevalent existential structure found in Thai learners' writing.

The Most Distinct, Fundamental Meaning


Thai learners are usually aware of the most distinct, fundamental meaning. Thus, their usage
of the construction is relatively limited, without variation in meaning. For example, the
English passive typically serves to put the theme as the topic (O'Grady 2001). Many Thai
learners produce passive sentences with this pragmatic tendency by having a given and
definite theme subject and an omitted agent phrase. Likewise, the English existential typically
serves to present a novel entity (Collins 2002). Thai learners' existential
sentences tend to have a new and indefinite displaced subject, which is in the form of a long
noun phrase having several modifiers to describe the subject referent.
Additional semantic or pragmatic properties are unlikely to be observed by Thai
learners. For instance, another pragmatic property of the English passive involves the end-weight
principle: the passive is used to place a long agent phrase in sentence-final position
(Downing and Locke 2006). Contrary to this principle, Thai learners' passive sentences tend
to contain short agent phrases. Likewise, the existential construction has some additional
functions, such as providing circumstantial background and reintroducing a referent already
mentioned (Collins 2002, Ward and Birner 1995). Thai learners' existential sentences do not
usually convey these functions; they are used mostly for the presentational function.
Therefore, compared to the native speaker norms, Thai learners' use of argument
structure constructions is much more limited in syntax, semantics, and discourse functions. In
general, there is the tendency for Thai learners to treat argument structure constructions in
English as idiomatic expressions or pre-fabricated chunks which are made up of rather
fixed components and are used to convey one meaning, and hence do not allow much
variation in both form and meaning.
What motivates such usage among Thai learners? Like many deviations found in the
phonology and pragmatics of other varieties of ELF, the motivation of Thai learners' distinct
usage of clausal constructions is the need for simplicity and effectiveness in communication.
Thai learners have developed their own version of an argument structure construction, which
is simpler and more consistent than the native speaker norms. Because this version is
associated with one particular form and one particular meaning with not much variation, it
ensures mutual understanding and successful communication. Therefore, the present study
supports the precept of the ELF approach, which holds that there is a universal tendency for
L2 speakers to make some changes in the way they use English and shift their focus to
simplicity and effectiveness in communication.

Conclusion and Suggestions


In conclusion, this study investigated Thai learners' use of the English passive and existential
constructions. Data were taken from 70 English-major undergraduate students who
represented ELF speakers at the upper-intermediate level. Two kinds of writing tasks were
designed to collect the data: writing with prompts and free essay writing. The results
revealed that Thai learners' constructions deviate from the native speaker norms in that they
are much more limited in terms of structural complexity as well as semantic and pragmatic
functions. Moreover, the results reflected that Thai learners' use of the English constructions
is also governed by three general and universal characteristics, i.e., simplicity, regularity, and
analogy, which have also been found in different varieties of ELF.
The results have a pedagogical implication for teaching English argument structures to
non-native speakers. A traditional way of teaching an argument structure in many Asian
schools is by introducing the form and emphasizing its grammatical properties (i.e., a
grammar-based approach). However, this teaching method is not especially effective,
particularly in Thailand where graduates do not have sufficient skills in English (Kirkpatrick
2012). As shown by the study, there is the tendency for L2 speakers to use a clausal
construction in a simple and consistent pattern by associating it with only the most basic and
distinct properties in syntax, semantics, and pragmatics. This overall result suggests that the
process of teaching an argument structure should be divided into steps based on all the
properties involved. Basic and principal properties in both form and meaning should be
introduced earlier than non-basic and additional ones because they can be treated like
formulaic expressions or chunks, which are easier to acquire. Once learners can pick up the
basic form and meaning of a construction, teachers should present its variant characteristics
and the specific nuances of semantic and pragmatic meanings conveyed by them so that the
learners can use the construction in a more complex and varied way. Such steps of teaching
an argument structure are in accordance with Ellis's (2005) principle of second language
acquisition that formulaic expressions serve as a basis for the later development of more
complicated features which require a rule-based competence.
The study has extended the scope of CxG from L1 settings to L2 phenomena. Most
studies in the CxG approach have focused on the formal properties of various constructions in
English and other languages from the perspective of native speakers' reception and
production. The results of this study have revealed differences in the constructional use
between L1 and L2 speakers, which serve to provide guidelines of teaching argument
structure constructions to English learners. Moreover, the study has broadened the scope of
ELF research, which has focused on phonological and pragmatic features of ELF interactions,
with just a little description at the lexical-grammatical level (Cogo and Dewey 2006,
Seidlhofer 2004). The results have demonstrated that ELF speakers' deviations from Standard
English at all levels (sounds, words, phrases, discourse, and also sentences) are governed by
the universal characteristics of second language usage, which reflects the underlying
motivations of ELF speakers to shape the language in the direction that results in a simple
and effective form of communication.
However, all data in the study involved only written English. In fact, the spoken form
of language is considered more natural (Stewart, Jr. and Vaillette 2001), and an analysis of
data taken from both written and spoken English should reflect more precise characteristics of
the constructions. Moreover, the subjects in the study were from only one institution; data
from various institutions should better represent Thai ELF learners. Therefore, future research
that includes both written and spoken English and participants from various institutions
should be able to describe Thai learners' use of English argument structure constructions in
more precise and specific detail.

Acknowledgements
This research project was supported by the Department of Foreign Languages, Faculty of
Humanities, Kasetsart University.

References
Bley-Vroman, R. (1988). The fundamental character of foreign language learning. In W.
Rutherford and M. Sharwood-Smith (Eds.), Grammar and second language teaching: A
book of readings (pp. 19-30). Rowley, MA: Newbury House.

Breiteneder, A. (2009). English as a lingua franca in Europe: An empirical perspective. World
Englishes, 28(2), 256-269.
Cogo, A. and M. Dewey. (2006). Efficiency in ELF communication: From pragmatic motives
to lexico-grammatical innovation. Nordic Journal of English Studies, 5(2), 59-94.
Collins, P. (2002). Some discourse functions of existentials in English. In C. Allen (Ed.), The
Proceedings of the 2001 Conference of the Australian Linguistic Society (pp. 1-6).
Australia: Canberra.
Dewey, M. (2007). English as a lingua franca and globalization: An interconnected
perspective. International Journal of Applied Linguistics, 17(3), 332-354.
Downing, A. and P. Locke. (2006). English grammar: A university course (2nd ed.). London
and New York: Routledge.
Ellis, R. (2005). Principles of instructed language learning. Asian ELF Journal, 7(3), 9-24.
Fillmore, C. J. (1988). The mechanisms of construction grammar. BLS 14, 35-55.
Finegan, E. (2004). Language: Its structure and use (4th ed.). Boston, MA: Wadsworth.
Goldberg, A. E. (1995). A construction grammar approach to argument structure. Chicago
and London: University of Chicago Press.
Goldberg, A. E. (2002). Construction grammar. In L. Nadel (Ed.), Encyclopedia of Cognitive Science (pp.
813-816). London: Macmillan.
Goldberg, A. E. (2009). The nature of generalization in language. Cognitive Linguistics, 20(1), 93-127.
Huddleston, R. and G. Pullum. (2005). A student's introduction to English grammar.
Cambridge: Cambridge University Press.
Jenkins, J. (2006). Points of view and blind spots: ELF and SLA. International Journal of
Applied Linguistics, 16(2), 137-162.
Jenkins, J., A. Cogo, and M. Dewey. (2011). Review of developments in research into English as a
Lingua Franca. Language Teaching, 44(3), 281-315.
Kirkpatrick, R. (2012). English education in Thailand: 2012. Asian ELF Journal, 61.
Retrieved July 20, 2013 from http://www.asian-elf-journal.com
McDonough, K. and Y. Kim. (2009). Syntactic priming, type frequency, and EFL learners'
production of wh-questions. The Modern Language Journal, 93(3), 386-398.
O'Grady, W. (2001). The syntax files. Honolulu: University of Hawaii at Manoa.
Parrott, M. (2000). Grammar for English language teachers. Cambridge: Cambridge
University Press.
Seidlhofer, B. (2001). Closing a conceptual gap: The case for a description of English as a
lingua franca. International Journal of Applied Linguistics, 11(2), 133-158.
Seidlhofer, B. (2004). Research perspectives on teaching English as a lingua franca. Annual Review of
Applied Linguistics, 24, 209-239.
Stewart, Jr., T. and N. Vaillette (Eds.). (2001). Language files: Materials for an introduction
to language and linguistics (8th ed.). Columbus: The Ohio State University Press.
Swan, M. and B. Smith. (2001). Learner English: A teacher's guide to interference and other
problems (2nd ed.). Cambridge: Cambridge University Press.
Ward, G. and B. J. Birner. (1995). Definiteness and the English existential. Language, 71(4),
722-742.

Social Class and Language Structure: A Methodological Inquiry into Bernstein's
Theory of Sociology of Education
Mohammad Aliakbari
maliakbari@hotmail.com
Mahmoud Qaracholloo
Ali Mansouri Nejad
Ilam University
Bioprofiles:
Mohammad Aliakbari is an Associate Professor of TEFL at Ilam University, Iran. His areas
of interest embrace SLA, sociolinguistics and bilingualism.
Mahmoud Qaracholloo holds an M.A. in TEFL. His research interests are different aspects of
English teaching, issues of sociolinguistics, and discourse analysis.
Ali Mansouri Nejad is a Ph.D. candidate at the University of Ilam, Iran. His areas of interest
include critical discourse analysis (CDA), co-teaching and genre analysis.
Abstract
The present study aimed at finding the differences between the language patterns of Iranian
working-class and middle-class speakers. To see if the language patterns produced by
members of different social classes have particular attributes, 100 participants from a western
city of Iran were selected from the working and middle classes. The working-class
members were selected from among salespersons, sales assistants, and shopkeepers who
worked in groceries, department stores, supermarkets and cafés with no higher education. The
subjects selected for the middle-class sample included 16 participants with Ph.D. degrees who
were professors at Ilam University and 34 master's students of Ilam University who were
language teachers. Prompts with two topics were administered to both groups to write what
they wished for. After excluding part of the data which was not suitable for the purpose of
this study, the texts were analyzed in terms of the frequencies of total number of words,
content-words repetitions, personal pronouns, impersonal pronouns, structurally-complete
sentences, quasi-sentences, noun groups, adjective groups and verb groups. The results
indicated significant differences between working and middle-class samples in terms of the
total number of words, content-words repetitions, impersonal pronouns, quasi-sentences, and
verb groups. Moreover, the findings of the study showed that middle-class members were
more productive and creative than persons from lower classes. Accordingly, this study can be
regarded as partial support of Bernstein's Language Codes Theory in an Iranian context.

Keywords: language codes theory, restricted code, elaborated code, working-class, middle-class

1. Background

It is often claimed that social class structure is mirrored in the language patterns produced by
speakers (Holmes 1992) and that there is a direct and reciprocal relationship between a
particular kind of social structure, in both its establishment and maintenance, and the way
people in that social structure use language (Wardhaugh 2006: 336). It is also held that
the quality of the speakers' language patterns changes according to their socio-economic
status. Therefore, the way language production interacts with social class has provided a rich
area of investigation (e.g., Allafchi 1998, Hoff-Ginsberg 1998, Richardson et al. 1976,
Walker et al. 1994).
Research along this line has received much interest in the Iranian context in recent
years. Drawing on the relationship between language 1 and language 2 proficiency, Hosseini
(2003) studied learners' writing characteristics in light of their socio-economic statuses in
Iran. The study revealed that learners with high and low socio-economic status performed
differently in their writing. Further, no significant relationship was identified between L1 and
L2 proficiency in terms of socio-economic statuses. Likewise, Aliakbari et al. (2012) analyzed
the relationship between social class and language patterns among a group of elementary
school students in Iran. The result of their study illustrated a significant relationship between
students' use of grammatical categories and their social class.
Bernstein (1973a) argues that the linguistic differences of various social class
structures lead to two dichotomous language codes: a restricted code and an elaborated code;
the former concerns the language produced by working-class people, and the latter deals with
the language patterns of middle-class language users. The difference between restricted and
elaborated language codes is so interwoven that Bernstein has developed them into two
dichotomous language codes, each one holding its own particular characteristics. More
specifically, it is argued that working-class people do not have access to the elaborated code
and language users or speakers with lower socio-economic statuses speak a language that is
not useful for academic or educational purposes.
The aforementioned language codes are thought to have advantages and disadvantages.
Ginsberg (2006) considers that lower academic achievement can be attributed to insufficient
language skills. She contends that children from a low socio-economic status usually
underachieve relative to middle-class students. Such a conclusion was strongly supported
by a host of studies which have given a specific attention to social class and written
composition (Richardson et al. 1976), the number of produced vocabulary (Tizard and
Hughes 1984), and vocabulary growth (Walker et al. 1994). Bernstein (1973a) points out that
the process of schooling needs specific language patterns to which low working-class
students have less access. In agreement with Bernstein, Christie (1999) writes that middle-class
children have access to the language code needed for educational purposes and are
successful at schools, whereas children from lower social classes lack access to it. To
maintain the platform for the present research, more elaboration of Bernstein's theory of
sociology of education and his restricted and elaborated language codes seems warranted.

1.1. Bernstein's Theory of Sociology of Education

Bernstein's social theory has been considered as a theory of sociology of education because it
is highly associated with the linguistic differences across social classes and the great effects
that linguistic differences have on the educational processes. Allafchi (1998) believes that
Bernstein has been affected by scholars like Sapir, Mead, Von Humboldt, Cassier, Firth,
Malinovski, Vygotsky and Luria. According to Sadovnik (2001), Durkheim has also played a
fundamental role in the formation of Bernstein's thought and Bernstein (1972) himself
acknowledged the great influence of Durkheim on his viewpoints. He believed that Durkheim
had a truly remarkable insight into the relationship between symbolic orders, social
relationships, and the structure of experience. Accepting Durkheim's social opinion, Bernstein
established the foundations of his social theory. Just like Sadovnik (2001), Atkinson (1981)
also explains that Bernstein's theory is rooted in Durkheimian ideology. However, he states that
Bernstein's sociology gradually tended toward European structuralism. According to
Allafchi (1998), as a structuralist, Bernstein was highly indebted to Whorf who believed in a
single universalistic relationship between language and worldview. Sadovnik (2001: 2) notes
that from his early study on language, communication, codes, and schooling, to his later
works on pedagogic discourse, practice and educational transmission, Bernstein produced a
theory of social and educational codes and their effect on social reproduction. The influence
of Bernstein's theory was so noticeable that Karabel and Halsey (1977) called Bernstein's
work in the field of sociology of education the harbinger of a new synthesis. In line
with Karabel and Halsey (1977: 62), Robertson (2008) called Bernstein a central actor in
developing a new sociology of education.

1.2. Restricted and Elaborated Language Codes

The discrimination between public and formal languages was the source for introduction and
development of language codes theory that stood as the core of Bernstein's social and
educational theory. Bernstein introduced and developed the language codes theory in the 1960s,
1970s and 1980s. As a pioneer, he investigated the interaction between informal languages,
power and shared meaning (Bernstein 1958, 1960, 1961). The study on the nature of informal
and formal languages led to the introduction of restricted and elaborated language codes.
Bernstein concentrated all his attention on the development of restricted and elaborated
language codes (Bernstein 1962a, 1962b). Sadovnik (2001) reports that Bernstein (1972,
1973a) investigated the relationships between socio-economic status, family, and the
regeneration of systems of meaning. He also differentiated between the restricted code of the
working-class and the elaborated code of the middle-class. Bernstein (1973a) acknowledges
that schools require an elaborated code for success to which working-class children may have
no access. Sadovnik (2001) considers restricted codes as context-dependent and particularistic
and elaborated codes as context-independent and universalistic. In addition, an elaborated
code closely corresponds to horizontal discourse introduced by Bernstein as common sense
knowledge. On the other hand, a restricted code is intricately interwoven with vertical
discourse, a style of interrogation and text creation (Bernstein 1999: 159).
Bernstein (1972) differentiated among four socialization agencies that aid the
production of restricted and elaborated language codes: The job, the educational setting, peerage class, and the family. He further considered family as the most important element in the
process of socialization. A number of studies reflected his views toward the role of family
(Bornstein, Haynes and Painter 1998, Dollaghan et al. 1999, Naigles and Hoff-Ginsberg
1998). In this regard, he differentiated between positional and person-oriented families
(Bernstein 1972). In positional and working-class families, children's roles are often
determined by position. As a consequence, children are subordinate to their parents and do
not have the permission to participate in many conversations. Such persons are, therefore, not
allowed to generate individualized speeches. On the contrary, in person-oriented families,
typical of middle-class families, children's individual capacities and interests are taken into
account. They even enjoy the privilege to discuss issues with their parents. Thus, an intense
system of communication is established.
For a better understanding of these concepts, some main characteristics of the
informal and formal languages which are respectively in line with restricted and elaborated
language codes (Bernstein 1973b: 42-43, 55) are presented in the following table.

Table 1 Characteristics of the public/informal and formal languages.

| Informal languages | Formal languages |
|---|---|
| Short, grammatically simple, often unfinished sentences with a poor syntactical construction. | Accurate grammatical order and syntax regulate what is said. |
| Simple and repetitive use of conjunctions (so, then, and). | Logical modifications and stress are mediated through a grammatically complex sentence construction, especially through a range of conjunctions and relative clauses. |
| Modifications, qualifications, and logical stress will tend to be indicated by non-verbal means. | Frequent use of prepositions which indicate logical relationships as well as prepositions which indicate temporal and spatial contiguity. |
| Frequent use of short commands and questions. | Frequent use of impersonal pronouns (it, one) |
| Rigid and limited use of adjectives and adverbs. | A discriminative selection from a range of adjectives and adverbs. |
| Infrequent use of impersonal pronouns (it, one), as subject of a conditional sentence. | Individual qualification is verbally mediated through the structure and relationships within and between sentences. That is, it is explicit. |
| Statements formulated as questions which set up a sympathetic circularity, just fancy? Isn't it terrible? Isn't it a shame? It's only natural, isn't it? | A language use which points to the possibilities inherent in a complex conceptual hierarchy for the organizing of experience. |
| A statement of fact is often used as both a reason and a conclusion, you are not going out. I told you to hold on tight (mother to child on bus, as repeated answer to child's why). | Universal |
| Individual selection from a group of traditional phrases plays a great part. | Low structural prediction |
| Symbolism is of a low order of generality. | |
| The personal qualification is left out of the structure of the sentence; therefore it is a language of implicit meaning. | |
| Communicated feelings will be diffused and crudely differentiated when a public language is being used, for if a personal qualification is to be given to this language, it can be done only by non-verbal means, primarily by change in volume and tone accompanied by pictures, bodily movement, facial expression, and physical set. | |

Inspired by the theoretical position reviewed above, this study aimed to
investigate the relationship between social class and language patterns with particular
reference to Iran. The study was undertaken with the following research question in mind: Is
there any significant difference between working and middle-class language users in their use
of language patterns?
2. The Study

Bernstein's theory was based mainly on speech; less attention has been paid to
written performance. The similarities between spoken and written discourse (Akinnaso 1985),
the interplay between speech and writing (Gillam and Johnston 1992, Olson 1995, Strömqvist
et al. 2002, Tseng 2002), and the representation of speech by writing (Olson 1993)
justify further study of the linguistic differences between working and middle-class
writing. On this basis, the researchers set out to investigate the
quality of writing in the compositions of working and middle-class language speakers in the
Iranian context.
Meanwhile, Bernstein's remarks on the linguistic differences between the working and
middle classes have led to a number of language productivity studies. Although a few studies
carried out in Iranian society were referred to above, the nature of language across
social classes is still unclear and demands further research. It is worth noting that
previous studies have used the overall number of words as the criterion of linguistic
productivity, with little or no focus on the grammatical categories of words. The
present study therefore compared the linguistic productivity of working and middle-class
subjects by examining the language patterns they produced in terms of various grammatical
categories.
The question of whether Bernstein's theoretical framework applies in an EFL context such
as Iran motivated the present investigation. Prior to this study, most attempts to test
Bernstein's model were made in English-speaking societies (ESL contexts), where the
distinction between the working and middle classes is clearer. In an eastern society such as Iran,
assigning elaborated and restricted codes to their respective socio-economic statuses is a
daunting task because the sociocultural background obscures the
differentiation between socio-economic classes. Thus, the study is intended to
examine how Bernstein's Language Codes Theory functions in the Iranian context.

2.1. Participants

A total of 100 subjects participated in the study. Working and middle-class members were
selected according to their level of education and occupation and two indexes of social class.
The social class indexes employed for subject sampling were the Socio-economic Status
Scores by Nam and Powers (1983) and Hollingshead's Two-factor Index of Social Position
(1957), which were developed from two countrywide surveys in the US. Working-class
members were salespersons, sales assistants, and shopkeepers drawn from among low-educated
and low-income people who had a low score (29) on Nam and Powers' Socio-economic
Status Scores (1983). They worked in groceries, department stores, and supermarkets in Ilam,
a city in western Iran. The working-class sample comprised 9 females and 41 males whose ages
ranged from 18 to 50.
Based on the aforementioned indexes, the middle-class subjects were 16 professors at Ilam
University with Ph.D. degrees and 34 Master's students from different tracks at the same
university. All the professors were male, aged between 30 and 60, while the Master's students
comprised 2 females and 32 males whose ages varied from 24 to 30. The Master's students were
in their third semester. The university professors' scores on Nam and Powers' Socio-economic
Status Scores (1983) were between 70 and 99, and the Master's students were classified in
the main specialist group according to Hollingshead's Two-factor Index of Social Position
(1957).
2.2. Language Pattern Elicitation Prompt

To obtain a rich corpus of language data, a prompt was designed. The prompt included two
topics, life and home country, and participants were asked to write about them. The topics
were presented in Persian (Farsi), the participants' language, and the subjects were required
to write their compositions in the same language. The selected topics were general, ideological
notions that participants, whether highly or less educated, could be expected to write about
(an English version of the prompt is provided in Appendix A).
2.3. Raters

Two Master's students analyzed the language data collected from the working and
middle-class members. Several attributes qualified them to analyze the
data. Both were native speakers of Persian who held the Persian Language and
Literature and Humanities Diploma issued by the Office of Education, which indicates that
they had attended many Persian language and literature courses at high school. They were
thoroughly familiar with Persian grammar and structure. Both raters had also passed a
course on Persian language and literature in their B.A. with excellent marks. In addition, a
correlation coefficient of .78 was obtained as an index of inter-rater reliability for their analyses.
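
The text does not say which correlation coefficient was used or which counts were compared. As a minimal sketch, assuming a Pearson coefficient computed over the two raters' per-text counts of a single category, the calculation could look as follows. The function name and the sample counts are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of counts."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts of one category in five texts, one list per rater.
rater_a = [5, 3, 8, 2, 6]
rater_b = [4, 3, 7, 3, 6]
print(round(pearson_r(rater_a, rater_b), 2))
```

A value close to 1 would mean the two raters ranked and sized the texts in nearly the same way; the coefficient says nothing about whether both raters were systematically too high or too low.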
2.4. Administration

In the week following subject sampling, copies of the prompt were given to
members of both classes individually, in their workplaces: to
university professors in their offices, and to salespersons, sales assistants, and shopkeepers in
groceries, department stores, and supermarkets. The procedure was somewhat different for
the Master's students. Since not all of them were classmates and they did not have workplaces,
they were given the prompt in the dormitory, in the classroom, or on campus. Although
the prompt was administered in different places, all the subjects were asked to write their
texts on the spot, without delay. This
procedure was adopted to make the situation more natural and to prevent the participants from
cheating. Although the subjects were asked to write impromptu and not to quote or copy
from any source, some of the collected writings included inappropriate data. Texts which
showed signs of plagiarism were therefore excluded from the study, as were texts with
illegible handwriting and excessively long texts. Finally, 30 texts from each group that were
appropriate to the purpose of the study were selected for analysis.
2.5. Data Analysis Procedure

The raters analyzed the language data elicited from both groups and investigated the Persian
grammatical categories (GCs). The investigation of the GCs was based on Ahmadi Givi and
Anvari's (2006) model. Consultation with the full faculty members of the Persian language
department of Ilam University indicated that Ahmadi Givi and Anvari's (2006)
classification of Persian GCs is the most up-to-date, authoritative, and
comprehensive one available. The raters analyzed the language data for the
total number of words (TNWs), content-word repetitions (CWRs), personal pronouns (PPs),
impersonal pronouns (IPs), structurally complete sentences (SCSs), quasi-sentences (QSs),
noun groups (NGs), adjective groups (AGs), and verb groups (VGs). First, the TNWs
produced by each class of participants were counted. Then, the frequency of
CWRs, i.e., words which had been repeated at least twice, was determined for each class of
participants. Next, all types of PPs, including subject, object, possessive, reflexive,
and emphatic pronouns, were counted. Since Persian is a pro-drop language, the subject of
a sentence is sometimes omitted and the verb suffix indicates the subject.
For example, in the verb xord-am (I ate), -am refers to the first person singular I. In
pro-drop sentences, the verb suffixes were regarded as the subjects of the sentences and were
counted as PPs. The frequency of IPs, those referring to indefinite human beings, such as
someone, somebody, and everybody, was determined as well. Sentences which were
complete in their surface structure, or had all the features of a complete sentence, were counted
and labeled as SCSs. In contrast to SCSs, some sentences are semantically complete but do not
have all the features of a complete sentence; for example, a sentence may lack a
verb but still present a complete idea. Such structurally or syntactically incomplete sentences
were counted individually and labeled as QSs. Finally, the frequencies of NGs, AGs,
and VGs were counted for each class of participants. According to Ahmadi Givi and
Anvari (2006), NGs, AGs, and VGs are very broad categories which comprise many cases, but
for the sake of precision, this study was limited to those groups of nouns, adjectives, and
verbs that were linked to each other by the Persian conjunction va (and).
To illustrate the analysis procedure, the next two paragraphs present a word-for-word
translation of two pieces of language data in which all the syntactic and grammatical elements
of the Persian text were preserved unchanged.

Life
Good life with particular meanings for each man (1)*. For some people, happiness means
having cars, house, and many properties (2). But for others, a simple house is enough for
the family to be happy (3). Many believe ordinary and common life accompanies salvation,
but luxurious life destroys comfort (4).

Home country
Home country, the place where human beings are born, grow up, and live (5)*. We
accommodate in the Muslim country named Iran (6). Iranians have a specific interest in this
treasure, because this country has achieved revolution due to the attempt of many people (7).
We lost many youths for this; therefore, we must love our home country like our essence and
spirit (8).

The italicized words are specific to English: they did not exist in the Persian text, but
they were required in the English translation. The TNWs, excluding the italicized
words, was 102. The numbers in parentheses index the sentences, and the asterisks mark
the QSs. The data contained 8 sentences in all: 6 SCSs and 2 QSs. The language data
included 5 PPs. The words others, many, and this were the IPs in this
sample. We, our, home country, house, people, and life were the CWRs; the samples included
14 CWRs. Finally, the bold words indicate NGs, AGs, and VGs. The samples included 2
NGs, 1 AG, and 1 VG.
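
The sentence tally in this worked example can be checked mechanically. The following is a minimal sketch, assuming only the annotation convention described above (a number in parentheses after each sentence, followed by an asterisk when the sentence is a QS). The function name and the regular expression are illustrative and are not part of the raters' procedure.

```python
import re

# The two sample translations, with their sentence markers, as given in the text.
sample = (
    "Good life with particular meanings for each man (1)*. "
    "For some people, happiness means having cars, house, and many properties (2). "
    "But for others, a simple house is enough for the family to be happy (3). "
    "Many believe ordinary and common life accompanies salvation, "
    "but luxurious life destroys comfort (4). "
    "Home country, the place where human beings are born, grow up, and live (5)*. "
    "We accommodate in the Muslim country named Iran (6). "
    "Iranians have a specific interest in this treasure, because this country has "
    "achieved revolution due to the attempt of many people (7). "
    "We lost many youths for this; therefore, we must love our home country like "
    "our essence and spirit (8)."
)

# A marker is "(n)" optionally followed by "*", which flags a quasi-sentence.
MARKER = re.compile(r"\((\d+)\)(\*?)")


def tally_sentences(text):
    """Count all marked sentences, split into SCSs and QSs."""
    scs = qs = 0
    for _number, star in MARKER.findall(text):
        if star:
            qs += 1
        else:
            scs += 1
    return {"sentences": scs + qs, "SCSs": scs, "QSs": qs}


print(tally_sentences(sample))  # {'sentences': 8, 'SCSs': 6, 'QSs': 2}
```

The tally covers only the sentence types; deciding whether a given sentence is structurally complete remains a judgment made by the raters.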

3. Results

3.1. Descriptive Presentation of Data

Table 2 (Appendix B) displays the frequency of the grammatical categories in the middle-class
data. The middle-class data included a total of 3049 words (TNWs), 412 CWRs, 123 PPs, 80 IPs, 164
SCSs, 55 QSs, 57 NGs, 15 AGs, and only 10 VGs. As Table 3 shows, the minimum and
maximum numbers of words produced were 13 and 193, respectively. The middle-class
members produced 101.6333 words on average (Table 3). The frequency of PPs was much
higher than that of IPs. The number of SCSs was nearly triple that of QSs. Among NGs, AGs, and
VGs, the highest and the lowest frequencies were for NGs and VGs, respectively. Dividing the
TNWs by the number of all sentences (SCSs and QSs) indicated that the average sentence length
for the middle-class data was 14.004.
Table 3 Descriptive statistics of grammatical categories in middle-class data

GCs     N    Range    Minimum   Maximum   Sum       Mean       SD
TNWs    30   180.00   13.00     193.00    3049.00   101.6333   45.45971
CWRs    30   32.00    .00       32.00     412.00    13.7333    8.30012
PPs     30   14.00    .00       14.00     123.00    4.1000     3.65164
IPs     30   10.00    .00       10.00     80.00     2.6667     2.82029
SCSs    30   13.00    .00       13.00     164.00    5.4667     3.28773
QSs     30   12.00    .00       12.00     55.00     1.8333     2.75535
NGs     30   7.00     .00       7.00      57.00     1.9000     1.82606
AGs     30   2.00     .00       2.00      15.00     .5000      .62972
VGs     30   3.00     .00       3.00      10.00     .3333      .71116

As for the working class, Table 4 (Appendix B) shows the frequency and distribution of
the grammatical categories in the collected data. There
were 2766 words, 525 CWRs, 131 PPs, 32 IPs, 154 SCSs, 81 QSs, 75 NGs, 15 AGs, and just
2 VGs. As shown in Table 5, the minimum and maximum numbers of words were 16 and 203,
respectively. The frequency of PPs was much higher than that of IPs. The number of SCSs was
nearly twice that of QSs. As in the middle-class data, among NGs, AGs, and VGs, the
highest and the lowest frequencies were for NGs and VGs, respectively. Dividing the TNWs
by the number of all sentences (SCSs and QSs) showed that the average sentence length for the
working-class data was 11.77. Table 5 summarizes the descriptive statistics of the grammatical
categories in the working-class texts.
Table 5 Descriptive statistics of grammatical categories in working-class data

GCs     N    Range    Minimum   Maximum   Sum       Mean       SD
TNWs    30   187.00   16.00     203.00    2766.00   92.2000    45.65644
CWRs    30   48.00    2.00      50.00     525.00    17.5000    10.80788
PPs     30   11.00    .00       11.00     131.00    4.3667     3.16754
IPs     30   6.00     .00       6.00      32.00     1.0667     1.59597
SCSs    30   10.00    .00       10.00     154.00    5.1333     3.10432
QSs     30   14.00    .00       14.00     81.00     2.7000     3.86987
NGs     30   9.00     .00       9.00      75.00     2.5000     2.46003
AGs     30   6.00     .00       6.00      15.00     .5000      1.19626
VGs     30   1.00     .00       1.00      2.00      .0667      .25371
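
The means and average sentence lengths reported above follow from simple divisions of the class totals. The sketch below recomputes them from the sums given in the text; the dictionary layout and function names are illustrative. Under this calculation the working-class figure comes out at about 11.77, as reported, while the middle-class totals give about 13.92 rather than the 14.004 stated, a difference the text does not explain.

```python
# Totals reported in the text for the 30 texts of each class.
totals = {
    "middle-class": {"TNWs": 3049, "SCSs": 164, "QSs": 55, "texts": 30},
    "working-class": {"TNWs": 2766, "SCSs": 154, "QSs": 81, "texts": 30},
}


def mean_words_per_text(counts):
    """Mean number of words per text (TNWs divided by number of texts)."""
    return counts["TNWs"] / counts["texts"]


def average_sentence_length(counts):
    """TNWs divided by the number of all sentences (SCSs plus QSs)."""
    return counts["TNWs"] / (counts["SCSs"] + counts["QSs"])


for name, counts in totals.items():
    print(
        name,
        round(mean_words_per_text(counts), 4),
        round(average_sentence_length(counts), 2),
    )
```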

Table 6 Percentages of GCs in proportion to the TNWs, along with percentages of SCSs
and QSs in proportion to the total number of sentences

GCs                  Middle-class   Working-class
CWRs                 18.980         13.433
Pronouns    PPs      4.401          4.736
            IPs      2.608          1.084
Sentences   SCSs     74.885         65.531
            QSs      25.114         34.468
NGs                  1.853          2.711
AGs                  0.487          0.542
VGs                  3.250          0.072

After data collection, the percentages of the frequencies of GCs in each social class
were computed in proportion to the TNWs produced by the same social class. Table 6 also
shows the percentages of SCSs and QSs in each social class, computed in proportion to the
total number of sentences produced by that class. Although for PPs,
IPs, NGs, and AGs the percentages were nearly the same for both social classes, the
percentages of CWRs, SCSs, QSs, and VGs differed between the groups. Middle-class
members produced higher percentages of CWRs, SCSs, and VGs, whereas the percentage of
QSs was greater for the working-class members.
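
The proportions described here are plain percentages of a class total. A minimal sketch of the calculation, using totals reported earlier in the text, is given below; the helper name is hypothetical. Values recomputed this way need not match every printed figure digit for digit, since the exact denominators and rounding used for the table are not stated.

```python
def percentage(count, total):
    """Share of `count` in `total`, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


# Working-class PPs as a share of working-class TNWs (131 of 2766 words).
print(round(percentage(131, 2766), 3))

# Working-class QSs as a share of all working-class sentences (81 of 154 + 81).
print(round(percentage(81, 154 + 81), 3))

# Middle-class QSs as a share of all middle-class sentences (55 of 164 + 55).
print(round(percentage(55, 164 + 55), 3))
```

Word-level categories are divided by the TNWs, while SCSs and QSs are divided by the number of sentences, so the two kinds of percentage are not directly comparable with each other.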
3.2. Inferential Data Analysis

In order to see if there were any significant differences between the two groups in their
frequencies of the GCs, nine chi-square (χ²) tests were run for the given categories. The
results of the χ² tests indicated significant differences in five cases; the differences in the
frequencies of the other four GCs were not significant.
In the case of the TNWs, the middle-class language data comprised more words, and there
was a significant difference (χ² = 13.584, p < .01) between the two groups. CWRs
were another point of difference between the two classes: working-class members
repeated words more often than members of the middle class, and the chi-square result
indicated a significant difference (χ² = 13.621, p < .01). Another significant difference
(χ² = 20.571, p < .01) was found for the frequencies of IPs, where the number of IPs produced
by the middle class was nearly triple that in the working-class
data. Although the middle-class data exceeded the working-class data in the
frequencies of the TNWs and IPs, working-class members produced more QSs, and the
difference in the frequency of QSs was also significant (χ² = 4.971, p < .05).
Finally, a significant difference was found in the frequency of VGs (χ² = 5.333, p < .05)
between the two classes of language users (Table 7).


Although the results for five GCs indicated significant differences between the two classes,
supporting Bernstein's theory, some discrepant results were also found.
There was not much difference between the middle-class (MC) and working-class (WC) members
in terms of the frequency of PPs. Middle-class members used just 4 PPs more than WC ones, and
this trivial difference was not significant (χ² = .320, p > .05). Similarly, the frequency of
SCSs was nearly the same for both social classes (SCs), and the chi-square
result indicated no significant difference (χ² = .314, p > .05). Nor was a significant
difference found (χ² = 2.455, p > .05) for the frequency
of NGs. Finally, since the frequency of AGs was exactly the same for both SCs, χ² was 0 and
p was equal to 1.00. A summary of the chi-square results for the
grammatical categories is shown in Table 7.
Table 7: The results of χ² tests for the differences in the frequencies of the grammatical
categories

GCs     χ²       Sig.
TNWs    13.584   .000**
CWRs    13.621   .000**
PPs     .320     .572
IPs     20.571   .000**
SCSs    .314     .575
QSs     4.971    .026*
NGs     2.455    .117
AGs     .000     1.000
VGs     5.333    .021*

** p < 0.01; * p < 0.05
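
The text does not state how the χ² values were obtained. As a minimal sketch, assuming a one-way (goodness-of-fit) chi-square test with one degree of freedom that compares the two class totals against equal expected frequencies, the calculation would be as follows; the function name is illustrative. Under this assumption, the totals reported earlier for IPs, SCSs, QSs, NGs, AGs, and VGs reproduce the Table 7 values to rounding, while the remaining categories come out slightly different.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Goodness-of-fit chi-square for two counts with equal expected values.

    With expected = (a + b) / 2 for each group, the statistic simplifies to
    (a - b)^2 / (a + b). For one degree of freedom the p-value is
    erfc(sqrt(chi2 / 2)).
    """
    total = count_a + count_b
    if total == 0:
        return 0.0, 1.0
    chi2 = (count_a - count_b) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Class totals (middle-class, working-class) reported in the text.
reported_totals = {
    "IPs": (80, 32),
    "SCSs": (164, 154),
    "QSs": (55, 81),
    "NGs": (57, 75),
    "AGs": (15, 15),
    "VGs": (10, 2),
}

for category, (middle, working) in reported_totals.items():
    chi2, p_value = chi_square_equal_split(middle, working)
    print(category, round(chi2, 3), round(p_value, 3))
```

A test of this kind treats the two totals as counts drawn from one pool, so it is sensitive to the overall amount of text each group produced as well as to the rate at which a category is used.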


4. Discussion

The present study attempted to compare working and middle-class language users in an
Iranian context with respect to the frequency of certain GCs in their compositions. As
reported in the previous sections, the data analysis produced mixed findings. In
terms of the TNWs, a significant difference was found between the groups:
middle-class members produced a greater number of words. This means
that middle-class members were more productive and creative than subjects from the lower
social class. Although the participants were asked to write about the same topics, the
professors and Master's students were more productive. The difference in the productivity
of the two groups suggests two possible explanations. First, with respect to the relationship
between language and thought, professors and Master's students may have read more books
and so be better prepared to discuss abstract concepts such as life. In other words, they may be
more thoughtful and have more ideas to express. The second explanation, which is more in line
with Bernstein's language code theory, is that the higher linguistic creativity of the middle-class
members may have nothing to do with one's thought but with a more developed language pattern,
which has the potential to handle any abstract topic. The topics selected for participants to
write about were so general and ideological that people with different levels of education
could write about them. Therefore, the first explanation, that educated people can say more
because they are more thoughtful and have more opinions to offer, is not convincing.
Instead, the production of more words can be explained in terms of a more developed
language pattern which provides language speakers with more words to use in language
production.
While middle-class members were more productive in their writing, working-class
members were more repetitive in their vocabulary. That is, middle-class members
expressed themselves using a variety of words, whereas the self-presentations of working-class
members were bound to a narrower range of repeated words that were more or less
synonymous with the topics they were writing about. It can thus be claimed that
working-class members repeat words because their language code does not give them access
to a sufficient stock of terms to express themselves easily. By contrast,
middle-class members seem to have access to a more lexically developed language, which
allows them to express the same intentions with different lexical items.
As for the IPs, there was a significant difference between the two groups of participants
in that middle-class members used more IPs. In contrast with PPs, which stand in for proper
or common nouns with real referents in the world, IPs refer to no definite persons in the real
world and are used to express facts or opinions anonymously. In general, IPs are devices
for expressing ideas context-independently. The higher number of IPs in the middle-class data
means that middle-class language production is less bound to context or situation. It can,
therefore, be claimed that middle-class members express ideas as generalizations. In other
words, they tend to generalize their beliefs so that they are acceptable in different situations.
Stated otherwise, the middle-class language pattern can be regarded as a general or, in
Bernstein's terms, universal language code.
The difference in the subjects' language production was also noteworthy for QSs, which
were more common among members of the lower social class. As noted by Ahmadi
Givi and Anvari (2006), QSs are shorter and more concise sentences because they lack some
elements of SCSs. The data analysis indicated that such sentences are typical of
working-class members. This finding supports Bernstein's idea that the restricted code is full of
short sentences that are incomplete, either grammatically or semantically.
In the case of VGs, a significant difference was found between the two social classes as
well. Middle-class members showed a stronger preference for VGs, which are
language elaboration devices used to express meaning more explicitly and in more detail. In the
current research, only those groups of verbs that were linked together by the Persian
conjunction va (and) were included. A verb that follows another verb after a
conjunction elaborates the meaning of the preceding verb. In such groups,
neighboring verbs influence each other semantically: the more verbs accompany
each other, the more comprehensive and exact the meaning expressed. It can be claimed that
this language pattern, which is typical of middle-class members, is semantically precise. Such
precision is gained by linking linguistic elements that express ideas explicitly. This
finding supports Bernstein's theory in that the elaborated language code is more explicit and
semantically precise and expresses all meaning through linguistic structures.
PPs, as the counterparts of IPs, were another point of investigation in the study. PPs
replace proper and common nouns in the real context and are indicators of a context-dependent
language code. The more speakers use PPs in their speaking or writing, the more
context-dependent and specific their language will be. Although for Bernstein (1973a) the
restricted language code is context-dependent and full of PPs, in the present study no significant
difference was found between the two classes in the frequency of PPs. PPs and IPs stand as
dichotomous concepts, each typical of one of the
codes described by Bernstein. In this study, the higher frequency of IPs among middle-class
members was confirmed, but the frequency of PPs was nearly the same for both classes, which
did not support Bernstein's claim. The percentages of PPs in proportion to the TNWs
produced by each class also showed no difference between the two classes.
Since the structure of SCSs follows the common logic of grammar, such a sentence
has all the grammatical elements and is hence longer and more logical. Data analysis indicated no
significant difference between the frequencies of SCSs. However, the percentages of SCSs in
proportion to the total number of sentences produced by each class indicated a considerable
difference between the two SCs: middle-class members produced a higher percentage of
SCSs, whereas working-class members produced a higher percentage
of QSs.
As indicated by Ahmadi Givi and Anvari (2006), AGs and NGs, just like VGs, are
appropriate tools for producing a more elaborated language code. It was found that middle-class
members used VGs more than working-class members did, but no significant
difference between the frequencies of AGs and NGs was found. In other words, in the case of
AGs and NGs, Bernstein's theory was not supported either.

5. Conclusion

Examining the distribution and significance of language users' linguistic patterns within distinct
social classes, the study was an attempt to highlight the interplay between language
production and socio-economic class. Elaborating this interaction can provide a better
view of the applicability of linguistic categories within social frameworks. Although the
differences in the frequency of GCs in the language data collected from the two
groups were not clear-cut, Bernstein's claims about the linguistic differences between
speakers from different social classes were supported to some extent. Middle-class
members were found to be more productive and creative than persons from lower classes.
Access to a sufficient range of vocabulary differed across the groups:
working-class members had limited access to terms with which to express themselves easily. Since
middle-class members used many more IPs, it was concluded that their language code is less
situation- or context-specific. In other words, the middle-class language code is a general or
universal pattern which is easily generalized to different occasions. In addition, it was
found that working-class members usually express their meanings using shorter sentences.
Finally, although the distribution of AGs and NGs was the same across the two classes of Iranian
native speakers, the middle class's preference for producing more VGs indicates that
their language is more elaborated and explicit. All in all, the data collected in the given
Iranian context support Bernstein's language code theory to a certain extent.
6. Research Implications

The findings of the study have some implications for language studies, sociolinguistics,
schooling, and education in Iran and similar contexts. First, they can contribute to the field of
discourse studies. Since a central emphasis of Bernstein's theory is the impact that context
has on the production of linguistic structures, discourse analysts can draw on
this study's evidence about the production of such structures. The study can also lend support
to Bernstein's distinction between horizontal and vertical discourses, which offers a useful
framework for discourse analysts. Second, although sociologists and sociolinguists usually
consider factors like occupation and education as indicators of social class, the present study
proposes linguistic structure as an additional indicator. The difference in the
language structures produced by people from different social classes supports sociolinguistic
perspectives that use language patterns as a means of determining social
class. Third, even though the present study was conducted among adult participants, its
findings can alert language teachers to the fact that students
from families of different social classes do not have identical access to language knowledge in
schooling, even when they have completed a similar level of education. Aware of the
socio-economic status of students and their different access to language use, teachers can
minimize the language loss of working-class students by holding classes attended by
students with different socio-economic backgrounds. Such heterogeneity might give
working-class students better access to language through proximity to middle-class
students. Finally, although the current results bear mainly on society at large, they are not
value-free in the educational context for students from families with
different socio-economic statuses. Thus, material and syllabus designers can also benefit from
the results of the present study in pedagogical contexts. They could include socio-economic
considerations in materials and syllabuses to compensate for the language loss of
working-class children.

References
Ahmadi Givi, H. and H. Anvari. (2006). Persian syntax (3rd ed.). Iran, Tehran: Fatemi
Publication.
Akinnaso, F. N. (1985). On the similarities between spoken and written language. Language
and Speech, 28(4), 323-359.
Aliakbari, M., M. Samaie, K. Sayehmiri and M. Qaracholloo. (2012). The grammatical
correlates of social class factors: The case of Iranian fifth-graders. Linguistik online,
56(6), 3-20.
Allafchi, J. (1998). The relationship between social class and speech codes with respect to
syntactic complexity. Unpublished Master's dissertation. Shiraz University, Iran.
Atkinson, P. (1981). Bernstein's structuralism. Educational Analysis, 3(1), 85-96.
Bernstein, B. (1958). Some sociological determinants of perception: An enquiry into subcultural differences. British Journal of Sociology, 9(10), 159-174.
Bernstein, B. (1960). Language and social class: A research note. British Journal of
Sociology, 11(3), 271-276.
Bernstein, B. (1961). Social structure, language and learning. Educational Research, 3(3),
163-176.
Bernstein, B. (1962a). Linguistic codes, hesitation phenomena and intelligence. Language
and Speech, 5(1), 31-46.
Bernstein, B. (1962b). Social class, linguistic codes and grammatical elements. Language and
Speech, 5(4), 221-240.
Bernstein, B. (1972). A sociolinguistic approach to socialization with some reference to
educability. In J. J. Gumperz and D. Hymes (Eds), Directions in sociolinguistics: The
ethnography of communication. New York: Holt, Rinehart and Winston.

Bernstein, B. (1973a). Class, codes and control, Vol 1. London: Routledge and Kegan Paul.
Bernstein, B. (1973b). Class, codes and control, Vol 2. London: Routledge and Kegan Paul.
Bernstein, B. (1999). Vertical and horizontal discourse: An essay. British Journal of
Sociology of Education, 20(2), 157-173.
Bornstein, M. H., M. O. Haynes, and K. M. Painter. (1998). Sources of child vocabulary
competence: A multivariate model. Journal of Child Language, 25, 367-393.
Christie, F. (1999). Pedagogy and the shaping of consciousness: Linguistic and social
processes. London: Continuum.
Dollaghan, C. A., T. F. Campbell, J. L. Paradise, H. M. Feldman, J. E. Janosky, D. N. Pitcairn
and M. Kurs-Lasky. (1999). Maternal education and measures of early speech and
language. Journal of Speech, Language and Hearing Research, 42, 1432-1443.
Gillam, R. B. and J. R. Johnston. (1992). Spoken and written language relationships in
language/learning-impaired and normally achieving school-age children. Journal of
Speech and Hearing Research, 35, 1303-1315.
Ginsborg, J. (2006). The effects of socio-economic status on children's language acquisition
and use. In J. Clegg and J. Ginsborg (Eds.), Language and social disadvantage:
Theory into practice (pp. 9-27). Chichester: John Wiley and Sons.
Hoff-Ginsberg, E. (1998). The relation of birth order and SES to children's language
experience and language development. Applied Psycholinguistics, 19, 603-629.
Hollingshead, A. B. (1957). Two factor index of social position. New Haven, CT: Privately
printed.
Holmes, J. (1992). An introduction to sociolinguistics. London: Longman.
Hosseini, A. (1993). The relationship between L1 academic proficiency and foreign language
learning with respect to socio-economic background of learners. Unpublished
Master's dissertation. University for Teacher Education, Tehran, Iran.
Karabel, J. and A. H. Halsey. (1977). Power and ideology in education. New York: Oxford
University Press.
Naigles, L. R. and E. Hoff-Ginsberg. (1998). Why are some verbs learned before other verbs?
Effects of input frequency and structure on children's early verb use. Journal of Child
Language, 25, 95-120.
Nam, C. B. and M. G. Powers. (1983). The socioeconomic approach to status measurement.
Houston: Cap and Gown.
Olson, D. R. (1993). How writing represents speech. Language and Communication, 13(1), 1-17.
Olson, D. R. (1995). Towards a psychology of literacy: On the relations between speech and
writing. Cognition, 60, 83-104.
Richardson, K., M. Calnan, J. Essen and L. Lambert. (1976). The linguistic maturity of
11-year olds: Some analysis of the written compositions of children in the national child
development study. Journal of Child Language, 3, 99-115.
Robertson, I. (2008). An introduction to Basil Bernstein's sociological theory of pedagogy.
Retrieved from http://sites.google.com/site/robboian/IntroBernstein.pdf?attredirects=0
Sadovnik, A. R. (2001). Basil Bernstein. Prospects: The Quarterly Review of Comparative
Education, 31(4), 687-703.
Strömqvist, S., V. Johansson, S. Kriz, H. Ragnarsdóttir, R. Aisenman and D. Ravid. (2002).
Toward a cross-linguistic comparison of lexical quanta in speech and writing. Written
Language and Literacy, 5(1), 45-67.
Tizard, B. and M. Hughes. (1984). Young children learning: Talking and thinking at home
and at school. London: Fontana.
Tseng, M. Y. (2002). On the interplay between speech and writing: Where Wordsworth and
Zen discourse meet. Journal of Literary Semantics, 31(2), 171-198.
Walker, D., C. Greenwood, B. Hart and J. Carta. (1994). Prediction of school outcomes based
on early language production and socioeconomic factors. Child Development, 65, 606-621.
Wardhaugh, R. (2006). An introduction to sociolinguistics (5th ed.). Oxford: Oxford
University Press.

Appendix A
The present prompt has been developed for research purposes. Appreciating your favor, please
help us carrying out the research
It should be mentioned that, since no personal information of respondent's identity is requested,
all the opinions presented in the prompt will remain confidential and will be used only for
research purposes.
Write what you like about the following topics.
Life

Home country

Many thanks

Appendix B
Table 2: Frequency of the grammatical categories in the middle-class group
Middle TNWs

CWRs

PPs

IPs

SCSs

QSs

NGs

AGs

VGs

Class
1

96

14

116

13

13

65

113

24

83

22

103

21

193

32

11

101

128

17

10

10

93

10

11

161

21

12

93

13

13

13

14

158

19

15

185

21

16

93

10

17

126

12

14

18

189

22

19

57

13

10

20

39

21

75

30

12

22

83

23

70

24

38

25

110

12

26

60

27

73

28

73

14

29

160

20

13

30

102

14

Total 3049

412

123

80

164

55

57

15

10

Table 4: Frequency of the grammatical categories in the working-class group


Working-class

TNWs CWRs

PPs

IPs

SCSs

QSs

NGs

AGs

VGs

122

12

32

68

18

37

71

21

31

13

59

16

97

24

71

13

10

42

11

16

12

51

11

13

117

30

14

87

28

10

15

106

11

16

95

34

14

17

100

15

18

82

25

19

37

11

20

54

16

21

141

16

22

107

11

11

23

166

15

10

24

125

25

203

50

26

165

18

27

143

38

13

28

101

11

29

113

15

30

127

29

Total

2766

525

131

32

154

81

75

15

Abbreviations: TNWs = total number of words; WR = word repetition; PP = personal
pronoun; IPA = impersonal pronoun/adjective; GC = grammatical category; SCS =
structurally-complete sentence; QS = quasi-sentence; NG = noun group; AG = adjective
group; VG = verb group.

Code-Switching in a Virtual English Community in China: An International Perspective


Ming Wei
University of International Business and Economics
mingweigrace@163.com
Bioprofile: Ming Wei is an Associate Professor at the University of International Business
and Economics. She earned her doctorate in Linguistics/TESL in 2009 at Oklahoma State
University in the United States after teaching at Beijing Foreign Studies University for
five years. She received her Master's degree in Linguistics in 1999 from Nankai University,
China.
Abstract
This paper investigates how a Net-based environment promotes code-switching practices
among English learners in China and examines how such practices negotiate social and
interactional meanings. Based on the analysis of conversations in an English chat room in
China from the interactional perspective, this study demonstrates how code-switching
contributes to the creation of an authentic and distinctive context of social interaction. It
reveals how the speakers' adjustment of code choice and degree of code-switching are firmly
anchored to the situational need to manage face and social distance in synchronous
conversations, as well as how manipulation of code interpretation and selection was achieved.
It was found that people's use of code affects the addressee's involvement in the ongoing
dialogue in that it either acknowledges the latter's intention behind the code choice or inhibits
behavior perceived as inappropriate. The paper also discusses how code choice may relate to
the local setting of learners studying English in mainland China.

Keywords: code-switching, chat room, social meanings, interactional perspective, English
learning

1. Introduction

Code-switching has been extensively studied in the past few decades in terms of its patterns
and meanings in oral production (e.g., Adendorff 1996, Cheng and Butler 1991, Gumperz
1982, Hoffman 1991, Lu 1991, Myers-Scotton 1989). It has been found to be a discursive
convention which can index contextual and metalinguistic information that is conveyed by
other means (e.g., prosody) in monolingual settings. This is particularly relevant to online
communications which, in addition to being social and context dependent, are structurally
simpler in order to meet specific interactive purposes and overcome their lack of a
conventional form of presence (Bays 1998, Crystal 2001). However, little is known about how
interactional frameworks are built in fluid virtual communities populated by English learners,
especially strangers whose identities and presence are primarily maintained by their verbal
practices. Using an interactional perspective, this study analyzed interactions of English
learners in China in an online chat room to uncover how code-switching helps speakers manage
social distance and facework as well as how this affects the addressee's choice of code. The
study also aims to contribute to the research on second language learning by identifying some
gaps in learners' interactional competence in English through examining where they switch to
their native language.

1.1. Code-Switching and Interaction

Over the past few decades, numerous scholars have dealt with code-switching, which has been
described as two languages juxtaposed or alternated in discourse, typically within a single
conversation, or within a sentence or utterance (Auer 1998, Liebscher and Dailey-O'Cain 2005).
In a prototypical case, code-switching occurs in a sociolinguistic context
in which speakers orient towards a preference for one language at a time (Auer 1998). As an
integral aspect of conversational analysis, it is one of the contextualization conventions which
are acquired through interactions where people participate in a particular network of
relationships (Gumperz 1982).
Code-switching is intention-driven and functionally motivated (Adendorff 1996,
Hoffman 1991, Myers-Scotton 1989). For example, Saville-Troike (1982) identified eight
different functions such as softening or strengthening a request or command, humorous
effect, or lexical need. Gardner-Chloros (1991) argues that code-switching may occur as an
effect of the topic or the roles of the participants. Auer (1998, 2007) asserts that as an index
of certain extralinguistic social categories it can be interpreted by participants as indicating
either some aspects of the situation (discourse-related switching), or some features of the
code-switching speaker (participant-related switching).
In discussing how a code signifies a network of interpersonal relationships, McConvell
(1988) believes that we should consider the standpoint and attitude speakers wish to express
and the social domain in which they wish to relate to the interlocutor or the referent. Tay
(1989) argues that code-switching can contribute to solidarity and rapport in multilingual
discourses. Code-switching has been associated with footing, which is defined as the speaker's
"alignment, or set, or stance, or posture, or projected self" (Goffman 1981: 128) and the
projection of a speaker's stance towards an utterance (its truth value and emotional content),
as well as towards other parties and events (Levinson 1988, as cited in Wine 2008: 2). Through
its departure from the established language-of-interaction, code-switching signals otherness of
the upcoming contextual frame and thereby achieves a change of footing (Auer 1998). In
other words, it can affect conversational status and social distance among interlocutors during
the production and reception of an utterance. As a form of footing shift, code-switches can
temporarily suspend social relations that are later resumed, or change the nature of whole
activities (Levinson 1992).
Existing studies have been primarily oriented towards the way speakers alternate
languages and how this indexes speakers' purposes and the communication situation. There
are several exceptions which look into how code-switching relates to the interaction between
interlocutors. For example, in a pioneering study of language alternation in Italian-German
peer talk and adult-child conversations, Auer (1984) analyzed both speaker adjustments and
participation framework phenomena in relation to code-switching and demonstrated how
code-switching may be used to attain a shift in the recipient constellation. Cromdal and
Aronsson (2000) examined in depth speakers' mutual adjustment of actions and reception of
code-switches, revealing that footings are intrinsically interactional achievements. Su (2009)
suggests that code-switching can negotiate interpersonal relationships in a face-threatening
situation on the interactional level in conversational interaction, and can make it easier for the
addressee to identify changes in frames, alignment and footing and react accordingly. The
interactional perspective has informed the research on how code-switching affects the
participation framework. Nevertheless, bilingual conversations have rarely been approached
explicitly from the perspective of whether and how specific code shifts can affect others'
choice of code.

1.2. Code-Switching and Language Learning

Besides revealing the interactive mechanism, analyzing the way codes switch has been
considered relevant to language learning. In the unfolding of meaning, switches can be
indicative of different stages in learners' learning and use of the target language. In
particular, code alternation can fulfill a wide range of functions in cognitive, linguistic,
interactional as well as discourse terms in the L2 setting (e.g., Van Lier 1996, Simon 2001).
Code-switching has been traditionally seen as an asset in communication. As pointed out by
Goffman (1981: 156), switching codes requires "the capacity of a dexterous speaker to jump
back and forth, keeping different circles in play". Heller (1988) sees it as a constructive verbal
strategy used in social interaction which facilitates the effort of interlocutors to seek common
ground in bilingual conversation. Cheng and Butler (1989) contend that it can be seen as an
asset when it is employed to promote the content and the essence of the message. In two more
recent studies (Liebscher 2005, Olmstead 2004), code-switching has been shown to be a
useful conversational resource that enhances sociability by building shared understanding
about the ongoing interaction and indicates participants' orientation toward the interaction and
toward each other. Some other scholars have related code-switching to language
deficiency. For example, Auer's (1984) non-classroom data show that code-switching could
be an indication of a momentary lack of competence. Cheng and Butler (1989) are also
concerned that it can be a deficit when used to the extent that it interferes with
communication. Sert (2005) also reminds us that code-switching may interfere with mutual
intelligibility when learners interact with native speakers of the target language and cause
long-term damage to the foreign language learning process. Whether code-switching plays a
positive or negative role depends largely on the addressee and the specific goal of interaction.
However, from the perspective of second language learning, studying the way learners switch
between languages could reveal non-native learners' communicative capacity, on the
assumption that the function performed by the use of the native language in
target-language-based conversation may indicate a gap in capability and a lack of comfort in
the target language relative to the native language.

1.3. Code-Switching and the Internet

Language used in Internet communication has triggered immense research interest in recent
years. The increasingly widespread use of the Internet, which has developed from a peripheral
cultural phenomenon to an important locale of cultural transformation and production in its
own right, has given rise to new varieties of communities (Porter 1996: 17). Porter describes
the new phenomenon of the Internet as an interface not between the user and the computer,
but between the user and the collective imagination of the vast virtual audience, where one
can find that the whole range of interpersonal dynamics has adjusted to the distinct conditions
of online connectivity. In turn, this establishes conventions of self-presentation and
argument, widely shared systems of value and belief, complete lexicons of gestural symbols
to convey nuances of personal style, and modified standards of social decorum that facilitate
easy interactions with strangers (ibid.: 13). In particular, anonymity has been noted by several
authors as a defining feature of the environment of cyberspace which makes it possible to
consciously shape one's persona by creating alternative versions of one's self (Baym 1995,
Wilbur 1997). In addition, the lack of obligation on the part of the participants of virtual
encounters contributes to a fluidity or changeability that other aspects of life do not have
(Healy 1997, Wilbur 1997). Not surprisingly, the Internet has become a site where virtual
communities of social and cultural interest groups are organized and new modes of
communication are formed.
The Internet chat group is a typical example of such virtual communities, which are
defined by Rheingold (1993: 7) as "social aggregations that emerge from the Net when
enough people carry on those public discussions long enough, with sufficient human feeling,
to form webs of personal relationships in cyberspace". Previous studies (e.g., Friermuth 2001,
Hall 1996, Lam 2004, Tepper 1997) have dealt with online chatting as a distinct form of
communication in the make-believe world. For instance, Bays (1998) asserts that the
combination of textuality and temporality contributes to a conversational mode of the
environment which allows for an enlarged possibility for identity experimentation and fictive
exaggeration of discursive action. Crystal (2001) also points out that in synchronous
communication in computer-mediated contexts, the form of talk has been traditionally seen as
social rather than serious in its content in that it is more context dependent and structurally
simpler to serve specific interactive purposes.
A major distinction has been made between online and real world communication
concerning the form of presence. The subtleties in conventional conversations typically
conveyed by physical qualities such as vocal intonation, stress and gesture become
problematic in the chat room where the encounter is typically not face to face. However, as
proposed by Bays (1998), the need for the underlying sense of presence can be fulfilled by the
physical setting of the computer and the scrolling dialogue, which indicates that there is some
unseen user out there typing and sending responses to their messages, as well as some
discursive strategies, such as addressivity, which allow the users to engage personally in the
electronic setting. According to Bays (1998), participants readjust their contributions for a
valid and desired exchange by recreating presence as the cognitive foundation of conversation
where parallels to ordinary conversation can be found through discursive conventions.
Code-switching has been found to be one such discursive convention, which can
index contextual and metalinguistic information that is conveyed in other ways, e.g.,
prosody, in monolingual settings (Gumperz 1982). Comparable to prosodic parameters and as
a contextualization strategy, it helps create situational co-presence in a pseudo-physical
environment (Auer 1988, Nilep 2006). It has been found to work as a feasible strategy
sustaining viable social encounters. For example, Bays (1998) asserts that alternative
language choice is used as a strategy to achieve and handle disagreement in the Internet chat
room. In Lam's (2004) investigation with two Chinese immigrant high school girls in the US,
the examination of their code-switching practices revealed that the girls' participation in the
chat room should be understood in relation to their experiences in the national context of the
US and demonstrated how alternative identities are sought in the virtual world. Ho (2006)
looked into the bilingual practices of tertiary students in Hong Kong when using ICQ, an
instant messaging computer program. She found that English and Chinese were
complementary to each other in helping participants handle the pressure of instant
communication. Cárdenas-Claros and Isharyanti's (2009) study with some of their MSN
Messenger (another online social networking site) contacts and Goldbarg's (2009) analyses of
the survey results with her personal contacts suggest that online chatters prefer their
first language for conveying more personal content and feelings. These
studies have illuminated how people's choice of code may relate to social realities in
virtual communities and have highlighted the relevance of code-switching to learners' verbal
behaviors in the online context. Nevertheless, the majority of existing studies focused on
people who already knew each other. Relatively little is known about code-switching between
total strangers in an online environment, where the validity and durability of their identities
rely almost exclusively on their presence and behaviors in the virtual context.

1.4. This Study

Although English has been widely accepted as an indispensable tool for achieving academic
and career advancement in China, learners generally do not have much exposure to the
English-speaking environment other than classroom settings. In such a context where English
is rarely used for daily communication, the Internet chat room represents a unique locale of
interactions; it has been regarded by many English learners as a useful and handy site to
practice their English, especially spoken English. It is comparable to the so-called "English
corner" outside EFL classrooms, where spoken English is practiced in the physical world
and the conversation is typically the first encounter between interactors who do not know
each other.
However, not much information is available as to how this group of learners interacts
synchronously in a Net-based environment where co-presence is maintained primarily
through one's literary practice. By studying the code-switching practice and its functions in
the English chat room from an interactional perspective, it is hoped that we can understand
how social relations and interactional meanings are co-constructed through particular forms
of discursive practices. Meanwhile, through analyzing the sequential position in which a
code-switch occurs and how the code choice of one interlocutor affects that of the other, we
can catch a glimpse of the dynamics at play which prompt code-switches and affect the
reception and code choice by the addressee. Finally, it is presumed that code shifts in contexts
where co-presence is maintained primarily through verbal practices not accompanied by
prosody or body language could indicate learners' (lack of) target language competence to
meet various interactional needs.

2. Methods

2.1. Data Collection

The chat room under study is called English WW, a component of www.bliao.com which is
the largest chatting website in China composed of various freely accessible chat rooms
catering for people with different interests. This particular room is one of the twelve English
chat rooms on this website intended for people to practice conversational skills in English; in
other words, the chatters are typically English learners. Chatters use nicknames they make up
for this chat room, which enables them to remain anonymous regarding their real life
identities. As observed from the exchanges, interactors are from a broad range of
backgrounds, including college students, white-collar workers, teachers, etc.
This chat room was selected because it was the most populated English chat room of
the website, with an average of 50-60 participants at a time, which provided a rich resource
for linguistic research. Also importantly, the settings of this chat room enabled the researcher
to easily copy the ongoing conversation.
The researcher logged into the chat room and observed it for about two hours on
twelve consecutive days. The conversations in progress were copied and saved for subsequent
analysis, resulting in approximately 18 hours of verbal exchanges. Due to the voluntariness,
anonymity, irregularity and fluidity of online communities, it was impossible to obtain
demographic information from the participants or keep track of them once they quit the chat
room. Each line is prefaced by the names of both the speaker and the addressee, making it
possible for the investigator to piece them together and obtain individual conversations from
the synchronous multiple conversations which mingled together on the screen.
Then instances of code-switching were identified and examined to find under what
circumstances participants shifted codes, and whether and how the code-switching affected
social relations with the interlocutor and the code choice of the interlocutor.
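
As an illustration only, the step of piecing individual conversations together from the mingled lines on the screen could be sketched in Python as follows. The "speaker>addressee: text" line layout, the sample log and the function name split_conversations are assumptions made for this sketch; the actual layout of the chat room log is not reproduced in this paper.

```python
from collections import defaultdict


def split_conversations(lines):
    """Group interleaved chat lines into two-party conversations.

    Each line is assumed to look like 'speaker>addressee: text'
    (a hypothetical layout chosen for illustration).
    """
    threads = defaultdict(list)
    for line in lines:
        header, sep, text = line.partition(":")
        if not sep or ">" not in header:
            continue  # skip lines that do not match the assumed layout
        speaker, addressee = (name.strip() for name in header.split(">", 1))
        # The same key is used whichever of the two parties is speaking.
        key = frozenset((speaker, addressee))
        threads[key].append((speaker, text.strip()))
    return dict(threads)


# Hypothetical interleaved log built from utterances quoted in the excerpts.
log = [
    "Tim>Vicki: do u think about leaving?",
    "Tina>Jason: are you playing music ne?",
    "Vicki>Tim: leaving from him?",
    "Jason>Tina: yeah",
]

conversations = split_conversations(log)
for pair, turns in conversations.items():
    print(sorted(pair), turns)
```

Keying each thread on the unordered pair of names keeps both directions of an exchange in one sequence, so the order of turns within each conversation is preserved for sequential analysis.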

2.2. Codes Used in the Chat Room

English proficiency varied greatly from chatter to chatter; some people demonstrated
noticeable and frequent grammatical errors in English. However, the major focus of the
present study is not English proficiency variation, but the way people shifted between English
and Chinese. It was noted through observational data that this English-based chat room had
been turned into a peculiar English-based bilingual community through the use of a mixed-code variety of language among the interactors, consisting of English and pinyin, i.e.,
romanized Chinese. Pinyin is a system of romanization for Standard Mandarin. It was
adopted in 1979 by China as the method of phonetic instruction in mainland China and
established by the International Organization for Standardization (ISO) as the standard
romanization for modern Chinese. Pinyin uses Roman letters not to represent the shapes of
Chinese characters, but to spell the sounds of Standard Mandarin (Swofford 2006). It has also
become a convenient tool for entering Chinese language text on computers. Pinyin was found
to be a code preferred to Chinese characters in the chat room under study, which could be
primarily a result of the participants avoiding the trouble of having to convert between
English and Chinese characters, or partly due to the consideration that in an English context,
Chinese characters would appear somewhat abrupt. Therefore, the following section will
focus on the switch between English and pinyin within utterances and across utterance
boundaries. Among the great number of such switches, a large part of which took place in
brief phatic verbal exchanges, three excerpts were selected for detailed qualitative analyses
because they provided relatively complete communicative settings which made it possible to
carry out more meaningful, objective, and rational interpretation and discussion from the
interactional perspective.

3. Discussion

It was found that in many cases, chatters used various combinations of English and pinyin,
which seemed to have worked well with this chat group. A noticeable aspect of the
phenomenon of code-switching was the attachment of Chinese particles to the end of English
utterances. Although pragmatic particles do not contribute significantly to the propositional
content, they affect the utterance as a whole in that they provide contextual coordinates for
the proper interpretation of the speaker's utterances in ongoing discourse (Östman 1982). In
traditional Chinese grammar, sentence-final particles are referred to as yǔqì cí "mood words",
which suggests that their function is primarily to relate in various ways the hosting utterance
to the conversational context and to indicate how this utterance is to be interpreted by the
hearer (Li and Thompson 1981). Although these particles are optional as far as
grammaticality judgments are concerned, they are pragmatically informative and express the
speaker's attitude or emotional state in the communication interchange. As pointed out by
Chao (1968), they are important devices in Chinese that fulfill many of the functions of
intonation in other languages, such as English, which is especially meaningful in online chat
rooms where there is a lack of prosodic features.
In the collected corpus, many conversation participants adopted Chinese sentence-final
particles to cue the modality of their utterances and their orientation to the addressee.
This is particularly interesting because there is no one-to-one correspondence between pinyin
and Chinese characters, not to mention that pinyin mixed in English utterances was not
marked with tones, an important feature of Chinese pronunciation. In the following excerpt,
particles constitute all of the code-switches from English to pinyin. Tim and Vicki
(pseudonyms) are talking about Vicki's relationship with her boyfriend. Vicki is not very
happy with her boyfriend and Tim is trying to help her out by offering suggestions.

Excerpt 1
1   Tim:   do u think about leaving?
2   Vicki: leaving from him?
3   Vicki: how?
4   Vicki: I dont want to mention it first.
5   Tim:   by starting to try another guy ne
6   Vicki: no guy I can try a
7   Vicki: I dont want to play game in fair
8   Tim:   well..u r living in the kingdom of gals?
9   Tim:   its not playing a
10  Vicki: iiiiiiiiiii)
11  Tim:   yes?
12  Vicki: I dont know what should I do
13  Tim:   he loves you?
14  Vicki: he said
15  Vicki: but I cant fell that
16  Vicki: feel
17  Vicki: right now I am working in one company, no gals, so many guys
18  Tim:   which stage does a girl feel the love to her from bf most?
19  Vicki: I have no time to make other bf due to busy
20  Tim:   I see
21  Vicki: at my stage ba
22  Tim:   I mean, at the beginning of an affair, or in the mid, or after getting married ne
23  Vicki: I wish I have bf, we leave in different city but not far from
24  Vicki: marriage is far from me I think
25  Tim:   hehe
26  Vicki: I never think about it

Tim and Vicki have been conversing solely in English for 16 minutes. Then in reply to
Vicki's question in line 3, instead of using a question mark, Tim code-switches to pinyin ne, a
rough equivalent to "how about", which has the function of converting a statement into a
question in a context that is already known (Chu 1998). The tentativeness achieved by suffixing
such a force-reducing particle saves Vicki's negative face and indicates Tim's awareness of
the potential risk of being perceived as impolite and intrusive in advising people, especially
strangers, on their personal affairs. This is Tim's adaptation to this peculiar virtual
environment which lacks nonverbal subtleties that can otherwise be conveyed by body
language or voice features. Tim's change of code to signal his pragmatic intent triggers
Vicki's incorporation of pinyin a in line 6, which, as a sentence-final particle similar in
pragmatic function to ne, reduces the assertiveness of the message conveyed by the sentence
(Li and Thompson 1981). This suggests Vicki's attempt to mitigate the tone of her negative
reply and is a sign that she is aware of Tim's insertion and the potential threat to Tim's
positive face. The same particle is also invoked by Tim in line 9 as a face-saving tone
softener for his disagreement with Vicki's comparison between love and game playing.
Vicki's second code-switch, to ba in line 21, implies her desire to solicit the approval or
agreement of the hearer with respect to the information conveyed by the sentence (Li and
Thompson 1981: 307); its semantic function resembles that of the questions "Don't you think
so?" or "Wouldn't you agree?" in English. This also seems to contribute partly to Tim's code
change in line 22 which combines the English utterance with the Chinese mitigator and
question marker ne.
In brief, Tim initiates the use of Chinese sentence-final particles, which results in a
similar choice of code on the part of Vicki, who, after a little while, also resorts to these
particles, which in turn affects Tim's language use. This extract shows that switching to
romanized particles is used to adjust and negotiate interlocutors' involvement in this virtual
environment and to affect the mutual interpretation and participation in the ongoing dialogue.
Such careful tagging reduces assertiveness of the otherwise monolingual English utterances
and indicates that the accommodation of the interlocutors is probably out of face-maintaining
considerations when making suggestions, showing disagreement or indicating tentativeness.
This mixture of codes facilitates the building of rapport and intimacy between the speakers
involved.
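
For readers who wish to trace this kind of particle uptake across turns, a rough sketch in Python is given below. It is not the procedure used in this study, which relied on qualitative reading; the particle list, the function name final_particle and the matching rule are assumptions made for illustration. The heuristic is crude: romanized a is spelled like the English article, so any hits would still need manual checking.

```python
# Romanized sentence-final particles discussed in this paper (assumed list).
PARTICLES = {"a", "ba", "ne", "ya"}


def final_particle(utterance):
    """Return the utterance-final pinyin particle, or None if there is none."""
    # Ignore trailing punctuation such as '?' before looking at the last word.
    words = utterance.rstrip("?!. ").split()
    if len(words) > 1 and words[-1].lower() in PARTICLES:
        return words[-1].lower()
    return None


# Turns quoted from Excerpt 1, in their order of occurrence.
turns = [
    ("Tim", "by starting to try another guy ne"),
    ("Vicki", "no guy I can try a"),
    ("Vicki", "I dont want to play game in fair"),
    ("Tim", "well..u r living in the kingdom of gals?"),
    ("Tim", "its not playing a"),
]

# Keep only the turns that end in a particle, with the speaker who used it.
tagged = [
    (speaker, final_particle(text))
    for speaker, text in turns
    if final_particle(text) is not None
]
print(tagged)
```

Listing the tagged turns in order makes it easy to see who introduces a particle first and when the other party follows.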
It was also found that the insertion of pinyin minimized the chance of communication
breakdowns by softening the atmosphere that would be otherwise tense. In the following
excerpt, Tina and Jason ask for each other's means of contact, i.e., Tina's number on QQ,
another chat program, and Jason's email address. However, somehow, neither of them
succeeds.
Excerpt 2 Part 1
27  Tina:  are you playing music ne?
28  Jason: yeah
29  Jason: a loving one
30  Jason: got it
31  Jason: so noisy here
32  Tina:  huh
33  Jason: hi
34  Jason: do you have a qq
35  Tina:  no ya
36  Jason: i have no email
37  Tina:  why ??
38  Jason: wo ye mei you a
           [I don't have it either a]

Tina's attachment of ya to her reply to Jason's request for her QQ number in line 35 is a tone
softener, counteracting the forcefulness of her negative reply, which is potentially
face-threatening and offensive because it is likely to be interpreted as a refusal. Jason immediately
returns the rejection in line 36 in an unmarked manner to save his own face by claiming that
he does not have an email, which obviously takes Tina by surprise and threatens her positive
face, as shown by line 37. At this point, the atmosphere is getting tenser and the conversation
seems to have reached a deadlock. Then Jason's unexpected and thorough change of code in
line 38 suggests his realization of the possible embarrassment caused by his bluntness in line
36; it may be a repair of line 36 based on Tina's use of the mixed code in line 35. From
exclusively English to exclusively pinyin, this drastic code change indicates Jason's timely
adjustment to the changing context. Their conversation continues.
Excerpt 2 Part 2
39   Tina: no one have no email in this world
40   Jason: send you my pic by qq
41   Tina: no qq here
42   Jason: oh
43   Jason:
44   Tina: soooooooooooooooooo
45   Tina: even bargain
46   Jason: what?
47   Tina: even bargain ya
48   Tina: you have what i dont have
49   Jason: hehe
50   Tina: and i have what you dont have ya
51   Jason: never mind ya
52   Jason: qq is more convenient than email, at least for me
53   Tina: really? I see
     so pitiful

Jason is not annoyed by Tina's disbelief about his not having an email. Instead, in line 40, he
offers to send his picture to Tina through QQ, which is what Tina has just claimed she does not
have, only to be rejected indirectly by Tina in line 41; she justifies this in line 45 as an "even
bargain". Jason's "what" in line 46 shows that he is either surprised or does not
understand what Tina says, which provides a need and opportunity for Tina to repair the
satirical tone and provocativeness of line 45 through the attachment of ya to lines 47 and 50.
This modification makes the tone lighter and more playful, and thereby reduces the tension.
As a result, Jason seems to be able to conform to this newly emerging norm of code use, and
follows Tina in the use of ya for his "never mind" (line 51), which steers the conversation in a
more friendly direction. Thus the wh-questions of line 37 and line 46 both set off a
subsequent change of code use: before them, the participants use English when trying to
sound assertive to keep their own face; after them, they code-switch to pinyin to various
extents to save the addressee's face. Besides, in this process, the participants are keenly
sensitive to the subtle messages conveyed through code alternation by the interlocutor and
often adjust their code choice accordingly. Social relations are thus implicitly co-constructed
in the virtual environment through a distinctive way of speaking when people modify their
own behaviors in the sequential context of a conversation.
This virtual community is also a place where people's behaviors are manipulated
explicitly through shifts in code. In the following dialogue between Justin and Linda, the
mixed-code variety of language use not only heightens the interpersonal nature of the
conversation but also signifies a process in which Linda gets socialized to behave more
politely in this peculiar environment of interaction.
Excerpt 3 Part 1
54   Justin: hi, Linda, nice to see you
55   Justin: i am wonderful, and you?
56   Linda: glad to hear that
57   Justin: :)
58   Linda: compare with u, i should say it's as usual
59   Justin: that is not easy, many people get worse and worse, don't be greedy la
60   Linda: haha

Here Justin's use of an imperative in line 59 is a Chinese way of establishing intimacy and
solidarity, but there is still a potential risk of being perceived as rude and offensive.
Therefore, la is attached to line 59 as a mitigator. This particle usually appears as a sentence
suffix, used in many Chinese dialects to present a sentence as rather light-hearted and to invite
solidarity. This combination of an English imperative and a Chinese particle proves
effective because Linda is obviously not offended, but amused and pleased, as shown in line
60. The conversation continues as follows.
Excerpt 3 Part 2
61   Justin: dui ba?
     [Is it right?]
62   Linda: dui ni ge tou la
     [right you quantifier head la]
63   Justin: ah?
64   Justin: bu xing,
     [You cannot do this.]
65   Justin: hai shi dui ni de tou ba
     [It would be better if ]

Justin's total code-switch for his tag question in line 61 makes the tone even milder, which
further counteracts the assertiveness in line 59. This is followed by another complete change
of code on the part of Linda in line 62. But Linda's hasty judgment of their social distance
causes her to make fun of Justin. Her ni ge tou la is a teasing way in Chinese of expressing
one's disagreement with or negative opinion of what is said by the addressee, usually used
casually as a pet phrase with intimate friends or people lower in power rank. It can be seen as
Linda's effort to reduce her social distance from Justin. This bold use, as an indication of an
attempt at greater intimacy, is potentially face-threatening and sounds abrupt and impolite in
this context, where interlocutors are usually strangers to each other. It turns out to be
detrimental to the atmosphere and gives rise to a communication crisis, which is substantiated
by Justin's ah in line 63, showing his surprise at the way Linda talked to him. His
subsequent bu xing in line 64 and hai shi dui ni de tou ba in line 65 reveal that he is obviously
offended and are strong protests against Linda's rude verbal behavior. In particular, in line 65,
Justin returns what Linda says in line 62 to Linda, with the addition of the prefix hai shi
(meaning "it would be better if") and the suffix ba. This seemingly polite expression in reality
conveys his dissatisfaction with Linda's manner, on the one hand, and works as a mitigator,
on the other hand, in the sense that it saves Linda's face through the joint use of the
force-reducing prefix and suffix. Therefore, the code-switch in lines 64 and 65 can be perceived as
Justin's blunt correction of Linda's verbal behavior and an indication of a change in
alignment.
Excerpt 3 Part 3
66   Justin: zen me la? bu hao yi si?
     [Are you OK? Do you feel embarrassed?]
67   Linda: dui bu qi
     [I'm sorry.]
68   Justin: bu yao jin
     [Never mind.]
69   Linda: en
     [All right]

It is worth mentioning that Linda then pauses for almost half a minute, which is a likely
indication of Linda's embarrassment resulting from Justin's explicit expression of
displeasure. Justin's continuing use of pinyin in line 66, which shows clearly his concern
about how Linda feels, suggests his awareness of Linda's loss of face due to his utterance in
line 65 and has a remedial function for line 65. It helps alleviate the tension building up
between them that put the conversation on the verge of a breakdown, and finally gets Linda to
apologize using the same code in line 67 for what she says in line 62. Thus, Justin finally
regains his face; subsequently, his bu yao jin in line 68 marks clearly his willing acceptance
of Linda's apology, which is acknowledged by Linda, whose onomatopoeic en in line 69
puts an end to the unpleasant and embarrassing part of their verbal interchange.
Excerpt 3 Part 4
70   Linda: I will leave
71   Justin: for what?
72   Linda: for working
73   Justin: i know le
74   Linda: bai la
75   Justin: that is for money
76   Justin: bai bai

But Linda's switching back to English in line 70 is an intentional attempt to increase social
distance; she obviously does not feel at ease about what has just happened. This results in the same
code change on the part of Justin in line 71, who then attaches another romanized
Chinese particle, le, to "I know" in line 73. According to Li and Thompson (1986: 240), the basic
communicative function of le is to signal a currently relevant state; in other words, it claims
that a state of affairs has special current relevance with respect to some particular situation. In
this case, it signals to Linda in a mild way that Justin has already understood the reason why
she is leaving and represents Justin's effort to soften the tense atmosphere through
manipulating code use. It is followed by Linda's interesting combination of bai (a loan of
English "bye"), which is commonly used among young Chinese intimates, and a sentence-final
particle la. Justin responds in a similar fashion, which concludes this online
encounter. The code shift to pinyin resumed by Justin in line 73 and used by both participants
thus helps to restore rapport between the two persons.
In short, Justin's playful tone, accomplished by his incorporation of pinyin into his
utterances, enhances the intimacy with Linda and also leads to Linda's change of code as well
as her blunt tone, suggestive of her misjudgment of their social distance. Her face-threatening
teasing causes some discomfort in Justin and turns out to be unacceptable to him. The
succeeding use of pinyin is corrective, enabling Justin to make Linda realize that he does not
like the way he was treated, which is followed by his inviting Linda back into the
conversation after realizing Linda's loss of face. Social distance is then increased by Linda by
switching back to English as a retreat from the embarrassment, and is ultimately reduced by
Linda by an interesting mixture of codes when she leaves the chat room. Both acts
immediately change Justin's code choice. Therefore, code-switching facilitates Justin and
Linda's face management and proximity manipulation. It is particularly interesting that the
extent of code-switching seems to vary with the atmosphere and purpose of the speaker. A
complete switch to pinyin or English highlights the utterance and explicitly marks speech acts
as seeking agreement, protesting, apologizing or bidding farewell, indicating the negotiation
of social meanings between the two interlocutors.

4. Conclusions

The above analysis reveals that code-switching, which has been shown to be an interactive
and dynamic negotiation process during which participants shape their social positions and
build their virtual environment, helps Chinese learners of English actively co-construct social
meanings and relations in this virtual chat room. Their code choice and degree of
code-switching are firmly anchored to the situational needs of social distance and face maintenance.
The analyzed conversations lend further support to Olmstead-Wang's (2004: 23) claim that
code-switching, which indicates participants' orientation toward the interaction and toward each
other, is a positive conversational resource that enhances sociability, and allows shared
understandings about the purpose of the interaction to enter into the language practice. It
helps people convey subtle messages that underlie the propositional content and signals a role
shift in the social alignments of the participants. From the interactive perspective, one
person's selection of code constrains the interpretation and the code choice of the addressee,
which in turn has a considerable effect on their context. People's use of code affects the
addressee's involvement in the ongoing dialogue in that it either acknowledges the latter's
intention behind the code choice or corrects his or her behavior perceived as inappropriate.
Chatters in this online virtual community have been shown to draw on the linguistic
and discursive resources of both English and Chinese in the development of a distinct virtual
social network, which contributes to the creation of their relationships as bilingual speakers
who resort to code shifts, especially from English to pinyin, for more subtle interactive and
social purposes. This use of hybrid language also shapes roles for interlocutors in either
encouraging or inhibiting certain types of verbal behaviors. Social distance, identities, and
facework are negotiated rather than pre-established and fixed, which is particularly
meaningful in a context where participants are strangers and other contextualization cues such
as prosodic features and body language are not available.
Tying code-switching in a computer-mediated community in an EFL setting to
approaching the online interaction demonstrates how the electronic chat room provides an
authentic and distinct context of social interaction. It illuminates how language is a valuable
asset that enriches our knowledge of the way specific interactive purposes are served in an
online environment typically populated by strangers. Meanwhile, an examination of the way
Chinese and English are mixed as contextualization cues to index social meanings can inform
our understanding of how people adjust to the practices of the virtual community they are
involved in.
Furthermore, given the lack of visual and audio cues in the context under study, the
investigation of literacy practices in this peculiar online setting also sheds some light on how
the verbal behaviors of English learners in the chat room relate to their local experiences of
English learning. It has to be recognized that the Internet offers unique opportunities for EFL
learners in China to use the target language. It provides a platform for people not only to
practice their English language, but also to create a new collective identity not simply as
English speakers or Chinese speakers, but as learners trying to converse in a language that is
rarely used in their daily life. On the one hand, this mixed-code variety works well among the
interlocutors since there are no obvious signs of confusion and misunderstanding as speakers
seem to have managed to effectively get across the propositional and non-propositional
messages. On the other hand, shifting skillfully to Chinese to various extents complements
the use of English in expressing subtle interactive and social meanings, which should have
been attended to in English, given the purpose of the chat rooms. Their shift to Chinese runs
counter to their purposes in a sense. This phenomenon that members of this community use
English primarily for ideational content and frequently resort to Chinese for interactive and
emotional nuance may suggest their underdeveloped ability to attend to the social and
pragmatic aspects of communication in English relative to Chinese. Therefore, from the
perspective of language learning, this study makes another good case for improving the
interactive competence of English in EFL settings where exposure to authentic language use
is rather limited.

References
Adendorff, R. (1996). The functions of code switching among high school teachers and
students in KwaZulu and implications for teacher education. In K. M. Bailey and D.
Nunan (Eds.), Voices from the language classroom: Qualitative research in second
language education (pp. 388-406). Cambridge: Cambridge University Press.
Auer, P. (1984). Bilingual conversation. Amsterdam: Benjamins.
Auer, P. (1988). A conversation analytic approach to code-switching and transfer. In M.
Heller (Ed.), Codeswitching: Anthropological and sociolinguistic perspectives
(pp. 187-213). Berlin: Mouton de Gruyter.
Auer, P. (1998). Code-switching in conversation: Language, interaction and identity. New
York: Routledge.
Auer, P. (2007). A postscript: code-switching and social identity. Journal of Pragmatics,
37(3), 403-410.
Baym, N. K. (1995). The emergence of community in computer-mediated communication. In
S. G. Jones (Ed.), Cybersociety: Computer-mediated communication and community
(pp. 138-163). Thousand Oaks: SAGE.
Bays, H. (1998). Framing and face in internet exchanges: A socio-cognitive approach.
Linguistik Online, 1. Retrieved June 9, 2008 from http://viadrina.euv-frankfurto.de/~wjournal/bays.htm
Cárdenas-Claros, M. S. and N. Isharyanti. (2009). Code-switching and code mixing in
internet chatting. The JALT CALL Journal, 5(3), 67-78.
Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.
Cheng, L. and K. Butler. (1989). Code-switching: A natural phenomenon vs language
deficiency. World Englishes, 8(3), 293-309.
Chu, C. C. (1998). A discourse grammar of Mandarin Chinese. New York: Peter Lang
Publishing.
Cromdal, J. and K. Aronsson. (2000). Footing in bilingual play. Journal of Sociolinguistics,
4(3), 435-457.
Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.
Freiermuth, M. R. (2001). Features of electronic synchronous communication: A comparative
analysis of online chat, spoken and written texts. Unpublished Master's dissertation,
Oklahoma State University, Stillwater.
Gardner-Chloros, P. (1991). Language selection and switching in Strasbourg. Oxford:
Clarendon Press.
Goffman, E. (1981). Forms of talk. Philadelphia: University of Pennsylvania Press.
Goldbarg, R. N. (2009). Spanish-English codeswitching in email communication. Language
@ Internet, 6. Retrieved June 19, 2009 from
http://www.languageatinternet.de/articles/2009/2139
Gumperz, J. J. (1982). Introduction: Language and the communication of social identity. In J.
J. Gumperz (Ed.), Language and social identity (pp. 1-21). Cambridge: Cambridge
University Press.
Hall, K. (1996). Cyberfeminism. In S. C. Herring (Ed.), Computer-mediated communication:
Linguistic, social and cross-cultural perspectives (pp.147-170). Amsterdam: John
Benjamins.
Healy, D. (1996). Cyberspace and place: The internet as middle landscape on the electronic
frontier. In D. Porter (Ed.), Internet culture (pp.55-72). New York: Routledge.
Heller, M. (1988). Codeswitching: Anthropological and sociolinguistic perspectives. Berlin:
Mouton de Gruyter.
Ho, J. W. Y. (2006). Functional complementarity between two languages in ICQ.
International Journal of Bilingualism, 10(4), 429-451.
Hoffman, C. (1991). Introduction to bilingualism. New York: Longman.
Lam, W. S. (2004). Second language socialization in a bilingual chat room: Global and local
considerations. Language Learning & Technology, 8(3), 44-65.
Li, C. N. and S. A. Thompson. (1986). Mandarin Chinese. Berkeley: University of California
Press.
Liebscher, G. and J. Dailey-O'Cain. (2005). Learner code-switching in the content-based
foreign language classroom. The Modern Language Journal, 89(2), 234-247.
Lu, J. Y. (1991). Code-switching between Mandarin and English. World Englishes, 10(2),
139-151.
McConvell, P. (1988). Mix-im-up: Aboriginal code-switching, old and new. In M. Heller (Ed.),
Codeswitching: Anthropological and sociolinguistic perspectives (pp. 97-149). Berlin:
Mouton de Gruyter.
Myers-Scotton, C. (1988). Self-enhancing codeswitching as interactional power. Language
and Communication, 8(3), 199-211.
Nilep, C. (2006). Code-switching in sociocultural linguistics. Colorado Research in
Linguistics, 19(1). Retrieved June 19, 2008, from
http://www.colorado.edu/ling/CRIL/Volume19_Issue1/paper_NILEP.pdf
Olmstead-Wang, S. (2004). Construction sociability through code-switching in
Mandarin-English family conversations. Unpublished doctoral dissertation. The University of
Alabama, Tuscaloosa.
Östman, J. O. (1982). The symbiotic relationship between pragmatic particles and impromptu
speech. In N. E. Enkvist (Ed.), Impromptu speech: A symposium (pp. 147-177). Åbo:
Akademi.
Porter, D. (1996). Introduction. In D. Porter (Ed.), Internet culture (pp.11-18). New York:
Routledge.
Rheingold, H. (1993). The Virtual Community: Homesteading on the electronic frontier.
Reading: Addison-Wesley.
Saville-Troike, M. (1982). The ethnography of communication. Oxford: Blackwell.
Sert, O. (2005). The functions of code-switching in ELT classrooms. The Internet TESL
Journal, 11(8), Retrieved February 20, 2009 from
http://iteslj.org/Articles/Sert-CodeSwitching.html
Simon, D. L. (2001). Towards a new understanding of codeswitching in the foreign language
classroom. In R. Jacobson (Ed.), Codeswitching worldwide II (pp. 311-342). Berlin:
Mouton de Gruyter.
Van Lier, L. (1996). Conflicting voices. In K. Bailey and D. Nunan (Eds.), Voices from the
classroom. Cambridge: Cambridge University Press.
Levinson, S. C. (1992). Activity types and language. In P. Drew and J. Heritage (Eds.), Talk
at work: Interaction in institutional settings (pp. 181-205). Thousand Oaks: SAGE.
Su, H. (2009). Code-switching in managing a face-threatening communicative task: Footing
and ambiguity in conversational interaction in Taiwan. Journal of Pragmatics, 41(2),
372-392.
Tay, M. W. (1989). Code-switching and code mixing as a communicative strategy in
multilingual discourse. World Englishes, 8(3), 293-309.
Tepper, M. (1997). Usenet communities and the cultural politics of information. In D. Porter
(Ed.), Internet culture (pp.39-54). New York: Routledge.

Swofford, M. (2006). The Three NOTs of Hanyu Pinyin. Retrieved March 15, 2006 from
http://www.pinyin.info
Wilbur, S. P. (1996). An archaeology of cyberspaces: Virtuality, community, identity. In D.
Porter (Ed.), Internet culture (pp.5-22). New York: Routledge.
Wine, L. (2008). Towards a deeper understanding of framing, footing, and alignment.
Working Papers in TESOL & Applied Linguistics, 8(2), 1-3.

Interrogating Current Conceptualisations of Word for Word Knowledge Studies:
Challenges and Prospects
Jabulani Sibanda
Rhodes University
jabusbnd@gmail.com
Bioprofile: Jabulani Sibanda is currently studying for a Ph.D. degree with Rhodes University
(South Africa). He has taught and published in the area of second language teaching and
literacy. His primary research interests are in second language teaching and research, and
literacy development.
Abstract
The present paper interrogates the efficacy of the conceptualisation of the construct "word"
represented by "token", "type", "lemma", and "word family" as units of measurement in
English vocabulary knowledge research studies. It uses Grade 3 second language learners of
English in South Africa as the context for investigating the adequacy and validity of each of
the word units. The paper argues that the token and type units' disregard of the important
principle of learning burden, and the lemma and word family units' over-extension of the
principle, militate against their validity as units of vocabulary measurement. The paper casts
doubt on the feasibility of objectively defining lemma and word family membership
with precision, which compromises their efficacy as units of word measurement. An
extension of Nation and Bauer's (1983) levels of word family membership, through a
determination of the inflected and derived forms of base words that learners show a propensity
to acquire, and of the order of that acquisition, is proposed as a desirable and requisite way
forward.

Keywords: token, type, lemma, word family, learning burden, word knowledge

Introduction
The lofty place of words in language proficiency has long been acknowledged in statements
like "what learners carry around with them are dictionaries and not grammar books" (Baxter
1980) and "without grammar very little can be conveyed, without words nothing can be
conveyed" (Wilkins 1972: 111). Both statements attest to the superior effect of vocabulary
over grammar for the development of language proficiency. In fact, grammar and language
proficiency are an outgrowth of one's lexical competency, which renders word knowledge a
proxy of language proficiency. Research has consistently testified to vocabulary having
higher correlations with language proficiency than other measures (Qian 2002, Koda 2005,
Chen 2011). Words have both an upward and downward influence; downward to their
constituent morphemes and upward to larger units of which they are parts. In the latter, they
form the basis of all language as they are basic units of meaning upon which larger structures
like phrases, sentences, and paragraphs hinge. The bulk of vocabulary research focuses on
individual words. The exalted status of words in language proficiency, coupled with Mármol's
(2011: 12) observation that despite new trends in vocabulary research that focus on higher
units as collocations or idioms, there is no doubt that the word is the main unit in vocabulary
quantification and language by and large, demonstrates the merit of closely
examining the concept "word", which the present paper seeks to do. The interrogation of the
efficacy of the current conceptualisations of the construct "word" is done in the context of
Grade 3 second language (L2) learners transitioning to "reading to learn" in Grade 4. Such a
context, it is hoped, would be illustrative of the need for a further reconceptualization of the
construct "word" for word knowledge measurements on Foundation Phase (FP) L2 learners.

The Context
Grade 3 learners who speak any of the 10 official languages of South Africa (excluding
English) as their Home Language (HL) or First Language (L1), and who are on the verge of a
transition to Grade 4, form the context on which the paper's discussion hinges. The table
below indicates the Home Language distribution according to the 2011 census.
SOUTH AFRICAN LANGUAGES 2011
Language         Number of speakers*    % of total
Afrikaans        6 855 082              13.5%
English          4 892 623              9.6%
IsiNdebele       1 090 223              2.1%
IsiXhosa         8 154 258              16%
IsiZulu          11 587 374             22.7%
Sepedi           4 618 576              9.1%
Sesotho          3 849 563              7.6%
Setswana         4 067 248              8%
Sign language    234 655                0.5%
SiSwati          1 297 046              2.5%
Tshivenda        1 209 388              2.4%
Xitsonga         2 277 148              4.5%
Other            828 258                1.6%
TOTAL            50 961 443**           100%
* Spoken as a home language
** Unspecified and not applicable excluded
Source: Statistics SA

Third graders from such linguistic demographic profiles are expected to learn in their HL for
the duration of the Foundation Phase (Grade R-3) and to shift, largely, to English as the
Language of Learning and Teaching from fourth grade onwards (South Africa Department of
Education Curriculum and Assessment Policy Statement (CAPS) 2011). Prior to the CAPS dispensation
(which has only been phased in with effect from 2012), schools were at liberty to determine
the point at which they wanted to introduce English as a subject in their FP curriculum. The
current third graders, therefore, have had diverse durations of exposure to English, ranging from -1
year to a maximum of 4 years for those who have had exposure to English since Grade R.
Although they have been in school for almost three years, there is a sense in which the
majority of them are beginners in terms of exposure to English. The fact that, for most of
them, English is not sufficiently reinforced at home (CAPS 2011) represents a challenge
which is accentuated by the fact that the focus of fourth grade reading is "reading to learn",
which is qualitatively more challenging than the FP "learning to read". The assumption is that
by the end of third grade the learners have attained reading proficiency in the language they are
going to use to learn, and are now well positioned to use their reading proficiency to learn
textual material. Even among HL speakers of English, a "fourth grade slump", a designation of
the sudden drop-off between third and fourth grade in reading scores (Hirsch 2003:
10), is a common phenomenon. For second language learners who have had scant exposure to
English both at home and at school, the slump could only be worse. Recognising how much
vocabulary is a proxy for language proficiency, a measure of such learners' vocabulary
knowledge would be indicative of their chances of surviving the impending slump. The
question meriting consideration is whether there is a conceptualisation of the construct "word"
which is equal to the task of indicating the actual word knowledge of learners with the profile
described.
Conceptualisation of the Construct Word
The infamous question "What is a word?" has plagued the field of vocabulary testing for years
and has defied singularity or uniformity of definition. Discrepancies in vocabulary size
estimates are primarily a result of a lack of consensus on what constitutes a word for
word-counting purposes. Put differently, if a child knows all the words in the statement "The boy
did not go to the shops when the other boys were going", how many words do they actually
know? Should we keep counting the word "the" the three times it occurs in the statement or
should we just count it once? Can we not presuppose the knowledge of "boys" to be an
outgrowth of the knowledge of "boy" to warrant treating them as the same word? Should "go"
and "going" not be taken as one word in different forms? Such fundamental questions lead to
diverse conceptualisations of the construct "word". In a bid to respond to such questions, the
field of vocabulary measurement has landed itself with four conceptualisations, namely: word
as token, word as type, word as lemma, and word as word family. The relative merits of these
word constructs in relation to the context of this paper require examination. D'Anna,
Zechmeister and Hall's (1991: 111) question, "When we say that a child learns 3,000 or 5,000
words per year, what exactly are we talking about?", is as valid now as it was then.
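
The counting questions raised above can be made concrete with a short sketch. It is an illustration only, not a procedure taken from the studies cited: the tokenizer simply extracts runs of letters, case is ignored, and `BASE_FORMS` is a hand-made, hypothetical mapping that covers just the two pairs mentioned in the text (boy/boys and go/going).

```python
import re

SENTENCE = "The boy did not go to the shops when the other boys were going"

# Hypothetical, hand-made mapping of inflected forms to base forms.
# It covers only the two pairs discussed in the text.
BASE_FORMS = {"boys": "boy", "going": "go"}


def tokenize(text):
    """Return every running word, lower-cased (assumption: case is ignored)."""
    return re.findall(r"[a-z]+", text.lower())


tokens = tokenize(SENTENCE)                               # every occurrence
types = set(tokens)                                       # each distinct form once
grouped = {BASE_FORMS.get(form, form) for form in types}  # inflections folded in

print(len(tokens), len(types), len(grouped), tokens.count("the"))
```

Under these assumptions the same sentence contains 14, 12 or 10 "words", depending on whether every occurrence, every distinct form, or every base form is counted, and "the" contributes three counts in the first case but only one in the others.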

Word as Token
Ordinarily we identify words simply by the space between the strings of letters in written
language (Luitel 2011: 59). This is consistent with Carter's definition, cited in Catalán and
Francisco (2008: 151), of a token as "any sequence of letters (and a limited number of other
characteristics such as hyphen and apostrophe) bounded on either side by a space or
punctuation mark". Any expression devoid of any spaces within it and separated by spaces
from other expressions is consistent with the view of word as token. Such a conceptualisation
can, however, be faulted on the basis of its failure to account for some compound
constructions like "cannot", which can be regarded as one or two words depending on how
they are written. Also, should hyphens be considered as spaces or not? If they should, what
do we say about the inconsistency in the division of compound words like "injustice" and
"in-laws"? Some words like "ice cream" are visualised and thought of as one word despite having
two forms, and there is the complication of whether we need to consider the forms making up
the expression or the concept represented by the forms. Does the fact that an ice cream is one
item make the word a single word or does the presence of two forms make it two words?
Mármol (2011) contends that because such words represent a single concept and learners
learn and understand them as just one concept, they should be considered as single words.
The criterion of spaces demonstrates the uninterruptibility of words: one cannot insert
anything within a word as one could within a sentence. Inserting another word between a
word and its inflection is impossible, but one can always add a qualifier to say more about a
verb or noun in a sentence. Tokens are also referred to as running words in a text, and each
occurrence of a form is counted separately (Luitel 2011: 59). Tokens indicate the total
number of words in a text or corpus, yielding the quantity of input in a text in raw terms
(Mármol 2011). According to Nation (2001), tokens are the conceptualisation of word we
would be making reference to when we talk about a summary, a telegram, or a research paper
being so many words long. Every occurrence of each word is counted despite the recurrence
of some words in the text.
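
A minimal sketch of token counting along the lines of the definition quoted above is given here. The regular expression is an assumption made for illustration (letters, optionally joined by internal hyphens or apostrophes); it is not a tokenizer used by any of the authors cited.

```python
import re

# Letters, optionally joined by internal hyphens or apostrophes.
# The exact pattern is an assumption made for this illustration.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def count_tokens(text):
    """Count running words: every occurrence of every form is counted."""
    return len(TOKEN_PATTERN.findall(text))


for sample in ["cannot", "can not", "in-laws", "ice cream", "the boy's dog"]:
    print(sample, "->", count_tokens(sample))
```

With this pattern "cannot" is one token and "can not" is two, "in-laws" is one token, and "ice cream" is two tokens although it names a single concept, which are the inconsistencies the discussion points to.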
There are limitations to the application of the token as a conceptualisation of word in
vocabulary measurement. Most vocabulary measurement studies utilise word frequencies to
determine the most frequent words and the learners extent of their knowledge. Using the
token as a unit of measurement would make computation of word frequencies impossible
since every stand-alone form is regarded as a different word. Token as a unit of analysis treats
every form as diverse from the others implying that each form has to be learnt separately. In a
statement Your mother was talking to my mother in your garden, the words mother and
your, which appear twice each, are regarded as four different words yet everything about
them (orthographic make-up, meaning, and pronunciation) remains the same. Apart from
treating the same form as a different word whenever it recurs in text, forms like boy and boys
are presumably learnt one by one. This would make vocabulary acquisition and learning a
painfully slow process. What should, and does, happen is that sometimes we learn the
meanings of some words by inferring them from those related words which are already part of
our repertoire. Even the English Second Language (ESL) third graders profiled in this paper
can deductively recover some words' meanings from those they already know. The token,
therefore, falls short as a unit of word counting for word knowledge studies in this and other
contexts. Word as type addresses some of the limitations of the token construct and so
deserves some scrutiny.
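The token count described above can be illustrated with a minimal sketch. The function name `tokens` and the rule that a word is any unbroken run of letters or apostrophes are assumptions made for illustration only; they are not part of any of the studies cited.

```python
import re

def tokens(text):
    """Return every running word in the text, in order, lower-cased."""
    # Assumption: a word is any unbroken run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())

sentence = "Your mother was talking to my mother in your garden"
running_words = tokens(sentence)

# Every occurrence is counted, so repeated forms add to the total.
print("tokens:", len(running_words))
print("'mother' counted:", running_words.count("mother"), "times")
print("'your' counted:", running_words.count("your"), "times")
```

Under this counting, the two occurrences of 'mother' and of 'your' each contribute separately to the total, which is the behaviour the paragraph above identifies as the limitation of the token.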

Word as Type
According to Read (2000), in the conceptualisation of word as type, only the word form that
is dissimilar from all the others in an utterance is counted. Any recurring word form is only
counted once. Using the 'Your mother was talking to my mother in your garden' example, we
can note that although there are ten tokens, there are only eight types since the words 'your'
and 'mother' appear twice in the statement. If we adopt the word as type as the unit of
quantification, all words identically spelt will be considered as one word. Word types would,
then, be all those items with different orthographic identity. Nation (2001: 7) observes that
conceptualising words as types is necessary when responding to questions like 'How large
was Shakespeare's vocabulary?' Conceiving a word as a type is based on two assumptions:
first, that knowing a particular word in one context translates to its knowledge in different
contexts making it one word no matter the number of times it recurs in a text; and, second,
that every individual word type is unique and its understanding does not depend on an
understanding of another. Learners' knowledge of some words should, therefore, not be
inferred from their knowledge of other words. Both assumptions are questionable. The fact
that a word's identity rests on its orthographic composition or spelling leads to problems with
homonyms which take on several meanings depending on the context of use. An overused but
apt example is that of 'bank'. Word as type considers such a form as one word when it can be many
words. Some words also function as both nouns and verbs depending on their use. An
example would be the form 'pin' in the statement 'Get the pin and pin the papers'. The first
'pin' is a noun and the second is a verb, and knowledge of the first does not guarantee that of
the other. This discounts the assumption that because the word is spelt the same, it is the same
word wherever it is encountered. The other limitation of the word as type construct is the
disregard of the idea that some words' meanings can be extrapolated from knowledge of
related others. Knowing the word 'boys' logically presupposes knowledge of the word 'boy'
and the two could well be considered as one word even for ESL FP learners. Such a
generalisation, which the type lacks, is the basis upon which the lemma is built.
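The type count, and the two limitations just discussed, can be sketched as follows. The helper names `tokens` and `types` are hypothetical, and lower-casing before comparison is an assumption (it is what makes 'Your' and 'your' the same type, as the ten-token, eight-type count in the text requires).

```python
import re

def tokens(text):
    """Every running word, lower-cased (assumed tokenisation rule)."""
    return re.findall(r"[a-z']+", text.lower())

def types(text):
    """Distinct orthographic forms: each spelling is counted once."""
    return set(tokens(text))

sentence = "Your mother was talking to my mother in your garden"
print(len(tokens(sentence)), "tokens,", len(types(sentence)), "types")

# Limitation 1: the noun 'pin' and the verb 'pin' collapse into a single type.
homograph = "Get the pin and pin the papers"
print(sorted(types(homograph)))

# Limitation 2: 'boy' and 'boys' remain two unrelated types.
print(sorted(types("the boy saw the boys")))
```

Because identity rests on spelling alone, the sketch merges forms that differ in word class and separates forms that differ only by an inflection.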

Word as Lemma
The lemma is preferred for lexical quantification on account of overcoming the limitation of
having to consider each word form as a unique form unrelated to the other forms, as do the
type and token conceptualisations. Gardner (2007) notes that, in a lemma, all lexical forms
share the same stem and word class, and differ only in inflection or orthographic make-up.
The words write, writes, writing, written and wrote are all verbs emanating from the base
form write. The -s, -ing, -en are the inflections which are just indicative of a change in
grammatical functioning of the same base word write. The lemma is based on the assumption
that the knowledge of the inflected forms is eased and expedited once the base form, as well
as the morphological inflections, are known. The learning burden, which Nation (2001)
defines as the amount of effort required to learn a new word, is eliminated or eased
considerably if the base word is known. Knowledge of the inflectional system of English
would ease the learning of the inflected forms on the basis of the knowledge of the base form.
The other justification for considering inflected forms as one word with the base form is that
morphemes do not create new words; they merely modify the form in which they occur to
indicate grammatical functioning, such as plurality. The base form which has to be known in
this instance is write and what the inflections do is to give grammaticality to the functioning
of the same word in different contexts.
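Counting by lemma can be sketched as grouping forms under a shared headword and word class. The lookup table `LEMMA_OF` is a hypothetical, hand-built illustration and not a real lemmatiser; treating every form of write as a verb is likewise a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical hand-built lookup: word form -> (headword, word class).
LEMMA_OF = {
    "write": ("write", "verb"),
    "writes": ("write", "verb"),
    "writing": ("write", "verb"),
    "written": ("write", "verb"),
    "wrote": ("write", "verb"),
    "boy": ("boy", "noun"),
    "boys": ("boy", "noun"),
}

def group_by_lemma(forms):
    """Group word forms under their (headword, word class) pair."""
    groups = defaultdict(list)
    for form in forms:
        # Forms missing from the lookup are treated as their own headword.
        key = LEMMA_OF.get(form.lower(), (form.lower(), "unknown"))
        groups[key].append(form)
    return dict(groups)

sample = ["write", "writes", "writing", "written", "wrote", "boy", "boys"]
groups = group_by_lemma(sample)
print(len(sample), "forms ->", len(groups), "lemmas")
for (headword, word_class), members in groups.items():
    print(headword, word_class, members)
```

Seven types are thus reduced to two countable units, which is the economy the lemma offers over the type.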

The requirement of having all members of a lemma belong to the same word class
would disqualify the form writer from the lemma of write, writes, writing, written and wrote
as it belongs to the class of nouns. It would become a base word for a different lemma of
writer, writers, writer's and writers'. The assumption is that the learning burden of words
emanating from the base form belonging to the same word class is less than that of inflected
forms from the same base which cut across word classes. Browne, Cihi and Culligan (2007:
2) exemplify and corroborate this assumption when they posit that the statistical item
difficulty factors for accept, accepts and accepting are very close, whereas the statistical
difficulties for acceptable, acceptance and unacceptable, are all quite different. One
hypothesis is that the brain treats these six items as four different Base Words. Such an
argument necessitates and rationalises the confinement of members of a lemma to a single
word class. The example of the six word forms given fits the argument well but, going back to
the examples of inflected forms emanating from write, one may argue that knowledge of the
base form write may make the form writer easier for one learner than the forms wrote or written,
which belong to the same word class as write. That the definition of a lemma cited above
accommodates irregular verbs like went for go, sought for seek or am, is, are, was, were,
being for be within a lemma makes the assumption that belonging to the same part of speech
as the base reduces the learning burden of a word highly suspect. As Gardner (2007: 244)
observes, the case of the irregulars poses serious quandaries relating to the psychological
validity of such family relationships, namely, that the opaque spelling and phonological
connections between the lemma headword and the family members will surely cause more
and different learning problems than their more transparent counterparts. This defeats the
whole principle of learning burden which the lemma was created to uphold.
Nation (2001: 8) registers concern over the inclusion of irregular forms within a
lemma when he notes that one problem in forming lemmas is to decide what will be done
with irregular forms such as mice, is, brought, beaten and best. The learning burden of these
is clearly heavier than the learning burden of regular forms like books, runs, talked, washed
and fastest. Should the irregular forms be counted as a part of the same lemma as their base
word or should they be put into separate lemmas? The orthographic constitution or spelling
of the word best is not in any way indicative of stemming from the base form good.
Including it within the lemma of good would impose an even higher burden in recovering
its meaning from the latter than would learning its antonym bad, for instance.
Irregular plurals or verbal forms may need to be considered independently from their
headwords but such exclusion would mean quite a number of words would just be treated as
types or tokens as they cannot belong to lemmas. Words like good, better and best would not
be part of any lemma, nor would any of the irregular forms. The lemma should be a grouping of all
those words whose understanding is almost made obvious whenever the base form is known,
rather than a collection of words which are brought together by virtue of their being inflected
from the same base form. Irregular forms normally use inflections different from regular ones,
which gives an abstract status to morphemes. The regularity of frequent or regular inflections
stems from their being the inflections added to the vast majority of content words (verbs,
nouns, adjectives, and adverbs) to reflect grammatical properties such as tense, number, and
degree. The criteria of inflection and belonging to the same word class are not tight enough to
ensure only those words whose meanings are easily recoverable from the meaning of the base
gain entrance into the lemma.
Nation (2001) broadens the scope of a lemma to include the contracted forms. One
may express reservations over the inclusion of contracted forms on at least two grounds. First,
knowledge of the contracted form requires knowledge of, not only the base form, but also that
of not since the contracted form is both a fusion and reduction of two words (for example,
can + not = can't). Second, there are transparent and opaque kinds of contractions and the
opaque contractions cannot easily be inferred from the base form + not. Transparent
contractions would be forms like have + not = haven't, do + not = don't and the opaque
forms would be will + not = won't, am + not = ain't, shall + not = shan't. The opaque
contractions have a higher learning burden which does not justify treating them as part of the
same lemma as the base especially for vocabulary knowledge measurement on second
language Foundation Phase learners. Asserting that beginners can associate such irregular
forms with their headwords is fundamentally unrealistic.
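The distinction between transparent and opaque contractions can be approximated orthographically. The rule used here, that a contraction is transparent when it is spelt exactly as the base form followed by n't, is an assumption introduced for illustration; the paper itself gives only the examples, not a rule.

```python
def is_transparent(base, contraction):
    """Assumed rule of thumb: transparent if spelt as base form + n't."""
    return contraction == base + "n't"

# Base form and contraction pairs taken from the examples in the text.
pairs = [
    ("have", "haven't"),
    ("do", "don't"),
    ("will", "won't"),
    ("am", "ain't"),
    ("shall", "shan't"),
]

for base, contraction in pairs:
    label = "transparent" if is_transparent(base, contraction) else "opaque"
    print(f"{base} + not = {contraction}: {label}")
```

On this rule the five pairs fall into the same two groups as in the text, although a spelling test of this kind says nothing about how learners actually perceive the forms.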
Possibly from realising the problems of having overly accommodative criteria for a
lemma, Milton (2009: 10) makes the conception of a lemma less accommodative but more
manageable by narrowing its definition saying, it '...includes a headword and its most
frequent inflections and this process must not involve changing the part of speech from that of
the headword'. In formulaic terms, the definition of a lemma can be represented thus:

Lemma = headword + most frequent inflections + their contracted forms (belonging to the same class)
The use of the phrase 'most frequent' is noteworthy and could well be interchanged or used
together with 'transparent'. The only problem with 'most frequent' is that it leaves the
determination of 'most frequent' to the researcher's discretion in the absence of frequency lists
of inflected forms. The frequency also needs qualification, whether it is the frequency with
which the inflected form is used in a text, or the frequency that stems from the number of
English words that an inflection inflects. The former kind of frequency would be relative to
text as frequent forms in one text may be less frequent in another. A definition of word whose
criterion is of a relative nature is not tight enough to allow easy and objective application. The
latter kind of frequency does not guarantee that inflections that have a lower spread in their
use are more difficult than those that impact a wide range of word forms in the language.
The lemma is also based on an assumption that inflections are easier than other forms
of affixation (prefixation and suffixation), an assumption which can be challenged. Some suffixes like -able
and -less and prefixes like un- have meaning in and of themselves which can be used to
recover the meaning of suffixed and prefixed forms like suitable, careless and unfair;
yet, inflections are devoid of such independent meaning. Such systematic use of affixes can
be used to significantly reduce the learning burden of the words derived from a known base
form. That the inflections -s and -es can be used for both verb and plural forms can be a
confounding factor on its own. This is not to imply such is absent from affixed forms.
In this paper, reference has repeatedly been made to the base form, better known as the
headword, but what really constitutes or counts as a headword is not clear. Nation (2001)
raises Sinclair's concern whether a headword should be the base form or the most frequent
form. The base form may not be the most common form or the form that learners are likely to
acquire first. The base itself can be recoverable from the most common form which justifies
the supposed complication of which to consider as the headword, the base form or the most
common form. That the construct lemma is elusive to define with precision explains why,
although the comparative and superlative forms have always been considered English
inflections, Nation (2001) notes that, in the computerised, lemmatised list of the Brown
Corpus (Francis and Kučera 1982), these are excluded.
Stubbs (2002) proposes an additional criterion for membership into a lemma: the
requirement that all the members share the same meaning, a criterion challenged for its failure
to distinguish a lemma from a lexeme. The lexeme also denotes a group of words sharing the
same meaning and same word class which the lemma does as well. An additional criterion
complicates the determination of what it is that should gain admission into the lemma
membership. Acknowledging the difficulty of constituting a lemma and the unconvincing
conclusions often emanating from generalisations about whole lemmas, Knowles
and Mohd Don (2004: 71) advise researchers to consider individual words or actually
even individual word meanings as the basis for their word count and analyses. This is
almost a call to revert to the conceptualisation of word as type.
Brain research has provided insights which support the learning burden principle but
not the constitution of lemmas. Browne, Cihi and Culligan (2007: 2) assert that the brain
stores and processes lemmas having similar difficulty factors as forms of the same word,
and stores and processes lemmas having different difficulty factors as different words. The
idea of coming up with a formula for defining what qualifies as a lemma is a noble one which
seeks to make the determination of lemmas objective. We have already seen how some
inflected or contracted forms are more difficult than others, implying that there is no
justification in generalising that because a word is an inflection or contraction of a base form
then it should enjoy lemma membership. Browne, Cihi and Culligan's (2007) observation that
some lemmas are registered by the brain as separate words, rather than one word, casts doubt
on the validity of lemmas as a unit of vocabulary counting and analysis. That the brain does
not always store and process lemmas as we constitute them points to the need for either a
revisiting of the constitution of lemmas or the creation of another unit of counting.

Word as Word Family


Nation (2001) identifies the components of a word family as a headword, its inflected forms
and closely derived forms (derivatives). Derivation differs from inflection in that, while
inflection does not produce separate words, derivation creates separate but morphologically
related words usually involving some change in form. A subjective element is introduced by
the expression 'closely derived forms' as one cannot make, with objective certainty, the
determination of the closely derived forms and those not so closely derived. Words from
across word classes can gain membership into a word family. Word families have their basis
on the understanding that the acquisition of thousands of words is through the application of
rules which make words into morphological families which ensures little or no extra
learning when one or more of the members is already known to the learner (Chung 2009:
162). For instance, the process of affixation, which includes prefixation and suffixation, eases
the learning of a lot of words. A word family, therefore, includes a wider range of
inflections and derivations as the basis of word counts (Milton 2009: 11). Our word family
formula would be:

Word Family = Base form + Basic inflected forms + Transparent derivatives

The learning burden principle is the basis upon which the word family unit is constructed.
Knowledge of the base form engenders knowledge of its inflections and its close derivatives.
The word family unit is more accommodative of members than the lemma. In
the first place, there is an inclusion of derivatives which are not included in the lemma, and
second, the restriction of having members belong to the same word class does not apply.
Word family members traverse boundaries of grammatical classes. Several lemmas usually
find themselves part of a single word family. From the base form long can come long, longer,
longest, longevity, longish, length, lengthen, lengthy; and all these can be considered as one
word under the word family unit of analysis. Certainly, all these forms cannot carry a similar
learning burden relative to the base form to warrant inclusion in the same word family. Even
derived forms differ in their complexity and difficulty of comprehension (Browne, Cihi and
Culligan 2007). That all these forms would be known once the base form is known is the
argument behind the word family unit. Mármol (2011: 12) challenges such an assumption by
pointing out that 'we cast doubt on the idea that a child acquiring bed has also acquired
bedroom. There is the possibility that an adult could guess the meaning of the latter, but a
young language learner in his first stages of acquisition may not be able to make those
inferences'. The word family unit depends for its use on the learner's possession of an
intricate knowledge of morphological inflections of the English language in order to make
intelligent guesses about the meaning of some words on the basis of knowledge of their base
form. Evidently, learners, such as the ones described in this paper, would not possess the
native-like knowledge of morphological relations between words in a family. Schmitt and
Zimmerman's (2002) study, which required non-native postgraduate and undergraduate
participants to identify the derivational forms of stimulus stem words, revealed that
participants could only rarely provide all the different derivations of the stimulus words. This
suggested only partial knowledge of derivational forms on the part of the participants. Bauer
and Nation (1993) even add that learners should know that 'mean' does not derive from 'me',
despite the orthographic or spelling string for 'me' occurring in 'mean'. Learners should also
have some implicit knowledge of the role of affixes (prefixes and suffixes) in word formation
and word meaning, as well as use permissible base-affix combinations in speech and writing.
Because it takes in a broader membership and treats the different members as one, most if not
all the challenges confounding the application of a word family for word frequency counts
and word knowledge analysis are similar to, and even take a greater magnitude than, those of
lemmas as discussed in this paper. The challenge of deciding what should be included in a
word family and what should not is as manifest in the word family unit as in lemmatisation.
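The difference in breadth between the lemma and the word family can be sketched with the forms derived from long. The headword and word-class assignments in `LEMMA_OF` are assumptions made for illustration (for example, treating longish and lengthy as headwords of their own lemmas); only the list of eight forms comes from the text.

```python
# Assumed (headword, word class) assignment for each form of the 'long' family.
LEMMA_OF = {
    "long": ("long", "adjective"),
    "longer": ("long", "adjective"),
    "longest": ("long", "adjective"),
    "longevity": ("longevity", "noun"),
    "longish": ("longish", "adjective"),
    "length": ("length", "noun"),
    "lengthen": ("lengthen", "verb"),
    "lengthy": ("lengthy", "adjective"),
}

# Under the word family unit every form points back to the same base form.
FAMILY_OF = {form: "long" for form in LEMMA_OF}

def count_units(forms, unit_of):
    """Count how many distinct units the forms reduce to under a given mapping."""
    return len({unit_of[form] for form in forms})

forms = list(LEMMA_OF)
print("types:", len(forms))
print("lemmas:", count_units(forms, LEMMA_OF))
print("word families:", count_units(forms, FAMILY_OF))
```

The same eight forms therefore yield very different word counts depending on the unit chosen, which is why the choice of unit bears so directly on vocabulary size estimates.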
Bauer and Nation (1993) studied inflections and affixations of English words based
on their productivity, frequency, regularity and predictability and came up with a scheme for
defining word-families. They came up with seven levels or a word family scale based on an
analysis of the 1,000,000 token Lancaster-Oslo-Bergen (LOB) corpus dealing mainly with
affixation. These levels were supposed to form the basis for teaching and learning of English
words. The scheme is a welcome acknowledgement that learners' knowledge of affixation
develops with more experience of the language. A sensible word family for one learner may
be beyond another learner's current level of proficiency. This necessitates the scaling of word
families from the most elementary and transparent members to those of less obvious
possibilities (Nation 2001). At level 1, learners are assumed to treat each form as a different
word. The table below, adapted from Bauer and Nation (1993: 254), sets out the scale, with the
second to the seventh levels covering inflections and affixations.

Level   Affixation and inflection
1       No affixes.
2       -s, -ing, -ed, -er, -est (all inflections)
3       -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un- (Most frequent and regular
        derivational affixes)
4       -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in- (Frequent, orthographically
        regular affixes)
5       -age, -al, -ally, -an, -ance, -ant, -ary, -atory, -dom, -eer, -en, -ence, -ent, -ery, -ese,
        -esque, -ette, -hood, -l, -ian, -ite, -let, -ling, -ly, -most, -ory, -ship, -ward, -ways,
        -wise, ante-, anti-, arch-, bi-, circum-, counter-, en-, ex-, fore-, hyper-, inter-, mid-,
        mis-, neo-, post-, pro-, semi-, sub-, un- (Regular but infrequent affixes)
6       -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re- (Frequent but irregular
        affixes)
7       ab-, ad-, com-, de-, dis-, ex-, and sub- (Classical roots and affixes)

N.B.: Bracketed words in italics at the end of levels 4 through 7 are not part of the original.
Gardner (2007: 247) appreciates the apparent advantage of this seven-level
categorization scheme that Word or Word Family can be operationalized at various
defensible levels for analysis and comparative analysis purposes at least in terms of
learners' abilities to associate morphologically related words. Bauer and Nation (1993) need
to be applauded for hierarchically organising word family levels which can be matched with
learners' competence levels. For a learner operating at level 5, for instance, all the words in
levels 1 to 5 emanating from the same base would be considered as a single word, but those in
levels 6 and 7 would be regarded as different words from their base form. It is also significant
that such categorisation was done systematically on the basis of the rigorous criteria identified
above (their productivity, frequency, regularity and predictability) and on a large corpus (a
million words) which gives the categorisation a substantial measure of validity.
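The matching of word family levels to a learner's competence level can be sketched as a simple threshold. The example family built on advantage and the helper `count_words` are hypothetical; the level of each form is read off the affix table above (-s at level 2, -ous at level 4, dis- at level 7), with the unaffixed base placed at level 1.

```python
# Level of each form, following the affix it carries in the table above.
FAMILY_LEVELS = {
    "advantage": 1,      # base form, no affix
    "advantages": 2,     # -s
    "advantageous": 4,   # -ous
    "disadvantage": 7,   # dis-
}

def count_words(family_levels, learner_level):
    """Count the separate words this family represents for a given learner level."""
    merged = [f for f, level in family_levels.items() if level <= learner_level]
    separate = [f for f, level in family_levels.items() if level > learner_level]
    # Forms at or below the learner's level collapse into one word;
    # each form above that level is counted as a word in its own right.
    return (1 if merged else 0) + len(separate)

for learner_level in (1, 5, 7):
    print("level", learner_level, "->", count_words(FAMILY_LEVELS, learner_level), "word(s)")
```

The same four forms thus count as four words, two words or one word depending on the level at which the learner is assumed to operate.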
Gardner, however, notes as problematic, the repetition of many affixed forms at the
different levels, failure to acknowledge that derivational prefixes and derivational suffixes
may present different learning dilemmas for developing readers, as well as assuming that
learners' exposure to, and acquisition of, morphologically-related words is somehow linear in
nature, in other words, that language learners acquire base forms before their inflected and
derived family members (2007: 247).
Such an assumption is refuted by Biemiller and Slonim (2001), who note that young
children may actually acquire many derived forms before they acquire their root-form
counterparts. Concerning the duplication of affixes, an example would be the suffix -able in
level 3 and in level 6 which presents uncertainty about membership level of forms like
suitable on the word family scale. The assumption of the linear nature of exposure and
acquisition of word family members rests on a shaky pedestal. A form like disadvantage
(level 7, according to the taxonomy of levels of inflections and affixations) can have a lower
learning burden than advantageous (level 4).
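The duplication of affixes across levels can be checked mechanically. The affix lists below are transcribed from the table as printed in this paper, not from Bauer and Nation's original article, and the function `repeated_affixes` is a hypothetical helper.

```python
from collections import defaultdict

# Affixes per level, transcribed from the table as printed in this paper.
LEVELS = {
    2: ["-s", "-ing", "-ed", "-er", "-est"],
    3: ["-able", "-er", "-ish", "-less", "-ly", "-ness", "-th", "-y", "non-", "un-"],
    4: ["-al", "-ation", "-ess", "-ful", "-ism", "-ist", "-ity", "-ize", "-ment",
        "-ous", "in-"],
    5: ["-age", "-al", "-ally", "-an", "-ance", "-ant", "-ary", "-atory", "-dom",
        "-eer", "-en", "-ence", "-ent", "-ery", "-ese", "-esque", "-ette", "-hood",
        "-l", "-ian", "-ite", "-let", "-ling", "-ly", "-most", "-ory", "-ship",
        "-ward", "-ways", "-wise", "ante-", "anti-", "arch-", "bi-", "circum-",
        "counter-", "en-", "ex-", "fore-", "hyper-", "inter-", "mid-", "mis-",
        "neo-", "post-", "pro-", "semi-", "sub-", "un-"],
    6: ["-able", "-ee", "-ic", "-ify", "-ion", "-ist", "-ition", "-ive", "-th",
        "-y", "pre-", "re-"],
    7: ["ab-", "ad-", "com-", "de-", "dis-", "ex-", "sub-"],
}

def repeated_affixes(levels):
    """Return each affix listed at more than one level, with the levels concerned."""
    seen = defaultdict(list)
    for level, affixes in sorted(levels.items()):
        for affix in affixes:
            seen[affix].append(level)
    return {affix: found for affix, found in seen.items() if len(found) > 1}

for affix, found in repeated_affixes(LEVELS).items():
    print(affix, found)
```

Any affix returned by the check leaves the level of a form carrying it undetermined on spelling alone, as with suitable and -able.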
Such categorisation as Bauer and Nation (1993) come up with seems to come as a
solution to the challenge of determining what qualifies as a member of a word family. The
present paper, however, takes exception to the idea of basing the categorisation of the word
family levels solely on a corpus without complementing it with empirical
evidence of the ease with which learners acquire the different affixed forms. This is not a
criticism of Bauer and Nation's (1993) work but a pointer to the need for further large scale
research to corroborate the match between the levels of the corpus analysis and the
psychological realities of learners' word learning and acquisition.

Conclusion: The Potential Way Forward


Morpheme studies by Dulay and Burt (1974), Fathman (1975) and Makino (1980), cited in
Krashen (1982), showed that the acquisition of English
grammatical structures follows a 'natural order' which is predictable and is independent of
instruction, learners' age, L1 background, or conditions of exposure. Such studies need to be
resuscitated, extended and conducted with both L1 and L2 learners of different language
backgrounds to determine the
extent of the match between the language corpus ideals and the psychological realities of the
learners. Brown's (1973) longitudinal study, reported in Kwon (2005: 4), produced the
following order of L1 acquisition of English morphemes.

Rank    Morpheme
1       Present progressive (-ing)
2/3     in, on
4       Plural (-s)
5       Past irregular
6       Possessive (-'s)
7       Uncontractible copula (is, am, are)
8       Articles (a, the)
9       Past regular (-ed)
10      Third person singular (-s)
11      Third person irregular
12      Uncontractible auxiliary (is, am, are)
13      Contractible copula
14      Contractible auxiliary

The above hierarchical ordering is limited in two ways. First, the studies are based on native
English language speakers and importing the ranking wholesale to ESL learners may be
misleading. Second, the ranking is based exclusively on morpheme studies when in fact most
high frequency words are just sight words which cannot be reduced to their morphological
composition. The paper, therefore, argues for extensive testing and documentation of the
acquisition order of English affixed forms (suffixed and prefixed for both inflections and
derivations). The testing should cover a wide range of learner profiles from diverse language
backgrounds and competence levels. The resultant taxonomy should ensure that only those
lexical forms which pose negligible or no learning burden in the event that the base form is
known, are regarded as one word. Two forms may justifiably be regarded as one for one
learner but not for another depending on their level of competence. A taxonomy of word
conceptualisation levels is, therefore, needed where, at the first level, some lexical forms may
be regarded as separate words but, at the next levels, be considered as one word. Researchers
would then choose the level at which they conceptualise word for their word knowledge
measurements depending on the competence level of the learners. A departure from a 'one
size fits all' approach would make possible the replication of studies. One would just need to specify
that they based their studies on level 3 of the word conceptualisation taxonomy. Explicit rules
would need to be generated for word membership at each level and exceptions identified.
Even teachers would know which lexical forms they need to give preference to for explicit
instruction depending on the competence level of the learners. A move away from the current
word conceptualisations would ensure more realistic and valid conclusions on word
knowledge measurement studies.

References
Baxter, J. (1980). The dictionary and vocabulary behavior: A single word or a handful?
TESOL Quarterly, 14, 325-336.
Bauer, L. and I. S. P. Nation. (1993). Word families. International Journal of Lexicography,
6, 253-279.
Catalán, R., J. and R. M. Francisco. (2008). Vocabulary input in EFL textbooks. RESLA, 21,
147-165.
Chen, K., Y. (2011). The impact of EFL students' vocabulary breadth of knowledge on
literal reading comprehension. Asian EFL Journal, 51, 30-40.
Chung, T., M. (2009). The newspaper word list: A specialised vocabulary for reading
newspapers. JALT Journal, 31(2), 159-182.
D'Anna, C., A., E. B. Zechmeister and J. W. Hall. (1991). Toward a meaningful definition
of vocabulary size. Journal of Literacy Research 23(1), 109-122.
Gardner, D. (2007). Validating the construct of word in applied corpus-based vocabulary
research: A critical survey. Applied Linguistics, 28(2), 241-265.
Hirsch, E. D. (2003). Reading comprehension requires knowledge of words and the world:
Scientific insights into the fourth-grade slump and the nation's stagnant
comprehension scores. American Educator. American Federation of Teachers.
Knowles, G. and Z. Mohd Don. (2004). The notion of a lemma: Headwords, roots and
lexical sets. International Journal of Corpus Linguistics 9(1), 69-81.
Koda, K. (2005). Insight into second language reading: A cross-linguistic approach.
Cambridge: Cambridge University Press.
Krashen, S. D. (1982). Principles and practice in second language acquisition. Oxford:
Pergamon.
Kwon, E. Y. (2005). The natural order of morpheme acquisition: A historical survey and
discussion of three putative determinants. Teachers College, Columbia University
Working Papers in TESOL and Applied Linguistics, 5(1), 1-21.
Luitel, B. (2011). Vocabulary in the new B.Ed. general English under Tribhuvan University.
Nepal English Language Teachers Association Journal of NELTA, 16(1-2), 59-69.
Mármol, A. (2011). Vocabulary input in classroom materials: Two EFL coursebooks used
in Spanish schools by Gema. RESLA, 24, 9-28.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol: Multilingual
Matters.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge
University Press.
Qian, D. D. (2002). Investigating the relationship between vocabulary knowledge and
academic reading performance: An assessment perspective. Language Learning,
52(3), 513-536.
Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.
Schmitt, N. and C. B. Zimmerman. (2002). Derivative word forms: What do learners
know? TESOL Quarterly, 36(2), 145-171.
South Africa Department of Basic Education Curriculum and Assessment Policy Statement
(CAPS). (2012). Foundation Phase First Additional Language. Pretoria: Government
Printer.
Stubbs, M. (2002). Words and phrases: Corpus studies of lexical semantics. Oxford:
Blackwell Publishing.

On Gendered Styles and their Socio-Cognitive Foundations


María José Serrano
Universidad de La Laguna
Miguel A. Aijón Oliva
Universidad de Salamanca
Bioprofiles:
María José Serrano is a Full Professor at the University of La Laguna (Spain). Her research
areas are syntactic variation, sociolinguistics, pragmatics and cognitive linguistics. Some of
her recent publications are: Sociolingüística (2011, Ediciones del Serbal) and Variación
variable (2011 ed., Círculo Rojo). She has also published articles in Spanish in Context,
Language Sciences, Language & Communication and Folia Linguistica, among others.
Miguel A. Aijón Oliva is an Associate Professor at the University of Salamanca (Spain). His
research interests include variation in Spanish morphosyntax and the development of styles
from pragmatic, sociolinguistic and cognitive viewpoints. His recent work has been published
in journals like Language Sciences, Language and Communication and Folia Linguistica.
Abstract
The dynamic construction of gender as a set of communicative values is subject to ever-growing
interest amidst the social sciences. The main purpose of this investigation is to
outline a theoretical and analytical frame that reconciles both the quantitative and qualitative
perspectives on language and gender. A view is developed of the statistical patterning of
linguistic usage as reflecting the meaningful use of linguistic elements in local contexts. The
adequacy of such an approach is subsequently tested through the analysis of syntactic choices
in male vs. female speakers. Syntactic variants are not synonymous, but entail particular
discursive and cognitive meanings, and thus may contribute to the shaping of different
communicative styles.
The syntactic phenomenon chosen for the study is the expression vs. omission of
Spanish pronoun subjects in spontaneous conversation and media discourse. Three different
forms are separately studied: yo 'I', nosotros 'we' and tú 'you (sing.)', and their notional
peculiarities are taken into account. In all cases, quantitative analyses certify frequential
differences in the distribution of expression vs. omission across male and female discourse.
More specifically, men display significantly higher rates of expressed subject pronouns, while
women are more inclined towards omission. Statistical calculations are then complemented
by the contextual observation of some gendered stylistic values that seem to be brought into
play through syntactic choice. A relationship is suggested between gendered styles and the
discursive-cognitive continuum from objectivity to subjectivity, this being reflected on a wide
range of communicative possibilities.
Keywords: gender, syntactic variation, pronouns, style

1. Gender as Style1
The study of the relationships between sex/gender and communication is an ever-developing
area of social science research with quite a long history behind it, and one that currently
offers some of the most promising prospects for sociolinguistics. Far from traditional clichés
and prejudices on the subject, a fair deal of consensus has been reached regarding the fact that
linguistic-communicative usage is usually less conditioned by biological sexual factors than
by psychosocial ones (see Eckert 1989). Sex/gender needs to be analyzed within socially and
situationally contextualized approaches, observing how identities are constructed and
reformulated through linguistic choice.
Dialectological and sociolinguistic studies conducted in diverse human communities
have long pointed out differences in the communicative norms followed by men vs. women.
Most investigations have focused on the supposed peculiarities of female behaviour, thus
more or less implicitly certifying male speech as the unmarked or standard variety. The result
of such an orientation for linguistic research has been more thorough knowledge of women's
discourse and of the social contexts and practices across which it is developed (Coates 2003:
3; Edwards 2009: 146). However, this tendency is also being counterbalanced by the
appearance of more investigations specifically devoted to male self-presentation and
socialization through speech (Coates 2003; Jordan-Jackson and Davis 2005; Kiesling 2005).
The truth is that each gender group is repeatedly found to follow partly different
interactional patterns that, in our view, may well be the manifestation of different basic socio-communicative styles. The view of gendered linguistic usage as a matter of style makes it
possible to move beyond the reactive orientations promoted by classic sociolinguistic
quantitative research, which inevitably lead to somewhat fixed and static conceptions of
gender just as any other social ascription (Bell 1999: 524). Seeking the balance point
between the conditionings imposed on gender by societal structures on one hand, and speaker
agency and creative elaboration on the other, seems to offer the most realistic and potentially
fruitful path at the present state of knowledge.
1 This paper is part of the research project Los estilos de comunicación y sus bases cognitivas en el
estudio de la variación sintáctica en español (FFI2009-07181/FILO), funded by the Spanish
Ministerio de Ciencia e Innovación.

Style can be understood as any system of (linguistic and other) meaningful choices
that helps someone shape some (social, professional, emotional, ...) self-image; this being
perceived by the speaker as optimal for the achievement of certain interactional goals in a
particular context. The making of styles needs to be found and analyzed within real discourse,
where it may be feasible to describe the relevant circumstances of the situation, the social
features of the participants, and how they all interact with creative linguistic choice (Aijón
Oliva and Serrano 2013: 11-45; Serrano and Aijón Oliva 2011: 139). Sociolinguistic meaning
does not arise from extralinguistic factors, but from the joint action of linguistic and any
other semiotic choices across symbolic communicative acts (Coupland 2007: 3).
The application of these principles to the study of language and gender naturally
results in a view of masculinity and femininity as sets of values that are partly received from
social structure, but that can and need to be continuously elaborated in interaction. Speakers
are not male or female once and for all, nor do they need to be just one or the other.
Rather, they can choose the extent to which they want to associate themselves with some
gender label (and even what the labels themselves might imply in a certain context); their
stylistic work will aim to shape a corresponding self-image towards others. In this paper, we
will conduct an analysis of stylistic choice and the configuration of gender as a socio-semiotic
category, with regard to a phenomenon of Spanish morphosyntax: variable formal expression
of first- and second-person clause subjects.

2. Subject Variation in Spanish: Discursive and Cognitive Interpretation


The approach to linguistic choice as stylistic work outlined in the preceding section also quite
naturally allows for a view of variation in morphosyntax and the lexicon as inherently
meaningful, not just socially and pragmatically, but even at the semantic and cognitive levels.
A linguistic structure mirrors the structure of human cognition; it is shaped by the human
perception of the surrounding world just as it helps shape it (Croft and Cruse 2004; Langacker
2009). This, in turn, leads to the assumption that the meaning of a construction will never be
exactly the same as that of seemingly synonymous alternatives, a tenet put forward by, for
example, construction grammar (cf. Goldberg 2003) and other related theoretical approaches.
The relevance of these views for the rethinking and refinement of the analysis of
linguistic variation can hardly be overstated. Following such reasoning, the cognitive
properties of morphosyntactic choices should be at the base of any usage patterns and
tendencies they might reveal. In fact, we believe the conjunction of so-called internal
meaning with social and situational features is what engenders socio-communicative styles;
that is, it creates systems of meaning affecting all possible levels of communicative choice
(Aijón Oliva and Serrano 2010a: 9; Serrano and Aijón Oliva 2011: 142).
Variation between the expression and omission of subject pronouns in languages such
as Spanish is one among many syntactic phenomena that lend themselves to style
construction. A relationship between pronoun usage and meaningful social factors such as
gender can thus be hypothesized and scientifically tested. It will be our task to ascertain
whether there are statistical differences in subject expression according to speaker gender, as
has been found in many other facts of linguistic variation. But, more importantly, if this is the
case, we will try to advance some explanation of such statistical patterning, by investigating
which semiotic facets of gender seem to be conveyed through syntactic choice in particular
discursive genres and contexts, and how this can be related to the meanings inherently linked
to grammatical forms.
In order to do this, we will first examine whether the syntactic phenomenon under
study is, in fact, the carrier of meaning differences at internal linguistic levels. As can be
inferred from a number of previous studies (Delbecque 2005; Siewierska 2004), variation
between the expression and omission of subject pronouns seems to be a formal reflection of
the degree of cognitive salience achieved by discursively encoded entities. When the referent
of a clause subject is under the attention focus, and can thus be considered salient or
accessible, its formulation tends to be perceived as unnecessary for the communicative
purposes of the speaker (Langacker 2009: 112). This is particularly evident in languages with
a relatively rich inflectional morphology such as Spanish, where the identity of clause
subjects can easily be tracked through verb agreement morphemes (cant-o 'I sing', cant-as
'you (sing.) sing', and so on), which, in fact, makes subject omission the unmarked choice in
most discourse types (Serrano 2013: 276-281).
At the same time, discourse-oriented studies on subject variation such as the ones
cited above have often explained subject expression through informativeness, understood as
the degree of mental processing required by textual elements, given their newness or
unpredictability for participants (Beaugrande and Dressler 1997: 201). Informativeness is not
unrelated to salience but could, rather, be considered a textual correlate of it, albeit an
inversely proportional one; in general, the most salient entities are also the least informative
ones, due to their very accessibility and continuity across discourse stretches. Both salience
and informativeness should be conceived of as gradual magnitudes that are largely dependent
on the particular context, the relationship between the participants and other factors. Their
existence confirms the notion that different syntactic forms such as subject expression and
omission can hardly be seen as synonymous; they represent different views of non-linguistic
situations encoded through linguistic means (Serrano 2013: 284-288).
The analysis of subject pronouns, given their deictic nature and their power to endow
real-world entities with different degrees of cognitive salience within discourse, suggests that
their choice is a formal manifestation of abstract cognitive dimensions underlying speech, and
particularly of the continuum between objectivity and subjectivity. The latter is the tendency
of discourse and perception to revolve around subjects (mainly human participants, these
being the entities most frequently encoded as clause subjects in conversational speech and
other discourse types), while objectivity would imply the converse orientation towards non-participants: third-person human and non-human entities. There is, in fact, a significant
amount of evidence pointing to objectivity-subjectivity as a very powerful notion for the
theoretical explanation of linguistic variation and style construction (Aijón Oliva and Serrano
2013; Kerbrat-Orecchioni 1980; Kristiansen 2008). In the present study, we will try to
elucidate whether this may bear some relationship to the shaping of gendered identities
through syntactic choice.

3. Corpora and Methodology


Two corpora of European Spanish were analyzed for the present research. The first one is the
Corpus Conversacional del Español de Canarias (CCEC), which comprises a series of
transcribed oral interactions among Canary Island speakers in different communicative
situations, basically divided into two types: spontaneous conversations and talk shows
broadcast on regional TV. Both gender groups and a variety of speaker social ascriptions are
sufficiently represented across the texts. The second corpus under analysis is the Corpus de
Lenguaje de los Medios de Comunicación de Salamanca (MEDIASA), devoted to
representing the media discourse of a town in central Peninsular Spain. It incorporates not just
oral but also written texts which are taken from local newspapers. Oral materials come from
the transcription of radio broadcasts pertaining to different media genres, and in which
socially and professionally heterogeneous speakers take part. Together, the two corpora make it
possible to observe and analyze a fairly wide range of contexts and interactions.
As discussed in the preceding section, variation between the omission and expression
of subjects is a matter of semiotic choice whereby different meanings are communicated
through different syntactic configurations. Now it is necessary to specify the empirical
methodology required to analyze the projection of semiotic choice in syntactic form and its
communicative repercussions.
In this sense, the calculation of descriptive frequencies, that is, of the percentages of
one variant against the other, is the most basic tool for the quantitative assessment of
variation. This is referred to as the relative variable approach. However, the consideration of syntactic
options as meaningful options by themselves and not in opposition to other variants suggests
the incorporation of a complementary statistical method that can, in some way, better suit this
conception of linguistic variability. This we shall refer to as the absolute variable
methodology (Aijón Oliva and Serrano 2012: 80-94). It is based on the assumption that any
form-meaning pairing is contextually chosen for its own value and not just as opposed to any
other options. Consequently, aside from assessing its frequency against those of its alleged
alternatives, it may be interesting to calculate it in overall terms according to an independent
measure, such as word number. In our case, this means assuming that the total frequencies of,
e.g., expressed subjects across some text, group of speakers, etc. can be scientifically
revealing in themselves, irrespective of their relationship to omitted-subject rates. Thus, a
frequency index of each form per 10,000 words will be used to clarify the tendencies
suggested by percentage data.
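
By way of illustration, the two measures described above can be sketched in a few lines of Python. The function names and the expressed/omitted counts are assumptions introduced only for this example; the occurrence and word counts used for the absolute index are those reported for men in Table 1.

```python
def relative_rate(variant_count: int, alternative_count: int) -> float:
    """Relative variable: percentage of one variant against its alternative."""
    total = variant_count + alternative_count
    if total == 0:
        raise ValueError("no tokens of either variant")
    return 100.0 * variant_count / total


def absolute_index(variant_count: int, word_count: int, per: int = 10_000) -> float:
    """Absolute variable: occurrences of a form per `per` words of text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return per * variant_count / word_count


# Hypothetical counts, invented purely for illustration (not taken from the corpora).
expressed, omitted = 60, 40
print(f"relative: {relative_rate(expressed, omitted):.1f}% expressed")

# Counts reported for men in Table 1 (CCEC media texts).
print(f"absolute: {absolute_index(136, 48_035):.1f} per 10,000 words")
```

The relative measure depends on the frequency of the competing variant, whereas the absolute index depends only on text length, which is why the two can point to different tendencies for the same form.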
Now it must be acknowledged that statistical patterns, useful and revealing as they
may be, would make little sense if they had no relationship to the actual instances of
communication they emerge from. We believe there is an essential connection between the
quantitative and the qualitative sides of sociolinguistic variation; one that has been generally
neglected, but that is indispensable for the future construction of a general theory. In the case
of our study, the conjunction of statistical and interactional findings seems particularly crucial
if we aim to explain communicative styles as the contextual construction of identities by men
and women.
Our analysis and discussion of syntactic variation and its stylistic implications for the
notion of gender will be divided into the next three sections, each one focusing on a different
subject pronoun: yo 'I', nosotros 'we' and tú 'you' (singular).

4. The First-Person Singular yo 'I' in (yo) creo 'I think' Constructions
In general terms, yo is the most frequent subject in Spanish clauses. Its statistical dominance
can be taken as a formal reflection of the general egocentric orientation of human language
(Keysar 2007, Serrano 2014), even if its occurrence rates are obviously quite variable
depending on the context and discourse type, usually becoming higher in contentious or
persuasive speech. The argumentative potential of first-person subjects is particularly obvious
in the context of verbal lexemes acting as indicators of modality, among which creer 'to
think' seems to be the most frequent one in Spanish discourse. This is why our present
analysis will be restricted to the construction (yo) creo and its basic usage patterns.
Qualitative contextual analysis suggests that formal expression of the subject (yo creo
or else creo yo) represents the paradigmatic case of the aforesaid association of the structure
with personal opinion and argumentation, as seen in example (1), regarding the procedure that
should be followed in a Carnival competition. The speaker emphasizes the personal nature of
her stance.

[Female]:
(1)

La gente que venía de la Península no sabía valorar un traje\yo creo que las personas
famosas\que vienen aquí al Carnaval\deberían de ser invitados\yo creo que podía haber un|||un
jurado más específico sobre el tema que estamos tratando\ (CCEC Conv<MaTe09>)
People who used to come from Peninsular Spain were not apt to evaluate a costume. I think
famous people taking part in the Carnival competition should be specifically chosen. I think
there should be a jury composed of experts on the topic we're dealing with.

On the other hand, omission of the subject (Ø creo) tends to be preferred for the presentation
of contents as hypothetical or as having a more general and less personal scope. In (2), the
speaker is expressing what she believes to be a mere possibility rather than a personal
position. That is, the omission of yo seems to displace potentially contentious discourse
towards objectivity.

[A: Male, B: Female]:


(2)

A: Yo por lo que he leído en prensa\tengo la idea de que tu madre||dejó escrito algo
B: No sabemos\Ø creo que fue algo sobre un dinero que le debía\para que se le pagara\
(CCEC Conv<MaTe09>)
A: Based on what I read in the newspapers, [I] believe your mother left something written.
B: We don't really know. [I] think it might have to do with some money she owed and wanted
to be returned.

The variability and its discursive repercussions are explainable through the higher salience
and accessibility of omitted subjects. Avoiding overt self-indexation, the speaker builds a
more objective self-image that can be perceived as advantageous in contexts such as that of
(2). It is interesting to point out the fact that (yo) creo is one of the rare Spanish constructions
in which expression of the first-person subject is altogether more frequent than its omission,
as discussed in previous works (see Aijón Oliva and Serrano 2010b). This suggests that its
basic function is that of indexing the speaker in discourse, rather than strictly introducing a
belief or opinion, as the verb lexeme would indicate.
If just the overall frequencies of (yo) creo are calculated, whether with expressed or
omitted yo, we find that its occurrence is notably more usual in male speech. Table 1 shows
that, in the media texts of the CCEC corpus, men are ahead of women by 6.5 occurrences of
(yo) creo per 10,000 words.

Table 1 Overall frequency of (yo) creo (expressed and omitted) according to gender
(CCEC media texts)

Gender    Word number    Overall occurrences of (yo) creo    Frequency index per 10,000 words
Men       48,035         136                                 28.3
Women     19,654         43                                  21.8

In the case of the MEDIASA corpus (Table 2), the contrast is even sharper, with the scores of men
outweighing those of women by a three-to-one ratio. That is, male speech seems to be
characterized by a stronger tendency towards discourse modalization through self-indexing
choices such as (yo) creo. However, such a hypothesis needs to be confirmed by analyzing
other facts of grammatical choice in discourse.

Table 2 Overall frequency of (yo) creo (expressed and omitted) according to gender
(MEDIASA corpus)

Gender    Word number    Overall occurrences of (yo) creo    Frequency index per 10,000 words
Men       177,332        232                                 13.1
Women     116,288        51                                  4.4
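
The gender contrasts summarized in Tables 1 and 2 can be recomputed from the raw counts, as in the following minimal sketch. The dictionary layout and function names are assumptions made for illustration; the word and occurrence counts are those given in the two tables. Because the published indices are given to one decimal place, figures recomputed from the raw counts may differ from them in the last digit.

```python
# (word count, occurrences of (yo) creo) as reported in Tables 1 and 2.
TABLES = {
    "CCEC media texts": {"men": (48_035, 136), "women": (19_654, 43)},
    "MEDIASA": {"men": (177_332, 232), "women": (116_288, 51)},
}


def index_per_10k(words: int, occurrences: int) -> float:
    """Frequency index: occurrences per 10,000 words."""
    return 10_000 * occurrences / words


def compare(corpus: str) -> tuple[float, float]:
    """Return (men-minus-women gap, men/women ratio) of the frequency indices."""
    men = index_per_10k(*TABLES[corpus]["men"])
    women = index_per_10k(*TABLES[corpus]["women"])
    return men - women, men / women


for corpus in TABLES:
    gap, ratio = compare(corpus)
    print(f"{corpus}: gap = {gap:.1f} per 10,000 words, ratio = {ratio:.2f}")
```

Reporting both the gap and the ratio is a deliberate choice here: the absolute gap is comparable in the two corpora, while the ratio brings out the sharper contrast noted for MEDIASA.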

5. The First-Person Plural nosotros 'we'

The referential content of nosotros 'we' is naturally diffuse, making it a highly versatile
pronoun in discourse. However, the higher informational load associated with subject
expression (see Section 2) usually results in a somewhat sharper demarcation of its referential
scope. This often goes together with some intention to single out some group of people, in which
the speaker includes him/herself and may exclude or include others.
Expressed nosotros is typical of discourse characterized by overt argumentation, a
pragmatic function that sometimes makes it useful to suggest the speaker's inclusion in a
particular human group (examples 3 and 4).

[Female]:
(3)

Que nosotros hemos tomado decisiones en reuniones \y después el resto de la gente no está
informada de lo que hay que hacer\ (CCEC Conv<ElEn08>)
We have made decisions in our meetings, but the rest of the people have no way of knowing
what is to be done.

[Male]:
(4)

Nosotros sólo pedimos que se cumplan los compromisos que estaban acordados. (MEDIASA
<Ent-Ad-131104-17>)
We are only asking for the commitments agreed on to be fulfilled.

Omission is fostered by a high degree of subject salience in the context; but, due to the
peculiar discursive projections of nosotros, it is also often related to referentially vague uses
in which the first-person plural indexes a general community or performs a merely discursive
function. These usually promote a universal interpretation of the content. Omitted nosotros
helps move attention away from particular human subjects and place the interest of discourse
on objects being talked about; in other words, it enhances objectivity. In (5), the content is
presented as relating to any human being and not just a definite group, while in (6) the form
digamos 'let's say' basically acts as a discourse marker.

[Male]:
(5)

Los muertos nos permiten comprender la vida que hemos construido y a su través
entramos en la razón de ser de lo que hemos sido y hecho. (MEDIASA <Art-Ga-0511045c>)
The dead help us understand the life [we] have built, and it is through them that [we] discover
the raison d'être of all that [we] have been and done.

[Male]:
(6)

pretende: / por un lado / e: sacar:: / es:cenas en las que se muestra: / e: digamos: / la
barbarie entre comillas: / de: los republicanos: / Y: lo buenos: que eran / también entre
comillas: / los nacionales (MEDIASA <Inf-SE-180603-14:10>)
It is his intention to capture scenes showing, [we] say, the so-called barbarity of the Spanish
Republicans, as well as the supposed goodness of the Nationalists.

From a cognitive viewpoint, any use of nosotros can be described as an extension of the first
person towards a larger group. Thus, whenever the first-person plural perspective is adopted,
the speaker will be included in some way, even if just in a metaphorical sense. But, crucially,
his/her personal sphere will be extended to include others as well. Salience and
informativeness can account for the observed variation, and thus contribute to shaping
communicative styles oriented to subjectivity or to objectivity.
The results from the CCEC corpus are clearly indicative of gender differences:
Omitted nosotros as an expressive choice is, in fact, much more usual in women's
conversational speech. The objective presentation of facts and ideas through subject omission
would thus seem to be a trait more typical of female communicative styles, placing them
away from the pole of subjectivity (Table 3).

In this respect, inclusion against exclusion of the audience in the scope of nosotros appears as
particularly significant, even if it will not be possible to investigate the subject in this paper.

Table 3 Overall frequency of omitted nosotros according to gender
(CCEC conversational texts)

Gender    Word number    Overall occurrences of omitted nosotros    Frequency index per 10,000 words
Men       27,867         37                                         13.2
Women     51,677         168                                        32.5

6. The Second-Person Singular tú 'you'

As is the case with nosotros, matters of referential variability are important to the discursive
and pragmatic study of tú. The most significant fact in this respect is the existence of nonspecific uses of the pronoun, whereby some particular content can be presented as more
general; in fact, this is a possibility shared with English and other languages. The discursive
effect of generalization and objectivity is achieved by iconically associating the content of the
utterance with the hearer, even if deixis is hardly literal in this case. The switch from first to
second person seems to characterize the utterance as having a broader scope; this use of the
second person could, thus, be termed objectivizing tú (Serrano and Aijón Oliva 2012). Its
basic communicative motivations are especially evident whenever it is clear that the speaker is
drawing on personal experience, as can be perceived in excerpts (7) and (8).

[Female]:
(7)

No es que tu hijo o tu hija tengan hijos\ es que tú te conviertes en abuela\ a mí eso me parece
más fuerte\ (CCEC Conv<ElEn08>)
It's not just that your son or daughter may become a parent; you in turn will become a
grandmother, and that's what feels most shocking to me.

[Female]:
(8)

desde luego es en la única cadena / que se: puede hablar / porque en las otras / cuanto
empiezas a decir algo de esto / te cortan (MEDIASA <Var-Co-230503-12:30>)
This is indeed the only radio station where one can talk freely; in others, whenever [you] start
saying things like these, they'll cut you off.

Both nosotros and objectivizing tú can be seen as discursive-cognitive extensions of yo,
aimed at widening or blurring first-person deixis for a variety of communicative goals. As is the case
with referentially deictic instances of tú, formal expression in its nonspecific use is variable.
Whenever personal circumstances or positions are attributed to a second-person subject, they
seem to move beyond their particular notional sphere and acquire a more general value,
relieving the speaker from direct responsibility, and promoting discursive and cognitive
objectivity.
According to the scores in both corpora, it is somewhat more frequent for female
speech to carry out a transition from the first to the second person. That is, women are slightly
more inclined towards the indexation of their interactional partner (tú) in discourse, iconically
involving him/her in the content discussed. This could be interpreted as a quantitative
reflection of the collaborative or supportive orientation often attributed to female speech in
gender studies (e.g., Johnstone, Ferrara and Bean 1992: 150; Maltz and Borker 2011: 488)
(Tables 4 and 5).

Table 4 Overall frequency of objectivizing tú according to gender
(CCEC conversational texts)

Gender    Word number    Overall occurrences of objectivizing tú    Frequency index per 10,000 words
Male      27,867         17                                         6.1
Female    51,677         38                                         7.3

Table 5 Overall frequency of objectivizing tú according to gender (MEDIASA texts)

Gender    Word number    Overall occurrences of objectivizing tú    Frequency index per 10,000 words
Male      177,332        105                                        5.9
Female    116,288        75                                         6.4

However, the analysis of tú calls for further elaboration of the objectivity-subjectivity
continuum as an abstract dimension explaining pronoun usage and style construction.
Whereas the choice of second-person pronouns is itself related to the realm of subjectivity, in
a given context it may, in fact, be intended to downplay the more prototypical subjectivity
conveyed by the first person. Even so, in general terms we can once again point out some
preference of women for the variants conveying objectivity and a lesser tendency to impose
personal views on discourse, favouring agreement and collaboration instead. Our analysis
has shown that such orientation does not surface only in general discursive and interactional
strategies, but also in local grammatical facts such as subject choice and formulation.

7. Conclusions
In the present study, we have analyzed the statistical variation and some interactional
projections of the expression vs. omission of three Spanish subject pronouns; we hypothesize
that the syntactic variants under study might constitute formal-semantic choices helping the
development of communicative styles. More specifically, such choices might be associated
with the interactional construction of sex/gender as a stylistic category.
Our results seem to largely confirm the hypotheses assumed, as well as support and
explain certain previous findings on male vs. female ways of communicating, particularly
those regarding the supposed collaborative orientation of female speech. The notion that
women tend to favour interactional co-operation and agreement, while men orient themselves
more clearly towards self-expression and imposition is widespread in gender studies. But we
have also tried to offer a cognitive explanation to such social variability. This can be
condensed in the abstract continuum between objectivity and subjectivity, understood as a
dimension conditioning all levels of form and meaning. In this sense, the analysis suggests
that female speech is particularly inclined to syntactic choices promoting objectivity (or,
perhaps more precisely, downplaying subjectivity), whereas the opposite tendency seems to
characterize male communicative styles.
Our positive conclusions on the connection between pronoun usage and gendered
identities are not meant to imply that such usage is perceived as anything like a gender
marker in Spanish-speaking communities, but rather that it is one among the variety of
semiotic resources used for the (sometimes quite subtle) construction of gender in interaction.
A line of research like the one outlined here should further incorporate other meaningful
linguistic and communicative phenomena, as well as refine the analysis of interactional
contexts, in order to achieve a more realistic picture of the ways male and female identities
are contextually shaped, and of the cognitive orientations towards reality underlying such
identities.
This should probably start from the joint consideration of the whole paradigm of
grammatical persons, each of which can be seen as embodying a different perspective along
the subjectivity-objectivity continuum. For example, the singular first person can be viewed
as signaling the highest degree of subjectivity, while the plural downplays this value by
including the speaker in a wider group. In turn, second and third persons, as well as their
different variants, will promote different perceptions and interpretations of the content of
discourse. If a relationship can be demonstrated between the choice of person as a discursive-cognitive perspective and the construction of gender as well as other relevant identity
features, a further step will be achieved towards the theoretical, explanatory model of
sociolinguistic variation that we see as a desirable scientific goal. The handling of general
cognitive notions such as subjectivity in the description and explanation of styles is, in our
view, the key to transcending the peculiarities of the communities and interactional domains
analyzed. In sum, further research from this viewpoint in different settings and languages
should be carried out in order to check the wider validity of the claims put forward here.

References
Aijón Oliva, M. Á. and M. J. Serrano. (2010a). Las bases cognitivas del estilo lingüístico.
Sociolinguistic Studies 4, 115-144.
Aijón Oliva, M. Á. and M. J. Serrano. (2010b). El hablante en su discurso: Expresión y
omisión del sujeto de creo. Oralia 13, 7-38.
Aijón Oliva, M. Á. and M. J. Serrano. (2012). Towards a comprehensive view of variation in
language: The absolute variable. Language & Communication 32, 80-94.
Aijón Oliva, M. Á. and M. J. Serrano. (2013). Style in syntax: Investigating variation in
Spanish pronoun subjects. Bern: Peter Lang.
Beaugrande, R. A. and W. Dressler. (1997). Introducción a la lingüística del texto.
Barcelona: Ariel.
Bell, A. (1999). Styling the other to define the self: A study in New Zealand identity making.
Journal of Sociolinguistics 3, 523-541.
Coates, J. (2003). Men talk. Oxford: Blackwell.
Coupland, N. (2007). Style: Language variation and identity. Cambridge: Cambridge
University Press.
Croft, W. and D. A. Cruse. (2004). Cognitive linguistics. Cambridge: Cambridge University
Press.
Delbecque, N. (2005). El análisis de corpus al servicio de la gramática cognoscitiva: Hacia
una interpretación de la alternancia lineal SV / VS. In G. Knauer and V. Bellosta von
Colbe (Eds.), Variación sintáctica en español: Un reto para las teorías de la sintaxis
(pp. 51-74). Tübingen: Niemeyer.
Eckert, P. (1989). The whole woman: Sex and gender differences in variation. Language
Variation and Change 1, 245-267.
Edwards, J. (2009). Language and identity: An introduction. Cambridge: Cambridge
University Press.
Goldberg, A. E. (2003). Constructions: A new theoretical approach to language. Trends in
Cognitive Sciences 7, 219-224.
Johnstone, B., K. Ferrara and J. M. Bean. (1992). Gender, politeness, and discourse
management in same-sex and cross-sex opinion-poll interviews. Journal of
Pragmatics 18, 145-170.
Jordan-Jackson, F. F. and K. A. Davis. (2005). Men talk: An exploratory study of
communication patterns and communication apprehension of black and white males.
Journal of Men's Studies 13, 347-367.
Kerbrat-Orecchioni, C. (1980). La enunciación: De la subjetividad en el lenguaje. Buenos
Aires: Hachette.
Kristiansen, G. (2008). Style shifting and shifting styles: A socio-cognitive approach to lectal
variation. In G. Kristiansen and R. Dirven (Eds.), Cognitive sociolinguistics:
Language variation, cultural models, social systems (pp. 45-88). Berlin: Mouton de
Gruyter.
Keysar, B. (2007). Communication and miscommunication: The role of egocentric processes.
Intercultural Pragmatics 4, 71-84.
Kiesling, S. F. (2005). Homosocial desire in men's talk: Balancing and re-creating cultural
discourses of masculinity. Language in Society 34, 695-726.
Langacker, R. W. (2009). Investigations in cognitive grammar. Berlin: Mouton de Gruyter.
Maltz, D. N. and R. A. Borker. (2011). A cultural approach to male-female
miscommunication. In J. Coates and P. Pichler (Eds.), Language and gender: A
reader (pp. 487-502). Oxford: Wiley-Blackwell.
Serrano, M. J. (2013). De la cognición al texto: El efecto de la prominencia cognitiva y la
informatividad discursiva en el estudio de la variación de los sujetos pronominales.
Estudios de Lingüística de la Universidad de Alicante 27, 275-29
Serrano, M. J. (in press). El sujeto y la subjetividad: Variación del pronombre yo en géneros
textuales del Español de Canarias. Revista Signos: Estudios de Lingüística, 47, 85.
Serrano, M. J. and M. Á. Aijón Oliva. (2011). Syntactic variation and communicative style.
Language Sciences 33, 138-153.
Serrano, M. J. and M. Á. Aijón Oliva. (2012). Cuando tú eres yo: La inespecificidad
referencial de tú como objetivación del discurso. Nueva Revista de Filología
Hispánica 60(2), 541-563.
Siewierska, A. (2004). On the discourse basis of person agreement. In T. Virtanen (Ed.),
Approaches to cognition through text and discourse (pp. 33-48). Berlin: Mouton de
Gruyter.
