Linguistics Journal
Volume 8 Issue 1 2014
ISSN 1738-1460
July 2014
Volume 8 Issue 1
Table of Contents:
Foreword by Biljana Čubrović
Research Articles
1. Dina Awad
Diverse Acquisition Patterns
2. Ibrahim M. R. Al-Shaer
The Use of Third-Person Pronouns by Native and Non-Native Speakers of English
3. Napasri Timyam
An Analysis of Learner Use of Argument Structure Constructions: A Case of Thai
Learners Using the Passive and Existential Constructions in English
4. Mohammad Aliakbari, Mahmoud Qaracholloo and Ali Mansouri Nejad
Social Class and Language Structure: A Methodological Inquiry into Bernstein's Theory of
Sociology of Education
Research Notes
5. Ming Wei
Code-Switching in a Virtual English Community in China: An International Perspective
6. Jabulani Sibanda
Interrogating Current Conceptualisations of Word for Word Knowledge Studies:
Challenges and Prospects
7. María José Serrano and Miguel A. Aijón Oliva
On Gendered Styles and their Socio-Cognitive Foundations
Foreword
This year's edition of the journal comprises seven articles: four full research articles and three
research notes. Thanks are extended primarily to the authors who have contributed to this
edition, and the Associate Editors, reviewers, and the production team under Dr. Erin Carrie
for their efforts in preparing the papers for publication. This last year has been unique for the
journal in terms of the significant changes affecting the Editorial Board, a healthy volume of
submissions and a large number of new reviewers together with a brand new production team
who have joined the journal in their new roles. Congratulations must be extended to all the
new editors, who have become part of the team recently and have already proved to be
dependable, constructive and highly professional. Special thanks go to John Adamson, who
has moved on to a sister journal but has helped me take over the position of the Chief Editor
he has successfully held for many years and helped with all my questions and concerns since
January 2014.
The first contribution, entitled "Diverse Acquisition Patterns" by Dina Awad,
elaborates on second language acquisition issues, featuring one of the most problematic areas
of English grammar - the articles as used by native speakers of Arabic. Awad's original study
of the acquisition of the definite and indefinite articles in SLA shows that the developmental
patterns of the two articles are divergent in both accuracy rates and error types, and that they
cannot be easily predicted because their acquisition is influenced by multiple and diverse
factors, such as proficiency level, first language, task-type and the processing demands of
each linguistic feature. The next research article, contributed by Ibrahim M. R. Al-Shaer, is
"The Use of Third-Person Pronouns by Native and Non-Native Speakers of English",
especially in the context of pronoun-antecedent agreement, an area where it proves difficult to
draw the line between standard and non-standard usage. Similar to Dina Awad's study of the
acquisition of articles by Arabic non-native speakers of English, Al-Shaer looks into the
differences in the use of pronouns. The results of the study show that most native speakers
choose third-person pronouns depending on the socio-cultural context and pragmatic factors,
bending the formal rule of pronoun-antecedent agreement, especially when dealing with
gender-unspecified words. However, the majority of non-native speakers show an inclination
to follow prescriptive grammar rules, due to the absence of social and cultural sensitivity
evidenced in English as L2. Napasri Timyam's study, entitled "An Analysis of Learner Use of
Argument Structure Constructions: A Case of Thai Learners Using the Passive and
Existential Constructions in English", focuses on these two common types of
construction in English, with the aim of discovering how their general characteristics
deviate in the written English of non-native speakers with a Thai language background.
The results reveal that Thai learners' constructions differ from the prevalent native speaker
norms in that they are much more limited in terms of structural complexity, semantic and
pragmatic functions. In the last paper in the research article section, "Social Class and
Language Structure: A Methodological Inquiry into Bernstein's Theory of Sociology of
Education", Mohammad Aliakbari, Mahmoud Qaracholloo and Ali Mansouri Nejad explore
the manifestations and credibility of Bernstein's Language Codes Theory in an Iranian context
so as to check whether there are any significant differences between working- and middle-class Iranian native speakers in their use of linguistic patterns. Even though
Bernstein's view of the relationship between language and social class has been largely
disputed, Aliakbari and colleagues provide some evidence supporting the manifestations of
the two dichotomous language codes: restricted code (lower strata of society) and elaborated
code (higher socioeconomic class of language users).
Three additional research notes are presented in the next section of this edition. The
first article, entitled "Code-Switching in a Virtual English Community in China: An
International Perspective" and written by Ming Wei, looks into the concept of code-switching
as used in chat rooms. The study examines how code-switching negotiates social and
interactional meanings in virtual conversations as conducted by Chinese speakers of English,
as well as how it contributes to the creation of an authentic, slightly adapted context of social
interaction between interlocutors. Speakers tend to adjust their choice of code as well as the
degree of code-switching, both of which are firmly rooted in social distance and face
management in synchronous conversations. The study also shows how manipulation of code
interpretation and selection was achieved in the virtual English community. Jabulani
Sibanda's paper, "Interrogating Current Conceptualisations of Word for Word Knowledge
Studies: Challenges and Prospects", questions the efficacy of the conceptualisation of the
construct "word" represented by different terms ("token", "type", "lemma" and "word family")
as units of measurement of the English lexicon as seen in the vocabulary expansion of South
African learners of English. Sibanda points out that an implementation of an extension of
Nation and Bauer's (1983) levels of word family membership, through an association of
inflected and derived forms with base words, seems a desirable proposition in second
language acquisition studies. Last but not least, the concluding research note, entitled "On
Gendered Styles and their Socio-Cognitive Foundations", is written by María José Serrano
and Miguel A. Aijón Oliva. The main purpose of their investigation is to outline a theoretical
and analytical framework that reconciles the quantitative and qualitative perspectives on
language and gender as used by male vs. female speakers of European Spanish. The authors
develop a view of the statistical patterning of linguistic usage which reflects the meaningful
use of linguistic elements in local contexts.
We hope you find the articles in the 2014 edition of the journal interesting. Your own
submissions and feedback are always welcome, and we look forward to receiving them.
Biljana Čubrović, Ph.D.
Chief Editor
Abstract
Acquiring a second language is a complex and nonlinear process in which learner hypotheses
and production constantly change and evolve towards the target language. In order to find out
more about developmental patterns in SLA, we examined the L2 use of English articles in the
free composition of students in the United Arab Emirates, all of whom are L1 speakers of
Arabic. The participants were grouped into three proficiency levels (PL) according to the
Oxford Placement Test (OPT) to simulate diachronic progression. It was expected that
learners' performance on both articles would improve with higher competence. However, by
comparing accuracy and error rates across the three groups, we found that the articles "a(n)"
and "the" not only develop independently of each other but can sometimes progress in
different directions. The most influential factors that contributed to determining the final
outcome were the non-existence of a one-to-one form-function relation between the two
English articles, the dissimilarity between L1/L2 representations of definiteness and number,
and learners' competence levels.
Introduction
English articles have always been difficult for second language learners regardless of their
first language, and the difficulty persists into advanced levels. Articles are notorious as one
of the most difficult features of English to learn or teach (Kaluza 1963, Brown 1973, Dulay
et al. 1982, Pica 1983, Master 1990, inter alia), and their misuse ranks highest among L2
learners' errors (Covitt 1976, cited in Celce-Murcia and Larsen-Freeman 1999, Richards and
Simpson 1974). Sharma (2005) established that article errors account for 60.37% of the total
number of errors committed by L1 Indian learners of English, while Thu (2005) found that
article errors constituted 31.5% of all errors made by L1 Vietnamese learners. Thus, articles
represent an area of considerable prominence in any error analysis since, as traditionally
believed, performance regarding articles reflects overall linguistic competence (Oller and
Redding 1971: 85). Later, researchers such as Lightfoot (1998) suggested that learners' performance
on articles does not necessarily reflect their PLs. Bataineh (2005) found that senior Jordanian
learners overused the indefinite article more frequently than lower ability learners.
Research on SLA of English articles has shown that articles develop at different rates
(Chaudron and Parker 1990, Kellerman 1977, inter alia) due to the differences in the meaning
and function of each article. While the function of the definite article in English is to signal
that a particular entity in a limited context is uniquely identified by the interlocutors in a
particular pragmatic setting (Hawkins 1978, Lyons 1999), its absence is sufficient to indicate
indefiniteness, as is the case with plural and uncountable nouns. This leaves the indefinite
article primarily with a cardinality function assigned only to singular indefinite contexts.
Therefore, the disparity could arise from the fact that there is no one-to-one relationship
between the two.
In addition to the point that the two articles develop independently of each other, what
is proposed in this paper is that this nonlinearity can, under certain conditions, culminate in a
progression in different directions. The purpose of this study is to draw attention to the
complexity of article acquisition in L2 and to alert educators that progression in L2 does not
always correlate positively with performance, as advanced learners can make more errors, in
certain contexts, than weaker ones before finally improving. This pattern has often been
described as U-shaped development (see Kellerman 1977, Master 1997, Haznedar 2001), but
the persistence of article errors even among advanced L2 users undermines this proposition.
Literature Review
The two articles in English have not been reported to be acquired at the same time, nor to
follow the same route of development in SLA. Studies show that each article is produced and
mastered at a different stage and incurs different error types at different PLs. Several criteria,
such as difference in function, L1 grammar and task type, determine the L2 development map
of each article separately.
Except for two known studies (Leung 2001, Young 1996), most researchers seem to
agree that mastering the definite article precedes that of the indefinite (Hakuta 1976, Huebner
1983, Master 1987, Thomas 1989, Yamada and Matsuura 1982). The rationale is that
definiteness, as a semantic concept, is at least encoded before indefiniteness (Chaudron and
Parker 1990) which involves grammatical notions of number and countability. This position
is corroborated by findings from many studies (e.g., Hamdallah 1988, Kharma and Hajjaj
1989, Maalej 2004). It is therefore noticeable that better performance on the definite article at
earlier stages is a common occurrence in the SLA process.
From a transfer perspective, the absence of articles in the L1 impedes the L2
acquisition and vice-versa (Ringbom 1987, Goad and White 2004). Despite the fact that
Arabic is considered a language with definiteness grammaticalised (+ART), there is no
explicit marker of indefiniteness. Suffix accents, or nunation (Smith 2001), sometimes mark
indefinite nouns, but their presence is optional and largely limited to classic, formal and
written registers. The indefinite NP in "This is a big house", for example, can be expressed
formally where explicit markers appear as suffixes (1), or informally (2) without markers.
(1) Haatha bait-un kabeer-un
Learners could transfer the semantic notions from Arabic, in which the absence of definite
marking in an NP is a sufficient indication of its indefinite status. This principle, however,
is not entirely exclusive to Arabic. Leech contends that "it is convenient, from many points of
view, to regard an initial determiner as obligatory for English noun phrases, so that the
absence of an article is itself a mark of indefiniteness" (1992: 15).
Studies on the production of learners whose L1s lack formal representation of articles
(−ART) suggest that the failure to supply articles persists into advanced stages (Thomas
1989, Master 1997, Trenkić 2002, Ekiert 2004). Zdorenko and Paradis (2008) recorded more
omissions in the L2 production of Korean, Chinese and Japanese (−ART) learners of English
than in the production of Spanish, Romanian and Arab (+ART) learners. The high omission rates
of the indefinite article observed in Arab learners' production are a typical instance of what
Eckman (1977) describes as the most difficult aspect to acquire in the target language, namely
the production of elements which are not present in the L1 but marked in the L2. Tsimpli's
(2003) conviction that the absence of features in the L1 causes syntactic representations in L2
production to become defective applies to the difficulties which Arab learners encounter.
SLA researchers, such as Hawkins and Chan (1997) and Prévost and White (2000),
ascribed the difficulty that second language learners (2LL) have in the employment of a
feature that does not exist in their L1 to a failure in mapping functional features present in the
L2 (the Failed Functional Features Hypothesis, FFFH) onto their production of the target
language. With L1 transfer most operative at
weaker PLs (Odlin 1989, Sharma 2005, Slabakova 2000, Snape 2005), better performance is
expected on the definite than the indefinite article.
Previous research has provided evidence for the tendency of Arab learners to overuse
the definite article across indefinite contexts (cf. Bataineh 2005, Kharma 1981, Maalej 2004).
This error was attributed to two different sources. One group of researchers (e.g., Al-Fotih
2003, Diab 1996, Habash 1982, Kharma and Hajjaj 1989) ascribes the error to the negative
transfer of the definite article norms in Arabic. While the definite marker in Arabic is used to
generalise as well as to identify, rendering all generic references grammatically definite
(Hawas 1989, Kremers 2003), non-referential NPs in English are largely left unmarked as
native speakers' most favoured option (Behrens 2005). The fact that the definite article
tends to be overused in non-specific contexts while the indefinite is expected to be
underrepresented can cause a gap in the development of the two articles.
Other researchers, including those whose data was collected from free production,
such as Abi Samra (2003) and Bataineh (2005), believe that the "the"-flooding tendency is a
universal interlanguage (IL) phenomenon, a stage that all L2 learners go through regardless of
their L1. In a
study on university students in the Arab Emirates, Crompton (2011) contends that the most
common error is the overuse of the definite article in generic contexts. Therefore, overuse of
"the" is expected in indefinite plural/uncountable contexts, especially at lower PLs.
The higher overuse rate of "the" at lower PLs is not by any means exclusive to Arab
L1 learners. Similar findings were reported in SLA studies on other L1s, including languages
that possess or lack a formal representation of articles (Huebner 1983, Nagata et al. 2005,
Thomas 1989, Young 1996). Master's (1987) study of Japanese learners, for example, found
that the definite article was flooded into indefinite contexts although Japanese does not
possess an article system.
Test type can also influence article choice causing inconsistency in production and
accuracy/error rates. Research findings suggest that free writing tasks yield higher accuracy
rates than controlled cloze tests. Dulay et al. (1982) argue that errors in form-focused tests
occur when formally learned rules have not yet become part of the learner's linguistic
competence, i.e. learners need time to practise their explicitly learned L2 rules in order to
produce grammatically appropriate forms in free production. This is largely attributed to
avoidance strategies that are available to learners in production-based tasks (Kharma and
Hajjaj 1989, Mizuno 1985, Tarone and Parrish 1988). Accordingly, learners resort to other
determiners such as quantifiers and demonstratives to reduce the risk of committing errors in
article use. In this case, when given the choice, the definite article presents a safer option
since it collapses elements of countability and number, which endanger the grammatical
accuracy of the NP. Furthermore, "the" is already available in learners' subconscious and
easily automated in free production, while the indefinite article, learned mostly through
explicit instruction, is more accessible in tasks that draw on metalinguistic information such
as cloze tests. With communicating meaning being the primary goal in a free production task,
learners' attention might not be fully directed towards form, causing the production and
accuracy rates of the indefinite article to be relatively low.
Advocates of teaching articles in the EFL/ESL classroom (e.g., Master 1997) propose
that informing learners of explicit rules can eventually lead to automated use, i.e., knowing
how a feature operates precedes, and leads to, the voluntary application of
these rules in communicative settings (DeKeyser 2003, Doughty 2003, Ellis 2001, VanPatten
1994). More time is needed, however, for this declarative knowledge to become internally
proceduralised and voluntarily produced in meaningful output. Therefore, participants could
have achieved different results had the task been form-focused.
Method
Participants
Sixty undergraduate students from different colleges in the UAE University, United Arab
Emirates, volunteered to participate in this study. Each participant was given a reference
number. A background survey was conducted to ensure uniformity of first
language, Arabic, while participants who had studied in English medium schools or lived in
an English speaking country for more than three months were excluded.
Levels               OPT Scores Range   Groups
Beginner             0-17               0-30
Elementary           18-29
Lower Intermediate   30-39              31-45
Upper Intermediate   40-47
Advanced             48-54              46-60
Very Advanced        54-60
sample of L2 data in non-test situations (Skehan 1989). Therefore, the outcomes of this study
might not resemble those obtained from cloze tests.
Observed by teachers, the participants were given one hour to write. Time pressure
adds a processing constraint on participants to prevent conscious contemplation of the forms
produced (cf. Robinson 1996, Sorace 1996).
[Sample coding datasheet for one student: each NP (e.g. place to work, in the world, fresh air, different kinds of fruit and vegetable, tourists, safety, quietness and purity) is listed with a reference number, described as Definite, Singular and Countable, and scored as Correct, Overuse or Omission for a(n) and for the.]
Two speakers of English as a first language volunteered to review the datasheets to ensure the
reliability of the coding. Some expressions were marked as (grammatically) correct, although
more target-like constructions would have been preferable.
(10) Learner data: As a conclusion; more target-like: in conclusion
Learner data: Peoples houses
To calculate accuracy rates, the number of correct supplies was divided by the total sum of
NP environments in which articles should have appeared.
Accuracy (%) = (Number of correct supplies / Total number of obligatory contexts) × 100
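As a minimal sketch of this calculation (not the author's own tooling), the accuracy rate can be computed as below. The function name accuracy_rate is hypothetical; the counts are the G1 figures reported in the Results section.

```python
def accuracy_rate(correct_supplies: int, obligatory_contexts: int) -> float:
    """Percentage of obligatory contexts in which the article was correctly supplied."""
    if obligatory_contexts <= 0:
        raise ValueError("there must be at least one obligatory context")
    return 100.0 * correct_supplies / obligatory_contexts


# G1 counts as reported in the Results section
g1_the = accuracy_rate(150, 199)  # definite article
g1_a = accuracy_rate(64, 106)     # indefinite article
print(f"the: {g1_the:.0f}%  a(n): {g1_a:.0f}%")  # prints "the: 75%  a(n): 60%"
```

Expressing each outcome as a percentage of its own obligatory contexts is what makes groups with different amounts of production comparable.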
Outcomes were measured in percentages to allow comparisons across groups with varying
numbers of participants and unequal obligatory contexts. In principle, the formula used in
calculating errors was similar to the one used for accuracy, i.e., the observed instances were
compared against the total number of contexts where such occurrences were expected to
appear. For example, to calculate the percentage of overuse of a(n), the following equation
was used:
Overuse of a(n) (%) = (Instances of a(n) in [−Def] [−Count] NPs / Total number of [−Def] [−Count] NPs) × 100
A similar method was followed to examine the occurrence of the indefinite article in plural
contexts, simply by changing the [−Count] contexts into [−Sing] ones. The overuse rates of
the definite article were calculated by dividing the total number of overuse instances in
learner data by the total number of indefinite NPs in a given group.
The omission rates of a(n) were obtained by dividing the number of observed omissions by the
total number of obligatory contexts, i.e. [−Def] [+Count] [+Sing] NPs. The same
method was used for definite article omissions.
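The error-rate calculations described above can be illustrated with a short sketch. It is an illustration only, under stated assumptions: the NP record, its field names and the sample data are hypothetical, and the plural and uncountable contexts for a(n) overuse are pooled into one denominator for brevity, whereas the paper describes them as separate calculations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NP:
    """One coded noun phrase (hypothetical record, not the author's datasheet)."""
    definite: bool           # [+Def] / [-Def]
    countable: bool          # [+Count] / [-Count]
    singular: bool           # [+Sing] / [-Sing]
    supplied: Optional[str]  # "the", "a", or None when no article was written


def rate(hits: int, contexts: int) -> float:
    """Observed instances as a percentage of the contexts where they could occur."""
    return 100.0 * hits / contexts if contexts else 0.0


def a_omission_rate(nps):
    # Obligatory contexts for a(n): [-Def] [+Count] [+Sing]
    ctx = [n for n in nps if not n.definite and n.countable and n.singular]
    return rate(sum(n.supplied is None for n in ctx), len(ctx))


def a_overuse_rate(nps):
    # Assumption: indefinite plural and uncountable contexts are pooled here
    ctx = [n for n in nps if not n.definite and not (n.countable and n.singular)]
    return rate(sum(n.supplied == "a" for n in ctx), len(ctx))


def the_overuse_rate(nps):
    # Overuse of "the" is measured against all indefinite NPs
    ctx = [n for n in nps if not n.definite]
    return rate(sum(n.supplied == "the" for n in ctx), len(ctx))


# Hypothetical sample, for illustration only
sample = [
    NP(False, True, True, "a"),     # a house: correct
    NP(False, True, True, None),    # house: a(n) omitted
    NP(False, False, True, "a"),    # a fresh air: a(n) overused
    NP(False, True, False, "the"),  # the tourists (generic): the overused
    NP(False, True, False, None),   # tourists: correct
    NP(True, True, True, "the"),    # the world: correct
]
print(a_omission_rate(sample), a_overuse_rate(sample), the_overuse_rate(sample))
```

Each rate uses its own denominator, so an article that has few possible contexts is not penalised for producing few errors in absolute terms.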
Replacement errors had to be calculated in a manner that would make the two articles
more comparable since it is grammatically acceptable for the definite article to replace the
indefinite while the reverse is not always possible. Therefore, only [+Sing] [+Count] nouns
were selected as constants for both articles, leaving definiteness as the only variable
that determines the appropriate choice of either article.
Replacement of the by a(n) (%) = (Instances of a(n) supplied in place of the / Total number of [+Def] NPs) × 100
A similar calculation was used to examine the error of replacing the indefinite article with the
definite.
Finally, to ensure that there is consistency within the responses of each group, the
following analysis was performed.
Group        Range    1st quartile   Median   3rd quartile
G1    19     18-30    27             30       32
G2    20     31-45    35             37       40
G3    17     46-53    45             47       49
There was also sufficient cross-group difference to justify the categorisation. In order to
measure cross-group variance, we used a non-paired, two-tailed t-test assuming equal
variance with 95% confidence, comparing two groups at a time. Cross group differences were
statistically significant as is shown in Table 4.
Group                        CI at 95%
G1    19    124.07    7.89
G2    20    159.1     8.26
G3    17    177.57    8.57

Comparison   p
G1 v G2      <0.0001
G1 v G3      <0.0001
G2 v G3      0.0057
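The cross-group comparison described above (an unpaired, two-tailed t-test assuming equal variance) can be sketched with the Python standard library. This is an illustration only: the function name pooled_t and the scores are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than reproducing the reported p-values.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic and degrees of freedom for two independent samples,
    assuming equal population variances (pooled-variance form)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical OPT scores for two groups, for illustration only
group_low = [18, 22, 25, 27, 30]
group_mid = [31, 35, 37, 40, 45]
t_stat, df = pooled_t(group_low, group_mid)
print(f"t = {t_stat:.2f}, df = {df}")
# The two-tailed p-value comes from the t distribution with df degrees of freedom;
# if SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=True) returns both.
```

Comparing two groups at a time, as the paper does, means running this calculation once per pair (G1 v G2, G1 v G3, G2 v G3).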
Results
Accuracy
G1 Learners employed the definite article correctly 150 times in 199 obligatory contexts,
while a(n) was correctly supplied 64 times in 106 indefinite singular contexts. The
significant difference (p=0.0079) strongly suggests that Arab learners initially perform better
on the definite than the indefinite article.
G2 achieved higher accuracy rates on both articles. The gap between the accuracy rates of the
two articles was smaller. However, the error pattern remains in line with that detected in G1's
production as the accuracy rates of the definite article (84%) remained significantly higher
than those of the indefinite (p=0.0395).
G3 The highest accuracy rates were, as expected, achieved by more advanced learners.
Unlike the results from the two lower groups, there was little difference in the accuracy rates
of the definite and indefinite articles. However, G3 performed better on the indefinite (89%)
than the definite (86%) article.
The results show sustained improvement in learners' performance on both articles, yet the
progress on the indefinite article was more noticeable and consistent, correlating positively
with PLs, with significant differences across PLs. On the other hand, the difference between
G2 and G3's accuracy rates of the definite article was not significant (p=0.3742) as is shown
in Table 5.
Table 5 Accuracy rates of both articles compared across groups
Group        Correct the   %     Correct a/an   %
G1    19     150/199       75    64/106         60
G2    17     245/293       84    86/116         73
G3    12     191/221       86    57/64          89

Comparison   p (the)   p (a/an)
G1 v G2      0.0276    0.0277
G2 v G3      0.3742    0.0080
G1 v G3      0.0039    <0.0001
Diverse acquisition patterns can be detected as the highest scores shift from being achieved
on one article (the) to the other (a) with PL progression. The diagram in figure 2 further
illustrates this trend.
Omission
G1 This group omitted the definite article in 48 obligatory instances, which is 24% of all
definite contexts. The failure to supply a(n) with indefinite singular countable nouns was
the most noticeable difficulty in the lower group's performance: omission of the indefinite
article was the most frequent of all grammatical errors recorded in G1's production (44%).
G1 omitted the indefinite article 42 times in 106 contexts, i.e. weaker learners failed to
supply a(n) 40% of the time with singular indefinite NPs.
G2 Although there were fewer omissions by this group than by the weaker group (p=0.0476),
G2's performance was similar to that of G1 as intermediate PL participants omitted the
indefinite article more frequently than the definite article. The omission of a(n) constituted
34% of all grammatical errors made by G2. They omitted the indefinite article 30 times in
116 indefinite singular NP contexts (26%), a significantly higher rate than that of the definite
article (17%).
G3 Omission rates of the indefinite article seem to have decreased regularly and significantly
as PLs improve. However, it was interesting to find that G3 participants made more
omissions of the definite than the indefinite article. The omission rate of the definite article
was 12% while the rates of a(n) omission were only 11%.
Overuse
G1 The results obtained from the weaker group's production reveal that indefinite nouns were
unconventionally preceded by the definite form 52 times in 385 possible contexts (14%). All
of these instances were non-referential plural/uncountable contexts. Compared to the overuse
of the indefinite article, which was about 2% (6 of 279 contexts), the overuse of the definite
article was significantly higher (p<0.0001). The overuse rate of the definite article was
considerably higher than the combined rate of ungrammatical supply of a(n) in
plural/uncountable constructions and in contexts where the definite article should have appeared.
G2 The recorded overuse rates of the definite article dropped to 10% (47 out of 461
indefinite contexts) in the production of the intermediate group, with most instances observed in
generic, non-referential contexts, as is the case in learners' L1. The ungrammatical supply of
a(n) with plural and uncountable nouns did not exceed 2.3%, which means that the disparity
between the overuse rates of the two articles was smaller than that emerging from the
weaker group's performance.
G3 The most noticeable improvement in learners' production was the significant and
systematic drop in the overuse rate of the definite article with improved L2 competence. The
definite article was overused in 14 of 236 indefinite NP environments, which reduces the rate
to only 6%. However, the advanced group's overuse rates of the indefinite article were
slightly higher than those of the two weaker groups as shown in Table 6.
Table 6 Overuse rates
Group        the       %        a(n)     p (the : a)
G1    19     52/385    13.51    6/279    <0.0001
G2    17     47/461    10.2     8/345    <0.0001
G3    12     14/236    5.93     4/172    0.0603
From the table above, it is noticeable that while the overuse of the definite article falls
sharply, the indefinite article is increasingly oversupplied. Figure 4 illustrates the contrast in
error trends.
Replacement
The phenomenon of diverse acquisition patterns is most evident in replacement errors.
Replacement errors constituted 59% of all errors committed in the test; a considerable rate
compared to the total sum of all other errors (41%).
G1 In analysing data entries, it was evident that the definite article was the preferred option,
especially for weaker learners, as it replaced the indefinite article in many [+Count] [+Sing]
contexts. This group overused the definite article to replace the indefinite four times as often
as they did the opposite. The indefinite article replaced the definite in only one instance out
of 111 possible replacement contexts.
G2 The intermediate group made fewer replacement errors. The improvement is also noticeable
in the fact that the gap between the two replacement rates has decreased. G2 participants used
"the" to replace "a(n)" twice as often as they did the reverse. This can be seen as a form of
improvement compared to the four-fold ratio observed in the production of G1. However,
despite the improvement, intermediate learners still preferred to substitute the indefinite
article with the definite rather than the reverse, while supplying "a(n)" instead of "the"
increased from 0.9% to 1.2%.
G3 At a later learning stage, the higher group's replacement rates became very close, i.e. the
rates of replacing the-for-a were almost equal to those of replacing a-for-the, with the
indefinite article slightly preferred. A summary of the above results is presented in
Table 7.
Groups       the for a(n)   %      a(n) for the   %
G1    19     5/106          4.7    1/111          0.9
G2    17     4/116          3.4    2/161          1.2
G3    12     2/64           3.1    4/116          3.4
The inclination to substitute "a(n)" with "the" was reduced with improving PLs while the
production of the indefinite article in definite contexts increased steadily.
The error map of replacement in the learners' data is most reflective of diverse
acquisition patterns. This is perhaps clearer in the presentation in Figure 5.
Discussion
Accuracy
The accuracy rates of the weaker group were higher than those reported by studies on learners
of −ART L1s (cf. Butler 2002, Ekiert 2004, Master 1997, Trenkić 2002), which confirms
propositions of stronger L1 influence at lower L2 levels. This can be construed as positive
transfer of L1 semantic properties to the L2 as both languages agree on most conditions for
obligatory supply. The lower accuracy scores of the indefinite article resulting from little
production or erroneous use also suggest stronger L1 influence at earlier stages. G1 learners
seem not to have internalised the rules governing the use of the indefinite article sufficiently
to supply it automatically where necessary. It is not surprising that G2 learners performed
better on the definite article despite the improvement in PL, since this type of test better
reflects implicit knowledge, in which the representation of a feature with a semantic equivalent
in the participants' L1 is more accessible than the indefinite article, which is not readily
available in the learners' subconscious knowledge and perhaps requires direct prompts to
activate the newly learned L2 form. G3's higher PL is reflected in the accuracy rates of the
indefinite article, which approach and then exceed those of the definite. Although the difference
between the accuracy rates of the two articles in G3 is small and not statistically significant, it
nonetheless indicates a change of trend (see Figure 1). Thus, we can assume that with stronger
L2 ability, learners' mastery of the two articles becomes more comparable.
Omission
With the focus on expressing thoughts and describing locations and attractions, and with no
prompting in the rubric as to the purpose of the test, it is expected that this type of test would
yield a high number of omission instances. This lends support to Granfeldt's (2000)
observation that accuracy will decrease if learners' attentional resources (Bialystok and Ryan
1985) are channelled towards goals other than accuracy.
G1 participants omission rates of the indefinite article were significantly higher than
those of the definite. The failure to provide the indefinite article can also be driven by
learners assumption that its absence does not constitute a hindrance to successful
communication of ideas. It is likely that weaker learners have subconsciously applied the
Economy Principle (Poulisse 1997) whereby maximal comprehensibility is achieved while
exerting minimal processing effort. G2 learners might have also found it redundant to mark
nominals overtly for indefiniteness if their [-DEF] value is readily inferred by the absence of
the definite marker. However, lower omission rates suggest that G2 learners have become
more aware of the conditions of indefinite article employment while beginning to realise the
limitations of the definite article to specific environments rather than its generalising function
in Arabic. Since free composition better reflects subconscious knowledge, lower omissions
and higher production of a(n) indicate that G3 learners' command of the indefinite article
has become sufficiently internalised to be produced spontaneously in communicative output.
Although the disparity in the omission rates of the two articles was not significant in
G3's results, the switch in tendency is quite clear. While the weaker and intermediate learners
omitted the indefinite article more frequently than the definite, the advanced group were more
aware of the necessity to provide a(n) and at the same time reduced the provision of the
definite even in obligatory contexts. This is consistent with the findings of researchers such as
Chaudron and Parker (1990), Cziko (1986), Ekiert (2004) and Habuto (2000).
Overuse
The overuse errors made by the weaker group were lower than originally expected. A
possible rationale for this is that free production tests are known to yield lower overuse rates
(see Tarone and Parrish 1988) since learners were not directed to provide a particular form,
which is known to encourage overuse in cloze tests.3 While the weaker group
overwhelmingly preferred the definite article, this was less noticeable in G2's production.
The decreased difference between the overuse rates of the two articles marks a change in
learners' underlying hypotheses on article use and indicates the fluctuation characteristic of their
IL stage. This type of overuse is typical of what Richards (1971) refers to as partial
understanding of target language features. The significantly lower overuse rates of the
indefinite article compared to those of the definite in G1 and G2 production may not be entirely
due to learners' developed awareness of article use. Instead, it could well be attributed to task
type and L1 transfer.
The increase in overuse errors of a(n) by the advanced group could be interpreted as
a form of regression, but it could also be a result of hyper-correction as learners try to avoid
omission errors committed during past learning experience, over-applying instructions to
produce a(n), which leads to a flooding stage similar to the one observed in definite article
use. Richards (1976) maintains that failure to observe restrictions of countability and number
in article use may be due to faulty analogy. In many cases, the analogy is derived from
formulaic expressions learned as chunks in existential and "have" constructions memorised at
earlier stages and incorrectly overgeneralised.
Replacement
The reason underlying the preference of G1 to replace the indefinite article with the definite is
mainly developmental, through flooding and avoidance, but also involves L1 influence in the
absence of an explicit marker of indefiniteness in L1. Although both rates of replacement
errors are considerably low in G2, what emerges at this stage is an obvious change of trend
from that observed in the production of the weaker group. G3's preference for the indefinite
article to replace the definite is probably a result of learners' recently increased awareness of
the importance of supplying the indefinite article. Moreover, this result could have been
equally influenced by the receding influence of L1, represented by the drop in the overuse
rates of "the" before singular indefinite nouns, since the use of the definite singular to deliver
generic reference is substantially recurrent in Arabic. Although acceptable in certain
expressions in English (e.g., "She plays the piano"), it is not likely that learners have been
sufficiently exposed to authentic material to the extent that would enable them to detect
similar uses and employ them unprompted. If we suppose that, in marking indefiniteness,
Arabic is an ART language, then G3's understanding of the indefinite article corresponds to
that of Leung's (2001) Japanese (ART) learners, who preferred a-for-the more often than
the-for-a.
This suggests that Arab learners experience a mapping problem of a(n) into IL
grammar, which is more in line with the performance of Japanese, Chinese and Korean
learners (ART) rather than the Spanish and Romanian groups in Snape et al.'s (2006) and
Zdorenko and Paradis's (2008) studies.
Implications
The results of this study show that second language development is neither homogeneous nor
simultaneous. Advancement in one aspect of L2 knowledge does not imply an identical level
of achievement in another. Rather, there is evidence for a complex, non-linear and sometimes
inverse progression, guided by multiple factors such as proficiency level, first language, task
type and the processing demands of each linguistic feature.
The developmental patterns of the two articles are divergent in both accuracy rates and
error types. The learning curve seems to start with higher awareness and a better supply of a
feature which already exists in the L1 (the definite article), but with improved PLs and
reduced L1 influence, the trend gradually shifts towards a better conceptualisation, and
therefore a higher production, of the newly acquired feature (the indefinite article). Error
patterns are also converse. Learners begin by overproviding the definite article in non-referential contexts, and gradually reduce production until it is undersupplied in obligatory
contexts at later developmental stages. In contrast, the overuse of the indefinite article is
scarce in the production of weaker learners, yet with overall L2 progress, rates exceeded
those of the definite.
A mirror image of the above pattern is observed in omission errors, as high rates of
indefinite article omissions were observed in early stages. With better PLs, the rates fell
considerably. Although the definite article was properly supplied in obligatory contexts at
elementary levels scoring very low omission rates, the error increased in the production of
more able groups, leading to higher omissions. A diverse progression map is also detected in
replacement errors, as participants started with higher the-for-a rates but ended with greater
a-for-the substitutions. The switch of preferences from "the" to a(n) reflects the regular and
systematic move from limited, L1 influenced use towards more target-like, internalised
knowledge.
It is worth mentioning that if occurrences of the indefinite article within formulaic
expressions were excluded from our calculations, since they are mostly memorised and not
automatically produced in corresponding contexts, the rates would have been more
contrastive. It is therefore safe to propose that articles develop not only independently from
one another but could also progress in diverse directions.
References
Abi Samra, N. (2003). An analysis of errors in Arabic speakers' English writings. American
University of Beirut. Retrieved 25 October, 2005 from
http://abisamra03.tripod.com/nada/languageacq-erroranalysis.html
Al-Fotih, T. A. (2003). Acquisition of the English articles by Arabic-speaking students.
Indian Linguistics, 64, 157-174.
Bataineh, R. F. (2005). Jordanian undergraduate EFL students' errors in the use of the
indefinite article. Asian EFL Journal, 7(1), 56-76.
Behrens, L. (2005). Genericity from a cross-linguistic perspective. Linguistics, 43(2), 275-344.
Bialystok, E. and E. B. Ryan. (1985). A metacognitive framework for the development of
first and second language skills. In D. L. Forrest-Pressley, G. E. Mackinnon, and T. G.
Waller (Eds.), Metacognition, cognition, and human performance: Vol. 1. Theoretical
perspectives (pp. 207-252). San Diego, CA: Academic Press.
Butler, Y. G. (2002). Second language learners' theories on the use of English articles: An
analysis of the metalinguistic knowledge used by Japanese students in acquiring the
English article system. Studies in Second Language Acquisition, 24(3), 451-480.
Celce-Murcia, M. and D. Larsen-Freeman. (1999). The Grammar Book. Los Gatos: Sky Oaks
Production.
Chaudron, C. and K. Parker. (1990). Discourse markedness and structural markedness: The
acquisition of English noun phrases. Studies in Second Language Acquisition, 12(1), 43-64.
Crompton, P. (2011). Article errors in the English writing of advanced L1 Arabic learners:
The role of transfer. Asian EFL Journal, 50, 4-32.
Hakuta, K. (1976). A case study of a Japanese child learning English as a second language.
Language Learning, 26, 321-351.
Hamdallah, R. (1988). Syntactic errors in written English: Study of errors made by Arab
students of English. Unpublished doctoral dissertation. Lancaster University, UK.
Hawas, H. M. (1989). The articles in English and Arabic: A contrastive study. Indian Journal
of Applied Linguistics, 15(2), 23-52.
Hawkins, J. A. (1978). Definiteness and indefiniteness. London: Croom Helm.
Hawkins, R. (2004). Explaining full and partial success in the acquisition of second language
grammatical properties. Paper presented at J-SLA, Gunma Prefectural Women's
University, Gunma, Japan.
Hawkins, R. and Y. Chan. (1997). The partial availability of universal grammar in second
language acquisition: The failed functional features hypothesis. Second Language
Research, 13(3), 187-226.
Haznedar, B. (2001). The acquisition of the IP system in child L2 English. Studies in Second
Language Acquisition, 23(1), 1-39.
Huebner, T. (1983). A longitudinal analysis of the acquisition of English. Ann Arbor.
Kellerman, E. (1977). Towards a characterization of the strategies of transfer in second
language learning. Interlanguage Studies Bulletin, 2, 58-145.
Kharma, N. (1981). Analysis of the errors committed by Arab university students in the use
of the English definite/indefinite articles. International Review of Applied Linguistics, 19,
331-345.
Kharma, N. and A. Hajjaj. (1989). Errors in English among Arabic speakers: Analysis and
remedy. London: Longman Group UK Limited.
Kremers, J. M. (2003). The Arabic noun phrase. LOT: The Netherlands.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition.
Applied Linguistics, 18(2), 141-165.
Leech, G. (1992). Introducing English grammar. London: Penguin.
Lightbown, P. M. and N. Spada. (1999). How Languages are Learned. (2nd ed.). Oxford:
Oxford University Press.
Lightfoot, A. R. (1998). Japanese second-language learners and the English article system: A
study in error analysis. University of Leeds. Retrieved 6 November, 2008 from
http://ardle.net/linguistics.html
Liu, D. and J. I. Gleason. (2002). Acquisition of the article the by nonnative speakers of
English: An analysis of four nongeneric uses. Studies in Second Language Acquisition,
24(1), 1-26.
Lyons, C. (1999). Definiteness. Cambridge Textbooks in Linguistics. Cambridge University
Press.
Maalej, Z. (2004). On the misuse of determination in Arab students' writing. University of
Manouba-Tunis. Retrieved 18 February, 2006 from www.executivetranslators.com
Master, P. (1987). A cross-linguistic interlanguage analysis of the acquisition of articles.
Unpublished doctoral dissertation. University of California, Los Angeles.
Master, P. (1990). Teaching the English articles as a binary system. TESOL Quarterly, 24,
461-478.
Master, P. (1997). The English article system: acquisition, function, and pedagogy. System,
25, 215-232.
Mizuno, H. (1985). A psycholinguistic approach to the article system in English. JACET
Bulletin, 16, 1-29.
Nagata, R., T. Iguchi, K. Wakidera, F. Masui and A. Kawai. (2005). Recognizing article
errors in the writing of Japanese learners of English. Systems and Computers in Japan,
36(7), 54-62.
Oller, J. W. and E. Z. Redding. (1971). Article usage and other language skills. Language
Learning, 21(1), 85-95.
Parrish, B. (1987). A new look at methodologies in the study of article acquisition for learners
of ESL. Language Learning, 37, 361-383.
Pica, T. (1983). Adult acquisition of English as a second language under different conditions
of exposure. Language Learning, 33, 465-97.
Poulisse, N. (1997). Some words in defense of the psycholinguistic approach: a response to
Firth and Wagner. The Modern Language Journal, 81(3), 324-328.
Power, T. (2003). Communicative language teaching: The appeal and poverty of
communicative language teaching. Retrieved 15 June, 2007 from
http://www.btinternet.com/~ted.power/esl0404.html
Prévost, P. and L. White. (2000). Missing surface inflection or impairment in second
language acquisition? Evidence from tense and agreement. Second Language Research,
16, 103-133.
Raymond, W., J. A. Fisher, and A. F. Healy. (2002). Linguistic knowledge and language
performance in English article variant preference. Language and Cognitive Processes,
17(6), 613-662.
Richards, J. C. (1971). A non-contrastive approach to error analysis. English Language
Teaching Journal, 25, 204-19.
Richards, J. C. (1976). The role of vocabulary teaching. TESOL Quarterly, 10(1), 77-89.
Ringbom, H. (1987). The role of the first language in foreign language learning. Clevedon,
UK: Multilingual Matters.
Robinson, P. (1996). Learning simple and complex second language rules under implicit,
incidental, rule-search, and instructed conditions. Studies in Second Language Acquisition,
18(1), 27-67.
Sharma, D. (2005). Transfer and universals in Indian English article use. Studies in Second
Language Acquisition, 27(4), 535-566.
Skehan, P. (1989). Language testing. Language Teaching, 22, 1-13.
Slabakova, R. (2000). L1 transfer revisited: the L2 acquisition of telicity marking in English
by Spanish and Bulgarian native speakers. Linguistics, 38(4), 739-770.
Smith, B. (2001). Learner English: A teacher's guide to interference and other problems.
Cambridge: Cambridge University Press.
Snape, N. (2005). The uses of articles in L2 English by Japanese and Spanish speakers. Paper
submitted to the annual conference on language acquisition. Essex Graduate Student
Papers in Language and Linguistics, 7, (pp. 1-23).
Snape, N., Y. I. Leung and H-C. Ting. (2006). Comparing Chinese, Japanese and Spanish
speakers in L2 English article acquisition: evidence against the fluctuation hypothesis. In
M. Grantham O'Brien, C. Shea, and J. Archibald (Eds.), Proceedings of the 8th Generative
Approaches to Second Language Acquisition Conference (pp. 132-139). Somerville, MA:
Cascadilla Proceedings Project.
Sorace, A. (1996). The use of acceptability judgments in second language acquisition
research. In W. Ritchie and T. Bhatia (Eds.), Handbook of Second Language Acquisition
(pp. 375-409). San Diego, CA: Academic Press.
Tarone, E. and B. Parrish. (1988). Task-related variation in interlanguage: the case of articles.
Language Learning, 38, 21-44.
Thomas, M. (1989). The acquisition of English articles by first- and
second-language learners. Applied Psycholinguistics, 10, 335-355.
Notes
1. Some researchers (e.g., Master 1987) consider bare nouns as marked with a zero article.
2. The number in brackets reflects the student's serial number, her PL group (A/B/C), and the
ordinal number of the NP in the essay.
3. For task type effect on L2 production, see Foster and Skehan (1996).
Bioprofile:
Dr Ibrahim Al-Shaer has 23 years of experience in higher education. He spent his first 7 years
of professional experience teaching different English language and linguistics courses at
several universities. He was also the Director of Al-Quds Open University in Bethlehem for
10 years. He is currently the President Assistant for Innovation and Excellence.
Dr Al-Shaer obtained a Bachelor of Arts in English language and a Diploma in secondary
education in 1986 from Bethlehem University. He is a recipient of a 1989 scholarship from
the British Council, to study for a Master of Linguistics for ELT at Lancaster University. Dr
Al-Shaer is also a recipient of a 1998 scholarship from ASAI in conjunction with Al-Quds
Open University to study for a Ph.D. in Applied Linguistics from the University of Reading.
Dr Al-Shaer's main research interests are in the fields of psycholinguistics, construction
grammar, semantics, syntax, ELT applications, writing skill, corpus linguistics, e-learning,
innovation, and creativity.
Abstract
This study addresses research questions concerning the use of third-person pronouns by
native and non-native speakers of English. For this purpose, a corpus-based analysis of these
pronouns in naturally-occurring data was carried out, highlighting the different constraints
that cause writers to choose one pronoun over another. Then, thirteen sentences with tricky
third-person pronouns taken from the IBM-Lancaster Associated Press corpus were presented
in writing to two groups of native and non-native speakers of English. The results indicated
that most native speakers choose third-person pronouns depending on the socio-cultural
context and pragmatic factors, showing an inclination to bend the formal rule of pronoun-antecedent agreement. However, the majority of non-native speakers had a tendency to abide
by the prescriptive rule of pronoun-antecedent agreement, showing little or no sensitivity to
context. The study concluded that pronoun-antecedent agreement has proven to be an area
where it is difficult to draw a line between standard and non-standard usage.
Introduction
Traditionally speaking, pronouns are simply defined as words used instead of a noun or a
noun phrase to avoid repetition. Quirk et al. (1985) have described pronouns in English as
noun-like but differing from nouns in that they have distinct forms for case, person,
number, and gender. Fromkin et al. (2007) have described
them as substantives whose interpretation depends on syntax and context.
Standard English grammar provides the reader with the prescriptive rule that a
pronoun must agree with its antecedent for person, number, and gender (Kroeger 2005: 138).
When the gender of an antecedent is unspecified, as with student, nurse, everyone, standard
grammar states that the default pronoun to be employed is the masculine one. According to
the Chicago Manual of Style (2010), this approach is no longer acceptable as it is taken to be
outdated and sexist. As such, other approaches are adopted in an attempt to offer a gender-neutral resolution, as in (1).
(1)
a.
b.
But some people find repeating "his or her" throughout a long piece of writing irritating, and
others find using plural pronouns in such contexts ungrammatical. For example, Mangan
(2010) has asked for a gender-neutral third-person singular pronoun. Einsohn has gone even
further, saying that the newer grammar books recommend using the plural pronoun after an
indefinite subject (2011: 361).
Third-person pronouns are the only class of pronouns which are inherently cohesive,
in that a third-person pronoun form typically refers anaphorically or cataphorically to another
item in the text. By contrast, first- and second-person forms do not normally refer to the text
at all; their referents are defined by the speaker and hearer speech roles and are normally
interpreted exophorically by reference to the situation. A third-person form implies the
presence of a referent somewhere in the text, and in the absence of such a referent the text
appears incomplete.
Third-person pronouns are very important for the semantic interpretation of texts
because they contribute to cohesion. The concept of cohesion is a semantic one referring to
the relations of meaning that exist within a text that define it as a text (Halliday and Hasan
1976). As such, cohesion is not a structural relation; although cohesion relations could be in
the same sentence, they are not restricted by sentence boundaries. In its most normal form, it
is simply the presupposition of something that has been mentioned somewhere in the text
(endophora), whether in the preceding sentences (anaphora) or in the following ones
(cataphora). In addition, third-person pronouns may sometimes co-refer with entities which
cannot be found in the text itself but in the extralinguistic context (exophora) (Quirk et al.
1985).
According to Wilson (1990, cited in Partington 2003), the first-person pronoun we can
be used by politicians in their strategies either inclusively to convey solidarity or exclusively
to stress joint responsibility. Clearly, there is more to pronouns than the simple formal
definition which describes them as words used instead of nouns that must agree with their
referents in gender and number. Pronouns can reflect language users' attitudes and social
orientations. As Curzan has put it:
construction he or she irritates native speakers, or if the plural they is acceptable to refer to
individuals with unknown gender. When the researcher approached a native speaker of
American English for advice, she replied: "Who knows exactly what pronoun to use
anymore!" This definitely puts a greater burden on non-native teachers who have limited
exposure to English, as non-native learners of English, especially beginners, need explicit
rules to learn the language; otherwise, they will be lost. Given this challenge, the current
study attempts to provide insights on the reality of pronoun agreement and the challenges it
poses to both native and non-native speakers.
Methods
The data used in this study come from two sources, which are combined to
generate insights concerning third-person pronoun-antecedent agreement. The first
source was a collection of examples taken from the IBM-Lancaster Associated Press corpus
(A001-A010), consisting of some one million words of tagged 1970s American press
material. The second source was a survey of native and non-native speakers' performance.
Corpus-based analysis
For the purpose of this analysis, the Associated Press was selected for its prestige
and because the topics it deals with are of interest to international audiences, though they are directed
at the general American public.
In this analysis, the frequency distributions of the various types of cohesive devices
were presented and examined. Then, aspects of usage in the corpora that required the choice
of a given pronoun were identified and described. All examples were manually processed and
systematically classified in order to identify the environments in which the writer chose one
pronoun over another.
Survey of native and non-native speakers' performance
Instrument
The second source of data is a survey of 40 native and 40 non-native speakers' usage of
pronoun-antecedent agreement in a selected set of sentences mostly taken from the
Associated Press corpus. As shown in the Appendix, the survey consists of two parts. In the
first part, the participants were asked to fill in each blank space with an appropriate third-person
pronoun to complete the sentence, and in the second part they were asked to mark
their preferred choice: either a passive construction with the third-person singular neuter it
used as its subject or an active construction.
Participants
The survey involved two groups of participants. The first consisted of native speakers with no
background in linguistics. Since the tested materials were assumed to be so basic and
universal that they could be generalized beyond the given sample, non-probability
(snowball) sampling was used. Snowballing allowed for locating information-rich
key informants. The first wave of participants were given selection criteria (e.g., age,
gender, and no background in linguistics) that helped randomize the sampling process; they
were also asked to recommend for the second wave potential participants who lived the
farthest away. This sampling was not a stand-alone tool; it was just a way of selecting
participants, after which the survey was conducted.
The 40 native participants were selected from the US states of Kansas and Missouri.
The median age was 35 and ages ranged from 18 to 65. The second group consisted of non-native speaker participants. They were third-year English majors studying at Al-Quds Open
University who are native speakers of Arabic. Their median age was 24, and ages ranged
from 18 to 36 years.
All participants were instructed to base their responses solely on their immediate
reactions, without worrying too much about any rules they might have learnt about so-called
"correct" English. Respondents needed approximately ten minutes to complete the survey.
Findings
Corpus-based analysis
The main concern of this paper is the use of third-person pronouns as cohesive elements.
However, the existence of other cohesive devices in the corpus affects the frequency
distribution of these pronouns. Therefore, it is useful to compare the incidence of such
cohesive devices with the referential functions of third-person pronouns in order to
get a feel for their functioning (Table 1). In this respect, Halliday has said:
                                  Number    Percentage    Total
Lexical Devices                                           643 = 50.5%
  Repetition                      364       57%
  Substitution
    Specific to General           96        15%
    General to Specific           25        4%
    Synonyms or others            153       23%
  Ellipsis                        5         1%
Reference                                                 631 = 49.5%
  Cataphoric                      8         1.5%
  Anaphoric                       576       91%
  Undecided                       47        7.5%
Grand Total                       1274      100%
Starting with repetition, the data show that this device plays an important role in
journalistic language. In the data, there are 364 instances of repetition out of a grand total of
1274 different cohesive elements (see Table 1). In many texts, some nouns or noun phrases
are repeated many times, though it is possible to use other cohesive devices in
the same places. For instance, in A001 127/ 128/ 129/ 130, the noun phrase "the offender" is
repeated four times. This phenomenon has one possible interpretation: the writer might have
found it safer to avoid the dilemma of choosing a pronoun appropriate to the situation.
Another case of repetition appears when the text is dense with nouns and
noun phrases. For instance, in A 008 43-50, which is a very short sports report about
basketball, repetition is the only cohesive device used throughout the whole text. A potential
reason for this is that as there are nine noun phrases (teams and players), the writer had to
repeat every noun to avoid confusion on the part of the reader.
The next partially lexical cohesive device used in the data is substitution. What
distinguishes this device from pronominal substitution is that it operates on both the syntactic
and the semantic levels. In other words, grammatically speaking, the substitution element has
to match its referent in terms of syntactic features (especially word class); and lexically, the
substitution element helps produce more coherent texts and solves the problem of intensive
use of other types of cohesive device.
As shown in Table 1 above, lexical substitution is the next most frequent cohesive
device after repetition. Of the 1274 cohesive devices identified, 274 are substitution
cases (general to specific = 25, specific to general = 96, synonyms or others = 153).
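The totals quoted here can be checked against the counts in Table 1. The following is a minimal sketch in Python, not part of the original study; the dictionary keys and the helper name shares are illustrative assumptions, while the counts themselves are copied from Table 1.

```python
# Counts copied from Table 1 (corpus files A001-A010).
# The labels and the helper below are illustrative, not the author's.
lexical = {
    "repetition": 364,
    "substitution: specific to general": 96,
    "substitution: general to specific": 25,
    "substitution: synonyms or others": 153,
    "ellipsis": 5,
}
reference = {"anaphoric": 576, "cataphoric": 8, "undecided": 47}


def shares(counts):
    """Return each count as a percentage of its group total (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


lexical_total = sum(lexical.values())
reference_total = sum(reference.values())
grand_total = lexical_total + reference_total

# Substitution cases are the three "substitution: ..." rows taken together.
substitution_total = sum(
    n for name, n in lexical.items() if name.startswith("substitution")
)

print(lexical_total, reference_total, grand_total, substitution_total)
print(shares(lexical))
print(shares(reference))
```

Under these assumptions the group totals match the figures in the text (643 lexical devices, 631 referential uses, 1274 overall, 274 substitution cases). The recomputed shares are close to, but not always identical with, the rounded percentages printed in Table 1.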
To begin with the first sub-class, general to specific: in A 004 1 (A man opened fire
with a 22 caliber rifle ...), then in A 004 3 (The man was subdued by bar patrons ...), and
finally in A 004 5 (A deputy at El Paso County Jail said Barry Chvarak 21), the writer
starts the report with a general noun accompanied by the indefinite article a, then continues
with the same noun but with the definite article the, and finally replaces it with the man's name (a man
→ the man → Barry Chvarak).
The second substitution sub-class is from specific to general. For example, in A 001
45 (A dormitory fire at the University of Northern Colorado that sent hundreds of students
scurrying from the building on Saturday), this long noun phrase modified by the relative
clause is used to describe in a very specific way the word fire, which is replaced later by
another synonym (blaze) in A 001 47 (... just after the blaze was discovered about 3.00 a.m.
...).
The third sub-division of lexical substitution is the use of synonyms. A 005 52-55
(The pact approved by the Association's executive ... I think this contract is a major step
forward ... Under the agreement, the players would receive an increase ...) offers a good
example of a series of synonyms (the pact → this contract → the agreement) referring to the
same idea.
The last sub-class is lexical ellipsis. Halliday and Hasan (1976: 142) have argued that
"the starting point of the discussion of ellipsis can be the familiar notion that it is 'something
left unsaid' ... is used in the special sense of 'going without saying'".
In the data, only 5 cases of lexical ellipsis have been identified. For instance, in A 005
102-103 (Police said the impact split the car in half ... One of the passengers, Kern Jones of
Cushing, remained in critical condition), the phrase "One of the passengers" could have been
written in a more explicit way (e.g., "One of the car's passengers"); however, the omission of
the word "car", in the researcher's opinion, does not change the meaning or cause any
confusion because the semantic relation between the words "passenger" and "car" is very strong
in this context.
Last but not least, an important characteristic of the data in this study is the use of
referential chains which are produced by the combination of lexical cohesion (repetition and
synonyms) and reference. A typical referential chain in the data can be found in A 004 1-21,
in which the noun man (A 004 1) is gradually replaced by the man (A 004 8), Chvarak (A
004 8), he (A 004 12), the guy and his brother (A 004 13), he (A 004 14), he and his vehicle
(A 004 15), the suspect (A 004 21).
As for the referential functions of third-person pronouns, as presented in Table 1
above, despite the significance of other reference tools and lexical devices in journalistic
discourse, the study reveals the predominance of anaphoric reference in the data. In A 001-A 010,
out of 631 referential occurrences of the third-person pronouns, there are 576
anaphoric cases. Taking one of these cases in A 009 10 (The Whalers have been playing their
home games in the Springfield...), the possessive pronoun their refers anaphorically to the
noun Whalers.
With regard to cataphoric reference, in A 003 19 (Although it has expressed support
for holding the Games, the German Olympic committee has taken no final stand ...), the
pronoun it cataphorically refers to the noun phrase the German Olympic committee, which is
introduced later in the text.
Table 1 above indicates the low occurrence of cataphora in the journalistic texts; only
8 of the 631 referential uses of third-person pronouns are cataphoric. This is not surprising as
account the tastes and needs of those who may get irritated by the intensive use of repetition,
confusion, ellipsis, or even pronominal reference and difficult synonyms. This is the device
known as "elegant variation" (Fowler 1965). On the other hand, they avoid the tricky pronouns
(e.g., cataphora, generic masculine he, and plural they referring to singular referents with
unspecified gender), and use, for example, repetition or substitution which proved to be
the most frequently used devices in the data. These techniques spare their readers potential
ambiguity or complexity.
The analysis above reveals that the traditional prescriptive rule that antecedents must
take gender- and number-matched pronouns is not highly respected. In the data, the plural
their is used to refer to the singular antecedent anyone in one case and to the people of
Canada in another. The pronoun he is used instead of the pronoun it to co-refer with the
animal horse. Moreover, the third-person singular neuter it is used as a subject of a passive
construction where the more straightforward active construction could have been used. These
interesting cases and others break the prescriptive requirements for the use of third-person
singular pronouns. Clearly, much still remains to be done to clarify how this affects native
and non-native speakers' usage.
Survey of native and non-native speakers' performance
This section offers a comparison of native and non-native speakers' use of third-person
pronouns in problematic sentences taken from the Associated Press corpus and other sources.
In the light of what prescriptive grammarians say concerning particular points of
pronoun-antecedent agreement and the journalists' choices of third-person pronouns within
the given environments, the performance of native speakers as compared with that of
non-natives on the usage of 13 sentences will be examined. It should be noted that Figures 1-10
employ these abbreviations: NNs = non-natives; Ns = natives; F = female; M = male; O = old;
Y = young; S = sentence.
speakers chose his/her; 13 went for the plural their; and only 11 chose his to refer to the
genderless word student. Native speakers proved to be equally divided between using
their and his/her and showed readiness to bend the traditional rule, although more of the
older native speakers opted for abiding by the formal rule.
Interestingly, although this sentence would be grammatical as "Every student must
bring books to class", the null option did not show up in the native performance. This can
be attributed to the way the sentence was presented to informants: it had an indicated gap
after bring, and informants were told to write a third-person pronoun in the blank.
[Figure 1: Pronoun choices for S1 (his, his/her, their) among Ns, by age and gender, and NNs]
Equally important, the results revealed that the majority of non-native speakers (26 out of 40)
abided by the prescriptive rule and chose the masculine pronoun his. This suggests that native
speakers are not conscious of or do not follow any systematic criterion while using their
language, and non-native speakers need well-defined rules to follow.
Sentence 2: A child learns to speak the language of ____ environment. (Quirk et al. 1985:
316)
According to Quirk et al. (1972: 360), words like child are exceptionally referred to by the
pronoun its. According to the survey results, very few native speakers (only 4 out of 40) used
the pronoun its to refer to child. Almost half of them (19 of 40) chose his/her.
Native speakers were not keen to make a gender distinction and used the coordinate
pronouns his or her or the plural their to refer to the noun child. However, about one fourth
of the respondents (11 out of 40) made a gender distinction in favor of the masculine pronoun
his. Perhaps parents tend to refer to their baby with personal reference, and those without
children may prefer to use non-personal reference. Quirk et al. (1985: 316) have described
users of it as emotionally unrelated to the child.
[Figure 2: Pronoun choices for S2 (his, his/her, their, its) among Ns, by age and gender, and NNs]
Half of the non-native speakers chose its based on what they learn in their EFL
classes, and nearly the other half opted for his mainly because this pronoun in Arabic has a
generic reference.
An interesting result here has to do with the clear interaction between age and gender
in the native speakers' performance regarding the use of his/her. Younger males use it twice
as much as older males, but older and younger females use it most frequently.
Sentence 3: Ridden by jockey Aki Kato, Tally Ho the Fox, scored ____ second consecutive
stakes win.
According to Quirk et al. (1985), the pronoun it is mainly used to refer to lower animals. In
the data, there is an exceptional occurrence which does not lend itself to the formal rule. In
this sentence, the pronoun his is used to refer to the horse Tally Ho the Fox instead of the
pronoun it. If the horse is viewed as a non-personal entity, it is mainly referred to by the
neutral pronoun it. But, according to Quirk et al. (1985), people express male/female gender
distinctions with higher animals.
In this case, in which syntactic and lexicogrammatical rules do not seem to be in
operation, readers would not have been able to understand what Tally Ho the Fox was if the
immediate environment had not provided them with the information "ridden by jockey Aki
Kato" (A004 45). The particular choice of the masculine he to refer to a horse may depend
on a number of variables, primarily the speaker's relation to the species in question, but also
on her/his individual preference for pronoun usage. If the horse was not male, then the choice
may be explained in terms of the human-like behavior of the horse, which scores just like
humans do. One may add a further factor encouraging the use of he or she: the horse was named.
Animals are mainly referred to with non-personal gender pronouns (it, its, itself).
However, Quirk et al. (1985: 314) have asserted that persons are not only human beings, but
may also include supernatural beings and higher animals.
[Figure 3: Pronoun choices for S3 (its, his/her, his) among Ns, by age and gender, and NNs]
The majority of native speakers (23 out of 40) opted for the masculine he to refer to the word
horse. This is not surprising in racing contexts or with pets. However, a striking finding with
age and gender is that most of the younger females went for his/her.
Most non-native speakers (26 out of 40) went for the non-personal pronoun its to refer
to the horse. About one third of them (14 out of 40) chose the pronoun his, taking the horse
as male, and none chose his or her.
According to Quirk et al. (1985), since English lacks gender-neutral third-person
singular pronouns, the plural they represents an alternative to using the masculine pronoun he
in reference to mixed-gender groups or persons of unknown gender.
Sentence 4: When the average person walks into a bank, ____ looks over brochures in the
lobby.
In Figure 4, the results show that almost half of the native speakers, mostly the young group,
chose he or she (18 out of 40). Only one fourth of the native speakers (10 out of 40) went for
the singular they as an alternative to the masculine generic pronoun he.
[Figure 4: Pronoun choices for S4 (they, he/she, he) among Ns, by age and gender, and NNs]
This result is consistent with the findings of Madson and Hessling's (2001) study of American
readers' perceptions of four alternatives to the masculine generic pronoun, in which the
respondents rated the they version as lowest in overall quality. However, it is inconsistent
with Johnson (2004), who claims that many English speakers prefer the singular they and
proposes, based on evidence, endorsing the singular they rather than other alternative
strategies. In the researcher's analysis, the form of the verb looks makes the alternative they
ungrammatical; otherwise it would have been used more. Not surprisingly, perhaps, 10 older
native speakers went for he, as compared to only 2 younger ones. This reflects the gap
between the old and young cohorts.
As for non-native speakers, the majority of them fell back on what they learnt in their
EFL classes (30 out of 40) and chose he as a pronoun referring to the antecedent average
person.
Sentence 5: It was a singular act of courage on the part of Canada to spirit out of Iran a
group of diplomats who were not even ____ own citizens.
This sentence begins with the prop it, the most neutral and semantically unmarked of the
personal pronouns. The prop it in this sentence appears to function as an empty theme
(Quirk et al. 1985). This prop it is followed by the verb to be and a construction which
makes it natural to achieve focus on the item that follows: in effect, end focus within an SVC
clause (Quirk et al. 1985: 1384). This is equivalent to extraposition of subject clauses.
The observation here has to do with the writer's apparent violation of number
concord, that is, her/his choice of the plural pronoun their to refer to a singular entity, Canada.
This choice does have a purpose if explained within the politeness framework.
According to Brown and Levinson (1987: 180), plurality signifies respect
throughout the pronominal paradigm of reference. Likewise, Lin (1988: 159-160) has argued
that the idea of plural is naturally and historically connected with power. It is also believed
that plurality is a very old and ubiquitous metaphor for power, the earliest instance of which
was used to address the emperors of Rome in the 4th century (Brown and Gilman 1960).
Obviously, the writer takes Canada as a plural to show collectivity (as a nation
consisting of millions of people).
Surprisingly, the pronoun her, which represents another alternative, did not show up
in the native speakers' performance. As shown in Figure 5, the data illustrate that the vast
majority of native speakers (36) and non-native speakers (37) chose the pronoun its. Although
the contextual information surrounding the antecedent Canada in sentence (5) presents it as
a political entity, the respondents' immediate reactions portray Canada as a geographical
entity (i.e., inanimate).
[Figure 5: Pronoun choices for S5 (their, its) among Ns, by age and gender, and NNs]
Sentence 6: I don't think anyone would approve of having ____ children attend classes in
this setting.
Figure (6) shows that the majority of non-native speakers (27 out of 40) chose the masculine
pronoun he to refer to the non-specific referent anyone. Native speakers, nonetheless, were
less observant of prescriptive rules: 12 of them chose the plural they; 12 chose the coordinate
construction he or she; 12 voted for one's; and only 10 old speakers chose his.
Regarding native speakers' performance, the results support Holmes's (1998)
conclusion, after conducting an analysis of generic pronouns in New Zealand, that 80% of
non-specific referents, such as anyone, are referred to by they. However, the results show that
almost half of non-native speakers stuck to the traditional rule and chose his rather than their.
By using their as a gender-free pronoun, the majority of native participants in this
survey appeared to be socially sensitive in avoiding gender bias. This is consistent with Mair
and Leech's conclusion that "an ideological motivation (avoidance of sexual inequality) [can
be a reason, among others] for replacing an older pronoun usage by a newer one" (2006: 336).
In addition, big differences were observed in the survey in the natives' performance between
older and younger females regarding this point.
[Figure 6: Pronoun choices for S6 (his, his/her, one's, their) among Ns, by age and gender, and NNs]
In this sentence, the plural pronoun their is used in defiance of strict number
concord in co-reference to the indefinite pronoun anyone. This violation appears to have a
different interpretation from the one mentioned above. The reporter, here, may have used the
plural as a convenient means of avoiding the traditional use of the third person masculine he,
as the syntactically unmarked form. Since the gender of the indefinite pronoun anyone is
unspecified, the writer chooses their in order to avoid possible attacks from those who view
the use of the generic he as a kind of sexual bias in language. In addition, by the choice of
their to co-refer with anyone, s/he also avoids being vulnerable to the objection of seeming
to have a male orientation (Greenbaum et al. 1990: 451). The choice of their seems to be
governed by more contingent, context-dependent pragmatic as well as social orientations.
Sentence 7: The hairdresser turned down the offer and ____ returned inside.
The overwhelming majority of non-native speakers (37 out of 40) chose the feminine pronoun
she to refer to the antecedent hairdresser, and almost half of the native speakers (18 out of
40) chose she as well. This is by no means surprising since the default inference of
hairdresser in some communities is female.
[Figure 7: Pronoun choices for S7 (he, she, he/she) among Ns and NNs]
Sentence 8: The blacksmith remained silent and ____ refused to leave the coach.
As shown in Figure (8), the overall results show that the antecedent word blacksmith is
treated as male and referred to as he.
[Figure 8: Pronoun choices for S8 (he/she, she, he) among Ns, by age and gender, and NNs]
The overwhelming majority of non-native speakers (38 out of 40) chose the masculine
pronoun he to refer to the antecedent blacksmith, and half of the native speakers (20 out of
40) chose it as well. This is by no means surprising since the default inference of blacksmith
in some communities is male. Although none of the non-native speakers treated blacksmith as
female, three native speakers singled out the feminine she and seven of them preferred the
coordinate construction he or she.
Sentence 9: The Titanic was massive because ____ killed thousands and thousands of
people.
The neutral pronoun it is almost always used in place of a single thing. However, there are,
according to Quirk et al. (1985), a few exceptions. For example, the feminine pronoun she
can be exceptionally used in a case of personification to refer to a ship.
[Figure 9: Pronoun choices for S9 (she, it) among Ns, by age and gender, and NNs]
However, the results presented in Figure (9) show that the majority of native and non-native
speakers treated even the ship Titanic as a single thing rather than female and chose the
neuter pronoun it.
In this particular case, it seems that the reference is not to the ship itself, but to the
disaster event which is named after the ship involved in it. Clearly, the option of she for the
ship is not a popular choice, and almost all language users indicated that the gap is best
filled by it.
Sentence 10: The offender argued logically and calmly. This could eventually help
change the attitudes of the taxpayers and officials, who are in a position to give more
support to ____ as well as to the victims.
As shown in Figure (10), the majority of native (24) and non-native speakers (27) chose the
masculine pronoun him to refer to the antecedent word offender. This goes in line with the
widely-held assumption that the default gender interpretation of offender is male. It seems
that this applies to both American and Palestinian communities.
[Figure 10: Pronoun choices for S10 (them, him/her, her, him) among Ns, by age and gender, and NNs]
Interestingly, when the same sentence is considered in its context, the reporter did not use any
pronoun and chose to repeat the noun phrase the offender, although it would have been
equally explicit if it had been replaced with a pronoun. The same noun phrase is repeated in
A 001 127 and in A 001 128, and the context itself gives enough information for the reader to
interpret it in the right way. A possible explanation is that the reporter wants to avoid the
dilemma of choosing a pronoun appropriate to the situation. According to Mair and Leech
(2006), the generic use of he for both male and female was prevalent in the 1960s, but it
declined in the 1990s owing to the efforts of women's movements. The feminist
recommendations in this regard, together with the need to fill the gap left by the downfall of
the generic he, allowed for the deeply-rooted they to re-emerge.
A few years ago, the choice of the third person plural they would have been totally
unacceptable in terms of number concord, since the offender signals a singular entity which
requires a singular co-referent. Choice of the third person masculine he could have been seen
as male oriented or another manifestation of the subjection of women to men, whereas the
third person feminine she would have been awkward, since readers are not used to the idea of
the feminine as a generic pronoun. The writer successfully managed to avoid the dilemma by
repeating the noun.
Discussion
The two most striking results to emerge from both the corpus-based analysis and the survey
can be summarized as follows. The first is that the traditional prescriptive rule that
antecedents must take gender- and number-matched pronouns is not highly respected. In the
journalistic corpus data, many reporters' pronoun choices hinge upon contingent, context-dependent pragmatic, social and cultural factors. For example, the plural their is used to refer
to the singular antecedent anyone in one case and to the state Canada in another. The pronoun
he is also used to refer to the animal horse instead of the pronoun it. Moreover, the results
obtained from the survey show that native speakers are deeply divided about what pronoun to
use when dealing with entities of unknown gender relative to their age and gender.
Moreover, the systematic way in which the language users' responses follow a certain
pronoun pattern provides evidence that the age factor, for instance, constrains their choices
and leads to apparent sensitivity of judgment relative to the given socio-cultural context. With
this in mind, usage nowadays is changing under the pressure of social, cultural, and pragmatic
constraints.
Further evidence can be obtained from sentences (11-13) below, which are extracted
from the survey. They show how the third-person singular neuter it is used as the subject of a
passive construction where the more straightforward active construction could have been
used.
11.
A. It is hoped that as a result, the public might view the offender in a more positive light.
B. I hope that the public might view the offender in a more positive light.
12.
A. It was not known immediately what interest rates would be charged...
B. The minister did not know immediately what interest rates would be charged.
13.
A. It also was announced that Bowman's squad had lost three players to injury.
B. The coach announced that Bowman's squad had lost three players to injury.
To begin with sentence (11) (A 001 28), the third person singular neuter it is used as a subject
of a passive construction. A possible interpretation of this choice is that since newspaper
language has to be objective and unbiased, the writer tries to disassociate her/himself from the
potential hope expressed in the utterance, simply because her/his role is to report facts, not to
express feelings and hopes.
Let us consider for a moment how the sentence could have been stated otherwise: "I
hope that the public might view the offender in a more positive light." The use of the first
person pronoun, together with the modal auxiliary might, expresses a strong hope on the part
of the writer, but at the same time, it can be viewed as a kind of mild imperative ("an
ethical code requires that you should view the offender ..."), or as a strong suggestion, which
automatically turns a simple utterance into a Face Threatening Act (FTA) (Brown and
Levinson 1987: 10).
The Face Threatening Act, which is closely related to the notion of politeness,
imposes many constraints on the linguistic choices language users make, both in spoken and
written discourse. Brown and Levinson (1987), in discussing the notion of politeness, have
proposed that face consists of two related aspects. Negative face refers to the want of
every individual that his actions be unimpeded by others (i.e., one's freedom of action and
freedom from imposition). Positive face refers to the want of every member that his wants
be desirable to at least some others (Brown and Levinson 1987: 61).
Brown and Levinson (1987) have also highlighted the options available to the speaker
who must decide whether and how to utter a Face Threatening Act, that is, an act which poses
a threat to either the positive or the negative face of the addressee. These options range from
simply not doing the FTA (off-record), to doing the act boldly, with little or no concern for
face (on record, without redressive action). Between these two options, for a speaker who
chooses to do the FTA but who wishes to show an appropriate concern for face, there are
various FTA-minimizing strategies and devices for mitigating the illocutionary force of
particular utterances (cf. Brown and Levinson 1987, Lakoff 1972, Leech 1983).
Therefore, one could say that the journalist uses a negative politeness strategy, that
is, the passive construction, in order to preserve the addressee's (the public's) negative face, as
well as to avoid any kind of impingement on their desire to be free from imposition.
Likewise, in example (12), "it was not known immediately what interest rates would
be charged..." (A009 83), the writer again chooses the passive construction without an agent,
in order to avoid putting the blame on anyone, on the authorities or on the particular president
of the institution in our case. If the reporter had used the third person plural they to mean
persons unspecified, or persons with responsibility (Halliday and Hasan 1976: 53), s/he
would have again performed an FTA, that is, s/he would have shown disapproval or
contempt, expressions that both threaten the addressee's positive face want, by indicating that
the speaker doesn't care about the addressee's feelings, wants, etc. (Brown and Levinson
1987: 66).
Something similar can be observed in (13): "it also was announced that Bowman's
squad, which already had lost three players to injury" (A 007 13 in the data). The pronoun
it here does not co-refer with a previous antecedent, but it occupies the subject role of the
utterance. The journalist may have used this construction in order to avoid attribution of
blame or responsibility to persons involved in the situation.
When the same examples were presented to native and non-native speakers out of
their context, they, as shown in Figure (11) below, overwhelmingly selected the active form.
Putting the results obtained from the analysis and the survey together shows how the
contextual meaning of individual examples shapes their structural form relative to what the
speaker intends her/his meaning to be (i.e., by means of a pragmatic rather than a syntactic or
semantic explanation). Clearly, pragmatics goes a step further than text and textual meaning,
clarifying what exactly a piece of language means to a given person (to the speaker or
addressee) in a given speech situation (Leech 1980: 80).
These pragmatic constraints on the occurrence of the third-person pronouns refute the
claim that since semantic interpretation is the study of what a piece of language means
(Leech 1980: 80), pragmatic explanation of any piece of spoken or written discourse is
redundant.
[Figure 11: Percentage of passive and active choices for S11-S13 among native and non-native speakers]
The second striking result is that the obtained information on the spontaneous choices
of Americans as compared with those of non-native speakers paints a rather fuzzy picture.
However, a couple of patterns are worth mentioning here. Native speakers were more flexible
than non-native speakers in their choices of the plural they or the coordinate he or she to refer
to singular words with unspecified gender. When using they as a gender-free pronoun, native
speakers here are socially sensitive and avoid gender bias in their communities. Interestingly, a
clear interaction between age and gender in the native participants, influencing the use of
his/her, has been observed in many cases; younger males used it twice as much as older
males, but older and younger females used it most and equally often. Not surprisingly,
perhaps, older native speakers went for the masculine pronoun he to refer to non-specific
referents like anyone.
The data analysis highlights the problem of the nonexistence of gender-neutral
singular pronouns in English. An antecedent like student or anyone does not display whether
the referent is male or female. This study has examined the usage of third-person pronouns in
native and non-native speakers' completion of sentences extracted from Associated Press
news articles.
This is fair enough, if the issue is restricted to native speakers living in one society.
But can this be easily adopted by non-native speakers to become socially sensitive to the
culture-specific rules in English-speaking countries? Should EFL teachers and students
follow what prescriptive grammarians say, or study language as it is used by its speakers?
Although the plural they or the coordinate construction he or she is widely acceptable
nowadays in English-speaking societies to refer to gender-unknown singular words, their use
poses a real problem for non-native speakers who need systematic formal rules that can be
easily followed.
Clearly, non-native speakers of English tend to follow the prescriptive rule that a
pronoun must agree with its antecedent in gender and number without paying much attention
to social developments in the English-speaking communities. As the survey results indicate,
there is a clear gap between native and non-native speakers' performance on the choice of
third-person pronouns. This can be explained in two ways. First, it may be partly attributed to
language interference in which L2 learners in general, and Arabic-speaking learners in
particular, transfer the pronoun system of their native language to L2 (Al-Jarf 2010). Unlike
nouns in Arabic which show grammatical gender, nouns including indefinite pronouns in
English (e.g., someone, anyone) do not display gender (Khalil 1999). Second, it may be
attributed to the lack of cultural knowledge and awareness on the part of the non-native
speakers. To bridge such a gap and avoid intercultural miscommunication, culture teaching is
badly needed to develop EFL students' cultural awareness and competence. Clearly, EFL
teachers need to integrate some cultural knowledge into classroom teaching of certain
grammar points.
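The size of this gap can be illustrated with a small calculation. The sketch below is only an illustration: it takes the counts stated in the survey section for four sentences where both groups' figures are given for the same pronoun (its in S5, she in S7, he in S8, him in S10), assumes 40 respondents per group as reported, and expresses the difference in percentage points. The dictionary layout and function name are assumptions made for the example.

```python
GROUP_SIZE = 40  # respondents per group, as reported in the survey

# (sentence, pronoun): (native count, non-native count), copied from the text.
reported = {
    ("S5", "its"): (36, 37),
    ("S7", "she"): (18, 37),
    ("S8", "he"): (20, 38),
    ("S10", "him"): (24, 27),
}


def gap_in_points(native: int, non_native: int, n: int = GROUP_SIZE) -> float:
    """Non-native share minus native share, in percentage points."""
    return 100 * (non_native - native) / n


gaps = {key: gap_in_points(*counts) for key, counts in reported.items()}

for (sentence, pronoun), gap in gaps.items():
    print(f"{sentence} {pronoun}: {gap:+.1f} percentage points")
```

On these figures the gap is widest for the gender-stereotyped antecedents hairdresser and blacksmith and nearly disappears for Canada, where both groups overwhelmingly chose its.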
These findings break the prescriptive requirements for the use of third-person singular
pronouns. The overall impression one gets from the discussion above is that there is no
concrete well-defined criterion as to what pronoun to use when talking about an entity with
unspecified gender. Language users, whether native or non-native, need to be sensitive to the
culture-specific rules in English-speaking countries in order to use third-person pronouns
appropriately. Some people may accept that it is important to raise EFL learners' and
teachers' awareness of native speakers' use, and to train them on how to notice the difference
in cultural orientations. Others may argue that non-native speakers should not be left at the
mercy of native speakers' attitudes and desires, and they should not be left hanging
between strict prescriptive rules and users' actual practices or applications.
third-person pronouns as cohesive devices, which could not be interpreted either syntactically
or semantically. Moreover, the study has shown how the pronoun usage seen in the
reporters' choices reflects their relations toward participants' acts in the discourse. That is,
third-person pronouns, among other linguistic features, have displayed how reporters project
themselves and how they express associations or disassociations with others' acts.
In order to carry out a performance comparison of native and non-native speakers' use
of English third-person pronouns, thirteen sentences with tricky pronouns taken from the
corpus were presented in writing to two groups of native and non-native speakers. The results
have revealed that most native speakers chose third-person pronouns depending on the
socio-cultural context and pragmatic factors, tending to bend the formal rule of pronoun-antecedent
agreement, especially when dealing with gender-unspecified words. However, the majority of
non-native speakers showed an inclination to abide by the prescriptive rules of grammar,
demonstrating little social and cultural sensitivity.
This seems to imply that a treatment of third-person pronouns, or pronouns in general,
based on syntactic conditions alone, may not lead to a consistent and convincing explanation
of their behavior. Going a little bit further, the results of this study suggest that the choice of
different forms in a particular discourse type may be a matter of emotional reflection, as well
as a matter of particular linguistic needs and attitudes, which have to be taken seriously into
consideration well before attempting any kind of syntactic, semantic or pragmatic analysis.
This has been reflected, for example, in the native speakers' divided responses regarding the
antecedent child. In many cases, the results obtained from native speakers have shown an
interesting interaction between age and gender, influencing the use of one pronoun rather than
another. The performance of younger males or females was different in many cases from that
of older males and females. Pronoun-antecedent agreement has proven to be an area where it
is difficult to draw the line between standard and non-standard usage.
It should be noted that this study has not fully covered the broad topic of pronoun
usage. One limitation stems from the fact that the data was of a particular discourse type,
namely American newspaper reports, which are copy-edited according to prescriptive
stylebooks. Other limitations may be attributed to the participants' characteristics. However,
this study should, hopefully, provide insights into the reality of pronouns and the challenges
they pose to both native and non-native speakers.
Acknowledgements:
I am immensely grateful to Professors Mike Garman and Aziz Khalil for their invaluable
comments on an earlier draft of this paper. Special thanks go to Professor Steve Schwegler
for helping me recruit American participants for the survey. I also extend my deepest thanks
to the anonymous reviewers for The Linguistics Journal for their helpful remarks and
insightful suggestions. My special thanks are also due to all American and Palestinian
participants who willingly volunteered to complete the survey.
References
Al-Jarf, R. (in press). Interlingual pronoun errors in English-Arabic translation. King Saud
University. Retrieved July 21, 2013 from
http://faculty.ksu.edu.sa/aljarf/Publications/Forms/AllItems.asp
Brown, P. and S. C. Levinson (1987). Politeness: Some universals in language usage.
Cambridge: Cambridge University Press.
Brown, R. and A. Gilman (1960). The pronouns of power and solidarity. In T. A. Sebeok
(Ed.), Style in language (pp. 253-276). Cambridge, Mass: MIT Press.
Celce-Murcia, M. (1985). Making informed decisions about the role of grammar in
language teaching. TESOL Newsletter, 1, 4-5.
Christophersen, P. and A. Sandred. (1969). An advanced English grammar. London:
Macmillan.
Curzan, A. (2003). Gender shifts in the history of English. Cambridge: Cambridge University
Press.
Einsohn, A. (2011). The copyeditor's handbook: A guide for book publishing and corporate
communications, with exercises and answer keys. Berkeley: University of California
Press.
Fowler, W. H. (1965). Fowler's modern English usage. In E. Gowers (Ed.), A dictionary of
modern English usage (2nd ed.). London: Oxford University Press.
Fromkin, V., R. Rodman and N. Hyams. (2007). An introduction to language (8th ed.). New
York: Thomson Corporation.
Gocheco, P. (2012). Pronominal choice: a reflection of culture and persuasion in Philippine
political campaign discourse. The Philippine ESL Journal, 8, 4-25.
Greenbaum, S., R. Quirk, G. Leech and J. Svartvik. (1990). A student's grammar of the
English language. Essex: Longman.
Halliday, M. A. K. and R. Hasan. (1976). Cohesion in English. London: Longman.
Halliday, M. A. K. (1985). An introduction to functional grammar. London: Edward Arnold.
Holmes, J. (1998). Generic pronouns in the Wellington corpus of spoken New Zealand
English. Kotare: New Zealand notes and queries, 1(1), 32-40.
Johnson, S. (2004). Exploring the use of the 'they' pronoun singularly in English. California
Linguistics Notes, 29(1), 1-5.
Khalil, A. (1999). A contrastive grammar of English and Arabic. Amman: Jordan Book
Center.
Kroeger, P. R. (2005). Analyzing grammar: An introduction. Cambridge: Cambridge
University Press.
Lakoff, G. (1972). Hedges, fuzzy logic and multiple meaning criteria. Papers from the
Chicago Linguistic Society 8, 183-228.
Lakoff, R. T. (1984). Remarks on THIS and THAT. Papers from the Chicago Linguistic
Society 10, 345-356.
Leech, G. N. (1980). Explorations in semantics and pragmatics. Amsterdam: John
Benjamins.
Leech, G. N. (1983). The principles of pragmatics. London: Longman.
Lin, Yang-Yong (1988). The English pronoun of address: A matter of self-compensation.
Sociolinguistics, 2, 157-180.
Linde, C. (1979). Focus of attention and the choice of pronouns in discourse. In T. Givón
(Ed.), Syntax and semantics, 12. New York: Academic Press.
Lyons, J. (1975). Deixis as a source of reference. In E. L. Keenan (Ed.), Formal semantics of
natural language (pp. 61-83). Cambridge: Cambridge University Press.
Lyons, J. (1977). Semantics. Cambridge: Cambridge University Press.
Madson, L. and R. Hessling. (2001). Readers' perceptions of four alternatives to masculine
generic pronouns. Journal of Social Psychology, 141(1), 156-158.
Mair, Ch. and G. N. Leech. (2006). Current changes in the English syntax. In B. Aarts and A.
McMahon (Eds.), The Handbook of English Linguistics (pp. 318-342). Oxford:
Blackwell.
Mangan, L. (2010). All style and substance. The Guardian. Retrieved 24 July, 2010 from
http://www.theguardian.com/lifeandstyle/mind-your-language/2010/jul/24/styleguide-grammar-lucy-mangan
Partington, A. (2003). Politics, power and politeness. In A. Partington (Ed.), The linguistics of
political argument (pp. 124-155). London and New York: Routledge.
Quirk, R., S. Greenbaum, G. N. Leech and J. Svartvik. (1972). A grammar of contemporary
English. London: Longman.
(...........................................)
Part I: Fill in each blank in the following sentences with an appropriate third-person pronoun and briefly
explain why.
-
Ridden by jockey Aki Kato, Tally Ho the Fox, scored (3) second consecutive stakes win.
When the average person walks into a bank, (4) looks over brochures in the lobby.
It was a singular act of courage on the part of Canada to spirit out of Iran a group of diplomats who were not
even (5) own citizens.
I don't think anyone would approve of having (6) children attend classes in this setting.
The hairdresser turned down the offer and (7) returned inside.
The blacksmith remained silent and (8) refused to leave the coach.
The Titanic was massive because (9) killed thousands and thousands of people.
The offender argued logically and calmly. This could eventually help change the attitudes of the taxpayers
and officials, who are in a position to give more support to (10) as well as to the victims.
Part II: Which sentence would you prefer to use in your writing? Please tick the box next to it.
11. A. It is hoped that as a result, the public might view the offender in a more positive light.
B. I hope that the public might view the offender in a more positive light.
12. A. It was not known immediately what interest rates would be charged.
B. The minister did not know immediately what interest rates would be charged.
13. A. It also was announced that Bowman's squad had lost three players to injury.
B. The coach announced that Bowman's squad had lost three players to injury.
Thank you
Keywords: argument structure constructions, varieties of ELF, the passive construction, the
existential construction
Introduction
In the theory of Construction Grammar (CxG), all levels of description in language lie in the
notion of construction, which refers to a pairing of form and meaning. Morphemes, words,
idioms, and phrasal patterns are all constructions since they are instances of form-meaning
correspondences (Fillmore 1988). Generalizations about particular arguments being topical,
focused, inferable, etc., as well as facts about the actual use such as frequencies are also
stated as part of the constructional representation (Goldberg 2002, 2009). Such a perspective on
constructional properties suggests a more precise definition of a construction implied in the
theory, i.e., an association of form, meaning, and use.
Clause-level syntactic patterns, often referred to as argument structures, are one type
of construction because they are associated with a particular form, meaning, and use. A
fundamental idea behind the CxG approach to argument structure constructions is that they
designate event types, which are basic to human experience. The meanings of these event
types are rather general and abstract (Goldberg 1995). For instance, in English, the transitive
construction (of the form Subject-Verb-Object, as in Pat opened the door) denotes something
acting on something; the ditransitive construction (Subject-Verb-Object1-Object2, as in Pat
gave Jill a gift) denotes possessive transfer from one participant to another.
Compared to constructions at the lower levels, argument structures are more difficult
to acquire. When English-speaking children encounter new words, for example, they can
quite quickly pick up the form and meaning of those unfamiliar expressions from the
immediate context. In contrast, properties of an argument structure are general and abstract.
Children need to be exposed to a number of instances of one argument structure before they
can make generalizations about the form, meaning, and use inherently attached to that
construction.
Children learn their first language by making generalizations and drawing conclusions
based on the linguistic input they have received. They tend to lose this innate linguistic ability
as they grow up (Bley-Vroman 1988), so the process of learning a second language is
more explicit and depends heavily on instructors' explanations. Consequently, the task of
learning an argument structure becomes even more
challenging for second language learners. While some constructional features are noticeable
and easy to describe, many general and abstract constructional features are hard to explain. In
order to use an argument structure appropriately, English learners need to recognize all of its
principal syntactic, semantic, and pragmatic properties. The frequent deviations found in their
use of clausal patterns show that this is not always the case.
Thus, the use of the passive and existential constructions is a linguistic option. Native
speakers choose them over the more basic structures due to very specific properties. It is
difficult for L2 learners to differentiate between the alternative structures and recognize all
properties particularly attached to each of the two constructions. In sum, by dealing with the
passive and existential constructions, the objectives of the study are: (1) to investigate Thai
learners' use of the English constructions, in comparison with the native speaker norms, and
(2) to analyze the deviations in terms of the general, universal characteristics of ELF.
Literature Review
The review of the literature covers four areas: (1) CxG, (2) ELF, (3) the English passive
construction, and (4) the English existential construction.
Construction Grammar
The basic tenet of CxG is that constructions, i.e., form-meaning correspondences, constitute the
basic units of language (Goldberg 1995). The main objective of the theory is to account for the full
range of facts in language on the basis of the various types of constructions available in human
languages.
Argument structures hold a special interest in the theory. This type of construction is
marked by syntactic, semantic, and pragmatic properties. According to the Principle of No
Synonymy of Grammatical Forms, the form of a construction is very specific; even slight
changes in a sentence structure can result in differences in meaning, either denotational or
pragmatic (Goldberg 1995). Thus, pairs of alternating sentences such as an active
and its passive counterpart belong to different constructions that denote subtle differences in
meaning. Semantically, an argument structure designates a scene basic to human experience,
and its meaning can be polysemous, having a family of different, but related senses. As a
result, there are semantic variations in the way speakers use a construction. For example,
while the English ditransitive typically expresses successful transfer, some ditransitive
sentences denote other related senses of transfer, including future transfer, intended transfer,
and negation of transfer. Pragmatically, the use of a construction varies along different kinds
of pragmatic dimensions, such as packaging of information structure, grammatical heaviness,
and register. All of these properties in form, meaning, and use contribute to the existence of
an argument structure construction in a language.
Characteristic    Definition
Repetition        ELF speakers often repeat their words and other speakers' words. Repetition is an accommodation strategy to achieve efficiency of communication, signal agreement and alignment, show attention and engagement in the conversation, and establish cohesion.
Simplification    Complex forms are replaced by simple, shortened forms. Complex rules are simplified.
Regularization    ELF speakers make use of rule regularizations to make the rules more general and consistent and to avoid exceptions.
Analogy
Property
NS Norm
Syntax:
A passive verb appears in many forms, with various tenses, aspects, and auxiliaries.
An agent is usually omitted when it is unknown or irrelevant to the point being
discussed, when it is predictable by the context or world knowledge, and when it
refers to people in general.
An agent is retained when it conveys new information. Typically, it is introduced
by the preposition by.
Semantics and pragmatics:
The theme functions as the topic of a passive sentence; it usually expresses given
information.
Speakers tend to choose the passive when an agent at sentence-final position is
structurally heavy.
Non-basic passives:
The get passive is used with an event whose subject is partly responsible for the
result, or which happens unexpectedly.
The ditransitive passive is formed from a ditransitive verb (e.g., she was sent a note).
The prepositional passive is formed from an intransitive verb that occurs with a
preposition (e.g., the project was thought about).
(1)
In terms of form, the passive structure includes a theme subject, a passive verb form
(usually consisting of be and a past participle), and an optional agent phrase. As to meaning, a
passive sentence is used to talk about an action from the viewpoint of the theme. Apart from
this basic form and meaning, the English passive is associated with a set of syntactic,
semantic, and pragmatic properties. The major characteristics of the construction as
frequently discussed in the literature (e.g., Downing and Locke 2006, Finegan 2004, O'Grady
2001, Parrott 2000) are listed in Table 2.
Property
NS Norm
Syntax:
The form of be is varied, with various tenses, aspects, and auxiliaries.
In addition to be, a small number of verbs appear in the construction. Most are
intransitive verbs.
The displaced subject denotes countable, uncountable, or abstract entities.
The displaced subject tends to be long, having various kinds of modifiers.
The bare existential structure contains there, be, and a displaced subject. The
extended existential structure also contains an extension, often a locative or
temporal expression.
Existential sentences often appear in the declarative form and in the simple structure.
Semantics and pragmatics:
The existential construction typically serves a presentational function. It draws an
addressee's attention to the displaced subject.
The displaced subject typically conveys new information; its position is usually
occupied by an indefinite noun phrase.
(2)
As the example in (2) shows, the verb agrees with the pivot noun phrase that follows, rather
than with the expletive subject there, which is neutral for number. As a result, the pivot
nominal is called a displaced subject, i.e., the real subject that is moved from the pre-verbal
original position to the position after the verb.
In terms of form, the existential structure consists of the expletive there, the verb be, a
displaced subject, and an optional extension. As to meaning, an existential sentence denotes
the presence of something. Apart from this basic form and meaning, the English existential
construction is associated with a set of syntactic, semantic, and pragmatic properties. The
major characteristics of the construction as frequently discussed in the literature (e.g., Collins
2002, Downing and Locke 2006, Huddleston and Pullum 2005, O'Grady 2001) are listed in
Table 3.
Research Methodology
The study employed a qualitative approach: a writing task with prompts and a
free writing task were assigned to collect data, and the results were interpreted in terms of the
common and systematic characteristics of Thai learners' use of the English passive and existential
constructions. The details of the subjects and instruments are as follows:
Subjects
Since their deviations should reflect systematic variations, not the sporadic errors of beginning
learners, the target population was upper-intermediate Thai ELF learners who had received
formal instruction in English and had been schooled to conform to Standard English norms
over several years. Both the purposive and random sampling procedures were used to select
the representatives of the population. That is, the subjects were among those who met the
following language criteria. First, undergraduate students majoring in English at Kasetsart
University who had been in the program for more than one year were targeted since they had
studied the four skills of English (listening, speaking, reading, and writing) extensively,
especially during the period of their study at the university. Second, to ensure that they had
upper-intermediate level English knowledge and skills, only those with an average grade of
over 3.25 for all English classes taken at the university were considered. The subjects were
randomly selected from this group of students who met the two criteria.
Subjects meeting the selection criteria were in the third and fourth years of their study.
They were in the regular and special programs of English, affiliated with the Department of
Foreign Languages, Faculty of Humanities. The two programs shared the same curriculum;
they differed only in the class times. There were 139 third-year students and 122 fourth-year
students, yielding 261 third-year and fourth-year students in the two English programs. Thirty-five
third-year students and 35 fourth-year students were chosen to participate in the study. Of
these 70 subjects, 50 (71.4%) were female, and 20 (28.6%) were male; 40 subjects (57.1%)
studied in the regular program while 30 (42.9%) studied in the special program. The average
age of all the subjects was 22, and the average number of years of English study was 16.
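The sample composition reported above can be re-derived with a few lines of arithmetic. The following is a minimal sketch for checking the figures, not part of the study's procedure: only the counts come from the text, while the variable names, the helper percent, and the sampling fraction are assumptions introduced here for illustration.

```python
# Counts reported in the Subjects section (the only values taken from the text).
population = {"third_year": 139, "fourth_year": 122}
sample = {"third_year": 35, "fourth_year": 35}
gender = {"female": 50, "male": 20}
program = {"regular": 40, "special": 30}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to one decimal place (hypothetical helper)."""
    return round(100 * part / whole, 1)


total_population = sum(population.values())
total_sample = sum(sample.values())

# Shares of the 70 subjects by gender and by program.
gender_share = {k: percent(v, total_sample) for k, v in gender.items()}
program_share = {k: percent(v, total_sample) for k, v in program.items()}

# Not reported in the text: the proportion of eligible students who were sampled.
sampling_fraction = percent(total_sample, total_population)

print(total_population, total_sample)
print(gender_share)
print(program_share)
print(sampling_fraction)
```

The gender and program shares computed this way agree with the percentages given in the text.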
Instruments
Two types of writing tasks were designed. In order that the subjects could concentrate on
their writing, they were assigned to do the tasks in two separate sessions, which took place on
different days. There was no time limit on finishing each task; however, most subjects could
finish within two hours. The designs and instructions of the tasks are as follows:
Results
This section is divided into three parts. The first part involves the passive construction. The
second part discusses the existential construction. The last part analyzes how the Thai
learners' use of the constructions reflects the general, universal characteristics of ELF.
Thai Learners' Use of the English Passive Construction
Table 4 presents the number of passive sentences and passive verb phrases taken from each
task and sub-task. Since several sentences contained more than one passive verb phrase, the
number of the passive verb phrases outnumbered that of the passive sentences.
Table 4 Number of Passive Sentences and Passive Verb Phrases
Task/Sub-task           Passive Sentences   Passive Verb Phrases
Picture description     65                  70
Translation             271                 277
Essay writing           455                 501
Total                   791                 848
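The totals in Table 4 can be checked by summing the per-task counts. The sketch below is illustrative only; the names table4, prompted, and extra are assumptions introduced here, and the pairing of each count with "passive sentences" or "passive verb phrases" follows the table caption.

```python
# (passive sentences, passive verb phrases) per task, as listed in Table 4.
table4 = {
    "Picture description": (65, 70),
    "Translation": (271, 277),
    "Essay writing": (455, 501),
}

total_sentences = sum(s for s, _ in table4.values())
total_verb_phrases = sum(v for _, v in table4.values())

# Combined counts for the two prompted sub-tasks (picture description and translation).
prompted = tuple(
    table4["Picture description"][i] + table4["Translation"][i] for i in range(2)
)

# Surplus of verb phrases over sentences, i.e. additional passive verb
# phrases in sentences that contain more than one.
extra = {task: v - s for task, (s, v) in table4.items()}

print(total_sentences, total_verb_phrases)
print(prompted)
print(extra)
```

The combined prompted counts (336 sentences, 347 verb phrases) and the essay counts (455 and 501) are the totals that recur in the frequency tables of this section.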
Of these three data sources, the sentences from the students' essays are considered the
best indicator of how the Thai students used the English construction. The sentences from
essay writing are naturalistic, or naturally occurring data; the students produced these
sentences from their own linguistic repertoire, with no hints or stimulation to use any
particular features through the provided word prompts, pictures, or Thai counterpart
sentences. Accordingly, the results from the essay and the writing with prompts are presented
separately for both the passive and existential constructions. This is to see whether the results
from the naturalistic data and elicited data supported each other regarding the Thai students'
use of the constructions.
Essay Writing                           Picture Description and Translation
Verb Form               Frequency       Verb Form               Frequency
Present simple          232 (46.3%)     Present simple          147 (42.4%)
Past simple             67 (13.4%)      Present perfect         60 (17.3%)
[...]                   56 (11.2%)      [...]                   53 (15.3%)
Present perfect         29 (5.8%)       Past simple             38 (11%)
[...]                   26 (5.2%)       Future simple           34 (9.8%)
To infinitive           25 (4.9%)       To infinitive           4 (1.2%)
Future simple           23 (4.6%)       [...]                   4 (1.2%)
[...]                   8 (1.6%)        Present continuous      2 (0.6%)
[...]                   8 (1.6%)        [...]                   2 (0.6%)
[...]                   7 (1.4%)        Past continuous         1 (0.2%)
[...]                   4 (0.8%)        [...]                   1 (0.2%)
[...]                   4 (0.8%)        [...]                   1 (0.2%)
[...]                   3 (0.6%)
Bare infinitive         2 (0.4%)
Present continuous      2 (0.4%)
[...]                   2 (0.4%)
Past continuous         1 (0.2%)
Past perfect            1 (0.2%)
Imperative              1 (0.2%)
Total                   501             Total                   347
2. Auxiliary Verbs
The students' passive sentences were predominantly formed with the typical passive auxiliary
verb be in essay writing (97.2%) and in the picture description and translation (96.8%). This
finding shows that Thai students usually produce the basic form of the English passive verb
phrase; variant forms containing other auxiliaries are uncommon.
Essay Writing                           Picture Description and Translation
Auxiliary               Frequency       Auxiliary               Frequency
be                      487 (97.2%)     be                      336 (96.8%)
become                  6 (1.2%)        get                     10 (2.9%)
get                     4 (0.8%)        seem                    1 (0.3%)
feel                    3 (0.6%)
look                    1 (0.2%)
Total                   501             Total                   347
Essay Writing                           Picture Description and Translation
Agent Phrase            Frequency       Agent Phrase            Frequency
[...]                   388 (77.4%)     [...]                   229 (66%)
[...]                   113 (22.6%)     [...]                   118 (34%)
Total                   501             Total                   347
Essay Writing                           Picture Description and Translation
Context                 Frequency       Context                 Frequency
Unknown or irrelevant   238 (61.3%)     Unknown or irrelevant   149 (65.1%)
Predictable by context  87 (22.4%)      Referring to people     63 (27.5%)
Referring to people     41 (10.6%)      Predictable by context  16 (7%)
[...]                   22 (5.7%)       [...]                   1 (0.4%)
Total                   388             Total                   229
Essay Writing                           Picture Description and Translation
Preposition             Frequency       Preposition             Frequency
by                      86 (76.1%)      by                      71 (60.2%)
to                      11 (9.7%)       from                    41 (34.7%)
with                    5 (4.4%)        with                    6 (5.1%)
because of              4 (3.5%)
due to                  4 (3.5%)
from                    3 (2.7%)
Total                   113             Total                   118
Essay Writing                           Picture Description and Translation
Weight                  Frequency       Weight                  Frequency
1-2 words               51 (45.1%)      1-2 words               37 (31.4%)
3-4 words               28 (24.8%)      3-4 words               35 (29.7%)
5-6 words               11 (9.7%)       5-6 words               37 (31.4%)
7-8 words               7 (6.2%)        7-8 words               6 (5.1%)
9-10 words              7 (6.2%)        9-10 words              1 (0.8%)
11-12 words             2 (1.8%)        11-12 words             1 (0.8%)
13 words or more        7 (6.2%)        13 words or more        1 (0.8%)
Total                   113             Total                   118
downgraded. Many Thai students are aware only of this distinct pragmatic function, which involves
the omission of the agent, and they do not recognize other additional functions, including the
end-weight principle, which involves the presence of the agent.
Essay Writing                           Picture Description and Translation
Theme Subject           Frequency       Theme Subject           Frequency
[...]                   217 (43.3%)     [...]                   121 (34.9%)
[...]                   103 (20.5%)     [...]                   114 (32.8%)
[...]                   80 (16%)        [...]                   68 (19.6%)
[...]                   76 (15.2%)      [...]                   41 (11.8%)
Dummy it                20 (4%)         Dummy it                2 (0.6%)
Interrogative pronoun   5 (1%)          Interrogative pronoun   1 (0.3%)
Total                   501             Total                   347
Essay Writing                           Picture Description and Translation
Sentence Type           Frequency       Sentence Type           Frequency
Complex                 235 (51.6%)     Simple                  258 (76.8%)
Simple                  134 (29.5%)     Complex                 66 (19.6%)
Compound-complex        58 (12.7%)      Compound                10 (3%)
Compound                28 (6.2%)       Compound-complex        2 (0.6%)
Total                   455             Total                   336
Essay Writing                           Picture Description and Translation
Sentence Type           Frequency       Sentence Type           Frequency
Declarative             480 (95.8%)     Declarative             345 (99.4%)
Indirect interrogative  17 (3.4%)       Indirect interrogative  2 (0.6%)
Direct interrogative    4 (0.8%)
Total                   501             Total                   347
Essay Writing                           Picture Description and Translation
Passive Type            Frequency       Passive Type            Frequency
Basic                   491 (98%)       Basic                   337 (97.1%)
Ditransitive passive    6 (1.2%)        Get passive             10 (2.9%)
Get passive             4 (0.8%)
Total                   501             Total                   347
Task/Sub-task
Picture description     121             125
Translation             162             163
Essay writing           244             248
Total                   527             536
1. Verb Forms
The students wrote existential sentences mainly in the present simple tense for essay writing
(87.1%) and the picture description and translation (93.1%). This shows that many Thai
students generalize the use of the present simple tense to talk about not only facts and habits,
but also other event types in which they do not want to clarify the time reference.
Essay Writing                           Picture Description and Translation
Verb Form               Frequency       Verb Form               Frequency
Present simple          216 (87.1%)     Present simple          268 (93.1%)
Past simple             16 (6.5%)       Past simple             10 (3.5%)
Future simple           6 (2.4%)        Present perfect         5 (1.7%)
Present perfect         2 (0.8%)        Future simple           4 (1.4%)
[...]                   2 (0.8%)        [...]                   1 (0.3%)
[...]                   2 (0.8%)
[...]                   2 (0.8%)
Past perfect            1 (0.4%)
[...]                   1 (0.4%)
Total                   248             Total                   288
2. Types of Verbs
The students overwhelmingly chose the typical verb be in essay writing (98%) and in the
picture description and translation (100%). The finding reflects that Thai students usually
produce existential sentences of the basic form. Moreover, it suggests that they consider the
form there + be an essential part of the construction; they treat this specific pattern as an
idiomatic expression whose elements always co-occur and do not allow much variation.
Essay Writing                           Picture Description and Translation
Verb                    Frequency       Verb                    Frequency
be                      243 (98%)       be                      288 (100%)
come                    2 (0.8%)
come up with            1 (0.4%)
remain                  1 (0.4%)
seem to be              1 (0.4%)
Total                   248             Total                   288
Essay Writing                           Picture Description and Translation
Displaced Subject       Frequency       Displaced Subject       Frequency
Countable noun          212 (85.5%)     Countable noun          274 (95.1%)
Abstract noun           15 (6.1%)       Uncountable noun        13 (4.5%)
Uncountable noun        13 (5.2%)       [...]                   1 (0.4%)
Indefinite pronoun      8 (3.2%)
Total                   248             Total                   288
Essay Writing                           Picture Description and Translation
Weight                  Frequency       Weight                  Frequency
1-2 words               36 (14.5%)      1-2 words               27 (9.4%)
3-4 words               34 (13.7%)      3-4 words               65 (22.5%)
5-6 words               47 (19%)        5-6 words               25 (8.7%)
7-8 words               42 (16.9%)      7-8 words               28 (9.7%)
9-10 words              27 (10.9%)      9-10 words              46 (16%)
11-12 words             17 (6.9%)       11-12 words             31 (10.8%)
13-14 words             12 (4.8%)       13-14 words             11 (3.8%)
15 words or more        33 (13.3%)      15 words or more        55 (19.1%)
Total                   248             Total                   288
Essay Writing                           Picture Description and Translation
Displaced Subject       Frequency       Displaced Subject       Frequency
[...]                   195 (78.6%)     [...]                   285 (99%)
[...]                   50 (20.2%)      [...]                   3 (1%)
[...]                   2 (0.8%)
[...]                   1 (0.4%)
Total                   248             Total                   288
Essay Writing                           Picture Description and Translation
Type                    Frequency       Type                    Frequency
Bare                    182 (73.4%)     Bare                    217 (75.3%)
Extended                66 (26.6%)      Extended                71 (24.7%)
Total                   248             Total                   288
Essay Writing                           Picture Description and Translation
Modifier                Frequency       Modifier                Frequency
Relative clause         71 (29.2%)      Relative clause         101 (35%)
Prepositional phrase    58 (23.9%)      Prepositional phrase    97 (33.6%)
Adjective               42 (17.3%)      [...]                   47 (16.3%)
Infinitive phrase       29 (11.9%)      Adjective               28 (9.7%)
[...]                   14 (5.8%)       Noun                    7 (2.4%)
Adjective phrase        9 (3.7%)        [...]                   4 (1.4%)
Noun phrase             7 (2.9%)        Adjective phrase        3 (1%)
[...]                   7 (2.9%)        Noun phrase             1 (0.3%)
Noun                    6 (2.5%)        Adverb                  1 (0.3%)
Total                   243             Total                   289
Essay Writing                           Picture Description and Translation
Extension               Frequency       Extension               Frequency
Locative                52 (72.2%)      Locative                58 (77.3%)
Temporal                19 (26.4%)      Temporal                17 (22.7%)
Comparison              1 (1.4%)
Total                   72              Total                   75
Essay Writing                           Picture Description and Translation
Sentence Type           Frequency       Sentence Type           Frequency
Complex                 126 (51.6%)     Complex                 135 (47.7%)
Simple                  70 (28.7%)      Simple                  130 (45.9%)
Compound-complex        39 (16%)        Compound                11 (3.9%)
Compound                9 (3.7%)        Compound-complex        7 (2.5%)
Total                   244             Total                   283
Essay Writing                           Picture Description and Translation
Sentence Type           Frequency       Sentence Type           Frequency
Declarative             245 (98.8%)     Declarative             288 (100%)
Indirect interrogative  3 (1.2%)
Total                   248             Total                   288
and non-basic ones. This characteristic results in the association of the constructions with
simple, basic structural patterns.
For instance, passive and existential sentences are usually of the basic type; the
passive consists of the typical auxiliary be and a past participle while the existential structure
is made up of the expletive there, the typical verb be, and the displaced subject. More
complex or non-basic forms, such as ditransitive passives and extended existential sentences,
are not frequently found among Thai learners.
Regularization and Analogy: No Variety in Form and Meaning
Regularization and analogy are reflected by many properties of the two constructions
produced by Thai learners. They involve syntax, semantics, and pragmatics; various kinds of
constructional features are regularized and generalized to become more general and consistent
on the basis of predominant cases. These characteristics result in no great variety in the use of
the constructions.
In terms of syntax, for example, passive and existential sentences do not appear in
various verb forms. In most cases, they are in the present simple tense, which is regarded as
the unmarked verb form of English. Moreover, since by is the typical marker of the passive
agent (Parrott 2000), most agent phrases produced by Thai learners are, by means of analogy,
introduced by this preposition. Likewise, since the majority of nouns are countable,
almost all existential sentences produced by Thai learners talk about the presence of this type
of noun, which functions as the displaced subject.
As to semantics and pragmatics, for example, both the theme subject of the passive
and the displaced subject of the existential follow the main tendencies of the constructional
usage. On the basis of predominant cases, the former usually appears as given and definite
and the latter as new and indefinite. Moreover, like native speakers who mainly choose the
passive when they want to focus the theme and downgrade the agent, Thai learners frequently
omit the agent phrase in their passive sentences. Likewise, the forms of displaced subjects in
Thai learners' existential sentences are quite consistent. As entities newly introduced, most
displaced subjects are structurally heavy, containing various modifiers, particularly relative
clauses and prepositional phrases, which are among the most common kinds of English noun
modifiers.
Associated with these characteristics (simplification, regularization, and analogy),
English passive and existential sentences produced by Thai learners involve only
the most distinct and fundamental properties in syntax, semantics, and pragmatics. Moreover,
their uses are more regular and consistent, not as varied as those of native speakers. In other
words, due to these universal tendencies of second language usage, Thai learners treat the
passive and existential constructions in English as idiomatic expressions or pre-fabricated
chunks which are made up of rather fixed components and do not allow much variation or
flexibility in either form or meaning.
Discussion
Based on the characteristics of the students' use of the English passive and existential
constructions, we can draw four general properties of argument structures typically produced
by Thai learners of English.
usage of clausal constructions is the need for simplicity and effectiveness in communication.
Thai learners have developed their own version of an argument structure construction, which
is simpler and more consistent than the native speaker norms. Because this version is
associated with one particular form and one particular meaning with not much variation, it
ensures mutual understanding and successful communication. Therefore, the present study
supports the precept of the ELF approach, which holds that there is a universal tendency for
L2 speakers to make some changes in the way they use English and shift their focus to
simplicity and effectiveness in communication.
an argument structure are in accordance with Ellis's (2005) principle of second language
acquisition that formulaic expressions serve as a basis for the later development of more
complicated features which require a rule-based competence.
The study has extended the scope of CxG from L1 settings to L2 phenomena. Most
studies in the CxG approach have focused on the formal properties of various constructions in
English and other languages from the perspective of native speakers' reception and
production. The results of this study have revealed differences in the constructional use
between L1 and L2 speakers, which serve to provide guidelines for teaching argument
structure constructions to English learners. Moreover, the study has broadened the scope of
ELF research, which has focused on phonological and pragmatic features of ELF interactions,
with just a little description at the lexical-grammatical level (Cogo and Dewey 2006,
Seidlhofer 2004). The results have demonstrated that ELF speakers' deviations from Standard
English at all levels (sounds, words, phrases, discourse, and also sentences) are governed by
the universal characteristics of second language usage, which reflect the underlying
motivations of ELF speakers to shape the language in the direction that results in a simple
and effective form of communication.
However, all data in the study involved only written English. In fact, the spoken form
of language is considered more natural (Stewart, Jr. and Vaillette 2001), and an analysis of
data taken from both written and spoken English should reflect more precise characteristics of
the constructions. Moreover, the subjects in the study were from only one institution; data
from various institutions should better represent Thai ELF learners. Therefore, future research
that includes both written and spoken English and participants from various institutions
should be able to describe Thai learners' use of English argument structure constructions in
more precise and specific detail.
Acknowledgements
This research project was supported by the Department of Foreign Languages, Faculty of
Humanities, Kasetsart University.
References
Bley-Vroman, R. (1988). The fundamental character of foreign language learning. In W.
Rutherford and M. Sharwood-Smith (Eds.), Grammar and second language teaching: A
book of readings (pp. 19-30). Rowley, MA: Newbury House.
results indicated significant differences between working and middle-class samples in terms of the
total number of words, content-word repetitions, impersonal pronouns, quasi-sentences, and
verb groups. Moreover, the findings of the study showed that middle-class members were
more productive and creative than persons from lower classes. Accordingly, this study can be
regarded as partial support of Bernstein's Language Codes Theory in an Iranian context.
Keywords: language codes theory, restricted code, elaborated code, working-class, middle-class
1. Background
It is often claimed that social class structure is mirrored in the language patterns produced by
speakers (Holmes 1992) and that there is a direct and reciprocal relationship between a
particular kind of social structure, in both its establishment and maintenance, and the way
people in that social structure use language (Wardhaugh 2006: 336). It is also held that
the quality of the speakers' language patterns changes according to their socio-economic
status. Therefore, the way language production interacts with social class has provided a rich
area of investigation (e.g., Allafchi 1998, Hoff-Ginsberg 1998, Richardson et al. 1976,
Walker et al. 1994).
Research in this line of study has received much interest in the Iranian context in recent
years. Drawing on the relationship between language 1 and language 2 proficiency, Hosseini
(2003) studied learners' writing characteristics in light of their socio-economic statuses in
Iran. The study revealed that learners with high and low socio-economic status performed
differently in their writing. Further, no significant relationship was identified between L1 and
L2 proficiency in terms of socio-economic statuses. Likewise, Aliakbari et al. (2012) analyzed
the relationship between social class and language patterns among a group of elementary
school students in Iran. The result of their study illustrated a significant relationship between
students' use of grammatical categories and their social class.
Bernstein (1973a) argues that the linguistic differences between social classes
give rise to two dichotomous language codes: a restricted code and an elaborated code.
The former concerns the language produced by working-class people, and the latter the
language patterns of middle-class language users, each code holding its own particular
characteristics. More specifically, it is argued that working-class people do not have
access to the elaborated code and that speakers of lower socio-economic status speak a
language that is not useful for academic or educational purposes.
The aforementioned language codes are thought to have advantages and disadvantages.
Ginsberg (2006) attributes lower academic achievement to insufficient
language skills. She contends that children of low socio-economic status usually
achieve less than middle-class students. Such a conclusion was strongly supported
by a host of studies which have paid specific attention to social class and written
composition (Richardson et al. 1976), the amount of vocabulary produced (Tizard and
Hughes 1984), and vocabulary growth (Walker et al. 1994). Bernstein (1973a) points out that
the process of schooling requires specific language patterns to which working-class
students have less access. In agreement with Bernstein, Christie (1999) writes that
middle-class children have access to the language code needed for educational purposes and are
successful at school, whereas children from lower social classes lack access to it. To
lay the groundwork for the present research, further elaboration of Bernstein's theory of
the sociology of education and his restricted and elaborated language codes seems warranted.
1.1.
Bernstein's social theory has been considered a theory of the sociology of education because it
is closely concerned with the linguistic differences across social classes and the great effects
that these differences have on educational processes. Allafchi (1998) believes that
Bernstein was influenced by scholars such as Sapir, Mead, von Humboldt, Cassirer, Firth,
Malinowski, Vygotsky and Luria. According to Sadovnik (2001), Durkheim also played a
fundamental role in the formation of Bernstein's thought, and Bernstein (1972) himself
acknowledged the great influence of Durkheim on his viewpoints. He believed that Durkheim
had a truly remarkable insight into the relationship between symbolic orders, social
relationships, and the structure of experience. Accepting Durkheim's social outlook, Bernstein
established the foundations of his social theory. Like Sadovnik (2001), Atkinson (1981)
explains that Bernstein's theory is rooted in Durkheimian thought. However, he states that
Bernstein's sociology gradually moved toward European structuralism. According to
Allafchi (1998), as a structuralist, Bernstein was highly indebted to Whorf, who believed in a
single universalistic relationship between language and worldview. Sadovnik (2001: 2) notes
that from his early study on language, communication, codes, and schooling, to his later
works on pedagogic discourse, practice and educational transmission, Bernstein produced a
theory of social and educational codes and their effect on social reproduction. The influence
of Bernstein's theory was so marked that Karabel and Halsey (1977) called Bernstein's
work in the field of sociology of education the harbinger of a new synthesis. In line
with Karabel and Halsey (1977: 62), Robertson (2008) called Bernstein a central actor in
developing a new sociology of education.
1.2.
The distinction between public and formal languages was the source for the introduction and
development of the language codes theory that stood at the core of Bernstein's social and
educational theory. Bernstein introduced and developed the language codes theory in the 1960s,
1970s and 1980s. As a pioneer, he investigated the interaction between informal languages,
power and shared meaning (Bernstein 1958, 1960, 1961). The study of the nature of informal
and formal languages led to the introduction of restricted and elaborated language codes,
on whose development Bernstein then concentrated (Bernstein 1962a, 1962b). Sadovnik (2001)
reports that Bernstein (1972, 1973a) investigated the relationships between socio-economic
status, family, and the reproduction of systems of meaning. He also differentiated between
the restricted code of the working class and the elaborated code of the middle class.
Bernstein (1973a) argues that success at school requires an elaborated code, to which
working-class children may have no access. Sadovnik (2001) considers restricted codes as
context-dependent and particularistic
and elaborated codes as context-independent and universalistic. In addition, a restricted
code closely corresponds to horizontal discourse, introduced by Bernstein as common-sense
knowledge. On the other hand, an elaborated code is intricately interwoven with vertical
discourse, a style of interrogation and text creation (Bernstein 1999: 159).
Bernstein (1972) differentiated among four socialization agencies that aid the
production of restricted and elaborated language codes: the job, the educational setting,
the peer group, and the family. He further considered the family the most important element
in the process of socialization. A number of studies reflected his views on the role of the family
(Bornstein, Haynes and Painter 1998, Dollaghan et al. 1999, Naigles and Hoff-Ginsberg
1998). In this regard, he differentiated between positional and person-oriented families
(Bernstein 1972). In positional families, typical of the working class, children's roles are
often determined by position. As a consequence, children are subordinate to their parents and
are not permitted to participate in many conversations. Such children are, therefore, not
allowed to produce individualized speech. By contrast, in person-oriented families,
typical of the middle class, children's individual capacities and interests are taken into
account. They even enjoy the privilege of discussing issues with their parents. Thus, an intense
system of communication is established.
For a better understanding of these concepts, some main characteristics of the
informal and formal languages which are respectively in line with restricted and elaborated
language codes (Bernstein 1973b: 42-43, 55) are presented in the following table.
Informal languages                               Formal languages
Short, grammatically simple, often unfinished    Accurate grammatical order and syntax
sentences with a poor syntactical
construction.
Simple and repetitive use of conjunctions (so,
then, and).
[...] means.
[...] questions.
[...] one)
[...] adverbs.
[...] isn't it?
[...] Universal
Inspired by the theoretical position reviewed above, this study aimed to
investigate the relationship between social classes and language patterns with particular
reference to Iran. The study was undertaken with the following research question in mind: Is
there any significant difference between working-class and middle-class language users in
their use of language patterns?
2.
The Study
Bernstein's theory was mainly based on speech, and less attention has been paid to
written performance. The similarities between spoken and written discourse (Akinnaso 1985),
the interplay between speech and writing (Gillam and Johnson 1992, Olson 1995, Strömqvist
et al. 2002, Tseng 2002), and the representation of speech by writing (Olson 1993)
justify further studies of the linguistic differences between working-class and middle-class
writing. On this basis, the researchers set out to investigate the
quality of writing in the compositions of working-class and middle-class language users in the
Iranian context.
Meanwhile, Bernstein's remarks on the linguistic differences between the working and
middle classes have led to a number of language productivity studies. Although references
were made to a few studies carried out in Iranian society, the nature of language across
social classes is still unclear and demands further research. Notably,
previous studies have used the overall number of words as the criterion of linguistic
productivity, with little or no focus on the grammatical categories of words. The
present study therefore compared the linguistic productivity of working-class and
middle-class subjects by investigating the language patterns they produced
in terms of various grammatical categories.
The question of whether Bernstein's theoretical framework applies in an EFL context such
as Iran motivated the present investigation. Much effort has previously gone into testing
Bernstein's model in English-speaking societies (ESL contexts), which show a clearer
distinction between the working and middle classes. In an eastern society such as Iran,
assigning elaborated and restricted codes to their respective socio-economic statuses is a
daunting task because the sociocultural background obscures the
differentiation between socio-economic classes. Thus, the study is intended to
examine how Bernstein's language codes theory functions in the Iranian context.
2.1.
Participants
A total of 100 subjects participated in the study. Working-class and middle-class members were
selected according to level of education, occupation, and two indexes of social class.
The social class indexes employed for subject sampling were the Socio-economic Status
Scores by Nam and Powers (1983) and Hollingshead's two-factor Index of Social Position
(1957), which were developed on the basis of two countrywide surveys in the US.
Working-class members were salespersons, sales assistants, and shopkeepers from among low-educated
and low-income people who had a low score (29) on Nam and Powers' Socio-economic
Status Scores (1983). The salespersons, sales assistants, and shopkeepers who participated in
the study worked in groceries, department stores, and supermarkets in Ilam, a western
city of Iran. The working-class sample comprised 9 females and 41 males whose ages ranged from 18 to 50.
Based on the aforementioned indexes, the middle-class subjects were 16 professors at Ilam
University with Ph.D. degrees and 34 Master's students from different tracks at the same
university. All the professors were male, aged between 30 and 60, while the Master's students
comprised 2 females and 32 males whose ages ranged from 24 to 30. The Master's students were
in their third semester. The university professors' scores on Nam and Powers'
Socio-economic Status Scores (1983) ranged from 70 to 99, and Master's students were considered
the main specialist group according to Hollingshead's two-factor Index of Social Position
(1957).
2.2.
To obtain a rich corpus of language data, a prompt was designed. The prompt included two
topics, life and home country, and participants were asked to write about them. The topics
were given in Persian (Farsi), the language of the participants, and the subjects were
required to write their compositions in the same language. The selected topics were general,
ideological notions that participants of any level of education could write about (an example
of the English version of the prompt is provided in Appendix A).
2.3.
Raters
Two Master's students analyzed the language data collected from working-class and
middle-class members. A number of attributes qualified them to analyze the
data. Both were native speakers of Persian who held the Persian Language and
Literature and Humanities Diploma issued by the Office of Education, which indicated that
they had attended many Persian language and literature courses at high school. They were well
acquainted with Persian grammar and structure. Both raters had also passed a
course on Persian language and literature in their B.A. with excellent marks. In addition, a
correlation coefficient of 78% between their analyses indicated inter-rater reliability.
2.4.
Administration
After subject sampling, during the following week, copies of the prompt were given to
members of both classes individually and in their workplaces. The prompts were given to
university professors in their offices, and to salespersons, sales assistants, and shopkeepers in
groceries, department stores, and supermarkets. The procedure was somewhat different for
Master's students. Since not all Master's students were classmates and they had no workplaces,
they were provided with the prompts in the dormitory, the classroom, or on campus. Although
the prompts were administered at different places, all the subjects were asked to write their
texts on the spot, without delay. This
procedure was adopted to make the situation more natural and to prevent the participants from
cheating. Although the subjects were asked to write impromptu and not to quote or copy
from any sources, some of the collected writings included inappropriate data. Therefore, those
texts which showed cases of plagiarism were excluded from the study. Illegible
and excessively long texts were left out as well. Finally, from each group, 30 completed prompts
which were appropriate to the purpose of this study were selected for the analysis.
2.5.
The raters analyzed the language data elicited from both groups and investigated the Persian
grammatical categories (GCs). The investigation of the GCs was based on Ahmadi Givi and
Anvari's (2006) model. Consultation with the full faculty members of the Persian language
department of Ilam University made it clear that Ahmadi Givi and Anvari's (2006)
classification of Persian GCs is the most up-to-date, authoritative and
comprehensive available for the Persian language. The raters analyzed the language data for their
total number of words (TNWs), content-words repetitions (CWRs), personal pronouns (PPs),
impersonal pronouns (IPs), structurally-complete-sentences (SCSs), quasi-sentences (QSs),
noun groups (NGs), adjective groups (AGs), and verb groups (VGs). First, the TNWs
produced by each class of participants were counted by the raters. Then, the frequency of
CWRs, i.e., words which had been repeated at least twice, was determined for each class of
participants. Next, all the variants of PPs, including subject, object, possessive, reflexive
and emphatic pronouns, were counted. Since Persian is a pro-drop language, the subjects of
sentences are sometimes deleted and the verb suffixes indicate the subject of the sentence.
For example, in the verb xord-am (I ate), -am refers to the first person singular I. In
pro-dropped sentences the verb suffixes were regarded as the subjects of the sentences and were
counted as PPs. The frequency of IPs, those referring to indefinite human beings, like
someone, somebody, and everybody, was determined as well. Those sentences which were
complete in their surface structure or had all the features of a complete sentence were counted
and labeled as SCSs. In contrast to SCSs, some sentences are semantically complete but do not
have all the features of a complete sentence. For example, such a sentence may lack a
verb but still present a complete idea. These structurally or syntactically incomplete sentences
were counted individually and labeled as QSs. Finally, the frequencies of NGs, AGs,
and VGs were enumerated for each class of participants. According to Ahmadi Givi and
Anvari (2006), NGs, AGs, and VGs are very broad categories which comprise many cases, but
for the sake of precision, this study was limited to only those groups of nouns, adjectives, and
verbs that were linked to each other by the Persian conjunction va (and).
To illustrate the analysis procedure, the next two paragraphs present a word-for-word
translation of two pieces of language data in which all the syntactic and grammatical elements
of the Persian original are preserved unchanged.
Life
Good life with particular meanings for each man (1)*. For some people, happiness means
having cars, house, and many properties (2). But for others, a simple house is enough for
the family to be happy (3). Many believe ordinary and common life accompanies salvation,
but luxurious life destroys comfort (4).
Home country
Home country, the place where human beings are born, grow up, and live (5)*. We
accommodate in the Muslim country named Iran (6). Iranians have a specific interest in this
treasure, because this country has achieved revolution due to the attempt of many people (7).
We lost many youths for this; therefore, we must love our home country like our essence and
spirit (8).
The italicized words are specific to English: they did not exist in the Persian text, but
were required in the English translation. The TNWs, excluding the italicized
words, was 102. The numbers within the parentheses indicate the sentences. The asterisks mark
the QSs. The data contained 8 sentences in all: 6 SCSs and 2 QSs. It was found that
the language data included 5 PPs. The words others, many, and this were the IPs in this
sample. We, our, home country, house, people, and life are the CWRs. The samples included
14 CWRs. Finally, the bold words indicate NGs, AGs, and VGs. The samples included 2
NGs, 1 AG, and 1 VG.
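The tallies for this sample can be turned into the same kind of summary measures that the study reports for each group. The sketch below is only an illustration: the function name `summarize` and the dictionary keys are hypothetical, and average sentence length is assumed to be the TNWs divided by the number of all sentences (SCSs plus QSs), which is how the Results section describes it.

```python
def summarize(tnw: int, scs: int, qs: int) -> dict:
    """Derive summary measures from raw tallies (hypothetical helper)."""
    sentences = scs + qs  # all sentences, structurally complete or not
    return {
        "sentences": sentences,
        "avg_sentence_length": tnw / sentences,
        "scs_percent": 100 * scs / sentences,
        "qs_percent": 100 * qs / sentences,
    }

# Tallies reported for the sample: 102 words, 6 SCSs, 2 QSs.
sample = summarize(tnw=102, scs=6, qs=2)
print(sample)
```

For the sample, this gives 8 sentences, an average sentence length of 12.75 words, and SCS and QS shares of 75% and 25%.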
3.
Results
3.1.
Table 2 (Appendix B) displays the frequency of the grammatical categories in the middle class.
The middle-class data included a total of 3049 TNWs, 412 CWRs, 123 PPs, 80 IPs, 164
SCSs, 55 QSs, 57 NGs, 15 AGs, and only 10 VGs. As Table 3 shows, the minimum and
maximum numbers of words produced were 13 and 193, respectively. The middle-class
members produced 101.6333 words on average (Table 3). The frequency of PPs was much
higher than that of IPs. The number of SCSs was nearly triple that of QSs. Among NGs, AGs, and
VGs, NGs were the most frequent and VGs the least frequent. Dividing the
TNWs by the number of all sentences (SCSs and QSs) indicated that the average sentence length
for the middle-class data was 14.004.
Table 3 Descriptive statistics of grammatical categories in middle-class data
GCs    N    Range    Minimum   Maximum   Sum       Mean       SD
TNWs   30   180.00   13.00     193.00    3049.00   101.6333   45.45971
CWRs   30   32.00    .00       32.00     412.00    13.7333    8.30012
PPs    30   14.00    .00       14.00     123.00    4.1000     3.65164
IPs    30   10.00    .00       10.00     80.00     2.6667     2.82029
SCSs   30   13.00    .00       13.00     164.00    5.4667     3.28773
QSs    30   12.00    .00       12.00     55.00     1.8333     2.75535
NGs    30   7.00     .00       7.00      57.00     1.9000     1.82606
AGs    30   2.00     .00       2.00      15.00     .5000      .62972
VGs    30   3.00     .00       3.00      10.00     .3333      .71116
As for the working-class, Table 4 (Appendix B) shows the frequency and distribution of
the grammatical categories in the collected data. Data presented in Table 4 show that there
were 2766 words, 525 CWRs, 131 PPs, 32 IPs, 154 SCSs, 81 QSs, 75 NGs, 15 AGs, and just
2 VGs. As shown in Table 5, the minimum and maximum numbers of words were 16 and 203
respectively. The frequency of PPs was much higher than IPs. The number of SCSs was
nearly twice that of QSs. Similar to middle-class data, among NGs, AGs, and VGs, the
highest and the lowest portions were for NGs and VGs respectively. The division of TNWs
by the number of all sentences (SCSs and QSs) showed that average sentence length for the
working-class data was 11.77. A summary of the descriptive analysis of the grammatical
categories in the working-class prompts is presented in Table 5.
Table 5 Descriptive statistics of grammatical categories in working-class data
GCs    N    Range    Minimum   Maximum   Sum       Mean       SD
TNWs   30   187.00   16.00     203.00    2766.00   92.2000    45.65644
CWRs   30   48.00    2.00      50.00     525.00    17.5000    10.80788
PPs    30   11.00    .00       11.00     131.00    4.3667     3.16754
IPs    30   6.00     .00       6.00      32.00     1.0667     1.59597
SCSs   30   10.00    .00       10.00     154.00    5.1333     3.10432
QSs    30   14.00    .00       14.00     81.00     2.7000     3.86987
NGs    30   9.00     .00       9.00      75.00     2.5000     2.46003
AGs    30   6.00     .00       6.00      15.00     .5000      1.19626
VGs    30   1.00     .00       1.00      2.00      .0667      .25371
Table 6 Percentages of GCs in proportion to the TNWs, along with percentages of SCSs
and QSs in proportion to the total number of sentences
GCs                Middle-class   Working-class
CWRs               18.980         13.433
Pronouns    PPs    4.401          4.736
            IPs    2.608          1.084
Sentences   SCSs   74.885         65.531
            QSs    25.114         34.468
NGs                1.853          2.711
AGs                0.487          0.542
VGs                3.250          0.072
After data collection, the percentages of the frequencies of GCs in each social class
were computed in proportion to the TNWs produced by the same social class. Table 6 also
shows the percentages of SCSs and QSs in each social class computed in proportion to the
total number of sentences produced by the same social class. Although for PPs,
IPs, NGs, and AGs the percentages were nearly the same for both social classes, the
percentages of CWRs, SCSs, QSs, and VGs differed between the groups. Middle-class
members produced higher percentages of CWRs, SCSs, and VGs. However, the percentage of
QSs was greater for the working-class members.
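As a rough check on the sentence rows of Table 6, the shares of SCSs and QSs can be recomputed from the counts reported above (164 SCSs and 55 QSs for the middle class; 154 SCSs and 81 QSs for the working class). The following is a minimal sketch rather than the authors' own procedure; the names `COUNTS` and `sentence_shares` are hypothetical.

```python
# Sentence counts per class, as reported in the Results section.
COUNTS = {
    "middle-class": {"SCSs": 164, "QSs": 55},
    "working-class": {"SCSs": 154, "QSs": 81},
}

def sentence_shares(counts: dict) -> dict:
    """Express each sentence type as a percentage of all sentences."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}

SHARES = {group: sentence_shares(c) for group, c in COUNTS.items()}
for group, shares in SHARES.items():
    print(group, {label: round(value, 2) for label, value in shares.items()})
```

The recomputed shares agree with the sentence rows of Table 6 to within rounding.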
3.2.
In order to see if there were any significant differences between the two groups in their
frequencies of the GCs, 9 Chi-square (χ²) tests were run. The results indicated
significant differences in five cases, and four of the differences in the frequencies of GCs
were found insignificant.
In the case of the TNWs, the middle-class language data comprised more words. There
was a significant difference (χ² = 13.584, p < .01) between the two classes. The frequency of CWRs
was another point of discrepancy between the two classes. Working-class members were more
inclined to repeat words than members of the middle class. The Chi-square result
indicated one more significant difference (χ² = 13.6213, p < .01) here. Another significant
difference (χ² = 20.571, p < .01) was found in the frequency of
IPs between the two groups, where the number of IPs produced by the middle class was nearly
triple that of the working-class data. Although the middle class exceeded the working class in the
frequencies of the TNWs and IPs, working-class members produced more QSs and the
difference in the frequency of QSs was found to be significant (χ² = 4.971, p < .05). The
difference in the frequency of VGs was significant as well (χ² = 5.333, p < .05). No significant
differences were found in the frequencies of PPs, SCSs, or NGs. Finally, since the frequency
of AGs was exactly the same for both social classes, χ² was 0 and
p was equal to 1.00. A summary of the Chi-square results with respect to the distribution of
grammatical categories is shown in Table 7.
Table 7: The results of the Chi-square (χ²) tests
GCs    χ²        Sig.
TNWs   13.584    .000**
CWRs   13.6213   .000**
PPs    .320      .572
IPs    20.571    .000**
SCSs   .314      .575
QSs    4.971     .026*
NGs    2.455     .117
AGs    .000      1.000
VGs    5.333     .021*
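Several of the statistics in Table 7 appear consistent with a one-degree-of-freedom Chi-square goodness-of-fit test on each pair of class totals, with the expected frequency taken as the mean of the two counts. That reading is an assumption, since the paper does not spell out the computation. Under it, the statistic reduces to (a - b)² / (a + b), and the p-value for one degree of freedom is erfc(√(χ²/2)). The sketch below uses only the Python standard library; the function name `chi_square_two_counts` and the dictionary names are hypothetical.

```python
import math

def chi_square_two_counts(a: int, b: int) -> tuple:
    """Goodness-of-fit test of two counts against equal expected frequencies."""
    expected = (a + b) / 2
    stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, df = 1
    return stat, p_value

# (middle-class, working-class) totals reported in the Results section.
TOTALS = {
    "IPs": (80, 32),
    "SCSs": (164, 154),
    "QSs": (55, 81),
    "NGs": (57, 75),
    "AGs": (15, 15),
    "VGs": (10, 2),
}
RESULTS = {gc: chi_square_two_counts(*pair) for gc, pair in TOTALS.items()}
for gc, (stat, p) in RESULTS.items():
    print(f"{gc}: chi2 = {stat:.3f}, p = {p:.3f}")
```

Run on these totals, the sketch reproduces the values listed for IPs, SCSs, QSs, NGs, AGs, and VGs in Table 7.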
4.
Discussion
The present study attempted to compare working-class and middle-class language users in an
Iranian context with respect to the frequency of certain GCs in their compositions. As
reported in the previous sections, several differences emerged from the data analysis. In
terms of the frequency of the TNWs, a significant difference was found between the groups:
middle-class members produced a greater number of words. This means
that middle-class members were more productive and creative than subjects from the lower
social class. Although the participants were asked to write about the same topics, the
professors and Master's students were more productive. The difference in the productivity level
of the two groups suggests two possible explanations. First, with respect to the relationship
between language and thought, professors and Master's students might have read more books and
be better prepared to discuss abstract concepts such as life. In other words, they are more
thoughtful and have more ideas to express. The second explanation, which is more in line with
Bernstein's language code theory, is that the higher linguistic creativity of the middle-class
members may have nothing to do with one's thought but with a developed language pattern
which has the potential to handle any abstract topic. The topics selected for participants to
write about were so general and ideological that people with different levels of education
could write about them. Therefore, the first explanation, that educated people can discuss more
because they are more thoughtful and have more opinions to offer, cannot be taken seriously.
On the other hand, the production of more words can be explained in terms of a more developed
language pattern which provides language speakers with more words to use in language
production.
While middle-class members were more productive in their writing, working-class
members were more repetitive in their vocabulary. That is, middle-class members
expressed themselves using a variety of words, but the self-expression of
working-class members was more bound to a range of repeated words that were more or less
synonymous with the topics they were to write about. It can, thus, be claimed that
word repetition by working-class members is due to their lack of access to a
sufficient stock of terms in their language code to express themselves easily. By contrast,
middle-class members seem to have access to a more lexically developed language which
allows language producers to express the same intentions with different words.
As for the IPs, there was a significant difference between the two groups of participants
in that middle-class members used more IPs. In contrast with PPs, which are placeholders for
proper or common nouns with real referents in the world, IPs refer to no definite persons in the
real world and are used to express facts or opinions anonymously. In general, IPs are devices
used to express ideas context-independently. The higher number of IPs in the middle-class
data means that their language production is less context- or situation-bound. It can,
therefore, be claimed that middle-class members express ideas as generalizations. In other
words, they tend to generalize their beliefs so that they are acceptable in different
situations. Stated otherwise, the middle-class language pattern can be regarded as a general,
or in Bernstein's terms, universal language code.
The difference in the subjects' language production was also noteworthy for QSs. It was found
that QSs were more common among members of the lower social class. As noted by Ahmadi
Givi and Anvari (2006), QSs are shorter and more concise sentences because they lack some
elements of SCSs. The data analysis indicated that such sentences are typical of
working-class members. This finding supports Bernstein's idea that the restricted code is full of
short sentences that are grammatically or semantically incomplete.
In the case of VGs, a significant difference was found between the two social classes as
well. Middle-class members showed a stronger preference for VGs, which are examples of language
elaboration devices. They are used to express meaning more explicitly and in detail. In the
current research only those groups of verbs that were linked together by the Persian
conjunction va (and) were included. The verbs that follow the previous verb after a
conjunction further explain the meaning of the previous verb. In such groups of
verbs, neighboring verbs influence each other semantically. The more verbs that accompany
each other, the more comprehensive and exact the meaning expressed. It can be claimed that
this language pattern, which is typical of middle-class members, is semantically precise. Such
precision is gained through a chain of linguistic elements that express ideas explicitly. This
observation supports Bernstein's theory in that the elaborated language code is more explicit and
semantically precise and expresses meaning fully through linguistic structures.
PPs, as the counterparts of IPs, were another point of investigation in the study. PPs
replace proper and common nouns in the real context and are indicators of a
context-dependent language code. The more members use PPs in their speaking or writing, the more
context-dependent and specific their language will be. Although for Bernstein (1973a) the
restricted language code is context-dependent and full of PPs, in the present study no significant
difference was found between the frequencies of PPs in the performance of the participants in the
two classes. PPs and IPs stand as dichotomous concepts, each typical of one of the
codes developed by Bernstein. In this study, the higher frequency of IPs among middle-class
members was confirmed, but the frequency of PPs was nearly the same for both classes, which
did not support Bernstein's claim. The percentages of PPs in proportion to the TNWs
produced by each class also exhibited no difference between the two classes.
Since SCSs follow the standard logic of grammaticality, such a sentence
has all the grammatical elements and is hence longer and more logical. Data analysis indicated no
significant difference between the frequencies of SCSs. However, the percentages of SCSs in
proportion to the total number of sentences produced by each class indicated a large difference
between the two social classes. Middle-class members produced a higher percentage of
SCSs, in contrast with the working-class members, who produced a higher percentage
of QSs.
As indicated by Ahmadi Givi and Anvari (2006), just like VGs, AGs and NGs are
appropriate tools for producing a more elaborated language code. It was found that middle-class
members used VGs more than working-class members, but no significant
difference between the frequencies of AGs and NGs was found. In other words, in the case of
AGs and NGs, Bernstein's theory was not supported either.
5.
Conclusion
Examining the distribution and significance of language users' linguistic patterns within distinct
social classes, the study was an attempt to highlight the interplay between language
production and socio-economic classes. Elaboration of this interaction can provide a better
view of the applicability of linguistic categories within social frameworks. Although the
investigation of differences in the frequency of GCs in the language data collected from both
groups did not yield a clear-cut result, Bernstein's remark on the linguistic differences between
language users from various social classes was supported to some extent. Middle-class
members were found to be more productive and creative than persons from the lower class.
Access to a sufficient range of vocabulary differed across the groups:
working-class members had limited access to terms with which to express themselves easily. Since
middle-class members used many more IPs, it was concluded that their language code is less
situation- or context-specific. In other words, the middle-class language code is a general or
universal pattern which is easily generalized to different occasions. In addition, it was
found that working-class members usually express their meanings using shorter sentences.
Finally, although the distribution of AGs and NGs was the same across the two classes of Iranian
native speakers, the middle class's preference for the production of more VGs indicates that
their language is more elaborated and explicit. All in all, the data collected in the given
Iranian context support Bernstein's language code theory to a certain extent.
6.
Research Implications
The findings of the study can have some implications for language studies, sociolinguistics,
schooling and education in Iran and similar contexts. First, they can contribute to the field of
discourse studies. Since a central emphasis of Bernstein's theory is the impact that context
has on the production of linguistic structures, discourse analysts can take advantage of
this study of the production of linguistic structures. This study can support Bernstein's
differentiation between horizontal and vertical discourses, which can also be a good framework
for discourse analysts. Second, although sociologists and sociolinguists usually
consider factors like occupation and education as indicators of social class, the present study
advocates linguistic structure as a new indicator for that purpose. The difference in the
language structures produced by people from different social classes justifies sociolinguistic
perspectives on the use of language patterns as a device for determining social
class. Third, even though the present study was conducted among adult participants, its
findings can be beneficial to language teachers in alerting them to the fact that students
from families of different social classes do not have identical access to language knowledge in
schooling even though they have completed similar levels of education. Being aware of the
socio-economic status of students and their differing access to language use, teachers can
minimize the language loss of working-class students by holding classes attended by
students with different socio-economic backgrounds. Such heterogeneity might provide
References
Ahmadi Givi, H. and H. Anvari. (2006). Persian syntax (3rd ed.). Iran, Tehran: Fatemi
Publication.
Akinnaso, F. N. (1985). On the similarities between spoken and written language. Language
and Speech, 28(4), 323-359.
Aliakbari, M., M. Samaie, K. Sayehmiri and M. Qaracholloo. (2012). The grammatical
correlates of social class factors: The case of Iranian fifth-graders. Linguistik online,
56(6), 3-20.
Allafchi, J. (1998). The relationship between social class and speech codes with respect to
syntactic complexity. Unpublished Master's dissertation. Shiraz University, Iran.
Atkinson, P. (1981). Bernstein's structuralism. Educational Analysis, 3(1), 85-96.
Bernstein, B. (1958). Some sociological determinants of perception: An enquiry into sub-cultural differences. British Journal of Sociology, 9(10), 159-174.
Bernstein, B. (1960). Language and social class: A research note. British Journal of
Sociology, 11(3), 271-276.
Bernstein, B. (1961). Social structure, language and learning. Educational Research, 3(3),
163-176.
Bernstein, B. (1962a). Linguistic codes, hesitation phenomena and intelligence. Language
and Speech, 5(1), 31-46.
Bernstein, B. (1962b). Social class, linguistic codes and grammatical elements. Language and
Speech, 5(4), 221-240.
Bernstein, B. (1972). A sociolinguistic approach to socialization with some reference to
educability. In J. J. Gumperz and D. Hymes (Eds), Directions in sociolinguistics: The
ethnography of communication. New York: Holt, Rinehart and Winston.
Bernstein, B. (1973a). Class, codes and control, Vol 1. London: Routledge and Kegan Paul.
Bernstein, B. (1973b). Class, codes and control, Vol 2. London: Routledge and Kegan Paul.
Bernstein, B. (1999). Vertical and horizontal discourse: An essay. British Journal of
Sociology of Education, 20(2), 157-173.
Bornstein, M. H., M. O. Haynes, and K. M. Painter. (1998). Sources of child vocabulary
competence: A multivariate model. Journal of Child Language, 25, 367-393.
Christie, F. (1999). Pedagogy and the shaping of consciousness: Linguistic and social
processes. London: Continuum.
Dollaghan, C. A., T. F. Campbell, J. L. Paradise, H. M. Feldman, J. E. Janosky, D. N. Pitcairn
and M. Kurs-Lasky. (1999). Maternal education and measures of early speech and
language. Journal of Speech, Language and Hearing Research, 42, 1432-1443.
Gillam, R. B. and J. R. Johnston. (1992). Spoken and written language relationships in
language/learning-impaired and normally achieving school-age children. Journal of
Speech and Hearing Research, 35, 1303-1315.
Ginsborg, J. (2006). The effects of socio-economic status on children's language acquisition
and use. In J. Clegg and J. Ginsborg (Eds.), Language and social disadvantage:
Theory into practice (pp. 9-27). Chichester: John Wiley and Sons.
Hoff-Ginsberg, E. (1998). The relation of birth order and SES to children's language
experience and language development. Applied Psycholinguistics, 19, 603-629.
Hollingshead, A. B. (1957). Two factor index of social position. New Haven, CT: Privately
printed.
Holmes, J. (1992). An introduction to sociolinguistics. London: Longman.
Hosseini, A. (1993). The relationship between L1 academic proficiency and foreign language
learning with respect to socio-economic background of learners. Unpublished
Master's dissertation. University for Teacher Education, Tehran, Iran.
Karabel, J. and A. H. Halsey. (1977). Power and ideology in education. New York: Oxford
University Press.
Naigles, L. R. and E. Hoff-Ginsberg. (1998). Why are some verbs learned before other verbs?
Effects of input frequency and structure on children's early verb use. Journal of Child
Language, 25, 95-120.
Nam, C. B. and M. G. Powers. (1983). The socioeconomic approach to status measurement.
Houston: Cap and Gown.
Olson, D. R. (1993). How writing represents speech. Language and Communication, 13(1), 1-17.
Olson, D. R. (1995). Towards a psychology of literacy: On the relations between speech and
writing. Cognition, 60, 83-104.
Richardson, K., M. Calnan, J. Essen and L. Lambert. (1976). The linguistic maturity of 11-year olds: Some analysis of the written compositions of children in the national child
development study. Journal of Child Language, 3, 99-115.
Robertson, I. (2008). An introduction to Basil Bernstein's sociological theory of pedagogy.
Retrieved from http://sites.google.com/site/robboian/IntroBernstein.pdf?attredirects=0
Sadovnik, A. R. (2001). Basil Bernstein. Prospects: The Quarterly Review of Comparative
Education, 31(4), 687-703.
Strömqvist, S., V. Johansson, S. Kriz, H. Ragnarsdóttir, R. Aisenman and D. Ravid. (2002).
Toward a cross-linguistic comparison of lexical quanta in speech and writing. Written
Language and Literacy, 5(1), 45-67.
Tizard, B. and M. Hughes. (1984). Young children learning: Talking and thinking at home
and at school. London: Fontana.
Tseng, M. Y. (2002). On the interplay between speech and writing: Where Wordsworth and
Zen discourse meet. Journal of Literary Semantics, 31(2), 171-198.
Walker, D., C. Greenwood, B. Hart and J. Carta. (1994). Prediction of school outcomes based
on early language production and socioeconomic factors. Child Development, 65, 606-621.
Wardhaugh, R. (2006). An introduction to sociolinguistics (5th ed.). Oxford: Oxford
University Press.
Appendix A
The present prompt has been developed for research purposes. Appreciating your favor, please
help us carrying out the research
It should be mentioned that, since no personal information of respondent's identity is requested,
all the opinions presented in the prompt will remain confidential and will be used only for
research purposes.
Write what you like about the following topics.
Life
Home country
Many thanks
Appendix B
Table 2: Frequency of the grammatical categories in the middle-class group
Middle TNWs
CWRs
PPs
IPs
SCSs
QSs
NGs
AGs
VGs
Class
1
96
14
116
13
13
65
113
24
83
22
103
21
193
32
11
101
128
17
10
10
93
10
11
161
21
12
93
13
13
13
14
158
19
15
185
21
16
93
10
17
126
12
14
18
189
22
19
57
13
10
20
39
21
75
30
12
22
83
23
70
24
38
25
110
12
26
60
27
73
28
73
14
29
160
20
13
30
102
14
Total 3049
412
123
80
164
55
57
15
10
TNWs CWRs
PPs
IPs
SCSs
QSs
NGs
AGs
VGs
122
12
32
68
18
37
71
21
31
13
59
16
97
24
71
13
10
42
11
16
12
51
11
13
117
30
14
87
28
10
15
106
11
16
95
34
14
17
100
15
18
82
25
19
37
11
20
54
16
21
141
16
22
107
11
11
23
166
15
10
24
125
25
203
50
26
165
18
27
143
38
13
28
101
11
29
113
15
30
127
29
Total
2766
525
131
32
154
81
75
15
1. Introduction
Code-switching has been extensively studied in the past few decades in terms of its patterns
and meanings in oral production (e.g., Adendorff 1996, Cheng and Butler 1991, Gumperz
1982, Hoffman 1991, Lu 1991, Myers-Scotton 1989). It has been found to be a discursive
convention which can index contextual and metalinguistic information that is conveyed by
other means (e.g. prosody) in monolingual settings. This is particularly relevant to online
communications which, in addition to being social and context dependent, are structurally
simpler to meet specific interactive purposes and overcome their lack of a conventional form of
presence (Bays 1998, Crystal 2001). However, little is known about how interactional
frameworks are built in fluid virtual communities populated by English learners, especially
strangers whose identities and presence are primarily maintained by their verbal practices.
Using an interactional perspective, this study analyzed interactions of English learners in
China in an online chat room to uncover how code-switching helps speakers manage social
distance and facework as well as how this affects the addressee's choice of code. It also aims
to contribute to the research on second language learning by identifying some gaps in
learners' interactional competence in English through examining where they switch to their
native language.
1.1.
Over the past few decades, code-switching, which has been described as two languages
juxtaposed, or alternated in discourse, typically within a single conversation, or within a
sentence or utterance (Auer 1998, Liebscher and Dailey-O'Cain 2005), has been dealt with by
numerous scholars. In a prototypical case, code-switching occurs in a sociolinguistic context
in which speakers orient towards a preference for one language at a time (Auer 1998). As an
integral aspect of conversational analysis, it is one of the contextualization conventions which
are acquired through interactions where people participate in a particular network of
relationships (Gumperz 1982).
Code-switching is intention-driven and functionally motivated (Adendorff 1996,
Hoffman 1991, Myers-Scotton 1989). For example, Saville-Troike (1982) identified eight
different functions such as softening or strengthening a request or command, humorous
effect, or lexical need. Gardner-Chloros (1991) argues that code-switching may occur as an
effect of the topic or the roles of the participants. Auer (1998, 2007) asserts that as an index
of certain extralinguistic social categories it can be interpreted by participants as indicating
either some aspects of the situation (discourse-related switching), or some features of the
code-switching speaker (participant-related switching).
In discussing how a code signifies a network of interpersonal relationships, McConvell
(1988) believes that we should consider the standpoint and attitude speakers wish to express
and of the social domain where they wish to relate to the interlocutor or the referent. Tay
(1989) argues that it can contribute to solidarity and rapport in multilingual discourses. Code-switching has been associated with footing, which is defined as the speaker's "alignment, or
set, or stance, or posture, or projected self" (Goffman 1981: 128) and the projection of a
speaker's stance towards an utterance (its truth value and emotional content), as well as
towards other parties and events (Levinson 1988, as cited in Wine 2008: 2). Through its
departure from the established language-of-interaction, code-switching signals otherness of
the upcoming contextual frame and thereby achieves a change of footing (Auer 1998). In
other words, it can affect conversational status and social distance among interlocutors during
the production and reception of an utterance. As a form of footing shift, code-switches can be
temporary suspensions of social relations that are later resumed, or they can change the nature of whole
activities (Levinson 1992).
Existing studies have been primarily oriented towards the way speakers alternate
languages and how this indexes speakers' purposes and the communication situation. There
are several exceptions which look into how code-switching relates to the interaction between
interlocutors. For example, in a pioneering study of language alternation in Italian-German
peer talk and adult-child conversations, Auer (1984) analyzed both speaker adjustments and
participation framework phenomena in relation to code-switching and demonstrated how
code-switching may be used to attain a shift in the recipient constellation. Cromdal and
Aronsson (2000) examined in depth speakers' mutual adjustment of actions and reception of
code-switches, revealing that footings are intrinsically interactional achievements. Su (2009)
suggests that code-switching can negotiate interpersonal relationships in a face-threatening
situation on the interactional level in conversational interaction, and can make it easier for the
addressee to identify changes in frames, alignment and footing and react accordingly. The
interactional perspective has informed the research on how code-switching affects the
participation framework. Nevertheless, bilingual conversations have rarely been approached
explicitly from the perspective of whether and how specific code shifts can affect others'
choice of code.
1.2.
Besides revealing the interactive mechanism, analyzing the way codes switch has been
considered relevant to language learning. In the unfolding of meaning, switches can be
indicative of different stages in learners' learning and use of the target language. In
particular, code alternation can fulfill a wide range of functions in cognitive, linguistic,
interactional as well as discourse terms in the L2 setting (e.g., Van Lier 1996, Simon 2001).
Code-switching has been traditionally seen as an asset in communication. As pointed out by
Goffman (1981: 156), switching codes requires the capacity of a dexterous speaker to jump
back and forth, keeping different circles in play. Heller (1988) sees it as a constructive verbal
strategy used in social interaction which facilitates the effort of interlocutors to seek common
ground in bilingual conversation. Cheng and Butler (1989) contend that it can be seen as an
asset when it is employed to promote the content and the essence of the message. In two more
recent studies (Liebscher 2005, Olmstead 2004), code-switching has been shown to be a
useful conversational resource that enhances sociability by building shared understanding
about the ongoing interaction and indicating participants' orientation toward the interaction and
toward each other. Some other scholars have related code-switching to language
deficiency. For example, Auer's (1984) non-classroom data show that code-switching could
be an indication of a momentary lack of competence. Cheng and Butler (1989) are also
concerned that it can be a deficit when used to the extent that it interferes with
communication. Sert (2005) also reminds us that code-switching may interfere with mutual
intelligibility when learners interact with native speakers of the target language and cause
long-term damage to the foreign language learning process. Whether code-switching plays a
positive or negative role depends largely on the addressee and the specific goal of interaction.
However, from the perspective of second language learning, studying the way learners switch
between languages could reveal non-native learners' communicative capacity, under the
assumption that the function performed by the use of the native language in target-language-based conversation may indicate a gap in capability and a lack of comfort in the target language
relative to the native language.
1.3.
encounters contributes to the fluidity or changeability that other aspects of lives do not have
(Healy 1997, Wilbur 1997). Not surprisingly, the Internet has become a site where virtual
communities of social and cultural interest groups are organized and new modes of
communication are formed.
The Internet chat group is a typical example of such virtual communities, which are
defined by Rheingold (1993: 7) as "social aggregations that emerge from the Net when
enough people carry on those public discussions long enough, with sufficient human feeling,
to form webs of personal relationships in cyberspace". Previous studies (e.g., Friermuth 2001,
Hall 1996, Lam 2004, Tepper 1997) have dealt with online chatting as a distinct form of
communication in the make-believe world. For instance, Bays (1998) asserts that the
combination of textuality and temporality contributes to a conversational mode of the
environment which allows for an enlarged possibility for identity experimentation and fictive
exaggeration of discursive action. Crystal (2001) also points out that in synchronous
communication in computer-mediated contexts, the form of talk has been traditionally seen as
social rather than serious in its content in that it is more context dependent and structurally
simpler to serve specific interactive purposes.
A major distinction has been made between online and real world communication
concerning the form of presence. The subtleties in conventional conversations typically
conveyed by physical qualities such as vocal intonation, stress and gesture become
problematic in the chat room where the encounter is typically not face to face. However, as
proposed by Bays (1998), the need for the underlying sense of presence can be fulfilled by the
physical setting of the computer and the scrolling dialogue, which indicates that there is some
unseen user out there typing and sending responses to their messages, as well as some
discursive strategies, such as addressivity, which allow the users to engage personally in the
electronic setting. According to Bays (1998), participants readjust their contributions for a
valid and desired exchange by recreating presence as the cognitive foundation of conversation
where parallels to ordinary conversation can be found through discursive conventions.
Code-switching has been found to be one such discursive convention, which can
index contextual and metalinguistic information that is conveyed in other ways, e.g.,
prosody, in monolingual settings (Gumperz 1982). Comparable to prosodic parameters and as
a contextualization strategy, it helps create situational co-presence in a pseudo-physical
environment (Auer 1988, Nilep 2006). It has been found to work as a feasible strategy
sustaining viable social encounters. For example, Bays (1998) asserts that alternative
language choice is used as a strategy to achieve and handle disagreement in the Internet chat
room. In Lam's (2004) investigation of two Chinese immigrant high school girls in the US,
the examination of their code-switching practices revealed that the girls' participation in the
chat room should be understood in relation to their experiences in the national context of the
US and demonstrated how alternative identities are sought in the virtual world. Ho (2006)
looked into the bilingual practices of tertiary students in Hong Kong when using ICQ, an
instant messaging computer program. She found that English and Chinese were
complementary to each other in helping participants handle the pressure of instant
communication. Cárdenas-Claros and Isharyanti's (2009) study of some of their MSN
messenger (another online social networking site) contacts and Goldbarg's (2009) analysis of
survey results from her personal contacts suggest that online chatters prefer their first
language for conveying more personal content and feelings. These
studies have illuminated how people's choice of code may relate to social realities in
virtual communities and have established the relevance of code-switching to learners' verbal
behaviors in the online context. Nevertheless, the majority of existing studies focused on
people who already knew each other. Relatively little is known about code-switching between
total strangers in an online environment, where the validity and durability of their identities
rely almost exclusively on their presence and behaviors in the virtual context.
1.4. This Study
Although English has been widely accepted as an indispensable tool for achieving academic
and career advancement in China, learners generally do not have much exposure to the
English-speaking environment other than classroom settings. In such a context where English
is rarely used for daily communication, the Internet chat room represents a unique locale of
interactions; it has been regarded by many English learners as a useful and handy site to
practice their English, especially spoken English. It is comparable to the so-called "English
corner" outside EFL classrooms, where spoken English is practiced in the physical world and
where the conversation is typically the first encounter between interactors who do not know
each other.
However, not much information is available as to how this group of learners interacts
synchronously in a Net-based environment where co-presence is maintained primarily
through one's literary practice. By studying the code-switching practice and its functions in
the English chat room from an interactional perspective, it is hoped that we can understand
how social relations and interactional meanings are co-constructed through particular forms
of discursive practices. Meanwhile, through analyzing the sequential position in which a
code-switch occurs and how the code choice of one interlocutor affects that of the other, we
can catch a glimpse of the dynamics at play which prompt code-switches and affect the
reception and code choice by the addressee. Finally, it is presumed that code shifts in contexts
where co-presence is maintained primarily through verbal practices not accompanied by
prosody or body language could indicate learners' (lack of) target language competence to
meet various interactional needs.
2. Methods
2.1. Data Collection
The chat room under study is called English WW, a component of www.bliao.com which is
the largest chatting website in China composed of various freely accessible chat rooms
catering for people with different interests. This particular room is one of the twelve English
chat rooms on this website intended for people to practice conversational skills in English; in
other words, the chatters are typically English learners. Chatters use nicknames they make up
for this chat room, which enables them to remain anonymous regarding their real life
identities. As observed from the exchanges, interactors are from a broad range of
backgrounds, including college students, white-collar workers, teachers, etc.
This chat room was selected because it was the most populated English chat room of
the website, with an average of 50-60 participants at a time, which provided a rich resource
for linguistic research. Also importantly, the settings of this chat room enabled the researcher
to easily copy the ongoing conversation.
The researcher logged into the chat room and observed it for about two hours on
twelve consecutive days. The conversations in progress were copied and saved for subsequent
analysis, resulting in approximately 18 hours of verbal exchanges. Due to the voluntariness,
anonymity, irregularity and fluidity of online communities, it was impossible to obtain
demographic information from the participants or keep track of them once they quit the chat
room. Each line is prefaced by the names of both the speaker and the addressee, making it
possible for the investigator to piece them together and obtain individual conversations from
the synchronous multiple conversations which mingled together on the screen.
Then instances of code-switching were identified and examined to find under what
circumstances participants shifted codes, and whether and how the code-switching affected
social relations with the interlocutor and the code choice of the interlocutor.
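The regrouping step described above, in which interleaved chat lines are separated into individual conversations using the speaker and addressee names that preface each line, can be illustrated with a minimal Python sketch. This is not the procedure actually used in the study, which is not specified beyond the description above. The line layout "speaker to addressee: message", the function name split_conversations and the sample nicknames and messages are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical line layout: "<speaker> to <addressee>: <message>".
# The real layout of the chat-room transcript is not documented in the paper.
LINE_PATTERN = re.compile(
    r"^(?P<speaker>\S+) to (?P<addressee>\S+): (?P<message>.*)$"
)


def split_conversations(lines):
    """Group interleaved chat lines into one thread per pair of chatters.

    Each thread is keyed by the unordered pair of nicknames, so that
    "user_a to user_b" and "user_b to user_a" fall into the same thread.
    Line order within a thread follows the order in the transcript.
    """
    conversations = defaultdict(list)
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip system notices or lines in another layout
        pair = frozenset((match["speaker"], match["addressee"]))
        conversations[pair].append((match["speaker"], match["message"]))
    return dict(conversations)


# Invented sample lines, for illustration only (not data from the study).
sample_log = [
    "user_a to user_b: hi",
    "user_c to user_d: where are you from",
    "user_b to user_a: hello",
    "user_d to user_c: beijing",
]

threads = split_conversations(sample_log)
```

Keying each thread by an unordered pair keeps both directions of an exchange together, which matches the need to follow how one interlocutor's code choice affects the other's.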
2.2.
English proficiency varied greatly from chatter to chatter; some people demonstrated
noticeable and frequent grammatical errors in English. However, the major focus of the
present study is not English proficiency variation, but the way people shifted between English
and Chinese. It was noted through observational data that this English-based chat room had
been turned into a peculiar English-based bilingual community through the use of a mixed-code variety of language among the interactors, consisting of English and pinyin, i.e.,
romanized Chinese. Pinyin is a system of romanization for Standard Mandarin. It was
adopted in 1979 by China as the method of phonetic instruction in mainland China and
established by the International Organization for Standardization (ISO) as the standard
romanization for modern Chinese. Pinyin uses Roman letters not to represent the shapes of
Chinese characters, but to spell the sounds of Standard Mandarin (Swofford 2006). It has also
become a convenient tool for entering Chinese language text on computers. Pinyin was found
to be a code preferred to Chinese characters in the chat room under study, which could be
primarily a result of the participants avoiding the trouble of having to convert between
English and Chinese characters, or partly due to the consideration that in an English context,
Chinese characters would appear somewhat abrupt. Therefore, the following section will
focus on the switch between English and pinyin within utterances and across utterance
boundaries. Among the great number of such switches, a large part of which took place in
brief phatic verbal exchanges, three excerpts were selected for detailed qualitative analyses
because they provided relatively complete communicative settings which made it possible to
carry out more meaningful, objective, and rational interpretation and discussion from the
interactional perspective.
3. Discussion
It was found that in many cases, chatters used various combinations of English and pinyin,
which seemed to have worked well with this chat group. A noticeable aspect of the
phenomenon of code-switching was the attachment of Chinese particles to the end of English
utterances. Although pragmatic particles do not contribute significantly to the propositional
content, they affect the utterance as a whole in that they provide contextual coordinates for
the proper interpretation of the speaker's utterances in ongoing discourse (Östman 1982). In
traditional Chinese grammar, sentence-final particles are referred to as yǔqì cí ('mood words'),
which suggests that their function is primarily to relate in various ways the hosting utterance
to the conversational context and to indicate how this utterance is to be interpreted by the
hearer (Li and Thompson 1981). Although these particles are optional as far as
grammaticality judgments are concerned, they are pragmatically informative and express the
speaker's attitude or emotional state in the communication interchange. As pointed out by
Chao (1968), they are important devices in Chinese that fulfill many of the functions of
intonation in other languages, such as English, which is especially meaningful in online chat
rooms where there is a lack of prosodic features.
In the collected corpus, many conversation participants adopted Chinese sentence-final particles to cue the modality of their utterances and their orientation to the addressee.
This is particularly interesting because there is no one-to-one correspondence between pinyin
and Chinese characters, not to mention that pinyin mixed in English utterances was not
marked with tones, an important feature of Chinese pronunciation. In the following excerpt,
particles constitute all of the code-switches from English to pinyin. Tim and Vicki
(pseudonyms) are talking about Vicki's relationship with her boyfriend. Vicki is not very
happy with her boyfriend and Tim is trying to help her out by offering suggestions.
Excerpt 1
1
Tim:
Vicki: how?
Tim:
Tim:
Tim:
10
Vicki: iiiiiiiiiii)
11
Tim:
12
13
Tim:
14
Vicki: he said
15
16
Vicki: feel
17
18
Tim:
yes?
he loves you?
which stage does a girl feel the love to her from bf most?
19
20
Tim:
21
Vicki: at my stage ba
22
Tim:
23
Vicki: I wish I have bf, we leave in different city but not far from
24
25
Tim:
26
I see
hehe
Tim and Vicki have been conversing solely in English for 16 minutes. Then in reply to
Vicki's question in line 3, instead of using a question mark, Tim code-switches to pinyin ne, a
rough equivalent to "how about", which has the function of converting a statement into a
question in a context that is already known (Chu 1998). The tentativeness achieved by suffixing
such a force-reducing particle saves Vicki's negative face and indicates Tim's awareness of
the potential risk of being perceived as impolite and intrusive in advising people, especially
strangers, on their personal affairs. This is Tim's adaptation to this peculiar virtual
environment which lacks nonverbal subtleties that can otherwise be conveyed by body
language or voice features. Tim's change of code to signal his pragmatic intent triggers
Vicki's incorporation of pinyin a in line 6, which, as a sentence-final particle similar in
pragmatic function to ne, reduces the assertiveness of the message conveyed by the sentence
(Li and Thompson 1981). This suggests Vicki's attempt to mitigate the tone of her negative
reply and is a sign that she is aware of Tim's insertion and the potential threat to Tim's
positive face. The same particle is also invoked by Tim in line 9 as a face-saving tone
softener for his disagreement with Vicki's comparison between love and game playing.
Vicki's second code-switch to ba in line 21 implies her desire to "solicit the approval or
agreement of the hearer with respect to the information conveyed by the sentence" (Li and
Thompson 1981: 307); its semantic function resembles that of the questions "don't you think so?"
or "wouldn't you agree?" in English. This also seems to contribute partly to Tim's code
change in line 22 which combines the English utterance with the Chinese mitigator and
question marker ne.
In brief, Tim initiates the use of Chinese sentence-final particles, which results in a
similar choice of code on the part of Vicki, who, after a little while, also resorts to these
particles, which in turn affects Tim's language use. This extract shows that switching to
romanized particles is used to adjust and negotiate interlocutors' involvement in this virtual
environment and to affect the mutual interpretation and participation in the ongoing dialogue.
Such careful tagging reduces assertiveness of the otherwise monolingual English utterances
and indicates that the accommodation of the interlocutors is probably out of face-maintaining
considerations when making suggestions, showing disagreement or indicating tentativeness.
This mixture of codes facilitates the building of rapport and intimacy between the speakers
involved.
It was also found that the insertion of pinyin minimized the chance of communication
breakdowns by softening the atmosphere that would be otherwise tense. In the following
excerpt, Tina and Jason ask for each other's means of contact, i.e., Tina's number on QQ,
another chat program, and Jason's email address. However, somehow, neither of them
succeeds.
Excerpt 2 Part 1
27
28
Jason: yeah
29
30
Jason: got it
31
32
Tina: huh
33
Jason: hi
34
35
Tina: no ya
36
37
Tina: why ??
38
Tina's attachment of ya to her reply to Jason's request for her QQ number in line 35 is a tone
softener, counteracting the forcefulness of her negative reply that is potentially face-threatening and offensive because it is likely to be interpreted as a refusal. Jason immediately
returns the rejection in line 36 in an unmarked manner to save his own face by claiming that
he does not have an email, which obviously takes Tina by surprise and threatens her positive
face, as shown by line 37. At this point, the atmosphere is getting tenser and the conversation
seems to have reached a deadlock. Then Jason's unexpected and thorough change of code in
line 38 suggests his realization of the possible embarrassment caused by his bluntness in line
36; it may be a repair of line 36 based on Tina's use of the mixed code in line 35. From
exclusively English to exclusively pinyin, this drastic code change indicates Jason's timely
adjustment to the changing context. Their conversation continues.
Excerpt 2 Part 2
39
40
41
Tina: no qq here
42
Jason: oh
43
Jason:
44
Tina: soooooooooooooooooo
45
46
Jason: what?
47
48
49
Jason: hehe
50
51
52
53
so pitiful
Jason is not annoyed by Tina's disbelief about him not having an email. Instead, in line 40, he
offers to send his picture to Tina through QQ, which is what Tina has just claimed she does not
have, only to be rejected indirectly by Tina in line 41 on the grounds that it is an even
bargain in line 45. Jason's "what?" in line 46 shows that he is either surprised or does not
understand what Tina says, which provides a need and opportunity for Tina to repair the
satirical tone and provocativeness of line 45 through the attachment of ya to lines 47 and 50.
This modification makes the tone lighter and more playful, and thereby reduces the tension.
As a result, Jason seems to be able to conform to this newly emerging norm of code use, and
follows Tina in the use of ya for his "never mind" (line 51), which steers the conversation in a
more friendly direction. Thus the wh-questions of line 37 and line 46 both set off a
subsequent change of code use: before them, the participants use English when trying to
sound assertive to keep their own face; after them, they code-switch to pinyin to various
126
extents to save the addressees face. Besides, in this process, the participants are keenly
sensitive to the subtle messages conveyed through code alternation by the interlocutor and
often adjust their code choice accordingly. Social relations are thus implicitly co-constructed
in the virtual environment through a distinctive way of speaking when people modify their
own behaviors in the sequential context of a conversation.
This virtual community is also a place where people's behaviors are manipulated
explicitly through shifts in code. In the following dialogue between Justin and Linda, the
mixed-code variety of language use not only heightens the interpersonal nature of the
conversation but also signifies a process in which Linda gets socialized to behave more
politely in this peculiar environment of interaction.
Excerpt 3 Part 1
54
55
56
57
Justin: :)
58
59
Justin: that is not easy, many people get worse and worse, don't be greedy la
60
Linda: haha
Here Justin's use of an imperative in line 59 is a Chinese way of establishing intimacy and
solidarity; but there is still a potential risk of being perceived as rude and offensive.
Therefore, la is attached to line 59 as a mitigator. This particle usually appears as a sentence
suffix, used in many Chinese dialects to present a sentence as rather light-hearted and to invite
solidarity. This combination of an English imperative and a Chinese particle proves to be
effective because Linda is obviously not offended, but amused and pleased, as shown in line
60.
Excerpt 3 Part 2
61
62
63
Justin: ah?
64
Justin: bu xing,
[You cannot do this.]
65
Justin's total code-switch for his tag question in line 61 makes the tone even milder, which
further counteracts the assertiveness in line 59. This is followed by another complete change
of code on the part of Linda in line 62. But Linda's hasty judgment of their social distance
causes her to make fun of Justin. Her ni ge tou la is a teasing way in Chinese of claiming
one's disagreement with or negative opinion of what is said by the addressee, usually used
casually as a pet phrase with intimate friends or people lower in power rank. It can be seen as
Linda's effort to reduce her social distance from Justin. This bold use, as an indication of an
attempt at greater intimacy, is potentially face-threatening and sounds abrupt and impolite in
this context, where interlocutors are usually strangers to each other. It turns out to be
detrimental to the atmosphere and gives rise to a communication crisis, which is substantiated
by Justin's ah in line 63 showing his surprise at the way Linda talked to him. His
subsequent bu xing in line 64 and hai shi dui ni de tou ba in line 65 reveal that he is obviously
offended and are strong protests against Linda's rude verbal behavior. In particular, in line 65,
Justin returns what Linda says in line 62 to Linda, with the addition of the prefix hai shi
(meaning it would be better if) and the suffix ba. This seemingly polite expression in reality
conveys his dissatisfaction with Linda's manner, on the one hand, and works as a mitigator,
on the other hand, in the sense that it saves Linda's face through the joint use of the
force-reducing prefix and suffix. Therefore, the code-switch in lines 64 and 65 can be perceived as
Justin's blunt correction of Linda's verbal behavior and an indication of a change in
alignment.
Excerpt 3 Part 3
66
67
Linda: dui bu qi
[I'm sorry.]
68
69
Linda: en
[All right]
It is worth mentioning that Linda then pauses for almost half a minute, which is a likely
indication of Linda's embarrassment resulting from Justin's explicit expression of
displeasure. Justin's continuing use of pinyin in line 66, which shows clearly his concern
about how Linda feels, suggests his awareness of Linda's loss of face due to his utterance in
line 65 and has a remedial function for line 65. It helps alleviate the tension building up
between them that put the conversation on the verge of a breakdown, and finally gets Linda to
apologize using the same code in line 67 for what she says in line 62. Thus, Justin finally
regains his face; subsequently, his bu yao jin in line 68 marks clearly his willing acceptance
of Linda's apology, which is acknowledged by Linda, whose onomatopoeic en in line 69
puts an end to the unpleasant and embarrassing part of their verbal interchange.
Excerpt 3 Part 4
70
71
72
73
Justin: i know le
74
Linda: bai la
75
76
But Linda's switching back to English in line 70 is an intentional attempt to increase social
distance; she obviously does not feel at ease about what has just happened. This results in the same
code change on the part of Justin in line 71, who then attaches another romanized
Chinese particle, le, to I know in line 73. According to Li and Thompson (1986: 240), the basic
communicative function of le is to signal a currently relevant state; in other words, it claims
that a state of affairs has special current relevance with respect to some particular situation. In
this case, it signals to Linda in a mild way that Justin has already understood the reason why
she is leaving and represents Justin's effort to soften the tense atmosphere through
manipulating code use. It is followed by Linda's interesting combination of bai (a loan of
English bye), which is commonly used among young Chinese intimates, and a sentence-final
particle la. Justin responds in a similar fashion, which concludes this online
encounter. The code shift to pinyin resumed by Justin in line 73 and used by both participants
thus helps to restore rapport between the two persons.
In short, Justin's playful tone, accomplished by his incorporation of pinyin into his
utterances, enhances the intimacy with Linda and also leads to Linda's change of code as well
as her blunt tone suggestive of her misjudgment of their social distance. Her face-threatening
teasing causes some discomfort in Justin and turns out to be unacceptable for him. The
succeeding use of pinyin is corrective, enabling Justin to make Linda realize that he does not
like the way he was treated by her, which is followed by his inviting Linda back into the
conversation after realizing Linda's loss of face. Social distance is then increased by Linda by
switching back to English as a retreat from the embarrassment, and is ultimately reduced by
Linda by an interesting mixture of codes when she leaves the chat room. Both acts
immediately change Justin's code choice. Therefore, code-switching facilitates Justin and
Linda's face management and proximity manipulation. It is particularly interesting that the
extent of code-switching seems to vary with the atmosphere and purpose of the speaker. A
complete switch to pinyin or English highlights the utterance and explicitly marks speech acts
as seeking agreement, protesting, apologizing or bidding farewell, indicating the negotiation
of social meanings between the two interlocutors.
4. Conclusions
The above analysis reveals that code-switching, which has been shown to be an interactive
and dynamic negotiation process during which participants shape their social positions and
build their virtual environment, helps Chinese learners of English actively co-construct social
meanings and relations in this virtual chat room. Their code choice and degree of
code-switching are firmly anchored to the situational need in social distance and face maintenance.
The analyzed conversations lend further support to Olmstead's (2004: 23) claim that
code-switching, which indicates participants' orientation toward the interaction and toward each
other, is a positive conversational resource that enhances sociability, and allows shared
understandings about the purpose of the interaction to enter into the language practice. It
helps people convey subtle messages that underlie the propositional content and signals a role
shift in the social alignments of the participants. From the interactive perspective, one
person's selection of code constrains the interpretation and the code choice of the addressee,
which in turn has a considerable effect on their context. People's use of code affects the
addressee's involvement in the ongoing dialogue in that it either acknowledges the latter's
intention behind the code choice or corrects his or her behavior perceived as inappropriate.
Chatters in this online virtual community have been shown to draw on the linguistic
and discursive resources of both English and Chinese in the development of a distinct virtual
social network, which contributes to the creation of their relationships as bilingual speakers
who resort to code shifts, especially from English to pinyin, for more subtle interactive and
social purposes. This use of hybrid language also shapes roles for interlocutors in either
encouraging or inhibiting certain types of verbal behaviors. Social distance, identities, and
facework are negotiated rather than pre-established and fixed, which is particularly
meaningful in a context where participants are strangers and other contextualization cues such
as prosodic features and body language are not available.
Tying code-switching in a computer-mediated community in an EFL setting to
approaching the online interaction demonstrates how the electronic chat room provides an
authentic and distinct context of social interaction. It illuminates how language is a valuable
asset that enriches our knowledge of the way specific interactive purposes are served in an
online environment typically populated by strangers. Meanwhile, an examination of the way
Chinese and English are mixed as contextualization cues to index social meanings can inform
our understanding of how people adjust to the practices of the virtual community they are
involved in.
Furthermore, thanks to the lack of visual and audio aids in the context under study, the
investigation of literary practices in this peculiar online setting also sheds some light on how
the verbal behaviors of English learners in the chat room relate to their local experiences of
English learning. It has to be recognized that the Internet offers unique opportunities for EFL
learners in China to use the target language. It provides a platform for people not only to
practice their English language, but also to create a new collective identity not simply as
English speakers or Chinese speakers, but as learners trying to converse in a language that is
rarely used in their daily life. On the one hand, this mixed-code variety works well among the
interlocutors since there are no obvious signs of confusion and misunderstanding as speakers
seem to have managed to effectively get across the propositional and non-propositional
messages. On the other hand, shifting skillfully to Chinese to various extents complements
the use of English in expressing subtle interactive and social meanings, which should have
been attended to in English, given the purpose of the chat rooms. Their shift to Chinese runs
counter to their purposes in a sense. This phenomenon that members of this community use
English primarily for ideational content and frequently resort to Chinese for interactive and
emotional nuance may suggest their underdeveloped ability to attend to the social and
pragmatic aspects of communication in English relative to Chinese. Therefore, from the
perspective of language learning, this study makes another good case for improving the
interactive competence of English in EFL settings where exposure to authentic language use
is rather limited.
References
Adendorff, R. (1996). The functions of code switching among high school teachers and
students in KwaZulu and implications for teacher education. In K. M. Bailey and D.
Nunan (Eds.), Voices from the language classroom: Qualitative research in second
language education (pp. 388-406). Cambridge: Cambridge University Press.
Auer, P. (1984). Bilingual conversation. Amsterdam: Benjamins.
Auer, P. (1988). A conversation analytic approach to code-switching and transfer. In M.
Heller (Ed.), Codeswitching: Anthropological and sociolinguistic perspectives (pp. 187-213).
Berlin: Mouton de Gruyter.
Auer, P. (1998). Code-switching in conversation: Language, interaction and identity. New
York: Routledge.
Auer, P. (2007). A postscript: code-switching and social identity. Journal of Pragmatics,
37(3), 403-410.
Baym, N. K. (1995). The emergence of community in computer-mediated communication. In
S. G. Jones (Ed.), Cybersociety: Computer-mediated communication and community
(pp. 138-163). Thousand Oaks: SAGE.
Bays, H. (1998). Framing and face in internet exchanges: A socio-cognitive approach.
Linguistik Online, 1. Retrieved June 9, 2008 from http://viadrina.euv-frankfurto.de/~wjournal/bays.htm
Cárdenas-Claros, M. S. and N. Isharyanti. (2009). Code-switching and code mixing in
internet chatting. The JALT CALL Journal, 5(3), 67-78.
Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.
Cheng, L. and K. Butler. (1989). Code-switching: A natural phenomenon vs language
deficiency. World Englishes, 8(3), 293-309.
Chu, C. C. (1998). A discourse grammar of Mandarin Chinese. New York: Peter Lang
Publishing.
Cromdal, J. and K. Aronsson. (2000). Footing in bilingual play. Journal of Sociolinguistics,
4(3), 435-457.
Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.
Swofford, M. (2006). The Three NOTs of Hanyu Pinyin. Retrieved March 15, 2006 from
http://www.pinyin.info
Wilbur, S. P. (1996). An archaeology of cyberspaces: Virtuality, community, identity. In D.
Porter (Ed.), Internet culture (pp. 5-22). New York: Routledge.
Wine, L. (2008). Towards a deeper understanding of framing, footing, and alignment.
Working Papers in TESOL & Applied Linguistics, 8(2), 1-3.
Keywords: token, type, lemma, word family, learning burden, word knowledge
Introduction
The lofty place of words in language proficiency has long been acknowledged in statements
like "what learners carry around with them are dictionaries and not grammar books" (Baxter
1980) and "without grammar very little can be conveyed, without words nothing can be
conveyed" (Wilkins 1972: 111). Both statements attest to the superior effect of vocabulary
over grammar for the development of language proficiency. In fact, grammar and language
proficiency are an outgrowth of one's lexical competency, which renders word knowledge a
proxy of language proficiency. Research has consistently testified to vocabulary having
higher correlations with language proficiency than other measures (Qian 2002, Koda 2005,
Chen 2011). Words have both an upward and downward influence; downward to their
constituent morphemes and upward to larger units of which they are parts. In the latter, they
form the basis of all language as they are basic units of meaning upon which larger structures
like phrases, sentences, and paragraphs hinge. The bulk of vocabulary research focuses on
individual words. The exalted status of words in language proficiency, coupled with Mármol's
(2011: 12) observation that despite new trends in vocabulary research that focus on higher
units such as collocations or idioms, there is no doubt that the word is the main unit in vocabulary
quantification and language by and large, demonstrates the merit of closely
examining the concept word, which the present paper seeks to do. The interrogation of the
efficacy of the current conceptualisations of the construct word is done in the context of
Grade 3 second language (L2) learners transitioning to reading to learn in Grade 4. Such a
context, it is hoped, would be illustrative of the need for a further reconceptualization of the
construct word for word knowledge measurements on Foundation Phase (FP) L2 learners.
The Context
The Grade 3 learners who speak any of the 10 official languages of South Africa (excluding
English) as their Home Language (HL) or First Language (L1) who are on the verge of a
transition to Grade 4 form the context on which the paper's discussion hinges. The table
below indicates the Home Language distribution according to the 2011 census.
SOUTH AFRICAN LANGUAGES 2011
Language         Number of speakers*    % of total
Afrikaans        6 855 082              13.5%
English          4 892 623              9.6%
IsiNdebele       1 090 223              2.1%
IsiXhosa         8 154 258              16%
IsiZulu          11 587 374             22.7%
Sepedi           4 618 576              9.1%
Sesotho          3 849 563              7.6%
Setswana         4 067 248              8%
Sign language    234 655                0.5%
SiSwati          1 297 046              2.5%
Tshivenda        1 209 388              2.4%
Xitsonga         2 277 148              4.5%
Other            828 258                1.6%
TOTAL            50 961 443**           100%
* Spoken as a home language
** Unspecified and not applicable excluded
Source: Statistics SA
Third graders from such linguistic demographic profiles are expected to learn in their HL for
the duration of the Foundation Phase (Grade R-3) and shift largely to English as the
Language of Learning and Teaching from fourth grade onwards (South Africa Department of
Education Curriculum and Assessment Policy Statement (CAPS) 2011). Prior to the CAPS dispensation
(which has only been phased in with effect from 2012) schools were at liberty to determine
the point at which they wanted to introduce English as a subject in their FP curriculum. The
current third graders, therefore, have had varying durations of exposure to English, ranging from
less than 1 year to a maximum of 4 years for those who have had exposure to English since Grade R.
Although they have been in school for almost three years, there is a sense in which the
majority of them are beginners in terms of exposure to English. The fact that for most of
them, English is not sufficiently reinforced at home (CAPS 2011) represents a challenge
which is accentuated by the fact that the focus of fourth grade reading is "reading to learn",
which is qualitatively more challenging than the FP "learning to read". The assumption is that
by the end of third grade the learners have attained reading proficiency in the language they are
going to use to learn, and are now well positioned to use their reading proficiency to learn
textual material. Even among HL speakers of English, a "fourth grade slump", a designation of
the sudden drop-off between third and fourth grade in the reading scores (Hirsch 2003:
10), is a common phenomenon. For second language learners who have had scant exposure to
English both at home and at school, the slump could only be worse. Recognising how much
vocabulary is a proxy for language proficiency, a measure of such learners' vocabulary
knowledge would be indicative of their chances of surviving the impending slump. The
question meriting consideration is whether there is a conceptualisation of the construct word
which is equal to the task of indicating the actual word knowledge of learners with the profile
described.
Conceptualisation of the Construct Word
The infamous question "What is a word?" has plagued the field of vocabulary testing for years
and has defied singularity or uniformity of definition. Discrepancies in vocabulary size
estimates are primarily a result of lack of consensus on what constitutes a word for
word-counting purposes. Put differently, if a child knows all the words in the statement, "The boy
did not go to the shops when the other boys were going", how many words do they actually
know? Should we count the word the each of the three times it recurs in the statement or
should we just count it once? Can we not presuppose the knowledge of boys to be an
outgrowth of the knowledge of boy to warrant treating them as the same word? Should go
and going not be taken as one word in different forms? Such fundamental questions lead to
diverse conceptualisations of the construct word. In a bid to respond to such questions, the
field of vocabulary measurement has landed itself with four conceptualisations, namely: word
as token, word as type, word as lemma, and word as word family. The relative merits of these
word constructs in relation to the context of this paper require examination. D'Anna,
Zechmeister and Hall's (1991: 111) question, "When we say that a child learns 3,000 or 5,000
words per year, what exactly are we talking about?" is as valid now as it was then.
Word as Token
Ordinarily we identify words simply by the space between the strings of letters in written
language (Luitel 2011: 59). This is consistent with Carter's definition, cited in Catalán and
Francisco (2008: 151), of a token as any sequence of letters (and a limited number of other
characteristics such as hyphen and apostrophe) bounded on either side by a space or
punctuation mark. Any expression devoid of any spaces within it and separated by spaces
from other expressions is consistent with the view of word as token. Such a conceptualisation
can, however, be faulted on the basis of its failure to account for some compound
constructions like cannot, which can be regarded as one or two words depending on how
they are written. Also, should hyphens be considered as spaces or not? If they should, what
do we say about the inconsistency in the division of compound words like injustice and
in-laws? Some words like ice cream are visualised and thought of as one word despite having
two forms and there is the complication of whether we need to consider the forms making up
the expression or the concept represented by the forms. Does the fact that an ice cream is one
item make the word a single word or does the presence of two forms make it two words?
Mármol (2011) contends that because such words represent a single concept and learners
learn and understand them as just one concept, they should be considered as single words.
The criterion of spaces demonstrates the uninterruptibility of words: one cannot insert
anything within a word as one could within a sentence. Inserting another word between a
word and its inflection is impossible but you can always add a qualifier to say more about a
verb or noun in a sentence. Tokens are also referred to as running words in a text and each
occurrence of a form is counted separately (Luitel 2011: 59). Tokens indicate the total
number of words in a text or corpus, yielding the quantity of input in a text in raw terms
(Mármol 2011). According to Nation (2001), tokens are the conceptualisation of word we
would be making reference to when we talk about a summary, a telegram, or a research paper
being so many words long. Every occurrence of each word is counted despite the recurrence
of some words in the text.
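The token definition above can be made concrete with a small sketch. The regular expression below is only one possible reading of the definition (letters, optionally joined by an internal hyphen or apostrophe, bounded by spaces or punctuation); the names TOKEN_PATTERN, tokenize and count_tokens are illustrative assumptions and do not come from any of the cited studies.

```python
import re

# Assumed pattern: a run of letters, optionally continued by an internal
# hyphen or apostrophe followed by more letters (e.g. "in-laws", "can't").
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")


def tokenize(text):
    """Return every running word (token) in the text, in order of occurrence."""
    return TOKEN_PATTERN.findall(text)


def count_tokens(text):
    """Count tokens: every occurrence is counted, including repeated forms."""
    return len(tokenize(text))


# The space criterion splits "ice cream" into two tokens but keeps "in-laws" whole.
print(tokenize("ice cream"))
print(tokenize("in-laws"))
# "cannot" is one token or two depending purely on how it is written.
print(count_tokens("cannot"), count_tokens("can not"))
```

Under this reading, the count for a compound depends on spelling conventions rather than on the concept it names, which is the difficulty the paragraph above raises.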
There are limitations to the application of the token as a conceptualisation of word in
vocabulary measurement. Most vocabulary measurement studies utilise word frequencies to
determine the most frequent words and the extent of learners' knowledge of them. Using the
token as a unit of measurement would make computation of word frequencies impossible
since every stand-alone form is regarded as a different word. Token as a unit of analysis treats
every form as distinct from the others, implying that each form has to be learnt separately. In the
statement "Your mother was talking to my mother in your garden", the words mother and
your, which appear twice each, are regarded as four different words, yet everything about
them (orthographic make-up, meaning, and pronunciation) remains the same. Apart from
treating the same form as a different word whenever it recurs in text, forms like boy and boys
are presumably learnt one by one. This would make vocabulary acquisition and learning a
painfully slow process. What should, and does, happen is that sometimes we learn the
meanings of some words by inferring them from those related words which are already part of
our repertoire. Even the English Second Language (ESL) third graders profiled in this paper
can deductively recover some words' meanings from those they already know. The token,
therefore, falls short as a unit of word counting for word knowledge studies in this and other
contexts. Word as type addresses some of the limitations of the token construct and so
deserves some scrutiny.
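The contrast between counting every occurrence and counting each distinct form once can be sketched as follows. This is a minimal illustration only: lower-casing so that Your and your count as the same form is an assumption, and the function names are hypothetical.

```python
from collections import Counter

SENTENCE = "Your mother was talking to my mother in your garden"


def running_words(text):
    """Tokens: every occurrence counts. Lower-casing is an assumption so that
    'Your' and 'your' are treated as the same orthographic form."""
    return text.lower().split()


def form_frequencies(text):
    """Group identical forms together and record how often each one occurs."""
    return Counter(running_words(text))


tokens = running_words(SENTENCE)
frequencies = form_frequencies(SENTENCE)

print(len(tokens))        # number of running words (tokens)
print(len(frequencies))   # number of distinct forms (types)
print(frequencies["mother"], frequencies["your"])
```

A frequency for mother or your only becomes available once identical forms are grouped, which is the step that moves the count from tokens to types.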
Word as Type
According to Read (2000), in the conceptualisation of word as type, only the word form that
is dissimilar from all the others in an utterance is counted. Any recurring word form is only
counted once. Using the "Your mother was talking to my mother in your garden" example, we
can note that although there are ten tokens, there are only eight types since the words your
and mother appear twice in the statement. If we adopt the word as type as the unit of
quantification, all words identically spelt will be considered as one word. Word types would,
then, be all those items with different orthographic identity. Nation (2001: 7) observes that
conceptualising words as types is necessary when responding to questions like "How large
was Shakespeare's vocabulary?" Conceiving a word as a type is based on two assumptions:
first, that knowing a particular word in one context translates to its knowledge in different
contexts, making it one word no matter the number of times it recurs in a text; and, second,
that every individual word type is unique and its understanding does not depend on an
Word as Lemma
The lemma is preferred for lexical quantification on account of overcoming the limitation of
having to consider each word form as a unique form unrelated to the other forms, as do the
type and token conceptualisations. Gardner (2007) notes that, in a lemma, all lexical forms
share the same stem and word class, and differ only in inflection or orthographic make-up.
The words write, writes, writing, written and wrote are all verbs emanating from the base
form write. The -s, -ing, -en are the inflections which are just indicative of a change in
grammatical functioning of the same base word write. The lemma is based on the assumption
that the knowledge of the inflected forms is eased and expedited once the base form, as well
as the morphological inflections, are known. The learning burden, which Nation (2001)
defines as the amount of effort required to learn a new word, is eliminated or eased
considerably if the base word is known. Knowledge of the inflectional system of English
would ease the learning of the inflected forms on the basis of the knowledge of the base form.
The other justification for considering inflected forms as one word with the base form is that
inflectional morphemes do not create new words; they merely modify the form in which they
occur to indicate grammatical functioning, such as plurality. The base form which has to be known in
this instance is write and what the inflections do is to give grammaticality to the functioning
of the same word in different contexts.
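Counting by lemma can be sketched with a small lookup table. The table below is hand-made for illustration and covers only forms mentioned in this paper; it is an assumption, not a real lemmatiser, and the names LEMMA_TABLE and group_by_lemma are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-made lookup: inflected form -> (headword, word class).
# All members of one lemma share the same headword and the same word class.
LEMMA_TABLE = {
    "write": ("write", "verb"),
    "writes": ("write", "verb"),
    "writing": ("write", "verb"),
    "written": ("write", "verb"),
    "wrote": ("write", "verb"),
    "boy": ("boy", "noun"),
    "boys": ("boy", "noun"),
}


def group_by_lemma(forms):
    """Group word forms under their (headword, word class) pair.
    Forms missing from the table are kept as their own headword."""
    groups = defaultdict(list)
    for form in forms:
        key = LEMMA_TABLE.get(form.lower(), (form.lower(), "unknown"))
        groups[key].append(form)
    return dict(groups)


forms = ["write", "writes", "writing", "written", "wrote", "boy", "boys"]
lemmas = group_by_lemma(forms)
print(len(forms), "forms,", len(lemmas), "lemmas")
for (headword, word_class), members in lemmas.items():
    print(headword, word_class, members)
```

Seven distinct forms, which would be seven types, collapse into two lemmas here. The sketch simply asserts the groupings; whether each member is in fact easy to recover from its headword is a separate question that the table cannot answer.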
The requirement of having all members of a lemma belong to the same word class
would disqualify the form writer from the lemma of write, writes, writing, written and wrote
as it belongs to the class of nouns. It would become a base word for a different lemma of
writer, writers, writer's and writers'. The assumption is that the learning burden of words
emanating from the base form belonging to the same word class is less than that of
forms from the same base which cut across word classes. Browne, Cihi and Culligan (2007:
2) exemplify and corroborate this assumption when they posit that "the statistical item
difficulty factors for accept, accepts and accepting are very close, whereas the statistical
difficulties for acceptable, acceptance and unacceptable, are all quite different. One
hypothesis is that the brain treats these six items as four different Base Words." Such an
argument necessitates and rationalises the confinement of members of a lemma to a single
word class. The example of the six word forms given fits the argument well, but going back to
the examples of inflected forms emanating from write, one may argue that knowledge of the
base form write may make the form writer easier for a learner than the forms wrote or written,
which belong to the same word class as write. That the definition of a lemma cited above
accommodates irregular verbs like went for go, sought for seek or am, is, are, was, were,
being for be within a lemma makes the assumption that belonging to the same part of speech
as the base reduces the learning burden of a word highly suspect. As Gardner (2007: 244)
observes, the case of the irregulars poses serious quandaries relating to the psychological
validity of such family relationships, namely, that the opaque spelling and phonological
connections between the lemma headword and the family members will surely cause more
and different learning problems than their more transparent counterparts. This defeats the
whole principle of learning burden which the lemma is created to uphold.
Nation (2001: 8) registers concern over the inclusion of irregular forms within a
lemma when he notes that "one problem in forming lemmas is to decide what will be done
with irregular forms such as mice, is, brought, beaten and best. The learning burden of these
is clearly heavier than the learning burden of regular forms like books, runs, talked, washed
and fastest. Should the irregular forms be counted as a part of the same lemma as their base
word or should they be put into separate lemmas?" The orthographic constitution or spelling
of the word best is not in any way indicative of stemming from the base form good.
Including it within the lemma of good would present an even higher burden of recovering
its meaning from the latter than it would be in learning its antonym bad, for instance.
Irregular plurals or verbal forms may need to be considered independently from their
headwords, but such exclusion would mean quite a number of words would just be treated as
types or tokens as they cannot belong to lemmas. Words like good, better, best would not
be part of any lemma, nor would any of the other irregular forms. The lemma should be a grouping of all
those words whose understanding is almost made obvious whenever the base form is known,
rather than a collection of words which are brought together by virtue of being inflected
from the same base form. Irregular forms normally use inflections different from regular ones,
which gives an abstract status to morphemes. The regularity of frequent or regular inflections
stems from them being the inflections added to the vast majority of content words (verbs,
nouns, adjectives, and adverbs) to reflect grammatical properties such as tense, number, and
degree. The criteria of inflection and belonging to the same word class are not tight enough to
ensure only those words whose meanings are easily recoverable from the meaning of the base
gain entrance into the lemma.
Nation (2001) broadens the scope of a lemma to include the contracted forms. One
may express reservations over the inclusion of contracted forms on at least two grounds. First,
knowledge of the contracted form requires knowledge of, not only the base form, but also that
of not, since the contracted form is both a fusion and reduction of two words (for example,
can + not = can't). Second, there are transparent and opaque kinds of contractions and the
opaque contractions cannot easily be inferred from the base form + not. Transparent
contractions would be forms like have + not = haven't, do + not = don't and the opaque
forms would be will + not = won't, am + not = ain't, shall + not = shan't. The opaque
contractions have a higher learning burden which does not justify treating them as part of the
same lemma as the base, especially for vocabulary knowledge measurement on second
language Foundation Phase learners. Asserting that beginners can associate such irregular
forms with their headwords is fundamentally unrealistic.
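The transparent versus opaque distinction can be approximated with a purely orthographic test, sketched below. The test (a contraction is transparent if it is spelt exactly as the base form followed by n't) is an assumption introduced here for illustration; the paper does not specify an operational criterion, and the names used are hypothetical.

```python
# Contractions and base forms taken from the examples in the text above.
NEGATIVE_CONTRACTIONS = {
    "haven't": "have",
    "don't": "do",
    "won't": "will",
    "ain't": "am",
    "shan't": "shall",
}


def is_transparent(contraction, base):
    """Assumed spelling-based test: the contraction is the base form plus n't."""
    return contraction == base + "n't"


def classify(contractions):
    """Label each contraction as 'transparent' or 'opaque' under the test above."""
    return {
        contraction: "transparent" if is_transparent(contraction, base) else "opaque"
        for contraction, base in contractions.items()
    }


for contraction, label in classify(NEGATIVE_CONTRACTIONS).items():
    print(contraction, "->", label)
```

Under this strict spelling test, can't would also come out as opaque, because can + n't gives cann't rather than can't, so a looser criterion would be needed if such forms are to be treated as transparent.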
Possibly from realising the problems of having too accommodative a set of criteria for a
lemma, Milton (2009: 10) makes the conception of a lemma less accommodative but more
manageable by narrowing its definition, saying it "...includes a headword and its most
frequent inflections and this process must not involve changing the part of speech from that of
the headword". In formulaic terms, the definition of a lemma can be represented thus:
Lemma = headword + most frequent inflections + their contracted forms (belonging to same
class)
The use of the phrase "most frequent" is noteworthy, and it could well be interchanged or used
together with "transparent". The only problem with "most frequent" is that it leaves the
determination of what is most frequent to the researcher's discretion in the absence of frequency lists
of inflected forms. The frequency also needs qualification, whether it is the frequency with
which the inflected form is used in a text, or the frequency that stems from the number of
English words that an inflection inflects. The former kind of frequency would be relative to
text as frequent forms in one text may be less frequent in another. A definition of word whose
criterion is of a relative nature is not tight enough to allow easy and objective application. The
latter kind of frequency does not guarantee that inflections that have a lower spread in their
use are more difficult than those that impact a wide range of word forms in the language.
The lemma is also based on an assumption that inflections are easier than other forms
of affixation (prefixation and suffixation) which can be challenged. Some suffixes like -able
and -less and prefixes like un- have meaning in and of themselves which can be used to
recover the meaning of a suffixed and prefixed form like suitable, careless and unfair;
yet, inflections are devoid of such independent meaning. Such systematic use of affixes can
be used to significantly reduce the learning burden of the words derived from a known base
form. That the inflections -s and -es can be used for both verb and plural forms can be a
confounding factor on its own. This is not to imply such is absent from affixed forms.
In this paper, reference has been made several times to the base form, better known as the
headword, but what really constitutes or counts as a headword is not clear. Nation (2001)
raises Sinclair's concern whether a headword should be the base form or the most frequent
form. The base form may not be the most common form or the form that learners are likely to
acquire first. The base itself can be recoverable from the most common form which justifies
the supposed complication of which to consider as the headword, the base form or the most
common form. That the construct lemma is elusive to define with precision explains why,
although the comparative and superlative forms have always been considered English
inflections, Nation (2001) notes that, in the computerised, lemmatised list of the Brown
Corpus (Francis and Kučera 1982), these are excluded.
Stubbs (2002) proposes an additional criterion for membership into a lemma: the
requirement that all the members share the same meaning, a criterion challenged for its failure
to distinguish a lemma from a lexeme. The lexeme also denotes a group of words sharing the
same meaning and same word class which the lemma does as well. An additional criterion
complicates the determination of what it is that should gain admission into the lemma
membership. Acknowledging the difficulty of constituting a lemma and the unconvincing
conclusions often emanating from generalisations about whole lemmas, Knowles
and Mohd Don (2004: 71) advise researchers to consider individual words, or actually
even individual word meanings, as the basis for their word count and analyses. This is
almost a call to revert to the conceptualisation of word as type.
Brain research has provided insights which support the learning burden principle but
not the constitution of lemmas. Browne, Cihi and Culligan (2007: 2) assert that the brain
stores and processes lemmas having similar difficulty factors as forms of the same word,
and stores and processes lemmas having different difficulty factors as different words. The
idea of coming up with a formula for defining what qualifies as a lemma is a noble one which
seeks to make the determination of lemmas objective. We have already seen how some
inflected or contracted forms are more difficult than others, implying that there is no
justification in generalising that because a word is an inflection or contraction of a base form
then it should enjoy lemma membership. Browne, Cihi and Culligan's (2007) observation that
some lemmas are registered by the brain as separate words, rather than one word, casts doubt
on the validity of lemmas as a unit of vocabulary counting and analysis. That the brain does
not always store and process lemmas as we constitute them points to the need for either a
revisit of the constitution of lemmas if not a creation of another unit of counting.
The learning burden principle is the basis upon which the word family unit is constructed.
Knowledge of the base form engenders knowledge of its inflections and its close derivatives.
The word family unit is more accommodative of members than the lemma. In
the first place, there is an inclusion of derivatives which are not included in the lemma, and
second, the restriction of having members belong to the same word class does not apply.
Word family members traverse boundaries of grammatical classes. Several lemmas usually
find themselves part of a single word family. From the base form long can come long, longer,
longest, longevity, longish, length, lengthen, lengthy; and all these can be considered as one
word under the word family unit of analysis. Certainly, not all these forms can have a similar
learning burden relative to the base form to warrant inclusion in the same word family. Even
derived forms differ in their complexity and difficulty of comprehension (Browne, Cihi and
Culligan 2007). That all these forms would be known once the base form is known is the
argument behind the word family unit. Mármol (2011: 12) challenges such an assumption by
pointing out that "we cast doubt on the idea that a child acquiring bed has also acquired
bedroom. There is the possibility that an adult could guess the meaning of the latter, but a
young language learner in his first stages of acquisition may not be able to make those
inferences". The word family unit depends for its use on the learner's possession of an
intricate knowledge of morphological inflections of the English language in order to make
intelligent guesses about the meaning of some words on the basis of knowledge of their base
form. Evidently, learners, such as the ones described in this paper, would not possess the
native-like knowledge of morphological relations between words in a family. Schmitt and
Zimmerman's (2002) study, which required non-native postgraduate and undergraduate
participants to identify the derivational forms of stimulus stem words, revealed that
participants could only rarely provide all the different derivations of the stimulus words. This
suggested only partial knowledge of derivational forms on the part of the participants. Bauer
and Nation (1993) even add that learners should know that mean does not derive from me,
despite the orthographic or spelling string for me occurring in mean. Learners should also
have some implicit knowledge of the role of affixes (prefixes and suffixes) in word formation
and word meaning, as well as use permissible base-affix combinations in speech and writing.
Because it takes in a broader membership and treats the different members as one, most if not
all the challenges confounding the application of a word family for word frequency counts
and word knowledge analysis are similar to, and even take a greater magnitude than, those of
lemmas as discussed in this paper. The challenge of deciding what should be included in a
word family and what should not is as manifest in the word family unit as in lemmatisation.
Bauer and Nation (1993) studied inflections and affixations of English words based
on their productivity, frequency, regularity and predictability and came up with a scheme for
defining word families. They proposed seven levels, or a word family scale, based on an
analysis of the 1,000,000-token Lancaster-Oslo-Bergen (LOB) corpus dealing mainly with
affixation. These levels were supposed to form the basis for teaching and learning of English
words. The scheme is a welcome acknowledgement that learners' knowledge of affixation
develops with more experience of the language. A sensible word family for one learner may
be beyond another learner's current level of proficiency. This necessitates the scaling of word
families from the most elementary and transparent members to those of less obvious
possibilities (Nation 2001). At level 1, learners are assumed to treat each form as a different
word. The table below, adapted from Bauer and Nation (1993: 254), takes the scale from the
second level to the seventh level of inflections and affixations.
No affixes.
Level 3: -able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un- (Most frequent and regular derivational affixes)
Level 4: -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, -ous, in- (Frequent, orthographically regular affixes)
Level 5: -age, -al, -ally, -an, -ance, -ant, -ary, -atory, -dom, -eer, -en, -ence, -ent, -ery, -ese, -esque, -ette, -hood, -l, -ian, -ite, -let, -ling, -ly, -most, -ory, -ship, -ward, -ways, -wise, ante-, anti-, arch-, bi-, circum-, counter-, en-, ex-, fore-, hyper-, inter-, mid-, mis-, neo-, post-, pro-, semi-, sub-, un- (Regular but infrequent affixes)
Level 6: -able, -ee, -ic, -ify, -ion, -ist, -ition, -ive, -th, -y, pre-, re- (Frequent but irregular affixes)
Level 7: ab-, ad-, com-, de-, dis-, ex-, and sub- (Classical roots and affixes)
N.B.: Bracketed words in italics at the end of levels 4 through 7 are not part of the original.
Gardner (2007: 247) appreciates the apparent advantage of this seven-level
categorization scheme, namely that Word or Word Family can be operationalized at various
defensible levels for analysis and comparative analysis purposes, at least in terms of
learners' abilities to associate morphologically related words. Bauer and Nation (1993) need
to be applauded for hierarchically organising word family levels which can be matched with
learners' competence levels. For a learner operating at level 5, for instance, all the words in
levels 1 to 5 emanating from the same base would be considered as a single word, but those in
levels 6 and 7 would be regarded as different words from their base form. It is also significant
that such categorisation was done systematically on the basis of the rigorous criteria identified
above (their productivity, frequency, regularity and predictability) and on a large corpus (a
million words) which gives the categorisation a substantial measure of validity.
Gardner, however, notes as problematic, the repetition of many affixed forms at the
different levels, failure to acknowledge that derivational prefixes and derivational suffixes
may present different learning dilemmas for developing readers, as well as assuming that
learners' exposure to, and acquisition of, morphologically-related words is somehow linear in
nature; in other words, that language learners acquire base forms before their inflected and
derived family members (2007: 247).
Such an assumption is refuted by Biemiller and Slonim (2001), who note that young
children may actually acquire many derived forms before they acquire their root-form
counterparts. Concerning the duplication of affixes, an example would be the suffix -able in
level 3 and in level 6 which presents uncertainty about membership level of forms like
suitable on the word family scale. The assumption of the linear nature of exposure and
acquisition of word family members rests on a shaky pedestal. A form like disadvantage
(level 7, according to the taxonomy of levels of inflections and affixations) can have a lower
learning burden than advantageous (level 4).
Such categorisation as Bauer and Nation (1993) come up with seems to come as a
solution to the challenge of determining what qualifies as a member of a word family. The
present paper, however, takes exception to the idea of basing the categorisation of the word
family levels solely on the basis of a corpus without complementing it with empirical
evidence of the ease with which learners acquire the different affixed forms. This is not a
criticism of Bauer and Nation's (1993) work but a pointer to the need for further large-scale
research to corroborate the match between the levels of the corpus analysis and the
psychological realities of learners' word learning and acquisition.
Rank    Morpheme
2/3     in, on
...     Plural (-s)
...     Past irregular
...     Possessive (-'s)
10      ...
11      ...
12      ...
13      Contractible copula
14      Contractible auxiliary
The above hierarchical ordering is limited in two ways. First, the studies are based on native
English language speakers and importing the ranking wholesale to ESL learners may be
misleading. Second, the studies are exclusively based on morpheme studies when in fact most
high frequency words are just sight words which cannot be reduced to their morphological
composition. The paper, therefore, argues for extensive testing and documentation of the
acquisition order of English affixed forms (suffixed and prefixed for both inflections and
derivations). The testing should cover a wide range of learner profiles from diverse language
backgrounds and competence levels. The resultant taxonomy should ensure that only those
lexical forms which pose negligible or no learning burden in the event that the base form is
known, are regarded as one word. Two forms may justifiably be regarded as one for one
learner but not for another depending on their level of competence. A taxonomy of word
conceptualisation levels is, therefore, needed where, at the first level, some lexical forms may
be regarded as separate words but, at the next levels, be considered as one word. Researchers
would then choose the level at which they conceptualise word for their word knowledge
measurements depending on the competence level of the learners. A departure from a one-size-fits-all
approach would make possible the replication of studies. One would just need to specify,
for example, that they based their studies on level 3 of the word conceptualisation taxonomy. Explicit rules
would need to be generated for word membership at each level and exceptions identified.
Even teachers would know which lexical forms they need to give preference to for explicit
instruction depending on the competence level of the learners. A move away from the current
word conceptualisations would ensure more realistic and valid conclusions on word
knowledge measurement studies.
References
Baxter, J. (1980). The dictionary and vocabulary behavior: A single word or a handful?
TESOL Quarterly, 14, 325-336.
Bauer, L. and I. S. P. Nation. (1993). Word families. International Journal of Lexicography,
6, 253-279.
Catalán, R. J. and R. M. Francisco. (2008). Vocabulary input in EFL textbooks. RESLA, 21,
147-165.
Chen, K. Y. (2011). The impact of EFL students' vocabulary breadth of knowledge on
literal reading comprehension. Asian EFL Journal, 51, 30-40.
Chung, T. M. (2009). The newspaper word list: A specialised vocabulary for reading
newspapers. JALT Journal, 31(2), 159-182.
D'Anna, C. A., E. B. Zechmeister and J. W. Hall. (1991). Toward a meaningful definition
of vocabulary size. Journal of Literacy Research, 23(1), 109-122.
1. Gender as Style¹
The study of the relationships between sex/gender and communication is an ever-developing
area of social science research with quite a long history behind it, and one that currently
offers some of the most promising prospects for sociolinguistics. Far from traditional clichés
and prejudices on the subject, a fair deal of consensus has been reached regarding the fact that
linguistic-communicative usage is usually less conditioned by biological sexual factors than
by psychosocial ones (see Eckert 1989). Sex/gender needs to be analyzed within socially and
situationally contextualized approaches, observing how identities are constructed and
reformulated through linguistic choice.
Dialectological and sociolinguistic studies conducted in diverse human communities
have long pointed out differences in the communicative norms followed by men vs. women.
Most investigations have focused on the supposed peculiarities of female behaviour, thus
more or less implicitly certifying male speech as the unmarked or standard variety. The result
of such an orientation for linguistic research has been more thorough knowledge of women's
discourse and of the social contexts and practices across which it is developed (Coates 2003:
3; Edwards 2009: 146). However, this tendency is also being counterbalanced by the
appearance of more investigations specifically devoted to male self-presentation and
socialization through speech (Coates 2003; Jordan-Jackson and Davis 2005; Kiesling 2005).
The truth is that each gender group is repeatedly found to follow partly different
interactional patterns that, in our view, may well be the manifestation of different basic sociocommunicative styles. The view of gendered linguistic usage as a matter of style makes it
possible to move beyond the reactive orientations promoted by classic sociolinguistic
quantitative research, which inevitably lead to somewhat fixed and static conceptions of
gender, just as of any other social ascription (Bell 1999: 524). Seeking the balance point
between the conditionings imposed on gender by societal structures on one hand, and speaker
agency and creative elaboration on the other, seems to offer the most realistic and potentially
fruitful path at the present state of knowledge.
¹ This paper is part of the research project "Los estilos de comunicación y sus bases cognitivas en el
estudio de la variación sintáctica en español" (FFI2009-07181/FILO), funded by the Spanish
Ministerio de Ciencia e Innovación.
Style can be understood as any system of (linguistic and other) meaningful choices
that helps someone shape some (social, professional, emotional, ...) self-image; this being
perceived by the speaker as optimal for the achievement of certain interactional goals in a
particular context. The making of styles needs to be found and analyzed within real discourse,
where it may be feasible to describe the relevant circumstances of the situation, the social
features of the participants, and how they all interact with creative linguistic choice (Aijón
Oliva and Serrano 2013: 11-45; Serrano and Aijón Oliva 2011: 139). Sociolinguistic meaning
does not arise from extralinguistic factors, but from the joint action of linguistic and any
other semiotic choices across symbolic communicative acts (Coupland 2007: 3).
The application of these principles to the study of language and gender naturally
results in a view of masculinity and femininity as sets of values that are partly received from
social structure, but that can and need to be continuously elaborated in interaction. Speakers
are not male or female once and for all, nor do they need to be just one or the other.
Rather, they can choose the extent to which they want to associate themselves with some
gender label and even what the labels themselves might imply in a certain context; their
stylistic work will aim to shape a corresponding self-image towards others. In this paper, we
will conduct an analysis of stylistic choice and the configuration of gender as a socio-semiotic
category, with regard to a phenomenon of Spanish morphosyntax: variable formal expression
of first- and second-person clause subjects.
that is, it creates systems of meaning affecting all possible levels of communicative choice
(Aijón Oliva and Serrano 2010a: 9; Serrano and Aijón Oliva 2011: 142).
Variation between the expression and omission of subject pronouns in languages such
as Spanish is one among many syntactic phenomena that lend themselves to style
construction. A relationship between pronoun usage and meaningful social factors such as
gender can thus be hypothesized and scientifically tested. It will be our task to ascertain
whether there are statistical differences in subject expression according to speaker gender, as
has been found in many other facts of linguistic variation. But, more importantly, if this is the
case, we will try to advance some explanation of such statistical patterning, by investigating
which semiotic facets of gender seem to be conveyed through syntactic choice in particular
discursive genres and contexts, and how this can be related to the meanings inherently linked
to grammatical forms.
In order to do this, we will first examine whether the syntactic phenomenon under
study is, in fact, the carrier of meaning differences at internal linguistic levels. As can be
inferred from a number of previous studies (Delbecque 2005; Siewierska 2004), variation
between the expression and omission of subject pronouns seems to be a formal reflection of
the degree of cognitive salience achieved by discursively encoded entities. When the referent
of a clause subject is under the attention focus, and can thus be considered salient or
accessible, its formulation tends to be perceived as unnecessary for the communicative
purposes of the speaker (Langacker 2009: 112). This is particularly evident in languages with
a relatively rich inflectional morphology such as Spanish, where the identity of clause
subjects can easily be tracked through verb agreement morphemes (cant-o 'I sing', cant-as
'you (sg.) sing', and so on), which, in fact, makes subject omission the unmarked choice in
most discourse types (Serrano 2013: 276-281).
At the same time, discourse-oriented studies on subject variation such as the ones
cited above have often explained subject expression through informativeness, understood as
the degree of mental processing required by textual elements, given their newness or
unpredictability for participants (Beaugrande and Dressler 1997: 201). Informativeness is not
unrelated to salience but could, rather, be considered a textual correlate of it, albeit an
inversely proportional one; in general, the most salient entities are also the least informative
ones, due to their very accessibility and continuity across discourse stretches. Both salience
and informativeness should be conceived of as gradual magnitudes that are largely dependent
on the particular context, the relationship between the participants and other factors. Their
existence confirms the notion that different syntactic forms such as subject expression and
omission can hardly be seen as synonymous; they represent different views of non-linguistic
situations encoded through linguistic means (Serrano 2013: 284-288).
The analysis of subject pronouns, given their deictic nature and their power to endow
real-world entities with different degrees of cognitive salience within discourse, suggests that
their choice is a formal manifestation of abstract cognitive dimensions underlying speech, and
particularly of the continuum between objectivity and subjectivity. The latter is the tendency
of discourse and perception to revolve around subjects (mainly human participants, these
being the entities most frequently encoded as clause subjects in conversational speech and
other discourse types), while objectivity would imply the converse orientation towards
non-participants: third-person human and non-human entities. There is, in fact, a significant
amount of evidence pointing to objectivity-subjectivity as a very powerful notion for the
theoretical explanation of linguistic variation and style construction (Aijón Oliva and Serrano
2013, Kerbrat-Orecchioni 1980, Kristiansen 2008). In the present study, we will try to
elucidate whether this may bear some relationship to the shaping of gendered identities
through syntactic choice.
In this sense, the calculation of descriptive frequencies, that is, of the percentages of
one variant against the other, is the most basic tool for the quantitative assessment of
variation. This is referred to as relative variables. However, the consideration of syntactic
options as meaningful options by themselves and not in opposition to other variants suggests
the incorporation of a complementary statistical method that can, in some way, better suit this
conception of linguistic variability. This we shall refer to as the absolute variable
methodology (Aijón Oliva and Serrano 2012: 80-94). It is based on the assumption that any
form-meaning pairing is contextually chosen for its own value and not just as opposed to any
other options. Consequently, aside from assessing its frequency against those of its alleged
alternatives, it may be interesting to calculate it in overall terms according to an independent
measure, such as word number. In our case, this means assuming that the total frequencies of,
e.g., expressed subjects across some text, group of speakers, etc. can be scientifically
revealing in themselves and irrespective of their relationship to omitted-subject rates. Thus, a
frequency index of each form per 10,000 words will be used to clarify the tendencies
suggested by percentage data.
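The contrast between the two measures can be sketched in a few lines. This is only an illustration: the function names are our own, and the counts are invented so that the arithmetic is easy to follow. Only the normalisation per 10,000 words comes from the text.

```python
def relative_share(expressed: int, omitted: int) -> float:
    """Relative variable: percentage of expressed subjects among all
    tokens of the variable (expressed + omitted)."""
    total = expressed + omitted
    if total == 0:
        raise ValueError("no tokens of the variable")
    return 100.0 * expressed / total


def absolute_index(occurrences: int, word_count: int, per: int = 10_000) -> float:
    """Absolute variable: occurrences of one form per `per` words of text,
    independent of how often the alternative form occurs."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return per * occurrences / word_count


# Hypothetical sample: 60 expressed and 40 omitted subjects in 20,000 words.
expressed, omitted, words = 60, 40, 20_000
print(relative_share(expressed, omitted))   # 60.0 (% expressed)
print(absolute_index(expressed, words))     # 30.0 per 10,000 words
print(absolute_index(omitted, words))       # 20.0 per 10,000 words
```

The percentage depends on both variants at once, whereas each index can rise or fall on its own. Two texts can therefore share the same percentage of expressed subjects and still differ widely in how often either form occurs.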
Now it must be acknowledged that statistical patterns, useful and revealing as they
may be, would make little sense if they had no relationship to the actual instances of
communication they emerge from. We believe there is an essential connection between the
quantitative and the qualitative sides of sociolinguistic variation; one that has been generally
neglected, but that is indispensable for the future construction of a general theory. In the case
of our study, the conjunction of statistical and interactional findings seems particularly crucial
if we aim to explain communicative styles as the contextual construction of identities by men
and women.
Our analysis and discussion of syntactic variation and its stylistic implications for the
notion of gender will be divided into the next three sections, each one focusing on a different
subject pronoun: yo 'I', nosotros 'we' and tú 'you' (singular).
4. The First-Person Singular yo 'I' in (yo) creo 'I think' Constructions
In general terms, yo is the most frequent subject in Spanish clauses. Its statistical dominance
can be taken as a formal reflection of the general egocentric orientation of human language
(Keysar 2007, Serrano 2014), even if its occurrence rates are obviously quite variable
depending on the context and discourse type, usually becoming higher in contentious or
persuasive speech. The argumentative potential of first-person subjects is particularly obvious
in the context of verbal lexemes acting as indicators of modality, among which creer 'to
think' seems to be the most frequent one in Spanish discourse. This is why our present
analysis will be restricted to the construction (yo) creo and its basic usage patterns.
Qualitative contextual analysis suggests that formal expression of the subject (yo creo
or else creo yo) represents the paradigmatic case of the aforesaid association of the structure
with personal opinion and argumentation, as seen in example (1), regarding the procedure that
should be followed in a Carnival competition. The speaker emphasizes the personal nature of
her stance.
[Female]:
(1)
La gente que venía de la Península no sabía valorar un traje\yo creo que las personas
famosas\que vienen aquí al Carnaval\deberían de ser invitados\yo creo que podía haber un|||un
jurado más específico sobre el tema que estamos tratando\ (CCEC Conv<MaTe09>)
People who used to come from Peninsular Spain were not apt to evaluate a costume. I think
famous people taking part in the Carnival competition should be specifically chosen. I think
there should be a jury composed of experts on the topic we're dealing with.
On the other hand, omission of the subject (Ø creo) tends to be preferred for the presentation
of contents as hypothetical or as having a more general and less personal scope. In (2), the
speaker is expressing what she believes to be a mere possibility rather than a personal
position. That is, the omission of yo seems to displace potentially contentious discourse
towards objectivity.
The variability and its discursive repercussions are explainable through the higher salience
and accessibility of omitted subjects. Avoiding overt self-indexation, the speaker builds a
more objective self-image that can be perceived as advantageous in contexts such as that of
(2). It is interesting to point out the fact that (yo) creo is one of the rare Spanish constructions
in which expression of the first-person subject is altogether more frequent than its omission,
as discussed in previous works (see Aijón Oliva and Serrano 2010b). This suggests that its
basic function is that of indexing the speaker in discourse, rather than strictly introducing a
belief or opinion, as the verb lexeme would indicate.
If just the overall frequencies of (yo) creo are calculated, whether with expressed or
omitted yo, we find that its occurrence is notably more usual in male speech. In the CCEC
corpus, men are ahead of women by 6.5 items of (yo) creo per 10,000
words (Table 1).
Table 1 Overall frequency of (yo) creo (expressed and omitted) according to gender (CCEC media texts)
Gender    Word number    Overall occurrences of (yo) creo    Frequency per 10,000 words
Men       48,035         136                                 28.3
Women     19,654         43                                  21.8
In the case of the MEDIASA corpus, the contrast is even sharper, with the scores of men
outweighing those of women by a three-to-one ratio. That is, male speech seems to be
characterized by a stronger tendency towards discourse modalization through self-indexing
choices such as (yo) creo. However, such a hypothesis needs to be confirmed by analyzing
other facts of grammatical choice in discourse.
Table 2 Overall frequency of (yo) creo (expressed and omitted) according to gender (MEDIASA corpus)
Gender    Word number    Overall occurrences of (yo) creo    Frequency per 10,000 words
Men       177,332        232                                 13.1
Women     116,288        51                                  4.4
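The indices in Tables 1 and 2 can be recomputed from the word numbers and occurrence counts they report. The sketch below is only a check on the arithmetic, with names of our own choosing (TABLES, per_10k, indices). The recomputed values agree with the published ones to within 0.1, and any difference in the last digit presumably reflects rounding.

```python
# (word number, occurrences of (yo) creo) as reported in Tables 1 and 2.
TABLES = {
    "Table 1 (CCEC media texts)": {"Men": (48_035, 136), "Women": (19_654, 43)},
    "Table 2 (MEDIASA)": {"Men": (177_332, 232), "Women": (116_288, 51)},
}


def per_10k(occurrences: int, words: int) -> float:
    """Occurrences normalised to a rate per 10,000 words."""
    return 10_000 * occurrences / words


def indices(table):
    """Map each gender group to its per-10,000-word index."""
    return {group: per_10k(occ, words) for group, (words, occ) in table.items()}


for name, table in TABLES.items():
    idx = indices(table)
    ratio = idx["Men"] / idx["Women"]
    rounded = {group: round(value, 1) for group, value in idx.items()}
    print(f"{name}: {rounded}, men/women ratio = {ratio:.2f}")
```

For the MEDIASA figures the men/women ratio comes out just under 3, which matches the three-to-one contrast described in the text.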
[Female]:
(3)
Que nosotros hemos tomado decisiones en reuniones \y después el resto de la gente no está
informada de lo que hay que hacer\ (CCEC Conv<ElEn08>)
We have made decisions in our meetings, but the rest of the people have no way of knowing
what is to be done.
[Male]:
(4)
Nosotros sólo pedimos que se cumplan los compromisos que estaban acordados. (MEDIASA
<Ent-Ad-131104-17>)
We are only asking for the commitments agreed on to be fulfilled.
Omission is fostered by a high degree of subject salience in the context; but, due to the
peculiar discursive projections of nosotros, it is also often related to referentially vague uses
in which the first-person plural indexes a general community or performs a merely discursive
function. These usually promote a universal interpretation of the content. Omitted nosotros
helps move attention away from particular human subjects and place the interest of discourse
on objects being talked about; in other words, it enhances objectivity. In (5), the content is
presented as relating to any human being and not just a definite group, while in (6) the form
digamos 'let's say' basically acts as a discourse marker.
[Male]:
(5)
Los muertos nos permiten comprender la vida que hemos construido y a su través
entramos en la razón de ser de lo que hemos sido y hecho. (MEDIASA <Art-Ga-0511045c>)
The dead help us understand the life [we] have built, and it is through them that [we] discover
the raison d'être of all that [we] have been and done.
[Male]:
(6)
From a cognitive viewpoint, any use of nosotros can be described as an extension of the first
person towards a larger group. Thus, whenever the first-person plural perspective is adopted,
the speaker will be included in some way, even if just in a metaphorical sense. But, crucially,
his/her personal sphere will be extended to include others as well.
Salience and informativeness can account for the observed variation and thus contribute to shaping
communicative styles oriented to subjectivity or to objectivity.
The results from the CCEC corpus are clearly indicative of gender differences:
Omitted nosotros as an expressive choice is, in fact, much more usual in women's
conversational speech. The objective presentation of facts and ideas through subject omission
would thus seem to be a trait more typical of female communicative styles, placing them
away from the pole of subjectivity (Table 3).
In this respect, inclusion versus exclusion of the audience in the scope of nosotros appears
particularly significant, although it will not be possible to investigate the matter in this paper.
         Word number    Overall occurrences    Per 10,000 words
                        of omitted nosotros
Men      27,867         37                     13.2
Women    51,677         168                    32.5
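The normalized figures in the table above can be recomputed from the raw counts. The following is only a minimal sketch, assuming that the last column gives occurrences per 10,000 words and that the published values are truncated (not rounded) to one decimal place; the latter is an inference from the arithmetic, not something the text states. The function names per_10k and truncate are hypothetical helpers introduced for illustration.

```python
import math


def per_10k(occurrences: int, words: int) -> float:
    """Normalized frequency: occurrences per 10,000 words of corpus text."""
    return occurrences / words * 10_000


def truncate(value: float, decimals: int = 1) -> float:
    """Cut a positive value off after `decimals` places (assumed convention)."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


# Raw counts for omitted nosotros, copied from the table above:
# (occurrences, corpus word count) for each speaker group.
omitted_nosotros = {
    "Men": (37, 27_867),
    "Women": (168, 51_677),
}

rates = {
    group: per_10k(count, words)
    for group, (count, words) in omitted_nosotros.items()
}

for group, rate in rates.items():
    print(f"{group}: {truncate(rate)} per 10,000 words")
```

Normalizing by corpus size is what makes the two groups comparable here, since the women's subcorpus is nearly twice as large as the men's and raw counts alone would overstate the difference.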
[Female]:
(7)
No es que tu hijo o tu hija tengan hijos\ es que tú te conviertes en abuela\ a mí eso me parece
más fuerte\ (CCEC Conv<ElEn08>)
It's not just that your son or daughter may become a parent; you in turn will become a
grandmother, and that's what feels most shocking to me.
[Female]:
(8)
desde luego es en la única cadena / que se: puede hablar / porque en las otras / cuanto
empiezas a decir algo de esto / te cortan (MEDIASA <Var-Co-230503-12:30>)
This is indeed the only radio station where one can talk freely; in others, whenever [you] start
saying things like these, they'll cut you off.
         Word number    Overall occurrences    Per 10,000 words
                        of objectivizing tú
Male     27,867         17                     6.1
Female   51,677         38                     7.3

         Word number    Overall occurrences    Per 10,000 words
                        of objectivizing tú
Male     177,332        105                    5.9
Female   116,288        75                     6.4
7. Conclusions
In the present study, we have analyzed the statistical variation and some interactional
projections of the expression vs. omission of three Spanish subject pronouns; we hypothesize
that the syntactic variants under study might constitute formal-semantic choices helping the
development of communicative styles. More specifically, such choices might be associated
with the interactional construction of sex/gender as a stylistic category.
Our results seem to largely confirm the hypotheses assumed, as well as support and
explain certain previous findings on male vs. female ways of communicating, particularly
those regarding the supposed collaborative orientation of female speech. The notion that
women tend to favour interactional co-operation and agreement, while men orient themselves
more clearly towards self-expression and imposition, is widespread in gender studies. But we
have also tried to offer a cognitive explanation for such social variability. This can be
condensed in the abstract continuum between objectivity and subjectivity, understood as a
dimension conditioning all levels of form and meaning. In this sense, the analysis suggests
that female speech is particularly inclined to syntactic choices promoting objectivity (or,
perhaps more precisely, downplaying subjectivity), whereas the opposite tendency seems to
characterize male communicative styles.
Our positive conclusions on the connection between pronoun usage and gendered
identities are not meant to imply that such usage is perceived as anything like a gender
marker in Spanish-speaking communities, but rather that it is one among the variety of
semiotic resources used for the (sometimes quite subtle) construction of gender in interaction.
A line of research like the one outlined here should further incorporate other meaningful
linguistic and communicative phenomena, as well as refine the analysis of interactional
contexts, in order to achieve a more realistic picture of the ways male and female identities
are contextually shaped, and of the cognitive orientations towards reality underlying such
identities.
This should probably start from the joint consideration of the whole paradigm of
grammatical persons, each of which can be seen as embodying a different perspective along
the subjectivity-objectivity continuum. For example, the singular first person can be viewed
as signaling the highest degree of subjectivity, while the plural downplays this value by
including the speaker in a wider group. In turn, second and third persons, as well as their
different variants, will promote different perceptions and interpretations of the content of
discourse. If a relationship can be demonstrated between the choice of person as a
discursive-cognitive perspective and the construction of gender as well as other relevant identity
features, a further step will be achieved towards the theoretical, explanatory model of
sociolinguistic variation that we see as a desirable scientific goal. The handling of general
cognitive notions such as subjectivity in the description and explanation of styles is, in our
view, the key to transcending the peculiarities of the communities and interactional domains
analyzed. In sum, further research from this viewpoint in different settings and languages
should be carried out in order to check the wider validity of the claims put forward here.
References
Aijón Oliva, M. Á. and M. J. Serrano. (2010a). Las bases cognitivas del estilo lingüístico.
Sociolinguistic Studies 4, 115-144.
Aijón Oliva, M. Á. and M. J. Serrano. (2010b). El hablante en su discurso: Expresión y
omisión del sujeto de creo. Oralia 13, 7-38.
Aijón Oliva, M. Á. and M. J. Serrano. (2012). Towards a comprehensive view of variation in
language: The absolute variable. Language & Communication 32, 80-94.
Aijón Oliva, M. Á. and M. J. Serrano. (2013). Style in syntax: Investigating variation in
Spanish pronoun subjects. Bern: Peter Lang.
Beaugrande, R. A. and W.
Barcelona: Ariel.
Bell, A. (1999). Styling the other to define the self: A study in New Zealand identity making.
Journal of Sociolinguistics 3, 523-541.
Coates, J. (2003). Men talk. Oxford: Blackwell.
Coupland, N. (2007). Style: Language variation and identity. Cambridge: Cambridge
University Press.