
LING 6401

806007430

Dr. Braithwaite

Name: Hassan Basarally


ID: 806007430
Course: LING 6401-The Grammar of English: An Extrapolatory Approach
Lecturer: Dr. Ben Braithwaite
Assignment: 1-What is Corpus Linguistics? How do approaches to the study of (English)
Grammar that use Corpus Linguistics differ from other approaches within Linguistics? What are
the advantages and disadvantages of using corpus data? Give an example of a question about
grammar that could be investigated using a Corpus Linguistics methodology, and suggest how
you could go about investigating it.
Date Due: 23/02/2011

Corpus linguistics is a methodology of linguistic analysis that originated in the 1960s. It
differed from the major linguistic practices of that time in treating natural language as a
reliable source for the investigation of linguistic structures. Its advantages include the amount of
data that can be analysed, its usefulness in various fields of linguistics, and its quantifiability. On the
other hand, there are concerns about the quality of corpus data, which can limit the investigation of
structures in language. As a research tool in grammar, it is very useful in investigating the acceptability of a
feature, as corpora provide a ready pool of data showing how a particular feature is used in actual
language.
Corpus linguistics should not be viewed as a separate subfield of linguistics. Instead, it is a
methodology or tool to be used in linguistic analysis. This approach is based on four
characteristics: it is empirical, it utilises a large collection of natural texts, it uses computers for
analysis, and it depends on both qualitative and quantitative techniques (Biber, Conrad and
Reppen, 4). The texts collected are stored in a searchable database. There is practically no limit to the
types of corpora that can be compiled: data can range from written to spoken texts, from
academic to non-academic discourse, and from newspapers to novels. The texts in a corpus
consist of unelicited, authentic language. Several types of corpora are available, giving
access to a variety of language varieties. There are spoken and written corpora, which can be purely
spoken, purely written, or a mixture of both, e.g. the British National Corpus. Historical corpora show
language change over time, e.g. the Helsinki Corpus, which contains texts from c. 770 to 1700.
Dictionaries, especially recent ones, rely on corpora, and there are also specialised corpora, e.g. the
International Corpus of Learner English (ICLE). The existence of specialised corpora shows that
a very large corpus is not necessary when one is investigating a specific speech community.
A major difference between corpus linguistics and other methodologies lies in its acceptance of
natural language as a valid source for extracting grammar. In a corpus approach, data is taken from
actual text and speech in a variety of genres. This is authentic communication: the
speaker or writer is usually unaware that it is being recorded, or the data is recorded after its
production. Chomsky and others who promote generative grammar view natural language as
filled with performance errors, and hence as not representing the grammar of a language. Such
linguists depend instead on introspection or intuitive judgements in a laboratory setting, or on the
mental grammar of an idealised speaker (Lindquist, 8). Corpus linguists do not rely on
anecdotal evidence from small samples but on large databases of authentic texts. Their focus is on
descriptive adequacy as opposed to explanatory adequacy, as corpus linguists view complexity
and variation as inherent to language (Meyer, 3). The aim is to quantify features empirically
and independently rather than relying on prescriptivist introspection. In addition,
introspection by an individual is only possible once metalinguistic awareness has developed, that is,
in older children and adults (McEnery and Wilson).
The advantages of corpus linguistics include the speed and reliability with which concordances
and frequencies can be obtained, while the large amount of data allows for more precise information.
The concordances provided by corpus tools show all the contexts in which a word occurs in a
particular text or genre. They are usually displayed in keyword-in-context (KWIC) format, in which
each occurrence of the word appears on its own line together with its surrounding text, revealing the
linguistic environment and associated structures of the word in a sentence.
Frequency charts or tables show the occurrence and co-occurrence of a linguistic element. As
changes in frequency can indicate linguistic change, corpora are useful in tracking language variation
and change (Lindquist, 8).
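To make the KWIC display and frequency counts described above concrete, the following is a minimal Python sketch. It is not how any particular corpus tool works: the tokenizer is deliberately crude (lower-cased alphabetic strings only), the sample sentences are invented for illustration, and the function names `tokenize`, `kwic` and `right_collocates` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lower-case word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, window=3):
    """Return one keyword-in-context line per occurrence of `keyword`,
    with up to `window` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # right-align the left context so the keywords line up in one column
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines

def right_collocates(tokens, keyword):
    """Count the words that immediately follow `keyword` (a simple co-occurrence count)."""
    return Counter(tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == keyword)

# Invented sample text, for illustration only.
sample = ("There was a fete in the village. The village had a band, "
          "and there was music until late. Nobody was tired.")
tokens = tokenize(sample)
frequencies = Counter(tokens)  # frequency list: how often each word occurs

print(frequencies.most_common(3))
for line in kwic(tokens, "was"):
    print(line)
print(right_collocates(tokens, "was"))
```

Dedicated concordancers also let the researcher sort the lines by left or right context, but the principle of listing every occurrence with its environment is the same.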
Other benefits of the method include the following: many corpora are easily accessible to researchers
online, results are verifiable, and corpora are an excellent tool for teaching non-native speakers as they
represent actual usage. As corpora can be specialised, a linguistic construction can be investigated in terms
of its frequency in a particular genre. Another benefit of corpus-based approaches is the wide
range of existing linguistic fields to which they can be applied: reference
grammars, lexicography, variation studies, historical linguistics, contrastive analysis between
languages, and language pedagogy. Language pedagogy has seen some of the greatest uses of
corpus linguistics. The frequency lists produced show which linguistic features are most common
and should therefore be taught first. Corpora also show the relationship between individual words and
the structures they appear in. This type of information is very useful to language teachers, as the
rules derived reflect how the language is actually used.
A problem with the corpus approach is the quality of the corpus. As language is infinite, a
single corpus, or even a collection of corpora, cannot fully represent a language. Hence,
total linguistic accountability is unattainable. Related to this, the absence of a
feature from a corpus does not mean that it does not exist. It could simply mean that it was not
recorded, was omitted by the researcher, or did not occur at the time the corpus was collected. In
addition, performance errors of speakers may have to be omitted, and a corpus by itself cannot
produce a grammar. A native speaker will always be needed to help judge
what is or is not grammatical (Lindquist, 10).
Making a corpus representative and balanced also proves difficult. Ideally, a corpus should
contain all parts of a linguistic variety (e.g. spoken, non-academic), and the proportion of each part
in the corpus should match that part's proportion in the actual variety (Gries, 7). As no one knows
these proportions, the researcher has to use an approximation. Another flaw of the corpus
methodology is its limited use of introspection. Introspection, especially by native speakers, is
necessary to determine the acceptability or meaning of a sentence. Despite the many flaws of
introspection, e.g. its being unobservable or artificial (McEnery and Wilson), it is still a useful tool
in grammar.
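The proportionality requirement described above becomes simple arithmetic once the researcher has settled on estimated shares. The Python sketch below assumes hypothetical percentage estimates (which, as noted, no one actually knows) and splits a target corpus size across categories, using the largest-remainder method so that the parts add up exactly to the total. The function name `allocate`, the category names and the figures are illustrative only.

```python
def allocate(total_words, weights):
    """Split `total_words` across categories in proportion to integer `weights`
    (e.g. estimated percentages), using the largest-remainder method so the
    allocations sum exactly to `total_words`."""
    weight_sum = sum(weights.values())
    # integer floor of each category's exact share
    alloc = {cat: total_words * w // weight_sum for cat, w in weights.items()}
    # fractional part of each share, kept as an integer numerator
    remainders = {cat: total_words * w % weight_sum for cat, w in weights.items()}
    leftover = total_words - sum(alloc.values())
    # hand out the leftover words, one each, to the categories with the largest remainders
    for cat in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[cat] += 1
    return alloc

# Hypothetical estimates of each category's share (in %) of the variety.
estimated_shares = {"spoken": 60, "written academic": 10, "written non-academic": 30}
print(allocate(1_000_000, estimated_shares))
# {'spoken': 600000, 'written academic': 100000, 'written non-academic': 300000}
```

Changing the estimates changes every target, which is why a "balanced" corpus can only ever be an approximation of the variety it samples.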
The applicability and feasibility of using a corpus-based approach to investigate a
question in grammar can be seen by looking at the replacement of the past tense of the verb to be by
the past tense of the verb to have, that is, of "there was" by "there had", in Trinidad English. The first
step is identifying a relevant corpus. A large number of corpora have been developed over
the years, including the International Corpus of English: Trinidad and Tobago (ICE-T&T).
Using ICE-T&T, the frequencies of "there was" and "there had" would be tallied to see which is
more prevalent. The next step would be to determine in which genres or categories each feature
occurs: spoken vs. written, academic vs. non-academic, magazine vs. newspaper, etc. From this,
the environments in which each occurs can be investigated, for example: Does the feature occur
sentence-initially, medially or finally? Does it precede a noun phrase or a verb phrase?
After which parts of speech does it never occur? The next step will involve consulting
grammar texts on which verb form is viewed as correct and comparing this with what is actually
being used, and in which contexts. Hence, a rule for the use of each feature can be created based
on authentic data, or a prescriptive rule can be tested.
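The counting steps of this procedure could be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of how ICE-T&T is actually distributed or annotated: it assumes the texts have already been loaded as (category, text) pairs, and the sample sentences, category labels and the function name `tally` are invented. Because "there had been" is the ordinary standard English past perfect rather than the feature under study, the sketch counts it separately. It also records the word that follows each match as a first, crude look at the environment (for example, whether a noun phrase follows).

```python
import re
from collections import Counter, defaultdict

# "there" + "was"/"had", optionally followed by the next word (captured for context).
PATTERN = re.compile(r"\bthere\s+(was|had)\b(?:\s+(\w+))?", re.IGNORECASE)

def tally(texts):
    """Count "there was" / "there had" per category.

    texts: iterable of (category, text) pairs (a hypothetical input format).
    Returns (forms, following): forms maps (category, form) to a count;
    following maps each form to a Counter of the word that comes next."""
    forms = Counter()
    following = defaultdict(Counter)
    for category, text in texts:
        for verb, next_word in PATTERN.findall(text):
            next_word = next_word.lower()
            form = "there " + verb.lower()
            # keep the standard past perfect "there had been" apart from the feature
            if form == "there had" and next_word == "been":
                form = "there had been"
            forms[(category, form)] += 1
            if next_word:
                following[form][next_word] += 1
    return forms, following

# Invented examples, for illustration only.
samples = [
    ("spoken", "There had a big crowd by the savannah. There was rain after."),
    ("written", "There had been no rain. There was a crowd at the savannah."),
]
forms, following = tally(samples)
for (category, form), n in sorted(forms.items()):
    print(f"{category:<8} {form:<15} {n}")
print(dict(following["there had"]))
```

Before comparing categories of different sizes, the raw counts would ideally be normalised (for example, per 10,000 words), since larger sections will naturally yield more hits.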
The corpus methodology has become an important tool in linguistic investigation.
Despite opposition from generative linguists, it has proven to be a valuable resource for investigating
grammar, variation and language pedagogy. Its ability to determine the frequency of a
linguistic feature easily has helped test the validity of grammatical rules that have been held and taught
for years.
Works Cited
Biber, Douglas, Susan Conrad, and Randi Reppen. Corpus Linguistics: Investigating Language
Structure and Use. Cambridge; New York: Cambridge University Press, 1998. Print.
Gries, Stefan. "What Is Corpus Linguistics?" Language and Linguistics Compass 3 (2009): 1-17.
Web. 16 Feb. 2011.
Lindquist, Hans. Corpus Linguistics and the Description of English. Edinburgh: Edinburgh
University Press, 2009. Print.
McEnery, A., and A. Wilson. "Corpus Linguistics." Information and Communications Technology
for Language Teachers (ICT4LT). Thames Valley University, 2011. Web. 21 Feb. 2011.
Meyer, Charles F. English Corpus Linguistics. Cambridge; New York: Cambridge University
Press, 2002. Print.