Interface Explorations 26
Editors
Artemis Alexiadou
T. Alan Hall
De Gruyter Mouton
Consonant Clusters and
Structural Complexity
edited by
Philip Hoole
Lasse Bombien
Marianne Pouplier
Christine Mooshammer
Barbara Kühnert
ISBN 978-1-61451-076-5
e-ISBN 978-1-61451-077-2
ISSN 1861-4167
Structural complexity of consonant clusters: A phonologist's view
Theo Vennemann
Abstract
This paper attempts a definition of consonant clusters, consonant cluster complexity, and cluster complexity reduction in a phonological perspective. In particular, since at the present stage of our knowledge a metrical (and thus: general) definition of consonant cluster complexity is not possible, a relative and structure-dependent concept is proposed: only clusters within the scope of one and the same preference law can be compared, namely evaluated as the more complex the less preferred they are in terms of that preference law. This concept, as well as ways in which cluster complexity is reduced, is illustrated with examples from various languages. They include word-initial muta-cum-liquida reductions in Spanish and Portuguese, certain cases of metathesis at a distance (e.g. Spanish periglo > peligro 'danger'), and slope displacements as in Old Italian ca.'pes.tro > ca.'pres.to 'rope', Tuscan pa.'dro.ne > pra.'do.ne 'lord, employer'. The opposite kind of development, namely the formation and complexification of clusters, is argued for the most part not to be motivated by syllable structure preferences but (a) by a variety of syntactic and morphological processes and (b) in phonology itself by rhythmically induced syncopations (e.g. syncope in Latin periculo > Spanish periglo), or to result from borrowing.
Let us begin with the question of what we mean when speaking about consonant clusters. What would be a suitable definition? Since I am a phonologist rather than a phonetician, all the definitions that follow will be phonological rather than phonetic.
The Oxford English Dictionary defines a cluster as "a collection of things of the same kind, as fruits or flowers, growing closely together; a bunch, originally of grapes" [!]. The word is attested in the language as early as the year 800. It is assumed to be a -tro-derivate of the same root that we also have in clot, clout, and cleat, German Klotz and Kloß.
In any event, a cluster consists of discrete elements, a consonant cluster of discrete consonantal elements. In traditional phonetics one learns that phonetic objects are continua. Hence a consonant cluster as a phonetic object would have to be a continuum, and that is what a cluster by definition is not. Philip
Hoole (p.c.) has assured me that modern phonetics can show that a degree of segmentation already occurs at the articulatory level, rather than only on the mental articulatory retina (for which cf. Tillmann/Mansell 1980), and that within the so-called gestural framework (Browman and Goldstein 1986, 1989, 1992), gestures whose coordination is part of a word's lexical representation "bear a close relationship to those conglomerates of gestures that constitute what is traditionally considered to be a segment" (Byrd 1996: 160).
However that may be, phonologists are dealing exclusively with discrete objects. Therefore in that regard they have no problem defining a consonant cluster, namely as a set of consonants understood as discrete objects, or more precisely as an uninterrupted sequence of two or more consonants within some well-defined unit of language, such as a syllable, word, or phrase. And if phonologists do have a problem, it is because they do not know for sure what a consonant is, an uncertainty which may also hold for phoneticians. For example, is the second speech sound in twist, twinkle, twine, twenty, twaddle, etc. and in quick, quest, quiet, quota, etc. a consonant or a vowel? If it is a consonant, then the words twist and quick begin with a consonant cluster. If the second speech sound is just the vowel /u/ in a syllable margin, namely in a complex syllable head, then those words do not begin with a consonant cluster, but rather with a sequence of consonant and vowel within a syllable head. Perhaps that is actually what phonologists mean when speaking of consonant clusters: an uninterrupted sequence of marginal speech sounds, i.e. a sequence of speech sounds not interrupted by a syllable nucleus (nor, of course, by a pause). And this may be the only legitimate meaning if we take seriously the idea that the speech sounds of any language can be arranged hierarchically on scales of increasing consonantality, or decreasing sonority, without any break-off point, as in (1).
This particular scale is the one presented in Vennemann (1988: 9). There are other arrangements. Some authors use finer scales, for example scales which hierarchize obstruents and nasals by place of articulation, and vowels on the frontness parameter. Conversely, there are less fine-graded scales, such as scales lumping all obstruents or all vowels together or not distinguishing lateral and central liquids in terms of strength. Thus one often sees the simple scale V L N F P (vowels, liquids, nasals, fricatives, plosives). For some languages even this scale may prohibit certain generalizations. The only scale for which I have never seen contrary language material is V R O (vowels, sonorants, obstruents). The above scale may be the most fine-graded that most linguists can agree on. When finer distinctions are made, language-specific differences begin to play a role, and linguists will begin to differ.
The scalar nature of the consonantality, or conversely the sonority, of the
speech sounds in any language is a venerable concept, much worked with by
Sievers (1901), among others. The history of the concept is described in chapter
2 of Murray (1988).
Turning now to the question of clustering, there follow some definitions, (2) to (7).
(2) A cluster is an uninterrupted sequence of cardinality greater than one.
Mathematicians would undoubtedly let the cardinality begin with zero, i.e., they would admit empty clusters and unit clusters. But in everyday usage a cluster of objects contains at least two objects. The Oxford English Dictionary expresses that much by defining a cluster as "a collection of things". Indeed, we would not, except perhaps jokingly, call a single painting, or no painting at all, an art collection.
(3) A consonant cluster is a cluster of marginal speech sounds (i.e., a cluster
of speech sounds not interrupted by a nuclear speech sound).
With C for marginal speech sounds and V for nuclear speech sounds, and with
$ (or a period, .) for a syllable boundary, CC, C$C, CCC, C$CC, CC$C etc.
are consonant clusters, CVC, CV$C, CVCC, CCVCC, CV$CC etc. are not.
(4) A head cluster is a consonant cluster entirely within a syllable head.
(5) A coda cluster is a consonant cluster entirely within a syllable coda.
(6) An intersyllabic cluster is a consonant cluster containing both coda and
head speech sounds.
C$C, CC$C, C$CC etc. are intersyllabic clusters.
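Definitions (2) to (6) are mechanical enough to be stated as a small sketch. Assuming, as above, strings over C (marginal speech sound), V (nuclear speech sound), and $ (syllable boundary), the hypothetical function below extracts and classifies consonant clusters; the function name and the string representation are illustrative only, not part of the paper's apparatus.

```python
def consonant_clusters(word):
    """Classify consonant clusters in a string of C (marginal sound),
    V (nuclear sound) and $ (syllable boundary), per definitions (2)-(6)."""
    margins = word.split('V')                 # stretches between nuclei
    clusters = []
    for i, m in enumerate(margins):
        if m.count('C') < 2:                  # (2): cardinality must exceed one
            continue
        core = m.strip('$')                   # ignore boundaries at the edges
        if '$' in core:
            kind = 'intersyllabic'            # (6): a $ separates the C's
        elif i == 0 or m.startswith('$'):
            kind = 'head'                     # (4): wholly before its nucleus
        else:
            kind = 'coda'                     # (5): wholly after its nucleus
        clusters.append((core, kind))
    return clusters
```

For instance, consonant_clusters('CCVC$CVC') yields a head cluster CC and an intersyllabic cluster C$C, matching the judgments under (3) to (6).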
clusters can be compared and judged more or less complex, or even some
numerical scale or measure for the structural complexity of consonant clusters,
then phonology cannot help. But we can do two things below that level of
generality. First, we can compare any two consonant clusters in terms of
structural complexity that are on one and the same quality scale of one of the
preference laws, viz. the Head Law, the Coda Law, and the Contact Law,
cf. (9).
(9) The structural complexity of consonant cluster A is greater than that of
consonant cluster B if B is more preferred than A in terms of one of the
preference laws.
Since the preference laws for syllable structure refer to structural aspects of
syllables, it follows that cluster complexity is structure-dependent. For example, kl is less complex than lk in syllable heads but more complex in syllable
codas. This is recognized in (9) by relativizing complexity comparisons to a
particular preference law, in this case either to the Head Law or to the Coda
Law. It makes no sense, in this framework, to ask which of the two clusters,
kl or lk, is less complex an sich, i.e. without such structural relativization.
Second, we can say what contributes to the structural complexity of consonant clusters, if we correlate this concept with that of linguistic quality in
terms of preference, i.e. of graded naturalness, cf. (10).
(10) Every property that makes a consonant cluster less preferred relative to
some other consonant cluster contributes to the structural complexity of
the given consonant cluster.
Let us illustrate (9) above with a straightforward example that everyone knows.
When initial consonant clusters of a plosive and a sonorant are eliminated (eliminated because not only clusters of cardinality greater than two are complex but all clusters are; only single consonants are good), there is an order that may not be broken. Thus, in English, on the partial scale in (11),
(11) English: *kn- ?kl- kr-
→ increasing quality of head clusters
all three clusters existed in Old English, as they do in Contemporary German. In Contemporary English, the worst of these clusters is gone: German Knie is English knee, where the k- is still spelled but no longer spoken, nor is it speakable; the cluster is ungrammatical as a word-initial head cluster. The next on
the quality scale, kl-, is unstable in some dialects, t or a glottal stop or something else, barely audible, being spoken instead of k (cf. Luick 1914: 801, 802, Fisiak 1980, and Lutz 1991: 251–254 with further references). The cluster kr- is intact everywhere. As phonologists we can explain this by reference to the Head Law, part (c). If one compares (11) to (1) farther up, it becomes
obvious that the sonorants in the clusters are arranged along the scale of
Consonantal Strength. Part (c) of the Head Law, the preference law for the
structure of complex syllable heads, says that a head cluster is the more preferred the sharper the Consonantal Strength drops from the first head speech sound to the next. And as one can see in (11.a),
(11.a) k n l r
→ decreasing Consonantal Strength
the drop from k to n is smallest, which means kn- is most disfavored. Many
phonologists see this as an explanation for the loss of kn-: the English development instantiates a universal preference law. This preference law in turn is a
generalization gained by many phonologists looking at the structure and
changes in many languages (see for example Greenberg 1978).
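The reasoning behind (11) and (11.a) can be sketched programmatically. The strength values below are illustrative ranks read off the partial scale k > n > l > r, not figures from Vennemann (1988), and the function name is invented for the sketch.

```python
# Head Law, part (c): a head cluster C1C2- is the more preferred the
# sharper the Consonantal Strength drops from C1 to C2.
# Illustrative ranks on the partial scale k > n > l > r of (11.a):
STRENGTH = {'k': 4, 'n': 3, 'l': 2, 'r': 1}

def head_drop(cluster):
    """Strength drop from the first to the second head consonant."""
    c1, c2 = cluster
    return STRENGTH[c1] - STRENGTH[c2]

# Rank the three Old English head clusters from worst to best:
# the smallest drop (kn-) is most disfavored, mirroring *kn- ?kl- kr- in (11).
ranked = sorted(['kn', 'kl', 'kr'], key=head_drop)
```

Under these illustrative values, ranked comes out as kn-, kl-, kr-, reproducing the order on the quality scale in (11).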
Looking at the quality scale for initial consonant clusters of plosive and sonorant in more general terms (cf. the scale in (12)),
(12) A quality scale for CC heads, with plosives (P) as onset and plosives (P), fricatives (F), nasals (N), lateral liquids (l), central liquids (r), and semivowels (V) on the slope
PP- PF- PN- Pl- Pr- PV-
→ increasing quality of head clusters
we see in (13) to (19) how nicely languages range from instantiating the full scale to no such head clusters at all.
(13) Classical Greek
PP- PF- PN- Pl- Pr- PV-
+ + + + + (+)
(14) Contemporary Greek, German
PP- PF- PN- Pl- Pr- PV-
+ + + + +
with degeminating first in syllable heads and codas, later generally: scāf, skāf- > Schaf [ʃaːf] 'sheep', wascan > waschen [ˈvaʃn̩] 'to wash', tisc, tisk- > Tisch [tɪʃ] 'table'. The same cluster was soon reintroduced in loanwords: Skat (a card game, < Ital. scarto 'discarded playing cards'), Skandal, Skrupel, Sklave, Maske, grotesk.
organized their subject matter, I would like at this point to do just the opposite,
i.e., point out that in reality we do not really understand how complexity
problems of this sort are solved in any given case. Not only can we not predict
whether or when a complexity problem comes under attack, we also cannot
predict which of several possible solutions to the problem will be chosen,
so to speak. For example, we understand perfectly that a head cluster C1C2-
that is dispreferred according to the Head Law, part (c), is structurally complex
and therefore likely to come under attack. But whether the problem is resolved
by deleting the onset consonant C1 as too weak or the slope consonant C2
as too strong, or by manipulating the strength of one of the two, namely by
strengthening C1 or by weakening C2, and with what result, or whether a
vowel will be inserted to break up the cluster and achieve a nice C1V.C2V
sequence, or whether the cluster will be partially removed from the head posi-
tion by prosthesis and heterosyllabized as a medial cluster, VC1.C2V, we do
not yet know. All of these measures are on record for various languages, see
the partial illustration in (23) to (31).
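The space of repair strategies just listed can be put into a small sketch; the function and the strategy labels are illustrative, and the two featural repairs (strengthening C1, weakening C2) are omitted since they manipulate features rather than the segment string.

```python
# A sketch enumerating the structural repairs the text lists for a
# dispreferred head cluster C1C2-. The paper's point stands: we cannot
# predict WHICH of these candidates a given language will choose.
def candidate_repairs(c1, c2):
    """Candidate resolutions for a word-initial head cluster C1C2-."""
    return {
        'delete C1 (onset too weak)':   c2,                      # e.g. kn- > n-
        'delete C2 (slope too strong)': c1,
        'anaptyxis (C1V.C2V)':          c1 + 'V.' + c2 + 'V',    # vowel insertion
        'prosthesis (VC1.C2V)':         'V' + c1 + '.' + c2 + 'V',
    }

repairs = candidate_repairs('k', 'n')
# the attested English repair for kn- is deletion of C1, cf. (23)
```

Running the sketch on kn- yields, among others, the historically attested English outcome n-.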
(23) Kn- > n- in English
(24.a) Cl- > Cr- in Portuguese, see (22) above
(24.b) Cl- > Ci̯- in Italian
(25.a) wl- > l- in English, German, Old Norse (Lutz 1997)
(25.b) wl- > bl- in English, German (Lutz 1997), also in Classical Greek
(25.c) wl- > - in German dialect (Lutz 1997)
(25.d) wr- > r- in Scandinavian, English, German dialects (Lutz 1997)
(25.e) wr- > br- in English, German (Lutz 1997)
(26.a) hn- > n- in almost all of Germanic (Lutz 1997)
(26.b) hn- > gn-, kn- in Scandinavian dialects (Lutz 1997)
(26.c) hn- > sn- in Swedish (Lutz 1997)
(27.a) hl- > l- in almost all of Germanic (Lutz 1997)
(27.b) hr- > r- in almost all of Germanic (Lutz 1997)
(28.a) hw- > w- in almost all of Germanic (Lutz 1997)
(28.b) hw- > kw-/kv- in Scandinavian (Lutz 1997)
(29.a) fn- > n- in almost all of Germanic (Lutz 1997)
The following are some tricky cases of sound change which, in times before phonologists' thinking in terms of graded naturalness, or preferences, developed, were simply dubbed metatheses at a distance. Let us look at (32).
(32) Lat. periculum > Span. peligro 'danger' : r - l > l - r
We see r and l changing places, a clear case of metathesis if there ever was one. How do we explain it? Do r and l simply exchange position in Spanish? Certainly not, because the change does not always happen, not even in words of the same rhythmic structure as peligro, cf. (33).
(33) Lat. alacrem > alegre 'lively, merry' : l - r > idem
So is (32) a simple case of confusion? Certainly not, see (34).
(34) Lat. miraculum > milagro 'miracle' : r - l > l - r
Lat. parabola > palabra 'word' : r - l > l - r
(32) and (34) apparently follow a rule. Is the rule then to change r - l into l - r but not conversely? Not that either, cf. (35).
(35) Lat. aprilem > abril 'April' : r - l > idem
So both l - r and r - l may remain unchanged, and the question is still why r - l metathesizes precisely in the environment set up by the group in (32) and (34), and there without exception.
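On the account developed here, the metathesis in (32) and (34) applies exactly when exchanging the two liquids sharpens the strength drop inside the muta-cum-liquida cluster (Head Law, part (c)). A minimal sketch, assuming the illustrative ranking of lateral above central liquids from the scale in (11.a); the function name and values are invented for the sketch:

```python
# Illustrative strength ranks: the lateral liquid l outranks central r,
# as on the finer scales discussed earlier in the text.
STRENGTH = {'l': 2, 'r': 1}

def improves(lone_liquid, cluster_liquid):
    """True iff swapping the lone liquid with the liquid inside the
    obstruent+liquid cluster yields a sharper strength drop there."""
    return STRENGTH[lone_liquid] < STRENGTH[cluster_liquid]

# periglo: lone r, cluster liquid l  -> swap improves gl to gr (peligro)
# alegre:  lone l, cluster liquid r  -> no improvement, hence no change
```

Applied to the data, the sketch separates (32) and (34), where the swap improves the cluster, from (33) and (35), where it would not.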
(38) $trV rather simple, Vtr$ very complex (Vrt$ less complex)
But whether a cluster is more or less complex depends not only on its position
in the syllable but also on the position of that syllable in larger structures,
especially the word. Please look at (39).
(41.a) The Early Syllable Law: All syllabic complexities are less disfavored the earlier they occur within the word.
(41.b) The First Syllable Law: All syllabic complexities are less disfavored in first syllables than in later syllables.
See the example in (42).
9. Conclusion
In the preceding sections of this paper it has been shown what structural complexity of consonant clusters, and what change (especially reduction) of consonant cluster complexity, may mean in phonology. The phoneticians will clarify and illustrate these terms in their own language. Since it counts as the hallmark of a good scientific approach to be compatible with approaches in neighboring disciplines, and since phonetics is the closest neighbor of
Appendix
Excerpts from Vennemann 1988. The laws are also cited and illustrated in
Restle and Vennemann 2001. Numbers refer to pages in Vennemann 1988,
except for the Early Syllable Law and the First Syllable Law where they refer
to Vennemann 1997.
References
Byrd, Dani
1996 A phase window framework for articulatory timing. Phonology 13:
139–169.
Browman, C. P., and Louis Goldstein
1986 Towards an articulatory phonology. Phonology Yearbook 3: 219–252.
Browman, C. P., and Louis Goldstein
1989 Articulatory gestures as phonological units. Phonology 6: 201–251.
Browman, C. P., and Louis Goldstein
1992 Articulatory phonology: An overview. Phonetica 49: 155–180.
Fisiak, Jacek
1980 Was there a kl-, gl- > tl-, dl- change in Early Modern English? Lingua Posnaniensis 23: 87–90.
Greenberg, Joseph H.
1978 Some generalizations concerning initial and final consonant clusters. In: Joseph H. Greenberg (ed.), Universals of human language, 4 vols, vol. 1: Phonology, 243–279. Stanford, California: Stanford
University Press.
Krahmalkov, Charles R.
2001 A Phoenician-Punic grammar (Handbook of Oriental Studies, Section one: The Near and Middle East 54). Leiden: Brill.
Lipski, John M.
1992 Metathesis as template-matching: A case study from Spanish. Folia
Linguistica Historica 11 (1990 [1992]): 89–104, and 12 (1991 [1992]): 127–145.
Luick, Karl
1914–1940 Historische Grammatik der englischen Sprache, 2 vols. Leipzig:
Bernhard Tauchnitz. [Reprint Stuttgart: Bernhard Tauchnitz, 1964.]
Lutz, Angelika
1991 Phonotaktisch gesteuerte Konsonantenveränderungen in der Geschichte des Englischen (Linguistische Arbeiten 272). Tübingen: Niemeyer.
Lutz, Angelika
1997 Lautwandel bei Wörtern mit imitatorischem oder lautsymbolischem Charakter in den germanischen Sprachen. In: Kurt Gustav Goblirsch, Martha Berryman Mayou and Marvin Taylor (eds.), Germanic studies in honor of Anatoly Liberman, 439–462. (NOWELE 31/32.) Odense: Odense University Press.
Morelli, Frida
1998 Markedness relations and implicational universals in the typology of
onset obstruent clusters. Proceedings of the Annual Meeting of the
North Eastern Linguistic Society [NELS] 28, vol. 2. Available on the
Internet at http://ebookbrowse.com/roa-251-morelli-2-pdf-d6710926
(24 April 2011).
Morelli, Frida
1999 The phonotactics and phonology of obstruent clusters in optimality
theory. Ph.D. Dissertation, University of Maryland at College Park.
Available on the Internet at http://roa.rutgers.edu/view.php3?id=432
(24 April 2011).
Murray, Robert W.
1982 Consonant cluster development in Pāli. Folia Linguistica Historica 3: 163–184.
Murray, Robert W.
1988 Phonological strength and Early Germanic syllable structure (Studies
in Theoretical Linguistics 1.) Munich: Wilhelm Fink.
Murray, Robert W., and Theo Vennemann
1982 Syllable contact change in Germanic, Greek, and Sidamo. Klagenfurter Beiträge zur Sprachwissenschaft 8: 321–349.
Restle, David, and Theo Vennemann
2001 Silbenstruktur. In: Martin Haspelmath, Ekkehard König, Wulf Oesterreicher and Wolfgang Raible (eds.), Sprachtypologie und sprachliche Universalien: Ein internationales Handbuch, II.1310–1336. (Handbücher zur Sprach- und Kommunikationswissenschaft 20.) 2 vols. Berlin: Walter de Gruyter.
Rochoń, Marzena
2000 Optimality in complexity: The case of Polish consonant clusters.
(Studia Grammatica 48.) Berlin: Akademie-Verlag.
Rohlfs, Gerhard
1972 Historische Grammatik der italienischen Sprache und ihrer Mundarten. (Bibliotheca Romanica 5.) 3 vols. Vol. I: Lautlehre. 2nd unchanged ed. [1st ed. 1949.] Bern: Francke.
Sievers, Eduard
1901 Grundzüge der Phonetik zur Einführung in das Studium der Lautlehre der indogermanischen Sprachen. 5th ed. Leipzig: Breitkopf & Härtel. [Reprint Hildesheim: Georg Olms 1976.]
Tillmann, Hans G., with Phil Mansell
1980 Phonetik: Lautsprachliche Zeichen, Sprachsignale und lautsprachlicher Kommunikationsprozeß. Stuttgart: Klett-Cotta.
Trask, R. Larry
1997 The history of Basque. London: Routledge.
Vennemann, Theo
1988 Preference laws for syllable structure and the explanation of sound
change: With special reference to German, Germanic, Italian, and
Latin. Berlin: Mouton de Gruyter.
Vennemann, Theo
1989 Language change as language improvement. In: Vincenzo Orioles (ed.), Modelli esplicativi della diacronia linguistica: Atti del Convegno della Società Italiana di Glottologia, Pavia, 15–17 settembre 1988, 11–35. Pisa: Giardini Editori e Stampatori. [Reprinted in:
On the relations between [sonorant] and [voice]
Rina Kreitman
Abstract
In previous literature it has been reported that the features [sonorant] and [voice] are
closely related. Voicing has long been linked to the feature [sonorant] as one of its
phonetic correlates, since voicing is one of the attributes common to all sonorant con-
sonants. It has been suggested that the distribution of the feature [voice] in clusters can
be predicted from the behavior of the feature [sonorant]. If sonority reversed clusters
are prohibited, voicing reversals, a situation where voicing decreases within a cluster
pre-vocalically, should not be tolerated either (Lombardi 1991). Here, I report on a
cross-linguistic typological study of the distribution of these two features in word-
initial onset clusters and how they relate to one another. The different typological
patterning of the two features and their internal markedness imply that it is impossible
to predict the typological patterning of clusters in terms of one of these features based
on the other. A language can be of one type in terms of [sonorant] but of a different
type in terms of [voice]. The typology presented can further predict language type
shifts due to historical changes. The prediction is that, no matter what stage a language is in, it must become a type of language predicted by the typology.
1. Introduction
1. Languages which are argued to rely on features other than [voice] to distinguish between obstruents were excluded from the survey, as will become evident in section 3.
own findings, to be reported here, do not support this position. Rather, I show
that the organization of onset clusters in terms of the feature [sonorant] follows
a different pattern from the organization of onset clusters in terms of the feature [voice]. I show that the claim that [+voice][−voice] clusters are closely correlated with SO clusters (Lombardi 1991) is untenable.
While it is possible that the two features [sonorant] and [voice] are closely
linked phonetically (Parker 2002, 2008), it is not immediately transparent that
they are mutually dependent. As will become evident from the typologies
presented here, the typological patternings of the two features are entirely
independent of each other and therefore, these two features cannot be reduced
to a single feature. Moreover, I show that the patterning of one feature does
not provide any clues about the typological patterning of the other feature.
Furthermore, the markedness relations of clusters in terms of the feature
[sonorant] are quite different from markedness relations in terms of the feature
[voice], which will become evident in the discussion in section 4. The typolo-
gies I present are a result of a cross linguistic survey, which included 63 lan-
guages from 22 language families.
The typologies presented here are based strictly on the phonological features
[sonorant] and [voice]. It is important to note that in this work I discuss the
feature [sonorant], which partitions the consonant set into two classes: the
class of obstruents and the class of sonorants. Following Zec (1995), I address
only the classes of obstruents and sonorants and do not address any further
distinctions within these classes. In other words, the phonological feature [sonorant] is not equated with the commonly used property 'sonority', expressed in terms of a scale. This paper does not address the further fine-grained distinctions found in more elaborate sonority scales or the behavior of such sonority scales but rather explores the relationship between the feature [sonorant] and the feature [voice].
In word-initial, bi-consonantal onset clusters there are four logical combinations of obstruents (O), standing for [−sonorant] consonants, and sonorants (S), standing for [+sonorant] consonants. The four logical possibilities for combining obstruent (O) and sonorant (S) consonants in an onset cluster are as in (1):
(1) a. OS b. OO c. SS d. SO
In the obstruent (O) class only consonantal segments specified for [−sonorant] are included; this includes both stops and fricatives. Conversely, only segments
specified for [+sonorant] are included in the sonorant (S) class. For the purpose
of this survey only, this latter group consisted of liquids and nasals. Glides
were excluded for reasons listed in (5).
Logically, a language can have any of the clusters in (1), or any combina-
tion of them, or none. A language that has none of the clusters listed in (1) is,
of course, a language that does not allow any consonantal clusters. We exam-
ine only those languages which allow at least one of the clusters listed in (1).
Given the cluster combinations in (1), a-priori there are fifteen logical possibilities for combining these clusters into groups of one to four cluster types. Therefore, a-priori there are fifteen logically possible language types, as in (2).
If a language L has only one of the onset clusters listed in (1), it can, a-priori,
be any one of them, as in (2a). If a language has two of the onset clusters in
(1), it can, a-priori be any of the sets listed in (2b). If a language has three of
the onset clusters in (1), it can have any of the sets listed in (2c). Finally, it is
logically possible for a language to have all four onset clusters listed in (1), as
in (2d). A language that has no onset clusters constitutes an empty group, { },
which is a sixteenth logically possible language type and is excluded from this
study.
(2) a. 1 cluster b. 2 clusters c. 3 clusters d. 4 clusters
{OS} {OS,OO} {OS,OO,SS} {OS,OO,SS,SO}
{OO} {OS,SS} {OS,OO,SO}
{SS} {OS,SO} {OS,SS,SO}
{SO} {OO,SS} {OO,SS,SO}
{OO,SO}
{SS,SO}
In sum, in (2) I list all fifteen logically possible language types (excluding
the empty group). The question arises, which of the logically possible language
types in (2) are occurring language types. To address this, I conducted a cross-
linguistic survey of languages which allow word initial onset clusters. The
methodology of the survey is outlined in section 2.1 and the results of the
survey are presented in section 2.2.
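The count of fifteen in (2) follows directly: the possible language types are the non-empty subsets of the four cluster types in (1), and there are 2^4 − 1 = 15 of them. A short sketch of the enumeration, with illustrative variable names:

```python
from itertools import combinations

# The fifteen logically possible language types of (2): every non-empty
# subset of the four word-initial onset cluster types in (1).
CLUSTER_TYPES = ['OS', 'OO', 'SS', 'SO']

language_types = [set(combo)
                  for size in range(1, 5)
                  for combo in combinations(CLUSTER_TYPES, size)]

print(len(language_types))  # prints 15: 4 + 6 + 4 + 1 subsets
```

The sizes group exactly as in (2): four one-cluster types, six two-cluster types, four three-cluster types, and one four-cluster type.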
2. Languages were also excluded for technical reasons, for example, if sources of data were incomplete or inconclusive. Some sources, for example, Matthews (1955) for Dakota, and Hoff (1968) for Carib, do not make a clear distinction between word-initial and word-medial clusters, which makes it impossible to distinguish them. Moreover, for Dakota, different grammars listed different possible clusters. Also excluded were languages for which data from different sources were inconsistent. One such example is Chukchee (Bogoras 1922, Kenstowicz 1981 and Levin 1985, among others). Some sources claim that Chukchee contains initial clusters (Levin 1985 following Bogoras 1922) while others (Kenstowicz 1981) claim that clusters in Chukchee are broken by vowel epenthesis. Skorik (1961) explains that in Chukchee some consonantal sequences can appear word-initially either with or without a vowel, but when the same sequence appears in an onset position word-medially, it must appear with the vowel between the two segments or with a preceding vowel, suggesting that the consonant sequences are not truly clusters underlyingly. This is confirmed in Asinovskii's (1991) acoustic data.
(5) Additional conditions and criteria for excluding languages from the
survey:
(i) A language was excluded if it had only obstruent + glide clusters. For example, Korean, which has obstruent + glide clusters such as py and gw, was not included.3
(ii) Also excluded from the survey were languages with only homorganic
nasal + obstruent clusters such as mb and nd. For example, Babungo
(Schaub 1985) has only simplex onsets and pre-nasalised onsets and
no other clusters. The phonological status of pre-nasalised sequences
is not immediately transparent. Such sequences can be a cluster or
a pre-nasalised segment (Maddieson and Ladefoged 1993, Riehl
2008). Without more information about the phonological status of
these sequences, it is impossible to determine whether a specic
sequence is a cluster or simply a pre-nasalised unary segment.
Languages which have non-homorganic nasal-obstruent sequences
in addition to homorganic nasal-obstruent sequences were included
in the survey. For example, if a language has mb clusters but also
mt or mk clusters (Taba, Bowden 2001), then the language was
included in the survey but the homorganic clusters were excluded
(i.e. they were not counted as SO clusters since their underlying
status is not always transparent, and they may or may not be clusters). The non-homorganic clusters were included in the survey.
(iii) Also excluded were languages which have only h + obstruent or ʔ + obstruent clusters, or obstruent + h and obstruent + ʔ clusters, such as Comanche (Riggs 1949), since these may function as pre- or post-aspiration or glottalization.4
In sum, the survey focuses on languages which allow bi-consonantal word
initial onset clusters. Some of the languages included in the survey, such
as Chatino (McKaughan 1954), Georgian (Butskhrikidze 2002), and Polish
(Sawicka 1974), to name a few, allow clusters longer than two consonants
but those clusters were not the focus of this survey.
3. Clusters with glides as the second member are not included in this survey. Surface
glides may have a different underlying status. They may be underlying glides that
surface as glides or they may be underlyingly vowels that surface as glides (Levi
2004, 2008). Due to the lack of transparency in the underlying status of glides, clusters with glides were excluded from the survey altogether.
4. Mazatec (Steriade 1994) and Temoayan Otomi (Andrews 1949) are examples of
languages which have mostly pre- and post-aspirated and pre- and post-glottalised
sequences as well as pre-nasalised sequences; therefore, they were excluded from the survey altogether.
Type     OS  OO  SS  SO  Language
Type 1   ✓               Basque, Wa
Type 2   ✓   ✓           Kutenai, Modern Hebrew
Type 3   ✓   ✓   ✓       Greek, Irish
Type 4   ✓   ✓   ✓   ✓   Georgian, Russian, Pashto
In sum, evident from Table (1) are the implicational relations captured in
(7). The implicational relations in (7) are all unidirectional and without exceptions in the languages of the survey. Next, I single out crucial asymmetries
evident in Table (1) and the implicational relations in (7).
(7) SO → SS → OO → OS
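The implicational relations in (7) can be checked mechanically: if a language has a cluster type, it must also have every type that type implies. A sketch, with illustrative names:

```python
# The implicational hierarchy of (7): SO implies SS, which implies OO,
# which implies OS. A cluster inventory is well-formed iff, whenever a
# type is present, everything to its right in the chain is present too.
HIERARCHY = ['SO', 'SS', 'OO', 'OS']

def obeys_hierarchy(inventory):
    """True iff the inventory respects the implications in (7)."""
    for i, ctype in enumerate(HIERARCHY):
        if ctype in inventory:
            # all implied types (to the right) must also be present
            if not all(t in inventory for t in HIERARCHY[i:]):
                return False
    return True
```

Of the fifteen logically possible types in (2), only the four attested ones ({OS}, {OS,OO}, {OS,OO,SS}, {OS,OO,SS,SO}) pass this check, matching Table (1).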
First, there is an asymmetry between the right and left edges of the implicational relations. The presence of SO clusters implies the presence of all other
clusters while OS clusters are implied by all other clusters. This asymmetry is
expected given that SO is of falling sonority, that is, violates the SSP, while
OS has a rise in sonority, i.e. conforms to the SSP. Based on the SSP we
expect clusters with rising sonority to occur more frequently than clusters
with reversed sonority. It is important to note that in this work an increase
or a rise in sonority means an increase from a negative value of the feature
[sonorant] to a positive one. That is, there is an increase in sonority from an obstruent to a sonorant.
            OS     OO     SS     SO
# of langs  63/63  54/63  32/63  19/63
%           100%   85.7%  50.8%  30%
Table (2) presents the distributional data for each cluster type. From
Table (2) it is evident that if a language allows a consonantal cluster word
initially, it will allow an OS cluster. More surprising is the frequency of OO,
SS and SO cross-linguistically. First, 30% of the languages in the survey admit
one or more SO clusters. This number is quite significant, making SO clusters
much more common than previously assumed. They are not anomalies occur-
ring rarely; rather, they occur cross-linguistically in languages as varied as
Russian (Indo-European) and Hua (Trans-New Guinea). Secondly, the asym-
metry between OO and SS clusters is quite robust, with OO clusters being
more than one and a half times more common than SS clusters, although
both constitute sonority plateaus.
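The implicational hierarchy in (7) and the percentages in Table (2) lend themselves to a small computational check. The following is a hypothetical sketch, not part of the survey itself: the sample inventories encode only the example languages named in Table (1), and all helper names are my own.

```python
# Hierarchy from most to least marked: SO implies SS, SS implies OO, OO implies OS.
HIERARCHY = ["SO", "SS", "OO", "OS"]

def respects_implications(inventory):
    """True if every cluster type present also brings along all types
    it implies (everything to its right in HIERARCHY)."""
    for i, t in enumerate(HIERARCHY):
        if t in inventory and not all(u in inventory for u in HIERARCHY[i + 1:]):
            return False
    return True

# Sample languages from Table (1), one per type (illustrative only).
SAMPLE = {
    "Basque":  {"OS"},                    # Type 1
    "Kutenai": {"OS", "OO"},              # Type 2
    "Greek":   {"OS", "OO", "SS"},        # Type 3
    "Russian": {"OS", "OO", "SS", "SO"},  # Type 4
}

assert all(respects_implications(inv) for inv in SAMPLE.values())
assert not respects_implications({"SO"})  # *{SO} alone is predicted not to occur

# Percentages as in Table (2): share of the 63 surveyed languages.
share = lambda n: round(100 * n / 63, 1)
assert (share(63), share(54), share(32), share(19)) == (100.0, 85.7, 50.8, 30.2)
```

The `share` computation reproduces the row of percentages in Table (2) from the raw counts, including the 30.2% figure for SO clusters.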
3. Voicing typology
Since only obstruents are specified for [voice], only a subset of the clusters
examined in section 2 will be the focus of this section.
Lombardi claims that the prohibition against [+v][-v] onset clusters is uni-
versal. According to her, voiced segments may occur only before a sonorant
segment, either a vowel or a sonorant consonant. Her argument continues that
a [+v][-v] obstruent cluster cannot be an occurring cluster type because voice-
less segments cannot intervene between a voiced obstruent and a vowel. She
refers to the figure in (9) as a Universal Sonority Constraint, ". . . an absolute
universal which no language can violate" (1991: 59). Moreover, Lombardi
correlates the prohibition in the figure in (9) with the prohibition on sonority-
reversed clusters. For her, voicing reversals are comparable to SO clusters.
As we will see in the next section, this parallel is untenable.
Likewise, Lindblom (1983) claims, based on the principle of gestural
economy, that [+v][-v] clusters should be excluded on phonetic grounds.
From (10) it is clear that the majority, or two thirds (66.7%), of the clusters
in Greenberg's survey are sequences of voiceless obstruents, [-v][-v]. All other
cluster types constitute the remaining third. Of these, almost 22% are [+v][+v]
and just under 11% are [-v][+v] clusters. Under 1% of all clusters are [+v][-v]
clusters. However, the numbers are somewhat misleading, since Greenberg
does not separate obstruent clusters from sonorant clusters. A tn cluster, for
example, is considered a [-v][+v] cluster. This skews the numbers of [-v][+v]
clusters and [+v][+v] clusters, making it difficult to correctly decipher the sta-
tistical data.
Evident from this survey is that clusters in which both members are voice-
less are preferable to clusters with any other voicing combination. Mixed-voic-
ing clusters are a great minority, at just a little over 12% of all clusters, but both
[-v][+v] and [+v][-v] clusters exist. However, while Greenberg accepts the
existence of [-v][+v] clusters, he doubts the existence of [+v][-v] clusters,
although his survey lists two languages, Bilaan and Khasi, for which obstruent
[+v][-v] clusters have been reported. Since Greenberg's sources for Bilaan and
Khasi (Dean 1955 and Rabel 1961, respectively) presented no phonetic evi-
dence for [+v][-v] obstruent clusters, Greenberg allows for the possibility
that the reported [+v][-v] clusters are phonetically realised as [-v][-v];
clusters like bt reported for Khasi and bs reported for Bilaan might actually
be phonetically realised as pt and ps respectively. Since Bilaan does not distin-
guish between b and p, and contains only b in its phonemic inventory, it is
possible that the cluster bs listed in the grammar is phonetically realised as ps.
Some languages are claimed to have a laryngeal contrast based on the feature
[spread glottis] (Iverson and Salmons 1995, Jessen 2001, Jessen and Ringen
2002 among others). That is, they are claimed to have a distinction between
aspirated and unaspirated stops rather than voiced and voiceless stops. For
some of these languages (e.g. German), there are conflicting claims regarding
the proper distinctive laryngeal feature. Since the nature of the distinctive
feature in these languages is controversial but outside the scope of this work,
these languages were excluded from the survey for the feature [voice]. The
methodology I employed is the same as outlined in section 2.1. The languages
included in this section are also listed in appendix I.
Results of the survey indicate that in reality clusters of the [+v][-v] type do
occur, albeit rarely. Six languages are reported to contain such clusters, and
three cases have been documented with supporting phonetic evidence in
the literature:
(i) Khasi, in which dk in dkar 'tortoise' is distinct from tk in tkor-tkor
'plump and tender'. According to Henderson (1991), dissimilation of
voicing is a widespread feature in Khasi. However, few phonetic details
are available, and, unfortunately, in the case of the only instrumental
investigation (Henderson 1991, reproduced in Kreitman 2008, 2010) it
is not clear that the material was produced by a native speaker.
(ii) Tsou, in which s is distinct from ps (Wright 1996, reproduced in Kreitman
2008, 2010);5
(iii) Modern Hebrew, in which dk in dkalim 'palms' is distinct from tk in
tkarim 'flat tires' and dg in dgalim 'flags' (Kreitman 2008, 2010).
Figure 1 is a spectrogram of the word dkalim 'palm trees' from Modern
Hebrew (Kreitman 2008). It is an illustrative sample which provides acoustic
phonetic evidence for the existence of [+v][-v] clusters, in addition to the evi-
dence available from Khasi and Tsou. A much wider range of utterances and
many more examples of the occurrence of [+v][-v] clusters can be found in
Kreitman (2008, 2010).
Given these facts, Lombardi's cross-linguistic prohibition against [+v][-v]
clusters and Lindblom's prediction that [+v][-v] clusters cannot be produced
have no empirical basis.
The presence of at least one varied voicing combination implies the presence
of a [-v][-v] cluster. But in a Type 3 language both a [-v][+v] cluster and a
[+v][+v] cluster are present, as in Georgian. By implication, a Type 3 language
also contains a [-v][-v] cluster, as in (14):
(14) [+v][+v]
       +
     [-v][+v] → [-v][-v]
A Type 4 language has all possible voicing combinations, as in (12). Lan-
guages which belong to this type include Modern Hebrew, Tsou, Hua and
Khasi.7
A Type 5 language, however, contains only one cluster with varying voic-
ing and, by implication, also a [-v][-v] cluster, as in (15) below. Languages
which belong to this type include Biloxi and Camsa.
(15) [-v][+v] → [-v][-v]
A Type 6 language has both possible varying voicing clusters, [-v][+v] and
[+v][-v], and therefore by implication also [-v][-v] clusters, as in (16):
(16) [+v][-v] → [-v][+v] → [-v][-v]
A Type 6 language is typologically predicted on the basis of the implica-
tional relations in (12); in Table (3) this is exemplified by Bilaan and Amuesha.
The only available grammatical description of Bilaan (Dean 1955) lists [-v][-v],
[-v][+v] and [+v][-v] as occurring clusters, making Bilaan a Type 6 language.
However, as mentioned previously, in the absence of phonetic evidence the cases
of [+v][-v] in Bilaan are suspect. Without further phonetic investigation it is
impossible to determine whether the [+v][-v] clusters are realised as such in
Bilaan or whether some other phonetic properties are used to distinguish these
clusters.
7. Berber (Berber) and Moroccan Arabic (Semitic) may also be Type 4 languages in
terms of the feature [voice], since they allow [+v][-v] clusters word initially. This
could, potentially, increase the number of Type 4 languages in terms of the feature
[voice] to 8 and the percentage of languages which permit [+v][-v] clusters to
17%. However, these languages were not included in the survey for two reasons.
First, the available sources did not give exhaustive coverage of permissible
clusters. Second, the syllabic status of the initial clusters in these languages is
controversial (Dell and Elmedlaoui 2002, Shaw et al. 2009 and references
therein).
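As a hedged sketch, the voicing types discussed in this section can be encoded as mappings from cluster inventories to type labels. Only the type/inventory pairings actually named in the surrounding text are included (Type 2 from the conclusion's Russian example, Types 3-6 from (14)-(16)); the dictionary and function names are my own, not the author's.

```python
# Voicing combinations are encoded as strings: "-v-v" = [-v][-v], "+v-v" = [+v][-v], etc.
VOICING_TYPES = {
    frozenset({"-v-v", "+v+v"}):                 2,  # e.g. Russian
    frozenset({"-v-v", "-v+v", "+v+v"}):         3,  # e.g. Georgian, as in (14)
    frozenset({"-v-v", "+v+v", "-v+v", "+v-v"}): 4,  # e.g. Modern Hebrew, Tsou
    frozenset({"-v-v", "-v+v"}):                 5,  # e.g. Biloxi, as in (15)
    frozenset({"-v-v", "-v+v", "+v-v"}):         6,  # e.g. Bilaan, as in (16)
}

def voicing_type(clusters):
    """Return the voicing type of an obstruent-cluster inventory,
    or None if the inventory is not one of the encoded types."""
    return VOICING_TYPES.get(frozenset(clusters))

assert voicing_type({"-v-v", "+v+v"}) == 2
assert voicing_type({"-v-v", "+v+v", "-v+v", "+v-v"}) == 4
assert voicing_type({"+v-v"}) is None  # mixed voicing implies [-v][-v], per (16)
```

The `None` case reflects the implicational claim that a mixed-voicing cluster never occurs without a [-v][-v] cluster also being present.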
The numbers presented in Table (4) for the distribution of the various voic-
ing combinations in clusters differ quite significantly from the numbers found
by Greenberg (1965), provided in (10). This is to be expected, considering that
Greenberg calculated the distribution of each cluster type out of the entire
set of cluster types, while the calculation presented here shows how many
languages contain a certain cluster type out of the subset of obstruent clusters
only. Surprisingly, the [+v][+v] cluster type is much rarer than initially ex-
pected. Conversely, the mixed cluster types are more common than initially
expected.
We are now in a position to compare the implicational relations for the two
typologies presented in section 2 (for the feature [sonorant]) and section 3
(for the subset of obstruents specified for the feature [voice]). The implica-
tional relations found for the feature [sonorant] are repeated in (17a) and the
implicational relations found for the feature [voice] are repeated in (17b).
(17) (a) Sonority implicational relations:
SO → SS → OO → OS
A language which disallows clusters at one stage, but allows them at another
stage, is said to shift types. Clusters may become part of the grammar in several
ways: borrowing, or morphological or phonological processes such as syncope.
Predictions regarding language type shifts follow from the implicational rela-
tions stated in (7) and (17a). A language L1 of type T1 can change membership
and become a member of another type, T2, by changing the inventory of
clusters allowed by the language's grammar. It follows from (7) that if a
language has no clusters, then the first cluster type it will acquire is OS.
Thus, a language with no clusters can shift to become a Type 1 language,
i.e. a language with OS clusters.
Examples of languages that shifted types are West Greenlandic (Fortescue
1984) and Popoluca (Elson 1947). Both languages disallowed consonantal
clusters word initially at an earlier point in their history, and due to borrowing
(from Danish and Spanish respectively), have shifted to become Type 1 lan-
guages; both now allow OS clusters.
A language may also gain clusters through a process of vowel syncope. For
example, a vowel may be consistently deleted in the first syllable of every
word. That could result in a language gaining all types of clusters at once and
becoming a Type 4 language. However, a language cannot gain only {OO} or
only {SS} clusters, as languages with only {OO} or {SS} clusters are not
empirically attested and are therefore not part of the typology.
It is also possible for a language to lose clusters. Once again, it is predicted
that if a language loses one cluster type, it will lose the cluster type which
implies all other clusters. Thus, a language of Type 4, which allows reversed
sonority clusters, those that imply all other clusters, may disallow such clusters
and shift to become a Type 3 language.
The prediction is that no matter what stage the language is in, if it gains
or loses clusters, it must become a language type which is predicted by the
typology. A language will never gain only OO and SS clusters without having
OS clusters as well, because the set *{OO, SS} cannot belong to an occurring
language type.
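The type-shift prediction can be stated as a small check: gaining or losing cluster types is licit only if both the old and the new inventories correspond to attested types. A minimal sketch, assuming the four sonority types of Table (1) plus the cluster-less state; the function name is my own.

```python
# Attested word-initial cluster inventories: no clusters, plus Types 1-4 of Table (1).
ATTESTED = [
    set(),
    {"OS"},
    {"OS", "OO"},
    {"OS", "OO", "SS"},
    {"OS", "OO", "SS", "SO"},
]

def legal_shift(before, after):
    """A type shift is predicted to be possible only between attested inventories."""
    return before in ATTESTED and after in ATTESTED

# West Greenlandic / Popoluca: no clusters -> Type 1 via borrowing.
assert legal_shift(set(), {"OS"})
# Type 4 -> Type 3: losing the SO clusters that imply all the others.
assert legal_shift({"OS", "OO", "SS", "SO"}, {"OS", "OO", "SS"})
# *{OO, SS} is not an occurring type, so it can never be the result of a shift.
assert not legal_shift(set(), {"OO", "SS"})
```

The last assertion encodes the text's claim that a language will never gain only OO and SS clusters without OS clusters as well.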
6. Conclusion
While claims in the literature link the feature [sonorant] and the feature
[voice], it has been shown here that they may not be so closely correlated, at
least not typologically. This suggests that, although these two features may
interact in complex ways, they are not mutually dependent and their typologi-
cal patterning cannot be reduced to a single pattern. That is, the phonological
patterning of one of these features in clusters cannot be conjectured based on
the other feature. The typological patterning of clusters based on the feature
[sonorant] does not provide any clues about the phonological patterning of
the feature [voice] in clusters. A language can be of one type with regard to
one of these features, and another type with regard to the other. For example,
Russian exhibits all possible clusters for the feature [sonorant], OS, OO, SS
and SO, making it a Type 4 language in terms of the feature [sonorant], yet
only two combinations of the feature [voice] are permitted, [-v][-v] and
[+v][+v], making it a Type 2 language in terms of the feature [voice]. Russian,
thus, is elaborate in terms of the combinations it allows word initially for the
feature [sonorant] but relatively simple in terms of the voicing combinations it
permits. Modern Hebrew is the opposite example. It only allows two cluster
types in terms of the feature [sonorant], OS and OO, making it a Type 2
language in terms of the feature [sonorant], but allows all possible voicing
combinations, [-v][-v], [+v][+v], [-v][+v] and [+v][-v], making it a Type 4
language in terms of the feature [voice]. Modern Hebrew is simple in terms
of the combinations it allows word initially for [sonorant] but quite complex
in terms of the voicing combinations it permits. This suggests that typological
classification of languages based on either one of these features should be
explored independently.
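The conclusion's point that the two classifications are independent can be made concrete by looking the same language up separately in a sonority-type table and a voicing-type table. A hypothetical sketch (names are mine; only the types cited in this section are encoded):

```python
# Sonority types of Table (1) and the two voicing types named in the conclusion.
SONORITY_TYPES = {
    frozenset({"OS"}): 1,
    frozenset({"OS", "OO"}): 2,
    frozenset({"OS", "OO", "SS"}): 3,
    frozenset({"OS", "OO", "SS", "SO"}): 4,
}
VOICING_TYPES = {
    frozenset({"-v-v", "+v+v"}): 2,
    frozenset({"-v-v", "+v+v", "-v+v", "+v-v"}): 4,
}

# Each language is a pair: (sonority inventory, voicing inventory).
russian = ({"OS", "OO", "SS", "SO"}, {"-v-v", "+v+v"})
hebrew = ({"OS", "OO"}, {"-v-v", "+v+v", "-v+v", "+v-v"})

# Russian: elaborate for [sonorant] (Type 4), simple for [voice] (Type 2).
assert SONORITY_TYPES[frozenset(russian[0])] == 4
assert VOICING_TYPES[frozenset(russian[1])] == 2
# Modern Hebrew is the mirror image: Type 2 for [sonorant], Type 4 for [voice].
assert SONORITY_TYPES[frozenset(hebrew[0])] == 2
assert VOICING_TYPES[frozenset(hebrew[1])] == 4
```

Because the two lookups are entirely separate, neither table constrains the other, which is exactly the independence claim of the conclusion.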
Appendix I
Language          [sonorant]: OS OO SS SO      [voice]: [-v][-v] [+v][+v] [-v][+v] [+v][-v]
Aguacatec Z Z Z Z Z Z
Aleut Z Z Z
Amuesha Z Z Z Z Z
Basque Z
Belarusian Z Z Z Z Z
Bilaan Z Z Z Z Z Z Z
Biloxi Z Z Z Z
Breton Z Z Z
Bulgarian Z Z Z Z Z
Cambodian Z Z Z Z Z Z
Camsa Z Z Z Z
Chami Z
Chatino Z Z Z Z Z Z13
Cornish Z Z Z
Czech Z Z Z Z Z Z
Danish Z Z 12
Dutch Z Z Z
Embara Catio Z
Frisian Z Z Z
Gaelic (Scots) Z Z Z 12
Georgian Z Z Z Z Z Z Z
German Z Z 12
Greek Z Z Z Z Z
Hebrew (Modern) Z Z Z Z Z Z
Hindi Z Z Z Z
Hixkaryana Z Z Z
Hua Z Z Z Z Z Z Z Z
Hungarian10 Z Z Z Z9 Z
Icelandic Z Z 12
Inga Z Z Z Z Z
Irish Z Z Z 12
Khasi Z Z Z Z Z Z Z Z
Klamath Z Z Z Z 12
Kobon Z
Kutenai Z Z Z
Lithuanian Z Z Z Z
Macedonian Z Z Z Z Z
Manx Z Z Z Z
Mon (Burmese) Z
Norwegian Z Z Z
Pashto Z Z Z Z Z Z
Polish Z Z Z Z Z
Popoluca Z
Romani Z Z Z
Romanian Z Z Z Z Z
Russian Z Z Z Z Z Z
Serbian Z Z Z Z Z
Seri Z Z Z
Slovak Z Z Z Z Z Z
Slovenian Z Z Z Z Z
Sorbian (lower) Z Z Z Z Z Z
Sorbian (upper) Z Z Z Z Z Z
Spanish Z
Swedish Z Z Z
Taba Z Z Z Z Z
Totonac Z Z Z
Tsou Z Z Z Z Z Z14 Z Z
Ukrainian Z Z Z Z Z Z
Wa Z
Welsh Z Z 12
Yiddish Z Z Z Z Z Z
Zapotec (Isthmus)11 Z Z Z
Zoque Z
Language database (An asterisk (*) next to the language name indicates that
the language was not included in the survey as it either did not contain any
clusters or did not conform to the conditions listed in (3) and (5)):
Aguacatec (Mayan) McArthur and McArthur 1956
Aleut (Eskimo-Aleut) Bergsland 1997
Amuesha (Arawakan) Fast 1953
Armenian* (Armenian, Indo-European) Werner 1962; Vaux 1998
Arabic* (Moroccan) Shaw, Gafos, Hoole and Zeroual 2009; Dell and Elmedlaoui
2002
Asheninka* (Arawakan) Dirks 1953
Basque (Basque) Hualde 1991
Babungo* (Niger-Congo) Schaub 1985
Belarusian (Slavic Indo-European) Sawicka 1974
Bengali* (Indo-Iranian, Indo-European) Ferguson and Chowdhury 1960
Berber* (Berber) Dell and Elmedlaoui 2002
Bilaan (Austronesian) Dean and Dean 1955
Biloxi (Siouan) Einaudi 1976
Breton (Celtic, Indo-European) Ternes 1992
Bulgarian (Slavic Indo-European) Scatton 1984; Sawicka 1974
Burmese* (Sino-Tibetan) Sun 1986
Cambodian (Mon-Khmer) Nacaskul 1978
Camsa (language isolate) Howard 1967
Carib* (Carib) Hoff 1968
Chami (Choco) Gralow 1976
Chatino (Oto-Manguean) McKaughan 1954
Chukchee* (Chukotko-Kamchatkan) Asinovskii 1991; Bogoras 1922; Kenstowicz
1981; Levin 1985; Skorik 1961
Comanche* (Uto-Aztecan) Riggs 1949
Cornish (Celtic, Indo-European) George 1993
Cuicateco* (Oto-Manguean) Needham and Davis 1946
Czech (Slavic, Indo-European) Kučera 1961; Kučera and Monroe 1968
Dakota* (Siouan) Matthews 1955
Danish (Germanic, Indo-European) Diderichsen 1964; Hansen 1967
Dutch (Germanic, Indo-European) Booij 1995
Eggon* (Niger-Congo) Maddieson 1981
Embara Catio (Choco) Mortensen 1999
French* (Romance, Indo-European) Dell 1995
Frisian (Germanic, Indo-European) Cohen, Ebeling, Fokkema and van Holk 1961
Gaelic (Scots) (Celtic, Indo-European) Gillies 1993; Green 1997
Georgian (Kartvelian) Butskhrikidze 2002; Chitoran 1998; Chitoran 1999;
Chitoran, Goldstein and Byrd 2002; Gvarjaladze and Gvarjaladze 1974
German (Germanic, Indo-European) Wiese 1996
Greek (Greek, Indo-European) Eleftheriades 1985; Joseph and Philippaki-
Warburton 1987
References
Agard, Frederick B.
1958 Structural sketch of Rumanian. Language, 34(1): 71–127.
Ambrazas, Vytautas
1997 Lithuanian Grammar. Vilnius: Baltos lankos.
Andrews, Henrietta
1949 Phonemes and morphophonemes of Temoayan Otomi. International
Journal of American Linguistics, 15: 213–222.
Aschmann, Herman P.
1946 Totonaco phonemes. International Journal of American Linguistics,
12: 34–43.
Asinovskii, Aleksandr Semenovich
1991 Konsonantizm Chukotskogo Jazyka [Consonantism of the Chukchee
language]. Leningrad: Nauka. (In Russian).
Awbery, Gwenllian M.
1984 Phonotactic constraints in Welsh. In Martin J. Ball and Glyn E.
Jones (eds.), Welsh Phonology, Selected Readings, 65–104. Cardiff:
University of Wales Press.
Ball, Martin and James Fife (eds.)
1993 The Celtic Languages. London: Routledge.
Barkai, Malachi and Julia Horvath
1978 Voicing assimilation and the sonority hierarchy: Evidence from Rus-
sian, Hebrew and Hungarian. Linguistics, 212: 77–88.
Barker, Muhammad A. R.
1964 Klamath Grammar. University of California Publications in Linguis-
tics 32. University of California Press.
Bat-El, Outi
1994 Stem modification and cluster transfer in Modern Hebrew. Natural
Language and Linguistic Theory, 12: 571–596.
Bergsland, Knut
1997 Aleut Grammar: Unangam Tunuganaan Achixaasix. Fairbanks:
Alaska Native Language Center.
Berman, Ruth
1997 Modern Hebrew. In Robert Hetzron (ed.), The Semitic Languages,
312–333. New York: Routledge.
Blevins, Juliette
2003 The independent nature of phonotactic constraints: An alternative
to syllable-based approaches. In Caroline Féry and Ruben van
de Vijver (eds.), The Syllable in Optimality Theory, 375–404.
Cambridge: Cambridge University Press.
Bogoras, Waldemar
1922 Chukchee. In Franz Boas (ed.), Handbook of American Indian
Languages: Part 2. Washington: Smithsonian.
Booij, Geert
1995 The Phonology of Dutch. Oxford: Oxford University Press.
Bowden, John
2001 Taba: Description of a South Halmahera language. Pacific Linguis-
tics 521. Canberra: Australian National University.
Briggs, Elinor
1961 Mitla Zapotec Grammar. Mexico: Instituto Lingüístico de Verano
and Centro de Investigaciones Antropológicas de México.
Broderick, George
1993 Manx. In Martin Ball and James Fife (eds.), The Celtic Languages,
228–288. London: Routledge.
Butskhrikidze, Marika
2002 The Consonant Phonotactics of Georgian. Utrecht: LOT.
Chavarria-Aguilar, O. L.
1951 The phonemes of Costa Rican Spanish. Language, 27(3): 248–253.
Chayen, Moshe J.
1972 The accent of Israeli Hebrew. Leshonenu, 36: 212–219, 287–300.
Chayen, Moshe J.
1973 The Phonetics of Modern Hebrew. The Hague: Mouton.
Chitoran, Ioana
1998 Georgian harmonic clusters: Phonetic cues to phonological represen-
tation. Phonology, 15(2): 121–141.
Chitoran, Ioana
1999 Accounting for sonority violations: The case of Georgian consonant
sequencing. Proceedings of the 14th International Congress of
Phonetic Sciences, 101–104. San Francisco, August 1999.
Chitoran, Ioana, Louis Goldstein and Dani Byrd
2002 Gestural overlap and recoverability: Articulatory evidence from
Georgian. In Carlos Gussenhoven and Natasha Warner (eds.),
Laboratory Phonology 7, 419–447. Berlin, New York: Mouton de
Gruyter.
Cho, Young-mee Yu and Tracy Holloway King
2003 Semi-syllables and universal syllabification. In Caroline Féry and
Ruben van de Vijver (eds.), The Syllable in Optimality Theory,
183–212. Cambridge: Cambridge University Press.
Clements, Nick G.
1990 The role of the sonority cycle in core syllabification. In John King-
ston and Mary Beckman (eds.), Papers in Laboratory Phonology I:
Einaudi, Paula
1976 A Grammar of Biloxi. New York: Garland.
Eleftheriades, Olga
1985 Modern Greek: A Contemporary Grammar. Palo Alto: Pacific Books
Publishers.
Elson, Ben
1947 Sierra Popoluca syllable structure. International Journal of American
Linguistics, 13(1): 13–17.
Engelenhoven, Aone van
1995 A Description of the Leti Language (as spoken in Tutukei). Ridderkerk:
Offsetdrukkerij Ridderprint B.V.
Everett, Daniel and Keren Everett
1984 On the relevance of syllable onsets to stress placement. Linguistic
Inquiry, 15: 705–711.
Fast, Peter W.
1953 Amuesha (Arawak) phonemes. International Journal of American
Linguistics, 19: 191–194.
Ferguson, Charles and Munier Chowdhury
1960 The phonemes of Bengali. Language, 36(1): 22–59.
Fortescue, Michael D.
1984 West Greenlandic. London: Croom Helm.
Garvin, Paul L.
1948 Kutenai I: Phonemics. International Journal of American Linguistics,
14: 37–42.
George, Ken
1993 Cornish. In Martin Ball and James Fife (eds.), The Celtic Languages,
410–470. London: Routledge.
Gillies, William
1993 Scottish Gaelic. In Martin Ball and James Fife (eds.), The Celtic
Languages, 145–227. London: Routledge.
Goedemans, Rob
1998 Weightless Segments. The Hague: Holland Academic Graphics.
Gordon, Matthew
1999 Syllable weight: Phonetics, phonology, and typology. Ph.D. disserta-
tion, Department of Linguistics, University of California, Los
Angeles.
Gralow, Frances L.
1976 Fonología del Chamí [Chami phonology]. Sistemas Fonológicos de
Idiomas Colombianos 3, 29–42. Bogotá: Ministerio de Gobierno
and Instituto Lingüístico de Verano.
Green, Anthony
1997 The prosodic structure of Irish, Scots Gaelic, and Manx. Ph.D. dis-
sertation, Department of Linguistics, Cornell University.
Greenberg, Joseph
1965 Some generalizations concerning initial and final consonant sequences.
Linguistics, 18: 5–34. (Reprinted as Greenberg 1978).
Greenberg, Joseph
1978 Some generalizations concerning initial and final consonant clusters.
In Joseph H. Greenberg (ed.), Universals of Human Language, vol.
2: Phonology. Stanford, California: Stanford University Press.
Gumperz, John
1958 Phonological differences in three Hindi dialects. Language, 34:
212–224.
Gussmann, Edmund
1992 Resyllabification and delinking: The case of Polish voicing. Linguistic
Inquiry, 23: 25–56.
Gvarjaladze, Tamar and Isidor Gvarjaladze
1974 English-Georgian Dictionary. Tbilisi: State Publication House.
Haiman, John
1980 Hua, a Papuan Language of the Eastern Highlands of New Guinea.
Amsterdam: John Benjamins.
Hajek, John and John Bowden
1999 Taba and Roma: Clusters and geminates in two Austronesian lan-
guages. In Proceedings of the XIVth Congress of Phonetic Sciences,
San Francisco, 1–7 August, 1033–1036. American Institute of
Physics.
Halpern, Abraham Meyer
1946 Yuma I: Phonemics. International Journal of American Linguistics,
12(1): 25–33.
Hansen, Aage
1967 Moderne Dansk [Modern Danish]. København: Forlag Harley. (In
Danish).
Henderson, Eugénie
1991 Khasi clusters and Greenberg's universals. Mon-Khmer Studies,
18–19: 6–16.
Hoard, James E.
1978 Remarks on the nature of syllabic stops and affricates. In Alan Bell
and Joan Hooper (eds.), Syllables and Segments. Amsterdam: North-
Holland.
Hodge, Carleton T.
1946 Serbo-Croatian phonemes. Language, 22: 112–120.
Hoff, Bernard J.
1968 The Carib Language: Phonology, Morphonology, Morphology, Texts
and Word Index. The Hague: Martinus Nijhoff.
Howard, Linda
1967 Camsa phonology. In Viola G. Waterhouse (ed.), Phonemic Systems
of Colombian Languages, 73–87. Summer Institute of Linguistics
Publications in Linguistics and Related Fields, 14. Norman: Summer
Institute of Linguistics of the University of Oklahoma.
Hsin, Tien-Hsin
2000 Consonant clusters in Tsou and their theoretical implications. The
Proceedings of the 18th West Coast Conference on Formal Linguis-
tics. Cascadilla Press.
Hualde, José Ignacio
1991 Basque Phonology. London, New York: Routledge.
Huffman, Franklin E.
1990 Burmese, Thai Mon, and Nyah Kur: A synchronic comparison.
Mon-Khmer Studies, 16–17: 31–64.
Hyman, Larry
1985 A Theory of Phonological Weight. Dordrecht: Foris.
Itô, Junko
1989 A prosodic theory of epenthesis. Natural Language and Linguistic
Theory, 7: 217–259.
Iverson, Gregory and Joseph Salmons
1995 Aspiration and laryngeal representation in Germanic. Phonology, 12:
369–396.
Jacobs, Neil G.
2005 Yiddish: A Linguistic Introduction. Cambridge: Cambridge University
Press.
Jessen, Michael
2001 Phonetic implementation of the distinctive auditory features [voice]
and [tense] in stop consonants. In Tracy Alan Hall (ed.), Distinctive
Feature Theory, 237–294. Berlin, New York: Mouton de Gruyter.
Jessen, Michael and Catherine O. Ringen
2002 Laryngeal features in German. Phonology, 19: 189–218.
Joseph, Brian D. and Irene Philippaki-Warburton
1987 Modern Greek. London: Croom Helm.
Kahn, Daniel
1976 Syllable-based generalizations in English phonology. Ph.D. disserta-
tion, Department of Linguistics, Massachusetts Institute of Technol-
ogy. [Published 1980, New York: Garland Press.]
Keating, Patricia A.
1984 Phonetic and phonological representation of stop consonant voicing.
Language, 60: 286–319.
Kenstowicz, Michael
1981 The phonology of Chukchee consonants. In Bernard Comrie (ed.),
Studies in the Languages of the USSR. Carbondale: Linguistic
Research Inc.
Kreitman, Rina
2003 Diminutive reduplication in Modern Hebrew. Working Papers of the
Cornell Phonetics Laboratory, 15: 101–129.
Kreitman, Rina
2006 Cluster buster: A typology of onset clusters. In J. Bunting, S. Desai,
R. Peachy, C. Straughn and Z. Tomková (eds.), Chicago Linguistic
Society, 42(1): 163–179.
Kreitman, Rina
2008 The phonetics and phonology of onset clusters: The case of Modern
Hebrew. Ph.D. dissertation, Department of Linguistics, Cornell
University.
Kreitman, Rina
2010 Mixed voicing word-initial onset clusters. In Cécile Fougeron,
Barbara Kühnert, Mariapaola D'Imperio and Natalie Vallée (eds.),
Laboratory Phonology 10: Phonology and Phonetics, 169–200.
Berlin: Mouton de Gruyter.
Kučera, Henry
1961 The Phonology of Czech. The Hague: Mouton and Company.
Kučera, Henry and George Monroe
1968 A Comparative Quantitative Phonology of Russian, Czech and
German. New York: American Elsevier Publication.
Ladefoged, Peter and Ian Maddieson
1996 The Sounds of the World's Languages. Oxford: Blackwell.
Laufer, Asher
1994 Voicing in contemporary Hebrew. Leshonenu, 57(4): 299–342. (In
Hebrew).
Levi, Susannah V.
2004 The representation of underlying glides. Ph.D. dissertation, Depart-
ment of Linguistics, University of Washington.
Levi, Susannah V.
2008 Phonemic vs. derived glides. Lingua, 118: 1956–1978.
Levin, Juliette
1985 A metrical theory of syllabicity. Ph.D. dissertation, Department of
Linguistics, Massachusetts Institute of Technology.
Levinsohn, Stephen H.
1979 Fonología del Inga [Phonology of Inga]. In Marilyn E. Cathcart et
al. (eds.), Sistemas Fonológicos de Idiomas Colombianos 4, 65–85.
Bogotá: Ministerio de Gobierno. (In Spanish).
Lindblom, Björn
1983 Economy of speech gestures. In Peter MacNeilage (ed.), Speech
Production, 217–246. New York: Springer-Verlag.
Lindblom, Björn and Ian Maddieson
1988 Phonetic universals in consonant systems. In Larry M. Hyman and
Charles N. Li (eds.), Language, Speech and Mind, 62–78. New
York: Routledge.
Lombardi, Linda
1991 Laryngeal features and laryngeal neutralization. Ph.D. dissertation,
Department of Linguistics, University of Massachusetts, Amherst.
Lombardi, Linda
1995a Laryngeal features and privativity. The Linguistic Review, 12:
35–59.
Lombardi, Linda
1995b Laryngeal neutralization and syllable wellformedness. Natural Lan-
guage and Linguistic Theory, 13: 39–74.
Lombardi, Linda
1999 Positional faithfulness and voicing assimilation in Optimality
Theory. Natural Language and Linguistic Theory, 17: 267–302.
MacKay, Carolyn J.
1994 A sketch of Misantla Totonac phonology. International Journal of
American Linguistics, 60(4): 369–419.
MacKay, Carolyn J.
1999 A Grammar of Misantla Totonac. Salt Lake City: The University of
Utah Press.
Maddieson, Ian
1981 Unusual consonant clusters and complex segments in Eggon. Studies
in African Linguistics, Supplement 8: 89–92.
Maddieson, Ian and Peter Ladefoged
1993 Phonetics of partially nasal consonants. In Marie K. Huffman and
Rena Krakow (eds.), Nasals, Nasalization and the Velum, 251–301.
San Diego: Academic Press.
Mallinson, Graham
1986 Rumanian. London: Croom Helm.
Marlett, Stephen A.
1988 The syllable structure of Seri. International Journal of American
Linguistics, 54: 245–278.
Marlett, Stephen A. and Velma B. Pickett
1987 The syllable structure and aspect morphology of Isthmus Zapotec.
International Journal of American Linguistics, 53: 398–422.
Matthews, Hubert
1955 A phonemic analysis of a Dakota dialect. International Journal of
American Linguistics, 21: 56–59.
McArthur, Harry S. and Lucille E. McArthur
1956 Aguacatec (Mayan) phonemes within the stress group. International
Journal of American Linguistics, 22: 72–76.
McCarthy, John and Alan Prince
1986 Prosodic morphology. Ms., University of Massachusetts, Amherst,
and Brandeis University, Waltham, Mass.
McKaughan, Howard P.
1954 Chatino formulas and phonemes. International Journal of American
Linguistics, 20: 23–27.
Morelli, Frida
1998 Markedness relations and implicational universals in the typology of
onset obstruent clusters. In Proceedings of NELS 28: Volume 2.
Morelli, Frida
1999 The phonotactics and phonology of obstruent clusters in Optimality
Theory. Ph.D. dissertation, Department of Linguistics, University of
Maryland at College Park.
Morelli, Frida
2003 The relative harmony of /s+stop/ onsets: Obstruent clusters and the
sonority sequencing principle. In Caroline Féry and Ruben van de
Vijver (eds.), The Syllable in Optimality Theory, 356–371. Cam-
bridge: Cambridge University Press.
Mortensen, Charles A.
1999 A Reference Grammar of the Northern Embera Languages: Studies
in the Languages of Colombia 7. Publications in Linguistics, 118.
Dallas: Summer Institute of Linguistics and the University of Texas
at Arlington.
Nacaskul, Karnchana
1978 The syllabic and morphological structure of Cambodian words.
Mon-Khmer Studies, 7: 183–200.
Nagaraja, Keralapura S.
1990 Khasi Phonetic Reader. Mysore: Central Institute of Indian Languages.
Næs, Olav
1965 Norsk Grammatikk: Elementære Strukturer og Syntaks [Norwegian
Grammar: Elementary Structures and Syntax]. Fabritius & Sønners
Forlag. (In Norwegian).
Needham, Doris and Marjorie Davis
1946 Cuicateco phonology. International Journal of American Linguistics,
12: 139–146.
Nepveu, Denis
1994 Georgian and Bella Coola: Headless syllables and syllabic obstruents.
MA thesis, UC Santa Cruz.
Ohala, Manjari
1983 Aspects of Hindi Phonology. Delhi: Motilal Banarsidass.
Okrand, Marc
1979 Metathesis in Costanoan grammar. International Journal of American
Linguistics, 45: 123–130.
Parker, Steve
2002 Quantifying the sonority hierarchy. Ph.D. dissertation, Department
of Linguistics, University of Massachusetts, Amherst.
Parker, Steve
2008 Sound level protrusions as physical correlates of sonority. Journal of
Phonetics, 36: 55–90.
Penzl, Herbert
1955 A Grammar of Pashto: A Descriptive Study of the Dialect of Kanda-
har, Afghanistan. Washington, D.C.: American Council of Learned
Societies.
Pike, Kenneth and Eunice Pike
1947 Immediate constituents of Mazateco syllables. International Journal
of American Linguistics, 13(2): 78–91.
Rabel, Lili
1961 Khasi, a Language of Assam. Baton Rouge: Louisiana State Univer-
sity Press.
Rex, Eileen and Mareike Schttelndreyer
1973 Sistema fonolgico del Cato [Phonological systems of Catio].
Sistemas Fonolgicos de Idiomas Colombianos 2, 7385. Bogot:
Ministerio de Gobierno. (In Spanish).
Rialland, Annie
1994 The phonology and phonetics of extrasyllabicity in French. In Patricia
Keating (ed.), Phonological Structure and Phonetic Form: Papers in
Laboratory Phonology 3, 136–159. Cambridge: Cambridge Univer-
sity Press.
Riehl, Anastasia
2008 The phonology and phonetics of Nasal-Obstruent sequences. Ph.D.
dissertation, Department of Linguistics, Cornell University.
Riggs, Venda
1949 Alternate phonemic analysis of Comanche. International Journal of
American Linguistics, 15: 229–231.
Rögnvaldsson, Eiríkur
1993 Íslensk Hljóðkerfisfræði [Icelandic Phonology]. Reykjavík: Málvís-
indastofnun Háskóla Íslands. (In Icelandic).
Rusanivskyi, V. M. (ed.)
1986 Ukrainskaya Grammatika [Ukrainian Grammar]. Kiev: Naukova
dumka. (In Russian).
Sapir, Edward
1923 The Phonetics of Haida. International Journal of American Linguis-
tics, 2(3/4): 143–158.
Sawicka, Irena
1974 Struktura Grup Spółgłoskowych w Językach Słowiańskich [Structure
of Consonantal Clusters in Slavic Languages]. Wrocław: Zakład
Narodowy im. Ossolińskich. (In Polish).
Scatton, Ernest A.
1984 A Reference Grammar of Modern Bulgarian. Cambridge: Slavica
Publishers.
Schaub, Willi
1985 Babungo. London: Croom Helm.
Selkirk, Elisabeth
1982 The syllable. In Harry van der Hulst and Norval Smith (eds.), The
Structure of Phonological Representations. Dordrecht: Foris Publica-
tions.
Selkirk, Elisabeth
1984 On the major class features and syllable theory. In Morris Halle,
Mark Aronoff and Richard T. Oehrle (eds.), Language Sound Struc-
ture: Studies in Phonology, 107–136. Cambridge, Massachusetts:
MIT Press.
68 Rina Kreitman
Ventzel, Tatiana V.
1983 The Gypsy Language. Moscow: Nauka Publication.
Watkins, Justin
2002 The Phonetics of Wa. Canberra: Pacific Linguistics.
Winter, Werner
1962 Problems of Armenian phonology III. Language, 38(3): 254–262.
Westbury, John and Patricia Keating
1986 On the naturalness of stop consonant voicing. Journal of Linguistics,
22: 145–166.
Wetzels, W. Leo and Joan Mascaró
2001 The typology of voicing and devoicing. Language, 77(2): 207–244.
Wheeler, Max
2005 Voicing contrast: licensed by prosody or licensed by cue? ROA
769, Rutgers Optimality Archive, http://roa.rutgers.edu/.
Wiese, Richard
1996 The Phonology of German. Oxford: Clarendon Press.
Wonderly, William L.
1951 Zoque II: Phonemes and morphophonemes. International Journal of
American Linguistics, 17(2): 105–123.
Wright, Richard
1996 Consonant clusters and cue preservation in Tsou. Ph.D. dissertation,
Department of Linguistics, University of California Los Angeles.
Yoshioka, Hirohide, Anders Löfqvist and René Collier
1982 Laryngeal adjustments in Dutch voiceless obstruent production.
Annual Bulletin of the Research Institute of Logopedics and Pho-
niatrics, 16: 27–35.
Zec, Draga
1988 Sonority constraints on prosodic structure. Ph.D. dissertation,
Department of Linguistics, Stanford University.
Zec, Draga
1995 Sonority constraints on syllable structure. Phonology, 12: 85–129.
Limited consonant clusters in OV languages
Abstract
It has been claimed that the complexity of syllable structure correlates with the order
of verb and object in the languages of the world: syllable structure in OV languages
is simpler than that in VO languages. However, our analysis of data in Maddieson (2005)
and Dryer (2005) seems to show that a number of OV languages have (moderately)
complex syllable structure. In spite of this result, we argue that the syllable structure
in OV languages is simpler than has been reported, by considering the geographical
gradience of coda variety, coda inventory, phonological simplification, particles
attached to nouns, and complement-head orders other than OV/VO. We also discuss
why OV languages have simple syllable structure: it is argued that juncture between
constituents is stronger in left-branching structure (OV) than in right-branching struc-
ture (VO); strong juncture in left-branching structure makes words closely connected
to each other; simple syllable structure such as CV fits nicely into the stronger juncture
without making a consonant cluster.
1. Introduction
It has been pointed out that languages with object-verb order (OV) tend to
have simple syllable structure (Lehmann 1973, Gil 1986, Plank 1998). This
is the case in some OV languages such as Ijo, Yareba and Warao, whose syl-
lable form is CV. However, examination of data in Haspelmath et al. (2005)
(henceforth WALS) shows that a number of OV languages have (moderately)
complex syllable structure.
In this paper, we argue that the syllable structure in OV languages is sim-
pler than has been reported, by showing that consonant clusters are limited at
word boundaries and between words in OV languages. We base our argument
on only a small number of example languages but hope that these will be
sufficient to demonstrate the viability of our research proposal. From a conceptual
and theoretical point of view, we also discuss the reason why OV languages
should have simple syllable structure.
In Section 2, we review the previous studies of the correlation between syl-
lable complexity and word order. We also examine the correlation hypothesis
using data from WALS. In Section 3, we argue that syllable structure in OV
languages is simpler than it looks if we consider geographical gradation, simplifi-
cation processes and limited coda inventory. Section 4 discusses why OV
72 Hisao Tokizaki and Yasutomo Kuwana
languages have simple syllable structure; we argue that juncture between con-
stituents is stronger in left-branching structure (OV) than in right-branching
structure (VO). Section 5 concludes the discussion.
The two observations in (1) and (2) predict that there will be considerable differ-
ences between SOV and SVO languages with respect to syllable complexity.
Gil (1986) tests the correlation between OV/VO order and syllable struc-
ture with his 170 sample languages. He reports the average number of
segments in the syllable structure templates as SOV 4.04 < SVO 4.93. How-
ever, this result is not very convincing because the difference between SOV
and SVO is less than one segment (0.89). Moreover, the number of sample languages
is not large enough to claim (1) and (2) as universals across languages; it is
necessary, therefore, to test the hypothesis with more data.
2. Maddieson (2009) admits the crudity of this three-way distinction of syllable com-
plexity, and proposes a refinement of syllable typology by scoring the complexity
of onset, nucleus and coda, as shown in (i)–(iii).
These results do not seem to show the expected correlation between
object-verb order and syllable structure that we saw in (1) (i.e.
OV → simple syllable) and (2) (i.e. VO → complex syllable) above. Even
worse, the 23 languages with simple syllable structure and VO order outnumber
the 18 languages with simple syllable structure and OV order. The
60 languages with complex syllable structure and OV order outnumber the 47
languages with complex syllable structure and VO order. These data are in fact
the opposite of what we expected, given the previous studies we have seen
above. It may be that the results can be improved by refining our quantitative
approach.
First, Dryer (1992, 2009) argues that typological work should not be based
on the number of languages, but on the number of genera. Genera are "groups
of languages whose similarity is such that their genetic relatedness is
uncontroversial" (Dryer 1992: 84). Dryer argues that counting genera rather than
languages controls for the most severe genetic bias. Counting the numbers of
genera instead of languages slightly improves the results, as shown in Table 2.
The 17 genera with simple syllable structure and OV order outnumber the 16
genera with simple syllable structure and VO order. However, the 48 genera
with complex syllable structure and OV order still outnumber the 38 genera
with complex syllable structure and VO order.
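The language-based and genus-based counts just cited can be compared with a quick tally (a sketch hard-coding only the four-cell totals reported above; replicating the result properly would require the full WALS data):

```python
# Four-cell totals reported in the text: word order x syllable complexity.
languages = {("OV", "simple"): 18, ("VO", "simple"): 23,
             ("OV", "complex"): 60, ("VO", "complex"): 47}
genera = {("OV", "simple"): 17, ("VO", "simple"): 16,
          ("OV", "complex"): 48, ("VO", "complex"): 38}

def simple_share(table, order):
    """Proportion of units with the given word order whose syllable
    structure is simple."""
    simple = table[(order, "simple")]
    return simple / (simple + table[(order, "complex")])

for name, table in (("languages", languages), ("genera", genera)):
    print(name,
          f"OV {simple_share(table, 'OV'):.2f}",
          f"VO {simple_share(table, 'VO'):.2f}")
```

On these totals the OV simple-syllable share rises from about 0.23 (languages) to 0.26 (genera) while the VO share falls from about 0.33 to 0.30, which is one way to read the "slight improvement" from counting genera.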
Second, Dryer (1992, 2009) argues that genera should also be divided into
six macro areas. He emphasizes that it is dangerous to use data from raw totals
of languages without examining their distribution over areas. Dividing genera
into macro areas gives Table 3.
Table 3 shows that there are more OV genera than VO genera with simple
syllable structure in (d) Australia (7:1) and (f) South America (8:3). However,
these areas also have more OV genera than VO genera with complex syllable
structure, i.e. (d) Australia (9:1) and (f) South America (7:1). In the other
areas, (a) Africa, (b) Eurasia, (c) South East Asia and (e) North America, the
number of OV genera with simple syllable structure is not more than that of
VO genera with simple syllable structure. These results show that the data in
WALS do not give straightforward support for the hypothesis that OV languages
have simple syllable structure.
However, in the next section we argue that OV languages do have simple
syllable structure if we consider the geographical gradation of the variety of
word-final consonants, the fine classification of syllable complexity and head-
complement orders, the coda inventory and the simplification of syllable
structure within words and between words.
enable us to see possible correlations with other features such as word orders.
For example, syllable complexity should be defined on the basis of the number
and variety of coda consonants. Hashimoto (1978) argues that both coda and
tone are simpler in north Asia than in south Asia, as shown in Table 4.
Table 4. Number of tones and codas in Asian languages (cf. Hashimoto 1978)
Table 5. Number of tones, coda variety and complement-head orders (+) (Stem-Suffix,
Genitive-Noun, Adjective-Noun, Noun Phrase-Postposition, Object-Verb,
Clause-Adverbial Subordinator)
Although the data are insufficient in some cases, Table 6 shows a tendency:
as the number of segments increases, the value of head-complement orders
increases. Except for the languages with two, ve and nine segments in a
syllable, which have the head-complement scores of 1.33, 0.92 and 2.33
respectively (italicized), the HC score gradually increases from 1.45 to 2.50.
This result at least shows that we can expect a fine correlation between syllable
complexity and head-complement orders including OV/VO order.
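One plausible way to operationalize such a head-complement (HC) score (an illustrative encoding of our own, not necessarily the authors' exact computation) is to give a language one point for each complement-head order it shows among the six listed in the caption of Table 5, and then average the scores of the languages sharing a given syllable-template size:

```python
# The six complement-head orders listed in the caption of Table 5.
ORDERS = ("Stem-Suffix", "Genitive-Noun", "Adjective-Noun",
          "Noun Phrase-Postposition", "Object-Verb",
          "Clause-Adverbial Subordinator")

def hc_score(features):
    """Count how many of the six orders are complement-head (head-final).
    `features` maps order names to True/False."""
    return sum(bool(features.get(order)) for order in ORDERS)

def mean_hc(langs):
    """Average HC score over a group of languages, e.g. all languages
    whose syllable template has the same number of segments."""
    return sum(hc_score(f) for f in langs) / len(langs)

# Toy illustration: a consistently head-final language vs a mostly
# head-initial one (both feature settings are hypothetical).
head_final = {order: True for order in ORDERS}
head_initial = {"Adjective-Noun": True}
print(hc_score(head_final), hc_score(head_initial),
      mean_hc([head_final, head_initial]))  # 6 1 3.5
```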
5. The coda data for Kanuri, Korean, Tamil and Chukchi in list (5) are from VanDam
(2004). We also checked the other languages by analyzing the data in Kamei et al.
(1988–2001).
6. One might argue that our selection of languages in this section and the next is arbi-
trary. We admit that we have not checked all languages in a principled manner.
However, the point of our argument is to show that there are at least a number of
OV languages whose syllable structure is simpler than previously reported, and
that this is an area worthy of future investigation.
9. We need to consider the reason why -kwa instead of -wa is used after a word
ending with a consonant to make a consonant cluster. Another remaining problem
is why the genitive case marker -uy does not have another form with an onset
consonant.
We have argued that OV languages tend to have simple syllable structure with
fewer consonant clusters between words and within words. In this section, we
consider why word orders correlate with syllable structure. Tokizaki (2008)
argues that left-branching structure has stronger juncture between its con-
stituents than right-branching structure. The juncture between B and C in left-
branching (16a) is stronger than the juncture between A and B in right-
branching (16b).
(16) a. [[A B] C]
b. [A [B C]]
In this sense, the juncture is asymmetrical between left-branching and right-
branching structure. Tokizaki (2008) shows phonological and morpho-syntactic
evidence for this junctural asymmetry. Let us review some of the arguments
about Japanese and Korean presented there and discuss some new data from
Dutch and German. First, consider Rendaku (sequential voicing) in Japanese,
which applies to the first consonant in a word preceded by another word ending
with a vowel. For example, the first consonant of the second word in (17a)
and (17b) is voiced when it is part of a compound.
and syllable structure in languages. Let us consider how simple syllable struc-
ture allows an object to move to the left of the verb to make left-branching
structure. For example, a verb phrase tends to have right-branching structure
in a head-initial language (24a), and left-branching structure in a head-final
language (24b).
(24) a. [VP V [NP .. N ..]]
b. [VP [NP .. N ..] V]
However, if we assume the left/right-branching asymmetry discussed above,
head-final languages in fact have compound-like verb phrases.
(25) [V [ .. N .. ] V]
The object and the verb in (25), separated only by a weak bracket (represented
by ] ), are more closely connected to each other than the object and the verb in
(24a), which are separated by a strong boundary. Simple syllable structure
such as CV fits nicely into the stronger juncture in (25) without making a
consonant cluster, as in (26).
(26) [V [ .. CV ] CV]
VO languages, then, are allowed to have complex syllable structure because
strong boundaries separate the coda of the verb and the onset of the object as
shown in (27).
(27) [VP .. CCCVCC [NP CCCVCC .. ]]
Thus, left/right-branching asymmetry gives us an interesting way to explain a
correlation between syntax and phonology.10
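The bracket-based asymmetry can be made concrete with a small parser (an illustrative encoding of our own, not Tokizaki's formal definition): between two adjacent words, an intervening closing bracket marks a weak boundary (strong juncture), an opening bracket a strong boundary (weak juncture), and no bracket at all leaves the words inside the same minimal phrase.

```python
import re

def boundaries(bracketing):
    """For each pair of adjacent words in a bracketed string, report the
    boundary type: 'strong' if an opening bracket intervenes, 'weak' if
    only closing brackets intervene, 'none' if no bracket intervenes."""
    tokens = re.findall(r"[\[\]]|\w+", bracketing)
    result, pending, last_word = [], [], None
    for tok in tokens:
        if tok in "[]":
            pending.append(tok)
            continue
        if last_word is not None:
            if "[" in pending:
                kind = "strong"
            elif "]" in pending:
                kind = "weak"
            else:
                kind = "none"
            result.append((last_word, tok, kind))
        last_word, pending = tok, []
    return result

print(boundaries("[[A B] C]"))  # B-C boundary is weak (left-branching)
print(boundaries("[A [B C]]"))  # A-B boundary is strong (right-branching)
```

On this encoding, B and C in [[A B] C] are separated only by a weak "]" boundary, while A and B in [A [B C]] are separated by a strong "[" boundary, mirroring the junctural asymmetry stated for (16).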
5. Conclusion
We have seen that data in WALS do not show a clear correspondence between
OV languages and simple syllable structure. However, we have argued that
this is partly due to the crude distinction of syllable complexity in
WALS. We have pointed out that we should take into account the geographical
gradience of coda variety, coda inventory, phonological simplification and
particles attached to nouns, as well as complement-head orders other than OV/VO.
10. Mehler et al. (2004) report experimental work showing the correlation between head-
complement order and rhythm, i.e. head-complement = stress-timed vs. complement-
head = mora-timed. Although it is based on data from only fourteen languages, the
result seems to apply to other languages as well.
Acknowledgments
We would like to thank Theo Vennemann for invaluable comments and sugges-
tions. We are also grateful to Bingfu Lu for his comments on Chinese dialects.
This work is supported by Grant-in-Aid for Scientific Research (A20242010,
C18520388) and Sapporo University.
References
Rastorgueva, Vera S.
1964 A short sketch of the grammar of Persian, (translated by Steven P.
Hill; edited by Herbert H. Paper.) Bloomington: Indiana University.
Shiraishi, Hidetoshi
2006 Topics in Nivkh phonology. Groningen Dissertations in Linguistics
61. University of Groningen.
Tokizaki, Hisao
2008 Symmetry and asymmetry in the syntax-phonology interface. On-in
Kenkyu (Phonological Studies) 11, 123–130.
VanDam, Mark
2004 Word final coda typology. Journal of Universal Language 5: 119–148.
Wagner, Michael
2005 Asymmetries in prosodic domain formation. MIT Working Papers in
Linguistics 49, 329–367.
Weiers, Michael
2003 Moghol. In: Juha Janhunen (ed.) The Mongolic languages, London:
Routledge, 248–264.
Manner, place and voice interactions in Greek
cluster phonotactics
Marina Tzakosta
Abstract
This paper evaluates cluster formation and cluster well-formedness in Greek on the
basis of three distinct scales, namely the scale of manner of articulation, the scale of
place of articulation and the scale of voicing. The proposal of this paper is that the
classical Sonority Scale (cf. Selkirk 1984, Steriade 1982) and the bi-dimensional model
proposed by Morelli (1999) in which cluster formation is evaluated on the basis of two
distinct scales, i.e. the manner and place scales, are not adequate to account for cluster
formation and cluster well-formedness. According to the present proposal, in addition
to the scales of manner and place, voicing is crucial for cluster well-formedness and
needs to constitute a distinct scale. Voicing actually defines a cluster as an acceptable
tautosyllabic sequence. Well-formedness is driven by the rightward satisfaction of the
scales in combination with the Distance (D) holding among cluster members. Different
degrees of satisfaction of the scales and different distances holding among cluster
members result in different degrees of cluster well-formedness. The theoretical claims
expressed here are tested through Greek dialectal and developmental data but aim at
having cross-linguistic value. The current proposal further contributes to the establish-
ment of principles governing syllabification.
1. Introduction
1. In stress-to-weight systems stress adds weight to the syllable that carries it while in
weight-to-stress systems stress falls on heavy syllables.
rise in sonority from left to right; therefore, stops are the least sonorous seg-
ments whereas vowels are the most sonorous ones. The notion of sonority
was first introduced by Sievers (1901) and further developed by Jespersen
(1904). Jespersen proposes the classication of phonemes in terms of sonority.
Sonority is considered to be a universal principle dependent on phonological
grounds. Moreover, there are acoustic studies which further support its universal
cross-linguistic character (cf. Jany et al. 2007).
Sonority is a gradient notion in the sense that it is comparative; for example,
stops are less sonorous than fricatives and both are less sonorous than vowels.
Moreover, the more sonorous a segment is, the more likely it is to occupy
a syllabic nucleus position. Conversely, the less sonorous a segment is, the
more likely it is to be part of a syllabic onset or a syllabic coda. Given the
above, a syllable is a contour schema rising in sonority towards the nucleus
and falling in sonority towards the coda. Rightward satisfaction of the scale
implies that, for example, stops may cluster with any consonant type to their
right on the scale and result in well-formed clusters. However, fricatives can
cluster with all consonant types except for stops which are located to their
left. Therefore, according to the SonS, FAFFR,2 FN, FL clusters are perfectly
acceptable, but FS3 sequences are not.
2. S stands for stops, F for fricatives, AFFR for affricates, N for nasals, L for laterals
and rhotics, G for glides, V for vowels and C for obstruent consonants, i.e. stops
and fricatives.
3. Morelli (1999) suggests that the systematic occurrence of obstruent clusters must
be explained in sonority-independent terms. She suggests that the sonority scale
should be divided into two distinct scales, one for PoA and one for MoA, along
which generalizations can be made. According to her proposal, FS sequences are
the only well-formed clusters in Greek. /s/ clusters are also unmarked along both
dimensions. However, Greek allows not only for FS clusters but also for SF, FS,
SS and FF sequences.
SD (sonority distance), on the other hand, a notion qualitative in nature, determines the degree
of cluster well-formedness (cf. Clements 1988, 1990, 1992). More specically,
cluster members separated by the largest possible rising sonority distance
make up the best-formed clusters. Numbers on the SonS signal
the distance among cluster members. Consequently, a SF cluster like /px/ with
a SD (1) is less well-formed compared to SL sequences like /pl/ with SD (4),
though both are well-formed clusters. Therefore, SD presupposes that cluster
well-formedness is marked by different degrees of cluster perfection and
acceptability. Put differently, cluster perfection is signaled by the biggest
possible sonority distance among cluster members (the minimal distance
being (1)), while cluster acceptability is signaled, in most cases, by (0) dis-
tance among cluster members; (0) distance is attested when cluster members
share the same manner of articulation, place of articulation or voicing.
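The SD computation can be sketched as follows (a toy model: the integer indices follow the left-to-right ordering S < F < AFFR < N < L of note 2, but their exact values, and the segment-to-class table, are illustrative assumptions of ours):

```python
# Sonority indices: only the differences between them matter here.
SONORITY = {"S": 0, "F": 1, "AFFR": 2, "N": 3, "L": 4}

# Hypothetical manner classes for the segments in the examples above.
MANNER = {"p": "S", "t": "S", "x": "F", "f": "F", "n": "N", "l": "L"}

def sonority_distance(c1, c2):
    """SD of a two-member cluster: sonority of the second member minus
    sonority of the first (positive means rising sonority)."""
    return SONORITY[MANNER[c2]] - SONORITY[MANNER[c1]]

def classify(c1, c2):
    sd = sonority_distance(c1, c2)
    if sd >= 1:
        return "perfect"        # rising sonority, SD at least (1)
    if sd == 0:
        return "acceptable"     # members share the same landing point
    return "non-acceptable"     # falling sonority (leftward on the SonS)

print(sonority_distance("p", "x"), classify("p", "x"))  # 1 perfect
print(sonority_distance("p", "l"), classify("p", "l"))  # 4 perfect
print(classify("x", "p"))  # non-acceptable (fricative before stop)
```

On this encoding /px/ (SD 1) and /pl/ (SD 4) are both perfect, with /pl/ the better formed of the two, while a fricative-stop sequence comes out non-acceptable, in line with the classification above.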
Gradience in cluster formation is one of the core issues of the present study
and will be discussed in detail. It is important to mention that Lass (1984)
has proposed a mirror image of the SonS, namely, the Scale of Consonantal
Strength (hereafter SConS). On the SConS, phonemes are evaluated and inter-
related not with respect to sonority but with respect to their strength. Therefore,
while vowels are the most sonorous and stops are the least sonorous segments
on the SonS, stops are the strongest segments while vowels are the weakest
segments on the SConS.
Claims like the ones discussed above allow us to make certain predictions
regarding cluster realization and, implicitly, cluster perception. More specifically,
if the SonS and SD govern cluster perfection, we expect that a perfect
cluster would be perceptually more salient than an acceptable cluster; as a
result, the former would have more chances to remain intact in its surface/
phonetic realization. In other words, we would expect that the SonS and SD
drive clarity of perception which, in turn, facilitates production. Consequently,
CL rather than CC clusters are expected to emerge more frequently not only
cross-linguistically but also in various aspects of a language (i.e. its dialectal
varieties, L1 and L2 data, language disorders). The accuracy of the above
assumptions is reinforced by the fact that multiple repair strategies, such as
epenthesis, deletion or fusion, apply in clusters with small SD, like SF or FN,
whereas clusters with big SD, like SL, are characterized by vowel anaptyxis.
These assumptions have been tested and verified by Greek L1 and L2 experi-
mental and developmental data in Tzakosta (2009) and Tzakosta and Vis
(2009a, 2009b, 2009c).
Although there is solid argumentation regarding the universal as well as
(per language) parametric factors that determine the formation of consonant
clusters at the level of the SonS and SD, little has been said regarding the
internal coherence of consonant clusters and additional factors which drive
2. The problem
c. CG .jos empty-ADJ.MASC.NOM.SG.
d. CN a.km acme/prosperity-FEM.NOM.SG.
.nos nation-NEUT.NOM.SG.
e. NN a.mne.si.a amnesia-FEM.NOM.SG.
In this study, the focus is on two-member CL and CC consonant clusters
because these cluster types, first, display a great variety of possible combinations
in Greek, second, taken together they are the most frequently attested (Protopapas
et al. in press), and, third, they differ radically regarding their phonological
representation. More specically and regarding this latter parameter, CC
sequences have tight phonological representations similar to those of complex
segments, and, consequently, they are difficult to perceive and produce. On the
contrary, CL clusters have a loose phonological representation, therefore,
they are assumed to be easy to perceive (Tzakosta and Vis 2009a). CC and CL
phonological representations are depicted in figures (2a) and (2b), respectively.
sibilants are fricatives, the former behave differently from other fricatives. Not
coincidentally, sibilants, when they appear in onset position, are considered to be
extrasyllabic segments. Moreover, we assume that this flexible and extrasyllabic
role of /s/ makes /st/ the most frequently attested cluster in Greek (Protopa-
pas et al. in press). It is important to note that, though excluded, these cluster
types reinforce our present account. For a relevant discussion see Tzakosta (in
press).
The major question underlying this study refers to the types of consonant
clusters emerging in various aspects of a language system. More specically,
Greek is characterized by constraints that limit the types of clusters allowed
in standard Greek. However, dialectal as well as developmental Greek L1 and
experimental L2 data reveal that clusters not allowed in the standard language
are allowed in other aspects of Greek. It will be shown that segments which
are unmarked under a theory of Markedness, and therefore expected to
surface earlier and more accurately in L1 and L2, are replaced by more
complex segments/sequences.
Such facts suggest that a theoretical account of the segmental composition
of clusters based on Feature Geometry and Underspecification is explanatorily
inadequate. There are additional questions related to the above claims. For
example, if CL are perfect clusters due to SD why do non-perfect clusters,
such as CC, emerge massively in Greek dialects and language development?
Why do clusters not allowed by the phonotactics of standard Greek emerge
in dialectal and developmental data? These topics will be addressed in the
discussion that follows.
Based on the questions raised in the previous section, the current
study has the following aims: first, to investigate the production patterns of
CL and CC clusters, with the additional aim of testing whether all cluster types
have the same survival chances in their surface realization, and, second,
to provide a typological account of CL and CC cluster formation in dialectal
varieties of Greek, L1 acquisition and L2 learning.
Our major working hypothesis is that the SonS is not adequate to explain
cluster formation. Rather, cluster formation should be evaluated on the basis
of three distinct scales of manner of articulation (hereafter MoA), place of
articulation (hereafter PoA) and voicing. More specically, we propose that
the MoA scale controls good cluster sonority (Clements 1988, 1990), the
PoA scale registers the satisfaction of the fixed place hierarchy (Prince and
Smolensky 1993), while voicing refines cluster formation.
4. Data sources
For the purposes of the present study we draw on data from three corpora:
first, indexed dialectal data (Tzakosta 2010, Tzakosta and Karra 2011) from
the major dialectal zones of Greek, namely, Dialects of Northern Greece
(Epirus, Meleniko, Lesvos, Pontos, Thassos, Corfu, Attica, Thessalia, Kozani,
Trikala, Samothraki, Thessaloniki, Koutsovlahika) and Dialects of Southern
Greece (Cyprus, Crete, Dodekanese, Ikaria). Data indexation was achieved
through the detailed study of grammars, atlases and dictionaries of Greek
dialects. No oral speech dialectal data were recorded.
The second corpus consisted of naturalistic Greek L1 developmental data
from 6 monolingual children whose ages ranged between 1;07 and 3;05 years.
The data were collected on the basis of a) a semi-structured technique of
picture naming and b) through free interaction with the children (Tzakosta
2004). The data were recorded and broadly transcribed using IPA.
The third corpus consisted of naturalistic Greek L2 data selected from
groups with different L1 backgrounds: first, 10 Dutch monolingual adults with
an age range between 25 and 60 years and of intermediate proficiency level, and,
second, 3 Romanian monolingual adults with an age range between 27 and 51 years
and of intermediate and advanced proficiency level. The data collection technique
used was structured questionnaires (cf. Tzakosta 2006). Data from both groups
were recorded and broadly transcribed using IPA.
It is important to mention that our study is qualitative in nature. Therefore,
we focus on the patterns of consonant clusters that emerge in Greek varieties,
L1 and L2, rather than on the frequencies of their surface realization. Conse-
quently, we do not provide statistical analyses or input frequency effects.6
6. For statistical analyses related to the topic of the current study the interested reader
may refer to Tzakosta (2009, 2010).
The D among the members of // is (3), while it is (4) among the members of
/pl/. A necessary condition for the formation of a perfect cluster is the minimal
satisfaction of all scales, i.e. with D (1).
Acceptable clusters are consonantal sequences consisting of members
mostly sharing the same landing point on the SonS. In cluster /pt/, for exam-
ple, both cluster members are voiceless stops; they only differ with respect
to place of articulation. In the discussion of the current proposal, we will high-
light the fact that acceptable clusters need to at least (vacuously) satisfy one of
the three scales.
Finally, non-acceptable clusters are consonantal sequences not respecting
the SonS. In other words, non-acceptable clusters are formed by consonants
selected in a leftward direction on the SonS, like /p/ whose first member
is a fricative and the second is a stop. Following Tzakosta (2009), and, as
already mentioned, we assume that different patterns in the production of CL
and CC clusters are due to differences in the clusters' perceptual load. Different
perceptual loads are due to distinct phonological representations. In other
words, complex phonological representations mirror heavy perceptual loads
while non-complex representations mirror light perceptual loads, as shown
in figures (2a) and (2b) above.
The problem arises because the SonS treats segments as inseparable wholes,
providing information only about the principles which govern cluster
formation, without giving any information about why certain clusters are
better- or worse-formed than others. According to the current proposal, the
SonS should be evaluated separately with respect to MoA, PoA and voicing
in order to assess subtle cluster differentiations. Given the cluster categoriza-
tion suggested above, we suggest that perfect, acceptable and non-acceptable
cluster formation depends on the degree of satisfaction of the scales of
manner, place and voicing, which are illustrated in figures 3, 4 and 5, respec-
tively. Like the classical SonS, all scales need to be satisfied in a rightward
manner. However, not all clusters are perfect to the same extent, since, as
already mentioned, cluster perfection is gradient; the bigger the D among
cluster members on all scales the better-formed the cluster. The minimal possi-
ble D for perfect clusters is (1) and the maximal is (4).
The manner scale in fig. 3 draws heavily on the classical SonS. In the data
in (3), (3d) is an example of a cluster which satisfies the manner scale, though
with the minimal possible D; the stop is the leftmost cluster member, while the
fricative is the rightmost one. In other words, /p/ in (3d) is a perfect cluster
on the manner scale with the minimal possible distance (1) holding among its
members. (3a–c), on the other hand, are instances of clusters which vacuously
satisfy the manner scale because their members land at the same point on the
scale, i.e. they are both either stops or fricatives. Cluster members sharing the
same manner of articulation form acceptable clusters. In addition, in (3a–b)
both cluster members are stops. It is interesting that in (3c) stop /p/ changes
to fricative /v/ and, consequently, a minimally perfect cluster becomes, due
to its fricative members, an acceptable one. It is important to mention again
that the difference between a minimally perfect and an acceptable cluster is the
D holding among their members. In a minimally perfect cluster D should be
(1), while in an acceptable cluster it is (0).
(3) a. /a..ti.kos/ [a..tkus] different-ADJ.MASC.NOM.SG.
b. /a.po.k.to/ [a.pk.tus] underneath-ADV.
(Meleniko, Andriotes 1989)
c. /pe./ [v] child-NEUT.NOM.SG.
d. /pi.a.m/ [pa.m] span-FEM.NOM.SG.
(Thessalia, Tzartzanos 1909)
On the other hand, the place scale depicted in fig. 4 is equivalent to the
fixed place hierarchy proposed by Prince and Smolensky (1993). According
to this hierarchy, velars are more marked than labials, and labials are
more marked than coronals. Translating the fixed place hierarchy
into the place scale proposed here means that a velar or a labial needs to be
the leftmost member of a cluster if a coronal is the rightmost one. Accord-
ingly, in order to form a perfect cluster, if the second member of a cluster is a
labial, the rst member needs to be a velar.
The data in (4) provide evidence that the place scale is satisfied, though
input clusters are slightly changed in their output realization due to D. More
specifically, in (4a) the cluster /l/, which is perfect at the manner level, becomes
// in order to achieve perfection at the place level as well. More specifically,
// and /l/ make up an acceptable cluster given that both segments land at the
same point on the place scale they are both coronals with D (0); however,
substitution of // for /f/ creates D (1) on the place scale among the members
of the newly formed cluster. Similarly, in (4b), although /v/ and /l/ make up
a perfect cluster whose members are marked by D (1), /v/ is substituted for
// in order to achieve an even bigger D (2). Again, data such as those in (4b)
illustrate that cluster perfection and acceptability are gradient. Certain clusters
are better than others due to D; the bigger the D among cluster members, the
better-formed a cluster is at the level of perfection and acceptability. In other
words, clusters whose members differ even minimally with respect to MoA
and/or PoA are preferred to those sharing the same MoA and/or PoA. Finally,
(4c) is a mirror case to those described in (4a) and (4b); more specifically,
although (4a) and (4b) illustrate instances of better-formed perfect clusters
compared to (4c), (4c) exemplifies that perfect clusters may be substituted for
acceptable ones. Acceptable clusters are characterized by a small, minimal or
even zero, D among their members on at least one of the three scales. In /f/,
the manner and voicing scales are vacuously satisfied, whereas the place scale
is minimally satisfied with D (1).7
(4) a. /li.ve.rs/ [i.vi.rs] depressing-ADJ.MASC.NOM.SG.
b. /vl.po/ [l.po] see-1SG.PRES. (Meleniko, Andriotes 1989)
c. /l.vo.me/ [f.vo.me] be sad-1SG.PRES.
(Pontos, Oikonomides 1958)
Finally, the voicing scale in fig. 5 is the least complex scale, given that
segments may be either [−voiced] or [+voiced]. According to this scale, a
perfect cluster is one whose first member is [−voiced] and whose second is
[+voiced]. The converse voicing order is responsible for the formation of non-
acceptable clusters. Consonants sharing the same voicing characteristics, i.e.
both voiceless or both voiced, form acceptable clusters.8
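The scale evaluation described here is essentially algorithmic, so a small sketch may help make the computation of D concrete. This is an illustration, not the paper's formalism: the scale orderings below (manner: stop before fricative; place: velar before labial before coronal; voicing: voiceless before voiced) are reconstructed from the prose, and the paper's figures remain the authoritative statements of the scales.

```python
# Sketch of the D (distance) computation behind the perfect/acceptable
# distinction. Scale orderings are assumptions reconstructed from the prose.

MANNER = {"stop": 0, "fricative": 1}
PLACE = {"velar": 0, "labial": 1, "coronal": 2}
VOICING = {"voiceless": 0, "voiced": 1}

def distance(scale, c1, c2):
    """D between C1 and C2 (read left to right) on one scale."""
    return scale[c2] - scale[c1]

def classify(d):
    """D >= 1 satisfies the scale (D == 1 is 'minimally perfect');
    D == 0 satisfies it only vacuously; a negative D violates it."""
    if d >= 1:
        return "perfect"
    if d == 0:
        return "acceptable"
    return "non-acceptable"

# stop + stop: same point on the manner scale, D (0) -> acceptable
print(classify(distance(MANNER, "stop", "stop")))        # acceptable
# stop + fricative: D (1) -> minimally perfect on the manner scale
print(classify(distance(MANNER, "stop", "fricative")))   # perfect
# velar + coronal: D (2) on the place scale -> perfect
print(classify(distance(PLACE, "velar", "coronal")))     # perfect
```

On this encoding, the gradience discussed in the text falls out directly: the larger the (positive) D between the members, the better-formed the cluster.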
Voicing has primarily been dealt with in connection with voicing and devoicing
alternations emerging mostly in Germanic languages (cf. Oostendorp 2004,
2006, among others) and assimilatory processes (cf. Al-Ahmadi Al-Harbi to
appear, Arvaniti 1999, Baroni 1997, Grijzenhout 2000). Such phenomena have
mostly been accounted for within OT by means of the *NC, ND, *ND con-
straints, which allow or forbid NC or ND sequences to emerge (cf. Borowsky
2000, Grijzenhout 2000, Lombardi 1995, 1999, Pater 1999).9 In order to
establish a voicing scale in our proposal, the motivating question was the
following: if voice assimilation applies to non-adjacent consonants and within
consonant clusters and, at the same time, [−voi] + [+voi] clusters like /k/ are
acceptable and attested in the norm and in dialectal data, why are [+voi] +
[−voi] clusters, like /k/, non-acceptable and, in fact, non-emergent in any
aspect of Greek?
The data in (5a–c) illustrate the rightward satisfaction of the voicing scale:
the first member of the cluster is voiceless while the second is voiced. Data
(5d–e) highlight the creation of clusters which share the same voicing charac-
teristics. Finally, the data in (5f–i) pinpoint cases of regressive devoicing assimi-
lation; it is interesting that both voiced and voiceless segments may drive
assimilation, as shown in (5f–i), respectively. We assume that in languages
like Greek, in which both voiced and voiceless segments are allowed in all
word positions (which means that neither voicing nor devoicing is preferred),
assimilation of both voicing and devoicing is allowed. All clusters in (5) are
acceptable (only (5c) is perfect, because it minimally satisfies all scales),
because they all vacuously satisfy at least one scale. In order to be perfect,
the clusters in (5) should at least minimally satisfy all scales.
10. Cf. Blaho and Bye (2006) for equivalent cross-linguistic results.
11. For the conditions under which the voicing scale may be violated see Tzakosta
(2009b).
12. Scale vacuous satisfaction is characteristic only of acceptable clusters.
There is still another important question to be addressed: why are data such
as those in (6) attested in different aspects of Greek? More specifically, why
are acceptable clusters preferred to perfect ones? First of all, all data in (6)
except (6e) are the result of vowel loss. Apparently, the combination of the
newly adjacent consonants is valid on the basis of the three scales; therefore,
acceptable clusters emerge. However, it is difficult for the present proposal to
account for cases such as (6e), in which a perfect cluster is substituted for an
acceptable one. We assume that (6e) is rather a case of cluster misperception
which has become established in the dialect over time. This is apparently an
issue that is still open for discussion.
the latter. (7b) and (7d) violate both the manner and the place scale, but due
to the vacuous satisfaction of the voicing scale the output clusters are acceptable.
(7) a. /a.ft/ [a.pt], [a.ft] this-DEM.PR. (B: 1;11.27)
b. /sxo.l.o/ [xo.l.o] school-NEUT.NOM.SG. (D: 2;07.06)
c. /o.br.la/ [ku.bl.la] umbrella-FEM.NOM.SG. (Me:1;11.22)
d. /p.sxa/ [p.ka] Easter-NEUT.NOM.SG. (B.M.: 2;09.25)
Dutch and Romanian learners of Greek exhibit equivalent data, as exemplified
in (8) and (9), respectively.
(8) a. /fo.to.ra.f.a/ [fo.to.xra.f.a] photo-FEM.NOM.SG. (S1)
b. /gri..ris/ [kri.ni..ris] nasty-ADJ.MASC.NOM.SG. (S2)
c. /e.vo.m.a/ [e.vdo.m.da] week-FEM.NOM.SG. (S3)
a. /o.ri.kt/ [fri.kt] tanker-NEUT.NOM.SG. (S2)
c. /u.ra.ns/ [i.ra.ns] sky-MASC.NOM.SG. (S3)
d. /e.po./ [e.p.ksi] season-FEM.NOM.SG. (S4)
e. /c.ni.si/ [kl.si] circulation-FEM.NOM.SG. (S5)
(9) a. /f.ri.o/ [fto.r] fluorine-NEUT.NOM.SG. (S3)
b. /e.vo.m.a/ [e.vdo.m.da] week-FEM.NOM.SG. (S1)
c. /a.v/ [a.vg] egg-NEUT.NOM.SG. (S2)
d. /xte.n.zo/ [kte.n.zo] comb-1SG.PRES. (S1)
e. /.no/ [gd.no] denude-1SG.PRES. (S2)
We assume that the preference for acceptable clusters is an indication of
the freer cluster formation mechanisms characteristic of Greek dialects but
also of other aspects of the language. Dialects, especially those of the northern
dialectal zone, are less conservative regarding cluster synthesis, given that
clusters may appear in coda position due to the application of phonological
rules by which high vowel loss and/or raising apply in unstressed syllables
(Newton 1972). This allows various acceptable clusters to appear extensively
in the surface realization. In acceptable clusters, consonantal combinations are
freer than those of a perfect cluster, given that D (0) allows a high number of
consonantal sequences to emerge. Therefore, the number of acceptable clusters
is higher than that of perfect clusters.
Cluster formation gradience is illustrated in tables 1–3. Table 1 illustrates
the segmental combinations which result in gradience in cluster formation at
the level of manner of articulation. Table 2 displays gradience at the level of
place of articulation, while table 3 presents gradience at the level of voicing.
References
Al-Ahmadi Al-Harbi
To appear English voicing assimilation: Input-to-output [voice] and Output-
to-Input [voice]. Journal of King Abdulaziz University 13.
Andriotes, Panagiotes
1989 The Dialect of Meleniko [in Greek]. Thessaloniki: Publications of
the Society of Macedonian Studies.
Arvaniti, Amalia
1999 Greek voiced stops: Prosody, syllabification, underlying representa-
tions or selection of the optimal? Proceedings of the 3rd Interna-
tional Conference of Greek Linguistics. 883390. Athens: Ellinika
Grammata.
Baroni, Marco
1997 The representation of prefixed forms in the Italian lexicon: Evidence
from the distribution of intervocalic [s] and [z] in northern Italian.
M.A. Thesis, Department of Linguistics, UCLA.
Manner, place and voice interactions in Greek cluster phonotactics 113
Tzakosta, Marina
2006 Developmental paths in L1 and L2 phonological acquisition: conso-
nant clusters in the speech of native speakers and Turkish and Dutch
learners of Greek. In Andrianna Belletti, Elisa Bennati, Cristiano
Chesi, Elisa di Domenico and Ida Ferrari (eds.), Language Acquisi-
tion and Development: Proceedings of GALA 2005, Generative Ap-
proaches in Language Acquisition. 536–549. Cambridge: Cambridge
Scholars Press.
Tzakosta, Marina
2009 Asymmetries in /s/ cluster production and their implications for
language learning and language teaching. Proceedings of the 18th
International Symposium of Theoretical and Applied Linguistics.
365–373. Department of English Language and Linguistics: Aristotle
University of Thessaloniki.
Tzakosta, Marina
2010 The importance of being voiced: cluster formation in dialectal
variants of Greek. In Angela Ralli, Brian Joseph, Marc Janse and
Athanasios Karasimos (eds.), E-proceedings of the 4th international
Conference of Modern Greek Dialects and Linguistic Theory. 213–
223. University of Patras. http://www.philology.upatras.gr/LMGD/
el/index.html (ISSN: 1792-3743).
Tzakosta, Marina
In press Consonantal interactions in dialectal variants of Greek: a typological
approach to three-member consonant clusters. Greek Dialectology
6.
Tzakosta, Marina and Athanasia Karra
2011 A typological and comparative account of CL and CC clusters in
Greek dialects. In Marc Janse, Brian Joseph, Angela Ralli and Spyros
Armosti (eds.), Studies in Modern Greek Dialects and Linguistic
Theory I. 95105. Nicosia: Kykkos Cultural Research Centre.
Tzakosta, Marina and Jeroen Vis
2009a Asymmetries of consonant sequences in perception and production:
affricates vs. /s/ clusters. In Anastasios Tsangalidis (ed.), Selected
Papers from the 18th International Symposium on Theoretical and
Applied Linguistics. 375–384. Department of English Language
and Linguistics: Aristotle University of Thessaloniki: Monochromia.
Tzakosta, Marina and Jeroen Vis
2009b Perception and production asymmetries in Greek: evidence from the
phonological representation of CC clusters in child and adult speech.
Greek Linguistics 29: 553–565.
Tzakosta, Marina and Jeroen Vis
2009c Phonological representations of consonant sequences: the case of
affricates vs. true clusters. In Georgios K. Giannakis, Mary Baltazani,
Georgios I. Xydopoulos and Tassos Tsaggalidis (eds.), E-proceed-
ings of the 8th International Conference of Greek Linguistics
Zsuzsa Várnai
Abstract
The purpose of this paper is to present a description of the clusters of Samoyedic languages:
Nenets (Tundra), Enets, Nganasan and Selkup (Taz dialect), which are endangered
Uralic languages spoken in northern Siberia in Russia.
In this paper I will give an account of the syllable types attested in root lexemes and
discuss the constraints that apply to the constituents of the syllable in four examined
languages. Despite the fact that these languages are historically and geographically
very close to each other, they have different syllable structures, and they choose differ-
ent processes to adapt borrowed clusters from Russian. I will focus on the similarities
and differences between these languages with respect to the processes affecting clusters
in Russian loanwords. Russian is counted as having complex syllable structure, very
different from the Samoyedic languages.
After a brief description of the languages in question I define a syllable template and
the representation of the syllable for each language. Then I specify the possible com-
plexity of onset and coda, and I show what types of sequences exist in these languages
and what types do not. Then I discuss what happens in these languages to relatively old
Russian loanwords.
1. Introduction
Russian loanwords. Russian has many clusters, not only in word medial posi-
tion across syllable boundaries, but also in onset position at the beginning of
the word. My research questions are the following: How are Russian con-
sonant clusters treated in Samoyedic? Which types of clusters are retained,
and which ones are simplified in the course of borrowing from Russian?
What happens to branching onsets in Samoyedic languages? Which way do
they choose to adapt these clusters? Do they all choose the same way or dif-
ferent ways? Which types of sequences undergo simplification processes, and
what processes do they undergo?
1.1. Sources
The purpose of this paper is to present a description of the clusters of Samoyedic
languages, esp. of Nenets (Tundra), Enets, Nganasan and Selkup (Taz dialect),
which are endangered Uralic languages spoken in northern Siberia in Russia (see
map in Fig. 1). They have not yet been thoroughly investigated in the phono-
logical literature. Despite the fact that the languages in question are histori-
cally and geographically very close to each other, they have different syllable
structures, and they choose different processes to adapt borrowed clusters from
Russian. Russian is counted as having complex syllable structure (see WALS
2005), very different from the Samoyedic languages. It is remarkable that
different repair mechanisms are found for the same Russian cluster type.
NENETS, YURAK-SAMOYED
Territory / Region: Russia, Northeast Europe and Northwest Siberia: the
Yamal-Nenets and Khanty-Mansi Autonomous Areas in the Tyumen Region;
the Tajmyr Municipal District of the Krasnoyarsk Region; and the Nenets
Autonomous Area in the Arkhangelsk Region
Dialect: Tundra and Forest Nenets
Ethnic population: 41,302
Total number of speakers: 29,052
Finally, let me compare the linguistic situation of the four Samoyedic minor-
ities under review with Fishman's Graded Intergenerational Disruption Scale
(GIDS) (1991, 2001). He has designed a framework to assist speakers of an
endangered language in revitalizing their mother tongue and in reversing
language shift. We have relied on the model when identifying the threatened
status of the Uralic minority languages described above and assigned each to
the following GIDS levels:
Stage 8 So few fluent speakers that the community needs to re-establish
language norms; often requires outside experts (e.g., mostly native
speaker linguists).
Stage 7 Older generation uses language enthusiastically but children are not
learning it. L1 is only taught as L2.
Stage 6 Language and identity socialization of children takes place in home
and community.
Stage 5 Language socialization involves extensive literacy, usually including
non-formal L1 schooling.
Stage 4 L1 used in children's formal education in conjunction with national
or official language.
Stage 3 L1 used in workplaces of larger society, beyond normal L1
boundaries.
Stage 2 Lower governmental services and local mass media are open to L1.
Stage 1 L1 used at upper governmental level.
Assigning the four Uralic minority languages described above to these levels,
the following situation was found: their situation is alarming in general; Enets
and some Selkup dialects are at Stage 8; Nganasan and some Nenets and
Selkup dialects are at Stage 7. Only some reindeer herding Nenets communities
are at Stage 6.
Most of the sources used in the study provide only word lists without con-
text. They are usually written documents and dictionaries. The Nganasan and
Enets dictionaries are written for pupils of primary schools, including approx.
3,000 entries, while two others, the Selkup and Nenets ones, contain far more
entries. Alternative sources may not be useful for loanwords. Even though
there are many published texts of these languages, they are usually tales, folk-
lore texts, and stories with very few loanwords.
Nenets: Tereshchenko (1989), Tundra Nenets dialect
Enets: Sorokina & Bolina (2001), Tundra Enets dialect
Nganasan: Kosterkina, Momde & Zhdanova (2001)
Selkup: only the Taz dialect will be under investigation here (Helimski 2007)
124 Zsuzsa Várnai
2. The survey
the sequence an.ta is preferred (more natural, less marked) to ap.ta. Con-
sequently the most preferred heterosyllabic cluster is the sonorant-obstruent
(SO) cluster, and the obstruent-sonorant (OS) cluster is less well-formed.
Referring to the SSP and the SCL, I will determine the well- or ill-formedness
of the clusters.
In the next section we present the consonant systems and their distribution,
the most signicant phonotactical restrictions and regularities, and CC combina-
tions for each of the languages.
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Nenets consonants p pj t tj k b bj d dj c cj s sj x
m mj n nj l lj r rj w j in the positions #__, V__V, __C, C__ and __#; the
alignment of the + marks was lost in extraction.]
[Table: attested word-final C1C2 combinations in Nenets by class (C1: plosive,
affricate, spirant, nasal, liquid, glide; C2: obstruent vs. sonorant); the align-
ment of the +/++ marks was lost in extraction.]
The most frequent word-final types of clusters in Nenets are SO clusters. Ob-
struents occur frequently as the second constituent. Of the sonorants, only the
nasals can occur in C2 position; liquids and glides do not occur at all. Affri-
cates and spirants cannot form the first element of the cluster.
Transsyllabic clusters in Nenets are shown in Table 4. They are adjacent
segments belonging to two different syllables.
The distribution of transsyllabic cluster types in Nenets is slightly different
from that of final codas: affricates and spirants can be the first element of the
cluster, and liquids and glides can occur in C2 position.
There are also clusters of three elements in Nenets. They can occur in
medial and final position. In medial position the syllable boundary is after the
C2: C1C2$C3. They are generally all well-formed clusters from the viewpoint
of sonority. In syllable contact (i.e., intervocalic) clusters of three elements, C1
is most often a plosive, a liquid, a glide or a nasal, C2 is a glottal stop, an
obstruent, an affricate or a nasal, and C3 is an obstruent, a nasal or a glottal
stop. Their elements are never from the same class.
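The C1C2$C3 generalization can be stated procedurally. The sketch below is an illustration only: the vowel inventory and the example string are hypothetical, not Nenets data.

```python
# Place the Nenets-style syllable boundary '$' inside a medial CCC run:
# the boundary falls after the second consonant (C1C2$C3).
VOWELS = set("aeiouy")  # assumed transliteration vowels, not Nenets phonology

def split_medial_ccc(word):
    """Insert '$' between C2 and C3 of a V-CCC-V sequence."""
    out = []
    i = 0
    while i < len(word):
        out.append(word[i])
        # detect V C C C V starting at position i and split after C2
        if (i >= 1 and word[i - 1] in VOWELS and
                i + 3 < len(word) and
                all(c not in VOWELS for c in word[i:i + 3]) and
                word[i + 3] in VOWELS):
            out.append(word[i + 1])
            out.append("$")
            out.append(word[i + 2])
            i += 3
            continue
        i += 1
    return "".join(out)

print(split_medial_ccc("ampka"))  # hypothetical form -> amp$ka
```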
Consonant clusters in four Samoyedic languages 127
[Table 4: transsyllabic C1C2 combinations in Nenets by class (C1: plosive,
affricate, spirant, nasal, liquid, glide; C2: obstruent vs. sonorant); the align-
ment of the +/++ marks was lost in extraction.]
2.1.2. Enets
The classication of Enets consonants is shown in Table 5.
[Table 5: classification of Enets consonants by place (labial, dental, palatal,
velar, glottal) and manner (plosive, sibilant and spirant fricatives, nasal,
lateral, trill, glide), with voicing; the layout was lost in extraction. Two
spirants are noted as free variants of s and sj, respectively.]
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Enets consonants p t tj k b d dj g s sj x m n nj l lj
r j in the positions #__, V__V, __C, C__ and __#; the alignment of the +
marks was lost in extraction.]
The following phonotactic regularities apply in Enets: The onset can be empty
or filled, but it can only be non-complex. Thus both vowel- and consonant-
initial syllables are possible. The nucleus may be simple or branching in
Enets. Complex nuclei occur only when they dominate a single element as in
Nenets. Codas in this language may be empty or simple. There are no CCC
clusters in Enets.
Enets syllable contact cluster types are shown in Table 7. In Enets, some
intervocalic geminates can occur word medially: dd, gg, , ss.
[Table 7: Enets syllable contact C1C2 combinations by class (C1: plosive,
fricative, nasal, liquid, glide; C2: obstruent vs. sonorant); the alignment of
the +/++ marks was lost in extraction.]
The most frequent types of clusters in Enets are likewise SO. Both obstruents
and sonorants, except glides, occur as the second constituent.
2.1.3. Nganasan
The classication of Nganasan consonants is shown in Table 8.
[Table 8: classification of Nganasan consonants by place (labial, dental,
palatal, velar, glottal) and manner (plosive, sibilant and spirant fricatives,
nasal, lateral, trill); the layout and several footnoted symbols were lost in
extraction.]
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Nganasan consonants t tj k b d dj g s sj h m n nj
l lj r j in the positions #__V, V__V, __C, C__ and V__#; the alignment of
the + marks was lost in extraction.]
[Table 10: Nganasan syllable contact C1C2 combinations by class (C1:
plosive, fricative, nasal, liquid; C2: obstruent vs. sonorant); the alignment
of the +/++ marks was lost in extraction.]
2.1.4. Selkup
The Selkup consonant system is shown in Table 11.
The following table shows which consonants can occur in the various syllabic
positions in this language:
[Table: occurrence of the Selkup consonants p t k q t s m n nj l lj r j w in
the positions #__, V__V, __C, C__ and __#; the alignment of the + marks
was lost in extraction.]
[Table 13: Selkup syllable contact C1C2 combinations by class (C1: plosive,
affricate, fricative, nasal, liquid, glide; C2: obstruent vs. sonorant); the
alignment of the +/++ marks was lost in extraction.]
Consonants cannot appear as syllable nuclei in any of the four languages ana-
lysed here. There are no complex edge components in the Samoyedic languages
in any position, except for final complex codas in Nenets. After derivation and
inflection there could be C#CC, but a simplification process applies, deriving
C1C2C3 → C1C3. These languages have moderately complex syllable struc-
ture, which is the most frequent structure in the world's languages (247 of
485 studied languages have moderately complex syllable structure according
to WALS 2005); this means they permit a single consonant after the vowel
and/or allow two consonants to occur before the vowel, but adhere to a limita-
tion to only the common two-consonant patterns (Maddieson 2008, WALS
2005). Edge effects are active in Nganasan, where an initial onset is obligatory
whenever the nucleus is branching, and in Nenets, where there are medial and
final branching codas and final clusters with three constituents (CCC#).
Table 16 summarizes the information given in Tables 4, 7, 10, and 13 about
contact clusters (i.e., those straddling a syllable boundary) in the languages
investigated.
Accordingly, I will only analyse early Russian loans in the four Samoyedic
languages, and will not deal with later adoptions by the bilingual speech com-
munities.
Given the strong restrictions on onset and coda complexity in Samoyedic
languages, and the extensive range of clusters found in Russian, it is interest-
ing to examine the processes affecting Russian loanwords.
Not all clusters have been investigated in all languages; only those which
were represented in the sources can be discussed here. Gaps in the picture are
due to missing evidence, i.e. the relevant clusters do not occur in the dic-
tionaries or are not represented in the sample. Unfortunately we lack exten-
sive quantities of data, so we cannot make predictions but can only review
the regularities.
2.3.1.1. Epenthesis
The most frequent strategy is epenthesis. It is active in every position in all
four languages examined here. This, moreover, corresponds to crosslinguistic
data: epenthesis appears to be the most frequent adaptation process across
languages (Paradis and LaCharité 1997).
When vowel epenthesis is used to break up a consonant cluster, there is
often more than one location where the vowel could be placed to produce a
phonotactically acceptable output. For example, if a language has open syllable
structure {CV, V}, hence disallowing CC clusters at the beginning of a word,
an initial CCV can be broken up by putting a vowel before the consonants
(VC.CV), i.e. prothesis, or between the consonants (CV.CV), i.e. anaptyxis. In
a medial CCC cluster, the vowel could occur before the second or the third
consonant. The choice of epenthesis location is language-specific.
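The two options can be sketched as a minimal function. This is my own illustration under stated assumptions: plain Latin transliteration and a five-vowel check; the epenthetic vowel varies by language, as described below.

```python
# Candidate repairs for a word-initial CC cluster under an open-syllable
# template {CV, V}: prothesis inserts before the cluster, anaptyxis inside it.
VOWELS = set("aeiou")

def epenthesis_candidates(word, vowel):
    c1, c2 = word[0], word[1]
    if c1 in VOWELS or c2 in VOWELS:
        raise ValueError("word must begin with two consonants")
    return {
        "prothesis": vowel + word,                # V.CCV -> resyllabified VC.CV
        "anaptyxis": c1 + vowel + c2 + word[2:],  # CV.CV
    }

# Russian truba 'chimney, pipe': anaptyxis yields the Nganasan form turuba
# listed in the appendix; prothesis would yield utruba instead.
print(epenthesis_candidates("truba", "u"))
# {'prothesis': 'utruba', 'anaptyxis': 'turuba'}
```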
2.3.1.2. Prothesis
A particular type of epenthesis is when the vowel is inserted before the con-
sonant cluster at the beginning of the word; this is also known as prothesis.
That process can be observed in the Nenets, Nganasan and Selkup data. The
inserted vowel is a in Nenets and Nganasan and i in Selkup and in some cases
in Nganasan. Prothesis affects mostly sibilant + plosive clusters at the beginn-
ning of the word.
2.3.1.3. C1-deletion
In general, vowel epenthesis seems to be a heavily preferred repair type in
loanword adaptation. Uffmann (2007) surveys case studies of loanword adap-
tation and concludes that consonant deletion is a marginal phenomenon com-
pared to epenthesis: adding extra segments is less undesirable than deleting
segments from the word (Paradis and LaCharité 1997). C1-deletion affects
only Russian tautosyllabic clusters in the onset. It is active in each of the four
languages:
Nenets:
stakan > takan cup
škola > kola school
vtulka > tulka lead shot
Enets:
škola > kola school
stjeklo > tjeklo glass
Nganasan:
skamejka > kamejka bench
škola > kol school
Selkup:
spirt > pirt spirit
zdarovatj-sja > tarowatt-qo to welcome
We should mention that the same cluster can be affected by several different
processes; i.e., Russian word-initial sibilant + plosive clusters can be borrowed
into Nenets with C1-deletion or prothesis (see the discussion in 2.3.2 below).
2.3.1.4. C2 -deletion
This is a very interesting repair process. In general, when truncation occurs it
eliminates the first consonant of the cluster. We have only three pieces of data
for C2-deletion. This repair strategy is active only in Selkup, affecting two
intersyllabic clusters and one onset cluster. The Russian complex sibilant +
plosive onset cluster is resolved by two types of truncation in Selkup:
zdarovatj-sja > C1-deletion: tarowatt-qo and C2-deletion: sarowatt-qo. This
dichotomy is dialectal. Unfortunately, we have very little data; it would be
useful to get more examples of C2-deletion.
Selkup
zdarovatj-sja > sarowatt-qo to welcome
kukla > kuka puppet
nužda > nuža poverty
2.3.1.5. CV-metathesis
This adaptation strategy primarily affects initial onset clusters; it is not a com-
mon strategy, and its goal is to restructure the complex onset and to shift the
cluster to the syllable boundary:
CCVCV > CVCCV truba > turba
or
CVCCCVCV > CVCCVCCV kastrulja > kosturlja.
Enets
platok > poltok kerchief
truba > turba chimney, pipe
Selkup
krupa > kurpa cereals
kruptatka > kurtatka grits
kastrulja > kosturlja pot
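For the initial-cluster cases, the CCVCV > CVCCV pattern amounts to swapping C2 with the following vowel. A one-line sketch (my illustration, operating on plain transliterations):

```python
def cv_metathesis(word):
    """Shift an initial CC cluster onto the syllable boundary: CCV... -> CVC..."""
    c1, c2, v, rest = word[0], word[1], word[2], word[3:]
    return c1 + v + c2 + rest

print(cv_metathesis("truba"))  # turba, as in the Enets data
print(cv_metathesis("krupa"))  # kurpa, as in the Selkup data
```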
2.3.1.6. Syncope
Syncope is an extraordinary phenomenon, unique in the sample: it works only
in Enets and produces (rather than removes) syllable contact clusters. Presum-
ably the aim of this strategy is to make a trisyllabic word bisyllabic, because
bisyllabic structures are the most frequent ones in the Samoyedic languages.
Unfortunately we have very little data, only these two examples:
Enets
bumaga > bomga paper
malako > molka milk
2.3.1.7. Substitution
Substitution is a little different from the other strategies. It is not a restructur-
ing repair but a kind of assimilation in which non-native segments are mapped
onto the phonetically closest ones that are well-formed in the native phonology.
It affects mostly contact clusters in intersyllabic position, but it can also affect
single segments.
Nenets doktor toxtur doctor
flag plak flag
Enets lavka lapka store
Nganasan kanfety kmpet candy
lavka lapku store
Selkup počta pota / pocta post office
rovna romna exactly
Substitution can act together with a restructuring repair (epenthesis):
On = onset
Co = coda
The transsyllabic clusters are the second most frequent locus of repair pro-
cesses. We should mention that, according to our data, coda clusters are
resolved by epenthesis only.
In a voiceless sibilant + stop cluster, a vowel tends to be inserted before the
cluster, while in an obstruent + sonorant cluster a vowel tends to be inserted
into the cluster. In Table 19 we can examine what kinds of clusters are affected
by the various repair processes according to position in the Samoyedic languages.
On = onset F = fricative
Co = coda L = liquid
O = obstruent N = nasal
S = sonorant P = plosive
s = sibilant
There is no SS repair and SO repair is very rare. The most frequent types of
cluster affected by repair mechanisms are sP and PL clusters: they are affected
by all deletions, epenthesis, metathesis, and substitution as well. They are the
most unacceptable sequences in all three positions.
The resulting order for repair strategies according to the most frequent
cluster types is:
sP: epenthesis > deletion > substitution > metathesis
PL: epenthesis > metathesis / substitution > deletion
FP: substitution > deletion > syncope
LP: epenthesis / syncope
PN: epenthesis / substitution
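The ordering above can be restated as a small lookup table. The encoding below is mine, not the author's; it simply transcribes the ranking given in the text.

```python
# Frequency ordering of repair strategies per cluster type, as listed above.
# Slash-joined entries are ties in the original ranking.
REPAIR_ORDER = {
    "sP": ["epenthesis", "deletion", "substitution", "metathesis"],
    "PL": ["epenthesis", "metathesis/substitution", "deletion"],
    "FP": ["substitution", "deletion", "syncope"],
    "LP": ["epenthesis/syncope"],
    "PN": ["epenthesis/substitution"],
}

def preferred_repair(cluster_type):
    """Most frequent repair for a cluster type, per the ordering above."""
    return REPAIR_ORDER[cluster_type][0]

print(preferred_repair("sP"))  # epenthesis
```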
3. Conclusion
The Samoyedic languages permit consonant clusters, but they are very restric-
tive in terms of complex edge components. For example, most of them do not
permit initial consonant clusters, or more than two consecutive consonants in
other positions, especially at the same side of the syllable boundary. There are
Appendix
Nenets: V-epenthesis
onset (#CC)
Russian Nenets
gr gram xaram gramm
kr krupa xurupa cereals
kl klass xalas class
PL
OS kl kravat j xorovat j bed
dr drob j torob barrel
br brezent persent y
PN kn kniga xyika book
coda (CC#)
OS PL tr kilometr xilometra km
prothesis
OO sP šk škola askola / xaskola school
C1 -deletion
onset (#CC)
OO sP st stakan takan cup
stol tol table
šk škola kola school
škaf kap cupboard
FP vt vtulka tulka lead shot
substitution
C$C
FP ft kaftot ka xoptocka blouse
OO
PP kt doktor toxtur doctor
- fotograr- to take a
OS PL gr potokrapirujas j
ovat j photo
onset (#CC)
PL br brigada prigada brigade
OS
FL fl flag plak flag
Enets: V-epenthesis
onset (#CC)
Russian Enets
pla pala log
pl
plat j e palat j a dress
OS PL
br brevno beremno beam
kl j kl j ut kul j ut key
coda (CC#)
OS PL tr metr metra meter
SO LF rf arf arpa scarf
C1 -deletion
onset (#CC)
zdarova a doroba health,
zd
welcome
k kola kola school
sk skutno kuno boring
sp spasiba pasiba aj thanks
stakan takan cup
metathesis
onset (#CC)
pl platok poltok kerchief
OS PL chimney,
tr truba turba
pipe
syncope (CVC I C$C)
SO NP bumaga bomga paper
LP malako molka milk
substitution
C$C
vk I
OO FP lavka lapka store
pk
Nganasan: V-epenthesis
onset (#CC)
Russian Nganasan
brigada birigad brigade
br
br j uki buruk trousers
tr chimney,
truba turuba
pipe
PL
OS kl kladovka kolodovka chamber
krest j kirist chest
kr
krupa kyryh cereals
pl plan holan plan
PN k kiga kiig book
coda (CC#)
dr kedr kedr pine
OS PL tr metr metr meter
br nojabr njabri November
C$C
br fabrika hu abirik factory
PL tr natruska naturuska strainer
OS
kl uklad ukulat steel
PN d jm s j ed j moj s j ed j emi seventh
SS LN rm t j urma t j yryma prison
SO NP nt kontora kntor ofce
prothesis
k kola askol school
OO sP
st stul istul chair
SO Ls r ranoj ars j enj rye
C1 -deletion
onset (#CC)
sk skamejka kamejka bench
spasiba hu aiba thanks
sp
spravka horaapk certificate
stakan takan cup
OO sP st
stol tol table
zd zdarovat j - drbatu- to
sja dja welcome
k kola kol school
substitution
C$C
NF nf I
SO konfety kmpet candy
np
FP vk I
OO lavka lapku store
pk
Selkup: V-epenthesis
onset (#CC)
Russian Selkup
OO sP gr gruz kurus cargo
coda (CC#)
OO sP sp spirt pirta spirit
OS PL tr metr metra meter
Coregonus
l jdj s j el j d j sel j t j a
SO LP sardinella
lk j olk olka silk
C$C
OO sP st j st j eklo t j ekla glass
prothesis CC#
sk skamejka iskamjka bench
OO sP
st stol istol table
C1 -deletion
onset (#CC)
spirt pirt spirit
to get
sp sputat j - mixed,
putaji-qo
sja to get
confused
zd zdarovat j - tarowatt- to
OO sP
sja qo welcome
st stakan takan cup
sk skamejka kamejka bench
sp spasiba paipo thanks
st staro toru guard
C2-deletion
  onset (#CC)
   OO sP  zd  zdarovatʲsja → sarowatt-qo 'to welcome'
  C$C
   OO sP  žd  nužda → nuža 'poverty'
   OS PL  kl  kukla → kuka 'puppet'
metathesis
  onset (#CC)
   OS PL  kr  krupa → kurpa 'cereals'
          kr  kruptatka → kurtʲatka 'grits'
  C$C
   OO sP  st  kastrulʲa → kosturlʲa 'pot'
substitution
  C$C
   AP  tt  počta → potta ~ potʲta 'post office'
   PP  tk  kadka → katka 'tub'
   OO FF  fh  savhoz → sapko 'state farm'
   OO FP  fk  lavka → lapky 'store'
   OO sP  žd  nužda → nuta 'poverty'
   PN  dn  ladna → latno 'all right'
   OS sP  sk  natruska → natruška 'strainer'
   FN  vn  rovna → romna 'exactly'
   SO NA  nts  palatʲence → polotensa 'towel'
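The adaptation strategies tabulated above are mechanical enough to sketch as string operations. The toy Python functions below are not from the chapter and are illustrative only: they ignore the vowel-harmony, stress and phonotactic factors that determine the actual quality of epenthetic and prothetic vowels, and merely mirror the attested patterns truba → turuba, stol → istol and stakan → takan.

```python
def epenthesis(word):
    """Break an initial CC cluster by copying the first stem vowel
    between the two consonants (cf. truba -> turuba)."""
    c1, c2, rest = word[0], word[1], word[2:]
    vowel = next(ch for ch in rest if ch in "aeiouy")
    return c1 + vowel + c2 + rest

def prothesis(word, vowel="i"):
    """Prepend a vowel before an initial sC cluster (cf. stol -> istol)."""
    return vowel + word

def c1_deletion(word):
    """Drop the first consonant of an initial cluster (cf. stakan -> takan)."""
    return word[1:]

print(epenthesis("truba"))    # turuba
print(prothesis("stol"))      # istol
print(c1_deletion("stakan"))  # takan
```

All three outputs match adapted forms attested in the tables, though the real adaptations also adjust segments (e.g. Nganasan plan → holan) in ways this sketch does not capture.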
References
Kang, Yoonjung
2003 Perceptual similarity in loanword adaptation: English postvocalic
word-final stops in Korean. Phonology 20: 219–273.
Kazakevich, Olga
2006 The functioning of the indigenous minority languages in the Yamalo-
Nenets autonomous area, Turukhansk district of the Krasnoyarsk
territory and Evenki autonomous area. http://lingsib.iea.ras.ru/en/
round_table/papers/kazakevich1.shtml
Kenstowicz, Michael
1994 Phonology in Generative Grammar. Oxford: Blackwell.
Kenstowicz, Michael
2003 Salience and similarity in loanword adaptation: A case study from
Fijian. To appear in Language Sciences.
Kosterkina, N. T., A. Č. Momde and T. Ju. Ždanova
2001 Nganasan-Russian and Russian-Nganasan Dictionary [in Russian].
Saint Petersburg.
Krivonogov, V. P.
1998 Ethnological processes in Central Siberian minorities [in Russian].
Kuznetsova, A. I., E. A. Helimskij and E. V. Grushkina
1980 Collection of Selkup language, Taz dialect, Vol. I [in Russian].
Moscow.
McConnell, G. D. and V. Mikhalchenko (eds.)
2003 Written languages of the world: Languages of the Russian Federation
[in Russian]. Moscow.
Maddieson, Ian
2008 Syllable structure. In: Martin Haspelmath, Matthew Dryer, David
Gil, and Bernard Comrie (eds.), The World Atlas of Language Struc-
tures Online, chapter 12. Munich: Max Planck Digital Library.
Available online at http://wals.info/feature/12
Murray, Robert W. and Theo Vennemann
1983 Sound change and syllable structure in Germanic phonology. Lan-
guage 59(3): 514–528.
Paradis, Carole and Darlene LaCharité
1997 Preservation and minimality in loanword adaptation. Journal of Lin-
guistics 33: 379–430.
Salminen, Tapani
1997 Tundra Nenets Inflection. Mémoires de la Société Finno-Ougrienne
227. Helsinki.
Salminen, Tapani
1998 A Morphological Dictionary of Tundra Nenets. Lexica Societatis
Fenno-Ugricae 26. Helsinki.
Sipos, Mária, Katalin Sipőcz, Zsuzsa Várnai and Beáta Wagner-Nagy
2007 The current sociolinguistic situation of some Uralic peoples. Paper
read at the 11th International Conference on Minority Languages
(ICML XI). Pécs, 5–6 July 2007.
Sorokina, I. P. and D. S. Bolina
2001 Enets-Russian and Russian-Enets Dictionary [in Russian]. Saint
Petersburg.
Tereščenko, N. M.
1966a Selkup [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages. Moscow.
Tereščenko, N. M.
1966b Nenets [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 376–395. Moscow.
Tereščenko, N. M.
1966c Nganasan [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 416–437. Moscow.
Tereščenko, N. M.
1966d Enets [in Russian]. In: V. Lytkin (ed.), Languages of the USSR,
Vol. 3: Finno-Ugric and Samoyedic languages, 438–457. Moscow.
Tereščenko, N. M.
1979 Nganasan [in Russian]. Leningrad.
Tereščenko, N. M.
1989 Nenets-Russian Dictionary [in Russian]. Moscow.
Thomason, Sarah G. and Terrence Kaufman
1988 Language Contact, Creolization, and Genetic Linguistics. Berkeley:
University of California Press.
Uffmann, Christian
2007 Vowel Epenthesis in Loanword Adaptation. Tübingen: Max Niemeyer
Verlag.
Várnai, Zsuzsa
2002 Hangtan [Phonology and phonetics]. In: Beáta Wagner-Nagy (ed.),
Chrestomathia Nganasanica. Studia Uralo-Altaica Supplementum
10, 33–70. Szeged.
Várnai, Zsuzsa
2003 Valóban morás nyelv-e a nganaszan? [Is Nganasan really mora-
counting?] In: Zoltán Molnár and Gábor Zaicz (eds.), Permistica et
Uralica. FUP I, 268–271. Piliscsaba.
Várnai, Zsuzsa
2004 A nganaszan nyelv fonológiai leírása [The phonological description
of Nganasan]. Ph.D. dissertation, Department of Uralistics, Eötvös
Loránd University, Budapest.
Várnai, Zsuzsa
2005 Some problems of Nganasan phonology: Mora or syllable? In: Beáta
Wagner-Nagy (ed.), Mikola konferencia, 113–126. Szeged.
Várnai, Zsuzsa
Phonology, phonotactics, morphonology. In: Beáta Wagner-Nagy (ed.),
Descriptive Grammar of Nganasan [manuscript].
Vennemann, Theo
1988 Preference Laws for Syllable Structure and the Explanation of Sound
Change: With Special Reference to German, Germanic, Italian, and
Latin. Berlin: Mouton de Gruyter.
Part II. Production: analysis and models
Articulatory coordination and the syllabification of
word initial consonant clusters in Italian
Abstract
In this study we investigate the articulatory coordination of word initial consonant
clusters in Italian. We show that these clusters are generally coordinated in a similar
way to clusters in languages with complex syllable onsets, in that the timing of the
rightmost consonantal gesture in relation to the vocalic gesture is adjusted according
to the number of consonants in the cluster.
However, clusters containing a sibilant, /s/ or /z/, are an exception and show a
different coordination pattern altogether. Such clusters are referred to as having an
impure s, mainly as a result of allomorphy of indefinite and definite articles (e.g. il
premio, but lo studente). In such cases, the sibilant does not affect the coordination of
the remaining consonants, indicating that it may not be part of the syllable onset.
1. Introduction
This study takes an articulatory approach to the syllabic parsing of word initial
clusters in Italian within the framework of Articulatory Phonology (Browman
and Goldstein 1988). In this model, the coordination patterns relating to con-
sonants and vowels have been shown to reflect syllable structure in different
languages (Browman and Goldstein 2000; Marin and Pouplier 2010 for
American English, Goldstein et al. 2007 for Georgian and Tashlhiyt Berber,
Shaw et al. 2009 for Moroccan Arabic).
Articulatory Phonology models articulatory movements in terms of con-
sonantal and vocalic gestures. These are coupled in relation to each other in
specific ways, reflecting the status of the respective consonants and vowels
within the syllable. In CV syllables, the C and V gestures are coupled in-phase,
indicating a simultaneous initiation of these two gestures. This reflects the
onset-nucleus relation. In VC syllables, by contrast, the V and C gestures are
coupled in anti-phase relation, and are thus initiated sequentially. This reflects
the nucleus-coda relation.
Crucially, syllables with complex onsets, CCV, are modelled as having
two competing coupling modes. On the one hand both C gestures are coupled
in-phase with the V gesture. On the other, the two C gestures are coupled in
anti-phase to each other, such that they do not start simultaneously, aiding
158 Anne Hermes, Martine Grice, Doris Mücke and Henrik Niemann
1. In what follows we refer to /s/ and /z/ as /s/ for the purpose of simplification.
Voicing is not distinctive in this position but rather conditioned by the voicing of
the following consonant.
In complex onsets, consonants are in-phase with the vowel and at the same
time anti-phase with each other (Nam and Saltzman 2003, Goldstein et al.
2007). This competitive coupling in complex onsets is present on the surface
as the C-center effect (Browman and Goldstein 2000), where the mean of all
consonantal targets (C-center) is aligned at a stable timing point relative to
the vocalic target. Thus, the distance between the mean of targets for CC in
CCV and for CCC in CCCV is comparable to the midpoint for C in CV. As a
result of this, the rightmost consonant within the cluster is shifted further
towards the vowel with every added consonant. This rightmost shift has
recently been confirmed for Georgian (Goldstein et al. 2007). Other languages,
such as Tashlhiyt Berber (Goldstein et al. 2007) and Moroccan Arabic (Shaw
and Gafos 2008, Shaw et al. 2009), have been analysed as not allowing com-
plex onsets. In these latter studies the rightmost consonant in a cluster has a
stable timing with the vowel, regardless of the size of word initial clusters,
thus conrming the analysis whereby the rightmost consonant is the only one
included in the syllable onset. These studies indicate that it is possible to
recover signatures of syllable structure from the timing of articulatory move-
ments, especially from the gestural timing of the rightmost consonant in
clusters relative to the vocalic anchor.
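The C-center relation described above can be made concrete with a small numerical sketch (the latency and spacing values below are invented for illustration, not data from any of the studies cited): if the mean of the consonantal targets stays at a fixed distance from the vowel target, the rightmost consonant necessarily moves toward the vowel as consonants are added.

```python
def rightmost_c_to_v(n_consonants, c_center_to_v=150.0, ic_interval=60.0):
    """Latency (ms) from the rightmost C target to the V target when the
    C-center (mean of all C targets) stays c_center_to_v ms before V and
    consonantal targets are spaced ic_interval ms apart (invented values)."""
    # With n targets spaced ic_interval apart, the rightmost target lies
    # (n - 1) / 2 intervals after the mean of the targets.
    return c_center_to_v - (n_consonants - 1) / 2 * ic_interval

for n, shape in [(1, "CV"), (2, "CCV"), (3, "CCCV")]:
    print(shape, rightmost_c_to_v(n))  # 150.0, 120.0, 90.0: rightward shift
```

Under a simplex-onset organization (Moroccan Arabic, Tashlhiyt Berber), by contrast, the rightmost-C-to-V latency would stay at 150 ms regardless of cluster size.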
Figure 2. Coupling graph (a) and schematised articulatory patterns (b) for onsets in
English, adapted from Saltzman et al. (2006).
kept constant (e.g. Berber: /mun – smun – tsmun/) as opposed to English (e.g.
sayed – spayed – splayed) in earlier work (e.g. Browman and Goldstein 2000).
The rightmost C variable is hypothesised to decrease (rightward shift) when
comparing single onsets with non-sibilant clusters (where consonants are
syllabified as part of the onset). For sibilant clusters, it is assumed that the
rightmost consonant within the cluster is not shifted, but remains at a stable
timing point, indicating that the sibilant is not part of the onset.
2. Method
2.1. Speakers
We recorded two native Italian speakers, one female speaker (MS) in her
mid-forties from Apulia in Southern Italy and one male speaker (AR) in his
mid-thirties from Trentino, in Northern Italy. Both speakers spent their rst
thirty years in their hometowns.
CC vs. /s/+CC word initially, keeping the rightmost consonant constant. The
word list is shown in Table 1.
The target words were embedded in the carrier sentence Per favore dimmi
la __ di nuovo (Please say the __ again), ensuring an alternation of high and
low vowels throughout the sequence.
Table 1. Wordlist
C                      CC                    /s/+CC
/rema/ (rheme)         /prema/ (press)       /sprema/ (squeeze)
/rima/ (rhyme)         /prima/ (first)       /sprima/ (logatome)
/lina/ (proper name)   /plina/ (logatome)    /splina/ (logatome)
C                      /s/+C
/pina/ (proper name)   /spina/ (thorn)
/fila/ (line)          /sfila/ (s/he unthreads)
/vita/ (life)          /svita/ (s/he unscrews)
2.3. Recordings
The recordings took place at the IfL Phonetics laboratory in Cologne. The
speech material was displayed on a computer monitor. Target words were
produced in pseudo-randomised order, each being spoken 10 times in total.
Speakers were instructed to speak at a rate they considered to be comfortable.
Acoustic and kinematic data were recorded simultaneously.
We recorded the acoustic signal with a DAT-recorder (TASCAM DA-P1)
using a condenser microphone (AKG C420 head set) and digitised at 44.1
kHz/16 bit.
The kinematic data was recorded with a 2D electromagnetic midsagittal
articulograph (Carstens AG100; 10 channels). We placed 2 sensors on upper
and lower lip and 3 sensors on the tongue: tongue tip, tongue blade and tongue
body (1cm, 2cm and 4cm behind the tongue tip). Two additional sensors on
the bridge of the nose and the upper gums served as references in order to
correct for head movements during the recordings (see Hoole 1996).
All kinematic data were sampled at 400 Hz, downsampled to 200 Hz and
smoothed with a low-pass filter at 40 Hz. For displaying and labelling data, all
acoustic and kinematic data were converted to SSFF-format to enable the data
to be analysed and annotated in the EMU Speech Database System (Cassidy &
Harrington 2001).
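As a rough illustration of this processing chain (not the actual filtering used in the study, which applied a designed 40 Hz low-pass filter), decimation and smoothing can be sketched in a few lines of Python; the moving average here is only a crude stand-in for a proper low-pass filter.

```python
def decimate_by_2(signal):
    """Downsample 400 Hz -> 200 Hz by keeping every second sample
    (real pipelines low-pass first to avoid aliasing)."""
    return signal[::2]

def moving_average(signal, width=5):
    """Crude low-pass smoothing via a symmetric moving average
    (a stand-in for the study's 40 Hz filter, not equivalent to it)."""
    half = width // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sum(padded[i:i + width]) / width for i in range(len(signal))]

# Ten samples of a synthetic movement at 400 Hz:
samples_400hz = [0.0, 0.1, 0.4, 0.9, 1.0, 0.9, 0.4, 0.1, 0.0, 0.0]
smoothed = moving_average(decimate_by_2(samples_400hz))
print(len(smoothed))  # 5 samples, now at 200 Hz
```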
2.4. Labelling Procedure
All acoustic and articulatory landmarks were displayed and labelled by hand.
We labelled the onset and offset of the target word and its acoustically defined
segments. In the present study only the articulatory landmarks are reported on.
The remaining labels were placed in relation to the articulatory record. We
labelled movements in the vertical dimension, identifying minima and maxima
in the respective velocity trace (zero crossings). For vowel-to-vowel articula-
tion, we labelled the vocalic target for /i,e/. For consonants, we labelled
the maximum targets of the primary constrictors (Byrd 2000), whereas labial
consonants were identied by using the lip aperture index (LA, Byrd 2000).
Figure 3 illustrates how the landmarks are annotated for those measures.
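The landmark logic, locating positional extrema as zero crossings of the velocity trace, can be sketched as follows; this is a simplified, hypothetical reimplementation for illustration, not the labelling code used in the study.

```python
def find_targets(positions, fs=200.0):
    """Label gestural targets as zero crossings of the velocity trace,
    for a position trace sampled at fs Hz (200 Hz after downsampling)."""
    dt = 1.0 / fs
    # central-difference velocity; v[i] is the velocity at sample i + 1
    v = [(positions[i + 1] - positions[i - 1]) / (2 * dt)
         for i in range(1, len(positions) - 1)]
    targets = []
    for i in range(len(v) - 1):
        # sign change (or touch of zero) between consecutive velocity samples
        if v[i] > 0 >= v[i + 1] or v[i] < 0 <= v[i + 1]:
            targets.append(i + 2 if abs(v[i + 1]) < abs(v[i]) else i + 1)
    return targets

# A synthetic tongue-tip trace with one raising-lowering gesture:
trace = [0.0, 0.5, 1.5, 2.5, 3.0, 2.5, 1.5, 0.5, 0.0]
print(find_targets(trace))  # [4]: the positional maximum (constriction target)
```

A real trace would first be smoothed, and labial consonants would use the lip-aperture index rather than a single sensor's vertical position, as noted above.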
Figure 3. Labelling scheme for test word /plina/ in Per favore dimmi la plina di nuovo.
From top to bottom: acoustic waveform, kinematic waveform for vertical
tongue-tip position, inter-lip distance and vertical tongue-body position.
3. Results
We measured the distance of the rightmost C to the V target in 293 tokens for
both speakers; 7 utterances were discarded from the analysis, due to technical
problems. An overall ANOVA with rightmost C as dependent variable re-
vealed significance for the independent variable onset complexity (C, CC,
/s/+C, /s/+CC; p < 0.05) and for speaker (p < 0.01; speaker as random
factor). We therefore used one-way-ANOVAs for each speaker separately
including the dependent variable rightmost C and the independent variable
onset complexity.
Rightmost C to V (ms), mean (SD)
Speaker   Pair         C          CC         F-value    p-value
MS        rema-prema   151 (11)   124 (6)    47.255     ***
MS        rima-prima   166 (11)   117 (7)    141.699    ***
MS        lina-plina   203 (12)   165 (21)   22.693     ***
AR        rema-prema   189 (16)   140 (21)   27.279     ***
AR        rima-prima   182 (20)   122 (23)   40.574     ***
AR        lina-plina   227 (27)   155 (28)   33.812     ***
In all cases (p < 0.001) the consonant is shifted considerably towards the
vowel (for speaker MS: in /rema/ vs. /prema/ on average 27 ms; in /rima/ vs.
/prima/ on average 49 ms; in /lina/ vs. /plina/ on average 38 ms; for speaker
AR: in /rema/ vs. /prema/ on average 49 ms; in /rima/ vs. /prima/ on average
60 ms; in /lina/ vs. /plina/ on average 72 ms). Figure 5 shows graphically the
considerable decrease of the rightmost C variable in C vs. CC structured
target words.
Rightmost C to V (ms), mean (SD)
Speaker   Pair   C   /s/+C   F-value   p-value
For both speakers in all cases, there is no difference in the timing from
the rightmost C to the vocalic target, when comparing C to /s/+C clusters
(p > 0.05, n.s.). Although a sibilant is added to the beginning of the word,
the rightmost C is not adjusted relative to the vowel, i.e. latencies remain stable.
In figure 6 the results are presented graphically. Comparing the bars for
each word pair (C vs. /s/+C), we found no decrease of the distance of the
rightmost C to V target. The latencies remain the same. That was the case for
speaker MS in /pina/ (~241 ms) vs. /spina/ (~243 ms), /fila/ (~189 ms) vs.
Rightmost C to V (ms), mean (SD)
Speaker   Pair           CC         /s/+CC     F-value   p-value
MS        prema-sprema   124 (6)    128 (12)   0.835     n.s.
MS        prima-sprima   117 (7)    113 (9)    1.405     n.s.
MS        plina-splina   165 (21)   158 (15)   2.047     n.s.
AR        prema-sprema   140 (21)   135 (13)   0.424     n.s.
AR        prima-sprima   122 (23)   134 (21)   1.455     n.s.
AR        plina-splina   155 (28)   158 (15)   0.067     n.s.
4. Discussion
These results on articulatory coordination provide evidence for complex
onsets in Italian (CC clusters). In the analysis of the target words C and
CC, we found a decrease in the distance between the rightmost C target and
the V target. The second C target in the cluster is shifted towards the vowel.
This supports the hypothesis of an underlying competitive coupling structure
These results show that /s/ does not exhibit the articulatory timing patterns
required for membership of the syllable onset, in that the rightmost C target is
at a constant distance from the V target. This is true for all analysed target
words containing an impure s for both speakers. In other words, adding the
sibilant to the onset of a word does not affect the timing of the other con-
sonants relative to the vocalic target. Thus, there is no evidence for an under-
lying competitive coupling structure between /s/ and the other consonants.
Figure 8. Schematised articulatory pattern and coupling graphs for C vs. CC cluster
(a), C vs. /s/+C clusters (b) and C vs. CC vs. /s/+CC (c) clusters in Italian.
The same holds for /s/+CC compared to CC (see Figure 8c). This implies that
impure s does not participate in the competitive coupling structures.
Acknowledgements
We would like to thank Hosung Nam (Haskins Laboratories) for the fruitful
discussion on coupling structures for word initial consonant clusters in Italian
with and without impure s.
References
Baretti, G.
1832 English and Italian Dictionary. Part the Second. Florence: Cardinal
Printing Office.
Bertinetto, P.M.
2004 On the undecidable syllabification of /sC/ clusters in Italian: Con-
verging experimental evidence. Italian Journal of Linguistics/Rivista
di Linguistica 16: 349–372.
Abstract
This study proposes a task-dynamic gestural model of the Romanian hiatus sequence
/e.a/ and of diphthong /ea/, starting from the hypothesis that the temporal organization
of hetero- and tauto-syllabic vowel clusters can be modeled in terms of particular
coupling relations. For modeling hiatus /e.a/, stimuli were created with the oscillators
for vowels /e/ and /a/ coupled anti-phase (180-degrees) or on different cycles (360-
degrees), resulting in their sequential production. These stimuli were classified
perceptually by Romanian listeners as hiatus sequences. For modeling stressed diphthong /ea/
and its alternation with unstressed vowel /e/, stimuli were created with the oscillators
for vowels /e/ and /a/ coupled in-phase (0-degree), resulting in their synchronous pro-
duction, and with additional manipulations of dynamic parameters, intended to model
stress effects. The perceptual results showed that vowels /e/ and /a/ synchronously co-
ordinated were perceived as vowel /e/, when all dynamical parameters were kept
constant, and that a diphthong percept was triggered when the blending weight for /a/
was greater than for /e/, causing vowel /a/ to achieve its target closer to its
specification, to the detriment of vowel /e/. An acoustic analysis further showed a similarity
between the modeled stimuli and corresponding stimuli produced by Romanian native
speakers.
1. Introduction
(1) a. hiatus /e.a/   b. non-nuclear diphthong /ja/   c. nuclear diphthong /ea/
The test-case language selected for the temporal modeling of the structural
distinctions illustrated in (1) is Romanian, with extensions to other cross-
linguistic instances remaining a subject for future examination. Romanian
provides an interesting case for investigating this question in that nucleus
diphthong /ea/ contrasts both with the hiatus sequence /e.a/ and with the non-
nuclear diphthong /ja/ (cf. Chitoran 2001, for a language description and a
detailed discussion of these diphthongs' phonotactics). Furthermore, the nuclear
diphthongs have a mid quality, which makes them quite distinguishable from
non-nuclear diphthongs (Chitoran 2002).
The nuclear diphthong participates in a stress-conditioned alternation, shown
in (2). An interesting experimental finding was that alternating /e/ in (2b) was
realized acoustically more centralized than non-derived /e/ (3) (Marin 2005,
accepted). This difference was observed both at vowel onset and at mid-point.
At the same time, alternating /e/ was shown not to differ qualitatively from the
onset part of diphthong /ea/, while non-alternating /e/ and the onset part of the
diphthong differed significantly. At mid-point, the diphthong differed from
both alternating and non-alternating /e/, exhibiting more centralized formant
patterns than those of either alternating or non-alternating /e/. The qualitative
difference between the diphthong and non-alternating /e/ is not surprising
assuming a bi-vocalic representation of diphthongs, such as the one in (1c):
the difference between diphthong-onset and non-alternating /e/ could be
explained as a co-articulation effect of the diphthong's offset part (vowel /a/)
on its onset, an effect naturally absent in the case of non-alternating /e/.
Following this reasoning, the absence of an acoustic difference between alter-
nating /e/ and the diphthong's onset suggested that their properties at onset
were similar, namely in both cases their beginning consisted of vowel /e/
being co-produced with vowel /a/. Alternating /e/'s acoustic properties could
A gestural model of the temporal organization of vowel clusters in Romanian 179
therefore be the result of vowels /e/ and /a/ being co-produced with each
other, which could explain both the difference between alternating and non-
alternating /e/, and the lack of difference between alternating /e/ and diphthong
/ea/'s onset.
(2) Alternating roots:
a. Diphthong: ['sea.ra] 'the evening'
b. Alternating /e/: [se.'ra.ta] 'the evening party'
(3) Non-alternating roots:
a. ['se.ra] 'the greenhouse'
b. [se.ri.'ti.ka] 'the greenhouse-Diminutive'
Starting from this hypothesis, the current paper's aim is to explore the
extent to which the planning and execution of Romanian diphthong /ea/ (and
potentially similar units cross-linguistically) can be modeled in a way that
(a) is consistent with the kind of compositional phonological representation
shown in (1c), while at the same time being distinct from hetero-syllabic /e.a/,
(b) is capable of producing the acoustic patterns observed, and (c) can account
in a principled way for the alternation between diphthong /ea/ and alternating /e/.
In a preliminary gestural modeling study (Marin 2007), in which task-
dynamic modeled stimuli were categorized by native speakers, an /e/ vowel
percept was obtained when the constrictions/activation intervals for vowels
/e/ and /a/ were fully overlapped, and a diphthong percept when the activation
intervals for vowels /e/ and /a/ were overlapped for approximately 90% of
their movement. When the activation intervals for /e/ and /a/ did not overlap
at all, the resulting percept was hiatus /e.a/. These previous results suggested
that both alternating /e/ and the diphthong could be modeled as vowels /e/ and
/a/ whose constriction movements were (almost) fully overlapped, with the
difference that in the presence of stress, vowel /a/ would presumably be realized
slightly longer and spatially stronger, and hence not fully blended with the
movement for vowel /e/. In contrast, the hiatus sequence /e.a/ could be modeled
as two vowels fully sequential (rather than overlapped).
These temporal relations as a function of syllable organization can be
formalized in terms of specific phasing relations (or coupling modes) between
the respective vowels. For many types of skilled actions, it has been shown
that two coupling modes, in-phase and anti-phase, require no learning and
can be stably maintained (Haken et al. 1985; Turvey 1990). If the planning
clocks responsible for triggering two actions are coupled in-phase, the actions
will be triggered synchronously; if the two clocks are coupled anti-phase, one
action will be triggered after the other, with a lag equal to half the clock
180 Stefania Marin and Louis Goldstein
period; finally, if two actions are coupled in-phase but on different cycles (i.e.
they are 360-degrees-coupled), their onsets will lag by a complete clock
cycle, and the two actions will be triggered fully sequentially. It has been
hypothesized that speech employs these intrinsic coupling modes as well, and
that syllable structure could be understood in terms of these specic modes of
coordination (Browman and Goldstein 2000; Byrd et al. 2009; Goldstein et al.
2006; Krakow 1999; Marin and Pouplier 2010; Nam 2007; Nam et al. 2009).
This approach provides a principled and economical way of understanding
temporal organization in speech production, by making use of coupling rela-
tions between planning oscillators, assumed to play a role not only in speech
but in coordinated human action in general. Thus, while in the study discussed
above (Marin 2007) the distinction between alternating /e/, diphthong /ea/ and
hiatus /e.a/ was achieved informally by manipulating temporal overlap, the
present study aims to model these linguistic categories as arising from lawful
consequences of specific inter-gestural coupling modes.
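The lag implied by each coupling mode is a fixed fraction of the planning-clock period; a minimal sketch (the 250 ms period is an arbitrary illustrative value, not a figure from this study):

```python
def onset_lag(coupling_deg, clock_period_ms):
    """Lag between two actions' onsets implied by their coupling phase:
    in-phase (0) -> synchronous; anti-phase (180) -> half a period;
    360-degree coupling -> a full period (fully sequential)."""
    return coupling_deg / 360.0 * clock_period_ms

period = 250.0  # illustrative planning-clock period (ms)
print(onset_lag(0, period))    # 0.0   -> synchronous, diphthong-like
print(onset_lag(180, period))  # 125.0 -> half-period lag
print(onset_lag(360, period))  # 250.0 -> fully sequential, hiatus-like
```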
Specifically, we hypothesize that the temporal pattern exhibited by hiatus
sequences with little to no overlap between the vowel activations could be
modeled as a 360-degree coupling such that movement for vowel /a/ begins
roughly when movement for vowel /e/ ends. As for diphthong /ea/ and its
stress-conditioned alternation with /e/, it is hypothesized that the overlap
pattern shown previously (Marin 2007) to result in the percept of /e/ or /ea/
can be modeled as the result of in-phase coupling between the two vowel
actions. Whether this coordination mode results in the percept of a vowel or
of a diphthong should be determined by additional dynamic parameters,
whose exact nature is the experimental focus of this paper. This analysis
entails that the hiatus and the diphthong are compositionally similar, but dis-
tinguishable in terms of the coupling relations, and hence specific timing,
holding between their composing vowel actions. It also entails that the alterna-
tion between diphthong /ea/ and alternating /e/ is not structural, but the result
of different dynamical parameters governing the same vowel actions. To test
this analysis, the current study presents a gestural modeling of diphthong /ea/,
its alternation with vowel /e/, and its contrast with hiatus /e.a/. The modeled
stimuli are evaluated both perceptually (Experiments 1 and 2), and by com-
paring the acoustic properties of modeled stimuli with those of corresponding
stimuli produced by native speakers (Experiment 3).
Non-nuclear diphthongs (1b) are not considered in this paper: as onset-
nucleus or nucleus-coda structures (cf. Chitoran and Hualde 2007) they are
assumed to be organized temporally as onsets/codas with a consonantal glide.
The computational model used in the current study, the Task-Dynamic Appli-
cation (TADA), is a gesture-based system developed at Haskins Laboratories
to test hypotheses formulated within dynamical speech production models,
such as Articulatory Phonology (Browman and Goldstein 1990; Browman et
al. 1984; Goldstein et al. 2006; Nam et al. n.d.; Saltzman and Munhall 1989).
TADA generates speech outputs on the basis of dynamical specifications of
articulatory gestures (as speech action units) and the coupling relations among
their clocks, which serve as information for computing a gestural score with
precise activation intervals for each gesture. Articulator movement then results
from imposing a set of dynamical controls on the articulators. The resulting
articulator trajectories are in turn used to compute vocal tract shapes, area
functions, and ultimately, sound via the pseudo-articulatory synthesizer HLSyn
(Hanson and Stevens 2002).
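The "dynamical controls" mentioned here are standardly modeled as critically damped second-order (point-attractor) systems driving each tract variable to its target (Saltzman and Munhall 1989). A minimal Euler-integration sketch, with arbitrary illustrative parameter values rather than TADA's actual settings:

```python
def simulate_gesture(x0, target, omega=40.0, dt=0.001, steps=300):
    """Critically damped point attractor:
    x'' = -2*omega*x' - omega**2 * (x - target).
    Returns the tract-variable trajectory (arbitrary units)."""
    x, v, traj = x0, 0.0, []
    for _ in range(steps):
        a = -2.0 * omega * v - omega ** 2 * (x - target)
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj

traj = simulate_gesture(x0=0.0, target=1.0)
print(round(traj[-1], 3))  # 1.0: reaches the target without overshoot
```

Critical damping is what makes gestural movements approach their constriction targets smoothly; gestural scores then switch such systems on and off over their activation intervals.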
3. Experiment 1
shared articulator, and achieve its constriction closer to its underlying target, to
the detriment of unstressed gestures (cf. also the insights in Lindblom 1963,
and more recently de Jong's 1995 model of stress as hyperarticulation).
Because F0 movement is controlled primarily by placement of prosodic
pitch accents rather than by lexical stress, per se (cf. Beckman and Edwards
1994; Sluijter and van Heuven 1996), it was not considered here. Vowel quality
was also not assumed to be a relevant cue for encoding stress in Romanian,
given previous impressionistic descriptions and empirical evidence showing
that vowel /e/ in Romanian does not differ qualitatively as a result of stress
(Marin accepted). On the basis of these considerations, three parameters were
tested for modeling stress effects: activation interval of the vowel gestures
(affecting the vowels' relative duration), relative blending weight of the two
vowel gestures (determining the vowels' relative articulatory strength), and
presence of a prosodic gesture (Byrd and Saltzman 2003) slowing the time
course of speech production, and resulting in lengthening of the affected con-
striction. Each of these parameters will now be considered in more detail.
The activation interval of the two relevant vowel gestures determines the
time between activation onset and offset of each vowel. The coupled oscillator
model specifies the phase at which a gesture is activated relative to another,
while de-activation by default occurs at some regular phase of the gesture's
own clock (340 degrees for vowels). Thus two vowels coupled in-phase are
synchronous at activation onset, and by default (i.e. determined by their own
internal clocks) also at offset. Activation offset was manipulated so that for
some stimuli offset of /e/ occurred earlier than offset of /a/ resulting in a rela-
tively shorter duration of /e/, mirroring the fact that in Romanian (and other
languages, cf. Lindblom 1963) /e/ is slightly shorter than /a/ (Burileanu 2002).
While differential duration of low vs. mid/high vowels is not per se a stress-
related property of these vowels, it was assumed that stress could affect the
movement, and hence the duration of an intrinsically longer low vowel more
than that of a shorter one. It must be noted that without a manipulation of
activation offset, only very small intrinsic vowel duration differences would
emerge automatically from the current implementation of the model.
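Such de-activation phases translate into activation durations as a fixed fraction of the gesture's clock cycle; a sketch, assuming an arbitrary illustrative 300 ms cycle (not a value from TADA):

```python
def activation_ms(deactivation_deg, cycle_ms=300.0, activation_deg=0.0):
    """Activation interval implied by onset and de-activation phases on the
    gesture's own clock (340 degrees being TADA's default for vowels)."""
    return (deactivation_deg - activation_deg) / 360.0 * cycle_ms

# Default /a/ de-activation vs. the shortened /e/ variants used below:
for phase in (340, 300, 270):
    print(phase, round(activation_ms(phase), 1))
```

With these numbers, the 300- and 270-degree variants shorten /e/'s activation by roughly 33 and 58 ms relative to the 340-degree default, which is the kind of duration asymmetry between /e/ and /a/ the manipulation is meant to produce.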
A second manipulation was the relative blending weight of the two vowels.
In the prosodic component of TADA currently under development, stress is
modeled, in part, by means of a spatial modulation gesture (the so-called μ-
gesture) which serves to make stressed gestures more extreme, achieving con-
strictions closer to their underlying target values, in comparison to unstressed
gestures which may show more target undershoot (Saltzman et al. 2008). In
the currently available version of TADA, in which μ-gestures are not yet im-
plemented, their effect can be approximated for the case of two synchronous
3.2. Method
3.2.1. Participants
Twelve native Romanians, naïve to the purpose of the experiment, and with
no reported speech, hearing or language deficits participated in this auditory
perception task.
were modeled throughout using the default TADA specifications for vowels
[] and [a] respectively, matching the phonetic characteristics of Romanian /e/
and /a/ (cf. Chitoran 2001). All the stimuli had an initial and nal labial stop
/b/ flanking the relevant vowels (/b_b/).
In addition to the coupling relations between vowels /e/ and /a/, three addi-
tional parameters, assumed to play a role in modeling stress (and hence the
stress-conditioned alternation /'ea/-/e/), were manipulated. One manipulation
was changing vowel activation offset for some items so that offset of /e/
occurred earlier than offset of /a/; thus, for some stimuli, vowel de-activation
occurred at 340 degrees on the cycle of either /e/ or /a/, while other stimuli
were created with earlier de-activation of /e/, at 300 or 270 degrees, resulting
in a shorter activation interval. De-activation for /a/ was kept constant at 340
degrees. A second manipulation was the relative blending weight of the two
vowels' targets: for some stimuli both vowels /e/ and /a/ had the same blending weight
(i.e. a blending weight ratio BWR of 1), while for the other stimuli /a/, as the
vowel more affected by stress, had twice the blending weight of /e/ (resulting
in a BWR of 2). A third manipulation was presence or absence of a prosodic
gesture on a stimulus. When present, the π-gesture was active for the entire
vowels' activation duration, and its strength was flat throughout (when the
two vowels had different activation durations, the π-gesture's activation coin-
cided with the longer vowel's one). Tables 1 and 2 provide a full description
of the modeled stimuli.
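Target blending for synchronous gestures sharing an articulator is standardly a weighted average of the competing targets (Saltzman and Munhall 1989); the sketch below uses invented tract-variable values, not TADA's actual numbers, to show how a blending weight ratio (BWR) of 2 pulls the joint target toward /a/.

```python
def blended_target(target_e, target_a, weight_e=1.0, weight_a=1.0):
    """Weighted average of two competing targets for a shared articulator."""
    return (weight_e * target_e + weight_a * target_a) / (weight_e + weight_a)

# Invented tongue-height targets: /e/ = 6.0, /a/ = 0.0 (arbitrary units)
print(blended_target(6.0, 0.0, 1.0, 1.0))  # 3.0: equal compromise (BWR 1)
print(blended_target(6.0, 0.0, 1.0, 2.0))  # 2.0: BWR 2 favours /a/
```

This is the sense in which doubling /a/'s blending weight lets /a/ achieve its constriction closer to its specification, at the expense of /e/.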
Acoustic outputs with an 11025 Hz sampling frequency were generated on the basis of these articulatory configurations, and they were classified on the basis of auditory perception by 12 listeners (five male). The experiment was carried out in a quiet room and the participants were fitted with headphones.
DMDX software (K. Forster and J. Forster 2003) was used for stimulus presentation and response recording. The stimuli included the bilabial closures flanking the vowel interval of interest. A forced-choice identification design
was used, in which the listeners had to decide, by pressing an appropriately
labeled computer key, whether the item heard was a) part of two syllables
(BE AB), or contained b) diphthong /ea/ (BEAB), c) vowel /e/ (BEB), or d)
vowel /a/ (BAB). None of the choices were real words in Romanian. In the
written instructions, the participants were presented with real word examples
of the categories and were told that they would hear fragments of computer
synthesized words containing those categories in the context /b_b/. The pro-
gram advanced to the next stimulus as soon as a response key was pressed or
after 6.1s. Ten repetitions of each stimulus were included in the experiment,
presented in random order.
Table 1. Description of stimuli with single vowel gestures, and with two vowel
gestures coupled anti-phase or 360-degree used in Experiment 1.
Table 2. Description of stimuli modeled with two vowel gestures coupled in-phase
used in Experiment 1.
3.3. Results
The perceptual results averaged across listeners showed that single vowel
stimuli were perceived as either vowels /e/ or /a/ over 90% of the time (Figure
1a). Stimuli with vowels timed non-synchronously were perceived as hiatus
stimuli more than 85% of the time, with individual listeners ranging between 70% and 100% hiatus responses to ea180 stimuli and between 80% and 100% hiatus responses to ea360 stimuli.
For the stimuli with vowels coupled in-phase (Figure 1b), the identification patterns showed that neither different activation intervals of the two vowels nor presence of a π-gesture alone (nor a combination of the two) made a difference in how they were perceived. Stimuli with these manipulations alone were overall identified as vowel /e/ 90% of the time, similar to the identification pattern of the stimulus with no manipulation (stimulus ea). As to the blending weight parameter, there was a trend towards increasingly identifying as diphthongs those stimuli for which /a/ had greater blending weight. Thus, W2-stimuli were identified as a diphthong on average 35–40% of the time, with the additional presence of a π-gesture slightly enhancing this effect. Individual participant patterns, shown in Table 3, indicated that differential blending weight, independent of the other manipulations, triggered a diphthong response at a 50% or greater level for about half of the participants, while it did not trigger a diphthong response for the other half of the participants. The perceptual pattern indicated therefore that greater blending weight for vowel /a/ was the manipulation most influencing a diphthong percept (albeit not for all listeners), independent of vowel activation duration or π-gesture.
To quantify these observations, we carried out a generalized linear mixed model analysis with the individual (non-averaged) classification responses as the dependent variable (two levels: diphthong vs. any other response), stimulus as a fixed factor, and participant as a random factor. This analysis showed that stimuli with a blending weight ratio of 2 were classified as a diphthong significantly more than the base ea stimulus (cf. Table 4), confirming the trend in diphthong response observed on the averaged data.
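In such a model the per-trial probability of a diphthong response is an inverse-logit of a fixed stimulus effect plus a listener-specific random intercept. The following toy sketch only illustrates that structure; the coefficients are made up, and the actual estimation was done with a fitted GLMM, not reproduced here.

```python
import math

def p_diphthong(stimulus_effect, listener_intercept):
    """Inverse-logit link: per-trial probability of a 'diphthong' response."""
    return 1.0 / (1.0 + math.exp(-(stimulus_effect + listener_intercept)))

BASE = -3.0        # hypothetical intercept: base 'ea' stimulus, average listener
BWR2_SHIFT = 2.5   # hypothetical fixed-effect shift for a BWR-2 stimulus

# An average listener rarely hears the base stimulus as a diphthong,
# but a positive stimulus effect raises that probability substantially:
print(round(p_diphthong(BASE, 0.0), 3))
print(round(p_diphthong(BASE + BWR2_SHIFT, 0.0), 3))
```

The random intercept lets individual listeners sit higher or lower on this curve, which is how the model accommodates the split between diphthong-hearing and non-diphthong-hearing participants.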
The shift from a vowel to a diphthong percept for those listeners exhibiting it was not due to stimulus duration. Stimuli with a π-gesture were the longest, but this duration difference alone did not trigger a predominant diphthong response (cf. the stimuli represented by circles in Figure 2). While the stimuli with a combined BWR of 2 and presence of a π-gesture were indeed longer, and more consistently perceived as diphthongs (the triangle-stimuli in Figure 2), so were some considerably shorter stimuli where only blending weight had been manipulated (the diamond-stimuli in Figure 2).
Table 3. Individual diphthong responses (%) for Experiment 1 for the stimuli with
vowels coupled 0-degree in-phase. Diphthong responses at or over 50% are
bold-faced.
3.4. Discussion
The results of the classification showed that stimuli with vowels /e/ and /a/
coupled 180-degree or 360-degree were perceived as a hiatus, while stimuli
with vowels /e/ and /a/ coupled in phase were perceived as either vowel /e/ or
diphthong /ea/, depending on further manipulations. The parameter separating
a diphthong percept from a vowel percept, at least for some of the listeners,
was the blending weight ratio between the two in-phase vowel gestures.
When vowel /a/ received extra blending weight the resulting percept was, for
about half of the listeners, preponderantly a diphthong, while equal blending
weight resulted in a single vowel percept. However, none of the stimuli created
were classified consistently by all listeners as a diphthong, possibly because the
blending weight ratio between /e/ and /a/ was not large enough. We hypothe-
sized that an even larger blending weight ratio would result in a more robust
diphthong percept. We investigated this possibility in Experiment 2.
Table 4. Statistical results (Generalized Linear Mixed Model) for the diphthong
response comparison across stimuli with vowels coupled in-phase (Experiment
1). Positive Z-values indicate that there were more diphthong responses for
the given stimulus than for the base stimulus (stimulus ea).
Stimulus          Z        p-value
Intercept (ea)    -6.119   0.000
ea30               0.733   0.464
ea27               1.369   0.171
ea_W2              5.525   0.000
ea30_W2            5.693   0.000
ea27_W2            5.206   0.000
ea_W2_π            6.383   0.000
ea30_W2_π          5.915   0.000
ea27_W2_π          5.651   0.000
ea_π               0.733   0.464
ea30_π             1.303   0.193
ea27_π             1.548   0.122
4. Experiment 2
4.1. Method
4.1.1. Participants
Sixteen native Romanians, naïve to the purpose of the experiment, and with no reported speech, hearing or language deficits participated in this auditory perception task. Eleven of the listeners (M4–M8, F4–F9) also participated in Experiment 1.
Figure 2. Relationship between diphthong responses averaged across listeners (%) and
duration of vowel interval of stimuli (ms). Each /ea/ category is represented
by three values, corresponding to the activation interval manipulation.
1), to a stimulus with blending weight for /a/ six times greater than that of /e/
(i.e. a BWR of 6). The other specifications of the two vowels were otherwise
kept constant. The vowels of interest were synthesized in the context /b_b/.
The stimuli thus modeled were classied on the basis of auditory percep-
tion by 16 listeners. The same overall procedure as in Experiment 1 was
used. This time the participants had to decide whether the item heard con-
tained a) diphthong /ea/ (BEAB), b) vowel /e/ (BEB), or c) vowel /a/ (BAB).
The hiatus option was excluded as a choice on the basis of the experimenters' auditory evaluation of the stimuli. Eleven of the participants (M4–M8, F4–F9)
also completed Experiment 1 in the same session. The stimuli were presented
ten times in random order.
4.2. Results
On average, listeners perceived stimuli with (near) equal weight as vowel /e/
over 90% of the time, stimuli with a BWR of 5 to 6 as vowel /a/ over 90% of
the time, and stimuli with a BWR between 3 and 4 as diphthong /ea/ at least
50% of the time (Figure 3). Listeners varied slightly with respect to where in
the continuum their perception switched to diphthong /ea/ (cf. the individual
diphthong responses in Table 5); however, 15 of the participants heard the item with a BWR of 4 as a diphthong at least 80% of the time. One participant
(F3) showed a different pattern: for this listener, a BWR of 2 was enough to
trigger a diphthong percept. The participants in both perceptual experiments
showed a consistent response pattern to the common stimuli (ea and the
stimulus with a BWR of 1, and ea_W2 and the stimulus with a BWR of 2
respectively).
A generalized linear mixed model analysis with the individual classification responses as the dependent variable (two levels: diphthong vs. any other response), stimulus as a fixed factor, and participant as a random factor statistically corroborated our general findings. The stimuli with a BWR between 2 and 4.5 were classified significantly more as a diphthong than the stimulus with a BWR of 1 (Z > 5.29, p < 0.001, cf. Table 6). Additionally, stimuli with a BWR between 3 and 4 were more often classified as a diphthong, compared to the stimulus with a BWR of 2 (Z > 3.72, p < 0.001, Table 6). Finally, there were more diphthong responses to the stimulus with a BWR of 4 than to the stimulus with a BWR of 3 (Z = 6.03, p < 0.001), 3.5 (Z = 5.02, p < 0.001) or 4.5 (Z = 6.85, p < 0.001). These tests, which factored in the listener-specific differences, showed that the diphthong responses significantly increased starting from the stimulus with a BWR of 2, reached a maximum at
192 Stefania Marin and Louis Goldstein
the stimulus with a BWR of 4, and then decreased again.2 A one-sample t-test carried out on the percentage responses of each participant showed that the diphthong responses to the stimulus with a BWR of 4 were significantly higher than 50% (t(15) = 9.73, p < 0.001), indicating that for this stimulus the diphthong response consistently outnumbered either of the other two possible responses (vowel /e/ or vowel /a/).
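The one-sample t-test reduces to simple arithmetic on the per-participant percentages. A minimal sketch, using made-up response percentages rather than the actual data:

```python
import math
import statistics

def one_sample_t(values, mu0):
    """One-sample t statistic: (mean - mu0) / (SD / sqrt(n)), with df = n - 1."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical diphthong-response percentages for 16 listeners:
pcts = [88, 92, 80, 95, 85, 90, 79, 93, 86, 91, 84, 96, 82, 89, 87, 94]
t = one_sample_t(pcts, 50.0)
print(f"t(15) = {t:.2f}")
```

With percentages clustered well above 50, the statistic is far beyond the critical value for df = 15, mirroring the pattern reported above.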
4.3. Discussion
Experiment 2 showed that manipulating relative blending weight of two
synchronously timed vowels triggered a perceptual switch from a monophthong
to a diphthong. Equal blending weight for vowels /e/ and /a/ coupled in-phase resulted in an /e/ percept, while a blending weight ratio greater than 5 resulted in the percept of vowel /a/; finally, a blending weight ratio around 4 resulted
2. Given the robust statistical results (p-values for most comparisons either under 0.001 or over 0.1), the alpha levels were not corrected for multiple testing. However, even by using the conservative Bonferroni correction, 50 tests would have to be carried out before an observed p-value of 0.001 would result in a familywise error rate of 0.05. Our main patterns of (non-)significance would remain the same even after using such a correction.
Table 6. Statistical results (Generalized Linear Mixed Model) for the diphthong
response comparison across the stimuli tested in Experiment 2. Positive
Z-values indicate that there were more diphthong responses for the given
stimulus than for the base stimulus (stimuli with BWR of 1 and 2
respectively).
5. Experiment 3
5.1. Method
For the comparison of the acoustic characteristics of natural and modeled
stimuli, we used the word series in (4). The natural data were produced by 12
native speakers of Romanian (five male), who read the stimuli, embedded in a constant carrier phrase, ten times in random order, and at a self-selected casual speaking rate. The target words were separated by unrelated filler words,
embedded in the same carrier phrase. All the recordings were sampled at
22.05 kHz. The same stimuli were modeled using TADA: Both diphthong and alternating /e/ words were modeled with the gestures for vowels [ɛ] and [a] coupled in-phase, either with equal blending weight for alternating /e/ (henceforth blended /e/) (similar to the stimulus with a BWR of 1 in Experiment 2), or with a BWR of 4 for the diphthong (similar to the stimulus with a BWR of 4 in Experiment 2). Non-alternating /e/ was modeled with a single gesture for vowel [ɛ] (similar to stimulus e in Experiment 1). Acoustic outputs were generated on the basis of these articulatory configurations.
(4) Diphthong: ['sea.ra] 'the evening'
    Alternating /e/: [se.'ra.ta] 'the evening party'
    Non-alternating /e/: ['se.ra] 'the greenhouse'
The acoustic outputs of both natural productions and modeled stimuli were analyzed using Praat speech analysis software (Boersma and Weenink 2009). The vocalic interval was manually labeled from the onset to the offset of the vowel-specific formant contours, and formant frequencies for five formants were automatically calculated using Praat's short-term spectral analysis function. The frequency values, in Hertz, for the first two formants at the onset of the measured interval, at its offset, and every 10% into the interval, totaling eleven measuring points, were used in the analysis. Onset and offset points were manually determined, while the other points were determined automatically on the basis of onset and offset landmarks. Following the methodology
5.2. Results
A comparison of the model stimulus formant trajectories with the trajectories averaged across male and female speakers' productions showed comparable acoustic patterns for naturally produced tokens and model stimuli (Figure 4).
While precise values for F1 and F2 differed to some extent between produc-
tions by male speakers, by female speakers and by the model, the general
patterns for stimuli types were similar in that both F1 and F2 trajectories for
alternating /e/ were (slightly) more extremely front (higher F2, lower F1) than
those for diphthong /ea/, and less extreme than those for non-alternating /e/.
The Euclidean distance analysis confirmed the acoustic similarity between natural and modeled stimuli. Naturally produced diphthong words were closest
acoustically to the modeled diphthong: for the word ['sea.ra], the distance
E['sea.ra] had smaller values (Median = 270) than either E['se.ra] (Median = 379)
or E[se.'ra.ta] (Median = 294). Likewise, naturally produced alternating /e/ stimuli
were closer to modeled blended /e/ (Median = 241) than to either the diphthong
(Median = 277) or mono-gestural /e/ (Median = 338), and natural non-alternat-
ing /e/ stimuli were closer to modeled mono-gestural /e/ (Median = 246) than
Figure 4. Vowel F1 and F2 trajectories of the words ['sea.ra], [se.'ra.ta], and ['se.ra],
plotted on the basis of values measured at onset (0%), offset (100%), and
every 10% into the vowel interval, as produced by the model, by male
speakers and by female speakers.
Table 7. Statistical results (Wilcoxon Signed Ranks test) for the comparisons between Euclidean distances. Two-tailed significance is reported. Effect size (r) was calculated on the basis of Z-scores.
to either the diphthong (Median = 378) or blended /e/ (Median = 325). Paired-samples Wilcoxon Signed Ranks tests, summarized in Table 7, confirmed that for each word the smallest Euclidean distance, namely the one matching in category, was significantly smaller than the distance next up in value, validating the consistency of the pattern across speakers.
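One plausible reading of the Euclidean distance measure, sketched here under that assumption: each token is treated as a point in the 22-dimensional space of F1 and F2 values at the eleven proportional time points, and tokens are compared by their distance in that space. The trajectories below are hypothetical, not measured data.

```python
import math

def formant_distance(traj_a, traj_b):
    """Euclidean distance between two formant trajectories, each a list of
    (F1, F2) pairs in Hz sampled at the same proportional time points."""
    assert len(traj_a) == len(traj_b)
    return math.sqrt(sum((f1a - f1b) ** 2 + (f2a - f2b) ** 2
                         for (f1a, f2a), (f1b, f2b) in zip(traj_a, traj_b)))

# Hypothetical 11-point trajectories: a natural token vs. two model stimuli.
natural = [(400 + 20 * i, 2000 - 60 * i) for i in range(11)]
model_a = [(410 + 20 * i, 1990 - 60 * i) for i in range(11)]  # close match
model_b = [(500 + 10 * i, 1700 - 30 * i) for i in range(11)]  # poor match
print(formant_distance(natural, model_a) < formant_distance(natural, model_b))
```

Classifying each natural token with the model stimulus at the smallest such distance is what the median comparisons above summarize across speakers.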
5.3. Discussion
The observed acoustic similarity between model stimuli and natural tokens
could be taken as an indication of a comparable similarity at the production
level, and thus the gestural configuration probably employed in natural production could be inferred from the known gestural configuration employed in
the model. It is then plausible that natural tokens were produced similarly to
the modeled ones, with the gestures for vowels /e/ and /a/ coupled in-phase
both for alternating /e/ and diphthong /ea/, but with equal or different blending
weights, as a function of absence or presence of stress. Additionally, the fact that natural alternating /e/ was acoustically more similar to the bi-gestural /e/ in modeled [se.'ra.ta] than to the mono-gestural /e/ in ['se.ra] suggests that indeed production of alternating /e/ may involve two gestures, rather than just one.
Alternatively, the difference between [se.'ra.ta] and ['se.ra] could have been modeled as a difference in target specifications (specifically, the target for [se.'ra.ta] could be set to the post-blending targets of the BW1 model stimulus), rather than as a difference in gestural composition. The present model, while fitting the natural data reasonably well, has nevertheless the additional advantage of capturing the lexical relationship between [se.'ra.ta] and ['sea.ra] as a possible source for the difference between [se.'ra.ta] and ['se.ra].
6. Conclusion
Acknowledgements
References
Cho, Taehong
2004 Prosodically conditioned strengthening and vowel-to-vowel coarticulation in English. Journal of Phonetics 32: 141–176.
Collier, René, Fredericka Bell-Berti and Lawrence J. Raphael
1982 Some acoustic and physiological observations on diphthongs. Language and Speech 25: 305–323.
Davis, Stuart and Michael Hammond
1995 On the status of on-glides in American English. Phonology 12: 159–182.
Forster, K.L. and J.C. Forster
2003 A Windows display program with millisecond accuracy. Behavior Research Methods, Instruments, & Computers 35: 116–124.
Fowler, Carol A.
1981 Production and perception of coarticulation among stressed and unstressed vowels. Journal of Speech and Hearing Research 46: 127–139.
Goldstein, Louis, Dani Byrd and Elliot Saltzman
2006 The role of vocal tract gestural action units in understanding the evolution of phonology. In Michael A. Arbib (ed.), Action to Language via the Mirror Neuron System, 215–249. Cambridge: Cambridge University Press.
Haken, H., J.A.S. Kelso and H. Bunz
1985 A theoretical model of phase transitions in human hand movements. Biological Cybernetics 51: 347–356.
Hanson, Helen M. and Kenneth N. Stevens
2002 A quasi-articulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLSyn. Journal of the Acoustical Society of America 112: 1158–1182.
Harrington, Jonathan, Felicitas Kleber and Ulrich Reubold
2008 Compensation for coarticulation, /u/-fronting, and sound change in standard southern British: An acoustic and perceptual study. Journal of the Acoustical Society of America 123: 2825–2835.
Harrington, Jonathan, Janet Fletcher and Corinne Roberts
1995 An analysis of truncation and linear rescaling in the production of accented and unaccented vowels. Journal of Phonetics 23: 305–322.
de Jong, Kenneth J.
1995 The supraglottal articulation of prominence in English: Linguistic stress as localized hyperarticulation. Journal of the Acoustical Society of America 97: 491–504.
Kaye, Jonathan D. and Jean Lowenstamm
1984 De la syllabicité. In François Dell, Daniel Hirst, Jean-Roger Vergnaud (eds.), La forme sonore du langage, 123–159. Paris: Hermann.
Krakow, Rena
1999 Physiological organization of syllables: a review. Journal of Phonetics 27: 23–54.
Lindblom, Björn
1963 On vowel reduction (Report No. 29). Stockholm, Sweden: The Royal Institute of Technology, Speech Transmission Laboratory.
Marin, Stefania
accepted Romanian blended vowels: A production model of incomplete neutralization. In Selected papers of the PaPI 2009. Mouton de Gruyter.
Marin, Stefania
2007 An articulatory modeling of Romanian diphthong alternations. In Jürgen Trouvain and William J. Barry (eds.), Proceedings of the XVIth International Congress of Phonetic Sciences, 453–456. Saarbrücken, Germany.
Marin, Stefania
2005 Complex Nuclei in Articulatory Phonology: The Case of Romanian Diphthongs. In Randall Gess and Edward J. Rubin (eds.), Selected papers of the Linguistic Symposium in Romance Languages 34th, 161–177. Amsterdam, Philadelphia: John Benjamins.
Marin, Stefania and Marianne Pouplier
2010 Temporal organization of complex onsets and codas in American English: Testing the predictions of a gestural coupling model. Journal of Motor Control 14: 380–407.
Marotta, Giovanna
1988 The Italian diphthongs and the autosegmental framework. In Pier Marco Bertinetto and Michele Loporcaro (eds.), Certamen Phonologicum, 389–420. Torino: Rosenberg & Sellier.
Mooshammer, Christine and Susanne Fuchs
2002 Stress distinction in German: Simulating kinematic parameters of tongue tip gestures. Journal of Phonetics 30: 337–355.
Nam, Hosung
2007 A Gestural Coupling Model of Syllable Structure. PhD Dissertation, Department of Linguistics, Yale University.
Nam, Hosung, Louis Goldstein and Michael Proctor
n.d. TADA (TAsk Dynamics Application). Retrieved from http://www.haskins.yale.edu/tada_download/
Nam, Hosung, Louis Goldstein and Elliot Saltzman
2009 Self-Organization of syllable structure: A coupled oscillator model. In François Pellegrino, Egidio Marsico, Ioana Chitoran and Cristophe Coupé (eds.), Approaches to phonological complexity, 299–328. Berlin/New York: Mouton de Gruyter.
Pierrehumbert, Janet B.
2002 Word-specific phonetics. In Carlos Gussenhoven and Natasha Warner (eds.), Papers in Laboratory Phonology VII, 101–139. Berlin: Mouton De Gruyter.
Saltzman, Elliot and Kevin G. Munhall
1989 A dynamical approach to gestural patterning in speech production. Ecological Psychology 1: 333–382.
Abstract
This study investigates the temporal coordination of tones and constriction gestures in
Catalan and Viennese German using electromagnetic articulography. It is observed that
nuclear rises are later in German than in Catalan. We model the difference in tonal
alignment patterns using a coupled oscillator model, proposing that it can emerge
from differences in the coupling relations between tones and oral constriction gestures.
In Catalan, the high tone gesture is coupled in-phase with the accented vowel. In
German, a low tone and a high tone gesture compete with each other to be in-phase with the vowel, resulting in a rightward shift of the high tone gesture and therefore a delayed rise on the acoustic surface. We conclude with a comparison of lexical and
prosodic pitch accent tones and their interaction with the syllable-level coupling graph.
In contrast to lexical tones, prosodic tones do not perturb the within-syllable relations
of consonant and vowel timing.
1. Introduction
This study describes the temporal coordination pattern between tones and oral
constriction gestures in Catalan and German and attempts to analyze the
temporal pattern using a planning model of intergestural timing grounded in Articulatory Phonology. We will show that this coordination follows the basic principles applied to consonant clusters, which have been reported in numerous studies (Browman and Goldstein 1988, Honorof and Browman 1995, Byrd 1995, Bombien et al. 2010, Goldstein, Chitoran, and Selkirk 2007, Goldstein et al. 2009, Hermes et al. 2008, Marin and Pouplier 2010, Nam 2007,
Nam, Goldstein, and Saltzman 2009, Shaw et al. 2009).
Within the framework of Articulatory Phonology, speech can be decomposed
into invariant phonological units, articulatory gestures that are temporally
coordinated with one another (Browman and Goldstein 1989). A coupled
oscillator planning model of speech timing has been developed that provides
a possible way of modelling the coordination of gestures in time (Browman
and Goldstein 2000, Goldstein et al. 2009, Nam and Saltzman 2003, Nam,
Goldstein, and Saltzman 2009). In the model, gestures are associated with
nonlinear planning oscillators (or clocks) that are coupled with each other in a pattern specified by a coupling graph, assumed to be part of an utterance's phonological representation. In the present study, we model the control of
pitch to achieve a target in F0 as a tonal gesture and investigate the temporal
coordination of tonal gestures with oral constriction gestures in Catalan and
Viennese German (also referred to as Standard Viennese Austrian) bitonal LH
pitch accents.
It has been reported elsewhere that Catalan and German are expected to
show different alignment patterns for nuclear rises (Prieto et al. 2007b for
Catalan, Mücke et al. 2009 for Viennese German). We aim to test whether
those alignment differences can be seen as phonological in nature in the sense
that they emerge from topological differences in phonological coupling
graphs. Our results show that in the acoustic analysis, the accentual rise starts
later with respect to segmental landmarks in Viennese German compared to
Catalan. In the articulatory analysis, we focus on the start of the F0 rise move-
ment (the L valley) as the start of the H tone gesture. In Catalan, the start of
the H tone gesture is tightly synchronised with the start of the vowel gesture,
while in the German variety the H tone gesture starts considerably later. We
hypothesize that the difference lies in the coupling relations between tones
and vowel gestures. Therefore, we propose a non-competitive coupling structure type for Catalan, and a competitive structure (as known from consonant clusters) for Viennese German. The competitive coupling structure leads to a rightward shift of the H tone gesture (and therefore to later F0 rises on the acoustic surface).
We conclude with a discussion on the difference between prosodic (pitch
accent tones) and lexical tones and how they are supposed to interact with
the syllable-level coupling graphs for consonant and vowel coordination.
Spanish, D'Imperio, Petrone, and Nguyen 2007 for Italian, Atterer and Ladd 2004 and Mücke et al. 2008b for different German varieties, Ladd 2008 for a
general overview). Usually, tones occur in the vicinity of the lexically stressed
syllable carrying the tone. Therefore, tones are hypothesized to be aligned
with segments corresponding to the lexically stressed syllable. Figure 1 shows
the alignment properties of prenuclear rising pitch accents in different lan-
guages. The start of the rise (the L event) in English and Greek is constantly
aligned with the left periphery of the accented syllable, at the beginning of the
acoustic segment associated with the syllable-onset consonant. In fact, these
are not the only two languages reported in the literature with this pattern for
L (Ladd, Mennen, and Schepman 2000 for Dutch, D'Imperio 2002 for Italian, Prieto and Torreira 2007a for Spanish, Prieto et al. 2007b for Catalan). However, German has been shown to have later rises in prenuclear accents. In Standard Northern German (Low Franconian speech area near Düsseldorf), the L occurs around the middle of the C1 segment, while in Southern German (Viennese German) L occurs even later, during V1.
In Articulatory Phonology, speech gestures are modelled as invariant func-
tional units of vocal tract constricting action and speech can be decomposed
into a constellation of gestures: articulatory events with extent in time that
can temporally overlap with one another. The regularity and variability in
intergestural timing have been described by many studies (Byrd 1994, 1996a,b;
Cho 2001, Bombien et al. 2010). Such temporal patterns have been modelled
using an intergestural timing model, where the intergestural temporal relation-
ship (e.g. timing and connectivity) is specied in an inter-oscillator coupling
208 Doris Mücke, Hosung Nam, Anne Hermes and Louis Goldstein
Figure 2. Coupling graphs for (2a) pa (simple syllable onset, CV) and (2b) up
(simple syllable coda, VC) with in-phase (solid lines) and anti-phase (dotted
lines) target specications.
Figure 3. Coupling graphs for (3a) spa (complex syllable onset, CCV) and (3b) ask
(complex syllable coda, VCC) with in-phase (solid lines) and anti-phase
(dotted lines) target specications.
and Selkirk 2007, Goldstein et al. 2009, Hermes et al. 2008, Marin and Pouplier
2010, Nam 2007, Nam, Goldstein, and Saltzman 2009, Shaw et al. 2009).
In many languages, complex codas are defined by a non-competitive coupling structure, because of the weaker strength of anti-phase coupling. A
coupling graph for a VCC coordination in English (e.g. ask) is provided in
Figure 3b. Only C1 is linked directly to the V gesture; the coupling is in an
anti-phase relation. The following Cs are coordinated only with respect to
each other (anti-phase), but not directly to the V gesture. In what follows we will apply the basic coupling modes to the coordination of Tone gestures with oral constriction gestures.
A tone can also be understood as a coordinated articulatory action to achieve a tonal task goal and thus defined as a dynamical system in F0 space, a tone gesture (Gao 2009). Considering a tone as a gesture enables one to model the tone-to-gesture timing within the intergestural timing model. In a rising pitch accent, for example, the H tone gesture (or H gesture) involves a tonal movement to an H target in F0 (schematised in Figure 4). The onset of a
Figure 4. Analysis of a rising LH pitch accent contour: Tones as gestural action units
(above), and tones as events (below).
Tone gesture is taken to be the point in time at which F0 begins to move in the
direction of that gesture's target. In an LH rise, the onset of the H tone gesture
coincides with the offset of the preceding L tone gesture. In this example of
pitch accents, the beginning of the L gesture is unclear. Note that Tone ges-
tures (L and H gestures in Figure 4) are dynamical systems of control that
have extent in time (their activation intervals), while in the autosegmental
view, tones are events that occur at instants in time (H and L in Figure 4).
Gao (2009) extended the coupled oscillator model for intergestural timing to the analysis of the temporal patterns of lexical tones in Mandarin Chinese. She investigated syllables with single onsets (CV and CVC) such as [ma] or [man]. For syllables with only one tone (Tone 1 = H, Tone 3 = L), she showed that the
oral constriction gestures (C, V) and the Tone gestures (T) are activated in the
temporal order of C-V-T. The onset of the consonant gesture occurred con-
siderably (~50 ms) before the onset of the vowel gesture, while the onset of
the Tone gesture occurred after the vowel gesture, with about the same lag.
She demonstrated that this timing pattern of tones and constriction gestures
(C and V) can be predicted by hypothesizing that Tone gestures function
like C gestures in the competitive coupling topology in Figure 3a: the C and
T gesture are both coupled in-phase to the vowel and C and T are coupled in
anti-phase to one another. As a result, the C gesture is shifted leftwards with respect to V, while the Tone gesture is shifted rightwards (c-center-like coordination of C, V and T). The coupling graph for tones and oral constriction
gestures in Mandarin Chinese proposed by Gao (2009) is provided in Figure 5.
This hypothesis was further supported by the results of Tone 4 (HL). Here,
the H tone was synchronized with the V, while C preceded and L followed by
substantial lags. This pattern provided evidence that C-H-L are all coupled
anti-phase to one another and in-phase to the vowel.
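This competitive C–V–T topology can be sketched with relative-phase dynamics of the kind used in coupled oscillator planning models: each pairwise coupling pulls the relative phase toward its target, 0° for C–V and T–V and 180° for C–T. The equal coupling strengths, step size, and initial phases below are illustrative assumptions, not fitted model settings.

```python
import math

def settle(x0, y0, a=1.0, dt=0.001, steps=60000):
    """Euler-integrate the relative phases x = phase(C) - phase(V) and
    y = phase(T) - phase(V), with in-phase C-V and T-V coupling (target 0)
    and anti-phase C-T coupling (target 180 degrees), all with strength a."""
    x, y = x0, y0
    for _ in range(steps):
        dx = -2 * a * math.sin(x) + a * math.sin(x - y) - a * math.sin(y)
        dy = -2 * a * math.sin(y) + a * math.sin(y - x) - a * math.sin(x)
        x += dx * dt
        y += dy * dt
    return math.degrees(x), math.degrees(y)

# Starting slightly off-synchronous, C settles ahead of V and T behind it:
x_deg, y_deg = settle(-0.3, 0.3)
print(round(x_deg), round(y_deg))  # approximately -60 and +60
```

The relative phases settle symmetrically on either side of the vowel, C shifted leftwards and T rightwards, which is the c-center-like C-V-T ordering described above.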
In the present study, we extend the work on Tone gestures to pitch accent
tones, and we examine how these Tone gestures are temporally coordinated
with oral constriction gestures and with other Tone gestures. One hypothesis
Figure 5. Coupling graph for Tone 1 = H, Tone 3 = L in Mandarin Chinese, syllable [ma], adapted from Gao 2009. The Tone gesture (T) behaves like an
additional consonant (C).
Coupling of tone and constriction gestures in pitch accents 211
2. Method
         labial            alveolar
open     [mi.'ma.mi]       [ni.'na.ni]
         [mi.'ma.mi.la]    [ni.'na.ni.la]
closed   [mi.'mam.zi]      [ni.'nan.mi]
         [mi.'mam.zi.la]   [ni.'nan.mi.la]
Four target words were constructed with the lexically stressed syllable as the target syllable (see Table 2). Analogously to the Catalan data we varied syllable structure (open and closed syllables) and place of articulation of the
consonants (labial vs. alveolar). The phonological syllable structure was varied
by varying phonological vowel length, 'CV:CV vs. 'CVCV. In German, short
vowels do not occur in open syllables if they are stressed. Therefore, we
assume ambisyllabicity for the intervocalic C in the 'CVCV sequence (as
suggested by psycholinguistic experiments carried out by Schiller, Meyer,
and Levelt 1997, who have shown that Dutch speakers tend to close syllables
containing a short vowel).
                 labial           alveolar
open             [di # ma:.mi]    [di # na:.ni]
closed           [di # mami]      [di # nani]
F0 rise (the L valley) and the beginning of the initial C1 segment of the tonic
syllable (Tone-C1 segment).
Articulatory analysis: We identified articulatory labels for movements in the
vertical position time function of the respective sensors (lower lip for /m/,
tongue tip for /n/, and tongue body for the vowel). Algorithmically, we identi-
fied the times of onset and effective target achievement of consonant and vowel
gestures (and also offset for consonant gestures) at zero-crossings in the
velocity curve. Based on these algorithmically determined time points, we
measured temporal lags between the tone gestures and the oral constriction
gestures (V and C gestures) using the onsets of gestural activation, i.e. the
time points when gestures begin to move toward their targets. The labels V
and C gesture are used to refer to the onsets of the vowel and the initial con-
sonant gestures.
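The zero-crossing labeling can be sketched on a synthetic lip-aperture trajectory; the signal shape, the 200 Hz rate, and the small velocity threshold used to approximate gesture onset are our illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

dt = 0.005  # 200 Hz sampling
t = np.arange(0.0, 2.0, dt)
# synthetic lip-aperture signal: a closing-opening movement centred at t = 1.0 s
la = 10.0 - 8.0 * np.exp(-((t - 1.0) / 0.25) ** 2)
vel = np.gradient(la, dt)

# effective target achievement: velocity changes sign at maximum constriction
zc = np.where(np.diff(np.sign(vel)) != 0)[0]
target_idx = zc[np.argmin(la[zc])]

# gesture onset: first sample where |velocity| exceeds a small fraction of its peak
thresh = 0.02 * np.max(np.abs(vel))
onset_idx = np.argmax(np.abs(vel) > thresh)

print(round(t[onset_idx], 3), round(t[target_idx], 3))
```

On this synthetic signal the detected target achievement falls at the constriction maximum near 1.0 s, with the onset well before it.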
For both acoustic and articulatory landmarks, the temporal lag between
pairs of landmarks is reported as A-B. Thus, a negative value implies that A
occurs earlier than B, and vice versa for a positive value.
3. Results
Section 3.1 reports the acoustic and articulatory alignment patterns for the
nuclear LH rises in Catalan, and section 3.2 reports on the same for Viennese
German. We included all stimuli in the statistical analyses (acoustic and
articulatory).
Figure 6. Catalan acoustic (a) and articulatory (b–d) alignment latencies in ms, bilabial
data.
2. We treated labial and alveolar datasets separately to avoid the effects of intrinsic
variation (due to different organs) in timing patterns.
Table 3. Catalan mean lags (in ms) and standard deviations in parentheses for acoustic
(Tone-C1 segment) and articulatory alignment measures, separately for broad
and contrastive focus, all data. The articulatory measures include the lags
Tone-V gesture, Tone-C gesture and C-V gestures.
nously. The Tone gesture lagged the V gesture slightly (by 4 ms in the labial
data, Figure 6b, and by 2 ms in the alveolar data) and the C gesture by slightly
more: by 6 ms in the labial data, Figure 6c, and by 8 ms in the alveolar data.
Compatibly, the C gesture led the V gesture slightly (by 2 ms in the labial
data and by 5 ms in the alveolar data). Thus, gestural onsets occur in the order
C-V-T, but the lags are tiny.
Like the acoustic analysis, we tested the articulatory measures with three-
way ANOVAs (2 × 2 × 2) conducted separately for the labial and alveolar
datasets. There were no significant results for the labials.
However, in the alveolar dataset we found a main effect of Focus structure
on all measures: Tone-V gesture [F(1, 40) = 10.43, p < 0.01], Tone-C gesture
[F(1, 40) = 17.45, p < 0.001] and C-V gesture [F(1, 40) = 62.83, p < 0.001].
In contrastive focus compared to broad focus the tone starts 5 ms later in the
Tone-V measure, 9 ms earlier in the Tone-C measure and the V gesture starts
10 ms later in the C-V measure. Furthermore, there was also a main effect of
Foot Size on the measures Tone-V gesture [F(1, 40) = 12.96, p < 0.001] and
C-V gesture [F(1, 40) = 12.40, p < 0.01], but not on the measure Tone-C
gesture (p > 0.05). In a two-syllable foot compared to a three-syllable foot
218 Doris Mücke, Hosung Nam, Anne Hermes and Louis Goldstein
the tone starts 8 ms earlier in the Tone-V measure and the V gesture starts 7
ms earlier in the C-V measure.
Table 4 gives an overview of the effects found in the articulatory analysis
for Catalan.
                         bilabial                  alveolar
Catalan                  Tone-V  Tone-C  C-V       Tone-V  Tone-C  C-V
Syllable structure       ns      ns      ns        ns      ns      ns
Foot size                ns      ns      ns        ***     ns      **
Focus structure          ns      ns      ns        **      ***     ***
Place of articulation    ns      ns      ns        ns      ns      ns
To sum up, in the Catalan data, the C, V and Tone gestural onsets are very
close to being synchronous, occurring in the order of C-V-T. Furthermore, the
lags in the gestural analysis turned out to be less affected by prosodic factors
in the labial dataset compared to the alveolar dataset. However, a one-way
ANOVA on all data (labial and alveolar together) revealed no effect of Place
of Articulation on the respective measures (Tone-V gesture, p > 0.05; Tone-C
gesture, p > 0.05; C-V gesture, p > 0.05).
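A one-way ANOVA of this kind can be reproduced with SciPy; the lag values below are synthetic illustrations drawn around similar means for both places of articulation, not the study's measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# synthetic Tone-V gesture lags (ms): similar underlying means for both places,
# mimicking the null result for Place of Articulation
labial = rng.normal(loc=4.0, scale=10.0, size=48)
alveolar = rng.normal(loc=2.0, scale=10.0, size=48)

f, p = stats.f_oneway(labial, alveolar)
print(f"F(1, {len(labial) + len(alveolar) - 2}) = {f:.2f}, p = {p:.3f}")
```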
Table 5. Viennese German mean lags (in ms) and standard deviations in parentheses
for acoustic (Tone-C1 segment) and articulatory alignment measures, con-
trastive focus, all data. The articulatory measures include Tone-V gesture,
Tone-C gesture and C-V gesture.
Figure 7. Viennese German acoustic (a) and articulatory (b–d) alignment latencies in
ms, bilabial data.
                         bilabial                  alveolar
Viennese German          Tone-V  Tone-C  C-V       Tone-V  Tone-C  C-V
Syllable structure       **      ***     ns        ns      **      ns
Place of articulation    **      ***     ***       **      ***     ***
Figure 9. Gestural score for Catalan ['ma.mi], broad focus. The figure is to scale and
based on means (for 10 tokens).
Figure 10. Gestural score for Viennese German ['ma:.mi], contrastive focus. The
figure is to scale and based on means (for 10 tokens).
On the other hand, results for Viennese German contrastive focus show that
the onset of the H tone gesture is delayed with respect to the V gesture (by
105 ms) and the C gesture (by 107 ms), illustrated in the gestural score in
Figure 10. However, the oral constriction gestures for C and V are still syn-
chronous (by 2 ms difference for the CV lag across all data). Only the Tone
gesture starts later.
To account for this difference between Catalan and Viennese German, we
hypothesized the distinct coupling graphs shown in Figure 11, and tested them
using the Haskins Laboratories task-dynamic speech production model (TaDA;
Nam et al. 2004). The graphs were input to the model, and different
gestural scores of Catalan and Viennese German were successfully generated,
showing the much later onset lag for Viennese German.
Figure 11. Gestural score and coupling graphs for Catalan (a) and Viennese German
(b); coupling graphs with in-phase (solid lines) and anti-phase (dotted lines)
target specifications.
As shown in Figure 11, L and H are sequentially ordered and thus coupled
in an anti-phase relation (dotted line) for both languages, Catalan and Viennese
German. The difference across the two languages lies in the coupling relation
between the tones and the V gestures.
In Catalan (Figure 11a), the H tone gesture is coupled in-phase with the
accented V gesture (see solid line). L is not directly coupled to the V and starts
at some point within the pretonic syllable. The vowel and the H gesture there-
fore begin simultaneously.
In Viennese German (11b) both tones, L and H, are in-phase with the
accented V, although they are of course sequenced with respect to each other.
This competitive coupling results in a rightward shift of the H gesture to make
room for the preceding L gesture, much like the competitive coupling in Figure
3a, in which the consonant shifts to the right to make room for the addi-
tional consonant (see Browman and Goldstein 1989, Browman and Goldstein
2000, Nam and Saltzman 2003, Nam 2007, Goldstein et al. 2009, Marin and
Pouplier 2008, see also Hermes et al., this volume).
Thus, we can provide a principled coupling account of the differences
across the two languages. This also allows us to see how the timing of the H
gesture is controlled in Viennese German. It is not synchronized with some
arbitrary time point, but rather its delay follows automatically from the com-
petitive topology of its graph.
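The two graphs in Figure 11 can be contrasted with the same kind of toy phase-settling sketch used for coupled-oscillator models (a simplification of ours, not the TaDA simulation reported above; coupling strengths and initial phases are assumed).

```python
import numpy as np

def settle(edges, theta0, steps=20000, dt=0.01):
    """Relax oscillator phases by gradient descent on the coupling potential."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        d = np.zeros(len(theta))
        for i, j, psi, a in edges:
            u = theta[i] - theta[j] - psi
            d[i] -= a * np.sin(u)
            d[j] += a * np.sin(u)
        theta += dt * d
    return theta

pi = np.pi
# oscillators: 0 = L tone, 1 = H tone, 2 = accented vowel V
# Catalan-like graph: H in-phase with V, L only anti-phase to H
th_cat = settle([(1, 2, 0.0, 1.0), (0, 1, pi, 1.0)], [0.5, 0.0, 0.0])
# Viennese-German-like graph: L and H both in-phase with V, L anti-phase to H
th_vie = settle([(0, 2, 0.0, 1.0), (1, 2, 0.0, 1.0), (0, 1, pi, 1.0)],
                [0.5, -0.5, 0.0])

# phase lead = earlier onset
print(np.degrees(th_cat[1] - th_cat[2]))  # close to 0: H synchronous with V
print(np.degrees(th_vie[1] - th_vie[2]))  # negative: H delayed relative to V
```

In the competitive (Viennese-German-like) graph the H phase settles behind V, while in the non-competitive (Catalan-like) graph H stays locked to V: the qualitative contrast the gestural scores show.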
However, in the autosegmental-metrical theory it is also possible to assume
that the rising LH pitch accent in Catalan has no leading (L) tone at all, and
simply analyse it as an H*. That kind of analysis would also involve a non-
competitive structure for the coupling of the H tone gesture with the vowel.
For Viennese German (the later rise), it would be possible to assume an L*H
instead of an LH* to account for the later alignment. But in German there
is no clear evidence for a categorical difference between L*H and LH* (see
discussion in Braun and Ladd 2003, Braun 2007).
It is interesting to note the similarities between the proposed coupling-
graph differences and the autosegmental association diagrams proposed by
Grice (1995), which treat bitonal pitch accents as sequences (Figure 12a) or
units (12b), analogously to consonant clusters or affricates respectively in the
segmental domain (Yip 1989).
The coordination of pitch accent tones in Viennese German and Catalan
(and resulting coupling graphs) differs in important ways from the lexical
tones in Mandarin as analyzed in Gao (2009). First consider Catalan vs.
Mandarin. In Mandarin, syllables with H or L tones are produced with
substantial (~50 ms) lags between C and V onsets and then between V and T
onsets. In Catalan, the C,V, and H gestures all begin synchronously. One inter-
Figure 12. (a) Cluster of 2 bitonal pitch accents with 2 tonal root nodes, (b) unit with a
branching tonal root node (Grice 1995).
coupling graphs. Much more data from more speakers and more languages
will of course be required to substantiate this idea.
Acknowledgements
The Catalan recordings were carried out in collaboration with Pilar Prieto,
ICREA-University Pompeu Fabra, Barcelona, Spain.
References
Byrd, D.
1995 C-centers revisited. Phonetica 52, 285–306.
Byrd, D.
1996a A Phase Window Framework for Articulatory Timing. Phonology
13(2), 139–169.
Byrd, D.
1996b Influences on Articulatory Timing in Consonant Sequences. Journal
of Phonetics 24(2), 209–244.
Cho, T.
2001 Effects of morpheme boundaries on intergestural timing: Evidence
from Korean. Phonetica 58(3), 129–162.
D'Imperio, M.
2002 Language-specific and universal constraints on tonal alignment: The
nature of targets and anchors. Proceedings of the 1st International
Conference on Speech Prosody, Aix-en-Provence, France, 101–106.
D'Imperio, M., Petrone, C. and Nguyen, N.
2007 Effects of tonal alignment on lexical identification in Italian. In C.
Gussenhoven and T. Riad (eds.), Tones and tunes, Vol. 2, Berlin:
Mouton de Gruyter, 79–106.
Gao, M.
2009 Gestural Coordination among Vowel, Consonant and Tone Gestures
in Mandarin Chinese. Chinese Journal of Phonetics. Beijing:
Commercial Press.
Goldstein, L., Chitoran, I. and Selkirk, E.
2007 Syllable structure as coupled oscillator modes: evidence from
Georgian vs. Tashlhiyt Berber. In Proceedings of the 16th
International Congress of Phonetic Sciences, Saarbrücken,
Germany, 241–244.
Goldstein, L., Nam, H., Saltzman, E. and Chitoran, I.
2009 Coupled oscillator planning model of speech timing and syllable
structure. In G. Fant, H. Fujisaki and J. Shen (eds.), Frontiers in
Phonetics and Speech Science, Beijing: The Commercial Press,
239–250.
Grice, M.
1995 Leading tones and downstep in English. Phonology 12(2), 183–233.
Hermes, A., Grice, M., Mücke, D. and Niemann, H.
2008 Articulatory indicators of syllable affiliation in word-initial
consonant clusters in Italian. In Proceedings of the 8th International
Seminar on Speech Production, Strasbourg, France, 433–436.
Honorof, D. and Browman, C.
1995 The center or edge: How are consonant clusters organized with
respect to the vowel? In K. Elenius and P. Branderud (eds.),
Proceedings of the 13th International Congress of Phonetic
Sciences, Stockholm: KTH and Stockholm University, 552–555.
Fang Hu
Abstract
This paper proposes how laryngeal complexity (tone) could emerge from sequential
complexity (consonant clusters) by examining tonogenesis in Lhasa Tibetan on the
basis of articulatory and acoustic data recorded with an Electromagnetic Articulograph
(EMA, the Carstens AG500 system) from three native speakers. The acoustic data con-
firmed the high-low contrast of tones in Lhasa on the one hand and a high correlation
between tonal contours and syllable types on the other. In other words, the high-low
contrast emerged earlier than contour contrast in Lhasa tonogenesis, which differs
from the classical Vietnamese case (Haudricourt 1954) and Chinese case (Pulleyblank
1962). The intergestural timing revealed a C-center organization for Lhasa syllable
production, namely the vowel gesture begins around the midpoint between the con-
sonant gesture and tone gesture. That is, the tone gesture is coordinated like an additional
consonant in the CV production. Results suggest that Lhasa tonogenesis followed
general coupling principles in syllable production (Nam, Goldstein and Saltzman
2010); in the long-term historical development, the competitive coupling relations
initiated the simplification process for Lhasa consonant clusters, and finally the tone
gesture emerged as an integrated component of syllable production.
1. Introduction
the simplification of syllable initials and rimes. First, tonal contours emerged
from different rime types, e.g., level tone from open syllable, falling tone from
aspirated syllable, and rising tone from checked syllable. Second, high vs. low
register contrasts further developed from the loss of voicing distinction in
syllable initials. These two basic mechanisms were generally accepted in the
field of historical linguistics in accounting for the tonogenesis in Sino-Tibetan
languages (e.g., Pulleyblank 1962; Mei 1970). And phonetic research, in
general, demonstrated that these mechanisms are supported by empirical data
(Hombert, Ohala and Ewan 1979). According to Hombert, Ohala and Ewan
(1979), a number of segmental effects, such as initial voicing, postvocalic
glottal stop or fricative, and phonation type, quite naturally have a perturbation
effect on the fundamental frequency (F0) of the adjacent vowels within a syllable
in both tonal and non-tonal languages. And, tone emerges when an intrinsic
(F0) perturbation comes to be used extrinsically (p. 37).
Thus, a key issue of the inquiry into tonogenesis is to explain how an
intrinsic F0 perturbation in a non-tonal language evolves into an extrinsic
linguistic contrast in tonal languages. Both non-tonal and tonal languages
share the commonality that F0 is first of all a global intonational function of
sentence production, but they differ in that F0 is additionally localized in the
syllable production in tone languages. If F0 perturbation, which is riding
on the global sentence intonation in a non-tonal language, emerges as a local
event, i.e. linguistic tone, the production of F0 should be bound to the produc-
tion of the syllable. On the other hand, tonogenesis is accompanied by the sim-
plification of syllable structure. While syllables are becoming sequentially
simpler with the loss of consonant clusters, syllable production is characterized
by a new structural complexity in tone languages, namely a laryngeal gesture
is simultaneously superimposed upon supralaryngeal articulations. The question
is how. Current phonology adopts an autosegmental view on the relation
between laryngeal and supralaryngeal articulations. That is, tone and segments
are parallel to each other, and they are associated by lines in an abstract
fashion. In the research line of phonetics, however, the temporal alignment
between tone and segments demonstrates stable, concrete patterns both in
tone languages (Xu 1998, 1999, 2005) and in non-tonal languages (DImperio
et al. 2007; Mcke et al. 2009). Explicitly, Articulatory Phonology (Browman
and Goldstein 1986, 1988, 1992) looks into the coordination structure between
individual articulations. Each individual articulation, or gesture, is an action
unit which involves a formation and release of a particular constriction in the
vocal tract. Unlike traditional phonological concepts, which are claimed to
be autonomous or linguistic internal, gestures in articulatory phonology follow
Tonogenesis in Lhasa Tibetan – Towards a gestural account 233
2. Methodology
1. The speakers read these citation syllables with learned pronunciation, which
differs from the colloquial form mainly in that more orthographic information is
retained in the learned pronunciation. For instance, Lhasa was reported as having
long and short open (CV) syllables, and was thus treated as being contrastive in
vowel duration in the literature (e.g., Jin ed. 1983; Qu 1981). But in the citation
monosyllables, long open syllables are normally pronounced with a liquid coda
according to the orthographic spelling.
2. The aspirated syllables were transcribed as short open syllables in the literature.
It's true that the syllable-final aspiration diminishes or disappears if the aspirated
syllable occurs in an unstressed position in running speech. But in the citation
form, these short open syllables are clearly aspirated, i.e. pronounced with a
syllable-final glottal fricative.
3. CVʔN syllables are grouped with CVʔ syllables in the literature. Here, CVʔN
syllables are treated as a different syllable type since (1) they have a complex
coda, and (2) they have a longer duration than CVʔ syllables, which consequently
may affect their tonal development.
236 Fang Hu
synchronized audio recording. Three native Lhasa female speakers were re-
corded. They were all first-year or second-year undergraduate students, 20 to
21 years old, at the Minzu University of China in Beijing.
The sensors were attached to the speakers' articulators along the mid-
sagittal plane: two on the tongue (tip and body), two on the lips (lower and
upper lip respectively), and one on the gum ridge at the lower incisors
(jaw). Three additional sensors on the bridge of the nose and behind the left
and right ears served as references to compensate for head movements. The
articulatory data were sampled at 200 Hz and smoothed with a 12 Hz low-pass
filter. The acquired data were further corrected for head movements, and then
rotated and translated to the speakers' occlusal plane.
The consonant gesture in the target syllable was characterized by lip
aperture, i.e. the calculated Euclidean distance between the lower and upper
lip sensors. The vowel gesture was characterized by the kinematics of the
tongue body sensor. The tone gesture was, however, based on the acoustics,
i.e. the fundamental frequency (F0). Due to the limitation of research techni-
ques, the periodicity of vocal folds was not directly measured in this study.
Alternatively, its acoustic output, F0, was taken as a measure of tone gesture.
Following Gao (2008, 2009; see also Mücke et al. this volume), the preceding
F0 minimum was taken as the onset of a high tone, and the preceding F0
maximum as the onset of a low tone. Physically, laryngeal periodicity is only
observed on the voiced segments. As a result, syllables with a voiceless vs.
voiced initial show an inconsistency. For instance, F0 is observed for the
whole syllable in [mar], but for the rime part only in [par]. In this study, the
tone gesture was measured from a sentential F0 event, and F0 is viewed as
being virtually connected during the production of voiceless [p]. That is, for a
high or low toned [par], for instance, the preceding F0 minimum or maximum
is measured as the onset of the target tone, respectively. And generally, the tone
onset was found around or even shortly before the onset of the target syllable
(see Figure 2 for illustration and Section 4 for detailed results). In this way,
tones on the voiceless and voiced syllables were treated consistently in this
study. In fact, this kind of treatment is in line with the traditional idea that
tone is a syllabic property (Wang 1967; Chao 1968). And furthermore, tone
as a syllable-synchronized feature is supported by the empirical tonal align-
ment data in Mandarin Chinese, a canonical tone language (Xu 2005).
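The extremum criterion for the tone-gesture onset can be sketched as follows; the search window, the synthetic F0 track, and the NaN treatment of the voiceless closure are our illustrative assumptions.

```python
import numpy as np

def tone_onset(f0, t, syl_onset, tone="high", window=0.25):
    """Onset of the tone gesture: preceding F0 minimum for a high tone,
    preceding F0 maximum for a low tone (F0 in Hz, NaN where voiceless)."""
    pre = (t >= syl_onset - window) & (t <= syl_onset + 0.05)
    idx = np.flatnonzero(pre)
    seg = f0[idx]
    pick = np.nanargmin(seg) if tone == "high" else np.nanargmax(seg)
    return t[idx[pick]]

# synthetic F0 track: a dip at t = 0.50 s, then a rise into a high-toned syllable
t = np.arange(0.0, 1.0, 0.005)
f0 = 250.0 - 40.0 * np.exp(-((t - 0.50) / 0.08) ** 2) \
     + 60.0 * (t > 0.6) * (t - 0.6)
f0[(t > 0.55) & (t < 0.62)] = np.nan   # voiceless closure of an initial [p]

onset_t = tone_onset(f0, t, syl_onset=0.55, tone="high")
print(round(onset_t, 3))
```

The detected onset falls at the F0 minimum shortly before the syllable onset, as described for the high-toned [par] case.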
Figure 2 illustrates the acoustic data labeling procedure and Figure 3 illus-
trates the articulatory data labeling procedure for the same high toned p-initial
syllable [par], respectively. The annotations consist of three acoustic levels,
syllable, tone, and target (acoustically defined tone onset), and two articulatory
levels, lip aperture (LA) and tongue body (TB).
Figure 2. Acoustic labeling for [par] in the citation position and sentence-mid
position. Levels of annotation (upper to lower): syllable, target, tone;
signal windows (upper to lower): audio, wideband spectrogram, F0.
The label of syllable delimits the whole syllable, i.e., both consonantal initial
and rime. As shown in Figure 2, the label of syllable [par] includes the rime and
the VOT of the initial consonant for the target syllable in citation position, i.e.,
the first X position in the carrier sentence; and the label of syllable [par]
includes the rime, the VOT, and the acoustic closure part of the initial con-
sonant for target syllables in the sentence-mid position, i.e., the second X
position in the carrier sentence. The label of tone delimits the periodic rime
part in a syllable. Thus, the interval between the onset of syllable and tone
denes the consonant duration. The target syllable in citation position was
labeled acoustically only and the discussion of the acoustic properties of
Lhasa tones in Section 3 is based on these annotated tone segments in cita-
tion positions such that both p-initial syllables and m-initial syllables have
comparable F0 contours and durations. As mentioned above, F0 contours are
considered as being virtually connected across the production of the voiceless
[p]. And the F0 minimum that precedes the target high tone, which is located
around the offset of the preceding syllable [ti], was thus defined as the onset of
the target high tone gesture.
Articulatory annotations apply to the target syllable in the sentence-mid
position. The consonant gesture for the bilabial [p] or [m] was defined by lip
aperture (LA), which is composed of a gesture of lip closing and lip opening.
The production of the vowel [a] or [] was characterized by a lowering gesture
of the tongue body (TB). The annotations were based on the positional data
with reference to the criterion of the tangential velocity minimum. As shown
in Figure 3, the interval from the LA positional peak to its first valley was
labeled as the gesture of lip closing (close), and accordingly the interval from
the valley to the following peak was labeled as the gesture of lip opening
(open). And as shown in the figure, peaks and valleys occur where there are
tangential velocity minima. Similarly, the lingual lowering gesture (lower)
was labeled from a stable higher TB position to a stable lower TB position
where there are tangential velocity minima.
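The tangential-velocity criterion can be sketched on a synthetic tongue-body trace; the trajectory and the small velocity threshold standing in for the minima are our assumptions, not the study's labeling script.

```python
import numpy as np

dt = 0.005                     # 200 Hz sampling, as in the EMA data
t = np.arange(0.0, 2.5, dt)
# synthetic tongue-body trace: lowering from y = 10 mm to y = 5 mm over 1.0-1.5 s
ramp = np.clip((t - 1.0) / 0.5, 0.0, 1.0)
y = 10.0 - 5.0 * (0.5 - 0.5 * np.cos(np.pi * ramp))
x = 2.0 * ramp                 # slight fronting during the same movement

tv = np.hypot(np.gradient(x, dt), np.gradient(y, dt))  # tangential velocity

# movement interval: samples where tangential velocity exceeds a small
# fraction of its peak; its edges approximate the velocity-minimum landmarks
moving = tv > 0.05 * tv.max()
onset, offset = t[np.flatnonzero(moving)[[0, -1]]]
print(round(onset, 2), round(offset, 2))
```

On this trace the labeled lowering gesture spans roughly 1.0 to 1.5 s, bracketed by the near-zero tangential-velocity plateaus on either side.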
Figure 4 gives the mean F0 contours associated with the eight different syllable
type and tone combinations in Lhasa Tibetan from the three female speakers.
The F0 contours were averaged for each combination in the citation position
across all the repetitions of all the tested syllables (refer to 1)–8) in Section
2 for details).
As summarized in Table 1, the F0 contour patterns are quite consistent
across all the three speakers.
Figure 4. Lhasa tones. High CVS: H; low CVS: LH; high CVh: HS; low CVh: LHH;
high CVʔ: HLS; low CVʔ: LHS; high CVʔN: HL; low CVʔN: LHL.
First, there is a clear high vs. low tonal contrast. Acoustically this feature is
manifested on the onset part of the tone. The high tones have a high F0 onset
at around 270–320 Hz and the low tones a low F0 onset at around 190–240
Hz. Second, tonal contours are highly correlated with syllable types. It has
been debated in the literature whether Lhasa has two, four, or six tones. It's
quite clear from the acoustic data presented here that the complementary dis-
Table 1. Syllable types and the emergent tonal melodies in Lhasa Tibetan.
4. Hombert, Ohala and Ewan (1979) concluded that glottal stop has a raising effect
on F0. However, glottal stop could induce a sharp drop in F0, too (Zee and
Maddieson 1979). Moreover, as noted in Tan and Kong (1991: 17), the glottal
stop in Lhasa is actually characterized by glottalization. That is, glottal closure is
realized as creaky voice in the sense of Ladefoged's (1971) continuum of phona-
tion types (see also Gordon and Ladefoged 2001).
F0, and consequently the CVh syllable has a comparable F0 contour to its
CVS counterpart, but is much shorter. In summary, all perturbations in Lhasa
Tibetan have an F0 lowering effect. By contrast, the unperturbed F0 stays as a
high (H) tonal element. Thus, a rising F0 contour was induced by historical
voicing, a falling F0 contour was induced by the glottal stop, and a rising-
falling F0 contour was induced by both of them.
The acoustic results from this study are, in general, consistent with those
from T. Hu, Qu and Lin (1982). The only difference is that this study further
distinguishes two types of checked syllables: CVʔN vs. CVʔ. An eight-tone
analysis is therefore proposed. Although CVʔN and CVʔ share similar F0
contour patterns, the former is considerably longer than the latter (cf. F. Hu
and Xiong 2010). Interestingly, this durational difference has a critical con-
sequence. A sharp drop in F0 signifies the presence of a glottal stop (Zee and
Maddieson 1979), and is thus redundant in nature. However, the glottal stop
is often dropped in natural conversational speech, and consequently the sharp
F0 drop effect disappears. As a result, CVʔ syllables will have a similar F0
contour to the corresponding CVh syllables5. By contrast, a slower drop in
F0 is not a redundant feature, and the falling pitch is always attested in the
production of CVʔN syllables. That is, even when the glottal stop in CVʔN
syllables is weakened or deleted as reported in the literature (e.g. Jin ed.
1983: 13), the falling tonal contour is still retained. To sum up briefly, the
emergent citation tones in Lhasa Tibetan are demonstrating a new direction
of development: while short tones on CVh and CVʔ syllables tend to merge,
short tones on CVʔN are further emerging as contrastive tones (F. Hu and
Xiong 2010).
It has been shown so far that contrastive tones have emerged in Lhasa Tibetan.
Meanwhile, the emergent tonal melodies are highly constrained by syllable
structures, and are thus still under development from a historical phonological
point of view. In this section, the internal articulatory structure of Lhasa
syllable/tone production is examined.
Figures 5–7 show the temporal structure of intergestural coordination for
the syllable production from the three Lhasa speakers respectively. Bars in
and low-toned syllables. Results from the paired t-tests, as listed by the p-
values in the last column of the tables, show that there is no significant differ-
ence between the durations of CV lags and VT lags in most cases for Speakers
1 and 2. Two cases from Speaker 1 and one case from Speaker 2, as signified
by the shaded cells in the tables, show that the difference is significant at the
95% confidence level. However, all cases from Speaker 3 exhibit a highly
significant difference between the durations of CV and VT lags, and the CV
lag is larger than the VT lag7.
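A paired comparison of CV and VT lags of the kind reported here can be run with SciPy; the per-token lags below are synthetic, drawn with no built-in CV-VT difference, so the test typically comes out non-significant, mimicking the C-center pattern of Speakers 1 and 2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# synthetic per-token lags (ms): VT lags drawn around the same mean as CV lags
cv_lag = rng.normal(loc=55.0, scale=12.0, size=40)
vt_lag = cv_lag + rng.normal(loc=0.0, scale=10.0, size=40)

tstat, p = stats.ttest_rel(cv_lag, vt_lag)
print(f"t({len(cv_lag) - 1}) = {tstat:.2f}, p = {p:.3f}")
```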
The dataset presented in the present study is three-way in nature. That is,
there are four syllable types (CVS, CVʔN, CVʔ, and CVh), two types of
initial consonants ([p m]), and two types of tones (high and low). To examine
further whether syllable type, initial consonant and tone have an effect on the
C-center-like alignment, a 3-way ANOVA with repeated measures was conducted
on the difference between CV lag and VT lag for each speaker. Results from
Speaker 1 indicate no significant effect of syllable type (F(3,320) = 0.6463,
p = 0.5858), initial consonant (F(1,320) = 0.0272, p = 0.8692), and tone
(F(1,320) = 0.0519, p = 0.8200). And there is no significant effect of interac-
tions between syllable type and initial consonant (F(3,320) = 0.5314, p = 0.6611),
between syllable type and tone (F(3,320) = 1.0751, p = 0.3598), between ini-
tial consonant and tone (F(1,320) = 0.1604, p = 0.6890), and between syllable
type, initial consonant and tone (F(2,320) = 1.2462, p = 0.2890). Results from
Speaker 2 indicate no significant effect of syllable type (F(3,208) = 0.7506,
p = 0.5231), initial consonant (F(1,208) = 1.0917, p = 0.2973), and tone
7. This is mainly attributable to the fact that this speaker habitually placed extra
focus on the target syllables when reading them in carrier sentences. As shown in
Figure 7, the lip gesture of this speaker is characterized by a long closing phase.
On the other hand, in order to get an unbiased overall picture of the intergestural
coordination of Lhasa syllable production, this study does not use any threshold
in defining articulatory gestures, and thus could not control this sort of variability
induced by the speaker's reading style.
Figure 8. Coupling structure for the consonant, vowel and tone gestures in Lhasa
Tibetan. Solid line: in-phase coupling; dotted line: anti-phase coupling.
5. Conclusion
The acoustic data confirmed the high-low contrast of tones in Lhasa on the
one hand and a high correlation between tonal contours and syllable types
on the other. That is, different from the classical Vietnamese case (Haudricourt
1954) and Chinese case (Pulleyblank 1962), the high-low contrast emerged
earlier than contour contrast in Lhasa tonogenesis. And in general, the results
are in line with the tonogenesis mechanisms proposed in Hombert, Ohala and
Ewan (1979), namely the intrinsic segmental perturbation on F0 was or is
being extrinsically used and was or is being internalized in the grammar.
Meanwhile, the Lhasa case also demonstrated language-specific mechanisms:
(1) the syllable-final glottal stop produced a large drop, rather than a rise, in
F0; (2) the syllable-final aspiration did not have much effect on F0.
The intergestural timing revealed a C-center organization for the Lhasa
syllable production, namely the vowel gesture begins approximately at the
midpoint between the consonant gesture and tone gesture. That is, the tone
gesture is coordinated like an additional consonant to the CV production. The
Lhasa case corroborates the results from Mandarin Chinese (Gao 2008, 2009),
a canonical syllable tone language, but differs from sentential pitch accents in
non-tonal languages such as Catalan and German (Mücke et al. this volume).
Unlike in tone languages, where tones are lexical representations and are thus
locally integrated in the coupling relation of syllable production, sentential
pitch accents occur as a post-lexical event in non-tonal languages, namely the
alignment of the tone gesture doesn't affect the coordination structure of the
consonant and vowel gestures (Mücke et al. this volume). It seems that the
Lhasa case follows general coupling principles in syllable production (Nam
and Saltzman 2003; Nam 2007; Nam, Goldstein and Saltzman 2010), and in
the long-term historical development, the competitive coupling relations initiated
the simplification process for Lhasa consonant clusters, especially in the prevo-
calic position, and finally the tone gesture emerged as an integrated component
of syllable production. It should be admitted, however, that in order to have
250 Fang Hu
Acknowledgment
References
Gao, Man
2009 Gestural coordination among vowel, consonant and tone gestures in
Mandarin Chinese. Chinese Journal of Phonetics 2: 43–50.
Goldstein, Louis, Dani Byrd and Elliot Saltzman
2006 The role of vocal tract gestural action units in understanding the
evolution of phonology. In Michael A. Arbib (ed.), Action to
language via the mirror neuron system, pp. 215–249. Cambridge:
Cambridge University Press.
Gordon, Matthew and Peter Ladefoged
2001 Phonation types: a cross-linguistic overview. Journal of Phonetics
29: 383–406.
Haudricourt, André-Georges
1954 De l'origine des tons en vietnamien. Journal Asiatique 242: 69–82.
Hombert, Jean-Marie, John J. Ohala and William G. Ewan
1979 Phonetic explanations for the development of tones. Language 55:
37–58.
Hu, Fang and Ziyu Xiong
2010 Lhasa tones. Speech Prosody 2010, 100163: 1–4.
Hu, Tan, Aitang Qu and Lianhe Lin
1982 Experimental studies on Lhasa Tibetan tones [in Chinese]. Yuyan
Yanjiu 2: 18–38.
Huang, Bufan
1994 Conditions for tonogenesis and tone split in Tibetan dialects [in
Chinese]. Minzu Yuwen 3: 1–9. English translation by Jackson T.-S.
Sun in Linguistics of the Tibeto-Burman Area 18: 43–62, 1995.
Jiang, Di
2002 Studies on Tibetan historical sound change [in Chinese]. Beijing:
Minzu Press.
Jin, Peng (ed.)
1983 An introduction to the Tibetan language [in Chinese]. Beijing: Minzu
Press.
Karlgren, Bernhard
1915–26 Études sur la phonologie chinoise. Archives d'Études Orientales,
Vol. 15. Leyde et Stockholm.
Ladefoged, Peter
1971 Preliminaries to linguistic phonetics. Chicago: The University of
Chicago Press.
Li, Fang-Kuei
1977 A handbook of comparative Tai. Honolulu: University Press of
Hawaii.
Maspéro, Henri
1912 Études sur la phonétique historique de la langue annamite: les ini-
tiales. Bulletin de l'École Française d'Extrême-Orient 12.1: 1–124.
Matisoff, James A.
1973 Tonogenesis in Southeast Asia. In Larry M. Hyman, ed., Consonant
Types and Tone, pp. 71–95. Southern California Occasional Papers
in Linguistics, No. 1. Los Angeles: USC.
Matisoff, James A.
1999 Tibeto-Burman tonology in an areal context. In Shigeki Kaji, ed.,
Proceedings of the Symposium Cross-Linguistic Studies of Tonal
Phenomena: Tonogenesis, Typology, and Related Topics, pp. 3–32.
Tokyo: Institute for the Study of Languages and Cultures of Asia
and Africa, Tokyo University of Foreign Studies.
Mazaudon, Martine
1977 Tibeto-Burman tonogenetics. Linguistics of the Tibeto-Burman Area
3.2: 11–23.
Mei, Tsu-Lin
1970 Tones and prosody in Middle Chinese and the origin of the rising
tone. Harvard Journal of Asiatic Studies 30: 86–110.
Mücke, Doris, Martine Grice, Johannes Becker and Anne Hermes
2009 Sources of variation in tonal alignment: evidence from acoustic and
kinematic data. Journal of Phonetics 37: 321–338.
Mücke, Doris, Hosung Nam, Anne Hermes and Louis Goldstein
2012 Coupling of tone and constriction gestures in pitch accents, this
volume.
Nam, Hosung
2007 Syllable-level intergestural timing model: Split-gesture dynamics
focusing on positional asymmetry and moraic structure. In J. Cole
& J. I. Hualde (eds.), Laboratory Phonology 9, pp. 483–506. Berlin,
New York: Mouton de Gruyter.
Nam, Hosung, Louis Goldstein and Elliot Saltzman
2010 Self-organization of syllable structure: a coupled oscillator model.
In F. Pellegrino, E. Marisco & I. Chitoran (Eds.), Approaches to
phonological complexity. Berlin, New York: Mouton de Gruyter.
Nam, Hosung and Elliot Saltzman
2003 A competitive, coupled oscillator model of syllable structure. In
Proceedings of the 15th ICPhS, pp. 2253–2256, Barcelona, Spain.
Pulleyblank, Edwin G.
1962 The consonantal system of Old Chinese, Part II. Asia Major 9:
206–265.
Qu, Aitang
1981 Tibetan tone and its historical development [in Chinese]. Yuyan
Yanjiu 1: 177–194.
Sun, Jackson T.-S.
1997 The typology of tone in Tibetan. Chinese Languages and Linguistics
IV: Typological Studies of Languages in China (Symposium Series
of the Institute of History and Philology, Academia Sinica, Number
2), 485–521. Taipei: Academia Sinica.
Tonogenesis in Lhasa Tibetan – Towards a gestural account 253
Natalie Boll-Avetisyan
Abstract
Long-term memory representations that facilitate short-term memory (STM) recall
have been found to also facilitate lexical acquisition (e.g. Gathercole 2006). Such
facilitation comes, for example, from probabilistic phonotactics. It is controversial
whether probabilistic phonotactic knowledge is informed by abstractions from lexical
entries or also by sub-lexical representations. When disentangling the two, previous
studies found lexical effects but had difficulties demonstrating sub-lexical effects on
STM recall (e.g. Roodenrys & Hinton 2002). It is, however, paradoxical to need a
lexicon for lexical acquisition. Strikingly, previous studies had only used CVC nonwords
as stimuli. We hypothesize that sub-lexically represented probabilistic phonotactics are
informed by abstract knowledge about phonological structure. Consonant clusters
in syllable margins are structurally more restricted than CV or VC strings. Hence, sub-
lexical effects should increase with syllable complexity. This was tested with Dutch
adults in an STM recognition task. As expected, recognition was faster for nonwords
of high than of low phonotactic probability. The effect was present when complex
syllables were used, but not when syllables were simple. A second experiment that
controlled for stimulus duration, as longer stimuli had provoked longer recall latencies,
replicated the result. The study opens up the possibility that sub-lexical knowledge
bootstraps lexical acquisition.
1. Introduction
reflect the role of STM in lexical acquisition. In fact, nonword recall can be
seen as the initial step in storing new words in the mental lexicon. The easier
it is to hold an item in STM, the easier it is to store it in long-term memory
(LTM). Interestingly, STM recall is affected by LTM representations. Hence,
it has been suggested that LTM knowledge is used for reconstructing degraded
memory traces during sub-vocal rehearsal in STM, a process referred to as
redintegration (Schweickert, 1993).
As to the role of phonotactics in STM recall, a pioneering study by Gathercole
and colleagues (Gathercole, Frankish, Pickering, & Peaker, 1999) showed that
seven- to eight-year-old children were better at recalling CVC nonwords with
high rather than low phonotactic probability in serial nonword repetition tasks.
Moreover, children who did particularly well at relying on cues from phono-
tactic probability in nonword repetition were shown to have a larger lexicon
size than children who did less well in relying on the phonotactic cues. Similar
results have been found in a study with adult L2 learners (Majerus, Poncelet,
Van der Linden, & Weekes, 2008).
Although the studies reviewed above may seem to offer evidence for direct
effects of probabilistic phonotactics on lexical acquisition, it is important to
guard against premature conclusions, given that there are two levels of
processing from which probabilistic phonotactic information can be derived.
One is a sub-lexical level, at which knowledge of sub-word units, such as
phonemes and biphones and their probability of occurrence (e.g., Vitevitch &
Luce, 1999) is represented. The other is the lexical level, at which the phono-
logical forms of words and morphemes are represented. Probabilistic phono-
tactics can be deduced from lexical items by comparing phonologically similar
items and their frequencies.
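The counting step behind probabilistic phonotactics can be illustrated with a small sketch. The helper name and the toy lexicon below are hypothetical (not the Dutch counts used in this study); the sketch simply computes the relative frequency of each adjacent phoneme pair over a transcribed word list, with one symbol per phoneme:

```python
from collections import Counter

def biphone_frequencies(lexicon):
    """Relative frequency of each biphone (adjacent symbol pair) over a
    list of phonemically transcribed words, one symbol per phoneme."""
    counts = Counter()
    for word in lexicon:
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    total = sum(counts.values())
    return {biphone: n / total for biphone, n in counts.items()}

# Toy lexicon: /al/ occurs twice out of eight biphone tokens
lexicon = ["bal", "bel", "lam", "mal"]
freqs = biphone_frequencies(lexicon)
```

A positional variant, as used for the stimuli here, would additionally condition the counts on syllable position (onset, nucleus, coda).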
As mentioned before, it would be helpful for a language-learning child or
an L2 learner if (s)he could draw on phonotactic knowledge to facilitate
word learning before the onset of lexical acquisition. To open up this possibility,
it is necessary to distinguish between lexical and sub-lexical knowledge, as
only the latter can possibly be acquired independently of the lexicon. The
study by Gathercole et al. (1999) was later criticized on the grounds that
when manipulating sub-lexical factors (such as biphone frequencies), lexical
factors (such as lexical neighborhood density) had not been controlled for
(Roodenrys & Hinton, 2002). Lexical and sub-lexical probabilities are highly
correlated: words composed of high frequency biphones tend to have many
lexical neighbors (e.g., Landauer & Streeter, 1973). Experimental evidence
suggests that both lexical and sub-lexical factors function as independent
predictors of well-formedness judgments on nonwords, even though they are
highly correlated (e.g., Bailey & Hahn, 2001). Furthermore, they are known
260 Natalie Boll-Avetisyan
(e.g., Dupoux, Kakehi, Hirose, Pallier, & Mehler, 1999; Pierrehumbert, 2003)
have equated the sub-lexical level, as it is phonological by definition, with
a phonological level that represents a phonological grammar. Under current
conceptualizations, phonological grammar can be interpreted as a knowledge
system that contains, among others, markedness constraints referring to abstrac-
tions of structure, such as syllable structure (Prince & Smolensky, 1993/2004).
Markedness constraints ban marked structures, such as complex syllable onsets
and codas. The notion of a phonological grammar is supported by data from
typology, acquisition and processing. Typologically, languages that tolerate
complex syllable margins, such as Dutch and English, also tolerate simple
syllable margins; in contrast, however, there are no languages that tolerate
complex syllable margins, but disallow simple syllable margins. This implica-
tion indicates that complex constituents are more marked than simple constit-
uents. Complex constituents are restricted by the constraints *Complex-Onset,
penalizing consonant clusters in syllable onsets, and *Complex-Coda, penal-
izing consonant clusters in syllable codas. CVC syllables, however, are rela-
tively unmarked, being constructed of simple syllable constituents only.
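The two constraints can be illustrated with a minimal sketch. The function name is hypothetical, and the onset/coda split assumes a monosyllabic skeleton with exactly one V slot, a simplification of full syllabification:

```python
def markedness_violations(skeleton):
    """Violations of *Complex-Onset and *Complex-Coda for a monosyllabic
    CV skeleton such as 'CCVCC' (exactly one V slot assumed)."""
    v = skeleton.index("V")
    onset, coda = skeleton[:v], skeleton[v + 1:]
    return {"*Complex-Onset": int(len(onset) > 1),
            "*Complex-Coda": int(len(coda) > 1)}

# Violation profiles for the four stimulus types used below
profiles = {sk: markedness_violations(sk)
            for sk in ["CVC", "CVCC", "CCVC", "CCVCC"]}
```

On this toy evaluation, CVC violates neither constraint, CVCC and CCVC each violate one, and CCVCC violates both, mirroring the markedness ordering assumed in the text.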
It has been noted that unmarked structures are acquired earlier than marked
structures (e.g., Jakobson, 1969; Smolensky, 1996). This has also been found
to be the case with syllable structure. Dutch children start producing CV
and CVC syllables before CVCC and CCVC syllables. CCVCC seems to be
even harder to acquire, possibly due to a conjoined constraint *Complex-
Onset&*Complex-Coda (Levelt & Vijver, 1998/2004). This means that
words with a CCVCC structure are intrinsically more marked than words
with a CVC structure, even if complex constituents are phonotactically legal,
as in languages such as English or Dutch. When learners are faced with the
task of learning a word, they need to make reference to their phonological
grammar, which informs them about whether the form violates or satises
markedness constraints. When a word of a simple CVC structure is acquired,
markedness constraints will hardly be violated. The more complex a structure,
the more relevant markedness constraints will be for the processing system.
The reference to the phonological grammar when acquiring a word should
furthermore have an effect on the inuence of probabilistic phonotactics.
The aforementioned studies on the role of sub-lexical representations in
lexical acquisition have not only neglected the fact that sub-lexical processing
between words containing biphones with low versus high phonotactic proba-
bility should be larger for words containing structurally complex syllable
constituents (such as complex onsets or complex codas) than for nonwords
containing structurally simple syllable constituents (such as singleton onsets
and codas).
The hypothesis was tested in two STM experiments with adult native
speakers of Dutch. Dutch allows for complex syllable onsets and syllable
codas. Yet the markedness interpretation of structural complexity predicts that
complex onsets and complex codas should be less well-formed than simple
onsets and codas. I used a probed STM recognition task (Sternberg, 1966),
which has the advantage that no articulatory motor programs are co-activated.
This is a change from previous studies on probabilistic phonotactics in lexical
acquisition, which mostly used production tasks (e.g., Gathercole et al., 1999;
Roodenrys & Hinton, 2002; Storkel et al., 2006; Thorn & Frankish, 2005). A
study (Storkel, 2001) that used perception-based tasks revealed facilitatory
effects of high phonotactic probability to the same extent as a production-
oriented task.
The prediction was that with nonword stimuli doubly manipulated for both
phonotactic probability and syllabic complexity, phonotactic probability would
affect recognition performance such that high biphone frequency would
facilitate nonword recognition, but only, or more strongly so, in the case of
complex syllables.
2. Experiment 1
2.1. Method
2.1.1. Participants
Participants were 30 native speakers of Dutch without self-reported hearing
disorders. All were drawn from the Utrecht Institute of Linguistics participant
pool and compensated for participation.
2.1.2. Materials
All stimuli were nonwords that are phonotactically legal in Dutch. That is, all
phonemes are part of the Dutch phoneme inventory and all biphones are licit
sequences in the syllable positions in which they occur. The stimuli were
manipulated for two factors. One was syllable structure type, a factor of four
levels (CVC, CVCC, CCVC, and CCVCC). The second was biphone fre-
quency, a factor of two levels (high versus low biphone frequency). Biphone
The target items were created such that they would only minimally differ
from each other, such as the low biphone probability nonwords /lum/, /lump/,
and /xlump/, or the high biphone probability nonwords /vo:k/ and /vo:kt/.
In this way, interference of singleton frequency effects, which are known
to influence phonotactic well-formedness (e.g., Bailey & Hahn, 2001), was
minimized.
The CVC, CCVC, CVCC and CCVCC filler items used in this experiment
were randomly selected from a list of Dutch nonwords. Each filler item occurred
only once throughout the experiment. The stimuli were spoken in a sound-proof
booth by a Dutch native speaker, who was naïve as to the purpose of the study.
Two Dutch native speakers confirmed that the stimuli sounded natural. A list
of all target items is given in Appendix A.
2.1.3. Procedure
Participants were tested in a probed recognition task (Sternberg, 1966), in
which they were presented a series of four nonwords followed by a probe.
Probabilistic phonotactics in lexical acquisition 265
The task was to decide whether the probe was in the series or not. Each series
contained one target and three filler items. The series were designed such that
every syllable type occurred once in each trial. Every series thus contained the
same number of segments. An example is given in Table 2.
Table 2. Examples of both a target and a filler series used in Experiments 1 and 2a.
The experiment consisted of 184 target series. The design had two factors
(biphone frequency, syllable structure) with 2 × 4 levels (high/low; CVC,
CVCC, CCVC, CCVCC); accordingly, the 184 targets divide into 23 targets
of each type. In addition, 184 filler series, in which the probe did not match
any of the prior four fillers, were included. All series and all items within the
series were randomized for every participant in order to avoid item- or series-
specific order effects.
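The series design described above, one target plus one filler of each remaining syllable type in random order, can be sketched as follows (function and item names are hypothetical; this is not the software used in the study):

```python
import random

SYLLABLE_TYPES = ["CVC", "CVCC", "CCVC", "CCVCC"]

def build_series(target, target_type, fillers_by_type, rng):
    """One trial: the target plus one filler of each remaining syllable
    type, shuffled so that every syllable type occurs exactly once."""
    items = [(target, target_type)]
    for syl_type in SYLLABLE_TYPES:
        if syl_type != target_type:
            items.append((fillers_by_type[syl_type].pop(), syl_type))
    rng.shuffle(items)
    return items

rng = random.Random(0)  # fixed seed only to make the example reproducible
fillers = {t: [f"filler_{t}_{i}" for i in range(3)] for t in SYLLABLE_TYPES}
series = build_series("vo:kt", "CVCC", fillers, rng)
```

Because each series contains one item of each syllable type, every series also contains the same number of segments, as the text notes.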
Participants were tested individually in a soundproof booth facing a
computer screen. The stimuli were played by computer over headphones at a
comfortable volume. The stimuli were presented with a relatively long inter-
stimulus-interval (ISI) of 700 ms, and the probe was presented 1400 ms after
the last item of the stimulus series. This was done to add a high memory load
to the task to invoke reference to more abstract phonological representations
(e.g., Werker & Logan, 1985).
Yes/No-decisions were made on a button-box. After the end of each series,
an icon appeared on the screen indicating the beginning of a new trial. The
dependent measure was reaction time (RT). When no decision was made after
3000 ms, the trial was stopped and counted as an error. The experiment took
90 minutes. Three breaks were included. There were three practice trials
before the experiment started.
2.2. Results
2.2.1. Reaction times
A linear mixed regression model with RT as the dependent variable, Participants
and Targets as random factors, and Biphone frequency (high/low), Syllable
structure (CVC, CCVC, CVCC, CCVCC) and Biphone frequency * Syllable
structure as fixed factors revealed significant differences between nonwords
of different syllable structures as well as interaction effects between syllable
structure and biphone frequency, but no significant differences between high
and low biphone frequency nonwords (see Appendix B).
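The structure of such a model can be sketched with statsmodels on synthetic trial-level data (column names and effect sizes below are invented for illustration). Note one simplification: statsmodels' MixedLM fits a single grouping factor, so only the by-subject random intercept is shown, whereas the analysis reported here also crossed random item effects:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
structures = ["CVC", "CVCC", "CCVC", "CCVCC"]
rows = [
    {"subject": s, "biphone_freq": bf, "syll_structure": st,
     # synthetic RTs, purely illustrative: low-frequency and more
     # complex items are made slower, plus Gaussian noise
     "rt": 1100 + 20 * structures.index(st)
           + (50 if bf == "low" else 0)
           + rng.normal(0, 30)}
    for s in range(6)
    for bf in ["high", "low"]
    for st in structures
    for _ in range(5)
]
df = pd.DataFrame(rows)

# Fixed effects: biphone frequency, syllable structure, and their
# interaction; random intercepts by subject.
fit = smf.mixedlm("rt ~ biphone_freq * syll_structure",
                  df, groups=df["subject"]).fit()
```

The interaction terms in `fit.params` correspond to the Biphone frequency * Syllable structure effects discussed in the text.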
Table 3. Estimated reaction time averages and the differences (Δ) between reaction
time means for high versus low biphone frequency in Experiment 1, in ms,
measured from the target onset.

Syllable structure    High       Low        Δ        Total M
CVC                   1117.82    1103.75    14.70    1110.79
CVCC                  1114.42    1172.82    58.40    1143.62
CCVC                  1107.68    1182.36    74.68    1145.02
CCVCC                 1133.08    1221.40    88.32    1177.24
Total M               1118.25    1170.08    51.68    1144.17
An analysis of the estimates of the fixed effects with low biphone frequency
CVC as a reference point revealed a significant interaction of biphone fre-
quency and syllable structure between the simple syllable structure CVC and
the more complex syllable structures CCVC and CCVCC (see Appendix C).
That is, compared to high and low biphone frequency CVC nonwords, partic-
ipants were significantly slower at recalling CCVC and CCVCC nonwords of
low biphone frequency than CCVC and CCVCC nonwords of high biphone
frequency (see Figure 1). The averages of the estimated RT means are given
in Table 3.
2.2.2. Exploration
The results in Experiment 1 may, however, be due to another factor, which we
had not controlled for: high biphone frequency words are generally spoken
faster than low biphone frequency words (Kuperman, Ernestus, & Baayen,
2008). In order to check whether such a difference might account for our
results, we measured the duration of our target items. We did not find that our
speaker had pronounced all high biphone frequency target items in a shorter
time than the low biphone frequency target items. However, we observed that
the spoken durations of each item type overall matched the respective reaction
times. So, both durations and reaction times were longer for low rather than
for high biphone frequency CCVC and CCVCC nonwords. For CVC non-
words, this effect was reversed (compare Figure 1 and Figure 2).
This is problematic, as speech-rate is known to affect recall latencies: the
longer the duration of an item, the more difficult it is to recall (e.g., Baddeley,
Thomson, & Buchanan, 1975; Cowan, Wood, Nugent, & Treisman, 1997).
Hence, the faster RTs on high biphone frequency items may be due to the
fact that they were shorter in duration. We added speech-rate as a co-variate
to the analysis, and found the effects to remain signicant.
Figure 2. Duration means and SDs in ms of the high versus low biphone frequency
target items for each syllable structure in Experiment 1.
2.3. Discussion
It was predicted that, when holding CVC items in the phonological loop, there
should be little support from sub-lexical LTM representations, as CV and VC
biphones are hardly restricted by structural constraints. CC biphones, on the
contrary, are much more restricted. Hence, reference to sub-lexical LTM
representations with representations of specific biphones making reference to the
phonological grammar should be important in preventing complex CVCC,
CCVC, or CCVCC items from decaying in the phonological loop. Hence, it
was predicted that effects of biphone frequency on STM recognition perfor-
mance would increase with increasing syllable complexity. This effect should
occur while lexical neighborhood frequency is controlled for, to make sure
that effects relate to sub-lexical rather than lexical knowledge.
As displayed in Figure 1, the result is as predicted: The differences in
recognition performance between high and low biphone frequency in inter-
action with syllable structure increased from simple to complex structures.
3. Experiment 2
3.1. Method
Experiment 2 aimed at replicating the results of Experiment 1 while control-
ling for speech-rate. The experiment was carried out in two conditions: Con-
dition 1 repeated Experiment 1 using exactly the same stimuli, but controlled
for speech-rate. Condition 2 only used the CVC and CCVCC nonwords of
Condition 1. CVCC and CCVC were excluded because the interaction in
Experiment 1 did not occur between CVCC and CVC and was therefore also
not expected to occur here. Furthermore, the interaction was strongest between
CVC and CCVCC nonwords and our hypothesis can also be tested using two
syllable structure types only.
3.1.1. Participants
Sixty native speakers of Dutch without self-reported hearing disorders, all
drawn from the Utrecht Institute of Linguistics participant pool and none of
whom had participated in Experiment 1, participated in the experiment.
They were compensated for their participation.
3.1.2. Materials
The stimuli were identical to those used in Experiment 1, with two differences:
First, the target items were controlled for duration such that for each class of
syllable structure the duration of the stimuli did not differ between high and low
frequency biphone nonwords (see Figure 3). This was realized by manually
adjusting the vowel durations, which produces more natural results than
adjusting the duration of the nonwords as a whole, as durational variation in
natural speech usually affects vowels more than consonants (e.g., Greenberg,
Carvey, Hitchcock, & Chang, 2003). Manipulating vowel length should not
have caused perceptual confusion since long and short vowels contrast in
terms of quality (F1, F2), making them distinguishable. Finally, it was of utmost
importance to maintain the naturalness of the onset and coda constituents, which
are the focus of this study. Using the software Praat (Boersma & Weenink,
2007), manipulations were carried out on both sides: stimuli with long durations
were shortened, and stimuli with short durations were lengthened. To ensure
that they would not alter the vowel quality, manipulations were carried out
only in the middle of the vowel. For shortenings, a portion in the middle was
cut out, and for lengthenings, the waves of the middle were copy-pasted. Two
native speakers of Dutch confirmed that the stimuli sounded natural.
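The mid-vowel cut and copy-paste procedure can be sketched on a sample array (a simplified stand-in for the Praat manipulation; in practice splices are made at zero crossings or over whole pitch periods to avoid clicks, which this sketch ignores):

```python
import numpy as np

def adjust_vowel_duration(samples, v_start, v_end, delta):
    """Shorten (delta < 0) or lengthen (delta > 0) a vowel by removing
    or duplicating samples around the vowel midpoint, leaving the onset
    and coda consonants untouched."""
    mid = (v_start + v_end) // 2
    if delta < 0:
        cut = -delta
        # cut a portion out of the middle of the vowel
        return np.concatenate([samples[:mid - cut // 2],
                               samples[mid + (cut - cut // 2):]])
    # copy-paste the waves of the middle of the vowel
    chunk = samples[mid - delta // 2: mid + (delta - delta // 2)]
    return np.concatenate([samples[:mid], chunk, samples[mid:]])

signal = np.arange(1000)                     # stand-in mono waveform
shortened = adjust_vowel_duration(signal, 300, 700, -100)
lengthened = adjust_vowel_duration(signal, 300, 700, 100)
```

Editing only the vowel midpoint keeps the onset and coda, the constituents under study, acoustically intact, which is the design choice the text motivates.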
Figure 3. Duration means and SDs in ms of the high versus low biphone frequency
target items for each syllable structure in Experiment 2 after controls.
3.1.3. Procedure
There were two conditions for this experiment. Thirty participants were
assigned to Experiment 2a, and thirty participants were assigned to Experi-
ment 2b.
3.2. Results
A linear mixed regression model was calculated. The dependent variable was
RT. Participants and targets were random factors. The fixed factors were
Experimental condition (2a/2b), Biphone frequency (high/low), Syllable struc-
ture (CVC, CCVC, CVCC, CCVCC), and interactions of Biphone frequency *
Syllable structure, Biphone frequency * Experimental condition, Experimental
condition * Syllable structure, and Biphone frequency * Syllable structure *
Experimental condition.
A linear mixed model revealed a significant main effect of condition (F(1,
5.798) = 5.884, p < 0.05), with items overall being recognized 129.78 ms
faster in Experiment 2b than in 2a (SE = 53.69). This difference can be
accounted for by a greater decay of attention in Experiment 2a, since the
4. As Experiment 2a and 2b test the same predictions, the data were pooled to
increase power and to minimize the number of models tested. Separate analyses
of Experiment 2a and 2b did not reveal all predicted effects.
Figure 4. Averages of the estimated means and 95% confidence intervals of the
reaction times in ms for CVC and CCVCC syllable structure nonwords for
high (H) and low (L) biphone frequency.
Appendix D) with CCVCC nonwords being recognized more slowly than CVC
nonwords (see Table 5). Finally, there was a significant interaction between
biphone frequency and syllable structure (see Appendix D). More precisely,
the difference in RTs for high versus low biphone frequency nonwords was
significantly larger among complex CCVCC nonwords than among simple
CVC items (see Figure 4).
3.3. Discussion
As predicted, participants performed significantly better in recognizing high
rather than low biphone frequency nonwords. The effect of biphone frequency
interacts with syllable structure. The difference in RTs between high and low
biphone frequency is larger among complex CCVCC nonwords than among
simple CVC nonwords (see Figure 4), indicating that the effect of probabilistic
phonotactics increases with increasing syllable complexity. As opposed to
Experiment 1, here, biphone frequency affected recognition latencies of CVC
nonwords. On the one hand, this may be caused by the fact that items were
controlled for speech rate. On the other hand, in Experiment 2 the ISIs were
shorter than in Experiment 1, which may have elicited more low-level process-
ing than the longer ISIs in Experiment 1.
4. General discussion
The results of the two experiments in this study indicate that, as hypothesized,
sub-lexically represented knowledge affects phonological memory. Crucially,
the sub-lexical representations that are used for redintegration are twofold,
with low-level probabilistic phonotactics on the one hand, and structural con-
straints as part of a phonological grammar on the other. The interaction of
these two components, i.e., growing effects of phonotactic probability with
increasing structural complexity, indicates that sub-lexical LTM knowledge is
particularly important when rehearsing phonologically complex word forms.
Less sub-lexical LTM knowledge is at play when simple CVC nonwords are
rehearsed in the phonological loop. These results suggest that when processing
hardly restricted CV and VC biphones, listeners make reference to low-level
phonotactic probability knowledge, which, however, does not necessarily need
feedback from a phonological grammar, as is the case when structurally
more restricted CC biphones are processed. With respect to phonological
theory, this study supports the view that the effects of phonological grammar
are not only categorical. In our experiments, all nonwords were made up of
legal structures. They only differed in terms of the probability of biphones.
Hence, the binary grammar distinction between legal and illegal cannot be
the ultimate account. Furthermore, knowledge of phonological grammar seems
to modulate the processing of categorically legal forms depending on their
probability (e.g., Albright, 2009; Boersma & Hayes, 2001; Coetzee, 2008;
Hayes, 2000; Pierrehumbert, 2003).
Future studies may want to investigate whether the additive effect of struc-
tural complexity in low biphone frequency items necessarily relates to two
representational components with probabilities on the one hand and marked-
ness constraints on the other, or whether all effects may be accounted for by
either a grammar or by probabilities.
The result of this study has indirect implications for theories of lexical
acquisition. Factors that influence performance in STM nonword recall tasks
have been suggested to similarly constrain lexical acquisition. Among these
factors is, for example, the mechanism of drawing on LTM representations
such as sub-lexical knowledge. LTM knowledge aids when holding a novel
word form in short-term memory. Similarly, it helps to keep a novel word in
the LTM storage when it has to be remembered for a long time, i.e., when it
has to be acquired. Such conclusions are supported by the fact that perfor-
mance in STM tasks has often been found to be correlated with lexicon size
and lexical development over time (see Gathercole, 2006 for an overview).
The finding that both phonotactic probability and structural knowledge
affect recognition memory thus indicates that each of these two sub-lexical
components may be involved in facilitating lexical acquisition. As lexical
neighborhood density was controlled for, the result must be attributed to
effects of a sub-lexical rather than the lexical level. Thus, the results are con-
sistent with the hypothesis that the dependence between phonotactics and
lexical acquisition is not only unidirectional, with the lexicon informing the
phonological grammar, as is assumed by most phonologists (e.g., Hayes &
Wilson, 2008; Pierrehumbert, 2003). Instead, two interacting sub-lexical knowl-
edge components may play a role in lexical acquisition, in particular when
complex word forms are remembered. This implies a bidirectional dependence.
Considering that phonotactic knowledge is represented at a sub-lexical
level raises the question of how these sub-lexical representations are acquired.
Most studies assume that sub-lexical representations emerge as abstractions
over lexical representations. Pierrehumbert (2003), for example, assumes that
at first the sub-lexical level only contains phonetically detailed representations.
These phonetically detailed representations are used to create lexical represen-
tations. Later, phonotactic knowledge emerges as abstractions over the lexicon.
An alternative view is that phonotactics is acquired bottom-up from speech
(e.g., Adriaans & Kager, 2010; Boll-Avetisyan et al., submitted). For a large
part, the source of lexical acquisition might be continuous speech rather than
isolated words (e.g., Christophe, Dupoux, Bertoncini, & Mehler, 1993). The
advantage of a bottom-up acquisition of phonotactics is that sub-lexical
representations could facilitate lexical acquisition from the start, when the first
words are acquired. The current study cannot provide an ultimate answer to
this question, as here effects of sub-lexical probabilistic and grammar knowl-
edge were tested on nonwords presented in isolation. It would be interesting
for future studies to test whether or not prosodic structure influences the
acquisition of words from continuous speech.
We want to draw attention to the necessity of controlling for speech-rate
in studies that test effects of probabilistic phonotactics on processing. The
need for controlling for speech rate has also been discussed by Lipinsky and
Gupta (2005). They pointed out the relevance of the problem by demon-
strating that the effects of probabilistic phonotactics on processing found by
Vitevitch and Luce (1999) are hard to replicate if speech-rate is controlled
(cf. Vitevitch & Luce, 2005). It is a non-trivial task to estimate the consequences
of the confound for the hypothesis, since words composed of high-frequency
biphones are intrinsically spoken faster (Kuperman et al., 2008). This means
that the two factors are correlated and thus difficult to disentangle under natural
conditions. However, a certain degree of naturalness may have to be sacriced
under experimental conditions if we want to ensure that predicted effects truly
relate to the manipulated factor. Therefore, future studies should take this
confound seriously and control their test stimuli for speech-rate.
Hypothetically, the results of the current study could also be due to a mere
interaction of phonotactic probability with word length determined by the total
number of phonemes rather than the structural difference between CVC and
5. Conclusion
Acknowledgments
I would like to thank René Kager for supervising this project. Furthermore, I
am grateful to Huub van den Bergh for statistical advice, Theo Veenker for
programming the experiment, Frits van Brenk for assistance with Experiment
2 and Mieneke Langberg for speaking the stimuli. This work has benefited
from discussions with Frans Adriaans, Alexis Dimitriadis, Tom Lentz, Johannes
Schliesser, Keren Shatzman and the audiences at TiN-Dag 2006 and CCSC
2008, as well as from comments by two anonymous reviewers. Thanks to
Bettina Gruber for proof-reading an earlier version of the manuscript. This
research was funded by an NWO grant (277-70-001) awarded to René Kager.
278 Natalie Boll-Avetisyan
References
Appendix A
Target Stimuli used in Experiment 1 and 2
High biphone frequency nonwords.
CVC: be:l, bx, de:f, de:k, fo:m, fo:t, xa:k, hs, ja:t, kx, la:m, me:f, ml,
ne:k, ra:l, ra:n, rn, rf, ro:n, s, tx, t, vo:k
CVCC: be:ls, be:rk, bxt, de:ft, de:ks, xa:kt, hst, ja:rt, krk, li:nt, li:ts, me:ft,
mls, ne:ks, rls, rns, rxt, sk, srt, txt, tkt, trm, vo:kt
CCVC: bls, brx, brl, brn, bro:n, dre:k, fro:m, fro:n, fro:t, xra:k, xro:n,
klx, klr, krx, krf, pra:n, prn, sla:m, sla:r, tra:l, tr, trn, twl
CCVCC: blst, ble:rk, brlt, brnt, dre:ks, frls, frns, fro:ns, xra:kt, klrm, kli:nt,
krxt, pli:ts, prk, sla:rs, stk, strt, tra:ls, trk, trxt, trrm, twkt, twlt
Low biphone frequency nonwords.
CVC: br, dyl, hyl, ki:, kux, kym, lux, lum, lt, mp, myt, ryk, sum, sur, tr,
v, wi:, wur, za, zus, zp, zyx, zyl
CVCC: brx, dylk, hylm, ki:s, kus, kymp, lump, lurx, lmp, lmt, mps,
nurn, sums, surp, trf, vi:t, vt, vrf, wumt, wut, wurn, zat, zylm
CCVC: dwi:, dwu, dwur, dw, dwyw, ux, xlun, xlt, knux, knm, kn,
knp, kwi:, smyx, smyt, snum, vlum, vlu, wryk, zw, zwp, zwyx, zwyl
CCVCC: dwi:t, dwumt, dwut, dwurx, dwt, dwrf, dwywt, xlump, xlut,
xlmp, knms, kns, knps, snump, snurn, snurp, vlut, vlmt, xlps, vlyms,
vlps, dwyt, xlumt
Appendix B
Multilevel Model
Reaction times are nested both within individuals and within stimuli. Hence, a
multilevel model is appropriate for the analysis of the results. Failing to take
the different variance components into account results in an underestimation
of the variance, and hence the test statistics will be too optimistic (the
null hypothesis is rejected although the data in fact do not support this conclusion).
In order to test the hypothesis we define several dummy variables,
which are turned on if a response is observed in the respective condition and
turned off otherwise. Let Y(ij) be the response to item i (i = 1, 2, . . . , I)
of individual j (j = 1, 2, . . . , J), and let High biphone frequency(ij), CCVC(ij),
CVCC(ij) and CCVCC(ij) be dummy variables indicating whether the item is
a High biphone frequency nonword and whether it has CCVC, CVCC or CCVCC structure,
respectively. The interaction between biphone frequency and syllable type can
be estimated by defining combinations of the biphone frequency dummy and
the syllable type dummies. Analogous to analysis of variance, a saturated model
with both main and interaction effects can be written as:

Y(ij) = Constant + β1 High(ij) + β2 CCVC(ij) + β3 CVCC(ij) + β4 CCVCC(ij) + β5 High(ij)·CCVC(ij) + β6 High(ij)·CVCC(ij) + β7 High(ij)·CCVCC(ij) + [u0j + ui0 + e(ij)]
The model above consists of two parts: a fixed and a random part (between
square brackets). In the fixed part the constant (i.e. the intercept) represents the
mean of Low biphone frequency CVC items, and the other effects represent
deviations from this average. So, the reaction time to High biphone frequency
CVC items is (Constant + β1), the average of Low biphone frequency CCVC items
is (Constant + β2), etc.
In the random part three residual scores are defined: e(ij), ui0 and u0j. The
last term (u0j) represents the deviation of the average reaction time of individual
j from the grand mean, ui0 represents the deviation of item i from the
grand mean, and finally e(ij) indicates the residual score of individual j on
item i. We assume that these residual scores are normally distributed with an
expected value of 0.0 and variances of σ²e, σ²ui and σ²uj, respectively.
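The mapping from dummy variables to cell means described in this appendix can be made concrete with a few lines of code. In the sketch below the coefficient values are invented for illustration only; they are not the estimates reported in Appendices C and D:

```python
# Hypothetical fixed-effect coefficients for the saturated model
# RT = Constant + b_high*High + b_struct[type] + b_interaction[type]*High.
# All values are invented for illustration.
CONSTANT = 1000.0
B_HIGH = -30.0
B_STRUCT = {"CVC": 0.0, "CVCC": 60.0, "CCVC": 70.0, "CCVCC": 100.0}
B_INTERACTION = {"CVC": 0.0, "CVCC": 40.0, "CCVC": 10.0, "CCVCC": 50.0}

def predicted_mean(high, structure):
    """Fixed part of the model: the intercept is the Low biphone
    frequency CVC cell; all other effects are deviations from it."""
    rt = CONSTANT + B_STRUCT[structure]
    if high:
        rt += B_HIGH + B_INTERACTION[structure]
    return rt

print(predicted_mean(False, "CVC"))   # the intercept: 1000.0
print(predicted_mean(True, "CCVC"))   # 1000 - 30 + 70 + 10 = 1050.0
```

The random part (per-participant, per-item and residual deviations) would be added on top of these cell means when simulating or fitting actual reaction-time data.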
Appendix C
Results of Experiment 1
Estimates of Fixed Effects
Parameter Estimate Standard Error Significance
Intercept (low biphone frequency CVC) 1103.75 44.58 .000
high biphone frequency 14.07 28.84 ns
CCVCC 117.65 28.93 .000
CCVC 78.61 28.91 .007
CVCC 69.07 28.87 .018
high * CCVCC 102.39 40.79 .013
high * CCVC 88.75 40.79 .031
high * CVCC 72.47 40.77 .077
Estimates of Random Parameters
Parameter Estimate Standard Error Significance
σ²ui (items) 6,926.49 1,019.25 <0.001
σ²uj (participants) 47,118.08 12,486.48 <0.001
σ²e (residual) 72,772.29 1,473.28 <0.001
Appendix D
Results of Experiment 2
Estimates of Fixed Effects
Parameter Estimate Standard Error Significance
Intercept (low biphone frequency CVC) 1038.53 29.20 .000
high biphone frequency 38.10 15.17 .012
CCVCC 75.67 15.27 .000
CCVC 79.61 20.19 .000
CVCC 99.09 21.44 .000
high biphone frequency * CCVCC 48.17 20.60 .019
high biphone frequency * CCVC 6.04 28.48 ns
high biphone frequency * CVCC 40.29 29.65 ns
Abstract
The notion of complexity is a central issue in phonology. In acquisition studies as
well as in formal analyses, consonant clusters are widely considered to be an area of
particular complexity. Based on the idea that complex areas might be revealed by production
errors and a later age of acquisition in speakers with more fragile phonological
representations, the present study analyzes consonant productions of children and
adolescents with Specific Language Impairment (SLI). The productions of children
with SLI are compared to those of French typically-developing children with the
aim of gaining a better understanding of the causes and the origin of the difficulties of
the former. Our approach assumes that production data reflect the development of
children's phonological competence, in particular involving issues of syllable structure
complexity. Even though they are not unrelated, phonetic effects on phonological development
are left aside in the present contribution. Our hypothesis is that consonant
clusters are phonologically complex at the syllabic level, and therefore create problems
for speakers with SLI. Our results provide support for this hypothesis and show that
some syllabic positions emphasize the complexity created by consonant clusters.
A large number of studies have addressed this question, yet there has been no
unanimous answer to date. Several ways of computing complexity have been
proposed at each level of phonological analysis, in particular with respect to
the internal structure of segments on the phonemic level (for a recent review,
see Pellegrino 2009). Within the framework of the theory of elements, government
phonology (Kaye, Lowenstamm and Vergnaud 1990), for example,
provides a metric of phonemic complexity based on the number of elements
that constitute segments. Ever since Trubetzkoy (1931), complexity has furthermore
frequently been associated with the notion of markedness (Hayes and Steriade
2004, among others), while from a more phonetic point of view, Lindblom
and Maddieson (1988) propose a classification of consonantal systems based
on their articulatory complexity (e.g. the use of a secondary articulation)
286 Sandrine Ferré, Laurice Tuller, Eva Sizaret and Marie-Anne Barthez
into three categories: simple, elaborated and complex systems. Still other
approaches base phonological complexity on the frequency of occurrence of
a phoneme (Zipf 1935).
Besides answers grounded in theoretical considerations, it is equally possible
to find arguments for defining complexity in acquisition data. Assuming that
the linguistic system of the child grows in complexity, or rather becomes
more complete when developing ("[. . .] and then reorganizing the system to
encompass more data, resulting in more complex structure or more complete
or accurate representation", Vihman 1996: 4), complexity could be revealed in
terms of age of acquisition. The later a phoneme is acquired, the more complex
it presumably would be (Winitz 1969; Ingram 1989; Gierut et al. 1996).
However, this hypothesis raises further questions, in particular about the exact
definition of the age of acquisition of a phoneme. When is a phoneme really
acquired? Many acquisition studies have shown that the position of a segment
in the word plays a major role in the acquisition of phonemes: for example, a
phoneme is acquired earlier in word-initial position than in word-final position
(see Kirk and Demuth (2005) for English; Lleó and Prinz (1996) for German;
or Demuth and Kehoe (2006) for French). Consequently, which criteria should
be used to determine the age of acquisition of a phoneme, and its degree of
phonological complexity?
Phonological complexity is not only inherent in segments, but also depends
on the syllabic structure with which individual phonemes are associated. In this
vein, Cyran (2003) proposed that phonological complexity can occur simultaneously
on two levels of structure: a complex syllable can contain complex
segments.
The definition of syllabic complexity is somewhat more consensual and is
mainly based on the number of constitutive elements of syllabic constituents,
in particular on the number of consonants at the beginning and at the end of
the syllable (e.g. Maddieson 2006). Thus, a CCVC syllable is considered to be
more complex than a CV syllable.
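Under this count-based view, syllabic complexity can be operationalized directly on a CV skeleton. A minimal sketch (the skeleton notation and the function are illustrative assumptions, not a proposal from this chapter):

```python
def syllable_complexity(skeleton):
    """Count the consonants in onset and coda of a single-syllable
    CV skeleton such as 'CCVC'; more consonants = more complex."""
    v = skeleton.index("V")          # position of the nucleus
    onset = skeleton[:v].count("C")  # consonants before the vowel
    coda = skeleton[v + 1:].count("C")  # consonants after the vowel
    return onset + coda

print(syllable_complexity("CV"))    # 1
print(syllable_complexity("CCVC"))  # 3
```

This metric deliberately ignores segmental content and association constraints, which is precisely the limitation the following paragraphs address.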
Crucial within this context is the definition of the syllable itself. Different
syllabic frameworks place greater or fewer constraints on the associations
between segments and syllabic constituents. Taking the example of the coda,
proposals range from the complete absence of a coda constituent (Lowenstamm
1996), through syllabic models that impose certain conditions on which
segments can be associated with the coda (Angoujard 1997), to segmental
association to this constituent without restrictions (Blevins 1995).
Given this definition of the syllable, the degree of syllabic complexity depends
strongly on the constraints of association. Moreover, additional variables, such
Acquiring and avoiding phonological complexity in SLI 287
as stress, can influence the quality of the syllable, and therefore its complex
structure.
1. Note here that /ʁ/ in French is not syllabic and should be associated with a consonantal
syllabic position.
Figure 2. Association to position 3: Italian /fato/ 'destiny'; French /paʁ/ 'part'; Italian
/fattore/ 'factor'.
Due to their sonority, some consonants have a special status in this framework.
Sonorants, glides and the alveolar fricative /s/ are indeed the only
consonants that can be associated to position 3 and be the second
part of a branching constituent. As noted previously, these consonants are the
constitutive elements of consonant clusters in French. When associated to
these specific syllabic constituents (specific in the sense that they are not
permitted in all languages), these sounds increase the level of phonological
complexity. Thus, in order to be able to produce /stʁi/, various constraints first
have to be acquired: a constraint that allows /s/ to be linked to the third position
of an empty grid2, and a constraint that allows the first position to branch
with the sonorant /ʁ/ (see figure 5).
syllabic structures. In the present study, we will look at how these structures
fare, on the one hand, in speakers whose phonology is still emerging (with
the goal of identifying which structures are acquired later, and thus of finding
evidence for the link between phonological complexity and age of acquisition),
and, on the other hand, in speakers for whom phonology is a fragile domain,
and who thus ought to be particularly sensitive to phonological complexity.
We believe that, beyond the consonant cluster itself, production problems
result from the ambiguity created by the increase in the number of possible
associations between segments and syllable structure. Our specific goal is to
provide answers to the following three questions: Does an increase in the
number of consonants in a cluster increase phonological complexity? Do
children with SLI have difficulties overcoming this increasing complexity?
Do syllabic positions play a role in the difficulty of processing consonant
clusters? In a second step, we aim to test the ability of the rhythmic grid proposed
by Angoujard (1997) to account for the results.
3. Method
3.1. Material
An experimental test, the Syllabic Structure & Segments (SSS) Test, was
designed to investigate the acquisition of consonant clusters in typical and
atypical speech development. The test is a repetition task and an adaptation
for French of the Test of Phonological Structure (TOPhS) created by van der
Lely and Harris (1999). The aim was to replicate for French the results
obtained with this test for English-speaking children with SLI (Marshall,
Harris and van der Lely 2003). As the phonological structure of French is
very different from that of English, the SSS Test has different properties
than the TOPhS.
The SSS Test targets the production of specific segments: the sonorants /ʁ/ and /l/,
the glides /j, w, ɥ/ and the voiceless alveolar fricative /s/. These segments are
tested in word-initial, medial and final positions, in one- to three-consonant
sequences. This linear typology was chosen in order to avoid the influence
of a specific syllabic model on the construction of the test.
Both words and non-words are used in the SSS Test in order to evaluate
whether, and to what extent, children rely on the lexicon. It is well known
that children with SLI in particular tend to seek support in their lexical knowledge
to compensate for their deficit (Marshall et al. 2002; Maillart and Parisse
2006). Thus, using non-words in a repetition task is a way of testing phonological
structures without any interference from lexical skills.
Moreover, the stimuli (real and non-words) were no longer than two syllables,
to restrict any potential impact of working memory on the repetition task,
as participants with SLI are known to show difficulties in processing long
words (Gathercole and Baddeley 1990). Finally, only obstruents were used in
the constitution of non-words and in the choice of real words, to avoid any
influence of the late acquisition of fricatives. All items begin with a consonant
and contain one to two vowels. In non-words, the vowel /a/ was primarily
used. Some non-words also contain /e/ or /i/, given the phonotactic constraints
on the glide /ɥ/ in French (Pourin 2003).
It was not possible to cross all variables, due to the phonotactic constraints
of French, especially in the constitution of clusters with /w/ and /ɥ/. For example,
these glides never occur in final two- or three-consonant clusters: */pak/,
*/paskw/, or even in final single position for /ɥ/: */paɥ/. Note also that
level 2 of the cluster size always consists of an obstruent + target consonant
sequence (/patʁi/), and never of a target consonant + obstruent sequence
(/paʁti/). The latter sequence is only tested in final position (level 1+ /paʁt/
vs. level 2 /patʁ/).
Table 2. Summary of the variables used in the construction of the SSS Test.

Word position      Initial          Intervocalic     Final
Size of cluster    1, 2, 3          1, 2, 3          1, 1+, 2, 3
Target consonant   ʁ, l, j, w, ɥ    ʁ, l, j, w, ɥ    ʁ, l, j, w
Type of item       real word,       real word,       real word,
                   non-word         non-word         non-word
In this way, 96 test items were created. Some examples of test items are
given in table 3.
3.2. Participants
28 children with SLI3 and 30 typically-developing children participated in the
experiment. Participants with SLI were divided into two groups. The SLI-7-
10 group consisted of 9 children aged 7 to 10 (M = 8;9, SD = 1;5). The SLI-
11-16 group consisted of 19 adolescents aged 11 to 16 (M = 12;5, SD = 1;2).
All SLI participants were diagnosed between ages 6 and 9 in the same neuropediatric
service at the university hospital in Tours. To ensure the specific
nature of their linguistic problem, an audiogram had attested normal hearing
and a neuropsychological evaluation had tested the level of non-verbal
skills (all subjects showed a Performance IQ > 85). The clinical evaluation
led to a pedopsychiatric consultation if deemed necessary. The severity of
the linguistic problem was classified according to a speech-language therapy
evaluation, with the pathology threshold set at −1.65 SD. Each language
domain was evaluated: articulation, phonological production, active lexicon
and syntactic production by means of the test Épreuves pour l'Examen du
Langage (EEL, Chevrie-Muller, Simon and Decante 1981), lexical comprehension
by means of the Test de Vocabulaire Actif et Passif (TVAP, Deltour
and Hupkens 1980), and morphosyntactic comprehension by means of the
Northwestern Syntax Screening Test (NSST, Lee 1971).
The division between the two SLI groups coincides with the division in the
French school system between primary and middle school, the latter beginning
at age 11. At this point in time major changes occur in the way language is
used at school, and thus in the way speech therapy is managed. This age also
corresponds to the entry into adolescence with prominent developmental
changes for the young speakers. Therefore children and adolescents were con-
sidered separately.
Two groups of typically-developing children were investigated: the TD-3
group included 14 typically-developing 3-year-olds (M = 3;3, SD = 0;5), and the
TD-4 group included 16 typically-developing 4-year-olds (M = 4;6, SD = 0;2).
We also analyzed results from two further control groups of children aged 7
and 11. These children all performed at ceiling in the SSS Test, and
will therefore not be discussed further.
The study compares two types of development: typically-developing children
and children with specific language impairment (SLI)3, each of which is
represented by two age sub-groups, TD-3, TD-4, SLI 7-10 and SLI 11-16,
respectively. The comparison seems to be justified by the fact that typical
phonological development of consonants ends at around the age of 5. Scores
of the SLI participants should not correlate with age, as their development is
assumed to have ended by the age of 7. In order to verify this assumption, two control
groups of children aged 7 and 11 were also tested and showed results at ceiling.
This suggests that the phonological development of children
with SLI more closely resembles that of younger children in whom phonological
development is still in progress, and their errors should therefore be
comparable to those of children aged 3 to 4. Furthermore, as typical phonological
development between ages 3;0 and 4;11 is exceedingly rapid, we deemed it
prudent to study these children in groups spanning no more than 12 months.
Based on these considerations, we therefore compare groups of participants
with SLI with a relatively wide age range to groups of TD participants with a
narrow age range.

3. Children with SLI show no intellectual, hearing, social or affective deficit, no brain
injury and no developmental disorder, but they show a strong deficit in verbal
capacities, significant in light of established standards for their age (Gérard 1993;
Leonard 1998).
A Shapiro-Wilk normality test was conducted in order to test the normal
distribution of our two main groups (TD and SLI) on 18 main variables,
including overall test success rate. The distribution is considered normal
if p > .05. For the SLI group the distribution was normal for half of the tested
variables, including success rate on the test (W = .95783, p = .30937), while
for the TD group the distribution was normal for only three variables, and
non-normal for success rate on the test (W = .80047, p = .00007). We therefore
decided to use non-parametric tests for the statistical analysis of
our results.
To verify the validity of our sub-groups based on developmental considerations,
we conducted a Kruskal-Wallis ANOVA. The Kruskal-Wallis test
showed a prominent group effect on the variable test success rate (H = 23.89,
p = .000). The effect was found on every tested variable except three, confirming
the importance of the sub-groups in the two types of development.
To further clarify these outcomes by showing an effect between the two
types of development (TD versus SLI), a Mann-Whitney test was conducted
on the variable development. Results showed a significant difference between
the TD and SLI groups (U = 294, Z = 1.963167, p = 0.04).
Finally, a Spearman R test was carried out to evaluate the correlation
between age and test success rate, and thus to validate our comparison of
single-year age ranges for the two groups of TD children with the wide
age ranges for children with SLI. Results showed no correlation between
test success rate and age in children with SLI (r = 0.205988), but a positive
correlation in the TD group (r = 0.617706). This implies that for the typically-
developing participants age is a determining factor for success rate, while, in
conformity with our hypotheses, age is not important for participants with SLI.
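The Spearman rank correlation used here to contrast the two developmental profiles can be computed in a few lines of pure Python. The sketch below uses invented toy scores (not the study's data) and the tie-free difference-of-ranks formula:

```python
def ranks(xs):
    """Rank values from 1..n (no tie handling, for illustration)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(xs, ys):
    """Spearman rank correlation via the formula
    rho = 1 - 6*sum(d^2) / (n*(n^2-1)), valid when there are no ties."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy data: success rate rises with age in a TD-like group ...
ages = [3.0, 3.3, 3.6, 4.0, 4.3, 4.6]
td_scores = [0.55, 0.60, 0.68, 0.75, 0.83, 0.90]
# ... but not in an SLI-like group.
sli_scores = [0.60, 0.58, 0.62, 0.57, 0.61, 0.59]

print(spearman_rho(ages, td_scores))   # 1.0 (perfectly monotone)
print(spearman_rho(ages, sli_scores))  # close to 0: no age effect
```

A monotone age-score relation yields rho near 1 (the TD-like pattern), while a flat relation yields rho near 0 (the SLI-like pattern), mirroring the contrast reported above.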
3.3. Procedure
The participants were tested individually in a quiet room and were asked to
repeat each item after the experimenter. The subjects were told at the beginning
of the session that some of the items were non-words, but knew that
they had to repeat every item regardless. In order to verify whether the task
had been understood correctly, five practice items were presented before the
start of the experiment. Each item was presented once, and in the same order
for all participants. Responses were recorded on a Zoom H4 audio recorder
and transcribed using broad phonetic transcription in IPA. Each transcription
was verified by another expert transcriber. The responses were coded in CHAT
format (CHILDES system, MacWhinney 2000). Each sample was hand-coded
by a first coder and subsequently verified by a second expert coder. Productions
were coded and extracted according to each variable targeted (i.e. position in the
word, target consonant, size of the cluster, and word vs. non-word) on a target
line in parallel with a produced line. Errors were coded on a specific error line.
Figure 8. Success rate in the production of single consonants in three positions: Initial
1 versus Medial 1 versus Final 1 (as % of total of correct production per
type of position for each group of speakers).
Figure 9. Success Rate for Initial and Intervocalic Position (as % of total of correct
production per type of position for each group of speakers).
Note that the standard deviation also increased with the number of
consonants, in particular for participants with SLI. Heterogeneity is a
well-known characteristic of groups with SLI and especially surfaces in complex areas.
Inter-group comparison shows that children aged 3 behave similarly to participants
with SLI, and no significant differences between them and the two
SLI groups could be found, whereas children aged 4 performed significantly
better than the SLI 7-10 group (p < 0.05 for Intervocalic 3 and p < 0.0001 for
Initial 3). Thus, speakers with SLI more closely resemble typically-developing
3-year-olds than typically-developing 4-year-olds when confronted
with the production of more complex consonant structures.
Figure 10. Success Rate for Final Clusters (as % of total of correct productions per
type of position for each group of speakers).
An interesting point to note concerns the productions of final clusters with the
sonorant preceding the obstruent (as in /paʁt/, Final 1+) as compared
to those in which the sonorant is in second position (as in /patʁ/, Final 2) by
the SLI 7-10 group. This group produced these two consonant sequences in
the same way (Final 1+, 69.8%, SD 38.7; Final 2, 68.5%, SD 31.7), without
any significant difference (Z = 1.2, p = 0.2). Note also that the standard deviations
show great heterogeneity in the way in which participants treated these
clusters. TD-3 produced clusters like /pat/ (86%, SD 17.7) better than clusters
like /pat/ (81%, SD 18.5) (Z = 1.09, p = 0.27). In contrast, adolescents
with SLI produced final clusters like /pat/ (86.5%, SD 21.5) significantly better
than those like /pat/ (75.4%, SD 21.8; Z = 2.2, p < 0.05).
These results support the hypothesis that the origin of phonological complexity
lies beyond the mere number of elements in the cluster. In both types
of sequence, the cluster consists of two consonants, a sonorant and an
obstruent, but the difference between the sequences is the order of the consonants
in the cluster, and (therefore) the way they are associated to the syllable
(figure 11).
Figure 12. Omission and substitution of target consonants (as % of the total of all
strategies used).
substitution (38%). Notice that the difference between 3-year-old children and
adolescents with SLI is highly significant for the use of consonant omission
(U = 63, Z = 2.55, p < 0.01). Figure 12 suggests that substitution becomes
more frequent with age, since children at age 4 use it more than children at age 3 or
speakers with SLI (substitution: TD-4, 48%; SLI 7-10, 42.6%; SLI 11-
16, 44.75%; omission: TD-4, 26.8%; SLI 7-10, 26.7%; SLI 11-16, 23.1%).
This suggests that syllabic structures in children with SLI are complete (i.e.
have all the required syllabic constituents), but that the difficulty is centered on
segmental association. Indeed, omitting a segment can be interpreted as the
absence of a syllabic constituent. On the other hand, substitution allows the
speaker to adapt the segmental content to the syllable according to the association
constraints that weigh on these structures at a given stage of development.
For example, /j/ is very poorly produced in a context like /gajt/ by
all groups (success rate: TD-3, 14%; TD-4, 50%; SLI 7-10, 22%; SLI 11-16,
53%) and is regularly substituted by a vowel (/gajt/ → /gat/). This type of
substitution is a reflection of the ambiguity created by such a consonantal
sequence. The glide /j/ tends to behave like a consonant in French (Pourin
2003). Its association with position 3 is thus still problematic for the most
fragile speakers and suggests that the association constraint governing this
position is especially difficult to process.
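The omission/substitution coding discussed above can be approximated by comparing the consonant sequences of target and produced forms. A rough sketch (the segment inventory, the naive positional comparison and the example forms are illustrative assumptions, not the study's actual coding scheme):

```python
# Illustrative consonant inventory (ASCII stand-ins for IPA segments).
CONSONANTS = set("bdfgjklmnprstvwz")

def consonants(form):
    """Extract the consonant skeleton of a transcribed form."""
    return [seg for seg in form if seg in CONSONANTS]

def classify(target, produced):
    """Classify a response: 'omission' if a consonant was dropped,
    'substitution' if the count matches but a consonant changed,
    'correct' otherwise. (Naive positional comparison, no alignment.)"""
    t, p = consonants(target), consonants(produced)
    if len(p) < len(t):
        return "omission"
    if any(a != b for a, b in zip(t, p)):
        return "substitution"
    return "correct"

print(classify("prat", "pat"))  # omission: the liquid is dropped
print(classify("pat", "pak"))   # substitution: final consonant changed
```

A real coding scheme would of course align segments more carefully and handle the further strategies (lexicalization, chaotic disturbances) discussed below, but the omission/substitution contrast reduces to this count-versus-content comparison.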
Besides the predominant use of omission and substitution, children also
used other strategies, to a lesser extent, but with a remarkable frequency in
speakers with SLI. Moreover, adolescents with SLI used significantly more of
these other strategies than the TD-3 group (U = 78.5, Z = 1.98, p < 0.05)
and the TD-4 group (U = 80.5, Z = 2.37, p < 0.05). We take this to mean
that adolescents with SLI developed the use of those strategies in order to
produce meaning. In other words, these other strategies function as direct
compensatory strategies for their deficit.

Figure 13. Other strategies used to replace target consonants (as % of total of all
other strategies).
The use of chaotic disturbances by speakers with SLI (SLI 7-10, 7%; SLI
11-16, 5.5%) is an interesting phenomenon. This strategy is entirely missing in
TD children aged 4, and less prevalent in children aged 3 (3%), though there is
no significant difference in the relative frequency of disturbances in the TD-3 group
as compared to the two SLI groups (TD-3 vs. SLI 7-10, U = 46, Z = 1.13,
p = 0.19; vs. SLI 11-16, U = 102, Z = 1.34, p = 0.18), and no significant difference
between the two SLI groups themselves (SLI 7-10 vs. SLI 11-16, U = 80,
Z = 0.3, p = 0.76).
What is surprising about the use of these strategies is that they do not
necessarily simplify the syllabic structures. Indeed, if lexicalization is a means
to circumvent a lexical difficulty (i.e. an unknown word), gliding from a non-
word to a word is not always a phonological simplification (e.g. /gas/ → /kask/
'helmet')4. As for structure disturbances, these consist of a total disorganization
of the consonantal and syllabic structure: the consonantal sequence seems
to be produced at random, and the resulting form is sufficiently far from the
target to make the phonological analysis of the processes at work extremely
difficult (/kapa/ → /tadba/). These disturbances mostly affect consonants,
very rarely vowels. Moreover, as phonological complexity grows in the word,
the rate of production of such phenomena increases as well (Carms 2007).
These strategies seem to reveal the difficulty speakers with SLI have in dealing
with phonological complexity (see Ferré et al. (2010) for a more complete
analysis of those strategies, in particular lexicalization and chaotic disturbances).
Our results converge on the idea that the difficulty is located at the level of
the association constraints between the segmental and the syllabic lines. Consequently,
this difficulty increases as the number of elements increases. The
analysis of compensatory strategies completes the study of consonant clusters:
success rate is closely linked to the ability to connect segments and syllables,
in other words, to the capacity to implement association constraints.
5. Conclusion
4. In Ferré et al. (2010), we hypothesize that the significant increase in the use of
lexicalization in older children with SLI is due to the fact that they are no longer
manipulating phonological structures with which they have difficulties, but rather
are producing the most efficient lexical items they can. Lexicalization is therefore a
way for adolescents with SLI to overcome the difficulty, and they use their lexical
stock to this end.
References
Angoujard, Jean-Pierre
1997 Théorie de la syllabe. Paris: CNRS Éditions.
Blevins, Juliette
1995 The syllable in phonological theory. In John Goldsmith (ed.), The
handbook of phonological theory. Oxford: Blackwell, 245–306.
Bortolini, Umberta and Lawrence Leonard
2000 Phonology and children with specific language impairment: status of
structural constraints in two languages. Journal of Communication
Disorders 33:2, 131–150.
Carms, E.
2007 Traitement de la complexité phonologique chez les adolescents dysphasiques.
Master 2 dissertation, Research in Cognition and Development.
University of François-Rabelais, Tours.
Chevrie-Muller, Claude, A.M. Simon and P. Decante
1981 Épreuves pour l'examen du langage (EPEL). Paris: Éditions du
Centre de Psychologie Appliquée.
Cyran, Eugeniusz
2003 Complexity Scales and Licensing Strength in Phonology. Lublin:
Wydawnictwo KUL.
Deltour, Jean-Jacques and Dominique Hupkens
1980 Test de vocabulaire actif et passif pour enfants de 5 à 8 ans (TVAP
5-8). Braine-le-Château: Éditions de l'Application des Techniques
Modernes (ATM).
Demuth, Katherine and Margaret Kehoe
2006 The acquisition of word-final clusters in French. Journal of Catalan
Linguistics 5, 59–81.
Ferré, Sandrine, Laurice Tuller, Anne-Gaëlle Piller and Marie-Anne Barthez
2010 Strategies of avoidance in (a)typical development of French. In L.
Dominguez and P. Guijarro-Fuentes (eds.), Selected proceedings of
the Romance Turn III. Cambridge: Cambridge Scholars Publishing.
Féry, Caroline
2001 Markedness, faithfulness, vowel quality and syllable structure in
French. Linguistics in Potsdam 16, 1–31.
Gallon, Nichola, John Harris and Heather van der Lely
2007 Non-word repetition: an investigation of phonological complexity in
children with Grammatical SLI. Clinical Linguistics & Phonetics 21,
435–455.
Gathercole, Susan and Alan D. Baddeley
1990 The role of phonological memory in vocabulary acquisition: A study
of young children learning new names. British Journal of Psychology
81:4, 439–454.
Gérard, Christophe-Loïc
1993 L'Enfant dysphasique. Bruxelles: De Boeck Université.
Gierut, Judith, Michèle Morrisette, Mary Hughes, and Susan Rowland
1996 Phonological treatment efficacy and developmental norms. Language,
Speech and Hearing Services in Schools 27, 215–230.
Harris, John and Edmund Gussmann
1998 Final codas: why the west was wrong. In Eugeniusz Cyran (ed.),
Structure and interpretation in phonology: studies in phonology.
Lublin: Folia, 139–162.
Hayes, Bruce and Donca Steriade
2004 Introduction: The Phonetic Basis of Phonological Markedness. In
Bruce Hayes, Robert Kirchner and Donca Steriade (eds.), Phonetically-
Based Phonology. Cambridge: Cambridge University Press,
1–32.
Ingram, David
1981 Procedures for the phonological analysis of children's language.
Baltimore: University Park Press.
Ingram, David
1989 First Language Acquisition. Cambridge: Cambridge University
Press.
Acquiring and avoiding phonological complexity in SLI 307
Marshall, Chloe, Susan Ebbels, John Harris and Heather van der Lely
2002 Investigating the impact of prosodic complexity on the speech of
children with Specic Language Impairment. In R. Vermeulen and
A. Neeleman (eds), UCL Working Papers in Linguistics 14, 4366.
Orsolini, Margherita, Enzo Sechi, Cristina Maronato, Elisabetta Bonvino and Alessandra
Corcelli
2001 The nature of phonological delay in children with specic language
impairment. International Journal of Language and Communication
Disorders 1, 6390.
Pellegrino, Franois
2009 De lidentication des langues la complexit phonologique. Habil-
itation diriger des recherches, Sciences du Langage, Universit
Lumire Lyon 2.
Pourin, Delphine
2003 tude phonologique dclarative des semi-voyelles du franais. As-
pects synchroniques et diachroniques, Ph.D. dissertation, University
of Nantes.
Sahlen, Brigitta, Christina Reuterskiold-Wagner, Ulrika Nettelbladt and Karl Radeborg
1999 Non-word repetition in children with language impairment: pitfalls
and possibilities. International Journal of Language and Communi-
cation Disorders 34:3, 337352.
Trubetskoy, Nikolay
1931 Gedanken ber Morphonology. Travaux de Cercle Linguistique de
Prague 4, 5361.
van der Lely, Heather and John Harris
1999 The Test of Phonological Structure. London UK: University College
London. Unpublished test available from the rst author, Centre for
Developmental Language Disorders and Cognitive Neuroscience,
Department of Human Communication Science.
Vihman, Marilyn
1996 Phonological development: the origins of language in the child.
Cambridge: Blackwell.
Winitz, Harris
1969 Articulatory Acquisition and Behavior. New York: Appleton-Century-
Crofts.
Zipf, George Kingsley
1935 The Psycho-Biology of Language: An Introduction to Dynamic Phi-
lology, Cambridge: MIT Press.
Part IV. Assimilation and reduction in connected
speech
Articulatory reduction and assimilation in n#g
sequences in complex words in German1
Pia Bergmann
Abstract

This paper investigates alveolar-to-velar assimilation in nasal#stop sequences across phonological word boundaries in complex words in German by means of electropalatography (EPG). Independent variables are word frequency, accentuation, and vowel quantity in the first part of the complex word. We present evidence for gradient reduction as well as categorical deletion of the alveolar nasal. Word frequency, vowel quantity and accentuation influence articulatory reduction of the alveolar nasal significantly in particle verbs, while compounds are less affected by the independent variables. Progressive and conservative speakers were identified with respect to assimilation, as well as speaker-specific assimilatory strategies.
1. Introduction

This paper deals with the influence of lexical frequency and prosodic structure on articulatory reduction of n#g-sequences in complex words in German, e.g. in words like ein#geben 'to enter' vs. ein#gelen 'to gel in'. The main research questions are whether morphologically complex high-frequency lexical items are produced with weaker internal prosodic boundaries than low-frequency items, and whether accented items are more protected against boundary weakening than unaccented items. These questions were answered by using acoustic and articulatory (EPG) methods to investigate six speakers' production of test and control items embedded into carrier sentences. Speaker-specific assimilatory strategies will be discussed by presenting the productions of three speakers
1. Thanks to two anonymous reviewers, Phil Hoole, and Peter Auer for many helpful comments on a previous version of the paper. Furthermore, I want to thank Doris Mücke, Martine Grice and Marion Jaeger for their help with speech material selection and data analysis, as well as Raphaela Kirst for labelling most of the data. I am grateful to Sascha Wolfer for helping me with the statistical analysis.
This study is part of a larger project on frequency effects on assimilation and other edge-marking phenomena funded by the DFG, Priority Programme 1234: Edge marking in German compounds: Frequency effects and prosodic constituents (AU72/181), 2006–2009.
in detail. In this section we will first introduce the notion of the phonological word and relate it to the aspects of frequency and prosodic structure. We will then report current findings on speaker-specific behaviour in reduction and assimilation and finally explain the chosen dependent variables.

In the generative framework of prosodic phonology, the phonological or prosodic word (henceforth pword) is the domain that maps morphological entities onto phonological/prosodic entities (Nespor & Vogel 2007). A pword boundary can block the application of phonological rules, for example resyllabification as a means of syllable onset maximization. Consider the morphologically complex word gier + ig 'greedy', which is syllabified as gie.rig despite its internal morphological boundary, and thus is considered to constitute one phonological word: (gie.rig)ω. In the word lieb + lich 'mellow', however, resyllabification across the morphological boundary is blocked (*lie.blich), so that the string is analyzed as consisting of two separate phonological words: (lieb)ω (lich)ω (cf. Hall 1999; Lühken 1997; Wiese 1996). Likewise, the pword boundary is relevant for the occurrence of assimilations in n#g-sequences: In German, velar nasal assimilation is obligatory in word-internal /ng/ or /nk/-sequences, but it is optional across the boundary of the phonological word, e.g. [ˈʊŋgarn] 'Hungary' vs. [ˈʊn#gɛrn] or [ˈʊŋ#gɛrn] 'reluctantly'. Although external factors like speaking style or speech rate may constrain the occurrence of nasal velarization in these cases (cf. Wiese 1996), this type of assimilation is considered to be a rule-based process that should apply equally to each lexical item. This view is challenged by usage-based approaches to language. These share the assumption that aspects of language use may influence the way in which language is produced, perceived and maybe even processed and mentally stored. Therefore, lexical items with different performance characteristics may be treated differently in a systematic way, and phonological processes do not have to apply across-the-board.
One important performance characteristic is lexical frequency, which has been widely discussed within usage-based approaches. With regard to this paper, it has been shown that assimilation and reduction of speech sounds are enhanced by frequency of occurrence (e.g. Bush 2001; Bybee 2001, 2006; Phillips 2006; Pierrehumbert 2001). Relatively few studies have investigated frequency effects on segmental reduction from a production point of view by articulatory methods, however. Jaeger & Hoole (2011) present evidence from an EMA study for stronger articulatory reduction of tongue tip movement, i.e. the alveolar nasal, in /n#k/ sequences, when the first segment is the right edge of a high-frequency function word (dann#kann) as compared to a collocation with a low-frequency content word in first position (Zahn#kann). They suggest, however, that it is the co-occurrence frequency of the string dann#kann
rather than the lexical frequency of the single item dann which is responsible for assimilation effects. In an EPG study, Stephenson (2003) reports frequency effects on alveolar-to-velar assimilation in stop-stop sequences across a word boundary in English compounds. Her findings show that speakers react differently to high-frequency words. While some speakers reduce the duration of the segment sequence, other speakers change the place of articulation. Mücke et al. (2008) found significant effects of word frequency on n#g-sequences for only two out of five speakers. Both speakers make use of temporal variables (temporal reduction or deletion of alveolar constriction). Kirst (2008), on the other hand, did not find significant effects of word frequency on n#g / n#k-sequences in her EPG study of two speakers.2 In the present study, we approach this question by comparing the production of high-frequency items to that of low-frequency items.

In addition to frequency, prosodic structure will be taken into account as an independent variable for assimilation and reduction. Prosodic theories agree on the fact that the continuous speech stream is subdivided into hierarchically organized prosodic constituents by means of phonological and/or phonetic characteristics (cf. Cho et al. 2007; Fougeron & Keating 1997; Keating et al. 2003; Kuzla 2009; Nespor & Vogel 2007). For instance, as mentioned above, assimilation in German can be blocked by a phonological word boundary. The lack of a possible assimilation therefore serves as a boundary marker for the prosodic structure, in this case for the constituent phonological word. Boundaries of prosodic domains have received a rather large amount of attention in prosodic research, showing that domain-initial segments as well as the heads of domains are articulatorily strengthened (cf. Keating et al. 2003). Domain-final elements, on the other hand, tend to be weakened and vulnerable to assimilations or reductions, which is especially true for coronals (cf. Kohler 1976, 1990). The relation of these findings to the present study is twofold: First, we want to test the influence of accentuation on the occurrence of assimilations and reductions. According to the literature, accented items, being the heads of intonational phrases, are expected to be less reduced than unaccented items. Second, in this study assimilation and reduction of the /n/ in the /n#g/-sequence will be regarded as a weakening of the pword boundary. Here, the aspects of prosodic structure and lexical frequency both play a role: Since the clear separation of the word constituents is supposed to ease lexical access,

2. Both studies overlap with the present study in that the speech material is partly spoken by the same speakers and partly consists of the same test words. The speech material differs with respect to prosodic conditions and the segmental contexts, which are more restricted in Mücke et al. (2008) and Kirst (2008).
frequency hit at the time of the recording. (The gap in the test item set is due to the fact that there was no high-frequency compound available for this context; the vowel length effect can thus only be tested with the particle verbs.)

All test and control items were embedded into carrier sentences. The sentences with the test items varied with respect to prosodic structure so that each test item occurred in accented position as well as in unaccented position. Deaccentuation was achieved by manipulating the information structure of the test sentences, specifically by introducing a negation particle before the test item (see sentences b and d below). We constructed a context consisting of a question-answer pair for the test items and control_1 items. Sentence position of the test or control item was kept constant so as not to interfere with positional effects, especially final lengthening or glottalization at the IP boundary. All test items occur as the last syntactic constituent in a prepositioned if-clause (wenn-Satz), thereby triggering non-final rising intonation. The items across a syntactic boundary and control_2 items were embedded as last lexical elements into questions so that they were also produced with rising intonation. Three repetitions of each target word were collected. Accentuation was varied
for test items only in order to keep the corpus to a feasible size.3 No dummy sentences were included. All in all, the sample comprises 81 tokens per speaker (n = 486). The sentences below exemplify the test sentences for the high-frequency test item ein#geben and the low-frequency test item ein#gelen, both in accented and unaccented position, as well as the test sentences for control_1 items.
EINGEBEN and EINGELEN:
a. HF_acc. [Warum soll ich mir den Zahlenkode merken?]
Wenn du ihn EINGEBEN kannst, geht die Tür automatisch auf.
(Why should I take note of the number code?)
(If you can enter it, the door will open automatically.)
b. HF_unacc. [Warum müssen wir den richtigen Kode zum Öffnen der Tür eintippen?]
Wenn wir ihn NICHT eingeben, geht die Tür nicht auf.
(Why do we have to enter the correct code to open the door?)
(If we don't enter it, the door won't open.)
c. LF_acc. [Warum stylst du deine Haare so auf?]
Wenn man sie EINGELT, hält die Frisur länger.
(Why are you styling your hair like that?)
(If you gel it, the hairstyle holds longer.)
d. LF_unacc. [Warum stylst du deine Haare so auf?]
Wenn man sie NICHT eingelt, hält die Frisur nicht so lange.
(Why are you styling your hair like that?)
(If you don't gel it, the hairstyle won't hold as long.)
e. control 1.1: Wollen wir ihn an dem Abend EINEHREN?
(Shall we honour him on that evening?)
g. control 1.2: Sollen wir den Weg FREIKEHREN?
(Should we sweep the path clear?)
For the recordings speakers were seated in a sound-proof room. The data
were presented visually in random order on a computer screen; the surround-
ing context of the carrier sentence was also presented auditorily. Speakers had
3. The speech material that had to be read out loud contained another subset of test
and control items for t#g/k sequences, which is not part of the present study. Addi-
tionally, the material consisted of items for the segment sequences n#b/p, t#b/p,
#, and s#. These were recorded in separate sessions, though.
to read the sentences that were given in bold letters out loud. All speakers had
a short training phase before the recording which contained ten sentences with
the same test design as the test items. This was done primarily to ensure that
the items in the unaccented condition were produced correctly. If a speaker
failed to produce these items correctly, he/she was asked to read the sentence
again and highlight the negation particle.
Constriction overlap (dotted line) was calculated by subtracting the end of the alveolar constriction (2) from the start of the velar constriction (1). Overlap therefore yields negative values, whereas a lag between constriction phases yields positive values. We calculated the ratio of the velar constriction in the acoustic nasal (broken line) by subtracting the release of the alveolar constriction from the end of the acoustic nasal and dividing the result by the duration of C1 (((3-2)*100)/C1). This measurement indicates the amount of velar nasal in C1.
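The two measures just described can be sketched in a few lines of Python. The function names and the example landmark times are hypothetical illustrations, not part of the study's actual analysis scripts; landmarks follow the numbering in the text (1 = start of velar constriction, 2 = end/release of alveolar constriction, 3 = end of the acoustic nasal).

```python
# Sketch of the CO and VN measures defined above (times in ms).
# All names and example values are hypothetical.

def constriction_overlap(velar_start, alveolar_end):
    """CO: negative = overlap of the two constrictions, positive = lag."""
    return velar_start - alveolar_end

def velar_nasal_ratio(alveolar_release, nasal_end, c1_duration):
    """VN: percentage of the acoustic nasal (C1) produced with velar contact."""
    return (nasal_end - alveolar_release) * 100 / c1_duration

# Example: velar closure begins 15 ms before the alveolar release,
# and the nasal continues 40 ms past the release in a 100 ms C1.
print(constriction_overlap(85, 100))     # -15 -> overlap
print(velar_nasal_ratio(100, 140, 100))  # 40.0 -> 40% velar nasal in C1
```

A deleted alveolar constriction (section 3) has no landmark 2, so neither measure is defined for such tokens; this is why the data base for the gradient measures shrinks when deletions are frequent.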
Assistant Software. The relevant region for an alveolar segment was defined as rows 1 to 3, and columns 2 to 7. The velar region was defined as rows 7 to 8, and columns 3 to 6 (cf. fig. 3). An alveolar constriction was labelled for all frames in which any of the rows in the alveolar area was closed. A velar constriction was labelled for all frames in which 80% of the relevant area was closed. We introduced this threshold because many of the test and velar control items had no complete closure in the relevant area, presumably due to retraction of the velar contact beyond the reach of the artificial palate.
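As a rough sketch, the labelling criteria above can be expressed as predicates over an 8×8 contact frame. The frame representation and helper names are assumptions for illustration only, not the actual Articulate Assistant routines; row/column indices follow the 1-based convention in the text.

```python
# Frame-labelling criteria described above, for an 8x8 EPG frame
# given as a list of 8 rows of 8 ints (1 = electrode contact).
ALV_ROWS, ALV_COLS = range(1, 4), range(2, 8)   # rows 1-3, columns 2-7
VEL_ROWS, VEL_COLS = range(7, 9), range(3, 7)   # rows 7-8, columns 3-6

def has_alveolar_constriction(frame):
    """True if any row of the alveolar region is fully closed."""
    return any(all(frame[r - 1][c - 1] for c in ALV_COLS) for r in ALV_ROWS)

def has_velar_constriction(frame, threshold=0.8):
    """True if at least 80% of the velar region shows contact
    (tolerating incomplete closure behind the artificial palate)."""
    cells = [frame[r - 1][c - 1] for r in VEL_ROWS for c in VEL_COLS]
    return sum(cells) / len(cells) >= threshold

# A frame with a complete closure in row 1 and 7 of the 8 velar cells closed:
frame = [[0] * 8 for _ in range(8)]
for c in ALV_COLS:
    frame[0][c - 1] = 1
for r in VEL_ROWS:
    for c in VEL_COLS:
        frame[r - 1][c - 1] = 1
frame[7][5] = 0   # one velar cell open: 7/8 = 87.5% still passes the threshold
print(has_alveolar_constriction(frame), has_velar_constriction(frame))
```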
2.5. Hypotheses
Our hypotheses are that high frequency leads to stronger reduction and assimi-
lation than low frequency. Unaccented items will be more reduced and assimi-
lated than accented items, and we suppose the same to hold for long-vowel
items compared to short-vowel items. In detail, we hypothesize that
(1) the number of deletions of the alveolar closure is higher in high-
frequency items, unaccented items, and long-vowel items,
(2a) the gradient durational measurements SD, ACD, and VN decrease with
high-frequency items, unaccented items, and long-vowel items,
(2b) the gradient durational measurement CO increases with high-frequency
items, unaccented items, and long-vowel items,
(3a) speakers will have shorter and less linguo-palatal contact in the alveolar
region when moving from one end of the reduction scale to the other,
(3b) speakers differ with respect to their assimilatory strategies.
3. Results

Figure 4. Deletion of alveolar closure across speakers.
Figure 5. Deletion of alveolar closure plotted against vowel quantity.

short vowel, but not in those with a long vowel: In particle verbs with a short vowel, deletions of the alveolar closure occur significantly more often in unaccented items (χ²(1) = 4.15, p < 0.05), and in high-frequency items (χ²(1) = 6.76, p < 0.05). Moreover, the number of deletions increases significantly with the reduction scale (χ²(3) = 11.16, p < 0.05). Figures 6 and 7 illustrate the distribution of the deletions of the alveolar closure according to the reduction scale for particle verbs with a long vowel (fig. 6), and particle verbs with a short vowel (fig. 7). The reduction scale combines the independent variables accentuation and frequency and ranges from low-frequency words in accented position to

Figure 6. Deletion of alveolar closure plotted against the reduction scale (V:, n = 72).
Figure 7. Deletion of alveolar closure plotted against the reduction scale (V, n = 71).
Figure 8. Interaction of frequency with vowel quantity in particle verbs (ACD).
Figure 9. Interaction of accentuation with vowel quantity in particle verbs (ACD).

high-frequency words in unaccented position (cf. section 2.3 for a more detailed explanation of the reduction scale).
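For a 2×2 table (e.g. deletion vs. retention of the alveolar closure in unaccented vs. accented items), the kind of χ² statistic reported above can be computed directly. This is a minimal sketch; the counts below are invented for illustration and are not the study's data.

```python
# 2x2 chi-square test of independence (without continuity correction),
# the kind of test reported above for deletions by accentuation.
# The counts are invented for illustration only.

def chi_square_2x2(a, b, c, d):
    """chi2 for the table [[a, b], [c, d]], e.g. deleted/retained
    closure in unaccented (a, b) vs. accented (c, d) items."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 14/36 deletions in unaccented vs. 5/35 in accented items.
chi2 = chi_square_2x2(14, 22, 5, 30)
print(round(chi2, 2))  # exceeds the 3.84 criterion for p < 0.05 at df = 1
```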
3.1.2. Gradient reduction and assimilation

The next section deals with the results of the statistical analysis of the gradient dependent variables alveolar constriction duration (ACD), segment sequence duration (SD), ratio of velar constriction in [n] (VN) and constriction overlap (CO).

Figure 10. Reduction scales for ACD (error bars: 95% CI).
Figure 11. Interaction of frequency with vowel quantity in particle verbs (SD).
Figure 13. Reduction scales for VN (particle verbs, error bars: 95% CI).
Figure 14. Interaction of frequency with vowel quantity in particle verbs (CO).
Figure 15. Reduction scale for CO (compounds, error bars: 95% CI; negative values = overlap, positive values = no overlap).
3.1.2.5. Summary

To summarize, the independent variable vowel quantity had significant main effects on all dependent variables except for CO (constriction overlap). Test items with short vowels are characterized by longer durations of alveolar constrictions and segment sequences, as well as lower percentages of velarity in C1. They additionally have significantly fewer deletions of the alveolar constriction phase, as compared to the test items with a long vowel.
Rows 4 to 6 present the EPG pattern of the linguo-palatal contact in the n#g-sequence for each realisation of the test item. The EPG pattern thus contains the contact for the whole alveolar#velar sequence; the numbers given for each single contact represent the percentage of sustained contact in the sequence. Row 7 finally demonstrates the EPG patterns for all realisations of the respective control item ([n]). Since the cumulative pattern refers to the whole segment sequence, the percentages are not directly comparable to the control items. In an ideal case, we would expect percentage values for the test items to be around 50%, whereas those for the control items should be near 100% in the alveolar region. Rather than taking into account these ideal values, we will refer to the percentage values only when comparing the different realisations of the test item. When drawing comparisons with the control items, only the location of the contact, but not its duration, will be taken into account.

Figure 16 presents the short-vowel particle verbs produced by speaker KA:
Figure 22. Velar controls for DM with least contact (a) and most contact (b).
Figure 23. Velar controls for KR with least contact (a) and most contact (b).
DM's sequence differs considerably from that of KR (cf. fig. 25): There are no dynamic changes in the alveolar region with the exception of one contact (row 2, left column 2) that is deleted in palate 449. Lateral contact up to row 2 is kept constant until the end of the n#g-sequence. This pattern does not resemble any of the velar control items of speaker DM. (This is not to deny that some of the realisations of DM resemble her velar control items [cf. the accented high-frequency long-vowel particle verbs], but most of the realisations are static compared to the realisations of KR. This can be seen when taking into account the percentages in the cumulative EPG palates of figures 18 to 21.)

Concluding the comparison between DM and KR, we can say that although both are progressive with respect to deletions (after long vowels at least), the two speakers have different assimilatory strategies. DM often produces a segment that resembles neither the typical alveolar nor the velar segment (as represented by the control items), and sustains this segment statically. KR on the other hand seems to shift dynamically from more front contact to back contact. It must be pointed out that DM produces blends between the alveolar and velar segments as well as segments that are similar to her velar control items. This indicates that DM applies categorical assimilations (i.e. the velar segment) as well as gradient assimilations.

Comparing KR to KA, we can state that they differ considerably too, despite their similar number of alveolar closure deletions. KA has comparable strategies for long-vowel and short-vowel particle verbs. She reduces contact gradiently (more or less) along the reduction scale. KR on the other hand has hardly any reductions in short-vowel particle verbs, but many more reductions and deletions in long-vowel particle verbs, so that she seems conservative in one part, but progressive in the other.
The aim of this study was to investigate the influence of the independent variables frequency, accentuation and vowel quantity on the occurrence of reduction and assimilation in n#g sequences in binary complex words in German. Additionally, we were interested in speaker-specific strategies for reduction and assimilation.

Our findings show that most of our hypotheses stated in section 2.5 are generally confirmed: Statistical analysis for all speakers yielded significant results for some of the dependent variables for frequency, accentuation and vowel quantity in the expected direction, i.e. we encounter more reduction and assimilation in high-frequency items, unaccented items and items with a long vowel. With respect to the investigated dependent variables, we observe that alveolar constriction duration (ACD), segment sequence duration (SD), and velar nasal (VN) are affected by all independent variables, whereas constriction overlap (CO) is only affected by frequency. Frequency effects on CO are hardly straightforward, though. This may partly be due to the fact that the data base for statistical analysis was strongly diminished by a high number of alveolar and velar constriction deletions (cf. 3.1.2.4). Deletion of the alveolar constriction is affected by vowel quantity, varied significantly across speakers, and showed a significant distribution along the reduction scale. There are some interesting restrictions to these general findings, though. First, our results show that lexical category has a major role to play in reduction and assimilation: Particle verbs are influenced by all independent variables and for all dependent variables except for CO, whereas compounds are more conservative and are significantly influenced by accentuation
for the durational variables ACD and SD only, as well as by frequency for CO. Second, within the particle verbs, the observed effects for accentuation and frequency are attributable to particle verbs with a short vowel only. Since particle verbs with a long vowel have significantly more deletions of the alveolar constriction and more durational reductions than their short-vowel counterparts, we hypothesize that the reductions are simply too strong to leave any room for effects of accentuation and frequency. These can only occur in short-vowel particle verbs, where the nasal segment is long enough to allow a range for variation. The stronger reluctance of noun compounds to undergo reductions and assimilations across the word boundary can be interpreted as an effect of the type of words that enter the composition: While noun compounds consist of two content words, particle verbs are composed of a function word and a content word. The first part of the complex word may therefore be more vulnerable to reductional phenomena in particle verbs. Moreover, the investigated particle verbs are more frequent than the noun compound, which supports the higher degree of reduction. In this respect the introduction of the reduction scale proved to be useful. It enables us to demonstrate the effects of accentuation and frequency for each word group separately and yielded significant results for the distribution of deletions of the alveolar constriction in particle verbs with a short vowel.

To conclude, accentuation has a significant impact on the articulatory behaviour of our subjects, even in noun compounds, thereby corroborating many similar findings in the realm of prosodic phonology. Vowel quantity has the most robust effect on the gradient durational variables as well as on the categorical variable (cf. Bergmann subm. for similar results on the reduction of geminates across the pword boundary). The result for the durational variables can be interpreted as a compensatory shortening or lengthening of the last segment with respect to the domain of the syllable, which fits well into a non-hierarchical model of syllable structure (cf. Clements & Keyser 1983). This cannot explain, however, why speakers consistently delete the alveolar closure less often in short-vowel items than in long-vowel items, and why they produce more velarity in the nasal segment after long vowels than after short vowels. The stronger assimilation of the nasal segment after long vowels hints at an articulatory reduction that can be better explained in a hierarchical constituent model of syllable structure, where the segment after a diphthong is only loosely attached to the coda or the syllable (cf. Lenerz 2002). The different articulatory treatment of the nasal segment can therefore be explained by its position in the constituent model. Our results are inconclusive, however, as to whether a constituent model like that of Lenerz (2002) or a syllable cut model of syllable structure should be preferred. Both assume that the
consonant after a short vowel has closer contact to the vowel, and is more strongly integrated into the syllable, than a consonant that follows a long vowel or diphthong (cf. Auer, Gilles & Spiekermann 2002; Becker 1998; Hoole & Mooshammer 2002; Lenerz 2002). Another crucial difference between the long-vowel and short-vowel items concerns their segmental structure: The former have /aɪ/ as a nuclear vowel, while the latter have /a/. The open vowel of the diphthong may enhance a more open and retracted production of the whole syllable, so that the alveolar closure is omitted more often in long-vowel items. Thus, the independent variable vowel quantity is confounded with segmental structure, which could explain part of the strong effect on the categorical variable. It should be mentioned, too, that vowel quantity is not only confounded with segmental structure, but possibly also with frequency (see below). With respect to frequency, the hypothesis grounded on usage-based theories and exemplar-theoretical work that high-frequency items are produced with more reduction is confirmed for the particle verbs, especially those with a long vowel. It should be noted that vowel quantity is correlated with extremely high frequency in our study. Thus, the strong reductions and assimilations in particle verbs with a long vowel may not only be attributable to syllable structure, as explained above, but may be additionally enhanced by extremely high absolute frequency (2,920,000 hits for ein#geben vs. 340,000 hits for hin#geben). Our results suggest that future work on frequency effects on reductional phenomena should crucially take into account vowel length and syllable structure. It would moreover be worthwhile to disentangle vowel quantity from its confounding factors (1) by comparing short-vowel monophthongs with the corresponding long-vowel monophthongs, e.g. [a] vs. [a:], and (2) by systematically comparing different long-vowel items of varying degrees of high frequency.
The descriptive analysis of speaker-specific behaviour focussed on three female speakers who were selected based on their number of alveolar closure deletions. Speaker-specific differences were demonstrated for KA and KR, who reacted to different dependent variables, and had systematically different contact patterns along the reduction scale. The comparison between speakers KR and DM yielded different assimilatory strategies in their realisations with deleted alveolar closures: While KR shifted dynamically from more front to back contact, DM's articulation of the segment sequence was static, with many items that blend the alveolar and the velar control segment, and some items that resemble the velar control segment. This means that DM assimilates categorically in some cases, but gradiently in others. Our findings therefore corroborate Hardcastle's (1995) and Ellis & Hardcastle's (2002) claim that inter-speaker variation as well as intra-speaker variation should not be neglected in the study of assimilation. Speakers may apply different strategies (KR and DM), and, as in the case of DM, they may switch between categorical assimilation and gradient assimilation. The realisations of DM, however, do not follow the reduction scale, i.e. we do not find categorical assimilation in unaccented high-frequency items as opposed to gradient assimilations in the other conditions. We therefore cannot explain under which conditions speaker DM would produce gradient assimilations or categorical assimilations.

In the present study, we were able to show that assimilation and reduction across the word boundary is influenced by syllable structure, prosodic structure (accentuation), frequency, and lexical class. Moreover, speaker-specific preferences for gradient and/or categorical assimilations were demonstrated.
References

Kohler, Klaus J.
1976 Die Instabilität wortfinaler Alveolarplosive im Deutschen: eine elektropalatographische Untersuchung. Phonetica 33: 1–30.
Kohler, Klaus J.
1990 Segmental reduction in connected speech in German: phonological facts and phonetic explanation. In: William J. Hardcastle and Alain Marchal (eds.), Speech Production and Speech Modelling, 69–92. Dordrecht: Kluwer.
Kuzla, Claudia
2009 Prosodic Structure in Speech Production and Perception. Wageningen: Ponsen & Looijen.
Labov, William
1972 Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press.
Lenerz, Jürgen
2002 Silbenstruktur und Silbenschnitt. In: Peter Auer, Peter Gilles and Helmut Spiekermann (eds.), Silbenschnitt und Tonakzente, 67–86. Tübingen: Max Niemeyer Verlag.
Lindblom, Björn
1990 Explaining phonetic variation: a sketch of the H&H theory. In: William J. Hardcastle and Alain Marchal (eds.), Speech Production and Speech Modelling, 403–439. Dordrecht: Kluwer.
Lühken, Silvia C.
1997 Deutsche Wortprosodie. Abschwächungs- und Tilgungsvorgänge. Tübingen: Stauffenburg Verlag.
Mücke, Doris, Martine Grice and Raphaela Kirst
2008 Prosodic and lexical effects on German place assimilation. 8th International Seminar on Speech Production, 8–12 December 2008, Strasbourg.
Nespor, Marina and Irene Vogel
2007 Prosodic Phonology. 2nd ed. Berlin: de Gruyter.
Phillips, Betty S.
2006 Word Frequency and Lexical Diffusion. Houndmills/Basingstoke/Hampshire/New York: Palgrave Macmillan.
Pierrehumbert, Janet B.
2001 Exemplar dynamics: word frequency, lenition and contrast. In: Joan Bybee and Paul Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 137–157. (Typological Studies in Language 45.) Amsterdam: John Benjamins.
Stephenson, Lisa
2003 An EPG study of repetition and lexical frequency effects in alveolar to velar assimilation. Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona: 1891–1894.
Wiese, Richard
1996 The Phonology of German. Oxford: Clarendon Press.
Overlap-driven consequences of Nasal place
assimilation
Claire Halpert
Abstract
This paper argues that nasal place assimilation in Zulu, and more widely in Bantu,
involves temporal sliding, without temporal extension, of a trigger consonant's place
gesture onto the nasal target. This sliding, necessitated by a *Long constraint against
durational increase of gestures, combined with maintenance of timing relationships
between all the gestures of the trigger C, enforced by Alignment constraints, forces
non-place gestures of C to overlap N. In cases where such an overlap would be
phonetically marked, Zulu violates Faithfulness to the problematic C gestures, yielding
unfaithful trigger consonant outputs, including loss of laryngeal features and affrication.
Such effects do not occur in unassimilated NC clusters. A pilot study indicates that
assimilated NC in Zulu has durational consequences, in line with the analysis proposed
here. A survey of nasal place assimilation effects across the Bantu languages suggests
that this analysis can be made to account for the broader typology of NC in the
language family.
1. Introduction
2.1. Background
Zulu has the consonant inventory shown in Table 1. It includes plosives (plain,
aspirated, and implosive), nasals, fricatives, affricates, and approximants at labial,
alveolar, postalveolar, palatal, velar, labial-velar, and glottal places of articulation,
as well as dental, lateral, and postalveolar click series with nasalized counterparts.
[Table 1: Zulu consonant inventory; most IPA symbols were lost in extraction.]
Syllables in Zulu are typically open, glide insertion occurs to prevent hiatus,
and the only consonant sequences found are NC sequences (Doke 1969).1 NC
sequences, homorganic and heterorganic, arise both stem-internally and at
morpheme boundaries. I list in Table 2 the attested stem-internal sequences
found in Zulu.2
[Table 2: attested stem-internal NC sequences in Zulu. Homorganic clusters:
labial (mp, mb, mpf, mbv), alveolar (nt, nd, nts, ndz), and velar series, including
nasal-affricate sequences. Heterorganic clusters: /m/ plus a non-labial consonant
(e.g. mt, mk, ms, mz, mn) and various Cw sequences (e.g. tw, kw, dw, gw, sw,
zw, nw). Many IPA symbols were lost in extraction.]
What is striking about this distribution is that /m/ is the only nasal to
appear in heterorganic nasal-obstruent sequences; all other nasals only appear
with following homorganic consonants or a labial glide. As we will see, this
stem-internal NC distribution mirrors what we find in NC sequences occurring
at morpheme boundaries.
3. There is historical evidence that other roots in Zulu, including iɲoka 'snake' and
iɲoni 'bird', came from vowel-initial Bantu stems (-oka and -oni), with [ɲ] arising
through place assimilation. Currently, Zulu speakers seem to interpret these stems
as being ɲ-initial (Doke et al. 1990).
For the purposes of this paper, I will remain agnostic about the underlying
identity of the assimilating nasal (but see Padgett 1995 for an argument that
the assimilating nasal is underlyingly velar). For my analysis, it is sufficient
to note that the behavior of the assimilating nasal contrasts with the behavior
of /m/ in the um- prefix.
In addition to the occurrence of place assimilation of N in class 9/10
iN/iziN, the trigger segment, C, of NC sequences at the morpheme boundary
exhibits a number of additional changes, first noted by Doke (1969). These
changes are all absent in unassimilating mC sequences. Most of these effects
fall into two categories: loss of laryngeal features and postnasal hardening.4
4. One effect, postnasal voicing of non-nasal clicks, doesn't fit clearly into either category.
I will set this effect aside for the analysis here. While it will not be addressed in this
work, one potential way to analyze the voicing of non-nasal clicks in assimilat-
ing NC that is compatible with the analysis developed here is that the voicing is
necessary to be faithful to the nasal-oral sequence of the NC.
2.3. Summary
Zulu exhibits both assimilated and non-assimilated NC sequences in derived
and underlying environments. In both cases, only mC sequences may be
heterorganic; all other sequences must be homorganic. The distribution of
derived and underlying NC sequences suggests that while /m/ in Zulu does
not undergo place assimilation, all other nasals do. The effects on C in de-
rived homorganic NC sequences, shown in (5)-(8), mirror the distribution
of homorganic NC sequences in stem-internal position shown in Table 2.
The absence of such effects in the heterorganic mC sequences and derived
homorganic mC sequences (from classes 1 & 3), particularly in contrast to
their presence in assimilated mC sequences resulting from classes 9 and 10,
indicates that these effects result directly from place assimilation.
3. Analysis
Evidence from derived and underlying NC sequences indicates that all nasals
except for /m/ undergo place assimilation in all NC contexts in Zulu. Since
assimilation can result from weakened perceptual cues for place of the first
segment in a sequence, the resistance of /m/ to assimilation is perhaps due to
its greater internal perceptual cues relative to other nasals (Silverman 1997, Jun
5. Underlying N+l sequences are rather rare. However, I would like to note that
in addition to l → d in these circumstances, there is also an observed pattern of
N → ∅ (deletion of the assimilating nasal before l). The existence of both pro-
cesses is perhaps due to the low frequency of N+l sequences. As we will see in
section 5, deletion of N in markedness-creating NC contexts is a common pattern
cross-Bantu.
2004). To model the general nasal place assimilation effect in OT, I will use
the following constraints:
(10) a. Assimilate (Assim): Adjacent distinct oral constrictions are
disallowed.
b. Max(constr)/____vocoid:6 An oral constriction gesture of a
segment in pre-vocoid position in the input must have a
corresponding gesture in the output.
c. Max(labial): A labial constriction gesture in the input must have
a corresponding gesture in the output.
By ranking the two faithfulness constraints (10b) and (10c) above Assimilate,
we ensure that the place of the second segment is always preserved in
sequences, and thus drives the assimilation, and that /m/ never undergoes
assimilation. These constraints capture the basic distribution of nasal place
assimilation in Zulu and ensure that there is a single oral constriction gesture,
overlapping both segments, in the assimilated sequences. The representation
of an assimilated sequence as an overlapped structure is in line with Browman
and Goldstein's (1989) and Jun's (1995, 1996) representations of assimilation as
cases of gestural overlap. In these models, however, overlap does not entail
loss of gestures in the output. Rather, assimilated gestures can be present in
reduced, submerged, or blended form in the output. In principle, this analysis
of Zulu place assimilation could be couched in terms of gestural reduction, but
in the absence of articulatory evidence I have chosen the representation here to
reflect both the categorical nature of the phenomenon and the lack of any other
cluster or geminate sequences elsewhere in the language.
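As a rough illustration (not part of the analysis itself), the strict-domination logic behind the ranking in (10) can be sketched as follows: candidates are compared on their violation profiles in ranking order, and the best profile wins. The candidate forms and violation counts below are invented for the example.

```python
# Toy OT evaluation: pick the candidate with the lexicographically smallest
# violation profile under a fixed high-to-low constraint ranking.
# Candidates and violation counts are invented for illustration.

def evaluate(candidates, ranking):
    """candidates: {form: {constraint: violations}}; ranking: high-to-low list."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)

# Input /iN + pa/ -> assimilated [impa]: Max(constr)/_V, Max(labial) >> Assim.
ranking = ["Max(constr)/_V", "Max(labial)", "Assim"]
candidates = {
    "[inpa]": {"Assim": 1},             # heterorganic: adjacent distinct constrictions
    "[impa]": {},                       # single overlapped labial closure: no violations
    "[inta]": {"Max(constr)/_V": 1},    # alters place of the prevocalic consonant
}
print(evaluate(candidates, ranking))  # -> [impa]
```

Because faithfulness to the prevocalic consonant outranks Assimilate, the winner is always the candidate in which the second segment's place spreads leftward.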
With this basic mechanism, only the oral closure gesture is implicated in
the assimilation, which does not allow us to account for the effects on C2
described in the previous section. In this section, I will argue that gestural
reduction to avoid markedness is responsible for the additional effects seen in
Zulu. I will employ two additional constraints to enforce the overlap of other
gestures of C2 with N. In the following sections, I will give additional motiva-
tion for adopting these constraints in Zulu.
The first constraint I will call *Long:
(11) *Long: The duration of an oral constriction gesture must not exceed
the target duration for that gesture.
6. Vocoids include all vowels and glides; Cj sequences are ruled out by an additional
markedness constraint.
The *Long constraint's reliance on target duration follows from the notion
of intrinsic duration in Articulatory Phonology, which assumes that gestures
have an intrinsic temporal component that varies across speech rates
(Browman and Goldstein 1989, Saltzman and Munhall 1989).7 The *Long
constraint is calculated against the independently derived target duration,
accruing a violation when the target is exceeded.
The result of *Long is to prevent the oral closure gesture of the consonant
from simply lengthening in order to overlap the nasal; rather, to satisfy *Long
the gesture must actually shift in order to create overlap, along the lines of
Browman and Goldstein (1992b), essentially forcing the duration of an
assimilated NC sequence to match the duration of C2 alone:8
7. I will not present a mechanism for calculating target durations here; one possible
way to do so would be to require a fixed ratio for the duration of the oral closure
gesture to the preceding vowel (cf. Port and Dalby 1982; thanks to a reviewer for
this suggestion), though such an account would need to be constructed carefully to
avoid making the prediction that lengthening the vowel would allow a longer
oral constriction to satisfy the constraint.
8. Zulu is a language without geminates (Doke 1969). A language with geminates
would presumably not have a high-ranked *Long constraint, and we would not
expect to find Zulu-type patterns.
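The arithmetic of the sliding analysis can be sketched numerically: under *Long, the oral closure gesture of C slides back to overlap N rather than stretching, so the closure span of an assimilated NC equals that of C alone. The millisecond values below are invented for illustration.

```python
# Sketch of *Long-driven overlap: the trigger C's oral closure gesture slides
# leftward to the start of N instead of lengthening. Durations (ms) invented.

def slide_closure(n_start, c_closure_dur):
    """Return (onset, offset) of C's oral closure after sliding it back to the
    start of N, keeping its target duration constant (satisfying *Long)."""
    return n_start, n_start + c_closure_dur

n_start, n_dur, c_dur = 100, 60, 80
unassimilated_total = n_dur + c_dur            # two sequential closures: 140 ms
onset, offset = slide_closure(n_start, c_dur)
assimilated_total = offset - onset             # one overlapped closure: 80 ms
print(unassimilated_total, assimilated_total)  # -> 140 80
```

This is exactly the durational prediction tested in the pilot study below: assimilated NC should pattern with singleton C, while unassimilated mC should be longer.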
as that of Padgett (1994), the fact that place and stricture must spread as a unit
is stipulated, in Articulatory Phonology it follows directly: place and stricture
are merely two components of an oral closure gesture and thus will always
function as a single unit.
If the oral constriction gesture of C were to overlap the nasal in a nasal-fricative
sequence, then, the result would be a nasal portion of NC with a critical closure;
in other words, a nasalized fricative. Nasalized fricatives are highly marked (Cohn
1993) and do not occur in Zulu, or perhaps in any Bantu language (Doke 1969,
Herbert 1986, Ladefoged and Maddieson 1993). A high-ranked markedness
constraint prohibits such segments from occurring in Zulu:
(14) *s̃: nasalized fricatives are prohibited.
To prevent *s̃ violations, the closure of the oral constriction gesture in a
fricative must change from critical to closed in order to overlap N. At the
same time, violations of the faithfulness constraint Max(constr), given in
(10b), must also be avoided. I will adopt a decomposed representation of oral
constriction gestures in Articulatory Phonology as comprising closure and
release portions (see Steriade 1993 for a discussion of affricates and Nam
2007a,b for a gestural representation of stops as containing separate closure
and release). On this view, an affricate can be represented with a single oral
closure gesture whose stricture changes from closed to critical at the point of
release, graphically represented below in Figure 3:
Figure 3. Affricate
(17) Align(Constr, offset, Glo, target): Align the offset of the oral constric-
tion gesture with the target of the glottal gesture.
With this alignment constraint and *Long both high-ranked, the gestures
in the glottal tier of C are made to overlap N.9 In the case of implosion and
aspiration, overlap with the nasal would create a highly marked structure, and
one that is not attested in Zulu (Doke 1969, Ladefoged and Maddieson 1993,
Silverman 1997). Markedness constraints against aspirated and implosive
nasals prohibit overlap of these glottal gestures with N:
9. Evidence from other Bantu languages that this overlap would, in fact, be the result
of these two constraints is discussed in section 5.
10. While these markedness constraints appear to be undominated in Zulu and may
seem superfluous, the system is such that they could be outranked by *Long and
Align, resulting in the emergence of segments that otherwise don't surface in a
language. Evidence that we might want such a system comes from languages like
Pokomo, where voiceless nasals only surface as a result of assimilation (Huffman
and Hinnebusch 1998).
In the case of aspiration, however, while the aspiration itself is lost, what
surfaces is not a plain stop but rather an ejected stop (6). The loss of aspira-
tion can be accounted for in the same way as the loss of implosion; a separate
explanation is needed for the appearance of ejection. Zulu lacks a plain voice-
less stop series, so perhaps the lack of plain voiceless stops in NC environ-
ments is unsurprising. Moreover, voicelessness in postnasal position can be
difficult to perceive (Pater 1996, Ladefoged and Maddieson 1996, Silverman
1997); in order to prevent it from simply being perceived as voiced, strategies
such as increased VOT are often employed (Hayes and Stivers 1995). As we
will see in section 5, [mp, nt, ŋk] with plain voiceless stops is not a common
output for assimilating NC sequences with voiceless stops anywhere in Bantu
(Kadima 1969, Kerremans 1980).
4.1. Method
A single female native Zulu speaker, bilingual in Xhosa and fluent in
English as an L3, was recorded producing singleton C, unassimilated mC, and
assimilated NC sequences in intervocalic position in minimal, or near-
minimal, pair words. The goal of the initial study was to examine the labial
sequences, so the relevant tokens had stem-initial /p/, /m/, and /f/ (though other
words involving mC and NC sequences appeared among the fillers). The study
included 60 target words and 60 fillers, taken from Doke et al. (1990).13 All
tokens were trisyllabic, with sequences occurring between the first and second
syllable nuclei. Tokens were recorded in the carrier phrase Angiboni X
encwadini ('I didn't see X in the book'). The speaker was instructed to speak
at a steady, normal speech rate.
11. Fallon (2002) suggests that ejectivity seems to be a general strategy in Zulu for
enhancing voicing contrasts in obstruents.
12. There is a body of literature addressing the question of duration for various types
of NC sequences cross-linguistically. A common prediction is that prenasalized
segments will have a duration matching single C, while true clusters will be
longer. These hypotheses could relate to my Zulu hypotheses, though the results
of such studies seem to be mixed (see Riehl 2008 for discussion).
13. A few tokens were constructed following phonotactics for possible words. The
speaker encountered these words initially in contexts where the underlying nature
of the stem-initial consonant was unambiguous.
4.2. Results
A one-way ANOVA was performed for each group of data (p-stems, m-stems,
f-stems, and mf-stems) to determine whether the mean durations differed signif-
icantly. Post hoc Tukey HSD tests, using α = .01, were calculated to determine
which pairwise differences in each group of data were significant. Results are
summarized below in Tables 3-5. In each group, differences among mean dura-
tions were significant. However, pairwise Tukey comparisons revealed that
while the unassimilated mC sequences (column 1 in the tables) differed sig-
nificantly from NC and C (columns 2 and 3, respectively), NC and C did not
differ significantly from each other. Boldface in the tables indicates groups that
did not differ significantly from each other in pairwise Tukey HSD tests.
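The F statistic behind such a comparison can be illustrated with a from-scratch one-way ANOVA. The duration values below are invented, chosen only to mimic the reported pattern (unassimilated mC longer than assimilated NC and singleton C); this is a sketch of the method, not the study's data.

```python
# One-way ANOVA F statistic computed from scratch, mirroring the durational
# comparison mC vs. assimilated NC vs. singleton C. Data (ms) are invented.
from statistics import mean

def one_way_anova(groups):
    """Return the F statistic for a list of groups of measurements."""
    grand = mean(x for g in groups for x in g)
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

mC = [168, 172, 165, 170]   # unassimilated: two oral closures, longer
NC = [92, 95, 90, 93]       # assimilated: one overlapped closure
C  = [91, 94, 89, 92]       # singleton
f = one_way_anova([mC, NC, C])
print(round(f, 1))
```

A large F here reflects the mC group standing apart from NC and C; the post hoc Tukey step would then localize the significant pairwise differences.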
Table 3 summarizes data from p-initial stems. The first column gives meas-
urements for unassimilated [mpʰ] outputs resulting from um- prefixation. This
unassimilated output is predicted to contain two oral closure gestures, one
each for the nasal and oral segments, and so its duration should reflect their
presence. In contrast, only a single oral closure gesture is predicted to be
present in both the assimilated [mp] and plain [pʰ] cases. As predicted, while
unassimilated [mpʰ] differed significantly from the other groups, the latter two
did not differ from each other.
Table 4 gives the results for m-initial stems. Again, the unassimilated um+m-
initial stem structure is significantly longer than the assimilated and underlying
singleton structures, though the latter two do not differ from each other.
Finally, Table 5 summarizes the data for both the f-initial stems and the
mf-initial stems. The two groups were analyzed separately, and again followed
the patterns seen with m-initial and p-initial stems: unassimilated sequences
were significantly longer than either assimilated or singleton sequences, which
again did not differ significantly from each other.
While more phonetic data are clearly needed here, these initial results indi-
cate that there are no significant differences in duration between an assimilated
NC sequence and a singleton C. This outcome is in line with the prediction
that there is only a single oral closure gesture, whose duration is limited by
*Long, in these contexts. In contrast, the evidence that unassimilated mC
sequences are significantly longer is in line with the non-overlapping analysis
of these sequences.
While the previous section examines acoustic evidence bearing on the *Long
constraint that is crucial to my analysis in Zulu, in this section I present a brief
examination of patterns of nasal place assimilation throughout Bantu that
provide language-external support for my analysis.
I have argued that in Zulu, *Long and Align conspire to cause overlap
between glottal gestures and N. Because of markedness constraints, though,
we never actually see direct evidence that such an overlap would have occurred
in Zulu; I simply infer the problematic overlap from the absence of the glottal
gestures in the output.
Several Bantu languages do show evidence of just such an overlap: Kinyar-
wanda, Pokomo, and Sukuma all have processes of nasal place assimilation in
which N followed by an underlyingly aspirated consonant results in aspiration
being realized on the nasal in the assimilated sequence (Kimenyi 1979, Sagey
1986, Maddieson 1991, Huffman and Hinnebusch 1998). Crucially, in these
languages the glottal spreading gesture maintains a constant timing with respect
to the oral constriction gesture in both C and NC; Huffman and Hinnebusch
(1998) directly credit this timing with the resulting overlap onto the nasal
portion of the sequence in Pokomo. It seems reasonable to say, then, that nasal
place assimilation is being driven by the same basic mechanisms in these
languages as in Zulu, with the only difference being the ranking of a
*Nʰ markedness constraint; its high ranking in Zulu prohibits the overlapped
sequences from surfacing as is, while in Kinyarwanda, Pokomo, and Sukuma
it is violated in favor of preserving the glottal gesture.
6. Conclusion
14. Where /p, t, k/ are not distinguished for underlying laryngeal properties.
A. Zulu stimuli
/um+ph/ /iN+ph/ /ph/
[mph] [mp] [ph]
umphambo impamba iphamba
umphahla impahla iphahla
umphalo impalo iphalo
umphako impaka iphako
umphela impela iphela
umphiki impiko uphiko
umphobo impobo iphoba
umphundo impundu iphundu
/um+m/ /iN+m/ /m/
[mm] [m] [m]
ummango imandlo umango
umminzo imini umina
ummeli imeli umema
ummese imeshe umese
/um+f/ /iN+f/ /f/
[mf] [mpf] [f]
umfiki imfiko ufiki
umfula imfule ifule
umfundi imfundo ufundo
umfusi imfusi ifusi
/um+mf/ /iN+mf / /mf/
[mf] [mf] [mf]
ummfamu imfamu umfamu
ummfumfu imfumfu umfumfu
References
Gafos, Adamantios
2002 A Grammar of Gestural Coordination. Natural Language & Linguis-
tic Theory 20: 269-337.
Hayes, Bruce and Tanya Stivers
1995 The Phonetics of post-nasal voicing. Ms., UCLA.
Herbert, Robert
1986 Language Universals, Markedness Theory, and Natural Phonetic
Processes. Berlin: Mouton de Gruyter.
Huffman, Marie and Thomas Hinnebusch
1998 The phonetic nature of voiceless nasals in Pokomo: Implications
for sound change. Journal of African Languages and Linguistics
19: 1-19.
Jun, Jongho
1995 Perceptual and articulatory factors in place assimilation: An optimality
theoretic approach. Los Angeles, CA: UCLA dissertation.
Jun, Jongho
1996 Place assimilation is not the result of gestural overlap: Evidence
from Korean and English. Phonology 13: 377-407.
Jun, Jongho
2004 Place assimilation. In: Bruce Hayes, Robert Kirchner and Donca
Steriade, (eds.), Phonetically Based Phonology. Cambridge Univer-
sity Press.
Kadima, Marcel
1969 Le Système des Classes en Bantou. Leuven: Vander.
Kerremans, R.
1980 Nasale suivie de consonne sourde en Proto-Bantou. Africana Lin-
guistica 8: 159-198.
Kimenyi, Alexandre
1979 Studies in Kinyarwanda and Bantu Phonology. Edmonton: Linguistic
Research, Inc.
Ladefoged, Peter and Ian Maddieson
1996 The Sounds of the World's Languages. Cambridge, MA: Blackwell.
Maddieson, Ian
1991 Articulatory phonology and Sukuma aspirated nasals. In: Proceed-
ings of the Berkeley Linguistics Society, Special African Session:
145-153.
Maddieson, Ian and Peter Ladefoged
1993 Phonetics of Partially Nasal Consonants. In: Marie Huffman and
Rena Krakow (eds.), Phonetics and Phonology, Volume 5: Nasals,
Nasalization, and the Velum. San Diego: Academic Press.
Meinhof, Carl
1932 Introduction to the Phonology of the Bantu Languages. Berlin:
Dietrich Reimer.
Nam, Hosung
2007a Gestural coupling model of syllable structure. New Haven, CT: Yale
dissertation.
Nam, Hosung
2007b Syllable-level intergestural timing model: Split-gesture dynamics
focusing on positional asymmetry and moraic structure. In: Jennifer
Cole and Jose Ignacio Hualde (eds.), Papers in Laboratory Phonology
IX. Berlin: Mouton de Gruyter.
Padgett, Jaye
1994 Stricture and Nasal Place Assimilation. Natural Language & Lin-
guistic Theory 12: 465-513.
Padgett, Jaye
1995 Partial Class Behavior and Nasal Place Assimilation. Proceedings
of the Arizona Phonology Conference: Workshop on Features in
Optimality Theory. Tucson: Coyote Working Papers, University of
Arizona.
Padgett, Jaye
2001 The Unabridged Feature Classes in Phonology. Ms., University of
California, Santa Cruz.
Pater, Joe
1996 *NC. In: Kiyomi Kusumoto (ed.), Proceedings of NELS 26. Amherst,
MA: GLSA.
Port, R., and J. Dalby
1982 Consonant/Vowel Ratio as a Cue for Voicing in English. Perception
and Psychophysics 32: 141-152.
Riehl, Anastasia
2008 The phonology and phonetics of nasal obstruent sequences. Ithaca,
NY: Cornell dissertation.
Sagey, Elizabeth
1986 The representation of features and relations in non-linear phonology.
Cambridge, Mass.: MIT dissertation.
Saltzman, Elliot and Kevin Munhall
1989 A Dynamical Approach to Gestural Patterning in Speech Produc-
tion. Ecological Psychology 1 (4): 333-382.
Son, Minjung, Alexei Kochetov, and Marianne Pouplier
2007 The role of gestural overlap in perceptual place assimilation: Evi-
dence from Korean. In: Jennifer Cole and Jose Ignacio Hualde
(eds.), Papers in Laboratory Phonology IX. Berlin: Mouton de
Gruyter.
Silverman, Daniel
1997 Phasing and Recoverability. New York: Garland.
Steriade, Donca
1993 Closure, release and Nasal Contours. In: Marie Huffman and Rena
Krakow (eds.), Phonetics and Phonology, Volume 5: Nasals, Nasal-
ization, and the Velum. San Diego: Academic Press.
The acoustics of high-vowel loss in a Northern Greek
dialect and typological implications*
Nina Topintzi and Mary Baltazani
Abstract
We offer an analysis of Vowel Deletion in the Kozani Greek (NW Greece) dialect,
investigating the environment, the acoustic correlates, the various realisation stages
and the vowel quality differences in its application. Our data suggest that Vowel Dele-
tion is gradient and variable, correlating with increased aspiration and duration of
the consonants adjacent to the deleted vowel to an extent, but not reliably so for all
segments. Furthermore, there is an asymmetry between high vowels in the application
of Vowel Deletion, with [i] more resistant to Vowel Deletion than [u]. Our concurrent
exploration of the consonantal clusters created as a result of Vowel Deletion in Kozani
Greek unveils a wider inventory of consonantal clusters as well as a richer range of
codas emerging in this dialect compared to Standard Greek. Beyond the descriptive
goals of the paper, we also discuss the theoretical implications of the Kozani Greek
data for the typology of Vowel Deletion. The application of Vowel Deletion between
voiced consonants in Kozani Greek is an extremely rare phenomenon which has so
far been left unaccounted for by gestural overlap theories of Vowel Deletion. We
tentatively argue that gestural overlap can extend to this case and hypothesise its
specific effects.
1. Introduction
Northern Greek dialects (roughly covering the areas of central Greece, Thessaly,
Macedonia, Epirus, Thrace, Euboea, and some islands in the Ionian and NE
Aegean) have a characteristic process of high-vowel (i, u) deletion (VD) in
unstressed syllables leading to the creation of various consonant clusters, as
shown in (1).
(1) Northern Greek   Standard Greek
ˈpliθka   ˈpliθika   'I washed'
plí       pulí       'bird'
fsá       fisá       'blow'
vnó       vunó       'mountain'
The term VD (vowel deletion) here is not used in the narrow sense of
vowel elision; rather, it refers to the phenomenon which phonetically gets to
be realised along a continuum of processes (see below for details), chief
among which are vowel devoicing and elision itself. Whenever a distinction
needs to be made among the processes discussed, we will spell it out explicitly.
Moreover, Greek VD is unrelated to the process of metrically-driven vowel
deletion occurring in other languages as a means to satisfy some metrical
requirement (cf. Gouskova 2003).1 For instance, in odd-parity words of Yidiɲ,
the final vowel is deleted so that all material is parsed into unmarked binary
feet, leaving no syllable unparsed, e.g. /gindanu/ → (gin.dá:n), *(gin.dá:)nu
'moon-abs' vs. /gindanu-gu/ → (gínda)(núgu) 'moon-erg'. In contrast, VD
in Northern Greek may actually produce metrically marked structures, as in
e.g. /spiti/ → (spít) 'house' with a marked unary foot, instead of
the Standard Greek (spí.ti), which presents an unmarked binary one.
VD, despite being pervasive in Greek, is as yet poorly understood. Our paper
aspires to shed light on Greek VD from an acoustic point of view, to examine its
effects with respect to consonant cluster formation, and to compare its manifesta-
tion to other instances of the phenomenon typologically. The first goal is driven
by the paucity of research on Northern Greek VD. While it is a phenomenon
widely cited impressionistically within Greek linguistics (Chatzidakis 1905;
Papadopoulos 1927; Newton 1972; Browning 1991; Kondosopoulos 2000;
Trudgill 2003), it has barely been investigated phonetically for this cluster of
dialects. More recently there have been a number of experimental studies inves-
tigating VD in Cypriot Greek (Eftychiou 2008) and in Standard Modern
Greek (Dauer 1980; Arvaniti 1994, 1999; Nicolaidis 2001, 2003; Baltazani
2007a, b; Loukina 2008), the majority of which suggest that it is common for
high vowels. Our choice to study Kozani Greek (NW Greece) is justified by
the fact that in this dialect VD occurs habitually, whereas in most of the other
dialects it is less regular. We thus hope to offer a more comprehensive explora-
tion of this phenomenon in Greek.
Our study leads to a number of findings. In particular, we show that VD
correlates with increased aspiration and duration of the consonants adjacent
to the deleted vowel to an extent, but not reliably for all segments. In addition,
we confirm the gradience and variability of VD also reported in cross-linguistic
research. Furthermore, we observe a rather dramatic asymmetry between the
high vowels in the application of VD, such that [i] appears more resistant to
VD than [u].
2. Data collected
Our data come from recordings of a male speaker of Kozani Greek (KG) in
his 60s. The recording was conducted by the first author in December 2007 in
Kozani. Kozani is a city of about 50,000 inhabitants in northern Greece, located
in the western part of Macedonia, 120 km south-west of Thessaloniki. The
speaker, Lazaros Kouziakis, read aloud one of the stories he collected in
Kouziakis (2008), a volume with a collection of stories describing aspects of
life in Kozani during the past decades. The piece we analysed relates the story
of a trumpeter. It contains 1264 words and 5555 segments and is approxi-
mately 18 minutes long.
372 Nina Topintzi and Mary Baltazani
3. Results
This section presents the results of our study, categorising them into three dis-
tinct subsections. Section 3.1 reports general observations regarding VD that
bring it on a par with other languages that exhibit VD; 3.2 presents the conso-
nantal effects resulting from VD; and 3.3 focuses on more specific aspects of
Kozani VD itself.
2. In Jun and Beckman (1993, 1994) the causation chain is the reverse: aspirated
consonants cause devoicing and not the other way round.
Figure 1. Token variability of pretonic [i] in [tsitsána] (a female name). The upper
panel shows a full vowel; the middle panel has a voiced fricative instead of
[i]; the lower panel shows total deletion of the segment.
Figure 2. Aspiration in stops is not consistently longer after VD (left panel). Stop
closure duration is not consistently longer after VD (right panel).
We also examined the sum of duration + aspiration changes in the two con-
ditions to determine whether there was an additive effect of VD, but as is
shown in Figure 3, vowel deletion only seems to have an effect on the duration
of [t] and no effect on the duration of [p, c, k].
3. An anonymous reviewer correctly points out that what we have called aspiration
may be frication at the release of a coronal stop into the narrow constriction of a
high vowel, explaining the difference between [t] and [c] on the one hand and [p]
and [k] on the other. This distinction merits further exploration; however, the fact
remains that regardless of the exact phonetic nature of this interval, in VD envi-
ronments the period between the burst of a stop and the onset of the next segment
is longer than in non-VD environments.
Figure 5. Most fricatives (left panel) & sonorants (right panel) are longer after VD.
4. In the graphs below, the following symbols have been used for convenience: sh = ʃ,
xj = ç, nj = ɲ, lj = ʎ.
followed by a labial-initial word, e.g. /tin porta/ [m bórta] 'the door', but
without any lengthening effect. Presumably, this is because it belongs to a
larger prosodic word, and in that position it is not final. As for the palatal
segment, the transition between it and a following vowel is characterised by a
[j]-like onglide, which makes the CV boundary very elusive; we therefore
suspect that our measurements in the No-deletion condition underestimated
the duration of the consonant, something that did not happen in the Deletion
condition, since in that case the neighbouring sound was a consonant, making
segmentation much easier.
Duration increase thus superficially seems a relatively good indicator of
VD for fricatives and sonorants, but it is not infallible. To decide how reliable
the above results were, we also calculated the standard deviation (stdev) for
the duration measurements of all the sounds above. It turns out that this
number is larger in deletion cases than in non-deletion ones, which suggests
that there is greater variability in the duration of consonants after VD than
when no deletion takes place.
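The comparison just described amounts to a simple computation over the two sets of duration measurements. A minimal sketch, using invented duration values rather than the measurements reported in this study:

```python
from statistics import mean, stdev

# Illustrative consonant durations in ms; these values are invented
# for exposition and are not the measurements reported here.
no_deletion = [82, 85, 80, 84, 83, 81]   # consonant next to an intact vowel
deletion = [88, 104, 79, 112, 95, 70]    # consonant left adjacent after VD

# A larger standard deviation in the deletion condition indicates
# greater variability of consonant duration after VD.
for label, durations in (("no deletion", no_deletion), ("deletion", deletion)):
    print(f"{label}: mean = {mean(durations):.1f} ms, stdev = {stdev(durations):.1f} ms")
```

With values like these, the deletion condition shows both a longer mean and a markedly larger stdev, which is the pattern at issue in the text.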
Figure 6. Standard deviation from the average value of duration for stop aspiration
(right panel) is greater than for stop closure (left panel). Higher values of
this number show greater variability in duration.
The least variability after VD appears in stop closure duration, whereas it
is greater for stop aspiration, especially for [t], which, recall, was most affected
by VD (Fig. 6). Variability proves even greater for fricatives and sonorants
(Fig. 7).
The acoustics of high-vowel loss in a Northern Greek dialect 379
Figure 7. Standard deviation from the average value of duration for fricatives (left
panel) and sonorants (right panel). There are higher values of stdev for the
VD condition suggesting greater duration variability after VD.
There are various ways to interpret these results. The most extreme
one is to suggest that none of the properties shown above is systematically an
effect of VD, since too much variability appears. Another, more conservative,
and perhaps more insightful explanation is that the duration of underlying
consonants is more stable than that of derived ones, as mirrored by the
reduced variability of consonantal duration without VD. Speakers may thus
be attuned to associate VD with greater duration fluctuation.
Of course, not all patterns appear with equal frequency (Table 1).
5. Prosodic position has also been argued to affect VD. For example, positions where
prosodic lengthening occurs are less likely to induce devoicing or deletion (Jun
and Beckman 1994 and references therein). Table 1 reveals that almost half
Figure 8. 53% of unstressed [i] do not delete (left pie chart). 25% of unstressed [u] do
not delete (right pie chart).
Table 2 gives more information on [i] VD (Column B). First, no [i] can
delete unless it is immediately adjacent to the stressed syllable (cf. rows 1,
4 & 5 vs. rows 2 & 3). For [i]s that fail to delete even though they could
(Column A), it does not really matter whether the segment is before or after
stress: 49% of the [i]-tokens are post-tonic (rows 4 & 5) and 51% are pre-tonic
(rows 1, 2 & 3). Of the latter, most occur exactly one syllable before the
stressed one, whereas 9% appear 2, 3 or 4 syllables away from it. This can
be seen as a strengthening phenomenon of the pre-tonic position, something
that has been observed in other languages such as English (Turk and White
1999) and Spanish, Romanian and Portuguese (Chitoran and Hualde 2007).7
As for [u] (Table 3), roughly equal proportions fail to delete (although they
potentially could) in either pre- or post-tonic position (2nd column). This is on
a par with the [i]-ND results.
A third asymmetry concerns where in the word VD occurs more often for
each of the vowels [i] and [u]. Setting aside the voicing specifications of the
surrounding consonants (to be discussed in Section 5), a comparison
between the two panels in Figure 9 reveals that overall i-deletion (left panel)
occurs in all positions within the word (initial, medial, final), whereas u-
deletion (right) appears almost exclusively word-initially.
Figure 9. Position within the word where i-deletion (left panel) and u-deletion (right)
occur. i-deletion occurs in all positions within the word (initial, medial,
final), whereas u-deletion is largely confined to the word-initial one.
(I = initial, F = final, H = medial).
One final asymmetry between [i] and [u] crops up. Before we present it,
though, we need to describe another characteristic process of Northern Greek
dialects, unstressed mid-vowel raising, whereby we get /pei/ → [pi] 'child',
/lio/ → [lu] 'a little'. In some dialects, raising and VD interact so that
raising feeds VD, e.g. /pei/ → [pi] → [p] in Mesolongi (Chatzidakis
1905: 261), but in most, including KG for the most part,9 such a chain shift
is inapplicable. Consequently, surface high vowels may originate either from
underlying high vowels or from underlying mid vowels /e/ and /o/, which raise
to [i] and [u] respectively, when unstressed, due to vowel raising.10
9. We say 'for the most part', because on occasion we have also seen VD of /e/ or /o/
in our data, e.g. /istera/ → [stra]. It is possible to argue that such forms are under
the influence of neighbouring dialects, e.g. the Velvendos dialect (Velvendos is a
town 33 km NE of Kozani), where raising feeds VD. On that view, we must assume
an intermediate stage of vowel raising, i.e. [stira], that subsequently underwent
VD.
The fourth asymmetry, then, relates to the source of surface high vowels:
while only 30% of unstressed surface [i]s hail from underlying /e/, the
number for unstressed surface [u]s differs significantly: only 8% stem
from underlying /u/, and the remaining 92% come from input /o/. We
also anticipate that KG underlying high vowels should delete when unstressed,
but derived ones should not. However, this prediction is not entirely borne
out: 70% of unstressed surface [i]s started high in the input too and should
have deleted but did not, compared to only 8% of unstressed surface [u]s
failing to delete although they stemmed from underlying /u/.
To recap, we have identified four main asymmetries between [i] and [u]
VD, summarised below:
– [u] deletes more than [i] (75% vs. 47%)
– [u]-deletion tends to be pre-tonic; [i]-deletion is overwhelmingly post-tonic
– [u]-deletion systematically occurs word-initially; [i]-deletion occurs in all
positions in the word
– most remaining unstressed surface [u]s are derived; most remaining
unstressed surface [i]s are underlying
All in all, our data thus reveal that [i] is more resistant to VD, whereas [u]
tends to delete more. Similar results, albeit debatable (see Tsuchida 2001:
227), have been reported for Japanese (Han 1962; Maekawa 1983). The exact
opposite situation emerges in Turkish (Jannedy 1995: 80), where [u] is slightly
more resistant to VD than the other high vowels of Turkish [i y ɯ]. Differences
in the application of high-vowel deletion based on the vowel's quality thus
seem to arise on a language-specific basis (see also Gordon 1998: 103, fn. 15).
But what is the cause of this asymmetry?11 An obvious answer could be
vowel duration. Recall that high vowels are usually subject to VD due to their
short duration. It is thus conceivable that [u] is more prone to VD than [i]
because it is shorter. SMG vowel measurements are not clear on this point;
10. An anonymous reviewer raised the question of whether there are contexts in which
this stem and others like it surface with a mid vowel. Although this stem does not
surface with a mid vowel, and its derivation from [e] is therefore opaque in the
dialect, there are other stems where [] and [i] alternate in a paradigm, making the
reason for non-deletion of the unstressed [i] transparent, e.g. [cif] 'head' ~
[punucfalus] 'headache', [kasirc] 'cheese (diminutive)' ~ [kasr] 'cheese', etc.
11. A reviewer makes a very interesting suggestion regarding potential differences in
the morphosyntactic load of /i/ and /u/ (cf. Gafos and Ralli 2001). Greek is highly
inflectional, and /i/ seems to carry more morphosyntactic features than /u/. If
that is the case, then its deletion would endanger its recoverability more than the
deletion of /u/. This hypothesis definitely merits exploration, to be carried out in
future work.
Nicolaidis (2003) finds that unstressed [u] is shorter than unstressed [i],
whereas Fourakis, Botinis and Katsaiti (1999) find the reverse. In both cases
the length difference is only about 7–9 ms, which is presumably hardly notice-
able. Our own measurement for KG vowels shows that, on average, [u] is
longer than [i] by 10 ms, contra our expectations. Again, the difference is not
only small but, more importantly, the standard deviation value is very
large, and if it is taken into account, then we cannot truly find a difference in
duration between the two vowels.
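The point that a 10 ms mean difference is swamped by a very large standard deviation can be made concrete with a standardized effect size (Cohen's d): the smaller |d| is, the less the two distributions can be told apart. The duration values below are invented for illustration and are not our measurements.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference between two samples."""
    pooled = sqrt(((len(a) - 1) * stdev(a) ** 2 + (len(b) - 1) * stdev(b) ** 2)
                  / (len(a) + len(b) - 2))
    return (mean(a) - mean(b)) / pooled

# Invented vowel durations in ms: [u] averages 10 ms longer than [i],
# but both samples are highly variable.
u_durations = [95, 60, 130, 85, 110, 70]
i_durations = [85, 50, 120, 75, 100, 60]

print(f"mean difference = {mean(u_durations) - mean(i_durations):.1f} ms")
print(f"Cohen's d = {cohens_d(u_durations, i_durations):.2f}")  # below the conventional 'medium' 0.5
```

With this kind of spread, a 10 ms mean difference yields an effect size well short of the conventional "medium" threshold, i.e. no real duration difference can be claimed.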
The shorter-[u] duration hypothesis, however, cannot yet be eliminated. This
is because the unstressed u-tokens in our data were very few; hence, our
duration measurement may not be entirely reliable. This apparent weakness
is by no means intrinsic to our study. Instead, it relates to general vowel
frequency effects. In a study of the occurrence frequency of all segments of
Standard Greek undertaken by the Institute for Language and Speech Process-
ing (ILSP), based on a corpus containing 148,333,836 SMG phone tokens
(Protopapas et al. 2010), unstressed [i] is the vowel occurring most frequently
in SMG (22% among vowels), while unstressed [u] is the least frequent (4%
among vowels), giving a 5:1 [i]:[u] ratio. Although no similar study has been
conducted on the frequency count of dialectal vowels, our hypothesis is that
the [i]:[u] ratio will not differ greatly in KG either.
12. No consonant clusters are created, of course, in cases of incomplete deletion.
Even in tokens without any spectral evidence of vowel presence, however, we
cannot safely assume that speakers perceive acoustically adjacent consonants as
consonant clusters. It is possible that speakers still have a vowel in their
phonological representation and that what we treat in the following discussion as
consonant clusters are not really such in the speakers' minds.
14. On the special status of /s/ in clusters and various other possibilities of
syllabification, see Goad (2011).
5. Typological observations
15. Dauer's (1980) study on Standard Greek reports the same results regarding the dis-
tribution of VD, although she claims that instances where VD occurs after a voiceless
C1 are somewhat more frequent than those where C2 is voiced. She also states
that reduction between voiced Cs happens but is very rare, which is why she
disregards it entirely in the ensuing discussion.
We propose, however, that VD of this type does occur and that gestural overlap
can extend to it too. In fact, 12% of KG VD occurs between voiced consonants,
e.g. /ua/ → [] 'work, job', /duvarja/ → [dvrja] 'walls', /maiula/ →
[mala] (a female name) (cf. Fig. 10).
Figure 10. Complete i-deletion between voiced Cs in the word [maiula] (a female
name).
Recall that in this paper VD has been used as a cover term and does not
refer specifically to vowel devoicing or vowel deletion. The latter two are just
two of the stages encompassed by the phenomenon in question. What we
predict, then, is that between voiced consonants all stages of VD should be able
to emerge, save one: vowel devoicing itself.16 This is because voiced con-
sonants have the same type of glottal gesture as vowels. Thus, neither of the
consonants can be associated with a devoicing gesture that could overlap into
the vowel. Consequently, VD, with the exception of the devoicing stage, may
occur.
Given the above, we hypothesise that word-medial VD as a phenomenon
may appear between all types of consonants in terms of voicing. However, its
possible realisations between voiced consonants form a subset of those emerg-
ing between other combinations of consonants. The hypothesised situation is
schematised in (6). At present, we lack sufficient data to test this prediction
adequately; nonetheless, initial examination of the data at hand seems to
support our proposal. We anticipate that future work will be able to offer a
more conclusive answer.
Figure 11. Final i-deletion in the word [spit], accompanied by aspiration and formant
structure, but no voice bar.
Moreover, we contend that the gestural overlap account (GOA) alone is not
sufficient to explain the full range of attested facts cross-linguistically. There
are numerous other traits that it leaves unaccounted for, which should be
further investigated. For example, GOA cannot explain why in Kozani Greek
VD is much more frequent when C2 is voiceless (row c) than when C1 is
(row b) (see Table 1, repeated here as Table 4), although the two patterns are
identical in the sense that both share the presence of a [−voi] and a [+voi] con-
sonant (but in different linear order).
Table 4. VD frequency in different voicing environments. The last three columns show,
from left to right, frequency in word-medial position (% medial), in word-final
position (% final), and in all positions considered together (Total %).

Pattern            [i]  [u]  Total #  % medial  % final  Total %
a. −voi VD −voi     31    8       39     40.21        –    20.31
b. −voi VD +voi     10    2       12     12.38        –     6.25
c. +voi VD −voi     34    1       35     36.08        –    18.23
d. +voi VD +voi      6    5       11     11.34        –     5.73
e. −voi VD #        44   11       55         –     57.9    28.65
f. +voi VD #        35    5       40         –     42.1    20.83
TOTAL              160   32      192                       100
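The percentage columns of Table 4 are straightforward normalizations of the raw counts: rows (a)–(d) over the 97 word-medial tokens, rows (e)–(f) over the 95 word-final tokens, and the last column over all 192 tokens. The bookkeeping can be sketched as follows:

```python
# Raw VD token counts from Table 4 (summed over [i] and [u]).
medial = {"a": 39, "b": 12, "c": 35, "d": 11}  # word-medial voicing patterns
final = {"e": 55, "f": 40}                     # word-final patterns

medial_total = sum(medial.values())            # 97
final_total = sum(final.values())              # 95
grand_total = medial_total + final_total       # 192

for row, n in medial.items():
    print(f"{row}: {100 * n / medial_total:.2f}% of medial, "
          f"{100 * n / grand_total:.2f}% of all tokens")
for row, n in final.items():
    print(f"{row}: {100 * n / final_total:.1f}% of final, "
          f"{100 * n / grand_total:.2f}% of all tokens")
```

Running this reproduces the table's percentages up to rounding (e.g. 39/97 = 40.21% of medial tokens, 55/95 = 57.9% of final tokens).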
17. Only the left panel of Fig. 9 is used here, since the right panel contains too few
data points to allow us any claim. Also, reference is made solely to the word-medial
position, since it is the one that shows the most systematic effects.
18. Thanks to Lasse Bombien for suggesting this line of thought to us.
19. The C[−voi] C[+voi] string implies either the sequence T-S or T-D, where S = sonorant,
T = voiceless obstruent, D = voiced obstruent. In a heterosyllabic analysis, both are
ill-formed in terms of Syllable Contact; however, we cannot rule out the possibility
of a tautosyllabic analysis in terms of complex onsets, e.g. TS. Such a cluster would
be well-formed, but TD would not (for reasons having to do with consonant
phonotactics in Greek). At present, we assume that heterosyllabic syllabification
is preferred over tautosyllabic syllabification for derived consonant clusters,
though this is a matter that requires further investigation.
6. Conclusion
References
Arvaniti, Amalia
1994 Acoustic features of Greek rhythmic structure. Journal of Phonetics 22: 239–268.
Arvaniti, Amalia
1999 Illustrations of the IPA: Standard Greek. Journal of the International Phonetic Association 29: 167–172.
Arvaniti, Amalia
2001 Comparing the phonetics of single and geminate consonants in Cypriot and Standard Greek. Proceedings of the Fourth International Conference on Greek Linguistics, 37–44. Thessaloniki: University Studio Press.
Baertsch, Karen
2002 An optimality-theoretic approach to syllable structure: the split margin hierarchy. Ph.D. dissertation, Indiana University.
Baltazani, Mary
2006 Focusing, prosodic phrasing, and hiatus resolution in Greek. In Louis Goldstein, Douglas Whalen and Catherine Best (eds.), Laboratory Phonology 8, 473–494. Berlin/New York: Mouton de Gruyter.
Baltazani, Mary
2007a Prosodic rhythm and the status of vowel reduction in Greek. In Selected Papers on Theoretical and Applied Linguistics from the 17th International Symposium on Theoretical and Applied Linguistics, 31–43. Thessaloniki: Monochromia.
Baltazani, Mary
2007b The effect of prosodic boundaries on syllable duration in Greek. Paper presented at the Old World Conference in Phonology 4, Rhodes, 18–21 January 2007.
Berent, Iris, Donca Steriade, Tracy Lennertz and Vered Vaknin
2007 What we know about what we have never heard: Evidence from perceptual illusions. Cognition 104: 591–630.
Boersma, Paul and David Weenink
2009 Praat: doing phonetics by computer. Computer program; available at: http://www.praat.org/.
Browning, Robert
1991 Medieval and Modern Greek [ ]. 1st edition 1962, 2nd edition 1983; Greek edition 1991. Athens: Papadima Publications.
Chatzidakis, Georgios
1905 Medieval and Modern Greek A' [ ']. Athens: P.D. Sakellarios.
Chitoran, Ioana and Ayten Babaliyeva
2007 An acoustic description of high vowel syncope in Lezgian. Proceedings of the 16th International Congress of Phonetic Sciences, 2153–2156. Saarbrücken, Germany.
Chitoran, Ioana and José I. Hualde
2007 From hiatus to diphthong: The evolution of vowel sequences in Romance. Phonology 24(1): 37–75.
Dauer, Rebecca
1980 The reduction of unstressed high vowels in Modern Greek. Journal of the International Phonetic Association 10: 17–27.
Delforge, Ann Marie
2008 Unstressed vowel reduction in Andean Spanish. In Laura Colantoni and Jeffrey Steele (eds.), Selected Proceedings of the 3rd Conference on Laboratory Approaches to Spanish Phonology, 107–124. Somerville, MA: Cascadilla Proceedings Project.
Eftychiou, Eftychia
2008 Lenition processes in Cypriot Greek. Ph.D. dissertation, University of Cambridge.
Fourakis, Marios
1986 An acoustic study of the effects of tempo and stress on segmental intervals in Modern Greek. Phonetica 43: 172–188.
Fourakis, Marios, Antonis Botinis and Maria Katsaiti
1999 Acoustic characteristics of Greek vowels. Phonetica 56: 28–43.
Gafos, Adamantios and Angela Ralli
2001 Morphosyntactic features and paradigmatic uniformity in two dialects of Lesvos. Journal of Greek Linguistics 2: 41–73.
Goad, Heather
2011 The representation of sC clusters. In Marc van Oostendorp, Colin Ewen, Beth Hume and Keren Rice (eds.), The Blackwell Companion to Phonology, vol. II, chapter 38. Oxford: Wiley-Blackwell.
Gordon, Matthew
1998 The phonetics and phonology of non-modal vowels: a cross-linguistic perspective. Berkeley Linguistics Society 24: 93–105. [Online at: http://www.linguistics.ucsb.edu/faculty/gordon/Nonmodal.pdf; accessed 28 July 2011.]
Gouskova, Maria
2001 Falling sonority onsets, loanwords, and Syllable Contact. In Mary Andronis, Christopher Ball, Heidi Elston and Sylvain Neuvel (eds.), CLS 37: The Main Session. Papers from the 37th Meeting of the Chicago Linguistic Society, vol. 1, 175–185. Chicago, IL: CLS.
Gouskova, Maria
2003 Deriving economy: syncope in Optimality Theory. Ph.D. dissertation, University of Massachusetts, Amherst.
Gouskova, Maria
2004 Relational hierarchies in OT: the case of syllable contact. Phonology 21(2): 201–250.
Han, Mieko Shimizu
1962 Unvoicing of vowels in Japanese. Onsei no Kenkyuu 10: 81–100.
Hooper [Bybee], Joan
1976 An Introduction to Natural Generative Phonology. New York: Academic Press.
Jannedy, Stefanie
1995 Gestural phasing as an explanation for vowel devoicing in Turkish. OSU Working Papers in Linguistics 45: 56–84.
Jun, Sun-Ah and Mary Beckman
1993 A gestural-overlap analysis of vowel devoicing in Japanese and Korean. Paper presented at the 67th Annual Meeting of the Linguistic Society of America, Los Angeles, CA.
Jun, Sun-Ah and Mary Beckman
1994 Distribution of devoiced high vowels in Korean. Proceedings of the 1994 International Conference on Spoken Language Processing, vol. 2, 479–482.
Kondo, Mariko
1994 Is vowel devoicing part of the vowel weakening process? In Proceedings of the Edinburgh Linguistics Department Conference 1994, 55–62. [Online at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.8089; accessed 28 July 2011.]
Kondosopoulos, Nikolaos
2000 Dialects and Idioms of Modern Greek [ ]. 3rd edition. Athens: Gregori Publications.
Kouziakis, Lazaros
2008 I've heard, I've been told and I've written [', ' ]. Kozani.
Loukina, Anastassia
2008 Regional phonetic variation in Modern Greek. Ph.D. dissertation, University of Oxford.
Maekawa, Kikuo
1983 On vowel devoicing in Standard Japanese [Kyootsuugo-ni okeru boin-no museika-ni tsuite]. Gengo-no Sekai 1: 69–81.
McCawley, John D.
1968 The Phonological Component of a Grammar of Japanese. The Hague: Mouton.
Mo, Yoonsook
2007 Temporal, spectral evidence of devoiced vowels in Korean. In Proceedings of the 16th International Congress of Phonetic Sciences, 445–448. Saarbrücken, Germany. [Online at: http://www.icphs2007.de/conference/Papers/1597/1597.pdf; accessed 28 July 2011.]
Newton, Brian
1972 The Generative Interpretation of Dialect: A Study of Modern Greek Phonology. Cambridge: Cambridge University Press.
Nicolaidis, Katerina
2001 An electropalatographic study of Greek spontaneous speech. Journal of the International Phonetic Association 31: 67–85.
Nicolaidis, Katerina
2003 Acoustic variability of vowels in Greek spontaneous speech. Proceedings of the 15th International Congress of Phonetic Sciences, 3221–3224. Barcelona, Spain.
Papadopoulos, Anthimos
1927 Grammar of Modern Greek Northern Idioms [ ]. Athens: P.D. Sakellarios.
Protopapas, Athanassios, Marina Tzakosta, Aimilios Chalamandaris and Pirros Tsiakoulis
2010 IPLR: An online resource for Greek word-level and sublexical information. Language Resources and Evaluation, Online First, 2 September 2010. [Online at: http://users.uoa.gr/~aprotopapas/CV/pdf/Protopapas_etal_LRE-IPLR.pdf; accessed 28 July 2011.]
Shiraishi, Hidetoshi
2003 Vowel devoicing of Ainu: How it differs and not differs from vowel devoicing of Japanese. In T. Honma, M. Okazaki, T. Tabata and S. Tanaka (eds.), A New Century of Phonology and Phonological Theory. A Festschrift for Professor Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, 237–249. Tokyo: Kaitakusha.
Teshigawara, Mihoko
2002 Vowel devoicing in Tokyo Japanese. In G.S. Morrison and L. Zsoldos (eds.), Proceedings of the North West Linguistics Conference 2002, 49–65. Burnaby, BC, Canada: Simon Fraser University Linguistics Graduate Student Association.
Trudgill, Peter
2003 Modern Greek dialects: a preliminary classification. Journal of Greek Linguistics 4: 45–64.
Tsuchida, Ayako
2001 Japanese vowel devoicing: cases of consecutive devoicing environments. Journal of East Asian Linguistics 10: 225–245.
Turk, Alice and Lawrence White
1999 Structural effects on accentual lengthening in English. Journal of Phonetics 27: 171–206.
Vaux, Bert and Andrew Wolfe
2009 The appendix. In Eric Raimy and Charles Cairns (eds.), Contemporary Views on Architecture and Representations in Phonology, 101–143. Cambridge, MA: MIT Press.
Vennemann, Theo
1988 Preference Laws for Syllable Structure and the Explanation of Sound Change. Berlin: Mouton.
Zec, Draga
2007 The syllable. In Paul de Lacy (ed.), The Cambridge Handbook of Phonology, 161–194. Cambridge: Cambridge University Press.
Appendix
Editors

Philip Hoole
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Lasse Bombien
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Marianne Pouplier
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich

Christine Mooshammer
Haskins Laboratories, New Haven

Barbara Kühnert
Institut du Monde Anglophone & Laboratoire de Phonétique et Phonologie, CNRS/Sorbonne-Nouvelle, Paris
Contributors

Mary Baltazani
Department of Linguistics, University of Ioannina

Marie-Anne Barthez
Language Reference Center, Clocheville Hospital, Tours; Regional University Hospital Center, Tours

Pia Bergmann
Deutsches Seminar: Germanistische Linguistik, University of Freiburg

Natalie Boll-Avetisyan
Department of Linguistics, University of Potsdam, and Utrecht Institute of Linguistics, Utrecht University

Sandrine Ferré
INSERM, U930, Tours, and Université François-Rabelais de Tours, CHRU de Tours, UMR-S930, Tours

Louis Goldstein
University of Southern California and Haskins Laboratories, New Haven

Martine Grice
IfL Phonetik, University of Cologne

Claire Halpert
Department of Linguistics and Philosophy, MIT, Cambridge, MA

Anne Hermes
IfL Phonetik, University of Cologne

Fang Hu
Institute of Linguistics, Chinese Academy of Social Sciences, Beijing

Rina Kreitman
Columbia University, New York

Yasutomo Kuwana
Asahikawa Jitsugyo High School, Asahikawa

Stefania Marin
Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, Munich
Language index
See also Appendix I and the Language Database of the chapter by Rina Kreitman (pp. 53–58)
Ainu 373, 392, 396
Amuesha 47–48, 53, 56, 61
Arabic 56, 278
  Moroccan Arabic 48, 60, 68, 157, 160, 175, 229
Athapascan 79
Avar 80

Babungo 38, 56, 67
Baltic 57, 79
Bantoid 79
Bantu
  Kinyarwanda 363, 367
  Pokomo 358, 363, 367
  Sukuma 363, 367
  Zulu 345–360, 363–366
Basque 17, 30, 39, 53, 56, 63, 82, 90
Berber 48, 56, 60, 157, 160–161, 174, 227
Bilaan 43–44, 47–48, 53, 56, 60
Biloxi 47–48, 53, 56, 61

Camsa 47–48, 53, 56, 62
Carib 37, 56–57, 62
Catalan 205–207, 211–218, 220–226, 229, 249, 306
Chatino 38, 53, 55–56, 65
Chinese 78, 89, 211, 231, 250, 252–253
  Mandarin Chinese 210, 227, 233, 236, 248–249, 251
Chukchee, Chukchi 37, 56, 58–59, 63, 68, 81
Comanche 38, 56, 67

Dakota 37, 56, 65
Darai 73
Dutch 47, 53, 56, 59, 69, 74, 85, 87, 90, 100, 107, 113, 116, 158, 198, 206–207, 213, 221, 228, 257, 261–264, 270

Eggon 36, 56, 65
Egyptian 72
English 11, 13–14, 16, 18, 20–21, 29, 62–63, 73–74, 82–83, 112–113, 116, 151, 157–158, 160–161, 173–174, 198, 201–202, 206–207, 209, 221, 226–228, 251, 261, 278, 286, 291–293, 313–315, 341–342, 360, 366–368, 382, 397
  Contemporary English 15, 17
  Middle English 17
  Old English 15

Fijian 73, 151
French 56, 60, 67, 198, 250, 285–290, 292–295, 302, 306–307

Gansu 78–79
Georgian 38–39, 47–48, 53, 56, 59, 62, 66, 157, 160, 174, 227, 346, 366
German 11, 14, 16–17, 20, 30, 44–45, 53, 55–56, 63–64, 69, 74, 85, 90, 113, 153, 174, 202, 211, 226, 228, 249, 286, 311–313, 318, 338, 341–343
  Contemporary German 15
  Upper German 21
  Viennese German 205–207, 212–215, 218–225
  Standard German 315
Germanic 14, 20, 29–31, 56–58, 63, 104, 151, 153
Greek 18, 30, 39, 47, 53, 56, 61, 63, 68, 93–95, 97–98, 104, 106–107, 111–113, 115–117, 206–207, 221, 226, 375, 381, 391, 394–395, 397
  Classical Greek 16, 20
  Contemporary Greek 16
  Standard Greek 96, 99, 369–370, 385, 388, 393
  Northern Greek 369–370, 383
  Kozani Greek (KG) 100, 105, 114, 369–373, 377, 379, 383–390, 392–393, 396, 398
Greenlandic 57
  West Greenlandic 51, 61
Guanzhou 78–79

Hawaiian 73
Hebrew 53, 57
  Biblical Hebrew 21
  Modern Hebrew 39, 44–46, 48, 52, 58–59, 63–64
Hindi 53, 57, 62, 66, 82
Hua 41, 48, 54, 57, 62

Igbo 73, 81
Ijo 71–72
Irish 39, 44, 54–55, 57, 60–61
Italian 17, 20, 30, 112, 153, 157–159, 161, 167, 170–174, 198, 202, 207, 221, 227, 250, 289, 292
  Old Italian 11, 25
  Calabria 26
  Lombardy 25
  Lucania 25
  Campania 25
  Milanese 25
  Tuscan dialects 26
  Sicilian 26

Japanese 74, 78, 80, 82–83, 85–87, 90, 279, 373, 384, 392, 395–397

Kannada 82
Kanuri 80
Khasi 43–45, 47–48, 54, 57, 62, 66–67
Klamath 44, 54–55, 57–58
Korean 14, 17, 38, 80–87, 90, 151, 227, 367–368, 373, 395–396
Kurdish 81
Kutenai 39, 47, 54, 57, 61

Latin 11, 17–18, 30, 82–83, 153
Lezgian 81, 373, 394

Mazatec 38, 57
Manchu 78–79
Mba 73
Mixtecan 79
Moghol (Mongolic) 81, 84

Nambiqara 82, 90
Nanshang 78–79
Ngandi 74
Nisqually 36
Nivkh 84, 91

Otomi (Temoayan)

Pali 14, 21
Pashto 39, 54, 57, 66
Persian 36, 82, 91
Phoenician 21, 29
Polish 17, 30, 38, 54, 57, 62, 67
Popoluca 51, 54, 57, 61
Portuguese 11, 17–20, 382

Romance 17, 56–57, 200, 202, 306, 394
Romanian 47, 54, 57, 100, 107, 177–179, 181–184, 189, 193–194, 199–200, 202, 382
Russian 39, 41, 52, 54–55, 57–58, 64, 67–68, 82, 119–122, 124, 134–138, 142–146, 148, 150–152
Rutul 81

Samoyedic 135, 141
  Nenets 119–123, 125–129, 131–134, 136–137, 139–140, 142–144, 151–152